Denumerable state nonhomogeneous Markov decision processes (Q805501)

From MaRDI portal
scientific article

    Statements

    Denumerable state nonhomogeneous Markov decision processes (English)
    1990
    The framework of this paper comprises: a denumerable set of states \(\{i\}\); an infinite number of decision stages \(\{k\}\); a finite set of actions \(\{x^i_k\}\) for each \(i\) and \(k\); a set of infinite horizon strategies \(\{x\}\), whose \(k\)-th member is a policy \(x_k\); a set of transition probability matrices \(\{P_k(x_k)\}\); a set of immediate reward vectors \(\{R_k(x_k)\}\); a discount factor \(0\leq\alpha\leq 1\); and a set of expected discounted reward vectors \(\{V_0(x,N)\}\) covering the first \(N\) decision stages when the finite strategy \(x(N)\) (the strategy \(x\) truncated to the first \(N\) decision stages) is used. Discount \((\alpha<1)\) and average \((\alpha=1\), \(\liminf\)) optimality are defined for \(N\to\infty\). A strategy \(x^*\) is algorithmically optimal if there is a sequence \(\{N_m\}\) such that \(\{x^*(N_m)\}\) (i.e. optimal \(N_m\)-stage strategies) converges to \(x^*\) in the topology induced by a specified metric \(p(\cdot,\cdot)\). The paper studies algorithmic optimality and its relationship to other optimality concepts. Various assumptions are made, and each result holds under some subset of them. Theorem 1 establishes the existence of algorithmically optimal strategies. Theorem 2 shows that algorithmic optimality implies discount optimality, and Theorem 5 that it implies average optimality. Theorem 6 shows that an algorithmically optimal strategy is unique if and only if \(\{x^*(N)\}\) converges to \(x^*\), in the topology induced by the metric \(p(\cdot,\cdot)\), for all choices of \(\{x^*(N)\}\). Finally, using relative values, solution horizon extensions of some earlier work are given in terms of algorithmic optimality, and an algorithm is given that finds, in a finite number of iterations, a first policy \(x^*_0\) of an algorithmically optimal strategy \(x^*\).
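The convergence behind algorithmic optimality can be illustrated with a minimal sketch: finite-horizon backward induction on a small discounted MDP, watching the first-stage policy of the optimal \(N\)-stage strategy stabilize as \(N\) grows. The 2-state, 2-action stationary MDP below is a hypothetical stand-in (the paper's setting is denumerable and nonhomogeneous); the transition matrices, rewards, and discount factor are illustrative choices, not taken from the paper.

```python
import numpy as np

# Hypothetical 2-state, 2-action discounted MDP (a finite, stationary
# stand-in for the paper's denumerable nonhomogeneous setting).
# P[a] is the transition matrix under action a; R[a] the immediate
# reward vector; alpha < 1 gives the discounted criterion.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.1, 0.9], [0.7, 0.3]])]
R = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
alpha = 0.9

def first_policy(N):
    """Backward induction over N stages: return the first policy of an
    optimal N-stage strategy and the N-stage value vector."""
    V = np.zeros(2)                       # terminal value
    policy = np.zeros(2, dtype=int)
    for _ in range(N):
        # Q[a, i] = one-step reward plus discounted continuation value
        Q = np.stack([R[a] + alpha * P[a] @ V for a in range(2)])
        V = Q.max(axis=0)
        policy = Q.argmax(axis=0)         # last iterate = stage-0 policy
    return policy, V

# As N grows, the first-stage policy of the optimal N-stage strategy
# stabilizes; this convergence is what algorithmic optimality formalizes.
for N in (1, 5, 50):
    pol, V = first_policy(N)
    print(N, pol, np.round(V, 3))
```

In this finite example the stage-0 policy eventually stops changing with \(N\), which is the finite-dimensional analogue of the convergence \(x^*(N_m)\to x^*\) in the metric \(p(\cdot,\cdot)\).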
    denumerable number of states
    infinite number of decision stages
    finite set of actions
    infinite horizon strategies
    algorithmic optimality
    algorithmically optimal strategies
    solution horizon extensions
