Denumerable state nonhomogeneous Markov decision processes (Q805501)

From MaRDI portal
scientific article

    Statements

    Denumerable state nonhomogeneous Markov decision processes (English)
    1990
    The framework of the paper is: a denumerable set of states \(\{i\}\); an infinite number of decision stages \(\{k\}\); a finite set of actions \(\{x^i_k\}\) for each \(i\) and \(k\); a set of infinite-horizon strategies \(\{x\}\), the \(k\)-th member of a strategy being the policy \(x_k\) used at stage \(k\); a set of transition probability matrices \(\{P_k(x_k)\}\); a set of immediate reward vectors \(\{R_k(x_k)\}\); a discount factor \(0\leq \alpha \leq 1\); and a set of expected discounted reward vectors \(\{V_0(x,N)\}\) covering the first \(N\) decision stages when strategy \(x\), truncated to the finite strategy \(x(N)\) over those stages, is used. Discount optimality \((\alpha <1)\) and average optimality \((\alpha =1,\ \liminf)\) are defined as \(N\to \infty\). A strategy \(x^*\) is algorithmically optimal if there is a sequence \(\{N_m\}\) such that the optimal \(N_m\)-stage strategies \(\{x^*(N_m)\}\) converge to \(x^*\) in the topology induced by a specified metric \(p(\cdot,\cdot)\).
    The paper studies algorithmic optimality and its relationship to other optimality notions; various assumptions are introduced, and each result holds under a subset of them. Theorem 1 establishes the existence of algorithmically optimal strategies. Theorem 2 shows that algorithmic optimality implies discount optimality, and Theorem 5 that it implies average optimality. Theorem 6 shows that an algorithmically optimal strategy is unique if and only if \(\{x^*(N)\}\) converges to \(x^*\), in the topology induced by the metric \(p(\cdot,\cdot)\), for every choice of \(\{x^*(N)\}\). Finally, using relative values, solution-horizon extensions of earlier work are given in terms of algorithmic optimality, and an algorithm is presented that finds, in a finite number of iterations, the first policy \(x^*_0\) of an algorithmically optimal strategy \(x^*\).
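    The convergence behind algorithmic optimality can be illustrated numerically: solve the optimal \(N\)-stage problem by backward induction for increasing \(N\) and watch the first-stage policy stabilise. The Python sketch below is only an illustration under invented assumptions (a state space truncated to four states standing in for a denumerable one, randomly generated stage-dependent data \(P_k(a)\) and \(R_k(a)\), and \(\alpha = 0.9\)); it is not the algorithm of the paper.

```python
# Minimal numerical sketch (not the paper's algorithm): finite-horizon backward
# induction for a nonhomogeneous MDP, used to watch the first-stage policy of
# the optimal N-stage strategy stabilise as the horizon N grows.
import numpy as np

S = 4          # truncated state space {0, ..., S-1} (stands in for a denumerable set)
A = 3          # number of actions available in every state at every stage
alpha = 0.9    # discount factor, 0 <= alpha < 1

def P(k, a):
    """Hypothetical stage-k transition matrix under action a (rows sum to 1)."""
    rng = np.random.default_rng(1000 * k + a)   # deterministic but stage-dependent
    M = rng.random((S, S)) + np.eye(S)
    return M / M.sum(axis=1, keepdims=True)

def R(k, a):
    """Hypothetical stage-k immediate reward vector under action a."""
    rng = np.random.default_rng(2000 * k + a)
    return rng.random(S)

def optimal_N_stage_strategy(N):
    """Backward induction: returns the optimal policies x_0, ..., x_{N-1}."""
    V = np.zeros(S)                      # terminal value V_N = 0
    policies = [None] * N
    for k in range(N - 1, -1, -1):       # k = N-1, ..., 0
        Q = np.stack([R(k, a) + alpha * P(k, a) @ V for a in range(A)])  # shape (A, S)
        policies[k] = Q.argmax(axis=0)   # best action in each state at stage k
        V = Q.max(axis=0)
    return policies

# Track the first policy x_0*(N) of the optimal N-stage strategy as N grows.
for N in range(1, 13):
    x0 = optimal_N_stage_strategy(N)[0]
    print(f"N = {N:2d}  first-stage policy = {x0}")
```

    Once the printed first-stage policy stops changing over successive horizons, it is a numerical candidate for the initial policy \(x^*_0\) of an algorithmically optimal strategy in this toy instance.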
    denumerable number of states
    infinite number of decision stages
    finite set of actions
    infinite horizon strategies
    algorithmic optimality
    algorithmically optimal strategies
    solution horizon extensions