Denumerable state nonhomogeneous Markov decision processes (Q805501)

The framework for this paper is: a denumerable number of states, \(\{\) \(i\}\) ; an infinite number of decision stages, \(\{\) \(k\}\) ; a finite set of actions, \(\{x^ i_ k\}\) for each i and k, and a set of infinite horizon strategies \(\{\) \(x\}\), whose k-th member is a policy k; a set of transition probability matrices \(\{P_ k(x_ k)\}\); a set of immediate reward vectors \(\{R_ k(x_ k)\}\); a discount factor \(0\leq \alpha \leq 1\); a set of expected discounted reward vectors, \(\{V_ 0(x,N)\}\) covering the first N decision stages if strategy x (a finite strategy, x(N), truncated to the first N decision stages) is used. Discount \((\alpha <1)\) and average \((\alpha =1\), lim.inf) optimality are defined for \(N\to \infty\). A strategy, \(x^*\), is algorithmically optimal, if there is a sequence \(\{N_ m\}\) such that \(\{x^*(N_ m)\}\) (i.e. optimal \(N_ m\)-stage strategies) converge to \(x^*\) in the topology induced by a specified metric p(.,.). The paper deals with algorithmic optimality and its relationship with other constructs. Various assumptions are made and each result is governed by some of these assumptions. Theorem 1 demonstrates the existence of algorithmically optimal strategies. Theorem 2 demonstrates that algorithmic optimality implies discount optimality. Theorem 5 demonstrates that algorithmic optimality implies average optimality. Theorem 6 demonstrates that an algorithmically optimal strategy is unique if and only if \(\{x^*(N)\}\) converges to \(x^*\), in the topology induced by the p(.,.) metric, for all choices of \(\{x^*(N)\}.\) Finally, using relative values, solution horizon extensions of some earlier work are given, in terms of algorithmic optimality, and an algorithm is given for finding, in a finite number of iterations, a first policy, \(x^*_ 0\), of an algorithmically optimal strategy \(x^*\).

0 references

reviewed by

Douglas J. White

0 references

zbMATH Keywords

denumerable number of states

0 references

infinite number of decision stages

0 references

finite set of actions

0 references

infinite horizon strategies