Semi-infinite Markov decision processes (Q1974026)
scientific article
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Semi-infinite Markov decision processes | scientific article | |
Statements
Semi-infinite Markov decision processes (English)
7 April 2002
This paper studies Markov decision processes (MDPs) with finite state sets and countable action sets, under the two major optimality criteria: total discounted rewards and average rewards per unit time. For the total discounted criterion, the authors develop a linear programming approach: they formulate a semi-infinite linear program for the MDP and show that it can be approximated by finite linear programs associated with finite state and action MDPs, which are truncated versions of the original MDPs. The authors also investigate when the supremum of the objective criterion over the class of deterministic (or, in another terminology, stationary) policies equals the supremum over the class of all policies.

Referee's remarks: (i) Theorem 3.1, which states that the supremum of the total discounted rewards over all policies is equal to the supremum over deterministic policies, is a particular case of Theorem 5(b) in \textit{D. Blackwell} [Ann. Math. Stat. 36, 226-235 (1965; Zbl 0133.42805)], where the finiteness of the state space was not assumed. (ii) Theorem 6.1, which states that in a communicating MDP the supremum of the average reward per unit time over all policies is equal to the supremum over randomized stationary policies, is a particular case of Corollary 4.5(C) in \textit{E. Feinberg} and \textit{H. Park} [Stochastic Processes Appl. 49, 159-177 (1994; Zbl 0787.60085)], where it was proved for the smaller class of deterministic policies and without the additional Assumption 6.1.
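As a rough illustration of the truncation idea discussed in the review (not the authors' linear programming construction), the sketch below approximates a two-state MDP with countably many actions by solving, for increasing k, the finite MDP that keeps only the first k actions. Plain value iteration stands in for the finite linear programs; the reward and transition data are hypothetical, chosen so that the per-action rewards increase toward an unattained supremum.

```python
def value_iteration(states, actions, r, p, beta, tol=1e-10):
    """Solve a finite discounted MDP by value iteration (stand-in for the LP)."""
    v = {s: 0.0 for s in states}
    while True:
        v_new = {s: max(r(s, a) + beta * sum(p(s, a, s2) * v[s2] for s2 in states)
                        for a in actions)
                 for s in states}
        if max(abs(v_new[s] - v[s]) for s in states) < tol:
            return v_new
        v = v_new

states = [0, 1]
beta = 0.9                                 # discount factor (hypothetical)
def r(s, a):                               # rewards increase in a toward a sup
    return (1.0 if s == 0 else 0.5) * (1 - 1 / a)
def p(s, a, s2):                           # uniform transitions (hypothetical)
    return 0.5

# Optimal discounted value at state 0 for the truncated action sets {1,...,k}:
vals = [value_iteration(states, range(1, k + 1), r, p, beta)[0]
        for k in (1, 2, 5, 50)]
# the truncated values are nondecreasing in k and approach
# the value of the original countable-action MDP from below
```

The monotone convergence of the truncated values toward the semi-infinite optimum is the behavior the paper's finite-LP approximation result formalizes.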
semi-infinite Markov decision processes
optimal strategy
semi-infinite linear program