Hierarchical algorithms for discounted and weighted Markov decision processes (Q1416788)

From MaRDI portal
scientific article
Language: English
Label: Hierarchical algorithms for discounted and weighted Markov decision processes
Description: scientific article

    Statements

    Hierarchical algorithms for discounted and weighted Markov decision processes (English)
    16 December 2003
    In a recent paper, \textit{M. Abbad} and \textit{H. Boustique} [Oper. Res. Lett. 31, No. 6, 473--476 (2003; Zbl 1052.90097)] described a state decomposition and an algorithm for average-reward Markov decision processes (MDPs) with finite state and action sets. The advantage of this algorithm is that, for a multichain problem, it solves a sequence of smaller problems. In fact, a version of this decomposition and algorithm was already described in 1973 by \textit{J. Bather} [Adv. Appl. Probab. 5, 541--553 (1973; Zbl 0275.90050)], a paper not mentioned by Abbad and Boustique (loc. cit.). The paper under review applies the same decomposition to compute optimal policies for discounted MDPs. It also describes an algorithm that finds an ultimately stationary \(\epsilon\)-optimal policy for a criterion that is a linear combination of the average reward per unit time and the discounted reward. Although the authors assume that the average and discounted rewards are computed from the same reward function, this assumption is not essential.
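    The discounted criterion discussed above can be illustrated with plain value iteration on a toy finite MDP. The following is only a minimal sketch: the transition matrix \(P\), the rewards \(r\), and the discount factor are made-up illustrative data, and it deliberately does not reproduce the hierarchical decomposition algorithm of the paper, only the underlying discounted optimality equation it targets.

```python
import numpy as np

# Toy discounted MDP (illustrative data, not from the paper):
# P[a, s, s'] = transition probability under action a,
# r[a, s]     = immediate reward for taking action a in state s.
P = np.array([
    [[0.9, 0.1],    # action 0
     [0.2, 0.8]],
    [[0.5, 0.5],    # action 1
     [0.6, 0.4]],
])
r = np.array([
    [1.0, 0.0],     # action 0
    [0.5, 2.0],     # action 1
])
beta = 0.9          # discount factor

def value_iteration(P, r, beta, tol=1e-10):
    """Iterate the Bellman operator v <- max_a (r_a + beta * P_a v)."""
    n = P.shape[1]
    v = np.zeros(n)
    while True:
        q = r + beta * (P @ v)        # q[a, s]: one-step lookahead values
        v_new = q.max(axis=0)         # optimal value per state
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=0)  # values and a greedy policy
        v = v_new

v, policy = value_iteration(P, r, beta)
print("optimal values:", v)
print("greedy policy:", policy)
```

    Since the Bellman operator is a \(\beta\)-contraction, the loop converges geometrically; the decomposition approach of the paper solves a sequence of smaller problems of this kind instead of one monolithic one.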
    Markov decision process
    discounted rewards
    weighted criterion
    decomposition

    Identifiers