Hierarchical algorithms for discounted and weighted Markov decision processes (Q1416788)
Property / full work available at URL: https://doi.org/10.1007/s001860300290
Property / OpenAlex ID: W2030072125
scientific article
Language | Label | Description | Also known as
---|---|---|---
English | Hierarchical algorithms for discounted and weighted Markov decision processes | scientific article |
Statements
Hierarchical algorithms for discounted and weighted Markov decision processes (English)
16 December 2003
In a recent paper, \textit{M. Abbad} and \textit{H. Boustique} [Oper. Res. Lett. 31, No. 6, 473--476 (2003; Zbl 1052.90097)] described a state decomposition and an algorithm for average-reward Markov decision processes (MDPs) with finite state and action sets. The advantage of this algorithm is that, for a multichain problem, it solves a sequence of smaller problems. In fact, a version of this decomposition and of this algorithm was described in 1973 by \textit{J. Bather} [Adv. Appl. Probab. 5, 541--553 (1973; Zbl 0275.90050)], a paper not mentioned by Abbad and Boustique (loc. cit.). The paper under review applies the same decomposition to compute optimal policies for discounted MDPs. It also describes an algorithm that finds an ultimately stationary \(\epsilon\)-optimal policy for a criterion that is a linear combination of the average reward per unit time and the discounted reward. Although the authors assume that the average and discounted rewards are computed for the same reward function, this assumption is not essential.
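To make the weighted criterion concrete: in the standard form used in the weighted-reward literature (the notation \(\lambda\), \(\beta\), and the convex-combination form are the reviewer's assumptions, not formulas quoted from the paper), for a policy \(\pi\) and initial state \(s\),
\[
W_\lambda(\pi,s) \;=\; \lambda\,\liminf_{T\to\infty}\frac{1}{T}\,\mathbb{E}^{\pi}_{s}\!\left[\sum_{t=0}^{T-1} r(s_t,a_t)\right] \;+\; (1-\lambda)\,\mathbb{E}^{\pi}_{s}\!\left[\sum_{t=0}^{\infty}\beta^{t}\, r(s_t,a_t)\right],\qquad \lambda\in[0,1],\ \beta\in(0,1),
\]
with the same reward function \(r\) appearing in both terms, in line with the remark above that this coincidence is not essential.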
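The discounted part also lends itself to a small illustration. The sketch below is the reviewer's own and is not the algorithm of the paper; the data structures and all names (`solve_discounted_by_components`, `P`, `r`, `beta`) are invented for the example. It shows one simple way a discounted MDP can be solved as a sequence of smaller problems: split the one-step transition graph into strongly connected components and run value iteration on one component at a time, sink components first, so that each pass only touches a smaller subproblem whose downstream values are already final.

```python
def _sccs(nodes, succ):
    """Kosaraju's algorithm; returns components, upstream components first."""
    order, seen = [], set()
    for s0 in nodes:                       # first pass: record finish-time order
        if s0 in seen:
            continue
        seen.add(s0)
        stack = [(s0, iter(succ[s0]))]
        while stack:
            v, it = stack[-1]
            advanced = False
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(succ[w])))
                    advanced = True
                    break
            if not advanced:
                order.append(v)
                stack.pop()
    pred = {s: set() for s in nodes}       # transpose graph
    for s in nodes:
        for t in succ[s]:
            pred[t].add(s)
    comps, comp_of = [], {}
    for s0 in reversed(order):             # second pass on the transpose
        if s0 in comp_of:
            continue
        comps.append([])
        stack, comp_of[s0] = [s0], len(comps) - 1
        while stack:
            v = stack.pop()
            comps[-1].append(v)
            for w in pred[v]:
                if w not in comp_of:
                    comp_of[w] = len(comps) - 1
                    stack.append(w)
    return comps

def solve_discounted_by_components(states, actions, P, r, beta, tol=1e-10):
    """P[s, a] = list of (next_state, prob); r[s, a] = one-step reward."""
    succ = {s: {t for a in actions[s] for t, p in P[s, a] if p > 0}
            for s in states}
    V = {s: 0.0 for s in states}
    for C in reversed(_sccs(states, succ)):    # sink components first
        while True:                            # value iteration on C only;
            delta = 0.0                        # states outside C stay frozen
            for s in C:
                best = max(r[s, a] + beta * sum(p * V[t] for t, p in P[s, a])
                           for a in actions[s])
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < tol:
                break
    policy = {s: max(actions[s], key=lambda a: r[s, a]
                     + beta * sum(p * V[t] for t, p in P[s, a]))
              for s in states}
    return V, policy

# Tiny usage example: two transient states that may "drop" into an
# absorbing state; the absorbing component is solved first.
states = ["u1", "u2", "d"]
actions = {"u1": ["stay", "drop"], "u2": ["stay"], "d": ["stay"]}
P = {("u1", "stay"): [("u2", 1.0)], ("u1", "drop"): [("d", 1.0)],
     ("u2", "stay"): [("u1", 1.0)], ("d", "stay"): [("d", 1.0)]}
r = {("u1", "stay"): 1.0, ("u1", "drop"): 0.0,
     ("u2", "stay"): 1.0, ("d", "stay"): 2.0}
V, pi = solve_discounted_by_components(states, actions, P, r, beta=0.9)
print(pi)   # expect {'u1': 'drop', 'u2': 'stay', 'd': 'stay'}
```

This ordering is exact for the discounted criterion because the value of a state depends only on the values of states reachable from it; it captures only the flavour of "solving a sequence of smaller problems", whereas the decompositions of Bather and of Abbad and Boustique are defined for the multichain average-reward setting.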
Markov decision process
discounted rewards
weighted criterion
decomposition