Hierarchical algorithms for discounted and weighted Markov decision processes (Q1416788): Difference between revisions

In the recent paper, \textit{M. Abbad} and \textit{H. Boustique} [Oper. Res. Lett. 31, No. 6, 473--476 (2003; Zbl 1052.90097)] described a state decomposition and an algorithm for an average-reward Markov Decision Process (MDP) with finite state and action sets. The advantage of this algorithm is that, for a multichain problem, it solves a sequence of smaller problems. In fact, a version of this decomposition and this algorithm were described in 1973 in the paper by \textit{J. Bather} [Adv. Appl. Probab. 5, 541--553 (1973; Zbl 0275.90050)] which is not mentioned in Abbad and Boustique (loc. cit.). The current paper applies the same decomposition to compute optimal policies for discounted MDPs. It also describes an algorithm to find an ultimately stationary \(\epsilon\)-optimal policy for a criterion that is a linear combination of average rewards per unit time and discounted rewards. Though the authors assume that the average and discounted rewards are calculated for the same reward function, this assumption is not essential.

0 references

reviewed by

Eugene A. Feinberg

0 references

zbMATH Keywords

Markov decision process

0 references

discounted rewards

0 references

weighted criterion

0 references

decomposition

0 references

MaRDI profile type

MaRDI publication profile

0 references

Identifiers

zbMATH Open document ID

1069.90106

0 references

DOI

10.1007/s001860300290

0 references

Mathematics Subject Classification ID

0 references

0 references

0 references

Sitelinks

Mathematics(1 entry)

mardi Publication:1416788

Revision as of 17:05, 31 January 2024 Import240129110113 (talk \| contribs) Bots 7,163,963 edits Added link to MaRDI item. ← Older edit	Revision as of 03:16, 5 March 2024 Import240304020342 (talk \| contribs) 4,416,906 edits Set profile property. Newer edit →
	Property / MaRDI profile type
		MaRDI publication profile
	Property / MaRDI profile type: MaRDI publication profile / rank
		Normal rank