Hierarchical algorithms for discounted and weighted Markov decision processes (Q1416788)
Property / full work available at URL: https://doi.org/10.1007/s001860300290
Property / OpenAlex ID: W2030072125
scientific article
Language | Label | Description | Also known as
---|---|---|---
English | Hierarchical algorithms for discounted and weighted Markov decision processes | scientific article |
Statements
Hierarchical algorithms for discounted and weighted Markov decision processes (English)
16 December 2003
In a recent paper, \textit{M. Abbad} and \textit{H. Boustique} [Oper. Res. Lett. 31, No. 6, 473--476 (2003; Zbl 1052.90097)] described a state decomposition and an algorithm for average-reward Markov decision processes (MDPs) with finite state and action sets. The advantage of this algorithm is that, for a multichain problem, it solves a sequence of smaller problems. In fact, a version of this decomposition and of this algorithm was described in 1973 by \textit{J. Bather} [Adv. Appl. Probab. 5, 541--553 (1973; Zbl 0275.90050)], a paper not mentioned by Abbad and Boustique (loc. cit.). The paper under review applies the same decomposition to compute optimal policies for discounted MDPs. It also describes an algorithm that finds an ultimately stationary \(\epsilon\)-optimal policy for a criterion that is a linear combination of the average reward per unit time and the discounted reward. Although the authors assume that the average and discounted rewards are computed for the same reward function, this assumption is not essential.
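To make the weighted criterion concrete: in the standard form used in the weighted-reward literature (the notation \(\lambda\), \(\beta\), and the convex-combination form are the reviewer's assumptions, not formulas quoted from the paper), for a policy \(\pi\) and initial state \(s\),
\[
W_\lambda(\pi,s) \;=\; \lambda\,\liminf_{T\to\infty}\frac{1}{T}\,\mathbb{E}^{\pi}_{s}\!\left[\sum_{t=0}^{T-1} r(s_t,a_t)\right] \;+\; (1-\lambda)\,\mathbb{E}^{\pi}_{s}\!\left[\sum_{t=0}^{\infty}\beta^{t}\, r(s_t,a_t)\right],\qquad \lambda\in[0,1],\ \beta\in(0,1),
\]
with the same reward function \(r\) appearing in both terms, in line with the remark above that this coincidence is not essential.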
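The discounted part also lends itself to a small illustration. The sketch below is the reviewer's own and is not the algorithm of the paper; the data structures and all names (`solve_discounted_by_components`, `P`, `r`, `beta`) are invented for the example. It shows one simple way a discounted MDP can be solved as a sequence of smaller problems: split the one-step transition graph into strongly connected components and run value iteration on one component at a time, sink components first, so that each pass only touches a smaller subproblem whose downstream values are already final.

```python
def _sccs(nodes, succ):
    """Kosaraju's algorithm; returns components, upstream components first."""
    order, seen = [], set()
    for s0 in nodes:                       # first pass: record finish-time order
        if s0 in seen:
            continue
        seen.add(s0)
        stack = [(s0, iter(succ[s0]))]
        while stack:
            v, it = stack[-1]
            advanced = False
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(succ[w])))
                    advanced = True
                    break
            if not advanced:
                order.append(v)
                stack.pop()
    pred = {s: set() for s in nodes}       # transpose graph
    for s in nodes:
        for t in succ[s]:
            pred[t].add(s)
    comps, comp_of = [], {}
    for s0 in reversed(order):             # second pass on the transpose
        if s0 in comp_of:
            continue
        comps.append([])
        stack, comp_of[s0] = [s0], len(comps) - 1
        while stack:
            v = stack.pop()
            comps[-1].append(v)
            for w in pred[v]:
                if w not in comp_of:
                    comp_of[w] = len(comps) - 1
                    stack.append(w)
    return comps

def solve_discounted_by_components(states, actions, P, r, beta, tol=1e-10):
    """P[s, a] = list of (next_state, prob); r[s, a] = one-step reward."""
    succ = {s: {t for a in actions[s] for t, p in P[s, a] if p > 0}
            for s in states}
    V = {s: 0.0 for s in states}
    for C in reversed(_sccs(states, succ)):    # sink components first
        while True:                            # value iteration on C only;
            delta = 0.0                        # states outside C stay frozen
            for s in C:
                best = max(r[s, a] + beta * sum(p * V[t] for t, p in P[s, a])
                           for a in actions[s])
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < tol:
                break
    policy = {s: max(actions[s], key=lambda a: r[s, a]
                     + beta * sum(p * V[t] for t, p in P[s, a]))
              for s in states}
    return V, policy

# Tiny usage example: two transient states that may "drop" into an
# absorbing state; the absorbing component is solved first.
states = ["u1", "u2", "d"]
actions = {"u1": ["stay", "drop"], "u2": ["stay"], "d": ["stay"]}
P = {("u1", "stay"): [("u2", 1.0)], ("u1", "drop"): [("d", 1.0)],
     ("u2", "stay"): [("u1", 1.0)], ("d", "stay"): [("d", 1.0)]}
r = {("u1", "stay"): 1.0, ("u1", "drop"): 0.0,
     ("u2", "stay"): 1.0, ("d", "stay"): 2.0}
V, pi = solve_discounted_by_components(states, actions, P, r, beta=0.9)
print(pi)   # expect {'u1': 'drop', 'u2': 'stay', 'd': 'stay'}
```

This ordering is exact for the discounted criterion because the value of a state depends only on the values of states reachable from it; it captures only the flavour of "solving a sequence of smaller problems", whereas the decompositions of Bather and of Abbad and Boustique are defined for the multichain average-reward setting.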
Markov decision process
discounted rewards
weighted criterion
decomposition