Hierarchical algorithms for discounted and weighted Markov decision processes (Q1416788)

From MaRDI portal
scientific article
Language: English
Label: Hierarchical algorithms for discounted and weighted Markov decision processes
Description: scientific article

    Statements

    Hierarchical algorithms for discounted and weighted Markov decision processes (English)
    16 December 2003
    In a recent paper, \textit{M. Abbad} and \textit{H. Boustique} [Oper. Res. Lett. 31, No. 6, 473--476 (2003; Zbl 1052.90097)] described a state decomposition and an algorithm for average-reward Markov decision processes (MDPs) with finite state and action sets. The advantage of this algorithm is that, for a multichain problem, it solves a sequence of smaller problems. In fact, a version of this decomposition and algorithm was already described in 1973 by \textit{J. Bather} [Adv. Appl. Probab. 5, 541--553 (1973; Zbl 0275.90050)], a paper not mentioned by Abbad and Boustique (loc. cit.). The paper under review applies the same decomposition to compute optimal policies for discounted MDPs. It also describes an algorithm that finds an ultimately stationary \(\epsilon\)-optimal policy for a criterion that is a linear combination of the average reward per unit time and the discounted reward. Although the authors assume that the average and discounted rewards are computed from the same reward function, this assumption is not essential.
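    The discounted criterion discussed above can be illustrated with plain value iteration on a toy finite MDP. The following is only a minimal sketch: the transition matrix \(P\), the rewards \(r\), and the discount factor are made-up illustrative data, and it deliberately does not reproduce the hierarchical decomposition algorithm of the paper, only the underlying discounted optimality equation it targets.

```python
import numpy as np

# Toy discounted MDP (illustrative data, not from the paper):
# P[a, s, s'] = transition probability under action a,
# r[a, s]     = immediate reward for taking action a in state s.
P = np.array([
    [[0.9, 0.1],    # action 0
     [0.2, 0.8]],
    [[0.5, 0.5],    # action 1
     [0.6, 0.4]],
])
r = np.array([
    [1.0, 0.0],     # action 0
    [0.5, 2.0],     # action 1
])
beta = 0.9          # discount factor

def value_iteration(P, r, beta, tol=1e-10):
    """Iterate the Bellman operator v <- max_a (r_a + beta * P_a v)."""
    n = P.shape[1]
    v = np.zeros(n)
    while True:
        q = r + beta * (P @ v)        # q[a, s]: one-step lookahead values
        v_new = q.max(axis=0)         # optimal value per state
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=0)  # values and a greedy policy
        v = v_new

v, policy = value_iteration(P, r, beta)
print("optimal values:", v)
print("greedy policy:", policy)
```

    Since the Bellman operator is a \(\beta\)-contraction, the loop converges geometrically; the decomposition approach of the paper solves a sequence of smaller problems of this kind instead of one monolithic one.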
    Markov decision process
    discounted rewards
    weighted criterion
    decomposition

    Identifiers