Non-homogeneous Markov decision processes with a constraint (Q1378679)

From MaRDI portal
scientific article

    Statements

    Non-homogeneous Markov decision processes with a constraint (English)
    20 April 1999
    This paper considers a non-homogeneous Markov decision process with finite state and action sets and two reward functions. The goal is to maximize the average reward per unit time for one function subject to the constraint that the total discounted reward for the other function equals a given value. For unconstrained problems, \textit{W. Hopp, J. Bean} and \textit{R. Smith} [Oper. Res. 35, No. 6, 875-883 (1987; Zbl 0651.90090)] introduced the notion of periodic forecast horizon optimality, which was later studied by \textit{J. Bean, R. Smith} and \textit{J. Lasserre} [Math. Oper. Res. 9, 391-401 (1990)] under the name algorithmic optimality. A policy \(\pi\) is algorithmic optimal if there is a sequence of optimal \(N\)-horizon policies that converges to \(\pi\) as \(N\to\infty\). In this paper, the author defines algorithmic optimal policies for problems with the described constraint and provides conditions under which these policies are average optimal among policies satisfying this constraint.
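    As a rough sketch, the constrained problem described above can be written as follows. The notation is assumed here for illustration and is not taken from the paper: \(r_n\) and \(c_n\) denote the two reward functions at epoch \(n\), \(\beta\in(0,1)\) a discount factor, \(C\) the prescribed value, and \(x_0\) the initial state. The use of \(\liminf\) for the average reward is likewise an assumption.
    \[
    \max_{\pi}\ \liminf_{N\to\infty}\frac{1}{N}\,\mathbb{E}^{\pi}_{x_0}\Big[\sum_{n=0}^{N-1} r_n(x_n,a_n)\Big]
    \quad\text{subject to}\quad
    \mathbb{E}^{\pi}_{x_0}\Big[\sum_{n=0}^{\infty}\beta^{n}\,c_n(x_n,a_n)\Big]=C.
    \]
    In the same assumed notation, \(\pi\) is algorithmic optimal if \(\pi^{N}\to\pi\) as \(N\to\infty\) for some sequence \((\pi^{N})\) in which each \(\pi^{N}\) is optimal for the \(N\)-horizon problem. The review does not specify the mode of convergence.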
    constrained Markov decision process
    algorithmic optimality
    average optimality

    Identifiers