Non-homogeneous Markov decision processes with a constraint (Q1378679)
From MaRDI portal
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | Non-homogeneous Markov decision processes with a constraint | scientific article | |
Statements
Non-homogeneous Markov decision processes with a constraint (English)
20 April 1999
This paper considers a non-homogeneous Markov decision process with finite state and action sets and two reward functions. The goal is to maximize the average reward per unit time for one function, subject to the constraint that the total discounted reward for the other function equals a given value. For unconstrained problems, \textit{W. Hopp, J. Bean} and \textit{R. Smith} [Oper. Res. 35, No. 6, 875-883 (1987; Zbl 0651.90090)] introduced the notion of periodic forecast horizon optimality, which was later studied by \textit{J. Bean, R. Smith} and \textit{J. Lasserre} [Math. Oper. Res. 9, 391-401 (1990)] under the name algorithmic optimality. A policy \(\pi\) is algorithmic optimal if there is a sequence of optimal \(N\)-horizon policies that converges to \(\pi\) as \(N\to\infty\). In this paper, the author defines algorithmic optimal policies for problems with the described constraint and provides conditions under which these policies are average optimal among the policies satisfying the constraint.
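The idea of a sequence of optimal \(N\)-horizon policies can be illustrated with a minimal sketch. It covers only the unconstrained case with an undiscounted total reward over \(N\) periods. It does not implement the paper's constrained criterion. The function names, the data layout (`rewards[t][s][a]`, `transitions[t][s][a][s2]`) and the toy problem in `example_data` are assumptions made for illustration and do not come from the reviewed article.

```python
def optimal_n_horizon_policy(rewards, transitions):
    """Backward induction for a finite, non-homogeneous MDP (unconstrained sketch).

    rewards[t][s][a]         : reward in period t for action a in state s
    transitions[t][s][a][s2] : probability of moving from s to s2 in period t under a
    Returns (policy, value): policy[t][s] is a maximizing action in period t,
    value[s] is the optimal total reward over the horizon when starting in s.
    """
    horizon = len(rewards)
    n_states = len(rewards[0])
    value = [0.0] * n_states  # terminal value after the last period
    policy = [None] * horizon
    for t in range(horizon - 1, -1, -1):
        new_value, decisions = [], []
        for s in range(n_states):
            # q[a] = immediate reward + expected value of the next state
            q = [
                rewards[t][s][a]
                + sum(p * v for p, v in zip(transitions[t][s][a], value))
                for a in range(len(rewards[t][s]))
            ]
            best = max(range(len(q)), key=q.__getitem__)  # first maximizer on ties
            decisions.append(best)
            new_value.append(q[best])
        policy[t] = decisions
        value = new_value
    return policy, value


def example_data(horizon):
    """Hypothetical two-state, two-action problem whose rewards shrink over time."""
    rewards, transitions = [], []
    for t in range(horizon):
        scale = 1.0 / (t + 1)
        rewards.append([[1.0 * scale, 0.0], [0.0, 2.0 * scale]])
        # In either state, action 0 leads to state 0 and action 1 leads to state 1.
        transitions.append([
            [[1.0, 0.0], [0.0, 1.0]],
            [[1.0, 0.0], [0.0, 1.0]],
        ])
    return rewards, transitions


def first_period_decisions(max_horizon):
    """First-period decision rule of an optimal N-horizon policy, for N = 1..max_horizon."""
    return [
        optimal_n_horizon_policy(*example_data(n))[0][0]
        for n in range(1, max_horizon + 1)
    ]
```

Calling `first_period_decisions(10)` lists the first-period decision rule for each horizon length. A decision rule that stops changing as \(N\) grows is the finite-data counterpart of the convergence required in the definition of algorithmic optimality.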
constrained Markov decision process
algorithmic optimality
average optimality