Average cost Markov decision processes: Optimality conditions (Q1176301)

scientific article

Language	Label	Description	Also known as
English	Average cost Markov decision processes: Optimality conditions	scientific article

Statements

instance of

scholarly article

0 references

title

Average cost Markov decision processes: Optimality conditions (English)

0 references

published in

Journal of Mathematical Analysis and Applications

0 references

publication date

25 June 1992

0 references

review text

The authors consider discrete-time Markov decision processes under the long-run expected average cost criterion. Both the state space \(X\) and the action set \(A\) are Borel sets (i.e. Borel subsets of complete separable metric spaces), and for each state \(x\in X\) the set \(A(x)\) of admissible actions in state \(x\) is a nonempty compact measurable subset of \(A\). The transition law \(q(\cdot\mid\cdot,\cdot)\) is a stochastic kernel on \(X\) given \(X\times A\) such that \(\int_X v(y)\,q(dy\mid x,a)\) is lower semi-continuous in \(a\in A(x)\) for each \(x\in X\) and every bounded measurable function \(v\) on \(X\). The one-stage cost function \(c\) is bounded and measurable on \(X\times A\) and lower semi-continuous in \(a\in A(x)\) for each \(x\in X\). The authors give ergodicity conditions on the transition law \(q\) under which a duality theorem holds: the existence of an optimal solution to the primal problem (equivalently, a solution to the optimality equation for the Markov decision model with the average cost criterion) yields an optimal solution to the dual problem or the deterministic version, and conversely; furthermore, the corresponding optimal values of the problems are equal. This result extends those of \textit{K. Yamada} [J. Math. Anal. Appl. 50, 579-595 (1975; Zbl 0323.90053)] and \textit{J. A. Filar} and \textit{T. A. Schultz} [Oper. Res. Lett. 7, 303-307 (1988; Zbl 0659.90095)] to the model with general Borel spaces. Also, using the concept of opportunity cost introduced by \textit{J. Flynn} [J. Math. Anal. Appl. 76, 202-208 (1980; Zbl 0438.90100); ibid. 144, 586-594 (1989; Zbl 0679.90084)], they show that a stationary policy determined from the optimality equation is strong average optimal.
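
For orientation, the optimality equation referred to in the review is usually written as follows for a model of this kind. The symbols \(\rho^*\) (a constant, interpreted as the optimal average cost), \(h\) (a bounded measurable function on \(X\)) and \(f\) (a measurable selector) are notation assumed here and do not appear in the review; this is the standard textbook form, not necessarily the exact statement used in the paper.

\[
\rho^* + h(x) \;=\; \min_{a\in A(x)} \Bigl[\, c(x,a) + \int_X h(y)\,q(dy\mid x,a) \Bigr], \qquad x\in X.
\]

Under this reading, a stationary policy determined from the optimality equation is one that, in each state \(x\), uses an action \(f(x)\in A(x)\) attaining the minimum on the right-hand side.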

0 references

zbMATH Keywords

duality theorem

0 references

strong average optimality

0 references

long run expected average cost criterion

0 references

ergodicity conditions

0 references

opportunity cost

0 references