Markov control process with the expected total cost criterion: Optimality, stability, and transient models (Q1973302)

The authors study discrete-time Markov Control Processes (MCPs) on Borel spaces under the Expected Total Cost (ETC) criterion \[ V(\pi, x)= E^\pi_x\Biggl[ \sum^\infty_{t= 0} c(x_t,a_t)\Biggr], \] where \(c(x_t, a_t)\) is the cost-per-stage function, which may be unbounded [for the basic concepts and notation of MCPs, cf. \textit{O. Hernández-Lerma} and \textit{J. B. Lasserre}, Discrete-time Markov control processes: Basic optimality criteria, Springer-Verlag, New York (1995; Zbl 0840.93001)]. Many optimality questions are answered affirmatively: the paper gives conditions for a control policy to be ETC-optimal and conditions for the ETC-value function to be a solution of the dynamic programming equation. It is also shown that finiteness of the ETC function may lead to two kinds of stability: Lagrange stability and stability with probability one. In addition, transient control models [cf. \textit{S. R. Pliska}, Dynamic programming and its applications, Proc. Int. Conf., Vancouver 1977, 335-349 (1978; Zbl 0458.90082)] are fully analyzed. Together with the authors' new results, the paper thus provides a fairly complete, updated, survey-like presentation of the ETC criterion for MCPs.
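
As a small numerical illustration of the ETC criterion and the dynamic programming equation (a hypothetical toy model, not an example from the paper), the sketch below uses a finite model with nonnegative costs in which state 2 is absorbing and cost-free, so the expected total cost of the policies considered stays finite. Writing \(Q(y \mid x, a)\) for the transition probabilities, it approximates the optimal ETC value by value iteration started from \(V_0 = 0\), which for finite state and action sets with nonnegative costs converges to the optimal value. It then extracts a stationary policy \(f\) attaining the minimum in \(V^*(x) = \min_a [c(x,a) + \sum_y Q(y \mid x,a) V^*(y)]\) and evaluates \(V(f, x)\) through partial sums of the expected costs. The names `COST`, `TRANS`, `q_value` and the numbers in the model are illustrative assumptions.

```python
# Hypothetical finite transient MCP (not taken from the paper):
# state 2 is absorbing and cost-free, costs are nonnegative.
STATES = [0, 1, 2]
ACTIONS = [0, 1]

# COST[x][a] = c(x, a) >= 0
COST = [
    [1.0, 3.0],
    [1.0, 2.5],
    [0.0, 0.0],
]

# TRANS[x][a][y] = Q(y | x, a), probability of moving from x to y under a
TRANS = [
    [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]],
    [[0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
]


def q_value(V, x, a):
    """c(x, a) + sum_y Q(y | x, a) V(y): the bracket in the DP equation."""
    return COST[x][a] + sum(p * V[y] for y, p in enumerate(TRANS[x][a]))


def value_iteration(tol=1e-12, max_iter=10_000):
    """Iterate V_{n+1}(x) = min_a q_value(V_n, x, a) starting from V_0 = 0."""
    V = [0.0 for _ in STATES]
    for _ in range(max_iter):
        V_new = [min(q_value(V, x, a) for a in ACTIONS) for x in STATES]
        if max(abs(u - v) for u, v in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


def greedy_policy(V):
    """Stationary policy f(x) attaining the minimum (first action on ties)."""
    return [min(ACTIONS, key=lambda a: q_value(V, x, a)) for x in STATES]


def evaluate_policy(f, tol=1e-12, max_iter=10_000):
    """V(f, x) as the limit of the n-step sums E_x^f[sum_{t<n} c(x_t, f(x_t))]."""
    V = [0.0 for _ in STATES]
    for _ in range(max_iter):
        V_new = [q_value(V, x, f[x]) for x in STATES]
        if max(abs(u - v) for u, v in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


if __name__ == "__main__":
    V_star = value_iteration()
    f = greedy_policy(V_star)
    # Residual of the DP equation at the computed value function.
    residual = max(
        abs(V_star[x] - min(q_value(V_star, x, a) for a in ACTIONS))
        for x in STATES
    )
    print("V* approx:", V_star)
    print("policy f:", f)
    print("V(f, .) approx:", evaluate_policy(f))
    print("DP residual:", residual)
```

For this toy model the computed values are approximately \(V^* = (3, 2, 0)\) and the minimizing policy is `f = [1, 0, 0]`; evaluating that policy reproduces \(V^*\), which is the finite-model analogue of a policy attaining the minimum in the dynamic programming equation being ETC-optimal.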

Reviewed by Wu Chengxun

zbMATH Keywords

policy iteration

discrete-time Markov control processes

expected total cost

dynamic programming