On the properties of \(\epsilon\) (\(\geq 0\)) optimal policies in discounted unbounded return model (Q581259)

This paper investigates the properties of \(\epsilon\)-optimal policies (\(\epsilon \geq 0\)) in the discounted unbounded return model of \textit{Guo Shizhen} [Math. Economics 1, 109-120 (1984) (Chinese)]. It is shown that, if \(\pi^*=(\pi^*_0, \pi^*_1, \ldots, \pi^*_n, \pi^*_{n+1}, \ldots)\) is a \(\beta\)-discounted optimal policy, then \((\pi^*_0, \pi^*_1, \ldots, \pi^*_n)^{\infty}\) is also a \(\beta\)-discounted optimal policy for all \(n\geq 0\). Under some conditions we prove that the stochastic stationary policy \(\pi_n^{*\infty}\) corresponding to the decision rule \(\pi^*_n\) is also optimal for the same discount factor \(\beta\). We also show that, for each \(\beta\)-optimal stochastic stationary policy \(\pi_0^{*\infty}\), the decision rule \(\pi^*_0\) can be decomposed into several decision rules whose corresponding stationary policies are each \(\beta\)-optimal; conversely, a proper convex combination of these decision rules coincides with the original \(\pi^*_0\). We further prove that, for any \((\epsilon,\beta)\)-optimal policy \(\pi^*=(\pi^*_0, \pi^*_1, \ldots, \pi^*_n, \pi^*_{n+1}, \ldots)\), the policy \((\pi^*_0, \pi^*_1, \ldots, \pi^*_{n-1})^{\infty}\) is \(((1-\beta^n)^{-1}\epsilon,\beta)\)-optimal for every \(n>0\). At the end of the paper we note that the results on convex combinations and decompositions of optimal policies given by \textit{Luo Handong}, \textit{Liu Jiwei} and \textit{Xia Zhihao} [J. Huazhong (Central China) Univ. of Sci. and Technol. 14, No.4 (1986)] can be extended to our case.
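
As a hedged illustration of where the factor \((1-\beta^n)^{-1}\) in the last bound can come from, the following sketch may help; it is not taken from the paper. The symbols \(V\), \(V^*\), \(\sigma\), \(r_\sigma\), \(P_\sigma\) and \(T_\sigma\) are notation assumed for this note only, \(\epsilon\) is treated as a constant, \(0<\beta<1\), and the limit in the final step is assumed to exist and to equal the return of \(\sigma^{\infty}\), which in an unbounded return model requires additional conditions. Let \(V(\pi)\) be the \(\beta\)-discounted expected return of \(\pi\), \(V^*=\sup_\pi V(\pi)\), \(\sigma=(\pi^*_0,\ldots,\pi^*_{n-1})\), and \(T_\sigma u = r_\sigma+\beta^n P_\sigma u\), where \(r_\sigma\) is the expected discounted return over the first \(n\) stages under \(\sigma\) and \(P_\sigma\) is the corresponding \(n\)-step transition operator. Since \(\pi^*\) is \((\epsilon,\beta)\)-optimal and the return of the shifted policy is at most \(V^*\),
\[
V^*-\epsilon \;\leq\; V(\pi^*) \;=\; r_\sigma+\beta^n P_\sigma V(\pi^*_n,\pi^*_{n+1},\ldots) \;\leq\; T_\sigma V^* .
\]
Because \(T_\sigma\) is monotone and \(T_\sigma(u-c)=T_\sigma u-\beta^n c\) for a constant \(c\), induction on \(k\) gives
\[
T_\sigma^{k} V^* \;\geq\; V^*-\epsilon\sum_{j=0}^{k-1}\beta^{jn},
\]
and letting \(k\to\infty\) under the stated convergence assumption yields
\[
V(\sigma^{\infty}) \;=\; \lim_{k\to\infty} T_\sigma^{k} V^* \;\geq\; V^*-\frac{\epsilon}{1-\beta^{n}} .
\]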

Mathematics Subject Classification ID

90C40

zbMATH DE Number

4018805

zbMATH Keywords

\(\epsilon \)-optimal policy

\(\beta \)-discounted optimal policy
