On the properties of \(\epsilon\) (\(\geq 0\)) optimal policies in discounted unbounded return model (Q581259)
From MaRDI portal
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | On the properties of \(\epsilon\) (\(\geq 0\)) optimal policies in discounted unbounded return model | scientific article | |
Statements
On the properties of \(\epsilon\) (\(\geq 0\)) optimal policies in discounted unbounded return model (English)
1987
This paper investigates the properties of \(\epsilon\) (\(\geq 0\)) optimal policies in the model of \textit{Guo Shizhen} [Math. Economics 1, 109-120 (1984) (Chinese)]. It is shown that if \(\pi^*=(\pi_0^*, \pi_1^*, \dots, \pi_n^*, \pi_{n+1}^*, \dots)\) is a \(\beta\)-discounted optimal policy, then for every \(n\geq 0\) the periodic policy \((\pi_0^*, \pi_1^*, \dots, \pi_n^*)^{\infty}\), which repeats the first \(n+1\) decision rules cyclically, is also \(\beta\)-discounted optimal. Under some conditions we prove that the stochastic stationary policy \(\pi_n^{*\infty}\) corresponding to the decision rule \(\pi_n^*\) is also optimal for the same discount factor \(\beta\). We also show that for each \(\beta\)-optimal stochastic stationary policy \(\pi_0^{*\infty}\), the decision rule \(\pi_0^*\) can be decomposed into several decision rules whose corresponding stationary policies are each \(\beta\)-optimal. Conversely, a proper convex combination of these decision rules coincides with \(\pi_0^*\). We further prove that for any \((\epsilon,\beta)\)-optimal policy \(\pi^*=(\pi_0^*, \pi_1^*, \dots, \pi_n^*, \pi_{n+1}^*, \dots)\), the periodic policy \((\pi_0^*, \pi_1^*, \dots, \pi_{n-1}^*)^{\infty}\) is \(((1-\beta^n)^{-1}\epsilon,\beta)\)-optimal for \(n>0\). At the end of the paper we note that the results on convex combinations and decompositions of optimal policies given by \textit{Luo Handong}, \textit{Liu Jiwei} and \textit{Xia Zhihao} [J. Huazhong (Central China) Univ. of Sci. and Technol. 14, No. 4 (1986)] extend to our case.
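The \(((1-\beta^n)^{-1}\epsilon,\beta)\) bound for periodic policies can be checked numerically on a toy model. The Python sketch below is an illustration under stated assumptions, not the authors' construction. It makes the following assumptions:

- The state and action spaces are finite and the rewards bounded. This is only a special case of the unbounded-return model.
- \(V^*\) is computed by value iteration.
- The \(\epsilon\)-optimal policy follows hypothetical decision rules \(d_0, \dots, d_{n-1}\) and then an optimal stationary policy. Its value is therefore \(T_{d_0}\cdots T_{d_{n-1}}V^*\), where \(T_d\) is the usual one-step operator of rule \(d\).
- The periodic policy \((d_0, \dots, d_{n-1})^{\infty}\) is evaluated as the fixed point of \(v = T_{d_0}\cdots T_{d_{n-1}}v\).

All function names and the two-state example are hypothetical.

```python
from typing import List

# Hypothetical finite MDP encoding (an assumption: the reviewed model allows
# unbounded returns, while this sketch only covers finite states/actions):
#   r[s][a]    reward for action a in state s
#   p[s][a][t] probability of moving from s to t under action a
#   rule[s][a] probability that a (stochastic) decision rule picks a in s
Vector = List[float]
Rule = List[List[float]]


def apply_rule(v: Vector, rule: Rule, r, p, beta: float) -> Vector:
    """(T_d v)(s) = sum_a d(a|s) * (r(s,a) + beta * sum_t p(t|s,a) v(t))."""
    n = len(v)
    out = []
    for s in range(n):
        total = 0.0
        for a, prob in enumerate(rule[s]):
            expected_next = sum(p[s][a][t] * v[t] for t in range(n))
            total += prob * (r[s][a] + beta * expected_next)
        out.append(total)
    return out


def optimal_value(r, p, beta: float, tol: float = 1e-12) -> Vector:
    """Approximate V* by value iteration (stop when successive iterates agree to tol)."""
    n = len(r)
    v = [0.0] * n
    while True:
        new = [max(r[s][a] + beta * sum(p[s][a][t] * v[t] for t in range(n))
                   for a in range(len(r[s])))
               for s in range(n)]
        if max(abs(x - y) for x, y in zip(new, v)) < tol:
            return new
        v = new


def periodic_value(rules: List[Rule], r, p, beta: float, tol: float = 1e-12) -> Vector:
    """Value of (d_0, ..., d_{n-1})^infinity: fixed point of v = T_{d_0} ... T_{d_{n-1}} v."""
    v = [0.0] * len(r)
    while True:
        new = v
        for rule in reversed(rules):  # T_{d_{n-1}} is applied first, T_{d_0} last
            new = apply_rule(new, rule, r, p, beta)
        if max(abs(x - y) for x, y in zip(new, v)) < tol:
            return new
        v = new


def check_bound(rules: List[Rule], r, p, beta: float):
    """Return (eps, gap, bound) for the hypothetical policy 'follow rules, then act optimally'."""
    v_star = optimal_value(r, p, beta)
    # Value of pi* = (d_0, ..., d_{n-1}, optimal stationary tail) is T_{d_0} ... T_{d_{n-1}} V*.
    v_pi = v_star
    for rule in reversed(rules):
        v_pi = apply_rule(v_pi, rule, r, p, beta)
    eps = max(a - b for a, b in zip(v_star, v_pi))
    # Worst-case shortfall of the periodic policy built from the same n rules.
    v_sigma = periodic_value(rules, r, p, beta)
    gap = max(a - b for a, b in zip(v_star, v_sigma))
    bound = eps / (1.0 - beta ** len(rules))
    return eps, gap, bound


if __name__ == "__main__":
    # Hypothetical two-state model: action a moves deterministically to state a.
    r = [[1.0, 0.0], [0.0, 2.0]]
    p = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
    stay = [[1.0, 0.0], [0.0, 1.0]]  # keep the current state
    eps, gap, bound = check_bound([stay, stay], r, p, beta=0.9)
    print(f"eps={eps:.4f} gap={gap:.4f} bound={bound:.4f}")  # prints eps=1.5200 gap=8.0000 bound=8.0000
```

In this finite setting the bound follows from one inequality. Let \(\sigma\) be the periodic policy, \(R_n\) the expected \(n\)-step reward, and \(P_n\) the \(n\)-step transition matrix under \(d_0, \dots, d_{n-1}\). Then \(V^*-\epsilon \le V_{\pi^*} \le R_n + \beta^n P_n V^*\) and \(V_\sigma = R_n + \beta^n P_n V_\sigma\). Subtracting gives \(V^*-V_\sigma \le \epsilon + \beta^n P_n (V^*-V_\sigma)\), so \(\sup(V^*-V_\sigma) \le (1-\beta^n)^{-1}\epsilon\).

In the toy run, \(n=2\) and \(\beta=0.9\), so \(\epsilon = 1.52\). The periodic policy's gap equals \(1.52/0.19 = 8\), so the bound is attained in this example. Passing a single optimal rule (\(\epsilon=0\)) reproduces the exact-optimality case.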
\(\epsilon \)-optimal policy
\(\beta \)-discounted optimal policy
stochastic stationary policy
convex combinations
decompositions