On the properties of \(\epsilon\) (\(\geq 0)\) optimal policies in discounted unbounded return model (Q581259)

From MaRDI portal
scientific article

    Statements

    1987
    This paper investigates the properties of \(\epsilon\)-optimal policies (\(\epsilon \geq 0\)) in the model of \textit{Guo Shizhen} [Math. Economics 1, 109-120 (1984) (Chinese)]. It is shown that if \(\pi^* = (\pi_0^*, \pi_1^*, \ldots, \pi_n^*, \pi_{n+1}^*, \ldots)\) is a \(\beta\)-discounted optimal policy, then \((\pi_0^*, \pi_1^*, \ldots, \pi_n^*)^{\infty}\) is also a \(\beta\)-discounted optimal policy for every \(n \geq 0\). Under some conditions we prove that the stochastic stationary policy \(\pi_n^{*\infty}\) corresponding to the decision rule \(\pi_n^*\) is also optimal for the same discount factor \(\beta\). We have also shown that each \(\beta\)-optimal stochastic stationary policy \(\pi_0^{*\infty}\) can be decomposed into several decision rules whose corresponding stationary policies are each \(\beta\)-optimal; conversely, a proper convex combination of these decision rules is identified with the original \(\pi_0^*\). We have further proved that for any \((\epsilon, \beta)\)-optimal policy \(\pi^* = (\pi_0^*, \pi_1^*, \ldots, \pi_n^*, \pi_{n+1}^*, \ldots)\), the policy \((\pi_0^*, \pi_1^*, \ldots, \pi_{n-1}^*)^{\infty}\) is \(((1-\beta^n)^{-1}\epsilon, \beta)\)-optimal for every \(n > 0\). At the end of the paper we mention that the results on convex combinations and decompositions of optimal policies given by \textit{Luo Handong}, \textit{Liu Jiwei} and \textit{Xia Zhihao} [J. Huazhong (Central China) Univ. of Sci. and Technol. 14, No.4 (1986)] can be extended to our case.
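    The factor \((1-\beta^n)^{-1}\) in the last bound can be motivated by a standard monotonicity argument. What follows is only a sketch under assumptions that are not taken from the paper: \(0 \leq \beta < 1\), \(\pi^*\) is a Markov policy, \(V(\pi)\) denotes the \(\beta\)-discounted expected return of \(\pi\), \(V^* = \sup_\pi V(\pi)\) is finite, \(T_f\) is the one-step return operator of a decision rule \(f\), and every limit and interchange used is justified (automatic for bounded returns; in the unbounded case this presumably requires conditions such as those imposed in the paper). The symbols \(V\), \(V^*\), \(T_f\), \(T\) and \(\sigma\) are introduced here purely for illustration.

    \[
    \begin{aligned}
    T &:= T_{\pi_0^*} T_{\pi_1^*} \cdots T_{\pi_{n-1}^*}, \qquad T(u - c) = Tu - \beta^n c \ \text{ for constants } c, \qquad u \leq v \ \Rightarrow\ Tu \leq Tv, \\
    V^* - \epsilon &\leq V(\pi^*) = T\, V\bigl((\pi_n^*, \pi_{n+1}^*, \ldots)\bigr) \leq T V^*, \\
    T^k V^* &\geq V^* - \epsilon \sum_{j=0}^{k-1} \beta^{nj} \qquad (k \geq 1,\ \text{by induction on } k), \\
    V(\sigma) &= \lim_{k \to \infty} T^k V^* \;\geq\; V^* - \frac{\epsilon}{1-\beta^n}, \qquad \sigma := (\pi_0^*, \ldots, \pi_{n-1}^*)^{\infty}.
    \end{aligned}
    \]

    The induction step applies \(T\) to the previous inequality and uses the two properties in the first line: \(T^{k+1} V^* \geq T V^* - \beta^n \epsilon \sum_{j=0}^{k-1} \beta^{nj} \geq V^* - \epsilon \sum_{j=0}^{k} \beta^{nj}\). For \(n = 1\) the bound reduces to the familiar estimate that the stationary policy \(\pi_0^{*\infty}\) is \(((1-\beta)^{-1}\epsilon, \beta)\)-optimal.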
    \(\epsilon \)-optimal policy
    \(\beta \)-discounted optimal policy
    stochastic stationary policy
    convex combinations
    decompositions