Strong 1-optimal stationary policies in denumerable Markov decision processes (Q1108940)

From MaRDI portal
scientific article

    Statements

    Strong 1-optimal stationary policies in denumerable Markov decision processes (English)
    1988
    Consider a Markov decision process with countable state space \(S\), compact action sets and bounded rewards. Let \(V_{\alpha}(\pi,i)\) denote the expected \(\alpha\)-discounted reward under policy \(\pi\), starting in state \(i\). A policy \(\pi^*\) is called a strong 1-optimal policy (SOP) if, for each \(i\in S\), \(\lim_{\alpha \to 1}[V_{\alpha}(\pi^*,i)-\sup_{\pi}V_{\alpha}(\pi,i)]=0\). Under a standard set of assumptions (including the simultaneous Doeblin condition) for the existence of a stationary average optimal policy, the author proves that (i) a stationary SOP exists, and (ii) any limit point, as \(\alpha \to 1\), of stationary \(\alpha\)-discounted optimal policies is a (stationary) SOP.
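    The SOP condition can be checked numerically on a small finite example. The sketch below is an illustration only and is not taken from the article: the two-state MDP, its rewards, and the names `P`, `R`, `solve`, `discounted_value`, `optimal_value` and `gap` are all hypothetical. It assumes a finite MDP, where the supremum over all policies is attained by a stationary deterministic policy, so the optimal value can be found by enumeration. Both policies in the example earn the same long-run average reward (0.5), yet only one of them has a gap that vanishes as \(\alpha \to 1\).

```python
from itertools import product

# Hypothetical toy MDP (not from the article): 2 states, finite action sets.
# P[s][a] is the transition distribution over states, R[s][a] the one-step reward.
P = {0: {"a": [0.0, 1.0], "b": [0.0, 1.0]}, 1: {"stay": [0.0, 1.0]}}
R = {0: {"a": 1.0, "b": 0.0}, 1: {"stay": 0.5}}
STATES = sorted(P)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def discounted_value(policy, alpha):
    """V_alpha(policy, .) obtained from (I - alpha * P_policy) V = r_policy."""
    n = len(STATES)
    A = [[(1.0 if i == j else 0.0) - alpha * P[i][policy[i]][j]
          for j in range(n)] for i in range(n)]
    b = [R[i][policy[i]] for i in range(n)]
    return solve(A, b)


def optimal_value(alpha):
    """sup over policies, taken over all stationary deterministic policies."""
    policies = [dict(zip(STATES, acts))
                for acts in product(*(P[s] for s in STATES))]
    values = [discounted_value(pi, alpha) for pi in policies]
    return [max(v[i] for v in values) for i in range(len(STATES))]


def gap(policy, alpha):
    """V_alpha(policy, i) - sup_pi V_alpha(pi, i) for each state i."""
    v, v_star = discounted_value(policy, alpha), optimal_value(alpha)
    return [v[i] - v_star[i] for i in range(len(STATES))]


if __name__ == "__main__":
    for alpha in (0.9, 0.99, 0.999):
        print(alpha,
              gap({0: "a", 1: "stay"}, alpha),
              gap({0: "b", 1: "stay"}, alpha))
```

    In this toy case the discounted values themselves grow without bound as \(\alpha \to 1\), but the gap for the policy choosing "a" in state 0 stays at zero, while the gap for the policy choosing "b" stays near -1 in state 0. The second policy is therefore average optimal without being strong 1-optimal.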
    Markov decision process
    countable state space
    compact action sets
    bounded rewards
    \(\alpha\)-discounted reward
    strong 1-optimal policy
    simultaneous Doeblin condition
    stationary average optimal policy

    Identifiers