Strong 1-optimal stationary policies in denumerable Markov decision processes (Q1108940)

scientific article; zbMATH DE number 4068651

      Statements

      Strong 1-optimal stationary policies in denumerable Markov decision processes (English)
      1988
      Consider a Markov decision process with countable state space \(S\), compact action sets and bounded rewards. Let \(V_{\alpha}(\pi,i)\) denote the expected \(\alpha\)-discounted reward under policy \(\pi\), starting in state \(i\). A policy \(\pi^*\) is called a strong 1-optimal policy (SOP) if, for each \(i\in S\), \(\lim_{\alpha \to 1}[V_{\alpha}(\pi^*,i)-\sup_{\pi}V_{\alpha}(\pi,i)]=0\). Under a standard set of assumptions (including the simultaneous Doeblin condition) for the existence of a stationary average optimal policy, the author proves that (i) a stationary SOP exists, and (ii) any limit point, as \(\alpha \to 1\), of stationary \(\alpha\)-discounted optimal policies is a (stationary) SOP.
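      A minimal numerical sketch (not from the paper) of the criterion above: it builds a small finite MDP with invented transition probabilities and rewards, computes \(V_{\alpha}\) for every deterministic stationary policy by solving \((I-\alpha P_{\pi})V=r_{\pi}\), and prints the per-state gap \(V_{\alpha}(\pi^*,i)-\sup_{\pi}V_{\alpha}(\pi,i)\) as \(\alpha\to 1\). The names P, r, V and pi_star are illustrative only, and the finite state space is used solely so that the supremum can be taken exactly; the paper's setting is a countably infinite state space.

      import numpy as np
      from itertools import product

      # Two states, finitely many actions; P[s][a] is the transition row out of
      # state s under action a, r[s][a] the one-step reward (numbers invented).
      P = {0: {0: np.array([0.9, 0.1]), 1: np.array([0.2, 0.8])},
           1: {0: np.array([0.5, 0.5])}}
      r = {0: {0: 1.0, 1: 2.0}, 1: {0: 0.5}}
      policies = [dict(zip((0, 1), acts)) for acts in product(P[0], P[1])]

      def V(pi, alpha):
          # Expected alpha-discounted reward of the deterministic stationary
          # policy pi, one entry per starting state: V = (I - alpha*P_pi)^{-1} r_pi.
          P_pi = np.vstack([P[s][pi[s]] for s in (0, 1)])
          r_pi = np.array([r[s][pi[s]] for s in (0, 1)])
          return np.linalg.solve(np.eye(2) - alpha * P_pi, r_pi)

      # Stand-in for a limit point of alpha-discounted optimal policies: the
      # policy that is best (summed over starting states) at alpha close to 1.
      pi_star = max(policies, key=lambda pi: V(pi, 0.999).sum())

      for alpha in (0.9, 0.99, 0.999, 0.9999):
          sup_V = np.max([V(pi, alpha) for pi in policies], axis=0)
          print(alpha, V(pi_star, alpha) - sup_V)  # per-state gap; tends to 0

      In this finite toy example the gap becomes exactly zero once \(\alpha\) is close enough to 1; the content of the paper is that, under the stated assumptions, the gap still vanishes in the limit when the state space is countably infinite.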
      Markov decision process
      countable state space
      compact action sets
      bounded rewards
      \(\alpha\)-discounted reward
      strong 1-optimal policy
      simultaneous Doeblin condition
      stationary average optimal policy

      Identifiers