Strong 1-optimal stationary policies in denumerable Markov decision processes (Q1108940)
scientific article
1988
Consider a Markov decision process with countable state space \(S\), compact action sets and bounded rewards. Let \(V_{\alpha}(\pi,i)\) denote the expected \(\alpha\)-discounted reward under policy \(\pi\), starting in state \(i\). A policy \(\pi^*\) is called a strong 1-optimal policy (SOP) if, for each \(i\in S\), \(\lim_{\alpha \to 1}[V_{\alpha}(\pi^*,i)-\sup_{\pi}V_{\alpha}(\pi,i)]=0\). Under a standard set of assumptions (including the simultaneous Doeblin condition) guaranteeing the existence of a stationary average optimal policy, the author proves that (i) a stationary SOP exists, and (ii) any limit point, as \(\alpha \to 1\), of stationary \(\alpha\)-discounted optimal policies is a (stationary) SOP.
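The results are existence statements and the paper gives no algorithm; purely as an illustrative sketch (not taken from the source), the Python snippet below checks the defining limit of strong 1-optimality on a small hypothetical two-state, two-action MDP. The transition probabilities, rewards, and the choice of \(\alpha=0.999\) for selecting a candidate \(\pi^*\) are invented for this example; a finite model stands in for the countable-state setting of the paper.

```python
import itertools
import numpy as np

# Hypothetical toy MDP (illustration only, not from the paper):
# states S = {0, 1}, two actions per state, bounded rewards.
# P[a][i, j] = transition probability under action a, r[a][i] = one-step reward.
S, A = 2, 2
P = np.array([
    [[0.9, 0.1],    # action 0
     [0.2, 0.8]],
    [[0.5, 0.5],    # action 1
     [0.7, 0.3]],
])
r = np.array([
    [1.0, 0.0],     # rewards for action 0 in states 0, 1
    [0.8, 0.5],     # rewards for action 1 in states 0, 1
])

def value(policy, alpha):
    """Exact alpha-discounted value of a stationary deterministic policy:
    V = (I - alpha * P_pi)^{-1} r_pi."""
    P_pi = np.array([P[policy[i], i] for i in range(S)])
    r_pi = np.array([r[policy[i], i] for i in range(S)])
    return np.linalg.solve(np.eye(S) - alpha * P_pi, r_pi)

def optimal_value(alpha):
    """sup_pi V_alpha(pi, .): for a finite MDP it suffices to enumerate the
    stationary deterministic policies."""
    return np.max([value(pi, alpha) for pi in itertools.product(range(A), repeat=S)], axis=0)

# Candidate pi*: stands in for a limit point of alpha-discount-optimal policies
# as alpha -> 1 (here simply the discount-optimal policy at alpha = 0.999).
alpha_ref = 0.999
policies = list(itertools.product(range(A), repeat=S))
pi_star = max(policies, key=lambda pi: value(pi, alpha_ref).sum())

# Strong 1-optimality: V_alpha(pi*, i) - sup_pi V_alpha(pi, i) -> 0 as alpha -> 1.
for alpha in [0.9, 0.99, 0.999, 0.9999]:
    gap = value(pi_star, alpha) - optimal_value(alpha)
    print(f"alpha = {alpha}: gap = {gap}")
```

For this toy model the gap is nonpositive and tends to 0 as \(\alpha \to 1\), which is exactly the statewise limit in the definition of an SOP above.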
Markov decision process
countable state space
compact action sets
bounded rewards
\(\alpha\)-discounted reward
strong 1-optimal policy
simultaneous Doeblin condition
stationary average optimal policy