An expected average reward criterion (Q1115360)
Language | Label | Description | Also known as |
---|---|---|---|
English | An expected average reward criterion | scientific article | |
Statements
An expected average reward criterion (English)
1987
Consider a Markovian decision model with finite state space and a fixed initial state \(i\). Let \[ \bar U(i,\pi)=E_{i\pi}\Bigl[\overline{\lim}_{n\to\infty}\frac{1}{n}\sum^{n-1}_{t=0}r(X_t,A_t)\Bigr] \quad\text{and}\quad \bar U(i)=\sup_{\pi\in\Delta}\bar U(i,\pi), \] where \(r(X_t,A_t)\) is a bounded random reward received when action \(A_t\) is chosen in state \(X_t\) at time \(t\), and \(\Delta\) is the set of all policies \(\pi\). The criteria \(\underline U(i,\pi)\) and \(\underline U(i)\) are defined analogously, with \(\underline{\lim}\) in place of \(\overline{\lim}\). The paper shows that all results for the criteria \[ \bar V(i,\pi)=\overline{\lim}_{n\to\infty}E_{i\pi}\Bigl[\frac{1}{n}\sum^{n-1}_{t=0}r(X_t,A_t)\Bigr] \] and \(\underline V(i,\pi)\) carry over to the new criteria \(\bar U\) and \(\underline U\) without the assumption of a special gain function as in \textit{S. Demko} and \textit{T. P. Hill} [ibid. 17, 349-357 (1984; Zbl 0537.90095)]. There exists a deterministic Markov policy which is (strongly) \(\epsilon\)-optimal under any one of the four criteria. Under some conditions, including that the set of actions available in each state is a compact metric space, for any \(\epsilon>0\) there exists a deterministic stationary policy which is (strongly) \(\epsilon\)-optimal under each criterion. The author also proves that \(\underline U=\bar U=\underline V=\bar V\).
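The only difference between the \(\bar U\)- and \(\bar V\)-criteria is the order of the expectation \(E_{i\pi}\) and \(\overline{\lim}\). A minimal Monte Carlo sketch (not from the paper) makes this concrete for a toy two-state chain under one fixed stationary policy; the transition matrix `P`, the reward `r`, and the tail-maximum approximation of \(\overline{\lim}\) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed stationary policy folded into the model: transition matrix P[s, s']
# and (deterministic) reward r[s] collected in state s. Toy values, assumed.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])

def running_averages(start, n_steps):
    """One sample path; returns (1/n) * sum_{t<n} r(X_t) for n = 1..n_steps."""
    rewards = np.empty(n_steps)
    s = start
    for t in range(n_steps):
        rewards[t] = r[s]
        s = rng.choice(2, p=P[s])
    return np.cumsum(rewards) / np.arange(1, n_steps + 1)

n_paths, n_steps, i, tail = 200, 5000, 0, 1000
avgs = np.array([running_averages(i, n_steps) for _ in range(n_paths)])

# U-criterion E_{i,pi}[limsup (1/n) sum r]: limsup crudely approximated by
# the maximum of each path's running average over the last `tail` steps,
# then averaged across paths (expectation last).
U_bar = avgs[:, -tail:].max(axis=1).mean()

# V-criterion limsup E_{i,pi}[(1/n) sum r]: expectation first (across paths),
# then the same tail-maximum approximation of limsup.
V_bar = avgs.mean(axis=0)[-tail:].max()

print(f"bar-U(i,pi) ~ {U_bar:.4f}   bar-V(i,pi) ~ {V_bar:.4f}")
```

For this ergodic toy chain the running averages converge almost surely, so both estimates come out close to the stationary mean reward (2/3 here). In general, reverse Fatou gives \(\bar U(i,\pi)\ge\bar V(i,\pi)\) for bounded rewards, and the review's final equality \(\underline U=\bar U=\underline V=\bar V\) concerns the optimal values, i.e. after taking the supremum over all policies.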
expected average reward
strong \(\epsilon\)-optimal decision
finite state space
deterministic stationary policy