An expected average reward criterion (Q1115360)

From MaRDI portal
scientific article

    Statements

    An expected average reward criterion (English)
    1987
    Consider a Markovian decision model with finite state space and a fixed initial state \(i\). Let \[ \bar U(i,\pi)=E_{i\pi}\Bigl[\overline{\lim}_{n\to\infty}\frac{1}{n}\sum^{n-1}_{t=0}r(X_t,A_t)\Bigr]\quad\text{and}\quad \bar U(i)=\sup_{\pi\in\Delta}\bar U(i,\pi), \] where \(r(X_t,A_t)\) is a bounded random reward received when action \(A_t\) is chosen in state \(X_t\) at time \(t\), and \(\Delta\) is the set of all policies \(\pi\). The criteria \(\underline{U}(i,\pi)\) and \(\underline{U}(i)\) are defined analogously with \(\underline{\lim}\) in place of \(\overline{\lim}\). The paper shows that all results for the criteria \[ \bar V(i,\pi)=\overline{\lim}_{n\to\infty}E_{i\pi}\Bigl[\frac{1}{n}\sum^{n-1}_{t=0}r(X_t,A_t)\Bigr] \] and \(\underline{V}(i,\pi)\) can be carried over to the new criteria \(\bar U\) and \(\underline{U}\) without the assumption of a special gain function as in \textit{S. Demko} and \textit{T. P. Hill} [ibid. 17, 349-357 (1984; Zbl 0537.90095)]. There exists a deterministic Markov policy which is (strongly) \(\epsilon\)-optimal for any one of the four criteria. Under some conditions, including that the set of actions in state \(i\) is a compact metric space, for any \(\epsilon>0\) there exists a deterministic stationary policy which is (strongly) \(\epsilon\)-optimal for each criterion. The author also proves that \(\underline{U}=\bar U=\underline{V}=\bar V\).
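    The difference between the two families of criteria is the order of expectation and limit: \(\bar U\) takes the expectation of the pathwise \(\overline{\lim}\) of the running average rewards, whereas \(\bar V\) takes the \(\overline{\lim}\) of the expected running average. The following Python sketch illustrates the distinction numerically; the two-state chain, its transition matrix P, the reward vector r, and the tail-maximum approximation of the \(\overline{\lim}\) are hypothetical illustration choices, not taken from the paper.

```python
import numpy as np

# Hypothetical two-state chain under a fixed stationary policy
# (numbers chosen for illustration only, not taken from the paper).
P = np.array([[0.9, 0.1],   # transition probabilities out of state 0
              [0.2, 0.8]])  # transition probabilities out of state 1
r = np.array([1.0, 0.0])    # bounded reward received in each state

rng = np.random.default_rng(0)

def running_averages(start, horizon):
    """Simulate one trajectory and return the running averages (1/n) * sum_{t<n} r(X_t)."""
    x, total, avgs = start, 0.0, []
    for n in range(1, horizon + 1):
        total += r[x]
        avgs.append(total / n)
        x = rng.choice(2, p=P[x])
    return np.array(avgs)

horizon, episodes, start = 4000, 200, 0
paths = np.array([running_averages(start, horizon) for _ in range(episodes)])
tail = paths[:, horizon // 2:]          # late portion of each trajectory

# U-type criterion: expectation of the pathwise limsup of running averages,
# crudely approximated by the maximum over the tail of each path, then averaged.
u_bar = tail.max(axis=1).mean()

# V-type criterion: limsup of the expected running average,
# approximated by the maximum over the tail of the mean path.
v_bar = paths.mean(axis=0)[horizon // 2:].max()

print(f"U-type estimate: {u_bar:.3f}")
print(f"V-type estimate: {v_bar:.3f}")
```

    In this ergodic example both estimates settle near the stationary average reward, which is consistent with (though of course does not prove) the coincidence of the optimal values \(\underline{U}=\bar U=\underline{V}=\bar V\) established in the paper.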
    expected average reward
    strong \(\epsilon\)-optimal decision
    finite state space
    deterministic stationary policy