An expected average reward criterion (Q1115360)

From MaRDI portal
scientific article

    Statements

    An expected average reward criterion (English)
    1987
    Consider a Markovian decision model with finite state space and a fixed initial state \(i\). Let \[ \bar U(i,\pi)=E_{i\pi}\Bigl[\overline{\lim}_{n\to\infty}\frac{1}{n}\sum^{n-1}_{t=0}r(X_t,A_t)\Bigr]\quad\text{and}\quad \bar U(i)=\sup_{\pi\in\Delta}\bar U(i,\pi), \] where \(r(X_t,A_t)\) is a bounded random reward received when action \(A_t\) is chosen in state \(X_t\) at time \(t\), and \(\Delta\) is the set of all policies \(\pi\). The criteria \(\underline{U}(i,\pi)\) and \(\underline{U}(i)\) are defined analogously with \(\underline{\lim}\) in place of \(\overline{\lim}\). The paper shows that all results for the criteria \[ \bar V(i,\pi)=\overline{\lim}_{n\to\infty}E_{i\pi}\Bigl[\frac{1}{n}\sum^{n-1}_{t=0}r(X_t,A_t)\Bigr] \] and \(\underline{V}(i,\pi)\) can be carried over to the new criteria \(\bar U\) and \(\underline{U}\) without the assumption of a special gain function as in \textit{S. Demko} and \textit{T. P. Hill} [ibid. 17, 349-357 (1984; Zbl 0537.90095)]. There exists a deterministic Markov policy which is (strongly) \(\epsilon\)-optimal for any one of the four criteria. Under some conditions, including that the set of actions in state \(i\) is a compact metric space, for any \(\epsilon>0\) there exists a deterministic stationary policy which is (strongly) \(\epsilon\)-optimal for each criterion. The author also proves that \(\underline{U}=\bar U=\underline{V}=\bar V\).
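    The difference between the two families of criteria is the order of expectation and limit: \(\bar U\) takes the expectation of the pathwise \(\overline{\lim}\) of the running average rewards, whereas \(\bar V\) takes the \(\overline{\lim}\) of the expected running average. The following Python sketch illustrates the distinction numerically; the two-state chain, its transition matrix P, the reward vector r, and the tail-maximum approximation of the \(\overline{\lim}\) are hypothetical illustration choices, not taken from the paper.

```python
import numpy as np

# Hypothetical two-state chain under a fixed stationary policy
# (numbers chosen for illustration only, not taken from the paper).
P = np.array([[0.9, 0.1],   # transition probabilities out of state 0
              [0.2, 0.8]])  # transition probabilities out of state 1
r = np.array([1.0, 0.0])    # bounded reward received in each state

rng = np.random.default_rng(0)

def running_averages(start, horizon):
    """Simulate one trajectory and return the running averages (1/n) * sum_{t<n} r(X_t)."""
    x, total, avgs = start, 0.0, []
    for n in range(1, horizon + 1):
        total += r[x]
        avgs.append(total / n)
        x = rng.choice(2, p=P[x])
    return np.array(avgs)

horizon, episodes, start = 4000, 200, 0
paths = np.array([running_averages(start, horizon) for _ in range(episodes)])
tail = paths[:, horizon // 2:]          # late portion of each trajectory

# U-type criterion: expectation of the pathwise limsup of running averages,
# crudely approximated by the maximum over the tail of each path, then averaged.
u_bar = tail.max(axis=1).mean()

# V-type criterion: limsup of the expected running average,
# approximated by the maximum over the tail of the mean path.
v_bar = paths.mean(axis=0)[horizon // 2:].max()

print(f"U-type estimate: {u_bar:.3f}")
print(f"V-type estimate: {v_bar:.3f}")
```

    In this ergodic example both estimates settle near the stationary average reward, which is consistent with (though of course does not prove) the coincidence of the optimal values \(\underline{U}=\bar U=\underline{V}=\bar V\) established in the paper.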
    expected average reward
    strong \(\epsilon\)-optimal decision
    finite state space
    deterministic stationary policy