Separable value functions for infinite horizon average reward Markov decision processes (Q908861)

From MaRDI portal
scientific article

    Statements

    Separable value functions for infinite horizon average reward Markov decision processes (English)
    1989
    Consider the following decision problem:
    (a) The state and control spaces are of the form \(S=\times_{j=1}^{p}S_j\) and \(C=\times_{j=1}^{p}C_j\), where \(S_j\subset\mathbb{R}^{n_j}\) and \(C_j\subset\mathbb{R}^{m_j}\) are finite sets for some integers \(n_j\), \(m_j\), \(1\leq j\leq p\).
    (b) For a given state \(x=(x_1,\dots,x_p)\), \(x_j\in S_j\), the action set is \(A(x)=\times_{j=1}^{p}w_j(x_j)\) with \(w_j(x_j)\subset C_j\).
    (c) For each \(i\in\{1,2,\dots,p\}\) there is a set \(I_i\subset\{1,2,\dots,p\}\) such that \(I_i\cap I_j=\emptyset\) for \(i\neq j\) and \(\bigcup_{i=1}^{p}I_i=\{1,2,\dots,p\}\).
    (d) The reward in the current period is \(r(x,y)=\sum_{j=1}^{p}r_j(x_j,y_j)\) for \(x_j\in S_j\), \(y_j\in w_j(x_j)\).
    (e) Given the state \(x\in S\) and the action \(y\in A(x)\), the state in the next period is determined by a random variable \(D\) with values in a finite set \(\mathcal{D}\) and a function \(g\) of the form \(g(x,y,d)=\times_{j=1}^{p}g_j(x_{i(j)},y_{i(j)},d)\), \(d\in\mathcal{D}\), where \(i(j)=i\) for \(j\in I_i\), \(1\leq i\leq p\).
    The problem is to find a policy that maximizes the long-run expected reward per period. A method for finding a solution is given. The optimality equation is investigated, and its relation to the separated optimality equations is given. An elementary inventory problem within this framework is treated. The paper extends the results of \textit{W. S. Lovejoy} [Oper. Res. 34, 630-637 (1986; Zbl 0632.90088)].
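    The review does not reproduce the paper's solution method. As a hedged illustration of the problem class only, the Python sketch below builds the product MDP from the separable primitives (a) through (e) by brute-force enumeration and then solves the average-reward optimality equation with relative value iteration. Relative value iteration is a standard technique swapped in here; it is not the paper's method. The helper names (build_joint_mdp, relative_value_iteration), the parameter tau, and the small inventory-style instance are hypothetical. Convergence is assumed to rely on the model being unichain.

```python
import itertools


def build_joint_mdp(S, w, r, g, i_of, D, pD):
    """Hypothetical helper: enumerate the product MDP from separable primitives.

    S[j]            -- finite state set of component j
    w[j](xj)        -- admissible controls of component j in state xj
    r[j](xj, yj)    -- one-period reward of component j
    g[j](xi, yi, d) -- next state of component j, driven by component i_of[j]
    i_of[j]         -- the index i(j), i.e. the i with j in I_i (0-based)
    D, pD           -- finite noise values and their probabilities
    """
    p = len(S)
    states = list(itertools.product(*S))
    index = {x: k for k, x in enumerate(states)}
    actions = {}
    for x in states:
        acts = []
        # A(x) is the Cartesian product of the component action sets w_j(x_j).
        for y in itertools.product(*(w[j](x[j]) for j in range(p))):
            reward = sum(r[j](x[j], y[j]) for j in range(p))  # r(x, y) = sum_j r_j
            trans = {}
            for d, prob in zip(D, pD):
                # Component j moves according to (x_{i(j)}, y_{i(j)}, d).
                nxt = tuple(g[j](x[i_of[j]], y[i_of[j]], d) for j in range(p))
                trans[index[nxt]] = trans.get(index[nxt], 0.0) + prob
            acts.append((y, reward, trans))
        actions[x] = acts
    return states, actions


def relative_value_iteration(states, actions, tau=0.5, tol=1e-10, max_iter=100_000):
    """Average-reward relative value iteration (assumes a unichain model).

    tau < 1 applies the aperiodicity transform P -> tau*P + (1 - tau)*I,
    which leaves the optimal gain unchanged.
    """
    h = [0.0] * len(states)
    ref = 0  # reference state whose relative value is pinned to 0
    for _ in range(max_iter):
        Th = [
            max(rew + tau * sum(pr * h[m] for m, pr in trans.items()) + (1 - tau) * h[k]
                for _, rew, trans in actions[x])
            for k, x in enumerate(states)
        ]
        gain = Th[ref] - h[ref]
        new_h = [v - Th[ref] for v in Th]
        if max(abs(a - b) for a, b in zip(new_h, h)) < tol:
            return gain, new_h
        h = new_h
    raise RuntimeError("relative value iteration did not converge")


# Hypothetical inventory-style component (not the paper's example): stock 0..2,
# orders keep stock <= 2, and a common demand d drives every component.
LEVELS = [0, 1, 2]
D, pD = [0, 1, 2], [0.3, 0.4, 0.3]


def w_inv(x):
    return list(range(0, 3 - x))  # admissible order quantities


def r_inv(x, y):
    return -1.0 * y - 0.5 * x - 3.0 * (x == 0)  # ordering, holding, empty-stock costs


def g_inv(x, y, d):
    return max(x + y - d, 0)  # stock after demand, unmet demand is lost


# Joint problem with I_1 = {1}, I_2 = {2} (0-based: i_of = [0, 1]).
states, actions = build_joint_mdp([LEVELS] * 2, [w_inv] * 2, [r_inv] * 2,
                                  [g_inv] * 2, [0, 1], D, pD)
joint_gain, _ = relative_value_iteration(states, actions)

# The same component solved on its own (p = 1).
s1, a1 = build_joint_mdp([LEVELS], [w_inv], [r_inv], [g_inv], [0], D, pD)
single_gain, _ = relative_value_iteration(s1, a1)
print(joint_gain, 2 * single_gain)
```

    With \(I_i=\{i\}\) (no coupling between components), a sum of component relative value functions solves the joint optimality equation even though the components share the noise \(D\), so the joint gain equals the sum of the component gains; the script compares the two numbers. Enumerating \(S\) grows exponentially in \(p\), which is what makes exploiting separability attractive. Passing a coupled map such as i_of = [0, 0] to the same helper produces a joint problem without this additive structure.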
    Markov decision problem
    infinite horizon
    average reward
    expected reward
    optimality equation
    separated optimality equations

    Identifiers