Separable value functions for infinite horizon average reward Markov decision processes (Q908861)
From MaRDI portal
scientific article
Language | Label | Description | Also known as
---|---|---|---
English | Separable value functions for infinite horizon average reward Markov decision processes | scientific article |
Statements
Separable value functions for infinite horizon average reward Markov decision processes (English)
1989
Consider the following decision problem:
(a) The state and control spaces S, C are of the form \(S=\times_{j=1}^{p}S_j\), \(C=\times_{j=1}^{p}C_j\), where \(S_j\subset\mathbb{R}^{n_j}\) and \(C_j\subset\mathbb{R}^{m_j}\) are finite sets for some integers \(n_j\), \(m_j\), \(1\leq j\leq p\);
(b) for any given state \(x=(x_1,\dots,x_p)\in S\), the action set is \(A(x)=\times_{j=1}^{p}w_j(x_j)\), with \(x_j\in S_j\) and \(w_j(x_j)\subset C_j\);
(c) for each \(i\in\{1,2,\dots,p\}\) there is a set \(I_i\subset\{1,2,\dots,p\}\) such that \(I_i\cap I_j=\emptyset\) if \(i\neq j\), and \(\bigcup_{i=1}^{p}I_i=\{1,2,\dots,p\}\);
(d) the reward in the current period is of the form \(r(x,y)=\sum_{j=1}^{p}r_j(x_j,y_j)\), \(x_j\in S_j\), \(y_j\in w_j(x_j)\);
(e) the state of the next period (given the state \(x\in S\) and the action \(y\in A(x)\)) is described by a random variable \(D\) with values in a finite set \(\mathcal{D}\) and a function g of the form \(g(x,y,d)=\times_{j=1}^{p}g_j(x_{i(j)},y_{i(j)},d)\), where \(i(j)=i\) for \(j\in I_i\), \(1\leq i\leq p\), \(d\in\mathcal{D}\).
The problem is to find a policy that maximizes the expected reward per period in the long run. A method for finding a solution is given. The optimality equation is investigated, and its relation to the separated optimality equations is given. An elementary inventory problem within this framework is treated. The paper extends the results of W. S. Lovejoy [Oper. Res. 34, 630-637 (1986; Zbl 0632.90088)].
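In the simplest special case \(I_i=\{i\}\), so \(i(j)=j\), the separability is easy to see: if \((g_j,h_j)\) solves the component equation \(g_j+h_j(x_j)=\max_{y_j\in w_j(x_j)}[r_j(x_j,y_j)+E\,h_j(g_j(x_j,y_j,D))]\) for each j, then \(h(x)=\sum_j h_j(x_j)\) and the gain \(\sum_j g_j\) solve the joint optimality equation, because \(A(x)\) is a product set and the expectation is linear (this holds even though all components share the same D). The sketch below checks this numerically. It is an illustration under stated assumptions, not the method of the paper: it uses relative value iteration with the standard aperiodicity transformation as the solver, and the two-component "stock" model, its rewards, the capacity and the distribution of D are all hypothetical.

```python
import itertools
from typing import Callable, NamedTuple


class Problem(NamedTuple):
    """Finite average-reward MDP (hypothetical container, review notation)."""
    states: list
    actions: Callable  # x -> admissible actions A(x)
    reward: Callable   # (x, y) -> r(x, y)
    step: Callable     # (x, y, d) -> g(x, y, d)


D_PROBS = {0: 0.5, 1: 0.5}  # hypothetical distribution of the shared variable D


def relative_value_iteration(prob, d_probs, tau=0.5, tol=1e-10, max_iter=100_000):
    """Relative value iteration on the transformed chain (1 - tau) I + tau P.

    The transformation keeps h and scales the gain by tau, so we divide back.
    """
    h = {x: 0.0 for x in prob.states}
    for _ in range(max_iter):
        t = {}
        for x in prob.states:
            best = max(
                prob.reward(x, y)
                + sum(p * h[prob.step(x, y, d)] for d, p in d_probs.items())
                for y in prob.actions(x)
            )
            t[x] = (1 - tau) * h[x] + tau * best
        diff = [t[x] - h[x] for x in prob.states]
        if max(diff) - min(diff) < tol:
            # Near convergence, t - h is (almost) the constant tau * gain.
            return (max(diff) + min(diff)) / (2 * tau), h
        ref = t[prob.states[0]]  # normalize h at the first state
        h = {x: t[x] - ref for x in prob.states}
    raise RuntimeError("relative value iteration did not converge")


def component(a, c, capacity=2):
    """Hypothetical component j: stock x_j, order y_j in {0, 1}, demand d."""
    return Problem(
        states=list(range(capacity + 1)),
        actions=lambda x: [y for y in (0, 1) if x + y <= capacity],  # w_j(x_j)
        reward=lambda x, y: a * x - c * y,                            # r_j(x_j, y_j)
        step=lambda x, y, d: max(x + y - d, 0),                       # g_j(x_j, y_j, d)
    )


def product(parts):
    """Joint problem with i(j) = j: additive reward, one d shared by all g_j."""
    return Problem(
        states=list(itertools.product(*(p.states for p in parts))),
        actions=lambda x: list(
            itertools.product(*(p.actions(xj) for p, xj in zip(parts, x)))
        ),
        reward=lambda x, y: sum(p.reward(xj, yj) for p, xj, yj in zip(parts, x, y)),
        step=lambda x, y, d: tuple(p.step(xj, yj, d) for p, xj, yj in zip(parts, x, y)),
    )


parts = [component(a=1.0, c=0.5), component(a=0.3, c=1.0)]
component_gains = [relative_value_iteration(p, D_PROBS)[0] for p in parts]
joint_gain, _ = relative_value_iteration(product(parts), D_PROBS)
print(component_gains, joint_gain)
```

The joint problem has 9 states while each component has only 3, so solving the separated equations instead of the joint one is where the savings come from; in the general case \(i(j)\neq j\) the components are coupled and this simple sum argument does not apply directly.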
Markov decision problem
infinite horizon
average reward
expected reward
optimality equation
separated optimality equations