On discounted Markov decision programming with multi-vector constraints (Q1913925)

From MaRDI portal






    Statements

    On discounted Markov decision programming with multi-vector constraints (English)
    12 December 1996
We discuss discounted Markov decision programming with multi-vector constraints. This model is similar to a model of \textit{K. Tanaka} [J. Math. Anal. Appl. 155, No. 1, 264-277 (1991; Zbl 0729.90087)]. The model in our paper is \(\{S, A, q, r, c, (W_i, i\in S), \beta\}\), where the state space \(S\) and the action set \(A\) are nonempty countable sets, \(q(j|i, a)\) is the one-step transition probability and \(r(i, a)\) is the one-step reward function, \(i, j\in S\), \(a\in A\). \(c(i, a)\) is a \(p(\geq 1)\)-dimensional real vector representing the one-step cost vector. \(r\) and \(c\) are bounded, i.e. \(|r(i, a)|\leq M\) and \(|c(i, a)_k|\leq M\) for \(k= 1,\dots, p\), \(i\in S\), \(a\in A\), where \(c(i, a)_k\) denotes the \(k\)th component of \(c(i, a)\). \(W_i\) is a \(p\)-dimensional real vector for each \(i\in S\), and \(\beta\in (0, 1)\) is a discount factor. \(\Pi\) denotes the class of general strategies and \(\Pi_m\) the class of stochastic stationary strategies.

The discounted expected total reward is \[ V_\beta(\pi, i)= \sum^\infty_{t= 0} \beta^t E_\pi[r(X_t, \Delta_t)|X_0= i],\quad i\in S,\quad \pi\in \Pi, \] where \(X_t\) and \(\Delta_t\) denote the state and the action taken at time \(t\), respectively. The discounted expected total cost is \[ \overline C_\beta(\pi, i)= \sum^\infty_{t= 0} \beta^t E_\pi[c(X_t, \Delta_t) |X_0= i],\quad i\in S,\quad \pi\in \Pi. \]

For two \(p\)-dimensional real vectors \(V_1= (V_1(1),\dots, V_1(p))\) and \(V_2= (V_2(1),\dots, V_2(p))\) we write \(V_1\leq V_2\) if \(V_1(k)\leq V_2(k)\) for all \(k= 1,2,\dots, p\). Let \(\Pi_a= \{\pi\in \Pi : \overline C_\beta(\pi, i)\leq W_i\) for all \(i\in S\}\).

Definition 1. Let \(\varepsilon\geq 0\) and \(\pi^*\in \Pi_a\). If \(V_\beta(\pi^*, i)\geq \sup_{\pi\in \Pi_a} V_\beta(\pi, i)- \varepsilon\) for all \(i\in S\), then \(\pi^*\) is called an \(\varepsilon\)-constrained optimal strategy. A \(0\)-constrained optimal strategy is simply called a constrained optimal strategy.
For the above model we prove that if \(\Pi_a\neq \emptyset\), then for every \(\varepsilon> 0\) there exists a sub-stochastic stationary strategy which is an \(\varepsilon\)-constrained optimal strategy. When \(S\) and \(A\) are finite, we give an algorithm for finding a constrained optimal strategy.
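For finite \(S\) and \(A\), the quantities \(V_\beta(\pi,\cdot)\) and \(\overline C_\beta(\pi,\cdot)\) of a stochastic stationary strategy \(\pi\in\Pi_m\) satisfy the linear fixed-point equations \(V= r_\pi+ \beta P_\pi V\) and \(C_k= c_{\pi,k}+ \beta P_\pi C_k\), so membership in \(\Pi_a\) can be checked by solving two linear systems. The following sketch illustrates this (it is not the paper's algorithm; all numerical data are invented for illustration):

```python
import numpy as np

def evaluate_stationary(q, r, c, pi, beta):
    """Evaluate a stochastic stationary strategy pi for a finite model
    {S, A, q, r, c, (W_i), beta}.

    q  : (S, A, S) array, q[i, a, j] = q(j|i, a)
    r  : (S, A) array, one-step rewards r(i, a)
    c  : (S, A, p) array, one-step cost vectors c(i, a)
    pi : (S, A) array, pi[i, a] = probability of action a in state i
    Returns V (S,) and C (S, p): discounted expected total reward and
    cost from each initial state.
    """
    S, A, p = c.shape
    # Policy-averaged transition matrix and one-step reward/cost.
    P = np.einsum('ia,iaj->ij', pi, q)      # P[i, j] = sum_a pi(a|i) q(j|i, a)
    r_pi = np.einsum('ia,ia->i', pi, r)
    c_pi = np.einsum('ia,iak->ik', pi, c)
    # V = r_pi + beta P V  =>  (I - beta P) V = r_pi; likewise for C.
    M = np.eye(S) - beta * P
    V = np.linalg.solve(M, r_pi)
    C = np.linalg.solve(M, c_pi)
    return V, C

# Invented two-state, two-action example with a single (p = 1) cost.
q = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[1.0, 2.0], [0.0, 1.0]])
c = np.array([[[0.0], [1.0]], [[0.5], [0.2]]])
pi = np.array([[0.5, 0.5], [1.0, 0.0]])
V, C = evaluate_stationary(q, r, c, pi, beta=0.9)
W = np.array([[6.0], [6.0]])
feasible = bool(np.all(C <= W))   # does pi belong to Pi_a?
```

Here the bound \(|V_\beta(\pi,i)|\leq M/(1-\beta)\) from the abstract gives a quick sanity check on the solution; in the example above, the per-step cost under \(\pi\) is \(0.5\) in both states, so \(C= 0.5/(1-0.9)= 5\leq W\) and \(\pi\in\Pi_a\).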
    vector constraints
    discounted Markov decision programming
    multi-vector constraints
    discounted expected total reward
    \(\varepsilon\)-constrained optimal strategy

    Identifiers