Sensitivity of constrained Markov decision processes (Q1176864)

The paper considers a stationary parameter, discrete time, finite state, finite action Markov decision process, where \(X_t\in X\), \(A_t\in A\) are, respectively, the random state and action at time \(t\); \(x\) is the initial state; \(\beta\), \(0<\beta\leq 1\), is a discount factor; \(S\) is the set of stationary Markov policies; \(c(y,a)\), \(\{d^k(y,a)\}\), \(1\leq k\leq K\), are cost functions; and \(\{V_k\}\) are preset constraint levels. If \(E^u_x\) is the expectation operator, given the initial state \(x\) and policy \(u\in S\), the main problem addressed is \(\text{COP}_\beta(x)\): \[ \text{minimize }\biggl[C_\beta(x,u):=(1-\beta)E^u_x\Bigl[\sum^\infty_{s=0}\beta^s c(X_s,A_s)\Bigr]\biggr] \] \[ \text{subject to }D^k_\beta(x,u):=(1-\beta)E^u_x\Bigl[\sum^\infty_{s=0}\beta^s d^k(X_s,A_s)\Bigr]\leq V_k, \quad 1\leq k\leq K,\;u\in S. \] The associated linear program, with \(\{P_{yav}\}\) being the transition probabilities and \(\delta_v(y)\) being the Kronecker delta, is \(\text{LP}_\beta(x)\): \[ \text{minimize }\Bigl[C(z):=\sum_{y,a}c(y,a)z(y,a)\Bigr] \] \[ \text{subject to }\sum_{y,a}z(y,a)(\delta_v(y)-\beta P_{yav})=(1-\beta)\delta_x(v),\quad v\in X, \] \[ \text{and }D^k(z):=\sum_{y,a}d^k(y,a)z(y,a)\leq V_k,\quad 1\leq k\leq K. \] Under a positive recurrent state assumption, it is, in effect, shown that \(\text{COP}_\beta(x)\) and \(\text{LP}_\beta(x)\) are equivalent problems. The main purpose of the paper is to study the continuity properties of \(\text{COP}_\beta(x)\) with respect to the parameters \(\{\beta, P_{yav}, c(y,a), \{d^k(y,a)\}\}\). These are replaced by sequences \(\{\beta_n, P^n_{yav}, c_n(y,a), \{d^k_n(y,a)\}\}\) with specified convergence properties. \(\text{LP}_\beta(x)\) is then generalized to \(\{\text{LP}^n_\beta(x)\}\), and it is shown that various limiting properties of \(\{\text{LP}^n_\beta(x)\}\) hold in relation to \(\text{LP}_\beta(x)\). This is used to establish the continuity results.
Some consideration is also given to limiting finite horizon problems and to adaptive problems.
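
The link between \(\text{COP}_\beta(x)\) and \(\text{LP}_\beta(x)\) can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it fixes one stationary policy on a made-up two-state, two-action model with \(\beta<1\), builds the normalized discounted state-action frequencies \(z(y,a)=(1-\beta)\sum_s\beta^s P^u_x(X_s=y,A_s=a)\) by truncating the infinite sum, and checks that \(z\) satisfies the equality constraints of \(\text{LP}_\beta(x)\), so that \(C(z)\) and \(D^k(z)\) evaluate the policy's cost and constraint values. All numbers, the truncation horizon, and the function names (`occupation_measure`, `flow_residual`, `linear_cost`, `policy_from_measure`) are assumptions introduced for illustration.

```python
# Toy constrained MDP: every number below is made up for illustration.
beta = 0.9   # discount factor (this sketch assumes 0 < beta < 1)
x = 0        # initial state
# P[y][a][v] = probability of moving from state y to state v under action a
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]
c = [[1.0, 3.0], [2.0, 0.5]]   # cost c(y, a)
d = [[0.0, 1.0], [1.0, 2.0]]   # a single constraint cost d^1(y, a)
# policy[y][a] = probability that the stationary policy picks a in state y
policy = [[0.7, 0.3], [0.4, 0.6]]


def occupation_measure(P, policy, x, beta, horizon=2000):
    """Truncated z(y,a) = (1-beta) * sum_s beta^s * Prob(X_s=y, A_s=a)."""
    n_states, n_actions = len(P), len(P[0])
    mu = [1.0 if y == x else 0.0 for y in range(n_states)]  # law of X_s
    z = [[0.0] * n_actions for _ in range(n_states)]
    weight = 1.0 - beta  # equals (1-beta) * beta^s at step s
    for _ in range(horizon):
        nxt = [0.0] * n_states
        for y in range(n_states):
            for a in range(n_actions):
                mass = mu[y] * policy[y][a]
                z[y][a] += weight * mass
                for v in range(n_states):
                    nxt[v] += mass * P[y][a][v]
        mu = nxt
        weight *= beta
    return z


def flow_residual(P, z, x, beta):
    """Left side minus right side of the LP equality constraint, per v."""
    n_states, n_actions = len(P), len(P[0])
    out = []
    for v in range(n_states):
        lhs = sum(
            z[y][a] * ((1.0 if y == v else 0.0) - beta * P[y][a][v])
            for y in range(n_states)
            for a in range(n_actions)
        )
        out.append(lhs - (1.0 - beta) * (1.0 if v == x else 0.0))
    return out


def linear_cost(f, z):
    """Linear functional sum_{y,a} f(y,a) z(y,a), as in C(z) and D^k(z)."""
    return sum(f[y][a] * z[y][a] for y in range(len(z)) for a in range(len(z[0])))


def policy_from_measure(z):
    """Recover a stationary policy by normalizing z over actions."""
    out = []
    for row in z:
        total = sum(row)
        if total > 0.0:
            out.append([m / total for m in row])
        else:
            # State carries no mass: any action distribution works here.
            out.append([1.0 / len(row)] * len(row))
    return out


z = occupation_measure(P, policy, x, beta)
residual = flow_residual(P, z, x, beta)
C_value = linear_cost(c, z)    # normalized discounted cost of the policy
D_value = linear_cost(d, z)    # value to compare against a level V_1
recovered = policy_from_measure(z)
print(C_value, D_value, max(abs(r) for r in residual))
```

Normalizing \(z\) over actions, as in the hypothetical `policy_from_measure`, is the usual way to map a feasible point of such a linear program back to a stationary policy; here it returns the policy the sketch started from, because both states carry positive mass.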


zbMATH Keywords

Markov decision process


stationary Markov policies


continuity properties


finite horizon problems


adaptive problems
