Sensitivity of constrained Markov decision processes (Q1176864)

From MaRDI portal
scientific article

    Statements

    Sensitivity of constrained Markov decision processes (English)
    25 June 1992
    The paper considers a stationary-parameter, discrete-time, finite-state, finite-action Markov decision process, where \(X_t\in X\) and \(A_t\in A\) are, respectively, the random state and action at time \(t\); \(x\) is the initial state; \(\beta\), \(0<\beta\leq 1\), is a discount factor; \(S\) is the set of stationary Markov policies; \(c(y,a)\) and \(\{d^k(y,a)\}\), \(1\leq k\leq K\), are cost functions; and \(\{V_k\}\) are preset constraint levels. If \(E^u_x\) is the expectation operator, given the initial state \(x\) and policy \(u\in S\), the main problem addressed is \(\text{COP}_\beta(x)\):
    \[ \text{minimize } C_\beta(x,u):=(1-\beta)E^u_x\Bigl[\sum^\infty_{s=0}\beta^s c(X_s,A_s)\Bigr] \]
    \[ \text{subject to } D^k_\beta(x,u):=(1-\beta)E^u_x\Bigl[\sum^\infty_{s=0}\beta^s d^k(X_s,A_s)\Bigr]\leq V_k,\quad 1\leq k\leq K,\; u\in S. \]
    With \(\{P_{yav}\}\) denoting the transition probabilities and \(\delta_v(y)\) the Kronecker delta, the associated linear program is \(\text{LP}_\beta(x)\):
    \[ \text{minimize } C(z):=\sum_{y,a}c(y,a)z(y,a) \]
    \[ \text{subject to } \sum_{y,a}z(y,a)\bigl(\delta_v(y)-\beta P_{yav}\bigr)=(1-\beta)\delta_x(v),\quad v\in X, \]
    \[ \text{and } D^k(z):=\sum_{y,a}d^k(y,a)z(y,a)\leq V_k,\quad 1\leq k\leq K. \]
    Under a positive recurrent state assumption it is shown, in effect, that \(\text{COP}_\beta(x)\) and \(\text{LP}_\beta(x)\) are equivalent problems. The main purpose of the paper is to study the continuity properties of \(\text{COP}_\beta(x)\) with respect to the parameters \(\{\beta, P_{yav}, c(y,a), \{d^k(y,a)\}\}\). These are replaced by sequences \(\{\beta_n, P^n_{yav}, c_n(y,a), \{d^k_n(y,a)\}\}\) with specified convergence properties. \(\text{LP}_\beta(x)\) is then generalized to \(\{\text{LP}^n_\beta(x)\}\), and various limiting properties of \(\{\text{LP}^n_\beta\}\) are shown to hold in relation to \(\text{LP}_\beta\). These are used to establish the continuity results. Some consideration is also given to limiting finite horizon problems and to adaptive problems.
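The link between the discounted problem and the linear program can be illustrated numerically. The sketch below is not taken from the paper: it uses a hypothetical two-state, two-action model (all numbers, and the names `occupation_measure`, `flow_residual` and `linear_cost`, are assumptions made for illustration) and takes \(\beta<1\). It computes the discounted occupation measure \(z(y,a)=(1-\beta)\sum_s \beta^s \Pr(X_s=y, A_s=a)\) of a fixed stationary policy by truncating the series, then checks that this \(z\) satisfies the equality constraints of the linear program, so that \(C(z)\) and \(D^k(z)\) reproduce the discounted costs of that policy.

```python
# Hypothetical 2-state, 2-action model; every number here is an illustrative assumption.
# P[y][a][v] = probability of moving from state y to state v under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]
c = [[1.0, 3.0], [2.0, 0.5]]  # c[y][a], the cost to be minimized
d = [[0.0, 1.0], [1.0, 2.0]]  # d[y][a], a single constrained cost (K = 1)


def occupation_measure(P, u, x, beta, n_terms=2000):
    """Truncated z(y,a) = (1-beta) * sum_s beta^s * Pr(X_s=y, A_s=a).

    u[y][a] is the probability that the stationary policy picks a in state y;
    x is the initial state. Assumes beta < 1 so the truncated tail is negligible.
    """
    n, m = len(P), len(P[0])
    z = [[0.0] * m for _ in range(n)]
    dist = [1.0 if y == x else 0.0 for y in range(n)]  # law of X_s
    weight = 1.0 - beta                                # (1-beta) * beta^s
    for _ in range(n_terms):
        for y in range(n):
            for a in range(m):
                z[y][a] += weight * dist[y] * u[y][a]
        # One step of the chain induced by the policy u.
        dist = [
            sum(dist[y] * u[y][a] * P[y][a][v] for y in range(n) for a in range(m))
            for v in range(n)
        ]
        weight *= beta
    return z


def flow_residual(P, z, x, beta):
    """Largest violation over v of
    sum_{y,a} z(y,a) (delta_v(y) - beta P_yav) = (1-beta) delta_x(v)."""
    n, m = len(P), len(P[0])
    worst = 0.0
    for v in range(n):
        lhs = sum(
            z[y][a] * ((1.0 if y == v else 0.0) - beta * P[y][a][v])
            for y in range(n)
            for a in range(m)
        )
        rhs = (1.0 - beta) * (1.0 if v == x else 0.0)
        worst = max(worst, abs(lhs - rhs))
    return worst


def linear_cost(f, z):
    """sum_{y,a} f(y,a) z(y,a), i.e. C(z) for f = c and D^k(z) for f = d^k."""
    return sum(f[y][a] * z[y][a] for y in range(len(z)) for a in range(len(z[0])))


# A randomized stationary policy (an assumption, not an optimal one).
u = [[0.5, 0.5], [1.0, 0.0]]
z = occupation_measure(P, u, x=0, beta=0.9)
print("flow residual:", flow_residual(P, z, x=0, beta=0.9))
print("C(z):", linear_cost(c, z), "D(z):", linear_cost(d, z))

# Cost of the same policy along a sequence beta_n -> 0.9.
for n in (1, 10, 100, 1000):
    z_n = occupation_measure(P, u, x=0, beta=0.9 - 0.1 / n)
    print(n, abs(linear_cost(c, z_n) - linear_cost(c, z)))
```

The final loop only probes how the cost of one fixed policy varies with the discount factor. The continuity results described in the review concern the constrained optimization problem itself, which also requires the constraint levels to remain attainable along the sequence, so the loop should be read as a sanity check and not as a demonstration of those results.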
    Markov decision process
    stationary Markov policies
    continuity properties
    finite horizon problems
    adaptive problems
