Sensitivity of constrained Markov decision processes (Q1176864)
Language | Label | Description | Also known as
---|---|---|---
English | Sensitivity of constrained Markov decision processes | scientific article |
Statements
Sensitivity of constrained Markov decision processes (English)
25 June 1992
The paper considers a stationary-parameter, discrete-time, finite-state, finite-action Markov decision process, where \(X_t\in X\), \(A_t\in A\) are, respectively, the random state and action at time \(t\); \(x\) is the initial state; \(\beta\), \(0<\beta\leq 1\), is a discount factor; \(S\) is the set of stationary Markov policies; \(c(y,a)\), \(\{d^k(y,a)\}\), \(1\leq k\leq K\), are cost functions; and \(\{V_k\}\) are preset constraint levels. If \(E^u_x\) denotes the expectation operator given the initial state \(x\) and policy \(u\in S\), the main problem addressed is \(\text{COP}_\beta(x)\):
\[ \text{minimize }\biggl[C_\beta(x,u):=(1-\beta)E^u_x\Bigl[\sum^\infty_{s=0}\beta^s c(X_s,A_s)\Bigr]\biggr] \]
\[ \text{subject to } D^k_\beta(x,u):=(1-\beta)E^u_x\Bigl[\sum^\infty_{s=0}\beta^s d^k(X_s,A_s)\Bigr]\leq V_k, \quad 1\leq k\leq K,\; u\in S. \]
The associated linear program, with \(\{P_{yav}\}\) the transition probabilities and \(\delta_v(y)\) the Kronecker delta, is \(\text{LP}_\beta(x)\):
\[ \text{minimize }\Bigl[C(z):=\sum_{y,a}c(y,a)z(y,a)\Bigr] \]
\[ \text{subject to }\sum_{y,a}z(y,a)\bigl(\delta_v(y)-\beta P_{yav}\bigr)=(1-\beta)\delta_x(v),\quad v\in X, \]
\[ \text{and } D^k(z):=\sum_{y,a}d^k(y,a)z(y,a)\leq V_k,\quad 1\leq k\leq K. \]
Under a positive recurrent state assumption, it is shown that, in effect, \(\text{COP}_\beta(x)\) and \(\text{LP}_\beta(x)\) are equivalent problems. The main purpose of the paper is to study the continuity properties of \(\text{COP}_\beta(x)\) in the parameters \(\{\beta, P_{yav}, c(y,a), \{d^k(y,a)\}\}\). These are replaced by sequences \(\{\beta_n, P^n_{yav}, c_n(y,a), \{d^k_n(y,a)\}\}\) with specified convergence properties. \(\text{LP}_\beta(x)\) is then generalized to a sequence \(\{\text{LP}^n_\beta(x)\}\), and it is shown that various limiting properties of \(\{\text{LP}^n_\beta\}\) hold in relation to \(\text{LP}_\beta\); these are used to establish the continuity results. Some consideration is also given to limiting finite-horizon problems and to adaptive problems.
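To make the linear-programming formulation \(\text{LP}_\beta(x)\) concrete, the following is a minimal sketch, not taken from the paper: a two-state, two-action constrained discounted MDP solved as an LP over occupation measures with scipy.optimize.linprog. All numerical values (transition kernel, costs, constraint level, discount factor) are illustrative assumptions.

```python
# Minimal sketch of LP_beta(x): minimize sum_{y,a} c(y,a) z(y,a)
# subject to the occupation-measure balance equations and one cost constraint.
# All model data below are made-up illustrative values.
import numpy as np
from scipy.optimize import linprog

X, A = 2, 2      # number of states y and actions a (illustrative)
beta = 0.9       # discount factor
x0 = 0           # initial state x

# P[y, a, v] = transition probability P_{yav} (each row sums to 1)
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.1, 0.9]]])
c  = np.array([[1.0, 2.0], [0.5, 3.0]])   # running cost c(y, a)
d1 = np.array([[0.0, 1.0], [1.0, 0.0]])   # constraint cost d^1(y, a)
V1 = 0.4                                  # constraint level V_1

# Decision variables z(y, a), flattened row-major: index = y * A + a.
n = X * A

# Equality constraints:
#   sum_{y,a} z(y,a) (delta_v(y) - beta P_{yav}) = (1 - beta) delta_x(v), v in X.
A_eq = np.zeros((X, n))
for v in range(X):
    for y in range(X):
        for a in range(A):
            A_eq[v, y * A + a] = (1.0 if y == v else 0.0) - beta * P[y, a, v]
b_eq = (1.0 - beta) * np.eye(X)[x0]

# Inequality constraint: sum_{y,a} d^1(y,a) z(y,a) <= V_1.
A_ub = d1.reshape(1, n)
b_ub = np.array([V1])

res = linprog(c=c.reshape(n), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)

# Recover a (possibly randomized) stationary policy: in state y, play
# action a with probability z(y, a) / sum_{a'} z(y, a').
z = res.x.reshape(X, A)
policy = z / z.sum(axis=1, keepdims=True)
print("optimal value C(z):", res.fun)
print("stationary policy:\n", policy)
```

In this sketch the sensitivity questions of the paper correspond to perturbing `beta`, `P`, `c`, or `d1` and asking how the optimal value and optimal `z` of the LP behave in the limit.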
Markov decision process
stationary Markov policies
continuity properties
finite horizon problems
adaptive problems