Sensitivity of constrained Markov decision processes (Q1176864)

The paper considers a stationary-parameter, discrete-time, finite-state, finite-action Markov decision process, where \(X_t\in X\) and \(A_t\in A\) are, respectively, the random state and action at time \(t\); \(x\) is the initial state; \(\beta\), \(0<\beta\leq 1\), is a discount factor; \(S\) is the set of stationary Markov policies; \(c(y,a)\) and \(\{d^k(y,a)\}\), \(1\leq k\leq K\), are cost functions; and \(\{V_k\}\) are preset constraint levels. If \(E^u_x\) is the expectation operator given the initial state \(x\) and policy \(u\in S\), the main problem addressed is \(\text{COP}_\beta(x)\):

\[ \text{minimize }\biggl[C_\beta(x,u):=(1-\beta)E^u_x\Bigl[\sum^\infty_{s=0}\beta^s c(X_s,A_s)\Bigr]\biggr] \]

\[ \text{subject to }D^k_\beta(x,u):=(1-\beta)E^u_x\Bigl[\sum^\infty_{s=0}\beta^s d^k(X_s,A_s)\Bigr]\leq V_k,\quad 1\leq k\leq K,\;u\in S. \]

The associated linear program, with \(\{P_{yav}\}\) being the transition probabilities and \(\delta_v(y)\) being the Kronecker delta, is \(\text{LP}_\beta(x)\):

\[ \text{minimize }\Bigl[C(z):=\sum_{y,a}c(y,a)z(y,a)\Bigr] \]

\[ \text{subject to }\sum_{y,a}z(y,a)(\delta_v(y)-\beta P_{yav})=(1-\beta)\delta_x(v),\quad v\in X, \]

\[ \text{and }D^k(z):=\sum_{y,a}d^k(y,a)z(y,a)\leq V_k,\quad 1\leq k\leq K. \]

Under a positive recurrent state assumption it is shown, in effect, that \(\text{COP}_\beta(x)\) and \(\text{LP}_\beta(x)\) are equivalent problems. The main purpose of the paper is to study the continuity properties of \(\text{COP}_\beta(x)\) with respect to the parameters \(\{\beta, P_{yav}, c(y,a), \{d^k(y,a)\}\}\). These are replaced by sequences \(\{\beta_n, P^n_{yav}, c_n(y,a), \{d^k_n(y,a)\}\}\) with specified convergence properties. \(\text{LP}_\beta(x)\) is then generalized to \(\{\text{LP}^n_\beta(x)\}\), and it is shown that various limiting properties of \(\{\text{LP}^n_\beta\}\) hold in relation to \(\text{LP}_\beta\). This is used to establish the continuity results.

Some consideration is also given to limiting finite-horizon problems and to adaptive problems.
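
As a rough numerical illustration of the equivalence between \(\text{COP}_\beta(x)\) and \(\text{LP}_\beta(x)\) described above, the following Python sketch builds the discounted occupation measure \(z(y,a)=(1-\beta)\sum_s\beta^s\Pr(X_s=y,A_s=a)\) of one fixed stationary policy and checks that it satisfies the equality constraints of the linear program and reproduces the policy's normalized discounted cost. The two-state model, all numerical values, and the function names (`occupation_measure`, `lp_residuals`, `linear_cost`, `policy_cost`) are assumptions made for illustration and do not come from the paper. Nonnegativity of \(z\), the usual implicit requirement of such a program, holds here by construction, and the infinite sums are truncated at a long finite horizon.

```python
# Hypothetical 2-state, 2-action model; every number below is an illustrative assumption.
BETA = 0.9   # discount factor beta
X0 = 0       # initial state x

# P[y][a][v]: probability of moving from state y to state v under action a
P = [
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.5, 0.5], [0.1, 0.9]],
]
c = [[1.0, 2.0], [0.5, 3.0]]   # c[y][a]: cost to be minimized
d = [[0.0, 1.0], [1.0, 0.0]]   # d[y][a]: single constraint cost (K = 1)
u = [[0.6, 0.4], [0.25, 0.75]] # u[y][a]: a randomized stationary policy


def occupation_measure(P, u, beta, x0, horizon=2000):
    """z[y][a] = (1-beta) * sum_s beta^s * Pr(X_s=y, A_s=a), truncated at `horizon`."""
    n, m = len(P), len(P[0])
    dist = [1.0 if y == x0 else 0.0 for y in range(n)]  # law of X_s
    z = [[0.0] * m for _ in range(n)]
    weight = 1.0 - beta
    for _ in range(horizon):
        for y in range(n):
            for a in range(m):
                z[y][a] += weight * dist[y] * u[y][a]
        # Push the state distribution one step forward under policy u.
        dist = [
            sum(dist[y] * u[y][a] * P[y][a][v] for y in range(n) for a in range(m))
            for v in range(n)
        ]
        weight *= beta
    return z


def lp_residuals(P, z, beta, x0):
    """Left side minus right side of the LP equality constraint, one entry per state v."""
    n, m = len(P), len(P[0])
    residuals = []
    for v in range(n):
        lhs = sum(
            z[y][a] * ((1.0 if y == v else 0.0) - beta * P[y][a][v])
            for y in range(n) for a in range(m)
        )
        rhs = (1.0 - beta) * (1.0 if v == x0 else 0.0)
        residuals.append(lhs - rhs)
    return residuals


def linear_cost(cost, z):
    """Linear functional sum_{y,a} cost(y,a) z(y,a), i.e. C(z) or D^k(z)."""
    return sum(cost[y][a] * z[y][a] for y in range(len(z)) for a in range(len(z[0])))


def policy_cost(P, u, cost, beta, x0, sweeps=2000):
    """Normalized discounted cost of policy u, computed by iterating policy evaluation."""
    n, m = len(P), len(P[0])
    w = [0.0] * n
    for _ in range(sweeps):
        w = [
            sum(
                u[y][a] * (cost[y][a] + beta * sum(P[y][a][v] * w[v] for v in range(n)))
                for a in range(m)
            )
            for y in range(n)
        ]
    return (1.0 - beta) * w[x0]
```

With these definitions, `lp_residuals(P, occupation_measure(P, u, BETA, X0), BETA, X0)` should vanish up to rounding, and `linear_cost(c, z)` should agree with `policy_cost(P, u, c, BETA, X0)`. Comparing `linear_cost(d, z)` with a chosen level \(V_1\) then tests whether the policy is feasible.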

zbMATH Keywords

Markov decision process

stationary Markov policies

continuity properties

finite horizon problems

adaptive problems

reviewed by

Douglas J. White

MaRDI profile type

MaRDI publication profile

Identifiers

zbMATH Open document ID

0735.60091

DOI

10.1007/BF02204825

Sitelinks

Mathematics(1 entry)

mardi Publication:1176864
