Singularly perturbed Markov decision processes with inclusion of transient states. (Q5943682)
From MaRDI portal
Revision as of 00:52, 30 January 2024
scientific article; zbMATH DE number 1652518
Statements
Singularly perturbed Markov decision processes with inclusion of transient states. (English)
30 July 2002
The authors consider continuous-time Markov decision processes (MDPs) with weak and strong interactions, as follows. Let \(x^\varepsilon (\cdot)=\{x^\varepsilon (t):t\geq 0\}\) be a real-valued MDP with finite state space \({\mathcal M}=\{1,2, \dots,m\}\), and let \(u(\cdot)= \{u(t)= u(x^\varepsilon(t)) :t\geq 0\}\) be a feedback control such that \(u(t)\) lies in a compact subset \(\Gamma\) of a Euclidean space. The generator \(Q^\varepsilon (u(t))\) of \(x^\varepsilon(\cdot)\) has the form \(Q^\varepsilon (u)=\widetilde Q(u)/\varepsilon+\widehat Q(u)\), \(u\in\Gamma\), where \(\widetilde Q(u)\) and \(\widehat Q(u)\) are themselves generators and \(\varepsilon>0\) is a small parameter. Given the initial state \(i=x^\varepsilon(0)\), a cost-to-go function \(G(x,u)\), and a discount factor \(\rho> 0\), the cost functional is \[ J^\varepsilon \bigl(i,u(\cdot) \bigr)=E \int^\infty_0 e^{-\rho t}G\bigl( x^\varepsilon(t), u\bigl(x^\varepsilon (t)\bigr) \bigr)\,dt, \] and the objective is to find a control \(u(\cdot)\) that minimizes \(J^\varepsilon(i,u (\cdot))\).

The authors formulate a singularly perturbed MDP by decomposing the state space into several groups of recurrent states and a group of transient states, and derive the corresponding limit problem. Using the optimal solution of the limit problem, they construct asymptotically optimal controls for the original problem, and they obtain the convergence rate and an error bound for the approximate control. Furthermore, they treat the related MDP with long-run average cost.
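The two-time-scale structure of the generator can be illustrated numerically. The following Python sketch (a minimal example with made-up rates and a fixed control, not taken from the paper) builds \(Q^\varepsilon=\widetilde Q/\varepsilon+\widehat Q\) for a hypothetical five-state chain with two recurrent groups and one transient state, and computes the discounted cost vector \(V^\varepsilon=(\rho I-Q^\varepsilon)^{-1}G\) for a fixed running cost \(G\).

```python
import numpy as np

# Hypothetical 5-state chain: recurrent groups {0,1} and {2,3} under the
# fast dynamics, plus transient state 4, which exits quickly to both groups.
# All rates are invented for illustration; the control is held fixed.
Q_tilde = np.array([
    [-1.0,  1.0,  0.0,  0.0,  0.0],
    [ 2.0, -2.0,  0.0,  0.0,  0.0],
    [ 0.0,  0.0, -3.0,  3.0,  0.0],
    [ 0.0,  0.0,  1.0, -1.0,  0.0],
    [ 1.0,  0.0,  1.0,  0.0, -2.0],  # transient state
])

# Slow generator: weak interaction between the two recurrent groups.
Q_hat = np.array([
    [-0.5,  0.0,  0.5,  0.0,  0.0],
    [ 0.0, -0.4,  0.0,  0.4,  0.0],
    [ 0.3,  0.0, -0.3,  0.0,  0.0],
    [ 0.0,  0.2,  0.0, -0.2,  0.0],
    [ 0.0,  0.0,  0.0,  0.0,  0.0],
])

def Q_eps(eps):
    """Singularly perturbed generator Q^eps = Q_tilde/eps + Q_hat."""
    return Q_tilde / eps + Q_hat

def discounted_cost(eps, G, rho=0.1):
    """V[i] = E int_0^inf e^{-rho t} G(x^eps(t)) dt for x^eps(0) = i,
    i.e. the solution of the linear system (rho*I - Q^eps) V = G."""
    return np.linalg.solve(rho * np.eye(5) - Q_eps(eps), G)

G = np.array([1.0, 2.0, 4.0, 8.0, 3.0])  # running cost per state
for eps in (1.0, 0.1, 0.01):
    print(eps, np.round(discounted_cost(eps, G), 3))
```

As \(\varepsilon\) shrinks, the costs of states 0 and 1 (and likewise 2 and 3) merge toward common group values, while the transient state's cost becomes a mixture of the two; this aggregation within recurrent groups is what the limit problem of the paper captures on the reduced state space.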
asymptotically optimal control