Singularly perturbed Markov decision processes with inclusion of transient states. (Q5943682)

The authors consider continuous-time Markov decision processes (MDPs) with weak and strong interactions, as follows. Let \(x^\varepsilon (\cdot)=\{x^\varepsilon (t):t\geq 0\}\) be a real-valued MDP with finite state space \({\mathcal M}=\{1,2, \dots,m\}\), and let \(u(\cdot)= \{u(t)= u(x^\varepsilon(t)) :t\geq 0\}\) be a feedback control such that \(u(t)\) lies in a compact subset \(\Gamma\) of a Euclidean space. Let \(Q^\varepsilon (u(t))\) be the generator of \(x^\varepsilon(\cdot)\), of the form \(Q^\varepsilon (u)=\widetilde Q(u)/\varepsilon+\widehat Q(u)\), \(u\in\Gamma\), where \(\widetilde Q(u)\) and \(\widehat Q(u)\) are generators and \(\varepsilon>0\) is a small parameter. Given the initial state \(i=x^\varepsilon(0)\) of \(x^\varepsilon (\cdot)\), the cost-to-go function \(G(x,u)\) and the discount factor \(\rho> 0\), the cost functional is \[ J^\varepsilon \bigl(i,u(\cdot) \bigr)=E \int^\infty_0 e^{-\rho t}G\Bigl( x^\varepsilon(t), u\bigl(x^\varepsilon (t)\bigr) \Bigr)\,dt, \] and the objective is to find a feedback control \(u(\cdot)\) that minimizes \(J^\varepsilon(i,u (\cdot))\). The authors formulate a singularly perturbed MDP by decomposing the state space into several groups of recurrent states and one group of transient states, and they derive the corresponding limit problem. They then construct asymptotically optimal controls for the original problem from an optimal solution of the limit problem, and they obtain the convergence rate and an error bound for the approximate control. Furthermore, they treat the related MDP with long-run average cost.
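
The two-time-scale structure of \(Q^\varepsilon\) can be illustrated numerically. The sketch below is not the authors' construction: it fixes one feedback control, so the dependence on \(u\) is suppressed and no minimization is carried out. It relies on the standard fact that, for a fixed control, the discounted cost vector \(J^\varepsilon\) solves the linear system \((\rho I-Q^\varepsilon)J^\varepsilon=G\). The three-state chain (states 1 and 2 recurrent under \(\widetilde Q\), state 3 transient), the matrices `Q_FAST` and `Q_SLOW`, the cost vector `G` and the value of `RHO` are all made-up illustrative assumptions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def discounted_cost(q_fast, q_slow, g, rho, eps):
    """Return J solving (rho*I - q_fast/eps - q_slow) J = g for a fixed control."""
    n = len(g)
    a = [[(rho if i == j else 0.0) - q_fast[i][j] / eps - q_slow[i][j]
          for j in range(n)] for i in range(n)]
    return solve(a, g)


# Hypothetical data: states 1, 2 are recurrent under Q_FAST, state 3 is transient.
Q_FAST = [[-1.0, 1.0, 0.0],
          [2.0, -2.0, 0.0],
          [1.0, 1.0, -2.0]]
# Slow part: any generator (nonnegative off-diagonals, zero row sums).
Q_SLOW = [[-1.0, 0.5, 0.5],
          [0.5, -1.0, 0.5],
          [0.5, 0.5, -1.0]]
G = [1.0, 3.0, 2.0]  # running cost per state under the fixed control
RHO = 0.5            # discount factor

for eps in (1.0, 0.1, 0.01, 0.001):
    print(eps, discounted_cost(Q_FAST, Q_SLOW, G, RHO, eps))
```

For these made-up numbers, the entries of \(J^\varepsilon\) approach a common value as \(\varepsilon\to 0\), namely the cost averaged over the stationary distribution \((2/3,1/3)\) of the recurrent group, \(5/3\), divided by \(\rho\). With several recurrent groups the limit would instead be a smaller MDP with one aggregated state per group, which is the setting the review describes.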

reviewed by

Yoshio Ohtsubo

0 references