Singularly perturbed Markov decision processes with inclusion of transient states. (Q5943682)

From MaRDI portal
scientific article; zbMATH DE number 1652518

    Statements

    Singularly perturbed Markov decision processes with inclusion of transient states. (English)
    30 July 2002
    The authors consider continuous-time Markov decision processes (MDPs) with weak and strong interactions as follows. Let \(x^\varepsilon (\cdot)=\{x^\varepsilon (t):t\geq 0\}\) be a real-valued MDP with finite state space \({\mathcal M}=\{1,2, \dots,m\}\) and let \(u(\cdot)= \{u(t)= u(x^\varepsilon(t)) :t\geq 0\}\) be a feedback control such that \(u(t)\) lies in a compact subset \(\Gamma\) of a Euclidean space. Let \(Q^\varepsilon (u(t))\) be a generator of \(x^\varepsilon(\cdot)\) of the form \(Q^\varepsilon (u)=\widetilde Q(u)/\varepsilon+\widehat Q(u)\), \(u\in\Gamma\), where \(\widetilde Q(u)\) and \(\widehat Q(u)\) are generators and \(\varepsilon>0\) is a small parameter. Given the initial state \(i=x^\varepsilon(0)\) of \(x^\varepsilon (\cdot)\), the cost-to-go function \(G(x,u)\) and the discount factor \(\rho> 0\), the cost functional is \[ J^\varepsilon \bigl(i,u(\cdot) \bigr)=E \int^\infty_0 e^{-\rho t}G\biggl( x^\varepsilon(t), u\bigl(x^\varepsilon (t)\bigr) \biggr)\,dt \] and the objective is to find a function \(u(\cdot)\) that minimizes \(J^\varepsilon(i,u (\cdot))\). The authors formulate a singularly perturbed MDP by decomposing the state space into several groups of recurrent states and a group of transient states, and derive the corresponding limit problem. They then construct asymptotically optimal controls for the original problem from the optimal solution of the limit problem, and obtain the convergence rate and an error bound for the approximate control. Furthermore, they treat the related MDP with long-run average costs.
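The effect of the small parameter \(\varepsilon\) can be illustrated numerically for a single fixed feedback control. The sketch below is not the authors' construction: it only evaluates the discounted cost of one fixed control, using the standard fact that the cost vector \(J^\varepsilon\) then solves the linear system \((\rho I-Q^\varepsilon)J^\varepsilon=G\). The three-state generators `Q_FAST` and `Q_SLOW`, the cost vector `COST`, the rate `RHO` and the helper names are all hypothetical choices made for illustration. States 0 and 1 form one recurrent group under the fast generator and state 2 is transient.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def discounted_cost(q_fast, q_slow, cost, rho, eps):
    """Discounted cost vector for a fixed control: solves (rho*I - Q_eps) J = cost,
    where Q_eps = q_fast / eps + q_slow."""
    n = len(cost)
    a = [
        [(rho if i == j else 0.0) - (q_fast[i][j] / eps + q_slow[i][j]) for j in range(n)]
        for i in range(n)
    ]
    return solve(a, cost)


# Hypothetical example data (assumptions, not taken from the article).
# Fast part: states 0 and 1 communicate; state 2 is transient and jumps to state 0.
Q_FAST = [[-1.0, 1.0, 0.0],
          [2.0, -2.0, 0.0],
          [1.0, 0.0, -1.0]]
# Slow part: occasional jumps from the recurrent group back into the transient state.
Q_SLOW = [[-1.0, 0.0, 1.0],
          [0.0, -1.0, 1.0],
          [0.0, 0.0, 0.0]]
COST = [1.0, 3.0, 5.0]  # running cost per state under the fixed control
RHO = 0.5               # discount rate

if __name__ == "__main__":
    for eps in (1e-1, 1e-3, 1e-6):
        print(eps, discounted_cost(Q_FAST, Q_SLOW, COST, RHO, eps))
```

In this toy example the fast recurrent group has stationary distribution \((2/3,1/3)\), so as \(\varepsilon\to 0\) all three entries of the cost vector approach \((\tfrac23\cdot 1+\tfrac13\cdot 3)/\rho=10/3\). The cost attached to the transient state drops out of the limit because the chain leaves that state at rate \(1/\varepsilon\).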
    asymptotically optimal control