Optimal control of diffusion processes with terminal constraint in law (Q2082225)
Language | Label | Description | Also known as
---|---|---|---
English | Optimal control of diffusion processes with terminal constraint in law | scientific article |
Statements
Optimal control of diffusion processes with terminal constraint in law (English)
4 October 2022
The author considers the stochastic optimal control problem
\[ \inf_{\alpha_{t}\in \mathcal{A}}\mathbb{E}\Big[\int_{0}^{T}\big(f_{1}(t,X_{t},\alpha_{t})+f_{2}(t,\mathcal{L}(X_{t}))\big)\,dt+g(\mathcal{L}(X_{T}))\Big], \]
under the constraint \(\Psi (\mathcal{L}(X_{T}))\leq 0\), where \(X\) is the solution of the controlled diffusion \(dX_{t}=b(t,X_{t},\alpha_{t})\,dt+\sqrt{2}\,\sigma (t,X_{t},\alpha_{t})\,dB_{t}\) with initial condition \(\mathcal{L}(X_{0})=m_{0}\in \mathcal{P}_{2}(\mathbb{R}^{d})\), the space of probability measures on \(\mathbb{R}^{d}\) with finite second moment. Here \(f_{1}:[0,T]\times \mathbb{R}^{d}\times A\rightarrow \mathbb{R}\) and \(f_{2}:[0,T]\times \mathcal{P}_{2}(\mathbb{R}^{d})\rightarrow \mathbb{R}\) are the running costs, \(g:\mathcal{P}_{2}(\mathbb{R}^{d})\rightarrow \mathbb{R}\) is the terminal cost, \(\Psi :\mathcal{P}_{2}(\mathbb{R}^{d})\rightarrow \mathbb{R}\) is the terminal constraint, \(b:[0,T]\times \mathbb{R}^{d}\times A\rightarrow \mathbb{R}^{d}\) and \(\sigma :[0,T]\times \mathbb{R}^{d}\times A\rightarrow \mathbb{S}_{d}(\mathbb{R})\), the space of symmetric \(d\times d\) matrices, are, respectively, the drift and the volatility of the controlled process \(X\), and \(\alpha\) is the control process, with values in the control space \(A\), a closed subset of a Euclidean space.

The main purpose of the paper is to prove that optimal Markov policies exist and are related to the solutions \((\lambda ,u,m)\in \mathbb{R}^{+}\times C_{b}^{1,2}([0,T]\times \mathbb{R}^{d})\times C^{0}([0,T],\mathcal{P}_{2}(\mathbb{R}^{d}))\) of the following system of partial differential equations (the optimality conditions):
\[ -\partial_{t}u(t,x)+H(t,x,Du(t,x),D^{2}u(t,x))=\frac{\delta f_{2}}{\delta m}(t,m(t),x), \]
\[ \partial_{t}m-\operatorname{div}\big(\partial_{p}H(t,x,Du,D^{2}u)\,m\big)+\sum_{i,j}\partial_{ij}^{2}\big((\partial_{M}H(t,x,Du,D^{2}u))_{ij}\,m\big)=0, \]
in \([0,T]\times \mathbb{R}^{d}\), together with the conditions
\[ u(T,x)=\lambda \frac{\delta \Psi }{\delta m}(m(T),x)+\frac{\delta g}{\delta m}(m(T),x)\ \text{in }\mathbb{R}^{d},\qquad m(0)=m_{0},\qquad \lambda \Psi (m(T))=0,\quad \Psi (m(T))\leq 0,\quad \lambda \geq 0. \]
Here \(H(t,x,p,M):=\sup_{a\in A}\{-b(t,x,a)\cdot p-\sigma^{\top}\sigma (t,x,a)\cdot M-f_{1}(t,x,a)\}\) is the Hamiltonian of the system. The first (backward) equation is a Hamilton-Jacobi-Bellman equation satisfied by the adjoint state \(u\); the second (forward) equation is a Fokker-Planck equation describing the evolution of the probability distribution \(m\) of the optimally controlled process. The nonnegative parameter \(\lambda\) is the Lagrange multiplier associated with the terminal constraint. The author also introduces the Lagrangian \(L(t,x,q,N)=\sup_{(p,M)\in \mathbb{R}^{d}\times \mathbb{S}_{d}(\mathbb{R})}\{-p\cdot q-M\cdot N-H(t,x,p,M)\}=H^{\ast }(t,x,-q,-N)\).

The first main result establishes, under appropriate hypotheses on the data of the problem, the existence of optimal Markov policies, and shows that if \((\alpha_{t})\in \mathcal{A}\) is an optimal Markov policy, then there exists \((\lambda ,\varphi ,m)\in \mathbb{R}^{+}\times C_{b}^{1,2}([0,T]\times \mathbb{R}^{d})\times C^{0}([0,T],\mathcal{P}_{2}(\mathbb{R}^{d}))\) such that, for \(m(t)\otimes dt\)-almost all \((t,x)\in [0,T]\times \mathbb{R}^{d}\),
\[ H(t,x,D\varphi (t,x),D^{2}\varphi (t,x))=-b(t,x,\alpha (t,x))\cdot D\varphi (t,x)-\sigma^{\top}\sigma (t,x,\alpha (t,x))\cdot D^{2}\varphi (t,x)-f_{1}(t,x,\alpha (t,x)), \]
and \((\lambda ,\varphi ,m)\) satisfies the above system of optimality conditions.
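To fix ideas, consider the following linear-quadratic specialization, which is a reviewer's illustration rather than a case treated as such in the paper: take \(A=\mathbb{R}^{d}\), \(b(t,x,a)=a\), \(\sigma \equiv I_{d}\) and \(f_{1}(t,x,a)=\frac{1}{2}|a|^{2}\). Then
\[ H(t,x,p,M)=\sup_{a\in \mathbb{R}^{d}}\Big\{-a\cdot p-\tfrac{1}{2}|a|^{2}\Big\}-I_{d}\cdot M=\tfrac{1}{2}|p|^{2}-\operatorname{tr}M, \]
the supremum being attained at \(a^{\ast }=-p\), so that the optimal feedback is \(\alpha (t,x)=-Du(t,x)\). Since \(\partial_{p}H=p\) and \(\partial_{M}H=-I_{d}\), the optimality system reduces to the familiar viscous forward-backward pair of mean field game type,
\[ -\partial_{t}u-\Delta u+\tfrac{1}{2}|Du|^{2}=\frac{\delta f_{2}}{\delta m}(t,m(t),x),\qquad \partial_{t}m-\Delta m-\operatorname{div}(Du\,m)=0. \]
In the same illustration the Lagrangian follows directly from the conjugacy relation: \(L(t,x,q,N)=\sup_{p}\{-p\cdot q-\frac{1}{2}|p|^{2}\}+\sup_{M}\{M\cdot (I_{d}-N)\}\), which equals \(\frac{1}{2}|q|^{2}\) when \(N=I_{d}\) and \(+\infty\) otherwise; along the admissible dynamics, where \(q=b(t,x,a)=a\) and \(N=\sigma^{\top}\sigma=I_{d}\), it recovers the running cost \(f_{1}\), as expected.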
If \(\Psi\), \(f_{2}\) and \(g\) are convex with respect to the measure argument, then, under a different set of assumptions, the conditions of the previous result are also proved to be sufficient: if \(\alpha \in L^{0}([0,T]\times \mathbb{R}^{d},A)\) satisfies the above Hamiltonian equality for some \((\lambda ,\varphi ,m)\) satisfying the system of optimality conditions, then the equation \(dX_{t}=b(t,X_{t},\alpha (t,X_{t}))\,dt+\sqrt{2}\,\sigma (t,X_{t},\alpha (t,X_{t}))\,dB_{t}\), starting from \(X_{0}\), has a unique strong solution \(X_{t}\), with \(m(t)=\mathcal{L}(X_{t})\), and \(\alpha_{t}:=\alpha (t,X_{t})\) is a Markovian solution of the minimization problem \(\inf_{\alpha \in \mathcal{U}_{ad}}J_{SP}(\alpha )\), where \(J_{SP}\) is the cost functional defined above and \(\mathcal{U}_{ad}=\{\alpha \in \mathcal{A}:\Psi (\mathcal{L}(X_{T}^{\alpha }))\leq 0\ \text{and}\ J_{SP}(\alpha )<+\infty \}\) is the set of admissible controls. For the proofs, the author first shows that the Hamilton-Jacobi-Bellman equation admits a unique strong solution \(\varphi \in C_{b}^{\frac{3+\alpha }{2},3+\alpha }([0,T]\times \mathbb{R}^{d})\), then introduces a relaxed problem for which he proves an existence result, and establishes Lipschitz continuity properties of the solution to the Hamilton-Jacobi-Bellman equation.
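As a simple instance of a constraint in law, again given only by way of illustration, one may take the second-moment budget \(\Psi (m)=\int_{\mathbb{R}^{d}}|x|^{2}\,m(dx)-K\) for some \(K>0\), i.e. \(\mathbb{E}[|X_{T}|^{2}]\leq K\). Its flat derivative is \(\frac{\delta \Psi }{\delta m}(m,x)=|x|^{2}\) (up to the additive normalizing constant), so the terminal condition of the adjoint equation reads \(u(T,x)=\lambda |x|^{2}+\frac{\delta g}{\delta m}(m(T),x)\): the multiplier penalizes terminal dispersion quadratically. The relations \(\lambda \geq 0\), \(\Psi (m(T))\leq 0\), \(\lambda \Psi (m(T))=0\) are the usual complementary slackness conditions: either the constraint is inactive at the optimum and \(\lambda =0\), in which case the system reduces to the unconstrained optimality conditions, or it is saturated, \(\mathbb{E}[|X_{T}|^{2}]=K\), and \(\lambda\) may be positive.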
stochastic optimal control
constraints in law
Hamilton-Jacobi-Bellman equation
Fokker-Planck equation
mean field games
minmax
convex duality
existence result
relaxed problem
optimal Markov policies