Optimal control of diffusion processes with terminal constraint in law (Q2082225)
Language | Label | Description | Also known as
---|---|---|---
English | Optimal control of diffusion processes with terminal constraint in law | scientific article |
Statements
Optimal control of diffusion processes with terminal constraint in law (English)
4 October 2022
The author considers the stochastic optimal control problem
\[ \inf_{\alpha_{t}\in \mathcal{A}}\mathbb{E}\Big[\int_{0}^{T}\big(f_{1}(t,X_{t},\alpha_{t})+f_{2}(t,\mathcal{L}(X_{t}))\big)\,dt+g(\mathcal{L}(X_{T}))\Big], \]
under the constraint \(\Psi (\mathcal{L}(X_{T}))\leq 0\), where \(X\) is the solution of the controlled diffusion \(dX_{t}=b(t,X_{t},\alpha_{t})\,dt+\sqrt{2}\,\sigma (t,X_{t},\alpha_{t})\,dB_{t}\) with initial condition \(\mathcal{L}(X_{0})=m_{0}\in \mathcal{P}_{2}(\mathbb{R}^{d})\), the space of probability measures on \(\mathbb{R}^{d}\) with finite second moment. Here \(f_{1}:[0,T]\times \mathbb{R}^{d}\times A\rightarrow \mathbb{R}\) and \(f_{2}:[0,T]\times \mathcal{P}_{2}(\mathbb{R}^{d})\rightarrow \mathbb{R}\) are the running costs, \(g:\mathcal{P}_{2}(\mathbb{R}^{d})\rightarrow \mathbb{R}\) is the terminal cost, \(\Psi :\mathcal{P}_{2}(\mathbb{R}^{d})\rightarrow \mathbb{R}\) is the terminal constraint, \(b:[0,T]\times \mathbb{R}^{d}\times A\rightarrow \mathbb{R}^{d}\) and \(\sigma :[0,T]\times \mathbb{R}^{d}\times A\rightarrow \mathbb{S}_{d}(\mathbb{R})\), the space of symmetric \(d\times d\) matrices, are, respectively, the drift and the volatility of the controlled process \(X\), and \(\alpha\) is the control process, with values in the control space \(A\), a closed subset of a Euclidean space.

The main purpose of the paper is to prove that optimal Markov policies exist and are related to the solutions \((\lambda ,u,m)\in \mathbb{R}^{+}\times C_{b}^{1,2}([0,T]\times \mathbb{R}^{d})\times C^{0}([0,T],\mathcal{P}_{2}(\mathbb{R}^{d}))\) of the following system of partial differential equations (the optimality conditions):
\[ -\partial_{t}u(t,x)+H(t,x,Du(t,x),D^{2}u(t,x))=\frac{\delta f_{2}}{\delta m}(t,m(t),x), \]
\[ \partial_{t}m-\operatorname{div}\big(\partial_{p}H(t,x,Du,D^{2}u)\,m\big)+\sum_{i,j}\partial_{ij}^{2}\big((\partial_{M}H(t,x,Du,D^{2}u))_{ij}\,m\big)=0, \]
in \([0,T]\times \mathbb{R}^{d}\), together with the conditions
\[ u(T,x)=\lambda \frac{\delta \Psi }{\delta m}(m(T),x)+\frac{\delta g}{\delta m}(m(T),x)\ \text{in }\mathbb{R}^{d},\qquad m(0)=m_{0},\qquad \lambda \Psi (m(T))=0,\quad \Psi (m(T))\leq 0,\quad \lambda \geq 0. \]
Here \(H(t,x,p,M):=\sup_{a\in A}\{-b(t,x,a)\cdot p-\sigma^{\top}\sigma (t,x,a)\cdot M-f_{1}(t,x,a)\}\) is the Hamiltonian of the system. The first (backward) equation is a Hamilton-Jacobi-Bellman equation satisfied by the adjoint state \(u\); the second (forward) equation is a Fokker-Planck equation describing the evolution of the probability distribution \(m\) of the optimally controlled process. The nonnegative parameter \(\lambda\) is the Lagrange multiplier associated with the terminal constraint. The author also introduces the Lagrangian \(L(t,x,q,N)=\sup_{(p,M)\in \mathbb{R}^{d}\times \mathbb{S}_{d}(\mathbb{R})}\{-p\cdot q-M\cdot N-H(t,x,p,M)\}=H^{\ast }(t,x,-q,-N)\).

The first main result establishes, under appropriate hypotheses on the data of the problem, the existence of optimal Markov policies, and shows that if \((\alpha_{t})\in \mathcal{A}\) is an optimal Markov policy, then there exists \((\lambda ,\varphi ,m)\in \mathbb{R}^{+}\times C_{b}^{1,2}([0,T]\times \mathbb{R}^{d})\times C^{0}([0,T],\mathcal{P}_{2}(\mathbb{R}^{d}))\) such that, for \(m(t)\otimes dt\)-almost all \((t,x)\in [0,T]\times \mathbb{R}^{d}\),
\[ H(t,x,D\varphi (t,x),D^{2}\varphi (t,x))=-b(t,x,\alpha (t,x))\cdot D\varphi (t,x)-\sigma^{\top}\sigma (t,x,\alpha (t,x))\cdot D^{2}\varphi (t,x)-f_{1}(t,x,\alpha (t,x)), \]
and \((\lambda ,\varphi ,m)\) satisfies the above system of optimality conditions.
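To fix ideas, consider the following linear-quadratic specialization, which is a reviewer's illustration rather than a case treated as such in the paper: take \(A=\mathbb{R}^{d}\), \(b(t,x,a)=a\), \(\sigma \equiv I_{d}\) and \(f_{1}(t,x,a)=\frac{1}{2}|a|^{2}\). Then
\[ H(t,x,p,M)=\sup_{a\in \mathbb{R}^{d}}\Big\{-a\cdot p-\tfrac{1}{2}|a|^{2}\Big\}-I_{d}\cdot M=\tfrac{1}{2}|p|^{2}-\operatorname{tr}M, \]
the supremum being attained at \(a^{\ast }=-p\), so that the optimal feedback is \(\alpha (t,x)=-Du(t,x)\). Since \(\partial_{p}H=p\) and \(\partial_{M}H=-I_{d}\), the optimality system reduces to the familiar viscous forward-backward pair of mean field game type,
\[ -\partial_{t}u-\Delta u+\tfrac{1}{2}|Du|^{2}=\frac{\delta f_{2}}{\delta m}(t,m(t),x),\qquad \partial_{t}m-\Delta m-\operatorname{div}(Du\,m)=0. \]
In the same illustration the Lagrangian follows directly from the conjugacy relation: \(L(t,x,q,N)=\sup_{p}\{-p\cdot q-\frac{1}{2}|p|^{2}\}+\sup_{M}\{M\cdot (I_{d}-N)\}\), which equals \(\frac{1}{2}|q|^{2}\) when \(N=I_{d}\) and \(+\infty\) otherwise; along the admissible dynamics, where \(q=b(t,x,a)=a\) and \(N=\sigma^{\top}\sigma=I_{d}\), it recovers the running cost \(f_{1}\), as expected.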
If \(\Psi\), \(f_{2}\) and \(g\) are convex with respect to the measure argument, then, under a different set of assumptions, the conditions of the previous result are also proved to be sufficient: if \(\alpha \in L^{0}([0,T]\times \mathbb{R}^{d},A)\) satisfies the above Hamiltonian equality for some \((\lambda ,\varphi ,m)\) satisfying the system of optimality conditions, then the equation \(dX_{t}=b(t,X_{t},\alpha (t,X_{t}))\,dt+\sqrt{2}\,\sigma (t,X_{t},\alpha (t,X_{t}))\,dB_{t}\), starting from \(X_{0}\), has a unique strong solution \(X_{t}\), with \(m(t)=\mathcal{L}(X_{t})\), and \(\alpha_{t}:=\alpha (t,X_{t})\) is a Markovian solution of the minimization problem \(\inf_{\alpha \in \mathcal{U}_{ad}}J_{SP}(\alpha )\), where \(J_{SP}\) is the cost functional defined above and \(\mathcal{U}_{ad}=\{\alpha \in \mathcal{A}:\Psi (\mathcal{L}(X_{T}^{\alpha }))\leq 0\ \text{and}\ J_{SP}(\alpha )<+\infty \}\) is the set of admissible controls. For the proofs, the author first shows that the Hamilton-Jacobi-Bellman equation admits a unique strong solution \(\varphi \in C_{b}^{\frac{3+\alpha }{2},3+\alpha }([0,T]\times \mathbb{R}^{d})\), then introduces a relaxed problem for which he proves an existence result, and establishes Lipschitz continuity properties of the solution to the Hamilton-Jacobi-Bellman equation.
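As a simple instance of a constraint in law, again given only by way of illustration, one may take the second-moment budget \(\Psi (m)=\int_{\mathbb{R}^{d}}|x|^{2}\,m(dx)-K\) for some \(K>0\), i.e. \(\mathbb{E}[|X_{T}|^{2}]\leq K\). Its flat derivative is \(\frac{\delta \Psi }{\delta m}(m,x)=|x|^{2}\) (up to the additive normalizing constant), so the terminal condition of the adjoint equation reads \(u(T,x)=\lambda |x|^{2}+\frac{\delta g}{\delta m}(m(T),x)\): the multiplier penalizes terminal dispersion quadratically. The relations \(\lambda \geq 0\), \(\Psi (m(T))\leq 0\), \(\lambda \Psi (m(T))=0\) are the usual complementary slackness conditions: either the constraint is inactive at the optimum and \(\lambda =0\), in which case the system reduces to the unconstrained optimality conditions, or it is saturated, \(\mathbb{E}[|X_{T}|^{2}]=K\), and \(\lambda\) may be positive.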
stochastic optimal control
constraints in law
Hamilton-Jacobi-Bellman equation
Fokker-Planck equation
mean field games
minmax
convex duality
existence result
relaxed problem
optimal Markov policies