Optimal control of diffusion processes with terminal constraint in law (Q2082225)

    Statements

    4 October 2022
The author considers the stochastic optimal control problem
\[
\inf_{\alpha_{t}\in \mathcal{A}}\mathbb{E}\Big[\int_{0}^{T}\big(f_{1}(t,X_{t},\alpha_{t})+f_{2}(t,\mathcal{L}(X_{t}))\big)\,dt+g(\mathcal{L}(X_{T}))\Big],
\]
under the terminal constraint in law \(\Psi(\mathcal{L}(X_{T}))\leq 0\), where \(X\) is the solution of the controlled diffusion
\[
dX_{t}=b(t,X_{t},\alpha_{t})\,dt+\sqrt{2}\,\sigma(t,X_{t},\alpha_{t})\,dB_{t},\qquad \mathcal{L}(X_{0})=m_{0}\in\mathcal{P}_{2}(\mathbb{R}^{d}),
\]
with \(\mathcal{P}_{2}(\mathbb{R}^{d})\) the space of probability measures on \(\mathbb{R}^{d}\) with finite second-order moment. Here \(f_{1}:[0,T]\times\mathbb{R}^{d}\times A\rightarrow\mathbb{R}\) and \(f_{2}:[0,T]\times\mathcal{P}_{2}(\mathbb{R}^{d})\rightarrow\mathbb{R}\) are the instantaneous costs, \(g:\mathcal{P}_{2}(\mathbb{R}^{d})\rightarrow\mathbb{R}\) is the terminal cost, \(\Psi:\mathcal{P}_{2}(\mathbb{R}^{d})\rightarrow\mathbb{R}\) defines the final constraint, and \(b:[0,T]\times\mathbb{R}^{d}\times A\rightarrow\mathbb{R}^{d}\) and \(\sigma:[0,T]\times\mathbb{R}^{d}\times A\rightarrow\mathbb{S}_{d}(\mathbb{R})\), the space of symmetric matrices of size \(d\times d\), are, respectively, the drift and the volatility of the controlled process \(X\). The control process \(\alpha\) takes values in the control space \(A\), a closed subset of a Euclidean space.

The main purpose of the paper is to prove that optimal Markov policies exist and are related to the solutions \((\lambda,u,m)\in\mathbb{R}^{+}\times C_{b}^{1,2}([0,T]\times\mathbb{R}^{d})\times C^{0}([0,T],\mathcal{P}_{2}(\mathbb{R}^{d}))\) of the following system of partial differential equations (the optimality conditions):
\[
\begin{cases}
-\partial_{t}u(t,x)+H(t,x,Du(t,x),D^{2}u(t,x))=\dfrac{\delta f_{2}}{\delta m}(t,m(t),x) & \text{in }[0,T]\times\mathbb{R}^{d},\\[1ex]
\partial_{t}m-\operatorname{div}\big(\partial_{p}H(t,x,Du,D^{2}u)\,m\big)+\displaystyle\sum_{i,j}\partial_{ij}^{2}\big((\partial_{M}H(t,x,Du,D^{2}u))_{ij}\,m\big)=0 & \text{in }[0,T]\times\mathbb{R}^{d},\\[1ex]
u(T,x)=\lambda\,\dfrac{\delta\Psi}{\delta m}(m(T),x)+\dfrac{\delta g}{\delta m}(m(T),x) & \text{in }\mathbb{R}^{d},\\[1ex]
m(0)=m_{0},\qquad \lambda\,\Psi(m(T))=0,\qquad \Psi(m(T))\leq 0,\qquad \lambda\geq 0,
\end{cases}
\]
where
\[
H(t,x,p,M):=\sup_{a\in A}\big\{-b(t,x,a)\cdot p-\sigma^{t}\sigma(t,x,a)\cdot M-f_{1}(t,x,a)\big\}
\]
is the Hamiltonian of the system. The first (backward) equation is a Hamilton-Jacobi-Bellman equation satisfied by the adjoint state \(u\); the second (forward) equation is a Fokker-Planck equation which describes the evolution of the probability distribution \(m\) of the optimally controlled process. The nonnegative parameter \(\lambda\) is the Lagrange multiplier associated with the terminal constraint, and \(\lambda\,\Psi(m(T))=0\) is the corresponding complementary slackness condition. The author also introduces the Lagrangian
\[
L(t,x,q,N)=\sup_{(p,M)\in\mathbb{R}^{d}\times\mathbb{S}_{d}(\mathbb{R})}\{-p\cdot q-M\cdot N-H(t,x,p,M)\}=H^{\ast}(t,x,-q,-N).
\]

The first main result proves, under appropriate hypotheses on the data of the problem, the existence of optimal Markov policies, and shows that if \((\alpha_{t})\in\mathcal{A}\) is an optimal Markov policy, then there exists \((\lambda,\varphi,m)\in\mathbb{R}^{+}\times C_{b}^{1,2}([0,T]\times\mathbb{R}^{d})\times C^{0}([0,T],\mathcal{P}_{2}(\mathbb{R}^{d}))\) satisfying the above system of optimality conditions and such that, for \(m(t)\otimes dt\)-almost all \((t,x)\in[0,T]\times\mathbb{R}^{d}\),
\[
H(t,x,D\varphi(t,x),D^{2}\varphi(t,x))=-b(t,x,\alpha(t,x))\cdot D\varphi(t,x)-\sigma^{t}\sigma(t,x,\alpha(t,x))\cdot D^{2}\varphi(t,x)-f_{1}(t,x,\alpha(t,x)),
\]
that is, the feedback \(\alpha(t,x)\) attains the supremum defining the Hamiltonian.
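As a concrete illustration of the Hamiltonian and of the resulting forward-backward system, one may consider the following minimal linear-quadratic example (not taken from the paper; the choice of data is the reviewer's): take \(A=\mathbb{R}^{d}\), \(b(t,x,a)=a\), \(\sigma(t,x,a)=I_{d}\) and \(f_{1}(t,x,a)=\frac{1}{2}|a|^{2}\). Then
\[
H(t,x,p,M)=\sup_{a\in\mathbb{R}^{d}}\Big\{-a\cdot p-\tfrac{1}{2}|a|^{2}\Big\}-I_{d}\cdot M=\tfrac{1}{2}|p|^{2}-\operatorname{tr}M,
\]
with the supremum attained at \(a=-p\), so that \(\partial_{p}H=p\) and \(\partial_{M}H=-I_{d}\). The optimality system then reduces to
\[
-\partial_{t}u+\tfrac{1}{2}|Du|^{2}-\Delta u=\frac{\delta f_{2}}{\delta m}(t,m(t),x),\qquad
\partial_{t}m-\operatorname{div}(m\,Du)-\Delta m=0,
\]
with the optimal feedback \(\alpha(t,x)=-Du(t,x)\) and the terminal and initial conditions as above.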
If \(\Psi\), \(f_{2}\) and \(g\) are convex with respect to the measure argument, then, under a different set of assumptions, the conditions of the previous result are also proved to be sufficient: if \(\alpha\in L^{0}([0,T]\times\mathbb{R}^{d},A)\) satisfies the above Hamiltonian equality for some \((\lambda,\varphi,m)\) solving the system of optimality conditions, then the stochastic differential equation
\[
dX_{t}=b(t,X_{t},\alpha(t,X_{t}))\,dt+\sqrt{2}\,\sigma(t,X_{t},\alpha(t,X_{t}))\,dB_{t},
\]
starting from \(X_{0}\), has a unique strong solution \(X\), with \(m(t)=\mathcal{L}(X_{t})\), and \(\alpha_{t}:=\alpha(t,X_{t})\) is a Markovian solution of the minimization problem \(\inf_{\alpha\in\mathcal{U}_{ad}}J_{SP}(\alpha)\), where \(J_{SP}\) is the above-defined cost functional and \(\mathcal{U}_{ad}=\{\alpha\in\mathcal{A}:\Psi(\mathcal{L}(X_{T}^{\alpha}))\leq 0\text{ and }J_{SP}(\alpha)<+\infty\}\) is the set of admissible controls. For the proof, the author first shows that the Hamilton-Jacobi-Bellman equation admits a unique strong solution \(\varphi\in C_{b}^{\frac{3+\alpha}{2},3+\alpha}([0,T]\times\mathbb{R}^{d})\) and establishes Lipschitz continuity properties of this solution; he then introduces a relaxed version of the problem, for which he proves an existence result.
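To indicate where the multiplier \(\lambda\) comes from, here is a standard convex-duality heuristic consistent with the optimality system above (a sketch, not a reproduction of the author's argument). Writing the constrained problem in min-max form,
\[
\inf_{\alpha\in\mathcal{U}_{ad}}J_{SP}(\alpha)=\inf_{\alpha\in\mathcal{A}}\,\sup_{\lambda\geq 0}\Big\{J_{SP}(\alpha)+\lambda\,\Psi(\mathcal{L}(X_{T}^{\alpha}))\Big\},
\]
since the supremum over \(\lambda\geq 0\) equals \(J_{SP}(\alpha)\) when \(\Psi(\mathcal{L}(X_{T}^{\alpha}))\leq 0\) and \(+\infty\) otherwise. Exchanging the infimum and the supremum, which is what convexity assumptions of this kind are designed to justify, turns each inner problem into an unconstrained control problem with penalized terminal cost \(g+\lambda\Psi\); linearizing this cost around \(m(T)\) via the flat derivative \(\frac{\delta}{\delta m}\) produces the terminal condition \(u(T,x)=\lambda\frac{\delta\Psi}{\delta m}(m(T),x)+\frac{\delta g}{\delta m}(m(T),x)\), while optimality in \(\lambda\) yields \(\Psi(m(T))\leq 0\), \(\lambda\geq 0\) and the complementary slackness relation \(\lambda\,\Psi(m(T))=0\).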
    stochastic optimal control
    constraints in law
    Hamilton-Jacobi-Bellman equation
    Fokker-Planck equation
    mean field games
    minmax
    convex duality
    existence result
    relaxed problem
    optimal Markov policies