Linear Convergence of a Policy Gradient Method for Some Finite Horizon Continuous Time Control Problems
Publication: 6140987
Abstract: Despite its popularity in the reinforcement learning community, a provably convergent policy gradient method for continuous space-time control problems with nonlinear state dynamics has been elusive. This paper proposes proximal gradient algorithms for feedback controls of finite-time horizon stochastic control problems. The state dynamics are nonlinear diffusions with control-affine drift, and the cost functions are nonconvex in the state and nonsmooth in the control. The system noise can be degenerate, which allows for deterministic control problems as special cases. We prove under suitable conditions that the algorithm converges linearly to a stationary point of the control problem, and is stable with respect to policy updates by approximate gradient steps. The convergence result justifies the recent reinforcement learning heuristics that adding entropy regularization or a fictitious discount factor to the optimization objective accelerates the convergence of policy gradient methods. The proof exploits careful regularity estimates of backward stochastic differential equations.
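For orientation, the update scheme behind a generic proximal gradient method can be sketched as follows; the notation is illustrative and not taken from the paper. For a composite objective \(J(u) = f(u) + g(u)\) with \(f\) smooth and \(g\) convex but possibly nonsmooth, the iteration with step size \(\tau > 0\) reads
\[
u^{(k+1)} = \operatorname{prox}_{\tau g}\big(u^{(k)} - \tau \nabla f(u^{(k)})\big),
\qquad
\operatorname{prox}_{\tau g}(v) = \operatorname*{arg\,min}_u \Big( g(u) + \tfrac{1}{2\tau}\,\|u - v\|^2 \Big).
\]
In the paper's setting, \(u\) plays the role of a feedback control, the smooth part of the cost is differentiated via backward stochastic differential equations, and the nonsmooth control costs enter through the proximal step; the abstract's linear convergence result states that such iterates reach a stationary point at a linear rate despite the nonconvexity of the costs.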
Cites work
- Scientific article (no title available); zbMATH DE number 1577097
- Scientific article (no title available); zbMATH DE number 1121855
- A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems
- A modified MSA for stochastic control problems
- A neural network-based policy iteration algorithm with global \(H^2\)-superlinear convergence for stochastic games on domains
- A numerical scheme for a mean field game in some queueing systems based on Markov chain approximation method
- A regression-based Monte Carlo method to solve backward stochastic differential equations
- BSDEs with polynomial growth generators
- Backward stochastic differential equations. From linear to fully nonlinear theory
- Continuous-time stochastic control and optimization with financial applications
- Convexity and optimization in Banach spaces.
- Deep backward schemes for high-dimensional nonlinear PDEs
- Exponential convergence and stability of Howard's policy improvement algorithm for controlled diffusions
- Introductory lectures on convex optimization. A basic course.
- Lectures on BSDEs, stochastic control, and stochastic differential games with financial applications
- Maximum principle based algorithms for deep learning
- Mean field games and mean field type control theory
- Nonlinear control systems: An introduction
- On the convergence rate of approximation schemes for Hamilton-Jacobi-Bellman equations
- Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon
- Policy gradient in continuous time
- Reinforcement learning. An introduction
- Solving high-dimensional partial differential equations using deep learning
- Sufficient stochastic maximum principle for discounted control problem
- Time discretization and Markovian iteration for coupled FBSDEs
- Time discretization of FBSDE with polynomial growth drivers and reaction-diffusion PDEs
- Variational Analysis
Cited in (5 documents)
- An explicit Milstein-type scheme for interacting particle systems and McKean-Vlasov SDEs with common noise and non-differentiable drift coefficients
- A fast iterative PDE-based algorithm for feedback controls of nonsmooth mean-field control problems
- The modified MSA, a gradient flow and convergence
- Improved order 1/4 convergence for piecewise constant policy approximation of stochastic control problems
- Near optimality of Lipschitz and smooth policies in controlled diffusions