Linear Convergence of a Policy Gradient Method for Some Finite Horizon Continuous Time Control Problems


DOI: 10.1137/22M1492180
arXiv: 2203.11758
OpenAlex: W4389301972
MaRDI QID: Q6140987
FDO: Q6140987


Authors: Christoph Reisinger, Wolfgang Stockinger, Yufei Zhang


Publication date: 2 January 2024

Published in: SIAM Journal on Control and Optimization

Abstract: Despite its popularity in the reinforcement learning community, a provably convergent policy gradient method for continuous space-time control problems with nonlinear state dynamics has been elusive. This paper proposes proximal gradient algorithms for feedback controls of finite-time horizon stochastic control problems. The state dynamics are nonlinear diffusions with control-affine drift, and the cost functions are nonconvex in the state and nonsmooth in the control. The system noise can be degenerate, which allows deterministic control problems as special cases. We prove under suitable conditions that the algorithm converges linearly to a stationary point of the control problem and is stable with respect to policy updates by approximate gradient steps. The convergence result justifies the recent reinforcement learning heuristics whereby adding entropy regularization or a fictitious discount factor to the optimization objective accelerates the convergence of policy gradient methods. The proof exploits careful regularity estimates of backward stochastic differential equations.
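
The abstract describes the scheme only at a high level; for orientation, the generic proximal gradient template for feedback controls can be sketched as follows. This is a minimal illustrative sketch under our own assumptions, not the paper's exact formulation: we posit control-affine dynamics and a running cost that splits into a smooth part f and a convex nonsmooth part h of the control, and every symbol below (b, B, \sigma, f, h, g, \tau, \hat{H}) is notation introduced here rather than taken from the paper.

\[
\mathrm{d}X_s = \bigl(b(s,X_s) + B(s,X_s)\,\phi(s,X_s)\bigr)\,\mathrm{d}s + \sigma(s,X_s)\,\mathrm{d}W_s,
\]
\[
J(\phi) = \mathbb{E}\left[\int_0^T \Bigl(f\bigl(s,X_s,\phi(s,X_s)\bigr) + h\bigl(\phi(s,X_s)\bigr)\Bigr)\,\mathrm{d}s + g(X_T)\right],
\]
with the feedback control updated pointwise in (s,x) by a proximal gradient step of size \tau > 0:
\[
\phi_{k+1}(s,x) = \operatorname{prox}_{\tau h}\Bigl(\phi_k(s,x) - \tau\,\nabla_a \hat{H}\bigl(s,x,\phi_k(s,x)\bigr)\Bigr),
\qquad
\operatorname{prox}_{\tau h}(y) = \operatorname*{arg\,min}_{a}\Bigl(h(a) + \tfrac{1}{2\tau}\,\lvert a-y\rvert^2\Bigr),
\]
where \nabla_a \hat{H} denotes the gradient of a reduced Hamiltonian in the action variable, evaluated along adjoint processes obtained from a backward stochastic differential equation. The linear convergence claimed in the abstract concerns iterations of this pointwise type, under suitable regularity and regularization conditions.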


Full work available at URL: https://arxiv.org/abs/2203.11758









Cited in: 5 documents





