Linear Convergence of a Policy Gradient Method for Some Finite Horizon Continuous Time Control Problems
Publication: 6140987
Abstract: Despite its popularity in the reinforcement learning community, a provably convergent policy gradient method for continuous space-time control problems with nonlinear state dynamics has been elusive. This paper proposes proximal gradient algorithms for feedback controls of finite-time horizon stochastic control problems. The state dynamics are nonlinear diffusions with control-affine drift, and the cost functions are nonconvex in the state and nonsmooth in the control. The system noise can be degenerate, which allows for deterministic control problems as special cases. We prove under suitable conditions that the algorithm converges linearly to a stationary point of the control problem, and is stable with respect to policy updates by approximate gradient steps. The convergence result justifies the recent reinforcement learning heuristics that adding entropy regularization or a fictitious discount factor to the optimization objective accelerates the convergence of policy gradient methods. The proof exploits careful regularity estimates of backward stochastic differential equations.
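For orientation, the update scheme behind a generic proximal gradient method can be sketched as follows; the notation is illustrative and not taken from the paper. For a composite objective \(J(u) = f(u) + g(u)\) with \(f\) smooth and \(g\) convex but possibly nonsmooth, the iteration with step size \(\tau > 0\) reads
\[
u^{(k+1)} = \operatorname{prox}_{\tau g}\big(u^{(k)} - \tau \nabla f(u^{(k)})\big),
\qquad
\operatorname{prox}_{\tau g}(v) = \operatorname*{arg\,min}_u \Big( g(u) + \tfrac{1}{2\tau}\,\|u - v\|^2 \Big).
\]
In the paper's setting, \(u\) plays the role of a feedback control, the smooth part of the cost is differentiated via backward stochastic differential equations, and the nonsmooth control costs enter through the proximal step; the abstract's linear convergence result states that such iterates reach a stationary point at a linear rate despite the nonconvexity of the costs.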
Cites work
- Scientific article (no title available); zbMATH DE number 1577097
- Scientific article (no title available); zbMATH DE number 1121855
- A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems
- A modified MSA for stochastic control problems
- A neural network-based policy iteration algorithm with global \(H^2\)-superlinear convergence for stochastic games on domains
- A numerical scheme for a mean field game in some queueing systems based on Markov chain approximation method
- A regression-based Monte Carlo method to solve backward stochastic differential equations
- BSDEs with polynomial growth generators
- Backward stochastic differential equations. From linear to fully nonlinear theory
- Continuous-time stochastic control and optimization with financial applications
- Convexity and optimization in Banach spaces.
- Deep backward schemes for high-dimensional nonlinear PDEs
- Exponential convergence and stability of Howard's policy improvement algorithm for controlled diffusions
- Introductory lectures on convex optimization. A basic course.
- Lectures on BSDEs, stochastic control, and stochastic differential games with financial applications
- Maximum principle based algorithms for deep learning
- Mean field games and mean field type control theory
- Nonlinear control systems: An introduction
- On the convergence rate of approximation schemes for Hamilton-Jacobi-Bellman equations
- Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon
- Policy gradient in continuous time
- Reinforcement learning. An introduction
- Solving high-dimensional partial differential equations using deep learning
- Sufficient stochastic maximum principle for discounted control problem
- Time discretization and Markovian iteration for coupled FBSDEs
- Time discretization of FBSDE with polynomial growth drivers and reaction-diffusion PDEs
- Variational Analysis
Cited in (5 documents)
- An explicit Milstein-type scheme for interacting particle systems and McKean-Vlasov SDEs with common noise and non-differentiable drift coefficients
- A fast iterative PDE-based algorithm for feedback controls of nonsmooth mean-field control problems
- The modified MSA, a gradient flow and convergence
- Improved order 1/4 convergence for piecewise constant policy approximation of stochastic control problems
- Near optimality of Lipschitz and smooth policies in controlled diffusions