Exponential Convergence and Stability of Howard's Policy Improvement Algorithm for Controlled Diffusions


DOI: 10.1137/19M1236758 · zbMATH Open: 1441.93343 · arXiv: 1812.07846 · Wikidata: Q114978697 · Scholia: Q114978697 · MaRDI QID: Q5111071 · FDO: Q5111071

Lukasz Szpruch, B. Kerimkulov, David Šiška

Publication date: 26 May 2020

Published in: SIAM Journal on Control and Optimization

Abstract: Optimal control problems are inherently hard to solve, as the optimization must be performed simultaneously with updating the underlying system. Starting from an initial guess, Howard's policy improvement algorithm separates the step of updating the trajectory of the dynamical system from the optimization step, and iterating these steps should converge to the optimal control. In the discrete space-time setting this is often the case, and even rates of convergence are known. In the continuous space-time setting of controlled diffusions, each iteration of the algorithm consists of solving a linear PDE followed by a maximization problem. This has been shown to converge in some situations; however, no global rate of convergence is known. The first main contribution of this paper is to establish a global rate of convergence for the policy improvement algorithm and a variant, called here the gradient iteration algorithm. The second main contribution is the proof of stability of the algorithms under perturbations to both the accuracy of the linear PDE solution and the accuracy of the maximization step. The proof technique is new in this context, as it uses the theory of backward stochastic differential equations.
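
For orientation, the two-step iteration described in the abstract can be sketched as follows; the notation (value function v^n, controlled generator L^a, running reward f, terminal reward g, action set A) is standard for such problems but is assumed here rather than quoted from the paper.

\begin{align*}
&\text{(policy evaluation: linear PDE)} &&
  \partial_t v^n + L^{a^n} v^n + f(\,\cdot\,,\cdot\,,a^n) = 0
  \ \text{ on } [0,T)\times\mathbb{R}^d,
  \qquad v^n(T,\cdot) = g,\\
&\text{(policy improvement: maximization)} &&
  a^{n+1}(t,x) \in \operatorname*{arg\,max}_{a\in A}
  \bigl[\, L^{a} v^n(t,x) + f(t,x,a) \,\bigr],
\end{align*}
where
$L^{a}\varphi(x) = b(x,a)\cdot\nabla\varphi(x)
  + \tfrac12 \operatorname{tr}\bigl(\sigma\sigma^{\top}(x,a)\,\nabla^2\varphi(x)\bigr)$
is the generator of the controlled diffusion
$\mathrm{d}X_t = b(X_t,a)\,\mathrm{d}t + \sigma(X_t,a)\,\mathrm{d}W_t$ with the action frozen at $a$.

Each evaluation step only requires solving a linear PDE for the fixed policy a^n, and each improvement step is a pointwise maximization; per the title and abstract, the paper shows this iteration converges at an exponential global rate and remains stable when both steps are carried out only approximately.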


Full work available at URL: https://arxiv.org/abs/1812.07846







Cited In (13)






