A convex optimization approach to dynamic programming in continuous state and action spaces

From MaRDI portal
Publication:831365

DOI10.1007/S10957-020-01747-1zbMATH Open1467.90085arXiv1810.03847OpenAlexW3104081151MaRDI QIDQ831365FDOQ831365


Authors: Insoon Yang Edit this on Wikidata


Publication date: 11 May 2021

Published in: Journal of Optimization Theory and Applications (Search for Journal in Brave)

Abstract: In this paper, a convex optimization-based method is proposed for numerically solving dynamic programs in continuous state and action spaces. The key idea is to approximate the output of the Bellman operator at a particular state by the optimal value of a convex program. The approximate Bellman operator has a computational advantage because it involves a convex optimization problem in the case of control-affine systems and convex costs. Using this feature, we propose a simple dynamic programming algorithm to evaluate the approximate value function at pre-specified grid points by solving convex optimization problems in each iteration. We show that the proposed method approximates the optimal value function with a uniform convergence property in the case of convex optimal value functions. We also propose an interpolation-free design method for a control policy, of which performance converges uniformly to the optimum as the grid resolution becomes finer. When a nonlinear control-affine system is considered, the convex optimization approach provides an approximate policy with a provable suboptimality bound. For general cases, the proposed convex formulation of dynamic programming operators can be modified as a nonconvex bi-level program, in which the inner problem is a linear program, without losing uniform convergence properties.


Full work available at URL: https://arxiv.org/abs/1810.03847




Recommendations




Cites Work


Cited In (8)





This page was built for publication: A convex optimization approach to dynamic programming in continuous state and action spaces

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q831365)