Model-based reinforcement learning for approximate optimal regulation

DOI: 10.1016/J.AUTOMATICA.2015.10.039 | zbMATH Open: 1329.93051 | arXiv: 1304.3477 | OpenAlex: W2086975818 | MaRDI QID: Q899267 | FDO: Q899267

Authors: Patrick Walters, Warren E. Dixon, Rushikesh Kamalapurkar

Publication date: 23 December 2015

Published in: Automatica

Abstract: In deterministic systems, reinforcement learning-based online approximate optimal control methods typically require a restrictive persistence of excitation (PE) condition for convergence. This paper presents a concurrent learning-based solution to the online approximate optimal regulation problem that eliminates the need for PE. The development is based on the observation that given a model of the system, the Bellman error, which quantifies the deviation of the system Hamiltonian from the optimal Hamiltonian, can be evaluated at any point in the state space. Further, a concurrent learning-based parameter identifier is developed to compensate for parametric uncertainty in the plant dynamics. Uniformly ultimately bounded (UUB) convergence of the system states to the origin, and UUB convergence of the developed policy to the optimal policy are established using a Lyapunov-based analysis, and simulations are performed to demonstrate the performance of the developed controller.
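
The abstract's central observation, that a known model lets the Bellman error be evaluated at any point in the state space, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: a hypothetical scalar plant x_dot = a*x + b*u, a quadratic cost q*x^2 + r*u^2, a single basis function sigma(x) = x^2, and the names `policy` and `bellman_error`. The optimal Hamiltonian is zero for this problem, so the Bellman error is simply the Hamiltonian of the approximate value function under the approximate policy.

```python
import numpy as np

# Hypothetical scalar plant x_dot = A*x + B*u with cost Q*x^2 + R*u^2.
# These values are illustrative assumptions, not the paper's example.
A, B, Q, R = -1.0, 1.0, 1.0, 1.0


def sigma_grad(x):
    """Gradient of the single assumed basis function sigma(x) = x^2."""
    return 2.0 * x


def policy(x, w):
    """Approximate policy u = -(1/2) R^-1 B * dV/dx, with V(x) = w * x^2."""
    return -0.5 / R * B * sigma_grad(x) * w


def bellman_error(x, w):
    """Hamiltonian of the approximate value function at state x.

    Because the model (A, B) is known, x may be any point in the state
    space, not only a state visited along the system trajectory.
    """
    u = policy(x, w)
    x_dot = A * x + B * u              # model-based prediction of the dynamics
    value_grad = w * sigma_grad(x)     # dV/dx for V(x) = w * x^2
    return value_grad * x_dot + Q * x**2 + R * u**2


# Evaluate at arbitrary, unvisited sample points.
sample_points = np.linspace(-2.0, 2.0, 9)

# For this plant the scalar Riccati equation gives w* = sqrt(2) - 1.
w_opt = np.sqrt(2.0) - 1.0
errors_opt = bellman_error(sample_points, w_opt)   # vanishes at every point
errors_bad = bellman_error(sample_points, 1.0)     # nonzero away from x = 0
```

In a scheme of the kind the abstract describes, errors sampled this way at off-trajectory points would drive the weight update, which is how such extrapolated data can stand in for persistence of excitation. The update law itself is not sketched here.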

Full work available at URL: https://arxiv.org/abs/1304.3477

zbMATH Keywords

system identification; adaptive control; data-based control; model-based reinforcement learning; concurrent learning; simulated experience

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05); Applications of optimal control and differential games (49N90); System identification (93B30); Adaptive control/observation systems (93C40)

Cited In (30)

Uses Software

PILCO
