Error controlled actor-critic


DOI: 10.1016/J.INS.2022.08.079
arXiv: 2109.02517
MaRDI QID: Q6205028


Authors: Xingen Gao, Fei Chao, Chang-Le Zhou, Zhen Ge, Longzhi Yang, Xiang Chang, Changjing Shang, Qiang Shen


Publication date: 11 April 2024

Published in: Information Sciences

Abstract: The approximation error of the value function inevitably causes an overestimation phenomenon and has a negative impact on the convergence of the algorithms. To mitigate the negative effects of the approximation error, we propose Error Controlled Actor-Critic, which confines the approximation error of the value function. We present an analysis of how the approximation error can hinder the optimization process of actor-critic methods. We then derive an upper bound on the approximation error of the Q-function approximator and find that the error can be lowered by restricting the KL-divergence between every two consecutive policies during policy training. The results of experiments on a range of continuous control tasks demonstrate that the proposed actor-critic algorithm considerably reduces the approximation error and significantly outperforms other model-free RL algorithms.
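
The following is a minimal sketch, not the authors' released code, of the central mechanism the abstract describes: an actor-critic policy update in which the KL-divergence between the previous policy and the current one is added as a penalty, so that consecutive policies stay close and the Q-function approximation error remains bounded. It assumes a PyTorch setting; the network shapes, the kl_coef weight, and helper names such as GaussianPolicy and QCritic are illustrative assumptions.

```python
# A minimal sketch, not the authors' implementation: a PyTorch actor-critic
# policy step that penalizes the KL-divergence between two consecutive
# policies, the mechanism the abstract credits with bounding the Q-function
# approximation error. GaussianPolicy, QCritic and kl_coef are assumptions.
import copy
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence


class GaussianPolicy(nn.Module):
    """Diagonal-Gaussian policy over continuous actions."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs: torch.Tensor) -> Normal:
        return Normal(self.mu(self.body(obs)), self.log_std.exp())


class QCritic(nn.Module):
    """State-action value approximator Q(s, a)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))


def actor_update(policy, old_policy, critic, optimizer, obs, kl_coef=1.0):
    """One policy step: maximize Q while keeping KL(old policy || new policy) small."""
    new_dist = policy.dist(obs)
    actions = new_dist.rsample()                      # reparameterized actions
    q_values = critic(obs, actions)                   # critic's value estimate
    kl = kl_divergence(old_policy.dist(obs), new_dist).sum(-1).mean()
    loss = -q_values.mean() + kl_coef * kl            # KL term limits the policy shift
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), kl.item()


if __name__ == "__main__":
    obs_dim, act_dim = 8, 2
    policy, critic = GaussianPolicy(obs_dim, act_dim), QCritic(obs_dim, act_dim)
    optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
    obs = torch.randn(32, obs_dim)                    # dummy batch of states
    old_policy = copy.deepcopy(policy)                # snapshot of the previous policy
    for p in old_policy.parameters():
        p.requires_grad_(False)
    print(actor_update(policy, old_policy, critic, optimizer, obs))
```

In a full training loop the snapshot would be the policy from the previous update, so the penalty explicitly keeps consecutive policies close; the critic would be trained separately from temporal-difference targets.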


Full work available at URL: https://arxiv.org/abs/2109.02517













