Error controlled actor-critic


DOI: 10.1016/J.INS.2022.08.079
arXiv: 2109.02517
MaRDI QID: Q6205028


Authors: Xingen Gao, Fei Chao, Chang-Le Zhou, Zhen Ge, Longzhi Yang, Xiang Chang, Changjing Shang, Qiang Shen


Publication date: 11 April 2024

Published in: Information Sciences

Abstract: The approximation error of the value function inevitably causes an overestimation phenomenon and has a negative impact on the convergence of the algorithms. To mitigate the negative effects of the approximation error, we propose Error Controlled Actor-Critic, which confines the approximation error of the value function. We present an analysis of how the approximation error can hinder the optimization process of actor-critic methods. We then derive an upper bound on the approximation error of the Q-function approximator and find that the error can be lowered by restricting the KL-divergence between every two consecutive policies during policy training. The results of experiments on a range of continuous control tasks demonstrate that the proposed actor-critic algorithm considerably reduces the approximation error and significantly outperforms other model-free RL algorithms.
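
The following is a minimal sketch, not the authors' released code, of the central mechanism the abstract describes: an actor-critic policy update in which the KL-divergence between the previous policy and the current one is added as a penalty, so that consecutive policies stay close and the Q-function approximation error remains bounded. It assumes a PyTorch setting; the network shapes, the kl_coef weight, and helper names such as GaussianPolicy and QCritic are illustrative assumptions.

```python
# A minimal sketch, not the authors' implementation: a PyTorch actor-critic
# policy step that penalizes the KL-divergence between two consecutive
# policies, the mechanism the abstract credits with bounding the Q-function
# approximation error. GaussianPolicy, QCritic and kl_coef are assumptions.
import copy
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence


class GaussianPolicy(nn.Module):
    """Diagonal-Gaussian policy over continuous actions."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs: torch.Tensor) -> Normal:
        return Normal(self.mu(self.body(obs)), self.log_std.exp())


class QCritic(nn.Module):
    """State-action value approximator Q(s, a)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))


def actor_update(policy, old_policy, critic, optimizer, obs, kl_coef=1.0):
    """One policy step: maximize Q while keeping KL(old policy || new policy) small."""
    new_dist = policy.dist(obs)
    actions = new_dist.rsample()                      # reparameterized actions
    q_values = critic(obs, actions)                   # critic's value estimate
    kl = kl_divergence(old_policy.dist(obs), new_dist).sum(-1).mean()
    loss = -q_values.mean() + kl_coef * kl            # KL term limits the policy shift
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), kl.item()


if __name__ == "__main__":
    obs_dim, act_dim = 8, 2
    policy, critic = GaussianPolicy(obs_dim, act_dim), QCritic(obs_dim, act_dim)
    optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
    obs = torch.randn(32, obs_dim)                    # dummy batch of states
    old_policy = copy.deepcopy(policy)                # snapshot of the previous policy
    for p in old_policy.parameters():
        p.requires_grad_(False)
    print(actor_update(policy, old_policy, critic, optimizer, obs))
```

In a full training loop the snapshot would be the policy from the previous update, so the penalty explicitly keeps consecutive policies close; the critic would be trained separately from temporal-difference targets.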


Full work available at URL: https://arxiv.org/abs/2109.02517













