Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning
From MaRDI portal
Publication: 6409792
arXiv: 2209.02179
MaRDI QID: Q6409792
Authors: Jinchi Chen, Jie Feng, Weiguo Gao, Ke Wei
Publication date: 5 September 2022
Abstract: This paper studies a policy optimization problem arising from collaborative multi-agent reinforcement learning in a decentralized setting, where agents communicate with their neighbors over an undirected graph to maximize the sum of their cumulative rewards. A novel decentralized natural policy gradient method, dubbed Momentum-based Decentralized Natural Policy Gradient (MDNPG), is proposed, which incorporates natural gradient, momentum-based variance reduction, and gradient tracking into the decentralized stochastic gradient ascent framework. The $\mathcal{O}(n^{-1}\epsilon^{-3})$ sample complexity for MDNPG to converge to an $\epsilon$-stationary point has been established under standard assumptions, where $n$ is the number of agents. This indicates that MDNPG can achieve the optimal convergence rate for decentralized policy gradient methods and possesses a linear speedup over centralized optimization methods. Moreover, superior empirical performance of MDNPG over other state-of-the-art algorithms is demonstrated by extensive numerical experiments.
Has companion code repository: https://github.com/fccc0417/mdnpg
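The sketch below is a rough, hypothetical illustration of how the three ingredients named in the abstract (momentum-based variance reduction, gradient tracking, and a natural-gradient step with consensus mixing) can be combined in one decentralized iteration. It is not the code from the linked repository or the authors' algorithm as specified in the paper; the stochastic gradient oracle, Fisher estimate, ring-graph mixing matrix, and step sizes are placeholder assumptions.

```python
import numpy as np

# Minimal sketch (placeholder, not the paper's implementation) of an
# MDNPG-style iteration: STORM-style momentum variance reduction,
# gradient tracking, consensus mixing over W, and a natural-gradient step.

rng = np.random.default_rng(0)
n_agents, dim = 4, 5

# Doubly stochastic mixing matrix for a ring graph (illustrative choice).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i + 1) % n_agents] = 0.25
    W[i, (i - 1) % n_agents] = 0.25

def stoch_grad(theta, i, xi):
    """Stand-in for agent i's stochastic policy gradient on sample xi."""
    return -(theta - i) + 0.1 * xi

def fisher(theta):
    """Stand-in for a regularized estimate of the Fisher information."""
    return np.eye(dim) + 0.1 * np.outer(theta, theta)

eta, beta = 0.1, 0.9  # step size and momentum parameter (illustrative values)
theta = rng.standard_normal((n_agents, dim))  # local policy parameters
u = np.array([stoch_grad(theta[i], i, rng.standard_normal(dim))
              for i in range(n_agents)])      # momentum gradient estimates
y = u.copy()                                  # gradient-tracking variables

for t in range(50):
    theta_old, u_old = theta.copy(), u.copy()

    # Consensus mixing of parameters plus a natural-gradient ascent step.
    direction = np.array([np.linalg.solve(fisher(theta[i]), y[i])
                          for i in range(n_agents)])
    theta = W @ theta + eta * direction

    # STORM-style momentum estimator: the same sample xi evaluates the
    # gradient at both the new and the old parameters.
    for i in range(n_agents):
        xi = rng.standard_normal(dim)
        g_new = stoch_grad(theta[i], i, xi)
        g_old = stoch_grad(theta_old[i], i, xi)
        u[i] = g_new + (1 - beta) * (u_old[i] - g_old)

    # Gradient tracking: mix the trackers and add the local estimator change.
    y = W @ y + (u - u_old)
```

In this toy setup each agent ascends a simple concave surrogate objective, so the consensus parameters drift toward agreement while the trackers follow the average gradient estimate; in the actual method the oracle would return sampled policy gradients from trajectories.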