Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning

arXiv: 2209.02179; MaRDI QID: Q6409792; FDO: Q6409792

Authors: Jinchi Chen, Jie Feng, Weiguo Gao, Ke Wei

Publication date: 5 September 2022

Abstract: This paper studies a policy optimization problem arising from collaborative multi-agent reinforcement learning in a decentralized setting where agents communicate with their neighbors over an undirected graph to maximize the sum of their cumulative rewards. A novel decentralized natural policy gradient method, dubbed Momentum-based Decentralized Natural Policy Gradient (MDNPG), is proposed, which incorporates natural gradient, momentum-based variance reduction, and gradient tracking into the decentralized stochastic gradient ascent framework. The $\mathcal{O}(n^{-1}\epsilon^{-3})$ sample complexity for MDNPG to converge to an $\epsilon$-stationary point has been established under standard assumptions, where $n$ is the number of agents. It indicates that MDNPG can achieve the optimal convergence rate for decentralized policy gradient methods and possesses a linear speedup in contrast to centralized optimization methods. Moreover, superior empirical performance of MDNPG over other state-of-the-art algorithms has been demonstrated by extensive numerical experiments.
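To make the three ingredients named in the abstract concrete (a preconditioned "natural" ascent direction, momentum-based variance reduction, and gradient tracking over an undirected graph), here is a toy sketch. It is not the authors' MDNPG; their implementation is in the companion repository linked on this page. Everything below is an assumption chosen for illustration: concave quadratic local objectives stand in for each agent's expected return, a fixed diagonal matrix stands in for the inverse Fisher information matrix, a ring graph with uniform 1/3 weights is the communication graph, and a STORM-style recursive momentum estimator (one noise sample reused at the old and new iterate) provides the variance reduction. The function and parameter names (`toy_decentralized_npg`, `ring_mixing_matrix`, `beta`, `damping`, ...) are hypothetical.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for an undirected ring (illustrative topology)."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W


def toy_decentralized_npg(n=4, d=3, steps=2000, eta=0.05, beta=0.1,
                          sigma=0.1, damping=0.1, seed=0):
    """Toy sketch (NOT the paper's MDNPG): decentralized preconditioned ascent with
    STORM-style momentum and gradient tracking on concave quadratic objectives."""
    rng = np.random.default_rng(seed)
    # Hypothetical local objectives J_i(x) = -0.5 * (x - c_i)^T A_i (x - c_i), A_i diagonal.
    c = rng.normal(size=(n, d))
    a = rng.uniform(0.5, 2.0, size=(n, d))   # diagonals of A_i
    precond = 1.0 / (a + damping)            # diagonal stand-in for an inverse Fisher matrix
    W = ring_mixing_matrix(n)

    def stoch_grad(x, noise):
        # Row i: gradient of J_i at agent i's parameters, plus a sampled noise term.
        return -a * (x - c) + noise

    x = np.zeros((n, d))                                 # one parameter row per agent
    u = stoch_grad(x, sigma * rng.normal(size=(n, d)))  # momentum-based gradient estimator
    y = u.copy()                                         # gradient tracker, initialized to u
    for _ in range(steps):
        # Mix with neighbors, then step along the locally preconditioned tracked direction.
        x_new = W @ x + eta * precond * y
        # Same noise sample at the new and old iterate (recursive momentum correction).
        noise = sigma * rng.normal(size=(n, d))
        u_new = stoch_grad(x_new, noise) + (1.0 - beta) * (u - stoch_grad(x, noise))
        # Gradient tracking: the sum of y over agents always equals the sum of u.
        y = W @ y + u_new - u
        x, u = x_new, u_new

    # Maximizer of sum_i J_i in the diagonal case: (sum_i A_i)^{-1} sum_i A_i c_i.
    x_star = (a * c).sum(axis=0) / a.sum(axis=0)
    return x, x_star


x, x_star = toy_decentralized_npg(sigma=0.1)
print(np.linalg.norm(x.mean(axis=0) - x_star))  # distance of the network average to the optimum
```

With `sigma=0.0` the estimator is exact and the agents converge to consensus at the maximizer of the summed objective; with noise, the iterates hover near it. Because the mixing matrix is doubly stochastic and the tracker starts at `u`, the network-wide sum of `y` always equals the sum of the current gradient estimates, which is what lets each agent follow the global ascent direction using only neighbor communication.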

Has companion code repository: https://github.com/fccc0417/mdnpg

