Provably efficient reinforcement learning in decentralized general-sum Markov games
From MaRDI portal
Publication:6159512
Abstract: This paper addresses the problem of learning an equilibrium efficiently in general-sum Markov games through decentralized multi-agent reinforcement learning. Given the fundamental difficulty of calculating a Nash equilibrium (NE), we instead aim at finding a coarse correlated equilibrium (CCE), a solution concept that generalizes NE by allowing possible correlations among the agents' strategies. We propose an algorithm in which each agent independently runs optimistic V-learning (a variant of Q-learning) to efficiently explore the unknown environment, while using a stabilized online mirror descent (OMD) subroutine for policy updates. We show that the agents can find an -approximate CCE in at most episodes, where is the number of states, is the size of the largest individual action space, and is the length of an episode. This appears to be the first sample complexity result for learning in generic general-sum Markov games. Our results rely on a novel investigation of an anytime high-probability regret bound for OMD with a dynamic learning rate and weighted regret, which would be of independent interest. One key feature of our algorithm is that it is fully emph{decentralized}, in the sense that each agent has access to only its local information, and is completely oblivious to the presence of others. This way, our algorithm can readily scale up to an arbitrary number of agents, without suffering from the exponential dependence on the number of agents.
Recommendations
- Robustness and sample complexity of model-based MARL for general-sum Markov games
- Deep Q-Learning for Nash Equilibria: Nash-DQN
- 10.1162/1532443041827880
- Learning to play efficient coarse correlated equilibria
- Partially decentralized reinforcement learning in finite, multi-agent Markov decision processes
Cites work
- scientific article; zbMATH DE number 3790208 (Why is no real title available?)
- scientific article; zbMATH DE number 1306865 (Why is no real title available?)
- 10.1162/1532443041827880
- A Simple Adaptive Procedure Leading to Correlated Equilibrium
- Common Information Based Markov Perfect Equilibria for Stochastic Games With Asymmetric Information: Finite Games
- Computing correlated equilibria in multi-player games
- Convex optimization: algorithms and complexity
- Correlated Equilibrium as an Expression of Bayesian Rationality
- Decentralized Q-Learning for Stochastic Teams and Games
- Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach
- Individual Q-Learning in Normal Form Games
- Near-optimal regret bounds for reinforcement learning
- Prediction, Learning, and Games
- Scale-free online learning
- Settling the complexity of computing two-player Nash equilibria
- Stochastic Games
- Stochastic networked control systems. Stabilization and optimization under information constraints
- Strategically zero-sum games: The class of games whose completely mixed equilibria cannot be improved upon
- Superhuman AI for heads-up no-limit poker: Libratus beats top professionals
- The Complexity of Decentralized Control of Markov Decision Processes
- The complexity of computing a Nash equilibrium
Cited in
(9)- Cournot policy model: rethinking centralized training in multi-agent reinforcement learning
- Decentralized Learning in Finite Markov Chains: Revisited
- Robustness and sample complexity of model-based MARL for general-sum Markov games
- Special issue: multi-agent dynamic decision making and learning
- Multiagent Fully Decentralized Value Function Learning With Linear Convergence Rates
- Byzantine-Resilient Decentralized Policy Evaluation With Linear Function Approximation
- Multi-agent inverse reinforcement learning for certain general-sum stochastic games
- Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium
- Decentralized Q-Learning for Stochastic Teams and Games
This page was built for publication: Provably efficient reinforcement learning in decentralized general-sum Markov games
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q6159512)