Multi-agent reinforcement learning: a selective overview of theories and algorithms
From MaRDI portal
Publication: 2094040
DOI: 10.1007/978-3-030-60990-0_12
OpenAlex: W2991046523
MaRDI QID: Q2094040
Kaiqing Zhang, Zhuoran Yang, Tamer Başar
Publication date: 28 October 2022
Full work available at URL: https://arxiv.org/abs/1911.10635
Related Items
- Scalable Reinforcement Learning for Multiagent Networked Systems
- Scalable Online Planning for Multi-Agent MDPs
- Stackelberg population dynamics: a predictive-sensitivity approach
- Fictitious Play in Zero-Sum Stochastic Games
- Toward multi-target self-organizing pursuit in a partially observable Markov game
- Zeroth-order algorithms for nonconvex-strongly-concave minimax problems with improved complexities
- TEAMSTER: model-based reinforcement learning for ad hoc teamwork
- A multiagent reinforcement learning framework for off-policy evaluation in two-sided markets
- Entropy regularized actor-critic based multi-agent deep reinforcement learning for stochastic games
- Learning Stationary Nash Equilibrium Policies in \(n\)-Player Stochastic Games with Independent Chains
- Multi-agent natural actor-critic reinforcement learning algorithms
- Robustness and sample complexity of model-based MARL for general-sum Markov games
- Approximated multi-agent fitted Q iteration
- Independent learning in stochastic games
- A mini review on UAV mission planning
- Dynamics and risk sharing in groups of selfish individuals
- Fully asynchronous policy evaluation in distributed reinforcement learning over networks
- Mean-Field Controls with Q-Learning for Cooperative MARL: Convergence and Complexity Analysis
Uses Software
Cites Work
- Stochastic Approximations and Differential Inclusions
- An Adaptive Sampling Algorithm for Solving Markov Decision Processes
- Mean Field Games and Mean Field Type Control Theory
- Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach
- Optimal Decentralized Control of Coupled Subsystems With Control Sharing
- Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor
- Performance Bounds in $L_p$-norm for Approximate Value Iteration
- Prediction, Learning, and Games
- On the Global Convergence of Stochastic Fictitious Play
- On Nonterminating Stochastic Games
- The Complexity of Decentralized Control of Markov Decision Processes
- Stochastic Games
- \(H^\infty\)-optimal control and related minimax design problems. A dynamic game approach.
- A general class of adaptive strategies
- The challenge of poker
- Finite-time analysis of the multiarmed bandit problem
- Minimizing finite sums with the stochastic average gradient
- Consistency and cautious fictitious play
- Model-free \(Q\)-learning designs for linear discrete-time zero-sum games with application to \(H^\infty\) control
- Distributed learning and cooperative control for multi-agent systems
- Mean field games
- Stochastic approximation. A dynamical systems viewpoint.
- If multi-agent learning is the answer, what is the question?
- Natural actor-critic algorithms
- The complexity of two-person zero-sum games in extensive form
- Subjectivity and correlation in randomized strategies
- Discounted Markov games: Generalized policy iteration method
- The weighted majority algorithm
- Belief affirming in learning processes
- Convergence results for single-step on-policy reinforcement-learning algorithms
- A near-optimal polynomial time algorithm for learning in certain classes of stochastic games
- Multiagent learning using a variable learning rate
- Sampled fictitious play is Hannan consistent
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- \({\mathcal Q}\)-learning
- Efficient computation of behavior strategies
- Efficient computation of equilibria for extensive two-person games
- Potential games
- Adaptive game playing using multiplicative weights
- Stochastic networked control systems. Stabilization and optimization under information constraints
- No-regret dynamics and fictitious play
- Value iteration algorithm for mean-field games
- Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games
- Distributed learning of average belief over networks using sequential observations
- Finite mean field games: fictitious play and convergence to a first order continuous mean field game
- AWESOME: a general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents
- Large population stochastic dynamic games: closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle
- Generalised weakened fictitious play
- Algorithms for discounted stochastic games
- An iterative method of solving a game
- Optimally Solving Dec-POMDPs as Continuous-State MDPs
- Fast algorithms for finding randomized strategies in game trees
- Discrete-Time Stochastic Control and Dynamic Potential Games
- A Concise Introduction to Decentralized POMDPs
- Distributed Stochastic Approximation: Weak Convergence and Network Design
- Distributed Policy Evaluation Under Multiple Behavior Strategies
- Risk-Sensitive Mean-Field Games
- DOI: 10.1162/153244303765208377
- Revisiting CFR+ and Alternating Updates
- Approximate Markov-Nash Equilibria for Discrete-Time Risk-Sensitive Mean-Field Games
- Settling the complexity of computing two-player Nash equilibria
- Multiagent Systems
- State of the Art—A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms
- Decomposition of dynamic team decision problems
- On Actor-Critic Algorithms
- A Simple Adaptive Procedure Leading to Correlated Equilibrium
- Markov–Nash Equilibria in Mean-Field Games with Discounted Cost
- Cooperative Convex Optimization in Networked Systems: Augmented Lagrangian Algorithms With Directed Gossip Communication
- Diffusion Strategies Outperform Consensus Strategies for Distributed Estimation Over Adaptive Networks
- \(\mathcal{QD}\)-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus + Innovations
- Achieving Geometric Convergence for Distributed Optimization Over Time-Varying Graphs
- Stochastic Proximal Gradient Consensus Over Random Networks
- Dynamic Potential Games With Constraints: Fundamentals and Applications in Communications
- Harnessing Smoothness to Accelerate Distributed Optimization
- DeepStack: Expert-level artificial intelligence in heads-up no-limit poker
- The Nonstochastic Multiarmed Bandit Problem
- DOI: 10.1162/1532443041827880
- Sample mean based index policies by O(log n) regret for the multi-armed bandit problem
- Superhuman AI for heads-up no-limit poker: Libratus beats top professionals
- Discrete-time average-cost mean-field games on Polish spaces
- Distributed Subgradient Methods for Multi-Agent Optimization
- Handbook of Dynamic Game Theory
- A Distributed Actor-Critic Algorithm and Applications to Mobile Sensor Network Coordination Problems
- Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation
- Approximate Nash Equilibria in Partially Observed Stochastic Games with Mean-Field Interactions
- Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies
- Consistency of Vanishingly Smooth Fictitious Play
- Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games
- The Complexity of Computing a Nash Equilibrium
- Superhuman AI for multiplayer poker
- A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
- Decentralized Q-Learning for Stochastic Teams and Games
- The Evolution of Conventions
- Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path