Multi-agent reinforcement learning: a selective overview of theories and algorithms
From MaRDI portal
Publication: 2094040
DOI: 10.1007/978-3-030-60990-0_12
OpenAlex: W2991046523
MaRDI QID: Q2094040
FDO: Q2094040
Authors: Kaiqing Zhang, Zhuoran Yang, Tamer Başar
Publication date: 28 October 2022
Full work available at URL: https://arxiv.org/abs/1911.10635
Cites Work
- DeepStack: expert-level artificial intelligence in heads-up no-limit poker
- Harnessing Smoothness to Accelerate Distributed Optimization
- Decentralized Q-Learning for Stochastic Teams and Games
- Sampled fictitious play is Hannan consistent
- An emphatic approach to the problem of off-policy temporal-difference learning
- Superhuman AI for multiplayer poker
- Optimally solving Dec-POMDPs as continuous-state MDPs
- A concise introduction to decentralized POMDPs
- Stochastic Proximal Gradient Consensus Over Random Networks
- Value iteration algorithm for mean-field games
- Finite mean field games: fictitious play and convergence to a first order continuous mean field game
- Approximate Markov-Nash equilibria for discrete-time risk-sensitive mean-field games
- QD-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus + Innovations
- Approximate Nash equilibria in partially observed stochastic games with mean-field interactions
- Distributed learning of average belief over networks using sequential observations
- Distributed Policy Evaluation Under Multiple Behavior Strategies
- Discrete-time average-cost mean-field games on Polish spaces
- Finite-time performance of distributed temporal-difference learning with linear function approximation
- Global convergence of policy gradient methods to (almost) locally optimal policies
- Title not available
- Title not available
- The challenge of poker
- DOI: 10.1162/153244303765208377
- Superhuman AI for heads-up no-limit poker: Libratus beats top professionals
- AWESOME: a general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents
- State of the Art—A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms
- Prediction, Learning, and Games
- Title not available
- DOI: 10.1162/1532443041827880
- Stochastic Games
- Q-learning
- Stochastic approximation: A dynamical systems viewpoint
- Mean field games
- Discounted Markov games: Generalized policy iteration method
- Large population stochastic dynamic games: closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle
- Title not available
- Title not available
- Title not available
- The Nonstochastic Multiarmed Bandit Problem
- Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games
- Mean field games and mean field type control theory
- The Complexity of Decentralized Control of Markov Decision Processes
- Finite-time analysis of the multiarmed bandit problem
- Subjectivity and correlation in randomized strategies
- The weighted majority algorithm
- Belief affirming in learning processes
- Potential games
- Title not available
- Dynamic programming and optimal control, Vol. 1
- A course in game theory
- Multiagent Systems
- A Simple Adaptive Procedure Leading to Correlated Equilibrium
- Distributed Subgradient Methods for Multi-Agent Optimization
- The complexity of computing a Nash equilibrium
- Title not available
- Title not available
- H∞-optimal control and related minimax design problems: A dynamic game approach
- A general class of adaptive strategies
- Consistency and cautious fictitious play
- Multiagent learning using a variable learning rate
- Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
- The Evolution of Conventions
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Efficient computation of equilibria for extensive two-person games
- Adaptive game playing using multiplicative weights
- Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games
- Title not available
- Cooperative Convex Optimization in Networked Systems: Augmented Lagrangian Algorithms With Directed Gossip Communication
- Distributed learning and cooperative control for multi-agent systems
- Natural actor-critic algorithms
- An iterative method of solving a game
- Risk-Sensitive Mean-Field Games
- On Actor-Critic Algorithms
- On the Global Convergence of Stochastic Fictitious Play
- Stochastic networked control systems: Stabilization and optimization under information constraints
- Reinforcement learning: An introduction
- Settling the complexity of computing two-player Nash equilibria
- A Distributed Actor-Critic Algorithm and Applications to Mobile Sensor Network Coordination Problems
- Reinforcement learning with replacing eligibility traces
- Finite-time bounds for fitted value iteration
- Stochastic Approximations and Differential Inclusions
- Near-optimal regret bounds for reinforcement learning
- Title not available
- Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach
- Decomposition of dynamic team decision problems
- Optimal Decentralized Control of Coupled Subsystems With Control Sharing
- Performance Bounds in Lp-norm for Approximate Value Iteration
- Learning, regret minimization, and equilibria
- The complexity of two-person zero-sum games in extensive form
- Fast algorithms for finding randomized strategies in game trees
- Title not available
- If multi-agent learning is the answer, what is the question?
- Algorithms for discounted stochastic games
- On Nonterminating Stochastic Games
- Model-free Q-learning designs for linear discrete-time zero-sum games with application to H∞ control
- Title not available
- An Adaptive Sampling Algorithm for Solving Markov Decision Processes
- Convergence results for single-step on-policy reinforcement-learning algorithms
- Efficient computation of behavior strategies
- Title not available
- Markov-Nash equilibria in mean-field games with discounted cost
- Consistency of vanishingly smooth fictitious play
- Minimizing finite sums with the stochastic average gradient
- Generalised weakened fictitious play
- Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path
- Handbook of dynamic game theory (2 volumes)
- Diffusion Strategies Outperform Consensus Strategies for Distributed Estimation Over Adaptive Networks
- Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor
- Achieving Geometric Convergence for Distributed Optimization Over Time-Varying Graphs
- A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
- Title not available
- A near-optimal polynomial time algorithm for learning in certain classes of stochastic games
- No-regret dynamics and fictitious play
- A comprehensive survey on safe reinforcement learning
- Expected policy gradients for reinforcement learning
- Dynamic Potential Games With Constraints: Fundamentals and Applications in Communications
- Title not available
- Regularized policy iteration with nonparametric function spaces
- Revisiting CFR+ and alternating updates
- Policy evaluation with temporal differences: a survey and comparison
- Distributed Stochastic Approximation: Weak Convergence and Network Design
- Discrete-time stochastic control and dynamic potential games: The Euler-equation approach
Cited In (29)
- Predator-prey survival pressure is sufficient to evolve swarming behaviors
- The possible and the impossible in multi-agent learning
- Scalable Online Planning for Multi-Agent MDPs
- A multiagent reinforcement learning framework for off-policy evaluation in two-sided markets
- Cournot policy model: rethinking centralized training in multi-agent reinforcement learning
- Approximated multi-agent fitted Q iteration
- Entropy regularized actor-critic based multi-agent deep reinforcement learning for stochastic games
- Learning Stationary Nash Equilibrium Policies in n-Player Stochastic Games with Independent Chains
- Multi-agent natural actor-critic reinforcement learning algorithms
- Fictitious play in zero-sum stochastic games
- Robustness and sample complexity of model-based MARL for general-sum Markov games
- On neural networks application in integral sliding mode control
- Statistical inference for generative adversarial networks and other minimax problems
- Zeroth-order algorithms for nonconvex-strongly-concave minimax problems with improved complexities
- Toward multi-target self-organizing pursuit in a partially observable Markov game
- Scalable Reinforcement Learning for Multiagent Networked Systems
- Dynamics and risk sharing in groups of selfish individuals
- Multi-agent reinforcement learning algorithm to solve a partially-observable multi-agent problem in disaster response
- TEAMSTER: model-based reinforcement learning for ad hoc teamwork
- Recent developments in machine learning methods for stochastic control and games
- Independent learning in stochastic games
- Stackelberg population dynamics: a predictive-sensitivity approach
- Fully asynchronous policy evaluation in distributed reinforcement learning over networks
- Reinforcement learning in a prisoner's dilemma
- Mean-field controls with Q-learning for cooperative MARL: convergence and complexity analysis
- A mini review on UAV mission planning
- Title not available
- Finite-time error bounds for distributed linear stochastic approximation
- An optimal Bayesian intervention policy in response to unknown dynamic cell stimuli