Multi-agent reinforcement learning: a selective overview of theories and algorithms (Q2094040): Difference between revisions

@@ Property / cites work @@
+Superhuman AI for multiplayer poker
@@ Property / cites work: Superhuman AI for multiplayer poker / rank @@
+Normal rank
@@ Property / cites work @@
+Distributed learning and cooperative control for multi-agent systems
+Normal rank
@@ Property / cites work @@
+Dynamic Potential Games With Constraints: Fundamentals and Applications in Communications
+Normal rank
@@ Property / cites work @@
+A Concise Introduction to Decentralized POMDPs
@@ Property / cites work: A Concise Introduction to Decentralized POMDPs / rank @@
+Normal rank
@@ Property / cites work @@
+Decentralized Q-Learning for Stochastic Teams and Games
+Normal rank
@@ Property / cites work @@
+State of the Art—A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms
+Normal rank
@@ Property / cites work @@
+Q3376698
@@ Property / cites work: Q3376698 / rank @@
+Normal rank
@@ Property / cites work @@
+\({\mathcal Q}\)-learning
@@ Property / cites work: \({\mathcal Q}\)-learning / rank @@
+Normal rank
@@ Property / cites work @@
+Convergence results for single-step on-policy reinforcement-learning algorithms
+Normal rank
@@ Property / cites work @@
+An Adaptive Sampling Algorithm for Solving Markov Decision Processes
+Normal rank
@@ Property / cites work @@
+Sample mean based index policies by O(log n) regret for the multi-armed bandit problem
+Normal rank
@@ Property / cites work @@
+Finite-time analysis of the multiarmed bandit problem
+Normal rank
@@ Property / cites work @@
+Q4626283
@@ Property / cites work: Q4626283 / rank @@
+Normal rank
@@ Property / cites work @@
+Q2934010
@@ Property / cites work: Q2934010 / rank @@
+Normal rank
@@ Property / cites work @@
+Simple statistical gradient-following algorithms for connectionist reinforcement learning
+Normal rank
@@ Property / cites work @@
+Q4533362
@@ Property / cites work: Q4533362 / rank @@
+Normal rank
@@ Property / cites work @@
+On Actor-Critic Algorithms
@@ Property / cites work: On Actor-Critic Algorithms / rank @@
+Normal rank
@@ Property / cites work @@
+Natural actor-critic algorithms
@@ Property / cites work: Natural actor-critic algorithms / rank @@
+Normal rank
@@ Property / cites work @@
+Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies
+Normal rank
@@ Property / cites work @@
+Stochastic Games
@@ Property / cites work: Stochastic Games / rank @@
+Normal rank
@@ Property / cites work @@
+Q4226167
@@ Property / cites work: Q4226167 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4382287
@@ Property / cites work: Q4382287 / rank @@
+Normal rank
@@ Property / cites work @@
+Decomposition of dynamic team decision problems
@@ Property / cites work: Decomposition of dynamic team decision problems / rank @@
+Normal rank
@@ Property / cites work @@
+Discrete–Time Stochastic Control and Dynamic Potential Games
+Normal rank
@@ Property / cites work @@
+\({\mathcal Q}{\mathcal D}\)-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus + Innovations
+Normal rank
@@ Property / cites work @@
+Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games
+Normal rank
@@ Property / cites work @@
+\(H^ \infty\)-optimal control and related minimax design problems. A dynamic game approach.
+Normal rank
@@ Property / cites work @@
+10.1162/1532443041827880
@@ Property / cites work: 10.1162/1532443041827880 / rank @@
+Normal rank
@@ Property / cites work @@
+The Complexity of Decentralized Control of Markov Decision Processes
+Normal rank
@@ Property / cites work @@
+Q3576736
@@ Property / cites work: Q3576736 / rank @@
+Normal rank
@@ Property / cites work @@
+Multiagent Systems
@@ Property / cites work: Multiagent Systems / rank @@
+Normal rank
@@ Property / cites work @@
+The complexity of two-person zero-sum games in extensive form
+Normal rank
@@ Property / cites work @@
+Q5817864
@@ Property / cites work: Q5817864 / rank @@
+Normal rank
@@ Property / cites work @@
+AWESOME: a general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents
+Normal rank
@@ Property / cites work @@
+If multi-agent learning is the answer, what is the question?
+Normal rank
@@ Property / cites work @@
+Multiagent learning using a variable learning rate
+Normal rank
@@ Property / cites work @@
+Q3524713
@@ Property / cites work: Q3524713 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4552708
@@ Property / cites work: Q4552708 / rank @@
+Normal rank
@@ Property / cites work @@
+Q5148965
@@ Property / cites work: Q5148965 / rank @@
+Normal rank
@@ Property / cites work @@
+Optimal Decentralized Control of Coupled Subsystems With Control Sharing
+Normal rank
@@ Property / cites work @@
+Distributed Policy Evaluation Under Multiple Behavior Strategies
+Normal rank
@@ Property / cites work @@
+Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation
+Normal rank
@@ Property / cites work @@
+The Evolution of Conventions
@@ Property / cites work: The Evolution of Conventions / rank @@
+Normal rank
@@ Property / cites work @@
+Potential games
@@ Property / cites work: Potential games / rank @@
+Normal rank
@@ Property / cites work @@
+Handbook of Dynamic Game Theory
@@ Property / cites work: Handbook of Dynamic Game Theory / rank @@
+Normal rank
@@ Property / cites work @@
+Large population stochastic dynamic games: closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle
+Normal rank
@@ Property / cites work @@
+Mean field games
@@ Property / cites work: Mean field games / rank @@
+Normal rank
@@ Property / cites work @@
+Mean Field Games and Mean Field Type Control Theory
+Normal rank
@@ Property / cites work @@
+Risk-Sensitive Mean-Field Games
@@ Property / cites work: Risk-Sensitive Mean-Field Games / rank @@
+Normal rank
@@ Property / cites work @@
+Stochastic networked control systems. Stabilization and optimization under information constraints
+Normal rank
@@ Property / cites work @@
+Distributed learning of average belief over networks using sequential observations
+Normal rank
@@ Property / cites work @@
+Distributed Subgradient Methods for Multi-Agent Optimization
+Normal rank
@@ Property / cites work @@
+Cooperative Convex Optimization in Networked Systems: Augmented Lagrangian Algorithms With Directed Gossip Communication
+Normal rank
@@ Property / cites work @@
+Diffusion Strategies Outperform Consensus Strategies for Distributed Estimation Over Adaptive Networks
+Normal rank
@@ Property / cites work @@
+Q4969098
@@ Property / cites work: Q4969098 / rank @@
+Normal rank
@@ Property / cites work @@
+Q2810885
@@ Property / cites work: Q2810885 / rank @@
+Normal rank
@@ Property / cites work @@
+A Distributed Actor-Critic Algorithm and Applications to Mobile Sensor Network Coordination Problems
+Normal rank
@@ Property / cites work @@
+Stochastic Proximal Gradient Consensus Over Random Networks
+Normal rank
@@ Property / cites work @@
+Achieving Geometric Convergence for Distributed Optimization Over Time-Varying Graphs
+Normal rank
@@ Property / cites work @@
+Performance Bounds in $L_p$‐norm for Approximate Value Iteration
+Normal rank
@@ Property / cites work @@
+Q3096132
@@ Property / cites work: Q3096132 / rank @@
+Normal rank
@@ Property / cites work @@
+Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path
+Normal rank
@@ Property / cites work @@
+Harnessing Smoothness to Accelerate Distributed Optimization
+Normal rank
@@ Property / cites work @@
+Minimizing finite sums with the stochastic average gradient
+Normal rank
@@ Property / cites work @@
+Q5477862
@@ Property / cites work: Q5477862 / rank @@
+Normal rank
@@ Property / cites work @@
+Distributed Stochastic Approximation: Weak Convergence and Network Design
+Normal rank
@@ Property / cites work @@
+Q3624177
@@ Property / cites work: Q3624177 / rank @@
+Normal rank
@@ Property / cites work @@
+Optimally Solving Dec-POMDPs as Continuous-State MDPs
+Normal rank
@@ Property / cites work @@
+Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach
+Normal rank
@@ Property / cites work @@
+Q4027186
@@ Property / cites work: Q4027186 / rank @@
+Normal rank
@@ Property / cites work @@
+The Complexity of Computing a Nash Equilibrium
@@ Property / cites work: The Complexity of Computing a Nash Equilibrium / rank @@
+Normal rank
@@ Property / cites work @@
+Q3433855
@@ Property / cites work: Q3433855 / rank @@
+Normal rank
@@ Property / cites work @@
+On Nonterminating Stochastic Games
@@ Property / cites work: On Nonterminating Stochastic Games / rank @@
+Normal rank
@@ Property / cites work @@
+Discounted Markov games: Generalized policy iteration method
+Normal rank
@@ Property / cites work @@
+Algorithms for discounted stochastic games
@@ Property / cites work: Algorithms for discounted stochastic games / rank @@
+Normal rank
@@ Property / cites work @@
+Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor
+Normal rank
@@ Property / cites work @@
+Model-free \(Q\)-learning designs for linear discrete-time zero-sum games with application to \(H^\infty\) control
+Normal rank
@@ Property / cites work @@
+Q2834459
@@ Property / cites work: Q2834459 / rank @@
+Normal rank
@@ Property / cites work @@
+Q2896090
@@ Property / cites work: Q2896090 / rank @@
+Normal rank
@@ Property / cites work @@
+Fast algorithms for finding randomized strategies in game trees
+Normal rank
@@ Property / cites work @@
+Efficient computation of behavior strategies
@@ Property / cites work: Efficient computation of behavior strategies / rank @@
+Normal rank
@@ Property / cites work @@
+Efficient computation of equilibria for extensive two-person games
+Normal rank
@@ Property / cites work @@
+Q4506458
@@ Property / cites work: Q4506458 / rank @@
+Normal rank
@@ Property / cites work @@
+Q3245635
@@ Property / cites work: Q3245635 / rank @@
+Normal rank
@@ Property / cites work @@
+Q5808755
@@ Property / cites work: Q5808755 / rank @@
+Normal rank
@@ Property / cites work @@
+An iterative method of solving a game
@@ Property / cites work: An iterative method of solving a game / rank @@
+Normal rank
@@ Property / cites work @@
+Stochastic Approximations and Differential Inclusions
+Normal rank
@@ Property / cites work @@
+A general class of adaptive strategies
@@ Property / cites work: A general class of adaptive strategies / rank @@
+Normal rank
@@ Property / cites work @@
+Belief affirming in learning processes
@@ Property / cites work: Belief affirming in learning processes / rank @@
+Normal rank
@@ Property / cites work @@
+No-regret dynamics and fictitious play
@@ Property / cites work: No-regret dynamics and fictitious play / rank @@
+Normal rank
@@ Property / cites work @@
+Q4421713
@@ Property / cites work: Q4421713 / rank @@
+Normal rank
@@ Property / cites work @@
+Consistency and cautious fictitious play
@@ Property / cites work: Consistency and cautious fictitious play / rank @@
+Normal rank
@@ Property / cites work @@
+On the Global Convergence of Stochastic Fictitious Play
+Normal rank
@@ Property / cites work @@
+Generalised weakened fictitious play
@@ Property / cites work: Generalised weakened fictitious play / rank @@
+Normal rank
@@ Property / cites work @@
+Consistency of Vanishingly Smooth Fictitious Play
@@ Property / cites work: Consistency of Vanishingly Smooth Fictitious Play / rank @@
+Normal rank
@@ Property / cites work @@
+Sampled fictitious play is Hannan consistent
@@ Property / cites work: Sampled fictitious play is Hannan consistent / rank @@
+Normal rank
@@ Property / cites work @@
+Q3093261
@@ Property / cites work: Q3093261 / rank @@
+Normal rank
@@ Property / cites work @@
+Stochastic approximation. A dynamical systems viewpoint.
+Normal rank
@@ Property / cites work @@
+Prediction, Learning, and Games
@@ Property / cites work: Prediction, Learning, and Games / rank @@
+Normal rank
@@ Property / cites work @@
+The Nonstochastic Multiarmed Bandit Problem
@@ Property / cites work: The Nonstochastic Multiarmed Bandit Problem / rank @@
+Normal rank
@@ Property / cites work @@
+The weighted majority algorithm
@@ Property / cites work: The weighted majority algorithm / rank @@
+Normal rank
@@ Property / cites work @@
+Adaptive game playing using multiplicative weights
+Normal rank
@@ Property / cites work @@
+A Simple Adaptive Procedure Leading to Correlated Equilibrium
+Normal rank
@@ Property / cites work @@
+Revisiting CFR+ and Alternating Updates
@@ Property / cites work: Revisiting CFR+ and Alternating Updates / rank @@
+Normal rank
@@ Property / cites work @@
+Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games
+Normal rank
@@ Property / cites work @@
+Q5381139
@@ Property / cites work: Q5381139 / rank @@
+Normal rank
@@ Property / cites work @@
+Settling the complexity of computing two-player Nash equilibria
+Normal rank
@@ Property / cites work @@
+Subjectivity and correlation in randomized strategies
+Normal rank
@@ Property / cites work @@
+Markov--Nash Equilibria in Mean-Field Games with Discounted Cost
+Normal rank
@@ Property / cites work @@
+Approximate Nash Equilibria in Partially Observed Stochastic Games with Mean-Field Interactions
+Normal rank
@@ Property / cites work @@
+Discrete-time average-cost mean-field games on Polish spaces
+Normal rank
@@ Property / cites work @@
+Approximate Markov-Nash Equilibria for Discrete-Time Risk-Sensitive Mean-Field Games
+Normal rank
@@ Property / cites work @@
+Finite mean field games: fictitious play and convergence to a first order continuous mean field game
+Normal rank
@@ Property / cites work @@
+Value iteration algorithm for mean-field games
@@ Property / cites work: Value iteration algorithm for mean-field games / rank @@
+Normal rank
@@ Property / cites work @@
+A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
+Normal rank
@@ Property / cites work @@
+The challenge of poker
@@ Property / cites work: The challenge of poker / rank @@
+Normal rank
@@ Property / cites work @@
+Q5801632
@@ Property / cites work: Q5801632 / rank @@
+Normal rank
@@ Property / cites work @@
+DeepStack: Expert-level artificial intelligence in heads-up no-limit poker
+Normal rank
@@ Property / cites work @@
+Superhuman AI for heads-up no-limit poker: Libratus beats top professionals
+Normal rank
@@ Property / cites work @@
+A near-optimal polynomial time algorithm for learning in certain classes of stochastic games
+Normal rank
@@ Property / cites work @@
+10.1162/153244303765208377
@@ Property / cites work: 10.1162/153244303765208377 / rank @@
+Normal rank
@@ Property / cites work @@
+Q5744808
@@ Property / cites work: Q5744808 / rank @@
+Normal rank