A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

Publication:5218653

DOI: 10.1126/science.aar6404, zbMath: 1433.68320, OpenAlex: W2902907165, Wikidata: Q59594962, Scholia: Q59594962, MaRDI QID: Q5218653

David Silver, Marc Lanctot, Thomas Hubert, Laurent Sifre, Dharshan Kumaran, Karen Simonyan, Timothy P. Lillicrap, Demis Hassabis, Arthur Guez, Matthew Lai, Thore Graepel, Julian Schrittwieser, Ioannis Antonoglou

Publication date: 4 March 2020

Published in: Science

Full work available at URL: https://discovery.ucl.ac.uk/id/eprint/10069050/



Related Items

Comparative analysis of machine learning methods for active flow control
A Proof that Artificial Neural Networks Overcome the Curse of Dimensionality in the Numerical Approximation of Black–Scholes Partial Differential Equations
Deep Statistical Model Checking
Construction of symmetric orthogonal designs with deep Q-network and orthogonal complementary design
Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management
Unnamed Item
Archetypal landscapes for deep neural networks
The unreasonable effectiveness of deep learning in artificial intelligence
Planning for potential: efficient safe reinforcement learning
Scalable Online Planning for Multi-Agent MDPs
Deep policy dynamic programming for vehicle routing problems
A machine learning framework for LES closure terms
Comparison of deep neural networks and deep hierarchical models for spatio-temporal data
The explanation game: a formal framework for interpretable machine learning
Explore and Exploit with Heterotic Line Bundle Models
Optimal production ramp‐up in the smartphone manufacturing industry
Risk-aware shielding of partially observable Monte Carlo planning policies
Smoothing policies and safe policy gradients
Is there a role for statistics in artificial intelligence?
Spatial state-action features for general games
A reinforcement learning approach to the stochastic cutting stock problem
Forecasting Hamiltonian dynamics without canonical coordinates
Unnamed Item
Enhancing differential-neural cryptanalysis
What will drive global economic growth in the digital age?
A policy-based learning beam search for combinatorial optimization
A Reinforcement Learning Based Slope Limiter for Two-Dimensional Finite Volume Schemes
Pessimistic value iteration for multi-task data sharing in offline reinforcement learning
A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms
Synthesizing explainable counterfactual policies for algorithmic recourse with program synthesis
Learning key steps to attack deep reinforcement learning agents
Quantum circuit compilation for nearest-neighbor architecture based on reinforcement learning
Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective
Simulation-based search
Inductive general game playing
On solving the problem of 7-piece chess endgames
Provable Training of a ReLU Gate with an Iterative Non-Gradient Algorithm
Efficient Multi-objective Reinforcement Learning via Multiple-gradient Descent with Iteratively Discovered Weight-Vector Sets
Making sense of sensory input
Deliberative acting, planning and learning with hierarchical operational models
Reward is enough
Full Gradient DQN Reinforcement Learning: A Provably Convergent Scheme
Meta-modeling game for deriving theory-consistent, microstructure-based traction-separation laws via deep reinforcement learning
Deep reinforcement learning for the optimal placement of cryptocurrency limit orders
A non-cooperative meta-modeling game for automated third-party calibrating, validating and falsifying constitutive laws with parallelized adversarial attacks
Topological properties of the set of functions generated by neural networks of fixed size
Dynamic selective maintenance optimization for multi-state systems over a finite horizon: a deep reinforcement learning approach
Artificial Intelligence, Chaos, Prediction and Understanding in Science
The Hanabi challenge: a new frontier for AI research
Automatic discovery of interpretable planning strategies
A cooperative game for automated learning of elasto-plasticity knowledge graphs and models with AI-guided experimentation
Unnamed Item
Compact and efficient encodings for planning in factored state and action spaces with learned binarized neural network transition models
Deep reinforcement learning for FlipIt security game
What may lie ahead in reinforcement learning
Multi-agent reinforcement learning: a selective overview of theories and algorithms
Reinforcement learning: an industrial perspective
Benchmark and Survey of Automated Machine Learning Frameworks
Constrained Multiagent Markov Decision Processes: a Taxonomy of Problems and Algorithms
Induction and Exploitation of Subgoal Automata for Reinforcement Learning
World-class interpretable poker
Model-based Reinforcement Learning: A Survey


Uses Software