Simulation-based search (Q6198646)

From MaRDI portal

scientific article; zbMATH DE number 7821712

Language	Label	Description	Also known as
English	Simulation-based search	scientific article; zbMATH DE number 7821712

Statements

scholarly article

Simulation-based search (English)

10.4171/icm2022/180

André M. S. Barreto

International Congress of Mathematicians

publication date

20 March 2024

Summary: Planning is one of the oldest and most important problems in artificial intelligence. Simulation-based search algorithms, such as AlphaZero, have achieved superhuman performance in chess and Go and are used widely in real-world applications of planning. In this paper we provide a unified framework for simulation-based search. Algorithms in this framework interleave operators for policy evaluation (better estimating the value function of the current policy) and policy improvement (using the value function to form a better policy). These operators are applied to states and actions that are sampled in sequential trajectories, and that may branch recursively into other sampled trajectories. The value function and policy may also be represented by a function approximator. Our framework encompasses a broad family of search algorithms, including Monte-Carlo tree search, sparse sampling, nested Monte-Carlo search, classification-based policy iteration, and AlphaZero. For the entire collection see [Zbl 07816360].
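
The interleaving of policy evaluation and policy improvement described in the summary can be illustrated with a minimal tabular sketch. This is not the paper's framework or any of the named algorithms: it assumes a hypothetical five-state chain environment, plain Monte-Carlo return averaging for evaluation, and an epsilon-greedy policy for improvement, with no recursive branching and no function approximator. All names (`step`, `search`, `best_action`, `GOAL`) are invented for illustration.

```python
import random
from collections import defaultdict

GOAL = 4           # hypothetical chain: states 0..3, goal at 4, pit at -1
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def step(state, action):
    """Toy simulator (an assumption, not from the paper)."""
    nxt = state - 1 if action == 0 else state + 1
    done = nxt < 0 or nxt == GOAL
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, done


def greedy(q, state, rng):
    """Policy improvement: act greedily on current values, random ties."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def search(root, n_sims=500, gamma=0.9, epsilon=0.2, horizon=30, seed=0):
    """Sample trajectories from `root`, alternating improvement and evaluation."""
    rng = random.Random(seed)
    q = defaultdict(float)   # action-value estimates
    n = defaultdict(int)     # visit counts
    for _ in range(n_sims):
        state, path = root, []
        for _t in range(horizon):
            # Improved (epsilon-greedy) policy selects the sampled action.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            nxt, reward, done = step(state, action)
            path.append((state, action, reward))
            state = nxt
            if done:
                break
        # Policy evaluation: back up the discounted Monte-Carlo return
        # along the sampled trajectory as a running mean.
        ret = 0.0
        for s, a, r in reversed(path):
            ret = r + gamma * ret
            n[(s, a)] += 1
            q[(s, a)] += (ret - q[(s, a)]) / n[(s, a)]
    return q


def best_action(q, state):
    """Recommend the action with the highest estimated value."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


if __name__ == "__main__":
    q = search(root=0)
    print("recommended action at root:", best_action(q, 0))
```

In this toy setting, moving left from state 0 ends the episode with zero reward, so its estimated value stays at zero while the value of moving right becomes positive once any sampled trajectory reaches the goal.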

Mathematics Subject Classification ID

zbMATH DE Number

7821712

zbMATH Keywords

planning

reinforcement learning

Markov decision processes

Monte-Carlo tree search

Monte-Carlo simulation

MaRDI profile type

MaRDI publication profile

full work available at URL

https://doi.org/10.4171/icm2022/180

Finite-time analysis of the multiarmed bandit problem

A comparison of minimax tree search algorithms

Model predictive control: Theory and practice - a survey

A sparse sampling algorithm for near-optimal planning in large Markov decision processes

Approximate Dynamic Programming

World-championship-caliber Scrabble

Temporal-difference search in Computer Go

A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

Convergence results for single-step on-policy reinforcement-learning algorithms
