From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning

DOI: 10.1561/2200000038 ⋮ zbMath: 1296.91086 ⋮ OpenAlex: W2073107347 ⋮ MaRDI QID: Q5168384

Publication date: 4 July 2014

Published in: Foundations and Trends® in Machine Learning

Full work available at URL: https://doi.org/10.1561/2200000038

zbMATH Keywords

optimization ⋮ stochastic optimization ⋮ Markov decision processes ⋮ online learning ⋮ operations research ⋮ algorithmic game theory ⋮ game theoretic learning

Mathematics Subject Classification ID

Decision theory (91B06) ⋮ Monte Carlo methods (65C05) ⋮ Large-scale problems in mathematical programming (90C06) ⋮ Learning and adaptive systems in artificial intelligence (68T05) ⋮ Stochastic programming (90C15) ⋮ Management decision making, including multiple objectives (90B50) ⋮ Combinatorial optimization (90C27) ⋮ Stopping times; optimal stopping problems; gambling theory (60G40) ⋮ Markov and semi-Markov decision processes (90C40)

Related Items (25)

Convergence rate of a rectangular subdivision-based optimization algorithm for smooth multivariate functions ⋮ Continuous-action planning for discounted infinite-horizon nonlinear optimal control with Lipschitz values ⋮ Nonasymptotic Analysis of Monte Carlo Tree Search ⋮ Optimistic optimization for model predictive control of \(\max\)-plus linear systems ⋮ Optimistic optimization for continuous nonconvex piecewise affine functions ⋮ Optimistic planning algorithms for state-constrained optimal control problems ⋮ The aircraft runway scheduling problem: a survey ⋮ Revisiting norm optimization for multi-objective black-box problems: a finite-time analysis ⋮ Multi-armed bandits with censored consumption of resources ⋮ Unnamed Item ⋮ Optimistic planning for control of hybrid-input nonlinear systems ⋮ Improving SAT Solving Using Monte Carlo Tree Search-Based Clause Learning ⋮ Planning in hybrid relational MDPs ⋮ Gaussian process bandits with adaptive discretization ⋮ A unified framework for stochastic optimization ⋮ Consensus for black-box nonlinear agents using optimistic optimization ⋮ Unnamed Item ⋮ Planning for optimal control and performance certification in nonlinear systems with controlled or uncontrolled switches ⋮ Convergence rate of a simulated annealing algorithm with noisy observations ⋮ Optimistic minimax search for noncooperative switched control with or without dwell time ⋮ On Monte-Carlo tree search for deterministic games with alternate moves and complete information ⋮ Learning‐based iterative modular adaptive control for nonlinear systems ⋮ Online Learning in Markov Decision Processes with Continuous Actions ⋮ Multi-objective simultaneous optimistic optimization ⋮ Benchmark and Survey of Automated Machine Learning Frameworks
