The Knowledge Gradient Algorithm for a General Class of Online Learning Problems

From MaRDI portal

Publication:2892224


DOI: 10.1287/opre.1110.0999; zbMath: 1241.90201; OpenAlex: W2069034916; MaRDI QID: Q2892224

Ilya O. Ryzhov, Peter I. Frazier, Warren B. Powell

Publication date: 18 June 2012

Published in: Operations Research

Full work available at URL: https://doi.org/10.1287/opre.1110.0999

zbMATH Keywords

online learning; Gittins index; index policy; optimal learning; knowledge gradient; multiarmed bandit

Mathematics Subject Classification ID

Numerical mathematical programming methods (65K05); Applications of mathematical programming (90C90); Rationality and learning in game theory (91A26)

Related Items (24)

Predictive stochastic programming ⋮ Learning Manipulation Through Information Dissemination ⋮ Perspectives of approximate dynamic programming ⋮ Bandit Theory: Applications to Learning Healthcare Systems and Clinical Trials ⋮ Convergence rate analysis for optimal computing budget allocation algorithms ⋮ On the Convergence Rates of Expected Improvement Methods ⋮ Reinforcement Learning, Bit by Bit ⋮ ON THE IDENTIFICATION AND MITIGATION OF WEAKNESSES IN THE KNOWLEDGE GRADIENT POLICY FOR MULTI-ARMED BANDITS ⋮ A Knowledge Gradient Policy for Sequencing Experiments to Identify the Structure of RNA Molecules Using a Sparse Additive Belief Model ⋮ Nonstationary Bandits with Habituation and Recovery Dynamics ⋮ Optimal Online Learning for Nonlinear Belief Models Using Discrete Priors ⋮ Learning in Combinatorial Optimization: What and How to Explore ⋮ Simple Bayesian Algorithms for Best-Arm Identification ⋮ Managing mobile production-inventory systems influenced by a modulation process ⋮ Choosing a good toolkit. II: Bayes-rule based heuristics ⋮ Optimal learning with non-Gaussian rewards ⋮ Optimal learning for sequential sampling with non-parametric beliefs ⋮ Optimal learning with a local parametric belief model ⋮ Learning to Optimize via Information-Directed Sampling ⋮ Bayesian Exploration for Approximate Dynamic Programming ⋮ Variance Regularization in Sequential Bayesian Optimization ⋮ Learning to Optimize via Posterior Sampling ⋮ Satisficing in Time-Sensitive Bandit Learning ⋮ Dynamic decision making for graphical models applied to oil exploration

