The Knowledge Gradient Algorithm for a General Class of Online Learning Problems
Publication:2892224
DOI: 10.1287/opre.1110.0999 ⋮ zbMath: 1241.90201 ⋮ OpenAlex: W2069034916 ⋮ MaRDI QID: Q2892224
Ilya O. Ryzhov, Peter I. Frazier, Warren B. Powell
Publication date: 18 June 2012
Published in: Operations Research
Full work available at URL: https://doi.org/10.1287/opre.1110.0999
Numerical mathematical programming methods (65K05) ⋮ Applications of mathematical programming (90C90) ⋮ Rationality and learning in game theory (91A26)
Related Items (24)
Predictive stochastic programming ⋮ Learning Manipulation Through Information Dissemination ⋮ Perspectives of approximate dynamic programming ⋮ Bandit Theory: Applications to Learning Healthcare Systems and Clinical Trials ⋮ Convergence rate analysis for optimal computing budget allocation algorithms ⋮ On the Convergence Rates of Expected Improvement Methods ⋮ Reinforcement Learning, Bit by Bit ⋮ ON THE IDENTIFICATION AND MITIGATION OF WEAKNESSES IN THE KNOWLEDGE GRADIENT POLICY FOR MULTI-ARMED BANDITS ⋮ A Knowledge Gradient Policy for Sequencing Experiments to Identify the Structure of RNA Molecules Using a Sparse Additive Belief Model ⋮ Nonstationary Bandits with Habituation and Recovery Dynamics ⋮ Optimal Online Learning for Nonlinear Belief Models Using Discrete Priors ⋮ Learning in Combinatorial Optimization: What and How to Explore ⋮ Simple Bayesian Algorithms for Best-Arm Identification ⋮ Managing mobile production-inventory systems influenced by a modulation process ⋮ Choosing a good toolkit. II: Bayes-rule based heuristics ⋮ Optimal learning with non-Gaussian rewards ⋮ Optimal learning for sequential sampling with non-parametric beliefs ⋮ Optimal learning with a local parametric belief model ⋮ Learning to Optimize via Information-Directed Sampling ⋮ Bayesian Exploration for Approximate Dynamic Programming ⋮ Variance Regularization in Sequential Bayesian Optimization ⋮ Learning to Optimize via Posterior Sampling ⋮ Satisficing in Time-Sensitive Bandit Learning ⋮ Dynamic decision making for graphical models applied to oil exploration