Learning to Optimize via Posterior Sampling
From MaRDI portal
Publication:5247618
DOI: 10.1287/moor.2014.0650
zbMath: 1310.93091
arXiv: 1301.2609
OpenAlex: W2149721706
MaRDI QID: Q5247618
Daniel J. Russo, Benjamin van Roy
Publication date: 24 April 2015
Published in: Mathematics of Operations Research
Full work available at URL: https://arxiv.org/abs/1301.2609
Mathematics Subject Classification:
- Stochastic programming (90C15)
- Sampled-data control/observation systems (93C57)
- Stochastic learning and adaptive control (93E35)
- Sequential statistical design (62L05)
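The publication above analyzes posterior sampling (Thompson sampling) for online optimization. As a minimal illustrative sketch — not the paper's general formulation — the following shows posterior sampling on a Bernoulli bandit with independent Beta(1, 1) priors; the function name and the problem instance are assumptions for demonstration only.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Posterior sampling sketch for a Bernoulli bandit with Beta(1, 1) priors."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [1] * n_arms  # Beta posterior alpha parameters
    failures = [1] * n_arms   # Beta posterior beta parameters
    total_reward = 0
    for _ in range(horizon):
        # Sample a mean for each arm from its posterior and play the argmax.
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Observe a Bernoulli reward and update the chosen arm's posterior.
        reward = 1 if rng.random() < true_means[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return total_reward

# Illustrative instance: two arms with unknown means 0.3 and 0.7.
reward = thompson_sampling([0.3, 0.7], horizon=1000)
```

Over the horizon the posterior for the better arm concentrates, so the policy plays it increasingly often; the paper's regret bounds formalize this behavior in far greater generality.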
Related Items (45)
Optimal Learning for Nonlinear Parametric Belief Models Over Multidimensional Continuous Spaces ⋮ Unnamed Item ⋮ Practical Bayesian support vector regression for financial time series prediction and market condition change detection ⋮ Optimal Information Blending with Measurements in the L2 Sphere ⋮ Bandit Theory: Applications to Learning Healthcare Systems and Clinical Trials ⋮ Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning ⋮ Bayesian optimization with partially specified queries ⋮ Multi-armed bandit-based hyper-heuristics for combinatorial optimization problems ⋮ Online Resource Allocation with Personalized Learning ⋮ Online learning of network bottlenecks via minimax paths ⋮ On the Convergence Rates of Expected Improvement Methods ⋮ Decomposable Markov Decision Processes: A Fluid Optimization Approach ⋮ Reward Maximization Through Discrete Active Inference ⋮ Reinforcement Learning, Bit by Bit ⋮ ON THE IDENTIFICATION AND MITIGATION OF WEAKNESSES IN THE KNOWLEDGE GRADIENT POLICY FOR MULTI-ARMED BANDITS ⋮ Optimal Learning in Linear Regression with Combinatorial Feature Selection ⋮ Online learning of energy consumption for navigation of electric vehicles ⋮ Multi-fidelity cost-aware Bayesian optimization ⋮ Gaussian process bandits with adaptive discretization ⋮ Online Decision Making with High-Dimensional Covariates ⋮ Technical Note—Consistency Analysis of Sequential Learning Under Approximate Bayesian Inference ⋮ Online Network Revenue Management Using Thompson Sampling ⋮ A unified framework for stochastic optimization ⋮ Nonstationary Bandits with Habituation and Recovery Dynamics ⋮ Optimal Online Learning for Nonlinear Belief Models Using Discrete Priors ⋮ Optimal Learning with Local Nonlinear Parametric Models over Continuous Designs ⋮ Unnamed Item ⋮ On Bayesian index policies for sequential resource allocation ⋮ Bayesian adversarial multi-node bandit for optimal smart grid protection against cyber attacks ⋮ Infinite Arms Bandit: Optimality via Confidence Bounds ⋮ Multi-Armed Bandit for Species Discovery: A Bayesian Nonparametric Approach ⋮ Improved regret for zeroth-order adversarial bandit convex optimisation ⋮ Complete expected improvement converges to an optimal budget allocation ⋮ Learning to Optimize via Information-Directed Sampling ⋮ Efficient Simulation of High Dimensional Gaussian Vectors ⋮ The Local Time Method for Targeting and Selection ⋮ Bayesian Exploration for Approximate Dynamic Programming ⋮ Variance Regularization in Sequential Bayesian Optimization ⋮ Multi-armed bandit with sub-exponential rewards ⋮ Best arm identification in generalized linear bandits ⋮ On the Prior Sensitivity of Thompson Sampling ⋮ IntelligentPooling: practical Thompson sampling for mHealth ⋮ Game of Thrones: Fully Distributed Learning for Multiplayer Bandits ⋮ Unnamed Item ⋮ Satisficing in Time-Sensitive Bandit Learning
Cites Work
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- Asymptotically efficient adaptive allocation rules
- Adaptive treatment allocation and the multi-armed bandit problem
- The Knowledge Gradient Algorithm for a General Class of Online Learning Problems
- Linearly Parameterized Bandits
- Near-Optimal Regret Bounds for Thompson Sampling
- Computationally Related Problems
- Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting
- Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
- Finite-time analysis of the multiarmed bandit problem
This page was built for publication: Learning to Optimize via Posterior Sampling