On Bayesian index policies for sequential resource allocation
From MaRDI portal
Abstract: This paper studies index policies for minimizing (frequentist) regret in a stochastic multi-armed bandit model, inspired by a Bayesian view of the problem. Our main contribution is to prove that the Bayes-UCB algorithm, which relies on quantiles of posterior distributions, is asymptotically optimal when the reward distributions belong to a one-dimensional exponential family, for a large class of prior distributions. We also show that the Bayesian literature gives new insight into what kinds of exploration rates can be used in frequentist, UCB-type algorithms. Indeed, approximations of the Bayesian optimal solution or of the Finite Horizon Gittins indices provide a justification for the kl-UCB+ and kl-UCB-H+ algorithms, whose asymptotic optimality is also established.
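To illustrate the index rule the abstract describes, here is a minimal sketch (not code from the paper) of Bayes-UCB for Bernoulli rewards under a uniform Beta(1,1) prior: each arm's index is a high quantile of its Beta posterior, with the quantile level 1 - 1/(t log(t)^c) rising toward 1 over time. The function names and the Monte Carlo quantile approximation (which avoids any dependency beyond the standard library) are illustrative choices, not the paper's implementation.

```python
import math
import random

def beta_quantile(level, a, b, n_samples=4000, rng=None):
    """Monte Carlo approximation of the `level` quantile of Beta(a, b)."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    samples = sorted(rng.betavariate(a, b) for _ in range(n_samples))
    k = min(int(level * n_samples), n_samples - 1)
    return samples[k]

def bayes_ucb_indices(successes, pulls, t, c=0):
    """Bayes-UCB indices for Bernoulli arms with a uniform Beta(1,1) prior.

    Each arm's index is the (1 - 1/(t * log(t)**c)) quantile of its
    Beta(1 + S, 1 + N - S) posterior; the policy plays the argmax index.
    The default c=0 is the practical choice discussed in the Bayes-UCB
    literature (an assumption here, not taken from this page).
    """
    level = 1.0 - 1.0 / (t * math.log(t) ** c)
    return [beta_quantile(level, 1 + s, 1 + n - s)
            for s, n in zip(successes, pulls)]

# Example: after 5 pulls each, the arm with 3 successes out-indexes
# the arm with 1 success.
indices = bayes_ucb_indices(successes=[3, 1], pulls=[5, 5], t=11)
chosen_arm = max(range(len(indices)), key=indices.__getitem__)
```

The quantile level is what distinguishes Bayes-UCB from posterior-mean greedy play: an under-sampled arm has a wide posterior, so its upper quantile (and hence its index) stays large, forcing continued exploration.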
Recommendations
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- A minimax and asymptotically optimal algorithm for stochastic bandits
- Optimistic Gittins Indices
- Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
- scientific article; zbMATH DE number 3850855
Cites work
- scientific article; zbMATH DE number 3125136
- scientific article; zbMATH DE number 4078557
- scientific article; zbMATH DE number 3638998
- scientific article; zbMATH DE number 700091
- Asymptotic Bayes analysis for the finite-horizon one-armed-bandit problem
- Adaptive treatment allocation and the multi-armed bandit problem
- An asymptotically optimal policy for finite support models in the multiarmed bandit problem
- Asymptotically efficient adaptive allocation rules
- Computing a classic index for finite-horizon bandits
- Concentration inequalities. A nonasymptotic theory of independence
- Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
- Finite-time analysis of the multiarmed bandit problem
- Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- Learning to optimize via posterior sampling
- Multi-armed bandit allocation indices. With a foreword by Peter Whittle.
- Near-optimal regret bounds for Thompson sampling
- On Bayesian index policies for sequential resource allocation
- On Sequential Designs for Maximizing the Sum of n Observations
- On the Prior Sensitivity of Thompson Sampling
- Optimal stopping and dynamic allocation
- Regret bounds and minimax policies under partial monitoring
- Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
- Small-sample performance of Bernoulli two-armed bandit Bayesian strategies
- Thompson sampling: an asymptotically optimal finite-time analysis
Cited in (19)
- Simple Bayesian algorithms for best-arm identification
- Learning the distribution with largest mean: two bandit frameworks
- scientific article; zbMATH DE number 7596797
- Adaptive policies for perimeter surveillance problems
- Online learning of energy consumption for navigation of electric vehicles
- Exploration-exploitation policies with almost sure, arbitrarily slow growing asymptotic regret
- Customization of J. Bather's UCB strategy for a Gaussian multiarmed bandit
- scientific article; zbMATH DE number 6982311
- Two-armed bandit problem and batch version of the mirror descent algorithm
- Optimistic Gittins Indices
- Sequential resource allocation in a stochastic environment: an overview and numerical experiments
- Gaussian two-armed bandit: limiting description
- UCB strategies and optimization of batch processing in a one-armed bandit problem
- A confirmation of a conjecture on Feldman’s two-armed bandit problem
- A minimax and asymptotically optimal algorithm for stochastic bandits
- Empirical Gittins index strategies with ε-explorations for multi-armed bandit problems
- On Bayesian index policies for sequential resource allocation
- A better resource allocation algorithm with semi-bandit feedback
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
This page was built for publication: On Bayesian index policies for sequential resource allocation
MaRDI item Q1750289