A minimax and asymptotically optimal algorithm for stochastic bandits
From MaRDI portal
Publication:4645646
zbMATH Open: 1407.62046
arXiv: 1702.07211
MaRDI QID: Q4645646
FDO: Q4645646
Authors: Pierre Ménard, Aurélien Garivier
Publication date: 10 January 2019
Abstract: We propose the kl-UCB++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins' lower bound) and minimax optimal. This is the first algorithm proved to enjoy these two properties at the same time. This work thus merges two different lines of research, with simple and clear proofs.
Full work available at URL: https://arxiv.org/abs/1702.07211
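For intuition, the index policies in this line of work pick, at each round, the arm whose KL-based upper confidence bound is largest. The sketch below computes the classic kl-UCB index for Bernoulli arms by bisection; it is an illustration only, not the kl-UCB++ index from the paper (which uses a refined exploration rate), and all names and the iteration count are my own choices.

```python
import math

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped for stability."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, exploration):
    """Largest q >= mean such that pulls * kl(mean, q) <= exploration.

    This is the generic kl-UCB upper confidence bound; plain kl-UCB
    takes exploration ~ log(t), while kl-UCB++ uses a tuned variant.
    Solved by bisection since kl(mean, q) is increasing in q >= mean.
    """
    lo, hi = mean, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if pulls * bernoulli_kl(mean, mid) <= exploration:
            lo = mid  # mid still satisfies the constraint: move up
        else:
            hi = mid  # constraint violated: move down
    return lo
```

With zero exploration budget the index collapses to the empirical mean, and it grows toward 1 as the exploration term grows relative to the pull count, which is what drives optimism-based exploration.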
Recommendations
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
- On Bayesian index policies for sequential resource allocation
- The Nonstochastic Multiarmed Bandit Problem
Keywords: minimax optimality; asymptotic optimality; stochastic multi-armed bandits; regret analysis; upper confidence bound (UCB)
Cited In (16)
- An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback
- Minimax Off-Policy Evaluation for Multi-Armed Bandits
- Learning the distribution with largest mean: two bandit frameworks
- Title not available
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- Ballooning multi-armed bandits
- On Bayesian index policies for sequential resource allocation
- UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
- A penalized bandit algorithm
- A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem
- An asymptotically optimal policy for finite support models in the multiarmed bandit problem
- Greedy Algorithm Almost Dominates in Smoothed Contextual Bandits
- Boundary crossing for general exponential families
- Asymptotically optimal algorithms for budgeted multiple play bandits
- An Efficient Algorithm for Learning with Semi-bandit Feedback
- Empirical Gittins index strategies with \(\varepsilon\)-explorations for multi-armed bandit problems