A minimax and asymptotically optimal algorithm for stochastic bandits
From MaRDI portal
Publication:4645646
Abstract: We propose the kl-UCB++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins' lower bound) and minimax optimal; it is the first algorithm proved to enjoy both properties at the same time. This work thus merges two previously distinct lines of research, with simple and clear proofs.
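Algorithms in the kl-UCB family pull, at each round, the arm with the highest KL-based upper confidence index. As a rough illustration only (not the paper's kl-UCB++ itself), the sketch below computes a plain kl-UCB index for Bernoulli arms by bisection; the function names are hypothetical, and it uses the simple log(t) exploration threshold rather than the refined exploration function that kl-UCB++ employs to obtain minimax optimality on top of asymptotic optimality.

```python
import math

def kl_bernoulli(p, q):
    """Binary KL divergence kl(p, q), clipped away from 0 and 1
    so the logarithms stay finite (0*log 0 = 0 convention)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t):
    """Largest q >= mean with pulls * kl(mean, q) <= log(t),
    found by bisection. This is the classic kl-UCB index with a
    plain log(t) threshold, not the kl-UCB++ exploration function."""
    threshold = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    for _ in range(50):  # 50 bisection steps give ample precision
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= threshold:
            lo = mid
        else:
            hi = mid
    return lo
```

With few pulls the index sits well above the empirical mean (encouraging exploration), and it tightens toward the mean as the pull count grows, which is the mechanism behind the Lai-Robbins-matching regret guarantees.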
Recommendations
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
- On Bayesian index policies for sequential resource allocation
- scientific article; zbMATH DE number 6982311
- The Nonstochastic Multiarmed Bandit Problem
Cited in (16)
- An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback
- Minimax Off-Policy Evaluation for Multi-Armed Bandits
- Learning the distribution with largest mean: two bandit frameworks
- scientific article; zbMATH DE number 4059270
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
- Ballooning multi-armed bandits
- On Bayesian index policies for sequential resource allocation
- A penalized bandit algorithm
- A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem
- An asymptotically optimal policy for finite support models in the multiarmed bandit problem
- Greedy Algorithm Almost Dominates in Smoothed Contextual Bandits
- Boundary crossing for general exponential families
- Asymptotically optimal algorithms for budgeted multiple play bandits
- An Efficient Algorithm for Learning with Semi-bandit Feedback
- Empirical Gittins index strategies with \(\varepsilon\)-explorations for multi-armed bandit problems