Asymptotically optimal algorithms for budgeted multiple play bandits
From MaRDI portal
Publication:2331676
DOI10.1007/S10994-019-05799-XzbMATH Open1446.91032arXiv1606.09388OpenAlexW2964045314MaRDI QIDQ2331676FDOQ2331676
Authors: Alex Luedtke, Emilie Kaufmann, Antoine Chambaz
Publication date: 30 October 2019
Published in: Machine Learning (Search for Journal in Brave)
Abstract: We study a generalization of the multi-armed bandit problem with multiple plays where there is a cost associated with pulling each arm and the agent has a budget at each time that dictates how much she can expect to spend. We derive an asymptotic regret lower bound for any uniformly efficient algorithm in our setting. We then study a variant of Thompson sampling for Bernoulli rewards and a variant of KL-UCB for both single-parameter exponential families and bounded, finitely supported rewards. We show these algorithms are asymptotically optimal, both in rateand leading problem-dependent constants, including in the thick margin setting where multiple arms fall on the decision boundary.
Full work available at URL: https://arxiv.org/abs/1606.09388
Recommendations
- Approximation algorithms for budgeted learning problems
- Multi-armed bandit problems with multiple plays and switching cost
- Asymptotically optimal multi-armed bandit policies under a cost constraint
- Budgeted multi-armed bandit in continuous action space
- A minimax and asymptotically optimal algorithm for stochastic bandits
Cites Work
- Asymptotically efficient adaptive allocation rules
- Title not available (Why is that?)
- Some aspects of the sequential design of experiments
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- Combinatorial bandits
- Adaptive treatment allocation and the multi-armed bandit problem
- Thompson sampling: an asymptotically optimal finite-time analysis
- Discrete-variable extremum problems
- Optimal adaptive policies for sequential allocation problems
- Asymptotically optimal algorithms for budgeted multiple play bandits
- Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part I: I.I.D. rewards
- Near-optimal regret bounds for Thompson sampling
- Explore first, exploit next: the true shape of regret in bandit problems
- Bandits with knapsacks
Cited In (14)
- Learning Theory
- Title not available (Why is that?)
- Budgeted multi-armed bandit in continuous action space
- Title not available (Why is that?)
- Asymptotic optimality for decentralised bandits
- Adaptive policies for perimeter surveillance problems
- An asymptotically optimal policy for finite support models in the multiarmed bandit problem
- Asymptotically optimal multi-armed bandit policies under a cost constraint
- Combining multiple strategies for multiarmed bandit problems and asymptotic optimality
- Approximation algorithms for budgeted learning problems
- Asymptotically optimal algorithms for budgeted multiple play bandits
- Evaluation of asymptotic approximations for a two-stage Bernoulli bandit
- Thompson sampling-based recursive block elimination for dynamic assignment under limited budget in pure-exploration
- Title not available (Why is that?)
Uses Software
This page was built for publication: Asymptotically optimal algorithms for budgeted multiple play bandits
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2331676)