Asymptotically optimal algorithms for budgeted multiple play bandits
From MaRDI portal
Publication:2331676
Abstract: We study a generalization of the multi-armed bandit problem with multiple plays where there is a cost associated with pulling each arm and the agent has a budget at each time that dictates how much she can expect to spend. We derive an asymptotic regret lower bound for any uniformly efficient algorithm in our setting. We then study a variant of Thompson sampling for Bernoulli rewards and a variant of KL-UCB for both single-parameter exponential families and bounded, finitely supported rewards. We show these algorithms are asymptotically optimal, both in rateand leading problem-dependent constants, including in the thick margin setting where multiple arms fall on the decision boundary.
Recommendations
- Approximation algorithms for budgeted learning problems
- Multi-armed bandit problems with multiple plays and switching cost
- Asymptotically optimal multi-armed bandit policies under a cost constraint
- Budgeted multi-armed bandit in continuous action space
- A minimax and asymptotically optimal algorithm for stochastic bandits
Cites work
- scientific article; zbMATH DE number 3638998 (Why is no real title available?)
- Adaptive treatment allocation and the multi-armed bandit problem
- Asymptotically efficient adaptive allocation rules
- Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part I: I.I.D. rewards
- Asymptotically optimal algorithms for budgeted multiple play bandits
- Bandits with knapsacks
- Combinatorial bandits
- Discrete-variable extremum problems
- Explore first, exploit next: the true shape of regret in bandit problems
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- Near-optimal regret bounds for Thompson sampling
- Optimal adaptive policies for sequential allocation problems
- Some aspects of the sequential design of experiments
- Thompson sampling: an asymptotically optimal finite-time analysis
Cited in
(14)- scientific article; zbMATH DE number 7053386 (Why is no real title available?)
- Learning Theory
- scientific article; zbMATH DE number 4064878 (Why is no real title available?)
- Budgeted multi-armed bandit in continuous action space
- scientific article; zbMATH DE number 4059270 (Why is no real title available?)
- Asymptotic optimality for decentralised bandits
- Adaptive policies for perimeter surveillance problems
- An asymptotically optimal policy for finite support models in the multiarmed bandit problem
- Asymptotically optimal multi-armed bandit policies under a cost constraint
- Combining multiple strategies for multiarmed bandit problems and asymptotic optimality
- Approximation algorithms for budgeted learning problems
- Asymptotically optimal algorithms for budgeted multiple play bandits
- Evaluation of asymptotic approximations for a two-stage Bernoulli bandit
- Thompson sampling-based recursive block elimination for dynamic assignment under limited budget in pure-exploration
This page was built for publication: Asymptotically optimal algorithms for budgeted multiple play bandits
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2331676)