Asymptotically optimal algorithms for budgeted multiple play bandits

From MaRDI portal
Publication:2331676

DOI: 10.1007/S10994-019-05799-X
zbMATH Open: 1446.91032
arXiv: 1606.09388
OpenAlex: W2964045314
MaRDI QID: Q2331676


Authors: Alex Luedtke, Emilie Kaufmann, Antoine Chambaz


Publication date: 30 October 2019

Published in: Machine Learning

Abstract: We study a generalization of the multi-armed bandit problem with multiple plays in which there is a cost associated with pulling each arm and the agent has a budget at each time that constrains how much she can expect to spend. We derive an asymptotic regret lower bound for any uniformly efficient algorithm in our setting. We then study a variant of Thompson sampling for Bernoulli rewards and a variant of KL-UCB for both single-parameter exponential families and bounded, finitely supported rewards. We show that these algorithms are asymptotically optimal, both in rate and in the leading problem-dependent constants, including in the thick margin setting where multiple arms fall on the decision boundary.
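To make the setting concrete, the following is a minimal illustrative sketch (not the paper's exact procedure) of Thompson sampling for Bernoulli rewards with per-round pull costs and an expected-spend budget. The greedy cost-ratio selection rule and all parameter names are assumptions introduced here for illustration only.

```python
import numpy as np

def budgeted_thompson_sampling(means, costs, budget, horizon, seed=None):
    """Illustrative sketch of budgeted multiple-play Thompson sampling.

    means   : true Bernoulli success probabilities (hidden from the agent)
    costs   : cost of pulling each arm
    budget  : maximum total cost spendable per round
    horizon : number of rounds to play

    Each round: sample one posterior draw per arm, then greedily choose a
    set of arms (by sampled mean per unit cost) whose total cost fits the
    budget. This greedy knapsack heuristic is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    costs = np.asarray(costs, dtype=float)
    k = len(means)
    alpha = np.ones(k)   # Beta posterior: 1 + observed successes
    beta = np.ones(k)    # Beta posterior: 1 + observed failures
    total_reward = 0.0
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)          # one posterior sample per arm
        order = np.argsort(-theta / costs)     # best sampled value per cost first
        spent, chosen = 0.0, []
        for i in order:
            if spent + costs[i] <= budget:
                chosen.append(i)
                spent += costs[i]
        for i in chosen:                       # pull each chosen arm, update posterior
            r = rng.binomial(1, means[i])
            alpha[i] += r
            beta[i] += 1 - r
            total_reward += r
    return total_reward
```

With a unit budget and unit costs this reduces to the classical single-play Bernoulli bandit; with a larger budget the agent pulls several arms per round, which is the multiple-play regime the paper analyzes.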


Full work available at URL: https://arxiv.org/abs/1606.09388










Cited In (14)






