Asymptotically optimal algorithms for budgeted multiple play bandits

From MaRDI portal
Publication:2331676

DOI: 10.1007/S10994-019-05799-X
zbMATH Open: 1446.91032
arXiv: 1606.09388
OpenAlex: W2964045314
MaRDI QID: Q2331676


Authors: Alex Luedtke, Emilie Kaufmann, Antoine Chambaz


Publication date: 30 October 2019

Published in: Machine Learning

Abstract: We study a generalization of the multi-armed bandit problem with multiple plays in which there is a cost associated with pulling each arm and the agent has a budget at each time that constrains how much she can expect to spend. We derive an asymptotic regret lower bound for any uniformly efficient algorithm in our setting. We then study a variant of Thompson sampling for Bernoulli rewards and a variant of KL-UCB for both single-parameter exponential families and bounded, finitely supported rewards. We show that these algorithms are asymptotically optimal, both in rate and in the leading problem-dependent constants, including in the thick margin setting where multiple arms fall on the decision boundary.
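To make the setting concrete, the following is a minimal illustrative sketch (not the paper's exact procedure) of Thompson sampling for Bernoulli rewards with per-round pull costs and an expected-spend budget. The greedy cost-ratio selection rule and all parameter names are assumptions introduced here for illustration only.

```python
import numpy as np

def budgeted_thompson_sampling(means, costs, budget, horizon, seed=None):
    """Illustrative sketch of budgeted multiple-play Thompson sampling.

    means   : true Bernoulli success probabilities (hidden from the agent)
    costs   : cost of pulling each arm
    budget  : maximum total cost spendable per round
    horizon : number of rounds to play

    Each round: sample one posterior draw per arm, then greedily choose a
    set of arms (by sampled mean per unit cost) whose total cost fits the
    budget. This greedy knapsack heuristic is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    costs = np.asarray(costs, dtype=float)
    k = len(means)
    alpha = np.ones(k)   # Beta posterior: 1 + observed successes
    beta = np.ones(k)    # Beta posterior: 1 + observed failures
    total_reward = 0.0
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)          # one posterior sample per arm
        order = np.argsort(-theta / costs)     # best sampled value per cost first
        spent, chosen = 0.0, []
        for i in order:
            if spent + costs[i] <= budget:
                chosen.append(i)
                spent += costs[i]
        for i in chosen:                       # pull each chosen arm, update posterior
            r = rng.binomial(1, means[i])
            alpha[i] += r
            beta[i] += 1 - r
            total_reward += r
    return total_reward
```

With a unit budget and unit costs this reduces to the classical single-play Bernoulli bandit; with a larger budget the agent pulls several arms per round, which is the multiple-play regime the paper analyzes.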


Full work available at URL: https://arxiv.org/abs/1606.09388










Cited In (14)






