Batched bandit problems

From MaRDI portal
Publication:282463

DOI10.1214/15-AOS1381zbMATH Open1338.62180arXiv1505.00369OpenAlexW1958090791MaRDI QIDQ282463FDOQ282463


Authors: Vianney Perchet, Philippe Rigollet, Sylvain Chassang, Erik Snowberg Edit this on Wikidata


Publication date: 12 May 2016

Published in: The Annals of Statistics (Search for Journal in Brave)

Abstract: Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.


Full work available at URL: https://arxiv.org/abs/1505.00369




Recommendations




Cites Work


Cited In (18)





This page was built for publication: Batched bandit problems

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q282463)