A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences

From MaRDI portal
Publication:6225697

arXiv1105.5820MaRDI QIDQ6225697FDOQ6225697


Authors: Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz Edit this on Wikidata


Publication date: 29 May 2011

Abstract: We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of cite{Burnetas96}. Our contribution is to provide a finite-time analysis of this algorithm; we get bounds whose main terms are smaller than the ones of previously known algorithms with finite-time analyses (like UCB-type algorithms).













This page was built for publication: A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q6225697)