Kullback-Leibler upper confidence bounds for optimal sequential allocation
From MaRDI portal
Abstract: We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins [J. R. Stat. Soc. Ser. B Stat. Methodol. 41 (1979) 148-177], based on upper confidence bounds of the arm payoffs computed using the Kullback-Leibler divergence. We consider two classes of distributions for which instances of this general idea are analyzed: the kl-UCB algorithm is designed for one-parameter exponential families and the empirical KL-UCB algorithm for bounded and finitely supported distributions. Our main contribution is a unified finite-time analysis of the regret of these algorithms that asymptotically matches the lower bounds of Lai and Robbins [Adv. in Appl. Math. 6 (1985) 4-22] and Burnetas and Katehakis [Adv. in Appl. Math. 17 (1996) 122-142], respectively. We also investigate the behavior of these algorithms when used with general bounded rewards, showing in particular that they provide significant improvements over the state-of-the-art.
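The kl-UCB index described in the abstract replaces Hoeffding-style confidence bounds with the largest mean that remains statistically plausible under a Kullback-Leibler constraint. The sketch below is an illustrative Bernoulli-reward instance, not the paper's reference implementation: the bisection routine, the exploration function `log t` (the paper analyzes variants with an extra `c log log t` term, here exposed as an optional parameter), and all function names are assumptions made for this example.

```python
import math
import random

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), with clamping."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, c=0.0):
    """Largest q >= mean with pulls * kl(mean, q) <= log t + c log log t.

    Found by bisection, since q -> kl(mean, q) is increasing on [mean, 1].
    """
    if pulls == 0:
        return 1.0
    bound = (math.log(t) + c * math.log(max(math.log(t), 1.0))) / pulls
    lo, hi = mean, 1.0
    for _ in range(50):  # 50 bisection steps give ~1e-15 precision
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= bound:
            lo = mid
        else:
            hi = mid
    return lo

def kl_ucb(arm_means, horizon, seed=0):
    """Run kl-UCB on simulated Bernoulli arms; return reward sums and pull counts."""
    rng = random.Random(seed)
    k = len(arm_means)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # initialization: pull each arm once
        else:
            a = max(range(k),
                    key=lambda i: kl_ucb_index(sums[i] / pulls[i], pulls[i], t))
        r = 1.0 if rng.random() < arm_means[a] else 0.0
        pulls[a] += 1
        sums[a] += r
    return sums, pulls
```

On a two-armed instance with means 0.2 and 0.8, the policy concentrates almost all pulls on the better arm, with the suboptimal arm sampled on the order of `log(horizon) / kl(0.2, 0.8)` times, matching the Lai-Robbins rate the abstract refers to.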
Recommendations
- A minimax and asymptotically optimal algorithm for stochastic bandits
- Sample mean based index policies by O(log n) regret for the multi-armed bandit problem
- On Bayesian index policies for sequential resource allocation
- Adaptive treatment allocation and the multi-armed bandit problem
- Tuning Bandit Algorithms in Stochastic Environments
Cites work
- scientific article; zbMATH DE number 3687126
- scientific article; zbMATH DE number 193660
- scientific article; zbMATH DE number 3638998
- scientific article; zbMATH DE number 1220667
- scientific article; zbMATH DE number 1158743
- ASYMPTOTIC BAYES ANALYSIS FOR THE FINITE-HORIZON ONE-ARMED-BANDIT PROBLEM
- An asymptotically optimal policy for finite support models in the multiarmed bandit problem
- Asymptotic Statistics
- Asymptotically efficient adaptive allocation rules
- Concentration inequalities and model selection. École d'Été de Probabilités de Saint-Flour XXXIII -- 2003.
- Empirical likelihood
- Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
- Finite-time analysis of the multiarmed bandit problem
- Graphical models, exponential families, and variational inference
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- Multi-armed bandit allocation indices. With a foreword by Peter Whittle.
- On the Gittins index for multiarmed bandits
- On the Theory of Apportionment
- Optimal Adaptive Policies for Markov Decision Processes
- Optimal adaptive policies for sequential allocation problems
- Optimal stopping and dynamic allocation
- Probability Inequalities for Sums of Bounded Random Variables
- Regret analysis of stochastic and nonstochastic multi-armed bandit problems
- Regret bounds and minimax policies under partial monitoring
- Sample mean based index policies by O(log n) regret for the multi-armed bandit problem
- Sequential Tests of Statistical Hypotheses
- Some aspects of the sequential design of experiments
- Thompson sampling: an asymptotically optimal finite-time analysis
Cited in (40)
- Local Dvoretzky-Kiefer-Wolfowitz confidence bands
- Probabilistic learning constrained by realizations using a weak formulation of Fourier transform of probability measures
- A review of recent advances in empirical likelihood
- Asymptotically optimal multi-armed bandit policies under a cost constraint
- Lower bounds of success probabilities for high-fidelity approach in KLM scheme
- Probabilistic learning inference of boundary value problem with uncertainties based on Kullback-Leibler divergence under implicit constraints
- Learning the distribution with largest mean: two bandit frameworks
- Adaptive policies for perimeter surveillance problems
- Physics-constrained non-Gaussian probabilistic learning on manifolds
- Satisficing in Time-Sensitive Bandit Learning
- Batched bandit problems
- Learning to optimize via information-directed sampling
- Dealing with expert bias in collective decision-making
- Good arm identification via bandit feedback
- Explore first, exploit next: the true shape of regret in bandit problems
- Nonasymptotic sequential tests for overlapping hypotheses applied to near-optimal arm identification in bandit models
- scientific article; zbMATH DE number 6982311
- Learning to optimize via posterior sampling
- Boundary crossing for general exponential families
- Infinite Arms Bandit: Optimality via Confidence Bounds
- Scalar utility theory and proportional processing: what does it actually imply?
- Boundary crossing probabilities for general exponential families
- Infomax strategies for an optimal balance between exploration and exploitation
- A confirmation of a conjecture on Feldman’s two-armed bandit problem
- Bayesian adaptive bandit-based designs using the Gittins index for multi-armed trials with normally distributed endpoints
- scientific article; zbMATH DE number 7370545
- A minimax and asymptotically optimal algorithm for stochastic bandits
- Asymptotically optimal algorithms for budgeted multiple play bandits
- Normal bandits of unknown means and variances
- scientific article; zbMATH DE number 7370531
- scientific article; zbMATH DE number 7370524
- Technical note -- A note on the equivalence of upper confidence bounds and Gittins indices for patient agents
- Finite-time analysis for the knowledge-gradient policy
- scientific article; zbMATH DE number 7626761
- On Bayesian index policies for sequential resource allocation
- A better resource allocation algorithm with semi-bandit feedback
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- Tuning Bandit Algorithms in Stochastic Environments
- Regret bounds for Narendra-Shapiro bandit algorithms
- The multi-armed bandit problem: an efficient nonparametric solution