Kullback-Leibler upper confidence bounds for optimal sequential allocation

Publication:366995

DOI: 10.1214/13-AOS1119
zbMath: 1293.62161
arXiv: 1210.1136
OpenAlex: W3100329718
MaRDI QID: Q366995

Rémi Munos, Odalric-Ambrym Maillard, Olivier Cappé, Gilles Stoltz, Aurélien Garivier

Publication date: 25 September 2013

Published in: The Annals of Statistics

Full work available at URL: https://arxiv.org/abs/1210.1136




Related Items (33)

Bayesian adaptive bandit-based designs using the Gittins index for multi-armed trials with normally distributed endpoints
Batched bandit problems
Scalar utility theory and proportional processing: what does it actually imply?
Probabilistic learning inference of boundary value problem with uncertainties based on Kullback-Leibler divergence under implicit constraints
Infomax strategies for an optimal balance between exploration and exploitation
Kullback-Leibler upper confidence bounds for optimal sequential allocation
Regret bounds for Narendra-Shapiro bandit algorithms
The multi-armed bandit problem: an efficient nonparametric solution
Local Dvoretzky-Kiefer-Wolfowitz confidence bands
Dealing with expert bias in collective decision-making
Unnamed Item
Unnamed Item
Unnamed Item
Good arm identification via bandit feedback
ASYMPTOTICALLY OPTIMAL MULTI-ARMED BANDIT POLICIES UNDER A COST CONSTRAINT
Probabilistic learning constrained by realizations using a weak formulation of Fourier transform of probability measures
A confirmation of a conjecture on Feldman’s two-armed bandit problem
Learning the distribution with largest mean: two bandit frameworks
Finite-Time Analysis for the Knowledge-Gradient Policy
Unnamed Item
On Bayesian index policies for sequential resource allocation
Unnamed Item
Unnamed Item
Infinite Arms Bandit: Optimality via Confidence Bounds
Boundary crossing probabilities for general exponential families
Adaptive policies for perimeter surveillance problems
Learning to Optimize via Information-Directed Sampling
Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
Nonasymptotic sequential tests for overlapping hypotheses applied to near-optimal arm identification in bandit models
Technical Note—A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents
Learning to Optimize via Posterior Sampling
Asymptotically optimal algorithms for budgeted multiple play bandits
Satisficing in Time-Sensitive Bandit Learning



