Publication:2896165
zbMath: 1242.91034
MaRDI QID: Q2896165
Jean-Yves Audibert, Sébastien Bubeck
Publication date: 13 July 2012
Full work available at URL: http://www.jmlr.org/papers/v11/audibert10a.html
Keywords: online learning; regret bound; minimax rate; bandits (adversarial and stochastic); label efficient; prediction with limited feedback; upper confidence bound (UCB) policy
68Q32: Computational learning theory
62M05: Markov processes: estimation; hidden Markov models
91A60: Probabilistic games; gambling
Related Items
Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback
Unnamed Item
Online Learning over a Finite Action Set with Limited Switching
Optimistic Gittins Indices
Setting Reserve Prices in Second-Price Auctions with Unobserved Bids
Data-Driven Decisions for Problems with an Unspecified Objective Function
Small-Loss Bounds for Online Learning with Partial Information
Unifying mirror descent and dual averaging
On two continuum armed bandit problems in high dimensions
Batched bandit problems
The multi-armed bandit problem with covariates
Kullback-Leibler upper confidence bounds for optimal sequential allocation
The \(K\)-armed dueling bandits problem
On Bayesian index policies for sequential resource allocation
Stochastic continuum-armed bandits with additive models: minimax regrets and adaptive algorithm
Ballooning multi-armed bandits
Truthful Mechanisms with Implicit Payment Computation
Bayesian Incentive-Compatible Bandit Exploration