Exploration-exploitation tradeoff using variance estimates in multi-armed bandits

Publication:1017665

DOI: 10.1016/j.tcs.2009.01.016
zbMath: 1167.68059
OpenAlex: W2142971854
MaRDI QID: Q1017665

Jean-Yves Audibert, Csaba Szepesvári, Rémi Munos

Publication date: 12 May 2009

Published in: Theoretical Computer Science

Full work available at URL: https://doi.org/10.1016/j.tcs.2009.01.016




Related Items (38)

Functional Sequential Treatment Allocation
Nonasymptotic Analysis of Monte Carlo Tree Search
Setting Reserve Prices in Second-Price Auctions with Unobserved Bids
Improving multi-armed bandit algorithms in online pricing settings
EXPLORATION–EXPLOITATION POLICIES WITH ALMOST SURE, ARBITRARILY SLOW GROWING ASYMPTOTIC REGRET
Adaptive Sampling Strategies for Stochastic Optimization
Kullback-Leibler upper confidence bounds for optimal sequential allocation
An adaptive and robust biological network based on the vacant-particle transportation model
Exploration and exploitation of scratch games
Robustness of stochastic bandit policies
Primal-Dual Algorithms for Optimization with Stochastic Dominance
Unnamed Item
Bayesian optimistic Kullback-Leibler exploration
Constructing effective personalized policies using counterfactual inference from biased data sets with many features
ASYMPTOTICALLY OPTIMAL MULTI-ARMED BANDIT POLICIES UNDER A COST CONSTRAINT
Robust supervised learning with coordinate gradient descent
Multi-armed linear bandits with latent biases
A confirmation of a conjecture on Feldman’s two-armed bandit problem
Pure exploration in finitely-armed and continuous-armed bandits
Finite-Time Analysis for the Knowledge-Gradient Policy
UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
Data-Driven Decisions for Problems with an Unspecified Objective Function
Corruption-tolerant bandit learning
Unnamed Item
On Bayesian index policies for sequential resource allocation
Tune and mix: learning to rank using ensembles of calibrated multi-class classifiers
Optimal learning with Bernstein Online Aggregation
Unnamed Item
Unnamed Item
Boundary crossing probabilities for general exponential families
Concentration inequalities for sampling without replacement
Time-uniform, nonparametric, nonasymptotic confidence sequences
Multi-armed bandits with episode context
Learning Unknown Service Rates in Queues: A Multiarmed Bandit Approach
Dismemberment and design for controlling the replication variance of regret for the multi-armed bandit
A PAC algorithm in relative precision for bandit problem with costly sampling
Mixing time estimation in reversible Markov chains from a single sample path
Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning




