Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
Publication:1017665
DOI: 10.1016/j.tcs.2009.01.016
zbMath: 1167.68059
OpenAlex: W2142971854
MaRDI QID: Q1017665
Jean-Yves Audibert, Csaba Szepesvári, Rémi Munos
Publication date: 12 May 2009
Published in: Theoretical Computer Science
Full work available at URL: https://doi.org/10.1016/j.tcs.2009.01.016
Keywords: risk analysis, multi-armed bandits, Bernstein's inequality, exploration-exploitation tradeoff, high-probability bound
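The paper studies variance-aware upper-confidence-bound policies (UCB-V), in which the exploration bonus uses the empirical variance via a Bernstein-type inequality. The following is a minimal Python sketch of such an index under the assumption of rewards bounded in [0, b]; the function name and the constants zeta and c are illustrative choices, not part of this record.
```python
import math

def ucb_v_index(mean, variance, pulls, t, b=1.0, zeta=1.2, c=1.0):
    """Empirical-Bernstein (UCB-V-style) index for a single arm.

    mean, variance : empirical mean and variance of the arm's observed rewards
    pulls          : number of times the arm has been played (>= 1)
    t              : current round, entering the exploration function
    b              : known upper bound on the reward range [0, b]
    zeta, c        : exploration-tuning constants (values here are illustrative)
    """
    explore = zeta * math.log(t)                           # exploration function
    return (mean
            + math.sqrt(2.0 * variance * explore / pulls)  # variance-dependent term
            + c * 3.0 * b * explore / pulls)               # Bernstein range correction

# At each round, a policy of this type plays the arm with the largest index.
```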
Related Items (38)
- Functional Sequential Treatment Allocation
- Nonasymptotic Analysis of Monte Carlo Tree Search
- Setting Reserve Prices in Second-Price Auctions with Unobserved Bids
- Improving multi-armed bandit algorithms in online pricing settings
- EXPLORATION–EXPLOITATION POLICIES WITH ALMOST SURE, ARBITRARILY SLOW GROWING ASYMPTOTIC REGRET
- Adaptive Sampling Strategies for Stochastic Optimization
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- An adaptive and robust biological network based on the vacant-particle transportation model
- Exploration and exploitation of scratch games
- Robustness of stochastic bandit policies
- Primal-Dual Algorithms for Optimization with Stochastic Dominance
- Unnamed Item
- Bayesian optimistic Kullback-Leibler exploration
- Constructing effective personalized policies using counterfactual inference from biased data sets with many features
- ASYMPTOTICALLY OPTIMAL MULTI-ARMED BANDIT POLICIES UNDER A COST CONSTRAINT
- Robust supervised learning with coordinate gradient descent
- Multi-armed linear bandits with latent biases
- A confirmation of a conjecture on Feldman's two-armed bandit problem
- Pure exploration in finitely-armed and continuous-armed bandits
- Finite-Time Analysis for the Knowledge-Gradient Policy
- UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
- Data-Driven Decisions for Problems with an Unspecified Objective Function
- Corruption-tolerant bandit learning
- Unnamed Item
- On Bayesian index policies for sequential resource allocation
- Tune and mix: learning to rank using ensembles of calibrated multi-class classifiers
- Optimal learning with Bernstein Online Aggregation
- Unnamed Item
- Unnamed Item
- Boundary crossing probabilities for general exponential families
- Concentration inequalities for sampling without replacement
- Time-uniform, nonparametric, nonasymptotic confidence sequences
- Multi-armed bandits with episode context
- Learning Unknown Service Rates in Queues: A Multiarmed Bandit Approach
- Dismemberment and design for controlling the replication variance of regret for the multi-armed bandit
- A PAC algorithm in relative precision for bandit problem with costly sampling
- Mixing time estimation in reversible Markov chains from a single sample path
- Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning
Cites Work
- Unnamed Item
- Asymptotically efficient adaptive allocation rules
- On tail probabilities for martingales
- Machine learning and nonparametric bandit theory
- Sample mean based index policies by O(log n) regret for the multi-armed bandit problem
- Probability Inequalities for Sums of Bounded Random Variables
- Some aspects of the sequential design of experiments
- Finite-time analysis of the multiarmed bandit problem