Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
Publication:1017665
DOI: 10.1016/j.tcs.2009.01.016
zbMath: 1167.68059
OpenAlex: W2142971854
MaRDI QID: Q1017665
Jean-Yves Audibert, Csaba Szepesvári, Rémi Munos
Publication date: 12 May 2009
Published in: Theoretical Computer Science
Full work available at URL: https://doi.org/10.1016/j.tcs.2009.01.016
Keywords: risk analysis, multi-armed bandits, Bernstein's inequality, exploration-exploitation tradeoff, high-probability bound
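The paper studies variance-aware upper-confidence-bound policies (UCB-V), in which the exploration bonus uses the empirical variance via a Bernstein-type inequality. The following is a minimal Python sketch of such an index under the assumption of rewards bounded in [0, b]; the function name and the constants zeta and c are illustrative choices, not part of this record.
```python
import math

def ucb_v_index(mean, variance, pulls, t, b=1.0, zeta=1.2, c=1.0):
    """Empirical-Bernstein (UCB-V-style) index for a single arm.

    mean, variance : empirical mean and variance of the arm's observed rewards
    pulls          : number of times the arm has been played (>= 1)
    t              : current round, entering the exploration function
    b              : known upper bound on the reward range [0, b]
    zeta, c        : exploration-tuning constants (values here are illustrative)
    """
    explore = zeta * math.log(t)                           # exploration function
    return (mean
            + math.sqrt(2.0 * variance * explore / pulls)  # variance-dependent term
            + c * 3.0 * b * explore / pulls)               # Bernstein range correction

# At each round, a policy of this type plays the arm with the largest index.
```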
Related Items (38)
- Functional Sequential Treatment Allocation
- Nonasymptotic Analysis of Monte Carlo Tree Search
- Setting Reserve Prices in Second-Price Auctions with Unobserved Bids
- Improving multi-armed bandit algorithms in online pricing settings
- EXPLORATION–EXPLOITATION POLICIES WITH ALMOST SURE, ARBITRARILY SLOW GROWING ASYMPTOTIC REGRET
- Adaptive Sampling Strategies for Stochastic Optimization
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- An adaptive and robust biological network based on the vacant-particle transportation model
- Exploration and exploitation of scratch games
- Robustness of stochastic bandit policies
- Primal-Dual Algorithms for Optimization with Stochastic Dominance
- Unnamed Item
- Bayesian optimistic Kullback-Leibler exploration
- Constructing effective personalized policies using counterfactual inference from biased data sets with many features
- ASYMPTOTICALLY OPTIMAL MULTI-ARMED BANDIT POLICIES UNDER A COST CONSTRAINT
- Robust supervised learning with coordinate gradient descent
- Multi-armed linear bandits with latent biases
- A confirmation of a conjecture on Feldman's two-armed bandit problem
- Pure exploration in finitely-armed and continuous-armed bandits
- Finite-Time Analysis for the Knowledge-Gradient Policy
- UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
- Data-Driven Decisions for Problems with an Unspecified Objective Function
- Corruption-tolerant bandit learning
- Unnamed Item
- On Bayesian index policies for sequential resource allocation
- Tune and mix: learning to rank using ensembles of calibrated multi-class classifiers
- Optimal learning with Bernstein Online Aggregation
- Unnamed Item
- Unnamed Item
- Boundary crossing probabilities for general exponential families
- Concentration inequalities for sampling without replacement
- Time-uniform, nonparametric, nonasymptotic confidence sequences
- Multi-armed bandits with episode context
- Learning Unknown Service Rates in Queues: A Multiarmed Bandit Approach
- Dismemberment and design for controlling the replication variance of regret for the multi-armed bandit
- A PAC algorithm in relative precision for bandit problem with costly sampling
- Mixing time estimation in reversible Markov chains from a single sample path
- Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning
Cites Work
- Unnamed Item
- Asymptotically efficient adaptive allocation rules
- On tail probabilities for martingales
- Machine learning and nonparametric bandit theory
- Sample mean based index policies by O(log n) regret for the multi-armed bandit problem
- Probability Inequalities for Sums of Bounded Random Variables
- Some aspects of the sequential design of experiments
- Finite-time analysis of the multiarmed bandit problem