Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
DOI: 10.1016/j.tcs.2009.01.016
zbMATH Open: 1167.68059
OpenAlex: W2142971854
MaRDI QID: Q1017665
FDO: Q1017665
Authors: Jean-Yves Audibert, Rémi Munos, Csaba Szepesvári
Publication date: 12 May 2009
Published in: Theoretical Computer Science
Full work available at URL: https://doi.org/10.1016/j.tcs.2009.01.016
Keywords: risk analysis; multi-armed bandits; Bernstein's inequality; exploration-exploitation tradeoff; high-probability bound
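The keywords point to the paper's central tool: an upper-confidence index built from empirical variance estimates via a Bernstein-style inequality. As a rough illustration only (not a statement of the paper's exact algorithm — the exploration function, the constants, and the reward range `b` below are assumptions), such a variance-aware index can be sketched as:

```python
import math

def variance_aware_index(mean, var, n, t, b=1.0):
    """Variance-aware upper-confidence index (empirical-Bernstein style).

    mean, var : empirical mean and variance of an arm's rewards
    n         : number of times the arm has been pulled
    t         : total number of rounds so far
    b         : assumed upper bound on rewards lying in [0, b]
    """
    e = math.log(t)  # assumed exploration function of the round count
    # Variance-driven term plus a range-dependent correction term:
    return mean + math.sqrt(2.0 * var * e / n) + 3.0 * b * e / n

# Toy usage: with equal means and pull counts, a low-variance arm
# receives a tighter (smaller) exploration bonus than a high-variance one.
low = variance_aware_index(mean=0.5, var=0.01, n=100, t=1000)
high = variance_aware_index(mean=0.5, var=0.25, n=100, t=1000)
```

The point of conditioning the bonus on the empirical variance, rather than only on the reward range as in classical UCB, is that arms with small variance can be discarded after fewer pulls.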
Cites Work
- On tail probabilities for martingales
- Probability Inequalities for Sums of Bounded Random Variables
- Asymptotically efficient adaptive allocation rules
- Some aspects of the sequential design of experiments
- Finite-time analysis of the multiarmed bandit problem
- Sample mean based index policies by O(log n) regret for the multi-armed bandit problem
- Title not available
- Machine learning and nonparametric bandit theory
Cited In (45)
- Primal-dual algorithms for optimization with stochastic dominance
- Robustness of stochastic bandit policies
- 10.1162/153244303321897663
- Time-uniform, nonparametric, nonasymptotic confidence sequences
- Pure exploration in finitely-armed and continuous-armed bandits
- Profile-based bandit with unknown profiles
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- On Bayesian index policies for sequential resource allocation
- UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
- Setting Reserve Prices in Second-Price Auctions with Unobserved Bids
- Exploration-exploitation policies with almost sure, arbitrarily slow growing asymptotic regret
- Boundary crossing probabilities for general exponential families
- Functional Sequential Treatment Allocation
- Title not available
- Improving multi-armed bandit algorithms in online pricing settings
- Title not available
- Volumetric spanners: an efficient exploration basis for learning
- Multi-armed bandits with episode context
- Asymptotically optimal multi-armed bandit policies under a cost constraint
- Finite-time analysis for the knowledge-gradient policy
- Multi-armed linear bandits with latent biases
- A confirmation of a conjecture on Feldman’s two-armed bandit problem
- Normal bandits of unknown means and variances
- An adaptive and robust biological network based on the vacant-particle transportation model
- Title not available
- Nonasymptotic Analysis of Monte Carlo Tree Search
- Tuning Bandit Algorithms in Stochastic Environments
- Optimal learning with Bernstein online aggregation
- Learning unknown service rates in queues: a multiarmed bandit approach
- Adaptive sampling strategies for stochastic optimization
- A stochastic process approach for multi-agent path finding with non-asymptotic performance guarantees
- Multi-armed bandits based on a variant of simulated annealing
- Data-driven decisions for problems with an unspecified objective function
- Tune and mix: learning to rank using ensembles of calibrated multi-class classifiers
- Concentration inequalities for sampling without replacement
- Mixing time estimation in reversible Markov chains from a single sample path
- Dismemberment and design for controlling the replication variance of regret for the multi-armed bandit
- Exploration and exploitation of scratch games
- Robust supervised learning with coordinate gradient descent
- A PAC algorithm in relative precision for bandit problem with costly sampling
- Corruption-tolerant bandit learning
- Understanding the stochastic dynamics of sequential decision-making processes: a path-integral analysis of multi-armed bandits
- Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning
- Bayesian optimistic Kullback-Leibler exploration
- Constructing effective personalized policies using counterfactual inference from biased data sets with many features