UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
Publication: 653803
DOI: 10.1007/s10998-010-3055-6
zbMath: 1240.68164
OpenAlex: W1975779216
MaRDI QID: Q653803
Publication date: 19 December 2011
Published in: Periodica Mathematica Hungarica
Full work available at URL: https://doi.org/10.1007/s10998-010-3055-6
MSC classification:
- Markov processes: estimation; hidden Markov models (62M05)
- Learning and adaptive systems in artificial intelligence (68T05)
- Probabilistic games; gambling (91A60)
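The paper analyzes an improved variant of the UCB index policy for the stochastic multi-armed bandit problem. As background, here is a minimal sketch of the classic UCB1 policy from the cited work "Finite-time analysis of the multiarmed bandit problem" (Auer, Cesa-Bianchi, Fischer), not the improved algorithm of this paper; the function names and the toy Bernoulli bandit are illustrative assumptions.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Classic UCB1 index policy (Auer et al.): play each arm once,
    then pull the arm maximizing empirical mean + sqrt(2 ln t / n_a).

    pull(arm) must return a reward in [0, 1]. Returns per-arm pull
    counts and empirical mean rewards after `horizon` rounds.
    """
    counts = [0] * n_arms   # number of times each arm was pulled
    means = [0.0] * n_arms  # empirical mean reward of each arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1     # initialization: play every arm once
        else:
            # UCB index: exploitation term + exploration bonus
            arm = max(range(n_arms),
                      key=lambda a: means[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # running mean
    return counts, means

# Toy two-armed Bernoulli bandit (illustrative): arm 1 is better.
rng = random.Random(42)
probs = [0.1, 0.9]
counts, means = ucb1(lambda a: 1.0 if rng.random() < probs[a] else 0.0,
                     n_arms=2, horizon=2000)
```

With a reward gap this large, UCB1's logarithmic regret bound implies the suboptimal arm is pulled only O(log n) times, so nearly all pulls concentrate on arm 1.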
Related Items (15)
- Batched bandit problems
- Modification of improved upper confidence bounds for regulating exploration in Monte-Carlo tree search
- The multi-armed bandit problem with covariates
- Unnamed Item
- Asymptotically optimal multi-armed bandit policies under a cost constraint
- Transfer learning for contextual multi-armed bandits
- Ballooning multi-armed bandits
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Explore first, exploit next: the true shape of regret in bandit problems
- Approximations of the restless bandit problem
- Unnamed Item
- Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning
- A bandit-learning approach to multifidelity approximation
Cites Work
- Unnamed Item
- Unnamed Item
- Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
- Asymptotically efficient adaptive allocation rules
- The Nonstochastic Multiarmed Bandit Problem
- Sample mean based index policies by O(log n) regret for the multi-armed bandit problem
- Probability Inequalities for Sums of Bounded Random Variables
- Finite-time analysis of the multiarmed bandit problem