EXPLORATION–EXPLOITATION POLICIES WITH ALMOST SURE, ARBITRARILY SLOW GROWING ASYMPTOTIC REGRET
Publication: 5070864
DOI: 10.1017/S0269964818000529
zbMath: 1484.62039
arXiv: 1505.02865
OpenAlex: W2914435863
MaRDI QID: Q5070864
Michael N. Katehakis, Wesley Cowan
Publication date: 14 April 2022
Published in: Probability in the Engineering and Informational Sciences
Full work available at URL: https://arxiv.org/abs/1505.02865
Keywords: online learning; upper confidence bounds; sequential allocation; bandits; inflated sample means; forcing actions; multi-armed
MSC classification: Asymptotic properties of nonparametric inference (62G20); Nonparametric estimation (62G05); Sequential statistical analysis (62L10)
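The keywords above (upper confidence bounds, inflated sample means, forcing actions) name the general ingredients of the exploration-exploitation policies studied in this publication. As an illustrative sketch only, and not the paper's actual policy, the following Python snippet plays a generic multi-armed bandit with a forced-exploration schedule and a UCB-style inflated sample mean; the Bernoulli arms, the forcing schedule ceil(log t), and the inflation term sqrt(2 log t / n) are assumptions made for this example.

```python
# Illustrative sketch only: a generic "forcing actions" bandit policy with
# inflated sample means (UCB-style indices). This is NOT the policy from the
# paper; the arm distributions, forcing schedule, and inflation term are
# hypothetical choices made for the example.
import math
import random

def forced_exploration_ucb(means, horizon=10_000, seed=0):
    """Play Bernoulli arms with the given true means for `horizon` rounds.

    At round t, if some arm has been sampled fewer than ceil(log t) times,
    one such arm is forced (exploration); otherwise the arm with the largest
    inflated sample mean  mean_hat + sqrt(2 log t / n)  is played.
    Returns the accumulated pseudo-regret against the best arm.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k          # times each arm has been played
    sums = [0.0] * k          # total reward collected from each arm
    best = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        forcing_level = math.ceil(math.log(t)) if t > 1 else 1
        under_sampled = [a for a in range(k) if counts[a] < forcing_level]
        if under_sampled:
            arm = rng.choice(under_sampled)        # forcing action
        else:
            def index(a):                          # inflated sample mean
                return sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
            arm = max(range(k), key=index)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]                # pseudo-regret increment
    return regret

if __name__ == "__main__":
    print(forced_exploration_ucb([0.3, 0.5, 0.6]))
```

Running it with three hypothetical arms, as in the example call above, returns the realized pseudo-regret over the horizon; choosing a slower-growing forcing schedule trades less forced exploration against the rate at which regret can grow.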
Cites Work
- An asymptotically optimal policy for finite support models in the multiarmed bandit problem
- Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
- Asymptotically efficient adaptive allocation rules
- Optimal adaptive policies for sequential allocation problems
- Regret Bounds for Reinforcement Learning via Markov Chain Concentration
- Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
- Multi-armed bandits under general depreciation and commitment
- Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
- Some aspects of the sequential design of experiments
- Finite-time analysis of the multiarmed bandit problem