Robustness of stochastic bandit policies
Publication:391739
DOI: 10.1016/j.tcs.2013.09.019, zbMath: 1371.68239, arXiv: 1107.4506, OpenAlex: W1985558253, MaRDI QID: Q391739
Antoine Salomon, Jean-Yves Audibert
Publication date: 13 January 2014
Published in: Theoretical Computer Science
Full work available at URL: https://arxiv.org/abs/1107.4506
Mathematics Subject Classification:
- Learning and adaptive systems in artificial intelligence (68T05)
- Sequential statistical analysis (62L10)
- Probabilistic games; gambling (91A60)
- General considerations in statistical decision theory (62C05)
Cites Work
- The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality
- Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
- Asymptotically efficient adaptive allocation rules
- When can the two-armed bandit algorithm be trusted?
- Optimal adaptive policies for sequential allocation problems
- Sample mean based index policies by O(log n) regret for the multi-armed bandit problem
- Some aspects of the sequential design of experiments
- Stationary multi-choice bandit problems
- Finite-time analysis of the multiarmed bandit problem