Learning the distribution with largest mean: two bandit frameworks
Publication:4606431
DOI: 10.1051/proc/201760114, zbMath: 1426.68237, arXiv: 1702.00001, OpenAlex: W2584453124, MaRDI QID: Q4606431
Emilie Kaufmann, Aurélien Garivier
Publication date: 7 March 2018
Published in: ESAIM: Proceedings and Surveys
Full work available at URL: https://arxiv.org/abs/1702.00001
Learning and adaptive systems in artificial intelligence (68T05), Optimal stopping in statistics (62L15)
Related Items
Response-adaptive randomization in clinical trials: from myths to practical considerations, Learning the distribution with largest mean: two bandit frameworks
Cites Work
- Batched bandit problems
- The multi-armed bandit problem with covariates
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- Context tree selection: a unifying view
- Asymptotically efficient adaptive allocation rules
- Landmark learning: An illustration of associative search
- On Bayesian index policies for sequential resource allocation
- Optimal adaptive policies for sequential allocation problems
- Pure exploration in finitely-armed and continuous-armed bandits
- On Upper-Confidence Bound Policies for Switching Bandit Problems
- Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis
- Sequential Design of Experiments
- Asymptotically efficient adaptive allocation schemes for controlled i.i.d. processes: finite parameter space
- Asymptotically Efficient Adaptive Choice of Control Laws in Controlled Markov Chains
- Learning the distribution with largest mean: two bandit frameworks
- A minimax and asymptotically optimal algorithm for stochastic bandits
- Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
- Simple Bayesian Algorithms for Best-Arm Identification
- Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
- Prediction, Learning, and Games
- Some aspects of the sequential design of experiments
- A Single-Sample Multiple Decision Procedure for Ranking Means of Normal Populations with known Variances
- Finite-time analysis of the multiarmed bandit problem