Satisficing in Time-Sensitive Bandit Learning
Publication:5870357
DOI: 10.1287/moor.2021.1229
OpenAlex: W2791950347
MaRDI QID: Q5870357
Benjamin Van Roy, Daniel J. Russo
Publication date: 9 January 2023
Published in: Mathematics of Operations Research
Full work available at URL: https://arxiv.org/abs/1803.02855
information theory, online optimization, satisficing, rate-distortion theory, Thompson sampling, bandit learning
Bayesian problems; characterization of Bayes procedures (62C10)
Learning and adaptive systems in artificial intelligence (68T05)
Related Items
Cites Work
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- Asymptotically efficient adaptive allocation rules
- Bandit problems with infinitely many arms
- Choosing a good toolkit. I: Prior-free heuristics
- Choosing a good toolkit. II: Bayes-rule based heuristics
- The Knowledge Gradient Algorithm for a General Class of Online Learning Problems
- Linearly Parameterized Bandits
- A Tutorial on Thompson Sampling
- Learning to Optimize via Posterior Sampling
- Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems