A dynamic programming strategy to balance exploration and exploitation in the bandit problem
From MaRDI portal
Publication:647433
DOI: 10.1007/s10472-010-9190-1, zbMath: 1226.68079, OpenAlex: W2052471706, MaRDI QID: Q647433
Olivier Caelen, Gianluca Bontempi
Publication date: 23 November 2011
Published in: Annals of Mathematics and Artificial Intelligence
Full work available at URL: https://doi.org/10.1007/s10472-010-9190-1
Estimation in multivariate analysis (62H12); Learning and adaptive systems in artificial intelligence (68T05); Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.) (68T20)
Cites Work
- A dynamic programming strategy to balance exploration and exploitation in the bandit problem
- Exploration of multi-state environments: Local measures and back-propagation of uncertainty
- The Sample Average Approximation Method for Stochastic Discrete Optimization
- Approximate Dynamic Programming
- Some aspects of the sequential design of experiments
- Finite-time analysis of the multiarmed bandit problem
- Unnamed Item (7 entries)