A dynamic programming strategy to balance exploration and exploitation in the bandit problem
Publication:647433
DOI: 10.1007/s10472-010-9190-1
zbMath: 1226.68079
MaRDI QID: Q647433
Olivier Caelen, Gianluca Bontempi
Publication date: 23 November 2011
Published in: Annals of Mathematics and Artificial Intelligence
Full work available at URL: https://doi.org/10.1007/s10472-010-9190-1
62H12: Estimation in multivariate analysis
68T05: Learning and adaptive systems in artificial intelligence
68T20: Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.)
Cites Work
- A dynamic programming strategy to balance exploration and exploitation in the bandit problem
- Exploration of multi-state environments: Local measures and back-propagation of uncertainty
- The Sample Average Approximation Method for Stochastic Discrete Optimization
- Approximate Dynamic Programming
- Some aspects of the sequential design of experiments
- Finite-time analysis of the multiarmed bandit problem