A dynamic programming strategy to balance exploration and exploitation in the bandit problem
DOI: 10.1007/s10472-010-9190-1
zbMATH Open: 1226.68079
OpenAlex: W2052471706
MaRDI QID: Q647433
Authors: Olivier Caelen, Gianluca Bontempi
Publication date: 23 November 2011
Published in: Annals of Mathematics and Artificial Intelligence
Full work available at URL: https://doi.org/10.1007/s10472-010-9190-1
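For context, the exploration-exploitation trade-off addressed by the paper can be illustrated with the standard Bayes-adaptive dynamic program for a two-armed Bernoulli bandit. This is a generic textbook formulation, not necessarily the authors' strategy: the state is the pair of Beta posteriors over the arms' success probabilities, and backward induction over the remaining horizon weighs immediate reward against the value of information. A minimal Python sketch, assuming Beta(1,1) priors and a finite horizon of t pulls:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def value(s1, f1, s2, f2, t):
    # Expected total reward achievable with t pulls remaining, given
    # Beta(1 + s_i, 1 + f_i) posteriors over each arm's success probability.
    if t == 0:
        return 0.0
    # Posterior mean success probability of each arm.
    p1 = (s1 + 1) / (s1 + f1 + 2)
    p2 = (s2 + 1) / (s2 + f2 + 2)
    # Bellman backup: pull an arm, observe success or failure, update the state.
    q1 = (p1 * (1 + value(s1 + 1, f1, s2, f2, t - 1))
          + (1 - p1) * value(s1, f1 + 1, s2, f2, t - 1))
    q2 = (p2 * (1 + value(s1, f1, s2 + 1, f2, t - 1))
          + (1 - p2) * value(s1, f1, s2, f2 + 1, t - 1))
    return max(q1, q2)

if __name__ == "__main__":
    # Bayes-optimal expected reward over a horizon of 10 pulls,
    # starting from uniform priors on both arms.
    print(value(0, 0, 0, 0, 10))
```

Because the backup enumerates both outcomes of both arms at every state, the state space grows polynomially with the horizon; this sketch is only practical for short horizons.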
Recommendations
- Optimal exploration-exploitation in a multi-armed bandit problem with non-stationary rewards
- Untitled scientific article (zbMATH DE number 1907146)
- Finite-time analysis of the multiarmed bandit problem
- Pure exploration in multi-armed bandits problems
- Regret analysis of stochastic and nonstochastic multi-armed bandit problems
MSC classification
- Learning and adaptive systems in artificial intelligence (68T05)
- Estimation in multivariate analysis (62H12)
- Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.) (68T20)
Cites Work
- Title not available
- Title not available
- Pattern recognition and machine learning.
- Title not available
- Title not available
- Some aspects of the sequential design of experiments
- Finite-time analysis of the multiarmed bandit problem
- The sample average approximation method for stochastic discrete optimization
- Title not available
- Approximate Dynamic Programming
- Title not available
- Exploration of multi-state environments: Local measures and back-propagation of uncertainty
- A dynamic programming strategy to balance exploration and exploitation in the bandit problem
Cited In (1)
Uses Software