A dynamic programming strategy to balance exploration and exploitation in the bandit problem
From MaRDI portal
Recommendations
- Optimal exploration-exploitation in a multi-armed bandit problem with non-stationary rewards
- scientific article; zbMATH DE number 1907146
- Finite-time analysis of the multiarmed bandit problem
- Pure exploration in multi-armed bandits problems
- Regret analysis of stochastic and nonstochastic multi-armed bandit problems
Cites work
- scientific article; zbMATH DE number 4061056
- scientific article; zbMATH DE number 1306865
- scientific article; zbMATH DE number 1321699
- scientific article; zbMATH DE number 700091
- scientific article; zbMATH DE number 708500
- scientific article; zbMATH DE number 194374
- A dynamic programming strategy to balance exploration and exploitation in the bandit problem
- Approximate Dynamic Programming
- Exploration of multi-state environments: Local measures and back-propagation of uncertainty
- Finite-time analysis of the multiarmed bandit problem
- Pattern recognition and machine learning.
- Some aspects of the sequential design of experiments
- The sample average approximation method for stochastic discrete optimization
Cited in (2)
This page was built for publication: A dynamic programming strategy to balance exploration and exploitation in the bandit problem (MaRDI item Q647433)