Multi-armed bandits based on a variant of simulated annealing
Publication:2520136
DOI: 10.1007/S13226-016-0184-5
zbMATH Open: 1351.90161
OpenAlex: W2475275076
MaRDI QID: Q2520136
FDO: Q2520136
Authors: Mohammed Shahid Abdulla, Shalabh Bhatnagar
Publication date: 13 December 2016
Published in: Indian Journal of Pure & Applied Mathematics
Full work available at URL: https://doi.org/10.1007/s13226-016-0184-5
Recommendations
- UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
- scientific article; zbMATH DE number 3976142
- Thompson sampling: an asymptotically optimal finite-time analysis
- Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
- Sample mean based index policies by O(log n) regret for the multi-armed bandit problem
Mathematics Subject Classification
- Approximation methods and heuristics in mathematical programming (90C59)
- Markov and semi-Markov decision processes (90C40)
- Probabilistic games; gambling (91A60)
Cites Work
- Stochastic approximation. A dynamical systems viewpoint.
- The Nonstochastic Multiarmed Bandit Problem
- Finite-time analysis of the multiarmed bandit problem
- Adaptive game playing using multiplicative weights
- Regret analysis of stochastic and nonstochastic multi-armed bandit problems
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- An Adaptive Sampling Algorithm for Solving Markov Decision Processes
- The Irrevocable Multiarmed Bandit Problem
- An Asymptotically Efficient Simulation-Based Algorithm for Finite Horizon Stochastic Dynamic Programming