Multi-armed bandits based on a variant of simulated annealing
Publication:2520136
DOI: 10.1007/s13226-016-0184-5; zbMath: 1351.90161; OpenAlex: W2475275076; MaRDI QID: Q2520136
Shalabh Bhatnagar, Mohammed Shahid Abdulla
Publication date: 13 December 2016
Published in: Indian Journal of Pure & Applied Mathematics
Full work available at URL: https://doi.org/10.1007/s13226-016-0184-5
Approximation methods and heuristics in mathematical programming (90C59)
Markov and semi-Markov decision processes (90C40)
Probabilistic games; gambling (91A60)
Cites Work
- Stochastic approximation. A dynamical systems viewpoint.
- Adaptive game playing using multiplicative weights
- The Irrevocable Multiarmed Bandit Problem
- The Nonstochastic Multiarmed Bandit Problem
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- An Asymptotically Efficient Simulation-Based Algorithm for Finite Horizon Stochastic Dynamic Programming
- An Adaptive Sampling Algorithm for Solving Markov Decision Processes
- Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
- Finite-time analysis of the multiarmed bandit problem