Reinforcement Learning, Bit by Bit
Publication:6139546
DOI: 10.1561/2200000097; zbMath: 1525.68120; arXiv: 2103.04047; OpenAlex: W4383982036; MaRDI QID: Q6139546
Zheng Wen, Unnamed Author, Morteza Ibrahimi, Benjamin van Roy, Ian Osband, Vikranth R. Dwaracherla
Publication date: 19 December 2023
Published in: Foundations and Trends® in Machine Learning
Full work available at URL: https://arxiv.org/abs/2103.04047
MSC classification: Learning and adaptive systems in artificial intelligence (68T05); Research exposition (monographs, survey articles) pertaining to computer science (68-02)
Cites Work
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Asymptotically efficient adaptive allocation rules
- Adaptive treatment allocation and the multi-armed bandit problem
- Some upper bounds for relative entropy and applications
- Near-optimal reinforcement learning in polynomial time
- Practical issues in temporal difference learning
- Convex Optimization: Algorithms and Complexity
- The Knowledge Gradient Algorithm for a General Class of Online Learning Problems
- 10.1162/153244303765208377
- An adaptive optimal controller for discrete-time Markov environments
- A Tutorial on Thompson Sampling
- Learning to Optimize via Information-Directed Sampling
- Learning to Optimize via Posterior Sampling
- An Adaptive Sampling Algorithm for Solving Markov Decision Processes
- Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
- Elements of Information Theory
- A Definition of Subjective Probability
- Satisficing in Time-Sensitive Bandit Learning
- Finite-time analysis of the multiarmed bandit problem