Reinforcement Learning, Bit by Bit
DOI: 10.1561/2200000097; zbMATH Open: 1525.68120; arXiv: 2103.04047; OpenAlex: W4383982036; MaRDI QID: Q6139546; FDO: Q6139546
Zheng Wen, Author name not available, Morteza Ibrahimi, Benjamin Van Roy, Ian Osband, Vikranth R. Dwaracherla
Publication date: 19 December 2023
Published in: Foundations and Trends® in Machine Learning
Full work available at URL: https://arxiv.org/abs/2103.04047
Learning and adaptive systems in artificial intelligence (68T05); Research exposition (monographs, survey articles) pertaining to computer science (68-02)
Cites Work
- Title not available
- Title not available
- Title not available
- Title not available
- DOI: 10.1162/153244303765208377
- Elements of Information Theory
- Asymptotically efficient adaptive allocation rules
- Finite-time analysis of the multiarmed bandit problem
- A Definition of Subjective Probability
- Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
- Adaptive treatment allocation and the multi-armed bandit problem
- The knowledge gradient algorithm for a general class of online learning problems
- An Adaptive Sampling Algorithm for Solving Markov Decision Processes
- Near-optimal reinforcement learning in polynomial time
- Practical issues in temporal difference learning
- Learning to Optimize via Posterior Sampling
- Some upper bounds for relative entropy and applications
- An adaptive optimal controller for discrete-time Markov environments
- Bayesian reinforcement learning: a survey
- An information-theoretic analysis of Thompson sampling
- A Tutorial on Thompson Sampling
- Learning to Optimize via Information-Directed Sampling
- Satisficing in Time-Sensitive Bandit Learning
Cited In (3)