Reinforcement Learning, Bit by Bit
Publication:6139546
DOI: 10.1561/2200000097, zbMath: 1525.68120, arXiv: 2103.04047, OpenAlex: W4383982036, MaRDI QID: Q6139546
Zheng Wen, Unnamed Author, Morteza Ibrahimi, Benjamin van Roy, Ian Osband, Vikranth R. Dwaracherla
Publication date: 19 December 2023
Published in: Foundations and Trends® in Machine Learning
Abstract: Reinforcement learning agents have demonstrated remarkable achievements in simulated environments. Data efficiency poses an impediment to carrying this success over to real environments. The design of data-efficient agents calls for a deeper understanding of information acquisition and representation. We discuss concepts and regret analysis that together offer principled guidance. This line of thinking sheds light on questions of what information to seek, how to seek that information, and what information to retain. To illustrate concepts, we design simple agents that build on them and present computational results that highlight data efficiency.
Full work available at URL: https://arxiv.org/abs/2103.04047
Learning and adaptive systems in artificial intelligence (68T05)
Research exposition (monographs, survey articles) pertaining to computer science (68-02)
Cites Work
- Asymptotically efficient adaptive allocation rules
- Adaptive treatment allocation and the multi-armed bandit problem
- Some upper bounds for relative entropy and applications
- Near-optimal reinforcement learning in polynomial time
- Practical issues in temporal difference learning
- Convex Optimization: Algorithms and Complexity
- The Knowledge Gradient Algorithm for a General Class of Online Learning Problems
- DOI: 10.1162/153244303765208377
- An adaptive optimal controller for discrete-time Markov environments
- A Tutorial on Thompson Sampling
- Learning to Optimize via Information-Directed Sampling
- Learning to Optimize via Posterior Sampling
- An Adaptive Sampling Algorithm for Solving Markov Decision Processes
- Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
- Elements of Information Theory
- A Definition of Subjective Probability
- Satisficing in Time-Sensitive Bandit Learning
- Finite-time analysis of the multiarmed bandit problem
Related Items (3)
- Reinforcement learning in non-Markovian environments
- Finding the optimal exploration-exploitation trade-off online through Bayesian risk estimation and minimization
- Occupancy information ratio: infinite-horizon, information-directed, parameterized policy search