Reinforcement Learning, Bit by Bit
From MaRDI portal
Publication:6139546
Abstract: Reinforcement learning agents have demonstrated remarkable achievements in simulated environments. Data efficiency poses an impediment to carrying this success over to real environments. The design of data-efficient agents calls for a deeper understanding of information acquisition and representation. We discuss concepts and regret analysis that together offer principled guidance. This line of thinking sheds light on questions of what information to seek, how to seek that information, and what information to retain. To illustrate concepts, we design simple agents that build on them and present computational results that highlight data efficiency.
Cites work
- scientific article; zbMATH DE number 3474804
- scientific article; zbMATH DE number 1321699
- 10.1162/153244303765208377
- A Definition of Subjective Probability
- A Tutorial on Thompson Sampling
- Adaptive treatment allocation and the multi-armed bandit problem
- An Adaptive Sampling Algorithm for Solving Markov Decision Processes
- An adaptive optimal controller for discrete-time Markov environments
- An information-theoretic analysis of Thompson sampling
- Asymptotically efficient adaptive allocation rules
- Bayesian reinforcement learning: a survey
- Deep exploration via randomized value functions
- Elements of Information Theory
- Finite-time analysis of the multiarmed bandit problem
- Learning to optimize via information-directed sampling
- Learning to optimize via posterior sampling
- Near-optimal reinforcement learning in polynomial time
- Practical issues in temporal difference learning
- Regret analysis of stochastic and nonstochastic multi-armed bandit problems
- Reinforcement learning. An introduction
- Satisficing in Time-Sensitive Bandit Learning
- Some upper bounds for relative entropy and applications
- The knowledge gradient algorithm for a general class of online learning problems
Cited in (3)