An information-theoretic analysis of return maximization in reinforcement learning

Recommendations

The asymptotic equipartition property in reinforcement learning and its relation to return maximization
An information-theoretic analysis of Thompson sampling
Reinforcement learning in finite MDPs: PAC analysis
Near-optimal reinforcement learning in polynomial time
Using Expectation-Maximization for Reinforcement Learning

Cites work

scientific article; zbMATH DE number 3126094
scientific article; zbMATH DE number 3148886
scientific article; zbMATH DE number 3908323
scientific article; zbMATH DE number 1043533
scientific article; zbMATH DE number 1179314
scientific article; zbMATH DE number 1821199
A Mathematical Theory of Communication
A New Optimality Criterion for Nonhomogeneous Markov Decision Processes
A simple proof of the Moy-Perez generalization of the Shannon-McMillan theorem
Approximation theory of output statistics
Asymptotically mean stationary measures
Asynchronous stochastic approximation and Q-learning
Boundedness of iterates in \(Q\)-learning
Convergence results for single-step on-policy reinforcement-learning algorithms
Correction Notes: Correction to "The Individual Ergodic Theorem of Information Theory"
Discrete Dynamic Programming
Elements of Information Theory
Generalizations of Shannon-McMillan theorem
Model-Based Reinforcement Learning for Partially Observable Games with Sampling-Based State Estimation
The Basic Theorems of Information Theory
The Individual Ergodic Theorem of Information Theory
The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
The asymptotic equipartition property in reinforcement learning and its relation to return maximization
The convergence of \(TD(\lambda)\) for general \(\lambda\)
The method of types [information theory]
The role of the asymptotic equipartition property in noiseless source coding
The strong ergodic theorem for densities: Generalized Shannon-McMillan-Breiman theorem
\({\mathcal Q}\)-learning

Cited in

The asymptotic equipartition property in reinforcement learning and its relation to return maximization
An information-theoretic analysis of Thompson sampling
Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning
