An information-theoretic analysis of return maximization in reinforcement learning
From MaRDI portal
Publication:2375396
DOI: 10.1016/J.NEUNET.2011.05.002
zbMATH Open: 1266.68156
OpenAlex: W2034994237
Wikidata: Q51559078
Scholia: Q51559078
MaRDI QID: Q2375396
FDO: Q2375396
Authors: Kazunori Iwata
Publication date: 14 June 2013
Published in: Neural Networks
Full work available at URL: https://doi.org/10.1016/j.neunet.2011.05.002
Recommendations
- The asymptotic equipartition property in reinforcement learning and its relation to return maximization
- An information-theoretic analysis of Thompson sampling
- Reinforcement learning in finite MDPs: PAC analysis
- Near-optimal reinforcement learning in polynomial time
- Using Expectation-Maximization for Reinforcement Learning
Keywords: information theory; reinforcement learning; asymptotic equipartition property; stochastic sequential decision process
Cites Work
- Elements of Information Theory
- Title not available
- Title not available
- A Mathematical Theory of Communication
- Title not available
- \({\mathcal Q}\)-learning
- Title not available
- Approximation theory of output statistics
- Title not available
- The Basic Theorems of Information Theory
- The Individual Ergodic Theorem of Information Theory
- Title not available
- The convergence of \(TD(\lambda)\) for general \(\lambda\)
- Discrete Dynamic Programming
- Asymptotically mean stationary measures
- The method of types [information theory]
- Asynchronous stochastic approximation and Q-learning
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- The strong ergodic theorem for densities: Generalized Shannon-McMillan-Breiman theorem
- A simple proof of the Moy-Perez generalization of the Shannon-McMillan theorem
- A New Optimality Criterion for Nonhomogeneous Markov Decision Processes
- Convergence results for single-step on-policy reinforcement-learning algorithms
- Correction Notes: Correction to "The Individual Ergodic Theorem of Information Theory"
- Generalizations of Shannon-McMillan theorem
- Boundedness of iterates in \(Q\)-learning
- The asymptotic equipartition property in reinforcement learning and its relation to return maximization
- The role of the asymptotic equipartition property in noiseless source coding
- Model-Based Reinforcement Learning for Partially Observable Games with Sampling-Based State Estimation
Cited In (3)