An information-theoretic analysis of return maximization in reinforcement learning (Q2375396): Difference between revisions

@@ Property / full work available at URL @@
+https://doi.org/10.1016/j.neunet.2011.05.002
+Normal rank
@@ Property / OpenAlex ID @@
+W2034994237
@@ Property / OpenAlex ID: W2034994237 / rank @@
+Normal rank
@@ Property / cites work @@
+The strong ergodic theorem for densities: Generalized Shannon-McMillan-Breiman theorem
+Normal rank
@@ Property / cites work @@
+Q3241581
@@ Property / cites work: Q3241581 / rank @@
+Normal rank
@@ Property / cites work @@
+Discrete Dynamic Programming
@@ Property / cites work: Discrete Dynamic Programming / rank @@
+Normal rank
@@ Property / cites work @@
+The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
+Normal rank
@@ Property / cites work @@
+The Individual Ergodic Theorem of Information Theory
+Normal rank
@@ Property / cites work @@
+Correction Notes: Correction to "The Individual Ergodic Theorem of Information Theory"
+Normal rank
@@ Property / cites work @@
+Elements of Information Theory
@@ Property / cites work: Elements of Information Theory / rank @@
+Normal rank
@@ Property / cites work @@
+The method of types [information theory]
@@ Property / cites work: The method of types [information theory] / rank @@
+Normal rank
@@ Property / cites work @@
+Q3686615
@@ Property / cites work: Q3686615 / rank @@
+Normal rank
@@ Property / cites work @@
+The convergence of \(TD(\lambda)\) for general \(\lambda\)
+Normal rank
@@ Property / cites work @@
+Model-Based Reinforcement Learning for Partially Observable Games with Sampling-Based State Estimation
+Normal rank
@@ Property / cites work @@
+Boundedness of iterates in \(Q\)-learning
@@ Property / cites work: Boundedness of iterates in \(Q\)-learning / rank @@
+Normal rank
@@ Property / cites work @@
+Asymptotically mean stationary measures
@@ Property / cites work: Asymptotically mean stationary measures / rank @@
+Normal rank
@@ Property / cites work @@
+Q4779829
@@ Property / cites work: Q4779829 / rank @@
+Normal rank
@@ Property / cites work @@
+Approximation theory of output statistics
@@ Property / cites work: Approximation theory of output statistics / rank @@
+Normal rank
@@ Property / cites work @@
+A New Optimality Criterion for Nonhomogeneous Markov Decision Processes
+Normal rank
@@ Property / cites work @@
+Q3266141
@@ Property / cites work: Q3266141 / rank @@
+Normal rank
@@ Property / cites work @@
+The asymptotic equipartition property in reinforcement learning and its relation to return maximization
+Normal rank
@@ Property / cites work @@
+A simple proof of the Moy-Perez generalization of the Shannon-McMillan theorem
+Normal rank
@@ Property / cites work @@
+Q4346705
@@ Property / cites work: Q4346705 / rank @@
+Normal rank
@@ Property / cites work @@
+The Basic Theorems of Information Theory
@@ Property / cites work: The Basic Theorems of Information Theory / rank @@
+Normal rank
@@ Property / cites work @@
+Q4398828
@@ Property / cites work: Q4398828 / rank @@
+Normal rank
@@ Property / cites work @@
+Generalizations of Shannon-McMillan theorem
@@ Property / cites work: Generalizations of Shannon-McMillan theorem / rank @@
+Normal rank
@@ Property / cites work @@
+A Mathematical Theory of Communication
@@ Property / cites work: A Mathematical Theory of Communication / rank @@
+Normal rank
@@ Property / cites work @@
+Convergence results for single-step on-policy reinforcement-learning algorithms
+Normal rank
@@ Property / cites work @@
+Asynchronous stochastic approximation and Q-learning
+Normal rank
@@ Property / cites work @@
+The role of the asymptotic equipartition property in noiseless source coding
+Normal rank
@@ Property / cites work @@
+\({\mathcal Q}\)-learning
@@ Property / cites work: \({\mathcal Q}\)-learning / rank @@
+Normal rank