The convergence of \(TD(\lambda)\) for general \(\lambda\)
From MaRDI portal
Publication:1812934
DOI: 10.1007/BF00992701 · zbMATH Open: 0773.68060 · MaRDI QID: Q1812934 · FDO: Q1812934
Authors: Peter Dayan
Publication date: 11 August 1992
Published in: Machine Learning
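The publication listed here concerns convergence of the TD(λ) temporal-difference learning rule. For context, below is a minimal tabular TD(λ) sketch with accumulating eligibility traces, run on a hypothetical 5-state random-walk chain; the environment, function name, and all parameter values are illustrative assumptions, not taken from the paper.

```python
import random

def td_lambda(n_states=5, lam=0.8, alpha=0.1, gamma=1.0, episodes=2000, seed=0):
    """Tabular TD(lambda) with accumulating eligibility traces.

    Hypothetical environment: a random walk over states 0..n_states-1,
    terminating off the left end with reward 0 and off the right end with
    reward 1. Under gamma=1, the true value of state s is (s+1)/(n_states+1).
    """
    rng = random.Random(seed)
    V = [0.5] * n_states          # value estimates, optimistic-neutral init
    for _ in range(episodes):
        e = [0.0] * n_states      # eligibility traces, reset each episode
        s = n_states // 2         # start in the middle of the chain
        while True:
            s2 = s + (1 if rng.random() < 0.5 else -1)
            if s2 < 0:
                r, v2, done = 0.0, 0.0, True     # fell off left end
            elif s2 >= n_states:
                r, v2, done = 1.0, 0.0, True     # fell off right end
            else:
                r, v2, done = 0.0, V[s2], False
            delta = r + gamma * v2 - V[s]        # TD error
            e[s] += 1.0                          # accumulating trace
            for i in range(n_states):
                V[i] += alpha * delta * e[i]     # credit all eligible states
                e[i] *= gamma * lam              # decay traces
            if done:
                break
            s = s2
    return V
```

With a small constant step size the estimates settle near the true values (1/6, 2/6, ..., 5/6 for this chain), illustrating the convergence behavior the paper analyzes for general λ.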
Recommendations
- Linear least-squares algorithms for temporal difference learning
- On the worst-case analysis of temporal-difference learning algorithms
- On the convergence of temporal-difference learning with linear function approximation
- On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
Cites Work
- Title not available
- Title not available
- Title not available
- A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC)
- Title not available
- An adaptive optimal controller for discrete-time Markov environments
- A Boolean complete neural model of adaptive behavior
- Title not available
- Title not available
Cited In
- TD(λ) learning without eligibility traces: a theoretical analysis
- An information-theoretic analysis of return maximization in reinforcement learning
- Neural Temporal Difference and Q Learning Provably Converge to Global Optima
- Adaptive learning via selectionism and Bayesianism. II: The sequential case
- Weak convergence properties of constrained emphatic temporal-difference learning with constant and slowly diminishing stepsize
- Aspects regarding the existence of fixed points of the iterates of Stancu operators
- Premium control with reinforcement learning
- On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
- Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
- On the worst-case analysis of temporal-difference learning algorithms
- Chaotic dynamics and convergence analysis of temporal difference algorithms with bang-bang control
- Practical issues in temporal difference learning
- Reinforcement learning with replacing eligibility traces
- A simulation-based approach to stochastic dynamic programming
- The asymptotic equipartition property in reinforcement learning and its relation to return maximization
- Least squares temporal difference methods: An analysis under general conditions
- A \(Sarsa(\lambda)\) algorithm based on double-layer fuzzy reasoning
- Feature-based methods for large scale dynamic programming
- Linear least-squares algorithms for temporal difference learning
- Reinforcement distribution in fuzzy Q-learning
- Eligibility traces and forgetting factor in recursive least-squares-based temporal difference
- Title not available
- Iterates of Stancu operators (via fixed point principles) revisited
- Title not available
- On the existence of fixed points for approximate value iteration and temporal-difference learning
- Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison
- Finite-time error bounds for distributed linear stochastic approximation
- Extension of \(\lambda\)-PIR for weakly contractive operators via fixed point theory
- Finite-time performance of distributed temporal-difference learning with linear function approximation
- Positivity and strict contractivity of functions of operators
- Reinforcement learning algorithms with function approximation: recent advances and applications
- A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
This page was built for publication: The convergence of \(TD(\lambda)\) for general \(\lambda\)