The convergence of \(TD(\lambda)\) for general \(\lambda\)
From MaRDI portal
Publication:1812934
DOI: 10.1007/BF00992701 · zbMATH Open: 0773.68060 · MaRDI QID: Q1812934 · FDO: Q1812934
Authors: Peter Dayan
Publication date: 11 August 1992
Published in: Machine Learning
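The publication listed here concerns convergence of the TD(λ) temporal-difference learning rule. For context, below is a minimal tabular TD(λ) sketch with accumulating eligibility traces, run on a hypothetical 5-state random-walk chain; the environment, function name, and all parameter values are illustrative assumptions, not taken from the paper.

```python
import random

def td_lambda(n_states=5, lam=0.8, alpha=0.1, gamma=1.0, episodes=2000, seed=0):
    """Tabular TD(lambda) with accumulating eligibility traces.

    Hypothetical environment: a random walk over states 0..n_states-1,
    terminating off the left end with reward 0 and off the right end with
    reward 1. Under gamma=1, the true value of state s is (s+1)/(n_states+1).
    """
    rng = random.Random(seed)
    V = [0.5] * n_states          # value estimates, optimistic-neutral init
    for _ in range(episodes):
        e = [0.0] * n_states      # eligibility traces, reset each episode
        s = n_states // 2         # start in the middle of the chain
        while True:
            s2 = s + (1 if rng.random() < 0.5 else -1)
            if s2 < 0:
                r, v2, done = 0.0, 0.0, True     # fell off left end
            elif s2 >= n_states:
                r, v2, done = 1.0, 0.0, True     # fell off right end
            else:
                r, v2, done = 0.0, V[s2], False
            delta = r + gamma * v2 - V[s]        # TD error
            e[s] += 1.0                          # accumulating trace
            for i in range(n_states):
                V[i] += alpha * delta * e[i]     # credit all eligible states
                e[i] *= gamma * lam              # decay traces
            if done:
                break
            s = s2
    return V
```

With a small constant step size the estimates settle near the true values (1/6, 2/6, ..., 5/6 for this chain), illustrating the convergence behavior the paper analyzes for general λ.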
Recommendations
- Linear least-squares algorithms for temporal difference learning
- On the worst-case analysis of temporal-difference learning algorithms
- On the convergence of temporal-difference learning with linear function approximation
- On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
Cites Work
- Title not available
- Title not available
- Title not available
- A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC)
- Title not available
- An adaptive optimal controller for discrete-time Markov environments
- A Boolean complete neural model of adaptive behavior
- Title not available
- Title not available
Cited In
- TD(λ) learning without eligibility traces: a theoretical analysis
- An information-theoretic analysis of return maximization in reinforcement learning
- Neural Temporal Difference and Q Learning Provably Converge to Global Optima
- Adaptive learning via selectionism and Bayesianism. II: The sequential case
- Weak convergence properties of constrained emphatic temporal-difference learning with constant and slowly diminishing stepsize
- Aspects regarding the existence of fixed points of the iterates of Stancu operators
- Premium control with reinforcement learning
- On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
- Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
- On the worst-case analysis of temporal-difference learning algorithms
- Chaotic dynamics and convergence analysis of temporal difference algorithms with bang-bang control
- Practical issues in temporal difference learning
- Reinforcement learning with replacing eligibility traces
- A simulation-based approach to stochastic dynamic programming
- The asymptotic equipartition property in reinforcement learning and its relation to return maximization
- Least squares temporal difference methods: An analysis under general conditions
- A \(Sarsa(\lambda)\) algorithm based on double-layer fuzzy reasoning
- Feature-based methods for large scale dynamic programming
- Linear least-squares algorithms for temporal difference learning
- Reinforcement distribution in fuzzy Q-learning
- Eligibility traces and forgetting factor in recursive least-squares-based temporal difference
- Title not available
- Iterates of Stancu operators (via fixed point principles) revisited
- Title not available
- On the existence of fixed points for approximate value iteration and temporal-difference learning
- Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison
- Finite-time error bounds for distributed linear stochastic approximation
- Extension of \(\lambda\)-PIR for weakly contractive operators via fixed point theory
- Finite-time performance of distributed temporal-difference learning with linear function approximation
- Positivity and strict contractivity of functions of operators
- Reinforcement learning algorithms with function approximation: recent advances and applications
- A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
This page was built for publication: The convergence of \(TD(\lambda)\) for general \(\lambda\)