Eligibility traces and forgetting factor in recursive least-squares-based temporal difference
Publication:6495643
DOI: 10.1002/ACS.3282 | Wikidata: Q107162054 | Scholia: Q107162054 | MaRDI QID: Q6495643
Simone Baldi, Zichen Zhang, Di Liu
Publication date: 30 April 2024
Published in: International Journal of Adaptive Control and Signal Processing
least squares, reinforcement learning, temporal difference, instrumental variable method, eligibility traces
Cites Work
- Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning
- Convergence results for single-step on-policy reinforcement-learning algorithms
- Technical update: Least-squares temporal difference learning
- Practical issues in temporal difference learning
- The convergence of \(TD(\lambda)\) for general \(\lambda\)
- Linear least-squares algorithms for temporal difference learning
- Reinforcement learning for adaptive optimal control of continuous-time linear periodic systems
- An adaptive optimization scheme with satisfactory transient performance
- Adaptive critic design with graph Laplacian for online learning control of nonlinear systems
- Adaptive Dynamic Programming and Adaptive Optimal Output Regulation of Linear Systems
- Linear Quadratic Tracking Control of Partially-Unknown Continuous-Time Systems Using Reinforcement Learning
- Q(λ)-learning adaptive fuzzy logic controllers for pursuit-evasion differential games
- Adaptive Control Tutorial
- An analysis of temporal-difference learning with function approximation
- Adaptive Optimal Control for Large-Scale Nonlinear Systems
- Composite Model Reference Adaptive Control with Parameter Convergence Under Finite Excitation
- Adaptive Control Design Based on Adaptive Optimization Principles
- Adaptive dynamic programming for model‐free tracking of trajectories with time‐varying parameters
- Initial Excitation-Based Iterative Algorithm for Approximate Optimal Control of Completely Unknown LTI Systems
- Stability of Stochastic Approximations With “Controlled Markov” Noise and Temporal Difference Learning