Eligibility traces and forgetting factor in recursive least-squares-based temporal difference

DOI: 10.1002/ACS.3282; Wikidata: Q107162054; Scholia: Q107162054; MaRDI QID: Q6495643; FDO: Q6495643

Authors: Simone Baldi, Zichen Zhang, Di Liu

Publication date: 30 April 2024

Published in: International Journal of Adaptive Control and Signal Processing

Recommendations

scientific article; zbMATH DE number 1753141
scientific article; zbMATH DE number 1501878
Reinforcement learning with replacing eligibility traces
Linear least-squares algorithms for temporal difference learning

zbMATH Keywords

instrumental variable method; least squares; reinforcement learning; temporal difference; eligibility traces

Mathematics Subject Classification ID

Systems theory; control (93-XX)

Cites Work

The convergence of \(TD(\lambda)\) for general \(\lambda\)
Adaptive Control Tutorial
An analysis of temporal-difference learning with function approximation
Linear Quadratic Tracking Control of Partially-Unknown Continuous-Time Systems Using Reinforcement Learning
Reinforcement learning for adaptive optimal control of continuous-time linear periodic systems
Technical update: Least-squares temporal difference learning
Adaptive Dynamic Programming and Adaptive Optimal Output Regulation of Linear Systems
Convergence results for single-step on-policy reinforcement-learning algorithms
Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning
Linear least-squares algorithms for temporal difference learning
Title not available
Practical issues in temporal difference learning
Adaptive Optimal Control for Large-Scale Nonlinear Systems
An adaptive optimization scheme with satisfactory transient performance
Composite Model Reference Adaptive Control with Parameter Convergence Under Finite Excitation
Adaptive Control Design Based on Adaptive Optimization Principles
Stability of Stochastic Approximations With “Controlled Markov” Noise and Temporal Difference Learning
Initial Excitation-Based Iterative Algorithm for Approximate Optimal Control of Completely Unknown LTI Systems
Adaptive critic design with graph Laplacian for online learning control of nonlinear systems
\(Q(\lambda )\)-learning adaptive fuzzy logic controllers for pursuit-evasion differential games
Adaptive dynamic programming for model-free tracking of trajectories with time-varying parameters
