A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
From MaRDI portal
Publication: 859737
DOI: 10.1007/s10626-006-8134-8 ⋮ zbMath: 1104.93054 ⋮ OpenAlex: W2062541405 ⋮ MaRDI QID: Q859737
Publication date: 18 January 2007
Published in: Discrete Event Dynamic Systems
Full work available at URL: https://doi.org/10.1007/s10626-006-8134-8
Keywords: Dynamic programming ⋮ Kalman filter ⋮ Optimal stopping ⋮ Queueing ⋮ Reinforcement learning ⋮ Recursive least-squares ⋮ Temporal-difference learning
Filtering in stochastic control theory (93E11) Least squares and related methods for stochastic control systems (93E24) Stochastic learning and adaptive control (93E35)
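The keywords above place this work in the recursive least-squares temporal-difference family. As background only, a minimal batch LSTD(0) sketch is given below; it is a standard baseline from this literature (cf. the cited "Technical update: Least-squares temporal difference learning"), not the paper's generalized Kalman filter algorithm. The tiny two-state chain and tabular features are illustrative assumptions.

```python
import numpy as np

def lstd(transitions, phi, gamma):
    """Batch LSTD(0): solve A theta = b with
    A = sum phi(s) (phi(s) - gamma phi(s'))^T and b = sum r phi(s)."""
    k = phi(transitions[0][0]).shape[0]
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)  # accumulate A
        b += r * f                            # accumulate b
    return np.linalg.solve(A, b)              # theta = A^{-1} b

# Illustrative example: deterministic chain 0 -> 1 -> 0, reward 1 on leaving
# state 0, tabular (one-hot) features, discount 0.9.
phi = lambda s: np.eye(2)[s]
data = [(0, 1.0, 1), (1, 0.0, 0)] * 50
theta = lstd(data, phi, gamma=0.9)
# Bellman check: V(0) = 1 + 0.9 V(1), V(1) = 0.9 V(0)  =>  V(0) = 1/0.19
```

With tabular features LSTD recovers the exact value function of the chain; the paper's contribution concerns the more general fixed-point approximation setting, where such least-squares recursions are unified under a Kalman-filter view.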
Related Items (6)
Approximate policy iteration: a survey and some new methods ⋮ A new learning algorithm for optimal stopping ⋮ Q-learning and policy iteration algorithms for stochastic shortest path problems ⋮ On regression-based stopping times ⋮ Projected equation methods for approximate solution of large linear systems ⋮ Fundamental design principles for reinforcement learning algorithms
Cites Work
- On the existence of fixed points for approximate value iteration and temporal-difference learning
- Technical update: Least-squares temporal difference learning
- The convergence of \(TD(\lambda)\) for general \(\lambda\)
- Functional Approximations and Dynamic Programming
- Extensions of the multiarmed bandit problem: The discounted case
- An analysis of temporal-difference learning with function approximation
- Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives
- 10.1162/1532443041827907
- On the convergence of temporal-difference learning with linear function approximation