On the existence of fixed points for approximate value iteration and temporal-difference learning
From MaRDI portal
Publication:1586803
DOI: 10.1023/A:1004641123405
zbMath: 1028.90077
MaRDI QID: Q1586803
D. P. de Farias, Benjamin van Roy
Publication date: 19 February 2001
Published in: Journal of Optimization Theory and Applications
dynamic programming; reinforcement learning; value iteration; temporal-difference learning; neurodynamic programming
Related Items (8)
Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
A perturbation approach to a class of discounted approximate value iteration algorithms with Borel spaces
A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
Unnamed Item
Shape constraints in economics and operations research
Application of interval iterations to the entrainment problem in respiratory physiology
A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs
Analyzing Approximate Value Iteration Algorithms