On the existence of fixed points for approximate value iteration and temporal-difference learning

Publication:1586803

DOI: 10.1023/A:1004641123405, zbMath: 1028.90077, MaRDI QID: Q1586803

D. P. de Farias, Benjamin van Roy

Publication date: 19 February 2001

Published in: Journal of Optimization Theory and Applications

zbMATH Keywords

dynamic programming; reinforcement learning; value iteration; temporal-difference learning; neurodynamic programming

Mathematics Subject Classification ID

Dynamic programming (90C39)

Related Items (8)

Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
A perturbation approach to a class of discounted approximate value iteration algorithms with Borel spaces
A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
Unnamed Item
Shape constraints in economics and operations research
Application of interval iterations to the entrainment problem in respiratory physiology
A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs
Analyzing Approximate Value Iteration Algorithms
