Technical update: Least-squares temporal difference learning

From MaRDI portal

Publication:1604819

DOI: 10.1023/A:1017936530646 · zbMath: 1014.68072 · MaRDI QID: Q1604819

Author: Justin A. Boyan

Publication date: 8 July 2002

Published in: Machine Learning

Related Items (21)

Approximate policy iteration: a survey and some new methods
An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method
Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
Batch mode reinforcement learning based on the synthesis of artificial trajectories
An approximate dynamic programming approach to the admission control of elective patients
Deep reinforcement trading with predictable returns
The optimal unbiased value estimator and its relation to LSTD, TD and MC
Reinforcement learning algorithms with function approximation: recent advances and applications
Asymptotic analysis of value prediction by well-specified and misspecified models
A two-level optimization model for elective surgery scheduling with downstream capacity constraints
On Generalized Bellman Equations and Temporal-Difference Learning
Basis function adaptation in temporal difference reinforcement learning
Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning
Proximal algorithms and temporal difference methods for solving fixed point problems
Convergence of the standard RLS method and UDU^T factorisation of covariance matrix for solving the algebraic Riccati equation of the DLQR via heuristic approximate dynamic programming
Projected equation methods for approximate solution of large linear systems
Approximate optimal adaptive control for weakly coupled nonlinear systems: A neuro-inspired approach
Unnamed Item
Unnamed Item
Solving factored MDPs using non-homogeneous partitions
