Technical update: Least-squares temporal difference learning


Publication:1604819


DOI: 10.1023/A:1017936530646, zbMath: 1014.68072, MaRDI QID: Q1604819

Justin A. Boyan

Publication date: 8 July 2002

Published in: Machine Learning

zbMATH Keywords

model-based reinforcement learning technique

Mathematics Subject Classification ID

Computational learning theory (68Q32)

Related Items (21)

Approximate policy iteration: a survey and some new methods ⋮ An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method ⋮ Restricted gradient-descent algorithm for value-function approximation in reinforcement learning ⋮ A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning ⋮ Batch mode reinforcement learning based on the synthesis of artificial trajectories ⋮ An approximate dynamic programming approach to the admission control of elective patients ⋮ Deep reinforcement trading with predictable returns ⋮ The optimal unbiased value estimator and its relation to LSTD, TD and MC ⋮ Reinforcement learning algorithms with function approximation: recent advances and applications ⋮ Asymptotic analysis of value prediction by well-specified and misspecified models ⋮ A two-level optimization model for elective surgery scheduling with downstream capacity constraints ⋮ On Generalized Bellman Equations and Temporal-Difference Learning ⋮ Basis function adaptation in temporal difference reinforcement learning ⋮ Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning ⋮ Proximal algorithms and temporal difference methods for solving fixed point problems ⋮ Convergence of the standard RLS method and UDU^T factorisation of covariance matrix for solving the algebraic Riccati equation of the DLQR via heuristic approximate dynamic programming ⋮ Projected equation methods for approximate solution of large linear systems ⋮ Approximate optimal adaptive control for weakly coupled nonlinear systems: A neuro-inspired approach ⋮ Unnamed Item ⋮ Unnamed Item ⋮ Solving factored MDPs using non-homogeneous partitions
