On average versus discounted reward temporal-difference learning
From MaRDI portal
Publication:1604814
DOI: 10.1023/A:1017980312899
zbMath: 1014.68070
MaRDI QID: Q1604814
John N. Tsitsiklis, Benjamin van Roy
Publication date: 8 July 2002
Published in: Machine Learning
Related Items (6)
Scalable Reinforcement Learning for Multiagent Networked Systems ⋮ Hyperbolically Discounted Temporal Difference Learning ⋮ Long-Term Reward Prediction in TD Models of the Dopamine System ⋮ Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning ⋮ Internal-Time Temporal Difference Model for Neural Value-Based Decision Making ⋮ Representation and Timing in Theories of the Dopamine System