The optimal unbiased value estimator and its relation to LSTD, TD and MC
DOI: 10.1007/s10994-010-5220-9
zbMath: 1446.62112
arXiv: 0908.3458
OpenAlex: W1971070720
MaRDI QID: Q415609
Steffen Grünewälder, Klaus Obermayer
Publication date: 8 May 2012
Published in: Machine Learning
Full work available at URL: https://arxiv.org/abs/0908.3458
Keywords: sufficient statistics; least-squares temporal difference learning (LSTD); Lehmann-Scheffe theorem; maximum likelihood value estimator; Monte Carlo estimation (MC); optimal unbiased value estimator; temporal difference learning (TD)
Mathematics Subject Classification: Density estimation (62G07); Monte Carlo methods (65C05); Learning and adaptive systems in artificial intelligence (68T05)
Cites Work
- Analytical mean squared error curves for temporal difference learning
- Technical update: Least-squares temporal difference learning
- \({\mathcal Q}\)-learning
- Reinforcement learning with replacing eligibility traces
- Bias and Variance Approximation in Value Function Estimates
- A Course in Enumeration
- On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
- The variance of discounted Markov decision processes