The optimal unbiased value estimator and its relation to LSTD, TD and MC
DOI: 10.1007/S10994-010-5220-9 | zbMATH Open: 1446.62112 | arXiv: 0908.3458 | OpenAlex: W1971070720 | MaRDI QID: Q415609 | FDO: Q415609
Steffen Grünewälder, Klaus Obermayer
Publication date: 8 May 2012
Published in: Machine Learning
Full work available at URL: https://arxiv.org/abs/0908.3458
sufficient statistics; least-squares temporal difference learning (LSTD); Lehmann-Scheffe theorem; maximum likelihood value estimator; Monte Carlo estimation (MC); optimal unbiased value estimator; temporal difference learning (TD)
Density estimation (62G07); Monte Carlo methods (65C05); Learning and adaptive systems in artificial intelligence (68T05)
Cites Work
- \({\mathcal Q}\)-learning
- The variance of discounted Markov decision processes
- On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
- Technical update: Least-squares temporal difference learning
- A Course in Enumeration
- Analytical mean squared error curves for temporal difference learning
- Reinforcement learning with replacing eligibility traces
- Bias and Variance Approximation in Value Function Estimates
Cited In (1)