Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling
From MaRDI portal
DOI: 10.1007/s10994-020-05912-5
OpenAlex ID: W3118861484
MaRDI QID: Q2051259
Nathaniel Korda, L. A. Prashanth, Rémi Munos
Publication date: 24 November 2021
Published in: Machine Learning
Full work available at URL: https://arxiv.org/abs/1306.2557
Related Items
- Risk-Sensitive Reinforcement Learning via Policy Gradient Search
- A concentration bound for \(\operatorname{LSPE}(\lambda)\)
- Concentration of Contractive Stochastic Approximation and Reinforcement Learning
Cites Work
- Transport-entropy inequalities and deviation estimates for stochastic approximation schemes
- Concentration bounds for stochastic approximations
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
- Stochastic approximation. A dynamical systems viewpoint.
- Stochastic approximation methods for constrained and unconstrained systems
- Online Learning as Stochastic Approximation of Regularization Paths: Optimality and Almost-Sure Convergence
- Acceleration of Stochastic Approximation by Averaging
- An analysis of temporal-difference learning with function approximation
- On Actor-Critic Algorithms
- High-Dimensional Statistics
- Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression
- DOI: 10.1162/1532443041827907
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- Convergence Results for Some Temporal Difference Methods Based on Least Squares
- A Stochastic Approximation Method