An emphatic approach to the problem of off-policy temporal-difference learning

From MaRDI portal

Publication:2810885

MaRDI QID Q2810885

Authors Richard S. Sutton, A. Rupam Mahmood, Martha White

Publication date 6 June 2016

Published in Journal of Machine Learning Research (JMLR)

Full work available at URL https://arxiv.org/abs/1503.04269, http://jmlr.csail.mit.edu/papers/v17/14-488.html

zbMATH Keywords

convergence, stability, function approximation, off-policy learning, temporal-difference learning

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05)

Cited in (11)
