On generalized Bellman equations and temporal-difference learning

From MaRDI portal

MaRDI QID: Q6829291

Authors: Huizhen Yu, A. Rupam Mahmood, Richard S. Sutton

Publication date: 21 November 2018

Published in: Journal of Machine Learning Research (JMLR)

Full work available at URL: http://jmlr.csail.mit.edu/papers/v19/17-283.html

zbMATH Keywords

Markov chain; Markov decision process; reinforcement learning; randomized stopping time; generalized Bellman equation; approximate policy evaluation; temporal-difference method

Mathematics Subject Classification ID

Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.) (60J20)
Learning and adaptive systems in artificial intelligence (68T05)
Dynamic programming (90C39)
Markov and semi-Markov decision processes (90C40)

