Natural actor-critic algorithms
DOI: 10.1016/j.automatica.2009.07.008 · zbMATH Open: 1183.93130 · OpenAlex: W2094387729 · MaRDI QID: Q1049136
Authors: Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh, Mark Lee
Publication date: 8 January 2010
Published in: Automatica
Full work available at URL: https://doi.org/10.1016/j.automatica.2009.07.008
Keywords: approximate dynamic programming; function approximation; natural gradient; temporal difference learning; actor-critic reinforcement learning algorithms; policy-gradient methods; two-timescale stochastic approximation
MSC classification:
- 60J20 Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)
- 49L20 Dynamic programming in optimal control and differential games
- 93E35 Stochastic learning and adaptive control
Cites Work
- scientific article; zbMATH DE number 3954793
- scientific article; zbMATH DE number 48727
- scientific article; zbMATH DE number 51132
- scientific article; zbMATH DE number 1206370
- scientific article; zbMATH DE number 1321699
- scientific article; zbMATH DE number 700091
- scientific article; zbMATH DE number 1043533
- scientific article; zbMATH DE number 1753152
- 10.1162/1532443041827907
- A Simultaneous Perturbation Stochastic Approximation-Based Actor–Critic Algorithm for Markov Decision Processes
- A Survey of Applications of Markov Decision Processes
- Actor-Critic-Type Learning Algorithms for Markov Decision Processes
- Adaptive Newton-based multivariate smoothed functional algorithms for simulation optimization
- Adaptive multivariate three-timescale stochastic approximation algorithms for simulation based optimization
- An analysis of temporal-difference learning with function approximation
- Asynchronous stochastic approximation and Q-learning
- Average cost temporal-difference learning
- Bayesian policy gradient and actor-critic algorithms
- Elevator group control using multiple reinforcement learning agents
- Functional Approximations and Dynamic Programming
- Learning algorithms for Markov decision processes with average cost
- Linear least-squares algorithms for temporal difference learning
- Natural actor-critic algorithms
- Nonconvergence to unstable points in urn models and stochastic approximations
- On the convergence of temporal-difference learning with linear function approximation
- On Actor-Critic Algorithms
- Perturbation realization, potentials, and sensitivity analysis of Markov processes
- Reinforcement learning based algorithms for average cost Markov decision processes
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Simulation-based optimization of Markov reward processes
- Some Pathological Traps for Stochastic Approximation
- Stochastic approximation methods for constrained and unconstrained systems
- Stochastic approximation with two time scales
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- Variance reduction techniques for gradient estimates in reinforcement learning
Cited In (54)
- TD-regularized actor-critic methods
- Variational actor-critic algorithms
- A convergent online single time scale actor critic algorithm
- Learning and control of exploration primitives
- On Actor-Critic Algorithms
- Risk-Sensitive Reinforcement Learning via Policy Gradient Search
- The factored policy-gradient planner
- Hessian matrix distribution for Bayesian policy gradient reinforcement learning
- Variance-constrained actor-critic algorithms for discounted and average reward MDPs
- Multi-agent natural actor-critic reinforcement learning algorithms
- An incremental off-policy search in a model-free Markov decision process using a single sample path
- Actor-critic method for high dimensional static Hamilton-Jacobi-Bellman partial differential equations based on neural networks
- Convergence of entropy-regularized natural policy gradient with linear function approximation
- A constrained optimization perspective on actor-critic algorithms and application to network routing
- On the sample complexity of actor-critic method for reinforcement learning with function approximation
- Nonconvex policy search using variational inequalities
- Hierarchical speed control for autonomous electric vehicle through deep reinforcement learning and robust control
- Approximate Newton Policy Gradient Algorithms
- The Borkar-Meyn theorem for asynchronous stochastic approximations
- Deep Reinforcement Learning: A State-of-the-Art Walkthrough
- Temporal concatenation for Markov decision processes
- Expected policy gradients for reinforcement learning
- Adaptive critic design with graph Laplacian for online learning control of nonlinear systems
- Dynamics and risk sharing in groups of selfish individuals
- A stability criterion for two timescale stochastic approximation schemes
- Natural actor-critic algorithms
- Multiscale Q-learning with linear function approximation
- Title not available
- Actor-critic algorithms based on symmetric perturbation sampling
- An online actor-critic algorithm with function approximation for constrained Markov decision processes
- Autonomous reinforcement learning with experience replay
- Title not available
- Fast global convergence of natural policy gradient methods with entropy regularization
- Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning
- Finite-time analysis of natural actor-critic for POMDPs
- Inverse reinforcement learning via nonparametric spatio-temporal subgoal modeling
- On linear and super-linear convergence of natural policy gradient algorithm
- Error controlled actor-critic
- A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning
- Artificial Intelligence and Soft Computing - ICAISC 2004
- Reinforced mixture learning
- Parameterized Markov decision process and its application to service rate control
- Occupancy information ratio: infinite-horizon, information-directed, parameterized policy search
- Real-time reinforcement learning by sequential actor-critics and experience replay
- Natural actor-critic based on batch recursive least-squares
- Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
- Multi-agent reinforcement learning: a selective overview of theories and algorithms
- Risk-constrained reinforcement learning with percentile risk criteria
- An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
- Compatible natural gradient policy search
- Actor-critic algorithms with online feature adaptation
- Full gradient DQN reinforcement learning: a provably convergent scheme
- Global convergence of policy gradient methods to (almost) locally optimal policies
- Reinforcement learning algorithms with function approximation: recent advances and applications