Natural actor-critic algorithms
DOI: 10.1016/j.automatica.2009.07.008
zbMath: 1183.93130
OpenAlex: W2094387729
MaRDI QID: Q1049136
Mohammad Ghavamzadeh, Mark Lee, Richard S. Sutton, Shalabh Bhatnagar
Publication date: 8 January 2010
Published in: Automatica
Full work available at URL: https://doi.org/10.1016/j.automatica.2009.07.008
temporal difference learning; function approximation; approximate dynamic programming; natural gradient; actor-critic reinforcement learning algorithms; policy-gradient methods; two-timescale stochastic approximation
Dynamic programming in optimal control and differential games (49L20)
Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.) (60J20)
Stochastic learning and adaptive control (93E35)
Related Items
Cites Work
- Nonconvergence to unstable points in urn models and stochastic approximations
- Natural actor-critic algorithms
- Stochastic approximation methods for constrained and unconstrained systems
- Elevator group control using multiple reinforcement learning agents
- Asynchronous stochastic approximation and Q-learning
- Stochastic approximation with two time scales
- Average cost temporal-difference learning
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Reinforcement learning based algorithms for average cost Markov decision processes
- Learning Algorithms for Markov Decision Processes with Average Cost
- Functional Approximations and Dynamic Programming
- Some Pathological Traps for Stochastic Approximation
- A Survey of Applications of Markov Decision Processes
- An analysis of temporal-difference learning with function approximation
- Perturbation realization, potentials, and sensitivity analysis of Markov processes
- On Actor-Critic Algorithms
- Simulation-based optimization of Markov reward processes
- Adaptive multivariate three-timescale stochastic approximation algorithms for simulation based optimization
- Adaptive Newton-based multivariate smoothed functional algorithms for simulation optimization
- DOI: 10.1162/1532443041827907
- Actor-Critic-Type Learning Algorithms for Markov Decision Processes
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- A Simultaneous Perturbation Stochastic Approximation-Based Actor–Critic Algorithm for Markov Decision Processes
- On the convergence of temporal-difference learning with linear function approximation