Natural actor-critic algorithms

DOI: 10.1016/J.AUTOMATICA.2009.07.008; zbMath: 1183.93130; OpenAlex: W2094387729; MaRDI QID: Q1049136

Mohammad Ghavamzadeh, Mark Lee, Richard S. Sutton, Shalabh Bhatnagar

Publication date: 8 January 2010

Published in: Automatica

Full work available at URL: https://doi.org/10.1016/j.automatica.2009.07.008

zbMATH Keywords

temporal difference learning; function approximation; approximate dynamic programming; natural gradient; actor-critic reinforcement learning algorithms; policy-gradient methods; two-timescale stochastic approximation

Mathematics Subject Classification ID

Dynamic programming in optimal control and differential games (49L20); Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.) (60J20); Stochastic learning and adaptive control (93E35)

Related Items (38)

An incremental off-policy search in a model-free Markov decision process using a single sample path ⋮ A constrained optimization perspective on actor-critic algorithms and application to network routing ⋮ The factored policy-gradient planner ⋮ Temporal concatenation for Markov decision processes ⋮ A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning ⋮ Multiscale Q-learning with linear function approximation ⋮ Unnamed Item ⋮ A stability criterion for two timescale stochastic approximation schemes ⋮ On linear and super-linear convergence of natural policy gradient algorithm ⋮ An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes ⋮ Risk-Sensitive Reinforcement Learning via Policy Gradient Search ⋮ Variance-constrained actor-critic algorithms for discounted and average reward MDPs ⋮ Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning ⋮ Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization ⋮ Unnamed Item ⋮ On the sample complexity of actor-critic method for reinforcement learning with function approximation ⋮ Multi-agent natural actor-critic reinforcement learning algorithms ⋮ Reinforcement learning algorithms with function approximation: recent advances and applications ⋮ Preference-based reinforcement learning: a formal framework and a policy iteration algorithm ⋮ Dynamics and risk sharing in groups of selfish individuals ⋮ An online actor-critic algorithm with function approximation for constrained Markov decision processes ⋮ Unnamed Item ⋮ Nonconvex Policy Search Using Variational Inequalities ⋮ Autonomous reinforcement learning with experience replay ⋮ Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies ⋮ Learning and control of exploration primitives ⋮ Deep Reinforcement Learning: A State-of-the-Art Walkthrough ⋮ Parameterized Markov decision process and its application to service rate control ⋮ Risk-Constrained Reinforcement Learning with Percentile Risk Criteria ⋮ Hessian matrix distribution for Bayesian policy gradient reinforcement learning ⋮ The Borkar-Meyn theorem for asynchronous stochastic approximations ⋮ Real-time reinforcement learning by sequential actor-critics and experience replay ⋮ Adaptive critic design with graph Laplacian for online learning control of nonlinear systems ⋮ Unnamed Item ⋮ Natural actor-critic algorithms ⋮ Multi-agent reinforcement learning: a selective overview of theories and algorithms ⋮ Actor-Critic Method for High Dimensional Static Hamilton--Jacobi--Bellman Partial Differential Equations based on Neural Networks ⋮ Actor-Critic Algorithms with Online Feature Adaptation

