Natural actor-critic algorithms
DOI: 10.1016/J.AUTOMATICA.2009.07.008
zbMATH Open: 1183.93130
OpenAlex: W2094387729
MaRDI QID: Q1049136
FDO: Q1049136
Authors: Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh, Mark Lee
Publication date: 8 January 2010
Published in: Automatica
Full work available at URL: https://doi.org/10.1016/j.automatica.2009.07.008
Keywords: approximate dynamic programming; function approximation; natural gradient; temporal difference learning; actor-critic reinforcement learning algorithms; policy-gradient methods; two-timescale stochastic approximation
Mathematics Subject Classification: Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.) (60J20); Dynamic programming in optimal control and differential games (49L20); Stochastic learning and adaptive control (93E35)
Cites Work
- Title not available
- Stochastic approximation methods for constrained and unconstrained systems
- Title not available
- Title not available
- Title not available
- Title not available
- Perturbation realization, potentials, and sensitivity analysis of Markov processes
- Title not available
- Title not available
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- A Survey of Applications of Markov Decision Processes
- Natural actor-critic algorithms
- Reinforcement learning based algorithms for average cost Markov decision processes
- On Actor-Critic Algorithms
- 10.1162/1532443041827907
- Actor-Critic–Type Learning Algorithms for Markov Decision Processes
- Linear least-squares algorithms for temporal difference learning
- Functional Approximations and Dynamic Programming
- An analysis of temporal-difference learning with function approximation
- Simulation-based optimization of Markov reward processes
- Variance reduction techniques for gradient estimates in reinforcement learning
- Asynchronous stochastic approximation and Q-learning
- Stochastic approximation with two time scales
- Average cost temporal-difference learning
- Learning algorithms for Markov decision processes with average cost
- Some Pathological Traps for Stochastic Approximation
- Adaptive multivariate three-timescale stochastic approximation algorithms for simulation based optimization
- Adaptive Newton-based multivariate smoothed functional algorithms for simulation optimization
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- A Simultaneous Perturbation Stochastic Approximation-Based Actor–Critic Algorithm for Markov Decision Processes
- On the convergence of temporal-difference learning with linear function approximation
- Nonconvergence to unstable points in urn models and stochastic approximations
- Title not available
- Bayesian policy gradient and actor-critic algorithms
- Elevator group control using multiple reinforcement learning agents
Cited In (47)
- Variational actor-critic algorithms
- A convergent online single time scale actor critic algorithm
- Learning and control of exploration primitives
- On Actor-Critic Algorithms
- Risk-Sensitive Reinforcement Learning via Policy Gradient Search
- The factored policy-gradient planner
- Hessian matrix distribution for Bayesian policy gradient reinforcement learning
- Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization
- Variance-constrained actor-critic algorithms for discounted and average reward MDPs
- Multi-agent natural actor-critic reinforcement learning algorithms
- An incremental off-policy search in a model-free Markov decision process using a single sample path
- Title not available
- Convergence of entropy-regularized natural policy gradient with linear function approximation
- A constrained optimization perspective on actor-critic algorithms and application to network routing
- On the sample complexity of actor-critic method for reinforcement learning with function approximation
- Hierarchical speed control for autonomous electric vehicle through deep reinforcement learning and robust control
- Approximate Newton Policy Gradient Algorithms
- Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
- The Borkar-Meyn theorem for asynchronous stochastic approximations
- Deep Reinforcement Learning: A State-of-the-Art Walkthrough
- Temporal concatenation for Markov decision processes
- Adaptive critic design with graph Laplacian for online learning control of nonlinear systems
- Dynamics and risk sharing in groups of selfish individuals
- A stability criterion for two timescale stochastic approximation schemes
- Natural actor-critic algorithms
- Multiscale Q-learning with linear function approximation
- Title not available
- An online actor-critic algorithm with function approximation for constrained Markov decision processes
- Autonomous reinforcement learning with experience replay
- Title not available
- Actor-Critic Algorithms with Online Feature Adaptation
- Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies
- Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning
- On linear and super-linear convergence of natural policy gradient algorithm
- A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning
- Artificial Intelligence and Soft Computing - ICAISC 2004
- Reinforced mixture learning
- Parameterized Markov decision process and its application to service rate control
- Occupancy information ratio: infinite-horizon, information-directed, parameterized policy search
- Real-time reinforcement learning by sequential actor-critics and experience replay
- Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
- Multi-agent reinforcement learning: a selective overview of theories and algorithms
- An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
- Actor-Critic Method for High Dimensional Static Hamilton–Jacobi–Bellman Partial Differential Equations based on Neural Networks
- Reinforcement learning algorithms with function approximation: recent advances and applications
- Title not available
- Nonconvex Policy Search Using Variational Inequalities