Natural actor-critic algorithms
DOI: 10.1016/j.automatica.2009.07.008 · zbMATH Open: 1183.93130 · OpenAlex: W2094387729 · MaRDI QID: Q1049136
Authors: Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh, Mark Lee
Publication date: 8 January 2010
Published in: Automatica
Full work available at URL: https://doi.org/10.1016/j.automatica.2009.07.008
Keywords: approximate dynamic programming; function approximation; natural gradient; temporal difference learning; actor-critic reinforcement learning algorithms; policy-gradient methods; two-timescale stochastic approximation
MSC classification:
- 60J20 Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)
- 49L20 Dynamic programming in optimal control and differential games
- 93E35 Stochastic learning and adaptive control
Cites Work
- scientific article; zbMATH DE number 3954793
- scientific article; zbMATH DE number 48727
- scientific article; zbMATH DE number 51132
- scientific article; zbMATH DE number 1206370
- scientific article; zbMATH DE number 1321699
- scientific article; zbMATH DE number 700091
- scientific article; zbMATH DE number 1043533
- scientific article; zbMATH DE number 1753152
- 10.1162/1532443041827907
- A Simultaneous Perturbation Stochastic Approximation-Based Actor–Critic Algorithm for Markov Decision Processes
- A Survey of Applications of Markov Decision Processes
- Actor-Critic-Type Learning Algorithms for Markov Decision Processes
- Adaptive Newton-based multivariate smoothed functional algorithms for simulation optimization
- Adaptive multivariate three-timescale stochastic approximation algorithms for simulation based optimization
- An analysis of temporal-difference learning with function approximation
- Asynchronous stochastic approximation and Q-learning
- Average cost temporal-difference learning
- Bayesian policy gradient and actor-critic algorithms
- Elevator group control using multiple reinforcement learning agents
- Functional Approximations and Dynamic Programming
- Learning algorithms for Markov decision processes with average cost
- Linear least-squares algorithms for temporal difference learning
- Natural actor-critic algorithms
- Nonconvergence to unstable points in urn models and stochastic approximations
- On the convergence of temporal-difference learning with linear function approximation
- On Actor-Critic Algorithms
- Perturbation realization, potentials, and sensitivity analysis of Markov processes
- Reinforcement learning based algorithms for average cost Markov decision processes
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Simulation-based optimization of Markov reward processes
- Some Pathological Traps for Stochastic Approximation
- Stochastic approximation methods for constrained and unconstrained systems
- Stochastic approximation with two time scales
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- Variance reduction techniques for gradient estimates in reinforcement learning
Cited In (54)
- TD-regularized actor-critic methods
- Variational actor-critic algorithms
- A convergent online single time scale actor critic algorithm
- Learning and control of exploration primitives
- On Actor-Critic Algorithms
- Risk-Sensitive Reinforcement Learning via Policy Gradient Search
- The factored policy-gradient planner
- Hessian matrix distribution for Bayesian policy gradient reinforcement learning
- Variance-constrained actor-critic algorithms for discounted and average reward MDPs
- Multi-agent natural actor-critic reinforcement learning algorithms
- An incremental off-policy search in a model-free Markov decision process using a single sample path
- Actor-critic method for high dimensional static Hamilton-Jacobi-Bellman partial differential equations based on neural networks
- Convergence of entropy-regularized natural policy gradient with linear function approximation
- A constrained optimization perspective on actor-critic algorithms and application to network routing
- On the sample complexity of actor-critic method for reinforcement learning with function approximation
- Nonconvex policy search using variational inequalities
- Hierarchical speed control for autonomous electric vehicle through deep reinforcement learning and robust control
- Approximate Newton Policy Gradient Algorithms
- The Borkar-Meyn theorem for asynchronous stochastic approximations
- Deep Reinforcement Learning: A State-of-the-Art Walkthrough
- Temporal concatenation for Markov decision processes
- Expected policy gradients for reinforcement learning
- Adaptive critic design with graph Laplacian for online learning control of nonlinear systems
- Dynamics and risk sharing in groups of selfish individuals
- A stability criterion for two timescale stochastic approximation schemes
- Natural actor-critic algorithms
- Multiscale Q-learning with linear function approximation
- Title not available
- Actor-critic algorithms based on symmetric perturbation sampling
- An online actor-critic algorithm with function approximation for constrained Markov decision processes
- Autonomous reinforcement learning with experience replay
- Title not available
- Fast global convergence of natural policy gradient methods with entropy regularization
- Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning
- Finite-time analysis of natural actor-critic for POMDPs
- Inverse reinforcement learning via nonparametric spatio-temporal subgoal modeling
- On linear and super-linear convergence of natural policy gradient algorithm
- Error controlled actor-critic
- A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning
- Artificial Intelligence and Soft Computing - ICAISC 2004
- Reinforced mixture learning
- Parameterized Markov decision process and its application to service rate control
- Occupancy information ratio: infinite-horizon, information-directed, parameterized policy search
- Real-time reinforcement learning by sequential actor-critics and experience replay
- Natural actor-critic based on batch recursive least-squares
- Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
- Multi-agent reinforcement learning: a selective overview of theories and algorithms
- Risk-constrained reinforcement learning with percentile risk criteria
- An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
- Compatible natural gradient policy search
- Actor-critic algorithms with online feature adaptation
- Full gradient DQN reinforcement learning: a provably convergent scheme
- Global convergence of policy gradient methods to (almost) locally optimal policies
- Reinforcement learning algorithms with function approximation: recent advances and applications