Recommendations
Cites work
- scientific article; zbMATH DE number 3954793 (no title available)
- scientific article; zbMATH DE number 48727 (no title available)
- scientific article; zbMATH DE number 51132 (no title available)
- scientific article; zbMATH DE number 1206370 (no title available)
- scientific article; zbMATH DE number 1321699 (no title available)
- scientific article; zbMATH DE number 700091 (no title available)
- scientific article; zbMATH DE number 1043533 (no title available)
- scientific article; zbMATH DE number 1753152 (no title available)
- doi:10.1162/1532443041827907
- A Simultaneous Perturbation Stochastic Approximation-Based Actor–Critic Algorithm for Markov Decision Processes
- A Survey of Applications of Markov Decision Processes
- Actor-Critic-Type Learning Algorithms for Markov Decision Processes
- Adaptive Newton-based multivariate smoothed functional algorithms for simulation optimization
- Adaptive multivariate three-timescale stochastic approximation algorithms for simulation based optimization
- An analysis of temporal-difference learning with function approximation
- Asynchronous stochastic approximation and Q-learning
- Average cost temporal-difference learning
- Bayesian policy gradient and actor-critic algorithms
- Elevator group control using multiple reinforcement learning agents
- Functional Approximations and Dynamic Programming
- Learning algorithms for Markov decision processes with average cost
- Linear least-squares algorithms for temporal difference learning
- Natural actor-critic algorithms
- Nonconvergence to unstable points in urn models and stochastic approximations
- On the convergence of temporal-difference learning with linear function approximation
- On Actor-Critic Algorithms
- Perturbation realization, potentials, and sensitivity analysis of Markov processes
- Reinforcement learning based algorithms for average cost Markov decision processes
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Simulation-based optimization of Markov reward processes
- Some Pathological Traps for Stochastic Approximation
- Stochastic approximation methods for constrained and unconstrained systems
- Stochastic approximation with two time scales
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- Variance reduction techniques for gradient estimates in reinforcement learning
Cited in (54)
- On linear and super-linear convergence of natural policy gradient algorithm
- scientific article; zbMATH DE number 7370615 (no title available)
- Temporal concatenation for Markov decision processes
- Parameterized Markov decision process and its application to service rate control
- Learning and control of exploration primitives
- The Borkar-Meyn theorem for asynchronous stochastic approximations
- Adaptive critic design with graph Laplacian for online learning control of nonlinear systems
- Occupancy information ratio: infinite-horizon, information-directed, parameterized policy search
- Expected policy gradients for reinforcement learning
- Actor-critic algorithms with online feature adaptation
- A constrained optimization perspective on actor-critic algorithms and application to network routing
- TD-regularized actor-critic methods
- An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
- Actor-critic algorithms based on symmetric perturbation sampling
- Fast global convergence of natural policy gradient methods with entropy regularization
- Dynamics and risk sharing in groups of selfish individuals
- On Actor-Critic Algorithms
- Variational actor-critic algorithms
- Reinforced mixture learning
- Hessian matrix distribution for Bayesian policy gradient reinforcement learning
- Error controlled actor-critic
- Natural actor-critic based on batch recursive least-squares
- An incremental off-policy search in a model-free Markov decision process using a single sample path
- A stability criterion for two timescale stochastic approximation schemes
- A convergent online single time scale actor critic algorithm
- Multi-agent natural actor-critic reinforcement learning algorithms
- Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning
- Natural actor-critic algorithms
- Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
- Multi-agent reinforcement learning: a selective overview of theories and algorithms
- Deep Reinforcement Learning: A State-of-the-Art Walkthrough
- Inverse reinforcement learning via nonparametric spatio-temporal subgoal modeling
- Reinforcement learning algorithms with function approximation: recent advances and applications
- An online actor-critic algorithm with function approximation for constrained Markov decision processes
- Nonconvex policy search using variational inequalities
- Real-time reinforcement learning by sequential actor-critics and experience replay
- Risk-Sensitive Reinforcement Learning via Policy Gradient Search
- Hierarchical speed control for autonomous electric vehicle through deep reinforcement learning and robust control
- Multiscale Q-learning with linear function approximation
- Global convergence of policy gradient methods to (almost) locally optimal policies
- Full gradient DQN reinforcement learning: a provably convergent scheme
- Actor-critic method for high dimensional static Hamilton-Jacobi-Bellman partial differential equations based on neural networks
- A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning
- Approximate Newton Policy Gradient Algorithms
- Compatible natural gradient policy search
- Finite-time analysis of natural actor-critic for POMDPs
- On the sample complexity of actor-critic method for reinforcement learning with function approximation
- scientific article; zbMATH DE number 7453114 (no title available)
- Artificial Intelligence and Soft Computing - ICAISC 2004
- The factored policy-gradient planner
- Variance-constrained actor-critic algorithms for discounted and average reward MDPs
- Risk-constrained reinforcement learning with percentile risk criteria
- Autonomous reinforcement learning with experience replay
- Convergence of entropy-regularized natural policy gradient with linear function approximation
This page was built for publication: Natural actor-critic algorithms