Natural actor-critic algorithms

Publication:1049136

DOI: 10.1016/j.automatica.2009.07.008, zbMath: 1183.93130, OpenAlex: W2094387729, MaRDI QID: Q1049136

Mohammad Ghavamzadeh, Mark Lee, Richard S. Sutton, Shalabh Bhatnagar

Publication date: 8 January 2010

Published in: Automatica

Full work available at URL: https://doi.org/10.1016/j.automatica.2009.07.008



Related Items

An incremental off-policy search in a model-free Markov decision process using a single sample path, A constrained optimization perspective on actor-critic algorithms and application to network routing, The factored policy-gradient planner, Temporal concatenation for Markov decision processes, A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning, Multiscale Q-learning with linear function approximation, Unnamed Item, A stability criterion for two timescale stochastic approximation schemes, On linear and super-linear convergence of natural policy gradient algorithm, An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes, Risk-Sensitive Reinforcement Learning via Policy Gradient Search, Variance-constrained actor-critic algorithms for discounted and average reward MDPs, Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning, Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization, Unnamed Item, On the sample complexity of actor-critic method for reinforcement learning with function approximation, Multi-agent natural actor-critic reinforcement learning algorithms, Reinforcement learning algorithms with function approximation: recent advances and applications, Preference-based reinforcement learning: a formal framework and a policy iteration algorithm, Dynamics and risk sharing in groups of selfish individuals, An online actor-critic algorithm with function approximation for constrained Markov decision processes, Unnamed Item, Nonconvex Policy Search Using Variational Inequalities, Autonomous reinforcement learning with experience replay, Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies, Learning and control of exploration primitives, Deep Reinforcement Learning: A State-of-the-Art Walkthrough, Parameterized Markov decision process and its application to service rate control, Risk-Constrained Reinforcement Learning with Percentile Risk Criteria, Hessian matrix distribution for Bayesian policy gradient reinforcement learning, The Borkar-Meyn theorem for asynchronous stochastic approximations, Real-time reinforcement learning by sequential actor-critics and experience replay, Adaptive critic design with graph Laplacian for online learning control of nonlinear systems, Unnamed Item, Natural actor-critic algorithms, Multi-agent reinforcement learning: a selective overview of theories and algorithms, Actor-Critic Method for High Dimensional Static Hamilton--Jacobi--Bellman Partial Differential Equations based on Neural Networks, Actor-Critic Algorithms with Online Feature Adaptation



Cites Work