Natural actor-critic algorithms


Publication:1049136

DOI: 10.1016/j.automatica.2009.07.008
zbMath: 1183.93130
OpenAlex: W2094387729
MaRDI QID: Q1049136

Mohammad Ghavamzadeh, Mark Lee, Richard S. Sutton, Shalabh Bhatnagar

Publication date: 8 January 2010

Published in: Automatica

Full work available at URL: https://doi.org/10.1016/j.automatica.2009.07.008




Related Items

An incremental off-policy search in a model-free Markov decision process using a single sample path
A constrained optimization perspective on actor-critic algorithms and application to network routing
The factored policy-gradient planner
Temporal concatenation for Markov decision processes
A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning
Multiscale Q-learning with linear function approximation
Unnamed Item
A stability criterion for two timescale stochastic approximation schemes
On linear and super-linear convergence of natural policy gradient algorithm
An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
Risk-Sensitive Reinforcement Learning via Policy Gradient Search
Variance-constrained actor-critic algorithms for discounted and average reward MDPs
Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning
Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization
Unnamed Item
On the sample complexity of actor-critic method for reinforcement learning with function approximation
Multi-agent natural actor-critic reinforcement learning algorithms
Reinforcement learning algorithms with function approximation: recent advances and applications
Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
Dynamics and risk sharing in groups of selfish individuals
An online actor-critic algorithm with function approximation for constrained Markov decision processes
Unnamed Item
Nonconvex Policy Search Using Variational Inequalities
Autonomous reinforcement learning with experience replay
Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies
Learning and control of exploration primitives
Deep Reinforcement Learning: A State-of-the-Art Walkthrough
Parameterized Markov decision process and its application to service rate control
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
Hessian matrix distribution for Bayesian policy gradient reinforcement learning
The Borkar-Meyn theorem for asynchronous stochastic approximations
Real-time reinforcement learning by sequential actor-critics and experience replay
Adaptive critic design with graph Laplacian for online learning control of nonlinear systems
Unnamed Item
Natural actor-critic algorithms
Multi-agent reinforcement learning: a selective overview of theories and algorithms
Actor-Critic Method for High Dimensional Static Hamilton--Jacobi--Bellman Partial Differential Equations based on Neural Networks
Actor-Critic Algorithms with Online Feature Adaptation



Cites Work