Variance-constrained actor-critic algorithms for discounted and average reward MDPs
Publication: 1689603
DOI: 10.1007/s10994-016-5569-5
zbMath: 1432.90158
arXiv: 1403.6530
OpenAlex: W2963856199
MaRDI QID: Q1689603
L. A. Prashanth, Mohammad Ghavamzadeh
Publication date: 12 January 2018
Published in: Machine Learning
Full work available at URL: https://arxiv.org/abs/1403.6530
Keywords: actor-critic algorithms; reinforcement learning (RL); Markov decision process (MDP); simultaneous perturbation stochastic approximation (SPSA); multi-time-scale stochastic approximation; risk-sensitive RL; smoothed functional (SF)
Related Items
- Risk-Sensitive Reinforcement Learning via Policy Gradient Search
- Learning equilibrium mean-variance strategy
- Efficient reductions in cyclotomic rings -- application to Ring LWE based FHE schemes
- Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning
Cites Work
- An online actor-critic algorithm with function approximation for constrained Markov decision processes
- Stochastic recursive algorithms for optimization. Simultaneous perturbation methods
- Risk-averse dynamic programming for Markov decision processes
- An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
- A one-measurement form of simultaneous perturbation stochastic approximation
- On general minimax theorems
- Stochastic approximation. A dynamical systems viewpoint.
- Natural actor-critic algorithms
- Stochastic approximation methods for constrained and unconstrained systems
- Risk-sensitive reinforcement learning
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Convergence rate of linear two-time-scale stochastic approximation.
- Algorithmic aspects of mean-variance optimization in Markov decision processes
- An actor-critic algorithm for constrained Markov decision processes
- Adaptive stochastic approximation by the simultaneous perturbation method
- Risk-Sensitive Markov Control Processes
- Distributionally Robust Markov Decision Processes
- Percentile Optimization for Markov Decision Processes with Parameter Uncertainty
- A Learning Algorithm for Risk-Sensitive Cost
- Variance-Penalized Markov Decision Processes
- Multivariate stochastic approximation using a simultaneous perturbation gradient approximation
- Acceleration of Stochastic Approximation by Averaging
- An analysis of temporal-difference learning with function approximation
- Weighted Means in Stochastic Approximation of Minima
- On Actor-Critic Algorithms
- A Kiefer-Wolfowitz algorithm with randomized differences
- Two-timescale simultaneous perturbation stochastic approximation using deterministic perturbation sequences
- Adaptive multivariate three-timescale stochastic approximation algorithms for simulation based optimization
- Adaptive Newton-based multivariate smoothed functional algorithms for simulation optimization
- Stochastic approximation algorithms for constrained optimization via simulation
- The variance of discounted Markov decision processes
- Percentile performance criteria for limiting average Markov decision processes
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- Robust Control of Markov Decision Processes with Uncertain Transition Matrices
- Envelope Theorems for Arbitrary Choice Sets
- On Asymptotic Normality in Stochastic Approximation
- Perturbation theory and finite Markov chains
- Risk-Sensitive Markov Decision Processes
- Q-Learning for Risk-Sensitive Control
- A sensitivity formula for risk-sensitive cost and the actor-critic algorithm