Variance-constrained actor-critic algorithms for discounted and average reward MDPs
Publication: 1689603
DOI: 10.1007/s10994-016-5569-5
zbMath: 1432.90158
arXiv: 1403.6530
OpenAlex: W2963856199
MaRDI QID: Q1689603
L. A. Prashanth, Mohammad Ghavamzadeh
Publication date: 12 January 2018
Published in: Machine Learning
Full work available at URL: https://arxiv.org/abs/1403.6530
Keywords: actor-critic algorithms; reinforcement learning (RL); Markov decision process (MDP); simultaneous perturbation stochastic approximation (SPSA); multi-time-scale stochastic approximation; risk-sensitive RL; smoothed functional (SF)
Related Items
- Risk-Sensitive Reinforcement Learning via Policy Gradient Search
- Learning equilibrium mean-variance strategy
- Efficient reductions in cyclotomic rings -- application to Ring LWE based FHE schemes
- Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning
Cites Work
- An online actor-critic algorithm with function approximation for constrained Markov decision processes
- Stochastic recursive algorithms for optimization. Simultaneous perturbation methods
- Risk-averse dynamic programming for Markov decision processes
- An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
- A one-measurement form of simultaneous perturbation stochastic approximation
- On general minimax theorems
- Stochastic approximation. A dynamical systems viewpoint.
- Natural actor-critic algorithms
- Stochastic approximation methods for constrained and unconstrained systems
- Risk-sensitive reinforcement learning
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Convergence rate of linear two-time-scale stochastic approximation.
- Algorithmic aspects of mean-variance optimization in Markov decision processes
- An actor-critic algorithm for constrained Markov decision processes
- Adaptive stochastic approximation by the simultaneous perturbation method
- Risk-Sensitive Markov Control Processes
- Distributionally Robust Markov Decision Processes
- Percentile Optimization for Markov Decision Processes with Parameter Uncertainty
- A Learning Algorithm for Risk-Sensitive Cost
- Variance-Penalized Markov Decision Processes
- Multivariate stochastic approximation using a simultaneous perturbation gradient approximation
- Acceleration of Stochastic Approximation by Averaging
- An analysis of temporal-difference learning with function approximation
- Weighted Means in Stochastic Approximation of Minima
- On Actor-Critic Algorithms
- A Kiefer-Wolfowitz algorithm with randomized differences
- Two-timescale simultaneous perturbation stochastic approximation using deterministic perturbation sequences
- Adaptive multivariate three-timescale stochastic approximation algorithms for simulation based optimization
- Adaptive Newton-based multivariate smoothed functional algorithms for simulation optimization
- Stochastic approximation algorithms for constrained optimization via simulation
- The variance of discounted Markov decision processes
- Percentile performance criteria for limiting average Markov decision processes
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- Robust Control of Markov Decision Processes with Uncertain Transition Matrices
- Envelope Theorems for Arbitrary Choice Sets
- On Asymptotic Normality in Stochastic Approximation
- Perturbation theory and finite Markov chains
- Risk-Sensitive Markov Decision Processes
- Q-Learning for Risk-Sensitive Control
- A sensitivity formula for risk-sensitive cost and the actor-critic algorithm