Variance-constrained actor-critic algorithms for discounted and average reward MDPs (Q1689603)

scientific article

Language	Label	Description	Also known as
English	Variance-constrained actor-critic algorithms for discounted and average reward MDPs	scientific article

Statements

instance of

scholarly article

0 references

title

Variance-constrained actor-critic algorithms for discounted and average reward MDPs (English)

0 references

12 January 2018

0 references

full work available at URL

https://arxiv.org/abs/1403.6530

0 references

zbMATH Keywords

Markov decision process (MDP)

0 references

reinforcement learning (RL)

0 references

risk sensitive RL

0 references

actor-critic algorithms

0 references

multi-time-scale stochastic approximation

0 references

simultaneous perturbation stochastic approximation (SPSA)

0 references

smoothed functional (SF)

0 references

MaRDI profile type

MaRDI publication profile

0 references

cites work

Q4264741

0 references

A Learning Algorithm for Risk-Sensitive Cost

0 references

Adaptive multivariate three-timescale stochastic approximation algorithms for simulation based optimization

0 references

Adaptive Newton-based multivariate smoothed functional algorithms for simulation optimization

0 references

An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes

0 references

An online actor-critic algorithm with function approximation for constrained Markov decision processes

0 references

Two-timescale simultaneous perturbation stochastic approximation using deterministic perturbation sequences

0 references

Natural actor-critic algorithms

0 references

Stochastic approximation algorithms for constrained optimization via simulation

0 references

Stochastic recursive algorithms for optimization. Simultaneous perturbation methods

0 references

A sensitivity formula for risk-sensitive cost and the actor-critic algorithm

0 references

Q-Learning for Risk-Sensitive Control

0 references

An actor-critic algorithm for constrained Markov decision processes

0 references

Stochastic approximation. A dynamical systems viewpoint.

0 references

A Kiefer-Wolfowitz algorithm with randomized differences

0 references

The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning

0 references

Percentile Optimization for Markov Decision Processes with Parameter Uncertainty

0 references

Weighted Means in Stochastic Approximation of Minima

0 references

On Asymptotic Normality in Stochastic Approximation

0 references

Variance-Penalized Markov Decision Processes

0 references

Percentile performance criteria for limiting average Markov decision processes

0 references

Q4739659

0 references

Risk-Sensitive Markov Decision Processes

0 references

On Actor-Critic Algorithms

0 references

Convergence rate of linear two-time-scale stochastic approximation.

0 references

Stochastic approximation methods for constrained and unconstrained systems

0 references

Algorithmic aspects of mean-variance optimization in Markov decision processes

0 references

Q4902563

0 references

Risk-sensitive reinforcement learning

0 references

Envelope Theorems for Arbitrary Choice Sets

0 references

Robust Control of Markov Decision Processes with Uncertain Transition Matrices

0 references

Risk-averse dynamic programming for Markov decision processes

0 references

Acceleration of Stochastic Approximation by Averaging

0 references

Q4315289

0 references

Perturbation theory and finite Markov chains

0 references

Risk-Sensitive Markov Control Processes

0 references

On general minimax theorems

0 references

The variance of discounted Markov decision processes

0 references

Multivariate stochastic approximation using a simultaneous perturbation gradient approximation

0 references

A one-measurement form of simultaneous perturbation stochastic approximation

0 references

Adaptive stochastic approximation by the simultaneous perturbation method

0 references

Simple statistical gradient-following algorithms for connectionist reinforcement learning

0 references

An analysis of temporal-difference learning with function approximation

0 references

Distributionally Robust Markov Decision Processes

0 references

Identifiers

zbMATH Open document ID

1432.90158

0 references

DOI

10.1007/s10994-016-5569-5

0 references

Mathematics Subject Classification ID

0 references

Sitelinks

Mathematics (1 entry)

mardi Publication:1689603