An actor-critic algorithm for constrained Markov decision processes
Publication: 2504518
DOI: 10.1016/J.SYSCONLE.2004.08.007
zbMath: 1129.90322
OpenAlex: W2070570138
MaRDI QID: Q2504518
Publication date: 25 September 2006
Published in: Systems & Control Letters
Full work available at URL: https://doi.org/10.1016/j.sysconle.2004.08.007
Keywords: stochastic approximation; reinforcement learning; envelope theorem; actor-critic algorithms; constrained Markov decision processes
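The keywords point to the paper's main ingredients: multi-timescale stochastic approximation, an actor-critic architecture, and Lagrangian handling of constraints. The sketch below is a minimal tabular illustration of that general idea only, not the paper's algorithm; the toy MDP, step-size exponents, and constraint bound are all assumptions chosen for illustration.

```python
"""Illustrative sketch: tabular actor-critic for a constrained MDP via
Lagrangian relaxation with separated timescales. All problem data are
assumptions; this is not the algorithm analyzed in the paper."""
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-state, 2-action MDP: transitions, reward, and constraint cost.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])      # P[s, a, s']
r = np.array([[1.0, 0.0], [0.0, 2.0]])        # reward r(s, a)
d = np.array([[0.0, 1.0], [1.0, 0.5]])        # constraint cost d(s, a)
alpha = 0.8                                   # bound: long-run avg cost <= alpha

theta = np.zeros((2, 2))   # actor: softmax policy parameters
V = np.zeros(2)            # critic: differential value estimates
rho = 0.0                  # average of the Lagrangian payoff
lam = 0.0                  # Lagrange multiplier for the constraint

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

s = 0
for t in range(1, 50_000):
    # Separated timescales: critic fastest, actor slower, multiplier slowest.
    a_c, a_a, a_l = 1.0 / (1 + t) ** 0.6, 1.0 / (1 + t) ** 0.8, 1.0 / (1 + t)
    pi = policy(s)
    a = rng.choice(2, p=pi)
    s2 = rng.choice(2, p=P[s, a])

    # Lagrangian payoff: reward minus lam times constraint violation.
    g = r[s, a] - lam * (d[s, a] - alpha)

    # Critic: average-reward TD(0) update.
    delta = g - rho + V[s2] - V[s]
    rho += a_c * (g - rho)
    V[s] += a_c * delta

    # Actor: policy-gradient step driven by the TD error.
    grad_log = -pi
    grad_log[a] += 1.0
    theta[s] += a_a * delta * grad_log

    # Multiplier: ascend on constraint violation, projected onto [0, inf).
    lam = max(0.0, lam + a_l * (d[s, a] - alpha))
    s = s2

print("policy:", np.round([policy(0), policy(1)], 3), "lambda:", round(lam, 3))
```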
Related Items (16)
- A new learning algorithm for optimal stopping
- An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
- Dimension reduction based adaptive dynamic programming for optimal control of discrete-time nonlinear control-affine systems
- Risk-Sensitive Reinforcement Learning via Policy Gradient Search
- Variance-constrained actor-critic algorithms for discounted and average reward MDPs
- Safety-constrained reinforcement learning with a distributional safety critic
- An online actor-critic algorithm with function approximation for constrained Markov decision processes
- Approachability in Stackelberg stochastic games with vector costs
- Delay-aware online service scheduling in high-speed railway communication systems
- Quasi-Newton smoothed functional algorithms for unconstrained and constrained simulation optimization
- Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
- Optimal Distributed Uplink Channel Allocation: A Constrained MDP Formulation
- Opportunistic Transmission over Randomly Varying Channels
- A note on linear function approximation using random projections
- Finite-Time Analysis and Restarting Scheme for Linear Two-Time-Scale Stochastic Approximation
- Whittle index based Q-learning for restless bandits with average reward
Cites Work
- Stochastic approximation with two time scales
- An analysis of temporal-difference learning with function approximation
- On Actor-Critic Algorithms
- Actor-Critic-Type Learning Algorithms for Markov Decision Processes
- Envelope Theorems for Arbitrary Choice Sets
- Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations