The actor-critic algorithm as multi-time-scale stochastic approximation.
Publication:5955801
DOI: 10.1007/BF02745577, zbMath: 1075.90557, OpenAlex: W2047364871, MaRDI QID: Q5955801
Vijaymohan R. Konda, Vivek S. Borkar
Publication date: 18 February 2002
Published in: Sādhanā
Full work available at URL: https://doi.org/10.1007/bf02745577
Cites Work
- Nonconvergence to unstable points in urn models and stochastic approximations
- Generalized polynomial approximations in Markovian decision processes
- New method of stochastic approximation type
- Stochastic approximation methods for constrained and unconstrained systems
- Asynchronous stochastic approximation and Q-learning
- Stochastic approximation with two time scales
- \({\mathcal Q}\)-learning
- Feature-based methods for large scale dynamic programming
- Do stochastic algorithms avoid traps?
- Chaotic relaxation
- Estimation and control in discounted stochastic dynamic programming
- A tutorial survey of reinforcement learning