Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning
Publication: 5189863
DOI: 10.1162/neco.2009.12-08-922
zbMath: 1186.68380
OpenAlex: W1967459934
Wikidata: Q51782240
Scholia: Q51782240
MaRDI QID: Q5189863
Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto, Jan Peters, Kenji Doya
Publication date: 11 March 2010
Published in: Neural Computation
Full work available at URL: https://doi.org/10.1162/neco.2009.12-08-922
Related Items
- Geometry and convergence of natural policy gradient methods
- Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning
Cites Work
- How to optimize discrete-event systems from a single sample path by the score function method
- On average versus discounted reward temporal-difference learning
- Technical update: Least-squares temporal difference learning
- Average cost temporal-difference learning
- Linear least-squares algorithms for temporal difference learning
- On Actor-Critic Algorithms
- Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning