Policy gradient in continuous time
From MaRDI portal
Recommendations
- scientific article; zbMATH DE number 1753152
- scientific article; zbMATH DE number 1753153
- On the policy improvement algorithm in continuous time
- Approximate gradient methods in policy-space optimization of Markov reward processes
- Policy iterations for reinforcement learning problems in continuous time and space -- fundamental theory and methods
Cited in (29)
- Policy iterations for reinforcement learning problems in continuous time and space -- fundamental theory and methods
- A policy gradient framework for stochastic optimal control problems with global convergence guarantee
- Monte Carlo gradient estimation in machine learning
- The factored policy-gradient planner
- Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning
- On the policy improvement algorithm in continuous time
- Linear Convergence of a Policy Gradient Method for Some Finite Horizon Continuous Time Control Problems
- On high-order differentiability of the policy function
- Policy Gradient Learning Methods for Stochastic Control with Exit Time and Applications to Share Repurchase Pricing
- Expected policy gradients for reinforcement learning
- Analysis and improvement of policy gradient estimation
- Policy gradient in Lipschitz Markov decision processes
- Recent developments in machine learning methods for stochastic control and games
- scientific article; zbMATH DE number 6982305
- scientific article; zbMATH DE number 1753152
- scientific article; zbMATH DE number 1753153
- Approximate Q Learning for Controlled Diffusion Processes and Its Near Optimality
- Convergence of policy gradient methods for finite-horizon exploratory linear-quadratic control problems
- scientific article; zbMATH DE number 7626721
- Time-varying policy rule under learning
- Policy learning for time-bounded reachability in continuous-time Markov decision processes via doubly-stochastic gradient ascent
- Compatible natural gradient policy search
- Sublinear regret for a class of continuous-time linear-quadratic reinforcement learning problems
- 10.1162/jmlr.2003.3.4-5.921
- Entropy annealing for policy mirror descent in continuous time and space
- Derivative-free methods for policy optimization: guarantees for linear quadratic systems
- Inhomogeneous deep Q-network for time sensitive applications
- Multiagent relative investment games in a jump diffusion market with deep reinforcement learning algorithm
- An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions