An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions
From MaRDI portal
Publication:5380403
Recommendations
- Online learning in Markov decision processes with continuous actions
- Potential-Based Online Policy Iteration Algorithms for Markov Decision Processes
- Policy Gradient for Continuing Tasks in Discounted Markov Decision Processes
- A basic formula for online policy gradient algorithms
- An online actor-critic algorithm with function approximation for constrained Markov decision processes
- Policy gradient in continuous time
- Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes
- Gradient based policy optimization of constrained Markov decision processes
Cites work
- An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions
- Efficient algorithms for online decision problems
- Logarithmic Regret Algorithms for Online Convex Optimization
- Markov decision processes with arbitrary reward processes
- Online Markov Decision Processes Under Bandit Feedback
- Online Markov decision processes
- Online convex optimization in the bandit setting: gradient descent without a gradient
- Reinforcement learning. An introduction
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
Cited in (10)
- A Q-learning algorithm for Markov decision processes with continuous state spaces
- An incremental off-policy search in a model-free Markov decision process using a single sample path
- Logarithmic regret bounds for continuous-time average-reward Markov decision processes
- Online Markov decision processes
- Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning
- Policy gradient in Lipschitz Markov decision processes
- Approximate stochastic annealing for online control of infinite horizon Markov decision processes
- Online learning in Markov decision processes with continuous actions
- Markov decision processes with arbitrary reward processes
- An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions