An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions
DOI: 10.1162/NECO_A_00808 · zbMATH Open: 1472.68149 · DBLP: journals/neco/MaZHS16 · OpenAlex: W2225522132 · Wikidata: Q47600318 · MaRDI QID: Q5380403
Yao Ma, Tingting Zhao, Kohei Hatano, Masashi Sugiyama
Publication date: 4 June 2019
Published in: Neural Computation
Full work available at URL: https://doi.org/10.1162/neco_a_00808
Recommendations
- Online learning in Markov decision processes with continuous actions
- Potential-Based Online Policy Iteration Algorithms for Markov Decision Processes
- Policy Gradient for Continuing Tasks in Discounted Markov Decision Processes
- A basic formula for online policy gradient algorithms
- An online actor-critic algorithm with function approximation for constrained Markov decision processes
- Policy gradient in continuous time
- Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes
- Gradient based policy optimization of constrained Markov decision processes
Classification:
- Learning and adaptive systems in artificial intelligence (68T05)
- Online algorithms; streaming algorithms (68W27)
- Markov and semi-Markov decision processes (90C40)
Cites Work
- Online convex optimization in the bandit setting: gradient descent without a gradient
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Efficient algorithms for online decision problems
- Logarithmic Regret Algorithms for Online Convex Optimization
- Title not available
- Markov Decision Processes with Arbitrary Reward Processes
- Online Markov Decision Processes
- Online Markov Decision Processes Under Bandit Feedback
- An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions
Cited In (2)