An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions (Q5380403): Difference between revisions

@@ Property / DOI @@
-10.1162/NECO_a_00808
@@ Property / DOI: 10.1162/NECO_a_00808 / rank @@
-Normal rank
@@ Property / cites work @@
+Online Markov Decision Processes
@@ Property / cites work: Online Markov Decision Processes / rank @@
+Normal rank
@@ Property / cites work @@
+Q2921693
@@ Property / cites work: Q2921693 / rank @@
+Normal rank
@@ Property / cites work @@
+Logarithmic Regret Algorithms for Online Convex Optimization
+Normal rank
@@ Property / cites work @@
+Efficient algorithms for online decision problems
@@ Property / cites work: Efficient algorithms for online decision problems / rank @@
+Normal rank
@@ Property / cites work @@
+An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions
+Normal rank
@@ Property / cites work @@
+Online Markov Decision Processes Under Bandit Feedback
+Normal rank
@@ Property / cites work @@
+Q4626283
@@ Property / cites work: Q4626283 / rank @@
+Normal rank
@@ Property / cites work @@
+Simple statistical gradient-following algorithms for connectionist reinforcement learning
+Normal rank
@@ Property / cites work @@
+Markov Decision Processes with Arbitrary Reward Processes
+Normal rank
@@ Property / full work available at URL @@
+https://doi.org/10.1162/neco_a_00808
+Normal rank
@@ Property / OpenAlex ID @@
+W2225522132
@@ Property / OpenAlex ID: W2225522132 / rank @@
+Normal rank
@@ Property / DBLP publication ID @@
+journals/neco/MaZHS16
@@ Property / DBLP publication ID: journals/neco/MaZHS16 / rank @@
+Normal rank
@@ Property / DOI @@
+10.1162/NECO_A_00808
@@ Property / DOI: 10.1162/NECO_A_00808 / rank @@
+Normal rank
@@ Property / Recommended article @@
+Online Learning in Markov Decision Processes with Continuous Actions
+Normal rank
+Similarity Score: 0.92699593
+Recommender Run: Recommender Run 2
@@ Property / Recommended article @@
+Potential-Based Online Policy Iteration Algorithms for Markov Decision Processes
+Normal rank
+Similarity Score: 0.91683835
+Recommender Run: Recommender Run 2
@@ Property / Recommended article @@
+Policy Gradient for Continuing Tasks in Discounted Markov Decision Processes
+Normal rank
+Similarity Score: 0.90851086
+Recommender Run: Recommender Run 2
@@ Property / Recommended article @@
+A basic formula for online policy gradient algorithms
+Normal rank
+Similarity Score: 0.9016662
+Recommender Run: Recommender Run 2
@@ Property / Recommended article @@
+An online actor-critic algorithm with function approximation for constrained Markov decision processes
+Normal rank
+Similarity Score: 0.8997418
+Recommender Run: Recommender Run 2
@@ Property / Recommended article @@
+Q3093369
@@ Property / Recommended article: Q3093369 / rank @@
+Normal rank
@@ Property / Recommended article: Q3093369 / qualifier @@
+Similarity Score: 0.8991422
@@ Property / Recommended article: Q3093369 / qualifier @@
+Recommender Run: Recommender Run 2
@@ Property / Recommended article @@
+Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes
+Normal rank
+Similarity Score: 0.89336705
+Recommender Run: Recommender Run 2
@@ Property / Recommended article @@
+Real-Time Reinforcement Learning of Constrained Markov Decision Processes with Weak Derivatives
+Normal rank
+Similarity Score: 0.89125407
+Recommender Run: Recommender Run 2

Latest revision as of 13:22, 4 April 2025

scientific article; zbMATH DE number 7062532

Language	Label	Description	Also known as
English	An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions	scientific article; zbMATH DE number 7062532