Analysis and improvement of policy gradient estimation (Q448295)

From MaRDI portal
Property / full work available at URL: https://doi.org/10.1016/j.neunet.2011.09.005
Property / OpenAlex ID: W2148053762
Property / Wikidata QID: Q51513131
Property / cites work: Q4533363
Property / cites work: Using Expectation-Maximization for Reinforcement Learning
Property / cites work: Q4692508
Property / cites work: Q3093234
Property / cites work: Q4427427
Property / cites work: 10.1162/1532443041827907
Property / cites work: Q2769922
Property / cites work: Approximate gradient methods in policy-space optimization of Markov reward processes
Property / cites work: Simple statistical gradient-following algorithms for connectionist reinforcement learning


scientific article
Language: English
Label: Analysis and improvement of policy gradient estimation
Description: scientific article

    Statements

    Analysis and improvement of policy gradient estimation (English)
    30 August 2012
    reinforcement learning
    policy gradients
    policy gradients with parameter-based exploration
    variance reduction
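To illustrate the keywords above (not the paper's own algorithm), here is a minimal sketch of a REINFORCE-style policy gradient estimator that uses a running mean-reward baseline for variance reduction, on a toy one-step bandit. All function and variable names are illustrative assumptions.

```python
import math
import random

def reward(action):
    # Toy environment (assumption): action 1 pays 1.0, action 0 pays 0.2.
    return 1.0 if action == 1 else 0.2

def train(steps=5000, lr=0.1, seed=0):
    """REINFORCE with a baseline on a two-armed one-step bandit.

    Returns the final probability of choosing the better action (1).
    """
    rng = random.Random(seed)
    theta = 0.0      # logit of P(action = 1) under the policy
    baseline = 0.0   # running average of rewards, used for variance reduction
    for t in range(1, steps + 1):
        p = 1.0 / (1.0 + math.exp(-theta))  # sigmoid policy
        a = 1 if rng.random() < p else 0    # sample an action
        r = reward(a)
        baseline += (r - baseline) / t      # incremental mean of rewards
        grad_logp = a - p                   # d/dtheta of log pi(a | theta)
        # Policy gradient step: advantage (r - baseline) times score function.
        theta += lr * (r - baseline) * grad_logp
    return 1.0 / (1.0 + math.exp(-theta))
```

Subtracting the baseline leaves the gradient estimate unbiased (the score function has zero mean under the policy) while shrinking its variance; after training, the policy should strongly prefer the higher-paying action.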