Analysis and improvement of policy gradient estimation (Q448295): Difference between revisions

@@ Property / author @@
-Masashi Sugiyama
@@ Property / author: Masashi Sugiyama / rank @@
-Normal rank
@@ Property / MaRDI profile type @@
+Publication
@@ Property / MaRDI profile type: Publication / rank @@
+Normal rank
@@ Property / full work available at URL @@
+https://doi.org/10.1016/j.neunet.2011.09.005
+Normal rank
@@ Property / OpenAlex ID @@
+W2148053762
@@ Property / OpenAlex ID: W2148053762 / rank @@
+Normal rank
@@ Property / Wikidata QID @@
+Q51513131
@@ Property / Wikidata QID: Q51513131 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4533363
@@ Property / cites work: Q4533363 / rank @@
+Normal rank
@@ Property / cites work @@
+Using Expectation-Maximization for Reinforcement Learning
+Normal rank
@@ Property / cites work @@
+Q4692508
@@ Property / cites work: Q4692508 / rank @@
+Normal rank
@@ Property / cites work @@
+Q3093234
@@ Property / cites work: Q3093234 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4427427
@@ Property / cites work: Q4427427 / rank @@
+Normal rank
@@ Property / cites work @@
+10.1162/1532443041827907
@@ Property / cites work: 10.1162/1532443041827907 / rank @@
+Normal rank
@@ Property / cites work @@
+Q2769922
@@ Property / cites work: Q2769922 / rank @@
+Normal rank
@@ Property / cites work @@
+Approximate gradient methods in policy-space optimization of Markov reward processes
+Normal rank
@@ Property / cites work @@
+Simple statistical gradient-following algorithms for connectionist reinforcement learning
+Normal rank
@@ Property / DBLP publication ID @@
+journals/nn/ZhaoHNS12
@@ Property / DBLP publication ID: journals/nn/ZhaoHNS12 / rank @@
+Normal rank
@@ links / mardi / name / links / mardi / name @@
+Publication:448295