Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning (Q5189863): Difference between revisions

@@ Property / MaRDI profile type @@
+MaRDI publication profile
@@ Property / MaRDI profile type: MaRDI publication profile / rank @@
+Normal rank
@@ Property / full work available at URL @@
+https://doi.org/10.1162/neco.2009.12-08-922
+Normal rank
@@ Property / OpenAlex ID @@
+W1967459934
@@ Property / OpenAlex ID: W1967459934 / rank @@
+Normal rank
@@ Property / Wikidata QID @@
+Q51782240
@@ Property / Wikidata QID: Q51782240 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4368722
@@ Property / cites work: Q4368722 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4257216
@@ Property / cites work: Q4257216 / rank @@
+Normal rank
@@ Property / cites work @@
+Technical update: Least-squares temporal difference learning
+Normal rank
@@ Property / cites work @@
+Linear least-squares algorithms for temporal difference learning
+Normal rank
@@ Property / cites work @@
+On Actor-Critic Algorithms
@@ Property / cites work: On Actor-Critic Algorithms / rank @@
+Normal rank
@@ Property / cites work @@
+Q4457477
@@ Property / cites work: Q4457477 / rank @@
+Normal rank
@@ Property / cites work @@
+Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning
+Normal rank
@@ Property / cites work @@
+How to optimize discrete-event systems from a single sample path by the score function method
+Normal rank
@@ Property / cites work @@
+Average cost temporal-difference learning
@@ Property / cites work: Average cost temporal-difference learning / rank @@
+Normal rank
@@ Property / cites work @@
+On average versus discounted reward temporal-difference learning
+Normal rank
@@ Property / DBLP publication ID @@
+journals/neco/MorimuraUYPD10
@@ Property / DBLP publication ID: journals/neco/MorimuraUYPD10 / rank @@
+Normal rank