Reinforcement learning based algorithms for average cost Markov decision processes (Q2643632): Difference between revisions

@@ Property / full work available at URL @@
+https://doi.org/10.1007/s10626-006-0003-y
+Normal rank
@@ Property / OpenAlex ID @@
+W2061769118
@@ Property / OpenAlex ID: W2061769118 / rank @@
+Normal rank
@@ Property / cites work @@
+Dynamic programming and stochastic control
@@ Property / cites work: Dynamic programming and stochastic control / rank @@
+Normal rank
@@ Property / cites work @@
+A Simultaneous Perturbation Stochastic Approximation-Based Actor–Critic Algorithm for Markov Decision Processes
+Normal rank
@@ Property / cites work @@
+Actor-critic algorithms for hierarchical Markov decision processes
+Normal rank
@@ Property / cites work @@
+Asynchronous Stochastic Approximations
@@ Property / cites work: Asynchronous Stochastic Approximations / rank @@
+Normal rank
@@ Property / cites work @@
+The actor-critic algorithm as multi-time-scale stochastic approximation
+Normal rank
@@ Property / cites work @@
+The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
+Normal rank
@@ Property / cites work @@
+Actor-Critic–Type Learning Algorithms for Markov Decision Processes
+Normal rank
@@ Property / cites work @@
+On Actor-Critic Algorithms
@@ Property / cites work: On Actor-Critic Algorithms / rank @@
+Normal rank
@@ Property / cites work @@
+Q4715203
@@ Property / cites work: Q4715203 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4315289
@@ Property / cites work: Q4315289 / rank @@
+Normal rank
@@ Property / cites work @@
+An analysis of temporal-difference learning with function approximation
+Normal rank
@@ Property / cites work @@
+Average cost temporal-difference learning
@@ Property / cites work: Average cost temporal-difference learning / rank @@
+Normal rank
@@ Property / cites work @@
+Multivariate stochastic approximation using a simultaneous perturbation gradient approximation
+Normal rank
@@ Property / cites work @@
+A one-measurement form of simultaneous perturbation stochastic approximation
+Normal rank
@@ Property / cites work @@
+Q4547446
@@ Property / cites work: Q4547446 / rank @@
+Normal rank