Basis function adaptation in temporal difference reinforcement learning (Q2485935): Difference between revisions

@@ Property / full work available at URL @@
+https://doi.org/10.1007/s10479-005-5732-z
+Normal rank
@@ Property / OpenAlex ID @@
+W1998172110
@@ Property / OpenAlex ID: W1998172110 / rank @@
+Normal rank
@@ Property / cites work @@
+Application of the cross-entropy method to the buffer allocation problem in a simulation-based environment
+Normal rank
@@ Property / cites work @@
+Q4368722
@@ Property / cites work: Q4368722 / rank @@
+Normal rank
@@ Property / cites work @@
+Q3151174
@@ Property / cites work: Q3151174 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4257216
@@ Property / cites work: Q4257216 / rank @@
+Normal rank
@@ Property / cites work @@
+Technical update: Least-squares temporal difference learning
+Normal rank
@@ Property / cites work @@
+Q5477859
@@ Property / cites work: Q5477859 / rank @@
+Normal rank
@@ Property / cites work @@
+A tutorial on the cross-entropy method
@@ Property / cites work: A tutorial on the cross-entropy method / rank @@
+Normal rank
@@ Property / cites work @@
+Q4001920
@@ Property / cites work: Q4001920 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4422978
@@ Property / cites work: Q4422978 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4353852
@@ Property / cites work: Q4353852 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4709211
@@ Property / cites work: Q4709211 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4315289
@@ Property / cites work: Q4315289 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4709223
@@ Property / cites work: Q4709223 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4828558
@@ Property / cites work: Q4828558 / rank @@
+Normal rank
@@ Property / cites work @@
+The cross-entropy method for combinatorial and continuous optimization
+Normal rank
@@ Property / cites work @@
+Q5477860
@@ Property / cites work: Q5477860 / rank @@
+Normal rank
@@ Property / cites work @@
+An analysis of temporal-difference learning with function approximation
+Normal rank
@@ Property / cites work @@
+An adaptive optimal controller for discrete-time Markov environments
+Normal rank