Temporal-difference search in Computer Go (Q420936): Difference between revisions

@@ Property / cites work @@
+Finite-time analysis of the multiarmed bandit problem
+Normal rank
@@ Property / cites work @@
+Learning to play chess using temporal differences
@@ Property / cites work: Learning to play chess using temporal differences / rank @@
+Normal rank
@@ Property / cites work @@
+Whole-History Rating: A Bayesian Rating System for Players of Time-Varying Strength
+Normal rank
@@ Property / cites work @@
+Q4000104
@@ Property / cites work: Q4000104 / rank @@
+Normal rank
@@ Property / cites work @@
+Analytical mean squared error curves for temporal difference learning
+Normal rank
@@ Property / cites work @@
+Amazons Discover Monte-Carlo
@@ Property / cites work: Amazons Discover Monte-Carlo / rank @@
+Normal rank
@@ Property / cites work @@
+Computer Go
@@ Property / cites work: Computer Go / rank @@
+Normal rank
@@ Property / cites work @@
+Convergence results for single-step on-policy reinforcement-learning algorithms
+Normal rank
@@ Property / cites work @@
+An Analysis of UCT in Multi-player Games
@@ Property / cites work: An Analysis of UCT in Multi-player Games / rank @@
+Normal rank
@@ Property / cites work @@
+10.1162/153244303768966102
@@ Property / cites work: 10.1162/153244303768966102 / rank @@
+Normal rank
@@ Property / cites work @@
+Q3724211
@@ Property / cites work: Q3724211 / rank @@
+Normal rank