Bayesian optimistic Kullback-Leibler exploration (Q2425228): Difference between revisions

@@ Property / cites work @@
+Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
+Normal rank
@@ Property / cites work @@
+Q4821526
@@ Property / cites work: Q4821526 / rank @@
+Normal rank
@@ Property / cites work @@
+10.1162/153244303765208377
@@ Property / cites work: 10.1162/153244303765208377 / rank @@
+Normal rank
@@ Property / cites work @@
+Q2896090
@@ Property / cites work: Q2896090 / rank @@
+Normal rank
@@ Property / cites work @@
+Near-optimal reinforcement learning in polynomial time
+Normal rank
@@ Property / cites work @@
+Q5305630
@@ Property / cites work: Q5305630 / rank @@
+Normal rank
@@ Property / cites work @@
+An analysis of model-based interval estimation for Markov decision processes
+Normal rank