Exploration-exploitation tradeoff using variance estimates in multi-armed bandits (Q1017665): Difference between revisions

@@ Property / cites work @@
+Sample mean based index policies by <i>O</i>(log <i>n</i>) regret for the multi-armed bandit problem
+Normal rank
@@ Property / cites work @@
+Finite-time analysis of the multiarmed bandit problem
+Normal rank
@@ Property / cites work @@
+On tail probabilities for martingales
@@ Property / cites work: On tail probabilities for martingales / rank @@
+Normal rank
@@ Property / cites work @@
+Q4692329
@@ Property / cites work: Q4692329 / rank @@
+Normal rank
@@ Property / cites work @@
+Probability Inequalities for Sums of Bounded Random Variables
+Normal rank
@@ Property / cites work @@
+Asymptotically efficient adaptive allocation rules
+Normal rank
@@ Property / cites work @@
+Machine learning and nonparametric bandit theory
@@ Property / cites work: Machine learning and nonparametric bandit theory / rank @@
+Normal rank
@@ Property / cites work @@
+Some aspects of the sequential design of experiments
+Normal rank