Q5214215 (Q5214215): Difference between revisions

@@ Property / arXiv ID @@
+1703.07608
@@ Property / arXiv ID: 1703.07608 / rank @@
+Normal rank
@@ Property / cites work @@
+Near-Optimal Regret Bounds for Thompson Sampling
@@ Property / cites work: Near-Optimal Regret Bounds for Thompson Sampling / rank @@
+Normal rank
@@ Property / cites work @@
+Q4257216
@@ Property / cites work: Q4257216 / rank @@
+Normal rank
@@ Property / cites work @@
+Some asymptotic theory for the bootstrap
@@ Property / cites work: Some asymptotic theory for the bootstrap / rank @@
+Normal rank
@@ Property / cites work @@
+Discounted Dynamic Programming
@@ Property / cites work: Discounted Dynamic Programming / rank @@
+Normal rank
@@ Property / cites work @@
+Q4821526
@@ Property / cites work: Q4821526 / rank @@
+Normal rank
@@ Property / cites work @@
+10.1162/153244303765208377
@@ Property / cites work: 10.1162/153244303765208377 / rank @@
+Normal rank
@@ Property / cites work @@
+Q3959963
@@ Property / cites work: Q3959963 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4318617
@@ Property / cites work: Q4318617 / rank @@
+Normal rank
@@ Property / cites work @@
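+Bootstrap prediction and Bayesian prediction under misspecified models
@@ Property / cites work: Bootstrap prediction and Bayesian prediction under misspecified models / rank @@
+Normal rank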
@@ Property / cites work @@
+The Efficiency Analysis of Choices Involving Risk
@@ Property / cites work: The Efficiency Analysis of Choices Involving Risk / rank @@
+Normal rank
@@ Property / cites work @@
+Q2896090
@@ Property / cites work: Q2896090 / rank @@
+Normal rank
@@ Property / cites work @@
+Near-optimal reinforcement learning in polynomial time
@@ Property / cites work: Near-optimal reinforcement learning in polynomial time / rank @@
+Normal rank
@@ Property / cites work @@
+Reducing reinforcement learning to KWIK online regression
@@ Property / cites work: Reducing reinforcement learning to KWIK online regression / rank @@
+Normal rank
@@ Property / cites work @@
+Knows what it knows: a framework for self-aware learning
@@ Property / cites work: Knows what it knows: a framework for self-aware learning / rank @@
+Normal rank
@@ Property / cites work @@
+Increasing risk: Some direct constructions
@@ Property / cites work: Increasing risk: Some direct constructions / rank @@
+Normal rank
@@ Property / cites work @@
+Q($\lambda$) with Off-Policy Corrections
@@ Property / cites work: Q($\lambda$) with Off-Policy Corrections / rank @@
+Normal rank
@@ Property / cites work @@
+Q5214215
@@ Property / cites work: Q5214215 / rank @@
+Normal rank
@@ Property / cites work @@
+Bootstrapping data arrays of arbitrary order
@@ Property / cites work: Bootstrapping data arrays of arbitrary order / rank @@
+Normal rank
@@ Property / cites work @@
+Learning to Optimize via Posterior Sampling
@@ Property / cites work: Learning to Optimize via Posterior Sampling / rank @@
+Normal rank
@@ Property / cites work @@
+Learning to Optimize via Information-Directed Sampling
@@ Property / cites work: Learning to Optimize via Information-Directed Sampling / rank @@
+Normal rank
@@ Property / cites work @@
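+How Much Does Your Data Exploration Overfit? Controlling Bias via Information Usage
@@ Property / cites work: How Much Does Your Data Exploration Overfit? Controlling Bias via Information Usage / rank @@
+Normal rank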
@@ Property / cites work @@
+A Tutorial on Thompson Sampling
@@ Property / cites work: A Tutorial on Thompson Sampling / rank @@
+Normal rank
@@ Property / cites work @@
+Q4626283
@@ Property / cites work: Q4626283 / rank @@
+Normal rank
@@ Property / cites work @@
+Algorithms for Reinforcement Learning
@@ Property / cites work: Algorithms for Reinforcement Learning / rank @@
+Normal rank
@@ Property / cites work @@
+Q2880944
@@ Property / cites work: Q2880944 / rank @@
+Normal rank
@@ Property / cites work @@
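+An analysis of temporal-difference learning with function approximation
@@ Property / cites work: An analysis of temporal-difference learning with function approximation / rank @@
+Normal rank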