An analysis of model-based interval estimation for Markov decision processes (Q959899): Difference between revisions

@@ Property / cites work @@
+10.1162/153244303321897663
@@ Property / cites work: 10.1162/153244303321897663 / rank @@
+Normal rank
@@ Property / cites work @@
+10.1162/153244303765208377
@@ Property / cites work: 10.1162/153244303765208377 / rank @@
+Normal rank
@@ Property / cites work @@
+Q3046711
@@ Property / cites work: Q3046711 / rank @@
+Normal rank
@@ Property / cites work @@
+Q3093383
@@ Property / cites work: Q3093383 / rank @@
+Normal rank
@@ Property / cites work @@
+Bounded-parameter Markov decision processes
@@ Property / cites work: Bounded-parameter Markov decision processes / rank @@
+Normal rank
@@ Property / cites work @@
+Near-optimal reinforcement learning in polynomial time
@@ Property / cites work: Near-optimal reinforcement learning in polynomial time / rank @@
+Normal rank
@@ Property / cites work @@
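+Adaptive treatment allocation and the multi-armed bandit problem
@@ Property / cites work: Adaptive treatment allocation and the multi-armed bandit problem / rank @@
+Normal rank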
@@ Property / cites work @@
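+Robust Control of Markov Decision Processes with Uncertain Transition Matrices
@@ Property / cites work: Robust Control of Markov Decision Processes with Uncertain Transition Matrices / rank @@
+Normal rank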
@@ Property / cites work @@
+Q4315289
@@ Property / cites work: Q4315289 / rank @@
+Normal rank
@@ Property / cites work @@
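+A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem
@@ Property / cites work: A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem / rank @@
+Normal rank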
@@ Property / cites work @@
+A theory of the learnable
@@ Property / cites work: A theory of the learnable / rank @@
+Normal rank