Online Regret Bounds for Markov Decision Processes with Deterministic Transitions (Q3529915): Difference between revisions

@@ Property / cites work @@
+Q4315289
@@ Property / cites work: Q4315289 / rank @@
+Normal rank
@@ Property / cites work @@
+A characterization of the minimum cycle mean in a digraph
+Normal rank
@@ Property / cites work @@
+Finding minimum cost to time ratio cycles with small integral transit times
+Normal rank
@@ Property / cites work @@
+Faster parametric shortest path and minimum-balance algorithms
+Normal rank
@@ Property / cites work @@
+Near-optimal reinforcement learning in polynomial time
+Normal rank
@@ Property / cites work @@
+Finite-time analysis of the multiarmed bandit problem
+Normal rank
@@ Property / cites work @@
+Optimal Adaptive Policies for Markov Decision Processes
+Normal rank
@@ Property / cites work @@
+Mixing times with applications to perturbed Markov chains
+Normal rank
@@ Property / cites work @@
+Pseudometrics for State Aggregation in Average Reward Markov Decision Processes
+Normal rank
@@ Property / cites work @@
+Markov chain sensitivity measured by mean first passage times
+Normal rank
@@ Property / cites work @@
+Q3093197
@@ Property / cites work: Q3093197 / rank @@
+Normal rank
@@ Property / cites work @@
+The Nonstochastic Multiarmed Bandit Problem
@@ Property / cites work: The Nonstochastic Multiarmed Bandit Problem / rank @@
+Normal rank
@@ Property / cites work @@
+Asymptotically efficient adaptive allocation rules
+Normal rank
@@ Property / cites work @@
+Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching cost
+Normal rank
@@ Property / cites work @@
+Optimal learning and experimentation in bandit problems
+Normal rank
@@ Property / cites work @@
+Improved Rates for the Stochastic Continuum-Armed Bandit Problem
+Normal rank