Online regret bounds for Markov decision processes with deterministic transitions (Q982638): Difference between revisions

@@ Property / cites work @@
+Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
+Normal rank
@@ Property / cites work @@
+Finite-time analysis of the multiarmed bandit problem
+Normal rank
@@ Property / cites work @@
+Optimal Adaptive Policies for Markov Decision Processes
+Normal rank
@@ Property / cites work @@
+Q2896090
@@ Property / cites work: Q2896090 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4315289
@@ Property / cites work: Q4315289 / rank @@
+Normal rank
@@ Property / cites work @@
+A characterization of the minimum cycle mean in a digraph
+Normal rank
@@ Property / cites work @@
+Faster parametric shortest path and minimum‐balance algorithms
+Normal rank
@@ Property / cites work @@
+Finding minimum cost to time ratio cycles with small integral transit times
+Normal rank
@@ Property / cites work @@
+Near-optimal reinforcement learning in polynomial time
+Normal rank
@@ Property / cites work @@
+Probability Inequalities for Sums of Bounded Random Variables
+Normal rank
@@ Property / cites work @@
+Q3093197
@@ Property / cites work: Q3093197 / rank @@
+Normal rank
@@ Property / cites work @@
+The Nonstochastic Multiarmed Bandit Problem
@@ Property / cites work: The Nonstochastic Multiarmed Bandit Problem / rank @@
+Normal rank
@@ Property / cites work @@
+Asymptotically efficient adaptive allocation rules
+Normal rank
@@ Property / cites work @@
+Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching cost
+Normal rank
@@ Property / cites work @@
+Optimal learning and experimentation in bandit problems
+Normal rank
@@ Property / cites work @@
+Improved Rates for the Stochastic Continuum-Armed Bandit Problem
+Normal rank
@@ Property / cites work @@
+Online Markov Decision Processes
@@ Property / cites work: Online Markov Decision Processes / rank @@
+Normal rank