Online regret bounds for Markov decision processes with deterministic transitions
Publication: 982638 (MaRDI QID: Q982638)
DOI: 10.1016/j.tcs.2010.04.005
zbMath: 1198.90388
OpenAlex: W2150011303
Wikidata: Q29307615 (Scholia: Q29307615)
Publication date: 7 July 2010
Published in: Theoretical Computer Science
Full work available at URL: https://doi.org/10.1016/j.tcs.2010.04.005
Cites Work
- Asymptotically efficient adaptive allocation rules
- A characterization of the minimum cycle mean in a digraph
- Near-optimal reinforcement learning in polynomial time
- Optimal learning and experimentation in bandit problems
- Finding minimum cost to time ratio cycles with small integral transit times
- Online Markov Decision Processes
- Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
- Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching cost
- Optimal Adaptive Policies for Markov Decision Processes
- The Nonstochastic Multiarmed Bandit Problem
- Probability Inequalities for Sums of Bounded Random Variables
- Improved Rates for the Stochastic Continuum-Armed Bandit Problem
- Faster parametric shortest path and minimum-balance algorithms
- Finite-time analysis of the multiarmed bandit problem