Near-optimal regret bounds for reinforcement learning

zbMATH Open: 1242.68229; MaRDI QID: Q2896090; FDO: Q2896090

Authors: Thomas Jaksch, Ronald Ortner, Peter Auer

Publication date: 13 July 2012

Published in: Journal of Machine Learning Research (JMLR)

Full work available at URL: http://www.jmlr.org/papers/v11/jaksch10a.html

Recommendations

scientific article; zbMATH DE number 7014219
Near-optimal PAC bounds for discounted MDPs
Regret bounds for reinforcement learning via Markov chain concentration
Near-optimal reinforcement learning in polynomial time
PAC Bounds for Discounted MDPs

zbMATH Keywords

Markov decision process; online learning; regret; sample complexity; undiscounted reinforcement learning

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05)
Markov and semi-Markov decision processes (90C40)

Cited In (55)

Finding the optimal exploration-exploitation trade-off online through Bayesian risk estimation and minimization
Logarithmic regret bounds for continuous-time average-reward Markov decision processes
Learning optimal admission control in partially observable queueing networks
Value iteration for streaming data on a continuous space with gradient method in an RKHS
Scale-free online learning
Globally Convergent Type-I Anderson Acceleration for Nonsmooth Fixed-Point Iterations
Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
Learning the distribution with largest mean: two bandit frameworks
Selecting near-optimal approximate state representations in reinforcement learning
Omega-Regular Objectives in Model-Free Reinforcement Learning
Dynamic pricing with multiple products and partially specified demand distribution
Estimation and approximation bounds for gradient-based reinforcement learning
Pessimistic value iteration for multi-task data sharing in offline reinforcement learning
Performance guarantees for policy learning
Learning to optimize via information-directed sampling
Regret bounds for reinforcement learning via Markov chain concentration
Attainability of boundary points under reinforcement learning
Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies
Regret bounds for online-learning-based linear quadratic control under database attacks
Temporal concatenation for Markov decision processes
Provably efficient reinforcement learning in decentralized general-sum Markov games
Reducing reinforcement learning to KWIK online regression
Relative loss bounds for temporal-difference learning
Near-optimal reinforcement learning in polynomial time
Title not available
Improved regret for zeroth-order adversarial bandit convex optimisation
Regret bounds for restless Markov bandits
Extreme state aggregation beyond Markov decision processes
Settling the sample complexity of model-based offline reinforcement learning
Dynamic Inventory and Price Controls Involving Unknown Demand on Discrete Nonperishable Items
Near-optimal PAC bounds for discounted MDPs
A bandit-learning approach to multifidelity approximation
Reinforcement learning in robust Markov decision processes
Title not available
Lipschitzness is all you need to tame off-policy generative adversarial imitation learning
Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning
Optimistic Posterior Sampling for Reinforcement Learning: Worst-Case Regret Bounds
Online regret bounds for Markov decision processes with deterministic transitions
Robust MDPs with \(k\)-rectangular uncertainty
PAC Statistical Model Checking of Mean Payoff in Discrete- and Continuous-Time MDP
Regret-Optimal Estimation and Control
Learning unknown service rates in queues: a multiarmed bandit approach
Title not available
Faithful and Effective Reward Schemes for Model-Free Reinforcement Learning of Omega-Regular Objectives
Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
Multi-agent reinforcement learning: a selective overview of theories and algorithms
Online learning in Markov decision processes with continuous actions
Learning in structured MDPs with convex cost functions: improved regret bounds for inventory management
Explicit explore, exploit, or escape \((E^4)\): near-optimal safety-constrained reinforcement learning in polynomial time
Title not available
Adaptive aggregation for reinforcement learning in average reward Markov decision processes
Scale-free algorithms for online linear optimization
Deep exploration via randomized value functions
Bayesian optimistic Kullback-Leibler exploration
Regret bounds for Narendra-Shapiro bandit algorithms

Uses Software

R-MAX
