Near-optimal regret bounds for reinforcement learning
Publication:2896090
zbMATH Open: 1242.68229; MaRDI QID: Q2896090; FDO: Q2896090
Authors: Thomas Jaksch, Ronald Ortner, Peter Auer
Publication date: 13 July 2012
Published in: Journal of Machine Learning Research (JMLR)
Full work available at URL: http://www.jmlr.org/papers/v11/jaksch10a.html
Classification (MSC): Learning and adaptive systems in artificial intelligence (68T05); Markov and semi-Markov decision processes (90C40)
Cited In (55)
- Finding the optimal exploration-exploitation trade-off online through Bayesian risk estimation and minimization
- Logarithmic regret bounds for continuous-time average-reward Markov decision processes
- Learning optimal admission control in partially observable queueing networks
- Value iteration for streaming data on a continuous space with gradient method in an RKHS
- Scale-free online learning
- Globally Convergent Type-I Anderson Acceleration for Nonsmooth Fixed-Point Iterations
- Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
- Learning the distribution with largest mean: two bandit frameworks
- Selecting near-optimal approximate state representations in reinforcement learning
- Omega-Regular Objectives in Model-Free Reinforcement Learning
- Dynamic pricing with multiple products and partially specified demand distribution
- Estimation and approximation bounds for gradient-based reinforcement learning
- Pessimistic value iteration for multi-task data sharing in offline reinforcement learning
- Performance guarantees for policy learning
- Learning to optimize via information-directed sampling
- Regret bounds for reinforcement learning via Markov chain concentration
- Attainability of boundary points under reinforcement learning
- Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies
- Regret bounds for online-learning-based linear quadratic control under database attacks
- Temporal concatenation for Markov decision processes
- Provably efficient reinforcement learning in decentralized general-sum Markov games
- Reducing reinforcement learning to KWIK online regression
- Relative loss bounds for temporal-difference learning
- Near-optimal reinforcement learning in polynomial time
- Title not available
- Improved regret for zeroth-order adversarial bandit convex optimisation
- Regret bounds for restless Markov bandits
- Extreme state aggregation beyond Markov decision processes
- Settling the sample complexity of model-based offline reinforcement learning
- Dynamic Inventory and Price Controls Involving Unknown Demand on Discrete Nonperishable Items
- Near-optimal PAC bounds for discounted MDPs
- A bandit-learning approach to multifidelity approximation
- Reinforcement learning in robust Markov decision processes
- Title not available
- Lipschitzness is all you need to tame off-policy generative adversarial imitation learning
- Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning
- Optimistic Posterior Sampling for Reinforcement Learning: Worst-Case Regret Bounds
- Online regret bounds for Markov decision processes with deterministic transitions
- Robust MDPs with \(k\)-rectangular uncertainty
- PAC Statistical Model Checking of Mean Payoff in Discrete- and Continuous-Time MDP
- Regret-Optimal Estimation and Control
- Learning unknown service rates in queues: a multiarmed bandit approach
- Title not available
- Faithful and Effective Reward Schemes for Model-Free Reinforcement Learning of Omega-Regular Objectives
- Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
- Multi-agent reinforcement learning: a selective overview of theories and algorithms
- Online learning in Markov decision processes with continuous actions
- Learning in structured MDPs with convex cost functions: improved regret bounds for inventory management
- Explicit explore, exploit, or escape \((E^4)\): near-optimal safety-constrained reinforcement learning in polynomial time
- Title not available
- Adaptive aggregation for reinforcement learning in average reward Markov decision processes
- Scale-free algorithms for online linear optimization
- Deep exploration via randomized value functions
- Bayesian optimistic Kullback-Leibler exploration
- Regret bounds for Narendra-Shapiro bandit algorithms