Reinforcement learning in finite MDPs: PAC analysis

MaRDI QID: Q2880979

Authors Alexander Strehl, Lihong Li, Michael L. Littman

Publication date 17 April 2012

Published in Journal of Machine Learning Research (JMLR)

Full work available at URL http://www.jmlr.org/papers/v10/strehl09a.html

zbMATH Keywords

Markov decision processes, reinforcement learning, exploration, sample complexity, PAC-MDP

Mathematics Subject Classification ID

General nonlinear regression (62J02)
Learning and adaptive systems in artificial intelligence (68T05)

Recommendations

PAC Reinforcement Learning Algorithm for General-Sum Markov Games
PAC Bounds for Discounted MDPs
Near-optimal PAC bounds for discounted MDPs
From perturbation analysis to Markov decision processes and reinforcement learning
Reinforcement learning with approximation spaces
Finite-sample analysis of least-squares policy iteration
Learning algorithms for finite horizon constrained Markov decision processes

Cited in (34)

Hybrid answer set programming
An analysis of model-based interval estimation for Markov decision processes
Efficient PAC learning for episodic tasks with acyclic state spaces
Controlling estimation error in reinforcement learning via reinforced operation
Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
An information-theoretic analysis of return maximization in reinforcement learning
Model-free reinforcement learning for branching Markov decision processes
Learning to play using low-complexity rule-based policies: illustrations through Ms. Pac-Man
Recent advances in reinforcement learning in finance
10.1162/153244303765208377
Learning algorithms for verification of Markov decision processes
Reinforcement learning: a comparison of UCB versus alternative adaptive policies
scientific article; zbMATH DE number 2089367
A Small Gain Analysis of Single Timescale Actor Critic
Unsupervised basis function adaptation for reinforcement learning
Reducing reinforcement learning to KWIK online regression
Reinforcement learning with immediate rewards and linear hypotheses
Episodic reinforcement learning in finite MDPs: minimax lower bounds revisited
scientific article; zbMATH DE number 1804129
scientific article; zbMATH DE number 2159039
Extreme state aggregation beyond Markov decision processes
Speedy categorical distributional reinforcement learning and complexity analysis
Near-optimal PAC bounds for discounted MDPs
Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis
Solving for Best Responses and Equilibria in Extensive-Form Games with Reinforcement Learning Methods
PAC Bounds for Discounted MDPs
scientific article; zbMATH DE number 7307478
Provably efficient learning with typed parametric models
scientific article; zbMATH DE number 7370552
scientific article; zbMATH DE number 7626790
Online reinforcement learning control by Bayesian inference
Knows what it knows: a framework for self-aware learning
10.1162/153244303768966148
Identity concealment games: how I learned to stop revealing and love the coincidences

Describes a project that uses

Uses Software

R-MAX
