Reinforcement learning in finite MDPs: PAC analysis

zbMATH Open: 1235.68193; MaRDI QID: Q2880979; FDO: Q2880979

Authors: Alexander Strehl, Lihong Li, Michael L. Littman

Publication date: 17 April 2012

Published in: Journal of Machine Learning Research (JMLR)

Full work available at URL: http://www.jmlr.org/papers/v10/strehl09a.html

Recommendations

PAC Reinforcement Learning Algorithm for General-Sum Markov Games
PAC Bounds for Discounted MDPs
Near-optimal PAC bounds for discounted MDPs
From perturbation analysis to Markov decision processes and reinforcement learning
Reinforcement learning with approximation spaces
Finite-sample analysis of least-squares policy iteration
Learning algorithms for finite horizon constrained Markov decision processes

zbMATH Keywords

Markov decision processes; reinforcement learning; exploration; sample complexity; PAC-MDP

Mathematics Subject Classification ID

General nonlinear regression (62J02)
Learning and adaptive systems in artificial intelligence (68T05)

Cited In (31)

Controlling estimation error in reinforcement learning via reinforced operation
Hybrid answer set programming
Efficient PAC learning for episodic tasks with acyclic state spaces
An analysis of model-based interval estimation for Markov decision processes
Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
An information-theoretic analysis of return maximization in reinforcement learning
Model-free reinforcement learning for branching Markov decision processes
Learning to play using low-complexity rule-based policies: illustrations through Ms. Pac-Man
Recent advances in reinforcement learning in finance
10.1162/153244303765208377
Reinforcement learning: a comparison of UCB versus alternative adaptive policies
A Small Gain Analysis of Single Timescale Actor Critic
Title not available
Unsupervised basis function adaptation for reinforcement learning
Reducing reinforcement learning to KWIK online regression
Reinforcement learning with immediate rewards and linear hypotheses
Title not available
Title not available
Speedy categorical distributional reinforcement learning and complexity analysis
Extreme state aggregation beyond Markov decision processes
Near-optimal PAC bounds for discounted MDPs
Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis
Solving for Best Responses and Equilibria in Extensive-Form Games with Reinforcement Learning Methods
PAC Bounds for Discounted MDPs
Title not available
Title not available
Provably efficient learning with typed parametric models
Title not available
10.1162/153244303768966148
Knows what it knows: a framework for self-aware learning
Identity concealment games: how I learned to stop revealing and love the coincidences

Uses Software

R-MAX
