Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
From MaRDI portal
Publication: 399890
DOI: 10.1007/s10994-013-5368-1
zbMath: 1295.68180
arXiv: 1206.6461
OpenAlex: W2120678009
MaRDI QID: Q399890
Mohammad Gheshlaghi Azar, Rémi Munos, Hilbert J. Kappen
Publication date: 20 August 2014
Published in: Machine Learning
Full work available at URL: https://arxiv.org/abs/1206.6461
Learning and adaptive systems in artificial intelligence (68T05)
Markov and semi-Markov decision processes (90C40)
Related Items
- Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization
- Softmax policy gradient methods can take exponential time to converge
- Recent advances in reinforcement learning in finance
- Robustness and sample complexity of model-based MARL for general-sum Markov games
- Settling the sample complexity of model-based offline reinforcement learning
- Randomized Linear Programming Solves the Markov Decision Problem in Nearly Linear (Sometimes Sublinear) Time
- Near-optimal PAC bounds for discounted MDPs
- Solving for Best Responses and Equilibria in Extensive-Form Games with Reinforcement Learning Methods
- Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
- A lexicographic approach to constrained MDP admission control
- Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes
- Toward theoretical understandings of robust Markov decision processes: sample complexity and asymptotics
Cites Work
- A guided tour of Chernoff bounds
- An upper bound on the loss from approximate optimal-value functions
- PAC Bounds for Discounted MDPs
- Algorithms for Reinforcement Learning
- The variance of discounted Markov decision processes
- Neuro-Dynamic Programming: An Overview and Recent Results
- Prediction, Learning, and Games