Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
From MaRDI portal
Publication: 399890
DOI: 10.1007/s10994-013-5368-1
zbMath: 1295.68180
arXiv: 1206.6461
OpenAlex: W2120678009
MaRDI QID: Q399890
Mohammad Gheshlaghi Azar, Rémi Munos, Hilbert J. Kappen
Publication date: 20 August 2014
Published in: Machine Learning
Full work available at URL: https://arxiv.org/abs/1206.6461
Learning and adaptive systems in artificial intelligence (68T05)
Markov and semi-Markov decision processes (90C40)
Related Items
- Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization
- Softmax policy gradient methods can take exponential time to converge
- Recent advances in reinforcement learning in finance
- Robustness and sample complexity of model-based MARL for general-sum Markov games
- Settling the sample complexity of model-based offline reinforcement learning
- Randomized Linear Programming Solves the Markov Decision Problem in Nearly Linear (Sometimes Sublinear) Time
- Near-optimal PAC bounds for discounted MDPs
- Solving for Best Responses and Equilibria in Extensive-Form Games with Reinforcement Learning Methods
- Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
- A lexicographic approach to constrained MDP admission control
- Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes
- Toward theoretical understandings of robust Markov decision processes: sample complexity and asymptotics
Cites Work
- A guided tour of Chernoff bounds
- An upper bound on the loss from approximate optimal-value functions
- PAC Bounds for Discounted MDPs
- Algorithms for Reinforcement Learning
- The variance of discounted Markov decision processes
- Neuro-Dynamic Programming: An Overview and Recent Results
- Prediction, Learning, and Games