Reducing reinforcement learning to KWIK online regression
DOI: 10.1007/S10472-010-9201-2; zbMATH Open: 1207.68243; OpenAlex: W2020753891; MaRDI QID: Q616761; FDO: Q616761
Authors: Lihong Li, Michael L. Littman
Publication date: 12 January 2011
Published in: Annals of Mathematics and Artificial Intelligence
Full work available at URL: https://doi.org/10.1007/s10472-010-9201-2
Recommendations
- Exploration in relational domains for model-based reinforcement learning
- scientific article; zbMATH DE number 1453042
- Bayesian Reinforcement Learning with Exploration
- The effect of representation and knowledge on goal-directed exploration with reinforcement-learning algorithms
- Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning
value function approximation; reinforcement learning; exploration; knows what it knows (KWIK); online regression; PAC-MDP
Learning and adaptive systems in artificial intelligence (68T05); Analysis of algorithms and problem complexity (68Q25)
Cites Work
- 10.1162/153244303765208377
- Title not available
- Prediction, Learning, and Games
- 10.1162/153244303321897663
- A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the sum of Observations
- 10.1162/1532443041827907
- The complexity of dynamic programming
- Reinforcement learning in finite MDPs: PAC analysis
- Near-optimal regret bounds for reinforcement learning
- A sparse sampling algorithm for near-optimal planning in large Markov decision processes
- Knows what it knows: a framework for self-aware learning
- Near-optimal reinforcement learning in polynomial time
- Approximate policy iteration with a policy language bias: solving relational Markov decision processes
- The effect of representation and knowledge on goal-directed exploration with reinforcement-learning algorithms
Cited In (7)
- Exploration in relational domains for model-based reinforcement learning
- Solving average cost Markov decision processes by means of a two-phase time aggregation algorithm
- Abstraction from demonstration for efficient reinforcement learning in high-dimensional domains
- Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning
- Knows what it knows: a framework for self-aware learning
- Deep exploration via randomized value functions
- Provably Efficient Reinforcement Learning with Linear Function Approximation