scientific article; zbMATH DE number 5037123
DOI: 10.1023/A:1018012322525, zbMath: 1099.68700, OpenAlex: W4249855001, MaRDI QID: Q5477862
Richard S. Sutton, Satinder Pal Singh
Publication date: 29 June 2006
Published in: Machine Learning
Full work available at URL: https://doi.org/10.1023/a:1018012322525
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Monte Carlo method, Markov chain, reinforcement learning, temporal difference learning, eligibility trace, CMAC
Related Items
An incremental off-policy search in a model-free Markov decision process using a single sample path, Learning to grasp and extract affordances: the integrated learning of grasps and affordances (ILGA) model, Restricted gradient-descent algorithm for value-function approximation in reinforcement learning, Performance evaluation of direct heuristic dynamic programming using control-theoretic measures, From Reinforcement Learning to Deep Reinforcement Learning: An Overview, Toward Nonlinear Local Reinforcement Learning Rules Through Neuroevolution, Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison, Qualitative case-based reasoning and learning, Importance sampling in reinforcement learning with an estimated behavior policy, Multi-agent reinforcement learning: a selective overview of theories and algorithms, Perception control