scientific article; zbMATH DE number 5037123



DOI: 10.1023/A:1018012322525; zbMath: 1099.68700; OpenAlex: W4249855001; MaRDI QID: Q5477862

Richard S. Sutton, Satinder Pal Singh

Publication date: 29 June 2006

Published in: Machine Learning

Full work available at URL: https://doi.org/10.1023/a:1018012322525

Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.

zbMATH Keywords

Monte Carlo method; Markov chain; reinforcement learning; temporal difference learning; eligibility trace; CMAC

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05)

Related Items

An incremental off-policy search in a model-free Markov decision process using a single sample path
Learning to grasp and extract affordances: the integrated learning of grasps and affordances (ILGA) model
Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
Performance evaluation of direct heuristic dynamic programming using control-theoretic measures
From Reinforcement Learning to Deep Reinforcement Learning: An Overview
Toward Nonlinear Local Reinforcement Learning Rules Through Neuroevolution
Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison
Qualitative case-based reasoning and learning
Importance sampling in reinforcement learning with an estimated behavior policy
Multi-agent reinforcement learning: a selective overview of theories and algorithms
Perception control
