A tutorial survey of reinforcement learning (Q5955768)

From MaRDI portal

scientific article; zbMATH DE number 1706651

Language	Label	Description	Also known as
English	A tutorial survey of reinforcement learning	scientific article; zbMATH DE number 1706651

Statements

scholarly article

A tutorial survey of reinforcement learning (English)

zbMATH Open document ID

DOI

10.1007/BF02743935

publication date

18 February 2002

Reinforcement learning (RL) refers to the process whereby a learning system learns an associative mapping by maximizing a scalar evaluation (a reinforcement) of its performance, supplied by the environment. In delayed RL, the environment yields only a single scalar reinforcement that collectively evaluates a whole sequence of actions. Such tasks arise in the optimal control of dynamic systems and in planning problems of artificial intelligence. Here, the authors "provide a comprehensive tutorial survey of various ideas and methods of delayed RL". The connexion with stochastic optimal control is explored, and differences between delayed RL and dynamic programming methods are discussed. Model-based and model-free methods are examined, and general issues relating to the practical implementation of RL algorithms are noted.
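
As a rough illustration of the model-free, delayed-reinforcement setting described in the review, the following is a minimal sketch of tabular Q-learning (one of the cited works) on a hypothetical chain task. The environment, the function name `q_learning_chain`, and all parameter values are assumptions made for this example and are not taken from the article: the learner starts at state 0, moves left or right, and receives a reinforcement of 1.0 only on reaching the last state.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain of states 0..n_states-1.

    Actions: 0 = step left (clamped at state 0), 1 = step right.
    The only nonzero reinforcement (1.0) arrives on entering the last
    state, so credit for earlier steps is learned through bootstrapping.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action choice; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1

            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0

            # Model-free update: no transition model is used, only the
            # sampled (s, a, r, s_next) experience.
            bootstrap = 0.0 if s_next == goal else gamma * max(q[s_next])
            q[s][a] += alpha * (r + bootstrap - q[s][a])
            s = s_next
    return q


if __name__ == "__main__":
    for state, (left, right) in enumerate(q_learning_chain()):
        print(state, round(left, 3), round(right, 3))
```

In this sketch the value of stepping right grows with proximity to the goal, which is how the single delayed reinforcement is propagated back to earlier decisions. A dynamic programming method would instead sweep over a known transition model rather than over sampled experience.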

Mathematics Subject Classification ID

zbMATH DE Number

1706651

zbMATH Keywords

reinforcement learning

dynamic programming

optimal control

neural networks

model-free methods

MaRDI profile type

MaRDI publication profile

A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC)

Pattern-recognizing stochastic learning automata

Landmark learning: An illustration of associative search

Associative search network: A reinforcement learning associative memory

Distributed dynamic programming

Real-time heuristic search

A Survey of Some Results in Stochastic Adaptive Control

Transfer of learning by composing solutions of elemental sequential tasks

Practical issues in temporal difference learning

\({\mathcal Q}\)-learning

Sitelinks

Mathematics (1 entry)

mardi Publication:5955768

Retrieved from "https://portal.mardi4nfdi.de/w/index.php?title=Item:Q5955768&oldid=34292608"