scientific article

From MaRDI portal

Publication:2933988

zbMATH: 1317.68158; arXiv: 1304.3999; MaRDI QID: Q2933988

Bruno Scherrer, Matthieu Geist

Publication date: 8 December 2014

Full work available at URL: https://arxiv.org/abs/1304.3999

Title: not shown (zbMATH Open Web Interface contents unavailable due to conflicting licenses).

zbMATH Keywords

reinforcement learning; off-policy learning; eligibility traces; value function estimation

Mathematics Subject Classification ID

Markov processes: estimation; hidden Markov models (62M05)
Learning and adaptive systems in artificial intelligence (68T05)
Markov and semi-Markov decision processes (90C40)

Related Items

Learning‐based T‐sHDP(λ) for optimal control of a class of nonlinear discrete‐time systems
Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning
Distributed consensus-based multi-agent temporal-difference learning
On Generalized Bellman Equations and Temporal-Difference Learning
Off-policy temporal difference learning with distribution adaptation in fast mixing chains