scientific article

From MaRDI portal

Publication:2933988

zbMath: 1317.68158, arXiv: 1304.3999, MaRDI QID: Q2933988

Bruno Scherrer, Matthieu Geist

Publication date: 8 December 2014

Full work available at URL: https://arxiv.org/abs/1304.3999

Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.

zbMATH Keywords

reinforcement learning, off-policy learning, eligibility traces, value function estimation

Mathematics Subject Classification ID

Markov processes: estimation; hidden Markov models (62M05)
Learning and adaptive systems in artificial intelligence (68T05)
Markov and semi-Markov decision processes (90C40)

Related Items (5)

Learning‐based T‐sHDP() for optimal control of a class of nonlinear discrete‐time systems ⋮ Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning ⋮ Distributed consensus-based multi-agent temporal-difference learning ⋮ On Generalized Bellman Equations and Temporal-Difference Learning ⋮ Off-policy temporal difference learning with distribution adaptation in fast mixing chains
