Off-policy temporal difference learning with distribution adaptation in fast mixing chains

Publication:1797759

DOI: 10.1007/s00500-017-2490-1; zbMath: 1398.68436; OpenAlex: W2581667043; MaRDI QID: Q1797759

Maziar Palhang, Arash Givchi

Publication date: 22 October 2018

Published in: Soft Computing

Full work available at URL: https://doi.org/10.1007/s00500-017-2490-1

zbMATH Keywords

reinforcement learning, mixing time, covariate shift adaptation, least-squares temporal difference, off-policy evaluation

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05)
