Gradient temporal-difference learning for off-policy evaluation using emphatic weightings

DOI: 10.1016/J.INS.2021.08.082
MaRDI QID: Q6146179

Authors: Jiaqing Cao, Quan Liu, Fei Zhu, Qi-Ming Fu, Shan Zhong

Publication date: 10 January 2024

Published in: Information Sciences

Full work available at URL: https://doi.org/10.1016/j.ins.2021.08.082

zbMATH Keywords

reinforcement learning; temporal-difference learning; off-policy evaluation; emphatic approach; gradient temporal-difference learning

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05); Logic in computer science (03B70)

Cited in (1)

Projected state-action balancing weights for offline reinforcement learning
