Double reinforcement learning for efficient off-policy evaluation in Markov decision processes
From MaRDI portal
Publication:5148951
Authors: Nathan Kallus, Masatoshi Uehara
Publication date: 5 February 2021
Full work available at URL: https://arxiv.org/abs/1908.08526
Recommendations
- Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning
- Proximal reinforcement learning: efficient off-policy evaluation in partially observed Markov decision processes
- Reliable off-policy evaluation for reinforcement learning
- Projected state-action balancing weights for offline reinforcement learning
- Off-policy evaluation in partially observed Markov decision processes under sequential ignorability
Cites Work
- Large Sample Properties of Generalized Method of Moments Estimators
- Introduction to empirical processes and semiparametric inference
- Asymptotic Statistics
- Nonparametric econometrics. Theory and practice.
- Adjusting for Nonignorable Drop-Out Using Semiparametric Nonresponse Models
- Double/debiased machine learning for treatment and structural parameters
- Efficient estimation of panel data models with sequential moment restrictions
- Unified methods for censored longitudinal data and causality
- Semiparametric theory and missing data.
- A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect
- Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score
- Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions
- Title not available
- On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects
- The semiparametric efficiency bound for models of sequential moment restrictions containing unknown functions
- On methods of sieves and penalization
- The use of polynomial splines and their tensor products in multivariate function estimation. (With discussion)
- Local Rademacher complexities
- Marginal Mean Models for Dynamic Regimes
- Optimal Dynamic Treatment Regimes
- Title not available
- Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions
- Doubly robust policy evaluation and optimization
- Title not available
- On differentiable functionals
- Statistical methods for dynamic treatment regimes. Reinforcement learning, causal inference, and personalized medicine
- doi:10.1162/1532443041827907
- Reinforcement learning. An introduction
- A matrix extension of the Cauchy-Schwarz inequality
- Constructing dynamic treatment regimes over indefinite time horizons
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
- Bias and variance approximation in value function estimates
- Robust inference on the average treatment effect using the outcome highly adaptive Lasso
- Consistent estimation of the influence function of locally asymptotically linear estimators
- Estimating dynamic treatment regimes in mobile health using V-learning
- Title not available
- \(\text{Q}(\lambda)\) with off-policy corrections
Cited In (12)
- Batch policy learning in average reward Markov decision processes
- Toward theoretical understandings of robust Markov decision processes: sample complexity and asymptotics
- Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons
- Reliable off-policy evaluation for reinforcement learning
- Proximal reinforcement learning: efficient off-policy evaluation in partially observed Markov decision processes
- Off-policy evaluation in partially observed Markov decision processes under sequential ignorability
- Projected state-action balancing weights for offline reinforcement learning
- Title not available
- Off-policy evaluation for tabular reinforcement learning with synthetic trajectories
- Settling the sample complexity of model-based offline reinforcement learning
- Optimal policy evaluation using kernel-based temporal difference methods
- Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning