Off-policy evaluation for tabular reinforcement learning with synthetic trajectories
From MaRDI portal
Publication:6190662
DOI: 10.1007/s11222-023-10351-y
zbMath: 1529.62039
OpenAlex: W4388767681
MaRDI QID: Q6190662
Yu-Qiang Li, Wei-wei Wang, Xianyi Wu
Publication date: 6 February 2024
Published in: Statistics and Computing
Full work available at URL: https://doi.org/10.1007/s11222-023-10351-y
Keywords: importance sampling, Markov decision process, reinforcement learning, off-policy evaluation, synthetic trajectories
Cites Work