10.1162/jmlr.2003.3.4-5.921
Publication:4656015
DOI: 10.1162/JMLR.2003.3.4-5.921
zbMATH Open: 1112.68452
OpenAlex: W4238778767
MaRDI QID: Q4656015
FDO: Q4656015
Andrew W. Moore, Malcolm J. A. Strens
Publication date: 8 March 2005
Published in: CrossRef Listing of Deleted DOIs
Full work available at URL: https://doi.org/10.1162/jmlr.2003.3.4-5.921
Cited In (1)
Recommendations
- Policy gradient in continuous time
- Reward-weighted regression with sample reuse for direct policy search in reinforcement learning
- Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm
- Compatible natural gradient policy search
- Dynamic programming or direct comparison?