10.1162/jmlr.2003.3.4-5.921
Publication:4656015
DOI: 10.1162/JMLR.2003.3.4-5.921
zbMATH Open: 1112.68452
OpenAlex: W4238778767
MaRDI QID: Q4656015
FDO: Q4656015
Authors: Malcolm J. A. Strens, Andrew W. Moore
Publication date: 8 March 2005
Published in: CrossRef Listing of Deleted DOIs
Full work available at URL: https://doi.org/10.1162/jmlr.2003.3.4-5.921
Recommendations
- Policy gradient in continuous time
- Reward-weighted regression with sample reuse for direct policy search in reinforcement learning
- Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm
- Compatible natural gradient policy search
- Dynamic programming or direct comparison?
Cited In (1)