
10.1162/jmlr.2003.3.4-5.921

From MaRDI portal
Publication:4656015

DOI: 10.1162/JMLR.2003.3.4-5.921
zbMATH Open: 1112.68452
OpenAlex: W4238778767
MaRDI QID: Q4656015
FDO: Q4656015


Authors: Malcolm J. A. Strens, Andrew W. Moore


Publication date: 8 March 2005

Published in: CrossRef Listing of Deleted DOIs

Full work available at URL: https://doi.org/10.1162/jmlr.2003.3.4-5.921




Recommendations

  • Policy gradient in continuous time
  • Reward-weighted regression with sample reuse for direct policy search in reinforcement learning
  • Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm
  • Compatible natural gradient policy search
  • Dynamic programming or direct comparison?


zbMATH Keywords

reinforcement learning


Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05)



Cited In (1)

  • Inverse modeling of a solar collector involving Fourier and non-Fourier heat conduction







Retrieved from "https://portal.mardi4nfdi.de/w/index.php?title=Publication:4656015&oldid=18857279"
This page was last edited on 7 February 2024, at 16:52. Warning: Page may not contain recent updates.