scientific article; zbMATH DE number 6860841


Publication:4637066


zbMath: 1435.68287, MaRDI QID: Q4637066

Riad Akrour, Christian Wirth, Johannes Fürnkranz, Gerhard Neumann

Publication date: 17 April 2018

Full work available at URL: http://jmlr.csail.mit.edu/papers/v18/16-634.html

Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.

zbMATH Keywords

Markov decision process; reinforcement learning; temporal difference learning; preference learning; policy search; preference-based reinforcement learning; qualitative feedback

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05); Research exposition (monographs, survey articles) pertaining to computer science (68-02)

Related Items (2)

Reward (Mis)design for autonomous driving ⋮ Unnamed Item


