Preference-based reinforcement learning: a formal framework and a policy iteration algorithm


Publication:1945130


DOI: 10.1007/s10994-012-5313-8; zbMath: 1260.68328; OpenAlex: W2154023516; Wikidata: Q59195227; Scholia: Q59195227; MaRDI QID: Q1945130

Weiwei Cheng, Johannes Fürnkranz, Sang-Hyeun Park, Eyke Hüllermeier

Publication date: 2 April 2013

Published in: Machine Learning

Full work available at URL: https://doi.org/10.1007/s10994-012-5313-8

zbMATH Keywords

reinforcement learning; preference learning

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05)

Related Items

Preferences in artificial intelligence; A one-bit, comparison-based gradient estimator; Deterministic policies based on maximum regrets in MDPs with imprecise rewards; Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm; Global optimization based on active preference learning with radial basis functions; Active Inference: Demystified and Compared; Unnamed Item

Uses Software

WEKA

