Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
Publication:1945130
DOI: 10.1007/s10994-012-5313-8, zbMath: 1260.68328, OpenAlex: W2154023516, Wikidata: Q59195227, Scholia: Q59195227, MaRDI QID: Q1945130
Weiwei Cheng, Johannes Fürnkranz, Sang-Hyeun Park, Eyke Hüllermeier
Publication date: 2 April 2013
Published in: Machine Learning
Full work available at URL: https://doi.org/10.1007/s10994-012-5313-8
Related Items
- Preferences in artificial intelligence
- A one-bit, comparison-based gradient estimator
- Deterministic policies based on maximum regrets in MDPs with imprecise rewards
- Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm
- Global optimization based on active preference learning with radial basis functions
- Active Inference: Demystified and Compared
Cites Work
- Efficient prediction algorithms for binary decomposition techniques
- Policy search for motor primitives in robotics
- Integrating guidance into relational reinforcement learning
- Qualitative decision theory with preference relations and comparative uncertainty: an axiomatic approach
- Natural actor-critic algorithms
- Elevator group control using multiple reinforcement learning agents
- Modeling agents as qualitative decision makers
- Learning to play chess using temporal differences
- Temporal difference learning applied to game playing and the results of application to Shogi
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Practical issues in temporal difference learning
- \({\mathcal Q}\)-learning
- Rollout sampling approximate policy iteration
- Label ranking by learning pairwise preferences
- Qualitative decision under uncertainty: back to expected utility
- Label Ranking Algorithms: A Survey
- A Survey and Empirical Comparison of Object Ranking Methods
- Preference Learning
- Stochastic Orderings for Markov Processes on Partially Ordered Spaces
- On Actor-Critic Algorithms
- Relational reinforcement learning
- Programming backgammon using self-teaching neural nets
- Finite-time analysis of the multiarmed bandit problem