scientific article; zbMATH DE number 6860841
From MaRDI portal
Publication:4637066
zbMATH: 1435.68287; MaRDI QID: Q4637066
Riad Akrour, Christian Wirth, Johannes Fürnkranz, Gerhard Neumann
Publication date: 17 April 2018
Full work available at URL: http://jmlr.csail.mit.edu/papers/v18/16-634.html
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Markov decision process, reinforcement learning, temporal difference learning, preference learning, policy search, preference-based reinforcement learning, qualitative feedback
Learning and adaptive systems in artificial intelligence (68T05); Research exposition (monographs, survey articles) pertaining to computer science (68-02)
Cites Work
- The \(K\)-armed dueling bandits problem
- Machine learning and knowledge discovery in databases. European conference, ECML PKDD 2011, Athens, Greece, September 5--9, 2011. Proceedings, Part III
- Risk-sensitive and minimax control of discrete-time, finite-state Markov decision processes
- An introduction to MCMC for machine learning
- Convergence results for the (1,\(\lambda\))-SA-ES using the theory of \(\varphi\)-irreducible Markov chains
- Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
- Swinging up a pendulum by energy control
- Rollout sampling approximate policy iteration
- Label ranking by learning pairwise preferences
- Model-based contextual policy search for data-efficient generalization of robot skills
- Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm
- Reinforcement Learning Strategies for Clinical Trials in Nonsmall Cell Lung Cancer
- A Survey of Preference-Based Online Learning with Bandit Algorithms
- Label Ranking Algorithms: A Survey
- A Survey and Empirical Comparison of Object Ranking Methods
- Introduction to Information Retrieval
- Preference Learning
- DOI: 10.1162/1532443041827907
- Probability Inequalities for Sums of Bounded Random Variables
- An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback