Preference-based reinforcement learning: a formal framework and a policy iteration algorithm (Q1945130): Difference between revisions

@@ Property / describes a project that uses @@
+WEKA
@@ Property / describes a project that uses: WEKA / rank @@
+Normal rank
@@ Property / MaRDI profile type @@
+MaRDI publication profile
@@ Property / MaRDI profile type: MaRDI publication profile / rank @@
+Normal rank
@@ Property / full work available at URL @@
+https://doi.org/10.1007/s10994-012-5313-8
@@ Property / full work available at URL: https://doi.org/10.1007/s10994-012-5313-8 / rank @@
+Normal rank
@@ Property / OpenAlex ID @@
+W2154023516
@@ Property / OpenAlex ID: W2154023516 / rank @@
+Normal rank
@@ Property / Wikidata QID @@
+Q59195227
@@ Property / Wikidata QID: Q59195227 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4252717
@@ Property / cites work: Q4252717 / rank @@
+Normal rank
@@ Property / cites work @@
+Finite-time analysis of the multiarmed bandit problem
@@ Property / cites work: Finite-time analysis of the multiarmed bandit problem / rank @@
+Normal rank
@@ Property / cites work @@
+Learning to play chess using temporal differences
@@ Property / cites work: Learning to play chess using temporal differences / rank @@
+Normal rank
@@ Property / cites work @@
+Temporal difference learning applied to game playing and the results of application to Shogi
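@@ Property / cites work: Temporal difference learning applied to game playing and the results of application to Shogi / rank @@
+Normal rank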
@@ Property / cites work @@
+Natural actor-critic algorithms
@@ Property / cites work: Natural actor-critic algorithms / rank @@
+Normal rank
@@ Property / cites work @@
+Modeling agents as qualitative decision makers
@@ Property / cites work: Modeling agents as qualitative decision makers / rank @@
+Normal rank
@@ Property / cites work @@
+Elevator group control using multiple reinforcement learning agents
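@@ Property / cites work: Elevator group control using multiple reinforcement learning agents / rank @@
+Normal rank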
@@ Property / cites work @@
+Q3093335
@@ Property / cites work: Q3093335 / rank @@
+Normal rank
@@ Property / cites work @@
+Rollout sampling approximate policy iteration
@@ Property / cites work: Rollout sampling approximate policy iteration / rank @@
+Normal rank
@@ Property / cites work @@
+Integrating guidance into relational reinforcement learning
@@ Property / cites work: Integrating guidance into relational reinforcement learning / rank @@
+Normal rank
@@ Property / cites work @@
+Qualitative decision theory with preference relations and comparative uncertainty: an axiomatic approach
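@@ Property / cites work: Qualitative decision theory with preference relations and comparative uncertainty: an axiomatic approach / rank @@
+Normal rank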
@@ Property / cites work @@
+Relational reinforcement learning
@@ Property / cites work: Relational reinforcement learning / rank @@
+Normal rank
@@ Property / cites work @@
+Q3093188
@@ Property / cites work: Q3093188 / rank @@
+Normal rank
@@ Property / cites work @@
+Qualitative decision under uncertainty: back to expected utility
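@@ Property / cites work: Qualitative decision under uncertainty: back to expected utility / rank @@
+Normal rank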
@@ Property / cites work @@
+Q3623997
@@ Property / cites work: Q3623997 / rank @@
+Normal rank
@@ Property / cites work @@
+Preference Learning
@@ Property / cites work: Preference Learning / rank @@
+Normal rank
@@ Property / cites work @@
+Label ranking by learning pairwise preferences
@@ Property / cites work: Label ranking by learning pairwise preferences / rank @@
+Normal rank
@@ Property / cites work @@
+A Survey and Empirical Comparison of Object Ranking Methods
@@ Property / cites work: A Survey and Empirical Comparison of Object Ranking Methods / rank @@
+Normal rank
@@ Property / cites work @@
+Policy search for motor primitives in robotics
@@ Property / cites work: Policy search for motor primitives in robotics / rank @@
+Normal rank
@@ Property / cites work @@
+On Actor-Critic Algorithms
@@ Property / cites work: On Actor-Critic Algorithms / rank @@
+Normal rank
@@ Property / cites work @@
+Q5477865
@@ Property / cites work: Q5477865 / rank @@
+Normal rank
@@ Property / cites work @@
+Stochastic Orderings for Markov Processes on Partially Ordered Spaces
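@@ Property / cites work: Stochastic Orderings for Markov Processes on Partially Ordered Spaces / rank @@
+Normal rank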
@@ Property / cites work @@
+Efficient prediction algorithms for binary decomposition techniques
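@@ Property / cites work: Efficient prediction algorithms for binary decomposition techniques / rank @@
+Normal rank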
@@ Property / cites work @@
+Q5305630
@@ Property / cites work: Q5305630 / rank @@
+Normal rank
@@ Property / cites work @@
+Q2880944
@@ Property / cites work: Q2880944 / rank @@
+Normal rank
@@ Property / cites work @@
+Practical issues in temporal difference learning
@@ Property / cites work: Practical issues in temporal difference learning / rank @@
+Normal rank
@@ Property / cites work @@
+Programming backgammon using self-teaching neural nets
@@ Property / cites work: Programming backgammon using self-teaching neural nets / rank @@
+Normal rank
@@ Property / cites work @@
+Q2896181
@@ Property / cites work: Q2896181 / rank @@
+Normal rank
@@ Property / cites work @@
+Label Ranking Algorithms: A Survey
@@ Property / cites work: Label Ranking Algorithms: A Survey / rank @@
+Normal rank
@@ Property / cites work @@
+\({\mathcal Q}\)-learning
@@ Property / cites work: \({\mathcal Q}\)-learning / rank @@
+Normal rank
@@ Property / cites work @@
+Simple statistical gradient-following algorithms for connectionist reinforcement learning
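@@ Property / cites work: Simple statistical gradient-following algorithms for connectionist reinforcement learning / rank @@
+Normal rank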