Q4637066 (Q4637066): Difference between revisions

@@ Property / describes a project that uses @@
+MuJoCo
@@ Property / describes a project that uses: MuJoCo / rank @@
+Normal rank
@@ Property / describes a project that uses @@
+TAMER
@@ Property / describes a project that uses: TAMER / rank @@
+Normal rank
@@ Property / describes a project that uses @@
+C4.5
@@ Property / describes a project that uses: C4.5 / rank @@
+Normal rank
@@ Property / describes a project that uses @@
+PILCO
@@ Property / describes a project that uses: PILCO / rank @@
+Normal rank
@@ Property / MaRDI profile type @@
+MaRDI publication profile
@@ Property / MaRDI profile type: MaRDI publication profile / rank @@
+Normal rank
@@ Property / cites work @@
+An introduction to MCMC for machine learning
@@ Property / cites work: An introduction to MCMC for machine learning / rank @@
+Normal rank
@@ Property / cites work @@
+Swinging up a pendulum by energy control
@@ Property / cites work: Swinging up a pendulum by energy control / rank @@
+Normal rank
@@ Property / cites work @@
+Q4252717
@@ Property / cites work: Q4252717 / rank @@
+Normal rank
@@ Property / cites work @@
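+Convergence results for the (1,\(\lambda\))-SA-ES using the theory of \(\varphi\)-irreducible Markov chains
@@ Property / cites work: Convergence results for the (1,\(\lambda\))-SA-ES using the theory of \(\varphi\)-irreducible Markov chains / rank @@
+Normal rank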
@@ Property / cites work @@
+Q4893747
@@ Property / cites work: Q4893747 / rank @@
+Normal rank
@@ Property / cites work @@
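+A Survey of Preference-Based Online Learning with Bandit Algorithms
@@ Property / cites work: A Survey of Preference-Based Online Learning with Bandit Algorithms / rank @@
+Normal rank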
@@ Property / cites work @@
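+Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm
@@ Property / cites work: Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm / rank @@
+Normal rank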
@@ Property / cites work @@
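+Risk-sensitive and minimax control of discrete-time, finite-state Markov decision processes
@@ Property / cites work: Risk-sensitive and minimax control of discrete-time, finite-state Markov decision processes / rank @@
+Normal rank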
@@ Property / cites work @@
+Rollout sampling approximate policy iteration
@@ Property / cites work: Rollout sampling approximate policy iteration / rank @@
+Normal rank
@@ Property / cites work @@
+Q4403756
@@ Property / cites work: Q4403756 / rank @@
+Normal rank
@@ Property / cites work @@
+Preference Learning
@@ Property / cites work: Preference Learning / rank @@
+Normal rank
@@ Property / cites work @@
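+Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
@@ Property / cites work: Preference-based reinforcement learning: a formal framework and a policy iteration algorithm / rank @@
+Normal rank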
@@ Property / cites work @@
+Probability Inequalities for Sums of Bounded Random Variables
@@ Property / cites work: Probability Inequalities for Sums of Bounded Random Variables / rank @@
+Normal rank
@@ Property / cites work @@
+Label ranking by learning pairwise preferences
@@ Property / cites work: Label ranking by learning pairwise preferences / rank @@
+Normal rank
@@ Property / cites work @@
+A Survey and Empirical Comparison of Object Ranking Methods
@@ Property / cites work: A Survey and Empirical Comparison of Object Ranking Methods / rank @@
+Normal rank
@@ Property / cites work @@
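+Model-based contextual policy search for data-efficient generalization of robot skills
@@ Property / cites work: Model-based contextual policy search for data-efficient generalization of robot skills / rank @@
+Normal rank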
@@ Property / cites work @@
+10.1162/1532443041827907
@@ Property / cites work: 10.1162/1532443041827907 / rank @@
+Normal rank
@@ Property / cites work @@
+Introduction to Information Retrieval
@@ Property / cites work: Introduction to Information Retrieval / rank @@
+Normal rank
@@ Property / cites work @@
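+Machine learning and knowledge discovery in databases. European conference, ECML PKDD 2011, Athens, Greece, September 5--9, 2011. Proceedings, Part III
@@ Property / cites work: Machine learning and knowledge discovery in databases. European conference, ECML PKDD 2011, Athens, Greece, September 5--9, 2011. Proceedings, Part III / rank @@
+Normal rank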
@@ Property / cites work @@
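+An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback
@@ Property / cites work: An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback / rank @@
+Normal rank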
@@ Property / cites work @@
+Q4626283
@@ Property / cites work: Q4626283 / rank @@
+Normal rank
@@ Property / cites work @@
+Label Ranking Algorithms: A Survey
@@ Property / cites work: Label Ranking Algorithms: A Survey / rank @@
+Normal rank
@@ Property / cites work @@
+Q5844986
@@ Property / cites work: Q5844986 / rank @@
+Normal rank
@@ Property / cites work @@
+The \(K\)-armed dueling bandits problem
@@ Property / cites work: The \(K\)-armed dueling bandits problem / rank @@
+Normal rank
@@ Property / cites work @@
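+Reinforcement Learning Strategies for Clinical Trials in Nonsmall Cell Lung Cancer
@@ Property / cites work: Reinforcement Learning Strategies for Clinical Trials in Nonsmall Cell Lung Cancer / rank @@
+Normal rank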