Interactive Thompson sampling for multi-objective multi-armed bandits
From MaRDI portal
Publication:1990281
Recommendations
- Hypervolume indicator and dominance reward based multi-objective Monte-Carlo tree search
- Efficient multi-objective reinforcement learning via multiple-gradient descent with iteratively discovered weight-vector sets
- A Survey of Preference-Based Online Learning with Bandit Algorithms
- Multi-objective reinforcement learning using sets of Pareto dominating policies
- scientific article; zbMATH DE number 6276176
Cited in: 1 publication