Interactive Thompson sampling for multi-objective multi-armed bandits
From MaRDI portal
Publication: 1990281
DOI: 10.1007/978-3-319-67504-6_2; zbMATH Open: 1398.90082; OpenAlex: W2759684794; MaRDI QID: Q1990281; FDO: Q1990281
Authors: Diederik M. Roijers, Luisa M. Zintgraf, Ann Nowé
Publication date: 25 October 2018
Full work available at URL: https://doi.org/10.1007/978-3-319-67504-6_2
Recommendations
- Hypervolume indicator and dominance reward based multi-objective Monte-Carlo tree search
- Efficient multi-objective reinforcement learning via multiple-gradient descent with iteratively discovered weight-vector sets
- A Survey of Preference-Based Online Learning with Bandit Algorithms
- Multi-objective reinforcement learning using sets of Pareto dominating policies
- scientific article; zbMATH DE number 6276176
Management decision making, including multiple objectives (90B50)
Utility theory (91B16)
Software, source code, etc. for problems pertaining to operations research and mathematical programming (90-04)