On the Prior Sensitivity of Thompson Sampling
From MaRDI portal
Publication: 2831392
Abstract: The empirically successful Thompson Sampling algorithm for stochastic bandits has drawn much interest in understanding its theoretical properties. One important benefit of the algorithm is that it allows domain knowledge to be conveniently encoded as a prior distribution, so as to balance exploration and exploitation more effectively. While it is generally believed that the algorithm's regret is low (high) when the prior is good (bad), little is known about the exact dependence. In this paper, we fully characterize the algorithm's worst-case dependence of regret on the choice of prior, focusing on a special yet representative case. These results also provide insights into the general sensitivity of the algorithm to the choice of priors. In particular, with \(p\) being the prior probability mass of the true reward-generating model, we prove regret upper bounds for the bad- and good-prior cases, respectively, as well as matching lower bounds. Our proofs rely on the discovery of a fundamental property of Thompson Sampling and make heavy use of martingale theory, both of which appear novel in the literature, to the best of our knowledge.
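To make the prior-sensitivity question concrete, the following is a minimal sketch of Thompson Sampling for a two-armed Bernoulli bandit with Beta priors. It is not the paper's construction; the arm means, horizon, and prior parameters below are illustrative assumptions, chosen so that one prior places most of its mass near the true best arm ("good") and the other does not ("bad").

```python
import random

def thompson_sampling(true_means, horizon, priors, seed=0):
    """Beta-Bernoulli Thompson Sampling; returns cumulative (pseudo-)regret.

    priors: one (alpha, beta) pair per arm, encoding prior belief about
    that arm's mean reward (the conjugate prior for Bernoulli rewards).
    """
    rng = random.Random(seed)
    params = [list(p) for p in priors]  # posterior (alpha, beta) per arm
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its current posterior
        samples = [rng.betavariate(a, b) for a, b in params]
        arm = samples.index(max(samples))  # play the arm with the best sample
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate posterior update: successes -> alpha, failures -> beta
        params[arm][0] += reward
        params[arm][1] += 1 - reward
        regret += best - true_means[arm]
    return regret

# Illustrative run: the same bandit under a prior that favors the true best
# arm versus one that favors the wrong arm (hypothetical parameter choices).
good = thompson_sampling([0.9, 0.1], 2000, [(9, 1), (1, 9)])
bad = thompson_sampling([0.9, 0.1], 2000, [(1, 9), (9, 1)])
```

Under the good prior the algorithm commits to the best arm almost immediately, while the bad prior forces it to pull the inferior arm until the accumulated evidence overturns the misplaced prior mass — the gap between the two regrets is the kind of dependence the paper quantifies.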
Recommendations
- Near-optimal regret bounds for Thompson sampling
- Thompson Sampling for Bayesian Bandits with Resets
- Linear Thompson sampling revisited
- An information-theoretic analysis of Thompson sampling
- Thompson Sampling for Stochastic Control: The Finite Parameter Case
- Thompson sampling: an asymptotically optimal finite-time analysis
- A Tutorial on Thompson Sampling
- Thompson Sampling for Stochastic Control: The Continuous Parameter Case
- scientific article; zbMATH DE number 7626733
- Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning
Cites work
- scientific article; zbMATH DE number 6276176
- An information-theoretic analysis of Thompson sampling
- Approximate indexability and bandit problems with concave rewards and delayed feedback
- Asymptotically efficient adaptive allocation rules
- Learning to optimize via posterior sampling
- On the Prior Sensitivity of Thompson Sampling
- Prediction, Learning, and Games
- Regret analysis of stochastic and nonstochastic multi-armed bandit problems
- Sequential experimentation in clinical trials. Design and analysis
- The Nonstochastic Multiarmed Bandit Problem
- Thompson sampling: an asymptotically optimal finite-time analysis
- Towards optimal algorithms for prediction with expert advice
Cited in (11 documents)
- Sliding-Window Thompson Sampling for Non-Stationary Settings
- Thompson Sampling for Bayesian Bandits with Resets
- Linear Thompson sampling revisited
- Multinomial Thompson sampling for rating scales and prior considerations for calibrating uncertainty
- Learning to optimize via posterior sampling
- Thompson sampling: an asymptotically optimal finite-time analysis
- An information-theoretic analysis of Thompson sampling
- A Tutorial on Thompson Sampling
- Near-optimal regret bounds for Thompson sampling
- On Bayesian index policies for sequential resource allocation
- On the Prior Sensitivity of Thompson Sampling