On the Prior Sensitivity of Thompson Sampling
Publication:2831392
DOI: 10.1007/978-3-319-46379-7_22
zbMATH Open: 1480.62025
arXiv: 1506.03378
OpenAlex: W2962974313
MaRDI QID: Q2831392
FDO: Q2831392
Authors: Che-Yu Liu, Lihong Li
Publication date: 9 November 2016
Published in: Lecture Notes in Computer Science
Abstract: The empirically successful Thompson Sampling algorithm for stochastic bandits has drawn much interest in understanding its theoretical properties. One important benefit of the algorithm is that it allows domain knowledge to be conveniently encoded as a prior distribution to balance exploration and exploitation more effectively. While it is generally believed that the algorithm's regret is low (high) when the prior is good (bad), little is known about the exact dependence. In this paper, we fully characterize the algorithm's worst-case dependence of regret on the choice of prior, focusing on a special yet representative case. These results also provide insights into the general sensitivity of the algorithm to the choice of priors. In particular, with $p$ being the prior probability mass of the true reward-generating model, we prove $O(\sqrt{T/p})$ and $O(\sqrt{(1-p)T})$ regret upper bounds for the bad- and good-prior cases, respectively, as well as matching lower bounds. Our proofs rely on the discovery of a fundamental property of Thompson Sampling and make heavy use of martingale theory, both of which appear novel in the literature, to the best of our knowledge.
Full work available at URL: https://arxiv.org/abs/1506.03378
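To make the setting described in the abstract concrete, the following is a minimal illustrative sketch (not the paper's actual construction or analysis) of Thompson Sampling on a Bernoulli bandit with a finite class of candidate reward-generating models. The quantity prior[true_model_idx] plays the role of the prior mass p that the regret bounds depend on; all function names and parameter values below are hypothetical.

```python
import numpy as np

def thompson_sampling_finite_models(models, true_model_idx, prior, T, rng=None):
    """Thompson Sampling over a finite class of Bernoulli-bandit models.

    models[m][a] is the mean reward of arm a under candidate model m;
    prior[true_model_idx] corresponds to the prior mass p in the abstract.
    Returns the cumulative regret incurred over T rounds.
    """
    rng = np.random.default_rng() if rng is None else rng
    models = np.asarray(models, dtype=float)
    log_post = np.log(np.asarray(prior, dtype=float))  # unnormalized log-posterior
    true_means = models[true_model_idx]
    best_mean = true_means.max()
    regret = 0.0

    for _ in range(T):
        # Sample one candidate model from the current posterior and play its best arm.
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        m = rng.choice(len(models), p=post)
        arm = int(np.argmax(models[m]))

        # Observe a Bernoulli reward from the true model and update the posterior.
        reward = rng.random() < true_means[arm]
        lik = models[:, arm] if reward else 1.0 - models[:, arm]
        log_post += np.log(np.clip(lik, 1e-12, None))

        regret += best_mean - true_means[arm]
    return regret

# Example: two arms, two candidate models, prior mass p on the true model.
models = [[0.9, 0.1], [0.1, 0.9]]
p = 0.05
print(thompson_sampling_finite_models(models, true_model_idx=0,
                                      prior=[p, 1.0 - p], T=10_000))
```

Sweeping p toward 0 or 1 in this toy setup gives a rough empirical feel for the bad-prior and good-prior regimes that the bounds in the abstract describe.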
Recommendations
- Near-optimal regret bounds for Thompson sampling
- Thompson Sampling for Bayesian Bandits with Resets
- Linear Thompson sampling revisited
- An information-theoretic analysis of Thompson sampling
- Thompson Sampling for Stochastic Control: The Finite Parameter Case
- Thompson sampling: an asymptotically optimal finite-time analysis
- A Tutorial on Thompson Sampling
- Thompson Sampling for Stochastic Control: The Continuous Parameter Case
- scientific article; zbMATH DE number 7626733
- Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning
Cites Work
- Prediction, Learning, and Games
- Asymptotically efficient adaptive allocation rules
- The Nonstochastic Multiarmed Bandit Problem
- Sequential experimentation in clinical trials. Design and analysis
- Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
- Thompson sampling: an asymptotically optimal finite-time analysis
- Learning to Optimize via Posterior Sampling
- Title not available
- An information-theoretic analysis of Thompson sampling
- On the Prior Sensitivity of Thompson Sampling
- Towards Optimal Algorithms for Prediction with Expert Advice
- Approximate Indexability and Bandit Problems with Concave Rewards and Delayed Feedback
Cited In (9)
- On the Prior Sensitivity of Thompson Sampling
- Thompson Sampling for Bayesian Bandits with Resets
- Multinomial Thompson sampling for rating scales and prior considerations for calibrating uncertainty
- On Bayesian index policies for sequential resource allocation
- Sliding-Window Thompson Sampling for Non-Stationary Settings
- A Tutorial on Thompson Sampling
- An information-theoretic analysis of Thompson sampling
- Thompson sampling: an asymptotically optimal finite-time analysis
- Linear Thompson sampling revisited
This page was built for publication: On the Prior Sensitivity of Thompson Sampling