On the Prior Sensitivity of Thompson Sampling

From MaRDI portal
Publication:2831392

DOI: 10.1007/978-3-319-46379-7_22
zbMATH Open: 1480.62025
arXiv: 1506.03378
OpenAlex: W2962974313
MaRDI QID: Q2831392


Authors: Che-Yu Liu, Lihong Li


Publication date: 9 November 2016

Published in: Lecture Notes in Computer Science

Abstract: The empirically successful Thompson Sampling algorithm for stochastic bandits has drawn much interest in understanding its theoretical properties. One important benefit of the algorithm is that it allows domain knowledge to be conveniently encoded as a prior distribution to balance exploration and exploitation more effectively. While it is generally believed that the algorithm's regret is low (high) when the prior is good (bad), little is known about the exact dependence. In this paper, we fully characterize the algorithm's worst-case dependence of regret on the choice of prior, focusing on a special yet representative case. These results also provide insights into the general sensitivity of the algorithm to the choice of priors. In particular, with p being the prior probability mass of the true reward-generating model, we prove O(√(T/p)) and O(√((1−p)T)) regret upper bounds for the bad- and good-prior cases, respectively, as well as matching lower bounds. Our proofs rely on the discovery of a fundamental property of Thompson Sampling and make heavy use of martingale theory, both of which appear novel in the literature, to the best of our knowledge.
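The setting the abstract describes can be sketched in code: a prior places probability mass on a finite set of candidate reward models (one of which is true), Thompson Sampling draws a model from the posterior at each round, plays that model's optimal arm, and updates the posterior with the observed reward. The two-arm Bernoulli construction and all function names below are illustrative assumptions, not the paper's exact analysis.

```python
import random

def thompson_sampling(models, prior, true_model, horizon, rng):
    """Illustrative Thompson Sampling over a finite model class.

    models: list of per-arm mean-reward vectors (candidate models)
    prior: matching list of prior weights (mass p sits on true_model)
    Returns the cumulative (pseudo-)regret over `horizon` rounds.
    """
    posterior = list(prior)
    regret = 0.0
    best_mean = max(true_model)
    for _ in range(horizon):
        # Sample a model index from the current posterior ...
        i = rng.choices(range(len(models)), weights=posterior)[0]
        # ... and pull the arm that is optimal under the sampled model.
        arm = max(range(len(models[i])), key=lambda a: models[i][a])
        reward = 1.0 if rng.random() < true_model[arm] else 0.0
        regret += best_mean - true_model[arm]
        # Bayesian update: reweight each model by the likelihood of the
        # observed Bernoulli reward under that model's mean for `arm`.
        lik = [m[arm] if reward else 1.0 - m[arm] for m in models]
        z = sum(w * l for w, l in zip(posterior, lik))
        posterior = [w * l / z for w, l in zip(posterior, lik)]
    return regret

# Two hypothetical candidate models; the first is the true one.
rng = random.Random(0)
models = [[0.9, 0.1], [0.1, 0.9]]
# A good prior concentrates mass p near 1 on the true model.
print(thompson_sampling(models, [0.9, 0.1], models[0], 1000, rng))
```

The bad-prior bound O(√(T/p)) reflects the rounds spent sampling the wrong model before the posterior concentrates; in the sketch, shifting the prior mass away from the true model lengthens that initial phase.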


Full work available at URL: https://arxiv.org/abs/1506.03378






Cited In (9)





This page was built for publication: On the Prior Sensitivity of Thompson Sampling
