Linear Thompson sampling revisited

DOI: 10.1214/17-EJS1341SI
MaRDI QID: Q1688988

Authors Marc Abeille, Alessandro Lazaric

Publication date 12 January 2018

Published in Electronic Journal of Statistics

Full work available at URL https://arxiv.org/abs/1611.06534, https://projecteuclid.org/euclid.ejs/1513306870

zbMATH Keywords

linear bandit; regularized linear optimization; Thompson sampling

Mathematics Subject Classification ID

Sampling theory, sample surveys (62D05); Stochastic programming (90C15)

Abstract: We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $\widetilde{O}(d^{3/2}\sqrt{T})$ as in previous results, the proof sheds new light on the functioning of the TS. We leverage on the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and how selecting optimal arms associated to optimistic parameters does control it. Thus we show that TS can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach. Furthermore, we show that our proof can be readily applied to regularized linear optimization and generalized linear model problems.
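
For orientation, the kind of randomized algorithm the abstract describes can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `linear_thompson_sampling`, the finite arm set, the Gaussian perturbation, and the `inflation` constant are all assumptions made for the example (the paper ties the perturbation scale to a confidence radius, which is not reproduced here). Each round, the regularized least-squares estimate is perturbed by noise shaped like the inverse design matrix, and the arm that is optimal for the perturbed parameter is played.

```python
import numpy as np


def linear_thompson_sampling(arms, theta_star, horizon, lam=1.0,
                             noise_std=0.1, inflation=None, seed=0):
    """Run a simple linear TS loop and return the cumulative regret curve.

    arms:       (n_arms, d) array of fixed arm feature vectors (assumption).
    theta_star: (d,) true parameter, used only to simulate rewards.
    inflation:  hypothetical scale of the perturbation; defaults to
                noise_std * sqrt(d), an illustrative choice only.
    """
    rng = np.random.default_rng(seed)
    _, d = arms.shape
    if inflation is None:
        inflation = noise_std * np.sqrt(d)

    V = lam * np.eye(d)          # regularized design matrix
    b = np.zeros(d)              # running sum of reward * arm
    best_value = np.max(arms @ theta_star)

    regret = np.zeros(horizon)
    total = 0.0
    for t in range(horizon):
        # Regularized least-squares estimate of the parameter.
        theta_hat = np.linalg.solve(V, b)

        # Draw z with covariance V^{-1}: if V = L L^T, then z = L^{-T} eta.
        L = np.linalg.cholesky(V)
        eta = rng.standard_normal(d)
        z = np.linalg.solve(L.T, eta)
        theta_tilde = theta_hat + inflation * z

        # Play the arm that is optimal for the sampled parameter.
        x = arms[np.argmax(arms @ theta_tilde)]
        reward = x @ theta_star + noise_std * rng.standard_normal()

        # Update the sufficient statistics.
        V += np.outer(x, x)
        b += reward * x

        total += best_value - x @ theta_star
        regret[t] = total
    return regret
```

Calling `linear_thompson_sampling(arms, theta_star, horizon=200)` with a small arm matrix returns a nondecreasing array of cumulative regret values. A larger `inflation` makes the sampled parameter optimistic more often at the price of more exploration, which is the trade-off the abstract attributes to the extra $\sqrt{d}$ factor.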

Cited in (15)