Linear Thompson sampling revisited
From MaRDI portal
Abstract: We derive an alternative proof for the regret of Thompson sampling ( s) in the stochastic linear bandit setting. While we obtain a regret bound of order as in previous results, the proof sheds new light on the functioning of the s. We leverage on the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and how selecting optimal arms associated to extit{optimistic} parameters does control it. Thus we show that s can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional regret factor compared to a UCB-like approach. Furthermore, we show that our proof can be readily applied to regularized linear optimization and generalized linear model problems.
Recommendations
- Near-optimal regret bounds for Thompson sampling
- A Tutorial on Thompson Sampling
- Thompson sampling: an asymptotically optimal finite-time analysis
- scientific article; zbMATH DE number 7626733
- Thompson Sampling for Bayesian Bandits with Resets
- Thompson Sampling for Stochastic Control: The Finite Parameter Case
- Thompson Sampling for Stochastic Control: The Continuous Parameter Case
- On the Prior Sensitivity of Thompson Sampling
- Safe Linear Thompson Sampling With Side Information
- Linearly parameterized bandits
Cited in
(15)- On adaptive linear-quadratic regulators
- Sliding-Window Thompson Sampling for Non-Stationary Settings
- scientific article; zbMATH DE number 7626780 (Why is no real title available?)
- Thompson Sampling for Bayesian Bandits with Resets
- scientific article; zbMATH DE number 7626733 (Why is no real title available?)
- scientific article; zbMATH DE number 7307478 (Why is no real title available?)
- Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon
- IntelligentPooling: practical Thompson sampling for mHealth
- Derivative-free methods for policy optimization: guarantees for linear quadratic systems
- An information-theoretic analysis of Thompson sampling
- A Tutorial on Thompson Sampling
- Near-optimal regret bounds for Thompson sampling
- Streaming kernel regression with provably adaptive mean, variance, and regularization
- On the sample complexity of the linear quadratic regulator
- On the Prior Sensitivity of Thompson Sampling
This page was built for publication: Linear Thompson sampling revisited
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q1688988)