On the bias, risk, and consistency of sample means in multi-armed bandits

DOI 10.1137/20M1361249; MaRDI QID Q5018902

Authors Jaehyeok Shin, Aaditya Ramdas, Alessandro Rinaldo

Publication date 27 December 2021

Published in SIAM Journal on Mathematics of Data Science

Full work available at URL https://arxiv.org/abs/1902.00746

consistency; bias; multi-armed bandits; sample mean; risk bounds

Asymptotic properties of nonparametric inference (62G20); Statistical aspects of big data and data science (62R07); Sampling theory, sample surveys (62D05)

Abstract: The sample mean is among the most well-studied estimators in statistics, having many desirable properties such as unbiasedness and consistency. However, when analyzing data collected using a multi-armed bandit (MAB) experiment, the sample mean is biased and much remains to be understood about its properties. For example, when is it consistent, how large is its bias, and can we bound its mean squared error? This paper delivers a thorough and systematic treatment of the bias, risk and consistency of MAB sample means. Specifically, we identify four distinct sources of selection bias (sampling, stopping, choosing and rewinding) and analyze them both separately and together. We further demonstrate that a new notion of effective sample size can be used to bound the risk of the sample mean under suitable loss functions. We present several carefully designed examples to provide intuition on the different sources of selection bias we study. Our treatment is nonparametric and algorithm-agnostic, meaning that it is not tied to a specific algorithm or goal. In a nutshell, our proofs combine variational representations of information-theoretic divergences with new martingale concentration inequalities.
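
The abstract's claim that adaptively collected data biases the sample mean can be checked with a small Monte Carlo sketch. The code below is an illustration only and is not taken from the paper: the two-armed setup, the standard normal rewards, the greedy sampling rule, the horizon, and the function name greedy_sample_mean_bias are all assumptions chosen for brevity. It isolates the sampling source of bias (the stopping time and the reported arm are fixed in advance) and estimates the bias of the sample mean of arm 0.

```python
import random


def greedy_sample_mean_bias(horizon=10, reps=20000, seed=0):
    """Monte Carlo estimate of the bias of arm 0's sample mean.

    Hypothetical setup (not from the paper): two arms with N(0, 1)
    rewards, one initial pull per arm, then greedy pulls of the arm
    with the larger current sample mean until `horizon` total pulls.
    """
    rng = random.Random(seed)
    true_mean = 0.0
    total = 0.0
    for _ in range(reps):
        # One initial pull of each arm.
        sums = [rng.gauss(true_mean, 1.0), rng.gauss(true_mean, 1.0)]
        counts = [1, 1]
        for _ in range(horizon - 2):
            # Greedy rule: pull the arm whose sample mean is larger.
            arm = 0 if sums[0] / counts[0] >= sums[1] / counts[1] else 1
            sums[arm] += rng.gauss(true_mean, 1.0)
            counts[arm] += 1
        # Sample mean of arm 0 at the fixed stopping time.
        total += sums[0] / counts[0]
    # Average of the sample means minus the true mean.
    return total / reps - true_mean


bias = greedy_sample_mean_bias()
print(f"estimated bias of arm 0 sample mean: {bias:.3f}")
```

Under these assumptions the estimate comes out negative: an arm that starts with unluckily low rewards is pulled less often afterwards, so those low values are averaged over fewer observations and are not corrected.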

Recommendations

Cites work

Cited in

(1)

Sample mean based index policies by O(log n) regret for the multi-armed bandit problem
