On the bias, risk, and consistency of sample means in multi-armed bandits
From MaRDI portal
Publication:5018902
Abstract: The sample mean is among the most well studied estimators in statistics, having many desirable properties such as unbiasedness and consistency. However, when analyzing data collected using a multi-armed bandit (MAB) experiment, the sample mean is biased and much remains to be understood about its properties. For example, when is it consistent, how large is its bias, and can we bound its mean squared error? This paper delivers a thorough and systematic treatment of the bias, risk and consistency of MAB sample means. Specifically, we identify four distinct sources of selection bias (sampling, stopping, choosing and rewinding) and analyze them both separately and together. We further demonstrate that a new notion of emph{effective sample size} can be used to bound the risk of the sample mean under suitable loss functions. We present several carefully designed examples to provide intuition on the different sources of selection bias we study. Our treatment is nonparametric and algorithm-agnostic, meaning that it is not tied to a specific algorithm or goal. In a nutshell, our proofs combine variational representations of information-theoretic divergences with new martingale concentration inequalities.
Recommendations
- Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits
- Active Learning in Multi-armed Bandits
- The sample complexity of exploration in the multi-armed bandit problem
- The multi-armed bandit problem: an efficient nonparametric solution
- Lower bounds on the sample complexity of exploration in the multi-armed bandit problem.
Cites work
- scientific article; zbMATH DE number 4113733 (Why is no real title available?)
- scientific article; zbMATH DE number 3072540 (Why is no real title available?)
- scientific article; zbMATH DE number 3073499 (Why is no real title available?)
- scientific article; zbMATH DE number 3085434 (Why is no real title available?)
- Bandit algorithms
- CONFIDENCE SEQUENCES FOR MEAN, VARIANCE, AND MEDIAN
- Estimation Following Sequential Tests
- Estimation of densities and applications
- Further Remarks on Sequential Estimation: The Exponential Case
- Information-theoretic determination of minimax rates of convergence
- Introduction to nonparametric estimation
- Multi-armed bandit models for the optimal design of clinical trials: benefits and challenges
- On confidence sequences
- On the Asymptotic Efficiency of a Sequential Procedure for Estimating the Mean
- On the complexity of best-arm identification in multi-armed bandit models
- Optimum Character of the Sequential Probability Ratio Test
- Probability
- Relative loss bounds for on-line density estimation with the exponential family of distributions
- SOME FURTHER REMARKS ON INEQUALITIES FOR SAMPLE SUMS
- Self-normalized processes: exponential inequalities, moment bounds and iterated logarithm laws.
- Some aspects of the sequential design of experiments
- Stopped Random Walks
- Time-uniform Chernoff bounds via nonnegative supermartingales
- Time-uniform, nonparametric, nonasymptotic confidence sequences
- \(L_p\)-version of the Dubins-Savage inequality and some exponential inequalities
This page was built for publication: On the bias, risk, and consistency of sample means in multi-armed bandits
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q5018902)