Nonasymptotic sequential tests for overlapping hypotheses applied to near-optimal arm identification in bandit models
From MaRDI portal
Publication:4987192
DOI10.1080/07474946.2021.1847965zbMATH Open1464.62360arXiv1905.03495OpenAlexW3138493539MaRDI QIDQ4987192FDOQ4987192
Authors: Emilie Kaufmann, Aurélien Garivier
Publication date: 29 April 2021
Published in: Sequential Analysis (Search for Journal in Brave)
Abstract: In this paper, we study sequential testing problems with emph{overlapping} hypotheses. We first focus on the simple problem of assessing if the mean of a Gaussian distribution is smaller or larger than a fixed ; if , both answers are considered to be correct. Then, we consider PAC-best arm identification in a bandit model: given probability distributions on with means , we derive the asymptotic complexity of identifying, with risk at most , an index such that . We provide non-asymptotic bounds on the error of a parallel General Likelihood Ratio Test, which can also be used for more general testing problems. We further propose lower bound on the number of observation needed to identify a correct hypothesis. Those lower bounds rely on information-theoretic arguments, and specifically on two versions of a change of measure lemma (a high-level form, and a low-level form) whose relative merits are discussed.
Full work available at URL: https://arxiv.org/abs/1905.03495
Recommendations
Cites Work
- Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems
- The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses
- Sequential Tests of Statistical Hypotheses
- Asymptotically efficient adaptive allocation rules
- Some aspects of the sequential design of experiments
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- The sample complexity of exploration in the multi-armed bandit problem
- Regret analysis of stochastic and nonstochastic multi-armed bandit problems
- A fully sequential procedure for indifference-zone selection in simulation
- Sequential Design of Experiments
- The expected sample size of some tests of power one
- Boundary crossing problems for sample means
- Explore first, exploit next: the true shape of regret in bandit problems
Cited In (3)
This page was built for publication: Nonasymptotic sequential tests for overlapping hypotheses applied to near-optimal arm identification in bandit models
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q4987192)