Phase transition and regularized bootstrap in large-scale t-tests with false discovery rate control
From MaRDI portal
Publication:480982
DOI: 10.1214/14-AOS1249; zbMATH Open: 1305.62213; arXiv: 1310.4371; OpenAlex: W2963682607; MaRDI QID: Q480982; FDO: Q480982
Publication date: 12 December 2014
Published in: The Annals of Statistics
Abstract: Applying Benjamini and Hochberg (B-H) method to multiple Student's \(t\) tests is a popular technique in gene selection in microarray data analysis. Because of the non-normality of the population, the true p-values of the hypothesis tests are typically unknown. Hence, it is common to use the standard normal distribution N(0,1), Student's \(t\) distribution \(t_{n-1}\) or the bootstrap method to estimate the p-values. In this paper, we first study N(0,1) and \(t_{n-1}\) calibrations. We prove that, when the population has the finite 4-th moment and the dimension \(m\) and the sample size \(n\) satisfy \(\log m = o(n^{1/3})\), B-H method controls the false discovery rate (FDR) at a given level \(\alpha\) asymptotically with p-values estimated from N(0,1) or \(t_{n-1}\) distribution. However, a phase transition phenomenon occurs when \(\log m \geq c_0 n^{1/3}\). In this case, the FDR of B-H method may be larger than \(\alpha\) or even tends to one. In contrast, the bootstrap calibration is accurate for \(\log m = o(n^{1/2})\) as long as the underlying distribution has the sub-Gaussian tails. However, such light-tailed condition cannot be weakened in general. The simulation study shows that for the heavy-tailed distributions, the bootstrap calibration is very conservative. In order to solve this problem, a regularized bootstrap correction is proposed and is shown to be robust to the tails of the distributions. The simulation study shows that the regularized bootstrap method performs better than the usual bootstrap method.
Full work available at URL: https://arxiv.org/abs/1310.4371
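For orientation, the B-H procedure with N(0,1)-calibrated p-values mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `t_statistic`, `normal_p_value` and `benjamini_hochberg` are hypothetical, each test is taken to be a one-sample test of a zero mean, and neither the bootstrap calibration nor the regularized bootstrap correction is implemented.

```python
import math
import statistics


def t_statistic(sample):
    """One-sample Student t statistic for H0: mean = 0 (assumed test setup)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation, n - 1 denominator
    return math.sqrt(n) * mean / sd


def normal_p_value(t):
    """Two-sided p-value under the N(0,1) calibration: 2 * (1 - Phi(|t|))."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def benjamini_hochberg(p_values, alpha):
    """Return the sorted indices of hypotheses rejected by the B-H step-up rule."""
    m = len(p_values)
    # Hypothesis indices ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    # Find the largest rank k whose ordered p-value is at most alpha * k / m.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])
```

A typical use would compute `normal_p_value(t_statistic(x))` for each of the samples and pass the resulting list to `benjamini_hochberg` with the target FDR level. As the abstract cautions, this calibration can fail to control the FDR when the number of tests is too large relative to the sample size.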
Recommendations
- FDR control in multiple testing under non-normality
- To How Many Simultaneous Hypothesis Tests Can Normal, Student's \(t\) or Bootstrap Calibration Be Applied?
- Adaptive choice of the number of bootstrap samples in large scale multiple testing
- Estimation of False Discovery Rates in Multiple Testing: Application to Gene Microarray Data
- Control of the false discovery rate under dependence using the bootstrap and subsampling
Cited In (28)
- Testing the differential network between two Gaussian graphical models with false discovery rate control
- High-dimensional two-sample mean vectors test and support recovery with factor adjustment
- Change-point testing for parallel data sets with FDR control
- Large-Scale Two-Sample Comparison of Support Sets
- Title not available
- A new perspective on robust \(M\)-estimation: finite sample theory and applications to dependence-adjusted multiple testing
- Structure learning of exponential family graphical model with false discovery rate control
- Null-free false discovery rate control using decoy permutations
- Testing independence with high-dimensional correlated samples
- Support recovery of Gaussian graphical model with false discovery rate control
- A dynamic screening algorithm for hierarchical binary marketing data
- Robust high-dimensional tuning free multiple testing
- Self-normalization: taming a wild population in a heavy-tailed world
- RANK: Large-Scale Inference With Graphical Nonlinear Knockoffs
- On simultaneous calibration of two-sample t-tests for high-dimension low-sample-size data
- StarTrek: combinatorial variable selection with false discovery rate control
- Robust inference via multiplier bootstrap
- Testing and estimation for clustered signals
- FarmTest: Factor-Adjusted Robust Multiple Testing With Approximate False Discovery Control
- Bootstrap analysis of mutual fund performance
- Asymptotic false discovery control of the Benjamini-Hochberg procedure for pairwise comparisons
- Multiple Testing of Submatrices of a Precision Matrix With Applications to Identification of Between Pathway Interactions
- TEAM: a multiple testing algorithm on the aggregation tree for flow cytometry analysis
- Threshold Selection in Feature Screening for Error Rate Control
- Statistical Inference for High-Dimensional Vector Autoregression with Measurement Error
- False Discovery Rate Control Under General Dependence By Symmetrized Data Aggregation
- Data-driven selection of the number of change-points via error rate control
- Large-scale simultaneous testing using kernel density estimation