Higher criticism for large-scale inference, especially for rare and weak effects
From MaRDI portal
Publication:254401
DOI: 10.1214/14-STS506; zbMATH Open: 1332.62019; arXiv: 1410.4743; OpenAlex: W2127886488; MaRDI QID: Q254401; FDO: Q254401
Authors: Jiashun Jin, David Donoho
Publication date: 8 March 2016
Published in: Statistical Science
Abstract: In modern high-throughput data analysis, researchers perform a large number of statistical tests, expecting to find perhaps a small fraction of significant effects against a predominantly null background. Higher Criticism (HC) was introduced to determine whether there are any nonzero effects; more recently, it was applied to feature selection, where it provides a method for selecting useful predictive features from a large body of potentially useful features, among which only a rare few will prove truly useful. In this article, we review the basics of HC in both the testing and feature selection settings. HC is a flexible idea, which adapts easily to new situations; we point out simple adaptations to clique detection and bivariate outlier detection. HC, although still early in its development, is seeing increasing interest from practitioners; we illustrate this with worked examples. HC is computationally efficient, which gives it an advantage in the increasingly relevant "Big Data" settings we see today. We also review the underlying theoretical "ideology" behind HC. The Rare/Weak (RW) model is a theoretical framework that simultaneously controls the size and prevalence of useful/significant items among the useless/null bulk. The RW model shows that HC has important advantages over better-known procedures such as False Discovery Rate (FDR) control and Family-wise Error Rate (FWER) control; in particular, HC enjoys certain optimality properties. We discuss the rare/weak phase diagram, a way to visualize clearly the class of RW settings in which the true signals are so rare or so weak that detection and feature selection are simply impossible, and a way to understand the known optimality properties of HC.
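The abstract describes HC in three roles: as a global test for the presence of any nonzero effects, as a threshold for feature selection, and as an object analyzed under the Rare/Weak model. The Python sketch below illustrates all three under stated assumptions. It uses the HC form of Donoho and Jin (2004), HC = max over 1 <= k <= alpha0*n of sqrt(n) (k/n - p_(k)) / sqrt(p_(k)(1 - p_(k))), with alpha0 = 0.5 chosen here only for illustration; the review discusses several variants, and the feature-selection literature also uses other standardizations. The RW model is taken as Z_i ~ N(mu_i, 1), where a fraction eps = n^(-beta) of the means equal tau = sqrt(2 r log n). For this model the classical detection boundary is rho*(beta) = beta - 1/2 for 1/2 < beta <= 3/4 and (1 - sqrt(1 - beta))^2 for 3/4 < beta < 1. All function names (`higher_criticism`, `hc_threshold`, `detection_boundary`, `simulate_rw`) are hypothetical. HC thresholding is simplified to two-sided p-values with the same standardization.

```python
import math
import random


def one_sided_p(z):
    """P(N(0,1) > z): upper-tail p-value for a positive-signal alternative."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def higher_criticism(p_values, alpha0=0.5):
    """Return (HC, k_hat), maximizing over the alpha0*n smallest sorted p-values
    sqrt(n) * (k/n - p_(k)) / sqrt(p_(k) * (1 - p_(k)))  (assumed standardization)."""
    n = len(p_values)
    p_sorted = sorted(p_values)
    k_max = max(1, int(alpha0 * n))
    best, k_hat = -math.inf, 0
    for k in range(1, k_max + 1):
        p = p_sorted[k - 1]
        if p <= 0.0 or p >= 1.0:
            continue  # denominator vanishes; skip degenerate p-values
        hc = math.sqrt(n) * (k / n - p) / math.sqrt(p * (1.0 - p))
        if hc > best:
            best, k_hat = hc, k
    return best, k_hat


def hc_threshold(z_scores, alpha0=0.5):
    """Simplified HC thresholding: compute two-sided p-values, find the HC maximizer
    k_hat, and return the k_hat-th largest |z| as the selection threshold."""
    p = [math.erfc(abs(z) / math.sqrt(2.0)) for z in z_scores]
    _, k_hat = higher_criticism(p, alpha0)
    if k_hat == 0:
        return math.inf  # no usable p-values: select nothing
    # Sorting p ascending matches sorting |z| descending, since erfc is decreasing.
    return sorted((abs(z) for z in z_scores), reverse=True)[k_hat - 1]


def detection_boundary(beta):
    """rho*(beta) for the RW model eps = n**-beta, tau = sqrt(2 r log n)."""
    if not 0.5 < beta < 1.0:
        raise ValueError("beta must lie in (1/2, 1)")
    if beta <= 0.75:
        return beta - 0.5
    return (1.0 - math.sqrt(1.0 - beta)) ** 2


def simulate_rw(n, beta, r, rng):
    """Draw Z_i ~ N(mu_i, 1), with mu_i = tau w.p. eps = n**-beta, else mu_i = 0."""
    eps = n ** (-beta)
    tau = math.sqrt(2.0 * r * math.log(n))
    return [rng.gauss(tau if rng.random() < eps else 0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 100_000
    null_z = simulate_rw(n, beta=0.6, r=0.0, rng=rng)  # r = 0: pure null
    alt_z = simulate_rw(n, beta=0.6, r=0.5, rng=rng)   # r = 0.5 > rho*(0.6) = 0.1
    for label, z in (("null", null_z), ("rare/weak", alt_z)):
        hc, _ = higher_criticism([one_sided_p(x) for x in z])
        print(f"{label}: HC = {hc:.2f}")
    t = hc_threshold(alt_z)
    print("features selected:", sum(abs(x) >= t for x in alt_z))
```

Because r = 0.5 lies above rho*(0.6) = 0.1, the simulated alternative falls in the detectable region of the phase diagram. HC on the alternative should therefore typically be much larger than under the pure null, although the exact values depend on the seed. Capping k at alpha0*n keeps HC from being driven by the bulk of large p-values, where no rare signal lives.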
Full work available at URL: https://arxiv.org/abs/1410.4743
Recommendations
- Rare and weak effects in large-scale inference: methods and phase diagrams
- Beyond HC: more sensitive tests for rare/weak alternatives
- Detectability of nonparametric signals: higher criticism versus likelihood ratio
- Feature selection by higher criticism thresholding achieves the optimal phase diagram
- Higher criticism for detecting sparse heterogeneous mixtures
Keywords: feature selection; classification; higher criticism; large-scale inference; control of FDR; large covariance matrix; phase diagram; rare and weak effects; sparse signal detection
Cites Work
- The beta-binomial SGoF method for multiple dependent tests
- Detecting column dependence when rows are correlated and estimating the strength of the row correlation
- Random forests
- Detection boundary in sparse regression
- High-dimensional graphs and variable selection with the Lasso
- Hierarchical testing of variable importance
- Title not available
- Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism
- High-dimensional classification using features annealed independence rules
- Some problems of hypothesis testing leading to infinitely divisible distributions
- Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses
- Sparse inverse covariance estimation with the graphical lasso
- On the distribution of the largest eigenvalue in principal components analysis
- Minimax detection of a signal for \(l^n\)-balls
- Higher criticism for detecting sparse heterogeneous mixtures
- Detection of an anomalous cluster in a network
- Goodness-of-fit tests via phi-divergences
- Estimation and confidence sets for sparse normal mixtures
- Regularized estimation of large covariance matrices
- Properties of higher criticism under strong dependence
- Cosmological non-Gaussian signature detection: comparing performance of different statistical tests
- Higher criticism thresholding: Optimal feature selection when useful features are rare and weak
- Impossibility of successful classification when useful features are rare and weak
- Feature selection by higher criticism thresholding achieves the optimal phase diagram
- Theoretical Measures of Relative Performance of Classifiers for High Dimensional Data with Small Sample Sizes
- Proportion of Non-Zero Normal Means: Universal Oracle Equivalences and Uniformly Consistent Estimators
- Optimal sparse segment identification with application in copy number variation analysis
- Estimating the Null and the Proportion of Nonnull Effects in Large-Scale Multiple Comparisons
- Detecting a target in very noisy data from multiple looks
- Simultaneous discovery of rare and common segment variants
- Asymptotic Theory of Certain "Goodness of Fit" Criteria Based on Stochastic Processes
- Innovated higher criticism for detecting sparse signals in correlated noise
- Large-Scale Simultaneous Hypothesis Testing
- A constrained \(\ell _{1}\) minimization approach to sparse precision matrix estimation
- Empirical Bayes Analysis of a Microarray Experiment
- Higher criticism: \(p\)-values and criticism
- Optimal detection of heterogeneous and heteroscedastic mixtures
- Title not available
- Alignments in two-dimensional random sets of points
- Title not available
- Compressed sensing
- Genome-wide significance levels and weighted hypothesis testing
- Detection of a sparse submatrix of a high-dimensional noisy matrix
- An Analysis for Unreplicated Fractional Factorials
- Detection of sparse additive functions
- A Cramér moderate deviation theorem for Hotelling's \(T^{2}\)-statistic with applications to global tests
- Distribution-free tests for sparse heterogeneous mixtures
- A guided random walk through some high dimensional problems
- Optimal classification in sparse Gaussian graphic model
- Tests alternative to higher criticism for high-dimensional means under sparsity and column-wise dependence
- Goodness of fit tests in terms of local levels with special emphasis on higher criticism tests
- Optimality of Graphlet Screening in High Dimensional Variable Selection
- Optimal Detection of Sparse Mixtures Against a Given Null Distribution
- On the Grenander estimator at zero
- Sample size and power analysis for sparse signal recovery in genome-wide association studies
- On the efficiency of genome-wide scans: a multiple hypothesis testing perspective
- Cosmological model discrimination with weak lensing
- Classification of sparse high-dimensional vectors
- Probability Content of Regions Under Spherical Normal Distributions, I
- Detection boundary and higher criticism approach for rare and weak genetic effects
- Goodness-of-fit test statistics that dominate the Kolmogorov statistics
- Control of the false discovery proportion for independently tested null hypotheses
- Robust test for detecting a signal in a high dimensional sparse normal vector
- Title not available
- UPS delivers optimal phase diagram in high-dimensional variable selection
- Covariate assisted screening and estimation
- On the "Poisson boundaries" of the family of weighted Kolmogorov statistics
- A comparison of the Lasso and marginal regression
- The average likelihood ratio for large-scale multiple testing and detecting sparse mixtures
- Submanifolds with constant scalar curvature
- Non-asymptotic detection of two-component mixtures with unknown means
- On combinatorial testing problems
- Asymptotic Bayes-optimality under sparsity of some multiple testing procedures
- Optimal rates of convergence for estimating the null density and proportion of nonnull effects in large-scale multiple testing
- Feature selection in omics prediction problems using cat scores and false nondiscovery rate control
Cited In (38)
- Standardized Partial Sums and Products of p-Values
- Detection boundary and higher criticism approach for rare and weak genetic effects
- Beyond HC: more sensitive tests for rare/weak alternatives
- Two-sample Kolmogorov-Smirnov-type tests revisited: old and new tests in terms of local levels
- Higher criticism for detecting sparse heterogeneous mixtures
- PCA consistency for the power spiked model in high-dimensional settings
- High-dimensional covariance matrices in elliptical distributions with application to spherical test
- Anomaly Detection for a Large Number of Streams: A Permutation-Based Higher Criticism Approach
- Powerful test based on conditional effects for genome-wide screening
- Exact tests via multiple data splitting
- Higher criticism to compare two large frequency tables, with sensitivity to possible rare and weak differences
- Interactive martingale tests for the global null
- Asymptotically independent U-statistics in high-dimensional testing
- Signal detection via Phi-divergences for general mixtures
- Feature selection by higher criticism thresholding achieves the optimal phase diagram
- Which bridge estimator is the best for variable selection?
- The intermediates take it all: asymptotics of higher criticism statistics and a powerful alternative based on equal local levels
- Rare and weak effects in large-scale inference: methods and phase diagrams
- Intermediate efficiency of some weighted goodness-of-fit statistics
- Thresholding-based outlier detection for high-dimensional data
- Sparse equisigned PCA: algorithms and performance bounds in the noisy rank-1 setting
- An overview of tests on high-dimensional means
- On the asymptotics of a normal beta-transformed empirical process
- Identifying the support of rectangular signals in Gaussian noise
- Testing and signal identification for two-sample high-dimensional covariances via multi-level thresholding
- Mean tests for high-dimensional time series
- Two-sample hypothesis testing for inhomogeneous random graphs
- Higher criticism for discriminating word-frequency tables and authorship attribution
- Detection of sparse positive dependence
- Diagonally Dominant Principal Component Analysis
- Accurate and Efficient P-value Calculation Via Gaussian Approximation: A Novel Monte-Carlo Method
- Sharp optimality for high-dimensional covariance testing under sparse signals
- Testing equivalence of clustering
- Detectability of nonparametric signals: higher criticism versus likelihood ratio
- Special invited paper: the SCORE normalization, especially for heterogeneous network and text data
- On the asymptotic distribution of the scan statistic for empirical distributions
- Statistical proof? The problem of irreproducibility
- Statistical limits of sparse mixture detection