Higher criticism for large-scale inference, especially for rare and weak effects
From MaRDI portal
Abstract: In modern high-throughput data analysis, researchers perform a large number of statistical tests, expecting to find perhaps a small fraction of significant effects against a predominantly null background. Higher Criticism (HC) was introduced to determine whether there are any nonzero effects; more recently, it was applied to feature selection, where it provides a method for selecting useful predictive features from a large body of potentially useful features, among which only a rare few will prove truly useful. In this article, we review the basics of HC in both the testing and feature selection settings. HC is a flexible idea, which adapts easily to new situations; we point out simple adaptations to clique detection and bivariate outlier detection. HC, although still early in its development, is seeing increasing interest from practitioners; we illustrate this with worked examples. HC is computationally effective, which gives it leverage in the increasingly relevant "Big Data" settings we see today. We also review the underlying theoretical "ideology" behind HC. The Rare/Weak (RW) model is a theoretical framework simultaneously controlling the size and prevalence of useful/significant items among the useless/null bulk. The RW model shows that HC has important advantages over better known procedures such as False Discovery Rate (FDR) control and Family-wise Error Rate (FwER) control, in particular, certain optimality properties. We discuss the rare/weak phase diagram, a way to visualize clearly the class of RW settings where the true signals are so rare or so weak that detection and feature selection are simply impossible, and a way to understand the known optimality properties of HC.
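To make the testing use of HC described in the abstract concrete, here is a minimal Python sketch of one common form of the statistic. It assumes the p-value-based definition: sort the n p-values, compute sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))) for each rank i, and take the maximum over the smallest alpha0 fraction. The function name `higher_criticism` and the default `alpha0=0.5` are illustrative choices, not something this page specifies, and other variants of the statistic exist.

```python
import math


def higher_criticism(p_values, alpha0=0.5):
    """Return an HC statistic for a collection of p-values.

    Assumed form (one of several variants in the literature):
        HC = max over 1 <= i <= alpha0 * n of
             sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i)))
    where p_(1) <= ... <= p_(n) are the sorted p-values.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    sorted_p = sorted(p_values)
    # Only the smallest alpha0 fraction of p-values is scanned (at least one).
    k_max = max(1, int(alpha0 * n))
    best = -math.inf
    for i in range(1, k_max + 1):
        p = sorted_p[i - 1]
        if not 0.0 < p < 1.0:
            continue  # skip p-values where the standardization is undefined
        score = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, score)
    return best


# Illustration: an evenly spread "null-like" set of p-values, and the same
# set with ten entries replaced by very small p-values (a rare signal).
n = 1000
null_like = [(i - 0.5) / n for i in range(1, n + 1)]
with_signal = null_like[:-10] + [1e-6] * 10

hc_null = higher_criticism(null_like)
hc_signal = higher_criticism(with_signal)
print(hc_null, hc_signal)
```

In this toy comparison the second value is much larger than the first, which is the behavior the detection use of HC relies on: a small fraction of unusually small p-values pushes the statistic up. Calibrating a rejection threshold (for example by simulation under the null) is left out of the sketch.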
Recommendations
- Rare and weak effects in large-scale inference: methods and phase diagrams
- Beyond HC: more sensitive tests for rare/weak alternatives
- Detectability of nonparametric signals: higher criticism versus likelihood ratio
- Feature selection by higher criticism thresholding achieves the optimal phase diagram
- Higher criticism for detecting sparse heterogeneous mixtures.
Cites work
- scientific article; zbMATH DE number 5604036
- scientific article; zbMATH DE number 720689
- scientific article; zbMATH DE number 2078189
- scientific article; zbMATH DE number 6122810
- A Cramér moderate deviation theorem for Hotelling's \(T^{2}\)-statistic with applications to global tests
- A comparison of the Lasso and marginal regression
- A constrained \(\ell _{1}\) minimization approach to sparse precision matrix estimation
- A guided random walk through some high dimensional problems
- Alignments in two-dimensional random sets of points
- An Analysis for Unreplicated Fractional Factorials
- Asymptotic Bayes-optimality under sparsity of some multiple testing procedures
- Asymptotic Theory of Certain "Goodness of Fit" Criteria Based on Stochastic Processes
- Classification of sparse high-dimensional vectors
- Compressed sensing
- Control of the false discovery proportion for independently tested null hypotheses
- Cosmological model discrimination with weak lensing
- Cosmological non-Gaussian signature detection: comparing performance of different statistical tests
- Covariate assisted screening and estimation
- Detecting a target in very noisy data from multiple looks
- Detecting column dependence when rows are correlated and estimating the strength of the row correlation
- Detection boundary and higher criticism approach for rare and weak genetic effects
- Detection boundary in sparse regression
- Detection of a sparse submatrix of a high-dimensional noisy matrix
- Detection of an anomalous cluster in a network
- Detection of sparse additive functions
- Distribution-free tests for sparse heterogeneous mixtures
- Empirical Bayes Analysis of a Microarray Experiment
- Estimating the Null and the Proportion of Nonnull Effects in Large-Scale Multiple Comparisons
- Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses
- Estimation and confidence sets for sparse normal mixtures
- Feature selection by higher criticism thresholding achieves the optimal phase diagram
- Feature selection in omics prediction problems using cat scores and false nondiscovery rate control
- Genome-wide significance levels and weighted hypothesis testing
- Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism
- Goodness of fit tests in terms of local levels with special emphasis on higher criticism tests
- Goodness-of-fit test statistics that dominate the Kolmogorov statistics
- Goodness-of-fit tests via phi-divergences
- Hierarchical testing of variable importance
- High-dimensional classification using features annealed independence rules
- High-dimensional graphs and variable selection with the Lasso
- Higher criticism for detecting sparse heterogeneous mixtures.
- Higher criticism thresholding: Optimal feature selection when useful features are rare and weak
- Higher criticism: \(p\)-values and criticism
- Impossibility of successful classification when useful features are rare and weak
- Innovated higher criticism for detecting sparse signals in correlated noise
- Large-Scale Simultaneous Hypothesis Testing
- Minimax detection of a signal for \(l^n\)-balls.
- Non-asymptotic detection of two-component mixtures with unknown means
- On combinatorial testing problems
- On the Grenander estimator at zero
- On the "Poisson boundaries" of the family of weighted Kolmogorov statistics
- On the distribution of the largest eigenvalue in principal components analysis
- On the efficiency of genome-wide scans: a multiple hypothesis testing perspective
- Optimal Detection of Sparse Mixtures Against a Given Null Distribution
- Optimal classification in sparse Gaussian graphic model
- Optimal detection of heterogeneous and heteroscedastic mixtures
- Optimal rates of convergence for estimating the null density and proportion of nonnull effects in large-scale multiple testing
- Optimal sparse segment identification with application in copy number variation analysis
- Optimality of Graphlet Screening in High Dimensional Variable Selection
- Probability Content of Regions Under Spherical Normal Distributions, I
- Properties of higher criticism under strong dependence
- Proportion of Non-Zero Normal Means: Universal Oracle Equivalences and Uniformly Consistent Estimators
- Random forests
- Regularized estimation of large covariance matrices
- Robust test for detecting a signal in a high dimensional sparse normal vector
- Sample size and power analysis for sparse signal recovery in genome-wide association studies
- Simultaneous discovery of rare and common segment variants
- Some problems of hypothesis testing leading to infinitely divisible distributions
- Sparse inverse covariance estimation with the graphical lasso
- Submanifolds with constant scalar curvature
- Tests alternative to higher criticism for high-dimensional means under sparsity and column-wise dependence
- The average likelihood ratio for large-scale multiple testing and detecting sparse mixtures
- The beta-binomial SGoF method for multiple dependent tests
- Theoretical Measures of Relative Performance of Classifiers for High Dimensional Data with Small Sample Sizes
- UPS delivers optimal phase diagram in high-dimensional variable selection
Cited in (38)
- Mean tests for high-dimensional time series
- The impossibility region for detecting sparse mixtures using the higher criticism
- Rare and weak effects in large-scale inference: methods and phase diagrams
- An overview of tests on high-dimensional means
- Anomaly Detection for a Large Number of Streams: A Permutation-Based Higher Criticism Approach
- High-dimensional covariance matrices in elliptical distributions with application to spherical test
- Diagonally Dominant Principal Component Analysis
- Intermediate efficiency of some weighted goodness-of-fit statistics
- Two-sample hypothesis testing for inhomogeneous random graphs
- Two-sample Kolmogorov-Smirnov-type tests revisited: old and new tests in terms of local levels
- Asymptotically independent U-statistics in high-dimensional testing
- Special invited paper: the SCORE normalization, especially for heterogeneous network and text data
- Standardized Partial Sums and Products of p-Values
- On the asymptotic distribution of the scan statistic for empirical distributions
- Higher criticism for discriminating word-frequency tables and authorship attribution
- Signal detection via Phi-divergences for general mixtures
- Powerful test based on conditional effects for genome-wide screening
- Sharp optimality for high-dimensional covariance testing under sparse signals
- Accurate and Efficient P-value Calculation Via Gaussian Approximation: A Novel Monte-Carlo Method
- Higher criticism for detecting sparse heterogeneous mixtures.
- Which bridge estimator is the best for variable selection?
- Testing equivalence of clustering
- On the asymptotics of a normal beta-transformed empirical process
- Higher criticism to compare two large frequency tables, with sensitivity to possible rare and weak differences
- The intermediates take it all: asymptotics of higher criticism statistics and a powerful alternative based on equal local levels
- Identifying the support of rectangular signals in Gaussian noise
- Testing and signal identification for two-sample high-dimensional covariances via multi-level thresholding
- Detection boundary and higher criticism approach for rare and weak genetic effects
- Detectability of nonparametric signals: higher criticism versus likelihood ratio
- Feature selection by higher criticism thresholding achieves the optimal phase diagram
- Statistical limits of sparse mixture detection
- Exact tests via multiple data splitting
- Beyond HC: more sensitive tests for rare/weak alternatives
- Detection of sparse positive dependence
- Thresholding-based outlier detection for high-dimensional data
- Interactive martingale tests for the global null
- Statistical proof? The problem of irreproducibility
- Sparse equisigned PCA: algorithms and performance bounds in the noisy rank-1 setting