Rare and weak effects in large-scale inference: methods and phase diagrams
From MaRDI portal
Publication:3465091
Keywords: feature selection, variable selection, classification, feature ranking, Hamming distance, higher criticism, large-scale inference, control of FDR, phase diagram, rare and weak effects, sparse signal detection, sparse precision matrix, graphlet screening (GS), asymptotic rare and weak (ARW), graph-guided multivariate screening procedure, graphlet screening
Abstract: Often when we deal with 'Big Data', the true effects we are interested in are Rare and Weak (RW). Researchers measure a large number of features, expecting perhaps only a small fraction of them to be relevant to the research question; the effect sizes of the relevant features are individually small, so the true effects are not strong enough to stand out on their own. Higher Criticism (HC) and Graphlet Screening (GS) are two classes of methods specifically designed for the Rare/Weak setting. HC was introduced to determine whether there are any relevant effects among all the measured features. More recently, HC has been applied to classification, where it provides a method for selecting useful predictive features for trained classification rules. GS was introduced as a graph-guided multivariate screening procedure and has been used for variable selection. We develop a theoretical framework based on an Asymptotic Rare and Weak (ARW) model that simultaneously controls the size and the prevalence of useful/significant features among the useless/null bulk. At the heart of the ARW model is the so-called phase diagram, a way to visualize clearly the class of ARW settings in which the relevant effects are so rare or weak that the desired goals (signal detection, variable selection, etc.) are simply impossible to achieve. We show that HC and GS have important advantages over better-known procedures and achieve the optimal phase diagrams in a variety of ARW settings. HC and GS are flexible ideas that adapt easily to many interesting situations. We review the basics of these ideas and some of their recent extensions, discuss their connections to the existing literature, and suggest some new applications.
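The abstract describes HC, HC-based feature selection, and the ARW phase diagram only in words. What follows is a minimal Python sketch under stated assumptions. It uses the standard sparse normal-mixture calibration of an ARW model (signal fraction eps = n^(-beta), signal strength tau = sqrt(2 r log n)), the usual form of the HC statistic with a search fraction alpha0, and the detection boundary known for that particular mixture model. All function names and the default alpha0 values are hypothetical illustrations, not code or settings from this paper, and Graphlet Screening is not covered.

```python
import math
import random

# Hypothetical sketch: names, defaults, and the normal-mixture ARW calibration
# below are illustrative assumptions, not this paper's code.


def detection_boundary(beta: float) -> float:
    """Detection boundary rho*(beta) for the sparse normal mixture
    (1 - eps) N(0, 1) + eps N(tau, 1), eps = n**(-beta), tau = sqrt(2 r log n).

    In the (beta, r) phase diagram, HC-type tests can detect the signal
    asymptotically when r > rho*(beta); no test can when r < rho*(beta).
    """
    if not 0.5 < beta < 1.0:
        raise ValueError("this formula covers the sparse regime 1/2 < beta < 1")
    if beta <= 0.75:
        return beta - 0.5
    return (1.0 - math.sqrt(1.0 - beta)) ** 2


def upper_tail_pvalue(x: float) -> float:
    """One-sided p-value P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def _hc_term(i: int, n: int, p: float) -> float:
    """Standardized gap sqrt(n) * (i/n - p) / sqrt(p (1 - p)); -inf if p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return -math.inf  # degenerate p-value: skip it rather than divide by zero
    return math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))


def higher_criticism(pvalues, alpha0: float = 0.5) -> float:
    """HC statistic: max of _hc_term over the smallest alpha0 * n sorted p-values."""
    p = sorted(pvalues)
    n = len(p)
    if n == 0:
        raise ValueError("need at least one p-value")
    k = min(n, max(1, int(alpha0 * n)))
    return max(_hc_term(i, n, p[i - 1]) for i in range(1, k + 1))


def hc_threshold(zscores, alpha0: float = 0.1) -> float:
    """HC thresholding for feature selection: return the |z| cutoff at which the
    HC objective over two-sided p-values is largest; keep features with |z| >= cutoff."""
    n = len(zscores)
    if n == 0:
        raise ValueError("need at least one z-score")
    abs_z = sorted((abs(z) for z in zscores), reverse=True)  # smallest p-values first
    k = min(n, max(1, int(alpha0 * n)))
    # Two-sided p-value of |z| is 2 * P(Z > |z|).
    best_i = max(range(1, k + 1),
                 key=lambda i: _hc_term(i, n, 2.0 * upper_tail_pvalue(abs_z[i - 1])))
    return abs_z[best_i - 1]


def simulate_arw(n: int, beta: float, r: float, rng: random.Random):
    """Draw n z-scores from the assumed ARW mixture: each is N(tau, 1) with
    probability eps = n**(-beta) and N(0, 1) otherwise, tau = sqrt(2 r log n)."""
    eps = n ** (-beta)
    tau = math.sqrt(2.0 * r * math.log(n))
    return [rng.gauss(tau if rng.random() < eps else 0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, beta, r = 100_000, 0.6, 0.5  # r = 0.5 lies above rho*(0.6) = 0.1
    null_z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    alt_z = simulate_arw(n, beta, r, rng)
    for name, z in (("null", null_z), ("ARW alternative", alt_z)):
        hc = higher_criticism([upper_tail_pvalue(x) for x in z])
        print(f"{name}: HC = {hc:.2f}")
```

Running the script compares HC on pure noise with HC on an ARW sample whose (beta, r) = (0.6, 0.5) lies inside the detectable region of this phase diagram. For large n the alternative should typically give the larger value, though any single draw can vary. Skipping degenerate p-values (exactly 0 or 1) is a design choice made here only to keep the sketch numerically safe.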
Recommendations
- Higher criticism for large-scale inference, especially for rare and weak effects
- Feature selection by higher criticism thresholding achieves the optimal phase diagram
- Beyond HC: more sensitive tests for rare/weak alternatives
- Detecting weak signals in high dimensions
- Optimality of Graphlet Screening in High Dimensional Variable Selection
Cited in (10)
- Global testing against sparse alternatives under Ising models
- The impossibility region for detecting sparse mixtures using the higher criticism
- Higher criticism for discriminating word-frequency tables and authorship attribution
- On the power of some sequential multiple testing procedures
- Higher criticism for large-scale inference, especially for rare and weak effects
- Testing equivalence of clustering
- Feature selection by higher criticism thresholding achieves the optimal phase diagram
- Statistical limits of sparse mixture detection
- Sharp multiple testing boundary for sparse sequences
- Beyond HC: more sensitive tests for rare/weak alternatives