Finding large average submatrices in high dimensional data
Publication:985018
Abstract: The search for sample-variable associations is an important problem in the exploratory analysis of high dimensional data. Biclustering methods search for sample-variable associations in the form of distinguished submatrices of the data matrix. (The rows and columns of a submatrix need not be contiguous.) In this paper we propose and evaluate a statistically motivated biclustering procedure (LAS) that finds large average submatrices within a given real-valued data matrix. The procedure operates in an iterative-residual fashion, and is driven by a Bonferroni-based significance score that effectively trades off between submatrix size and average value. We examine the performance and potential utility of LAS, and compare it with a number of existing methods, through an extensive three-part validation study using two gene expression datasets. The validation study examines quantitative properties of biclusters, biological and clinical assessments using auxiliary information, and classification of disease subtypes using bicluster membership. In addition, we carry out a simulation study to assess the effectiveness and noise sensitivity of the LAS search procedure. These results suggest that LAS is an effective exploratory tool for the discovery of biologically relevant structures in high dimensional data. Software is available at https://genome.unc.edu/las/.
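As a rough illustration of how a Bonferroni-based score can trade off submatrix size against average value, the sketch below scores a k x l submatrix with average `avg` inside an m x n matrix. It rests on assumptions not stated in the abstract: the null model is taken to be i.i.d. standard normal entries, and the score is taken to be the negative log of the Gaussian tail probability of the average multiplied by the number of k x l submatrices. The function names are hypothetical and are not taken from the LAS software.

```python
import math


def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_upper_tail(z):
    """Natural log of P(Z >= z) for a standard normal Z."""
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    if tail > 0.0:
        return math.log(tail)
    # erfc underflowed to 0 (very large z): use the standard asymptotic
    # approximation P(Z >= z) ~ exp(-z^2 / 2) / (z * sqrt(2 * pi)).
    return -0.5 * z * z - math.log(z * math.sqrt(2.0 * math.pi))


def bonferroni_score(m, n, k, l, avg):
    """Assumed score: -log( C(m,k) * C(n,l) * P(Z >= avg * sqrt(k*l)) ).

    Under the assumed null, the average of k*l independent N(0, 1) entries
    is N(0, 1/(k*l)), so avg * sqrt(k*l) is a standard normal statistic.
    The binomial factors count the k x l submatrices (Bonferroni correction).
    """
    log_count = log_binom(m, k) + log_binom(n, l)
    log_p = log_upper_tail(avg * math.sqrt(k * l))
    return -(log_count + log_p)


if __name__ == "__main__":
    # Same average, different sizes, inside a 100 x 100 matrix.
    print(bonferroni_score(100, 100, 2, 2, 2.0))
    print(bonferroni_score(100, 100, 10, 10, 2.0))
```

Under these assumptions, a larger submatrix with the same average receives a higher score, because the tail probability of its average shrinks faster than the count of candidate submatrices grows. An iterative-residual procedure would repeatedly search for a high-scoring submatrix and then subtract its average from the corresponding entries before searching again; that search step is not sketched here.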
Cites work
- scientific article; zbMATH DE number 1687015
- scientific article; zbMATH DE number 1750182
- Clustering Objects on Subsets of Attributes (with Discussion)
- Decomposing gene expression into cellular processes
- Finding large average submatrices in high dimensional data
- Improved biclustering of microarray data demonstrated through systematic performance tests
- The minimum description length principle in coding and modeling
Cited in (29)
- Energy landscape for large average submatrix detection problems in Gaussian random matrices
- Finding a large submatrix of a Gaussian random matrix
- Statistical mechanics of the maximum-average submatrix problem
- Distribution-free detection of a submatrix
- Generalized co-clustering analysis via regularized alternating least squares
- On combinatorial testing problems
- Biclustering via structured regularized matrix decomposition
- Spike-and-slab Lasso biclustering
- Computational barriers in minimax submatrix detection
- Distribution-free, size adaptive submatrix detection with acceleration
- Multilevel Matrix-Variate Analysis and its Application to Accelerometry-Measured Physical Activity in Clinical Populations
- Biclustering via sparse singular value decomposition
- Local spatial biclustering and prediction of urban juvenile delinquency and recidivism
- Finding large average submatrices in high dimensional data
- On uniform concentration bounds for bi-clustering by using the Vapnik-Chervonenkis theory
- Handling high-dimensional data with missing values by modern machine learning techniques
- Mediation analysis for high-dimensional mediators and outcomes with an application to multimodal imaging data
- Submatrix localization via message passing
- Sharp variable selection of a sparse submatrix in a high-dimensional noisy matrix
- A goodness-of-fit test on the number of biclusters in a relational data matrix
- Convex biclustering
- Compressed spectral screening for large-scale differential correlation analysis with application in selecting glioblastoma gene modules
- Biclustering with heterogeneous variance
- Detection of a sparse submatrix of a high-dimensional noisy matrix
- The overlap gap property in principal submatrix recovery
- A testing based extraction algorithm for identifying significant communities in networks
- Computational barriers to estimation from low-degree polynomials
- Finding one community in a sparse graph
- Finding hidden cliques of size \(\sqrt{N/e}\) in nearly linear time