Correlated variables in regression: clustering and sparse estimation
From MaRDI portal
Keywords: Lasso; variable selection; high-dimensional inference; group Lasso; hierarchical clustering; oracle inequality; variable screening; canonical correlations
Mathematics Subject Classification:
- Computational methods for problems pertaining to statistics (62-08)
- Classification and discrimination; cluster analysis (statistical aspects) (62H30)
- Linear regression; mixed models (62J05)
- Measures of association (correlation, canonical correlation, etc.) (62H20)
- Ridge regression; shrinkage estimators (Lasso) (62J07)
Abstract: We consider estimation in a high-dimensional linear model with strongly correlated variables. We propose to cluster the variables first and then perform sparse estimation, such as the Lasso on cluster representatives or the group Lasso using the groups defined by the clusters. For the first step, we present a novel bottom-up agglomerative clustering algorithm based on canonical correlations, and we show that it finds an optimal solution and is statistically consistent. We also present theoretical arguments that canonical-correlation-based clustering leads to a better-posed compatibility constant for the design matrix, which ensures identifiability and an oracle inequality for the group Lasso. Furthermore, we discuss circumstances in which using cluster representatives with the Lasso as the subsequent estimator leads to improved results for prediction and detection of variables. We complement the theoretical analysis with various empirical results.
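The two-step procedure in the abstract (cluster the variables by canonical correlation, then fit a sparse estimator on cluster representatives) can be illustrated with a small sketch. This is not the authors' implementation. It assumes: the largest canonical correlation between two variable sets is computed from orthonormal bases of their centered column spans; clusters are merged greedily until every between-cluster value is at most a threshold `tau`; each cluster is represented by the mean of its columns; and the Lasso is solved by plain coordinate descent. All function names (`cc_agglomerative`, `cluster_means`, `lasso_cd`, etc.) and the toy data are hypothetical. The group Lasso variant is not shown.

```python
import numpy as np


def orthonormal_basis(X, tol=1e-10):
    """Orthonormal basis (n x r) for the column span of the centered matrix X."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    if s.size == 0 or s[0] == 0.0:
        return U[:, :0]  # degenerate (constant) columns: empty basis
    return U[:, s > tol * s[0]]  # drop numerically null directions


def canonical_correlation(XA, XB):
    """Largest canonical correlation between the variable sets XA and XB."""
    QA, QB = orthonormal_basis(XA), orthonormal_basis(XB)
    if QA.shape[1] == 0 or QB.shape[1] == 0:
        return 0.0
    # Singular values of QA^T QB are the cosines of the principal angles
    # between the two column spans, i.e. the canonical correlations.
    return min(1.0, float(np.linalg.svd(QA.T @ QB, compute_uv=False)[0]))


def cc_agglomerative(X, tau):
    """Bottom-up clustering: merge the most canonically correlated pair of
    clusters until every between-cluster value is <= tau (assumed stopping rule)."""
    clusters = [[j] for j in range(X.shape[1])]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                r = canonical_correlation(X[:, clusters[a]], X[:, clusters[b]])
                if r > best:
                    best, pair = r, (a, b)
        if best <= tau:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def cluster_means(X, clusters):
    """One representative per cluster: the average of its columns (assumption)."""
    return np.column_stack([X[:, g].mean(axis=1) for g in clusters])


def lasso_cd(Z, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Z b||^2 + lam * ||b||_1, intercept via centering."""
    n, q = Z.shape
    Zc, yc = Z - Z.mean(axis=0), y - y.mean()
    col_sq = (Zc ** 2).sum(axis=0) / n
    beta, resid = np.zeros(q), yc.copy()
    for _ in range(n_iter):
        for k in range(q):
            if col_sq[k] == 0.0:
                continue
            resid += Zc[:, k] * beta[k]  # partial residual without column k
            rho = Zc[:, k] @ resid / n
            beta[k] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]  # soft-threshold
            resid -= Zc[:, k] * beta[k]
    return beta


# Toy data: three latent signals, each observed through 4 near-copies.
rng = np.random.default_rng(0)
n = 100
latent = rng.standard_normal((n, 3))
X = np.column_stack([latent[:, k] + 0.05 * rng.standard_normal(n)
                     for k in range(3) for _ in range(4)])
y = 2.0 * latent[:, 0] - 1.5 * latent[:, 2] + 0.1 * rng.standard_normal(n)

clusters = cc_agglomerative(X, tau=0.9)
beta = lasso_cd(cluster_means(X, clusters), y, lam=0.1)
print(sorted(sorted(g) for g in clusters))
print(np.round(beta, 2))
```

On this toy design the near-duplicate columns should be grouped into their three blocks, and the Lasso on the cluster means should keep the first and third representatives while setting the second to zero. The naive pairwise scan is cubic in the number of variables and is meant only for illustration. Note also that the canonical correlation between two sets becomes trivially 1 once their combined size approaches n, so this sketch is only meaningful when clusters stay small relative to the sample size.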
Recommendations
- Cluster feature selection in high-dimensional linear models
- Supervised clustering of variables
- A Bayesian approach to multicollinearity and the simultaneous selection and clustering of predictors in linear regression
- scientific article; zbMATH DE number 6458363
- Simultaneous Regression Shrinkage, Variable Selection, and Supervised Clustering of Predictors with OSCAR
Cites work
- scientific article; zbMATH DE number 5957408
- scientific article; zbMATH DE number 4062374
- scientific article; zbMATH DE number 845714
- Finding predictive gene groups from microarray data
- High-dimensional additive modeling
- High-dimensional graphs and variable selection with the Lasso
- Lasso-type recovery of sparse representations for high-dimensional data
- Local operator theory, random matrices and Banach spaces
- Model Selection and Estimation in Regression with Grouped Variables
- On the conditions used to prove oracle results for the Lasso
- PARTIAL CORRELATION AND CONDITIONAL CORRELATION AS MEASURES OF CONDITIONAL INDEPENDENCE
- Regularization and Variable Selection Via the Elastic Net
- Rejoinder: One-step sparse estimates in nonconcave penalized likelihood models
- Relaxed Lasso
- Scaled sparse linear regression
- Simultaneous Regression Shrinkage, Variable Selection, and Supervised Clustering of Predictors with OSCAR
- Simultaneous analysis of Lasso and Dantzig selector
- Sparse regression with exact clustering
- Statistics for high-dimensional data. Methods, theory and applications
- The Adaptive Lasso and Its Oracle Properties
- The Lasso, correlated design, and improved oracle inequalities
- The sparse Laplacian shrinkage estimator for high-dimensional regression
- The sparsity and bias of the LASSO selection in high-dimensional linear regression
Cited in (29)
- Correlation and variable importance in random forests
- Hierarchical inference for genome-wide association studies: a view on methodology with software
- MCEN: a method of simultaneous variable selection and clustering for high-dimensional multinomial regression
- Robust grouped variable selection using distributionally robust optimization
- A Bayesian approach to multicollinearity and the simultaneous selection and clustering of predictors in linear regression
- Graph-based regularization for regression problems with alignment and highly correlated designs
- A component Lasso
- On the Use of Minimum Penalties in Statistical Learning
- Cluster feature selection in high-dimensional linear models
- Extensions of stability selection using subsamples of observations and covariates
- Discussion of "Correlated variables in regression: clustering and sparse estimation"
- Numerical characterization of support recovery in sparse regression with correlated design
- A sequential rejection testing method for high-dimensional regression with correlated variables
- Sequential knockoffs for continuous and categorical predictors: with application to a large psoriatic arthritis clinical trial pool
- Weak signals in high-dimensional regression: detection, estimation and prediction
- Semi-Standard Partial Covariance Variable Selection When Irrepresentable Conditions Fail
- Spatially relaxed inference on high-dimensional linear models
- Split Regularized Regression
- The cluster graphical Lasso for improved estimation of Gaussian graphical models
- Group linear algorithm with sparse principal decomposition: a variable selection and clustering method for generalized linear models
- Bayesian linear regression with sparse priors
- A general framework for estimation and inference from clusters of features
- scientific article; zbMATH DE number 6458363
- Modeling association between multivariate correlated outcomes and high-dimensional sparse covariates: the adaptive SVS method
- The trimmed Lasso: sparse recovery guarantees and practical optimization by the generalized soft-min penalty
- Bayesian latent factor on image regression with nonignorable missing data
- A clustering-based feature selection method for automatically generated relational attributes
- Evolution of high-frequency systematic trading: a performance-driven gradient boosting model
- A cluster elastic net for multivariate regression
MaRDI item Q394080