Correlated variables in regression: clustering and sparse estimation
From MaRDI portal
Keywords: Lasso; variable selection; high-dimensional inference; group Lasso; hierarchical clustering; oracle inequality; variable screening; canonical correlations
Mathematics Subject Classification
- Computational methods for problems pertaining to statistics (62-08)
- Classification and discrimination; cluster analysis (statistical aspects) (62H30)
- Linear regression; mixed models (62J05)
- Measures of association (correlation, canonical correlation, etc.) (62H20)
- Ridge regression; shrinkage estimators (Lasso) (62J07)
Abstract: We consider estimation in a high-dimensional linear model with strongly correlated variables. We propose to cluster the variables first and then apply sparse estimation, such as the Lasso on cluster representatives or the group Lasso based on the cluster structure. For the first step, we present a novel bottom-up agglomerative clustering algorithm based on canonical correlations, and we show that it finds an optimal solution and is statistically consistent. We also present theoretical arguments that canonical-correlation-based clustering leads to a better-posed compatibility constant for the design matrix, which ensures identifiability and an oracle inequality for the group Lasso. Furthermore, we discuss circumstances in which using cluster representatives with the Lasso as the subsequent estimator leads to improved prediction and variable detection. We complement the theoretical analysis with various empirical results.
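The two-step idea in the abstract (cluster the variables, then run the Lasso on cluster representatives) can be illustrated with a minimal pure-Python sketch. This is not the authors' algorithm: the linkage below uses the largest pairwise sample correlation between two clusters as a simple stand-in for the canonical-correlation criterion, the representative is taken to be the within-cluster mean, and the Lasso is solved by plain coordinate descent. All function names, the threshold 0.9, and the penalty 0.1 are assumptions made for illustration.

```python
from math import sqrt

def center(col):
    """Subtract the mean from a column (list of floats)."""
    m = sum(col) / len(col)
    return [v - m for v in col]

def corr(a, b):
    """Pearson sample correlation of two columns."""
    a, b = center(a), center(b)
    num = sum(x * y for x, y in zip(a, b))
    den = sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0

def cluster_variables(cols, threshold):
    """Bottom-up agglomerative clustering of column indices.
    Stand-in linkage (assumption): max pairwise correlation between clusters."""
    clusters = [[j] for j in range(len(cols))]

    def linkage(g, h):
        return max(corr(cols[i], cols[j]) for i in g for j in h)

    while len(clusters) > 1:
        pairs = [(linkage(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        best, a, b = max(pairs)
        if best < threshold:          # no pair is correlated enough to merge
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters

def representatives(cols, clusters):
    """One column per cluster: the mean of its member columns (assumption)."""
    n = len(cols[0])
    return [[sum(cols[j][i] for j in g) / len(g) for i in range(n)]
            for g in clusters]

def soft_threshold(v, lam):
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

def lasso_cd(cols, y, lam, n_sweeps=200):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1,
    with columns and response centered (no intercept is returned)."""
    n = len(y)
    cols = [center(c) for c in cols]
    resid = center(y)
    beta = [0.0] * len(cols)
    for _ in range(n_sweeps):
        for j, c in enumerate(cols):
            norm = sum(v * v for v in c) / n
            if norm == 0.0:
                continue
            # correlation of column j with the partial residual
            rho = sum(v * (r + beta[j] * v) for v, r in zip(c, resid)) / n
            new = soft_threshold(rho, lam) / norm
            resid = [r - (new - beta[j]) * v for v, r in zip(c, resid)]
            beta[j] = new
    return beta

# Toy design: x0 and x1 are nearly identical, x2 is orthogonal to both.
z = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
w = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
x2 = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]
x0 = [a + 0.1 * b for a, b in zip(z, w)]
x1 = [a - 0.1 * b for a, b in zip(z, w)]
y = [2.0 * a for a in z]              # response depends on the shared signal z

clusters = cluster_variables([x0, x1, x2], threshold=0.9)
beta = lasso_cd(representatives([x0, x1, x2], clusters), y, lam=0.1)
print(clusters, beta)
```

In this toy design the two nearly collinear columns end up in one cluster, so the Lasso sees a single representative for them rather than having to choose between two almost identical predictors.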
Recommendations
- Cluster feature selection in high-dimensional linear models
- Supervised clustering of variables
- A Bayesian approach to multicollinearity and the simultaneous selection and clustering of predictors in linear regression
- scientific article; zbMATH DE number 6458363
- Simultaneous Regression Shrinkage, Variable Selection, and Supervised Clustering of Predictors with OSCAR
Cites work
- scientific article; zbMATH DE number 5957408
- scientific article; zbMATH DE number 4062374
- scientific article; zbMATH DE number 845714
- Finding predictive gene groups from microarray data
- High-dimensional additive modeling
- High-dimensional graphs and variable selection with the Lasso
- Lasso-type recovery of sparse representations for high-dimensional data
- Local operator theory, random matrices and Banach spaces.
- Model Selection and Estimation in Regression with Grouped Variables
- On the conditions used to prove oracle results for the Lasso
- PARTIAL CORRELATION AND CONDITIONAL CORRELATION AS MEASURES OF CONDITIONAL INDEPENDENCE
- Regularization and Variable Selection Via the Elastic Net
- Rejoinder: One-step sparse estimates in nonconcave penalized likelihood models
- Relaxed Lasso
- Scaled sparse linear regression
- Simultaneous Regression Shrinkage, Variable Selection, and Supervised Clustering of Predictors with OSCAR
- Simultaneous analysis of Lasso and Dantzig selector
- Sparse regression with exact clustering
- Statistics for high-dimensional data. Methods, theory and applications.
- The Adaptive Lasso and Its Oracle Properties
- The Lasso, correlated design, and improved oracle inequalities
- The sparse Laplacian shrinkage estimator for high-dimensional regression
- The sparsity and bias of the LASSO selection in high-dimensional linear regression
Cited in (29)
- Discussion of "Correlated variables in regression: clustering and sparse estimation"
- On the Use of Minimum Penalties in Statistical Learning
- Numerical characterization of support recovery in sparse regression with correlated design
- Hierarchical inference for genome-wide association studies: a view on methodology with software
- Bayesian latent factor on image regression with nonignorable missing data
- A general framework for estimation and inference from clusters of features
- Sequential knockoffs for continuous and categorical predictors: with application to a large psoriatic arthritis clinical trial pool
- scientific article; zbMATH DE number 6458363
- The trimmed Lasso: sparse recovery guarantees and practical optimization by the generalized soft-min penalty
- The cluster graphical Lasso for improved estimation of Gaussian graphical models
- Evolution of high-frequency systematic trading: a performance-driven gradient boosting model
- Bayesian linear regression with sparse priors
- A clustering-based feature selection method for automatically generated relational attributes
- A component Lasso
- Cluster feature selection in high-dimensional linear models
- Split Regularized Regression
- A Bayesian approach to multicollinearity and the simultaneous selection and clustering of predictors in linear regression
- Graph-based regularization for regression problems with alignment and highly correlated designs
- Weak signals in high-dimensional regression: detection, estimation and prediction
- A sequential rejection testing method for high-dimensional regression with correlated variables
- Group linear algorithm with sparse principal decomposition: a variable selection and clustering method for generalized linear models
- Modeling association between multivariate correlated outcomes and high-dimensional sparse covariates: the adaptive SVS method
- Correlation and variable importance in random forests
- Extensions of stability selection using subsamples of observations and covariates
- Robust grouped variable selection using distributionally robust optimization
- Semi-Standard Partial Covariance Variable Selection When Irrepresentable Conditions Fail
- MCEN: a method of simultaneous variable selection and clustering for high-dimensional multinomial regression
- Spatially relaxed inference on high-dimensional linear models
- A cluster elastic net for multivariate regression