The composite absolute penalties family for grouped and hierarchical variable selection

DOI: 10.1214/07-AOS584; zbMath: 1369.62164; arXiv: 0909.0411; OpenAlex: W1994309289; MaRDI QID: Q1043749

Publication date: 9 December 2009

Published in: The Annals of Statistics

Abstract: Extracting useful information from high-dimensional data is an important focus of today's statistical research and practice. Penalized loss function minimization has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and sparsity, the $L_1$-penalized squared error minimization method Lasso has been popular in regression models and beyond. In this paper, we combine different norms including $L_1$ to form an intelligent penalty in order to add side information to the fitting of a regression or classification model to obtain reasonable estimates. Specifically, we introduce the Composite Absolute Penalties (CAP) family, which allows given grouping and hierarchical relationships between the predictors to be expressed. CAP penalties are built by defining groups and combining the properties of norm penalties at the across-group and within-group levels. Grouped selection occurs for nonoverlapping groups. Hierarchical variable selection is reached by defining groups with particular overlapping patterns. We propose using the BLASSO and cross-validation to compute CAP estimates in general. For a subfamily of CAP estimates involving only the $L_1$ and $L_{\infty}$ norms, we introduce the iCAP algorithm to trace the entire regularization path for the grouped selection problem. Within this subfamily, unbiased estimates of the degrees of freedom (df) are derived so that the regularization parameter is selected without cross-validation. CAP is shown to improve on the predictive performance of the LASSO in a series of simulated experiments, including cases with $p \gg n$ and possibly mis-specified groupings. When the complexity of a model is properly calculated, iCAP is seen to be parsimonious in the experiments.
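
Illustrative sketch (not part of the record): the abstract describes CAP penalties as a within-group norm per group, combined across groups by a second norm. The short Python example below assumes one common way of writing this, namely the sum over groups of the within-group $L_{\gamma_k}$ norm raised to an across-group exponent $\gamma_0$; the abstract does not state the formula, so this form, along with the function names `lp_norm` and `cap_penalty` and the example coefficients and groups, are assumptions made for illustration only.

```python
import math


def lp_norm(values, gamma):
    """L_gamma norm of a sequence; gamma may be math.inf for the max norm."""
    abs_vals = [abs(v) for v in values]
    if not abs_vals:
        return 0.0
    if math.isinf(gamma):
        return max(abs_vals)
    return sum(a ** gamma for a in abs_vals) ** (1.0 / gamma)


def cap_penalty(beta, groups, within_norms, across_norm=1.0):
    """Assumed CAP form: sum over groups of (within-group norm) ** across_norm.

    beta         -- list of coefficients
    groups       -- list of index lists (groups may overlap)
    within_norms -- one norm exponent per group (math.inf allowed)
    across_norm  -- exponent applied to each group norm before summing
    """
    total = 0.0
    for indices, gamma_k in zip(groups, within_norms):
        group_norm = lp_norm([beta[i] for i in indices], gamma_k)
        total += group_norm ** across_norm
    return total


beta = [3.0, -4.0, 0.0, 2.0]

# Nonoverlapping groups with L_inf inside each group and L_1 across groups.
nonoverlapping = [[0, 1], [2, 3]]
grouped = cap_penalty(beta, nonoverlapping, [math.inf, math.inf])

# Overlapping (nested) groups, the kind of pattern used to encode a hierarchy.
nested = [[0, 1, 2, 3], [2, 3]]
hierarchical = cap_penalty(beta, nested, [math.inf, math.inf])

# Singleton groups with L_1 across groups reduce to the plain L_1 penalty.
singletons = [[0], [1], [2], [3]]
lasso_like = cap_penalty(beta, singletons, [1, 1, 1, 1])

# L_2 inside each group and L_1 across groups: a group-lasso-style penalty.
l2_grouped = cap_penalty(beta, nonoverlapping, [2, 2])
```

Under these assumptions, the same function covers the grouped and hierarchical cases just by changing the group definitions, which mirrors how the abstract says the two kinds of selection arise.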

Full work available at URL: https://arxiv.org/abs/0909.0411

zbMATH Keywords

linear regression; variable selection; penalized regression; hierarchical models; coefficient paths; grouped selection

Mathematics Subject Classification ID

Ridge regression; shrinkage estimators (Lasso) (62J07)

Related Items (first 100 items shown)

Nonnegative-Lasso and application in index tracking ⋮ P-splines with an $\ell_1$ penalty for repeated measures ⋮ Nonnegative adaptive Lasso for ultra-high dimensional regression models and a two-stage method applied in financial modeling ⋮ Tuning parameter selection in sparse regression modeling ⋮ An analysis of penalized interaction models ⋮ Linearized alternating direction method of multipliers for sparse group and fused Lasso models ⋮ Joint estimation of precision matrices in heterogeneous populations ⋮ Biclustering via structured regularized matrix decomposition ⋮ An auxiliary function approach for Lasso in music composition using cellular automata ⋮ Identification of homogeneous and heterogeneous variables in pooled cohort studies ⋮ Structured Sparsity: Discrete and Convex Approaches ⋮ Modeling gene-covariate interactions in sparse regression with group structure for genome-wide association studies ⋮ Objective Bayesian group variable selection for linear model ⋮ Robust grouped variable selection using distributionally robust optimization ⋮ Robust shrinkage estimation and selection for functional multiple linear model through LAD loss ⋮ Data shared Lasso: a novel tool to discover uplift ⋮ Robust groupwise least angle regression ⋮ A lasso for hierarchical interactions ⋮ Variable selection and structure identification for varying coefficient Cox models ⋮ Graph structured sparse subset selection ⋮ Estimating the health effects of environmental mixtures using Bayesian semiparametric regression and sparsity inducing priors ⋮ Generalized Kalman smoothing: modeling and algorithms ⋮ Regression with outlier shrinkage ⋮ Grouping strategies and thresholding for high dimensional linear models ⋮ Discussion about ``Grouping strategies and thresholding for high dimensional linear models'' ⋮ Model selection for functional linear regression with hierarchical structure ⋮ A distributed algorithm for fitting generalized additive models ⋮ Variable selection for generalized linear mixed models by $L_1$-penalized estimation ⋮ Learning with optimal interpolation norms ⋮ The LASSO on latent indices for regression modeling with ordinal categorical predictors ⋮ Sparse and low-rank matrix regularization for learning time-varying Markov networks ⋮ Structural properties of affine sparsity constraints ⋮ Support union recovery in high-dimensional multivariate regression ⋮ Fast projections onto mixed-norm balls with applications ⋮ Coordinate ascent for penalized semiparametric regression on high-dimensional panel count data ⋮ A unified formulation for generalized oilfield development optimization ⋮ Simultaneous estimation and factor selection in quantile regression via adaptive sup-norm regularization ⋮ Regularizers for structured sparsity ⋮ An alternating determination-optimization approach for an additive multi-index model ⋮ Hierarchical sparse modeling: a choice of two group Lasso formulations ⋮ Theoretical properties of the overlapping groups Lasso ⋮ Smoothing proximal gradient method for general structured sparse regression ⋮ Sparsity with sign-coherent groups of variables via the cooperative-Lasso ⋮ Variable selection for sparse Dirichlet-multinomial regression with an application to microbiome data analysis ⋮ Forest Garrote ⋮ Minimax sparse principal subspace estimation in high dimensions ⋮ Composite kernel learning ⋮ Proximal methods for the latent group lasso penalty ⋮ Oracle inequalities and optimal inference under group sparsity ⋮ Variable selection techniques after multiple imputation in high-dimensional data ⋮ A two-stage regularization method for variable selection and forecasting in high-order interaction model ⋮ Concave group methods for variable selection and estimation in high-dimensional varying coefficient models ⋮ On Degrees of Freedom of Projection Estimators With Applications to Multivariate Nonparametric Regression ⋮ Fixed and Random Effects Selection in Mixed Effects Models ⋮ Additive model selection ⋮ Logistic regression: from art to science ⋮ Learning with tensors: a framework based on convex optimization and spectral regularization ⋮ Structured, Sparse Aggregation ⋮ Factor Selection and Structural Identification in the Interaction ANOVA Model ⋮ Hierarchically penalized additive hazards model with diverging number of parameters ⋮ Efficient primal-dual fixed point algorithms with dynamic stepsize for composite convex optimization problems ⋮ Structured variable selection and estimation ⋮ Multinomial logit models with implicit variable selection ⋮ Adaptive group Lasso for high-dimensional generalized linear models ⋮ Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer ⋮ A novel T-S fuzzy systems identification with block structured sparse representation ⋮ Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping ⋮ Truncated estimation in functional generalized linear regression models ⋮ A flexible shrinkage operator for fussy grouped variable selection ⋮ Low Complexity Regularization of Linear Inverse Problems ⋮ Sparse regression using mixed norms ⋮ Bayesian indicator variable selection to incorporate hierarchical overlapping group structure in multi-omics applications ⋮ OR Forum—An Algorithmic Approach to Linear Regression ⋮ On integer and MPCC representability of affine sparsity ⋮ Model Selection for High-Dimensional Quadratic Regression via Regularization ⋮ Bi-selection in the high-dimensional additive hazards regression model ⋮ Fast global convergence of gradient methods for high-dimensional statistical recovery ⋮ Tight conditions for consistency of variable selection in the context of high dimensionality ⋮ Robust group non-convex estimations for high-dimensional partially linear models ⋮ Bayesian adaptive Lasso ⋮ The structured elastic net for quantile regression and support vector classification ⋮ Another look at linear programming for feature selection via methods of regularization ⋮ Multi-species distribution modeling using penalized mixture of regressions ⋮ Multiple Response Regression for Gaussian Mixture Models with Known Labels ⋮ AIC for the group Lasso in generalized linear models ⋮ Interaction Screening for Ultrahigh-Dimensional Data ⋮ Penalized Cox's proportional hazards model for high-dimensional survival data with grouped predictors ⋮ BIVAS: A Scalable Bayesian Method for Bi-Level Variable Selection With Applications ⋮ Variable selection in functional regression models: a review ⋮ The adaptive BerHu penalty in robust regression ⋮ Rank-based group variable selection ⋮ Investigating consumers' store-choice behavior via hierarchical variable selection ⋮ An easy-to-implement hierarchical standardization for variable selection under strong heredity constraint ⋮ Toward a theory of molecular computing ⋮ Sparse hierarchical regression with polynomials ⋮ Stochastic relaxed inertial forward-backward-forward splitting for monotone inclusions in Hilbert spaces ⋮ Sparse Pairwise Likelihood Estimation for Multivariate Longitudinal Mixed Models ⋮ Information criteria bias correction for group selection ⋮ Structured estimation for the nonparametric Cox model ⋮ Network-Based Penalized Regression With Application to Genomic Data
