A note on coding and standardization of categorical variables in (sparse) group Lasso regression
From MaRDI portal
Abstract: Categorical regressor variables are usually handled by introducing a set of indicator variables, and imposing a linear constraint to ensure identifiability in the presence of an intercept, or equivalently, using one of various coding schemes. As proposed in Yuan and Lin [J. R. Statist. Soc. B, 68 (2006), 49-67], the group lasso is a natural and computationally convenient approach to perform variable selection in settings with categorical covariates. As pointed out by Simon and Tibshirani [Stat. Sin., 22 (2012), 983-1001], "standardization" by means of block-wise orthonormalization of column submatrices each corresponding to one group of variables can substantially boost performance. In this note, we study the aspect of standardization for the special case of categorical predictors in detail. The main result is that orthonormalization is not required; column-wise scaling of the design matrix followed by re-scaling and centering of the coefficients is shown to have exactly the same effect. Similar reductions can be achieved in the case of interactions. The extension to the so-called sparse group lasso, which additionally promotes within-group sparsity, is considered as well. The importance of proper standardization is illustrated via extensive simulations.
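The main result can be checked numerically in the simplest setting. The sketch below is not the authors' code: it assumes a single categorical predictor with K levels (all observed) in full indicator coding, an unpenalized intercept, and the objective (1/2n)||y - b0 - Zγ||² plus λ times a group penalty. The helper names (`indicator_matrix`, `fit_scaled`, `fit_orthonormalized`) are hypothetical. `fit_orthonormalized` follows the block-wise orthonormalization of Simon and Tibshirani (center the group's columns, replace them with an orthogonal basis). `fit_scaled` only divides indicator column k by √p_k, where p_k is the proportion of observations at level k, and rescales the solution afterwards. Because indicator columns have disjoint supports, the scaled columns are orthogonal, so both single-group problems reduce to closed-form group soft-thresholding and should give the same fitted values up to rounding. The script also checks the identity ||PZγ||² = n Σ_k p_k (γ_k - Σ_j p_j γ_j)², with P the centering matrix, which is what makes the two penalties coincide.

```python
import numpy as np


def indicator_matrix(levels: np.ndarray, n_levels: int) -> np.ndarray:
    """Full indicator (one-hot) coding: one column per level, no reference dropped."""
    Z = np.zeros((levels.size, n_levels))
    Z[np.arange(levels.size), levels] = 1.0
    return Z


def group_soft_threshold(z: np.ndarray, lam: float) -> np.ndarray:
    """Proximal map of lam * ||.||_2: shrinks the whole vector, zeroing it if ||z|| <= lam."""
    norm = np.linalg.norm(z)
    if norm <= lam:
        return np.zeros_like(z)
    return (1.0 - lam / norm) * z


def fit_scaled(y: np.ndarray, Z: np.ndarray, lam: float):
    """Column-scaling route (hypothetical helper).

    Divides indicator column k by sqrt(p_k), solves
        min_{b0, theta} (1/2n) ||y - b0 - Zs theta||^2 + lam * ||theta||_2
    in closed form (Zs has orthogonal columns with squared norm n), then
    rescales theta back to the indicator scale.
    """
    n = y.size
    p = Z.mean(axis=0)                    # level proportions p_k (all assumed > 0)
    Zs = Z / np.sqrt(p)                   # column-wise scaling: Zs.T @ Zs = n * I
    yc = y - y.mean()
    theta = group_soft_threshold(Zs.T @ yc / n, lam)
    gamma = theta / np.sqrt(p)            # rescaled coefficients, one per level
    b0 = y.mean() - p @ gamma             # intercept; p @ gamma is ~0 here
    return b0 + Z @ gamma, gamma, p


def fit_orthonormalized(y: np.ndarray, Z: np.ndarray, lam: float) -> np.ndarray:
    """Block-orthonormalization route (hypothetical helper).

    Centers the group's columns, replaces them by an orthogonal basis Q of
    their span with Q.T @ Q = n * I, and solves the same single-group problem.
    """
    n = y.size
    PZ = Z - Z.mean(axis=0)               # centering removes the intercept direction
    U, s, _ = np.linalg.svd(PZ, full_matrices=False)
    r = int(np.sum(s > 1e-10 * s[0]))     # rank K - 1 when all K levels occur
    Q = U[:, :r] * np.sqrt(n)
    yc = y - y.mean()
    eta = group_soft_threshold(Q.T @ yc / n, lam)
    return y.mean() + Q @ eta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = [100, 60, 30, 10]            # unbalanced levels, all observed
    K, n = len(counts), sum(counts)
    levels = rng.permutation(np.repeat(np.arange(K), counts))
    Z = indicator_matrix(levels, K)
    y = np.array([0.0, 1.0, -0.5, 2.0])[levels] + rng.normal(size=n)

    fit_a, gamma, p = fit_scaled(y, Z, lam=0.3)
    fit_b = fit_orthonormalized(y, Z, lam=0.3)
    print("max |fit difference|:", np.max(np.abs(fit_a - fit_b)))
    print("weighted mean of gamma:", p @ gamma)

    # Penalty identity behind the equivalence, for an arbitrary g:
    # ||P Z g||^2 = n * sum_k p_k (g_k - p @ g)^2
    g = rng.normal(size=K)
    lhs = np.sum((Z @ g - (Z @ g).mean()) ** 2)
    rhs = n * np.sum(p * (g - p @ g) ** 2)
    print("identity gap:", lhs - rhs)
```

All three printed quantities should be at the level of floating-point rounding; in this closed form the rescaled coefficients already satisfy Σ_k p_k γ_k = 0. The closed form relies on there being a single group; with several groups, updates of this kind would be cycled over the groups (block coordinate descent). The group-size weight is omitted because with one group it only rescales λ.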
Cites work
- Scientific article; zbMATH DE number 65765 (title not available)
- Scientific article; zbMATH DE number 961607 (title not available)
- A unified framework for high-dimensional analysis of \(M\)-estimators with decomposable regularizers
- Model Selection and Estimation in Regression with Grouped Variables
- Oracle inequalities and optimal inference under group sparsity
- Standardization and the group lasso penalty
- The Group Lasso for Logistic Regression
- The benefit of group sparsity