A note on coding and standardization of categorical variables in (sparse) group Lasso regression

From MaRDI portal
Publication:146016

DOI: 10.48550/ARXIV.1805.06915
zbMATH Open: 1437.62276
arXiv: 1805.06915
OpenAlex: W2971287324
Wikidata: Q127301634
Scholia: Q127301634
MaRDI QID: Q146016
FDO: Q146016


Authors: Felicitas J. Detmer, Martin Slawski, Juan R. Cebral


Publication date: 17 May 2018

Published in: Journal of Statistical Planning and Inference

Abstract: Categorical regressor variables are usually handled by introducing a set of indicator variables, and imposing a linear constraint to ensure identifiability in the presence of an intercept, or equivalently, using one of various coding schemes. As proposed in Yuan and Lin [J. R. Statist. Soc. B, 68 (2006), 49-67], the group lasso is a natural and computationally convenient approach to perform variable selection in settings with categorical covariates. As pointed out by Simon and Tibshirani [Stat. Sin., 22 (2011), 983-1001], "standardization" by means of block-wise orthonormalization of column submatrices each corresponding to one group of variables can substantially boost performance. In this note, we study the aspect of standardization for the special case of categorical predictors in detail. The main result is that orthonormalization is not required; column-wise scaling of the design matrix followed by re-scaling and centering of the coefficients is shown to have exactly the same effect. Similar reductions can be achieved in the case of interactions. The extension to the so-called sparse group lasso, which additionally promotes within-group sparsity, is considered as well. The importance of proper standardization is illustrated via extensive simulations.
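The contrast between block-wise orthonormalization and column-wise scaling mentioned in the abstract can be illustrated with a small numerical sketch. It is not the paper's procedure or proof. It only checks one elementary fact, under the assumption of full indicator (one-hot) coding with no reference level dropped and no intercept handling: the indicator columns of a single categorical variable are already mutually orthogonal, so dividing each column by its norm yields an orthonormal block that spans the same column space as a QR-based orthonormalization. The function names `one_hot`, `scale_columns` and `orthonormalize_block`, and the toy data, are hypothetical.

```python
import numpy as np

def one_hot(levels, n_levels):
    """Full indicator coding: one column per level, no reference level dropped."""
    Z = np.zeros((len(levels), n_levels))
    Z[np.arange(len(levels)), levels] = 1.0
    return Z

def scale_columns(Z):
    """Column-wise scaling: divide each column by its Euclidean norm."""
    return Z / np.linalg.norm(Z, axis=0)

def orthonormalize_block(Z):
    """Block-wise orthonormalization via the reduced QR decomposition."""
    Q, _ = np.linalg.qr(Z)
    return Q

# Toy categorical variable with three levels (hypothetical data).
levels = np.array([0, 1, 1, 2, 2, 2, 0, 1])
Z = one_hot(levels, 3)

Z_scaled = scale_columns(Z)
Q = orthonormalize_block(Z)

# Gram matrix of the scaled block: identity if the columns are orthonormal.
gram_scaled = Z_scaled.T @ Z_scaled

# Orthogonal projectors onto the column space of each block.
P_scaled = Z_scaled @ Z_scaled.T
P_q = Q @ Q.T

print(np.allclose(gram_scaled, np.eye(3)))
print(np.allclose(P_scaled, P_q))
```

Both blocks are orthonormal bases of the same subspace, which is why their projectors coincide. How this carries over to constrained coding schemes, interactions, and the re-scaling and centering of coefficients is the subject of the paper itself and is not reproduced here.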


Full work available at URL: https://arxiv.org/abs/1805.06915








