Better subset regression

DOI: 10.1093/BIOMET/AST041
MaRDI QID: Q5410312

Publication date 16 April 2014

Published in Biometrika

Full work available at URL https://arxiv.org/abs/1212.0634

Keywords: variable selection; EM algorithm; combinatorial optimization; sure screening property; orthogonal designs; best subset regression

Mathematics Subject Classification ID

Linear regression; mixed models (62J05)
Combinatorial optimization (90C27)

Abstract: To find efficient screening methods for high dimensional linear regression models, this paper studies the relationship between model fitting and screening performance. Under a sparsity assumption, we show that, in a general asymptotic setting, a subset that includes the true submodel always yields a smaller residual sum of squares (i.e., has better model fitting) than any subset that does not. This indicates that, for screening important variables, we could follow a "better fitting, better screening" rule, i.e., pick a subset that has better model fitting. To seek such a subset, we consider the optimization problem associated with best subset regression. An EM algorithm, called orthogonalizing subset screening, and its accelerated version are proposed for searching for the best subset. Although the two algorithms cannot guarantee that the subset they yield is the best, their monotonicity property ensures that the subset has better model fitting than initial subsets generated by popular screening methods, and thus the subset can have better screening performance asymptotically. Simulation results show that our methods are very competitive in high dimensional variable screening even for finite sample sizes.
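The monotonicity idea in the abstract can be illustrated with a small sketch. The code below is not taken from the paper: it assumes a standard majorization-style update (a gradient step of size 1/gamma followed by keeping the d largest coefficients, often called iterative hard thresholding), which is one common way to obtain a residual sum of squares that never increases under a subset-size constraint. The function names (`rss`, `hard_threshold`, `sparse_descent`), the simulated data, and the marginal-correlation initial subset are all hypothetical choices made for illustration.

```python
import numpy as np

def rss(X, y, beta):
    """Residual sum of squares of the linear fit X @ beta."""
    r = y - X @ beta
    return float(r @ r)

def hard_threshold(u, d):
    """Keep the d largest-magnitude entries of u and zero the rest."""
    out = np.zeros_like(u)
    keep = np.argsort(np.abs(u))[-d:]
    out[keep] = u[keep]
    return out

def sparse_descent(X, y, d, beta0, n_iter=200):
    """Assumed update rule (not confirmed to be the paper's algorithm):
    gradient step with step size 1/gamma, then hard thresholding to d terms.
    With gamma >= largest eigenvalue of X'X, the RSS cannot increase."""
    gamma = np.linalg.norm(X, 2) ** 2  # squared spectral norm of X
    beta = hard_threshold(beta0, d)
    history = [rss(X, y, beta)]
    for _ in range(n_iter):
        u = beta + X.T @ (y - X @ beta) / gamma
        beta = hard_threshold(u, d)
        history.append(rss(X, y, beta))
    return beta, history

# Simulated sparse high dimensional data (illustrative only).
rng = np.random.default_rng(0)
n, p, d = 50, 200, 5
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 2.5]
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# Initial subset: the d variables with the largest marginal |x_j' y|,
# refitted by least squares on that subset.
top = np.argsort(np.abs(X.T @ y))[-d:]
init = np.zeros(p)
init[top] = np.linalg.lstsq(X[:, top], y, rcond=None)[0]

beta_hat, history = sparse_descent(X, y, d, init)
selected = np.flatnonzero(beta_hat)
print("initial RSS:", history[0], "final RSS:", history[-1])
print("selected variables:", selected)
```

Under these assumptions the final subset fits at least as well as the initial one, which is the property the "better fitting, better screening" rule relies on; whether it contains the true variables in a given run is not guaranteed by this sketch.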

Cited in: 8

Uses software: UCI
