On \(b\)-bit min-wise hashing for large-scale regression and classification with sparse data
zbMATH Open: 1468.68170 · arXiv: 1308.1269 · MaRDI QID: Q4558503
Authors: Rajen D. Shah, Nicolai Meinshausen
Publication date: 22 November 2018
Abstract: Large-scale regression problems where both the number of variables, \(p\), and the number of observations, \(n\), may be large and in the order of millions or more, are becoming increasingly common. Typically the data are sparse: only a fraction of a percent of the entries in the design matrix are non-zero. Nevertheless, often the only computationally feasible approach is to perform dimension reduction to obtain a new design matrix with far fewer columns and then work with this compressed data. \(b\)-bit min-wise hashing (Li and König, 2011) is a promising dimension reduction scheme for sparse matrices which produces a set of random features such that regression on the resulting design matrix approximates a kernel regression with the resemblance kernel. In this work, we derive bounds on the prediction error of such regressions. For both linear and logistic models we show that the average prediction error vanishes asymptotically as long as \(q\|\beta^*\|_2^2/n \to 0\), where \(q\) is the average number of non-zero entries in each row of the design matrix and \(\beta^*\) is the coefficient of the linear predictor. We also show that ordinary least squares or ridge regression applied to the reduced data can in fact allow us to fit more flexible models. We obtain non-asymptotic prediction error bounds for interaction models and for models where an unknown row normalisation must be applied in order for the signal to be linear in the predictors.
Full work available at URL: https://arxiv.org/abs/1308.1269
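As a concrete illustration of the scheme described in the abstract, below is a minimal Python sketch of \(b\)-bit min-wise hashing. This is not the authors' code: the function name, the linear hash family standing in for random permutations, and the \(1/\sqrt{L}\) scaling are illustrative assumptions. Each row of a sparse binary design matrix is treated as the set of its non-zero column indices; for each of \(L\) random hash functions the lowest \(b\) bits of the minimum hash value over that set are recorded and one-hot encoded, yielding \(L \cdot 2^b\) random features.

```python
import numpy as np

def b_bit_minhash_features(X, L=64, b=2, seed=0):
    """Map a binary design matrix X (n x p) to b-bit min-wise hash
    features (n x L * 2**b), in the spirit of Li and Koenig (2011).

    Hypothetical sketch: a linear hash family modulo a prime is used
    as a cheap stand-in for random permutations of the column indices.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    prime = (1 << 31) - 1                                # Mersenne prime 2^31 - 1
    a = rng.integers(1, prime, size=L, dtype=np.int64)
    c = rng.integers(0, prime, size=L, dtype=np.int64)

    Z = np.zeros((n, L * (1 << b)))
    for i in range(n):
        nz = np.flatnonzero(X[i]).astype(np.int64)       # set of non-zero indices
        if nz.size == 0:                                 # leave all-zero rows as zero features
            continue
        for l in range(L):
            h = (a[l] * nz + c[l]) % prime               # hash each index in the set
            low = int(h.min()) & ((1 << b) - 1)          # lowest b bits of the min-hash
            Z[i, l * (1 << b) + low] = 1.0               # one-hot encode the b-bit value
    return Z / np.sqrt(L)                                # scaling is a convention choice

# For two rows with Jaccard (resemblance) similarity J, each of the L
# b-bit min-hashes collides with probability approximately J + (1 - J)/2**b,
# so the expected inner product of the unscaled feature vectors is roughly
# L * (J + (1 - J)/2**b). Linear or ridge regression on Z therefore
# approximates kernel regression with the resemblance kernel, as the
# abstract describes.
```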
Mathematics Subject Classification
- Classification and discrimination; cluster analysis (statistical aspects) (62H30)
- Learning and adaptive systems in artificial intelligence (68T05)
- Ridge regression; shrinkage estimators (Lasso) (62J07)
Cites Work
- Predictive learning via rule ensembles
- Least angle regression. (With discussion)
- Title not available
- Statistics for high-dimensional data. Methods, theory and applications.
- Random forests
- High-dimensional generalized linear models and the lasso
- Ridge Regression: Biased Estimation for Nonorthogonal Problems
- Boosting for high-dimensional linear models
- Extensions of Lipschitz mappings into a Hilbert space
- Title not available
- Kernel methods in machine learning
- On the equivalence between kernel quadrature rules and random feature expansions
- Title not available
- Faster least squares approximation
- Randomized Algorithms for Matrices and Data
- Greed is Good: Algorithmic Results for Sparse Approximation
- Universal classes of hash functions
- Title not available
- Hash kernels for structured data
- Title not available
- Title not available
- A proof for the positive definiteness of the Jaccard index matrix
- Randomized Sketches of Convex Programs With Sharp Guarantees
- Random projections for the nonnegative least-squares problem
- Iterative Hessian sketch: fast and accurate solution approximation for constrained least-squares
- Randomized sketches for kernels: fast and optimal nonparametric regression
- Title not available
Cited In (1)
Uses Software