WONDER: weighted one-shot distributed ridge regression in high dimensions
Publication: Q4969115
zbMATH Open: 1498.68232 · arXiv: 1903.09321 · MaRDI QID: Q4969115 · FDO: Q4969115
Authors: Edgar Dobriban, Yue Sheng
Publication date: 5 October 2020
Abstract: In many areas, practitioners need to analyze large datasets that challenge conventional single-machine computing. To scale up data analysis, distributed and parallel computing approaches are increasingly needed. Here we study a fundamental and highly important problem in this area: how to do ridge regression in a distributed computing environment? Ridge regression is an extremely popular method for supervised learning, has several optimality properties, and is thus important to study. We study one-shot methods that construct weighted combinations of ridge regression estimators computed on each machine. By analyzing the mean squared error in a high-dimensional random-effects model where each predictor has a small effect, we discover several new phenomena.
1. Infinite-worker limit: The distributed estimator works well for very large numbers of machines, a phenomenon we call the "infinite-worker limit".
2. Optimal weights: The optimal weights for combining local estimators sum to more than unity, due to the downward bias of ridge. Thus, all averaging methods are suboptimal.
We also propose a new Weighted ONe-shot DistributEd Ridge regression (WONDER) algorithm. We test WONDER in simulation studies and using the Million Song Dataset as an example. There it can save at least 100x in computation time, while nearly preserving test accuracy.
Full work available at URL: https://arxiv.org/abs/1903.09321
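The one-shot scheme the abstract describes (each machine fits ridge regression on its local data, then a center combines the local estimates with scalar weights) can be sketched in a few lines of NumPy. This is a hypothetical illustration, not the authors' implementation: the toy random-effects data and all names here are assumptions, and the plain-average weights shown (summing to one) are exactly the suboptimal choice the paper improves on with weights that sum to more than one.

```python
import numpy as np

def local_ridge(X, y, lam):
    """Ridge estimator computed on one machine's local data."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def one_shot_weighted_ridge(X_parts, y_parts, lam, weights):
    """One round of communication: each machine sends its local ridge
    estimate; the center returns the weighted combination."""
    estimates = [local_ridge(X, y, lam) for X, y in zip(X_parts, y_parts)]
    return sum(w * b for w, b in zip(weights, estimates))

# Toy data from a random-effects model: every coordinate of beta has a
# small effect, matching the regime analyzed in the paper.
rng = np.random.default_rng(0)
n, p, k = 600, 20, 3                      # samples, dimension, machines
beta = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

X_parts = np.array_split(X, k)
y_parts = np.array_split(y, k)

# Naive averaging uses weights summing to one; the paper's optimal
# weights sum to more than one to compensate for ridge's downward bias
# (their formula involves random-matrix quantities not reproduced here).
b_avg = one_shot_weighted_ridge(X_parts, y_parts, lam=1.0,
                                weights=[1.0 / k] * k)
```

Only the local estimates (one p-vector per machine) cross the network, which is what makes the method one-shot and communication-efficient.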
Recommendations
- Statistical aspects of big data and data science (62R07)
- Learning and adaptive systems in artificial intelligence (68T05)
- Ridge regression; shrinkage estimators (Lasso) (62J07)
- Random matrices (probabilistic aspects) (60B20)
Cites Work
- Distributed optimization and statistical learning via the alternating direction method of multipliers
- Spectral analysis of large dimensional random matrices
- Lectures on the Combinatorics of Free Probability
- Spectral convergence for a general class of random matrices
- On high-dimensional misspecified mixed model analysis in genome-wide association study
- Random matrix theory in statistics: a review
- Free Random Variables
- Random matrix methods for wireless communications
- Deterministic equivalents for certain functionals of large random matrices
- Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates
- Variance estimation in high-dimensional linear models
- A partially linear framework for massive heterogeneous data
- A split-and-conquer approach for analysis of extraordinarily large data
- Large sample covariance matrices and high-dimensional data analysis
- A Deterministic Equivalent for the Analysis of Correlated MIMO Multiple Access Channels
- REML estimation: Asymptotic behavior and related topics
- Distributed inference for linear support vector machine
- Ridge regression and asymptotic minimax estimation over spheres of growing dimension
- Distributed learning with regularized least squares
- Flexible results for quadratic forms with applications to variance components estimation
- Learning theory of distributed regression with bias corrected regularization kernel network
- Communication-efficient sparse regression
- Distributed testing and estimation under sparse high dimensional models
- Distributed inference for quantile regression processes
- Communication-efficient algorithms for statistical optimization
- Communication-efficient distributed statistical inference
- Divide and conquer in nonstandard problems and the super-efficiency phenomenon
- Computational limits of a distributed algorithm for smoothing spline
- On the optimality of averaging in distributed statistical learning
- High-dimensional asymptotics of prediction: ridge regression and classification
- Algorithmic aspects of parallel data processing
- Communication lower bounds for statistical estimation problems via a distributed data processing inequality
- Parallel Programming
- Distributed estimation of principal eigenspaces
- Quantile regression under memory constraint
Cited In (9)
- High-Dimensional Analysis of Double Descent for Linear Regression with Random Projections
- Canonical thresholding for nonsparse high-dimensional linear regression
- Distributed estimation with empirical likelihood
- Distributed estimation and inference for semiparametric binary response models
- Distributed linear regression by averaging
- WONDER
- Estimation and inference in sparse multivariate regression and conditional Gaussian graphical models under an unbalanced distributed setting