Divide and conquer local average regression
Publication: Q527072
DOI: 10.1214/17-EJS1265
zbMATH Open: 1362.62085
arXiv: 1601.06239
OpenAlex: W2963200104
MaRDI QID: Q527072
Shaobo Lin, Xiangyu Chang, Yao Wang
Publication date: 16 May 2017
Published in: Electronic Journal of Statistics
Abstract: The divide and conquer strategy, which breaks a massive data set into a series of manageable data blocks and then combines the independent results from the blocks into a final decision, has been recognized as a state-of-the-art method for overcoming the challenges of massive data analysis. In this paper, we merge the divide and conquer strategy with local average regression methods to infer the regression relationship of input-output pairs from a massive data set. After theoretically analyzing the pros and cons, we find that although divide and conquer local average regression can reach the optimal learning rate, the restriction on the number of data blocks is rather strong, making the method feasible only for a small number of blocks. We then propose two variants to lessen (or remove) this restriction. Our results show that these variants achieve the optimal learning rate under a much milder restriction (or without any such restriction). Extensive experimental studies are carried out to verify our theoretical assertions.
Full work available at URL: https://arxiv.org/abs/1601.06239
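The core scheme the abstract describes — split the sample into blocks, fit a local average (e.g. Nadaraya-Watson) estimate on each block, and average the block predictions — can be sketched as follows. This is a minimal illustration under assumed choices (Gaussian kernel, equal-size blocks, synthetic one-dimensional data), not the paper's exact implementation; all function and parameter names are illustrative.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, h):
    """Nadaraya-Watson estimate with a Gaussian kernel and bandwidth h.

    Illustrative local average regression: each prediction is a
    kernel-weighted mean of the training responses.
    """
    # Kernel weights between each query point and each training point.
    w = np.exp(-0.5 * ((x_query[:, None] - x_train[None, :]) / h) ** 2)
    # Weighted average of responses; guard against an all-zero weight row.
    return (w @ y_train) / np.maximum(w.sum(axis=1), 1e-12)

def dc_nadaraya_watson(x, y, x_query, h, m):
    """Divide-and-conquer variant: split the data into m blocks,
    estimate on each block independently, then average the block
    predictions at the query points."""
    blocks = np.array_split(np.arange(len(x)), m)
    block_estimates = [nadaraya_watson(x[idx], y[idx], x_query, h)
                       for idx in blocks]
    return np.mean(block_estimates, axis=0)

# Toy usage on synthetic data: y = sin(2*pi*x) + noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 2000)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=2000)
x_query = np.linspace(0.1, 0.9, 5)
pred = dc_nadaraya_watson(x, y, x_query, h=0.05, m=10)
```

Because the block estimates are computed independently, each block can be processed on a separate machine; only the predictions at the query points need to be communicated and averaged.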
Keywords: \(k\)-nearest neighbor estimate; divide and conquer strategy; local average regression; Nadaraya-Watson estimate
Cited In (21)
- Title not available
- Title not available
- Radial basis function approximation with distributively stored data on spheres
- A model robust subsampling approach for generalised linear models in big data settings
- Distributed regression learning with coefficient regularization
- An Asymptotic Analysis of Random Partition Based Minibatch Momentum Methods for Linear Regression Models
- A distributed community detection algorithm for large scale networks under stochastic block models
- Distributed estimation and inference for spatial autoregression model with large scale networks
- A selective review on statistical methods for massive data computation: distributed computing, subsampling, and minibatch techniques
- Automatic variable selection in a linear model on massive data
- On a Nadaraya-Watson estimator with two bandwidths
- Least-Square Approximation for a Distributed System
- Distributed sequential estimation procedures
- Adaptive distributed inference for multi-source massive heterogeneous data
- Title not available
- Online Updating of Survival Analysis
- A review of distributed statistical inference
- Title not available
- Local averaging of heterogeneous regression models
- Confidence interval construction in massive data sets
- Feature Screening for Massive Data Analysis by Subsampling