Distributed statistical inference for massive data
Publication:2054533
DOI: 10.1214/21-AOS2062; zbMATH Open: 1486.62123; arXiv: 1805.11214; OpenAlex: W3211347790; MaRDI QID: Q2054533; FDO: Q2054533
Authors: Liuhua Peng, Song Xi Chen
Publication date: 3 December 2021
Published in: The Annals of Statistics
Abstract: This paper considers distributed statistical inference for general symmetric statistics, which encompass U-statistics and M-estimators, in the context of massive data where the data can be stored at multiple platforms in different locations. In order to facilitate effective computation and to avoid expensive communication among different platforms, we formulate distributed statistics which can be conducted over smaller data blocks. The statistical properties of the distributed statistics are investigated in terms of the mean square error of estimation and asymptotic distributions with respect to the number of data blocks. In addition, we propose two distributed bootstrap algorithms which are computationally effective and are able to capture the underlying distribution of the distributed statistics. Numerical simulation and real data applications of the proposed approaches are provided to demonstrate the empirical performance.
Full work available at URL: https://arxiv.org/abs/1805.11214
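Illustration (not part of the catalog record): the abstract describes statistics that are computed on smaller data blocks and then combined. The sketch below shows one plausible reading of that idea, assuming the combined statistic is the size-weighted average of block-wise U-statistics; the paper's exact definition may differ. The function names, the variance kernel, the simulated data, and the choice of K = 20 blocks are all hypothetical and chosen only for illustration.

```python
import random
from itertools import combinations
from statistics import variance


def u_statistic(block, kernel):
    """Average of a symmetric two-argument kernel over all unordered pairs."""
    total = 0.0
    count = 0
    for x, y in combinations(block, 2):
        total += kernel(x, y)
        count += 1
    return total / count


def distributed_statistic(blocks, kernel):
    """Size-weighted average of block-wise U-statistics (assumed combination rule)."""
    n_total = sum(len(b) for b in blocks)
    return sum(len(b) * u_statistic(b, kernel) for b in blocks) / n_total


def variance_kernel(x, y):
    # Kernel whose U-statistic is the unbiased sample variance.
    return 0.5 * (x - y) ** 2


# Simulated data standing in for a massive data set (hypothetical example).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(2000)]

K = 20  # number of blocks, an arbitrary illustrative choice
blocks = [data[i::K] for i in range(K)]

estimate = distributed_statistic(blocks, variance_kernel)
full = variance(data)  # full-sample statistic, for comparison only
```

Each block needs only its own data, so the pairwise work drops from order N^2 on the full sample to order N^2 / K in total, and only K block-level numbers have to be communicated. The sketch does not attempt the two distributed bootstrap algorithms mentioned in the abstract, since their details are not given in this record.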
Recommendations
- Statistical inference in massive datasets by empirical likelihood
- Distributed inference for quantile regression processes
- A partitioned quasi-likelihood for distributed statistical inference
- Distributed simultaneous inference in generalized linear models via confidence distribution
- Parallel inference for massive distributed spatial data using low-rank models
Asymptotic properties of nonparametric inference (62G20); Asymptotic distribution theory in statistics (62E20); Nonparametric statistical resampling methods (62G09)
Cites Work
- Measuring and testing dependence by correlation of distances
- Approximation Theorems of Mathematical Statistics
- Bootstrap methods: another look at the jackknife
- Title not available
- Bootstrap procedures under some non-i.i.d. models
- On the bootstrap of \(U\) and \(V\) statistics
- On the validity of the formal Edgeworth expansion
- The bootstrap and Edgeworth expansion
- Smoothed empirical likelihood confidence intervals for quantiles
- Fast surrogates of U-statistics
- Title not available
- The Limiting Distribution of the Maximum Rank Correlation Estimator
- Title not available
- A split-and-conquer approach for analysis of
- A general Bahadur representation of \(M\)-estimators and its application to linear regression with nonstochastic designs
- Title not available
- A Scalable Bootstrap for Massive Data
- Distributed testing and estimation under sparse high dimensional models
- Distributed inference for quantile regression processes
- Communication-efficient algorithms for statistical optimization
- Fast double bootstrap tests of nonnested linear regression models
- Title not available
- Double-bootstrap methods that use a single double-bootstrap simulation
Cited In (19)
- LIC criterion for optimal subset selection in distributed interval estimation
- Edgeworth expansions for network moments
- Projection divergence in the reproducing kernel Hilbert space: asymptotic normality, block-wise and slicing estimation, and computational efficiency
- Distributed prediction from vertically partitioned data
- Distributed inference for two‐sample U‐statistics in massive data analysis
- Discussion of the paper ‘A review of distributed statistical inference’
- Discussion of: ‘A review of distributed statistical inference’
- Rejoinder on ‘A review of distributed statistical inference’
- The COR criterion for optimal subset selection in distributed estimation
- Optimal Subsampling Bootstrap for Massive Data
- A selective review on statistical methods for massive data computation: distributed computing, subsampling, and minibatch techniques
- Empirical likelihood ratio tests for non-nested model selection based on predictive losses
- Distributed hypothesis testing for large dimensional two-sample mean vectors
- An Asynchronous Distributed Expectation Maximization Algorithm for Massive Data: The DEM Algorithm
- Title not available
- Distributed Bayesian posterior voting strategy for massive data
- A review of distributed statistical inference
- On variance estimation of random forests with Infinite-order U-statistics
- Asymptotic distributions of a new type of design-based incomplete U-statistics