Distributed statistical inference for massive data
From MaRDI portal
Publication:2054533
Abstract: This paper considers distributed statistical inference for general symmetric statistics, which encompass U-statistics and M-estimators, in the context of massive data where the data may be stored on multiple platforms in different locations. To facilitate efficient computation and avoid expensive communication among platforms, we formulate distributed statistics that can be computed over smaller data blocks. The statistical properties of the distributed statistics are investigated in terms of the mean squared error of estimation and their asymptotic distributions with respect to the number of data blocks. In addition, we propose two distributed bootstrap algorithms that are computationally efficient and able to capture the underlying distribution of the distributed statistics. Numerical simulations and real-data applications of the proposed approaches are provided to demonstrate their empirical performance.
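The block-averaging idea can be illustrated with a minimal Python sketch, assuming K blocks of (nearly) equal size and a simple unweighted average of the per-block statistics; the paper's exact construction and weighting are not stated on this page. The sample variance, written as a U-statistic with kernel h(x, y) = (x - y)**2 / 2, stands in for a general symmetric statistic. All function names are hypothetical, and the bootstrap shown only resamples the K block-level values. It is a simple stand-in for illustration, not necessarily either of the two distributed bootstrap algorithms proposed in the paper.

```python
import random
import statistics


def variance_u_stat(block):
    """Sample variance as a U-statistic with kernel h(x, y) = (x - y)**2 / 2.

    Averaging the kernel over all unordered pairs gives exactly the unbiased
    sample variance. Cost is O(m^2) for a block of size m.
    """
    m = len(block)
    if m < 2:
        raise ValueError("each block needs at least two observations")
    total = 0.0
    for i in range(m):
        for j in range(i + 1, m):
            total += (block[i] - block[j]) ** 2 / 2
    return total / (m * (m - 1) / 2)


def split_blocks(data, k):
    """Split data into k contiguous blocks whose sizes differ by at most one."""
    size, rem = divmod(len(data), k)
    blocks, start = [], 0
    for b in range(k):
        end = start + size + (1 if b < rem else 0)
        blocks.append(data[start:end])
        start = end
    return blocks


def distributed_statistic(data, k, stat=variance_u_stat):
    """Compute stat on each block separately, then average the k results.

    Only the k block-level values would need to be communicated.
    """
    block_values = [stat(block) for block in split_blocks(data, k)]
    return statistics.fmean(block_values), block_values


def block_bootstrap_se(block_values, n_boot=1000, seed=0):
    """Bootstrap standard error of the block average (hypothetical simplification).

    Resamples the K block-level values with replacement and returns the
    standard deviation of the resampled averages.
    """
    rng = random.Random(seed)
    k = len(block_values)
    boot_means = [
        statistics.fmean(rng.choices(block_values, k=k)) for _ in range(n_boot)
    ]
    return statistics.stdev(boot_means)


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(0.0, 2.0) for _ in range(2000)]  # true variance is 4
    estimate, values = distributed_statistic(data, k=20)
    se = block_bootstrap_se(values)
    print(f"distributed variance estimate {estimate:.3f}, bootstrap SE {se:.3f}")
```

Because each block-level U-statistic is unbiased, their average is unbiased as well. What blocking gives up is every pair of observations that falls in different blocks, so the variance of the average is at least that of the full-sample U-statistic and typically grows with K for a fixed sample size. This gives one concrete reason why the number of blocks enters the mean squared error and asymptotic results. Resampling only K numbers keeps the bootstrap step cheap and free of raw-data communication, at the cost of needing enough blocks for their values to describe the sampling distribution.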
Recommendations
- Statistical inference in massive datasets by empirical likelihood
- Distributed inference for quantile regression processes
- A partitioned quasi-likelihood for distributed statistical inference
- Distributed simultaneous inference in generalized linear models via confidence distribution
- Parallel inference for massive distributed spatial data using low-rank models
Cites work
- scientific article; zbMATH DE number 5711156
- scientific article; zbMATH DE number 3744299
- scientific article; zbMATH DE number 777879
- scientific article; zbMATH DE number 845999
- scientific article; zbMATH DE number 3336465
- A Scalable Bootstrap for Massive Data
- A general Bahadur representation of \(M\)-estimators and its application to linear regression with nonstochastic designs
- A split-and-conquer approach for analysis of
- Approximation Theorems of Mathematical Statistics
- Bootstrap methods: another look at the jackknife
- Bootstrap procedures under some non-i.i.d. models
- Communication-efficient algorithms for statistical optimization
- Distributed inference for quantile regression processes
- Distributed testing and estimation under sparse high dimensional models
- Double-bootstrap methods that use a single double-bootstrap simulation
- Fast double bootstrap tests of nonnested linear regression models
- Fast surrogates of U-statistics
- Measuring and testing dependence by correlation of distances
- On the bootstrap of \(U\) and \(V\) statistics
- On the validity of the formal Edgeworth expansion
- Smoothed empirical likelihood confidence intervals for quantiles
- The Limiting Distribution of the Maximum Rank Correlation Estimator
- The bootstrap and Edgeworth expansion
Cited in (34)
- A selective review on statistical methods for massive data computation: distributed computing, subsampling, and minibatch techniques
- scientific article; zbMATH DE number 3878755
- A review of distributed statistical inference
- WONDER: weighted one-shot distributed ridge regression in high dimensions
- Projection divergence in the reproducing kernel Hilbert space: asymptotic normality, block-wise and slicing estimation, and computational efficiency
- Discussion of the paper ‘A review of distributed statistical inference’
- Discussion of: ‘A review of distributed statistical inference’
- Discussion on the paper ‘A review of distributed statistical inference’
- Rejoinder on ‘A review of distributed statistical inference’
- Distributed inference for linear support vector machine
- Circumventing superefficiency: an effective strategy for distributed computing in non-standard problems
- Distributed parameter estimation framework based on moment method
- Distributed prediction from vertically partitioned data
- On variance estimation of random forests with infinite-order U-statistics
- Asymptotic distributions of a new type of design-based incomplete U-statistics
- Communication-efficient distributed statistical inference
- Empirical likelihood ratio tests for non-nested model selection based on predictive losses
- The COR criterion for optimal subset selection in distributed estimation
- An Asynchronous Distributed Expectation Maximization Algorithm for Massive Data: The DEM Algorithm
- A partitioned quasi-likelihood for distributed statistical inference
- Statistical inference in massive datasets by empirical likelihood
- Distributed inference for quantile regression processes
- Distributed statistical estimation and rates of convergence in normal approximation
- LIC criterion for optimal subset selection in distributed interval estimation
- Distributed hypothesis testing for large dimensional two-sample mean vectors
- Edgeworth expansions for network moments
- Parallel inference for massive distributed spatial data using low-rank models
- Distributed statistical inference for linear models with multi-source massive heterogeneous data
- Distributed simultaneous inference in generalized linear models via confidence distribution
- An asymptotic analysis of distributed nonparametric methods
- Distributed Bayesian posterior voting strategy for massive data
- Optimal Subsampling Bootstrap for Massive Data
- Distributed inference for two‐sample U‐statistics in massive data analysis
- A distributed one-step estimator