Distributed statistical inference for massive data
From MaRDI portal
Abstract: This paper considers distributed statistical inference for general symmetric statistics, which encompass U-statistics and M-estimators, in the context of massive data where the data can be stored at multiple platforms in different locations. In order to facilitate effective computation and to avoid expensive communication among different platforms, we formulate distributed statistics which can be computed over smaller data blocks. The statistical properties of the distributed statistics are investigated in terms of the mean square error of estimation and asymptotic distributions with respect to the number of data blocks. In addition, we propose two distributed bootstrap algorithms which are computationally efficient and are able to capture the underlying distribution of the distributed statistics. Numerical simulations and real data applications of the proposed approaches are provided to demonstrate their empirical performance.
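As a rough illustration of the idea of computing a statistic over smaller data blocks, the following Python sketch averages a per-block statistic with weights proportional to block size. It is a minimal sketch under stated assumptions, not the paper's implementation: the helper names `split_into_blocks` and `distributed_statistic` are hypothetical, the block-size-weighted average is assumed as the aggregation rule, and the sample variance is used only as a convenient example of a U-statistic.

```python
import random
import statistics


def split_into_blocks(data, num_blocks):
    """Partition data into num_blocks contiguous blocks of near-equal size."""
    base, extra = divmod(len(data), num_blocks)
    blocks, start = [], 0
    for k in range(num_blocks):
        # The first `extra` blocks get one additional observation.
        size = base + (1 if k < extra else 0)
        blocks.append(data[start:start + size])
        start += size
    return blocks


def distributed_statistic(blocks, statistic):
    """Block-size-weighted average of a statistic computed on each block.

    Assumed aggregation rule (hypothetical helper): each block reports only
    its size and its own statistic, so no raw data moves between platforms.
    """
    total = sum(len(block) for block in blocks)
    return sum(len(block) * statistic(block) for block in blocks) / total


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 2.0) for _ in range(10_000)]
    blocks = split_into_blocks(data, num_blocks=20)
    # statistics.variance is the unbiased sample variance, a U-statistic
    # with kernel h(x, y) = (x - y) ** 2 / 2.
    print("distributed:", distributed_statistic(blocks, statistics.variance))
    print("full sample:", statistics.variance(data))
```

For a linear statistic such as the sample mean, the weighted average reproduces the full-sample value exactly. For nonlinear statistics the two generally differ, and that gap is what the mean square error analysis mentioned in the abstract concerns.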
Recommendations
- Statistical inference in massive datasets by empirical likelihood
- Distributed inference for quantile regression processes
- A partitioned quasi-likelihood for distributed statistical inference
- Distributed simultaneous inference in generalized linear models via confidence distribution
- Parallel inference for massive distributed spatial data using low-rank models
Cites work
- scientific article; zbMATH DE number 5711156
- scientific article; zbMATH DE number 3744299
- scientific article; zbMATH DE number 777879
- scientific article; zbMATH DE number 845999
- scientific article; zbMATH DE number 3336465
- A Scalable Bootstrap for Massive Data
- A general Bahadur representation of M-estimators and its application to linear regression with nonstochastic designs
- A split-and-conquer approach for analysis of
- Approximation Theorems of Mathematical Statistics
- Bootstrap methods: another look at the jackknife
- Bootstrap procedures under some non-i.i.d. models
- Communication-efficient algorithms for statistical optimization
- Distributed inference for quantile regression processes
- Distributed testing and estimation under sparse high dimensional models
- Double-bootstrap methods that use a single double-bootstrap simulation
- Fast double bootstrap tests of nonnested linear regression models
- Fast surrogates of U-statistics
- Measuring and testing dependence by correlation of distances
- On the bootstrap of \(U\) and \(V\) statistics
- On the validity of the formal Edgeworth expansion
- Smoothed empirical likelihood confidence intervals for quantiles
- The Limiting Distribution of the Maximum Rank Correlation Estimator
- The bootstrap and Edgeworth expansion
Cited in (48)
- A Simple Divide-and-Conquer-based Distributed Method for the Accelerated Failure Time Model
- Distributed learning for kernel mode-based regression
- Supervised Stratified Subsampling for Predictive Analytics
- LIC criterion for optimal subset selection in distributed interval estimation
- Circumventing superefficiency: an effective strategy for distributed computing in non-standard problems
- A partitioned quasi-likelihood for distributed statistical inference
- Distributed statistical inference for linear models with multi-source massive heterogeneous data
- Scalable resampling in massive generalized linear models via subsampled residual bootstrap
- Quadratic discriminant analysis in distributed frameworks
- A distributed one-step estimator
- Edgeworth expansions for network moments
- Distributed prediction from vertically partitioned data
- Byzantine-robust distributed support vector machine
- Projection divergence in the reproducing kernel Hilbert space: asymptotic normality, block-wise and slicing estimation, and computational efficiency
- Communication-efficient distributed statistical inference
- Distributed inference for two‐sample U‐statistics in massive data analysis
- Discussion of the paper ‘A review of distributed statistical inference’
- Discussion of: ‘A review of distributed statistical inference’
- Discussion on the paper ‘A review of distributed statistical inference’
- Rejoinder on ‘A review of distributed statistical inference’
- The COR criterion for optimal subset selection in distributed estimation
- Distributed inference for quantile regression processes
- An asymptotic analysis of distributed nonparametric methods
- Distributed estimation of spiked eigenvalues in spiked population models
- Statistical inference in massive datasets by empirical likelihood
- Optimal Subsampling Bootstrap for Massive Data
- A selective review on statistical methods for massive data computation: distributed computing, subsampling, and minibatch techniques
- Multiplier and empirical subsample bootstraps for maxima in high dimensional time series analysis
- Distributed semi-supervised single-index model with corruption
- Distributed inference for linear support vector machine
- Empirical likelihood ratio tests for non-nested model selection based on predictive losses
- Hypothesis testing of one sample mean vector in distributed frameworks
- Distributed hypothesis testing for large dimensional two-sample mean vectors
- Communication-efficient distributed estimation and computation using skew-normal distribution
- Distributed simultaneous inference in generalized linear models via confidence distribution
- An Asynchronous Distributed Expectation Maximization Algorithm for Massive Data: The DEM Algorithm
- Distributed parameter estimation framework based on moment method
- scientific article; zbMATH DE number 3878755
- U-Statistic Reduction: Higher-Order Accurate Risk Control and Statistical-Computational Trade-Off
- Distributed Bayesian posterior voting strategy for massive data
- Distributed statistical estimation and rates of convergence in normal approximation
- CluBear: a subsampling package for interactive statistical analysis with massive data on a single machine
- A review of distributed statistical inference
- Heterogeneity-aware debiased machine learning for high-dimensional partially linear models
- Parallel inference for massive distributed spatial data using low-rank models
- On variance estimation of random forests with Infinite-order U-statistics
- Asymptotic distributions of a new type of design-based incomplete U-statistics
- WONDER: weighted one-shot distributed ridge regression in high dimensions
MaRDI item Q2054533