Distributed simultaneous inference in generalized linear models via confidence distribution
From MaRDI portal
Abstract: We propose a distributed method for simultaneous inference in generalized linear models for datasets whose sample size is much larger than the number of covariates, i.e., N >> p. When such datasets are too large to be analyzed in their entirety on a single centralized computer, or when they are already stored in distributed database systems, divide-and-combine has been the strategy of choice for scalability. Because of partitioning, the sub-dataset sample sizes may be uneven, and some may be close to p, which calls for regularization techniques to improve numerical stability. However, there is a lack of clear theoretical justification and practical guidance on how to combine results obtained from separate regularized estimators, especially when the final objective is simultaneous inference for a group of regression parameters. In this paper, we develop a strategy that combines bias-corrected lasso-type estimates by using confidence distributions. We show that the resulting combined estimator achieves the same estimation efficiency as the maximum likelihood estimator computed from the centralized data. As demonstrated by simulated and real data examples, our divide-and-combine method yields inference nearly identical to that of the centralized benchmark.
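The divide-and-combine idea in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure: it assumes plain (unpenalized) logistic-regression maximum likelihood fits on each block instead of bias-corrected lasso-type estimates, and it combines them with the information-weighted average that results from multiplying asymptotically normal confidence densities. The function names `fit_logistic` and `combine`, the simulated data, and all tuning values are hypothetical choices made only for illustration.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Newton-Raphson MLE for logistic regression.

    Returns the estimate and the observed Fisher information at the estimate.
    The tiny ridge term only stabilizes the linear solve (an assumption of
    this sketch, not a penalty from the paper).
    """
    p = X.shape[1]
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        w = mu * (1.0 - mu)
        info = X.T @ (X * w[:, None]) + ridge * np.eye(p)
        step = np.linalg.solve(info, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    w = mu * (1.0 - mu)
    return beta, X.T @ (X * w[:, None])


def combine(block_fits):
    """Information-weighted combination: (sum_k J_k)^{-1} sum_k J_k beta_k."""
    J_total = sum(J for _, J in block_fits)
    weighted = sum(J @ b for b, J in block_fits)
    beta = np.linalg.solve(J_total, weighted)
    cov = np.linalg.inv(J_total)  # approximate covariance of the combined estimate
    return beta, cov


# Simulated data with N >> p, split into K blocks (all values are illustrative).
rng = np.random.default_rng(0)
N, p, K = 20000, 5, 10
beta_true = np.array([0.5, -1.0, 0.25, 0.0, 0.75])
X = rng.normal(size=(N, p))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

blocks = np.array_split(np.arange(N), K)
beta_dac, cov_dac = combine([fit_logistic(X[idx], y[idx]) for idx in blocks])

# Centralized benchmark: a single fit on the full data.
beta_full, info_full = fit_logistic(X, y)

print("divide-and-combine:", np.round(beta_dac, 3))
print("centralized MLE:   ", np.round(beta_full, 3))
```

Only each block's estimate and p x p information matrix need to be communicated, so the combination step never touches the raw data. In the setting the abstract describes, where some blocks have sample sizes close to p, the unpenalized per-block fit used here would be unstable, which is where the regularized, bias-corrected estimates come in.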
Recommendations
- A partitioned quasi-likelihood for distributed statistical inference
- Communication-efficient distributed estimator for generalized linear models with a diverging number of covariates
- Distributed statistical inference for massive data
- Distributed testing and estimation under sparse high dimensional models
- Communication-efficient sparse regression
Cites work
- scientific article; zbMATH DE number 3117956
- scientific article; zbMATH DE number 47310
- scientific article; zbMATH DE number 3511563
- scientific article; zbMATH DE number 845714
- A Scalable Bootstrap for Massive Data
- A split-and-conquer approach for analysis of
- Aggregated estimating equation estimation
- Balancing covariates via propensity score weighting
- Bayes and likelihood calculations from confidence intervals
- Combining information from independent sources through confidence distributions
- Communication-efficient sparse regression
- Confidence Distributions and a Unifying Framework for Meta-Analysis
- Confidence distribution, the frequentist distribution estimator of a parameter: a review
- Confidence intervals for low dimensional parameters in high dimensional linear models
- Correlated data analysis: modeling, analytics, and applications
- Distributed testing and estimation under sparse high dimensional models
- Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates
- Estimation in high-dimensional linear models with deterministic design matrices
- Fused Lasso approach in regression coefficients clustering -- learning parameter heterogeneity in data integration
- Fused Lasso with the adaptation of parameter ordering in combining multiple studies with repeated measurements
- Ideal spatial adaptation by wavelet shrinkage
- Merging multiple longitudinal studies with study-specific missing covariates: a joint estimating function approach
- Multivariate Meta-Analysis of Heterogeneous Studies Using Only Summary Statistics: Efficiency and Robustness
- On asymptotically optimal confidence regions and tests for high-dimensional models
- On the relative efficiency of using summary statistics versus individual-level data in meta-analysis
- Regularization and Variable Selection Via the Elastic Net
- Scalable estimation strategies based on stochastic approximations: classical results and new insights
- Statistics for high-dimensional data. Methods, theory and applications.
- The Adaptive Lasso and Its Oracle Properties
- Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties
Cited in (22)
- scientific article; zbMATH DE number 7448088
- Communication-efficient distributed estimator for generalized linear models with a diverging number of covariates
- Distributed Bayesian Estimation of Linear Models With Unknown Observation Covariances
- A partitioned quasi-likelihood for distributed statistical inference
- Distributed statistical inference for linear models with multi-source massive heterogeneous data
- Distributed testing and estimation under sparse high dimensional models
- Multivariate survival analysis in big data: A divide‐and‐combine approach
- Turning the information-sharing dial: efficient inference from different data sources
- A discussion on “A selective review of statistical methods using calibration information from similar studies”
- An Asymptotic Analysis of Random Partition Based Minibatch Momentum Methods for Linear Regression Models
- Global debiased DC estimations for biased estimators via pro forma regression
- A selective review on statistical methods for massive data computation: distributed computing, subsampling, and minibatch techniques
- Transfer learning via random forests: a one-shot federated approach
- A binary hidden Markov model on spatial network for amyotrophic lateral sclerosis disease spreading pattern analysis
- CEDAR: Communication Efficient Distributed Analysis for Regressions
- Online two-way estimation and inference via linear mixed-effects models
- Fused mean structure learning in data integration with dependence
- Distributed statistical inference for massive data
- A distributed and integrated method of moments for high-dimensional correlated data analysis
- Unbalanced distributed estimation and inference for the precision matrix in Gaussian graphical models
- Divide and conquer for accelerated failure time model with massive time‐to‐event data
- Distributed smoothed rank regression with heterogeneous errors for massive data
MaRDI item Q2293540