Divide-and-conquer information-based optimal subdata selection algorithm
From MaRDI portal
Publication:2321778
DOI: 10.1007/S42519-019-0048-5
zbMATH Open: 1425.62087
arXiv: 1905.09948
OpenAlex: W3105524933
Wikidata: Q127560775
Scholia: Q127560775
MaRDI QID: Q2321778
FDO: Q2321778
Authors: Haiying Wang
Publication date: 23 August 2019
Published in: Journal of Statistical Theory and Practice
Abstract: Information-based optimal subdata selection (IBOSS) is a computationally efficient method for selecting informative data points from large data sets by processing the full data column by column. However, when a data set is too large to fit in the available memory of a machine, the IBOSS procedure cannot be implemented directly. This paper develops a divide-and-conquer IBOSS approach to solve this problem: the full data set is divided into smaller partitions that can be loaded into memory, and a subset of data is then selected from each partition using the IBOSS algorithm. We derive both finite-sample and asymptotic properties of the resulting estimator. The asymptotic results show that if the full data set is partitioned randomly and the number of partitions is not very large, then the resulting estimator has the same estimation efficiency as the original IBOSS estimator. We also carry out numerical experiments to evaluate the empirical performance of the proposed method.
Full work available at URL: https://arxiv.org/abs/1905.09948
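The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation, and it rests on several assumptions that the abstract does not state: the IBOSS rule is taken to be the usual one for linear regression (for each covariate column, keep the r smallest and r largest values among rows not yet selected), the per-partition subdata are simply pooled, and the final estimator is ordinary least squares on the pooled subdata. The function names `iboss_select` and `dc_iboss_fit` are hypothetical, and the partitions are sliced from an in-memory array only for illustration; in a real out-of-memory setting each partition would be read from disk.

```python
import numpy as np


def iboss_select(X, k):
    """Return row indices of an IBOSS-style subdata of size about k.

    Assumed rule: for each column in turn, take the r smallest and r largest
    values among rows not yet selected, with r = k // (2 * p).
    """
    n, p = X.shape
    r = max(1, k // (2 * p))
    available = np.ones(n, dtype=bool)
    chosen = []
    for j in range(p):
        idx = np.flatnonzero(available)
        if idx.size <= 2 * r:
            # Too few rows left to split into two tails: take them all.
            chosen.append(idx)
            available[idx] = False
            break
        col = X[idx, j]
        # Partial sort: first r positions hold the r smallest values,
        # last r positions hold the r largest values.
        order = np.argpartition(col, (r - 1, idx.size - r))
        picked = np.concatenate((idx[order[:r]], idx[order[-r:]]))
        chosen.append(picked)
        available[picked] = False
    return np.concatenate(chosen)


def dc_iboss_fit(X, y, k, n_blocks, rng):
    """Divide-and-conquer sketch: random partition, IBOSS per block, pooled OLS."""
    n = X.shape[0]
    # Random partition of the row indices (the abstract's efficiency result
    # is stated for random partitioning).
    blocks = np.array_split(rng.permutation(n), n_blocks)
    k_block = k // n_blocks  # assumed: equal subdata budget per partition
    selected = []
    for block in blocks:
        local = iboss_select(X[block], k_block)
        selected.append(block[local])  # map back to full-data row indices
    sel = np.concatenate(selected)
    # Assumed combination step: least squares with intercept on pooled subdata.
    Z = np.column_stack((np.ones(sel.size), X[sel]))
    beta, *_ = np.linalg.lstsq(Z, y[sel], rcond=None)
    return beta, sel
```

For example, with a hypothetical `X` of shape (10000, 5), calling `dc_iboss_fit(X, y, k=200, n_blocks=10, rng=np.random.default_rng(0))` selects 20 rows from each of 10 random partitions and returns the six fitted coefficients (intercept first) together with the selected row indices.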
Recommendations
- Subset selection algorithm based on mutual information
- Information-based optimal subdata selection for non-linear models
- Information-Based Optimal Subdata Selection for Big Data Linear Regression
- scientific article; zbMATH DE number 3940420
- Subdata selection algorithm for linear model discrimination
- Best subset selection via a modern optimization lens
- Subdata selection based on orthogonal array for big data
- Information-based optimal subdata selection for big data logistic regression
Keywords: D-optimality, linear regression, big data, information matrix, information-based optimal subdata selection (IBOSS), subdata
Cites Work
- Julia: A Fresh Approach to Numerical Computing
- Title not available
- A split-and-conquer approach for analysis of
- Title not available
- Aggregated estimating equation estimation
- Title not available
- Information-Based Optimal Subdata Selection for Big Data Linear Regression
- Distributed testing and estimation under sparse high dimensional models
- Optimal Subsampling for Large Sample Logistic Regression
- An online updating approach for testing the proportional hazards assumption with streams of survival data
- On the relative stability of large order statistics
- Computational Limits of A Distributed Algorithm For Smoothing Spline
- Optimal subsampling for softmax regression
Cited In (12)
- Orthogonal subsampling for big data linear regression
- Optimal Poisson subsampling for softmax regression
- A distance metric-based space-filling subsampling method for nonparametric models
- A selective review on statistical methods for massive data computation: distributed computing, subsampling, and minibatch techniques
- Title not available
- A Subsampling Method for Regression Problems Based on Minimum Energy Criterion
- Optimal subsampling for least absolute relative error estimators with massive data
- Subdata selection based on orthogonal array for big data
- Optimal subsampling for modal regression in massive data
- Entropy-based subsampling methods for big data
- Divide and conquer for accelerated failure time model with massive time‐to‐event data
- Title not available