Information-Based Optimal Subdata Selection for Big Data Linear Regression
Publication:5229921
DOI: 10.1080/01621459.2017.1408468; zbMath: 1478.62196; arXiv: 1710.10382; OpenAlex: W3099924168; MaRDI QID: Q5229921
John Stufken, Hai Ying Wang, Min Yang
Publication date: 19 August 2019
Published in: Journal of the American Statistical Association
Full work available at URL: https://arxiv.org/abs/1710.10382
Ridge regression; shrinkage estimators (Lasso) (62J07) ⋮ Linear regression; mixed models (62J05) ⋮ Optimal statistical designs (62K05) ⋮ Point estimation (62F10)
Related Items (59)
Optimal Distributed Subsampling for Maximum Quasi-Likelihood Estimators With Massive Data ⋮ Robust active learning with binary responses ⋮ Distributed subdata selection for big data via sampling-based approach ⋮ Sequential online subsampling for thinning experimental designs ⋮ Optimal subsample selection for massive logistic regression with distributed data ⋮ Score-matching representative approach for big data analysis with generalized linear models ⋮ A two-stage optimal subsampling estimation for missing data problems with large-scale data ⋮ Randomized Spectral Clustering in Large-Scale Stochastic Block Models ⋮ Inversion-free subsampling Newton's method for large sample logistic regression ⋮ Optimal Sampling for Generalized Linear Models Under Measurement Constraints ⋮ LowCon: A Design-based Subsampling Approach in a Misspecified Linear Model ⋮ Online Updating of Survival Analysis ⋮ Unnamed Item ⋮ Gaussian Process Prediction using Design-Based Subsampling ⋮ Surface temperature monitoring in liver procurement via functional variance change-point analysis ⋮ Model Checking in Large-Scale Dataset via Structure-Adaptive-Sampling ⋮ Divide and conquer for accelerated failure time model with massive time‐to‐event data ⋮ Optimal subsampling for large‐sample quantile regression with massive data ⋮ Fast Calibration for Computer Models with Massive Physical Observations ⋮ Information-based optimal subdata selection for big data logistic regression ⋮ Optimal subsampling for multiplicative regression with massive data ⋮ Online updating method to correct for measurement error in big data streams ⋮ Subsampling spectral clustering for stochastic block models in large-scale networks ⋮ Global debiased DC estimations for biased estimators via pro forma regression ⋮ Information-based optimal subdata selection for non-linear models ⋮ Optimal subsampling design for polynomial regression in one covariate ⋮ Accounting for outliers in optimal subsampling methods ⋮ A model robust subsampling approach for generalised linear models in big data settings ⋮ Predictive Subdata Selection for Computer Models ⋮ Sketched approximation of regularized canonical correlation analysis ⋮ Optimal subsampling for softmax regression ⋮ Subdata selection based on orthogonal array for big data ⋮ Generalized linear models for massive data via doubly-sketching ⋮ Optimal subsampling algorithms for composite quantile regression in massive data ⋮ Optimal sampling designs for multidimensional streaming time series with application to power grid sensor data ⋮ Adaptive iterative Hessian sketch via \(A\)-optimal subsampling ⋮ Subsampling in longitudinal models ⋮ Unnamed Item ⋮ LIC criterion for optimal subset selection in distributed interval estimation ⋮ Experimental Design Issues in Big Data: The Question of Bias ⋮ Crawling subsampling for multivariate spatial autoregression model in large-scale networks ⋮ Randomized sketches for kernel CCA ⋮ On greedy heuristics for computing D-efficient saturated subsets ⋮ Unnamed Item ⋮ Orthogonal subsampling for big data linear regression ⋮ Optimal subsampling for large-scale quantile regression ⋮ Surprise sampling: improving and extending the local case-control sampling ⋮ Optimal designs for model averaging in non-nested models ⋮ Model-robust subdata selection for big data ⋮ Accounting for Factor Variables in Big Data Regression ⋮ Divide-and-conquer information-based optimal subdata selection algorithm ⋮ Ascent with quadratic assistance for the construction of exact experimental designs ⋮ Parallel-and-stream accelerator for computationally fast supervised learning ⋮ Optimal subsampling for composite quantile regression in big data ⋮ Optimal subsampling for least absolute relative error estimators with massive data ⋮ Model-free global likelihood subsampling for massive data ⋮ On stochastic Kaczmarz type methods for solving large scale systems of ill-posed equations ⋮ Comments on ``Data science, big data and statistics'' ⋮ Subdata selection algorithm for linear model discrimination
Cites Work
- Aggregated estimating equation estimation
- Faster least squares approximation
- Lasso-type recovery of sparse representations for high-dimensional data
- The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\). (With discussions and rejoinder).
- Sampling algorithms for \(l_2\) regression and applications
- On the relative stability of large order statistics
- Sure Independence Screening for Ultrahigh Dimensional Feature Space
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item