Abstract: How should statistical procedures be designed so as to be scalable computationally to the massive datasets that are increasingly the norm? When coupled with the requirement that an answer to an inferential question be delivered within a certain time budget, this question has significant repercussions for the field of statistics. With the goal of identifying "time-data tradeoffs," we investigate some of the statistical consequences of computational perspectives on scalability, in particular divide-and-conquer methodology and hierarchies of convex relaxations.
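The divide-and-conquer methodology mentioned in the abstract can be illustrated with a minimal sketch: split the data into disjoint chunks, fit an estimator on each chunk independently (these fits could run in parallel), and average the chunk-level estimates. The function name and the choice of ordinary least squares as the per-chunk estimator are illustrative assumptions, not the paper's specific procedure.

```python
import numpy as np

def divide_and_conquer_ols(X, y, n_chunks):
    """Average per-chunk OLS fits: a minimal divide-and-conquer estimator."""
    chunk_estimates = []
    for Xc, yc in zip(np.array_split(X, n_chunks), np.array_split(y, n_chunks)):
        # Each chunk is fit in isolation; in practice these solves
        # would be distributed across machines or cores.
        beta_c, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
        chunk_estimates.append(beta_c)
    # Averaging the chunk estimates recovers (near) full-data accuracy
    # at a fraction of the per-machine computational cost.
    return np.mean(chunk_estimates, axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=10_000)

beta_dc = divide_and_conquer_ols(X, y, n_chunks=20)
```

Varying `n_chunks` against wall-clock time and estimation error is one concrete way to trace out the "time-data tradeoffs" the abstract refers to.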
Cites work
- scientific article; zbMATH DE number 1057566 (no title available)
- A note on methods of restoring consistency to the bootstrap
- A simpler approach to matrix completion
- A split-and-conquer approach for analysis of
- Bootstrap methods: another look at the jackknife
- Computational and statistical tradeoffs via convex relaxation
- High-dimensional analysis of semidefinite relaxations for sparse principal components
- Minimax estimation via wavelet shrinkage
- Quantum mechanical algorithms for the nonabelian hidden subgroup problem
- Weak convergence of dependent empirical measures with application to subsampling in function spaces
Cited in (20)
- Statistical and computational tradeoff in genetic algorithm-based estimation
- Computationally efficient inference for complex large-scale data. Abstracts from the workshop held March 6--12, 2016
- Variational discriminant analysis with variable selection
- A MOM-based ensemble method for robustness, subsampling and hyperparameter tuning
- Parallel-and-stream accelerator for computationally fast supervised learning
- A divide-and-conquer algorithm for core-periphery identification in large networks
- A large-scale constrained joint modeling approach for predicting user activity, engagement, and churn with application to freemium mobile games
- Distributed sequential estimation procedures
- Distributed censored quantile regression
- Relative efficiency of using summary versus individual data in random‐effects meta‐analysis
- Distributed statistical estimation and rates of convergence in normal approximation
- Robust classification via MOM minimization
- Joint integrative analysis of multiple data sources with correlated vector outcomes
- Nonconvex Dantzig selector and its parallel computing algorithm
- A general framework for penalized mixed-effects multitask learning with applications on DNA methylation surrogate biomarkers creation
- Optimal sampling designs for multidimensional streaming time series with application to power grid sensor data
- Discussion of "Analysis of spatio-temporal mobile phone data: a case study in the metropolitan area of Milan" by Piercesare Secchi, Simone Vantini and Valeria Vitelli
- Variable selection for distributed sparse regression under memory constraints
- Eigenvector-based sparse canonical correlation analysis: fast computation for estimation of multiple canonical vectors
- A communication-efficient method for ℓ0-regularized linear regression models
This page was built for publication: On statistics, computation and scalability