The potential and perils of preprocessing: building new foundations
Abstract: Preprocessing forms an oft-neglected foundation for a wide range of statistical and scientific analyses. However, it is rife with subtleties and pitfalls. Decisions made in preprocessing constrain all later analyses and are typically irreversible. Hence, data analysis becomes a collaborative endeavor by all parties involved in data collection, preprocessing and curation, and downstream inference. Even if each party has done its best given the information and resources available to them, the final result may still fall short of the best possible in the traditional single-phase inference framework. This is particularly relevant as we enter the era of "big data". The technologies driving this data explosion are subject to complex new forms of measurement error. Simultaneously, we are accumulating increasingly massive databases of scientific analyses. As a result, preprocessing has become more vital (and potentially more dangerous) than ever before.
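The abstract's central point — that preprocessing decisions are irreversible and can degrade downstream inference — can be illustrated with a small simulation. The scenario below (clipping raw measurements to a fixed range before handing them to the analyst) is a hypothetical example, not one drawn from the paper itself; it simply shows how an innocuous-looking preprocessing step biases a later estimate in a way the downstream analyst cannot undo.

```python
import random

random.seed(0)

# Hypothetical preprocessing pitfall: raw measurements are noisy
# readings of a true mean of 0.9, but a preprocessor clips them to
# [0, 1] before the analyst ever sees them.
true_mean = 0.9
raw = [random.gauss(true_mean, 0.5) for _ in range(100_000)]

# The preprocessing step: irreversible clipping.
clipped = [min(max(x, 0.0), 1.0) for x in raw]

mean_raw = sum(raw) / len(raw)              # close to 0.9 (unbiased)
mean_clipped = sum(clipped) / len(clipped)  # biased low: more mass is
                                            # clipped from above than below
print(f"raw mean:     {mean_raw:.3f}")
print(f"clipped mean: {mean_clipped:.3f}")
```

Because the true mean sits near the upper clipping boundary, far more probability mass is truncated from above than from below, so the clipped-data estimate is pulled well below 0.9 — and no analysis of the clipped data alone can recover the lost information.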
Cites work
- scientific article; zbMATH DE number 1220667 (title unavailable)
- scientific article; zbMATH DE number 720689 (title unavailable)
- scientific article; zbMATH DE number 2140075 (title unavailable)
- scientific article; zbMATH DE number 3385132 (title unavailable)
- scientific article; zbMATH DE number 3068103 (title unavailable)
- A Predictive Approach to Model Selection
- An invariant form for the prior probability in estimation problems
- Classifying Gene Expression Profiles from Pairwise mRNA Comparisons
- Comparison of experiments and information measures
- Consistent Estimates Based on Partially Consistent Observations
- Equivalent Comparisons of Experiments
- Exploration, normalization, and summaries of high density oligonucleotide array probe level data
- Handbook of Markov Chain Monte Carlo
- Inference and missing data
- Invariant Prior Distributions
- Multiple Imputation After 18+ Years
- On a Necessary and Sufficient Condition for Admissibility of Estimators When Strictly Convex Loss is Used
- On rereading R. A. Fisher
- On surrogate loss functions and \(f\)-divergences
- On the Reconciliation of Probability Assessments
- Parameter Estimation for the Exponential-Normal Convolution Model for Background Correction of Affymetrix GeneChip Data
- Partial likelihood
- Proper and Improper Multiple Imputation
- Quantization
- Several Bayesians: a review. (With discussion)
- Significance analysis of microarrays applied to the ionizing radiation response
- Statistical decision theory and Bayesian analysis. 2nd ed
- Sufficiency and Approximate Sufficiency
- The Selection of Prior Distributions by Formal Rules
Cited in (6)
- Multiple Improvements of Multiple Imputation Likelihood Ratio Tests
- Recent Developments in the Theory of Pre-processing
- Conducting highly principled data science: a statistician's job and joy
- Discussion: The Q‐q Dynamic for Deeper Learning and Research
- Nonstandard conditionally specified models for nonignorable missing data
- Multi-way blockmodels for analyzing coordinated high-dimensional responses