The potential and perils of preprocessing: building new foundations

From MaRDI portal
Publication: 373524

DOI: 10.3150/13-BEJSP16
zbMATH Open: 1440.62019
arXiv: 1309.6790
OpenAlex: W2023338600
MaRDI QID: Q373524
FDO: Q373524


Authors: Alexander W. Blocker, Xiao-Li Meng


Publication date: 17 October 2013

Published in: Bernoulli

Abstract: Preprocessing forms an oft-neglected foundation for a wide range of statistical and scientific analyses. However, it is rife with subtleties and pitfalls. Decisions made in preprocessing constrain all later analyses and are typically irreversible. Hence, data analysis becomes a collaborative endeavor by all parties involved in data collection, preprocessing and curation, and downstream inference. Even if each party has done its best given the information and resources available to it, the final result may still fall short of the best possible in the traditional single-phase inference framework. This is particularly relevant as we enter the era of "big data". The technologies driving this data explosion are subject to complex new forms of measurement error. Simultaneously, we are accumulating increasingly massive databases of scientific analyses. As a result, preprocessing has become more vital (and potentially more dangerous) than ever before.
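The abstract's central claim, that preprocessing decisions are typically irreversible and constrain all downstream inference, can be illustrated with a small sketch (this example is ours, not from the paper): thresholding raw measurements during preprocessing destroys information that no downstream analysis can recover without re-imposing assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Raw measurements: normal with mean 1 and standard deviation 2
# (hypothetical data, purely for illustration).
raw = rng.normal(loc=1.0, scale=2.0, size=100_000)

# Preprocessing decision: record only whether each value exceeds zero.
# This step is irreversible -- the magnitudes are gone.
binarized = (raw > 0).astype(float)

# A downstream analyst with the raw data estimates the mean directly...
mean_raw = raw.mean()

# ...but from the binarized data only P(X > 0) is identifiable; the mean
# can be recovered only by re-asserting the (now unverifiable)
# distributional assumption that held before preprocessing.
p_pos = binarized.mean()

print(f"raw mean estimate: {mean_raw:.3f}")
print(f"share above zero:  {p_pos:.3f}")
```

With these simulation settings, the raw-data estimate lands near the true mean of 1.0, while the binarized data pins down only the exceedance probability (about 0.69 for this distribution), which is the kind of constraint on "all later analyses" the abstract describes.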


Full work available at URL: https://arxiv.org/abs/1309.6790




Cited In (6)
