The potential and perils of preprocessing: building new foundations
Abstract: Preprocessing forms an oft-neglected foundation for a wide range of statistical and scientific analyses. However, it is rife with subtleties and pitfalls. Decisions made in preprocessing constrain all later analyses and are typically irreversible. Hence, data analysis becomes a collaborative endeavor by all parties involved in data collection, preprocessing and curation, and downstream inference. Even if each party has done its best given the information and resources available to them, the final result may still fall short of the best possible in the traditional single-phase inference framework. This is particularly relevant as we enter the era of "big data". The technologies driving this data explosion are subject to complex new forms of measurement error. Simultaneously, we are accumulating increasingly massive databases of scientific analyses. As a result, preprocessing has become more vital (and potentially more dangerous) than ever before.
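The abstract's central point — that preprocessing decisions are irreversible and can degrade downstream inference — can be illustrated with a small simulation. The scenario below (clipping raw measurements to a fixed range before handing them to the analyst) is a hypothetical example, not one drawn from the paper itself; it simply shows how an innocuous-looking preprocessing step biases a later estimate in a way the downstream analyst cannot undo.

```python
import random

random.seed(0)

# Hypothetical preprocessing pitfall: raw measurements are noisy
# readings of a true mean of 0.9, but a preprocessor clips them to
# [0, 1] before the analyst ever sees them.
true_mean = 0.9
raw = [random.gauss(true_mean, 0.5) for _ in range(100_000)]

# The preprocessing step: irreversible clipping.
clipped = [min(max(x, 0.0), 1.0) for x in raw]

mean_raw = sum(raw) / len(raw)              # close to 0.9 (unbiased)
mean_clipped = sum(clipped) / len(clipped)  # biased low: more mass is
                                            # clipped from above than below
print(f"raw mean:     {mean_raw:.3f}")
print(f"clipped mean: {mean_clipped:.3f}")
```

Because the true mean sits near the upper clipping boundary, far more probability mass is truncated from above than from below, so the clipped-data estimate is pulled well below 0.9 — and no analysis of the clipped data alone can recover the lost information.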
Cites work
- scientific article; zbMATH DE number 1220667 (title unavailable)
- scientific article; zbMATH DE number 720689 (title unavailable)
- scientific article; zbMATH DE number 2140075 (title unavailable)
- scientific article; zbMATH DE number 3385132 (title unavailable)
- scientific article; zbMATH DE number 3068103 (title unavailable)
- A Predictive Approach to Model Selection
- An invariant form for the prior probability in estimation problems
- Classifying Gene Expression Profiles from Pairwise mRNA Comparisons
- Comparison of experiments and information measures
- Consistent Estimates Based on Partially Consistent Observations
- Equivalent Comparisons of Experiments
- Exploration, normalization, and summaries of high density oligonucleotide array probe level data
- Handbook of Markov Chain Monte Carlo
- Inference and missing data
- Invariant Prior Distributions
- Multiple Imputation After 18+ Years
- On a Necessary and Sufficient Condition for Admissibility of Estimators When Strictly Convex Loss is Used
- On rereading R. A. Fisher
- On surrogate loss functions and \(f\)-divergences
- On the Reconciliation of Probability Assessments
- Parameter Estimation for the Exponential-Normal Convolution Model for Background Correction of Affymetrix GeneChip Data
- Partial likelihood
- Proper and Improper Multiple Imputation
- Quantization
- Several Bayesians: a review. (With discussion)
- Significance analysis of microarrays applied to the ionizing radiation response
- Statistical decision theory and Bayesian analysis. 2nd ed
- Sufficiency and Approximate Sufficiency
- The Selection of Prior Distributions by Formal Rules
Cited in (6)
- Multiple Improvements of Multiple Imputation Likelihood Ratio Tests
- Recent Developments in the Theory of Pre-processing
- Conducting highly principled data science: a statistician's job and joy
- Discussion: The Q‐q Dynamic for Deeper Learning and Research
- Nonstandard conditionally specified models for nonignorable missing data
- Multi-way blockmodels for analyzing coordinated high-dimensional responses