Robust and parallel Bayesian model selection

DOI: 10.1016/J.CSDA.2018.05.016; zbMATH Open: 1469.62178; arXiv: 1610.06194; OpenAlex: W2536592938; Wikidata: Q129739809; Scholia: Q129739809; MaRDI QID: Q1663127; FDO: Q1663127

Authors: Michael Minyi Zhang, Henry Lam, Lizhen Lin

Publication date: 21 August 2018

Published in: Computational Statistics and Data Analysis

Abstract: Effective and accurate model selection is an important problem in modern data analysis. One of the major challenges is the computational burden required to handle large data sets that cannot be stored or processed on one machine. Another challenge is the presence of outliers and contamination that damage the quality of inference. The parallel "divide and conquer" model selection strategy divides the observations of the full data set into roughly equal subsets and performs inference and model selection independently on each subset. After local subset inference, the method aggregates the posterior model probabilities, or other model/variable selection criteria, into a final model using the notion of the geometric median. This approach leads to improved concentration in finding the "correct" model and model parameters, and is also provably robust to outliers and data contamination.
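The aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes that each subset has already produced a vector of posterior model probabilities over the same candidate models, that distances are Euclidean, and that the geometric median is approximated with a Weiszfeld iteration; these choices, the function names, and the numbers are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def geometric_median(points, tol=1e-9, max_iter=1000):
    """Approximate the Euclidean geometric median of the rows of `points`
    with a Weiszfeld iteration (an assumed solver, not the paper's)."""
    points = np.asarray(points, dtype=float)
    median = points.mean(axis=0)  # start from the arithmetic mean
    for _ in range(max_iter):
        dists = np.linalg.norm(points - median, axis=1)
        dists = np.maximum(dists, 1e-12)  # guard against division by zero
        weights = 1.0 / dists
        new_median = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_median - median) < tol:
            return new_median
        median = new_median
    return median


def aggregate_model_probabilities(subset_probs):
    """Combine per-subset posterior model probabilities into one vector."""
    agg = geometric_median(subset_probs)
    agg = np.clip(agg, 0.0, None)  # remove tiny negative round-off, if any
    return agg / agg.sum()         # renormalize to a probability vector


# Hypothetical posterior model probabilities from five subsets over three
# candidate models; the last subset is assumed to be contaminated.
subset_probs = np.array([
    [0.80, 0.15, 0.05],
    [0.75, 0.20, 0.05],
    [0.78, 0.17, 0.05],
    [0.82, 0.13, 0.05],
    [0.05, 0.05, 0.90],
])

aggregated = aggregate_model_probabilities(subset_probs)
selected_model = int(np.argmax(aggregated))  # index of the chosen model
mean_probs = subset_probs.mean(axis=0)       # non-robust baseline

print("geometric median:", aggregated)
print("arithmetic mean: ", mean_probs)
print("selected model index:", selected_model)
```

In this toy input the arithmetic mean is pulled noticeably toward the contaminated subset, while the geometric median stays close to the four subsets that agree, which is the kind of robustness the abstract refers to.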

Full work available at URL: https://arxiv.org/abs/1610.06194

zbMATH Keywords

model selection; machine learning; Bayesian statistics; scalable inference

Mathematics Subject Classification ID

Computational methods for problems pertaining to statistics (62-08); Bayesian inference (62F15); Statistical aspects of big data and data science (62R07)
