Distributed computation for marginal likelihood based model choice
From MaRDI portal
Abstract: We propose a general method for distributed Bayesian model choice based on the marginal likelihood, in which a data set is split into non-overlapping subsets. Each subset is accessed only locally by an individual worker, and no data are shared between workers. We approximate the model evidence for the full data set by Monte Carlo sampling from the posterior on every subset, which yields a model evidence per subset. The results are combined using a novel approach that corrects for the splitting using summary statistics of the generated samples. Our divide-and-conquer approach enables Bayesian model choice in the large-data setting, exploiting all available information while limiting communication between workers. We derive theoretical error bounds that quantify the resulting trade-off between computational gain and loss in precision. The embarrassingly parallel nature of the method yields substantial speed-ups on massive data sets, as illustrated by our real-world experiments. In addition, we show how the suggested approach can be extended to model choice within a reversible jump setting that explores multiple feature combinations within one run.
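The split-and-combine idea in the abstract can be illustrated with a minimal sketch. It is not the authors' estimator. It rests on several assumptions: a toy model y_i ~ N(theta, sigma2) with prior theta ~ N(0, tau2); each worker uses the fractionated prior p(theta)^(1/S); and the results are recombined through the identity Z = alpha^S * prod_s Z_s * integral of prod_s p_s(theta | y_s) d theta, where alpha normalizes the fractionated prior. All function names are hypothetical. In this conjugate model the sub-posterior means and variances are available in closed form; in practice they would be estimated from Monte Carlo draws on each worker, which makes the combination approximate.

```python
import math
import random

def log_evidence(y, sigma2, prior_var):
    """Exact log marginal likelihood for y_i ~ N(theta, sigma2), theta ~ N(0, prior_var)."""
    n, sy, syy = len(y), sum(y), sum(v * v for v in y)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - 0.5 * math.log(1 + n * prior_var / sigma2)
            - 0.5 / sigma2 * (syy - prior_var * sy * sy / (sigma2 + n * prior_var)))

def posterior_moments(y, sigma2, prior_var):
    """Mean and variance of the Gaussian posterior of theta given y."""
    var = 1.0 / (1.0 / prior_var + len(y) / sigma2)
    return var * sum(y) / sigma2, var

def log_gaussian_product_integral(means, variances):
    """Log of the integral over theta of a product of 1-D Gaussian densities."""
    C = 1.0 / sum(1.0 / c for c in variances)
    M = C * sum(m / c for m, c in zip(means, variances))
    out = 0.5 * math.log(2 * math.pi * C) + 0.5 * M * M / C
    for m, c in zip(means, variances):
        out -= 0.5 * math.log(2 * math.pi * c) + 0.5 * m * m / c
    return out

def combined_log_evidence(subsets, sigma2, tau2):
    """Recombine per-subset evidences into the full-data log evidence."""
    S = len(subsets)
    frac_var = S * tau2  # N(0, tau2)^(1/S) is proportional to N(0, S * tau2)
    # log of alpha = integral of N(theta; 0, tau2)^(1/S) d theta
    log_alpha = (0.5 * math.log(2 * math.pi * frac_var)
                 - 0.5 / S * math.log(2 * math.pi * tau2))
    # Each "worker" only needs its own subset to produce these quantities.
    log_z = [log_evidence(y, sigma2, frac_var) for y in subsets]
    means, variances = zip(*(posterior_moments(y, sigma2, frac_var) for y in subsets))
    # Only the scalar summaries (log_z, mean, variance) are communicated.
    return S * log_alpha + sum(log_z) + log_gaussian_product_integral(means, variances)

rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(60)]
subsets = [data[i::3] for i in range(3)]  # three non-overlapping subsets

full = log_evidence(data, 1.0, 4.0)
combined = combined_log_evidence(subsets, 1.0, 4.0)
print(full, combined)
```

Under these assumptions the two printed values agree up to floating-point error, because every factor is Gaussian and the identity is exact. With a non-conjugate model and sample-based summary statistics, the correction term becomes an approximation, which is the kind of precision loss that the error bounds mentioned in the abstract are meant to quantify.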
Recommendations
- Robust and parallel Bayesian model selection
- Distributed Bayesian posterior voting strategy for massive data
- Comparing consensus Monte Carlo strategies for distributed Bayesian computation
- Expectation propagation as a way of life: a framework for Bayesian inference on partitioned data
- Bayesian model choice based on Monte Carlo estimates of posterior model probabilities
Cites work
- scientific article; zbMATH DE number 6377992
- scientific article; zbMATH DE number 947416
- scientific article; zbMATH DE number 6781368
- A characterization theorem for externally Bayesian groups
- A survey of Bayesian statistical approaches for big data
- Accurate Approximations for Posterior Moments and Marginal Densities
- An asymptotic analysis of distributed nonparametric methods
- Bayes Factors
- Bayesian Aggregation
- Bayesian Inference in Econometric Models Using Monte Carlo Integration
- Bayesian auxiliary variable models for binary and multinomial regression
- Bayesian data analysis.
- Bayesian group belief
- Bayesian nonparametrics
- Bounding distributional errors via density ratios
- Communication-efficient distributed statistical inference
- Comparing consensus Monte Carlo strategies for distributed Bayesian computation
- Efficient Bayes factor estimation from the reversible jump output
- Estimating Bayes Factors via Posterior Simulation With the Laplace-Metropolis Estimator
- Expectation propagation as a way of life: a framework for Bayesian inference on partitioned data
- From EM to data augmentation: the emergence of MCMC Bayesian computation in the 1980s
- Global Consensus Monte Carlo
- Hamiltonian Monte Carlo with energy conserving subsampling
- Joining and splitting models with Markov melding
- Marginal Likelihood from the Gibbs Output
- Monte Carlo sampling methods using Markov chains and their applications
- Nested sampling for general Bayesian computation
- Noisy Monte Carlo: convergence of Markov chains with approximate transition kernels
- On Russian roulette estimates for Bayesian inference with doubly-intractable likelihoods
- Reversible jump Markov chain Monte Carlo computation and Bayesian model determination
- Robust and parallel Bayesian model selection
- Scalable Bayes via barycenter in Wasserstein space
- Sequential Monte Carlo Samplers
- Shotgun Stochastic Search for “Large p” Regression
- Simulating normalizing constants: From importance sampling to bridge sampling to path sampling
- Speeding Up MCMC by Efficient Data Subsampling
- Statistical analysis of kernel-based least-squares density-ratio estimation
- Subsampling sequential Monte Carlo for static Bayesian models
- The Bayesian Choice
- The Hastings algorithm at fifty
- The original Borda count and partial voting
- Training Products of Experts by Minimizing Contrastive Divergence
- Unbiased Markov Chain Monte Carlo Methods with Couplings
Cited in (25)
- Authors' reply to the discussion of `safe testing'
- Zihao Wen and David L. Dowe's contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- Maozai Tian, Keming Yu and Jiangfeng Wang's contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- Andrej Srakar's contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- Judith ter Schure's contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- Christian P. Robert and Joshua Bon's contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- Stefano Rizzelli's contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- Luigi Pace and Alessandra Salvan's contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- Joris Mulder's contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- Alexander Ly's contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- Sander Greenland's contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- Neil Dey, Ryan Martin, and Jonathan P. Williams' contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- Christine P. Chai's contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- Marco Cattaneo's contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- Joshua Bon and Christian P. Robert's contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- Vladimir Vovk's contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- Wenkai Xu's contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- Christian Hennig's contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- Glenn Shafer's contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- Thorsten Dickhaus's contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- Martin Larsson, Aaditya Ramdas, and Johannes Ruf's contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- David R. Bickel's contribution to the discussion of `safe testing' by Grünwald, de Heide, and Koolen
- Seconder of the vote of thanks to Grünwald, de Heide, and Koolen and contribution to the discussion of `safe testing'
- Proposer of the vote of thanks to Grünwald, de Heide, and Koolen and contribution to the discussion of `safe testing'
- Safe testing
MaRDI item Q6122039