Robust Bayesian inference for big data: combining sensor-based records with traditional survey data
Publication:2154206
Abstract: Big Data often presents as massive non-probability samples. Not only is the selection mechanism often unknown, but larger data volume amplifies the relative contribution of selection bias to total error. Existing bias adjustment approaches assume that the conditional mean structures have been correctly specified for the selection indicator or key substantive measures. In the presence of a reference probability sample, these methods rely on a pseudo-likelihood method to account for the sampling weights of the reference sample, which is parametric in nature. Under a Bayesian framework, handling the sampling weights is an even bigger hurdle. To further protect against model misspecification, we expand the idea of double robustness such that more flexible non-parametric methods, as well as Bayesian models, can be used for prediction. In particular, we employ Bayesian additive regression trees, which not only capture non-linear associations automatically but permit direct quantification of the uncertainty of point estimates through its posterior predictive draws. We apply our method to sensor-based naturalistic driving data from the second Strategic Highway Research Program using the 2017 National Household Travel Survey as a benchmark.
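The double-robustness idea mentioned in the abstract can be illustrated with a generic augmented inverse-propensity-weighted estimator of a population mean. The sketch below is not the authors' method: it assumes that outcome predictions and selection propensities have already been produced by some model (the abstract's choice is Bayesian additive regression trees, but any predictor fits here), and it returns only a point estimate, with no posterior uncertainty. The function name `doubly_robust_mean` and all argument names are hypothetical, and the numbers in the usage example are made up for illustration.

```python
import numpy as np


def doubly_robust_mean(y_np, m_np, pi_np, m_ref, w_ref):
    """Generic doubly robust (AIPW-style) estimate of a population mean.

    y_np  : observed outcomes in the non-probability sample
    m_np  : model predictions of the outcome for the same units
    pi_np : estimated selection propensities for the same units, in (0, 1]
    m_ref : model predictions of the outcome for the reference sample
    w_ref : sampling weights of the reference probability sample
    All names are illustrative assumptions, not taken from the paper.
    """
    y_np = np.asarray(y_np, dtype=float)
    m_np = np.asarray(m_np, dtype=float)
    pi_np = np.asarray(pi_np, dtype=float)
    m_ref = np.asarray(m_ref, dtype=float)
    w_ref = np.asarray(w_ref, dtype=float)

    if np.any(pi_np <= 0) or np.any(pi_np > 1):
        raise ValueError("propensities must lie in (0, 1]")

    inv_pi = 1.0 / pi_np
    # Propensity-weighted mean of the prediction residuals in the
    # non-probability sample (normalized by the sum of inverse propensities).
    residual_term = np.sum(inv_pi * (y_np - m_np)) / inv_pi.sum()
    # Design-weighted mean of the predictions in the reference sample.
    prediction_term = np.sum(w_ref * m_ref) / w_ref.sum()
    return residual_term + prediction_term


# Toy usage with made-up numbers.
y = [2.0, 4.0, 6.0]
pred_np = [2.5, 3.5, 6.5]
propensity = [0.5, 0.25, 0.25]
pred_ref = [1.0, 3.0, 5.0, 7.0]
weights_ref = [10.0, 10.0, 20.0, 10.0]

estimate = doubly_robust_mean(y, pred_np, propensity, pred_ref, weights_ref)
print(estimate)
```

If the outcome predictions are exact, the residual term vanishes and the estimate reduces to the design-weighted mean of the predictions; if the predictions are all zero, it reduces to the normalized inverse-propensity-weighted mean of the observed outcomes. In this sketch that is the sense in which either model alone can carry the estimate.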
Recommendations
- A survey of Bayesian statistical approaches for big data
- Data Integration by Combining Big Data and Survey Sample Data for Finite Population Inference
- Comment: A brief survey of the current state of play for Bayesian computation in data science at big-data scale
- Bayesian empirical likelihood inference with complex survey data
- Nonparametric Bayesian aggregation for massive data
- Bayesian data fusion: Probabilistic sensitivity analysis for unmeasured confounding using informative priors based on secondary data
- Robust Bayesian analysis: sensitivity to the prior
- Practical Bayesian modeling and inference for massive spatial data sets on modest computing environments
Cites work
- scientific article; zbMATH DE number 2140075
- A Distributional Approach for Causal Inference Using Propensity Scores
- A Note on Handling Nonresponse in Sample Surveys
- A comparative study of doubly robust estimators of the mean with missing data
- A two-step Bayesian approach for propensity score analysis: simulations and case study
- Adjusting for Nonignorable Drop-Out Using Semiparametric Nonresponse Models
- Assessing lack of common support in causal inference using Bayesian nonparametrics: Implications for evaluating the effect of breastfeeding on children's cognitive outcomes
- BART: Bayesian additive regression trees
- Bayesian approach for addressing differential covariate measurement error in propensity score methods
- Bayesian propensity score analysis for clustered observational data
- Beta Regression for Modelling Rates and Proportions
- Combining data from two independent surveys: a model-assisted approach
- Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data
- Doubly Robust Estimation in Missing Data and Causal Inference Models
- Doubly Robust Inference With Nonprobability Survey Samples
- Doubly robust inference with missing data in survey sampling
- Estimation of Regression Coefficients When Some Regressors Are Not Always Observed
- Estimation with missing data: beyond double robustness
- Imputation using response probability
- Inference for domains under imputation for missing survey data
- Inference for nonprobability samples
- Model feedback in Bayesian propensity score estimation
- Modelling and generating correlated binary variables
- On the Bias of the Multiple-Imputation Variance Estimator in Survey Sampling
- On the Validity of Inferences from Non-random Samples
- Parametric and semi-parametric estimation of regression models fitted to survey data
- Predicting human-driving behavior to help driverless vehicles drive: random intercept Bayesian additive regression trees
- Resampling Inference With Complex Survey Data
- Robust Model-Based Inference for Incomplete Data via Penalized Spline Propensity Prediction
- Statistical paradises and paradoxes in big data. I: Law of large populations, big data paradox, and the 2016 US presidential election
- The central role of the propensity score in observational studies for causal effects
- To Model or Not To Model? Competing Modes of Inference for Finite Population Sampling
- Using calibration weighting to adjust for nonignorable unit nonresponse
Cited in
(1)