Multiple imputation for sharing precise geographies in public use data
Abstract: When releasing data to the public, data stewards are ethically and often legally obligated to protect the confidentiality of data subjects' identities and sensitive attributes. They also strive to release data that are informative for a wide range of secondary analyses. Achieving both objectives is particularly challenging when data stewards seek to release highly resolved geographical information. We present an approach for protecting the confidentiality of data with geographic identifiers based on multiple imputation. The basic idea is to convert geography to latitude and longitude, estimate a bivariate response model conditional on attributes, and simulate new latitude and longitude values from these models. We illustrate the proposed methods using data describing causes of death in Durham, North Carolina. In the context of the application, we present a straightforward tool for generating simulated geographies and attributes based on regression trees, and we present methods for assessing disclosure risks with such simulated data.
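The basic idea in the abstract (model location conditional on attributes, then simulate new coordinates, repeated to give multiple imputations) can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: it factors the bivariate model sequentially (latitude given attributes, then longitude given attributes and latitude), grows a simple regression tree for each step, and draws new values by sampling observed values from the matching leaf. The function names `sse`, `grow`, `leaf_values`, and `synthesize`, the parameters `m` and `min_leaf`, and the plain donor sampling without any smoothing are all hypothetical choices made for this example.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def grow(X, y, min_leaf=5):
    """Grow a regression tree on numeric features; leaves keep the observed y values."""
    best, best_score = None, sse(y)
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted(set(row[j] for row in X))[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if score < best_score - 1e-12:  # require a strict improvement
                best, best_score = (j, t), score
    if best is None:
        return {"leaf": list(y)}
    j, t = best
    li = [i for i, row in enumerate(X) if row[j] <= t]
    ri = [i for i, row in enumerate(X) if row[j] > t]
    return {
        "split": (j, t),
        "left": grow([X[i] for i in li], [y[i] for i in li], min_leaf),
        "right": grow([X[i] for i in ri], [y[i] for i in ri], min_leaf),
    }

def leaf_values(tree, row):
    """Follow the splits for one record and return the y values stored in its leaf."""
    while "leaf" not in tree:
        j, t = tree["split"]
        tree = tree["left"] if row[j] <= t else tree["right"]
    return tree["leaf"]

def synthesize(attrs, lat, lon, m=3, min_leaf=5, seed=0):
    """Return m synthetic lists of (latitude, longitude), one pair per record."""
    rng = random.Random(seed)
    # Step 1: latitude given attributes.
    lat_tree = grow([list(a) for a in attrs], lat, min_leaf)
    # Step 2: longitude given attributes and latitude.
    lon_tree = grow([list(a) + [la] for a, la in zip(attrs, lat)], lon, min_leaf)
    datasets = []
    for _ in range(m):
        synthetic = []
        for a in attrs:
            new_lat = rng.choice(leaf_values(lat_tree, list(a)))
            new_lon = rng.choice(leaf_values(lon_tree, list(a) + [new_lat]))
            synthetic.append((new_lat, new_lon))
        datasets.append(synthetic)
    return datasets
```

Calling `synthesize(attrs, lat, lon, m=5)` with numeric attribute rows returns five synthetic coordinate sets that a data steward could release in place of the confidential locations. Because this sketch reuses observed coordinates as donors, a real release would need additional smoothing and a disclosure risk assessment of the kind the abstract mentions.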
Recommendations
- Releasing Multiply Imputed, Synthetic Public use Microdata: An Illustration and Empirical Study
- Releasing multiply-imputed synthetic data generated in two stages to protect confidentiality
- Distribution-preserving statistical disclosure limitation
- A new approach for disclosure control in the IAB establishment panel -- multiple imputation for a better data access
- Sampling with synthesis: a new approach for releasing public use census microdata
Cites work
- scientific article; zbMATH DE number 3860199
- scientific article; zbMATH DE number 2020395
- scientific article; zbMATH DE number 5274812
- A smoothing approach for masking spatial data
- BART: Bayesian additive regression trees
- Data dissemination and disclosure limitation in a world without microdata: a risk-utility framework for remote access analysis servers
- Data-swapping: A technique for disclosure control
- Estimating Risks of Identification Disclosure in Microdata
- Gaussian Predictive Process Models for Large Spatial Data Sets
- Multiple imputation for sharing precise geographies in public use data
- Releasing Multiply Imputed, Synthetic Public use Microdata: An Illustration and Empirical Study
- Sampling with synthesis: a new approach for releasing public use census microdata
- Significance tests for multi-component estimands from multiply imputed, synthetic microdata
Cited in
- 30 years of synthetic data
- Providing access to confidential research data through synthesis and verification: an application to data on employees of the U.S. federal government
- Bayesian multiscale multiple imputation with implications for data confidentiality
- Simultaneous edit-imputation and disclosure limitation for business establishment data
- Multiple imputation for sharing precise geographies in public use data
- Multiple-shrinkage multinomial probit models with applications to simulating geographies in public use data
MaRDI item Q2428744