Conditional masking to numerical data

DOI: 10.1007/S42519-019-0042-Y; zbMATH Open: 1480.68007; arXiv: 1807.05035; OpenAlex: W2883491867; MaRDI QID: Q2321775; FDO: Q2321775

Publication date: 23 August 2019

Published in: Journal of Statistical Theory and Practice

Abstract: Protecting the privacy of data-sets has become hugely important. Many real-life data-sets, such as income data and medical data, need to be secured before being made public. However, security comes at the cost of losing some useful statistical information about the data-set. Data obfuscation deals with this problem of masking a data-set in such a way that the utility of the data is maximized while the risk of disclosing sensitive information is minimized. Two popular approaches to data obfuscation for numerical data involve (i) data swapping and (ii) adding noise to the data. While the former masks well but sacrifices all correlation information, the latter gives estimates for most of the popular statistics, such as the mean, variance, quantiles, and correlation, but fails to give an unbiased estimate of the distribution curve of the original data. In this paper, we propose a mixed method of obfuscation combining the above two approaches and discuss how the proposed method succeeds in giving an unbiased estimate of the distribution curve while giving reliable estimates of other well-known statistics such as moments and correlation.
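
Illustration (not part of the published record): the trade-off between the two baseline approaches named in the abstract can be seen in a small sketch. The code below is only an assumed, minimal reading of "data swapping" (releasing a random permutation of a column) and "adding noise" (adding independent Gaussian noise); it does not implement the paper's mixed method. The function names `swap_mask` and `noise_mask`, the synthetic `income` and `spending` columns, and all parameter values are hypothetical.

```python
import random
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def swap_mask(values, rng):
    """Data swapping (assumed form): release a random permutation of the column."""
    masked = list(values)
    rng.shuffle(masked)
    return masked


def noise_mask(values, sigma, rng):
    """Noise addition (assumed form): add independent N(0, sigma^2) noise."""
    return [v + rng.gauss(0.0, sigma) for v in values]


# Synthetic, purely illustrative data: two positively correlated columns.
rng = random.Random(0)
n = 5000
sigma = 5.0
income = [rng.gauss(50.0, 10.0) for _ in range(n)]
spending = [0.8 * x + rng.gauss(0.0, 5.0) for x in income]

swapped = swap_mask(income, rng)
noisy = noise_mask(income, sigma, rng)

# Swapping keeps the marginal distribution exactly (same multiset of values)
# but breaks the link between the masked column and the other column.
same_marginal = sorted(swapped) == sorted(income)
corr_original = pearson(income, spending)
corr_swapped = pearson(swapped, spending)

# Additive noise keeps the mean approximately and inflates the variance by
# about sigma^2, so the variance can be corrected; the released values
# themselves follow a smoothed distribution rather than the original one.
mean_gap = abs(statistics.fmean(noisy) - statistics.fmean(income))
var_corrected = statistics.variance(noisy) - sigma ** 2

print("marginal preserved by swapping:", same_marginal)
print("correlation original vs. swapped:", corr_original, corr_swapped)
print("mean gap under noise:", mean_gap)
print("variance original vs. noise-corrected:",
      statistics.variance(income), var_corrected)
```

In this sketch the swapped column reproduces the original values exactly while its correlation with the second column drops to roughly zero, whereas the noisy column supports corrected estimates of the mean and variance but no longer traces the original distribution curve.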

Full work available at URL: https://arxiv.org/abs/1807.05035

zbMATH Keywords

privacy protection; quantile estimation; data obfuscation; masking numerical data-sets

Mathematics Subject Classification ID

Statistical aspects of information-theoretic topics (62B10); Privacy of data (68P27)
