Conditional masking to numerical data

From MaRDI portal
Publication:2321775

DOI: 10.1007/S42519-019-0042-Y
zbMATH Open: 1480.68007
arXiv: 1807.05035
OpenAlex: W2883491867
MaRDI QID: Q2321775
FDO: Q2321775


Authors: Debolina Ghatak, Bimal K. Roy


Publication date: 23 August 2019

Published in: Journal of Statistical Theory and Practice

Abstract: Protecting the privacy of data-sets has become hugely important. Many real-life data-sets, such as income data and medical data, need to be secured before they are made public. However, security comes at the cost of losing some useful statistical information about the data-set. Data obfuscation deals with this problem by masking a data-set in such a way that the utility of the data is maximized while the risk of disclosing sensitive information is minimized. Two popular approaches to data obfuscation for numerical data are (i) data swapping and (ii) adding noise to the data. While the former masks well, it sacrifices all of the correlation information; the latter gives estimates of most popular statistics, such as the mean, variance, quantiles, and correlation, but fails to give an unbiased estimate of the distribution curve of the original data. In this paper, we propose a mixed method of obfuscation combining the above two approaches and discuss how the proposed method succeeds in giving an unbiased estimate of the distribution curve while also giving reliable estimates of other well-known statistics such as moments and correlation.
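The abstract contrasts two baseline maskings, and the trade-off between them can be illustrated on synthetic data. The sketch below covers only those two baselines; it is not the authors' conditional masking method, whose details are in the full paper. The income/spend columns, the noise level sigma = 8, the threshold 70, and all function names are hypothetical choices made for illustration. The variance and correlation corrections rely on the standard identity Var(X + E) = Var(X) + sigma^2 for noise E independent of X.

```python
import random
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def swap_mask(column, rng):
    """Data swapping (hypothetical helper): randomly permute one column across records.

    Every original value is released, so the marginal distribution is unchanged,
    but each value's link to the other columns of its record is broken.
    """
    masked = list(column)
    rng.shuffle(masked)
    return masked


def noise_mask(column, sigma, rng):
    """Additive noise (hypothetical helper): release x + e, e ~ N(0, sigma^2) independent of x."""
    return [x + rng.gauss(0.0, sigma) for x in column]


def tail_fraction(values, threshold):
    """Empirical P(value > threshold): one point on the distribution curve."""
    return sum(v > threshold for v in values) / len(values)


def compare_maskings(n=20000, sigma=8.0, seed=0):
    """Hypothetical demo: mask 'income' in a synthetic (income, spend) data-set."""
    rng = random.Random(seed)
    income = [rng.gauss(50.0, 10.0) for _ in range(n)]
    spend = [0.6 * x + rng.gauss(0.0, 5.0) for x in income]

    swapped = swap_mask(income, rng)
    noisy = noise_mask(income, sigma, rng)

    # Var(X + E) = Var(X) + sigma^2, so subtract the known noise variance.
    var_hat = statistics.variance(noisy) - sigma ** 2
    # Independent noise leaves Cov(X, Y) unchanged but shrinks the correlation
    # by sqrt(Var(X) / Var(X + E)); multiply that factor back out.
    r_noisy = pearson(noisy, spend)
    r_hat = r_noisy * (statistics.variance(noisy) / var_hat) ** 0.5

    return {
        "income": income,
        "swapped": swapped,
        "r_true": pearson(income, spend),
        "r_swapped": pearson(swapped, spend),
        "r_noisy": r_noisy,
        "r_noise_corrected": r_hat,
        "mean_true": statistics.fmean(income),
        "mean_noisy": statistics.fmean(noisy),
        "var_true": statistics.variance(income),
        "var_noise_corrected": var_hat,
        "tail_true": tail_fraction(income, 70.0),
        "tail_swapped": tail_fraction(swapped, 70.0),
        "tail_noisy": tail_fraction(noisy, 70.0),
    }


if __name__ == "__main__":
    stats = compare_maskings()
    for key, value in stats.items():
        if isinstance(value, float):
            print(f"{key:>20}: {value:.3f}")
```

In this setup, swapping leaves the empirical distribution of income exactly intact (the tail fraction is unchanged) but drives its correlation with spend to roughly zero. Additive noise preserves the mean and, after the two corrections, recovers the variance and correlation approximately; however, the raw empirical tail P(income > 70) of the noisy column is inflated, because its distribution is the original one convolved with the noise. This is the gap the abstract says a mixed method is meant to close.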


Full work available at URL: https://arxiv.org/abs/1807.05035




