The cost of privacy: optimal rates of convergence for parameter estimation with differential privacy

DOI 10.1214/21-AOS2058

MaRDI QID Q2054532

Authors Yichen Wang, Linjun Zhang, T. Tony Cai

Publication date 3 December 2021

Published in The Annals of Statistics

Full work available at URL https://arxiv.org/abs/1902.04495, https://projecteuclid.org/journals/annals-of-statistics/volume-49/issue-5/The-cost-of-privacy-Optimal-rates-of-convergence-for/10.1214/21-AOS2058.full

zbMATH Keywords

high-dimensional data; linear regression; minimax optimality; differential privacy; mean estimation

Mathematics Subject Classification ID

Asymptotic properties of parametric estimators (62F12)
Linear regression; mixed models (62J05)
Minimax procedures in statistical decision theory (62C20)
Parametric inference under constraints (62F30)
Authentication, digital signatures and secret sharing (94A62)

Abstract: Privacy-preserving data analysis is a rising challenge in contemporary statistics, as the privacy guarantees of statistical methods are often achieved at the expense of accuracy. In this paper, we investigate the tradeoff between statistical accuracy and privacy in mean estimation and linear regression, under both the classical low-dimensional and modern high-dimensional settings. A primary focus is to establish minimax optimality for statistical estimation with the $(\varepsilon, \delta)$-differential privacy constraint. To this end, we find that classical lower bound arguments fail to yield sharp results, and new technical tools are called for. By refining the "tracing adversary" technique for lower bounds in the theoretical computer science literature, we formulate a general lower bound argument for minimax risks with differential privacy constraints, and apply this argument to high-dimensional mean estimation and linear regression problems. We also design computationally efficient algorithms that attain the minimax lower bounds up to a logarithmic factor. In particular, for the high-dimensional linear regression, a novel private iterative hard thresholding pursuit algorithm is proposed, based on a privately truncated version of stochastic gradient descent. The numerical performance of these algorithms is demonstrated by simulation studies and applications to real data containing sensitive information, for which privacy-preserving statistical methods are necessary.
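Illustration (not taken from the paper): to make the accuracy/privacy tradeoff in private mean estimation concrete, here is a minimal sketch of the classical Gaussian mechanism applied to a clipped sample mean. It is not the authors' algorithm and makes no claim to the minimax rates in the abstract. The names `clip_l2` and `private_mean`, the clipping radius `c`, and the replace-one-record notion of neighbouring data sets are assumptions made for this example. The noise calibration is the standard one, valid for 0 < epsilon < 1.

```python
import math
import random


def clip_l2(x, c):
    """Rescale vector x so that its Euclidean norm is at most c."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= c:
        return list(x)
    return [v * c / norm for v in x]


def private_mean(data, c, epsilon, delta, rng=None):
    """Clipped sample mean plus Gaussian noise (hypothetical helper).

    Assumes neighbouring data sets differ by replacing one record, so the
    L2 sensitivity of the clipped mean is 2 * c / n.
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("this calibration assumes 0 < epsilon < 1 and 0 < delta < 1")
    rng = rng or random.Random()
    n, d = len(data), len(data[0])
    clipped = [clip_l2(x, c) for x in data]
    mean = [sum(x[j] for x in clipped) / n for j in range(d)]
    sensitivity = 2.0 * c / n
    # Classical Gaussian-mechanism noise scale.
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    return [m + rng.gauss(0.0, sigma) for m in mean]


if __name__ == "__main__":
    sample_rng = random.Random(0)
    sample = [[sample_rng.gauss(1.0, 1.0) for _ in range(3)] for _ in range(5000)]
    print(private_mean(sample, c=5.0, epsilon=0.5, delta=1e-5, rng=random.Random(1)))
```

In this sketch the noise standard deviation scales like c / (n * epsilon), so a smaller epsilon (stronger privacy) or a smaller sample gives a noisier estimate.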

Cited in (30)

Uses Software

RAPPOR
