Distribution-invariant differential privacy

Research output: Contribution to journal › Article › peer-review

Abstract

Differential privacy is becoming one gold standard for protecting the privacy of publicly shared data. It has been widely used in social science, data science, public health, information technology, and the U.S. decennial census. Nevertheless, to guarantee differential privacy, existing methods may unavoidably alter the conclusion of original data analysis, as privatization often changes the sample distribution. This phenomenon is known as the trade-off between privacy protection and statistical accuracy. In this work, we mitigate this trade-off by developing a distribution-invariant privatization (DIP) method to reconcile both high statistical accuracy and strict differential privacy. As a result, any downstream statistical or machine learning task yields essentially the same conclusion as if one used the original data. Numerically, under the same strictness of privacy protection, DIP achieves superior statistical accuracy in a wide range of simulation studies and real-world benchmarks.

Original language: English (US)
Journal: Journal of Econometrics
State: Accepted/In press - 2022

Bibliographical note

Funding Information:
The authors thank the editor and three reviewers for insightful comments and suggestions, which improve the article significantly. This research is supported in part by NSF, USA grant DMS-1952539, and NIH, USA grants R01AG069895, R01AG065636, 1R01GM126002, R01HL105397, R01AG074858, and U01AG073079.

Publisher Copyright:
© 2022 Elsevier B.V.

Keywords

  • Data perturbation
  • Data sharing
  • Distribution preservation
  • Privacy protection
  • Randomized mechanism
