Generating Poisson-distributed differentially private synthetic data

Research output: Contribution to journal › Article › peer-review


Abstract

The dissemination of synthetic data can be an effective means of making information from sensitive data publicly available with a reduced risk of disclosure. While mechanisms exist for synthesizing data that satisfy formal privacy guarantees, they typically do not resemble the models an end-user might use to analyse the data. More recently, methods from the disease mapping literature have been proposed for generating spatially referenced synthetic data with high utility, but without formal privacy guarantees. The objective of this paper is to help bridge the gap between the disease mapping and differential privacy literatures. In particular, we generalize an approach for generating differentially private synthetic data currently used by the US Census Bureau to the case of Poisson-distributed count data, in a way that accommodates heterogeneity in population sizes and allows for the infusion of prior information regarding the underlying event rates. Following a pair of small simulation studies, we illustrate the utility of the synthetic data produced by this approach using publicly available, county-level heart disease-related death counts. This study demonstrates the benefits of the proposed approach's flexibility with respect to heterogeneity in population sizes and event rates, while motivating further research to improve its utility.
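
The abstract describes the mechanism only in outline. As a minimal sketch, assuming a conjugate Poisson-gamma model (counts y_i ~ Poisson(n_i λ_i) with a Gamma(a_i, b_i) prior on each event rate, shape a_i and rate b_i), synthetic counts can be drawn from the posterior predictive distribution, with the population sizes n_i carrying the heterogeneity and (a_i, b_i) carrying the prior information about event rates. This is not the authors' code, and the function names `synthesize_counts`, `log_pred_pmf` and `max_privacy_loss` are hypothetical. The last function is only a numerical check of the privacy loss between neighbouring inputs, under the further assumptions that neighbours differ by a single event in one area (y versus y + 1 with n fixed) and that only outputs up to a cap z_max are examined; it is not a proof of differential privacy and does not reproduce the paper's privacy conditions.

```python
import math

import numpy as np


def synthesize_counts(y, n, a, b, rng):
    """Draw one synthetic count per area from an assumed Poisson-gamma model.

    Illustrative model (an assumption, not necessarily the paper's exact one):
        y_i ~ Poisson(n_i * lam_i),   lam_i ~ Gamma(shape=a_i, rate=b_i)
    Conjugacy gives lam_i | y_i ~ Gamma(a_i + y_i, b_i + n_i); a synthetic count
    is then z_i ~ Poisson(n_i * lam_i*) with lam_i* drawn from that posterior.
    """
    y, n, a, b = (np.asarray(v, dtype=float) for v in (y, n, a, b))
    # NumPy's gamma sampler takes a scale parameter, i.e. 1 / rate.
    lam_star = rng.gamma(shape=a + y, scale=1.0 / (b + n))
    return rng.poisson(n * lam_star)


def log_pred_pmf(z, y, n, a, b):
    """Log posterior predictive probability of synthetic count z for one area.

    Under the assumed model this is negative binomial with size r = a + y and
    success probability p = (b + n) / (b + 2n).
    """
    r = a + y
    p = (b + n) / (b + 2 * n)
    return (math.lgamma(r + z) - math.lgamma(r) - math.lgamma(z + 1)
            + r * math.log(p) + z * math.log1p(-p))


def max_privacy_loss(y, n, a, b, z_max):
    """Largest |log p(z | y + 1) - log p(z | y)| over outputs z = 0..z_max.

    Assumes neighbouring inputs differ by one event in a single area with n
    held fixed. This is a numerical diagnostic over a capped output range,
    not a proof of differential privacy.
    """
    return max(
        abs(log_pred_pmf(z, y + 1, n, a, b) - log_pred_pmf(z, y, n, a, b))
        for z in range(z_max + 1)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(2021)
    # Hypothetical inputs: observed counts, population sizes, prior shape/rate.
    y = [3, 0, 12]
    n = [1000, 250, 5000]
    a = [5.0, 5.0, 5.0]
    b = [1000.0, 1000.0, 1000.0]
    print("synthetic counts:", synthesize_counts(y, n, a, b, rng))
    # Compare the worst-case log ratio over z <= 50 for two prior shapes.
    for shape in (5.0, 50.0):
        print(shape, max_privacy_loss(0, 1000, shape, 1000.0, z_max=50))
```

Under this assumed model the log ratio for an output z works out to log((a + y + z)/(a + y)) + log((b + n)/(b + 2n)), so it is largest when y = 0 and grows without bound in z. A larger prior shape a damps that growth, which gives one intuition for why prior information and some restriction on outputs matter for a formal privacy guarantee.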

Original language: English (US)
Pages (from-to): 1093-1108
Number of pages: 16
Journal: Journal of the Royal Statistical Society. Series A: Statistics in Society
Volume: 184
Issue number: 3
State: Published, July 2021
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2021 Royal Statistical Society

Keywords

  • Bayesian methods
  • confidentiality
  • data suppression
  • disclosure risk
  • spatial data
  • uncertainty
