Improving the Utility of Poisson-Distributed, Differentially Private Synthetic Data Via Prior Predictive Truncation with an Application to CDC WONDER

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

CDC WONDER is a web-based tool for the dissemination of epidemiologic data collected by the National Vital Statistics System. While CDC WONDER has built-in privacy protections, these protections do not satisfy formal privacy definitions such as differential privacy and are thus susceptible to targeted attacks. Given the importance of making high-quality public health data publicly available while preserving the privacy of the underlying data subjects, we aim to improve the utility of a recently developed approach for generating Poisson-distributed, differentially private synthetic data by using publicly available information to truncate the range of the synthetic data. Specifically, we utilize county-level population information from the US Census Bureau and national death reports produced by the CDC to inform prior distributions on county-level death rates and infer reasonable ranges for Poisson-distributed, county-level death counts. In doing so, the requirements for satisfying differential privacy for a given privacy budget can be reduced by several orders of magnitude, thereby leading to substantial improvements in utility. To illustrate our proposed approach, we consider a dataset comprising over 26,000 cancer-related deaths from the Commonwealth of Pennsylvania, spread across over 47,000 combinations of cause of death and demographic variables such as age, race, sex, and county of residence. We demonstrate the proposed framework's ability to preserve features such as geographic, urban/rural, and racial disparities present in the true data.
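As a rough illustration of the idea of inferring a plausible range for a Poisson-distributed count from a prior on the underlying rate, the sketch below assumes a gamma prior on the death rate, so that the prior predictive distribution of the count is negative binomial. The gamma prior, the function name `prior_predictive_upper_bound`, the `tail_prob` parameter, and all numeric values are assumptions made for this example and are not taken from the article; the article's actual priors and truncation rule may differ, and the sketch does not implement any differential privacy mechanism.

```python
import math


def prior_predictive_upper_bound(population, shape, rate, tail_prob=1e-6):
    """Smallest count u with P(Y <= u) >= 1 - tail_prob, where
    lambda ~ Gamma(shape, rate) and Y | lambda ~ Poisson(population * lambda).

    Under these (assumed) choices Y is negative binomial with
    size = shape and success probability p = rate / (rate + population).
    """
    p = rate / (rate + population)
    cdf = 0.0
    # A death count cannot exceed the population, so the search stops there.
    for y in range(population + 1):
        log_pmf = (
            math.lgamma(y + shape)
            - math.lgamma(shape)
            - math.lgamma(y + 1)
            + shape * math.log(p)
            + y * math.log1p(-p)
        )
        cdf += math.exp(log_pmf)
        if cdf >= 1.0 - tail_prob:
            return y
    return population


# Hypothetical county: 10,000 residents, prior mean death rate 2 / 1000 = 0.002.
population = 10_000
bound = prior_predictive_upper_bound(population, shape=2.0, rate=1000.0)
print(f"Counts above {bound} are implausible a priori (population {population}).")
```

In this hypothetical setting the inferred upper bound is far smaller than the population, which is the kind of range reduction the abstract describes as the source of the utility gain.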

Original language: English (US)
Pages (from-to): 596-617
Number of pages: 22
Journal: Journal of Survey Statistics and Methodology
Volume: 10
Issue number: 3
State: Published - Jun 1 2022
Externally published: Yes

Bibliographical note

Publisher Copyright:
© The Author(s) 2022. Published by Oxford University Press on behalf of the American Association for Public Opinion Research.

Keywords

  • Bayesian methods
  • Cancer mortality
  • Confidentiality
  • Data suppression
  • Disclosure risk
  • Spatial data
