Using spatiotemporal models to generate synthetic data for public use

Harrison Quick, Lance A. Waller

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

SUMMARY. When agencies release public-use data, they must be cognizant of the potential risk of disclosure associated with making their data publicly available. This issue is particularly pertinent in disease mapping, where small counts pose both inferential challenges and potential disclosure risks. While the small area estimation, disease mapping, and statistical disclosure limitation literatures are individually robust, there have been few intersections between them. Here, we formally propose the use of spatiotemporal data analysis methods to generate synthetic data for public use. Specifically, we analyze ten years of county-level heart disease death counts for multiple age-groups using a Bayesian model that accounts for dependence spatially, temporally, and between age-groups; generating synthetic data from the resulting posterior predictive distribution will preserve these dependencies. After demonstrating the synthetic data's privacy-preserving features, we illustrate their utility by comparing estimates of urban/rural disparities from the synthetic data to those from data with small counts suppressed.
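As a rough illustration of what "generating synthetic data from the posterior predictive distribution" means, the sketch below uses a deliberately simplified stand-in: an independent Poisson-gamma model for each area. This is not the authors' model, which accounts for dependence spatially, temporally, and between age-groups. The function name `synthesize_counts`, the prior parameters, and the toy counts and populations are all hypothetical and chosen only for illustration.

```python
import numpy as np


def synthesize_counts(deaths, population, prior_shape=1.0, prior_rate=1000.0, seed=0):
    """Draw one synthetic count per area from a Poisson-gamma posterior predictive.

    Assumed toy model (NOT the paper's multivariate spatiotemporal model):
        deaths_i ~ Poisson(population_i * rate_i)
        rate_i   ~ Gamma(shape=prior_shape, rate=prior_rate)
    """
    rng = np.random.default_rng(seed)
    deaths = np.asarray(deaths, dtype=float)
    population = np.asarray(population, dtype=float)

    # Conjugate update: rate_i | data ~ Gamma(shape + deaths_i, rate + population_i)
    post_shape = prior_shape + deaths
    post_rate = prior_rate + population

    # NumPy parameterizes the gamma by scale, which is 1 / rate.
    rates = rng.gamma(shape=post_shape, scale=1.0 / post_rate)

    # Posterior predictive draw: a new count given the sampled rate.
    return rng.poisson(population * rates)


# Hypothetical observed counts and populations for four areas.
deaths = [0, 3, 12, 150]
population = [800, 2500, 9000, 120000]
synthetic = synthesize_counts(deaths, population)
```

Because each synthetic count is a fresh draw rather than the observed value, small observed counts are not released directly. Under this independent toy model, however, none of the cross-area, cross-year, or cross-age-group dependence described in the abstract would be preserved.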

Original language: English (US)
Pages (from-to): 37-45
Number of pages: 9
Journal: Spatial and Spatio-temporal Epidemiology
Volume: 27
State: Published - Nov 2018
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2018 Elsevier Ltd

Keywords

  • Bayesian data analysis
  • Disclosure risk
  • Disease mapping
  • Multivariate conditional autoregressive models
  • Small area estimation
