Generating partially synthetic geocoded public use data with decreased disclosure risk by using differential smoothing

Harrison Quick, Scott H. Holan, Christopher K. Wikle

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

When collecting geocoded confidential data with the intent to disseminate, agencies often resort to altering the geographies before making data publicly available. An alternative to releasing aggregated and/or perturbed data is to release synthetic data, where sensitive values are replaced with draws from models designed to capture distributional features in the data collected. The issues associated with spatially outlying observations in the data, however, have received relatively little attention. Our goal here is to shed light on this problem, to propose a solution—referred to as ‘differential smoothing’—and to illustrate our approach by using sale prices of homes in San Francisco.

Original language: English (US)
Pages (from-to): 649-661
Number of pages: 13
Journal: Journal of the Royal Statistical Society. Series A: Statistics in Society
Volume: 181
Issue number: 3
State: Published - Jun 2018
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2018 Royal Statistical Society

Keywords

  • Bayesian methods
  • Data privacy
  • Multiple imputation
  • Spatial modelling
  • Synthetic data
