Abstract
When collecting geocoded confidential data with the intent to disseminate, agencies often resort to altering the geographies before making data publicly available. An alternative to releasing aggregated and/or perturbed data is to release synthetic data, where sensitive values are replaced with draws from models designed to capture distributional features in the data collected. The issues associated with spatially outlying observations in the data, however, have received relatively little attention. Our goal here is to shed light on this problem, to propose a solution—referred to as ‘differential smoothing’—and to illustrate our approach by using sale prices of homes in San Francisco.
Original language | English (US) |
---|---|
Pages (from-to) | 649-661 |
Number of pages | 13 |
Journal | Journal of the Royal Statistical Society. Series A: Statistics in Society |
Volume | 181 |
Issue number | 3 |
State | Published - Jun 2018 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2018 Royal Statistical Society
Keywords
- Bayesian methods
- Data privacy
- Multiple imputation
- Spatial modelling
- Synthetic data