Abstract
In the current era of global internet connectivity, privacy concerns are of the utmost importance. When official statistical agencies collect spatially referenced, confidential data that they intend to release as public-use files, suppressing small counts is a common measure they take to protect the confidentiality of data subjects from ill-intentioned users. The goal of this paper is to demonstrate that an interval suppression criterion that does not suppress zeros can fail to protect regions with a single occurrence. We illustrate the difference in disclosure risk between an interval suppression criterion and a one-sided suppression criterion using a US county-level dataset of the number of deaths due to stroke among White men. Specifically, for regions with a single occurrence among a population of fewer than 600, an interval suppression criterion leads to a twofold increase in disclosure risk compared with a one-sided suppression criterion. We conclude by extending these findings beyond stroke mortality and offering general guidelines for data suppression.
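The distinction between the two criteria can be sketched in a few lines of Python. This is a hypothetical illustration, not the paper's method: the threshold of 10 and the function names `interval_suppress`, `one_sided_suppress`, and `feasible_values` are assumptions chosen for the example. Under the interval criterion, zeros are released while counts from 1 up to the threshold are suppressed; under the one-sided criterion, every count below the threshold, including zero, is suppressed.

```python
from typing import List, Optional

# Hypothetical suppression threshold, assumed for illustration only.
THRESHOLD = 10


def interval_suppress(count: int, threshold: int = THRESHOLD) -> Optional[int]:
    """Interval criterion: suppress counts in [1, threshold - 1]; release zeros as-is."""
    return None if 1 <= count < threshold else count


def one_sided_suppress(count: int, threshold: int = THRESHOLD) -> Optional[int]:
    """One-sided criterion: suppress every count below threshold, zeros included."""
    return None if count < threshold else count


def feasible_values(released: Optional[int], suppresses_zero: bool,
                    threshold: int = THRESHOLD) -> List[int]:
    """True counts a data user could infer as consistent with a released cell."""
    if released is not None:
        # An unsuppressed cell reveals the exact count.
        return [released]
    # A suppressed cell hides a count below the threshold; zero is only
    # a possibility if the criterion also suppresses zeros.
    low = 0 if suppresses_zero else 1
    return list(range(low, threshold))


# A county with a single stroke death under each criterion.
print(feasible_values(interval_suppress(1), suppresses_zero=False))
print(feasible_values(one_sided_suppress(1), suppresses_zero=True))
```

For a county with a single death, the suppressed cell under the interval criterion rules out zero, so a user learns that at least one death occurred; under the one-sided criterion, zero remains among the possibilities. The quantitative risk comparison reported in the paper is not reproduced by this sketch.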
Original language | English (US) |
---|---|
Pages (from-to) | 227-234 |
Number of pages | 8 |
Journal | Stat |
Volume | 4 |
Issue number | 1 |
DOIs | |
State | Published - Feb 2015 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2015 John Wiley & Sons, Ltd.
Keywords
- Bayesian methods
- Data privacy
- Disclosure limitation
- Spatial data analysis
- Synthetic data