Controlled shuffling, statistical confidentiality and microdata utility: A successful experiment with a 10% household sample of the 2011 population census of Ireland for the IPUMS-international database

Robert McCaa, Krishnamurty Muralidhar, Rathindra Sarathy, Michael Comerford, Albert Esteve-Palos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

IPUMS-International disseminates more than two hundred fifty integrated, confidentialized census microdata samples to thousands of researchers worldwide at no cost. The number of samples is increasing at a rate of several dozen per year, as quickly as the task of integrating metadata and microdata is completed. Protecting the statistical confidentiality and privacy of individuals represented in the microdata is a sine qua non of the IPUMS project. For the 2010 round of censuses, even greater protections are required, while researchers are demanding ever higher precision and utility. This paper describes a tripartite collaborative experiment using a ten percent household sample of the 2011 census of Ireland to estimate risk, mask the microdata using controlled shuffling, and assess analytical utility by comparing the masked data against the unprotected source microdata. Controlled shuffling exploits hierarchically ordered coding schemes to protect privacy and enhance utility. With controlled shuffling, the lesson seems to be that more detail means less risk and greater utility. Overall, despite substantial perturbation of the masked dataset (30% of adults on one or more characteristics), we find that data utility is very high and information loss is slight, even for fairly complex analytical problems.

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Editors: Josep Domingo-Ferrer
Publisher: Springer Verlag
Pages: 326-337
Number of pages: 12
ISBN (Electronic): 9783319112565
DOIs: https://doi.org/10.1007/978-3-319-11257-2_25
State: Published - Jan 1 2014
Event: International Conference on Privacy in Statistical Databases, PSD 2014 - Ibiza, Spain
Duration: Sep 17 2014 to Sep 19 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8744
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: International Conference on Privacy in Statistical Databases, PSD 2014
Country: Spain
City: Ibiza
Period: 9/17/14 to 9/19/14

Keywords

  • Controlled shuffling
  • Data privacy
  • Data utility
  • IPUMS-International
  • Ireland
  • Microdata sample
  • Population census
  • Statistical disclosure controls


Cite this

    McCaa, R., Muralidhar, K., Sarathy, R., Comerford, M., & Esteve-Palos, A. (2014). Controlled shuffling, statistical confidentiality and microdata utility: A successful experiment with a 10% household sample of the 2011 population census of Ireland for the IPUMS-international database. In J. Domingo-Ferrer (Ed.), Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp. 326-337). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8744). Springer Verlag. https://doi.org/10.1007/978-3-319-11257-2_25