You are what you say: Privacy risks of public mentions

Dan Frankowski, Dan Cosley, Shilad Sen, Loren Terveen, John Riedl

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

54 Scopus citations

Abstract

In today's data-rich networked world, people express many aspects of their lives online. It is common to segregate different aspects in different places: you might write opinionated rants about movies in your blog under a pseudonym while participating in a forum or web site for scholarly discussion of medical ethics under your real name. However, it may be possible to link these separate identities, because the movies, journal articles, or authors you mention are from a sparse relation space whose properties (e.g., many items related to by only a few users) allow re-identification. This re-identification violates people's intentions to separate aspects of their life and can have negative consequences; it also may allow other privacy violations, such as obtaining a stronger identifier like name and address. This paper examines this general problem in a specific setting: re-identification of users from a public web movie forum in a private movie ratings dataset. We present three major results. First, we develop algorithms that can re-identify a large proportion of public users in a sparse relation space. Second, we evaluate whether private dataset owners can protect user privacy by hiding data; we show that this requires extensive and undesirable changes to the dataset, making it impractical. Third, we evaluate two methods for users in a public forum to protect their own privacy, suppression and misdirection. Suppression doesn't work here either. However, we show that a simple misdirection strategy works well: mention a few popular items that you haven't rated.
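
As a rough illustration of why a sparse relation space permits re-identification, the sketch below ranks users in a private ratings dataset by how much their rated items overlap with one public user's mentions, counting rarely rated items more heavily. It is not the paper's algorithm: the function name `reidentify`, the log-based rarity weight, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def reidentify(mentions, ratings):
    """Rank private users by rarity-weighted overlap with public mentions.

    mentions: set of item names one public user has mentioned.
    ratings:  dict mapping private user id -> set of item names rated.
    Returns user ids, most likely match first (hypothetical scoring).
    """
    n_users = len(ratings)
    # How many private users rated each item.
    popularity = Counter(item for items in ratings.values() for item in items)

    scores = {}
    for user, items in ratings.items():
        # Assumed weight: an item rated by few users is stronger evidence
        # than one rated by nearly everyone (IDF-style log weight).
        scores[user] = sum(
            math.log(n_users / popularity[item]) for item in mentions & items
        )
    # Highest score first; ties broken by user id for a stable order.
    return sorted(scores, key=lambda u: (-scores[u], u))


# Toy data, invented for this example.
ratings = {
    "u1": {"Star Wars", "Obscure Film A", "Obscure Film B"},
    "u2": {"Star Wars", "Titanic"},
    "u3": {"Star Wars", "Titanic", "Obscure Film C"},
}
mentions = {"Star Wars", "Obscure Film A"}

ranking = reidentify(mentions, ratings)
print(ranking)  # ['u1', 'u2', 'u3']
```

In this toy data, the single rarely rated title is enough to put `u1` first, while the title everyone has rated contributes nothing to any score.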

Original language: English (US)
Title of host publication: Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages: 565-572
Number of pages: 8
State: Published - Oct 31 2006
Event: 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - Seattle, WA, United States
Duration: Aug 6 2006 to Aug 11 2006

Publication series

Name: Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Volume: 2006

Other

Other: 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Country: United States
City: Seattle, WA
Period: 8/6/06 to 8/11/06

Keywords

  • Datasets
  • Mentions
  • Privacy
  • Re-identification
  • Sparse relation space
  • k-anonymity
  • k-identification
