Name discrimination and E-mail clustering using unsupervised clustering of similar contexts

Anagha Kulkarni, Ted Pedersen

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

In this paper, we apply an unsupervised word-sense discrimination technique based on clustering similar contexts (Purandare & Pedersen, 2004) to the problems of name discrimination and e-mail clustering. Names of people, places, and organizations are not always unique, which can create a problem when we refer to or seek out information about such entities. We show that, when this ambiguity occurs in written text, the occurrences of an ambiguous name can be clustered into distinct groups by identifying which of their contexts are similar to each other. Pedersen et al. (2005) previously showed that this approach can successfully discriminate names with two-way ambiguity. Here we show that it can be extended to multi-way distinctions as well. Along similar lines, we also observe that e-mail messages can be treated as contexts, and that clustering them groups the messages by their underlying topic.
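
As a rough illustration of the idea in the abstract above, the following minimal sketch groups occurrences of an ambiguous name by the similarity of their surrounding words. It is not the authors' pipeline: the abstract does not specify the features or the clustering algorithm, so bag-of-words vectors, cosine similarity, and average-link agglomerative clustering are used here as simple stand-ins. The function names, the stopword list, the example sentences, and the assumption that the number of clusters is known in advance are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is",
             "from", "into", "through", "near"}


def context_vector(text, target):
    """Bag-of-words counts for one context, excluding the ambiguous name itself."""
    tokens = re.findall(r"[a-z]+", text.lower())
    skip = STOPWORDS | {target.lower()}
    return Counter(t for t in tokens if t not in skip)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_contexts(contexts, k, target):
    """Average-link agglomerative clustering of contexts into k groups.

    Returns a sorted list of clusters, each a sorted list of context indices.
    """
    vectors = [context_vector(c, target) for c in contexts]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None  # (average similarity, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [cosine(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the most similar pair
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Made-up contexts for a three-way ambiguous name (player, country, river).
CONTEXTS = [
    "Jordan scored thirty points in the basketball game",
    "The basketball team relied on Jordan to win the game",
    "Jordan borders Israel and the capital city is Amman",
    "Tourists visit Petra, a city in Jordan near the capital",
    "The Jordan river flows south into the Dead Sea",
    "Water from the river Jordan flows through the valley to the sea",
]

if __name__ == "__main__":
    for group in cluster_contexts(CONTEXTS, k=3, target="Jordan"):
        print(group)
```

The same function could be applied to e-mail clustering by passing message bodies as the contexts and an empty string as the target, so that no token is excluded beyond the stopwords.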

Original language: English (US)
Pages (from-to): 37-50
Number of pages: 14
Journal: Journal of Intelligent Systems
Volume: 17
Issue number: 1-3
State: Published - Jan 1 2008

Keywords

  • Contextual similarity
  • E-mail clustering
  • Proper name discrimination
  • Unsupervised clustering
  • Word sense discrimination
