Name discrimination and email clustering using unsupervised clustering and labeling of similar contexts

Anagha Kulkarni, Ted Pedersen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

16 Scopus citations

Abstract

In this paper, we apply an unsupervised word sense discrimination technique based on clustering similar contexts (Purandare and Pedersen, 2004) to the problems of name discrimination and email clustering. Names of people, places, and organizations are not always unique, which can create a problem when we refer to or seek out information about such entities. When this occurs in written text, we show that we can cluster ambiguous names into unique groups by identifying which contexts are similar to each other. Pedersen, Purandare, and Kulkarni (2005) previously showed that this approach can be used successfully to discriminate names with two-way ambiguity. Here we show that it can be extended to multiway distinctions as well. We adapt the cluster labeling technique introduced by Kulkarni (2005) to the multiway distinctions of name discrimination. Along the same lines of contextual similarity, we also observe that email messages can be treated as contexts, and that by clustering them we are able to group them based on their underlying content rather than the occurrence of specific strings.
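The idea of grouping ambiguous names by the similarity of their contexts can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain first-order bag-of-words representation, cosine similarity, and average-link agglomerative clustering, and the names `vectorize`, `cosine`, `cluster_contexts`, the stopword list, and the toy "John Smith" sentences are all hypothetical choices made for illustration only. The number of clusters `k` is assumed to be known in advance.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "was",
             "for", "on", "at", "with", "his", "her"}


def vectorize(context, target):
    """Bag-of-words counts for one context, leaving out the ambiguous name."""
    tokens = re.findall(r"[a-z]+", context.lower())
    skip = STOPWORDS | set(target.lower().split())
    return Counter(t for t in tokens if t not in skip)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def cluster_contexts(contexts, target, k):
    """Merge the most similar pair of clusters (average link) until k remain."""
    vectors = [vectorize(c, target) for c in contexts]
    clusters = [[i] for i in range(len(contexts))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [cosine(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy contexts for a hypothetical three-way ambiguous name.
contexts = [
    "John Smith scored two goals in the football match",
    "The senator John Smith voted on the budget bill",
    "Football fans cheered as John Smith scored again in the match",
    "John Smith introduced a bill in the senate budget committee",
    "John Smith opened a bakery selling bread and pastry",
    "Fresh bread from the bakery of John Smith sold out",
]
groups = cluster_contexts(contexts, "John Smith", 3)
print(groups)
```

The same function could in principle be applied to email messages by passing each message body as a context with an empty target string, so that messages are grouped by shared content words rather than by a specific string.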

Original language: English (US)
Title of host publication: Proceedings of the 2nd Indian International Conference on Artificial Intelligence, IICAI 2005
Pages: 703-722
Number of pages: 20
State: Published - 2005
Event: 2nd Indian International Conference on Artificial Intelligence, IICAI 2005 - Pune, India
Duration: Dec 20 2005 - Dec 22 2005

