Identifying similar words and contexts in natural language with SenseClusters

Ted Pedersen, Anagha Kulkarni

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

SenseClusters is a freely available intelligent system that clusters together similar contexts in natural language text. Thereafter it assigns identifying labels to these clusters based on their content. It is a purely unsupervised approach that is language independent, and uses no knowledge other than what is available in raw un-annotated corpora. In addition to clustering similar contexts, it can be used to identify synonyms and sets of related words. It has been applied to a diverse range of problems, including proper name disambiguation, word sense discrimination, email organization, and document clustering. SenseClusters is a complete system that supports feature selection from large corpora, several different context representation schemes, various clustering algorithms, the creation of descriptive and discriminating labels for the discovered clusters, and evaluation relative to gold standard data.
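
The pipeline the abstract describes (represent each context by features drawn from raw text, group similar contexts, then label each group from its content) can be illustrated with a minimal sketch. This is not SenseClusters code and does not reflect its interface. The function names, the stopword list, the similarity threshold, and the choice of single-link grouping over cosine similarity are all assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy data: four contexts of the ambiguous word "bank".
CONTEXTS = [
    "the bank approved the loan and the mortgage",
    "she deposited money at the bank for a loan",
    "we fished from the river bank near the water",
    "the river bank was muddy after the water rose",
]
# Assumed stopword list; the target word itself is excluded as a feature.
STOPWORDS = frozenset(
    "the a and at for from we she was after near bank".split()
)

def vectorize(context, stopwords=STOPWORDS):
    """Bag-of-words feature vector built from the raw, un-annotated text."""
    return Counter(w for w in context.lower().split() if w not in stopwords)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_contexts(contexts, threshold=0.2):
    """Single-link grouping: join any two contexts whose similarity
    reaches the (assumed) threshold. Returns lists of context indices."""
    vectors = [vectorize(c) for c in contexts]
    parent = list(range(len(contexts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(contexts)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def label_cluster(indices, contexts, top_n=2):
    """Label a cluster with its most frequent content words."""
    counts = Counter()
    for i in indices:
        counts.update(vectorize(contexts[i]))
    return [word for word, _ in counts.most_common(top_n)]

if __name__ == "__main__":
    for members in cluster_contexts(CONTEXTS):
        print(members, label_cluster(members, CONTEXTS))
```

On this toy input the financial and river contexts share no content words, so they fall into separate groups. A real system would select features from a much larger corpus and evaluate the result against gold standard data, as the abstract notes.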

Original language: English (US)
Title of host publication: Proceedings of the 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, AAAI-05/IAAI-05
Pages: 1694-1695
Number of pages: 2
Volume: 4
State: Published - Dec 1 2005
Event: 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, AAAI-05/IAAI-05 - Pittsburgh, PA, United States
Duration: Jul 9 2005 – Jul 13 2005

Other

Other: 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, AAAI-05/IAAI-05
Country/Territory: United States
City: Pittsburgh, PA
Period: 7/9/05 – 7/13/05
