The effect of different context representations on word sense discrimination in biomedical texts

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Scopus citations

Abstract

Unsupervised word sense discrimination relies on the idea that words that occur in similar contexts will have similar meanings. These techniques cluster the multiple contexts in which an ambiguous word occurs, and the number of clusters discovered indicates the number of senses in which the ambiguous word is used. One important distinction among these methods is the underlying means of representing the contexts to be clustered. This paper compares the efficacy of first-order methods, which directly represent the features that occur in a context, with several second-order methods that use a more indirect representation. The experiments in this paper show that second-order methods that use word-by-word co-occurrence matrices result in the highest accuracy and the most robust word sense discrimination. These experiments were conducted on MEDLINE abstracts that contained pseudo-words, which were created by conflating pairs of MeSH preferred terms into new ambiguous words. The experiments were carried out with SenseClusters, a freely available open-source software package.
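The distinction the abstract draws between first-order and second-order context representations, and the pseudo-word construction, can be illustrated with a minimal Python sketch. This is not the SenseClusters implementation: the function names are hypothetical, "co-occurrence" is simplified here to mean "appears in the same context", and averaging co-occurrence rows is only one common way to build a second-order vector.

```python
from collections import Counter, defaultdict

def conflate(tokens, term_a, term_b, pseudo):
    """Replace either term with one pseudo-word, creating artificial ambiguity."""
    return [pseudo if t in (term_a, term_b) else t for t in tokens]

def first_order(context, vocab):
    """First-order: count the features that occur directly in the context."""
    counts = Counter(context)
    return [counts[w] for w in vocab]

def cooccurrence(contexts):
    """Word-by-word co-occurrence counts (assumption: same context = co-occur)."""
    matrix = defaultdict(Counter)
    for ctx in contexts:
        words = set(ctx)
        for w in words:
            for v in words:
                if w != v:
                    matrix[w][v] += 1
    return matrix

def second_order(context, matrix, vocab):
    """Second-order: average the co-occurrence rows of the words in the context."""
    rows = [[matrix[w][v] for v in vocab] for w in context if w in matrix]
    if not rows:
        return [0.0] * len(vocab)
    return [sum(col) / len(rows) for col in zip(*rows)]
```

In this sketch a first-order vector is non-zero only for words that literally appear in the context, while a second-order vector also reflects the words that the context words tend to occur with. Either kind of vector could then be passed to a clustering algorithm, with each resulting cluster treated as one sense of the pseudo-word.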

Original language: English (US)
Title of host publication: IHI'10 - Proceedings of the 1st ACM International Health Informatics Symposium
Pages: 56-65
Number of pages: 10
State: Published - 2010
Externally published: Yes
Event: 1st ACM International Health Informatics Symposium, IHI'10 - Arlington, VA, United States
Duration: Nov 11 2010 to Nov 12 2010

Publication series

Name: IHI'10 - Proceedings of the 1st ACM International Health Informatics Symposium

Other

Other: 1st ACM International Health Informatics Symposium, IHI'10
Country/Territory: United States
City: Arlington, VA
Period: 11/11/10 to 11/12/10

Keywords

  • natural language processing
  • semantic ambiguity
  • word sense discrimination
