Abstract
This paper presents a corpus-based approach to word sense disambiguation that builds an ensemble of Naive Bayesian classifiers, each of which is based on lexical features that represent co-occurring words in varying sized windows of context. Despite the simplicity of this approach, empirical results disambiguating the widely studied nouns line and interest show that such an ensemble achieves accuracy rivaling the best previously published results.
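The approach summarized in the abstract can be illustrated with a minimal sketch: several Naive Bayes classifiers, each trained on the words that co-occur with the target word inside a context window of a different size, combined into one ensemble. Everything below is an illustrative assumption rather than the paper's implementation: the helper names (`window_features`, `WindowNaiveBayes`, `ensemble_predict`), the add-one smoothing, the three window sizes, the toy sentences, and in particular the majority vote, since the abstract does not say how the ensemble members are combined.

```python
import math
from collections import Counter, defaultdict


def window_features(tokens, target_index, left, right):
    """Set of words within `left` positions before and `right` after the target."""
    start = max(0, target_index - left)
    before = tokens[start:target_index]
    after = tokens[target_index + 1:target_index + 1 + right]
    return set(before) | set(after)


class WindowNaiveBayes:
    """Naive Bayes over co-occurring words in one fixed window (hypothetical helper)."""

    def __init__(self, left, right):
        self.left, self.right = left, right
        self.sense_counts = Counter()            # number of training examples per sense
        self.word_counts = defaultdict(Counter)  # word counts per sense
        self.vocab = set()

    def fit(self, examples):
        for tokens, idx, sense in examples:
            self.sense_counts[sense] += 1
            for word in window_features(tokens, idx, self.left, self.right):
                self.word_counts[sense][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, tokens, idx):
        feats = window_features(tokens, idx, self.left, self.right)
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, -math.inf
        for sense in sorted(self.sense_counts):
            score = math.log(self.sense_counts[sense] / total)
            # Add-one smoothing (an assumption); words never seen in training are skipped.
            denom = sum(self.word_counts[sense].values()) + len(self.vocab)
            for word in feats:
                if word in self.vocab:
                    score += math.log((self.word_counts[sense][word] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


def ensemble_predict(members, tokens, idx):
    """Combine member predictions by simple majority vote (an assumption)."""
    votes = Counter(member.predict(tokens, idx) for member in members)
    return votes.most_common(1)[0][0]


# Toy training data: (tokens, index of the target word "line", sense label).
train = [
    ("please hold the line while i transfer your phone call".split(), 3, "phone"),
    ("the phone line went dead during the call".split(), 2, "phone"),
    ("customers stood in a long line outside the store".split(), 5, "queue"),
    ("we waited in line for tickets at the store".split(), 3, "queue"),
]

# One classifier per window size: narrow, medium, and wide context.
members = [WindowNaiveBayes(n, n).fit(train) for n in (1, 3, 10)]

print(ensemble_predict(members, "the phone line is busy".split(), 2))
print(ensemble_predict(members, "a long line at the store".split(), 2))
```

Narrow windows capture mostly local collocations while wide windows capture topical words, which is one plausible reason for combining classifiers trained on different window sizes.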
Original language | English (US) |
---|---|
Pages | 63-69 |
Number of pages | 7 |
State | Published - 2000 |
Event | 1st Meeting of the North American Chapter of the Association for Computational Linguistics, NAACL 2000 - Seattle, United States; Duration: Apr 29 2000 → May 4 2000 |
Conference
Conference | 1st Meeting of the North American Chapter of the Association for Computational Linguistics, NAACL 2000 |
---|---|
Country/Territory | United States |
City | Seattle |
Period | 4/29/00 → 5/4/00 |
Bibliographical note
Funding Information: This work extends ideas that began in collaboration with Rebecca Bruce and Janyce Wiebe. Claudia Leacock and Raymond Mooney provided valuable assistance with the line data. I am indebted to an anonymous reviewer who pointed out the importance of separate test and devtest data sets. A preliminary version of this paper appears in (Pedersen, 2000).
Publisher Copyright:
© ANLP 2000. All rights reserved.