An unsupervised language independent method of name discrimination using second order co-occurrence features

Ted Pedersen, Anagha Kulkarni, Roxana Angheluta, Zornitsa Kozareva, Thamar Solorio

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

17 Scopus citations

Abstract

Previous work by Pedersen, Purandare and Kulkarni (2005) has resulted in an unsupervised method of name discrimination that represents the context in which an ambiguous name occurs using second order co-occurrence features. These contexts are then clustered to identify which are associated with different underlying named entities. The method also extracts descriptive and discriminating bigrams from each of the discovered clusters to serve as identifying labels. These methods have been shown to perform well with English text, although we believe them to be language independent since they rely on lexical features and use no syntactic features or external knowledge sources. In this paper we apply this methodology in exactly the same way to Bulgarian, English, Romanian, and Spanish corpora. We find that it attains discrimination accuracy that is consistently well above that of a majority classifier, thus providing support for the hypothesis that the method is language independent.
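The pipeline summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whole-context co-occurrence window, cosine similarity, and average-link clustering are all assumptions chosen to keep the example small. It builds a first order co-occurrence vector for each word, represents each context as the average of its words' vectors (the second order representation), clusters the contexts, and computes the accuracy a majority classifier would attain.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(contexts):
    """First order: map each word to counts of the words it co-occurs with.

    Assumption: the co-occurrence window is the whole context.
    """
    vecs = defaultdict(Counter)
    for tokens in contexts:
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    vecs[word][other] += 1
    return vecs


def second_order_vector(tokens, word_vecs):
    """Second order: average the first order vectors of a context's words."""
    total = Counter()
    for word in tokens:
        total.update(word_vecs.get(word, {}))
    n = max(len(tokens), 1)
    return {key: count / n for key, count in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, k):
    """Average-link agglomerative clustering down to k clusters (assumed)."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [cosine(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    return clusters


def majority_baseline(labels):
    """Accuracy obtained by assigning every context to the most common entity."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


# Hypothetical toy contexts around one ambiguous name (the name itself removed).
contexts = [
    ["striker", "goal", "match"],
    ["goal", "league", "match"],
    ["senate", "vote", "bill"],
    ["vote", "election", "bill"],
]
gold = ["athlete", "athlete", "politician", "politician"]

word_vecs = cooccurrence_vectors(contexts)
context_vecs = [second_order_vector(c, word_vecs) for c in contexts]
clusters = cluster(context_vecs, k=2)
baseline = majority_baseline(gold)
```

Nothing in the sketch depends on syntax or external resources, only on token co-occurrence, which is the property the abstract cites as the basis for expecting language independence.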

Original language: English (US)
Title of host publication: Computational Linguistics and Intelligent Text Processing - 7th International Conference, CICLing 2006, Proceedings
Publisher: Springer Verlag
Pages: 208-222
Number of pages: 15
ISBN (Print): 3540322051, 9783540322054
State: Published - 2006
Event: 7th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2006 - Mexico City, Mexico
Duration: Feb 19 2006 to Feb 25 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3878 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 7th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2006
Country/Territory: Mexico
City: Mexico City
Period: 2/19/06 to 2/25/06

Bibliographical note

Funding Information:
We are grateful for general support from the Don T. Nakanishi Award. N. A. Ponce’s work on this study was partially supported by the Robert Wood Johnson Foundation (Advancing the Disaggregation of Ethnic/Racial Data Through Technical Assistance, Training, and Case-Making; grant 76329; primary investigator: N. A. P.). R. C. Chang, N. Pierson, and J. Greer’s work on this study was supported by the University of Chicago, Harris School of Public Policy, Summer 2020 Computational Analysis and Public Policy Internship Fund.
