TY - GEN
T1 - Unsupervised discrimination of person names in Web contexts
AU - Pedersen, Ted
AU - Kulkarni, Anagha
PY - 2007
Y1 - 2007
N2 - Ambiguous person names are a problem in many forms of written text, including text found on the Web. In this paper we explore the use of unsupervised clustering techniques to discriminate among entities named in Web pages. We examine three main issues via an extensive experimental study: first, the effect of using a held-out set of training data for feature selection versus using the data in which the ambiguous names occur; second, the impact of using different measures of association for identifying lexical features; and third, the success of different cluster stopping measures that automatically determine the number of clusters in the data.
UR - http://www.scopus.com/inward/record.url?scp=37149024114&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=37149024114&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-70939-8_27
DO - 10.1007/978-3-540-70939-8_27
M3 - Conference contribution
AN - SCOPUS:37149024114
SN - 354070938X
SN - 9783540709381
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 299
EP - 310
BT - Computational Linguistics and Intelligent Text Processing - 8th International Conference, CICLing 2007, Proceedings
PB - Springer Verlag
T2 - 8th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2007
Y2 - 18 February 2007 through 24 February 2007
ER -