We describe the application of unsupervised clustering methodologies to the problem of discriminating among ambiguous names found in short passages of text that appear on Web pages. We show how to tailor these methods to handle the very noisy data that we typically find on the Web. We experiment with several variations in feature selection, two methods that automatically determine the number of clusters in the data, two different representations of the contexts to be discriminated, and dimensionality reduction. Our evaluation is carried out using Web contexts for five different ambiguous names that were manually disambiguated to serve as a gold standard.
|Original language||English (US)|
|Number of pages||8|
|State||Published - Dec 1 2007|
|Event||IJCAI 2007 Workshop on Analytics for Noisy Unstructured Text Data, AND 2007 - Hyderabad, India (Jan 8 2007)|