Abstract
Retrieval techniques based on dimensionality reduction, such as Latent Semantic Indexing (LSI), have been shown to improve the quality of the information being retrieved by capturing the latent meaning of the words present in the documents. Unfortunately, the high computational and memory requirements of LSI and its inability to compute an effective dimensionality reduction in a supervised setting limit its applicability. In this paper we present a fast supervised dimensionality reduction algorithm that is derived from the recently developed cluster-based unsupervised dimensionality reduction algorithms. We experimentally evaluate the quality of the lower dimensional spaces both in the context of document categorization and improvements in retrieval performance on a variety of different document collections. Our experiments show that the lower dimensional spaces computed by our algorithm consistently improve the performance of traditional algorithms such as C4.5, k-nearest-neighbor, and Support Vector Machines (SVM), by an average of 2% to 7%. Furthermore, the supervised lower dimensional space greatly improves the retrieval performance when compared to LSI.
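
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of a supervised, cluster-based reduction: treat each class as a cluster, use the unit-length class centroids as the axes of the reduced space, and represent each document by its cosine similarity to every centroid. The function names `class_centroid_axes` and `project`, the toy term-count matrix, and the choice of one axis per class are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def _unit_rows(M):
    """Scale each row to unit Euclidean length, leaving all-zero rows as zeros."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return M / np.where(norms == 0, 1.0, norms)

def class_centroid_axes(X, y):
    """Return the sorted class labels and one unit-length centroid per class.

    Assumption: each class is treated as a single cluster, so the reduced
    space has as many dimensions as there are classes.
    """
    Xn = _unit_rows(np.asarray(X, dtype=float))
    y = np.asarray(y)
    labels = np.unique(y)
    centroids = np.vstack([Xn[y == c].mean(axis=0) for c in labels])
    return labels, _unit_rows(centroids)

def project(X, axes):
    """Map documents to the reduced space: cosine similarity to each axis."""
    return _unit_rows(np.asarray(X, dtype=float)) @ axes.T

# Hypothetical toy data: 4 documents over 4 terms, with 2 classes.
X = np.array([[2, 0, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 3, 0],
              [0, 0, 1, 1]])
y = np.array([0, 0, 1, 1])

labels, axes = class_centroid_axes(X, y)
Z = project(X, axes)  # shape: (n_documents, n_classes)
```

Under these assumptions the cost is a single pass over the documents to build the centroids plus one sparse-friendly matrix product for the projection, which is the kind of saving over an SVD-based method such as LSI that the abstract alludes to. Any classifier or retrieval routine could then be run on `Z` in place of `X`.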
Original language | English (US) |
---|---|
Pages | 12-19 |
Number of pages | 8 |
DOIs | |
State | Published - 2000 |
Externally published | Yes |
Event | 9th International Conference on Information and Knowledge Management (CIKM 2000) - McLean, VA, United States; Duration: Nov 10 2000 → … |
Conference
Conference | 9th International Conference on Information and Knowledge Management (CIKM 2000) |
---|---|
Country/Territory | United States |
City | McLean, VA |
Period | 11/10/00 → … |
Bibliographical note
Publisher Copyright: © ACM 2000.