Fast Supervised Dimensionality Reduction Algorithm with Applications to Document Categorization & Retrieval

George Karypis, Eui Hong Han

Research output: Contribution to conference › Paper › peer-review

66 Scopus citations

Abstract

Retrieval techniques based on dimensionality reduction, such as Latent Semantic Indexing (LSI), have been shown to improve the quality of the information being retrieved by capturing the latent meaning of the words present in the documents. Unfortunately, the high computational and memory requirements of LSI and its inability to compute an effective dimensionality reduction in a supervised setting limit its applicability. In this paper we present a fast supervised dimensionality reduction algorithm that is derived from the recently developed cluster-based unsupervised dimensionality reduction algorithms. We experimentally evaluate the quality of the lower dimensional spaces both in the context of document categorization and improvements in retrieval performance on a variety of different document collections. Our experiments show that the lower dimensional spaces computed by our algorithm consistently improve the performance of traditional algorithms such as C4.5, k-nearest-neighbor, and Support Vector Machines (SVM), by an average of 2% to 7%. Furthermore, the supervised lower dimensional space greatly improves the retrieval performance when compared to LSI.

Original language: English (US)
Pages: 12-19
Number of pages: 8
State: Published - 2000
Externally published: Yes
Event: 9th International Conference on Information and Knowledge Management (CIKM 2000) - McLean, VA, United States
Duration: Nov 10 2000 → …

Conference

Conference: 9th International Conference on Information and Knowledge Management (CIKM 2000)
Country/Territory: United States
City: McLean, VA
Period: 11/10/00 → …

Bibliographical note

Publisher Copyright:
© ACM 2000.
