Abstract
Text categorization presents unique challenges due to the large number of attributes in the data set, the large number of training samples, attribute dependency, and the multi-modality of categories. Existing classification techniques have limited applicability to data sets of this nature. In this paper, we present a Weight Adjusted k-Nearest Neighbor (WAKNN) classification algorithm that learns feature weights using a greedy hill-climbing technique. We also present two performance optimizations of WAKNN that improve its computational performance by a few orders of magnitude without compromising classification quality. We experimentally evaluated WAKNN on 52 document data sets from a variety of domains and compared its performance against several classification algorithms, such as C4.5, RIPPER, Naive-Bayesian, PEBLS and VSM. Experimental results on these data sets confirm that WAKNN consistently outperforms the other classification algorithms.
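The abstract's idea of learning feature weights by greedy hill climbing can be illustrated with a minimal sketch. This is not the authors' implementation. The weighted cosine similarity, the leave-one-out accuracy objective, the multiplicative step factors, and all function names and toy data below are assumptions chosen for illustration, and the two performance optimizations mentioned in the abstract are not modeled.

```python
import math
from collections import Counter


def weighted_cosine(x, y, w):
    """Cosine similarity in which each feature is scaled by a weight (assumed measure)."""
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    nx = math.sqrt(sum(wi * xi * xi for wi, xi in zip(w, x)))
    ny = math.sqrt(sum(wi * yi * yi for wi, yi in zip(w, y)))
    return num / (nx * ny) if nx and ny else 0.0


def knn_predict(query, docs, labels, w, k, skip=None):
    """Similarity-weighted vote among the k nearest documents (index `skip` excluded)."""
    sims = [(weighted_cosine(query, d, w), labels[i])
            for i, d in enumerate(docs) if i != skip]
    sims.sort(key=lambda t: t[0], reverse=True)
    votes = Counter()
    for sim, label in sims[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0]


def loo_accuracy(docs, labels, w, k):
    """Leave-one-out accuracy, used here as the hill-climbing objective (an assumption)."""
    hits = sum(knn_predict(d, docs, labels, w, k, skip=i) == labels[i]
               for i, d in enumerate(docs))
    return hits / len(docs)


def hill_climb_weights(docs, labels, k=1, factors=(0.2, 5.0), max_rounds=10):
    """Greedy hill climbing: per round, apply the single weight change that
    improves the objective the most; stop when no change improves it."""
    w = [1.0] * len(docs[0])
    best = loo_accuracy(docs, labels, w, k)
    for _ in range(max_rounds):
        best_move = None
        for j in range(len(w)):
            for f in factors:  # hypothetical multiplicative step sizes
                trial = w[:]
                trial[j] *= f
                score = loo_accuracy(docs, labels, trial, k)
                if score > best:
                    best, best_move = score, trial
        if best_move is None:
            break
        w = best_move
    return w, best


# Toy term-count vectors (hypothetical): feature 0 is shared noise,
# features 1 and 2 separate class "a" from class "b".
docs = [[3, 2, 0], [3, 0, 2], [0, 2, 0], [0, 0, 2]]
labels = ["a", "b", "a", "b"]
weights, accuracy = hill_climb_weights(docs, labels, k=1)
```

On this toy data the search lowers the weight of the noisy first feature, since that single change improves leave-one-out accuracy. A realistic document collection would need sparse vectors and a cheaper objective than recomputing every similarity for each candidate move.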
Original language | English (US) |
---|---|
Title of host publication | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Editors | David Cheung, Graham J. Williams, Qing Li |
Publisher | Springer Verlag |
Pages | 53-65 |
Number of pages | 13 |
Volume | 2035 |
ISBN (Print) | 3540419101, 9783540419105 |
State | Published - 2001 |
Event | 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2001 - Kowloon, Hong Kong; Duration: Apr 16 2001 → Apr 18 2001 |
Publication series
Name | Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) |
---|---|
Volume | 2035 |
ISSN (Print) | 0302-9743 |
Other
Other | 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2001 |
---|---|
Country/Territory | Hong Kong |
City | Kowloon |
Period | 4/16/01 → 4/18/01 |
Bibliographical note
Publisher Copyright: © Springer-Verlag Berlin Heidelberg 2001.
Keywords
- K-NN classification
- Text categorization
- Weight adjustments