Text categorization using weight adjusted k-nearest neighbor classification

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

216 Scopus citations

Abstract

Text categorization presents unique challenges due to the large number of attributes present in the data set, the large number of training samples, attribute dependency, and the multi-modality of categories. Existing classification techniques have limited applicability to data sets of this nature. In this paper, we present a Weight Adjusted k-Nearest Neighbor (WAKNN) classification algorithm that learns feature weights using a greedy hill-climbing technique. We also present two performance optimizations of WAKNN that improve its computational performance by a few orders of magnitude without compromising classification quality. We experimentally evaluated WAKNN on 52 document data sets from a variety of domains and compared its performance against several classification algorithms, including C4.5, RIPPER, Naive-Bayesian, PEBLS, and VSM. Experimental results on these data sets confirm that WAKNN consistently outperforms other existing classification algorithms.
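
To make the idea of a weight-adjusted k-NN concrete, the block below is a minimal, hypothetical sketch, not the authors' WAKNN implementation. It assumes dense term vectors (real text data would use sparse TF-IDF vectors), a cosine similarity in which each term is scaled by its weight, similarity-weighted voting among the k nearest documents, and leave-one-out training accuracy as the hill-climbing objective. The move set (multiplying one weight by a factor from the hypothetical set 0.0, 0.5, 2.0) and all function names are illustrative assumptions; the paper's exact objective and its two performance optimizations are not reproduced here.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical sketch: the objective (leave-one-out accuracy) and the move set
# (rescale one weight by a factor) are assumptions, not the paper's exact procedure.


def weighted_cosine(x, y, w):
    """Cosine similarity after scaling every term j of x and y by weight w[j]."""
    xs = [xi * wi for xi, wi in zip(x, w)]
    ys = [yi * wi for yi, wi in zip(y, w)]
    dot = sum(a * b for a, b in zip(xs, ys))
    nx = sqrt(sum(a * a for a in xs))
    ny = sqrt(sum(b * b for b in ys))
    return dot / (nx * ny) if nx and ny else 0.0


def knn_predict(query, docs, labels, w, k, skip=None):
    """Similarity-weighted vote among the k most similar training documents.

    `skip` excludes one training index, which is how leave-one-out is done below.
    """
    sims = sorted(
        ((weighted_cosine(query, d, w), i) for i, d in enumerate(docs) if i != skip),
        reverse=True,
    )
    votes = defaultdict(float)
    for sim, i in sims[:k]:
        votes[labels[i]] += sim
    return max(votes, key=votes.get) if votes else None


def loo_accuracy(docs, labels, w, k):
    """Leave-one-out accuracy of weighted k-NN on the training set."""
    hits = sum(
        knn_predict(d, docs, labels, w, k, skip=i) == labels[i]
        for i, d in enumerate(docs)
    )
    return hits / len(docs)


def hill_climb_weights(docs, labels, k=3, factors=(0.0, 0.5, 2.0), max_rounds=10):
    """Greedy hill climbing: try rescaling one term weight at a time and keep
    any change that strictly improves leave-one-out accuracy."""
    w = [1.0] * len(docs[0])
    best = loo_accuracy(docs, labels, w, k)
    for _ in range(max_rounds):
        improved = False
        for j in range(len(w)):
            for f in factors:
                trial = w.copy()
                trial[j] *= f
                acc = loo_accuracy(docs, labels, trial, k)
                if acc > best:
                    best, w, improved = acc, trial, True
        if not improved:
            break  # local optimum: no single-weight move helps
    return w, best


# Toy data: term 0 is a noisy term that misleads unweighted k-NN.
docs = [[3, 1, 0], [1, 1, 0], [3, 0, 1], [1, 0, 1]]
labels = ["a", "a", "b", "b"]
print(loo_accuracy(docs, labels, [1.0, 1.0, 1.0], k=1))  # 0.5
print(hill_climb_weights(docs, labels, k=1))  # ([0.0, 1.0, 1.0], 1.0)
```

On this toy set, unit weights let the noisy term dominate the cosine and half of the leave-one-out predictions are wrong; the greedy search finds that zeroing that term's weight makes every prediction correct. Each evaluation of the objective costs a full leave-one-out pass over the training set, which illustrates why optimizations of the weight-learning step matter for large text collections.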

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Editors: David Cheung, Graham J. Williams, Qing Li
Publisher: Springer Verlag
Pages: 53-65
Number of pages: 13
Volume: 2035
ISBN (Print): 3540419101, 9783540419105
State: Published - 2001
Event: 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2001 - Kowloon, Hong Kong
Duration: Apr 16 2001 to Apr 18 2001

Publication series

Name: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
Volume: 2035
ISSN (Print): 0302-9743

Other

Other: 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2001
Country/Territory: Hong Kong
City: Kowloon
Period: 4/16/01 to 4/18/01

Bibliographical note

Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 2001.

Keywords

  • K-NN classification
  • Text categorization
  • Weight adjustments
