Predicting rare classes: Comparing two-phase rule induction to cost-sensitive boosting

Mahesh V. Joshi, Ramesh C. Agarwal, Vipin Kumar

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

6 Scopus citations

Abstract

Learning good classifier models of rare events is a challenging task. On such problems, the recently proposed two-phase rule induction algorithm, PNrule, outperforms other non-meta methods of rule induction. Boosting is a strong meta-classifier approach and has been shown to be adaptable to skewed class distributions. PNrule's key feature is its ability to identify the relevant false positives and remove them collectively. In this paper, we argue qualitatively that this ability is not guaranteed by the boosting methodology. We simulate learning scenarios of varying difficulty to demonstrate that this fundamental qualitative difference between the two mechanisms gives rise to many scenarios in which PNrule achieves comparable or significantly better performance than AdaCost, a strong cost-sensitive boosting algorithm. Even comparable performance by PNrule is desirable, because it yields a model that is easier to interpret than the ensemble of models generated by boosting. We also show similar supporting results on real-world and benchmark datasets.

Original language: English (US)
Title of host publication: Principles of Data Mining and Knowledge Discovery - 6th European Conference, PKDD 2002, Proceedings
Editors: Tapio Elomaa, Heikki Mannila, Hannu Toivonen
Publisher: Springer Verlag
Pages: 237-249
Number of pages: 13
ISBN (Print): 3540440372, 9783540440376
State: Published - 2002
Event: 6th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2002 - Helsinki, Finland
Duration: Aug 19 2002 - Aug 23 2002

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2431 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 6th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2002
Country: Finland
City: Helsinki
Period: 8/19/02 - 8/23/02
