Exploiting unlabeled data in ensemble methods

Kristin P. Bennett, Ayhan Demiriz, Richard Maclin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

151 Scopus citations

Abstract

An adaptive semi-supervised ensemble method, ASSEMBLE, is proposed that constructs classification ensembles based on both labeled and unlabeled data. ASSEMBLE alternates between assigning "pseudo-classes" to the unlabeled data using the existing ensemble and constructing the next base classifier using both the labeled and pseudo-labeled data. Mathematically, this intuitive algorithm corresponds to maximizing the classification margin in hypothesis space as measured on both the labeled and unlabeled data. Unlike alternative approaches, ASSEMBLE does not require a semi-supervised learning method for the base classifier. ASSEMBLE can be used in conjunction with any cost-sensitive classification algorithm for both two-class and multi-class problems. ASSEMBLE using decision trees won the NIPS 2001 Unlabeled Data Competition. In addition, strong results on several benchmark datasets using both decision trees and neural networks support the proposed method.
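The alternation described in the abstract can be illustrated with a minimal two-class sketch. It is not the authors' implementation. The following are assumptions made only for illustration: the AdaBoost-style exponential weighting, the decision stump used as the cost-sensitive base learner, the initial pseudo-classes taken from a stump trained on the labeled data alone, the `beta` split of initial weight between labeled and unlabeled points, and all function names.

```python
import math

def fit_stump(X, y, w):
    """Weighted decision stump: choose the (feature, threshold, sign)
    with the lowest weighted error. Stands in for any cost-sensitive learner."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[j] >= thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    _, j, thr, sign = best
    return lambda row: sign if row[j] >= thr else -sign

def ensemble_score(ensemble, row):
    """Weighted vote of all base classifiers built so far."""
    return sum(alpha * h(row) for alpha, h in ensemble)

def pseudo_label_boost(X_lab, y_lab, X_unl, rounds=10, beta=0.9):
    """Labels are -1/+1. Assumes both X_lab and X_unl are non-empty."""
    n_l, n_u = len(X_lab), len(X_unl)
    X = X_lab + X_unl
    # Assumption: initial pseudo-classes come from a stump fit on labeled data only.
    h0 = fit_stump(X_lab, y_lab, [1.0 / n_l] * n_l)
    y = list(y_lab) + [h0(row) for row in X_unl]
    # Assumption: labeled points share weight beta, unlabeled points share 1 - beta.
    w = [beta / n_l] * n_l + [(1 - beta) / n_u] * n_u
    ensemble = []
    for _ in range(rounds):
        # Step 1: build the next base classifier on labeled + pseudo-labeled data.
        h = fit_stump(X, y, w)
        eps = sum(wi for row, yi, wi in zip(X, y, w) if h(row) != yi)
        if eps >= 0.5:
            break  # base classifier is no better than chance
        alpha = 0.5 * math.log((1 - eps + 1e-10) / (eps + 1e-10))
        ensemble.append((alpha, h))
        if eps == 0:
            break  # perfect fit; further rounds would only repeat it
        # Step 2: reassign pseudo-classes using the current ensemble.
        scores = [ensemble_score(ensemble, row) for row in X]
        y = list(y_lab) + [1 if s >= 0 else -1 for s in scores[n_l:]]
        # Points with a small margin y * F(x) receive more weight next round.
        w = [math.exp(-yi * s) for yi, s in zip(y, scores)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    return 1 if ensemble_score(ensemble, row) >= 0 else -1
```

Any learner that accepts per-example weights could replace `fit_stump` in this sketch, which reflects the abstract's point that the base classifier itself need not be semi-supervised.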

Original language: English (US)
Title of host publication: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Editors: D. Hand, D. Keim, R. Ng
Pages: 289-296
Number of pages: 8
State: Published - Dec 1 2002
Event: KDD - 2002 Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Edmonton, Alta, Canada
Duration: Jul 23 2002 to Jul 26 2002

Other

Other: KDD - 2002 Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country/Territory: Canada
City: Edmonton, Alta
Period: 7/23/02 to 7/26/02

Keywords

  • Boosting
  • Classification
  • Ensemble learning
  • Semi-supervised learning
