Abstract
An adaptive semi-supervised ensemble method, ASSEMBLE, is proposed that constructs classification ensembles based on both labeled and unlabeled data. ASSEMBLE alternates between assigning "pseudo-classes" to the unlabeled data using the existing ensemble and constructing the next base classifier using both the labeled and pseudo-labeled data. Mathematically, this intuitive algorithm corresponds to maximizing the classification margin in hypothesis space as measured on both the labeled and unlabeled data. Unlike alternative approaches, ASSEMBLE does not require a semi-supervised learning method for the base classifier. ASSEMBLE can be used in conjunction with any cost-sensitive classification algorithm for both two-class and multi-class problems. ASSEMBLE using decision trees won the NIPS 2001 Unlabeled Data Competition. In addition, strong results on several benchmark datasets using both decision trees and neural networks support the proposed method.
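The alternating loop described in the abstract is simple enough to sketch. Below is a minimal, hypothetical Python illustration of an ASSEMBLE-style round using scikit-learn decision trees as the base classifier. It assumes binary labels in {-1, +1}, an AdaBoost-style exponential margin cost in place of the paper's general margin cost functional, a mixing parameter `beta` for labeled versus unlabeled weight, and a crude all-positive initialization of the pseudo-labels; the paper's initialization and step-size derivation differ. This is an illustrative sketch, not the authors' implementation.

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def assemble_sketch(X_lab, y_lab, X_unl, base=None, rounds=20, beta=0.9):
    """Hypothetical ASSEMBLE-style sketch for binary labels in {-1, +1}.

    Alternates between (1) pseudo-labeling the unlabeled points with the
    current ensemble and (2) fitting the next base classifier on the
    labeled + pseudo-labeled data, reweighted boosting-style by margin.
    `beta` (assumed knob, not the paper's exact parameterization)
    balances labeled vs. unlabeled influence.
    """
    base = base or DecisionTreeClassifier(max_depth=3)
    X = np.vstack([X_lab, X_unl])
    n_lab = len(X_lab)
    F = np.zeros(len(X))                 # ensemble score F(x) for every point
    members, alphas = [], []

    for _ in range(rounds):
        # Step 1: pseudo-label unlabeled data with the current ensemble.
        # (Round 0: F is all zeros, so this falls back to +1; the paper
        # uses a more careful initialization.)
        y_unl = np.where(F[n_lab:] >= 0, 1, -1)
        y = np.concatenate([y_lab, y_unl])

        # Boosting-style weights: exponential cost of the current margin
        # y * F(x), with labeled points scaled by beta, unlabeled by 1 - beta.
        w = np.exp(-y * F)
        w[:n_lab] *= beta
        w[n_lab:] *= 1.0 - beta
        w /= w.sum()

        # Step 2: fit the next base classifier on the reweighted data.
        h = clone(base).fit(X, y, sample_weight=w)
        pred = h.predict(X)

        # Weighted error -> AdaBoost-style step size (clipped for stability).
        err = np.clip(w[pred != y].sum(), 1e-10, 0.5 - 1e-10)
        alpha = 0.5 * np.log((1.0 - err) / err)

        F += alpha * pred
        members.append(h)
        alphas.append(alpha)

    def predict(X_new):
        scores = sum(a * h.predict(X_new) for a, h in zip(alphas, members))
        return np.where(scores >= 0, 1, -1)

    return predict
```

Because the pseudo-labels are regenerated from the growing ensemble each round, the unlabeled points effectively pull subsequent base classifiers toward large-margin decisions on the full dataset, which is the margin-maximization view the abstract describes.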
| Original language | English (US) |
| --- | --- |
| Title of host publication | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
| Editors | D. Hand, D. Keim, R. Ng |
| Pages | 289-296 |
| Number of pages | 8 |
| State | Published - Dec 1 2002 |
| Event | KDD 2002 - Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Edmonton, Alberta, Canada; Jul 23 2002 → Jul 26 2002 |
Other

| Other | KDD 2002 - Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
| --- | --- |
| Country/Territory | Canada |
| City | Edmonton, Alberta |
| Period | 7/23/02 → 7/26/02 |
Keywords
- Boosting
- Classification
- Ensemble learning
- Semi-supervised learning