Active semi-supervision for pairwise constrained clustering

Sugato Basu, Arindam Banerjee, Raymond J. Mooney

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

356 Scopus citations

Abstract

Semi-supervised clustering uses a small amount of supervised data to aid unsupervised learning. One typical approach specifies a limited number of must-link and cannot-link constraints between pairs of examples. This paper presents a pairwise constrained clustering framework and a new method for actively selecting informative pairwise constraints to get improved clustering performance. The clustering and active learning methods are both easily scalable to large datasets, and can handle very high dimensional data. Experimental and theoretical results confirm that this active querying of pairwise constraints significantly improves the accuracy of clustering when given a relatively small amount of supervision.
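To make the notion of pairwise supervision concrete, here is a minimal sketch of how must-link and cannot-link constraints might be represented and checked against a cluster assignment. It is only an illustration and not the paper's clustering objective or its active selection method. The function name `count_violations`, the index-pair representation, and the toy data are all assumptions introduced here.

```python
from typing import List, Sequence, Tuple

# A constraint is a pair of example indices (hypothetical representation).
Pair = Tuple[int, int]


def count_violations(
    labels: Sequence[int],
    must_link: List[Pair],
    cannot_link: List[Pair],
) -> Tuple[int, int]:
    """Return (violated must-links, violated cannot-links) for an assignment."""
    # A must-link pair is violated when its two examples land in different clusters.
    ml_violated = sum(1 for i, j in must_link if labels[i] != labels[j])
    # A cannot-link pair is violated when its two examples share a cluster.
    cl_violated = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return ml_violated, cl_violated


# Toy data: five examples assigned to three clusters.
labels = [0, 0, 1, 1, 2]
must_link = [(0, 1), (2, 4)]    # (2, 4) is violated: clusters 1 and 2 differ
cannot_link = [(0, 2), (2, 3)]  # (2, 3) is violated: both are in cluster 1
violations = count_violations(labels, must_link, cannot_link)
```

A constrained clustering method would typically penalize such violations while fitting the clusters. How the penalty is weighted, and which pairs are worth querying, is what the paper itself addresses.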

Original language: English (US)
Title of host publication: Proceedings of the Fourth SIAM International Conference on Data Mining
Editors: M.W. Berry, U. Dayal, C. Kamath, D. Skillicorn
Pages: 333-344
Number of pages: 12
State: Published - Jun 22 2004
Event: Proceedings of the Fourth SIAM International Conference on Data Mining - Lake Buena Vista, FL, United States
Duration: Apr 22 2004 to Apr 24 2004

Other

Other: Proceedings of the Fourth SIAM International Conference on Data Mining
Country/Territory: United States
City: Lake Buena Vista, FL
Period: 4/22/04 to 4/24/04
