Bayesian cluster ensembles

Hongjun Wang, Hanhuai Shan, Arindam Banerjee

Research output: Contribution to journal › Article › peer-review

84 Scopus citations

Abstract

Cluster ensembles provide a framework for combining multiple base clusterings of a dataset to generate a stable and robust consensus clustering. There are important variants of the basic cluster ensemble problem, notably cluster ensembles with missing values and row- or column-distributed cluster ensembles. Existing cluster ensemble algorithms are applicable only to a small subset of these variants. In this paper, we propose the Bayesian cluster ensemble (BCE), a mixed-membership model for learning cluster ensembles that is applicable to all the primary variants of the problem. We propose an algorithm based on variational approximation for learning Bayesian cluster ensembles. BCE is further generalized to the case where the features of the original data points are available, referred to as generalized BCE (GBCE). We compare BCE extensively with several other cluster ensemble algorithms, and demonstrate that BCE is not only versatile in its applicability but also outperforms other algorithms in stability and accuracy. Moreover, GBCE can achieve higher accuracy than BCE, especially when only a small number of base clusterings are available.
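To make the problem setting concrete, the sketch below shows what the input to a cluster ensemble method can look like (a table of base-clustering labels with some entries missing) and combines it with a simple co-association vote. This is only a generic baseline for illustration, not the BCE mixed-membership model or its variational algorithm; the data values and the function names `co_association` and `consensus` are hypothetical.

```python
from itertools import combinations

# Rows are data points, columns are base clusterings; None marks a missing label.
# The values are made up for illustration only.
base_labels = [
    [0, 1, 0],
    [0, 1, None],
    [0, 0, 0],
    [1, 0, 1],
    [1, None, 1],
    [1, 0, 1],
]

def co_association(labels):
    """For each pair of points, the fraction of base clusterings that put
    them in the same cluster, counting only clusterings that label both."""
    n = len(labels)
    co = [[0.0] * n for _ in range(n)]
    for i in range(n):
        co[i][i] = 1.0
    for i, j in combinations(range(n), 2):
        shared = [(a, b) for a, b in zip(labels[i], labels[j])
                  if a is not None and b is not None]
        if shared:
            agree = sum(a == b for a, b in shared)
            co[i][j] = co[j][i] = agree / len(shared)
    return co

def consensus(labels, threshold=0.5):
    """Link pairs whose co-association exceeds the threshold and return
    the connected components as consensus cluster labels."""
    n = len(labels)
    co = co_association(labels)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

result = consensus(base_labels)
print(result)
```

Skipping pairs with a missing label is one simple way to handle the missing-value variant mentioned in the abstract; a hard vote like this gives no probabilistic memberships, which is where a mixed-membership model such as BCE differs.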

Original language: English (US)
Pages (from-to): 54-70
Number of pages: 17
Journal: Statistical Analysis and Data Mining
Volume: 4
Issue number: 1
State: Published - Feb 2011

Keywords

  • Bayesian models
  • Cluster ensembles
