A rapid method for the comparison of cluster analyses

Cavan Reilly, Changchun Wang, Mark Rutherford

Research output: Contribution to journal › Article › peer-review

28 Scopus citations

Abstract

Cluster analysis has become a very popular tool for the exploration of high-dimensional data. Dozens of algorithms have been proposed, each with its own merits and shortcomings. It is not known to what extent various methods give the same results, nor is it even clear how to measure how similar the outputs of two distinct algorithms are. Here we propose a statistic that is designed to measure the "correlation" between two clustering methods when applied to a particular data set. In contrast to the Rand index, the most common statistic used for this purpose, the method is very fast. We provide an algorithm that approximates the statistic and demonstrate two of its possible uses. Finally, we use this statistic to understand the clustering in a data set in the context that motivated this work: analysis of a gene expression experiment.
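For context, the Rand index that the abstract contrasts against can be sketched as below. This is a minimal illustration of that baseline only, not the statistic proposed in the article; the function name `rand_index` and the list-of-labels input format are assumptions made for this example. Checking every pair of observations gives a cost that grows quadratically with the number of observations, which is one reason a faster alternative is attractive.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of observation pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two
    observations in the same cluster, or both put them in different clusters.
    Hypothetical helper for illustration; not the article's statistic.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both label sequences must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two observations are required")

    agree = 0
    total = 0
    # Visit all n * (n - 1) / 2 unordered pairs: quadratic in n.
    for i, j in combinations(range(n), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
        total += 1
    return agree / total
```

Because only co-membership is compared, the index does not depend on how the clusters are named: `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` treats the two labelings as identical partitions.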

Original language: English (US)
Pages (from-to): 19-33
Number of pages: 15
Journal: Statistica Sinica
Volume: 15
Issue number: 1
State: Published - Jan 1 2005

Keywords

  • Cluster analysis
  • Cohen's kappa
  • Metropolis algorithm
  • Microarray
  • Traveling salesman problem
