How many clusters to report: A recursive heuristic

John Carlis, Kelsey Bruso

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Clustering can be a valuable tool for analyzing large amounts of data, but anyone who clusters must choose how many item clusters, K, to report. Unfortunately, one must guess at K or some related parameter when working within each of the three available frameworks where one thinks of clustering: as a Euclidean distance problem; as a statistical model problem; or as a complexity theory problem. We report here a novel recursive square root heuristic, RSQRT, which accurately predicts K_reported as a function of the attribute or item count, depending on attribute scales. We tested the heuristic on 226 widely-varying, but mostly scientific, studies, and found that the heuristic's K_best-predicted rounded to exactly K_reported in over half of the studies and was close in almost all of them. We claim that this strongly-supported heuristic makes sense and that, although it is not prescriptive, using it prospectively is much better than guessing.

PubMed: MeSH publication types

  • Journal Article
  • Research Support, N.I.H., Extramural
