Abstract
Clustering can be a valuable tool for analyzing large amounts of data, but anyone who clusters must choose how many item clusters, K, to report. Unfortunately, one must guess at K or some related parameter when working within each of the three available frameworks in which one can think of clustering: as a Euclidean distance problem, as a statistical model problem, or as a complexity theory problem. We report here a novel recursive square root heuristic, RSQRT, which accurately predicts K_reported as a function of the attribute or item count, depending on attribute scales. We tested the heuristic on 226 widely varying, but mostly scientific, studies and found that the heuristic's K_best-predicted rounded to exactly K_reported in over half of the studies and was close in almost all of them. We claim that this strongly supported heuristic makes sense and that, although it is not prescriptive, using it prospectively is much better than guessing.
Original language | English (US) |
---|---|
Pages (from-to) | 1069-1072 |
Number of pages | 4 |
Journal | Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference |
State | Published - 2010 |
Event | 2010 32nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC'10 - Buenos Aires, Argentina; Duration: Aug 31 2010 → Sep 4 2010 |
PubMed: MeSH publication types
- Journal Article
- Research Support, N.I.H., Extramural