Abstract
The k-means algorithm is often used in clustering applications, but it requires a complete data matrix. Missing data, however, are common in many applications. Mainstream approaches to clustering missing data reduce the problem to a complete-data formulation through either deletion or imputation, but these solutions may incur significant costs. Our k-POD method is a simple extension of k-means clustering for missing data that works even when the missingness mechanism is unknown, when external information is unavailable, and when there is significant missingness in the data. [Received November 2014. Revised August 2015.]
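The abstract does not spell out the algorithm. One reading that is consistent with the majorization-minimization keyword is an alternation between two steps: fill each missing entry with the matching coordinate of its row's current cluster center, then rerun k-means on the filled matrix. The Python sketch below illustrates that reading only; it is not the authors' implementation. The names `farthest_first_centers`, `kmeans`, and `k_pod_sketch` are hypothetical, and the column-mean initial fill, the deterministic farthest-first initialization, and the stopping rule are assumptions of this sketch. Missing entries are assumed to be encoded as `NaN`, and every column is assumed to have at least one observed value.

```python
import numpy as np


def farthest_first_centers(X, k):
    """Deterministic initialization (an assumption of this sketch):
    start from the first row, then repeatedly add the row that is
    farthest from all centers chosen so far."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers)


def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iterations on a complete matrix X, warm-started at `centers`."""
    centers = centers.copy()
    for _ in range(n_iter):
        # Assign every row to its nearest center (squared Euclidean distance).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            if np.any(labels == j):  # keep the old center if a cluster empties
                new_centers[j] = X[labels == j].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def k_pod_sketch(X, k, n_outer=50):
    """Alternate between filling missing entries from cluster centers
    and running k-means on the filled matrix."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial fill with observed column means (an assumption of this sketch).
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    labels, centers = kmeans(filled, farthest_first_centers(filled, k))
    for _ in range(n_outer):
        # Replace only the missing entries; observed entries are never changed.
        new_filled = np.where(missing, centers[labels], X)
        labels, centers = kmeans(new_filled, centers)
        converged = np.allclose(new_filled, filled)
        filled = new_filled
        if converged:
            break
    return labels, centers, filled


# Illustrative data: two well-separated groups with a few missing entries.
X_demo = np.array([
    [0.0, 0.1, np.nan],
    [0.2, np.nan, 0.0],
    [0.1, 0.0, 0.1],
    [10.0, 10.1, np.nan],
    [np.nan, 10.0, 9.9],
    [10.1, 9.9, 10.0],
])
demo_labels, demo_centers, demo_filled = k_pod_sketch(X_demo, k=2)
```

In this sketch no rows are deleted and no external imputation model is needed: the filled values are a by-product of the clustering itself, which matches the abstract's framing of k-POD as an alternative to deletion or imputation.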
| Original language | English (US) |
|---|---|
| Pages (from-to) | 91-99 |
| Number of pages | 9 |
| Journal | American Statistician |
| Volume | 70 |
| Issue number | 1 |
| State | Published - Jan 2 2016 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2016 American Statistical Association.
Keywords
- Clustering
- Imputation
- Majorization-minimization
- Missing data
- k-means
Fingerprint
Dive into the research topics of 'k-POD: A Method for k-Means Clustering of Missing Data'. Together they form a unique fingerprint.