Cluster analysis: Unsupervised learning via supervised learning with a non-convex penalty

Wei Pan, Xiaotong Shen, Binghui Liu

Research output: Contribution to journal / Article / peer-review

60 Scopus citations

Abstract

Cluster analysis is widely used in many fields. Traditionally, clustering is regarded as unsupervised learning because it lacks a class label or a quantitative response variable; one or the other is present in supervised learning such as classification and regression. Here we formulate clustering as penalized regression with grouping pursuit. In addition to the novel use of a non-convex group penalty, with its associated unique operating characteristics, in the proposed clustering method, a main advantage of this formulation is that it allows us to borrow well-established results from classification and regression, such as model selection criteria for choosing the number of clusters, a difficult problem in cluster analysis. In particular, we propose using generalized cross-validation (GCV) based on generalized degrees of freedom (GDF) to select the number of clusters. We use a few simple numerical examples to compare the proposed method with some existing approaches, demonstrating its promising performance.
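To make "clustering as penalized regression with grouping pursuit" concrete, here is a minimal sketch under our own assumptions, not the authors' code. We assume the model x_i = mu_i + e_i with one centroid parameter mu_i per observation, a squared-error fit term, and a grouping penalty lambda * TLP(||mu_i - mu_j||; tau) on every pair, with the truncated Lasso penalty taken as min(|z|, tau). Observations whose fitted centroids coincide form a cluster. The function names (`tlp`, `objective`, `clusters_from_centroids`) are hypothetical, and the sketch only evaluates candidate fits; it does not minimize the non-convex objective.

```python
import math
from itertools import combinations
from typing import List, Sequence

Vector = Sequence[float]


def tlp(z: float, tau: float) -> float:
    """Truncated Lasso penalty, assumed here to be min(|z|, tau):
    it grows like the Lasso for |z| < tau and is flat beyond tau."""
    return min(abs(z), tau)


def dist(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def objective(x: List[Vector], mu: List[Vector], lam: float, tau: float) -> float:
    """Hypothetical clustering objective: squared-error fit of one centroid
    mu_i per observation x_i, plus a TLP grouping penalty on every pair."""
    fit = 0.5 * sum(dist(xi, mi) ** 2 for xi, mi in zip(x, mu))
    penalty = sum(
        tlp(dist(mu[i], mu[j]), tau) for i, j in combinations(range(len(mu)), 2)
    )
    return fit + lam * penalty


def clusters_from_centroids(mu: List[Vector], tol: float = 1e-6) -> List[int]:
    """Observations whose fitted centroids coincide (within tol) share a label."""
    labels: List[int] = []
    reps: List[Vector] = []
    for m in mu:
        for k, r in enumerate(reps):
            if dist(m, r) <= tol:
                labels.append(k)
                break
        else:  # no existing centroid matched: open a new cluster
            reps.append(m)
            labels.append(len(reps) - 1)
    return labels


# Toy 1-D data with two obvious groups, and two candidate fits.
x = [[0.0], [0.2], [5.0], [5.2]]
mu_a = [list(xi) for xi in x]        # no grouping: each point is its own centroid
mu_b = [[0.1], [0.1], [5.1], [5.1]]  # grouped: shared cluster means
score_a = objective(x, mu_a, lam=1.0, tau=1.0)
score_b = objective(x, mu_b, lam=1.0, tau=1.0)
print(score_a, score_b, clusters_from_centroids(mu_b))
```

With lambda = 1 and tau = 1 the grouped fit scores about 4.02 against about 4.4 for the ungrouped fit, so this objective prefers the two-cluster solution. Under the TLP each between-group pair costs at most tau, whereas an untruncated Lasso fusion penalty would charge the full gap of about 5 per pair and so keep shrinking genuinely distinct clusters toward each other.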

Original language: English (US)
Pages (from-to): 1865-1889
Number of pages: 25
Journal: Journal of Machine Learning Research
Volume: 14
State: Published, Jun 2013

Keywords

  • Generalized degrees of freedom
  • Grouping
  • K-means clustering
  • Lasso
  • Penalized regression
  • Truncated Lasso penalty (TLP)
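The keywords pair generalized degrees of freedom with K-means clustering, and the abstract proposes GCV based on GDF to choose the number of clusters. A minimal sketch of that idea follows, under assumptions that are ours rather than the paper's: the fitting procedure is plain Lloyd K-means (not the penalized method), GDF is estimated by Monte Carlo perturbation as the sum over all entries of Cov(delta_id, mu_id(x + delta)) / sigma^2, the perturbation scale sigma = 0.1 and 200 replicates are arbitrary, and the GCV form RSS / (N (1 - GDF/N)^2) with N = n * p entries is one common normalization. All function names are hypothetical.

```python
import random
from typing import List, Tuple

Point = List[float]


def sqdist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(x: List[Point], k: int, max_iter: int = 100) -> List[Point]:
    """Lloyd's algorithm with deterministic farthest-point initialisation.
    Returns the fitted centroid for every observation (one row per x_i)."""
    centers = [list(x[0])]
    while len(centers) < k:
        far = max(x, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(far))
    labels = [-1] * len(x)
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in x]
        for j in range(k):
            members = [p for p, lab in zip(x, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return [list(centers[lab]) for lab in labels]


def rss(x: List[Point], mu: List[Point]) -> float:
    """Residual sum of squares of a fit."""
    return sum(sqdist(xi, mi) for xi, mi in zip(x, mu))


def gdf(x: List[Point], k: int, sigma: float = 0.1, reps: int = 200, seed: int = 1) -> float:
    """Monte Carlo GDF of the K-means fit: perturb the data with N(0, sigma^2)
    noise, refit, and estimate sum_{i,d} Cov(delta_id, mu_id) / sigma^2."""
    rng = random.Random(seed)
    base = kmeans(x, k)
    total = 0.0
    for _ in range(reps):
        delta = [[rng.gauss(0.0, sigma) for _ in xi] for xi in x]
        xp = [[a + b for a, b in zip(xi, di)] for xi, di in zip(x, delta)]
        mu = kmeans(xp, k)
        # Centering on the unperturbed fit leaves the covariance unchanged.
        total += sum(
            dij * (mij - bij)
            for di, mi, bi in zip(delta, mu, base)
            for dij, mij, bij in zip(di, mi, bi)
        )
    return total / (reps * sigma ** 2)


def gcv(x: List[Point], k: int) -> Tuple[float, float]:
    """GCV score RSS / (N (1 - GDF/N)^2) with N = n * p; returns (score, gdf)."""
    n_entries = len(x) * len(x[0])
    df = gdf(x, k)
    return rss(x, kmeans(x, k)) / (n_entries * (1.0 - df / n_entries) ** 2), df


# Toy data: three well-separated 2-D blobs of 20 points each.
data_rng = random.Random(0)
data = [
    [cx + data_rng.gauss(0.0, 0.5), cy + data_rng.gauss(0.0, 0.5)]
    for cx, cy in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    for _ in range(20)
]
scores = {k: gcv(data, k) for k in range(1, 6)}
best_k = min(scores, key=lambda k: scores[k][0])
print(best_k, scores)
```

When the partition is stable under perturbation, each fitted centroid is a plain cluster mean, so the estimate should sit near K * p (about 2 for K = 1 and about 6 for K = 3 here). When small perturbations can move points between clusters, GDF can exceed that naive count, which is the effect a GDF-based GCV is meant to capture. One would take the K with the smallest score; which K wins depends on the data and on sigma.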
