Penalized model-based clustering with application to variable selection

Research output: Contribution to journal › Article › peer-review

183 Scopus citations


Variable selection in clustering analysis is both challenging and important. In the context of model-based clustering analysis with a common diagonal covariance matrix, which is especially suitable for "high dimension, low sample size" settings, we propose a penalized likelihood approach with an L1 penalty function, automatically realizing variable selection via thresholding and delivering a sparse solution. We derive an EM algorithm to fit our proposed model, and propose a modified BIC as a model selection criterion to choose the number of components and the penalization parameter. A simulation study and an application to gene function prediction with gene expression profiles demonstrate the utility of our method.
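The thresholding mechanism mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes the standard soft-thresholding operator applied to the cluster means in the M-step, a threshold scaled by the variable's variance and the cluster's total responsibility, and data centered so that a variable whose mean is zero in every cluster carries no clustering information. The function names (`soft_threshold`, `penalized_mean_update`, `selected_variables`) and the toy numbers are hypothetical.

```python
from typing import List


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold; return 0.0 when |value| <= threshold."""
    magnitude = abs(value) - threshold
    if magnitude <= 0.0:
        return 0.0
    return magnitude if value > 0 else -magnitude


def penalized_mean_update(
    weighted_means: List[List[float]],
    variances: List[float],
    responsibility_totals: List[float],
    lam: float,
) -> List[List[float]]:
    """Hypothetical M-step update of cluster means under an L1 penalty.

    weighted_means[k][j]     : unpenalized weighted mean of variable j in cluster k
    variances[j]             : common diagonal variance of variable j
    responsibility_totals[k] : sum of posterior membership probabilities for cluster k
    lam                      : penalization parameter (assumed threshold scaling)
    """
    return [
        [
            soft_threshold(mean, lam * variances[j] / responsibility_totals[k])
            for j, mean in enumerate(row)
        ]
        for k, row in enumerate(weighted_means)
    ]


def selected_variables(means: List[List[float]]) -> List[int]:
    """Indices of variables with a nonzero mean in at least one cluster."""
    n_vars = len(means[0])
    return [j for j in range(n_vars) if any(row[j] != 0.0 for row in means)]


# Toy example: two clusters, two variables; the second variable barely differs
# from zero in either cluster, so the penalty removes it.
updated = penalized_mean_update(
    weighted_means=[[1.5, 0.05], [-1.2, -0.03]],
    variances=[1.0, 1.0],
    responsibility_totals=[10.0, 10.0],
    lam=1.0,
)
kept = selected_variables(updated)
```

In this toy setup every threshold is 0.1, so both means of the second variable are set exactly to zero and only the first variable is kept. This is the sense in which thresholding yields a sparse solution. Choosing `lam` and the number of components would be done with the model selection criterion, which the sketch does not cover.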

Original language: English (US)
Pages (from-to): 1145-1164
Number of pages: 20
Journal: Journal of Machine Learning Research
State: Published - May 2007


  • BIC
  • EM
  • Mixture model
  • Penalized likelihood
  • Shrinkage
  • Soft-thresholding


