Sparse Biclustering of Transposable Data

Kean Ming Tan, Daniela M. Witten

Research output: Contribution to journal › Article › peer-review

23 Scopus citations

Abstract

We consider the task of simultaneously clustering the rows and columns of a large transposable data matrix. We assume that the matrix elements are normally distributed with a bicluster-specific mean term and a common variance, and perform biclustering by maximizing the corresponding log-likelihood. We apply an ℓ1 penalty to the means of the biclusters to obtain sparse and interpretable biclusters. Our proposal amounts to a sparse, symmetrized version of k-means clustering. We show that k-means clustering of the rows and of the columns of a data matrix can be seen as special cases of our proposal, and that a relaxation of our proposal yields the singular value decomposition. In addition, we propose a framework for biclustering based on the matrix-variate normal distribution. The performances of our proposals are demonstrated in a simulation study and on a gene expression dataset. This article has supplementary material online.
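The abstract's description (normal entries with a bicluster-specific mean and common variance, plus an ℓ1 penalty on the bicluster means) can be read as a penalized least-squares problem over row labels, column labels, and a K × R matrix of means. The following is a minimal sketch under that reading, not the authors' implementation: the function names `soft_threshold`, `update_means`, and `sparse_bicluster`, the random initialization, and the fixed iteration count are all assumptions made for illustration. It alternates between soft-thresholding the block means and reassigning rows and columns, in the manner of k-means.

```python
import numpy as np

def soft_threshold(a, t):
    """Shrink a toward zero by t; values with |a| <= t become zero."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def update_means(X, rows, cols, K, R, lam):
    """For fixed labels, minimize 0.5 * sum (x - mu)^2 + lam * |mu| per block.

    The minimizer is the block average soft-thresholded at lam / block size.
    """
    mu = np.zeros((K, R))
    for k in range(K):
        for r in range(R):
            block = X[np.ix_(rows == k, cols == r)]
            if block.size:  # empty blocks keep a zero mean
                mu[k, r] = soft_threshold(block.mean(), lam / block.size)
    return mu

def sparse_bicluster(X, K, R, lam, n_iter=20, seed=0):
    """Hypothetical alternating scheme (assumed, not taken from the paper)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    rows = rng.integers(K, size=n)  # random initial row labels
    cols = rng.integers(R, size=p)  # random initial column labels
    for _ in range(n_iter):
        mu = update_means(X, rows, cols, K, R, lam)
        # Assign each row to the row cluster with the smallest squared error,
        # holding the column labels fixed.
        row_cost = ((X[:, None, :] - mu[:, cols][None, :, :]) ** 2).sum(axis=2)
        rows = row_cost.argmin(axis=1)
        # Assign each column likewise, holding the new row labels fixed.
        col_cost = ((X.T[:, None, :] - mu[rows, :].T[None, :, :]) ** 2).sum(axis=2)
        cols = col_cost.argmin(axis=1)
    # Recompute the means so they match the final labels.
    mu = update_means(X, rows, cols, K, R, lam)
    return rows, cols, mu

# Usage on synthetic data (illustrative only).
X = np.random.default_rng(1).normal(size=(12, 8))
rows, cols, mu = sparse_bicluster(X, K=2, R=3, lam=1.0)
```

In this sketch, a larger `lam` sets more entries of `mu` exactly to zero, which is the sense in which the penalty yields sparse biclusters; with `lam = 0` the mean update reduces to plain block averages. Like k-means, an alternating scheme of this kind can stop at a local optimum, so results depend on the initialization.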

Original language: English (US)
Pages (from-to): 985-1008
Number of pages: 24
Journal: Journal of Computational and Graphical Statistics
Volume: 23
Issue number: 4
State: Published - Oct 25 2014

Bibliographical note

Funding Information:
The authors thank the editor, an associate editor, and two reviewers for helpful comments that improved the quality of this manuscript. The authors were supported by NIH Grant DP5OD009145 and NSF CAREER Award DMS-1252624.

Keywords

  • Clustering
  • Gene expression
  • Matrix-variate normal distribution
  • Unsupervised learning
  • ℓ1 penalty
