Co-clustering as multilinear decomposition with sparse latent factors

Evangelos E. Papalexakis, Nicholas D. Sidiropoulos

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

23 Scopus citations

Abstract

The K-means clustering problem seeks to partition the columns of a data matrix in subsets, such that columns in the same subset are 'close' to each other. The co-clustering problem seeks to simultaneously partition the rows and columns of a matrix to produce 'coherent' groups called co-clusters. Co-clustering has recently found numerous applications in diverse areas. The concept readily generalizes to higher-way data sets (e.g., adding a temporal dimension). Starting from K-means, we show how co-clustering can be formulated as constrained multilinear decomposition with sparse latent factors. In the case of three- and higher-way data, this corresponds to a PARAFAC decomposition with sparse latent factors. This is important, for PARAFAC is unique under mild conditions - and sparsity further improves identifiability. This allows us to uniquely unravel a large number of possibly overlapping co-clusters that are hidden in the data. Interestingly, the imposition of latent sparsity pays a collateral dividend: as one increases the number of fitted co-clusters, new co-clusters are added without affecting those previously extracted. An important corollary is that co-clusters can be extracted incrementally; this implies that the algorithm scales well for large datasets. We demonstrate the validity of our approach using the ENRON corpus, as well as synthetic data.
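The idea in the abstract, three-way co-clusters as rank-one PARAFAC components with sparse factors that are extracted one at a time, can be illustrated with a small NumPy sketch. This is not the authors' algorithm: it is a minimal illustration under stated assumptions, namely nonnegative factors, an L1 penalty weight `lam`, alternating closed-form updates, and simple deflation (subtracting each fitted component before fitting the next). All function names, the penalty value, and the synthetic tensor are hypothetical choices made for this example.

```python
import numpy as np

def soft_update(y, u, v, lam):
    """Minimize ||X - outer(a, u, v)||^2 + lam * ||a||_1 over a >= 0.

    y is X contracted with u and v along the other two modes; the
    minimizer is a shifted, rescaled, and clipped copy of y.
    """
    denom = (u @ u) * (v @ v)
    if denom == 0.0:
        return np.zeros_like(y)
    return np.maximum(0.0, (y - lam / 2.0) / denom)

def sparse_rank1(X, lam=1.0, n_sweeps=30):
    """Fit one sparse nonnegative rank-one component to a 3-way array."""
    I, J, K = X.shape
    # Assumption: uniform unit-norm starting vectors for modes 2 and 3.
    b = np.full(J, 1.0 / np.sqrt(J))
    c = np.full(K, 1.0 / np.sqrt(K))
    a = np.zeros(I)
    for _ in range(n_sweeps):
        a = soft_update(np.einsum("ijk,j,k->i", X, b, c), b, c, lam)
        b = soft_update(np.einsum("ijk,i,k->j", X, a, c), a, c, lam)
        c = soft_update(np.einsum("ijk,i,j->k", X, a, b), a, b, lam)
    return a, b, c

def extract_coclusters(X, n_clusters, lam=1.0):
    """Extract components one at a time, deflating the residual each time."""
    residual = X.copy()
    components = []
    for _ in range(n_clusters):
        a, b, c = sparse_rank1(residual, lam)
        components.append((a, b, c))
        residual = residual - np.einsum("i,j,k->ijk", a, b, c)
    return components

# Synthetic 20 x 15 x 10 tensor: two planted, non-overlapping co-clusters
# of different strength plus weak Gaussian noise (all values hypothetical).
rng = np.random.default_rng(0)
X = 0.005 * rng.standard_normal((20, 15, 10))
X[0:5, 0:5, 0:4] += 2.0
X[10:14, 8:12, 6:9] += 1.0

clusters = extract_coclusters(X, n_clusters=2)

# The nonzero pattern of each factor gives the rows, columns, and slabs
# that belong to the corresponding co-cluster.
supports = [tuple(np.flatnonzero(f).tolist() for f in abc) for abc in clusters]
for n, (rows, cols, slabs) in enumerate(supports, start=1):
    print(f"co-cluster {n}: rows={rows} cols={cols} slabs={slabs}")
```

In this toy setting the stronger block is picked up first and the weaker one second, and because each component is subtracted before the next is fitted, asking for more co-clusters does not change the ones already extracted. The sketch does not handle overlapping co-clusters, and the penalty weight would need tuning on real data.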

Original language: English (US)
Title of host publication: 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
Pages: 2064-2067
Number of pages: 4
State: Published - Aug 18 2011
Event: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Prague, Czech Republic
Duration: May 22 2011 to May 27 2011

Other

Other: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Country: Czech Republic
City: Prague
Period: 5/22/11 to 5/27/11
