Sketched subspace clustering

Panagiotis A. Traganitis, Georgios B. Giannakis

Research output: Contribution to journal › Article › peer-review

40 Scopus citations

Abstract

The immense amount of data generated and communicated daily presents unique processing challenges. Clustering, the grouping of data without ground-truth labels, is an important tool for drawing inferences from data. Subspace clustering (SC) is a relatively recent method that can successfully classify nonlinearly separable data in a multitude of settings. Despite their high clustering accuracy, SC methods incur prohibitively high computational complexity when processing large volumes of high-dimensional data. Inspired by random sketching approaches for dimensionality reduction, the present paper introduces a randomized scheme for SC, termed Sketch-SC, tailored for large volumes of high-dimensional data. Sketch-SC accelerates the computationally heavy parts of state-of-the-art SC approaches by compressing the data matrix across both dimensions using random projections, thus enabling fast and accurate large-scale SC. Performance analysis as well as extensive numerical tests on real data corroborate the potential of Sketch-SC and its competitive performance relative to state-of-the-art scalable SC approaches.
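The abstract does not spell out Sketch-SC's exact formulation, so the following is only a minimal sketch of the general idea of "compressing the data matrix across both dimensions using random projections". It assumes a ridge-regularized (least-squares) self-expressive model, Gaussian random projections for both the column sketch and the row sketch, and an absolute-inner-product affinity. All three are illustrative assumptions, not necessarily the paper's choices. The function name `sketched_ridge_sc_affinity` and its parameters `n_atoms`, `d_proj`, and `lam` are hypothetical.

```python
import numpy as np


def sketched_ridge_sc_affinity(X, n_atoms, d_proj, lam=1e-2, seed=0):
    """Hypothetical illustration (not the paper's exact algorithm).

    X       : (D, N) data matrix, one data point per column.
    n_atoms : number of sketched dictionary atoms (n_atoms << N).
    d_proj  : compressed ambient dimension (d_proj << D).
    Returns (C, W): C is (n_atoms, N) coefficients and W is an (N, N)
    symmetric, nonnegative affinity for a downstream spectral-clustering step.
    """
    rng = np.random.default_rng(seed)
    D, N = X.shape
    # Assumption: Gaussian column sketch R (N x n_atoms). The dictionary X @ R
    # holds n_atoms random combinations of the N points instead of all N points.
    R = rng.standard_normal((N, n_atoms)) / np.sqrt(n_atoms)
    # Assumption: Gaussian row sketch Phi (d_proj x D) compressing the ambient space.
    Phi = rng.standard_normal((d_proj, D)) / np.sqrt(d_proj)
    Y = Phi @ X          # compressed data, (d_proj, N)
    A = Y @ R            # compressed dictionary, (d_proj, n_atoms)
    # Assumed ridge model: min_C ||Y - A C||_F^2 + lam * ||C||_F^2, whose closed
    # form C = (A^T A + lam I)^{-1} A^T Y needs an (n_atoms x n_atoms) solve.
    G = A.T @ A + lam * np.eye(n_atoms)
    C = np.linalg.solve(G, A.T @ Y)
    # Heuristic affinity: absolute inner products of column-normalized codes.
    Cn = C / (np.linalg.norm(C, axis=0, keepdims=True) + 1e-12)
    W = np.abs(Cn.T @ Cn)
    return C, W


# Toy usage: two random 3-dimensional subspaces of R^100, 150 points each.
rng = np.random.default_rng(1)
D, per = 100, 150
bases = [np.linalg.qr(rng.standard_normal((D, 3)))[0] for _ in range(2)]
X = np.hstack([B @ rng.standard_normal((3, per)) for B in bases])
C, W = sketched_ridge_sc_affinity(X, n_atoms=40, d_proj=30)
print(C.shape, W.shape)  # (40, 300) (300, 300)
```

The point of the column sketch in this illustration is that a conventional ridge-based self-expressive model solves an N × N system built from all N points, whereas here the system is only `n_atoms` × `n_atoms`. The row sketch additionally shrinks every matrix product from D rows to `d_proj` rows. In a full pipeline, W would then be handed to a standard spectral-clustering routine, which is omitted here.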

Original language: English (US)
Article number: 8170271
Pages (from-to): 1663-1675
Number of pages: 13
Journal: IEEE Transactions on Signal Processing
Volume: 66
Issue number: 7
State: Published - Apr 1 2018

Bibliographical note

Publisher Copyright:
© 2017 IEEE.

Keywords

  • Big data
  • Random projections
  • Sketching
  • Subspace clustering
