The immense amount of data generated and communicated daily presents unique processing challenges. Clustering, the grouping of data without ground-truth labels, is an important tool for drawing inferences from data. Subspace clustering (SC) is a relatively recent method that can successfully classify nonlinearly separable data in a multitude of settings. In spite of their high clustering accuracy, SC methods incur prohibitively high computational complexity when processing large volumes of high-dimensional data. Inspired by random sketching approaches for dimensionality reduction, the present paper introduces a randomized scheme for SC, termed Sketch-SC, tailored for large volumes of high-dimensional data. Sketch-SC accelerates the computationally heavy parts of state-of-the-art SC approaches by compressing the data matrix across both dimensions using random projections, thus enabling fast and accurate large-scale SC. Performance analysis as well as extensive numerical tests on real data corroborate the potential of Sketch-SC and its competitive performance relative to state-of-the-art scalable SC approaches.
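The idea of "compressing the data matrix across both dimensions" can be illustrated with a small NumPy sketch. This is not the authors' Sketch-SC algorithm; it is one plausible reading of the abstract under stated assumptions. A hypothetical Gaussian map `Phi` (d × D) shrinks the feature dimension. A hypothetical Gaussian map `R` (N × n) replaces the N-column self-representation dictionary with n random combinations of data points. The coefficients `C` come from a ridge-regularized least-squares fit, and spectral clustering runs on the affinity |CᵀC|. The function names, the ridge penalty `lam`, and the choice of affinity are all illustrative assumptions. For simplicity the sketch forms the N × N affinity explicitly, which a genuinely large-scale implementation would avoid.

```python
import numpy as np


def _kmeans(E, K, iters):
    """Plain Lloyd k-means with deterministic farthest-point initialization."""
    centers = [E[0]]
    for _ in range(1, K):
        dist = np.min([np.sum((E - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(E.shape[0], dtype=int)
    for _ in range(iters):
        d2 = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):  # keep the old center if a cluster empties
                centers[k] = E[labels == k].mean(axis=0)
    return labels


def sketch_sc(X, K, d, n, lam=1e-3, seed=0, kmeans_iters=50):
    """Hypothetical two-sided sketched subspace clustering (illustration only).

    X: (D, N) data matrix, one point per column; K: number of clusters;
    d: reduced feature dimension; n: number of sketched dictionary atoms.
    """
    rng = np.random.default_rng(seed)
    D, N = X.shape
    # Assumed Gaussian random projections for both dimensions of X.
    Phi = rng.standard_normal((d, D)) / np.sqrt(d)  # features: D -> d
    R = rng.standard_normal((N, n)) / np.sqrt(n)    # dictionary size: N -> n
    Y = Phi @ X   # (d, N) compressed data
    B = Y @ R     # (d, n) sketched dictionary
    # Ridge self-representation Y ~= B C: only an n x n system is solved.
    C = np.linalg.solve(B.T @ B + lam * np.eye(n), B.T @ Y)  # (n, N)
    # Symmetric nonnegative affinity between data points.
    W = np.abs(C.T @ C)
    np.fill_diagonal(W, 0.0)
    # Normalized spectral embedding: top-K eigenvectors of D^{-1/2} W D^{-1/2}.
    inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    L = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    E = vecs[:, -K:]
    E = E / np.maximum(np.linalg.norm(E, axis=1, keepdims=True), 1e-12)
    return _kmeans(E, K, kmeans_iters), C


def union_of_subspaces(D=50, dims=(3, 3), pts=100, seed=1):
    """Synthetic points drawn from random low-dimensional subspaces of R^D."""
    rng = np.random.default_rng(seed)
    blocks, truth = [], []
    for k, r in enumerate(dims):
        basis, _ = np.linalg.qr(rng.standard_normal((D, r)))  # orthonormal D x r
        blocks.append(basis @ rng.standard_normal((r, pts)))
        truth += [k] * pts
    return np.hstack(blocks), np.array(truth)


if __name__ == "__main__":
    X, truth = union_of_subspaces()
    labels, C = sketch_sc(X, K=2, d=20, n=40)
    print(C.shape, np.bincount(labels, minlength=2))
```

The design choice being illustrated is that the costly regression involves an n × n system in a d-dimensional space rather than an N × N system in a D-dimensional space. How well such a sketch clusters depends on d, n, and the data, so no particular accuracy should be read into this toy example.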
Bibliographical note
Funding Information:
Manuscript received July 22, 2017; revised October 26, 2017; accepted November 17, 2017. Date of publication December 8, 2017; date of current version February 13, 2018. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Sotirios Chatzis. This work was supported by the National Science Foundation under Grants 1500713 and 1514056. This paper was presented in part at the 50th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, November 2016. (Corresponding author: Georgios B. Giannakis.) The authors are with the Department of Electrical and Computer Engineering and the Digital Technology Center, University of Minnesota, Minneapolis, MN 55455 USA.
© 2017 IEEE.
- Big data
- Random projections
- Subspace clustering