Big data clustering via random sketching and validation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

As the number and dimensionality of data increase, the development of new, efficient processing tools has become a necessity. This paper introduces a novel dimensionality reduction approach for fast and efficient clustering of high-dimensional data. The new methods extend random sampling and consensus (RANSAC) arguments, originally developed for robust regression tasks in computer vision, to the dimensionality reduction problem. The advocated random sketching and validation K-means (SkeVa K-means) and Divergence SkeVa algorithms can achieve high performance, with the latter affording a lower computational footprint than the former. Extensive numerical tests on synthetic and real datasets highlight the potential of the proposed algorithms and demonstrate their competitive performance relative to state-of-the-art random projection alternatives.
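The general sketch-and-validate idea named in the abstract can be illustrated with a short example. What follows is a minimal sketch under stated assumptions, not the paper's SkeVa K-means or Divergence SkeVa algorithm: it assumes a "sketch" is a random subset of features, that plain Lloyd's K-means is run on that subset, and that each trial is scored by the within-cluster cost on the held-out features. The function names (`kmeans`, `within_cluster_cost`, `sketch_and_validate`) and the scoring rule are hypothetical choices made for illustration only.

```python
import random


def sq_dist(a, b, dims):
    """Squared Euclidean distance restricted to the feature indices in dims."""
    return sum((a[j] - b[j]) ** 2 for j in dims)


def kmeans(points, k, dims, rng, iters=20):
    """Plain Lloyd's K-means that measures distances only on the features in dims."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center, using the sketched features only.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c], dims))
                  for p in points]
        # Update step: recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def within_cluster_cost(points, labels, k, dims):
    """Sum of squared distances to cluster means, measured on dims."""
    cost = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if not members:
            continue
        center = [sum(col) / len(members) for col in zip(*members)]
        cost += sum(sq_dist(p, center, dims) for p in members)
    return cost


def sketch_and_validate(points, k, sketch_dim, trials, seed=0):
    """Assumed RANSAC-style loop: cluster on a random feature subset,
    score the result on the remaining features, keep the best trial.
    Requires sketch_dim < number of features so the held-out set is non-empty."""
    rng = random.Random(seed)
    n_features = len(points[0])
    best_cost, best_labels = float("inf"), None
    for _ in range(trials):
        dims = rng.sample(range(n_features), sketch_dim)    # sketching
        labels = kmeans(points, k, dims, rng)
        held_out = [j for j in range(n_features) if j not in dims]
        cost = within_cluster_cost(points, labels, k, held_out)  # validation
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_labels, best_cost
```

A call such as `sketch_and_validate(points, k=2, sketch_dim=3, trials=5)` returns the labels of the best-scoring trial together with its validation cost. Each trial runs K-means on only `sketch_dim` features, which is where the computational savings in this illustration would come from; how the paper itself draws sketches and scores them may differ.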

Original language: English (US)
Title of host publication: Conference Record of the 48th Asilomar Conference on Signals, Systems and Computers
Editors: Michael B. Matthews
Publisher: IEEE Computer Society
Pages: 1046-1050
Number of pages: 5
ISBN (Electronic): 9781479982974
State: Published - Apr 24 2015
Event: 48th Asilomar Conference on Signals, Systems and Computers, ACSSC 2015 - Pacific Grove, United States
Duration: Nov 2 2014 → Nov 5 2014

Publication series

Name: Conference Record - Asilomar Conference on Signals, Systems and Computers
Volume: 2015-April
ISSN (Print): 1058-6393

Other

Other: 48th Asilomar Conference on Signals, Systems and Computers, ACSSC 2015
Country: United States
City: Pacific Grove
Period: 11/2/14 → 11/5/14

Keywords

  • Clustering
  • K-means
  • big data
  • feature selection
  • high-dimensional data
  • random sampling and consensus
  • random sketching and validation
