Scalable temporal clustering for massive multidimensional data streams

Gediminas Adomavicius, Jesse Bockstedt, Vishnu Parimi

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

1 Scopus citation

Abstract

Today's organizations continuously capture extremely large amounts of data, and these volumes will only continue to grow. In this paper we present a new approach to discovering clusters in such massive, complex (i.e., multidimensional), continuously arriving data, which are far too large to be analyzed as a single dataset. To guarantee acceptable scalability, our approach builds on the existing data mining literature and combines sampling-based techniques, an advanced variation of hierarchical agglomerative clustering, and a sample-based cluster reconstruction method to produce an approximate clustering solution of very high accuracy. We evaluate the proposed approach empirically and show that it delivers excellent clustering performance while also achieving significant computational savings.
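The abstract names a three-stage pipeline only at a high level: sample the stream, cluster the sample hierarchically, then reconstruct a clustering of all the data from the sample clusters. The Python sketch below illustrates that general pattern under stated assumptions; it is not the authors' algorithm. Reservoir sampling (Algorithm R) stands in for the paper's sampling techniques, plain average-linkage agglomerative clustering stands in for its "advanced variation" of hierarchical agglomerative clustering, and nearest-centroid assignment stands in for its reconstruction step. All function names (`reservoir_sample`, `average_linkage`, `centroid`, `reconstruct`, `demo`) and the synthetic two-blob data are hypothetical.

```python
# Illustrative sketch only: a generic sample -> cluster -> reconstruct pipeline,
# NOT the algorithm from the paper. All names and data here are hypothetical.
import math
import random
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def reservoir_sample(stream: Iterable[Point], k: int, rng: random.Random) -> List[Point]:
    """Algorithm R: keep a uniform random sample of k points from a stream of unknown length."""
    sample: List[Point] = []
    for i, point in enumerate(stream):
        if i < k:
            sample.append(point)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                sample[j] = point
    return sample


def average_linkage(sample: Sequence[Point], n_clusters: int) -> List[List[Point]]:
    """Naive agglomerative clustering: start from singletons and repeatedly merge
    the pair of clusters with the smallest mean pairwise Euclidean distance."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters: List[List[Point]] = [[p] for p in sample]
    while len(clusters) > n_clusters:
        best = (math.inf, -1, -1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                avg = total / (len(clusters[i]) * len(clusters[j]))
                if avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected
    return clusters


def centroid(cluster: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def reconstruct(points: Iterable[Point], centroids: Sequence[Point]) -> List[int]:
    """Label every point, sampled or not, with the index of its nearest sample-cluster centroid."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]


def demo(seed: int = 0) -> Tuple[List[Point], List[List[Point]], List[int]]:
    """Two well-separated 2-D Gaussian blobs, clustered from a 40-point sample."""
    rng = random.Random(seed)
    data: List[Point] = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(500)]
    data += [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(500)]
    sample = reservoir_sample(iter(data), k=40, rng=rng)
    clusters = average_linkage(sample, n_clusters=2)
    labels = reconstruct(data, [centroid(c) for c in clusters])
    return sample, clusters, labels


if __name__ == "__main__":
    sample, clusters, labels = demo()
    print(len(sample), [len(c) for c in clusters], labels[0], labels[-1])
```

In a genuine single-pass stream the full data would not be re-read; each newly arriving point could instead be labeled with `reconstruct` as it arrives. The naive linkage loop is cubic in the sample size, which is tolerable only because the sample is small; that cost is one motivation for clustering a sample rather than the whole stream.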

Original language: English (US)
Title of host publication: 2008 Workshop on Information Technologies and Systems, WITS 2008
Publisher: Social Science Research Network
Pages: 121-126
Number of pages: 6
State: Published - Jan 1 2008
Event: 2008 Workshop on Information Technologies and Systems, WITS 2008 - Paris, France
Duration: Dec 13 2008 to Dec 14 2008
