Competitive Learning Mechanisms for Scalable, Incremental and Balanced Clustering of Streaming Texts

Arindam Banerjee, Joydeep Ghosh

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Scopus citations

Abstract

Automated clustering of text documents such as web pages is becoming increasingly important for organizing the vast amounts of information available over the internet. The problem is also very challenging, since text is typically represented by very high-dimensional (> 1000 dimensions), normalized (unit-length) vectors. Moreover, documents are continually being created, and their statistics change over time (for example, as news stories change), so one needs incremental learning algorithms that can adapt to non-stationary environments. We model high-dimensional, normalized data using a mixture of von Mises-Fisher distributions, and then modify this generative model in a principled way to yield frequency-sensitive competitive learning mechanisms that are applicable to streaming data and produce balanced clusters. Experimental results on clustering high-dimensional text data sets are provided to show the effectiveness and applicability of the proposed techniques.
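As a rough illustration of the idea in the abstract, the sketch below clusters a stream of unit-length vectors online and scales each cluster's cosine distance by its win count, in the style of classic frequency-sensitive competitive learning. It is a minimal sketch under stated assumptions and not the algorithm derived in the paper: the function name `fs_online_spherical_kmeans`, the learning rate `lr`, the random initialization, and the count-scaled score are all illustrative choices that do not come from the source.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fs_online_spherical_kmeans(stream, k, dim, lr=0.05, seed=0):
    """Hypothetical helper: frequency-sensitive online clustering of unit vectors.

    Assumptions (not from the paper): random unit-length initial centroids,
    a fixed learning rate, and a score equal to win count times cosine distance.
    """
    rng = random.Random(seed)
    centroids = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(k)]
    counts = [1] * k  # win counts, started at 1 so the initial scores are not all zero
    labels = []
    for x in stream:
        x = normalize(x)
        # Cosine distance scaled by win count: clusters that have already
        # won often are penalized, which nudges assignments toward balance.
        scores = [
            counts[j] * (1.0 - sum(a * b for a, b in zip(x, centroids[j])))
            for j in range(k)
        ]
        winner = min(range(k), key=scores.__getitem__)
        # Move only the winning centroid toward x, then project it back
        # onto the unit sphere.
        centroids[winner] = normalize(
            [c + lr * a for c, a in zip(centroids[winner], x)]
        )
        counts[winner] += 1
        labels.append(winner)
    return centroids, labels, counts
```

Each document vector is seen once and only the winning centroid is updated, so memory use is independent of the stream length. The count-scaled score is one simple way to encourage balanced clusters; other penalty forms are possible.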

Original language: English (US)
Title of host publication: Proceedings of the International Joint Conference on Neural Networks
Pages: 2697-2702
Number of pages: 6
Volume: 4
State: Published - Sep 25 2003
Event: International Joint Conference on Neural Networks 2003 - Portland, OR, United States
Duration: Jul 20 2003 to Jul 24 2003
