Clustering with Bregman divergences

Arindam Banerjee, Srujana Merugu, Inderjit Dhillon, Joydeep Ghosh

Research output: Chapter in Book/Report/Conference proceedingConference contribution

59 Scopus citations

Abstract

A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as classical kmeans and information-theoretic clustering, which arise by special choices of the Bregman divergence. The algorithms maintain the simplicity and scalability of the classical kmeans algorithm, while generalizing the basic idea to a very large class of clustering loss functions. There are two main contributions in this paper. First, we pose the hard clustering problem in terms of minimizing the loss in Bregman information, a quantity motivated by rate-distortion theory, and present an algorithm to minimize this loss. Secondly, we show an explicit bijection between Bregman divergences and exponential families. The bijection enables the development of an alternative interpretation of an efficient EM scheme for learning models involving mixtures of exponential distributions. This leads to a simple soft clustering algorithm for all Bregman divergences.

Original languageEnglish (US)
Title of host publicationProceedings of the Fourth SIAM International Conference on Data Mining
EditorsM.W. Berry, U. Dayal, C. Kamath, D. Skillicorn
Pages234-245
Number of pages12
StatePublished - Jan 1 2004
EventProceedings of the Fourth SIAM International Conference on Data Mining - Lake Buena Vista, FL, United States
Duration: Apr 22 2004Apr 24 2004

Other

OtherProceedings of the Fourth SIAM International Conference on Data Mining
CountryUnited States
CityLake Buena Vista, FL
Period4/22/044/24/04

Fingerprint Dive into the research topics of 'Clustering with Bregman divergences'. Together they form a unique fingerprint.

Cite this