Abstract
In many applications of clustering, solutions that are balanced, i.e., where the clusters obtained are of comparable sizes, are preferred. This chapter describes several approaches to obtaining balanced clustering results that also scale well to large data sets. First, we describe a general scalable framework for obtaining balanced clustering that first clusters only a small subset of the data and then efficiently allocates the rest of the data to these initial clusters while simultaneously refining the clustering. Next, we discuss how frequency sensitive competitive learning can be used for balanced clustering in both batch and on-line scenarios, and illustrate the mechanism with a case study of clustering directional data such as text documents. Finally, we briefly outline balanced clustering based on other methods such as graph partitioning and mixture modeling.
|Original language||English (US)|
|Title of host publication||Constrained Clustering|
|Subtitle of host publication||Advances in Algorithms, Theory, and Applications|
|Number of pages||30|
|State||Published - Jan 1 2008|