TY - JOUR
T1 - Multi-threaded modularity based graph clustering using the multilevel paradigm
AU - Lasalle, Dominique
AU - Karypis, George
N1 - Publisher Copyright:
© 2014 Elsevier Inc. All rights reserved.
PY - 2015/2
Y1 - 2015/2
N2 - Graphs are an important tool for modeling data in many diverse domains. The recent increase in sensor technology and deployment, the adoption of online services, and the scale of VLSI circuits have caused the size of these graphs to skyrocket. Finding clusters of highly connected vertices within these graphs is a critical part of their analysis. In this paper we apply the multilevel paradigm to the modularity graph clustering problem. We improve upon the state of the art by introducing new efficient methods for coarsening graphs, creating initial clusterings, and performing local refinement on the resulting clusterings. We establish that for a graph with n vertices and m edges, these algorithms have an O(m+n) runtime complexity and an O(m+n) space complexity, and show that in practice they are extremely fast. We present shared-memory parallel formulations of these algorithms to take full advantage of modern architectures, which we show have a parallel runtime of O(m/p+n/p+k), where p is the number of threads and k is the number of clusters. Finally, we present the product of this research, the clustering tool Nerstrand. In serial mode, Nerstrand runs in a fraction of the time of current methods and produces results of equal quality. When run in parallel mode, Nerstrand exhibits significant speedup with less than one percent degradation of clustering quality. Nerstrand works well on large graphs, clustering a graph with over 105 million vertices and 3.3 billion edges in 90 s.
AB - Graphs are an important tool for modeling data in many diverse domains. The recent increase in sensor technology and deployment, the adoption of online services, and the scale of VLSI circuits have caused the size of these graphs to skyrocket. Finding clusters of highly connected vertices within these graphs is a critical part of their analysis. In this paper we apply the multilevel paradigm to the modularity graph clustering problem. We improve upon the state of the art by introducing new efficient methods for coarsening graphs, creating initial clusterings, and performing local refinement on the resulting clusterings. We establish that for a graph with n vertices and m edges, these algorithms have an O(m+n) runtime complexity and an O(m+n) space complexity, and show that in practice they are extremely fast. We present shared-memory parallel formulations of these algorithms to take full advantage of modern architectures, which we show have a parallel runtime of O(m/p+n/p+k), where p is the number of threads and k is the number of clusters. Finally, we present the product of this research, the clustering tool Nerstrand. In serial mode, Nerstrand runs in a fraction of the time of current methods and produces results of equal quality. When run in parallel mode, Nerstrand exhibits significant speedup with less than one percent degradation of clustering quality. Nerstrand works well on large graphs, clustering a graph with over 105 million vertices and 3.3 billion edges in 90 s.
KW - Graph clustering
KW - Multi-threading
KW - Multilevel paradigm
KW - Shared-memory parallel
UR - http://www.scopus.com/inward/record.url?scp=85027951398&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85027951398&partnerID=8YFLogxK
U2 - 10.1016/j.jpdc.2014.09.012
DO - 10.1016/j.jpdc.2014.09.012
M3 - Article
AN - SCOPUS:85027951398
SN - 0743-7315
VL - 76
SP - 66
EP - 80
JO - Journal of Parallel and Distributed Computing
JF - Journal of Parallel and Distributed Computing
ER -