TY - JOUR
T1 - Hierarchical Community Detection by Recursive Partitioning
AU - Li, Tianxi
AU - Lei, Lihua
AU - Bhattacharyya, Sharmodeep
AU - Van den Berge, Koen
AU - Sarkar, Purnamrita
AU - Bickel, Peter J.
AU - Levina, Elizaveta
N1 - Publisher Copyright: © 2020 American Statistical Association.
PY - 2022
Y1 - 2022
N2 - The problem of community detection in networks is usually formulated as finding a single partition of the network into some “correct” number of communities. We argue that it is more interpretable and in some regimes more accurate to construct a hierarchical tree of communities instead. This can be done with a simple top-down recursive partitioning algorithm, starting with a single community and separating the nodes into two communities by spectral clustering repeatedly, until a stopping rule suggests there are no further communities. This class of algorithms is model-free, computationally efficient, and requires no tuning other than selecting a stopping rule. We show that there are regimes where this approach outperforms K-way spectral clustering, and propose a natural framework for analyzing the algorithm’s theoretical performance, the binary tree stochastic block model. Under this model, we prove that the algorithm correctly recovers the entire community tree under relatively mild assumptions. We apply the algorithm to a gene network based on gene co-occurrence in 1580 research papers on anemia, and identify six clusters of genes in a meaningful hierarchy. We also illustrate the algorithm on a dataset of statistics papers. Supplementary materials for this article are available online.
KW - Community detection
KW - Hierarchical clustering
KW - Network
KW - Recursive partitioning
UR - http://www.scopus.com/inward/record.url?scp=85096518017&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096518017&partnerID=8YFLogxK
U2 - 10.1080/01621459.2020.1833888
DO - 10.1080/01621459.2020.1833888
M3 - Article
AN - SCOPUS:85096518017
SN - 0162-1459
VL - 117
SP - 951
EP - 968
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 538
ER -