The task of estimating a Gaussian graphical model in the high-dimensional setting is considered. The graphical lasso, which involves maximizing the Gaussian log likelihood subject to a lasso penalty, is a well-studied approach for this task. A surprising connection between the graphical lasso and hierarchical clustering is introduced: the graphical lasso in effect performs a two-step procedure, in which (1) single linkage hierarchical clustering is performed on the variables in order to identify connected components, and then (2) a penalized log likelihood is maximized on the subset of variables within each connected component. Thus, the graphical lasso determines the connected components of the estimated network via single linkage clustering. The single linkage clustering is known to perform poorly in certain finite-sample settings. Therefore, the cluster graphical lasso, which involves clustering the features using an alternative to single linkage clustering, and then performing the graphical lasso on the subset of variables within each cluster, is proposed. Model selection consistency for this technique is established, and its improved performance relative to the graphical lasso is demonstrated in a simulation study, as well as in applications to a university webpage and a gene expression data sets.
Bibliographical noteFunding Information:
We thank two reviewers and an associate editor for helpful comments that improved the quality of this manuscript. We thank Noah Simon for helpful conversations about the connection between graphical lasso and SLC; Jian Guo and Ji Zhu for providing the university webpage data set in Guo et al. (2011) ; and Han Liu and Tuo Zhao for sharing the equities data set in Liu et al. (2012) . This work was supported by NIH Grant DP5OD009145 , 1R21GM10171901A1 ; and NSF Grant DMS-1252624 , DMS-1161565 .
© 2014 Elsevier B.V. All rights reserved.
- Hierarchical clustering
- High-dimensional setting
- Single linkage clustering