Simultaneous grouping pursuit and feature selection over an undirected graph

Yunzhang Zhu, Xiaotong Shen, Wei Pan

Research output: Contribution to journal › Article › peer-review

42 Scopus citations

Abstract

In high-dimensional regression, grouping pursuit and feature selection each have their own merits while complementing each other in battling the curse of dimensionality. To seek a parsimonious model, we perform simultaneous grouping pursuit and feature selection over an arbitrary undirected graph in which each node corresponds to one predictor. Regression coefficients can be grouped, meaning that their absolute values are the same or close, when their corresponding nodes are reachable from each other over the graph. This is motivated by gene network analysis, where genes tend to work in groups according to their biological functionalities. Based on a nonconvex penalty, we develop a computational strategy and analyze the proposed method. Theoretical analysis indicates that the proposed method reconstructs the oracle estimator, that is, the unbiased least-squares estimator given the true grouping, leading to consistent reconstruction of grouping structures and informative features, as well as to optimal parameter estimation. Simulation studies suggest that the method combines the benefit of grouping pursuit with that of feature selection, and compares favorably against its competitors in selection accuracy and predictive performance. An application to eQTL data is used to illustrate the methodology, where a network is incorporated into the analysis through an undirected graph.
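The two ingredients named in the abstract, a nonconvex penalty and the oracle least-squares estimator given the true grouping, can be sketched in Python with NumPy. This is a minimal sketch, not the authors' implementation. It assumes a truncated L1 penalty, J(x) = min(|x|/tau, 1), as the nonconvex penalty, and a hypothetical objective 0.5*||y - X beta||^2 + lam1 * sum_j J(|beta_j|) + lam2 * sum over graph edges (j, k) of J(||beta_j| - |beta_k||). Penalizing differences of absolute values only along edges lets coefficients fuse along paths, so grouping stays within nodes that are reachable from each other. The oracle estimator further assumes that the groups, the coefficient signs, and the uninformative (zero) features are known. The names tlp, objective, oracle_estimator, and the toy data are illustrative only.

```python
import numpy as np


def tlp(x, tau):
    # Truncated L1 penalty min(|x| / tau, 1): an ASSUMED choice of nonconvex penalty.
    return np.minimum(np.abs(x) / tau, 1.0)


def objective(beta, X, y, edges, lam1, lam2, tau):
    # HYPOTHETICAL objective: least-squares loss + feature-selection term on |beta_j|
    # + grouping term on differences of absolute values across graph edges.
    resid = y - X @ beta
    loss = 0.5 * resid @ resid
    select = lam1 * tlp(beta, tau).sum()
    absb = np.abs(beta)
    group = lam2 * sum(tlp(absb[j] - absb[k], tau) for j, k in edges)
    return loss + select + group


def oracle_estimator(X, y, groups, signs):
    # Least squares given the true grouping (assumed known, with known signs).
    # Features outside every group are treated as uninformative (coefficient 0).
    beta = np.zeros(X.shape[1])
    # Collapse each group G into one column z_G = sum_{j in G} s_j * x_j,
    # so all members share one common magnitude |gamma_G|.
    Z = np.column_stack([X[:, g] @ signs[g] for g in groups])
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
    for g, gam in zip(groups, gamma):
        beta[g] = signs[g] * gam  # restore each member's sign
    return beta


# Toy example (synthetic data, illustrative only).
rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -2.0, 2.0, 0.5, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

edges = [(0, 1), (1, 2), (3, 4), (4, 5)]  # undirected graph over predictors
groups = [[0, 1, 2], [3]]                  # true grouping of informative features
signs = np.sign(beta_true)

beta_hat = oracle_estimator(X, y, groups, signs)
print(np.round(beta_hat, 2))
print(objective(beta_hat, X, y, edges, 1.0, 1.0, 0.1))
```

Under this assumed objective, the oracle fit pays no grouping penalty on edges (0, 1) and (1, 2), because those three coefficients share one magnitude despite their different signs. The tuning values lam1 = lam2 = 1.0 and tau = 0.1 are arbitrary choices for the illustration.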

Original language: English (US)
Pages (from-to): 713-725
Number of pages: 13
Journal: Journal of the American Statistical Association
Volume: 108
Issue number: 502
State: Published - 2013

Bibliographical note

Funding Information:
Yunzhang Zhu is a Ph.D. candidate, School of Statistics, University of Minnesota, Minneapolis, MN 55455 (E-mail: zhuxx351@umn.edu). Xiaotong Shen is Professor, School of Statistics, University of Minnesota, Minneapolis, MN 55455 (E-mail: xshen@stat.umn.edu). Wei Pan is Professor, Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455 (E-mail: weip@biostat.umn.edu). The research was supported in part by NSF grants DMS-0906616 and DMS-1207771, and NIH grants 1R01GM081535-01 and HL65462. The authors thank the editor, the associate editor, and anonymous referees for helpful comments and suggestions.

Keywords

  • Network analysis
  • Nonconvex minimization
  • Prediction
  • Structured data
