Extracting grouping structure or identifying homogeneous subgroups of predictors in regression is crucial for high-dimensional data analysis. A low-dimensional structure, grouping in particular, when captured in a regression model, enhances predictive performance and makes the model easier to interpret. Grouping pursuit extracts the homogeneous subgroups of predictors most responsible for outcomes of a response. This is the case in gene network analysis, where grouping reveals gene functionalities with regard to the progression of a disease. To address challenges in grouping pursuit, we introduce a novel homotopy method for computing an entire solution surface through regularization involving a piecewise linear penalty. This nonconvex and overcomplete penalty permits adaptive grouping and nearly unbiased estimation, which is treated with a novel concept of grouped subdifferentials and difference convex programming for efficient computation. Finally, the proposed method not only achieves high performance, as suggested by numerical analysis, but also has the desired optimality with regard to grouping pursuit and prediction, as shown by our theoretical results.
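As a rough illustration of the kind of penalty the abstract describes, the sketch below uses a truncated absolute difference, min(|z|, tau), applied to every pair of coefficients. This is one common piecewise linear, nonconvex choice, and treating it as the paper's penalty is an assumption. The threshold `tau` and the function names are hypothetical. The sketch also checks the difference-of-convex split min(|z|, tau) = |z| - max(|z| - tau, 0), which is the identity that difference convex programming would exploit for such a penalty.

```python
from itertools import combinations


def truncated_abs(z, tau):
    """Piecewise linear, nonconvex function min(|z|, tau) (assumed penalty form)."""
    return min(abs(z), tau)


def grouping_penalty(beta, tau):
    """Sum of truncated absolute differences over all coefficient pairs.

    One term per pair makes the penalty overcomplete: there are
    p * (p - 1) / 2 terms for p coefficients.
    """
    return sum(truncated_abs(bj - bk, tau) for bj, bk in combinations(beta, 2))


def dc_parts(beta, tau):
    """Split the penalty into two convex parts so that penalty = s1 - s2."""
    s1 = sum(abs(bj - bk) for bj, bk in combinations(beta, 2))
    s2 = sum(max(abs(bj - bk) - tau, 0.0) for bj, bk in combinations(beta, 2))
    return s1, s2


# Hypothetical coefficients: two equal values and two nearby values.
beta = [1.0, 1.0, 3.0, 3.5]
tau = 1.0  # hypothetical truncation threshold

penalty = grouping_penalty(beta, tau)
s1, s2 = dc_parts(beta, tau)
print(penalty, s1 - s2)
```

Differences larger than `tau` contribute a constant, so well-separated groups are not shrunk toward each other, while small differences are penalized linearly and pushed toward zero. That is the intuition behind adaptive grouping with nearly unbiased estimation.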
|Original language||English (US)|
|Number of pages||13|
|Journal||Journal of the American Statistical Association|
|State||Published - Jun 2010|
Bibliographical note
Funding Information:
Xiaotong Shen is Professor, School of Statistics, University of Minnesota, 224 Church Street S.E., Minneapolis, MN 55455 (E-mail: email@example.com). Hsin-Cheng Huang is Research Fellow, Institute of Statistical Science, Academia Sinica, Taipei 115, Taiwan (E-mail: firstname.lastname@example.org). X. Shen's research is supported in part by National Science Foundation grants DMS-0604394 and DMS-0906616 and National Institutes of Health grant 1R01GM081535. H.-C. Huang is supported in part by grant NSC 97-2118-M-001-001-MY3. The authors thank the editor, the associate editor, and three referees for their helpful comments and suggestions.
- Gene networks
- Large p but small n
- Nonconvex minimization
- Supervised clustering