In this paper, we extend the LASSO method (Tibshirani 1996) to simultaneously fit a regression model and identify important interaction terms. Unlike most existing variable selection methods, our method automatically enforces the heredity constraint: an interaction term can be included in the model only if the corresponding main terms are also included. Furthermore, we extend our method to generalized linear models and show that it has the oracle property (Fan and Li 2001; Fan and Peng 2004), that is, it performs as well as if the true model were given in advance. The proof of the oracle property is given in the online supplemental materials. Numerical results on both simulated and real data indicate that our method tends to remove irrelevant variables more effectively and to provide better prediction performance than previous methods (Yuan, Joseph, and Lin 2007; Zhao, Rocha, and Yu 2009) and the classical LASSO.
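To make the heredity constraint concrete, the following minimal Python sketch shows one way such a constraint can hold by construction. It assumes each interaction coefficient is written as a product of a scale factor and the two corresponding main-effect coefficients. This parametrization, and the names `interaction_coefficients`, `satisfies_strong_heredity`, `beta`, `gamma`, and `alpha`, are illustrative assumptions and are not stated in the abstract.

```python
from itertools import combinations


def interaction_coefficients(beta, gamma):
    """Build interaction coefficients alpha[(j, k)] = gamma[(j, k)] * beta[j] * beta[k].

    Assumed parametrization (hypothetical, for illustration only): if either
    main-effect coefficient is zero, the interaction coefficient is zero too.
    """
    p = len(beta)
    return {
        (j, k): gamma.get((j, k), 0.0) * beta[j] * beta[k]
        for j, k in combinations(range(p), 2)
    }


def satisfies_strong_heredity(beta, alpha, tol=0.0):
    """Return True if every nonzero interaction has both main effects nonzero."""
    return all(
        abs(a) <= tol or (abs(beta[j]) > tol and abs(beta[k]) > tol)
        for (j, k), a in alpha.items()
    )


# Made-up coefficients: the second main effect has been shrunk to zero.
beta = [1.5, 0.0, -2.0]
gamma = {(0, 1): 0.7, (0, 2): 0.5, (1, 2): 3.0}
alpha = interaction_coefficients(beta, gamma)
```

In this toy example only the interaction between the first and third variables survives, because every interaction involving the zeroed second variable is forced to zero.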
Bibliographical note
Funding Information:
Nam Hee Choi is Lecturer, Department of Statistics, University of Michigan, Ann Arbor, MI 48109. William Li is Professor, Carlson School of Management, University of Minnesota, Minneapolis, MN 55455. Ji Zhu is Associate Professor, Department of Statistics, University of Michigan, Ann Arbor, MI 48109 (E-mail: email@example.com). We thank Rayjean Hung, Stefano Porru, Paolo Boffetta, and John Witte for sharing the bladder cancer dataset. Choi and Zhu are partially supported by grants DMS-0705532 and DMS-0748389 from the National Science Foundation.
Copyright 2010 Elsevier B.V. All rights reserved.
- Heredity structure