Model selection for high-dimensional data has seen many exciting results, in both efficient algorithms and theoretical developments. Powerful penalized regression methods can give sparse representations of the data even when the number of predictors is much larger than the sample size. One important question then is: how do we know when a sparse pattern identified by such a method is reliable? In this work, besides investigating the instability of model selection methods in terms of variable selection, we propose variable selection deviation measures that give a proper sense of how many predictors in the selected set are likely trustworthy in certain aspects. Simulations and a real data example demonstrate the utility of these measures in applications.
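To make the idea of selection instability concrete, the following is a minimal sketch, not the deviation measures proposed in the paper. It assumes a selection method has already been run once on the full data and again on several resamples, and it summarizes how far the resampled selections drift from the original one using the size of the symmetric difference. The function names and the variable labels (`x1`, `x2`, and so on) are hypothetical and chosen only for illustration.

```python
from statistics import mean


def symmetric_difference_size(reference, other):
    """Count variables that appear in exactly one of the two selected sets."""
    return len(set(reference) ^ set(other))


def mean_selection_deviation(reference, resampled_selections):
    """Average symmetric-difference size between a reference selection
    and the selections obtained on resampled data (illustrative only)."""
    if not resampled_selections:
        raise ValueError("need at least one resampled selection")
    return mean(
        symmetric_difference_size(reference, s) for s in resampled_selections
    )


def selection_frequencies(reference, resampled_selections):
    """Fraction of resamples in which each reference variable is selected again."""
    if not resampled_selections:
        raise ValueError("need at least one resampled selection")
    n = len(resampled_selections)
    return {
        v: sum(v in set(s) for s in resampled_selections) / n
        for v in sorted(set(reference))
    }


# Hypothetical selections: one from the full data, four from resamples.
reference = {"x1", "x2", "x5"}
resamples = [
    {"x1", "x2"},
    {"x1", "x5", "x7"},
    {"x1", "x2", "x5"},
    {"x2", "x3"},
]

deviation = mean_selection_deviation(reference, resamples)
frequencies = selection_frequencies(reference, resamples)
print(deviation)
print(frequencies)
```

A small average deviation relative to the size of the selected set suggests a stable sparse pattern, while a deviation comparable to the set size suggests that few of the selected predictors should be trusted. The per-variable frequencies indicate which predictors account for the instability.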
Bibliographical note
Funding Information:
The authors appreciate comments from Wei Pan, Lan Wang, Yi Yang, and Hui Zou. We also sincerely thank two referees, the AE, and the Editor for very helpful suggestions on improving our work in both theoretical and numerical aspects. This research was partially supported by NSF grant DMS-1106576.
- Model selection diagnostics
- Model selection instability
- Variable selection deviation