Abstract
High-dimensional feature selection has become increasingly crucial for seeking parsimonious models in estimation. For selection consistency, we derive a necessary and sufficient condition formulated in terms of the degree of separation. The minimal degree of separation is necessary for any method to be selection consistent. At a level slightly higher than the minimal degree of separation, selection consistency is achieved by a constrained L0-method and its computational surrogate, the constrained truncated L1-method. This permits up to exponentially many features in the sample size; in this sense, these methods are optimal in feature selection against any selection method. In contrast, their regularization counterparts, the L0-regularization and truncated L1-regularization methods, achieve selection consistency under slightly stronger assumptions. More importantly, sharper parameter estimation/prediction is realized through such selection, leading to minimax parameter estimation. This is otherwise impossible in the absence of a good selection method in high-dimensional analysis.
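For concreteness, a minimal sketch of the two constrained formulations described above, assuming the standard linear model Y = Xβ + ε with design matrix X ∈ R^{n×p}; the sparsity budget K and truncation parameter τ are assumed tuning parameters here, and the paper's exact formulation may differ:

```latex
% Constrained L0-method: least squares under a sparsity budget K
\min_{\beta \in \mathbb{R}^p} \|Y - X\beta\|_2^2
  \quad \text{subject to} \quad
  \|\beta\|_0 = \#\{j : \beta_j \neq 0\} \le K.

% Constrained truncated L1-method (computational surrogate):
% the discontinuous L0 count is replaced by a truncated L1 term
% with truncation parameter \tau > 0.
\min_{\beta \in \mathbb{R}^p} \|Y - X\beta\|_2^2
  \quad \text{subject to} \quad
  \sum_{j=1}^{p} \min\!\Bigl(\frac{|\beta_j|}{\tau},\, 1\Bigr) \le K.
```

The truncated L1 constraint is piecewise linear and admits the difference-of-convex decomposition min(|β_j|/τ, 1) = |β_j|/τ − max(|β_j|/τ − 1, 0), which is what makes the difference convex programming listed in the keywords applicable to the surrogate problem.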
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 807-832 |
| Number of pages | 26 |
| Journal | Annals of the Institute of Statistical Mathematics |
| Volume | 65 |
| Issue number | 5 |
| DOIs | |
| State | Published - Oct 2013 |
Bibliographical note
Funding Information: Research supported in part by NSF grants DMS-0906616 and DMS-1207771, and NIH grants 1R01GM081535-01 and HL65462.
Keywords
- (p, n) versus fixed p-asymptotics
- Constrained regression
- Difference convex programming
- Nonconvex regularization
- Parametric and nonparametric models