We introduce the notion of variable selection confidence set (VSCS) for linear regression based on F-testing. Our method identifies the most important variables in a principled way that goes beyond simply trusting the single winner based on a model selection criterion. The VSCS extends the usual notion of confidence intervals to the variable selection problem: A VSCS is a set of regression models that contains the true model with a given level of confidence. Although the size of the VSCS properly reflects the model selection uncertainty, without specific assumptions on the true model, the VSCS is typically rather large (unless the number of predictors is small). As a solution, we advocate special attention to the set of lower boundary models (LBMs), which are the most parsimonious models not statistically significantly inferior to the full model at a given confidence level. Based on the LBMs, variable importance and measures of co-appearance importance of predictors can be naturally defined.
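The construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions the abstract does not state: all subsets of predictors are enumerated exhaustively (feasible only for small p), every candidate model includes an intercept, each candidate is compared with the full model by a standard partial F-test, and an LBM is taken to be a member of the VSCS that has no proper subset in the VSCS. The function names `rss`, `vscs`, and `lower_boundary_models`, and the simulated data, are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit on an intercept plus `cols`."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def vscs(X, y, alpha=0.05):
    """Submodels not rejected by a partial F-test against the full model.

    Assumption: exhaustive search over all 2^p subsets, so p must be small.
    """
    n, p = X.shape
    full = tuple(range(p))
    rss_full = rss(X, y, full)
    df_full = n - p - 1  # residual degrees of freedom of the full model
    kept = [full]  # the full model is never inferior to itself
    for k in range(p):
        for cols in combinations(range(p), k):
            q = p - k  # number of predictors dropped from the full model
            f_stat = ((rss(X, y, cols) - rss_full) / q) / (rss_full / df_full)
            if stats.f.sf(f_stat, q, df_full) > alpha:
                kept.append(cols)
    return kept


def lower_boundary_models(models):
    """Members of the set with no proper subset that is also a member."""
    sets = [frozenset(m) for m in models]
    return [tuple(sorted(s)) for s in sets if not any(t < s for t in sets)]


# Simulated data (hypothetical): only predictors 0 and 1 affect the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(size=200)

models = vscs(X, y, alpha=0.05)
lbms = lower_boundary_models(models)
print("VSCS size:", len(models))
print("LBMs:", lbms)
```

With a strong signal such as this one, every model in the VSCS has to retain predictors 0 and 1, so the LBMs are far fewer than the VSCS members. A simple importance measure in the spirit of the abstract would be the share of LBMs in which a predictor appears, although the abstract does not give the exact definition.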
Bibliographical note
Funding Information:
We sincerely thank the two reviewers and the AE for their very helpful comments and suggestions for improving our work. In particular, the reference of Hansen, Lunde and Nason (2011) that they brought to our attention for comparison and discussion is appreciated. The work of Yuhong Yang was partially supported by the NSF Grant DMS-1106576.
- Confidence set
- Linear regression
- Model selection
- Variable selection