Confidence sets for model selection by F-testing

Davide Ferrari, Yuhong Yang

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

We introduce the notion of variable selection confidence set (VSCS) for linear regression based on F-testing. Our method identifies the most important variables in a principled way that goes beyond simply trusting the single winner based on a model selection criterion. The VSCS extends the usual notion of confidence intervals to the variable selection problem: A VSCS is a set of regression models that contains the true model with a given level of confidence. Although the size of the VSCS properly reflects the model selection uncertainty, without specific assumptions on the true model, the VSCS is typically rather large (unless the number of predictors is small). As a solution, we advocate special attention to the set of lower boundary models (LBMs), which are the most parsimonious models not statistically significantly inferior to the full model at a given confidence level. Based on the LBMs, variable importance and measures of co-appearance importance of predictors can be naturally defined.
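The construction described in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not spell out: the VSCS is taken to be every submodel that a standard partial F-test against the full model fails to reject at level alpha; an LBM is taken to be a VSCS member with no proper submodel in the VSCS; and importance is taken to be the fraction of LBMs containing a predictor. The function names (rss, vscs, lower_boundary_models, inclusion_importance) and the simulated data are hypothetical, and the exhaustive search is only practical for a small number of predictors.

```python
from itertools import combinations

import numpy as np
from scipy.stats import f as f_dist


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit on an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def vscs(X, y, alpha=0.05):
    """All column subsets not rejected by a partial F-test against the full model.

    Assumption: this is the plain textbook F-test; the paper may differ in details.
    """
    n, p = X.shape
    full = tuple(range(p))
    rss_full = rss(X, y, full)
    df_resid = n - p - 1  # residual degrees of freedom of the full model
    kept = [full]  # the full model is never rejected against itself
    for k in range(p):
        for cols in combinations(range(p), k):
            df_num = p - k  # number of coefficients set to zero
            f_stat = ((rss(X, y, cols) - rss_full) / df_num) / (rss_full / df_resid)
            if f_dist.sf(f_stat, df_num, df_resid) > alpha:
                kept.append(cols)
    return kept


def lower_boundary_models(models):
    """Members of the set that have no proper submodel in the set."""
    sets = [frozenset(m) for m in models]
    return [s for s in sets if not any(t < s for t in sets)]


def inclusion_importance(lbms, p):
    """Fraction of lower boundary models that contain each predictor (assumed measure)."""
    return [sum(j in m for m in lbms) / len(lbms) for j in range(p)]


# Simulated data (hypothetical): only predictors 0 and 1 enter the true model.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(size=n)

models = vscs(X, y, alpha=0.05)
lbms = lower_boundary_models(models)
importance = inclusion_importance(lbms, p)

print("VSCS size:", len(models), "of", 2 ** p, "candidate models")
print("LBMs:", [sorted(m) for m in lbms])
print("Importance:", importance)
```

In this simulated setting every model that omits predictor 0 or 1 is rejected, so both appear in every LBM, while the noise predictors appear only if chance pushes them into a boundary model.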

Original language: English (US)
Pages (from-to): 1637-1658
Number of pages: 22
Journal: Statistica Sinica
Volume: 25
Issue number: 4
State: Published - Oct 2015

Bibliographical note

Funding Information:
We sincerely thank the two reviewers and the AE for their very helpful comments and suggestions for improving our work. In particular, the reference of Hansen, Lunde and Nason (2011) that they brought to our attention for comparison and discussion is appreciated. The work of Yuhong Yang was partially supported by the NSF Grant DMS-1106576.

Keywords

  • Confidence set
  • Linear regression
  • Model selection
  • Variable selection
