Abstract
Central to statistical theory and application is statistical modeling, which typically involves choosing a single model or combining a number of models of different sizes and from different sources. Whereas model selection seeks a single best modeling procedure, model combination combines the strengths of different modeling procedures. In this article we look at several key issues and argue that model assessment is the key to model selection and combination. Most important, we introduce a general technique of optimal model assessment based on data perturbation, thus yielding optimal selection, in particular model selection and combination. From a frequentist perspective, we advocate model combination over a selected subset of modeling procedures, because it controls bias while reducing variability, hence yielding better performance in terms of the accuracy of estimation and prediction. To realize the potential of model combination, we develop methodologies for determining the optimal tuning parameters, such as weights and subsets for combining, via optimal model assessment. We present simulated and real data examples to illustrate the main aspects.
Original language | English (US) |
---|---|
Pages (from-to) | 554-568 |
Number of pages | 15 |
Journal | Journal of the American Statistical Association |
Volume | 101 |
Issue number | 474 |
State | Published - Jun 2006 |
Bibliographical note
Funding Information: Xiaotong Shen is Professor, School of Statistics, University of Minnesota, Minneapolis, MN 55455. Hsin-Cheng Huang is Associate Research Fellow, Institute of Statistical Science, Academia Sinica, Taipei 115, Taiwan. Shen’s research was supported in part by National Science Foundation grant IIS-0328802. The authors thank the editor, the associate editor, two anonymous referees, Yuhong Yang, and David Anderson for helpful comments and suggestions.
Keywords
- Data perturbation
- Degrees of freedom
- Dependent
- Modeling uncertainty
- Non/semiparametric
- Parametric
- Prediction