TY - JOUR
T1 - Consistency of cross validation for comparing regression procedures
AU - Yang, Yuhong
PY - 2007/12
Y1 - 2007/12
AB - Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finite-dimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., bandwidth for kernel smoothing). However, little is known about consistency of cross validation when applied to compare between parametric and nonparametric methods or within nonparametric methods. We show that under some conditions, with an appropriate choice of data splitting ratio, cross validation is consistent in the sense of selecting the better procedure with probability approaching 1. Our results reveal interesting behavior of cross validation. When comparing two models (procedures) converging at the same nonparametric rate, in contrast to the parametric case, it turns out that the proportion of data used for evaluation in CV does not need to be dominating in size. Furthermore, it can even be of a smaller order than the proportion for estimation while not affecting the consistency property.
KW - Consistency
KW - Cross validation
KW - Model selection
UR - http://www.scopus.com/inward/record.url?scp=39649100346&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=39649100346&partnerID=8YFLogxK
U2 - 10.1214/009053607000000514
DO - 10.1214/009053607000000514
M3 - Article
AN - SCOPUS:39649100346
SN - 0090-5364
VL - 35
SP - 2450
EP - 2473
JO - Annals of Statistics
JF - Annals of Statistics
IS - 6
ER -