Abstract
We address the consistency property of cross validation (CV) for classification. Sufficient conditions are obtained on the data splitting ratio to ensure that the better classifier between two candidates will be favored by CV with probability approaching 1. Interestingly, it turns out that for comparing two general learning methods, the ratio of the training sample size to the evaluation sample size does not have to approach 0 for consistency in selection, as is required for comparing parametric regression models (Shao (1993)). In fact, the ratio may be allowed to converge to infinity or any positive constant, depending on the situation. In addition, we discuss confidence intervals and sequential instability in selection for comparing classifiers.
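The role of the splitting ratio can be illustrated with a repeated (Monte Carlo) cross-validation comparison of two candidate classifiers. The sketch below, in Python with scikit-learn, uses a synthetic data set and arbitrary candidate classifiers and training fraction chosen purely for illustration; it is not the paper's exact procedure, only a minimal example of selecting the classifier with the smaller held-out error rate under a given training/evaluation split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

def cv_compare(X, y, train_frac=0.3, n_splits=100, seed=0):
    """Repeated (Monte Carlo) CV: train both candidate classifiers on a
    random subset containing a fraction train_frac of the data and compare
    their error rates on the held-out evaluation part.  A smaller
    train_frac means a relatively larger evaluation set."""
    rng = np.random.RandomState(seed)
    wins_a = 0
    for _ in range(n_splits):
        X_tr, X_ev, y_tr, y_ev = train_test_split(
            X, y, train_size=train_frac,
            random_state=rng.randint(2**31 - 1))
        err_a = np.mean(LogisticRegression(max_iter=1000)
                        .fit(X_tr, y_tr).predict(X_ev) != y_ev)
        err_b = np.mean(KNeighborsClassifier(n_neighbors=5)
                        .fit(X_tr, y_tr).predict(X_ev) != y_ev)
        wins_a += err_a < err_b
    # fraction of splits on which classifier A had the smaller error
    return wins_a / n_splits

# Toy usage on synthetic data (illustrative only).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
print(cv_compare(X, y, train_frac=0.3))
```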
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 635-657 |
| Number of pages | 23 |
| Journal | Statistica Sinica |
| Volume | 16 |
| Issue number | 2 |
| State | Published - Apr 2006 |
Keywords
- Classification
- Comparing learning methods
- Consistency in selection
- Cross validation paradox
- Sequential instability