Deterministic fallacies and model validation

Douglas M. Hawkins, Jessica Kraker

Research output: Contribution to journal, Article

29 Scopus citations

Abstract

Stochastic settings differ from deterministic ones in many subtle ways, making it easy to slip into errors by applying deterministic thinking inappropriately. We suspect this is the cause of much of the disagreement about model validation. A further technical issue is a common misapplication of cross-validation, in which it is applied only partially, leading to incorrect results. Statistical theory and empirical investigation verify the efficacy of cross-validation when it is applied correctly. In settings where data are relatively scarce, cross-validation is attractive because it makes the maximum possible use of all available information, at the cost of potentially substantial computation. The bootstrap is another method that makes full use of all available data for both model fitting and model validation, at the cost of substantially increased computation, and it shares much of the broad philosophical background of cross-validation. Increasingly, the computational cost of these methods is not a major concern. This leads to the recommendation, in most circumstances, to use cross-validation or bootstrapping rather than the earlier standard method of splitting the available data into a learning portion and a testing portion.
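The difference between full and partial cross-validation, and the bootstrap alternative, can be illustrated with a small Python sketch. This is not the authors' procedure; it rests on assumptions of our own. The "model-building procedure" is a hypothetical one that picks the single predictor column with the smallest training error and fits a straight line to it. With `full=True`, that selection step is repeated inside every fold. With `full=False`, the predictor is chosen once on all the data and only the coefficients are refit per fold, which is one way cross-validation can end up "applied only partially". For the bootstrap we use Efron's optimism-corrected variant, which also re-runs the whole procedure in each resample. All function names (`select_and_fit`, `cv_mse`, `bootstrap_mse`, and so on) are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # constant column: fall back to the mean
    return my - b * mx, b


def mse(rows, y, j, a, b):
    """Mean squared error of the line a + b*x_j on the given observations."""
    return sum((yi - (a + b * r[j])) ** 2 for r, yi in zip(rows, y)) / len(y)


def select_and_fit(rows, y):
    """Hypothetical model-building procedure: keep the single predictor column
    with the smallest training MSE and return (column, intercept, slope)."""
    best = None
    for j in range(len(rows[0])):
        a, b = fit_line([r[j] for r in rows], y)
        err = mse(rows, y, j, a, b)
        if best is None or err < best[0]:
            best = (err, j, a, b)
    return best[1:]


def kfold_indices(n, k, seed=0):
    """Randomly partition 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_mse(rows, y, k=5, full=True, seed=0):
    """k-fold cross-validated prediction MSE.

    full=True repeats predictor selection inside every fold, so the whole
    procedure is validated. full=False selects the predictor once on all the
    data and only refits the coefficients per fold (partial cross-validation).
    """
    n = len(y)
    fixed_j = None if full else select_and_fit(rows, y)[0]
    total = 0.0
    for test in kfold_indices(n, k, seed):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        tr_rows, tr_y = [rows[i] for i in train], [y[i] for i in train]
        if full:
            j, a, b = select_and_fit(tr_rows, tr_y)
        else:
            j = fixed_j
            a, b = fit_line([r[j] for r in tr_rows], tr_y)
        total += sum((y[i] - (a + b * rows[i][j])) ** 2 for i in test)
    return total / n


def bootstrap_mse(rows, y, n_boot=200, seed=0):
    """Optimism-corrected bootstrap estimate of prediction MSE (Efron),
    re-running the full select-and-fit procedure in every resample."""
    rng = random.Random(seed)
    n = len(y)
    j, a, b = select_and_fit(rows, y)
    apparent = mse(rows, y, j, a, b)
    optimism = 0.0
    for _ in range(n_boot):
        draw = [rng.randrange(n) for _ in range(n)]
        b_rows, b_y = [rows[i] for i in draw], [y[i] for i in draw]
        bj, ba, bb = select_and_fit(b_rows, b_y)
        # Error on the original data minus error on the resample it was fit to.
        optimism += mse(rows, y, bj, ba, bb) - mse(b_rows, b_y, bj, ba, bb)
    return apparent + optimism / n_boot


if __name__ == "__main__":
    rng = random.Random(1)
    n, p = 40, 30
    rows = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]  # pure noise: no predictor is useful
    print("partial CV MSE:", cv_mse(rows, y, full=False))
    print("full CV MSE:   ", cv_mse(rows, y, full=True))
    print("bootstrap MSE: ", bootstrap_mse(rows, y))
```

On pure-noise data like the demo, the partial estimate will usually come out lower (more optimistic) than the full one, because the selection step has already seen the held-out observations. The exact numbers depend on the seed. Both the full cross-validation and the bootstrap use every observation for fitting and for validation, in keeping with the point made in the abstract.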

Original language: English (US)
Pages (from-to): 188-193
Number of pages: 6
Journal: Journal of Chemometrics
Volume: 24
Issue number: 3-4
State: Published - Mar 1 2010


Keywords

  • Bootstrap
  • Cross-validation
  • Diagnostics
  • Model validation
  • Stochastic
