The Problem of Overfitting

Douglas M. Hawkins

Research output: Contribution to journal, Review article

908 Scopus citations

Abstract

The problem of overfitting in model fitting for quantitative measurements is discussed. Two types of overfitting can be distinguished: using a model that is more flexible than it needs to be, and using a model that includes irrelevant components or predictors. Adding predictors that perform no useful function means that, in any future use of the regression to make predictions, those predictors must still be measured and recorded so that their values can be substituted into the model. Adding irrelevant predictors can also make predictions worse, because the coefficients fitted to them add random variation to subsequent predictions.

Original language: English (US)
Pages (from-to): 1-12
Number of pages: 12
Journal: Journal of Chemical Information and Computer Sciences
Volume: 44
Issue number: 1
Publication status: Published - Jan 1 2004
