Proper statistical modeling and validation in QSAR: A case study in the prediction of rat fat-air partitioning

Subhash C Basak, Denise Mills, Douglas M Hawkins, Jessica J. Kraker

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

A number of multivariate regression methods commonly used to develop predictive models, along with model validation techniques, are contrary to the current opinion of experts in the field of statistics. Such methods result in overly optimistic models that cannot be relied upon to produce meaningful predictions for new compounds. Ridge regression is one appropriate methodology when the number of independent variables exceeds the number of observations. Although variable reduction is not a necessary component of a ridge regression analysis, descriptor thinning may be applied to eliminate variables that have no relationship to the property or activity of interest in an effort to increase model interpretability. It is critical, however, that this process be carried out correctly. In this paper, we have developed a predictive model for the rat fat:air partition coefficient using proper statistical techniques. For comparative purposes, we have also used stepwise ordinary least squares regression, which is commonly used in QSAR studies but often results in an inflated "naïve" q². It is important to note that all descriptors used in this analysis are computed strictly from chemical structure without the need for any additional experimental input and, therefore, can be applied to any chemical, real or hypothetical, to assess its pharmacokinetics and toxic potential.
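The contrast the abstract draws between a "naïve" and a properly validated q² can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes q² means leave-one-out cross-validated R², substitutes a simple correlation-based top-k filter for stepwise selection and descriptor thinning, and uses synthetic noise data. The names `ridge_fit`, `top_k`, and `loo_q2` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge fit on centered data; usable when descriptors outnumber compounds."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Dual form: beta = Xc^T (Xc Xc^T + lam*I)^-1 yc, so only an n x n system is solved.
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(y)), yc)
    beta = Xc.T @ alpha
    return beta, y_mean - x_mean @ beta

def top_k(X, y, k):
    """Indices of the k descriptors most correlated with y (assumed stand-in for thinning)."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) + 1e-12)
    return np.argsort(score)[-k:]

def loo_q2(X, y, k, lam, select_inside):
    """Leave-one-out q2; descriptor selection is redone per fold only if select_inside is True."""
    n = len(y)
    fixed = None if select_inside else top_k(X, y, k)  # naive: selection has seen every compound
    press = 0.0
    for i in range(n):
        train = np.arange(n) != i
        cols = top_k(X[train], y[train], k) if select_inside else fixed
        beta, b0 = ridge_fit(X[train][:, cols], y[train], lam)
        press += (y[i] - (X[i, cols] @ beta + b0)) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Synthetic data (assumption): 30 "compounds", 200 "descriptors", response is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 200))
y = rng.normal(size=30)

naive_q2 = loo_q2(X, y, k=5, lam=1.0, select_inside=False)
honest_q2 = loo_q2(X, y, k=5, lam=1.0, select_inside=True)
print(f"naive q2:  {naive_q2:.2f}")
print(f"honest q2: {honest_q2:.2f}")
```

Because the response here carries no signal, any apparent predictive ability in the naive variant comes only from selecting descriptors with the held-out compound still in view. Repeating the selection inside every fold removes that leak.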

Original language: English (US)
Title of host publication: Computation in Modern Science and Engineering - Proceedings of the International Conference on Computational Methods in Science and Engineering 2007 (ICCMSE 2007)
Pages: 548-551
Number of pages: 4
Edition: 2
DOIs: https://doi.org/10.1063/1.2836137
Publication status: Published - Dec 1 2007
Event: International Conference on Computational Methods in Science and Engineering 2007, ICCMSE 2007 - Corfu, Greece
Duration: Sep 25 2007 → Sep 30 2007

Publication series

Name: AIP Conference Proceedings
Number: 2
Volume: 963
ISSN (Print): 0094-243X
ISSN (Electronic): 1551-7616

Other

Other: International Conference on Computational Methods in Science and Engineering 2007, ICCMSE 2007
Country: Greece
City: Corfu
Period: 9/25/07 → 9/30/07

Keywords

  • Descriptor thinning
  • Gram-Schmidt
  • Mathematical descriptors
  • Overfitting
  • Ridge regression
  • Stepwise regression

Cite this

Basak, S. C., Mills, D., Hawkins, D. M., & Kraker, J. J. (2007). Proper statistical modeling and validation in QSAR: A case study in the prediction of rat fat-air partitioning. In Computation in Modern Science and Engineering - Proceedings of the International Conference on Computational Methods in Science and Engineering 2007 (ICCMSE 2007) (2 ed., pp. 548-551). (AIP Conference Proceedings; Vol. 963, No. 2). https://doi.org/10.1063/1.2836137