QSARs for chemical mutagens from structure: Ridge regression fitting and diagnostics

Douglas M Hawkins, Subhash C Basak, Denise Mills

Research output: Contribution to journal › Article › peer-review

23 Scopus citations


QSAR models have been developed for a diverse set of mutagens using computed molecular descriptors. Such models can be used to predict mutagenicity from structure. All common methods (regression, neural nets, k-nearest neighbors) are 'linear smoothers': weighted averages of the activities in the calibration data, with weights that depend on the descriptors. While these methods have been studied extensively, a vital but overlooked area is 'case diagnostics': pointers to compounds that are poorly fitted or are unusually influential in fitting the model. This is particularly true where the measured activity is binary (present or absent). We illustrate the use of numeric and graphic diagnostics, particularly the FF plot, on a data set of 508 compounds and 307 structural descriptors used to predict mutagenicity.
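The idea of a linear smoother and its case diagnostics can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a plain ridge regression with a fixed penalty `lam`, no intercept, and synthetic data in place of the 508-compound set. The function names `ridge_smoother` and `case_diagnostics` are hypothetical. It shows that the fitted values are a weighted average `H @ y` of the calibration activities, that the diagonal of `H` gives each compound's leverage, and that leave-one-out residuals follow from the ordinary residuals without refitting.

```python
import numpy as np

def ridge_smoother(X, lam):
    """Return the n x n smoother matrix H of ridge regression, so that fitted = H @ y."""
    p = X.shape[1]
    # Solve (X'X + lam*I) B = X' instead of forming an explicit inverse.
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

def case_diagnostics(X, y, lam):
    """Fitted values, residuals, leverages, and leave-one-out residuals."""
    H = ridge_smoother(X, lam)
    fitted = H @ y                      # each fit is a weighted average of all activities
    resid = y - fitted                  # large values flag poorly fitted compounds
    leverage = np.diag(H)               # weight of a compound's own activity in its fit
    loo_resid = resid / (1.0 - leverage)  # exact for fixed lam and no intercept
    return fitted, resid, leverage, loo_resid

# Synthetic stand-in data (assumption): 40 "compounds", 5 descriptors, binary activity.
rng = np.random.default_rng(0)
n, p = 40, 5
X = rng.normal(size=(n, p))
y = (X @ rng.normal(size=p) + rng.normal(size=n) > 0).astype(float)

fitted, resid, leverage, loo_resid = case_diagnostics(X, y, lam=1.0)

# Compounds with the largest leave-one-out residuals deserve a closer look.
suspects = np.argsort(-np.abs(loo_resid))[:3]
print("indices of the three worst-predicted cases:", suspects)
```

Because each ridge leverage lies strictly below 1 whenever `lam` is positive, the division in `loo_resid` is always defined. With a re-estimated intercept or a penalty re-tuned per deletion, the shortcut becomes an approximation and an explicit refit would be needed.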

Original language: English (US)
Pages (from-to): 37-44
Number of pages: 8
Journal: Environmental Toxicology and Pharmacology
Issue number: 1-2
State: Published - Mar 2004

Bibliographical note

Funding Information:
This is contribution number 354 from the Center for Water and the Environment of the Natural Resources Research Institute. Research reported in this paper was supported in part by Grant F49620-02-1-0138-0l-0098 from the United States Air Force and Cooperative Agreement Number 572112 from the Agency for Toxic Substances and Disease Registry.


  • Diagnostics
  • Influence
  • Leverage
  • Linear model
  • Mutagenicity
  • Residuals

