QSAR models have been developed for a diverse set of mutagens using computed molecular descriptors. Such models can be used to predict mutagenicity from structure. All common methods (regression, neural nets, k-nearest neighbors) are 'linear smoothers': weighted averages of the activities in the calibration data, with weights that depend on the descriptors. Although these methods have been studied extensively, a vital but overlooked area is 'case diagnostics': pointers to compounds that are poorly fitted or that are unusually influential in fitting the model. This is particularly true where the measured activity is binary (present or absent). We illustrate the use of numeric and graphic diagnostics, particularly the FF plot, on a data set of 508 compounds and 307 structural descriptors used to predict mutagenicity.
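The linear-smoother idea in the abstract can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the authors' procedure. It uses synthetic data rather than the 508-compound set, and the function names (`ols_smoother`, `knn_smoother`, `case_diagnostics`) are hypothetical. It builds a smoother matrix S so that the fitted values are S @ y, then reads off two common case diagnostics: the residual (how poorly a case is fitted) and the leverage, the diagonal of S (how much a case influences its own fit).

```python
import numpy as np

def ols_smoother(X):
    """Smoother (hat) matrix for least squares with an intercept: fitted = S @ y."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ np.linalg.pinv(A)

def knn_smoother(X, k):
    """Smoother matrix for k-nearest neighbors.

    Each row puts weight 1/k on the k closest cases (the case itself included).
    """
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    S = np.zeros_like(d)
    rows = np.repeat(np.arange(len(X)), k)
    S[rows, nearest.ravel()] = 1.0 / k
    return S

def case_diagnostics(S, y):
    """Return fitted values, residuals, and leverages for a linear smoother."""
    fitted = S @ y
    residual = y - fitted
    leverage = np.diag(S)
    return fitted, residual, leverage

# Synthetic stand-in data (an assumption for illustration only):
# 30 cases, 3 descriptors, binary activity.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=30) > 0).astype(float)

S_ols = ols_smoother(X)
fitted, residual, leverage = case_diagnostics(S_ols, y)

# For least squares, the leave-one-out residual follows from the ordinary
# residual and the leverage, with no refitting.
loo_residual = residual / (1.0 - leverage)

# Cases with the largest leave-one-out residuals are candidates for inspection.
worst_cases = np.argsort(-np.abs(loo_residual))[:5]
```

Cases that combine a large residual with a high leverage are the ones a diagnostic review would typically examine first. The same `case_diagnostics` call works on `knn_smoother(X, k)`, since only the matrix S changes between methods.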
|Original language|English (US)|
|Number of pages|8|
|Journal|Environmental Toxicology and Pharmacology|
|State|Published - Mar 2004|
Bibliographical note
Funding Information:
This is contribution number 354 from the Center for Water and the Environment of the Natural Resources Research Institute. Research reported in this paper was supported in part by Grant F49620-02-1-0138-0l-0098 from the United States Air Force and Cooperative Agreement Number 572112 from the Agency for Toxic Substances and Disease Registry.
- Linear model