QSARs for chemical mutagens from structure

Ridge regression fitting and diagnostics

Douglas M Hawkins, Subhash C Basak, Denise Mills

Research output: Contribution to journal, Article

19 Citations (Scopus)

Abstract

QSAR models have been developed for a diverse set of mutagens using computed molecular descriptors. Such models can be used to predict mutagenicity from structure. All common methods (regression, neural nets, k-nearest neighbors) are 'linear smoothers': weighted averages of the activities in the calibration data, with weights that depend on the descriptors. While these methods have been studied extensively, a vital but overlooked area is 'case diagnostics': pointers to compounds that are poorly fitted or unusually influential in fitting the model. This is particularly true where the measured activity is binary (present or absent). We illustrate the use of numeric and graphic diagnostics, particularly the FF plot, on a data set of 508 compounds and 307 structural descriptors used to predict mutagenicity.
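The 'linear smoother' view in the abstract can be made concrete with a small sketch. The code below is an illustration only, not the authors' implementation: it fits a ridge regression with an unpenalized intercept, forms the smoother matrix H so that the fitted activities are H @ y, and derives three common case diagnostics (residuals, leverages, and leave-one-out residuals). The function name `ridge_smoother`, the penalty value `lam=5.0`, and the synthetic data are all assumptions made for the example; the paper's data set (508 compounds, 307 descriptors) and its FF plot are not reproduced here.

```python
import numpy as np


def ridge_smoother(X, y, lam=1.0):
    """Ridge regression written as a linear smoother: fitted = H @ y.

    Hypothetical helper for illustration; the intercept is left unpenalized
    by centering the descriptor columns.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)                      # centered descriptors
    A = Xc.T @ Xc + lam * np.eye(p)              # penalized cross-product matrix
    # Smoother matrix: ridge part plus the 1/n averaging that fits the intercept.
    H = Xc @ np.linalg.solve(A, Xc.T) + np.full((n, n), 1.0 / n)
    fitted = H @ y                               # weighted averages of the activities
    resid = y - fitted                           # ordinary residuals
    leverage = np.diag(H).copy()                 # weight of each case on its own fit
    loo_resid = resid / (1.0 - leverage)         # leave-one-out residuals (fixed lam)
    return H, fitted, resid, leverage, loo_resid


# Synthetic stand-in data (assumption): 60 "compounds", 20 "descriptors",
# and a binary activity, purely to exercise the code.
rng = np.random.default_rng(0)
n, p = 60, 20
X = rng.normal(size=(n, p))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(float)

H, fitted, resid, leverage, loo_resid = ridge_smoother(X, y, lam=5.0)

# Cases worth a closer look: the worst predicted and the most self-influential.
worst_fit = np.argsort(-np.abs(loo_resid))[:5]
high_leverage = np.argsort(-leverage)[:5]
print("largest leave-one-out residuals:", worst_fit)
print("highest leverages:", high_leverage)
```

Each row of H holds the weights that one compound's fitted value places on all calibration activities, which is the sense in which the fit is a weighted average. For a fixed penalty, the leave-one-out residual follows from the ordinary residual and the leverage without refitting, so poorly fitted or influential compounds can be screened cheaply.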

Original language: English (US)
Pages (from-to): 37-44
Number of pages: 8
Journal: Environmental Toxicology and Pharmacology
Volume: 16
Issue number: 1-2
DOI: 10.1016/j.etap.2003.09.001
State: Published - Jan 1 2004

Fingerprint

Quantitative Structure-Activity Relationship
Mutagens
Calibration
Regression Analysis
Weights and Measures
Neural networks
Datasets

Keywords

  • Diagnostics
  • Influence
  • Leverage
  • Linear model
  • Mutagenicity
  • Residuals

Cite this

Hawkins, Douglas M; Basak, Subhash C; Mills, Denise. QSARs for chemical mutagens from structure: Ridge regression fitting and diagnostics. In: Environmental Toxicology and Pharmacology, Vol. 16, No. 1-2, 01.01.2004, p. 37-44.

@article{8feabca142ba4fc78fc9ac95570c51d9,
title = "QSARs for chemical mutagens from structure: Ridge regression fitting and diagnostics",
keywords = "Diagnostics, Influence, Leverage, Linear model, Mutagenicity, Residuals",
author = "Hawkins, {Douglas M} and Basak, {Subhash C} and Mills, Denise",
year = "2004",
month = "1",
day = "1",
doi = "10.1016/j.etap.2003.09.001",
language = "English (US)",
volume = "16",
pages = "37--44",
journal = "Environmental Toxicology and Pharmacology",
issn = "1382-6689",
publisher = "Elsevier",
number = "1-2",

}
