Mathematical structural descriptors and mutagenicity assessment

a study with congeneric and diverse datasets

S. Majumdar, Subhash C Basak, C. N. Lungu, M. V. Diudea, G. D. Grunwald

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

Quantitative assessment of the bioactivity and toxicity of chemical compounds plays a central role in drug discovery because it saves substantial resources. To this end, high-performance computing has enabled researchers and practitioners to leverage hundreds, or even thousands, of computed molecular descriptors for predicting the activity of candidate compounds. In this paper, we empirically evaluate the utility of two large groups of chemical descriptors for such predictive modelling, as well as for chemical structure discovery. We use a suite of commercially available and in-house software to calculate molecular descriptors for two sets of chemical mutagens: a homogeneous set of 95 amines and a diverse set of 508 chemicals. From the calculated descriptors, we model the mutagenic activity of these compounds using a number of methods from the statistics and machine-learning literature, and use robust principal component analysis to investigate the low-dimensional subspaces that characterize these chemicals. Our results suggest that combining different sets of descriptors is likely to result in a better predictive model, although this depends on the compounds being modelled and the modelling technique being used.
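The comparison of descriptor groups by cross-validated prediction, as described in the abstract, can be illustrated with a small sketch. The following is not the authors' code or method: it assumes that "two-deep cross-validation" (listed among the keywords) refers to nested cross-validation, in which model tuning happens in an inner loop that never sees the held-out outer fold. It substitutes a simple k-nearest-neighbour regressor for the paper's models and uses synthetic data; the names `group_a`, `group_b`, `k_folds`, and `two_deep_cv` are hypothetical.

```python
import random

def k_folds(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train_X, train_y, x, k):
    """Average the responses of the k training rows closest to x."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(X, y, train_idx, test_idx, k):
    """Mean squared error on test_idx for a k-NN model fitted on train_idx."""
    train_X = [X[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    errs = [(knn_predict(train_X, train_y, X[i], k) - y[i]) ** 2 for i in test_idx]
    return sum(errs) / len(errs)

def two_deep_cv(X, y, candidates=(1, 3, 5), outer_k=5, inner_k=4, seed=0):
    """Nested CV: tune k in the inner loop, score it on the outer fold."""
    rng = random.Random(seed)
    outer = k_folds(len(y), outer_k, rng)
    fold_errors = []
    for held_out in outer:
        # Compounds available for tuning: everything outside the outer fold.
        rest = [i for fold in outer if fold is not held_out for i in fold]
        inner = k_folds(len(rest), inner_k, rng)

        def inner_score(k):
            total = 0.0
            for val_pos in inner:
                val = [rest[p] for p in val_pos]
                val_set = set(val)
                train = [i for i in rest if i not in val_set]
                total += mse(X, y, train, val, k)
            return total / inner_k

        best_k = min(candidates, key=inner_score)
        # The held-out fold is used only once, for the final error estimate.
        fold_errors.append(mse(X, y, rest, held_out, best_k))
    return sum(fold_errors) / len(fold_errors)

# Synthetic stand-in for two descriptor groups (not the paper's data).
data_rng = random.Random(1)
n = 60
group_a = [[data_rng.gauss(0, 1) for _ in range(2)] for _ in range(n)]
group_b = [[data_rng.gauss(0, 1) for _ in range(2)] for _ in range(n)]
y = [a[0] + b[0] + data_rng.gauss(0, 0.1) for a, b in zip(group_a, group_b)]
combined = [a + b for a, b in zip(group_a, group_b)]

for name, X in [("A", group_a), ("B", group_b), ("A+B", combined)]:
    print(name, round(two_deep_cv(X, y), 3))
```

Because the synthetic response depends on one descriptor from each group, the sketch only illustrates how a combined descriptor set can be compared against each group alone under the same validation scheme; it says nothing about the results reported in the paper.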

Original language: English (US)
Pages (from-to): 579-590
Number of pages: 12
Journal: SAR and QSAR in Environmental Research
Volume: 29
Issue number: 8
DOI: 10.1080/1062936X.2018.1496475
State: Published - Aug 3 2018

Fingerprint

Computing Methodologies
Drug Discovery
Principal Component Analysis
Amines
Software
Research Personnel
Chemical compounds
Bioactivity
Toxicity
Learning systems
Statistics
Datasets
Machine Learning

Keywords

  • dimension reduction
  • machine learning
  • molecular descriptors
  • quantitative structure–activity relationship (QSAR)
  • two-deep cross-validation
  • variable selection

Cite this

Mathematical structural descriptors and mutagenicity assessment: a study with congeneric and diverse datasets. / Majumdar, S.; Basak, Subhash C; Lungu, C. N.; Diudea, M. V.; Grunwald, G. D.

In: SAR and QSAR in environmental research, Vol. 29, No. 8, 03.08.2018, p. 579-590.

@article{89e0fbd9e1ea469198c7e16b7acf234e,
title = "Mathematical structural descriptors and mutagenicity assessment: a study with congeneric and diverse datasets",
keywords = "dimension reduction, machine learning, molecular descriptors, quantitative structure–activity relationship (QSAR), two-deep cross-validation, variable selection",
author = "S. Majumdar and Basak, {Subhash C} and Lungu, {C. N.} and Diudea, {M. V.} and Grunwald, {G. D.}",
year = "2018",
month = "8",
day = "3",
doi = "10.1080/1062936X.2018.1496475",
language = "English (US)",
volume = "29",
pages = "579--590",
journal = "SAR and QSAR in Environmental Research",
issn = "1062-936X",
publisher = "Taylor and Francis Ltd.",
number = "8",

}
