TY - JOUR
T1 - Mathematical structural descriptors and mutagenicity assessment
T2 - a study with congeneric and diverse datasets
AU - Majumdar, S.
AU - Basak, Subhash C
AU - Lungu, C. N.
AU - Diudea, M. V.
AU - Grunwald, G. D.
PY - 2018/8/3
Y1 - 2018/8/3
N2 - Quantitative bioactivity and toxicity assessment of chemical compounds plays a central role in drug discovery as it saves a substantial amount of resources. To this end, high-performance computing has enabled researchers and practitioners to leverage hundreds, or even thousands, of computed molecular descriptors for the activity prediction of candidate compounds. In this paper, we evaluate the utility of two large groups of chemical descriptors by such predictive modelling, as well as chemical structure discovery, through empirical analysis. We use a suite of commercially available and in-house software to calculate molecular descriptors for two sets of chemical mutagens: a homogeneous set of 95 amines and a diverse set of 508 chemicals. Using calculated descriptors, we model the mutagenic activity of these compounds using a number of methods from the statistics and machine-learning literature, and use robust principal component analysis to investigate the low-dimensional subspaces that characterize these chemicals. Our results suggest that combining different sets of descriptors is likely to result in a better predictive model, but that depends on the compounds being modelled and the modelling technique being used.
AB - Quantitative bioactivity and toxicity assessment of chemical compounds plays a central role in drug discovery as it saves a substantial amount of resources. To this end, high-performance computing has enabled researchers and practitioners to leverage hundreds, or even thousands, of computed molecular descriptors for the activity prediction of candidate compounds. In this paper, we evaluate the utility of two large groups of chemical descriptors by such predictive modelling, as well as chemical structure discovery, through empirical analysis. We use a suite of commercially available and in-house software to calculate molecular descriptors for two sets of chemical mutagens: a homogeneous set of 95 amines and a diverse set of 508 chemicals. Using calculated descriptors, we model the mutagenic activity of these compounds using a number of methods from the statistics and machine-learning literature, and use robust principal component analysis to investigate the low-dimensional subspaces that characterize these chemicals. Our results suggest that combining different sets of descriptors is likely to result in a better predictive model, but that depends on the compounds being modelled and the modelling technique being used.
KW - dimension reduction
KW - machine learning
KW - molecular descriptors
KW - quantitative structure–activity relationship (QSAR)
KW - two-deep cross-validation
KW - variable selection
UR - http://www.scopus.com/inward/record.url?scp=85050339878&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85050339878&partnerID=8YFLogxK
U2 - 10.1080/1062936X.2018.1496475
DO - 10.1080/1062936X.2018.1496475
M3 - Article
C2 - 30025481
AN - SCOPUS:85050339878
VL - 29
SP - 579
EP - 590
JO - SAR and QSAR in Environmental Research
JF - SAR and QSAR in Environmental Research
SN - 1062-936X
IS - 8
ER -