Abstract
Quantitative bioactivity and toxicity assessment of chemical compounds plays a central role in drug discovery, as it saves a substantial amount of resources. To this end, high-performance computing has enabled researchers and practitioners to leverage hundreds, or even thousands, of computed molecular descriptors for the activity prediction of candidate compounds. In this paper, we empirically evaluate the utility of two large groups of chemical descriptors for such predictive modelling, as well as for chemical structure discovery. We use a suite of commercially available and in-house software to calculate molecular descriptors for two sets of chemical mutagens: a homogeneous set of 95 amines and a diverse set of 508 chemicals. With the calculated descriptors, we model the mutagenic activity of these compounds using a number of methods from the statistics and machine-learning literature, and use robust principal component analysis to investigate the low-dimensional subspaces that characterize these chemicals. Our results suggest that combining different sets of descriptors is likely to result in a better predictive model, although this depends on the compounds being modelled and the modelling technique being used.
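To make the evaluation idea concrete, the following is a minimal sketch of two-deep (nested) cross-validation for comparing descriptor sets. It is not the authors' implementation. The ridge regression model, the penalty grid, the fold counts, the function names, and the synthetic data are all assumptions chosen for illustration. The toy activity is built from both descriptor sets, so the combined set wins here by construction. The sketch says nothing about the real amine or diverse-chemical data.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centered data; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef

def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and split the indices into k roughly equal folds."""
    return np.array_split(rng.permutation(n), k)

def two_deep_cv_q2(X, y, lambdas, outer_k=5, inner_k=5, seed=0):
    """Cross-validated q^2 from a two-deep scheme (hypothetical helper).

    The outer loop holds out compounds for error estimation. The inner loop
    tunes the ridge penalty using only the outer-training compounds, so the
    held-out compounds never influence model tuning.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = np.empty(n)
    for test_idx in kfold_indices(n, outer_k, rng):
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        X_tr, y_tr = X[train_idx], y[train_idx]
        inner_folds = kfold_indices(len(y_tr), inner_k, rng)
        inner_sse = []
        for lam in lambdas:
            sse = 0.0
            for val_idx in inner_folds:
                fit_idx = np.setdiff1d(np.arange(len(y_tr)), val_idx)
                coef, b = ridge_fit(X_tr[fit_idx], y_tr[fit_idx], lam)
                sse += np.sum((y_tr[val_idx] - (X_tr[val_idx] @ coef + b)) ** 2)
            inner_sse.append(sse)
        best_lam = lambdas[int(np.argmin(inner_sse))]
        # Refit on all outer-training compounds with the selected penalty.
        coef, b = ridge_fit(X_tr, y_tr, best_lam)
        preds[test_idx] = X[test_idx] @ coef + b
    return 1.0 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic stand-ins for two descriptor groups (NOT real descriptor data).
rng = np.random.default_rng(42)
n_compounds = 95
set_a = rng.normal(size=(n_compounds, 10))
set_b = rng.normal(size=(n_compounds, 15))
# Toy activity depends on one descriptor from each set, plus noise.
activity = set_a[:, 0] + set_b[:, 0] + 0.3 * rng.normal(size=n_compounds)

lambdas = [0.1, 1.0, 10.0, 100.0]
descriptor_sets = {"A": set_a, "B": set_b, "A+B": np.hstack([set_a, set_b])}
scores = {name: two_deep_cv_q2(X, activity, lambdas)
          for name, X in descriptor_sets.items()}
for name, q2 in scores.items():
    print(f"{name}: q2 = {q2:.3f}")
```

The design point is that the penalty is selected inside each outer training fold. Tuning on the full data set and then cross-validating would give an optimistically biased estimate of predictive performance.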
Original language | English (US) |
---|---|
Pages (from-to) | 579-590 |
Number of pages | 12 |
Journal | SAR and QSAR in Environmental Research |
Volume | 29 |
Issue number | 8 |
State | Published - Aug 3 2018 |
Bibliographical note
Funding Information: The research of SM is supported by George Michailidis.
Publisher Copyright:
© 2018 Informa UK Limited, trading as Taylor & Francis Group.
Keywords
- dimension reduction
- machine learning
- molecular descriptors
- quantitative structure–activity relationship (QSAR)
- two-deep cross-validation
- variable selection