Quantitative structure-activity relationship modeling of juvenile hormone mimetic compounds for Culex pipiens larvae, with a discussion of descriptor-thinning methods

Subhash C Basak, Ramanathan Natarajan, Denise Mills, Douglas M Hawkins, Jessica J. Kraker

Research output: Contribution to journalArticlepeer-review

34 Scopus citations

Abstract

Quantitative structure-activity relationship (QSAR) modelers often encounter the problem of multicollinearity owing to the availability of large numbers of computable molecular descriptors. Sparsity of the variables while using descriptors such as atom pairs increases the complexity. Three different predictor-thinning methods, namely, a modified Gram-Schmidt algorithm, a marginal soft thresholding algorithm, and LASSO (least absolute shrinkage and selection operator), were utilized to reduce the number of descriptors prior to developing linear models. Juvenile hormone (JH) activity of 304 compounds on Culex pipiens larvae was taken as the model data set, and predictor trimming of a large number of diverse descriptors comprising 268 global molecular descriptors (topostructural, topochemical, and geometrical), 13 quantum chemical descriptors, and 915 atom pairs (substructural counts) was applied prior to linear regression by the ridge regression method. The data set (N = 304) was split into five calibration data sets of random samples of sizes 60/110/160/210/260, and the remaining 244/194/144/94/44 compounds were used for validations. LASSO was not found to be a very effective method in handling a large set of descriptors because the number of predictors retained could not exceed the number of observations. The results indicated that the modified Gram-Schmidt algorithm could be used to trim the number of predictors in the global molecular descriptor set where collinearity of the descriptors was the major concern. On the contrary, the soft thresholding approach was found to be an effective tool in subset selection from a diverse set of descriptors having both sparsity and multicollinearity, as in the case of the combined set of atom pairs and global molecular descriptors. The final model developed after variable selection was dominated more by atom pairs, which indicated the important structural moieties that affect JH activity of the compounds. The success of the method reiterates the fact that QSAR or quantitative structure - property relationship (QSPR) models can be developed for a diverse set of compounds using properly parametrized and diverse sets of descriptors, of course, with the selection of the appropriate statistical tools.

Original languageEnglish (US)
Pages (from-to)65-77
Number of pages13
JournalJournal of Chemical Information and Modeling
Volume46
Issue number1
DOIs
StatePublished - Jan 1 2006

Fingerprint Dive into the research topics of 'Quantitative structure-activity relationship modeling of juvenile hormone mimetic compounds for Culex pipiens larvae, with a discussion of descriptor-thinning methods'. Together they form a unique fingerprint.

Cite this