Mathematical chemodescriptors and biodescriptors: Background and their applications in the prediction of bioactivity/toxicity of chemicals

Subhash C. Basak

Research output: Chapter in Book/Report/Conference proceeding (Chapter)



This chapter reviews research carried out by Basak and collaborators over the past four decades on novel mathematical chemodescriptors and omics-based biodescriptors, and on their application in quantitative structure-activity relationship (QSAR) and quantitative molecular similarity analysis (QMSA) studies for predicting the toxicities, bioactivities, and properties of chemicals. For chemodescriptor-based QSAR and QMSA studies, we have used graph theoretical, three-dimensional (3D), and quantum chemical indices. The graph theoretic chemodescriptors fall into two major categories: (a) numerical invariants defined on simple molecular graphs that represent only the adjacency and distance relationships of atoms and bonds, called topostructural (TS) indices; and (b) topological indices derived from weighted molecular graphs, called topochemical (TC) indices. Collectively, the TS and TC descriptors are known as topological indices (TIs). The set of independent variables used for modeling also includes a group of 3D molecular descriptors. Quantum chemical indices computed at the semiempirical level and at various ab initio levels have also been used for hierarchical QSAR (HiQSAR) modeling. Results indicate that, in many of the property/activity/toxicity cases we analyzed, a TS + TC combination explains most of the variance in the data. In QMSA, we have used arbitrary (user-defined) and tailored (property-specific) similarity spaces for analog selection and for k-nearest neighbor (KNN)-based estimation of a chemical's properties from its selected analogs. Preliminary data suggest that tailored spaces outperform arbitrary spaces, but additional research is needed to test the validity of this observation. Rapid clustering of large chemical libraries can be accomplished using calculated TIs, and this approach has promise for both drug discovery and toxicology.
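To make the idea of a topostructural index concrete, the sketch below computes the Wiener index, a familiar invariant obtained from the distance matrix of a simple (unweighted, hydrogen-suppressed) molecular graph. The abstract does not say which indices the chapter uses, so the choice of the Wiener index, the function names, and the two example graphs are illustrative assumptions rather than the authors' method.

```python
from collections import deque


def distance_matrix(adjacency):
    """Topological distance matrix of a connected graph, via breadth-first search."""
    n = len(adjacency)
    dist = [[0] * n for _ in range(n)]
    for start in range(n):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adjacency[u][v] and v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, d in seen.items():
            dist[start][v] = d
    return dist


def wiener_index(adjacency):
    """Sum of shortest-path distances over all unordered vertex pairs."""
    dist = distance_matrix(adjacency)
    n = len(dist)
    return sum(dist[i][j] for i in range(n) for j in range(i + 1, n))


# Hydrogen-suppressed carbon skeletons (illustrative inputs, not from the chapter).
n_butane = [  # C1-C2-C3-C4
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
isobutane = [  # central carbon (index 0) bonded to three methyl carbons
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]

print(wiener_index(n_butane))   # 10
print(wiener_index(isobutane))  # 9
```

The two isomers have the same atoms but different index values, which is the sense in which such an invariant encodes connectivity alone. A topochemical index would additionally weight vertices or edges by atom and bond type.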
With respect to biodescriptor development, we have mainly applied techniques from statistics, chemometrics, and discrete mathematics to calculate invariants of objects associated with proteomics maps. Invariants or vectors calculated from maps of normal animals or cells, compared with those from animals or cells treated with drugs and toxicants, show that such descriptors can discriminate between maps of control biological systems and maps of systems exposed to drugs or xenobiotics. Finally, we discuss the integrated QSAR (I-QSAR) approach, in which computed chemodescriptors and biodescriptors are used together for quantitative prediction of bioactivity.
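The analog-based KNN estimation described in the abstract can be sketched as follows: each chemical is a point in a descriptor space, the k nearest library compounds by Euclidean distance are taken as analogs, and their property values are averaged. This is a minimal sketch under stated assumptions. The function name, the unweighted mean, the two-dimensional descriptor vectors, and the property values are all hypothetical and are not taken from the chapter.

```python
import math


def knn_estimate(query, library, k=3):
    """Estimate a property as the mean over the k nearest analogs.

    query   -- descriptor vector of the chemical to be estimated
    library -- list of (descriptor_vector, property_value) pairs
    k       -- number of analogs to select (assumed choice)
    """
    ranked = sorted(library, key=lambda item: math.dist(query, item[0]))
    analogs = ranked[:k]
    return sum(value for _, value in analogs) / len(analogs)


# Made-up descriptor vectors and property values, for illustration only.
library = [
    ((0.0, 0.0), 1.0),
    ((1.0, 0.0), 2.0),
    ((0.0, 1.0), 3.0),
    ((5.0, 5.0), 10.0),
]

estimate = knn_estimate((0.1, 0.1), library, k=3)
print(estimate)
```

In this picture, an arbitrary similarity space corresponds to a user-chosen set of descriptor axes, while a tailored space would select or weight the axes for the specific property being estimated.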

Original language: English (US)
Title of host publication: Systems Biology Application in Synthetic Biology
Publisher: Springer India
Number of pages: 31
ISBN (Electronic): 9788132228097
ISBN (Print): 9788132228073
State: Published - Jan 1 2016


  • Adjacency matrix
  • Applicability domain (AD)
  • Arbitrary similarity
  • Aryl hydrocarbon (Ah) receptor
  • Atom pair (AP)
  • Atom pairs (APs)
  • Big Data
  • Biodescriptors of proteomics maps
  • 2D gel electrophoresis (2DE)
  • Blood/air partition coefficients
  • Chemical graph
  • Chemodescriptor
  • Congenericity principle
  • Cross-validation (CV)
  • Dibenzofurans
  • Distance matrix
  • Diversity begets diversity principle
  • Equivalence class
  • Euclidean distance
  • External validation
  • Graph invariant
  • HN bird flu HN pandemic bird flu
  • Interrelated two-way clustering (ITC)
  • Learning vector quantization (LVQ)
  • k-nearest neighbor (KNN)
  • Leave-one-out (LOO) cross-validation
  • k-fold cross-validation
  • Map information content (MIC)
  • Model object
  • Molecular structure
  • Pharmacokinetics (PK)
  • Physiologically based pharmacokinetic (PBPK) models
  • Principal component analysis (PCA)
  • Principal components (PCs)
  • Property-activity relationship (PAR)
  • Proteomics maps
  • Quantitative molecular similarity analysis (QMSA)
  • Quantitative structure-activity relationship (QSAR)
  • Quantum chemical descriptors
  • Rank-deficient
  • Two-deep CV
  • Naïve q
  • True q
  • Spectrum-like data
  • D/D matrix
  • Structure-activity relationship (SAR)
  • Tailored similarity
  • Theoretical model
  • Three-dimensional (3D) or geometrical descriptors
  • Topochemical (TC) indices
  • Topological indices (TIs)
  • Topostructural (TS) indices
  • Volatile organic chemicals (VOCs)

