Molecular similarity-based estimation of properties: A comparison of three structure spaces

Brian D. Gute, Subhash C. Basak

Research output: Contribution to journal › Article › peer-review

19 Scopus citations


Similarity, like beauty, is an intuitive concept based on personal perception and bias. In the realm of molecular similarity, each method is user-defined, based on the features deemed important. A method's efficacy depends on the set of descriptors used to define the intermolecular similarity of chemicals and on the mathematical function used to quantify that similarity. Quantitative molecular similarity analysis (QMSA) methods, based on experimental data or computed molecular descriptors, have emerged as powerful tools for analog selection and property estimation. We have carried out a comparative study of similarity spaces derived from atom pairs and a large set of topological indices for two diverse sets of chemicals: (a) a set of 469 chemicals with vapor pressure data from the TSCA inventory, and (b) a set of 213 chemicals with lipophilicity data from the STARLIST inventory. These spaces were used for K-nearest-neighbor (KNN) estimation of properties (K = 1-10, 15, 20, and 25). The results for the QMSA models developed in this paper are also compared with model estimates derived from hierarchical QSARs.
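The KNN-based estimation mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each chemical is represented by a numeric descriptor vector, that dissimilarity is Euclidean distance in that space, that the estimate is the unweighted mean of the property over the K nearest neighbors, and that estimates are produced by leaving each chemical out in turn. The function names and the toy data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_estimate(query, descriptors, values, k):
    """Estimate a property as the mean over the k nearest neighbors of query."""
    # Rank the reference chemicals by distance to the query chemical.
    ranked = sorted(range(len(descriptors)),
                    key=lambda i: euclidean(query, descriptors[i]))
    nearest = ranked[:k]
    return sum(values[i] for i in nearest) / len(nearest)


def leave_one_out(descriptors, values, k):
    """Estimate each chemical's property from the remaining chemicals."""
    estimates = []
    for i in range(len(descriptors)):
        other_desc = descriptors[:i] + descriptors[i + 1:]
        other_vals = values[:i] + values[i + 1:]
        estimates.append(knn_estimate(descriptors[i], other_desc, other_vals, k))
    return estimates


# Hypothetical toy data: five chemicals in a two-dimensional structure space.
descriptors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
values = [1.0, 2.0, 3.0, 10.0, 12.0]

for k in (1, 2, 3):
    print(k, leave_one_out(descriptors, values, k))
```

Comparing structure spaces then amounts to repeating the leave-one-out loop with a different descriptor set or distance function and comparing the resulting estimation errors across the values of K.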

Original language: English (US)
Pages (from-to): 95-109
Number of pages: 15
Journal: Journal of Molecular Graphics and Modelling
Issue number: 1
State: Published - Oct 16 2001


  • Atom pairs
  • Hierarchical QSAR
  • Molecular similarity
  • Property estimation
  • Structure space
  • Topological indices
