Data mining algorithms for virtual screening of bioactive compounds

Mukund Deshpande, Michihiro Kuramochi, George Karypis

Research output: Chapter in Book/Report/Conference proceeding › Chapter

3 Scopus citations


In this chapter we study the problem of classifying chemical compound datasets. We present a sub-structure-based classification algorithm that decouples the sub-structure discovery process from the construction of the classification model and uses frequent subgraph discovery algorithms to find all topological and geometric sub-structures present in the dataset. The advantage of this approach is that all relevant sub-structures are available during model construction, which allows the classifier to intelligently select the most discriminating ones. Computational scalability is ensured by highly efficient frequent subgraph discovery algorithms coupled with aggressive feature selection. Experimental evaluation on eight different classification problems shows that our approach is computationally scalable and, on average, outperforms existing schemes by 10% to 35%.
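The abstract describes the pipeline only at a high level. The Python sketch below illustrates its decoupled structure (mine sub-structures first, then build per-compound feature vectors, select discriminating features, and train a classifier) under heavy simplifying assumptions. All function names are hypothetical. A compound is assumed to be an `(atom_labels, bonds)` pair with bond types ignored; fragment enumeration covers only labeled bonds and two-bond paths, not the full topological and geometric sub-structures a frequent subgraph discovery algorithm would find; ranking features by |P(f | active) − P(f | inactive)| is an assumed stand-in for the chapter's feature selection; and a plain perceptron stands in for the SVM listed among the keywords, to keep the sketch dependency-free.

```python
from collections import Counter
from itertools import combinations

# Hypothetical compound format: (atom_labels, bonds), where bonds are (i, j)
# index pairs. Bond types and 3D geometry are ignored in this sketch.

def substructures(atoms, bonds):
    """Tiny stand-in for subgraph enumeration: labeled edges and 2-edge paths."""
    adj = {i: set() for i in range(len(atoms))}
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    feats = set()
    for i, j in bonds:
        feats.add(tuple(sorted((atoms[i], atoms[j]))))  # canonical labeled edge
    for center, nbrs in adj.items():
        for u, v in combinations(sorted(nbrs), 2):
            a, b = sorted((atoms[u], atoms[v]))
            feats.add((a, atoms[center], b))  # path u-center-v, canonical endpoints
    return feats

def mine_frequent(dataset, min_support):
    """Discovery step, run once and independently of any classifier."""
    counts = Counter()
    for atoms, bonds in dataset:
        counts.update(substructures(atoms, bonds))
    n = len(dataset)
    return sorted(f for f, c in counts.items() if c / n >= min_support)

def to_vectors(dataset, fragments):
    """Binary vector per compound: 1 if it contains the fragment, else 0."""
    vecs = []
    for atoms, bonds in dataset:
        present = substructures(atoms, bonds)
        vecs.append([1 if f in present else 0 for f in fragments])
    return vecs

def select_features(vecs, labels, k):
    """Assumed criterion: keep the k fragments whose frequency differs most
    between actives (+1) and inactives (-1). Both classes must be present."""
    pos = [v for v, y in zip(vecs, labels) if y == 1]
    neg = [v for v, y in zip(vecs, labels) if y == -1]
    def score(i):
        return abs(sum(v[i] for v in pos) / len(pos) - sum(v[i] for v in neg) / len(neg))
    return sorted(range(len(vecs[0])), key=lambda i: (-score(i), i))[:k]

def train_perceptron(vecs, labels, epochs=20):
    """Linear classifier w.x + b; a simple stand-in for the SVM."""
    w, b = [0.0] * len(vecs[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(vecs, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            break
    return w, b

def fit(dataset, labels, min_support=0.5, k=2):
    fragments = mine_frequent(dataset, min_support)
    vecs = to_vectors(dataset, fragments)
    keep = select_features(vecs, labels, k)
    chosen = [fragments[i] for i in keep]
    model = train_perceptron([[v[i] for i in keep] for v in vecs], labels)
    return chosen, model

def classify(chosen, model, compound):
    w, b = model
    x = to_vectors([compound], chosen)[0]
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Made-up toy data for illustration only: actives contain an N-C-O fragment.
TRAIN = [
    (["C", "N", "O"], [(0, 1), (0, 2)]),
    (["C", "N", "O", "C"], [(0, 1), (0, 2), (0, 3)]),
    (["C", "C", "O"], [(0, 1), (1, 2)]),
    (["C", "C", "C"], [(0, 1), (1, 2)]),
]
LABELS = [1, 1, -1, -1]

if __name__ == "__main__":
    chosen, model = fit(TRAIN, LABELS)
    print(chosen)  # [('C', 'N'), ('N', 'C', 'O')]
    print(classify(chosen, model, (["N", "C", "O"], [(1, 0), (1, 2)])))  # 1
```

Because `mine_frequent` runs before and independently of `select_features` and the classifier, the full candidate set is available when features are chosen. That is the decoupling the abstract highlights; swapping in a real subgraph miner or an SVM would change only the corresponding function.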

Original language: English (US)
Title of host publication: Springer Optimization and Its Applications
Publisher: Springer International Publishing
Number of pages: 28
State: Published - 2007

Publication series

Name: Springer Optimization and Its Applications
ISSN (Print): 1931-6828
ISSN (Electronic): 1931-6836

Bibliographical note

Publisher Copyright:
© Springer Science+Business Media, LLC.


Keywords

  • Chemical Compounds
  • Classification
  • Graphs
  • SVM
  • Virtual Screening

