Support vector machines in HTS data mining: Type I MetAPs inhibition study

Jianwen Fang, Yinghua Dong, Gerald H. Lushington, Qi Zhuang Ye, Gunda I. Georg

Research output: Contribution to journal › Article › peer-review

14 Scopus citations


This article reports a successful application of support vector machines (SVMs) to mining high-throughput screening (HTS) data from a type I methionine aminopeptidase (MetAP) inhibition study. The study used a library of 43,736 small organic molecules, of which the 1,355 compounds showing 40% or higher inhibition were considered active. The data set was randomly split into a training set and a test set (3:1 ratio). SVM models built on the training set were used to rank the compounds in the test set by their predicted decision values. To measure model performance, the authors defined a novel score, PT50: the percentage of the test set that must be screened to recover 50% of the actives. With carefully selected parameters, SVM models increased the hit rates significantly, and 50% of the active compounds could be recovered by screening just 7% of the test set. The authors found that the size of the training set played a significant role in model performance. A training set of 10,000 compounds is likely the minimum size required to build a model with reasonable predictive power.
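The PT50 score described in the abstract can be illustrated with a short sketch. This is not the authors' code: the function name `pt_score`, its parameters, and the toy data are hypothetical. It assumes that higher decision values mean "more likely active" and that the number of actives to recover is rounded up.

```python
import math


def pt_score(decision_values, is_active, recover_fraction=0.5):
    """Percentage of the ranked set that must be screened to recover
    `recover_fraction` of the actives (PT50 when the fraction is 0.5).

    Assumption: compounds are screened in order of decreasing decision value.
    """
    if len(decision_values) != len(is_active):
        raise ValueError("decision_values and is_active must have equal length")
    if not 0 < recover_fraction <= 1:
        raise ValueError("recover_fraction must be in (0, 1]")

    n_actives = sum(1 for flag in is_active if flag)
    if n_actives == 0:
        raise ValueError("the set contains no active compounds")

    # Number of actives that must be found (rounded up, an assumption).
    target = math.ceil(recover_fraction * n_actives)

    # Indices of compounds sorted from highest to lowest decision value.
    order = sorted(range(len(decision_values)),
                   key=lambda i: decision_values[i],
                   reverse=True)

    found = 0
    for screened, idx in enumerate(order, start=1):
        if is_active[idx]:
            found += 1
            if found >= target:
                return 100.0 * screened / len(order)


# Toy data (invented for illustration): 10 compounds, 4 of them active.
scores = [0.9, 0.8, 0.1, 0.7, -0.2, -0.5, 0.3, -1.0, -0.3, -0.9]
labels = [1, 0, 0, 1, 0, 0, 1, 0, 1, 0]
toy_pt50 = pt_score(scores, labels)
```

In the toy example, two of the four actives are found after screening the three top-ranked compounds, so `toy_pt50` is 30.0. For comparison, a random ordering would be expected to need about 50% of the set to recover half of the actives, so the reported value of 7% corresponds to roughly a sevenfold enrichment.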

Original language: English (US)
Pages (from-to): 138-144
Number of pages: 7
Journal: Journal of Biomolecular Screening
Issue number: 2
State: Published - Mar 2006


  • High-throughput screening
  • Machine learning
  • MetAP
  • Support vector machines


