TY - JOUR
T1 - Support vector machines in HTS data mining
T2 - Type I MetAPs inhibition study
AU - Fang, Jianwen
AU - Dong, Yinghua
AU - Lushington, Gerald H.
AU - Ye, Qi Zhuang
AU - Georg, Gunda I.
PY - 2006/3
Y1 - 2006/3
N2 - This article reports a successful application of support vector machines (SVMs) to mining high-throughput screening (HTS) data from a study of type I methionine aminopeptidase (MetAP) inhibition. A library of 43,736 small organic molecules was used in the study, and the 1,355 compounds in the library with 40% or higher inhibition activity were considered active. The data set was randomly split into a training set and a test set (3:1 ratio). The authors were able to rank compounds in the test set using the decision values predicted by SVM models built on the training set. They defined a novel score, PT50, the percentage of the test set that must be screened to recover 50% of the actives, to measure the performance of the models. With carefully selected parameters, SVM models increased the hit rates significantly, and 50% of the active compounds could be recovered by screening just 7% of the test set. The authors found that the size of the training set played a significant role in the performance of the models. A training set of 10,000 compounds is likely the minimum size required to build a model with reasonable predictive power.
AB - This article reports a successful application of support vector machines (SVMs) to mining high-throughput screening (HTS) data from a study of type I methionine aminopeptidase (MetAP) inhibition. A library of 43,736 small organic molecules was used in the study, and the 1,355 compounds in the library with 40% or higher inhibition activity were considered active. The data set was randomly split into a training set and a test set (3:1 ratio). The authors were able to rank compounds in the test set using the decision values predicted by SVM models built on the training set. They defined a novel score, PT50, the percentage of the test set that must be screened to recover 50% of the actives, to measure the performance of the models. With carefully selected parameters, SVM models increased the hit rates significantly, and 50% of the active compounds could be recovered by screening just 7% of the test set. The authors found that the size of the training set played a significant role in the performance of the models. A training set of 10,000 compounds is likely the minimum size required to build a model with reasonable predictive power.
KW - High-throughput screening
KW - Machine learning
KW - MetAP
KW - Support vector machines
UR - http://www.scopus.com/inward/record.url?scp=33644934561&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33644934561&partnerID=8YFLogxK
U2 - 10.1177/1087057105284334
DO - 10.1177/1087057105284334
M3 - Article
C2 - 16418315
AN - SCOPUS:33644934561
SN - 1087-0571
VL - 11
SP - 138
EP - 144
JO - Journal of Biomolecular Screening
JF - Journal of Biomolecular Screening
IS - 2
ER -