Ensemble of linear models for predicting drug properties

Tomasz Arodź, David A. Yuen, Arkadiusz Z. Dudek

Research output: Contribution to journal › Article › peer-review

28 Scopus citations

Abstract

We propose a new classification method for the prediction of drug properties, called random feature subset boosting for linear discriminant analysis (LDA). Its main novelty is that it overcomes the problems of constructing ensembles of linear discriminant models based on generalized eigenvectors of covariance matrices. Such linear models are popular for building classification-based structure-activity relationships. Ensembles of LDA models make it possible to analyze more complex problems than a single LDA model can handle, for example, problems involving multiple mechanisms of action. Using four data sets, we show experimentally that the method is competitive with other recently studied chemoinformatic methods, including support vector machines and models based on decision trees. We present a simple scheme for interpreting the model despite its apparent sophistication. Finally, we outline theoretical evidence for why this method, unlike the conventional AdaBoost ensemble algorithm, is able to increase the accuracy of LDA models.
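The general idea of boosting LDA models trained on random feature subsets can be illustrated with a minimal sketch. It assumes a two-class problem with labels -1 and +1, discrete AdaBoost reweighting, a fixed number of randomly drawn features per round, and a small ridge term added to the within-class scatter matrix. For two classes, the generalized eigenvector of S_B w = λ S_W w with nonzero eigenvalue is proportional to S_W^{-1}(μ+ − μ−), so each weak model is computed directly from that formula. All function names (`fit_weighted_lda`, `rfs_boost`, `ensemble_predict`) and parameters (`rounds`, `subset_size`, `reg`) are hypothetical. This is not the authors' published algorithm, which may differ in how features are sampled, how weights are updated, and how weak rounds are handled.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_weighted_lda(X, y, weights, feats, reg=1e-3):
    """Hypothetical two-class Fisher LDA on columns `feats` with sample weights.

    Direction w = (S_W + reg*I)^{-1} (mean_pos - mean_neg); the ridge term is an
    assumption that keeps S_W invertible. Both classes must be present."""
    d = len(feats)
    means = {}
    for label in (-1, 1):
        idx = [i for i in range(len(y)) if y[i] == label]
        tot = sum(weights[i] for i in idx)
        means[label] = [sum(weights[i] * X[i][f] for i in idx) / tot for f in feats]
    sw = [[reg if r == c else 0.0 for c in range(d)] for r in range(d)]
    for i, row in enumerate(X):
        diff = [row[f] - mu for f, mu in zip(feats, means[y[i]])]
        for r in range(d):
            for c in range(d):
                sw[r][c] += weights[i] * diff[r] * diff[c]
    delta = [mp - mn for mp, mn in zip(means[1], means[-1])]
    w = solve(sw, delta)
    # Threshold halfway between the projected class means.
    b = sum(wj * (mp + mn) / 2 for wj, mp, mn in zip(w, means[1], means[-1]))
    return feats, w, b


def lda_predict(model, x):
    feats, w, b = model
    return 1 if sum(wj * x[f] for wj, f in zip(w, feats)) >= b else -1


def rfs_boost(X, y, rounds=20, subset_size=2, seed=0):
    """Hypothetical AdaBoost loop whose weak learners are LDA models on random feature subsets."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        feats = rng.sample(range(p), subset_size)  # assumed: uniform subset per round
        model = fit_weighted_lda(X, y, weights, feats)
        preds = [lda_predict(model, x) for x in X]
        err = sum(wi for wi, pi, yi in zip(weights, preds, y) if pi != yi)
        if err >= 0.5:
            continue  # no better than chance on these weights: discard this model
        err = max(err, 1e-10)  # avoid division by zero for a perfect fit
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, model))
        # Increase the weight of misclassified compounds, then renormalize.
        weights = [wi * math.exp(-alpha * yi * pi) for wi, yi, pi in zip(weights, y, preds)]
        total = sum(weights)
        weights = [wi / total for wi in weights]
    return ensemble


def ensemble_predict(ensemble, x):
    """Sign of the alpha-weighted vote of the LDA models."""
    score = sum(alpha * lda_predict(model, x) for alpha, model in ensemble)
    return 1 if score >= 0 else -1
```

Because each round sees only a random subset of descriptors, the models in the ensemble differ from one another even when reweighting alone would change the LDA solution very little. The weighted vote of several linear models can also represent decision regions that no single linear boundary can.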

Original language: English (US)
Pages (from-to): 416-423
Number of pages: 8
Journal: Journal of Chemical Information and Modeling
Volume: 46
Issue number: 1
State: Published - Jan 1 2006
