Meta-prediction of protein subcellular localization with reduced voting

Jie Liu, Shuli Kang, Chuanning Tang, Lynda B.M. Ellis, Tongbin Li

Research output: Contribution to journal › Article › peer-review

34 Scopus citations

Abstract

Meta-prediction seeks to harness the combined strengths of multiple predicting programs with the hope of achieving predicting performance surpassing that of all existing predictors in a defined problem domain. We investigated meta-prediction for the four-compartment eukaryotic subcellular localization problem. We compiled an unbiased subcellular localization dataset of 1693 nuclear, cytoplasmic, mitochondrial and extracellular animal proteins from Swiss-Prot 50.2. Using this dataset, we assessed the predicting performance of 12 predictors from eight independent subcellular localization predicting programs: ELSPred, LOCtree, PLOC, Proteome Analyst, PSORT, PSORT II, SubLoc and WoLF PSORT. Gorodkin correlation coefficient (GCC) was one of the performance measures. Proteome Analyst is the best individual subcellular localization predictor tested in this four-compartment prediction problem, with GCC = 0.811. A reduced voting strategy eliminating six of the 12 predictors yields a meta-predictor (RAW-RAG-6) with GCC = 0.856, substantially better than all tested individual subcellular localization predictors (P = 8.2×10⁻⁶, Fisher's Z-transformation test). The improvement in performance persists when the meta-predictor is tested with data not used in its development. This and similar voting strategies, when properly applied, are expected to produce meta-predictors with outstanding performance in other life sciences problem domains.
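As a rough illustration of the voting idea described in the abstract, the Python sketch below combines the labels returned by a chosen subset of predictors using a simple plurality vote. It is only a sketch under stated assumptions: the function name `plurality_vote`, the placeholder predictor names (`p1` to `p4`), the tie-breaking order, and the toy predictions are all hypothetical, and the abstract does not say how RAW-RAG-6 weights votes, selects its six predictors, or resolves ties.

```python
from collections import Counter

# The four compartments named in the abstract. Using this order to break
# ties is an assumption made for illustration only.
COMPARTMENTS = ["nuclear", "cytoplasmic", "mitochondrial", "extracellular"]


def plurality_vote(predictions, selected, tie_break_order=COMPARTMENTS):
    """Return the label with the most votes among the selected predictors.

    predictions: dict mapping predictor name -> predicted compartment
    selected: names of the predictors allowed to vote (the "reduced" set)
    tie_break_order: labels in order of preference when vote counts tie
    """
    # Count one vote per selected predictor that produced a prediction.
    votes = Counter(predictions[name] for name in selected if name in predictions)
    if not votes:
        return None  # no selected predictor produced a label

    top = max(votes.values())
    tied = [label for label, count in votes.items() if count == top]

    # Resolve ties with the (assumed) preference order.
    for label in tie_break_order:
        if label in tied:
            return label
    return tied[0]


# Toy example for a single protein; predictor names and labels are made up.
toy_predictions = {
    "p1": "nuclear",
    "p2": "cytoplasmic",
    "p3": "nuclear",
    "p4": "extracellular",
}

majority_label = plurality_vote(toy_predictions, ["p1", "p2", "p3"])
tied_label = plurality_vote(toy_predictions, ["p2", "p4"])
```

In this sketch, "reducing" the vote simply means passing a shorter `selected` list, so the effect of dropping a predictor can be checked by re-scoring the same dataset with different subsets.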

Original language: English (US)
Article number: e96
Journal: Nucleic Acids Research
Volume: 35
Issue number: 15
State: Published - Aug 2007
