Frequent substring-based sequence classification with an ensemble of support vector machines trained using reduced amino acid alphabets

Charith D. Chitraranjan, Loai Alnemer, Omar Al-Azzam, Saeed Salem, Anne M. Denton, Muhammad J. Iqbal, Shahryar F. Kianian

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Scopus citation

Abstract

We propose a frequent pattern-based algorithm for predicting the functions and localizations of proteins from their primary structure (amino acid sequence). We use reduced alphabets that capture the higher rate of substitution between amino acids that are physicochemically similar. Frequent substrings are mined from the training sequences, transformed into different alphabets, and used as features to train an ensemble of SVMs. We evaluate the performance of our algorithm using protein sub-cellular localization and protein function datasets. Pair-wise sequence-alignment-based nearest neighbor and basic SVM k-gram classifiers are included as comparison algorithms. Results show that the frequent substring-based SVM classifier outperforms the other classifiers on the sub-cellular localization datasets and performs competitively with the nearest neighbor classifier on the protein function datasets. Our results also show that the use of reduced alphabets provides statistically significant performance improvements for half of the classes studied.
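The feature-construction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the four-group alphabet, the support threshold, the substring length bounds, and the function names (`reduce_sequence`, `frequent_substrings`, `featurize`) are all assumptions made for illustration, and the example mines substrings after reduction for brevity. SVM training is omitted; in an ensemble setting, one such feature matrix per alphabet would presumably be passed to its own SVM.

```python
from collections import Counter

# Hypothetical 4-group reduced alphabet (illustrative grouping only,
# not one of the alphabets used in the paper).
REDUCED_GROUPS = {
    "h": "AVLIMC",  # hydrophobic
    "a": "FWYH",    # aromatic
    "p": "STNQGP",  # polar or small
    "c": "DEKR",    # charged
}
AA_TO_GROUP = {aa: g for g, members in REDUCED_GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Rewrite a sequence in the reduced alphabet; unknown residues become 'x'."""
    return "".join(AA_TO_GROUP.get(aa, "x") for aa in seq.upper())


def frequent_substrings(seqs, min_support, min_len=2, max_len=4):
    """Return, sorted, the substrings that occur in at least min_support sequences."""
    support = Counter()
    for seq in seqs:
        seen = set()  # count each substring at most once per sequence
        for k in range(min_len, max_len + 1):
            for i in range(len(seq) - k + 1):
                seen.add(seq[i:i + k])
        support.update(seen)
    return sorted(s for s, n in support.items() if n >= min_support)


def featurize(seq, patterns):
    """Binary feature vector: 1 if the pattern occurs in the sequence, else 0."""
    return [1 if p in seq else 0 for p in patterns]


# Toy training sequences (made up for this example).
train = ["MKVLAT", "MKILSS", "GDEFWK"]
reduced = [reduce_sequence(s) for s in train]
patterns = frequent_substrings(reduced, min_support=2)
features = [featurize(s, patterns) for s in reduced]
```

In this toy run the first two sequences map to similar reduced strings and share every frequent pattern, while the third shares none. This shows how substitutions within a group (V for I, for example) stop fragmenting the pattern counts once the alphabet is reduced.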

Original language: English (US)
Title of host publication: Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011
Pages: 180-185
Number of pages: 6
State: Published - Dec 1 2011
Event: 10th International Conference on Machine Learning and Applications, ICMLA 2011 - Honolulu, HI, United States
Duration: Dec 18 2011 to Dec 21 2011

Publication series

Name: Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011
Volume: 2

Other

Other: 10th International Conference on Machine Learning and Applications, ICMLA 2011
Country: United States
City: Honolulu, HI
Period: 12/18/11 to 12/21/11
