TY - GEN
T1 - A kernel framework for protein residue annotation
AU - Rangwala, Huzefa
AU - Kauffman, Christopher
AU - Karypis, George
PY - 2009
Y1 - 2009
N2 - Over the last decade, several prediction methods have been developed for determining structural and functional properties of individual protein residues using sequence and sequence-derived information. Most of these methods are based on support vector machines, as they provide accurate and generalizable prediction models. We developed a general-purpose protein residue annotation toolkit (ProSAT) to allow biologists to formulate residue-wise prediction problems. ProSAT formulates the annotation problem as a classification or regression problem using support vector machines. For every residue, ProSAT captures local information (any sequence-derived information) around the residue to create fixed-length feature vectors. ProSAT implements accurate and fast kernel functions, and also introduces a flexible window-based encoding scheme that allows better capture of signals for certain prediction problems. In this work we evaluate the performance of ProSAT on the disorder prediction and contact order estimation problems, studying the effect of the different kernels introduced here. ProSAT shows better or at least comparable performance to state-of-the-art prediction systems. In particular, ProSAT has proven to be the best-performing transmembrane-helix predictor on an independent blind benchmark.
AB - Over the last decade, several prediction methods have been developed for determining structural and functional properties of individual protein residues using sequence and sequence-derived information. Most of these methods are based on support vector machines, as they provide accurate and generalizable prediction models. We developed a general-purpose protein residue annotation toolkit (ProSAT) to allow biologists to formulate residue-wise prediction problems. ProSAT formulates the annotation problem as a classification or regression problem using support vector machines. For every residue, ProSAT captures local information (any sequence-derived information) around the residue to create fixed-length feature vectors. ProSAT implements accurate and fast kernel functions, and also introduces a flexible window-based encoding scheme that allows better capture of signals for certain prediction problems. In this work we evaluate the performance of ProSAT on the disorder prediction and contact order estimation problems, studying the effect of the different kernels introduced here. ProSAT shows better or at least comparable performance to state-of-the-art prediction systems. In particular, ProSAT has proven to be the best-performing transmembrane-helix predictor on an independent blind benchmark.
UR - http://www.scopus.com/inward/record.url?scp=67650680261&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67650680261&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-01307-2_40
DO - 10.1007/978-3-642-01307-2_40
M3 - Conference contribution
AN - SCOPUS:67650680261
SN - 3642013066
SN - 9783642013065
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 439
EP - 451
BT - 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009
T2 - 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009
Y2 - 27 April 2009 through 30 April 2009
ER -