TY - GEN
T1 - Partial profile alignment kernels for protein classification
AU - Ngo, Thanh
AU - Kuang, Rui
PY - 2009
Y1 - 2009
N2 - Remote homology detection and fold recognition are the central problems in protein classification. In real applications, kernel algorithms that are both accurate and efficient are required for classification of large databases. We explore a class of partial profile alignment kernels to be used with support vector machines (SVMs) for remote homology detection and fold recognition. While existing profile-based kernels use the whole profiles to determine the similarity between pairs of proteins, the partial profile alignment kernels are derived from part of the position-specific scoring matrices (PSSMs) in the profiles for alignment. Specifically, at each position in the PSSM, only amino acids in the mutation neighborhood of the corresponding amino acid in the original protein sequence are considered for alignment to remove noise and improve computing efficiency. Our experiments on SCOP benchmark datasets show that the partial profile alignment kernels achieved overall better classification results for both fold recognition and remote homology detection than profile kernels and profile-alignment kernels. In addition, our algorithm using only a fraction of the profiles saves the cost of computing the kernels significantly, compared to the full-profile alignment methods.
AB - Remote homology detection and fold recognition are the central problems in protein classification. In real applications, kernel algorithms that are both accurate and efficient are required for classification of large databases. We explore a class of partial profile alignment kernels to be used with support vector machines (SVMs) for remote homology detection and fold recognition. While existing profile-based kernels use the whole profiles to determine the similarity between pairs of proteins, the partial profile alignment kernels are derived from part of the position-specific scoring matrices (PSSMs) in the profiles for alignment. Specifically, at each position in the PSSM, only amino acids in the mutation neighborhood of the corresponding amino acid in the original protein sequence are considered for alignment to remove noise and improve computing efficiency. Our experiments on SCOP benchmark datasets show that the partial profile alignment kernels achieved overall better classification results for both fold recognition and remote homology detection than profile kernels and profile-alignment kernels. In addition, our algorithm using only a fraction of the profiles saves the cost of computing the kernels significantly, compared to the full-profile alignment methods.
UR - http://www.scopus.com/inward/record.url?scp=70349495509&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70349495509&partnerID=8YFLogxK
U2 - 10.1109/GENSIPS.2009.5174328
DO - 10.1109/GENSIPS.2009.5174328
M3 - Conference contribution
AN - SCOPUS:70349495509
SN - 9781424447619
T3 - 2009 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2009
BT - 2009 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2009
T2 - 2009 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2009
Y2 - 17 May 2009 through 21 May 2009
ER -