TY - JOUR
T1 - YASSPP
T2 - Better kernels and coding schemes lead to improvements in protein secondary structure prediction
AU - Karypis, George
PY - 2006/8/15
Y1 - 2006/8/15
N2 - The accurate prediction of a protein's secondary structure plays an increasingly critical role in predicting its function and tertiary structure, as it is used by many current state-of-the-art methods for remote homology detection, fold recognition, and ab initio structure prediction. We developed a new secondary structure prediction algorithm called YASSPP, which uses a pair of cascaded models constructed from two sets of binary SVM-based models. YASSPP uses an input coding scheme that combines both position-specific and nonposition-specific information, utilizes a kernel function designed to capture the sequence conservation signals around the local window of each residue, and constructs a second-level model by incorporating both the three-state predictions produced by the first-level model and information about the original sequence. Experiments on three standard datasets (RS126, CB513, and EVA common subset 4) show that YASSPP produces higher Q3 and SOV scores than those achieved by existing widely used schemes such as PSIPRED, SSPro 4.0, and SAM-T99sec, as well as by previously developed SVM-based schemes. On the EVA dataset, it achieves Q3 and SOV scores of 79.34% and 78.65%, which are considerably higher than the best reported scores of 77.64% and 76.05%, respectively.
AB - The accurate prediction of a protein's secondary structure plays an increasingly critical role in predicting its function and tertiary structure, as it is used by many current state-of-the-art methods for remote homology detection, fold recognition, and ab initio structure prediction. We developed a new secondary structure prediction algorithm called YASSPP, which uses a pair of cascaded models constructed from two sets of binary SVM-based models. YASSPP uses an input coding scheme that combines both position-specific and nonposition-specific information, utilizes a kernel function designed to capture the sequence conservation signals around the local window of each residue, and constructs a second-level model by incorporating both the three-state predictions produced by the first-level model and information about the original sequence. Experiments on three standard datasets (RS126, CB513, and EVA common subset 4) show that YASSPP produces higher Q3 and SOV scores than those achieved by existing widely used schemes such as PSIPRED, SSPro 4.0, and SAM-T99sec, as well as by previously developed SVM-based schemes. On the EVA dataset, it achieves Q3 and SOV scores of 79.34% and 78.65%, which are considerably higher than the best reported scores of 77.64% and 76.05%, respectively.
KW - Machine learning
KW - Proteins
KW - Structural bioinformatics
KW - Support vector machines
UR - http://www.scopus.com/inward/record.url?scp=33746267388&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33746267388&partnerID=8YFLogxK
U2 - 10.1002/prot.21036
DO - 10.1002/prot.21036
M3 - Article
C2 - 16763996
AN - SCOPUS:33746267388
SN - 0887-3585
VL - 64
SP - 575
EP - 586
JO - Proteins: Structure, Function and Genetics
JF - Proteins: Structure, Function and Genetics
IS - 3
ER -