Abstract
The prediction of protein secondary structure is an important step in the prediction of protein tertiary structure. A new protein secondary structure prediction method, SVMpsi, was developed to improve the current level of prediction by incorporating new tertiary classifiers and their jury decision system, and the PSI-BLAST PSSM profiles. Additionally, efficient methods to handle unbalanced data and a new optimization strategy for maximizing the Q3 measure were developed. SVMpsi produces the highest published Q3 and SOV94 scores on both the RS126 and CB513 data sets to date. For a new KP480 set, the prediction accuracy of SVMpsi was Q3 = 78.5% and SOV94 = 82.8%. Moreover, the blind test results for 136 non-redundant protein sequences which do not contain homologues of training data sets were Q3 = 77.2% and SOV94 = 81.8%. The SVMpsi results in CASP5 illustrate that it is another competitive method to predict protein secondary structure.
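For readers unfamiliar with the Q3 measure quoted above, it is the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The following is a minimal illustrative sketch, not the authors' code; the function name `q3_score` and the example strings are assumptions made for illustration only.

```python
def q3_score(observed: str, predicted: str) -> float:
    """Return the percentage of positions where the predicted
    three-state label (H, E or C) matches the observed label."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    # Count positions where the two label strings agree.
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


# Hypothetical 10-residue example: 8 of the 10 labels agree.
observed = "CHHHHEEECC"
predicted = "CHHHCEEEEC"
print(q3_score(observed, predicted))  # 80.0
```

The SOV94 score reported alongside Q3 is a segment-overlap measure and is not computed by this sketch.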
Original language | English (US) |
---|---|
Pages (from-to) | 553-560 |
Number of pages | 8 |
Journal | Protein Engineering |
Volume | 16 |
Issue number | 8 |
DOIs | |
State | Published - Aug 1 2003 |
Externally published | Yes |
Bibliographical note
Funding Information: This work was supported in part by the National Science Foundation, grants CCR-0204109 and ACI-0305543. Any opinions, findings and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation. Part of this work was carried out while the authors were visiting the Korea Institute for Advanced Study, Seoul, Korea, from January 2002 to August 2002. The authors thank the University of Minnesota Supercomputing Institute (MSI) for intensive numerical computing. They also thank Professor Thorsten Joachims for making SVMlight software available, James A. Cuff and Professor Geoffrey J. Barton for providing the data set and Professor David T. Jones for the PFILT software and his kind help.
Keywords
- Directed acyclic graph scheme
- Position-specific scoring matrix
- Protein structure prediction
- Secondary structure
- Support vector machines