TY - JOUR
T1 - Prediction of Protein Relative Solvent Accessibility with Support Vector Machines and Long-Range Interaction 3D Local Descriptor
AU - Kim, Hyunsoo
AU - Park, Haesun
PY - 2004/2/15
Y1 - 2004/2/15
AB - The prediction of protein relative solvent accessibility gives us helpful information for the prediction of the tertiary structure of a protein. The SVMpsi method, which uses support vector machines (SVMs) and the position-specific scoring matrix (PSSM) generated from PSI-BLAST, has been applied to achieve better prediction accuracy of the relative solvent accessibility. We have introduced a three-dimensional local descriptor that contains information about the expected remote contacts from both the long-range interaction matrix and neighbor sequences. Moreover, we applied feature weights to the kernels in the SVMs in order to account for a degree of significance that depends on the distance from the specific amino acid. Under a two-state model, relative solvent accessibility is predicted at 78.7%, 80.7%, 82.4%, and 87.4% accuracy for the 25%, 16%, 5%, and 0% accessibility thresholds, respectively. Three-state prediction yields 64.5% accuracy with thresholds of 9% and 36%. The support vector machine approach has been successfully applied to solvent accessibility prediction by considering long-range interactions and handling unbalanced data.
KW - Directed acyclic graph scheme
KW - Long range interaction
KW - PSSM
KW - Protein structure prediction
KW - Solvent accessibility
KW - Support vector machines
UR - http://www.scopus.com/inward/record.url?scp=1042268067&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=1042268067&partnerID=8YFLogxK
U2 - 10.1002/prot.10602
DO - 10.1002/prot.10602
M3 - Article
C2 - 14748002
AN - SCOPUS:1042268067
SN - 0887-3585
VL - 54
SP - 557
EP - 562
JO - Proteins: Structure, Function and Genetics
JF - Proteins: Structure, Function and Genetics
IS - 3
ER -