TY - JOUR
T1 - Coarse- and fine-grained models for proteins
T2 - Evaluation by decoy discrimination
AU - Kauffman, Chris
AU - Karypis, George
PY - 2013/5
Y1 - 2013/5
N2 - Coarse-grained models for protein structure are increasingly used in simulations and structural bioinformatics. In this study, we evaluated the effectiveness of three granularities of protein representation based on their ability to discriminate between correctly folded native structures and incorrectly folded decoy structures. The three levels of representation used one bead per amino acid (coarse), two beads per amino acid (medium), and all atoms (fine). Multiple structure features were compared at each representation level including two-body interactions, three-body interactions, solvent exposure, contact numbers, and angle bending. In most cases, the all-atom level was most successful at discriminating decoys, but the two-bead level provided a good compromise between the number of model parameters which must be estimated and the accuracy achieved. The most effective feature type appeared to be two-body interactions. Considering three-body interactions increased accuracy only marginally when all atoms were used and not at all in medium and coarse representations. Though two-body interactions were most effective for the coarse representations, the accuracy loss for using only solvent exposure or contact number was proportionally less at these levels than in the all-atom representation. We propose an optimization method capable of selecting bead types of different granularities to create a mixed representation of the protein. We illustrate its behavior on decoy discrimination and discuss implications for data-driven protein model selection.
KW - Coarse-grained models
KW - Machine learning
KW - N-body interactions
KW - Protein decoy discrimination
KW - Protein model selection
UR - http://www.scopus.com/inward/record.url?scp=84875833176&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84875833176&partnerID=8YFLogxK
U2 - 10.1002/prot.24222
DO - 10.1002/prot.24222
M3 - Article
C2 - 23184763
AN - SCOPUS:84875833176
SN - 0887-3585
VL - 81
SP - 754
EP - 773
JO - Proteins: Structure, Function and Bioinformatics
JF - Proteins: Structure, Function and Bioinformatics
IS - 5
ER -