We describe a large-scale application of a back-propagation neural network to the analysis, classification and prediction of protein secondary and tertiary structure from sequence information alone. A back-propagation network called BigNet has been implemented, along with a Network Description Language (NDL), on the 512 MWord Cray 2 at the Minnesota Supercomputer Center. The proof-of-concept experiments described here used a small, heterologous training set of small protein structures (15 proteins, each with fewer than 133 residues) from the Brookhaven Protein Data Bank (PDB). Simulations with one hidden layer and one-half to ten million connections execute at three to five million connection updates per second in full back-propagation learning mode and routinely converge to solutions in which input of a hydrophobicity-coded sequence yields output distance matrices with 0.3 to 1.5% RMS deviation from the actual distance matrices. Although the training set is too small to expect useful generalization, some evidence of generalization appeared in the similar learning progress of homologous pairs within the training set and in the production of novel distance matrix outputs in response to novel input sequences. The discussion addresses limitations of the current implementation, plans for software improvements, and characteristics of future training sets.
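The input coding and the error measure mentioned above can be illustrated with a minimal sketch. It is not the BigNet implementation. The abstract does not name a hydrophobicity scale, so the Kyte-Doolittle hydropathy values are assumed here. It also does not say how the percent RMS deviation is normalized, so normalization by the largest actual distance is assumed. All function names are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (ASSUMED scale; the abstract does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_hydrophobicity(sequence):
    """Map a one-letter amino acid sequence to a list of hydropathy values."""
    return [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]


def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates (e.g. C-alpha atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def rms_deviation_percent(predicted, actual):
    """RMS difference between two square matrices, as a percentage of the
    largest actual distance (ASSUMED normalization)."""
    n = len(actual)
    squared_error = sum(
        (predicted[i][j] - actual[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    rms = math.sqrt(squared_error / (n * n))
    scale = max(max(row) for row in actual)
    return 100.0 * rms / scale


# Toy example: three residues with made-up coordinates (not PDB data).
network_input = encode_hydrophobicity("AIK")
target = distance_matrix([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)])
# Stand-in for a network output: the target with a uniform offset.
output = [[d + 0.05 for d in row] for row in target]
error_percent = rms_deviation_percent(output, target)
```

In this sketch the sequence plays the role of the network input and the distance matrix that of the training target; a percent RMS deviation computed this way depends on the assumed normalization and is not directly comparable to the 0.3 to 1.5% figures reported above.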
Bibliographical note
Funding Information:
The authors gratefully acknowledge the expert technical assistance of Yiyi Xin and Tidhar Carmeli, who contributed substantially to the conduct of the experiments described here; we also thank Joseph Habermann of the Minnesota Supercomputer Institute (MSI) and Bill King of the Minnesota Supercomputer Center, Inc. for their help in development of graphic display programs. We also acknowledge the following organizations for support of this research: MSI and Cray Research Inc. provided supercomputer access to GLW, and MSI partially supports MOP and YX; the Army High Performance Computing Research Center (AHPCRC) partially supports YX, and the National Institutes of Health (NIH grant R03-RR-05294) partially supported GLW, MOP and TC.
- Conformation prediction
- Distance matrix
- Hydrophobicity coding