A major obstacle towards understanding the molecular basis of transcriptional regulation is the lack of a recognition code for protein-DNA interactions. Using high-quality crystal structures and binding data on the promiscuous family of C2H2 zinc fingers (ZF), we decode 10 fundamental specific interactions responsible for protein-DNA recognition. The interactions include five hydrogen bond types, three atomic desolvation penalties, a favorable non-polar energy, and a novel water accessibility factor. We apply this code to three large datasets containing a total of 89 C2 H2 transcription factor (TF) mutants on the three ZFs of EGR. Guided by molecular dynamics simulations of individual ZFs, we map the interactions into homology models that embody all feasible intra- and intermolecular bonds, selecting for each sequence the structure with the lowest free energy. These interactions reproduce the change in affinity of 35 mutants of finger I (R2 = 0.998), 23 mutants of finger II (R2 = 0.96) and 31 finger III human domains (R2 = 0.94). Our findings reveal recognition rules that depend on DNA sequence/structure, molecular water at the interface and induced fit of the C2H2 TFs. Collectively, our method provides the first robust framework to decode the molecular basis of TFs binding to DNA.
Bibliographical noteFunding Information:
National Science Foundation MCB-0744077. Funding for open access charge: National Science Foundation MCB-0744077.