
Optimized atomic statistical potentials: assessment of protein interfaces and loops

Dong GQ, Fan H, Schneidman-Duhovny D, Webb B, Sali A. (2013) Bioinformatics 29, 3158-66. PMCID: PMC3842762

This Sali lab publication from the EFI Computation Core presents improved statistical potentials for protein modeling. As genomic sequencing projects uncover more and more of the protein universe, modeling has become a crucial way to analyze the millions of proteins now known. Here the Sali group devises a Bayesian framework that can derive optimized statistical potentials for many modeling applications.

ABSTRACT

Motivation: Statistical potentials have been widely used for modeling whole proteins and their parts (e.g. sidechains and loops) as well as interactions between proteins, nucleic acids and small molecules. Here, we formulate the statistical potentials entirely within a statistical framework, avoiding questionable statistical mechanical assumptions and approximations, including a definition of the reference state.


Fig. 1: Flowchart for optimizing statistical potentials. The corresponding sections in the text are indicated.

Fig. 2: Distance and angles between two covalent bonds, A–B and C–D. The quantities shown are the distance between atoms A and C, the angle between atoms B, A and C, the angle between atoms A, C and D, and the dihedral angle between atoms B, A, C and D. [symbol not recovered] is defined using atoms A and C.
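The four quantities named in the Fig. 2 caption can be computed directly from atomic coordinates. The following is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`distance`, `angle`, `dihedral`, `bond_pair_geometry`), the use of degrees, and the example coordinates are all hypothetical choices made for illustration.

```python
import math

def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])

def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def _norm(u):
    return math.sqrt(_dot(u, u))

def distance(p, q):
    """Euclidean distance between two points."""
    return _norm(_sub(p, q))

def angle(p, q, r):
    """Angle in degrees at vertex q formed by points p, q, r."""
    u, v = _sub(p, q), _sub(r, q)
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 axis."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    x = _dot(n1, n2)
    y = _norm(b1) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))

def bond_pair_geometry(a, b, c, d):
    """Geometry of bonds A-B and C-D as listed in the caption (names are illustrative)."""
    return {
        "distance_AC": distance(a, c),
        "angle_BAC": angle(b, a, c),
        "angle_ACD": angle(a, c, d),
        "dihedral_BACD": dihedral(b, a, c, d),
    }

# Hypothetical coordinates, chosen so every angle is 90 degrees.
geom = bond_pair_geometry(a=(0, 0, 0), b=(1, 0, 0), c=(0, 0, 3), d=(0, 1, 3))
print(geom)
```

The helpers use only the standard library so the sketch stays self-contained; an array library would be the more usual choice when processing many atom pairs.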

Fig. 3: Distance and dihedral angle joint distribution between alanine N-Cα and alanine O-C, when [condition not recovered] and [condition not recovered]. (A) Original distribution. (B) Smoothed distribution.

Fig. 4: Distance distributions P(fc|Qk) for different atom pairs are clustered into 15 different groups. Each line represents a distance distribution for a pair of atoms of particular types. Each group has 6–8401 distributions. During k-means clustering, the number of clusters was set to 20, resulting in 14 clusters with >5 distributions and 6 clusters with <5 distributions; the latter 6 clusters are grouped together (bottom right panel).

Fig. 5: Success rates of SOAP-PP, ZRANK and FireDock on the PatchDock and ZDOCK decoy sets. (A) Success rates on the PatchDock decoy set, where a success is defined as having an acceptable accuracy structure in the top N predictions (x-axis). (B) Success rates on the PatchDock decoy set for picking structures with medium accuracy. (C) Success rates on the ZDOCK decoy set for picking structures with acceptable accuracy. (D) Success rates on the ZDOCK decoy set for picking structures with medium accuracy.
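The top-N success criterion in the Fig. 5 caption can be illustrated with a short sketch. It assumes, hypothetically, that each docking case is a list of (score, is_accurate) pairs and that a lower score means a better rank; the function name `top_n_success_rate` and the example numbers are invented for illustration and are not taken from the paper.

```python
def top_n_success_rate(cases, n):
    """Fraction of cases with at least one accurate model among the n best-scored models.

    cases: list of cases; each case is a list of (score, is_accurate) pairs.
    Assumption: lower score means better rank.
    """
    if not cases:
        raise ValueError("at least one case is required")
    hits = 0
    for models in cases:
        ranked = sorted(models, key=lambda m: m[0])  # best score first
        if any(is_accurate for _, is_accurate in ranked[:n]):
            hits += 1
    return hits / len(cases)

# Three made-up cases: a top-1 hit, a top-2 hit, and a case with no accurate model.
example_cases = [
    [(-5.0, False), (-7.0, True), (-1.0, False)],
    [(-3.0, False), (-2.0, True)],
    [(-1.0, False), (-4.0, False)],
]
for n in (1, 2, 3):
    print(n, top_n_success_rate(example_cases, n))
```

Plotting this rate against N would give a curve of the kind the caption describes, one per scoring function.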

Fig. 6: Comparison of the top ranked, best sampled and native configurations. (A) 2G77. (B) 1OC0. The receptor is shown in gray. The ligand is shown in the native configuration (yellow), the best sampled configuration (green for 2G77 and black for 1OC0) and the top ranked configuration by SOAP (green), FireDock (blue) and ZRANK (red).

Fig. 7: Accuracy of SOAP-Loop. The average main-chain RMSD of top ranked structures by DOPE, DFIRE, Rosetta, PLOP and SOAP-Loop on PLOP loop modeling decoys. The average RMSD of the most accurate conformations sampled by PLOP is plotted by a dash-dotted line.

Fig. 8: Recovery functions for SOAP-PP and SOAP-Loop are compared with DOPE and DFIRE’s reference states.

Fig. 9: Comparison of the top ranked, best sampled and native configurations. (A) 1CYO. (B) 2AYH. The native structure is shown in light gray. The loop is shown in the native configuration (yellow), the best sampled configuration (black for 1CYO and green for 2AYH) and the top ranked configuration by SOAP (green), DOPE (blue), DFIRE (red), Rosetta (magenta) and PLOP (light blue).

Dong, et al. Optimized atomic statistical potentials: assessment of protein interfaces and loops, Bioinformatics 2013 Sept 27, by permission of Oxford University Press.