This Sali lab publication from the EFI Computation Core presents improved statistical potentials with applications in protein modeling. As genomic sequencing projects uncover more and more of the protein universe, modeling has become a crucial way to analyze the millions of proteins now known. Here the Sali group devises a Bayesian framework that derives optimized statistical potentials for many modeling applications.

### ABSTRACT

Motivation: Statistical potentials have been widely used for modeling whole proteins and their parts (e.g. sidechains and loops) as well as interactions between proteins, nucleic acids and small molecules. Here, we formulate the statistical potentials entirely within a statistical framework, avoiding questionable statistical mechanical assumptions and approximations, including a definition of the reference state.

**Fig. 4:** Distance distributions P(fc|Qk) for different atom pairs are clustered into 15 groups. Each line represents the distance distribution of one pair of atom types. Each group contains 6–8401 distributions. For k-means clustering, the number of clusters was set to 20, which yielded 14 clusters with >5 distributions and 6 clusters with <5 distributions; the latter 6 clusters are pooled into a single group (bottom right panel).
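The grouping described in the Fig. 4 caption (k-means with 20 clusters, then pooling the small clusters into one group) can be sketched as below. This is a minimal illustration, not the paper's code: it assumes each distance distribution is already a normalized histogram over the same distance bins, uses plain Lloyd's k-means with a deterministic farthest-first seeding (the paper's initialization is not stated here), and the names `kmeans` and `group_clusters` are hypothetical. The `min_size=6` threshold is an assumption read from the caption's ">5 distributions".

```python
from collections import Counter


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def farthest_first_init(vectors, k):
    """Deterministic seeding (an assumption): start from the first vector,
    then repeatedly add the vector farthest from all centroids chosen so far."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        nxt = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(vectors, k, max_iter=100):
    """Lloyd's k-means on binned distributions; returns one label per vector."""
    centroids = farthest_first_init(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each distribution goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member distributions.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_clusters(labels, min_size):
    """Keep clusters with >= min_size members; pool all others into group -1."""
    counts = Counter(labels)
    groups = {}
    for idx, lab in enumerate(labels):
        key = lab if counts[lab] >= min_size else -1
        groups.setdefault(key, []).append(idx)
    return groups
```

With real data one would call `group_clusters(kmeans(histograms, k=20), min_size=6)`, where `histograms` is a hypothetical list of per-atom-pair distance histograms; the pooled group `-1` plays the role of the bottom right panel.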

**Fig. 5:** Success rates of SOAP-PP, ZRANK and FireDock on the PatchDock and ZDOCK decoy sets. (A) Success rates on the PatchDock decoy set, where a success is defined as having a structure of acceptable accuracy among the top N predictions (x-axis). (B) Success rates on the PatchDock decoy set for picking structures with medium accuracy. (C) Success rates on the ZDOCK decoy set for picking structures with acceptable accuracy. (D) Success rates on the ZDOCK decoy set for picking structures with medium accuracy.
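The top-N success rate plotted in Fig. 5 can be computed as in the sketch below: for each benchmark complex, check whether any of the N best-scored decoys reaches the required accuracy, then take the fraction of complexes that succeed. This is an illustrative sketch, not the authors' evaluation script. It assumes CAPRI-style quality labels ordered from worst to best, and that a "medium" or "high" decoy also counts as a success at the "acceptable" level; the names `LEVELS`, `success_rate` and `success_curve` are hypothetical.

```python
# Quality labels ordered from worst to best (CAPRI-style; ordering assumed).
LEVELS = {"incorrect": 0, "acceptable": 1, "medium": 2, "high": 3}


def success_rate(ranked_cases, n, min_level):
    """Fraction of cases with at least one top-n decoy at min_level or better.

    ranked_cases: one list per complex, holding the quality labels of its
    decoys sorted by score (best-scored decoy first).
    """
    threshold = LEVELS[min_level]
    hits = sum(
        any(LEVELS[q] >= threshold for q in decoys[:n]) for decoys in ranked_cases
    )
    return hits / len(ranked_cases)


def success_curve(ranked_cases, max_n, min_level):
    """Success rate for N = 1..max_n, i.e. one curve in a panel of Fig. 5."""
    return [success_rate(ranked_cases, n, min_level) for n in range(1, max_n + 1)]
```

Panels (A) and (C) would correspond to `min_level="acceptable"` and panels (B) and (D) to `min_level="medium"`, each computed once per scoring function on the same decoy set.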

**Fig. 6:** Comparison of the top ranked, best sampled and native configurations. (A) 2G77. (B) 1OC0. The receptor is shown in gray. The ligand is shown in the native configuration (yellow), the best sampled configuration (green for 2G77 and black for 1OC0) and the top ranked configuration by SOAP (green), FireDock (blue) and ZRANK (red).

**Fig. 9:** Comparison of the top ranked, best sampled and native configurations. (A) 1CYO. (B) 2AYH. The native structure is shown in light gray. The loop is shown in the native configuration (yellow), the best sampled configuration (black for 1CYO and green for 2AYH) and the top ranked configuration by SOAP (green), DOPE (blue), DFIRE (red), Rosetta (magenta) and PLOP (light blue).

Dong et al. Optimized atomic statistical potentials: assessment of protein interfaces and loops. Bioinformatics 2013 Sept 27, by permission of Oxford University Press.