HAD superfamily members share a Rossmann-like core domain and sometimes contain a cap module that regulates access of substrates to the active site. Insight into the covariance of cap and core domains, and thus into the HAD structure/function relationship, is critically needed to guide target selection and assist computational docking. The HAD Bridging Project and the Superfamily/Genome and Computation Cores combined efforts to carry out a quantitative analysis of over a hundred core-domain-only and cap-domain-only structures to assess the impact of cap insertion on divergence of the core. The results detailed in this publication revealed basic protein design principles that suggest intramolecular coevolution in which the thermodynamically stable superfold diverges differentially in the context of an accessory domain. The relationships between sequence and structure for other enzyme superfamilies and families with multiple domains may be elucidated with a similar strategy, thereby helping in the development of broadly applicable functional assignment strategies.
Although the universe of protein structures is vast, these innumerable structures can be categorized into a finite number of folds. New functions commonly evolve by elaboration of existing scaffolds, for example, via domain insertions. Thus, understanding structural diversity of a protein fold evolving via domain insertions is a fundamental challenge. The haloalkanoic dehalogenase superfamily serves as an excellent model system wherein a variable cap domain accessorizes the ubiquitous Rossmann-fold core domain. Here, we determine the impact of the cap-domain insertion on the sequence and structure divergence of the core domain. Through quantitative analysis on a unique dataset of 154 core-domain-only and cap-domain-only structures, basic principles of their evolution have been uncovered. The relationship between sequence and structure divergence of the core domain is shown to be monotonic and independent of the corresponding type of domain insert, reflecting the robustness of the Rossmann fold to mutation. However, core domains with the same cap type share greater similarity at the sequence and structure levels, suggesting interplay between the cap and core domains. Notably, results reveal that the variance in structure maps to α-helices flanking the central β-sheet and not to the domain-domain interface. Collectively, these results hint at intramolecular coevolution where the fold diverges differentially in the context of an accessory domain, a feature that might also apply to other multidomain superfamilies.
Figure 1. Correlation between sequence identity and structural similarity in the HADSF core domain. Each point denotes one protein pair with percent sequence identity value plotted on the x axis and the fTM score plotted on the y axis. A shows data for the entire dataset (number of points = 11,781) whereas B shows the dataset split into core domains with the same cap type (red, number of points = 3,606) and core domains with different cap type (blue, number of points = 8,175). Inset shows Spearman’s rank correlation coefficient for the three individual sets.
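The point counts in Figure 1 are consistent with all unordered pairs of the 154 structures: 154 × 153 / 2 = 11,781 = 3,606 + 8,175. The following is a minimal Python sketch of that check and of a Spearman rank correlation computed as the Pearson correlation of tie-averaged ranks. The function names and the five identity/fTM values are hypothetical and chosen only for illustration; they are not data from the study, and this is not necessarily how the authors computed the inset values.

```python
from math import comb

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Pair count: every unordered pair among 154 structures.
n_pairs = comb(154, 2)
same_cap_pairs, different_cap_pairs = 3606, 8175  # counts quoted in the caption

# Hypothetical (percent identity, fTM score) values for five protein pairs.
identity = [12.0, 18.5, 25.0, 40.0, 62.0]
ftm = [0.55, 0.61, 0.60, 0.78, 0.90]
rho = spearman(identity, ftm)
```

Because the coefficient depends only on ranks, it measures whether the relationship is monotonic without assuming that it is linear.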
Figure 3. Structure similarity networks for the HADSF cap domain (A) and core domain (B). Each node represents a single protein structure, and an edge is drawn between two nodes if their fTM score meets the threshold of ≥0.3 (A) or ≥0.7 (B). The network shown in A does not contain C0 class members. Annotation information, including cap type (obtained from manual examination of the structures), was associated with each node. The network was visualized using Cytoscape version 2.8 (45) with the yFiles organic layout scheme.
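The thresholding rule behind such a similarity network can be sketched in Python as follows. This is an illustration only: the protein labels, the pairwise scores, and the function names are hypothetical, the edge rule is assumed to be "score ≥ threshold", and the layout and annotation steps performed in Cytoscape are not reproduced.

```python
def build_network(scores, threshold):
    """Return adjacency sets; an edge joins two nodes when score >= threshold."""
    nodes = {node for pair in scores for node in pair}
    adjacency = {node: set() for node in nodes}
    for (a, b), score in scores.items():
        if score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

def connected_components(adjacency):
    """Group nodes into clusters reachable from one another through edges."""
    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal from the start node
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(sorted(component))
    return components

# Hypothetical pairwise fTM scores for four structures labeled P1..P4.
scores = {
    ("P1", "P2"): 0.82,
    ("P1", "P3"): 0.74,
    ("P2", "P3"): 0.69,
    ("P1", "P4"): 0.41,
    ("P2", "P4"): 0.38,
    ("P3", "P4"): 0.45,
}

strict_clusters = connected_components(build_network(scores, threshold=0.7))
loose_clusters = connected_components(build_network(scores, threshold=0.3))
```

With these made-up scores, the stricter threshold isolates P4 while the looser one joins all four nodes, which illustrates why the choice of threshold determines how finely a network resolves groups.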
Figure 4. Correlation between cap domain and core domain structural similarity. Each point represents a pair of proteins with the core domain fTM score along the x axis and cap domain fTM score along the y axis. (A) All of the pairwise comparisons, with the line representing the linear least-squares fit to the data. (B) The comparisons between core domains with the same cap type and core domains with different cap type in red and blue, respectively. The continuous line represents the linear least-squares fit to the data for all comparisons with the same cap type, and the dotted line represents the fit for comparisons with different cap type.
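A straight-line fit of the kind drawn in Figure 4 can be sketched with ordinary least squares. The snippet below is a minimal illustration under stated assumptions: the function name and the four core/cap fTM values are hypothetical, and the fitting software actually used for the figure is not specified here.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) minimizing the sum of squared residuals in y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical fTM scores for four protein pairs (core on x, cap on y).
core_ftm = [0.60, 0.70, 0.80, 0.90]
cap_ftm = [0.20, 0.35, 0.45, 0.70]
slope, intercept = least_squares_line(core_ftm, cap_ftm)
```

Fitting the same-cap and different-cap subsets separately, as in panel B, amounts to calling the function once per subset and comparing the two slopes.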
Figure 5. Principal Components from Probabilistic Principal Component Analysis using SALIGN. (A) Core domain structural data projected onto Principal Component 1 (PC1) plotted against data projected onto Principal Component 2 (PC2). Core domains are colored according to corresponding cap type. (B) Plot of the cumulative variance described by the principal components (red) and random (black).
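The cumulative-variance curve in panel B can be read as a running fraction of total variance across components. The sketch below shows only that bookkeeping step; it does not perform probabilistic PCA, and the function name and the per-component variances are hypothetical values chosen for illustration.

```python
from itertools import accumulate

def cumulative_variance_fraction(component_variances):
    """Running fraction of total variance, components sorted largest first."""
    ordered = sorted(component_variances, reverse=True)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]

# Hypothetical variances captured by four principal components.
variances = [5.0, 3.0, 1.0, 1.0]
fractions = cumulative_variance_fraction(variances)
```

A curve that rises steeply over the first few components, relative to a random control, indicates that a small number of components describe most of the structural variation.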
Figure 6. Visualizing structural variation in the Rossmann fold. A depicts positional variance of Cα coordinates from multiple structure alignment mapped onto representative core domain 2HSZ, chain A, whereas B depicts B-factors for the same structure. Structures are colored as a color ramp according to corresponding values, with blue denoting the lowest value and red the highest. C shows lack of correlation (Pearson R² = 0.09) between positional variance and B-factor for each residue position for 2HSZ, chain A. The typical HAD Rossmann fold consists of the central β-sheet [strand 1 (6–9), strand 2 (133–118), strand 3 (140–142), strand 4 (171–175), and strand 5 (211–213)] and flanking α-helices [helix 1 (100–110), helix 2 (122–132), helix 3 (154–161), and helix 4 (178–187)].
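One way to compute a per-position variance of Cα coordinates, and to compare it with B-factors through a squared Pearson correlation, is sketched below. Several assumptions apply: the structures are taken to be already superposed in a common frame with one coordinate per aligned position (gaps are not handled), the variance is the mean squared distance from the mean position, and all names and coordinates are hypothetical. The exact definition used for the figure may differ.

```python
def positional_variance(structures):
    """Per-position variance of aligned C-alpha coordinates.

    structures: list of structures, each a list of (x, y, z) tuples of equal
    length, assumed to be superposed in a common reference frame.
    """
    n_structures = len(structures)
    variances = []
    for pos in range(len(structures[0])):
        coords = [structure[pos] for structure in structures]
        mean = [sum(c[axis] for c in coords) / n_structures for axis in range(3)]
        # Mean squared Euclidean distance of each structure from the mean position.
        squared = [sum((c[axis] - mean[axis]) ** 2 for axis in range(3)) for c in coords]
        variances.append(sum(squared) / n_structures)
    return variances

def pearson_r_squared(x, y):
    """Square of the Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov ** 2 / (var_x * var_y)

# Two hypothetical superposed structures with two aligned positions each.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
variance_per_position = positional_variance([structure_a, structure_b])
```

A low R² between these variances and crystallographic B-factors, as reported in panel C, suggests that the variation across the superfamily is not simply a reflection of flexibility or disorder within a single crystal structure.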
Reprinted with permission from Proc Natl Acad Sci USA. Copyright (2013) National Academy of Sciences, USA