Understanding Domain Complexity

A common mechanism by which proteins acquire new functions is the addition of domains, either by fusion to one of the termini (contiguous) or by insertion into the fold (non-contiguous). Intrinsically stable folds, known as superfolds, are more amenable to such alterations. Consequently, superfolds tend to be widespread by virtue of their greater functional and structural plasticity. One particularly prevalent superfold is the Rossmann fold, which is a structural component of over ten different superfamilies, including the Haloacid Dehalogenase (HAD) superfamily under study by the Allen and Dunaway-Mariano labs.

In the HAD superfamily, the Rossmann superfold forms a basic “core” domain. In some cases the core domain has remained unmodified, but in most cases a variable “cap” domain has been inserted into the Rossmann scaffold. Several different types of insertions have occurred, ranging from a short stretch of amino acids to caps that comprise multiple secondary structure elements. Moreover, the insertions can occur at multiple sites within the fold, so that versions have been found with multiple caps accessorizing a single Rossmann core domain. This variety of domain organizations highlights the plasticity of the Rossmann superfold and the variability of the HAD superfamily itself. While the complexity of domain insertion dramatically increases the diversity of the protein universe, it is problematic for high-throughput bioinformatic and computational methods of functional annotation, because the resulting variability in the length, location, and degree of divergence of the inserts complicates sequence and structure analyses. For this reason, the HAD Bridging Project undertook a comprehensive study with the EFI’s Superfamily/Genome and Computation Cores to understand the effects of domain insertion on sequence and structure diversity in the HAD superfamily.

The evaluation began by generating core-domain-only and cap-domain-only datasets from existing protein structures. Interestingly, when the core-domain-only dataset was sub-grouped into cores with the same cap type and cores with different cap types, the relationship between sequence and structural divergence proved to be independent of the insertion. Although the relationship held in both groups, the correlation was slightly higher for cores with the same cap type than for those with different caps. Furthermore, core domains with the same cap type were statistically more similar, both in sequence and in structure, than those with different cap types. This result was surprising, since structural variation in the Rossmann core was not expected to be coupled to the cap domain. Cores without an insert were the most diverse overall, indicating that they are under the least evolutionary constraint. Finally, an analysis was carried out to uncover the structural basis for the cap domain’s effect on the core domain. The most highly variable regions mapped to connecting loops and bordering helices rather than to the cap-core interface. From these results it was apparent that only a few degrees of freedom dominate structural plasticity and that these parameters depend on the cap domain present.
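The same-cap versus different-cap comparison described above can be illustrated with a minimal sketch. It is not the study's actual pipeline: it assumes that core domains are compared pairwise, that each pair carries one sequence-divergence value and one structural-divergence value (the specific measures are not stated here), and that Pearson correlation is the statistic of interest. The function names and the cap labels are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def correlation_by_cap_group(pairs):
    """Correlate sequence and structural divergence per cap grouping.

    `pairs` is an iterable of (cap_a, cap_b, seq_div, struct_div) tuples,
    one per pair of core domains. The tuple layout is an assumption made
    for this sketch, not the format used in the study.
    """
    groups = defaultdict(lambda: ([], []))
    for cap_a, cap_b, seq_div, struct_div in pairs:
        # Pairs whose cores carry the same cap type go in one group,
        # all other pairs in the second group.
        key = "same_cap" if cap_a == cap_b else "different_cap"
        groups[key][0].append(seq_div)
        groups[key][1].append(struct_div)
    return {key: pearson(xs, ys) for key, (xs, ys) in groups.items()}
```

Calling `correlation_by_cap_group` on a list of such tuples returns one coefficient for the `same_cap` group and one for the `different_cap` group, which is the kind of side-by-side comparison the paragraph describes.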

Through this series of comparisons of sequence and structural diversity, several surprising correlations were uncovered. The results all lead to the conclusion that the Rossmann superfold is highly robust: although it is influenced by large insertions more than previously recognized, it is able to incorporate such changes without extensive modification. Rather, minimal mutations to structural elements within the superfold are all that is needed to accommodate new domains. Given how readily new functions can arise by expanding domain complexity, it is likely that other protein superfolds are equally tolerant, which explains, in part, their prevalence in nature.

View the publication here