Enolase (EN) Superfamily
Project period: 2010 - present
John Gerlt, UIUC
- α‑proton abstraction from a carboxylate substrate, generating a Mg2+-stabilized enediolate intermediate that undergoes subsequent epimerization/racemization or β-elimination
- di-domain structure consisting of an N-terminal α+β “cap” and a C-terminal (β/α)7β barrel fold
Challenges for Function Assignment
- difficult library development due to unknown metabolites serving as substrates/products
- mobile active site loops complicate crystallographic and computational efforts
- cryptic physiological functions due to roles in niche/novel metabolism
Value to Integrated Strategy
- serves as the paradigm for functional assignment through an integrated bioinformatic-computational-crystallographic approach
- drives development of crystallographic and computational methods to obtain structures and templates of maximum value for accurate in silico docking
- forces assessment of current understanding of bacterial metabolism
The EN superfamily is a diverse set of >13,000 homologous enzymes that all carry out enolization of carboxylate substrates but are divided into distinct subgroups based on the identity and location of active site residues. The namesake enolases comprise ~40% of the superfamily, forming the largest subgroup. Six additional subgroups have been identified to date: muconate lactonizing enzyme (MLE; ~25%), mandelate racemase (MR; ~25%), 3-methylaspartate ammonia lyase (MAL; <1%), D-mannonate dehydratase (ManD; ~3%), D-glucarate dehydratase (GlucD; ~5%), and D-galactarate dehydratase (GalrD; <1%).
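As a rough illustration of what these percentages imply, the sketch below converts them into approximate member counts. It assumes the lower bound of 13,000 members as the total and treats each "<1%" share as an upper bound of 1%; the names `TOTAL_MEMBERS`, `SUBGROUP_PERCENT`, and `approximate_counts` are hypothetical and used only for this example, and the resulting counts are order-of-magnitude estimates rather than reported data.

```python
# Approximate subgroup sizes derived from the percentages quoted in the text.
# Assumption: 13,000 is used as the total, although the text says ">13,000".
TOTAL_MEMBERS = 13_000

# Approximate share of the superfamily, in percent.
# Assumption: "<1%" (MAL, GalrD) is stored as an upper bound of 1.
SUBGROUP_PERCENT = {
    "enolase": 40,
    "MLE": 25,
    "MR": 25,
    "MAL": 1,
    "ManD": 3,
    "GlucD": 5,
    "GalrD": 1,
}


def approximate_counts(total, percents):
    """Return an approximate member count for each subgroup."""
    return {name: total * pct // 100 for name, pct in percents.items()}


counts = approximate_counts(TOTAL_MEMBERS, SUBGROUP_PERCENT)

# Print subgroups from largest to smallest.
for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} ~{n:>5d} members")

# Sanity check that the rounded shares cover the whole superfamily.
print("sum of percentages:", sum(SUBGROUP_PERCENT.values()))
```

Under these assumptions the MLE and MR subgroups each contain on the order of a few thousand sequences, which gives a sense of scale for the fractions of unknown function discussed for those subgroups.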
Active sites of EN superfamily members are located at the interface between an N-terminal α+β capping domain and a C-terminal (β/α)7β barrel domain (Figures EN1 and EN2). The Mg2+ is coordinated to three conserved ligands at the ends of the 3rd (Asp), 4th (Glu), and 5th (Asp/Glu/Asn) β‑strands of the barrel domain and at least one carboxylate oxygen of the substrate. A base at the end of the 2nd/3rd or 6th/7th β‑strand generates the enediolate intermediate that is stabilized by coordination to the Mg2+ while a complementary acid at one of these positions directs the intermediate to product. Although residues at the end of the 8th β-strand frequently form substrate binding interactions, the majority of substrate specificity determinants are located in loop regions that extend from the N-terminal capping domain.
Members of the EN superfamily catalyze 1,1‑proton transfer reactions or β‑elimination reactions, totaling ~20 unique reactions discovered thus far (Figure EN3). While the enolase, MAL, ManD, GlucD, and GalrD subgroups are thought to be isofunctional, members of the MLE and MR subgroups are known to be multifunctional, and new activities are certain to exist based on the sequence similarity network (SSN) analysis developed by the Superfamily/Genome Core.
Four functions are known in the MLE subgroup: cycloisomerization (MLE), dehydration (o‑succinylbenzoate synthase, OSBS), epimerization (L‑Ala-D/L‑Glu epimerase, AEE), and racemization (N‑succinylamino acid racemase, NSAR) (Figure EN3, red box). Nine functions are known in the MR subgroup: racemization by MR and dehydration by seven families of acid sugar dehydratases in bacterial carbohydrate catabolism (Figure EN3, blue box). While only ~15% of MLE members have unknown functions, largely owing to integrated bioinformatic-computational-crystallographic efforts, ~65% of the MR members have unknown functions, reflecting the challenge of correctly predicting the identity of carbohydrate substrates with multiple chiral centers.
With many members of unknown function and the number expanding as additional microbial genomes are sequenced, the EN superfamily represents a well-established system for continued development of an integrated sequence/structure-based strategy for functional assignment. Although the initial Program Project effort (P01 GM071790) was considerably successful, indicating that this method is viable as a general strategy, it encountered many difficulties that the EFI must address. These include poor docking templates due to disordered regions in crystal structures and incorrect substrate predictions due to model inaccuracy and/or incomplete in silico libraries. Extensive collaboration with the Structure, Computation, and Protein Cores aims to meet these challenges head-on by developing new methods for obtaining and utilizing quality docking templates. Additionally, although some functions/substrate specificities were correctly predicted (e.g., several novel dipeptide epimerases), the metabolic context remains ambiguous due to unknown physiology. Collaboration with the Microbiology Core on a select number of cases will provide in vivo context for functions assigned in vitro.
- Evolution of enzymatic activities in the enolase superfamily: functional assignment of unknown proteins in Bacillus subtilis and Escherichia coli as L-Ala-D/L-Glu epimerases. Schmidt DM, Hubbard BK, Gerlt JA. (2001) Biochemistry 40, 15707-15.
- Divergent evolution in the enolase superfamily: the interplay of mechanism and specificity. Gerlt JA, Babbitt PC, Rayment I. (2005) Arch Biochem Biophys 433, 59-70.
- Prediction and assignment of function for a divergent N-succinyl amino acid racemase. Song L, Kalyanaraman C, Fedorov AA, Fedorov EV, Glasner ME, Brown S, Imker HJ, Babbitt PC, Almo SC, Jacobson MP, Gerlt JA. (2007) Nat Chem Biol 3, 486-91.
- Computation-facilitated assignment of the function in the enolase superfamily: a regiochemically distinct galactarate dehydratase from Oceanobacillus iheyensis. Rakus JF, Kalyanaraman C, Fedorov AA, Fedorov EV, Mills-Groninger FP, Toro R, Bonanno J, Bain K, Sauder JM, Burley SK, Almo SC, Jacobson MP, Gerlt JA. (2009) Biochemistry 48, 11546-58.