
Discovery of new enzymes and metabolic pathways by structure and genome context guided analyses

Zhao S, Kumar R, Sakai A, Vetting MW, Wood BM, Brown S, Hillerich BS, Seidel RD, Babbitt PC, Almo SC, Sweedler JV, Gerlt JA, Cronan JE, Jacobson MP (2013) Nature 502, 698-702. PMCID: PMC3966649

In a truly cohesive effort, multiple EFI groups combined their expertise to assign the function of an "orphaned" PSI structure belonging to the EN Superfamily, PDB:2PMQ. Metabolite docking to several proteins in a putative operon predicted a novel reaction and pathway: catabolism of 4R-hydroxyproline betaine (tHyp-B) to α-ketoglutarate via epimerization of tHyp-B by 2PMQ. Experimental follow-up by X-ray crystallography, enzymology, genetics, and metabolomics confirmed the prediction and further elucidated the role of tHyp-B as an osmolyte. This work demonstrates the power of pathway docking for predicting novel in vitro enzymatic activities and in vivo physiological functions, and the value of combining multiple techniques to address the problem of functional assignment.



Assigning valid functions to proteins identified in genome projects is challenging: overprediction and database annotation errors are the principal concerns. We and others are developing computation-guided strategies for functional discovery with ‘metabolite docking’ to experimentally derived or homology-based three-dimensional structures. Bacterial metabolic pathways often are encoded by ‘genome neighbourhoods’ (gene clusters and/or operons), which can provide important clues for functional assignment. We recently demonstrated the synergy of docking and pathway context by ‘predicting’ the intermediates in the glycolytic pathway in Escherichia coli. Metabolite docking to multiple binding proteins and enzymes in the same pathway increases the reliability of in silico predictions of substrate specificities because the pathway intermediates are structurally similar. Here we report that structure-guided approaches for predicting the substrate specificities of several enzymes encoded by a bacterial gene cluster allowed the correct prediction of the in vitro activity of a structurally characterized enzyme of unknown function (PDB 2PMQ), 2-epimerization of trans-4-hydroxy-L-proline betaine (tHyp-B) and cis-4-hydroxy-D-proline betaine (cHyp-B), and also the correct identification of the catabolic pathway in which Hyp-B 2-epimerase participates. The substrate-liganded pose predicted by virtual library screening (docking) was confirmed experimentally. The enzymatic activities in the predicted pathway were confirmed by in vitro assays and genetic analyses; the intermediates were identified by metabolomics; and repression of the genes encoding the pathway by high salt concentrations was established by transcriptomics, confirming the osmolyte role of tHyp-B. This study establishes the utility of structure-guided functional predictions to enable the discovery of new metabolic pathways.


Figure 1. a, The reaction catalysed by HpbD, the Hyp-B 2-epimerase. b, The binding site of the model of HpbJ, with the top-ranked ligand tHyp-B docked. The ligand surface is shown in magenta. c, Comparison of HpbD top-ranked docking pose of D-Pro-B (magenta) with the experimental pose of tHyp-B (cyan). The unliganded structure used in docking (PDB 2PMQ) and the subsequently determined liganded structure (PDB 4H2H) are shown in magenta and cyan, respectively. d, Superposition of the model of HpbB1 (magenta) and the closest characterized Rieske-type protein (cyan; PDB 1O7G, a naphthalene dioxygenase), showing that the active site of the model is too small to accept naphthalene as a substrate. Steric clashes identified by using a van der Waals overlap of 0.6 Å or more are shown in red lines.
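The clash criterion in panel d can be illustrated with a minimal sketch. It assumes that "van der Waals overlap" means the sum of the two atomic radii minus the interatomic distance, and it uses Bondi radii; neither the radii set nor the function names below come from the paper, and the software actually used may define overlap differently.

```python
import math

# Approximate Bondi van der Waals radii in Å (an assumption; the paper's
# software may use a different radii set).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

# Overlap cutoff quoted in the figure caption, in Å.
CLASH_THRESHOLD = 0.6


def vdw_overlap(atom_a, atom_b):
    """Return the vdW overlap in Å for two atoms given as (element, (x, y, z))."""
    (elem_a, pos_a), (elem_b, pos_b) = atom_a, atom_b
    distance = math.dist(pos_a, pos_b)
    return VDW_RADII[elem_a] + VDW_RADII[elem_b] - distance


def find_clashes(ligand_atoms, protein_atoms, threshold=CLASH_THRESHOLD):
    """List (ligand_index, protein_index, overlap) for pairs at or above the cutoff."""
    clashes = []
    for i, lig in enumerate(ligand_atoms):
        for j, prot in enumerate(protein_atoms):
            overlap = vdw_overlap(lig, prot)
            if overlap >= threshold:
                clashes.append((i, j, overlap))
    return clashes


# Hypothetical coordinates, for illustration only.
ligand = [("C", (0.0, 0.0, 0.0))]
protein = [("C", (2.5, 0.0, 0.0)), ("O", (3.0, 0.0, 0.0))]
clashes = find_clashes(ligand, protein)
```

With these made-up coordinates only the carbon-carbon pair exceeds the cutoff; a ligand such as naphthalene placed in a pocket that is too small would produce many such pairs.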

Figure 2. The genes encoding orthologues are highlighted with the same colour; the sequence identities relating orthologues in P. bermudensis and P. denitrificans are indicated. The ecological sources of tHyp-B would be seaweed (sargasso) for the Sargasso Sea bacterium P. bermudensis, and plants for the soil bacterium P. denitrificans.

Figure 3. a, Enriched chemotypes in the top 120 hits. Most of them are amino acid derivatives, in which N-capped amino acid derivatives and proline analogues are the two most common subtypes. b, Proline analogues in the rank-ordered list of predicted ligands, illustrating the frequent occurrence of N-modified proline analogues. Pro-B, a substrate for HpbD, ranks at number 110 in the list (top 0.12% of the docking library).
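The relation between a rank and a "top x%" figure is simple arithmetic, sketched below. The caption does not state the size of the docking library, so the second function only back-calculates an approximate size from the two quoted numbers; because 0.12% is a rounded value, the result is an order-of-magnitude estimate (roughly 9 × 10^4 compounds), not a reported figure. Both function names are illustrative.

```python
def rank_percent(rank, library_size):
    """Percentage of the library at or above a given rank (1 = best score)."""
    return 100.0 * rank / library_size


def implied_library_size(rank, percent):
    """Back-calculate the library size implied by a rank and its 'top x%' value."""
    return rank / (percent / 100.0)


# Values quoted in the caption: rank 110, top 0.12%.
approx_size = implied_library_size(110, 0.12)

# Consistency check: feeding the estimate back in recovers the quoted percentage.
recovered_percent = rank_percent(110, approx_size)
```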

Figure 4. a, Catabolic pathway for tHyp-B. On the basis of the genome neighbourhood contexts in P. bermudensis and P. denitrificans, tHyp-B is epimerized to cHyp-B which undergoes two N-demethylation reactions to cHyp; cHyp is oxidized, dehydrated and deaminated, and finally oxidized to α-ketoglutarate (α-KG). Pyr4H2C, Δ1-pyrroline-4-hydroxy-2-carboxylate; α-KGSA, α-ketoglutarate semialdehyde. The enzymes are coloured as in Fig. 2. b, Kinetic constants for HpbD from P. bermudensis and its orthologue from P. denitrificans.
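The sequence of transformations in panel a can be written down as an ordered list of steps, which makes the chain of intermediates explicit. This is a sketch of one reading of the caption: the assignment of Pyr4H2C and α-KGSA to specific steps is inferred from the order in which the caption lists them, the label "N-methyl-cHyp" for the intermediate between the two N-demethylations is an assumed name the caption does not give, and only HpbD is named as an enzyme because the caption names no others.

```python
# Each step is (substrate, transformation, product), in the caption's order.
PATHWAY = [
    ("tHyp-B", "2-epimerization (HpbD)", "cHyp-B"),
    ("cHyp-B", "N-demethylation", "N-methyl-cHyp"),  # intermediate name assumed
    ("N-methyl-cHyp", "N-demethylation", "cHyp"),
    ("cHyp", "oxidation", "Pyr4H2C"),                 # step assignment inferred
    ("Pyr4H2C", "dehydration and deamination", "α-KGSA"),
    ("α-KGSA", "oxidation", "α-KG"),
]


def is_contiguous(pathway):
    """Check that each step's product is the next step's substrate."""
    return all(
        pathway[i][2] == pathway[i + 1][0] for i in range(len(pathway) - 1)
    )


def metabolites(pathway):
    """Return the metabolites in order, from first substrate to final product."""
    return [pathway[0][0]] + [product for _, _, product in pathway]


chain = metabolites(PATHWAY)
```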

Reprinted with permission from Nature Publishing Group.
Copyright © 2013, Rights Managed by Nature Publishing Group