Prediction and characterization of enzymatic activities guided by sequence similarity and genome neighborhood networks

Zhao S, Sakai A, Zhang X, Vetting MW, Kumar R, Hillerich B, San Francisco B, Solbiati J, Steves A, Brown S, Akiva E, Barber A, Seidel R, Babbitt PC, Almo SC, Gerlt JA, Jacobson MP (2014) eLife; 10.7554/eLife.03275. PMCID: PMC4113996

Drawing on all EFI cores and the Enolase Bridging Project, this work serves as the prototype for using genome neighborhood networks (GNNs) in functional discovery. Combining GNNs with sequence similarity networks allowed confident functional predictions for over 85% of the sequences in a diverse enzyme superfamily.


Metabolic pathways in eubacteria and archaea often are encoded by operons and/or gene clusters (genome neighborhoods) that provide important clues for assignment of both enzyme functions and metabolic pathways. We describe a bioinformatic approach (genome neighborhood network; GNN) that enables large-scale prediction of the in vitro enzymatic activities and in vivo physiological functions (metabolic pathways) of uncharacterized enzymes in protein families. We demonstrate the utility of the GNN approach by predicting in vitro activities and in vivo functions in the proline racemase superfamily (PRS; InterPro IPR008794). The predictions were verified by measuring in vitro activities for 51 proteins in 12 families in the PRS that represent ~85% of the sequences; in vitro activities of pathway enzymes, carbon/nitrogen source phenotypes, and/or transcriptomic studies confirmed the predicted pathways. The synergistic use of sequence similarity networks and GNNs will facilitate the discovery of the components of novel, uncharacterized metabolic pathways in sequenced genomes.
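The genome neighborhood idea in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the function name `build_gnn`, the `window` parameter, the toy genomes, and the family labels are all hypothetical, and the sketch assumes each genome is a linear, ordered list of genes with one family label apiece (strand, circular chromosomes, and intergenic distance are ignored). It collects the families that occur within a fixed number of genes of each query-family member and records which query genes they sit next to.

```python
from collections import defaultdict


def build_gnn(genomes, query_family, window=10):
    """Map each neighboring family to the set of query genes it occurs near.

    genomes: dict of genome id -> list of (gene_id, family) tuples in
             chromosomal order (a simplifying assumption for this sketch).
    query_family: family label of the proteins of interest.
    window: number of genes to inspect on each side of a query gene
            (hypothetical default, not taken from the paper).
    """
    network = defaultdict(set)
    for genes in genomes.values():
        for i, (gene_id, family) in enumerate(genes):
            if family != query_family:
                continue
            start = max(0, i - window)
            stop = min(len(genes), i + window + 1)
            for j in range(start, stop):
                if j == i:
                    continue
                neighbor_family = genes[j][1]
                # Skip other members of the query family itself.
                if neighbor_family != query_family:
                    network[neighbor_family].add(gene_id)
    return dict(network)


# Toy data with made-up gene ids and family labels.
toy_genomes = {
    "genomeA": [("a1", "transporter"), ("a2", "PRS"),
                ("a3", "dehydrogenase"), ("a4", "kinase")],
    "genomeB": [("b1", "dehydrogenase"), ("b2", "PRS"),
                ("b3", "regulator")],
}
gnn = build_gnn(toy_genomes, "PRS", window=1)
```

Families that recur next to many query genes across genomes (here the toy "dehydrogenase" label) are the kind of neighborhood signal that could suggest a shared pathway; a real analysis would also need protein family annotations and a principled choice of window size.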

Figure 1: The reactions catalyzed by proline racemase (ProR), 4R-hydroxyproline 2-epimerase (4HypE), and trans-3-hydroxy-L-proline dehydratase (t3HypD) and the metabolic pathways in which they participate.

Figure 2: Sequence similarity networks (SSNs) for the PRS.

Figure 3: The genome neighborhood network (GNN) for the PRS.

Figure 4: Library of proline and proline betaine derivatives tested for ESI-MS screening. These substrates were divided into four groups to avoid mass duplication.

Figure 5: Structures of members of the PRS.

Figure 6: Sequence divergent members of the ornithine cyclodeaminase superfamily (OCDS) have been assigned novel pyrroline-2-carboxylate reductase (Pyr2C reductase) function in this work.

Figure 7: Mapping members of the GNN clusters back to the SSN for the PRS.

Figure 8: Experimentally characterized enzymes reported by Swiss-Prot (small colored circles) and newly characterized in this work (large colored circles). Colors match the color scheme in Figure 2b.

Figure 9: Demonstration of the 4HypE, 3HypE, and t3HypD reactions by 1H NMR.

Figure 10: Representative 1H NMR spectra for delta1-pyrroline-2-carboxylate (delta1-Pyr2C) reductase activity.

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.