Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST): A web tool for generating protein sequence similarity networks

Gerlt JA, Bouvier JT, Davidson DB, Imker HJ, Sadkhin B, Slater DR, Whalen KL (2015) Biochimica et Biophysica Acta - Proteins and Proteomics, 1854, 1019-1037. PMID: 25900361

From the Information and Data Dissemination Core, a comprehensive tutorial for the EFI-Enzyme Similarity Tool, including example cases using the Orotidine 5'-phosphate decarboxylase family.

Abstract

The Enzyme Function Initiative, an NIH/NIGMS-supported Large-Scale Collaborative Project (EFI; U54GM093342), is focused on devising and disseminating bioinformatics and computational tools as well as experimental strategies for the prediction and assignment of functions (in vitro activities and in vivo physiological/metabolic roles) to uncharacterized enzymes discovered in genome projects. Protein sequence similarity networks (SSNs) are visually powerful tools for analyzing sequence relationships in protein families (H.J. Atkinson, J.H. Morris, T.E. Ferrin, and P.C. Babbitt, PLoS One 2009, 4, e4345). However, members of the biological/biomedical community have not had access to the capability to generate SSNs for their “favorite” protein families. In this article we announce the EFI-EST (Enzyme Function Initiative-Enzyme Similarity Tool) web tool that is available without cost for the automated generation of SSNs by the community. The tool can create SSNs for the “closest neighbors” of a user-supplied protein sequence from the UniProt database (Option A) or of members of any user-supplied Pfam and/or InterPro family (Option B). We provide an introduction to SSNs, a description of EFI-EST, and a demonstration of the use of EFI-EST to explore sequence-function space in the OMP decarboxylase superfamily (PF00215). This article is designed as a tutorial that will allow members of the community to use the EFI-EST web tool for exploring sequence/function space in protein families.
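To make the idea of an SSN concrete, the following minimal Python sketch treats each sequence as a node and draws an edge between two sequences only when their pairwise alignment score meets a chosen minimum. It is an illustration only, not the EFI-EST implementation: the function names `build_ssn` and `clusters`, the sequence labels, and the scores are all hypothetical, and real scores would come from an all-by-all sequence comparison such as BLAST.

```python
from collections import defaultdict


def build_ssn(pairwise_scores, min_score):
    """Build an adjacency map, keeping only edges with score >= min_score.

    pairwise_scores maps (seq_id_1, seq_id_2) -> alignment score.
    Every sequence becomes a node even if it ends up with no edges.
    """
    adjacency = defaultdict(set)
    for (a, b), score in pairwise_scores.items():
        adjacency[a]  # touch the key so isolated nodes are retained
        adjacency[b]
        if score >= min_score:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


def clusters(adjacency):
    """Return the connected components (clusters) of the network."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


# Hypothetical scores for four made-up sequences (not real data).
scores = {
    ("seqA", "seqB"): 80,
    ("seqB", "seqC"): 40,
    ("seqA", "seqC"): 20,
    ("seqC", "seqD"): 30,
}

for threshold in (20, 35, 50):
    network = build_ssn(scores, threshold)
    print(threshold, len(clusters(network)))
```

Raising the minimum score removes weaker edges, so a single large cluster breaks apart into smaller ones. This is the kind of dependence on the minimum alignment score that the article illustrates for the OMP decarboxylase superfamily.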


Figure 2:  A comparison of trees and sequence similarity networks.

Figure 3:  The "Start Page" page for EFI-EST.

Figure 4:  The dependence of the SSN for the OMP decarboxylase superfamily (PF00215) on the minimum alignment score.

Figure 5:  InterPro homepage.

Figure 6:  The output of InterProScan5 using the sequence of MtOMPDC as the query.

Figure 7:  Panel A, the "Length Histogram" for the OMP decarboxylase superfamily (PF00215) showing the number of sequences as a function of length.

Figure 8:  The "Number of Edges Histogram" for the OMP decarboxylase superfamily (PF00215) showing the number of edges calculated by BLAST as a function of alignment score.

Figure 9:  The "Alignment Length Quartile Plot" for the OMP decarboxylase superfamily.

Figure 10:  The "Percent Identity Quartile Plot" for the OMP decarboxylase superfamily.

Figure 11:  The "Data Set Completed" page for EFI-EST.

Figure 12:  The "Download Network Files" page for EFI-EST showing the sizes of the full and representative networks.

Figure 13:  Reactions catalyzed by the OMP decarboxylase superfamily.

Figure 14:  Representative node networks for the OMP decarboxylase superfamily using a minimum alignment score of 35.

Figure 15:  The 80% rep node network for the OMP decarboxylase superfamily (PF00215) with a minimum alignment score of 35 in which the metanodes with reviewed SwissProt status are highlighted yellow.

Reprinted with permission from Gerlt et al. Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics. © 2015 Elsevier.