The Enzyme Function Initiative (EFI) is developing robust sequence/structure‑based strategies to facilitate discovery of the in vitro enzymatic and in vivo physiological functions of unknown enzymes discovered in genome projects. The lack of such functional assignments is a crucial limitation in genomic biology. This goal is being accomplished by integrating bioinformatics, structural biology, and computation with enzymology, genetics, and metabolomics.

Determining the functions of proteins encoded by genome sequences represents a major challenge in contemporary biology. As of February 2015, the TrEMBL database contained 90,860,905 sequences (up from 10,867,798 in June 2010 just after the EFI began). However, the conservative estimate remains that ≥ 50% of the sequences in the databases have uncertain, unknown, or incorrectly annotated functions. Without correct annotations, the unlimited promise that a functional and mechanistic understanding of Nature’s complete repertoire of enzymes and metabolic pathways would provide for medicine, chemistry, and industry cannot be realized.

New orthogonal approaches for predicting the substrate specificity of unknown enzymes are needed that provide a general, more direct method for functional discovery. To be effective, new approaches must incorporate high‑throughput predictive methods to focus and enable the more time‑consuming experimental assignment of function. The EFI’s goal is to develop and disseminate integrated sequence/structure‑based strategies that incorporate a spectrum of essential expertise, including high‑throughput bioinformatics, computation, and structural biology, that will enable focused experimental enzymology, genetics, and metabolomics.

The EFI is funded by a Large-Scale Collaborative Project (Glue Grant) from the National Institute of General Medical Sciences (U54 GM093342). The EFI comprises approximately 70 researchers at 7 academic institutions in the US.