As part of the EFI’s integrated strategy, all of its Cores and Bridging Projects participate in selecting targets for the EFI “pipeline.” After targets are selected and vetted, the pipeline begins with the Protein Core, which clones and purifies EFI targets and provides samples to 1) the Bridging Projects, for testing predictions made by the Computation Core, and 2) the Structure Core, for determining structures that serve as templates for in silico ligand docking and for verifying structure-based predictions. Priority is given to enzymes from genetically tractable organisms, as identified by the Microbiology Core, so that genetic, phenotypic, and metabolomic approaches can be used to establish in vivo function.
Several themes are used to categorize individual EFI targets:
- Specificity Boundaries: As sequence diverges within a superfamily, substrate specificity (function) changes. An important test of the Computation Core’s substrate specificity predictions is whether such changes in specificity among homologous enzymes can be predicted.
- Sequence/Function Diversity: Sequence similarity networks allow facile identification of divergent families that have not been experimentally or structurally characterized, and such divergent families likely will have new substrate specificities. An important test of the Computation Core’s algorithms is whether novel specificities can be predicted for targets selected from divergent families.
- Structures with No Functions (SNFs): The goal of the Protein Structure Initiative (PSI‑1 and PSI‑2) was to explore sequence space in order to define “fold space.” To meet that goal, structures were determined for many functionally uncharacterized enzymes. A challenge is to “rescue” these targets by testing the substrate specificity predictions generated for them by the Computation Core.
- Operon‑Encoded Proteins: Many bacterial enzymes (the primary focus of the EFI) are encoded in operons that specify metabolic pathways. Because the enzymes in a pathway bind structurally related metabolites, in silico ligand docking by the Computation Core against all of the enzymes in a pathway is expected to facilitate identification of the pathway and, therefore, of the substrate specificity of the target.
- Chemoenzymatic Reagent: In response to the predictions made by the Computation Core, the Bridging Projects undertake the preparation of possible substrates. Many of these are most efficiently prepared enzymatically with, for example, kinases, dehydrogenases, and/or aldolases. As these enzymes are needed, they are added to the protein production pipeline.
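
The Sequence/Function Diversity theme above relies on sequence similarity networks to find families with no characterized members. The following is a minimal sketch of that idea, not the EFI's actual tooling: sequences are nodes, an edge joins two sequences whose pairwise similarity score meets a threshold, and connected clusters that contain no characterized sequence are flagged as candidate targets. The function names, the sequence labels, the scores, and the threshold are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def cluster_by_similarity(scores, threshold):
    """Return connected components of a thresholded similarity network.

    `scores` maps a pair of sequence IDs to a similarity score
    (hypothetical values; higher means more similar).
    """
    neighbors = defaultdict(set)
    nodes = set()
    for (a, b), score in scores.items():
        nodes.update((a, b))
        if score >= threshold:  # keep only edges at or above the cutoff
            neighbors[a].add(b)
            neighbors[b].add(a)

    clusters, seen = [], set()
    for start in sorted(nodes):  # sorted for a deterministic cluster order
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


def uncharacterized_clusters(clusters, characterized):
    """Keep clusters that contain no experimentally characterized member."""
    return [c for c in clusters if not (c & characterized)]


# Illustrative data only: made-up IDs and scores.
scores = {
    ("enzA", "enzB"): 0.82,
    ("enzB", "enzC"): 0.75,
    ("enzC", "enzD"): 0.20,
    ("enzD", "enzE"): 0.90,
    ("enzA", "enzE"): 0.10,
}
characterized = {"enzA"}

clusters = cluster_by_similarity(scores, threshold=0.5)
candidates = uncharacterized_clusters(clusters, characterized)
```

With this toy input, the cluster containing `enzD` and `enzE` has no characterized member and is the only candidate. In practice the choice of threshold determines how finely a superfamily is split, so it would need to be tuned for each superfamily rather than fixed as it is here.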