EFI 2015 - Mining the HTS Results of the HADSF
High throughput screening (HTS) of the Haloalkanoic Acid Dehalogenase superfamily (HADSF), conducted over the latter half of the first installment of the EFI Glue Grant, has produced a wealth of experimental information covering nearly 35% of the sequence space of the 20th largest clan in the Pfam database. In total, 217 proteins were purified and screened, exhibiting a median of 15.5 active substrates per enzyme, with individual proteins showing as few as zero hits and as many as 143 hits. Approximately one quarter of the HAD enzymes screened were deemed "substrate specific" (defined as five or fewer substrates), while the remaining superfamily representatives were found to possess moderate or extreme substrate ambiguity.
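The "substrate specific" cutoff described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the enzyme names and hit counts are hypothetical, the helper names `classify` and `summarize` are invented here, and the text does not give a boundary between "moderate" and "extreme" ambiguity, so the sketch only separates specific from ambiguous enzymes.

```python
from statistics import median

# From the text: "substrate specific" means five or fewer active substrates.
SPECIFIC_MAX_HITS = 5


def classify(hit_count):
    """Label one enzyme by its number of active substrates (hits)."""
    if hit_count < 0:
        raise ValueError("hit_count must be non-negative")
    # Note: read literally, the cutoff also labels zero-hit enzymes as
    # "substrate specific"; a real analysis may want to treat them separately.
    if hit_count <= SPECIFIC_MAX_HITS:
        return "substrate specific"
    return "substrate ambiguous"


def summarize(hits_by_enzyme):
    """Return the median hit count, the specific fraction, and per-enzyme labels."""
    counts = list(hits_by_enzyme.values())
    labels = {name: classify(n) for name, n in hits_by_enzyme.items()}
    n_specific = sum(1 for label in labels.values() if label == "substrate specific")
    return {
        "median_hits": median(counts),
        "fraction_specific": n_specific / len(counts),
        "labels": labels,
    }


# Hypothetical hit counts, not data from the screen.
example = {"enzyme_A": 0, "enzyme_B": 3, "enzyme_C": 16, "enzyme_D": 143}
summary = summarize(example)
print(summary["median_hits"], summary["fraction_specific"])
```

Applied to the real screening table, the same summary would be expected to give the reported median of 15.5 and a specific fraction near one quarter.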
While some degree of substrate ambiguity is expected from the HADSF, which is well documented as containing member proteins responsible for broad "house-keeping" cellular functions, such as detoxification of accumulated metabolic intermediates, a fraction of this ambiguity could stem from the assay conditions. Aspects of the cellular milieu, such as specific metabolite concentrations or localizations, are not well represented in a 96-well plate assay. Thus, it is essential to consider corroborating evidence when assigning physiological function to the proteins characterized here in high throughput.
Figure 1. Heat maps showing activity profiles for screened HAD enzymes. Each row represents a screened enzyme and each column a specific substrate. Enzymes (rows) are hierarchically clustered by cap type. Substrates (columns) are clustered by small molecule class. Inset depicts C0, C1, and C2 members (PDB IDs 3L8E, 1ZOL, 2FUC).
As the EFI moves forward, the overlap between the characterized HADSF sequence space and Solute Binding Protein (SBP) sequence space will be enlarged to specifically include more instances of genome-proximal HAD-SBP combinations. The high affinity of SBPs for their cognate ligands makes them excellent candidates for probing a metabolite library. The substrate specificity subsequently determined for an SBP will narrow the set of possible metabolites being imported by the organism and then processed by a downstream HAD member.
Currently, the overlap between these two protein families is quite small: it corresponds to the golden nodes in the HAD superfamily SSN below (Figure 2). With the exception of the large eukaryotic cluster (upper left), the HAD members targeted by the above-described HTS campaign (lime green) are well dispersed throughout the superfamily's sequence identity landscape. A genome neighborhood network (GNN) was used to identify HAD members that were found proximal to an SBP (as defined by membership in a TRAP- or ABC-type SBP Pfam family). These SBP-proximal HAD members (red) occupy a more localized region of sequence identity space, most likely corresponding to bacterial classes for which SBP-type transporters are the predominant mode of metabolite import (i.e., Gram-negative bacteria). Each red node represents an opportunity to target novel HADs, SBPs, and any additional downstream components of a metabolic pathway identified by genome context.
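The node coloring described here amounts to a set-membership test, which can be sketched as follows. This is a minimal illustration, not the EFI tooling: the function name `color_ssn_nodes`, the node identifiers, and the "gray" default for unhighlighted nodes are all assumptions made for the example.

```python
def color_ssn_nodes(all_nodes, hts_screened, sbp_proximal, default="gray"):
    """Assign a highlight color to each SSN node by set membership.

    green: screened by HTS only
    red:   genome-proximal to an SBP only
    gold:  both conditions satisfied
    """
    hts = set(hts_screened)
    sbp = set(sbp_proximal)
    colors = {}
    for node in all_nodes:
        if node in hts and node in sbp:
            colors[node] = "gold"
        elif node in hts:
            colors[node] = "green"
        elif node in sbp:
            colors[node] = "red"
        else:
            colors[node] = default  # assumed color for unhighlighted nodes
    return colors


# Hypothetical node identifiers, for illustration only.
nodes = ["had_1", "had_2", "had_3", "had_4"]
colors = color_ssn_nodes(
    nodes,
    hts_screened={"had_1", "had_2"},
    sbp_proximal={"had_2", "had_3"},
)
overlap = [n for n, c in colors.items() if c == "gold"]
print(colors, overlap)
```

The gold set is simply the intersection of the two input sets, so its size gives a direct measure of how small the current HAD-SBP overlap is.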
At this time, several hundred HAD and SBP targets have been added to the EFI protein production pipeline for high throughput phosphatase activity profiling (HAD) and metabolite binding profiling (SBP). Genome context will be investigated in the near future to identify additional enzymes required for pathway prediction and validation.
Figure 2. Sequence similarity network (SSN) of the Haloalkanoic Acid Dehalogenase Superfamily (HADSF) separated at approximately 30% sequence identity. HADSF members screened by HTS (green), members with a genome-proximal SBP (red), and members that satisfy both conditions (gold) are highlighted.
Figure 1 reproduced with permission from Huang et al. 2015 Proceedings of the National Academy of Sciences of the USA; doi: 10.1073/pnas.1423570112