Protein Core


(Steve Almo, Director)

The role of the EFI Protein Core is to produce samples of targeted enzymes, and each year it is tasked with processing thousands of targets.  Because individual wet labs in the EFI are unable to clone and characterize targets at the pace needed to evaluate the predicted functions, the EFI depends on the Protein Core for efficient, automated high-throughput production, underscoring the value of the EFI as a large-scale effort.  Protein samples are distributed to the EFI Bridging Projects for use in activity assays and screens and to the EFI Structure Core for structure determination.

Directed by Prof Steve Almo at the Albert Einstein College of Medicine (AECOM) and managed by Drs Ron Seidel and Brandan Hillerich, the team at AECOM has built and staffed a state-of-the-art protein production facility in conjunction with the New York Structural Genomics Research Consortium (NYSGRC).  In the years since the EFI was established, their large-scale, high-throughput pipeline has been fully developed to carry out target cloning, expression screening, purification, validation, and distribution.

The protein pipeline begins with en masse cloning of targets in 96-well format, using DNA template generally acquired from the commercial suppliers ATCC or DSMZ, repositories such as PSI-MR, or companies that specialize in gene synthesis.  The Protein Core relies on high-throughput Ligation Independent Cloning (LIC), which allows a single PCR product to be cloned into multiple vectors quickly and efficiently.  A series of custom pET vectors has been engineered in the Protein Core to contain dual affinity tags (e.g. strepII, His6) at either the C or N terminus for subsequent automated multi-step purification.  Some also contain SUMO tags for increased expression and solubility.  The Protein Core and Bridging Projects work together to assess a given tag’s influence on activity, to ensure that the protein samples produced are of optimal utility.

Once clones have been obtained, small-scale expression is carried out in 96-well block format.  Using defined autoinduction media, 750 µL samples are incubated at different temperatures.  For reasons that are frequently cryptic, varying the vector backbone influences gene expression and/or protein solubility in some cases.  The Protein Core has pioneered a vector screening protocol that allows parallel examination of multiple constructs to help optimize output.  Partial or full vector screening has been essential in achieving maximum flux through the EFI pipeline.  Samples from small-scale expression screens are lysed with a custom sonication robot, after which pellet and soluble fractions are quantified with a Caliper GXII Bioanalyzer.  Expression and solubility are each reported on a scale of 0-3, where 0 indicates failure and 3 indicates that maximum expression or solubility was observed.  These values help prioritize which targets are scaled up for large-scale purification.  Additional efforts are underway to examine and optimize several other factors that influence expression and solubility, such as media composition and lysis conditions.
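The 0-3 scoring described above lends itself to a simple ranking step.  The following is a minimal sketch only: the record fields, the `min_score` cutoff, and the rule of ranking by solubility before expression are illustrative assumptions, not the Protein Core's actual selection criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenResult:
    target: str       # hypothetical target identifier
    vector: str       # hypothetical vector name
    expression: int   # 0 (failure) to 3 (maximum observed)
    solubility: int   # 0 (failure) to 3 (maximum observed)


def best_constructs(results, min_score=2):
    """Pick one construct per target for scale-up.

    Assumed rule (not from the source): discard constructs scoring below
    min_score on either axis, then prefer higher solubility, breaking
    ties on expression.
    """
    best = {}
    for r in results:
        for score in (r.expression, r.solubility):
            if not 0 <= score <= 3:
                raise ValueError(f"score out of range for {r.target}/{r.vector}")
        if r.expression < min_score or r.solubility < min_score:
            continue
        current = best.get(r.target)
        if current is None or (r.solubility, r.expression) > (
            current.solubility,
            current.expression,
        ):
            best[r.target] = r
    return best


if __name__ == "__main__":
    screen = [
        ScreenResult("T1", "vecA", expression=3, solubility=1),
        ScreenResult("T1", "vecB", expression=2, solubility=3),
        ScreenResult("T2", "vecA", expression=1, solubility=1),
    ]
    for target, result in best_constructs(screen).items():
        print(target, result.vector)
```

Screening several vectors per target matters here because a target that fails in one backbone (T1 in vecA) can still pass in another, while a target with no passing construct (T2) simply drops out of the scale-up queue.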

For large-scale growth and purification, the Protein Core uses a LEX fermentor, which supports 48 x 1-liter growths in parallel.  Through use of this platform, the Protein Core has doubled its initial expression capacity, and it is continuing to explore other avenues to increase yields, such as the influence of percent oxygen during aeration.  Depending on which construct gave maximum expression and solubility, automated large-scale purifications are tailored to take advantage of affinity tags via tandem chromatography on five AKTA Xpress FPLCs.  Protein samples are split for distribution between the Bridging Projects and Structure Core following validation.

With so many targets at various stages of the protein production pipeline, sample tracking is paramount for efficiency, quality control, and sanity.  The Protein Core has worked out a system such that all cloning, expression screening, purification and associated procedures are bar-coded with “chain-of-custody” protocols to minimize the opportunities for error that inevitably arise from human intervention.  Samples are sequenced at several steps along the pipeline to ensure that physical mix-ups or clerical mistakes are caught.  Furthermore, MALDI and ESI mass spectrometry are routinely used to confirm the expected mass of the final protein product, and peptide mass fingerprinting has also been adopted to validate all large-scale protein preparations.
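To illustrate the idea of bar-coded chain of custody, here is a minimal sketch of a log that refuses out-of-order handoffs.  The step names are taken from the pipeline stages listed earlier; the class names, the barcode format, and the strictly linear ordering are assumptions made for illustration and do not describe the Protein Core's actual tracking system.

```python
# Pipeline stages in the order given in the text; treating them as a strict
# linear sequence is a simplifying assumption.
PIPELINE_STEPS = [
    "cloning",
    "expression_screening",
    "purification",
    "validation",
    "distribution",
]


class CustodyError(Exception):
    """Raised when a scan does not match the next expected step."""


class CustodyLog:
    def __init__(self):
        self._history = {}  # barcode -> list of completed steps

    def record(self, barcode, step):
        """Record a scan, rejecting unknown or out-of-order steps."""
        if step not in PIPELINE_STEPS:
            raise CustodyError(f"unknown step: {step}")
        done = self._history.setdefault(barcode, [])
        if len(done) == len(PIPELINE_STEPS):
            raise CustodyError(f"{barcode} has already completed the pipeline")
        expected = PIPELINE_STEPS[len(done)]
        if step != expected:
            raise CustodyError(f"{barcode}: expected {expected}, got {step}")
        done.append(step)

    def history(self, barcode):
        """Return a copy of the steps completed for this barcode."""
        return list(self._history.get(barcode, []))


if __name__ == "__main__":
    log = CustodyLog()
    log.record("BC001", "cloning")  # "BC001" is a made-up barcode
    log.record("BC001", "expression_screening")
    try:
        log.record("BC001", "validation")  # skips purification
    except CustodyError as err:
        print("rejected:", err)
```

Rejecting the scan at the moment of the mistake, rather than reconciling records afterwards, is the property that limits how far a mix-up can propagate.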



Protein and Structure Cores Production Statistics - Cumulative To Date

All EFI 2010-2015 (cumulative)          GST       AHY       ENL       HAD       ISY       LacI-Tx   TRAP      ABC       Totals
Clones/expression vectors generated     1275      1303      2420      2120      1936      114       1225      750       11143
Small-scale expression validation       1240      1691      1758      3073      2464      114       1376      376       12092
Large-scale growths                     456       416       681       592       437       60        386       249       3277
Macropreps                              581       350       634       615       411       54        408       238       3291
Protein shipments                       489       316       492       656       329       55        460       260       3057
Differential Scanning Fluorimetry       165       120       250       0         30        26        162       182       935
Activity-based screened                 0         0         0         297       0         0         0         0         297
Structure Depositions (New for Target)  61 (43)   26 (14)   85 (59)   29 (10)   19 (14)   0         60 (46)   5 (2)     285 (188)
Depositions with Ligand                 38        12        34        7         8         0         51        4         154
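As a quick consistency check on the table above, the per-family counts can be summed and compared against the reported totals.  The sketch below transcribes the integer-valued rows (the structure-deposition row with parenthetical values is omitted for simplicity); the function names and the shipments-per-clone ratio are illustrative additions, not metrics reported by the EFI.

```python
# Column order follows the table header.
FAMILIES = ["GST", "AHY", "ENL", "HAD", "ISY", "LacI-Tx", "TRAP", "ABC"]

# label -> (per-family counts, reported total), transcribed from the table.
ROWS = {
    "Clones/expression vectors generated": ([1275, 1303, 2420, 2120, 1936, 114, 1225, 750], 11143),
    "Small-scale expression validation": ([1240, 1691, 1758, 3073, 2464, 114, 1376, 376], 12092),
    "Large-scale growths": ([456, 416, 681, 592, 437, 60, 386, 249], 3277),
    "Macropreps": ([581, 350, 634, 615, 411, 54, 408, 238], 3291),
    "Protein shipments": ([489, 316, 492, 656, 329, 55, 460, 260], 3057),
    "Differential Scanning Fluorimetry": ([165, 120, 250, 0, 30, 26, 162, 182], 935),
    "Activity-based screened": ([0, 0, 0, 297, 0, 0, 0, 0], 297),
    "Depositions with Ligand": ([38, 12, 34, 7, 8, 0, 51, 4], 154),
}


def mismatched_rows(rows):
    """Return labels of rows whose counts do not sum to the reported total."""
    return [label for label, (counts, total) in rows.items() if sum(counts) != total]


def shipments_per_clone(rows):
    """Per-family ratio of protein shipments to clones generated.

    This is an illustrative ratio only; a shipment does not necessarily
    correspond one-to-one with a clone.
    """
    clones, _ = rows["Clones/expression vectors generated"]
    shipped, _ = rows["Protein shipments"]
    return {fam: s / c for fam, s, c in zip(FAMILIES, shipped, clones)}


if __name__ == "__main__":
    print("rows with inconsistent totals:", mismatched_rows(ROWS))
    for family, ratio in shipments_per_clone(ROWS).items():
        print(f"{family}: {ratio:.2f}")
```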

