
Protein production from the structural genomics perspective: achievements and future needs.

Almo SC, Garforth SJ, Hillerich BS, Love JD, Seidel RD, Burley SK. (2013) Curr Opin Struct Biol 23, 335-44. PMCID: PMC4163025

Housed and run in parallel with the NYSGRC pipeline at AECOM, the EFI Protein Core alone tackles over a thousand protein targets per year. These targets are necessary for the experimental validation of computationally predicted functions. The high-throughput nature of the protein pipeline provides a wealth of expertise and data that is useful to the greater scientific community. Here Almo et al. describe the current methods used to maximize production of a variety of challenging protein targets (e.g. multi-domain proteins and protein complexes).


Despite a multitude of recent technical breakthroughs speeding high-resolution structural analysis of biological macromolecules, production of sufficient quantities of well-behaved, active protein continues to represent the rate-limiting step in many structure determination efforts. These challenges are only amplified when considered in the context of ongoing structural genomics efforts, which are now contending with multi-domain eukaryotic proteins, secreted proteins, and ever-larger macromolecular assemblies. Exciting new developments in eukaryotic expression platforms, including insect and mammalian-based systems, promise enhanced opportunities for structural approaches to some of the most important biological problems. Development and implementation of automated eukaryotic expression techniques promises to significantly improve production of materials for structural, functional, and biomedical research applications.


Figure 1. The NYSGRC expression/purification pipeline. Departing from traditional structural genomics pipelines focused largely on prokaryotic targets, the NYSGRC manages four independent expression pipelines to meet the current challenges of the Protein Structure Initiative.

Figure 2. The NYSGRC prokaryotic screening pipeline. (a) To reduce attrition and address recalcitrant targets, the NYSGRC and others have leveraged automated small-scale expression efforts to screen multiple expression vectors against multiple host cell lines, growth media, temperatures and lysis conditions to rapidly identify the optimal conditions for growth, expression and purification. Following ligation-independent cloning into an N-terminal TEV-cleavable HIS vector (pET30 based, First pass Vector 1), the NYSGRC screens for expression in a phage-resistant version of BL21(DE3) harboring a plasmid encoding rare tRNAs (RIL). These cells are grown in 96-well deep block plates using two types of autoinduction media: the fully defined PSAM media and the ultra-rich ZYP-5052 media based on yeast extract. Cultures are grown at two temperatures (22°C and 37°C), and cells are lysed and purified on 96-well Ni-IDA plates with three different buffer systems [20 mM HEPES, pH 7.5, 20 mM imidazole, 10% glycerol, 0.1% Tween 20, supplemented with either 500 mM NaCl (NaCl), 200 mM ammonium sulfate (NH4SO4) or 500 mM NaCl plus 50 mM arginine and 50 mM glutamine (Arg/Glu)]. Total cellular fractions (Totals) and eluted fractions (Elutes) are analyzed using a Caliper GXII and scored, and positive expressors proceed to scale-up. Of note, one protein screened using this process results in 24 gel lanes. Depending on the value of the target, this process can be repeated for failures against rescue vectors (e.g. C-terminal HIS fusions, N-terminal SUMO, MBP) or other cell lines (e.g. C41/43, SoluBL21). (b) Collection of 72 structures determined from the nitrogen-fixing symbiotic model organism Sinorhizobium meliloti and other nitrogen-fixing microbes. These structures and more than 850 proteins purified by the NYSGRC prokaryotic expression platform contribute to the efforts of a consortium focusing on the systems biology of the symbiotic relationship between Sinorhizobium and alfalfa; other program participants include Allen Orville (Brookhaven National Laboratory); Michael Kahn and Svetlana Yurgel (Washington State University); Mary Lipton (Pacific Northwest National Laboratory); Haiping Cheng (Lehman College); and Sharon Long (Stanford University).
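
The count of 24 gel lanes per protein in Figure 2(a) is consistent with crossing every screening variable named in the caption: two media, two temperatures, three buffer systems, and two analyzed fractions. The caption does not spell out this breakdown, so the short Python sketch below is one reading under that assumption. The variable and function names are illustrative, and the buffer labels are copied from the caption.

```python
from itertools import product

# Screening variables as listed in the Figure 2(a) caption.
MEDIA = ("PSAM", "ZYP-5052")              # two autoinduction media
TEMPERATURES_C = (22, 37)                 # two growth temperatures
BUFFERS = ("NaCl", "NH4SO4", "Arg/Glu")   # three buffer systems (caption labels)
FRACTIONS = ("Total", "Elute")            # two fractions analyzed per condition


def screening_lanes():
    """Enumerate one gel lane per (medium, temperature, buffer, fraction).

    Assumption: every variable is fully crossed with every other one.
    """
    return [
        {"medium": m, "temp_c": t, "buffer": b, "fraction": f}
        for m, t, b, f in product(MEDIA, TEMPERATURES_C, BUFFERS, FRACTIONS)
    ]


lanes = screening_lanes()
print(len(lanes))  # 2 * 2 * 3 * 2 = 24
```

Under the same assumption, a full 96-well plate analyzed this way would hold 12 growth/lysis conditions for each of 8 proteins.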

Figure 3. NYSGRC Infrastructure: (a) LEX 48 airlift fermenter for parallel growth of E. coli cultures; (b) Caliper GXII 384-well capillary electrophoresis robot for rapid analysis of protein expression and small-scale purification; (c) Beckman FxP liquid handling robot for DNA and protein manipulation in multiwell plate format; (d) AKTA Xpress multistep purification instrument for automated protein purification; (e) custom-built EPSON Scara-based sonication robot for high-throughput cell lysis in multiwell plates or 50 mL tubes; (f) Perkin-Elmer Janus Cell::Explorer tissue culture robotics platform (from the point of view of the Guava FACS analyzer), showing the six-axis robotic arm, the plate carousel, incubator space, and a Janus liquid handler system (not labeled), all located within the BSL-2 bio-containment hood.

Figure 4. High-throughput Eukaryotic Expression at the NYSGRC. (a) Automated small-scale expression testing of secreted targets (members of the immunoglobulin or Ig superfamily) in BEV-infected SF9 and Hi5 insect cells. (b) Example large-scale expression and purification of ∼40 secreted proteins or ectodomains from the Ig Superfamily, generated using BEV and purified from supernatants of large-scale (>2 L) insect cultures. Heterogeneity is due in part to variability of post-translational modifications and, potentially, proteolysis in the culture supernatant. The number of potential N-linked glycosylation sites (predicted using N-glycosite [65]) is indicated for each target. (c) Proteins recently purified from the NYSGRC mammalian stable cell line expression platform (lentiviral-driven): (C1) human IgG1 fusion (Fc fusion) proteins of B7-1; (C2) murine Programmed cell Death Ligand 1, PDL-1; (C3) murine Programmed cell Death receptor 1, PD-1; (C4) murine B7-1; (C5-9) Fc fusions of wild type (C5) and mutant (C6-9) versions of PDL-1; (C10) erythroid membrane-associated protein, ERMAP; (C11) murine CD300 antigen like family member H, Cd300lh; (C12) Paired immunoglobulin-like type 2 receptor beta 1, PILRB; (C13) human voltage gated sodium channel type II beta. (d) Representative X-ray structures recently determined from proteins produced by the NYSGRC BEV platform: (1) Ig-C2 domain of human Fc-receptor like A, FCRLA (PDB ID 4HWN); (2) Ig-C2 type 1 domain from mouse Fibroblast growth factor receptor 2, FGFR2 (PDB ID 4HWU); (3) Carcinoembryonic antigen-related cell adhesion molecule 15, CEACAM 15; and (4) beta 2 glycoprotein 1, B2GPI.

Reprinted from Current Opinion in Structural Biology 23, Almo et al., Protein production from the structural genomics perspective: achievements and future needs, 335-44, Copyright (2013), with permission from Elsevier.