Redesigning Cells for Production of
Complex Organic Molecules
If metabolically engineered properly, cells replenish
both enzymes and cofactors while producing complex, potentially useful
Vincent J. J. Martin, Christina D. Smolke, and Jay D.
Engineered microorganisms are becoming a significant
alternative for synthesizing complicated organic molecules. While there
may be more appeal in discovering new molecules, pathways, and their
corresponding genes, the development of appropriate hosts and expression
systems to produce these molecules must be undertaken simultaneously if
we ever hope to produce them in quantities sufficient for evaluating or,
eventually, using them on a commercial scale. Recent advances in gene
expression systems will certainly play a part in developing optimized
microbial hosts in which to produce many of these molecules.
During the last century, synthetic organic chemistry was
the workhorse of the chemical and pharmaceutical industries for
producing feedstock chemicals, fuels, polymers, and drugs. Though
methods in organic synthesis continue to improve, some chemical
compounds remain difficult to synthesize, particularly those that
contain multiple stereochemical centers. The complexities involved in
producing these compounds are reflected in intricate multistep chemical
syntheses that often result in low yields and potentially toxic chemical
Enzymes, Metabolic Engineering Offer Routes to
Complex Organic Molecules
Over the last two decades, enzymes have become
increasingly popular for catalyzing some of the most complicated organic
chemical transformations. Specialized enzymes allow one to produce
enantiomerically pure molecules, sometimes eliminating one or more steps
in a conventional organic chemical synthesis, unnecessary side
reactions, and the associated use of toxic organic solvents. New and
improved enzymes with broadened substrate ranges and activities,
resulting from recent genome sequencing and prospecting projects, and
also from directed genetic evolution techniques, increasingly enable
more efficient production of diverse organic molecules.
While two or more enzymes may be used together in an in
vitro synthetic train, differences in enzyme activity and stability and
the costs associated with cofactor regeneration typically make these
synthetic metabolic pathways uneconomical for all but the smallest
product quantities. However, what may be difficult and costly in vitro
is often simpler to accomplish in vivo. Redirecting multiple
enzymatically catalyzed reactions to improve production of useful or
novel chemicals, or to remediate toxic chemicals in the environment,
constitutes the field of metabolic engineering. Metabolic engineering,
together with the mathematical and experimental methods it encompasses,
has been used most widely for optimizing the transformation of chemicals by
cells. The principal advantage of cells over multienzyme in vitro
reaction systems is that cellular metabolism replenishes both enzymes
and cofactors, and may even furnish valuable precursors derived from
inexpensive starting materials.
Metabolic engineering of cells often involves
introducing multiple, heterologous genes encoding enzymes needed in the
new pathway. The expression of these genes must be balanced so that no
single enzyme is overproduced and no single enzyme severely limits the
flux through the pathway, both of which could lead to an accumulation of
one or more metabolic intermediates and inefficient use of cellular
resources. Thus, accurate and reproducible control of the expression of
individual genes in the heterologous metabolic pathway is necessary to
maximize production of the desired compound. Unfortunately, most gene
expression systems that have been developed for high-level production of
heterologous proteins lack the finesse required in metabolic
engineering. In this brief review, we discuss some of the recent
developments in gene expression, with emphasis on those techniques most
important for remodeling cellular metabolism.
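The need for balance can be illustrated with a toy model. The Python sketch below is only an illustration under stated assumptions, not a model of any pathway discussed here: it considers a two-step pathway S -> I -> P in which step 1 is assumed to run at its maximal rate and step 2 is assumed to follow Michaelis-Menten kinetics. The function name and all parameter values are hypothetical.

```python
import math


def steady_state_intermediate(e1, e2, kcat1, kcat2, km):
    """Steady-state concentration of intermediate I in S -> I -> P.

    Assumptions (illustrative only):
      * step 1 is substrate-saturated, so its rate is v1 = kcat1 * e1
      * step 2 is Michaelis-Menten: v2 = kcat2 * e2 * I / (km + I)
    Setting v1 = v2 and solving for I gives km * v1 / (vmax2 - v1).
    If step 2 cannot match v1, no steady state exists and I keeps
    accumulating, which is reported here as math.inf.
    """
    v1 = kcat1 * e1
    vmax2 = kcat2 * e2
    if vmax2 <= v1:
        return math.inf
    return km * v1 / (vmax2 - v1)


# Lowering the second enzyme relative to the first raises the
# intermediate pool sharply (arbitrary units).
for e2 in (4.0, 2.0, 1.25, 1.0):
    print(e2, steady_state_intermediate(1.0, e2, 1.0, 1.0, 1.0))
```

In this simplified picture, the intermediate pool grows without bound once the second enzyme's capacity falls to that of the first, which is one way to see why neither over- nor underproduction of a single enzyme is desirable.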
In their early efforts, genetic engineers emphasized the
development of multicopy expression vectors, assuming that protein
production would increase with copy number of the encoding genes. While
high vector copy numbers may be advantageous for producing certain
sought-after proteins, in metabolic engineering, enzymes are sought for
their catalytic properties, not as the end product. Thus,
low-copy-number (or single-copy-number) vectors should be sufficient for
producing enzymes in metabolically engineered hosts. Indeed, most genes
involved in cellular metabolism are located on the chromosome,
suggesting that expression of single-copy genes is sufficient for their
ordinary function. Here, we will focus on two single-copy expression
systems, one based on modifying the chromosome and the other on
bacterial artificial chromosomes.
Four basic methods have been developed to deliver and
insert DNA sequences into the chromosomes of microbial cells (Fig. 1).
One method relies on transposons to insert DNA randomly into
chromosomes, whereas the other three methods involve site-specific
insertions at a predetermined gene or region of a chromosome. To
eliminate selection markers and replicon sequences that are inserted
into the chromosome when any of these techniques is used, several new
methods are available.
Transposons are specific DNA sequences that catalyze
their own movement to alternate sites within a chromosome. Victor
de Lorenzo and his collaborators at the National Biotechnology Center in
Spain developed a series of mini-Tn5 transposon suicide delivery vectors
(pUT system) for use in randomly inserting genes into bacterial
chromosomes. Unlike the native Tn5 transposon, the transposase
gene (tnp) in these vectors is located outside the mobile
element, so that only the insertion sequence termini and the
DNA cloned between them are delivered to the chromosome.
Since the tnp gene is not transposed, the resulting insertion is
not prone to genetic instability and cells containing the mobile element
are not immune to further rounds of transposition. There are two
disadvantages to using transposons in metabolic engineering: (i)
insertion of a transposon into the chromosome may negatively affect
expression of neighboring genes and (ii) the level of expression may
depend on the location of the transposon in the chromosome.
Vectors containing conditional replicons, such as the
temperature-sensitive pSC101 and the λpir-dependent R6K, are
particularly useful for delivering genes into the chromosome of Escherichia
coli. Additional narrow-host-range suicide vectors are being used
for delivering genes into bacterial hosts other than E. coli,
such as Pseudomonas species. In these systems, a specific cloned
DNA sequence is used to target a region of the chromosome for
recombination. In a single crossover event, heterologous DNA inserts
along with the replicon and a selection marker (usually antibiotic
resistance). The replicon and/or the marker is eliminated by a
counter-selection strategy such as by cultivating the modified cells at
an elevated temperature (pSC101) or on sucrose using the sacB
To eliminate the requirement for first cloning a target
sequence into a gene delivery vector, alternative site-specific,
homologous recombination systems were engineered. For instance, one of
these systems uses phage attachment sites (attP) on the delivery
vector that then target specific attachment sites on the chromosome (attB).
Recombination at the attB site may rely on host function or
exogenous expression of the specific phage int gene, which promotes
site-specific integrations. These systems have been developed to
integrate genes at alternative attB sites using attP from various phages
including λ, HK022, Φ80, P21, P22, ΦCTX (Pseudomonas
aeruginosa), and ΦFSW (Lactobacillus casei).
Researchers in several groups are using the λ
bacteriophage (λ Red) and E. coli (RecET) recombination
systems to introduce linear, heterologous DNA into the E. coli
chromosome. These systems exploit the induced hyper-recombinogenic state
of E. coli that arises when the λ bacteriophage exo and bet genes,
or the E. coli recE and recT genes, are transiently
expressed together with the λ phage gam
gene, which inhibits RecBCD activities. In E. coli, recombination
with sequences as short as 40 bp occurs efficiently during this hyper-recombinogenic
state. This technology offers advantages over traditional homologous
recombination in that: (i) recombination is achieved in a single step
(resolution of cointegrants is not necessary), (ii) prior cloning of the
gene of interest is optional (PCR products can be used), and (iii) only
very small regions of homology are required for recombination to take
place. Although these tools were designed for single gene replacements,
they can also deliver multigene expression cassettes to the chromosome.
Cassettes as large as 3.1 kb have been successfully inserted, but the
upper size limit remains to be determined.
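Because only short regions of homology are needed, the homology can be built into the PCR primers themselves. The Python sketch below shows one way such primers might be laid out, assuming 40-nt arms copied from the chromosome sequence on either side of the target region. The function names, the toy chromosome, and the priming sequences are hypothetical placeholders, not sequences from any published protocol.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_recombineering_primers(chromosome, start, end,
                                  fwd_priming, rev_priming, arm_len=40):
    """Lay out primers that add homology arms to a cassette PCR product.

    chromosome   top-strand sequence of the target region (5' to 3')
    start, end   0-based, end-exclusive coordinates of the region to replace
    fwd_priming  3' part of the forward primer, annealing to the cassette
    rev_priming  3' part of the reverse primer, already written 5' to 3'
                 as it anneals to the far end of the cassette
    arm_len      homology arm length (40 nt, following the text above)
    """
    if start > end or start - arm_len < 0 or end + arm_len > len(chromosome):
        raise ValueError("target region leaves no room for homology arms")
    upstream_arm = chromosome[start - arm_len:start]
    downstream_arm = chromosome[end:end + arm_len]
    forward = upstream_arm + fwd_priming
    reverse = reverse_complement(downstream_arm) + rev_priming
    return forward, reverse


# Toy example with placeholder sequences.
toy_chromosome = "A" * 50 + "GGGG" + "C" * 50
fwd, rev = design_recombineering_primers(
    toy_chromosome, 50, 54, "GTGTAGGCTG", "CATATGAATA")
print(fwd)
print(rev)
```

A real design would also check primer melting temperatures and secondary structure, which this sketch leaves out.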
Many chromosome integration systems retain antibiotic
resistance markers or other functional replicon-derived DNA that was
inserted into the chromosome. However, in some of the newer integration
systems, these sequences can be precisely removed once the desired
chromosomal modifications are constructed. Removing antibiotic selection
markers and inserted replicons not only allows for subsequent rounds of
modification without accumulating additional resistance markers, but
also yields modified microbial strains that are virtually free of DNA
sequences that might interfere with subsequent rounds of homologous
recombination. Furthermore, removing replicon sequences sometimes
results in recombinants with improved genetic stability and reduces the
chances of genetic transfer to other organisms. Examples of such
recombinase/site excision systems include Flp/FRT from Saccharomyces
cerevisiae, Cre/loxP from the P1 bacteriophage, ParA/res
from the RP4 plasmid, and Xis/att found in phage.
Artificial Chromosomes Are Being Used for Metabolic
Researchers in several laboratories are working with
extremely low-copy-number or single-copy plasmids that are large and
stable, and behave in effect as additional chromosomes. Examples include
the F plasmid and P1 prophage of E. coli and the TOL and NAH
plasmids of Pseudomonas.
With their ability to harbor very large sequences of DNA
and their relative stability, the replication origins and associated
partition elements of these plasmids make them well suited for use in
manipulating metabolism. Recently, for instance, we developed a
single-copy, narrow-host-range expression vector (bacterial artificial
chromosome) with which to engineer E. coli. This bacterial
artificial chromosome was constructed from a 9-kb region of the E.
coli F plasmid that contains several necessary elements, including
the oriV and oriS origins that ensure cell cycle-specific
replication of the plasmid; the locus that partitions plasmids into
daughter cells at division; the resD gene that recombines plasmid
dimers into monomers; and the ccd genes that kill any plasmid-free
This and other artificial chromosomes are segregatively
stable in the absence of selection pressure, have well-controlled gene
expression, and confer low metabolic burdens on host cells. Indeed,
bacterial artificial chromosomes would seem to be excellent alternatives
to multiple-copy-number plasmids for metabolically engineering E.
coli where extreme stability and low metabolic burden are sought.
Based on their large native size, these plasmids should be capable of
faithfully replicating the many genes that may be needed for
synthesizing complex products.
Fine-Tuning Gene Expression in Metabolically
Stringent control over the timing and level of gene
expression can profoundly affect a metabolically engineered pathway and
may determine its overall success. Achieving stringent control typically
depends on several factors, including choosing promoters of various
strengths, using inducers to control the induction level of a particular
promoter, varying the strength of the ribosome binding site, and
altering the stability of the transcript or enzyme it specifies.
Useful methods of modulating promoter activity include
varying the concentration of a specific inducer, such as with arabinose
and the araBAD promoter, or shifting the metabolic state of the
cell, such as with phosphate starvation and the phoA promoter. A
desirable and important feature of promoters is low (or no) expression
in the noninduced state, with gene expression levels proportional to the
amount of inducer used.
Whether the arabinose-inducible araBAD promoter (PBAD)
and its regulator (AraC) of E. coli are used to control a single
gene or combined with another inducible promoter to control several genes,
they offer inducer-dependent control of gene expression and tight
repression in the inducer's absence. Unfortunately, the araC-PBAD
system and the associated high-capacity, low-affinity L-arabinose
transporter, AraE, display autocatalytic behavior that results in
all-or-none expression in E. coli.
Rather than varying levels of gene expression in
individual cells of the culture, varying arabinose concentrations in the
medium changes the fraction of cells that are fully induced and yields
two subpopulations of cells. Consequently, if PBAD is being used
to control the expression of a gene or genes necessary for synthesizing
a particular product, only one subpopulation of cells will produce that
product. Recently, we showed that expression of araE from an
arabinose-independent (e.g., IPTG-inducible or constitutive) promoter
allows control of gene expression from PBAD in individual
cells, meaning a single, homogeneous population of cells is found at all
inducer concentrations (Fig. 2).
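The difference between the two behaviors can be made concrete with a small Python sketch. It is a deliberately idealized illustration, not a model of the araC-PBAD system: cells are either fully on or fully off in the all-or-none case, and every cell expresses the same intermediate level in the graded case. The function names, the expression units, and the threshold are assumptions.

```python
from statistics import mean


def all_or_none_population(n_cells, induction, full_level):
    """Induction (0 to 1) sets the FRACTION of fully induced cells."""
    n_induced = round(n_cells * induction)
    return [full_level] * n_induced + [0.0] * (n_cells - n_induced)


def graded_population(n_cells, induction, full_level):
    """Induction (0 to 1) sets the expression LEVEL in every cell."""
    return [induction * full_level] * n_cells


def producing_fraction(levels, threshold):
    """Fraction of cells whose expression reaches a given threshold."""
    return sum(level >= threshold for level in levels) / len(levels)


switch_like = all_or_none_population(100, 0.25, 8.0)
graded = graded_population(100, 0.25, 8.0)

# Both cultures have the same average expression...
print(mean(switch_like), mean(graded))
# ...but a very different share of cells above a hypothetical threshold.
print(producing_fraction(switch_like, 1.0), producing_fraction(graded, 1.0))
```

A culture-averaged measurement cannot distinguish the two cases, which is why expression needs to be examined at the level of individual cells.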
To simplify gene expression, it would be valuable to
design cells that regulate the timing and level of expression during
fermentation runs. One approach is to introduce a gene
expression system that senses the metabolic state of the cell, based
on factors such as carbon, energy, nutrients, or stress, and to use it
to regulate expression of a pathway. For example, William R. Farmer and
James C. Liao of the University of California, Los Angeles developed an
elegant autoregulation approach, which they called "metabolic
control engineering," by using the Ntr regulon of E. coli.
They engineered a gene expression control loop to respond to excess
glycolytic flux, during which acetyl phosphate accumulates within cells.
In the absence of NRII, the nitrogen sensor of the Ntr regulon, excess
acetyl phosphate phosphorylates NRI (the response regulator of this
regulon), which, in turn, positively regulates expression of the
recruited glnAp2 promoter in the engineered cell.
Quorum sensing enables a bacterial cell to modify its
behavior in response to signals from other bacteria. This form of
communication between cells is observed in processes such as
bioluminescence, expression of virulence or pathogenicity, stimulation
of competence, and production of antimicrobial agents. Since quorum
sensing allows cells within a population to coordinate their aggregate
behavior, this form of cell signalling might be used for regulating gene
expression within recombinant pathways. For example, Oscar Kuipers and
other microbiologists at NIZO Food Research in Ede, the Netherlands,
recruited the nisin-controlled expression (NICE) system, which responds to the quorum-sensing antibiotic peptide nisin, to
control the onset of gene expression in lactic acid bacteria. With it,
they efficiently rerouted pyruvate metabolism to produce diacetyl, which
yields a buttery aroma in dairy products, and L-alanine.
Coordinating Multigene Expression Systems Adds
Some metabolic engineering applications depend on the
coordinated expression of several genes in a single host cell. These
genes may encode enzymes that are introduced to divert intermediates
from the usual pathway into producing an unusual end product. Under
ordinary circumstances, biochemical pathways operate under control
mechanisms that coordinate expression of genes within particular
pathways, optimizing the flux through the pathway and minimizing its
burden on the cell. In bacteria, for example, such pathways operate
under the control of operons, which coordinate gene expression.
However, transferring an operon from one microorganism
to another and achieving efficient expression of the enzymes within that
operon in a heterologous organism is not a trivial undertaking.
Constructing new operons containing one or several heterologous genes is
also a major challenge. Once an operon is constructed, additional methods are needed
to coordinate and optimize expression of multiple genes in a
One of the most straightforward, energy-efficient ways
to control expression of genes is at the transcriptional level. Some
operons, including the dadAX and cydAB operons of E.
coli, use multiple promoters to achieve coordinated protein
production. Others, such as the gapA operon of Bacillus
subtilis, use additional promoters located between genes to provide
differential control over gene expression.
In metabolic engineering applications, the most direct
approach to constructing a coordinated multistep biochemical pathway
entails cloning each separately introduced gene behind a different
promoter. In this way, the choice and subsequent manipulation of
inducers permits one to control expression of individual genes within
the constructed pathway (Fig. 3). For instance, Christian Solem and
Peter Jensen of the Technical University of Denmark in Lyngby
constructed a library of synthetic constitutive promoters of different
strengths by randomizing the spacer sequences flanking promoter
consensus regions. Synthetic constitutive promoters such as these can be
used to coordinate gene expression if independent control is not
required. This approach may be advantageous in that the engineered
pathway will be expressed at steady state with no need to fine-tune its
expression by adding precise amounts of inducers, some of which are
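The idea of a randomized-spacer promoter library can be sketched in a few lines of Python. This is an illustration only and not the published procedure: it keeps two fixed hexamers and randomizes just the spacer between them, whereas the library described above also varied sequence around the consensus regions. The E. coli sigma-70 consensus hexamers, the 17-nt spacer length, and the function name are assumptions chosen for the example.

```python
import random

# Assumed consensus hexamers (E. coli sigma-70), used for illustration.
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def make_promoter_library(n_variants, spacer_len=17, seed=None):
    """Generate promoter variants with fixed consensus boxes and a
    randomized spacer between them.

    A seed can be supplied so that the same library is regenerated.
    """
    rng = random.Random(seed)
    library = []
    for _ in range(n_variants):
        spacer = "".join(rng.choice("ACGT") for _ in range(spacer_len))
        library.append(MINUS_35 + spacer + MINUS_10)
    return library


for promoter in make_promoter_library(3, seed=1):
    print(promoter)
```

The strength of each variant cannot be predicted from such a sketch; in practice it would have to be measured, for example with a reporter gene.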
Alternatively, coordinated gene expression can be
achieved by directing the mRNA processing within operons. In some
bacteria, several or all of the genes encoding enzymes within a
particular metabolic pathway are under the control of a single promoter.
To compensate for differences in the specific activities of the enzymes
and to prevent intermediates from accumulating, stability differences of
individual coding regions in multicistronic transcripts may give rise to
vastly different enzyme levels even though the genes are under the
control of the same promoter. Technology based on this type of
posttranscriptional control represents an efficient but less direct
method than targeting transcriptional control to coordinate multiple
For example, we developed a technology for coordinating
multiple gene expression based on controlling the longevity of mRNA
species being generated through use of stability control elements. Some
of the control elements tested in this system include 5′ and 3′ RNA
hairpin structures of varying strength (ΔG of folding),
RNase cleavage sites, and gene order in the operon (Fig. 3). We tested
this series of mRNA stability control elements for their ability to
modify the flux through several enzyme-catalyzed steps of a carotenoid
pathway, whose genes were introduced into E. coli. We could alter
individual enzyme levels, thereby altering the flux through this
pathway, yielding different levels of carotenoid intermediates within
these cells, indicating that rational design can be used to control how
much of particular metabolic intermediates will accumulate.
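The link between transcript stability and expression level can be sketched with a simple first-order model. The Python example below is a hedged illustration rather than the model used in the work described: it assumes constant synthesis, first-order decay, and no dilution by growth, so that the steady-state level equals the synthesis rate divided by the decay constant. The function names, gene labels, and half-lives are hypothetical.

```python
import math


def steady_state_mrna(transcription_rate, half_life):
    """Steady-state transcript level under first-order decay.

    From dM/dt = k_tx - k_deg * M, the steady state is M = k_tx / k_deg,
    with k_deg = ln(2) / half_life.
    """
    decay_constant = math.log(2) / half_life
    return transcription_rate / decay_constant


def relative_levels(half_lives, transcription_rate=1.0):
    """Levels of coding regions made from one promoter (same synthesis
    rate), normalized to the most abundant one."""
    levels = {name: steady_state_mrna(transcription_rate, t)
              for name, t in half_lives.items()}
    reference = max(levels.values())
    return {name: level / reference for name, level in levels.items()}


# Hypothetical half-lives (minutes) for two coding regions of one operon.
print(relative_levels({"geneA": 10.0, "geneB": 2.0}))
```

Under these assumptions, coding regions transcribed from the same promoter differ in steady-state level in direct proportion to their half-lives.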
We find that changing gene location and endonuclease
cleavage sites within an operon leads to drastic differences in gene
expression, whereas introducing hairpin structures fine-tunes gene
expression levels. Depending on the type and location of control
elements that are introduced, we can vary relative steady-state
transcript and protein levels 500-fold and 1,000-fold, respectively.
Controlling transcript and protein levels over such a large dynamic
range undoubtedly will prove useful for anyone attempting to engineer
Alternative Control Elements
Work in this field initially focused on multiple
promoter systems and directed mRNA processing as a way of optimizing
multigene expression systems. However, as we learn more about the
various controls cells use to coordinate gene expression, alternative
design strategies that might be applied to these systems will be
discovered. Controlling translation initiation and elongation provides
one such potential means for coordinating gene expression. For example,
native operons sometimes use translational coupling to coordinate the
expression of genes within a multicistronic transcript. In some cases,
this control depends on a secondary structure that sequesters the
ribosome binding site (RBS) of the distal gene, thereby blocking its
translation. However, when the proximal gene is being translated, this
inhibitory structure unfolds, rendering the previously sequestered RBS
accessible. This natural phenomenon might be adapted for use in
producing two proteins within an engineered pathway at equivalent
Protein fusion systems are often used to monitor
production of one protein by tracking a reporter protein fused to it.
This approach could be put to a different use, for example, by fusing
two genes encoding different enzymes in a pathway to balance their
production. Enzyme fusion strategies may solve other problems. For
example, they might provide a means for properly folding otherwise
difficult proteins or for forcing two enzymes of a metabolic pathway
together, providing an efficient means for channeling products of the
first to be substrates of the second.
Deliberately induced molecular breeding based on DNA
shuffling is becoming a widely used technique for rapidly generating
diversity in genes as a way of seeking particular traits. Extensions of
this method from a focus on single genes to entire metabolic pathways
and, in some cases, microbial genomes are now being developed. Applying
this approach to particular biochemical pathways might yield operons
with balanced expression and activity of each enzyme within those
Putting Several Sweeping Analytic Approaches To Use
Technologies for analyzing metabolism on a cell-wide
basis could prove very useful in metabolic engineering. Altering or
engineering cell-wide gene expression (the transcriptome), corresponding
proteins (the proteome), and levels of cellular metabolites (the
metabolome) could be used to improve desired metabolic properties.
Several analytic approaches could play roles in these ambitious efforts,
DNA microarrays that can detect relative transcript
levels in cells are being used to analyze metabolism of several
microorganisms used in industry, including S. cerevisiae, E.
coli, B. subtilis, and Corynebacterium glutamicum.
This technology tracks global changes in mRNA levels in response to
different mutational backgrounds, heterologous protein production,
and growth conditions. However, transcript levels do not always
correlate with protein production, enzyme activity, and metabolic
Two-dimensional gels track changes in global
cellular protein production. Coupled with recent advances in protein
separation and mass spectrometry, these techniques are collectively
referred to as proteomics, bringing analysis an important step
closer to physiology. Thus, changes in one or more proteins, either
endogenous or heterologous, are more likely to affect cellular
Metabolomics, a comprehensive analysis of all
cellular metabolites, would allow one to monitor imbalances in
intermediates in engineered and endogenous pathways. Progress toward
such a sweeping undertaking is limited, and it is based on capillary
electrophoresis, high-performance liquid chromatography, gas
chromatography coupled with mass spectrometry, and nuclear magnetic
Phenotypic microarrays, developed by Biolog, use a
simple, inexpensive colorimetric assay developed in 96-well plates
to analyze thousands of different phenotypes through detection of
changes in cellular respiration.
These analyses produce massive amounts of data to
interpret, representing another major challenge facing those
attempting to optimize cellular metabolism for specific purposes.
Metabolic models for E. coli and various other industrial
microorganisms are being developed to analyze cellular resources
that may be diverted into heterologous metabolic pathways and to
identify bottlenecks. However, such analyses are complex, and
available models typically are designed to analyze specific subsets
of cellular metabolism, or particular growth conditions and
physiological states. Gene expression models will need to be
integrated with metabolic models if we hope to obtain a clear
picture of the effects of changes in metabolism on cellular
Datsenko, K. A., and B. L. Wanner.
2000. One-step inactivation of chromosomal genes in Escherichia coli K-12
using PCR products. Proc. Natl. Acad. Sci. USA 97:6640-6645.
de Lorenzo, V., M. Herrero, J. M.
Sanchez, and K. N. Timmis. 1998. Mini-transposons
in microbial ecology and environmental biotechnology. FEMS Microbiol.
Farmer, W. R., and J. C. Liao. 2000.
Improving lycopene production in Escherichia coli by engineering
metabolic control. Nature Biotechnol. 18:533-537.
Haldimann, A., and B. L. Wanner. 2001.
Conditional-replication, integration, excision, and retrieval plasmid-host
systems for gene structure-function studies in bacteria. J.
Jones, K. L., and J. D. Keasling.
1998. Construction and characterization of F plasmid-based expression
vectors. Biotechnol. Bioeng. 59:659-665.
Khlebnikov, A., O. Risa, T. Skaug, T. A.
Carrier, and J. D. Keasling. 2000. Regulatable
arabinose-inducible gene expression system with consistent control in
all cells of a culture. J.
Solem, C., and P. R. Jensen. 2002.
Modulation of gene expression made easy. Appl. Environ.
Smolke, C. D., T. A. Carrier, and J. D.
Keasling. 2000. Coordinated, differential
expression of two genes through directed mRNA cleavage and stabilization
by secondary structures. Appl. Environ.
Smolke, C. D., V. J. J. Martin, and J. D.
Keasling. 2001. Controlling the metabolic flux
through the carotenoid pathway using directed mRNA processing and
stabilization. Metabol. Eng. 3:313-321.
Zhang, Y.-X., K. Perry, V. A. Vinci, K.
Powell, W. P. C. Stemmer, and S. B. delCardayre. 2002.
Genome shuffling leads to rapid phenotypic improvement in bacteria.