Enzymology 2004 - 75-91
Introduction
Structure-based combinatorial protein engineering (SCOPE) is a process for synthesizing gene libraries that lay the genetic foundation for exploring the relationship between structure and function in the encoded proteins.1 Comparative structural and functional analysis of protein primary, secondary, and tertiary structure generates numerous hypotheses with which to probe the relationship between molecular structure and the ensuing functional readout. SCOPE provides a tool for constructing the gene libraries that encode rationally engineered protein variants, which supply the raw material for testing these hypotheses by both structural and functional analyses. Mechanistic hypotheses generated from structures, whether determined experimentally (by crystallography or nuclear magnetic resonance) or by homology modeling, are used to design oligonucleotides that code for crossovers between genes encoding structurally related proteins. A series of polymerase chain reactions (PCRs), culminating in the selective amplification of crossover products, incorporates the spatial information encoded in the oligonucleotide into a full-length gene and the resulting hybrid protein. Iterating the process enables the synthesis of all possible combinations of the desired crossovers, producing a hierarchical collection of chimeras analogous to a Mendelian population.
1 P. E. O’Maille, M. Bakhtina, and M. Tsai, J. Mol. Biol. 321, 677 (2002).
The principles of the process are generally applicable, and the methodology is easily adapted to a range of experimental objectives. At its inception, SCOPE was developed as a means of generating multiple-crossover gene libraries from distantly related proteins, constituting a homology-independent in vitro recombination approach.1 This article presents the adaptation of SCOPE to the facile combinatorial synthesis of mutant gene libraries. The refinements newly incorporated into the original SCOPE approach illustrate the underlying principles that make it a robust technique for exploring protein sequence space in parallel and in three dimensions. This tertiary information is embodied in the mechanistic and evolutionary underpinnings of protein structure and function, both of which are fundamental aspects of biochemical adaptive change in organisms. SCOPE can also exploit this information for a wide range of applications in biotechnology.
Principles
The construction of gene libraries by SCOPE involves a series of PCRs. Other recombination techniques use multiple primers or random fragments in a single step, thereby carrying out multiple reactions in parallel.2,3 Separating gene synthesis into discrete steps is an essential feature of SCOPE. This simple but critical property makes it possible to control recombination by pairing gene fragments with genes so that only designed, anticipated combinations of crossovers arise. As a consequence, libraries are constructed as a series of less complex mixtures, which reduces their numerical complexity and thus the cost and extent of sampling required during screening, including gene sequencing and functional assays. Crossover locations and the frequency of genetically encoded crossovers are set by experimental design; they are neither dictated nor constrained by homology between genes or by the linear distance between multiple mutations.
An overview of the process illustrates the basic steps of SCOPE-based recombination (Fig. 1). In step I, standard PCR amplification with an internal and external primer pair and the appropriate template DNA produces chimeric gene fragments. Internal primers code for crossovers in the protein-coding region of the genes; they are designed from one or more three-dimensional structures, viewed with reference to the variable sequence space of protein homologues. External primers correspond to the 5′ and 3′ termini of a given gene, as with any typical primer pair used in a standard PCR amplification. The template is a plasmid or PCR product that contains the gene of interest. In step II, in vitro recombination occurs between one or more gene fragments and a new template; in other words, the gene fragments serve as a new set of primers, which anneal and are extended to produce single-stranded, full-length chimeras. In step III, a new external primer set directs the selective amplification of recombination products by virtue of the unique genetic identity encoded at their termini. Repeating steps II and III with various pairs of gene fragments from step I and crossover products from step III produces genetically diverse, multiple-crossover libraries in high yield.
2 W. P. Stemmer, Nature 370, 389 (1994).
3 F. J. Perlak, Nucleic Acids Res. 18, 7457 (1990).
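The logic of steps I to III can be illustrated with a deliberately simplified string model. This is a hypothetical sketch, not the authors' procedure: each character marks the parental origin of a position ("A" for gene A, "B" for gene B) rather than a nucleotide, the two parents are assumed to be aligned and of equal length, and DNA strands and complementarity are not modeled. The function names are illustrative only.

```python
# Toy model of one SCOPE cycle (hypothetical; letters mark parental origin,
# parents assumed aligned and equal in length, strands not modeled).

def step1_fragment(upstream: str, downstream: str, xover: int, half: int) -> str:
    """Step I: PCR with the upstream gene's 5' external primer and a chimeric
    internal primer yields upstream[:xover] followed by `half` positions of the
    downstream gene (the half of the chimeric primer derived from that gene)."""
    return upstream[:xover] + downstream[xover:xover + half]


def step2_extend(fragment: str, template: str, xover: int, half: int) -> str:
    """Step II: the fragment acts as a primer. Its 3' end must match the
    template at the crossover; it is then extended to the template's 3' end."""
    if fragment[-half:] != template[xover:xover + half]:
        raise ValueError("fragment does not anneal to this template")
    return fragment + template[xover + half:]


def step3_select(products: list[str], five_prime: str, three_prime: str) -> list[str]:
    """Step III: external primers amplify only products carrying both termini."""
    return [p for p in products
            if p.startswith(five_prime) and p.endswith(three_prime)]


gene_a = "A" * 20
gene_b = "B" * 20

frag = step1_fragment(gene_a, gene_b, xover=8, half=4)
chimera = step2_extend(frag, gene_b, xover=8, half=4)
# The recombination reaction still contains parental DNA; selection by
# termini keeps only the A-to-B chimera.
selected = step3_select([chimera, gene_a, gene_b], "AAA", "BBB")
print(selected)
```

Feeding `chimera` back in as the upstream sequence or template of another cycle mimics the iteration that builds multiple crossovers. The model also shows why terminal selection fails when every product shares the same termini, as in combinatorial mutant libraries.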
Careful oligonucleotide design is central to the SCOPE recombination process. A discussion of the properties of the synthetic oligonucleotides in relation to specific applications illustrates how SCOPE can be adapted to construct multiple-crossover libraries from distantly related proteins or combinatorial mutant libraries from mechanistically related proteins.
Internal Primers
Shuffling exons or “equivalent” structural elements between homologues requires chimeric oligonucleotides. These are composed of approximately equal halves of two distinct genes and code for a crossover region. An example of their use is illustrated in the process overview (Fig. 1).
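Assembling such a chimeric oligonucleotide is essentially string slicing. The sketch below is a hypothetical helper, not a tool from this work: it joins `half` bases of gene A ending at the crossover to `half` bases of gene B starting there, and returns the reverse complement as well, because in step I the internal primer pairs with an external primer on the opposite strand. Crossover positions are given separately for each gene, since structurally equivalent positions in distant homologues need not share the same index.

```python
# Hypothetical helper for designing a chimeric internal primer.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]


def chimeric_primer(gene_a: str, gene_b: str, xover_a: int, xover_b: int,
                    half: int = 20) -> tuple[str, str]:
    """Return (sense, antisense) primers joining gene_a[:xover_a] to gene_b[xover_b:].

    xover_a and xover_b are crossover positions in each gene's own numbering;
    the crossover sits at the center, with `half` bases from each parent."""
    if xover_a < half or xover_b + half > len(gene_b):
        raise ValueError("not enough sequence on one side of the crossover")
    sense = gene_a[xover_a - half:xover_a] + gene_b[xover_b:xover_b + half]
    return sense, reverse_complement(sense)


# Toy example with 3-base halves; real primers use longer flanks.
sense, antisense = chimeric_primer("ATGAAACCC", "ATGGGGTCA", 6, 6, half=3)
print(sense, antisense)
```

The default of 20 bases per half is an assumption chosen to fall inside the 18- to 24-nucleotide flank guideline given under Primers.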
External Primers
Polymerase chain reaction amplification of mutant or chimeric genes (step III, Fig. 1) is the final step of the SCOPE cycle. As in any conventional amplification, this final step requires a primer set that flanks the target gene. It is often also desirable to include restriction or recombination sites in the final primer set for efficient cloning of the resulting collection of genes. However, a fundamental requirement of SCOPE is that the “proper” primer set be used to amplify a particular crossover product selectively from a recombination reaction, which may contain a mixture of products. In the chimeragenesis of distantly related proteins, the termini of each gene are unique and can be exploited in this way for selective amplification.
Applying SCOPE to the combinatorial synthesis of mutant libraries, where the termini of wild-type genes and crossover products are indistinguishable, required the design of alternative external primers, together with a bookkeeping system for implementing them and for organizing and storing the products hierarchically. Primary amplification primers (PAPs) code for DNA sequences flanking the gene (like any generic external primer) but carry an additional, unique 5′ sequence tag. Using them in gene fragment synthesis (step I) links a unique sequence to a particular mutation. Following recombination, secondary amplification primers (SAPs), which
Terpene cyclases are an ideal proof-of-principle system for exploring the utility of SCOPE, given their (1) unusual mechanism, which employs the conformationally directed production of reactive carbocation intermediates; (2) well-defined three-dimensional structures; (3) ease of product identification and quantification by high-throughput GC-MS analysis; (4) evolutionarily diverse distribution of protein sequences and small-molecule products across multiple kingdoms; and (5) biotechnological potential for the biosynthesis of unique small molecules representing a currently untapped region of natural product space.
Experimental Procedures
Materials
PCR components: 10× cloned Pfu reaction buffer and PfuTurbo DNA polymerase (Stratagene, La Jolla, CA), dNTPs (Invitrogen, Carlsbad, CA), and bovine serum albumin (BSA; New England Biolabs, Beverly, MA). PCRs are carried out in a PTC 200 Peltier thermal cycler (MJ Research, Waltham, MA). All PCR products are purified by gel extraction (Qiagen, Valencia, CA) and cloned into pDONR 207 using Gateway cloning technology (Invitrogen) according to the manufacturer’s recommended conditions. Plasmid DNA from gentamicin-resistant transformants is miniprepped by the Salk Institute Microarray facility for sequencing at the Salk Institute DNA sequencing/quantitative PCR facility. The TEAS cDNA is cloned into pH8GW (an in-house Gateway destination vector), and this plasmid DNA is used as the PCR template.
All PCRs are carried out using a master mix of a standard set of PCR components for a 50-μl reaction:
5 μl of 10× cloned Pfu reaction buffer, to give 1×
1 μl of PfuTurbo DNA polymerase (Stratagene) (2.5 U/μl), to give 0.05 U/μl
0.5 μl of BSA (10 mg/ml), to give 0.1 mg/ml
8 μl of dNTP mix (1.25 mM), to give 200 μM each dNTP
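The final concentrations in this list follow from the dilution relation C_final = C_stock × V_added / V_total. The short sketch below recomputes them from the stock concentrations and volumes given above, and totals the volume the master mix occupies in each 50-μl reaction.

```python
# Final concentration of each master-mix component in a 50-ul PCR,
# using C_final = C_stock * V_added / V_total.
REACTION_UL = 50.0

# (component, stock concentration, volume added in ul, unit), from the list above
COMPONENTS = [
    ("cloned Pfu reaction buffer", 10.0, 5.0, "x"),
    ("PfuTurbo DNA polymerase", 2.5, 1.0, "U/ul"),
    ("BSA", 10.0, 0.5, "mg/ml"),
    ("dNTP mix (each dNTP)", 1250.0, 8.0, "uM"),  # 1.25 mM stock expressed in uM
]


def final_conc(stock: float, vol_ul: float, total_ul: float = REACTION_UL) -> float:
    return stock * vol_ul / total_ul


for name, stock, vol, unit in COMPONENTS:
    print(f"{name}: {final_conc(stock, vol):g} {unit}")

# Volume taken up by the master mix; the rest is primers, template, and water.
master_mix_ul = sum(vol for _, _, vol, _ in COMPONENTS)
print(f"master mix: {master_mix_ul:g} ul per {REACTION_UL:g}-ul reaction")
print(f"scaled to 20 ul: {master_mix_ul * 20 / REACTION_UL:g} ul")
```

Scaled to a 20-μl reaction, the listed volumes total 5.8 μl, which is the master-mix volume used in the Single Mutants/Crossovers procedure. That procedure may therefore be written for a 20-μl reaction, although the text does not state this.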
Primers
Oligonucleotides are from Integrated DNA Technologies (IDT) and are listed in Table I. For both mutagenic and chimeric primers, the mutation(s) or crossover point(s) is located in the center of the oligonucleotide, so that the flanking sequence is complementary to a given gene; for effective PCR, the flanking sequence should ideally be 18 to 24 nucleotides long (or have a Tm ≥ 50°). SAPs are designed to consist of 21 nucleotides and to have a Tm ≥ 55°. PAPs contain 24 bases (in addition to their unique sequence tag) that correspond to partial Gateway attB sites; the remaining attB sequence is incorporated into PCR products by amplification from pH8GW. Tm values are calculated from nearest-neighbor thermodynamic parameters.6
TABLE I
Oligonucleotides Used for SCOPE Combinatorial Mutagenesis
a Bold and underlined characters indicate the sites of designed mutations. Shading indicates part of the attB1 and attB2 recombination sequences; complete attB sites are generated by amplification of the target gene(s) from the destination vector pH8GW.
b Mutagenic primers are named according to the amino acid substitutions they code for.
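The centered-mutation layout described under Primers can be expressed as a small helper. This is a hypothetical sketch rather than the authors' design tool: it replaces one codon (0-based codon index) and takes equal-length flanks of complementary sequence from each side, rejecting flank lengths outside the 18- to 24-nucleotide guideline. It does not compute Tm; the nearest-neighbor calculation cited in the text would be a separate step.

```python
# Hypothetical helper: mutagenic primer with the mutated codon at its center.
def mutagenic_primer(gene: str, codon_index: int, new_codon: str,
                     flank: int = 21) -> str:
    """Return flank + new codon + flank, taken from `gene` around the codon."""
    if len(new_codon) != 3:
        raise ValueError("replacement codon must be 3 nt")
    if not 18 <= flank <= 24:
        raise ValueError("flank outside the 18-24 nt guideline")
    start = 3 * codon_index          # 0-based codon index -> nucleotide offset
    end = start + 3
    if start < flank or end + flank > len(gene):
        raise ValueError("codon too close to a gene terminus for this flank length")
    return gene[start - flank:start] + new_codon + gene[end:end + flank]


toy_gene = "AAA" * 10 + "GGG" + "TTT" * 10   # hypothetical 63-nt sequence
print(mutagenic_primer(toy_gene, 10, "CCC", flank=18))
```

Multiple adjacent substitutions could be handled by passing a longer replacement segment, but this sketch keeps to the single-codon case.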
Gel Electrophoresis
Analysis of PCR fragments and separation of products for gel purification are performed on 2% (w/v) agarose gels in 1× TAE buffer containing 0.1 μg/ml ethidium bromide. Concentrations of PCR products (steps IB and III) are estimated by densitometric comparison to a standard of known concentration, such as the Low DNA Mass Ladder (Invitrogen), using software such as ImageJ (http://rsb.info.nih.gov/ij/).
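Densitometric estimation amounts to fitting a standard curve to ladder bands of known mass and reading the sample band off that curve. The sketch below uses an ordinary least-squares line; the ladder masses, band intensities, and loaded volume are made-up placeholders, not values from this work or from the Invitrogen ladder. A linear fit is only trustworthy within the mass range the ladder bands cover.

```python
# Sketch: standard curve from ladder bands (hypothetical numbers) and
# interpolation of a sample band's mass.
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


ladder_ng = [5.0, 10.0, 20.0, 40.0]                   # hypothetical band masses
ladder_intensity = [510.0, 1000.0, 2030.0, 3990.0]    # hypothetical integrated densities
slope, intercept = fit_line(ladder_ng, ladder_intensity)


def mass_from_intensity(intensity: float) -> float:
    """Invert the standard curve to get band mass in ng."""
    return (intensity - intercept) / slope


sample_ng = mass_from_intensity(1500.0)   # hypothetical sample band
loaded_ul = 2.0                           # hypothetical volume loaded on the gel
print(f"{sample_ng:.1f} ng in band, about {sample_ng / loaded_ul:.1f} ng/ul")
```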
Method
Before library construction, all primers are tested to ensure that they yield unique amplification products of the expected size. As with any standard PCR amplification, cycling parameters may need to be optimized for specific template and primer sets.
6 H. T. Allawi and J. SantaLucia, Jr., Biochemistry 36, 10581 (1997).
c Primary amplification primers are named according to their unique sequence tag (A through F for forward and 1 through 6 for reverse, as listed in Fig. 4) and Gateway recombination sequence (b1 for attB1 and b2 for attB2).
d Unique sequence tags are labeled according to their corresponding primary amplification primer (A through F for forward and 1 through 6 for reverse, as listed in Fig. 4).
Single Mutants/Crossovers
Procedure. Reactions are mixed on ice using the following:
5.8 μl of PCR master mix (as defined earlier)
1 μl of step IB reaction, to give 10 nM (or 1–5 ng/μl) gene fragment
Multiple Mutants/Crossovers
Procedure. Same as just described, except that the gel-purified full-length mutant/chimeric gene (step III product) at 1.0 ng/μl (1 nM final concentration) is substituted for plasmid DNA.
Multiplex Recombination
Procedure. A mixture of gene fragments (step IB products) corresponding to a collection of mutations or alternative crossovers is pooled, and 1 μl (to give 10 nM) is used with either a plasmid or a full-length mutant/chimeric gene (step III product) as the template in a recombination reaction.
Comments. The amount of full-length, single-stranded recombination product generated in step II is limited by the amount of step IB gene fragment added to the reaction mixture. Gene fragments should ideally be present at a 1- to 10-fold molar excess over the plasmid or mutant gene with which they recombine. This is particularly important for single mutants/crossovers, where only one terminus can be exploited for selective amplification in the following step. The plasmid concentration should be kept to a minimum; about 10 pM is the lowest concentration that still gives an amplifiable recombination product in step III.
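Because gel-based estimates are in ng/μl while the recombination guidelines are molar, a conversion step is useful. The sketch below assumes an average mass of about 650 g/mol per base pair of double-stranded DNA (a common approximation; some protocols use 660) and checks whether a fragment-to-template ratio falls in the 1- to 10-fold window stated above. The 1,700-bp product length is hypothetical.

```python
# Sketch: ng/ul -> nM for dsDNA, and a check of the 1- to 10-fold molar excess.
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one base pair of dsDNA


def ng_per_ul_to_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """ng/ul is numerically mg/L; divide by molar mass (g/mol) and rescale
    to nmol/L, giving conc * 1e6 / (length * 650)."""
    return conc_ng_per_ul * 1e6 / (length_bp * BP_MASS_G_PER_MOL)


def fragment_excess_ok(fragment_nM: float, template_nM: float,
                       low: float = 1.0, high: float = 10.0) -> bool:
    """True if the fragment:template molar ratio lies within [low, high]."""
    ratio = fragment_nM / template_nM
    return low <= ratio <= high


# Concentrations stated in the procedures: 10 nM fragment with 1 nM gene.
print(fragment_excess_ok(10.0, 1.0))            # True (ratio of exactly 10)
# Hypothetical 1,700-bp product estimated at 1.0 ng/ul on a gel.
print(f"{ng_per_ul_to_nM(1.0, 1700):.2f} nM")   # 0.90 nM
```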
and 72° for 1 min/kb of product, followed by an additional 10 min at 72° and incubation at 4° at the completion of cycling. Amplification products are verified by agarose gel electrophoresis.
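The surviving part of the cycling program scales extension time with product length. The sketch below encodes only what the text states (72° at 1 min/kb per cycle, a final 10 min at 72°, then a 4° hold); the denaturation and annealing steps are not reproduced here and are deliberately omitted. Rounding up to whole seconds and the 1.7-kb product length are assumptions for illustration.

```python
import math


def extension_seconds(product_bp: int, sec_per_kb: int = 60) -> int:
    """Per-cycle extension time at 72 degrees: 1 min per kb, rounded up."""
    return math.ceil(product_bp * sec_per_kb / 1000)


def program_tail(product_bp: int) -> list[tuple[str, int, float]]:
    """Steps stated in the text as (label, temperature in C, seconds);
    math.inf marks the indefinite 4-degree hold."""
    return [
        ("extension, each cycle", 72, extension_seconds(product_bp)),
        ("final extension", 72, 10 * 60),
        ("hold", 4, math.inf),
    ]


for label, temp_c, secs in program_tail(1700):   # hypothetical 1.7-kb product
    print(f"{label}: {temp_c} C for {secs} s")
```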
Controls
Fig. 4. Recombination units and tagging system. The recombined positions and their
associated unique sequence tags give rise to a naming system to describe the recombination
product created by SCOPE.
where k is the sample size, n is the number of unique members, and p is the probability that a sample of size k contains at least one representative of each unique member. As complexity increases, so does the amount of oversampling required to reach the same probability of screening the complete library, where oversampling refers to the sample size (k) expressed as a multiple of library complexity (n). This relationship is shown graphically in Fig. 5.
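The equation these definitions refer to is not reproduced in this section, so the sketch below uses the standard “coupon collector” model instead, and it may differ from the authors' formulation. The model assumes that clones are sampled with replacement and that all n members are equally represented. Under those assumptions, p equals the inclusion–exclusion sum over j = 0..n of (−1)^j C(n, j)(1 − j/n)^k. The sketch computes the same quantity with a Markov chain over the number of distinct members seen, which avoids the cancellation that the alternating sum suffers in floating point for large n. It then finds the smallest k that reaches a target p.

```python
def _draw(dist: list[float], n: int) -> list[float]:
    """Advance the distribution over 'distinct members seen' by one clone."""
    new = [0.0] * (n + 1)
    for m, pm in enumerate(dist):
        if pm:
            new[m] += pm * m / n                  # clone is a member already seen
            if m < n:
                new[m + 1] += pm * (n - m) / n    # clone is a new member
    return new


def prob_complete(n: int, k: int) -> float:
    """p: probability that k clones include all n equally frequent members."""
    dist = [1.0] + [0.0] * n                      # nothing seen before sampling
    for _ in range(k):
        dist = _draw(dist, n)
    return dist[n]


def min_sample_size(n: int, p: float) -> int:
    """Smallest k with prob_complete(n, k) >= p (requires 0 < p < 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    dist, k = [1.0] + [0.0] * n, 0
    while dist[n] < p:
        dist = _draw(dist, n)
        k += 1
    return k


for n in (10, 100):
    k = min_sample_size(n, 0.95)
    print(f"n = {n}: k = {k} ({k / n:.1f}-fold oversampling)")
```

Under this model, the oversampling factor k/n at a fixed p grows roughly with the logarithm of n, which matches the trend described above. Real libraries with unequal member frequencies would need more sampling than this idealized estimate.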
Each iteration of the process ends with a conventional PCR amplification step, so additional mutations accumulate over multiple iterations. The overall frequency of nondesired additional mutations in the population analyzed is 5.5%. No strong bias in the type of error or its location within the gene was observed. The nondesired mutation rate after the first round was 2.67%, which matches previous measurements of Pfu error frequency.7 However, the random mutation rate increases as a function of
7 J. Cline, J. C. Braman, and H. H. Hogrefe, Nucleic Acids Res. 24, 3546 (1996).
Library Analysis
More than 600 colonies from discrete mixtures, representing about half of the complexity of the TEAS library (241 unique members), were picked and sequenced. A summary of the results is listed in Table II. Of the clones sequenced, only 24 (3.5%) carried wild-type genes. This library was synthesized before the DpnI restriction step (described earlier) was added; although the efficiency of the first round of mutagenesis was 80%, the overall efficiency of the entire process reached 96.5%.
TABLE II
Sequence Analysis Results
Library statistics
Concluding Remarks
Adapting SCOPE to combinatorial mutant library design and construction demonstrates the broader utility of these library construction principles. While various techniques have been developed for either homology-independent recombination or combinatorial mutagenesis, none can efficiently do both. SCOPE provides an effective means of creating libraries that span either global or local sequence space, as demonstrated by the synthesis of DNA libraries representing the genetically encoded information spanned by distant homologues1 or by closely related members of a gene family.
Acknowledgments
We are grateful to the National Institutes of Health for the grants that supported this
work (GM43268 to M.D.T. and GM54029 to J.C. and J.P.N.). P.E.O. is an NIH Postdoctoral
Research Fellow (GM069056-01). Additionally, we thank Marina Bakhtina and Brandon
Lamarch for valuable consultations during the early phases of this work.
Introduction
Current strategies for constructing combinatorial gene libraries for directed evolution experiments generally use cassette mutagenesis1,2 to insert library modules3–6 into plasmids. We have applied this technique in a variety of formats to investigate chorismate mutase, a key enzyme in the biosynthesis of aromatic amino acids.7 Active variants are selected directly from gene libraries transformed into a chorismate mutase-deficient Escherichia coli strain (Fig. 1).8 Because catalytic activity is an extremely sensitive probe of protein integrity, a wealth of information on structural and functional aspects of this enzyme can be derived from sequence patterns in the selected variants.9
The extent of randomization of the gene library cassettes depends on the questions asked. For instance, to investigate the roles of individual active-site residues, one or two codons were randomized at a time.8,10 When loops connecting secondary structural elements were (re-)designed, we opted for formats mutagenizing three to seven codons
1 J. A. Wells, M. Vasser, and D. B. Powers, Gene 34, 315 (1985).
2 J. F. Reidhaar-Olson and R. T. Sauer, Science 241, 53 (1988).
3 S. Kamtekar, J. M. Schiffer, H. Xiong, J. M. Babik, and M. H. Hecht, Science 262, 1680 (1993).
4 G. Cho, A. D. Keefe, R. Liu, D. S. Wilson, and J. W. Szostak, J. Mol. Biol. 297, 309 (2000).
5 S. V. Taylor, K. U. Walter, P. Kast, and D. Hilvert, Proc. Natl. Acad. Sci. USA 98, 10596 (2001).
6 T. Matsuura, A. Ernst, and A. Plückthun, Protein Sci. 11, 2631 (2002).
7 E. Haslam, “Shikimic Acid: Metabolism and Metabolites.” Wiley, Chichester, UK, 1993.
8 P. Kast, M. Asif-Ullah, N. Jiang, and D. Hilvert, Proc. Natl. Acad. Sci. USA 93, 5043 (1996).
9 S. V. Taylor, P. Kast, and D. Hilvert, Angew. Chem. Int. Ed. 40, 3310 (2001).
10 P. Kast, J. D. Hartgerink, M. Asif-Ullah, and D. Hilvert, J. Am. Chem. Soc. 118, 3069 (1996).