The ISME Journal (2008) 2, 171–179 © 2008 International Society for Microbial Ecology All rights reserved 1751-7362/08 $30.00
Enzyme improvement in the absence of structural knowledge: a novel statistical approach
Yoram Barak1,4, Yuval Nov2,4, David F Ackerley1,3 and A Matin1
1Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA, USA; 2Department of Statistics, University of Haifa, Haifa, Israel and 3School of Biological Sciences, Victoria University of Wellington, Wellington, New Zealand
Most existing methods for improving protein activity are laborious and costly, as they either require knowledge of protein structure or involve expression and screening of a vast number of protein mutants. We describe here a successful first application of a novel approach, which requires no structural knowledge and is shown to significantly reduce the number of mutants that need to be screened. In the first phase of this study, around 7000 mutants were screened through standard directed evolution, yielding a 230-fold improvement in activity relative to the wild type. Using sequence analysis and site-directed mutagenesis, an additional single mutant was then produced, with 500-fold improved activity. In the second phase, a novel statistical method for protein improvement was used; building on data from the first phase, only 11 targeted additional mutants were produced through site-directed mutagenesis, and the best among them achieved a >1500-fold improvement in activity over the wild type. Thus, the statistical model underlying the experiment was validated, and its predictions were shown to reduce laboratory labor and resources.
The ISME Journal (2008) 2, 171–179; published online 22 November 2007
Subject Category: microbial engineering
Keywords: protein design; Nov–Wein model; directed evolution; rational design
Correspondence: A Matin, Department of Microbiology and Immunology, Sherman Fairchild Science Building, Stanford University School of Medicine, 299 Campus Drive W, Stanford, […]
These authors contributed equally to this work.
Received 4 June 2007; revised 8 October 2007; accepted 9 October 2007

Introduction

Improving the activity of a protein by manipulating its sequence, a process termed protein design, is of great interest in medicine and biotechnology, and has been widely practiced. However, the sequence space is 'more than astronomically' vast […], and it is not experimentally feasible to test all possible mutants of a protein, nor is it necessary, since many of the resulting sequences do not fold into functioning […].

One mutagenesis approach, termed rational design, uses information about the three-dimensional structure of the protein and its target molecule to identify promising sequence changes. Thus, […] E. coli nitroreductase, NfsB, for prodrug reduction by targeted changes in amino acids around its active site; several other such structure-based improvements […]. […] protein is expensive, laborious and time consuming, and activity predictions based on structure are limited in their success. Thus, design methods that do not rely on structural knowledge are needed, not only for proteins whose structure is not known but also where structural information is available, since activity may be influenced by amino acids not […].

An alternative to structure-based rational design is directed evolution, a selective process that mimics nature, whereby a protein is 'bred' through successive generation of gene libraries; the members of these libraries are randomly mutated and shuffled, and their resulting proteins are then screened for improved activity. Common methods for generating such libraries include error-prone PCR and recombination between homologous re[…]
Directed evolution is widely practiced and has produced important results, yet it typically necessitates expression, purification and screening of thousands of protein mutants. In addition, directed evolution is a 'blind' process, and it is virtually impossible to mathematically predict its […].

A third approach to protein design models the relation between the sequence of a protein mutant and its activity (fitness) as a statistical relationship. That is, one assigns a distribution of activity levels to each protein mutant, rather than a single predicted activity, and can thereby specify probabilities for the various activity levels. Among the models that belong to this class are the NK model […] the NK models are Gaussian, and most of the regression-based methods are implicitly Gaussian, as they assume a Gaussian distribution of the errors when computing confidence intervals, P-values, etc. The statistical approach to protein design circumvents the need to decipher a protein's structure and promotes identification of promising mutant candidates, thus significantly reducing the number of […] in which the activity of bacterial halohydrin dehalogenase was significantly improved to meet design criteria in the commercial production of atorvastatin (Lipitor), a cholesterol-lowering drug. The enzyme was optimized through a statistical analysis method termed protein sequence activity relationship, combined with directed evolution and […].

We report here a successful first empirical application of a novel method belonging to the last-mentioned class; the method is based on a statistical model for the sequence–activity relationship proposed by Nov and Wein (hereafter referred to as 'the model'), whose theoretical and mathematical details […]. Briefly, this model is additive, in the sense that it assumes that, after proper transformation of the data, the change in activity caused by a multiple-residue mutation roughly equals the sum of the activity changes caused by the corresponding single-residue mutations; the degree of non-additivity is captured through one of the model's parameters. The model is sparse in parameters and mathematically tractable, conveniently allowing one to update the activity distributions of the yet-unexplored mutants from the sequence–activity data of tested mutants. In addition to their sequence–activity relationship model, Nov and Wein suggested an optimization module for selecting promising mutant candidates; a variant of this module was used in this study. The relevant aspects of the model used in this study are presented in the Materials and methods section.

The improvement efforts targeted the E. coli enzyme ChrR, an NAD(P)H-dependent oxidoreductase of unknown structure, which has a wide […] several beneficial activities such as chromate and uranyl (U(VI)) reduction (useful in the bioremediation […]) and prodrug reduction (useful in cancer chemotherapy […]). Improvement in all three activities is […].

Materials and methods

Strains, plasmids, genes, primers and growth conditions
Supplementary Table 1 lists the strains, plasmids and primers used in this study. The various strains were grown at 37 °C to mid-exponential phase, induced by 0.5 mM isopropyl-β-D-thiogalactoside and incubated overnight for protein production.

Routine DNA manipulations were performed as […] was carried out by miniprep (Qiagen Inc., Valencia, […] Corporation, CA, USA, using appropriate primers […].

Directed evolution of the chrR gene for improving […]
Error-prone PCR was used to introduce random […] using the GeneMorph II Random Mutagenesis kit (Stratagene Corporation, La Jolla, CA, USA). Forward and reverse chrR primers (Supplementary Table 1) were used to amplify full-length hybrid […]. The shuffled genes were ligated into the pET28a+ plasmid and transformed into E. coli BL21 (DE3) (Invitrogen Inc., Carlsbad, CA, USA) to allow overexpression. Recombinants were selected on plates containing kanamycin (50 μg ml−1). High-throughput screening of 7000 recombinants was performed by inoculating colonies into individual wells of 96-well microtiter plates containing 200 μl Luria–Bertani medium and kanamycin. After growth to stationary phase (overnight incubation, final A660, 1–1.5), 20 μl aliquots from each well were used to inoculate a second series of plates, using M9 minimal medium (Sigma Inc., St Louis, MO, USA). Each well received the same initial inoculum. The first set of plates was stored at −80 °C after addition of glycerol. Cells in the second inoculation series were allowed to grow to mid-exponential phase and then exposed to 0.5 mM isopropyl-β-D-thiogalactoside to induce recombinant gene expression. After overnight incubation, cells were lysed by addition of 30 μl BugBuster (Novagen Inc., San Diego, CA, USA),
incubated for 20 min at room temperature, and centrifuged for 20 min at 3000 g. Supernatant (100 μl) was mixed with 10 μl solution of the […] 2 mM NADH, 100 mM Tris-HCl (pH 7) and ddH2O […], and chromate reduction was assayed as described below. The most efficient enzymes for Cr(VI) reductase activity were purified on nickel columns, as previously […] obtained from the frozen plates. Protein concentrations were determined with the Bio-Rad Dc protein assay kit, using bovine serum albumin as a standard.

Table 1 First phase: the effect of sequence changes on chromate reductase activity (Vmax) of the E. coli ChrR protein
Abbreviation: WT, wild type. Mutants ChrR6–ChrR20 were obtained through directed evolution, and ChrR21 was produced through site-directed mutagenesis.

[…] were used for site-directed mutagenesis. These were designed to create single-codon mutations following […] the desired mutations had been generated was obtained by sequencing. Proteins encoded by the modified genes were generated as described above.

Determination of Cr(VI) reduction rates
Cell extract preparation and chromate reductase assays were conducted as described previously […] of enzyme activity were performed at pH 7 and at […] °C. Each assay was conducted four times unless […]

Reductive prodrugs become strong killing agents of biological cells upon reduction. The capacity of the mutant enzymes to carry out this reduction was determined, with minor modifications, as previously described […]. Briefly, prodrug reduction mixtures contained mitomycin C, CB 1954 (5-aziridinyl-2,4-dinitrobenzamide) or 17-AAG (17-allylamino-17-demethoxygeldanamycin) at a concentration of 15 mM, 10 mg ml−1 pure enzyme, 50 mM NADPH and Dulbecco's modified Eagle's medium. Following prodrug reduction for 30 min at 37 °C, 0.5 ml of JC breast cancer cells (~0.5–1 × 10⁵) were added, and the cells were incubated for an additional 24 h. After this incubation, 20 μl of the color reagent, CellTiter 96 AQueous One (Promega Inc., Madison, WI, USA), was added to 100 μl aliquots of the reaction mixture. Following 1 h of further […]

For selected mutant enzymes, uranyl reductase activity was also determined. This was carried out […] were collected after incubation for the specified time. A 120 μl sample was mixed with 130 μl of a reagent mixture containing a 5:1:1:1:5 proportion of complexing solution, TAC (2-(2-thiazolylazo)-p-cresol), Triton X-100 (0.15 M), CTAB (N-cetyl-N,N,N-trimethylammonium bromide) and triethanolamine buffer (pH 6.5). The method depends on TAC binding to U(VI), which is aided by Triton and CTAB. After 15 min of color development, the samples were read at A588 nm using a Micro-Plate Reader (ASYS […]

[…] estimation of the model parameters, as well as other scientific programming, was carried out with MATLAB (The MathWorks Inc., Natick, MA, USA).

The model has four parameters: the drift $\mu$, which is the expected change in fitness due to introduction of a new, arbitrary mutation (a negative number, as mutations more often decrease than increase fitness); the site variance $\sigma^2_S$, which is the variance of the change in expected fitness contribution due to a mutation across sites; the residue variance $\sigma^2_R$, which is the variance of the fitness contribution of a specific single-residue mutation within a site; and the non-additivity variance $\sigma^2_N$, which captures both the degree of non-additivity and the level of
measurement noise (as in all additive models, these two effects cannot be distinguished from one another). For a thorough presentation of the model, see […]. As explained below, a variant of the model with only three parameters was used in this study.

[…] substrate converted per milligram protein per minute) was taken to equal $\log_{10}(V_{max}/v_{wt})$, where $v_{wt}$ is the Vmax value of the wild-type enzyme (which was […]). […] goodness-of-fit of the data to the model, and set the fitness of the wild type to 0, as required by the model.

The 16 mutant proteins sequenced in the first phase involved mutations in $n = 11$ sites (A44, D103, V120, Y128, G150, Q153, N154, T160, Q175, Q184, K187). Only one of these sites, Q175, had more than one substituent amino acid (Q175L and Q175H), of which the latter appeared in only one sequence, ChrR17. To improve the numerical stability of the estimation computations, ChrR17 was omitted from the data, so that only 15 sequences were used; otherwise, the parameter $\sigma^2_R$ would have appeared in only two entries of a 16 × 16 covariance matrix. It is for this reason that a three-parameter version of the model was used, employing the […] the model is a Gaussian random field $F = \{F_s\}$, where the index set of $F$ consists of all $2^{11} = 2048$ sequences that may be generated from the genetic diversity of the 15 mutants found in the first phase. The joint distribution of the elements of $F$ is given by the […]

where $d(s, \hat{s})$ is the number of sites in which a sequence $s$ differs from the wild-type sequence $\hat{s}$, and $M(s, s')$ is the number of sites in which both sequences $s$ and $s'$ differ from the wild-type sequence. In this three-parameter form (but not in the full four-parameter form), the model is similar to a regression model with random coefficients […] without an intercept, in which the predictors are binary variables indicating the presence or absence of a mutation, their coefficients are $N(\mu, \sigma^2_S)$ random variables, and the variance of the error terms is $\sigma^2_N$. As no prior distribution is assumed over the parameters, the model is not Bayesian.

The parameters of the model were estimated by the maximum likelihood method. Specifically, $\mu$, $\sigma^2_S$ and $\sigma^2_N$ were initially estimated to be the maximizers of the log-likelihood

$$\ell(\mu, \sigma^2_S, \sigma^2_N) = -\frac{r}{2}\log(2\pi) - \frac{1}{2}\log\det\Sigma_1 - \frac{1}{2}(F_1 - m_1)^{\top}\Sigma_1^{-1}(F_1 - m_1), \qquad (4)$$

where $F_1$ is the log-transformed $r$-vector ($r = 15$) of the 15 Vmax values of the mutant proteins (excluding the wild type and ChrR17), $m_1$ is its mean vector (computed according to Equation (1)) and $\Sigma_1$ is its $r \times r$ covariance matrix (computed according to Equations (2) and (3)). The resulting estimates were $\sigma^2_S = 0.4861$ and $\sigma^2_N = 0.1478$. The estimate of the third parameter, $\mu$, was positive, in contrast to the model's assumptions. This finding was expected: the sequences obtained in the first phase were not a random sample from the sequence space, in which a priori most mutations are expected to be deleterious (corresponding to a negative $\mu$); rather, these sequences were chosen by the selective directed evolution process because of their improved fitness, and thus carry seriously distorted information about $\mu$. Therefore, for fitness prediction purposes (see below), only the two estimated […] value of $\mu$ was varied, in steps of 0.2, across […].

For the second application of the model predictions, after the activity information of the first round's five mutants became available, the parameters were re-estimated in a similar way, using $r = 15 + 5 = 20$ in Equation (4) and appropriately modified $F_1$, $m_1$ and $\Sigma_1$. The resulting estimates were $\sigma^2_S = 0.4361$ and $\sigma^2_N = 0.1961$, very similar to those […].

By the additivity of the model, the conditional expected fitness of a sequence $s$ given the data, $E(F_s \mid F_1)$, is the sum of the conditional expected fitness contributions from each of the 11 mutated sites. The contribution from a site having the wild-type residue is 0 (and hence so is its expected contribution), and that of a site $i$ with a non-wild-type residue is a random variable $f_i$. The conditional expected value of the vector $f = (f_1, \ldots, f_n)$ is

$$E(f \mid F_1) = m + \Sigma_2\,\Sigma_1^{-1}(F_1 - m_1), \qquad (5)$$

where $m$ is a constant $n$-vector having all of its elements equal to $\mu$, $\Sigma_1^{-1}$ is the inverse of $\Sigma_1$, and $\Sigma_2$ is an $n \times r$ matrix having $\sigma^2_S$ as its $(i, j)$th entry if mutant $j$ had a mutation at site $i$, and 0 otherwise.

In the first round of the second phase, the […] correspond to a proportion of non-additive variance of $0.1478/(2 \times 0.4861 + 0.1478) = 0.132$ among double
mutants, which is low enough to allow reliable predictions. As mentioned above, the value of $\mu$ was not estimated from the data, and was varied from −1.5 to −0.1. For each value of $\mu$, the $n$-vector $E(f \mid F_1)$ was computed according to Equation (5) (see Supplementary Table 2), and the conditional expected fitness values of all possible $n(n-1)/2 = 55$ double mutants were calculated. Among these, the five double mutants with the highest expected fitness (averaged across all $\mu$, and not including sequences already in the data set) were identified ([…] column). The sequences for the second round were chosen in the same way, with the appropriate changes to $r$, $F_1$, $m_1$ and $\Sigma_1$. Since only triple-residue mutants were considered in this round, the mutants chosen were the top ones, in terms of conditional expected fitness, among all $n(n-1)(n-2)/6 = 165$ […]

Results

A two-phase strategy for ChrR improvement was employed: a 'blind' directed-evolution approach in the first phase, and model-based predictions to obtain further improvement in the second. In the first phase, ChrR protein mutants were obtained by subjecting the chrR gene to three rounds of error-prone PCR. Each round was followed by screening the resulting mutant proteins for chromate reductase activity, using a colorimetric method that provides an approximate indication of the degree of improvement in this activity. Around 6000 mutants were screened. The top 15 mutant proteins were purified […] reduced per milligram protein per minute) was […] reduction (>25-fold improvement) compared to the wild-type enzyme, the best, ChrR13, showing a Vmax of 67 500, corresponding to about 230-fold improved […].

Sequence analysis revealed that the Y128N substitution was common to almost all of the improved mutants isolated in this phase, so an additional mutant, containing the single mutation Y128N, was generated through site-directed mutagenesis. The resulting mutant, ChrR21, outperformed all other first-phase mutants in chromate reductase activity, exhibiting a 500-fold improvement over the wild type. An additional (fourth) round of directed evolution, using DNA from ChrR6 to ChrR21 as template and screening around 1000 variants, did not yield any further improvement.

The second phase of the study consisted of applying the model to the sequence–activity data […] estimated from the entire information of […] and the sequences of the five most promising double mutants (that is, the five mutants that possess the highest conditional expected activity among those differing from the wild type in two amino acids) were mathematically identified (ChrR22 to ChrR26). These mutant proteins were generated in pure form by site-directed mutagenesis and nickel column purification as described, and their Vmax for chromate reduction was […] ChrR23, exhibited a Vmax of 258 000, corresponding to an 876-fold improvement in activity over the wild type, around fourfold improvement over ChrR13 (the best mutant obtained in four rounds of directed evolution, which necessitated screening of 7000 mutants), and a 1.75-fold improvement over ChrR21 (the best mutant isolated in the first phase). In addition, the average Vmax of mutants ChrR22–ChrR26 was significantly higher than the average Vmax of the first-phase mutants ChrR6–ChrR21 (104 000 vs 26 000; P = 0.0084 in a one-tailed Mann–Whitney test for median comparison).

Table 2A First round of second phase: sequence and Vmax activity of five mutants predicted by the Nov–Wein model to have improved chromate reductase activity
Abbreviation: WT, wild type. Predictions were based on the sequence–activity data of […]

To further improve the ChrR enzyme, we conducted a second screening round according to the model predictions. The parameters of the model were re-estimated, using the sequence–activity data […] seven most promising triple mutants were identified. One of these mutants could not be generated, but the remaining six were produced as described above, and their chromate reductase Vmax values were measured […]. Strikingly, one of these, ChrR30, exhibited 1554-, 6.6- and 3.1-fold improvements over the wild type, ChrR13 and ChrR21, respectively. Thus, by screening just a few mutants, a multifold enhancement was obtained in an enzyme that had already been improved to a large degree. The aggregate average Vmax of the 11 mutants ChrR22–ChrR32 (117 000) was also significantly higher than that of the first-phase mutants ChrR6–ChrR21 […]

Previous results had shown a positive correlation between chromate reductase activity and other […]. We therefore examined three of the most active chromate reductase mutants (ChrR21, ChrR23 and ChrR30) in two additional respects, namely, prodrug and U(VI) reduction. The capacity of the mutants to reduce prodrugs was determined by the efficiency with which they killed cells of the JC breast cancer cell line. Three prodrugs, namely, mitomycin C, CB 1954 and 17-AAG, were used. All three mutants were more potent than the wild-type enzyme in activating each of the drugs and in causing the drug-mediated killing of the cells […]. This activity correlated, by and large, with improved chromate reductase activity for each mutant; ChrR30 being […]

The three mutants also exhibited improved uranyl reductase activity compared to the wild-type enzyme […] activity, no further improvement in this activity was shown by the other mutants over ChrR21.
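The prediction step described in Materials and methods can be sketched in a few lines of NumPy. That step evaluates Equation (5), averages over the grid of $\mu$ values and ranks the untested double or triple mutants. This is a minimal sketch, not the authors' MATLAB code, and it rests on one stated assumption. It reads the three-parameter model literally as the random-coefficient regression described there: $F = X\beta + \varepsilon$ with $\beta_i \sim N(\mu, \sigma^2_S)$ and $\varepsilon_j \sim N(0, \sigma^2_N)$. Under this reading, $m_1 = \mu\, d(s, \hat{s})$, and $\Sigma_1 = \sigma^2_S M(s, s')$ plus $\sigma^2_N$ on the diagonal. A double mutant then has variance $2\sigma^2_S + \sigma^2_N$, which matches the non-additive fraction $0.1478/(2 \times 0.4861 + 0.1478)$ quoted in Materials and methods. The original Equations (1)–(3) may nonetheless differ in detail. The function names `conditional_site_effects` and `rank_candidates` are hypothetical, and the toy data are invented for illustration; they are not the ChrR measurements.

```python
import itertools

import numpy as np


def conditional_site_effects(X, F1, mu, sig2_S, sig2_N):
    """E(f | F1) for the per-site fitness contributions (Equation (5)).

    Assumed reading of the three-parameter model:
    F = X @ beta + eps, beta_i ~ N(mu, sig2_S), eps_j ~ N(0, sig2_N).

    X  : (r, n) 0/1 array; X[j, i] = 1 if tested mutant j is mutated at site i.
    F1 : (r,) array of log10(Vmax / v_wt) values for the tested mutants.
    """
    r, n = X.shape
    m1 = mu * X.sum(axis=1)                           # mu * d(s, s_hat)
    Sigma1 = sig2_S * (X @ X.T) + sig2_N * np.eye(r)  # sig2_S * M(s, s') + noise
    Sigma2 = sig2_S * X.T                             # (n, r) cross-covariance
    m = np.full(n, mu)
    # m + Sigma2 Sigma1^{-1} (F1 - m1), using a solve instead of an explicit inverse
    return m + Sigma2 @ np.linalg.solve(Sigma1, F1 - m1)


def rank_candidates(X, F1, mus, sig2_S, sig2_N, k, top=5):
    """Rank untested k-site mutants by conditional expected fitness,
    averaged over a grid of drift values (mu is not estimated)."""
    f_bar = np.mean(
        [conditional_site_effects(X, F1, mu, sig2_S, sig2_N) for mu in mus], axis=0
    )
    tested = {tuple(int(i) for i in np.flatnonzero(row)) for row in X}
    # By additivity, a candidate's expected fitness is the sum of its site effects.
    scored = [
        (float(f_bar[list(combo)].sum()), combo)
        for combo in itertools.combinations(range(X.shape[1]), k)
        if combo not in tested
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top]


# Hypothetical toy data: 4 sites, 5 tested mutants (NOT the ChrR data set).
X = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])
F1 = np.array([0.5, 2.7, 2.8, 0.2, -0.3])
mus = np.linspace(-1.5, -0.1, 8)  # -1.5, -1.3, ..., -0.1 in steps of 0.2
best = rank_candidates(X, F1, mus, sig2_S=0.4861, sig2_N=0.1478, k=2)
for score, combo in best:
    print(combo, round(score, 2))
```

With these hypothetical inputs, the top-ranked untested pair combines sites 1 and 2, because site 1 carries by far the largest conditional effect. Under this reading, $E(f \mid F_1)$ is affine in $\mu$. Averaging over the symmetric grid from −1.5 to −0.1 therefore gives the same site effects as evaluating at its midpoint, $\mu = -0.8$. Passing `k=3` gives the triple-mutant round; for $n = 11$ sites there are 55 pairs and 165 triples.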
Table 2B Second round of second phase: sequence and Vmax activity of six additional mutants chosen according to the […]

Discussion

Directed evolution has resulted in the successful generation of many improved proteins, but this approach is blind, laborious and time consuming. Typically, the improvements achieved in the early rounds of directed evolution are significant, but in later rounds, even when a large number of further mutants is screened, improvements become smaller and less frequent. For example, […] and […] were able to improve the thermostability of lactate oxidase by 18-fold after screening around 3000 mutants, but had to screen more than 20 000 additional mutants for a twofold further improvement. Since mutations are more often deleterious to protein activity, it has been thought that an increased mutation rate was likely to correlate with loss of function, and a rate of 1–3 mutations per gene was considered desirable. Recently, however, this notion has been questioned: […] and […] have shown that libraries with higher mutation rates (15–30 mutations per gene) have a better probability of generating improved […]. The mutation rate employed in our experiment was low (1–5 per gene), and therefore the lack of improvement in the fourth round of directed evolution might be explained by the 'masking' of beneficial mutations by deleterious ones.
Figure […] The effect of mitomycin C, CB 1954 (5-aziridinyl-2,4-dinitrobenzamide) and 17-AAG (17-allylamino-17-demethoxygeldanamycin) on the killing of JC breast cancer cells in the presence of the wild-type or the evolved enzymes (10 mg ml−1). The concentration of the drugs was 15 mM. The enzymes were incubated with the drug for 30 min (37 °C), followed by the addition of the cells. After 24-h incubation (37 °C), cell viability was determined as described in the Materials and methods.

Table 3 Uranyl reduction kinetics of selected evolved mutants

While we could have obtained further improvement in ChrR activity using the directed evolution process by screening a large number of additional mutants (perhaps 20 000 or more) in later rounds, our use of the Nov and Wein model clearly afforded a significant saving in screening effort in these later stages. To put this in perspective, it was necessary to screen around 7000 mutants in four rounds of directed evolution (the last of which yielded no additional improvement) to obtain a 230-fold increase in ChrR activity; in contrast, the model made
it possible to improve the enzyme significantly further (>6-fold improvement over the best mutant obtained by directed evolution and >1500-fold improvement over the wild type) by screening only 11 targeted new mutants. This saving in screening is especially attractive when the screening cost is high compared to the cost of producing site-directed […]

[…] how a statistical model can augment directed evolution to significantly improve the cyanation […]. Although both studies employed additive statistical models coupled with traditional techniques, their results do not permit easy comparison, since (a) enzyme activity was measured differently in the two studies and (b) it is not known which of the two enzymes is more amenable to optimization. However, Fox et al. improved activity by ~4000-fold after screening more than 500 000 mutants in 18 rounds, while in the present work we achieved ~1500-fold activity improvement after screening ~7000 variants in six rounds. Furthermore, as the structure of the halohydrin dehalogenase enzyme is known, some of the diversity in the experiments of Fox et al. was generated through rational design. In this work, no structural knowledge was used, as the structure of the ChrR enzyme is unknown. Both studies demonstrate the power of statistical modeling in protein design, and both permit beneficial use of information gained from mutants with reduced […]

The genetic diversity spanned by the directed […] 3000 possible combinations, among which (after omitting ChrR17) 55 are double mutants and 165 are triple mutants. As diversity increases, the numbers grow exponentially: when one considers 15 mutated positions with two possible mutations in each, there are more than 10⁷ possible combinations (420 double mutants, 3640 triple mutants); and with 20 mutated positions and three possible mutations in each, there are more than 10¹² possible combinations (1710 double mutants, more than 30 000 triple mutants). Thus, exhaustive search in a laboratory, even only for double and triple mutants, does not scale well, and systematically producing and screening all of them would be an extensive and highly laborious feat. The predictions of the model instead allow one to screen only a few targeted mutants and still improve […].

It is serendipitously possible to identify promising mutants by simply 'gazing' at activity data and detecting beneficial mutations, as was done in the discovery of the single-residue mutant ChrR21 in this work. However, a systematic mathematical approach is needed to identify more complex mutants, such as ChrR30. The model is shown here to be a valuable tool for such situations, as it allows one to rigorously separate the expected contribution to activity from each of the mutations in a data set of mutant activity (such as […]) and thus to isolate mutants that otherwise may have been difficult to […].

All mutants designed in the second phase based on the model predictions are built from mutations generated through directed evolution in the first phase. Thus, in principle, it was possible to obtain the new mutants through additional rounds of directed evolution, without using the statistical model. However, as directed evolution is a blind process governed by chance, it is unclear how many additional mutants would have had to be screened to achieve an improvement comparable to that made possible by the model; it should be kept in mind that the last round of directed evolution yielded no improvement.

The model postulates that mutations are approximately additive. Is this assumption supported by the data? Based on the second, more complete estimate of the parameters, the fraction of the total variance of the fitness of a double mutant that is due to non-additivity (and measurement noise) is $0.1961/(2 \times 0.4361 + 0.1961) \approx 0.18$, and for a triple mutant the fraction is $0.1961/(3 \times 0.4361 + 0.1961) = 0.13$. These relatively low numbers indicate that the data are not particularly noisy, that there are no strong epistatic effects and that the mutational effects are mostly additive. Two additional points regarding additivity are noteworthy. First, additivity is assumed to apply to the transformed activity measurements rather than to the raw data. For example, ChrR31 is the combination of ChrR11 and ChrR25, and the deviation from perfect additivity in the raw data (202 778 vs 1000 + 147 222) is much greater than that in the transformed data (2.83 vs 0.53 + 2.70). Second, as often happens in statistical analysis, some considerable exceptions occur even when approximate additivity holds; this can be seen in our data when comparing ChrR10, whose transformed activity is 1.81, against ChrR11 combined with ChrR21, whose sum of transformed […]

Producing designed mutants through site-directed mutagenesis, as our approach required, is not always simple, as certain designed mutants are difficult to generate in a laboratory. A potential remedy for this problem is to create in the second phase, after statistically analyzing sequence–activity data from the first phase, combinatorial libraries containing only putatively beneficial mutations. These focused libraries would then be subjected to directed evolution, and are more likely to achieve improvement than straightforward directed evolution libraries that do not incorporate statistical analysis in their design. This approach is pursued […].

One might suggest that our statistical analysis could benefit from adopting a Bayesian approach, in which prior distributions are set over the parameters. However, as this work is the first to study
enzyme activity data in light of the model, we could
Chen K, Arnold FH. (1993). Tuning the activity of an
not use informative priors for Bayesian estimation.
enzyme for unusual environments: sequential random
The proper choice of non-informative priors is
mutagenesis of Subtilisin E for catalysis in dimethyl-
under debate among statisticians, especially for
formamide. Proc Natl Acad Sci USA 90: 5618–5622.
parameters of the type appearing in our model, which are not constrained to lie in a known interval. We note, though, that when varying the value of m in our analysis, we effectively used a Bayesian-like approach with a non-informative prior for estima-
improving bacterial bioremediation and prodrug

We are grateful to Drs Bruno Salles, Mike Benoit and Ms Mimi Keyhan for their useful advice and stimulating discussion. We thank Dr Stephen H Thorne for kindly supplying us with freshly made JC breast cancer cells. We also thank three anonymous referees whose insightful comments and suggestions greatly improved this article. This work was supported by Grants DE-FG03-97ER-624940 and DE-FG02-96ER20228 from the Natural and Accelerated Bioremediation Program of the US Department of Energy, and by the Stanford Office of Technology Licensing (1105626-100-WOAAA). YB and DFA were supported, in part, by a Lady Davis Postdoctoral Fellowship and an FRST New Zealand Postdoctoral Fellowship (STAX0101), respectively.

Ackerley DF, Barak Y, Lynch SV, Curtin J, Matin A. (2006). Effect of chromate stress on Escherichia coli K12.
Ackerley DF, Gonzalez CF, Park CH, Blake R, Keyhan M, Matin A. (2004). Chromate reducing properties of soluble flavoproteins from Pseudomonas putida and Escherichia coli. Appl Environ Microbiol 70: 873–882.
Aharoni A, Gaidukov L, Khersonsky O, Gould McQS, Roodveldt C, Tawfik DS. (2005). The ‘evolvability’ of promiscuous protein functions. Nat Gen 37: 73–76.
Aita A, Husimi Y. (2000). Adaptive walks by the fittest among finite random mutants on a Mt. Fuji-type fitness landscape. J Math Biol 41: 207–231.
Arnold FH. (1998). Enzyme engineering reaches the boiling point. Proc Natl Acad Sci USA 95: 2035–2036.
Arnold FH. (2006). Fancy footwork in the sequence space
Barak Y, Ackerley DF, Dodge CJ, Lal B, Cheng A, Francis AJ et al. (2006b). Analysis of novel soluble Cr(VI) and U(VI) reductases and generation of improved enzymes using directed evolution. Appl Environ Microbiol 72:
Barak Y, Thorne SH, Ackerley DF, Lynch SV, Contag CH, Matin A. (2006a). New enzyme for reductive cancer chemotherapy (YieF) and its improvement by directed evolution. Mol Cancer Ther 5: 97–103.
Chatterjee R, Yuan L. (2006). Directed evolution of metabolic pathways. Trends Biotech 24: 28–38.
Chica RA, Doucet N, Pelletier JN. (2005). Semi-rational approaches to engineering enzyme activity: combining the benefits of directed evolution and rational design.
Daugherty PS, Chen G, Iverson BL, Georgiou G. (2000). Quantitative analysis of the effect of the mutation frequency on the affinity maturation of single chain Fv antibodies. Proc Natl Acad Sci USA 97: 2029–2034.
Dennett DC. (1995). Darwin’s Dangerous Idea: Evolution and the Meanings of Life. Simon & Schuster Inc.: New York, NY.
Drummond DA, Iverson BL, Georgiou G, Arnold FH. (2005). Why high-error-rate random mutagenesis libraries are enriched in functional and improved proteins. J Mol Biol 350: 806–816.
Fox RJ, Davis SC, Mundorff EC, Newman LM, Gavrilovic V, Ma SK et al. (2007). Improving catalytic function by ProSAR-driven enzyme evolution. Nat Biotech 25:
Grove JI, Lovering AL, Guise C, Race PR, Wrighton CJ, White SA et al. (2003). Generation of Escherichia coli nitroreductase mutants conferring improved cell sensitization to the prodrug CB1954. Cancer Res 63:
Kauffman SA, Levin S. (1987). Towards a general theory of adaptive walks on rugged landscapes. J Theor Biol
Kuipers OP, Boot HJ, de-Vos WM. (1991). Improved site-directed mutagenesis method using PCR. Nucleic Acids Res 19: 4558.
Lejon T, Strom MB, Svendsen JS. (2001). Antibiotic activity of pentadecapeptides modeled from amino acid descriptors. J Pept Sci 7: 74–81.
Mee RP, Burton TR, Morgan PJ. (1997). Design of active analogues of a 15-residue peptide using D-optimal design, QSAR and a combinatorial search algorithm.
Minagawa H, Hiroki K. (2000). Effect of double mutation on thermostability of lactate oxidase. Biotechnol Lett
Minagawa H, Yoshida Y, Kenmochi N, Furuichi M, Shimada J, Kaneko H. (2007). Improving the thermal stability of lactate oxidase by directed evolution. Cell
Nov Y, Wein LM. (2005). Modeling and analysis of protein design under resource constraints. J Comput Biol 12:
Park C-H, Keyhan M, Wielinga B, Fendorf S, Matin A. (2000). Purification to homogeneity and characterization of a novel Pseudomonas putida chromate reductase. Appl Environ Microbiol 66: 1788–1795.
Park H-S, Nam SH, Lee JK, Yoon CN, Mannervik B, Benkovic SJ et al. (2006). Design and evolution of new catalytic activity with an existing protein scaffold.
Qian Z, Lutz SJ. (2005). Improving the catalytic activity of Candida antarctica lipase B by circular permutation.
Sambrook J, Fritsch EF, Maniatis T. (1989). Molecular Cloning: a Laboratory Manual, 2nd edn. Cold Spring Harbor Laboratory Press: Cold Spring Harbor, NY.
Stemmer WP. (1994). DNA shuffling by random fragmentation and reassembly: in vitro recombination for molecular evolution. Proc Natl Acad Sci USA 91:
Suzuki FC, Christians B, Kim A, Skandalis MEB, Loeb LA. (1996). Tolerance of different proteins for amino acid
Teixeira LSG, Costa ACS, Ferreira SLCM, Freitas LM, Carvalho S. (1999). Spectrophotometric determination of uranium using 2-(2-thiazolylazo)-p-cresol (TAC) in the presence of surfactants. J Braz Chem Soc 10:
Supplementary Information accompanies the paper on The ISME Journal website