AEM Accepted Manuscript Posted Online 4 December 2015 Appl. Environ. Microbiol. doi:10.1128/AEM.03074-15 Copyright © 2015 Jin et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.
1
Novel rhizosphere soil alleles for the enzyme ACC deaminase queried functionally with
2
an in-vivo competition assay
3 4
Zhao Jin1,2,$, Sara C. Di Rienzi1,2,$, Anders Janzon1,2, Jeff J. Werner3, , Largus T.
5
Angenent3, Jeffrey L. Dangl4,5,6,7, Douglas M. Fowler8, Ruth E. Ley1,2,*
§
6 7
1
8
2
9
3
Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA Department of Microbiology, Cornell University, Ithaca, NY, USA Department of Biological and Environmental Engineering, Cornell University, Ithaca,
10
NY, USA
11
4
12
5
13
6
14
USA
15
7
16
8
17
§
18
*Corresponding author. E-mail:
[email protected]
19
$
Howard Hughes Medical Institute Department of Biology, University of North Carolina, Chapel Hill, USA Department of Microbiology and Immunology, University of North Carolina, Chapel Hill,
Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, USA Department of Genome Sciences, University of Washington, Seattle, WA, USA Current address: Department of Chemistry, SUNY Cortland, Cortland, NY, USA
Contributed equally
20 21
Subject Category: Integrated genomics and post-genomics approaches in
22
microbial ecology
23 24
Keywords: rhizosphere soil; microbial DNA; selection assay; mutational scanning; competition assay; ACC deaminase
25
Abstract
26
Metagenomes derived from environmental microbiotas encode a vast diversity of
27
protein homologs. How this diversity impacts protein function can be explored through
28
selection assays aimed to optimize function. While artificially generated gene sequence
29
pools are typically used in selection assays, their usage may be limited by technical or
30
ethical reasons. Here, we investigate an alternative strategy: the use of a soil microbial
31
DNA as a starting point. We demonstrate this approach by optimizing the function of a
32
widely occurring soil bacterial enzyme, 1-aminocyclopropane-1-carboxylate (ACC)
33
deaminase. We identified a specific ACC deaminase domain region (ACCD-DR) that,
34
when PCR-amplified from the soil, produced a variant pool that we could swap into
35
functional ACC deaminase-encoding plasmids. Functional clones of ACC deaminase
36
were selected for in a competition assay based on their capacity to provide nitrogen to E.
37
coli in vitro. The most successful ACCD-DR variants were identified after multiple rounds
38
of selection by sequence analysis. We observed that previously identified essential
39
active site residues were fixed in the original, unselected library and that additional
40
residues went to fixation after selection. We identified a divergent essential residue that
41
hints at alternative substrates and a cluster of neutral residues that did not influence
42
ACCD performance. Using an artificial ACCD-DR variant library generated by DNA
43
oligomer synthesis, we validated the same fixation patterns. Our study demonstrates that
44
soil metagenomes are useful starting pools of protein-encoding gene diversity that can
45
be utilized for protein optimization and functional characterization when synthetic
46
libraries are not appropriate.
47
1
48 49
Introduction Competition assays allow for the massively parallel assessment of the relative
50
fitness of variants in a functional context (1). Variant pools can be generated
51
synthetically or can be harvested from the environment. Recently, deep-mutational
52
scanning was developed as a method to elucidate the sequence-function relationships
53
and the optimal catalytic sequence of proteins (2, 3). Using a doped DNA oligomer
54
library as a starting point for selection assays followed by next-generation sequencing,
55
Fowler et al. mapped the mutational preferences of hundreds of thousands of protein
56
variants for an important human protein domain, and thereby assessed the fitness
57
effects of nearly all possible point mutations in the protein domain. This method is able to
58
assay truly novel mutations and combinations of mutations affecting enzyme function,
59
thereby helping to generate optimized engineered proteins for biomedical or other use.
60
The use of artificially produced proteins may be constrained for ethical and
61
social reasons, however, particularly in the agricultural arena. An alternative to selection
62
based on artificially produced proteins is to select on naturally occurring protein diversity.
63
Naturally occurring environmental microbial communities, which contain a high diversity
64
of protein variants (4-6), might provide an attractive source of gene variants for use in
65
structure-function studies or enzyme optimization (7-9). Soil metagenomes, in particular,
66
comprise a rich and mostly uncharacterized reservoir of protein-coding gene diversity
67
encoded by a vast diversity of microorganisms (9). Metagenomes have been mined for
68
natural product discovery, including novel proteins that function as antibiotics (6) and
69
cellulose-degrading enzymes (10). Such proteins are typically encoded by many
70
homologs, which are likely to vary in their functional attributes (11). Here, we asked the
71
simple question: can a soil metagenome be used as a starting point for a protein
72
selection assay?
2
73
To address this question, we designed a study in which we used a growth-
74
based competition assay to investigate the soil enzyme 1-aminocyclopropane-1-
75
carboxylate deaminase (ACCD), which catalyzes the degradation of 1-
76
aminocyclopropane-1-carboxylate (ACC) to α-ketobutyrate and ammonia (12). ACC is a
77
key intermediate in the production of the plant growth hormone ethylene (13).
78
Agricultural control of ethylene levels has proven crucial for minimizing the destructive
79
effects of environmental stresses including salt (14), water (15), and heavy metals (16),
80
and promoting root elongation (17, 18). Due to its critical role in promoting plant growth,
81
ACCD has been identified as a key target for bioprospecting (17).
82
ACCD is expressed by bacteria associated with the plant roots, or rhizosphere
83
(19). Though it is encoded by divergent taxa, predominantly belonging to the phyla
84
Proteobacteria, Firmicutes, and Actinobacteria, regions of the protein are well conserved
85
(20). ACCD is comprised of a pyridoxal 5’-phosphate (PLP) dependent (21) active site
86
surrounded by a large and small domain. The active site is situated in a well-conserved
87
region, the ACCD domain region or ACCD-DR (21-24). ACCD DNA sequences, however,
88
have been recovered in bacteria and plants lacking ACC deaminase activity (17).
89
Therefore, functional assays must be applied to confirm ACC deaminase function.
90
ACCD is a unique enzyme in that it allows for bacterial growth when ACC is the
91
only source of nitrogen (25). Therefore, we tested whether soil microbiome-derived
92
variants in a bacterial strain competition assay conducted with ACC as the sole nitrogen
93
source would emerge and be indicative of enhanced ACCD activity in this growth context.
94
We observed dominant variants in the competition assay, which confirms that a soil
95
microbiome can be used as a starting point in a selection assay. Based on the selection
96
results, we predict residues to be functional, divergent, or neutral within the ACCD-DR,
97
and these results are largely in agreement with previous structure-function analyses. Of
98
particular note, we uncovered diversification at a previously identified essential residue, 3
99
hinting at alternative substrates or structural conformations that could be accommodated
100
by ACCD. By using a wide pool of variants versus single mutations, we were also able to
101
capture combinations of residues that appear to have undergone collective selection and
102
therefore may function cooperatively. Finally, we validated the use of a soil microbiome
103
library for protein optimization as we obtained similar results from competition among
104
synthetic ACCD-DR variants generated by doped DNA oligomer synthesis. Altogether,
105
this work highlights the structural and evolutionary knowledge that can be gleaned by
106
assessing the sequence variants present in a natural sample.
107 108 109
Materials and Methods: Initial assessment of ACCD-DR diversity in rhizopshere soil - Rhizosphere
110
soil samples from four Maize inbred lines (Oh43, MS71, M37W, and NC358) were
111
collected from a field in Missouri, USA, in 2010, and DNA was extracted as previously
112
described (26). PCR reactions used custom primers a2F 5'- GSAACAAGACGCGCAAG-
113
3' and a2R 5'-CACSAGCACGCACTTCATG-3', which amplify a 37 amino acid region
114
from the full-length ACCD gene (Figure 1). These degenerate primers were designed
115
using an alignment of bacterial ACCD genes obtained from public databases. PCRs (25
116
µl) consisted of 10 ng rhizosphere DNA, 0.16 µM of primers a2F and a2R, 1x GoTaq
117
buffer (Promega Corporation, Madison, WI), 0.001 U GoTaq, 2 mM Mg2+, and 0.2 mM
118
dNTPs. Thermal cycling consisted of an initial denaturation at 95°C for 2 min, 30 cycles
119
of denaturation at 95°C for 1 min, annealing at 52.9°C for 50 s, and elongation at 72°C
120
for 1 min, followed by a final extension at 72°C for 7 min. The amplicons were purified
121
with the QIAquick PCR purification kit (Qiagen, Valencia, CA) and pooled for sequencing.
122
Addition of Illumina linker and adaptor sequences and sequencing on the Illumina
123
Genome Analyzer IIx (Illumina Inc., San Diego, CA) were conducted by the Cornell
4
124
University Biotechnology Resource Center. The Illumina sequences are availabe in the
125
European Nucleotide Archive under study accession number PRJEB11480.
126
Resulting Illumina sequence reads were processed using in-house perl scripts as
127
follows. Paired-end sequences were joined based on aligning the 23 bp overlapping
128
region, with no internal gaps allowed. Reads were filtered by trimming low-quality bases
129
(Q20 cutoff) from single-direction reads and discarding reads trimmed more than six
130
bases. Up to three tailing bases (for each direction) that disagreed with the
131
complimentary sequence were allowed to be trimmed. We confirmed that trimmed tailing
132
bases had comparatively lower quality scores and likely sequencing errors. The ACCD-
133
DR DNA sequences were translated to their corresponding amino acid sequences.
134
Amino acid sequences were then clustered by absolute identity using UCLUST (27), to
135
tabulate the protein-level diversity available in the soil sample pool of variants.
136
Generation of ACCD plasmids lacking the ACCD-DR – To assay the ACCD-
137
DR variants amplified from the rhizosphere, we first generated ACCD bearing plasmids
138
lacking the ACCD domain region. A plasmid, p4U2, containing the Pseudomonas
139
cloacae ACCD and its flanking region (28) was obtained as a generous gift from Dr.
140
Bernard Glick, University of Waterloo. We deleted the ACCD-DR from p4U2 using the
141
primer acdSdelF 5'-
142
AATAGCGGCCTGGCCTTCGGCGCAGGAAAACTGGGTGAACTACT-3' and the Agilent
143
QuikChange Lightning Site-Directed Mutagenesis Kit (Agilent Technologies, Santa Clara,
144
CA). This PCR reaction was as follows: 51 µl containing 1x QuikChange reaction buffer,
145
50 ng p4U2 plasmid DNA, 0.2 µM acdSdelF primer, 1 µl QuikChange dNTP mix, 1.5 µl
146
QuikSolution reagent, and 1 µl QuikChange Lightning Enzyme. Thermal cycling
147
consisted of an initial denaturation at 95°C for 2 min, and 18 cycles of denaturation at
148
95°C for 20 s, annealing at 60°C for 10 s, and elongation at 68°C for 5 min, followed by a
5
149
final extension at 68°C for 5 min. The plasmid without the P. cloacae ACCD-DR is
150
referred to as p4U2ΔACCD-DR.
151
Construction of E. coli ACCD-DR variant libraries - To amplify and sequence by
152
454 pyrosequencing the ACCD-DR, we added to the above a2F/a2R primers the 454
153
adaptors and generated the following primer set: acdSinserF 5'-
154
AATAGCGGCCTGGCCTTCGGCGGSAACAAGACGCGCAAG-3' and acdSinserR 5'-
155
CGGAGTAGTTCACCCAGTTTTCCTGCACSAGCACGCACTTCATG-3', where the bold
156
regions indicate the primer sequences used to amplify the ACCD-DR, and the unbold
157
sequences are the 454 adaptors. For the library construction, we used DNA extracted
158
from a soil sample obtained from B73 Maize grown in Lansing, NY (26).
159
To capture the diversity of ACCD genes from the rhizosphere, we performed 15
160
separate PCR reactions and pooled these in groups of five for a total of three pools.
161
These pools are here referred to as Libraries A, B, and C (see Figure 2 for an overview
162
of the library construction and competition assay). For each library, three separate
163
cloning reactions were used to insert the amplicons into p4U2ΔACCD-DR using the
164
QuikChange Lightning Site-Directed Mutagenesis Kit similar to as described above. PCR
165
conditions were as described above, and for each library, amplicons from the five
166
replicate PCRs were combined and purified using the QIAquick PCR Purification Kit
167
(Qiagen, Valencia, CA).
168
Each separate library plasmid pool was then transformed into E. coli XL10-Gold
169
chemical ultracompetent cells (Agilent Technologies, Santa Clara, CA) following the
170
manufacturer's protocol in triplicate. The three transformations were then combined (total
171
volume 1.65 ml), Lysogeny Broth (LB) supplemented with 50 mg/ml ampicillin was added
172
to a final volume of 5 ml, and cells were grown at 37°C overnight in a shaking incubator.
173
The overnight culture was spun down, and washed twice in 0.1 M Tris-HCl buffer (pH 7.5)
174
to prevent carry-over nitrogen. The washed cell pellets were resuspended in 1.65 ml DF 6
175
minimal media (29) minus (NH4)2SO4, and supplemented with 0.2% dextrose, 50 mM
176
MgSO4, 1 mM CaCl2, 50 mg/ml ampicillin, and 10 µg/ml thiamine. Washed overnight
177
cultures (300 µl) were frozen as “before selection” ACCD-DR pools.
178
Growth-based competition assay - Resuspended cell pellets (300 µl, normalized
179
by the OD600 values of the previous round of cultures) were added to 30 ml
180
supplemented DF minimal media minus (NH4)2SO4 in three replicates, and ACC was
181
added to the medium as the sole nitrogen source at a final concentration of 16 mM.
182
Hereafter, this growth medium is called the DF/ACC medium. These E. coli cells were
183
grown at 30°C for five days in the first round of selection. At the end of the first round of
184
selection, the cultures were harvested, spun down, washed, and resuspended. Washed
185
and resuspended cells (300 µl, normalized by the OD600 values of the previous round of
186
cultures, as described above) were again transferred into 30 ml fresh DF/ACC medium
187
to start the second round of selection. The second round of selection consisted of three
188
days of growth at 30°C. Cells were passaged into fresh DF/ACC medium to start the
189
third round of selection in a similar way. A total of six rounds of selection were conducted
190
on each ACCD-DR variant library. Importantly, 300 µl of cultures were collected as
191
ACCD-DR variant pool samples after each round of selection for sequencing.
192
Note that E. coli cells containing different rhizosphere ACCD-DR variants had
193
heterogeneous growth rates within each variant library and between libraries, so we did
194
not synchronize the E. coli cells to the same growth stage but rather ensured that we
195
provided the same amount of E. coli cells to each round of selection across all three
196
variant libraries based on normalized OD600 values, and we gave each library of variants
197
the same growth time. Library B was only assayed in duplicate because the time zero
198
culture was lost for one of the pools.
199
To test for cheaters (i.e., E. coli cells growing without a functional ACC deaminase
200
on the nitrogen produced by E. coli cells with a functional ACC deaminase), we 7
201
performed the competition assay similarly except that three rounds of selection were
202
conducted.
203
DNA oligomer synthesis for the artificial ACCD-DR variant pool - A DNA oligo
204
variant pool was synthesized as described previously (2) (Gene Link, Hawthorne, NY).
205
The DNA sequence of the winning ‘LA’ variant
206
(‘CTCGAATACCTGATCCCCGAGGCGCTGGCGCAGGGCTGCGACACGCTGGTGTCG
207
ATCGGCGGCATCCAGTCGAACCAGACACGCCAGGTTGCGGCCGTGGCTGCCCAC
208
CTGGG’, which encoded ‘LEYLIPEALAQGCDTLVSIGGIQSNQTRQVAAVAAHL’), was
209
chosen as the ‘wildtype’ backbone of the oligo and doped at each base with 2.1% non-
210
wildtype nucleotides. The growth-based competition for the artifical variant pool was
211
conducted as described above for the rhizosphere-derived ACCD-DR library.
212
Sequencing of ACCD-DR variants from the competition assays - Plasmid
213
DNA was extracted from the ACCD-DR variant pools collected before and after each
214
round of selection using the QIAprep Spin Miniprep Kit (Qiagen, Valencia, CA). The
215
ACCD-DR variants were amplified by PCR from the plasmid DNA using the following
216
composite primer pair: forward primer: 454 Titanium Lib-I Primer A/5-base barcode/a2F
217
primer; and reverse primer: 454 Titanium Lib-I Primer B/a2R primer. Each sample was
218
amplified in quadruplicate 20 µl-PCRs consisting of 10 ng plasmid DNA, 0.2 µM of
219
forward and reverse primers, and 1x Phusion HF Master Mix (New England BioLabs,
220
Ipswich, MA). Thermal cycling consisted of an initial denaturation at 98°C for 30s, 30
221
cycles of denaturation at 98°C for 10 s, annealing at 51.2°C for 30 s, and elongation at
222
72°C for 1 min, followed by a final extension at 72°C for 7 min. Following PCR, DNA
223
amplicons were purified with the Agencourt AMPure XP PCR purification beads
224
(Beckman Coulter, Indianapolis, IN), quantified using the Quant-iT PicoGreen dsDNA
225
Assay Kit (Life Technologies, Grand Island, NY), and pooled in equimolar ratios with a
226
final concentration of 30 ng/µl. Pyrosequencing was performed using the Roche 454 GS 8
227
FLX Titanium chemistry (454 Life Sciences, Branford, CT) at the Engencore facility in the
228
University of South Carolina. The 454 sequence reads from both the natural and artifical
229
variant libraries are availabe in the European Nucleotide Archive under study accession
230
number PRJEB11657.
231
Analysis of ACCD-DR diversity- 454-generated sequence reads were analyzed
232
with the QIIME software package (Quantitative Insights into Microbial Ecology), using
233
default parameters for each step (30). Sequences were chimera-checked and clustered
234
into ACCD-DR variant clusters using OTUpipe (27) at 99% sequence identity. Each
235
ACCD-DR variant cluster (equivalent to an OTU in 16S rRNA analysis) was represented
236
by its most abundant sequence. A total of 33,625 quality-filtered reads were obtained for
237
51 samples, with an average of 659 reads per sample. The forward and reverse primers
238
were removed using a customized script. Due to the many indels in the sequences, a
239
custom script was employed to ensure the correct sequence length. Using the EMBOSS
240
water program (31), each 454 sequence trimmed of both primers was aligned to the P.
241
cloacae ACCD-DR as the 'backbone' sequence. If an insertion was found relative to the
242
backbone, the insertion was deleted in the 454 sequence. If a deletion was found
243
relative to the backbone, a gap was inserted into the 454 sequence at the corresponding
244
position. The insertions in the 454 sequences were easy to identify; however, the
245
contents of the gaps (i.e., what base to fill in the gaps) were impossible to determine
246
within the limited context. Therefore, inevitably, a number of the resulting sequences still
247
contained gaps. However, after this process, all sequences were of the same length,
248
and the correct reading frame was maintained. The DNA sequences were translated into
249
amino acid sequences, and AA sequences containing more than one unknown residue
250
were excluded from the analysis. After quality-filtering, 26,764 DNA sequences remained
251
for 51 samples, an average of 524 per sample.
9
252
To calculate the between-sample (β) diversity between the ACCD-DR variant
253
pools before and after each round of selection, we first standardized the sequence count
254
per sample by rarefaction, so that each sample contained 80 sequences. A phylogenetic
255
(neighbor-joining) tree was built from the representative sequences of the ACCD-DR
256
variant clusters using ClustalW (32), and the tree was used for calculating β-diversity
257
using the UniFrac distance metrics (33).
258
To calculate the frequency of each DNA base or amino acid residue at every
259
DNA/protein position, variant counts were normalized across samples by frequency. The
260
'plyr' (34) and 'reshape2' packages (35) in R v.2.15.0 (36) were applied to determine the
261
DNA base/amino acid residue frequencies. The amino acid and DNA waffle plots were
262
generated based on the frequencies of the residues and bases using the R package
263
‘ggplot2’ (37). The structure of the H26 ACCD-DR variant was computed using homology
264
modeling on the SWISS-MODEL server (38) with the P. sp. ACP ACCD (bears the Q26
265
ACCD variant, PDB ID 1TYZ) as the template (23). The structures were visualized and
266
aligned in PyMol (39).
267
To identify the amino acid residues that were fixed or neutral in the selection
268
assay, a linear regression was fitted in R to the frequencies of ACCD-DR variants in
269
each library at time zero and after each round of selection for each residue at each
270
position. A residue is defined as (i) neutral if the linear regressions show both positive
271
and negative slopes in the three libraries, and (ii) as fixed if it has a starting frequency of
272
over 0.1 and positive slopes in all three libraries.
273
To test whether eight fixed residues identified by our selection assay that lack
274
previously known function hitchhiked to fixation together, the Chi-squared test of
275
independence and the Fisher’s exact test of independence were conducted in R to
276
identify whether any two of the eight residues were associated. In the ACCD-DR variant
277
sequences at time zero before the competition assay, a binary code (0 for absence of 10
278
the fixed residue and 1 otherwise) was used to indicate whether a base position
279
contained the fixed residue or not. A loglinear model-based independence test was
280
applied using the R package MASS to the 8-way contingency tables generated from the
281
binary data for ACCD-DR variants in the three rhizosphere bacterial ACCD-DR libraries.
282
Molecular evolution analyses of ACCD-DR pools - We performed the codon-
283
based Z-test (40) and the Tajima’s neutrality test (41) using the Molecular Evolutionary
284
Genetics Analysis (MEGA 6.0) program (42) on the ACCD-DR DNA variants from all
285
three rhizosphere bacterial ACCD-DR libraries at time zero. The ACCD-DR DNA
286
variants were filtered by length and aligned using their encoding protein sequences in
287
the EMBOSS water program as described earlier. A total of 16,432 sequences were
288
obtained for the molecular evolution analyses. The codon-based Z-test calculates the
289
test statistic dS-dN, where dS and dN represent the synonymous and nonsynonymous
290
substitutions per site, respectively. The dataset was bootstrapped 500 times to estimate
291
the variance, and the modified Nei-Gojobori method with Jukes-Cantor correction
292
(assumed transition/transversion bias=15) (43) was selected as the substitution model.
293
Any position that contained alignment gaps or missing data was eliminated from pairwise
294
sequence comparisons. The probability of rejecting the null hypothesis of strict neutral
295
(dN=dS) in favor of the alternative hypothesis (purifying selection with dN