AEM Accepted Manuscript Posted Online 4 December 2015 Appl. Environ. Microbiol. doi:10.1128/AEM.03074-15 Copyright © 2015 Jin et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.

Novel rhizosphere soil alleles for the enzyme ACC deaminase queried functionally with an in-vivo competition assay

Zhao Jin1,2,$, Sara C. Di Rienzi1,2,$, Anders Janzon1,2, Jeff J. Werner3,§, Largus T. Angenent3, Jeffrey L. Dangl4,5,6,7, Douglas M. Fowler8, Ruth E. Ley1,2,*

1 Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
2 Department of Microbiology, Cornell University, Ithaca, NY, USA
3 Department of Biological and Environmental Engineering, Cornell University, Ithaca, NY, USA
4 Howard Hughes Medical Institute
5 Department of Biology, University of North Carolina, Chapel Hill, USA
6 Department of Microbiology and Immunology, University of North Carolina, Chapel Hill, USA
7 Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, USA
8 Department of Genome Sciences, University of Washington, Seattle, WA, USA
§ Current address: Department of Chemistry, SUNY Cortland, Cortland, NY, USA
* Corresponding author. E-mail: [email protected]
$ Contributed equally

Subject Category: Integrated genomics and post-genomics approaches in microbial ecology

Keywords: rhizosphere soil; microbial DNA; selection assay; mutational scanning; competition assay; ACC deaminase

Abstract

Metagenomes derived from environmental microbiotas encode a vast diversity of protein homologs. How this diversity impacts protein function can be explored through selection assays aimed at optimizing function. While artificially generated gene sequence pools are typically used in selection assays, their usage may be limited for technical or ethical reasons. Here, we investigate an alternative strategy: the use of soil microbial DNA as a starting point. We demonstrate this approach by optimizing the function of a widely occurring soil bacterial enzyme, 1-aminocyclopropane-1-carboxylate (ACC) deaminase. We identified a specific ACC deaminase domain region (ACCD-DR) that, when PCR-amplified from the soil, produced a variant pool that we could swap into functional ACC deaminase-encoding plasmids. Functional clones of ACC deaminase were selected for in a competition assay based on their capacity to provide nitrogen to E. coli in vitro. The most successful ACCD-DR variants were identified after multiple rounds of selection by sequence analysis. We observed that previously identified essential active site residues were fixed in the original, unselected library and that additional residues went to fixation after selection. We identified a divergent essential residue that hints at alternative substrates and a cluster of neutral residues that did not influence ACCD performance. Using an artificial ACCD-DR variant library generated by DNA oligomer synthesis, we validated the same fixation patterns. Our study demonstrates that soil metagenomes are useful starting pools of protein-encoding gene diversity that can be utilized for protein optimization and functional characterization when synthetic libraries are not appropriate.

Introduction

Competition assays allow for the massively parallel assessment of the relative fitness of variants in a functional context (1). Variant pools can be generated synthetically or can be harvested from the environment. Recently, deep mutational scanning was developed as a method to elucidate the sequence-function relationships and the optimal catalytic sequence of proteins (2, 3). Using a doped DNA oligomer library as a starting point for selection assays followed by next-generation sequencing, Fowler et al. mapped the mutational preferences of hundreds of thousands of protein variants for an important human protein domain, and thereby assessed the fitness effects of nearly all possible point mutations in the protein domain. This method is able to assay truly novel mutations and combinations of mutations affecting enzyme function, thereby helping to generate optimized engineered proteins for biomedical or other use.

The use of artificially produced proteins may be constrained for ethical and social reasons, however, particularly in the agricultural arena. An alternative to selection based on artificially produced proteins is to select on naturally occurring protein diversity. Naturally occurring environmental microbial communities, which contain a high diversity of protein variants (4-6), might provide an attractive source of gene variants for use in structure-function studies or enzyme optimization (7-9). Soil metagenomes, in particular, comprise a rich and mostly uncharacterized reservoir of protein-coding gene diversity encoded by a vast diversity of microorganisms (9). Metagenomes have been mined for natural product discovery, including novel proteins that function as antibiotics (6) and cellulose-degrading enzymes (10). Such proteins are typically encoded by many homologs, which are likely to vary in their functional attributes (11). Here, we asked the simple question: can a soil metagenome be used as a starting point for a protein selection assay?

To address this question, we designed a study in which we used a growth-based competition assay to investigate the soil enzyme 1-aminocyclopropane-1-carboxylate deaminase (ACCD), which catalyzes the degradation of 1-aminocyclopropane-1-carboxylate (ACC) to α-ketobutyrate and ammonia (12). ACC is a key intermediate in the production of the plant growth hormone ethylene (13). Agricultural control of ethylene levels has proven crucial for minimizing the destructive effects of environmental stresses, including salt (14), water (15), and heavy metals (16), and for promoting root elongation (17, 18). Due to its critical role in promoting plant growth, ACCD has been identified as a key target for bioprospecting (17).

ACCD is expressed by bacteria associated with plant roots, that is, the rhizosphere (19). Though it is encoded by divergent taxa, predominantly belonging to the phyla Proteobacteria, Firmicutes, and Actinobacteria, regions of the protein are well conserved (20). ACCD comprises a pyridoxal 5’-phosphate (PLP)-dependent (21) active site surrounded by a large and a small domain. The active site is situated in a well-conserved region, the ACCD domain region or ACCD-DR (21-24). ACCD DNA sequences, however, have been recovered in bacteria and plants lacking ACC deaminase activity (17). Therefore, functional assays must be applied to confirm ACC deaminase function.

ACCD is a unique enzyme in that it allows for bacterial growth when ACC is the only source of nitrogen (25). Therefore, we tested whether soil microbiome-derived variants would emerge in a bacterial strain competition assay conducted with ACC as the sole nitrogen source, and whether their emergence would be indicative of enhanced ACCD activity in this growth context. We observed dominant variants in the competition assay, which confirms that a soil microbiome can be used as a starting point in a selection assay. Based on the selection results, we predict residues to be functional, divergent, or neutral within the ACCD-DR, and these results are largely in agreement with previous structure-function analyses. Of particular note, we uncovered diversification at a previously identified essential residue, hinting at alternative substrates or structural conformations that could be accommodated by ACCD. By using a wide pool of variants rather than single mutations, we were also able to capture combinations of residues that appear to have undergone collective selection and therefore may function cooperatively. Finally, we validated the use of a soil microbiome library for protein optimization, as we obtained similar results from competition among synthetic ACCD-DR variants generated by doped DNA oligomer synthesis. Altogether, this work highlights the structural and evolutionary knowledge that can be gleaned by assessing the sequence variants present in a natural sample.

Materials and Methods

Initial assessment of ACCD-DR diversity in rhizosphere soil - Rhizosphere soil samples from four maize inbred lines (Oh43, MS71, M37W, and NC358) were collected from a field in Missouri, USA, in 2010, and DNA was extracted as previously described (26). PCRs used custom primers a2F 5'-GSAACAAGACGCGCAAG-3' and a2R 5'-CACSAGCACGCACTTCATG-3', which amplify a 37 amino acid region from the full-length ACCD gene (Figure 1). These degenerate primers were designed using an alignment of bacterial ACCD genes obtained from public databases. PCRs (25 µl) consisted of 10 ng rhizosphere DNA, 0.16 µM of primers a2F and a2R, 1x GoTaq buffer (Promega Corporation, Madison, WI), 0.001 U GoTaq, 2 mM Mg2+, and 0.2 mM dNTPs. Thermal cycling consisted of an initial denaturation at 95°C for 2 min, 30 cycles of denaturation at 95°C for 1 min, annealing at 52.9°C for 50 s, and elongation at 72°C for 1 min, followed by a final extension at 72°C for 7 min. The amplicons were purified with the QIAquick PCR purification kit (Qiagen, Valencia, CA) and pooled for sequencing. Addition of Illumina linker and adaptor sequences and sequencing on the Illumina Genome Analyzer IIx (Illumina Inc., San Diego, CA) were conducted by the Cornell University Biotechnology Resource Center. The Illumina sequences are available in the European Nucleotide Archive under study accession number PRJEB11480.

Resulting Illumina sequence reads were processed using in-house perl scripts as follows. Paired-end sequences were joined based on aligning the 23 bp overlapping region, with no internal gaps allowed. Reads were filtered by trimming low-quality bases (Q20 cutoff) from single-direction reads and discarding reads from which more than six bases had been trimmed. Up to three tailing bases (for each direction) that disagreed with the complementary sequence were allowed to be trimmed. We confirmed that trimmed tailing bases had comparatively lower quality scores and were likely sequencing errors. The ACCD-DR DNA sequences were translated to their corresponding amino acid sequences. Amino acid sequences were then clustered by absolute identity using UCLUST (27), to tabulate the protein-level diversity available in the soil sample pool of variants.
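
The quality filter described above can be illustrated with a minimal Python sketch. It is not the authors' perl code. It assumes that trimming proceeds from the 3' end of a single-direction read and that quality values are already decoded Phred scores; the function name and parameters are hypothetical.

```python
def trim_low_quality_tail(seq, quals, q_cutoff=20, max_trimmed=6):
    """Trim trailing bases whose Phred score is below q_cutoff.

    Returns (trimmed_seq, trimmed_quals), or None when more than
    max_trimmed bases had to be removed (the read is discarded).
    Assumption: trimming is applied to the 3' tail only.
    """
    keep = len(seq)
    # Walk back from the 3' end while the quality is below the cutoff.
    while keep > 0 and quals[keep - 1] < q_cutoff:
        keep -= 1
    if len(seq) - keep > max_trimmed:
        return None
    return seq[:keep], quals[:keep]
```

A read that returns None would be dropped before the paired-end joining step.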

Generation of ACCD plasmids lacking the ACCD-DR - To assay the ACCD-DR variants amplified from the rhizosphere, we first generated ACCD-bearing plasmids lacking the ACCD domain region. A plasmid, p4U2, containing the Pseudomonas cloacae ACCD and its flanking region (28) was obtained as a generous gift from Dr. Bernard Glick, University of Waterloo. We deleted the ACCD-DR from p4U2 using the primer acdSdelF 5'-AATAGCGGCCTGGCCTTCGGCGCAGGAAAACTGGGTGAACTACT-3' and the Agilent QuikChange Lightning Site-Directed Mutagenesis Kit (Agilent Technologies, Santa Clara, CA). The PCR was as follows: 51 µl containing 1x QuikChange reaction buffer, 50 ng p4U2 plasmid DNA, 0.2 µM acdSdelF primer, 1 µl QuikChange dNTP mix, 1.5 µl QuikSolution reagent, and 1 µl QuikChange Lightning Enzyme. Thermal cycling consisted of an initial denaturation at 95°C for 2 min, and 18 cycles of denaturation at 95°C for 20 s, annealing at 60°C for 10 s, and elongation at 68°C for 5 min, followed by a final extension at 68°C for 5 min. The plasmid without the P. cloacae ACCD-DR is referred to as p4U2ΔACCD-DR.

Construction of E. coli ACCD-DR variant libraries - To amplify and sequence by 454 pyrosequencing the ACCD-DR, we added to the above a2F/a2R primers the 454 adaptors and generated the following primer set: acdSinserF 5'-AATAGCGGCCTGGCCTTCGGCGGSAACAAGACGCGCAAG-3' and acdSinserR 5'-CGGAGTAGTTCACCCAGTTTTCCTGCACSAGCACGCACTTCATG-3', where the bold regions indicate the primer sequences used to amplify the ACCD-DR, and the non-bold sequences are the 454 adaptors. For the library construction, we used DNA extracted from a soil sample obtained from B73 maize grown in Lansing, NY (26).

To capture the diversity of ACCD genes from the rhizosphere, we performed 15 separate PCRs and pooled these in groups of five for a total of three pools. These pools are here referred to as Libraries A, B, and C (see Figure 2 for an overview of the library construction and competition assay). For each library, three separate cloning reactions were used to insert the amplicons into p4U2ΔACCD-DR using the QuikChange Lightning Site-Directed Mutagenesis Kit as described above. PCR conditions were as described above, and for each library, amplicons from the five replicate PCRs were combined and purified using the QIAquick PCR Purification Kit (Qiagen, Valencia, CA).

Each separate library plasmid pool was then transformed in triplicate into E. coli XL10-Gold chemical ultracompetent cells (Agilent Technologies, Santa Clara, CA) following the manufacturer's protocol. The three transformations were then combined (total volume 1.65 ml), Lysogeny Broth (LB) supplemented with 50 mg/ml ampicillin was added to a final volume of 5 ml, and cells were grown at 37°C overnight in a shaking incubator. The overnight culture was spun down and washed twice in 0.1 M Tris-HCl buffer (pH 7.5) to prevent nitrogen carry-over. The washed cell pellets were resuspended in 1.65 ml DF minimal media (29) minus (NH4)2SO4, and supplemented with 0.2% dextrose, 50 mM MgSO4, 1 mM CaCl2, 50 mg/ml ampicillin, and 10 µg/ml thiamine. Washed overnight cultures (300 µl) were frozen as “before selection” ACCD-DR pools.

Growth-based competition assay - Resuspended cell pellets (300 µl, normalized by the OD600 values of the previous round of cultures) were added to 30 ml supplemented DF minimal media minus (NH4)2SO4 in three replicates, and ACC was added to the medium as the sole nitrogen source at a final concentration of 16 mM. Hereafter, this growth medium is called the DF/ACC medium. These E. coli cells were grown at 30°C for five days in the first round of selection. At the end of the first round of selection, the cultures were harvested, spun down, washed, and resuspended. Washed and resuspended cells (300 µl, normalized by the OD600 values of the previous round of cultures, as described above) were again transferred into 30 ml fresh DF/ACC medium to start the second round of selection. The second round of selection consisted of three days of growth at 30°C. Cells were passaged into fresh DF/ACC medium to start the third round of selection in a similar way. A total of six rounds of selection were conducted on each ACCD-DR variant library. Importantly, 300 µl of cultures were collected as ACCD-DR variant pool samples after each round of selection for sequencing.

Note that E. coli cells containing different rhizosphere ACCD-DR variants had heterogeneous growth rates within each variant library and between libraries, so we did not synchronize the E. coli cells to the same growth stage but rather ensured that we provided the same amount of E. coli cells to each round of selection across all three variant libraries based on normalized OD600 values, and we gave each library of variants the same growth time. Library B was only assayed in duplicate because the time zero culture was lost for one of the pools.

To test for cheaters (i.e., E. coli cells growing without a functional ACC deaminase on the nitrogen produced by E. coli cells with a functional ACC deaminase), we performed the competition assay similarly except that three rounds of selection were conducted.

DNA oligomer synthesis for the artificial ACCD-DR variant pool - A DNA oligo variant pool was synthesized as described previously (2) (Gene Link, Hawthorne, NY). The DNA sequence of the winning ‘LA’ variant (‘CTCGAATACCTGATCCCCGAGGCGCTGGCGCAGGGCTGCGACACGCTGGTGTCGATCGGCGGCATCCAGTCGAACCAGACACGCCAGGTTGCGGCCGTGGCTGCCCACCTGGG’, which encoded ‘LEYLIPEALAQGCDTLVSIGGIQSNQTRQVAAVAAHL’), was chosen as the ‘wildtype’ backbone of the oligo and doped at each base with 2.1% non-wildtype nucleotides. The growth-based competition for the artificial variant pool was conducted as described above for the rhizosphere-derived ACCD-DR library.
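
As a rough check on the oligo design above, the following Python sketch translates the ‘LA’ backbone with the standard genetic code and estimates the mutational load implied by the doping rate. It assumes that the 2.1% figure is the per-position probability of a non-wildtype base and that positions are doped independently; neither assumption is stated in the text, and the variable names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# 'LA' backbone sequence as given in the text.
LA_DNA = (
    "CTCGAATACCTGATCCCCGAGGCGCTGGCGCAGGGCTGCGACACGCTGGTGTCG"
    "ATCGGCGGCATCCAGTCGAACCAGACACGCCAGGTTGCGGCCGTGGCTGCCCAC"
    "CTGGG"
)

def translate(dna):
    """Translate complete codons from position 0; trailing bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# Assumed interpretation: 2.1% chance of a non-wildtype base per position.
DOPING_RATE = 0.021
expected_mutations = len(LA_DNA) * DOPING_RATE        # mean substitutions per oligo
p_unmutated = (1 - DOPING_RATE) ** len(LA_DNA)         # fraction left as backbone

print(translate(LA_DNA))
print(round(expected_mutations, 2), round(p_unmutated, 3))
```

Under these assumptions the 113-nt backbone carries on average about 2.4 substitutions per oligo (113 × 0.021), so most synthesized molecules would differ from the backbone at a small number of positions.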

Sequencing of ACCD-DR variants from the competition assays - Plasmid DNA was extracted from the ACCD-DR variant pools collected before and after each round of selection using the QIAprep Spin Miniprep Kit (Qiagen, Valencia, CA). The ACCD-DR variants were amplified by PCR from the plasmid DNA using the following composite primer pair: forward primer: 454 Titanium Lib-I Primer A/5-base barcode/a2F primer; and reverse primer: 454 Titanium Lib-I Primer B/a2R primer. Each sample was amplified in quadruplicate 20-µl PCRs consisting of 10 ng plasmid DNA, 0.2 µM of forward and reverse primers, and 1x Phusion HF Master Mix (New England BioLabs, Ipswich, MA). Thermal cycling consisted of an initial denaturation at 98°C for 30 s, 30 cycles of denaturation at 98°C for 10 s, annealing at 51.2°C for 30 s, and elongation at 72°C for 1 min, followed by a final extension at 72°C for 7 min. Following PCR, DNA amplicons were purified with the Agencourt AMPure XP PCR purification beads (Beckman Coulter, Indianapolis, IN), quantified using the Quant-iT PicoGreen dsDNA Assay Kit (Life Technologies, Grand Island, NY), and pooled in equimolar ratios with a final concentration of 30 ng/µl. Pyrosequencing was performed using the Roche 454 GS FLX Titanium chemistry (454 Life Sciences, Branford, CT) at the Engencore facility at the University of South Carolina. The 454 sequence reads from both the natural and artificial variant libraries are available in the European Nucleotide Archive under study accession number PRJEB11657.

Analysis of ACCD-DR diversity - 454-generated sequence reads were analyzed with the QIIME software package (Quantitative Insights into Microbial Ecology), using default parameters for each step (30). Sequences were chimera-checked and clustered into ACCD-DR variant clusters using OTUpipe (27) at 99% sequence identity. Each ACCD-DR variant cluster (equivalent to an OTU in 16S rRNA analysis) was represented by its most abundant sequence. A total of 33,625 quality-filtered reads were obtained for 51 samples, with an average of 659 reads per sample. The forward and reverse primers were removed using a customized script. Due to the many indels in the sequences, a custom script was employed to ensure the correct sequence length. Using the EMBOSS water program (31), each 454 sequence trimmed of both primers was aligned to the P. cloacae ACCD-DR as the 'backbone' sequence. If an insertion was found relative to the backbone, the insertion was deleted in the 454 sequence. If a deletion was found relative to the backbone, a gap was inserted into the 454 sequence at the corresponding position. The insertions in the 454 sequences were easy to identify; however, the contents of the gaps (i.e., what base to fill in the gaps) were impossible to determine within the limited context. Therefore, inevitably, a number of the resulting sequences still contained gaps. However, after this process, all sequences were of the same length, and the correct reading frame was maintained. The DNA sequences were translated into amino acid sequences, and AA sequences containing more than one unknown residue were excluded from the analysis. After quality-filtering, 26,764 DNA sequences remained for 51 samples, an average of 524 per sample.
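
The indel handling described above can be sketched in Python as follows. This is an illustration rather than the custom script used in the study: it assumes the pairwise alignment is already available as two equal-length gapped strings covering the full backbone (the output of a local aligner such as water may need padding first), and the function name is hypothetical.

```python
def project_onto_backbone(aligned_backbone, aligned_read, gap="-"):
    """Force a read into backbone coordinates.

    Columns where the backbone has a gap are insertions in the read and
    are dropped; columns where the read has a gap are deletions and are
    kept as gap characters, so the result always has the backbone length.
    """
    if len(aligned_backbone) != len(aligned_read):
        raise ValueError("aligned strings must have equal length")
    projected = []
    for backbone_char, read_char in zip(aligned_backbone, aligned_read):
        if backbone_char == gap:
            continue  # insertion relative to the backbone: delete it
        projected.append(read_char)  # base, or gap for a deletion
    return "".join(projected)
```

Because every output has the ungapped backbone length, the reading frame is preserved, and codons that contain a gap can later be translated as unknown residues.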

To calculate the between-sample (β) diversity between the ACCD-DR variant pools before and after each round of selection, we first standardized the sequence count per sample by rarefaction, so that each sample contained 80 sequences. A phylogenetic (neighbor-joining) tree was built from the representative sequences of the ACCD-DR variant clusters using ClustalW (32), and the tree was used for calculating β-diversity using the UniFrac distance metrics (33).
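
Rarefaction to a fixed depth amounts to random subsampling without replacement. The sketch below shows the idea in Python; the study itself used QIIME for this step. Dropping samples with fewer than 80 reads is an assumption made for the example, and the function name is hypothetical.

```python
import random

def rarefy(reads, depth=80, seed=None):
    """Subsample `depth` reads without replacement from one sample.

    Returns None when the sample has fewer reads than `depth`
    (assumed here to mean the sample is excluded).
    """
    if len(reads) < depth:
        return None
    rng = random.Random(seed)  # seeded generator for reproducibility
    return rng.sample(reads, depth)
```

Fixing the seed makes the subsample reproducible, which matters when distance metrics are recomputed later.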

To calculate the frequency of each DNA base or amino acid residue at every DNA/protein position, variant counts were normalized across samples by frequency. The 'plyr' (34) and 'reshape2' packages (35) in R v.2.15.0 (36) were applied to determine the DNA base/amino acid residue frequencies. The amino acid and DNA waffle plots were generated based on the frequencies of the residues and bases using the R package ‘ggplot2’ (37). The structure of the H26 ACCD-DR variant was computed using homology modeling on the SWISS-MODEL server (38) with the Pseudomonas sp. ACP ACCD (which bears the Q26 ACCD variant, PDB ID 1TYZ) as the template (23). The structures were visualized and aligned in PyMol (39).
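
The per-position frequency calculation described at the start of this paragraph can be sketched in Python (the study used R with 'plyr' and 'reshape2'). The sketch assumes equal-length, already aligned sequences with one entry per read; the function name is hypothetical.

```python
from collections import Counter

def position_frequencies(sequences):
    """Return one {residue: frequency} dict per alignment position."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    total = len(sequences)
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in sequences)
        # Dividing by the sample total makes samples of different
        # sequencing depth comparable.
        freqs.append({res: n / total for res, n in counts.items()})
    return freqs
```

The same function works for DNA bases or amino acid residues, since it only inspects characters column by column.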

To identify the amino acid residues that were fixed or neutral in the selection assay, a linear regression was fitted in R to the frequencies of ACCD-DR variants in each library at time zero and after each round of selection for each residue at each position. A residue is defined as (i) neutral if the linear regressions show both positive and negative slopes in the three libraries, and (ii) as fixed if it has a starting frequency of over 0.1 and positive slopes in all three libraries.
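
The classification rule above can be expressed compactly. The Python sketch below fits an ordinary least-squares slope of frequency against selection round for each library and then applies the two definitions. It assumes the starting-frequency threshold must hold in every library, which the text does not specify, and it labels everything else "unclassified"; the names are hypothetical.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def classify_residue(freq_by_library, min_start=0.1):
    """freq_by_library: one list per library, frequencies at round 0, 1, 2, ..."""
    slopes = [ols_slope(list(range(len(f))), f) for f in freq_by_library]
    if any(s > 0 for s in slopes) and any(s < 0 for s in slopes):
        return "neutral"  # slopes disagree in sign across libraries
    # Assumption: the starting frequency must exceed min_start in all libraries.
    if all(s > 0 for s in slopes) and all(f[0] > min_start for f in freq_by_library):
        return "fixed"
    return "unclassified"
```

For example, a residue rising from 0.2 to 0.8 in all three libraries would be labeled "fixed", while one rising in two libraries and falling in the third would be labeled "neutral".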

To test whether eight fixed residues identified by our selection assay that lack previously known function hitchhiked to fixation together, the Chi-squared test of independence and Fisher’s exact test of independence were conducted in R to identify whether any two of the eight residues were associated. In the ACCD-DR variant sequences at time zero before the competition assay, a binary code (0 for absence of the fixed residue and 1 otherwise) was used to indicate whether a position contained the fixed residue or not. A loglinear model-based independence test was applied using the R package MASS to the 8-way contingency tables generated from the binary data for ACCD-DR variants in the three rhizosphere bacterial ACCD-DR libraries.
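
The pairwise association test can be illustrated as follows. The Python sketch encodes presence of a fixed residue as 0/1 and computes the Pearson chi-squared statistic for the resulting 2x2 table. It is a simplified stand-in for the R analysis: it omits the continuity correction that R's chisq.test applies to 2x2 tables by default, it does not compute a p-value, and the function names are hypothetical.

```python
def presence_vector(sequences, position, fixed_residue):
    """1 if the sequence carries the fixed residue at `position`, else 0."""
    return [1 if s[position] == fixed_residue else 0 for s in sequences]

def chi_square_2x2(x, y):
    """Pearson chi-squared statistic (no continuity correction) for two 0/1 vectors.

    Returns None when a row or column total is zero (statistic undefined).
    """
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return None
    return n * (a * d - b * c) ** 2 / denom
```

A large statistic for a pair of positions indicates that the two fixed residues tend to co-occur in the time-zero sequences more often than expected under independence.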

Molecular evolution analyses of ACCD-DR pools - We performed the codon-based Z-test (40) and Tajima’s neutrality test (41) using the Molecular Evolutionary Genetics Analysis (MEGA 6.0) program (42) on the ACCD-DR DNA variants from all three rhizosphere bacterial ACCD-DR libraries at time zero. The ACCD-DR DNA variants were filtered by length and aligned using their encoded protein sequences in the EMBOSS water program as described earlier. A total of 16,432 sequences were obtained for the molecular evolution analyses. The codon-based Z-test calculates the test statistic dS-dN, where dS and dN represent the synonymous and nonsynonymous substitutions per site, respectively. The dataset was bootstrapped 500 times to estimate the variance, and the modified Nei-Gojobori method with Jukes-Cantor correction (assumed transition/transversion bias=15) (43) was selected as the substitution model. Any position that contained alignment gaps or missing data was eliminated from pairwise sequence comparisons. The probability of rejecting the null hypothesis of strict neutrality (dN=dS) in favor of the alternative hypothesis (purifying selection with dN