AEM Accepted Manuscript Posted Online 1 December 2017 Appl. Environ. Microbiol. doi:10.1128/AEM.02340-17 Copyright © 2017 American Society for Microbiology. All Rights Reserved.
Running title: Quasi-metagenomics sequencing of Salmonella from food
Quasi-metagenomics and realtime sequencing aided detection and subtyping of Salmonella enterica from food samples
Ji-Yeon Hyeon1#, Shaoting Li1#, David A. Mann1, Shaokang Zhang1, Zhen Li2, Yi Chen3, Xiangyu Deng1*
1 Center for Food Safety, Department of Food Science and Technology, University of Georgia, 1109 Experiment St, Griffin, Georgia, 30223, US
2 Washington State Department of Health, Public Health Laboratories, Shoreline, Washington, 98155, US
3 Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD 20740, US
Key words: Salmonella, detection, subtyping, metagenomics, MinION
# Contributed equally
*Corresponding author
Xiangyu Deng
Center for Food Safety, Department of Food Science and Technology, University of Georgia, 1109 Experiment St, Griffin, Georgia, 30223, US
E-mail: [email protected]
Downloaded from http://aem.asm.org/ on March 5, 2018 by UNIV OF GEORGIA
Abstract
Metagenomics analysis of food samples promises isolation-independent detection and subtyping of foodborne bacterial pathogens in a single workflow. Selective concentration of Salmonella genomic DNA through immunomagnetic separation (IMS) and multiple displacement amplification (MDA) was shown to shorten culture enrichment of Salmonella-spiked raw chicken breast samples by over 12 hours while permitting serotyping and high-fidelity single nucleotide polymorphism (SNP) typing of the pathogen using short shotgun sequencing reads. The herein termed quasi-metagenomics approach was evaluated on Salmonella-spiked lettuce and black peppercorn samples as well as retail chicken parts naturally contaminated with different serotypes of Salmonella. Between 8 and 24 h of culture enrichment was required for detecting and subtyping naturally occurring Salmonella from unspiked chicken parts, compared with 4 to 12 h of culture enrichment when Salmonella-spiked food samples were analyzed, indicating the likely need for longer culture enrichment to revive low levels of stressed or injured Salmonella cells in food. Further acceleration of the workflow was achieved by real-time nanopore sequencing. After 1.5 hours of analysis on a portable sequencer, sufficient data were generated from sequencing IMS-MDA product of a culture-enriched lettuce sample to allow serotyping and robust phylogenetic placement of the inoculated isolate.
Importance
Both culture enrichment and next-generation sequencing remain time-consuming processes for food testing, whereas rapid methods for pathogen detection are widely available. Our study demonstrated substantial acceleration of both processes through IMS-MDA and real-time nanopore sequencing, respectively. In one example, the combined use of the two methods delivered a less than 24 h turnaround time from a Salmonella-contaminated lettuce sample to phylogenetic identification of the pathogen. Improved efficiency like this is important for further expanding the use of whole genome and metagenomics sequencing in microbial analysis of food. Our results suggest the potential of the quasi-metagenomics approach in areas where rapid detection and subtyping of foodborne pathogens are important, such as foodborne outbreak response and precision tracking and monitoring of foodborne pathogens in production environments and supply chains.
Introduction
Detection and subtyping of foodborne pathogens are typically separated. After a pathogen is detected, further subtyping assays may ensue. According to the United States Food and Drug Administration’s Bacteriological Analytical Manual (BAM) (https://www.fda.gov/food/foodscienceresearch/laboratorymethods/ucm2006949.htm) and the U.S. Department of Agriculture Food Safety and Inspection Service’s Microbiology Laboratory Guidebook (MLG) (https://www.fsis.usda.gov/wps/portal/fsis/topics/science/laboratories-and-procedures/guidebooks-and-methods/microbiology-laboratory-guidebook/microbiology-laboratory-guidebook), confirmed detection of bacterial foodborne pathogens from food and environmental samples requires culture isolation of bacterial isolates and confirmatory identification by biochemical or molecular tests. Isolation and identification of major bacterial foodborne pathogens take 5-7 days or even longer using these isolate-centric workflows. The isolates may then be further characterized by a variety of pheno- and genotyping methods (1), which can further increase the laboratory turnaround time.
Faster alternatives for the detection and subtyping of foodborne pathogens have been developed and implemented. A wide array of rapid detection methods, including nucleic acid-based, immunological-based and biosensor-based techniques, are commercially available for selected pathogens (2). While most of these methods still require culture enrichment for 8 – 48 h, they typically allow presumptive detection of specific pathogens in certain food matrices much faster than culture-based detection methods. Routine use of whole genome sequencing (WGS) promises substantial reduction of time and cost for public health laboratories by providing a one-stop platform for various subtyping methods. Using WGS data, multiple subtyping analyses can be integrated into a single in silico workflow, including serotyping (3), SNP typing (4), multilocus sequence typing (MLST) (5, 6) and antimicrobial resistance profiling (7). However, most rapid detection methods do not yield bacterial isolates, which are required by current practices of WGS. In addition, standard laboratory procedures of WGS, which consist of regrowth of the pathogen, genomic DNA purification and library preparation in addition to actual sequencing, take 5-7 days to complete. That means the entire process from contaminated food to pathogen genomes can take up to 10-14 days.
Recent studies using metagenomics sequencing demonstrated isolation-independent detection and subtyping of Shiga toxin-producing Escherichia coli (STEC) from spinach (8, 9). Direct capture and characterization of STEC genomic sequences was made possible by sequencing the metagenomes derived from enrichment cultures of spinach samples. Using this method, pathogen detection and subtyping can be effectively combined into a single workflow uninterrupted by culture isolation.
Such applications also underscored the importance of culture enrichment for metagenomics analysis of pathogen analytes. In the aforementioned studies, both nonselective pre-enrichment and selective enrichment through a variety of antibiotics were performed to effectively enrich STEC (8, 9). In fact, metagenomics sequencing has been used as a tool to evaluate and rationalize culture enrichment methods for detecting STEC from fresh spinach (8), Listeria monocytogenes from ice cream (10), and Salmonella enterica from tomato phyllosphere (11) and cilantro (12). These studies collectively suggest that the often low levels of pathogen cells in food samples, the presence of organisms that compete with or antagonize the analyte, and food processing and storage conditions detrimental to optimal growth of target pathogens can all pose challenges for effective culture enrichment. Therefore, alternative methods to partially replace culture enrichment are needed to improve the efficiency of analyte DNA concentration and accelerate the workflow of metagenomics food testing.
Besides culture enrichment, sequencing itself is another time-consuming step for detecting foodborne pathogens. A full sequencing run on an Illumina MiSeq platform takes ~24 to 56 h (150 – 300 bp paired-end reads), whereas rapid pathogen detection methods for microbiological analysis of food generally refer to assays that can be completed within minutes to hours excluding culture enrichment (13). The advent of nanopore sequencing on a portable device has enabled rapid and in-field detection and analysis of clinical pathogens (14). This technology allows real-time analysis of sequencing data as they are being generated, permitting rapid identification of bacterial and viral pathogens through whole genome (15) and metagenomics sequencing (16).
In this study, we aimed to improve and expedite metagenomics detection and subtyping of foodborne pathogens through selective concentration of analyte DNA and real-time nanopore sequencing of concentrated DNA samples. Using Salmonella-spiked chicken breast as a model system, we first investigated whether culture enrichment could be shortened through targeted cell capture by immunomagnetic separation (IMS) and whole genome amplification by multiple displacement amplification (MDA). Unlike culture enrichment, which is intrinsically restricted in speed by the length of the cell cycle, MDA provides a rapid and highly efficient alternative for enriching analyte DNA for molecular detection of bacteria. Using bacteriophage ɸ29 DNA polymerase, MDA was reported to generate sufficient amounts of DNA from single E. coli cells for whole genome sequencing (17). The ɸ29 DNA polymerase has high processivity (18) and high proofreading activity (19). Its reaction can be performed isothermally at 30ºC without the need for a thermocycler. The IMS-MDA method has allowed sequencing-based, culture-independent detection of Chlamydia trachomatis, an obligate intracellular pathogen, from clinical samples (20). We have recently shown that IMS-MDA led to real-time PCR detection of low levels of Salmonella from raw chicken breast with no or shortened (4 h) culture enrichment (21). Unlike previous studies that were focused on optimizing culture enrichment prior to metagenomics sequencing (8-12, 22), we aimed to reduce the need for culture enrichment through the alternative of IMS-MDA. To differentiate it from conventional metagenomics sequencing without selective analyte concentration, shotgun sequencing of IMS-MDA products was termed quasi-metagenomics sequencing in this study. We further evaluated the method with Salmonella-spiked iceberg lettuce, black peppercorns and peanut butter as well as naturally-contaminated retail chicken parts. Finally, we demonstrated rapid detection and phylogenetic identification of Salmonella from a lettuce sample using quasi-metagenomics sequencing on a MinION device (Oxford Nanopore Technologies, Oxford, UK).
Results
Comparison of culture enrichment methods. Both buffered peptone water (BPW) (23) and Rappaport Vassiliadis (RV) broth (24) have been used to enrich Salmonella from chicken. Pre-enrichment in BPW followed by selective enrichment in RV was reported to increase the sensitivity of PCR detection of Salmonella in poultry (23). Each medium alone and the combination of both were evaluated to identify optimal conditions to increase the abundance of S. enterica serotype Enteritidis (SE) relative to background flora on raw chicken breast. Real-time PCR threshold cycles (i.e., Ct values) were used to estimate relative abundance of SE (21). The Ct values were obtained from real-time PCR assays using DNA extracted from enrichment cultures as PCR templates. SE cells after enrichment were enumerated on xylose-lysine-tergitol-4 (XLT) agar that is selective for Salmonella. The level of microorganisms after enrichment, including both SE and background flora, was estimated on trypticase soy agar (TSA). As shown in Table S1, while BPW was most effective in enriching SE by yielding the lowest Ct value and the highest SE count on XLT, it also resulted in the highest level of background flora measured by the difference between CFU counts on TSA and XLT. The combination of BPW and RV was least effective in enriching SE relative to background flora as indicated by the highest Ct value. Therefore, RV was selected for SE enrichment prior to IMS and MDA because of its balanced performance in enriching SE and suppressing excessive growth of background flora.
Effects of IMS, MDA and IMS-MDA on recovering SE genome by shotgun sequencing. After culture enrichment, IMS was used to selectively capture SE cells and MDA was used to generate DNA from captured cells for shotgun sequencing. Their individual and combined effects on improving sequencing yield of SE among chicken and microbial DNA were assessed. When IMS was performed alone without MDA, DNA extracted from cells bound to immunomagnetic beads was insufficient for sequencing (below the 10 pg/µl quantification limit of the Qubit HS dsDNA assay). When MDA was used, alone or in combination with IMS, all the resulting DNA samples allowed construction of libraries for Illumina MiSeq sequencing. Sequencing results were evaluated by multiple metrics as shown in Table 1. Raw reads from all the MDA and IMS-MDA samples allowed accurate serotype prediction using SeqSero (3). When MDA was used alone without IMS after 12 h of RV enrichment, only an average of 4.74% of all sequencing reads were classified as Salmonella. By contrast, using IMS in conjunction with MDA after enrichment substantially increased the percentage of Salmonella reads to an average of 48.14%. The increased sequencing output of Salmonella by IMS-MDA led to substantial improvements of sequencing parameters of the SE genome. Sequencing depth normalized by 100 million bases of sequencing data increased from 1.01x by MDA to 9.82x by IMS-MDA. The N50 of the draft SE genome assembly using metagenomically classified Salmonella reads increased 31-fold through IMS-MDA instead of just MDA after RV enrichment. The values of normalized sequencing depth and N50 were equivalent to those obtained by WGS of SE genomes prepared from pure cultures (25).
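The two assembly and depth metrics used above can be illustrated with a minimal Python sketch. N50 follows its standard definition. The normalization of depth to 100 million bases is an assumed reading of the text (mean depth over the target genome, scaled to a fixed total output), and the function names and arguments are hypothetical rather than taken from the study's pipeline.

```python
def n50(contig_lengths):
    """Return the N50: the contig length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


def normalized_depth(target_bases, genome_length, total_bases,
                     per_bases=100_000_000):
    """Mean depth over the target genome, scaled to a fixed amount of total
    sequencing output (assumed definition; 100 million bases by default)."""
    mean_depth = target_bases / genome_length
    return mean_depth * per_bases / total_bases
```

Under this assumed definition, a run that yields twice the total output with the same fraction of target reads gives the same normalized depth, which is what makes samples with different yields comparable.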
IMS-MDA shortened culture enrichment for quasi-metagenomics detection of SE. To evaluate how IMS-MDA could improve selective concentration of Salmonella in comparison to culture enrichment alone, we further sequenced 1) DNA samples prepared immediately after SE inoculation on chicken breast (~1 CFU/g) and after RV enrichment of the inoculated samples for 4, 8, 12 and 24 h; and 2) IMS-MDA products after RV enrichment for 4, 8 and 12 h. As shown in Figure 1A and Table 1, the percentage of Salmonella in chicken microbiome (i.e., Salmonella abundance) increased slowly in the first 12 h of RV enrichment and rose to only 18.00% after culturing for 24 h. In comparison, IMS-MDA treatment after 4 h of enrichment increased Salmonella abundance to 31.49%. Furthermore, RV enrichment alone for 12 h only allowed 11.04% of the target SE genome to be sequenced, while IMS-MDA was able to recover 21.61% of the genome after only 4 h enrichment, and almost the entire genome (99.09%) after 12 h enrichment (Figure 1B and Table 1). IMS-MDA also improved overall Salmonella sequencing output among all sequencing reads including chicken DNA. Forty-eight percent of all sequencing reads were classified as Salmonella after 12 h of RV enrichment followed by IMS-MDA, compared with 16.74% after 24 h of RV enrichment alone (Figure 1C and Table 1). These results showed that IMS-MDA, a 2-3 h process, could reduce culture enrichment by at least 12 h as evaluated by different descriptive measures.
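The percentage of the target genome recovered can be read as breadth of coverage: the share of reference positions covered by at least one mapped read. The following Python sketch shows one way to compute it from read alignment intervals. The interval representation (0-based, half-open) and the function name are assumptions for illustration, not the procedure used in the study.

```python
def genome_coverage_percent(intervals, genome_length):
    """Percentage of reference positions covered by at least one read.

    intervals: iterable of (start, end) alignment spans, 0-based, half-open.
    Overlapping or adjacent spans are merged before counting.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged span, if any, and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Extend the current merged span.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return 100.0 * covered / genome_length
```

Merging spans before counting keeps deeply sequenced regions from being counted more than once, so the result reflects breadth rather than depth.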
Detection and high fidelity subtyping by shotgun sequencing following IMS-MDA. The ability of the quasi-metagenomics approach to distinguish the spiked analyte from other SE strains was evaluated using the CFSAN SNP pipeline (26). In addition to the raw chicken breast samples that were inoculated with the SE strain at ~1 CFU/g as previously described, samples with additional inoculum levels at ~0.1 and 10 CFU/g were prepared and analyzed. An uninoculated sample was enriched for 12 h before going through the entire IMS-MDA and shotgun sequencing process as a negative control. The sample was further confirmed to be Salmonella negative by culture enrichment (data not shown). Results from all the samples are summarized in Table 2.
An average of 569 Mb of sequences were generated from inoculated samples by shotgun sequencing on an Illumina MiSeq instrument, which accounted for ~5% of the total output of a MiSeq run (MiSeq Reagent Kit V3, according to manufacturer’s specification).
Accurate serotype prediction using sequencing reads was achieved from all the inoculated samples except when the lowest inoculation level (0.1 CFU/g) was coupled with the shortest culture enrichment duration (4 h). The lowest sequencing coverage permitting serotyping from Illumina reads was 21.61%. At least 10% of the reference SE genome was recovered by quasi-metagenomics sequencing of inoculated samples. The minimum sequencing coverage of an inoculated sample was 10.72%, which was obtained at the lowest inoculum level of 0.1 CFU/g with the shortest culture enrichment of 4 h. When 12 h of culture enrichment was performed, more than 90% of the SE genome was mapped by sequencing reads at all inoculation levels. By contrast, 0.02% of the reference genome was mapped by sequencing reads from the negative control sample.
For each inoculated sample, a core genome SNP phylogeny was constructed to include the quasi-metagenomics sample and a total of 52 SE isolates representing 16 major outbreaks and 3 sporadic cases in the US between 2001 and 2012 (4). As shown in Figure 2, tight clustering of the quasi-metagenomics sample (Target) and the WGS sample of the spiked strain (Reference) was achieved in all nine combinations of inoculation levels and culture enrichment durations, indicating equivalence of the two methods in supporting core genome SNP typing. When spiked chicken samples were culture enriched for 12 h, a perfect match between each pair of quasi-metagenomics and WGS samples was observed, with a SNP distance of 0 (Figure 2). Besides the clustering of the quasi-metagenomics and WGS samples, the rest of the phylogenetic tree was congruent across all the trials. These results suggest that high-fidelity subtyping with phylogenetic discrimination can be achieved by the quasi-metagenomics approach with culture enrichment for 12 h or shorter even when the contamination level is low (~0.1 CFU/g).
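The notion of SNP distance between a quasi-metagenomics sample and its WGS reference can be sketched as a pairwise comparison over aligned SNP positions. The Python example below is a simplified illustration, assuming two equal-length strings drawn from a SNP alignment in which missing or ambiguous calls are written as N or a gap. It does not reproduce the CFSAN SNP pipeline, and the function name is hypothetical.

```python
def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count positions where two aligned SNP sequences differ.

    Positions where either sequence has a missing or ambiguous call
    (characters listed in `ignore`) are skipped rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )
```

Skipping uncalled positions matters for low-coverage samples: positions that were never sequenced would otherwise inflate the apparent distance to the reference.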
Detection and subtyping of Salmonella from unspiked retail raw chicken meat. The performance of the quasi-metagenomics approach was further assessed by analyzing naturally contaminated retail chicken samples. As opposed to spiked samples, naturally contaminated samples refer to retail products that had been contaminated by Salmonella during production. A total of 76 retail chicken part samples (25 g aliquots), including breasts (n=24), wings (n=27), thighs (n=12), drumsticks (n=9), ground chicken (n=2), gizzards (n=2) and hearts (n=2) were screened for Salmonella by RV enrichment. In parallel, IMS-MDA-real-time PCR was performed after 4, 8, 12 and 24 h of enrichment (21). Salmonella was isolated from three wing samples by culture enrichment. WGS of the isolated strains was performed and their serotypes were determined to be Enteritidis (Sample A), Typhimurium (Sample B) and Heidelberg (Sample C) using WGS data (Table 3).
The same three samples were also determined to be Salmonella positive by IMS-MDA-real-time PCR and further analyzed using a three-tube most probable number (MPN) method (Table 3). Using IMS-MDA-real-time PCR, Salmonella was first detected after 8 h of enrichment in Sample A and 24 h of enrichment in Samples B and C. The longer enrichment time required by Samples B and C was likely due to the low level of Salmonella contamination (< 3 MPN/g) compared with Sample A (43 MPN/g). Quasi-metagenomics sequencing was performed on selected IMS-MDA products prepared from positive wing samples. As shown in Table 3 and Figure S1, correct serotyping (Enteritidis and Typhimurium) and accurate phylogenetic placement were achieved from Sample A and Sample B. Sample C had a low sequencing coverage of 12.53%, which did not permit serotyping and strain-level phylogenetic placement (data not shown). Instead, genome distance between Sample C and a set of 258 complete Salmonella reference genomes of 57 serotypes was estimated using Mash (27). The eight closest genomes to Sample C were all of serotype Heidelberg (Table S2), supporting the detection and preliminary identification of a Heidelberg isolate from this sample.
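The genome distance reported by Mash is derived from the Jaccard index j of two k-mer sets as D = (1/k) ln((1 + j) / (2j)). The Python sketch below applies that formula to exact k-mer sets. It is a simplification: Mash itself estimates j from MinHash sketches and uses canonical (strand-independent) k-mers, neither of which is implemented here, and the function names are hypothetical.

```python
import math


def kmers(seq, k):
    """Return the set of all k-mers in a sequence (no strand canonicalization)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_distance(seq_a, seq_b, k=21):
    """Mash-style distance from the exact Jaccard index of two k-mer sets."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    if not union:
        raise ValueError("sequences are shorter than k")
    jaccard = len(a & b) / len(union)
    if jaccard == 0:
        return 1.0  # no shared k-mers: distance is capped at 1
    return math.log((1 + jaccard) / (2 * jaccard)) / k
```

Because the distance depends only on shared k-mers, it can still rank reference genomes by similarity when coverage is too low for serotyping or SNP-based placement, which is the situation described for Sample C.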
Detection and subtyping of SE from other selected food samples. In addition to raw chicken parts, the IMS-MDA-shotgun sequencing method was further evaluated with other selected food samples including lettuce, black peppercorn and peanut butter, all of which were linked to recent Salmonella outbreaks (28-30). With 12 h of culture enrichment, strain-level, high-fidelity subtyping was achieved in both lettuce and peppercorn samples at all inoculation levels (~0.1, 1 and 10 CFU/g) as shown by the clustering of IMS-MDA-shotgun sequencing and WGS samples with 0 or 1 SNP distance (Table 4 and Figure S2). While IMS-MDA allowed real-time PCR detection of SE from peanut butter samples at all inoculation levels after 12 h of culture enrichment (data not shown), the Ct values were 25 or higher and insufficient DNA samples were obtained for shotgun sequencing. The high fat content of peanut butter likely compromised the effective capture of SE by IMS beads.
Rapid quasi-metagenomics detection and subtyping of SE from lettuce using MinION sequencing. IMS-MDA product prepared after 12 h of culture enrichment of a spiked lettuce sample (1 CFU/g) was sequenced on a MinION device. Sequencing data were collected hourly until the full run finished after 48.5 h. The same sample had been sequenced on a MiSeq platform (Table 4). After 1.5 h of sequencing, a total of 14,760 1D and 2D reads with an average length of 2,362 bp were generated. These reads covered 65.19% of the SE reference genome and allowed accurate prediction of its serotype as Enteritidis (Table S3). Using core genome SNP typing, the MinION quasi-metagenomics sample was accurately placed on the phylogenetic tree that included 52 previously described outbreak and clinical SE isolates. As shown in Figure 3, the 1.5 h MinION sample clustered closely with the WGS reference of the inoculated isolate. Similar results were obtained using MinION data after 48.5 h of sequencing (Figure 3), which contained 197,070 1D and 2D reads with an average length of 2,388 bp. SNP distance between the quasi-metagenomics sample and the WGS reference was 70 and 65 after 1.5 h and 48.5 h of MinION sequencing, respectively.
Correlation between Ct value and sequencing coverage. Prior to shotgun sequencing on an Illumina MiSeq instrument, all the IMS-MDA processed samples (n=28) in this study were analyzed by real-time PCR. The resulting Ct values were correlated with shotgun sequencing coverage (R² = 0.76, Figure 4), with lower Ct values corresponding to higher coverage. This observation suggests that Ct value is a useful indicator of target genome output by shotgun sequencing. When Ct values were lower than 25, the majority (>50%) of the SE genome was likely to be sequenced. Serotype prediction from raw sequencing reads was successful in every sample tested in the study when the Ct value was below 26. Therefore, Ct values can be used as a performance parameter for developing and optimizing the quasi-metagenomics method or as a quality check before committing to sequencing.
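A correlation of this kind can be quantified with a simple least-squares fit of coverage against Ct value. The Python sketch below computes the slope, intercept and coefficient of determination for paired observations. It assumes an ordinary linear regression, which may differ from the model behind Figure 4, and the function name is hypothetical. Any numbers passed to it in an example would be synthetic and not data from the study.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, r_squared) for paired observations,
    e.g. x = Ct values and y = percent genome coverage.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared
```

A negative slope from such a fit would indicate that samples with lower Ct values tend to yield higher coverage, which is the direction implied by the Ct thresholds quoted above.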
Discussion
The conventional metagenomics approach relies on deep sequencing to identify low-abundance microbial species directly from environmental samples. This strategy can be impractical, if not ineffective, for detecting low levels of bacterial pathogen contaminants in food samples. As shown by previous studies (8-12), adequate concentration of pathogen analytes prior to sequencing is critical for metagenomics identification of pathogen sequences. Culture enrichment alone was used in these studies to concentrate target pathogen cells. Given sufficient time, when the analyte rose to become a dominant species in the enrichment culture, nearly full recovery of the analyte genome could be achieved from sequencing enriched samples. This allowed a variety of subtyping analyses to be performed on shotgun metagenomics data, generating rich information about the analyte in addition to its detection. In this study, we improved and accelerated the isolation-independent, shotgun sequencing-based detection and subtyping of Salmonella from selected food samples using selective enrichment of the analyte genomic DNA by IMS-MDA, real-time nanopore sequencing by MinION and streamlined bioinformatics analysis of sequencing data.
Firstly, culture enrichment was substantially shortened by IMS-MDA. While necessary and effective in microbial analysis of food samples, culture enrichment alone can be time-consuming, especially when low levels of pathogen contaminants are present in food samples together with competing flora and, in some cases, antimicrobial substances. With approximate detection sensitivity at