Identification of DNA Signatures Suitable for Use in Development of Real-Time PCR Assays by Whole-Genome Sequence Approaches: Use of Streptococcus pyogenes in a Pilot Study

Guo-Chiuan Hung, Kenjiro Nagamine, Bingjie Li, and Shyh-Ching Lo

Tissue Microbiology Laboratory, Division of Cellular and Gene Therapies, Office of Cellular, Tissue and Gene Therapies, Center for Biologics Evaluation and Research, Food and Drug Administration, Bethesda, Maryland, USA

A stepwise computational approach using three layers of publicly available software was found to effectively identify DNA signatures for Streptococcus pyogenes. PCR testing validated that 9 of 15 signature-derived primer sets could detect as little as 5 fg of target DNA with high specificity. The selected signature-derived primer sets successfully detected all 23 clinical isolates tested. The approach is readily applicable to designing molecular assays for rapid detection and characterization of various pathogenic bacteria.

In developing any molecular assay, including real-time PCR assays, it is critical to effectively identify and select “DNA signatures” (i.e., species-specific sequences) from the pathogen of interest (1). Unfortunately, the method of DNA signature identification has generally not been well described (2). In the current study, we developed an effective workflow using Insignia, dCAS (Desktop cDNA Annotation System), and NCBI BLASTN (Fig. 1) to analyze whole-genome sequences and identify the pathogen’s DNA signatures that could be used to design highly species-specific primers for real-time PCR assays.

Using Streptococcus pyogenes as a working model, we first identified 566 DNA signatures, ranging from 125 to 478 bp in length, using the Web-accessible Insignia program (4, 5). An initial assessment of these DNA signatures by BLASTN showed that most of them still had significant homology (up to 95% similarity) with sequences from other, closely related Streptococcus species. To facilitate the identification of DNA signatures that had low homology with closely related species, we constructed a local database containing the complete genome sequences of 14 phylogenetically related species within the dCAS program (3) for specific comparison with the DNA signatures identified by the Insignia program. In the second step, the top 10% of the initial DNA signatures (n = 56) that had the highest E values against the genome sequences of the 14 closely related species, indicating their sequence distinctness, were selected for further analysis (Fig. 1). Finally, 15 of the 56 DNA signatures that had the fewest hits against the NCBI nonredundant nucleotide database by BLASTN were selected as final candidates for primer design (Table 1). These DNA signatures were deliberately chosen from different regions of the 1.8-Mb S. pyogenes genome to mitigate potential adverse effects on PCR amplification caused by any random mutation that might have occurred within these signatures.

The Primer-BLAST program (6) was used to design 15 primer sets from the 15 final selected DNA signatures. These 15 primer sets were first evaluated using conventional PCR assays performed in 10 mM Tris-HCl, pH 8.3, 50 mM KCl, 2 mM MgCl2, 200 μM deoxynucleoside triphosphates (dNTPs),
Received 4 May 2012 Accepted 8 May 2012 Published ahead of print 16 May 2012 Address correspondence to Shyh-Ching Lo,
[email protected]. Copyright © 2012, American Society for Microbiology. All Rights Reserved. doi:10.1128/JCM.01155-12
FIG 1 Workflow for primer design. A total of 566 DNA signatures conserved among 13 strains of S. pyogenes were identified by the Insignia program. These DNA signatures were screened a second time against closely related species by dCAS (Desktop cDNA Annotation System) and sorted by their E values. The selected DNA signatures were subjected to a third screening against the NCBI nonredundant database by BLASTN to ensure specificity. Primer sets were then designed from these final DNA signatures by Primer-BLAST. All of the primer sets designed were examined experimentally for their sensitivities and specificities in detecting the target S. pyogenes using conventional PCR and/or real-time PCR assays.
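
The two ranking steps that follow the Insignia search in FIG 1 can be illustrated with a minimal Python sketch. It assumes that each candidate signature has already been reduced to two summary values: its best E value against the related-species genomes and its number of BLASTN hits against the nonredundant database. The `Signature` record, its field names, and the `select_signatures` function are hypothetical and are not part of Insignia, dCAS, or BLASTN. The example data are synthetic, and the sketch leaves out the manual step of spreading the final signatures across different regions of the genome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """One candidate DNA signature with hypothetical summary scores."""
    name: str
    related_evalue: float  # best (lowest) E value of any hit in the related-species genomes
    nr_hits: int           # number of BLASTN hits against the nonredundant database


def select_signatures(signatures, top_fraction=0.10, final_count=15):
    """Keep the most distinct signatures, then those with the fewest database hits."""
    # Second layer: a high E value means only weak similarity to related species,
    # so sort in descending order and keep the top fraction.
    by_distinctness = sorted(signatures, key=lambda s: s.related_evalue, reverse=True)
    n_keep = int(len(by_distinctness) * top_fraction)
    shortlisted = by_distinctness[:n_keep]
    # Third layer: among the shortlisted signatures, prefer the fewest hits.
    by_hits = sorted(shortlisted, key=lambda s: s.nr_hits)
    return by_hits[:final_count]


if __name__ == "__main__":
    # Synthetic stand-in for the 566 initial signatures (values are made up).
    candidates = [
        Signature(f"sig{i:03d}", related_evalue=10.0 ** -(i % 50), nr_hits=i % 7)
        for i in range(566)
    ]
    final = select_signatures(candidates)
    print(f"{len(candidates)} candidates -> {len(final)} final signatures")
```

With 566 candidates and a 10% cutoff, the shortlist holds 56 signatures, matching the count given in the text. The choice of the best hit as the per-signature E value is an assumption of this sketch.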
Journal of Clinical Microbiology, August 2012, Volume 50, Number 8, p. 2770–2773 (jcm.asm.org)
Workflow for Efficient DNA Signature Discovery
TABLE 1 Evaluation of primer sets designed from the 15 selected signature sequences of S. pyogenes against DNAs of S. pyogenes and non-S. pyogenes bacteria by 2 different PCR platforms

| Primer no. | Genomic coordinates for strain M1 GAS | Gene product information | Forward primer sequence | Reverse primer sequence | Conventional PCR(a): LOD (fg)(b) | Conventional PCR(a): no. of side bands(c) | Real-time PCR(a): LOD (fg)(b) | Real-time PCR(a): no. of side peaks(d) | Real-time PCR(a): acceptability |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1036539 to 1036676 | Hypothetical protein | TTGTGAAACTGCTCTAATCGCT | GAGGCTGCTCCTGCCTATCT | 50 | 0 | 50 | 2 | Yes |
| 2 | 439564 to 439806 | Putative S-adenosylmethionine synthetase | AGGTGCAGTTGGTCGAGGTA | TCAGACAAAAATCCCCAAACCT | 50 | Many | Not done | | |
| 3 | 126413 to 126631 | Putative short-chain fatty acid transporter | TGAAATGGATGCCAGAATCGCT | GCCCGTCTTTTCCGTACCAA | 50 | Many | Not done | | |
| 4 | 1763076 to 1763304 | Hypothetical protein | CCATGACGTCTGCGATGACT | TGATAGGGTCGTCAAGTGGCT | 5 | 4 | 5 | Many | No |
| 5 | 1488956 to 1489189 | Hypothetical protein | AGCTCTTGCTGACTAGCTTTTT | GCTTTATGATCGTAGCTCTGGGA | 50 | 0 | 5 | 2 | Yes |
| 6 | 1606001 to 1606212 | Putative lactose phosphotransferase system repressor protein | TCAGATATGACCGCTAGACGAGA | TCGTGACCTTCGTCCTTTGC | 5 | 0 | 5 | Many | No |
| 7 | 142301 to 142479 | Putative V-type Na+ ATPase subunit B | GTCGATTTTGCCACGTACCG | TGCATGGTCAACTCAATCATTTGC | 5 | 0 | 5 | 5 | Yes |
| 8 | 1613451 to 1613605 | Hypothetical protein | ACTTCCTCTTTGTCCACTAGCA | TCATCGATCAACTTCCTTAAAGCG | 50 | 0 | 5 | 0 | Yes |
| 9 | 1305919 to 1306087 | Putative beta-galactosidase | AAAGCAATCAATGACCTGACCT | AGTGCCTCTAGCTGATTTGGC | 500 | Many | Not done | | |
| 10 | 211489 to 211657 | Putative histidine kinase | TCAGAACGTGGTATCGGCCT | ACCTCCTCCTTTGGCATGGT | 50 | 0 | 5 | 5 | Yes |
| 11 | 1002747 to 1002897 | Hypothetical protein | TGCTAGCAGCTCTAAGGTTCG | CACCTCACACAGTCAAGAAACA | 5 | 1 | 50 | 0 | Yes |
| 12 | 1087115 to 1087263 | Putative extramembranal protein | AGGGACTCTTCTCTCAATGCT | TGCTTGAGCTTAACCCTGGTG | 5 | 4 | 5 | 4 | Yes |
| 13 | 438388 to 438563 | Hypothetical protein | GGGAAGTTTTGCAATAAGAGAGGG | ACCTCTCGAGTTTAGGATGTGGA | 5 | Many | Not done | | |
| 14 | 137856 to 138108 | Putative V-type Na+ ATPase subunit C | TGCACCAGTCAGTCTATCACCT | TTCTGCAAAAGCCCAGCGAT | 50 | 1 | 5 | 0 | Yes |
| 15 | 223565 to 223715 | Putative glucose kinase | GCAAAAAGGGCGTTGTTTTCT | AGCGAGTAGCTAATCTTAGTGGC | 5 | 3 | 5 | 3 | Yes |

(a) For both conventional and real-time PCR testing, 5 pg to 5 fg of S. pyogenes DNA was used, and 5 ng of DNA was used for all non-S. pyogenes species.
(b) LOD, limit of detection.
(c) No. of side bands, number of side amplifications that were produced from DNA of non-S. pyogenes species in conventional PCR testing.
(d) No. of side peaks, number of side amplifications that were produced from DNA of non-S. pyogenes species in real-time PCR testing.
FIG 2 Representative conventional PCR testing of S. pyogenes-specific primer set number 7 using 5 pg, 500 fg, 50 fg, and 5 fg of purified S. pyogenes genomic DNA (lanes 1 to 4 without human DNA and lanes 5 to 8 with 5 ng of background human DNA, respectively) and 5 ng of purified DNA from Streptococcus dysgalactiae group G, Streptococcus agalactiae group B, S. dysgalactiae group C, Streptococcus sp. strain H60R group F, viridans group streptococci, Streptococcus equi group C, Streptococcus gallolyticus, Streptococcus mutans, Streptococcus uberis, Streptococcus salivarius, Streptococcus thermophilus, Streptococcus pneumoniae, Enterococcus faecium, Enterococcus faecalis, Staphylococcus aureus, Staphylococcus epidermidis, Clostridium perfringens, Listeria monocytogenes, Corynebacterium sp., Bacillus subtilis, Bacillus cereus, Bacillus circulans, Bacillus polymyxa, Bacillus megaterium, Bacillus sphaericus, Bacillus thuringiensis, Bacillus coagulans, Bacillus alvei, Escherichia coli, Salmonella enterica serovar Typhimurium, Salmonella sp. group D, Pseudomonas fluorescens, Shigella sonnei, Enterobacter aerogenes, Klebsiella pneumoniae, Morganella morganii, Proteus mirabilis, Alcaligenes odorans, Stenotrophomonas maltophilia, Candida albicans, Candida tropicalis, Candida parapsilosis, human, and a no-template control (lanes 9 to 52, respectively).
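
As a rough check on the template amounts listed for FIG 2, the following Python sketch converts a mass of S. pyogenes DNA into an approximate number of genome copies and computes the mass excess of nontarget DNA. It uses the 1.8-Mb genome size quoted in the text. The average mass of 650 g/mol per base pair is a commonly used approximation and an assumption of this sketch, and the function names are illustrative only.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # assumed average mass of one double-stranded base pair
GENOME_SIZE_BP = 1.8e6       # S. pyogenes genome size quoted in the text


def genome_mass_fg(genome_size_bp=GENOME_SIZE_BP):
    """Approximate mass of a single genome copy, in femtograms."""
    grams = genome_size_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return grams * 1e15  # 1 g = 1e15 fg


def genome_copies(dna_fg, genome_size_bp=GENOME_SIZE_BP):
    """Approximate number of genome copies contained in dna_fg femtograms of DNA."""
    return dna_fg / genome_mass_fg(genome_size_bp)


def fold_excess(background_ng, target_fg):
    """Mass ratio of background DNA (ng) to target DNA (fg)."""
    return background_ng * 1e6 / target_fg  # 1 ng = 1e6 fg


if __name__ == "__main__":
    # Dilution series used for the target DNA: 5 pg, 500 fg, 50 fg, and 5 fg.
    for amount_fg in (5000.0, 500.0, 50.0, 5.0):
        copies = genome_copies(amount_fg)
        print(f"{amount_fg:g} fg -> about {copies:.1f} genome copies")
    print(f"5 ng vs 5 fg -> {fold_excess(5.0, 5.0):.0e}-fold excess")
```

Under these assumptions one genome copy weighs a little under 2 fg, so 5 fg corresponds to roughly 2.6 copies, in line with the approximately 2.5 genome copies stated in the text, and 5 ng of nontarget DNA is a millionfold excess over 5 fg of target DNA.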
and 50 pmol of each primer with 1 U AmpliTaq gold DNA polymerase (Applied Biosystems, Carlsbad, CA) using an Eppendorf Mastercycler ep gradient S system (Eppendorf) with the following cycling parameters: an initial activation of AmpliTaq gold DNA polymerase at 94°C for 10 min, followed by 45 cycles of 94°C for 30 s (denaturation), 55°C for 30 s (annealing), and 72°C for 30 s (extension), followed by a final extension at 72°C for 7 min. All 15 primer sets produced specific amplicons of the expected sizes from S. pyogenes DNA, in the most sensitive cases from as little as 5 fg (approximately 2.5 genome copies), while no amplicons of the expected sizes were produced using 5 ng of DNA (a millionfold excess) from nontarget bacterial species, including the closely related Streptococcus species (Fig. 2). The limits of detection for the 15 selected primer sets are listed in Table 1. For most reactions, the limit of detection was not affected by adding 5 ng of human DNA as background (Fig. 2, lanes 5 to 8). For 4 primer sets (sets 2, 3, 9, and 13), amplicons with sizes different from those of the predicted S. pyogenes-specific amplicons were produced from more than 4 non-S. pyogenes species. To avoid any potential difficulty in interpreting the real-time PCR results, we did not include these 4 primer sets in the next phase of evaluation by real-time PCR.

The real-time PCR assays were carried out in triplicate for each primer set against the same set of S. pyogenes and non-S. pyogenes DNA samples studied in the previous conventional PCR. Power SYBR green PCR master mix with AmpliTaq gold DNA polymerase and a CFX96 real-time PCR detection system (Bio-Rad, Hercules, CA) were used with the following cycling parameters: an initial activation of AmpliTaq gold DNA polymerase at 94°C for 10 min, followed by 50 cycles of 94°C for 15 s (denaturation) and 60°C for 1 min. Among the 11 remaining primer sets, 2 primer sets (sets 4 and 6) (Table 1) produced some side amplicons from non-S. pyogenes DNA with melting peaks close to those of S. pyogenes. These 2 primer sets were considered “unacceptable” (Table 1) and were not selected as S. pyogenes-specific primers for the development of the real-time PCR assay. In comparison, primer sets 8, 11, and 14 showed very high specificity, with no above-the-threshold amplification observed from any of the non-S. pyogenes DNA tested (Table 1 and Fig. 3). The remaining 6 primer sets (sets 1, 5, 7, 10, 12, and 15) were also considered “acceptable.” The few side product amplicons produced by these primer sets from the non-S.
pyogenes DNA had clearly different Tm peaks or melting profiles distinguishable from that of the target S. pyogenes PCR product.

To evaluate the ability of these 9 selected primer sets to detect various S. pyogenes clinical isolates, we tested 23 field strains, belonging to 19 different emm types of S. pyogenes and collected by the CDC, using our real-time PCR assay. The results showed that DNA isolated from all 23 strains could be amplified with an efficiency similar to that for DNA isolated from ATCC strain 19615 using all 9 selected signature-derived primer sets (Fig. 4). The results indicate that the DNA signatures selected for primer design are apparently highly conserved in a wide range of different clinical isolates and strains of S. pyogenes.

FIG 3 Representative real-time PCR testing of S. pyogenes-specific primer set number 8 using 5 pg (blue), 500 fg (orange), 50 fg (pink), and 5 fg (black) of S. pyogenes genomic DNA and 5 ng of DNA from S. dysgalactiae group G, S. agalactiae group B, S. dysgalactiae group C, Streptococcus sp. strain H60R group F, viridans group streptococci, S. equi group C, S. gallolyticus, S. mutans, S. uberis, S. salivarius, S. thermophilus, S. pneumoniae, Enterococcus faecium, E. faecalis, Staphylococcus aureus, S. epidermidis, Clostridium perfringens, Listeria monocytogenes, Corynebacterium sp., Bacillus subtilis, B. cereus, B. circulans, B. polymyxa, B. megaterium, B. sphaericus, B. thuringiensis, B. coagulans, B. alvei, E. coli, S. Typhimurium, Salmonella sp. group D, Pseudomonas fluorescens, Shigella sonnei, Enterobacter aerogenes, Klebsiella pneumoniae, Morganella morganii, Proteus mirabilis, Alcaligenes odorans, Stenotrophomonas maltophilia, Candida albicans, C. tropicalis, C. parapsilosis, human, and a no-template control (green). (A) Amplification curve. (B) Standard curve. (C) Melting peaks of PCR products; Tm specific for S. pyogenes amplicons is shown.

FIG 4 Representative real-time PCR testing of S. pyogenes-specific primer set number 8 using 500 fg genomic DNA extracted from 23 clinical isolates and strain ATCC 19615. (A) Amplification curve. Orange, S. pyogenes DNA from ATCC 19615 tested in triplicate; green, DNA from 23 field strains collected by the CDC. (B) Melting peaks of PCR products; Tm specific for S. pyogenes amplicons is shown.

In conclusion, the study demonstrated that our computational workflow, with a genome-wide approach, could be a useful tool to effectively identify multiple, highly species-specific DNA signatures for various bacterial pathogens. The approach should be most applicable in the development of highly specific and sensitive real-time PCR arrays for simultaneous detection of a group of target bacterial pathogens through rapid design of multiple species-specific primer sets. This genome-wide approach should prove even more useful and effective with the rapid advance of next-generation sequencing technology and the increasingly common use of high-throughput sequencers in ordinary laboratories.

ACKNOWLEDGMENTS

We thank Chris A. van Beneden and Bernard Beall from the CDC for providing S. pyogenes strains used in this study. Oligonucleotide synthesis was provided by the Core Facility of CBER, FDA. Adam Phillippy and Jason Guo are acknowledged for their technical assistance with the Insignia and dCAS programs, respectively. We also thank Shien Tsai for critically reading the manuscript and Jing Zhang for editorial assistance. The NIH Library’s editorial service is also acknowledged.
This project was supported in part by an appointment to the Research Participation Program at the Center for Biologics Evaluation and Research administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and the U.S. Food and Drug Administration.
REFERENCES

1. Albuquerque P, Mendes MV, Santos CL, Moradas-Ferreira P, Tavares FF. 2008. DNA signature-based approaches for bacterial detection and identification. Sci. Total Environ. 26:3641–3651.
2. Fukushima H, Tsunomori Y, Seki R. 2003. Duplex real-time SYBR green PCR assays for detection of 17 species of food- or waterborne pathogens in stools. J. Clin. Microbiol. 41:5134–5146.
3. Guo YJ, Jose MC, Ribeiro JM, Anderson JM, Bour S. 2009. dCAS: a desktop application for cDNA sequence annotation. Bioinformatics 25:1195–1196.
4. Phillippy AM, Ayanbule K, Edwards NJ, Salzberg SL. 1 July 2009. Insignia: a DNA signature search web server for diagnostic assay development. Nucleic Acids Res. doi:10.1093/nar/gkp286.
5. Phillippy AM, et al. 2007. Comprehensive DNA signature discovery and validation. PLoS Comput. Biol. 3:e98. doi:10.1371/journal.pcbi.0030098.
6. Rozen S, Helen JS. 2000. Primer3 on the WWW for general users and for biologist programmers, p 365–386. In Krawetz S, Misener S (ed), Bioinformatics methods and protocols: methods in molecular biology. Humana Press, Totowa, NJ.