
Short Technical Reports

Combining multiple PCR primer pairs for each amplicon can improve SNP genotyping accuracy by reducing allelic drop-out

Scott J. Tebbutt1,2 and Jian Ruan1

1The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research, University of British Columbia, St. Paul’s Hospital, Vancouver, BC, Canada and 2Department of Medicine, Division of Respiratory Medicine, University of British Columbia, Providence Heart + Lung Institute, St. Paul’s Hospital, Vancouver, BC, Canada

BioTechniques 45:637-646 (December 2008) doi 10.2144/000112992

We have developed a systematic, single-tube assay approach to reduce the chance of SNP genotyping error due to otherwise unidentifiable allelic drop-out during PCR amplification. Allelic drop-out in such cases would normally be caused by additional and rare genetic variation within the PCR primer site sequence itself. Our method is novel in that it does not require prior knowledge of the additional “hidden” genetic variation. The method has been tested in multiplex using a microarray-based SNP genotyping chip and results in the rescue of previously reported false genotype calls.

INTRODUCTION

In response to the ever-increasing numbers of research studies and diagnostic applications that involve the genotyping of single nucleotide polymorphisms (SNPs), the causes and consequences of genotyping errors have been the subject of several publications in recent years (1,2). Solutions to detect and/or minimize errors have been proposed and implemented in the high-throughput research realm, and usually involve careful research study design and sophisticated statistical methods (2,3). In the lower-throughput research and diagnostic fields, genotyping errors have often been detected only by chance, and individual molecular assays have subsequently been redesigned to correct for the specific error-causing issue (4,5).

One of the major causes of genotyping error is the presence of additional sequence variation close to the genetic marker that is actually under interrogation (1,6–8). This can cause partial or full allelic drop-out, due to inefficient primer hybridization in instances where the additional variation overlaps the primer sequence. It has been estimated that primer-site SNPs (primerSNPs) may account for half of all missed heterozygotes in some data sets, plausibly due to the missed heterozygotes also being heterozygous at a primerSNP (8). If the primerSNP is within a PCR priming site, allelic drop-out would be caused by preferential amplification of the chromosome with the perfectly matched allele, thereby leading the investigator to erroneously believe the locus being queried is homozygous.

Our laboratory has previously reported three such examples of allelic drop-out due to PCR primerSNPs (9) or primer “single nucleotide variants” in a study designed to validate robust microarray-based SNP genotyping methodology and analysis algorithms: 50-plex PCR was used to amplify 50 HapMap-characterized (www.hapmap.org) SNP loci from each of 49 human Coriell DNA samples (Coriell Institute for Medical Research, Camden, NJ, USA), followed by de-multiplexing and genotyping on mini-sequencing chips (9–12). We achieved a 100% genotype call rate and >99.9% accuracy with just three discordant genotypes. For each of these discordant genotype cases we identified additional single nucleotide variant sites that coincided with the positions delimited by the PCR primer sequences used for the 50-plex reaction, and were therefore consistent with full or partial allelic drop-out during the PCR. Only one of these sites was identified as an existing SNP (rs6871885), referenced in publicly available databases. As we previously concluded, more stringent due diligence at the PCR primer design stage would have avoided the deleterious effect of this primerSNP. Nevertheless, our identification of two hitherto unreported single nucleotide variants strengthens the concept that previously unknown variants cannot be avoided (8,13) and, as such, genotyping will always be somewhat at risk of error due to allelic drop-out.

Solutions to these types of challenges have been proposed and tested, including independent sequencing from more than one amplicon, which has been shown to dramatically reduce the impact of unknown primerSNPs on the rate of missed heterozygotes (8). We report here another, more systematic, single-tube multiplex PCR assay solution that we believe is a novel and very simple method to reduce the risk of otherwise unidentifiable allelic drop-out.

Figure 1. SNP Charts of discrepant genotype cases. SNP Charts (left-hand panels) showing the fluorescent signal intensity data for the three discrepant genotype cases arising from allelic drop-out during initial 50-plex PCR. Each chart shows four-channel fluorescent intensity data (y-axis: A, C, G and T) from 30 SNP-specific array spots (five replicate spots for six different probes, arranged along the x-axis). For example, starting at the left-hand side of SNP Chart (i), the first five spots (LEFT G/A) refer to the left-hand APEX probe that emits either a single G (red) signal (for homozygous GG genotypes), or an A (yellow) signal (for homozygous AA genotypes), or a mixture of G and A (heterozygous GA). The next five spots (RIGHT C/T) refer to the right-hand APEX probe, which interrogates nucleotides of the DNA strand complementary to that interrogated by the left-hand APEX probe, thus giving a single C (green) signal (for CC), a single T (blue) signal (for TT), or a mixed C and T signal (for TC). The remaining spots represent allele-specific (ASO) APEX probes in which base-specific fluorescence signifies the presence of the allele. Allele-specific APEX probes (two probes per DNA strand) include the actual SNP site allele at the 3′ end of the probe (20,21). Allele-specific single base extension of these ASO-APEX probes during the reaction is contingent on the presence of the actual complementary base at the SNP site in the sample template DNA. “_1” probes correspond to the first allele (G in the case of rs3776720), and “_2” probes correspond to the second allele (A). The redundancy and consistency of the data across different probes give high confidence in the assigned genotypes. The right-hand panels show DNA sequencing chromatograms of the three sample loci from independently amplified material, with the true SNP genotypes annotated in black and the additional genetic variation annotated in orange. The PCR primers are delimited by horizontal arrows, clearly showing the relative position of the new genetic variants that caused allelic drop-out.

MATERIALS AND METHODS

Original Multiplex PCR Methodology and Genotype Error Detection

Our original 50-plex PCR methodology with subsequent SNP genotyping assay is detailed in Podder et al. (9). Briefly, PCR primers were designed for 50 SNP loci, with amplicon sizes restricted to 100–200 bp. Each PCR primer had a common linker sequence designed at its 5′ end (5′-TACGACTCACTTAGGGAG-3′ for each of the left-hand PCR primers, and 5′-CGATGTAGGTGACACTAG-3′ for each of the right-hand PCR primers). The linkers serve to stabilize the PCR reaction, especially in multiplex mode. Specific PCR cycling conditions were adopted from a previously published study by Wang et al. (14). PCR and arrayed primer extension (APEX) genotyping assays were performed, and microarray image data were imported into SNP Chart genotyping software (SNP Chart v3.1, University of British Columbia, Vancouver, Canada) (11) and analyzed as described previously (9,15). Genotypes were compared with HapMap data for concordance. The SNP loci from the three discrepant genotype cases were independently re-amplified using longer-range PCR primers, followed by sequencing as previously described (9).
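The linker-tailed primer design described under Original Multiplex PCR Methodology and Genotype Error Detection can be sketched in a few lines of Python. This is only an illustration, not the authors' design software: the two linker sequences and the 100–200 bp amplicon limits are taken from the text, while the function names are hypothetical. The example input is the rs3776720 anti-sense primer sequence quoted in Results and Discussion, and treating it as a right-hand primer is an assumption made purely for demonstration.

```python
# Common 5' linker sequences quoted in the text.
LEFT_LINKER = "TACGACTCACTTAGGGAG"
RIGHT_LINKER = "CGATGTAGGTGACACTAG"


def tail_primer(locus_specific: str, side: str) -> str:
    """Return a full primer: common 5' linker followed by the locus-specific part.

    `side` is "left" or "right" (hypothetical labels for the two linker types).
    Spaces in the input are ignored so triplet-formatted sequences can be pasted.
    """
    seq = locus_specific.replace(" ", "").upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError(f"not a plain DNA sequence: {locus_specific!r}")
    linkers = {"left": LEFT_LINKER, "right": RIGHT_LINKER}
    if side not in linkers:
        raise ValueError("side must be 'left' or 'right'")
    return linkers[side] + seq


def amplicon_size_ok(length_bp: int, low: int = 100, high: int = 200) -> bool:
    """Check the 100-200 bp amplicon size restriction stated in the text."""
    return low <= length_bp <= high


if __name__ == "__main__":
    # Assumption: the anti-sense primer is treated here as a right-hand primer.
    primer = tail_primer("TAT TGC AGG CAG ACG TGA", "right")
    print(primer, len(primer))
    print(amplicon_size_ok(150), amplicon_size_ok(250))
```

Keeping the linker separate from the locus-specific sequence makes it straightforward to generate a second, redundant primer set that carries the same linkers.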

Redundant Multiplex PCR Primer Design

An additional pair of PCR primers was designed for each of the 50 SNP loci, with similar properties to the original 50-plex PCR primer pairs (including the equivalent common 5′ linker sequences; see Materials and Methods, Original Multiplex PCR Methodology and Genotype Error Detection). This gave four primers for each SNP locus, or 200 PCR primers in total for the 50 SNPs, all of which would be within the same reaction vessel. Although primers for each SNP were allowed to be contiguous with one another, to ensure short amplicons, no PCR primer was allowed to overlap in any way with the sequence of any other PCR primer, or with any of the sequences delimited by the APEX genotyping probes used subsequently.

Redundant Multiplex PCR

The redundant multiplex PCR was carried out in a 25-μL reaction containing 20 nM (final) of each primer plus 20 nM of left and right linker-only primers (9), 200 μM dNTPs without dTTP, 160 μM dTTP, 40 μM dUTP, 6 U of HotStar Taq DNA polymerase (5 U/μL; Qiagen, Valencia, CA, USA), 1.5 mM MgCl2 in 1× PCR reaction buffer (100 mM Tris-HCl, 50 mM KCl, 100 μg/mL gelatin, pH 8.3) with 5 ng of genomic DNA (for three separate Coriell DNA samples: NA18502, NA18621, and NA19210). PCR was performed using an MJR PTC 200 ThermoCycler (MJ Research, Waltham, MA, USA). PCR was initiated by a 15-min polymerase activation step at 95°C and completed by a final 3-min extension step at 72°C. The reaction procedure consisted of 40 cycles of denaturation at 95°C for 40 s, primer annealing at 55°C for 2 min and one ramping-up step from 55°C to 70°C for 2.5 min (0.1°C/s) (9,14). PCR amplicon fragmentation followed by APEX genotyping assays and analysis were performed as described previously (9).

RESULTS AND DISCUSSION

Figure 1 shows the SNP Chart data for the three originally discrepant genotype cases, as well as the actual sequencing results showing the additional single nucleotide variation. SNP Chart (i) shows a high-quality, unambiguous GG genotype for sample NA18502 [Yoruba; Ibadan, Nigeria (YRI)] at SNP rs3776720, which is discrepant from the true GA genotype. We found a neighboring SNP (T/A) which is located at the 3′ end of the anti-sense PCR primer site [5′-linker-TAT TGC AGG CAG ACG TGA-3′ (bolded and underlined bases represent the site of new sequence variants)]. This polymorphic site (30 bp downstream of rs3776720) is reported in dbSNP as rs6871885, with the A base (sense strand) being described as a rare allele (0.083) only in sub-Saharan African populations (NA18502 is heterozygous for this SNP). SNP Chart (ii) shows a high-quality, unambiguous CC genotype for sample NA18621 [Han Chinese; Beijing, China (CHB)] at SNP rs12472674, which is discrepant from the true CT genotype. We found a sequence variant (G/A) 52 bp downstream of SNP rs12472674, located within the anti-sense PCR primer site (5′-linker-CTC AAT ATG TTA CCA CAA-3′). This variant (heterozygous in NA18621) has not been previously reported in dbSNP and may represent a novel polymorphism. SNP Chart (iii) shows a low-quality, ambiguous TT/CT genotype for sample NA19210 (YRI) at SNP rs4739199, which is discrepant from the true CT genotype. We found a sequence variant (G/A) 45 bp downstream of SNP rs4739199, located within the anti-sense PCR primer site (5′-linker-TCC ACT TCA TTA GGT GAA-3′); this variant (heterozygous in NA19210) also has not been previously reported in dbSNP and may represent a novel polymorphism.

Interestingly, the additional single nucleotide variation that occurred at a position corresponding to the actual 3′ end of the PCR primer (for rs3776720), and the variant that was positioned several bases upstream of the 3′ end of the primer (for rs12472674), both caused complete allelic drop-out, resulting in high-quality genotype calls that were entirely wrong in these cases. For the third case, in which the variant was positioned close to the 5′ end of the SNP locus-specific part of the primer (for rs4739199), the resulting genotype call had a poor quality score, with one autocalling method determining the genotype as CT and another as TT. This likely reflects partial allelic drop-out. For research and especially clinical diagnostic applications, it would be better to have a partial allelic drop-out causing poor-quality genotype calls that can be automatically flagged and either discarded or manually inspected, than a complete allelic drop-out, which would result in high-quality homozygous calls that could be both wrong and undetectable.

In order to compensate for the allelic drop-out effect of these hidden single nucleotide variants, we needed to design a systematic approach that could be applied across all 50 SNPs at the 50-plex PCR stage. This would ensure that the alleles at all the remaining 47 SNP sites would also be subject to the design approach, not just the three discrepant SNP sites. Such a design criterion would be necessary for any truly unbiased compensation, one which could reasonably be argued to be capable of accounting for hidden single nucleotide PCR primer variants in any of the 50 SNP loci amplicons. We also wanted to minimize any extra steps in the PCR and genotyping process, thereby keeping costs down and the methodology as simple as possible. We reasoned that if a PCR primer was failing to hybridize effectively to a template due to a primerSNP, then it would easily be displaced by polymerization from another primer that was able to hybridize further upstream, similar to the principle of the TaqMan assay (Applied Biosystems, Foster City, CA, USA) (16).
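The redundancy reasoning can be made concrete with a toy enumeration. The sketch below assumes two forward and two reverse primers per locus in the same tube (the original pair plus the additional pair described in Materials and Methods) and treats a primer as simply usable or unusable on a given chromosome, which is a deliberate simplification of real hybridization behavior. The primer labels and function names are hypothetical.

```python
from itertools import product

# Hypothetical labels for one SNP locus: "orig" is the original primer pair,
# "extra" is the additional (redundant) pair added to the same tube.
FORWARD = ["F_orig", "F_extra"]
REVERSE = ["R_orig", "R_extra"]


def primer_pair_combinations(forward, reverse):
    """Every forward/reverse pairing that could delimit an amplicon."""
    return list(product(forward, reverse))


def usable_combinations(forward, reverse, mismatched):
    """Pairings in which neither primer sits on a primer-site variant.

    `mismatched` is the set of primers assumed to fail on one chromosome,
    for example because of a hidden primerSNP.
    """
    return [
        (f, r)
        for f, r in primer_pair_combinations(forward, reverse)
        if f not in mismatched and r not in mismatched
    ]


if __name__ == "__main__":
    # Number of pairings available at one locus.
    print(len(primer_pair_combinations(FORWARD, REVERSE)))
    # Total primers in the tube for 50 loci with four primers each.
    print(50 * (len(FORWARD) + len(REVERSE)))
    # A variant under the original reverse primer still leaves pairings
    # that can amplify the affected chromosome.
    print(usable_combinations(FORWARD, REVERSE, {"R_orig"}))
```

Under these assumptions a single primer-site variant removes two of the pairings at a locus but leaves two others, whereas running the two primer sets in separate tubes would leave only one unaffected pairing.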
We therefore designed an additional pair of PCR primers for each of the 50 SNP loci, with similar properties to the original 50-plex PCR primer pairs (including the equivalent common 5′ linker sequences). This would give four primers for each SNP locus, or 200 PCR primers in total for the 50 SNPs, all within the same reaction vessel. Such a redundancy would be more effective at reducing allelic drop-out than if we simply performed another 50-plex PCR using the additional set of primer pairs separately from the original set. This is because two primer pairs per SNP working simultaneously in the same tube would give four different primer pair combinations for each SNP locus.

When we performed our novel, redundant multiplex PCR (with subsequent APEX genotyping) we obtained accurate genotyping results for all three Coriell samples, exactly matching our previous data except for the three discordant genotype cases described above and in Figure 1. Figure 2 shows the positions of the additional PCR primer sites for each of the three previously discordant SNP loci, as well as the new SNP Charts for the actual corrected genotype cases themselves. It is clear that the previously incorrect data sets (Figure 1, SNP Charts) have now been corrected, which is evidence that the allelic drop-out has been eliminated. To reiterate from above, the other 47 SNP loci also had additional primers added to the same mix. The genotypes of each of these SNPs for each of the three Coriell samples remained the same as previously determined by us, and concordant with the HapMap database.

Figure 2. SNP Charts of corrected genotype cases. SNP Charts (right-hand panels) show the fluorescent signal intensity data for the three previously discordant genotype cases now corrected by the new redundant 50-plex PCR method. Chart interpretation is as described for Figure 1. The left-hand panels show the DNA sequences and PCR primer positions of the three sample loci, with the interrogated SNP site alleles highlighted in green. The original PCR primer pairs are highlighted (and underlined) in magenta with the primerSNP variants shown in orange. The additional primers that were added to the multiplex PCR mix for these three loci are highlighted (and underlined) in blue. The yellow underlined sequences at the 5′ and 3′ ends of each locus represent PCR primer pairs used in the independent amplifications and subsequent DNA sequencing reactions, as described in Materials and Methods.

With respect to economics, we acknowledge that if a previously unknown primerSNP can be identified, it may be more cost-effective to simply redesign subsequent assays completely, rather than include extra, redundant primers. However, we believe that our method is cost-effective for avoiding allelic drop-outs due to hidden primerSNPs that are unidentified. The extent of such unidentified single nucleotide genetic variation in individual genomes is currently unknown, and awaits the results of additional human DNA sequencing endeavors such as the Personal Genome Project and the 1000 Genomes Project (17).
The fact that we have discovered two potentially new variants in extremely limited genomic areas of two out of 49 tested DNA samples suggests that the extent may be quite high.

Our redundant and robust PCR amplification approach might also have advantages for mutation detection using Sanger dideoxy sequencing (18). If sequencing chromatograms show unexpectedly high levels of homozygosity (i.e., 100%) at the base peaks for a gene target that would perhaps be expected to have genetic variability, this might be due to a hidden SNP within one of the original PCR primers causing failure of one of the chromosome targets to amplify. Multiple pairs of PCR primers, working simultaneously in the same tube, would increase the likelihood of amplification of both allelic targets. Subsequent independent sequencing from multiple primers would certainly detect hidden primerSNPs, as well as new mutations further within the amplicon region of the originally dropped-out chromosome, but would increase the costs of the sequencing.

Finally, data reported by Koboldt et al. describe a surprisingly close clustering of SNPs in the human genome (1), with the proportion of adjacent SNPs being sharply increased if they are very near each other compared with pairs further apart. Specifically, for allocated SNPs on autosomes, 7.5% had an SNP within 1–5 bp, 6.0% had an SNP within 6–10 bp, and 3.3% had an SNP within 26–30 bp (1). In light of these data, genotyping primers could be even more affected by hidden SNPs than PCR primers. On the other hand, the genotyping primers might be less sensitive to mismatches other than those at their 3′ ends. Our robust assays, which are independently redundant both at the PCR amplification level and at the SNP genotyping level (genetic probes for both DNA strands, including allele-specific probes), are likely to be more immune to the effects of hidden SNPs than other genotyping design approaches, such as that recently described by Krjutskov et al., who used a single pair of primers for both PCR and genotyping (19).
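The concordance comparisons reported in this section (called genotype versus reference genotype) can be sketched as follows. This is a minimal illustration rather than the SNP Chart or autocalling software: the sample identifiers, SNP identifiers and genotypes for the three discordant cases are taken from the text, the function names are hypothetical, and the ambiguous TT/CT call for NA19210 is represented here by the TT call alone, which is a simplification.

```python
def normalize(genotype: str) -> str:
    """Order-independent form of a two-allele genotype (so 'GA' equals 'AG')."""
    return "".join(sorted(genotype.upper()))


def discordant_calls(calls: dict, reference: dict) -> list:
    """Return (sample, snp, called, expected) for each call that disagrees."""
    mismatches = []
    for (sample, snp), called in calls.items():
        expected = reference[(sample, snp)]
        if normalize(called) != normalize(expected):
            mismatches.append((sample, snp, called, expected))
    return mismatches


# True genotypes for the three cases, as stated in the text.
REFERENCE = {
    ("NA18502", "rs3776720"): "GA",
    ("NA18621", "rs12472674"): "CT",
    ("NA19210", "rs4739199"): "CT",
}

# Calls from the original 50-plex PCR (TT stands in for the ambiguous TT/CT call).
ORIGINAL_CALLS = {
    ("NA18502", "rs3776720"): "GG",
    ("NA18621", "rs12472674"): "CC",
    ("NA19210", "rs4739199"): "TT",
}

# Calls after the redundant multiplex PCR, reported in the text as corrected.
REDUNDANT_CALLS = dict(REFERENCE)

if __name__ == "__main__":
    for row in discordant_calls(ORIGINAL_CALLS, REFERENCE):
        print("discordant:", row)
    print("remaining after redundant PCR:",
          len(discordant_calls(REDUNDANT_CALLS, REFERENCE)))
```

Normalizing allele order before comparison avoids flagging GA against AG as a discrepancy when two data sources simply list the alleles differently.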
In summary, we have successfully designed and tested a robust and redundant single-tube multiplex PCR assay that is capable of systematically reducing the influence of primerSNPs, while retaining highly accurate genotyping. Such an assay design may have useful and important application in clinical genetic diagnostics, either in single- or multiplex PCR format, where accurate genotyping of an individual patient across few or many genetic markers may assist in healthcare decision-making.

ACKNOWLEDGEMENTS

We would like to thank Paul Lem for helpful discussions on multiplex PCR methodology, and Andrew Sandford for critical reading of the manuscript. Microarrays used in this study were kindly printed for us by staff at the Microarray Facility of the Prostate Centre at Vancouver General Hospital. We thank the reviewers of this manuscript for their helpful comments, and Bruce McManus for continued support. This research was supported by the National Sanitarium Association (Canada), AllerGen NCE, and the Michael Smith Foundation for Health Research.

COMPETING INTERESTS STATEMENT

The authors declare no competing interests.

REFERENCES

1. Koboldt, D.C., R.D. Miller, and P.Y. Kwok. 2006. Distribution of human SNPs and its effect on high-throughput genotyping. Hum. Mutat. 27:249-254.
2. Pompanon, F., A. Bonin, E. Bellemain, and P. Taberlet. 2005. Genotyping errors: causes, consequences and solutions. Nat. Rev. Genet. 6:847-859.
3. Sobel, E., J.C. Papp, and K. Lange. 2002. Detection and integration of genotyping errors in statistical genetics. Am. J. Hum. Genet. 70:496-508.
4. Laios, E. and K. Glynou. 2008. Allelic dropout in the LDLR gene affects mutation detection in familial hypercholesterolemia. Clin. Biochem. 41:38-40.
5. Ward, K.J., S. Ellard, C.S. Yajnik, T.M. Frayling, A.T. Hattersley, P.N. Venigalla, and G.R. Chandak. 2006. Allelic drop-out may occur with a primer binding site polymorphism for the commonly used RFLP assay for the -1131T>C polymorphism of the Apolipoprotein AV gene. Lipids Health Dis. 5:11.
6. Heinrich, M., M. Muller, S. Rand, B. Brinkmann, and C. Hohoff. 2004. Allelic drop-out in the STR system ACTBP2 (SE33) as a result of mutations in the primer binding region. Int. J. Legal Med. 118:361-363.
7. Kirsten, H., D. Teupser, J. Weissfuss, G. Wolfram, F. Emmrich, and P. Ahnert. 2007. Robustness of single-base extension against mismatches at the site of primer attachment in a clinical assay. J. Mol. Med. 85:361-369.
8. Quinlan, A.R. and G.T. Marth. 2007. Primer-site SNPs mask mutations. Nat. Methods 4:192.
9. Podder, M., J. Ruan, B.W. Tripp, Z.E. Chu, and S.J. Tebbutt. 2008. Robust SNP genotyping by multiplex PCR and arrayed primer extension. BMC Med. Genomics 1:5.
10. Podder, M., W.J. Welch, R.H. Zamar, and S.J. Tebbutt. 2006. Dynamic variable selection in SNP genotype autocalling from APEX microarray data. BMC Bioinformatics 7:521.
11. Tebbutt, S.J., I.V. Opushnyev, B.W. Tripp, A.M. Kassamali, W.L. Alexander, and M.I. Andersen. 2005. SNP Chart: an integrated platform for visualization and interpretation of microarray genotyping data. Bioinformatics 21:124-127.
12. Walley, D.C., B.W. Tripp, Y.C. Song, K.R. Walley, and S.J. Tebbutt. 2006. MACGT: multi-dimensional automated clustering genotyping tool for analysis of microarray-based mini-sequencing data. Bioinformatics 22:1147-1149.
13. Lahermo, P., U. Liljedahl, G. Alnaes, T. Axelsson, A.J. Brookes, P. Ellonen, P.H. Groop, C. Hallden, et al. 2006. A quality assessment survey of SNP genotyping laboratories. Hum. Mutat. 27:711-714.
14. Wang, H.Y., M. Luo, I.V. Tereshchenko, D.M. Frikker, X. Cui, J.Y. Li, G. Hu, Y. Chu, et al. 2005. A genotyping system capable of simultaneously analyzing >1000 single nucleotide polymorphisms in a haploid genome. Genome Res. 15:276-283.
15. Tebbutt, S.J., G.D. Mercer, R. Do, B.W. Tripp, A.W. Wong, and J. Ruan. 2006. Deoxynucleotides can replace dideoxynucleotides in minisequencing by arrayed primer extension. Biotechniques 40:331-338.
16. Livak, K.J., S.J. Flood, J. Marmaro, W. Giusti, and K. Deetz. 1995. Oligonucleotides with fluorescent dyes at opposite ends provide a quenched probe system useful for detecting PCR product and nucleic acid hybridization. PCR Methods Appl. 4:357-362.
17. Lunshof, J.E., R. Chadwick, D.B. Vorhaus, and G.M. Church. 2008. From genetic privacy to open consent. Nat. Rev. Genet. 9:406-411.
18. Sanger, F., S. Nicklen, and A.R. Coulson. 1977. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74:5463-5467.
19. Krjutskov, K., R. Andreson, R. Magi, T. Nikopensius, A. Khrunin, E. Mihailov, V. Tammekivi, H. Sork, et al. 2008. Development of a single tube 640-plex genotyping method for detection of nucleic acid variations on microarrays. Nucleic Acids Res. 36:e75.
20. Gemignani, F., C. Perra, S. Landi, F. Canzian, A. Kurg, N. Tonisson, R. Galanello, A. Cao, et al. 2002. Reliable detection of beta-thalassemia and G6PD mutations by a DNA microarray. Clin. Chem. 48:2051-2054.
21. Pastinen, T., M. Raitio, K. Lindroos, P. Tainola, L. Peltonen, and A.C. Syvanen. 2000. A system for specific, high-throughput genotyping by allele-specific primer extension on microarrays. Genome Res. 10:1031-1042.

Received 13 June 2008; accepted 9 September 2008.

Address correspondence to Scott J. Tebbutt, iCAPTURE Centre, St. Paul’s Hospital, Room 166, 1081 Burrard Street, Vancouver, BC, V6Z 1Y6, Canada. e-mail: [email protected]

To purchase reprints of this article, contact: [email protected]
