Journal of Heredity 2012:103(2):308–312 doi:10.1093/jhered/esr137 Advance Access publication January 13, 2012
© The American Genetic Association. 2012. All rights reserved. For permissions, please email: [email protected].
Computer Note Bisprimer—A Program for the Design of Primers for Bisulfite-Based Genomic Sequencing of Both Plant and Mammalian DNA Samples VIERA KOVACOVA
AND BOHUSLAV JANOUSEK
From the Department of Plant Developmental Genetics, Institute of Biophysics, Academy of Sciences of the Czech Republic, v.v.i. Kralovopolska 135, 612 65 Brno, Czech Republic. Address correspondence to Bohuslav Janousek at the address above, or e-mail:
[email protected].
Plants and animals differ in the sequence context of the methylated sites in DNA. Plants exhibit cytosine methylation in CG, CHG, and CHH sites, whereas CG methylation is the only form present in mammals (with the exception of early embryonic development). This fact must be taken into account in the design of primers for bisulfite-based genomic sequencing because CHG and CHH sites can remain unmodified. Surprisingly, no user-friendly primer design program is publicly available that could be used to design primers in plants and to simultaneously check the properties of primers such as the potential for primer-dimer formation. For studies concentrating on particular DNA loci, the correct design of primers is crucial. The program, called BisPrimer, includes 2 different subprograms for primer design, the first one for mammals and the second one for angiosperm plants. Each subprogram is divided into 2 variants. The first variant serves to design primers that preferentially bind to the bisulfite-modified primer-binding sites (C to U conversion). This type of primer preferentially amplifies the bisulfite-converted DNA strands. This feature can help to avoid problems connected with an incomplete bisulfite modification that can sometimes occur for technical reasons. The second variant is intended for the analysis of samples that are supposed to consist of a mixture of DNA molecules that have different levels of cytosine methylation (e.g., pollen DNA). In this case, the aim is to minimize the selection in favor of either less methylated or more methylated molecules.

Key words: bisulfite, cytosine methylation, DNA methylation, genomic sequencing, primers

Numerous studies in human and in mammalian models documented the important role of cytosine methylation in such serious phenomena as cancer (reviewed by Mund and Lyko 2010) and heritable diseases (reviewed by Robertson 2005 and Hitchins 2010). There is also an increasing amount of evidence that the cytosine methylation status plays an important role in the control of various epigenetic processes in plants. Examples include vernalization (Kakutani 1997), the control of the differential expression of genes in the embryo and endosperm via imprinting (Jullien et al. 2006), paramutation (reviewed by Suter and Martin 2010), heritable gene silencing in polyploids (Mittelsten Scheid et al. 2003), and the stabilization of the quiescent status in pollen (Janousek et al. 2000) and in the central zone of the shoot apical meristem (Zluvova et al. 2001). Reports of the role of changed DNA methylation patterns in the inheritance of phenotypic defects mostly concern changes in flower morphology (Janousek et al. 1996; Cubas et al. 1999; Jacobsen et al. 2000; Martin et al. 2009). Recently, Arabidopsis epigenome studies were completed using a combination of a 454 strategy and bisulfite sequencing (Cokus et al. 2008; Lister et al. 2008), but detailed studies of the DNA methylation dynamics in particular genes remain very important because they can reveal the role of epigenetics in phenotypic changes. Moreover, data on DNA methylation patterns in non-Arabidopsis species are extremely rare, and so every contribution in this area is valuable.

An important difference between plants and animals exists in the sequence context of the methylated sites; angiosperm plants exhibit cytosine methylation in CG, CHG, and CHH sites, whereas in mammals CG methylation is the only form present (with the exception of early embryonal development, Lister et al. 2009). This fact must be taken into account in the design of primers for bisulfite-based genomic sequencing because CHG and CHH sites can remain unmodified.

Surprisingly, no user-friendly primer design program is publicly available that could be used to design primers for bisulfite-based genomic sequencing in plants and to simultaneously check the properties of the primers, such as the potential for primer-dimer formation. The only publicly available primer design program for angiosperm plants, called Kismeth, neither filters the searched primers according to quality (hairpins, self-complementarity, and forward/reverse complementarity) nor shows the results of an analysis of the primer quality (Gruntman et al. 2008, see also Table 1). For studies concentrating on particular loci, the correct design of primers is crucial. Our program aims to fill this gap.

Table 1
Comparison of the publicly available programs for the design of primers for sodium bisulfite genomic sequencing with Bisprimer

Program | Reference | Availability (server/download) | Expected context of DNA methylation: CG | CHG | CHH | Modified template preference (selective/nonselective variants) | Check of quality: hairpins, self-complementarity, and forward/reverse pair annealing | Check of quality: long single-base stretches | MSP (methylation sensitive PCR primers design) | ePCR (detection of mispriming sites in databases)
Bisprimer | This article | Free download from Web | Yes | Yes | Yes | Both | Yes | Yes | No | No
Kismeth | Gruntman et al. (2008) | Web server | Yes | Yes | Yes | Nonselective | No | No | No | No
Methprimer | Li and Dahiya (2002) | Web server | Yes | No | No | Selective | No | No | Yes | No
Methyl Primer Express | Applied Biosystems (2006) | Free download from Web | Yes | No | No | Selective | No | No | Yes | No
PerlPrimer | Marshall (2004) | Free download from Web | Yes | No | No | Selective | Yes | Yes | No | In several animal and yeast models
Bisearch | Arányi and Tusnády (2007) | Web server | Yes | No | No | Nonselective | Yes | No | Yes | In several animal and yeast models

Program Description
The program presented here, called BisPrimer, was designed specifically for the DNA methylation analysis of plant
samples that contain CG-, CHG-, and CHH-methylated sites. To allow use with other organisms, the program can also be switched to design primers for samples containing only CG methylation. The program, therefore, includes 2 different subprograms for primer design (for mammalian or plant samples). Each subprogram can be used in either a selective or a nonselective variant.

The selective variant of each subprogram serves to design primers that preferentially bind to the bisulfite-modified primer binding sites (C to U conversion). This type of primer preferentially amplifies the bisulfite-converted DNA strands. This feature can help avoid problems connected with incomplete bisulfite modification, which can sometimes occur for technical reasons. This type of selection is also possible in plants because methylation in the CHH context is relatively rare even in plants (Cokus et al. 2008).

The nonselective variant aims to design primers that bind to completely and incompletely modified DNA templates with the same efficiency. In plants (where methylation in the CHH context can also be present), this approach can be especially advantageous in the analysis of samples that are supposed to consist of a mixture of DNA molecules with different levels of cytosine methylation (e.g., pollen DNA, Oakeley and Jost 1996). In this case, the aim is to design the primers so that both heavily modified molecules (from weakly methylated original templates) and weakly modified molecules (from heavily methylated original templates) are amplified with similar efficiency. Complete information concerning the DNA methylation status of the studied region can therefore be obtained.

The design of primers is performed in 2 steps (for data flow diagram, see Supplementary Figure 1). To begin, the studied sequence is copied into a sequence input box.
Alternatively, 1 or more sequences can be loaded from a file in FASTA format, which enables batch processing. The graphical user interface includes some general functions such as the ability to create the reverse complement sequence, a melting temperature calculation, and the option to clean unwanted characters (numbers and spaces) from the sequence. The minimal length of a pasted sequence is set to 200 base pairs. The user can also set the bounds of the region in the pasted sequence within which each primer is searched, so that the approximate length of the sequence to be amplified can be determined. The acceptable size of the amplified region is from 150 to 500 bp. Amplification of longer regions is not recommended because of the increasing risk that hybrid amplification products can arise (Foerster and Mittelsten Scheid 2010). As an alternative, the program can design primers for each part of the sequence step-by-step in a moving 450-bp window. In each step, the window is shifted by 150 bp.

The user can also specify parameters for calculating the melting temperatures of the primers. Three choices are available for mammalian sequences: the Bolton and McCarthy formula (Yang et al. 2009), the Wallace–Itakura empirical formula (Ahsen et al. 2001), or nearest-neighbor (NN) thermodynamics (SantaLucia 1998) using the formula according to Wetmur and Sninsky (1995), which is
set as the default. In addition to these choices, Jacobsen's empirical formula designed specifically for Arabidopsis (reviewed by Chan et al. 2005) can be used for plant sequences. The Bolton–McCarthy and NN thermodynamics formulas require the type and concentration of salt cations and the concentration of primers in the PCR reaction. If the user does not enter these values, the program uses the salt concentrations present in 1× TrueStart Taq buffer (20 mM Tris–HCl (pH 8.3 at 25 °C), 20 mM KCl, and 5 mM (NH4)2SO4) and a primer concentration (800 nmol) typical of PCR conditions for bisulfite-treated templates (Foerster and Mittelsten Scheid 2010).

The beginning step is similar in both variants (selective and nonselective). First, an initial population of primers with lengths between 25 and 40 nucleotides is created. This length is generally recommended for bisulfite-based genomic sequencing (Tusnady et al. 2005; Foerster and Mittelsten Scheid 2010; Henderson et al. 2010). If a second round of selection is performed, the allowed range of primer size is set to 20–40 nucleotides. In the selective variant, complete conversion of the primer binding sites is assumed, and so all the cytosines are replaced by thymines in the forward primer and all the guanines are replaced by adenines in the reverse primer. In the nonselective variant, degenerate positions are introduced into the primers (C to Y in the forward primer and G to R in the reverse primer), similar to the Kismeth package (Gruntman et al. 2008). These primers are then subjected to further selection. Each primer from the forward group is paired with each primer from the reverse group to create a large set of primer pair candidates, to which a product length condition (150–500 bp) is simultaneously applied. To save computing time, the program uses a genetic algorithm.
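The conversion and pairing steps described above can be illustrated with a minimal Python sketch. It is not BisPrimer's actual code: the function names and the coordinate convention (0-based start, exclusive end on the top strand) are assumptions made for illustration, and the pairs are enumerated exhaustively rather than with the genetic algorithm the program uses.

```python
# Illustrative sketch only; names and conventions are hypothetical.

def convert_primer(primer: str, direction: str, selective: bool) -> str:
    """Rewrite a primer designed on unconverted DNA for a bisulfite-treated template.

    selective=True  assumes full conversion (C->T forward, G->A reverse).
    selective=False introduces degenerate bases (C->Y forward, G->R reverse).
    """
    primer = primer.upper()
    if direction == "forward":
        return primer.replace("C", "T" if selective else "Y")
    if direction == "reverse":
        return primer.replace("G", "A" if selective else "R")
    raise ValueError("direction must be 'forward' or 'reverse'")


def candidate_pairs(forward, reverse, min_product=150, max_product=500):
    """Pair each forward site with each reverse site and keep valid product sizes.

    Every site is a (start, end) tuple on the top strand, 0-based, end exclusive.
    The product spans from the forward start to the reverse end.
    """
    pairs = []
    for f_start, f_end in forward:
        for r_start, r_end in reverse:
            product = r_end - f_start
            # The reverse site must lie downstream of the forward site.
            if r_start >= f_end and min_product <= product <= max_product:
                pairs.append(((f_start, f_end), (r_start, r_end)))
    return pairs
```

For example, `convert_primer("ACGTC", "forward", selective=False)` gives `"AYGTY"`, where Y is the IUPAC code for C or T.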
The next step is a test of the primers according to the CG content filter. All primers with a G content (forward group) or a C content (reverse group) higher than 60% are removed. The optimum CG content is 30% according to Tusnady et al. (2005), but even this low value is not achievable in some sequences after bisulfite modification. In the selective variant, a filter follows that excludes all primers that do not reach the minimal difference in melting temperature (Tm) between the completely converted and the nonconverted primer sequence: 8 °C in the first round (Tusnady et al. 2005) and 6 °C in the second round. In addition, at least 1 thymine must be present in the last 5 bases at the 3′ end of the forward primer, and at least 1 adenine (replacing an original guanine) is required at the 3′ end of the reverse primer. In the selective variant for mammals, no cytosine in the CG context is allowed at the 3′ end of the forward primer, and no guanine in the CG context is allowed at the 3′ end of the reverse primer. In plants, no cytosine in the CG or CHG context is allowed at the 3′ end of the forward primer, and no guanine in the CG or CHG context is allowed at the 3′ end of the reverse primer. For the nonselective variant, the next step removes all those primers where the difference in
the melting temperatures (Tm) between the completely converted and the nonconverted primer sequence (C to T in forward and G to A in reverse) is higher than 2.5 °C (Tusnady et al. 2005). If fewer than 5 forward or fewer than 5 reverse primers are generated, a difference of up to 5 °C is accepted in the second round of selection. In the nonselective variant for plants, Bisprimer does not allow any cytosine in the last 3 bases at the 3′ end of the forward primer. Similarly, no guanine is allowed in the last 3 bases at the 3′ end of the reverse primer. For plants, the unmodified binding site of a given primer must contain no cytosine in the CG context, a maximum of 1 cytosine in the CHG context, and a maximum of 4 cytosines in the CHH context (Gruntman et al. 2008; Henderson et al. 2010). Corresponding degenerate positions are introduced for any cytosine in the forward primer (C to Y) and any guanine in the reverse primer (G to R). In the nonselective variant of the mammal-oriented subprogram, a similar rule is applied, but only the CG context is avoided. Degenerate positions are likewise introduced only in the CG context. If no matching primer is found, the selection is relaxed in the second round so that, in the subprogram for plants, 5 cytosines in the CHH context, 3 in the CHG context, and 1 in the CG context are acceptable, but the sum of CG dinucleotides and CHG trinucleotides must not exceed 4. In all cases (selective and nonselective variants of both subprograms), the difference in melting temperature between the forward and reverse primers must not exceed 5 °C.

In the last steps, the program excludes all primers that have a high potential to form complexes with themselves or with the other primer in the pair. The importance of this type of selection was stressed by Burpo (2001). Primers that appear likely to form hairpin secondary structures are removed from the population as well. No single-base stretches longer than 7 nucleotides in a row are allowed.
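The context limits for the plant nonselective variant and the Tm difference between converted and nonconverted primers can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's implementation: the function names are hypothetical, cytosines too close to the 3′ end of the given string to be classified are simply skipped, and Tm is estimated with the simple Wallace rule, Tm = 2(A+T) + 4(G+C), rather than the nearest-neighbor default.

```python
# Illustrative sketch only; names are hypothetical and Tm uses the Wallace rule.

def count_contexts(site: str) -> dict:
    """Count cytosines in CG, CHG, and CHH contexts (H = A, C, or T)."""
    site = site.upper()
    counts = {"CG": 0, "CHG": 0, "CHH": 0}
    for i, base in enumerate(site):
        if base != "C":
            continue
        downstream = site[i + 1:i + 3]  # up to two bases after the cytosine
        if downstream[:1] == "G":
            counts["CG"] += 1
        elif len(downstream) == 2:
            counts["CHG" if downstream[1] == "G" else "CHH"] += 1
        # Cytosines without enough downstream sequence are left unclassified.
    return counts


def passes_plant_nonselective(site: str, relaxed: bool = False) -> bool:
    """Apply the first-round limits, or the relaxed second-round limits."""
    c = count_contexts(site)
    if relaxed:
        return (c["CG"] <= 1 and c["CHG"] <= 3 and c["CHH"] <= 5
                and c["CG"] + c["CHG"] <= 4)
    return c["CG"] == 0 and c["CHG"] <= 1 and c["CHH"] <= 4


def wallace_tm(primer: str) -> int:
    """Wallace rule estimate in degrees Celsius: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def conversion_tm_drop(forward_primer: str) -> int:
    """Tm difference between a nonconverted forward primer and its C->T version."""
    converted = forward_primer.upper().replace("C", "T")
    return wallace_tm(forward_primer) - wallace_tm(converted)
```

Under the Wallace rule each C to T replacement lowers the estimate by 2 °C, so the drop is simply twice the number of cytosines in the forward primer. Other Tm formulas will give different values.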
If a single-base stretch contains 5, 6, or 7 nucleotides in a row, the stretch cannot make up more than 20% of the primer length. The suitable combinations of primers are selected based on their "relative pairing ratio" (points gained divided by the maximum possible number of points). The points gained in the tests for self-annealing, self-end annealing, and annealing of the pair of primers are computed as described by Kampke et al. (2001). The limit for relative pairing ratios can be set by the user. The default value is set to 45%.

An important property of any primer design software is the rate at which it generates primers for the tested sequences. We have tested the ability of Bisprimer to generate at least 1 suitable primer pair on a sample of 108 sequences (Supplementary Material 1) semirandomly chosen (from 9 locations on each chromosome irrespective of the sequence properties) from the Oryza sativa genome at PlantGDB (http://www.plantgdb.org/OsGDB/cgi-bin/downloadGDB.pl). The obtained success rates in primer design are: 85% for the nonselective variant in plants, 80% for the selective variant in plants, 98% for the nonselective variant in mammals, and 94% for the selective variant in mammals (all with the application of the filters for hairpins,
self-complementarity, etc.) (for details, see Supplementary Table 1). It must be taken into account that the design of primers for bisulfite-based genomic sequencing is more difficult than the design of primers for a typical PCR, and therefore no software can guarantee that suitable primers will be found for any given sequence. If no primer pair is found using the filters, the filters can be manually switched off. The success rates with the filters switched off were 98% (plants, nonselective variant), 93% (plants, selective variant), 100% (mammals, nonselective variant), and 97% (mammals, selective variant) (for details, see Supplementary Table 1). Even in this case, the user is provided with information showing the quality of the primers found. The program displays a minimum of 1 and a maximum of 5 pairs of primers, if it is practicable and the set conditions are met. A sample output file (for the input of several sequences) is shown in Supplementary Material 2. The output text file shows the regions where the primers hybridize. It also provides the characteristics of the primers (sequence, position, Tm, length, and the results of the analysis of primer quality).

The comparison of Bisprimer with the other 3 currently used programs is shown in Table 1. We have also performed tests of these programs with the same sequences that were used to test Bisprimer (see Supplementary Table 1). The only programs that are specifically constructed for the design of primers for bisulfite genomic sequencing in plants are Bisprimer (this article) and Kismeth (Gruntman et al. 2008). Kismeth is easy to use, but it does not allow the user to check primers for putative hairpins, self-complementarity, forward/reverse pair annealing, and forward/reverse pair end annealing. Bisprimer is also the only program in the table that allows a choice between a selective and a nonselective variant.
Similar to Kismeth, there are also programs designed for mammals that do not apply any additional filters for primer quality (hairpins, self-complementarity, forward/reverse pair annealing, and forward/reverse pair end annealing): Methprimer (Li and Dahiya 2002) and Methyl Primer Express (see http://www3.appliedbiosystems.com/cms/groups/mcb_support/documents/generaldocuments/cms_042392.pdf). In some cases, the absence of additional filters can even be advantageous because some primers are almost always generated (see Supplementary Table 1), and the user can therefore decide whether to use them or not. The advantage of Bisprimer is that the additional filters for primer quality can be switched off while putative problematic features of the primers remain visible in the output. There are 2 programs for mammalian samples that allow an additional check of primer quality. Bisearch (Arányi and Tusnády 2007) should be the program of choice if the user wants to use the nonselective variant, as Bisearch also checks for putative mispriming sites by searching databases (ePCR) for several model animal and yeast species. Bisearch does not perform a preferential search for primers that select modified templates. Some primer pairs that preferentially amplify bisulfite-modified molecules can
be, however, found by chance because the number of non-CG cytosines in primers is not minimized by Bisearch. In mammalian samples, this feature does not represent a problem. A disadvantage of Bisearch is that there is no filter for long single-base repeats. PerlPrimer (Marshall 2004) also allows the searched primers to be filtered for the formation of primer dimers, hairpins, etc. This program is (in contrast to Bisearch) designed to selectively prefer amplification of bisulfite-modified templates. PerlPrimer does not perform ePCR, but it can run a BLAST search of several public databases and/or a local database for similar sequences. This can also help avoid mispriming. In contrast to Bisearch, long single-base repeats are not allowed in primers designed by PerlPrimer. PerlPrimer possesses relatively strict filters for primer quality, and it is often necessary to switch off the filter for CG in the primer binding sites to obtain some primer pairs (see Supplementary Table 2).

We suppose that Bisprimer will be useful mainly for researchers performing bisulfite genomic sequencing in plants, but occasionally it can also be used for mammalian samples or other samples where only CG methylation is present.
Availability
For noncommercial use, the program, which is compatible with all currently available versions of Windows, can be downloaded free of charge from the server of the Institute of Biophysics in Brno (www.ibp.cz/local/software/BisPrimer). A free copy of the program can also be requested directly from the authors.
Supplementary Material
Supplementary material can be found at http://www.jhered.oxfordjournals.org/.
Funding
Czech Science Foundation (projects 521/08/0932 and 204/09/H002); Institutional Research Plans AV0Z50040507 and AV0Z50040702.
Acknowledgments
The authors would like to thank Dr. Jaroslav Fulnecek and Prof. Boris Vyskot for fruitful discussions.
References
Ahsen N, Wittwer C, Schutz E. 2001. Oligonucleotide melting temperatures under PCR conditions: nearest-neighbor correction for Mg2+, deoxynucleotide triphosphate, and dimethyl sulfoxide concentrations with comparison to alternative empirical formulas. Clin Chem. 47:1956–1961.
Applied Biosystems [Internet]. 2006. Methyl Primer Express® Software v1.0 Getting Started Guide. Applied Biosystems; [cited 2011 Nov 18]. Available from: http://www3.appliedbiosystems.com/cms/groups/mcb_support/documents/generaldocuments/cms_042392.pdf.
Arányi T, Tusnády GE. 2007. BiSearch: ePCR tool for native or bisulfite-treated genomic template. Methods Mol Biol. 402:385–402.
Burpo J. 2001. A critical review of PCR primer design algorithms and cross-hybridization case study. Biochemistry. 218:1–10.
Chan SW, Henderson IR, Jacobsen SE. 2005. Gardening the genome: DNA methylation in Arabidopsis thaliana. Nat Rev Genet. 6:351–360.
Cokus SJ, Feng S, Zhang X, Chen Z, Merriman B, Haudenschild CD, Pradhan S, Nelson SF, Pellegrini M, Jacobsen SE. 2008. Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature. 452:215–219.
Cubas P, Vincent C, Coen E. 1999. An epigenetic mutation responsible for natural variation in floral symmetry. Nature. 401:157–161.
Foerster A, Mittelsten Scheid O. 2010. Analysis of DNA methylation in plants by bisulfite sequencing. In: Plant epigenetics: methods and protocols. Methods Mol Biol. 631:1–11.
Gruntman E, Qi Y, Slotkin RK, Roeder T, Martienssen R, Sachidanandam R. 2008. Kismeth: analyzer of plant methylation states through bisulfite sequencing. BMC Bioinformatics. 9:371.
Henderson I, Chan S, Cao X, Johnson L, Jacobsen S. 2010. Accurate sodium bisulfite sequencing in plants. Epigenetics. 5:47–49.
Hitchins MP. 2010. Inheritance of epigenetic aberrations (constitutional epimutations) in cancer susceptibility. Adv Genet. 70:201–243.
Jacobsen SE, Sakai H, Finnegan EJ, Cao X, Meyerowitz EM. 2000. Ectopic hypermethylation of flower-specific genes in Arabidopsis. Curr Biol. 10:179–186.
Janousek B, Siroký J, Vyskot B. 1996. Epigenetic control of sexual phenotype in a dioecious plant, Melandrium album. Mol Gen Genet. 250:483–490.
Janousek B, Zluvova J, Vyskot B. 2000. Histone H4 acetylation and DNA methylation dynamics during pollen development. Protoplasma. 211:116–122.
Jullien PE, Kinoshita T, Ohad N, Berger F. 2006. Maintenance of DNA methylation during the Arabidopsis life cycle is essential for parental imprinting. Plant Cell. 18:1360–1372.
Kakutani T. 1997. Genetic characterization of late-flowering traits induced by DNA hypomethylation mutation in Arabidopsis thaliana. Plant J. 12:1447–1451.
Kampke T, Kieninger M, Mecklenburg M. 2001. Efficient primer design algorithms. Bioinformatics. 17:214–225.
Li LC, Dahiya R. 2002. MethPrimer: designing primers for methylation PCRs. Bioinformatics. 18:1427–1431.
Lister R, O'Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, Ecker JR. 2008. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell. 133:523–536.
Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, Nery JR, Lee L, Ye Z, Ngo QM, et al. 2009. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 462:315–322.
Marshall OJ. 2004. PerlPrimer: cross-platform, graphical primer design for standard, bisulphite and real-time PCR. Bioinformatics. 20:2471–2472.
Martin A, Troadec C, Boualem A, Rajab M, Fernandez R, Morin H, Pitrat M, Dogimont C, Bendahmane A. 2009. A transposon-induced epigenetic change leads to sex determination in melon. Nature. 461:1135–1138.
Mittelsten Scheid O, Afsar K, Paszkowski J. 2003. Formation of stable epialleles and their paramutation-like interaction in tetraploid Arabidopsis thaliana. Nat Genet. 34:450–454.
Mund C, Lyko F. 2010. Epigenetic cancer therapy: proof of concept and remaining challenges. Bioessays. 32:949–957.
Oakeley EJ, Jost J. 1996. Non-symmetrical cytosine methylation in tobacco pollen DNA. Plant Mol Biol. 31:927–930.
Robertson KD. 2005. DNA methylation and human disease. Nat Rev Genet. 6:597–610.
SantaLucia J. 1998. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. PNAS. 95:1460–1465.
Suter CM, Martin DI. 2010. Paramutation: the tip of an epigenetic iceberg? Trends Genet. 26:9–14.
Tusnady G, Simon I, Varadi A, Aranyi T. 2005. BiSearch: primer-design and search tool for PCR on bisulfite-treated genomes. Nucleic Acids Res. 33:e9, doi: 10.1093/nar/gni012.
Wetmur J, Sninsky J. 1995. Nucleic acid hybridization and unconventional bases. In: Innis M, Gelfand D, Sninsky J, editors. PCR strategies. 1st ed. San Diego (CA): Academic Press. p. 69–83.
Yang Ch-H, Cheng Y-H, Chang H-W, Chuang L-Y. 2009. Primer design with specific PCR product using particle swarm optimization. World Acad Sci Eng Technol. 53:1331–1336.
Zluvova J, Janousek B, Vyskot B. 2001. Immunohistochemical study of DNA methylation dynamics during plant development. J Exp Bot. 52:2265–2273.

Received April 1, 2011; Revised October 25, 2011; Accepted November 4, 2011

Corresponding Editor: J. Perry Gustafson