Mol Biol Rep (2013) 40:6855–6862 DOI 10.1007/s11033-013-2803-0
Development of SSR markers by next-generation sequencing of Korean landraces of chamoe (Cucumis melo var. makuwa) Inkyu Park • Jungeun Kim • Jeongyeo Lee • Sewon Kim Okhee Cho • Kyungbong Yang • Jongmoon Ahn • Seokhyeon Nahm • HyeRan Kim
Received: 12 March 2013 / Accepted: 26 September 2013 / Published online: 5 October 2013 © Springer Science+Business Media Dordrecht 2013
Abstract
The oriental melon (Cucumis melo var. makuwa), called ‘chamoe’ in Korean, is a popular fruit crop cultivated mainly in Asia and a high-market value crop in Korea. To provide molecular breeding resources for chamoe, we developed and characterized genomic SSR markers from the preliminary Illumina read assemblies of Gotgam chamoe (one of the major landraces; KM) and SW3 (the breeding parent). Mononucleotide motifs were the most abundant type of markers, followed by di-, tri-, tetra-, and pentanucleotide motifs. The most abundant dinucleotide was AT, followed by AG and AC, and AAT was the most abundant trinucleotide motif in both assemblies. Following our SSR-marker development strategy, we designed a total of 370 primer sets. Of these, 236 primer sets were tested, exhibiting 93 % polymorphism between KM and SW3. Those polymorphic SSRs were successfully amplified in the netted and Kirkagac melons, which respectively exhibited 81 and 76 % polymorphism relative to KM, and 32 and 38 % polymorphism relative to SW3. Seven selected SSR markers with a total of 17 alleles (2–3 alleles per locus) were used to distinguish between KM, SW3, and four chamoe cultivars. Our results represent the first attempt to provide genomic resources for Korean landraces for the purposes of chamoe breeding, as well as to discover a set of SSR markers capable of discriminating chamoe varieties from Korea and the rest of Asia, which possess little genetic diversity. This study establishes a highly efficient strategy for developing SSR markers from preliminary Illumina assemblies of AT-rich genomes.

Keywords Illumina preliminary assembly • Oriental melon • SSR • Genetic diversity

Inkyu Park and Jungeun Kim contributed equally to this study.

Electronic supplementary material The online version of this article (doi:10.1007/s11033-013-2803-0) contains supplementary material, which is available to authorized users.

I. Park • J. Kim • J. Lee • S. Kim • O. Cho • K. Yang • H. Kim (corresponding author)
Plant Systems Engineering Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), 125 Gwahangno, Daejeon 305-806, Republic of Korea
e-mail: [email protected]

I. Park
College of Agriculture and Life Science, Chungnam National University, 99 Daehak-ro, Daejeon 305-764, Republic of Korea

J. Kim • K. Yang • H. Kim
Systems and Bioengineering, University of Science and Technology (UST), 217 Gajung-ro, Daejeon, Republic of Korea

J. Ahn
Breeding Institute, Nongwoo Bio Co., LTD., Yeoju, Kyonggi-do 469-885, Republic of Korea

S. Nahm
Biotechnology Institute, Nongwoo Bio Co., LTD., Yeoju, Kyonggi-do 469-885, Republic of Korea
Introduction
To increase productivity and commercial value, plant breeders are constantly trying to develop new varieties. However, conventional breeding requires enormous time, effort, and expense, and can also cause crop losses [1, 2]. The power of breeding techniques has been enhanced by the introduction of molecular markers, which enable the methods known as marker-assisted selection [3]. Molecular markers have numerous uses, including genetic-diversity characterization, effective-loci estimation, allelic-effect studies, quantitative-trait locus mapping, gene-flow studies,
and evolutionary studies [4]. During the last three decades, many types of markers have been developed and used for crop breeding (reviewed in [5]). Simple sequence repeats (SSRs), or microsatellites, are tandemly repeated one- to six-nucleotide sequence motifs [6]. SSRs are a preferred marker type because they are distributed throughout the genome and exhibit high polymorphism with medium throughput, locus-specific co-dominance, and high rates of transferability [7, 8]. SSR marker analysis remains a popular genotyping method for breeding programs due to its comparative ease of analysis and low cost relative to single nucleotide polymorphism (SNP) marker analysis. The most important advantage of SSR markers compared to SNP markers is that they can detect multiple alleles per locus. Because it required prior sequence information, discovery of SSRs was once a tedious, labor-intensive, and expensive process. The advent of next-generation sequencing (NGS) technologies [9], however, has allowed cost-effective sequence generation at extremely high throughput, and the resulting new technologies have been used for rapid and cost-effective SSR discovery in crops. The two NGS systems most widely used for SSR development are the Roche 454 pyrosequencing system and the Illumina sequencing-by-synthesis system. Currently, shotgun sequencing using pyrosequencing is the most popular method for investigating SSR loci in plants, due to this method’s comparatively longer read lengths [10]. By contrast, the short reads of the Illumina system make it possible to perform de novo assembly of contigs without a reference sequence [2], and this system is therefore useful for developing large numbers of SSR markers. Robust algorithms contribute to identification of SSRs from genomes (reviewed in [11]), and the numbers and reliability of the SSRs discovered are dependent upon the algorithms and strategies used [12].
However, it is still important to select high-quality SSR loci for further testing and to increase SSR-screening capacity and polymorphism rates. Melon (Cucumis melo; 2n = 24), which belongs to the Cucurbitaceae family, is an economically important horticultural crop around the world. The species is thought to have originated in Asia [13]. Today, in addition to its agricultural importance, it is also used as a model species for studying fruit ripening, sex determination, and phloem physiology. The melon genome has been mostly sequenced, with the original report describing the assembly of 375 Mb from the DHL92 line, representing 83 % of the melon genome. The initially published draft genome contains 27,427 protein-coding genes and has high AT content [14]. DHL92 is a double-haploid line derived from Sunghwan chamoe (PI 161375 [15]). Chamoe, the Korean name for the oriental melon (C. melo var. makuwa), is a popular fruit crop cultivated mainly in Asia and a high-market value crop in Korea.
There are two major landraces of chamoe in Korea: Sunghwan and Gotgam. Both contain more nutrients and exhibit greater disease resistance and other useful agronomic traits than cultivated chamoes. In particular, the Gotgam chamoe has the aroma of a dried persimmon, from which its name is derived, and breeders have attempted to introduce this trait to melons as well as other cultivated chamoes. To date, however, detailed molecular-biological studies and genotyping of these traits have not been performed. A number of molecular markers and linkage maps [16–19] have been developed, albeit mainly for Western varieties of melons. Genomic resources and markers developed for the Korean landraces of chamoe will therefore be important, not only for chamoe breeding but also for molecular-biological studies of desirable chamoe traits. In this study, we aimed to characterize genomic SSRs for a Korean landrace and a breeding line of chamoe, as well as to develop SSR markers to reveal polymorphisms between the two lines. Additional objectives of this study included establishment of an efficient strategy for developing SSR markers from AT-rich genomes using the preliminary NGS assemblies, and evaluation of the utility of the resulting SSR markers for cultivar identification and melon breeding.
Materials and methods
Plant materials
All plant materials were provided by NongWoo Bio Co., Korea. An inbred line of the Korean landrace Gotgam chamoe (KM, C. melo var. makuwa, NongWoo Bio accession No. 1638) and a breeding-resource inbred line of SW3 chamoe (SW3, C. melo var. makuwa, NongWoo Bio accession No. 1601) were used for NGS sequencing and SSR-marker development. Fresh leaves of KM, SW3, netted melon (C. melo var. reticulatus, NongWoo Bio accession No. 1408), Kirkagac melon (C. melo var. inodorus, NongWoo Bio accession No. 1533), and four commercial cultivars of chamoe (Bingbichi, Obokggul, Obokplusggul, and Smartggul) were used for genomic DNA isolation to validate SSR markers by polymerase chain reaction (PCR) analysis.
Construction of genome assemblies
KM and SW3 were grown to the seedling stage in a growth chamber. Samples of fresh leaves were harvested and subjected to dark conditions for 48 h to avoid contamination by chloroplasts. DNA was extracted using the DNeasy Plant Maxi Kit (QIAGEN, Cat. No. 68163) according to the manufacturer’s instructions. A total of four libraries
from KM (three from intact genomic DNA and one from partially digested genomic DNA) and one library from SW3 were generated by paired-end sequencing with 101 bp read length using a HiSeq 2000 (Illumina). The overall assembly process is shown in Fig. 1a. Sequencing errors were corrected using the short-read correction tool from the SOAPdenovo software (http://soap.genomics.org.cn/soapdenovo.html) with default parameters. Preliminary genome sequences were assembled using SOAPdenovo with various k-mer lengths ranging from 31 to 81 [20]. The optimized assemblies (KM, k-mer = 81; SW3, k-mer = 51) were selected by considering the number of contigs, total assembly size, and N50 (Supplementary Table 1).

Fig. 1 Strategies for NGS read assembly and SSR mining. a Assembly process of HiSeq 2000 reads using the SOAPdenovo package (steps shown: raw read set (.fastq), error correction (.corr), preprocessing (.fasta), SOAPdenovo assembly (.fasta), assembly quality assessment, gap closing (.fasta)). b The pipeline for development of SSR markers from preliminary NGS assemblies (steps shown for KM and SW3: repeat masking with RepeatMasker, identification of genomic SSRs with SciRoKo, finding conserved regions with BLASTn, analyzing polymorphic SSRs, primer design with Primer3 to amplify the targeted polymorphic SSRs, validation of reported primers, printing of primer sets)
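The selection criteria named above (number of sequences, total assembly size, and N50) and the sequence-depth estimate reported for the preprocessed reads can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the toy scaffold lengths are hypothetical, and the 450 Mb genome size is the assumption stated in the footnote of Table 1.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly size."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    half = sum(lengths) / 2.0
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


def assembly_summary(lengths):
    """Collect the three statistics used to compare candidate assemblies."""
    return {"count": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}


def sequence_depth(total_read_bp, genome_size_bp=450_000_000):
    """Depth = total bases in reads / assumed genome size (450 Mb here)."""
    return total_read_bp / genome_size_bp


# Hypothetical scaffold lengths, only to exercise the functions.
toy_lengths = [100, 200, 300, 400, 500]
summary = assembly_summary(toy_lengths)

# Preprocessed read totals as listed in Table 1.
km_depth = sequence_depth(29_569_295_415)
sw3_depth = sequence_depth(5_212_626_027)
print(summary, round(km_depth, 2), round(sw3_depth, 2))
```

Running the same summary on the assemblies produced at each k-mer length would give one row per candidate, which is the kind of comparison the text describes for choosing the final assemblies.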
SSR marker development
The overall process of the pipeline for identification of SSR polymorphisms is represented in Fig. 1b. To prevent amplification of SSRs in interspersed repetitive regions, we masked repetitive sequences in the assembled KM and SW3 genomes by applying RepeatMasker (http://www.repeatmasker.org, ver. 1.0) with species parameters set for Arabidopsis and the “-nolow” option selected to prevent masking of simple repeats [21]. To identify SSRs in scaffolds, we used the SciRoKo software with “MisMatchFixedPenalty” parameters (score ≥15; mismatch penalty = 5; minimum (min.) length of SSRs = 8; min. repeats of SSRs = 3; and maximum (max.) simultaneous mismatches = 3) [22]. Scaffold sequences were aligned with BLASTN (ver. 2.2.20) with the option “-F F” and an expectation (E) value less than 1E-100 [23]. SSRs in coding regions were identified by mapping the reference genes of melon [12] to the KM and SW3 assemblies with BLAT (the BLAST-Like Alignment Tool) [24]. To remove false-positive mappings, genic regions in the assemblies were defined as those with >80 % gene mapping coverage. To design primer sets for targeted SSRs, we used the primer3_core software with the following options: optimum length of primers = 20; optimum melting temperature = 60 °C; and PCR product size = 200–500. The primers designed for SSR loci were named CMMS (C. melo makuwa SSR). The overall pipeline was controlled with scripts developed in-house in Python (ver. 2.7) using the BioPython modules (ver. 1.59).
PCR analysis
DNA was extracted from fresh leaf tissues of each plant according to the modified chloroform-based DNA extraction protocol [25]. The total volume of the reaction mixtures was 20 µl, consisting of 5 µl template DNA (5 ng/µl), 0.5 µl Taq polymerase (5 U/µl), 2 µl dNTPs (2.5 mM each), 2 µl forward + reverse primer (10 pmol each), 2 µl 10× PCR buffer, and 8.5 µl triple-distilled water. Amplification was performed on a C1000 Thermo Cycler (BioRad, Hercules, CA, USA) according to the following step-cycle program: initial denaturing step at 94 °C for 2 min; 35 cycles at 94 °C for 20 s, 55 °C for 10 s, and 72 °C for 30 s; and a final extension at 72 °C for 5 min. PCR products were separated on 2 % agarose gels for 2 h at 100 V. SSR marker bands were visualized using ethidium bromide. PCR products were also separated on 4 % polyacrylamide gels for 1–2 h at 1,800–2,200 V, and detected using the Silverstar silver-staining kit (Bioneer, Daejeon, Korea).
Results
Characteristics of genomic SSRs in the NGS assemblies of KM and SW3
The Illumina sequencing generated 29 and 5 Gb of preprocessed paired-end reads from the KM and SW3 lines, respectively. The assembler SOAPdenovo was used to assemble the two genomes de novo, generating 18,639 scaffolds representing 252 Mb from KM and 122,122 scaffolds representing 272 Mb from SW3 (Table 1). The optimized k-mer values for the assembly of KM and SW3 reads were 31 and 81 bp, yielding scaffold N50 values of 13,165 and 4,441 bp, respectively (Supplementary Table 1). The GC contents for KM and SW3 were 30.4 and 31.4 %, respectively, consistent with the high AT content of the C. melo genome.

Table 1 Details of KM and SW3 genome assemblies
Stage | Metric | KM | SW3
Raw data | Number of reads | 392,486,686 | 107,314,066
Raw data | Total length (bp) | 39,641,149,226 | 10,842,356,666
Preprocessing | Number of reads | 340,008,786 | 74,771,282
Preprocessing | Total length (bp) | 29,569,295,415 | 5,212,626,027
Preprocessing | Sequence depth (a) | 65.71 | 11.58
Assembly | Number of contigs | 96,248 | 293,589
Assembly | Number of scaffolds | 18,639 | 122,122
Assembly | Total scaffold length (bp) | 252,868,634 | 272,541,427
Assembly | Scaffold N50 (bp) | 13,165 | 4,441
Assembly | Scaffold GC contents (%) | 31.4 | 31.5
(a) Estimated based on the genome size of 450 Mb [14]. KM: Korean landrace Gotgam chamoe; SW3: inbred line

From scaffolds longer than 200 bp, we identified 96,287 SSRs from KM and 101,899 SSRs from SW3 (Supplementary Table 2). As a reference, our SSR-calling criteria identified 36,968 SSRs from the previously reported melon genome of 1,594 scaffolds, a finished assembly from the DHL92 double-haploid line [14]. Mononucleotide motifs were the most abundant type of repeats (25 % of the total SSRs in KM; 27 % in SW3) followed by di-, tri-, tetra-, and pentanucleotide motifs in both chamoe assemblies; by contrast, pentanucleotide motifs were the second most plentiful type in the reference assembly (Supplementary Table 2). The most abundant dinucleotide motif was AT (41 % in KM; 37 % in SW3) followed by AG (8 and 10 %) and AC (4 and 4 %), as in the reference assembly. AAT was the most abundant trinucleotide motif in both chamoe assemblies, whereas AAG was the major trinucleotide motif in the reference assembly. GC-rich motifs were very rare in all three genomes. However, the abundance of different repeat motifs varied with genomic region. Mononucleotide motifs were predominant in intergenic regions and introns, while trinucleotide repeats were most plentiful in exons (Supplementary Tables 4 and 5). Most of the SSRs identified in the KM and SW3 assemblies were in intergenic regions, accounting for 98 and 95 %, respectively. Overall, we detected 2-fold more SSRs from the chamoe assemblies than from the finished reference assembly. The
SSRs discovered from the chamoe assemblies were frequently located at the edges of contigs, suggesting that the high rate of SSR discovery was a result of assembly artifacts from low-complexity sequences.
Efficient SSR-marker development by in silico polymorphism analysis
To increase the success rate of the primers developed for the SSR loci, SSRs were only selected for marker development if they were discovered from aligned regions of the KM and SW3 assemblies. A total of 68 and 89 % of KM and SW3 SSRs, respectively, were screened out because they failed to align, exhibited no SSR length polymorphism between KM and SW3, or could not be used to design primer sets (Table 2). More SSRs were filtered out for shorter motifs than for longer ones. Specifically, due to the lower sequence coverage of SW3, more SW3 SSRs than KM SSRs were filtered out in the alignment steps. After the alignment-based filtering steps, the numbers of SSRs in di- and tri-nucleotide motifs were similar to those in DHL92. The numbers of excluded SSRs were highest for AT-rich motifs, especially those discovered at the edges of contigs. To computationally identify SSR polymorphisms, we surveyed length differences greater than 9 bp between SSRs in conserved regions of the two chamoe genomes. This survey revealed that 45 and 43 % of filtered KM and SW3 SSRs, respectively, exhibited no polymorphisms, whereas 10 and 9 % exhibited polymorphisms shorter than 10 bp. In total, 1,569 SSRs were selected as initial templates for marker development. For these SSRs, 370 primer sets were successfully designed against conserved regions and predicted to yield products longer than 200 bp.
Utilities of developed SSR markers in cultivar identification
To experimentally validate SSR polymorphisms, we selected one primer set for each scaffold, comprising 236 primer sets. All SSRs were successfully amplified, and 220 (93 %) were polymorphic between KM and SW3 (Supplementary Table 3).
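The in silico screen described in this section can be sketched roughly as follows. This is a simplified illustration and not the authors' pipeline: it finds only perfect repeats by direct scanning (SciRoKo also tolerates mismatches), and it assumes the two input sequences are already-aligned conserved regions so that SSRs can be paired by order and motif. The function names, the pairing rule, and the toy sequences are hypothetical; the thresholds (minimum SSR length 8 bp, minimum 3 repeats, length difference greater than 9 bp) are taken from the text.

```python
def find_perfect_ssrs(seq, min_len=8, min_repeats=3, max_unit=6):
    """Scan for perfect tandem repeats with unit sizes 1..max_unit.
    Returns tuples (start, unit, repeats, tract_length).
    The shortest qualifying unit wins, so 'AAAA...' is reported as a
    mononucleotide repeat rather than as 'AA' or 'AAA'."""
    seq = seq.upper()
    hits, i, n = [], 0, len(seq)
    while i < n:
        best = None
        for k in range(1, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k or set(unit) - set("ACGT"):
                break  # ran off the end, or hit an ambiguous base
            j = i + k
            while seq[j:j + k] == unit:
                j += k
            tract_len = j - i
            if tract_len // k >= min_repeats and tract_len >= min_len:
                best = (i, unit, tract_len // k, tract_len)
                break
        if best:
            hits.append(best)
            i += best[3]  # continue after the repeat tract
        else:
            i += 1
    return hits


def classify_polymorphism(len_a, len_b, min_diff=10):
    """'none' if equal, 'short' if the difference is below 10 bp,
    'polymorphic' if it is greater than 9 bp."""
    diff = abs(len_a - len_b)
    if diff == 0:
        return "none"
    return "polymorphic" if diff >= min_diff else "short"


def compare_regions(region_a, region_b):
    """Pair SSRs by order of occurrence (an assumption that only holds for
    collinear, conserved regions) and classify each pair with the same unit."""
    calls = []
    for a, b in zip(find_perfect_ssrs(region_a), find_perfect_ssrs(region_b)):
        if a[1] == b[1]:
            calls.append((a[1], a[3], b[3], classify_polymorphism(a[3], b[3])))
    return calls


# Hypothetical conserved regions with identical flanks and an (AT)n repeat.
km_region = "GATCCGTA" + "AT" * 12 + "CGGTCAAG"
sw3_region = "GATCCGTA" + "AT" * 6 + "CGGTCAAG"
print(compare_regions(km_region, sw3_region))
```

Only pairs classified as polymorphic would move on to primer design in a workflow of this shape; pairs with no difference or a difference shorter than 10 bp would be set aside, as in the survey reported above.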
Table 2 Distribution of di- and tri-nucleotide repeat motifs in KM, SW3 and DHL92
Columns 5 to >10 give the number of SSRs with that repeat number.
Motif | Species | 5 | 6 | 7 | 8 | 9 | 10 | >10 | No. of total SSRs (%) | No. of SSRs in KM/SW3 alignment (%)
AC | DHL92 (a) | – | – | – | 509 | 321 | 113 | 49 | 992 (9.4) | –
AC | KM | – | – | – | 480 | 290 | 195 | 430 | 1,395 (3.8) | 576 (41.3)
AC | SW3 | – | – | – | 533 | 365 | 236 | 569 | 1,703 (4.4) | 214 (12.6)
AG | DHL92 | – | – | – | 900 | 598 | 194 | 142 | 1,834 (17.4) | –
AG | KM | – | – | – | 687 | 450 | 359 | 1,480 | 2,976 (8.2) | 1,197 (40.2)
AG | SW3 | – | – | – | 827 | 602 | 475 | 2,136 | 4,040 (10.5) | 391 (9.7)
AT | DHL92 | – | – | – | 1,366 | 714 | 171 | 103 | 2,354 (22.4) | –
AT | KM | – | – | – | 3,344 | 2,812 | 2,146 | 6,642 | 14,944 (41.1) | 3,846 (25.7)
AT | SW3 | – | – | – | 3,464 | 2,867 | 2,143 | 5,814 | 14,288 (37.3) | 1,417 (9.9)
CG | DHL92 | – | – | – | 3 | – | – | – | 3 (0.0) | –
CG | KM | – | – | – | – | – | – | – | – | –
CG | SW3 | – | – | – | – | 1 | – | – | 1 (0.0) | –
AAT | DHL92 | 916 | 434 | 80 | 12 | 5 | – | – | 1,447 (13.7) | –
AAT | KM | 2,365 | 2,399 | 1,659 | 1,102 | 799 | 562 | 2,708 | 11,594 (31.9) | 3,860 (33.3)
AAT | SW3 | 2,313 | 2,214 | 1,669 | 1,163 | 825 | 538 | 2,499 | 11,221 (29.3) | 1,423 (12.7)
AAC | DHL92 | 181 | 140 | 32 | 3 | 1 | – | – | 357 (3.4) | –
AAC | KM | 164 | 146 | 98 | 58 | 32 | 16 | 121 | 635 (1.8) | 213 (33.5)
AAC | SW3 | 201 | 165 | 133 | 72 | 42 | 22 | 105 | 740 (1.9) | 86 (11.6)
AAG | DHL92 | 1,147 | 741 | 322 | 47 | 18 | – | – | 2,275 (21.6) | –
AAG | KM | 819 | 692 | 759 | 408 | 334 | 189 | 647 | 3,848 (10.6) | 1,525 (39.6)
AAG | SW3 | 988 | 895 | 866 | 523 | 493 | 301 | 934 | 5,000 (13.0) | 523 (10.5)
ACT | DHL92 | 66 | 59 | 9 | 1 | – | – | – | 135 (1.3) | –
ACT | KM | 63 | 47 | 41 | 23 | 9 | 4 | 27 | 214 (0.6) | 90 (42.1)
ACT | SW3 | 77 | 55 | 45 | 23 | 10 | 4 | 32 | 246 (0.6) | 36 (14.6)
ATC | DHL92 | 227 | 128 | 35 | 4 | 4 | – | 1 | 399 (3.8) | –
ATC | KM | 169 | 131 | 123 | 47 | 44 | 7 | 13 | 534 (1.5) | 216 (40.5)
ATC | SW3 | 227 | 156 | 133 | 48 | 47 | 11 | 17 | 639 (1.7) | 94 (14.7)
AGG | DHL92 | 120 | 42 | 16 | 5 | 1 | – | – | 184 (2.0) | –
AGG | KM | 42 | 33 | 17 | 13 | 5 | 2 | 3 | 115 (0.3) | 41 (35.7)
AGG | SW3 | 83 | 64 | 41 | 33 | 19 | 9 | 6 | 255 (0.7) | 7 (2.8)
ACG | DHL92 | 51 | 22 | 5 | – | – | – | – | 78 (0.7) | –
ACG | KM | 9 | 3 | – | – | – | – | 1 | 13 (0.0) | 3 (23.1)
ACG | SW3 | 12 | 9 | 2 | 3 | 1 | 1 | 1 | 29 (0.1) | 1 (3.5)
ACC | DHL92 | 88 | 53 | 10 | – | – | – | – | 151 (1.4) | –
ACC | KM | 13 | 12 | 6 | 4 | 2 | 1 | – | 38 (0.1) | 15 (39.5)
ACC | SW3 | 36 | 24 | 9 | 9 | 4 | 3 | 2 | 87 (0.2) | 4 (4.6)
AGC | DHL92 | 68 | 42 | 6 | 2 | – | – | – | 118 (1.1) | –
AGC | KM | 19 | 16 | 8 | 3 | 3 | – | – | 49 (0.1) | 15 (30.6)
AGC | SW3 | 31 | 29 | 20 | 11 | 3 | – | – | 94 (0.3) | 7 (7.5)
CCG | DHL92 | 99 | 43 | 22 | 14 | 2 | – | 2 | 182 (1.7) | –
CCG | KM | 4 | – | – | – | 1 | – | 1 | 6 (0.0) | –
CCG | SW3 | 1 | – | – | – | – | – | – | 1 (0.0) | –
Total (%) | DHL92 | 2,963 (28.1) | 1,729 (16.4) | 537 (5.1) | 2,866 (27.2) | 1,664 (15.8) | 478 (4.5) | 297 (2.8) | 10,534 | –
Total (%) | KM | 3,667 (10.1) | 3,479 (9.6) | 2,711 (7.5) | 6,169 (16.97) | 4,781 (13.2) | 3,481 (9.6) | 12,073 (33.2) | 36,361 | 11,597 (31.9)
Total (%) | SW3 | 3,969 (10.4) | 3,611 (9.4) | 2,918 (7.6) | 6,709 (17.5) | 5,279 (13.8) | 3,743 (9.8) | 12,115 (31.6) | 38,344 | 4,203 (11.0)
(a) The C. melo genome assembly [14] was used for SSR identification using the same strategy as in this study. KM: Korean landrace Gotgam chamoe; SW3: inbred line

Table 3 Amplification of the developed SSR markers in melons and marker polymorphisms among the varieties
Varieties compared | No. of total primers used | No. of polymorphic primers (% of polymorphism)
KM/SW3 | 236 | 220 (93)
KM/Netted melon | 220 | 179 (81)
KM/Kirkagac melon | 220 | 168 (76)
SW3/Netted melon | 220 | 76 (32)
SW3/Kirkagac melon | 220 | 85 (38)
Netted melon/Kirkagac melon | 220 | 71 (34)
KM: C. melo var. makuwa; SW3: C. melo var. makuwa; Netted melon: C. melo var. reticulatus; Kirkagac melon: C. melo var. inodorus

Those polymorphic SSRs were amplified in two melon varieties, netted (C. melo var. reticulatus) and Kirkagac (C. melo var. inodorus) melons. All SSRs exhibiting polymorphism between KM and SW3 were successfully amplified from the netted and Kirkagac genomes (Table 3). Of the 220 aforementioned SSRs, 179 (81 %) and 168 (76 %) amplicons of the netted and Kirkagac melons, respectively, were polymorphic relative to KM, whereas 76 and 85 amplicons were polymorphic relative to SW3. There were 71 SSRs polymorphic between netted and Kirkagac melons, similar to the polymorphism rates of those genomes relative to SW3. We also selected seven SSR markers (CMMS2, CMMS3, CMMS4, CMMS39, CMMS44,
CMMS85, and CMMS185) to distinguish between chamoe cultivars: KM, SW3, three Korean cultivars (Obokggul, Obokplusggul, and Smartggul), and one Chinese cultivar (Bingbichi) (Fig. 2). Among the seven SSR markers, there were a total of 17 alleles, with 2–3 alleles per locus. Allele sizes ranged between 196 and 393 bp. The SSR genotype data from the seven loci successfully identified all six varieties of chamoe. The genotypic patterns of these seven loci were highly similar between the Korean commercial cultivars and SW3, whereas the genotype of Bingbichi was more similar to that of KM. Obokplusggul and Obokggul were almost identical at all seven loci, except that the former line lacked one band (around 300 bp) representing the CMMS2 marker, shown in dark blue in Fig. 2b. Smartggul exhibited the greatest variety in band patterns, and had more alleles for these loci than the other cultivars. For cases in which these primers cannot distinguish a specific variety from other cultivars, the remaining primer sets should be considered for further study. Our results represent the first attempt to find a set of SSR markers capable of discriminating chamoe varieties from Korea and elsewhere in Asia, which possess limited genetic diversity.
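The idea of identifying cultivars from multi-locus SSR genotypes can be sketched as a pairwise comparison of allele profiles. The sketch below is only an illustration: the cultivar labels, locus labels, and allele sizes are hypothetical placeholders rather than the band data shown in Fig. 2, and the function names are invented for this example.

```python
from itertools import combinations


def distinguishing_loci(profiles):
    """For every pair of cultivars, list the loci at which their allele
    sets differ. `profiles` maps cultivar -> {locus: set of allele sizes}."""
    result = {}
    for a, b in combinations(sorted(profiles), 2):
        result[(a, b)] = sorted(
            locus for locus in profiles[a]
            if profiles[a][locus] != profiles[b].get(locus)
        )
    return result


def all_distinct(profiles):
    """True if every pair of cultivars differs at one or more loci."""
    return all(loci for loci in distinguishing_loci(profiles).values())


def alleles_per_locus(profiles):
    """Count the distinct alleles observed at each locus across cultivars."""
    observed = {}
    for genotype in profiles.values():
        for locus, alleles in genotype.items():
            observed.setdefault(locus, set()).update(alleles)
    return {locus: len(alleles) for locus, alleles in observed.items()}


# Hypothetical allele sizes (bp); not taken from the study's gels.
profiles = {
    "cv1": {"locus1": {300}, "locus2": {250}, "locus3": {210}},
    "cv2": {"locus1": {320}, "locus2": {250}, "locus3": {210}},
    "cv3": {"locus1": {320}, "locus2": {250, 270}, "locus3": {210}},
    "cv4": {"locus1": {320}, "locus2": {250, 270}, "locus3": {230}},
}
print(all_distinct(profiles), alleles_per_locus(profiles))
```

A pair with an empty list in the output of `distinguishing_loci` would be a pair the chosen markers cannot separate, which is the situation in which the text suggests turning to the remaining primer sets.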
Fig. 2 Cultivar identification using seven selected SSR markers (CMMS2, CMMS3, CMMS4, CMMS39, CMMS44, CMMS85, and CMMS185). a SSR allele patterns of chamoe cultivars including KM and SW3 using the seven selected SSR markers. 1 KM (Korean landrace, Gotgam chamoe), 2 SW3 (inbred line), 3 Bingbichi (Chinese cultivar), 4 Obokggul (Korean cultivar), 5 Obokplusggul (Korean cultivar), 6 Smartggul (Korean cultivar), M: 100-bp ladder molecular-weight marker. b Schematic representation of SSR polymorphisms from Fig. 2a. Colors represent different alleles of the seven selected SSR markers

Discussion
Advances in NGS have facilitated rapid and cost-effective discovery of molecular markers as a consequence of the release of whole-genome sequences [2, 5]. In this context, we constructed genome assemblies of two chamoe lines using the HiSeq 2000 platform, yielding 252 and 272 Mb of scaffolds from KM and SW3, respectively (Table 1). These lines are frequently used for chamoe breeding in Korea; therefore, high-density markers developed from those lines will increase the power of breeding methods. Based on our preliminary versions of genome assemblies, we identified many more SSRs than previously identified in the finished version of the reference melon genome. In addition, the SSRs we discovered from the assemblies of KM and SW3 were extremely enriched in AT-rich motifs, and the AT-rich SSRs were frequently detected at the edges of contigs (Table 2, Supplementary Table 3). Considering the AT-richness of the chamoe genome, we interpreted the abundance of SSRs detected from the preliminary assemblies as a reflection of sequencing and assembly artifacts, which were resolved by our strategy for SSR-marker development (i.e., the filtering step). Robust algorithms have contributed to the identification of SSRs from genomes (reviewed in [11]). The numbers of SSRs discovered are dependent upon the algorithms and options used [12]. However, it is still challenging to select SSRs for amplification from several hundreds of thousands of anonymous SSRs and to determine the quality of feasible markers by validating polymorphisms. In the small genome of Arabidopsis thaliana, for instance, there are 104,102 markers [26], and this number increases to
124,325 when different algorithms and options are used [12]. In previous reports, the success ratios of PCR amplifications of SSR markers ranged from 60 to 90 %, and the polymorphic ratios of the amplicons were generally less than 60 % [27–31]. To discover high-quality SSRs from the preliminary assemblies used in this study, we investigated SSRs only in contig regions that aligned between the two assemblies (Table 2). Furthermore, we designed primer sets based on the highly conserved regions of the two assemblies. We also evaluated SSR polymorphisms in silico before designing PCR primers for SSR detection. This strategy removed most pseudo-SSRs and low-quality SSRs (approximately 64–90 % of the total)
resulting from sequencing or assembly artifacts. In addition, all 236 primer sets tested (100 %) successfully amplified the targeted SSR loci in both genomes, and almost all (93 %) were polymorphic between the genomes. Thus, our strategy of SSR marker development from preliminary NGS assemblies was highly efficient, rapid, accurate, informative, and cost-effective. The strategy we applied in this study does not require any prior genomic knowledge or genetic information, a feature that increases its utility in other plant species. Therefore, this strategy could be widely used to efficiently develop genomic SSR markers when whole-genome sequences of related species are released. The chamoe is a variety of melon (C. melo L.). Melon, which belongs to the Cucurbitaceae family, is one of the most diverse species, and its many varieties are important vegetable crops around the world. A large number of melon varieties are developed and commercialized every year. Detecting and using the genetic differences among these varieties, as well as improving the ability to identify melon cultivars, are important tasks for melon breeders. All markers developed in this study are applicable to multiple varieties of melon: our validation using two different melon varieties demonstrated 100 % PCR amplification of the targeted SSR loci (Table 3). This result demonstrates the direct utility of the markers we developed, not only in chamoe breeding but also in melon breeding more generally. Netted melon exhibited more polymorphism (81 %) than the Kirkagac melon (76 %) relative to KM, whereas this relationship was reversed relative to SW3 (32 % for netted melon; 38 % for Kirkagac melon), reflecting the genetic diversity among the varieties (Table 3). KM, a Korean landrace, exhibited large genetic variation relative to other melon varieties, representing an advantage for melon breeding and genetic study.
This result further demonstrates the utility of the marker sets we developed for use in genetic-diversity analysis of C. melo varieties. Our markers are also very useful for cultivar identification (Fig. 2). The seven markers we selected could unambiguously identify four commercial cultivars. This is the first attempt to develop a core set of SSR markers to distinguish chamoe cultivars. In conclusion, the chamoe SSR markers obtained in this study are valuable resources for evaluations of varietal purity and authenticity of chamoe varieties, as well as other melons.
Acknowledgments
This work was financially supported by grants from the Next-Generation Bio Green 21 Program (No. PJ008200) funded by the Rural Development Administration of the Republic of Korea, and the Cabbage Genomics Assisted-Breeding Support Center (CGC), funded by the Ministry for Food, Agriculture, Forestry, and Fisheries of the Republic of Korea.
References
1. Sboner A, Mu XJ, Greenbaum D, Auerbach RK, Gerstein MB (2011) The real cost of sequencing: higher than you think! Genome Biol 12(8):125
2. Varshney RK, Nayak SN, May GD, Jackson SA (2009) Next-generation sequencing technologies and their implications for crop genetics and breeding. Trends Biotechnol 27(9):522–530
3. Collard BC, Mackill DJ (2008) Marker-assisted selection: an approach for precision plant breeding in the twenty-first century. Philos Trans R Soc Lond B Biol Sci 363(1491):557–572
4. Moose SP, Mumm RH (2008) Molecular plant breeding as the foundation for 21st century crop improvement. Plant Physiol 147(3):969–977
5. Paux E, Sourdille P, Mackay I, Feuillet C (2012) Sequence-based marker development in wheat: advances and applications to breeding. Biotechnol Adv 30(5):1071–1088
6. Tautz D, Renz M (1984) Simple sequences are ubiquitous repetitive components of eukaryotic genomes. Nucleic Acids Res 12(10):4127–4138
7. Saha MC, Cooper JD, Mian MA, Chekhovskiy K, May GD (2006) Tall fescue genomic SSR markers: development and transferability across multiple grass species. Theor Appl Genet 113(8):1449–1458
8. Aggarwal RK, Hendre PS, Varshney RK, Bhat PR, Krishnakumar V, Singh L (2007) Identification, characterization and utilization of EST-derived genic microsatellite markers for genome analyses of coffee and related species. Theor Appl Genet 114(2):359–372
9. Metzker ML (2010) Sequencing technologies—the next generation. Nat Rev Genet 11(1):31–46
10. Zalapa JE, Cuevas H, Zhu H, Steffan S, Senalik D, Zeldin E, McCown B, Harbut R, Simon P (2012) Using next-generation sequencing approaches to isolate simple sequence repeat (SSR) loci in the plant sciences. Am J Bot 99(2):193–208
11. Varshney RK, Graner A, Sorrells ME (2005) Genic microsatellite markers in plants: features and applications. Trends Biotechnol 23(1):48–55
12. Kim J, Choi J-P, Ahmad R, Oh S-K, Kwon S-Y, Hur C-G (2012) RISA: a new web-tool for Rapid Identification of SSRs and Analysis of primers. Genes Genom 34(6):583–590
13. Sebastian P, Schaefer H, Telford IR, Renner SS (2010) Cucumber (Cucumis sativus) and melon (C. melo) have numerous wild relatives in Asia and Australia, and the sister species of melon is from Australia. Proc Natl Acad Sci USA 107(32):14269–14273
14. Garcia-Mas J, Benjak A, Sanseverino W, Bourgeois M, Mir G, Gonzalez VM, Henaff E, Camara F, Cozzuto L, Lowy E, Alioto T, Capella-Gutierrez S, Blanca J, Canizares J, Ziarsolo P, Gonzalez-Ibeas D, Rodriguez-Moreno L, Droege M, Du L, Alvarez-Tejado M, Lorente-Galdos B, Mele M, Yang L, Weng Y, Navarro A, Marques-Bonet T, Aranda MA, Nuez F, Pico B, Gabaldon T, Roma G, Guigo R, Casacuberta JM, Arus P, Puigdomenech P (2012) The genome of melon (Cucumis melo L.). Proc Natl Acad Sci USA 109(29):11872–11877
15. van Leeuwen H, Monfort A, Zhang H-B, Puigdomenech P (2003) Identification and characterisation of a melon genomic region containing a resistance gene cluster from a constructed BAC library. Microcolinearity between Cucumis melo and Arabidopsis thaliana. Plant Mol Biol 51(5):703–718
16. Diaz A, Fergany M, Formisano G, Ziarsolo P, Blanca J, Fei Z, Staub JE, Zalapa JE, Cuevas HE, Dace G, Oliver M, Boissot N, Dogimont C, Pitrat M, Hofstede R, van Koert P, Harel-Beja R, Tzuri G, Portnoy V, Cohen S, Schaffer A, Katzir N, Xu Y, Zhang H, Fukino N, Matsumoto S, Garcia-Mas J, Monforte AJ (2011) A consensus linkage map for molecular markers and quantitative trait loci associated with economically important traits in melon (Cucumis melo L.). BMC Plant Biol 11:111
17. Deleu W, Esteras C, Roig C, Gonzalez-To M, Fernandez-Silva I, Gonzalez-Ibeas D, Blanca J, Aranda MA, Arus P, Nuez F, Monforte AJ, Pico MB, Garcia-Mas J (2009) A set of EST-SNPs for map saturation and cultivar identification in melon. BMC Plant Biol 9:90
18. Fernandez-Silva I, Eduardo I, Blanca J, Esteras C, Pico B, Nuez F, Arus P, Garcia-Mas J, Monforte AJ (2008) Bin mapping of genomic and EST-derived SSRs in melon (Cucumis melo L.). Theor Appl Genet 118(1):139–150
19. Gonzalo MJ, Oliver M, Garcia-Mas J, Monfort A, Dolcet-Sanjuan R, Katzir N, Arus P, Monforte AJ (2005) Simple-sequence repeat markers used in merging linkage maps of melon (Cucumis melo L.). Theor Appl Genet 110(5):802–811
20. Li R, Li Y, Kristiansen K, Wang J (2008) SOAP: short oligonucleotide alignment program. Bioinformatics 24(5):713–714
21. Smit AF (1999) Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev 9(6):657–663
22. Rozen S, Skaletsky H (2000) Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol 132:365–386
23. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410
24. Kent WJ (2002) BLAT–the BLAST-like alignment tool. Genome Res 12(4):656–664. doi:10.1101/gr.229202 Article published online before March 2002
25. Causse MA, Fulton TM, Cho YG, Ahn SN, Chunwongse J, Wu K, Xiao J, Yu Z, Ronald PC, Harrington SE et al (1994) Saturated molecular map of the rice genome based on an interspecific backcross population. Genetics 138(4):1251–1274
26. Lawson MJ, Zhang L (2006) Distinct patterns of SSR distribution in the Arabidopsis thaliana and rice genomes. Genome Biol 7(2):R14
27. Hong Y, Chen X, Liang X, Liu H, Zhou G, Li S, Wen S, Holbrook CC, Guo B (2010) A SSR-based composite genetic linkage map for the cultivated peanut (Arachis hypogaea L.) genome. BMC Plant Biol 10:17
28. Xin D, Sun J, Wang J, Jiang H, Hu G, Liu C, Chen Q (2012) Identification and characterization of SSRs from soybean (Glycine max) ESTs. Mol Biol Rep 39(9):9047–9057
29. Wang Z, Yan H, Fu X, Li X, Gao H (2012) Development of simple sequence repeat markers and diversity analysis in alfalfa (Medicago sativa L.). Mol Biol Rep. doi:10.1007/s11033-012-2404-3
30. Biswas MK, Chai L, Mayer C, Xu Q, Guo W, Deng X (2012) Exploiting BAC-end sequences for the mining, characterization and utility of new short sequences repeat (SSR) markers in Citrus. Mol Biol Rep 39(5):5373–5386
31. Cloutier S, Miranda E, Ward K, Radovanovic N, Reimer E, Walichnowski A, Datla R, Rowland G, Duguid S, Ragupathy R (2012) Simple sequence repeat marker development from bacterial artificial chromosome end sequences and expressed sequence tags of flax (Linum usitatissimum L.). Theor Appl Genet 125(4):685–694