
Mol Biol Rep (2013) 40:6855–6862 DOI 10.1007/s11033-013-2803-0

Development of SSR markers by next-generation sequencing of Korean landraces of chamoe (Cucumis melo var. makuwa)

Inkyu Park • Jungeun Kim • Jeongyeo Lee • Sewon Kim • Okhee Cho • Kyungbong Yang • Jongmoon Ahn • Seokhyeon Nahm • HyeRan Kim

Received: 12 March 2013 / Accepted: 26 September 2013 / Published online: 5 October 2013
© Springer Science+Business Media Dordrecht 2013

Abstract The oriental melon (Cucumis melo var. makuwa), called 'chamoe' in Korean, is a popular fruit crop cultivated mainly in Asia and a high-market value crop in Korea. To provide molecular breeding resources for chamoe, we developed and characterized genomic SSR markers from the preliminary Illumina read assemblies of Gotgam chamoe (one of the major landraces; KM) and SW3 (the breeding parent). Mononucleotide motifs were the most abundant type of markers, followed by di-, tri-, tetra-, and pentanucleotide motifs. The most abundant dinucleotide was AT, followed by AG and AC, and AAT was the most abundant trinucleotide motif in both assemblies. Following our SSR-marker development strategy, we designed a total of 370 primer sets. Of these, 236 primer sets were tested, exhibiting 93 % polymorphism between KM and SW3. Those polymorphic SSRs were successfully amplified in the netted and Kirkagac melons, which respectively exhibited 81 and 76 % polymorphism relative to KM, and 32 and 38 % polymorphism relative to SW3. Seven selected SSR markers with a total of 17 alleles (2–3 alleles per locus) were used to distinguish between KM, SW3, and four chamoe cultivars. Our results represent the first attempt to provide genomic resources for Korean landraces for the purposes of chamoe breeding, as well as to discover a set of SSR markers capable of discriminating chamoe varieties from Korea and the rest of Asia, which possess little genetic diversity. This study establishes a highly efficient strategy for developing SSR markers from preliminary Illumina assemblies of AT-rich genomes.

Keywords Illumina preliminary assembly • Oriental melon • SSR • Genetic diversity

Inkyu Park and Jungeun Kim contributed equally to this study.

Electronic supplementary material The online version of this article (doi:10.1007/s11033-013-2803-0) contains supplementary material, which is available to authorized users.

I. Park • J. Kim • J. Lee • S. Kim • O. Cho • K. Yang • H. Kim (corresponding author)
Plant Systems Engineering Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), 125 Gwahangno, Daejeon 305-806, Republic of Korea
e-mail: [email protected]

I. Park
College of Agriculture and Life Science, Chungnam National University, 99 Daehak-ro, Daejeon 305-764, Republic of Korea

J. Kim • K. Yang • H. Kim
Systems and Bioengineering, University of Science and Technology (UST), 217 Gajung-ro, Daejeon, Republic of Korea

J. Ahn
Breeding Institute, Nongwoo Bio Co., LTD., Yeoju, Kyonggi-do 469-885, Republic of Korea

S. Nahm
Biotechnology Institute, Nongwoo Bio Co., LTD., Yeoju, Kyonggi-do 469-885, Republic of Korea

Introduction

To increase productivity and commercial value, plant breeders are constantly trying to develop new varieties. However, conventional breeding requires enormous time, effort, and expense, and can also cause crop losses [1, 2]. The power of breeding techniques has been enhanced by the introduction of molecular markers, which enable the methods known as marker-assisted selection [3]. Molecular markers have numerous uses, including genetic-diversity characterization, effective-loci estimation, allelic-effect studies, quantitative-trait locus mapping, gene-flow studies, and evolutionary studies [4]. During the last three decades, many types of markers have been developed and used for crop breeding (reviewed in [5]).

Simple sequence repeats (SSRs), or microsatellites, are tandemly repeated one- to six-nucleotide sequence motifs [6]. SSRs are a preferred marker type because they are distributed throughout the genome and exhibit high polymorphism with medium throughput, locus-specific co-dominance, and high rates of transferability [7, 8]. SSR marker analysis remains a popular genotyping method for breeding programs because it is easier to perform and less costly than single nucleotide polymorphism (SNP) marker analysis. The most important advantage of SSR markers over SNP markers is that they can detect multiple alleles per locus.

Because it requires prior sequence information, discovery of SSRs was once a tedious, labor-intensive, and expensive process. The advent of next-generation sequencing (NGS) technologies [9], however, has allowed cost-effective sequence generation at extremely high throughput, and these technologies have been used for rapid and cost-effective SSR discovery in crops. The two NGS systems most widely used for SSR development are the Roche 454 pyrosequencing system and the Illumina sequencing-by-synthesis system. Currently, shotgun sequencing using pyrosequencing is the most popular method for investigating SSR loci in plants, due to this method's comparatively longer read lengths [10]. By contrast, the short reads of the Illumina system make it possible to perform de novo assembly of contigs without a reference sequence [2], and this system is therefore useful for developing large numbers of SSR markers. Robust algorithms contribute to identification of SSRs from genomes (reviewed in [11]), and the numbers and reliability of the SSRs discovered depend on the algorithms and strategies used [12]. However, it is still important to select high-quality SSR loci for further testing and to increase SSR-screening capacity and polymorphism rates.

Melon (Cucumis melo; 2n = 24), which belongs to the Cucurbitaceae family, is an economically important horticultural crop around the world. The species is thought to have originated in Asia [13]. Today, in addition to its agricultural importance, it is also used as a model species for studying fruit ripening, sex determination, and phloem physiology. The melon genome has been mostly sequenced, with the original report describing the assembly of 375 Mb from the DHL92 line, representing 83 % of the melon genome. The initially published draft genome contains 27,427 protein-coding genes and has high AT content [14]. DHL92 is a double-haploid line derived from Sunghwan chamoe (PI 161375 [15]).

Chamoe, the Korean name for the oriental melon (C. melo var. makuwa), is a popular fruit crop cultivated mainly in Asia and a high-market value crop in Korea. There are two major landraces of chamoe in Korea: Sunghwan and Gotgam. Both contain more nutrients and exhibit greater disease resistance and other useful agronomic traits than cultivated chamoes. In particular, the Gotgam chamoe has the aroma of a dried persimmon, from which its name is derived, and breeders have attempted to introduce this trait to melons as well as other cultivated chamoes. To date, however, detailed molecular-biological studies and genotyping of these traits have not been performed. A number of molecular markers and linkage maps [16–19] have been developed, albeit mainly for Western varieties of melons. Genomic resources and markers developed for the Korean landraces of chamoe will therefore be important, not only for chamoe breeding but also for molecular-biological studies of desirable chamoe traits.

In this study, we aimed to characterize genomic SSRs for a Korean landrace and a breeding line of chamoe, as well as to develop SSR markers to reveal polymorphisms between the two lines. Additional objectives of this study included establishment of an efficient strategy for developing SSR markers from AT-rich genomes using the preliminary NGS assemblies, and evaluation of the utility of the resulting SSR markers for cultivar identification and melon breeding.

Materials and methods

Plant materials

All plant materials were provided by NongWoo Bio Co., Korea. An inbred line of the Korean landrace Gotgam chamoe (KM, C. melo var. makuwa, NongWoo Bio accession No. 1638) and a breeding-resource inbred line of SW3 chamoe (SW3, C. melo var. makuwa, NongWoo Bio accession No. 1601) were used for NGS sequencing and SSR-marker development. Fresh leaves of KM, SW3, netted melon (C. melo var. reticulatus, NongWoo Bio accession No. 1408), Kirkagac melon (C. melo var. inodorus, NongWoo Bio accession No. 1533), and four commercial cultivars of chamoe (Bingbichi, Obokggul, Obokplusggul, and Smartggul) were used for genomic DNA isolation to validate SSR markers by polymerase chain reaction (PCR) analysis.

Fig. 1 Strategies for NGS read assembly and SSR mining. a Assembly process of HiSeq 2000 reads using the SOAPdenovo package: raw read set (.fastq), error correction (.corr), preprocessing (.fasta), SOAPdenovo assembly (.fasta), assembly quality assessment, gap closing (.fasta). b The pipeline for development of SSR markers from preliminary NGS assemblies of KM and SW3: repeat masking (RepeatMasker), identification of genomic SSRs (SciRoKo), finding conserved regions (BLASTn), analyzing polymorphic SSRs, primer design to amplify the targeted polymorphic SSRs (Primer3), validation of reported primers, printing of primer sets

Construction of genome assemblies

KM and SW3 were grown to the seedling stage in a growth chamber. Samples of fresh leaves were harvested, and subjected to dark conditions for 48 h to avoid contamination by chloroplasts. DNA was extracted using the DNeasy Plant Maxi Kit (QIAGEN, Cat. No. 68163) according to the manufacturer's instructions. A total of four libraries from KM (three from intact genomic DNA and one from partially digested genomic DNA) and one library from SW3 were generated by paired-end sequencing with 101 bp read length using a HiSeq 2000 (Illumina). The overall assembly process is shown in Fig. 1a. Sequencing errors were corrected using the short-read correction tool from the SOAPdenovo software (http://soap.genomics.org.cn/soapdenovo.html) with default parameters. Preliminary genome sequences were assembled using SOAPdenovo with various k-mer lengths ranging from 31 to 81 [20]. The optimized assemblies (KM, k-mer = 81; SW3, k-mer = 51) were selected by considering the number of contigs, total assembly size, and N50 (Supplementary Table 1).
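The assembly-selection criteria named above (number of sequences, total assembly size, and N50) can be illustrated with a minimal Python sketch. It is not the authors' in-house script: the function names `n50` and `summarize` are hypothetical, and the scaffold lengths are invented toy values rather than data from this study.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


def summarize(lengths):
    """Collect the three statistics used to compare candidate assemblies."""
    return {"count": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}


# Toy scaffold lengths for two k-mer settings (illustrative values only).
candidates = {
    51: [900, 700, 400, 300, 200],
    81: [1500, 600, 300, 100],
}

for kmer, lengths in sorted(candidates.items()):
    print(kmer, summarize(lengths))
```

Comparing such summaries across k-mer settings mirrors, in simplified form, how an optimized assembly might be chosen; in practice the statistics would be computed from the scaffold FASTA files produced by the assembler.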

SSR marker development

The overall process of the pipeline for identification of SSR polymorphisms is represented in Fig. 1b. To prevent amplification of SSRs in interspersed repetitive regions, we masked repetitive sequences in the assembled KM and SW3 genomes by applying RepeatMasker (http://www.repeatmasker.org, ver. 1.0) with species parameters set for Arabidopsis and the "-nolow" option selected to prevent masking of simple repeats [21]. To identify SSRs in scaffolds, we used the SciRoKo software with "MisMatchFixedPenalty" parameters (score ≥ 15; mismatch penalty = 5; minimum length of SSRs = 8; minimum repeats of SSRs = 3; and maximum simultaneous mismatches = 3) [22]. Scaffold sequences were aligned with BLASTN (ver. 2.2.20) with the option "-F F" and an expectation (E) value less than 1E-100 [23]. SSRs in coding regions were identified by mapping the reference genes of melon [12] to the KM and SW3 assemblies with BLAT (the BLAST-Like Alignment Tool) [24]. To remove false-positive mappings, genic regions in the assemblies were defined as those with > 80 % gene mapping coverage. To design primer sets for targeted SSRs, we used the primer3_core software with the following options: optimum primer length = 20; optimum melting temperature = 60 °C; and PCR product size = 200–500 bp. The primers designed for SSR loci were named CMMS (C. melo makuwa SSR). The overall pipeline was controlled with scripts developed in-house in Python (ver. 2.7) using the BioPython modules (ver. 1.59).

PCR analysis

DNA was extracted from fresh leaf tissues of each plant according to the modified chloroform-based DNA extraction protocol [25]. The total volume of each reaction mixture was 20 µl, consisting of 5 µl template DNA (5 ng/µl), 0.5 µl Taq polymerase (5 U/µl), 2 µl dNTPs (2.5 mM each), 2 µl forward + reverse primer (10 pmol each), 2 µl 10× PCR buffer, and 8.5 µl triple-distilled water. Amplification was performed on a C1000 Thermal Cycler (Bio-Rad, Hercules, CA, USA) with the following step-cycle program: initial denaturing step at 94 °C for 2 min; 35 cycles at 94 °C for 20 s, 55 °C for 10 s, and 72 °C for 30 s; and a final extension at 72 °C for 5 min. PCR products were separated on 2 % agarose gels for 2 h at 100 V. SSR marker bands were visualized using ethidium bromide. PCR products were also separated on 4 % polyacrylamide gels for 1–2 h at 1,800–2,200 V, and detected using the Silverstar silver-staining kit (Bioneer, Daejeon, Korea).

Results

Characteristics of genomic SSRs in the NGS assemblies of KM and SW3

The Illumina sequencing generated 29 and 5 Gb of preprocessed paired-end reads from the KM and SW3 lines, respectively. The assembler SOAPdenovo was used to assemble the two genomes de novo, generating 18,639 scaffolds representing 252 Mb from KM and 122,122 scaffolds representing 272 Mb from SW3 (Table 1). The optimized k-mer values for the assembly of KM and SW3 reads were 81 and 51 bp, yielding scaffold N50 values of 13,165 and 4,441 bp, respectively (Supplementary Table 1). The GC contents for KM and SW3 were 31.4 and 31.5 %, respectively, consistent with the high AT content of the C. melo genome.

From scaffolds longer than 200 bp, we identified 96,287 SSRs from KM and 101,899 SSRs from SW3 (Supplementary Table 2). As a reference, our SSR-calling criteria identified 36,968 SSRs from the previously reported melon genome of 1,594 scaffolds, a finished assembly from the DHL92 double-haploid line [14]. Mononucleotide motifs were the most abundant type of repeats (25 % of the total SSRs in KM; 27 % in SW3) followed by di-, tri-, tetra-, and pentanucleotide motifs in both chamoe assemblies; by contrast, pentanucleotide motifs were the second most plentiful type in the reference assembly (Supplementary Table 2). The most abundant dinucleotide motif was AT (41 % in KM; 37 % in SW3) followed by AG (8 and 10 %) and AC (4 and 4 %), as in the reference assembly. AAT was the most abundant trinucleotide motif in both chamoe assemblies, whereas AAG was the major trinucleotide motif in the reference assembly. GC-rich motifs were very rare in all three genomes. However, the abundance of different repeat motifs varied with genomic region. Mononucleotide motifs were predominant in intergenic regions and introns, while trinucleotide repeats were most plentiful in exons (Supplementary Tables 4, 5). Most SSRs identified in the KM and SW3 assemblies were in intergenic regions, accounting for 98 and 95 %, respectively. Overall, we detected 2-fold more SSRs from the chamoe assemblies than from the finished reference assembly.

Table 1 Details of KM and SW3 genome assemblies

Stage          Metric                       KM              SW3
Raw data       Number of reads              392,486,686     107,314,066
Raw data       Total length (bp)            39,641,149,226  10,842,356,666
Preprocessing  Number of reads              340,008,786     74,771,282
Preprocessing  Total length (bp)            29,569,295,415  5,212,626,027
Preprocessing  Sequence depth (a)           65.71           11.58
Assembly       Number of contigs            96,248          293,589
Assembly       Number of scaffolds          18,639          122,122
Assembly       Total scaffold length (bp)   252,868,634     272,541,427
Assembly       Scaffold N50 (bp)            13,165          4,441
Assembly       Scaffold GC content (%)      31.4            31.5

(a) Estimated based on the genome size of 450 Mb [14]. KM: Korean landrace, Gotgam chamoe; SW3: inbred line
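As a worked check of the sequence depth reported in Table 1, the short Python sketch below divides the preprocessed read length by the 450 Mb genome size given in the table footnote. The variable and function names are illustrative only; the input numbers are those listed in Table 1.

```python
# Genome size assumed in the Table 1 footnote (450 Mb).
GENOME_SIZE_BP = 450_000_000

# Total length of preprocessed reads (bp), as listed in Table 1.
preprocessed_bp = {"KM": 29_569_295_415, "SW3": 5_212_626_027}


def sequence_depth(total_bp, genome_size_bp=GENOME_SIZE_BP):
    """Average depth = sequenced bases divided by the genome size."""
    return total_bp / genome_size_bp


depths = {line: round(sequence_depth(bp), 2) for line, bp in preprocessed_bp.items()}
for line, depth in depths.items():
    print(line, depth)
```

The roughly six-fold difference in depth between the two lines is relevant later, because the lower coverage of SW3 is cited as the reason more SW3 SSRs were lost in the alignment-based filtering.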


SSRs discovered from the chamoe assemblies were frequently located at the edges of contigs, suggesting that the high rate of SSR discovery was a result of assembly artifacts from low-complexity sequences.

Efficient SSR-marker development by in silico polymorphism analysis

To increase the success rate of the primers developed for the SSR loci, SSRs were only selected for marker development if they were discovered from aligned regions of the KM and SW3 assemblies. A total of 68 and 89 % of KM and SW3 SSRs, respectively, were screened out because they failed to align, exhibited no SSR length polymorphism between KM and SW3, or could not be used to design primer sets (Table 2). More SSRs were filtered out for shorter motifs than for longer ones. In particular, due to the lower sequence coverage of SW3, more SW3 SSRs than KM SSRs were filtered out in the alignment steps. After the alignment-based filtering steps, the numbers of SSRs in di- and tri-nucleotide motifs were similar to those in DHL92. The numbers of excluded SSRs were highest for AT-rich motifs, especially those discovered at the edges of contigs.

To computationally identify SSR polymorphisms, we surveyed length differences greater than 9 bp between SSRs in conserved regions of the two chamoe genomes. This survey revealed that 45 and 43 % of filtered KM and SW3 SSRs, respectively, exhibited no polymorphisms, whereas 10 and 9 % exhibited polymorphisms shorter than 10 bp. In total, 1,569 SSRs were selected as initial templates for marker development. For these SSRs, 370 primer sets were successfully designed against conserved regions and predicted to yield products longer than 200 bp.

Utilities of developed SSR markers in cultivar identification

To experimentally validate SSR polymorphisms, we selected one primer set for each scaffold, comprising 236 primer sets. All SSRs were successfully amplified, and 220 (93 %) were polymorphic between KM and SW3 (Supplementary Table 3).
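The in silico polymorphism screen described earlier (length differences greater than 9 bp between SSRs in conserved regions of the two assemblies) can be sketched in Python as follows. This is a simplified illustration, not the authors' pipeline: it detects only perfect repeats with a regular expression, whereas SciRoKo also tolerates mismatches, and the two sequences are invented examples. The function names `find_ssrs`, `longest_ssr`, and `classify` are hypothetical.

```python
import re

# Perfect tandem repeats: a 1-6 nt motif repeated at least three times.
SSR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1{2,}")


def find_ssrs(seq, min_length=8):
    """Return (start, motif, total_length) for each perfect SSR of at least
    min_length bp. Mismatch-tolerant scoring is deliberately not modelled."""
    hits = []
    for match in SSR_PATTERN.finditer(seq.upper()):
        if len(match.group(0)) >= min_length:
            hits.append((match.start(), match.group(1), len(match.group(0))))
    return hits


def longest_ssr(seq):
    """Return the longest SSR hit in seq, or None if there is none."""
    hits = find_ssrs(seq)
    return max(hits, key=lambda hit: hit[2]) if hits else None


def classify(len_a, len_b, threshold=9):
    """Label a locus by the SSR length difference between two lines."""
    diff = abs(len_a - len_b)
    if diff == 0:
        return "monomorphic"
    return "polymorphic" if diff > threshold else "below threshold"


# Invented aligned regions carrying the same AT repeat at different lengths.
km_region = "GGCTC" + "AT" * 12 + "CCGTG"
sw3_region = "GGCTC" + "AT" * 6 + "CCGTG"

km_hit = longest_ssr(km_region)
sw3_hit = longest_ssr(sw3_region)
print(km_hit, sw3_hit, classify(km_hit[2], sw3_hit[2]))
```

Only loci labelled polymorphic would proceed to primer design in a screen of this kind; the threshold of 9 bp follows the text, while everything else in the sketch is an assumption made for illustration.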
Those polymorphic SSRs were then amplified in two melon varieties, netted (C. melo var. reticulatus) and Kirkagac (C. melo var. inodorus) melons. All SSRs exhibiting polymorphism between KM and SW3 were successfully amplified from the netted and Kirkagac genomes (Table 3). Of the 220 aforementioned SSRs, 179 (81 %) and 168 (76 %) amplicons of the netted and Kirkagac melons, respectively, were polymorphic relative to KM, whereas 76 (32 %) and 85 (38 %) amplicons were polymorphic relative to SW3. There were 71 SSRs polymorphic between netted and Kirkagac melons, similar to the polymorphism rates of those genomes relative to SW3.

We also selected seven SSR markers (CMMS2, CMMS3, CMMS4, CMMS39, CMMS44, CMMS85, and CMMS185) to distinguish among six chamoe lines: KM, SW3, three Korean cultivars (Obokggul, Obokplusggul, and Smartggul), and one Chinese cultivar (Bingbichi) (Fig. 2). Among the seven SSR markers, there were a total of 17 alleles, with 2–3 alleles per locus. Allele sizes ranged between 196 and 393 bp. The SSR genotype data from the seven loci successfully identified all six varieties of chamoe. The genotypic patterns of these seven loci were highly similar between the Korean commercial cultivars and SW3, whereas the genotype of Bingbichi was more similar to that of KM. Obokplusggul and Obokggul were almost identical at all seven loci, except that the former line lacked one band (around 300 bp) representing the CMMS2 marker, shown in dark blue in Fig. 2b. Smartggul exhibited the greatest variety in band patterns, and had more alleles for these loci than the other cultivars. For cases in which these primers cannot distinguish a specific variety from other cultivars, the remaining primer sets should be considered for further study. Our results represent the first attempt to find a set of SSR markers capable of discriminating chamoe varieties from Korea and elsewhere in Asia, which possess limited genetic diversity.

Table 2 Distribution of di- and trinucleotide repeat motifs in KM, SW3 and DHL92. Columns 5 to >10 give the number of SSRs with that number of repeats

Motif  Species       5      6      7      8      9     10    >10  Total SSRs (%)  SSRs in KM/SW3 alignment (%)
AC     DHL92         –      –      –    509    321    113     49  992 (9.4)       –
AC     KM            –      –      –    480    290    195    430  1,395 (3.8)     576 (41.3)
AC     SW3           –      –      –    533    365    236    569  1,703 (4.4)     214 (12.6)
AG     DHL92         –      –      –    900    598    194    142  1,834 (17.4)    –
AG     KM            –      –      –    687    450    359  1,480  2,976 (8.2)     1,197 (40.2)
AG     SW3           –      –      –    827    602    475  2,136  4,040 (10.5)    391 (9.7)
AT     DHL92         –      –      –  1,366    714    171    103  2,354 (22.4)    –
AT     KM            –      –      –  3,344  2,812  2,146  6,642  14,944 (41.1)   3,846 (25.7)
AT     SW3           –      –      –  3,464  2,867  2,143  5,814  14,288 (37.3)   1,417 (9.9)
CG     DHL92         –      –      –      3      –      –      –  3 (0.0)         –
CG     KM            –      –      –      –      –      –      –  –               –
CG     SW3           –      –      –      –      1      –      –  1 (0.0)         –
AAT    DHL92       916    434     80     12      5      –      –  1,447 (13.7)    –
AAT    KM        2,365  2,399  1,659  1,102    799    562  2,708  11,594 (31.9)   3,860 (33.3)
AAT    SW3       2,313  2,214  1,669  1,163    825    538  2,499  11,221 (29.3)   1,423 (12.7)
AAC    DHL92       181    140     32      3      1      –      –  357 (3.4)       –
AAC    KM          164    146     98     58     32     16    121  635 (1.8)       213 (33.5)
AAC    SW3         201    165    133     72     42     22    105  740 (1.9)       86 (11.6)
AAG    DHL92     1,147    741    322     47     18      –      –  2,275 (21.6)    –
AAG    KM          819    692    759    408    334    189    647  3,848 (10.6)    1,525 (39.6)
AAG    SW3         988    895    866    523    493    301    934  5,000 (13.0)    523 (10.5)
ACT    DHL92        66     59      9      1      –      –      –  135 (1.3)       –
ACT    KM           63     47     41     23      9      4     27  214 (0.6)       90 (42.1)
ACT    SW3          77     55     45     23     10      4     32  246 (0.6)       36 (14.6)
ATC    DHL92       227    128     35      4      4      –      1  399 (3.8)       –
ATC    KM          169    131    123     47     44      7     13  534 (1.5)       216 (40.5)
ATC    SW3         227    156    133     48     47     11     17  639 (1.7)       94 (14.7)
AGG    DHL92       120     42     16      5      1      –      –  184 (2.0)       –
AGG    KM           42     33     17     13      5      2      3  115 (0.3)       41 (35.7)
AGG    SW3          83     64     41     33     19      9      6  255 (0.7)       7 (2.8)
ACG    DHL92        51     22      5      –      –      –      –  78 (0.7)        –
ACG    KM            9      3      –      –      –      –      1  13 (0.0)        3 (23.1)
ACG    SW3          12      9      2      3      1      1      1  29 (0.1)        1 (3.5)
ACC    DHL92        88     53     10      –      –      –      –  151 (1.4)       –
ACC    KM           13     12      6      4      2      1      –  38 (0.1)        15 (39.5)
ACC    SW3          36     24      9      9      4      3      2  87 (0.2)        4 (4.6)
AGC    DHL92        68     42      6      2      –      –      –  118 (1.1)       –
AGC    KM           19     16      8      3      3      –      –  49 (0.1)        15 (30.6)
AGC    SW3          31     29     20     11      3      –      –  94 (0.3)        7 (7.5)
CCG    DHL92        99     43     22     14      2      –      2  182 (1.7)       –
CCG    KM            4      –      –      –      1      –      1  6 (0.0)         –
CCG    SW3           1      –      –      –      –      –      –  1 (0.0)         –

Total (%) by repeat number

Species  5              6              7             8              9              10            >10             Total SSRs  SSRs in KM/SW3 alignment (%)
DHL92    2,963 (28.1)   1,729 (16.4)   537 (5.1)     2,866 (27.2)   1,664 (15.8)   478 (4.5)     297 (2.8)       10,534      –
KM       3,667 (10.1)   3,479 (9.6)    2,711 (7.5)   6,169 (17.0)   4,781 (13.2)   3,481 (9.6)   12,073 (33.2)   36,361      11,597 (31.9)
SW3      3,969 (10.4)   3,611 (9.4)    2,918 (7.6)   6,709 (17.5)   5,279 (13.8)   3,743 (9.8)   12,115 (31.6)   38,344      4,203 (11.0)

DHL92: the C. melo genome assembly [14] was used for SSR identification using the same strategy as in this study. KM: Korean landrace, Gotgam chamoe; SW3: inbred line

Table 3 Amplification of the developed SSR markers in melons and marker polymorphisms among the varieties

Varieties compared             No. of total primers used  No. of polymorphic primers (% of polymorphism)
KM / SW3                       236                        220 (93)
KM / Netted melon              220                        179 (81)
KM / Kirkagac melon            220                        168 (76)
SW3 / Netted melon             220                        76 (32)
SW3 / Kirkagac melon           220                        85 (38)
Netted melon / Kirkagac melon  220                        71 (34)

KM: C. melo var. makuwa; SW3: C. melo var. makuwa; Netted melon: C. melo var. reticulatus; Kirkagac melon: C. melo var. inodorus
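The cultivar-identification step reported in the Results, in which a small marker set must give every line a distinct multi-locus genotype, can be checked with a sketch like the following. The allele sizes below are invented placeholders and are not the bands scored in Fig. 2; the function names are likewise hypothetical.

```python
from itertools import combinations

# Hypothetical allele sizes (bp) per marker; NOT the values scored in Fig. 2.
genotypes = {
    "KM":        {"CMMS2": (300,),     "CMMS3": (250,), "CMMS4": (210,)},
    "SW3":       {"CMMS2": (320,),     "CMMS3": (260,), "CMMS4": (210,)},
    "Obokggul":  {"CMMS2": (300, 320), "CMMS3": (260,), "CMMS4": (210,)},
    "Smartggul": {"CMMS2": (320,),     "CMMS3": (260,), "CMMS4": (200, 210)},
}


def differing_markers(a, b):
    """Markers at which two lines carry different allele sets."""
    return sorted(marker for marker in a if a[marker] != b.get(marker))


def undistinguished_pairs(table):
    """Pairs of lines that share an identical genotype at every marker."""
    return [(x, y) for x, y in combinations(sorted(table), 2)
            if not differing_markers(table[x], table[y])]


def allele_count(table):
    """Number of distinct alleles observed per marker across all lines."""
    markers = sorted({m for g in table.values() for m in g})
    return {m: len({a for g in table.values() for a in g.get(m, ())})
            for m in markers}


print(undistinguished_pairs(genotypes))
print(allele_count(genotypes))
```

An empty list from `undistinguished_pairs` means the marker set separates every line; if a pair remained, further primer sets would have to be added, which is the fallback suggested in the text.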

Discussion

Advances in NGS have facilitated rapid and cost-effective discovery of molecular markers as a consequence of the release of whole-genome sequences [2, 5]. In this context, we constructed genome assemblies of two chamoe lines using the HiSeq 2000 platform, yielding 252 and 272 Mb of scaffolds from KM and SW3, respectively (Table 1). These lines are frequently used for chamoe breeding in Korea; therefore, high-density markers developed from those lines will increase the power of breeding methods. Based on our preliminary versions of genome assemblies, we identified many more SSRs than previously identified in the finished version of the reference melon genome. In addition, the SSRs we discovered from the assemblies of KM and SW3 were extremely enriched in AT-rich motifs, and the AT-rich SSRs were frequently detected at the edges of contigs (Table 2, Supplementary Table 3). Considering the AT-richness of the chamoe genome, we interpreted the abundance of SSRs detected from the preliminary assemblies as a reflection of sequencing and assembly artifacts, which were resolved by our strategy for SSR-marker development (i.e., the filtering step).

Robust algorithms have contributed to the identification of SSRs from genomes (reviewed in [11]). The numbers of SSRs discovered are dependent upon the algorithms and options used [12]. However, it is still challenging to select SSRs for amplification from several hundreds of thousands of anonymous SSRs and to determine the quality of feasible markers by validating polymorphisms. In the small genome of Arabidopsis thaliana, for instance, there are 104,102 markers [26], and this number increases to 124,325 when different algorithms and options are used [12]. In previous reports, the success ratios of PCR amplifications of SSR markers ranged from 60 to 90 %, and the polymorphic ratios of the amplicons were generally less than 60 % [27–31]. To discover high-quality SSRs from the preliminary assemblies used in this study, we investigated SSRs only in contig regions that aligned between the two assemblies (Table 2). Furthermore, we designed primer sets based on the highly conserved regions of the two assemblies. We also evaluated SSR polymorphisms in silico before designing PCR primers for SSR detection. This strategy removed most pseudo-SSRs and low-quality SSRs (approximately 64–90 % of the total) resulting from sequencing or assembly artifacts. In addition, all 236 (100 %) of the primer sets tested successfully amplified the targeted SSR loci in both genomes, and almost all (93 %) were polymorphic between the genomes. Thus, our strategy of SSR marker development from preliminary NGS assemblies was highly efficient, rapid, accurate, informative, and cost-effective. The strategy we applied in this study does not require any prior genomic knowledge or genetic information, a feature that increases its utility in other plant species. Therefore, this strategy could be widely used to develop genomic SSR markers efficiently when whole-genome sequences of related species are released.

Fig. 2 Cultivar identification using seven selected SSR markers (CMMS2, CMMS3, CMMS4, CMMS39, CMMS44, CMMS85, and CMMS185). a SSR allele patterns of chamoe cultivars including KM and SW3 using the seven selected SSR markers. 1 KM (Korean landrace, Gotgam chamoe), 2 SW3 (inbred line), 3 Bingbichi (Chinese cultivar), 4 Obokggul (Korean cultivar), 5 Obokplusggul (Korean cultivar), 6 Smartggul (Korean cultivar), M: 100-bp ladder molecular-weight marker. b Schematic representation of SSR polymorphisms from Fig. 2a. Colors represent different alleles of the seven selected SSR markers

The chamoe is a variety of melon (C. melo L.). Melon, which belongs to the Cucurbitaceae family, is one of the most diverse species, and its many varieties are important vegetable crops around the world. A large number of melon varieties are developed and commercialized every year. Detection and use of the genetic differences among these varieties, as well as improving the ability to identify melon cultivars, are important tasks for melon breeders. All markers developed in this study are applicable to multiple varieties of melon: our validation using two different melon varieties demonstrated 100 % PCR amplification of the targeted SSR loci (Table 3). This result demonstrates the direct utility of the markers we developed, not only in chamoe breeding but also in melon breeding more generally. Netted melon exhibited more polymorphism (81 %) than the Kirkagac melon (76 %) relative to KM, whereas this relationship was reversed relative to SW3 (32 % for netted melon; 38 % for Kirkagac melon), reflecting the genetic diversity among the varieties (Table 3). KM, a Korean landrace, exhibited large genetic variation relative to other melon varieties, representing an advantage for melon breeding and genetic study. This result further demonstrates the utility of the marker sets we developed for use in genetic-diversity analysis of C. melo varieties.

Our markers are also very useful for cultivar identification (Fig. 2). The seven markers we selected could unambiguously identify four commercial cultivars. This is the first attempt to develop a core set of SSR markers to distinguish chamoe cultivars. In conclusion, the chamoe SSR markers obtained in this study are valuable resources for evaluations of varietal purity and authenticity of chamoe varieties, as well as other melons.

Acknowledgments This work was financially supported by grants from the Next-Generation Bio Green 21 Program (No. PJ008200) funded by the Rural Development Administration of the Republic of Korea, and the Cabbage Genomics Assisted-Breeding Support Center (CGC), funded by the Ministry for Food, Agriculture, Forestry, and Fisheries of the Republic of Korea.

References

1. Sboner A, Mu XJ, Greenbaum D, Auerbach RK, Gerstein MB (2011) The real cost of sequencing: higher than you think! Genome Biol 12(8):125
2. Varshney RK, Nayak SN, May GD, Jackson SA (2009) Next-generation sequencing technologies and their implications for crop genetics and breeding. Trends Biotechnol 27(9):522–530
3. Collard BC, Mackill DJ (2008) Marker-assisted selection: an approach for precision plant breeding in the twenty-first century. Philos Trans R Soc Lond B Biol Sci 363(1491):557–572
4. Moose SP, Mumm RH (2008) Molecular plant breeding as the foundation for 21st century crop improvement. Plant Physiol 147(3):969–977
5. Paux E, Sourdille P, Mackay I, Feuillet C (2012) Sequence-based marker development in wheat: advances and applications to breeding. Biotechnol Adv 30(5):1071–1088
6. Tautz D, Renz M (1984) Simple sequences are ubiquitous repetitive components of eukaryotic genomes. Nucleic Acids Res 12(10):4127–4138
7. Saha MC, Cooper JD, Mian MA, Chekhovskiy K, May GD (2006) Tall fescue genomic SSR markers: development and transferability across multiple grass species. Theor Appl Genet 113(8):1449–1458
8. Aggarwal RK, Hendre PS, Varshney RK, Bhat PR, Krishnakumar V, Singh L (2007) Identification, characterization and utilization of EST-derived genic microsatellite markers for genome analyses of coffee and related species. Theor Appl Genet 114(2):359–372
9. Metzker ML (2010) Sequencing technologies – the next generation. Nat Rev Genet 11(1):31–46
10. Zalapa JE, Cuevas H, Zhu H, Steffan S, Senalik D, Zeldin E, McCown B, Harbut R, Simon P (2012) Using next-generation sequencing approaches to isolate simple sequence repeat (SSR) loci in the plant sciences. Am J Bot 99(2):193–208
11. Varshney RK, Graner A, Sorrells ME (2005) Genic microsatellite markers in plants: features and applications. Trends Biotechnol 23(1):48–55
12. Kim J, Choi J-P, Ahmad R, Oh S-K, Kwon S-Y, Hur C-G (2012) RISA: a new web-tool for Rapid Identification of SSRs and Analysis of primers. Genes Genom 34(6):583–590
13. Sebastian P, Schaefer H, Telford IR, Renner SS (2010) Cucumber (Cucumis sativus) and melon (C. melo) have numerous wild relatives in Asia and Australia, and the sister species of melon is from Australia. Proc Natl Acad Sci USA 107(32):14269–14273
14. Garcia-Mas J, Benjak A, Sanseverino W, Bourgeois M, Mir G, Gonzalez VM, Henaff E, Camara F, Cozzuto L, Lowy E, Alioto T, Capella-Gutierrez S, Blanca J, Canizares J, Ziarsolo P, Gonzalez-Ibeas D, Rodriguez-Moreno L, Droege M, Du L, Alvarez-Tejado M, Lorente-Galdos B, Mele M, Yang L, Weng Y, Navarro A, Marques-Bonet T, Aranda MA, Nuez F, Pico B, Gabaldon T, Roma G, Guigo R, Casacuberta JM, Arus P, Puigdomenech P (2012) The genome of melon (Cucumis melo L.). Proc Natl Acad Sci USA 109(29):11872–11877
15. van Leeuwen H, Monfort A, Zhang H-B, Puigdomenech P (2003) Identification and characterisation of a melon genomic region containing a resistance gene cluster from a constructed BAC library. Microcolinearity between Cucumis melo and Arabidopsis thaliana. Plant Mol Biol 51(5):703–718
16. Diaz A, Fergany M, Formisano G, Ziarsolo P, Blanca J, Fei Z, Staub JE, Zalapa JE, Cuevas HE, Dace G, Oliver M, Boissot N, Dogimont C, Pitrat M, Hofstede R, van Koert P, Harel-Beja R, Tzuri G, Portnoy V, Cohen S, Schaffer A, Katzir N, Xu Y, Zhang H, Fukino N, Matsumoto S, Garcia-Mas J, Monforte AJ (2011) A consensus linkage map for molecular markers and quantitative trait loci associated with economically important traits in melon (Cucumis melo L.). BMC Plant Biol 11:111
17. Deleu W, Esteras C, Roig C, Gonzalez-To M, Fernandez-Silva I, Gonzalez-Ibeas D, Blanca J, Aranda MA, Arus P, Nuez F, Monforte AJ, Pico MB, Garcia-Mas J (2009) A set of EST-SNPs for map saturation and cultivar identification in melon. BMC Plant Biol 9:90
18. Fernandez-Silva I, Eduardo I, Blanca J, Esteras C, Pico B, Nuez F, Arus P, Garcia-Mas J, Monforte AJ (2008) Bin mapping of genomic and EST-derived SSRs in melon (Cucumis melo L.). Theor Appl Genet 118(1):139–150
19. Gonzalo MJ, Oliver M, Garcia-Mas J, Monfort A, Dolcet-Sanjuan R, Katzir N, Arus P, Monforte AJ (2005) Simple-sequence repeat markers used in merging linkage maps of melon (Cucumis melo L.). Theor Appl Genet 110(5):802–811
20. Li R, Li Y, Kristiansen K, Wang J (2008) SOAP: short oligonucleotide alignment program. Bioinformatics 24(5):713–714
21. Smit AF (1999) Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev 9(6):657–663
22. Rozen S, Skaletsky H (2000) Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol 132:365–386
23. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410
24. Kent WJ (2002) BLAT–the BLAST-like alignment tool. Genome Res 12(4):656–664. doi:10.1101/gr.229202
25. Causse MA, Fulton TM, Cho YG, Ahn SN, Chunwongse J, Wu K, Xiao J, Yu Z, Ronald PC, Harrington SE et al (1994) Saturated molecular map of the rice genome based on an interspecific backcross population.
Genetics 138(4):1251–1274 Lawson MJ, Zhang L (2006) Distinct patterns of SSR distribution in the Arabidopsis thaliana and rice genomes. Genome Biol 7(2):R14 Hong Y, Chen X, Liang X, Liu H, Zhou G, Li S, Wen S, Holbrook CC, Guo B (2010) A SSR-based composite genetic linkage map for the cultivated peanut (Arachis hypogaea L.) genome. BMC Plant Biol 10:17 Xin D, Sun J, Wang J, Jiang H, Hu G, Liu C, Chen Q (2012) Identification and characterization of SSRs from soybean (Glycine max) ESTs. Mol Biol Rep 39(9):9047–9057 Wang Z, Yan H, Fu X, Li X, Gao H (2012) Development of simple sequence repeat markers and diversity analysis in alfalfa (Medicago sativa L.). Mol Biol Rep. doi:10.1007/s11033-0122404-3 Biswas MK, Chai L, Mayer C, Xu Q, Guo W, Deng X (2012) Exploiting BAC-end sequences for the mining, characterization and utility of new short sequences repeat (SSR) markers in Citrus. Mol Biol Rep 39(5):5373–5386 Cloutier S, Miranda E, Ward K, Radovanovic N, Reimer E, Walichnowski A, Datla R, Rowland G, Duguid S, Ragupathy R (2012) Simple sequence repeat marker development from bacterial artificial chromosome end sequences and expressed sequence tags of flax (Linum usitatissimum L.). TAG Theoretical and applied genetics Theoretische und angewandte Genetik 125(4):685–694
