Plant Biotechnology Journal (2012) 10, pp. 690–702

doi: 10.1111/j.1467-7652.2012.00712.x

Transcriptome sequencing of wild chickpea as a rich resource for marker development Shalu Jhanwar, Pushp Priya, Rohini Garg, Swarup K. Parida, Akhilesh K. Tyagi and Mukesh Jain* National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, New Delhi, India

Received 13 December 2011; revised 6 April 2012; accepted 20 April 2012. *Correspondence (Tel 91 11 26735182; fax 91 11 26741658; email [email protected])

Keywords: wild chickpea, transcriptome, polymorphism, simple sequence repeat, single-nucleotide polymorphism.

Summary The transcriptome of cultivated chickpea (Cicer arietinum L.), an important crop legume, has recently been sequenced. Here, we report sequencing of the transcriptome of wild chickpea, C. reticulatum (PI489777), the progenitor of cultivated chickpea, by GS-FLX 454 technology. The optimized assembly of C. reticulatum transcriptome generated 37 265 transcripts in total with an average length of 946 bp. A total of 4072 simple sequence repeats (SSRs) could be identified in these transcript sequences, of which at least 561 SSRs were polymorphic between C. arietinum and C. reticulatum. In addition, a total of 36 446 single-nucleotide polymorphisms (SNPs) were identified after optimization of probability score, quality score, read depth and consensus base ratio. Several of these SSRs and SNPs could be associated with tissue-specific and transcription factor encoding transcripts. A high proportion (92–94%) of polymorphic SSRs and SNPs identified between the two chickpea species were validated successfully. Further, the estimation of synonymous substitution rates of orthologous transcript pairs suggested that the speciation event for divergence of C. arietinum and C. reticulatum may have happened approximately 0.53 million years ago. The results of our study provide a rich resource for exploiting genetic variations in chickpea for breeding programmes.

Introduction Chickpea (Cicer arietinum L.) is an important food legume because of its high nutritional value and ability to fix atmospheric nitrogen. Global chickpea production is about 11 million tons, representing more than 16% of the total production of pulses (FAOSTAT, 2010; http://faostat.fao.org/site/567/DesktopDefault.aspx?PageID=567#ancor). The Cicer genus encompasses nine annual and 34 perennial wild species. Among the nine annual species, C. arietinum is the only cultivated species and the other eight species are wild. The wild annual species, C. reticulatum, is considered the progenitor of cultivated chickpea, C. arietinum (Iruela et al., 2002; Nguyen et al., 2004; Singh et al., 2008). Very low levels of genetic variation have been found within cultivated chickpea, which led to the use of inter-specific crosses for genome mapping and genetic linkage studies (Ahmad et al., 1992; Udupa et al., 1993; Labdi et al., 1996). In other crops, inter-specific crosses have been used to maximize polymorphisms in linkage analysis (Yamamoto et al., 2002; Aluko et al., 2004; Lowe and Walker, 2006). Among the various species of chickpea, C. reticulatum has been successfully crossed with cultivated chickpea, producing fertile hybrids, which makes C. reticulatum of special interest to breeders (Singh et al., 2008). Further, it has been found to exhibit resistance ⁄ tolerance to various abiotic and biotic factors that limit chickpea productivity (Haware et al., 1992; Singh et al., 1994, 1998, 2008; Collard et al., 2001; Sharma et al., 2005). Hence, genome-level analysis and identification of molecular markers from C. reticulatum will help breeding programmes to introgress agronomic traits into cultivated chickpea. Although several studies have reported the identification of markers within C. arietinum, only a few studies have focused on C. reticulatum (Sethy et al., 2006a,b).


The recent transcriptome sequencing of C. arietinum (Garg et al., 2011a,b) has provided a powerful tool to characterize the genes responsible for agronomic traits. However, sequencing the transcriptome of only one species does not provide an understanding of gene ⁄ genome evolution, genome organization and genetic variations. Thus, sequencing the transcriptome of wild species will lead to a better understanding of evolutionary processes and aid in the identification of genetic variations of agronomic use. The importance of wild species in the improvement of chickpea has been comprehensively reviewed (Singh et al., 2008). Various studies, including morphological characteristics, inter-specific hybridization, karyotyping, seed storage protein profiling, isozyme studies and marker-based molecular studies, have placed the wild C. reticulatum species along with cultivated C. arietinum in one group (Singh et al., 2008). C. reticulatum has the same chromosome number (2n = 16) as C. arietinum. Karyotype studies have suggested that the nuclear DNA content of C. reticulatum is lower than that of C. arietinum, but that the two species have similar chromosome banding patterns and heterochromatin (Tayyar et al., 1994; Galasso et al., 1996). Very few genomic resources are available for wild relatives of Cicer, including C. reticulatum. Only 51 nucleotide sequences are available for C. reticulatum, and 30 expressed sequence tags (ESTs) are available for recombinant inbred lines of C. arietinum and C. reticulatum, in GenBank as of January 2012. Next generation sequencing (NGS) technologies have enabled the sequencing of transcriptomes very rapidly and cost-effectively. In this study, we sequenced the transcriptome of wild chickpea (C. reticulatum) using GS-FLX Roche 454 NGS technology and analysed it in detail. A wealth of genomic variations, including simple sequence repeats (SSRs) and single-nucleotide polymorphisms (SNPs), between C. arietinum and C. reticulatum have been catalogued. In addition, we have estimated the substitution rates among the orthologous transcript pairs to calculate the age of speciation between the two chickpea species.

© 2012 The Authors Plant Biotechnology Journal © 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd

Results 454 sequencing of C. reticulatum transcriptome Earlier, we reported sequencing and characterization of the transcriptome of cultivated chickpea, C. arietinum (Garg et al., 2011a). In this study, we have sequenced the transcriptome of a wild chickpea, C. reticulatum (genotype PI489777). Using the GS-FLX Titanium sequencer, we generated about 1.2 million reads of Q20 quality by sequencing three libraries in two flow cells. After stringent quality control, a total of 969 132 high-quality reads with an average length of 396 bp, covering a total of 383 669 305 bases, were obtained. A large fraction (84.7%) of these reads were >300 bp in length (Figure S1A), and the average Phred quality score of more than 90% of the reads was at least 30 (Figure S1B). The sequencing data generated from different tissues ⁄ libraries and their quality filtering at each step are summarized in Table 1.

Assembly optimization of C. reticulatum transcriptome The optimization of assembly is an important step of NGS data analysis for accurate downstream analyses. Several criteria, including assembly statistics, number of reads used for assembly and coverage of the reference (soybean) proteome, were considered for the assembly assessment (Garg et al., 2011a). We undertook three approaches for assembly optimization. In the first approach, de novo assembly of all the high-quality reads was performed using five different assembly programs, namely CAP3, TGICL, MIRA, CLC and Newbler (Table 2). The number of contigs generated and the total assembly size varied among assemblers (Table 2). Although the average length and N50 length were largest in the Newbler assembly, it used the smallest proportion (82%) of reads. In addition, its proportion of uniquely mapped reads (71.5%) was considerably lower than that of the other programs (84–90%). The largest number of soybean proteins was represented in the TGICL assembly (42.7%) and the smallest in the Newbler assembly (33.5%) (Table 2). A larger proportion of reads mapped uniquely to the TGICL assembly (90%) than to the MIRA assembly (87.7%). The output of the CLC and CAP3 assemblies was intermediate in all respects. In the second approach, a reference-based assembly was performed with the CLC and Newbler programs using the transcriptome sequence of C. arietinum as reference. The reference-based assembly generated 24 120 and 26 949 contigs representing 28.7 and 24.8 Mb of sequence using CLC and Newbler, respectively. The average length and N50 length decreased in the reference assembly using Newbler, but improved for CLC as compared to the de novo assemblies. However, the reference-based assembly used fewer reads (81% for Newbler and 85% for CLC) than the de novo assemblies (Table 2). For the third approach, we attempted a merged assembly of the contigs generated in the primary reference assemblies along with unassembled reads using TGICL. The assembly size increased by 30% (37.1 Mb) for the CLC and 42% (35.2 Mb) for the Newbler primary reference assemblies (Table 2). The average length also increased for the merged assembly of the Newbler primary reference assembly and unassembled reads. Most importantly, the merged assemblies included about 96% of reads in both cases, which is much higher than in the primary reference assemblies. However, the number of uniquely mapped reads and of soybean proteins represented in the assembly was higher for the merged assembly of the Newbler primary reference assembly and unassembled reads. Overall, based upon several criteria, the merged assembly of the Newbler primary reference assembly and unassembled reads was judged the best and was used in further analysis.
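For readers less familiar with the N50 statistic used to compare the assemblies, the following minimal Python sketch shows one common way to compute it: contigs are sorted from longest to shortest, and the N50 is the length of the contig at which the running total first reaches half of the total assembly size. The function name and the toy contig lengths are illustrative assumptions and are not taken from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths supplied")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only.
toy_contigs = [2000, 1500, 1000, 800, 400, 300]
print("N50 of toy assembly:", n50(toy_contigs))
```

A higher N50 indicates a more contiguous assembly, but, as the comparison above illustrates, it should be weighed against the fraction of reads used and the reference proteome coverage.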

Overview of the C. reticulatum transcriptome The 37 265 contigs generated from assembly optimization were designated as C. reticulatum tentative consensus (CrTC) transcripts and assigned unique identifier numbers from CrTC00001 to CrTC37265. The sequences of all the CrTCs are available for download from the Chickpea Transcriptome Database (CTDB; http://www.nipgr.res.in/ctdb.html). The average length of the CrTC transcripts is 946 bp, which is slightly less than that of C. arietinum transcripts (1020 bp). Overall, the C. reticulatum transcriptome covered a total of 35 242 239 bp of sequence. About 32% (11 889) of the transcripts were at least 1000 bp in length (Figure S2A). The mapping of all the high-quality reads on CrTC transcripts showed that each transcript was represented by an average of 24.9 reads. The read depth of transcripts varied greatly with more than 47% of the

Table 1 Summary of 454 sequencing data generated for C. reticulatum transcriptome and quality filtering

Library ⁄ tissue-type   Total reads*   Low-quality reads†   Trimmed reads‡   Trashed reads§   rRNA reads¶   High-quality mRNA reads**   Average length (bp)††
Shoot                   548 123        4622                 18 015           33 446           102 796       407 259                     391
Root                    137 233        943                  4464             8529             21 603        106 158                     389
Mixed                   498 186        5475                 13 631           32 651           4345          455 715                     402
Total                   1 183 542      11 040               36 110           74 626           128 744       969 132                     396

*Total number of reads for each tissue sample.
†Number of low-quality reads (Phred quality score of 30% of bases) removed.
‡Number of trimmed reads containing primer ⁄ adapter sequence and >7 bp long homopolymer starting from the first base of homopolymer.
§Number of short reads (10
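The read counts in Table 1 can be cross-checked with a few lines of Python. The sketch below assumes that trimmed reads were shortened but retained (and are therefore not subtracted); this interpretation is not stated explicitly in the table, although it is consistent with the reported totals. The dictionary layout and the function name are illustrative.

```python
# Read counts transcribed from Table 1 (one entry per library).
TABLE1 = {
    "Shoot": {"total": 548_123, "low_quality": 4_622, "trimmed": 18_015,
              "trashed": 33_446, "rrna": 102_796, "high_quality": 407_259},
    "Root": {"total": 137_233, "low_quality": 943, "trimmed": 4_464,
             "trashed": 8_529, "rrna": 21_603, "high_quality": 106_158},
    "Mixed": {"total": 498_186, "low_quality": 5_475, "trimmed": 13_631,
              "trashed": 32_651, "rrna": 4_345, "high_quality": 455_715},
}


def surviving_reads(row):
    """Reads left after removing low-quality, trashed and rRNA reads.

    Assumption: trimmed reads are kept after trimming, so they are
    not subtracted here.
    """
    return row["total"] - row["low_quality"] - row["trashed"] - row["rrna"]


for library, row in TABLE1.items():
    agrees = surviving_reads(row) == row["high_quality"]
    print(f"{library}: computed {surviving_reads(row)}, "
          f"reported {row['high_quality']}, agree={agrees}")

total_raw = sum(row["total"] for row in TABLE1.values())
total_high_quality = sum(row["high_quality"] for row in TABLE1.values())
print(f"Overall: {total_high_quality} of {total_raw} reads retained "
      f"({100 * total_high_quality / total_raw:.1f}%)")
```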


SSR polymorphism between C. arietinum and C. reticulatum Overall, a similar number and abundance of different types (di- to hexa-nucleotide) of SSRs was observed in the C. arietinum (Garg et al., 2011a) and C. reticulatum (this study) transcriptomes. Applying stringent criteria, a total of 561 (approximately 14%) SSRs exhibited polymorphism between C. arietinum and C. reticulatum, with a difference of at least one repeat unit for all types of repeats (Table S3). These 561 SSRs were present in 552 transcripts; nine transcripts harboured two polymorphic SSRs. The number of polymorphic SSRs was inversely related to the difference in the number of repeat units (Figure 1b). The difference in the number of repeat units varied from one to 40, with an average of about 2.33, in polymorphic SSRs between C. arietinum and C. reticulatum. At least 12 polymorphic SSRs showed a difference of more than 10 repeat units (Figure 1b). The polymorphism rate of tri-nucleotides was higher than that of di-nucleotides. The di-nucleotide repeat AG ⁄ CT (34.2%) was the most represented, followed by the tri-nucleotide repeat AAG ⁄ CTT (20.9%); together they constituted more than 55% of the total number of polymorphic SSRs (Figure 1c). The average repeat length varied from 17.2 to 33.3 bp for all the SSRs identified in C. arietinum and from 16.7 to 32.2 bp for C. reticulatum. However, the average repeat length was higher for polymorphic SSRs in both C. arietinum (20.1–34.5 bp) and C. reticulatum (18.3–34.5 bp). The average repeat length and the average number of repeat units were larger for both all and polymorphic SSRs of C. arietinum than for those of C. reticulatum. For example, the average number of repeat units was 8.7 in all and 10.1 in polymorphic di-nucleotide repeats of C. arietinum, as compared to 8.5 in all and 9.2 in polymorphic di-nucleotide repeats of C. reticulatum.
The polymorphic SSRs were found to be well distributed over the entire length of the transcript sequences, in particular within the first 2 kb (Figure S5). We also analysed the presence of polymorphic SSRs in the tissue-specific chickpea transcripts identified earlier (Garg et al., 2011a). At least 12 and four polymorphic SSRs were present in the flower bud- and young pod-specific transcripts, respectively. In addition, 70 polymorphic SSRs were detected in transcripts representing different transcription factor families (Table S3).
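The idea of calling an SSR polymorphic when the number of repeat units differs between the two species can be illustrated with a short Python sketch. It is not the pipeline used in this study: the regular expression, the minimum of five repeat units and the two toy sequences are assumptions made purely for illustration, and a real comparison would first pair SSRs through their orthologous flanking sequences.

```python
import re


def find_ssrs(seq, min_repeats=5):
    """Yield (motif, repeat_units, start) for perfect di- to hexa-nucleotide SSRs.

    The threshold `min_repeats` is a hypothetical choice, not the
    criterion used in the study.
    """
    # A lazily matched 2-6 bp motif followed by at least
    # (min_repeats - 1) further exact copies of itself.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAA
        yield motif, len(match.group(0)) // len(motif), match.start()


# Toy orthologous fragments (hypothetical sequences).
cultivated = "TTGC" + "AG" * 10 + "CCAT"
wild = "TTGC" + "AG" * 8 + "CCAT"

(motif_c, units_c, _), = find_ssrs(cultivated)
(motif_w, units_w, _), = find_ssrs(wild)

unit_difference = abs(units_c - units_w)
length_difference_bp = unit_difference * len(motif_c)
print(motif_c, units_c, units_w, unit_difference, length_difference_bp)
```

In this toy case the two fragments differ by two AG units, that is, a 4 bp length difference of the kind that could be resolved by PCR amplification across the repeat.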

Figure 1 Distribution of polymorphic SSRs among two chickpea genotypes. (a) Distribution of total number of SSRs identified in C. reticulatum and polymorphic SSRs among C. arietinum and C. reticulatum in different classes. (b) Frequency difference of different classes of polymorphic SSRs between C. arietinum and C. reticulatum. (c) Number of polymorphic SSRs representing different repeat types.

ATC ⁄ ATG constituted 33% of the total and 56.6% of the tri-nucleotide repeats. AAAG ⁄ CTTT and AAAT ⁄ ATTT represented most (45.6%) of the tetra-nucleotide repeats.

SNP identification One of the key applications of NGS technologies is the identification of genetic variants via whole-genome or transcriptome sequencing. The use of 454 sequencing technology in the identification of SNPs has been well demonstrated (Barbazuk et al., 2007; Bundock et al., 2009). To identify putative SNPs between the C. arietinum and C. reticulatum transcriptomes, all the high-quality reads were aligned to the C. arietinum transcriptome sequence taken as a reference, and the generated ACE file was used as input to GigaBayes. The GigaBayes algorithm identifies both single base substitutions and single base insertions ⁄ deletions with a high successful validation rate (Barbazuk et al., 2007; Smith et al., 2008; Milano et al., 2011). However, because of the susceptibility of the 454 sequencing technology to indel errors, only base substitutions (i.e. SNPs) were analysed in the present study. The number of SNPs detected was determined at different probability cut-offs (Figure S6A), and the probability cut-off of 0.95 was considered to be optimal, which resulted in 86 135 SNPs (Figure S6A). This was followed by the optimization of the SNP base quality score (Figure S6B). An average quality score of at least 30 was selected to identify the SNPs at different read depths in both species for consensus base ratios of 0.9 and 1, after removing closely lying SNPs (three or more SNPs in any 10 bp window) and SNPs lying close to a potential indel (within the 3 bp flanking region) (Figure S7). With increasing read depth, the number of SNPs decreased at both allele frequencies. Finally, we selected a minimum read depth of three and a base consensus ratio of one as the criteria to minimize false detection. Similar criteria for screening high-quality SNPs have been used in previous studies (Barbazuk et al., 2007; Wu et al., 2010; Milano et al., 2011). Applying the above stringent criteria, the raw SNP data yielded 36 446 putative SNPs (designated as CaTSNP_00001 to CaTSNP_36446) distributed in 10 880 transcripts (Table S4). This represented the highest confidence data set with a probability of >0.99 and a minimum read depth of three for both chickpea species. The average quality score of the SNP base was 36.9 and 36.3 for C. arietinum and C. reticulatum, respectively. A large fraction of SNP base positions possessed a quality score of at least 35 (Figure 2a). Likewise, about 88% and 72% of SNPs were supported by at least five reads in C. arietinum and C. reticulatum, respectively (Figure 2b). The average frequency of SNPs was one SNP per 973.2 and 967 bp for C. arietinum and C. reticulatum, respectively. The number of SNPs detected per transcript was highly variable, ranging from one to 37. Among the 10 880 transcripts that contained at least one SNP, more than 30% contained only one SNP (Figure 3a). Two to ten SNPs were detected in approximately 66% of the transcripts, and a few transcripts (3%) contained more than 10 SNPs (Figure 3a). SNPs were well distributed throughout the length of transcripts, with a larger number at their 5’ ends (Figure S8). Among all the SNPs, transitions (65.2%) were more frequent than transversions (34.8%) (Figure 3b). Among the 36 446 SNPs, 34 407 were detected in 10 119 transcripts for which a putative function has been assigned; the other 2039 SNPs were detected in unknown expressed sequences. Subsequently, we analysed the SNPs identified in chickpea transcripts found to be expressed in a tissue-specific manner or encoding transcription factors in our previous study (Garg et al., 2011a). A total of 1071 SNPs were detected in 455 transcripts exhibiting tissue-specific expression (Figure 4a; Table S4). The largest number of SNPs was detected in flower bud-specific transcripts, followed by young pod-specific transcripts. In addition, we identified a total of 2847 SNPs in 808 transcription factor encoding transcripts (Figure 4b).
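The post-calling filters described in this section (quality score of at least 30, read depth of at least three in both species, consensus base ratio of one, and removal of SNPs where three or more fall in any 10 bp window) can be sketched in Python as follows. This is only an illustration of the stated thresholds, not the authors' implementation: the record layout, the function names and the order in which the filters are applied are assumptions, and the indel-proximity filter is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    # Hypothetical record layout for one candidate SNP.
    transcript: str
    position: int
    quality: float
    depth_arietinum: int
    depth_reticulatum: int
    consensus_ratio: float


def passes_thresholds(snp, min_quality=30, min_depth=3, min_ratio=1.0):
    """Apply the quality, read-depth and consensus-ratio cut-offs."""
    return (snp.quality >= min_quality
            and snp.depth_arietinum >= min_depth
            and snp.depth_reticulatum >= min_depth
            and snp.consensus_ratio >= min_ratio)


def drop_clustered(snps, window=10, max_in_window=2):
    """Remove SNPs lying in any `window`-bp stretch with more than `max_in_window` SNPs."""
    by_transcript = {}
    for snp in snps:
        by_transcript.setdefault(snp.transcript, []).append(snp)
    kept = []
    for group in by_transcript.values():
        group = sorted(group, key=lambda s: s.position)
        clustered = set()
        for i in range(len(group)):
            j = i
            # Extend j while the SNP still lies within the window opened at i.
            while j + 1 < len(group) and group[j + 1].position - group[i].position < window:
                j += 1
            if j - i + 1 > max_in_window:
                clustered.update(range(i, j + 1))
        kept.extend(s for k, s in enumerate(group) if k not in clustered)
    return kept


def filter_snps(snps):
    # Assumed order: cluster removal first, then per-SNP thresholds.
    return [s for s in drop_clustered(snps) if passes_thresholds(s)]


candidates = [
    Snp("TC1", 100, 36, 5, 5, 1.0),  # clustered with 104 and 108
    Snp("TC1", 104, 36, 5, 5, 1.0),
    Snp("TC1", 108, 36, 5, 5, 1.0),
    Snp("TC1", 200, 37, 6, 4, 1.0),  # passes every filter
    Snp("TC1", 300, 38, 5, 2, 1.0),  # read depth too low in one species
    Snp("TC2", 50, 25, 8, 8, 1.0),   # quality score too low
]
print([(s.transcript, s.position) for s in filter_snps(candidates)])
```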

Figure 2 Quality of SNPs in C. arietinum and C. reticulatum. (a) Average quality scores of SNP base in C. arietinum and C. reticulatum (x-axis, average Phred quality score; y-axis, number of SNPs (%)). (b) Number of SNPs identified at different read depths (x-axis, number of reads; y-axis, number of SNPs (%)).

Figure 3 Frequency and substitution types of the identified SNPs. (a) Number of SNPs detected per transcript (x-axis, frequency of SNPs; y-axis, number of TCs). (b) Frequency of different substitution types (x-axis, substitution type; y-axis, number of SNPs (%)).

Among the various transcription factor families, SNPs were most abundant in homeobox transcription factors.

Figure 4 Distribution of SNPs in the transcripts showing tissue-specific expression (a) and those encoding for transcription factors (b).

Validation of polymorphic SSRs and SNPs Among the 561 polymorphic SSRs identified above, 96 polymorphic SSRs with a minimum length difference of 6 bp between C. arietinum and C. reticulatum were selected randomly for evaluation of their potential for PCR amplification and experimental validation of the polymorphism (Table S5). Ninety-four of the selected 96 SSRs showed PCR amplification in both chickpea species, indicating a high success rate of 98%. Eighty-nine (94.7%) of the 94 PCR-amplified SSRs exhibited polymorphism between the two chickpea species (Figure 5a, Table S5), as expected. Remarkably, the fragment length polymorphism observed by experimental validation for 73 (82%) of the polymorphic SSRs corresponded well with their in silico predicted length polymorphism. For validation of SNPs identified between C. arietinum and C. reticulatum, we used short-read Illumina data available for C. reticulatum in the NCBI SRA database (accession number SRR210598). Following stringent quality control of these data, SNP detection was carried out via Maq software using the C. arietinum transcriptome as reference. After applying various stringent criteria for SNP detection as described above, at least 8297 (23%) SNPs could be validated. Further, for experimental validation, a set of 24 SNPs located in the transcription factor encoding transcripts was selected. These SNPs were genotyped via a cleaved amplified polymorphic sequence (CAPS) genotyping assay using the appropriate restriction enzymes whose recognition sequence had been created or abolished because of single nucleotide substitutions (Table S6). The restriction enzymes used for CAPS analysis gave the expected patterns of restriction digestion in 22 (91.7%) of the 24 SNP loci, validating the presence of target SNPs in the transcription factor encoding transcripts (Figure 5b, Table S6). The remaining two SNP loci (8.3%), which could not be validated, might represent false SNPs detected through in silico analysis.

Divergence between C. arietinum and C. reticulatum We identified a total of 17 889 putative orthologous transcript pairs between C. arietinum and C. reticulatum based on the reciprocal best BLAST hit approach. Synonymous (Ks) and nonsynonymous (Ka) substitution rates were calculated for all the orthologous transcript pairs. The Ks value distribution of orthologous pairs exhibited a distinct peak at 0.008 (Figure 6a), from which the speciation of C. arietinum and C. reticulatum was inferred to have occurred approximately 0.53 million years ago. Further, to investigate the selection pressure on orthologous pairs, we calculated the ratio of Ka to Ks (Ka ⁄ Ks) for 11 622 orthologous pairs, for which the Ks value was greater than zero and ≤2. The average Ka ⁄ Ks value was 0.27 for orthologous pairs. About 95% of the orthologous pairs exhibited a Ka ⁄ Ks value of <1.0, indicating purifying selection, whereas a Ka ⁄ Ks value of >1.0 indicates diversifying selection (Figure 6b). Among the 503 orthologous pairs under diversifying selection, functional annotation was available for 374 pairs, and the largest number of these were found to be involved in transferase activity and DNA ⁄ RNA binding molecular functions.
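The relation between the Ks peak and the divergence time can be made explicit. Under a simple molecular clock, T = Ks ⁄ (2r), where r is the synonymous substitution rate per site per year. The rate used for the estimate is not given in this section, so the Python sketch below only back-calculates the rate implied by the reported values (Ks = 0.008 and T of about 0.53 million years) and classifies Ka ⁄ Ks ratios by the conventional thresholds. The function names are illustrative assumptions.

```python
def divergence_time_years(ks, rate_per_site_per_year):
    """Molecular-clock estimate T = Ks / (2 r)."""
    return ks / (2.0 * rate_per_site_per_year)


def implied_rate(ks, time_years):
    """Substitution rate r implied by a given Ks and divergence time."""
    return ks / (2.0 * time_years)


def selection_class(ka, ks, max_ks=2.0):
    """Classify a pair by Ka/Ks; pairs with Ks outside (0, max_ks] are skipped."""
    if not 0 < ks <= max_ks:
        return None
    ratio = ka / ks
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "diversifying"
    return "neutral"


# Values reported in the text; the rate is derived, not quoted from the paper.
rate = implied_rate(ks=0.008, time_years=0.53e6)
print(f"Implied rate: {rate:.3e} substitutions per synonymous site per year")
print(f"Round trip: {divergence_time_years(0.008, rate) / 1e6:.2f} million years")
print(selection_class(ka=0.027, ks=0.1))  # Ka/Ks = 0.27, the reported average
```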

Discussion Although C. arietinum is the cultivated species, it has a low level of genetic diversity, which might be one of the contributing factors for its susceptibility to pathogens ⁄ pests in agricultural settings. On the other hand, C. reticulatum, considered as the progenitor of cultivated chickpea, is resistant ⁄ tolerant to various abiotic and biotic factors (Haware et al., 1992; Singh et al., 1994, 1998, 2008; Collard et al., 2001; Sharma et al., 2005). Hence, it can serve as a source of genes for resistance ⁄ tolerance that can be used in C. arietinum breeding programmes. Therefore, there is a need to better understand the genetic mechanisms underlying the phenotypic variability between C. arietinum and C. reticulatum, which would help in devising breeding strategies utilizing C. reticulatum genes. Genome ⁄ transcriptome sequencing of cultivated and wild chickpea species and the study of sequence polymorphisms provide an opportunity to understand the genetic basis of phenotypes associated with the two species as well as aspects of genome evolution within Cicer. For example, the genomes of soybean (Glycine max) and its wild relative (G. soja) have been sequenced, which provided a rich resource of genetic variations between the two species to exploit for genetic improvement of soybean (Kim et al., 2010; Schmutz et al., 2010).


We have sequenced and characterized the transcriptomes of both cultivated (Garg et al., 2011a,b) and wild (this study) chickpea, employing NGS technologies. The optimized assembly resulted in a total of 37 265 transcripts in C. reticulatum representing a total transcriptome size of (35.2 Mb), which is slightly smaller than that of C. arietinum (35.5 Mb). The transcriptome sequencing provides clues about the larger genome size of C. arietinum as compared to C. reticulatum, which is consistent with previous report (Tayyar et al., 1994; Galasso et al., 1996). However, the exact picture of the reasons for genome size differences will be clear once the complete genome sequence of both the species is available. More than 72% of the C. reticulatum transcripts were generated via a reference assembly approach utilizing the transcriptome of C. arietinum, indicating a close genomic similarity between the two transcriptomes. This is in agreement with the high interspecific similarity found between the two recently sequenced soybean species based on their gene content (Kim et al., 2010), but in contrast with that of ten Oryza species (Buell, 2009). A comparative analysis revealed that many of C. arietinum transcripts were not detected in C. reticulatum and many C. reticulatum transcripts were not detected in the C. arietinum. These differences in gene content ⁄ expression might underlie some of the phenotypic variations between the two species. Many of these genes might have been acquired or lost in the cultivated chickpea during its domestication and improvement, or might reflect differences in expression patterns between the two species. Because of cost and throughput, conventional markers such as RFLP and RAPD are being replaced with SSRs and SNPs. 
Besides being cost-effective, co-dominant, highly reproducible and amenable to high-throughput analysis, SNPs have better potential for linkage to loci that contribute to agronomic traits, and SSRs from transcribed regions provide functional markers (Rafalski, 2002; Varshney et al., 2005; Parida et al., 2006). Most

SgeI 2 Un 1

2

M

Figure 5 Experimental validation of in silico identified polymorphic SSRs and SNPs. (a) Experimental validation of polymorphic SSRs. Representative gel showing PCR amplification of polymorphic SSRs (A to X) validating the length polymorphism between two chickpea species C. arietinum (1) and C. reticulatum (2) as expected. The PCR amplicons were resolved in 3.5% MetaPhor agarose gel. M, 50 bp DNA ladder as size standard. (b) Experimental validation of polymorphic SNPs using CAPS assay. Representative gel showing restriction enzyme (as indicated) digested PCR amplicons of SNPs validating the polymorphism between two chickpea species C. arietinum (1) and C. reticulatum (2) as expected. The PCR amplicons were resolved in 3.5% MetaPhor agarose gel. M, 50 bp DNA ladder as size standard; Un, undigested amplicon.

importantly, the cross-species transferability of SSRs has been well demonstrated. Because of a broad range of applications, including genetic mapping, genotype identification, markerassisted selection ⁄ breeding and molecular tagging of genes, SSRs have been proven to be the markers of choice in plant genetics. SSRs derived from EST sequences have been used to construct the genetic map of chickpea (Hu¨ttel et al., 1999; Winter et al., 1999; Radhika et al., 2007; Choudhary et al., 2009). However, the limited genetic diversity found in chickpea germplasm has restricted the use of SSRs identified for practical use (Hu¨ttel et al., 1999; Winter et al., 1999; Lichtenzveig et al., 2005; Sethy et al., 2006a,b), and thus, there is a need to develop a larger set of SSRs. We identified a total of 4072 SSRs in the wild species of chickpea. Among these, tri-nucleotide repeats were found to be most abundant. This is in accordance with our report for C. arietinum transcriptome and studies in various other plants (La Rota et al., 2005; Hisano et al., 2007; Garg et al., 2011a). It has been suggested that the expansion of repeats other than tri-nucleotides is suppressed in coding DNA, because they may lead to frame-shift mutations (Metzgar et al., 2000). The development of SSR markers is labour-intensive, costly and time demanding. Hence, in silico approaches are gaining popularity to screen the polymorphic SSRs and found to be a relatively inexpensive alternative (Grover et al., 2007; Zhang et al., 2007; Tang et al., 2008). The comparative analysis identified at least 561 polymorphic SSRs with repeat unit difference of one to more than 10 among C. arietinum and C. reticulatum in this study. Being functional markers, the variation in number of repeats of SSRs in the transcripts (5’-UTR and coding region) may modify expression and function of the gene(s) and thus provide a source for quantitative and qualitative phenotypic variation (Kashi and King, 2006). 
Further, the study of SSRs present particularly in tissue-specific transcripts and regulatory

ª 2012 The Authors Plant Biotechnology Journal ª 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd, Plant Biotechnology Journal, 10, 690–702

Transcriptome sequencing of wild chickpea 697 (a) 1400 1400

1200

1200 1000 800 600

800

400

600

0.10

0.09

0.08

0.07

0.06

0.05

0.04

0.03

0.01

0

400

0.02

200 0

Number of pairs

1000

2.0

1.9

1.8

1.7

1.6

1.5

1.4

1.3

1.2

1.1

1.0

0.9

0.8

0.7

0.5

0.6

0.4

0.3

0.2

0

0

0.1

200

Ks (b)

40

10 8 6 4 2

genes may be important for marker-assisted selection needed to accelerate the genetic enhancement of chickpea. The analysis of SSRs indicates a greater repeat content in the C. arietinum genome, which may be partially responsible for the larger genome size as compared to C. reticulatum genome (Tayyar et al., 1994; Galasso et al., 1996). The utility of identified polymorphic SSRs was evaluated by their potential to amplify the target sequences and detect polymorphism. The primers designed from transcript sequences flanking the SSRs were highly efficient with amplification success rate of 98%, suggesting the efficacy of transcript sequences for developing large-scale genic microsatellite markers in chickpea. The amplification efficiency observed in this study was significantly higher than that reported earlier (88%) for microsatellite markers derived from transcriptome sequence in chickpea (Hiremath et al., 2011). Interestingly, 94.8% of the selected polymorphic SSRs between two chickpea species were validated experimentally. These

Figure 6 (a) Distribution of synonymous substitution rate (Ks) values among the orthologous transcript pairs between the two chickpea species (x-axis, Ks; y-axis, number of pairs). The secondary peak (marked by arrow) in the Ks distribution of orthologs indicates the speciation event. Inset represents an enlarged version of the graph showing the secondary peak. (b) Frequency distribution of Ka/Ks ratio in orthologous pairs (x-axis, Ka/Ks; y-axis, number of pairs (%)). The average value of Ka/Ks ratio is shown by arrowhead.

results suggested that the identification of in silico polymorphic SSRs by comparing their repeat length variations in C. arietinum and C. reticulatum is very useful and cost-effective for selecting and developing the informative genic microsatellite markers for their use in large-scale validation and genotyping applications in chickpea. The paucity of usable and robust sequence-based molecular markers has been a major limitation in genetic analysis in this important pulse crop. Therefore, a large number of polymorphic genic microsatellite markers developed in the present study would be a very useful resource of functional markers for rapidly establishing marker-trait linkages and identifying genes for many traits of agricultural importance in chickpea. SNP discovery is an important area of molecular genetic research, which involves the identification of sequence polymorphisms that can be exploited for high-resolution genotyping. EST sequence-based SNP discovery has been used to generate high-density genetic maps for marker-assisted selection studies,


identification of variable genomic regions, genome-wide association studies to assign genes to specific traits or develop allele-specific assays for the examination of cis-regulatory variations in agricultural crops (Guo et al., 2003; Stupar and Springer, 2006; Choi et al., 2007; Novaes et al., 2008; Duran et al., 2009). Using stringent criteria, we identified more than 36 000 SNPs distributed in 10 880 transcripts between the transcriptomes of the two chickpea species. These SNPs were monomorphic within but polymorphic between the two sequenced species. The frequency of transitions and transversions was comparable to that observed in other plants (Picoult-Newberg et al., 1999; McNally et al., 2009; Nelson et al., 2011). About 91.7% of the selected SNPs present in the transcription factor encoding transcripts could be validated successfully using the CAPS genotyping assay. The remaining 8.3% of the candidate SNPs turned out to be false, possibly because of sequencing errors. False SNPs have also been observed earlier in several SNP discovery studies based on EST sequences in crop plants (Thiel et al., 2003; Varshney et al., 2008). The utility of the cost-efficient CAPS assay for large-scale validation and genotyping of SNPs, particularly those identified from genic sequences, has been demonstrated earlier in many crop species including chickpea (Nayak et al., 2010; Gujaria et al., 2011). The SNPs identified in the transcript sequences between C. arietinum and C. reticulatum would be a useful and informative functional genetic marker resource for rapid selection of SNPs for their large-scale validation using high-throughput and multiplex SNP genotyping assays (Ragoussis, 2009). Once validated and genotyped on a large scale, these SNPs could be utilized for various marker-based applications in chickpea genetics, genomics and breeding.
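The comparison of transition and transversion frequencies mentioned above rests on a simple classification of each SNP. The following minimal Python sketch shows that classification; the function names and the example SNP list are invented for illustration and are not data from this study.

```python
from collections import Counter

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def snp_class(ref, alt):
    """Classify a biallelic SNP as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv(snps):
    """Return (transitions, transversions, ratio) for (ref, alt) pairs."""
    counts = Counter(snp_class(ref, alt) for ref, alt in snps)
    ts, tv = counts["transition"], counts["transversion"]
    return ts, tv, (ts / tv if tv else float("inf"))


# Hypothetical example calls, not SNPs from the chickpea data set.
example = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("T", "C")]
print(ts_tv(example))
```

Applied to a full SNP list such as the one in Table S4, the same counting would give the transition and transversion frequencies that are compared with other plants in the text.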
Further, SNPs detected in the transcripts expressed in a tissue-specific manner and in regulatory transcription factors will be of particular interest as these may be associated with phenotypes and can be used as candidate markers/genes. The calculation of Ks rate for orthologous transcript pairs was used to estimate the divergence age between the two chickpea species. A distinct sharp secondary peak in the Ks distribution suggested that C. arietinum and C. reticulatum diverged approximately 0.53 million years ago. Our estimate of the divergence time between the wild and cultivated chickpea species is greater than that estimated for wild and cultivated soybean species (Kim et al., 2010). The estimated divergence time between the two soybean species was older than the established domestication time for soybean (Kim et al., 2010). Although our analysis provides an idea about the approximate age of speciation between the two chickpea species vis-à-vis domestication, the exact picture will be clearer once their genome sequences become available. The estimation of the ratio of Ka and Ks substitution rates provides a measure of the selection pressure to which a gene pair is subjected and has been used in several studies to identify the genes under positive/diversifying selection or under negative/purifying selection. Our results are consistent with those observed for many eukaryotic species showing that most orthologous genes are under purifying selection (Schlueter et al., 2004).
In conclusion, we sequenced and characterized the transcriptome of wild chickpea using an NGS platform. The transcriptome sequencing is important for gene discovery and future annotation of genome sequences of different chickpea genotypes. These data may be utilized for either basic or applied research programmes in chickpea. A comparative analysis of C. arietinum and C. reticulatum transcriptomes resulted in the identification of a large number of informative SSRs and SNPs with a high polymorphism success rate, which provide a set of functional markers and constitute a resource for mapping and marker-assisted breeding in chickpea and other legumes. Our results suggest that C. arietinum and C. reticulatum diverged approximately 0.53 million years ago. This approach provides a cost-effective and efficient method for the optimization of transcriptome assembly and discovery of genetic variations in species with as yet unsequenced genomes. This study will aid future structural and functional genomics in chickpea and other legumes.
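The interpretation of Ka/Ks ratios used in the discussion follows the usual convention: a ratio below 1 points to purifying selection, a ratio above 1 to positive (diversifying) selection and a ratio of 1 to neutral evolution. A minimal Python sketch of that convention is given below; the function names and the example values are hypothetical and are not taken from the orthologous pairs analysed here.

```python
def selection_regime(ka, ks):
    """Label an orthologous pair by the conventional reading of Ka/Ks."""
    if ks <= 0:
        raise ValueError("Ks must be positive for Ka/Ks to be defined")
    ratio = ka / ks
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


def regime_fractions(pairs):
    """Fraction of pairs in each regime; pairs with Ks <= 0 are skipped."""
    labels = [selection_regime(ka, ks) for ka, ks in pairs if ks > 0]
    if not labels:
        raise ValueError("no pair with a positive Ks value")
    return {name: labels.count(name) / len(labels)
            for name in ("purifying", "neutral", "positive")}


# Hypothetical (Ka, Ks) values for four transcript pairs.
example = [(0.01, 0.10), (0.02, 0.05), (0.30, 0.20), (0.0, 0.0)]
print(regime_fractions(example))
```

In practice the Ka and Ks estimates themselves carry uncertainty, so a pair with a ratio only slightly above or below 1 would not be assigned to a regime without a statistical test.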

Experimental procedures

Plant material
Chickpea (C. reticulatum L. genotype PI489777) seeds procured from ICRISAT, Hyderabad, were grown as described (Garg et al., 2010). Root and shoot tissue samples were collected from 15-day-old seedlings. The mature leaves, flower buds and young pods were harvested from plants grown in the field. At least three biological replicates of each tissue sample were harvested and snap frozen in liquid nitrogen.

RNA isolation and 454 sequencing
Total RNA from various tissues was extracted using TRI Reagent (Sigma Life Science, St Louis, MO) and subjected to quality and quantity assessment using NanoVue (GE Healthcare, Hong Kong) and Agilent 2100 Bioanalyzer (Agilent Technologies, Singapore) as described previously (Garg et al., 2010). Equal quantities of total RNA from the three biological replicates of root and shoot samples were pooled before mRNA purification. For the mixed tissue sample, total RNA isolated from biological replicates of root, shoot, mature leaf and flower buds was pooled in equal quantities. The mRNA purification, removal of rRNA contamination and double-stranded cDNA synthesis were performed as described previously (Garg et al., 2011a). cDNA library preparation and sequencing were performed using GS-FLX Titanium series reagents essentially following the manufacturer's instructions (Roche Diagnostics GmbH, Mannheim, Germany) as described previously (Garg et al., 2011a). Two cDNA libraries, one each from the mRNA isolated from shoot and root tissue samples, were sequenced in one flow cell, and the third cDNA library from the mixed tissue sample was sequenced in another flow cell.

Sequence pre-processing and assembly
The Standard Flowgram Format (SFF) files of the Q20 sequence data generated in this study have been deposited in the Short Read Archive database at NCBI under the accession number SRA037766 (experiment accession numbers SRX072829–SRX072831). Various quality controls (filtering of high-quality reads, trimming of reads containing primer/adaptor sequences, trimming of reads containing homopolymers of more than seven bases and removal of reads with length of […] 2, to minimize artefacts. The divergence time was calculated assuming a synonymous substitution rate of 1.5 × 10⁻⁸ substitutions/synonymous site/year for dicots (Koch et al., 2000).
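The conversion from a Ks value to a divergence time can be illustrated with a short Python sketch. It assumes the commonly used molecular-clock relation T = Ks / (2r), which is not spelled out in the surviving text, with r set to the rate cited above. The Ks value at the secondary peak is not given in this section, so the sketch back-calculates the Ks that would correspond to the reported 0.53 million years under that assumption; the function names are illustrative.

```python
# Synonymous substitution rate cited in the text (Koch et al., 2000):
# substitutions per synonymous site per year.
RATE = 1.5e-8


def divergence_time_years(ks, rate=RATE):
    """Divergence time under the assumed relation T = Ks / (2 * r)."""
    if ks < 0 or rate <= 0:
        raise ValueError("Ks must be non-negative and the rate positive")
    # The factor 2 accounts for substitutions accumulating on both lineages.
    return ks / (2.0 * rate)


def ks_for_time(years, rate=RATE):
    """Inverse relation: the Ks expected after a given divergence time."""
    return 2.0 * rate * years


# Back-calculated, not a value reported in the paper.
peak_ks = ks_for_time(0.53e6)
print(round(peak_ks, 4))                          # Ks implied by 0.53 million years
print(divergence_time_years(peak_ks) / 1e6)       # time in million years
```

Under these assumptions, a divergence time of 0.53 million years corresponds to a Ks of about 0.016, which falls within the 0–0.10 range covered by the inset of Figure 6a.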

Acknowledgements
We acknowledge the financial support from the Department of Biotechnology, Government of India, New Delhi, under the Next Generation Challenge Programme on Chickpea Genomics. We are thankful to Professor Scott Jackson, University of Georgia, for critical reading of the manuscript and valuable suggestions. We declare that no competing interests exist.

References
Ahmad, F., Gaur, P.M. and Slinkard, A.E. (1992) Isozyme polymorphism and phylogenetic interpretations in the genus Cicer L. Theor. Appl. Genet. 83, 620–627.
Aluko, G., Martinez, C., Tohme, J., Castano, C., Bergman, C. and Oard, J.H. (2004) QTL mapping of grain quality traits from the interspecific cross Oryza sativa x O. glaberrima. Theor. Appl. Genet. 109, 630–639.
Barbazuk, W.B., Emrich, S.J., Chen, H.D., Li, L. and Schnable, P.S. (2007) SNP discovery via 454 transcriptome sequencing. Plant J. 51, 910–918.
Blanc, G. and Wolfe, K.H. (2004) Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell, 16, 1667–1678.
Buell, C.R. (2009) Poaceae genomes: going from unattainable to becoming a model clade for comparative plant genomics. Plant Physiol. 149, 111–116.
Bundock, P.C., Eliott, F.G., Ablett, G., Benson, A.D., Casu, R.E., Aitken, K.S. and Henry, R.J. (2009) Targeted single nucleotide polymorphism (SNP) discovery in a highly polyploid plant species using 454 sequencing. Plant Biotechnol. J. 7, 347–354.
Choi, I.Y., Hyten, D.L., Matukumalli, L.K., Song, Q., Chaky, J.M., Quigley, C.V., Chase, K., Lark, K.G., Reiter, R.S., Yoon, M.S., Hwang, E.Y., Yi, S.I., Young, N.D., Shoemaker, R.C., van Tassell, C.P., Specht, J.E. and Cregan, P.B. (2007) A soybean transcript map: gene distribution, haplotype and single-nucleotide polymorphism analysis. Genetics, 176, 685–696.
Choudhary, S., Sethy, N.K., Shokeen, B. and Bhatia, S. (2009) Development of chickpea EST-SSR markers and analysis of allelic variation across related species. Theor. Appl. Genet. 118, 591–608.
Collard, B.C.Y., Ades, P.K., Pang, E.C.K., Brouwer, J.B. and Taylor, P.W.J. (2001) Prospecting for sources of resistance to ascochyta blight in wild Cicer species. Aust. Plant Pathol. 30, 271–276.
Duran, C., Appleby, N., Clark, T., Wood, D., Imelfort, M., Batley, J. and Edwards, D. (2009) AutoSNPdb: an annotated single nucleotide polymorphism database for crop plants. Nucleic Acids Res. 37, D951–D953.
Galasso, I., Pignone, D., Frediani, M., Maggiani, M. and Cremonini, R. (1996) Chromatin characterization by banding techniques, in situ hybridization, and nuclear DNA content in Cicer L. (Leguminosae). Genome, 39, 258–265.
Garg, R., Sahoo, A., Tyagi, A.K. and Jain, M. (2010) Validation of internal control genes for quantitative gene expression studies in chickpea (Cicer arietinum L.). Biochem. Biophys. Res. Commun. 396, 283–288.
Garg, R., Patel, R.K., Jhanwar, S., Priya, P., Bhattacharjee, A., Yadav, G., Bhatia, S., Chattopadhyay, D., Tyagi, A.K. and Jain, M. (2011a) Gene discovery and tissue-specific transcriptome analysis in chickpea with massively parallel pyrosequencing and web resource development. Plant Physiol. 156, 1661–1678.
Garg, R., Patel, R.K., Tyagi, A.K. and Jain, M. (2011b) De novo assembly of chickpea transcriptome using short reads for gene discovery and marker identification. DNA Res. 18, 53–63.

Grover, A., Aishwarya, V. and Sharma, P.C. (2007) Biased distribution of microsatellite motifs in the rice genome. Mol. Genet. Genomics 277, 469–480.
Gujaria, N., Kumar, A., Dauthal, P., Dubey, A., Hiremath, P., Bhanu Prakash, A., Farmer, A., Bhide, M., Shah, T., Gaur, P.M., Upadhyaya, H.D., Bhatia, S., Cook, D.R., May, G.D. and Varshney, R.K. (2011) Development and use of genic molecular markers (GMMs) for construction of a transcript map of chickpea (Cicer arietinum L.). Theor. Appl. Genet. 122, 1577–1589.
Guo, M., Rupe, M.A., Danilevskaya, O.N., Yang, X. and Hu, Z. (2003) Genome-wide mRNA profiling reveals heterochronic allelic variation and a new imprinted gene in hybrid maize endosperm. Plant J. 36, 30–44.
Haware, M.P., Narayana, R.J. and Pundir, R.P.S. (1992) Evaluation of wild Cicer species for resistance to four chickpea diseases. Int. Chickpea Newslett. 27, 16–18.
Hillier, L.W., Marth, G.T., Quinlan, A.R., Dooling, D., Fewell, G., Barnett, D., Fox, P., Glasscock, J.I., Hickenbotham, M., Huang, W., Magrini, V.J., Richt, R.J., Sander, S.N., Stewart, D.A., Stromberg, M., Tsung, E.F., Wylie, T., Schedl, T., Wilson, R.K. and Mardis, E.R. (2008) Whole-genome sequencing and variant discovery in C. elegans. Nat. Methods, 5, 183–188.
Hiremath, P.J., Farmer, A., Cannon, S.B., Woodward, J., Kudapa, H., Tuteja, R., Kumar, A., Bhanuprakash, A., Mulaosmanovic, B., Gujaria, N., Krishnamurthy, L., Gaur, P.M., Kavikishor, P.B., Shah, T., Srinivasan, R., Lohse, M., Xiao, Y., Town, C.D., Cook, D.R., May, G.D. and Varshney, R.K. (2011) Large-scale transcriptome analysis in chickpea (Cicer arietinum L.), an orphan legume crop of the semi-arid tropics of Asia and Africa. Plant Biotechnol. J. 9, 922–931.
Hisano, H., Sato, S., Isobe, S., Sasamoto, S., Wada, T., Matsuno, A., Fujishiro, T., Yamada, M., Nakayama, S., Nakamura, Y., Watanabe, S., Harada, K. and Tabata, S. (2007) Characterization of the soybean genome using EST-derived microsatellite markers. DNA Res. 14, 271–281.
Hüttel, B., Winter, P., Weising, K., Choumane, W., Weigand, F. and Kahl, G. (1999) Sequence-tagged microsatellite site markers for chickpea (Cicer arietinum L.). Genome, 42, 210–217.
Iruela, M., Rubio, J., Cubero, J.I., Gil, J. and Millan, T. (2002) Phylogenetic analysis in the genus Cicer and cultivated chickpea using RAPD and ISSR markers. Theor. Appl. Genet. 104, 643–651.
Kashi, Y. and King, D.G. (2006) Simple sequence repeats as advantageous mutators in evolution. Trends Genet. 22, 253–259.
Kim, M.Y., Lee, S., Van, K., Kim, T.H., Jeong, S.C., Choi, I.Y., Kim, D.S., Lee, Y.S., Park, D., Ma, J., Kim, W.Y., Kim, B.C., Park, S., Lee, K.A., Kim, D.H., Kim, K.H., Shin, J.H., Jang, Y.E., Kim, K.D., Liu, W.X., Chaisan, T., Kang, Y.J., Lee, Y.H., Moon, J.K., Schmutz, J., Jackson, S.A., Bhak, J. and Lee, S.H. (2010) Whole-genome sequencing and intensive analysis of the undomesticated soybean (Glycine soja Sieb. and Zucc.) genome. Proc. Natl. Acad. Sci. USA 107, 22032–22037.
Koch, M.A., Haubold, B. and Mitchell-Olds, T. (2000) Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis, and related genera (Brassicaceae). Mol. Biol. Evol. 17, 1483–1498.
Koski, L.B., Gray, M.W., Lang, B.F. and Burger, G. (2005) AutoFACT: an automatic functional annotation and classification tool. BMC Bioinformatics, 6, 151.
La Rota, M., Kantety, R.V., Yu, J.K. and Sorrells, M.E. (2005) Nonrandom distribution and frequencies of genomic and EST-derived microsatellite markers in rice, wheat, and barley. BMC Genomics, 6, 23.
Labdi, M., Robertson, L.D., Singh, K.B. and Charrier, A. (1996) Genetic diversity and phylogenetic relationships among the annual Cicer species as revealed by isozyme polymorphisms. Euphytica, 88, 181–188.
Lichtenzveig, J., Scheuring, C., Dodge, J., Abbo, S. and Zhang, H.B. (2005) Construction of BAC and BIBAC libraries and their applications for generation of SSR markers for genome analysis of chickpea, Cicer arietinum L. Theor. Appl. Genet. 110, 492–510.
Lowe, K.M. and Walker, M.A. (2006) Genetic linkage map of the interspecific grape rootstock cross Ramsey (Vitis champinii) x Riparia Gloire (Vitis riparia). Theor. Appl. Genet. 112, 1582–1592.
Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka, J., Braverman, M.S., Chen, Y.J., Chen, Z., Dewell, S.B., Du, L., Fierro, J.M., Gomes, X.V., Godwin, B.C., He, W., Helgesen, S., Ho, C.H., Irzyk, G.P., Jando, S.C., Alenquer, M.L., Jarvie, T.P., Jirage, K.B., Kim, J.B., Knight, J.R., Lanza, J.R., Leamon, J.H., Lefkowitz, S.M., Lei, M., Li, J., Lohman, K.L., Lu, H., Makhijani, V.B., McDade, K.E., McKenna, M.P., Myers, E.W., Nickerson, E., Nobile, J.R., Plant, R., Puc, B.P., Ronan, M.T., Roth, G.T., Sarkis, G.J., Simons, J.F., Simpson, J.W., Srinivasan, M., Tartaro, K.R., Tomasz, A., Vogt, K.A., Volkmer, G.A., Wang, S.H., Wang, Y., Weiner, M.P., Yu, P., Begley, R.F. and Rothberg, J.M. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature, 437, 376–380.
Marth, G.T., Korf, I., Yandell, M.D., Yeh, R.T., Gu, Z., Zakeri, H., Stitziel, N.O., Hillier, L., Kwok, P.Y. and Gish, W.R. (1999) A general approach to single-nucleotide polymorphism discovery. Nat. Genet. 23, 452–456.
McNally, K.L., Childs, K.L., Bohnert, R., Davidson, R.M., Zhao, K., Ulat, V.J., Zeller, G., Clark, R.M., Hoen, D.R., Bureau, T.E., Stokowski, R., Ballinger, D.G., Frazer, K.A., Cox, D.R., Padhukasahasram, B., Bustamante, C.D., Weigel, D., Mackill, D.J., Bruskiewich, R.M., Rätsch, G., Buell, C.R., Leung, H. and Leach, J.E. (2009) Genome wide SNP variation reveals relationships among landraces and modern varieties of rice. Proc. Natl. Acad. Sci. USA 106, 12273–12278.
Metzgar, D., Bytof, J. and Wills, C. (2000) Selection against frameshift mutations limits microsatellite expansion in coding DNA. Genome Res. 10, 72–80.
Milano, I., Babbucci, M., Panitz, F., Ogden, R., Nielsen, R.O., Taylor, M.I., Helyar, S.J., Carvalho, G.R., Espiñeira, M., Atanassova, M., Tinti, F., Maes, G.E., Patarnello, T., FishPopTrace Consortium and Bargelloni, L. (2011) Novel tools for conservation genomics: comparing two high-throughput approaches for SNP discovery in the transcriptome of the European hake. PLoS ONE, 6, e28008.
Nayak, S.N., Zhu, H., Varghese, N., Datta, S., Choi, H.-K., Horres, R., Jüngling, R., Singh, J., Kavi Kishor, P.B., Sivaramakrishnan, S., Hoisington, D.A., Kahl, G., Winter, P., Cook, D.R. and Varshney, R.K. (2010) Integration of novel SSR and gene-based SNP marker loci in the chickpea genetic map and establishment of new anchor points with Medicago truncatula genome. Theor. Appl. Genet. 120, 1415–1441.
Nelson, J.C., Wang, S., Wu, Y., Li, X., Antony, G., White, F.F. and Yu, J. (2011) Single-nucleotide polymorphism discovery by high-throughput sequencing in sorghum. BMC Genomics, 12, 352.
Nguyen, T.T., Taylor, P.W.J., Redden, R.J. and Ford, R. (2004) Genetic diversity estimates in Cicer using AFLP analysis. Plant Breeding, 123, 173–179.
Novaes, E., Drost, D.R., Farmerie, W.G., Pappas Jr, G.J., Grattapaglia, D., Sederoff, R.R. and Kirst, M. (2008) High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome. BMC Genomics, 9, 312.
Parida, S.K., Anand, R.K.K., Dalal, V., Singh, N.K. and Mohapatra, T. (2006) Unigene derived microsatellite markers for the cereal genomes. Theor. Appl. Genet. 112, 808–817.
Patel, R.K. and Jain, M. (2012) NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS ONE, 7, e30619.
Picoult-Newberg, L., Ideker, T.E., Pohl, M.G., Taylor, S.L., Donaldson, M.A., Nickerson, D.A. and Boyce-Jacino, M. (1999) Mining SNPs from EST databases. Genome Res. 9, 167–174.
Radhika, P., Gowda, S.J., Kadoo, N.Y., Mhase, L.B., Jamadagni, B.M., Sainani, M.N., Chandra, S. and Gupta, V.S. (2007) Development of an integrated intraspecific map of chickpea (Cicer arietinum L.) using two recombinant inbred line populations. Theor. Appl. Genet. 115, 209–216.
Rafalski, A. (2002) Applications of single nucleotide polymorphisms in crop genetics. Curr. Opin. Plant Biol. 5, 94–100.
Ragoussis, J. (2009) Genotyping technologies for genetic research. Annu. Rev. Genomics Hum. Genet. 10, 117–133.
Schlueter, J.A., Dixon, P., Granger, C., Grant, D., Clark, L., Doyle, J.J. and Shoemaker, R.C. (2004) Mining EST databases to resolve evolutionary events in major crop species. Genome, 47, 868–876.
Schmutz, J., Cannon, S.B., Schlueter, J., Ma, J., Mitros, T., Nelson, W., Hyten, D.L., Song, Q., Thelen, J.J., Cheng, J., Xu, D., Hellsten, U., May, G.D., Yu, Y., Sakurai, T., Umezawa, T., Bhattacharyya, M.K., Sandhu, D., Valliyodan, B., Lindquist, E., Peto, M., Grant, D., Shu, S., Goodstein, D., Barry, K., Futrell-Griggs, M., Abernathy, B., Du, J., Tian, Z., Zhu, L., Gill, N., Joshi, T., Libault, M., Sethuraman, A., Zhang, X.C., Shinozaki, K., Nguyen, H.T., Wing, R.A., Cregan, P., Specht, J., Grimwood, J., Rokhsar, D., Stacey, G., Shoemaker, R.C. and Jackson, S.A. (2010) Genome sequence of the palaeopolyploid soybean. Nature, 463, 178–183.
Sethy, N.K., Choudhary, S., Shokeen, B. and Bhatia, S. (2006a) Identification of microsatellite markers from Cicer reticulatum: molecular variation and phylogenetic analysis. Theor. Appl. Genet. 112, 347–357.
Sethy, N.K., Shokeen, B., Edwards, K.J. and Bhatia, S. (2006b) Development of microsatellite markers and analysis of intraspecific genetic variability in chickpea (Cicer arietinum L.). Theor. Appl. Genet. 112, 1416–1428.
Sharma, H.C., Pampapathy, G., Lanka, S.K. and Ridsdill-Smith, T.J. (2005) Exploitation of wild Cicer reticulatum germplasm for resistance to Helicoverpa armigera. J. Econ. Entomol. 98, 2246–2253.
Singh, K.B., Malhotra, R.S., Halila, M.H., Knights, E.J. and Verma, M.M. (1994) Current status and future strategy in breeding chickpea for resistance to biotic and abiotic stresses. Euphytica, 73, 137–149.
Singh, K.B., Ocampo, B. and Robertson, L.D. (1998) Diversity for abiotic and biotic stress resistance in the wild annual Cicer species. Genet. Resour. Crop Evol. 45, 9–17.
Singh, R., Sharma, P., Varshney, R.K., Sharma, S.K. and Singh, N.K. (2008) Chickpea improvement: role of wild species and genetic markers. Biotechnol. Genet. Eng. Rev. 25, 267–314.
Smith, D.R., Quinlan, A.R., Peckham, H.E., Makowsky, K., Tao, W., Woolf, B., Shen, L., Donahue, W.F., Tusneem, N., Stromberg, M.P., Stewart, D.A., Zhang, L., Ranade, S.S., Warner, J.B., Lee, C.C., Coleman, B.E., Zhang, Z., McLaughlin, S.F., Malek, J.A., Sorenson, J.M., Blanchard, A.P., Chapman, J., Hillman, D., Chen, F., Rokhsar, D.S., McKernan, K.J., Jeffries, T.W., Marth, G.T. and Richardson, P.M. (2008) Rapid whole-genome mutational profiling using next-generation sequencing technologies. Genome Res. 18, 1638–1642.
Srivastava, A., Rogers, W.L., Breton, C.M., Cai, L. and Malmberg, R.L. (2011) Transcriptome analysis of Sarracenia, an insectivorous plant. DNA Res. 18, 253–261.
Stupar, R.M. and Springer, N.M. (2006) Cis-transcriptional variation in maize inbred lines B73 and Mo17 leads to additive expression patterns in the F1 hybrid. Genetics, 173, 2199–2210.
Suyama, M., Torrents, D. and Bork, P. (2006) PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34, W609–W612.
Tang, J., Baldwin, S.J., Jacobs, J.M., Linden, C.G., Voorrips, R.E., Leunissen, J.A., van Eck, H. and Vosman, B. (2008) Large-scale identification of polymorphic microsatellites using an in silico approach. BMC Bioinformatics, 9, 374.
Tayyar, R.I., Lukaszewski, A.J. and Waines, J.G. (1994) Chromosome banding patterns in the annual species of Cicer. Genome, 37, 656–663.
Thiel, T., Michalek, W., Varshney, R.K. and Graner, A. (2003) Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor. Appl. Genet. 106, 411–422.
Udupa, S.M., Sharma, A., Sharma, A.P. and Pai, R.A. (1993) Narrow genetic variability in Cicer arietinum L. as revealed by RFLP analysis. J. Plant Biochem. Biotechnol. 2, 83–86.
Varshney, R.K., Graner, A. and Sorrells, M.E. (2005) Genic microsatellite markers in plants: features and applications. Trends Biotechnol. 23, 48–55.
Varshney, R.K., Thiel, T., Sretenovic-Rajicic, T., Baum, M., Valkoun, J., Guo, P., Grando, S., Ceccarelli, S. and Graner, A. (2008) Identification and validation of a core set of informative genic SSR and SNP markers for assaying functional diversity in barley. Mol. Breed. 22, 1–13.
Winter, P., Pfaff, T., Udupa, S.M., Hüttel, B., Sharma, P.C., Sahi, S., Arreguin-Espinoza, R., Weigand, F., Muehlbauer, F.J. and Kahl, G. (1999) Characterization and mapping of sequence-tagged microsatellite sites in the chickpea (Cicer arietinum L.) genome. Mol. Gen. Genet. 262, 90–101.
Wu, X., Ren, C., Joshi, T., Vuong, T., Xu, D. and Nguyen, H.T. (2010) SNP discovery by high-throughput sequencing in soybean. BMC Genomics, 11, 469.
Yamamoto, T., Kimura, T., Shoda, M., Imai, T., Saito, T., Sawamura, Y., Kotobuki, K., Hayashi, T. and Matsuta, N. (2002) Genetic linkage maps constructed by using an interspecific cross between Japanese and European pears. Theor. Appl. Genet. 106, 9–18.


Yang, Z. (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591.
Zhang, Z., Deng, Y., Tan, J., Hu, S., Yu, J. and Xue, Q. (2007) A genome-wide microsatellite polymorphism database for the indica and japonica rice. DNA Res. 14, 37–45.

Supporting information
Additional Supporting information may be found in the online version of this article:
Figure S1 Length (A) and average quality score (B) distribution of total number of high-quality reads generated for Cicer reticulatum.
Figure S2 Length distribution (A) and read depth distribution (B) of C. reticulatum transcripts generated from optimized assembly.
Figure S3 GC content of C. arietinum and C. reticulatum transcripts. The average GC content of each transcript was calculated and the percentage of transcripts with GC content within a range is represented.
Figure S4 Functional annotation of C. arietinum and C. reticulatum transcripts. GOSlim term assignment to the transcripts in different categories of biological process, molecular function and cellular component.
Figure S5 Distribution of polymorphic SSRs at various transcript positions.
Figure S6 Number of SNPs as a function of probability (A) and average Phred quality score (B) cut-off.
Figure S7 Number of SNPs and SNP containing transcripts as a function of read depth at base consensus ratio of 0.9 (A) and 1.0 (B).
Figure S8 Distribution of SNPs at various transcript positions.
Table S1 Statistics of SSRs identified in C. reticulatum transcripts.
Table S2 Frequency of SSRs identified in C. reticulatum transcripts.
Table S3 List of polymorphic SSRs identified between C. arietinum and C. reticulatum.
Table S4 List of SNPs identified between C. arietinum and C. reticulatum.
Table S5 Validation of 96 selected polymorphic SSRs between C. arietinum and C. reticulatum.
Table S6 Validation of 24 selected SNPs between C. arietinum and C. reticulatum using CAPS genotyping assay.
Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

