
Single-nucleotide polymorphism frequency in a set of selected lines of bread wheat (Triticum aestivum L.)

Catherine Ravel, Sébastien Praud, Alain Murigneux, Aurélie Canaguier, Frédéric Sapet, Delphine Samson, François Balfourier, Philippe Dufour, Boulos Chalhoub, Dominique Brunel, Michel Beckert, and Gilles Charmet

Abstract: Information on single-nucleotide polymorphisms (SNPs) in hexaploid bread wheat is still scarce. The goal of this study was to detect SNPs in wheat and examine their frequency. Twenty-six bread wheat lines from different origins worldwide were used. Specific PCR products were obtained from 21 genes and directly sequenced. SNPs were discovered from the alignment of these sequences. The overall sequence polymorphism observed in this sample appears to be low; 64 single-base polymorphisms were detected in ~21.5 kb (i.e., 1 SNP every 335 bp). The level of polymorphism is highly variable among the different genes studied. Fifty percent of the genes studied contained no sequence polymorphism, whereas most SNPs detected were located in only 2 genes. As expected, taking into account a synthetic line created with a wild Triticum tauschii parent increases the level of polymorphism (101 SNPs; 1 SNP every 212 bp). The detected SNPs are available at http://urgi.versailles.inra.fr/GnpSNP. Data on linkage disequilibrium (LD) are still preliminary. They showed a significant level of LD in the 2 most polymorphic genes. To conclude, the genome size of hexaploid wheat and its low level of polymorphism complicate SNP discovery in this species.

Key words: linkage disequilibrium (LD), haplotype, sequence polymorphisms, Triticum aestivum L.

Résumé: The frequency of single-nucleotide polymorphism (SNP) markers is still poorly known in hexaploid bread wheat. The objective of the work presented here was therefore to study the abundance of these markers in this species. To this end, 21 genes (or gene fragments) were specifically amplified by PCR and then sequenced in a collection of 26 bread wheat lines. In this sample, we detected 64 single-base polymorphism events in ~21.5 kb. The overall level of polymorphism is relatively low (1 SNP per 334 bp) and highly variable from one gene to another. Half of the genes studied are monomorphic, whereas most of the SNPs come from only 2 genes. The level of polymorphism increases when a synthetic wheat line is taken into account (101 SNPs; 1 SNP per 212 bp). All the SNPs detected are available at http://urgi.versailles.inra.fr/GnpSNP. We calculated linkage disequilibrium (LD) using the data from the 2 most polymorphic genes, which are located on chromosome 1BL and are 1.3 cM apart. Intragenic LD is significant. In conclusion, the size of the hexaploid wheat genome and its low level of polymorphism complicate the search for SNPs in this species.

Mots clés: linkage disequilibrium (LD), haplotype, single-nucleotide polymorphism (SNP), Triticum aestivum L.

Received 8 March 2006. Accepted 16 May 2006. Published on the NRC Research Press Web site at http://genome.nrc.ca on 9 November 2006. Corresponding Editor: D.J. Somers.

C. Ravel,1 F. Balfourier, M. Beckert, and G. Charmet. INRA, UMR1095, Amélioration et Santé des Plantes, 234 avenue du Bézet, Clermont-Ferrand, F-63100 France, and Université Blaise-Pascal, UMR1095, Campus des Cézeaux, F-63170 Aubière, France.
S. Praud and A. Murigneux. Biogemma, Les Cézeaux, F-63170 Aubière, France.
A. Canaguier2 and D. Brunel. INRA, Centre National de Génotypage, 2 rue Gaston Crémieux, CP 5708, F-91057 Evry, France.
F. Sapet and D. Samson. INRA, Génomique – Info, 523 place des terrasses, F-91034 Evry cedex, France.
P. Dufour. ULICE, ZAC des portes de Riom, F-63200 RIOM cedex, France.
B. Chalhoub. INRA, Unité mixte de Recherche sur les Génomes Végétaux, 2 rue Gaston Crémieux, CP 5708, F-91057 Evry, France.

1Corresponding author (e-mail: [email protected]).
2Present address: INRA, Unité mixte de Recherche sur les Génomes Végétaux, 2 rue Gaston Crémieux, CP 5708, F-91057 Evry, France.

Genome 49: 1131–1139 (2006)

doi:10.1139/G06-067

© 2006 NRC Canada


Introduction

Single-nucleotide polymorphisms (SNPs) and small insertions–deletions (indels) are the most abundant forms of DNA sequence variation. Once SNPs are discovered, they can be genotyped using high-throughput automated methods. They can thus provide a huge number of useful markers for ultra-dense genetic mapping, population genetics, and evolutionary and genotype–phenotype association studies. Several authors have underlined the value of developing SNP markers in plants (Rafalski 2002a, 2002b; Buckler and Thornsberry 2002).

Up to now, extensive SNP analyses in plants have been carried out for Arabidopsis thaliana, rice (Oryza sativa), maize (Zea mays subsp. mays L.), soybean (Glycine max L.), and barley (Hordeum vulgare). The Arabidopsis and rice genomes are the most completely sequenced plant genomes. For Arabidopsis, a large collection of SNPs was deduced from the comparison of the whole genomic sequence of 2 land races (http://www.arabidopsis.org/Cereon/). Likewise, The Arabidopsis Information Resource (TAIR; http://www.arabidopsis.org/jsp/tairjsp/pubDbStats.jsp, accessed 23 January 2006) also published information about 63 074 nucleotide substitutions. Similarly, rice genomes were sequenced from 2 subspecies (Oryza sativa subsp. indica and Oryza sativa subsp. japonica). The comparison of these genomes allowed the detection of SNPs (Feltus et al. 2004). SNPs have already been used in Arabidopsis to study linkage disequilibrium (LD), which decays within 50 kb (Nordborg et al. 2005). In maize, a high frequency of nucleotide changes (on average, 1 SNP every 30 bp) and a rapid decay of LD within 100–200 bp have been reported (Tenaillon et al. 2001; Remington et al. 2001; Ching et al. 2002).

In soybean, Zhu et al. (2003) provided a comprehensive study on sequence polymorphisms. These authors surveyed about 76 400 bp from 25 diverse genotypes. They found roughly 1 SNP every 273 bp, and concluded that mean nucleotide diversity in cultivated soybean is much lower than that observed in A. thaliana, the model selfing species. In addition, they estimated the LD from their data and concluded that, in contrast to reports on maize (Tenaillon et al. 2001; Remington et al. 2001), there is little decay of LD over distances of about 50 kb; LD declined only at genetic map distances greater than 2.5 cM. In barley, the level of sequence polymorphism was reported to be approximately 1 mutation per 189 bases (Kanazin et al. 2002), but was much higher (1 SNP every 50 bp) in another sample of accessions including elite lines, landraces of Hordeum vulgare, and wild accessions of Hordeum spontaneum (Russell et al. 2004).

In bread wheat, SNP studies have been limited to single genes or DNA fragments (Giroux and Morris 1997; Peng et al. 1999; Morris 2002; Yanagisawa et al. 2003; Guillaumie et al. 2004; Boisson et al. 2005) allowing association studies or genetic mapping. The size of the bread wheat genome and its allohexaploidy make the identification of SNPs difficult, since variations in allelic sequences can be confused with homoeologous variations (differences between the copies of the A, B, and D genomes) and paralogous variations (differences in duplicate copies that may exist within a given genome). Recently, systematic searches for SNPs in wheat have been initiated and many SNPs are available at http://wheat.pw.usda.gov/SNP/.

Our objective was thus to assess the frequency of sequence-based polymorphism in a sample of 26 hexaploid wheat genotypes using BAC (bacterial artificial chromosome) pooled DNAs to design locus-specific PCR primers. Twenty-one loci were surveyed for SNPs. We identified 64 single-base mutations (including 1 single-base deletion) in 10 out of these 21 loci.

Materials and methods

Plant material and DNA extraction

Twenty-six lines (Table 1) were sequenced to determine the frequency and nature of SNPs in Triticum aestivum. This sample was chosen from a large collection of genetic resources described and maintained at INRA, Clermont-Ferrand (France). In this large collection, the accessions represented diverse geographical origins, growth habits (winter or spring type), and status (landraces, old or recent cultivars). Data concerning neutral polymorphism based on a set of 42 microsatellite loci (1 per chromosome arm) were also available (Roussel et al. 2004, 2005) for most of these accessions. The 26 lines were chosen to maximize the variability in this collection using MSTRAT software (Gouesnard et al. 2001; http://www.montpellier.inra.fr/gap/MSTRAT/mstratno.htm) based on the maximization of allelic richness at the 42 SSR loci.

The synthetic hexaploid wheat ‘W7984’, which was generated by Dr. A. Mujeeb-Kazi at CIMMYT, Mexico, by crossing Triticum tauschii accession CIGM86.940 (DD) with the tetraploid durum wheat ‘Altar 84’ (AABB), was also studied, as it is a parental line used in the International Triticeae Mapping Initiative (ITMI). A BAC library from T. aestivum ‘Renan’ (Chalhoub et al. 2003) was used to develop PCR primers. This BAC library corresponds to a 7× genome coverage organised in seven 384-well plates. The nullitetrasomic, ditelosomic, and deletion lines of T. aestivum ‘Chinese Spring’ (Sears 1966) allowed the chromosome assignment of amplicons.

Each seed used to obtain plantlets for DNA extraction came from a single head that was self-pollinated to avoid possible cross-pollination. All the plantlets for a given accession are thus considered genetically identical. Leaves were harvested from a pool of five 3-week-old seedlings per accession and bulk genomic DNA was extracted from fresh leaves using the Sigma GenElute Plant Genomic DNA Kit (G2N-350, Sigma Aldrich, St. Louis, Mo.).

Table 1. Bread wheat lines sequenced to identify single-nucleotide polymorphisms.

Cultivar                Geographical origin   Growth habit   Status (date of release)
‘Opata’                 Mexico                Spring         Released cultivar (1985)
‘Chinese Spring’        China                 Spring         Land race
‘Seu Seun 27’           Korea                 Spring         Released cultivar (1936)
‘Thatcher’              USA                   Spring         Released cultivar (1934)
‘Austro-Bankut’         Austria               Winter         Released cultivar (1919)
‘Glenlea’               Canada                Spring         Released cultivar (1972)
‘Bonpain’               France                Winter         Released cultivar (1993)
‘Flint’                 USA                   Winter         Released cultivar (1830)
‘Libellula’             Italy                 Winter         Released cultivar (1965)
‘Elo’                   Finland               Winter         Released cultivar (1963)
‘Gama’                  Poland                Winter         Released cultivar (1981)
‘Frandoc’               France                Winter         Released cultivar (1980)
‘Renan’                 France                Winter         Released cultivar (1989)
‘Courtot’               France                Facultative    Released cultivar (1974)
‘Récital’               France                Winter         Released cultivar (1986)
‘Zénith’                Switzerland           Winter         Released cultivar (1969)
‘Rouge de Bordeaux’     France                Winter         Land race
‘Clément’               Netherlands           Winter         Released cultivar (1974)
‘Euréka’                France                Winter         Released cultivar (1991)
‘Maris Huntsman’        UK                    Winter         Released cultivar (1971)
‘Arche’                 France                Winter         Released cultivar (1989)
‘Capelle Desprez’       France                Winter         Released cultivar (1946)
‘Malacca’               France                Winter         Released cultivar (1997)
‘Arfort’                France                Winter         Released cultivar (1990)
‘Frontana’              Brazil                Spring         Released cultivar (1930)
‘Nobeo Kabozukomugi’    Japan                 Spring         Land race

Candidate genes

The genes studied (Table 2) were known to be involved (i) in grain hardness (pina and pinb coding for puroindolines), (ii) in storage-protein synthesis, e.g., structural genes coding for high-molecular-weight glutenin subunits (HMW-GS) and transcriptional factors able to interact with them, and (iii) in global regulation of activated defence responses or in such disease-resistance mechanisms as Rar1, Sgt1, and Npr1.

Table 2. Candidate genes studied and chromosomal location.

Pathway              Gene name or gene coding for   Chromosomal location   Length sequenced (bp)
Protein synthesis    Glu-B1-1                       1BL                    1753
                     Glu-D1-1                       1DL                    707
                     Spa(a)                         1AL                    524
                     Spa(a)                         1BL                    2858
                     Spa(a)                         1DL                    453
                     wPbf(b)                        5AL                    1764
                     wPbf(b)                        5BL                    1757
                     wPbf(b)                        5DL                    1631
Hardness             Pina(c)                        5DS                    762
                     Pinb                           5DS                    783
Pentosans            UDP-glucose 4-epimerase        5AL                    376
                     UDP-glucose 4-epimerase(d)     5DL                    752
                     UDP-glucose 4-epimerase(d)     5DL                    752
Disease resistance   Sgt1                           3AL                    570
                     Sgt1                           3BL                    859
                     Sgt1                           3DL                    600
                     Npr1                           3BL                    1451
                     Npr1                           3DL                    1400
                     Rar1                           6A                     529
                     Rar1                           6B                     551
                     Rar1                           6D                     616

Note: Bold type indicates genes with polymorphism.
(a) Storage protein activator.
(b) Prolamin-box binding factor.
(c) This gene can be present or deleted.
(d) Probably two different copies of the same gene.

SNP discovery

As wheat is a hexaploid species, the discovery of SNPs is hindered by the presence of 3 homoeologous genomes; consequently, genes are present in several copies, with some exceptions such as pina and pinb. Direct sequencing of genes from PCR products thus requires the preliminary design of locus-specific PCR primers to avoid co-amplification of the different copies. For standardization, all the primers were defined with a melting temperature between 55 and 65 °C using Oligo 61 software (Medprobe, Oslo, Norway) (Rychlik and Rhoads 1989).

For genes coding for HMW-GS and puroindolines, locus-specific primers were designed from the alignment of sequences of all the copies available in public databases. For the other genes, designing locus-specific primers required several steps, as follows:
(i) deducing their structure from wheat ESTs aligned with wheat (if available) or rice genomic sequences to design PCR primers in conserved regions of putative exons;
(ii) testing each primer pair by PCR (see PCR conditions) performed with the DNA from T. aestivum ‘Renan’ to discard pairs giving no band or a smear (this avoided wasting DNA from the BAC library);
(iii) screening the BAC library with these primer pairs to obtain all the copies of the genes under study (PCRs were done on each pooled row (24 wells) and column (16 wells) of each 384-well plate representing 1 genome equivalent; because each well in one plate contained less than a genome equivalent, a PCR product was likely the amplification of a single homoeologous sequence);
(iv) sequencing PCR products obtained from the BAC library; and
(v) designing locus-specific primers from the alignment of these products.

Locus-specific primers were then validated by PCR on nullitetrasomic, ditelosomic, and deletion lines of ‘Chinese Spring’ (Sears 1966). The primer pairs giving amplicons that could be assigned to only 1 chromosome, chromosome arm, or deletion bin were subsequently used for amplification of DNA from the 27 lines. The single-copy PCR products obtained were then sequenced. The sequences obtained were analyzed using the Staden software package (Staden et al. 2000). Mutations detected in only 1 line (singletons) were checked by resequencing an independent amplification product.

PCR amplification and sequencing

PCRs were performed in a final volume of 25 µL containing 25 ng of genomic DNA, 250 µmol/L of each dNTP, 0.4 µmol/L of each primer, 1 U Taq polymerase (Qiagen, Valencia, Calif.), and 1× Taq polymerase buffer. Cycling consisted of a touch-down profile as follows: 1 cycle at 94 °C for 4 min; 10 cycles at 94 °C for 1 min, 65 °C for 1 min (decreasing the annealing temperature by 1 °C/cycle in subsequent cycles), and 72 °C for 2 min; 30 cycles at 94 °C for 1 min, 55 °C for 1 min, and 72 °C for 2 min; a final extension at 72 °C for 5 min. Amplification products were resolved on 1× Tris–acetate–EDTA (TAE), 1.2% w/v agarose gels. After running for 30 min at 100 V, gels were stained with ethidium bromide for 20 min in a 1 mg/L solution and viewed on a UV table. All PCR products were purified and sequenced using the Big Dye Sequencing kit according to the manufacturer’s specifications (Applied Biosystems, Courtaboeuf, France). Sequence products were purified and loaded onto ABI3700 96 capillary sequencers.
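As an illustration only, the touch-down cycling profile described above can be written out as an explicit list of steps. The Python sketch below is not part of the original protocol: the function name `touchdown_profile` and its parameters are hypothetical, and it assumes that the first touch-down cycle anneals at 65 °C and each following cycle anneals 1 °C lower, so that the 10 touch-down cycles run from 65 °C down to 56 °C before the 30 cycles at 55 °C.

```python
def touchdown_profile(start_anneal=65, final_anneal=55, step=1, plateau_cycles=30):
    """Return the cycling programme as a list of (phase, temperature_C, seconds).

    Assumption: annealing starts at `start_anneal` and drops by `step` degrees
    per cycle until it reaches `final_anneal`, which is then held for
    `plateau_cycles` cycles.
    """
    steps = [("initial denaturation", 94, 4 * 60)]

    # Touch-down phase: 65, 64, ..., 56 degrees C (10 cycles with the defaults).
    anneal = start_anneal
    while anneal > final_anneal:
        steps += [("denaturation", 94, 60),
                  ("annealing", anneal, 60),
                  ("extension", 72, 120)]
        anneal -= step

    # Plateau phase at the final annealing temperature.
    for _ in range(plateau_cycles):
        steps += [("denaturation", 94, 60),
                  ("annealing", final_anneal, 60),
                  ("extension", 72, 120)]

    steps.append(("final extension", 72, 5 * 60))
    return steps


profile = touchdown_profile()
annealing_temps = [temp for phase, temp, _ in profile if phase == "annealing"]
total_minutes = sum(seconds for _, _, seconds in profile) / 60
print(annealing_temps[:10], f"{total_minutes:.0f} min of programmed hold time")
```

The total hold time printed here ignores ramping between temperatures, so the real run time on a thermocycler would be longer.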


Nucleotide diversity, haplotype definition, and LD estimation

To describe the diversity observed in the panel of 26 lines, we calculated the following sequence statistics: haplotype number and diversity (Hd), nucleotide diversity estimated by π (Tajima 1983), and the mean pairwise differences, using DNASP software version 4.00 (Rozas et al. 2003). For each gene studied and for each genotype, a binary code was applied to the combinations of allelic haplotypes. A pairwise Jaccard (1908) distance matrix among the 26 lines was calculated from these data and represented by a UPGMA tree using Splus software (MathSoft Inc., Needham, Mass.). LD was estimated using squared allele-frequency correlations (r²; see Flint-Garcia et al. 2003) for pairs of polymorphic sites. A χ² test was used to determine if the associations between polymorphisms were significant. Only sites with a frequency of at least 10% for the rarer allele were taken into consideration, because LD measures do not perform well with low allele frequencies.
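A minimal sketch of how r² and an associated χ² test could be computed for one pair of biallelic sites is given here in Python. It is not the authors' script. It assumes that each inbred line contributes a single phased allele per site, and that the test statistic is χ² = n·r² with 1 degree of freedom, a common choice for a 2 × 2 haplotype table. The function names and the toy allele strings are hypothetical; only the 10% threshold for the rarer allele comes from the text.

```python
from math import erfc, sqrt


def minor_allele_frequency(site):
    """Frequency of the rarer allele at a biallelic site (one allele per line)."""
    counts = [site.count(allele) for allele in set(site)]
    if len(counts) != 2:
        raise ValueError("site must be biallelic")
    return min(counts) / len(site)


def ld_r2(site_a, site_b):
    """Return (r2, chi2, p_value) for two biallelic sites scored on the same lines."""
    if len(site_a) != len(site_b):
        raise ValueError("both sites must be scored on the same lines")
    if len(set(site_a)) != 2 or len(set(site_b)) != 2:
        raise ValueError("both sites must be biallelic")
    n = len(site_a)
    # Any allele can serve as the reference: r2 does not depend on the choice.
    ref_a = sorted(set(site_a))[0]
    ref_b = sorted(set(site_b))[0]
    p_a = site_a.count(ref_a) / n
    p_b = site_b.count(ref_b) / n
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == ref_a and b == ref_b) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    chi2 = n * r2                             # assumed test statistic, 1 d.f.
    p_value = erfc(sqrt(chi2 / 2))            # upper tail of chi-square, 1 d.f.
    return r2, chi2, p_value


# Toy data (invented): two sites scored on 10 lines, in complete association.
site_1 = "AAAAAGGGGG"
site_2 = "CCCCCTTTTT"
if min(minor_allele_frequency(site_1), minor_allele_frequency(site_2)) >= 0.10:
    print(ld_r2(site_1, site_2))
```

With a panel of 26 lines the 10% filter excludes any site whose rarer allele is carried by fewer than 3 lines.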

Results

All the results (sequences, primers, PCR conditions, polymorphisms) are available at http://urgi.versailles.inra.fr/GnpSNP. The panel of 26 lines was selected to represent a large part of the variability present in a collection of genetic resources used by breeders. It could thus be expected that this sample would permit the discovery of a large proportion of the DNA polymorphism present in bread wheat. The genes are located on chromosomes from homoeologous groups 1, 3, 5, and 6 (Table 2). The A, B, and D genomes accounted for 3762, 9228, and 8458 bases, respectively.

Nucleotide diversity

A total of 21 448 bases were sequenced from DNA fragments of 21 genes for each line. A total of 64 single-base changes, 1 single-base indel, and 4 larger indels were identified (Table 3). Out of the 64 mutations observed, 16 (25%) corresponded to singletons (polymorphisms found in only 1 line), which often characterized T. aestivum ‘Seu Seun 27’. Transversions accounted for 48 SNPs (75%) and transitions for 16 SNPs (25%). The gene pina was deleted in some lines; this was considered an indel. Three large indels were detected in Glu-B1-1, the fourth being located in an intron of the B homoeologous gene coding for SPA. These data showed that wheat had, in this sample, an average of 1 SNP every 334 bases. The coding regions, representing 10 139 bases, contained 38 mutations, 1 single-base indel, and 2 larger indels. Among these changes, excluding the large indels, 25 SNPs were non-synonymous; hence the ratio of synonymous to non-synonymous mutations was about 0.5. An average of 1 SNP every 267 bases was observed in coding regions, whereas the average was 1 SNP every 435 bases in non-coding regions.

The presence of polymorphism varied considerably between DNA fragments (Tables 3 and 4). No polymorphism was detected in 10 DNA fragments representing 7405 bases, of which 2818 bp were situated in coding regions (Table 3). These highly conserved fragments mostly concerned the genes involved in disease-resistance mechanisms. Two gene fragments provided 52 (81%) of the 64 SNPs observed.

Table 3. Nucleotide diversity in wheat.

                                  Non-coding regions
                                  Associated genomic   UTR       Intron     Coding regions     Total
                                  sequences            regions
No. of bases sequenced            2131                 380       8798       10139              21448
No. of SNPs(a)
  In the panel of 26 lines        12                   0         14         38 (13; 25)        64
  Including the synthetic line    12                   0         21         68 (28; 40)        101
No. of single-base indels(b)      0                    0         0          1                  1
No. of large indels (bp)(b,c)     1 (54 bp)            0         1 (4 bp)   2 (15 and 27 bp)   4

(a) For coding regions, the number of synonymous SNPs and the number of non-synonymous SNPs are indicated in parentheses.
(b) Not changed by taking into account the synthetic line.
(c) The deletion of pina is not taken into account.

Table 4. Diversity statistics for polymorphic gene fragments in hexaploid bread wheat.

                                     Non-coding regions      Coding regions              Total           π         Haplotype
Gene, genome                 Panel   Bases  Indels  SNPs     Bases  Indels  SNPs(a)      Bases  SNPs     (×10³)    No.   Hd(b)
Spa, A                       26      92     0       0        432    0       2 (1)        524    2        1.05      3     0.439
                             27             0       0               0       2 (1)               2        0.98      3     0.410
Spa, B                       26      2045   1       14       813    0       2 (1)        2858   16       1.47      4     0.458
                             27             1       16              0       4 (3)               20       1.75      5     0.499
Glu-B1-1, B                  26      653    1       9        1110   2       27 (20)      1753   36       6.23      5     0.548
                             27             1       9               2       27 (20)             36       6.07      5     0.533
Glu-D1-1, D                  26      707    0       3        0      0       0 (0)        707    3        1.27      3     0.465
                             27             0       4               0       0 (0)               4        1.34      4     0.504
wPbf, A                      26      894    0       0        870    0       1 (1)        1764   1        0.18      2     0.323
                             27             0       0               0       1 (1)               1        0.18      2     0.313
wPbf, B                      26      854    0       0        903    0       1 (0)        1757   1        0.04      2     0.077
                             27             0       2               0       3 (0)               5        0.21      3     0.145
wPbf, D                      26      779    0       0        852    0       1 (1)        1631   1        0.09      2     0.077
                             27             0       0               0       0 (0)               1        0.09      2     0.074
UDP-glucose 4-epimerase, D(c) 26     506    0       1        246    0       0 (0)        752    1        0.43      2     0.323
                             27             0       1               0       0 (0)               1        0.42      2     0.313
UDP-glucose 4-epimerase, D(c) 26     506    0       1        246    0       0 (0)        752    1        0.41      2     0.312
                             27             0       1               0       0 (0)               1        0.42      2     0.300
Pinb-5, D                    26      333    0       0        420    1(d)    2 (2)        783    2        0.98      4     0.644
                             27             0       0               1(d)    30 (15)             30       6.74      5     0.673

Note: For each gene, the two rows give the results for the panel of 26 lines and for the panel of 27 lines (including the synthetic line ‘W7984’), respectively.
(a) No. of SNPs (no. of non-synonymous polymorphisms among those detected).
(b) Haplotype diversity.
(c) Probably 2 different copies.
(d) Single-base deletion.

Some of the genes studied here have already been investigated. Up to now, 4 alleles have been reported for pina (Chen et al. 2006). The 2 most frequent alleles were found in our sequences: the presence of the gene or its complete deletion, as described by Sourdille et al. (1996). No polymorphism was detected in our sample when this gene was not deleted. Recently, Chen et al. (2006) described 12 alleles for pinb. In the sample of 26 lines, we detected 4 of these alleles: pinb-D1a, -D1b, -D1d, and -D1p (Giroux and Morris 1997; Lillemo and Morris 2000; Chen et al. 2006).

Nucleotide diversity estimated by π varied considerably from one gene to another and ranged from 0 for all the genes showing no polymorphism to 0.00623 for Glu-B1-1, with a mean value of 0.00055 (Table 4). As expected, taking into account the synthetic line ‘W7984’ increased the level of polymorphism observed (Table 4); this led to 37 additional SNPs (1 SNP every 212 bp, mean value of π ≈ 0.0009). Most of these additional SNPs are in pinb, which is located on the 5D chromosome.
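The π and Hd values reported here were obtained with DnaSP. As a rough illustration of what these statistics measure, the Python sketch below uses the usual textbook definitions: π as the mean number of pairwise differences per site, and haplotype diversity as Hd = n/(n − 1) · (1 − Σ pᵢ²). It assumes gap-free aligned sequences of equal length, the toy alignment is invented, and the function names are hypothetical, so the output is not expected to reproduce any value in Table 4.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi) for aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / len(pairs) / length


def haplotype_diversity(seqs):
    """Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


# Toy alignment (invented): 4 lines, 10 sites, 3 distinct haplotypes.
alignment = ["ACGTACGTAC",
             "ACGTACGTAC",
             "ACGTACGTAT",
             "ACGAACGTAT"]
print(f"pi = {nucleotide_diversity(alignment):.4f}")
print(f"Hd = {haplotype_diversity(alignment):.3f}")
```

Multiplying π by 10³ gives the scale used in the nucleotide statistics column of Table 4.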


Haplotype study and linkage disequilibrium

For polymorphic fragments, the number of haplotypes in the sample of 26 lines ranged from 2 to 5, with diversity values ranging from 0.077 for the homoeologous genes wPbf-A and -B to 0.644 for pinb (Table 4). The highest number of haplotypes was observed for Glu-B1-1. Taking the average polymorphic fragment size as 1280 bp, the average number of SNPs per polymorphic fragment was about 6. Under the assumption that SNPs are randomly associated within a gene in an infinitely large population, there would be an average of 2^6 (i.e., 64) possible haplotypes per gene fragment, of which obviously a maximum of 26 could have been present in the collection sequenced. However, an average of only 3 haplotypes per polymorphic gene was observed (Table 4). This clearly suggested that SNPs are in strong LD within a gene. Two genes, Glu-B1-1 and the B-homoeologous gene coding for SPA, contained enough polymorphisms to calculate within-gene LD. As expected, in these 2 genes, LD, estimated as r², showed significant values ranging from 0.6 to 1.

The UPGMA tree based on the observed combination of alleles showed no identical lines among the 26 genotypes studied (data not shown). In this sample, the lines most divergent from all the others were ‘Récital’ and ‘Seu Seun 27’. If the sample of 26 lines studied in this work was actually representative of the diversity of bread wheat, the limited number of haplotypes suggested that SNPs might be discovered with similar efficiency by sequence analysis of a smaller set of lines. For each gene studied, sequencing only one line per haplotype allows the detection of all the SNPs reported in this work. It appeared that all the polymorphisms could be identified in a minimal-size sample of 9 genotypes (‘Opata’, ‘Chinese Spring’, ‘Seu Seun 27’, ‘Austro-Bankut’, ‘Thatcher’, ‘Libellula’, ‘Récital’, ‘Malacca’, and ‘Frontana’).
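The two ideas in this section, the upper bound on the number of haplotypes and the search for a small set of lines carrying every haplotype, can be sketched in Python as follows. The text does not say how the set of 9 genotypes was found, so the greedy set-cover heuristic below is an assumption: it is not guaranteed to return the smallest possible set. The function names, line labels, and haplotype labels are all hypothetical.

```python
def max_observable_haplotypes(n_snps, n_lines):
    """Upper bound on distinct haplotypes: 2**n_snps, capped by the number of lines."""
    return min(2 ** n_snps, n_lines)


def greedy_line_set(haplotypes_by_line):
    """Pick lines until every (gene, haplotype) pair is carried by a chosen line.

    haplotypes_by_line maps a line name to {gene: haplotype label}.
    Greedy heuristic: at each step take the line covering the most uncovered pairs.
    """
    uncovered = {pair for calls in haplotypes_by_line.values() for pair in calls.items()}
    chosen = []
    while uncovered:
        # sorted() makes ties resolve in alphabetical order of line names.
        best = max(sorted(haplotypes_by_line),
                   key=lambda line: len(uncovered & set(haplotypes_by_line[line].items())))
        chosen.append(best)
        uncovered -= set(haplotypes_by_line[best].items())
    return chosen


print(max_observable_haplotypes(6, 26))   # 6 SNPs allow 64 haplotypes, but only 26 lines

# Toy data (invented): 4 lines scored for 2 genes with 2 haplotypes each.
toy = {
    "L1": {"geneA": "h1", "geneB": "h1"},
    "L2": {"geneA": "h1", "geneB": "h2"},
    "L3": {"geneA": "h2", "geneB": "h1"},
    "L4": {"geneA": "h2", "geneB": "h2"},
}
print(greedy_line_set(toy))
```

Because one line per haplotype is sufficient to recover every SNP of a gene, covering all (gene, haplotype) pairs also covers all the SNPs.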

Discussion

Our results represent a new set of SNPs in Triticum aestivum L., discovered by genomic sequence analysis. This set of SNPs is available in GnpSNP. GnpSNP is the database of a web-based system composed of several applications built on top of a relational database that includes integrated schemas for sequence data, map data, transcriptome data, genomic and sequence annotation data, SNP data, and proteomic data (Samson et al. 2003). The GnpSNP database allows the storage of various sequence polymorphisms (SNPs, insertion–deletion polymorphisms, and short tandem repeats) coming from different sequencing or genotyping technologies and obtained on different species. This release is available at http://urgi.versailles.inra.fr/GnpSNP and is the first public release of the application and its data.

It is worth keeping in mind that the results reported here were obtained on particular samples of bread wheat lines and genes, and therefore any inference to other genes and (or) lines should be made with caution. Bread wheat is an autogamous species. Moreover, DNA was extracted from pooled plantlets of each accession that were produced from a single self-pollinated mother head. These plantlets were thus considered genetically identical. Therefore, in this work, variations that can exist within wheat cultivars are expected to be minimized.


The method developed here was based on the design of genome-specific primers, and thus on the discovery of allelic variations by directly sequencing PCR products. This method proved to be efficient at detecting polymorphism. Here, the design of specific primers depends on the differences between the sequences of each copy of the studied gene, which were obtained using the BAC library of ‘Renan’. Specific primers can also be designed by an accurate in silico study of contigs of expressed sequence tags (ESTs), as suggested by Somers et al. (2003).

Among the SNPs discovered, 25% were singletons. They were checked by resequencing independent amplicons, which confirmed their biological existence. This rate of single alleles in the panel of 26 lines is not surprising because these lines were chosen on the basis of their diversity. As expected, the data showed that transversions were more abundant than transitions.

This study allowed us to estimate an average of 1 SNP for every 334 bp of genomic sequence including both coding and non-coding regions, and 1 SNP every 267 bp in coding regions. This was based on a sample of 26 lines. In wheat, Somers et al. (2003) detected 1 SNP every 540 bp of EST sequence using a bioinformatics strategy based on assembling ESTs from 12 cultivars under stringent conditions. The different levels of overall sequence polymorphism in coding regions observed in these 2 works can be explained by the different samples of lines studied (26 lines representative of worldwide diversity vs. 12 lines being a source of ESTs). These 2 studies reported that the level of polymorphism in wheat is actually lower than that reported for maize, barley, and soybean (Tenaillon et al. 2001; Ching et al. 2002; Kanazin et al. 2002; Zhu et al. 2003). The mean value of nucleotide diversity (mean value of π = 0.00055) confirmed the low level of diversity in this sample.

Most of the SNPs detected in this study (81%) were located in only 2 genes, Glu-B1-1 and the B-homoeologous gene coding for SPA. There was no sequence variation in half of the studied fragments. This suggests an uneven distribution of sequence variations. Several factors, such as natural selection or the heterogeneity of mutation rates across loci, lead to heterogeneity of genetic variability in plants. As far as disease tolerance is concerned, most of the genes studied here were highly conserved, with no polymorphism detected. The NPR1 gene controls the onset of systemic acquired resistance (SAR), one of the most important induced defence responses in plants. SAR is triggered after local infection with pathogens, causing hypersensitive necrosis, and is effective against a broad spectrum of plant pathogens (Ryals et al. 1996). Different signal transduction pathways lead to SAR, but many of them result in a convergent complex in which RAR1 and SGT1 interact (Azevedo et al. 2002; Austin et al. 2002). Thus Npr1, Sgt1, and Rar1 have major and central roles, and mutation or silencing analyses in barley, Nicotiana, and Arabidopsis have demonstrated that they are essential for several disease-resistance genes (Shirasu et al. 1999; Spoel et al. 2003). Therefore, the absence of polymorphism in these 3 genes may be explained in part by a considerable selection pressure in the course of wheat breeding. Glu-B1-1 contains indels and many SNPs, most of them located in the coding sequence and being non-synonymous.


In this study, the ratio of synonymous to non-synonymous changes was 0.5. This value is lower than that observed in maize (2.8 in Tenaillon et al. 2001) and soybean (2.6 in Zhu et al. 2003). Low nonsynonymous diversity may reflect purifying selection against nonsynonymous polymorphism conferring a lower fitness. Our result is thus surprising and can be explained by the presence of Glu-B1-1. Similarly, more polymorphisms were observed in coding than in noncoding regions. Salmaso et al. (2004) reported a similar result from grapevine. However, the presence of Glu-B1-1 in the set of genes studied explains this unexpected result. The very high level of sequence polymorphism in Glu-B1-1 is in agreement with that observed in Glu-A3 genes coding for certain low-molecular-weight glutenin subunits (Zhang et al. 2004b). Genes coding for storage proteins have probably not been submitted to any selection pressure until very recently, and most of the many mutations observed in Glu-B1-1 and in Glu-A3 are likely to be selectively neutral. The limited haplotype diversity observed in our data, and the high level of LD observed within 2 genes showed that nucleotide variability was not randomly distributed along the sequence, but was organised in specific combinations. Based on these data, there was no evidence of frequent intragenic recombination. The small number of haplotypes found suggests that Triticum aestivum evolved from a small number of founder genotypes. However, despite the small number of haplotypes detected per gene, we found no identical line when these haplotypes were studied together. Studying the haplotypes led us to identify a smaller set of 9 bread wheat lines containing all the SNPs detected. This set may facilitate the discovery of further SNPs. Ravel et al. (2006) reported the absence of LD between Glu-B1-1 and the B-homoeologous gene coding for SPA. These 2 genes were located in a 1.3 cM length region of chromosome 1BL. 
This lack of LD was therefore favourable for high-precision mapping through association studies. However, LD in hexaploid wheat may vary according to the region of the genome and the material under study.

Taking into account the synthetic line ‘W7984’ led us to detect 37 additional SNPs, i.e., 101 SNPs in 21 448 bp (1 SNP every 212 bp), leading to a mean π value of 0.0009. This result is in agreement with that reported by Bryan et al. (1999). These authors detected 10 SNPs in 2370 bp from a sample including adapted hexaploid wheat and a synthetic line and estimated a mean π value of 0.001. Synthetic lines result from crosses between durum wheat and T. tauschii, and may thus carry interspecific polymorphisms characterizing their parental lines. Previous studies have already revealed the higher diversity in synthetic wheat lines compared with non-synthetic lines (Lage et al. 2003; Zhang et al. 2004a).

In conclusion, the level of sequence polymorphism in bread wheat appeared to be rather low. This was expected from its evolutionary history. Several bottlenecks occurred during the evolution of hexaploid wheat, one of them being linked to the recent interspecific hybridization step that brought in the D genome (Dvorak et al. 1998). Such phenomena may have contributed to a dramatic reduction in population size, and therefore in the level of diversity and the number of haplotypes. However, using amphiploid lines may be a way to increase the genetic variability of hexaploid wheat.
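The SNP frequencies quoted above follow from a simple division of the sequenced length by the number of SNPs. The short sketch below reproduces that arithmetic; it assumes that the 64 SNPs found in the bread wheat lines alone refer to the same 21 448 bp as the 101 SNPs found when ‘W7984’ is included, and the helper name `bp_per_snp` is hypothetical.

```python
def bp_per_snp(n_snps, length_bp):
    """Average number of base pairs per SNP over the sequenced length."""
    if n_snps <= 0 or length_bp <= 0:
        raise ValueError("counts and lengths must be positive")
    return length_bp / n_snps


# (number of SNPs, sequenced length in bp), as quoted in the text.
datasets = {
    "bread wheat lines only": (64, 21448),   # length assumed identical
    "including 'W7984'": (101, 21448),
    "Bryan et al. (1999)": (10, 2370),
}

for label, (n_snps, length_bp) in datasets.items():
    print(f"{label}: 1 SNP every {bp_per_snp(n_snps, length_bp):.0f} bp")
```

Under these assumptions, the 3 data sets give 1 SNP every 335, 212, and 237 bp, respectively, which shows that the frequency obtained with the synthetic line is close to that derived from the counts of Bryan et al. (1999).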

Acknowledgements

The nullitetrasomic and ditelosomic lines were kindly provided by Dr. S. Reader (John Innes Centre, Norwich, UK) and deletion lines by Dr. B. Gill (Kansas State University). This research was funded by the French federative genomic program Génoplante.

References

Austin, M.J., Muskett, P., Kahn, K., Feys, B.J., Jones, J.D.G., and Parker, J.E. 2002. Regulatory role of SGT1 in early R gene-mediated plant defenses. Science (Washington, D.C.), 295: 2077–2080. doi:10.1126/science.1067747. PMID:11847308.
Azevedo, C., Sadanandom, A., Kitagawa, K., Freialdenhoven, A., Shirasu, K., and Schulze-Lefert, P. 2002. The RAR1 interactor SGT1, an essential component of R gene-triggered disease resistance. Science (Washington, D.C.), 295: 2073–2076. doi:10.1126/science.1067554. PMID:11847307.
Boisson, M., Mondon, K., Torney, V., Nicot, N., Laine, A.L., Bahrman, N., et al. 2005. Partial sequences of nitrogen metabolism genes in hexaploid wheat. Theor. Appl. Genet. 110: 932–940. doi:10.1007/s00122-004-1913-4. PMID:15714330.
Bryan, G.J., Stephenson, P., Collins, A., Kirby, J., Smith, J.B., and Gale, M.D. 1999. Low levels of DNA sequence variation among adapted genotypes of hexaploid wheat. Theor. Appl. Genet. 99: 192–198. doi:10.1007/s001220051224.
Buckler, E.S., and Thornsberry, J.M. 2002. Plant molecular diversity and applications to genomics. Curr. Opin. Plant Biol. 5: 107–111. doi:10.1016/S1369-5266(02)00238-8. PMID:11856604.
Chalhoub, B., Allouis, S., Safar, J., Janda, J., Bellec, A., Sarda, X., et al. 2003. Towards precise analysis of the wheat genome: preparation of genomic resources for structural and functional characterization. In Proceedings of the Xth International Wheat Genetics Symposium, 1–6 September, Paestum, Italy. Edited by N.W. Pogna, M. Romanò, E.A. Pogna, and G. Galterio. SIMI, Roma, Italy. pp. 229–233.
Chen, F., He, Z.H., Xia, X.C., Xia, L.Q., Zhang, X.Y., Lillemo, M., and Morris, C.F. 2006. Molecular and biochemical characterization of puroindoline a and b alleles in Chinese landraces and historical cultivars. Theor. Appl. Genet. 112: 400–409. doi:10.1007/s00122-005-0095-z. PMID:16344983.
Ching, A., Caldwell, K.S., Jung, M., Dolan, M., Smith, O.S., Tingey, S., et al. 2002. SNP frequency, haplotype structure and linkage disequilibrium in elite maize inbred lines. BMC Genet. 3: 19–32. doi:10.1186/1471-2156-3-19. PMID:12366868.
Dvorak, J., Luo, M.C., Yang, Z.L., and Zhang, H.B. 1998. The structure of the Aegilops tauschii genepool and the evolution of hexaploid wheat. Theor. Appl. Genet. 97: 657–670. doi:10.1007/s001220050942.
Feltus, F.A., Wan, J., Schulze, S.R., Estill, J.C., Jiang, N., and Paterson, A.H. 2004. An SNP resource for rice genetics and breeding based on subspecies Indica and Japonica genome alignments. Genome Res. 14: 1812–1819. doi:10.1101/gr.2479404. PMID:15342564.
Flint-Garcia, S.A., Thornsberry, J.M., and Buckler, E.S., IV. 2003. Structure of linkage disequilibrium in plants. Annu. Rev. Plant Biol. 54: 357–374. doi:10.1146/annurev.arplant.54.031902.134907. PMID:14502995.
Giroux, M.J., and Morris, C.F. 1997. A glycine to serine change in puroindoline b is associated with wheat grain hardness and low levels of starch-surface friabilin. Theor. Appl. Genet. 95: 857–864. doi:10.1007/s001220050636.
Gouesnard, B., Bataillon, T., Decoux, G., Rozale, C., Schoen, D.L., and David, J.L. 2001. An algorithm for building germplasm core collections by maximizing allelic or phenotypic richness. J. Hered. 92: 93–94. doi:10.1093/jhered/92.1.93. PMID:11336240.
Guillaumie, S., Charmet, G., Linossier, L., Torney, V., Robert, N., and Ravel, C. 2004. Co-location between a gene encoding for the bZip factor SPA and an eQTL for a high-molecular-weight glutenin subunit in wheat (Triticum aestivum). Genome, 47: 705–713. doi:10.1139/g04-031. PMID:15284875.
Jaccard, P. 1908. Nouvelles recherches sur la distribution florale. Bull. Soc. Vaudoise Sci. Nat. 44: 223–270.
Kanazin, V., Talbert, H., See, D., DeCamp, P., Nevo, E., and Blake, T. 2002. Discovery and assay of single-nucleotide polymorphisms in barley (Hordeum vulgare). Plant Mol. Biol. 48: 529–537. doi:10.1023/A:1014859031781. PMID:11999833.
Lage, J., Skovmand, B., and Andersen, S.B. 2003. Characterization of greenbug (Homoptera: Aphididae) resistance in synthetic hexaploid wheats. J. Econ. Entomol. 96: 1922–1928. PMID:14977134.
Lillemo, M., and Morris, C.F. 2000. A leucine to proline mutation in puroindoline b is frequently present in hard wheats from northern Europe. Theor. Appl. Genet. 100: 1100–1107. doi:10.1007/s001220051392.
Morris, C.F. 2002. Puroindolines: the molecular genetic basis of wheat grain hardness. Plant Mol. Biol. 48: 633–647. doi:10.1023/A:1014837431178. PMID:11999840.
Nordborg, M., Hu, T.T., Ishino, Y., Jhaveri, J., Toomajian, C., Zheng, H., et al. 2005. The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol. 3: e196. doi:10.1371/journal.pbio.0030196. PMID:15907155.
Peng, J., Richards, D.E., Hartley, N.M., Murphy, G.P., Devos, K.M., Flintham, J.E., et al. 1999. ‘Green revolution’ genes encode for mutant gibberellin response modulators. Nature (London), 400: 256–261. PMID:10421366.
Rafalski, A. 2002a. Applications of single nucleotide polymorphisms in crop genetics. Curr. Opin. Plant Biol. 5: 94–100. doi:10.1016/S1369-5266(02)00240-6. PMID:11856602.
Rafalski, A. 2002b. Novel genetic mapping tools in plants: SNPs and LD-based approaches. Plant Sci. 162: 329–333.
Ravel, C., Praud, S., Murigneux, A., Linossier, L., Dardevet, M., Balfourier, F., et al. 2006. An association study to discriminate candidate genes in bread wheat (Triticum aestivum). Theor. Appl. Genet. 112: 738–743. PMID:16362275.
Remington, D.L., Thornsberry, J.M., Matsuoka, Y., Wilson, L.M., Whitt, S.R., Doebley, J., et al. 2001. Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proc. Natl. Acad. Sci. U.S.A. 98: 11479–11484. doi:10.1073/pnas.201394398. PMID:11562485.
Roussel, V., Koenig, J., Beckert, M., and Balfourier, F. 2004. Molecular diversity in French bread wheat accessions related to temporal trend and breeding programmes. Theor. Appl. Genet. 108: 920–930. doi:10.1007/s00122-003-1502-y. PMID:14614567.
Roussel, V., Leisova, L., Exbrayat, F., Stehno, Z., and Balfourier, F. 2005. SSR allelic diversity changes in 480 European bread wheat varieties released from 1840 to 2000. Theor. Appl. Genet. 111: 162–170. doi:10.1007/s00122-005-2014-8. PMID:15887038.
Rozas, J., Sánchez-DelBarrio, J.C., Messeguer, X., and Rozas, R. 2003. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics, 19: 2496–2497. doi:10.1093/bioinformatics/btg359. PMID:14668244.
Russell, J., Booth, A., Fuller, J., Harrower, B., Hedley, P., Machray, G., and Powell, W. 2004. A comparison of sequence-based polymorphism and haplotype content in transcribed and anonymous regions of the barley genome. Genome, 47: 389–398. PMID:15060592.
Ryals, J.A., Neuenschwander, U.H., Willits, M.G., Molina, A., Steiner, H.Y., and Hunt, M.D. 1996. Systemic acquired resistance. Plant Cell, 8: 1809–1819. doi:10.1105/tpc.8.10.1809. PMID:12239363.
Rychlik, W., and Rhoads, R.E. 1989. A computer program for choosing optimal oligonucleotides for filter hybridization, sequencing and in vitro amplification of DNA. Nucleic Acids Res. 17: 8543–8551. PMID:2587212.
Salmaso, M., Faes, G., Segala, C., Stefanini, M., Salakhutdinov, I., Zyprian, E., et al. 2004. Genome diversity and gene haplotypes in the grapevine (Vitis vinifera L.), as revealed by single nucleotide polymorphisms. Mol. Breed. 14: 385–395. doi:10.1007/s11032-004-0261-z.
Samson, D., Legeai, F., Karsenty, E., Reboux, S., Veyrieras, J.B., Just, J., and Barillot, E. 2003. GénoPlante-Info (GPI): a collection of databases and bioinformatics resources for plant genomics. Nucleic Acids Res. 31: 179–182. doi:10.1093/nar/gkg060. PMID:12519976.
Sears, E.R. 1966. Nullisomic-tetrasomic combinations in hexaploid wheat. In Chromosome manipulations and plant genetics. Edited by R. Riley and K.R. Lewis. Oliver and Boyd, Edinburgh, UK. pp. 29–45.
Shirasu, K., Lahaye, T., Tan, M.W., Zhou, F., Azevedo, C., and Schulze-Lefert, P. 1999. A novel class of eukaryotic zinc-binding proteins is required for disease resistance signalling in barley and development in C. elegans. Cell, 99: 355–366. doi:10.1016/S0092-8674(00)81522-6. PMID:10571178.
Somers, D.J., Kirkpatrick, R., Moniwa, M., and Walsh, A. 2003. Mining single-nucleotide polymorphisms from hexaploid wheat ESTs. Genome, 46: 431–437. doi:10.1139/g03-027. PMID:12834059.
Sourdille, P., Perretant, M.R., Charmet, G., Leroy, P., Gautier, M.-F., Joudrier, P., et al. 1996. Linkage between RFLP markers and genes affecting kernel hardness in wheat. Theor. Appl. Genet. 93: 580–586.
Spoel, S.H., Koornneef, A., Claessens, S.M., Korzelius, J.P., Van Pelt, J.A., Mueller, M.J., et al. 2003. NPR1 modulates crosstalk between salicylate- and jasmonate-dependent defense pathways through a novel function in the cytosol. Plant Cell, 15: 760–770. doi:10.1105/tpc.009159. PMID:12615947.
Staden, R., Beal, K.F., and Bonfield, J.K. 2000. The Staden package, 1998. Methods Mol. Biol. 132: 115–130. PMID:10547834.
Tajima, F. 1983. Evolutionary relationship of DNA sequences in finite populations. Genetics, 105: 437–460. PMID:6628982.
Tenaillon, M.I., Sawkins, M.C., Long, A.D., Gaut, R.L., Doebley, J.F., and Gaut, B.S. 2001. Patterns of DNA sequence polymorphism along chromosome 1 of maize (Zea mays ssp. mays L.). Proc. Natl. Acad. Sci. U.S.A. 98: 9161–9166. doi:10.1073/pnas.151244298. PMID:11470895.
Yanagisawa, T., Kiribuchi-Otobe, C., Hirano, C., Suzuki, Y., and Fujita, M. 2003. Detection of single nucleotide polymorphism (SNP) controlling the waxy character in wheat by using a derived cleaved amplified polymorphic sequence (dCAPS) marker. Theor. Appl. Genet. 107: 84–88. PMID:12669198.
Zhang, L., Liu, D., Yan, Z., Lan, X., Zheng, Y., and Zhou, Y. 2004a. Rapid changes of microsatellite flanking sequence in the allopolyploidization of new synthesized hexaploid wheat. Sci. China C Life Sci. 47: 553–561. PMID:15620112.
Zhang, W., Gianibelli, M.C., Rampling, L.R., and Gale, K.R. 2004b. Characterization and marker development for low molecular weight glutenin genes from Glu-A3 alleles of bread wheat (Triticum aestivum L.). Theor. Appl. Genet. 108: 1409–1419. doi:10.1007/s00122-003-1558-8. PMID:14727031.
Zhu, Y.L., Song, Q.J., Hyten, D.L., Van Tassell, C.P., Matukumalli, L.K., Grimm, D.R., et al. 2003. Single-nucleotide polymorphism in soybean. Genetics, 163: 1123–1134. PMID:12663549.
