Genetic Diversity, Population Structure, and Linkage Disequilibrium in ...

Published September, 2010 O R I G I N A L R ES E A R C H

Genetic Diversity, Population Structure, and Linkage Disequilibrium in U.S. Elite Winter Wheat Dadong Zhang, Guihua Bai,* Chengsong Zhu, Jianming Yu, and Brett F. Carver

Information on genetic diversity and population structure of elite wheat (Triticum aestivum L.) breeding lines promotes effective use of genetic resources. We analyzed 205 elite wheat breeding lines from major winter wheat breeding programs in the USA using 245 markers across the wheat genomes. This collection showed a high level of genetic diversity as reflected by allele number per locus (7.2) and polymorphism information content (0.54). However, the diversity of U.S. modern wheat appeared to be lower than previously reported diversity levels in worldwide germplasm collections. As expected, this collection was highly structured according to geographic origin and market class with soft and hard wheat clearly separated from each other. Hard wheat accessions were further divided into three subpopulations. Linkage disequilibrium (LD) was primarily distributed around centromere regions. The mean genome-wide LD decay estimate was 10 cM (r2 > 0.1), although the extent of LD was highly variable throughout the genome. Our results on genetic diversity of different gene pools and the distribution of LD facilitates the effective use of genetic resources for wheat breeding and the choice of marker density in gene mapping and markerassisted breeding.

Published in The Plant Genome 3:117–127. Published 15 Sept. 2010. doi: 10.3835/plantgenome2010.03.0004 © Crop Science Society of America 5585 Guilford Rd., Madison, WI 53711 USA An open-access publication All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permission for printing and for reprinting the material contained herein has been obtained by the publisher. THE PL ANT GENOME

W

HEAT,

Abstract

Q

SEPTEMBER 2010

Q

VOL . 3, NO . 2

rice (Oryza sativa L.), and maize (Zea mays L.) are the three staple food crops in the world. As the human population increases and more maize is dedicated to a growing biofuel industry, the demand for high yields of good quality wheat grain will increase. In the past century, wheat breeders worldwide have made remarkable progress in improving disease resistance, grain yield, and end-use quality of wheat. However, continuous breeding activities may result in erosion of genetic diversity as a result of intense selection pressure, recurrent use of a few adapted elite germplasm lines as parents, and adoption of breeding schemes that do not perpetuate genetic recombination (Hoisington et al., 1999). The degree of genetic diversity in contemporary germplasm from breeding programs may indirectly reflect the level of genetic progress achievable in future cultivars. Therefore, evaluation of the genetic diversity resident in current breeding programs at the molecular level and integration of this information into cultivar development programs are essential to using genetic resources effectively in breeding programs (Chao et al., 2007). Genetic diversity, population structure, and LD of a population provide strategic information for association mapping and marker-assisted breeding. The resolution of association mapping and effectiveness of marker-assisted

Dadong Zhang, Chengsong Zhu, and Jianming Yu, Dep. of Agronomy, Kansas State Univ., 2004 Throckmorton Hall, Manhattan, KS 66506; Guihua Bai, USDA-ARS Hard Winter Wheat Genetics Research Unit, Manhattan, KS 66506; Brett Carver, Dep. of Plant and Soil Sciences, Oklahoma State Univ., 368 Agricultural Hall, Stillwater, OK 74078. Received 21 Dec. 2009. *Corresponding author ([email protected]). Abbreviations: AMOVA, an analysis of molecular variance; LD, linkage disequilibrium; PCoA, principal coordinate analysis; PIC, polymorphism information content; QTL, quantitative trait locus(i); SNP, single-nucleotide polymorphism; STS, sequence-tagged site; SSR, simple sequence repeat; UPGMA, unweighted pair group method with arithmetic mean.

117

breeding are mostly determined by LD, the nonrandom association of alleles at different loci (Flint-Garcia et al., 2003; Kim et al., 2007). High LD in a population often facilitates QTL detection with a reduced number of marker requirements, whereas low LD usually implies that high mapping resolution can be achieved, provided that enough markers are available in the target chromosome region. In a self-pollinated species such as wheat, recombination frequency is obviously low in populations undergoing a rapid rate of inbreeding with selfing. Thus, the estimated LD of wheat (0.5–50 cM) (Breseghello and Sorrells, 2006; Maccaferri et al., 2006; Chao et al., 2007; Somers et al., 2007) is relatively high compared with that of maize (200–2000 bp) (Tenaillon et al., 2001) and Arabidopsis (less than 10 kb) (Kim et al., 2007; Song et al., 2009). In wheat, LD estimates also vary with chromosome regions across genomes in a population (Breseghello and Sorrells, 2006; Chao et al., 2007). To date, the extent of LD among contemporary wheat accessions resident in breeding programs has not been reported. The degree of marker polymorphism and genome coverage is also important to association mapping and candidate gene detection. Because of its large genome size, wheat has the poorest marker coverage compared with other crops such as maize, rice, barley (Hordeum vulgare L.), and sorghum [Sorghum bicolor (L.) Moench]. Simple sequence repeat (SSR) markers remain one of the best marker systems for wheat research because of their chromosome specificity, high polymorphism (Plaschke et al., 1995; Huang et al., 2002), high reproducibility, and codominant inheritance patterns (Roder et al., 1998). Elite breeding lines tested in regional performance nurseries represent the most advanced materials derived from breeding programs and best represent the pool of candidates for cultivar release or use as parents in future crosses. A collection of elite breeding lines from major U.S. regional performance nurseries may represent the current status of genetic diversity available to major U.S. hard and soft winter wheat breeding programs. From two major contemporary U.S. wheat gene pools, our objectives were to estimate the levels of genetic diversity and LD and to characterize the population structure of these two major contemporary U.S. wheat gene pools.

MATERIALS AND METHODS Plant Materials After removing full-sib lines, this study used 205 wheat accessions combined from the 2008 U.S. Southern (SRPN, n = 42) and Northern (NRPN, n = 23) Hard Winter Wheat Regional Performance Nurseries, 2008 Hard Winter Wheat Regional Germplasm Observation Nursery (RGON, n = 37), an 2008 elite hard winter wheat nursery at Oklahoma State University (OSU, n = 18), 2008 U.S. Uniform Eastern Soft Red Winter Wheat Nursery (UESRWWN, n = 32), and 2008 Uniform Southern Soft Red Winter Wheat Nursery (USSRWWN, n = 31), plus 22 major cultivars recently released in the 118

hard winter wheat region (Supp. Table 1). Among these accessions, 115 hard red winter (HRW) wheat and 22 hard white winter (HWW) wheat were from eight major hard wheat growing states, and 68 soft winter wheat lines were from 15 major soft wheat growing states in the eastern and midwestern USA (Supp. Table 1). DNA from all accessions were derived from a single plant in the greenhouse to minimize within-line heterogeneity.

Extraction of DNA and Marker Analysis Leaf tissue from a single plant was sampled at the two-leaf stage by means of 1.5-mL strip tubes and dried for 2 d in a freeze drier (Thermo Fisher, Waltham, MA) for DNA isolation. Tubes containing a 3.2-mm stainless steel bead and dried tissue were shaken in a Mixer Mill (Retsch GmbH, Germany) at 25 times s−1 for 5 min. Genomic DNA was extracted by the cetyltrimethyl ammonium bromide (CTAB) method (Saghai-Maroof et al., 1984). Polymerase chain reaction (PCR) amplifications were performed in a Tetrad Peltier DNA Engine (Bio-Rad Lab, Hercules, CA, USA) with a 12-μL PCR mixture containing 1.2 μL 10× PCR buffer (Bioline, Taunton, MA, USA), 2.5 mM MgCl2, 200 μM of each dNTP, 50 nM forward tailed primer, 250 nM reverse primer, 200 nM M13 fluorescent-dye-labeled primer, 0.6 U Taq DNA polymerase, and about 80 ng of template DNA. A touchdown PCR program was used for PCR amplification. Briefly, the reaction was incubated at 95°C for 5 min then continued for five cycles of 1 min at 96°C, 5 min at 68°C with a decrease of 2°C in each subsequent cycle, and 1 min at 72°C. For another five cycles, the annealing temperature started at 58°C for 2 min with a decrease of 2°C for each subsequent cycle. Reactions then went through an additional 25 cycles of 1 min at 96°C, 1 min at 50°C, and 1 min at 72°C with a final extension at 72°C for 5 min. PCR products were analyzed on an ABI PRISM 3730 DNA Analyzer (Applied Biosystems, Foster City, CA). A total of 245 markers covering all 21 wheat chromosomes were selected to analyze the entire population (Supp. Table 2). All primers were selected according to currently available genetic maps (Matthews et al., 2003) (http://www.graingenes.org; verified 11 Aug. 2010) and previously reported polymorphism information content (PIC) (Breseghello and Sorrells, 2006; Chao et al., 2007). Consideration was also given to markers linked to QTL for several important traits such as resistance to Fusarium head blight (caused by Fusarium graminearum Schwabe), wheat rusts [caused by Puccinia graminis Pers. f. sp. tritici Eriks. & Henn.; Puccinia striiformis Westend. f. sp. tritici Eriks.; and Puccinia triticina Erikss. (= P. recondita Roberge ex Desmaz. f. sp. tritici)], soil aluminum toxicity, preharvest sprouting, and several end-use quality traits. Included were 86 GWM, 73 WMC, 48 BARC, 12 CFD, 7 GDM, and 3 CFA markers, plus 16 sequence-tagged site (STS) markers for specific genes or QTL. Data collected from an ABI DNA Analyzer (Applied Biosystems, Foster City, CA) were processed by GeneMarker version 1.6 (SoftGenetics LLC, State College, PA) THE PL ANT GENOME

Q

SEPTEMBER 2010

Q

VOL . 3, NO . 2

and manually checked twice for accuracy. Marker allele scoring followed Breseghello and Sorrells (2006). All data from the 245 SSR markers were used in cluster analysis, but only 79 markers, covering all 42 chromosome arms at a genetic distance of at least 15 cM apart, were selected for structure calculation. A linkage map was assembled according to information from the SSR consensus map (Somers et al., 2004). Markers that were mapped to more than one position were assigned by aligning the Ta-SSR-2004 map with either the wheat composite 2004 map (http://wheat.pw.usda.gov; verified 11 Aug. 2010) or the Ta-Synthetic/Opata-SSR map (Song et al., 2005) and by comparing fragment size for markers with known position on GrainGenes if available. A total of 186 were selected for LD analysis after discarding markers that couldn’t be positioned by either of the above methods.

Data Analysis PowerMarker soft ware version 3.25 (Liu and Muse, 2005) was used to calculate number of alleles and values of gene diversity and PIC (Botstein, 1980) of each locus from 245 SSR markers. Distance-based cluster analysis, using Nei’s distance (Nei, 1973), was conducted separately for hard and soft wheat groups because they were clearly demarcated in structure analysis. PIC was calculated using the formula PIC = 1 − Σ(Pi)2, where Pi is the proportion of the population carrying the ith allele. A model-based (Bayesian) method was used to assess the number of subpopulations among all accessions with the soft ware STRUCTURE 2.2 (Pritchard et al., 2000). Structure was analyzed by means of k-values (an assumed fi xed number of subpopulations) from 1 to 10 in the entire population. Six independent analyses were conducted for each k-value with the burn-in time and replication number set at 5 × 105 and 1 × 105, respectively. All accessions were assigned to a subpopulation according to the largest probability estimated by the program. All subpopulations derived from either cluster analysis or model-based structure analysis were also evaluated by principal coordinate analysis (PCoA) implemented by the MVSP 3.13 program (Kovach, 1998). An analysis of molecular variance (AMOVA) (Weir, 1996) was conducted by Arlequin v. 3.11 (Excoffier et al., 2005) to partition the variation among and within clusters. The threshold for statistical significance was determined by running 1000 permutations. One hundred eighty-six marker loci that were assembled in a consensus map, mainly based on linkage information from the Ta-SSR-2004 map, were used for genome-wide LD analysis (Supp. Table 2). Pairwise LD was calculated by TASSEL 1.9.4 (http://www.maize genetics.net; verified 11 Aug. 2010) (Bradbury et al., 2007). The comparison-wise significance was computed by 1000 permutations. LD was estimated separately for each chromosome, then the genome-wide LD decay was estimated by plotting the pairwise r2 against the genetic distance from all 21 chromosomes. A trend line was drawn by second-degree loess (Cleveland, 1979) by

means of the statistical program R (http://www.r-project. org; verified 11 Aug. 2010).

RESULTS Diversity of Marker Alleles All 245 primers tested were polymorphic across 205 accessions and amplified a total of 2153 alleles at 301 loci, confirming that most of the selected markers were highly informative. The total number of alleles per locus varied from 2 to 25, averaging 7.2 alleles per locus and 8.8 alleles per primer (Table 1 and Fig. 1). Primer WMC644 on chromosome 2A amplified the most alleles. The average PIC across all loci was 0.54 with a range from 0.01 (BARC123) to 0.92 (GWM484). Among 137 hard wheat accessions, 243 primers were polymorphic and amplified 1918 alleles. Only two primers (BARC85 and BYAGi) were monomorphic (Supp. Table 2 and Fig. 1). The number of alleles per locus ranged from 1 to 22, averaging 6.4 alleles per locus across all hard wheat accessions. Among 68 soft wheat accessions, 242 primers showed polymorphism and amplified 1574 alleles. Three primers (BARC123, CFD106, and PinaD1) were monomorphic. The number of alleles per locus ranged from 2 to 21 with an average of 5.2 (Table 1). Four SSR markers for aluminum tolerance gene ALMT1, WMC331, GDM125, SSR3a, and SSR3b, were screened across 205 accessions. SSR3a amplified 15 alleles in hard wheat and nine in soft wheat, indicating high genetic variation for this gene marker sequence in the U.S. wheat germplasm. SSR primer WMC331 amplified six alleles in hard wheat and three in soft wheat. A high level of polymorphism was also found for other markers linked to specific genes or QTLs including Sr2-x3b061c22 (a marker for stem rust resistance gene Sr2), SWM10 (a marker for a leaf rust resistance gene), csLV34-Lr34 (a marker for leaf rust resistance gene Lr34), GWM526 (a marker for Barley yellow dwarf virus resistance gene Bdv2), GWM533 (a marker for a major wheat Fusarium head blight resistance QTL FHB1), GWM261 (a marker for the reduced plant height gene Rht8), GWM111 [a marker for Russian wheat aphid Diuraphis noxia (Mordvilko), Dn, resistance gene], GDM33 (a marker for Lr21), CFD132 (a marker for a Hessian fly resistance gene H13), BARC321 (a marker for seed dormancy), BARC164 (a marker for an aluminum tolerance QTL), and WMC474 (a marker for stem rust resistance gene Sr40) (Supp. Table 2).

Genetic Relationship among the Accessions Distance-based cluster analysis clearly identified two major clusters in corresponding to their geographic origins and market classes, soft vs. hard (Fig. 2). Within a class, hard wheat could be classified into three subclusters of 23, 52, and 62 accessions each (Fig. 2). The first two subclusters primarily contained accessions from the southern and central Great Plains states, with 20 of 23 and 45 of 52 accessions from Kansas, Oklahoma, Texas, and Colorado. ‘Jagger’, ‘Ogallala’, and ‘Trego’ were the

ZHANG ET AL .: GENETI C DIVERSIT Y AND POPU L ATION STRUCTU RE OF ELITE WHEAT

119

Table 1. Summary of allelic diversity for 205 U.S. wheat accessions. Total Hard wheat Soft wheat

Sample size

No. of polymorphic markers

No. of alleles scored

Average no. of alleles†

Gene diversity

PIC‡

205 137 68

245 243 242

2153 1918 1574

7.2 (8.8) 6.4 (7.8) 5.2 (6.4)

0.57 0.55 0.54

0.54 0.51 0.50

†

Mean allele number per marker locus; value in parentheses is mean allele number per primer. PIC: Polymorphism information content.

‡

Figure 1. Frequency distribution of number of markers in each allele category across hard wheat (Hard), soft wheat (Soft), and combined hard and soft (Total) accessions.

prevailing parents in these subclusters. These parents respectively appeared in the pedigrees of 18, six, and five of 75 accessions. The third subcluster contained accessions originating throughout the Great Plains and could be further divided into two groups with 29 accessions in Group 1 from the southern Great Plains and 33 accessions in Group 2 from the two northern states of Nebraska and South Dakota (Fig. 2). ‘Wesley’ was a frequent parent in these two states, appearing in the pedigrees of eight accessions. Soft wheat accessions were classified into three subclusters. The first subcluster consisted of 35 accessions from 13 of the 14 soft wheat states. The other two subclusters contained accessions almost all from four midwestern states (Ohio, Indiana, Illinois, and Missouri) with 13 of 14 accessions in the second group and 15 of 19 accessions in the third group from these four states. About 75% of the Indiana accessions (11 out of 14) and 86% of the Ohio accessions (6 out of 7) were grouped in these two subclusters (Fig. 2).

Population Structure The number of subpopulations was determined by model-based structure analysis with model parameter k from 1 to 10. The maximum likelihood values showed a typical curvilinear response to increasing k, such that k = 4 (four subpopulations) was defined to provide the optimal structure for further analysis. Similar to cluster analysis, structure analysis clearly separated soft wheat from hard wheat accessions. Almost all soft wheat accessions (67 of 68) were classified into one group with the only exception of Atlas 66, which was structured into the 120

hard group with the lowest membership coefficient. Hard wheat accessions were further divided into three subpopulations: two south-central hard and north hard (Fig. 3). Two south-central hard wheat subpopulations consisted of 41 and 39 accessions. The first subpopulation had 30 accessions from Colorado, Kansas, Oklahoma, and Texas with 21 from Oklahoma alone, and the second subpopulation contained 34 accessions from the same four states with 21 accessions from Kansas alone. Thirty-seven accessions were grouped in the north hard subpopulation with 19 accessions from Nebraska and seven from South Dakota (Fig. 3). Similar to structure analysis, the first principal coordinate (PCo1) also clearly separated hard wheat from soft wheat accessions. Among the three hard wheat subpopulations, the south-central hard subpopulation (majority of accessions from Kansas) could be singled out as one subpopulation, whereas the other two showed mixed geographic origin (Fig. 4). AMOVA partitioned 10% of the total variation for genetic diversity as among subcluster and 90% as within subcluster in the distance-based analysis. Similar variance partitioning was obtained among the groupings on the basis of the structure (model-based) analysis (Table 2).

Linkage Disequilibrium A genome-wide linkage map was assembled on the basis of a previously reported map (Somers et al., 2004) (http:// wheat.pw.usda.gov/ggpages/map_shortlist.html; verified 11 Aug. 2010). A total of 186 marker loci (60 markers from the A genome, 65 from the B genome, and 61 from the D genome) was assembled in the linkage map for THE PL ANT GENOME

Q

SEPTEMBER 2010

Q

VOL . 3, NO . 2


121

Figure 2. Dendrogram showing the relationships among 137 hard wheat accessions and 68 soft wheat accessions as revealed by cluster analysis based on Nei (1973) genetic distance. The colors on the map show the geographical distribution of subgroups estimated from hard and soft wheat on the basis of their genetic distance. The number in the map indicates the total number of accessions collected from the state. The digits on the cluster tree represent the accessions corresponding to Supp. Table 1.

Figure 3. Four subpopulations inferred from structure analysis. The vertical coordinate of each subpopulation indicates the membership coefficients for each individual, and the digits on the horizontal coordinate represent the accessions corresponding to Supp. Table 1.

122

LD detection. In the map, 62 and 41 genome-wide LD blocks were identified at P < 0.01 and P < 0.0001 (Fig. 5), respectively. Most LD blocks (79.0%) were 40 cM) was on chromosome 6A. LD blocks were often detected in regions encompassing the centromere, especially on chromosomes 4D, 5A, and throughout the B genome (Fig. 5). Combined analysis of all pair-wise r2 from 21 chromosomes indicated that the r2-values declined rapidly to 0.1 within 10 cM (Fig. 6).

DISCUSSION Genetic Diversity in Modern U.S. Winter Wheat The wheat accessions used in this study represent the diversity in the most advanced breeding stages of major publicly and privately operated variety development programs from the major hard winter wheat and soft winter wheat regions (Fig. 2) and encompass the marketable gene pools available in both regions. Therefore, information on genetic diversity, population structure, and LD in this collection should provide relevant guidelines for designing new breeding strategies for wheat cultivar improvement and assembly of association mapping populations. Compared with previous reports, the genetic diversity discovered across 205 U.S. hard and soft wheat accessions in this study was high, as reflected in gene diversity value (0.54) and an average number of alleles per loci (7.2). Other researchers reported averages of 4.2 to 6.9 alleles per locus (Plaschke et al., 1995; Dreisigacker et al., 2004; Breseghello and Sorrells, 2006; Maccaferri et al., 2005) among varied sources of wheat collections. Chao et al. (2007) reported a level of genetic diversity similar to that in this study by surveying 42 mapping parents from all major U.S. wheat breeding programs. These mapping parents included cultivars from hard winter, soft winter, and hard spring wheat classes, as well as some hallmark unadapted germplasm. Similar magnitudes of genetic diversity between wheat collections from hard and soft winter wheat regions alone versus from all U.S. wheat growing areas indicates no erosion of diversity within gene pools tied to a specific market class. This result is not unexpected given the extensive sharing of germplasm among U.S. breeding programs across different regions. However, higher levels of diversity have been reported elsewhere. Roussel et al. (2004) evaluated 559 French wheat accessions with 42 SSR markers and identified 14.5 alleles per locus and a PIC value of 0.66. Huang et al. (2002) studied a world collection of 998 accessions and found a gene diversity value of 0.77 and 18.1 alleles per locus at 26 SSR loci. It is possible that the higher genetic diversity in these studies resulted from use of a highly selective set of markers with greatest PIC. However, when markers common to both sets of studies were compared, the genetic diversity parameters from their studies were still significantly higher than our results. For example, the average number of alleles and PIC in Roussel et al. THE PL ANT GENOME

Q

SEPTEMBER 2010

Q

VOL . 3, NO . 2

Figure 4. Principal coordinate analysis (PCoA) of 205 U.S. wheat accessions. The different colors represent the four subpopulations inferred by structure analysis. Hard was clearly separated from soft by the first principal component, whereas south-central Hard and north Hard could be roughly separated by the second principal component.

Table 2. Summarized result of AMOVA of subpopulations generated by the model-based method (STRUCTURE 2.2) and distance-based method (Nei, 1973). Cluster method Distance based (Cluster, Nei 1973) Model based (STRUCTURE)

Source of variation among populations within populations total among populations within populations total

df 5 199 204 3 201 204

(2004) were 16.1 and 0.72, respectively, compared with 10.3 and 0.69 in this study based on 14 common primers. This study shared 11 common markers with Huang et al. (2002) and had eight fewer alleles with an average allele of 17.4 per primer and gene diversity of 0.76 in Huang et al. (2002) versus 9.3 alleles and gene diversity of 0.69 in this study. These studies also included collections from wider geographic areas, especially the report by Huang et al. (2002). In addition, researchers using DArT markers recently reported a lower level of genetic diversity within U.S. wheat in comparison with Australian wheat (White et al., 2008). Therefore, continuous introduction of wheat germplasm from other countries into U.S. breeding programs has potential to enhance the genetic diversity of U.S. breeding materials. Hard wheat appeared to have slightly higher genetic diversity than soft wheat on the basis of average number of alleles per locus or degree of dispersion based on population structure and PCoA. The soft wheat accessions

Sum of squares 1522.25 13498.94 15021.19 1403.35 13617.84 15021.19

Variance components 7.38 67.83 75.21 7.89 67.75 75.64

Percentage of variation 9.82 90.18 100.0 10.43 89.57 100.0

averaged 5.2 alleles per locus, whereas hard wheat averaged 6.4 alleles per locus. About one third of the soft wheat accessions had ≤3 alleles per locus, whereas most hard wheat accessions had ≥4 alleles per locus. The lower genetic diversity in soft wheat was also reflected in fewer marker alleles (4%) that were specific to the soft wheat class. A previous study revealed a similar low number of alleles per locus (4.8) in eastern U.S. soft winter wheat lines (Breseghello and Sorrells, 2006). Relatively lower polymorphism among soft wheat accessions than hard wheat was also observed for some important gene markers, such as Sr2-X3b061c22 and SSR3a. The narrower latitudinal distribution of soft wheat (Fig. 2) may partly explain the fewer alleles per locus observed in this study, as less sequence variation might be needed for adaptation to a narrower latitude area. In addition, structure analysis resulted in three subpopulations in hard wheat, but only one in soft wheat, which also suggests lower polymorphism among soft wheat than hard wheat. However,


123

Figure 5. Wheat consensus map assembled from 186 SSR markers and genome-wide distribution of LD of 205 U.S. wheat accessions. Red: P < 0.0001; Green: P < 0.001; Blue: P < 0.01; No fill: P > 0.01. Centromere region for each chromosome is marked with an ellipse. *markers used for structure analysis.

the smaller sample size of the soft wheat collection may underrepresent the soft wheat gene pool.

Genetic Relationship and Population Structure Population structure analysis enables understanding genetic diversity in a given collection and identifying the 124

appropriate population for association mapping. In this study, all wheat accessions were consistently separated, whether by cluster, structure, or PCo analyses, according to both grain texture (hard versus soft) and geographical region (Fig. 2 and Fig. 3). Grain texture, a major delineator for U.S. wheat market classes, is controlled THE PL ANT GENOME

Q

SEPTEMBER 2010

Q

VOL . 3, NO . 2

Figure 6. Genome-wide LD (r2) distribution against the genetic distance. The simulation trend line within 20 cM showed that the r2 declined to below 0.1 within 10 cM.

by puroindoline genes at two loci on chromosome 5DS, but they alone are unlikely the key factor with which the collection was grouped (Hogg et al., 2004). One of the sources of divergence between soft and hard wheat can be traced back to their entirely different founder populations. The hard red winter wheat was primarily founded on Turkey Red and its derivatives, whereas soft red derived from the landrace Mediterranean and a number of other Northern European landraces (Cox et al., 1986). In addition, hard and soft wheat cultivars are typically grown in distinct ecogeographic zones featuring different climatic patterns for optimum grain production, thereby requiring different genes for localized adaptation. Regional adaptation may be another primary determinant for major group separation, as reported in several studies on different wheat classes and traits (Bai et al., 2003; Maccaferri et al., 2005; White et al., 2008). Therefore, the population from the USA used in this study was also highly structured according to geographic origin or class, and hard and soft wheat accessions should be analyzed separately in an association study. In this study, hard wheat accessions were consistently divided into three subpopulations in both distance-based cluster analysis and model-based structure analysis. Within each subgroup, some cultivars or lines were frequently used as parents in different breeding programs. For example, Jagger, Ogallala, and Trego could be found in many accessions from the southern and central Great Plains breeding programs, and Wesley could be found in many accessions from Nebraska and South Dakota breeding programs. Jagger has many desirable fitness traits such

as resistance to aluminum toxicity, stripe rust (caused by Puccinia striiformis), Soil-borne wheat mosaic virus, Wheat spindle streak mosaic virus, tan spot [caused by Pyrenophora tritici-repentis (Died.) Drechs], speckled leaf blotch (caused by Septoria tritici Roberge), with broad adaptation across the southern Great Plains (http://www. ars-grin.gov/; verified 11 Aug. 2010), any one of which would contribute to its popularity in southern Great Plains breeding programs. Similarly, Wesley has desirable breadmaking quality, resistance to a series of diseases, high yield potential, and good adaptation in the north central Great Plains. These results suggested that breeding activities with different objectives in different geographic regions imparted significant impact on the genetic structure of breeding populations.

Genome-Wide Linkage Disequilibrium Genome-wide LD and the magnitude of its distribution across genomes and chromosomes may significantly affect the power of association mapping and effectiveness of marker-assisted breeding (Sorrells and Yu, 2009). Usually, a higher level of LD is expected in hexaploid wheat than in maize and sorghum because of the rapid rate of inbreeding in wheat with a high degree of self-pollination. In this study, genome-wide LD was reflected by r2-values that abruptly declined to 0.1 within 10 cM when all mapped loci were analyzed (Fig. 6). The LD estimated in this study was similar to that reported by Chao et al. (2007) but still larger than that reported by Somers et al. (2007) for Canadian hard red spring wheat breeding lines. Therefore, a high level of LD in many chromosome


125

regions of the population suggests that association mapping with SSR markers can be an effective method for QTL identification and validation in these regions. Marker-assisted selection can also be an effective tool for integrating these QTL into elite breeding lines. However, varied levels of LD were observed across chromosomes or genomes. Low LD observed in the majority of each genome suggests that map resolution can be improved in many important regions of chromosomes by association mapping, especially for chromosomes 4D, 5A, and all B chromosomes where LD blocks were mainly distributed near the centromere (Devos and Gale, 2000). Sandhu and Gill (2002) reported that gene density was low in the centromere region but high in the distal part of chromosomes. Thus, it is possible to improve map resolution through association mapping in these gene-rich regions, especially as more single-nucleotide polymorphism (SNP) markers become available in wheat. Genetic distance used for LD calculations in this study was based on the consensus map from mapping populations (Somers et al., 2004), and thus the distances may not reflect actual genetic distances in the population. Further, actual LD may differ among populations and need to be evaluated for each population on a caseby-case basis (Breseghello and Sorrells, 2006). Some alien chromosome translocations may also affect LD in the population. Currently high-density SNP maps that are ideal for LD estimation are still not available in wheat because of unavailability of the whole genome sequence and the complexity of the wheat genome. Nevertheless, nearly all LD studies are empirically predicated on genetic distances in the SSR consensus map (Somers et al., 2004). Whereas the consensus map has proven useful for linkage mapping of many important genes–QTL (Arbelbide and Bernardo, 2006; Pushpendra et al., 2007; Santra et al., 2008), and although genetic distance among markers may vary between populations, it is still the most appropriate map currently available for LD analysis in wheat. Therefore, the LD estimates from this study provide a valuable addition to the current literature on LD in wheat. A more precise estimation can be generated when a high-resolution SNP map is available. In summary, this study demonstrated that modern breeding practices still maintain reasonable genetic diversity in major U.S. winter wheat gene pools. However, introduction of useful genes from other countries or gene pools may broaden U.S. wheat genetic diversity. Two subpopulations, hard vs. soft, were clearly structured, which matched with their market classification and geographic distribution. Higher genetic diversity in hard winter wheat than soft winter wheat suggests that gene flow from hard wheat to soft may enhance genetic diversity of soft gene pool. LD analysis identified significant LD blocks, especially in centromere regions, and low-density markers in these regions may be sufficient for association mapping and marker-assisted selection. However, the majority of genomes have low LD, especially in many gene-rich regions, and thus association 126

mapping in these regions has potential to significantly improve map resolution for cloning many important genes when higher density marker coverage with new marker systems such as SNP is available in wheat. Acknowledgments Th is project is partly funded by the NRI of the USDA CSREES, CAP grant number 2006-55606-16629. Mention of trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. Contribution no. 10-166-J from the Kansas Agricultural Experiment Station, Manhattan, KS.

References Arbelbide, M., and R. Bernardo. 2006. Mixed-model QTL mapping for kernel hardness and dough strength in bread wheat. Theor. Appl. Genet. 112:885–890. Bai, G., P. Guo, and F.L. Kolb. 2003. Genetic relationships among head blight resistant cultivars of wheat assessed on the basis of molecular markers. Crop Sci. 43:498–507. Botstein, D. 1980. A theory of modular evolution for bacteriophages. Ann. N.Y. Acad. Sci. 354:484–490. Bradbury, P.J., Z. Zhang, D.E. Kroon, T.M. Casstevens, Y. Ramdoss, and E.S. Buckler. 2007. TASSEL: Soft ware for association mapping of complex traits in diverse samples. Bioinformatics 23:2633–2635. Breseghello, F., and M.E. Sorrells. 2006. Association mapping of kernel size and milling quality in wheat (Triticum aestivum L.) cultivars. Genetics 172:1165–1177. Chao, S., W. Zhang, J. Dubcovsky, and M. Sorrells. 2007. Evaluation of genetic diversity and genome-wide linkage disequilibrium among U.S. wheat (Triticum aestivum L.) germplasm representing different market class. Crop Sci. 47:1018–1030. Cleveland, W.S. 1979. Robust locally weighted regression and smoothing scatterplots. J. Am. Statist. Assoc. 74:829–836. Cox, T.S., J.P. Murphy, and D.M. Rodgers. 1986. Changes in genetic diversity in the red winter wheat regions of the United States. Proc. Natl. Acad. Sci. USA 83:5583–5586. Devos, K.M., and M.D. Gale. 2000. Genome relationships: The grass model in current research. Plant Cell 12:637–646. Dreisigacker, S., P. Zhang, M.L. Warburton, M.V. Ginkel, D. Hoisington, M. Bohn, and A.E. Melchinger. 2004. SSR and pedigree analyses of genetic diversity among CIMMYT wheat lines targeted to different megaenvironments. Crop Sci. 44:381–388. Excoffier, L., G. Laval, and S. Schneider. 2005. Arlequin (version 3.0): An integrated soft ware package for population genetics data analysis. Evol. Bioinform. Online 1:47–50. Flint-Garcia, S.A., J.M. Thornsberry, and E.S.T. Buckler. 2003. Structure of linkage disequilibrium in plants. Annu. Rev. Plant Biol. 54:357–374. Hogg, A.C., T. Sripo, B. Beecher, J.M. Martin, and M.J. Giroux. 2004. Wheat puroindolines interact to form friabilin and control wheat grain hardness. Theor. Appl. Genet. 108:1089–1097. Hoisington, D., M. Khairallah, T. Reeves, J.M. Ribaut, B. Skovmand, S. Taba, and M. Warburton. 1999. Plant genetic resources: What can they contribute toward increased crop productivity? Proc. Natl. Acad. Sci. USA 96:5937–5943. Huang, Q., A. Borner, S. Roder, and W. Ganal. 2002. Assessing genetic diversity of wheat (Triticum aestivum L.) germplasm using microsatellite markers. Theor. Appl. Genet. 105:699–707. Kim, S., V. Plagnol, T.T. Hu, C. Toomajian, R.M. Clark, S. Ossowski, J.R. Ecker, D. Weigel, and M. Nordborg. 2007. Recombination and linkage disequilibrium in Arabidopsis thaliana. Nat. Genet. 39:1151–1155. Kovach, W.L. 1998. MVSP- A Multivariate Statistical Package for Windows. Release 3.0. Kovach, W.L., Pentraeth, Wales, UK. Liu, K., and S.V. Muse. 2005. PowerMarker: An integrated analysis environment for genetic marker analysis. Bioinformatics 21:2128–2129. Maccaferri, M., M.C. Sanguineti, V. Natoli, J.L. Ortega, M.B. Salem, J. Bort, C. Chenenaoui, E. De Ambrogio, L.G. del Moral, and A. De Montis. 2006. A panel of elite accessions of durum wheat (Triticum THE PL ANT GENOME

Q

SEPTEMBER 2010

Q

VOL . 3, NO . 2

durum Desf.) suitable for association mapping studies. Plant Genet. Resour. 4:79–85. Maccaferri, M., M.C. Sanguineti, E. Noli, and R. Tuberosa. 2005. Population structure and long-range linkage disequilibrium in a durum wheat elite collection. Mol. Breed. 15:271–289. Matthews, D.E., V.L. Carollo, G.R. Lazo, and O.D. Anderson. 2003. GrainGenes, the genome database for small-grain crops. Nucleic Acids Res. 31:183–186. Nei, M. 1973. Analysis of gene diversity in subdivided populations. Proc. Natl. Acad. Sci. USA 70:3321–3323. Plaschke, J., M.W. Ganal, and M.S. Röder. 1995. Detection of genetic diversity in closely related bread wheat using microsatellite markers. Theor. Appl. Genet. 91:1001–1007. Pritchard, J.K., M. Stephens, and P. Donnelly. 2000. Inference of population structure using multilocus genotype data. Genetics 155:945–959. Pushpendra, K.G., S.B. Harindra, L.K. Pawan, K. Neeraj, K. Ajay, R.M. Reyazul, M. Amita, and K. Jitendra. 2007. QTL analysis for some quantitative traits in bread wheat. J. Zhejiang Univ. Sci. B 8:807–814. Roder, M.S., V. Korzun, K. Wendehake, J. Plaschke, M.H. Tixier, P. Leroy, and M.W. Ganal. 1998. A microsatellite map of wheat. Genetics 149:2007–2023. Roussel, V., J. Koenig, M. Beckert, and F. Balfourier. 2004. Molecular diversity in French bread wheat accessions related to temporal trends and breeding programmes. Theor. Appl. Genet. 108:920–930. Saghai-Maroof, M.A., K.M. Soliman, R.A. Jorgensen, and R.W. Allard. 1984. Ribosomal DNA spacer-length polymorphisms in barley: Mendelian inheritance, chromosomal location, and population dynamics. Proc. Natl. Acad. Sci. USA 81:8014–8018. Sandhu, D., and K.S. Gill. 2002. Gene-containing regions of wheat and the other grass genomes. Plant Physiol. 128:803–811. Santra, D.K., X.M. Chen, M. Santra, K.G. Campbell, and K.K. Kidwell. 2008. Identification and mapping QTL for high-temperature

adult-plant resistance to stripe rust in winter wheat (Triticum aestivum L.) cultivar ‘Stephens’. Theor. Appl. Genet. 117:793–802. Somers, D.J., T. Banks, R. Depauw, S. Fox, J. Clarke, C. Pozniak, and C. McCartney. 2007. Genome-wide linkage disequilibrium analysis in bread wheat and durum wheat. Genome 50:557–567. Somers, D.J., P. Isaac, and K. Edwards. 2004. A high-density microsatellite consensus map for bread wheat (Triticum aestivum L.). Theor. Appl. Genet. 109:1105–1114. Song, B.H., A.J. Windsor, K.J. Schmid, S. Ramos-Onsins, M.E. Schranz, A.J. Heidel, and T. Mitchell-Olds. 2009. Multilocus patterns of nucleotide diversity, population structure and linkage disequilibrium in Boechera stricta, a wild relative of Arabidopsis. Genetics 181:1021–1033. Song, Q.J., J.R. Shi, S. Singh, E.W. Fickus, J.M. Costa, J. Lewis, B.S. Gill, R. Ward, and P.B. Cregan. 2005. Development and mapping of microsatellite (SSR) markers in wheat. Theor. Appl. Genet. 110:550–560. Sorrells, M.E., and J. Yu. 2009. Linkage disequilibrium and association mapping in the Triticeae. p. 655–683. In C. Feuillet and G.J. Muehlbauer (ed.) Genetics and genomics of the Triticeae, Vol. 7. Springer. Tenaillon, M.I., M.C. Sawkins, A.D. Long, R.L. Gaut, J.F. Doebley, and B.S. Gaut. 2001. Patterns of DNA sequence polymorphism along chromosome 1 of maize (Zea mays ssp. mays L.). Proc. Natl. Acad. Sci. USA 98:9161–9166. Weir, B.S. 1996. Genetic data analysis II. Sinauer Associated, Inc., Sunderland, MA. White, J., J.R. Law, I. MacKay, K.J. Chalmers, J.S. Smith, A. Kilian, and W. Powell. 2008. The genetic diversity of UK, US and Australian cultivars of Triticum aestivum measured by DArT markers and considered by genome. Theor. Appl. Genet. 116:439–453.


127