Genomic Selection using Multiple Populations

RESEARCH

T. Schulz-Streeck, J. O. Ogutu, Z. Karaman, C. Knaak, and H. P. Piepho*

ABSTRACT

Using different populations in genomic selection raises the possibility of marker effects varying across populations. However, common models for genomic selection only account for the main marker effects, assuming that they are consistent across populations. We present an approach in which the main plus population-specific marker effects are simultaneously estimated in a single mixed model. Cross-validation is used to compare the predictive ability of this model to that of the ridge regression best linear unbiased prediction (RR-BLUP) method involving only either the main marker effects or the population-specific marker effects. We used a maize (Zea mays L.) data set with 312 genotypes derived from five biparental populations, which were genotyped with 39,339 markers. A combined analysis incorporating genotypes for all the populations and hence using a larger training set was better than separate analyses for each population. Modeling the main plus the population-specific marker effects simultaneously improved predictive ability only slightly compared with modeling only the main marker effects. The performance of the RR-BLUP method was comparable to that of two regularization methods, namely the ridge regression and the elastic net, and was more accurate than that of the least absolute shrinkage and selection operator (LASSO). Overall, combining information from related populations and increasing the number of genotypes improved predictive ability, but further allowing for population-specific marker effects yielded only a minor improvement.

T. Schulz-Streeck, J.O. Ogutu, and H.P. Piepho, Bioinformatics Unit, Institute of Crop Science, University of Hohenheim, Fruwirthstrasse 23, 70599 Stuttgart, Germany; T. Schulz-Streeck and C. Knaak, KWS SAAT AG, Grimsehlstraße 31, 37555 Einbeck, Germany; Z. Karaman, Limagrain Europe, CS 3911, 63720 Chappes, France. Received 7 Mar. 2012. *Corresponding author ([email protected]). Abbreviations: AIC, Akaike information criterion; GS, genomic selection; LASSO, least absolute shrinkage and selection operator; LD, linkage disequilibrium; QTL, quantitative trait locus/loci; RMSE, root mean squared error; RR-BLUP, ridge regression best linear unbiased prediction; SNP, single nucleotide polymorphism.

Published in Crop Sci. 52:2453–2461 (2012). doi: 10.2135/cropsci2012.03.0160 © Crop Science Society of America | 5585 Guilford Rd., Madison, WI 53711 USA. All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permission for printing and for reprinting the material contained herein has been obtained by the publisher.

Genomic selection (GS) is a marker-based method for predicting genomic breeding values (Meuwissen et al., 2001) that involves simultaneous estimation of the effects of many markers or chromosomal segments. Two assumptions fundamental to GS are that the markers cover the whole genome so that each quantitative trait locus (QTL) is linked to at least one marker or chromosomal segment and that the marker effects are consistent across different genotypes.

In animal breeding, more accurate genomic predictions can be achieved by combining genotypes from different breeds into one training set (e.g., de Roos et al., 2009; Ibáñez-Escriche et al., 2009; Hayes et al., 2009b). This can be particularly useful if the training set for one of the pooled populations is too small to reliably perform within-population evaluation (de Roos et al., 2009). However, combining different breeds in one analysis has inherent problems: prediction accuracy then varies with such factors as marker density, the number of training genotypes, how closely related the different populations are, and the heritability of the traits of interest (de Roos et al., 2009; Ibáñez-Escriche et al., 2009). These problems make it hard to attain a consistent increase in prediction accuracy for certain traits in combined analyses involving information from different breeds (Hayes et al., 2009b).

In plant breeding, different populations (crosses) are often combined for analysis to increase the number of genotypes and the predictive accuracy of GS (e.g., Albrecht et al., 2011; Crossa et al., 2010; Heslot et al., 2012; Zhao et al., 2012). Moreover, with a combined analysis a subset of individuals from each population need not be first phenotyped in the target set of environments. But in maize breeding, conducting separate within-population analyses may yield relatively high gains in prediction accuracy, in particular if the number of genotypes within each population is also relatively large, because prediction accuracy can vary markedly among different populations (Lorenzana and Bernardo, 2009; Albrecht et al., 2011; Heslot et al., 2012; Schulz-Streeck et al., 2012). Nevertheless, conducting separate analyses for each population may not always improve predictive accuracy and may sometimes be less accurate than a combined analysis that exploits information from related populations (Jannink et al., 2010; Albrecht et al., 2011; Riedelsheimer et al., 2012; Zhao et al., 2012). Therefore, although often desirable, a combined analysis is not always optimal and can lead to inconsistent estimates of marker effects among populations (Liu et al., 2011).

In animal breeding, Ibáñez-Escriche et al. (2009) proposed a breed-specific single nucleotide polymorphism (SNP) allele model in which breed-specific substitution effects for alleles are estimated based on the population origin of the allele and whether the alleles originate from a sire or a dam.
This model did not, however, improve prediction accuracy at high marker densities for breed-specific analyses but did so at low marker densities in some cases. Even so, high marker densities are required for multiple populations because haplotype segments in strong linkage disequilibrium (LD) are often shorter for multiple than for single populations (Toosi et al., 2010).

In many GS studies, the genetic data are subjected to quality control (e.g., Hayes et al., 2009a; Albrecht et al., 2011; Crossa et al., 2010). Besides the genetic data, the quality of the phenotypic data should be tested. This is typically done for plant breeding data (Fox et al., 1997), and observations with unexpected values are excluded from the data set. But the frequent use of unreplicated field trial designs in plant breeding programs complicates the detection of outliers. In particular, because of the lack of replicate observations on genotypes within locations, using genotypes as fixed effects makes it impossible to conduct residual diagnostics for outlier detection. This difficulty can be overcome by using marker information to make additional diagnostic methods available for phenotypic data from unreplicated field trials. Yet in most GS studies, quality control for the phenotypic observations is rarely described adequately.

Here, we combine the idea of modeling the main marker effects, which are consistent across all populations, with that of modeling population-specific marker effects into one mixed model. We then compare the predictive ability of this combined model with that of (i) the common ridge regression best linear unbiased prediction (RR-BLUP) model, which involves only the main marker effects, and (ii) a population-specific analysis, involving only population-specific marker effects. Additionally, we compare the different models by varying the number of markers. Moreover, we show how the quality of phenotypic data can be checked using the marker information. Lastly, we briefly compare the performance of the RR-BLUP models against those of three widely used regularization methods, namely ridge regression, the least absolute shrinkage and selection operator (LASSO), and the elastic net.

METHODS

Data Set

We analyzed a data set provided by AgReliant Genetics containing 568 doubled haploid maize lines derived from five different biparental populations. The hybrid performance for kernel dry weight was assessed with the same common tester using unreplicated testcross genotypes. We additionally analyzed grain moisture to exemplify a trait with a high heritability. The testcross genotypes were tested in five different locations in 1 yr, but not every testcross genotype was tested at every location. An augmented design with 20 trials, each with two incomplete blocks, was used at all locations. In each trial, a subset of the testcross genotypes was tested. The standard varieties but not the testcross genotypes were replicated (i.e., planted in all blocks) at each location.

From the total of 568 doubled haploid maize lines, 312 were genotyped with 39,339 SNPs. The marker information was stored in a matrix $\mathbf{M} = \{m_{ik}\}$. The marker covariate $m_{ik}$ for the ith genotype (i = 1, 2, …, G) and the kth marker (k = 1, 2, …, M), for biallelic SNP markers with alleles A1 and A2, was coded as 1 for A1A1, –1 for A2A2, and 0 for A1A2, A2A1, or missing values. Excluding markers with minor allele frequencies smaller than 5%, more than 15% missing values, or more than 5% heterozygous observations from the data set reduced the total number of markers to 24,448. Genotypes were excluded from the data set if more than 5% of their markers were heterozygous or inconsistent with the parental genotypes. Overall, 33 testcross genotypes were excluded.

The relationships between the genotypes and populations were estimated using a principal component analysis of the marker data and visualized by plotting the scores on the first two principal components obtained from a singular-value decomposition of the matrix with the marker information.
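The marker filtering and the principal component scores described above can be sketched in Python with NumPy. This is a minimal illustration and not the authors' code: the function names, the use of NaN for missing calls in the raw input, the choice of denominators (called genotypes for the allele frequency and the heterozygosity rate), and the column centering before the singular-value decomposition are all assumptions.

```python
import numpy as np

def filter_markers(raw, maf_min=0.05, max_missing=0.15, max_het=0.05):
    """Return a boolean mask of markers to keep and the recoded matrix M.

    raw: genotypes x markers array coded 1 (A1A1), -1 (A2A2), 0 (heterozygous),
    with NaN for missing calls (an assumed input convention).
    """
    called = ~np.isnan(raw)
    n_called = np.maximum(called.sum(axis=0), 1)   # guard against division by zero
    missing_rate = 1.0 - called.mean(axis=0)
    het_rate = (raw == 0).sum(axis=0) / n_called
    # The mean code equals p(A1) - p(A2), so p(A1) = (mean + 1) / 2.
    freq_a1 = (np.nansum(raw, axis=0) / n_called + 1.0) / 2.0
    maf = np.minimum(freq_a1, 1.0 - freq_a1)
    keep = (maf >= maf_min) & (missing_rate <= max_missing) & (het_rate <= max_het)
    # As in the text, missing values are coded 0 in the final covariate matrix.
    M = np.nan_to_num(raw[:, keep], nan=0.0)
    return keep, M

def pc_scores(M, k=2):
    """Scores on the first k principal components via the SVD of the centered matrix."""
    centered = M - M.mean(axis=0)                  # centering is an assumption
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :k] * s[:k]
```

Plotting the two columns returned by `pc_scores(M)` against each other, colored by population, would give a display of the kind shown in Fig. 1.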

Phenotypic Quality Control

Before phenotypic analysis, we used three complementary approaches to identify and eliminate outlying phenotypic observations from the data set.

In the first approach, the marker information was used to identify problematic observations for each trial and location combination. The model (I) accounted for the effects of the testcross population mean, the fixed effects of the standard varieties, the random effects of the testcross main effects, the within-location incomplete block effects, and the plot error. The variance–covariance structure for the genotyped testcrosses was assumed to be a linear function of the relationship matrix of the markers (Piepho, 2009) whereas that for the testcrosses without marker information was assumed to be proportional to an independent matrix. Note that the testcrosses without marker information in each trial came from the same population. In cases where genotypes came from different populations, we replaced the independent matrix with a pedigree matrix. The random within-location incomplete block effect and the error were assumed to be homogeneous and independent. Phenotypic observations were dropped from the data set if the studentized residual of the estimated trait in model (I) was larger than 3 in absolute value. To enable residual diagnostics to be performed to identify outlying phenotypic observations, if any, testcross genotypes were represented as random effects in model (I). It is important to note that representing the testcross genotypes as fixed effects would render such residual analysis infeasible because of the lack of replicate observations on testcross genotypes within locations.

In the second approach, we formulated a model (II) in which the interaction between testcross genotypes and location contributed to the residual; thus we now analyzed each trial across all five locations. We excluded all observations with studentized residuals exceeding 3 in absolute value from the data set, based on the model in which genotype × location interaction was not explicitly included but implicitly contributed to the residual error term. In contrast to model (I), which uses only the unreplicated within-location information on testcross genotypes, model (II) exploits the replication of testcross genotypes across locations.
More precisely, model (II) accounts for fixed effects of testcrosses, locations, within-location incomplete blocks, and the error contributed by both the genotype × location interaction and the plot error. A homogeneous variance across locations was assumed for this error term.

In the third approach, locations for which phenotypic observations for the testcross genotypes for each trial had very low correlations (less than 0.1 with at least three other locations) with corresponding observations from the other locations were also assumed to be outlying and were therefore excluded from the data set. To identify such locations, we assumed that, because the same genotypes were planted in each location, corresponding observations of all the testcross genotypes should be positively correlated between pairs of locations for each trial. The model accounted for fixed effects of locations, within-location incomplete blocks, standard variety × location interaction effects, and random effects of testcross genotype × location interaction and the error term. The variance–covariance for genotype × location interaction effects was assumed to be unstructured, with a separate correlation for each pair of locations. The observations from one location that had no or only partial tilling had unusually low correlations with observations from the other locations.

Overall, the three quality-control procedures suggested excluding 110 observations for kernel dry weight and 146 observations for grain moisture from the data set. Of the outlying observations, 23 for kernel dry weight and 34 for grain moisture had extreme values, that is, deviated by more than two standard deviations from their respective means.
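The two screening rules used above (studentized residuals larger than 3 in absolute value, and locations whose correlation with at least three other locations falls below 0.1) can be illustrated with a short Python sketch. It is deliberately simplified and rests on assumptions: an ordinary least-squares fit stands in for the mixed models (I) and (II), the correlations are plain pairwise Pearson correlations rather than estimates from an unstructured variance–covariance model, and all function names are hypothetical.

```python
import numpy as np

def studentized_residuals(X, y):
    """Internally studentized residuals from an ordinary least-squares fit."""
    hat = X @ np.linalg.pinv(X)                    # projection ("hat") matrix
    resid = y - hat @ y
    dof = X.shape[0] - np.linalg.matrix_rank(X)    # residual degrees of freedom
    sigma2 = resid @ resid / dof
    return resid / np.sqrt(sigma2 * (1.0 - np.diag(hat)))

def flag_outliers(X, y, cutoff=3.0):
    """Indices of observations whose studentized residual exceeds the cutoff."""
    return np.flatnonzero(np.abs(studentized_residuals(X, y)) > cutoff)

def low_correlation_locations(Y, threshold=0.1, min_low=3):
    """Flag locations (columns of Y) that correlate poorly with other locations.

    Y: genotypes x locations array of observations for one trial, NaN where a
    genotype was not observed. A location is flagged if its pairwise Pearson
    correlation is below `threshold` with at least `min_low` other locations.
    """
    n_loc = Y.shape[1]
    flagged = []
    for a in range(n_loc):
        n_low = 0
        for b in range(n_loc):
            if a == b:
                continue
            both = ~np.isnan(Y[:, a]) & ~np.isnan(Y[:, b])
            if both.sum() < 3:                     # too few shared genotypes
                continue
            r = np.corrcoef(Y[both, a], Y[both, b])[0, 1]
            if r < threshold:
                n_low += 1
        if n_low >= min_low:
            flagged.append(a)
    return flagged
```

In the paper the residuals come from mixed models with marker-based covariance structures, so a faithful reproduction would need a mixed-model package; the sketch only shows how the thresholds are applied once residuals and correlations are available.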

Phenotypic Analysis

Phenotypic analysis for computing genotype means was performed for all five populations using the 535 testcross genotypes and associated observations selected using the above three quality-control procedures. The model used for phenotypic analysis was as follows:

$$\mathbf{y} = \mathbf{X}_a \boldsymbol{\beta}_a + \mathbf{Z}_a \mathbf{u}_a + \mathbf{Z}_b \mathbf{u}_b + \mathbf{Z}_c \mathbf{u}_c + \mathbf{Z}_d \mathbf{u}_d + \mathbf{e}, \tag*{[1]}$$

in which $\mathbf{y}$ is the observed data vector of the target trait (kernel dry weight or grain moisture), $\mathbf{X}_a$ is the design matrix for the fixed testcross genotypic effect ($\boldsymbol{\beta}_a$), $\mathbf{Z}_a$, $\mathbf{Z}_b$, $\mathbf{Z}_c$, and $\mathbf{Z}_d$ are design matrices for the random effects, $\mathbf{u}_a$ is a vector of random environment effects with $\operatorname{var}(\mathbf{u}_a) = \mathbf{I}\sigma_a^2$, $\mathbf{u}_b$ is a vector of random genotype–environment effects with $\operatorname{var}(\mathbf{u}_b) = \mathbf{I}\sigma_b^2$, $\mathbf{u}_c$ is a vector of random trial effects with $\operatorname{var}(\mathbf{u}_c) = \mathbf{I}\sigma_c^2$, $\mathbf{u}_d$ is a vector of random within-environment incomplete block effects with $\operatorname{var}(\mathbf{u}_d) = \mathbf{I}\sigma_d^2$, $\mathbf{e}$ is a vector of plot errors with $\operatorname{var}(\mathbf{e}) = \mathbf{I}\sigma_{e(l)}^2$, and $\sigma_{e(l)}^2$ is the error variance for the lth location. The Akaike information criterion (AIC) favored using a heterogeneous over a homogeneous error variance. Adjusted means for the genotypes were estimated using the AIC-selected best model.

Genomic Selection

The adjusted means for the testcross genotypes were submitted to the GS stage. Adjusted means for the standard varieties and the testcross genotypes without marker information were excluded from the data set before submission to the GS stage.

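Method A (Main Marker Effects)

Genomic selection considering only the main marker effects was performed using the RR-BLUP method (Piepho, 2009) and the adjusted means of the 279 (n) testcross genotypes with marker information:

$$\mathbf{y}_2 = \mathbf{1}_n \mu + \mathbf{M}\mathbf{u} + \mathbf{e}, \tag*{[2]}$$

in which $\mathbf{y}_2$ is an n-vector of adjusted means per genotype, $\mathbf{1}_n$ is an n-vector of ones, $\mu$ is a common intercept, $\mathbf{M}$ is an n × p covariate matrix of p SNP markers for n tested genotypes, $\mathbf{u}$ is a vector of random SNP effects, and $\mathbf{e}$ is a vector of errors with $\operatorname{var}(\mathbf{e}) = \mathbf{I}\sigma_e^2$. Under the assumed model, the variance of the observed data is given by $\mathbf{V} = \operatorname{var}(\mathbf{y}_2) = \boldsymbol{\Gamma}\sigma_u^2 + \mathbf{I}\sigma_e^2$, in which $\boldsymbol{\Gamma} = \mathbf{M}\mathbf{M}^T$ and $\mathbf{M}^T$ is the transpose of $\mathbf{M}$, the matrix with the marker information. Because the number of markers far exceeds that of genotypes, it is computationally more efficient to rewrite model [2] (Piepho et al., 2012) as

$$\mathbf{y}_2 = \mathbf{1}_n \mu + \mathbf{g}_{snp} + \mathbf{e}, \tag*{[3]}$$

in which $\mathbf{g}_{snp} = \mathbf{M}_{snp}\mathbf{u}_{snp}$ with $\operatorname{var}(\mathbf{g}_{snp}) = \boldsymbol{\Gamma}\sigma_{snp}^2$. Here $\mathbf{M}_{snp}$, $\mathbf{u}_{snp}$, and $\sigma_{snp}^2$ correspond to $\mathbf{M}$, $\mathbf{u}$, and $\sigma_u^2$ in model [2].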
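The computational point behind models [2] and [3] can be made concrete with a small NumPy sketch: working with the n × n matrix Γ = MMᵀ gives the same marker-effect predictions as solving the p × p ridge system, which is far larger when p greatly exceeds n. The sketch assumes the variance ratio `lam` = σ²ₑ/σ²ₛₙₚ is already known, whereas the paper estimates the variance components within the mixed model; the function name and interface are hypothetical.

```python
import numpy as np

def rrblup_kernel(M, y, lam):
    """RR-BLUP in the genotype (n x n) form of model [3].

    M   : n x p marker covariate matrix
    y   : n-vector of adjusted means
    lam : assumed known ratio sigma_e^2 / sigma_snp^2
    Returns the intercept estimate, the predicted genotypic values g_snp,
    and the marker effects recovered from them.
    """
    n = M.shape[0]
    gamma = M @ M.T                                   # Gamma = M M^T
    v_inv = np.linalg.inv(gamma + lam * np.eye(n))    # V^{-1} up to a constant factor
    ones = np.ones(n)
    mu = (ones @ v_inv @ y) / (ones @ v_inv @ ones)   # generalized least squares intercept
    alpha = v_inv @ (y - mu)
    g_hat = gamma @ alpha                             # BLUP of g_snp
    u_hat = M.T @ alpha                               # BLUP of the marker effects
    return mu, g_hat, u_hat
```

Predictions for genotypes outside the training set would then be `mu + M_new @ u_hat`, with `M_new` holding their marker covariates. The identity Mᵀ(MMᵀ + λI)⁻¹ = (MᵀM + λI)⁻¹Mᵀ guarantees that `u_hat` equals the solution of the p × p system.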


Method B (Population-Specific Marker Effects)

We also performed GS by considering each population separately and therefore using only the population-specific marker effects. The model is identical to model [3] except that it is applied separately to each of the five populations.

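Method C (Main and Population-Specific Marker Effects)

We combined the models in methods A and B into one model by partitioning the marker effect ($\mathbf{u}$) into the main ($\mathbf{u}_{snp}$) and population-specific ($\mathbf{u}_{psnp}$) effects as follows:

$$\mathbf{y}_2 = \mathbf{1}_n \mu + \mathbf{Z}_p \mathbf{u}_p + \mathbf{M}_{snp}\mathbf{u}_{snp} + \mathbf{M}_{psnp}\mathbf{u}_{psnp} + \mathbf{e}, \tag*{[4]}$$

in which $\mathbf{Z}_p$ is a design matrix for the vector of the random population main effects $\mathbf{u}_p$ with $\operatorname{var}(\mathbf{u}_p) = \mathbf{I}\sigma_p^2$, $\mathbf{u}_{psnp}$ is a vector of random population-specific marker effects with $\operatorname{var}(\mathbf{u}_{psnp}) = \mathbf{I}\sigma_{psnp}^2$, and the design matrix $\mathbf{M}_{psnp}$ and the other terms are defined similarly as for the preceding model [2]. The variance of the observed data under this model is $\mathbf{V} = \operatorname{var}(\mathbf{y}_2) = \boldsymbol{\Gamma}_p\sigma_p^2 + \boldsymbol{\Gamma}\sigma_{snp}^2 + \boldsymbol{\Gamma}_{psnp}\sigma_{psnp}^2 + \mathbf{I}\sigma_e^2$, in which $\boldsymbol{\Gamma}_p = \mathbf{Z}_p\mathbf{Z}_p^T$, $\boldsymbol{\Gamma}_{psnp} = \mathbf{M}_{psnp}\mathbf{M}_{psnp}^T$, $\boldsymbol{\Gamma}$ is as defined for model [2], and $\mathbf{M}_{psnp}^T$ is the transpose of $\mathbf{M}_{psnp}$, the matrix with the marker information with a block-diagonal structure, with blocks corresponding to the individual populations. For the sake of computational efficiency, in particular when the number of markers is far greater than that of genotypes, it is useful to rewrite model [4] as

$$\mathbf{y}_2 = \mathbf{1}_n \mu + \mathbf{Z}_p \mathbf{u}_p + \mathbf{g}_{snp} + \mathbf{g}_{psnp} + \mathbf{e}, \tag*{[5]}$$

in which $\mathbf{g}_{psnp} = \mathbf{M}_{psnp}\mathbf{u}_{psnp}$ with $\operatorname{var}(\mathbf{g}_{psnp}) = \boldsymbol{\Gamma}_{psnp}\sigma_{psnp}^2$ and the other terms are defined analogously as for model [3].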
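The block-diagonal structure of the population-specific marker matrix has a convenient consequence that a short NumPy sketch can show: the n × n matrix for the population-specific effects is the main-effect matrix Γ with all entries between genotypes of different populations set to zero, so the wide matrix never has to be formed. The column layout of `m_psnp` below (one block of p columns per population) is an assumption consistent with the description in the text, and the function name is hypothetical.

```python
import numpy as np

def population_kernels(M, pop):
    """Build the three n x n matrices used in model [5].

    M   : n x p marker covariate matrix
    pop : length-n sequence of population labels
    Returns Gamma (main effects), Gamma_p (population main effects),
    Gamma_psnp (population-specific marker effects), and the explicit
    block-structured matrix M_psnp for comparison.
    """
    pop = np.asarray(pop)
    labels = np.unique(pop)
    n, p = M.shape
    z_p = (pop[:, None] == labels[None, :]).astype(float)   # n x P indicator matrix
    m_psnp = np.zeros((n, labels.size * p))
    for j, label in enumerate(labels):
        rows = pop == label
        m_psnp[rows, j * p:(j + 1) * p] = M[rows]           # markers in the block of their population
    gamma = M @ M.T
    gamma_p = z_p @ z_p.T                                    # 1 within a population, 0 between
    gamma_psnp = gamma * gamma_p                             # elementwise product
    return gamma, gamma_p, gamma_psnp, m_psnp
```

Because entry (i, j) of `m_psnp @ m_psnp.T` is the marker cross-product of genotypes i and j when both belong to the same population and zero otherwise, it coincides with `gamma * gamma_p`.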

Evaluating Model Performance by Cross-Validation
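As an orientation for the procedure described in this section, the following Python sketch shows a stratified fivefold cross-validation in which each fold receives roughly one-fifth of the genotypes of every population, and predictive ability is summarized by the Pearson correlation and the root mean squared error averaged over validation sets. The function names, the `fit_predict` callback, and the simple ridge predictor with a fixed penalty and a sample-mean intercept are assumptions made for illustration; the paper fits mixed models with estimated variance components.

```python
import numpy as np

def stratified_folds(pop, k=5, seed=0):
    """Assign each genotype to one of k folds, balancing folds within populations."""
    rng = np.random.default_rng(seed)
    pop = np.asarray(pop)
    fold = np.empty(pop.size, dtype=int)
    for label in np.unique(pop):
        idx = rng.permutation(np.flatnonzero(pop == label))
        fold[idx] = np.arange(idx.size) % k
    return fold

def predictive_ability(y, y_hat):
    """Pearson correlation and root mean squared error for one validation set."""
    r = np.corrcoef(y, y_hat)[0, 1]
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))
    return r, rmse

def ridge_fit_predict(M_train, y_train, M_test, lam=1.0):
    """Hypothetical stand-in predictor: ridge regression with a fixed penalty."""
    mu = y_train.mean()                                   # simplified intercept estimate
    n = M_train.shape[0]
    alpha = np.linalg.solve(M_train @ M_train.T + lam * np.eye(n), y_train - mu)
    return mu + M_test @ (M_train.T @ alpha)

def cross_validate(M, y, pop, fit_predict, k=5, n_rep=5):
    """Mean Pearson correlation and mean RMSE over n_rep repeats of k-fold CV."""
    results = []
    for rep in range(n_rep):
        fold = stratified_folds(pop, k, seed=rep)
        for f in range(k):
            test = fold == f
            y_hat = fit_predict(M[~test], y[~test], M[test])
            results.append(predictive_ability(y[test], y_hat))
    return np.mean(results, axis=0)
```

With `k=5` and `n_rep=5` the loop evaluates 25 training and validation sets, matching the count given for the first set of comparisons. Restricting `test` to the genotypes of a single population inside the loop would give population-specific validation sets.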

Fivefold cross-validation was used to comparatively evaluate the predictive abilities of the three mixed models (methods A through C) for GS. We determined how predictive ability varies with increasing number of markers and across different populations. The fivefold cross-validation involved splitting the genotypes into five random subsamples, with approximately one-fifth of the genotypes of each population, each of which was used, in turn, as a validation set whereas all the other four subsamples were concatenated and used as a training set. We made two separate sets of model comparisons using fivefold cross-validation.

In the first set, we combined all the five populations into one data set and then selected each of the five random subsamples for the fivefold cross-validation to have approximately the same proportion of genotypes from each of its five constituent subpopulations. This process was repeated five times to generate a total of 25 different replicate training and validation sets from the combined data set. Each training and validation set contained all available markers (p = 24,448). The fivefold cross-validation was used to assess the predictive ability of each model on the combined data set and on each of its five constituent population-specific data sets. For the combined data set, genotypes from all the different populations were used as the training and the validation sets. However, for the population-specific data sets, genotypes from all the different populations were used as the training set and only genotypes from one of the five populations were used as the validation set.

In the second set, we focused only on kernel dry weight because it is an economically more important trait than grain moisture. We performed separate fivefold cross-validations to compare the three mixed models by varying the number of markers. Unlike in the first set of cross-validations where we used all the markers, we randomly selected subsets of markers (p = 100, 500, 1000, 2500, 5000, 10,000, and 20,000) from the total of 24,448 markers. The selection process was repeated 10 times, resulting in 10 replicate data sets for each prescribed number of markers. For each prescribed subset of markers, this amounted to executing 10 different replicate fivefold cross-validation runs, generating 50 training and 50 validation sets. These were used to evaluate the predictive ability of each of the three mixed models across all the five populations as a function of the number of markers as well as the predictive ability of the models on each population separately.

The Pearson correlation (r) between the adjusted means and their predicted values, calculated for each of the validation sets and averaged across all the replicates of the fivefold cross-validation (n = 5 for the first set or 10 for the second set), was used to measure predictive ability. Predictions for the validation set were obtained for methods A and B as $\hat{\mathbf{y}}_2 = \mathbf{1}\hat{\mu} + \mathbf{M}_{snp}\hat{\mathbf{u}}_{snp}$ and for method C as $\hat{\mathbf{y}}_2 = \mathbf{1}\hat{\mu} + \mathbf{Z}_p\hat{\mathbf{u}}_p + \mathbf{M}_{snp}\hat{\mathbf{u}}_{snp} + \mathbf{M}_{psnp}\hat{\mathbf{u}}_{psnp}$.

A t test was used to compare the mean Pearson correlations derived from the five replicate runs of the fivefold cross-validation for the first set across the three models using all markers. Note that this t test is only approximate. The root mean squared error (RMSE) between the adjusted means and their predicted values was also calculated and similarly averaged across all the replicates of the fivefold cross-validation and used as an additional measure of predictive ability.

RESULTS

Plots of the principal component scores revealed that the five different populations clearly separated out, with only minor overlaps apparent between populations one and two as well as between populations four and five (Fig. 1).

The Pearson correlations (0.283 ≤ r ≤ 0.365) for the combined analysis of kernel dry weight were low but similar for all three mixed models. The predictive abilities (0.649 ≤ r ≤ 0.669) for grain moisture were higher but also similar across all three mixed models. Method B, with only the population-specific marker effects, was somewhat less accurate than the other two methods for both traits (Tables 1 and 2). Considering the five populations separately for kernel dry weight, method C (main plus population-specific marker effects) showed the best performance in three of the five populations and method A (main effects) did so in the other two populations whereas method B showed unexpectedly very low correlations in one population (Table 1). For grain moisture, the population-specific analyses similarly showed that method C performed the best in three populations whereas method A did in two of the five populations. However, the differences between the performances of methods A and C were minor. The RMSE showed a similar ranking of models as the Pearson correlation coefficient did for both traits (Table 2).

Refitting the models to the original data set without first subjecting it to the quality-control procedures resulted in the Pearson correlation coefficient indicating better performance for all three methods for kernel dry weight (Table 1). However, the apparent gain in accuracy was not supported by the RMSE, which showed strong evidence of decreased prediction accuracy for all three methods, thus suggesting that the quality controls undertaken improved model performance for kernel dry weight (Table 2). For grain moisture, the quality-control procedures improved the performance of all three methods based on both the Pearson correlation and RMSE.

When each of the mixed models with the main, population-specific, or both marker effects was fitted to varying numbers of markers (100 to 24,448), three distinct patterns emerged for kernel dry weight (Fig. 2):

Figure 1. Principal component (PC) analysis based on the singular value decomposition on the single nucleotide polymorphism (SNP) data. Genotypes from the same populations are colored identically. The principal components scores of the genotypes on the first two principal components are shown.

Table 1. Comparison of the predictive abilities of kernel dry weight and grain moisture of methods A (main marker effects), B (population-specific marker effects), and C (main plus population-specific marker effects). Predictive ability is the Pearson correlation between the adjusted means and their predicted values for each validation set averaged across all cross-validation replicates. The Pearson correlation was computed for each population and across all populations. Note that correlations for each trait and method combination (column entries) are not tested for significant differences between populations.

| Population | g† | m‡ | Kernel dry weight (quintal ha⁻¹): Method A | Kernel dry weight: Method B | Kernel dry weight: Method C | Grain moisture (g kg⁻¹): Method A | Grain moisture: Method B | Grain moisture: Method C |
|---|---|---|---|---|---|---|---|---|
| 1 | 85 | 10,197 | 0.308 a§ | 0.228 b | 0.272 ab | 0.691 a | 0.699 a | 0.701 a |
| 2 | 60 | 12,288 | 0.223 a | 0.193 a | 0.245 a | 0.462 a | 0.435 a | 0.471 a |
| 3 | 56 | 11,275 | 0.277 a | 0.153 b | 0.230 ab | 0.640 a | 0.615 b | 0.619 b |
| 4 | 40 | 10,608 | 0.230 a | 0.024 b | 0.289 a | 0.528 a | 0.502 b | 0.532 a |
| 5 | 38 | 11,313 | 0.294 a | 0.180 a | 0.269 a | 0.529 a | 0.395 b | 0.529 a |
| All | 279 | 24,448 | 0.348 b | 0.283 c | 0.365 a | 0.669 a | 0.649 b | 0.669 a |
| All without quality control¶ | 312 | 32,878 | 0.494 a | 0.452 b | 0.474 a | 0.586 a | 0.537 b | 0.585 a |

† Number of genotypes.
‡ Number of polymorphic markers.
§ Correlations for each population and trait combination (row entries) followed by a common letter are not significantly different between methods (p < 0.05).
¶ All markers and all testcross genotypes were used without first applying any quality control.

Table 2. Root mean squared errors between adjusted means and their predicted values for each validation set averaged across all cross-validation replicates for methods A (main marker effects), B (population-specific marker effects), and C (main plus population-specific marker effects). Note that for each method and trait combination (column entries), root mean squared errors are not tested for significant differences between populations.

| Population | Kernel dry weight (quintal ha⁻¹): Method A | Kernel dry weight: Method B | Kernel dry weight: Method C | Grain moisture (g kg⁻¹): Method A | Grain moisture: Method B | Grain moisture: Method C |
|---|---|---|---|---|---|---|
| 1 | 4.458 a† | 4.545 a | 4.457 a | 1.094 a | 1.080 a | 1.078 a |
| 2 | 8.488 b | 8.699 b | 7.472 a | 1.593 a | 1.606 a | 1.587 a |
| 3 | 5.809 a | 6.038 a | 6.015 a | 1.673 a | 1.738 c | 1.705 b |
| 4 | 5.151 a | 5.342 b | 5.003 a | 1.226 a | 1.227 a | 1.217 a |
| 5 | 7.883 a | 8.245 a | 7.609 a | 1.510 a | 1.652 b | 1.513 a |
| All | 6.371 b | 6.599 c | 6.065 a | 1.414 a | 1.445 b | 1.415 a |
| All without quality control‡ | 11.851 a | 12.189 c | 12.016 b | 1.642 a | 1.695 a | 1.633 a |

† Root mean squared errors for each population and trait combination (row entries) followed by a common letter are not significantly different between methods (p < 0.05).
‡ All markers and all testcross genotypes were used without first applying any quality control.

1. For all three mixed models, performance increased monotonically with increasing number of markers and leveled off at p > 2500 markers regardless of the type of marker effects included in the model (main, population-specific, or both; Fig. 2).
2. The performances of methods A and C with the main and the main plus the population-specific marker effects, respectively, were similar, especially at high numbers (p > 1000) of markers, and were consistently better than that for the population-specific method B (Fig. 2).
3. Method C with both the main plus the population-specific marker effects performed consistently slightly better than method A with only the main marker effects, especially at low numbers (p ≤ 1000) of markers (Fig. 2).

The performances of the three methods, when each of the five populations was considered separately, revealed important similarities with their performances on the full validation data set but with some differences. First, none of the three methods was ranked as being clearly the best across the full range of the different subsets of markers evaluated (Fig. 3). Second, there were large differences in predictive abilities of each of the three methods across the five different populations (Fig. 3).

Figure 2. Comparison of the predictive abilities of methods A (main marker effects), B (population-specific marker effects), and C (main plus population-specific marker effects) for grain moisture focusing on the number of markers. Predictive ability is the Pearson correlation between the adjusted means and their predicted values for each validation set averaged across all cross-validation replicates. The Pearson correlation was calculated using the combined relevant data from all the populations. For each prescribed number of markers (p = 100, 500, 1000, 2500, 5000, 10,000, and 20,000), 10 random subsets of markers were selected from the available total of 24,448 markers.

DISCUSSION

Predictive ability was higher for the combined analysis of all five populations than for the population-specific analyses for both traits, reinforcing the findings of other studies in maize breeding (e.g., Albrecht et al., 2011; Riedelsheimer et al., 2012; Zhao et al., 2012). The increase in predictive accuracy for the combined analysis most probably reflects the increase in the number of genotypes in the training set (e.g., de Roos et al., 2009; Ibáñez-Escriche et al., 2009; Zhao et al., 2012). The significant reduction in predictive ability on accounting for the population-specific marker effects through the population-specific analyses in this study is intriguing and contradicts the expectation that such an analysis should enhance accuracy, based on recent empirical evidence from association mapping studies showing that differences do occur among marker effects across populations in maize breeding data sets (Liu et al., 2011). This contradiction arises from the fact that if population-specific marker effects are present and substantial but are accounted for by splitting the data set into many smaller population-specific subsets to enable population-specific analyses, then the accompanying reduction in the size of the training set will adversely affect accuracy (Heslot et al., 2012). Hence, accounting for population-specific marker effects through a combined analysis that enlarges the size of the training set is the more likely path to increased accuracy. Therefore the improvement in predictive ability achieved by the combined analysis is associated with the increased number of genotypes constituting the training set. Our results show that analyzing populations with fewer than 60 genotypes separately will be less accurate than a combined analysis.
However, it was not feasible with our data to test for differences between the methods when the number of genotypes within a biparental population is relatively high (>100) and so can produce accurate predictions (Lorenzana and Bernardo, 2009; Schulz-Streeck et al., 2012). The difference between the combined and the populationspecific analyses, furthermore, varied with the number of markers used in GS. Specifically, the advantage of the combined analysis over that of the population-specific analyses decreased at lower numbers of markers (p = 100 and 500), as was also found by Ibáne˜z-Escriche et al. (2009). Combining the population-specific and the main marker effects into one model only marginally improved accuracy compared with the main marker effects model, but even this modest improvement decreased with increasing number of markers. If predictions are to be made for genotypes of untested populations, then only the main marker effects can be used. Predictive accuracy may then be improved through preselection of markers with constant effects across populations (Schulz-Streeck et al., 2011) but not in all cases (Zhao et al., 2012). All models revealed marked variation in predictive abilities across the five different populations, in accord with the findings of two other recent studies (Albrecht et al., 2011; Heslot

WWW.CROPS.ORG

CROP SCIENCE, VOL. 52, NOVEMBER– DECEMBER 2012

Figure 3. Comparison of the predictive abilities of methods A (main marker effects), B (population-specific marker effects), and C (main plus population-specific marker effects) for grain moisture focusing on the number of markers. Predictive ability is the Pearson correlation between the adjusted means and their predicted values for each validation set averaged across all cross-validation replicates. The Pearson correlation was calculated separately for each population. For each prescribed number of markers (p = 100, 500, 1000, 2500, 5000, 10,000, and 20,000) 10 random subsets of markers were selected from the available total of 24,448 markers.

et al., 2012). No method was consistently ranked the best for all populations even though the two methods with the main marker effects and with the main plus the population-specific marker effects were nearly always better than the method allowing only for the population-specific marker effects. The predictive ability of the combined data set, containing all the genotypes from all the different populations in the training and the validation sets, was higher than that of the population-specific data sets, in which genotypes from all the different populations were used as the training set whereas only genotypes from one of the five populations were used as the validation set, resulting in five validation sets. This suggests that the estimated correlation among all the populations captures the interpopulation variation and hence does not require GS to estimate this source of variation because it can be easily estimated using the pedigree information (Piepho et al., 2008) or a population main effect (Schulz-Streeck and Piepho, 2010), as we did in method C with the main plus population-specific marker effects. Additionally, extending method A (main marker effects) with a population main effect gave results

similar to methods A and C (Supplemental Table S1). The RMSE was more robust to the interpopulation variation than the Pearson correlation, thus underlining the importance of using alternative metrics to make the assessment of predictive accuracy in GS more robust. Varying the number of markers had only a slight effect on predictive accuracy, in agreement with the findings of Zhao et al. (2012) for other maize data sets with similarly few populations. Therefore, the number of markers in our data set could be reduced from 24,448 to 2500 without incurring any appreciable loss in predictive ability. Even decreasing the number of markers further to only 500, which is within the range of 200 to 800 markers suggested as useful for random mating maize populations (Lorenzana and Bernardo, 2009), did not evidently reduce accuracy. In commercial maize breeding populations, long-range LD has been identified (Ching et al., 2002; Albrecht et al., 2011; Riedelsheimer et al., 2012). This raises a fundamental question regarding which type of information GS actually uses: the LD between markers and QTL or the relationship between


genotypes. Habier et al. (2007) showed that different methods for GS use different types of information and that the type of information used by a given method can affect its long-term gain for GS. In particular, Habier et al. (2007) suggested that methods relying on the LD between markers and QTL lead to more accurate long-term gains than those that mainly exploit the relationship between genotypes. However, although the accuracy of RR-BLUP depends mainly on the relationship between genotypes whereas the accuracies of some Bayesian methods (e.g., BayesB) depend most strongly on the LD between markers and QTL, recent comparative within-generation studies have shown the predictive accuracies of RR-BLUP to be similar to those of the Bayesian methods (Hayes et al., 2009a; Verbyla et al., 2009), suggesting that differences in the type of information used by a method may not always have much practical relevance for short-term gain. Other suggested ways of increasing the long-term gain include making within-population selection, but even this may come at the expense of a reduced short-term gain (Jannink et al., 2010). Combining the ideas of within- and among-population selection only slightly increased predictive ability in our analysis. It would be interesting to explore with other data sets whether this approach also increases the long-term gain, because it was not possible to do so with the data set on hand. Predictive ability was calculated as the Pearson correlation between the adjusted means and their predicted values. Many GS studies, however, calculate predictive accuracy from predictive ability by dividing it by the square root of heritability (e.g., Dekkers, 2007). This is not always satisfactory because calculating heritability for unbalanced data sets with correlated genotypic effects is far from straightforward.
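The relationship between predictive ability and predictive accuracy described above can be sketched in a few lines of Python. This is only an illustration: the helper names `pearson` and `accuracy` and the two predictive abilities `ability_a` and `ability_b` are hypothetical and are not results from this study, whereas the heritabilities 0.55 and 0.82 are the ad hoc estimates reported in this section for kernel dry weight and grain moisture.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def accuracy(predictive_ability, h2):
    """Predictive accuracy: predictive ability divided by sqrt(heritability)."""
    return predictive_ability / math.sqrt(h2)

# Heritability estimates reported in the text.
H2 = {"kernel dry weight": 0.55, "grain moisture": 0.82}

# Hypothetical predictive abilities of two methods (illustration only).
ability_a, ability_b = 0.60, 0.50

for trait, h2 in H2.items():
    acc_a, acc_b = accuracy(ability_a, h2), accuracy(ability_b, h2)
    # The same divisor is applied to every method, so ratios and rankings
    # between methods are unchanged by the conversion.
    print(f"{trait}: accuracy A = {acc_a:.3f}, B = {acc_b:.3f}, "
          f"ratio = {acc_a / acc_b:.2f}")
```

Because the divisor depends only on the trait, the conversion rescales all methods equally within a trait and cannot alter their relative performance.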
Piepho and Möhring (2007) proposed a simulation approach for estimating the correlation between the predicted and true genotypic values and then using it to calculate heritability for unbalanced data sets with correlated genotypic effects, an approach that may prove useful for GS as well. Instead of using this computationally demanding simulation method, we opted to use the simpler ad hoc method (Piepho and Möhring, 2007) to calculate heritability and obtained an estimated heritability of 0.55 for kernel dry weight and 0.82 for grain moisture. Because we focus here on the relative performances of different methods, dividing the predictive ability by the square root of the estimated heritability to yield estimated accuracy would seem superfluous as heritability is invariant across methods. For all three mixed models (methods A through C), we found that improving the quality of the data increased predictive ability. We conclude that it is always crucial to carefully examine the data and identify outlying observations with extreme values, if any. Extreme values can adversely affect performance metrics such as the Pearson correlation. Because different performance metrics can be affected differentially by extreme observations, it is important to use

alternative criteria to obtain a more robust evaluation of model performance. The quality of the phenotypic data can be further enhanced by reducing the number of testcross genotypes allocated to an incomplete block of the augmented design and hence reducing the within-block error. Alternative ways of improving the quality of phenotypic data include using designs with replicated testcrosses, such as the α-design, or designs where certain proportions of the testcrosses are replicated within each location (Smith et al., 2006; Cullis et al., 2006; Williams et al., 2011) or testing genotypes in many locations. For modeling population-specific marker effects we focused on the use of mixed models, in particular RR-BLUP, because it is widely used and has predictive accuracy comparable to several alternative procedures used for GS across many different plant species (Heslot et al., 2012). However, recent simulation evidence indicates that several regularized regression procedures related to the RR-BLUP model, such as ridge regression, the LASSO, and the elastic net, may attain accuracies superior to that of RR-BLUP, at least with simulated data sets (Ogutu et al., 2012). It is not yet known whether this is also the case for real data sets because not all of these methods have thus far been thoroughly compared for GS using real data sets. Compared with RR-BLUP or ridge regression, the LASSO and the elastic net possess the advantage that they can automatically select the most relevant subset of markers. Nonetheless, the non-Bayesian LASSO has three shortcomings: it can select at most as many markers as there are genotypes, it arbitrarily selects one and omits the other member of a pair of highly correlated markers, and it fails to do consistent marker selection when the numbers of markers and genotypes are very large (Zou, 2006).
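For reference, the three penalized least-squares criteria discussed here can be written in a common form. The notation is an assumption introduced for illustration and is not taken from the models defined in this paper: $y$ is the vector of adjusted means, $X$ the marker matrix, $\beta_j$ the effect of marker $j$ ($j = 1, \dots, p$), and $\lambda, \lambda_1, \lambda_2 \ge 0$ are tuning parameters, typically chosen by cross-validation.

```latex
\begin{align}
\hat{\beta}_{\text{ridge}} &= \arg\min_{\beta}\;
  \lVert y - X\beta \rVert_2^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \\
\hat{\beta}_{\text{LASSO}} &= \arg\min_{\beta}\;
  \lVert y - X\beta \rVert_2^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \\
\hat{\beta}_{\text{elastic net}} &= \arg\min_{\beta}\;
  \lVert y - X\beta \rVert_2^2
  + \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
  + \lambda_2 \sum_{j=1}^{p} \beta_j^2
\end{align}
```

The absolute-value penalty can set coefficients exactly to zero, which is what gives the LASSO and the elastic net their marker-selection property, whereas the quadratic penalty only shrinks coefficients. Setting $\lambda_1 = 0$ in the elastic net recovers ridge regression, and setting $\lambda_2 = 0$ recovers the LASSO.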
The elastic net combines the ridge regression and LASSO penalties to remedy the first two shortcomings of the LASSO but, like the LASSO, cannot consistently select the most important markers when the numbers of markers and genotypes are very large (Zou, 2006). However, the regularized regression procedures did not improve the predictive ability for the tested data set for both traits (Supplemental Table S2). Moreover, on optimizing the parameters for the elastic net and ridge regression methods using fivefold cross-validation, the accuracies of both methods became identical and very similar to that of RR-BLUP. Heslot et al. (2012) also found similar results for the elastic net and RR-BLUP methods. However, the LASSO showed significantly lower predictive abilities. Lastly, there are as yet no algorithms, to our knowledge, for accounting for population-specific marker effects using the LASSO or the elastic net. Overall, a combined analysis incorporating several different maize populations and hence increasing the number of genotypes slightly increased predictive accuracy compared with separate population-specific analyses. Accounting for population-specific marker effects in addition to the main marker effects also only slightly increased accuracy.


Supplemental Information Available
Supplemental material is available at http://www.crops.org/publications/cs.
Supplemental Table S1. Prediction ability of the main marker effect model extended by adding a population main effect.
Supplemental Table S2. Comparisons of the prediction abilities of the different regularized regression methods.

Acknowledgments
We thank AgReliant Genetics for providing the data set used in this study. This research was funded by AgReliant Genetics and the German Federal Ministry of Education and Research (BMBF) within the AgroClustEr “Synbreed – Synergistic plant and animal breeding” (Grant ID: 0315526).

References
Albrecht, T., V. Wimmer, H.J. Auinger, M. Erbe, C. Knaak, M. Ouzunova, H. Simianer, and C.C. Schön. 2011. Genome-based prediction of testcross values in maize. Theor. Appl. Genet. 123:339–350. doi:10.1007/s00122-011-1587-7
Ching, A., K.S. Caldwell, M. Jung, M. Dolan, O.S. Smith, S. Tingey, M. Morgante, and A.J. Rafalski. 2002. SNP frequency, haplotype structure and linkage disequilibrium in elite maize inbred lines. BMC Genet. 3:19. doi:10.1186/1471-2156-3-19
Crossa, J., G. de los Campos, P. Pérez, D. Gianola, G. Atlin, J. Burgueño, J.L. Araus, D. Makumbi, J. Yan, V. Arief, M. Banziger, and H.J. Braun. 2010. Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 186:713–724. doi:10.1534/genetics.110.118521
Cullis, B.R., A.B. Smith, and N.E. Coombes. 2006. On the design of early generation variety trials with correlated data. J. Agric. Biol. Environ. Stat. 11:381–393. doi:10.1198/108571106X154443
Dekkers, J.C.M. 2007. Prediction of response to marker-assisted and genomic selection using selection index theory. J. Anim. Breed. Genet. 124:331–341. doi:10.1111/j.1439-0388.2007.00701.x
de Roos, A.P.W., B.J. Hayes, and M.E. Goddard. 2009. Reliability of genomic predictions across multiple populations. Genetics 183:1545–1553. doi:10.1534/genetics.109.104935
Fox, P.N., R. Mead, M. Talbot, and J.D. Corbett. 1997. Data management and validation. In: R.A. Kempton and P.N. Fox, editors, Statistical methods for plant variety evaluation. Chapman and Hall, London, UK.
Habier, D., R.L. Fernando, and J.C.M. Dekkers. 2007. The impact of genetic relationship information on genome-assisted breeding values. Genetics 177:2389–2397.
Hayes, B.J., P.J. Bowman, A.J. Chamberlain, and M.E. Goddard. 2009a. Invited review: Genomic selection in dairy cattle: Progress and challenges. J. Dairy Sci. 92:433–443. doi:10.3168/jds.2008-1646
Hayes, B., P. Bowman, A. Chamberlain, K. Verbyla, and M.E. Goddard. 2009b. Accuracy of genomic breeding values in multi-breed dairy cattle populations. Genet. Sel. Evol. 41:51. doi:10.1186/1297-9686-41-51
Heslot, N., H.P. Yang, M.E. Sorrells, and J.L. Jannink. 2012. Genomic selection in plant breeding: A comparison of models. Crop Sci. 52:146–160.
Ibáñez-Escriche, N., R.L. Fernando, A. Toosi, and J.C. Dekkers. 2009. Genomic selection of purebreds for crossbred performance. Genet. Sel. Evol. 41:12.
Jannink, J.L., A.J. Lorenz, and H. Iwata. 2010. Genomic selection in plant breeding: From theory to practice. Briefings Funct. Genomics 9:166–177. doi:10.1093/bfgp/elq001
Liu, W., M. Gowda, J. Steinhof, H.P. Maurer, T. Würschum, C.F.H. Longin, F. Cossic, and J.C. Reif. 2011. Association mapping in an elite maize breeding population. Theor. Appl. Genet. 123:847–858. doi:10.1007/s00122-011-1631-7
Lorenzana, R., and R. Bernardo. 2009. Accuracy of genotypic value predictions for marker-based selection in biparental plant populations. Theor. Appl. Genet. 120:151–161. doi:10.1007/s00122-009-1166-3
Meuwissen, T.H.E., B.J. Hayes, and M.E. Goddard. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829.
Ogutu, J.O., T. Schulz-Streeck, and H.P. Piepho. 2012. Genomic selection using regularized linear regression models: Ridge regression, lasso, elastic net and their extensions. BMC Proc. 6(Suppl. 2):S10. doi:10.1186/1753-6561-6-S2-S10
Piepho, H.P. 2009. Ridge regression and extensions for genome-wide selection in maize. Crop Sci. 49:1165–1176. doi:10.2135/cropsci2008.10.0595
Piepho, H.P., and J. Möhring. 2007. Computing heritability and selection response from unbalanced plant breeding trials. Genetics 177:1881–1888. doi:10.1534/genetics.107.074229
Piepho, H.P., J. Möhring, A.E. Melchinger, and A. Büchse. 2008. BLUP for phenotypic selection in plant breeding and variety testing. Euphytica 161:209–228. doi:10.1007/s10681-007-9449-8
Piepho, H.P., J.O. Ogutu, T. Schulz-Streeck, B. Estaghvirou, A. Gordillo, and F. Technow. 2012. Efficient computation of ridge-regression BLUP in genomic selection in plant breeding. Crop Sci. 52:1093–1104. doi:10.2135/cropsci2011.11.0592
Riedelsheimer, C., A. Czedik-Eysenberg, C. Grieder, J. Lisec, F. Technow, R. Sulpice, T. Altmann, M. Stitt, L. Willmitzer, and A.E. Melchinger. 2012. Genomic and metabolic prediction of complex heterotic traits in hybrid maize. Nat. Genet. 44:217–220. doi:10.1038/ng.1033
Schulz-Streeck, T., J.O. Ogutu, and H.P. Piepho. 2011. Pre-selection of markers for genomic selection. BMC Proc. 5(Suppl. 3):S12. doi:10.1186/1753-6561-5-S3-S12
Schulz-Streeck, T., J.O. Ogutu, and H.P. Piepho. 2012. Comparisons of single-stage and two-stage approaches to genomic selection. Theor. Appl. Gen. (in press).
Schulz-Streeck, T., and H.P. Piepho. 2010. Genome-wide selection by mixed model ridge regression and extensions based on geostatistical models. BMC Proc. 4(Suppl. 1):S8. doi:10.1186/1753-6561-4-S1-S8
Smith, A.B., P. Lim, and B.R. Cullis. 2006. The design and analysis of multi-phase plant breeding experiments. J. Agric. Sci. 144:393–409. doi:10.1017/S0021859606006319
Toosi, A., R.L. Fernando, and J.C.M. Dekkers. 2010. Genomic selection in admixed and crossbred populations. J. Anim. Sci. 88:32–46. doi:10.2527/jas.2009-1975
Verbyla, K.L., B.J. Hayes, P.J. Bowman, and M.E. Goddard. 2009. Accuracy of genomic selection using stochastic search variable selection in Australian Holstein Friesian dairy cattle. Genet. Res. 91(05):307–311. doi:10.1017/S0016672309990243
Williams, E., H.P. Piepho, and D. Whitaker. 2011. Augmented p-rep designs. Biom. J. 53(1):19–27. doi:10.1002/bimj.201000102
Zhao, Y., M. Gowda, W. Liu, T. Würschum, H.P. Maurer, F.H. Longin, N. Ranc, and J.C. Reif. 2012. Accuracy of genomic selection in European maize elite breeding populations. Theor. Appl. Genet. 124(4):769–776. doi:10.1007/s00122-011-1745-y
Zou, H. 2006. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101:1418–1429. doi:10.1198/01621450600000073
