Published September 27, 2013

RESEARCH

Improved Statistical Inference for Graphical Description and Interpretation of Genotype × Environment Interaction

Zhiqiu Hu and Rong-Cai Yang*

ABSTRACT

A nonparametric bootstrap resampling approach to constructing confidence regions (CR) for genotypic and environmental principal component (PC) scores has recently been used to statistically assess the biplot analysis of genotype × environment interaction (GE). However, this approach can generate “greater-than-expected” CR because the singular value decomposition (SVD) of two-way GE data from bootstrap samples is not unique. The objective of this study is to improve the current bootstrapping procedure by correcting for the “systematic bias” due to the nonuniqueness of SVD through the use of Procrustes rotation. The Procrustes rotation compares the genotypic and environmental PC scores from bootstrap samples with those from the original (target) data by rotating and then stretching and/or shrinking the PC scores from bootstrap samples such that the sum of squared distances between the corresponding elements of bootstrap and target scores is minimized. The bootstrapping and Procrustes rotation are implemented in an R package, bbplot/R. The analysis of two data sets from wheat (Triticum aestivum L.) and barley (Hordeum vulgare L.) cultivar trials shows that the CR for rotated genotypic and environmental scores are up to 10 times smaller than the CR for the corresponding unrotated scores. The shrunken CR constructed from the rotated scores reveal more definite delineations of mega-environments than an assessment based on mere visual inspection of biplots. Thus, the improved bootstrapping approach constructs more precise CR for the genotypic and environmental PC scores, thereby facilitating the correct use of biplot analysis for critical decisions on genotype selection or mega-environment delineation.


Z. Hu and R.-C. Yang, Dep. of Agricultural, Food and Nutritional Science, Univ. of Alberta, Edmonton, AB T6G 2P5, Canada; R.-C. Yang, Crop Research and Extension Division, Alberta Agriculture and Rural Development, Edmonton, Alberta T6H 5T6, Canada. Received 3 Apr. 2013. *Corresponding author ([email protected]).

Abbreviations: 1-D, one-dimensional; 2-D, two-dimensional; AMMI, additive main effects and multiplicative interaction; CR, confidence regions; GE, genotype × environment interaction; GGE, genotype main effects and genotype × environment interaction; GLBM, general linear–bilinear model; PC, principal component; PCA, principal component analysis; SREG, sites regression; SVD, singular value decomposition.

Published in Crop Sci. 53:2400–2410 (2013). doi: 10.2135/cropsci2013.04.0218 © Crop Science Society of America | 5585 Guilford Rd., Madison, WI 53711 USA All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permission for printing and for reprinting the material contained herein has been obtained by the publisher. www.crops.org

Crop Science, vol. 53, November–December 2013

In evaluating the validity of biplot analysis of genotype × environment interaction (GE) based on different linear–bilinear models (e.g., Yan et al., 2000; Zobel et al., 1988), Yang et al. (2009) pointed out the need for confidence regions (CR) for individual genotypic and environmental principal component (PC) scores in biplots, so that critical decisions on genotype selection or cultivar recommendation rest on a sound statistical footing. While parametric approaches to constructing such CR are available (Denis and Gower, 1994, 1996; Denis and Pázman, 1999), they are not easily implemented for complex linear–bilinear models, and they require restrictive assumptions such as asymptotic normality. It is also unclear how CR constructed under a strictly fixed-effect model can be extended to a mixed-effect model. For these reasons, Yang et al. (2009) advocated the use of bootstrapping, a nonparametric resampling technique (Efron, 1982; Lebart, 2007; Timmerman et al., 2007; Yang et al., 1996), for constructing the CR for genotypic and environmental scores. Bootstrapping operates by drawing, with replacement, random samples of the same size as the original sample from that sample; these bootstrap samples are then used to construct empirical distributions of the estimated genotypic and environmental scores. This nonparametric approach is more flexible, requires no distributional assumption concerning the estimates, and can be used for both fixed- and mixed-effect models.

Genotypic and environmental PC scores are routinely obtained using a mathematical technique known as singular value decomposition (SVD). However, it is well known that the SVD of a two-way GE table is not unique (Milan and Whittaker, 1995). Therefore, in bootstrapping, two different bootstrap samples from the same two-way GE table may have almost identical biplots, but one biplot may be flipped horizontally, vertically, or both, or more generally deflected through an arbitrary angle because of the nonunique SVD. If the flip or deflection is known or detectable, the biplot from each bootstrap sample should be rotated so that it lines up as closely as possible with the original complete-data biplot. Yang et al. (2009) did not consider the effect of nonunique SVD, and thus their CR generated by bootstrapping might have deviated from the true CR (Yan et al., 2010; Kevin Wright, personal communication, 2009).

For simplicity, Yang et al. (2009) used one-dimensional (1-D) bootstrapping. In other words, Yang et al. (2009) did not randomize individual cell means in the two-way table but instead randomized only either columns or rows (but not both), keeping rows or columns unchanged. However, bidirectional bootstrapping is required for the two-way GE table because either genotypes or environments have been used as variables for different types of GE analysis (Gauch, 1992; Hussein et al., 2000). A direct application of such bootstrapping would require that each bootstrap sample be drawn at random with replacement from all GE cell means in the two-way table. Since resampling with replacement implies that some of the original cell means will not appear in a bootstrap sample whereas others may appear many times, the new two-way GE data from the bootstrap sample are obviously unbalanced. However, SVD needs to be done on a balanced data set from bootstrap samples.

In the present paper, we propose a new method of bootstrapping that removes the deficiencies in the earlier method of Yang et al. (2009). With this new development, we believe that bootstrapping will be a useful statistical procedure for biplot analyses of GE based on different linear–bilinear models.

MATERIALS AND METHODS

Models for Biplot

Let us consider a set of multi-environment trials where $g$ genotypes are tested in each of $e$ environments, each with $r$ replications. The phenotypic values of individual genotypes averaged over the $r$ replications within each environment can be arranged in a $g \times e$ two-way table, with the value of the $i$th genotype in the $j$th environment denoted as $y_{ij}$. Such a two-way table may be analyzed through the joint use of ANOVA and SVD. The SVD is essentially a doubled use of principal component analysis (PCA) for genotypes and environments in the two-way GE table. The analysis is performed using the following general linear–bilinear model (GLBM) (Cornelius and Seyedsadr, 1997; Yang et al., 2009),

$$y_{ij} = \sum_{h=1}^{m} \beta_h x_{hij} + \sum_{k=1}^{t} \lambda_k \alpha_{ik} \gamma_{jk} + \varepsilon_{ij}, \qquad [1]$$

in which $y_{ij}$ is the mean of the $i$th genotype in the $j$th environment, the $x_{hij}$'s are the indicator variables in the design matrix for linear terms, the $\beta_h$ parameters (regression coefficients) are for the linear terms, and the $\lambda_k$'s ($\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_t$) are scaling constants (singular values) that allow the imposition of orthonormality constraints on the singular vectors for genotypes [$\alpha_k = (\alpha_{1k}, \dots, \alpha_{gk})$] and for environments [$\gamma_k = (\gamma_{1k}, \dots, \gamma_{ek})$] such that $\sum_i \alpha_{ik}^2 = \sum_j \gamma_{jk}^2 = 1$ and $\sum_i \alpha_{ik}\alpha_{ik'} = \sum_j \gamma_{jk}\gamma_{jk'} = 0$ for $k \ne k'$. The $\alpha_{ik}$ and $\gamma_{jk}$ for $k = 1, 2, 3, \dots$ are called “primary,” “secondary,” “tertiary,” etc. effects of genotypes and environments, respectively; $\varepsilon_{ij}$ is the residual error, assumed to be normally and independently distributed, NID$(0, \sigma^2/r)$, with $\sigma^2$ being the pooled within-environment error variance.

Special cases of the GLBM and their applications to the GE analysis have been described in the literature (for reviews, see Gauch, 2006; Gauch et al., 2008; Yan et al., 2007; Yan and Tinker, 2006; Yang et al., 2009). The additive main effects and multiplicative interaction (AMMI) model and the genotype main effects and genotype × environment interaction (GGE) model (fitted to residuals after removal of environment main effects) have been the two most commonly used models for the biplot analysis. The GGE model is sometimes called the sites regression (SREG) model (Crossa and Cornelius, 1997), but it has been in common use since Yan et al. (2000) introduced the GGE biplot analysis. In the AMMI model, only the GE is modeled by the bilinear terms, whereas in the GGE or SREG model, the bilinear terms model the main effects of genotypes plus the GE. To illustrate our new method, we will use the GGE or SREG model,

$$y_{ij} = \mu + \delta_j + \sum_{k=1}^{t} \lambda_k \alpha_{ik} \gamma_{jk} + \varepsilon_{ij}, \qquad [2]$$

in which the maximum number of multiplicative terms in the sum is $t$ = minimum of $(g, e - 1)$ for the full model. Two modifications are commonly made to Eq. [2] for the usual GGE biplot analysis. First, the singular value $\lambda_k$ is absorbed into the vectors of genotypic and environmental scores, that is, $\alpha_{ik}^{*} = \lambda_k^{f} \alpha_{ik}$ and $\gamma_{jk}^{*} = \lambda_k^{1-f} \gamma_{jk}$, with $0 \le f \le 1$. Second, a reduced model retaining only the first two multiplicative terms is used, that is,

$$y_{ij}^{*} = y_{ij} - \mu - \delta_j \approx \alpha_{i1}^{*} \gamma_{j1}^{*} + \alpha_{i2}^{*} \gamma_{j2}^{*} + \varepsilon_{ij}, \qquad [3]$$

in which the approximation indicated in Eq. [3] reflects the constraint that the third singular value and all subsequent singular values are zero, that is, $\sum_{k=3}^{t} \alpha_{ik}^{*} \gamma_{jk}^{*} = 0$.
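As a rough illustration of Eq. [2] and [3], the following Python/NumPy sketch removes the environment (column) means from a small two-way table, carries out the SVD, and absorbs the singular values into the scores with the partition factor f. The function name `gge_scores` and the toy cell means are hypothetical and are not taken from the paper or from bbplot/R.

```python
import numpy as np

def gge_scores(y, f=0.5, n_pc=2):
    """Scores for the first n_pc PCs of a g x e table (genotypes in rows)."""
    y = np.asarray(y, dtype=float)
    # Remove mu + delta_j by subtracting each environment (column) mean.
    y_star = y - y.mean(axis=0, keepdims=True)
    u, d, vt = np.linalg.svd(y_star, full_matrices=False)
    # Absorb singular values: alpha* = lambda^f alpha, gamma* = lambda^(1-f) gamma.
    geno = u[:, :n_pc] * d[:n_pc] ** f
    env = vt[:n_pc, :].T * d[:n_pc] ** (1.0 - f)
    return y_star, geno, env, d

# Toy 4 x 3 table of cell means (made-up numbers, not from the paper).
y = np.array([[4.1, 5.0, 3.2],
              [3.8, 5.6, 2.9],
              [4.5, 4.7, 3.9],
              [3.6, 5.2, 3.0]])
y_star, geno, env, d = gge_scores(y, f=0.5)
rank2 = geno @ env.T   # two-term approximation on the right-hand side of Eq. [3]
```

Whatever value of f is chosen between 0 and 1, the product of the genotypic and environmental scores is the same rank-2 approximation; f only changes how the singular values are shared between the two sets of scores.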

Bidirectional Bootstrapping

A direct application of bootstrapping would require that each bootstrap sample be drawn at random with replacement from all the cell means in the two-way GE table. Since resampling with replacement means that some of the cell means will not appear in a bootstrap sample whereas others may appear many times, the new two-way GE data from such a bootstrap sample are obviously unbalanced. However, SVD cannot be performed on an unbalanced data set. In the present study, a constraint is imposed on this direct approach to ensure a balanced data set in each bootstrap sample. Specifically, for a GE table with g rows and e columns, two sets of random numbers are generated, one set for g random draws (with replacement) from the g genotypes and the other set for e random draws (with replacement) from the e environments. A cell mean is included in a bootstrap sample if it is located at the intersection of a row and a column selected by the random draws (see Supplemental Fig. S1 for a graphical demonstration). This resampling process is repeated 10,000 times to obtain 10,000 bootstrap samples. The genotypic and environmental PC scores from SVD are computed directly from the original data and from each of the 10,000 bootstrap samples. From the empirical distribution obtained by bootstrapping, we obtain approximate upper and lower 95% confidence limits for PC1 and PC2 scores.

The above bidirectional bootstrapping is different from the 1-D bootstrapping strategy of Yang et al. (2009). In generating each bootstrap sample, rather than randomizing individual cell means in the two-way table, Yang et al. (2009) randomized only either columns or rows (but not both), keeping rows or columns unchanged. Although this sampling strategy produced a reasonable 95% confidence interval for each score examined, it remains somewhat arbitrary which of the two dimensions (rows or columns) should be used for bootstrapping while the other is left unchanged. The procedure of Yang et al. (2009) can be extended to bidirectional bootstrapping, but it requires a two-step sampling process: bootstrapping for rows (genotypes) and then for columns (environments). The two-step sampling would be less efficient because it requires an intermediate matrix for the second bootstrapping.
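The row-and-column resampling described above can be sketched in a few lines of Python/NumPy. This is only an illustration under our reading of the text; `bidirectional_bootstrap`, the seed, and the toy table are hypothetical and do not come from the bbplot/R package.

```python
import numpy as np

def bidirectional_bootstrap(y, rng):
    """Draw one balanced bootstrap sample from a g x e two-way table."""
    y = np.asarray(y)
    g, e = y.shape
    rows = rng.integers(0, g, size=g)   # g genotype draws with replacement
    cols = rng.integers(0, e, size=e)   # e environment draws with replacement
    # Keep every cell at the intersection of a drawn row and a drawn column,
    # so the sample is again a complete g x e table.
    return y[np.ix_(rows, cols)], rows, cols

rng = np.random.default_rng(2013)       # arbitrary seed for reproducibility
y = np.arange(12).reshape(4, 3)         # toy table with distinct cell values
sample, rows, cols = bidirectional_bootstrap(y, rng)
```

Because both index sets are drawn in a single pass, no intermediate matrix is needed, in contrast to the two-step extension of the 1-D procedure.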

Procrustes Rotation

It is well known (Jackson, 1995; Milan and Whittaker, 1995) that the matrices for genotypic and environmental scores (known as loading matrices) from PCA or SVD are not unique. Ignoring this nonuniqueness would lead to inflated bootstrap distributions of individual genotypic and environmental PC scores (Timmerman et al., 2007). In bootstrapping, for example, two different bootstrap samples may have almost identical biplots, but one of the two biplots may appear flipped horizontally, vertically, or both, or more generally deflected through an arbitrary angle, simply because of the nonunique SVD. Therefore, it is important to align the biplot from each bootstrap sample so that it lines up as closely as possible with the biplot based on the original (target) data.
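The simplest form of this nonuniqueness is a sign flip. The short Python/NumPy sketch below, which uses made-up random data rather than either data set analyzed here, shows that reversing the signs of a matched pair of left and right singular vectors leaves the decomposition of the matrix unchanged while mirroring the corresponding biplot axis.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
x -= x.mean(axis=0)                      # column-centered, like a GGE matrix
u, d, vt = np.linalg.svd(x, full_matrices=False)

# Flip the sign of the first left and first right singular vectors together.
flip = np.array([-1.0, 1.0, 1.0, 1.0])
u_flipped = u * flip                     # scales the columns of U
vt_flipped = vt * flip[:, None]          # scales the rows of V'

same_fit = np.allclose((u_flipped * d) @ vt_flipped, x)   # x is reproduced exactly
mirrored = np.allclose(u_flipped[:, 0], -u[:, 0])          # PC1 axis is mirrored
```

Which of the two equally valid sign choices a numerical SVD routine returns can differ from one bootstrap sample to the next, which is why unaligned scores scatter more widely than sampling variability alone would suggest.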


The “Procrustes rotation” (e.g., Andrade et al., 2004) is often used to compare the loading matrices from bootstrap samples and original data. Specifically, the comparison is done by rotating and then stretching and/or shrinking the loading scores from the bootstrap data such that the sum of squared distances ($M^2$) between the corresponding elements of bootstrap and target scores is minimized. The smaller the value of $M^2$, the more similar the two configurations are. A perfect match gives $M^2 = 0$.

Here we illustrate the use of Procrustes rotation for genotypic scores; the same procedure can be applied to environmental scores as well. Let $A$ and $T$ be the loading matrices for genotypic scores from a bootstrap sample and the target data, respectively. The Procrustes rotation is achieved by finding an orthogonal matrix ($Q$) such that the squared Euclidean norm ($M^2$) is minimized,

$$M^2 = \lVert AQ - T \rVert^2 = \mathrm{tr}(TT' + AA') - 2\,\mathrm{tr}(T'AQ) = \min, \qquad [4]$$

in which $\mathrm{tr}(X)$ is the trace of matrix $X$, which equals the sum of the diagonal elements of the matrix, and min is the minimum. Using the SVD $T'A = UDV'$ and the cyclic property of the matrix trace, we have $\mathrm{tr}(T'AQ) = \mathrm{tr}(DH)$, in which $H = V'QU$ is an orthogonal matrix because it is the product of three orthogonal matrices. Because the diagonal elements of $D$ are nonnegative and no element of an orthogonal matrix exceeds 1, $\mathrm{tr}(DH)$ is maximized when $H = I$. Thus, $M^2$ is minimized if $H = I$ or $Q = VU'$, that is,

$$M^2 = \mathrm{tr}(TT' + AA' - 2D). \qquad [5]$$

Clearly, $M^2 = 0$ in Eq. [5] if $T = A$.
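A minimal numerical check of Eq. [4] and [5] is sketched below in Python/NumPy. The helper `procrustes_rotate` and the toy matrices are hypothetical; the sketch implements only the orthogonal rotation Q = VU′ derived above (no additional scaling step) and is not the code of bbplot/R.

```python
import numpy as np

def procrustes_rotate(a, t):
    """Rotate/reflect a toward t; return (a @ q, q, m2) with q orthogonal."""
    u, d, vt = np.linalg.svd(t.T @ a)                   # T'A = U D V'
    q = vt.T @ u.T                                      # Q = V U'
    m2 = np.trace(t @ t.T + a @ a.T) - 2.0 * d.sum()    # Eq. [5]
    return a @ q, q, m2

rng = np.random.default_rng(1)
t = rng.normal(size=(6, 2))                  # target scores (toy values)
theta = np.deg2rad(40.0)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
mirror = np.diag([1.0, -1.0])
a = t @ rot @ mirror                         # deflected and flipped copy of t
aligned, q, m2 = procrustes_rotate(a, t)
```

In this toy case the bootstrap-like configuration `a` differs from the target only by a deflection and a flip, so the rotation recovers the target and the minimized M² is zero up to rounding error.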

Software Implementation

We have implemented the above bidirectional bootstrapping and Procrustes rotation in an R package, bbplot/R (Hu and Yang, 2013b). The implementation includes the following steps: (i) carry out the SVD of the GGE matrix, as implemented by the svd function in R (R Core Team, 2012), to obtain genotypic and environmental PC scores corresponding to the first two PCs; (ii) generate bootstrap samples through random sampling with replacement for rows and columns of the GGE matrix simultaneously; (iii) repeat step (i) for each bootstrap sample; (iv) perform Procrustes rotation by aligning the genotypic and environmental scores from bootstrap samples toward the corresponding scores from the original data; and (v) construct the confidence regions based on the empirical distribution of the aligned genotypic or environmental scores from all bootstrap samples, using a distribution-free method as implemented in our R package, distfree.cr/R (Hu and Yang, 2013a).
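Steps (i) to (v) can be strung together as in the following Python/NumPy sketch for genotypic scores. It is an illustration under several assumptions that are ours rather than the paper's: the function names are hypothetical, bootstrap row i is matched to the original genotype that was drawn for it, the partition factor is fixed at f = 0.5, and step (v) is replaced by simple per-axis 95% percentile limits instead of the two-dimensional distribution-free regions of distfree.cr/R.

```python
import numpy as np

def pc_scores(m, n_pc=2, f=0.5):
    """Genotypic and environmental scores from the SVD of m (step i)."""
    u, d, vt = np.linalg.svd(m, full_matrices=False)
    return u[:, :n_pc] * d[:n_pc] ** f, vt[:n_pc].T * d[:n_pc] ** (1.0 - f)

def align(a, t):
    """Procrustes rotation of a toward t with Q = V U' (step iv)."""
    u, _, vt = np.linalg.svd(t.T @ a)
    return a @ (vt.T @ u.T)

def bootstrap_limits(y, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    gge = y - y.mean(axis=0)                     # remove environment means
    target, _ = pc_scores(gge)
    g, e = gge.shape
    boot = np.full((n_boot, g, 2), np.nan)       # NaN marks genotypes not drawn
    for b in range(n_boot):
        rows = rng.integers(0, g, size=g)        # step ii: rows and columns
        cols = rng.integers(0, e, size=e)
        scores, _ = pc_scores(gge[np.ix_(rows, cols)])   # step iii
        # Assumption: bootstrap row i is compared with original genotype rows[i].
        boot[b, rows] = align(scores, target[rows])
    # Step v, simplified: per-axis percentile limits, not a 2-D region.
    lower, upper = np.nanpercentile(boot, [2.5, 97.5], axis=0)
    return target, lower, upper

y = np.random.default_rng(42).normal(size=(8, 5))   # made-up cell means
target, lower, upper = bootstrap_limits(y)
```

Environmental scores could be handled in the same way by aligning the second output of `pc_scores` toward the target rows indexed by `cols`.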

Figure 1. Empirical 95% confidence regions of the 18 genotypes obtained from 10,000 bootstrap samples for the Ontario winter wheat data. The black and gray dots represent rotated and unrotated genotypic principal component (PC) scores, respectively. The white dots stand for the PC scores obtained from the original data. G, genotype.

Calculating Confidence Region Areas

As described above, the nonunique SVD of the two-way GE data in bootstrap samples could add an extra source of variation (systematic bias) to the random variation among PC scores from different bootstrap samples, and the Procrustes rotation helps minimize such systematic bias. Therefore, the CR constructed directly from the unrotated PC scores are expected to be larger than the CR from rotated PC scores. This expectation is assessed by comparing the CR areas inferred from rotated and unrotated scores. In this study, we calculate the CR areas using the areapl function in the splancs/R package (Bivand et al., 2013), an R implementation of the splancs/S-plus package that was initially developed by Rowlingson and Diggle (1993).
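To make the area comparison concrete, the sketch below computes the area of a polygon from its boundary vertices with the standard shoelace formula and then forms an unrotated-to-rotated ratio. It is a plain Python stand-in for a polygon-area routine; `polygon_area` is a hypothetical name and the two rectangles are made-up regions, not CR from the wheat or barley data.

```python
def polygon_area(vertices):
    """Area of a simple polygon given as [(x, y), ...] in boundary order."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]   # wrap around to close the polygon
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
rotated_cr = [(0, 0), (2, 0), (2, 1), (0, 1)]     # made-up region, area 2
unrotated_cr = [(0, 0), (4, 0), (4, 2), (0, 2)]   # made-up region, area 8
ratio = polygon_area(unrotated_cr) / polygon_area(rotated_cr)   # 4.0
```

A ratio above 1 indicates that the Procrustes rotation has shrunk the region; a ratio below 1 would indicate the opposite.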

RESULTS

Data Sets

Two data sets are used for demonstration. The first is the Ontario winter wheat example, the same data set used by Yan et al. (2007). Yang et al. (2009) also used it to illustrate the application of their bootstrapping approach to the construction of confidence intervals for genotypic and environmental scores. We reanalyze the data set here to show the improved confidence regions. Briefly, it consists of the yield data of 18 winter wheat genotypes (G1 to G18) tested at nine Ontario locations (E1 to E9). As required in the usual GGE biplot analysis, the deviations of cell means for all 162 (18 × 9) genotype–location combinations from location means are calculated. This GGE matrix is the basis for bidirectional bootstrapping, SVD, and Procrustes rotation.

The second data set is the barley example taken from Yang (2007), who analyzed the yield of six cultivars (G1 to G6) evaluated in 2003 at 18 sites across the Province of Alberta, including the two neighboring sites in the Province of British Columbia (E1 to E18). The cultivar trial in each of the 18 sites was performed using a randomized complete block design with three or four replications. Just as in the winter wheat example, the deviations of cell means for all 108 (6 × 18) genotype–location combinations from location means are calculated, and this GGE matrix is subsequently used for bidirectional bootstrapping, SVD, and Procrustes rotation.

Wheat Data

The genotypic and environmental scores under unrotated PC1 and PC2 axes show larger 95% CR than those under rotated PC1 and PC2 axes (Fig. 1 and 2). The ratios of 95% CR areas for unrotated to rotated genotypic scores range from 1.7 (G17) to 12.3 (G12), with an average ratio of 4.2 (Table 1). In other words, on average, the 95% CR of unrotated genotypic scores are 4.2 times larger than those of rotated genotypic scores. Similarly, the ratios of 95% CR areas for unrotated to rotated environmental scores range from 2.1 (E3) to 6.5 (E9), with an average ratio of 4.5. While Procrustes rotation has definitely shrunk the CR, the extent of shrinkage varies considerably among different genotypic and environmental scores. It is also evident from Fig. 1 and 2 that the 95% CR of unrotated or rotated genotypic scores are generally larger than those of environmental scores. This is expected because the main environmental effect is removed from the GGE matrix.

The GGE biplot along with 95% CR for rotated genotypic and environmental scores is given in Fig. 3. To avoid unnecessary overcrowding, the CR are displayed only for those scores that are significantly different from the origin of the biplot [i.e., scores whose CR exclude the point (0,0)]. Thus, 11 out of the 18 genotypes, G3, G4, G5, G7, G8, G9, G10, G12, G13, G14, and G18, have a 95% CR not including the origin of the biplot. Of these 11 genotypes, six (G3, G7, G8, G12, G13, and G18) are located at the corners (i.e., vertices) of the polygon in the biplot. Following the standard interpretation of the GGE biplot, six line segments perpendicular to different sides of the polygon are drawn through the origin to subdivide the polygon into six sectors involving different subsets of environments and genotypes: the genotype at the corner of each sector is the best performer in the environments included in that sector. However, the 95% CR of the scores for the “best” genotypes frequently overlap with those for other genotypes. For example, genotype G8 at the upper-right corner is indistinguishable from genotypes G4 and G10 in the same sector, judging from their overlapping confidence regions.

As expected from the GGE biplot (Yan et al., 2000), all the environmental scores are located on the right-hand side of the vertical axis. With one exception (E3), they are significantly different from the origin of the biplot. It is noted that the CR of two environmental scores (E7 and E8) spread over to the left-hand side of the vertical axis, suggesting that these environments may appear unexpectedly on the left-hand side of the vertical axis due merely to sampling variability.

Figure 2. Empirical 95% confidence regions of the nine environments obtained from 10,000 bootstrap samples for the Ontario winter wheat data. The black and gray dots represent rotated and unrotated environmental principal component (PC) scores, respectively. The white dots stand for the PC scores obtained from the original data. E, environment.

Table 1. Areas and ratios of the empirical 95% confidence regions (CR) of principal component (PC) 1 and PC2 scores based on 10,000 rotated and unrotated bootstrap samples for 18 genotypes (G1–G18) and nine environments (E1–E9) in the wheat data.

Genotype or                        Area of CR
environment     PC1      PC2      Rotated (R)   Unrotated (U)   Ratio (U:R)
G1             –0.31    –0.74     1.44           4.33            3.01
G2              0.38    –0.37     1.46           4.20            2.87
G3              0.43    –0.69     0.98           3.42            3.48
G4              0.96     0.53     0.89           5.23            5.89
G5              0.72    –0.39     2.12           6.58            3.11
G6              0.71    –0.12     4.15           8.16            1.97
G7             –1.33    –0.85     1.66          11.82            7.12
G8              1.14     1.33     3.32          17.59            5.29
G9              0.89    –0.38     1.53           5.10            3.34
G10             0.84     0.66     1.88           7.10            3.77
G11             0.39    –0.55     1.75           4.00            2.28
G12            –3.01    –0.30     1.77          21.87           12.33
G13            –1.64     1.45     5.20          23.34            4.49
G14            –2.10     0.18     1.76          11.92            6.77
G15             0.45     0.10     1.63           2.89            1.77
G16             0.68     0.09     1.47           3.38            2.31
G17            –0.24     0.72     5.51           9.34            1.70
G18             1.08    –0.63     2.49           8.00            3.22
Average         0.00     0.00     2.28           8.79            4.15
E1              0.43     0.14     0.48           1.40            2.91
E2              0.34     0.05     0.20           0.75            3.68
E3              0.34     0.19     0.57           1.19            2.07
E4              0.30     0.21     0.18           0.66            3.67
E5              0.49    –0.36     0.34           1.86            5.51
E6              0.20     0.37     0.15           0.78            5.24
E7              0.38    –0.57     0.35           2.12            6.06
E8              0.08     0.43     0.14           0.71            5.24
E9              0.24     0.35     0.12           0.76            6.48
Average         0.31     0.09     0.28           1.14            4.54

Figure 3. Biplot of 18 genotypic scores and nine environmental scores from the Ontario winter wheat data. The 95% confidence regions are constructed for the genotypic and environmental scores that are located at the corners of the polygon or those that are significantly different from the origin of the biplot using 10,000 rotated bootstrap samples. E, environment; G, genotype.

Barley Data

With one exception (E5), the 95% CR of the genotypic and environmental scores under unrotated PC1 and PC2 axes are larger than those under rotated PC1 and PC2 axes (Fig. 4 and 5). The ratios of 95% CR areas for unrotated to rotated genotypic scores range from 1.5 (G1) to 9.6 (G6), with an average ratio of 5.0 (Table 2). Similarly, the ratios of 95% CR areas for unrotated to rotated environmental scores range from 0.6 (E5) to 7.9 (E4), with an average ratio of 3.5. Therefore, there is a considerable amount of variation among environmental or genotypic scores in shrinkage by the Procrustes rotation. Despite similar shrinkage ranges for both genotypic and environmental scores, the 95% CR of unrotated or rotated genotypic scores are generally larger than those of environmental scores. This occurs because the main environmental effect is removed from the GGE matrix.

Figure 6 shows the GGE biplot along with 95% CR for those rotated genotypic and environmental scores that are significantly different from the origin of the plot. Of the six barley genotypes (G1 to G6), the confidence regions are displayed for four genotypes (G2, G3, G4, and G6) that are also located at the four corners (i.e., vertices) of the polygon in the biplot but not for the remaining two genotypes (G1 and G5). On the left-hand side of the biplot, genotypes G2 and G3 are not significantly different from each other, judging from their overlapping CR. Given that G5 appears within the CR for G3, G5 is not different from G3 or G2 either. On the right-hand side, genotypes G4 and G6 are not significantly different from each other even though they seem to be far apart. With one exception (E8), all the environmental scores are located on the right-hand side of the vertical axis. The 95% CR for 11 out of the 18 environments (E1 to E4, E6, E8, E10, E12, E15, E16, and E18) are displayed because the PC scores for these environments are significantly different from the origin of the biplot.

Figure 4. Empirical 95% confidence regions of the six genotypes obtained from 10,000 bootstrap samples for the barley data. The black and gray dots represent rotated and unrotated genotypic principal component (PC) scores, respectively. The white dots stand for the PC scores obtained from the original data. G, genotype.

DISCUSSION

In this study, we improve the bootstrapping procedure of Yang et al. (2009) for constructing the 95% CR for genotypic and environmental scores by removing the following two deficiencies that exist in the earlier version: 1-D bootstrapping and no consideration of nonunique SVD. Regarding the first improvement over the 1-D bootstrapping, we conduct two-dimensional (2-D) bootstrapping by randomizing both rows and columns simultaneously. The 2-D bootstrapping is needed to avoid arbitrary choice of whether rows or columns should be randomized and to be more reflective of the true sampling variability. Of course, the 2-D bootstrapping leads to a greater variation in the sample sizes among bootstrap samples than the 1-D bootstrapping. Since the bootstrapping is a resampling with replacement, the number of unique observations in a given bootstrap sample can be much smaller than in the original data. For a small data set with a limited

www.crops.org 2405

Figure 5. Empirical 95% confidence regions of the 18 environments obtained from 10,000 bootstrap samples for the barley data. The black and gray dots represent rotated and unrotated environmental principal component (PC) scores, respectively. The white dots stand for the PC scores obtained from the original data. E, environment.

number of genotypes or environments, this raises the issue of the minimum sample size that is required for validating the SVD or PCA (Osborne and Costello, 2004). Unfortunately, there is no consensus guideline regarding the actual minimum sample size required by SVD or PCA. In the present study, we consider from a geometrical view that a minimum sample size of four unique observations is required to carry out meaningful SVD and subsequent CR construction. In pattern matching as done through Procrustes rotation in our study, three points are the minimum requirement to align two 2-D patterns because three equations need to be solved to anchor two panels (Bruss and Horn, 1983). Thus, with the sample size of n < 4, no “degree of freedom” is left to estimate the sampling variability that would have been revealed by bootstrapping. Our second improvement is to deal with the potential problem of generating “larger-than-expected” CR due to nonunique SVD of two-way GGE data from different bootstrap samples. In other words, the nonuniqueness of SVD would likely lead to different genotypic and environmental PC scores in bootstrap samples from those in the original data. Geometrically, this is analogous to the situation where a biplot in a given bootstrap sample can be flipped horizontally or vertically or deflected at an arbitrary angle, relative to the biplot in the original data. However, despite these geometrical alterations, the patterns may remain unchanged. Such “systematic bias,” if not corrected, will be added to the random variability in 2406

bootstrap samples. To correct for this bias, we propose the use of “Procrustes rotation” (e.g., Andrade et al., 2004; Timmerman et al., 2007) that allows for comparing the biplots from a bootstrap sample and original data. This method rotates and stretches or shrinks the bootstrap data such that the sum of squared distances (M 2) between the corresponding elements of bootstrap and target data is minimized. The suspected systematic bias is substantial, judging from the sizes of shrinkage from CR of unrotated scores to CR of rotated scores in wheat (Fig. 1 and 2; Table 1) and barley (Fig. 4 and 5; Table 2). Thus, the correction through the Procrustes rotation is very effective. As discussed above, the 2-D bootstrapping generates a great deal of systematic bias due to nonunique SVD in the resultant bootstrap samples. From the same data set as in our wheat example, Yang et al. (2009) used the 1-D bootstrapping to generate bootstrap samples, but these authors did not consider the use of any rotation procedure to correct for the systematic bias. Therefore, it is reasonable to expect some systematic bias as well. However, Fig. 7 shows that the sizes of the confidence intervals for rotated genotypic and environmental scores corresponding to PC1 and PC2 from the 2-D bootstrap samples are very similar to those for unrotated scores from the 1-D bootstrap samples (Supplemental Table S1), suggesting little systematic bias generated from 1-D bootstrapping. This result is somewhat surprising and further research is needed to confirm it with the analysis of simulation and empirical data. Nevertheless, despite some disadvantages, the

www.crops.org

crop science, vol. 53, november– december 2013

Table 2. Areas and ratios of the empirical 95% confidence regions (CR) of principal component (PC) 1 and PC2 scores based on 10,000 rotated and unrotated bootstrap samples for six genotypes (G1–G6) and 18 environments (E1–E18) in the barley data. Genotype or environment

Area of CR PC1

PC2

G1 G2 G3 G4 G5 G6 Average

–0.28 –1.25 –1.70 1.51 –0.77 2.49 0.00

0.60 –1.26 0.32 1.43 0.09 –1.19 0.00

5.44 3.08 3.36 2.91 2.88 3.02 3.45

7.96 17.91 15.77 18.06 5.88 28.94 15.76

1.46 5.81 4.69 6.22 2.04 9.58 4.97

E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11 E12 E13 E14 E15 E16 E17 E18 Average

0.18 0.45 0.51 0.27 0.04 0.42 0.22 –0.04 0.03 0.07 0.13 0.29 0.17 0.08 0.18 0.14 0.02 0.07 0.18

0.14 –0.01 0.30 –0.61 0.08 0.09 0.01 –0.34 –0.02 –0.37 0.18 0.12 –0.18 –0.12 0.00 –0.33 –0.09 –0.21 –0.07

0.09 0.21 0.23 0.18 0.26 0.52 0.31 0.09 0.37 0.09 0.13 0.22 0.28 0.16 0.18 0.11 0.24 0.03 0.21

0.31 0.95 1.51 1.40 0.15 1.50 0.58 0.43 0.46 0.52 0.37 0.61 0.62 0.23 0.33 0.54 0.30 0.18 0.61

3.44 4.44 6.68 7.89 0.57 2.88 1.86 4.73 1.24 5.53 2.90 2.78 2.20 1.42 1.86 5.07 1.24 5.46 3.46

Rotated Unrotated

Ratio

1-D bootstrapping introduced by Yang et al. (2009) may be a viable resampling strategy for constructing the CR. The wheat example has been extensively analyzed in the past (e.g., Yan et al., 2007, 2010). In particular, under the “which-won-where” view of the GGE biplot (Yan et al., 2007), it is claimed that genotype G18 yielded more than genotype G8 in eastern Ontario (represented by E5 and E7) and G8 yielded more than G18 in southwestern Ontario (represented by the other seven environments). It is evident from Fig. 3 that this claim is not true because the CR for G8 and G18 overlap. Therefore, Yang et al. (2009) suggested that any “which-won-where” pattern based on initial inspection of biplots is simply a curious visual observation only and must be subject to subsequent parametric or nonparametric statistical assessments (e.g., the use of bootstrapping-based CR) before being recommended for practical utility. Alternatively, Yan et al. (2010) suggested the use of a biplot of genotype-by-replication-within-environment to visually confirm the “which-won-where” pattern. These authors analyzed the wheat example with the intent to show that if the real pattern exists, then all or most of the replicates in a given environment within one mega-environment would be crop science, vol. 53, november– december 2013 

clustered together and be separated from those in the other mega-environment. Two points are immediately clear from a comparison of Fig. 1 and 2 of Yan et al. (2010). First, the two mega-environments (E5 and E7 vs. the other seven environments) become much less distinct when replicate values are used in the biplot. Second, the shapes of the genotype polygons in the two figures are somewhat different, presumably because the averages of genotypic scores over replicates are used in Fig. 2. More importantly, the considerable replication-to-replication variation observed within individual environments supports our call for measuring the uncertainty of genotypic and environmental scores. In many practical cases where only the two-way GE table of cell means is available without replicate data, bootstrapping is likely the only means of statistical assessment of the biplot.

In addition, the wheat example and many other examples presented in the literature have identified “which-won-where” patterns based on 1-yr data only. Such a pattern is one realization among many possible outcomes, and its repeatability in future years is unknown. Even when biplot analyses of multiple-year GE data are done (Navabi et al., 2006; Yan et al., 2000), the consistency of “which-won-where” patterns is assessed through visual inspection of biplots over years. Although this practice is common, it has at least three problems. First, the configuration of a biplot for any given year is arbitrary due to the nonunique SVD of the two-way GE data for that year. Therefore, a comparison between biplots over years may be misleading when a flip, deflection, or any other type of inconsistency occurs. Second, the comparison is highly unreliable because there is no statistical basis for visual judgment.
Third, it may be difficult to line up the biplots for different years because there is no basis for deciding which year should serve as a reference for the alignment.

Statistical assessment of the consistency of “which-won-where” patterns is possible by extending our current bootstrap-based procedure to the analysis of multiyear data. An extension to the analysis of balanced multiyear data is relatively straightforward. When the same cultivar trials (i.e., the same genotypes and locations) are performed in multiple years, genotypic and environmental PC scores along with their CR across multiple years can be overlaid, and statistical assessment of a repeatable “which-won-where” pattern over years is achieved by inspecting whether or not the CR for the same genotypic or environmental PC scores overlap over years. This statistical assessment will help determine if two-way GE tables across different years can be averaged for mega-environment delineation. To illustrate the analysis of balanced multiyear data, we analyze a subset of the data from our barley example in which six genotypes were tested in only nine locations but over three consecutive years (2001–2003) (see Supplemental Fig. S2, S3, and S4). The overlaying of


Figure 6. Biplot of six genotypic scores and 18 environmental scores from the barley data. The 95% confidence regions are constructed for the genotypic and environmental scores that are located at the corners of the polygon or those that are significantly different from the origin of the biplot using 10,000 rotated bootstrap samples. E, environment; G, genotype.

Figure 7. Sizes of the 95% confidence intervals for 18 wheat genotypic principal component (PC) scores and nine environmental PC scores inferred from bootstrap samples generated by rotated, unrotated, and the one-dimensional sampling strategies of Yang et al. (2009). E, environment; G, genotype.

biplots for the 3 yr is done by using the biplot for 2001 as an “anchor,” and the other two biplots are rotated to align with the anchored biplot. It is quite evident from these figures that inconsistency occurs in terms of (i) the best cultivars identified for individual candidate mega-environments and (ii) the membership of individual locations within the candidate mega-environments. However, it remains to be investigated how best our current method can be extended to the analysis of unbalanced multiyear data, a more common type of multiyear data from most cultivar trials. The obvious challenge is how to line up the biplots when genotypes and environments are different from

biplot to biplot. One possible strategy for tackling this challenge is to construct a consensus biplot for all years by using generalized Procrustes analysis (Gower, 1975). Another possible strategy is to use the genotypic and environmental PC scores derived from the two-way GE table averaged across years as references for the alignment of biplots from different years. Further studies are required to fully explore these topics.

In this study, we focus on the improvement of our earlier bootstrap-based method for constructing CR (Yang et al., 2009) for a better use of biplot analysis. However, the success of such improvement is conditional on the adequacy of biplot analysis itself. It is evident from Eq. [3] that the usual


construction of a biplot is based on the rank-two approximation. In other words, the biplot is drawn using the genotypic and environmental scores from the first two PCs under the assumption that the remaining PCs are negligible. If this assumption does not hold, the biplot based on the first two PCs is not adequate, and any meaningful use or inference of the biplot must go beyond the rank-two approximation. For example, as Gauch et al. (2008) eloquently demonstrated through the analysis of the wheat example that is also used in this study, inclusion of more than two PCs would increase the number of mega-environments delineated to between three and six. This is different from the conclusion of two mega-environments drawn by Yan et al. (2007), who used the rank-two approximation.

Our improved bootstrap-based method for CR is illustrated through the GGE biplot analysis. This does not mean that the improvement works only for the GGE model; it would work for other linear–bilinear models as well. To show that this is indeed the case, we carry out the analyses of the wheat and barley data based on the AMMI model, another commonly used linear–bilinear model for biplot analysis. The AMMI model is similar to the GGE model as given in Eq. [3] except that the data are further adjusted for the main genotypic effect,

$$y^{*}_{ij} = y_{ij} - \mu - \tau_i - \delta_j \approx \alpha^{*}_{i1}\gamma^{*}_{j1} + \alpha^{*}_{i2}\gamma^{*}_{j2} + \varepsilon_{ij}.$$
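As a rough illustration of the adjustment above, the following Python sketch removes the grand mean and both main effects from a two-way table of cell means and then takes the rank-two SVD of the residuals. It is not the bbplot/R implementation: the function names, the toy yield table, and the symmetric scaling of singular values between genotypic and environmental scores are all assumptions made for this example.

```python
import numpy as np

def ammi_residuals(y):
    """Subtract the grand mean and the genotype and environment main effects."""
    y = np.asarray(y, dtype=float)
    mu = y.mean()                                  # grand mean
    tau = y.mean(axis=1, keepdims=True) - mu       # genotype main effects (rows)
    delta = y.mean(axis=0, keepdims=True) - mu     # environment main effects (columns)
    return y - mu - tau - delta

def rank_two_scores(y_star):
    """Return genotypic scores, environmental scores, and the PC1+PC2 share."""
    u, s, vt = np.linalg.svd(y_star, full_matrices=False)
    root = np.sqrt(s[:2])                          # symmetric scaling (an assumption)
    geno = u[:, :2] * root                         # one row per genotype
    env = vt[:2].T * root                          # one row per environment
    share = (s[:2] ** 2).sum() / (s ** 2).sum()    # fraction of residual sum of squares
    return geno, env, share

# Hypothetical table of cell means: 5 genotypes (rows) by 4 environments (columns).
yields = np.array([
    [4.1, 5.0, 6.2, 3.9],
    [3.8, 5.6, 5.9, 4.4],
    [4.9, 4.7, 6.8, 3.5],
    [4.3, 5.2, 5.5, 4.8],
    [3.6, 5.9, 6.1, 4.0],
])

y_star = ammi_residuals(yields)
geno, env, share = rank_two_scores(y_star)
print(f"Share of interaction variation in PC1 and PC2: {share:.3f}")
```

The share of the residual sum of squares captured by the first two PCs gives a quick check on whether the rank-two biplot is adequate for the table at hand.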

The results from the AMMI biplot analysis (Supplemental Fig. S5, S6, S7, and S8; Supplemental Tables S2 and S3) show similar reductions in CR due to correction for the systematic bias.
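The correction for the systematic bias relies on aligning each set of bootstrap scores with the target scores before the CR are formed. A minimal sketch of such an alignment is given below, assuming an orthogonal Procrustes rotation with a single scaling factor and no translation. Whether bbplot/R handles genotypic and environmental scores jointly or applies further steps is not shown here, so the function `procrustes_align`, the toy target scores, and the simulated sign flips are hypothetical.

```python
import numpy as np

def procrustes_align(boot, target):
    """Rotate/reflect and rescale `boot` to minimize squared distance to `target`."""
    boot = np.asarray(boot, dtype=float)
    target = np.asarray(target, dtype=float)
    u, s, vt = np.linalg.svd(boot.T @ target)
    q = u @ vt                              # optimal orthogonal matrix (may reflect)
    c = s.sum() / (boot ** 2).sum()         # optimal stretching/shrinking factor
    return c * boot @ q

rng = np.random.default_rng(1)

# Hypothetical target PC scores for four genotypes (PC1, PC2).
target = np.array([[2.0, 1.0], [-1.5, 0.5], [0.5, -2.0], [-1.0, 0.5]])

raw, aligned = [], []
for _ in range(500):
    noisy = target + rng.normal(scale=0.1, size=target.shape)
    signs = rng.choice([-1.0, 1.0], size=2)  # mimic arbitrary axis flips of the SVD
    boot = noisy * signs
    raw.append(boot)
    aligned.append(procrustes_align(boot, target))

raw = np.array(raw)
aligned = np.array(aligned)

# Spread of the PC1 score of the first genotype, before and after alignment.
raw_sd = raw[:, 0, 0].std()
aligned_sd = aligned[:, 0, 0].std()
print(f"unrotated SD: {raw_sd:.3f}, rotated SD: {aligned_sd:.3f}")
```

In this simulated setting the spread of the aligned scores reflects only the added noise, whereas the unaligned scores also carry the arbitrary axis flips.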

Supplemental Information Available

Supplemental material is available at http://www.crops.org/publications/cs.

Supplemental Figure S1. Graphical demonstration of bidirectional bootstrapping with four genotypes and five environments.

Supplemental Figure S2. Consistency of genotypic principal component (PC) scores across three years revealed by genotype main effects and genotype × environment interaction (GGE) biplots for six barley varieties tested at nine locations in Alberta during 2001 to 2003. A “solid” line means that the empirical 95% confidence regions (CR) of the two PC scores overlap; a “dotted” line means that the two CR do not overlap.

Supplemental Figure S3. Consistency of environmental principal component (PC) scores across three years revealed by genotype main effects and genotype × environment interaction (GGE) biplots for six barley varieties tested at nine locations in Alberta during 2001 to 2003. A “solid” line means that the empirical 95% confidence regions (CR) of the two PC scores overlap; a “dotted” line means that the two CR do not overlap.

Supplemental Figure S4. Genotype main effects and genotype × environment interaction (GGE) biplot of six varieties and nine environments derived from the cell means in a genotype × environment table that are the average yields from the barley cultivar trials tested during 2001 to 2003. The 95% confidence regions are constructed for the environmental principal component scores that are significantly different from the origin of the biplot (0,0) using 10,000 rotated bootstrap samples.

Supplemental Figure S5. Empirical 95% confidence regions of principal component (PC) 1 and PC2 scores in additive main effects and multiplicative interaction (AMMI) 2 model for the 18 genotypes obtained from 10,000 bootstrap samples for the Ontario winter wheat data. The black and gray dots represent rotated and unrotated genotypic PC scores, respectively. The white dots stand for the PC scores obtained from the original data.

Supplemental Figure S6. Empirical 95% confidence regions of principal component (PC) 1 and PC2 scores in additive main effects and multiplicative interaction (AMMI) 2 model for the nine environments obtained from 10,000 bootstrap samples for the Ontario winter wheat data. The black and gray dots represent rotated and unrotated environmental PC scores, respectively. The white dots stand for the PC scores obtained from the original data.

Supplemental Figure S7. Empirical 95% confidence regions of principal component (PC) 1 and PC2 scores in additive main effects and multiplicative interaction (AMMI) 2 model for the six genotypes obtained from 10,000 bootstrap samples for the barley data. The black and gray dots represent rotated and unrotated genotypic PC scores, respectively. The white dots stand for the PC scores obtained from the original data.

Supplemental Figure S8. Empirical 95% confidence regions of principal component (PC) 1 and PC2 scores in additive main effects and multiplicative interaction (AMMI) 2 model for the 18 environments obtained from 10,000 bootstrap samples for the barley data using AMMI2 model. The black and gray dots represent rotated and unrotated environmental PC scores, respectively. The white dots stand for the PC scores obtained from the original data.

Acknowledgments

We thank Kevin Wright for helpful discussion before the study, and three anonymous reviewers for constructive criticisms and valuable comments. This research was supported by the Natural Sciences and Engineering Research Council of Canada discovery grant (Award #183983) to R.-C. Y.

References

Andrade, J.M., M.P. Gómez-Carracedo, W. Krzanowski, and M. Kubista. 2004. Procrustes rotation in analytical chemistry, a tutorial. Chemometr. Intell. Lab. 72:123–132. doi:10.1016/j.chemolab.2004.01.007

Bivand, R., B. Rowlingson, P. Diggle, G. Petris, and S. Eglen. 2013. splancs: Spatial and space-time point pattern analysis. R package version 2.01-33. Comprehensive R Archive Network, Wirtschaftsuniversitat Wien, Austria. http://CRAN.R-project.org/package=splancs (accessed 20 Aug. 2013).

Bruss, A.R., and B.K.P. Horn. 1983. Passive navigation. Comput. Vision Graph. 21:3–20. doi:10.1016/S0734-189X(83)80026-7

Cornelius, P.L., and M.S. Seyedsadr. 1997. Estimation of general linear-bilinear models for two-way tables. J. Statist. Comput. Simulation 58:287–322. doi:10.1080/00949659708811837

Crossa, J., and P.L. Cornelius. 1997. Sites regression and shifted multiplicative model clustering of cultivar trial sites under heterogeneity of error variances. Crop Sci. 37:406–415. doi:10.2135/cropsci1997.0011183X003700020017x

Denis, J.-B., and J.C. Gower. 1994. Biadditive models. Biometrics 50:310–311.

Denis, J.-B., and J.C. Gower. 1996. Asymptotic confidence regions for biadditive models: Interpreting genotype-environment interactions. J. Roy. Stat. Soc. C- App. 45:479–493.

Denis, J.-B., and A. Pázman. 1999. Bias of LS estimators in nonlinear regression models with constraints. Part II: Biadditive models. Appl. Math. 44:375–403.

Efron, B. 1982. The jackknife, the bootstrap, and other resampling plans. Society for Industrial and Applied Mathematics, Philadelphia, PA.

Gauch, H.G. 1992. Statistical analysis of regional yield trials: AMMI analysis of factorial designs. Elsevier, Amsterdam, the Netherlands.

Gauch, H.G. 2006. Statistical analysis of yield trials by AMMI and GGE. Crop Sci. 46:1488–1500. doi:10.2135/cropsci2005.07-0193

Gauch, H.G., H.P. Piepho, and P. Annicchiarico. 2008. Statistical analysis of yield trials by AMMI and GGE: Further considerations. Crop Sci. 48:866–889. doi:10.2135/cropsci2007.09.0513

Gower, J.C. 1975. Generalized Procrustes analysis. Psychometrika 40:33–51. doi:10.1007/BF02291478

Hu, Z., and R.-C. Yang. 2013a. An R package for a new distribution-free approach to constructing confidence regions (distfree.cr/R). Department of Agricultural, Food and Nutritional Science, University of Alberta. Edmonton, Alberta, Canada. http://statgen.ualberta.ca/index.html?open=software.html (accessed 20 Aug. 2013).

Hu, Z., and R.-C. Yang. 2013b. An R package for bidirectional bootstrapping and Procrustes rotation (bbplot/R). Department of Agricultural, Food and Nutritional Science, University of Alberta. Edmonton, Alberta, Canada. http://statgen.ualberta.ca/index.html?open=software.html (accessed 20 Aug. 2013).

Hussein, M.A., A. Bjornstad, and A.H. Aastveit. 2000. SASG × ESTAB: A SAS program for computing genotype × environment stability statistics. Agron. J. 92:454–459. doi:10.2134/agronj2000.923454x

Jackson, D.A. 1995. Bootstrapping principal components analysis: Reply to Mehlman et al. Ecology 76:644–645. doi:10.2307/1941220

Lebart, L. 2007. Which bootstrap for principal axes methods? In: E. Diday and P. Brito, editors, Selected contributions in data analysis and classification. Springer, Berlin, New York. p. 581–588.

Milan, L., and J. Whittaker. 1995. Application of the parametric bootstrap to models that incorporate a singular-value decomposition. J. Roy. Stat. Soc. C- App. 44:31–49.

Navabi, A., R.-C. Yang, J. Helm, and D.M. Spaner. 2006. Can spring wheat-growing megaenvironments in the northern Great Plains be dissected for representative locations or niche-adapted genotypes? Crop Sci. 46:1107–1116. doi:10.2135/cropsci2005.06-0159

Osborne, J.W., and A.B. Costello. 2004. Sample size and subject to item ratio in principal components analysis. Pract. Assess., Res. Eval. 9:11.

R Core Team. 2012. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

Rowlingson, B.S., and P.J. Diggle. 1993. Splancs: Spatial point pattern analysis code in S-plus. Comput. Geosci. 19:627–655. doi:10.1016/0098-3004(93)90099-Q

Timmerman, M.E., H.A.L. Kiers, and A.K. Smilde. 2007. Estimating confidence intervals for principal component loadings: A comparison between the bootstrap and asymptotic results. Br. J. Math. Stat. Psychol. 60:295–314. doi:10.1348/000711006X109636

Yan, W., K.D. Glover, and M.S. Kang. 2010. Comment on “Biplot analysis of genotype × environment interaction: Proceed with caution” by R.-C. Yang, J. Crossa, P.L. Cornelius, and J. Burgueño in 2009 49:1564–1576. Crop Sci. 50:1121–1123. doi:10.2135/cropsci2010.01.0001le

Yan, W., M.S. Kang, B. Ma, S. Woods, and P.L. Cornelius. 2007. GGE Biplot vs. AMMI analysis of genotype-by-environment data. Crop Sci. 47:641–653.

Yan, W., and N.A. Tinker. 2006. Biplot analysis of multi-environment trial data: Principles and applications. Can. J. Plant Sci. 86:623–645. doi:10.4141/P05-169

Yan, W.K., L.A. Hunt, Q.L. Sheng, and Z. Szlavnics. 2000. Cultivar evaluation and mega-environment investigation based on the GGE biplot. Crop Sci. 40:597–605. doi:10.2135/cropsci2000.403597x

Yang, R.-C. 2007. Mixed-model analysis of crossover genotype-environment interactions. Crop Sci. 47:1051–1062. doi:10.2135/cropsci2006.09.0611

Yang, R.-C., J. Crossa, P.L. Cornelius, and J. Burgueño. 2009. Biplot analysis of genotype × environment interaction: Proceed with caution. Crop Sci. 49:1564–1576. doi:10.2135/cropsci2008.11.0665

Yang, R.-C., F.C. Yeh, and A.D. Yanchuk. 1996. A comparison of isozyme and quantitative genetic variation in Pinus contorta ssp. latifolia by FST. Genetics 142:1045–1052.

Zobel, R.W., M.J. Wright, and H.G. Gauch. 1988. Statistical analysis of a yield trial. Agron. J. 80:388–393. doi:10.2134/agronj1988.00021962008000030002x

www.crops.org

crop science, vol. 53, november– december 2013