RESEARCH
Evaluating Sugarcane Families for Yield Potential and Repeatability Using Random Coefficient Models M. M. Zhou,* C.A. Kimbeng, S. Andru, T. L. Tew, M. J. Pontif, and K. A. Gravois
ABSTRACT Sugarcane (Saccharum species hybrid) is a clonally propagated crop, but during variety development the first stage of selection occurs among seedlings established from true seed. All subsequent stages are established from vegetative stem cuttings and evaluated as clones. During family selection, whole families of seedlings are selected or rejected based on the family mean performance. Individual seedling selection is restricted to selected families. Confounding due to “seed type” can occur when the relative performance of families or individual genotypes differs between the seedling and clonal stages. The objective of this study was to demonstrate the use of random coefficient models (RCMs) for estimating the potential of sugarcane families to produce elite progeny. Data were collected from 17 families evaluated as seedlings in 2002 and clones in 2003. The elite families selected using RCM analysis produced higher mean cane yield in both the seedling and clonal stages and a higher proportion of high-yielding clones than those selected using analysis of covariance and analysis of variance. The slope was the most discriminating parameter for family repeatability and was associated with high within-family variability.
M.M. Zhou, C.A. Kimbeng, and S. Andru, School of Plant, Environmental and Soil Sciences, LSU AgCenter, Baton Rouge, LA 70803; T.L. Tew, Sugarcane Research Unit, ARS, USDA, Houma, LA 70360; M.J. Pontif and K.A. Gravois, Sugar Research Station, LSU AgCenter, St. Gabriel, LA 70776; M.M. Zhou, South African Sugarcane Research Institute, P. Bag X02, Mt Edgecombe 4300, South Africa; M.M. Zhou, University of the Free State, PO Box 339, Bloemfontein 9300, Republic of South Africa. Received 24 Jan. 2013. *Corresponding author (
[email protected]). Abbreviations: ANCOVA, analysis of covariance; RCM, random coefficient model.
Selection is the cornerstone of plant breeding and is practiced across all stages of sugarcane improvement (Skinner et al., 1987). Because sugarcane is a clonally propagated crop, the first episode of selection in sugarcane improvement programs occurs among seedlings planted from true seed of desired crosses. This seedling stage (hereafter referred to as Stage I) is the only stage to be established from true seed. The seedlings are appraised either as individuals (mass selection) or as families (family selection). The second stage of selection (hereafter referred to as Stage II) occurs when individual seedlings selected in Stage I are clonally propagated through stem cuttings and evaluated in row plots. Family selection in sugarcane involves positive selection of whole populations of seedlings based on data derived from family plots (Kimbeng and Cox, 2003). Family selection in Stage I is practiced to varying extents in Australia (Hogarth et al., 1990; Jackson et al., 1995a, 1995b; Cox and Stringer, 1998; Kimbeng et al., 2000), the United States (Milligan and Legendre, 1990; Chang and Milligan, 1992a, 1992b; Tai et al., 2003), India (Shanthi
Published in Crop Sci. 53:2352–2362 (2013). doi: 10.2135/cropsci2013.01.0052 © Crop Science Society of America | 5585 Guilford Rd., Madison, WI 53711 USA All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permission for printing and for reprinting the material contained herein has been obtained by the publisher.
www.crops.org
crop science, vol. 53, november– december 2013
et al., 2008), Brazil (De Resende and Barbosa, 2006; Pedrozo et al., 2011), and other countries. Family selection is efficient when based on data obtained from plant crops of replicated trials followed by individual selection of seedlings in the first ratoon crop of the trial (Cox and Hogarth, 1993; Kimbeng et al., 2000, 2001a, 2001b). One key advantage of family selection is the ability to replicate families (but not individual seedlings) across locations (Hogarth and Mullins, 1989; Jackson et al., 1995b), which allows genotype × environment interaction to be evaluated (Skinner et al., 1987). In Stage I, individual seedlings cannot be replicated because of inadequate planting material and the large numbers planted. Family selection is practiced for cane yield (Zhou et al., 2011), a trait known to possess low heritability (Jackson et al., 1995a, 1995b). Before family selection, the proven cross system was used (Heinz and Tew, 1987). The proven cross system assessed sugarcane crosses by the proportion of their seedlings advanced to later stages. The value of the proven cross system was questioned early on by Walker (1963) because larger numbers of seedlings were planted from families judged to be elite at the expense of untested families, which created and then compounded a bias against the untested families. Another disadvantage of the proven cross system was the lack of statistical tests to compare families. The proven cross system also took several years to evaluate family potential because the breeder had to wait for advanced clones to quantify the value of the individual families (Milligan and Legendre, 1990). The advent of mobile weighing machines made it possible to harvest and weigh large numbers of seedlings from family plots, ushering in a new era of family selection in sugarcane breeding (Hogarth and Mullins, 1989; Kimbeng and Cox, 2003).
Data derived from family selection are used to determine the breeding values of parents (Chang and Milligan, 1992a, 1992b; Stringer et al., 1996; Cox and Stringer, 1998; Balzarini, 2000; Aitken et al., 2008) to retain for future crossing, the cross combinations to make, and the number of crosses and seedlings per cross to plant and ultimately select. Sugarcane breeders have customarily relied on differences between means to discriminate among families during selection (Hogarth et al., 1990; Milligan and Legendre, 1990; Chang and Milligan, 1992a, 1992b; Cox and Stringer, 1998; Jackson et al., 1995a, 1995b; Kimbeng et al., 2000). The mean is easy to estimate and to compare across families (Chang and Milligan, 1992a, 1992b; Tai et al., 2003). Sugarcane is grown commercially as a clone. The rationale for family selection is to identify families that possess a higher proportion of superior yielding seedlings that would perform well when grown as clones (Kimbeng and Cox, 2003). Instances have been recorded in Stage II trials where individual clones or families of clones performed better or worse than expected from their family mean in Stage I (Hogarth et al., 1990; Kimbeng et al., 2000). Kimbeng et al. (2000) attributed this discrepancy to
lack of accounting for within-family variation. Kimbeng and Cox (2003) suggested that evaluation of repeatability, or prediction of Stage II performance from Stage I, was needed. Seed type is potentially confounded with stage because Stage I is established from true seed and Stage II from vegetative stem cuttings. Confounding occurs when seed type affects genotype yield differently in Stages I and II. A statistical analysis method that addresses this confounding is required. The objective of this study was to demonstrate the usefulness of the random coefficient model (RCM) analysis as a statistical tool for estimating the potential of sugarcane families to produce elite progeny.
MATERIALS AND METHODS
Experimental Materials and Data Collection
A random sample of 17 out of 300 biparental crosses of sugarcane was used in this study (Table 1). Seedlings from these crosses were established from true seed in the greenhouse and later transplanted to the field as individual plants in the spring of 2002.
Stage I (Seedling Trial)
The Stage I trial refers to seedlings that were initially established from true seed. One set of the unselected individual seedlings was planted in the first replication and the other set in the second replication. Seedlings from each family were planted to two-row plots with 16 seedlings per row. Families but not seedlings were replicated. The seedlings were harvested and left to overwinter in 2003. From the seedlings that survived the winter, eight seedlings (four seedlings from each row) were randomly chosen per plot for a total of 16 seedlings per family. At harvest, the number of stalks produced by each of the chosen seedlings was counted. The stalk height was measured from the base of the tallest stalk to the topmost visible dewlap. The stalk diameter of three random stalks per seedling was measured at the center of the stalk (without reference to the bud) using a caliper. The mean diameter, together with stalk number and stalk height, was used to estimate the seedling cane yield using the formula of De Sousa-Vieira and Milligan (1999) (Eq. [1]). Their calculation assumed the sugarcane stalk was a perfect cylinder with a specific gravity of 1.00 g cm⁻³ (Miller and James, 1974; Gravois et al., 1991; Chang and Milligan, 1992b):

$$\text{Seedling cane yield} = \frac{n d \pi r^{2} L}{1000}, \tag{1}$$

in which seedling cane yield is measured in kilograms, n is the number of stalks, d is the density (1.00 g cm⁻³), r is the stalk radius (cm), calculated as the diameter divided by 2, and L is the stalk height (cm).
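As an illustration of Eq. [1], the following minimal Python sketch computes the estimated cane yield of one seedling. The function name and the example measurements are hypothetical and are not taken from the study data.

```python
import math


def seedling_cane_yield_kg(n_stalks, mean_diameter_cm, height_cm, density_g_cm3=1.00):
    """Eq. [1]: yield (kg) = n * d * pi * r^2 * L / 1000, stalk treated as a cylinder."""
    radius_cm = mean_diameter_cm / 2.0  # r is half the mean stalk diameter
    volume_cm3 = math.pi * radius_cm ** 2 * height_cm  # volume of one stalk
    return n_stalks * density_g_cm3 * volume_cm3 / 1000.0  # grams to kilograms


# Hypothetical seedling: 8 stalks, 2.4 cm mean diameter, 200 cm stalk height.
example_yield = seedling_cane_yield_kg(8, 2.4, 200.0)
print(round(example_yield, 2))
```

Because the density is fixed at 1.00 g cm⁻³, the estimate is simply the total cylinder volume of the stalks expressed in kilograms.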
Stage II (Clonal Generation I Trial)
In Stage II, the 17 families, established through stem cuttings of selected Stage I individuals, were planted in a trial with two replications where each family was randomized to a plot. A plot consisted of two rows with each row planted to four clones for a total of eight clones per plot. Each clone was planted using a two-stalk sample that was topped at the natural breaking point to a
Table 1. The number of stools in replications 1 and 2 of 17 families derived from the 2000 and 2001 crossing series.

| Series | Family | Cross | Replication 1 | Replication 2 |
|---|---|---|---|---|
| HB00 | 306 | Ho 94-856 × HoCP 96-540 | 18 | 17 |
| HB01 | 3055 | HoCP 00-945 × HoCP 99-866 | 18 | 17 |
| HB01 | 3074 | HoCP 00-950 × HoCP 96-540 | 16 | 19 |
| HB01 | 3093 | HoCP 00-945 × HoCP 96-540 | 17 | 18 |
| HB01 | 3101 | HoCP 99-866 × HoCP 96-540 | 17 | 16 |
| HB01 | 3107 | HoCP 00-950 × LCP 85-384 | 18 | 16 |
| HB01 | 3111 | HoCP 99-866 × LCP 85-384 | 18 | 18 |
| HB01 | 3174 | HoCP 00-945 × LCP 85-384 | 18 | 18 |
| HB01 | 3249 | N27 × LCP 85-384 | 23 | 11 |
| HB01 | 3255 | HoCP 00-945 × Ho 94-856 | 18 | 18 |
| HB01 | 3256 | HoCP 00-950 × Ho 94-856 | 17 | 18 |
| HB01 | 3257 | L 98-207 × Ho 94-856 | 18 | 18 |
| HB01 | 3276 | TucCP 77-42 × HoCP 99-866 | 17 | 17 |
| HB01 | 3322 | TucCP 77-42 × L 98-207 | 18 | 18 |
| HB01 | 3328 | HoCP 91-555 × LCP 85-384 | 18 | 20 |
| HB01 | 3345 | HoCP 91-555 × L 98-207 | 17 | 17 |
| HB01 | 3417 | HoCP 91-555 × TucCP 77-42 | 18 | 18 |
1.8 m of row within the main plot. The families were replicated. Each family was therefore represented by 16 individual clones derived from the 16 chosen seedlings from Stage I. The identity of the seedlings was maintained in the clonal plots in Stage II. At harvest, the number of stalks per clone was counted. From each clone, five random stalks were manually cut and weighed. The five-stalk sample weights were used to calculate the average stalk weight for each clone. The number of stalks per clone was multiplied by the average stalk weight for that clone to estimate the clonal cane yield. The data were measured in the plant (2004) and second ratoon (2006) crops. The first ratoon crop (2005) was severely lodged after Hurricane Katrina and no data were collected. All field tests were conducted at the USDA ARS Ardoyne Research Farm at Schriever, LA (29°44′ N, 90°49′ W).
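The clonal cane yield estimate described above (stalk count multiplied by the mean weight of a five-stalk sample) can be sketched in Python as follows. The function name and the sample values are hypothetical and serve only to illustrate the arithmetic.

```python
def clonal_cane_yield_kg(n_stalks, sample_weights_kg):
    """Estimate clonal cane yield as stalk count times mean sampled stalk weight."""
    if not sample_weights_kg:
        raise ValueError("at least one sampled stalk weight is required")
    mean_stalk_weight = sum(sample_weights_kg) / len(sample_weights_kg)
    return n_stalks * mean_stalk_weight


# Hypothetical clone: 12 stalks and a five-stalk sample weighed in kilograms.
example_clone_yield = clonal_cane_yield_kg(12, [1.0, 1.2, 0.9, 1.1, 0.8])
print(round(example_clone_yield, 2))
```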
Statistical Considerations and Data Analysis Using Random Coefficient Models
The RCMs or conditional hierarchical linear models were developed from analysis of covariance (ANCOVA) following Bryk and Raudenbush (1992). In ANCOVA, the families are treated as fixed populations and produce fixed intercepts and slopes. The RCM analysis assumes a hierarchical or multilevel arrangement between the population (all families) and subpopulations (individual families) nested within the population of all families (Goldstein, 1987; Bryk and Raudenbush, 1992). The families are treated as independent and random subjects derived from the fixed population (Littell et al., 2005). The intercept and slope of each family are derived from the regression of the cane yield of clones (response variable) on the cane yield of seedlings (predictor variable) and are therefore treated as random parameters described by variances and a covariance. The intercept and slope from each family are correlated, and the strength of the correlation is determined by the magnitude of the covariance. The intercept and slope of each family are tested against the population intercept and slope, respectively (Longford, 1993). The RCM analysis was previously applied in animal breeding (Longford, 1993), education (Raudenbush, 1988), finance (Fieldsend et al., 1987), health (Lundbye-Christensen, 1991), and real estate (Harrison and Rubinfeld, 1978). For example, in real estate, the RCM analysis was used to compare the trends for house prices over time across states, cities, and suburbs. The seedlings within a family represented a random sample derived from a large seedling population of all families and were evaluated in Stages I and II. The family trend was tested against that of the population. The population regression model was

$$y_j = a + b x_j + e, \tag{2}$$
in which $y_j$ is the cane yield of the jth clone (j = 1, 2, …, s) in Stage II, a is the population intercept, b is the population slope, $x_j$ is the cane yield of the jth seedling (Stage I), and e is the residual error. The family regression model was

$$y_{ij} = a_i + b_i x_{j(i)} + e_{ij}, \tag{3}$$

in which $y_{ij}$ is the cane yield of the jth clone nested within the ith family (i = 1, 2, …, f), $a_i$ is the ith family intercept, $b_i$ is the ith family slope, $x_{j(i)}$ is the cane yield of the jth seedling nested within the ith family, and $e_{ij}$ is the residual error for the ith family. The intercepts and slopes are not independent and follow a normal distribution where

$$\begin{pmatrix} a_i \\ b_i \end{pmatrix} \sim \text{iid } N\!\left[\begin{pmatrix} a \\ b \end{pmatrix}, \Psi\right], \qquad \Psi = \begin{pmatrix} \sigma_a^2 & \sigma_{ab} \\ \sigma_{ab} & \sigma_b^2 \end{pmatrix},$$

and $e_{ij} \sim \text{iid } N(0, \sigma_e^2)$, in which $\Psi$ is the covariance matrix of the intercepts and slopes, $\sigma_a^2$ is the variance for intercepts, $\sigma_b^2$ is the variance for slopes, $\sigma_{ab}$ is the covariance of slopes and intercepts, and iid denotes independent and identically distributed. Equation [2] (population) and Eq. [3] (family) are combined to produce the effects model,

$$y_{ij} = a + a_i^* + b x_{j(i)} + b_i^* x_{j(i)} + e_{ij}, \tag{4}$$

in which $a_i^* = a_i - a$, $b_i^* = b_i - b$, and

$$\begin{pmatrix} a_i^* \\ b_i^* \end{pmatrix} \sim \text{iid } N\!\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \Psi\right].$$

The random effect $a_i^*$ is the deviation of the ith family intercept from the population intercept (a), and the random effect $b_i^*$ is the deviation of the ith family slope from the fixed population slope (b). The random effects $a_i^*$ and $b_i^*$ have a mean of zero and covariance matrix $\Psi$. Equation [4] resembles a mixed model,

$$y_{ij} = a + b x_{j(i)} + a_i^* + b_i^* x_{j(i)} + e_{ij}, \tag{5}$$

in which $a + b x_{j(i)}$ is the fixed effects component of the model (population model) and $a_i^* + b_i^* x_{j(i)} + e_{ij}$ is the random effects component of the model. Equation [5] can be written as

$$y_{ij} = a + b x_{j(i)} + e_{ij}^*, \tag{6}$$

in which $E(y_{ij}) = a + b x_{j(i)}$ produces the fixed effects component of the model and

$$\operatorname{Var}(y_{ij}) = \begin{pmatrix} 1 & x_{j(i)} \end{pmatrix} \Psi \begin{pmatrix} 1 \\ x_{j(i)} \end{pmatrix} + \sigma_e^2$$

is the variance used for testing the random effects. Equation [5] was used in the mixed procedure of SAS (SAS Institute, 2007) to perform the data analysis.
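To make the variance expression attached to Eq. [6] and the family line implied by Eq. [5] concrete, the following Python sketch expands both by hand. All numerical values for the covariance parameters and the population intercept and slope are hypothetical assumptions; only the deviations used for the family line (11.60 and –0.99) are the effects reported for family 3249 in Table 4. This is not the SAS analysis itself, only the arithmetic behind it.

```python
def marginal_variance(x, var_a, cov_ab, var_b, var_e):
    """Var(y_ij) = [1 x] Psi [1 x]' + residual variance, expanded by hand.

    Psi = [[var_a, cov_ab], [cov_ab, var_b]] is the intercept-slope covariance matrix.
    """
    return var_a + 2.0 * x * cov_ab + x ** 2 * var_b + var_e


def family_prediction(x, pop_a, pop_b, dev_a, dev_b):
    """Family line from Eq. [5]: (a + a_i*) + (b + b_i*) * x, residual omitted."""
    return (pop_a + dev_a) + (pop_b + dev_b) * x


# Hypothetical covariance parameters and seedling yield (kg), for illustration only.
example_variance = marginal_variance(10.0, var_a=25.0, cov_ab=-3.0, var_b=0.5, var_e=100.0)

# Hypothetical population intercept (20.0) and slope (1.5); deviations from Table 4, family 3249.
example_prediction = family_prediction(10.0, pop_a=20.0, pop_b=1.5, dev_a=11.60, dev_b=-0.99)

print(example_variance, round(example_prediction, 2))
```

The expansion shows why a negative intercept-slope covariance reduces the variance of clonal yield at moderate seedling yields, while the slope variance dominates as seedling yield grows.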
Data Analysis Using Simple Linear Regression, Analysis of Covariance, and Analysis of Variance
Simple linear regression was performed using the SAS mixed procedure to determine if the population (all families) intercept and slope were significant. Using the SAS mixed procedure separated the variation associated with random variables such as crop–years, replications, and clones within families from the experimental error. The linear mixed model used was

$$Y_{ijkm} = T_i + R(T)_{j(i)} + a + FR(T)_{jk(i)} + b x_{m(ijk)} + C(F)_{m(k)} + e_{ijkm}, \tag{7}$$

in which $Y_{ijkm}$ is the clonal yield estimated from the ith crop–year (i = 1, 2), jth replication (j = 1, 2), kth family (k = 1, 2, …, 17), and mth clone (m = 1, 2, …, 16), $T_i$ is the random effect of the ith crop–year, $R(T)_{j(i)}$ is the random effect of the jth replication nested within the ith crop–year, a is the population intercept, $FR(T)_{jk(i)}$ is the random interaction effect of the jth replication by the kth family nested in the ith crop–year, $C(F)_{m(k)}$ is the random effect of the mth clone nested within the kth family, b is the population slope, $x_{m(ijk)}$ is the estimated seedling cane yield from the mth clone nested within the ith crop–year, jth replication, and kth family, and $e_{ijkm}$ is the residual error.

The ANCOVA was performed using the SAS mixed procedure and generated individual family intercepts and slopes. The linear mixed model used was

$$Y_{ijkm} = T_i + R(T)_{j(i)} + a_k + FR(T)_{jk(i)} + b_k x_{m(ijk)} + C(F)_{m(k)} + e_{ijkm}, \tag{8}$$

in which $a_k$ is the intercept of the kth family and $b_k$ is the slope of the kth family. The Pearson correlation coefficient for cane yield of clones and seedlings for each family was also determined.

The ANOVA was performed for the seedlings and clones using the SAS mixed procedure. The linear mixed model used for the seedlings was

$$Y_{jkm} = R_j + F_k + FR_{jk} + C(F)_{m(k)} + e_{jkm}, \tag{9}$$

and the linear mixed model used for the clones was

$$Y_{ijkm} = T_i + R(T)_{j(i)} + F_k + FR(T)_{jk(i)} + C(F)_{m(k)} + e_{ijkm}, \tag{10}$$

in which $F_k$ is the fixed effect of the kth family. To demonstrate the utility of the RCM analysis, we compared the mean cane yield of seedlings and clones of families that were selected using the ANOVA, ANCOVA, and RCMs.
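The per-family intercepts, slopes, and Pearson correlation coefficients described above can be illustrated with a simplified Python sketch. It fits an ordinary least squares line to each family separately and deliberately ignores the random crop–year, replication, and clone terms of Eq. [8], so it is not equivalent to the SAS mixed procedure. The family labels and yields are hypothetical.

```python
import math


def ols_line(x, y):
    """Return (intercept, slope, pearson_r) for a simple regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pearson_r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, pearson_r


# Hypothetical data: seedling yield (x, kg) and clonal yield (y, kg) per family.
data = {
    "family_A": ([2.0, 4.0, 6.0, 8.0], [25.0, 29.0, 33.0, 37.0]),
    "family_B": ([2.0, 4.0, 6.0, 8.0], [30.0, 31.0, 29.0, 30.0]),
}
fits = {family: ols_line(x, y) for family, (x, y) in data.items()}

for family, (intercept, slope, pearson_r) in fits.items():
    print(family, round(intercept, 2), round(slope, 2), round(pearson_r, 2))
```

In this toy example, family_A shows a steep slope and a strong seedling-to-clone association, whereas family_B shows a high intercept but essentially no association, which mirrors the contrast the family-level parameters are meant to expose.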
RESULTS
Population Parameters
Preliminary analysis of the clonal cane yield (Stage II) from the plant and second ratoon crops using ANOVA, ANCOVA, and RCMs produced similar trends; therefore, further analyses combined data from both crops. The clonal yield plotted against the seedling yield (Fig. 1) was also analyzed using simple linear regression (Eq. [7]). There was a
Figure 1. The clonal cane yield versus seedling cane yield of the 17 families, their population trend, and the perfect linear association (PLA). Each family comprised 16 entries.

Table 2. The intercepts, slopes, standard errors, P-values, and correlation coefficients of the 17 families derived from analysis of covariance.

| Family | Intercept estimate ± SE | Intercept P-value | Slope estimate ± SE | Slope P-value | Correlation coefficient† |
|---|---|---|---|---|---|
| 306 | 27.36 ± 5.38 | ** | 1.01 ± 0.63 | ns‡ | 0.45 |
| 3055 | 18.73 ± 4.93 | ** | 1.62 ± 0.47 | ** | 0.46 |
| 3074 | 29.73 ± 6.00 | ** | 0.97 ± 0.71 | ns | 0.38 |
| 3093 | 18.15 ± 4.27 | ** | 1.94 ± 0.34 | ** | 0.71 |
| 3101 | 5.76 ± 6.86 | ns | 4.08 ± 0.68 | ** | 0.67 |
| 3107 | 21.24 ± 6.25 | ** | 1.32 ± 0.62 | * | 0.33 |
| 3111 | 15.59 ± 7.48 | * | 2.03 ± 0.69 | ** | 0.57 |
| 3174 | 11.74 ± 5.20 | * | 3.55 ± 0.56 | ** | 0.62 |
| 3249 | 44.08 ± 5.16 | ** | –0.42 ± 0.63 | ns | 0.01 |
| 3255 | 29.41 ± 5.97 | ** | 1.84 ± 0.67 | ** | 0.57 |
| 3256 | 33.60 ± 5.56 | ** | 1.16 ± 0.76 | ns | 0.38 |
| 3257 | 45.18 ± 5.87 | ** | –1.43 ± 0.85 | ns | –0.08 |
| 3276 | 25.56 ± 4.99 | ** | 2.06 ± 0.44 | ** | 0.61 |
| 3322 | 16.79 ± 4.88 | ** | 2.88 ± 0.48 | ** | 0.67 |
| 3328 | 32.15 ± 5.72 | ** | 0.38 ± 0.90 | ns | 0.07 |
| 3345 | 20.43 ± 5.59 | ** | 1.22 ± 0.59 | * | 0.39 |
| 3417 | 23.67 ± 4.50 | ** | 1.71 ± 0.42 | ** | 0.56 |

*Significant at the 0.05 probability level.
**Significant at the 0.01 probability level.
† Pearson correlation coefficient between Stage I (seedlings) and Stage II (clones).
‡ ns, not significant at P = 0.05.
significant (r = 0.45, P < 0.001) association between seedling and clonal yield indicating that Stage II yield can be predicted from Stage I.
Family Evaluation Using Analysis of Covariance The ANCOVA was performed using the clonal yield as the response variable and the seedling yield as the covariate and family as a fixed effect. The overall test for the intercepts and slopes was highly significant (P < 0.001) (Table 2). All families except 3101 (P = 0.41), 3111 (P = 0.04),
Figure 2. The plots of the family slopes versus the family intercepts (a), the family means versus the family intercepts (b), the family means derived from Stage II versus the family slopes (c), and the family slopes versus the family standard deviations (d) of the 17 families.
and 3174 (P = 0.03) produced significant intercepts (P 0.05).
Table 4. The effects (kg), SEs (kg), and the probability of obtaining a larger t-statistic (P-value) for the tests of the intercept and slope of the 17 families.

| Family | Effect of intercept ± SE | Intercept P-value | Effect of slope ± SE | Slope P-value |
|---|---|---|---|---|
| 306 | 0.92 ± 3.82 | 0.81 | –0.25 ± 0.49 | 0.61 |
| 3055 | –3.37 ± 3.75 | 0.37 | –0.12 ± 0.45 | 0.79 |
| 3074 | 1.53 ± 4.10 | 0.71 | –0.10 ± 0.51 | 0.84 |
| 3093 | –4.55 ± 3.23 | 0.16 | 0.25 ± 0.37 | 0.50 |
| 3101 | –7.27 ± 4.60 | 0.11 | 1.07 ± 0.52 | 0.04 |
| 3107 | –1.50 ± 4.48 | 0.74 | –0.34 ± 0.51 | 0.51 |
| 3111 | –3.79 ± 4.70 | 0.42 | 0.01 ± 0.49 | 0.98 |
| 3174 | –5.14 ± 3.82 | 0.18 | 0.73 ± 0.47 | 0.13 |
| 3249 | 11.60 ± 3.78 | 0.01 | –0.99 ± 0.50 | 0.05 |
| 3255 | 1.11 ± 4.17 | 0.79 | 0.48 ± 0.52 | 0.35 |
| 3256 | 4.93 ± 3.85 | 0.20 | –0.05 ± 0.53 | 0.93 |
| 3257 | 8.65 ± 3.88 | 0.03 | –0.95 ± 0.54 | 0.08 |
| 3276 | 0.68 ± 3.62 | 0.85 | 0.31 ± 0.40 | 0.44 |
| 3322 | –4.42 ± 3.64 | 0.23 | 0.71 ± 0.44 | 0.10 |
| 3328 | 3.80 ± 3.85 | 0.32 | –0.48 ± 0.57 | 0.40 |
| 3345 | –2.85 ± 4.01 | 0.48 | –0.30 ± 0.49 | 0.54 |
| 3417 | –0.31 ± 3.38 | 0.93 | 0.01 ± 0.40 | 0.98 |
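As a reading aid for Table 4, the short Python sketch below lists the families whose intercept or slope effect has a reported P-value below a chosen threshold. The P-values are copied from Table 4 as printed (rounded to two decimals), and the strict 0.05 cutoff is an assumption made for illustration, so a borderline value printed as 0.05 is not flagged.

```python
# (family, intercept P-value, slope P-value) as printed in Table 4.
TABLE4_P_VALUES = [
    ("306", 0.81, 0.61), ("3055", 0.37, 0.79), ("3074", 0.71, 0.84),
    ("3093", 0.16, 0.50), ("3101", 0.11, 0.04), ("3107", 0.74, 0.51),
    ("3111", 0.42, 0.98), ("3174", 0.18, 0.13), ("3249", 0.01, 0.05),
    ("3255", 0.79, 0.35), ("3256", 0.20, 0.93), ("3257", 0.03, 0.08),
    ("3276", 0.85, 0.44), ("3322", 0.23, 0.10), ("3328", 0.32, 0.40),
    ("3345", 0.48, 0.54), ("3417", 0.93, 0.98),
]


def families_below(rows, column, alpha=0.05):
    """Return families whose P-value in the given column (1 or 2) is below alpha."""
    return [row[0] for row in rows if row[column] < alpha]


significant_intercepts = families_below(TABLE4_P_VALUES, column=1)
significant_slopes = families_below(TABLE4_P_VALUES, column=2)
print(significant_intercepts, significant_slopes)
```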
Family Evaluation Using Random Coefficient Model Analysis The RCM analysis tested each family’s intercept and slope against the population intercept and slope, respectively. Equation [6] and the covariance parameters (Table 3) were used to compute the variances that were used for testing the family effects. The overall tests for the intercept and slope effects were highly significant (P < 0.001) (data not shown). Families 3249 and 3257 produced significant (P