Copyright © 2005 by the Genetics Society of America
DOI: 10.1534/genetics.104.038448

Optimal Allocation in Designs for Assessing Heterosis From cDNA Gene Expression Data

Hans-Peter Piepho1

Institut für Pflanzenbau und Grünland, Universität Hohenheim, 70599 Stuttgart, Germany

Manuscript received November 11, 2004
Accepted for publication May 30, 2005

ABSTRACT

Heterosis is defined as the superiority of a hybrid cross over its two parents. Plant and animal breeders have long been exploiting heterosis, but the causes of this phenomenon are as yet only partly understood. Recently, chip technology has opened up the opportunity to study heterosis at the gene expression level. This article considers the cDNA chip technology, which allows assaying two genotypes simultaneously on the same chip. Heterosis involves the response of at least three genotypes (two parents and their hybrid), so a chip or microarray constitutes an incomplete block, which raises a design problem specific to heterosis studies. The question to be answered is how genotype pairs should be allocated to chips. We address this design problem for two types of heterosis: midparent heterosis and better-parent heterosis. The general picture emerging from our results is that most of the resources should be allocated to parent-hybrid pairs, while chips with parent-parent pairs or hybrid-reciprocal pairs should be used sparingly or not at all.

PROGRESS in plant and animal breeding is often made by exploiting nonadditive gene action. For example, when two maize inbred lines are crossed, the resulting hybrid is frequently found to be superior to the midparent value, i.e., the average of the two parent means (Falconer and Mackay 1996; Lynch and Walsh 1998). This phenomenon is commonly denoted as midparent heterosis or hybrid vigor. Historically, heterosis was first studied at the phenotypic level of agronomically relevant traits such as yield. Several theories have been put forward to explain heterosis (e.g., Stuber et al. 1992), but a consensus has not yet emerged. The advent of chip technologies has now opened up the scope to study heterosis at the gene expression level (Ni et al. 2000; Kollipara et al. 2002; Guo et al. 2003), thus increasing our understanding of the underlying molecular basis of heterosis (Birchler et al. 2003). This article is concerned with the optimal design of gene expression studies aiming at heterosis.

The notion of heterosis may be associated with a linear model as follows. The expected phenotypic values of two parent genotypes A and B and their hybrid AB can be expressed as

$$\mu_A = \phi + \tau_A, \tag{1}$$

$$\mu_B = \phi + \tau_B, \tag{2}$$

and

$$\mu_{AB} = \phi + \tau_{AB}, \tag{3}$$

1 Address for correspondence: Institut für Pflanzenbau und Grünland, Universität Hohenheim, Fruwirthstrasse 23, 70599 Stuttgart, Germany. E-mail: [email protected]

Genetics 171: 359–364 (September 2005)

where $\phi$ is a general effect and $\tau$ is the genotypic effect. Midparent heterosis may be defined by the linear contrast

$$\delta_{AB} = \mu_{AB} - \frac{\mu_A + \mu_B}{2} = \tau_{AB} - \frac{\tau_A + \tau_B}{2}. \tag{4}$$

Midparent heterosis occurs whenever $\delta_{AB} \neq 0$. Often, it matters which inbred line is the male parent. It is then important to also study the reciprocal cross, which we denote as BA. The linear model for this genotype is

$$\mu_{BA} = \phi + \tau_{BA}. \tag{5}$$

The reciprocal's midparent heterosis is

$$\delta_{BA} = \tau_{BA} - \frac{\tau_A + \tau_B}{2}. \tag{6}$$
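As a small illustration of the contrasts in Equations 4 and 6, the following Python sketch computes midparent heterosis for a hybrid and its reciprocal. The function name and the expression means are hypothetical and chosen only for illustration; they are not data from this study.

```python
def midparent_heterosis(mu_hybrid, mu_parent_a, mu_parent_b):
    """Hybrid mean minus the average of the two parent means (Equations 4 and 6)."""
    return mu_hybrid - (mu_parent_a + mu_parent_b) / 2.0


# Hypothetical mean log-expression values for one gene (illustration only).
mu = {"A": 8.0, "B": 10.0, "AB": 9.5, "BA": 9.0}

delta_ab = midparent_heterosis(mu["AB"], mu["A"], mu["B"])  # 9.5 - 9.0
delta_ba = midparent_heterosis(mu["BA"], mu["A"], mu["B"])  # 9.0 - 9.0

# The general effect phi cancels: shifting all means by a constant leaves delta unchanged.
shifted = midparent_heterosis(mu["AB"] + 3.0, mu["A"] + 3.0, mu["B"] + 3.0)

print(delta_ab, delta_ba, shifted)
```

In this made-up example the hybrid AB shows midparent heterosis while the reciprocal BA does not, which is the situation that motivates testing both crosses.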

Heterosis of an agronomic trait is economically useful when the hybrid outperforms both parents. This type of heterosis is also known as better-parent heterosis, and it will occur for hybrid AB when $\tau_{AB} > \tau_A$ and $\tau_{AB} > \tau_B$, assuming nonnegative coefficients and that an increase in average phenotype is considered advantageous.

Heterosis is thought to be associated with nonadditive gene action or dominance. In fact, dominance may be regarded as midparent heterosis at the gene level. Similarly, overdominance occurs when there is better-parent heterosis at the gene level. If an expression product for a specific gene can be measured for the inbred parents and their hybrids, dominance can be estimated on the basis of (4) and (6). Similarly, overdominance can be assessed on the basis of the contrasts $\tau_{AB} - \tau_A$ and $\tau_{AB} - \tau_B$ at the expression level.

This article is concerned with cDNA chip technology, where each of a large number of genes is represented by a cDNA spot on a glass slide. Expression profiles of two mRNA samples representing two different genotypes are assayed on a slide in parallel. Genotypes are labeled by fluorescent dyes, resulting in a green signal for the one genotype and a red signal for the other genotype. To account for dye effects, it is customary to swap dyes on about half of the chips assigned to the same genotype pair. Statistically, a microarray may be considered as an incomplete block accommodating only two treatments (genotypes) (Kerr and Churchill 2001; Kerr 2003). The design problem is how to allocate different genotype pairs to chips.

Most of the current literature on experimental designs for identifying differentially expressed genes deals with the case where two or more treatments of equal interest are to be compared. Efficient designs in this context are the reference design, the loop design, and balanced block designs (Dobbin and Simon 2002; Kerr 2003; Dobbin et al. 2003a,b; Simon et al. 2003). The objective of heterosis studies differs from those commonly considered in that the treatment contrast of interest involves three treatments, so efficiency regarding all pairwise comparisons is not the primary concern. Also, most of the theory of optimal designs revolves around criteria such as A-optimality or E-optimality (John and Williams 1995; Yang et al. 2002), which strive for optimality relative to a broad class of contrasts. In the case of heterosis, such approaches are not optimal, because the class of contrasts of interest is much more limited. In fact, there is only one type of contrast. While other designs such as the loop design may provide good heterosis estimates (Gibson et al. 2004), they are not usually optimal (Keller et al. 2005). By analogy, a balanced block design optimal with respect to all pairwise comparisons is not optimal regarding multiple comparison with a control. Generally, it will be more efficient to directly optimize the design with respect to the particular contrast(s) of interest (John and Williams 1995).

This article is concerned with the problem of finding a design by which heterosis or dominance can be estimated with minimal standard error. Specifically, we search for the optimal allocation of a fixed number of chips among all possible genotype pairs. We first consider midparent heterosis and then turn to better-parent heterosis. With both types of heterosis, we study the case of two hybrids as well as that of a single hybrid (no reciprocal tested). The derivations for different cases are organized as follows: first an appropriate linear model is formulated and contrasts of interest are defined in terms of the parameters of that model. Optimality is then defined in terms of the variance of a contrast of interest. Minimization of this criterion leads to the optimal allocation.

MIDPARENT HETEROSIS

Hybrids and reciprocal: We assume that analysis of normalized gene expression data is done in standard fashion on the basis of a linear model for log measurements. The model accounts for all relevant effects, including dye, chip, and genotype (treatment). For details the reader is referred to Kerr and Churchill (2001), Wolfinger et al. (2001), and Keller et al. (2005). It is assumed throughout that chip effects are taken as fixed, implying that interchip information is not recovered. This approach corresponds to the usual assumption made when deriving optimal incomplete block designs (John and Williams 1995). Since there are only two genotypes per chip, all information on genotype contrasts is contained in pairwise differences of genotypic expression levels per chip. The analysis of differences of log measurements is equivalent to analysis of actual log measurements when chip effects are fixed. We express the model in terms of genotype differences, because this greatly simplifies our study of optimal allocation. In applications one will usually analyze actual log intensities instead of differences.

Let $y_{ji}$ denote the $i$th observed genotype difference for the $j$th genotype pair. Specifically, let

$y_{1i}$ = $i$th observation (chip) on difference A − B ($i = 1, \ldots, n_1$),
$y_{2i}$ = $i$th observation (chip) on difference A − AB ($i = 1, \ldots, n_2$),
$y_{3i}$ = $i$th observation (chip) on difference A − BA ($i = 1, \ldots, n_3$),
$y_{4i}$ = $i$th observation (chip) on difference B − AB ($i = 1, \ldots, n_4$),
$y_{5i}$ = $i$th observation (chip) on difference B − BA ($i = 1, \ldots, n_5$),
$y_{6i}$ = $i$th observation (chip) on difference AB − BA ($i = 1, \ldots, n_6$),

where $n_j$ is the number of chips used for the $j$th genotype pair. The differences have the following expected values:

$$\begin{aligned}
E(y_{1i}) &= \tau_A - \tau_B \\
E(y_{2i}) &= \tau_A - \tau_{AB} \\
E(y_{3i}) &= \tau_A - \tau_{BA} \\
E(y_{4i}) &= \tau_B - \tau_{AB} \\
E(y_{5i}) &= \tau_B - \tau_{BA} \\
E(y_{6i}) &= \tau_{AB} - \tau_{BA}.
\end{aligned} \tag{7}$$

The total sample size is given by $n = \sum_{j=1}^{6} n_j$. For symmetry reasons, we require the same number $n_0$ of observations for each parent-hybrid pair, i.e., $n_2 = n_3 = n_4 = n_5 = n_0$. Thus, an allocation is described by the triple $(n_0, n_1, n_6)$. To ensure identifiability, we set $\tau_{BA} = 0$.

To account for dye effects, one commonly swaps dyes for half the chips of a genotype pair. The dye swap can be accommodated by extending the linear model with dye effects and dye-by-genotype interactions. To derive a design optimal with respect to contrasts among genotype main effects, it suffices to use model (7) and require that the number of arrays for a particular genotype pair be allocated in equal parts to both possible dye swaps. Model (7) may be expressed as

$$E(y) = X\beta, \tag{8}$$

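where $y = (y_{11}, \ldots, y_{1n_1}, y_{21}, \ldots, y_{2n_2}, \ldots, y_{61}, \ldots, y_{6n_6})'$, $\beta = (\tau_A, \tau_B, \tau_{AB})'$, and $X$ is the appropriate design matrix with dummies 1 and −1. The heterosis contrast of BA can be written as $\delta_{BA} = k_{\delta(BA)}'\beta$, where

$$k_{\delta(BA)} = -\frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = -\frac{1}{2}\begin{pmatrix} 1_2 \\ 0 \end{pmatrix}. \tag{9}$$

The least-squares estimator is

$$\hat{\delta}_{BA} = k_{\delta(BA)}'(X'X)^{-1}X'y, \tag{10}$$

which has variance

$$\mathrm{var}(\hat{\delta}_{BA}) = k_{\delta(BA)}'(X'X)^{-1}k_{\delta(BA)}\,\sigma^2 = \tfrac{1}{4}\,1_2'D1_2\,\sigma^2, \tag{11}$$

where

$$D = \frac{1}{2n_1 + 2n_0}\left(I_2 + \frac{n_1n_6 + 2n_0n_1 + n_0^2}{2n_0n_6 + 2n_0^2}\,J_2\right), \tag{12}$$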

$I_2$ is a 2 × 2 identity matrix, $J_2 = 1_21_2'$ with $1_2 = (1, 1)'$, and $\sigma^2$ is the variance of a difference $y_{ji}$ ($j = 1, \ldots, 6$). A derivation of Equation 12 is given in the appendix.

A design for a given sample size $n$ involves an allocation $(n_0, n_1, n_6)$ to the different genotype pairings. We now derive an allocation that minimizes $\mathrm{var}(\hat{\delta}_{BA})$. It is first shown that $\mathrm{var}(\hat{\delta}_{BA})$ does not depend on $n_1$. Thus, we set $n_1 = 0$. In the next step, we find the optimal value of $n_0$ subject to the constraint $n = 4n_0 + n_6$. It can be shown that

$$1_2'D1_2 = \frac{2n_0 + n_6}{n_0n_6 + n_0^2}, \tag{13}$$

which is free of $n_1$. Thus, for any fixed values of $n_0$ and $n_6$, the variance of the heterosis contrast does not change with $n_1$. This proves that parent-parent chips (A-B pairs) do not add any information with regard to the heterosis contrast (6), and so the optimal design must have $n_1 = 0$. Setting $n_1 = 0$ and $n_6 = n - 4n_0$, we obtain

$$1_2'D1_2 = \frac{n - 2n_0}{n_0(n - 3n_0)} \tag{14}$$

and

$$\frac{\partial\,1_2'D1_2}{\partial n_0} = -\frac{2}{n_0(n - 3n_0)} - \frac{(n - 2n_0)(n - 6n_0)}{n_0^2(n - 3n_0)^2} = 0, \tag{15}$$

yielding a quadratic equation in $n_0$ with roots

$$n_0 = \left(\frac{1}{2} \pm \sqrt{\frac{1}{4} - \frac{1}{6}}\right)n = \left(\frac{1}{2} \pm \sqrt{\frac{1}{12}}\right)n. \tag{16}$$

Since $n_0 \le n/4$, the only feasible solution is

$$n_0 = \left(\frac{1}{2} - \sqrt{\frac{1}{12}}\right)n < \frac{1}{4}n. \tag{17}$$
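The steps leading to Equations 13 and 17 can be checked numerically. The sketch below is an illustration only, not the analysis code used for this article: it builds $X'X$ for model (7) with $\tau_{BA} = 0$, evaluates $k'(X'X)^{-1}k$ in exact rational arithmetic, compares it with the closed form, and searches the integer allocations for a given $n$. All function names are hypothetical, and only the Python standard library is assumed.

```python
from fractions import Fraction
from math import sqrt

# Design-matrix rows for beta = (tau_A, tau_B, tau_AB) with tau_BA = 0,
# in the order A-B, A-AB, A-BA, B-AB, B-BA, AB-BA.
PAIR_ROWS = [(1, -1, 0), (1, 0, -1), (1, 0, 0), (0, 1, -1), (0, 1, 0), (0, 0, 1)]


def xtx(n0, n1, n6):
    """X'X when n1 chips go to A-B, n0 to each parent-hybrid pair, n6 to AB-BA."""
    counts = [n1, n0, n0, n0, n0, n6]
    return [[sum(Fraction(c * r[i] * r[j]) for c, r in zip(counts, PAIR_ROWS))
             for j in range(3)] for i in range(3)]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs exactly by Gauss-Jordan elimination."""
    size = len(rhs)
    aug = [list(row) + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def var_delta_ba(n0, n1, n6):
    """k'(X'X)^{-1}k for the contrast in Equation 9, in units of sigma^2."""
    k = [Fraction(-1, 2), Fraction(-1, 2), Fraction(0)]
    x = solve(xtx(n0, n1, n6), k)
    return sum(ki * xi for ki, xi in zip(k, x))


def closed_form(n0, n6):
    """(1/4) * 1'D1 with 1'D1 taken from Equation 13."""
    return Fraction(2 * n0 + n6, 4 * (n0 * n6 + n0 * n0))


def best_n0(n):
    """Integer n0 minimising the variance when n1 = 0 and n6 = n - 4*n0."""
    return min(range(1, n // 4 + 1), key=lambda n0: closed_form(n0, n - 4 * n0))


optimal_fraction = 0.5 - sqrt(1.0 / 12.0)  # Equation 17
print(var_delta_ba(5, 0, 4), var_delta_ba(5, 7, 4), closed_form(5, 4))
print(optimal_fraction, best_n0(100))
```

Exact fractions are used so that the agreement with Equation 13, and the independence from $n_1$, can be tested with equality rather than with a numerical tolerance.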

Thus, for a given total sample size $n$, the quantity $1_2'D1_2$, and hence the variance of the heterosis estimator, $\mathrm{var}(\hat{\delta}_{BA})$, is minimized for the allocation

$$n_0 = \left(\frac{1}{2} - \sqrt{\frac{1}{12}}\right)n \approx 0.21132\,n. \tag{18}$$

This same allocation also minimizes the variance of the other heterosis contrast (4), $\hat{\delta}_{AB}$.

The optimal allocation was derived by looking at a single gene, while in gene expression studies thousands of genes are studied simultaneously. It is perhaps useful to point out that generally the optimal allocation derived here is independent of the variance $\sigma^2$, which may be gene specific. Thus, the optimal allocation applies to all genes simultaneously. Differences among genes in variance affect only the optimal total sample size needed to achieve a desired accuracy, which may be determined by standard procedures (Steel and Torrie 1980).

Only one hybrid: When only one of the two possible hybrids is tested (hybrid AB, say), the model simplifies to

$$E(y) = \tilde{X}\tilde{\beta}, \tag{19}$$

with $\tilde{\beta} = (\tau_A, \tau_B)'$ and the constraint $\tau_{AB} = 0$. It can be shown that

$$\tilde{D} = (\tilde{X}'\tilde{X})^{-1} = \frac{1}{2n_1 + n_0}\left(I_2 + \frac{n_1}{n_0}\,J_2\right), \tag{20}$$

where $n_1$ is the number of A-B pairs and $n_0$ is the number of A-AB or of B-AB pairs. Noting that $\delta_{AB} = \tilde{k}_{\delta(AB)}'\tilde{\beta}$ with $\tilde{k}_{\delta(AB)} = -(1/2)1_2$, it can be shown that

$$\mathrm{var}(\hat{\delta}_{AB}) = \frac{1}{2n_0}\,\sigma^2; \tag{21}$$

i.e., the variance does not depend on $n_1$, the number of parent-parent (A-B) pairs. Obviously, the variance is minimized when $n_0 = n/2$, where $n$ is the total sample size. Thus, all microarrays should be allocated to parent-hybrid pairs.

Additive gene effects: In deriving an optimal allocation, we have focused on the accuracy in estimating $\delta$. It is sometimes of interest to also estimate the additive gene effect. The accuracy of such estimates in designs optimized for $\delta$ is now considered for the two cases studied (Hybrids and reciprocals as well as Only one hybrid).

Design for hybrids and reciprocals: By not allocating any chips to A-B pairs, we have no direct comparisons of the parents. It turns out, however, that the A-B comparison can be made with good accuracy. More specifically, it may be of interest to estimate the additive gene effect defined by

$$\alpha = \frac{\tau_A - \tau_B}{2} = k_\alpha'\beta, \tag{22}$$

where

$$k_\alpha = \begin{pmatrix} \tfrac{1}{2} \\ -\tfrac{1}{2} \\ 0 \end{pmatrix}. \tag{23}$$

The additive gene effect is of interest when studying the mode of dominance. When $|\delta| = |\alpha|$, there is complete dominance, while dominance is only partial when $|\delta| < |\alpha|$ and overdominance occurs when $|\delta| > |\alpha|$ (Kearsey and Pooni 1996). To study the mode of dominance it is desirable to estimate $\alpha$ with about the same accuracy as $\delta$. It turns out that with $n_1 = 0$ we have

$$\mathrm{var}(\hat{\alpha}) = k_\alpha'Dk_\alpha\,\sigma^2 = \frac{1}{4n_0}\,\sigma^2, \tag{24}$$

so that from (11) and (13)

$$\frac{\mathrm{var}(\hat{\delta})}{\mathrm{var}(\hat{\alpha})} = \frac{n - 2n_0}{n - 3n_0} > 1 \quad \text{for } n_0 < n/4. \tag{25}$$

So generally, the additive genetic effect $\alpha$ will be estimated more accurately than both $\delta_{AB}$ and $\delta_{BA}$ when the design is optimized with respect to these two heterosis contrasts.

Design for only one hybrid: The additive gene effect $\alpha = \tilde{k}_\alpha'\tilde{\beta}$, with $\tilde{k}_\alpha = (\tfrac{1}{2}, -\tfrac{1}{2})'$, is estimated with variance

$$\mathrm{var}(\hat{\alpha}) = \frac{1}{2(2n_1 + n_0)}\,\sigma^2. \tag{26}$$

Thus, when all microarrays are allocated to parent-hybrid pairs ($n_1 = 0$), the additive effect is estimated with the same accuracy as the dominance effect.

BETTER-PARENT HETEROSIS

Hybrids and reciprocal: It is most convenient to consider the hybrid BA. Results for the other hybrid, AB, are analogous. Assessing better-parent heterosis of hybrid BA requires good estimates of the contrasts $\lambda_{BA(A)} = \tau_{BA} - \tau_A$ and $\lambda_{BA(B)} = \tau_{BA} - \tau_B$. The coefficient vector for the first of these contrasts equals $k_{\lambda(BA,A)}' = (-1, 0, 0)$, and the associated variance is

$$\mathrm{var}(\hat{\tau}_{BA} - \hat{\tau}_A) = k_{\lambda(BA,A)}'Dk_{\lambda(BA,A)} = D_{11} = \frac{3n_0^2 + n_1n_6 + 2n_0n_6 + 2n_0n_1}{4n_0(n_1 + n_0)(n_6 + n_0)}, \tag{27}$$

where $D_{11}$ is the first diagonal element of $D$ in Equation 12. The variance $D_{11}$ is seen to be symmetric in $n_1$ and $n_6$; i.e., the equation remains unaltered if $n_1$ and $n_6$ are exchanged. Therefore the optimal design should be such that $n_1 = n_6$. The common sample size is denoted as $n_{00}$, i.e., $n_1 = n_6 = n_{00}$; whence

$$D_{11} = \frac{3n_0^2 + n_{00}^2 + 4n_0n_{00}}{4n_0(n_{00} + n_0)^2}. \tag{28}$$

After some algebra using $n_{00} = (n - 4n_0)/2$ this becomes

$$D_{11} = \frac{n + 2n_0}{4n_0(n - 2n_0)}. \tag{29}$$

The first-order condition $\partial D_{11}/\partial n_0 = 0$ yields a quadratic equation in $n_0$, which can be shown to have roots

$$n_0 = \left(-\frac{1}{2} \pm \sqrt{\frac{1}{2}}\right)n. \tag{30}$$

Obviously, the only feasible solution is

$$n_0 = \left(-\frac{1}{2} + \sqrt{\frac{1}{2}}\right)n \approx 0.20711\,n. \tag{31}$$

Thus, about 20.7% of the total sample size is to be used with each of the four parent-hybrid pairs, leaving a little less than 20% (about 17.2%) for the parent-parent pair A-B and the hybrid-reciprocal pair AB-BA. As $n_1 = n_6$ for the optimal design, a little less than 10% (about 8.6%) should therefore be allocated to each of these two pairings. As in the case of midparent heterosis, most of the resources (more than 80%) should be used on the hybrid-parent pairs.

Only one hybrid: The variance of the contrast $\lambda_{AB(A)} = \tau_{AB} - \tau_A$ is

$$\mathrm{var}(\hat{\tau}_{AB} - \hat{\tau}_A) = \tilde{D}_{11} = \frac{n_0 + n_1}{(2n_1 + n_0)n_0}, \tag{32}$$

where $\tilde{D}_{11}$ is the first diagonal element of $\tilde{D}$ in Equation 20. Using $n_1 = n - 2n_0$, this can be shown to equal

$$\tilde{D}_{11} = \frac{n - n_0}{(2n - 3n_0)n_0}. \tag{33}$$

Minimization again leads to a quadratic equation in $n_0$, which has roots

$$n_0 = \left(1 \pm \sqrt{\frac{1}{3}}\right)n. \tag{34}$$

The only feasible solution is therefore given by

$$n_0 = \left(1 - \sqrt{\frac{1}{3}}\right)n \approx 0.42265\,n. \tag{35}$$
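The two better-parent allocations can be confirmed by minimizing the closed-form variances directly. The sketch below is a rough numerical check under the constraints stated above, not a definitive implementation: it uses a simple grid search over $n_0$ (an arbitrary choice made for transparency, with hypothetical function names) and compares the result with the roots in Equations 31 and 35.

```python
from math import sqrt


def d11_two_hybrids(n0, n):
    """Equation 29: variance factor of tau_BA - tau_A with n1 = n6 = (n - 4*n0)/2."""
    return (n + 2 * n0) / (4 * n0 * (n - 2 * n0))


def d11_one_hybrid(n0, n):
    """Equation 33: variance factor of tau_AB - tau_A with n1 = n - 2*n0."""
    return (n - n0) / ((2 * n - 3 * n0) * n0)


def grid_argmin(func, n, upper, steps=100000):
    """Crude grid search for the minimiser of func(n0, n) over 0 < n0 < upper."""
    grid = (upper * i / steps for i in range(1, steps))
    return min(grid, key=lambda n0: func(n0, n))


n = 1.0  # work with fractions of the total number of chips

frac_two = grid_argmin(d11_two_hybrids, n, upper=n / 4)  # feasible range n0 < n/4
frac_one = grid_argmin(d11_one_hybrid, n, upper=n / 2)   # feasible range n0 < n/2

root_two = -0.5 + sqrt(0.5)        # Equation 31
root_one = 1.0 - sqrt(1.0 / 3.0)   # Equation 35

# Shares implied by the analytic roots.
share_parent_hybrid_two = 4 * root_two                   # four parent-hybrid pairs
share_each_other_pair = (1 - share_parent_hybrid_two) / 2  # A-B and AB-BA each
share_parent_hybrid_one = 2 * root_one                   # two parent-hybrid pairs

print(frac_two, root_two, frac_one, root_one)
print(share_parent_hybrid_two, share_each_other_pair, share_parent_hybrid_one)
```

For a real experiment the fractions would still have to be rounded to whole chips, and to even numbers per pair if dyes are to be swapped on half of the chips.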

Thus, about 84.5% of the sample size is allocated to the parent-hybrid pairs, while only about 15.5% of the chips are spent on the parent-parent pair.

It is worth mentioning that the design problem here is equivalent to that of a multiple comparison with a control. For a completely randomized design and when two treatments are to be compared with a control, the optimal allocation is known to be $m_h/m_p = \sqrt{2} \approx 1.4142$, where $m_h$ is the number of observations per hybrid and $m_p$ is the number of observations per parent. This allocation minimizes the variance of a hybrid-parent contrast. Using a somewhat different optimality criterion, Dunnett (1955) found the same optimal allocation. Note that complete randomization would imply a single genotype per chip. By comparison, the optimal allocation (35) implies that $m_h/m_p = 2(\sqrt{3} - 1) \approx 1.4641$, where $m_h = 2n_0$ and $m_p = n_0 + n_1$, which is rather close but not equal to $\sqrt{2}$. The difference is mainly due to the incomplete blocking, with blocks corresponding to chips.

DISCUSSION

In this article we have derived formulas for the optimal allocation of resources in cDNA expression studies to reveal midparent heterosis or better-parent heterosis at the gene level. A common feature of both of these cases is that most of the resources are allocated to the parent-hybrid pairs. The researcher needs to decide which type of heterosis is to be assessed. In the case of midparent heterosis, the parent-parent pair need not be tested at all, while with better-parent heterosis a small fraction of the total resources should be devoted to both parent-parent pairs and hybrid-reciprocal pairs.

We have not addressed the question of optimal sample size $n$. This may be determined by standard procedures (Steel and Torrie 1980). The sample size needed to detect heterosis will, among other things, depend on the variance. It should be stressed that variance will usually be gene specific, so optimal sample size will differ among genes. In designs with small sample sizes, efficient estimation of the variance is critical, and it may be useful to borrow strength from other genes (Wright and Simon 2003), trading variance for some bias. As pointed out by a referee, it may also be necessary to account for dye bias in variance estimation.

The result that, in optimal designs, parent-parent pairs provide no information regarding midparent heterosis contrasts may seem trivial at first sight. It should be pointed out, however, that it does not generally hold in suboptimal designs and is therefore not as trivial as it may seem. The reason is that parent-parent pairs provide indirect information regarding heterosis contrasts. For example, data on the parent pair A-B and on the parent-hybrid pair BA-A allow an indirect comparison for the pairing BA-B, since (BA − A) + (A − B) = BA − B. Therefore, it is often found (results not shown) that with suboptimal designs, the parent-parent pair provides information for the heterosis contrast. For optimal designs, this information vanishes in much the same way as information from indirect comparisons vanishes in a complete block design.

In many experiments, the linear model needs to account for several fixed and random sources of variation, giving rise to a complex mixed linear model (Wolfinger et al. 2001). In this case, finding an optimal allocation will typically require numerical search strategies such as simulated annealing (Keller et al. 2005). On the basis of the examples given in Keller et al. (2005) it may be conjectured that the optimal allocation in more complex settings will not deviate dramatically from that derived in this article.

To study heterosis, one may estimate the dominance ratio, $\theta = \delta/\alpha$ (Kearsey and Pooni 1996). Using the $\delta$-method (Johnson et al. 1993) and exploiting the fact that dominance and additive gene effect estimates are stochastically independent, the approximate variance of the dominance ratio is

$$\mathrm{var}\left(\frac{\hat{\delta}}{\hat{\alpha}}\right) \approx \theta^2\left(\frac{\mathrm{var}(\hat{\delta})}{\delta^2} + \frac{\mathrm{var}(\hat{\alpha})}{\alpha^2}\right). \tag{36}$$

One might consider finding a design that minimizes this variance. This approach is not usually feasible, however, unless a priori information is available on both $\alpha$ and $\delta$, which will rarely be the case. The same problem would apply if one were to work with the exact distribution of $\hat{\delta}/\hat{\alpha}$, assuming normality (Hinkley 1969), or Fieller's (1954) method (Piepho and Emrich 2005). Thus, it is preferable to optimize the design for contrasts related to either midparent heterosis or better-parent heterosis.

I thank two anonymous referees for several helpful suggestions.

LITERATURE CITED

Birchler, J. A., D. L. Auger and N. C. Riddle, 2003 In search of the molecular basis of heterosis. Plant Cell 15: 2236–2239.
Dobbin, K., and R. Simon, 2002 Comparison of microarray designs for class comparison and class discovery. Bioinformatics 18: 1438–1445.
Dobbin, K., J. H. Shih and R. Simon, 2003a Questions and answers on design of dual-label microarrays for identifying differentially expressed genes. J. Natl. Cancer Inst. 95: 1362–1369.
Dobbin, K., J. H. Shih and R. Simon, 2003b Statistical design of reverse dye microarrays. Bioinformatics 19: 803–810.
Dunnett, C. W., 1955 A multiple comparison procedure for comparing several treatments with a control. J. Am. Stat. Assoc. 50: 1096–1121.
Falconer, D. S., and T. F. C. Mackay, 1996 Introduction to Quantitative Genetics, Ed. 4. Longman, Harlow, UK.
Fieller, E., 1954 Some problems in interval estimation. J. R. Stat. Soc. B 16: 175–185.
Gibson, G., R. Riley-Berger, L. Harshman, A. Kopp, S. Vacha et al., 2004 Extensive sex-specific nonadditivity of gene expression in Drosophila melanogaster. Genetics 167: 1791–1799.
Guo, M., M. A. Rupe, O. N. Danilevskaya, X. Yang and Z. Hu, 2003 Genome-wide mRNA profiling reveals heterochronic allelic variation and a new imprinted gene in hybrid maize endosperm. Plant J. 36: 30–44.
Harville, D. A., 2000 Matrix Algebra from a Statistician's Perspective. Springer, Berlin.
Hinkley, D. V., 1969 On the ratio of two correlated normal random variables. Biometrika 56: 635–639 (correction: Biometrika 57: 683).
John, J. A., and E. R. Williams, 1995 Cyclic and Computer Generated Designs. Chapman & Hall, London.
Johnson, N. L., S. Kotz and A. W. Kemp, 1993 Univariate Discrete Distributions, Ed. 2. Wiley, New York.
Kearsey, M., and H. S. Pooni, 1996 The Genetical Analysis of Quantitative Traits. Chapman & Hall, London.
Keller, B., K. Emrich, N. Hoecker, M. Sauer, F. Hochholdinger et al., 2005 Designing a microarray experiment to estimate dominance in maize (Zea mays L.). Theor. Appl. Genet. 111: 57–64.
Kerr, M. K., 2003 Design considerations for efficient and effective microarray studies. Biometrics 59: 822–828.
Kerr, M. K., and G. A. Churchill, 2001 Experimental design for gene expression microarrays. Biostatistics 2: 183–201.
Kollipara, K. P., I. N. Saab, R. D. Wych, M. J. Lauer and G. W. Singletary, 2002 Expression profiling of reciprocal maize hybrids divergent for cold germination and desiccation tolerance. Plant Physiol. 129: 974–992.
Lynch, M., and B. Walsh, 1998 Genetics and the Analysis of Quantitative Traits. Sinauer, Sunderland, MA.
Ni, N. Z., Q. Sun, Z. Liu, L. Wu and X. Wang, 2000 Identification of a hybrid-specific expressed gene encoding novel RNA-binding protein in wheat seedling leaves using differential display of mRNA. Mol. Gen. Genet. 263: 934–938.
Piepho, H. P., and K. Emrich, 2005 Simultaneous confidence intervals for two estimable functions and their ratio under a linear model (in press).
Searle, S. R., G. Casella and C. E. McCulloch, 1992 Variance Components. Wiley, New York.
Simon, R., E. Korn, L. McShane, M. Rademacher, G. Wright et al., 2003 Design and Analysis of DNA Microarray Investigations. Springer, New York.
Steel, R. G. D., and J. H. Torrie, 1980 Principles and Procedures of Statistics: A Biometrical Approach. McGraw-Hill, New York.
Stuber, C. W., S. E. Lincoln, D. W. Wolff, T. Helentjaris and E. S. Lander, 1992 Identification of genetic factors contributing to heterosis in a hybrid from two elite maize inbred lines using molecular markers. Genetics 132: 823–839.
Wolfinger, R., G. Gibson, E. D. Wolfinger, L. Bennett, H. Hamadeh et al., 2001 Assessing gene significance from cDNA microarray expression data via mixed models. J. Comput. Biol. 8: 625–637.
Wright, G. W., and R. M. Simon, 2003 A random variance model for detection of differential gene expression in small microarray experiments. Bioinformatics 19: 2448–2455.
Yang, X., K. Ye and I. Hoeschele, 2002 Some E-optimal designs for cDNA microarray experiments. ASA Proceedings of the Joint Statistical Meetings, New York, pp. 3853–3954.

Communicating editor: J. B. Walsh

APPENDIX

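We here derive Equation 12 for matrix $D$. As we require $n_2 = n_3 = n_4 = n_5 = n_0$, the matrix $X'X$ is given by

$$X'X = \begin{pmatrix} n_1 + 2n_0 & -n_1 & -n_0 \\ -n_1 & n_1 + 2n_0 & -n_0 \\ -n_0 & -n_0 & 2n_0 + n_6 \end{pmatrix} = \begin{pmatrix} A & b \\ b' & c \end{pmatrix}, \tag{A1}$$

with

$$A = (2n_1 + 2n_0)I_2 - n_1J_2, \tag{A2}$$

$$b = -n_0 1_2, \tag{A3}$$

and

$$c = n_6 + 2n_0, \tag{A4}$$

where $1_2 = (1, 1)'$, $I_2$ is a 2 × 2 identity matrix, and $J_2 = 1_21_2'$. Using results on the inverse of a partitioned matrix (Harville 2000, p. 99) we find

$$(X'X)^{-1} = \begin{pmatrix} D & e \\ e' & f \end{pmatrix}, \tag{A5}$$

where

$$D = (A - bb'c^{-1})^{-1}, \tag{A6}$$

$$e' = -c^{-1}b'D, \tag{A7}$$

and

$$f = (c - b'A^{-1}b)^{-1}. \tag{A8}$$

Because the last element of $k_{\delta(BA)}$ in Equation 9 is zero, it is sufficient to find $D$ to study the heterosis contrast (6). Using

$$D^{-1} = A - bb'c^{-1} = (2n_1 + 2n_0)I_2 - \frac{n_1n_6 + 2n_0n_1 + n_0^2}{n_6 + 2n_0}\,J_2 \tag{A9}$$

and

$$(xI_2 + zJ_2)^{-1} = \frac{1}{x}\left(I_2 - \frac{z}{x + 2z}\,J_2\right) \tag{A10}$$

(Searle et al. 1992, p. 443), it can be shown that

$$D = \frac{1}{2n_1 + 2n_0}\left(I_2 + \frac{n_1n_6 + 2n_0n_1 + n_0^2}{2n_0n_6 + 2n_0^2}\,J_2\right). \tag{A11}$$

Now the least-squares estimator $\hat{\delta}_{BA} = k_{\delta(BA)}'(X'X)^{-1}X'y$ has variance

$$\mathrm{var}(\hat{\delta}_{BA}) = k_{\delta(BA)}'(X'X)^{-1}k_{\delta(BA)}\,\sigma^2 = k_{\delta(BA)}'\begin{pmatrix} D & e \\ e' & f \end{pmatrix}k_{\delta(BA)}\,\sigma^2 = \tfrac{1}{4}\,1_2'D1_2\,\sigma^2. \tag{A12}$$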
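The algebra in Equations A9 to A11 can be spot-checked with exact arithmetic. The following sketch is only an illustration with hypothetical function names and an arbitrary allocation: it forms the Schur complement $A - bb'c^{-1}$ from the blocks in (A2) to (A4), compares it with (A9), and multiplies it by the closed form (A11), which should give the identity matrix.

```python
from fractions import Fraction


def x_i_plus_z_j(x, z):
    """Return the 2 x 2 matrix x*I2 + z*J2 as nested lists."""
    return [[x + z, z], [z, x + z]]


def matmul2(p, q):
    """Product of two 2 x 2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def schur_complement(n0, n1, n6):
    """A - b b' / c built from the blocks in Equations A2 to A4."""
    a_block = x_i_plus_z_j(Fraction(2 * n1 + 2 * n0), Fraction(-n1))
    b_vec = [Fraction(-n0), Fraction(-n0)]
    c = Fraction(n6 + 2 * n0)
    return [[a_block[i][j] - b_vec[i] * b_vec[j] / c for j in range(2)]
            for i in range(2)]


def d_inverse_a9(n0, n1, n6):
    """Right-hand side of Equation A9."""
    z = -Fraction(n1 * n6 + 2 * n0 * n1 + n0 * n0, n6 + 2 * n0)
    return x_i_plus_z_j(Fraction(2 * n1 + 2 * n0), z)


def d_closed_form(n0, n1, n6):
    """Equation A11, identical to Equation 12."""
    scale = Fraction(1, 2 * n1 + 2 * n0)
    ratio = Fraction(n1 * n6 + 2 * n0 * n1 + n0 * n0, 2 * n0 * n6 + 2 * n0 * n0)
    return x_i_plus_z_j(scale, scale * ratio)


# Arbitrary allocation (n0, n1, n6) used only for the check.
allocation = (3, 2, 5)
product = matmul2(schur_complement(*allocation), d_closed_form(*allocation))
print(product)
```

A check at a single allocation does not prove the identity, but repeating it over several allocations gives a quick safeguard against sign errors in the block matrices.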
