Published July, 1998

SOFTWARE

SAS Code for Recovering Intereffect Information in Experiments with Incomplete Block and Lattice Rectangle Designs

Walter T. Federer* and Russell D. Wolfinger

ABSTRACT

Efficient use of resources and time requires optimal planning of designed experiments and optimal recovery of information from them. Selection of an appropriate experiment design and selection of an appropriate statistical analysis are mandatory for efficient experimentation. It is known that using a random effects model is more efficient than using a fixed effects one, as in the analysis of variance, meaning that fewer experimental units (plots) will be necessary to obtain the same precision. The question of transformation of data prior to analysis, of using standard textbook analyses, or of using a form of spatial analysis arises with each experiment. To obtain these analyses, it is desirable that computer programs (code) be available to the experimenter. Computer programming to recover the information from random effects is not well or widely understood. In light of recent developments of software for performing the needed calculations, a description of SAS programs is presented. Code is described for recovering interblock, interrow, intercolumn, intergradient, and interregression information in incomplete block and lattice rectangle (row-column arrangement within each complete block) designed experiments. These are illustrated with a numerical example.

W.T. Federer, Biometrics Unit and Dep. of Statistical Sciences, Cornell Univ., Ithaca, NY 14853; R.D. Wolfinger, SAS Institute, Inc., R-52, SAS Campus Dr., Cary, NC 27513. Received 24 Feb. 1997. *Corresponding author ([email protected]). Published in Agron. J. 90:545-551 (1998).

Abbreviations: REML, restricted maximum likelihood.

WHEN DESIGNING AN EXPERIMENT, selecting a statistical analysis, using a computer program, or interpreting the output from a computer program, it is advisable for most experimenters to consult with a knowledgeable and experienced statistician or researcher. In planning the experiment, the design which controls the assumed variation is selected. A vast variety of experiment designs are available, of which those presented in (for example) Cochran and Cox (1957) represent a small fraction. Software packages are available (e.g., N.-K. Nguyen, CSIRO, and E.R. Williams, Univ. of Canberra, in Australia) to construct and randomize plans for v = kb treatments in b incomplete blocks of size k in r complete blocks or of v = kb treatments of b rows and k columns within each complete block of a lattice rectangle.

When analyzing the results from experiments designed as incomplete blocks or as lattice squares or rectangles, the information obtained from recovering interblock or interrow-column information should be utilized. Ignoring this type of information by using only an intrablock or intrarow-column (fixed effect) analysis is akin to ignoring whole-plot information in a split-plot design. It has been shown that the expected error mean square for differences (contrasts) of means is smaller when random effects information is recovered than when it is ignored. The adjusted treatment means are less affected by the random nature of the incomplete block, the row, column, and gradient effects, in that their effects are reduced.

Not only is recovery of information from the random effects important, but also the particular statistical model selected for analysis of the data may have a considerable effect on the interpretation of results. For certain types of variation encountered in experiments, standard textbook analyses may be inappropriate. When the experiment design is selected, certain spatial patterns are anticipated; however, the blocking used may not correspond to the spatial variation that actually is present in the experiment. Various events may happen during the course of the experiment to make the blocking pattern of the experiment design ineffective in controlling the variation. Thus, it is necessary to select the analysis that fits the experiment rather than always using one analysis for all experiments. To illustrate, differential gradients within each of the incomplete blocks, row effects and differential gradients within each of the rows within a lattice rectangle, or polynomial regressions of row and of column effects and the interaction of these regressions may explain the spatial variation present in an experiment. The differential gradients and the regression coefficients are realistically considered to be random effects, as they vary from block to block, row to row, column to column, and from experiment to experiment. Those from a particular experiment have no importance other than how they affect treatment means.

To illustrate the need for selecting an appropriate statistical analysis, consider the lattice square designed experiment in Table 12.5 of Cochran and Cox (1957). Federer (1998) describes statistical procedures for a number of situations as described below. Residual mean squares, coefficients of variation, F-values, and number of replicates required to obtain the same precision for this lattice square designed experiment are given in Table 1. The last analysis above is also known as trend analysis. Here we note that the residual mean square for trend analysis is about half that for the textbook lattice square analysis. Also, using the F-statistic, the trend analysis would indicate significant treatment differences at the 0.05 level, whereas the textbook analysis would not. Since the data are means of three counts, data transformation did not help and if the counts have a Poisson distribution, the expected variance would be the experiment mean divided by 3: in this case, 11/3, or roughly 4. Even the trend analysis does not achieve this value.

Table 1. Results of four statistical analyses for an experiment designed as a lattice square.

    Statistical analysis                        RMS†    CV (%)   F      r‡
    Randomized complete block                   38.88   57       2.13   16
    Lattice square                              22.67   44       0.94   10
    Differential gradients in rows              18.97   40       1.22    8
    Row, column, and interaction regressions    11.91   32       2.43    5

† RMS, residual mean square.
‡ r, no. of replicates.

As demonstrated above, the particular statistical analysis selected can have a considerable effect on the results and on efficient utilization of experimental resources in achieving the same precision. The recovery of interblock, interrow and intercolumn, interrow and intergradient, and interregression information leads to increased precision by considering the information in the replicates (complete blocks), where the treatment effect is partially confounded with the blocking factor. To demonstrate the increased precision, consider the balanced lattice square design. Let the intrarow-column information be $w = 1/s^2$ (where $s^2$ is the intrarow-column mean square), let the interrow information be $w_r = 1/(s^2 + k s_r^2)$, and let the intercolumn information be $w_c = 1/(s^2 + k s_c^2)$, where the interrow mean square is $s^2 + k s_r^2$, the intercolumn mean square is $s^2 + k s_c^2$, and $k$ is the number of treatments in each row and in each column. Then, the variance of a difference between two treatment means adjusted for row and column information is $2/[(k - 1)w + w_r + w_c]$. The variance of a difference between two intrarow-column means (row and column information ignored) is $2/[(k - 1)w]$, which is larger than the above, since the denominator is smaller.

Since analyses such as the above, as well as other forms of spatial analysis, may be computationally complex, this paper shows how to program PROC GLM (SAS Inst., 1989), which is a fixed effects procedure, and PROC MIXED (SAS Inst., 1996), which is a mixed model procedure, for a number of situations. The latter procedure is used to recover intereffect information, while the former provides a fixed effects analysis. With such programs, experimenters may easily and routinely obtain efficient statistical analyses for their data.
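The variance comparison for the balanced lattice square can be made concrete with a short numerical sketch. The Python below is illustrative only and is not part of the SAS programs in this paper; the function name and the variance component values passed to it are hypothetical, chosen simply to show that adding the interrow and intercolumn weights to the denominator reduces the variance of a treatment difference.

```python
def difference_variances(s2, s2_r, s2_c, k):
    """Variance of a difference between two treatment means in a balanced
    lattice square, without and with recovery of interrow and intercolumn
    information, using the weights w, w_r, and w_c defined in the text."""
    w = 1.0 / s2                    # intrarow-column information
    w_r = 1.0 / (s2 + k * s2_r)     # interrow information
    w_c = 1.0 / (s2 + k * s2_c)     # intercolumn information
    var_intra = 2.0 / ((k - 1) * w)
    var_recovered = 2.0 / ((k - 1) * w + w_r + w_c)
    return var_intra, var_recovered


# Hypothetical variance components (assumed values, not from any data set).
var_intra, var_recovered = difference_variances(s2=10.0, s2_r=5.0, s2_c=2.0, k=5)
print(f"intrarow-column only: {var_intra:.3f}")
print(f"with recovery:        {var_recovered:.3f}")
```

Because $w_r$ and $w_c$ are positive whenever the mean squares are positive, the recovered variance is always the smaller of the two; the gain shrinks as the row and column variance components grow relative to $s^2$.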
Programs in PROC GLM and PROC MIXED for four statistical models are provided in the Results and Programs section. Programs for intrablock and interblock analyses (i.e., recovering interblock information) are given for an incomplete block design. Code is given for fixed effects and for recovering interrow and intercolumn information from experiments designed as lattice rectangle designs, or even if there is a row-column arrangement of experimental units within a complete block. Programs are presented for fixed effects and for recovering interrow and intergradient information in incomplete block or lattice rectangle designed experiments. For experiments in a row-column arrangement within the complete block, whether designed or not, fixed effect and random effect programs are given for computing polynomial regressions for rows and for columns within the complete blocks and for interactions of these regressions. Abbreviated output of the computer programs is presented for an example. Finally, some concluding remarks and comments are given in the last section.

METHODS

Various software packages are available to perform many kinds of statistical analyses. The problem is how to use them to obtain the desired statistical analyses. The SAS software package has many procedures, two of them being PROC GLM (general linear models) and PROC MIXED (mixed effects model) (see, for example, Searle et al., 1992). An experimenter may wish to use analyses from both PROC GLM and PROC MIXED if an ANOVA table and the adjusted treatment means recovering random effect information are both desired.

PROC GLM output may include an ANOVA table, coefficient of variation, F-values, probability levels, expected values of factor mean squares for any random factor, intraeffect treatment least squares means, and other items. The Type I sums of squares have nested elimination of other effects in the order they appear in the MODEL statement in the program. Type III sums of squares eliminate all effects but the one under consideration. If the SOLUTION option for effects is used, it should be noted that the restriction used is to set the highest numbered effect equal to zero; i.e., that last effect is subtracted from all effects. This makes the solutions a difference of two effects, and the standard errors given are standard errors of a difference of two intraeffect estimates.

SAS PROC MIXED is used to recover the treatment information contained in the random effects. The random effects variance components need to be estimated and are used to adjust the treatment means. Many procedures for estimating variance components are available in the literature.
One is analysis of variance (ANOVA) and another is restricted maximum likelihood (REML) (Searle et al., 1992). PROC GLM allows ANOVA solutions for variance components, and PROC MIXED uses REML as the default method for obtaining solutions to variance components but allows a choice of others. The REML solutions for variance components are obtained by an iterative procedure. Standard maximum likelihood and the noniterative MIVQUE(0) methods are also available. The PARMS statement with the NOITER option can be used to perform the calculations using other variance component solutions, such as ANOVA solutions obtained from a PROC GLM procedure. Searle et al. (1992) recommend use of REML solutions for variance components. Often, there are small differences between variance component solutions from the different methods, thus resulting in nearly the same adjusted treatment means. The mixed model procedure using REML solutions has been incorporated into other software packages (e.g., GENSTAT and S-PLUS).

Programs (code) for obtaining the different statistical analyses discussed above typically are less than straightforward, and are not usually included in package documentation. It is desirable to have such programs readily available for agricultural researchers, enabling them to process their data in a quick and expeditious manner and to have flexibility in selecting an appropriate statistical analysis. Failing to recover information contained in a data set is an inefficient use of resources. The effort and cost of recovering such information is minimized when appropriate programs, such as those described below, are readily available.

RESULTS AND PROGRAMS
SAS PROC GLM and PROC MIXED procedures are presented below for the following four types of analyses: (i) recovery of intrablock and interblock information; (ii) recovery of intrarow-column and interrow and intercolumn information; (iii) recovery of intrarow-gradient and intergradient information; and (iv) recovery of intraregression and interregression information.

Recovering Interblock Information
PROC GLM is used to obtain an ANOVA and intrablock least squares means, and PROC MIXED is employed to obtain the adjusted treatment effects, adjusted treatment means, and a standard error of difference between two adjusted effects or means. Both intrablock and interblock information is used to obtain the adjusted means. The first step is to construct a SAS data set, which is here named BLOCK and is read from the file block.dat. SAS data sets are rectangular arrays with columns corresponding to variables and rows to the observations. The data set BLOCK is assumed to contain the following variables: Y, the response variable (such as yield, count, etc.); T, the treatment designation; R, the complete block or replicate designation; and B, the incomplete block within a replicate designation. A PROC GLM program for the data set BLOCK is:

    data block;
      infile 'block.dat';
      input y t r b;
    proc glm data = block;
      class t r b;
      model y = t r b(r);
      random r b(r);
      lsmeans t;
    run;
The PROC GLM statement invokes the procedure and the DATA= option specifies the analysis data set to be BLOCK. The CLASS statement lists which discrete variables are to be treated as classification variables (as opposed to quantitative variables). Dummy indicator variables are automatically created for each separate level of the class variables. The MODEL statement specifies the response variable Y and the fixed effects T, R, and B(R), the last one denoting the nesting of B within R. The RANDOM statement requests that R and B(R) be considered as random effects in constructing PROC GLM's expected mean squares; they are considered to be fixed effects in the ANOVA and in the construction of the intrablock least squares means from the LSMEANS statement. Output from this program includes a standard ANOVA, Type I mean squares and tests (nested elimination of effects in the order in the model), Type III mean squares and tests (elimination of all other effects), expected mean squares, and the estimated intrablock means.

PROC MIXED is used to recover interblock information as follows:

    proc mixed data = block;
      class t r b;
      model y = t;
      random r b(r);
      lsmeans t;
    run;
Note that only fixed effects are retained in the MODEL statement. This analysis treats R and B(R) as random effects and obtains solutions for their variance components as well as the variance component for the residual error using restricted maximum likelihood (REML), which gives nonnegative solutions for all variance components. The test for T, as well as the printed least squares means, makes use of the estimated variance components and thereby recovers interblock information. If effects and their standard errors are desired, include "/solution" after the last term in the MODEL statement. The standard errors of the effects are actually standard errors of a difference between two effects; i.e., the value of the effect set equal to zero is subtracted from every effect, making the effect listed a difference of two effects. Analysts should be aware that standard textbook analyses use ANOVA rather than REML solutions for the variance components, and unless the ANOVA and REML solutions are equal, the adjusted means will be different. The differences will usually be small.

Recovery of Interrow and Intercolumn Information

The SAS data set for this example is assumed to be named ROW_COL and to contain the following variables: Y, the response variable (such as yield, count, etc.); T, the treatment designation; R, the complete block or replicate designation; B, the row nested within replicate designation; and C, the column nested within replicate designation. A PROC GLM program for obtaining an ANOVA and intrarow-column (fixed effects) treatment means is:

    data row_col;
      infile 'row_col.dat';
      input y t r b c;
    proc glm data = row_col;
      class t r b c;
      model y = t r b(r) c(r);
      random r b(r) c(r);
      lsmeans t;
    run;

This analysis extends the previous one by the addition of the column effect C(R). The output is the same as in the previous analysis, and the LSMEANS statement computes the intrarow-column (fixed effects) treatment means.
A PROC MIXED program for obtaining treatment means adjusted for interrow and intercolumn information is obtained as follows:

    proc mixed data = row_col;
      class t r b c;
      model y = t;
      random r b(r) c(r);
      lsmeans t;
    run;

As before, REML is used to compute the solutions for the variance components corresponding to R, B(R), C(R), and the residual error. The LSMEANS statement automatically computes the adjusted treatment means recovering interrow and intercolumn information.

Recovery of Interrow and Intergradient Information

Using an incomplete block or a lattice rectangle experiment design, the experimenter attempts to control spatial variation as much as possible. Owing to unknown spatial patterns of variation in the experimental site, or to variations that occur during the course of the experiment, such as invasion by disease or insects, gradients may occur within incomplete blocks or within rows (or columns). There is no reason that these gradients should be the same from block to block, from row to row, or from replicate to replicate. In other words, the differential gradients vary at random and may be considered to be a random effect, and efficient analyses of data would require recovery of intergradient information in the same manner as for random blocks, rows, or columns (Federer, 1998).

The gradients may take various forms. If they follow the polynomial regression model, then polynomial regression coefficients on position within an incomplete block or row (column) may be included in the SAS data set along with the other variables, or they may be obtained as described by Wolfinger et al. (1997). For this example, we use only a linear regression with the centered positions within each row of a lattice rectangle being the covariates and include these in the data set. The data set for this example is named GRADIEN and has the following variables: Y, the response variable; T, the treatment designation; R, the replicate designation; B, the row designation within replicate; and G, the coefficients for the linear gradient within row.
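The covariate G described above is the centered position of each plot within its row. As a minimal sketch (Python, not SAS, and not the PROC IML approach of Wolfinger et al., 1997), the values could be generated as follows before being entered in the data file; the function name is hypothetical.

```python
def centered_positions(k):
    """Centered plot positions for k plots in a row: position minus midpoint.

    For k = 5 this gives -2, -1, 0, 1, 2.
    """
    midpoint = (k + 1) / 2
    return [position - midpoint for position in range(1, k + 1)]


# Pair each column position in a five-plot row with its linear gradient value G.
for column, g in enumerate(centered_positions(5), start=1):
    print(column, g)
```

Centering makes the covariate sum to zero within every row, so the row effect B(R) and the within-row slope G*B(R) describe separate features of the spatial variation.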
PROC GLM code to compute the intrarow (intrablock) and intragradient analysis is:

    data gradien;
      infile 'gradien.dat';
      input y t r b g;
    proc glm data = gradien;
      class t r b;
      model y = t r b(r) g*b(r);
      random r b(r);
      lsmeans t;
    run;
Federer (1998) described three models for the gradients: (i) the differential gradients vary around a parameter (mean) of zero, (ii) the gradients vary around a parameter for the experiment and then G is added to the model, and (iii) the gradients vary around a replicate parameter (random or fixed) and a term G*R is added to the above MODEL statement. Also, G and G*R may both be added to the model, or G*B(R) could be replaced by G*R, as one regression may be common for each replicate rather than for each row or block within a replicate. Performing such analyses allows the experimenter to determine the nature of the spatial variation in the experiment. The asterisk symbol (*) usually denotes an interaction, but is required here for the code to work. The model as written accounts for differential regressions within blocks of an incomplete block or within rows of a lattice rectangle design. Although G, G*R, and/or G*B(R) may be considered to be random effects, they are not placed in the RANDOM statement because PROC GLM allows only classification effects there, and class variables are discrete variables.

A PROC MIXED program for obtaining treatment means adjusted for recovery of interrow and intergradient information and using REML solutions for the variance components is:

    proc mixed data = gradien;
      class t r b;
      model y = t;
      random r b(r) g*b(r);
      lsmeans t;
    run;
Note that G*B(R) is now put in the RANDOM statement. If there is a gradient for the whole experiment, then G may be considered a fixed effect and put in the MODEL statement. If G*R is used in place of G*B(R), it would be placed in the RANDOM statement. Since most situations wherein differential regressions would arise would have the differential regressions varying around a zero parameter, the program as written should suffice but, as noted, other options are available. Also, for larger experiments, a curvilinear gradient may be present and then quadratic regression coefficients would be added to the data set and the MODEL statement modified accordingly.

Recovery of Interregression Information
Instead of considering row and column effects within each replicate, polynomial regression functions of the row and column effects within each replicate may be used. Then, interactions of these row and column regressions may be added to the model (see Federer, 1998). This type of analysis (also called trend analysis) will be useful when the row-column orientation of the experiment does not correspond to the spatial pattern actually present. Insect and disease invasion, wind damage, and the like may enter an experiment at an angle to the row and column directions. The regressions usually will vary randomly from replicate to replicate and hence would be considered to be random effects. This analysis and the previous one provide other alternatives to the standard textbook analyses given by the first two samples of code. As recommended by Federer (1998), the statistical model should fit the spatial pattern found in the experiment and not necessarily the one that was assumed when the experiment design was selected. Computer packages and programs such as those described herein allow feasibility and flexibility in appropriate choice of a statistical analysis.

Suppose that linear and quadratic regressions for row and column effects, BL, BQ, CL, and CQ, respectively, along with their interactions are considered appropriate for controlling the particular type of spatial variation found in the experiment. These row and column regression coefficients may be entered into the data set as below, or they may be obtained as described by Wolfinger et al. (1997). The data set for this example is named REGRESS. A PROC GLM program to obtain an ANOVA for the intraregression analysis and intraregression treatment means is:

    data regress;
      infile 'regress.dat';
      input y t r cl cq bl bq;
      ll = bl*cl;
      lq = bl*cq;
      ql = bq*cl;
      qq = bq*cq;
    proc glm data = regress;
      class t r;
      model y = t r bl*r bq*r cl*r cq*r ll*r lq*r ql*r qq*r;
      random r;
      lsmeans t;
    run;
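The four interaction covariates created in the data step (ll, lq, ql, qq) are plain products of the row and column regression coefficients. The following Python sketch shows one way the coefficients themselves might be produced for a 5 x 5 layout. It is an illustration under stated assumptions, not the authors' method: the quadratic values are taken to be the centered squares of the linear values, which for five equally spaced positions gives the usual orthogonal polynomial set 2, -1, -2, -1, 2, and the helper function names are hypothetical.

```python
def linear_coefficients(k):
    """Centered linear coefficients for k equally spaced positions."""
    midpoint = (k + 1) / 2
    return [position - midpoint for position in range(1, k + 1)]


def quadratic_coefficients(k):
    """Squares of the linear coefficients, centered to sum to zero (assumed form)."""
    linear = linear_coefficients(k)
    mean_square = sum(x * x for x in linear) / k
    return [x * x - mean_square for x in linear]


k = 5
bl = cl = linear_coefficients(k)      # row and column linear coefficients
bq = cq = quadratic_coefficients(k)   # row and column quadratic coefficients


def interaction_covariates(row, col):
    """Products matching the SAS data step: ll, lq, ql, qq for one plot."""
    i, j = row - 1, col - 1
    return {
        "ll": bl[i] * cl[j],
        "lq": bl[i] * cq[j],
        "ql": bq[i] * cl[j],
        "qq": bq[i] * cq[j],
    }


print(interaction_covariates(row=1, col=1))
```

With these eight covariates crossed with R, each replicate receives its own set of regression coefficients, which is what allows the fitted trend surface to differ from replicate to replicate.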
The MODEL statement contains all eight regressions described above and relegates eight degrees of freedom for controlling the spatial variation in each replicate. This results in 8n degrees of freedom for n replicates being allocated for spatial variation. Each regression sum of squares will have n degrees of freedom. Federer (1998) describes a procedure for selecting which regressions to retain in the analysis. For example, it may be found that the QQ regression does not contribute to controlling variation and hence may be omitted.

To recover interregression information, PROC MIXED is used. Appropriate code is as follows:

    proc mixed data = regress;
      class t r;
      model y = t;
      random r bl*r bq*r cl*r cq*r ll*r lq*r ql*r qq*r;
      lsmeans t;
    run;

Note that all effects except T, a fixed effect, are considered to be random effects. When the REML solution for the variance component for a regression is zero, this is equivalent to omitting that regression from the model.

Example

The example selected to illustrate the above results is the semibalanced lattice square designed experiment given in Table 12.3 of Cochran and Cox (1957). There are 25 treatments (corn hybrids, Zea mays L.) arranged in a 5-row by 5-column array within each of three replicates (complete blocks). Each treatment occurs once with every other treatment, either in a row or in a column. The yield (dependent variable) is pounds of corn from a 4-row by 10-hill plot. Annotated code and output are provided in the Appendix. The SAS input file is named lsgr2553.dat and contains yield, replicate number, row number, column number, centered linear regression values for the five columns (i.e., -2, -1, 0, 1, 2 for the five positions or columns 1, 2, 3, 4, 5), and the treatment number. The linear and higher regression values may be obtained using the PROC IML procedure described by Wolfinger et al. (1997), or they may be entered in the data file (as was done here). Rather than using separate lines for each statement in the program, the statements were written continuously to save space. The program and some of the output generated from the program are given in the Appendix, along with annotations (text enclosed in /*...*/).

PROC GLM and PROC MIXED procedures are included for the standard lattice square design form of the analysis and for differential linear gradients in the rows. The former is used to obtain an analysis of variance table and the latter to obtain adjusted means. Using the rows as the incomplete blocks, a PROC GLM procedure for a triple lattice ANOVA was included. The reader is advised to compare the results in the Appendix with those given by Cochran and Cox (1957). It should become immediately evident that the use of SAS programs allows analyses with a minimum of effort over their procedure. The program was written to obtain three analyses in the same output.

For the standard analysis of this example, the REML and ANOVA solutions for the variance components were the same, resulting in the same adjusted means. The differential linear gradients analysis is more appropriate than the standard analysis in controlling spatial variation. The residual mean square was reduced from 9.57 to 4.06, a reduction of 58%, and the coefficient of variation was reduced from 10.6 to 6.9%. Using a more appropriate analysis was equivalent to having seven replicates instead of three for the standard analysis. This example illustrates the desirability of using an appropriate analysis for the data, and these programs facilitate finding that analysis.
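The efficiency figures quoted for the example can be checked with simple arithmetic. The Python sketch below uses only numbers stated in the text and in the Appendix output (residual mean squares 9.57 and 4.06, three replicates, yield mean 29.1920). It assumes that the equivalent number of replicates scales with the ratio of residual mean squares, which is a plausible reading of the comparison rather than a formula given by the authors.

```python
rms_standard = 9.57    # residual mean square, standard lattice square analysis
rms_gradient = 4.06    # residual mean square, differential linear gradients
replicates = 3         # replicates in the experiment
yield_mean = 29.1920   # overall mean from the Appendix output

# Percentage reduction in residual mean square.
reduction_pct = 100 * (1 - rms_gradient / rms_standard)

# Assumed relation: replicates needed under the standard analysis to match
# the precision of the gradient analysis scale with the mean square ratio.
equivalent_replicates = replicates * rms_standard / rms_gradient

# Coefficient of variation = 100 * sqrt(residual mean square) / mean.
cv_standard = 100 * rms_standard ** 0.5 / yield_mean
cv_gradient = 100 * rms_gradient ** 0.5 / yield_mean

print(f"reduction in RMS: {reduction_pct:.1f}%")
print(f"equivalent replicates: {equivalent_replicates:.2f}")
print(f"CV: {cv_standard:.1f}% -> {cv_gradient:.1f}%")
```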
CONCLUDING REMARKS AND COMMENTS

Flexibility in the choice of appropriate statistical analyses is made possible through the software packages that are available. Whether or not an experimenter can use them depends on the ability to program a procedure or to obtain a program such as those given herein. To understand the output from such programs in more detail, the user may wish to obtain annotated computer outputs that explain the various items obtained. Federer (1995) and Barnard and Federer (1997) have prepared such annotated outputs (see also Searle and NewsomStewart, 1993). Alternatively, the programs herein may be applied to small numerical examples from a textbook (e.g., Examples XI-3 and XII-1 in Federer, 1955, or Table 10.1 in Cochran and Cox, 1957) and the output results compared with the textbook results.

REML analyses rely on the assumption that the data are normally distributed, whereas ANOVAs do not. It may be desirable to use a transformation, such as log or square root of the data, in an attempt to fulfill the assumption. These transformations are easily accomplished by including a statement such as SY = SQRT(Y) or LY = LOG(Y) immediately after the INPUT statement. Then, SY or LY is used in place of Y in the MODEL statement. A study of the patterns in the residuals and/or a large coefficient of variation may indicate a need for a transformation of the data.

Searle et al. (1992) state that REML and ANOVA solutions for variance components are the same for "balanced" data sets. By balanced they have to mean orthogonal. Also, it is necessary that all ANOVA solutions be nonnegative for this to happen. Orthogonality is a sufficient but not a necessary condition for equality of the solutions. For the triple lattice designed experiment in Example XI.3 of Federer (1955), ANOVA and REML solutions are identical. For the balanced lattice square in Table 12.5 of Cochran and Cox (1957), REML and ANOVA solutions are different.

Striking results are sometimes possible when using other than textbook analyses for responses from experiments. As stated above, the pattern of residuals or coefficients of variation beyond the range normally expected for the type of experiment may indicate the need for considering other analyses. Using the regression analysis described above on the data in Table 12.5 of Cochran and Cox (1957), the residual mean square was essentially halved over that obtained from the standard lattice square textbook analysis. Using the row-gradient analysis described above on these data resulted in a reduction of 16%, equivalent to one additional replicate, in the residual mean square over the standard analysis. Overfitting may result in a residual mean square that is biased downward, indicating that the minimum number of space variables should be used to control heterogeneity in the experiment.
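The two reductions quoted for the Table 12.5 data follow directly from the residual mean squares reported in Table 1. A short Python check, using only those published values (the dictionary keys are labels chosen here for readability):

```python
# Residual mean squares from Table 1.
rms = {
    "lattice square": 22.67,
    "row gradients": 18.97,
    "trend (regressions)": 11.91,
}

# Ratio of the trend-analysis RMS to the textbook lattice square RMS.
trend_ratio = rms["trend (regressions)"] / rms["lattice square"]

# Percentage reduction achieved by the row-gradient analysis.
gradient_reduction_pct = 100 * (1 - rms["row gradients"] / rms["lattice square"])

print(f"trend RMS / lattice square RMS = {trend_ratio:.3f}")
print(f"row-gradient reduction = {gradient_reduction_pct:.1f}%")
```

The ratio is close to one-half and the row-gradient reduction rounds to 16%, in agreement with the statements above.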
The minimum number of terms possible should be used to control spatial variation, as this allocates additional degrees of freedom to the residual error mean square and reduces the possibility of overparameterization.

We conclude with a few comments regarding the use of PROC MIXED. When all of the effects in the RANDOM statement of PROC MIXED share one or more effects in common, it is often more efficient computationally to factor out this common effect into the option SUBJECT= effect. For example, a random statement of the form

    random r b(r) g*r g*b(r);

can also be written as

    random int b g g*b / subject = r;
The gain in computational efficiency grows with the number of levels of R.

When performing inference, it is often useful to construct single degree of freedom contrasts among the adjusted means. Simple differences (contrasts) are automatically generated using the DIFF option in the LSMEANS statement of PROC MIXED, and multiple comparison adjustments accounting for simultaneous inference are available as well (see SAS Inst., 1996). Custom contrasts can be constructed with CONTRAST and ESTIMATE statements, as demonstrated in Federer (1995).

ACKNOWLEDGMENTS

The constructive comments of the Associate Editor and referees are greatly appreciated.

APPENDIX

/*Program for semibalanced lattice square example in Table 12.3 of Cochran and Cox (1957).*/
data lsgr; infile 'lsgr2553.dat'; input yield rep row col grad treat;
/*Textbook Analysis when the experiment design is a semibalanced lattice square.*/ proc glm data = isgr; class rep row col treat; model yield = rep treat row(rep) col(rep): random row(rep) col(rep); proc mixed data = isgr; class rep row col treat; model yield = treat: random rep row(rep) col(rep): Ismeans treat: run: /*Analysis as a triple lattice with rows as incomplete blocks.*/ proc glm data = isgr; class rep row col treat; model yield = rep treat row(rep); random rep row(rep); run: /*Analysis for differential linear gradients within rows.*/ proc glm data = isgr: class rep row col treat; model yield = rep treat row(rep) grad*row(rep); random rep row(rep); run; proc mixed data = isgr; class rep row col treat; model yield = treat; random rep row(rep) grad*row(rep); ismeans treat; run: /*Part of data file isgr2553.dat; yield replicate column gradien~ ~rea~men~ is ~he inpu~ order.*/ 33.3 1 1 30.7112 35.~ i 1 30.1114 29.6 1 1 24.6 1 2 30.8 1 2 28.8123 3~.8124
row
1 -2 18 1 9 3 0 ii 1 2 5 2 25 1 -2 24 2 -i 15 017 1 8
28.8 3 5 4  1 25
30.6 3 5 5  2 16;

/*Output from program in an abbreviated form.*/
/*Semibalanced lattice square textbook analysis, output from PROC GLM code.*/
The SAS System
General Linear Models Procedure
Class Level Information
Class   Levels   Values
REP        3     1 2 3
ROW        5     1 2 3 4 5
COL        5     1 2 3 4 5
TREAT     25     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Number of observations in data set = 75

Dependent Variable: YIELD
Source            DF   Sums of Squares   Mean Squares   F Value   Pr > F
Model             50      1981.79960       39.63599       4.14    0.0002
Error             24       229.79560        9.57482
Corrected Total   74      2211.59520

R-Square    C.V.       Root MSE   YIELD Mean
0.896095    10.59989   3.09432    29.1920
/*The Model DF (degrees of freedom) comes from the sum of the REP, TREAT, ROW(REP), and COL(REP) DF. 0.896095 = 1981.79960/2211.59520; 10.59989 = 100(3.09432)/29.1920; 3.09432 = (9.57482)^(1/2); 29.1920 is the mean of the 75 observations.*/

General Linear Models Procedure
Dependent Variable: YIELD
Source      DF   Type I SS     Mean Square   F Value   Pr > F
REP          2   546.876800    273.438400     28.56    0.0001
TREAT       24   611.081867     25.461744      2.66    0.0100
ROW(REP)    12   585.630133     48.802511      5.10    0.0003
COL(REP)    12   238.210800     19.850900      2.07    0.0621

Source      DF   Type III SS   Mean Square   F Value   Pr > F
REP          2   546.876800    273.438400     28.56    0.0001
TREAT       24   434.506000     18.104417      1.89    0.0628
ROW(REP)    12   585.630133     48.802511      5.10    0.0003
COL(REP)    12   238.210800     19.850900      2.07    0.0621
/*Type I is a nested analysis and Type III is the effect eliminating all other effects. Note that the last line in both sets is identical, because all other effects have been eliminated from COL. REP is orthogonal to all other effects and hence has the same Type I and III sums of squares.*/

Source      Type III Expected Mean Squares
REP         Var(Error) + 5 Var(COL(REP)) + 5 Var(ROW(REP)) + 25 Var(REP)
TREAT       Var(Error) + Q(TREAT)
ROW(REP)    Var(Error) + 3.3333 Var(ROW(REP))
COL(REP)    Var(Error) + 3.333 Var(COL(REP))

/*Var stands for variance component and these expectations are for ANOVA solutions, whereas the PROC MIXED procedure makes use of REML solutions (see Searle et al., 1992). For this example, the ANOVA and REML solutions are the same. The three dots ("...") indicate that output has been omitted.*/

/*Semibalanced lattice square, output from PROC MIXED code.*/
Covariance Parameter Estimates (REML) /*Variance components*/
Cov Parm    Estimate
REP          7.58431667
ROW(REP)    11.76830833
COL(REP)     3.08282500
Residual     9.57481667

Least Squares Means
Effect   TREAT   LSMEAN        Std Error    DF   t       Pr > |t|
TREAT      1     27.90106804   2.77885083   24   10.04   0.0001
TREAT      2     28.55432999   2.77885083   24   10.28   0.0001
TREAT      3     29.31053442   2.77885083   24   10.55   0.0001
TREAT      4     29.64180499   2.77885083   24   10.67   0.0001
...
TREAT     25     27.15543104   2.77885083   24    9.77   0.0001

/*The t-test compares the mean with zero and hence is usually meaningless. The lowest Pr (probability) listed in an output is 0.0001. All means have the same standard error (Std Error) for this t-test owing to the balance in the experiment.*/

/*Triple lattice ANOVA, output from PROC GLM code.*/
Source   DF   Sums of Squares   Mean Squares
Error    36      468.00640        13.00018

R-Square    C.V.       Root MSE   YIELD Mean
0.788385    12.35125   3.60558    29.1920

Source      DF   Type III SS   Mean Square   F Value   Pr > F
REP          2   546.876800    273.438400     21.03    0.0001
TREAT       24   452.865600     18.869400      1.45    0.1525
ROW(REP)    12   585.630133     48.802511      3.75    0.0010

/*Linear gradients within rows ANOVA, output from PROC GLM code.*/
Source   DF   Sums of Squares   Mean Squares
Error    21       85.33873         4.06375

R-Square    C.V.       Root MSE   YIELD Mean
0.961413    6.905571   2.01587    29.1920

Source           DF   Type III SS   Mean Square   F Value   Pr > F
REP               2   546.876800    273.438400     67.29    0.0001
TREAT            24   354.607274     14.775303      3.64    0.0020
ROW(REP)         12   506.032796     42.169400     10.38    0.0001
GRAD*ROW(REP)    15   382.667674     25.511178      6.28    0.0001

/*Linear gradients within rows, output from PROC MIXED code.*/
Covariance Parameter Estimates (REML)
Cov Parm         Estimate
REP               8.00014750
ROW(REP)         13.87335651
GRAD*ROW(REP)     3.42574692
Residual          4.06814714

Least Squares Means
Effect   TREAT   LSMEAN        Std Error    DF   t       Pr > |t|
TREAT      1     25.95526890   2.40289881   21   10.80   0.0001
TREAT      2     29.44550479   2.45438914   21   12.00   0.0001
TREAT      3     27.30738638   2.43921501   21   11.20   0.0001
TREAT      4     28.84341412   2.32662076   21   12.40   0.0001
...
TREAT     25     25.65866286   2.52519789   21   10.16   0.0001

/*The standard errors differ owing to the nonorthogonality introduced by the covariate gradient.*/