TESTING INTERACTION IN 2 x 2 DESIGNS USING TRANSFORMED DATA

by

Paul D. Sampson Peter Guttorp

TECHNICAL REPORT No. 159 February 1989

Department of Statistics, GN-22 University of Washington Seattle, Washington 98195 USA


*Paul D. Sampson is Research Associate Professor and Peter Guttorp is Associate Professor, Department of Statistics, GN-22, University of Washington, Seattle, Washington 98195. This work was supported by the U.S. Environmental Protection Agency through a cooperative agreement with the Societal Institute of the Mathematical Sciences and the University of Washington. The authors thank Rick Vong and Tim Larson for motivating this research with the ASARCO smelter study.


Abstract

Statistical analysis of the environmental impact of a pollutant point source or other human intervention often focuses on an interaction effect in order to compare, for example, measured pollutant levels before and after closure of a pollutant point source between regions presumed affected and unaffected by the intervention. This comparison may be meaningful only in terms of the original units of measurement of the environmental response. However, statistical modeling of environmental data is often most appropriately carried out on a transformed response scale. In this paper we present a simple approximate method for testing interaction expressed on the original scale of measurement for 2 x 2 experimental designs (such as time x region) with data analyzed in terms of one of the common power transformations. The test is an approximate version of the Wald test, and hence is asymptotically equivalent to a likelihood ratio criterion. The latter is, however, more difficult to compute, requiring a numerical optimization. Both methods perform well on simulated data, even for small sample sizes. We also apply the simple test in a more complicated model to evaluate the effects on acid deposition of closure of the ASARCO copper smelter in Tacoma, Washington.

Key words: Acid rain, Environmental impact, Likelihood ratio test, Power transformation, Simulation, Taylor series.


1. INTRODUCTION

The null hypothesis of additivity of effects is sometimes the primary hypothesis of scientific interest in experimental designs. This is common, for example, in environmental studies of effects of changes in point sources of pollutants. The influence of a pollutant source is often judged by comparing pollutant levels in a potentially affected region between time periods when the source is and is not emitting pollutants (e.g., before and after startup, or before and after closure of the point source). However, to control for other possible temporal and spatial trends in ambient pollutant levels, this comparison must be made in each of two regions: a "pseudo-control" region presumed unaffected by pollutant source emissions as well as the region potentially affected. For a study of mean difference in pollutant levels between, say, time 1 (source in operation) and time 2 (source not in operation), a source effect is defined as the change in this mean difference between the two regions. That is, one asserts a source effect if an increase in pollutant levels from time 1 to time 2 is dependent on region, the increase (if any) being greater in the potentially affected region. Equivalently, a source effect is claimed if the difference in pollutant levels between the regions is greater during the time period when the source is in operation. Both of these assertions are statements of interaction between time and space (region) effects in the experimental design. See Green (1979) for further discussion.

In applications such as the one just described, the null hypothesis of additivity (no interaction) is usually specified and interpreted in terms of specific units of measurement. In the example presented below, we measure pollutant (ion) concentration in µeq/l. However, heterogeneity associated with distributions of positive-valued random variables such as concentrations often calls for analysis in transformed units such as log or square root. The usual test of additivity applied to the transformed data is no longer relevant for testing additivity on the original scale of measurement.


[...] factor with levels $i = 1, 2$ corresponds to region and the factor with levels $j = 1, 2$ corresponds to time. Denote the cell means $E(X_{ijk}) = \mu_{ij}$. Suppose a transformation $Y = g(X)$ stabilizes the variance and write $E(Y_{ijk}) = E(g(X_{ijk})) = \eta_{ij}$. In this paper we consider a simple test of the hypothesis of additivity, $H_0: \mu_{22} - \mu_{21} - \mu_{12} + \mu_{11} = 0$, based on the cell means of the transformed data, $\bar{Y}_{ij}$. Note that, in general, $E(\bar{Y}_{22} - \bar{Y}_{21} - \bar{Y}_{12} + \bar{Y}_{11}) = \eta_{22} - \eta_{21} - \eta_{12} + \eta_{11} \neq 0$ under $H_0$. Indeed, transformations are often chosen explicitly to eliminate interaction defined in terms of the $\eta$'s when $H_0$ is false (cf. Box, Hunter and Hunter 1978). Therefore, tests for interaction on a transformed scale may yield very misleading results.

To test $H_0$ we estimate $\mu_{22} - \mu_{21} - \mu_{12} + \mu_{11}$ in terms of the means $\bar{Y}_{ij}$ of the transformed data using a second order Taylor series expansion and test for significant deviation of this estimate from zero. This simple test performs well in comparison with an approximate likelihood ratio test which is more difficult to compute. In fact, the two tests are asymptotically equivalent and perform similarly for finite sample sizes in a small simulation study.

We apply this procedure in the analysis of data collected in a study of the effects of the closure of a copper smelter on wet acid deposition near Seattle, Washington, USA (Vong et al. 1988). The design is in fact more complicated than a simple two-way ANOVA, but it embodies two 2-level factors of primary interest: 1) year (before and after the closure of the smelter), and 2) geographic location of the rainfall samplers (upwind and downwind of the smelter). Data were collected for 5 winter storms (treated as random effects) in each of the two years of the study. The selected storms were cyclonic frontal systems with south-southwesterly winds.
Monitoring sites upwind of the smelter reflected primarily clean background off the Pacific Ocean, while downwind monitoring sites were affected by urban sources of acid deposition whether or not the smelter was [...]. The null hypothesis of no smelter effect is therefore one of additivity between the year and geographic factors: that the upwind-downwind [...] in mean [...] is the same [...] concentration [...] sulfate or [...] a root [...].
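The point that additivity on the original scale does not carry over to a transformed scale can be seen in a small numerical sketch. The cell means below are hypothetical and chosen only for illustration; within-cell variability is ignored, so the transformed "means" are simply the logarithms of the original means.

```python
import math

# Hypothetical cell means on the original scale
# (rows: region i = 1, 2; columns: time j = 1, 2). Not data from the study.
mu = [[1.0, 2.0],
      [3.0, 4.0]]

def interaction_contrast(m):
    """Return m22 - m21 - m12 + m11 for a 2 x 2 table of cell values."""
    return m[1][1] - m[1][0] - m[0][1] + m[0][0]

# Additive on the original scale: 4 - 3 - 2 + 1 = 0.
original = interaction_contrast(mu)

# The same contrast computed on the log scale: ln(4/6), which is not zero.
log_mu = [[math.log(v) for v in row] for row in mu]
logged = interaction_contrast(log_mu)

print(original, logged)
```

A test of interaction carried out on the log scale would here be testing a contrast that is nonzero even though the original-scale means are exactly additive.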

2. TESTS OF INTERACTION FOR ANALYSIS UNDER POWER TRANSFORMATIONS

We first approximate the null hypothesis

$$H_0: \mu_{22} - \mu_{21} - \mu_{12} + \mu_{11} = 0$$

in terms of the $\eta_{ij} = E(Y_{ijk}) = E(g(X_{ijk}))$ for common members of the family of power transformations,

$$g_\lambda(X) = X^\lambda, \quad \lambda \neq 0; \qquad g_\lambda(X) = \ln(X), \quad \lambda = 0,$$

or equivalently, the Box-Cox version of this family (Box and Cox 1964). We then suppose that the range of values of $X$ is such that second order polynomial (Taylor series) approximations will be useful. Approximations may be based either on expansions of $Y = g(X)$ about $X = \mu$, or of $X = g^{-1}(Y) = h(Y)$ about $Y = \eta$:

$$Y \approx g(\mu) + (X - \mu)\,g'(\mu) + \frac{(X - \mu)^2}{2}\,g''(\mu) \tag{1a}$$

$$X \approx h(\eta) + (Y - \eta)\,h'(\eta) + \frac{(Y - \eta)^2}{2}\,h''(\eta) \tag{1b}$$

The latter expansion is preferred because $E(X - \mu)^2$ is usually a function of $\mu$. Although the transformation $g(X)$ is often chosen to stabilize the variance assuming that $E(X - \mu)^2 \propto \mu^p$, we prefer not to build this assumption explicitly into further computations. Instead we will rely simply on the assumption of homogeneity of variance on the transformed scale, $E(Y - \eta)^2 = \sigma_y^2$. Then,

$$\mu = E(X) \approx h(\eta) + \tfrac{1}{2}\,\sigma_y^2\,h''(\eta). \tag{2}$$
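As a rough numerical check of approximation (2), the sketch below compares it with the exact mean in two cases where the exact mean is known when Y is normal: the logarithmic transformation (lognormal mean) and the square root transformation. The values of `eta` and `sigma2` are hypothetical, and the helper name `approx_mean` is introduced here only for illustration.

```python
import math

def approx_mean(h, h2, eta, sigma2):
    """Second-order approximation (2): mu ~ h(eta) + 0.5 * sigma2 * h''(eta)."""
    return h(eta) + 0.5 * sigma2 * h2(eta)

eta, sigma2 = 1.0, 0.04  # hypothetical transformed-scale mean and variance

# Log transformation: h(Y) = exp(Y), h''(Y) = exp(Y).
approx_log = approx_mean(math.exp, math.exp, eta, sigma2)
exact_log = math.exp(eta + 0.5 * sigma2)  # mean of a lognormal variable

# Square root transformation: h(Y) = Y^2, h''(Y) = 2.
approx_sqrt = approx_mean(lambda y: y**2, lambda y: 2.0, eta, sigma2)
exact_sqrt = eta**2 + sigma2  # E(Y^2) = eta^2 + Var(Y), so (2) is exact here

print(approx_log, exact_log)
print(approx_sqrt, exact_sqrt)
```

For the square root case the approximation reproduces the exact mean, because $h$ is quadratic; for the logarithm the discrepancy is of order $\sigma_y^4$ and is small when the transformed-scale variance is small.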

If the transformed response $Y$ is symmetrically distributed, $E(Y - \eta)^3 = 0$, and this second order Taylor series expansion for the mean is accurate to third order. Utilizing (2), $H_0$ is approximated as

$$H_0: \; [h(\eta_{22}) - h(\eta_{21}) - h(\eta_{12}) + h(\eta_{11})] + \tfrac{1}{2}\,\sigma_y^2\,[h''(\eta_{22}) - h''(\eta_{21}) - h''(\eta_{12}) + h''(\eta_{11})] = 0.$$


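1. $Y = \sqrt{X}$, $h(Y) = Y^2$, $h''(Y) = 2$:

$$H_0: \; \eta_{22}^2 - \eta_{21}^2 - \eta_{12}^2 + \eta_{11}^2 = 0 \tag{3a}$$

2. $Y = \sqrt[3]{X}$, $h(Y) = Y^3$, $h''(Y) = 6Y$:

$$H_0: \; [\eta_{22}^3 - \eta_{21}^3 - \eta_{12}^3 + \eta_{11}^3] + 3\sigma_y^2\,[\eta_{22} - \eta_{21} - \eta_{12} + \eta_{11}] = 0 \tag{3b}$$

3. $Y = \ln(X)$, $h(Y) = e^Y$, $h''(Y) = e^Y$:

$$H_0: \; e^{\eta_{22}} - e^{\eta_{21}} - e^{\eta_{12}} + e^{\eta_{11}} = 0 \tag{3c}$$

4. $Y = 1/X$, $h(Y) = 1/Y$, $h''(Y) = 2Y^{-3}$:

$$H_0: \; [\eta_{22}^{-1} - \eta_{21}^{-1} - \eta_{12}^{-1} + \eta_{11}^{-1}] + \sigma_y^2\,[\eta_{22}^{-3} - \eta_{21}^{-3} - \eta_{12}^{-3} + \eta_{11}^{-3}] = 0 \tag{3d}$$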

Note that for the square root and logarithmic transformations these approximations are expressed as the usual interaction contrast in terms of the inverse transformations (square and exponential) applied to the means $\eta_{ij}$. The other two examples have a correction term scaled by the variance $\sigma_y^2$.
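The four approximate contrasts share one form: the interaction contrast of $h(\eta_{ij})$ plus $\tfrac{1}{2}\sigma_y^2$ times the interaction contrast of $h''(\eta_{ij})$. A minimal sketch of that computation follows; the function name `approx_contrast`, the dictionary keys, and the input values are hypothetical and not part of the original report.

```python
import math

# Inverse transformation h and its second derivative h'' for the four cases.
INVERSES = {
    "sqrt":       (lambda y: y**2,    lambda y: 2.0),
    "cuberoot":   (lambda y: y**3,    lambda y: 6.0 * y),
    "log":        (math.exp,          math.exp),
    "reciprocal": (lambda y: 1.0 / y, lambda y: 2.0 * y**-3),
}

def approx_contrast(eta, sigma2, kind):
    """Approximate mu22 - mu21 - mu12 + mu11 from transformed-scale cell
    means eta (2 x 2 nested list) and common variance sigma2."""
    h, h2 = INVERSES[kind]

    def contrast(f):
        return f(eta[1][1]) - f(eta[1][0]) - f(eta[0][1]) + f(eta[0][0])

    return contrast(h) + 0.5 * sigma2 * contrast(h2)

# Hypothetical transformed-scale means for the cube root case.
eta = [[1.0, 2.0],
       [3.0, 5.0]]
value = approx_contrast(eta, 0.1, "cuberoot")
# (125 - 27 - 8 + 1) + 3 * 0.1 * (5 - 3 - 2 + 1) = 91 + 0.3
print(value)
```

For the square root case the second term vanishes for any `sigma2`, since $h''$ is constant; for the logarithm the second term is proportional to the first, which is why it drops out of the stated hypothesis.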

2.1 An approximate "Z test"

We consider two different approaches to the test of interaction. The simplest approach is to test the null hypothesis $H_0$ expressed as a nonlinear function of the cell means $\eta_{ij}$, using as a test statistic the ratio of a sample estimate of the appropriate nonlinear function of cell means (substituting $\bar{Y}_{ij}$'s for $\eta_{ij}$'s, and a residual mean square $s_y^2$ for $\sigma_y^2$) to an estimate of its standard error. These ratios are expected to be approximately normally distributed under $H_0$ (or its Taylor series approximation) for sufficiently large sample sizes. We consider unbalanced designs with $n_{ij}$ observations in cell $(i,j)$. Taylor series approximations for the variances of the test statistics are:

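1. $Y = \sqrt{X}$, $h(Y) = Y^2$: [...] (4a)

[...]

3. $Y = \ln(X)$, $h(Y) = e^Y$:

$$\mathrm{Var}\left(e^{\bar{Y}_{22}} - e^{\bar{Y}_{21}} - e^{\bar{Y}_{12}} + e^{\bar{Y}_{11}}\right) = \sum_i \sum_j \left(e^{\eta_{ij}}\right)^2 \mathrm{Var}(\bar{Y}_{ij}) = \sum_i \sum_j \left(e^{\bar{Y}_{ij}}\right)^2 s_y^2 / n_{ij} \tag{4c}$$

4. [...] (4d)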

Performance of the test based on these ratios (or "Z statistics") is demonstrated below in a Monte Carlo experiment.
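For the logarithmic transformation, the ratio described above can be sketched as follows. The sketch assumes the first-order (delta method) variance $\sum_i \sum_j e^{2\bar{Y}_{ij}} s_y^2 / n_{ij}$ for the back-transformed contrast; the function name `z_statistic_log` and all input values are hypothetical and serve only as an illustration.

```python
import math

def z_statistic_log(ybar, s2, n):
    """Approximate Z statistic for additivity on the original scale when
    the data were analyzed as Y = ln(X).

    ybar: 2 x 2 cell means of the log-transformed data
    s2:   residual mean square on the log scale
    n:    2 x 2 cell sample sizes
    """
    signs = [[1.0, -1.0], [-1.0, 1.0]]  # interaction contrast coefficients
    estimate = 0.0
    variance = 0.0
    for i in range(2):
        for j in range(2):
            back = math.exp(ybar[i][j])           # back-transformed cell mean
            estimate += signs[i][j] * back
            variance += back**2 * s2 / n[i][j]    # delta-method variance term
    return estimate / math.sqrt(variance)

# Hypothetical cell means (logs of 1, 2, 3, 6), residual mean square, and sizes.
ybar = [[math.log(1.0), math.log(2.0)],
        [math.log(3.0), math.log(6.0)]]
n = [[10, 10], [10, 10]]
z = z_statistic_log(ybar, 0.1, n)
print(z)
```

The statistic would be referred to the standard normal distribution; the other transformations would follow the same pattern with their own contrast and variance expressions.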

2.2 An approximate likelihood ratio test

We also consider an approximate likelihood ratio test by comparing the unrestricted normal likelihood of the transformed data $Y_{ijk}$ with the likelihood corresponding to cell means $\eta_{ij}$ fitted under the constraint of an additive model for the underlying means $\mu_{ij}$ of the data $X_{ijk}$ before transformation. The maximization of the likelihood under this additive model requires a numerical optimization. For this we require the Taylor series approximations for $\eta$ in terms of $\mu$ derived by solving equation (2) for $\eta$. For the power transformations of equation (3) these are:

1. $\sqrt{X}$: [...] (5a)

2. $\sqrt[3]{X}$: [...] (5b)

[...]

4. $1/X$: [...] (5d)
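As an illustration of what solving equation (2) for $\eta$ involves, the two cases in which (2) can be inverted in closed form are worked below. They follow directly from (2) with the $h$ and $h''$ of the square root and logarithmic transformations; this is a sketch, and the expressions need not coincide with the forms the authors give in (5a)-(5d).

```latex
% Square root: h(Y) = Y^2, h''(Y) = 2, and eta >= 0 because Y = sqrt(X) >= 0.
\mu \approx \eta^2 + \sigma_y^2
\quad\Longrightarrow\quad
\eta \approx \sqrt{\mu - \sigma_y^2}, \qquad \mu > \sigma_y^2 .

% Logarithm: h(Y) = e^Y, h''(Y) = e^Y.
\mu \approx e^{\eta}\left(1 + \tfrac{1}{2}\sigma_y^2\right)
\quad\Longrightarrow\quad
\eta \approx \ln\mu - \ln\left(1 + \tfrac{1}{2}\sigma_y^2\right).
```

For the cube root and reciprocal transformations, (2) is a cubic equation in $\eta$ (or in $1/\eta$), so a closed-form inversion is less convenient.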

Suppose the $Y_{ijk}$ are normally distributed and approximate the normal density by replacing the parameters $\{\eta_{ij}, \sigma_y^2\}$ by $\{f(\mu_{ij}, \sigma_y^2), \sigma_y^2\}$, where $f(\mu_{ij}, \sigma_y^2)$ is the appropriate member of the list of functions given in (5a-5d). Treat this as an approximate likelihood function for $\{\mu_{ij}, \sigma_y^2\}$ and maximize over those values consistent with the additive model $H_0$. That is, compute

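$$L_0^* = \max_{H_0} L^*(\mu_{ij}, \sigma_y^2 \mid \{Y_{ijk}\}),$$

where

$$L^*(\mu_{ij}, \sigma_y^2 \mid \{Y_{ijk}\}) = \prod_{i=1}^{2} \prod_{j=1}^{2} \prod_{k=1}^{n_{ij}} (2\pi\sigma_y^2)^{-1/2} \exp\{ -\tfrac{1}{2\sigma_y^2} (Y_{ijk} - f(\mu_{ij},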