Robust Designs Based on the Infinitesimal Approach

Douglas P. Wiens & Julie Zhou

University of Alberta and Lakehead University

A version of this paper has appeared in the Journal of the American Statistical Association, 92, 1503-1511.

We introduce an infinitesimal approach to the construction of robust designs for linear models. The resulting designs are robust against small departures from the assumed linear regression response and/or small departures from the assumption of uncorrelated errors. Subject to satisfying a robustness constraint, they minimize the determinant of the Mean Squared Error matrix of the Least Squares estimator at the ideal model. The robustness constraint is quantified in terms of boundedness of the Gateaux derivative of this determinant, in the direction of a contaminating response function or autocorrelation structure. Specific examples are considered. If the aforementioned bounds are sufficiently large, then (permutations of) the classically optimal designs, which minimize variance alone at the ideal model, meet our robustness criteria. Otherwise, new designs are obtained.

KEY WORDS: Regression design; V-robustness; B-robustness; M-robustness; Approximately linear regression response.

Douglas P. Wiens is Professor and Director, Statistics Centre, Department of Mathematical Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2G1. Julie Zhou is Assistant Professor, Department of Mathematical Sciences, Lakehead University, Thunder Bay, Ontario, Canada P7B 5E1. The research of both authors is supported by the Natural Sciences and Engineering Research Council of Canada.

AMS 1991 subject classifications. Primary 62K05, 62F35; secondary 62J05, 62M10.


1 INTRODUCTION

In this paper we introduce an infinitesimal approach to the construction of robust designs for linear models. The resulting designs are robust against small departures from the assumed linear regression response and/or small departures from the assumption of uncorrelated errors. Subject to satisfying a robustness constraint, they minimize the determinant of the Mean Squared Error matrix of the estimator at the ideal model.

Specifically, suppose that an experimenter is to observe a random variable Y at locations x_1, ..., x_n in a q-dimensional design space S. The response E[Y|x] is thought to be approximately linear in the elements of a p-dimensional vector z(x) of regressors, and is observed subject to additive and possibly autocorrelated errors. With f(x) = E[Y|x] − z^T(x)θ, the observations then satisfy

Y_i = z^T(x_i)θ + f(x_i) + ε_i,  i = 1, ..., n.   (1)

The ε_i are zero-mean random errors with covariance matrix σ²P for some autocorrelation matrix P. The disturbance function f is assumed to be constrained in a manner ensuring that the regression parameters are well-defined - see Section 4 below.

The experimenter, knowing neither f nor P, intends to compute the classical Least Squares estimate θ̂. He will thus incur possible errors due to bias, and a possible loss in efficiency. He wishes to use a design for which the size, as measured by the determinant, of the Mean Squared Error matrix of θ̂ remains within reasonable bounds.

Versions of this problem, with respect only to variations from the fitted response function, have been investigated by, among others, Box and Draper (1959), Huber (1975), Li and Notz (1982), Pesotchinsky (1982), Kiefer and Wynn (1984), Wiens (1991, 1992), Tang (1993), Liu (1994), and Liu and Wiens (1994). With respect to the second type of model departure - autocorrelated errors - see Jenkins and Chanmugam (1962), Sacks and Ylvisaker (1966, 1968, 1970), Judickii (1976),

Bickel and Herzberg (1979), Herzberg and Huda (1981), Constantine (1989), Bischoff (1992, 1993) and Wiens and Zhou (1995, 1996).

A possible approach to these problems is to seek robustness against infinitesimal departures from the model, in the sense of bounding the maximum Gateaux derivative, evaluated at the ideal model (f = 0, P = I), of the determinant of the MSE matrix of θ̂. The derivatives are taken in the directions of arbitrary contaminating functions f or autocorrelation matrices P. The boundedness ensures that the MSE remains close to its value at the ideal model, in sufficiently small neighbourhoods. Subject to this boundedness we choose the design to minimize the MSE at the ideal model, this imposing a requirement of efficiency in addition to that of robustness.

Our designs attain their optimality through the placement of the design points and through the order in which these points are implemented. As an example of this latter factor, in a quality control framework observations at similar machine settings (x) may be positively (negatively) correlated, with the correlations decreasing in magnitude over time. It turns out that for MA(1) errors an appropriate ordering of the design points maximizes (minimizes) the number of sign changes among successive values of x_i − x̄.

The infinitesimal robustness of an estimator can be studied by means of the Influence Function of the estimator and by the Change-of-Bias and Change-of-Variance Functions (CBF, CVF) - see Hampel et al. (1986) and more recently Hossjer (1991). The CBF and CVF quantify the local stability of the bias and variance. Extensions of these stability notions to design theory are given in Section 2 below, where we also introduce corresponding optimality criteria. Explicit optimal designs are given in Sections 3 to 5. Various designs are compared in Section 6, and an example from the chemical engineering literature is discussed in Section 7. Proofs are in the Appendix.


2 DESIGN SENSITIVITY

Let ξ be the design measure, i.e. the empirical distribution function of the design points, and define

b_{f,ξ} = (1/n) Σ_{i=1}^n z(x_i) f(x_i),

B(m) = (1/n) Σ_{i=1}^{n−m} z(x_i) z^T(x_{i+m}) for 0 ≤ m ≤ n−1;  B(m) = B^T(−m) for −(n−1) ≤ m < 0.

These classes correspond to MA(1) processes with, respectively, positive and negative lag-1 correlations bounded away from 0. The V-robust designs presented here do not depend on the values of c_0 and c_1.

We consider the multiple linear regression model, i.e. z^T(x) = (1, x^T_{q×1}) in (1), and restrict to the class Ξ_{n,q} of n-point designs for which B(0) is a diagonal matrix. The latter point can of course be justified on practical grounds. Furthermore, because of affine invariance there is little loss of generality, if S is rotationally invariant. The only additional restriction imposed is then the requirement that the design points be centred with respect to each co-ordinate axis. We find, using (5), that

CVS(ξ; P) = sup_P 2ρ(1) ( (n−1)/n + Σ_{j=1}^q Q(x^{(j)}) ),

where x^{(j)} is the (j+1)th column of Z,

Q(x) := Σ_{i=1}^{n−1} x_i x_{i+1} / Σ_{i=1}^n x_i² = x^T Q x / x^T x,   (9)

and Q is the tridiagonal matrix with (i,j)th element q_{ij} = (1/2) I(|i − j| = 1). Theorem 1 below shows that if the bound κ in (7) is sufficiently large then (7) imposes no restriction. It is stated for q = 1 but the extensions to q > 1 are rather evident.
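As a quick numerical check of (9), the ratio form and the quadratic-form expression can be compared directly. The following sketch is ours (Python, with our function names), not part of the original development:

```python
import numpy as np

def Q_ratio(x):
    # Q(x) = sum_{i=1}^{n-1} x_i x_{i+1} / sum_{i=1}^{n} x_i^2
    x = np.asarray(x, dtype=float)
    return (x[:-1] @ x[1:]) / (x @ x)

def Q_form(x):
    # The same quantity written as x^T Q x / x^T x, with Q tridiagonal:
    # q_ij = 1/2 when |i - j| = 1 and 0 otherwise.
    x = np.asarray(x, dtype=float)
    n = len(x)
    Q = np.zeros((n, n))
    idx = np.arange(n - 1)
    Q[idx, idx + 1] = Q[idx + 1, idx] = 0.5
    return (x @ Q @ x) / (x @ x)

x = np.array([1.0, -1.0, 1.0, -1.0, 0.0])  # an alternating 5-point design
print(Q_ratio(x), Q_form(x))               # both equal -(n-2)/(n-1) = -0.75 here
```

For the alternating design with a trailing zero, both expressions give −(n−2)/(n−1), the value used in the proof of Theorem 1 below.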

Theorem 1 Let q = 1 and S = [−1, 1]. Define

κ_{1,n} = { 0, n even;  2/(n(n−1)), n odd },
κ_{2,n} = { −4c_1 (n−2)/n, n even;  −2c_1 (2n² − 5n + 1)/(n(n−1)), n odd }.

(i) For κ ≥ κ_{1,n}, V-robust designs for P_1 are

x = < 1, −1, 1, −1, ..., 1, −1 >, n even;  x = < 1, −1, ..., 1, −1, 0 >, n odd;

with CVS(ξ; P_1) = κ_{1,n}.

(ii) For κ ≥ κ_{2,n}, V-robust designs for P_2 are

x = < 1, ..., 1 (n/2 times), −1, ..., −1 (n/2 times) >, n even;
x = < 1, ..., 1 ((n−1)/2 times), 0, −1, ..., −1 ((n−1)/2 times) >, n odd;

with CVS(ξ; P_2) = κ_{2,n}.

(iii) In each case, the design minimizes CVS(ξ; P_k) among those designs in Ξ_{n,1} which maximize |B(0)|.
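The stated values κ_{k,n} can be checked numerically for odd n. The sketch below is ours; it evaluates CVS(ξ; P_k) as 2ρ_sup((n−1)/n + Q(x)), with the coefficients ρ_sup = 1 and ρ_sup = −c_1 taken from the displayed CVS formulas (c_1 = 0.3 is an arbitrary illustrative value):

```python
import numpy as np

def Q(x):
    x = np.asarray(x, dtype=float)
    return (x[:-1] @ x[1:]) / (x @ x)

n, c1 = 25, 0.3                      # n odd; c1 is an arbitrary illustrative value

# P1: alternate signs, with the single 0 at the end.
x1 = np.array([(-1.0) ** i for i in range(n - 1)] + [0.0])
cvs1 = 2.0 * ((n - 1) / n + Q(x1))
kappa1 = 2.0 / (n * (n - 1))

# P2: a run of 1's, then 0, then a run of -1's.
x2 = np.concatenate([np.ones((n - 1) // 2), [0.0], -np.ones((n - 1) // 2)])
cvs2 = -2.0 * c1 * ((n - 1) / n + Q(x2))
kappa2 = -2.0 * c1 * (2 * n ** 2 - 5 * n + 1) / (n * (n - 1))

print(cvs1, kappa1)   # equal
print(cvs2, kappa2)   # equal
```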

Remark 1: The designs in Theorem 1 are, for n even, given in Jenkins and Chanmugam (1962) and Constantine (1989). Jenkins and Chanmugam (1962) minimize the variance of the slope of the LSE over a subset of the set of all possible designs carried out at two levels only. The subset consists of all discrete `square wave' designs in which a run of m experiments at one level alternates with a run of m experiments at the other. Then the design problem becomes that of choosing the block size m. In Constantine (1989), the goal is to maximize the trace of the covariance matrix of the BLUE over Ξ_{n,1}. A linear approximation to Z^T P^{−1} Z is used to derive the optimal design.

In order to obtain the most V-robust designs for multiple regression, it is necessary to evaluate the extrema of Σ_{j=1}^q Q(x^{(j)}) over Ξ_{n,q}. This is carried out in the Appendix and yields Theorem 2 below. The statement of this result requires some definitions. Define constants and n × 1 vectors (λ_j, r_j), 1 ≤ j ≤ n, and (μ_j, s_j), 1 ≤ j ≤ [(n−1)/2], with the λ_j and μ_j ordered from largest to smallest, by

λ_j = cos(jπ/(n+1)),  (r_j)_k = √(2/(n+1)) sin(kjπ/(n+1));   (10)

μ_j = cos φ_j,  (s_j)_k = √( (2(n+1))/(n(n+2)) · cot((n+1)φ_j/2) ) · ( 1 − cos((k − 1/2)φ_j)/cos(φ_j/2) ).   (11)

Here φ_j is the solution, in (2jπ/(n+1), (2j+1)π/(n+1)), to the equation

tan((n+1)φ/2) − (n+1) tan(φ/2) = 0.   (12)

Place the r_{2j} and the s_j into a matrix X, and define a corresponding sequence {λ̃_j}, by

X_{n×(n−1)} = ‖ r_2, s_1, r_4, s_2, ..., r_{2[(n−1)/2]}, s_{[(n−1)/2]}, r_n (if n is even) ‖,   (13)

{λ̃_j}_{j=1}^{n−1} = < λ_2, μ_1, λ_4, μ_2, ..., λ_{2[(n−1)/2]}, μ_{[(n−1)/2]}, λ_n (if n is even) >.   (14)
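The eigenstructure claimed in (10), used again in Lemma A1 of the Appendix, is easy to verify numerically. This sketch (ours) checks that Q r_j = λ_j r_j, that the r_j are orthonormal, and that r_j is orthogonal to the ones vector exactly when j is even:

```python
import numpy as np

n = 10
Q = np.zeros((n, n))
i = np.arange(n - 1)
Q[i, i + 1] = Q[i + 1, i] = 0.5                # tridiagonal, q_ij = 1/2 for |i-j| = 1

j = np.arange(1, n + 1)
lam = np.cos(j * np.pi / (n + 1))              # candidate roots lambda_j
k = np.arange(1, n + 1)[:, None]
R = np.sqrt(2.0 / (n + 1)) * np.sin(k * j * np.pi / (n + 1))  # r_j as columns

print(np.allclose(Q @ R, R * lam))             # Q r_j = lambda_j r_j
print(np.allclose(R.T @ R, np.eye(n)))         # orthonormality
print(np.allclose(R[:, 1::2].sum(axis=0), 0))  # r_j orthogonal to 1 for even j
```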

Theorem 2 The most V-robust designs in Ξ_{n,q} for P_k, k = 1, 2, have model matrices Z = (1 : X^{(q,k)} D_k), where X^{(q,k)} consists of the last (k = 1) or first (k = 2) q columns of X, and D_k is a diagonal matrix chosen to have maximum determinant, subject to the constraint that the rows of X^{(q,k)} D_k belong to S. The corresponding covariance matrices of θ̂ at P_0 = I are σ²(Z^T Z)^{−1} = σ² diag(n^{−1}, D_k^{−2}). The Change-of-Variance Sensitivities are

CVS(ξ; P_k) = { 2( (n−1)/n + Σ_{j=1}^q λ̃_{n−j} ),  k = 1;
               −2c_1( (n−1)/n + Σ_{j=1}^q λ̃_j ),  k = 2. }

Remarks:

2. If S is a q-dimensional rectangle [−c_1, c_1] × ... × [−c_q, c_q] then we find that the maximizing matrix D_k is given by (D_k)_{jj} = c_j (max_{1≤i≤n} |(X^{(q,k)})_{ij}|)^{−1} for j = 1, ..., q. See Figure 1 (a)-(d) for examples with q = 1, 2. If S is a q-dimensional sphere of radius c then the (D_k)_{jj} are instead chosen to have maximum product, subject to the constraints Σ_{j=1}^q (X^{(q,k)})²_{ij} (D_k)²_{jj} ≤ c² for i = 1, ..., n.

3. An attractive consequence of Theorem 2 is that the projection of the most V-robust design in Ξ_{n,q} onto the last (k = 1) or first (k = 2) q_0 co-ordinate axes is most V-robust in Ξ_{n,q_0}.

4. A separate but parallel development shows that if there is no constant term in the model, then X above may be replaced by the n × n matrix consisting of the r_j, and {λ̃_j} by {λ_j}_{j=1}^n, all in their natural order.

5. The empirical distribution functions of the design points of the most V-robust designs in Ξ_{n,1} with S = [−1, 1] converge weakly, as n → ∞, to the arcsine distribution with distribution function 1/2 + π^{−1} arcsin(x) and density π^{−1}(1 − x²)^{−1/2}. An asymptotic description of the designs is then given by this limit together with the order in which the design points are to be applied. For P_1 the points alternate in sign, with the first half of the points increasing in magnitude, the second half decreasing. For P_2 the first half of the points are positive, the second half negative. Within each half the points increase and then decrease in magnitude. It is interesting to note that the arcsine design also arises in another context - that of optimal polynomial regression design, as the degree of the fitted polynomial tends to infinity.
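The arcsine limit in Remark 5 suggests a simple recipe for generating (unordered) approximate design points for large n. The sketch below is ours, and the mid-probability quantile convention is our choice, not the paper's:

```python
import numpy as np

# Quantiles of the arcsine law on [-1, 1]: F(x) = 1/2 + arcsin(x)/pi,
# so F^{-1}(u) = sin(pi*(u - 1/2)) = -cos(pi*u).
n = 25
u = (np.arange(1, n + 1) - 0.5) / n   # mid-probabilities
x = -np.cos(np.pi * u)                # points accumulating near the endpoints +-1

# Per Remark 5, for P1 the points would then be ordered to alternate in sign with
# magnitudes first increasing then decreasing; for P2, positives first, then negatives.
```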


Figure 1: Most V-, most B- and M-robust designs, n = 25. (a), (b): Most V-robust designs in [−1, 1] for P_1 and P_2 resp. (c), (d): Most V-robust designs in [−1, 1] × [−1, 1]; indices of design points are plotted, for P_1 and P_2 resp. (e), (f): Most B-robust designs for F_1, ordered for P_1 and P_2 resp. (g), (h): M-robust designs for F_1, ordered for P_1 and P_2 resp.

See Pukelsheim (1993) for a discussion. This suggests that the designs might be suitable for polynomial regression as well.

4 B-ROBUST DESIGNS

By virtue of (6), the B-robustness problem is that of constructing a design to maximize |B(0)|, subject to a bound on sup_{f∈F} b_{f,ξ}^T B^{−1}(0) b_{f,ξ}. Such designs then coincide with the Bounded Bias designs of Liu (1994) and Liu and Wiens (1997).

Example 4.1: Consider the case of multiple regression. Let S be a q-dimensional sphere centred at the origin, with radius r determined by the requirement that S have unit volume. Recall (1), and take

F_1 = { f : ∫_S z(x) f(x) dx = 0, ∫_S f²(x) dx ≤ η² }.

The "true" parameter is defined by θ_0 = argmin_θ ∫_S (E[Y|x] − z^T(x)θ)² dx. Minimax designs

for this model are considered in Wiens (1992), and in Huber (1975) when q = 1. Bounded Bias designs are constructed in Liu (1994) when q = 1. It is shown in Wiens (1992) that only absolutely continuous design measures are admissible for this problem. Thus, following Sacks and Ylvisaker (1966, 1968), who investigated optimal rates of convergence of n-point designs to absolutely continuous measures, we propose approximating the B-robust design as in the following theorem.

Theorem 3 Let β > 0 be given, and put β_0 = β/(nη²) and γ_0 = r²/(q+2). Define a distribution function H_0(u), 0 ≤ u ≤ r, by its density h_0(u) = (q u^{q−1}/r^q) g_0(u) in the following way:

1. If 0 < β_0 ≤ 4/(q(q+4)) then

g_0(u) = 1 + ((α/γ_0) − 1)((q+4)/4)((u²/γ_0) − q),

where α is determined from β_0 = q(q+4)((α/γ_0) − 1)²/4.

2. If β_0 ≥ 4/(q(q+4)) then

g_0(u) = { [(u/r)² − b]/K_q(b) } I(r√b ≤ u ≤ r),

where K_q(b) = (1 − b) − 2(1 − b^{(q/2)+1})/(q+2) and (b, α) are determined by the equations

α = γ_0 K_{q+2}(b)/K_q(b),   β_0 q r² K_q(b) − br² − 1 = 0.

Define a = [2^{−q} n], b = n − 2^q a. Put u_i = H_0^{−1}(i/a), i = 1, ..., a, and for each i ≤ a let t_{i,1}, ..., t_{i,2^q} be a random permutation of 2^q points equally spaced over the surface of the unit sphere in R^q. Then the empirical distribution function ξ_n of the points x_{i,j} = t_{i,j} u_i, together with b 0's, is an n-point design which converges weakly to a B-robust design ξ_1 with density g_0(‖x‖). The limiting distribution of ‖x‖ = U is H_0(u), and CBS(ξ_1; F_1) = β.

Remarks:

6. The notion of points equally spaced over the surface of the unit sphere is to be interpreted in terms of the angles defined by these points. When q = 1 we are asserting only that the design is symmetric. When q = 2 one may take t_{i,j} = (cos θ_{i,j}, sin θ_{i,j})^T, where θ_{i,1}, θ_{i,2}, θ_{i,3}, θ_{i,4} are a random permutation of {φ_i + (j−1)π/2, j = 1, 2, 3, 4} and φ_1, ..., φ_{[n/4]} are a random permutation of angles equally spaced over [0, π/2). Note that ξ_n ∈ Ξ_{n,q}.

7. The minimum value of CBS(ξ; F_1) is β = 0. This is attained only by the Uniform distribution on S, which is then most B-robust.

Example 4.2: For approximate polynomial regression - z^T(x) = (1, x, x², ..., x^{p−1}) in (1) - Liu and Wiens (1997) construct Bounded Bias designs for the class

F_2 = { f : |x^p f(x)| ≤ δ(x) for all x ∈ S = [−1, 1] },

where δ is a given non-negative function. Liu (1994) considers the case p = 2. The designs are similar to the classically optimal designs which minimize variance alone, in that they have all mass

at p symmetrically placed points. For p = 3 and δ(x) ≡ 1 the solution is

ξ = { (1/2)ξ_0 + (1/2)ξ_{(2β_0)^{1/6}},  for 0 ≤ β_0 ≤ 1/2;
      (1 − β_0)ξ_0 + β_0 ξ_1,           for 1/2 ≤ β_0 ≤ 2/3;
      (1/3)ξ_0 + (2/3)ξ_1,              for β_0 ≥ 2/3;

where β_0 = β/n and ξ_a is pointmass at a. In this case then, the concept `most B-robust' leads to the clearly impractical design ξ_0.
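The piecewise solution above is easy to transcribe and is continuous at both breakpoints. The sketch below is ours (function name and return convention are our choices); it returns the mass placed on ξ_0, the location a, and the mass placed on ξ_a:

```python
def bounded_bias_design_p3(beta0):
    """Mixing weights for the p = 3, delta(x) = 1 solution:
    returns (mass on xi_0, location a, mass on xi_a)."""
    if beta0 <= 0.5:
        return (0.5, (2.0 * beta0) ** (1.0 / 6.0), 0.5)
    elif beta0 <= 2.0 / 3.0:
        return (1.0 - beta0, 1.0, beta0)
    else:
        return (1.0 / 3.0, 1.0, 2.0 / 3.0)

# Continuous at beta0 = 1/2 (where (2*beta0)^{1/6} = 1) and at beta0 = 2/3;
# as beta0 -> 0 the design collapses to all mass at 0.
print(bounded_bias_design_p3(0.5))   # (0.5, 1.0, 0.5)
```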

5 M-ROBUST DESIGNS

We outline an approach to the construction of M-robust designs for multiple regression. Let ξ be B-robust in a class of designs. Note that the B-robustness is unaffected by a permutation of the design points. Suppose then that there is a permutation of the points of support of ξ for which the corresponding design ξ* has CVS(ξ*; P) ≤ κ. Then ξ* satisfies both (7) and (8). It maximizes |B(0)| in the class of designs satisfying (8), hence a fortiori in the smaller class of designs satisfying both (7) and (8). It is thus M-robust.

Theorem 4 Let z^T(x) = (1, x^T_{q×1}). Suppose that ξ ∈ Ξ_{n,q} is B-robust.

(i) If κ ≥ 2(n−1)/n and there is a permutation < x*_1, ..., x*_n > of the support points of ξ for which Q(x*^{(j)}) ≤ 0 for j = 1, ..., q, then the corresponding design ξ* is M-robust for P_1.

(ii) If κ ≥ −2c_1(n−1)/n and there is a permutation < x*_1, ..., x*_n > for which Q(x*^{(j)}) ≥ 0 for j = 1, ..., q, then ξ* is M-robust for P_2.

Example 5.1: Let F_1, S and ξ_1 be as in Example 4.1, with q = 2 and n a multiple of 4. Obtain the t_{i,j} by the method described in Remark 6. Then ξ_n ∈ Ξ_{n,2}. Now apply the design points by alternating between quadrants I and III for the first n/2 points, and then between quadrants II and IV for the remaining n/2 points. It is readily checked that Q(x^{(j)}) < 0 for j = 1, 2. The conditions of Theorem 4 are then met asymptotically, so that ξ_n is `asymptotically M-robust' for

{F_1, P_1}. Similarly, if the design points are applied quadrant-by-quadrant, with the first n/4 from quadrant I, the next n/4 from quadrant II, etc., then the design is asymptotically M-robust for {F_1, P_2}.

6 COMPARISONS

In this section we compare the relative performance of various designs, with respect to straight line regression over S = [−1, 1]. To check design robustness we use several other loss functions and autocorrelation structures besides those used in the derivations.

The true model is given by (1) with n = 25, f(x) = η(45/8)^{1/2}(x² − 1/3) (so that f ∈ F_1 and ∫_{−1}^{1} f²(x)dx = η²) and ε = σ_0 P^{1/2} w, where w is a vector of white noise with variance σ_w², P is an autocorrelation matrix to be specified, and σ_0² = var(ε_t)/σ_w². The choice of f may be motivated by noting that: (i) this f is least favourable, in a minimax sense, for straight line regression - see Huber (1975); and (ii) a quadratic disturbance represents the most common and worrisome departure from linearity, in most applications.

Denoting σ_w^{−2} times the Mean Squared Error matrix of n^{1/2}θ̂ by C, we find that

C = σ_0² diag(1, μ_2^{−1}) (Z^T P Z / n) diag(1, μ_2^{−1}) + (45/8) ν (μ_2 − 1/3)² diag(1, 0),

where Z = (1 : x) is the model matrix, μ_k = Σ_{i=1}^n x_i^k / n is the kth moment of the design, and ν = nη²/σ_w². The value of ν may be viewed as reflecting the relative importance of bias versus variance in the mind of the experimenter. We consider the cases ν = 0, for which the fitted model is exactly correct, and ν = 1, in which case (∫ f²)^{1/2} is of the same magnitude as a standard error.

We take P = P^{(j)}(ρ) to be the autocorrelation matrix of one of the following error processes {ε_t} with lag-one autocorrelation ρ: (i) j = 1: MA(1) with ρ ≥ 0; (ii) j = 2: MA(1) with ρ ≤ 0; (iii) j = 3: AR(1) with ρ ≥ 0; (iv) j = 4: AR(1) with ρ ≤ 0. For j = 1, 2 we have σ_0² = 1 + θ², where θ ∈ [−1, 1] satisfies ρ = −θ/(1 + θ²). For j = 3, 4 we have σ_0² = (1 − ρ²)^{−1}.
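The matrix C can be computed directly from a design and an autocorrelation structure. The sketch below is ours (it assumes a centred design, so that Z^T Z/n = diag(1, μ_2), and transcribes the display for the MA(1) and AR(1) cases):

```python
import numpy as np

def mse_matrix(x, rho, nu, process="ma1"):
    """sigma_w^{-2} times the MSE matrix of n^{1/2} * theta_hat, for the
    quadratic disturbance f(x) = eta*sqrt(45/8)*(x^2 - 1/3)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    Z = np.column_stack([np.ones(n), x])
    lag = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    if process == "ma1":
        P = np.where(lag == 0, 1.0, np.where(lag == 1, rho, 0.0))
        s0sq = 2.0 / (1.0 + np.sqrt(1.0 - 4.0 * rho ** 2))   # = 1 + theta^2
    else:  # ar(1)
        P = rho ** lag                                        # integer lags
        s0sq = 1.0 / (1.0 - rho ** 2)
    mu2 = np.mean(x ** 2)
    D = np.diag([1.0, 1.0 / mu2])
    bias = (45.0 / 8.0) * nu * (mu2 - 1.0 / 3.0) ** 2 * np.diag([1.0, 0.0])
    return s0sq * (D @ (Z.T @ P @ Z / n) @ D) + bias

x = np.concatenate([np.ones(12), -np.ones(12), [0.0]])  # vrob support, n = 25
print(mse_matrix(x, 0.0, 0.0))   # the ideal-model value diag(1, 25/24)
```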

The designs considered are:

vrob.pos, vrob.neg - the V-robust designs of Theorem 1, for P_1 and P_2 respectively.

mostvrob.pos, mostvrob.neg - the most V-robust designs of Theorem 2, for P_1 and P_2 respectively. See Figure 1 (a) and (b).

mostbrob.pos, mostbrob.neg - the most B-robust designs for F_1, approximated as in Theorem 3 and then ordered so as to be robust for P_1 and P_2 respectively. Thus the unordered design points are equally spaced: ±i/[n/2], i = 1, ..., [n/2], and 0 if n is odd. These points are then ordered in the same manner as those in mostvrob.pos and mostvrob.neg respectively. See Figure 1 (e) and (f). Note that these designs are also M-robust for (P_k, F_1), k = 1, 2, for sufficiently large κ.

uniform - n monotonically increasing points equally spaced over [−1, 1]. Apart from the ordering, this design then coincides with the two mostbrob designs. It is included primarily to illustrate the gains to be realized by an ordering which is appropriate for the sign of ρ.

mrob.pos, mrob.neg - M-robust designs as in Section 5, ordered so as to be robust for P_1 and P_2 respectively. The unordered design points are obtained as in Theorem 3, with β_0 = 4/5, the boundary between the two cases. The calculations then give that these points are ±(i/[n/2])^{1/3}, i = 1, ..., [n/2], and 0 if n is odd. See Figure 1 (g) and (h).

For each P^{(j)}, j = 1, ..., 4, we have computed (i) det = |C| evaluated at P(ρ); (ii) trace = trace(C) evaluated at P(ρ); and (iii) imse = trace(CA) evaluated at P(ρ), where A = ∫_{−1}^{1} (1, x)^T (1, x) dx = diag(2, 2/3). Apart from an additive term, imse is the Integrated Mean Squared Error ∫_{−1}^{1} E[(ŷ(x) − E[Y|x])²] dx.

See Figure 2 for representative plots of the loss against ρ for each design, appropriate for the sign of ρ. We have omitted the plots for the uniform design since for it the loss for ρ > 0 becomes

so great as to obscure the differences between the more robust designs. The poor performance of the uniform design derives largely from the large variance of the slope estimate (see Table 1) and, when ρ > 0, the inappropriate ordering of the design points. We have included only the plots for ν = 1 and loss = det or trace, since those in the excepted cases tell much the same story.

The ranking of the designs turns out to be independent of the loss function used, and largely independent of |ρ|. When ν = 0 the V-robust designs are of course most efficient, followed by the M-robust designs. Note however that the V-robust designs with mass concentrated at ±1 allow one little opportunity to test the appropriateness of the linear model, and then only when n is odd. When ν = 1 the three loss functions are each minimized by either the most V-robust, or the M-robust designs. The performances of the most V-robust versus the M-robust designs are almost indistinguishable, as are their shapes (see Figure 1).

At first glance the shapes of the loss functions in Figure 2 (a) - decreasing in |ρ| - and the similar though less extreme behaviours exhibited in the other three plots, may seem counterintuitive. That these are in fact features of appropriate orderings of the design points is made plausible by analyzing the dominant term in C. For MA(1) errors and any symmetric design this is

σ_0² (Z^T P Z / n) = (2 / (1 + √(1 − 4ρ²))) diag(1 + 2ρ, (1 + 2ρ)μ_2 − ρδ) + O(n^{−1}),

where δ = Σ_{i=1}^{n−1} (x_i − x_{i+1})² / n. From this one obtains that the determinant of this matrix tends to zero as |ρ| → 1/2, as do the trace and imse as ρ → −1/2, if δ = 4μ_2 when ρ > 0 and δ = 0 when ρ < 0. In each case we ignore the O(n^{−1}) terms. These equalities are not quite attainable, but they are upper and lower bounds respectively. The robust designs considered here order the design points in such a way as to maximize or minimize δ for fixed μ_2, subject to the other constraints placed upon them, according to whether ρ is positive or negative.

Table 1 below gives values, for each design pair (.pos, .neg), of

(i) μ_2 − 1/3, the dominant term in the bias when ν ≠ 0.
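The role of the ordering is visible numerically: on the same support, an alternating ordering drives δ toward its upper bound 4μ_2, while a grouped ordering drives it toward 0. A small sketch (ours):

```python
import numpy as np

def delta(x):
    # delta = sum_{i=1}^{n-1} (x_i - x_{i+1})^2 / n
    x = np.asarray(x, dtype=float)
    return np.sum(np.diff(x) ** 2) / len(x)

n = 24
alt = np.array([(-1.0) ** i for i in range(n)])             # alternating +-1
grp = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])   # grouped +-1
mu2 = np.mean(alt ** 2)                                     # same support: mu2 = 1

print(delta(alt), 4 * mu2)   # 4(n-1)/n, just below the bound 4*mu2
print(delta(grp))            # 4/n, just above the bound 0
```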



Figure 2: Comparative losses; ν = 1. Autocorrelation models and loss functions are: (a) MA(1), det; (b) MA(1), trace; (c) AR(1), det; (d) AR(1), trace.


(ii) 1/μ_2, determining the variance of the slope estimate.

(iii) The signal-to-noise ratio Δ² = (45/8)ν(μ_4 − μ_2²), evaluated at ν = 1. As in Huber (1975), this is E[Z]²/var[Z], where Z = Σ_{i=1}^n Y_i (f(x_i) − f̄) (with f̄ := Σ_{i=1}^n f(x_i)/n) is the test statistic of the most powerful (Neyman-Pearson) test, assuming i.i.d. normal errors, of the null hypothesis that E[Y|x] is linear, versus the alternative f(x) given above.

(iv) The power of the level α = .1 Neyman-Pearson test, when ν = 1. For purposes of comparison, note that the best design for such a test (when the errors are i.i.d.), placing one-half of the observations at x = 0 and one-quarter at each of x = ±1, has Δ² = 1.41 and a power of 0.325. However, this design has a power of 0 against a response function linear at −1, 0 and 1, regardless of the behaviour at other points.

Table 1. Bias, Variance and Power Measures, Assuming Uncorrelated Errors

Design      μ_2 − 1/3   1/μ_2   S/N ratio   Power
vrob          .627      1.042     .216      .136
mostvrob      .188      1.917     .680      .213
mostbrob      .028      2.769     .584      .197
mrob          .281      1.629     .422      .171

NOTE: For the uniform design, all values coincide with those for the mostbrob designs when the errors are uncorrelated.

The Power column of Table 1 indicates that the non-linear disturbance f(x) is so slight as to be very difficult to detect; nonetheless, of the designs considered here the most powerful are the most V-robust designs.

Table 2 gives simulated values of the power of the one-sided Durbin-Watson test for serial correlation, with a nominal level of 5%. Twelve thousand sets of residuals were simulated for ν = 0 and ν = 1, and each of five correlation structures represented by their autocorrelation matrices I, P^{(1)}(.2), P^{(2)}(−.2), P^{(3)}(.2) and P^{(4)}(−.2). The hypotheses and designs used were those appropriate for the sign of ρ.

The simulations indicate that the uniform design fares exceedingly poorly against positive autocorrelation. Among the more robust designs, when ν = 0 the size and power of the test do not depend on the choice of design. When ν = 1 there is essentially no change in the performance of the V-robust design. However, the power under the most V-robust, most B-robust and M-robust designs then tends to be larger at ρ = |ρ| than at ρ = −|ρ|. To a lesser extent, the frequency of Type I errors increases when ν = 1, for all but the V-robust designs.

Table 2. Simulated Size and Power of One-Sided 5% Durbin-Watson Tests

        Response/error structure   vrob       mostvrob   mostbrob   mrob       uniform
ν = 0   independent                .051 (2)   .052 (2)   .051 (2)   .051 (2)   .018 (1)
        ma(1), ρ = .2              .206 (4)   .209 (4)   .208 (4)   .208 (4)   .106 (3)
        ma(1), ρ = −.2             .203 (4)   .205 (4)   .203 (4)   .204 (4)   .204 (4)
ν = 1   independent                .052 (2)   .071 (2)   .068 (2)   .062 (2)   .028 (2)
        ma(1), ρ = .2              .210 (4)   .248 (4)   .241 (4)   .232 (4)   .130 (3)
        ma(1), ρ = −.2             .201 (4)   .171 (3)   .175 (3)   .183 (4)   .172 (3)

NOTE: The size and power refer to the proportion of `conclusive' rejections only. Standard errors in the third decimal place are in parentheses. The ar(1) values were very similar to those for ma(1) models and have been omitted.

In general, the powers of the tests considered here are rather low in all cases, illustrating (once again) the point that barely detectable departures from the assumed model can have severe effects on the precision and accuracy of estimates. We are concerned here solely with the amelioration of such effects through the use of an appropriate design. For other discussion of the effects of dependence, and of the attainment of robustness against such effects through suitable estimation procedures, see Hampel et al. (1986, chap. 8), Samarov (1987), Field and Wiens (1994) and references cited therein.

7 EXAMPLE

In Werther, Hartge and Rensner (1990), measurements of fluid-dynamic properties for gas-solid beds are reviewed. Several techniques are discussed for the determination of solids concentrations, velocities, and mass flows in various gas-solid fluidized beds - low-velocity bubbling fluidized beds, turbulent beds, circulating fluidized beds, etc.

One technique to determine the integral concentration C_v of solids in a low-velocity bubbling fluidized bed is that of pressure-drop measurements. Since the solid particles are held in a state of suspension by the upward-flowing gas, the pressure drop of the gas through the bed is equal to the weight of the solids per unit area of the bed. If p = p(h) is the pressure over the height h of the fluidized bed, then the relationship between the pressure change Δp and the height change Δh is approximately linear within the bed, i.e. Δp ≈ λΔh, where λ = ρ_s C_v g for known constants ρ_s (the density of the solid) and g (the acceleration due to gravity). Werther, Hartge and Rensner (1990) discuss the estimation of C_v through fitting the above relationship, with additive error, after making a small number of measurements (p_i, h_i). With Δp_i = p_i − p_{i+1} and x_i = h_{i+1} − h_i, the statistical model is

Δp_i = λ x_i + ε_i,  i = 1, ..., n − 1.   (15)

If the errors in the measurement of p_i can be assumed to be stationary and uncorrelated, then {ε_i} in (15) is an MA(1) process with negative correlation ρ(1) = −0.5. Other sources of variation - in particular the reported carry-over effect between periodic flushings of the pressure gauge - render this model somewhat approximate. It seems quite safe, however, to assume that ρ(1) is negative. The simulation studies in Section 6 have shown that the most V-robust designs are quite robust against misspecifications of the error structure. We thus take {x_i} to be the most V-robust design for P_2 and straight-line regression through the origin, i.e. x_i ∝ sin(iπ/n) (see Remark 4 of Section 3). Then with h_1 = h and h_n = H we obtain, for i = 1, ..., n − 1,

h_{i+1} = h + (H − h) Σ_{j=1}^{i} x_j / Σ_{j=1}^{n−1} x_j = h + (H − h) sin(iπ/(2n)) sin((i+1)π/(2n)) / sin((n−1)π/(2n)).
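With the inputs used in the simulation below (h = 0.020 m, H = 0.380 m, n = 6), the formula reproduces the measurement heights quoted there. A small sketch (ours):

```python
import numpy as np

h, H, n = 0.020, 0.380, 6
i = np.arange(1, n)
h_rest = h + (H - h) * (np.sin(i * np.pi / (2 * n)) * np.sin((i + 1) * np.pi / (2 * n))
                        / np.sin((n - 1) * np.pi / (2 * n)))
heights = np.concatenate([[h], h_rest])   # h_1 = h, ..., h_n = H
print(np.round(heights, 3))               # [0.02  0.068 0.152 0.248 0.332 0.38 ]
```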

We have simulated data from this model, for a low-velocity bubbling fluidized bed, using input values obtained in collaboration with Dr. I. Nirdosh, Department of Chemical Engineering, Lakehead University. The solids are sand particles with ρ_s = 2200 kg/m³, and the height of the bed is 0.4 m. Sixty thousand sets of MA(2) errors, with ρ(1) = −.36 and ρ(2) = −.08, were used with each of two designs: the most V-robust, with {h_i} = {.020, .068, .152, .248, .332, .380}, and the uniform, with {h_i} = {.020, .092, .164, .236, .308, .380}. The resulting estimates of the bias and standard error of λ̂ are (−.002, 3.63) for the V-robust design, and (−.010, 4.04) for the uniform design. Similar values were obtained with other MA(2) models.

8 CONCLUSIONS

A fruitful approach to designing for correlated errors has been to seek an ordering of the support points, of a design optimal for uncorrelated errors, which minimizes the variance under the model of correlation. This approach was taken by Jenkins and Chanmugam (1962), resulting in the designs of Theorem 1, found to be V-robust. As in Table 1, these designs tend to result in small variability of the regression estimates when the fitted model is exactly correct, but large biases otherwise. Another tack has been to control the bias under departures from linearity - Box and Draper (1959), Huber (1975), Wiens (1991) - resulting in the uniform designs found by us to be most B-robust. See Table 1. Sacks and Ylvisaker (1968) also noted optimality properties of uniform designs for correlated errors.

The present work shows that under departures from linearity which are almost undetectable, especially also in the presence of departures from independence, the most V-robust and M-robust designs used here yield estimates with significantly smaller mean squared error than those resulting from the competing designs. The favourable properties of these designs appear to be quite insensitive to the loss function used, and to the underlying autocorrelation model. The most V-robust designs have surprisingly high power with respect to tests for model departures in the direction of nonlinear response functions.

Given that statisticians routinely fit linear models, by Ordinary Least Squares, in the hope that the departures from linearity/independence are sufficiently small that these assumptions will not lead them seriously astray, we recommend the use of most V-robust or M-robust designs. They are simple to compute and afford the necessary protection in such situations.

APPENDIX: PROOFS

Proof of Theorem 1: It is easily verified that each of the four designs maximizes $|B(0)| = \|x\|^2/n$ in $\Xi_{n,1}$ unconditionally, and has the stated value of $CVS(\xi, P)$, hence is $V$-robust. For (iii) it must be verified that $CVS(\xi, P_k)$ satisfies the stated bound for each $k$, if $\|x\|^2$ is a maximum. This is straightforward, upon noting that (a) maximality of $\|x\|^2$ forces $x$ to have the (unordered) elements in the statement of the Theorem; and (b) for such $x$, the denominator of $Q(x)$ is fixed and the numerator is minimized (for $P_1$) or maximized (for $P_2$) by the indicated orderings.

The proof of Theorem 2 requires some preliminary results.

Lemma A1: The matrix $Q$ has characteristic roots (ch. roots) $\lambda_j$ and corresponding orthonormal characteristic vectors (ch. vecs) $r_j$, given by (10). The ch. vecs $r_j$ are orthogonal to $\mathbf{1} = (1, \dots, 1)^T$ iff $j$ is even.

Proof: Both statements may be verified by direct calculations. The first is derived in Grenander and Szegő (1958, pp. 67-68).
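Lemma A1 can be checked numerically. Equation (10) is not reproduced in this excerpt, so the sketch below works from an assumed reading, consistent with the Grenander and Szegő citation: $Q$ is taken to be the symmetric tridiagonal Toeplitz matrix with zero diagonal and $1/2$ on the off-diagonals, with roots $\lambda_j = \cos(j\pi/(n+1))$ and sine-form eigenvectors. Under that assumption the even-indexed eigenvectors are exactly those orthogonal to $\mathbf{1}$.

```python
import numpy as np

n = 7
# Assumed form of Q: symmetric tridiagonal Toeplitz, 0 diagonal, 1/2 off-diagonal.
Q = 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))

j = np.arange(1, n + 1)
lam = np.cos(j * np.pi / (n + 1))               # candidate ch. roots lambda_j
k = np.arange(1, n + 1)
# Candidate ch. vecs r_j (columns), normalized by sqrt(2/(n+1)).
R = np.sqrt(2.0 / (n + 1)) * np.sin(np.outer(k, j) * np.pi / (n + 1))

# Q r_j = lambda_j r_j, and the r_j are orthonormal.
assert np.allclose(Q @ R, R * lam)
assert np.allclose(R.T @ R, np.eye(n))

# 1^T r_j = 0 iff j is even.
inner = np.ones(n) @ R
assert np.allclose(inner[1::2], 0.0)            # even j: orthogonal to 1
assert np.all(np.abs(inner[0::2]) > 1e-8)       # odd j: not orthogonal
```

Note also that for odd $n$ the middle index $j = (n+1)/2$ gives $\lambda = 0$, matching the remark about $r_{(n+1)/2}$ later in the proof.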

Lemma A2: With definitions as at (9)-(14) we have that $X^T X = I_{n-1}$ and that
$$\min_{\Xi_{n,q}} \sum_{j=1}^{q} Q(x^{(j)}) = \sum_{j=1}^{q} \nu_{n-j}, \qquad \max_{\Xi_{n,q}} \sum_{j=1}^{q} Q(x^{(j)}) = \sum_{j=1}^{q} \nu_{j}.$$
These extrema are attained at arbitrary non-zero multiples of the first $q$ and last $q$ columns respectively of $X$.

Proof: Note that
$$\max_{\Xi_{n,q}} \sum_{j=1}^{q} Q(x^{(j)}) = \max\{\operatorname{trace} QXX^T \mid X^T X = I_q,\; X^T \mathbf{1} = 0\},$$

where $X$ has columns $\{x^{(j)}\}_{j=1}^{q}$. If $J_{n \times (n-1)}$ satisfies $J^T J = I_{n-1}$ and $JJ^T = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^T$, then the conditions on $X$ are equivalent to "$X = JH$ for some $H_{(n-1) \times q}$ with $H^T H = I_q$". With $R = J^T Q J$ the desired maximum is thus $\max\{\operatorname{trace} RHH^T \mid H^T H = I_q\}$. Theorem 1.10.2 of Srivastava and Khatri (1979) states that this maximum is $\sum_{j=1}^{q} \nu_j$, and similarly the minimum is $\sum_{j=1}^{q} \nu_{n-j}$, where $\nu_1 \ge \cdots \ge \nu_{n-1}$ are the ch. roots of $R$. These extrema are attained if $H$ consists of the $q$ corresponding orthonormalized ch. vecs of $R$, and then $x^{(j)}$ is (any non-zero multiple of) $Jh^{(j)}$.

The ch. vecs of $R$ with roots $\nu \ne 0$ are of the form $h = J^T z$, where $z$ is a ch. vec of $\tilde R := JJ^T Q$ with root $\nu$. There is an extraneous ch. vec of $\tilde R$ with root $0$ which is useless to us. If $n$ is odd then $Q$ has a ch. vec $r_{(n+1)/2}$ with root $\lambda = 0$; this provides an additional ch. vec of $\tilde R$. The equations $\tilde R z = \nu z$ may be written as
$$(Q - \nu I)z = c\mathbf{1}, \qquad c = \mathbf{1}^T Q z / n. \tag{A.1}$$
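The extremal-trace step invoked from Srivastava and Khatri is the standard result that, for symmetric $R$, $\operatorname{trace} RHH^T$ over orthonormal $H_{m \times q}$ is maximized (minimized) at the sum of the $q$ largest (smallest) roots, attained at the corresponding eigenvectors. A quick numerical illustration (the matrix here is arbitrary, purely for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, q = 8, 3
A = rng.standard_normal((m, m))
R = (A + A.T) / 2                       # an arbitrary symmetric matrix

w, V = np.linalg.eigh(R)                # eigenvalues in ascending order
H_max = V[:, -q:]                       # eigenvectors of the q largest roots
H_min = V[:, :q]                        # eigenvectors of the q smallest roots

t_max = np.trace(R @ H_max @ H_max.T)
t_min = np.trace(R @ H_min @ H_min.T)
assert np.isclose(t_max, w[-q:].sum())  # attains sum of q largest roots
assert np.isclose(t_min, w[:q].sum())   # attains sum of q smallest roots

# Any other orthonormal H gives a trace value in between.
H, _ = np.linalg.qr(rng.standard_normal((m, q)))
t = np.trace(R @ H @ H.T)
assert t_min - 1e-9 <= t <= t_max + 1e-9
```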

Pre-multiplying by $\mathbf{1}^T$ gives $\nu\,\mathbf{1}^T z = 0$, so that if $\nu \ne 0$ we have $\mathbf{1}^T z = 0$ and hence $z = JJ^T z = Jh$. Thus the set $\mathcal{X}$ of unordered vectors $x^{(j)}$ consists of the ch. vecs $z$ of $\tilde R$ corresponding to non-zero ch. roots, plus possibly a vector arising from $r_{(n+1)/2}$.

Case (i): If $c = 0$ in (A.1) then $z$ is a ch. vec of $Q$. By Lemma A1 there are $[n/2]$ such vectors which are orthogonal to $\mathbf{1}$. These include $r_{(n+1)/2}$ iff $n$ is odd and $(n+1)/2$ is even. Thus this case contributes the vectors $r_{2j}$, $j = 1, \dots, [n/2]$, to $\mathcal{X}$, with corresponding roots $\lambda_{2j}$.

Case (ii): Let $c \ne 0$, $\nu \ne 0$, and assume that $\nu$ is not a ch. root of $Q$. Then the first equation in (A.1) gives $z = c(Q - \nu I)^{-1}\mathbf{1}$, and the second then yields
$$\mathbf{1}^T (Q - \nu I)^{-1}\mathbf{1} = 0. \tag{A.2}$$

Writing (A.2) as $\sum_{j=1}^{[(n+1)/2]} (\mathbf{1}^T r_{2j-1})^2/(\lambda_{2j-1} - \nu) = 0$ shows that there are $[(n-1)/2]$ solutions $\nu = \tau_j$ which, when ordered, satisfy $\lambda_{2j-1} > \tau_j > \lambda_{2j+1}$. The remaining elements of $\mathcal{X}$ are then (multiples of) the vectors $s_j = (Q - \tau_j I)^{-1}\mathbf{1}$.

Of the $n$ equations given by $(Q - \tau I)s = \mathbf{1}$, $n - 2$ are of the form
$$\left(v_{k+1} - \tfrac{1}{1-\tau}\right) = 2\tau\left(v_k - \tfrac{1}{1-\tau}\right) - \left(v_{k-1} - \tfrac{1}{1-\tau}\right), \qquad k = 2, \dots, n-1.$$
This recursion, when solved and combined with the remaining two equations, and with (A.2), yields
$$v_k = \left(1 - \frac{\cos\left(k - \frac{n+1}{2}\right)\theta}{\cos\frac{(n+1)\theta}{2}}\right)\Big/\,2\sin^2\frac{\theta}{2},$$
where $\theta = \cos^{-1}\tau$ satisfies (12). From this we calculate $\sum_{k=1}^{n} v_k^2 = n(n+1)(n+2)/(2\sin^2\theta)$, whence normalizing $s$ to have unit length gives (11). It then remains only to establish that the terms in (14) are in decreasing order. Since both $\lambda_{2j}$ and $\tau_j$ are in $(\lambda_{2j+1}, \lambda_{2j-1})$, we require $\lambda_{2j} > \tau_j$. This follows from the observation that the function on the left of (12) is strictly increasing where it is non-negative, and is negative at $\cos^{-1}\lambda_{2j}$.

Proof of Theorem 2: If $\xi \in \Xi_{n,q}$ has model matrix $Z = (\mathbf{1} \,\vdots\, X_0)$, then for $P_1$ we are to maximize $|Z^T Z| = n|X_0^T X_0|$, subject to $\sum_{j=1}^{q} Q(x^{(j)})$ being a minimum. But by Lemma A2 any $X_0$ whose columns minimize $\sum_{j=1}^{q} Q(x^{(j)})$ must be of the form $X_{(q,1)} D_1$ for a diagonal matrix $D_1$, and then $|X_0^T X_0| = |D_1|^2$ is maximized by maximizing $|D_1|$. The proof for $P_2$ is entirely analogous.
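Case (ii) of the Lemma A2 proof locates the roots of (A.2) between successive odd-indexed roots of $Q$. This can be checked numerically. The sketch below again assumes the tridiagonal form of $Q$ with $1/2$ on the off-diagonals (an assumption, since (10) is not reproduced in this excerpt), and finds each root $\tau_j$ of $\mathbf{1}^T(Q - \tau I)^{-1}\mathbf{1} = 0$ by bisection.

```python
import numpy as np

n = 7
# Assumed form of Q (see the Lemma A1 sketch): tridiagonal, 1/2 off-diagonal.
Q = 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))
lam = np.cos(np.arange(1, n + 1) * np.pi / (n + 1))   # lambda_1 > ... > lambda_n
one = np.ones(n)

def f(t):
    # Left side of (A.2): 1^T (Q - tau I)^{-1} 1.
    return one @ np.linalg.solve(Q - t * np.eye(n), one)

taus = []
for j in range(1, (n - 1) // 2 + 1):                  # roots tau_j, j = 1..[(n-1)/2]
    # Bracket: (lambda_{2j+1}, lambda_{2j-1}), shrunk away from the poles.
    lo, hi = lam[2 * j] + 1e-9, lam[2 * j - 2] - 1e-9
    for _ in range(200):                              # plain bisection
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    taus.append((lo + hi) / 2)

# Each s_j = (Q - tau_j I)^{-1} 1 satisfies 1^T s_j ~ 0, i.e. (A.2).
for t in taus:
    s = np.linalg.solve(Q - t * np.eye(n), one)
    assert abs(one @ s) < 1e-5
# Interlacing: lambda_{2j-1} > tau_j > lambda_{2j+1}.
for j, t in enumerate(taus, start=1):
    assert lam[2 * j - 2] > t > lam[2 * j]
```

There are exactly $[(n-1)/2]$ such roots, as claimed, since the left side of (A.2) is increasing between its poles at the odd-indexed $\lambda$'s.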

Proof of Theorem 3: A convexity argument allows a reduction to designs with spherically symmetric densities. The maximization of $b_{f,\xi}^T B^{-1}(0) b_{f,\xi}$ over $\mathcal{F}$ can be carried out as at Theorem 1 of Wiens (1992); the methods of Section 3 of that paper then show that the $B$-robust design is the solution to the problem of maximizing $E_H[U^2]$ subject to a bound on $E_H[g(U)]$, where $h(u) = (qu^{q-1}/r^q)g(u)$ is the density, and $H$ the distribution function, under $\xi$, of $U := \|x\|$. To solve this problem we first fix $\zeta = E_H[U^2]/q$. With the aid of Lagrange multipliers we find that $g$ is of the form $g(u) = \alpha(u^2 + \beta)^+$, $\alpha > 0$, $u \in [0, r]$, where $\alpha(\zeta)$ and $\beta(\zeta)$ are chosen to make $h$ a proper density, and to attain the required bound on $E_H[g(U)]$. Then $E_H[U^2]$ is maximized over $\zeta$, yielding $H_0(u)$. The final statements of the Theorem follow from the observation that, by virtue of spherical symmetry, $T = x/\|x\|$ is uniformly distributed over the surface of the unit sphere, independently of $U$.
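The closing observation, that for a spherically symmetric density the direction $T = x/\|x\|$ is uniform on the unit sphere and independent of the radius $U = \|x\|$, is easy to check by simulation; the standard Gaussian sample below is purely illustrative of the spherically symmetric case.

```python
import numpy as np

rng = np.random.default_rng(0)
q, N = 3, 200_000
x = rng.standard_normal((N, q))         # a spherically symmetric sample
U = np.linalg.norm(x, axis=1)           # radius
T = x / U[:, None]                      # direction on the unit sphere

# Uniformity on the sphere: each coordinate of T has mean 0,
# and E[T_i^2] = 1/q by symmetry.
assert np.allclose(T.mean(axis=0), 0.0, atol=0.01)
assert np.allclose((T**2).mean(axis=0), 1.0 / q, atol=0.01)

# Independence of U and T: sample correlations are near zero.
for i in range(q):
    r = np.corrcoef(U, T[:, i])[0, 1]
    assert abs(r) < 0.01
```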

Proof of Theorem 4: Under the stated conditions, in each case we have $CVS(\xi, P) = \sup_{\rho(1)} 2\rho(1)\left[(n-1)/n + \sum_{j=1}^{q} Q(x^{(j)})\right]$, which satisfies the stated bound.

REFERENCES

Bickel, P.J. and Herzberg, A.M. (1979), "Robustness of Design Against Autocorrelation in Time I: Asymptotic Theory, Optimality for Location and Linear Regression," Annals of Statistics, 7, 77-95.

Bischoff, W. (1992), "On Exact D-optimal Designs for Regression Models With Correlated Observations," Annals of the Institute of Statistical Mathematics, 44, 229-238.

Bischoff, W. (1993), "On D-optimal Designs for Linear Models Under Correlated Observations With an Application to a Linear Model with Multiple Response," Journal of Statistical Planning and Inference, 37, 69-80.

Box, G.E.P. and Draper, N.R. (1959), "A Basis for the Selection of a Response Surface Design," Journal of the American Statistical Association, 54, 622-654.

Constantine, G.M. (1989), "Robust Designs for Serially Correlated Observations," Biometrika, 76, 245-251.

Field, C.A. and Wiens, D.P. (1994), "One-Step M-estimators in the Linear Model, With Dependent Errors," Canadian Journal of Statistics, 22, 219-231.

Grenander, U. and Szegő, G. (1958), Toeplitz Forms and Their Applications, Berkeley: University of California Press.

Hampel, F.R., Ronchetti, E., Rousseeuw, P.J. and Stahel, W. (1986), Robust Statistics: The Approach Based on Influence Functions, Toronto: John Wiley.

Herzberg, A.M. and Huda, S. (1981), "A Comparison of Equally Spaced Designs with Different Correlation Structures in One and More Dimensions," Canadian Journal of Statistics, 9, 203-208.

Hössjer, O. (1991), "The Change-of-Variance Function for Dependent Data," Probability Theory and Related Fields, 90, 447-467.

Huber, P.J. (1975), "Robustness and Designs," in A Survey of Statistical Design and Linear Models, ed. J.N. Srivastava, Amsterdam: North Holland, pp. 287-303.

Jenkins, G.M. and Chanmugam, J. (1962), "The Estimation of Slope When the Errors are Autocorrelated," Journal of the Royal Statistical Society, Ser. B, 24, 199-214.

Judickii, M.I. (1976), "On Designing Experiments to Estimate the Parameters in Regression Models with Correlated Errors in the Observations," Theory of Probability and Mathematical Statistics, 11, 173-180.

Kiefer, J. and Wynn, H.P. (1984), "Optimum and Minimax Exact Treatment Designs for One-dimensional Autoregressive Error Processes," Annals of Statistics, 12, 414-450.

Li, K.C. and Notz, W. (1982), "Robust Designs for Nearly Linear Regression," Journal of Statistical Planning and Inference, 6, 135-151.

Liu, S.X. (1994), "Some Topics in Robust Estimation and Experimental Design," unpublished Ph.D. dissertation, University of Calgary, Department of Mathematics and Statistics.

Liu, S.X. and Wiens, D.P. (1997), "Robust Designs for Approximately Polynomial Regression," Journal of Statistical Planning and Inference, to appear.

Pesotchinsky, L. (1982), "Optimal Robust Designs: Linear Regression in $R^k$," Annals of Statistics, 10, 511-525.

Pukelsheim, F. (1993), Optimal Design of Experiments, Toronto: John Wiley.

Sacks, J. and Ylvisaker, D. (1966), "Designs for Regression Problems with Correlated Errors," Annals of Mathematical Statistics, 37, 66-89.

Sacks, J. and Ylvisaker, D. (1968), "Designs for Regression Problems with Correlated Errors; Many Parameters," Annals of Mathematical Statistics, 39, 49-69.

Sacks, J. and Ylvisaker, D. (1970), "Designs for Regression Problems with Correlated Errors, III," Annals of Mathematical Statistics, 41, 2057-2074.

Samarov, A.M. (1987), "Robust Spectral Regression," Annals of Statistics, 15, 99-111.

Srivastava, M.S. and Khatri, C.G. (1979), An Introduction to Multivariate Statistics, New York: North Holland.

Tang, D. (1993), "Minimax Regression Designs Under Uniform Departure Models," Annals of Statistics, 21, 434-446.

Werther, J., Hartge, E.-U. and Rensner, D. (1990), "Messtechniken für Gas/Feststoff-Wirbelschichtreaktoren," Chemie-Ingenieur-Technik, 62, 605-613.

Wiens, D.P. (1991), "Designs for Approximately Linear Regression: Two Optimality Properties of Uniform Designs," Statistics and Probability Letters, 12, 217-221.

Wiens, D.P. (1992), "Minimax Designs for Approximately Linear Regression," Journal of Statistical Planning and Inference, 31, 353-371.


Wiens, D.P. and Zhou, J. (1995), "Minimax Designs for Approximately Linear Models With AR(1) Errors," Technical Report 95.01, University of Alberta, Statistics Centre, Department of Mathematical Sciences.

Wiens, D.P. and Zhou, J. (1996), "Minimax Regression Designs for Approximately Linear Models With Autocorrelated Errors," Journal of Statistical Planning and Inference, 55, 95-106.

