Optimal Spatial Network Design for Temporal Trend Estimation

J. Andrew Royle

Geophysical Statistics Project National Center for Atmospheric Research

Abstract

The location of spatial sampling points so as to minimize a mean squared error criterion is a common concern in the design of monitoring networks. When the objective is temporal trend estimation, one could seek to minimize the variance of the generalized least-squares estimate of trend. This paper describes a general class of spatio-temporal models possessing separable mean and covariance and shows that under this class of models, certain spatio-temporal design problems (including design for estimating temporal trend) can be formulated as purely spatial design problems. Thus, solutions to these spatio-temporal problems are solutions to the simpler spatial problems. For the temporal trend problem under separability, the design which is optimal for temporal trend estimation is also that which is optimal for estimating the spatial mean, the latter being a more computationally tractable problem. This intuitive result arises because the temporal structure of the separable model factors out of the trend mean squared error criterion, and designs which minimize mean squared error criteria are invariant to scaling of the criterion.

J. A. Royle is Visiting Scientist in the Geophysical Statistics Project at the National Center for Atmospheric Research. This research was supported by the National Science Foundation grant DMS93-12686.


Keywords: spatial design, network design, spatial sampling, spatio-temporal modeling, spatial design for trend estimation.

1 Introduction

A practical problem in spatial statistics is that of optimizing the location of sampling points, such as in the construction of air-quality monitoring networks. The problem may be approached by specifying a covariance-based criterion and optimizing with respect to the sample locations. Possible choices of criteria include the average prediction variance over the domain of interest, or the prediction variance of the average over a region. A possible goal in the design or modification of air quality monitoring networks is trend estimation; therefore, one might consider designs which minimize the variance of the estimated trend. For example, this is of interest when an objective is to assess the impact of a regulation on air quality. Oehlert (1995, 1996) considers the problem of spatial design for temporal trend estimation. Oehlert's work focused on a particular model for wet deposition over a large geographic area and sought to minimize the sum of the variances of trend estimates over several smaller subregions.

This paper provides a more general solution to the problem of network design for temporal trend detection for the class of spatio-temporal models with separable mean and covariance. For this class of separable models, it is shown that designs which are optimal for temporal trend estimation are also optimal for estimating attributes of the spatial mean. Thus, the problem of constructing designs for temporal trend estimation has a purely spatial formulation. The design problem can be greatly simplified in this manner. Intuitively, this result arises from the fact that the optimum of a mean squared error criterion is invariant to a scaling of the covariance function. For processes with separable mean and covariance, the mean squared error of the estimated trend is just a scaled version of the mean squared error of the estimated mean, where the scale factor depends only on the temporal structure. The result is more general and applies to temporal regression parameters or linear functions of them.

We assume that the design problem is to find the optimal set of $p$ locations, $D = \{x_1, x_2, \ldots, x_p\}$ (the "design"), from a discrete candidate set $C = \{x_j : j = 1, 2, \ldots, N\}$. This assumption is not necessary, but is made, in part, for convenience. One can optimize the specified criterion over a discrete set of points more easily and perhaps more efficiently than one can perform the continuous space optimization. The point-swapping (or exchange) algorithm which we will use is natural for this discrete problem. It is also natural to formulate network thinning and augmentation problems in a discrete manner, as it is seldom that all possible (e.g. random) locations are considered or available for sampling. For problems that are not inherently discrete, one need only produce a fine grid of candidate points to define the problem.

It is simple and straightforward to define the spatial design problem for estimating a linear functional of the random field, or some estimate of a parameter of the field. We review these spatial design problems in Section 2. A general class of spatio-temporal models possessing separable mean and covariance is defined in Section 3. Results are given that show that the design problem for trend estimation does not depend on the temporal structure of the process, and this leads to the purely spatial formulation of the problem. In Section 4 some examples of separable models are given. In Section 5 a generic exchange algorithm for optimizing design criteria is described. An example is given in Section 6 involving an ozone monitoring network in Chicago.

2 Mean Squared Error-Optimal Design Problems

In this section, purely spatial design problems that seek to minimize mean squared error criteria are reviewed. Designs from this broad class of problems will be called mean squared error-optimal, or simply "MSE-optimal". In general, MSE-optimal design problems can be solved in a straightforward manner. See Cox et al. (1995) for a review of some spatial sampling issues including those related to the following discussion.

Let $Y(x) : x \in R^2$ be a random field with $E(Y(x)) = \mu$, $\mathrm{Var}(Y(x)) = \sigma^2$ and $\mathrm{Cov}(Y(x), Y(x')) = \sigma^2 k(x, x')$. Let $x_j : 1 \le j \le N$ be the full set of candidate points and let $f = WY$ be a linear functional of the field, for some $m \times N$ matrix $W$. Denote the $p$ sample locations (i.e. the design) as $D = \{x_1, x_2, \ldots, x_p\}$. Assuming that $\mu$ is unknown, the best linear unbiased estimate of $f$ given $y = \{Y(x_1), Y(x_2), \ldots, Y(x_p)\}$ (a subset of $Y$ at $p$ points) is $\hat{f} = \lambda' y$ where $\lambda' = k'K^{-1} + (1 - k'K^{-1}\mathbf{1})(\mathbf{1}'K^{-1}\mathbf{1})^{-1}(\mathbf{1}'K^{-1})$, $k = \mathrm{Cov}(y, f)$ and $K = \mathrm{Cov}(y, y)$. A reasonable criterion upon which to base selection of the $p$ sample locations is the mean squared error

$$Q(D; f) = E(\hat{f} - f)^2 = WKW' - k'K^{-1}k + (1 - k'K^{-1}\mathbf{1})(\mathbf{1}'K^{-1}\mathbf{1})^{-1}(1 - k'K^{-1}\mathbf{1})' \tag{1}$$

which is the prediction variance of $\hat{f}$. If one chooses $f$ to be the mean of $Y$ on the $N$ points, then $W_{1 \times N} = \{1/N, 1/N, \ldots, 1/N\}$ and Equation 1 is the prediction variance of the average (PVA). For $W_{1 \times N} = \{0, 0, \ldots, 0, 1, 0, \ldots, 0\}$ (i.e. the linear functional is the value at a particular location) this is simply the variance of the so-called "ordinary kriging" predictor. One could consider point estimation at many locations such as the whole field at $N$ points or a subset thereof. In this case $Q(D; f)$ is a matrix and one could consider minimizing such things as $\mathrm{Trace}(Q(D; f))$, which is the average prediction variance (APV), $\max_i Q(D; f)_{ii}$, which is the maximum prediction variance, or $|Q(D; f)|$ (the determinant of $Q$). These are, respectively, A-optimal, G-optimal and D-optimal designs of classical experimental design.

Of more interest here is the following problem. Suppose that optimal estimation of spatial regression parameters, $\beta$, is of interest. The criterion is then

$$Q(D; \beta) = E(\hat{\beta} - \beta)^2 = \sigma^2 (X'K^{-1}X)^{-1} \tag{2}$$

for some matrix of regression functions $X$. One could minimize interesting functions of this as before, such as the trace of $Q(D; \beta)$, the maximum of the diagonal elements, or the determinant. For a process with common mean, this reduces to:

$$Q(D; \mu) = E(\hat{\mu} - \mu)^2 = \sigma^2 (\mathbf{1}'K^{-1}\mathbf{1})^{-1} \tag{3}$$

The optimal design for any generic criterion will be denoted as:

$$D_{opt} = \min_D Q(D).$$

A unique optimum for $Q(D)$ may not exist. Since two designs with the same criterion value are equivalent by definition of the design problem, no attempt will be made to distinguish between optimal designs for a given criterion. The criteria given by Equations (1), (2) and (3) depend on $D$ through the covariance objects $k$ and $K$, and possibly $X$ in the case of (2). For convenience, this dependence has been suppressed and will remain so.

It is clear that designs which minimize the MSE criteria given above are invariant to a scaling of the covariance function. That is, for any constant, $\alpha$, that does not depend on $D$, any design which is MSE-optimal under $k(x, x')$ is also MSE-optimal for the criteria under $\alpha k(x, x')$, since $Q(D)$ is then simply $\alpha Q(D)$. Thus, MSE-optimal designs are invariant to scaling of the criteria. We will make use of this fact by showing that under a certain class of trend models the constant involves the temporal correlation structure but does not depend on the design, $D$. Hence, the design problem for the spatio-temporal model has a purely spatial formulation, and its solution is the solution to a purely spatial design problem.

It is clear that stationarity of the covariances is not generally necessary. This is particularly useful for problems of optimal "thinning" of an existing network, when data exist from which to estimate the (nonstationary) covariances between all sites and one only wishes to thin the network in a way that minimizes the information lost concerning the existing network. Nonetheless, stationarity is often assumed for pragmatic reasons.

Note that none of the mean squared error criteria above depend on the data, $y$. Hence, it is possible, at least in principle, to optimize these criteria over all possible designs, $D$. One can minimize design criteria (MSE and otherwise) in a variety of ways, for example by using "brute force" and sorting through all possible designs of size $p$. Nychka et al. (1997) take this approach, taking advantage of efficient regression subsetting algorithms for finding the optimal $p$ variables (locations). For this paper, a simple exchange algorithm is used, which has been found to perform adequately for a wide variety of problems.
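The scaling invariance and the "brute force" search described in this section can be illustrated with a small numerical sketch. It is not the author's code: the exponential correlation function, the eight random candidate sites, the scale factor 7.3 and the helper names `exp_corr`, `q_mean` and `brute_force` are all assumptions made for illustration.

```python
import itertools
import numpy as np

def exp_corr(points, range_par=1.0):
    """Assumed correlation model: k(x, x') = exp(-||x - x'|| / range_par)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.exp(-np.sqrt((diff ** 2).sum(axis=-1)) / range_par)

def q_mean(K_full, design, scale=1.0):
    """Equation 3 with covariance scale * k: (1' (scale K)^{-1} 1)^{-1}."""
    idx = list(design)
    K = scale * K_full[np.ix_(idx, idx)]
    ones = np.ones(len(idx))
    return 1.0 / (ones @ np.linalg.solve(K, ones))

def brute_force(K_full, p, scale=1.0):
    """Evaluate every design of size p and return the minimizer."""
    N = K_full.shape[0]
    return min(itertools.combinations(range(N), p),
               key=lambda d: q_mean(K_full, d, scale))

rng = np.random.default_rng(0)
candidates = rng.uniform(0.0, 1.0, size=(8, 2))  # hypothetical candidate set, N = 8
K_full = exp_corr(candidates, range_par=0.5)

best = brute_force(K_full, p=3)
best_scaled = brute_force(K_full, p=3, scale=7.3)  # same search under a scaled covariance
```

Under these assumptions the two searches should select the same design, because the scaled criterion is 7.3 times the unscaled one for every design.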

3 Design for Estimating Temporal Trend

To solve the design problem for estimating temporal trend, we first need to specify an appropriate spatio-temporal model. Assume that the spatio-temporal process is of the form

$$Y(x, t) = f(x, t) + \epsilon(x, t) \tag{4}$$

where $\epsilon$ is the dependent random process and $f$ describes the spatio-temporal mean. Further suppose that $f(x, t)$ is linear. For a matrix of observations from this process, $Y_{n \times p}$, we have that $E(\mathrm{vec}(Y)) = M\beta$ and $\mathrm{Var}(\mathrm{vec}(Y)) = V$ where $M$ is the $np \times m$ matrix of regression functions, $\beta$ is a length $m$ vector and $V$ is $np \times np$. Thus, the design problem involves minimizing some quantity involving

$$Q(D) = (M'V^{-1}M)^{-1} \tag{5}$$

which is operationally equivalent to minimizing the criterion given by Equation 2. Specification of the problem in terms of minimizing Equation 5 does not provide an entirely satisfactory solution for two reasons. First, we have the intuition that temporal trend is essentially a change in spatial mean and thus designs which are optimal for estimating the spatial mean should also be optimal for estimating temporal trend. Second, for very large spatio-temporal problems, optimization of Equation 5 over all designs can pose a computational burden. In the following section, we use this intuition in an attempt to simplify the spatio-temporal design problem by restricting ourselves to a class of spatio-temporal models under which solutions of simpler spatial design problems are also solutions to the spatio-temporal problem.

For the remainder of this paper, $Y_{n \times p}$ will denote a sample from $Y(x, t)$ sampled at $t = 1, 2, \ldots, n$ and $D = (x_1, x_2, \ldots, x_p)$. Also, let the spatial and temporal covariance functions be denoted as $\mathrm{Cov}(Y(x, t), Y(x', t)) = \sigma^2 k(x, x')$ and $\mathrm{Cov}(Y(x, t), Y(x, t')) = \sigma^2 c(t, t')$, respectively.

3.1 Separable Mean and Covariance Models

To simplify the spatio-temporal design problem, we will consider models with separable mean and covariance. Special cases will be considered in Section 4. The assumption of separable covariance is commonly made in practice. This assumption is that the spatio-temporal covariance matrix can be factored into the product of its spatial and temporal components:

Definition 1 Separable covariance: Let $Y_{\cdot i}$ and $Y_{j \cdot}$ be a column and row of $Y_{n \times p}$, respectively. Let $\mathrm{Var}(Y_{\cdot i}) = \sigma^2 C$ for all $i = 1, 2, \ldots, p$ and $\mathrm{Var}(Y_{j \cdot}) = \sigma^2 K$ for all $j = 1, 2, \ldots, n$. Then, the variance-covariance matrix of $\mathrm{vec}(Y)$ is separable if:

$$\mathrm{Var}(\mathrm{vec}(Y)) = \sigma^2 K_{p \times p} \otimes C_{n \times n}.$$

Implicit here is that the temporal covariance structure does not vary across space and the spatial covariance structure does not vary through time. Separability also implies that $\mathrm{Cov}(Y(x, t), Y(x', t')) = \sigma^2 k(x, x') c(t, t')$.

The assumption of a separable mean is the analog to the assumption of separable covariance. Let $F_{n \times q} = [f_1, f_2, \ldots, f_q]$ be a matrix of $q$ temporal regression functions common to all sites (i.e. $F$ does not depend on spatial location). Note that for any $x_o$, $f(x_o, t) = F\beta_f(x_o)$ where $\beta_f(x_o)$ is a $q \times 1$ vector of "temporal mean" parameters. Let $X_{p \times r}$ be a matrix of $r$ spatial regression functions common to all times (i.e. $X$ does not depend on time). Note that for any $t_o$, $f(x, t_o) = X\beta_x(t_o)$ where $\beta_x(t_o)$ is an $r \times 1$ vector of "spatial mean" parameters. Define $\beta$ to be a $q \times r$ matrix of regression parameters. Then, $\mathrm{vec}(\beta)$ is a $qr \times 1$ vector of parameters that corresponds to the design matrix $M = X \otimes F$. Note that $\mathrm{vec}(\beta) \ne \beta_x \otimes \beta_f$, although it is tempting to make this equality (in general, $\beta_x$ and $\beta_f$ depend on $t$ and $x$, respectively, whereas $\beta$ does not). We now make the following definition:

Definition 2 Separable mean: A process $Y(x, t) = f(x, t) + \epsilon(x, t)$ where $E(Y(x, t)) = f(x, t)$ has a separable mean if

$$E[\mathrm{vec}(Y_{n \times p})] = M\,\mathrm{vec}(\beta) = (X \otimes F)\,\mathrm{vec}(\beta),$$

where $X$ is a $p \times r$ matrix of spatial regression functions and $F$ is an $n \times q$ matrix of temporal regression functions.
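The two separability definitions can be checked on a toy example. The sketch below assumes the mean of the $n \times p$ data matrix is $F\beta X'$ and that vec stacks columns, so each block of length $n$ is the time series at one site; the dimensions, random regressors, the particular $K$ and $C$, and the helper name `vec` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q, r = 5, 4, 2, 3            # times, sites, temporal and spatial regressors
F = rng.normal(size=(n, q))        # temporal regression functions (n x q)
X = rng.normal(size=(p, r))        # spatial regression functions (p x r)
beta = rng.normal(size=(q, r))     # q x r parameter matrix

def vec(A):
    """Stack the columns of A (column-major order)."""
    return A.reshape(-1, order="F")

# Separable mean: vec(F beta X') equals (X kron F) vec(beta).
mean_matrix = F @ beta @ X.T       # n x p mean of Y
M = np.kron(X, F)                  # np x qr design matrix
lhs = vec(mean_matrix)
rhs = M @ vec(beta)

# Separable covariance: entries of K kron C are products k(x, x') c(t, t').
K = np.exp(-np.abs(np.subtract.outer(np.arange(p), np.arange(p))))  # assumed spatial correlation
C = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))    # assumed temporal correlation
V = np.kron(K, C)
i, i2, j, j2 = 1, 3, 0, 4          # sites i, i2 and times j, j2
entry = V[i * n + j, i2 * n + j2]  # covariance of Y[j, i] and Y[j2, i2], up to sigma^2
```

With column stacking, observation (time `j`, site `i`) sits at position `i * n + j` of vec(Y), which is why the spatial factor comes first in the Kronecker product.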

This general situation might be described as one in which the mean is a linear model with regression variables (e.g. trend) that vary over space. Thus, each row of $\beta$ is a set of $r$ parameters describing the spatial variability of each temporal mean parameter. Some examples of separable models are given in Section 4. Generalized least-squares (GLS) parameter estimates of $\beta$ from processes with separable mean and covariance have an interesting property which we now state:

Theorem 3.1 Separability of Generalized Least Squares Estimates: Let $Y(x, t)$ be a spatio-temporal process with separable mean and covariance. For $Y_{n \times p} = (Y_1, Y_2, \ldots, Y_p)$ let $X$ and $K$ describe the spatial mean and covariance (for all $t$) and let $F$ and $C$ describe the temporal mean and covariance (for all $x$). The generalized least squares (GLS) estimate of $\beta$, and its variance, may be factored as:

$$\hat{\beta} = g_1(K, X)\, h_1(C, F, Y)$$

and

$$\mathrm{Var}(\mathrm{vec}(\hat{\beta})) = g_2(K, X) \otimes h_2(C, F)$$

where $g_1(K, X) = (X'K^{-1}X)^{-1}X'K^{-1}$, $h_1(C, F, Y) = ((F'C^{-1}F)^{-1}F'C^{-1}Y)'$, $g_2(K, X) = (X'K^{-1}X)^{-1}$ and $h_2(C, F) = (F'C^{-1}F)^{-1}$.
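The factorization stated in Theorem 3.1 can be checked numerically. The following sketch compares GLS computed directly on the stacked $np$-vector with the factored form (the $q \times r$ arrangement of Equation 6). The random positive-definite $K$ and $C$, the random regressors and data, and the helper names `random_spd` and `vec` are assumptions for illustration only, and $\sigma^2$ is set to 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q, r = 6, 5, 2, 2

def random_spd(m):
    """Hypothetical helper: a random symmetric positive-definite matrix."""
    A = rng.normal(size=(m, m))
    return A @ A.T + m * np.eye(m)

def vec(A):
    """Stack the columns of A."""
    return A.reshape(-1, order="F")

K, C = random_spd(p), random_spd(n)
X, F = rng.normal(size=(p, r)), rng.normal(size=(n, q))
Y = rng.normal(size=(n, p))

# Direct GLS on vec(Y) with design X kron F and covariance K kron C.
M, V = np.kron(X, F), np.kron(K, C)
Vinv_M = np.linalg.solve(V, M)
var_direct = np.linalg.inv(M.T @ Vinv_M)
beta_direct = var_direct @ (Vinv_M.T @ vec(Y))

# Factored form: spatial and temporal pieces computed separately.
g2 = np.linalg.inv(X.T @ np.linalg.solve(K, X))    # (X'K^{-1}X)^{-1}
h2 = np.linalg.inv(F.T @ np.linalg.solve(C, F))    # (F'C^{-1}F)^{-1}
g1 = g2 @ np.linalg.solve(K, X).T                  # (X'K^{-1}X)^{-1} X'K^{-1}, r x p
temporal_fit = h2 @ np.linalg.solve(C, F).T @ Y    # per-site temporal GLS, q x p
beta_factored = temporal_fit @ g1.T                # q x r estimate
var_factored = np.kron(g2, h2)
```

The factored route never forms or inverts the $np \times np$ matrix, which is the computational advantage of separability noted in the text.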


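Proof: We have $\mathrm{Var}(\mathrm{vec}(Y)) = K \otimes C$ and $E(\mathrm{vec}(Y)) = (X \otimes F)\,\mathrm{vec}(\beta)$ by assumption. The GLS estimate of $\beta$ is:

$$\mathrm{vec}(\hat{\beta}) = ((X \otimes F)'(K \otimes C)^{-1}(X \otimes F))^{-1}(X \otimes F)'(K \otimes C)^{-1}\,\mathrm{vec}(Y).$$

Properties of the Kronecker product imply that:

$$\mathrm{vec}(\hat{\beta}) = [(X'K^{-1}X)^{-1}X'K^{-1} \otimes (F'C^{-1}F)^{-1}F'C^{-1}]\,\mathrm{vec}(Y).$$

By the relationship between the vec operator and the Kronecker product (see Lemma 2.2.2 of Muirhead, 1982, for example):

$$\hat{\beta} = [(F'C^{-1}F)^{-1}F'C^{-1}Y]\,[(X'K^{-1}X)^{-1}X'K^{-1}]'. \tag{6}$$

This verifies the form of $g_1$ and $h_1$ of the Theorem. Now it is clear that

$$\mathrm{Var}(\mathrm{vec}(\hat{\beta})) = [(X \otimes F)'(K \otimes C)^{-1}(X \otimes F)]^{-1}.$$

Properties of the Kronecker product imply that:

$$\mathrm{Var}(\mathrm{vec}(\hat{\beta})) = (X'K^{-1}X)^{-1} \otimes (F'C^{-1}F)^{-1} \tag{7}$$

which verifies the form of $g_2$ and $h_2$ of the Theorem.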

□

The astute reader will have noticed that $h_1$ of Theorem 3.1 is simply (the transpose of) the $q \times p$ matrix of GLS estimates of $\beta_f(x_i) : i = 1, 2, \ldots, p$. Furthermore, $g_1(K, X)Y'$ is the $r \times n$ matrix of GLS estimates of $\beta_x(t) : t = 1, 2, \ldots, n$. Therefore, Theorem 3.1 implies that one can compute the GLS estimates of $\beta$ by first computing the GLS estimates of the temporal parameters at each site, and then computing the GLS estimate of $\beta$ from these. The converse also holds. That is, the GLS estimate of $\beta$ may be computed by first reducing to the GLS estimates of the spatial parameters and averaging these with weights $(F'C^{-1}F)^{-1}F'C^{-1}$. As far as the present work is concerned, Theorem 3.1 is useful as it allows one to remove the dependence of the MSE of $\hat{\beta}$ on the temporal structure ($C$ and $F$) by factoring these from the MSE criterion. Theorem 3.1 allows us to formulate the following theorem on spatial design for spatio-temporal processes, a particular application of which is optimal design for temporal trend estimation:

Theorem 3.2 Optimal spatial designs for trend estimation: Let $Y_{n \times p}$ be an observation from a spatio-temporal process, $Y(x, t)$, with separable mean and covariance as defined in Theorem 3.1. Let $\hat{\beta}$ be the GLS estimate of $\beta$ with elements $\beta_{ij} : i = 1, 2, \ldots, q;\ j = 1, 2, \ldots, r$ and $\mathrm{Var}(\mathrm{vec}(\hat{\beta})) = V_\beta$. Then

(i) The MSE criterion for minimizing $Q_o(D; \beta_{ij}) = \mathrm{Var}(\hat{\beta}_{ij})$ is of the form $\alpha_i [(X'K^{-1}X)^{-1}]_{jj}$, where $\alpha_i = [(F'C^{-1}F)^{-1}]_{ii}$.

(ii) The MSE criterion for minimizing $Q_1(D; \beta) = \mathrm{Trace}(V_\beta)$ is of the form $\alpha\,\mathrm{Trace}[(X'K^{-1}X)^{-1}]$, where $\alpha = \mathrm{Trace}[(F'C^{-1}F)^{-1}]$.

(iii) The MSE criterion for minimizing $Q_2(D; \beta) = \det(V_\beta)$ is of the form $\alpha \det[(X'K^{-1}X)^{-1}]^q$, where $\alpha = \det[(F'C^{-1}F)^{-1}]^r$.

Consequently, designs $D$ which minimize $Q_o(D; \beta_{ij})$, $Q_1(D; \beta)$ or $Q_2(D; \beta)$ do not depend on the temporal mean or covariance structure, $F$ and $C$.

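Proof: Let $F$, $X$, $Y$ and $\beta_{qr \times 1}$ be as defined in Definition 3.2. The variance-covariance matrix of $\hat{\beta}$ (the GLS estimate) is, from Theorem 3.1:

$$V_\beta = \mathrm{Var}(\mathrm{vec}(\hat{\beta})) = (X'K^{-1}X)^{-1} \otimes (F'C^{-1}F)^{-1}.$$

If we consider the transpose of $Y$, say $Y^* = Y'$ (of dimension $p \times n$), then

$$V_{\beta^*} = \mathrm{Var}(\mathrm{vec}(\hat{\beta}^*)) = (F'C^{-1}F)^{-1} \otimes (X'K^{-1}X)^{-1}$$

where $\beta^*$ has elements which are just rearranged elements of $\beta$. Thus, diagonal elements of $\mathrm{Var}(\mathrm{vec}(\hat{\beta}))$ are of the form:

$$\mathrm{Var}(\hat{\beta}_{ij}) = [(F'C^{-1}F)^{-1}]_{ii}\,[(X'K^{-1}X)^{-1}]_{jj}.$$

By invariance of MSE-optimal designs,

$$\begin{aligned}
\min_D Q_o(D; \beta_{ij}) &= \min_D \mathrm{Var}(\hat{\beta}_{ij}) \\
&= \min_D\, [(F'C^{-1}F)^{-1}]_{ii}\,[(X'K^{-1}X)^{-1}]_{jj} \\
&= [(F'C^{-1}F)^{-1}]_{ii}\, \min_D\, [(X'K^{-1}X)^{-1}]_{jj} \\
&\propto \min_D\, [(X'K^{-1}X)^{-1}]_{jj}.
\end{aligned} \tag{8}$$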

This establishes (i) of the theorem. By properties of the Kronecker product and invariance we have

$$\begin{aligned}
\min_D Q_1(D; \beta) &= \min_D \mathrm{Trace}(V_\beta) \\
&= \min_D \mathrm{Trace}[(F'C^{-1}F)^{-1} \otimes (X'K^{-1}X)^{-1}] \\
&= \min_D \mathrm{Trace}[(F'C^{-1}F)^{-1}] \times \mathrm{Trace}[(X'K^{-1}X)^{-1}] \\
&= \mathrm{Trace}[(F'C^{-1}F)^{-1}]\, \min_D \mathrm{Trace}[(X'K^{-1}X)^{-1}] \\
&\propto \min_D \mathrm{Trace}[(X'K^{-1}X)^{-1}].
\end{aligned} \tag{9}$$

This establishes result (ii) of the theorem. To establish (iii), we note that in general $|A_{r \times r} \otimes B_{q \times q}| = |A|^q |B|^r$. The result follows directly.

□

Theorem 3.2 simply states that designs which minimize various interesting mean squared error design criteria depend only on the spatial mean and correlation structure. In fact, it is clear that these designs are the solution to the purely spatial design problems specified by optimizing the analogous functions of Equation 2.
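A small brute-force experiment can illustrate this independence from the temporal structure. The sketch below assumes seven random candidate sites, an exponential spatial correlation, a spatial mean with intercept and one coordinate, and two quite different AR(1)-type temporal structures; all of these choices and the helper names (`spatial_part`, `temporal_part`, `best_design`) are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
N, p = 7, 4
sites = rng.uniform(size=(N, 2))
dist = np.linalg.norm(sites[:, None] - sites[None, :], axis=-1)
K_full = np.exp(-dist / 0.4)                            # assumed spatial correlation
X_full = np.column_stack([np.ones(N), sites[:, 0]])     # assumed spatial regressors (r = 2)

def spatial_part(design):
    """(X'K^{-1}X)^{-1} for the sites in the design."""
    idx = list(design)
    K, X = K_full[np.ix_(idx, idx)], X_full[idx]
    return np.linalg.inv(X.T @ np.linalg.solve(K, X))

def temporal_part(n, rho):
    """(F'C^{-1}F)^{-1} for a linear trend with assumed correlation rho^|t - t'|."""
    t = np.arange(1, n + 1)
    C = rho ** np.abs(t[:, None] - t[None, :])
    F = np.column_stack([np.ones(n), t])
    return np.linalg.inv(F.T @ np.linalg.solve(C, F))

def best_design(criterion, h2):
    """Brute-force minimizer of criterion(Var(vec(beta_hat))) over all designs of size p."""
    designs = itertools.combinations(range(N), p)
    return min(designs, key=lambda d: criterion(np.kron(spatial_part(d), h2)))

h2_a = temporal_part(n=10, rho=0.2)
h2_b = temporal_part(n=25, rho=0.9)
purely_spatial = np.eye(1)          # a 1 x 1 temporal factor leaves only the spatial criterion

trace_designs = [best_design(np.trace, h) for h in (h2_a, h2_b, purely_spatial)]
det_designs = [best_design(np.linalg.det, h) for h in (h2_a, h2_b, purely_spatial)]
```

Under these assumptions, each list should contain the same design three times: changing the temporal structure rescales the criterion but does not move its minimizer.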


4 Examples of Separable Models

Special cases of separable models will now be considered. As used here, separability requires both separable mean and separable covariance. The separable covariance assumption is commonly made for several reasons: (1) It may be reasonable for "small scale" processes; (2) Even when not entirely reasonable, it is often made out of convenience as few tractable covariance models for spatio-temporal processes exist; (3) For very large problems (e.g. thousands of temporal observations at thousands of spatial locations) the algebraic simplifications that separability of the covariance implies greatly facilitate analyses. Indeed, for such large problems it would be difficult to even analyze data using covariance information for an arbitrary (i.e. nonseparable) spatio-temporal covariance. Thus, in many cases, the separable covariance assumption is more of a necessity or convenience than a reality.

On the other hand, the separable mean component of this assumption essentially restricts the class of models to linear models under which all temporal regression parameters depend on space in the same manner. This of course can be a severe restriction. However, it does make a great deal of sense in some situations, such as in the first two examples given below, which are ubiquitous in applied statistics. More complicated models, such as the third example given below, are more difficult to support based on practical experience. However, this seems a logical manner in which to build spatial variation into temporal regression parameters.

4.1 The "Common Trend" Model

Consider linear models of the form:

$$Y(x, t) = \beta_o + \sum_{i=1}^{q} f_i(t)\beta_i + \epsilon(x, t)$$

for arbitrary temporal regression functions, $f_i(t)$, and $\epsilon$ a space-time dependent process with separable covariance $K \otimes C$. This is a separable model since the mean component can be put into the notation of Section 3 with $F_{n \times q} = [1, f_1, f_2, \ldots, f_q]$ and $X_{p \times 1} = \mathbf{1}$. A particularly interesting case of this is the following "common trend" model:

$$Y(x, t) = \beta_o + \beta_1 t + \epsilon(x, t) \tag{10}$$

where $\beta_o$ and $\beta_1$ are intercept and trend terms common to all sites in the domain of interest. For fixed $t$, say $t_o$, the common trend model reduces to

$$Y(x, t_o) = \mu_{t_o} + \epsilon(x)$$

where $\mu_{t_o} = \beta_o + \beta_1 t_o$ and $\mathrm{Cov}(\epsilon(x), \epsilon(x')) = \sigma^2 k(x, x')$. An implication of separability is that the optimal design for estimation of $\mu_{t_o}$ will not change with $t$. It follows directly that the optimal design for estimation of $\mu_{t_o}$ minimizes Equation 3:

$$D_{opt} = \min_D\, (\mathbf{1}'K^{-1}\mathbf{1})^{-1}. \tag{11}$$

Theorem 3.1 indicates that this is also the optimal design for estimating $\beta_1$. Development of Theorems 3.1 and 3.2 was first motivated by solving the design problem for this common trend model on the difference process $Z(x) = Y(x, t) - Y(x, t + 1)$. It is straightforward to verify these theorems for this simple case.
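The difference-process check mentioned above can be written out as a short worked derivation. It is a sketch under the common trend model (10) with separable covariance and a symmetric temporal correlation function $c$; the constant $\tau$ is notation introduced here for illustration and does not appear in the original text.

```latex
\begin{aligned}
Z(x) &= Y(x,t) - Y(x,t+1) = -\beta_1 + \epsilon(x,t) - \epsilon(x,t+1), \\
\mathrm{Cov}(Z(x), Z(x')) &= \sigma^2 k(x,x')\,\bigl[c(t,t) + c(t+1,t+1) - 2\,c(t,t+1)\bigr]
  = \sigma^2 \tau\, k(x,x'), \\
\mathrm{Var}(\hat{\beta}_1) &= \sigma^2 \tau\, (\mathbf{1}'K^{-1}\mathbf{1})^{-1}.
\end{aligned}
```

Since $\tau$ depends only on the temporal correlation and not on $D$, the design minimizing the variance of the trend estimate from the difference process is the one minimizing $(\mathbf{1}'K^{-1}\mathbf{1})^{-1}$, in agreement with Equation 11.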

4.2 Analysis of Seasonal Anomalies

It is common in the atmospheric sciences (and other applications) to "deseasonalize" data, that is, to analyze residuals from a seasonal model fit to data at every spatial location. Implied here is the separable mean assumption with the regression model

$$Y(x, t) = \beta_o(x) + \sum_{i=1}^{q} f_i(t)\beta_i(x) + \epsilon(x, t)$$

assumed to hold for all locations $x$, and where the $f_i(t)$ are regression variables corresponding to monthly means or the first few harmonics. Note that the regression parameters are indexed by $x$, indicating that they are allowed to vary by site. In contrast, under the previous model the regression parameters were common to all sites. It is clear that the mean is separable with $X = [\mathbf{1}\mathbf{1}']_{p \times p}$ and $F = [1, f_1, \ldots, f_q]$. If we assume the separable covariance structure on $\epsilon(x, t)$ then this is a very highly-parameterized separable model. It makes little sense under this model to construct spatial designs which are optimal for estimation of one of the $\beta_i(x)$ (unless we are working in a Bayesian context). Indeed, the criterion for optimal estimation of trend (or any of the $\beta_i(x)$) is proportional to:

$$Q(D) = (X'K^{-1}X)^{-1}.$$

Note that $X$ is of rank 1 and the information contained in this $Q(D)$ is equivalent to:

$$Q(D) = (\mathbf{1}'K^{-1}\mathbf{1})^{-1}$$

which would be the criterion for any and all $\beta_i(x)$. So we are led to the conclusion that designs which optimize trend estimation under the highly-parameterized "anomaly" model are the same as those which optimize trend estimation under the "common trend" model.

4.3 Spatial Variation in the Trend

The "common trend" model assumes that the temporal trend is common to all sites whereas the "anomaly" model assumes different parameters for each site. In between these, one could reasonably posit that the temporal mean parameters vary slowly over space. One such model might hypothesize that the intercept and trend terms vary linearly over space. Letting $x = (x_1, x_2)$ then

$$Y(x, t) = a_o + a_1 x_1 + a_2 x_2 + a_3 x_1 x_2 + b_o t + b_1 x_1 t + b_2 x_2 t + b_3 x_1 x_2 t + \epsilon(x, t).$$

For fixed $t$, say $t_o$, this model specifies a linear drift in the mean:

$$Y(x) = \gamma_o + \gamma_1 x_1 + \gamma_2 x_2 + \gamma_3 x_1 x_2 + e(x)$$

where $\gamma_o = (a_o + b_o t_o)$, $\gamma_1 = (a_1 + b_1 t_o)$, $\gamma_2 = (a_2 + b_2 t_o)$ and $\gamma_3 = (a_3 + b_3 t_o)$. Similarly, for fixed $x$, say $x_o = (x_{1o}, x_{2o})$, this model specifies a linear trend in the mean

$$Y(t) = \delta_o + \delta_1 t + e(t)$$

where $\delta_o = (a_o + a_1 x_{1o} + a_2 x_{2o} + a_3 x_{1o} x_{2o})$ and $\delta_1 = (b_o + b_1 x_{1o} + b_2 x_{2o} + b_3 x_{1o} x_{2o})$. Clearly, this model satisfies the separable mean assumption of Definition 3.1 with $X = [1, x_1, x_2, x_1 x_2]_{p \times 4}$, $F = [1, t]_{n \times 2}$,

$$\beta = \begin{pmatrix} a_o & a_1 & a_2 & a_3 \\ b_o & b_1 & b_2 & b_3 \end{pmatrix},$$

and $\mathrm{vec}(\beta)' = (a_o, b_o, a_1, b_1, a_2, b_2, a_3, b_3)$. Under the separable covariance assumption, results such as Theorem 3.1 and Theorem 3.2 apply here. Of course one could extend this model to include more general $X$ and $F$. For example, $F$ could include seasonal terms, sunspot numbers, the stock market, etc., and $X$ could include polynomials in space, distance from pollution sinks and sources, and orographic effects. Under this model, one may wish to minimize such things as $\mathrm{Trace}(Q(D; \beta))$ or perhaps $\mathrm{Var}(\hat{b}_o) + \mathrm{Var}(\hat{b}_1) + \mathrm{Var}(\hat{b}_2) + \mathrm{Var}(\hat{b}_3)$ (the sum of the variances of the "trend" parameters). The reader can verify that these two problems are equivalent by Theorem 3.2.
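The equivalence of the two criteria can be seen numerically in a small sketch. The site locations, the exponential spatial correlation and the AR(1)-type temporal correlation below are assumptions chosen for illustration, and $\sigma^2$ is set to 1.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 6, 12
sites = rng.uniform(size=(p, 2))                      # hypothetical design of p sites
x1, x2 = sites[:, 0], sites[:, 1]
t = np.arange(1, n + 1, dtype=float)

X = np.column_stack([np.ones(p), x1, x2, x1 * x2])    # p x 4 spatial regressors
F = np.column_stack([np.ones(n), t])                  # n x 2 temporal regressors

dist = np.linalg.norm(sites[:, None] - sites[None, :], axis=-1)
K = np.exp(-dist / 0.5)                               # assumed spatial correlation
C = 0.6 ** np.abs(t[:, None] - t[None, :])            # assumed temporal correlation

g2 = np.linalg.inv(X.T @ np.linalg.solve(K, X))       # (X'K^{-1}X)^{-1}, 4 x 4
h2 = np.linalg.inv(F.T @ np.linalg.solve(C, F))       # (F'C^{-1}F)^{-1}, 2 x 2
V = np.kron(g2, h2)                                   # Var(vec(beta_hat))

# vec(beta)' = (a_o, b_o, a_1, b_1, ...), so the trend parameters b_j sit at odd positions.
trend_var_sum = np.diag(V)[1::2].sum()
total_trace = np.trace(V)
```

Both quantities equal a temporal constant times $\mathrm{Trace}[(X'K^{-1}X)^{-1}]$: the constant is `h2[1, 1]` for the sum of trend variances and `np.trace(h2)` for the full trace, so they are minimized by the same design.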

5 Minimizing the Mean Squared Error

In this section, a simple algorithm is described for finding the $p$ points which minimize some criterion, $Q(D)$. This algorithm begins by taking a starting configuration of $p$ points and then successively replacing each design point with the candidate point which produces the smallest mean squared error. Thus, $N - p$ evaluations of the mean squared error are necessary to perform a single updating of the design set.

The use of point-swapping (or exchange) algorithms is not new. The most commonly cited algorithm is Federov's algorithm (Federov, 1972; see also Cook and Nachtsheim, 1980). Two other early references are Kennard and Stone (1969) and Mitchell (1974). See Marengo and Todeschini (1992) and Tobias (1995) for application to distance-based criteria.

For $D = \{x_1, x_2, \ldots, x_p\}$ the set of $p$ points in the design and $C = \{y_1, y_2, \ldots, y_N\}$ the candidate set, the algorithm used here may be summarized as follows:

(1) Select a starting design and compute $Q(D)$.
(2) For each design point $x_k : k = 1, 2, \ldots, p$:
(3) Replace $x_k$ by each candidate $y_i : i = 1, 2, \ldots, N$ in turn and compute the $N$ criterion values.
(4) Swap $x_k$ for the $y_i$ which produces the largest decrease over the initial criterion.
(5) Repeat 2-4 until no swap can be made.

As is often the case for complex design problems, upon convergence, this algorithm may not produce the optimal design. Optimizing with several starting designs helps alleviate this problem. Several devices can greatly decrease the run time without affecting the optimal design, for example a nearest-neighbor search where, for each $x_k$, only its $M$ nearest neighbors are considered for swapping (see Royle and Nychka, submitted).
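The steps above can be sketched in code as follows. This is a minimal illustration rather than the implementation used in the paper: it uses the common-mean criterion of Equation 3 with $\sigma^2 = 1$, an assumed exponential correlation on random candidate sites, and hypothetical function names `q_mean` and `exchange`. Candidates already in the design are skipped, so each update costs $N - p$ criterion evaluations.

```python
import numpy as np

def q_mean(K_full, design):
    """Equation 3 with sigma^2 = 1: (1'K^{-1}1)^{-1} for the sites in the design."""
    idx = list(design)
    K = K_full[np.ix_(idx, idx)]
    ones = np.ones(len(idx))
    return 1.0 / (ones @ np.linalg.solve(K, ones))

def exchange(K_full, start, criterion=q_mean, max_sweeps=100, tol=1e-12):
    """Point-swapping search: returns a locally optimal design and its criterion value."""
    design = list(start)
    N = K_full.shape[0]
    best = criterion(K_full, design)
    for _ in range(max_sweeps):
        improved = False
        for k in range(len(design)):            # step (2): each design point in turn
            best_swap = None
            for cand in range(N):               # step (3): try every unused candidate
                if cand in design:
                    continue
                trial = design.copy()
                trial[k] = cand
                value = criterion(K_full, trial)
                if value < best - tol:          # remember the largest decrease so far
                    best, best_swap = value, cand
            if best_swap is not None:           # step (4): make the best swap
                design[k] = best_swap
                improved = True
        if not improved:                        # step (5): stop when no swap helps
            break
    return sorted(design), best

rng = np.random.default_rng(5)
sites = rng.uniform(size=(20, 2))               # hypothetical candidate set, N = 20
dist = np.linalg.norm(sites[:, None] - sites[None, :], axis=-1)
K_full = np.exp(-dist / 0.3)                    # assumed spatial correlation

start = [0, 1, 2, 3, 4]
design, value = exchange(K_full, start)
```

Because the result is only a local optimum, it is sensible to rerun `exchange` from several random starting designs and keep the best value found.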

6 An Air Quality Monitoring Network Example

We illustrate the design problem for an existing network of ozone monitoring sites in the Chicago area. The data set used here consists of daily maximum ozone for the months April-October (the "ozone season") and years 1981-1991 (214 observations per year, 2354 daily observations). The network has been heavily modified throughout the years and data have been collected from at least 43 different sites over this 11 year period. The locations of the 43 sites which have data in this period are shown in Figure 1. In the last year of this particular data set (1991), 24 stations collected data. The temporal distribution and number of non-missing data for each of the 43 sites is shown in Figure 2. Define these 43 locations as the candidate set of all possible monitoring locations. Our objective will be to select the 24 locations which are optimal for estimating temporal trend from among these 43 and compare the optimal 24 with the 24 that were collecting data in 1991. See Nychka et al. (1997) for another analysis of design problems for this network. Bloomfield et al. (1996) address the issue of trend estimation for these data.

Let $Y(x, t)$ be the logarithm of maximum daily ozone at site $x$ on day $t$. We will assume that the 43 sites have a common bimodal seasonality and trend so that

$$Y(x, t) = \beta_o + \beta_1 \cos\left(\frac{2\pi t}{365}\right) + \theta_1 \sin\left(\frac{2\pi t}{365}\right) + \beta_2 \cos\left(\frac{4\pi t}{365}\right) + \theta_2 \sin\left(\frac{4\pi t}{365}\right) + \beta_3\, yr_t \tag{12}$$

where $yr_t$ is the integer portion of (2354/214) plus a fractional remainder of the day-of-year divided by 365. Bloomfield et al. (1996) analyzed these data and suggested this basic model. Results of Bloomfield et al. (1996) indicate that it is critical to account for dependence of ozone on meteorological conditions when measuring trend. Important variables are maximum daily temperature (maxt), relative humidity (rh), wind speed (wspd), the daily average of the u and v components of wind direction (mu and mv) and opaque cloud cover (op). Since Bloomfield et al. (1996) incorporated these variables into the basic seasonal + trend model for the raw data in a multiplicative fashion, this suggests that a linear model on the transformed scale is adequate. As one possible linear model, we could add linear terms in these variables to (12):

$$\beta_4 \times \mathrm{maxt} + \beta_5 \times \mathrm{rh} + \ldots + \beta_9 \times \mathrm{op} \tag{13}$$

A "good" spatial design might estimate all of the critical temporal parameters, $\beta_3, \beta_4, \ldots, \beta_9$, well and so we might consider the criterion:

$$Q_a(D) = \sum_{i=3}^{9} \mathrm{Var}(\hat{\beta}_i).$$

Alternatively, we might consider only the trend parameter:

$$Q_b(D) = \mathrm{Var}(\hat{\beta}_3).$$


Figure 1: Locations of 43 ozone monitoring sites in the Chicago region where observations were made between 1981 and 1991.


0

10

20

site

30

40

................... ......................

82

84

86

88

90

333 503 1493 746 1342 2191 177 259 1833 335 2200 1444 2283 2267 322 2280 2261 2290 1574 628 211 2308 213 1425 1807 359 363 1605 1620 1945 513 95 67 935 1967 936 898 159 1322 1010 244 390 764

92

year

Figure 2: Temporal distribution of data collected at 43 ozone monitoring sites in the Chicago region for the 11 year period 1981-1991. Right-hand column of numbers is the number of observations for each site.


We will assume separable mean and covariance for Chicago ozone. Visual comparisons of spatial covariance functions and temporal autocorrelation functions suggested that this was not unreasonable for the second-moment component of separability. The assumption on the mean component is largely a subjective assertion, and it holds under the specification given by Equations 12 and 13. Under separability, Theorem 3.2 established that the criteria $Q_a(D)$ and $Q_b(D)$ are equivalent in terms of the optimal design. In fact, the optimal design for both is the one that is optimal for estimating the spatial mean, and so our criterion is

$$Q_c(D) = (\mathbf{1}' K^{-1} \mathbf{1})^{-1} \qquad (14)$$

where $K$ is the spatial covariance matrix. Thus, we seek the design which minimizes $Q_c(D)$. Here, $K$ was taken to be the stationary exponential estimate based on the data.

The variance of the GLS trend estimate will be the variance of the GLS mean estimate scaled by a constant which does not depend on $D$. Here the constant is the sum of (selected) diagonal elements of $(F' C^{-1} F)^{-1}$, where $F$ is the matrix of regression functions (meteorological variables, trend, sinusoids) and $C$ is the temporal variance-covariance matrix. Depending on the structure and size of $C$ (here, $2354 \times 2354$), this constant may be difficult to compute. In any case, Theorem 3.2 indicates that one need not even consider $C$ in constructing the design. Therefore, the design problem in this example is greatly simplified.

Minimization of Equation 14 was done using the algorithm of Section 5. Because this particular problem is small, the optimal design was computed for each of 20 random starting configurations. Remarkably, all 20 random starts produced the same optimal design ($Q_c(D) = 289.0154$). Based on this, we can have some confidence that the true optimum was found. The average value of $Q_c(D)$ for the 20 random starting configurations was 295.8595, and so the improvement of the optimal design over random designs is only about 2%.
The value of $Q_c(D)$ for the existing (as of 1991) network of 24 sites is 302.286, and so the improvement of the optimal design over the existing network is considerably larger (about 4%). It is apparent that the existing network does not even do as well as random networks! The moderate improvement in criterion value makes sense given the very strong spatial correlation between sites and the high density of sites. Indeed, the optimal value of $Q_c(D)$ is plotted against network size for $p = 2, \ldots, 43$ in Figure 3, and we see that the variance of the GLS estimate of the mean changes only trivially with network size until $p$ falls to about 4 or 5.
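The criterion in Equation 14 and the use of random starting configurations can be illustrated with a short sketch. This is not the algorithm of Section 5: it assumes a plain point-exchange search, an exponential covariance with hypothetical parameters `sigma2` and `range_`, and synthetic candidate coordinates rather than the Chicago sites. All function names are illustrative.

```python
import numpy as np

def exp_cov(coords, sigma2=1.0, range_=1.0):
    """Stationary exponential covariance matrix (hypothetical parameters)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / range_)

def q_c(K, design):
    """Q_c(D) = (1' K_D^{-1} 1)^{-1}, with K_D the covariance of the sites in D."""
    idx = np.asarray(design)
    K_D = K[np.ix_(idx, idx)]
    ones = np.ones(len(idx))
    return 1.0 / (ones @ np.linalg.solve(K_D, ones))

def exchange_search(K, p, rng, max_sweeps=50):
    """Point-exchange search (an assumed stand-in, not the paper's algorithm)."""
    n = K.shape[0]
    design = [int(s) for s in rng.choice(n, size=p, replace=False)]
    best = q_c(K, design)
    for _ in range(max_sweeps):
        improved = False
        for i in range(p):
            for cand in range(n):
                if cand in design:
                    continue
                trial = design.copy()
                trial[i] = cand          # swap one design site for a candidate
                val = q_c(K, trial)
                if val < best - 1e-12:   # accept only strict improvements
                    design, best, improved = trial, val, True
        if not improved:                 # no single swap helps: stop
            break
    return sorted(design), best

# Synthetic example: 15 candidate sites, designs of size 5, 5 random starts.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(15, 2))
K = exp_cov(coords, sigma2=1.0, range_=5.0)
results = [exchange_search(K, p=5, rng=rng) for _ in range(5)]
best_design, best_value = min(results, key=lambda r: r[1])
print(best_design, best_value)
```

Comparing the designs returned by the different starts gives a rough check, in the spirit of the 20 random starts described above, on whether the search is settling on the same configuration.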

[Figure 3 plot omitted. Horizontal axis: design size (0 to 40); vertical axis: variance of GLS mean (290 to 340). An arrow labeled "existing network" marks the value for the existing network.]

Figure 3: MSE of GLS mean for designs of various sizes. Circles are the values for the optimal design and the solid line indicates the value for the best of 10 random designs.

7 Conclusions and Discussion

For a class of spatio-temporal models under which the mean and covariance are separable, it has been shown that spatial designs which are optimal for estimating linear functions of temporal regression parameters (including trend) depend only on the spatial mean and covariance structure. For a simple "common trend" model, the design for estimating temporal trend is identical to that which is optimal for estimation of the mean parameter at any time period. As a result, constructing designs for estimating temporal trend can be made more efficient, since computing the mean squared error of the mean parameter is cheaper than computing that of the trend. That is, under the separable model which assumes that $V = K \otimes C$, one need only minimize functions of $(X' K^{-1} X)^{-1}$, where $K$ is $p \times p$, rather than $(M' V^{-1} M)^{-1}$, where $V$ is the full $pn \times pn$ spatio-temporal variance-covariance matrix.

Intuitively, this result is not surprising, since a temporal trend is simply a change in spatial mean over time, and thus it is important to estimate the spatial mean well in order to achieve good estimation of its temporal change! Indeed, it has been shown (Theorem 3.1) that the GLS parameter estimates have an exact representation as a spatial average of temporal GLS estimates under this class of separable models. The design result arises from the fact that designs based on mean squared error criteria are invariant to scaling of the criteria, and the separable model allows the mean squared error of parameter estimates to be factored into two components, one involving only the temporal structure and one involving only the spatial structure.

Optimal spatial designs for temporal trend estimation were constructed for the Chicago network, consisting of 24 existing sites as of 1991. Results indicate that, for estimating temporal trend and spatial mean, the existing 24 site network is not even as good as random designs selected from the 43 potential sampling points. Moreover, if one's objective is "thinning" of the existing network, then the optimal 5 site network will increase the MSE of the trend estimate by only about 1% relative to the 24 site 1991 network.

Although separable covariance models are a vast simplification and probably unreasonable for large-scale processes, separable models are widely used in practice and may be justified for spatial scales on the order of urban air quality monitoring networks. That is, for small spatial regions, it may be reasonable to expect that the temporal structure is similar among sites. Also, it may be more reasonable to expect the spatial covariance structure to be similar over short time scales. For processes believed to be nonseparable, one could approach the design problem by considering subregions, as was done by Haas (1992) and Oehlert (1995, 1996). Even these approaches assume separability in some form, but on smaller regional scales. The assumption of separable mean used here has received little attention in spatio-temporal modeling applications, although it seems like a natural assumption and is perhaps more reasonable in many situations than the separable covariance assumption.

Clearly, the results presented herein do not hold for nonseparable processes. Intuitively, regions with stronger temporal correlation should be sampled less. Also, if the spatial correlation structure changes over time, there will not be a single design which is optimal for estimating the trend at all times.
When the design objective is to construct a static network of monitoring sites, and given a lack of "future" data on which to base information about temporal heterogeneity, one has little choice but to assume that the spatial covariance structure remains constant. A general approach to the temporal trend design problem for arbitrary spatio-temporal covariance structure seems difficult to formulate due to the absence of tractable statistical models in the literature. Complicated problems will likely have to be solved on a case-by-case basis under models developed for specific situations, as was done in Haas (1992) and Oehlert (1995, 1996).
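The factorization under separability that underlies these conclusions can be checked numerically. The sketch below is an illustration under stated assumptions: it takes the ordering $V = K \otimes C$ and $M = X \otimes F$ (the Kronecker ordering is assumed here), uses small synthetic matrices in place of estimated covariances, and verifies that $(M' V^{-1} M)^{-1} = (X' K^{-1} X)^{-1} \otimes (F' C^{-1} F)^{-1}$, so that only the spatial factor depends on the design.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 4, 6  # hypothetical numbers of sites and time points

def random_spd(m):
    """Random symmetric positive-definite matrix (synthetic covariance stand-in)."""
    A = rng.normal(size=(m, m))
    return A @ A.T + m * np.eye(m)

K = random_spd(p)                                 # spatial covariance, p x p
C = random_spd(n)                                 # temporal covariance, n x n
X = np.ones((p, 1))                               # spatial regressors: common mean
F = np.column_stack([np.ones(n), np.arange(n)])   # temporal regressors: intercept, trend

V = np.kron(K, C)                                 # separable covariance, pn x pn
M = np.kron(X, F)                                 # separable mean structure

# Full spatio-temporal GLS covariance of the parameter estimates.
full = np.linalg.inv(M.T @ np.linalg.solve(V, M))

# Spatial factor (depends on the design) and temporal factor (does not).
spatial = np.linalg.inv(X.T @ np.linalg.solve(K, X))
temporal = np.linalg.inv(F.T @ np.linalg.solve(C, F))
factored = np.kron(spatial, temporal)

print(np.allclose(full, factored))
```

In this sketch the variance of the trend estimate is `full[1, 1]`, which equals `spatial[0, 0] * temporal[1, 1]`; a design that minimizes the spatial factor therefore minimizes the trend variance whatever the value of the temporal factor.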

References

Benedetti, R. and Palma, D. (1995). Optimal sampling designs for dependent spatial units. Environmetrics 6, 101-114.

Bloomfield, P., Royle, J.A., Steinberg, L.J. and Yang, Q. (1996). Accounting for meteorological effects in measuring urban ozone levels and trends. Atmospheric Environment 30.

Cook, R.D. and Nachtsheim, C.J. (1980). A comparison of algorithms for constructing exact D-optimal designs. Technometrics 22, 315-324.

Cox, D.D., Cox, L.H. and Ensor, K.B. (1995). Spatial Sampling and the Environment. Technical Report 18, National Institute of Statistical Sciences, RTP, NC.

Fedorov, V.V. (1972). Theory of Optimal Experiments, translated and edited by W.J. Studden and E.M. Klimko. Academic Press, New York.

Haas, T.C. (1992). Redesigning continental-scale monitoring networks. Atmospheric Environment 26A, 3323-3333.

Kennard, R.W. and Stone, L.A. (1969). Computer aided design of experiments. Technometrics 11, 137-148.

Marengo, E. and Todeschini, R. (1992). A new algorithm for optimal, distance-based experimental designs. Chemometrics and Intelligent Laboratory Systems 16, 37-44.

Mitchell, T.J. (1974). An algorithm for the construction of "D-optimal" experimental designs. Technometrics 16, 203-210.

Nychka, D., Yang, Q. and Royle, J.A. (1997). Constructing spatial designs using regression subset selection. Statistics for the Environment 3: Pollution Assessment and Control, V. Barnett and K.F. Turkman (eds.). Wiley, New York (to appear).

Oehlert, G.W. (1995). The ability of wet deposition networks to detect temporal trends. Environmetrics 6, 327-339.

Oehlert, G.W. (1996). Shrinking a wet deposition network. Atmospheric Environment 30, 1347-1357.

Royle, J.A. and Nychka, D. An algorithm for the construction of space-filling coverage designs with implementation in SPLUS. Submitted to Computers and Geosciences.

Tobias, R. (1995). SAS QC Software. Volume 1: Usage and Reference. SAS Institute, Inc., Cary, NC.
