A Hierarchical Approach to Multivariate Spatial Modeling and Prediction

J. Andrew Royle (National Center for Atmospheric Research) and L. Mark Berliner (Ohio State University & National Institute of Statistical Sciences)

Abstract

We propose a hierarchical model for multivariate spatial modeling and prediction under which one specifies a joint distribution for a multivariate spatial process indirectly through specification of simpler conditional models. This approach is similar to standard methods known as cokriging and "kriging with an external drift", but avoids some of the inherent difficulties in these two approaches, including specification of valid joint covariance models and the restriction to exhaustively sampled covariates. Moreover, both existing approaches can be formulated in this hierarchical framework. The hierarchical approach is ideally suited for, but not restricted to, situations in which known "cause/effect" relationships exist. Because the hierarchical approach models dependence between variables in conditional means, as opposed to cross-covariances, very complicated relationships are more easily parameterized. We suggest an iterative estimation procedure which combines generalized least squares with imputation of missing values using the Best Linear Unbiased Predictor. An example is given which involves prediction of a daily ozone summary from maximum daily temperature in the Midwest.

Keywords: Cokriging, EM algorithm, generalized least squares estimation, geostatistics, kriging with external drift, missing data, tropospheric ozone.

1 Introduction

Ozone and Temperature: Regional prediction of tropospheric ozone levels is an important problem in environmental monitoring. Modelers seek to understand the response of ozone levels to meteorological variables in seeking quality prediction of ozone behavior. Indeed, it is well accepted that ozone is physically dependent on temperature. See the National Research Council (1991) report for further discussion of the importance of ozone modeling and the role of temperature. Motivated by these interests, we present a predictive analysis of ozone based on temperature data in a region of the Midwest of the United States. The data include (i) ozone measurements at 147 sites and (ii) daily maximum temperature measurements at 40 sites in the same region. Interest is in prediction of the ozone field over the region at a fixed set of grid points. The data locations and the prediction points are shown in Figure 1. Since ozone and maximum temperature are believed to be highly correlated, the temperature observations should assist in producing better predictions of the ozone field. However, the ozone and temperature monitoring sites are not co-located, nor are there temperature data at the ozone prediction sites. These problems, combined with complex spatial interrelationships among and between the ozone and temperature values, create severe challenges to implementing standard spatial statistical procedures for these data.

Overview: Multivariate spatial prediction problems such as the ozone/temperature problem are ubiquitous in environmental and related sciences. For univariate problems the most common prediction method is kriging. Major extensions of the kriging methodology to multivariate problems are: (1) cokriging, which is based on a model for the joint distribution of the variables or, at a minimum, a model for the joint first and second moments; and (2) kriging with an external drift (KED), which is based on a model for the conditional distribution (or its moments) of one or more primary variables given the remaining covariable(s). A review of multivariate spatial statistical techniques can be found in Wackernagel (1995); the classical treatment of both univariate and multivariate geostatistics can be found in Journel and Huijbregts (1978).

The purpose of this paper is to pursue formulation of spatial prediction problems in a hierarchical modeling format. Modeling then involves specifications of conditional and marginal distribution models that, taken together, imply a valid joint model. Limitations and difficulties that arise in using cokriging and KED motivate this work. First, standard cokriging requires the typically difficult specification of a joint covariance function (or variogram) for all variables. The hierarchical approach seeks to explain aspects of complex covariance structures across variables by modeling conditional means. This not only frees the modeler of the task of modeling and estimating covariance relationships for a very large number of random quantities, it allows the modeler to actively use understanding, such as beliefs about causal relationships, to reduce the complexity of spatial covariance modeling. While this may seem to be KED, note that in KED, the spatial structures of covariables are not modeled. Hence, the KED methodology requires that the covariable(s) be observed everywhere that predictions of the primary variables are desired, as well as at all locations where the primary variables are observed (Ahmed and De Marsily, 1987; Gotway and Hartford, 1996).

Methods discussed here for the implementation of the hierarchical strategy involve a combination of (i) best linear unbiased prediction (BLUP) ideas and (ii) statistical estimation methods in the presence of missing data. The first issue is very common in spatial statistics (Cressie, 1991); the second will permit KED-like analyses with greater generality than is currently available in the literature. The perspective agrees with cokriging in principle, i.e., we produce BLUPs based on an estimated spatial covariance structure. The production of that model, however, is more in tune with the ideas of KED. Section 2 is an overview of these modeling strategies for the case of two variables (i.e., bivariate). In Section 3 we present a general formulation of the hierarchical model, followed in Section 4 by a general estimation scheme based loosely on the EM algorithm. In Section 5 we briefly discuss two important generalizations. We revisit the ozone/temperature example in Section 6 and state conclusions in Section 7.

2 Multivariate Spatial Prediction

2.1 Cokriging and Kriging with External Drift

Cokriging: Myers (1982) and Ver Hoef and Cressie (1993) provide reviews of classical cokriging. As in the case of kriging, the goal of the cokriging procedure is optimal linear unbiased prediction. The approach begins with specification of means and joint covariances for all variables. One seeks the solution to a constrained minimization problem: minimize the mean-squared prediction error within the class of procedures that are linear in the observed variables and subject to an unbiasedness property of the predictor. The result is the Best Linear Unbiased Predictor (BLUP). Crucial steps in practical cokriging are posing a model for either the joint covariance or variogram of the multivariate process and estimating the model parameters. A common approach is to estimate empirical covariance and cross-covariance functions of the processes separately. However, it may be difficult to ensure validity (i.e., positive definiteness) of the resulting joint covariance matrices; see Myers (1982), Cressie (1991, p. 141), and Ver Hoef and Barry (1997) for further discussion. Operational approaches which place restrictions on cross-covariance models to produce valid joint covariance matrices are based primarily on mathematics rather than one's understanding of the structure of the processes being modeled. The common linear model of coregionalization is one such approach, under which one models both marginal and cross-covariance structures as linear sums of basic "elemental" models (Journel and Huijbregts, 1978, p. 326; Wackernagel, 1988, 1989). See also Ver Hoef and Barry (1997) for another approach and Myers (1982) for necessary conditions relating marginal and cross-covariance structures in order that the joint structure be valid.

KED: In KED and other regression-like approaches, one or more primary variables are singled out to be predicted, while the others are treated as fixed covariables (or auxiliary variables). This notion has been considered by several authors, including Ahmed and De Marsily (1987), Brown, Le, and Zidek (1994), Bourennane et al. (1996), Gotway and Hartford (1996), and references contained within these. The KED approach essentially reduces the prediction problem to a "universal cokriging" problem. Since covariables are not modeled as spatial processes, predictions of primary variables can be constructed only at sites at which the covariables were observed. For situations in which the covariables are not exhaustively observed, use of interpolated, predicted, or otherwise "filled-in" missing values has been suggested (Ahmed and De Marsily, 1987; Gotway and Hartford, 1996; Haas, 1996). However, these approaches are somewhat ad hoc and in many cases result in underestimation of prediction variances since, intuitively, the variance of the predicted covariates is not taken into account.

2.2 The Hierarchical View

We develop this approach in a probability modeling context and, in doing so, formalize the interrelationships among the various approaches. Following Ver Hoef and Cressie (1993), let $\{Z(s) : s \in D\}$ represent a multivariate spatial random field, where $Z(s) \in \mathbb{R}^N$ and $D \subset \mathbb{R}^d$. That is, $N$ variables are to be analyzed. Assume that $D$ is finite and let $Z_i$, $i = 1, \ldots, N$, represent the vector of values of the $i$th variable. Throughout this paper the bracket notation $[\,\cdot\,]$ will denote the probability distribution of the argument contained within the brackets. Although we do not rely on probability distribution assumptions for the model developed in Section 3 (and estimation in Section 4), this proves useful for the following general development. Let $[Z_1, \ldots, Z_N]$ represent the joint probability distribution of $(Z_1, \ldots, Z_N)$. Construction of this distribution, or at least its first two joint moments, is the key to cokriging. Alternatively, suppose that we actually only require predictions for the first $m$ of the variables, $Z_1, \ldots, Z_m$, and can use auxiliary information from the others, $Z_{m+1}, \ldots, Z_N$. Construction of a conditional model (or at least the first two conditional moments)

$$[Z_1, \ldots, Z_m \mid Z_{m+1}, \ldots, Z_N] \qquad (1)$$

is the cornerstone of probabilistic KED and other regression-like procedures. (We note that the auxiliary variables are often denoted by $X$'s; e.g., Gotway and Hartford, 1996.) The hierarchical approach is based on viewing a joint probability distribution as a product of conditionals. For example, if $N = 3$, the full joint distribution can be written as

$$[Z_1, Z_2, Z_3] = [Z_1 \mid Z_2, Z_3][Z_2, Z_3] \qquad (2)$$
$$= [Z_1 \mid Z_2, Z_3][Z_2 \mid Z_3][Z_3] \qquad (3)$$
$$= [Z_1, Z_2 \mid Z_3][Z_3]. \qquad (4)$$

We call this a "hierarchical" approach (i) due to its direct parallel with hierarchical modeling in the Bayesian literature (Bernardo and Smith, 1994), and (ii) to differentiate it from approaches which condition on auxiliary variables but do not model them spatially. This is not a probability paper; we introduce the above notion as a motivation for statistical modeling. This extra structure provides important understanding.

Expressions (2)-(4) suggest (at least) four potential modeling strategies. The analyst chooses which of the four to employ depending on the goals, data availability, and ease with which the components can be modeled and estimated. Assessing the left hand side of (2) is essentially cokriging. However, rather than specifying a full joint model, a model can be constructed by choosing to formulate the components in any one of the products in (2)-(4). Hence, (i) predictions for all variables can be developed, and (ii) the implied joint covariance matrices are ensured to be positive definite. Further, as with KED, the term $[Z_1 \mid Z_2, Z_3]$ in (2) could be used to perform prediction of $Z_1$ values, using the auxiliary variables $Z_2$ and $Z_3$. If all needed values of these variables are observed, the component $[Z_2, Z_3]$ plays no role. However, if only portions of the required auxiliary values are available, standard KED breaks down, whereas consideration of $[Z_2, Z_3]$ will permit our analysis. The intuition is familiar: we will use the model $[Z_2, Z_3]$ to "predict" unobserved values of the auxiliary variables, and then proceed with predictions of $Z_1$ values. To formalize this intuition, we employ statistical methods for the treatment of missing data. Also, note that the formulation in (4) can be viewed as a combination of cokriging and KED, in that predictions of $Z_1$ and $Z_2$ are based on a joint model (as in cokriging) that is conditioned on $Z_3$ (as in KED). See Royle et al. (1997) for an example of such a model.

2.3 Bivariate Illustration

The purpose of this section is to clarify interrelationships among approaches in the simplest of contexts.

Notation: Let $(Z_1(s_i), Z_2(s_i))$, $i = 0, 1, 2, \ldots, n$, be pairs of random variables defined at $n+1$ spatial locations. Let $Z_1$ and $Z_2$ denote the corresponding $(n+1)$-vectors. As shorthand for the statement "$Y$ is a random vector with mean $\mu$ and covariance matrix $\Sigma$," we write $Y \sim (\mu, \Sigma)$. Note that no formal distributional assumption is implied by this notation.

Cokriging: Assume that $Z_1$ and $Z_2$ are jointly distributed with means $E[Z_1] = \mu_1$ and $E[Z_2] = \mu_2$ and covariance structure as follows:

$$\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} \sim \left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{pmatrix} \right), \qquad (5)$$

where $\Sigma_{ii}$ is the marginal covariance matrix of $Z_i$, $i = 1, 2$, and $\Sigma_{12}$ is the cross-covariance matrix of $Z_1$ and $Z_2$. A prime symbol $'$ indicates matrix transposition. Here the $\mu_i$ are $(n+1) \times 1$ arbitrary vectors; we expand on their modeling in Sections 3 and 4. Suppose that the problem is to predict $(Z_1(s_0), Z_2(s_0))$ after observing $(Z_1(s_i), Z_2(s_i))$, $i = 1, 2, \ldots, n$.

(More general prediction contexts are considered in Section 4.) One can compute the BLUP for $(Z_1(s_0), Z_2(s_0))$ given the data by familiar methods (from (5)), thus producing the cokriging predictor. The formulae for the predictors are given in Myers (1982), Stein and Corsten (1991), and Ver Hoef and Cressie (1993). Implementation of this theory requires specification of the unknown parameters in (5). Usually, all of the required covariance sub-matrices (i.e., the marginal and cross-covariances) are parameterized in some fashion (e.g., by stationary covariance functions). Data-based estimates are obtained for the mean and covariance parameters. These results are then "plugged into" the prediction formula to obtain an estimated BLUP. We avoid the details of estimation for the sake of brevity; the reader is referred to Cressie (1991) and previously cited references for details.
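As a concrete illustration of the plug-in prediction step, the following minimal sketch computes a BLUP of unobserved values from the joint first and second moments, assuming those moments are known. The unbiasedness constraints and parameter estimation of full cokriging are deliberately omitted, so this shows only the "estimated moments plugged into the prediction formula" idea in its simplest form.

```python
# A minimal sketch (not the full constrained cokriging system): the BLUP of
# unobserved values given data, using known joint means and covariances as in (5).
import numpy as np

def blup(mu_pred, mu_obs, C_pred_pred, C_pred_obs, C_obs_obs, z_obs):
    """Plug-in BLUP of prediction targets and its error covariance.

    mu_pred, mu_obs : mean vectors of the targets and of the data
    C_pred_pred     : covariance matrix of the targets
    C_pred_obs      : covariance between targets and data
    C_obs_obs       : covariance matrix of the data
    z_obs           : observed data vector
    """
    w = np.linalg.solve(C_obs_obs, C_pred_obs.T)   # kriging weights
    pred = mu_pred + w.T @ (z_obs - mu_obs)        # predictor
    pred_cov = C_pred_pred - C_pred_obs @ w        # prediction error covariance
    return pred, pred_cov
```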

KED: The KED model, as most commonly applied, assumes

$$Z_1 \mid Z_2 \sim (\mu_1 + bZ_2, \Sigma_{1\cdot 2}). \qquad (6)$$

The matrix $\Sigma_{1\cdot 2}$ is the conditional covariance matrix of $Z_1$ given $Z_2$. The predictor of the unobserved value of $Z_1$ is then the usual BLUP for this univariate universal kriging problem (see Cressie, 1991, p. 151).

Hierarchical Formulations: The hierarchical approach is based on specification of the first two moments of the distributions

$$[Z_1 \mid Z_2] \quad \text{and} \quad [Z_2].$$

These specifications would typically be parameterized models such as polynomial drift mean models and stationary covariance functions. Consider the first stage of the model:

$$Z_1 \mid Z_2 \sim (\mu_1 + B(Z_2 - \mu_2), \Sigma_{1\cdot 2}), \qquad (7)$$

where $B$ is an $(n+1) \times (n+1)$ matrix of regression coefficients and $\Sigma_{1\cdot 2}$ is a positive definite conditional covariance matrix of $Z_1 \mid Z_2$. Next, assume that

$$Z_2 \sim (\mu_2, \Sigma_{22}), \qquad (8)$$

where $\Sigma_{22}$ is a positive definite covariance matrix. The implied means and joint covariances of $Z_1$ and $Z_2$ under this specification are:

$$\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} \sim \left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{1\cdot 2} + B\Sigma_{22}B' & B\Sigma_{22} \\ (B\Sigma_{22})' & \Sigma_{22} \end{pmatrix} \right). \qquad (9)$$

It is crucial to note that models (5) and (9) are equivalent, alternate parameterizations of each other. In comparison to cokriging, the hierarchical formulation hinges on specification and estimation of $\Sigma_{1\cdot 2}$ and $B$ rather than $\Sigma_{11}$ and $\Sigma_{12}$. Practically speaking, direct specification of joint moments as in cokriging implies the marginal covariance structure and the matrix $B$, whereas specification of a conditional and marginal model as above implies the cross-covariance structure. Hence, we are not claiming a generalization of cokriging, but rather that there are cases in which modeling and estimation are more readily accomplished using the hierarchical version. Further, parameterizing the dependence between $Z_1$ and $Z_2$ by a matrix $B$, rather than simply a scalar as in standard KED, permits flexible modeling. (See Section 3.) Although one could parameterize the KED model in this manner, the requirement of exhaustive sampling of the $Z_2$ covariate will generally prevent exploitation of this parameterization. An interesting aspect of the implied cross-covariance structure $B\Sigma_{22}$ is that if $B$ is not simply a constant times the identity matrix (for such an example, see Section 4.1.2), then the cross-covariance and marginal covariance matrices of $Z_1$ must be nonstationary (i.e., corresponding to a nonstationary covariance function). Also, a common practice in cokriging is to specify cross-covariance models that are scaled versions of the marginal covariance models. This is (a special case of) the linear model of coregionalization referred to previously (Wackernagel, 1988, 1989), although more generally the coregionalization model is taken to be a sum of scaled "elemental" functions. Such structures arise naturally in the hierarchical view, e.g., if $B = \beta I$, where $I$ is the identity matrix, then $\Sigma_{12} = \beta\Sigma_{22}$.
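To make the equivalence of (5) and (9) concrete, here is a small numerical sketch: it builds the implied joint covariance matrix from a conditional specification and verifies positive definiteness. The exponential covariance functions and the choice $B = \beta I$ are illustrative assumptions only; they are not the models fit later in the paper.

```python
# A minimal numerical sketch of (7)-(9): build the implied joint covariance of
# (Z1, Z2) from a conditional specification (B, Sigma_1.2) and a marginal
# specification (Sigma_22), then check positive definiteness.
import numpy as np

rng = np.random.default_rng(0)
n = 30
sites = rng.uniform(0, 1, size=(n, 2))                 # hypothetical site coordinates
h = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)

def exp_cov(h, sill, range_par):
    return sill * np.exp(-h / range_par)

Sigma_22 = exp_cov(h, sill=5.0, range_par=0.3)         # marginal covariance of Z2
Sigma_12cond = exp_cov(h, sill=30.0, range_par=0.2)    # conditional covariance of Z1 | Z2
B = 3.0 * np.eye(n)                                    # simplest case: B = beta * I

# Implied joint covariance, equation (9)
top = np.hstack([Sigma_12cond + B @ Sigma_22 @ B.T, B @ Sigma_22])
bot = np.hstack([(B @ Sigma_22).T, Sigma_22])
Sigma_joint = np.vstack([top, bot])

# Positive definiteness is automatic under this construction (Cholesky succeeds)
np.linalg.cholesky(Sigma_joint + 1e-10 * np.eye(2 * n))
print("cross-covariance equals B @ Sigma_22:", np.allclose(Sigma_joint[:n, n:], B @ Sigma_22))
```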

Ordering the Hierarchy: There are several practical considerations for deciding on the order of hierarchical modeling (i.e., whether to model the moments of $[Z_1 \mid Z_2]$ and $[Z_2]$, or of $[Z_2 \mid Z_1]$ and $[Z_1]$). First, as in the example of Section 1, known causal relationships imply a natural ordering for conditional specification. Second, one might be able to understand the model and parameters comparatively well for a particular specification. Third, data or spatial coverage imbalance might suggest a particular nested structure. Finally, a purely mechanical approach is to select the marginal model based on the variable which appears to have the more stationary covariance function; perhaps severe nonstationarity may be explained through conditioning. In conclusion, one may simply choose the conditioning order such that the models are easiest to construct and/or best fit the data.

Technicalities: First, the conditional model (7) is centered in that the model was based on $Z_2 - \mu_2$. Alternatively, one could consider the model

$$Z_1 \mid Z_2 \sim (\mu_1 + BZ_2, \Sigma_{1\cdot 2}). \qquad (10)$$

With this parameterization, the joint moments are given by

$$\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} \sim \left( \begin{pmatrix} \mu_1 + B\mu_2 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{1\cdot 2} + B\Sigma_{22}B' & B\Sigma_{22} \\ (B\Sigma_{22})' & \Sigma_{22} \end{pmatrix} \right). \qquad (11)$$

Note that the covariance structure remains the same as in (9); the marginal mean of $Z_1$ is different. The decision of which parameterization to use is equivocal. Centering is often recommended in the regression literature because it leads to more stable least squares estimates of regression parameters. Further, in the centered case, the interpretation of $\mu_1$ as the marginal mean of $Z_1$ is often convenient. On the other hand, estimation of parameters may be easier in the uncentered case. Next, if we add the assumptions that the distributions suggested in (7) (or (10)) and (8) are Gaussian, then the implied joint distribution (9) (or (11)) is also Gaussian. Finally, the calculations of marginal means and covariances described above follow from probability theory results known as "iterated conditional expectation." (See Casella and Berger, 1990, Chapter 4.) Let $E(Z)$ denote the expectation of the random vector $Z$ and $\mathrm{Cov}(Z)$ denote the covariance matrix of that vector. We arrive at (11) from (10) by using

$$E(Z_1) = E(E(Z_1 \mid Z_2)) = E(\mu_1 + BZ_2),$$

and

$$\mathrm{Cov}(Z_1) = E(\mathrm{Cov}(Z_1 \mid Z_2)) + \mathrm{Cov}(E(Z_1 \mid Z_2)) = \Sigma_{1\cdot 2} + \mathrm{Cov}(\mu_1 + BZ_2) = \Sigma_{1\cdot 2} + B\Sigma_{22}B'.$$
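The identities above can also be checked numerically; the following sketch adds Gaussian distributional assumptions (which the development itself does not require) purely to simulate from the uncentered model (10).

```python
# A small Monte Carlo sketch checking the iterated-expectation identities:
# E(Z1) = mu1 + B mu2 and Cov(Z1) = Sigma_1.2 + B Sigma_22 B'.
import numpy as np

rng = np.random.default_rng(1)
n, n_sim = 4, 200_000
mu1, mu2 = np.zeros(n), np.full(n, 2.0)
Sigma_22 = 0.5 * np.eye(n) + 0.5                # an arbitrary valid covariance matrix
Sigma_12cond = np.eye(n)
B = rng.normal(size=(n, n)) * 0.3

Z2 = rng.multivariate_normal(mu2, Sigma_22, size=n_sim)
Z1 = (mu1 + Z2 @ B.T) + rng.multivariate_normal(np.zeros(n), Sigma_12cond, size=n_sim)

print(np.allclose(Z1.mean(axis=0), mu1 + B @ mu2, atol=0.02))
print(np.allclose(np.cov(Z1.T), Sigma_12cond + B @ Sigma_22 @ B.T, atol=0.05))
```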

3 A Class of Hierarchical Models

3.1 Modeling and Parameterization

We again focus on the bivariate case and assume that $Z_1$ and $Z_2$ are $(n+1)$-vectors for simplicity. These assumptions can be relaxed, but the notation is cumbersome.

3.1.1 Component One: Assume that

$$Z_1 \mid Z_2 \sim (X_1\beta_1 + BZ_2, \Sigma_{1\cdot 2}), \qquad (12)$$

where

- $X_1$ is a fixed $(n+1) \times q_1$ matrix and $\beta_1$ is a $q_1$-vector of parameters. The elements of $X_1$ can be chosen to model surfaces in the spatial domain $D$ (e.g., see Ver Hoef and Cressie, 1993, p. 221).

- $\Sigma_{1\cdot 2}$ is an $(n+1) \times (n+1)$ conditional covariance matrix.

- $B$ is a matrix of parameters. In general, we consider parameterizations of $B$ that involve a relatively low dimensional vector of regression parameters $\theta$, say $p \times 1$, and indicate the dependence of $B$ on this vector as $B(\theta)$ (which will generally be suppressed).

We consider "uncentered" mean models in this section; centered analogs can also be considered.

Definition and Assumption: We assume that $B$ is linearizable in $\theta$. We define linearizable to mean that there exists an $(n+1) \times p$ model matrix $\Phi$ such that

$$B(\theta)Z_2 = \Phi\theta. \qquad (13)$$

The elements of $\Phi$ are selected functions of the elements of $Z_2$. We provide examples of $B(\theta)$ matrices and the corresponding $\Phi$ matrices in Section 3.2. We believe that formulating a model by specification of $B$ may be more intuitive in some cases than is specification of $\Phi$ directly. We introduce this additional apparent complication because aspects of statistical model fitting are often simplified. In particular, we can employ the well-developed method of generalized least squares estimation to estimate $\theta$ (see Section 4).

Mean Specifications: The conditional mean model in (12),

$$E(Z_1 \mid Z_2) = X_1\beta_1 + \Phi\theta, \qquad (14)$$

has several interpretations and parallels in the literature. First, the term $X_1\beta_1$ is the sort typically associated with universal kriging (i.e., a nonconstant spatial mean). The second term corresponds closely to the KED notion. The modeling strategy is that of linear (in parameters) modeling so popular in statistics. This development is in direct analogy to mixed models in statistics; the location variables represented in $X_1$ are viewed as "fixed effects," while the $Z_2$ are "random effects." For discussion see Laird and Ware (1982).

Conditional Covariance Specifications: In our analyses $\Sigma_{1\cdot 2}$ is modeled to depend on a low dimensional parameter vector, denoted by $\alpha_{1\cdot 2}$. These may be parameters associated with any one of a number of common covariance function models. (Unless needed to avoid confusion, this dependence is suppressed.) The formulations here are intended to take advantage of the conditioning argument. The key is that $\Sigma_{1\cdot 2}$ is easier to model than is the marginal covariance $\Sigma_{11}$ (see Section 4).

3.1.2 Component Two: Next, assume that

$$Z_2 \sim (X_2\beta_2, \Sigma_{22}), \qquad (15)$$

where

- $X_2$ is a fixed $(n+1) \times q_2$ matrix and $\beta_2$ is a $q_2$-vector of parameters. As above, the elements of $X_2$ can be chosen to model surfaces in the spatial domain $D$.

- $\Sigma_{22}$ is an $(n+1) \times (n+1)$ covariance matrix (the marginal covariance matrix of $Z_2$).

While we parameterize $\Sigma_{22}$ via a low dimensional vector $\alpha_{22}$, the case for simplicity here is not as strong as in the argument regarding $\Sigma_{1\cdot 2}$, since $\Sigma_{22}$ is not based on a conditional distribution. Of course, this step is also an issue in standard cokriging, so this is not an additional difficulty induced by our approach. Note that the "ordinary" assumptions of constant means, $X_1 = \mathbf{1}$, $\beta_1 = \mu_1$ and/or $X_2 = \mathbf{1}$, $\beta_2 = \mu_2$, can be analyzed (here $\mathbf{1}$ is an $(n+1)$-vector of ones).

3.2 Illustrations

Example A - Simple Linear Regression:

A direct analog of simple linear regression may be selected. This corresponds to a scalar value $\beta$ and diagonal $B$ (i.e., $B = \beta I$). The model matrix $\Phi$ is then actually $Z_2$ itself:

$$E(Z_1 \mid Z_2) = X_1\beta_1 + \beta Z_2. \qquad (16)$$

Thus, this simple case of the component 1 model corresponds to the common KED model.

Example B - Spatially Varying Dependence on $Z_2$:

Suppose sites are labeled in some two-dimensional coordinate system, $s_i = (u_i, v_i)$, $i = 0, \ldots, n$. Let $B$ be a diagonal matrix with $i$th element

$$b_i = \theta_0 + \theta_1 u_i + \theta_2 v_i,$$

and set $\theta = (\theta_0, \theta_1, \theta_2)'$. We then have that

$$\Phi\theta = \begin{pmatrix} z_0 & u_0 z_0 & v_0 z_0 \\ z_1 & u_1 z_1 & v_1 z_1 \\ \vdots & \vdots & \vdots \\ z_n & u_n z_n & v_n z_n \end{pmatrix} \begin{pmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{pmatrix},$$

where $z_i$ is the $i$th element of $Z_2$. This can be readily extended to arbitrary spatial functions (e.g., higher order polynomials, spline functions, etc.).
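A minimal numerical sketch of Example B, with simulated placeholder coordinates and covariate values, confirming that the diagonal $B(\theta)$ and the linearized form $\Phi\theta$ agree:

```python
# A minimal sketch of Example B: a diagonal B whose elements vary linearly with
# the site coordinates, and the corresponding linearization B(theta) Z2 = Phi theta.
import numpy as np

rng = np.random.default_rng(2)
n_sites = 8
u, v = rng.uniform(size=n_sites), rng.uniform(size=n_sites)   # site coordinates (u_i, v_i)
z2 = rng.normal(25.0, 3.0, size=n_sites)                      # covariate values, e.g. Maxt

theta = np.array([3.0, -0.5, 0.8])                            # (theta_0, theta_1, theta_2)

# Direct construction: B = diag(theta_0 + theta_1 u_i + theta_2 v_i)
B = np.diag(theta[0] + theta[1] * u + theta[2] * v)

# Linearized construction: columns of Phi are z, u*z, v*z
Phi = np.column_stack([z2, u * z2, v * z2])

print(np.allclose(B @ z2, Phi @ theta))   # True: B(theta) Z2 == Phi theta
```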

Example C - Nearest-Neighbor Regression:

A plausible model, particularly for data on a lattice or regular grid, is to construct a "nearest-neighbor" regression model by appropriate selection of $B$. The notion is that the spatial structure of the $Z_1$ variable at a given site depends in part on the behavior of $Z_2$ in a neighborhood of that site. Though the comparison is not exact, this sort of thinking is parallel to the idea of Markov random fields (see Cressie, 1991, for a review). Suppose that our $n+1$ spatial locations are equally spaced on a circle; e.g., monitoring sites equally placed around a source (nuclear power plant, aerosol producing factory, etc.). One nearest-neighbor model is

$$BZ_2 = \begin{pmatrix} \theta_1 & \tfrac{\theta_2}{2} & 0 & \cdots & 0 & \tfrac{\theta_2}{2} \\ \tfrac{\theta_2}{2} & \theta_1 & \tfrac{\theta_2}{2} & 0 & \cdots & 0 \\ \vdots & & \ddots & & & \vdots \\ \tfrac{\theta_2}{2} & 0 & \cdots & 0 & \tfrac{\theta_2}{2} & \theta_1 \end{pmatrix} \begin{pmatrix} z_0 \\ z_1 \\ \vdots \\ z_n \end{pmatrix}.$$

Identifying $\theta = (\theta_1, \theta_2)'$, we can rewrite this model as

$$\Phi\theta = \begin{pmatrix} z_0 & \tfrac{z_1 + z_n}{2} \\ z_1 & \tfrac{z_0 + z_2}{2} \\ z_2 & \tfrac{z_1 + z_3}{2} \\ \vdots & \vdots \\ z_n & \tfrac{z_{n-1} + z_0}{2} \end{pmatrix} \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix}.$$

One can readily construct analogs in the plane. A related parameterization of $B$ can be constructed for situations in which $Z_1$ depends on the spatial derivative of $Z_2$. An example of this is given in Royle et al. (1997). It is possible to choose $B$ to parameterize dependences that are anisotropic, but details are omitted for brevity.
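A corresponding sketch of Example C on a circle, again with simulated placeholder values; the circulant $B$ uses the weights $\theta_1$ and $\theta_2/2$ shown above.

```python
# A minimal sketch of Example C: a nearest-neighbor (circulant) B for n+1 sites
# equally spaced on a circle, and its linearization Phi so that B(theta) Z2 = Phi theta.
import numpy as np

rng = np.random.default_rng(3)
m = 10                                    # number of sites (n + 1)
z2 = rng.normal(size=m)
theta = np.array([1.2, 0.4])              # (theta_1, theta_2)

# Circulant B: theta_1 on the diagonal, theta_2 / 2 on the two neighbors (wrapping around)
B = np.zeros((m, m))
for i in range(m):
    B[i, i] = theta[0]
    B[i, (i - 1) % m] = theta[1] / 2.0
    B[i, (i + 1) % m] = theta[1] / 2.0

# Linearization: first column z_i, second column the average of the two neighbors
Phi = np.column_stack([z2, (np.roll(z2, 1) + np.roll(z2, -1)) / 2.0])

print(np.allclose(B @ z2, Phi @ theta))   # True
```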


4 Estimation Procedures

We present an algorithm for parameter estimation under the hierarchical model. The following assumptions described in Sections 2 and 3 are used:

- the conditional mean of $Z_1$ given $Z_2$ is linearizable in $\theta$ and linear in $Z_2$;

- the covariance matrices $\Sigma_{1\cdot 2}$ and $\Sigma_{22}$ are parameterized in some fashion (e.g., using stationary functions) by vectors $\alpha_{1\cdot 2}$ and $\alpha_{22}$, respectively.

For convenience we only present the analysis for the bivariate situation. Though the ideas carry over readily to higher dimensions, the notational overhead is burdensome. We propose an iterative scheme that is analogous to iteratively reweighted least squares (IRLS), but tailored to handle missing data in spatial settings. For discussion of IRLS, see Weisberg (1985, Section 4.1) and Gallant (1987, Section 5.5). Recall the uncentered conditional mean of $Z_1$ given $Z_2$: $E[Z_1 \mid Z_2] = X_1\beta_1 + BZ_2$. Due to missing data the model may not be directly usable as a basis for estimation of parameters. This situation occurs in several cases: (1) $Z_1$ and $Z_2$ data are not co-located; (2) $Z_2$ is not exhaustively sampled; (3) the form of $B$ is such that the conditional mean relates elements of $Z_1$ explicitly to unobserved elements of $Z_2$, and vice versa. This occurs, for example, in the nearest-neighbor dependence discussed in Section 3. To circumvent this problem, we include an imputation-of-missing-data step in our scheme by roughly following the paradigm of the "EM algorithm"; see Little and Rubin (1987). We use the following notation. Assume that sites are labeled to permit the following partitions of $Z_1$ and $Z_2$:

$$Z_1 = \begin{pmatrix} Z_1^u \\ Z_1^o \end{pmatrix} \quad \text{and} \quad Z_2 = \begin{pmatrix} Z_2^u \\ Z_2^o \end{pmatrix}, \qquad (17)$$

where the superscripts $u$ and $o$ indicate unobserved versus observed values. Note that the dimensions of $Z_1^u$ and $Z_2^u$ need not be the same.

4.1 Basic EM Algorithm

Under distributional assumptions on both $[Z_1 \mid Z_2]$ and $[Z_2]$, the EM algorithm is:

1. E-Step: Find the expected log-likelihood

$$\ell_E = E\left(\log[Z_1 \mid Z_2] + \log[Z_2]\right), \qquad (18)$$

where the expectation is taken with respect to the missing data for the current guesses of parameter values.

2. M-Step: Compute maximum likelihood estimates (MLE) for the parameters in $\ell_E$. These become the current guesses.

These steps are iterated until convergence (Little and Rubin, 1987, Chapter 7). To clarify how (18) arises for the hierarchical model, see Little and Rubin (1987, p. 97).

4.2 Outline of Our Algorithm

Under Gaussian assumptions the exact EM algorithm could be applied; see Little and Rubin (1987, Chapter 8) for implementations for multivariate normal models. We relax model assumptions here, and rely on least squares estimation and missing data imputation via BLUP, rather than MLE and expected log-likelihoods. The method is related to Buck's Method (Little and Rubin, 1987, pp. 45-47) for the treatment of missing data and is similar in flavor to "back-fitting" algorithms for additive models (see Hastie and Tibshirani, 1990, and Green and Silverman, 1994), and also the "Cochrane-Orcutt" procedure, which iterates between estimation of mean and covariance parameters (Cochrane and Orcutt, 1949; see also Mardia and Marshall, 1984, and Neter, Wasserman and Kutner, 1990, p. 496). First, based on rough data analysis, choose initial values for all unknown parameters; call these values the current estimates. Next, iterate the following procedure until convergence:

1. Imputation: Estimate the missing data $Z_1^u$ and $Z_2^u$ using the BLUP based on the current parameter values.

2. Estimation of Parameters (IRLS):

   (a) compute (generalized) least squares estimates of $\beta_1$, $\beta_2$, and $\theta$, assuming the current estimates of the covariance parameters and missing data are the true values;

   (b) estimate the covariance parameters $\alpha_{1\cdot 2}$ and $\alpha_{22}$ (e.g., by weighted least squares), assuming the least squares estimates obtained in step (a), as well as the current imputed values, are true.

This process is iterated until convergence; a schematic sketch of the loop is given below. Further details of the algorithm and prediction are given in the Appendix. Furthermore, though our main theme in this article is the hierarchical model, these estimation strategies can be adapted to non-hierarchical cokriging formulations.
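The following schematic sketch (not the authors' implementation) illustrates the structure of the iteration in the simplest setting: $B = \beta I$, constant means, exponential correlation functions with known ranges, simulated data, and a crude residual-variance update standing in for the weighted least squares covariance fitting described in the Appendix.

```python
# A schematic, simplified sketch of the iterative procedure above.
import numpy as np

rng = np.random.default_rng(4)

def exp_corr(sites, range_par):
    h = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    return np.exp(-h / range_par)

# Simulated data: Z2 observed at a subset of sites, Z1 observed everywhere
n = 60
sites = rng.uniform(0, 1, size=(n, 2))
R1, R2 = exp_corr(sites, 0.2), exp_corr(sites, 0.3)          # fixed (known) correlations
z2_true = 25 + np.linalg.cholesky(4.0 * R2 + 1e-10 * np.eye(n)) @ rng.normal(size=n)
z1 = 10 + 3.0 * z2_true + np.linalg.cholesky(20.0 * R1 + 1e-10 * np.eye(n)) @ rng.normal(size=n)
obs2 = rng.choice(n, size=25, replace=False)
miss2 = np.setdiff1d(np.arange(n), obs2)

mu2_hat, sig2_22, sig2_12 = z2_true[obs2].mean(), 4.0, 20.0  # initial guesses
for _ in range(15):
    # Step 1 (imputation): BLUP of the unobserved Z2 values; for brevity this uses only
    # the marginal Z2 model, whereas the paper's BLUP uses the full joint moments (9).
    C = sig2_22 * R2
    w = np.linalg.solve(C[np.ix_(obs2, obs2)], C[np.ix_(obs2, miss2)])
    z2 = np.empty(n)
    z2[obs2] = z2_true[obs2]
    z2[miss2] = mu2_hat + w.T @ (z2_true[obs2] - mu2_hat)

    # Step 2(a) (GLS): conditional mean model with M = [1, Z2], i.e. X1 = 1 and Phi = Z2
    M = np.column_stack([np.ones(n), z2])
    S1inv = np.linalg.inv(sig2_12 * R1)
    coef = np.linalg.solve(M.T @ S1inv @ M, M.T @ S1inv @ z1)   # (beta_10, beta)
    mu2_hat = z2_true[obs2].mean()                              # sample mean as a shortcut

    # Step 2(b) (covariance update): crude sill estimates from residual variances
    sig2_12 = float(np.var(z1 - M @ coef))
    sig2_22 = float(np.var(z2_true[obs2] - mu2_hat))

print("estimated regression coefficient beta:", round(float(coef[1]), 2))
```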

4.3 Example: Diagonal B with co-located observations

If we assume that $B = \mathrm{diag}\{b_i : i = 0, 1, \ldots, n\}$ is a diagonal matrix, where the elements $b_i$ may themselves be functions of $\theta$, and the data are co-located, then the algorithm above simplifies substantially. Namely, imputation of missing $Z_2^u$ values is unnecessary for estimation; estimation of $\theta$ can be based on co-located pairs. This case is important because this is the typical situation in KED applications.

5 Generalizations

5.1 Multivariate Extensions

Extension to hierarchies of three or more variables is conceptually straightforward. In practice, effective implementation may require subject-matter based modeling decisions. Further, substantial notational burdens arise. The reader is referred to Royle et al. (1997) for an example of a trivariate hierarchical spatial model. Under the linearizable and linear assumptions on all conditional mean specifications, the estimation algorithm of Section 4 extends readily.

5.2 Nonlinear in $Z_2$

In some settings one may wish to allow the elements of $Z_2$ to enter the conditional mean of $Z_1$ nonlinearly. As an example, suppose each coordinate of $Z_1$ is modeled with quadratic dependence on the covariate:

$$E[Z_1(s_i) \mid Z_2] = \theta_0 + \theta_1 Z_2(s_i) + \theta_2 Z_2^2(s_i).$$

The full conditional mean model can be shown to be linearizable in the parameters and, hence, amenable to the approach of Section 4. While our estimation procedure allows this, it should be noted that the form of the implied joint covariance matrix of $Z_1$ and $Z_2$ is not readily available, and hence the BLUP is not straightforward to specify. However, under distributional assumptions BLUP imputation can be replaced by conditional expectation-based imputations.
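A minimal sketch of this quadratic case, with simulated values: the conditional mean is nonlinear in $Z_2$ but linear in the parameters, so a model matrix with columns $[1, Z_2, Z_2^2]$ reproduces it and the GLS machinery of Section 4 still applies.

```python
# A minimal sketch of the quadratic conditional mean and its linearization.
import numpy as np

rng = np.random.default_rng(5)
z2 = rng.normal(25.0, 3.0, size=12)
theta = np.array([2.0, 1.5, -0.02])

mean_direct = theta[0] + theta[1] * z2 + theta[2] * z2**2   # E[Z1(s_i) | Z2]
Phi = np.column_stack([np.ones_like(z2), z2, z2**2])        # linearized design
print(np.allclose(mean_direct, Phi @ theta))                # True
```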

6 Ozone-Temperature Analysis

We consider the problem of predicting ozone, $Z_1$, and daily maximum temperature, $Z_2$. The ozone data are recorded at 147 sites in the Midwest region of the United States (these data are a subset of the EPA AIRS database covering 89 days during the summer of 1987; for background and analyses involving this ozone data set see Nychka and Saltzman, 1998). The particular ozone summary is the daily 8-hour average (ppb) between the hours of 0900 and 1700, and the temperature covariate is the maximum daily temperature, Maxt (degrees C). The daily temperature data are recorded at 40 sites in the same region and are taken from the Solar and Meteorological Surface Observation Network (SAMSON) database (National Renewable Energy Laboratory, 1993). The ozone and meteorological monitoring sites are shown in Figure 1. For illustration we consider a single day, June 9, 1987, and our objective will be to produce predictions of both ozone and temperature on the 400 point grid shown in Figure 1. Note that interest might primarily be in predicting ozone, but there is no difficulty in predicting both. Also, note that none of the observations of the two variables are co-located. For a recent application of cokriging to spatial prediction of a regional ozone summary using meteorological information, see Phillips et al. (1997). Their single covariate was an ozone "exposure" index constructed from daily maximum temperature, wind direction, and NOx inventories. Graphical summaries of these data and their relationships are shown in Figure 2. Panel (a) shows a grayscale image plot of a smoothed ozone field (using a thin-plate spline) in order to produce a visualization of the spatial variability. A grayscale image plot of the smoothed temperature field is shown in panel (b). We note that the two fields appear highly correlated, with larger values (higher temperature and ozone levels) in the south on this day and a general decreasing trend in both variables from South to North. The empirical covariance estimates are shown in Figure 3. In panel (a), the marginal covariance estimates of ozone are given. The marginal covariance estimates of Maxt are shown in panel (b). A scatter plot of the residuals of both ozone and Maxt from their estimated means is shown in panel (c). The correlation between the two residual fields is 0.678. Finally, the conditional covariance function of ozone given Maxt is given in panel (d).

6.1 Models and Estimation

We assume that the marginal spatial means for both ozone and temperature are a constant plus a latitudinal drift component: $E[Z_1(s)] = \beta_{10} + \beta_{11}\,\mathrm{lat}(s)$ and $E[Z_2(s)] = \beta_{20} + \beta_{21}\,\mathrm{lat}(s)$. Also, we model the conditional mean of ozone as linearly related to temperature by setting $B = \beta I$. The conditional covariance function of ozone given Maxt was assumed to be the following function:

$$k_{1\cdot 2}(h) = \begin{cases} \sigma^2_e + \sigma^2_{1\cdot 2}, & h = 0, \\ \sigma^2_{1\cdot 2}\left(\cos(\lambda_{12}h) + \dfrac{\lambda_{11}}{\lambda_{12}}\sin(\lambda_{12}h)\right)e^{-\lambda_{11}h}, & \text{otherwise,} \end{cases}$$

for $h = \|s - s'\|$, the distance separating points $s$ and $s'$. This family of covariance functions is given in Thiebaux (1985) and has proven useful in atmospheric science applications, particularly those involving covariance structures that appear periodic and contain negative values. The marginal covariance function of Maxt was assumed to behave similarly (with parameters $\lambda_{21}$, $\lambda_{22}$, and $\sigma^2_{22}$, but with no nugget, due to the very smooth nature of the temperature field). The iterative algorithm of Section 4 was used, and convergence of all mean and covariance parameters was rapid, occurring in approximately 7 iterations. The parameter values at the end of each of 9 iterations are shown in Table 1 (the first row gives the starting values). Since the ozone and temperature sites are not co-located, the algorithm involved an imputation step, whereby the Maxt values at the ozone sites were filled in using the BLUP. The fitted marginal covariance function of Maxt and conditional covariance function of ozone given Maxt are shown as the smooth lines in panels (b) and (d), respectively, of Figure 3. The predicted ozone and Maxt fields are shown in Figures 4(a) and 5(a), respectively, with the corresponding prediction standard error fields shown in panels (b) of the respective figures. We observe a large degree of smoothing in the ozone field, and a structure that behaves largely according to the covariate Maxt field, as expected. The mean prediction standard error for the 400 grid points is 4.217 ppb. For comparison, we computed ozone predictions and prediction variances using ordinary kriging. Again the mean was assumed to be a constant plus latitudinal drift and the marginal covariance function for ozone was assumed to be that given above (but with different parameters) (see Figure 3(a)). The prediction and prediction variance fields are shown in Figure 6. The mean prediction standard error at the 400 grid points is 6.028 ppb; this is substantially larger than that from the bivariate model.

Table 1: Parameter estimates at the end of each of the first 9 iterations of the estimation algorithm. The first row (iteration 0) gives the starting values.

iter    β      β₁₀     β₁₁      β₂₀     β₂₁      σ²ₑ     σ²₁·₂    λ₁₁    λ₁₂    σ²₂₂    λ₂₁    λ₂₂
0      0.00   45.41   -0.051   24.16   -0.013   20.00   70.00    4.00   2.00   8.00    6.00   3.00
1      4.49   35.32   -0.043   24.49   -0.011   51.70   99.96    1.50   5.67   5.02    2.13   5.32
2      3.92   42.96   -0.029   24.33   -0.010   58.82   56.14    2.66   6.82   5.85    2.66   5.37
3      1.77   42.20   -0.025   24.34   -0.010   89.87   28.73    2.68   9.35   5.59    2.68   5.46
4      2.44   46.78   -0.032   24.35   -0.011   76.26   29.35    2.73   8.21   5.55    2.73   5.50
5      2.98   46.03   -0.028   24.35   -0.011   73.32   28.42    2.75   8.08   5.52    2.75   5.51
6      3.12   45.86   -0.027   24.35   -0.011   72.18   29.08    2.76   7.99   5.51    2.76   5.52
7      3.13   45.85   -0.027   24.35   -0.011   72.05   29.18    2.76   7.98   5.51    2.76   5.52
8      3.13   45.85   -0.027   24.35   -0.011   72.05   29.18    2.76   7.98   5.51    2.76   5.52
9      3.13   45.85   -0.027   24.35   -0.011   72.05   29.18    2.76   7.98   5.51    2.76   5.52

Of course, this approach is inconsistent with the conditional model. Therefore, we computed ozone predictions and prediction variances using the implied marginal covariance structure of ozone (see (9)). These predictions are thus equivalent to predictions based on the conditional model, but with the covariate dependence "integrated out." The prediction and prediction variance fields based on ordinary kriging with the implied covariance structure are shown in Figure 7. The mean prediction standard error at the 400 grid points is 5.137 ppb.
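For reference, a minimal sketch of the conditional covariance function used above. The parameter names and illustrative values are taken from the final row of Table 1, with distance in the units of Figure 3 (1000 km); treat the exact labeling as an assumption of this sketch.

```python
# A minimal sketch of the conditional covariance function k_{1.2}(h) above
# (Thiebaux, 1985, family), with a nugget added at h = 0.
import numpy as np

def k_cond(h, sig2_e, sig2_12, lam11, lam12):
    """Conditional covariance of ozone given Maxt at separation h (scalar or array)."""
    h = np.asarray(h, dtype=float)
    c = sig2_12 * (np.cos(lam12 * h) + (lam11 / lam12) * np.sin(lam12 * h)) * np.exp(-lam11 * h)
    return np.where(h == 0.0, sig2_e + sig2_12, c)

h = np.linspace(0.0, 0.6, 7)
print(np.round(k_cond(h, sig2_e=72.05, sig2_12=29.18, lam11=2.76, lam12=7.98), 2))
```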

6.2 Comments

First, the actual ozone/temperature data are space-time data. Although prediction on a day-by-day basis is likely to produce reasonable predictions, extension of the model to include the temporal aspect may lead to a substantial reduction in prediction variance. Second, we have focused on a small, high-quality covariate data set for this illustration. Much higher density meteorological data are available (although some of questionable quality). For example, the National Climate Data Center maintains a database of "cooperative observation station" data which would likely increase the covariate station density in this region by an order of magnitude. Results given here suggest that this would have a very large effect on predicting ozone. Third, the stationary linear conditional relationship appeared sufficient for this particular problem. But different days could require different parameterizations, perhaps even a spatially varying relationship.

7 Conclusions

We have presented a general modeling and estimation strategy for multivariate spatial prediction. This hierarchical view involves specification of one or more conditional models as opposed to direct specification of a joint model. Advantages include (1) avoidance of the difficult task of specifying valid cross-covariance models, (2) removal of the data sampling restrictions associated with KED, and (3) the opportunity to aggressively model complicated inter-dependence structures between variables. Estimation uses an iterative weighted least squares procedure in conjunction with imputation of missing data via the BLUP. Prediction is based on the BLUP derived from the implied joint covariance structure. At the heart of the approach is parameterization of relationships between variables in the mean as opposed to the covariance. Here, we focused on linear mean models. This is not a comparatively severe limitation, due to the basic equivalence between conditional linearity of means and dependence modeling via cross-covariances as in cokriging. The hierarchical model was applied to the problem of jointly modeling and predicting a daily ozone summary and maximum daily temperature. In this problem, the conditioning of ozone on temperature is suggested by the nature of the problem. Due to the high correlation between the two fields, a reduction in prediction standard error of about 30% over a standard application of ordinary kriging was observed.

Acknowledgements

This research was supported by the National Center for Atmospheric Research, Geophysical Statistics Project, sponsored by the National Science Foundation under grant #DMS93-12686; the Vegetation/Ecosystem Modeling and Analysis Project (VEMAP), sponsored by NASA Mission to Planet Earth; the USDA Forest Service Global Change Research Program; the Electric Power Research Institute (EPRI); and the National Institute of Statistical Sciences. The authors would like to thank Noel Cressie, Chris Wikle, two anonymous referees and the Associate Editor for many helpful comments.

References

Ahmed, S., and De Marsily, G. (1987), "Comparison of Geostatistical Methods for Estimating Transmissivity Using Data on Transmissivity and Specific Capacity," Water Resources Research, 23, 1717-1737.

Bernardo, J.M. and Smith, A.F.M. (1994), Bayesian Theory, New York: Wiley.

Bourennane, H., King, D., Chery, P. and Bruand, A. (1996), "Improving the kriging of a soil variable using slope gradient as external drift," European Journal of Soil Science, 47, 473-483.

Brown, P.J., Le, N.D., and Zidek, J.V. (1994), "Multivariate spatial interpolation and exposure to air pollutants," The Canadian Journal of Statistics, 22(4), 489-509.

Casella, G. and Berger, R.L. (1990), Statistical Inference, Pacific Grove, CA: Brooks/Cole (Wadsworth), 650 pp.

Cochrane, D. and Orcutt, G.H. (1949), "Applications of least squares regression to relationships containing autocorrelated error terms," Journal of the American Statistical Association, 44, 32-61.

Cressie, N. (1985), "Fitting variogram models by weighted least squares," Journal of the International Association for Mathematical Geology, 17, 563-586.

Cressie, N. (1991), Statistics for Spatial Data, New York: John Wiley & Sons, 900 pp.

Gallant, A.R. (1987), Nonlinear Statistical Models, New York: John Wiley & Sons.

Gotway, C.A. and Hartford, A.H. (1996), "Geostatistical methods for incorporating auxiliary information in the prediction of spatial variables," Journal of Agricultural, Biological, and Environmental Statistics, 1(1), 17-39.

Green, P.J. and Silverman, B.W. (1994), Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach, New York: Chapman & Hall.

Haas, T.C. (1996), "Multivariate spatial prediction in the presence of non-linear trend and covariance non-stationarity," Environmetrics, 7, 145-165.

Hastie, T.J. and Tibshirani, R.J. (1990), Generalized Additive Models, New York: Chapman & Hall, 335 pp.

Journel, A.G. and Huijbregts, C.J. (1978), Mining Geostatistics, London: Academic Press, 600 pp.

Laird, N.M. and Ware, J.H. (1982), "Random-effects models for longitudinal data," Biometrics, 38, 963-974.

Little, J.A. and Rubin, D.B. (1987), Statistical Analysis with Missing Data, New York: John Wiley & Sons.

Mardia, K.V. and Marshall, R.J. (1984), "Maximum likelihood estimation of models for residual covariance in spatial regression," Biometrika, 71(1), 135-146.

Myers, D.E. (1982), "Matrix formulation of co-kriging," Mathematical Geology, 14(3), 249-257.

National Research Council (1991), Rethinking the Ozone Problem in Urban and Regional Air Pollution, Washington, D.C.: National Academy Press.

National Renewable Energy Laboratory (1993), Solar and Meteorological Surface Observation Network, 1961-1990, Version 1.0, National Climatic Data Center, Federal Building, Asheville, NC, 28801.

Neter, J., Wasserman, W., and Kutner, M.H. (1990), Applied Linear Statistical Models, 3rd edition, Boston: Irwin.

Nychka, D. and Saltzman, N. (1998), "Design of air quality monitoring networks," in Case Studies in Environmental Statistics, Nychka, D., Piegorsch, W., and Cox, L., eds., Lecture Notes in Statistics, Berlin: Springer-Verlag.

Phillips, D.L., Lee, E.H., Herstrom, A.A., Hogsett, W.E., and Tingey, D.T. (1997), "Use of auxiliary data for spatial interpolation of ozone exposure in Southeastern forests," Environmetrics, 8, 43-61.

Royle, J.A., Berliner, L.M., Wikle, C.K., and Milliff, R. (1997), "A hierarchical spatial model for constructing wind fields from scatterometer data in the Labrador Sea," Technical Report No. 619, Dept. of Statistics, Ohio State University.

Stein, A., and Corsten, L.C.A. (1991), "Universal kriging and cokriging as a regression procedure," Biometrics, 47, 575-587.

Thiebaux, H.J. (1985), "On approximations to geopotential and wind-field correlation structures," Tellus, 37A, 126-131.

Ver Hoef, J.M. and Cressie, N. (1993), "Multivariable spatial prediction," Mathematical Geology, 25(2), 219-239.

Ver Hoef, J.M. and Barry, R.P. (1997), "Constructing and fitting models for cokriging and multivariable spatial prediction," Journal of Statistical Planning and Inference, (to appear).

Wackernagel, H. (1988), "Geostatistical Techniques For Interpreting Multivariate Spatial Information," in Quantitative Analysis of Mineral and Energy Resources, eds. C.F. Chung et al., Dordrecht: D. Reidel, pp. 393-409.

Wackernagel, H. (1989), "Description of a Computer Program for Analyzing Multivariate Spatially Distributed Data," Computers and Geosciences, 15, 593-598.

Wackernagel, H. (1995), Multivariate Geostatistics, Berlin: Springer-Verlag, 256 pp.

Weisberg, S. (1985), Applied Linear Regression, 2nd edition, New York: John Wiley & Sons.

8 Appendix: Estimation Details

The algorithm is best understood by comparison. In principle, one could adopt a cokriging approach as follows: by keeping track of appropriate rows and columns from (11), we could find the means and joint covariance matrix of the data,

$$\begin{pmatrix} Z_1^o \\ Z_2^o \end{pmatrix},$$

and estimate parameters. This is a horrific estimation problem, primarily due to the complicated way $B$ enters these covariances. Instead, our suggestion uses the conditional modeling structure and iteration to easily estimate $\theta$ employing generalized least squares.

Step (1) - Imputation of $Z_2^u$:

We use the BLUPs, denoted by $\hat{Z}_2^u$, of the elements of $Z_2^u$. The usual cokriging BLUPs, based on the current parameter values, are easily constructed using the appropriate elements of the first and second moments from (9).

Step (2a) - Estimation of Mean Parameters:

Assume that all indicated covariance matrices are known. We employ two substeps. In the first, estimates for $\beta_1$ and $\theta$ are obtained via familiar generalized least squares (GLS) arguments. Based on the conditional model for $Z_1$ given $Z_2$, the implied first and second moments of $Z_1^o$ given $Z_2$ are:

$$Z_1^o \mid Z_2 \sim \left( M \begin{pmatrix} \beta_1 \\ \theta \end{pmatrix}, \Sigma^o_{1\cdot 2} \right), \qquad (19)$$

where $M$ is the matrix $[X_1^o, \Phi^o]$, and $X_1^o$, $\Phi^o$, and $\Sigma^o_{1\cdot 2}$ all represent the appropriate submatrices of $X_1$, $\Phi$, and $\Sigma_{1\cdot 2}$ as determined by the pattern of observation of the $Z_1$ variable (i.e., the rows that correspond to observed values of $Z_1$ and $Z_2$). The GLS estimates of $\beta_1$ and $\theta$ are given by

$$\begin{pmatrix} \hat{\beta}_1 \\ \hat{\theta} \end{pmatrix} = \left\{ M'(\Sigma^o_{1\cdot 2})^{-1} M \right\}^{-1} M'(\Sigma^o_{1\cdot 2})^{-1} Z_1^o. \qquad (20)$$

To estimate $\beta_2$, we use the marginal model

$$Z_2^o \sim (X_2^o\beta_2, \Sigma^o_{22}),$$

where $X_2^o$ and $\Sigma^o_{22}$ are the appropriate submatrices of $X_2$ and $\Sigma_{22}$. The resulting GLS estimate is

$$\hat{\beta}_2 = \left\{ (X_2^o)'(\Sigma^o_{22})^{-1} X_2^o \right\}^{-1} (X_2^o)'(\Sigma^o_{22})^{-1} Z_2^o. \qquad (21)$$
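A minimal sketch of the GLS computations in (20) and (21), assuming the covariance matrices are known; the design matrices and data below are simulated placeholders with the shapes described in the text.

```python
# A minimal sketch of generalized least squares as used in (20) and (21).
import numpy as np

def gls(design, cov, y):
    """Generalized least squares: {D' C^{-1} D}^{-1} D' C^{-1} y."""
    Cinv_D = np.linalg.solve(cov, design)
    Cinv_y = np.linalg.solve(cov, y)
    return np.linalg.solve(design.T @ Cinv_D, design.T @ Cinv_y)

# Simulated placeholders: M = [X1^o, Phi^o] for (beta_1, theta); X2^o would be used
# in the same way with Sigma_22^o for beta_2.
rng = np.random.default_rng(6)
n_o = 40
M = np.column_stack([np.ones(n_o), rng.normal(size=n_o)])
d = np.abs(np.subtract.outer(np.arange(n_o), np.arange(n_o)))
Sigma_12o = 5.0 * np.exp(-d / 4.0)                               # a valid covariance matrix
z1_o = M @ np.array([40.0, 3.0]) + np.linalg.cholesky(Sigma_12o) @ rng.normal(size=n_o)

beta1_theta_hat = gls(M, Sigma_12o, z1_o)                        # equation (20)
print(np.round(beta1_theta_hat, 2))
```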

Step (2b) - Estimation of Covariance Parameters:

To estimate the covariance parameters $\alpha_{1\cdot 2}$ and $\alpha_{22}$ we employ procedures common in geostatistics. First, we estimate the marginal spatial covariances of $Z_2$ empirically using

$$\hat{c}_2(h) = \frac{1}{n(h)} \sum_{i=1}^{n(h)} [Z_2(s) - x_2(s)'\hat{\beta}_2][Z_2(s') - x_2(s')'\hat{\beta}_2] \qquad (22)$$

for all available $(Z_2(s), Z_2(s'))$ pairs such that $\|s - s'\| = h$. Here, $n(h)$ is the number of data pairs separated by each particular lag (in general, this is the number of pairs such that $h - \delta \le \|s - s'\| \le h + \delta$ for some $\delta$) and $x_2(s)$ are the appropriate rows of $X_2^o$. For $Z_1$, one takes into consideration the dependence on $Z_2$ appropriately. That is, to estimate the conditional covariance structure $\Sigma_{1\cdot 2}$, we use

$$\hat{c}_{1\cdot 2}(h) = \frac{1}{n(h)} \sum_{i=1}^{n(h)} [Z_1^o(s) - (x_1(s)'\hat{\beta}_1 + \phi(s)'\hat{\theta})][Z_1^o(s') - (x_1(s')'\hat{\beta}_1 + \phi(s')'\hat{\theta})] \qquad (23)$$

for all available $(Z_1^o(s), Z_1^o(s'))$ pairs such that $\|s - s'\| = h$, and where $x_1(s)$ and $\phi(s)$ are the appropriate rows of $X_1^o$ and $\Phi^o$. In the second step, parametric models are fit to the $\hat{c}_2(h)$'s and $\hat{c}_{1\cdot 2}(h)$'s and estimates are obtained for $\alpha_{22}$ and $\alpha_{1\cdot 2}$. Since the variances of the localized covariance estimates differ, one will generally use weighted least squares (see Cressie, 1985).
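A minimal sketch of the binned empirical covariance estimator (22), using a distance tolerance $\delta$ and simulated placeholder residuals; the same function applied to the conditional-mean residuals gives (23).

```python
# A minimal sketch of the empirical covariance estimator (22): average of residual
# cross-products over pairs of sites whose separation falls in a distance bin.
import numpy as np

def empirical_cov(resid, sites, lags, delta):
    """Binned empirical covariance of a residual field at the given lags."""
    d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    prods = np.outer(resid, resid)
    out = []
    for h in lags:
        mask = (np.abs(d - h) <= delta) & ~np.eye(len(resid), dtype=bool)
        out.append(prods[mask].mean() if mask.any() else np.nan)
    return np.array(out)

rng = np.random.default_rng(7)
sites = rng.uniform(0, 1, size=(80, 2))
resid = rng.normal(size=80)                   # stand-in for Z2(s) - x2(s)' beta2_hat
print(np.round(empirical_cov(resid, sites, lags=[0.1, 0.2, 0.3], delta=0.05), 3))
```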


Figure 1: Locations of ozone ("o") and Maxt ("x") observation sites and prediction grid (axes: longitude and latitude).

Figure 2: (a) Smoothed ozone field; (b) smoothed Maxt field; (c) residual field (difference between (a) and (b)). Smoothing was done for purposes of visualization using a thin-plate spline.

Figure 3: (a) Marginal covariance estimates of ozone; (b) marginal covariance estimates of Maxt and fitted covariance function; (c) scatter plot of ozone and Maxt residuals from the mean; (d) conditional covariance estimates of ozone given Maxt and fitted covariance function. Distance axes are in units of 1000 km.

Figure 4: (a) Ozone predictions and (b) ozone prediction standard errors produced under the hierarchical model.

Figure 5: (a) Maxt predictions and (b) Maxt prediction standard errors produced under the hierarchical model.

Figure 6: (a) Ozone predictions and (b) ozone prediction standard errors produced from ordinary kriging.

Figure 7: (a) Ozone predictions and (b) ozone prediction standard errors produced from ordinary kriging but with the covariance structure implied by the hierarchical model.