Additive Covariogram Models and Estimation through Projections


Miro Powojowski

Christian Léger∗†

CRM-2870 October 2002

∗ Supported by a grant of NSERC, Canada and by the Centre de recherches mathématiques.
† Département de mathématiques et de statistique, Université de Montréal, C.P. 6128, succursale Centre-ville, Montréal (Québec) H3C 3J7, Canada; [email protected]

Abstract

The paper considers the problem of estimating the covariogram of a stochastic process. Additive covariance models are used, but unlike in previous work by other authors, their estimation is based on projections in the inner product space of sufficiently regular functions. The method can easily accommodate the estimation of the mean of the process from the data through a linear regression model or the inclusion of a nugget effect in the additive covariance model. Asymptotic properties of the resulting estimators are worked out, without explicit assumptions about the functional form of model components or that of the true covariogram. Expressions for the bias of the estimator in misspecified models (it is unbiased if the models for the mean and the covariance are correctly specified), expressions for the estimator's variance in the normal case and bounds for the variance of the estimator under relaxed assumptions are derived. Both in-fill asymptotics and expanding-domain asymptotics are considered, the latter under the more restrictive assumption that the process is isotropic. A new definition of in-fill asymptotics based on the notion of discrepancy is introduced. Non-stationary covariance structures can be accommodated by the projection estimator and the in-fill asymptotic results hold. The techniques are applied to a data set of Davis (1973) and illustrate some of the advantages over the traditional methods.

Mathematics Subject Classification. Primary 62M30; Secondary 62G05, 62P12

Keywords and Phrases. Geostatistics, covariance estimation, stationary processes, isotropic processes, in-fill asymptotics, expanding-domain asymptotics, low-discrepancy sets, star discrepancy, Hardy-Krause variation

1 Introduction

For a random process Y(x), x ∈ D, where D is a subset of a d-dimensional Euclidean space, the covariogram is defined as C(x_1, x_2) = cov(Y(x_1), Y(x_2)), the semivariogram is defined as γ(x_1, x_2) = (1/2) var(Y(x_1) − Y(x_2)), and the variogram is defined as 2γ. These definitions do not require the process to be stationary. For a second-order stationary process, the two are related through γ(x_1, x_2) = C(0, 0) − C(x_1, x_2) (Cressie, 1993). A common problem in geostatistics is that of estimating the functions C and γ based on one realisation of the process Y observed at a finite number of locations x_1, x_2, . . . , x_n in D. It is important to note that knowledge of the function values C(x_1, x_2) is required for arbitrary (x_1, x_2) ∈ D × D, and not simply the covariances of Y at lags observed in the sample.

The fact of observing only one realisation forces one to make certain assumptions about the process Y, which translate into restrictions on the form of C and γ. There also exist theoretical reasons for restricting the function families considered. The covariogram has to be a positive definite function, whereas the variogram has to be conditionally negative definite. Further restrictions may be desirable. The process Y may be assumed second-order stationary, or even isotropic, requiring C(x_1, x_2) and γ(x_1, x_2) to depend only on x_1 − x_2 or its length, respectively.

In a typical covariogram estimation problem it is supposed that the observed process Y follows the model Y = Xβ + η. The known regressor X usually contains terms corresponding to the mean of the process and any trend that is modelled, while the parameter β is unknown and the random term η is assumed to have zero mean and a covariogram C_Y. In practice, the covariogram C_Y is modelled by a covariance function C_θ, known up to the value of a finite-dimensional vector θ, to be estimated.

Different methods have been proposed. A rather exhaustive discussion of the traditional methods of covariogram and variogram estimation is contained in Cressie (1993). Two broad classes of methods can be distinguished: methods requiring parametric distributional assumptions concerning the underlying process, such as ML or REML methods, and methods which avoid making such precise parametric hypotheses. Most of the methods of the latter category are based on parametric curves from some valid family of (co)variogram functions (without making distributional assumptions). These families are usually chosen for convenience, not necessarily because it is believed that the true (co)variogram is a member of this class. The estimation of the parameters of the (co)variogram is done by fitting the curve to the so-called empirical variogram or covariogram.

Covariance function estimation based on empirical (co)variogram estimation suffers from a number of drawbacks. The empirical (co)variogram is based on averages of observations which are about the same distance apart. This usually requires some arbitrary binning of the observations and is sometimes difficult to perform if the number of observations is low or the process is not isotropic. In fact, the empirical covariogram is meaningless if the observed process is non-stationary, while the empirical variogram, which requires intrinsic stationarity, is sensitive to departures from it, see e.g., Cressie (1993). Finally, the fitting procedure is usually difficult to assess from a statistical point of view.
Most known theoretical results (whose comprehensive summary may be found in Cressie, 1993) are concerned only with the properties of the empirical (co)variogram and not those of the fitted (co)variogram function. It appears that the problem of obtaining the properties of the fitted (co)variogram function from the empirical (co)variogram has not been extensively studied. The difficulty lies with the fitting of the curve, which is often done through ordinary or weighted least squares and usually requires optimisation of non-linear and non-quadratic functions.

An alternative approach is based on the assumption that the covariance function is additive:

(1)    C_θ = \sum_{i=1}^{q} θ(i) C_i,

where the components C_i are fully specified valid covariance functions and the only parameters to be estimated are the θ(i). Such a model has been considered for instance by Kitanidis (1985), Stein (1987, 1989), and in the case of an isotropic covariance function by Shapiro & Botha (1991). The first two authors have considered the MINQUE (minimum norm quadratic unbiased estimator) method of estimation for the parameters θ(i), from practical and theoretical points of view, respectively. The MINQUE method was originally designed for the purpose of estimation of variance components and has been studied quite extensively in the context of analysis of variance, see Rao and Kleffe (1988). This method leads to a family of estimators as it is a function of a first guess of the covariance function C_Y, say C_0. The practical usefulness of the estimator depends on the closeness of C_0 to C_Y. Moreover, existing theoretical results establishing properties of MINQUE estimators impose certain assumptions on the relationship between the true covariance function C_Y and the initial guess C_0 (in some sense, the two have to be close — for details the reader is referred to Rao & Kleffe, 1988 and Stein, 1989).

In this paper, we introduce an alternative method of estimation of the parameters in the model (1). Let K_Y be the n × n matrix made up of the elements C_Y(x_i, x_j) and let K_l be the corresponding matrix based on the covariance function C_l. For now, assume that the mean of Y is 0. Note that E(Y Y') = K_Y. We study the estimator defined as the orthogonal projection of Y Y' onto the span of the matrices K_1, . . . , K_q within the linear space of symmetric matrices of size n × n over the field of real numbers, based on an inner product on that space. Because of this algebraic structure, it is possible to derive many interesting properties of this estimator. For instance, if C_Y is of the form (1), it is automatically unbiased, otherwise it is unbiased for the closest member to C_Y in that class. Interestingly, the estimator is a special case of the MINQUE estimator, see Section 2.3. Unfortunately, most of the theoretical results on MINQUE estimators do not apply to this special case. Moreover, in the practice of geostatistics, the hypotheses necessary for these results are difficult to establish. The MINQUE estimator also requires the inversion of an n × n matrix whereas our estimator only requires the inversion of a q × q matrix. Unlike the estimator based on fitting a parametric curve to the empirical (co)variogram, we obtain theoretical results directly for the covariogram estimator. For instance, the projection-based estimation yields the mean and variance expressions for the parameters of the estimated covariance curve. It also makes sense even for non-stationary processes. One may question the practical applicability of the class of covariance functions (1), which may seem just as arbitrary as fitting a parametric curve since the q functions C_i must be known. But in the isotropic case, Powojowski (2002b) shows that it is possible to find functions C_i and parameters θ(i), i = 1, . . . , q for q large enough such that the difference between the true covariance function C_Y and \sum_{i=1}^{q} θ(i) C_i is as small as needed, and Powojowski (2002a) introduces a practical method of choosing the functions C_i and the order q while using the projection estimator introduced in this paper to estimate the parameters θ(i).

From a technical point of view, we introduce what we believe to be a useful, new definition of in-fill asymptotics. Borrowing from research in low-discrepancy sequences, the definition characterises in-fill sequences indirectly, as sequences whose discrepancy converges to zero. It turns out that only the rate of this convergence, and not the sequence itself, enters the proofs of the results, via an application of the Koksma-Hlawka inequality. We believe that the characterisation of in-fill asymptotics by discrepancy measures can be a valuable theoretical tool. It would appear that discrepancy measures may also be useful criteria for the evaluation of sampling configurations in practice.

The paper is organised as follows. In Section 2, we define the projection estimator after introducing the notation and summarising some notions of projections in inner product spaces. Section 3 contains asymptotic analyses of the estimator. First, an asymptotic in-fill setting is considered, where more and more observations of the process Y are collected on a finite domain. It is shown that in general it is impossible to estimate the covariogram consistently based on observations from a finite domain, but an upper bound for the asymptotic variance is obtained. The treatment of the so-called nugget effect is also studied. Note that none of the results to that point require that the process be second-order stationary.
Subsequently, it will be assumed that the process is isotropic and we will show that as the size of the domain increases indefinitely, the upper bound for the variance of the estimator derived earlier converges to zero. Theorem 3.6 combines the two concepts of asymptotics and shows how we can obtain a convergent estimator by sampling an expanding region with increasing density. This last result is particularly important from a practical point of view. Finally, an application of the projection estimator is illustrated with a data set of Davis (1973). Technical details are deferred to the appendix.

2 Covariogram estimation through projections

This section describes the notation, reviews standard notions of inner product spaces and introduces the model and the estimator considered in the remainder of the paper, as well as some extensions of the estimator.

2.1 Notation

To avoid confusion which might arise due to the frequent occurrence of multiple subscripts, the following notation will be used throughout the paper. Given a set of scalars A(i, j), 1 ≤ i ≤ n, 1 ≤ j ≤ m, the notation [A(i, j)] or A will denote the n × m matrix whose (i, j)-th entry is A(i, j). This notation will be used only in situations where the scalars A(i, j) and the ranges for i and j are clearly defined. Similarly, given a set of scalars B(i), 1 ≤ i ≤ n, [B(i)] or B will denote a (column) vector whose i-th entry is B(i). On the other hand, A_{i,j} and B_i may denote a matrix and vector from some (doubly) indexed set of matrices or vectors, respectively.

In the most general setting, one considers a random process Y on the domain D, a subset of a d-dimensional Euclidean space. The process Y is observed at n locations {x_i}_{i=1}^n, x_i ∈ D. Let Y_n = (Y(x_1), . . . , Y(x_n))' and Y_n(i) = Y(x_i), 1 ≤ i ≤ n. It will be further assumed that

(2)    Y_n = X_n β + η_n

with E[η_n] = 0. It will be assumed that X_n has p columns corresponding to different regression terms. Thus X_n(l, k) = u_k(x_l), 1 ≤ k ≤ p, 1 ≤ l ≤ n, where x_l is the l-th location in the sample and u_k is a continuous function defined on D which is the k-th regression term in the mean of Y. If present, the term u_1 ≡ 1 corresponds to the (non-zero) constant term in the mean of Y. The term u_k(x_l) = x_l(1) would correspond to a linear trend in the mean of Y(x) in the direction of the first component of x. The matrix X_n will always be known, while the p × 1 vector β may have to be estimated.

The function C_Y(x_1, x_2) = cov(Y(x_1), Y(x_2)) = cov(η(x_1), η(x_2)) is called the covariance function of the process Y (and of the zero-mean process η). Let K_{Y,n} = var(Y_n). Thus K_{Y,n} is a symmetric matrix whose entries are K_{Y,n}(i, j) = C_Y(x_i, x_j). If C_θ is a given covariance function model, one defines the symmetric matrix K_{θ,n} in a similar way, by putting K_{θ,n}(i, j) = C_θ(x_i, x_j). Thus K_{θ,n} is a fixed matrix depending only on the model C_θ and on the set of locations {x_i}_{i=1}^n, x_i ∈ D. The model C_θ will always be assumed to be additive, that is, of the form (1). Throughout the paper the components C_i as well as C_Y will be assumed continuous. For convenience it will be assumed that C_i(0, 0) ≤ 1. In Section 3.2.2 a discontinuous component W will be introduced, which will result in (possibly discontinuous) models of the form C_{(γ,θ)} = γW + \sum_{i=1}^{q} θ(i) C_i. The difference in notation is meant to emphasise the different nature of the functions involved.

In all sections preceding 3.3 no stationarity or isotropy assumptions are made about the process η. In Section 3.3 and the remainder of the paper the process η will be assumed isotropic (hence, in particular, second-order stationary). Thus E[η(x)] = 0 for all x ∈ D and the covariance function C_Y of η (and Y) depends only on ρ = ‖x_1 − x_2‖. It will then be convenient to introduce explicitly isotropic versions of C_Y, C_i and C_θ defined by C_Y(ρ) = C_Y(x_1, x_2), C_i(ρ) = C_i(x_1, x_2) and C_θ(ρ) = C_θ(x_1, x_2).

Given a model C_θ of the form (1), it will be said that the true covariance function C_Y is in the span of C_θ (or, equivalently, in the span of the components C_i, 1 ≤ i ≤ q) if and only if there exists a vector θ_Y such that

(3)    C_Y = \sum_{i=1}^{q} θ_Y(i) C_i.

If A is a symmetric matrix, the shorthands A > 0 and A ≥ 0 will mean that A is positive definite and positive semidefinite, respectively. Similarly, if B is another symmetric matrix of the same size as A, A > B and A ≥ B will mean A − B > 0 and A − B ≥ 0, respectively.

2.2 Orthogonal projections and estimation with additive models

The goal of this section is to summarise well-known relationships between orthogonal projections in inner product spaces and linear estimation. Let V_n be the linear space of symmetric matrices of size n × n over the field of real numbers. Let K_1, . . . , K_q be fixed, linearly independent elements of V_n with q ≤ dim(V_n) = n(n + 1)/2. Furthermore, one considers the vector subspace span(K_1, . . . , K_q) of V_n — the space of linear combinations of the elements K_1, . . . , K_q. Let

(4)    ⟨K, J⟩_V,    K ∈ V_n, J ∈ V_n,

be any inner product defined on the vector space V_n, thus making it into an inner product space. The inner product (4) gives rise to a norm on V_n defined by

(5)    ‖K − J‖_V = ⟨K − J, K − J⟩_V^{1/2}.

Let P(J) denote the orthogonal projection of J onto the subspace span(K_1, . . . , K_q). Thus P is a linear transformation satisfying

(6)    ⟨K_i, J − P(J)⟩_V = 0,    i = 1, . . . , q.

Since P(J) ∈ span(K_1, . . . , K_q), one can write P(J) = \sum_{i=1}^{q} θ_J(i) K_i. Together with (6) one obtains

(7)    \sum_{j=1}^{q} ⟨K_i, K_j⟩_V θ_J(j) = ⟨K_i, J⟩_V,    i = 1, . . . , q

or, in matrix form,

(8)    [⟨K_i, K_j⟩_V] θ_J = [⟨K_i, J⟩_V].

It is easy to see that the matrix [⟨K_i, K_j⟩_V] is invertible under the assumption of linear independence of K_1, . . . , K_q. This implies that the equation (8) has exactly one solution, given by

(9)    θ_J = [⟨K_i, K_j⟩_V]^{-1} [⟨K_i, J⟩_V].

If Y is an n-dimensional random variable with E[Y] = 0 and var(Y) = E[Y Y'] = K_Y, let the subset S_Y of V_n be defined by S_Y = {Y Y', Y ∈ R^d}. The random process Y gives rise to a probability measure on S_Y. Therefore,

(10)    θ̂ = [⟨K_i, K_j⟩_V]^{-1} [⟨K_i, Y Y'⟩_V]

is a random variable. One has

(11)    E[θ̂] = [⟨K_i, K_j⟩_V]^{-1} [⟨K_i, E[Y Y']⟩_V] = [⟨K_i, K_j⟩_V]^{-1} [⟨K_i, K_Y⟩_V].

Therefore, by the uniqueness of the solution of (8), one concludes that

(12)    P(K_Y) = \sum_{i=1}^{q} E[θ̂(i)] K_i.

On the other hand, if K_Y ∈ span(K_1, . . . , K_q), then for some vector θ_Y one has

(13)    K_Y = \sum_{i=1}^{q} θ_Y(i) K_i.

By elementary properties of projections, one immediately obtains P(K_Y) = K_Y and, again by the uniqueness of the solution of (8), it follows that E[θ̂(i)] = θ_Y(i).

Summing up, it follows that for any choice of inner product ⟨·, ·⟩_V, the random variable θ̂ of (10) is the vector minimising ‖Y Y' − \sum_{i=1}^{q} α(i) K_i‖_V. The mean vector E[θ̂] is the vector minimising ‖K_Y − \sum_{i=1}^{q} α(i) K_i‖_V. Furthermore, if K_Y is of the form (13), then E[θ̂] = θ_Y. If on the other hand K_Y is not of the form (13), the vector θ = E[θ̂], given by (11), is still a meaningful parameter, since it defines the orthogonal projection \sum_{i=1}^{q} θ(i) K_i of K_Y onto span(K_1, . . . , K_q).
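For illustration, the projection formula (9)–(10) amounts to solving a small q × q linear system once an inner product is chosen. The following is a minimal numerical sketch (Python/NumPy; the covariance components, locations and sample are invented, and the trace inner product anticipates the choice made in Section 2.3), not a definitive implementation.

```python
# Illustrative sketch of the projection estimator (9)-(10); assumptions:
# Frobenius-type inner product <A, B> = tr(A B'), invented covariance components.
import numpy as np

rng = np.random.default_rng(1)
n = 30
x = np.sort(rng.uniform(0.0, 10.0, n))            # sampling locations in D = [0, 10]
lags = np.abs(x[:, None] - x[None, :])            # pairwise distances

# Two fully specified covariance components C_1, C_2 evaluated on the sample: K_1, K_2.
K = [np.exp(-lags), np.exp(-(lags / 3.0) ** 2)]

def inner(A, B):
    """Inner product <A, B> = tr(A B') on symmetric matrices."""
    return np.trace(A @ B.T)

def project(J, K):
    """Coefficients theta_J of the projection of J onto span(K_1, ..., K_q), eq. (9)."""
    gram = np.array([[inner(Ki, Kj) for Kj in K] for Ki in K])   # [<K_i, K_j>]
    rhs = np.array([inner(Ki, J) for Ki in K])                   # [<K_i, J>]
    return np.linalg.solve(gram, rhs)

# One zero-mean Gaussian realisation with K_Y = 0.7 K_1 + 0.3 K_2; project Y Y'.
theta_true = np.array([0.7, 0.3])
K_Y = theta_true[0] * K[0] + theta_true[1] * K[1]
Y = rng.multivariate_normal(np.zeros(n), K_Y)
print("theta_hat =", project(np.outer(Y, Y), K))   # unbiased for theta_true when (13) holds
```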

2.3 The estimator

The goal is to estimate the unknown covariance function of the process Y from the observations Y_n. If β in the equation (2) is unknown, it may also be necessary to estimate it, otherwise one can work directly with η_n. In this sense, knowing β is equivalent to putting X = 0. To motivate the discussion, it is initially assumed that X = 0 and hence Y_n = η_n. One observes that

(14)    E[Y_n Y_n'] = K_{Y,n}.

Furthermore, valid covariance functions C_i, 1 ≤ i ≤ q, are assumed to be fully specified. The functions C_i give rise to the symmetric matrices K_{i,n}. One considers the class of covariance function models in (1), which results in the covariance matrix models

(15)    K_{θ,n} = \sum_{i=1}^{q} θ(i) K_{i,n},

where it will be assumed that the θ(i) are such that C_θ is a valid covariance function. A member of the class (1) is sought which will be in some way closest to the unknown true covariance function C_Y. We use the general approach outlined in Section 2.2 with the inner product

(16)    ⟨A, B⟩ = tr(A B').

The resulting norm ‖A − B‖ = ⟨A − B, A − B⟩^{1/2}, known as the Frobenius norm, is the square root of the sum of squares of the elements of A − B, which, for a symmetric matrix, equals the sum of squares of its eigenvalues. In various proofs we will also use the L_∞ norm ‖A‖_∞ = max_{i,j=1,...,n} |A_{i,j}| and the spectral radius norm ‖A‖_o = sup_{‖v‖=1} ‖Av‖, where the vector norm is the L_2 norm. We can easily show that the three norms are equivalent, i.e.,

(17)    ‖A‖_∞ ≤ ‖A‖ ≤ n‖A‖_o
(18)    ‖A‖_∞ ≥ (1/n^2)‖A‖ ≥ (1/n^2)‖A‖_o.

But only the norm (16) will be used to estimate θ.
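As a quick numerical illustration (not from the paper; the matrix below is arbitrary), the three norms and the inequalities (17)–(18) can be checked with NumPy:

```python
# Illustrative check of the norm inequalities (17)-(18) for an arbitrary symmetric matrix.
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
A = (A + A.T) / 2.0                          # symmetrise

frob = np.linalg.norm(A)                     # ||A||, the Frobenius norm from (16)
linf = np.max(np.abs(A))                     # ||A||_inf, the entrywise maximum
spec = np.linalg.norm(A, 2)                  # ||A||_o, the spectral norm

print(linf <= frob <= n * spec)              # inequality (17)
print(linf >= frob / n**2 >= spec / n**2)    # inequality (18)
```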

The estimator of θ that we propose is the projection of the matrix Y_n Y_n' (whose expectation is K_{Y,n}) onto the linear space spanned by the matrices K_{i,n}. Equivalently, the θ̂_n(i) are selected so as to minimise ‖Y_n Y_n' − \sum_{i=1}^{q} θ̂_n(i) K_{i,n}‖. The resulting estimator is

(19)    θ̂_n = [tr(K_{i,n} K_{j,n})]^{-1} [tr(K_{i,n} Y_n Y_n')] = [tr(K_{i,n} K_{j,n})]^{-1} [Y_n' K_{i,n} Y_n],

where θ̂_n = (θ̂_n(1), . . . , θ̂_n(q))'. The expression (19) should be compared to the general form (10). In particular, it follows that if the true covariogram C_Y is of the form (3) for some θ_Y, then the estimator is unbiased for θ_Y. Otherwise θ̂_n is still meaningful in the sense that \sum_{i=1}^{q} θ̂_n(i) K_{i,n} is the closest matrix of the form \sum_{i=1}^{q} α_i K_{i,n} to the matrix Y_n Y_n', where closest is defined in terms of the norm induced by the inner product (16). Similarly, θ_n = E[θ̂_n] defines \sum_{i=1}^{q} θ_n(i) K_{i,n}, which is the closest matrix of the form \sum_{i=1}^{q} α_i K_{i,n} to the matrix K_{Y,n}.

In the more general case of unknown β, Y_n no longer has mean 0, so one must instead work with the residuals e_n = Y_n − X_n β̂ = P_n Y_n of the regression model (2), where β̂ is the least-squares estimator of β and P_n = I_n − X_n (X_n' X_n)^{-1} X_n' is the orthogonal projection onto the space orthogonal to the columns of X_n. Note that E[e_n] = 0 and that

(20)    E[e_n e_n'] = P_n K_{Y,n} P_n = U_{Y,n},

which is a generalisation of (14). Let

(21)    U_{i,n} = P_n K_{i,n} P_n.

If K_{Y,n} satisfies (13), then the norm ‖U_{Y,n} − \sum_{i=1}^{q} θ_Y(i) U_{i,n}‖ = 0. Hence, to estimate the θ(i), the matrix e_n e_n' is projected onto the linear space spanned by the matrices U_{i,n} rather than the matrices K_{i,n}. Equivalently, the θ̂_n(i) are selected so as to minimise ‖e_n e_n' − \sum_{i=1}^{q} θ̂_n(i) U_{i,n}‖. The resulting estimator is

(22)    θ̂_n = [tr(U_{i,n} U_{j,n})]^{-1} [tr(U_{i,n} e_n e_n')] = [tr(U_{i,n} U_{j,n})]^{-1} [e_n' U_{i,n} e_n].

Again, if the true covariogram C_Y is of the form (3) for some θ_Y, then the estimator (22) is unbiased for θ_Y, by an argument similar to that of Section 2.2. The resulting estimate of the covariance function C_Y based on the observations Y_n is Ĉ_θ = \sum_{i=1}^{q} θ̂_n(i) C_i. Since θ̂_n is linear in the quadratic forms e_n' U_{i,n} e_n, it is easy to see that the mean and variance of θ̂_n are given by

(23)    E[θ̂_n] = [tr(U_{i,n} U_{j,n})]^{-1} [tr(U_{i,n} U_{Y,n})]  and
(24)    var(θ̂_n) = [tr(U_{i,n} U_{j,n})]^{-1} var([e_n' U_{i,n} e_n]) [tr(U_{i,n} U_{j,n})]^{-1}.

Later it will often be useful to make the assumption

(25)    var[Y_n' A_n Y_n] ≤ c tr(A_n K_{Y,n} A_n K_{Y,n})

for some constant c > 0 and any symmetric matrix A_n (in the Gaussian case this holds with equality for c = 2). In many situations even more severe constraints may be necessary. It may be required that θ(i) > 0 for all i, and in covariogram estimation it is necessary that \sum_i θ(i) C_i(x_j, x_k) be a positive definite function. When these constraints are imposed, the optimisation may have to be carried out in a convex cone and not the entire vector space. Possible ways of addressing these difficulties include truncated estimators or quadratic optimisation with linear constraints. In general, such methods tend to introduce a bias, but they often reduce the estimator's MSE. These concerns are not relevant to the asymptotic results derived in the remainder of this paper for the case where (3) holds. Otherwise, a judicious choice of the model components C_i can still guarantee that the limit is a valid covariance function (see, e.g., Powojowski, 2002b). However, to apply the estimator (22) in practice for a finite sample, one will have to address these issues.
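To make the estimator (22) concrete, here is a short numerical sketch (Python/NumPy; the covariance components, the linear-trend regressor and the simulated data are invented for illustration, and this is not the authors' code). It builds the matrices U_{i,n} = P_n K_{i,n} P_n, projects e_n e_n' onto their span, and returns θ̂_n.

```python
# Illustrative sketch of the projection estimator (22) with an estimated mean.
# Assumptions (not from the paper): d = 1, two covariance components, linear trend.
import numpy as np

def projection_estimator(Y, X, K_list):
    """theta_hat of (22): project e e' onto span{P K_i P}, P = I - X(X'X)^{-1}X'."""
    n = len(Y)
    P = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)    # projection orthogonal to col(X)
    e = P @ Y                                            # residuals e_n = P_n Y_n
    U = [P @ K @ P for K in K_list]                      # U_{i,n} = P_n K_{i,n} P_n
    gram = np.array([[np.trace(Ui @ Uj) for Uj in U] for Ui in U])
    rhs = np.array([e @ Ui @ e for Ui in U])             # e' U_i e = tr(U_i e e')
    return np.linalg.solve(gram, rhs)

rng = np.random.default_rng(2)
n = 40
x = np.sort(rng.uniform(0.0, 10.0, n))
lags = np.abs(x[:, None] - x[None, :])
K_list = [np.exp(-lags), np.exp(-(lags / 3.0) ** 2)]     # fully specified components C_1, C_2
X = np.column_stack([np.ones(n), x])                     # constant + linear trend (p = 2)

beta = np.array([5.0, 0.5])
theta_true = np.array([1.0, 0.5])
K_Y = theta_true[0] * K_list[0] + theta_true[1] * K_list[1]
Y = X @ beta + rng.multivariate_normal(np.zeros(n), K_Y)
print("theta_hat =", projection_estimator(Y, X, K_list))
```

In a real application the returned θ̂_n would still have to be checked against the validity constraints discussed above (for instance by truncating negative components).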

3 Asymptotic results

This section contains the main results of the paper. Firstly, the in-fill asymptotic setting considered throughout the paper is defined. Assuming in-fill sampling on a finite domain, one obtains an expression for the asymptotic mean and a bound for the asymptotic variance of the projection estimator. Subsequently, the effect that the size of the domain on which the in-fill sequence is defined has on the obtained asymptotic variance bound is investigated. Various extensions and special cases are considered, such as the presence of the so-called nugget effect, or the case of a process with known mean.

3.1 Asymptotic settings

Various asymptotic settings are possible in geostatistics. In all cases it will be assumed that (2) holds. The number of observations n will be allowed to tend to infinity. However, additional considerations arise in defining a setting for an asymptotic theory. These have to do with the relative locations of the observations x_i in the domain of the process, and the size and shape of the domain itself. Many authors (e.g., Cressie, 1993) have distinguished between two basic asymptotic settings: the so-called in-fill asymptotics, in which the observations are placed within a compact domain D, and the expanding-domain asymptotics, where the observations are spread over an increasing family of domains {D_i}_{i=1,...}, D_i ⊆ D_{i+1}. Clearly, even this does not fully define the problem. In both settings the observations may be placed on some regular grid or in any geometric arrangement whatsoever. Each such arrangement generally gives rise to a different model as in (2), even though the underlying process Y is the same. Various precise definitions of asymptotic settings have been used by many authors. In-fill configurations have been considered, among others, by Stein (1987, 1989); Stein & Handcock (1989); Lahiri (1996). Expanding-domain schemes in which the minimal distance between observations remains bounded from below by a positive value have been considered by Cressie & Grondona (1992); Cressie (1993) and others. Finally, schemes combining both in-fill and expanding-domain properties in a sampling configuration have been considered by Hall & Patil (1994); Lahiri et al. (1999).

3.1.1 Sampling point sets and low-discrepancy point sequences

This section reviews some basic notions of low-discrepancy point sequences which will be required in the remainder of the paper. A general background on this subject may be found in Niederreiter (1992) and a comprehensive overview of recent work may be found in Niederreiter & Spanier (2000). Given a set D ⊂ R^d, a sample point set on D is any finite set S = {x_1, . . . , x_n} of points in D. The cardinality of S will be denoted by |S|. Throughout this paper D will be a compact, simply connected set.

Definition 3.1 Let ψ ∈ C^d(D) (ψ has d-th order continuous partials on D) and let x_0 ∈ D be a fixed point. The variation of ψ in the sense of Hardy and Krause is defined as

(28)    V^*_{D,x_0}(ψ) = \sum_{∅ ≠ u ⊆ \{1,...,d\}} \int_{D_u} \left| \frac{∂^{|u|}}{∂x_u} ψ(x_u, x_0) \right| dx_u,

where the notation ψ(x_u, x_0) means that all components x_u(j) with j ∉ u are set to x_0(j). Let V_D^*(ψ) = inf_{x_0 ∈ D} V^*_{D,x_0}(ψ).

Definition 3.2 The star discrepancy of the sampling point set S is defined as

(29)    D^*(S) = ‖F_unif(x) − F_S(x)‖_∞ = \sup_{x ∈ D} \left| μ(\{y ∈ D : y(j) ≤ x(j), j = 1, . . . , d\}) − \frac{1}{|S|} \sum_{s ∈ S} 1\{s(j) ≤ x(j), j = 1, . . . , d\} \right|.

F_unif denotes the uniform probability measure on D. This measure is clearly absolutely continuous with respect to the Lebesgue measure and f ≡ 1/μ(D) will denote its Radon-Nikodym derivative — the uniform probability density function on D. The following result will be essential.

Theorem 3.1 (Koksma-Hlawka) For any sampling point set S, any ψ ∈ C^d(D) and any fixed x_0 ∈ D the following inequality holds:

(30)    e = \left| \int_D ψ(x) f(x) dx − \frac{1}{|S|} \sum_{s ∈ S} ψ(s) \right| = \left| \int_D ψ(x) \, d(F_unif(x) − F_S(x)) \right| ≤ D^*(S) V^*_{D,x_0}(ψ),

and hence e ≤ D^*(S) V_D^*(ψ).

Sequences {S_n}_{n=1}^∞ of sampling point sets on D with |S_n| = n will be considered, and for the sake of brevity the notation

(31)    δ_n = D^*(S_n)

will be used. The (star) discrepancy δ_n plays an important role in the rate of convergence of the expected value of the projection estimator, as we shall soon see. The following result is attributed to Korobov (1959).

Theorem 3.2 For the unit cube D = [0, 1]^d there exist sampling point set sequences with discrepancies satisfying

(32)    δ_n < c n^{-1} (log n)^d

for some constant c. A sequence {S_n}_{n=1}^∞ satisfying (32) is called a low-discrepancy sequence.

The original statement of the theorem contains a method of constructing low-discrepancy sequences. The problem of constructing low-discrepancy sequences on various domains has been studied by many authors. In particular, Fang & Wang (1994) discuss such a construction on an arbitrary compact domain.
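As an illustration (not from the paper), the sketch below builds a two-dimensional Halton point set — a standard low-discrepancy construction, used here in place of the Korobov sets of Theorem 3.2 — and compares the quasi-Monte Carlo average of a smooth test function with its integral and with a purely random design, in the spirit of the Koksma-Hlawka bound (30).

```python
# Illustrative sketch: a 2-D Halton point set (a low-discrepancy construction) and
# the integration error of its sample average for a smooth test function on [0, 1]^2.
import numpy as np

def van_der_corput(n_points, base):
    """First n_points terms of the van der Corput sequence in the given base."""
    seq = np.zeros(n_points)
    for i in range(n_points):
        k, f, x = i + 1, 1.0, 0.0
        while k > 0:
            f /= base
            x += f * (k % base)
            k //= base
        seq[i] = x
    return seq

def halton_2d(n_points):
    """2-D Halton points: van der Corput sequences in bases 2 and 3."""
    return np.column_stack([van_der_corput(n_points, 2), van_der_corput(n_points, 3)])

psi = lambda x: np.exp(-x[:, 0] - x[:, 1])           # smooth test function of bounded variation
exact = (1.0 - np.exp(-1.0)) ** 2                    # its integral over the unit square

rng = np.random.default_rng(0)
for n in (64, 256, 1024):
    S = halton_2d(n)
    R = rng.random((n, 2))
    print(n, abs(psi(S).mean() - exact), abs(psi(R).mean() - exact))
```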

3.1.2 Asymptotic setting definitions

Precise meaning will now be given to the notion of in-fill asymptotics used in subsequent discussion. Our in-fill asymptotic context differs from that considered by other authors in that it does not require the observations to be equally spaced (as opposed to, for example, Stein, 1987) but nevertheless specifies a precise limiting coverage (as opposed to Stein, 1989, where the limiting coverage is not explicitly characterised).

Definition 3.3 An in-fill sampling domain (D, {S_n}_{n=1}^∞) consists of a compact, simply connected domain D ⊂ R^d and a sequence {S_n}_{n=1}^∞ of sampling point sets on D.

For each n, S_n = {x^n_1, . . . , x^n_n}, with |S_n| = n, and let δ_n be as in (31). The vector Y_n = (Y(x^n_1), . . . , Y(x^n_n))' will be referred to as the sample of size n.

Next, in-fill sampling and expanding domains will be combined. One considers a sequence of in-fill sampling domains. Let an in-fill sampling domain (D, {S_n}_{n=1}^∞) be given. Let {r_m}_{m=1}^∞ be an increasing and unbounded sequence of real numbers with r_1 = 1. Let T_m(x) = r_m x, x ∈ R^d, be the dilation operator. It will be assumed that the origin 0 lies in the interior of D, so that the images T_m(D) increase to cover all of R^d. Furthermore, let

(33)    D_m = T_m(D)
(34)    x^n_{m,j} = T_m(x^n_j),    x^n_j ∈ S_n, j = 1, . . . , n
(35)    S_{m,n} = {x^n_{m,1}, . . . , x^n_{m,n}}.

Thus each pair (D_m, {S_{m,n}}_{n=1}^∞) is an in-fill sampling domain. The following definition will be useful in discussing situations where the sampled domain is allowed to expand.

Definition 3.4 The sequence {(D_m, {S_{m,n}}_{n=1}^∞)}_{m=1}^∞, where D_m and S_{m,n} are as in (33) and (35) with D a compact simply connected set containing 0 as an interior point, will be called a sequence of expanding in-fill domains. To simplify notation it will be assumed that μ(D) = 1, where μ is the Lebesgue measure.

The requirement that μ(D) = 1 does not cause any loss of generality, since all asymptotic arguments will deal with the case where D_m is allowed to increase indefinitely as m increases.

Remark 3.1. With the definitions above, one has

(36)    D^*(S_{m,n}) = D^*(S_n), and hence δ_{m,n} = δ_n for m ≥ 1.

3.2 Asymptotics on a bounded domain

In this section it will be assumed that observations are collected on a fixed bounded domain, while the number of observations increases.

3.2.1 Process with unknown mean or a trend (X ≠ 0)

This section explores the properties of the projection estimator as an increasing number of observations from an in-fill sampling sequence on a finite domain become available. We are interested in finding the limiting mean of the estimator and in bounding its variance. From (23),

(37)    E[θ̂_n] = A_{X,n}^{-1} M_{X,n},

where the estimator θ̂_n is of the form (22) and A_{X,n} and M_{X,n} are defined as follows:

(38)    A_{X,n} = (1/n^2) [tr(U_{i,n} U_{j,n})]
(39)    M_{X,n} = (1/n^2) [tr(U_{i,n} U_{Y,n})].

Here, U_{Y,n} and U_{i,n} are defined as in (20) and (21). To bound the variance, we introduce

(40)    B_{X,n} = (1/n^4) [tr(U_{i,n} U_{Y,n} U_{j,n} U_{Y,n})]
(41)    E_{X,n} = A_{X,n}^{-1} diag(B_{X,n}) A_{X,n}^{-1},

where diag(B_{X,n}) is a matrix whose diagonal elements are the same as those of B_{X,n}, while the off-diagonal elements are zero. To describe the limits of the quantities (38) through (41), we will need to introduce functions φ_i which depend on the functions u_i used in defining the matrix of regressors X_n as in Section 2.1. Consider the matrix R_n = (1/n) X_n' X_n and let the p × p matrix R be defined by

(42)    R(i, j) = \int_D u_i(ξ) u_j(ξ) f(ξ) dξ,

where f ≡ 1/μ(D). It is easily seen that

(43)    ‖R_n − R‖ ≤ p ‖R_n − R‖_∞ ≤ p \max_{k_1,k_2=1,...,p} V_D^*(u_{k_1}(·) u_{k_2}(·)) δ_n = k_R δ_n,

and if R is invertible, then for large enough n the matrix X_n' X_n is invertible and, by Lemma A.5,

(44)    ‖n(X_n' X_n)^{-1} − R^{-1}‖_∞ ≤ ‖n(X_n' X_n)^{-1} − R^{-1}‖ ≤ ‖R^{-1}‖^2 k_R δ_n.

Therefore, if the observations come from an in-fill sampling domain (D, {S_n}_{n=1}^∞), R_n converges to R. Note that the estimator θ̂_n of (22) depends on X_n only through P_n, the projection matrix on the space orthogonal to the columns of X_n. This projection remains unchanged if an invertible linear transformation is applied to the rows of X_n. In particular, we can transform it using the matrix P such that P'RP = I. Hence, without loss of generality, we can assume that R = I, so that ‖R‖ = ‖R^{-1}‖ = √p. This means that we can assume that

(45)    \int_D u_i(ξ) u_j(ξ) f(ξ) dξ = δ_{i,j},

where δ_{i,j} = 1 if i = j and zero otherwise. In the remainder of this paper, this will indeed be assumed. Let us now introduce the functions φ_i:

(46)    φ_i(x_{l_1}, x_{l_2}) = C_i(x_{l_1}, x_{l_2}) − \sum_{k=1}^{p} u_k(x_{l_2}) \int_D u_k(ξ) C_i(ξ, x_{l_1}) f(ξ) dξ
                − \sum_{k=1}^{p} u_k(x_{l_1}) \int_D u_k(ξ) C_i(ξ, x_{l_2}) f(ξ) dξ
                + \sum_{k_1=1}^{p} \sum_{k_2=1}^{p} u_{k_1}(x_{l_1}) u_{k_2}(x_{l_2}) \int_D \int_D u_{k_1}(ξ) u_{k_2}(η) C_i(ξ, η) f(ξ) f(η) dξ dη.

Similarly, one defines φ_Y by replacing C_i by C_Y in the equation above. These definitions are rather technical in nature. Their precise role can be seen in the proofs of the results of this section, but intuitively they arise from considering cov(Y(x_1) − Ŷ(x_1), Y(x_2) − Ŷ(x_2)), where Ŷ = Xβ̂. The terms involving integrals can be associated with the covariances of the predictors Ŷ with Y and with themselves. As will be shown in Lemma 3.1, the following matrices will be the limits of (38) through (41):

(47)    A_X(i, j) = \int_D \int_D φ_i(ξ_1, ξ_2) φ_j(ξ_2, ξ_1) f(ξ_1) f(ξ_2) dξ_1 dξ_2
(48)    M_X(i) = \int_D \int_D φ_i(ξ_1, ξ_2) φ_Y(ξ_2, ξ_1) f(ξ_1) f(ξ_2) dξ_1 dξ_2
(49)    B_X(i, j) = \int_D \int_D h_{X,i}(ξ, η) h_{X,j}(η, ξ) f(ξ) f(η) dξ dη
(50)    h_{X,i}(ξ, η) = \int_D φ_i(ξ, λ) φ_Y(λ, η) f(λ) dλ
(51)    E_X = A_X^{-1} diag(B_X) A_X^{-1}.

Lemma 3.1 For observations coming from an in-fill sampling domain (D, {S_n}_{n=1}^∞), with the u_k, k = 1, . . . , p, continuous and such that R = I, assuming A_X is invertible, there exist constants such that

(52)    ‖A_{X,n} − A_X‖_∞ ≤ c_{A,X} δ_n
(53)    ‖M_{X,n} − M_X‖_∞ ≤ c_{M,X} δ_n
(54)    ‖B_{X,n} − B_X‖_∞ ≤ c_{B,X} δ_n
(55)    ‖E_{X,n} − E_X‖_∞ ≤ c_{E,X} δ_n.

The following result is the main result of this section.

Theorem 3.3 Under the model (2), with X_n containing continuous regressor functions and such that the matrix R of (42) equals the identity, if the observations come from an in-fill sampling domain (D, {S_n}_{n=1}^∞), and the matrix A_X of (47) is invertible, there is a constant such that the mean of the projection estimator θ̂_n defined by (22) satisfies

(56)    ‖E[θ̂_n] − A_X^{-1} M_X‖_∞ ≤ c_{θ,X} δ_n.

If, moreover, (25) holds, the variance of the estimator satisfies

(57)    var(θ̂_n(i)) ≤ q c (E_X(i, i) + c_{E,X} δ_n).

Remark 3.2. Note that if the covariance function of the process is C_Y = \sum_{i=1}^{q} θ_Y(i) C_i as in (1), then φ_Y(x_1, x_2) = \sum_{i=1}^{q} θ_Y(i) φ_i(x_1, x_2), and so it is easy to see that M_X(i) = \sum_{j=1}^{q} θ_Y(j) A_X(i, j), i.e., M_X = A_X θ_Y. Hence the asymptotic limit of the mean of the estimator, A_X^{-1} M_X, is simply θ_Y, i.e., the estimator is unbiased. This is of course not surprising since we have already argued that it is unbiased for all n when C_Y satisfies (1). If on the other hand C_Y does not satisfy (1), the parameter θ_∞ ≡ A_X^{-1} M_X can still be defined as the limit of E[θ̂_n] and it has some interesting properties. From earlier discussion and from the discussion in Section 2.2 it follows that

(58)    \lim_{n→∞} E[C_{θ̂_n}(x, y)] = \sum_{i=1}^{q} \big(\lim_{n→∞} E[θ̂_n(i)]\big) C_i(x, y) = \sum_{i=1}^{q} θ_∞(i) C_i(x, y),

which, when viewed as a function of (x, y), is the orthogonal projection of the function φ_Y(x, y) onto the space spanned by the functions φ_i(x, y), where the inner product between two functions φ_1, φ_2 on D^2 is defined by

(59)    ⟨φ_1, φ_2⟩ = \int_D \int_D φ_1(ξ, η) φ_2(η, ξ) f(ξ) f(η) dξ dη.

This is somewhat comforting, since it means that if the functions C_i are selected so that the space they span is sufficiently rich to contain elements close to C_Y, the obtained estimator's bias should be small.

Remark 3.3. The theorem shows that the variance of the components of θ̂_n remains bounded above by the diagonal entries of the matrix E_X even as the number of observations increases in this fixed domain. In the case of a Gaussian process, we can even be more specific:

(60)    var(θ̂_n) = 2 A_n^{-1} B_n A_n^{-1},

which converges to 2 A^{-1} B A^{-1}. This matrix will rarely be zero. Since A is invertible, A^{-1} B A^{-1} has the same rank as B, and it is positive semidefinite, being a variance matrix. Hence the limiting matrix could only be zero if B = 0. This last condition will generally not hold (see the next remark for the case where the mean is known, where it is easier to see why). So even though (57) is an upper bound, this discussion shows that in general there will not be convergence of the variance to zero and therefore θ̂_n is not consistent. This should be compared to the results of Matheron (1965), who shows the impossibility of consistent estimation of the empirical variogram based on complete information about the process over a finite domain.

Remark 3.4. Consider the case when the mean of the process is assumed known. This is equivalent to assuming that the matrix X_n = 0 in (2). Then we can define A_n, M_n, B_n, and E_n as in equations (38) through (41) by replacing U_{i,n} by K_{i,n} and U_{Y,n} by K_{Y,n}. Their limits A, M, B, E are as in equations (47) through (51), replacing φ_i(x_1, x_2) by C_i(x_1, x_2) and φ_Y(x_1, x_2) by C_Y(x_1, x_2). With these definitions, the conclusions of Lemma 3.1 and Theorem 3.3 remain valid. In this case, it is easier to see that the matrix B is unlikely to be 0 (see the previous remark). For instance, if the covariance functions C_i and C_Y are everywhere nonnegative, then the function h_i(ξ, η) will be positive (unless the support of f is where the covariance functions are zero). In this case, the limiting covariance function (58) can now be viewed as the orthogonal projection of the function C_Y(x, y) onto the space spanned by the functions C_i(x, y) (instead of the projection of φ_Y onto the space spanned by the φ_i) with the same inner product (59).
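The following small simulation sketch (Python/NumPy; the components, domain and sample sizes are invented, and it is not part of the paper) illustrates the known-mean case of Remark 3.4 for a Gaussian process: the estimator's mean stays close to θ_Y as the in-fill sample grows, while its sampling variance does not shrink to zero, in line with (57) and (60).

```python
# Illustrative in-fill simulation (known mean, X = 0): the projection estimator's
# sampling variance does not vanish as n grows on the fixed domain D = [0, 1].
import numpy as np

rng = np.random.default_rng(3)
theta_true = np.array([1.0, 0.5])

def estimate_once(n):
    x = (np.arange(n) + 0.5) / n                     # a simple regular in-fill design on [0, 1]
    lags = np.abs(x[:, None] - x[None, :])
    K = [np.exp(-lags), np.exp(-(lags / 0.3) ** 2)]  # two fixed components C_1, C_2
    K_Y = theta_true[0] * K[0] + theta_true[1] * K[1]
    Y = rng.multivariate_normal(np.zeros(n), K_Y)
    gram = np.array([[np.trace(Ki @ Kj) for Kj in K] for Ki in K])
    rhs = np.array([Y @ Ki @ Y for Ki in K])
    return np.linalg.solve(gram, rhs)

for n in (25, 50, 100):
    est = np.array([estimate_once(n) for _ in range(200)])
    # mean is close to theta_true; the variance stabilises instead of tending to 0
    print(n, est.mean(axis=0).round(3), est.var(axis=0).round(3))
```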

3.2.2 The nugget effect

In the practice of geostatistics it is common to consider processes of the form

(61)    Y_ε(x) = Y(x) + ε(x),

where Y(x) is as in previous sections and where ε(x) is a zero-mean random variable with a finite variance γ and where, for x_1 ≠ x_2, the random variables ε(x_1) and ε(x_2) are uncorrelated. The processes Y(x) and ε(x) are assumed uncorrelated as well. The variance γ of the term ε(x) is traditionally called the nugget effect in geostatistics. This section reviews the effect of the presence of a nugget effect in the model in the setting of the previous section.

To estimate the covariance of the process with a nugget effect, a discontinuous covariance component is added to the model (1). This discontinuous component will be called the nugget effect covariance component and it will be denoted by W in order to differentiate it from the continuous components C_i. The component W is defined as

(62)    W(ξ, η) = \begin{cases} 1 & \text{if } ξ = η \\ 0 & \text{otherwise.} \end{cases}

It will be assumed that the model (61) holds and hence the true covariance function of Y_ε is

(63)    C_{Y,ε} = C_Y + γ W,

where W is discontinuous as defined above, while C_Y, the covariance function of the process Y, is a continuous function as in the previous sections. The model to be fitted will be of the form

(64)    C_{γ,θ} = γ W + \sum_{i=1}^{q} θ(i) C_i = γ W + C_θ.

Because of the discontinuity of the component W, the theory of the previous section does not apply directly. In fact, as we will soon see, the rates of convergence for γ and for θ will differ. Following the argument leading to the definition of θ̂_n in (22), the projection estimator is easily obtained:

(65)    (γ̂_n, θ̂_n)' = Δ_n^{-1} Ψ_n,

where

(66)    Δ_n = \begin{pmatrix} n − p & [tr(U_{i,n})]' \\ [tr(U_{i,n})] & [tr(U_{i,n} U_{j,n})] \end{pmatrix}
(67)    Ψ_n = \begin{pmatrix} e_n' e_n \\ [e_n' U_{i,n} e_n] \end{pmatrix}.

Therefore, given an observed vector Y_n one has

(68)    C_{γ̂_n, θ̂_n} = γ̂_n U_{W,n} + \sum_{i=1}^{q} θ̂(i) U_{i,n},

where U_{W,n} = P_n I_n P_n = P_n, since P_n is a projection matrix. To obtain the asymptotic behaviour of the projection estimator (65), we define the following matrices:

(69)    Γ_n = \begin{pmatrix} n^{-1} & 0 \\ 0 & n^{-2} I_q \end{pmatrix}
(70)    Υ_n = E[Ψ_n] = \begin{pmatrix} tr(U_{W,n} U_{(Y,ε),n}) \\ [tr(U_{i,n} U_{(Y,ε),n})] \end{pmatrix}
(71)    Ξ_n = \begin{pmatrix} tr(U_{W,n} U_{(Y,ε),n} U_{W,n} U_{(Y,ε),n}) & [tr(U_{i,n} U_{(Y,ε),n} U_{W,n} U_{(Y,ε),n})]' \\ [tr(U_{i,n} U_{(Y,ε),n} U_{W,n} U_{(Y,ε),n})] & [tr(U_{i,n} U_{(Y,ε),n} U_{j,n} U_{(Y,ε),n})] \end{pmatrix},

where U_{(Y,ε),n} = P_n(K_{Y,n} + γ I_n) P_n = U_{Y,n} + γ P_n. We now define

(72)    A_{ε,n} = Γ_n Δ_n = \begin{pmatrix} n^{-1}(n − p) & n^{-1}[tr(U_{i,n})]' \\ n^{-2}[tr(U_{i,n})] & A_{X,n} \end{pmatrix}
(73)    M_{ε,n} = Γ_n Υ_n = \begin{pmatrix} n^{-1} tr(U_{Y,n}) + γ n^{-1}(n − p) \\ M_{X,n} + γ n^{-2}[tr(U_{i,n})] \end{pmatrix}
(74)    B_{ε,n} = Γ_n Ξ_n Γ_n
(75)    E_{ε,n} = A_{ε,n}^{-1} diag(B_{ε,n}) (A_{ε,n}^{-1})',

where p is the number of columns in the regression matrix X, [tr(U_{i,n})] is a q × 1 matrix, [tr(U_{i,n} U_{j,n})] is a q × q matrix, and where the matrices U_{i,n} are defined as in (22), while A_{X,n} and M_{X,n} were defined in the previous section. It is important to note that the premultiplication by the matrix Γ_n implies that the terms involving γ̂ alone converge at the rate n^{-1}, whereas those that involve θ̂ alone converge at the rate n^{-2}, as in the previous section. Note also that the matrices are no longer symmetric, with one cross-product carrying an n^{-1} term whereas the other has an n^{-2} term. This will have an effect on the limiting matrices, as we will now see. To extend the results of the previous section to the case with the nugget effect, we need the following definitions:

(76)    a(i) = \int_D φ_i(ξ, ξ) f(ξ) dξ, with φ_i as in (46),
(77)    m_0 = γ + \int_D φ_Y(ξ, ξ) f(ξ) dξ,
(78)    b_0 = \int_D \int_D φ_Y(ξ, η) φ_Y(η, ξ) f(ξ) f(η) dξ dη,
(79)    b(i) = \int_D \int_D h_{X,i}(ξ, η) φ_Y(η, ξ) f(ξ) f(η) dξ dη, with h_{X,i} as in (50),
(80)    A_ε = \begin{pmatrix} 1 & a' \\ 0 & A_X \end{pmatrix}, where A_X is q × q, given by (47),
(81)    M_ε = \begin{pmatrix} m_0 \\ M_X \end{pmatrix}, where M_X is q × 1, given by (48),
(82)    B_ε = \begin{pmatrix} b_0 & b' \\ b & B_X \end{pmatrix}, where B_X is q × q, given by (49),
(83)    E_ε = A_ε^{-1} diag(B_ε) (A_ε^{-1})'.

The following lemma further generalises Lemma 3.1.

Lemma 3.2 For observations coming from the process in (61) sampled on an in-fill sampling domain (D, {S_n}_{n=1}^∞), with the u_k, k = 1, . . . , p, continuous and such that R = I, assuming A_X is invertible, there exist constants such that

(84)    ‖A_{ε,n} − A_ε‖_∞ ≤ c_{A,ε} δ_n
(85)    ‖M_{ε,n} − M_ε‖_∞ ≤ c_{M,ε} δ_n
(86)    ‖B_{ε,n} − B_ε‖_∞ ≤ c_{B,ε} δ_n
(87)    ‖E_{ε,n} − E_ε‖_∞ ≤ c_{E,ε} δ_n.

The following result holds.

Theorem 3.4 Under the model (61), with Y as in (2), where X contains continuous regressor functions and such that the matrix R of (42) equals the identity, if the observations come from an in-fill sampling domain (D, {S_n}_{n=1}^∞), and the matrix A_ε of (80) is invertible, the projection estimator defined by (65) has a mean satisfying

(88)    \Big\| E\begin{pmatrix} γ̂_n \\ θ̂_n \end{pmatrix} − A_ε^{-1} M_ε \Big\|_∞ = \Big\| E\begin{pmatrix} γ̂_n \\ θ̂_n \end{pmatrix} − \begin{pmatrix} m_0 − a' A_X^{-1} M_X \\ A_X^{-1} M_X \end{pmatrix} \Big\|_∞ ≤ c_{θ,ε} δ_n,

where θ = A_X^{-1} M_X is as in (56). If, moreover, (25) holds, the variance of the estimator satisfies

(89)    var(γ̂_n) ≤ (q + 1) c (E_ε(1, 1) + c_{E,ε} δ_n)
(90)    var(θ̂_n(i)) ≤ (q + 1) c (E_ε(i + 1, i + 1) + c_{E,ε} δ_n),    i = 1, . . . , q.
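For completeness, here is a minimal sketch (Python/NumPy; the components and simulated data are invented, and this is not the authors' code) of the nugget-augmented projection estimator (65)–(67), which simply adds the nugget column to the linear system of Section 2.3.

```python
# Illustrative sketch of the nugget-augmented projection estimator (65)-(67).
import numpy as np

def projection_estimator_nugget(Y, X, K_list):
    """Return (gamma_hat, theta_hat) from (65), with Delta_n and Psi_n as in (66)-(67)."""
    n = len(Y)
    P = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # P_n, projection orthogonal to col(X)
    e = P @ Y                                           # residuals e_n
    U = [P @ K @ P for K in K_list]                     # U_{i,n} = P_n K_{i,n} P_n
    p, q = X.shape[1], len(U)
    Delta = np.empty((q + 1, q + 1))
    Delta[0, 0] = n - p                                 # tr(U_{W,n} U_{W,n}) = tr(P_n) = n - p
    Delta[0, 1:] = Delta[1:, 0] = [np.trace(Ui) for Ui in U]
    Delta[1:, 1:] = [[np.trace(Ui @ Uj) for Uj in U] for Ui in U]
    Psi = np.concatenate([[e @ e], [e @ Ui @ e for Ui in U]])
    sol = np.linalg.solve(Delta, Psi)
    return sol[0], sol[1:]                              # (gamma_hat, theta_hat)

rng = np.random.default_rng(4)
n = 50
x = np.sort(rng.uniform(0.0, 10.0, n))
lags = np.abs(x[:, None] - x[None, :])
K_list = [np.exp(-lags), np.exp(-(lags / 3.0) ** 2)]
X = np.column_stack([np.ones(n), x])
gamma_true, theta_true = 0.2, np.array([1.0, 0.5])
K_Y = theta_true[0] * K_list[0] + theta_true[1] * K_list[1] + gamma_true * np.eye(n)
Y = X @ np.array([5.0, 0.5]) + rng.multivariate_normal(np.zeros(n), K_Y)
print(projection_estimator_nugget(Y, X, K_list))
```

As in the case without a nugget, the returned γ̂ and θ̂ may still need to be constrained in practice so that the fitted model is a valid covariance function.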

Remark 3.5. If the covariance of the process Y_ε satisfies the linear model equation (64), then the projection estimator for γ and θ is asymptotically unbiased, since A_X^{-1} M_X = θ and it is easy to see that m_0 − a'θ = γ in this case. In fact, for any finite n, the estimator is unbiased. But if the covariance model for the Y part of Y_ε is not correct, then both the estimators of θ and γ will be biased. Also, as in the case without the nugget effect, the limiting variance of the estimator is finite and non-zero in general.

Remark 3.6. It is of course of interest to find out what happens if one does not include the term γW in the covariance model (64) even though there is a nugget effect, so that the covariance function is given by (63). Interestingly, there is no effect on the asymptotic mean and variance of θ̂_n. To see this, one examines the formulae (47), (48) and (49). Clearly, (47) is unaffected, while (48) yields

(91)    \lim_{n→∞} \frac{1}{n^2} tr\big(U_{i,n}(U_{Y,n} + γ I_n)\big) = \lim_{n→∞} M_{X,n}(i) + γ \lim_{n→∞} \frac{1}{n^2} tr(U_{i,n}) = M_X(i),

so that the limit coincides with that of the previous section. A similar argument shows that the limit of B_{X,n} is also unchanged. Hence if the covariance model for Y is correct, then the covariance estimator C_{θ̂_n}(x, y) is asymptotically unbiased for x ≠ y. Without the nugget component, of course, the estimator is biased for x = y. It is important to note that, in general, omitting a component in the covariance which is present in the true covariance, say θ_1 C_1, will lead to biased estimates of the other components and change the asymptotic means and variances of these components. It is the very special nature of the nugget component W (a discontinuous delta function), which changes the rates of convergence, that leads to this surprising conclusion.

3.3 Asymptotics on expanding domains

In the previous sections it was shown that under some rather general conditions, if the observations come from an in-fill sampling sequence on a compact domain, the variances of the components θ̂(i) of the estimator (22) are bounded by a multiple of the diagonal elements of the matrix E (where the precise definition of E may be given by (51) or (83), depending on context). It was also seen that those bounds are generally strictly positive and that even in the simple case of a Gaussian process, var(θ̂_n(i)) remains bounded away from zero. In this section attention will be focused on isotropic processes and the effect of the size of the finite domain on the limiting variance will be considered. The limit matrix E will be found to depend on the size of the domain D. The main result derived in the subsequent sections states that under fairly mild regularity conditions the entries of the matrix E converge to zero as the sampled domain D is allowed to grow indefinitely. It will therefore follow that var(θ̂_n(i)) can be made arbitrarily small by sampling a sufficiently large domain at a sufficient number of locations.

It will be assumed that a sequence {(D_m, {S_{m,n}}_{n=1}^∞)}_{m=1}^∞ of expanding in-fill domains is given, as in Definition 3.4. The subscript m runs over domains in the expanding sequence {D_m}. The results obtained so far do not make the assumption of stationarity of η in (2). Throughout the remaining sections it will be assumed that the random process η, as well as the component covariance models C_i, are isotropic and therefore (second-order) stationary. In the case of an isotropic process the covariance functions C_i(x_1, x_2), C_θ(x_1, x_2) and C_Y(x_1, x_2) depend only on ρ = ‖x_1 − x_2‖ and the following notation will be used along with the current notation: C_i(ρ) = C_i(x_1, x_2), C_θ(ρ) = C_θ(x_1, x_2) and C_Y(ρ) = C_Y(x_1, x_2). Thus the model (2) will remain unchanged, but (1) will now have an equivalent isotropic version

(92)    C_θ = \sum_{i=1}^{q} θ(i) C_i.

3.3.1 Isotropic fields

In the case of the isotropic field, the equations (52) and (53) take a simplified form. The double integrals of (52) and (53) can be replaced by single integrals if the general covariance forms C_Y, C_θ and C_i are replaced by their isotropic counterparts C_Y(ρ), C_θ(ρ) and C_i(ρ). These new expressions will be essential in subsequent considerations deriving the limit of the matrix E as the size of the domain D grows indefinitely. This section establishes some fairly technical details necessary to obtain the new expressions. One considers the measure

(93)    F_2(B) = \int_B f(ξ) f(η) dξ dη,

where B is any Lebesgue-measurable subset of D^2. The following defines a measure on [0, diam(D)]:

(94)    G(A) = F_2(\{(ξ, η) : ‖ξ − η‖ ∈ A\}),

where A is any Lebesgue-measurable subset of [0, diam(D)]. (The fact that the distance function is continuous guarantees that the set {(ξ, η) : ‖ξ − η‖ ∈ A} is measurable.) It is easily seen that if φ_i(ξ, η) = Φ_i(ρ), i = 1, 2, are measurable and isotropic, then

(95)    \int_D \int_D φ_1(ξ, η) φ_2(η, ξ) dF_2(ξ, η) = \int_0^{diam(D)} Φ_1(ρ) Φ_2(ρ) dG(ρ).
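As a small illustration (not from the paper), the distance-distribution measure G of (94) is easy to approximate by Monte Carlo for a simple domain; the sketch below assumes D is the unit square with uniform density f ≡ 1, and compares G([0, ρ]) with the area πρ² of a disc of radius ρ, which dominates for small ρ when edge effects are negligible.

```python
# Illustrative Monte Carlo approximation of G([0, rho]) = F_2({(xi, eta): ||xi - eta|| <= rho})
# for the unit square D = [0, 1]^2 with uniform density f = 1 (assumption for this sketch).
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 200_000
xi = rng.random((n_pairs, 2))    # first point of each pair, uniform on D
eta = rng.random((n_pairs, 2))   # second point of each pair, uniform on D
dist = np.linalg.norm(xi - eta, axis=1)

def G_hat(rho):
    """Monte Carlo estimate of G([0, rho]) = P(||xi - eta|| <= rho)."""
    return np.mean(dist <= rho)

for rho in (0.1, 0.25, 0.5, 1.0):
    # For small rho the edge effect is negligible and G([0, rho]) is close to pi * rho**2.
    print(f"rho = {rho:4.2f}   G_hat = {G_hat(rho):.4f}   pi*rho^2 = {np.pi * rho**2:.4f}")
```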

The measure G will generally depend on the size and shape of D. The non-negative function G(ρ), ρ > 0, will be defined by

(96)    G(ρ) = F_2(\{(ξ, η) : ‖ξ − η‖ ≤ ρ\}) = G([0, ρ]).

Examples of the function G for some regular domains for d = 2 can be found in Bartlett (1964) or Diggle (1983). Next one considers a sequence of expanding in-fill domains (D_m, {S_{m,n}}_{n=1}^∞) as in Definition 3.4, with f_m ≡ 1/μ(D_m), which gives rise to the sequences of measures {F_{2,m}}_{m=1}^∞ and functions {G_m}_{m=1}^∞, where F_{2,m} is defined by (93) with f replaced by f_m, and G_m is given by (96) with F_2 = F_{2,m}. The sequence {G_m}_{m=1}^∞ will play an important role in the next lemma. To describe the edge effect present in a bounded domain, the set

(97)    ∂_ρ(D) = \{x ∈ D : ∃ y ∈ R^d \setminus D : ‖x − y‖ < ρ\}

will be called the edge strip of D of depth ρ — the subset of D containing all points which are less than ρ away from some point outside of D. Loosely speaking, the following lemma decomposes the function G_m into two components: one which depends only on the dimensionality d of the embedding space, and another which depends on the geometry of the domain and which may be associated with the edge effect.

Lemma 3.3 Let (D_m, {S_{m,n}}_{n=1}^∞) be a sequence of expanding in-fill domains. Let μ(D) = 1 and let D satisfy the following condition:

(98)    \sup_{ρ > 0} \frac{μ(∂_ρ(D))}{ρ} ≤ c < ∞.

Condition 3.3 For some α > 0 and τ > 0,

(104)    |C_i(ρ)| ≤ α ρ^{−d/2−τ},

and consequently, for some constants c_i and c_{ii},

(105)    \int_{D_m} |C_i(ξ, λ)| f_m(λ) dλ ≤ c_i r_m^{−d/2−τ}    for all ξ ∈ D_m,
(106)    \int_{D_m} \int_{D_m} |C_i(ξ, λ)| f_m(λ) f_m(ξ) dλ dξ ≤ c_{ii} r_m^{−d/2−τ},

for i = 1, . . . , q.

Working with different domains D_m implies that the regressor functions u_i are defined on different domains. For this reason, and to ensure orthonormality, it is necessary to consider a separate linear transformation of the regressor functions for each domain D_m.

Condition 3.4 It is assumed that on each domain D_m, the regressor functions {u_i}_{i=1}^p can be linearly transformed into regressor functions {ũ_{m,i}}_{i=1}^p satisfying

(107)    \int_{D_m} ũ_{m,i}(ξ) ũ_{m,j}(ξ) f_m(ξ) dξ = δ_{i,j}.

Condition 3.5 There exists a finite constant c such that, for all m, for multi-index vectors u satisfying |u| ≤ d and for i = 1, . . . , p,

(108)    \left| \frac{∂^{|u|}}{∂x_u} ũ_{m,i}(ξ) \right| ≤ c,    ξ ∈ D_m.

Condition 3.4 can always be satisfied for linearly independent, summable functions on a compact domain. Condition 3.5 can be shown to be satisfied if, for example, the regressors {u_i}_{i=1}^p are polynomials (see Lemma A.8 in the Appendix).

3.3.3 Expanding-domain convergence

For each m = 1, 2, . . ., one defines a_m, m_{0,m}, b_{0,m}, b_m, A_{ε,m}, M_{ε,m}, B_{ε,m} and E_{ε,m} by putting D = D_m in the definitions (76), (77), (78), (79), (80), (81), (82) and (83), respectively. One defines:

(109)    a(i) = C_i(0)
(110)    m_0 = γ + C_Y(0)
(111)    A(i, j) = d α_G \int_0^∞ C_i(ρ) C_j(ρ) ρ^{d−1} dρ,    i, j = 1, . . . , q, with α_G as in Lemma 3.3,
(112)    M(i) = d α_G \int_0^∞ C_Y(ρ) C_i(ρ) ρ^{d−1} dρ,    i = 1, . . . , q,
(113)    A_ε = \begin{pmatrix} 1 & a' \\ 0 & A \end{pmatrix}
(114)    M_ε = \begin{pmatrix} m_0 \\ M \end{pmatrix}
(115)    θ = A^{-1} M.

Theorem 3.5 Let the sequence {r_m} be increasing and unbounded. Let Conditions 3.1, 3.2, 3.3, 3.4 and 3.5 hold. Let a_m, m_{0,m}, θ_m, E_{ε,m}, a, m_0, θ, γ be as defined earlier. Then the following limit holds:

(116)    \lim_{r_m → ∞} \begin{pmatrix} m_{0,m} − a_m' θ_m \\ θ_m \end{pmatrix} = \begin{pmatrix} m_0 − a'θ \\ θ \end{pmatrix} = \begin{pmatrix} γ + C_Y(0) − \sum_{i=1}^{q} θ(i) C_i(0) \\ θ \end{pmatrix}.

Moreover, there exists a fixed vector of positive numbers E(i), i = 1, . . . , q + 1, such that for m = 1, . . .

(117)    r_m^{d} E_{ε,m}(i) ≤ E(i).

The last result provides an important insight into the way in which the variance of the projection estimator decreases as the size of the domain increases. For each domain D_m the bounds (84)–(87) and (88) hold with A_ε replaced by A_{ε,m}, M_ε replaced by M_{ε,m}, etc., and with the constants c_{A,ε}, c_{M,ε}, c_{B,ε}, c_{E,ε} and c_{θ,ε} replaced by constants dependent on the domain D_m: c_{A,ε,m}, c_{M,ε,m}, c_{B,ε,m}, c_{E,ε,m} and c_{θ,ε,m}. For a sufficiently large n, the dominating term in the variance of γ̂_n or θ̂_n(i) is of the form (q + 1) c E_{ε,m}(j) ≤ (q + 1) c E(j) r_m^{−d}, with j = 1 for γ̂_n and j = i + 1 for θ̂_n(i). This bound vanishes as the domain increases, although n may have to increase correspondingly. The next section explores the relationship between the sample size and the size of the domain required to ensure that the variance vanishes.

3.4 Mixed in-fill and expanding domain asymptotics

The results of previous sections can be combined to construct a consistent estimator, provided that an increasing sequence of domains is sampled with increasing intensity. The increasing domain size r_m reduces the bound (117) on the limiting variance of an estimator based on sampling that domain. On the other hand, the constants c_{θ,ε} and c_{E,ε} in (88) and (89) will in general depend on the domain D_m, and for this reason they will be denoted by c_{θ,ε,m} and c_{E,ε,m}. The way in which they depend on r_m is described by the following lemma.

Lemma 3.4 If the functions C_i, i = 1, . . . , q, and C_Y have uniformly bounded partials of orders not exceeding 2d, and Conditions 3.1–3.5 hold, the following bounds hold:

(118)    c_{θ,ε,m} = O(r_m^{4d})
(119)    c_{E,ε,m} = O(r_m^{8d}).

Combining the last lemma with Theorem 3.5 leads to the following theorem.

Theorem 3.6 Let {(D_m, {S_{m,n}}_{n=1}^∞)}_{m=1}^∞ be a sequence of expanding in-fill domains and let the assumptions of Theorem 3.5 hold. Let θ̂_{m,n} be the estimator defined in (65) based on the observations on the sampling set S_{m,n}. Let {n(m)}_{m=1}^∞ be any sequence of integers satisfying, for all m,

(120)    \lim_{m→∞} c_{θ,ε,m} δ_{n(m)} = 0,    i = 1, . . . , q + 1,
(121)    \lim_{m→∞} c_{E,ε,m} δ_{n(m)} = 0,    i = 1, . . . , q + 1.

It follows that

(122)    (γ̂_{m,n(m)}, θ̂_{m,n(m)}) →_p (m_0 − a'θ, θ)    as m → ∞.

Moreover, a sufficient requirement for conditions (120) and (121) to hold is

(123)    \lim_{m→∞} r_m^{8d} δ_{n(m)} = 0.
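As a back-of-the-envelope illustration (not from the paper; the constants are invented), if the design satisfies the low-discrepancy rate (32), so that δ_n is roughly c n^{-1}(log n)^d, one can compute how fast n(m) must grow with r_m for r_m^{8d} δ_{n(m)} to stay small, as required by (123).

```python
# Illustrative calculation: sample sizes n(m) keeping r_m**(8d) * delta_n(m) below a tolerance,
# assuming the low-discrepancy rate delta_n ~ c * n**-1 * (log n)**d of (32). Constants invented.
import math

def delta(n, d=2, c=1.0):
    """Assumed star-discrepancy bound of (32) for a low-discrepancy design."""
    return c * (math.log(n) ** d) / n

def smallest_n(r_m, d=2, tol=1e-2):
    """Smallest power of two n with r_m**(8*d) * delta(n, d) <= tol (simple doubling search)."""
    n = 2
    while r_m ** (8 * d) * delta(n, d) > tol:
        n *= 2
    return n

for r_m in (1.0, 1.5, 2.0, 3.0):
    print(f"r_m = {r_m:3.1f}   n(m) >= {smallest_n(r_m):,}")
```

The required n(m) grows very quickly with r_m, in line with the remark above that n may have to increase correspondingly as the domain expands.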

4 Surface elevation data

In this section the projection estimator is applied to the data of Davis (1973). The data consist of n = 52 measurements of surface elevation collected on a square, so the domain is two-dimensional. Firstly, one must choose a class of additive models. One such class was suggested by Shapiro & Botha (1991) and it takes the form

(124)    C_θ^A(ρ) = \sum_{i=1}^{q} θ(i) J_0(λ_i ρ),

where J_0 is the Bessel function of order zero of the first kind, the λ_i are fixed positive numbers, and the θ(i) are positive numbers to be estimated. It should be emphasised that the estimation procedure applied by Shapiro and Botha is very different from the projection estimator considered here. Nevertheless, the model is additive, so the projection estimator (22) can be applied to this model. However, the covariance component functions in the model decay very slowly (on the order of ρ^{−1/2}). This is insufficient to satisfy Condition 3.3, hence Theorem 3.6 does not apply and convergence in probability may not be attainable. The model (124) will be named model A. Another model, named B, will also be considered. It may be parametrised as C_θ^B(ρ) = \sum_{i=1}^{q} θ(i) C_i(ρ), where

(125)    C_i(ρ) = \begin{cases} \dfrac{2\big(b(i) J_1(b(i)ρ) − a(i) J_1(a(i)ρ)\big)}{ρ\,(b(i)^2 − a(i)^2)} & \text{if } ρ > 0 \\[1ex] 1 & \text{if } ρ = 0, \end{cases}

where J_1 is the Bessel function of order one of the first kind.

model                     θ̂_1       θ̂_2       θ̂_3       θ̂_4
linear trend, model A     1123.54    359.73     106.796    49.7274
linear trend, model B1    1365.13    203.806    74.5592    133.592
linear trend, model B2    —          738.566    52.6978    263.894

Table 1: Estimated coefficients for the two mean models.

A covariance model with components given by (125) is described in detail in Powojowski (2000, 2002b). It is a valid model as long as a(i) < b(i) and θ(i) ≥ 0 for i = 1, . . . , q. In addition, it can easily be shown that C_i(ρ) = O(ρ^{−3/2}), which is enough to satisfy Condition 3.3. The particular model A considered is defined by q = 4, with λ_1 = 1, λ_2 = 2, λ_3 = 3 and λ_4 = 4. Two models of type B are considered. Model B1 will be defined by q = 4, (a(1), b(1)) = (0.5, 1.5), (a(2), b(2)) = (1.5, 2.5), (a(3), b(3)) = (2.5, 3.5) and (a(4), b(4)) = (3.5, 4.5), while Model B2 has q = 3 with the last three pairs of parameters. So Model B2 is like B1 but with the coefficient θ_1 being forced to take the value 0. Criteria for choosing the model parameters will not be discussed here. They are considered in greater detail by Powojowski (2000, 2002a).

Since the mean of the sampled process Y is not known, it needs to be modelled. In this paper, we considered only one such model, allowing for an arbitrary linear trend over the sampled square, with corresponding regressor matrix X. The resulting estimates are given in Table 1. The top graph in Figure 1 shows the three estimated covariograms. We see that Models A and B1 have very similar covariograms for distances up to 3 whereas Model B2 is quite different. Between distances of 3 and 5, the three models are somewhat different, while beyond distances of 5, Models B1 and B2 are similar and different from Model A. The bottom graph contains the 2704 products of residuals marked as individual points with coordinates (‖x_k − x_l‖, e(k)e(l)), where 1 ≤ k ≤ 52, 1 ≤ l ≤ 52 and e(k) = Y(k) − Ŷ(k) is the residual computed as in (20), along with the loess smooth of these points as computed in S-Plus 6.

To try to get an idea of which of these three models fits better, we will compare this smooth curve to the covariograms, although the comparison will be indirect. Note that the mean of the product of residuals e(k)e(l) is (P K_Y P)(k, l), rather than K_Y(k, l), and so it may be estimated by (P K_{θ̂_i} P)(k, l) rather than by K_{θ̂_i}(k, l), where P denotes the (regression) projection in (20). To judge the quality of a given covariance model, we therefore plot the loess smooth estimate of the points (‖x_k − x_l‖, e(k)e(l)) along with the expected value of these points under the combination of regression and covariance models, (‖x_k − x_l‖, (P K_{θ̂_i} P)(k, l)), where i is A, B1, or B2. If the covariance model is adequate, the smooth curve based on the products of residuals should pass through the cloud of expected products of residuals. Figure 2 shows this cloud superposed over the loess smooth of the products of residuals along with 95% confidence intervals of the curve for each of the three covariance models. The curve and especially the confidence intervals are to be taken with caution since the 2704 products of residuals are clearly not independent. We see that the fit of the covariance Model B1 is slightly better than that of Model A, while Model B2 does not fit well at all. This shows that forcing the coefficient θ_1 of Model B1 to be 0 is not compatible with the data.

Comparing the results of this section with models previously fitted to the same data (e.g., Ripley, 1988, and Wackernagel, 1995), one observes an important difference. The models for the mean of the process used by those authors are linear trends or quadratic surfaces, whereas the covariance models included the exponential and the Gaussian parametric models.
These parametric covariance models do not allow negative covariances whereas the additive models that we consider do. Visual examination of Figure 2 suggests that models without negative covariances are not plausible.
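The diagnostic just described is straightforward to reproduce numerically. The following sketch is given only as an illustration: it is written in Python rather than the S-Plus used for the analysis, it is not the authors' code, and the function names are ours. It assumes that the data vector, the regressor matrix $X$, the component covariance matrices $K_i$ evaluated at the $n$ sampled sites and the sampling coordinates are available as arrays, and it computes a projection-type estimate of $\theta$ of the form $[\operatorname{tr}(U_iU_j)]^{-1}[e'U_ie]$ appearing in the proof of Theorem 3.3, together with the residual products, their expected values $(P K_{\hat\theta} P)(k,l)$ and a loess-type smooth.

```python
# Illustrative sketch only (Python, not the original S-Plus code); names are ours.
# Assumes: y data vector, X regressor matrix, K_list the component matrices K_i,
# coords the n x d array of sampling locations.
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess  # loess-type smoother

def projection_estimate(y, X, K_list):
    """Projection-type estimate solving [tr(U_i U_j)] theta = [e' U_i e],
    with U_i = P K_i P and P the regression projection (cf. the proof of Theorem 3.3)."""
    n = len(y)
    P = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    e = P @ y                                   # residuals, as in (20)
    U = [P @ K @ P for K in K_list]
    A = np.array([[np.trace(Ui @ Uj) for Uj in U] for Ui in U])
    m = np.array([e @ Ui @ e for Ui in U])
    return np.linalg.solve(A, m), P, e

def residual_product_diagnostic(coords, y, X, K_list):
    """Distances, products of residuals e(k)e(l), their expected values
    (P K_thetahat P)(k, l), and a loess smooth of the products."""
    theta, P, e = projection_estimate(y, X, K_list)
    K_hat = sum(t * K for t, K in zip(theta, K_list))
    expected = P @ K_hat @ P
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d, prod = dist.ravel(), np.outer(e, e).ravel()
    smooth = lowess(prod, d, frac=0.3)          # columns: sorted d, smoothed value
    return d, prod, expected.ravel(), smooth
```

Plotting the smoothed residual products over the cloud of expected products then reproduces, up to smoothing parameters, the comparison shown in Figure 2.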

[Figure 1. Top panel, "Estimated covariance functions": the three estimated covariance functions (Model A, Model B1, Model B2) plotted as Covariance versus rho. Bottom panel, "Loess smooth of products of residuals": the individual products of residuals and their loess smooth, Covariance versus rho.]

[Figure 2. "Expected product of residuals versus loess smooth of products of residuals": three panels, "Model A", "Model B1" and "Model B2", each showing the cloud of expected products of residuals together with the loess smooth of the observed products, Covariance versus rho.]

5 Conclusion

The paper proposes a new approach to the problem of estimating the covariance function of a stochastic process. The approach combines two main ideas: the use of additive models and a new estimation method based on orthogonal projections. The approach may be viewed as a particular case of MINQUE estimation. It is argued that this approach offers many distinct theoretical and practical advantages over traditional approaches involving the empirical (co)variogram, as well as over general MINQUE estimation.

In comparison with traditional methods, the need to estimate the empirical (co)variogram is eliminated, and so is the arbitrariness of the bin selection. Since the empirical (co)variograms are meaningless for non-stationary processes and hard to compute for non-isotropic processes, the approach presented is more generally applicable than the traditional procedures. The mean and polynomial trends in the mean can be estimated simultaneously without much complication and without compromising the estimator's properties. For instance, the estimator by projection is unbiased even when the mean of the process needs to be estimated from the data (assuming that the model for the mean is correct and the true covariance function is in the class of models considered). The estimation procedure involves only linear algebra, and thus all problems, both theoretical and practical, associated with non-linear, non-quadratic optimisation are avoided. The properties of the estimator can be understood more easily than in the traditional approach; in particular, the properties of the covariance estimator are as easy to obtain as those of the estimators of the parameters, unlike for the traditional covariance estimators.

In contrast with general MINQUE estimation (which is also unbiased and requires only linear algebra to compute), the stability and asymptotic properties of the projection estimator do not depend on the relationship between the initial guess $K_0$ required by MINQUE and the true covariance matrix $K_Y$. The linear algebra computations required for the projection estimator involve inverses of much smaller matrices than those required for any other MINQUE estimator.

It is shown that in the in-fill asymptotic context (sampling of a finite domain) it is generally impossible to estimate the covariogram consistently. An upper bound may, however, be obtained for the asymptotic variance of the estimator in this context. Furthermore, it is shown that for isotropic processes this upper bound can be made arbitrarily small if the sampled domain is sufficiently large. Finally, a constructive method is presented for sampling an increasing domain more densely in such a way as to guarantee the convergence of the covariance function projection estimator. The estimator is illustrated using a data set of Davis (1973).

In order to apply the projection estimator in practice, an adequate class of additive models is required. One such class is that of Shapiro & Botha (1991), as seen in Section 4, although it suffers from rates of decay too slow for the assumptions in some of the results of this paper. Other flexible classes of models, satisfying the hypotheses needed for the convergence results, are explored in Powojowski (2000, 2002b), and a model of this type is also briefly considered in modelling the Davis data.

APPENDIX

This section contains the proofs of the results.
For convenience and without loss of generality, for all covariance model components throughout this appendix it will be assumed that $C_i(0,0) = 1$ or $C_i(0) = 1$, and that the discrepancy sequences $\{\delta_n\}_{n=1}^{\infty}$ are nonincreasing. Before proving Lemma 3.1, some required lemmas are established.

Lemma A.1 Let $(D, \{S_n\}_{n=1}^{\infty})$ be an in-fill sampling domain and $\psi \in C^d(D)$ be the uniform limit of the sequence of functions $\{\psi_n\}$ in $C^d(D)$ with $\sup_{x\in D}|\psi_n(x) - \psi(x)| \le \alpha_n$. It follows that
$$\left| \int_D \psi(x)\,dx - \frac{1}{|S_n|}\sum_{s\in S_n}\psi_n(s)\right| \le \alpha_n + \delta_n V_D^*(\psi).$$
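Lemma A.1 is a bound in the spirit of the Koksma–Hlawka inequality: the averaging error is controlled by the uniform approximation error $\alpha_n$ plus the discrepancy $\delta_n$ times the variation of $\psi$. The following small numerical illustration is not part of the paper; the grid construction and the integrand are chosen only for the example. It shows the sample average approaching the integral as a midpoint grid on the unit square fills in (the star discrepancy of such a $k\times k$ grid decreases as the grid is refined).

```python
# Illustrative only: averaging error for a smooth integrand on a midpoint grid
# over [0,1]^2, whose discrepancy decreases as the grid is refined.
import numpy as np

def grid_points(k):
    s = (np.arange(k) + 0.5) / k
    xx, yy = np.meshgrid(s, s)
    return np.column_stack([xx.ravel(), yy.ravel()])

def psi(x):
    return np.exp(x[:, 0] + x[:, 1])            # smooth, bounded variation on [0,1]^2

exact = (np.e - 1.0) ** 2                       # exact integral of psi over [0,1]^2
for k in [5, 10, 20, 40, 80]:
    S = grid_points(k)
    err = abs(psi(S).mean() - exact)
    print(f"n = {len(S):5d}   |sample average - integral| = {err:.2e}")
```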

Lemma A.2 Let $S$ be a sampling point set on $D$ with $|S| = n$ and $D^*(S) = \delta$. Then $S\times S$ is a sampling point set on $D\times D$ with $|S\times S| = n^2$ and $D^*(S\times S) \le 2\delta$.

Lemma A.3 For two $q\times q$ matrices $A$ and $B$ and any norm, there exists a finite $k$ such that
$$\|AB\| \le k\,\|A\|\,\|B\|, \qquad (126)$$
where $k = 1$ for the Frobenius norm and the spectral radius, and $k = q$ for the $L_\infty$ norm.

Lemma A.4 If matrices $A$ and $B$ are positive-definite and $A$ is symmetric, then $\operatorname{tr}(AB) \le \|A\|_o\operatorname{tr}(B) \le \operatorname{tr}(A)\operatorname{tr}(B)$.

The proofs are omitted due to their simplicity. The following simple result will be useful.

Lemma A.5 Let the sequence of symmetric $q\times q$ matrices $\{A_n\}$ be such that $\|A_n - A\|_\infty \to 0$, where $A$ is invertible. Then, for $n$ large enough to guarantee $\|A_n - A\|_\infty \le \|A^{-1}\|_\infty^{-1}/(2q^3)$, one has
$$\|A_n^{-1} - A^{-1}\|_\infty \le 2q^6\,\|A^{-1}\|_\infty^2\,\|A_n - A\|_\infty, \qquad \|A_n^{-1}\|_\infty \le 2q\,\|A^{-1}\|_\infty.$$

Proof of Lemma A.5. The proof follows easily from Theorem 2.3.5 of Atkinson & Han (2001) on page 50, and the relations (17) and (18).
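As a quick sanity check of the perturbation bounds in Lemma A.5, the following sketch (illustrative only; the matrix, perturbation sizes and random seed are arbitrary choices, not taken from the paper) verifies numerically that the inverse of a slightly perturbed symmetric matrix stays within the stated, rather conservative, bound.

```python
# Illustrative only: numerical check of the inverse-perturbation bound of Lemma A.5
# on an arbitrary well-conditioned symmetric matrix.
import numpy as np

rng = np.random.default_rng(0)
q = 4
A = rng.standard_normal((q, q))
A = A @ A.T + q * np.eye(q)                      # symmetric and invertible
inf_norm = lambda M: np.abs(M).sum(axis=1).max() # matrix infinity norm

for eps in (1e-2, 1e-4, 1e-6):
    E = eps * rng.standard_normal((q, q))
    A_n = A + (E + E.T) / 2                      # small symmetric perturbation
    lhs = inf_norm(np.linalg.inv(A_n) - np.linalg.inv(A))
    rhs = 2 * q**6 * inf_norm(np.linalg.inv(A))**2 * inf_norm(A_n - A)
    print(f"||A_n - A|| = {inf_norm(A_n - A):.1e}:  "
          f"||A_n^-1 - A^-1|| = {lhs:.3e} <= {rhs:.3e}")
```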

Lemma A.6 Let the estimator be defined as in (22). If (25) holds, then with $c$ given by (25) and the notation as in (38)–(41),
$$\operatorname{var}(\hat\theta_n(k)) \le q\,c\,E_n(k,k), \qquad k = 1,\ldots,q. \qquad (127)$$

Proof of Lemma A.6. Firstly, one observes that if $P$ is a $q\times q$ symmetric non-negative definite matrix, then $q\operatorname{diag}(P) - P$ is non-negative definite. By (25), $\operatorname{var}(Y_n'U_{i,n}Y_n) \le c\operatorname{tr}(U_{i,n}U_{Y,n}U_{i,n}U_{Y,n})$ and it follows that
$$[\operatorname{cov}(Y_n'U_{i,n}Y_n,\, Y_n'U_{j,n}Y_n)] \le q\operatorname{diag}\!\big([\operatorname{cov}(Y_n'U_{i,n}Y_n,\, Y_n'U_{j,n}Y_n)]\big) \le cq\operatorname{diag}\!\big([\operatorname{tr}(U_{i,n}U_{Y,n}U_{j,n}U_{Y,n})]\big).$$
Denoting the $k$-th column of $A_X^{-1}$ by $a$, one obtains
$$\begin{aligned}
\operatorname{var}(\hat\theta_n(k)) &= \Big(A_X^{-1}(1/n^4)[\operatorname{cov}(Y_n'U_{i,n}Y_n,\, Y_n'U_{j,n}Y_n)]A_X^{-1}\Big)(k,k) = a'(1/n^4)[\operatorname{cov}(Y_n'U_{i,n}Y_n,\, Y_n'U_{j,n}Y_n)]\,a\\
&\le cq\, a'(1/n^4)\operatorname{diag}\!\big([\operatorname{tr}(U_{i,n}U_{Y,n}U_{i,n}U_{Y,n})]\big)\,a = cq\Big(A_X^{-1}(1/n^4)\operatorname{diag}\!\big([\operatorname{tr}(U_{i,n}U_{Y,n}U_{i,n}U_{Y,n})]\big)(A_X^{-1})'\Big)(k,k) = cq\,E_X(k,k),
\end{aligned}$$

which concludes the proof.

Proof of Lemma 3.1. We begin with the matrix $A_{X,n}$. First consider the covariance matrix of $e_n$ in the estimator (22):
$$\begin{aligned}
U_{i,n}(l_1,l_2) ={}& \operatorname{cov}[e_n(l_1), e_n(l_2)]\\
={}& C_i(x_{l_1},x_{l_2}) - X_n(l_2,\cdot)(X_n'X_n)^{-1}X_n'K_{i,n}(\cdot,l_1) - X_n(l_1,\cdot)(X_n'X_n)^{-1}X_n'K_{i,n}(\cdot,l_2)\\
&+ X_n(l_1,\cdot)(X_n'X_n)^{-1}X_n'K_{i,n}X_n(X_n'X_n)^{-1}X_n(l_2,\cdot)'\\
={}& C_i(x_{l_1},x_{l_2}) - (u_1(x_{l_2}),\ldots,u_p(x_{l_2}))(X_n'X_n)^{-1}X_n'(C_i(x_1,x_{l_1}),\ldots,C_i(x_n,x_{l_1}))'\\
&- (u_1(x_{l_1}),\ldots,u_p(x_{l_1}))(X_n'X_n)^{-1}X_n'(C_i(x_1,x_{l_2}),\ldots,C_i(x_n,x_{l_2}))'\\
&+ (u_1(x_{l_1}),\ldots,u_p(x_{l_1}))(X_n'X_n)^{-1}X_n'K_{i,n}X_n(X_n'X_n)^{-1}(u_1(x_{l_2}),\ldots,u_p(x_{l_2}))'\\
\equiv{}& \phi_{i,n}(x_{l_1},x_{l_2}), \qquad (128)
\end{aligned}$$
where the sign $\equiv$ is used to define the quantity to its right. By (44) and (45) one has $\|n(X_n'X_n)^{-1} - I\|_\infty \le V_X\delta_n$ with
$$V_X = p^2\max_{k_1,k_2=1,\ldots,q}V_D^*(u_{k_1}u_{k_2}) \qquad (129)$$
and
$$\|n(X_n'X_n)^{-1}\|_\infty \le J_X < \infty. \qquad (130)$$
Similarly, using Theorem 3.1,
$$\left|\frac{1}{n}(X_n(\cdot,k))'(C_i(x_1,x_{l_1}),\ldots,C_i(x_n,x_{l_1}))' - \int_D u_k(\xi)C_i(\xi,x_{l_1})f(\xi)\,d\xi\right| \le \delta_n V_D^*(u_k(\cdot)C_i(\cdot,x_{l_1})) \le R_{u_k,C_i}\delta_n,$$
where
$$R_{u_k,C_i} = \sup_{\xi\in D}V_D^*(u_k(\cdot)C_i(\cdot,\xi)) < \infty. \qquad (131)$$
Since $|C_i(\xi,\eta)| \le 1$,
$$\left|\frac{1}{n}(X_n(\cdot,k))'(C_i(x_1,x_{l_1}),\ldots,C_i(x_n,x_{l_1}))'\right| \le \sup_{\xi\in D}|u_k(\xi)| \equiv R_{u_k}, \qquad (132)$$
and, using $\|R^{-1}\| = p$ and Theorem 3.1, one obtains
$$\left|X_n(l_2,\cdot)(X_n'X_n)^{-1}X_n'(C_i(x_1,x_{l_1}),\ldots,C_i(x_n,x_{l_1}))' - \sum_{k_1=1}^{p}\sum_{k_2=1}^{p}u_{k_1}(x_{l_2})\int_D u_{k_2}(\xi)C_i(\xi,x_{l_1})f(\xi)\,d\xi\right| \le \sum_{k_1=1}^{p}|u_{k_1}(x_{l_2})|\left(V_X\sum_{k_2=1}^{p}R_{u_{k_2}} + \sum_{k_2=1}^{p}R_{u_{k_2},C_i}\right)\delta_n.$$
Furthermore, putting $W(k_1,k_2) = \int_D\int_D u_{k_1}(\xi)u_{k_2}(\eta)C_i(\xi,\eta)f(\xi)f(\eta)\,d\xi\,d\eta$ and using Lemma A.2,
$$\left|\frac{1}{n^2}\big(X_n'K_{i,n}X_n\big)(k_1,k_2) - W(k_1,k_2)\right| \le 2\delta_n V_{D\times D}^*(\psi_{k_1,k_2,i}) \equiv V_{u_{k_1},u_{k_2},C_i}\delta_n, \qquad (133)$$
where
$$\psi_{k_1,k_2,i}(\xi,\eta) = u_{k_1}(\xi)u_{k_2}(\eta)C_i(\xi,\eta), \qquad (134)$$
and
$$\left|\big(X_n(l_1,\cdot)\,n(X_n'X_n)^{-1}\big)(k) - u_k(x_{l_1})\right| \le V_X\delta_n\sum_{j=1}^{p}R_{u_j} = V_XR_u\delta_n, \qquad (135)$$
where $R_u = \sum_{j=1}^{p}R_{u_j}$. It is easy to see that $|W(k_1,k_2)| \le R_{u_{k_1}}R_{u_{k_2}}$ and it follows that
$$\begin{aligned}
&\left|X_n(l_1,\cdot)(X_n'X_n)^{-1}X_n'K_{i,n}X_n(X_n'X_n)^{-1}X_n(l_2,\cdot)' - \sum_{k_1=1}^{p}\sum_{k_2=1}^{p}u_{k_1}(x_{l_1})u_{k_2}(x_{l_2})W(k_1,k_2)\right|\\
&\quad\le \sum_{k_1=1}^{p}\sum_{k_2=1}^{p}\bigg\{\left|\big(X_n(l_1,\cdot)\,n(X_n'X_n)^{-1}\big)(k_1)\right|\,\left|\frac{1}{n^2}\big(X_n'K_{i,n}X_n\big)(k_1,k_2) - W(k_1,k_2)\right|\,\left|\big(X_n(l_2,\cdot)\,n(X_n'X_n)^{-1}\big)(k_2)\right|\\
&\qquad\quad + \left|\big(X_n(l_1,\cdot)\,n(X_n'X_n)^{-1}\big)(k_1)\right|\,|W(k_1,k_2)|\,\left|\big(X_n(l_2,\cdot)\,n(X_n'X_n)^{-1}\big)(k_2) - u_{k_2}(x_{l_2})\right|\\
&\qquad\quad + \left|\big(X_n(l_1,\cdot)\,n(X_n'X_n)^{-1}\big)(k_1) - u_{k_1}(x_{l_1})\right|\,|W(k_1,k_2)|\,|u_{k_2}(x_{l_2})|\bigg\}\\
&\quad\le \sum_{k_1=1}^{p}\sum_{k_2=1}^{p}\big(V_{u_{k_1},u_{k_2},C_i}R_u^2J_X^2 + J_XV_XR_{u_{k_1}}R_{u_{k_2}}R_u^2 + V_XR_uR_{u_{k_1}}R_{u_{k_2}}^2\big)\delta_n.
\end{aligned}$$
Hence it follows that
$$|\phi_{i,n}(x_{l_1},x_{l_2}) - \phi_i(x_{l_1},x_{l_2})| \le Q_{\phi_i}\delta_n, \qquad (136)$$
where $Q_{\phi_i} < \infty$ is independent of $(x_{l_1},x_{l_2})$. A similar result holds for $\phi_Y$. Let
$$Q_\phi = \max\{Q_{\phi_1},\ldots,Q_{\phi_q},Q_{\phi_Y}\}, \qquad (137)$$
$$R_\phi = \sup_{D\times D}\max\{|\phi_1|,\ldots,|\phi_q|,|\phi_Y|\} < \infty. \qquad (138)$$
Using again Theorem 3.1, and the triangle inequality of the form $|a_nb_n - ab| \le |a|\,|b_n - b| + |b|\,|a_n - a| + |a_n - a|\,|b_n - b|$, one obtains
$$\begin{aligned}
|A_{X,n}(i,j) - A_X(i,j)| &= \left|\frac{1}{n^2}\sum_{k=1}^{n}\sum_{l=1}^{n}\phi_{i,n}(x_k,x_l)\phi_{j,n}(x_l,x_k) - \int_D\int_D\phi_i(\xi_1,\xi_2)\phi_j(\xi_2,\xi_1)f(\xi_1)f(\xi_2)\,d\xi_1\,d\xi_2\right|\\
&\le D^*(S_n\times S_n)\,V_{D\times D}^*(\phi_i\phi_j) + 2Q_\phi R_\phi\delta_n + Q_\phi^2\delta_n^2 \le 2V_{D\times D}^*(\phi_i\phi_j)\delta_n + 2Q_\phi R_\phi\delta_n + Q_\phi^2\delta_n^2.
\end{aligned}$$
The same argument is applied to (53) and it follows that the following constants are sufficient:
$$c_{A,X} = \max_{i,j=1,\ldots,q}\{2V_{D\times D}^*(\phi_i\phi_j)\} + 2Q_\phi R_\phi + Q_\phi^2\delta_1, \qquad (139)$$
$$c_{M,X} = \max_{i=1,\ldots,q}\{2V_{D\times D}^*(\phi_i\phi_Y)\} + 2Q_\phi R_\phi + Q_\phi^2\delta_1. \qquad (140)$$
Let us now consider $B_{X,n}$. Putting $h_{X,i}(\xi,\eta) = \int_D\phi_i(\xi,\lambda)\phi_Y(\lambda,\eta)f(\lambda)\,d\lambda$, one obtains
$$\left|h_{X,i}(\xi,\eta) - \frac{1}{n}\sum_{k=1}^{n}\phi_{i,n}(\xi,x_k)\phi_{Y,n}(x_k,\eta)\right| \le \big(V_D^*(\phi_i(\xi,\cdot)\phi_Y(\cdot,\eta)) + 2Q_\phi R_\phi\big)\delta_n + Q_\phi^2R_\phi\delta_n^2. \qquad (141)$$
By the continuity of the functions involved and the compactness of $D$ one has
$$R_{\phi_i,\phi_Y} = \sup_{(\xi,\eta)\in D\times D}V_D^*(\phi_i(\xi,\cdot)\phi_Y(\cdot,\eta)) < \infty, \qquad (142)$$
$$R_{h_{X,i}} = \sup_{(\xi,\eta)\in D\times D}|h_{X,i}(\xi,\eta)| < \infty, \qquad (143)$$
$$\left|\frac{1}{n}\sum_{k=1}^{n}\phi_i(\xi,x_k)\phi_Y(x_k,\eta)\right| \le R_\phi^2. \qquad (144)$$
Furthermore,
$$\left|h_{X,i}(x_{k_1},x_{k_3})h_{X,j}(x_{k_1},x_{k_3}) - \frac{1}{n}\sum_{k_2=1}^{n}\phi_{i,n}(x_{k_1},x_{k_2})\phi_{Y,n}(x_{k_2},x_{k_3})\;\frac{1}{n}\sum_{k_2=1}^{n}\phi_{j,n}(x_{k_1},x_{k_2})\phi_{Y,n}(x_{k_2},x_{k_3})\right| \le \big((R_{\phi_i,\phi_Y} + 2Q_\phi R_\phi)R_\phi^2 + (R_{\phi_j,\phi_Y} + 2Q_\phi R_\phi)R_{h_{X,i}}\big)\delta_n + (R_{h_{X,i}} + R_\phi^2)Q_\phi^2R_\phi\delta_n^2.$$
Using Lemma A.1, this leads to
$$c_{B,X} = \max_{i,j=1,\ldots,q}\big\{(R_{\phi_i,\phi_Y} + 2Q_\phi R_\phi)R_\phi^2 + (R_{\phi_j,\phi_Y} + 2Q_\phi R_\phi)R_{h_{X,i}} + (R_{h_{X,i}} + R_\phi^2)Q_\phi^2R_\phi\delta_1 + 2V_{D\times D}^*(h_{X,i}h_{X,j})\big\}. \qquad (145)$$
Assuming $A_{X,n}$ is invertible and using (126) and Lemma A.5,
$$\begin{aligned}
\|E_{X,n} - E_X\|_\infty &\le \|A_{X,n}^{-1}\operatorname{diag}(B_{X,n})A_{X,n}^{-1} - A_X^{-1}\operatorname{diag}(B_X)A_X^{-1}\|_\infty\\
&= \|A_{X,n}^{-1}(\operatorname{diag}(B_{X,n}) - \operatorname{diag}(B_X))A_{X,n}^{-1} + A_{X,n}^{-1}\operatorname{diag}(B_X)(A_{X,n}^{-1} - A_X^{-1}) + (A_{X,n}^{-1} - A_X^{-1})\operatorname{diag}(B_X)A_X^{-1}\|_\infty\\
&\le \big(q^2J_{A_X^{-1}}^2c_{B,X} + 2q^8J_{A_X^{-1}}\|\operatorname{diag}(B_X)\|_\infty\|A_X^{-1}\|_\infty^2c_{A,X} + 2q^8c_{A,X}\|A_X^{-1}\|_\infty^3\|\operatorname{diag}(B_X)\|_\infty\big)\delta_n,
\end{aligned}$$
where $J_{A_X^{-1}} = \sup_{n=1,\ldots}\|A_{X,n}^{-1}\|_\infty < \infty$. It follows that
$$c_{E,X} = q^2J_{A_X^{-1}}^2c_{B,X} + 2q^8J_{A_X^{-1}}\|\operatorname{diag}(B_X)\|_\infty\|A_X^{-1}\|_\infty^2c_{A,X} + 2q^8\|A_X^{-1}\|_\infty^3\|\operatorname{diag}(B_X)\|_\infty c_{A,X}. \qquad (146)$$

Proof of Theorem 3.3. The proof of (56) is an immediate application of Lemma A.5 and of (126), (52) and (53). Putting $J_{M,X} = \sup_{n=1,\ldots}\|M_{X,n}\|$ and $\theta_X = A_X^{-1}M_X$, one obtains
$$\begin{aligned}
\|E[\hat\theta_{X,n}] - \theta_X\|_\infty &= \big\|[\operatorname{tr}(U_{i,n}U_{j,n})]^{-1}E[e_n'U_{i,n}e_n] - \theta_X\big\|_\infty = \|A_{X,n}^{-1}M_{X,n} - A_X^{-1}M_X\|_\infty\\
&\le \big(2q^7\|A_X^{-1}\|_\infty^2c_{A,X}J_{M,X} + q\|A_X^{-1}\|_\infty c_{M,X}\big)\delta_n = c_{\theta,X}\delta_n. \qquad (147)
\end{aligned}$$
Finally, (57) follows from Lemma A.6 and (55).

Proof of Lemma 3.2. Clearly one has $K_{W,n} = I_n$ and hence $U_{W,n} = P_nK_{W,n}P_n = P_n$. Recall that $U_{i,n} = P_nK_{i,n}P_n$. For $1\le j\le q$, using Theorem 3.1,
$$\frac{1}{n}\operatorname{tr}(U_{W,n}U_{j,n}) = \frac{1}{n}\operatorname{tr}(U_{j,n}) = \frac{1}{n}\sum_{l=1}^{n}\phi_j(x_l,x_l), \qquad (148)$$
$$\left|\frac{1}{n}\operatorname{tr}(U_{W,n}U_{j,n}) - a(j)\right| \le V_D^*(\psi_j)\delta_n, \quad\text{where } \psi_j(\xi) = \phi_j(\xi,\xi). \qquad (149)$$
Moreover,
$$\operatorname{tr}(U_{W,n}U_{W,n}) = \operatorname{tr}(P_n) = n - p, \qquad (150)$$
$$U_{(Y,\epsilon),n} = P_nK_{(Y,\epsilon),n}P_n = P_n(K_{Y,n} + \gamma I_n)P_n = U_{Y,n} + \gamma P_n, \qquad (151)$$
and so, again using Theorem 3.1,
$$\frac{1}{n}\operatorname{tr}(U_{W,n}U_{(Y,\epsilon),n}) = \frac{1}{n}\operatorname{tr}(U_{Y,n}) + \gamma\frac{1}{n}\operatorname{tr}(P_n) = \frac{1}{n}\sum_{l=1}^{n}\phi_Y(x_l,x_l) + \gamma\frac{n-p}{n}, \qquad (152)$$
$$\left|\frac{1}{n}\operatorname{tr}(U_{W,n}U_{(Y,\epsilon),n}) - m_0\right| \le V_D^*(\psi_Y)\delta_n + \gamma n^{-1}, \qquad (153)$$
where $\psi_Y(\xi) = \phi_Y(\xi,\xi)$. Furthermore, using Lemma 3.1 and Lemma A.4,
$$\left|\frac{1}{n^2}\operatorname{tr}(U_{i,n}U_{(Y,\epsilon),n}) - M_X(i)\right| \le \left|\frac{1}{n^2}\operatorname{tr}(U_{i,n}U_{Y,n}) - M_X(i)\right| + \gamma\left|\frac{1}{n^2}\operatorname{tr}(U_{i,n})\right| \le c_{M,X}\delta_n + \gamma n^{-1}, \qquad (154)$$
as well as
$$\frac{1}{n^2}\operatorname{tr}(U_{W,n}U_{(Y,\epsilon),n}U_{W,n}U_{(Y,\epsilon),n}) = \frac{1}{n^2}\operatorname{tr}(U_{Y,n}U_{Y,n}) + \frac{2\gamma}{n^2}\operatorname{tr}(U_{Y,n}) + \frac{\gamma^2}{n^2}\operatorname{tr}(P_n) \le \frac{1}{n^2}\sum_{k=1}^{n}\sum_{l=1}^{n}\phi_Y(x_k,x_l)\phi_Y(x_l,x_k) + (2\gamma c + \gamma^2)n^{-1},$$
where
$$c = \sup_D\big(\operatorname{var}(Y)\big) \qquad (155)$$
is a constant. To see the last result, one recalls Lemma A.4 to obtain
$$n^{-1}\big(2\gamma\operatorname{tr}(P_nK_{Y,n}P_n) + \gamma^2\operatorname{tr}(P_n)\big) \le n^{-1}2\gamma\|P_n\|_o\operatorname{tr}(K_{Y,n}) + \gamma^2 \le n^{-1}2\gamma\operatorname{tr}(K_{Y,n}) + \gamma^2 \le 2\gamma\sup_D\big(\operatorname{var}(Y)\big) + \gamma^2.$$
Hence, by Theorem 3.1 and Lemma A.2,
$$\left|\frac{1}{n^2}\operatorname{tr}(U_{W,n}U_{(Y,\epsilon),n}U_{W,n}U_{(Y,\epsilon),n}) - b_0\right| \le 2V_{D\times D}^*(\phi_Y^2)\delta_n + (2\gamma c + \gamma^2)n^{-1}. \qquad (156)$$
Since $\lim_{n\to\infty}n^{-1}/\delta_n = 0$, the rate of convergence above is $O(\delta_n)$. Similarly, invoking Lemma A.4 again,
$$\frac{1}{n^3}\operatorname{tr}(U_{W,n}U_{(Y,\epsilon),n}U_{j,n}U_{(Y,\epsilon),n}) \le \frac{1}{n^3}\operatorname{tr}(U_{Y,n}U_{j,n}U_{Y,n}) + 2\gamma c\,n^{-1} + \gamma^2n^{-2}, \qquad (157)$$
$$\frac{1}{n^4}\operatorname{tr}(U_{i,n}U_{(Y,\epsilon),n}U_{j,n}U_{(Y,\epsilon),n}) \le \frac{1}{n^4}\operatorname{tr}(U_{i,n}U_{Y,n}U_{j,n}U_{Y,n}) + 2\gamma c\,n^{-1} + \gamma^2n^{-2}. \qquad (158)$$
By arguments similar to those in the proof of Theorem 3.3 it follows that
$$\left|\frac{1}{n^3}\operatorname{tr}(U_{Y,n}U_{j,n}U_{Y,n}) - b(j)\right| \le \big(R_\phi(R_{\phi_i,\phi_Y} + 2Q_\phi R_\phi) + 2V_{D\times D}^*(\phi_Yh_{X,j}) + R_{h_{X,j}}Q_\phi\big)\delta_n = c_{b_j}\delta_n. \qquad (159)$$
Hence, and by (54),
$$\left|\frac{1}{n^3}\operatorname{tr}(U_{W,n}U_{(Y,\epsilon),n}U_{j,n}U_{(Y,\epsilon),n}) - b(j)\right| \le c_{b_j}\delta_n + 2\gamma c\,n^{-1} + \gamma^2n^{-2}, \qquad (160)$$
$$\left|\frac{1}{n^4}\operatorname{tr}(U_{i,n}U_{(Y,\epsilon),n}U_{j,n}U_{(Y,\epsilon),n}) - B_X(i,j)\right| \le c_{B,X}\delta_n + 2\gamma c\,n^{-1} + \gamma^2n^{-2}. \qquad (161)$$
From the remarks above it follows that there exist bounds $c_{A,\epsilon}$, $c_{M,\epsilon}$, $c_{B,\epsilon}$ and $c_{\theta,\epsilon}$ such that
$$\|A_{\epsilon,n} - A_\epsilon\|_\infty \le \max\{c_{A,X}, V_D^*(\psi_1),\ldots,V_D^*(\psi_q)\}\delta_n + pn^{-1} \le c_{A,\epsilon}\delta_n, \qquad (162)$$
$$\|M_{\epsilon,n} - M_\epsilon\|_\infty \le \max\{c_{M,X}, V_D^*(\psi_Y)\}\delta_n + \gamma n^{-1} \le c_{M,\epsilon}\delta_n, \qquad (163)$$
$$\|B_{\epsilon,n} - B_\epsilon\|_\infty \le \max\{c_{B,X}, c_{b_1},\ldots,c_{b_q}\}\delta_n + 2\gamma c\,n^{-1} + \gamma^2n^{-2} \le c_{B,\epsilon}\delta_n, \qquad (164)$$
$$\|E_{\epsilon,n} - E_\epsilon\|_\infty \le \big(q^2J_{A_\epsilon^{-1}}^2c_{B,\epsilon} + 2q^8J_{A_\epsilon^{-1}}\|\operatorname{diag}(B_\epsilon)\|_\infty\|A_\epsilon^{-1}\|_\infty^2c_{A,\epsilon} + 2q^8\|A_\epsilon^{-1}\|_\infty^3\|\operatorname{diag}(B_\epsilon)\|_\infty c_{A,\epsilon}\big)\delta_n \le c_{E,\epsilon}\delta_n, \qquad (165)$$
where
$$J_{A_\epsilon^{-1}} = \sup_{n=1,\ldots}\|A_{\epsilon,n}^{-1}\|_\infty < \infty. \qquad (166)$$

Proof of Theorem 3.4.
$$\|E[(\hat\gamma_n, \hat\theta_n)'] - A_\epsilon^{-1}M_\epsilon\|_\infty = \|A_{\epsilon,n}^{-1}M_{\epsilon,n} - A_\epsilon^{-1}M_\epsilon\|_\infty \qquad (167)$$
$$\le \big(\|A_\epsilon^{-1}\|_\infty^2c_{A,\epsilon}J_M + \|A_\epsilon^{-1}\|_\infty c_{M,\epsilon}\big)\delta_n, \qquad (168)$$
where $J_M = \sup_{n=1,\ldots}\|M_{\epsilon,n}\|_\infty < \infty$.

Proof of Lemma 3.3. For each $m$, let $D_m = T_m(D)$ and let $B_\rho(D) = D\setminus\partial_\rho(D)$. It is easily seen that $T_m(B_{r_m^{-1}\rho}(D)) = B_\rho(T_m(D))$ and hence $\mu(D_m\setminus B_\rho(D_m)) \le c\rho r_m^{d-1}$. Let $F_{1,m}(A) = \int_A f_m(\xi)\,d\xi$ for any measurable subset $A$. Let $\alpha_G = \pi^{d/2}/\Gamma(d/2+1)$, so that $\mu(\{x\in\mathbb{R}^d : \|x\|\le\rho\}) = \alpha_G\rho^d$. By the hypotheses of Lemma 3.3, and recalling that $\mu(D) = 1$ and $F_{1,m}(A) = \mu(A)/\mu(D_m)$, one observes that
$$\begin{aligned}
F_{1,m}(D_m)\,F_{1,m}(\{x\in\mathbb{R}^d : \|x\|\le\rho\}) &\ge G_m(\rho) = F_{2,m}(\{(\xi,\eta)\in D_m^2 : \|\xi-\eta\|\le\rho\})\\
&\ge F_{1,m}(B_\rho(D_m))\,F_{1,m}(\{x\in\mathbb{R}^d : \|x\|\le\rho\}) = \mu(D_m)^{-2}\mu(B_\rho(D_m))\,\mu(\{x\in\mathbb{R}^d : \|x\|\le\rho\})\\
&\ge \mu(D_m)^{-2}\big(\mu(D_m) - c\rho r_m^{d-1}\big)\alpha_G\rho^d = r_m^{-2d}\big(r_m^d - c\rho r_m^{d-1}\big)\alpha_G\rho^d = \alpha_Gr_m^{-d}\rho^d - \alpha_Gcr_m^{-d-1}\rho^{d+1},
\end{aligned}$$
while $F_{1,m}(D_m)\,F_{1,m}(\{x\in\mathbb{R}^d : \|x\|\le\rho\}) = \alpha_Gr_m^{-d}\rho^d$. Hence
$$\alpha_Gr_m^{-d}\rho^d \ge G_m(\rho) \ge \alpha_Gr_m^{-d}\rho^d - \alpha_Gcr_m^{-d-1}\rho^{d+1}$$
and one obtains the required result.
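The two-sided bound on $G_m(\rho)$ obtained above is easy to check by simulation. The sketch below is not from the paper: the square domain, the dimension and the sample size are chosen only for the illustration, and only the upper bound $\alpha_Gr_m^{-d}\rho^d$ is compared with a Monte Carlo estimate of $G_m(\rho)$ (the lower bound also involves the boundary constant $c$ of the domain).

```python
# Illustrative only: Monte Carlo estimate of G_m(rho) for uniform points on the
# square [0, r_m]^2 (d = 2), compared with the upper bound alpha_G rho^d / r_m^d.
import numpy as np

rng = np.random.default_rng(1)
d, r_m, n_pairs = 2, 10.0, 200_000
alpha_G = np.pi                                  # pi^{d/2} / Gamma(d/2 + 1) for d = 2

xi = rng.uniform(0.0, r_m, size=(n_pairs, d))
eta = rng.uniform(0.0, r_m, size=(n_pairs, d))
dist = np.linalg.norm(xi - eta, axis=1)

for rho in (0.5, 1.0, 2.0):
    G_hat = (dist <= rho).mean()
    upper = alpha_G * rho**d / r_m**d
    print(f"rho = {rho}:  G_m(rho) estimate = {G_hat:.4f}   upper bound = {upper:.4f}")
```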

The following lemmas will be necessary for the proof of Theorem 3.5.

Lemma A.7 For any real nonnegative function $\phi\in L^1(\mathbb{R}_+)$,
$$\lim_{r_m\to\infty}\frac{1}{r_m}\int_0^{r_m}\phi(\rho)\rho\,d\rho = 0.$$

Lemma A.8 For the orthonormal polynomial regressors $u_{k,m}$, $k = 1,\ldots,p$, $m = 1,2,\ldots$ on $D_m = [-r_m/2, r_m/2]^d$, there exists a constant $c$ such that for $m = 1,2,\ldots$
$$\sup_{x\in D_m}\left|\frac{\partial^{|u|}}{\partial^{u}}u_{k,m}(x)\right| \le c. \qquad (169)$$

Proof of Lemma A.8. Initially $u = (0,\ldots,0)$ is assumed. Let $m_i$, $i = 1,\ldots,l$, be all the monomials present in the polynomials $u_k$, $k = 1,\ldots,p$. Thus
$$u_{k,m} = \sum_{i=1}^{l}c_{k,i,m}m_i, \qquad (170)$$
with
$$\int_{D_m}u_{k,m}^2(x)\,dx = r_m^d. \qquad (171)$$
Hence $m_i(x) = \prod_{j=1}^{d}x(j)^{\gamma_{i,j}}$ for $i = 1,\ldots,l$ and some non-negative integers $\gamma_{i,j}$, and so
$$\kappa_{i_1,i_2,m} = \int_{D_m}m_{i_1}(x)m_{i_2}(x)\,dx = \left(\frac{r_m}{2}\right)^{d+\sum_{j=1}^{d}(\gamma_{i_1,j}+\gamma_{i_2,j})}\prod_{j=1}^{d}\frac{1 - (-1)^{1+\gamma_{i_1,j}+\gamma_{i_2,j}}}{(\gamma_{i_1,j}+\gamma_{i_2,j})+1}.$$
Hence it follows that, for all $i_1, i_2 = 1,\ldots,l$, the matrix $K$ defined as
$$K(i_1,i_2) = \frac{\kappa_{i_1,i_2,m}}{(\kappa_{i_1,i_1,m})^{1/2}(\kappa_{i_2,i_2,m})^{1/2}} \qquad (172)$$
is positive definite and does not depend on $m$. Let $\tilde m_{i,m}(x) = \kappa_{i,i,m}^{-1/2}r_m^{d/2}m_i(x)$. It follows that
$$\int_{D_m}\tilde m_{i,m}(x)\tilde m_{j,m}(x)\,dx = r_m^dK(i,j), \qquad (173)$$
$$\sup_{x\in D_m}|\tilde m_{i,m}(x)| = \kappa_{i,i,m}^{-1/2}r_m^{d/2}\sup_{x\in D_m}|m_i(x)| \le r_m^{-d/2-\sum_{j=1}^{d}\gamma_{i,j}}\,r_m^{d/2}\left(\frac{r_m}{2}\right)^{\sum_{j=1}^{d}\gamma_{i,j}} \le 1. \qquad (174)$$
Let the orthogonal matrix $P$ and the diagonal matrix $\Lambda$ be such that $P'KP = \Lambda$. Defining $v_{i,m} = \sum_{j=1}^{l}P(j,i)\tilde m_{j,m}$, one has, for some finite constant $c_v$,
$$\int_{D_m}v_{i,m}(x)v_{j,m}(x)\,dx = r_m^d\Lambda(i,j), \qquad (175)$$
$$\sup_{x\in D_m}|v_{i,m}(x)| \le c_v. \qquad (176)$$
Thus (170) can be rewritten as
$$u_{k,m} = \sum_{i=1}^{l}\tilde C_m(k,i)v_{i,m}, \qquad (177)$$
where $\tilde C_m$ is a $p\times l$ matrix with $l\ge p$. By (171), it follows that for all $m$
$$I = \tilde C_m\Lambda\tilde C_m', \qquad (178)$$
from which it is seen that the entries of the matrices $\tilde C_m$ are uniformly bounded, for all $m$, by some finite constant $c_\lambda$. It follows that
$$\sup_{x\in D_m}|u_{k,m}(x)| \le \sum_{i=1}^{l}|\tilde C_m(k,i)|\sup_{x\in D_m}|v_{i,m}(x)| \le l\,c_vc_\lambda \le c \qquad (179)$$
for some finite $c$ independent of $m$, which concludes the proof in the case $u = (0,\ldots,0)$. Extending the result to the case of generic $u$ is straightforward, since the coefficients $|P(j,i)|$ and $|\tilde C_m(k,i)|$ have been shown to be bounded uniformly for all $m$, while, in a manner similar to (174), it is easily shown that, given that the maximal order of the monomials is $L$, the following bound holds:
$$\sup_{x\in D_m}\left|\frac{\partial^{|u|}}{\partial^{u}}\tilde m_{i,m}(x)\right| = \kappa_{i,i,m}^{-1/2}r_m^{d/2}\sup_{x\in D_m}\left|\frac{\partial^{|u|}}{\partial^{u}}m_i(x)\right| \le r_m^{-d/2-\sum_{j=1}^{d}\gamma_{i,j}}\,r_m^{d/2}L^{|u|}\left(\frac{r_m}{2}\right)^{\sum_{j=1}^{d}\max\{\gamma_{i,j}-u(j),0\}} \le 1,

whence the result follows.

Lemma A.9 For some constants $\alpha_i$ and $\alpha_Y$, and the constant $\tau$ of Condition 3.3, the following bounds hold uniformly on $D_m\times D_m$ for $m = 1,2,\ldots$:
$$|\phi_{i,m} - C_i| \le \alpha_ir_m^{-d/2-\tau}, \qquad (180)$$
$$|\phi_{Y,m} - C_Y| \le \alpha_Yr_m^{-d}, \qquad (181)$$
where $\phi_{i,m}$ is defined in (46) for the domain $D_m$ and $\phi_{Y,m}$ is similarly defined.

Proof of Lemma A.9. Applying Lemma A.8 and Conditions 3.2 and 3.3, one has
$$\begin{aligned}
|\phi_{i,m}(\xi,\eta) - C_i(\xi,\eta)| &\le \left|\sum_{k=1}^{p}u_{k,m}(\xi)\int_Du_{k,m}(\lambda)C_i(\xi,\lambda)f_m(\lambda)\,d\lambda\right| + \left|\sum_{k=1}^{p}u_{k,m}(\eta)\int_Du_{k,m}(\lambda)C_i(\eta,\lambda)f_m(\lambda)\,d\lambda\right|\\
&\quad + \left|\sum_{k_1=1}^{p}\sum_{k_2=1}^{p}u_{k_1,m}(\xi)u_{k_2,m}(\eta)\int_D\int_Du_{k_1,m}(\lambda_1)u_{k_2,m}(\lambda_2)C_i(\lambda_1,\lambda_2)f_m(\lambda_1)f_m(\lambda_2)\,d\lambda_1\,d\lambda_2\right|\\
&\le 2pc^2c_ir_m^{-d/2-\tau} + p^2c^4c_{ii}r_m^{-d/2-\tau} = \alpha_ir_m^{-d/2-\tau}
\end{aligned}$$

and (180) is proved. The proof of (181) is similar.

Proof of Theorem 3.5. The proof consists of showing that the matrices $A_{\epsilon,m}$ and $M_{\epsilon,m}$, defined in (80) and (81) with the domain $D = D_m$, suitably multiplied by a diagonal matrix of constants, converge to the matrices $A_\epsilon$ and $M_\epsilon$ of (113) and (114), and of showing that the diagonal elements of the matrix $B_{\epsilon,m}$, defined in (82) with $D = D_m$, are bounded. We start by showing that this is indeed the case when $X = 0$, before showing that the limit is identical if $X \neq 0$.

We begin with the submatrix $A$ of $A_\epsilon$ (113) and the corresponding $A_{X,m}$ of $A_{\epsilon,m}$. Putting $X = 0$ in the definition of $A_{X,m}$ and applying (95), one defines
$$A_m(i,j) = A_{X=0,m}(i,j) = \int_0^{\operatorname{diam}(D_m)}C_i(\rho)C_j(\rho)\,dG_m(\rho) = d\alpha_Gr_m^{-d}\int_0^{\operatorname{diam}(D_m)}C_i(\rho)C_j(\rho)\rho^{d-1}\,d\rho - \int_0^{\operatorname{diam}(D_m)}C_i(\rho)C_j(\rho)\,dR_m(\rho), \qquad (182)$$
where, from Lemma 3.3,
$$\left|\int_0^{\operatorname{diam}(D_m)}C_i(\rho)C_j(\rho)\,dR_m(\rho)\right| \le c\,\alpha_G(d+1)r_m^{-d-1}\int_0^{\operatorname{diam}(D_m)}|C_i(\rho)C_j(\rho)|\rho^d\,d\rho. \qquad (183)$$
Condition 3.3 implies that $\int_0^\infty|C_i(\rho)C_j(\rho)|\rho^{d-1}\,d\rho < \infty$, and so Lemma A.7 implies
$$r_m^{-1}\int_0^{r_m\operatorname{diam}(D)}|C_i(\rho)C_j(\rho)|\rho^d\,d\rho \to 0$$
as $r_m\to\infty$, so that the second term in (182) vanishes. Hence
$$\lim_{r_m\to\infty}r_m^dA_m(i,j) = d\alpha_G\int_0^\infty C_i(\rho)C_j(\rho)\rho^{d-1}\,d\rho, \qquad (184)$$
$$\lim_{r_m\to\infty}r_m^dM_m(i) = \lim_{r_m\to\infty}r_m^dM_{X=0,m}(i) = d\alpha_G\int_0^\infty C_i(\rho)C_Y(\rho)\rho^{d-1}\,d\rho, \qquad (185)$$
where the last limit is shown through a similar argument.

To extend the results (184) and (185) to the case of $X \neq 0$, the following limits will be useful:
$$\lim_{r_m\to\infty}r_m^d|A_{X,m}(i,j) - A_m(i,j)| = 0, \qquad (186)$$
$$\lim_{r_m\to\infty}r_m^d|M_{X,m}(i) - M_m(i)| = 0. \qquad (187)$$
Let $\phi_{i,m}$ denote the function $\phi_i$ of (46) for the domain $D_m$. By Lemma A.9 one has
$$|\phi_{i,m}(\xi_1,\xi_2)\phi_{j,m}(\xi_2,\xi_1) - C_i(\xi_1,\xi_2)C_j(\xi_2,\xi_1)| \le \alpha_ir_m^{-d/2-\tau}|C_j(\xi_1,\xi_2)| + \alpha_jr_m^{-d/2-\tau}|C_i(\xi_1,\xi_2)| + \alpha_i\alpha_jr_m^{-d-2\tau} \qquad (188)$$
uniformly on $D_m\times D_m$, and hence (186) is proved as follows:
$$\begin{aligned}
\lim_{r_m\to\infty}r_m^d|A_{X,m}(i,j) - A_m(i,j)| &\le \lim_{r_m\to\infty}r_m^d\int_{D_m}\int_{D_m}|\phi_{i,m}(\xi_1,\xi_2)\phi_{j,m}(\xi_2,\xi_1) - C_i(\xi_1,\xi_2)C_j(\xi_2,\xi_1)|\,f_m(\xi_1)f_m(\xi_2)\,d\xi_1\,d\xi_2\\
&\le \lim_{r_m\to\infty}r_m^d\big(\alpha_jr_m^{-d/2-\tau}c_{ii}r_m^{-d/2-\tau} + \alpha_ir_m^{-d/2-\tau}c_{jj}r_m^{-d/2-\tau} + \alpha_i\alpha_jr_m^{-d-2\tau}\big) = 0.
\end{aligned}$$
A similar argument proves (187).

To bound $|B_{X,m}(i,j)|$, three inequalities will be needed. Let $\psi_i(x) = C_i(x,0)$ and $\psi_Y(x) = C_Y(x,0)$. Then, by the stationarity of the process and a change of variables,
$$\int_{\mathbb{R}^d}|C_i(\xi,\lambda)C_Y(\lambda,\eta)|\,d\lambda = \int_{\mathbb{R}^d}|C_i(\xi-\eta-\lambda,0)C_Y(\lambda,0)|\,d\lambda = \int_{\mathbb{R}^d}|\psi_i(\xi-\eta-\lambda)\psi_Y(\lambda)|\,d\lambda = (|\psi_i|*|\psi_Y|)(\xi-\eta) = \vartheta_i(\xi-\eta), \qquad (189)$$
where $\vartheta_i = |\psi_i|*|\psi_Y|$ denotes the convolution of the two functions. If $\psi_i\in L^2$ and $\psi_Y\in L^1$ (and, by its boundedness, also $\psi_Y\in L^2$), then the same applies to $|\psi_i|$ and $|\psi_Y|$. Firstly, the convolution integral converges since $\psi_i$ and $\psi_Y$ are in $L^1$. Let $\mathcal{F}(\psi)$ denote the Fourier transform of $\psi$. It follows that $\mathcal{F}(|\psi_i|)$ and $\mathcal{F}(|\psi_Y|)$ exist and are in $L^2$. Moreover, since $|\psi_Y|\in L^1$, it follows, for some constant $c^{\mathcal F}_{\psi_Y}$, that $|\mathcal{F}(|\psi_Y|)| \le c^{\mathcal F}_{\psi_Y} < \infty$. Furthermore, by the convolution theorem, $|\mathcal{F}(|\psi_i|*|\psi_Y|)| = |\mathcal{F}(|\psi_i|)\,\mathcal{F}(|\psi_Y|)| \le c^{\mathcal F}_{\psi_Y}|\mathcal{F}(|\psi_i|)| \in L^2$. Since for $L^2$ functions the Fourier transform is fully reciprocal, it follows that $(|\psi_i|*|\psi_Y|)(\xi) = \vartheta_i(\xi) \in L^2$. It is also clear that the $\vartheta_i$ are isotropic since $\psi_i$ and $\psi_Y$ are. Thus, for some constant $c_{\vartheta_{i,j}}$, $\int_{\mathbb{R}^d}|\vartheta_i(\xi)\vartheta_j(\xi)|\,d\xi \le c_{\vartheta_{i,j}} < \infty$ and so
$$\int_{D_m}\int_{D_m}|\vartheta_i(\xi-\eta)\vartheta_j(\xi-\eta)|\,d\xi\,d\eta \le \int_{D_m}\left(\frac{1}{2}\int_{D_m}\vartheta_i^2(\xi-\eta)\,d\xi + \frac{1}{2}\int_{D_m}\vartheta_j^2(\xi-\eta)\,d\xi\right)d\eta \le \frac{1}{2}(c_{\vartheta_{i,i}} + c_{\vartheta_{j,j}})r_m^d, \qquad (190)$$
which is the first of the three inequalities. Secondly, by Lemma A.9, (102) of Condition 3.2 and (105) of Condition 3.3,
$$\begin{aligned}
\int_{D_m}|\phi_{i,m}(\xi,\lambda)\phi_{Y,m}(\lambda,\eta)|\,d\lambda &\le \int_{D_m}\big(|C_i(\xi,\lambda)| + \alpha_ir_m^{-d/2-\tau}\big)\big(|C_Y(\lambda,\eta)| + \alpha_Yr_m^{-d}\big)\,d\lambda\\
&\le \int_{D_m}|C_i(\xi,\lambda)C_Y(\lambda,\eta)|\,d\lambda + \alpha_ir_m^{-d/2-\tau}\int_{D_m}|C_Y(\lambda,\eta)|\,d\lambda + \alpha_Yr_m^{-d}\int_{D_m}|C_i(\lambda,\eta)|\,d\lambda + \alpha_i\alpha_Yr_m^{-d/2-\tau}r_m^{-d}\int_{D_m}d\lambda\\
&\le \int_{D_m}|C_i(\xi,\lambda)C_Y(\lambda,\eta)|\,d\lambda + c_{\phi,i}r_m^{-d/2-\tau} \qquad (191)
\end{aligned}$$
for some constant $c_{\phi,i}$. Lastly,
$$\int_{D_m}\int_{D_m}|\vartheta_i(\xi-\eta)|\,d\xi\,d\eta \le r_m^{3d/2}\left(\int_{D_m}\int_{D_m}\vartheta_i^2(\xi-\eta)\,d\xi\,d\eta\right)^{1/2}\left(\int_{D_m}\int_{D_m}r_m^{-3d}\,d\xi\,d\eta\right)^{1/2} \le r_m^{3d/2}\big(c_{\vartheta_{i,i}}r_m^d\big)^{1/2}\big(r_m^{-d}\big)^{1/2} = (c_{\vartheta_{i,i}})^{1/2}r_m^{3d/2}. \qquad (192)$$
By (49), the preceding inequalities and (190), $|B_{X,m}(i,j)|$ is bounded by
$$\begin{aligned}
&r_m^{-4d}\int_{D_m}\int_{D_m}\left(\int_{D_m}|\phi_{i,m}(\xi,\lambda)\phi_{Y,m}(\lambda,\eta)|\,d\lambda\right)\left(\int_{D_m}|\phi_{j,m}(\xi,\lambda)\phi_{Y,m}(\lambda,\eta)|\,d\lambda\right)d\xi\,d\eta\\
&\quad\le r_m^{-4d}\int_{D_m}\int_{D_m}\left(\int_{D_m}|C_i(\xi,\lambda)C_Y(\lambda,\eta)|\,d\lambda + c_{\phi,i}r_m^{-d/2-\tau}\right)\left(\int_{D_m}|C_j(\xi,\lambda)C_Y(\lambda,\eta)|\,d\lambda + c_{\phi,j}r_m^{-d/2-\tau}\right)d\xi\,d\eta\\
&\quad\le r_m^{-4d}\int_{D_m}\int_{D_m}\vartheta_i(\xi-\eta)\vartheta_j(\xi-\eta)\,d\xi\,d\eta + r_m^{-4d}c_{\phi,j}r_m^{-d/2-\tau}\int_{D_m}\int_{D_m}\vartheta_i(\xi-\eta)\,d\xi\,d\eta\\
&\qquad + r_m^{-4d}c_{\phi,i}r_m^{-d/2-\tau}\int_{D_m}\int_{D_m}\vartheta_j(\xi-\eta)\,d\xi\,d\eta + c_{\phi,i}c_{\phi,j}r_m^{-4d}r_m^{-d-2\tau}\int_{D_m}\int_{D_m}d\xi\,d\eta\\
&\quad\le \frac{1}{2}(c_{\vartheta_{i,i}} + c_{\vartheta_{j,j}})r_m^{-3d} + \big(c_{\vartheta_{i,i}}^{1/2}c_{\phi,j} + c_{\vartheta_{j,j}}^{1/2}c_{\phi,i}\big)r_m^{-3d-\tau} + c_{\phi,i}c_{\phi,j}r_m^{-3d-2\tau},
\end{aligned}$$
and hence
$$r_m^{3d}|B_{X,m}(i,j)| \le c_B < \infty \quad\text{for all } m. \qquad (193)$$
Next, we consider the elements related to the presence of a nugget effect. Since $C_i(\xi,\xi) = C_i(0)$ and by Lemma A.9,
$$\lim_{r_m\to\infty}|a_m(i) - a(i)| = \lim_{r_m\to\infty}\left|\int_{D_m}\big(\phi_{i,m}(\xi,\xi) - C_i(\xi,\xi)\big)f_m(\xi)\,d\xi\right| \le \lim_{r_m\to\infty}\int_{D_m}|\phi_{i,m}(\xi,\xi) - C_i(\xi,\xi)|\,f_m(\xi)\,d\xi \le \lim_{r_m\to\infty}\alpha_ir_m^{-d/2-\tau}r_m^{-d}\int_{D_m}d\xi = 0 \qquad (194)$$
and similarly
$$\lim_{r_m\to\infty}|m_{0,m}(i) - m_0(i)| = 0. \qquad (195)$$

Moreover,
$$\begin{aligned}
\lim_{r_m\to\infty}r_m^d|b_{0,m}| &= \lim_{r_m\to\infty}r_m^d\left|\int_{D_m}\int_{D_m}\phi_{Y,m}(\xi,\eta)\phi_{Y,m}(\eta,\xi)f_m(\xi)f_m(\eta)\,d\xi\,d\eta\right|\\
&\le \lim_{r_m\to\infty}r_m^d\int_{D_m}\int_{D_m}\big(|C_Y(\xi,\eta)| + \alpha_Yr_m^{-d}\big)\big(|C_Y(\eta,\xi)| + \alpha_Yr_m^{-d}\big)f_m(\xi)f_m(\eta)\,d\xi\,d\eta\\
&= \lim_{r_m\to\infty}r_m^d\left(r_m^{-2d}\int_{D_m}\int_{D_m}C_Y^2(\xi,\eta)\,d\xi\,d\eta + 2\alpha_Yr_m^{-3d}\int_{D_m}\int_{D_m}|C_Y(\xi,\eta)|\,d\xi\,d\eta + \alpha_Y^2r_m^{-4d}\int_{D_m}\int_{D_m}d\xi\,d\eta\right) < \infty. \qquad (196)
\end{aligned}$$
The following holds:
$$\lim_{r_m\to\infty}r_m^d\int_{D_m}\int_{D_m}h_{X,j,m}^2(\xi,\eta)\,d\xi\,d\eta \le \lim_{r_m\to\infty}r_m^{-d}\int_{D_m}\int_{D_m}\vartheta_j^2(\xi-\eta)\,d\xi\,d\eta + \lim_{r_m\to\infty}O\big(r_m^{-\tau} + r_m^{-2\tau}\big) < \infty. \qquad (197)$$

Let $\alpha > 0$ and $\varepsilon > 0$. For $i = 1,\ldots,q$,
$$p_\alpha = P\big(|\hat\theta_{m,n(m)}(i) - \theta(i)| > \alpha\big) \le P\big(|\hat\theta_{m,n(m)}(i) - \theta_m(i)| + |\theta_m(i) - \theta(i)| > \alpha\big).$$
For $m$ large enough, one has $|\theta_m(i) - \theta(i)| \le \alpha/2$ and hence
$$p_\alpha \le P\Big(|\hat\theta_{m,n(m)}(i) - \theta_m(i)| > \frac{\alpha}{2}\Big) \le P\Big(|\hat\theta_{m,n(m)}(i) - E[\hat\theta_{m,n(m)}(i)]| + |E[\hat\theta_{m,n(m)}(i)] - \theta_m(i)| > \frac{\alpha}{2}\Big).$$
By (88) and the condition (120), one has $|E[\hat\theta_{m,n(m)}(i)] - \theta_m(i)| \le c_{\theta,\epsilon,m}\delta_{n(m)} \le \alpha/4$ for $m$ sufficiently large and hence, by Chebyshev's inequality,
$$\begin{aligned}
p_\alpha &\le P\Big(|\hat\theta_{m,n(m)}(i) - E[\hat\theta_{m,n(m)}(i)]| > \frac{\alpha}{4}\Big) \le \frac{16}{\alpha^2}\operatorname{var}\big(\hat\theta_{m,n(m)}(i)\big)\\
&\le \frac{16}{\alpha^2}(q+1)c\,\big(E_{\epsilon,m}(i+1,i+1) + c_{E,\epsilon,m}\delta_{n(m)}\big)\\
&\le \frac{16}{\alpha^2}(q+1)c\,\big(r_m^{-d}E_\epsilon(i+1,i+1) + c_{E,\epsilon,m}\delta_{n(m)}\big),
\end{aligned}$$
where (90) from Theorem 3.4 and (117) from Theorem 3.5 are used. Hence the result follows by the condition (121). The proof for $\hat\gamma_{m,n(m)}$ follows the same argument.

Acknowledgments. This paper is based on the thesis of the first author under the co-supervision of the second author and of Marc Moore and Denis Marcotte of the École Polytechnique de Montréal. Both authors wish to thank them for their help with the thesis.

References

Atkinson, K. & Han, W. (2001). A Functional Analysis Framework, vol. 39 of Texts in Applied Mathematics. New York: Springer.
Bartlett, M. S. (1964). Spectral analysis of two-dimensional point processes. Biometrika 44, 299–311.
Cressie, N. A. C. (1993). Statistics for Spatial Data. New York: John Wiley & Sons, revised ed.
Cressie, N. A. C. & Grondona, M. O. (1992). A comparison of variogram estimation with covariogram estimation. In The Art of Statistical Science, K. V. Mardia, ed. Chichester: Wiley.
Davis, J. C. (1973). Statistics and Data Analysis in Geology. New York: John Wiley & Sons.
Diggle, P. J. (1983). Statistical Analysis of Spatial Point Patterns. New York: Academic Press.
Fang, K. T. & Wang, Y. (1994). Number-theoretic Methods in Statistics. London: Chapman & Hall.
Hall, P. & Patil, P. (1994). Properties of nonparametric estimators of autocovariance for stationary random fields. Probability Theory and Related Fields 99, 399–424.
Kitanidis, P. K. (1985). Minimum variance unbiased quadratic estimation of covariances of regionalized variables. Journal of the International Association for Mathematical Geology 17, 195–208.
Korobov, N. M. (1959). The approximate computation of multiple integrals. Dokl. Akad. Nauk SSSR 124, 1207–1210.
Lahiri, S. N. (1996). On inconsistency of estimators under infill asymptotics for spatial data. Sankhya 58, 403–417.
Lahiri, S. N., Kaiser, M. S., Cressie, N. & Hsu, N. (1999). Prediction of spatial cumulative distribution functions using subsampling. Journal of the American Statistical Association 94, 86–97.
Matheron, G. (1965). Les variables régionalisées et leur estimation. Paris: Masson et Cie.
Niederreiter, H. (1992). Random Number Generation and Quasi-Monte Carlo Methods. Philadelphia: SIAM.
Niederreiter, H. & Spanier, J., eds. (2000). Monte Carlo and Quasi-Monte Carlo Methods 1998. New York: Springer.
Powojowski, M. (2000). Sur la modélisation et l'estimation de la fonction de covariance d'un processus aléatoire. Ph.D. thesis, Université de Montréal.
Powojowski, M. (2002a). Model selection in covariogram estimation. Tech. rep., in preparation.
Powojowski, M. (2002b). Spectral component additive models of the covariogram. Tech. rep., in preparation.
Rao, C. R. & Kleffe, J. (1988). Estimation of Variance Components and Applications. Amsterdam: North-Holland.
Ripley, B. D. (1988). Statistical Inference for Spatial Processes. Cambridge: Cambridge University Press.
Shapiro, A. & Botha, J. D. (1991). Variogram fitting with a general class of conditionally nonnegative definite functions. Computational Statistics and Data Analysis 11, 87–96.
Stein, M. L. (1987). Minimum norm quadratic estimation of spatial variograms. Journal of the American Statistical Association 82, 765–772.
Stein, M. L. (1989). Asymptotic distribution of minimum norm quadratic estimators of the covariance function of a Gaussian random field. The Annals of Statistics 17, 980–1000.
Stein, M. L. & Handcock, M. S. (1989). Some asymptotic properties of kriging when the covariance function is misspecified. Journal of the International Association for Mathematical Geology 21, 839–861.
Wackernagel, H. (1995). Multivariate Geostatistics. New York: Springer-Verlag.

