Random notes on kriging: an introduction to geostatistical interpolation for environmental applications

Luca Bonaventura, Stefano Castruccio
MOX - Laboratorio di Matematica Applicata
Dipartimento di Matematica, Politecnico di Milano
[email protected]

Contents

1 The estimation of spatially distributed and uncertain data

2 Basic definitions on random fields
   2.1 Finite dimensional distributions
   2.2 First, second order moments and variograms of random fields
   2.3 Analysis of random fields
   2.4 Definitions of stationarity of random fields
   2.5 Characterization and representation theorems for variogram functions
   2.6 Measurement error and subgrid scales: the nugget effect
   2.7 Isotropic variogram models

3 Variogram estimation
   3.1 Empirical variogram estimators
   3.2 Least squares variogram fitting procedures

4 Spatial prediction and kriging
   4.1 Ordinary kriging
   4.2 Universal kriging

5 Appendix: Basics of random variables

References

Introduction

The purpose of these notes is to provide a short and self-contained introduction to the literature on geostatistical interpolation of scattered data. The existing literature on this topic has a very broad scope and presents the related issues from a wide range of rather different perspectives, motivated by highly specific applications, e.g. in mining, groundwater flow modelling, oceanography and meteorology. The aim of this introduction is to summarize in a consistent way the basic terminology and the key theoretical concepts underlying the practice of geostatistical interpolation and to present the derivation of the most widely used kriging estimators. There is no attempt at a complete presentation of the underlying theories or methods, which is available in a number of well known publications. For a more complete description of the statistical techniques surveyed here, the reader is referred, among many others, to the presentations in [5], [9], [12], [13]. A more advanced presentation of the same material for readers with a good background in mathematical statistics can be found in [4]. There is also no attempt at achieving a high standard of mathematical rigour in the formulation of the definitions and theorems. The reader interested in the complete presentation of the measure theoretic problems associated with probability spaces, random variables and random fields should consult textbooks such as [2]. A basic introduction to probability theory and mathematical statistics can be found e.g. in [10].


Chapter 1

The estimation of spatially distributed and uncertain data

Consider N points xi, i = 1, · · · , N, in the vector space R^d. At these locations, data zi, i = 1, · · · , N, are assumed to be known. These data are interpreted as the values of a field z, whose value depends on the position in space. In general, the points xi will be scattered disorderly in space, rather than aligned on a regular grid. Furthermore, the data are assumed to be affected by some uncertainty, due either to measurement error, or to the fact that the quantity z is dependent on some unpredictable physical process, or both.

Definition 1 (Geostatistical interpolation) Given the N points xi, i = 1, · · · , N, and the uncertain data zi, i = 1, · · · , N, the geostatistical interpolation problem consists of

• predicting the most appropriate value z0 for the quantity z at a point x0, different from the points associated with the available data;

• estimating the uncertainty of the prediction z0 as a function of the uncertainty on the available data zi, i = 1, · · · , N, and of their correlation structure.

The geostatistical interpolation problem is quite different from the classical interpolation problem. In classical interpolation, the data zi are assumed to be sampled from a function z(x), which is reconstructed from the data under some assumption on the nature of the interpolating function ẑ. Typically, for classical Lagrange interpolation one assumes that the function ẑ is a polynomial (see e.g. [11]), while in the case of Radial Basis Function interpolation (which is quite useful for deterministic interpolation from scattered data and has many technical similarities with kriging as far as the formulation of the interpolation problem is concerned, see e.g. [3]) the interpolator is assumed to be a linear combination of shape functions with particular properties. Furthermore, the approximation error is dependent on the regularity of the underlying function z and of its derivatives. On the other hand, geostatistical interpolators do not depend in general on the regularity of z and do not yield in general regular reconstructions, apart from the fact that, if measurement errors and subgrid effects are disregarded, an exact interpolation condition holds at the points xi, i = 1, · · · , N.

Chapter 2

Basic definitions on random fields

Definition 2 A random field is a function Z = Z(ω, x) which assigns a real number to each pair (ω, x), where ω is an event in a probability space (Ω, P) and x ∈ R^d (in the following, the dependence on ω will often be omitted for the sake of simplifying the notation).

Thus, a random field is a function of several real variables which also happens to depend on elements of a probability space. A short review of the basic properties of these mathematical objects will show how they combine the peculiarities of random variables and scalar fields on R^d. Concepts from both analysis and probability theory are necessary for a proper description of their behaviour.

2.1 Finite dimensional distributions

From a probabilistic viewpoint, the behaviour of a random field is completely determined if it is known how to compute the probabilities

P[Z(x1) ∈ (a1, b1), · · · , Z(xN) ∈ (aN, bN)],   (2.1)

where ai, bi denote the endpoints of arbitrary intervals on the real line. For each N and each set of N points xi, i = 1, · · · , N, the probabilities (2.1) define uniquely a set of values P_(x1,··· ,xN)[(a1, b1), · · · , (aN, bN)] which identifies a probability distribution on R^N. These probability distributions are called the finite dimensional distributions of the random field Z. It should be observed that the quantities (2.1) are symmetric with respect to permutations of the set of points xi, i = 1, · · · , N. In the case of random fields with continuous finite dimensional distributions, to compute (2.1) it is sufficient to prescribe for each N and each set of N points xi, i = 1, · · · , N a probability density f_Z(u) = f_(Z(x1),··· ,Z(xN))(u1, · · · , uN).


Theorem 1 (Kolmogorov) A set of probability distributions on R^N, defined as P_(x1,··· ,xN)([a1, b1], · · · , [aN, bN]) for N ≥ 1 and symmetric with respect to permutations of the set of points xi, i = 1, · · · , N, determines uniquely the probability of any event associated with the random field if one assumes

P_(x1,··· ,xN)([a1, b1], · · · , [aN, bN]) = P[Z(x1) ∈ [a1, b1], · · · , Z(xN) ∈ [aN, bN]].

An important example are Gaussian random fields, for which the finite dimensional distributions are defined by multidimensional Gaussian distributions, whose densities are given for a generic set of points xi, i = 1, · · · , N by

f_Z(u) = (2π)^(−N/2) (det A)^(−1/2) exp{ −(u − m) · A^(−1)(u − m) / 2 },   (2.2)

where m = (m(x1), · · · , m(xN)) is a vector of space dependent quantities and A = A_(x1,··· ,xN) is a symmetric, positive definite matrix.
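Equation (2.2) suggests a direct way to simulate a Gaussian random field at finitely many points: assemble the covariance matrix A and factor it. The following Python sketch is only an illustration added to these notes, not part of the original derivation; the exponential covariance and all numerical values are arbitrary assumptions.

```python
import numpy as np

def exp_cov(h, sill=1.0, rng=0.5):
    """Assumed covariance C(h) = sill * exp(-h / rng); any valid covariance works."""
    return sill * np.exp(-h / rng)

gen = np.random.default_rng(0)
pts = gen.random((50, 2))                            # 50 scattered points in the unit square
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
A = exp_cov(dist)                                    # covariance matrix of (2.2)
m = np.zeros(50)                                     # constant zero mean
L = np.linalg.cholesky(A + 1e-10 * np.eye(50))       # small jitter for numerical stability
z = m + L @ gen.standard_normal(50)                  # one realization Z(x_i), i = 1, ..., 50
```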

2.2 First, second order moments and variograms of random fields

The average and the variance of a random field are defined as usual for random variables:

m(x) = E[Z(x)] = ∫_{−∞}^{+∞} u f_{Z(x)}(u) du,   (2.3)

Var[Z(x)] = σ_Z²(x) = E[(Z(x) − m(x))²] = ∫_{−∞}^{+∞} (u − m(x))² f_{Z(x)}(u) du.   (2.4)

The computation of mean and variance only involves the one dimensional distributions. Other quantities such as the covariance require instead two dimensional finite distributions:

Cov[Z(x), Z(y)] = E[(Z(x) − m(x))(Z(y) − m(y))].   (2.5)

The covariance is defined if the first and second order moments of the random field exist. In the case of Gaussian random fields whose finite dimensional distributions are described by equation (2.2), the vector m = (m(x1), · · · , m(xN)) has indeed as components the mean values of the field at locations x1, · · · , xN, while the matrix A is such that a_{i,j} = Cov[Z(xi), Z(xj)]. A very important quantity which plays a key role in the development of statistical interpolators is the variogram.

Definition 3 (Variogram) The variogram of a random field Z(x) is defined as Var[Z(x) − Z(y)]. The quantity

γ(x, y) = (1/2) Var[Z(x) − Z(y)]

is called the semivariogram of Z. If Z has constant mean, the semivariogram is defined equivalently as

γ(x, y) = (1/2) E[(Z(x) − Z(y))²].

If a random field has second order moments, both variogram and covariance exist and there is a simple relationship between them:

Var[Z(x) − Z(y)] = Var[Z(x)] + Var[Z(y)] − 2 Cov[Z(x), Z(y)].   (2.6)

Higher order moments can also be defined as done for standard random variables. However, in practice they are quite difficult to estimate from the data, and in many applications estimation and inference are only feasible for first and second order moments.

2.3 Analysis of random fields

If a random field Z(ω, x) is considered as a function of the spatial variable, a number of the usual analysis concepts (limit, continuity, derivative) can be introduced. This, however, can be done in different ways depending on how the dependency on the probability space is dealt with. Various concepts of limit are given here for the (spatially) pointwise convergence of a sequence of random fields Zn(ω, x), n = 1, · · · , ∞. The same definitions can be extended to different types of convergence in the spatial variable. Furthermore, based on these limit concepts, the continuity and differentiability of random fields can also be defined accordingly.


Definition 4 (Pointwise convergence in probability) The sequence Zn(ω, x), n = 1, · · · , ∞, converges pointwise in probability to Z(ω, x) if for any ε > 0 and for any x ∈ R^d one has

lim_{n→∞} P[|Zn(ω, x) − Z(ω, x)| > ε] = 0.   (2.7)

Definition 5 (Convergence with probability one) The sequence Zn(ω, x), n = 1, · · · , ∞, converges pointwise with probability one to Z(ω, x) if for any x ∈ R^d one has

P[ lim_{n→∞} |Zn(ω, x) − Z(ω, x)| = 0 ] = 1.   (2.8)

Definition 6 (Convergence in mean square sense) The sequence Zn(ω, x), n = 1, · · · , ∞, converges pointwise in mean square sense to Z(ω, x) if for any x ∈ R^d one has

lim_{n→∞} E[|Zn(ω, x) − Z(ω, x)|²] = 0.   (2.9)

These convergence concepts are not independent of each other: for example, both convergence in mean square sense and convergence with probability one imply convergence in probability. An important result relating the continuity of a random field to the properties of its second order moments is the following:

Theorem 2 (Continuity of random fields) If there is a β > 0 such that

E[(Z(x) − Z(y))²] ≤ C ‖x − y‖^(2d+β),

the random field Z(x) is continuous with probability one.

Proof: See e.g. [1].

This theorem implies that the specific features of the variogram function have a relevant impact on the regularity of the field as a function of the spatial variables.

2.4 Definitions of stationarity of random fields

Geostatistical interpolation, as will be seen later, can in general be introduced independently of any hypothesis on the nature of the random field. However, in order to achieve an acceptable estimate of the semivariogram without requiring an amount of data much larger than what is usually available (especially in underground flow or mining applications), some restrictions on the nature of the allowed random fields are necessary. Similar restrictions are also introduced for either conceptual or practical reasons in other areas in which random fields are applied.


Definition 7 (Stationary random fields) A random field is called stationary if for any vector h ∈ R^d and for any set of points xi, i = 1, · · · , N, one has

P[Z(x1 + h) ∈ [a1, b1], · · · , Z(xN + h) ∈ [aN, bN]] = P[Z(x1) ∈ [a1, b1], · · · , Z(xN) ∈ [aN, bN]].   (2.10)

The stationarity property can also be summarized by saying that the finite dimensional distributions of a stationary field are translation invariant. As a consequence, all the single site moments E[Z(x)^k], k ≥ 1, are constant. If they exist, both covariance and semivariogram only depend on the difference between the two locations at which Z is evaluated.

Definition 8 (Intrinsically stationary random fields) A random field is called intrinsically stationary if the field semivariogram is only a function of the difference between the two positions at which the increment is computed, that is, if there exists a real scalar field γ on R^d such that

γ(x, y) = γ(x − y).   (2.11)

In general, the class of intrinsically stationary random fields is much larger than that of stationary random fields. Furthermore, a stationary field is also intrinsically stationary.

Definition 9 (Second order stationary random fields) A random field is called second order stationary if the field covariance exists and is only a function of the difference between the two positions at which it is computed, that is, if there exists a real scalar field C on R^d such that

C(x, y) = C(x − y).   (2.12)

If a field Z has finite second order moments that are constant in space, definitions 8 and 9 are equivalent, since one can use equation (2.6) to obtain

2γ(x, y) = Var[Z(x)] + Var[Z(y)] − 2 Cov[Z(x), Z(y)] = 2 Var[Z(0)] − 2 Cov[Z(x), Z(y)].   (2.13)

Definition 10 (Increment stationary random fields) A random field is called increment stationary if the field of increments Z(x) − Z(0) is stationary. Increment stationary random fields are also intrinsically stationary.


Definition 11 (Isotropic random fields) An intrinsically (second order) stationary random field is called isotropic if, for any x, y ∈ R^d, the semivariogram (covariance) only depends on the Euclidean norm of the difference between the two points, that is, γ(x, y) = γ(‖x − y‖).

Some special cases of anisotropy can be handled more easily, as in the case of

Definition 12 (Geometrically anisotropic random fields) An intrinsically stationary random field is geometrically anisotropic if its semivariogram is given by γ(x, y) = γ°(‖A(x − y)‖), where A is a d × d matrix.

In the case of Gaussian random fields, stationarity and second order stationarity coincide, since the finite dimensional distributions of the field are entirely determined by the mean and the covariance function.

2.5 Characterization and representation theorems for variogram functions

In geostatistical interpolation, variograms have in general to be estimated from the data. In order to reconstruct their functional form, however, it is necessary to take into account that variograms belong to a special class of functions that will now be defined. If this fact is disregarded, serious inconsistencies may arise when using estimated variograms which do not belong to this class, such as for example negative values for positive quantities like the kriging variance.

Definition 13 (Conditionally negative definite functions) A function φ(x, y) is called conditionally negative definite if for any N ≥ 2, given xi ∈ R^d, i = 1, · · · , N, and any set of real numbers αi, i = 1, · · · , N, such that

Σ_{i=1}^N αi = 0,

one has

Σ_{i=1}^N Σ_{j=1}^N αi αj φ(xi, xj) ≤ 0.

Theorem 3 (Conditional negative definiteness of variograms) The semivariogram of an intrinsically stationary random field is a conditionally negative definite function.


Proof: Let αi, i = 1, · · · , N, be such that Σ_{i=1}^N αi = 0 and assume that Z is an intrinsically stationary random field. Given xi ∈ R^d, i = 1, · · · , N, one has

( Σ_{i=1}^N αi Z(xi) )² = −(1/2) Σ_{i=1}^N Σ_{j=1}^N αi αj (Z(xi) − Z(xj))²,   (2.14)

since Σ_{i=1}^N αi = 0. Taking the expected value one obtains

Σ_{i=1}^N Σ_{j=1}^N αi αj 2γ(xi − xj) = −2 Var( Σ_{i=1}^N αi Z(xi) ) ≤ 0.   (2.15)

Conditionally negative definite functions can be characterised as follows.

Theorem 4 Let γ(·) be a continuous function on R^d such that γ(0) = 0. The following statements are equivalent:

• γ(·) is conditionally negative definite;

• for any a > 0, exp(−aγ(·)) is positive definite;

• there exist a quadratic form Q(·) ≥ 0 and a positive measure G(·) that is symmetric, continuous at the origin and satisfies ∫_{R^d} (1 + ‖ω‖²)^(−1) G(dω) < +∞, such that

γ(h) = Q(h) + ∫_{R^d} (1 − cos(ω′h)) / ‖ω‖² G(dω).   (2.16)

As a result, one obtains the following representation theorem.

Theorem 5 (Schoenberg-Yaglom) A continuous function φ(x, y) that is conditionally negative definite and such that φ(x, x) = 0 is the variogram of an intrinsically stationary random field.

Proof: Define the random field

Z(s) = ∫_{R^d} (e^{iω′s} − 1) / ‖ω‖ W(dω),   (2.17)

where {W(s) : s ∈ R^d} is a complex valued zero mean random field with independent increments and such that E(|W(dω)|²) = G(dω)/2. One then has

Z(s + h) − Z(s) = ∫_{R^d} e^{iω′s} W*_h(dω),   (2.18)

where W*_h is the independent increment field such that

E(|W*_h(dω)|²) = G*_h(dω) = ∫_{−∞}^{ω1} · · · ∫_{−∞}^{ωd} (1 − cos(ν′h)) / ‖ν‖² G(dν).   (2.19)

The random field defined by (2.17) then has semivariogram

γ(h) = (1/2) ∫_{R^d} (1 − cos(ω′h)) / ‖ω‖² G(dω),   (2.20)

which is in the form of equation (2.16) with Q(h) = 0.

A consequence of these representation theorems is that, given any set of semivariograms γi, i = 1, · · · , m, and non negative coefficients αi, i = 1, · · · , m, the linear combination γ = Σ_{i=1}^m αi γi is also the semivariogram of an intrinsically stationary process. Functions that satisfy the hypotheses of theorem 4 are also called admissible or valid variogram functions. Similar representation theorems can be derived also for covariograms, based on the concept of conditionally positive definite functions. For second order stationary fields the related representation theorems are entirely equivalent.

2.6 Measurement error and subgrid scales: the nugget effect

It is clear from definition 3 that for a stationary (in any sense) random field one has γ(0) = 0. If the variogram is assumed to be continuous at the origin, it will be seen in the following that the geostatistical interpolation procedure yields an exact interpolation of the known data at the points where the field has effectively been sampled. This is not appropriate in many cases, for two reasons. On one hand, it does not allow one to include measurement error among the uncertainties that affect the data: measurement error is in general assumed to be spatially uncorrelated and should not affect the structure of the variogram for values of h different from zero. Another important effect that is not taken into account if the variogram is assumed to be continuous is the so called nugget effect, i.e. the possibility of sudden jumps in field values on spatial scales that have not been completely sampled by the available data. In many applications, it is necessary to allow for the possibility that, even very close to a sampled point, the reconstructed random field can take rather different values in a way that is effectively independent of the sampled value.


Both these effects, although conceptually quite different, can be effectively described by allowing the variogram to be discontinuous at the origin. In particular, if lim_{h→0} γ(h) = c0 with c0 different from zero, the variogram is said to display the nugget effect. A complete proof of the formal equivalence of the nugget effect and the inclusion of measurement errors can be found in [4].

2.7 Isotropic variogram models

A number of isotropic variogram models have been widely used in the applications. In all these examples, we denote semivariograms by γθ(·), where θ represents the vector of free parameters that fully determine the variogram shape. For the variogram models we consider, it will often be the case that θ = (c0, c1, c2), where c0 is the nugget parameter, i.e. the non zero limit lim_{h→0} γ(h) = c0 in case the variogram model is assumed to be discontinuous at the origin, c1 is the so called sill parameter, which for bounded variogram models determines the limit value lim_{h→+∞} γ(h) = c0 + c1, and c2 is the range, i.e. the typical spatial scale associated with significant changes in the variogram function. It is to be remarked that for some authors the range denotes instead the maximum distance beyond which the correlation between two different field values is zero; we use here a more general definition.

Definition 14 (Power law model) The power law variogram is given by

γθ(h) = 0 for h = 0,   γθ(h) = c0 + c1 |h|^λ for h ≠ 0,   (2.21)

with θ = (c0, c1) and c0, c1 ≥ 0.

The particular case λ = 1 is also known as the linear variogram model. In order to satisfy the requirements for admissible variograms described in section 2.5, it must be assumed that 0 < λ < 2. For this variogram model, lim_{h→+∞} γ(h) = +∞, so that the variogram does not have a sill, does not define an associated covariogram, and the associated random field does not have a spatial scale on which correlations decay.

Definition 15 (Exponential model) The exponential variogram model is given by

γθ(h) = 0 for h = 0,   γθ(h) = c0 + c1 (1 − exp(−|h|/c2)) for h ≠ 0,   (2.22)

where θ = (c0, c1, c2), ci ≥ 0 for i = 0, 1, 2.


Definition 16 (Gaussian model) The Gaussian variogram model is defined by

γθ(h) = 0 for h = 0,   γθ(h) = c0 + c1 (1 − exp(−|h|²/c2²)) for h ≠ 0,   (2.23)

with θ = (c0, c1, c2), ci ≥ 0 for i = 0, 1, 2.

It should be remarked that random fields with Gaussian variogram need not be Gaussian random fields. Gaussian variograms imply very smooth random fields, which are often not realistic for many practical applications.

Definition 17 (Spherical model) The spherical model is defined by

γθ(h) = 0 for h = 0,
γθ(h) = c0 + c1 ( (3/2)(|h|/c2) − (1/2)(|h|/c2)³ ) for 0 < h ≤ c2,
γθ(h) = c0 + c1 for h > c2,   (2.24)

with θ = (c0, c1, c2), ci ≥ 0 for i = 0, 1, 2.

This formula defines a valid variogram only if h is the norm of a vector in R^2 or R^3.
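The following Python sketch, added here as an illustration, evaluates the four isotropic models above; the parameter names c0, c1, c2 mirror the notation of this section, and all numerical values in the example call are arbitrary.

```python
import numpy as np

def power_law(h, c0, c1, lam):
    """Power law model (2.21); admissible only for 0 < lam < 2."""
    return np.where(h == 0, 0.0, c0 + c1 * np.abs(h) ** lam)

def exponential(h, c0, c1, c2):
    """Exponential model (2.22): nugget c0, sill c0 + c1, range parameter c2."""
    return np.where(h == 0, 0.0, c0 + c1 * (1.0 - np.exp(-np.abs(h) / c2)))

def gaussian(h, c0, c1, c2):
    """Gaussian model (2.23): very smooth behaviour near the origin."""
    return np.where(h == 0, 0.0, c0 + c1 * (1.0 - np.exp(-(h / c2) ** 2)))

def spherical(h, c0, c1, c2):
    """Spherical model (2.24): reaches the sill exactly at h = c2."""
    r = np.clip(np.abs(h) / c2, 0.0, 1.0)
    return np.where(h == 0, 0.0, c0 + c1 * (1.5 * r - 0.5 * r ** 3))

h = np.linspace(0.0, 2.0, 5)
print(exponential(h, c0=0.1, c1=1.0, c2=0.5))   # nugget 0.1, sill 1.1
```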

Chapter 3

Variogram estimation

In order to estimate the variogram of an intrinsically stationary random field from the available data, several variogram estimators have been introduced, which are used to derive the so called empirical variogram, i.e. a discrete set of values to which an admissible variogram model can then be fitted. For the purposes of this presentation, we will restrict the attention to isotropic random fields, although similar considerations can be carried out in the anisotropic case.

3.1 Empirical variogram estimators

A finite set of positive values hk, k = 1, · · · , K, is introduced. These values are assumed to be ordered so that hk < hk+1 and they are interpreted as absolute distances from the origin. We also introduce the positive values δk, k = 1, · · · , K, so that the intervals [hk − δk/2, hk + δk/2) are mutually disjoint and cover completely the interval [0, hK + δK/2). These values can be used to define the distance classes

N(hk) = {(xi, xj) : hk − δk/2 ≤ ‖xi − xj‖ < hk + δk/2}.   (3.1)

Here, xi ∈ R^d, i = 1, · · · , N, denotes as in the previous chapters the points at which the data are available, so that class N(hk) includes all pairs of measurement points whose mutual distance falls in the interval [hk − δk/2, hk + δk/2). N(hk) = |N(hk)| will denote in the following the cardinality of class N(hk). In general, it is required that the distance classes are sufficiently populated for the variogram estimation to be significant. For example, [5] suggests that N(hk) ≥ 30. In case this condition is not satisfied, new values of hk should be chosen to guarantee the significance of the variogram estimation.

The classical Matheron estimator is defined for k = 1, · · · , K as

γ̂^M(hk) = (1/(2 N(hk))) Σ_{(xi,xj) ∈ N(hk)} (Z(xi) − Z(xj))².   (3.2)

This is the most straightforward form of a variogram estimator and it has been widely applied, see e.g. [5], [6], [7], [12]. One problem with the Matheron estimator is that it can be very sensitive to the presence of outliers in the data. In [8], a more robust estimator was proposed by Cressie and Hawkins. This is defined for k = 1, · · · , K as

γ̂^C(hk) = [ (1/N(hk)) Σ_{(xi,xj) ∈ N(hk)} |Z(xi) − Z(xj)|^(1/2) ]^4 / ( 2 (0.457 + 0.494/N(hk)) ).   (3.3)

This choice can be explained as follows: for Gaussian random fields, (Z(xi) − Z(xj))² is, up to a normalizing factor, a random variable with a χ² distribution with one degree of freedom. For this type of variable, raising to the power 1/4 is heuristically the transformation that yields a distribution most similar to a normal distribution, and it can be proven that the values |Z(xi) − Z(xj)|^(1/2) are less correlated among themselves than the values |Z(xi) − Z(xj)|². Another alternative is the estimator

γ̂^(med)(h) = [ med{ |Z(xi) − Z(xj)|^(1/2) : (xi, xj) ∈ N(h) } ]^4 / (2 B(h)),   (3.4)

where med{·} denotes the median of the values in brackets and B(h) is a bias correction that tends to the asymptotic value 0.457.
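As an illustration of how the distance classes (3.1) and the Matheron estimator (3.2) can be implemented, a minimal Python sketch follows; the synthetic data and the uniform choice of bin centers and widths are assumptions made only for the example.

```python
import numpy as np

def matheron_variogram(pts, z, h, delta):
    """Empirical semivariogram (3.2): h are class centers, delta the class widths."""
    # all pairwise distances and squared increments
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    sq = (z[:, None] - z[None, :]) ** 2
    iu = np.triu_indices(len(z), k=1)                 # count each pair once
    d, sq = d[iu], sq[iu]
    gamma = np.empty(len(h))
    counts = np.empty(len(h), dtype=int)
    for k, (hk, dk) in enumerate(zip(h, delta)):
        in_class = (d >= hk - dk / 2) & (d < hk + dk / 2)
        counts[k] = in_class.sum()                    # [5] suggests at least 30 pairs
        gamma[k] = sq[in_class].sum() / (2 * counts[k]) if counts[k] else np.nan
    return gamma, counts

gen = np.random.default_rng(1)
pts = gen.random((200, 2))
z = np.sin(4 * pts[:, 0]) + 0.1 * gen.standard_normal(200)   # synthetic data
h = np.linspace(0.05, 0.5, 10)
gamma, counts = matheron_variogram(pts, z, h, delta=np.full(10, 0.05))
```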

3.2 Least squares variogram fitting procedures

Once an empirical variogram has been estimated using the techniques outlined in the previous section, a valid variogram model can be fitted to the estimated values. More precisely, denote by γ̂^](h) one of the variogram estimators defined in section 3.1 and by γ(h; θ) a valid variogram model, dependent on a parameter vector θ. The simplest fitting procedure, also known as the ordinary least squares method, computes an optimal value of θ by minimization of the functional

Σ_{k=1}^K ( γ̂^](hk) − γ(hk; θ) )².   (3.5)


This provides a purely geometrical fitting and does not use any information on the distribution of the specific estimator γ̂^](h) being used. This is instead taken into account in the so called generalized least squares method, which can be defined as follows. Let γ̂^](hk), k = 1, · · · , K, be the estimated values of the empirical variogram for an a priori fixed number K of distance classes. Furthermore, assume that the number of data pairs in each distance class is sufficiently large (Cressie suggests to consider only classes for which at least 30 data pairs are present). One can then consider the random vector 2γ^] = (2γ^](h1), . . . , 2γ^](hK))^T and its covariance matrix V = var(2γ^]). The generalized least squares method consists in determining the parameter vector θ that minimizes the functional

(2γ^] − 2γ(θ))^T V(θ)^(−1) (2γ^] − 2γ(θ)),   (3.6)

where 2γ(θ) = (2γ(h1; θ), . . . , 2γ(hK; θ))^T is the theoretical variogram model to be fitted, computed at distances h1, . . . , hK. The resulting estimator is denoted by θ^]_V. The generalized least squares method only uses the second order moments of the variogram estimator and does not require any assumption on the data distribution. On the other hand, the covariance matrix can be quite complex to derive and the minimization of the functional (3.6) is not easy. For this reason, a simplified procedure is presented in [5], based on heuristic considerations valid in the case of a Gaussian field Z. This derivation shows that the nondiagonal terms of V can be disregarded in a first approximation, and that the diagonal terms can be approximated by

V_{j,j} ≈ 2 (2γ(hj; θ))² / |N(hj)|.

As a consequence, an estimator of the parameter vector θ can be obtained by minimization of the functional

Σ_{j=1}^K N(hj) ( γ̂(hj)/γ(hj; θ) − 1 )².   (3.7)

Formula (3.7) yields a criterion that attributes a greater importance to well populated distance classes hj for which N(hj) is larger. This approximation can also be considered as the first step of an iterative procedure, in which the minimization of (3.6) is sought via a sequence θ^k, where θ^0 is obtained by minimizing (3.7), and the following θ^k are obtained by minimization of

(2γ^] − 2γ(θ))^T V(θ^(k−1))^(−1) (2γ^] − 2γ(θ)).   (3.8)
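A sketch of the weighted least squares criterion (3.7), fitted here to the exponential model (2.22) with scipy.optimize.minimize; the starting values, bounds and synthetic empirical variogram are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def exponential(h, c0, c1, c2):
    """Exponential variogram model (2.22)."""
    return np.where(h == 0, 0.0, c0 + c1 * (1.0 - np.exp(-np.abs(h) / c2)))

def wls_loss(theta, h, gamma_hat, counts):
    """Weighted least squares functional (3.7)."""
    c0, c1, c2 = theta
    gamma_model = exponential(h, c0, c1, c2)
    return np.sum(counts * (gamma_hat / gamma_model - 1.0) ** 2)

# gamma_hat and counts would come from an empirical estimator such as (3.2)
h = np.linspace(0.05, 0.5, 10)
gamma_hat = 0.1 + 0.9 * (1.0 - np.exp(-h / 0.2))     # synthetic "empirical" values
counts = np.full(10, 200)
res = minimize(wls_loss, x0=[0.05, 1.0, 0.1], args=(h, gamma_hat, counts),
               bounds=[(0, None), (1e-6, None), (1e-6, None)], method="L-BFGS-B")
c0, c1, c2 = res.x
```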

Chapter 4

Spatial prediction and kriging

Geostatistical interpolation consists in recovering an optimal prediction of the field value at a location where no data are available, using the known data both for the purpose of estimating the field variogram (or covariance) and to provide a prediction and an estimate of the prediction error. It is to be remarked that the stationarity assumptions that will be made in the following sections play an essential role in both steps.

4.1 Ordinary kriging

In ordinary kriging, the uncertain data zi, i = 1, · · · , N, assumed to be known at the N points xi, i = 1, · · · , N, are interpreted as a realization of an intrinsically stationary random field Z(x) with constant mean µ. The constant mean is not assumed to be known, while the semivariogram has to be available. The implications of using estimated variograms on the quality of the estimate will be discussed later. This amounts to assuming Z(x) = µ + δ(x), where δ is a zero mean random field. Considering definition 3, these assumptions imply that

E[(Z(x) − Z(y))²] = E[(δ(x) − δ(y))²] = 2γ(x, y).   (4.1)

Under these assumptions, one can define ordinary kriging as follows:

Definition 18 (Ordinary kriging) Given a point x0, the ordinary kriging estimator at x0 based on the data Z(xi), i = 1, · · · , N, is defined as the linear unbiased estimator

Ẑ(x0) = Σ_{i=1}^N λi Z(xi)

of Z(x0) with minimum mean square prediction error.


It can be remarked that the unbiasedness requirement amounts to Σ_{i=1}^N λi = 1, since

E[Ẑ(x0)] = E[Σ_{i=1}^N λi Z(xi)] = Σ_{i=1}^N λi E[Z(xi)] = µ Σ_{i=1}^N λi,

which is equal to µ = E[Z(x0)] if and only if the coefficients of the linear combination sum to one. In order to derive an expression for these coefficients, it is practical to resort to the method of Lagrange multipliers to reduce the problem to an unconstrained minimization. Thus, one introduces the function

φ(λ1, . . . , λN, β) = E[(Z(x0) − Σ_{i=1}^N λi Z(xi))²] − 2β (Σ_{i=1}^N λi − 1)

and seeks values of λ1, . . . , λN, β such that φ attains its minimum. Before proceeding to the minimization, the function is rewritten using the fact that, thanks to Σ_{i=1}^N λi = 1, a direct computation yields

(Z(x0) − Σ_{i=1}^N λi Z(xi))² = Σ_{i=1}^N λi (Z(x0) − Z(xi))² − (1/2) Σ_{i=1}^N Σ_{j=1}^N λi λj (Z(xi) − Z(xj))².


Because of equation (4.1), this implies that

φ(λ1, . . . , λN, β) = 2 Σ_{i=1}^N λi γ(x0, xi) − Σ_{i=1}^N Σ_{j=1}^N λi λj γ(xi, xj) − 2β (Σ_{i=1}^N λi − 1).   (4.2)

Setting the gradient of the function φ equal to zero leads to the linear system

Γ_O λ_O = γ_O,   (4.3)

where the unknown and the right hand side are given by, respectively,

λ_O = (λ1, . . . , λN, β)^T,   γ_O = (γ(x0, x1), . . . , γ(x0, xN), 1)^T,   (4.4)

and the system matrix is defined by

Γ_O = [ γ(x1, x1)  γ(x1, x2)  . . .  γ(x1, xN)  1 ]
      [ γ(x2, x1)  γ(x2, x2)  . . .  γ(x2, xN)  1 ]
      [ . . .                                     ]
      [ γ(xN, x1)  γ(xN, x2)  . . .  γ(xN, xN)  1 ]
      [ 1          1          . . .  1          0 ].   (4.5)

The ordinary kriging coefficients can then be determined by solving the linear system (4.3), so that

λ_O = Γ_O^(−1) γ_O.   (4.6)

It is to be remarked that the solution λ_O provides two types of information. Along with the values of the coefficients λi, i = 1, . . . , N, the solution of the system also provides the value of the Lagrange multiplier β that minimizes the mean square prediction error. Substituting the computed values back into the expression of this functional, one can see that the optimal value of the prediction error is given by

σ²_OK(x0) = λ_O^T γ_O = γ_O^T Γ_O^(−1) γ_O.   (4.7)

This expression is also called the kriging variance and is an estimate of the prediction error associated with the ordinary kriging predictor.
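A compact Python sketch of the procedure, added as an illustration: it assembles the system (4.3) for an assumed exponential semivariogram, solves it, and evaluates the prediction and the kriging variance (4.7). The model parameters and the synthetic data are arbitrary assumptions.

```python
import numpy as np

def gamma_exp(h, c0=0.1, c1=1.0, c2=0.3):
    """Assumed semivariogram: exponential model (2.22)."""
    return np.where(h == 0, 0.0, c0 + c1 * (1.0 - np.exp(-h / c2)))

def ordinary_kriging(pts, z, x0):
    """Solve Gamma_O lambda_O = gamma_O (4.3); return prediction and variance."""
    n = len(z)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    G = np.ones((n + 1, n + 1))
    G[:n, :n] = gamma_exp(d)                     # gamma(x_i, x_j) block
    G[n, n] = 0.0                                # Lagrange multiplier corner
    g = np.append(gamma_exp(np.linalg.norm(pts - x0, axis=1)), 1.0)
    lam = np.linalg.solve(G, g)                  # (lambda_1, ..., lambda_N, beta)
    z_hat = lam[:n] @ z                          # kriging prediction
    var = lam @ g                                # kriging variance (4.7)
    return z_hat, var

gen = np.random.default_rng(2)
pts = gen.random((30, 2))
z = np.cos(3 * pts[:, 0]) + 0.1 * gen.standard_normal(30)
z_hat, var = ordinary_kriging(pts, z, x0=np.array([0.5, 0.5]))
```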


4.2 Universal kriging

In universal kriging, the uncertain data zi, i = 1, · · · , N, assumed to be known at the N points xi, i = 1, · · · , N, are interpreted as a realization of a random field that can be decomposed into the sum of a deterministic component and an intrinsically stationary random field with zero mean. This amounts to assuming Z(x) = Σ_{j=1}^p βj fj(x) + δ(x), where δ is the zero mean random field. The deterministic component is represented using shape functions fj that are assumed to be known, along with the semivariogram of the random field δ, but the coefficients βj are not needed to formulate the prediction. Under these assumptions, one can define universal kriging as follows:

Definition 19 (Universal kriging) Given a point x0, the universal kriging estimator at x0 based on the data Z(xi), i = 1, · · · , N, is defined as the linear unbiased estimator

Ẑ(x0) = Σ_{i=1}^N λi Z(xi)

of Z(x0) with minimum mean squared prediction error.

It is to be remarked that if p = 1 and f1 = 1 are chosen, ordinary kriging is recovered exactly. Introducing the matrix

X = [ f1(x1)  . . .  fp(x1) ]
    [ f1(x2)  . . .  fp(x2) ]
    [ . . .                 ]
    [ f1(xN)  . . .  fp(xN) ],   (4.8)

and the vectors

β = (β1, . . . , βp)^T,   δ = (δ(x1), . . . , δ(xN))^T,   Z = (Z(x1), . . . , Z(xN))^T,

the universal kriging data can also be rewritten as

Z = Xβ + δ,   (4.9)

which highlights the formal similarity with the general linear estimation problem. In this case, unbiasedness requires Σ_{i=1}^N λi fj(xi) = fj(x0) for j = 1, . . . , p, and the functional to be minimized can be written as

φ(λ1, . . . , λN, m1, . . . , mp) = E[(Z(x0) − Σ_{i=1}^N λi Z(xi))²] − 2 Σ_{j=1}^p mj (Σ_{i=1}^N λi fj(xi) − fj(x0)),




where mj, j = 1, . . . , p, are the Lagrange multipliers. Repeating the derivation along the lines of the previous section leads to the linear system

Γ_U λ_U = γ_U,   (4.10)

where the unknown and right hand side vector are given by, respectively,

λ_U = (λ1, . . . , λN, m1, . . . , mp)^T,   γ_U = (γ(x0, x1), . . . , γ(x0, xN), f1(x0), . . . , fp(x0))^T,   (4.11)

and the system matrix is given by

Γ_U = [ γ(x1, x1)  . . .  γ(x1, xN)  f1(x1)  . . .  fp(x1) ]
      [ γ(x2, x1)  . . .  γ(x2, xN)  f1(x2)  . . .  fp(x2) ]
      [ . . .                                              ]
      [ γ(xN, x1)  . . .  γ(xN, xN)  f1(xN)  . . .  fp(xN) ]
      [ f1(x1)     . . .  f1(xN)     0       . . .  0      ]
      [ . . .                                              ]
      [ fp(x1)     . . .  fp(xN)     0       . . .  0      ].   (4.12)

The universal kriging coefficients can then be determined by solving the linear system (4.10), so that

λ_U = Γ_U^(−1) γ_U.   (4.13)

Similarly to the ordinary kriging case, along with the prediction, the mean squared prediction error can also be computed by the formula

σ²_UK(x0) = λ_U^T γ_U = γ_U^T Γ_U^(−1) γ_U   (4.14)

once the universal kriging coefficients and Lagrange multipliers have been determined. The main difficulty in the practical application of universal kriging lies with the fact that, if the variogram is not known, for any random field with non constant mean the standard variogram estimators described in chapter 3 are no longer unbiased and, indeed, cannot be applied if the coefficients βj are not known. These can in turn be estimated by assuming that the field δ has known covariance. Indeed, if the covariance of the data Z is known and denoted by Σ, the standard generalized least squares estimator yields the value

β_gls = (X^T Σ^(−1) X)^(−1) X^T Σ^(−1) Z.
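Extending the previous sketch, the universal kriging system (4.10)-(4.12) can be assembled in the same way; the linear drift f1 = 1, f2 = x, f3 = y is a hypothetical choice of shape functions made only for this example, as are the model parameters.

```python
import numpy as np

def gamma_exp(h, c0=0.1, c1=1.0, c2=0.3):
    """Assumed semivariogram: exponential model (2.22)."""
    return np.where(h == 0, 0.0, c0 + c1 * (1.0 - np.exp(-h / c2)))

def drift(x):
    """Hypothetical shape functions f_1 = 1, f_2 = x, f_3 = y (linear drift)."""
    x = np.atleast_2d(x)
    return np.column_stack([np.ones(len(x)), x])

def universal_kriging(pts, z, x0):
    """Assemble and solve Gamma_U lambda_U = gamma_U (4.10)-(4.12)."""
    n, F = len(z), drift(pts)                    # F is the N x p matrix X of (4.8)
    p = F.shape[1]
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    G = np.zeros((n + p, n + p))
    G[:n, :n] = gamma_exp(d)
    G[:n, n:] = F
    G[n:, :n] = F.T                              # zero p x p block stays in place
    g = np.concatenate([gamma_exp(np.linalg.norm(pts - x0, axis=1)),
                        drift(x0).ravel()])
    lam = np.linalg.solve(G, g)
    return lam[:n] @ z, lam @ g                  # prediction and variance (4.14)

gen = np.random.default_rng(3)
pts = gen.random((30, 2))
z = 1.0 + 2.0 * pts[:, 0] + 0.2 * gen.standard_normal(30)   # data with linear drift
z_hat, var = universal_kriging(pts, z, x0=np.array([0.5, 0.5]))
```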


However, the data covariance, assuming it exists, is in fact related to the variogram. This leads to a circularity in the hypotheses needed for variogram estimation, which can be resolved in a number of ways, none of which is free from criticism and practical problems. The reader is referred to the discussion in [5], [4] for more details.

Chapter 5

Appendix: Basics of random variables

In order to make these notes self contained, some basic definitions and results in probability theory are summarized in this appendix. There is also no attempt at achieving a high standard of mathematical rigour in the formulation of the definitions and theorems. The reader interested in the complete presentation of the measure theoretic problems associated with probability spaces and random variables should consult textbooks such as [2]. A basic introduction to probability theory and mathematical statistics can be found in [10].

Definition 20 (Probability space) A probability space is defined by

• the set Ω of all events that are considered admissible;

• the collection F of all subsets of Ω for which a probability is defined (which includes Ω and the empty set ∅); in order to avoid some paradoxes and to endow P with all the desirable properties defined below, F cannot coincide with the set of all subsets of Ω and must satisfy a series of properties which will not be listed here;

• the probability P, a function that assigns values in the interval [0, 1] to each set in F, representing the relative weight of a given event with respect to the set of all admissible events.

The probability P must satisfy the properties

• P(Ω) = 1;

• P(A^c) = 1 − P(A), for each set A ∈ F, where A^c denotes the complement of A;


• given an arbitrary (possibly infinite) sequence of mutually disjoint sets Ai, i ≥ 1, with Ai ∩ Aj = ∅ for i ≠ j, it holds that

P(∪_{i≥1} Ai) = Σ_{i≥1} P(Ai).

Definition 21 (Random variable) A random variable is a function Z = Z(ω) which assigns a real number to each event ω in a probability space (Ω, P).

From a probabilistic viewpoint, the behaviour of a scalar random variable X is completely determined if it is known how to compute the probabilities

P[X ∈ [a, b)].   (5.1)

Definition 22 (Probability distribution) The probability distribution of the random variable X is defined for each x ∈ R by

F_X(x) = P[X < x].   (5.2)

Definition 23 (Continuous random variables) The random variable X has a continuous distribution if there is a non negative real function f_X(u) such that for each x ∈ R

F_X(x) = P[X < x] = ∫_{−∞}^{x} f_X(u) du.   (5.3)

f_X(u) is called the probability density function of X. An important example are Gaussian random variables, whose distribution is defined by the density

f_X(u) = (1/√(2πa)) exp{ −(u − m)²/(2a) },   (5.4)

where m is the mean and a > 0 is the variance. The average and the variance of a random variable are defined as

m_X = E[X] = ∫_{−∞}^{+∞} u f_X(u) du,   (5.5)

Var[X] = σ_X² = E[(X − m_X)²] = ∫_{−∞}^{+∞} (u − m_X)² f_X(u) du.   (5.6)


The median of a random variable is defined implicitly by the equation

F_X(med[X]) = 1/2.   (5.7)

The mean of a random variable is its best approximation by a constant in the least squares sense, i.e.:

Theorem 6 (Mean as minimum mean square estimator) For any real number λ, one has

E[(X − m_X)²] ≤ E[(X − λ)²].

The median of a random variable is its best approximation by a constant in the L¹ sense, i.e.:

Theorem 7 (Median as minimum mean absolute error estimator) For any real number λ, one has

E[|X − med[X]|] ≤ E[|X − λ|].
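A quick numerical check of theorems 6 and 7 on a sample (an illustration added to these notes), comparing the empirical losses at the sample mean and median against arbitrary constants:

```python
import numpy as np

gen = np.random.default_rng(4)
x = gen.exponential(scale=2.0, size=100_000)     # a skewed sample, so mean != median

for lam in (0.5, 1.0, 3.0):
    # theorem 6: the mean minimizes the mean square error among constants
    assert np.mean((x - x.mean()) ** 2) <= np.mean((x - lam) ** 2)
    # theorem 7: the median minimizes the mean absolute error among constants
    assert np.mean(np.abs(x - np.median(x))) <= np.mean(np.abs(x - lam))
```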

Other quantities such as the covariance require instead the joint distribution of the pair of variables:

Cov(X, Y) = E[(X − m_X)(Y − m_Y)].   (5.8)

The covariance is defined whenever the second order moments of the random variables exist. Variance and covariance are related by

Var[X + Y] = Var[X] + Var[Y] + 2 Cov(X, Y).   (5.9)

Bibliography

[1] R.J. Adler. Geometry of Random Fields. Wiley, 1981.

[2] P. Billingsley. Probability and Measure. Wiley, New York, 1986.

[3] M.D. Buhmann. Radial Basis Functions. Cambridge University Press, Cambridge, 2003.

[4] R. Christensen. Linear Models for Multivariate, Time Series and Spatial Data. Springer Verlag, 1991.

[5] N. Cressie. Statistics for Spatial Data. Wiley, 1991.

[6] M.G. Genton. Highly robust variogram estimation. Mathematical Geology, 30:213–221, 1998.

[7] D.J. Gorsich and M.G. Genton. Variogram model selection via nonparametric derivative estimation. Mathematical Geology, 32:249–270, 2000.

[8] D.M. Hawkins and N. Cressie. Robust kriging - a proposal. Journal of the International Association for Mathematical Geology, 16:3–18, 1984.

[9] G. Kitanidis. Geostatistics. In D.R. Maidment, editor, Handbook of Hydrology, pages 153–165. McGraw Hill, 1993.

[10] S. Ross. Probability and Statistics for the Applied Sciences. ??, Berlin, 1995.

[11] J. Stoer and R. Bulirsch. An Introduction to Numerical Analysis, 2nd edition. Springer Verlag, Berlin, 1990.

[12] H. Wackernagel. Multivariate Geostatistics. Springer Verlag, Berlin, 1995.

[13] A.T. Walden and P. Guttorp. Statistics in the Environmental and Earth Sciences. Arnold, 1992.
