MODELING SKEWNESS IN SPATIAL DATA ANALYSIS WITHOUT DATA TRANSFORMATION
PHILIPPE NAVEAU(1,2) and DENIS ALLARD(3)
(1) Dept. of Applied Mathematics, University of Colorado, Boulder, USA
(2) Laboratoire des Sciences du Climat et de l'Environnement, IPSL-CNRS, Gif-sur-Yvette, France
(3) INRA, Unité de Biométrie, Site Agroparc, 84914 Avignon, France

Abstract. Skewness is present in a large variety of spatial data sets (rainfall, winds, etc.), but integrating such skewness remains a challenge. Classically, the original variables are transformed into a Gaussian vector. Besides the problem of choosing an adequate transform, there are a few difficulties associated with this method. As an alternative, we propose a different way to introduce skewness: it comes from the extension of the multivariate normal distribution to the multivariate skew-normal distribution. This strategy has many advantages. The spatial structure is still captured by the variogram, and the classical empirical variogram has a known moment generating function. To illustrate the applicability of this new approach, we present a variety of simulations.
1 Introduction

The overwhelming assumption of normality in the multivariate geostatistics literature can be understood for many reasons. A major one is that the multivariate normal distribution is completely characterized by its first two moments. In addition, the stability of the multivariate normal distribution under summation and conditioning offers tractability and simplicity. However, this assumption is not satisfied for a large number of applications. In this work, we propose a novel way of modeling skewness for spatial data by working with a larger class of distributions than the normal distribution. This class is called general multivariate skew-normal distributions. Besides introducing skewness to the normal distribution, it has the advantage of being closed under marginalization and conditioning. This class was introduced by Domínguez-Molina et al., 2003, and is an extension of the multivariate skew-normal distribution first proposed by Azzalini and his coworkers (Azzalini, 1985, Azzalini, 1986, Azzalini and Dalla Valle, 1996 and Azzalini and Capitanio, 1999). These distributions are particular types of the generalized skew-elliptical distributions recently introduced by Genton and Loperfido, 2005, i.e. they are defined as the product of a multivariate elliptical density with a
skewing function.

This paper is organized as follows. In Section 2, the definition of the skew-normal distribution is recalled and notations are introduced. In Section 3, we first recall the basic framework of spatial statistics and then present the spatial skewed Gaussian processes. The estimation procedure for the variogram and skewness parameters is presented in detail and illustrated on simulations. We conclude in Section 4.

2 Skew-normal distributions

Multivariate skew-normal distributions are based on the normal distribution, but skewness is added to extend its applicability while keeping most of the interesting properties of the Gaussian distribution. Today, there exists a large variety of skew-normal distributions (Genton, 2004), and they have been applied to a variety of situations. For example, Naveau et al., 2004, developed a skewed Kalman filter based on these distributions. In a Gaussian framework, spatial data have been analyzed using the skew-normal distribution (Kim and Mallick, 2002), but without a precise definition of skew-normal spatial processes. As will be shown in Section 3.1, this model leads to a very small amount of skewness and is therefore not very useful in practice. From a theoretical point of view, we use in this work the multivariate closed skew-normal distribution (Domínguez-Molina et al., 2003, González-Farías et al., 2004). It stems from the "classical" skew-normal distribution introduced by Azzalini and his co-authors. It has the advantages of being more general and of having more properties similar to the normal distribution than any other skew-normal distribution. A drawback is that notations can become cumbersome. The book edited by Genton, 2004, provides an overview of the most recent theoretical and applied developments related to skewed distributions.
An $n$-dimensional random vector $Y$ is said to have a multivariate closed skew-normal distribution, denoted by $CSN_{n,m}(\mu, \Sigma, D, \nu, \Delta)$, if it has a density function of the form

$$c_m \, \phi_n(y; \mu, \Sigma) \, \Phi_m(D^t (y - \mu); \nu, \Delta), \quad \text{with } c_m^{-1} = \Phi_m(0; \nu, \Delta + D^t \Sigma D), \tag{1}$$

where $\mu \in \mathbb{R}^n$, $\nu \in \mathbb{R}^m$, $\Sigma \in \mathbb{R}^{n \times n}$ and $\Delta \in \mathbb{R}^{m \times m}$ are both covariance matrices, $D \in \mathbb{R}^{n \times m}$, $\phi_n(y; \mu, \Sigma)$ and $\Phi_n(y; \mu, \Sigma)$ are the $n$-dimensional normal pdf and cdf with mean $\mu$ and covariance matrix $\Sigma$, and $D^t$ is the transpose of $D$. When $D = 0$, the density (1) reduces to the multivariate normal one, whereas for $m = 1$, $\nu = 0$ and $\Delta = 1$ it is equal to Azzalini's density (Azzalini and Dalla Valle, 1996), i.e. the variable $Y$ follows a $CSN_{n,1}(\mu, \Sigma, \alpha, 0, 1)$, where $\alpha$ is a vector of length $n$. This distribution was the first multivariate skew-normal distribution and it was introduced by Azzalini and his coworkers (Azzalini, 1985, Azzalini, 1986, Azzalini and Dalla Valle, 1996 and Azzalini and Capitanio, 1999).

The $CSN_{n,m}(\mu, \Sigma, D, \nu, \Delta)$ distribution defined by (1) is generated from the following construction. Let $U$ be a Gaussian vector of dimension $m$ and let us consider the augmented Gaussian vector $(U^t, Z^t)^t$ with the following distribution:

$$\begin{pmatrix} U \\ Z \end{pmatrix} \overset{d}{=} N_{m,n}\left( \begin{pmatrix} \nu \\ 0 \end{pmatrix}, \begin{pmatrix} \Delta + D^t \Sigma D & -D^t \Sigma \\ -\Sigma D & \Sigma \end{pmatrix} \right), \tag{2}$$
where $\overset{d}{=}$ denotes equality in distribution. Then, it is straightforward to show that, conditional on $U \le 0$, the random vector $\mu + [Z \mid U \le 0]$ is distributed according to a $CSN_{n,m}(\mu, \Sigma, D, \nu, \Delta)$ as defined in Equation (1). Here the notation $A \le B$ means $A_i \le B_i$ for all coordinates $i$. This construction offers a wide range of possible models depending on the choice of $\mu$, $\nu$, $\Delta$, $\Sigma$ and $D$. For more details on this type of construction, we refer to Domínguez-Molina et al., 2003.

It is well known that the conditional vector $[Z \mid U]$ is also a Gaussian vector, with distribution

$$[Z \mid U] \overset{d}{=} N_n\left( -\Sigma D (\Delta + D^t \Sigma D)^{-1} (U - \nu), \; \Sigma - \Sigma D (\Delta + D^t \Sigma D)^{-1} D^t \Sigma \right). \tag{3}$$

This property provides a two-step algorithm for simulating a CSN vector:
(i) generate samples of the Gaussian vector $U \overset{d}{=} N_m(\nu, \Delta + D^t \Sigma D)$ such that $U \le 0$;
(ii) generate the Gaussian vector $[Z \mid U]$ according to (3).

Generating a vector $U$ conditional on $U \le 0$ is not direct. In particular, direct sequential simulations cannot be used to generate such a vector. MCMC methods must be used instead. Here, we used a Gibbs sampling technique to simulate the vector $U \mid U \le 0$.

The moment generating function (mgf) of a closed skew-normal density is equal to (Domínguez-Molina et al., 2003):

$$M(t) = c_m \, \Phi_m(D^t \Sigma t; \nu, \Delta + D^t \Sigma D) \exp\left( \mu^t t + \frac{1}{2} t^t \Sigma t \right). \tag{4}$$

The mgf of a CSN random vector is thus the product of the usual mgf of a Gaussian vector with mean $\mu$ and covariance matrix $\Sigma$ and of the $m$-dimensional normal cdf with mean $\nu$ and covariance matrix $\Delta + D^t \Sigma D$. It is well known that, even for moderate values of $m$, the cdf $\Phi_m$ is difficult to compute.

3 Spatial skewed Gaussian processes

Let $\{Z(x)\}$, with $x \in \mathbb{R}^2$, be a spatial, ergodic, stationary, zero-mean Gaussian process with variogram $2\gamma(h) = \mathrm{Var}(Z(x + h) - Z(x))$, for any $h \in \mathbb{R}^2$, and variance $\sigma^2 = \mathrm{Var}(Z(x))$. For more details on the variogram, we refer to the following books: Wackernagel, 2003, Chilès and Delfiner, 1999, Stein, 1999 and Cressie, 1993. The covariance matrix of the random vector $Z = (Z(x_1), \ldots, Z(x_n))^t$ built from the covariance function $c(h) = \sigma^2 - \gamma(h)$ is denoted by $\Sigma$. To link this spatial structure with skew-normal distributions, we simply plug the covariance matrix $\Sigma$ into Equation (2). Hence, we assume in the rest of this paper that the vector $Z$ is the same as the one used in Equation (2). Consequently, the process $\{Y(x)\}$ is defined through the following equality:

$$Y \overset{d}{=} \mu + [Z \mid U \le 0].$$

This is our definition of a CSN random process. In practice, we only observe the realizations $(Y(x_1), \ldots, Y(x_n))^t$, but neither $U$ nor $Z$.
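The two-step construction of a CSN vector can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the sampler used in this paper: step (i) is done here by plain rejection sampling, which is only workable when m is small and the event U ≤ 0 is not too rare, whereas the simulations reported in this paper rely on a Gibbs sampler. The function name `simulate_csn`, its arguments and the batch size are hypothetical choices, and D is stored as an n × m array so that the products appearing in Equation (2) are conformable.

```python
import numpy as np

def simulate_csn(mu, Sigma, D, nu, Delta, size, rng, batch=10000):
    """Draw `size` samples of Y = mu + [Z | U <= 0] (hypothetical helper).

    Step (i) uses rejection sampling (an assumption made for brevity),
    step (ii) uses the Gaussian conditional law of Z given U.
    """
    mu = np.asarray(mu, dtype=float)
    Sigma = np.atleast_2d(Sigma).astype(float)               # n x n
    D = np.asarray(D, dtype=float).reshape(Sigma.shape[0], -1)  # n x m
    nu = np.atleast_1d(nu).astype(float)                     # length m
    Delta = np.atleast_2d(Delta).astype(float)               # m x m
    n = Sigma.shape[0]

    cov_u = Delta + D.T @ Sigma @ D             # Var(U)
    gain = -Sigma @ D @ np.linalg.inv(cov_u)    # Cov(Z, U) Var(U)^{-1}
    cond_cov = Sigma + gain @ D.T @ Sigma       # Var(Z | U)

    # Step (i): keep only the draws of U with all coordinates <= 0.
    kept, count = [], 0
    while count < size:
        u = rng.multivariate_normal(nu, cov_u, size=batch)
        u = u[np.all(u <= 0.0, axis=1)]
        kept.append(u)
        count += len(u)
    u = np.vstack(kept)[:size]

    # Step (ii): draw Z given U from its Gaussian conditional law.
    cond_mean = (u - nu) @ gain.T
    noise = rng.multivariate_normal(np.zeros(n), cond_cov, size=size)
    return mu + cond_mean + noise

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = simulate_csn(0.0, 1.0, 1.0, 0.0, 1.0, size=20000, rng=rng)
    print(y.mean())
```

In the scalar case n = m = 1 with Σ = ∆ = 1, ν = 0 and D = 1, the exact mean of µ + [Z | U ≤ 0] is $\sqrt{1/\pi} \approx 0.564$, so the printed sample mean should be close to that value.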
In this work, we focus on two special cases. The first one is presented in Section 3.1 and should be viewed as a pedagogical introduction. It corresponds to a simple case where $U$ is a one-dimensional random variable (i.e. $m = 1$). Although the cdf $\Phi_1$ can be easily computed, this set-up allows only a small amount of skewness. In comparison, the second case, presented in Section 3.2, is more complex (i.e. $m = n$) and provides much greater skewness, but at the cost of computational difficulties.

3.1 THE UNIVARIATE CONDITIONAL CASE
Here we suppose $m = 1$, $\Delta = 1$ and $D = \delta A$, where $A$ is a vector that does not depend on $\delta \ge 0$. This latter coefficient represents the degree of skewness introduced in this model. The random vector $Y$ follows a closed skew-normal distribution, $Y \overset{d}{=} CSN_{n,1}(\mu \mathbf{1}, \Sigma, \delta A, \nu, 1)$ (see Equation (1)). The assumption $\Delta = 1$ is made without loss of generality, since a $CSN_{n,1}(\mu \mathbf{1}, \Sigma, \delta A, \nu, 1)$ is identically distributed to a $CSN_{n,1}(\mu \mathbf{1}, \Sigma, \delta A \sqrt{d}, \nu \sqrt{d}, d)$ for any positive scalar $d$.

To investigate the departure from the Gaussian model, we compute the mean $E[Y]$, the variance $\mathrm{Var}(Y)$ and the skewness coefficient $Sk(Y_i)$. These quantities are derived from the cumulant function $K(t) = \log M(t)$. It is well known that $K^{(1)}(0) = E[Y]$, $K^{(2)}(0) = \mathrm{Var}(Y)$ and $K^{(3)}(0)_{i,i,i} = E[(Y_i - E[Y_i])^3]$. The skewness coefficient is defined by $Sk(Y_i) = K^{(3)}(0)_{i,i,i} / \mathrm{Var}(Y_i)^{3/2}$. It follows from Equality (4) that

$$K(t) = \log(c_1) + \log \Phi_1(\delta A^t \Sigma t; \nu, 1 + \delta^2 A^t \Sigma A) + \mu^t t + \frac{1}{2} t^t \Sigma t.$$

Taking derivatives of the cumulant function gives

$$E[Y] = \mu + \delta \tau h(\nu) A, \qquad \mathrm{Var}(Y) = \Sigma - (\delta \tau)^2 h^{(1)}(\nu) A A^t, \tag{5}$$

where $\tau = 1/\sqrt{1 + \delta^2 A^t \Sigma A}$ and $h(x) = \phi(x)/\Phi(x)$ is the Gaussian hazard rate function, whose first and second derivatives are equal to $h^{(1)}(x) = h(x)(h(x) - x)$ and $h^{(2)}(x) = h^{(1)}(x)(2h(x) - x) - h(x)$. To compute the skewness coefficient of the $i$-th component of the vector $Y$, we note that

$$K^{(3)}(0)_{i,i,i} = (\delta \tau)^3 h^{(2)}(\nu) (A A^t)_{i,i} A_i = (\delta \tau a_i)^3 h^{(2)}(\nu),$$

where $a_i$ is the $i$-th coordinate of the vector $A$. Plugging the expressions of the moments into the skewness coefficient leads to

$$Sk(Y_i) = \frac{(\delta \tau a_i)^3 h^{(2)}(\nu)}{\left[ \sigma^2 - (\delta \tau a_i)^2 h^{(1)}(\nu) \right]^{3/2}}.$$

The quantity driving the skewness is

$$s_i = \delta \tau a_i = \left( \frac{\delta^2 a_i^2}{1 + \delta^2 A^t \Sigma A} \right)^{1/2}. \tag{6}$$

Clearly, the skewness is equal to 0 if $\delta = 0$ or $a_i = 0$. Furthermore, $s_i \to 0$ as $\sigma^2 \to \infty$. To get a better understanding of the limitations of the case $m = 1$, we focus on a few specific cases for the vector $A$:
- $A = (1, 0, \ldots, 0)^t$. Then $s_i = 0$ for $i = 2, \ldots, n$ and $s_1 = \delta / \sqrt{1 + \sigma^2 \delta^2}$. This quantity is null for $\delta = 0$, increasing with respect to $\delta$ and bounded by $1/\sigma$ as $\delta \to \infty$. In other words, all the introduced skewness is concentrated on the first coordinate $Z(x_1)$.
- $A = \mathbf{1}$, where $\mathbf{1}$ is a vector of ones of dimension $n$. This is implicitly the model in Kim and Mallick, 2002, although it is never clearly stated in their paper. Then $s_i = \delta (1 + \delta^2 \mathbf{1}^t \Sigma \mathbf{1})^{-1/2}$ for all $i$. Again, $s_i$ is null for $\delta = 0$, increasing with respect to $\delta$ and bounded by $(\mathbf{1}^t \Sigma \mathbf{1})^{-1/2}$. Note that $\mathbf{1}^t \Sigma \mathbf{1} / n^2 \to C(\mathcal{D}, \mathcal{D})$ as $n \to \infty$, where $C(\mathcal{D}, \mathcal{D}) = \int_{\mathcal{D}} \int_{\mathcal{D}} C(x - y) \, dx \, dy$ and $\mathcal{D}$ is the domain under study. Hence this bound behaves like $n^{-1} C(\mathcal{D}, \mathcal{D})^{-1/2}$, and thus $s_i \to 0$ as $n \to \infty$.
- More generally, if $A = n^{\alpha} \mathbf{1}$, then
$$s_i = \delta n^{\alpha} \left( 1 + \delta^2 n^{2\alpha} \mathbf{1}^t \Sigma \mathbf{1} \right)^{-1/2} \simeq \delta n^{\alpha} \left( 1 + \delta^2 n^{2\alpha + 2} C(\mathcal{D}, \mathcal{D}) \right)^{-1/2}.$$
It is easy to show that $s_i \to 0$ as $n \to \infty$ for all $\alpha \in \mathbb{R}$.
- If $A = \Sigma^{-1/2} \mathbf{1}$, then $s_i = \delta a_i (1 + n \delta^2)^{-1/2}$, and again $s_i \to 0$ as $n \to \infty$.

In conclusion, the skewness introduced by $U$ spreads over all coordinates of the vector $Z$ for which $a_i \neq 0$. But, as the dimension $n$ of the observation vector $Y$ increases, the amount of skewness is spread over all non-null coordinates of $A$ and is dispersed over the observations. This explains why the skewness tends to disappear for medium and large sample sizes. To solve this problem, we choose to increase the dimension of $U$. The consequences and advantages of such a choice are explained in the next section.

3.2 THE MULTIVARIATE CONDITIONAL CASE
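Before turning to the case m = n, the dilution of skewness in the case m = 1 with A = 1 (Section 3.1) can be checked numerically. The sketch below evaluates the quantity $s_i$ of Equation (6) for an increasing number of random locations in the unit square. Its ingredients are assumptions made for illustration: the exponential covariance is parameterized as $c(h) = \sigma^2 \exp(-|h|/a)$ with a = 0.1, the value δ = 5 is arbitrary, and the function names are hypothetical.

```python
import numpy as np

def exponential_covariance(coords, sigma2=1.0, a=0.1):
    """Covariance matrix for c(h) = sigma2 * exp(-|h| / a) (assumed form)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return sigma2 * np.exp(-dist / a)

def skewness_driver(delta, A, Sigma):
    """s_i = delta * a_i / sqrt(1 + delta^2 A' Sigma A), as in Equation (6)."""
    A = np.asarray(A, dtype=float)
    return delta * A / np.sqrt(1.0 + delta ** 2 * (A @ Sigma @ A))

rng = np.random.default_rng(1)
delta = 5.0          # arbitrary illustrative value
s_by_n = {}
for n in (25, 100, 900):
    coords = rng.uniform(size=(n, 2))        # random locations in [0, 1]^2
    Sigma = exponential_covariance(coords)
    s_by_n[n] = skewness_driver(delta, np.ones(n), Sigma)[0]
    print(n, s_by_n[n])
```

Since every entry of Σ is positive and its diagonal equals 1, each printed value is below $1/\sqrt{n}$, and the value for n = 900 is necessarily smaller than the one for n = 25, whatever δ is.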
In the previous section, we concluded that the skewness must be introduced through a vector $U$ of sufficiently high dimension. A natural choice is to have one conditioning variable for each coordinate of $Z$. Therefore we consider $m = n$. In order to reproduce correctly the spatial dependence of $Z$, a second natural choice is $\Delta = \Sigma$ and $D = \delta I$, where $I$ is the identity matrix of order $n$. This model has the advantages of being easy to interpret and parsimonious. In this case,

$$[Z \mid U] \overset{d}{=} N_n\left( -\frac{\delta}{1 + \delta^2} (U - \nu), \; \frac{1}{1 + \delta^2} \Sigma \right), \tag{7}$$

and hence

$$Y \overset{d}{=} \mu - \frac{\delta}{1 + \delta^2} [(U - \nu) \mid U \le 0] + \frac{1}{\sqrt{1 + \delta^2}} \Sigma^{1/2} V, \tag{8}$$

where $V$ is a vector of independent standard Gaussian variables, i.e. $V$ is $N(0, I)$. The vector $Y$ is the sum of a sample of a stationary process with covariance function $c(h)/(1 + \delta^2)$ and a vector $U$ conditional on $U \le 0$. This vector is not necessarily stationary. The parameter $\delta$ corresponds to the intensity of skewness. When $\delta = 0$, $Z$ is independent of $U$ and $[Z \mid U \le 0]$ is a Gaussian vector. As $\delta$ increases, the conditional
covariance matrix tends to the null matrix, and the vector $[Z \mid U]$, properly renormalized, tends to the vector $U$, a truncated Gaussian vector. The cumulant function of $Y$ is

$$K(t) = \log c_m + \log \Phi_m(\delta \Sigma t; \nu, (1 + \delta^2) \Sigma) + \mu^t t + \frac{1}{2} t^t \Sigma t.$$

To calculate the moments of $Y$ from the cumulant function, we need to compute the derivatives of $\Phi_m(\delta \Sigma t; \nu, (1 + \delta^2) \Sigma)$. These quantities do not have explicit expressions, even for small values of $m$ (i.e. $m > 2$). To make the derivations tractable, we assume that the cdf $\Phi_m(\delta \Sigma t; \nu, (1 + \delta^2) \Sigma)$ can be approximated by $\Phi_m(\delta \sigma^2 I t; \nu, (1 + \delta^2) \sigma^2 I)$. Although such an approximation is strong, simulated examples indicate that it leads to reasonable moment estimates. Then, the cumulant function $K(t)$ can be approximated by the function $K^*(t)$ defined by

$$K^*(t) = \log c_m + \sum_{i=1}^{n} \log \Phi\left( \frac{\delta \sigma^2 t_i - \nu}{\sigma \sqrt{1 + \delta^2}} \right) + \mu^t t + \frac{1}{2} t^t \Sigma t.$$

It follows immediately that

$$E[Y] \simeq \mu + \sigma \delta^* h\left( \frac{\nu}{\sigma \sqrt{1 + \delta^2}} \right) \mathbf{1} \quad \text{and} \quad \mathrm{Var}[Y] \simeq \Sigma - (\sigma \delta^*)^2 h^{(1)}\left( \frac{\nu}{\sigma \sqrt{1 + \delta^2}} \right) \mathbf{1} \mathbf{1}^t,$$

where $h(\cdot)$ is the hazard rate function and $\delta^*$ is defined as $\delta / (1 + \delta^2)^{1/2}$. The new parameter $\delta^*$ varies between 0 and 1. If $\delta^* = 0$, there is no skewness. If $\delta^* = 1$, the skewness is maximum and $Y$ corresponds to a truncated Gaussian vector. It is also interesting to note the similarities between the approximated mean and variance of $Y$ and the expressions in (5).

Concerning the parameter $\nu$, it can be shown that the maximum amount of skewness is obtained for values of $\nu$ near 0. Therefore, in order to keep the computations simple, we suppose from now on that $\nu = 0$. For this special case, we have the following approximations:

$$E[Y_i] \simeq \mu + \sigma \delta^* \sqrt{2/\pi}, \qquad \mathrm{Var}(Y_i) \simeq \sigma^2 (1 - \delta^{*2} \, 2/\pi), \qquad \mu_3 \simeq \sigma^3 \delta^{*3} c_3,$$

where $\mu_3 = E[(Y - E[Y])^3]$ is the centered moment of order 3 and the constant $c_3 = (4/\pi - 1) \sqrt{2/\pi} \simeq 0.218$. From these equations, moment estimators can be proposed for the three parameters $\mu$, $\sigma^2$ and $\delta^*$. Let us denote by $m_1$ the sample mean of $Y$, by $m_2$ its sample variance and by $m_3$ its sample centered moment of order 3. Then,

$$\hat{\delta}^* = (m_3 / 0.218)^{1/3}, \text{ restricted to } 0 \le \hat{\delta}^* \le 1, \qquad \hat{\sigma}^2 = \frac{m_2}{1 - (2/\pi) \hat{\delta}^{*2}}, \qquad \hat{\mu} = m_1 - \hat{\delta}^* \hat{\sigma} \sqrt{2/\pi}.$$
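The three moment estimators above translate directly into code. The following is a minimal sketch, assuming ν = 0 as in the text; the function names `estimators_from_moments` and `moment_estimators` are hypothetical, the sample variance is taken with divisor n, and the constant $c_3$ is computed from its closed form rather than rounded to 0.218.

```python
import numpy as np

# c3 = (4/pi - 1) * sqrt(2/pi), approximately 0.218
C3 = (4.0 / np.pi - 1.0) * np.sqrt(2.0 / np.pi)

def estimators_from_moments(m1, m2, m3):
    """Return (mu_hat, sigma2_hat, delta_star_hat) from the sample moments."""
    # Cube root of m3 / c3, restricted to the interval [0, 1].
    delta_star = float(np.clip(np.cbrt(m3 / C3), 0.0, 1.0))
    sigma2 = m2 / (1.0 - (2.0 / np.pi) * delta_star ** 2)
    mu = m1 - delta_star * np.sqrt(sigma2) * np.sqrt(2.0 / np.pi)
    return mu, sigma2, delta_star

def moment_estimators(y):
    """Compute m1, m2, m3 from the observations and apply the estimators."""
    y = np.asarray(y, dtype=float)
    m1 = y.mean()
    m2 = y.var()                      # divisor n (an assumption)
    m3 = np.mean((y - m1) ** 3)
    return estimators_from_moments(m1, m2, m3)

if __name__ == "__main__":
    print(estimators_from_moments(0.69, 0.84, 0.058))
```

Plugging in the sample moments reported in Table 1 for δ* = 1/2 (m1 = 0.69, m2 = 0.84, m3 = 0.058) gives values that round to the estimates listed there: µ̂ ≈ 0.14, σ̂² ≈ 1.14 and δ̂* ≈ 0.64.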
The variogram is estimated in a similar way, by considering that the covariance matrix of $Y$ is $(1 - 2\delta^{*2}/\pi) \Sigma$. Then the parameters of the covariance function $c(h)$ are estimated in the usual way by using the experimental variogram of $Y$.

Skewed observations are simulated from an exponential covariance model with $a = 0.1$, $\mu = 0$, $\sigma^2 = 1$ and $\nu = 0$, at 200 random locations in a unit square domain. Three different values of $\delta$ are chosen (0.577, 1, 1.732) such that the variance of the non-truncated part in Equation (8) is equal to $3/4 \, \sigma^2$, $1/2 \, \sigma^2$ and $1/4 \, \sigma^2$. To allow direct comparison, the same conditioning vector $U$ is used. Figure 1 represents the histogram and the variogram of the simulated vector $Y$. From these pictures, one can see that the average and the skewness increase as $\delta$ increases, whereas the variance decreases. It is also clear that the theoretical variograms obtained using the above-mentioned approximation provide a very good fit to the experimental variogram. Table 1 reports the first three moments of the model simulated in Figure 1 as given by their theoretical expressions, the moments computed on the simulations and the estimated parameters. On these simulations, the experimental variance is quite close to the theoretical value, but the third moment differs from the expected value, in particular for $\delta^* = 1/\sqrt{2}$.

Table 1. First three moments computed on the simulations shown in Figure 1 and their estimated parameters.

               E[Y]   m1     Var(Y)  m2     µ3     m3      µ̂       σ̂²     δ̂*
δ* = 1/2       0.40   0.69   0.84    0.84   0.03   0.058   0.14    1.14   0.64
δ* = 1/√2      0.56   0.97   0.68    0.75   0.08   0.35    -0.18   2.07   1
δ* = √3/2      0.69   1.23   0.52    0.54   0.14   0.06    0.78    0.74   0.65
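The theoretical columns of Table 1 (E[Y], Var(Y) and µ3) can be recomputed from the approximations given for ν = 0. The sketch below does so for the three simulated values of δ with µ = 0 and σ² = 1; the function name `approx_moments` is a hypothetical choice.

```python
import numpy as np

def approx_moments(delta, mu=0.0, sigma2=1.0):
    """Approximate mean, variance and third centered moment for nu = 0."""
    sigma = np.sqrt(sigma2)
    delta_star = delta / np.sqrt(1.0 + delta ** 2)
    c3 = (4.0 / np.pi - 1.0) * np.sqrt(2.0 / np.pi)   # about 0.218
    mean = mu + sigma * delta_star * np.sqrt(2.0 / np.pi)
    var = sigma2 * (1.0 - 2.0 * delta_star ** 2 / np.pi)
    mu3 = sigma ** 3 * delta_star ** 3 * c3
    return delta_star, mean, var, mu3

if __name__ == "__main__":
    for delta in (1.0 / np.sqrt(3.0), 1.0, np.sqrt(3.0)):
        d_star, mean, var, mu3 = approx_moments(delta)
        print(f"delta*={d_star:.3f}  E[Y]={mean:.2f}  Var={var:.2f}  mu3={mu3:.2f}")
```

The values 1/√3 ≈ 0.577 and √3 ≈ 1.732 are used here as exact counterparts of the rounded values of δ quoted in the text, which yields δ* = 1/2, 1/√2 and √3/2.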
Figure 1. Histogram (density of Y) and variogram (as a function of h) of simulated skewed Gaussian processes, for δ*² = 0.25, 0.5 and 0.75.

4 Conclusion

In this work, we explore a novel way to model skewness in spatial analysis studies. Our method is based on the closed skew-normal distribution, which has the advantage of having properties similar to those of the multivariate Gaussian distribution and of generating different degrees of skewness. The richness of the general closed skew-normal distribution requires the practitioner to select simpler sub-models for which the number of parameters is manageable in terms of interpretation and estimation, while keeping a strong skewness. We show in this paper that a CSN_{n,1} model is not sufficient to satisfy the latter condition and, instead, we propose a CSN_{n,n} model that generates much more skewness without being too complex. The drawback of this last model is that the moments cannot be derived explicitly, and an approximation has to be used to implement the method of moments in order to estimate the parameters of this more complex model. Despite these limitations, the variogram can be well estimated, but more work is needed to estimate the skewness parameter accurately. Finally, we believe that spatial models based on the closed skew-normal distribution can offer an interesting alternative for representing skewed data without transforming them. Still, much more research, theoretical as well as practical, has to be undertaken to determine the advantages and the limitations of such an approach.
5 Acknowledgments

This research was supported by the National Science Foundation with the grant ATM-0327936.

References

Azzalini, A. (1985). A class of distributions which includes the normal ones. Scand. J. Stat., 12:171–178.
Azzalini, A. (1986). Further results on a class of distributions which includes the normal ones. Statistica, 46:199–208.
Azzalini, A. and Capitanio, A. (1999). Statistical applications of the multivariate skew normal distribution. J. R. Statist. Soc. B, 61:579–602.
Azzalini, A. and Dalla Valle, A. (1996). The multivariate skew-normal distribution. Biometrika, 83:715–726.
Chilès, J.-P. and Delfiner, P. (1999). Geostatistics: Modeling Spatial Uncertainty. John Wiley & Sons Inc., New York. A Wiley-Interscience Publication.
Cressie, N. A. C. (1993). Statistics for Spatial Data. John Wiley & Sons Inc., New York, revised reprint. Revised reprint of the 1991 edition, A Wiley-Interscience Publication.
Domínguez-Molina, J., González-Farías, G., and Gupta, A. (2003). The multivariate closed skew normal distribution. Technical Report 03-12, Department of Mathematics and Statistics, Bowling Green State University.
Domínguez-Molina, J., González-Farías, G., and Ramos-Quiroga, R. (2004). Skew-normality in stochastic frontier analysis. In Genton, M., editor, Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality, pages 223–241. Edited Volume, Chapman & Hall, CRC, Boca Raton, FL, USA.
Genton, M. (2004). Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality. Edited Volume, Chapman & Hall, CRC, Boca Raton, FL, USA.
Genton, M. and Loperfido, N. (2005). Generalized skew-elliptical distributions and their quadratic forms. submitted. Annals of the Institute of Statistical Mathematics (in press).
González-Farías, G., Domínguez-Molina, J., and Gupta, A. (2004). The closed skew-normal distribution. In Genton, M., editor, Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality, pages 25–42. Edited Volume, Chapman & Hall, CRC, Boca Raton, FL, USA.
Kim, H. and Mallick, B. (2002). Analyzing spatial data using skew-Gaussian processes. In Lawson, A. and Denison, D., editors, Spatial Cluster Modelling. Chapman & Hall, CRC.
Naveau, P., Genton, M., and Shen, X. (2004). A skewed Kalman filter. Journal of Multivariate Statistics (in press).
Stein, M. L. (1999). Interpolation of Spatial Data. Springer-Verlag, New York. Some theory for Kriging.
Wackernagel, H. (2003). Multivariate Geostatistics. An Introduction with Applications. Springer, Heidelberg, third edition.