TRUNCATED SKEW-NORMAL DISTRIBUTIONS

TRUNCATED SKEW-NORMAL DISTRIBUTIONS: ESTIMATION BY WEIGHTED MOMENTS AND APPLICATION TO CLIMATIC DATA

by

Cédric Flecher, Denis Allard, Philippe Naveau

Research Report No. 39, September 2009

Unité Biostatistique et Processus Spatiaux, Institut National de la Recherche Agronomique, Avignon, France. http://www.biosp.org

Truncated skew-normal distributions: moments, estimation by weighted moments and application to climatic data

by C. Flecher (1,2), D. Allard (1) and P. Naveau (2)

(1) Biostatistics and Spatial Processes, INRA, Agroparc, 84914 Avignon, France

(2) Laboratoire des Sciences du Climat et de l’Environnement, CNRS, Gif-sur-Yvette, France

May 28, 2009

Running title: Truncated skew-normal distributions: moments, estimation by weighted moments and application to climatic data

Abstract. In this paper we obtain expressions for the mth-order moments and for some weighted moments of truncated skew-normal distributions. We link these formulae to previous results for the truncated classical normal distribution and for non-truncated skew-normal distributions. Methods to estimate skew-normal parameters using classical and weighted moments are proposed and compared. In a second step, we model the distribution of relative humidity by a truncated skew-normal distribution and estimate its parameters with both methods.

Key words: truncated skew-normal distributions, inference methods, classical moments, weighted moments, relative humidity.

1 Introduction

In many applications, the probability density function (pdf) of an observed variable can be skewed and its values restricted to a fixed interval. For example, variables such as pH, grades or humidity in environmental studies have upper and lower physical bounds, and their pdfs are not necessarily symmetric within these bounds. To illustrate such behavior, Figure 1 shows the estimated pdf of daily relative humidity measurements made in Toulouse (France) from 1972 to 1999. All observations belong to the interval [0, 100], and skewness is apparent, especially during the Spring and Fall seasons.

[Figure 1: four panels (Dec−Jan−Feb, Mar−Apr−May, Jun−Jul−Aug, Sep−Oct−Nov), each showing Density against H (%) on [0, 100].]

Figure 1: Estimated densities of daily relative humidity measurements made in Toulouse (France) from 1972 to 1999. Each panel corresponds to a season.

A large variety of strategies exists to model such skewed and bounded data. In this paper, our approach is to view such observations conceptually as truncated measurements originating from a flexible skewed distribution. More precisely, we assume that the truncation bounds are known and that the underlying pdf belongs to the class of skew-normal densities. Within this large family (e.g., Genton, 2004), we focus on the skew-normal pdf defined by Azzalini (1985):

$$ f_{\mu,\sigma,\lambda}(x) = \frac{2}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)\Phi\!\left(\lambda\,\frac{x-\mu}{\sigma}\right), \qquad (1) $$

where µ ∈ R, σ > 0 and λ ∈ R are the location, scale and shape parameters, respectively. The symbols φ and Φ denote the pdf and the cumulative distribution function (cdf) of the standard normal distribution, respectively. The notation X ∼ SN(µ, σ, λ) means that the random variable X has pdf (1), and F_{µ,σ,λ} denotes its cdf. The particular case λ = 0 corresponds to the classical normal distribution with mean µ and variance σ². In the following, the pdf and the cdf of a SN(0, 1, λ) are simply denoted f_λ and F_λ. In this context, our main objective is to propose and study a novel method of moments for estimating the parameters of (1) in the presence of truncation. Among others, Martinez (2008) recently studied the moments of the skew-normal distribution defined by (1).
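As a numerical companion to (1), the following sketch in Python (standard library only, our choice of language; the paper works in R) evaluates the skew-normal density and checks two of its properties by Simpson quadrature. The helper names `phi`, `Phi`, `sn_pdf` and `simpson` are ours. The density should integrate to one. The mean of a non-truncated SN(0, 1, λ) should equal √(2/π)·λ/(1 + λ²)^{1/2}. The infinite range is approximated by [−10, 10].

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sn_pdf(x, mu=0.0, sigma=1.0, lam=0.0):
    """Skew-normal density (1): (2/sigma) * phi(z) * Phi(lam * z), z = (x - mu)/sigma."""
    z = (x - mu) / sigma
    return 2.0 / sigma * phi(z) * Phi(lam * z)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

lam = 2.0
mass = simpson(lambda x: sn_pdf(x, lam=lam), -10.0, 10.0)
mean = simpson(lambda x: x * sn_pdf(x, lam=lam), -10.0, 10.0)
delta = lam / math.sqrt(1.0 + lam * lam)
print(mass, mean, math.sqrt(2.0 / math.pi) * delta)
```

With λ = 0 the function reduces to φ, the classical normal case mentioned above.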

Concerning the moments of a truncated Gaussian random variable, Dhrymes (2005) provided a recursive relationship among the rth-order moments of a standard normal variable Z truncated at a and b:

$$ m_r(a,b) = (r-1)\,m_{r-2}(a,b) - \frac{[z^{r-1}\phi(z)]_a^b}{[\Phi(z)]_a^b}, \qquad r = 1, 2, \ldots, \qquad (2) $$

where m_r(a, b) = E[Z^r | a < Z ≤ b] with a < b, [Φ(z)]_a^b denotes Φ(b) − Φ(a), m_0(a, b) = 1, and m_{−1}(a, b) can be any finite value, since it is multiplied by zero when r = 1. Our first objective is to combine the results of Martinez (2008) with the approach of Dhrymes (2005) in order to derive the moments of a truncated skew-normal distribution (Section 2). Following the work of Flecher et al. (2009), we also compute a special type of weighted moment of truncated skew-normal distributions. In Section 3, these theoretical expressions allow us to propose a new method of moments for estimating the parameters of such truncated distributions. In Section 4, this estimation method is compared to the classical method of moments on simulated data, and daily relative humidity measurements are analyzed. All proofs are given in the Appendix.
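Recursion (2) is straightforward to evaluate. The sketch below is in Python and the function names are ours. It returns m_0(a, b), ..., m_rmax(a, b) and accepts infinite bounds. At an infinite bound, the boundary term z^{r−1}φ(z) is replaced by its limit, zero.

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf; math.erf also handles z = +/-inf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def zpow_phi(z, k):
    """z**k * phi(z), replaced by its limit 0 when z is infinite."""
    if math.isinf(z):
        return 0.0
    return z ** k * phi(z)

def truncated_normal_moments(a, b, rmax):
    """[m_0, ..., m_rmax] with m_r(a, b) = E[Z^r | a < Z <= b], via recursion (2)."""
    p = Phi(b) - Phi(a)                      # [Phi(z)]_a^b
    m = [1.0]                                # m_0(a, b) = 1
    for r in range(1, rmax + 1):
        m_rm2 = 0.0 if r == 1 else m[r - 2]  # m_{-1} is multiplied by 0, any value works
        boundary = (zpow_phi(b, r - 1) - zpow_phi(a, r - 1)) / p
        m.append((r - 1) * m_rm2 - boundary)
    return m

print(truncated_normal_moments(-math.inf, math.inf, 6))  # classical Gaussian moments
print(truncated_normal_moments(0.0, math.inf, 2))        # half-normal
```

Without truncation, the recursion returns the Gaussian moments 1, 0, 1, 0, 3, 0, 15. With a = 0 and b = ∞, the first moment is the half-normal mean √(2/π).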

2 Moments of truncated skew-normal variables

Our first result solves (2) explicitly for any positive integer r.

Lemma 1. Let Z be a standard Gaussian random variable and −∞ ≤ a < b ≤ +∞. If m_r(a, b) = E(Z^r | a < Z ≤ b) denotes the rth moment of the truncated Z, then

$$ m_{2k}(a,b) = (2k-1)!!\left(1 - \sum_{i=1}^{k}\frac{1}{(2i-1)!!}\,\frac{[z^{2i-1}\phi(z)]_a^b}{[\Phi(z)]_a^b}\right), \qquad k = 1, 2, \ldots, $$

$$ m_{2k+1}(a,b) = -\sum_{i=0}^{k}\frac{(2k)!!}{(2i)!!}\,\frac{[z^{2i}\phi(z)]_a^b}{[\Phi(z)]_a^b}, \qquad k = 0, 1, \ldots, \qquad (3) $$

where n!! denotes the double factorial defined by Arfken (1985) as

$$ n!! = \begin{cases} 1, & n = -1,\ 0 \text{ or } 1,\\ n \times (n-2)!!, & n \ge 2. \end{cases} $$
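The closed form (3) can be checked directly against numerical integration. In this Python sketch, `dfact` implements the double factorial with the convention above. `m_closed` evaluates (3), and `m_quad` is a Simpson-quadrature reference that we introduce for finite bounds only. All three names are ours.

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def zpow_phi(z, k):
    """z**k * phi(z), with its limit 0 at infinite z."""
    return 0.0 if math.isinf(z) else z ** k * phi(z)

def dfact(n):
    """Double factorial n!!, with (-1)!! = 0!! = 1!! = 1 as in Lemma 1."""
    result = 1
    while n >= 2:
        result *= n
        n -= 2
    return result

def m_closed(r, a, b):
    """Closed form (3) of Lemma 1 for m_r(a, b) = E[Z^r | a < Z <= b]."""
    p = Phi(b) - Phi(a)
    if r % 2 == 0:
        k = r // 2
        s = sum((zpow_phi(b, 2 * i - 1) - zpow_phi(a, 2 * i - 1)) / dfact(2 * i - 1)
                for i in range(1, k + 1))
        return dfact(2 * k - 1) * (1.0 - s / p)
    k = (r - 1) // 2
    s = sum(dfact(2 * k) / dfact(2 * i) * (zpow_phi(b, 2 * i) - zpow_phi(a, 2 * i))
            for i in range(k + 1))
    return -s / p

def m_quad(r, a, b, n=4000):
    """Same moment by composite Simpson quadrature (finite a < b only), for checking."""
    h = (b - a) / n
    g = lambda z: z ** r * phi(z)
    total = g(a) + g(b) + sum((4.0 if i % 2 else 2.0) * g(a + i * h) for i in range(1, n))
    return total * h / 3.0 / (Phi(b) - Phi(a))

for r in range(7):
    print(r, m_closed(r, -0.7, 1.9), m_quad(r, -0.7, 1.9))
```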

As (a, b) goes to (−∞, +∞), the moments m_{2k}(a, b) and m_{2k+1}(a, b) tend to (2k − 1)!! and zero, respectively, which are the classical Gaussian moments. To obtain the truncated skew-normal moments from the truncated Gaussian ones, we introduce from (1) the truncated skew-normal pdf

$$ f_{\mu,\sigma,\lambda}(x \mid a < X \le b) = \begin{cases} \dfrac{f_{\mu,\sigma,\lambda}(x)}{[F_{\mu,\sigma,\lambda}(x)]_a^b}, & a < x \le b,\\ 0, & \text{otherwise}, \end{cases} \qquad (4) $$

where X ∼ SN(µ, σ, λ) and −∞ ≤ a < b ≤ +∞ define the truncation range. Let us first consider the simple case (µ, σ) = (0, 1).

Proposition 1. Let X be a SN(0, 1, λ). If s_{λ,r}(u, v) = E[X^r | u < X ≤ v], with u < v, denotes the rth moment of the truncated variable, then the following recursive relationship holds:

$$ s_{\lambda,r}(u,v) = (r-1)\,s_{\lambda,r-2}(u,v) + r_{\lambda,r}(u,v), \qquad r = 1, 2, \ldots, \qquad (5) $$

where s_{λ,−1} can be any finite value, λ∗ = (1 + λ²)^{1/2},

$$ r_{\lambda,r}(u,v) = -\frac{[x^{r-1} f_\lambda(x)]_u^v}{[F_\lambda(x)]_u^v} + \frac{2}{\sqrt{2\pi}}\,\frac{\lambda}{\lambda_*^{r}}\,\frac{[\Phi(\lambda_* x)]_u^v}{[F_\lambda(x)]_u^v}\,m_{r-1}(\lambda_* u, \lambda_* v), $$

with s_{λ,0}(u, v) = 1. From (5), we can derive

$$ s_{\lambda,2p}(u,v) = (2p-1)!! + \sum_{k=1}^{p}\frac{(2p-1)!!}{(2k-1)!!}\,r_{\lambda,2k}(u,v), \qquad p = 1, 2, \ldots, \qquad (6) $$

$$ s_{\lambda,2p+1}(u,v) = \sum_{k=0}^{p}\frac{(2p)!!}{(2k)!!}\,r_{\lambda,2k+1}(u,v), \qquad p = 0, 1, \ldots \qquad (7) $$
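Recursion (5) can be checked against direct integration of x^r f_λ(x) over (u, v], with (6) and (7) as its unrolled forms. The Python sketch below makes two simplifying assumptions. It handles finite bounds only. It computes the normalising constant [F_λ(x)]_u^v by quadrature rather than from a closed-form cdf. The names `sn_moments` and `normal_moments` are ours. The truncated Gaussian moments m_{r−1}(λ∗u, λ∗v) come from recursion (2).

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def f_lam(x, lam):
    """Standard skew-normal density f_lambda(x) = 2 phi(x) Phi(lambda x)."""
    return 2.0 * phi(x) * Phi(lam * x)

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b) + sum((4.0 if i % 2 else 2.0) * f(a + i * h) for i in range(1, n))
    return total * h / 3.0

def normal_moments(a, b, rmax):
    """m_r(a, b), r = 0..rmax, by recursion (2) (finite a < b)."""
    p = Phi(b) - Phi(a)
    m = [1.0]
    for r in range(1, rmax + 1):
        m_rm2 = 0.0 if r == 1 else m[r - 2]
        m.append((r - 1) * m_rm2 - (b ** (r - 1) * phi(b) - a ** (r - 1) * phi(a)) / p)
    return m

def sn_moments(lam, u, v, rmax):
    """s_{lambda,r}(u, v), r = 0..rmax, by recursion (5) of Proposition 1 (finite u < v)."""
    lam_star = math.sqrt(1.0 + lam * lam)
    P = simpson(lambda x: f_lam(x, lam), u, v)           # [F_lambda(x)]_u^v, by quadrature
    dPhi = Phi(lam_star * v) - Phi(lam_star * u)          # [Phi(lambda_* x)]_u^v
    m = normal_moments(lam_star * u, lam_star * v, rmax)
    s = [1.0]                                             # s_{lambda,0} = 1
    for r in range(1, rmax + 1):
        r_term = (-(v ** (r - 1) * f_lam(v, lam) - u ** (r - 1) * f_lam(u, lam)) / P
                  + 2.0 / math.sqrt(2.0 * math.pi) * lam / lam_star ** r * dPhi / P * m[r - 1])
        s_rm2 = 0.0 if r == 1 else s[r - 2]
        s.append((r - 1) * s_rm2 + r_term)
    return s

lam, u, v = 2.0, -0.5, 1.5
P = simpson(lambda x: f_lam(x, lam), u, v)
direct = [simpson(lambda x: x ** r * f_lam(x, lam), u, v) / P for r in range(5)]
print(sn_moments(lam, u, v, 4))
print(direct)
```

The two printed lists should agree to quadrature accuracy, and the same holds for negative λ.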

Martinez’s results (2008) can be viewed as limiting cases of (6) and (7):

$$ \lim_{(u,v)\to(-\infty,+\infty)} s_{\lambda,2p}(u,v) = (2p-1)!!, \qquad p = 0, 1, \ldots, $$

$$ \lim_{(u,v)\to(-\infty,+\infty)} s_{\lambda,2p+1}(u,v) = \frac{2}{\sqrt{2\pi}}\sum_{k=0}^{p}\frac{(2p)!!}{(2k)!!}\,(2k-1)!!\,\frac{\lambda}{(1+\lambda^2)^{k+1/2}}, \qquad p = 0, 1, \ldots $$
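As a worked check of the odd-order limit, take p = 1 and write δ = λ/λ∗ = λ/(1 + λ²)^{1/2}, so that 1/λ∗² = 1 − δ² and 2/√(2π) = √(2/π). The terms k = 0 and k = 1 then give:

```latex
\lim_{(u,v)\to(-\infty,+\infty)} s_{\lambda,3}(u,v)
  = \sqrt{\tfrac{2}{\pi}}\left(\frac{2!!}{0!!}\,(-1)!!\,\frac{\lambda}{\lambda_*}
      + \frac{2!!}{2!!}\,1!!\,\frac{\lambda}{\lambda_*^{3}}\right)
  = \sqrt{\tfrac{2}{\pi}}\left(2\delta + \delta\,(1-\delta^2)\right)
  = \sqrt{\tfrac{2}{\pi}}\,(3\delta - \delta^3).
```

This matches the third moment of the non-truncated skew-normal distribution as it is usually stated. The case p = 0 gives the mean √(2/π)δ in the same way.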

Equalities (6) and (7) show that the odd (respectively even) moments of truncated skew-normal distributions can be written as linear combinations of the even (respectively odd) moments of the normal distribution truncated at λ∗u and λ∗v, plus boundary terms. If λ = 0, equation (5) reduces to the recursion (2) of Dhrymes (2005), and equations (6) and (7) reduce to Lemma 1. The restriction (µ, σ) = (0, 1) is easily removed, which leads to the following proposition.

Proposition 2. Let X ∼ SN(µ, σ, λ). Then

$$ E[X^m \mid a < X \le b] = \sum_{r=0}^{m} C_m^r\,\mu^{m-r}\sigma^r\,s_{\lambda,r}(u,v), \qquad (8) $$

where u = (a − µ)/σ, v = (b − µ)/σ and s_{λ,r}(u, v) is defined in Proposition 1.
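The following Python sketch checks the binomial expansion (8). For brevity, the standardized moments s_{λ,r}(u, v) are obtained here by quadrature rather than by recursion (5). The expansion is compared with a direct integration of x^m against the truncated density (4). Finite bounds are assumed, and the function names and parameter values are illustrative only.

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b) + sum((4.0 if i % 2 else 2.0) * f(a + i * h) for i in range(1, n))
    return total * h / 3.0

def s_std(lam, u, v, r):
    """s_{lambda,r}(u, v) for SN(0, 1, lambda), by direct quadrature (finite u < v)."""
    f = lambda z: 2.0 * phi(z) * Phi(lam * z)
    return simpson(lambda z: z ** r * f(z), u, v) / simpson(f, u, v)

def truncated_moment(m, mu, sigma, lam, a, b):
    """E[X^m | a < X <= b] for X ~ SN(mu, sigma, lambda), via expansion (8)."""
    u, v = (a - mu) / sigma, (b - mu) / sigma
    return sum(math.comb(m, r) * mu ** (m - r) * sigma ** r * s_std(lam, u, v, r)
               for r in range(m + 1))

mu, sigma, lam, a, b = 1.0, 2.0, -1.5, -2.0, 3.0
pdf = lambda x: 2.0 / sigma * phi((x - mu) / sigma) * Phi(lam * (x - mu) / sigma)
for m in range(4):
    direct = simpson(lambda x: x ** m * pdf(x), a, b) / simpson(pdf, a, b)
    print(m, truncated_moment(m, mu, sigma, lam, a, b), direct)
```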

To estimate the three parameters of a truncated skew-normal distribution with a method of moments, the first two moments given by Proposition 2 are not sufficient, and a third equation is needed. Besides the difficulty of deriving a simple explicit expression for the third-order moment, its empirical estimate usually has a large variance. An alternative route is to use other types of moments. Following the work of Hosking et al. (1985) and Diebolt et al. (2008), Flecher et al. (2009) introduced and studied probability weighted moments for skew-normal distributions. The basic idea is to compute moments of the type E[X^s Φ^r(X)], where s and r are small integers. Here we concentrate on the case s = 0 and r = 1, and obtain the following proposition.

Proposition 3. Let X be a SN(µ, σ, λ) and a < b. Then

$$ E[\Phi(X) \mid a < X \le b] = 2\,\Phi_2\!\left(0;\ \nu_+,\ I_2 + \sigma^2 D_+ D_+^t\right)\frac{[F_+(x)]_a^b}{[F_{\mu,\sigma,\lambda}(x)]_a^b} = \varphi(\mu,\sigma,\lambda), \qquad (9) $$

where F_+ is the cdf of a closed skew-normal variable (Genton, 2004) CSN_{1,2}(µ, σ², D_+, ν_+, I_2), with D_+ = (1, λ/σ)^t and ν_+ = (−µ, 0)^t.
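The weighted moment on the left-hand side of (9) can also be evaluated numerically. This is useful as an independent check of any implementation of the closed form. The Python sketch below uses Simpson quadrature on finite bounds and does not implement the closed-skew-normal expression itself. The function name is ours. The check relies on a simple fact: for a non-truncated normal variable (λ = 0), E[Φ(X)] = P(Z ≤ X) = Φ(µ/(1 + σ²)^{1/2}), where Z is an independent standard normal.

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(f, a, b, n=6000):
    """Composite Simpson rule on [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b) + sum((4.0 if i % 2 else 2.0) * f(a + i * h) for i in range(1, n))
    return total * h / 3.0

def weighted_moment(mu, sigma, lam, a, b):
    """E[Phi(X) | a < X <= b] for X ~ SN(mu, sigma, lambda), by quadrature on finite [a, b]."""
    pdf = lambda x: 2.0 / sigma * phi((x - mu) / sigma) * Phi(lam * (x - mu) / sigma)
    return simpson(lambda x: Phi(x) * pdf(x), a, b) / simpson(pdf, a, b)

for lam in (1.0, 2.0, 4.0):
    print(lam, weighted_moment(0.0, 1.0, lam, -1.0, 12.0))   # left truncation at -1
```

An infinite bound can be replaced by µ ± 12σ, as in the normal check, at negligible cost in accuracy.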

3 A method of moments for truncated skew-normal pdfs

To apply a method-of-moments approach to realizations generated from a truncated skew-normal pdf, three explicit equations are needed to estimate the three skew-normal parameters. Two approaches can be implemented at this stage. From Proposition 2, we can compute the first three moments E[X^k | a < X ≤ b], k = 1, 2, 3. This is the classical method-of-moments approach, hereafter called MOM. The alternative is to take advantage of Proposition 3. Flecher et al. (2009) showed that this strategy, applied to non-truncated skew-normal data, leads to better estimates than the classical MOM approach. For truncated data, the following set of three equations is the core of our estimation scheme, hereafter called the Method of Weighted Moments (MWM):

$$ E[X \mid a < X \le b] = \mu\,s_{\lambda,0}(u,v) + \sigma\,s_{\lambda,1}(u,v), $$

$$ \mathrm{Var}(X \mid a < X \le b) = \sigma^2\left(s_{\lambda,2}(u,v) - s_{\lambda,1}^2(u,v)\right), \qquad (10) $$

$$ E[\Phi(X) \mid a < X \le b] = \varphi(\mu,\sigma,\lambda), $$

with u = (a − µ)/σ, v = (b − µ)/σ and s_{λ,r}(u, v) defined in Proposition 1. To estimate the parameters, we used the following usual scheme:

• compute from the sample the empirical moments (X̄, S², and the sample mean of Φ(X_i)), corresponding to E[X], Var(X) and E[Φ(X)];
• solve (10) for (µ, σ, λ) using the nlminb function in R.
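The two steps of this scheme can be sketched as follows. The text does not state the exact criterion minimised with nlminb. We therefore assume a plain sum of squared differences between the model and empirical versions of the three quantities in (10). We also parametrize σ through log σ to keep it positive, which is our own choice. The model side is computed by quadrature on finite bounds rather than through (8) and (9). All function names are hypothetical, and the resulting objective would be handed to a general-purpose minimiser.

```python
import math
from statistics import fmean, variance

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b) + sum((4.0 if i % 2 else 2.0) * f(a + i * h) for i in range(1, n))
    return total * h / 3.0

def empirical_targets(sample):
    """Sample mean, sample variance and mean of Phi(x_i): left-hand sides of (10)."""
    return fmean(sample), variance(sample), fmean(Phi(x) for x in sample)

def model_targets(mu, sigma, lam, a, b):
    """(E[X], Var(X), E[Phi(X)]) given a < X <= b, by quadrature on finite [a, b]."""
    pdf = lambda x: 2.0 / sigma * phi((x - mu) / sigma) * Phi(lam * (x - mu) / sigma)
    mass = simpson(pdf, a, b)
    m1 = simpson(lambda x: x * pdf(x), a, b) / mass
    m2 = simpson(lambda x: x * x * pdf(x), a, b) / mass
    w = simpson(lambda x: Phi(x) * pdf(x), a, b) / mass
    return m1, m2 - m1 * m1, w

def mwm_objective(theta, targets, a, b):
    """Sum of squared gaps between model and empirical moments; theta = (mu, log sigma, lambda)."""
    mu, log_sigma, lam = theta
    model = model_targets(mu, math.exp(log_sigma), lam, a, b)
    return sum((mod - emp) ** 2 for mod, emp in zip(model, targets))

targets = model_targets(0.0, 1.0, 2.0, -0.5, 4.0)   # stand-in for empirical_targets(data)
print(mwm_objective((0.0, 0.0, 2.0), targets, -0.5, 4.0))
print(mwm_objective((0.3, 0.0, 2.0), targets, -0.5, 4.0))
```

With real data, `targets` would come from `empirical_targets`. For relative humidity, the finite bounds [0, 100] fit the quadrature assumption naturally.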

The complexity of the equations, in particular of the last equation involving the weighted moment, does not allow us to obtain distributional results for the estimators (µ̂, σ̂, λ̂). The performance of this estimation scheme must therefore be assessed by simulation. Figure 2 shows how E[X³] (left panel) and E[Φ(X)] (right panel) vary as a function of λ when (µ, σ) = (0, 1). The boxplots were obtained from 1000 replicates of the empirical moments, each computed on a sample of 500 truncated skew-normal random variables. Clearly, the weighted moment Φ(X) is much less dispersed than X³, especially for low to moderate values of λ. Also, the theoretical curve E[Φ(X)](λ) is in general steeper than E[X³](λ). Note, however, that both curves flatten out as λ increases, so it is very difficult to estimate λ precisely when it is large (say, λ ≥ 4). Identical results were obtained with other truncation schemes and/or truncation intensities, and without any truncation at all (see the next section for a precise definition of truncation intensity). These are good indications that the weighted moment should be preferred to the third moment, and this has been consistently confirmed by simulations (see below and Table 1).
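The dispersion comparison behind Figure 2 can be reproduced in spirit with the following Python sketch. The paper's simulations use rsn from the R package sn. Here, as a stand-in, SN(0, 1, λ) draws come from the standard representation X = δ|U₀| + (1 − δ²)^{1/2} U₁, with δ = λ/(1 + λ²)^{1/2} and U₀, U₁ independent N(0, 1). Truncation is obtained by rejection. The left threshold 0 is purely illustrative. For λ = 2 it corresponds to an intensity F_2(0) = 1/2 − arctan(2)/π ≈ 0.15, not to the 10% and 20% of the figure.

```python
import math
import random
from statistics import fmean, stdev

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rsn_std(rng, lam):
    """One SN(0, 1, lambda) draw via X = delta*|U0| + sqrt(1 - delta^2)*U1."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    return delta * abs(rng.gauss(0.0, 1.0)) + math.sqrt(1.0 - delta * delta) * rng.gauss(0.0, 1.0)

def rtrunc_sn(rng, lam, a, b, n):
    """n draws from SN(0, 1, lambda) restricted to a < X <= b, by simple rejection."""
    out = []
    while len(out) < n:
        x = rsn_std(rng, lam)
        if a < x <= b:
            out.append(x)
    return out

def moment_spread(lam, a, b, n=500, reps=1000, seed=1):
    """Standard deviations, across replicates, of the empirical E[X^3] and E[Phi(X)]."""
    rng = random.Random(seed)
    cubes, weights = [], []
    for _ in range(reps):
        xs = rtrunc_sn(rng, lam, a, b, n)
        cubes.append(fmean(x ** 3 for x in xs))
        weights.append(fmean(Phi(x) for x in xs))
    return stdev(cubes), stdev(weights)

print(moment_spread(2.0, 0.0, math.inf, n=500, reps=200))
```

The first standard deviation is expected to be far larger than the second, because Φ(X) is bounded in [0, 1] while X³ amplifies the upper tail.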

4 Data analysis

4.1 Simulations

To assess our inference method, we simulated 1000 vectors of 500 independent replicates each, using the rsn function of the R package sn (Azzalini, 1985). We considered three values of the shape parameter, λ ∈ {1, 2, 4}, corresponding to low, medium and high levels of skewness, respectively. Because λ plays a symmetric role, we restrict ourselves to positive values of λ; λ = 0 corresponds to the absence of skewness. Note also that λ = 4 lies in a relatively flat area of the curve E[Φ(X)](λ), i.e. a large interval of values of λ share very close values of the weighted moment. We paid special attention to the type and intensity of truncation. Since left and right truncation do not play symmetric roles, we considered left, right and bilateral truncation. The intensity of the truncation is defined as the probability that the non-truncated skew-normal random variable falls outside the truncation interval. We considered two intensities: 10% and 20%. In the case of a bilateral truncation, the same probability is applied to the left-hand and right-hand tails of the distribution.
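Given an intensity, the truncation thresholds are quantiles of the non-truncated SN(0, 1, λ). The Python sketch below computes them by bisection. It relies on the standard identity F_λ(x) = Φ(x) − 2T(x, λ), where T is Owen's T function, which is not stated in the paper. T is evaluated here by Simpson quadrature of its defining integral. The function names are ours. For a bilateral truncation, the same `quantile` call would be applied at both ends with the chosen per-tail probability.

```python
import math

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] (n even); returns 0 when a == b."""
    h = (b - a) / n
    total = f(a) + f(b) + sum((4.0 if i % 2 else 2.0) * f(a + i * h) for i in range(1, n))
    return total * h / 3.0

def owens_t(h, a):
    """Owen's T(h, a) = (1/2pi) * integral_0^a exp(-h^2 (1 + t^2)/2) / (1 + t^2) dt."""
    g = lambda t: math.exp(-0.5 * h * h * (1.0 + t * t)) / (1.0 + t * t)
    return simpson(g, 0.0, a) / (2.0 * math.pi)

def sn_cdf(x, lam):
    """cdf of SN(0, 1, lambda): F_lambda(x) = Phi(x) - 2 T(x, lambda)."""
    return Phi(x) - 2.0 * owens_t(x, lam)

def quantile(lam, prob, lo=-10.0, hi=10.0, tol=1e-10):
    """x with F_lambda(x) = prob, by bisection (F_lambda is increasing)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sn_cdf(mid, lam) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam = 2.0
for eta in (0.10, 0.20):
    left = quantile(lam, eta)          # left truncation: P(X <= left) = eta
    right = quantile(lam, 1.0 - eta)   # right truncation: P(X > right) = eta
    print(eta, left, right)
```

For λ = 0, the left threshold at 10% reduces to the normal quantile Φ⁻¹(0.1) ≈ −1.2816.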

The comparison between MWM and MOM is reported in Table 1 for λ = 2 and 20% left and right truncation. Similar results were obtained in other cases but, for the sake of brevity, they are not reported here. Except in one case (µ̂ for right truncation, where MOM has the smaller bias), MWM yields better estimates than MOM, both in terms of bias and MSE. One can also observe that MOM has great difficulty estimating λ in the case of left truncation, a problem not encountered with MWM. This is related to the higher variability of the empirical third moment compared with the empirical weighted moment. As a consequence of the results shown in Table 1 and Figure 2, we now report simulation results for MWM only.

Table 1: Simulation results for MWM and MOM. 1000 replicates of size 500; left or right truncation with intensity 20%; (µ, σ, λ) = (0, 1, 2).

Truncation   Estimate   MWM Bias   MWM MSE   MOM Bias   MOM MSE
Left         µ̂            0.033      0.027      0.106      0.063
             σ̂           −0.024      0.007     −0.033      0.010
             λ̂           −0.434      1.143      1994       8.5 × 10^7
Right        µ̂            0.059      0.033      0.033      0.038
             σ̂            0.004      0.159      0.107      0.543
             λ̂           −0.003      1.806      0.372      5.546

Table 2 reports the mean and the standard deviation (in brackets) of the MWM estimates. Several observations can be made:

• In general, estimates have less bias and a lower standard deviation when the truncation intensity is 10% than when it is 20%. There are some exceptions, but of low magnitude.
• The most difficult parameter to estimate is λ. This fact has been consistently reported for the skew-normal distribution (Arellano–Valle and Azzalini, 2008). Even with a truncation intensity of 10%, it is extremely difficult to estimate λ if the distribution is left truncated. MOM yields even worse results and cannot be considered as an alternative in this case.
• The scale parameter σ is in general very well estimated, with a tendency to i) underestimation for bilateral truncation and high values of λ, and ii) overestimation for right truncation and low levels of skewness.
• The location parameter is not always well estimated; it is well estimated when λ is also well estimated.

Table 2: Mean and standard deviation (in brackets) of the MWM estimates. 1000 replicates of size 500; left, right and bilateral truncation with varying intensity; (µ, σ) = (0, 1).

Estimate   Truncation    λ = 1           λ = 2           λ = 4
µ̂          Left 10%      0.084 [0.152]   0.125 [0.184]   −0.074 [0.167]
           Left 20%      0.107 [0.212]   0.033 [0.161]   −0.201 [0.250]
           Right 10%     0.040 [0.259]   0.028 [0.128]    0.009 [0.053]
           Right 20%     0.037 [0.324]   0.059 [0.173]    0.018 [0.062]
           Bilat. 10%    0.031 [0.209]   0.098 [0.144]    0.118 [0.092]
           Bilat. 20%    0.064 [0.153]   0.172 [0.077]    0.034 [0.166]
σ̂          Left 10%      0.969 [0.071]   0.938 [0.089]    0.991 [0.081]
           Left 20%      0.979 [0.092]   0.976 [0.077]    1.032 [0.094]
           Right 10%     1.041 [0.249]   0.987 [0.162]    0.990 [0.107]
           Right 20%     1.243 [0.771]   1.004 [0.398]    0.985 [0.194]
           Bilat. 10%    1.018 [0.169]   0.929 [0.129]    0.846 [0.070]
           Bilat. 20%    0.980 [0.142]   0.831 [0.083]    0.866 [0.121]
λ̂          Left 10%      0.819 [0.421]   1.255 [0.863]    1.301 [0.729]
           Left 20%      1.102 [0.907]   1.566 [0.977]    1.043 [0.631]
           Right 10%     1.075 [0.771]   1.993 [0.695]    3.977 [0.991]
           Right 20%     1.455 [1.788]   1.997 [1.344]    3.931 [1.436]
           Bilat. 10%    1.105 [0.738]   1.646 [0.966]    0.911 [0.367]
           Bilat. 20%    0.920 [0.578]   0.891 [0.205]    0.834 [0.055]

The most interesting feature of this simulation study is that the side and the intensity of truncation play an important role in the performance of the estimation procedure. Problems occur for high truncation levels, especially for bilateral and left truncation when the shape parameter is high. Notice that the truncation intensity is in fact an unknown quantity: only the truncation thresholds are known, and the resulting truncation intensity is a function of (µ̂, σ̂, λ̂). In some cases, the parameters are not identifiable. Consider for example a one-sided truncation at t = 0 (i.e. the density is equal to 0 when x < 0). A skew-normal distribution with infinite shape parameter λ is nothing but a unit normal distribution truncated at 0. Therefore the parameter sets (µ = 0, σ = 1, λ = 0) and (µ = 0, σ = 1, λ = ∞), both truncated at t = 0, lead to the same density, namely 2φ(x) for x > 0. In this case the parameters are not identifiable.

This is only a degenerate limiting case but, in some situations, a whole set of parameter values can lead to very similar distributions. Consider for example the estimation of a SN(0, 1, 2) with 20% left truncation. The estimates are plotted in Figure 3: λ̂ ranges from 0.1 to 5.5, with a mean equal to 1.5. Inspection of Figure 3a shows that the distribution of λ̂ is in fact bimodal, corresponding to two populations of estimates (µ̂, σ̂, λ̂) of comparable size (see Table 3). Note also that the apparent truncation intensity is much higher than the true one in the group of lower λ̂ values. Figure 4 depicts the true density and the two densities corresponding to the centers of the two populations. It shows clearly that the three densities are very close for moderate to high values of x and differ mainly close to the truncation threshold. This example tells us two things. First, the third moment E[X³] is very sensitive to high values of X, where the densities differ little, so MOM has greater difficulty than MWM in estimating the parameters (see Table 1). Second, for the same reason, truncation to the right (when λ > 0) is not a problem, and the parameters are quite well estimated (see Table 2).

Table 3: Mean [and standard deviation] of the estimates of a SN(0, 1, 2) with 20% left truncation, when clustered in two populations.

             Group #1         Group #2
N            463              537
µ̂            0.085 [0.187]   −0.011 [0.116]
σ̂            0.936 [0.079]    1.009 [0.056]
λ̂            0.694 [0.367]    2.31 [0.663]
Intensity    33.2% [7.6]      19.4% [7.3]
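The degenerate case above can be verified numerically. The following Python sketch evaluates the density of SN(0, 1, λ) left truncated at 0. It uses P(X > 0) = 1 − F_λ(0) = 1/2 + arctan(λ)/π. It compares λ = 0 with a very large λ, here 10⁶ as our numerical stand-in for λ = ∞, and with λ = 2. The function name and the grid on (0, 4) are ours.

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def left0_truncated_sn_pdf(x, lam):
    """Density of SN(0, 1, lambda) truncated to x > 0: f_lambda(x) / P(X > 0)."""
    if x <= 0.0:
        return 0.0
    p_pos = 0.5 + math.atan(lam) / math.pi   # P(X > 0) = 1 - F_lambda(0)
    return 2.0 * phi(x) * Phi(lam * x) / p_pos

grid = [0.01 * k for k in range(1, 400)]
gap_0_vs_big = max(abs(left0_truncated_sn_pdf(x, 0.0) - left0_truncated_sn_pdf(x, 1e6)) for x in grid)
gap_0_vs_2 = max(abs(left0_truncated_sn_pdf(x, 0.0) - left0_truncated_sn_pdf(x, 2.0)) for x in grid)
print(gap_0_vs_big, gap_0_vs_2)
```

The first gap is essentially zero, since both densities are the half-normal 2φ(x). The second gap is visible, and it is largest close to the truncation threshold, which is where Figure 4 shows the densities differing.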

4.2 Daily relative humidity measurements

The relative humidity of an air-water mixture is defined as the ratio of the partial pressure of water vapor in the mixture to the saturated vapor pressure of water at a prescribed temperature (Perry and Green, 2007). This quantity is normally expressed as a percentage and is thus restricted to the interval [0, 100]. The data set considered here consists of daily measurements of relative humidity at the INRA weather station located in Toulouse, in the South of France, between 1972 and 1999. In temperate regions, relative humidity is always much larger than 0; truncation at 100 is the only truncation visible on the experimental histograms. We therefore consider right truncation only. To take into account the seasonality of the climate, the year is divided into four periods (Semenov et al., 1998; Flecher et al., 2009): December-January-February (DJF), March-April-May (MAM), June-July-August (JJA) and September-October-November (SON). Our goal is to fit a truncated skew-normal distribution to the values in ]0, 100[ for each of the four periods. We assume climatological stationarity throughout the whole period 1972-1999, because climate change is not our primary concern here. For each season, the parameters were estimated by both methods. Results are presented in Table 4. Except in JJA, λ̂ is negative. By symmetry, right truncation with λ < 0 mirrors left truncation with λ > 0, so this is the difficult case described in the previous section, and the estimates obtained with the two methods differ. In JJA, λ̂ is positive, and the estimates obtained with the two methods are very close. The densities with the parameters estimated via MWM are depicted in Figure 1, and the agreement between histograms and estimated densities is clear.

Table 4: Estimated parameters by MWM and MOM for relative humidity in Toulouse for the four seasons.

Season   MWM µ̂   MWM σ̂   MWM λ̂   MOM µ̂   MOM σ̂   MOM λ̂
DJF      94.1     9.4     −0.98     92.0     9.0      0.00
MAM      86.2    15.6     −1.98     86.0    16.0     −2.05
JJA      58.8    14.6      1.29     59.0    14.0      1.17
SON      92.2    16.1     −2.12     88.0    14.0      0.01

5 Concluding remarks and discussion

In this paper, we proposed truncated skew-normal distributions as models for skewed and bounded data. We combined results on the moments of the truncated Gaussian distribution with results on the skew-normal density. This allowed us to express the mth-order moments of the truncated skew-normal distribution, for any m ≥ 0, as a linear combination of moments of truncated Gaussian distributions. We linked our results with classical results on the moments of non-truncated skew-normal distributions. We also derived an expression for a weighted moment by taking advantage of the formulae proposed by Flecher et al. (2009). We then presented two methods for inferring the parameters, based on classical moments and on weighted moments, respectively. Both methods were compared for different values of the shape parameter λ and different truncation intensities. Clearly, the method based on weighted moments gives better estimates than the method of classical moments, especially for the scale parameter. We also showed that when the truncation intensity or the shape parameter is too high, the parameters are close to being non-identifiable. We have seen that different sets of parameters lead to similar densities (Figure 4); Table 3 illustrates how this fact translates into an estimation indeterminacy, at the cost of a varying truncation intensity. Supplementary information at or near the truncation would greatly improve the precision of the estimators. For example, if the intensity of a left truncation were known to be equal to, say, η, we could use P(X ≤ a) = η as the third equation in (10). Future work should thus aim at proposing estimating equations that carry more information about the density near the truncation, in order to provide better estimators. Despite these limitations, the illustration on relative humidity showed that it is indeed possible to estimate the parameters from truncated data and to provide density models that fit the histograms.

Acknowledgments. This work was supported by the ANR-CLIMATOR project, the NCAR Weather and Climate Impact Assessment Science Initiative and the ANR-AssimilEx project. The authors would also like to credit the contributors of the R project.

[Figure 2: panel (a) s_{λ,3}(u, v) and panel (b) E[Φ(X)], each plotted against λ ∈ {1, 2, 4}, for 10% and 20% left truncation.]

Figure 2: Theoretical and empirical moments E[X³] and E[Φ(X)] as a function of λ. Boxplots are computed on 1000 samples of size 500 of truncated SN(0, 1, λ) with λ ∈ {1, 2, 4}; 10% and 20% left truncation.

[Figure 3: histogram (Frequency) of λ̂, and scatter plot of σ̂ against µ̂.]

Figure 3: Estimates (µ̂, σ̂, λ̂) computed on 1000 samples of size 500 of truncated SN(0, 1, 2); 20% left truncation.

References

Arfken, G., 1985. Mathematical Methods for Physicists, 3rd ed. Academic Press, Orlando, FL.

Azzalini, A., 1985. A class of distributions which includes the normal ones. Scand. J. Statist., 12, 171–178.

Azzalini, A., 2005. The skew-normal distribution and related multivariate families. Scand. J. Statist., 32, 159–188.

Arellano–Valle, R.B. and Azzalini, A., 2008. The centred parametrization for the multivariate skew-normal distribution. Journal of Multivariate Analysis, 99, 1362–1382.

Diebolt, J., Guillou, A., Naveau, P. and Ribereau, P., 2008. Improving probability-weighted moment methods for the generalized extreme value distribution. REVSTAT - Statistical Journal, 6(1), 33–50.

Dhrymes, P.J., 2005. Moments of truncated Normal distribution. Unpublished note. Available at (28.05.2009).

[Figure 4: density against x on [−0.5, 3.0] for SN(0,1,2), SN(0.09,0.94,0.69) and SN(−0.01,1.01,2.3).]

Figure 4: Black solid line: 20% left truncated SN(0,1,2); grey dotted line: average density of group #1; grey dashed line: average density of group #2.

Flecher, C., Naveau, P. and Allard, D., 2009. Estimating the Closed Skew-Normal parameters using weighted moments. Under review.

Flecher, C., Naveau, P., Allard, D. and Brisson, N., 2009. A Stochastic Daily Weather Generator for Skewed Data. Submitted.

Genton, M.G. (Ed.), 2004. Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality. Chapman & Hall/CRC, Boca Raton, FL.

Hosking, J.R.M., Wallis, J.R. and Wood, E.F., 1985. Estimation of the generalized extreme-value distribution by the method of probability-weighted moments. Technometrics, 27, 251–261.

Martinez, E.H., Varela, H., Gomez, H.W. and Bolfarine, H., 2008. A note on the likelihood and moments of the skew-normal distribution. SORT, 32(1), 57–66.

Perry, R.H. and Green, D.W. (Eds.), 2007. Perry’s Chemical Engineers’ Handbook, 7th revised edition. McGraw-Hill Publishing Co.

Semenov, A.M., Brooks, R.J., Barrow, E.M. and Richardson, C.W., 1998. Comparison of the WGEN and LARS-WG stochastic weather generators for diverse climates. Climate Research, 10, 95–107.

6 Appendix

6.1 Proof of Lemma 1

Dhrymes (2005) obtained the recursive representation (2). The proof is done by induction. It is only presented for odd moments; it is very similar for even moments. First note that the lemma is true for k = 0. Suppose now that (3) is true for the (2k + 1)th moment. Then, using the recursive representation for the (2k + 3)th moment:

$$ m_{2k+3}(a,b) = (2k+2)\left(-\sum_{i=0}^{k}\frac{(2k)!!}{(2i)!!}\,\frac{[z^{2i}\phi(z)]_a^b}{[\Phi(z)]_a^b}\right) - \frac{(2k+2)!!}{(2k+2)!!}\,\frac{[z^{2k+2}\phi(z)]_a^b}{[\Phi(z)]_a^b} = -\sum_{i=0}^{k+1}\frac{(2k+2)!!}{(2i)!!}\,\frac{[z^{2i}\phi(z)]_a^b}{[\Phi(z)]_a^b}, $$

since (2k + 2)(2k)!! = (2k + 2)!!. Hence, (3) is true for any odd moment.

6.2 Proof of Proposition 1

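We first prove the first part of the proposition, i.e. equation (5). For r > 1, denote λ∗ = (1 + λ²)^{1/2} and integrate the quantity s_{λ,r−2} by parts:

$$ s_{\lambda,r-2}(u,v) = \frac{2}{[F_\lambda(\xi)]_u^v}\int_u^v \xi^{r-2}\phi(\xi)\Phi(\lambda\xi)\,d\xi $$

$$ = \frac{1}{r-1}\left(s_{\lambda,r}(u,v) + \frac{[\xi^{r-1} f_\lambda(\xi)]_u^v}{[F_\lambda(\xi)]_u^v} - \frac{2\lambda}{[F_\lambda(\xi)]_u^v}\int_u^v \xi^{r-1}\phi(\xi)\phi(\lambda\xi)\,d\xi\right) $$

$$ = \frac{1}{r-1}\left(s_{\lambda,r}(u,v) + \frac{[\xi^{r-1} f_\lambda(\xi)]_u^v}{[F_\lambda(\xi)]_u^v} - \frac{2\lambda}{\sqrt{2\pi}\,[F_\lambda(\xi)]_u^v}\int_u^v \xi^{r-1}\phi(\lambda_*\xi)\,d\xi\right) $$

$$ = \frac{1}{r-1}\left(s_{\lambda,r}(u,v) + \frac{[\xi^{r-1} f_\lambda(\xi)]_u^v}{[F_\lambda(\xi)]_u^v} - \frac{2}{\sqrt{2\pi}}\,\frac{\lambda}{\lambda_*^{r}}\,\frac{[\Phi(\lambda_*\xi)]_u^v}{[F_\lambda(\xi)]_u^v}\,m_{r-1}(\lambda_* u, \lambda_* v)\right), $$

where the third line uses φ(ξ)φ(λξ) = φ(λ∗ξ)/√(2π). Equation (5) follows directly by denoting

$$ r_{\lambda,r}(u,v) = -\frac{[\xi^{r-1} f_\lambda(\xi)]_u^v}{[F_\lambda(\xi)]_u^v} + \frac{2}{\sqrt{2\pi}}\,\frac{\lambda}{\lambda_*^{r}}\,\frac{[\Phi(\lambda_*\xi)]_u^v}{[F_\lambda(\xi)]_u^v}\,m_{r-1}(\lambda_* u, \lambda_* v). $$

The second part of the proposition, equations (6) and (7), is proved by induction, similarly to the proof of Lemma 1.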

6.3 Proof of Proposition 2

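Let X ∼ SN(µ, σ, λ) with pdf f_{µ,σ,λ} and cdf F_{µ,σ,λ}. Then

$$ E[X^m \mid a < X \le b] = \frac{1}{[F_{\mu,\sigma,\lambda}(x)]_a^b}\int_a^b x^m f_{\mu,\sigma,\lambda}(x)\,dx = \frac{2}{\sigma\,[F_{\mu,\sigma,\lambda}(x)]_a^b}\int_a^b x^m \phi\!\left(\frac{x-\mu}{\sigma}\right)\Phi\!\left(\lambda\,\frac{x-\mu}{\sigma}\right)dx. $$

Let us consider the change of variable ξ = (x − µ)/σ, which maps (a, b) onto (u, v) with u = (a − µ)/σ and v = (b − µ)/σ. Then

$$ E[X^m \mid a < X \le b] = \frac{2}{[F_\lambda(\xi)]_u^v}\int_u^v (\sigma\xi + \mu)^m \phi(\xi)\Phi(\lambda\xi)\,d\xi = \frac{1}{[F_\lambda(\xi)]_u^v}\int_u^v \left(\sum_{r=0}^{m} C_m^r \sigma^r \xi^r \mu^{m-r}\right) f_\lambda(\xi)\,d\xi $$

$$ = \sum_{r=0}^{m} C_m^r \sigma^r \mu^{m-r}\,\frac{1}{[F_\lambda(\xi)]_u^v}\int_u^v \xi^r f_\lambda(\xi)\,d\xi = \sum_{r=0}^{m} C_m^r \sigma^r \mu^{m-r}\,s_{\lambda,r}(u,v). $$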

6.4 Proof of Proposition 3

This proposition is a direct application of a more general result on the weighted moments of closed skew-normal variables, established in Flecher et al. (2009). Proposition 2 in Flecher et al. (2009) is applied to the function h(x) = 1/[F_{µ,σ,λ}(x)]_a^b for a < x ≤ b, and h(x) = 0 elsewhere.