Arab J Geosci (2015) 8:10119–10134 DOI 10.1007/s12517-015-1911-7

ORIGINAL PAPER

Testing the normality of the residuals of surface temperature data at VLBI/GPS co-located sites by goodness of fit tests

Emine Tanır Kayıkçı 1 & Eyüp Sopacı 2

Received: 30 September 2014 / Accepted: 7 April 2015 / Published online: 21 April 2015
© Saudi Society for Geosciences 2015

Abstract Evaluating the distribution patterns of surface temperature data at Very Long Baseline Interferometry (VLBI)/Global Positioning System (GPS) co-located sites with respect to normality is one of the most important issues in modeling surface temperature data over long periods. Such evaluation can support algorithms for filling in missing data at measurement sites. Some algorithms in the literature, such as that of Cho et al. (J Coast Res 65, doi:10.2112/SI65-321.1, 2013), require trend, harmonic, and residual components to fill in the missing data. The trend and harmonic components define an optimal model that can assist such algorithms when filling in missing data. The present study investigates the normal distribution of the residuals of surface temperature time series at VLBI/GPS co-located sites after removing the trend and seasonal effects through harmonic components (inter-daily variations). The study uses surface temperature data collected from the VLBI/GPS co-located sites of two regions in Europe: Matera (Italy) and Wettzell (Germany). The data collected from these sites form time series, and time series analysis and conventional k-sigma outlier detection are applied to these data sets before subjecting them to goodness of fit tests for normality. The residual components of the time series are acquired by removing the trend and signal effects from the original time series, under the assumption that the residuals are normally distributed. In testing the hypothesis that an observed frequency distribution fits the normal distribution, the following tests are used: Pearson χ2, Kolmogorov-Smirnov, Anderson-Darling, Shapiro-Wilk or Shapiro-Francia, D'Agostino, Jarque-Bera, and skewness and kurtosis tests. Some graphical methods are also applied to support the results of the goodness of fit tests for normality. Some proposals on the application of the goodness of fit tests are put forward, such as evaluating the estimation model for trend and harmonic components by considering the properties of the implemented goodness of fit tests. The results of this study can be used to determine the optimal model for estimating trend and harmonic components. The output of the present study is expected to play an important role in modeling surface temperature distributions at co-located VLBI/GPS sites for filling in missing data. Above all, meteorological data, such as temperature, pressure, and humidity, are of specific interest for modeling tropospheric delay, the main error source in positioning in space geodesy, which makes investigations of the distribution of meteorological data all the more relevant in geoscience.

Keywords Time series analysis · Goodness of fit test · Very Long Baseline Interferometry (VLBI) · Global Positioning System (GPS) · Normal distribution · Surface temperature

* Emine Tanır Kayıkçı
[email protected]

Eyüp Sopacı
[email protected]

1 Department of Geomatics Engineering, Karadeniz Technical University, Trabzon 61080, Turkey

2 Graduate School of Geodesy and Geographic Information Technologies, Middle East Technical University, Çankaya, 06800 Ankara, Turkey

Introduction

Probability distributions are used to model variability and support decisions. If a distribution that does not fit the data well is used, subsequent calculations will be incorrect and may lead to wrong decisions. The use of incorrect models in many industries can have serious consequences, such as the inability to complete tasks or projects on time, flawed engineering designs that result in damage to expensive equipment, and others. In some areas, such as hydrology, using inappropriate distributions can be even more critical. Statistical models that estimate distribution parameters from the sample data should be applied to find a suitable distribution. After the selected distributions are fitted, how well they fit the data must be determined. Goodness of fit tests assess how well a set of observations fits an expected statistical distribution. They fall into two broad categories: graphical and statistical methods. The goodness of fit test for normality tests the hypothesis that the observed data fit the normal distribution. Goodness of fit tests have been used successfully in various areas, such as signature verification, automatic speaker identification, radio frequency detection, economics, and data reconstruction (Biswas et al. 2008; Cho et al. 2013; Güner et al. 2009; Srinivasan et al. 2005).

The present study uses surface temperature data collected from Very Long Baseline Interferometry (VLBI)/Global Positioning System (GPS) co-located sites of two regions in Europe: Matera (Italy) and Wettzell (Germany). Each series is decomposed into its trend, harmonic (inter-daily variations), and residual components. The trend and harmonic components that correspond to daily patterns are estimated using the linear least squares (LSQ) estimation method, and the residual components are obtained by removing the trend and harmonic components from the original time series. The residual component is assumed to be normally distributed and is treated as the noise component of the time series. Conventional k-sigma outlier detection is applied to the residuals to detect and remove outliers. The residuals are then analyzed with goodness of fit tests for normality.

This study uses both graphical and statistical goodness of fit methods to check the normality of the residuals of the surface temperature time series received from the GPS/VLBI co-located sites. The statistical tests employed are the following: Pearson χ2, Kolmogorov-Smirnov, and Anderson-Darling (as frequentist tests); Shapiro-Wilk (SW) or Shapiro-Francia (SF) (as regression-correlation tests); and D'Agostino, Jarque-Bera (JB), and skewness and kurtosis tests (as moment-based tests). Graphical methods provide some information on the shape of a distribution but do not quantify the difference between the normal and sample distributions. Evaluating the residuals with various statistical tests for normality demonstrates the effectiveness of the method used to estimate the trend and harmonic components. Although the main contribution of this study is the comparison of the results of different goodness of fit tests for the normality of the residuals of surface temperature time series from VLBI/GPS co-located sites, the results also contribute to investigations of the optimal model for estimating trend and harmonic components. If a test leads to the conclusion that the residuals are not normal, one or more harmonic components may be incorporated into the seasonal model, or one or more additional break points may be inserted into the trend model.

The output of the present study may play an important role in generating a methodology for filling in missing data in repeated observations of the same variable over long periods, such as surface temperature data at co-located sites. To generate an algorithm for filling in missing data at measurement sites, following the idea of Cho et al. (2013), the harmonic components (seasonal, annual, and interannual variations) and residuals are obtained from the raw data in the first step. The normal distribution assumption must be satisfied before outlier detection is applied to the residuals in the second step; therefore, applying the correct methodology for testing normality is an important decision at this stage. A random number is then generated from the mean and standard deviation of the residual component. In the final step, the sum of the harmonic and residual components is used to fill in the missing data, as sketched below.

Various tropospheric delay models have been developed as functions of meteorological parameters, such as temperature, pressure, and humidity, because errors in these parameters propagate into the tropospheric delay, which contributes mostly to errors in the height component of a station's coordinate estimates (Ahn et al. 2006; Mendes and Langley 1994; Saastamoinen 1972). Investigations of the distributions of meteorological data, such as temperature, pressure, and humidity, which are measured by instruments installed near geodetic antennas, can help improve the main geodetic parameters (e.g., station coordinates) because tropospheric delay is one of the largest limiting factors of high-accuracy positioning. Therefore, the distribution of meteorological data should be of interest to readers not only in applied statistics but also in geodesy.
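As an illustration only, the final filling step can be sketched in MATLAB as follows; the variable names temp, trendComp, harmonicComp, and gapIdx are hypothetical, and the sketch assumes the normality of the residuals has been confirmed by the tests discussed in this paper:

% Sketch: fill gaps with harmonic plus randomly drawn residual components
% temp, trendComp, harmonicComp are illustrative names, not the authors' code
gapIdx = isnan(temp);                         % epochs with missing data
resid  = temp - trendComp - harmonicComp;     % residuals at observed epochs
mu     = mean(resid(~gapIdx));                % residual mean
sigma  = std(resid(~gapIdx));                 % residual standard deviation
randResid = mu + sigma .* randn(nnz(gapIdx), 1);   % valid only if residuals are normal
temp(gapIdx) = trendComp(gapIdx) + harmonicComp(gapIdx) + randResid;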

Co-located space geodetic sites in Europe

Space geodetic techniques (e.g., GPS and VLBI) that were established for surveying and geodesy purposes may also be used for meteorology and climatology studies. The International GNSS Service (IGS) was formally established in 1993 by the International Association of Geodesy and began routine operations on January 1, 1994. The IGS operates a worldwide network of more than 350 permanent GPS tracking stations, each equipped with a GPS receiver. The International VLBI Service for Geodesy and Astrometry (IVS) is an international collaboration of organizations that operate or support VLBI components. All IVS data and products are archived in data centers and are publicly available for research in the related areas of geodesy, geophysics, and astrometry. In recent years, space geodetic techniques, such as GPS and VLBI, have been co-located to address many applications in geoscience. A co-location site is defined as two or more space geodesy instruments that simultaneously or subsequently occupy close locations, which are very precisely surveyed in three dimensions using classical or GPS surveys.

This study uses the surface temperature data on the IGS server (ftp://igs.ensg.ign.fr/pub/igs/data/2008/054/wtzr0540.08m.Z. Accessed 01 March 2013), with about 15-min time resolution, received from temperature sensors at the sites of Matera in Italy and Wettzell in Germany (Fig. 1), which are equipped with the co-located techniques of GPS and VLBI (as part of a European network). A meteorological sensor, together with a GPS and a VLBI antenna, is installed at each site. Both IGS and IVS sites collect surface pressure, temperature, and humidity data. These surface meteorological measurements, together with the GPS and VLBI data, are made available to data and processing centers. The VLBI reference point and the GPS control point at both Wettzell and Matera are located near each other (Fig. 2). The continuously operating GPS receivers at Matera and Wettzell are part of the European IGS network. The 20-m radio telescopes at Wettzell (http://www.wettzell.ifag.de/index_e.html. Accessed 01 March 2013) and Matera (http://www.asi.it/en/agency/bases/geodesy. Accessed 01 March 2013) are European network stations in the IVS (Table 1). The surface temperature data received from the Matera and Wettzell sites from January 1 to 20, 2012, have the form of a time series, i.e., a set of repeated observations of the same variable. The received surface meteorological data are used to test normality by both statistical and graphical goodness of fit methods.

Graphical methods for testing normality

Graphical methods, which include the histogram and the normality plot, facilitate the assessment of normality in a first step. Using such methods, the distribution of the data can be identified as "normal" or "not normal." In most cases, goodness of fit tests for normality are necessary to confirm the results of graphical methods. Using graphical methods requires experience in interpreting normality from graphs or plots (e.g., scientific knowledge about the data). The simplest check of normality is to look at the histogram and evaluate whether or not it approximates the bell curve of a normal distribution. Researchers may draw a histogram, stem-and-leaf plot, scatter plot, or boxplot to determine how a variable is distributed, choosing among these graphs depending on the number of observations. When the number of observations is small, the stem-and-leaf plot is useful for visualizing the data; when it is large, the histogram is the appropriate graph. If a variable is normally distributed, its histogram is bell shaped, so the histogram gives an idea of whether or not the observed data follow the assumption of normality.

Other graphical methods for checking normality are the probability-probability (P-P) and quantile-quantile (Q-Q) plots. In the Q-Q plot, the observed values and the expected values (i.e., the normally distributed data represented by a line) are plotted on a graph. If the observed values deviate from the straight line, the data are not normally distributed; otherwise, they are. The P-P plot assesses how closely the cumulative distribution function of the observed data agrees with a specific theoretical cumulative distribution function (e.g., the normal distribution function). Thus, the P-P and Q-Q plots are used to determine how well a theoretical distribution models the observed data. However, these graphical methods do not provide objective criteria for testing normality and do not guarantee that the distribution is normal; visually presenting the data simply facilitates the assessment of the distribution assumption.

The frequency distribution (histogram), stem-and-leaf plot, boxplot, P-P plot, and Q-Q plot are used to visually check normality (Şişman 2014). The frequency distribution, which plots the observed values against their frequency, provides a visual judgment of whether or not the distribution is bell shaped and also gives insights into gaps in the data and outlying values. The stem-and-leaf plot is similar to the histogram, although it retains information on the actual data values. The P-P plot plots the cumulative probability of a variable against the cumulative probability of a particular distribution (e.g., the normal distribution). Q-Q plots are easier to interpret for large sample sizes. The boxplot shows the median as a line inside the box and the interquartile range (i.e., the range between the 25th and 75th percentiles) as the length of the box. The whiskers (i.e., the lines extending from the lower and upper ends of the box) represent the minimum and maximum values when they are within 1.5 times the interquartile range from either end of the box. Scores greater than 1.5 times the interquartile range lie outside the boxplot and are considered outliers, whereas those greater than three times the interquartile range are extreme outliers. A boxplot that is symmetric, with the median line at approximately the center of the box and with symmetric whiskers slightly longer than the subsections of the center box, suggests that the data may have come from a normal distribution.

Fig. 1 Geographic maps of Germany and Italy, indicating the locations of Wettzell (left) and Matera (right)

Fig. 2 Geodetic observatories of Wettzell (left) and Matera (right)
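As a minimal illustration (not part of the original study), the MATLAB Statistics Toolbox functions below produce the plots discussed in this section for a residual vector resid:

% Visual normality checks on the residual vector resid (illustrative sketch)
figure;
subplot(2,2,1); hist(resid, 30);  title('Histogram');    % bell shaped?
subplot(2,2,2); normplot(resid);  title('Normal plot');  % points on the line?
subplot(2,2,3); qqplot(resid);    title('Q-Q plot');     % quantile agreement
subplot(2,2,4); boxplot(resid);   title('Boxplot');      % symmetry and outliers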

Goodness of fit tests for testing normality

A goodness of fit test describes how well a set of observations fits an expected statistical distribution and is applied by testing the hypothesis that the observed data fit the expected statistical distribution. In time series analysis, goodness of fit tests for normality are used to determine the probabilistic process that describes a particular time series (Iqbal and Quamar 2011). Moreover, in various fields of geoscience, testing normality with goodness of fit tests is required because a normal distribution may lead to more reliable results, which are necessary for further analyses (Abouelnaga et al. 2014; Narany et al. 2014; Safari et al. 2013).

Table 1  Latitude and longitude of the locations of Matera and Wettzell

Technique   Site name   Longitude (deg.)   Latitude (deg.)
GPS         Wettzell    12.87891           49.14420
VLBI        Wettzell    12.87745           49.14501
GPS         Matera      16.70446           40.64913
VLBI        Matera      16.70402           40.64952

As analysts, scientists and engineers often assume that their observed data are normally distributed. Two parameters control the normal distribution: the mean for the location and the standard deviation for the dispersion of the distribution. Skewness measures the symmetry, and kurtosis measures the tails of the distribution. The normal distribution is the most frequently used distribution in statistical theory and applications. Testing the normality of data is of fundamental concern to the analyst when conducting statistical analysis with parametric methods. Most statistical tests, such as t-tests, linear regression analysis, and ANOVA, require the assumption that a variable or variables are normally distributed. When the normality assumption is violated, interpretations and inferences may not be reliable. Some nonparametric trend tests can detect trends in a time series without requiring normality and are recommended for general use by the World Meteorological Organization (Rahimi and Ahmadi 2015). Nevertheless, assessing the normality assumption before using any parametric statistical test is important.

Both graphical and statistical methods are used for evaluating normality. The goodness of fit test for normality is one of the commonly used statistical procedures for determining whether or not the observed data originate from a normal distribution. The approach to testing normality is to determine whether the following hypotheses are to be accepted or rejected:

H0: No difference exists between the distribution of the observed data and a normal distribution.
HA: A difference exists between the distribution of the observed data and a normal distribution.

With the above hypotheses, we compare what we observe with what we expect to see. If the difference between the observed and expected statistical distributions is large, the null hypothesis H0 is rejected, and the alternative hypothesis HA is accepted. All normality tests report a p value. As in other tests, the p value is an estimate of the probability of a random sample generating data that deviate from the normal distribution as much as the observed data do. The goodness of fit test can be performed through the critical value or the p value method. The null hypothesis of a normal distribution is rejected when the p value is smaller than the chosen standard significance level α. That is, the smaller the p value, the greater the evidence that the data do not originate from the selected distribution. Conversely, when the calculated p value is greater than the significance level, there is no evidence to reject the null hypothesis of a normal distribution.
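This decision rule is common to all tests applied below; as a generic MATLAB sketch, with pval standing for the p value returned by any of them:

alpha = 0.05;              % standard significance level (an illustrative choice)
if pval < alpha
    disp('Reject H0: residuals deviate significantly from normality');
else
    disp('Fail to reject H0: no evidence against normality');
end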

Time series analysis

A time series is a set of data, usually collected at regular time intervals. Time series data occur in many areas of application, such as economics, finance, environmental science, and medicine (Danneberg 2012; Hughes et al. 2006; Khajavi et al. 2012; Petrow and Merz 2009). Time series patterns can be described in terms of four basic elements: the trend (T) as long-term movements in the mean, harmonic components (I) as cyclical fluctuations related to the calendar, cycles (C) as other cyclical fluctuations, and residuals (E) as other random or systematic fluctuations. Time series models are formally denoted as follows:

X = T + I + C + E   (1)

After identifying and estimating a time series model, the entire model is checked for the normality of its residuals either by graphical methods (P-P plots, Q-Q plots, and histograms) or by goodness of fit tests. Many methods of time series analysis depend on the basic assumption that the data are sampled from a normal distribution. Investigating the use of normality tests in geodetic data analysis is the main contribution of the present study.

This study defines the trend model as a piecewise linear function with break points at each period. The trends are estimated and removed from the series with the "detrend" MATLAB function. The harmonic effects are modeled with sine and cosine functions because the temperature time series increases during daytime and decreases at nighttime. The frequencies of the time series, as determined through the Fourier transformation (with the MATLAB "fft" function), are used to compose the sine and cosine model as follows:

x_k = \bar{x} + \sum_{i=1}^{p} \left[ \beta_{2i-1} \cos(2\pi \lambda_i t_k) + \beta_{2i} \sin(2\pi \lambda_i t_k) \right]   (2)

where t_k is the time point at which the data are collected (k = 1, 2, 3, ..., n, with n the number of data points), x_k = x(t_k) is the datum collected at time point t_k, \bar{x} is the mean of x_k, and \lambda_i (i = 1, 2, 3, ..., p) are the p highest frequencies. The parameters \beta_{2i} and \beta_{2i-1} are estimated with the LSQ method from the following condition:

\sum_{k=1}^{n} \left[ (x_k - \bar{x}) - \sum_{i=1}^{p} \left( \beta_{2i-1} \cos(2\pi \lambda_i t_k) + \beta_{2i} \sin(2\pi \lambda_i t_k) \right) \right]^2 = \min   (3)

The trend is extracted from the raw temperature data in the first step, and the harmonic component corresponding to a seasonal pattern is extracted in the next step. After removing the trend and harmonic components from the model, the remaining data are theoretically considered normally distributed residuals. Before applying the goodness of fit tests for normality, outlier detection by the k-sigma criterion (with k = 2) is applied to screen the residuals. This approach is very easy to implement: if the residual v_i of the ith observation is larger in absolute value than k·m_0, this observation is flagged as an outlier. Here, k is an integer coefficient defined by the user, typically 2 or 3, and m_0 is the a posteriori standard deviation of the unit weight (Sopacı 2013; Tanır et al. 2004; Teke 2011). A MATLAB sketch of this procedure is given below.
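A minimal MATLAB sketch of this decomposition and screening chain follows; the choice of p = 4 dominant frequencies and the variable names are illustrative assumptions, not the authors' exact implementation:

% Sketch: decompose the temperature series and screen the residuals
p = 4;                                   % number of dominant frequencies (illustrative)
n = numel(temp);  t = (1:n)';            % epochs (about 15-min sampling)
x = detrend(temp(:));                    % remove linear trend (break points can
                                         % be passed to detrend as a third argument)
X = fft(x - mean(x));                    % spectrum of the centered series
[~, idx] = sort(abs(X(2:floor(n/2))), 'descend');
lambda = idx(1:p) / n;                   % p highest frequencies of Eq. (2)
A = zeros(n, 2*p);                       % design matrix of the harmonic model
for i = 1:p
    A(:, 2*i-1) = cos(2*pi*lambda(i)*t);
    A(:, 2*i)   = sin(2*pi*lambda(i)*t);
end
beta  = A \ (x - mean(x));               % LSQ solution of Eq. (3)
resid = x - mean(x) - A*beta;            % residual component
m0    = std(resid);                      % a posteriori std of unit weight (approx.)
resid = resid(abs(resid) < 2*m0);        % k-sigma screening with k = 2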

χ2-based tests

The chi-square test is used to determine the goodness of fit between theoretical and experimental data by comparing observed values with theoretical or expected values. Observed values are those that the researcher obtains empirically through direct observation, whereas theoretical or expected values are developed based on some hypothesis. In the present study, the chi-square test is used to assess the goodness of fit for normality. To determine whether or not the observed values follow a normal distribution, the null and alternative hypotheses are as follows:

H0: The observed data follow a normal distribution.
HA: The observed data do not follow a normal distribution.

Pearson χ2 test

The data can be classified into one of k classes, with probabilities p_1, p_2, p_3, ..., p_k of falling in each class. If all the data are accounted for, then \sum p_i = 1. Suppose the data are classified and the number of observations falling in each of the k classes is counted: n_1 falls in the first class, n_2 in the second, and so on, up to n_k in the kth class. A total of n observations exist; thus, \sum n_i = n, and np_i is the mean or expected value of n_i. The Pearson chi-square test statistic is

\chi^2 = \sum_{i=1}^{k} \frac{(n_i - np_i)^2}{np_i}   (4)

A large value of the test statistic is evidence that the observed data do not match the expected distribution. The p value is the probability of the chi-square test statistic being as large or larger if the null hypothesis is true. The cumulative probability p is found using the MATLAB "chi2cdf" function with k − 3 degrees of freedom, so that 1 − p is the p value of the test. If 1 − p is smaller than the significance level α, hypothesis H0 is rejected; otherwise, H0 is not rejected. In the present study, the Pearson χ2 test is applied using the "chi2gof" function in MATLAB with k − 3 degrees of freedom.

Frequency-based tests

A sample of n observations x_1, x_2, x_3, ..., x_n may be considered as a sample from a given specified distribution; for example, the distribution may be a normal distribution with a mean of 0 and a variance of 1. The specified distribution is more generally defined by its cumulative distribution function F(x). The model specifies u_1 = F(x_1), u_2 = F(x_2), ..., u_n = F(x_n) as a sample from the standard uniform distribution on the unit interval [0,1]. The empirical distribution function is defined as F_n(x) = k/n, where k is the number of observations not exceeding x. A goodness of fit test involves a comparison of F_n(x) with F(x). The hypothesis H0: F_n(x) = F(x) is rejected if F_n(x) differs from F(x). This difference is defined as

W_n^2 = n \int_{-\infty}^{\infty} [F_n(x) - F(x)]^2 \, \psi[F(x)] \, f(x) \, dx   (5)

where ψ is the weight function, and f(x) is the density function of F(x).

Kolmogorov-Smirnov test

The Kolmogorov-Smirnov test is used to determine if a sample comes from a population with a specific distribution and is based on the empirical distribution function. Given n ordered data, the empirical c.d.f. is defined as

F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{\{x_i \le x\}}   (6)

This is a step function that increases by 1/n at the value of each ordered data point. F(x) is the normal distribution function for the data set x_1, x_2, ..., x_n with mean μ = \bar{x} and variance σ² = s². The Kolmogorov-Smirnov test statistic is defined as

D = \sup_{x \in \mathbb{R}} | F(x) - F_n(x) |   (7)

The difference between F(x) and F_n(x) follows a distribution derived by Kolmogorov, whose values are the critical values D_{1−α} corresponding to the test statistic D. If the computed value of D is smaller than the table value D_{1−α}, the null hypothesis is accepted and the alternative hypothesis is rejected; otherwise, the alternative hypothesis is accepted. In this study, the Kolmogorov-Smirnov test is applied using the "kstest" function in MATLAB (Drezner et al. 2008; Lilliefors 1967).
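A minimal sketch of this step on the residual vector resid; note that kstest tests against a fully specified (standard normal) null, so the residuals are standardized first:

% Kolmogorov-Smirnov test on standardized residuals
z = (resid - mean(resid)) ./ std(resid);
[h, pKS, D] = kstest(z);        % D is the statistic of Eq. (7)
% Strictly, estimating the mean and std from the sample calls for the
% Lilliefors correction, available in MATLAB as lillietest(resid)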

Anderson-Darling test

The Anderson-Darling test uses a weight function ψ(u), defined on [0,1] and computed as

\psi(u) = \frac{1}{u(1-u)}   (8)

As the equation shows, the weight function is maximized at the values u = 0 and u = 1; thus, it gives more weight to the tails of the distribution. The test statistic can be expressed as

A_n^2 = n \int_{-\infty}^{\infty} [F_n(x) - F(x)]^2 \, \psi[F(x)] \, dF(x) = n \int_{-\infty}^{\infty} \frac{[F_n(x) - F(x)]^2}{F(x)[1 - F(x)]} \, dF(x)   (9)

The formula can be simplified into

A_n^2 = -n - \frac{1}{n} \sum_{j=1}^{n} (2j-1) \left[ \ln u_j + \ln\left(1 - u_{n-j+1}\right) \right]   (10)

To compute the p values, the following algorithms are applied following Anderson and Darling (1952) and Stephens (1986):

p = 1 - e^{-13.436 + 101.14c - 223.73c^2}   for 0.00 ≤ c < 0.20
p = 1 - e^{-8.318 + 42.796c - 59.938c^2}    for 0.20 ≤ c < 0.34
p = e^{0.9177 - 4.279c - 1.38c^2}           for 0.34 ≤ c < 0.60
p = e^{1.2937 - 5.709c + 0.0186c^2}         for c ≥ 0.60   (11)

where the coefficient c is computed as

c = A_n^2 \left( 1 + \frac{0.75}{n} + \frac{2.25}{n^2} \right)   (12)

In the present study, the Anderson-Darling test is implemented in the MATLAB programming language using the code at http://mathworks.com/matlabcentral/fileexchange/14807andartest/content/AnDartest. Accessed 01 March 2013.
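For readers without that package, a self-contained sketch following Eqs. (10)-(12), with the normal parameters estimated from the sample, is:

% Anderson-Darling normality test on resid, following Eqs. (10)-(12)
y  = sort(resid(:));
n  = numel(y);
u  = normcdf(y, mean(y), std(y));     % u_j = F(x_(j)) with estimated parameters
j  = (1:n)';
A2 = -n - sum((2*j - 1) .* (log(u) + log(1 - u(n-j+1)))) / n;   % Eq. (10)
c  = A2 * (1 + 0.75/n + 2.25/n^2);                              % Eq. (12)
if     c < 0.20, pAD = 1 - exp(-13.436 + 101.14*c - 223.73*c^2);
elseif c < 0.34, pAD = 1 - exp(-8.318  + 42.796*c - 59.938*c^2);
elseif c < 0.60, pAD = exp(0.9177 - 4.279*c - 1.38*c^2);
else             pAD = exp(1.2937 - 5.709*c + 0.0186*c^2);
end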

Moment-based tests

Tests based on skewness or kurtosis naturally fall into this group. Skewness is a measure of symmetry or, more precisely, the lack of symmetry; a distribution or data set is symmetric if it looks the same to the left and right of the center point. Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. Data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak; a uniform distribution is the extreme case. The skewness and kurtosis test of normality is designed to detect all departures from normality. The normal distribution has a skewness of 0 and a kurtosis of 3, and the test is based on the differences between the data's skewness and 0 and between the data's kurtosis and 3. The moment coefficients of skewness and kurtosis of a data set are calculated as follows:

\sqrt{b_1} = \frac{m_3}{m_2^{3/2}}, \qquad b_2 = \frac{m_4}{m_2^2}

m_2 = \frac{\sum (x - \bar{x})^2}{n}, \qquad m_3 = \frac{\sum (x - \bar{x})^3}{n}, \qquad m_4 = \frac{\sum (x - \bar{x})^4}{n}   (13)

where \bar{x} is the mean and n is the sample size. m_2 is the variance (the square of the standard deviation), m_3 is the third central moment, and m_4 is the fourth central moment of the data set.
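These moments translate directly into MATLAB; a sketch for the residual vector resid, whose outputs are reused by the moment-based tests below:

% Moment coefficients of Eq. (13)
n  = numel(resid);
d  = resid - mean(resid);
m2 = sum(d.^2) / n;            % variance
m3 = sum(d.^3) / n;            % third central moment
m4 = sum(d.^4) / n;            % fourth central moment
b1sqrt = m3 / m2^(3/2);        % skewness (0 for a normal distribution)
b2     = m4 / m2^2;            % kurtosis (3 for a normal distribution)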

Jarque-Bera test

Skewness and kurtosis are used to construct this test statistic. The Jarque-Bera (JB) test checks whether the coefficient of skewness and the excess kurtosis (b_2 − 3) are jointly zero. The test statistic JB is defined as

JB = n \left[ \frac{(\sqrt{b_1})^2}{6} + \frac{(b_2 - 3)^2}{24} \right] \sim \chi^2_2   (14)

where n is the number of observations (or degrees of freedom in general). If the data come from a normal distribution, the JB statistic has a chi-squared distribution with two degrees of freedom; thus, the statistic can be used to test the hypothesis that the data are from a normal distribution. The null hypothesis is a joint hypothesis of both the skewness and the excess kurtosis being 0. Samples from a normal distribution have an expected skewness of 0 and an expected kurtosis of 3; any deviation from these values increases the JB statistic. The JB test is applied using the MATLAB "jbtest" function in this study.
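A sketch of this test, using both the built-in function and Eq. (14) computed by hand from the moments above (the two p values agree asymptotically):

% Jarque-Bera test: built-in call and the statistic of Eq. (14) by hand
[h, pJB] = jbtest(resid);                  % MATLAB implementation
JB = n * (b1sqrt^2/6 + (b2 - 3)^2/24);     % Eq. (14)
pAsymptotic = 1 - chi2cdf(JB, 2);          % asymptotic chi-square(2) p value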

D'Agostino test

The D'Agostino test transforms the skewness and kurtosis values separately so that they become approximately independent and normally distributed, which speeds up the convergence of the test statistic to its asymptotic distribution. The test statistic is

K^2 = \left[ Z\left(\sqrt{b_1}\right) \right]^2 + \left[ Z(b_2) \right]^2   (15)

Here, Z(\sqrt{b_1}) and Z(b_2) are the transformed skewness and kurtosis, respectively. The transformation workflow of the skewness \sqrt{b_1} is as follows, according to D'Agostino et al. (1990) and D'Agostino and Pearson (1973):

Y = \sqrt{b_1} \left[ \frac{(n+1)(n+3)}{6(n-2)} \right]^{1/2}   (16)

\beta_2\left(\sqrt{b_1}\right) = \frac{3(n^2 + 27n - 70)(n+1)(n+3)}{(n-2)(n+5)(n+7)(n+9)}   (17)

W^2 = -1 + \left\{ 2\left[ \beta_2\left(\sqrt{b_1}\right) - 1 \right] \right\}^{1/2}   (18)

\delta = \frac{1}{\sqrt{\ln W}}   (19)

\alpha = \left[ \frac{2}{W^2 - 1} \right]^{1/2}   (20)

Z\left(\sqrt{b_1}\right) = \delta \ln\left[ \frac{Y}{\alpha} + \left( (Y/\alpha)^2 + 1 \right)^{1/2} \right]   (21)

and the transformation workflow of the kurtosis is

E(b_2) = \frac{3(n-1)}{n+1}   (22)

\sigma^2_{b_2} = \frac{24n(n-2)(n-3)}{(n+1)^2(n+3)(n+5)}   (23)

x = \frac{b_2 - E(b_2)}{\sqrt{\sigma^2_{b_2}}}   (24)

\sqrt{\beta_1(b_2)} = \frac{6(n^2 - 5n + 2)}{(n+7)(n+9)} \sqrt{\frac{6(n+3)(n+5)}{n(n-2)(n-3)}}   (25)

A = 6 + \frac{8}{\sqrt{\beta_1(b_2)}} \left[ \frac{2}{\sqrt{\beta_1(b_2)}} + \sqrt{1 + \frac{4}{\beta_1(b_2)}} \right]   (26)

Z(b_2) = \left[ \left( 1 - \frac{2}{9A} \right) - \left( \frac{1 - 2/A}{1 + x\sqrt{2/(A-4)}} \right)^{1/3} \right] \Big/ \sqrt{\frac{2}{9A}}   (27)

The computed test statistic follows a χ2 distribution. In the present study, the p value of the test statistic is computed with two degrees of freedom using the "chi2inv" MATLAB function.
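A compact sketch of the K² computation, reusing √b1, b2, and n from Eq. (13); it follows the standard D'Agostino-Pearson formulas given above and is not the authors' exact code:

% D'Agostino K^2 omnibus test, Eqs. (15)-(27)
% transformed skewness, Eqs. (16)-(21)
Y  = b1sqrt * sqrt((n+1)*(n+3) / (6*(n-2)));
B2 = 3*(n^2 + 27*n - 70)*(n+1)*(n+3) / ((n-2)*(n+5)*(n+7)*(n+9));
W2 = -1 + sqrt(2*(B2 - 1));
delta = 1 / sqrt(log(sqrt(W2)));
alph  = sqrt(2 / (W2 - 1));
Zs = delta * log(Y/alph + sqrt((Y/alph)^2 + 1));
% transformed kurtosis, Eqs. (22)-(27)
Eb2 = 3*(n-1) / (n+1);
Vb2 = 24*n*(n-2)*(n-3) / ((n+1)^2*(n+3)*(n+5));
xk  = (b2 - Eb2) / sqrt(Vb2);
sB1 = 6*(n^2 - 5*n + 2)/((n+7)*(n+9)) * sqrt(6*(n+3)*(n+5)/(n*(n-2)*(n-3)));
A   = 6 + 8/sB1 * (2/sB1 + sqrt(1 + 4/sB1^2));
tq  = (1 - 2/A) / (1 + xk*sqrt(2/(A-4)));
Zk  = ((1 - 2/(9*A)) - sign(tq)*abs(tq)^(1/3)) / sqrt(2/(9*A));  % real cube root
% omnibus statistic and p value, Eq. (15)
K2  = Zs^2 + Zk^2;
pK2 = 1 - chi2cdf(K2, 2);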

Regression-correlation-based tests

These kinds of tests are based on associating a linear combination of the sorted sample with the probability plot of the normal distribution. In other words, they associate the linear combination y_i = μ + σx_i, where x_i is a member of the sorted sample x_1, x_2, ..., x_n of size n, assumed to come from the standard normal distribution x ~ N(0,1) of the probability plot. These tests were first put forward by Shapiro and Wilk (1965) and Shapiro and Francia (1972). The proposed test statistic was

W = \frac{\left( \sum_{i=1}^{n} a_i y_i \right)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}   (28)

where the a_i coefficients are the linear regression values of the sorted standard normal distribution, computed as

a^T = (a_1, \ldots, a_n) = \frac{m^T C^{-1}}{\left( m^T C^{-1} C^{-1} m \right)^{1/2}}   (29)

where m^T = (m_1, ..., m_n) are the expected values of the order statistics of independent and identically distributed random variables sampled from the standard normal distribution, and C is the covariance matrix of the ordered sample. The test statistic is the ratio of two least squares estimates of variance (Royston 1995).

The most popular regression-correlation-based tests are the SW (Shapiro-Wilk), SF (Shapiro-Francia), and Ryan-Joiner tests. In the present study, the SW and SF tests are implemented. The SF test produces better solutions when the kurtosis of the sample is larger than 3; conversely, the SW test produces better solutions when the kurtosis is lower than 3. The criteria of the SW and SF tests have the same form and differ only in the definition of the coefficients. These normality tests are also known as the only normality tests that can detect skewness and kurtosis in addition to non-normality. In the present study, the two tests are applied in the same MATLAB package based on the algorithms of Royston (1995).

Shapiro-Wilk test

The fundamentals of the SW test were introduced by Sarhan and Greenberg (1956) for samples of only up to 20. Later, Shapiro and Wilk (1965) estimated the coefficients for samples of up to 50. Royston (1983) then extended the test to samples of up to 2000. The test statistic is

W = \frac{\left( \sum_{i=1}^{n} a_i y_i \right)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}   (30)

Here, an approximation is applied to compute the a_i coefficients:

a_i = \frac{\tilde{m}_i}{\sqrt{\epsilon}}, \quad i = 1, 2, \ldots, n-2   (31)

where the \tilde{m}_i values are approximations of the expected values m_i, computed as

\tilde{m}_i = \phi^{-1}\left( \frac{i - 3/8}{n + 1/4} \right)   (32)

\epsilon = \frac{\tilde{m}^T \tilde{m} - 2\tilde{m}_n^2 - 2\tilde{m}_{n-1}^2}{1 - 2a_n^2 - 2a_{n-1}^2}   (33)

where a_n and a_{n-1} are the special cases (related to the first coefficients by symmetry, a_1 = −a_n and a_2 = −a_{n-1}), computed as

a_n = -2.706056u^5 + 4.434685u^4 - 2.071190u^3 - 0.147981u^2 + 0.221157u + c_n   (34)

a_{n-1} = -3.582633u^5 + 5.682633u^4 - 1.752461u^3 - 0.293762u^2 + 0.042981u + c_{n-1}   (35)

Here, the coefficients u and c are

u = \frac{1}{\sqrt{n}}, \qquad c = \frac{\tilde{m}}{\sqrt{\tilde{m}^T \tilde{m}}}   (36)

This approximation is valid when the sample size is n ≥ 6. Given that the sample size used in the present study is large enough, the case of n
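The SF statistic reduces to the squared correlation between the ordered sample and the approximate expected order statistics of Eq. (32); a self-contained MATLAB sketch (not the Royston 1995 package itself):

% Shapiro-Francia W' statistic via the scores of Eq. (32)
y  = sort(resid(:));
n  = numel(y);
mt = norminv(((1:n)' - 3/8) / (n + 1/4));   % approximate expected order statistics
C  = corrcoef(y, mt);
Wsf = C(1,2)^2;                              % squared correlation = SF statistic
% The p value requires Royston's normalizing transformation (e.g., the
% swtest package on MATLAB File Exchange); only the statistic is shown here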