A Comparison of Normality Tests using SPSS, SAS and MINITAB: An Application to Health Related Quality of Life Data Nornadiah Mohd Razali, Norin Rahayu Shamsudin, Nur Niswah Naslina Azid @ Maarof, Az’lina Abdul Hadi, Dr. Adriana Ismail Faculty of Computer and Mathematical Sciences Universiti Teknologi MARA Shah Alam, Malaysia
Abstract— Numerous studies have been conducted on the power of various tests of normality, such as the Shapiro-Wilk (SW), Kolmogorov-Smirnov (KS), Lilliefors (LF) and Anderson-Darling (AD) tests. This paper focuses on the application of these normality tests. The tests were performed on Health Related Quality of Life (HRQoL) data sets using three statistical packages: SPSS, SAS and MINITAB. The results for small, moderate and large samples were compared, and comparisons were also made with the results obtained from a simulation study. In addition, other methods of testing normality, such as the skewness and kurtosis coefficients provided by the different statistical packages, are presented.

Keywords-Test of normality; Shapiro-Wilk; Kolmogorov-Smirnov; Lilliefors; Anderson-Darling; HRQoL; SF-36
I. INTRODUCTION
Normality has always been an important assumption of parametric methods of data analysis, and it is crucial for their correct application. Recognizing its importance, many statisticians have put effort into modifying or improving the original tests of normality, as well as into developing new ones. This is evident from the large number of normality tests available in the statistical literature. Some of the common tests were discussed by Farrel and Stewart [1] and by Oztuna, Elhan and Tuccar [2]. The tests they discuss include those available in most statistical packages: the Shapiro-Wilk (SW), Kolmogorov-Smirnov (KS), Lilliefors (LF) and Anderson-Darling (AD) tests. Other tests of normality have been developed but are rarely used and not available in statistical packages, such as the Kuiper, Vasicek and Ajne tests; the formulas for their test statistics were given by Yazici and Yolacan [3]. The test statistic of the Vasicek test is based on the sample entropy [4], and Vasicek [4] showed that his test produced the highest power for several alternative distributions. Declercq and Duvaut [5] have also proposed a new test of normality, the Hermite normality test, whose test statistic makes use of the orthogonality property of Hermite polynomials. A more recent test of normality can be found in a study by Wang [6], who developed a test based on the Lévy characterization of the normal distribution.

With so many tests of normality available, comparing their power has also been a concern among statisticians. In a previous study by the author, several tests of normality were compared across various sample sizes and alternative distributions. The SW test was found to be the most powerful test for all types of distributions and sample sizes, whereas the KS test was the least powerful. Furthermore, the performance of the AD test was comparable to that of the SW test, and the LF test always outperformed the KS test. This paper, on the other hand, focuses on applying these tests (SW, KS with Lilliefors correction, and AD) to a real data set. The tests were performed on health-related quality of life (HRQoL) data using three statistical packages: SPSS, SAS and MINITAB. The results for small, moderate and large samples were then compared. In addition, comparisons were made with the results obtained from the previous simulation study.

This study is sponsored by the Fundamental Research Grant Scheme (600RMI/ST/FRGS5/3/Fst(239/2010)) provided by the Ministry of Higher Education, Malaysia through the Research Management Institute, Universiti Teknologi MARA.

II. HEALTH RELATED QUALITY OF LIFE (HRQOL) DATA
Health-related quality of life (HRQoL) can be defined as an individual's satisfaction or happiness with the domains of life insofar as they affect or are affected by health [7]. The HRQoL data used in this study were gathered through a questionnaire adopted from a short-form health survey (SF-36). The SF-36 measures eight domains of health: limitations in physical activities due to health problems (PF), limitations in role activities due to physical health problems (RP), bodily pain (BP), general health (GH), vitality (VT), limitations in social activities due to physical or emotional problems (SF), limitations in role activities due to emotional problems (RE) and mental health (MH) [8]. All these domains served as the variables on which the tests of normality were performed. To vary the sample size, the data were divided by state. Data sets from three states, Johor (n = 13), Perak (n = 35) and Selangor (n = 227), were selected to represent small, moderate and large samples, respectively.
III. NORMALITY TESTING IN STATISTICAL PACKAGES
The methods for testing normality can be classified into two categories: descriptive statistics and theory-driven methods [9]. Descriptive statistics include the skewness and kurtosis coefficients and graphical methods such as the stem-and-leaf plot, box plot and histogram. The quantile-quantile plot (Q-Q plot), the probability-probability plot (P-P plot) and the formal tests of normality fall under the theory-driven methods.

A. Descriptive Statistics: Skewness and Kurtosis Coefficients

The skewness and kurtosis coefficients are available in all statistical packages. However, one should be cautious in interpreting them, because different packages use somewhat different definitions. The differences may not matter when the sample size is large, but in small and moderate samples the values produced can differ substantially. Joanes and Gill [10] demonstrated this by comparing the mean squared error of three different measures of skewness and kurtosis, including those used in SAS and MINITAB. They pointed out that for large samples from a normal distribution all three measures produced very close values, but that this was not the case for small samples. They also concluded that the skewness and kurtosis coefficients produced by MINITAB have a smaller mean squared error in normal samples, while those produced by SAS have a smaller mean squared error in highly skewed samples.

Stating a single formula for the skewness and kurtosis coefficients is cumbersome, since several different formulas can be found in the literature. According to D'Agostino, Belanger and D'Agostino, Jr. [11], SPSS and SAS use Fisher's definition of skewness and kurtosis. Let $X_i$ be the $i$th observation, $\bar{X}$ the sample mean, $n$ the number of nonmissing observations and $m_r$ the $r$th moment about the mean, calculated as $m_r = \sum (X_i - \bar{X})^r / n$. The sample skewness and kurtosis used in SPSS and SAS can then be defined as
$$
g_1 = \frac{n \sum (X_i - \bar{X})^3}{(n-1)(n-2)\,S^3} \tag{1}
$$

and

$$
g_2 = \frac{n(n+1) \sum (X_i - \bar{X})^4}{(n-1)(n-2)(n-3)\,S^4} - \frac{3(n-1)^2}{(n-2)(n-3)}, \tag{2}
$$

respectively, where

$$
S^2 = \frac{n}{n-1}\, m_2 = \frac{\sum (X_i - \bar{X})^2}{n-1} \tag{3}
$$
is the sample variance. Joanes and Gill [10] reported that MINITAB uses slightly different definitions of the sample skewness and kurtosis, as follows:
$$
b_1 = \frac{m_3}{S^3} = \left(\frac{n-1}{n}\right)^{3/2} \frac{m_3}{m_2^{3/2}} \tag{4}
$$

$$
b_2 = \frac{m_4}{S^4} - 3 = \left(\frac{n-1}{n}\right)^{2} \frac{m_4}{m_2^{2}} - 3. \tag{5}
$$
Under the definitions used by SPSS, SAS and MINITAB, the tested data can be regarded as approximately normal if both the skewness and kurtosis values are close to zero. Other statistical packages, such as MATLAB, use the coefficients defined by Pearson, which give a normal distribution a skewness of 0 and a kurtosis of 3. The differences between Fisher's and Pearson's definitions are explained further by Rahman [12] in his book. The sample skewness and kurtosis defined by Pearson can be written as

$$
\sqrt{b_1} = \frac{m_3}{m_2^{3/2}} \tag{6}
$$

and

$$
b_2 = \frac{m_4}{m_2^{2}}. \tag{7}
$$
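To make the three conventions concrete, here is a minimal Python sketch that evaluates equations (1) to (7) directly from raw data. The function and key names are hypothetical labels of our own, not package APIs, and the sketch assumes at least four observations with nonzero variance. Because the paper uses $b_2$ for both the MINITAB kurtosis (5) and the Pearson kurtosis (7), the two are given separate keys here.

```python
import math


def moment(xs, r):
    """r-th sample moment about the mean: m_r = sum((x - mean)**r) / n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** r for x in xs) / n


def skewness_kurtosis(xs):
    """Evaluate the SPSS/SAS, MINITAB and Pearson coefficients of eqs. (1)-(7)."""
    n = len(xs)
    if n < 4:
        raise ValueError("equation (2) needs at least 4 observations")
    m2, m3, m4 = moment(xs, 2), moment(xs, 3), moment(xs, 4)
    s = math.sqrt(n * m2 / (n - 1))  # sample standard deviation, from equation (3)

    # SPSS and SAS, equations (1) and (2); note sum((x - mean)**r) = n * m_r
    g1 = n * (n * m3) / ((n - 1) * (n - 2) * s ** 3)
    g2 = (n * (n + 1) * (n * m4) / ((n - 1) * (n - 2) * (n - 3) * s ** 4)
          - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

    return {
        "g1": g1,
        "g2": g2,
        # MINITAB, equations (4) and (5)
        "b1_minitab": m3 / s ** 3,
        "b2_minitab": m4 / s ** 4 - 3,
        # Pearson (e.g. MATLAB), equations (6) and (7); a normal gives kurtosis 3
        "sqrt_b1_pearson": m3 / m2 ** 1.5,
        "b2_pearson": m4 / m2 ** 2,
    }


if __name__ == "__main__":
    for name, value in skewness_kurtosis([1, 2, 3, 4, 5]).items():
        print(f"{name:>16}: {value: .4f}")
```

For the symmetric sample 1, 2, 3, 4, 5, all three skewness values are 0, while the kurtosis is -1.2 under the SPSS/SAS definition, about -1.91 under the MINITAB definition and 1.7 under Pearson's definition, which shows how far apart the conventions can be for a very small n.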
A positive value of the skewness ($\sqrt{b_1} > 0$) indicates that the distribution of the tested data is skewed to the right, whereas a negative value ($\sqrt{b_1} < 0$) indicates a left-skewed distribution. The same convention applies to the skewness definitions used by SPSS, SAS and MINITAB. Interpreting the kurtosis, however, needs extra attention, as it depends on the definition used by the statistical package. If SPSS, SAS or MINITAB is used, a positive kurtosis value indicates that the distribution of the tested data has heavier tails and a higher peak (leptokurtic) than the normal distribution (mesokurtic), while a negative value indicates lighter tails and a lower peak (platykurtic) than normal [11]. In contrast, if MATLAB is used, a leptokurtic distribution has a kurtosis greater than 3 and a platykurtic distribution has a kurtosis less than 3.

The rules for judging the shape of a distribution from its skewness and kurtosis vary among statisticians. Evans [13], for instance, suggested that a distribution with a skewness greater than 1 or less than -1 can be considered highly skewed, one with a skewness between 0.5 and 1 or between -1 and -0.5 moderately skewed, and one with a skewness between -0.5 and 0.5 approximately symmetric. Brown [14], on the other hand, proposed that practitioners use the standard error of skewness (ses) and the standard error of kurtosis (sek) to decide whether the tested data could come from a normal distribution: the data can be assumed normal if the skewness and kurtosis values lie within ±2 ses and ±2 sek, respectively. Practitioners differ in which rule they prefer. In any case, skewness and kurtosis do not provide conclusive information about normality. Hence, it is good practice to supplement the skewness and kurtosis coefficients with other methods of testing normality, such as graphical methods and formal tests of normality.
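The two rules of thumb above can be sketched in Python as follows. The standard-error formulas are an assumption on our part: the paper does not state them, so the sketch uses the exact small-sample expressions commonly reported by SPSS, and Brown [14] or other sources may use simpler approximations. Treating a skewness of exactly ±0.5 as moderate is also our own choice. All function names are hypothetical.

```python
import math


def evans_label(skew):
    """Classify a skewness value with Evans' [13] rule of thumb."""
    a = abs(skew)
    if a > 1:
        return "highly skewed"
    if a >= 0.5:  # a value of exactly 0.5 is treated as moderate (our assumption)
        return "moderately skewed"
    return "approximately symmetric"


def standard_errors(n):
    """Assumed standard errors of skewness (ses) and kurtosis (sek); needs n > 3."""
    ses = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    sek = 2 * ses * math.sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))
    return ses, sek


def brown_consistent_with_normal(skew, kurt, n):
    """Brown's [14] rule: |skewness| <= 2*ses and |kurtosis| <= 2*sek."""
    ses, sek = standard_errors(n)
    return abs(skew) <= 2 * ses and abs(kurt) <= 2 * sek


if __name__ == "__main__":
    # SPSS/SAS values for PF and GH in the small sample (n = 13), from Table II
    for name, sk, ku in [("PF", -1.61, 2.43), ("GH", 0.38, -0.37)]:
        print(name, evans_label(sk), brown_consistent_with_normal(sk, ku, 13))
```

With n = 13 these formulas give ses of about 0.62 and sek of about 1.19, so PF (Sk = -1.61, Ku = 2.43) is labelled highly skewed and falls outside Brown's bounds, while GH (Sk = 0.38, Ku = -0.37) is labelled approximately symmetric and falls inside them.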
B. Theory Driven Method: Tests of Normality

Because so many normality tests have been proposed in the literature, different statistical packages offer different sets of tests. Table I summarizes the normality tests available in three statistical packages. As the table shows, SAS and MINITAB each provide most of the listed tests, whereas SPSS offers only the two most commonly used: the KS and SW tests. Notice that all three packages still provide the KS test, even though most power comparison studies have found it to be the least powerful. According to D'Agostino, Belanger and D'Agostino, Jr. [11], the KS test should not be used for testing normality because of its poor power properties.

TABLE I.
SUMMARY OF NORMALITY TESTS IN STATISTICAL PACKAGES

| Test of Normality | SPSS 20 | SAS 9.2 | MINITAB 16 |
|---|---|---|---|
| Kolmogorov-Smirnov (KS)^a | √ | √ | √ |
| Shapiro-Wilk (SW) | √ | √ | NA |
| Anderson-Darling (AD) | NA | √ | √ |
| Cramer-von Mises (CVM) | NA | √ | NA |
| Jarque-Bera (JB) | NA | NA | √ |
| Ryan-Joiner (RJ) | NA | NA | √ |

a. With Lilliefors correction.
b. NA = Not available.
The KS, CVM and AD tests belong to the empirical distribution function (EDF) class of normality tests [15]. The KS statistic belongs to the supremum class of EDF tests, in which the statistic is the largest vertical distance between the hypothesized and the empirical distribution functions. By contrast, the CVM and AD statistics belong to the quadratic class of EDF tests, in which the statistic is based on the squared difference between the hypothesized and the empirical distribution functions. One disadvantage of the KS test is that it is severely affected by ties; as a remedy, MINITAB uses the average rank of each tied value to calculate the KS statistic. The KS test also requires the parameters of the hypothesized distribution to be fully specified. If the sample is to be tested against a normal population with known mean and variance, the original KS test may be used. However, when testing against a normal distribution with unspecified or unknown parameters, the LF test is more appropriate, since it estimates the mean and variance of the normal distribution from the sample data [16]. Even though the LF test uses the same form of test statistic as the KS test, the results may differ because a different table of critical values is used [17], and this can lead to a different conclusion about normality. It is therefore crucial to choose the test that is appropriate for the data. Most statistical packages, however, provide the KS test with the Lilliefors correction. The AD test is a modification of the Cramer-von Mises test that gives more weight to the tails of the distribution [1]. Based on simulation studies, some authors consider the AD test to be the most powerful EDF test.
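The difference between the supremum and quadratic EDF classes is easiest to see by computing the three statistics side by side. The Python sketch below (standard library only, Python 3.8+ for `statistics.NormalDist`) fits a normal distribution using the sample mean and standard deviation, as in the Lilliefors setting, and evaluates the KS/LF statistic $D$, the CVM statistic $W^2$ and the AD statistic $A^2$ with their usual computing formulas. It is a sketch under stated assumptions, not any package's implementation: it returns test statistics only (p-values need the Lilliefors tables or package-specific approximations), and it does not apply MINITAB's average-rank treatment of ties. The function name is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def edf_statistics(xs):
    """Return (D, W2, A2) for xs against a normal distribution whose mean and
    standard deviation are estimated from the sample (Lilliefors setting)."""
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    # F(x_(i)) evaluated at the ordered sample x_(1) <= ... <= x_(n)
    f = [fitted.cdf(x) for x in sorted(xs)]

    # Supremum class: largest vertical gap between the EDF and F
    d_plus = max((i + 1) / n - fi for i, fi in enumerate(f))
    d_minus = max(fi - i / n for i, fi in enumerate(f))
    d = max(d_plus, d_minus)

    # Quadratic class: Cramer-von Mises W^2
    w2 = 1 / (12 * n) + sum((fi - (2 * i + 1) / (2 * n)) ** 2 for i, fi in enumerate(f))

    # Quadratic class: Anderson-Darling A^2; the log terms grow in the tails
    s = sum((2 * i + 1) * (math.log(f[i]) + math.log(1 - f[n - 1 - i])) for i in range(n))
    a2 = -n - s / n
    return d, w2, a2


if __name__ == "__main__":
    print(edf_statistics([-1, 0, 1]))
```

Because $\ln F$ and $\ln(1 - F)$ become large in magnitude for the extreme order statistics, the AD sum reacts more strongly to misfit in the tails than the CVM sum does, which is the extra tail weight described above.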
The SW test, on the other hand, belongs to the regression and correlation class of normality tests. The SW statistic can be viewed as the ratio of two estimates of the variance: the numerator is the square of an appropriate linear combination of the sample order statistics, and the denominator is the usual symmetric estimate of variance [18]. A value of the SW statistic close to 1 indicates normality. The test was originally devised for sample sizes between 3 and 50; Royston later extended it to sample sizes from 7 up to, but not including, 2000 [9]. According to Peng [19], different versions of SAS may report different values of the SW statistic because the algorithm was modified between versions. In MINITAB, the Ryan-Joiner (RJ) test is said to be similar to the SW test. Its test statistic is
$$
R_p = \frac{\sum Y_i b_i}{\sqrt{s^2 (n-1) \sum b_i^2}} \tag{8}
$$
where $Y_i$ are the ordered observations, $b_i$ are the normal scores of the ordered data and $s^2$ is the sample variance. Like the SW test, this test is based on a correlation, here between the ordered observations and their normal scores. The tested data are considered close to normal if $R_p$ is near 1. The JB test is classified as a moment test, since it uses the skewness and kurtosis coefficients to calculate the test statistic; some authors refer to it as an omnibus test [20]. Its test statistic can be written as
$$
JB = T\left[\frac{(\sqrt{b_1})^2}{6} + \frac{(b_2 - 3)^2}{24}\right] \tag{9}
$$
where $T$ is the sample size and $\sqrt{b_1}$ and $b_2$ are the skewness and kurtosis defined in (6) and (7), respectively. Different authors have proposed several modifications of these tests; the original JB test, for example, does not perform well in small and moderate samples [21]. The formulas for the KS, AD and SW test statistics can be found in the paper by Yazici and Yolacan [3].

IV. RESULTS AND DISCUSSION
A. Skewness and Kurtosis of the HRQoL Data

Table II displays the skewness (Sk) and kurtosis (Ku) values of all eight variables produced by the different statistical packages. SAS and SPSS produced the same results because they use the same definitions of sample skewness and kurtosis, so their results are combined in the table. For the large sample, the values of both coefficients differ very little across the packages. For the small sample, however, MINITAB produced different values of skewness and kurtosis from those of SAS and SPSS, with the larger differences in kurtosis. This is due to the different definitions employed by the packages.
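TABLE II. SKEWNESS AND KURTOSIS VALUES FOR ALL VARIABLES

| Sample size | Variable | SPSS and SAS Sk | SPSS and SAS Ku | MINITAB Sk | MINITAB Ku |
|---|---|---|---|---|---|
| 13 | PF | -1.61 | 2.43 | -1.25 | 0.54 |
| 13 | RP | -0.43 | -0.85 | -0.33 | -1.28 |
| 13 | BP | -1.07 | 0.75 | -0.84 | -0.39 |
| 13 | GH | 0.38 | -0.37 | 0.30 | -1.02 |
| 13 | VT | 0.66 | -0.38 | 0.51 | -1.02 |
| 13 | SF | -0.61 | -0.93 | -0.47 | -1.33 |
| 13 | RE | -0.63 | -0.81 | -0.49 | -1.26 |
| 13 | MH | -0.31 | -1.09 | -0.24 | -1.42 |
| 35 | PF | -0.63 | -0.26 | -0.58 | -0.54 |
| 35 | RP | -0.70 | 0.11 | -0.64 | -0.24 |
| 35 | BP | -0.09 | -0.79 | -0.09 | -0.97 |
| 35 | GH | -0.57 | 1.56 | -0.52 | 0.94 |
| 35 | VT | 0.08 | -0.05 | 0.08 | -0.36 |
| 35 | SF | -0.66 | -0.23 | -0.60 | -0.51 |
| 35 | RE | -0.99 | 0.62 | -0.91 | 0.17 |
| 35 | MH | -0.45 | -0.72 | -0.42 | -0.91 |
| 227 | PF | -0.99 | 0.62 | -0.98 | 0.54 |
| 227 | RP | -1.26 | 0.84 | -1.24 | 0.76 |
| 227 | BP | -0.48 | -0.62 | -0.54 | -0.61 |
| 227 | GH | -0.64 | 0.04 | -0.64 | -0.02 |
| 227 | VT | -0.37 | 0.34 | -0.37 | 0.28 |
| 227 | SF | -0.63 | -0.77 | -0.62 | -0.80 |
| 227 | RE | -0.85 | -0.74 | -0.83 | -0.77 |
| 227 | MH | -0.34 | -0.87 | -0.34 | -0.90 |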
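Because equations (1)-(2) and (4)-(5) are functions of the same sample moments, the MINITAB values in Table II can be recovered from the SPSS/SAS values and n alone. The Python sketch below does this algebraically (the function name is hypothetical); it is meant to illustrate why the two column pairs diverge for n = 13 but nearly coincide for n = 227, not to reproduce either package's code.

```python
def spss_to_minitab(g1, g2, n):
    """Convert SPSS/SAS skewness g1 and kurtosis g2 (eqs. (1)-(2)) into
    MINITAB's b1 and b2 (eqs. (4)-(5)) for a sample of size n (n > 3)."""
    # From (1) and (4): b1 = m3 / S^3 = g1 * (n - 1) * (n - 2) / n^2
    b1 = g1 * (n - 1) * (n - 2) / n ** 2
    # Invert (2) to recover the moment ratio m4 / m2^2 ...
    ratio = (g2 * (n - 2) * (n - 3) + 3 * (n - 1) ** 2) / ((n + 1) * (n - 1))
    # ... then apply (5): b2 = ((n - 1) / n)^2 * m4 / m2^2 - 3
    b2 = ((n - 1) / n) ** 2 * ratio - 3
    return b1, b2


if __name__ == "__main__":
    # PF in the small (n = 13) and large (n = 227) samples, SPSS/SAS values from Table II
    print(spss_to_minitab(-1.61, 2.43, 13))
    print(spss_to_minitab(-0.99, 0.62, 227))
```

For PF this gives roughly (-1.26, 0.55) for n = 13 and (-0.98, 0.55) for n = 227, in line with the MINITAB entries (-1.25, 0.54) and (-0.98, 0.54) up to the rounding of the two-decimal inputs. The small-sample kurtosis drops from 2.43 to about 0.55, while the large-sample values barely change.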
Based on the rules proposed by Evans [13], the SPSS and SAS skewness values for the small sample indicate that PF and BP are highly skewed, that VT, SF and RE are moderately skewed, and that RP, GH and MH are approximately symmetric. The kurtosis values of these three approximately symmetric variables, however, show that their distributions have lighter tails than the normal distribution. Fig. 1 and Fig. 2 give a clearer picture of the distributions of PF and MH.
Figure 1. Histogram of PF and MH for small sample
Figure 2. Q-Q Plot of PF and MH for small sample
For the moderate sample, BP, VT and MH can be considered approximately symmetric, while for the large sample only VT and MH are approximately symmetric. The kurtosis values of BP in the moderate sample and of MH in the large sample are well below the normal kurtosis of 0, indicating that these variables have lighter tails than the normal distribution. This can be seen in the figures below.
Figure 3. Histogram of BP, VT and MH for moderate sample
Figure 4. Q-Q Plot of BP, VT and MH for moderate sample
Fig. 5 shows the histograms, with distribution curves, of VT and MH from the large sample, and the corresponding Q-Q plots are shown in Fig. 6. The histograms show that both variables are slightly skewed to the left. The Q-Q plots also indicate nonnormality of both variables: the points deviate from the straight line at the lower end for VT and at the upper end for MH.
Figure 5. Histogram of VT and MH for large sample
Figure 6. Q-Q Plot of VT and MH for large sample
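Reading a normal Q-Q plot amounts to comparing each ordered observation with the value a normal distribution would predict at the same plotting position. The Python sketch below (Python 3.8+, standard library) computes the plot coordinates and each point's vertical deviation from a reference line. Two choices are assumptions, because packages differ and the paper does not specify them: Blom's plotting positions $(i - 3/8)/(n + 1/4)$, and a reference line through the sample mean with slope equal to the sample standard deviation. The function name and the example data are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def qq_deviations(xs):
    """Return normal Q-Q plot points (z_i, y_(i)) and the deviation of each
    ordered observation from the reference line y = mean + sd * z."""
    n = len(xs)
    ys = sorted(xs)
    std_normal = NormalDist()
    # Theoretical quantiles at Blom plotting positions (assumed convention)
    zs = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mu, sd = mean(xs), stdev(xs)
    points = list(zip(zs, ys))
    deviations = [y - (mu + sd * z) for z, y in points]
    return points, deviations


if __name__ == "__main__":
    # Made-up left-skewed sample, for illustration only
    _, dev = qq_deviations([1, 8, 9, 9, 10, 10, 10])
    print([round(d, 2) for d in dev])
```

In this left-skewed example the smallest observation falls furthest below the line, the same kind of lower-end departure described for VT above; large deviations concentrated at one end of the plot, rather than scattered along it, are what signal skewness.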
All remaining variables, in all three samples, are either highly or moderately negatively skewed.
B. Tests of Normality

Table III and Table IV present the results of the different tests of normality produced by SPSS, SAS and MINITAB. Conflicting results are highlighted. Notice that the p-values of the KS test (with Lilliefors correction) are nearly identical across the statistical packages; a similar pattern can be observed for the p-values of the other tests of normality. Consistent with the approximately symmetric shape of RP, GH and MH in the small sample (as shown by their skewness values), the SW, KS and AD tests all agree that these variables are approximately normally distributed. All tests also indicate that VT is normally distributed, even though its skewness is slightly outside the range for approximate symmetry proposed by Evans [13]. For all other variables in the small sample, however, the tests disagree: the SW and AD tests produced p-values below the 5% significance level, whereas the KS test produced slightly higher p-values. Thus PF and BP, which are highly skewed, are shown to be nonnormal by the SW and AD tests, but the KS test fails to detect the nonnormality.

For the moderate sample, all tests consistently indicate the nonnormality of PF, SF and RE and the normality of GH and VT. For RP and BP, the KS test indicates that both variables are normally distributed, whereas the SW test rejects normality for both. The AD test, on the other hand, accepts the normality of BP but not that of RP. The KS and AD tests conclude that MH is not normally distributed, while the p-value of the SW test is on the borderline of the 5% significance level. For the large sample, all tests in all statistical packages consistently produce very low p-values, indicating that none of the variables is normally distributed.

TABLE III. Sample Size
13
35
227
P-VALUES FOR NORMALITY TEST IN SPSS AND MINITAB SPSS Variable PF RP BP GH VT SF RE MH PF RP BP GH VT SF RE MH PF RP BP GH VT SF RE MH
SW 0.013 0.163 0.025 0.603 0.181 0.029 0.029 0.694 0.010 0.004 0.031 0.225 0.551 0.006 0.001 0.053 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
KS (LF) 0.062 0.200 0.099 0.200 0.119 0.090 0.094 0.200 0.015 0.058 0.062 0.200 0.200 0.021 0.000 0.006 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
MINITAB KS AD (LF) 0.063 0.017 >0.150 0.218 0.095 0.032 >0.150 0.567 0.114 0.151 0.088 0.045 0.091 0.046 >0.150 0.702 0.019 0.015 0.059 0.007 0.064 0.061 >0.150 0.425 >0.150 0.393 0.029 0.015