A Test for Normality Based on Robust Regression Residuals

A.Ö. Önder and A. Zaman

Department of Economics, Ege University, Izmir, Turkey. E-mail: [email protected]
Lahore University of Management Science, Lahore, Pakistan
Summary. We investigate the effects of replacing OLS residuals with residuals from a robust regression in test statistics for the normality of the errors. We find that this can lead to a substantially improved ability to detect lack of normality in suitable situations. We derive the asymptotic distribution of the robustified normality test as chi-squared with 2 degrees of freedom under the null hypothesis of normality of the error terms. The high breakdown property of the test statistic is discussed. Using simulations, we find that situations in which a small subpopulation exhibits characteristics different from those of the main population are the ones ideally suited to the use of robustified normality tests. We employ several real data sets from the literature to show that such situations arise frequently in practice.
1 Introduction

Significant deviations from normality of the regression residuals can substantially affect the performance of the usual inference techniques. Thus, diagnostic tests for normality are important for validating inferences made from regression models. In recent years, several such tests have been devised (see, for example, Pierce and Gray, 1982; White and Macdonald, 1980; Pearson et al., 1977; Urzua, 1996; Jarque and Bera, 1987). In general, these tests are conducted on the OLS residuals. However, since OLS estimates themselves are obtained by maximizing the likelihood that the residuals come from some normal distribution, OLS residuals tend to look as normal as they possibly can. In particular, outliers and wild non-normal errors may easily be masked in an OLS analysis. This can be seen in Figure 1, where four points lie nearly on a straight line and one point is an outlier (in 1976, Belgium had an extremely hot summer, leading to record numbers of fires and record fire insurance claims). The OLS line has a negative slope, and all standardized residuals are less than 2 in absolute value, showing no signs of lack of normality. Without looking at the graph, one might never suspect a problem with the regression, since the OLS regression statistics give no clue as to the presence of outliers. By contrast, the residuals from a robust regression clearly reveal the non-normality of the errors: one of the residuals is 57 standard deviations away from 0, a virtually impossible value for a standard normal.
The authors would like to thank Mehmet Caner for helpful comments and suggestions.
Year   Claims   OLS Res.   LMS Res.
76     16,694    1.08       57.44
77     12,271   -1.14       -0.58
78     12,904   -0.58        0.58
79     14,036    0.26        7.57
80     13,874    0.38       -0.58

Fig. 1. Fire insurance claims in Belgium (the figure plots the claims against year, together with the fitted OLS and LMS lines).
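The masking effect visible in Figure 1 is easy to reproduce. The following sketch (a hypothetical illustration in Python with NumPy, not part of the original study) fits OLS to the five data points above and confirms that every standardized OLS residual stays below 2 in absolute value, even though the 1976 observation is a gross outlier.

```python
import numpy as np

# Fire insurance claims in Belgium, 1976-1980 (data from Fig. 1)
year = np.array([76, 77, 78, 79, 80], dtype=float)
claims = np.array([16694, 12271, 12904, 14036, 13874], dtype=float)

# OLS fit of claims on year, with an intercept
X = np.column_stack([np.ones_like(year), year])
beta, *_ = np.linalg.lstsq(X, claims, rcond=None)
resid = claims - X @ beta

# standardized residuals: residual / s, with s^2 = RSS / (n - 2)
s = np.sqrt(resid @ resid / (len(year) - 2))
std_resid = resid / s

print(beta[1])                   # negative slope, pulled down by the outlier
print(np.abs(std_resid).max())   # below 2: the outlier is masked
```

The first standardized residual comes out near 1.08, matching the OLS Res. column of the table, even though the LMS residual for the same observation is 57.44.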
In light of this, the main idea of this study is to use residuals from a robust estimator, instead of OLS residuals, as the basis for normality tests. We found that this basic idea is valid whenever a small subset of the original population deviates from the basic pattern of the main population. Using examples from the literature, we show that this type of situation holds for many real data sets. The results of the study shed light on the situations in which robust residuals can perform substantially better than OLS residuals, and on those in which they cannot. Among the several univariate normality tests available, we have selected that of Jarque and Bera (1987) (JB) as one of the most popular tests for normality in regression applications. It is well known that the JB test is based on a weighted average of the squared skewness and excess kurtosis:

JB = n [ S²/6 + (K − 3)²/24 ],

where n is the number of observations, S is the sample skewness and K is the sample kurtosis. Jarque and Bera (1987) proved that if the alternatives to the normal distribution belong to the Pearson family, JB is a score test for normality. As mentioned earlier, JB is calculated on the basis of OLS residuals. We suggest using the same statistic with residuals from a robust regression estimator in place of OLS residuals, and denote the resulting statistic by JB*. The particular estimator we study is Least Trimmed Squares (LTS), introduced by Rousseeuw (1984). It has the property of being highly resistant to a relatively large proportion of outliers, that is, a high breakdown value. For the details of this technique and its properties, see Rousseeuw and Leroy (1987), and also Chapter 5 of Zaman (1996). The precise definition of the LTS estimator is as follows. Consider the regression model
y_t = x_t' β + ε_t,   t = 1, …, n,

where x_t is a vector of regressors, β is a vector of regression coefficients, and y_t are the observed responses. The ε_t represent the error terms, which are assumed to be independent and identically distributed with mean 0 and standard deviation σ. In this study, we assume random carriers x_t that are independent of the errors ε_t. Moreover, the distribution of the errors is symmetric. Given any β, let r_t(β) = y_t − x_t' β, and let r_(1)²(β) ≤ r_(2)²(β) ≤ ⋯ ≤ r_(n)²(β) be a rearrangement of the squared residuals in increasing order. The 20% trimmed LTS estimator is defined to be the value of β minimizing the sum of the h smallest squared residuals, Σ_{t=1}^{h} r_(t)²(β), with h ≈ 0.8n. Although there exist other high breakdown estimators, the LTS has many virtues to recommend it. A recently developed algorithm permits fast computation of the LTS (see Rousseeuw and Van Driessen, 1999). The main advantages of the LTS method are as follows. Firstly, LTS is simple to understand and easy to motivate. Also, it is more efficient than the LMS (least median of squares), introduced by Rousseeuw (1984), with which it shares these virtues. Many estimators commonly regarded as 'robust' in the econometrics literature have low breakdown values and cannot deal with any significant number of outliers. For example, the bounded influence estimator of Krasker and Welsch (1982) and the LAD (least absolute deviations) estimator of Edgeworth (1887) both suffer heavily from the presence of a small subgroup of outliers; see Yohai (1987) for illustrations.
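As an illustration, the LTS estimator just defined can be approximated by random elemental starts followed by concentration steps, in the spirit of the fast algorithm of Rousseeuw and Van Driessen (1999). The Python sketch below is a minimal version under our own choices of function name and tuning constants; it is not the full FAST-LTS algorithm.

```python
import numpy as np

def lts_fit(X, y, trim=0.2, n_starts=50, n_csteps=20, rng=None):
    """Approximate Least Trimmed Squares: fit random elemental subsets,
    then repeatedly refit OLS on the h observations with the smallest
    squared residuals (concentration steps), keeping the best objective."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    h = int(np.ceil((1 - trim) * n))          # number of observations kept
    best_obj, best_beta = np.inf, None
    for _ in range(n_starts):
        # start from an exact fit to a random p-point subset
        idx = rng.choice(n, size=p, replace=False)
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        for _ in range(n_csteps):
            r2 = (y - X @ beta) ** 2
            keep = np.argsort(r2)[:h]         # h smallest squared residuals
            beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta
```

On data with a single gross outlier, such a trimmed fit recovers the slope of the bulk of the data, which is exactly the behavior exploited in Figure 1.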
2 Asymptotic and Robustness Properties

Before turning to simulations, we discuss the asymptotic distribution and the breakdown point of our test statistic.

Theorem 1. Under the null hypothesis of normality of the error terms, JB* is asymptotically chi-squared with 2 degrees of freedom.

Proof. As can be seen from Hössjer (1994),

√n (β̂_LTS − β) = O_p(1).   (1)

Then, using (1) and Davidson and MacKinnon (1993, pp. 568-570), we obtain

(1/n) Σ_{t=1}^{n} (ε̂_t / σ̂)³ = (1/n) Σ_{t=1}^{n} (ε_t / σ)³ + o_p(1),

where ε̂_t are the residuals from the LTS regression and σ̂ is the LTS estimate of the standard deviation. So we have, for the sample skewness S of the LTS residuals,

√(n/6) S →_d N(0, 1).   (2)

Similarly, for the sample kurtosis K,

√(n/24) (K − 3) →_d N(0, 1).   (3)

Since the statistics in (2) and (3) are asymptotically independent, combining (2) and (3) in the test statistic we have

JB* = n [ S²/6 + (K − 3)²/24 ] →_d χ²(2).  □
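The statistic in Theorem 1 is straightforward to compute from any set of residuals, whether OLS or LTS. A minimal Python sketch (the helper name is ours, not from the original paper):

```python
import numpy as np

def jarque_bera(resid):
    """JB = n * (S^2 / 6 + (K - 3)^2 / 24), asymptotically chi^2(2) under
    normality; feed it OLS residuals for JB, LTS residuals for JB*."""
    e = np.asarray(resid, dtype=float)
    n = e.size
    z = e - e.mean()
    m2 = np.mean(z ** 2)
    skew = np.mean(z ** 3) / m2 ** 1.5      # sample skewness S
    kurt = np.mean(z ** 4) / m2 ** 2        # sample kurtosis K
    return n * (skew ** 2 / 6 + (kurt - 3) ** 2 / 24)

# the asymptotic 5% critical value is the chi^2(2) quantile, about 5.99
```

For the symmetric two-point sample [-1, 1, -1, 1], the skewness term vanishes and the kurtosis term gives JB = 4 · (1 − 3)²/24 = 2/3, a quick sanity check on the formula.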
Roughly speaking, the breakdown point of a statistical functional is the smallest fraction of data contamination that can produce arbitrarily extreme results. A level breakdown occurs if contamination of data generated under the null can drive the test statistic to any point in its range. A power breakdown point, on the other hand, is the least amount of data contamination that can drive the test statistic to its null value regardless of the true alternative value (see He et al., 1990; Markatou and He, 1994, for a detailed discussion). The weighted likelihood estimator (Markatou et al., 1998) reduces to the LTS estimator with the weights

w(r_(t)²) = 1 for t ≤ h and w(r_(t)²) = 0 for t > h,

where h is the trimming point and the squared residuals r_(t)², t = 1, …, n, are ordered. Following Agostinelli and Markatou (2001), we define the score-type test function as

JB* = u(θ̂)' I(θ̂)⁻¹ u(θ̂),

where θ is the vector of parameters, the superscript ' denotes the transpose, I(θ̂) is the Fisher information matrix evaluated at θ̂, and u(θ̂) is the score function, the derivative of the log-likelihood function, evaluated at θ̂. Following the same technique as Agostinelli and Markatou (2001, p. 507), but making the minor modifications required to allow for the fact that we have a Pearson family alternative, it may be possible to establish that the level and power breakdown points of the JB* test are the same as the breakdown point of the LTS estimator, if the alternative belongs to the Pearson family.

Table 1. Significance points for normality tests*

             JB                       JB*
n      10%    5%     1%        10%    5%     1%
20     2.34   3.58   8.92      3.34   5.15   12.23
50     3.17   5.13   12.78     3.82   6.50   14.55
* 10,000 replications.
Table 2. Sizes of the normality tests*

             JB                          JB*
n      10%     5%      1%        10%     5%      1%
20     0.096   0.051   0.011     0.095   0.049   0.009
50     0.095   0.045   0.009     0.099   0.043   0.009
* 10,000 replications.
3 Motivating Examples

We conducted a Monte Carlo simulation to obtain the significance points of JB and JB*, using the GAUSS programming language. To generate pseudo-random variates from the normal and other distributions considered throughout the study, we used the built-in random number generator of GAUSS 3.1.1. We considered a linear model with a constant term and one additional regressor; the regressor matrix consists of one column of ones and one column of uniform random variables.¹ The significance points and the sizes of the tests are presented in Table 1 and Table 2.

Rousseeuw and Leroy (1987) use several data sets in their analysis of robust regression. As a preliminary test of the validity of our basic idea, we conducted a study of tests for normality on five of these data sets (see Table 3). The results were highly encouraging. In each of the five data sets studied, the JB test fails to detect any lack of normality in the OLS residuals, while JB* is well above its critical values (given in Table 1) and picks up the lack of normality of the errors very clearly. Since the presence of big outliers (corresponding to lack of normality) in these series is well known, we observe that the JB test, conducted on the basis of OLS residuals, can lead to misleading outcomes. In the light of this, we decided to do a simulation study of the relative performance of JB versus JB*.

Table 3. The values of the normality test statistics from Rousseeuw and Leroy data sets*

Series     n    JB     JB*
Brain      28   1.93   25.52
Cloud      19   2.23   7.14
Salinity   28   0.03   165.32
Aircraft   23   0.17   57.88
Delivery   25   0.01   54.22
* Data from Rousseeuw and Leroy (1987), pp. 57, 96, 82, 154, 155.
¹ The same experiment was repeated with a regressor matrix consisting of a column of ones and three columns of uniform random variables. This gave similar results, which are therefore not presented here.
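For concreteness, the significance-point simulation for the OLS-based JB can be sketched as follows. This is our own simplified Python version (the original study used GAUSS); the helper names, replication count and design are illustrative only.

```python
import numpy as np

def jb(e):
    """Jarque-Bera statistic of a residual vector."""
    z = e - e.mean()
    m2 = np.mean(z ** 2)
    return e.size * ((np.mean(z ** 3) / m2 ** 1.5) ** 2 / 6
                     + (np.mean(z ** 4) / m2 ** 2 - 3) ** 2 / 24)

def jb_critical_values(n, reps=5000, probs=(0.90, 0.95, 0.99), seed=0):
    """Empirical significance points of JB computed on OLS residuals from a
    model with a constant and one uniform regressor (fixed design)."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.uniform(size=n)])
    M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # residual-maker matrix
    draws = [jb(M @ rng.standard_normal(n)) for _ in range(reps)]
    return np.quantile(draws, probs)

print(jb_critical_values(20))  # roughly comparable to the n = 20 row of Table 1
```

Premultiplying the normal errors by the residual-maker matrix M yields the OLS residuals directly, so no explicit regression is needed inside the loop.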
4 Conventional Power Comparisons

Given that robust estimation and normality testing have both been studied in the literature for a long time, it is surprising that the use of robust estimators prior to normality testing has not been considered. Three main reasons can be given for this:
1. A poorly chosen 'robust' estimator will not identify non-normal errors clearly, since many estimators termed 'robust' do not work in the presence of groups of outliers due to the masking effect; see Atkinson (1986). The first practical estimators that work in the presence of a substantial proportion of outliers, namely LTS and LMS, were introduced by Rousseeuw (1984). However, these have not received much attention in the econometric literature. See Rousseeuw and Leroy (1987), Chapter 5 of Zaman (1996), and also Orhan and Zaman (1999) for more details.
2. Computational costs and the lack of suitable software have been barriers to experimentation with these robust estimators. A fast algorithm for computing the LTS for large data sets has only recently been developed by Rousseeuw and Van Driessen (1999).
3. Conventional Monte Carlo designs for testing robust estimators do not reflect the presence of contaminated data, and therefore do not reveal the virtues and disadvantages of the alternatives in realistic situations.
To illustrate the third point, we conducted a simulation using the framework of Jarque and Bera (1987), who used four alternative distributions, Beta(3,2), Student's t(5), Gamma(2,1) and lognormal, to assess the ability of the JB test to detect lack of normality. Each experiment in this simulation study consisted of generating 20 pseudo-random variates from the given distribution, computing the values of JB and JB*, and recording whether the null hypothesis (normality) was rejected by each individual test. The level of significance was 0.10 throughout the simulations.
We carried out 10,000 replications and estimated the power of each test as the percentage of replications in which the null hypothesis was rejected. The results are presented in Table 4. On this evidence, the robust-error-based test JB* is inferior to its OLS-error-based counterpart JB.

Table 4. Estimated power for n = 20*

Distribution   JB     JB*
Beta(3,2)      0.07   0.05
t(5)           0.29   0.28
Gamma(2,1)     0.51   0.40
Lognormal      0.88   0.85
Cauchy         0.88   0.88
* 10,000 replications, α = 0.10.
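The power experiment can be sketched in the same style. The snippet below is our own simplified Python version: for brevity it applies the JB statistic directly to i.i.d. draws rather than to regression residuals, and uses the n = 20 critical value 2.34 from Table 1 only as an illustrative cut-off.

```python
import numpy as np

def jb(e):
    """Jarque-Bera statistic of a residual vector."""
    z = e - e.mean()
    m2 = np.mean(z ** 2)
    return e.size * ((np.mean(z ** 3) / m2 ** 1.5) ** 2 / 6
                     + (np.mean(z ** 4) / m2 ** 2 - 3) ** 2 / 24)

def power(draw, n=20, crit=2.34, reps=2000, seed=0):
    """Fraction of replications in which JB exceeds the critical value."""
    rng = np.random.default_rng(seed)
    return np.mean([jb(draw(rng, n)) > crit for _ in range(reps)])

# lognormal errors are detected easily; normal errors are rejected at
# roughly the nominal rate
p_logn = power(lambda rng, n: rng.lognormal(size=n))
p_norm = power(lambda rng, n: rng.standard_normal(n))
```

The contrast between the two rejection rates mirrors the Lognormal row of Table 4 qualitatively, though the numbers differ because this sketch skips the regression step.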
The model underlying these simulations is that if the errors are not independent and identically distributed (i.i.d.) normal, then they are i.i.d. according to some other
known density. It seems improbable that real data sets violating normality would satisfy this idealistic assumption. On the other hand, since the range of alternatives to the null of i.i.d. normality is extremely large, these simulations focus on a manageably small subset. Furthermore, real data sets display a wide range of variations and surprises, and it would be quite impossible to capture this in some small class of test cases for lack of normality. Experimenting with several variations of the alternative density, as well as with variant robust estimators, it has become clear to us that the use of robust errors is of no value in detecting alternative i.i.d. error structures. This raises the issue of what feature of the numerous real data sets analyzed by Rousseeuw and Leroy (1987) accounts for the indisputable superiority of JB* to JB in those contexts.
5 Clustered Outliers

In order to understand the reasons for the superiority of JB* on the Rousseeuw and Leroy data sets, we conducted new simulations, allowing for a larger set of alternatives than just i.i.d. draws from some non-normal density. To accomplish this, we started
by using a mixture of two normal distributions, (1 − ε) N(0,1) + ε N(μ, σ²), as an alternative to the null of normality. In this way, ε allowed us to choose the percentage of outliers, while μ and σ allowed us to control their behavior. In the examples related to the failure of OLS, we observed that the outliers lie in clusters (see, for example, Rousseeuw and Leroy, 1987, p. 58). Below we study the effects of clustered outliers as opposed to outliers generated at random positions in the sample. The nature of the outliers also turns out to be crucial. High-variance outliers from N(0,9) and N(0,16), compared to the N(0,1) base distribution, were studied, as well as mean-shift outliers generated from N(5,1) and N(10,1). The first case, where the outliers lie in clusters, is labeled Case A, and the second case, where they are at random positions, is labeled Case B. Table 5 presents the results for 20 and 50 observations. In all cases, 80% of the true residuals are generated from N(0,1) and 20% from normal distributions with different means and variances; thus 4 outliers are generated for 20 observations and 10 for 50 observations. These experiments shed considerable light on the properties of the modified normality test. When the outliers are clustered (Case A) and generated by shifting the mean of the normal, the robust version of the test has substantially greater power than the test based on OLS residuals. In all other cases, the two tests have similar performance. In retrospect, this can be explained as follows. With balanced outliers on both sides of the regression line, the OLS estimator is not much affected by the outliers, and hence OLS residuals are similar to robust residuals; the only difference is the loss of efficiency of the LTS due to trimming, i.e. ignoring some valid observations.
Outliers generated by high variance are equally likely to be positive or negative, and hence cancel each other out. This also explains why the Cauchy did
not lead to improved performance for JB* as we had expected. Outliers generated by a mean shift all lie on one side of the regression line and hence are much more effective in systematically distorting OLS estimates. Clustering of the outliers also has a significant effect on the relative performance of the tests. Clustered outliers tend to distort the OLS estimators to the largest possible extent (outliers with high leverage have a similar effect and produce similar outcomes). Thus, these situations lead to the maximum improvement for tests based on robust residuals. When the outliers are not clustered, their impact on the OLS estimators is reduced (equivalently, they have less leverage). In such situations, both OLS and robust methods pick out the outliers easily, and therefore both achieve high power, as indicated in Table 5.

Table 5. Power comparisons for mixture-of-normals alternatives with 20% outliers under different conditions*

                     n = 20               n = 50
                  Case A   Case B      Case A   Case B
N(5,1)    JB       0.31     0.74        0.59     1.00
          JB*      0.44     0.63        0.86     1.00
N(10,1)   JB       0.42     0.94        0.69     1.00
          JB*      0.68     0.98        0.75     1.00
N(0,9)    JB       0.54     0.54        0.85     0.83
          JB*      0.54     0.54        0.85     0.83
N(0,16)   JB       0.72     0.71        0.96     0.95
          JB*      0.73     0.71        0.96     0.95
* 10,000 replications, α = 0.10.
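The contaminated error vectors behind such experiments can be generated as in the following Python sketch. This is our own illustration: with the observations ordered by the regressor, placing the contaminated errors in one contiguous block imitates the clustered Case A, while scattering them imitates Case B.

```python
import numpy as np

def mixture_errors(n, frac=0.2, loc=5.0, scale=1.0, clustered=True, seed=None):
    """Errors that are N(0,1) for a fraction (1 - frac) of observations and
    N(loc, scale^2) for the rest, placed either in a block or at random."""
    rng = np.random.default_rng(seed)
    k = int(round(frac * n))
    e = rng.standard_normal(n)
    outliers = rng.normal(loc, scale, size=k)
    if clustered:
        e[:k] = outliers                       # Case A: one contiguous cluster
    else:
        pos = rng.choice(n, size=k, replace=False)
        e[pos] = outliers                      # Case B: random positions
    return e

# e.g. n = 50 with frac = 0.2 gives 10 outliers, as in Table 5
```

Setting loc = 0 and scale = 3 or 4 reproduces the high-variance designs N(0,9) and N(0,16), while the defaults give the mean-shift design N(5,1).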
6 Applications

The main objective of this paper is to explore whether the alternative JB* is superior to the conventional test JB. This issue cannot be settled by simulations alone, since one can construct contexts favorable to either test. Based on the simulations carried out in this study, we can say, roughly speaking, that randomly scattered and unbiased (that is, equally likely to be positive or negative) deviations from normality are more easily detected by the conventional form. Systematic, clustered, and biased
distortions are more easily detected by the modified form. In applied contexts, this translates into the following question. Do violations of normality in real data sets arise from non-systematic, randomly scattered errors (like recording errors), or from the presence of a small subpopulation (or several such) with characteristics different from those of the main population? The very diverse set of practical applications of the robust regression techniques LTS and LMS that has emerged (see Rousseeuw, 1997, for a review) leads to the supposition that in a significant number of real cases the second explanation is valid. We assess this issue on a number of real data sets from recent publications, as follows.

First, we apply the LTS-based test to the models used by Tansel (1993) to investigate cigarette demand in Turkey over the period 1960-88. In her models, cigarette consumption per adult is regressed, by OLS, on a number of variables in order to test the effect of government decisions on cigarette consumption in Turkey. We repeated the regressions and then computed the normality test statistics. The results are presented in Table 6. Using the critical values from Table 1, it can be seen that the JB test fails to reject normality for all of the models, as in Tansel (1993). But if the robust test is applied to the same models, some of the results lead to different decisions. Although JB* fails to reject the null for Model 1 and Model 3, it rejects the null for Model 2 and Model 4 at the 1% significance level. According to Orhan and Zaman (1999), these data sets include outliers, which implies that the true residuals are not normally distributed. This supports the inference drawn from our robust tests.

Table 6. The values of the normality test statistics from Tansel (1993) and Metin (1998) models*

          Tansel (1993)                          Metin (1998)
        Model 1  Model 2  Model 3  Model 4     Model 1  Model 2
n/k     28/6     28/5     29/5     28/7        33/17    33/11
JB      0.72     1.27     0.83     1.30        0.73     1.48
JB*     3.20     33.90    2.91     55.92       183.26   51.87
* k is the number of regressors.
We applied the same method to the study of Metin (1998), in which she analyzes the relationship between inflation and the budget deficit in the Turkish economy over the period 1950-1987. She uses single-equation modeling, written as an error correction model, in which inflation depends on the scaled budget deficit, the real growth rate of income, and scaled base money. We replicated the first and second models, in which OLS is used as the estimation method, and obtained the normality test results given in Table 6. Using the critical values in Table 1, it can be seen that the JB test statistic fails to reject the null hypothesis of normality for both models in Metin (1998). But the results of the robust test are just the opposite: for Model 1 and Model 2, JB* rejects normality at the 1% significance level. Again the robust
residuals suggest significant non-normality in the errors which is not detected by OLS regression analysis.
7 Conclusions

In this study, we explored whether one should use OLS residuals or residuals from a robust regression technique (LTS) as the basis for the Jarque-Bera normality test. We obtained the asymptotic distribution of the robustified normality test as chi-squared with 2 degrees of freedom under the null hypothesis of normality of the error terms, and we investigated the breakdown points of the test statistic. Through simulations, we found that the relative power of the test depends on the nature of the alternative. In situations where the alternative is randomly scattered and unbiased, the conventional test is more powerful. On the other hand, for systematic and clustered outliers, the robust test yields substantially greater power. Systematic and clustered outliers occur in many different real data sets, as the recent survey by Rousseeuw (1997) shows. In view of this, we applied the modified test to some economic data sets, finding several examples from the literature in which the modified test reveals lack of normality not detected by a test based on OLS residuals. Other high breakdown point estimators, such as S estimators (Rousseeuw and Yohai, 1984), MM estimators (Yohai, 1987), and τ estimators (Yohai and Zamar, 1988), may also be used as the basis for the Jarque-Bera normality test. It would be interesting to compare the finite-sample performance of the resulting tests. However, as this requires extensive simulation, we suggest it as an area for further research.
References

C. Agostinelli and M. Markatou. Tests of hypotheses based on the weighted likelihood methodology. Statistica Sinica, 11(2):499–514, 2001.
A.C. Atkinson. Masking unmasked. Biometrika, 73:533–554, 1986.
R. Davidson and J. MacKinnon. Estimation and Inference in Econometrics. Oxford Univ. Press, New York, 1993.
F.Y. Edgeworth. On observations relating to several quantities. Hermathena, 6:279–285, 1887.
X. He, D.G. Simpson, and S.L. Portnoy. Breakdown robustness of tests. J. Am. Statist. Assoc., 85:446–452, 1990.
O. Hössjer. Rank-based estimates in the linear model with high breakdown point. J. Am. Statist. Assoc., 89:149–158, 1994.
C.M. Jarque and A.K. Bera. A test for normality of observations and regression residuals. Int. Statistical Review, 55:163–172, 1987.
W.S. Krasker and R.E. Welsch. Efficient bounded influence regression estimation. J. Am. Statist. Assoc., 77:595–604, 1982.
M. Markatou, A. Basu, and B.G. Lindsay. Weighted likelihood estimating equations with a bootstrap root search. J. Am. Statist. Assoc., 93:740–750, 1998.
M. Markatou and X. He. Bounded influence and high breakdown point testing procedures in linear models. J. Am. Statist. Assoc., 89:543–549, 1994.
K. Metin. The relationship between inflation and the budget deficit in Turkey. J. of Business and Economic Statistics, 16(4):412–422, 1998.
M. Orhan and A. Zaman. Econometric applications of high-breakdown robust regression techniques. Technical report, Economics Department, Bilkent University, Ankara, 1999.
E.S. Pearson, R.B. D'Agostino, and K.O. Bowman. Tests for departure from normality: Comparison of powers. Biometrika, 64(2):231–246, 1977.
D.A. Pierce and R.J. Gray. Testing normality of errors in regression models. Biometrika, 69(1):233–236, 1982.
P.J. Rousseeuw. Least median of squares regression. J. Am. Statist. Assoc., 79:871–880, 1984.
P.J. Rousseeuw. Introduction to positive breakdown methods. In G.S. Maddala and C.R. Rao, editors, Handbook of Statistics, Vol. 15: Robust Inference, pages 101–121. Elsevier, Amsterdam, 1997.
P.J. Rousseeuw and A.M. Leroy. Robust Regression and Outlier Detection. Wiley, New York, 1987.
P.J. Rousseeuw and K. Van Driessen. A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41(3):212–223, 1999.
P.J. Rousseeuw and V.J. Yohai. Robust regression by means of S-estimators. In J. Franke, W. Härdle, and R.D. Martin, editors, Robust and Nonlinear Time Series Analysis, Lecture Notes in Statistics, pages 256–272. Springer, New York, 1984.
A. Tansel. Cigarette demand, health scares and education in Turkey. Applied Economics, 25:521–529, 1993.
C.M. Urzua. On the correct use of omnibus tests for normality. Economics Letters, 53:247–251, 1996.
H. White and G.M. Macdonald. Some large sample tests for nonnormality in the linear regression model. J. Am. Statist. Assoc., 75:16–28, 1980.
V.J. Yohai. High breakdown-point and high efficiency robust estimates for regression. Ann. Statist., 15:642–656, 1987.
V.J. Yohai and R.H. Zamar. High breakdown point estimates of regression by means of the minimization of an efficient scale. J. Am. Statist. Assoc., 83:406–413, 1988.
A. Zaman. Statistical Foundations for Econometric Techniques. Academic Press, New York, 1996.