
Testing the Hypothesis of a Homoscedastic Error Term in Simple, Nonparametric Regression

Educational and Psychological Measurement, Volume 66, Number 1, February 2006, 85-92. © 2006 Sage Publications. DOI: 10.1177/0013164405278578. http://epm.sagepub.com

Rand R. Wilcox University of Southern California

Consider the nonparametric regression model Y = m(X) + τ(X)ε, where X and ε are independent random variables, ε has a median of zero and variance σ², τ is some unknown function used to model heteroscedasticity, and m(X) is an unknown function reflecting some conditional measure of location associated with Y, given X. This article considers the problem of testing H0: τ = 1, the hypothesis that the error term is homoscedastic. Several methods were considered; two were found to control the probability of a Type I error well in simulations. One is fast from a computational point of view, and the other is based in part on a bootstrap method. Neither dominates in terms of power.

Keywords: heteroscedasticity, smoothers, Theil-Sen estimator, Winsorized correlations

In recent years, various methods have been derived with the goal of making inferences based on robust regression techniques that are designed to be insensitive to situations in which the error term is heteroscedastic. That is, the methods are designed to yield accurate confidence intervals for parameters of interest, or to control the probability of a Type I error, even when the error term is heteroscedastic. But among methods aimed at nonparametric regression, an issue that appears to have received no attention is testing the hypothesis that the error term is homoscedastic. The problem is not finding a reasonable technique but rather identifying methods that perform well in simulations. Here, it is found that some of the seemingly more obvious methods are unsatisfactory, but two methods are found that give good results.

To state the goal more precisely, consider the nonparametric regression model

Y = m(X) + τ(X)ε, (1)

where X and ε are independent random variables, ε has a median of zero and variance σ², τ is some unknown function used to model heteroscedasticity, and m(X) is an unknown function reflecting some conditional measure of location associated with Y, given X. The goal is to test

H0: τ(X) ≡ 1, (2)

the hypothesis that the error term is homoscedastic. This hypothesis has practical relevance because rejecting H0 indicates that the precision of the estimate of Y, given X, changes with X. And of course, heteroscedasticity can be of intrinsic interest.

When it is assumed that τ(X) = β1X + β0, a variety of methods have been proposed for testing Equation 2, comparisons of which were made by Lyon and Tsai (1996). Lyon and Tsai found only one method that performed well in simulations; it was proposed by Koenker (1981) and represents a slight modification of the technique derived by Cook and Weisberg (1983). Here it is noted that, at least in some situations, the methods described in the next section have more power than Koenker's (1981) method, even when τ(X) = β1X + β0. For completeness, Ruppert, Wand, and Carroll (2003) described a variety of methods for studying and characterizing heteroscedasticity. But they do not provide any methods for testing Equation 2, and evidently no such method has ever been proposed.

Description of the Methods

The basic strategy can be outlined as follows. Let (X1, Y1), . . . , (Xn, Yn) be a random sample of n points, and let m(X) be the conditional median of Y, given X. The first step is to approximate m(X) using what is called a running interval smoother (e.g., Wilcox, 2003), a description of which is given momentarily. Let ri = Yi − m(Xi), i = 1, . . . , n, in which case |ri| measures the distance between Yi and m(Xi). Let β be the slope of some regression line between Xi and |ri|. Then, under homoscedasticity,

H0: β = 0

should be true. Alternatively, if ρ is some correlation between Xi and |ri|, then H0: ρ = 0 should be true. A natural strategy is to use least squares regression or Pearson's correlation, but this was found to be unsatisfactory. In simulations, Pearson's correlation performed reasonably well when m(X) is linear in X, but when, for example, m(X) = X², this was no longer the case. What was found to be more satisfactory were two methods that offer protection against outliers among the X values. This does not seem too surprising in light of properties associated with the smoother used, for reasons to be elaborated on.

Here, the so-called running interval smoother is used to estimate m(X). Let f be some constant to be chosen, and let M be the median of the values X1, . . . , Xn. The median absolute deviation (MAD) is the median of the values |X1 − M|, . . . , |Xn − M|. The point X is said to be close to Xi if

|Xi − X| ≤ f(MAD)/.6745.

Under normality, MAD/.6745 estimates the standard deviation, in which case X is close to Xi if X is within f standard deviations of Xi. Let N(Xi) = {j: |Xj − Xi| ≤ f(MADN)}, where, for convenience, MADN is MAD/.6745. That is, N(Xi) indexes the set of all Xj values that are close to Xi. Then, m(Xi) is taken to be the median of the Yj values such that j is an element of N(Xi). For convenience, let Ri = |Yi − m(Xi)|. So the goal is to test the hypothesis that the (population) regression line between R and X is horizontal.

As previously mentioned, if Pearson's correlation between R and X is used, situations were found in which this approach proved to be unsatisfactory. A speculation as to why has to do with a general problem with smoothers: Accurate estimates of the regression line are typically difficult to obtain for extreme X values. One reason is that the number of observed X values is often small in these cases. Another, and perhaps more serious, problem is that there is an inherent bias when dealing with extreme X values. For example, suppose the goal is to estimate m(20). Ideally, a smoother would use X values both less than and greater than 20 to accomplish this goal, but if X = 20 is the smallest observed X value, then any estimate of m(20) might suffer from serious bias.

To deal with this problem, the strategy is to use a method for making inferences about the association between R and X that ignores or down-weights extreme X values. The first strategy was to take β to be the population slope of the regression line between R and X corresponding to the Theil (1950) and Sen (1968) estimator. The second strategy was to let ρw be the Winsorized correlation between R and X and test H0: ρw = 0 using the method in Wilcox (2003, 2005). Many alternative robust regression methods are possible, but no attempt has been made to include all of them here.
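As a concrete sketch, the running interval smoother and the absolute residuals Ri can be computed as follows; the Python function names are my own, and the code simply restates the definitions above rather than reproducing any published implementation:

```python
import statistics

def madn(x):
    """MAD divided by .6745; estimates the standard deviation under normality."""
    m = statistics.median(x)
    return statistics.median(abs(xi - m) for xi in x) / 0.6745

def running_interval_smoother(x, y, f=1.0):
    """m(X_i) = median of the Y_j with j in N(X_i) = {j : |X_j - X_i| <= f * MADN}."""
    span = f * madn(x)
    return [statistics.median(yj for xj, yj in zip(x, y) if abs(xj - xi) <= span)
            for xi in x]

def absolute_residuals(x, y, f=1.0):
    """R_i = |Y_i - m(X_i)|, whose association with X is then tested."""
    return [abs(yi - mi) for yi, mi in zip(y, running_interval_smoother(x, y, f))]
```

Applied to the 12-point data set in the illustration later in the article, this sketch reproduces the reported m(X1) = 1.385.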
The main point is that, in terms of controlling the probability of a Type I error, both of the methods considered here perform very well in simulations.

To elaborate on the Theil-Sen estimator, for any i < i′ for which Xi ≠ Xi′, let

Sii′ = (Ri − Ri′)/(Xi − Xi′).

The Theil-Sen estimate of the slope is bts, the median of all the slopes represented by Sii′. Let β be the population slope estimated by bts. To test H0: β = 0, it currently seems that a basic percentile bootstrap method performs relatively well. In particular, a bootstrap sample is obtained by randomly sampling, with replacement, n rows of data from

(R1, X1), (R2, X2), . . . , (Rn, Xn).
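A minimal sketch of the Theil-Sen slope and the percentile bootstrap test follows; the function names and the fixed seed are my own choices, and B = 600 matches the simulation settings reported in the article:

```python
import random
import statistics

def theil_sen_slope(x, r):
    """Median of the pairwise slopes S_ii' = (R_i - R_i')/(X_i - X_i'), X_i != X_i'."""
    n = len(x)
    slopes = [(r[i] - r[j]) / (x[i] - x[j])
              for i in range(n) for j in range(i + 1, n) if x[i] != x[j]]
    return statistics.median(slopes)

def percentile_bootstrap_test(x, r, b=600, alpha=0.05, seed=0):
    """Percentile bootstrap for H0: beta = 0; returns the 1 - alpha CI and a p value."""
    rng = random.Random(seed)
    n = len(x)
    boot = []
    for _ in range(b):
        idx = [rng.randrange(n) for _ in range(n)]
        boot.append(theil_sen_slope([x[i] for i in idx], [r[i] for i in idx]))
    boot.sort()
    l = round(alpha * b / 2)        # lower index, rounded to the nearest integer
    u = b - l
    ci = (boot[l], boot[u - 1])     # (b*(l+1), b*(u)) in the article's 1-based notation
    p = sum(bi < 0 for bi in boot) / b
    return ci, 2 * min(p, 1 - p)
```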


Let b*ts be the Theil-Sen estimate of β based on this bootstrap sample. Repeat this bootstrap process B times, yielding b*1, . . . , b*B. Let b*(1) ≤ . . . ≤ b*(B) be the B bootstrap estimates written in ascending order. Let l = αB/2, rounded to the nearest integer, and u = B − l. Then

(b*(l+1), b*(u))

is an approximate 1 − α confidence interval for β. Let p* be the proportion of bootstrap estimates less than zero. Then a p value is 2 min(p*, 1 − p*).

As for the Winsorized correlation approach, for convenience, set Yi1 = Ri and Yi2 = Xi, i = 1, . . . , n. Next, Winsorize the Y values. That is, for fixed j, let Y(1)j ≤ . . . ≤ Y(n)j be the n values written in ascending order, and let

Wij = Y(g+1)j if Yij ≤ Y(g+1)j,
Wij = Yij if Y(g+1)j < Yij < Y(n−g)j,
Wij = Y(n−g)j if Yij ≥ Y(n−g)j,

where g = [γn], γ (0 ≤ γ < .5) is the amount of Winsorizing to be done, and [.] is the greatest integer function. Here, γ = .2 is used. Then rw, the sample Winsorized correlation between R and X, is just Pearson's correlation based on the Winsorized values. That is, estimate ρw with

rw = Σ(Wi1 − W̄1)(Wi2 − W̄2) / √[Σ(Wi1 − W̄1)² Σ(Wi2 − W̄2)²].

To test H0: ρw = 0, compute

Tw = rw √[(n − 2)/(1 − rw²)],

and reject if |Tw| ≥ t1−α/2, the 1 − α/2 quantile of Student's t distribution with ν = h − 2 degrees of freedom, where h = n − 2g.
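The Winsorized correlation test can be sketched as follows (function names are mine, with γ = .2 as above; each variable is Winsorized marginally, as in the description):

```python
import math
import statistics

def winsorize(v, gamma=0.2):
    """Pull the g smallest values up to v_(g+1) and the g largest down to v_(n-g)."""
    n = len(v)
    g = int(gamma * n)              # the greatest-integer function [gamma * n]
    s = sorted(v)
    lo, hi = s[g], s[n - g - 1]
    return [min(max(vi, lo), hi) for vi in v]

def winsorized_correlation_test(x, r, gamma=0.2):
    """Pearson's correlation of the marginally Winsorized pairs, with the test
    statistic T_w referred to Student's t on h - 2 df, where h = n - 2g."""
    n = len(x)
    g = int(gamma * n)
    wx, wr = winsorize(x, gamma), winsorize(r, gamma)
    mx, mr = statistics.mean(wx), statistics.mean(wr)
    num = sum((a - mx) * (b - mr) for a, b in zip(wx, wr))
    den = math.sqrt(sum((a - mx) ** 2 for a in wx) * sum((b - mr) ** 2 for b in wr))
    rw = num / den
    tw = rw * math.sqrt((n - 2) / (1 - rw ** 2))
    return rw, tw, n - 2 * g - 2

```

Run on the 12-point data set in the illustration later in the article, this sketch reproduces the reported rw = .94.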

A Simulation Study

Simulations were used to study the small-sample properties of the two methods just described. Observations were generated according to Equation 1 with either m(X) = X or m(X) = X². The distributions for both X and ε were taken to be one of four g-and-h distributions, which contain the standard normal distribution as a special case. If Z has a standard normal distribution and g > 0, then

W = [(exp(gZ) − 1)/g] exp(hZ²/2)

has a g-and-h distribution in which g and h are parameters that determine the first four moments. When g = 0, this last equation is taken to be

W = Z exp(hZ²/2).

Table 1
Some Properties of the g-and-h Distribution

  g     h      κ1       κ2
 0.0   0.0    0.00      3.00
 0.0   0.2    0.00     21.46
 0.2   0.0    0.61      3.68
 0.2   0.2    2.81    155.98
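Generating g-and-h deviates from this transformation can be sketched as follows (function names are mine):

```python
import math
import random

def g_and_h(g, h, z):
    """Map a standard normal deviate Z to a g-and-h deviate W."""
    if g == 0:
        return z * math.exp(h * z * z / 2)
    return (math.exp(g * z) - 1) / g * math.exp(h * z * z / 2)

def sample_g_and_h(g, h, n, seed=0):
    """Draw n independent g-and-h deviates."""
    rng = random.Random(seed)
    return [g_and_h(g, h, rng.gauss(0, 1)) for _ in range(n)]
```

With g = h = 0, the transformation reduces to W = Z, the standard normal case.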

The four distributions used here were the standard normal (g = h = 0.0), a symmetric heavy-tailed distribution (g = 0.0, h = 0.2), an asymmetric distribution with relatively light tails (g = 0.2, h = 0.0), and an asymmetric distribution with heavy tails (g = h = 0.2). Table 1 shows the skewness (κ1) and kurtosis (κ2) for each distribution considered. Additional properties of the g-and-h distribution are summarized by Hoaglin (1985).

Table 2 shows the estimated probability of a Type I error when testing at the .05 level with n = 30. The estimates are based on 1,000 replications, with B = 600 when using the bootstrap method. (From Robey & Barcikowski, 1992, 1,000 replications are sufficient from a power point of view. More specifically, if we test the hypothesis that the actual Type I error rate is .05, and if we want power to be .9 when testing at the .05 level and the true α value differs from .05 by .025, then 976 replications are required.) As indicated in Table 2, among all situations considered, the estimated probability of a Type I error when using Theil-Sen ranged between .030 and .067. As for the method based on the Winsorized correlation, the estimates ranged between .021 and .050.

As for power, neither method dominates, and the choice of method appears to depend, at least in part, on the nature of the heteroscedasticity. For example, if the distribution of X is shifted so that it has a median of 3, m(X) = X, and τ(X) = |X|, the method based on the Theil-Sen estimator has more power for all of the situations listed in Table 2. With m(X) = X², the same result was obtained. But if the distribution of X has a median of zero, m(X) = X, and τ(X) = 1/|X + 1|, the exact opposite is true.

Finally, it is illustrated that, at least in some situations, both methods have more power than Koenker's technique, even when there is a linear association between Y and X. Suppose n = 30, both X and ε have standard normal distributions, and Y = exp(−(X − .5)²)ε. The power of Koenker's method was estimated to be .31. In contrast, the estimated power for the bootstrap method and the method based on Tw was .53 and .62, respectively.


Table 2
Type I Error Rates, n = 30, α = .05

    ε           X         m(X) = X²       m(X) = X
  g    h      g    h      TS     WIN      TS     WIN
 0.0  0.0    0.0  0.0    .046   .038     .044   .036
 0.0  0.0    0.0  0.2    .049   .045     .045   .036
 0.0  0.0    0.2  0.0    .049   .038     .032   .021
 0.0  0.0    0.2  0.2    .040   .038     .037   .027
 0.0  0.2    0.0  0.0    .058   .038     .035   .034
 0.0  0.2    0.0  0.2    .060   .046     .037   .037
 0.0  0.2    0.2  0.0    .056   .041     .030   .027
 0.0  0.2    0.2  0.2    .063   .040     .038   .037
 0.2  0.0    0.0  0.0    .039   .038     .038   .030
 0.2  0.0    0.0  0.2    .042   .045     .043   .041
 0.2  0.0    0.2  0.0    .040   .050     .036   .030
 0.2  0.0    0.2  0.2    .047   .043     .041   .029
 0.2  0.2    0.0  0.0    .067   .038     .043   .041
 0.2  0.2    0.0  0.2    .060   .044     .049   .044
 0.2  0.2    0.2  0.0    .065   .040     .039   .038
 0.2  0.2    0.2  0.2    .067   .043     .049   .038

An Illustration

To illustrate some of the key steps in a relatively simple manner, 12 X values were generated from a normal distribution with a mean of 3, and the corresponding Y values were generated with Y = X + |X|ε, where ε has a standard normal distribution. The resulting values were

  X:  2.07  1.78  4.29  2.75  4.65  3.24  0.09  4.55  4.02  2.08  1.75  1.81
  Y:  1.42  1.38  4.58  2.09  1.38 -0.91  0.06  6.72 -0.93  0.33  1.20  1.39


The sample median of the X values is M = 2.415, and the MADN is 1.105. Consider the first X value, X1 = 2.07. Then, with f = 1, N(X1) = {1, 2, 4, 10, 11, 12}. The median of the corresponding Y values is 1.385. That is, m(X1) = 1.385, and the remaining m(Xi) values are computed in a similar manner. The resulting residuals are

0.035, −0.005, 3.200, 0.710, −1.600, −1.500, 0.000, 3.740, −2.310, −1.055, −0.185, 0.005.

The (20%) Winsorized X values are

2.07, 1.78, 4.29, 2.75, 4.29, 3.24, 1.78, 4.29, 4.02, 2.08, 1.78, 1.81,

and the corresponding Winsorized R values are 0.035, 0.005, 2.310, 0.710, 1.600, 1.500, 0.005, 2.310, 2.310, 1.055, 0.185, 0.005.

Computing Pearson’s correlation based on these Winsorized values yields rw = 0.94, and the p value associated with the test statistic Tw is less than .001. The Theil-Sen estimate of the slope between the absolute residuals and X is 0.85, and the p value is again less than .001.
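The first steps of this illustration can be checked numerically. The following sketch (variable names are mine) recomputes M, MADN, the set of points close to X1, and m(X1):

```python
import statistics

x = [2.07, 1.78, 4.29, 2.75, 4.65, 3.24, 0.09, 4.55, 4.02, 2.08, 1.75, 1.81]
y = [1.42, 1.38, 4.58, 2.09, 1.38, -0.91, 0.06, 6.72, -0.93, 0.33, 1.20, 1.39]

M = statistics.median(x)                                    # 2.415
MADN = statistics.median(abs(xi - M) for xi in x) / 0.6745  # about 1.105

# 1-based indices of the points close to X_1 = 2.07 with f = 1
close = [j + 1 for j, xj in enumerate(x) if abs(xj - x[0]) <= MADN]
m1 = statistics.median(y[j - 1] for j in close)             # m(X_1) = 1.385
```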

Concluding Remarks

There are many variations of the methods considered in this article, and perhaps one of them has practical value for the problem at hand. The main result here is that two methods were found that perform well in simulations, in terms of controlling the probability of a Type I error, so they would seem to deserve consideration in applied work.

Finally, it is known that the power of conventional methods for detecting dependence can be adversely affected by heteroscedasticity (Wilcox, 2003). Before declaring two variables independent when using traditional techniques, it would seem prudent to check for heteroscedasticity using one of the methods in this article.

References

Cook, R. D., & Weisberg, S. (1983). Diagnostics for heteroscedasticity in regression. Biometrika, 70, 1-10.

Hoaglin, D. C. (1985). Summarizing shape numerically: The g-and-h distributions. In D. Hoaglin, F. Mosteller, & J. Tukey (Eds.), Exploring data tables, trends, and shapes (pp. 461-515). New York: John Wiley.

Koenker, R. (1981). A note on Studentizing a test for heteroscedasticity. Journal of Econometrics, 17, 107-112.

Lyon, J. D., & Tsai, C.-L. (1996). A comparison of tests for homogeneity. Statistician, 45, 337-350.

Robey, R. R., & Barcikowski, R. S. (1992). Type I error and the number of iterations in Monte Carlo studies of robustness. British Journal of Mathematical and Statistical Psychology, 45, 283-288.

Ruppert, D., Wand, M., & Carroll, R. (2003). Semiparametric regression. Cambridge, UK: Cambridge University Press.

Sen, P. K. (1968). Estimates of the regression coefficient based on Kendall's tau. Journal of the American Statistical Association, 63, 1379-1389.

Theil, H. (1950). A rank-invariant method of linear and polynomial regression analysis. Indagationes Mathematicae, 12, 85-91.

Wilcox, R. R. (2003). Applying contemporary statistical techniques. San Diego, CA: Academic Press.

Wilcox, R. R. (2005). Introduction to robust estimation and hypothesis testing (2nd ed.). San Diego, CA: Academic Press.