Metrika (1999) 49: 107–119

© Springer-Verlag 1999

Recovered errors and normal diagnostics in regression

Donald R. Jensen¹, Donald E. Ramirez²,*

¹ Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, USA (e-mail: [email protected])
² Department of Mathematics, University of Virginia, Charlottesville, Virginia 22903, USA (e-mail: [email protected])

Received: January 1999

Abstract. Diagnostics for normal errors in regression typically utilize ordinary residuals, despite the failure of assumptions to validate their use. Case studies here show that such misuse may be critical. A remedy invokes recovered errors having the required properties, taking into account that such errors are closer to normality than are disturbances in the observations themselves. Simulation studies show consistent improvement over the usual methods in small samples. In addition, effects on normal diagnostics due to various model violations are examined.

Key words: Recovered errors, normal tests, varimax, kurtosis

1 Introduction

We briefly survey regression diagnostics, with emphasis on the normality of errors. A case study then illuminates the central issues, followed by a brief account of the notation and models to be considered.

1.1 An overview

Regression methods and their validation are central to applied research. Regression diagnostics seek to identify misspecified models, correlated or heteroscedastic errors, the type of error distribution, influential observations, outliers, evidence of cross validity in model evaluation, and other critical features of the data. Many such diagnostics require the analyses of the ordinary residuals.

* The second author was partially supported by the Center for Advanced Studies at the University of Virginia.


Prominent among these are diagnostics for normal errors in regression, including graphical procedures (Snee and Pfeifer (1983)) as well as hypothesis tests (D'Agostino (1982)). Both are supported by a variety of software packages, all based on the ordinary residuals. Both strictly presuppose simple random sampling, and thus are misapplied in regression owing to the heteroscedastic, correlated, and singular joint distribution of the ordinary residuals. The twin uncertainties are whether such irregularities might be misconstrued as evidence against normality, or conversely, whether a lack of evidence might be an artifact of these features under nonnormal disturbances. In short, these uncertainties give pause in attempts to interpret diagnostics for normal errors as currently practiced.

These issues have been addressed in part. Tests designed for independent, identically distributed observations are valid asymptotically when applied using ordinary residuals under homoscedastic and uncorrelated disturbances; see Huang and Bolch (1974), Mukantseva (1977), Pierce and Kopecky (1979), and White and MacDonald (1980), for example. Pierce and Gray (1982) conclude from simulations that the tests of Shapiro and Wilk (1965) and Anderson and Darling (1954) are adequate for n = 20 in straight-line models, and for n = 40 in selected multiple regression settings. For disturbances generated instead by a stationary stochastic process exhibiting positive dependence, tests based on the empirical distribution function (EDF) have inflated Type I error probabilities; see Moore (1982) and Moore and Gleser (1983). Theil (1965) introduced linear unbiased scaled (LUS) disturbance estimators to correct for the heteroscedastic, correlated, and singular features of the ordinary residuals.

It is incumbent here to diagnose the applicability of normal diagnostics, including possible adverse effects owing to the use of singular, heteroscedastic, and correlated residuals. Regarding correlations, Myers (1990, p. 61) conjectures: "When ideal conditions on the ε_i hold, the correlation among the residuals is not severe." Here {ε₁, ε₂, ..., ε_n} are the disturbances of the model. However, this assessment is incomplete in that correlations among the ordinary residuals are determined by the matrix of regressors even under ideal conditions on the disturbances. Heterogeneous variances may be at issue as well; these matters appear not to have been studied in depth. We return to these topics subsequently, including difficulties surrounding the singularity of the joint distribution of the ordinary residuals.

A continuing problem in the use of residuals is that their Pearson skewness (γ₁) and kurtosis (γ₂) ratios are closer to normal values than are those of the model disturbances themselves if nonnormal; see Huang and Bolch (1974). Thus normal diagnostics using residuals, both graphical and hypothesis testing, appear to be slanted towards normality.

In this paper, we address the aforementioned difficulties by reconstructing entities, called linearly recovered errors, having the requisite properties. Moreover, these are recovered from the ordinary residuals using standard software and may be assimilated readily into existing regression packages. We next examine anomalies in the use of ordinary residuals in the context of a case study to be considered further.

1.2 Case studies: A first look

We consider discoloration in canned applesauce during storage. Draper (1965) examined effects of temperature (X₁), betaine added (X₂), and storage time
(X₃) on the Munsell chroma (Y) of each specimen. A multilinear model was determined using n = 48 data points. The normal probability plot of the ordinary residuals is essentially a straight line. This is confirmed by the Anderson-Darling test with p-value = 0.567, the Shapiro-Wilk test with p-value = 0.457, and the Kolmogorov-Smirnov test with p-value > 0.15. This example falls within the guidelines of Pierce and Gray (1982) regarding sample size, so that nonnormality of errors is not an issue using currently accepted methods.

However, the applicability of the methods themselves must be questioned for reasons cited earlier. We find that the ordinary residuals tend to be negatively correlated; their minimum, first, second, and third quartile, and maximum correlations are {−0.14939, −0.04519, −0.02020, 0.00675, 0.08795}, respectively, with a range of 0.23734. Corresponding values for their variances are {0.84613, 0.89326, 0.91956, 0.93908, 0.97280}, with a range of 0.12667. Clearly the use of singular, heteroscedastic, and correlated residuals is problematic. Accordingly, we rework the foregoing tests using the n − k − 1 = 44 linearly recovered errors instead, as prescribed later in Proposition 1, to ensure their validity. We use the Shapiro-Wilk test for normality since these results were computed using SAS, and this test was convenient to code. The p-value for the ordinary least squares residuals is 0.4568, and for one of our rotated sets of linearly recovered errors it is 0.0262. The revised test clearly points towards nonnormal errors, contradicting our earlier assessment based on the ordinary residuals. In consequence, users of conventional diagnostics would be misdirected into reliance on normal-theory inferences of dubious merit. The guidelines of Pierce and Gray (1982) appear to have fallen short.

In summary, this example shows that opposing conclusions may be found under the two methods, and that evidence for or against normality is suspect when based on the ordinary residuals in regression. We turn next to matters of notation and details surrounding the models to be considered.

1.3 Notation

Designate by L(Y) the law of distribution of Y. In particular, N₁(μ, σ²) is the one-dimensional normal distribution having mean E(Y) = μ and variance Var(Y) = σ². The covariance and correlation of (Y₁, Y₂) are designated as Cov(Y₁, Y₂) and ρ(Y₁, Y₂), respectively. We consider full-rank multilinear models of the type

$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + \varepsilon_i$    (1)

relating the typical response Y_i to regressors {X_{i1}, X_{i2}, ..., X_{ik}} through unknown parameters {β₀, β₁, ..., β_k}, for i = 1, 2, ..., n. The ordinary residuals are given by

$e_i = Y_i - \hat{Y}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_{i1} + \cdots + \hat{\beta}_k X_{ik})$    (2)

in terms of the ordinary least-squares estimators {β̂₀, β̂₁, ..., β̂_k}. Conventional assumptions regarding such models include:
A1: E(ε_i) = 0, 1 ≤ i ≤ n;
A2: Var(ε_i) = σ², 1 ≤ i ≤ n;
A3: Cov(ε_i, ε_j) = 0, i ≠ j = 1, ..., n; and
A4: L(ε_i) = N₁(0, σ²), independently for i = 1, 2, ..., n.

We return to these assumptions subsequently.

2 Recovery of errors

We now proceed constructively to fix the problems identified in Section 1. Against this new benchmark, case studies show that normal diagnostics using ordinary residuals at times may be grossly misleading.

2.1 Basics

Under assumptions A1–A3 it follows directly that E(e_i) = 0, whereas the variances and covariances of the ordinary residuals are determined by the matrix of regressor variables. In consequence, the elements {e₁, ..., e_n} typically are heteroscedastic and correlated, as documented empirically in the case study of Section 1.2. Moreover, their joint n-dimensional distribution is singular of rank n − k − 1, there being k + 1 sure linear relations among them. This joint singularity is itself troubling for reasons to follow. Using all n ordinary residuals diagnostically ignores redundancies among them, and it supports the illusion that there are n, instead of the actual n − k − 1, effective data points. In consequence, the usual graphical displays, order statistics, sample moments, the EDF, and various tests for goodness of fit are based on excessive and partially redundant data, whereas p-values depending on sample size are reported erroneously. In short, the fraction (k + 1)/n represents the fully redundant portion of the ordinary residuals, and these redundancies are especially acute when k is large relative to n. Unfortunately, no accounting for such anomalies is in current vogue. In the next section we preempt these and other difficulties on extracting exactly n − k − 1 constructs having the required properties.

2.2 The recovered errors

Using properties of linear statistics and letting t = n − k − 1, we recover elements {R₁, ..., R_t} from the ordinary residuals {e₁, ..., e_n} as linear functions whose coefficients depend on the matrix projecting observations onto the error space. An algorithm is based on the spectral decomposition

$B_n = \sum_{i=1}^{n} \xi_i q_i q_i' = Q D_\xi Q'$

of B_n = I_n − X(X′X)⁻¹X′, such that D_ξ = Diag(ξ₁, ..., ξ_n) contains the ordered eigenvalues and Q = [q₁, ..., q_n] the corresponding eigenvectors of B_n. To these ends, let R₍n₎ = Q′e; partition R₍n₎ as R₍n₎ = [R₍t₎′, R₍k+1₎′]′ with R₍t₎ ∈ R^t and R₍k+1₎ ∈ R^{k+1}; and identify R₍t₎ = [R₁, ..., R_t]′. Their properties are summarized in Proposition 1. We henceforth refer to {R₁, ..., R_t} as the linearly recovered errors, these having properties in t-dimensional space identical to those of {ε₁, ..., ε_n} in n-dimensional space under assumptions A1–A3. Specifically, the linearly recovered errors turn out to be homoscedastic, uncorrelated, and nonsingular, precisely as required to validate the use of standard diagnostics for normality. These developments are summarized for later reference as follows.

Proposition 1. Let R₍n₎ = [R₍t₎′, R₍k+1₎′]′ = Q′e with Q from the spectral decomposition $B_n = \sum_{i=1}^{n} \xi_i q_i q_i' = Q D_\xi Q'$ of B_n = I_n − X(X′X)⁻¹X′. Then under assumptions A1–A3 with t = n − k − 1, the linearly recovered errors R₍t₎ = [R₁, ..., R_t]′ are homoscedastic, nonsingular, and uncorrelated, whereas under assumptions A1–A4, the variables {R₁, ..., R_t} comprise a simple random sample of size t = n − k − 1 from N₁(0, σ²), independently of X.
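As a computational illustration of this construction, a minimal sketch follows. It is ours, not the authors' (the paper's calculations used SAS); the function name recovered_errors and the use of numpy.linalg.eigh are our own choices, and X is assumed to carry an intercept column.

```python
import numpy as np

def recovered_errors(X, y):
    """Return (ordinary residuals e, linearly recovered errors R_(t))."""
    n, p = X.shape                              # p = k + 1 columns, intercept included
    H = X @ np.linalg.solve(X.T @ X, X.T)       # hat matrix X(X'X)^{-1}X'
    B = np.eye(n) - H                           # B_n, idempotent of rank t = n - p
    e = B @ y                                   # ordinary residuals
    vals, Q = np.linalg.eigh(B)                 # spectral decomposition of B_n
    Q1 = Q[:, np.argsort(vals)[::-1][: n - p]]  # eigenvectors with unit eigenvalues
    R = Q1.T @ e                                # linearly recovered errors, length t
    return e, R
```

Under A1–A4 the vector R returned above may be passed to an i.i.d.-based test such as scipy.stats.shapiro, whereas the residual vector e may not, for the reasons given in Section 2.1.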

Proof. Invoke assumptions A1–A3 to infer that E(e) = 0 and V(e) = σ²B_nI_nB_n = σ²B_n, since B_n is idempotent. With B_n = QD_ξQ′ and R₍n₎ = Q′e, it follows that E(R₍n₎) = Q′0 = 0 and V(R₍n₎) = σ²Diag(ξ₁, ..., ξ_n), with {ξ₁ ≥ ξ₂ ≥ ··· ≥ ξ_n} as the ordered eigenvalues of B_n. But since B_n is idempotent of rank t = n − k − 1, we infer that n − k − 1 eigenvalues are unity and the remainder are zero, so that V(R₍n₎) = σ²Diag(I_t, 0). The linearly recovered errors {R₁, ..., R_t} thus have properties on R^t identical to those of ε = [ε₁, ε₂, ..., ε_n]′ ∈ R^n under assumptions A1–A3, whereas the elements of R₍k+1₎ are degenerate at 0 ∈ R^{k+1} and accordingly are dropped from further consideration. ∎

2.3 Non-uniqueness of recovered errors

All linearly recovered errors are in the class of linear unbiased scaled (LUS) residual estimators; see Theil (1965). These estimators are not unique. Since the eigenvalues of B_n are not distinct, with unity appearing t = n − k − 1 times, the matrix Q is not unique. Write Q = [Q₁, Q₂] with Q₁ of rank t and Q₂ of rank n − t. Given a choice for Q₁′, any orthogonal rotation P′Q₁′ also will produce a LUS estimator for the disturbances. Theil (1965) suggests using the matrix P′Q₁′ that minimizes the expected squared length of the transformed residuals, to be called the BLUS residuals. Other criteria are required in testing for normality, however. The methodology we employ is that of finding the linearly recovered errors which have a large kurtosis estimate. We show how to achieve this in the next section.

3 Kurtosis estimates

We have noted that the skewness and kurtosis ratios of the ordinary residuals are closer to normal values than are those of nonnormal disturbances, slanting the diagnostics towards normality. Here we settle nonuniqueness of the recovered errors on maximizing their kurtosis, thereby slanting the diagnostics away from normality.

3.1 Kurtosis criterion

We assume that the linearly recovered errors {R₁, ..., R_t} have been determined from the matrix Q′ as in Section 2.2. For any orthogonal matrix P of order t × t, the matrix P′Q₁′ = A′ also will transform the ordinary residuals into a nonsingular vector of linearly recovered errors. The kurtosis γ₂ for the BLUS residuals has been given by Huang and Bolch (1974); see also Misra (1972). The result, which extends to include the LUS residuals, is (where we correct the typographical error)

$\gamma_2(R_i) = 3 + (\gamma_2(\varepsilon_i) - 3)\sum_{j=1}^{n} a_{ji}^4, \quad 1 \le i \le t.$    (3)

Our criterion is to consider $C(P) = \sum_{i=1}^{t} \gamma_2(R_i)$ as P is varied, eventually choosing P so as to maximize $\Phi(P) = \sum_{i=1}^{t}\sum_{j=1}^{n} a_{ji}^4$. This maximization is available with standard software since it is equivalent to the Kaiser "raw" varimax criterion in factor analysis; see, for example, Harman (1976, p. 290). The varimax criterion in turn is used to find the orthogonal matrix P which will maximize

$\frac{1}{n}\sum_{i=1}^{t}\sum_{j=1}^{n} a_{ji}^4 \;-\; \frac{1}{n^2}\sum_{i=1}^{t}\Bigl(\sum_{j=1}^{n} a_{ji}^2\Bigr)^{2}$    (4)

where A = Q₁P. We now note that the second term above is the constant t/n² since A′A = I_t.

3.2 The methodology

For a given data set, first compute the ordinary leverages h_ii = x_i(X′X)⁻¹x_i′, for 1 ≤ i ≤ n. Following Ramsey (1969, p. 359) and Huang and Bolch (1974, p. 332), we sort the data in ascending order based on the leverages. Those authors choose the k + 1 disturbances that will not be represented in the BLUS residuals based on the k + 1 smallest elements on the diagonal of the matrix B_n = I_n − X(X′X)⁻¹X′; equivalently, the k + 1 largest ordinary leverages. The BLUS residuals found on using the k + 1 smallest and the k + 1 largest leverages are denoted by BLUS₁ and BLUS₂, respectively.

The order of the data does affect the linearly recovered errors. We find the linearly recovered errors R_H = Q′_{H1}Y from the eigenvectors Q_H of H_n = X(X′X)⁻¹X′, and the linearly recovered errors R_B = Q′_{B1}Y from the eigenvectors Q_B of B_n = I_n − H_n. The first method transforms into zero the residuals corresponding to the k + 1 values with low leverages, and the second transforms into zero the residuals corresponding to the k + 1 values with high leverages. We now use a varimax rotation to find the matrices P_H and P_B that maximize Φ(·) for Q′_{H1} and Q′_{B1}, respectively. With A′_H = P′_H Q′_{H1} and A′_B = P′_B Q′_{B1}, we determine two sets of recovered errors, R_{A′H} = A′_H Y and R_{A′B} = A′_B Y, each having maximal kurtosis, which we now can test for normality without violating the important assumptions required. Note also that the rotation matrices are independent of the responses Y. The recovered errors A′_H Y and A′_B Y with varimax rotation correspond, respectively, to the BLUS₁ and BLUS₂ residuals without rotation. For comparative purposes, we consider all of the foregoing transformations. In practice, a user will choose one of these under guidelines to be given subsequently.
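A minimal sketch of the rotation step is given below. It is our own illustration, not the authors' code: the helper varimax_rotation implements the standard SVD-based orthomax iteration commonly used for the Kaiser varimax criterion, and the names Q1, e, and R_rot are assumed to come from a construction like the earlier sketch.

```python
import numpy as np

def varimax_rotation(Phi, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonal P intended to maximize the (raw) varimax criterion of Phi @ P."""
    n, t = Phi.shape
    P, obj = np.eye(t), 0.0
    for _ in range(max_iter):
        obj_old = obj
        L = Phi @ P                              # current rotated loadings A = Q1 P
        u, s, vh = np.linalg.svd(
            Phi.T @ (L**3 - (gamma / n) * L @ np.diag(np.sum(L**2, axis=0))))
        P = u @ vh                               # orthogonal update
        obj = s.sum()
        if obj_old != 0 and obj / obj_old < 1 + tol:
            break                                # criterion has stopped increasing
    return P

# Assumed available from the earlier sketch: e (ordinary residuals) and the
# n x t eigenvector matrix Q1 of B_n.  The rotation depends on X only, not on Y.
#   P = varimax_rotation(Q1)
#   A = Q1 @ P                    # A'A = I_t
#   R_rot = A.T @ e               # rotated recovered errors with large kurtosis
#   scipy.stats.shapiro(R_rot)    # i.i.d.-based normality test now applicable
```

Because A′A = I_t, the raw varimax criterion (4) and the plain quartic criterion Φ(P) differ only by the constant t/n², so maximizing either yields the same rotation.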


When the residuals are from a multivariate linear regression, we use the Mardia skewness test for normality (Mardia and Zemroch (1975)) and the invariant consistent test of Henze and Zirkler (1990) to determine whether the recovered vector errors are indeed "normal."
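For the multivariate case, a minimal sketch of Mardia's skewness statistic is shown below; it is our own Python illustration (not the authors' SAS code), using the standard asymptotic chi-squared reference distribution. Implementations of the Henze-Zirkler test exist in third-party packages (for example, pingouin's multivariate_normality) and are not reproduced here.

```python
import numpy as np
from scipy import stats

def mardia_skewness_test(R):
    """Mardia's multivariate skewness test for a t x p matrix of vector errors R."""
    n, p = R.shape
    Z = R - R.mean(axis=0)                  # center each column
    S = (Z.T @ Z) / n                       # MLE covariance, as in Mardia's formulation
    G = Z @ np.linalg.solve(S, Z.T)         # G[i, j] = z_i' S^{-1} z_j
    b1p = np.sum(G**3) / n**2               # sample skewness measure b_{1,p}
    chi2 = n * b1p / 6.0                    # asymptotically chi-squared
    df = p * (p + 1) * (p + 2) // 6
    return b1p, chi2, stats.chi2.sf(chi2, df)
```

Applied to the t x p matrix of recovered vector errors, a small p-value from this statistic points against multivariate normality of the disturbances.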

3.3 Case studies: Revisited

Returning to the example of Section 1.2, we apply the foregoing methodology to the discoloration of canned applesauce during storage. Accordingly, we rework the usual tests using the n − k − 1 = 44 linearly recovered errors as prescribed in Proposition 1. The p-values are those from the Shapiro-Wilk test for normality. The p-value for the ordinary least squares residuals e is 0.4568, and for the rotated transformed residuals A′_H Y and A′_B Y the p-values are 0.1219 and 0.0262, respectively. As expected, the rotated models are better able to reveal the nonnormality of the linearly recovered errors, with the residuals R_{A′B} = A′_B Y having a p-value of 0.0262.

4 Other case studies

Consider actual market returns for stocks as related to corresponding accounting rates. For each of n = 54 companies the mean yearly market return (Y) and the mean yearly accounting rate (X) were determined for the period 1959–1974. Simple linear regression analysis then gave the best-fitting line as reported in Myers (1990, p. 16ff.), along with the ordinary residuals. The normal probability plot for the ordinary residuals is not linear, and their histogram suggests a distribution of gamma type. In testing for normality with n = 54, the Anderson-Darling test gave a p-value of 0.000 to three decimals, whereas the Kolmogorov-Smirnov test gave an approximate p-value < 0.01. For details and further references regarding these tests, see D'Agostino (1982). The accounting data thus have failed tests for normality using the ordinary residuals. Using linearly recovered errors, the Shapiro-Wilk test has p-value = 0.0002 for both sets of recovered errors R_{A′H} = A′_H Y and R_{A′B} = A′_B Y. Thus we have shown the nonnormality of these data without violating the assumptions of the Shapiro-Wilk test.

The linearly recovered errors also can be applied to multivariate models by noting that Equation 1 remains valid with Y, β, and ε now matrices with column rank p > 1, the number of dependent variables. To illustrate, we next consider a data set from Timm (1975, p. 314) having p = 3 y-variables as scores on achievement tests, and k = 5 regressor variables as scores on five other tests. The original ordinary residuals appear to be multivariate normally distributed, with p-values of 0.6695 and 0.7055 for the Mardia skewness test and the Henze-Zirkler test, respectively. However, when the ordinary residual vectors are transformed into the recovered errors R_B, the p-values are found to be 0.0001 and 0.0036 for the Mardia skewness test and the Henze-Zirkler test, respectively. Once again, the user is cautioned against misapplying tests for normality, which are derived for random samples, to data sets comprising the ordinary residuals, which are known to be singular, heteroscedastic, and correlated.


Table 1. Estimated Type I error for the Monte Carlo experiment (Shapiro-Wilk W)

  n    Residuals    α = .10    α = .05    α = .01
  10   BLUS₁          .101       .048       .007
       A′_H Y         .104       .049       .007
       BLUS₂          .104       .043       .004*
       A′_B Y         .101       .053       .011
       OLS            .090       .042       .007
  30   BLUS₁          .102       .050       .010
       A′_H Y         .118       .056       .016
       BLUS₂          .100       .038       .013
       A′_B Y         .117       .056       .016
       OLS            .114       .046       .009
  50   BLUS₁          .102       .046       .008
       A′_H Y         .083       .031       .009
       BLUS₂          .105       .041       .013
       A′_B Y         .091       .043       .009
       OLS            .093       .042       .009

5 Results of a Monte Carlo experiment

Huang and Bolch (1974) conducted a Monte Carlo experiment to determine the power of Theil's BLUS residuals in detecting nonnormal disturbances. We follow their protocol in using an experimental design matrix of order n × 4, with the three predictors {X_i1, X_i2, X_i3} having been drawn from a uniform distribution on [0, 1]. The responses are given by Y_i = −20.0 + 4.5X_i1 − 1.5X_i2 + 2.8X_i3 + ε_i, for i = 1, 2, ..., n, with sample sizes n = 10, 15, 20, 30, 40, and 50. The design matrix X is held fixed for experiments with the same n. The errors ε_i are generated first from a normal distribution and then from a shifted exponential distribution, both with μ = 0 and σ = 8.2. The experiment was replicated N = 1000 times using SAS.

Table 1 shows the estimated Type I error probabilities using the Shapiro-Wilk W statistic. The sample sizes shown are n = 10, 30, and 50, and are representative of the other sample sizes. The three α levels are 0.10, 0.05, and 0.01. Entries in the table marked with an asterisk are values beyond two standard deviations from the nominal α level. Tables 2 and 3 record the proportion of trials in N = 1000 for which the nonnormal residuals were correctly identified at the α = 0.01 and 0.05 levels.
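A sketch of this protocol is given below. It is our Python reimplementation, not the authors' SAS program; the seed, the sample size shown, and the use of the OLS residuals in the test are illustrative. Substituting the recovered errors from the sketches of Sections 2.2 and 3.2 reproduces the remaining rows of the tables.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, N, alpha, sigma = 20, 1000, 0.05, 8.2

# Fixed design: intercept plus three Uniform(0, 1) predictors, held fixed over replications.
X = np.column_stack([np.ones(n), rng.uniform(size=(n, 3))])
beta = np.array([-20.0, 4.5, -1.5, 2.8])

rejections = 0
for _ in range(N):
    eps = rng.exponential(scale=sigma, size=n) - sigma   # shifted exponential: mean 0, sd sigma
    # eps = rng.normal(0.0, sigma, size=n)               # normal case, for Type I error instead
    y = X @ beta + eps
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]     # ordinary residuals
    if stats.shapiro(e)[1] < alpha:                      # Shapiro-Wilk applied to the residuals
        rejections += 1
print("estimated power:", rejections / N)
```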


Table 2. Estimated power at the 1 percent level with 1000 replications (Shapiro-Wilk W)

  Residuals   n = 10   n = 15   n = 20   n = 30   n = 40   n = 50
  BLUS₁        .014     .066     .147     .378     .653     .808
  A′_H Y       .045*    .117     .220     .517     .722     .881
  BLUS₂        .029     .099     .189     .447     .688     .831
  A′_B Y       .037*    .116     .209     .518     .709     .871
  OLS          .033     .141     .273     .618     .851     .955

Table 3. Estimated power at the 5 percent level with 1000 replications (Shapiro-Wilk W)

  Residuals   n = 10   n = 15   n = 20   n = 30   n = 40   n = 50
  BLUS₁        .070     .158     .264     .561     .786     .889
  A′_H Y       .124*    .264     .405     .686     .848     .932
  BLUS₂        .080     .205     .343     .616     .803     .918
  A′_B Y       .127*    .263     .379     .684     .843     .931
  OLS          .121     .293     .473     .789     .946     .985

As anticipated from the work of Ramsey (1969) and Huang and Bolch (1974), BLUS₂ did outperform BLUS₁. Our recovered-errors procedure had uniformly more power than the BLUS statistics over all ranges. The table entries where our recovered errors had more power than the ordinary least-squares residuals are marked with an asterisk; these cases occur with n = 10.

Using the paired t-test to compare the p-values for the BLUS₁ and BLUS₂ residuals did not show any statistically significant differences at the α = 0.05 level when the disturbances were normally distributed. For example, with n = 20 the means of the 1000 p-values for the BLUS₁ and BLUS₂ residuals were 0.502 and 0.510, respectively. The paired t-test showed no significant difference, with p-value = 0.471. Comparing the p-values for the A′_H Y and A′_B Y residuals also showed no significant differences at the α = 0.05 level when the disturbances were normally distributed. For example, with n = 20 the means of the 1000 p-values for the A′_H Y and A′_B Y residuals were 0.513 and 0.508, respectively, with p-value = 0.601 from the paired t-test.

However, when the disturbances were generated from a shifted exponential distribution, there were statistical differences between the BLUS residuals when n was small. For n = {10, 15, 20, 30, 40, 50}, the paired t-test gave p-values = {0.019, 0.001, 0.000, 0.118, 0.159, 0.733}, respectively. Our recovered-errors procedure, however, showed no significant differences at the α = 0.05 level, with p-values = {0.063, 0.199, 0.590, 0.839, 0.548, 0.536}, respectively. Thus the ordering of the data does affect the p-values for the BLUS residuals, whereas we observed no such effect with our recovered errors in these computer simulations. And so either the A′_H Y residuals or the A′_B Y residuals can be used.

6 Nonstandard models

Developments to this point presuppose assumptions A1–A3 in assessing diagnostics for A4. Clearly any of these assumptions may fail, and it remains to study effects of such failure on normal diagnostics using linearly recovered errors. In particular, we study normal diagnostics under structured correlations in lieu of A3, and under certain star-shaped mixtures in lieu of A4. These choices are shaped by commonly occurring, albeit seldom recognized, structural correlations induced by the very process of taking measurements. Details follow.

6.1 Correlated errors

We have seen that correlations among the observed residuals derive from the matrix of regressor variables even when the disturbances {ε₁, ε₂, ..., ε_n} are uncorrelated, and that the misuse of singular, heteroscedastic, and correlated residuals may completely undermine standard diagnostics for normality. Often the disturbances {ε₁, ε₂, ..., ε_n} are themselves correlated in a structured manner, and it remains to examine effects of such correlations on normal diagnostics.

Origins of correlated data in practice are many, including the following. Data collected in science and engineering typically entail the use of calibrated instruments. Moreover, the calibration itself is often subject in practice to experimental errors of calibration. An analysis of the propagation of errors then shows that observations {Y₁, ..., Y_n}, when subject to a common calibration error, typically are dependent and often are equicorrelated with parameter ρ. Although pervasive, this problem generally has been overlooked by statisticians, who typically are neither engineers nor scientists, and by scientists and engineers, who seldom are statisticians. For a treatment of further consequences of this oversight see Jensen (1996).


Additional effects on normal diagnostics due to equicorrelated model disturbances are simple. There are none. The principal findings may be summarized as follows, where J(ρ) = σ²((1 − ρ)I_n + ρ1_n1_n′).

Proposition 2. Consider the model (1) together with the assumptions A1: E(ε_i) = 0, 1 ≤ i ≤ n; A2: Var(ε_i) = σ², 1 ≤ i ≤ n; and the revised A3′: ρ(ε_i, ε_j) = ρ, i ≠ j = 1, 2, ..., n. Then the linearly recovered errors {R₁, ..., R_t} are homoscedastic, nonsingular, and uncorrelated. In particular, if L(Y) = N_n(Xβ, J(ρ)), then {R₁, ..., R_t} comprise a simple random sample from N₁(0, σ²(1 − ρ)).

Proof. Rewrite the model (1) as Y = Xβ + ε with X = [1_n, X₂] in partitioned form. Without loss of generality, we may take X₂ to be centered such that X₂′1_n = 0. Again under assumptions A1, A2, and A3′ we find that E(e) = 0 = E(R₍n₎) with R₍n₎ = Q′e as before. We further find that the dispersion matrix of the ordinary residuals is

$V(e) = B_n J(\rho) B_n = \sigma^2 B_n((1-\rho)I_n + \rho 1_n 1_n')B_n = \sigma^2((1-\rho)B_n + \rho B_n 1_n 1_n' B_n) = \sigma^2(1-\rho)B_n,$

since B_n is idempotent and the second term vanishes, B_n1_n being the zero vector. ∎

In summary, no additional difficulties are encountered in normal diagnostics owing to equicorrelations among the unobserved disturbances. The ordinary residuals have precisely the same correlation, joint singularity, and heteroscedastic structure as before, and the linearly recovered errors again remedy those problems. Complications arising from equicorrelations induced through calibrated instruments thus do not carry over to diagnostics for normality. We turn next to prospects for detecting nonnormality when the underlying distribution is in fact a star-shaped mixture. Origins of such mixtures may be explained as follows.

6.2 Star-shaped mixtures

Mixture distributions arise in practice in a variety of circumstances, including the use of calibrated devices for which the induced correlations vary randomly from calibration to calibration, as often may be the case. Such mixtures have distinctive features. For correlation mixtures of n-dimensional normal errors, the density function f(x) may be described as symmetric star-unimodal for reasons to follow. A set S ⊂ R^n containing 0 ∈ R^n is said to be symmetric and star-shaped about 0 ∈ R^n if x ∈ S implies −x ∈ S and if, for every x ∈ S, the line segment joining 0 to x is in S. A function f(x) > 0 is called symmetric star-unimodal about 0 ∈ R^n if and only if for t > 0, its level sets B_t = {x ∈ R^n : f(x) > t} are either symmetric star-shaped about 0 ∈ R^n, or they are empty.

Normal mixtures on R^n, as treated in Proposition 3, emerge in practice through calibration, for example, as noted. To be precise, let J(ρ) = σ²((1 − ρ)I_n + ρ1_n1_n′) be an equicorrelation matrix with c(n) < ρ < 1 and c(n) = −(n − 1)⁻¹ as before; denote by G₀ the class of cumulative distribution functions on (c(n), 1); designate as g(x; μ, Σ) the density function for N_n(μ, Σ); and denote by GM(μ) the class of normal correlation mixtures whose typical density function takes the form


$f(x;\, \mu,\, G) = \int_{c(n)}^{1} g(x;\, \mu,\, J(\rho))\, dG(\rho)$    (5)

with G(·) ∈ G₀. It can be shown that every f(x; 0, G) in GM(0) as in Equation 5 is symmetric star-unimodal about 0 ∈ R^n; for further details see Jensen (1996). It remains to determine whether normal diagnostics in regression are able to discern between star-shaped mixtures and normal errors. For equicorrelated normal mixtures, the answer is negative, as summarized in the following.

Proposition 3. Suppose that errors of the model (1) are generated as a mixture of equicorrelated normal errors with density as in Equation 5 with μ = 0. Then the linearly recovered errors R₍t₎ = [R₁, ..., R_t]′ are homoscedastic, nonsingular, and uncorrelated for all such mixtures in GM(0).

Proof. Proposition 2 applies conditionally given the value of the mixing parameter ρ in Equation 5. As the conditional properties of e and R₍t₎ hold independently of ρ by Proposition 2, the conclusion holds unconditionally as well, thus completing the proof. ∎
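The following sketch illustrates Propositions 2 and 3 numerically. It is ours, not the authors'; the uniform mixing distribution for ρ, the seed, and all dimensions are illustrative choices. With ρ re-drawn on each replication, the Shapiro-Wilk test applied to the recovered errors rejects at roughly its nominal rate, so the star-shaped mixture is invisible to the diagnostic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, sigma = 40, 4, 1.0                                # p = k + 1 columns, intercept included
X = np.column_stack([np.ones(n), rng.uniform(size=(n, p - 1))])
B = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)       # residual projector B_n
vals, Q = np.linalg.eigh(B)
Q1 = Q[:, np.argsort(vals)[::-1][: n - p]]              # eigenvectors with unit eigenvalue

N, rejections = 1000, 0
for _ in range(N):
    rho = rng.uniform(0.0, 0.9)                         # mixing distribution G, a subset of (c(n), 1)
    J = sigma**2 * ((1 - rho) * np.eye(n) + rho * np.ones((n, n)))
    eps = rng.multivariate_normal(np.zeros(n), J)       # equicorrelated disturbances
    R = Q1.T @ (B @ eps)                                # recovered errors from the residuals
    if stats.shapiro(R)[1] < 0.05:
        rejections += 1
print("rejection rate:", rejections / N)                # close to 0.05: mixture undetected
```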

7 Conclusions

Diagnostics for normal errors in regression are subject to misuse when applied to singular, heteroscedastic, and correlated residuals. Case studies show that such misuse may undermine substantially the intent of these diagnostics and should be discontinued. Fortunately, these difficulties may be surmounted on using linearly recovered errors amenable to standard regression software packages.

Performance characteristics of normal diagnostics are examined under nonstandard conditions, including equicorrelated disturbances and disturbances arising as correlation mixtures of equicorrelated normal variates. These choices are shaped by structural correlations induced by the very process of taking measurements using calibrated instruments. It is seen that normal diagnostics based on ordinary residuals, including the linearly recovered errors, cannot distinguish between normal and star-shaped distributions of disturbances arising as correlation mixtures. Grossly nonnormal error distributions may go undetected, thus beclouding the role of normal diagnostics in the choice of normal-theory inferences in regression. This is troubling, and a further assessment follows.

In straight-line models, Jensen (1996) has shown that equicorrelated disturbances, and disturbances arising as mixtures of equicorrelated normal variates, adversely affect inferences about the intercept. Pathologies include inconsistency in estimation and often grossly altered levels of the usual tests. On the other hand, positive correlations and their mixtures have a salutary effect on normal-theory inferences regarding the slope of the line, resulting in more efficient estimation, in preserving levels of tests, and in uniformly greater power. Details are given in Jensen (1996). Methods employed in this paper apply without further difficulty to include the multilinear model of Equation 1. In such circumstances, the inability of normal diagnostics to distinguish
between normal and star-shaped mixtures, as distributions of disturbances, is not critical to the ensuing data analysis regarding slopes.

We have addressed concerns that skewness and kurtosis ratios for ordinary residuals are closer to normal values than are those of nonnormal model disturbances. On the other hand, the skewness and kurtosis ratios for the Gauss-Markov estimators also are closer to normal values than are those of nonnormal model disturbances. These facts in turn would appear to render somewhat less critical the slant towards normality of the recovered errors.

References

[1] Anderson TW, Darling DA (1954) A test of goodness of fit. J. Amer. Statist. Assoc. 49:765–769
[2] D'Agostino RB (1982) Departures from normality, tests for. In: Johnson NL, Kotz S, Read CB (eds.) Encyclopedia of statistical sciences, volume 2. John Wiley, New York, pp. 315–324
[3] Draper W (1965) Effect of carbohydrate degradation upon browning and corrosion in canned applesauce. Ph.D. Dissertation Library, Virginia Polytechnic Institute, Blacksburg, VA 24061
[4] Harman H (1976) Modern factor analysis. The University of Chicago Press, Chicago
[5] Henze N, Zirkler B (1990) A class of invariant consistent tests for multivariate normality. Commun. Statist. - Theory Meth. 19:3595–3617
[6] Huang CJ, Bolch BW (1974) On the testing of regression disturbances for normality. J. Amer. Statist. Assoc. 69:330–335
[7] Jensen DR (1996) Straight-line models in star-shaped mixtures. Metrika 44:101–117
[8] Mardia KV, Zemroch PJ (1975) Algorithm AS 84: Measures of multivariate skewness and kurtosis. J. Roy. Statist. Soc. C 24:262–265
[9] Misra PN (1972) Relationship between Pearsonian coefficients of distributions of least squares estimators and the disturbance term. J. Amer. Statist. Assoc. 67:662–663
[10] Moore DS (1982) The effect of dependence on chi-squared tests of fit. Ann. Statist. 10:1163–1171
[11] Moore DS, Gleser LJ (1983) The effect of dependence on chi-squared and empirical distribution tests of fit. Ann. Statist. 11:1100–1108
[12] Mukantseva LA (1977) Testing normality in one-dimensional and multi-dimensional linear regression. Theory Prob. Applic. 22:591–602
[13] Myers RH (1990) Classical and modern regression with applications, second edition. PWS-Kent Publishing Company, Boston
[14] Pierce DA, Gray RJ (1982) Testing normality of errors in regression models. Biometrika 69:233–236
[15] Pierce DA, Kopecky KJ (1979) Testing goodness of fit for the distribution of errors in regression models. Biometrika 66:1–5
[16] Ramsey JB (1969) Tests for specification errors in classical linear least-squares regression analysis. J. Royal Statist. Soc. Series B 31:350–371
[17] Shapiro SS, Wilk MB (1965) An analysis of variance test for normality (complete samples). Biometrika 52:591–612
[18] Snee RD, Pfeifer CG (1983) Graphical representation of data. In: Johnson NL, Kotz S, Read CB (eds.) Encyclopedia of statistical sciences, volume 3. John Wiley and Sons, Inc., New York, pp. 488–511
[19] Theil H (1965) The analysis of disturbances in regression analysis. J. Amer. Statist. Assoc. 60:1067–1079
[20] Timm NH (1975) Multivariate analysis with applications in education and psychology. Brooks/Cole Publishing Co., Monterey, California
[21] White H, MacDonald GM (1980) Some large-sample tests for nonnormality in the linear regression model. J. Amer. Statist. Assoc. 75:16–28