BIASES AND STANDARD ERRORS OF ... - Springer Link

0 downloads 0 Views 310KB Size Report
regression coefficient is zero. The sample standardized regression coefficients are also biased in general, although it should not be a concern in practice when ...
PSYCHOMETRIKA — VOL . 76, NO . 4, O CTOBER 2011 DOI : 10.1007/ S 11336-011-9224-6

670–690

BIASES AND STANDARD ERRORS OF STANDARDIZED REGRESSION COEFFICIENTS

K E -H AI Y UAN UNIVERSITY OF NOTRE DAME

WAI C HAN THE CHINESE UNIVERSITY OF HONG KONG The paper obtains consistent standard errors (SE) and biases of order O(1/n) for the sample standardized regression coefficients with both random and given predictors. Analytical results indicate that the formulas for SEs given in popular text books are consistent only when the population value of the regression coefficient is zero. The sample standardized regression coefficients are also biased in general, although it should not be a concern in practice when the sample size is not too small. Monte Carlo results imply that, for both standardized and unstandardized sample regression coefficients, SE estimates based on asymptotics tend to under-predict the empirical ones at smaller sample sizes. Key words: asymptotics, bias, consistency, Monte Carlo.

1. Introduction Linear regression is typically the first model introduced in statistical text books. It is also one of the most widely used statistical methods across all disciplines. Statistical inference for regression models includes obtaining a confidence interval for each regression coefficient and performing an F -test for the overall significance of all the explanatory variables. There also exist various diagnostic tools for properly applying regression models. In addition to the regression coefficients, the standardized regression coefficients are also of substantial interest in practice (Kelley and Maxwell, 2003). This is especially true in social sciences where variables are typically measured in different and arbitrary units. In the psychometric literature, standardized regression coefficients are called Beta-coefficients while the conventional regression coefficients are called B-coefficients. The importance of the standardized regression coefficients can be testified by its coverage in popular statistical textbooks (e.g., Cohen, Cohen, West, & Aiken, 2003; Harris, 2001; Hays, 1994) as well as most widely used statistical software (e.g., SAS, SPSS). However, our knowledge of standardized regression coefficients is very limited. For example, there does not exist a formula to properly estimate the standard errors (SEs) of the sample standardized regression coefficients. We also do not know whether the sample standardized coefficients are biased or not. The purpose of this paper is to study SEs and biases of the sample standardized regression coefficients. In regression analysis, the most widely used statistic is probably the t-statistic for testing the null hypothesis that a regression coefficient is zero. Under the assumptions of given predictors and normally distributed errors, the t-statistic follows a Student t-distribution. However, data in social sciences are seldom normally distributed (see Micceri, 1989) and the predictors are typically not controllable. Nevertheless, linear regression is frequently used in social science research when studying the relationship of a response with potential predictors. Thus, random predictors This research was supported by Grants DA00017 and DA01070 from the National Institute on Drug Abuse. Requests for reprints should be sent to Ke-Hai Yuan, University of Notre Dame, Notre Dame, IN 46556, USA. E-mail: [email protected]

© 2011 The Psychometric Society

670

KE-HAI YUAN AND WAI CHAN

671

and nonnormally distributed data are not assumptions but the reality faced by many social science researchers. It is necessary to study their effect on statistical inference of the regression model. As we shall see, regardless of the distribution of the sample, the commonly used SE estimates for the B-coefficients are still consistent. Actually, a Beta-coefficient is zero if and only if the corresponding B-coefficient is zero. If the interest is to infer whether a Beta-coefficient is zero, one can just perform a standard Student t-test on the B-coefficient, which will yield a fairly accurate conclusion when the sample size is not too small. There is no need to estimate the SEs of the sample Beta-coefficients. However, when a predictor is chosen by a researcher, it is unlikely for the Beta or B-coefficient to be zero. A nonsignificant t-statistic is most likely due to a not large enough sample size. A more interesting question is how accurate a Beta-coefficient is, as characterized by a confidence interval. Reporting confidence interval was strongly recommended in the Guidelines by Wilkinson and the American Psychological Association (APA) Task Force on Statistical Inference (1999). In a section on hypothesis tests, the Guidelines (p. 599) state “It is hard to imagine a situation in which a dichotomous accept-reject decision is better than reporting an actual p value or, better still, a confidence interval.” Such a viewpoint was more clearly articulated by Kelley and Maxwell (2003, p. 306): Many times obtaining a statistically significant parameter estimate provides a research community with little new knowledge of the behavior of a given system. However, obtaining confidence intervals that are sufficiently narrow can help lead to a knowledge base that is more valuable than a collection of null hypotheses that have been rejected or that failed to reach significance, given that the desire is to understand a particular phenomenon, process, or system. In addition to confidence intervals, effect sizes or standardized coefficients are also emphasized by the Guidelines of the APA Task Force (p. 599): “Always present effect sizes for primary outcomes.” The importance of confidence intervals and effect sizes has also been echoed by researchers in other fields (Nakagawa & Cuthill, 2007; Thompson, 2001). Because SEs are key components in constructing confidence intervals for standardized regression coefficients, formulas for estimating the SEs of the sample standardized coefficients have been given in popular textbooks (e.g., Cohen et al., 2003, p. 86; Harris, 2001, p. 80; Hays, 1994, p. 709). Hays (1994, p. 708) also contains an equation stating that the sample standardized regression coefficients are unbiased. However, as we shall see, the SEs using formulas provided in these textbooks are not consistent even when the explanatory variables are given. The sample standardized coefficients are also biased although the biases disappear as the sample size goes to infinity. SEs for standardized estimates in covariance structure analysis (CSA) are also of substantial interest. Jamshidian & Bentler (2000) studied four approaches for obtaining consistent SE estimates and illustrated their performances with examples. Although the regression model can be regarded as a special case of covariance structure models, the results in Jamshidian & Bentler (2000) do not relate the SEs to the underlying distribution of the predictors and errors in the regression model. They also do not foster the understanding of the inconsistency of the SEs for Beta-coefficients provided in textbooks. Actually, there exist several differences between the regression model and CSA with respect to standardized parameter estimates: (first) A correlation structure model can be formulated as a covariance structure model and consistent SEs for standardized parameter estimates can be obtained when fitting the sample covariance matrix using a proper estimation method (Jennrich, 1974; Lee, 1985); (second) certain sets of parameter estimates and their SEs in CSA may possess an invariance property whether the covariance matrix or the correlation matrix is analyzed (Browne, 1982; Cudeck, 1989); (third) SEs in CSA may depend on how the scale of each latent variable is identified—fixing the variance of the latent variable at 1.0 may lead to different SEs and

672

PSYCHOMETRIKA

z-statistics from those of fixing one of the loadings of its indicators at 1.0 (Gonzalez & Griffin, 2001). None of these properties/problems is carried to the regression model. Our study includes models with stochastic and nonstochastic predictors. We will characterize the effect of distributions of both the predictors and errors on SEs and biases. We review the existing formula for SEs of the sample standardized regression coefficients in Section 2. Section 3 provides formulas for consistent SEs and order O(1/n) biases when predictors are both stochastic and nonstochastic. We also show that the distributions of predictors and errors do not asymptotically affect the inference of B-coefficients to make it clear that problems with existing results on biases and SEs of the sample Beta-coefficients are all due to standardization. A real data example on biases and SEs as well as a numerical one on the implications of different SEs to hypothesis testing are provided in Section 4. Section 5 contains Monte Carlo simulation results comparing formula-based biases and SEs against their empirical counterparts. Discussion and conclusion are provided at the end of the paper. Because the details leading to biases and SEs are not of direct interest to many readers of this journal, they are give in Appendices A to E on the web at www.nd.edu/~kyuan/betacoefficient/AppendicesA-E.pdf.

2. An Existing Formula of SEs for the Sample Standardized Regression Coefficients Let the regression model be y i = α + β  xi + ei ,

i = 1, 2, . . . , n,

(1)

where xi = (xi1 , xi2 , . . . , xip ) , β = (β1 , β2 , . . . , βp ) , ei are independent and identically distributed with E(ei ) = 0 and E(ei2 ) = σ 2 , and xi and ei are independent. Let Sxx , sxy , syy = sy2 be the sample covariances of xi with xi , xi with yi , and yi with yi , respectively. Then   (2) βˆ = βˆ1 , βˆ2 , . . . , βˆp = S−1 xx sxy  1/2 and αˆ = y¯ − βˆ x¯ are the least squares (LS) estimates of the β and α in (1). Let sj = sjj with sjj being the j th diagonal elements of Sxx . The j th standardized coefficient estimate is given by

βˆ∗j =

sj βˆj , sy

(3)

which is just the estimate of the j th regression coefficient when the responses and predictors are standardized. Let x represent the population of the predictors when they are stochastic, and e represent the population of the errors. When the predictors are stochastic, denote 1/2  xx = Cov(x) = (σj k ) and σj = σjj . Obviously, the population counterpart of βˆ∗j is (1)

β∗j =

σj βj , σy

where σy = (β   xx β + σ 2 )1/2 . When the predictors are nonstochastic, we assume the limit of Sxx exists and continue to denote it as Sxx . Then the population counterpart of βˆ∗j is (2)

β∗j =

sj βj , σy

where σy = (β  Sxx β + σ 2 )1/2 . Notice that the meaning of σy for random predictors is different from that for given predictors.

673

KE-HAI YUAN AND WAI CHAN

It is well known that, with given predictors, ˆ = σ 2 S−1 /n. Cov(β) xx

(4)

1/2 ˆ Denote S−1 xx = (aij ) and aj = ajj . Then it follows from (4) that the SE of βj is consistently estimated by

 βˆj ) = a√j σˆ , SE( n

(5)

where  2 1 yi − αˆ − βˆ  xi . n−p−1 n

σˆ 2 =

(6)

i=1

Using (5) and treating sj and sy as constants, one may obtain an SE estimate for βˆ∗j as ˆ j aj σ  tbj = s√ SE , nsy

(7)

where the subscript tb is for textbook because (7) is mathematically equivalent to formulas given in Cohen et al. (2003, p. 86), Harris (2001, p. 80) and Hays (1994, p. 709). However, sy is subject to sampling errors in the context of the regression model and sj is also subject to sampling errors in most applications of regression models in social sciences.  tbj may contain a substantial bias in estimating the As we shall see, the textbook formula SE SE of βˆ∗j . This bias also depends on the distributions of x and y. Because βˆ∗j is not a linear function of yi , we can only analytically study the SE of βˆ∗j using asymptotics. Since all SEs, including those in (5) and (7), approach√ zero as the sample size approaches infinity, our study will focus on the asymptotic variance of nβˆ∗j . Let 2  2tbj = = n SE νˆ nj

sj2 ajj σˆ 2

.

(8)

,

(9)

sy2

When the predictors are stochastic, we have P

2  νnj −→ νj21 =

σj2 cjj σ 2 σy2

P

where −→ denotes convergence in probability and cjj is the j th diagonal element of  −1 xx = C = (cij ). When the predictors are nonstochastic, P

2 −→ νj22 = νˆ nj

sj2 ajj σ 2 σy2

(10)

.

In particular, when p = 1, νj21 = ν12 =

σ2 = 1 − ρ2 σx2 β 2 + σ 2

and νj22 = ν22 =

σ2 , sx2 β 2 + σ 2

where ρ = σx β/(σx2 β 2 + σ 2 )1/2 is the population correlation between x and y.

(11)

674

PSYCHOMETRIKA

3. Standard Errors and Biases of the Sample Standardized Regression Coefficients In this section we will study the bias and SE of βˆ∗j . When xi are stochastic, we will study how the SE and bias of βˆ∗j are affected by the distributions of x and e. When xi are nonstochastic, we will study how the SE and bias of βˆ∗j are affected by the distribution of e. We will also show that the βˆ in (2) is unbiased and its asymptotic SEs are not affected by the distributions of x and y. 3.1. Standard Errors of βˆ∗j with Stochastic Predictors We will use the central limit theorem and the delta-method (Ferguson, 1996) to obtain the asymptotic distribution of βˆ∗j . Let vech(Sxx ) be the vector by stacking the columns corresponding to the lower triangular part of Sxx , σ xy = Cov(x, y), ⎛ ⎞ ⎛ ⎞ vech(Sxx ) vech( xx ) ⎠, ⎠, sxy σ xy s=⎝ σ =⎝ syy σyy μx = (μ1 , μ2 , . . . , μp ) = E(x), κe = E(e4 )/σ 4 be the kurtosis of e, and



  xx = E vech (x − μx )(x − μx ) −  xx vech (x − μx )(x − μx ) −  xx . Notice that s is a vector of sample moments. It follows from the central limit theorem that √

L

n(s − σ ) → N (0, ),

(12)

L

where → denotes convergence in distribution, and ⎛  11  12  = ⎝  21  22 γ 31 γ 32

⎞ γ 13 γ 23 ⎠ γ33

with  11 =  xx , Dp being the duplication matrix (Magnus & Neudecker, 1999, p. 49),



   12 = E vech (x − μx )(x − μx ) −  xx (x − μx )(y − μy ) − σ xy =  xx Dp (β ⊗ I),



 γ 13 = E vech (x − μx )(x − μx ) −  xx (y − μy )2 − σy2 =  xx Dp (β ⊗ β),



   22 = E (x − μx )(y − μy ) − σ xy (x − μx )(y − μy ) − σ xy   = β  ⊗ I Dp  xx Dp (β ⊗ I) + σ 2  xx ,



 γ 23 = E (x − μx )(y − μy ) − σ xy (y − μy )2 − σy2   = β  ⊗ I Dp  xx Dp (β ⊗ β) + 2( xx β)σ 2 ,

2       = β ⊗ β  Dp  xx Dp (β ⊗ β) + 4 β   xx β σ 2 + (κe − 1)σ 4 ; γ33 = E (y − μy )2 − σy2  21 =  12 ,

γ 31 = γ 13 ,

γ 32 = γ 23 .

Let δ j be a p × 1 vector whose j th element is 1.0 and the others are zero. Then (1) = h(σ ) = β∗j

σj  −1 δ  σ xy . σy j xx

675

KE-HAI YUAN AND WAI CHAN

Let σ xx = vech( xx ). Using the chain rule of differentiation, the derivatives of h(σ ) with respect to σ xx , σ xy and σyy are, respectively, h˙ 1 (σ ) =

 βj σj   β ⊗ cj Dp , v − 2σj σy j σy

h˙ 2 (σ ) =

σj  c , σy j

and h˙ 3 (σ ) = −

σj βj , 2σy3

(13)

where v j is a vector of length p(p + 1)/2 whose element corresponding to the position of σjj in   σ xx is 1.0 and 0 elsewhere; and cj =  −1 xx δ j = (c1j , c2j , . . . , cpj ) = (cj 1 , cj 2 , . . . , cjp ) . Denote ˙ ) = (h˙ 1 (σ ), h˙ 2 (σ ), h˙ 3 (σ )). It follows from the delta-method that h(σ   √  (1)  L n βˆ∗j − β∗j → N 0, ωj21 , where ˙ ) h˙  (σ ). ωj21 = h(σ

(14)

¯ (yi − y) ¯ 2 ) . A consistent estimator of ωj21 Let ui = (vech [(xi − x¯ )(xi − x¯ ) ], (xi − x¯ ) (yi − y), is obtained when replacing  by the sample covariance matrix Su of ui and σ by s. When βj = 0, (13) reduces to   h˙ 1 (σ ) = −σj β  ⊗ cj Dp /σy ,

h˙ 2 (σ ) = σj cj /σy ,

and h˙ 3 (σ ) = 0.

(15)

Simplifying (14) with (15) and the expressions for the elements of the  in (12) yields ωj21

=

σj2 cjj σ 2 σy2

(16)

,

 tbj is consistent when βj = 0. which is just the νj21 in (9). Thus, SE Notice that the  in (12) only contains the second- and fourth-order moments of x and y; ˙ ) only contains the second-order moments of x and y. Skewnesses of x and y do not and h(σ appear in ωj21 . Because  xx contains p(p + 1)(p 2 + p + 2)/8 nonduplicated fourth-order moments, it is not so informative to characterize how ωj21 depends on each of these moments. We further examine the dependence of ωj21 on the kurtosis of x = (x1 , x2 , . . . , xp ) when it follows an elliptical distribution with relative kurtosis ηx = E(xj − μj )4 /(3σj4 ), that is, +   xx = 2ηx D+ p ( xx ⊗  xx )Dp + (ηx − 1)σ xx σ xx ,

(17)

4  −1  4 where D+ p = (Dp Dp ) Dp . Notice that all κj = E(xj − μj ) /σj within an elliptical distribution are equal and there exist κj = 3ηx = κx , j = 1, 2, . . . , p. Under (17), Appendix A on the web contains the detail of (14) becoming

ωj21 =

σj2 cjj σ 2 σy2

+

2] ηx βj2 [σj2 (β   xx β) − σyj

σy4

+

(3ηx + κe − 6)σj2 βj2 σ 4 4σy6



σj2 βj2 σ 2 σy4

,

(18)

where σyj = Cov(y, xj ) = β   xx δ j . Since δ j is not proportional to β, it follows from the Cauchy–Schwarz inequality that  2      2 σyj = δ j  xx β < δ j  xx δ j β   xx β = σj2 β   xx β .

(19)

676

PSYCHOMETRIKA

Thus, ωj21 increases when either ηx or κe increases. When ηx = 1 and κe = 3, the ωj21 in (18) reduces to 2 ωnmj 1=

σj2 cjj σ 2 σy2

+

2] βj2 [σj2 (β   xx β) − σj2 σ 2 − σyj

σy4

,

(20)

where the subscript nm is for normal distribution. The first term on the right side of (20) is just 2 defined in (8) is biased even for normally distributed the νj21 given in (9). Consequently, the νˆ nj x and e. This bias depends on the value of β,  xx and σ 2 . According to (19) and (20), the bias is positive when σ 2 is large and negative when σ 2 is small. When p = 1, the ωj21 in (18) becomes ω12 =

[(κx − 1) + (κe − 1)]σx2 β 2 σ 4 + 4σ 6 , 4(σx2 β 2 + σ 2 )3

which obviously increases with κx and κe . When κx = κe = 3, 2 ω12 = ωnm1 =

2  σ4 = 1 − ρ2 , 2 2 2 2 (σx β + σ )

(21)

where the ρ is the same as in (11). We have 2 = ρ 2 (1 − ρ 2 ). ν12 − ωnm1

Thus, with normally distributed data, (7) tends to overestimate the SE of βˆ∗ for simple regression. The overestimation reaches its maximum value of .25 when ρ 2 = 1/2 or ρ = .707. It is even more 2 informative to compare the relative value of the ν12 in (11) and the ωnm1 in (21). Because ν12 2 ωnm1

=

1 1 − ρ2

is between 1 and ∞, the formula of SE given in popular textbooks may grossly overestimate the true SE of βˆ∗ . Only when ρ = 0 is the SE estimate in (7) consistent. 2 When p = 1, βˆ∗ =√r is just the product-moment correlation and the ωnm1 in (21) is the asymptotic variance of nr under the normality assumption on (x, y). This result is well known (see e.g., Olkin & Finn, 1995; Yuan & Bentler, 2000) although rigorous study of standardized regression coefficients cannot be found in the literature. When p = 1 and data are normally distributed, reliable inference for ρ can be obtained by Fisher’s z-transformation  

1 F (r) − F (ρ) ∼ N 0, , (22) n−3 where F (r) = ln[(1 + r)/(1 √ − r)]/2. This is asymptotically equivalent to estimating the SE of  = (1 − r 2 )/ n − 3. With (14), we may estimate the SE of βˆ∗j by βˆ∗ = r by SE(r) ˙  (s)]1/2 ˙ uh  cj 1 = [h(s)S SE , √ n−3

(23)

where the subscript c is for consistency. Of course, when both x and e are normally distributed, we may estimate the SE of βˆ∗j by ωˆ nmj 1  nmj 1 = √ SE , n−3

(24)

KE-HAI YUAN AND WAI CHAN

677

where, according to (20), 2 ωˆ nmj 1=

sj2 ajj σˆ 2 s2y

+

ˆ − s 2 σˆ 2 − s 2 ] βˆj2 [sj2 (βˆ  Sxx β) j yj sy4

 cj 1 and SE  nmj 1 with syj being the sample covariance of yi and xij . The performance of SE will be examined by simulation in Section 5. 3.2. Biases in βˆ∗j with Stochastic Predictors Following the notation introduced in the previous section, we have    

  σj βj 3   (1) ˆ E β∗j = β∗j + β ⊗ β  Dp  xx Dp (β ⊗ β) + 4 β   xx β σ 2 + (κe − 1)σ 4 nσy 8σy4   

1 1 σ2 1  (25) v j  xx Dp (β ⊗ β) − (κj − 1) − 2 + O 2 . − 2 2 8 σy n 4σj σy The detail leading to (25) is provided in Appendix B on the web. Thus, βˆ∗j contains a bias of order O(1/n). This bias depends on the second- and fourth-order moments of x and e, not their skewnesses. When x follows an elliptical distribution with relative kurtosis ηx = κx /3, we have      2 3σ 4 (δ j  xx β)2 (β   xx β + σ 2 ) σj βj (β  xx β)2   (1) η − β −  β σ − E(βˆ∗j ) = β∗j + x xx 2 8 nσy5 2σj2       2 (3κe − 10)σ 4 1 (26) +O 2 . + β  xx β σ + 8 n According to (19), the quantity within the curly brackets in (26) is positive when σ 2 is small. In such a case, if κe >

10 8(β   xx β) , − 3 3σ 2

the absolute bias in βˆ∗j increases with both ηx and κe . When σ 2 is not trivial, the quantity within the curly brackets will be negative and the relationship between the kurtosis and the bias in βˆ∗j is rather complicated. When both x and e are normally distributed, we have      (δ j  xx β)2   σj βj   1 (1) 2 ˆ β  xx β − σ − +O 2 . (27) E β∗j = β∗j + 3 2 2nσy n σj The bias has the same sign as βj when σ 2 is small and it has the opposite sign with βj when σ 2 is large. For simple regression with p = 1, (27) reduces to     σx βσ 2 1 . + O E βˆ∗ = β∗(1) − 3 2nσy n2 Thus, for normally distributed data, βˆ∗ contains a bias of order O(1/n) that is in the opposite direction of the regression coefficient.  nmj 1 , respectively. We will  cj 1 and bias Let the bias estimates based on (25) and (27) be bias compare them with empirical biases in Section 5.

678

PSYCHOMETRIKA

3.3. Standard Errors of βˆ∗j with Nonstochastic Predictors When the predictors are nonstochastic, Appendix C on the web contains the detail leading to   √  (2)  L n βˆ∗j − β∗j → N 0, ωj22 , where ωj22

=

sj2 σ 2 σy2

  (κe − 9)βj2 σ 2 − 4βj2 (β  Sxx β) ajj + . 4σy4

(28)

Thus, ωj22 proportionally increases with κe . When κe = 3, (28) becomes 2 ωnmj 2=

sj2 ajj σ 2 σy2



sj2 βj2 σ 2 [3σ 2 + 2(β  Sxx β)] 2σy6

.

(29)

 tbj in (7) Notice that the first term on the right side of (29) is the νj22 given in (10). The SE generally overestimates the variance of βˆ∗j for normally distributed e. For simple regression, (28) becomes ω22 =

4σ 6 + (κe − 1)sxx β 2 σ 4 , 4(sxx β 2 + σ 2 )3

(30)

which is clearly an increasing function of κe . When κe = 3, (30) becomes 2 = ω22 = ωnm2

(2σ 2 + sxx β 2 )σ 4 , 2(sxx β 2 + σ 2 )3

which is a special case of (29). It follows from (11) and (30) that ν22 − ω22 =

[4sxx β 2 + (9 − κe )σ 2 ]sxx β 2 σ 2 . 4(sxx β 2 + σ 2 )3

Thus, for simple regression with β = 0, only when κe = 9 +

4sxx β 2 σ2

is the estimate in (7) consistent for the SE of βˆ∗ . We need a consistent estimate of κe when evaluating the ωj22 in (28). This can be done using the marginal kurtoses of x and y. Let bxi = β  (xi − x¯ ). It follows from yi − μy = β  (xi − x¯ ) + ei that         E(yi − μy )4 = E bx4i + 6E bx2i σ 2 + 4E(bxi )E ei3 + E ei4 , whose sample counterpart is n  i=1

(yi − y) ¯ = 4

n  i=1

 bˆx4i

+6

n  i=1

 bˆx2i

σˆ 2 +

n 

ei4 ,

i=1

where bˆxi = βˆ  (xi − x¯ ). Thus, n 4 e /n κˆ y σˆ y4 − (κˆ bx σˆ b4x + 6σˆ b2x σˆ 2 ) κˆ e = i=1 4 i = , σˆ σˆ 4

(31)

KE-HAI YUAN AND WAI CHAN

679

where κˆ bx and σˆ b2x are the sample variance and sample kurtosis of bˆxi = βˆ  (xi − x¯ ), respectively. Notice that there are two ways to estimate σˆ 2 , one is given by (6) and the other is σˆ 2 =

n   2 yi − αˆ i − βˆ  xi /n.

(32)

i=1

To be consistent with the commonly used variance estimator in estimating the kurtosis, we choose (32) when estimating (28) and evaluating (31). This also implies that σˆ y2 = βˆ  Sxx βˆ + σˆ 2 is identical to the sample variance of yi . Because both the numerator and denominator in (28) involve σ 2 , the choice only has a tiny effect on the resulting ωˆ j22 when the sample size is small. The corresponding SE of βˆ∗j is given by ˆ j2  cj 2 = ω SE √ , n √ √ where we choose n in the denominator instead of n − 3, which is because the result in (22) was established when both x and y are normally distributed. Similarly, let ωˆ nmj 2 be the variance estimator according to (29), where the σˆ 2 in (6) is used to estimate σ 2 . The resulting SEs for βˆ∗j is given by nmj 2  nmj 2 = ωˆ√ SE . n

These two SE estimates are further examined by simulation in Section 5. 3.4. Biases in βˆ∗j with Nonstochastic Predictors With xi being nonstochastic, Appendix D on the web contains the detail leading to  

  sj βj σ 2  1 (2) 2 ˆ 8β Sxx β + (3κe − 7)σ + O 2 . E β∗j = β∗j + 5 8nσy n

(33)

Thus, βˆ∗j has a bias of order O(1/n). This bias has the same sign as βj when κe >

7 8β  Sxx β − 3 3σ 2

(34)

and the opposite sign otherwise. When κe = 3, (33) becomes      sj βj σ 2   1 (2) 2 . 4β + O S β + σ E βˆ∗j = β∗j + xx 5 4nσy n2

(35)

Obviously, κe = 3 for normally distributed errors satisfies (34). We will use biascj 2 and biasnmj 2 to denote the order O(1/n) biases in (33) and (35), respectively. These will be further examined in Section 5, using Monte Carlo simulation. 3.5. Inferences with Unstandardized Regression Coefficients  Let sxe = ni=1 (xi − x¯ )ei /n. It follows from −1 βˆ = S−1 xx sxy = β + Sxx sxe

(36)

680

PSYCHOMETRIKA

ˆ = β regardless of the distribution of e and/or x. Thus, βˆ is always unbiased under the that E(β) condition E(ei ) = 0. With random predictors, Appendix E on the web shows that the asymptotic distribution of βˆ is described by  L √  n βˆ − β → N (0, ), (37) where  = σ 2  −1 ˆ 2 S−1 xx is consistently estimated by σ xx . Thus, the commonly used estimates ˆ for SEs of β are still consistent regardless of the distributions of x and e. The effect of the distributions of x and e on the asymptotic SE and bias of βˆ∗j is totally due to standardization. √ When e ∼ N(0, σ 2 ), the conditional distribution of nβˆj /(σˆ aj ) given Sxx is the Student t-distribution with n − p − 1 degrees of freedom, which does not depend on Sxx . Thus, randomness of x has no effect on testing Hj 0 : βj = 0 when the inference is based on the t-distribution. But the commonly used t-statistic will not follow the Student t-distribution anymore when e is nonnormally distributed. Whether e follows a normal distribution or not. The result in (37) indicates that referring the t-statistic to the Student t-distribution or N (0, 1) will be a safe procedure at a relative large n.

4. Two Examples The purpose of the first example is to see, with real data, how much the SE estimates obtained in the previous section can differ from the one given in textbooks, not to elaborate on the substantive content of the real data. We will also estimate the bias in βˆ∗j . The second example is to show that different SEs also have different implications when the interest is in hypothesis testing. Example 1. Stevens (1996, pp. 594–595) contains a data set on quality ratings of 46 research doctorate programs in psychology as well as six potential correlates of the quality ratings, which was from a 1982 National Academy of Sciences published report and rated by senior faculty in the field who taught at institutions other than the one being rated. The seven variables are: mean rating of scholarly quality of program faculty (Quality); number of faculty members in program as of December 1980 (Nfaculty); number of program graduates from 1975 through 1980 (Ngrads); percentage of program graduates from 1975 to 1979 that received fellowships or training grant support during their graduate education (Pctsupp); percent of faculty members holding research grants from the Alcohol, Drug Abuse and Mental Health Administration, the National Institute of Health or the National Science Foundation at anytime during 1978–1980 (Pctgrant); number of published articles attributed to program faculty members 1978–1980 (Narticle); percent of faculty with one or more published articles from 1978 to 1980 (Pctpub). Let Quality be the dependent variable and the others be predictors. We regard the predictors as random because they are not controllable. SAS 9.2 stepwise regression was used first to see which of the six potential correlates significantly predict Quality. The predictors entered into the regression equations in sequence are Narticle, Pctgrant, and Pctsupp. The other three predictors are not selected at the .05 level. So three regression models are favored in the process of variable selection: (first) program quality predicted by the number of articles; (second) program quality predicted by the percent of grant and the number of articles; (third) program quality predicted by the percent of support, the percent of grant and the number of articles. We will study these three models conditional on the variables identified by stepwise regression. The sample standardized regression coefficients, as well as their bias and SE estimates, for the three regression models are reported on the left side of Table 1. The unstandardized regression

681

KE-HAI YUAN AND WAI CHAN TABLE 1.

Example 1, regression coefficients and their biases and SEs for three regression models: program quality is predicted by the number of articles; percent of grant and the number of articles; percent of support, percent of grant and the number of articles. All the predictors are treated as random. Data are from Stevens (1996).

Standardized Regression c  tb bias SE

βˆ∗

 nm bias

Narticle

.762

−.003

−.008

Pctgrant Narticle

.454 .564

−.000 −.001

Pctsupp Pctgrant Narticle

.263 .391 .495

.001 .000 .000

Regression  SE

 nm SE

c SE

βˆ

.098

.064

.062

.130

.017

.000 −.004

.085 .085

.085 .081

.084 .083

.250 .096

.047 .014

.002 .000 −.003

.075 .078 .078

.077 .080 .078

.079 .074 .070

.113 .216 .084

.032 .043 .013

coefficients and their SE estimates are provided on the right side for reference. All the estimated biases are small compared to the value of βˆ∗j , mainly because the sample size is not too small.  nm  tb for βˆ∗ is over 50% larger than both SE When only Narticle is in the model, the estimate SE  and SEc . However, when either Pctgrant or Pctsupp and Pctgrant are in the model, the three SE  tb ’s are even smaller than the corresponding SE  nm ’s and estimates of βˆ∗j are very close; some SE  SEc ’s.  tb is not consistent, it might be close The results of Example 1 imply that, although SE   to SEnm or SEc when p > 1. We will further study them in the next section by Monte Carlo simulation. Example 2. Different SEs do not just generate confidence intervals with different widths, they also yield different conclusions when the confidence intervals are used for hypothesis testing. For the βˆ∗ in the simple regression model, if we use (7) for confidence intervals, the lower bound  tb , where cα is the upper of the confidence interval with level 1 − 2α is given by Lα = βˆ∗ − cα SE 1 − α percentile of N (0, 1). Suppose the predictor and the response are positively correlated and we choose to use whether the confidence interval contains 0 to test the null hypothesis β∗ = 0.  tb Notice that the population counterpart of βˆ∗ is β∗ = ρ and the population counterpart of SE is also a function of β∗ = ρ according to (11). Thus, at the population level Lα = Lα (β∗ ) is a function of β∗ . Solving Lα (β∗ ) > 0 for β∗ leads to β∗ >

cα = β∗tb . (n + cα2 )1/2

Similarly, β∗ >

(4cα2 + n)1/2 − n1/2 = β∗nm1 2cα

implies that the population counterpart of the lower bound corresponding to the 1 − 2α confi nm1 is greater than 0. Clearly, both β∗tb and β∗nm1 are dence interval resulting from using SE functions of cα and the sample size n. Figure 1 contains their plots against n at cα = 1.96. When  nm1 but βˆ∗ falls between curves A and B, the null hypothesis β∗ = 0 is rejected according to SE  tb . not according to SE Clearly, both β∗tb and β∗nm1 go to zero as n increases and their difference disappears. How nm1 and ever, the relative difference of widths of the two confidence intervals resulting from SE  tb or their ratio remains as a constant when n increases. SE

682

PSYCHOMETRIKA

F IGURE 1.

 tb and SE  nm at 1 − 2α = .95 and n = 10 to 40: Confidence Approximate critical regions for βˆ∗ corresponding to SE   nm do not ˆ intervals based on SEtb do not contain 0 when β∗ is above curve A; and confidence intervals based on SE contain 0 when βˆ∗ is above curve B.

The curves in Fig. 1 are approximations obtained when the estimates are replaced by their asymptotic values. In practice, the differences between different confidence intervals can be more or less, depending on the sample size as well as the distribution of the data.

5. Monte Carlo Results All the formulas of SEs obtained in Section 3 are based on asymptotics. This section com ep ’s). The purpose is to see at pares estimates based on these formulas with the empirical SEs (SE what sample size the estimates based on asymptotics can reasonably approximate the empirical ones. We will study both simple regression and multiple regression. Models with fixed predictors are also examined. We will also exam the biases in βˆ∗j with both random and nonstochastic predictors. The conditions for simple regression with random predictors are: β = 0, .2, .4, .6, .8; μx = 0, σx2 = 1, and σ 2 = 1 − β 2 , so that σy2 = 1 and β∗ = β. Two distribution conditions for (x, e) are: (1) x ∼ N(0, 1) and e ∼ N (0, σ 2 ), (2) x follows the standardized1 (1/2) and e = σ z with z following standardized (1/2). The sample sizes are n = 10, 50, 100 and 500. For each combination of β, distribution of (x, e) and sample size, 1000 replications are used. The conditions on β, n and the number of replications for simple regression with nonstochastic predictors are the same as for simple regression with random predictors. The predictors xi for a given n are obtained by a random sample from N (0, 1) and remain unchanged across the 1000 replications. Let sx be the sample standard deviation of x1 , x2 , . . . , xn , and σ 2 = sx2 (1 − β 2 ). The two distribution conditions of e are (1) e ∼ N (0, σ 2 ) and (2) e = σ z with z following standardized (1/2), thus, β∗ = β. For multiple regression, we choose p = 5. The conditions for multiple regression with random predictors are: β = (0, .2, .4, .6, .8) , μx = 0,  xx being a p × p correlation matrix of compound symmetry, and σ 2 = 1. Two distribution conditions for (x, e) are chosen: 1 A standardized gamma distribution is obtained by x = [u − E(u)]/[Var(u)]1/2 , where E(u) and Var(u) are the mean and variance of u ∼ (1/2).

683

KE-HAI YUAN AND WAI CHAN TABLE 2.

Biases and SEs of βˆ∗ in simple regression with normally distributed x and e.

Standardized Regression  ep SE  tb SE  nm bias

c SE

 ep SE

Regression  ep   ep bias SE SE

.334 .143 .101 .045

.313 .140 .099 .045

.339 .141 .100 .045

−.033 −.006 .002 .000

.356 .145 .101 .045

.382 .144 .101 .045

.326 .140 .099 .044

.325 .137 .097 .043

.302 .136 .096 .043

.329 .139 .097 .043

−.009 −.004 −.004 −.000

.348 .140 .099 .044

.383 .141 .099 .045

−.008 .004 .001 −.002

.305 .131 .092 .041

.288 .120 .084 .038

.268 .119 .083 .038

.291 .122 .083 .039

.007 .010 .003 −.001

.322 .132 .093 .041

.342 .137 .092 .042

−.016 −.004 −.002 −.000

−.017 −.009 −.002 .000

.270 .115 .081 .036

.229 .094 .065 .029

.212 .092 .065 .029

.235 .096 .063 .029

.008 −.005 −.001 −.000

.282 .116 .081 .036

.302 .117 .079 .037

−.009 −.002 −.001 −.000

−.026 −.004 −.004 −.000

.209 .086 .061 .027

.142 .053 .037 .016

.133 .052 .037 .016

.157 .053 .038 .016

−.003 .000 −.004 .000

.215 .087 .061 .027

.235 .088 .064 .027

n

 nm bias

c bias

.00 .00 .00 .00

10 50 100 500

.001 .000 −.000 −.000

.001 .000 −.000 −.000

−.029 −.006 .002 .000

.331 .143 .101 .045

.20 .20 .20 .20

10 50 100 500

−.006 −.002 −.001 −.000

−.008 −.002 −.001 −.000

−.018 −.003 −.004 −.001

.40 .40 .40 .40

10 50 100 500

−.013 −.003 −.002 −.000

−.014 −.003 −.002 −.000

.60 .60 .60 .60

10 50 100 500

−.015 −.004 −.002 −.000

.80 .80 .80 .80

10 50 100 500

−.013 −.003 −.001 −.000

β = β∗

(1) x ∼ N (0,  xx ) and e ∼ N (0, σ 2 ), (2) x follows an elliptical distribution with ηx = 15 and e = σ z with z following standardized (1/2). Thus, β∗j does not equal βj for multiple regression. The sample sizes are n = 30, 100, 500 and 1000. For each combination of distribution of (x, e) and sample size, 1000 replications are used. The conditions on β, p, n and the number of replications for multiple regression with nonstochastic predictors are the same as for multiple regression with random predictors. The predictors xi for a given n are obtained by a random sample from N (0,  xx ) and remain unchanged across the 1000 replications. The two distribution conditions of e are (1) e ∼ N (0, 1) and (2) e following standardized (1/2). Again, β∗j does not equal βj .  cj , bias  nmj , SE  tbj , SE  cj , SE  nmj are For each replication at each of the conditions, βˆ∗j , bias  ep ’s of βˆ∗j are obtained using the formulas given in Sections 2 and 3. Empirical biases and SE obtained using the sample means and sample covariance matrix of the 1000 βˆ∗j ’s for each combination of conditions. The averages of the formula-based biases and SEs across the 1000 repli nmj , SE  tbj , SE  cj and SE  nmj in the  cj , bias cations are also obtained and are still denoted as bias ˆ following tables. For reference, parallel statistics for unstandardized βj are also obtained, using  cj is consistent for the SE of βˆ∗j as long as the distribution of e or (5) for SEs. Notice that SE  nmj needs e or (x, e) to (x, e) has finite fourth-order moments. However, the consistency of SE be normally distributed. We expect to see these properties in the Monte Carlo results. We also expect to see some biases in βˆ∗j when n is small.  for each βˆ∗ in simple regression with normally distributed Three bias estimates and four SE’s  ep as well as the  tb predicts SE x and e are reported on the left side of Table 2. When β = 0, SE  ep as β increases. Although other two SE estimates when n ≥ 50. But it tends to over-predict SE  c are consistent, SE  nm predicts SE  ep better than SE  c at n = 10 and 50. When  nm and SE both SE

684

PSYCHOMETRIKA TABLE 3.

Biases and SEs of βˆ∗ in simple regression with xi being given and e following a normal distribution.

Standardized Regression  ep SE  tb SE  nm bias

c SE

 ep SE

Regression  ep   ep bias SE SE

.280 .138 .099 .045

.272 .138 .099 .045

.334 .139 .102 .045

−.008 .002 .005 .002

.310 .141 .100 .045

.317 .138 .102 .045

.326 .140 .098 .044

.270 .131 .094 .042

.260 .131 .094 .042

.317 .138 .098 .043

.001 .007 .003 −.002

.300 .138 .098 .044

.312 .141 .100 .044

.013 .003 .002 .001

.305 .131 .092 .041

.239 .113 .080 .036

.223 .111 .079 .036

.265 .109 .082 .036

−.005 .001 .000 .000

.280 .129 .091 .041

.276 .124 .092 .041

.013 .004 .002 .000

.028 .007 .003 .000

.262 .114 .080 .036

.177 .081 .058 .026

.155 .079 .057 .026

.194 .085 .057 .026

.001 .005 .003 −.000

.242 .113 .080 .036

.250 .117 .079 .036

.018 .004 .002 .000

.023 .003 .003 −.000

.193 .085 .060 .027

.095 .043 .030 .013

.081 .041 .029 .013

.093 .042 .030 .014

.006 −.003 .002 −.001

.184 .085 .060 .027

.194 .084 .060 .026

n

 nm bias

c bias

.00 .00 .00 .00

10 50 100 500

−.000 .000 .000 .000

−.001 .000 .000 .000

−.009 .003 .005 .002

.332 .143 .100 .045

.20 .20 .20 .20

10 50 100 500

.006 .001 .001 .000

.004 .001 .001 .000

.007 .007 .004 −.002

.40 .40 .40 .40

10 50 100 500

.013 .003 .001 .000

.008 .002 .001 .000

.60 .60 .60 .60

10 50 100 500

.018 .004 .002 .000

.80 .80 .80 .80

10 50 100 500

.018 .004 .002 .000

β = β∗

 c predicts SE  ep as well as SE  nm . At n = 10, SE  c is substantially smaller than SE  ep , n ≥ 100, SE  which is expected because SE based on asymptotics tends to be smaller than the corresponding  ep when the sample size is small. Such a phenomenon is also observed for the estimates of SE SEs of βˆ on the right side of Table 2. According to (27), the order O(1/n) population biases at n = 10 and β = .2, .4, .6, .8 are −.010, −.017, −.019 and −.014, respectively; while the biases for the other conditions are between .000 and .004. These are well-reflected by the empirical and formula-based biases in Table 2. Of course, biases reported in Table 2 also contain sampling errors; the empirical biases also contain the effect beyond the O(1/n) order. Biases and SEs for simple regression with given predictors and normally distributed e are  of βˆ∗ are similar to those in presented in Table 3, where the performances of the different SE’s  tb still over-predicts SE  ep even when xi are nonstochastic. With given Table 2. In particular, SE  is still biased due to taking the square ˆ = σˆ 2 /sxx is unbiased for Var(β). ˆ But SE  β) predictors, Var(   ep on the right side of Table 3 for root, which is reflected by the discrepancy between SE and SE smaller n. Using sx = 1 and σy = 1 as approximations when evaluating (35), the order O(1/n) population biases in βˆ∗ at n = 10 and β = .2, .4, .6 and .8 are, respectively, .005, .012, .020, and .021. All the rest of the population biases are between .000 and .004. These population values are clearly reflected empirically in Table 3. Table 4 contains the biases and SEs for the simple regression model when x and e follow  c for βˆ∗ is consistent, it tends to substantially under-predict standardized (1/2). Although SE  the corresponding SEep when n ≤ 100. Similar amount of under-prediction also occurs to the  tb predicts commonly used formula of SEs for βˆ on the right side of the table. When β = 0, SE

685

KE-HAI YUAN AND WAI CHAN TABLE 4.

Biases and SEs of βˆ∗ in simple regression with both x and e following standardized (1/2).

Standardized Regression  ep SE  tb SE  nm bias

c SE

 ep SE

Regression  ep   ep bias SE SE

.337 .143 .100 .045

.258 .121 .089 .043

.330 .144 .102 .046

−.006 .004 −.002 −.000

.464 .151 .105 .045

.619 .162 .107 .047

.316 .140 .098 .044

.311 .137 .096 .043

.255 .126 .094 .046

.344 .148 .108 .048

−.001 −.003 −.004 −.002

.419 .148 .102 .044

.553 .155 .109 .044

.018 .009 −.001 .001

.292 .129 .092 .041

.271 .118 .084 .038

.237 .132 .100 .050

.328 .155 .112 .052

−.001 .002 .000 −.001

.413 .138 .096 .041

.521 .145 .096 .041

−.017 −.005 −.003 −.001

−.018 −.007 −.010 −.001

.256 .113 .080 .036

.217 .092 .065 .029

.210 .122 .095 .048

.294 .145 .109 .050

.001 .003 −.004 .001

.341 .122 .084 .036

.392 .133 .085 .036

−.012 −.007 −.005 −.002

−.051 −.009 −.009 −.002

.201 .085 .060 .027

.144 .053 .037 .016

.152 .084 .067 .034

.243 .109 .079 .037

.007 .002 −.002 −.001

.263 .090 .062 .027

.361 .101 .064 .027

n

 nm bias

c bias

.00 .00 .00 .00

10 50 100 500

.002 −.000 .000 .000

.000 .000 −.000 .000

−.016 .003 −.001 −.000

.332 .143 .100 .045

.20 .20 .20 .20

10 50 100 500

−.007 −.002 −.001 −.000

−.009 .001 .002 .001

.041 .004 −.001 −.000

.40 .40 .40 .40

10 50 100 500

−.011 −.003 −.002 −.000

−.015 −.001 .001 .001

.60 .60 .60 .60

10 50 100 500

−.012 −.003 −.002 −.000

.80 .80 .80 .80

10 50 100 500

−.011 −.003 −.001 −.000

β = β∗

 ep the best among the three formula-based SEs for βˆ∗ . Actually, according to (16), the nonSE  tb when β = 0. But normality of the distribution of x or e does not affect the consistency of SE  tb becomes obvious as n and β increase. When n = 500, the SE  c ’s are the inconsistency of SE  ep across the 4 values of β. only slightly below SE It follows from (26) that the order O(1/n) population biases at n = 10 and β = .2, .4, .6, .8 are, respectively, .041, .036, −.028, and −.094; the bias at n = 50 and β = .8 is −.019. Those for all the other conditions are below .009 in absolute value. These population biases are clearly  nm and bias  c when reflected by empirical ones in Table 4. But they are not well-reflected by bias β = .20 and .40 at n = 10, most likely because βˆ∗ has relatively large SEs for these conditions. For simple regression with nonstochastic predictors and e following standardized (1/2),  c tends to substantially under-predict the the results on the left side of Table 5 indicate that SE  ep of βˆ∗ when n ≤ 50 while the two are very close at n = 500. In contrast, SE  tb can be SE  ep (e.g., when β = .6) or above SE  ep (e.g., when β = .8) regardless of the consistently below SE  nm can be substantially below value of n. Due to the nonnormality of the distribution of e, SE    ˆ ˆ SEep when β is large. Similar to SEc for β∗ , SE for β on the right side of Table 5 are also  ep at smaller n, although Var( ˆ is unbiased  β) substantially below the corresponding value of SE ˆ for Var(β). Using sx = 1 and σy = 1 as approximations when evaluating (33), the order O(1/n) population biases at n = 10 and β = .2, .4, .6, .8 are, respectively, .088, .139, .131, .068; the biases at n = 50 and β = .2, .4, .6, .8 are, respectively, .018, .028, .026, and .014; those at n = 100 and β = .4, and .6 are, respectively, .014 and .013. Those for all the other conditions are below .009. These population values are clearly reflected in Table 5 although most of the estimated biases,  c and bias  nm in particular, are smaller. bias

686

PSYCHOMETRIKA TABLE 5.

Biases and SEs of βˆ∗ in simple regression with xi being given and e following standardized (1/2).

Standardized Regression  ep SE  tb SE  nm bias

c SE

 ep SE

Regression  ep   ep bias SE SE

.279 .138 .099 .045

.278 .140 .100 .045

.341 .146 .099 .045

.001 −.003 .001 −.001

.281 .136 .098 .045

.351 .140 .097 .046

.317 .139 .098 .044

.256 .130 .094 .042

.249 .135 .098 .045

.323 .144 .100 .044

.019 .001 −.000 .000

.275 .134 .097 .044

.296 .138 .098 .042

.091 .019 .019 .003

.283 .129 .091 .041

.211 .109 .078 .036

.205 .122 .090 .043

.294 .142 .097 .045

−.000 −.004 .004 .000

.259 .126 .089 .041

.298 .141 .092 .041

.020 .014 .009 .002

.069 .024 .016 .002

.237 .111 .079 .036

.153 .078 .056 .026

.149 .095 .075 .038

.246 .114 .085 .039

−.015 .007 .001 −.000

.223 .110 .078 .036

.261 .114 .083 .036

.018 .008 .005 .001

.054 .013 .006 .002

.165 .082 .059 .027

.077 .040 .029 .013

.076 .054 .044 .024

.134 .073 .055 .025

.011 .002 −.004 −.000

.162 .082 .059 .027

.185 .086 .063 .027

n

 nm bias

c bias

.00 .00 .00 .00

10 50 100 500

−.000 −.000 .000 −.000

−.001 −.000 .000 −.000

−.001 −.003 .001 −.001

.331 .143 .101 .045

.20 .20 .20 .20

10 50 100 500

.009 .001 .001 .000

.013 .009 .006 .002

.088 .017 .007 .002

.40 .40 .40 .40

10 50 100 500

.014 .003 .001 .000

.017 .013 .009 .003

.60 .60 .60 .60

10 50 100 500

.016 .004 .002 .000

.80 .80 .80 .80

10 50 100 500

.014 .004 .002 .000

β = β∗

Four SEs and three biases for βˆ∗j in multiple regression with normally distributed predictors  c tends to under-predict the correand errors are on the left side of Table 6. When n = 30, SE   ep by only .002. As might be  sponding SEep . At n = 100, each SEc is blow the corresponding SE expected from Example 1 in the previous section, only little differences exist among all the estimates of SEs of βˆ∗j . Similarly, on the right side of Table 6 only small difference exists between  tb and  of each βˆj when n ≥ 100. Because  xx possesses compound symmetry, SE the two SE’s  nm for βˆ∗j remain about the same when the value of βj changes. Such a phenomenon also SE can be observed for the other conditions in the following tables when the distributions of xi and ei vary. All the order O(1/n) population biases are below .001 in absolute value, which are well-reflected in Table 6, where the largest biases are in the third decimal place. Table 7 contains the SEs and biases when the five predictors are nonstochastic and e follows  c for βˆ∗j tends to under-predict the correa normal distribution. Similar to those in Table 6, SE   sponding SEep at n = 30, and all the other SE estimates for βˆ∗j are very close. The two SE’s for each βˆj are also very close. Using Sxx =  xx as an approximation when evaluating (35), the order O(1/n) population biases at β5 = .8 and n = 30 is .003, and all the others are between .000 and .003. This is well-reflected in Table 7, where all the estimated biases are in the third decimal place. Table 8 contains the SEs and biases when the five predictors follow an elliptical distribution  ep for βˆ∗j tends to be with ηx = 15 and e follows a standardized (1/2). Again, at smaller n SE  under-predicted substantially by the corresponding SEc . Similar amount of under-prediction also  tb and SE  nm for βˆ∗j  of βˆj at smaller n on the right side of Table 8. The SE occurs for the SE’s  also under-predict the corresponding SEep even when n is large, indicating their inconsistency. It follows from (26) that the order O(1/n) population biases associated with βˆ∗3 , βˆ∗4 and βˆ∗5 at

687

KE-HAI YUAN AND WAI CHAN TABLE 6.

Biases and SEs of βˆ∗j in a regression model with p = 5 random predictors, β = (.0, .2, .4, .6, .8) , x ∼ N (0,  xx ) with  xx being a correlation matrix, and e ∼ N (0, 1).

Standardized Regression  ep SE  tb  nm bias SE

c SE

 ep SE

Regression  ep   ep bias SE SE

.147 .146 .146 .145 .143

.127 .127 .127 .127 .125

.142 .141 .142 .140 .137

.012 −.010 .006 −.015 .004

.263 .262 .264 .263 .264

.268 .270 .265 .274 .263

.070 .070 .070 .070 .070

.071 .071 .071 .071 .070

.069 .068 .068 .068 .068

.071 .068 .070 .070 .070

−.005 −.001 .001 .002 .004

.133 .133 .134 .134 .134

.136 .130 .134 .136 .137

−.001 .001 −.000 .000 −.000

.031 .031 .031 .031 .031

.031 .031 .031 .031 .030

.031 .031 .031 .030 .030

.031 .030 .031 .030 .029

−.001 .002 −.001 .001 −.001

.058 .058 .058 .058 .058

.058 .058 .058 .058 .056

.000 −.000 .001 −.001 .000

.022 .022 .022 .022 .022

.022 .022 .022 .022 .021

.022 .022 .022 .021 .021

.022 .021 .021 .022 .021

.000 −.000 .003 −.001 .000

.041 .041 .041 .041 .041

.042 .040 .040 .042 .040

n

β

 nm bias

c bias

30

β1 β2 β3 β4 β5

−.000 .000 .000 −.000 −.001

−.000 .000 .000 .000 .000

.007 −.005 .004 −.008 .001

.139 .139 .139 .138 .138

100

β1 β2 β3 β4 β5

−.000 .000 .000 −.000 −.000

−.000 .000 .000 .000 −.000

−.003 −.001 −.000 .000 .002

500

β1 β2 β3 β4 β5

−.000 .000 .000 −.000 −.000

−.000 .000 .000 −.000 −.000

1000

β1 β2 β3 β4 β5

−.000 .000 .000 −.000 −.000

−.000 .000 .000 −.000 −.000

TABLE 7.

Biases and SEs of βˆ∗j in a regression model with p = 5 given predictors, β = (.0, .2, .4, .6, .8) , and e ∼ N (0, 1).

c bias

Standardized Regression  ep SE  tb SE  nm bias

c SE

 ep SE

Regression    ep biasep SE SE

n

β

 nm bias

30

β1 β2 β3 β4 β5

−.000 .002 .003 .007 .008

−.000 .001 .001 .003 .003

−.005 .006 −.004 .003 .006

.135 .117 .110 .130 .139

.134 .116 .108 .125 .132

.120 .104 .097 .113 .120

.131 .119 .113 .125 .138

−.011 .011 −.009 .004 .011

.281 .230 .246 .208 .270

.273 .236 .254 .207 .277

100

β1 β2 β3 β4 β5

.000 .001 .002 .002 .003

.000 .000 .000 .001 .001

.004 −.000 −.001 .001 −.002

.065 .072 .072 .072 .068

.065 .072 .071 .070 .064

.063 .069 .069 .068 .063

.066 .071 .072 .071 .064

.009 −.000 −.002 .001 −.006

.122 .149 .137 .142 .128

.124 .147 .139 .145 .125

500

β1 β2 β3 β4 β5

−.000 .000 .000 .001 .001

−.000 .000 .000 .000 .000

−.000 −.000 .001 −.000 .001

.031 .032 .030 .030 .030

.031 .032 .030 .029 .028

.031 .032 .030 .029 .028

.032 .031 .030 .029 .028

−.001 −.001 .002 −.001 .001

.059 .058 .057 .057 .057

.060 .057 .057 .058 .055

1000

β1 β2 β3 β4 β5

.000 .000 .000 .000 .000

.000 .000 .000 .000 .000

.000 −.000 −.000 −.001 .001

.022 .022 .022 .021 .022

.022 .022 .022 .020 .020

.022 .022 .022 .020 .020

.022 .021 .022 .021 .021

.001 −.001 −.001 −.001 .001

.041 .041 .042 .039 .042

.041 .040 .041 .040 .042

688

PSYCHOMETRIKA TABLE 8.

Biases and SEs of βˆ∗j in a regression model with p = 5 random predictors, β = (.0, .2, .4, .6, .8) , x following an elliptical distribution with μx = 0,  xx being a correlation matrix, ηx = 15, and e following standardized (1/2).

Standardized Regression

Regression

n

β

 nm bias

30

β1 β2 β3 β4 β5

−.000 −.000 −.000 −.000 −.001

−.000 −.001 −.001 −.002 −.003

−.006 −.005 .007 −.005 −.019

.186 .188 .185 .184 .184

.197 .198 .195 .193 .192

.154 .159 .157 .162 .167

.195 .206 .201 .204 .214

−.012 −.004 .024 .014 −.003

.384 .388 .387 .387 .386

.421 .431 .420 .403 .437

100

β1 β2 β3 β4 β5

−.000 .000 −.000 −.000 −.000

−.000 −.000 −.001 −.002 −.003

−.001 .003 −.005 −.008 −.010

.083 .082 .082 .083 .083

.084 .084 .084 .084 .083

.073 .074 .079 .085 .086

.088 .089 .094 .106 .102

−.001 .009 −.000 −.007 −.004

.164 .162 .165 .165 .165

.174 .172 .176 .177 .173

500

β1 β2 β3 β4 β5

−.000 .000 .000 −.000 −.000

.000 −.000 −.001 −.001 −.001

.000 .000 −.001 −.002 −.001

.032 .032 .032 .032 .032

.032 .032 .032 .032 .032

.030 .031 .035 .039 .041

.034 .034 .038 .043 .046

.001 .001 −.001 −.000 .002

.062 .061 .062 .062 .062

.066 .061 .065 .063 .061

1000

β1 β2 β3 β4 β5

−.000 .000 .000 −.000 −.000

−.000 −.000 −.000 −.001 −.001

.001 −.002 .000 −.000 −.001

.022 .022 .022 .022 .022

.022 .022 .022 .022 .022

.021 .022 .025 .028 .030

.022 .024 .027 .030 .032

.001 −.003 .002 −.000 .000

.043 .042 .042 .042 .042

.042 .043 .044 .042 .042

c bias

 ep bias

 tb SE

 nm SE

c SE

 ep SE

 ep bias

 SE

 ep SE

TABLE 9.

Biases and SEs of βˆ∗j in a regression model with p = 5 given predictors, β = (.0, .2, .4, .6, .8) , and e following standardized (1/2).

Standardized Regression c bias

Regression

 ep bias

 tb SE

 nm SE

c SE

 ep SE

 ep bias

 SE

 ep SE

n

β

 nm bias

30

β1 β2 β3 β4 β5

.000 .002 .005 .006 .010

.000 .001 .003 .004 .005

.002 −.003 .001 .008 .013

.144 .124 .129 .131 .137

.143 .123 .126 .127 .128

.128 .110 .114 .116 .118

.150 .130 .134 .132 .139

.005 −.009 −.005 .006 .008

.297 .236 .232 .291 .253

.312 .252 .255 .303 .263

100

β1 β2 β3 β4 β5

.000 .001 .002 .003 .004

−.000 .000 .001 .001 .002

.002 .003 .001 .003 .003

.064 .066 .071 .068 .068

.064 .066 .070 .066 .064

.062 .064 .068 .065 .064

.063 .065 .070 .069 .067

.003 .005 −.002 −.002 −.004

.112 .149 .133 .131 .127

.111 .148 .132 .133 .128

500

β1 β2 β3 β4 β5

−.000 .000 .000 .001 .001

−.000 .000 .000 .000 .000

−.001 −.000 .000 .001 .001

.030 .030 .030 .031 .030

.030 .030 .029 .029 .028

.030 .030 .030 .030 .029

.030 .031 .029 .031 .029

−.003 −.001 .000 .002 .001

.057 .060 .059 .056 .057

.057 .062 .058 .058 .057

1000

β1 β2 β3 β4 β5

.000 .000 .000 .000 .000

.000 .000 .000 .000 .000

.001 .001 −.001 .001 .001

.022 .022 .022 .021 .022

.022 .022 .022 .020 .020

.022 .022 .022 .021 .021

.022 .022 .022 .022 .022

.001 .001 −.002 .002 .002

.040 .040 .042 .041 .041

.040 .041 .042 .042 .042

KE-HAI YUAN AND WAI CHAN

689

n = 30 are, respectively, −.014, −.026 and −.043; that associated with βˆ∗5 at n = 100 is −.013; and all others are in the third decimal place. Except for the empirical bias of βˆ∗3 at n = 30, all the others corresponding to the negatively large population biases also have relatively large negative  c ’s corresponding to the larger population biases are also relatively values in Table 8. The bias  nm do not reflect large although they all are in third decimal place. The quantities under bias the empirical or population biases well because they are not consistent, due to the nonnormal distribution of x and e. Table 9 contains SEs and biases when xi are nonstochastic and e follows standardized  c of βˆ∗j tends to under-predict the correspond (1/2). Similar to the previous tables, the SE  ep at n = 30 while the two are very close when n ≥ 100. The other SE estimates of βˆ∗j ing SE  ep . At n = 30, SE  for βˆj also tends to under-predict the corresponding are also very close to SE  ˆ  SEep although Var(βj ) is unbiased for Var(βˆj ). Using Sxx =  xx as an approximation when evaluating (33), the largest population bias of order O(1/n) is for βˆ∗5 at n = 30; all others are between .000 and .006. The largest biases in Table 9 also occur with βˆ∗5 at n = 30, although  nm is not consistent. bias

6. Discussion and Conclusion Standardized coefficients are among the most interesting parameters in regression when the response and predictors are of arbitrary scales. Better SEs will yield better confidence intervals when evaluating the accuracy of the estimates. Our analytical results indicate that both the SEs and biases of sample standardized coefficients depend on model parameters as well as on the distributions of the predictors and the errors. The SEs given in textbooks are consistent if and only if βj = 0. When βj = 0, the bias of βˆ∗j of order O(1/n) is also zero. The analytical results  tbj overestimates the true SE of βˆ∗ at p = 1 when both x and e are further suggested that SE normally distributed. It can also underestimate the true SE when x and e have heavy tails. For  tbj may be close to being consistent, as in Tables multiple regression, under certain conditions, SE 6; but it can also contain substantial biases, as in Table 8. The biases in βˆ∗j of order O(1/n) can be either positive or negative, which should not be a concern in application unless the sample size is small.  c under-predict the empirical SEs when the sample size is small. Although consistent, the SE √ √ The correction with the denominator n − 3 instead of n in (23) is not enough. Further upward correction is needed for the resulting SE estimates to be both consistent and close to the empirical ones at small sample sizes. Similar upward correction is also needed for the commonly used SE estimates of the unstandardized regression coefficients βˆj . When predictors are nonstochastic, all the uncertainty in y is due to that of e, whose standard deviation is σ . Because sy converges to (β  Sx xβ + σ 2 )1/2 instead of σ , Mayer & Younger (1976) proposed using β˜∗j =

sj βˆj σˆ

instead of the βˆ∗j in (3) to estimate the Beta-coefficient, where the σˆ is given by (6). Because σˆ contains sampling errors, β˜∗j also contains bias and its SE is not consistently estimated by √ sj aj / n. Considering that β˜∗j is seldom used in practice, we will not further study its SE and bias. In this paper, we obtained formulas for consistent SEs and biases of order O(1/n). Because no iteration is needed in evaluating them, they can be easily incorporated into an existing regression program. Biases and consistent SEs can also be obtained by the bootstrap (Efron &

690

PSYCHOMETRIKA

Tibshirani, 1993). As mentioned in the introduction, when the predictors are random, software for structural equation modeling (Bentler, 2007, 2008; Jöreskog and Sörbom, 1996; Muthén & Muthén, 2007) may also generate consistent SEs using special constraints and phantom variables (Cheung, 2009).

Acknowledgements We would like to thank Dr. Hirokazu Yanagihara, the editor, an associate editor, and three referees for comments that helped in improving the paper. References Bentler, P.M. (2007). Can scientifically useful hypotheses be tested with correlations? American Psychologist, 62, 772– 782. Bentler, P.M. (2008). EQS 6 Structural equations program manual. Encino: Multivariate Software. Browne, M.W. (1982). Covariance structure analysis. In D.M. Hawkins (Ed.), Topics in applied multivariate analysis (pp. 72–141). Cambridge: Cambridge University Press. Cheung, M.W.L. (2009). Constructing approximate confidence intervals for parameters with structural equation models. Structural Equation Modeling, 16, 267–294. Cohen, J., Cohen, P., West, S.G., & Aiken, L.S. (2003). Multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah: LEA. Cudeck, R. (1989). Analysis of correlation matrices using covariance structure models. Psychological Bulletin, 105, 317–327. Efron, B., & Tibshirani, R.J. (1993). An introduction to the bootstrap. New York: Chapman & Hall. Ferguson, T.S. (1996). A course in large sample theory. London: Chapman & Hall. Gonzalez, R., & Griffin, D. (2001). Testing parameters in structural equation modeling: Every “one” matters. Psychological Methods, 6, 258–269. Harris, R.J. (2001). A primer of multivariate statistics (3rd ed.). Mahwah: LEA. Hays, W.L. (1994). Statistics (5th ed.). Belmont: Wadsworth. Jamshidian, M., & Bentler, P.M. (2000). Improved standard errors of standardized parameters in covariance structure models: implications for construct explication. In R.D. Goffin, & E. Helmes (Eds.), Problems and solutions in human assessment (pp. 73–94). Dordrecht: Kluwer Academic. Jennrich, R.I. (1974). Simplified formulae for SEs in maximum likelihood factor analysis. British Journal of Mathematical and Statistical Psychology, 27, 122–131. Jöreskog, K.G., & Sörbom, D. (1996). LISREL 8 users’s reference guide. Chicago: Scientific Software International. Kelley, K., & Maxwell, S.E. (2003). Sample size for multiple regression: obtaining regression coefficients that are accurate, not simply significant. Psychological Methods, 8, 305–321. Lee, S.-Y. (1985). Analysis of covariance and correlation structures. Computational Statistics & Data Analysis, 2, 279– 295. Magnus, J.R., & Neudecker, H. (1999). Matrix differential calculus with applications in statistics and econometrics (2nd edn.). New York: Wiley. Mayer, L.S., & Younger, M.S. (1976). Estimation of standardized regression coefficients. Journal of the American Statistical Association, 71, 154–157. Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156– 166. Muthén, L.K., & Muthén, B.O. (2007). Mplus user’s guide (5th ed.). Los Angeles: Muthén & Muthén. Nakagawa, S., & Cuthill, I.C. (2007). Effect size, confidence interval and statistical significance: a practical guide for biologists. Biological Reviews, 82, 591–605. Olkin, I., & Finn, J.D. (1995). Correlations redux. Psychological Bulletin, 118, 155–164. Stevens, J. (1996). Applied multivariate statistics for the social sciences (3rd ed.). Mahwah: LEA. Thompson, B. (Ed.) (2001). Confidence intervals around effect sizes [Special issue]. Educational and Psychological Measurement, 61(4). Wilkinson, L. & The American Psychological Association Task Force on Statistical Inference (1999). Statistical methods in psychology journals: guidelines and explanations. American Psychologist, 54, 594–604. Yuan, K.-H., & Bentler, P.M. (2000). Inferences on correlation coefficients in some classes of nonnormal distributions. Journal of Multivariate Analysis, 72, 230–248. Manuscript Received: 11 MAY 2010 Final Version Received: 13 SEP 2010 Published Online Date: 4 AUG 2011