Sociedad de Estadística e Investigación Operativa. Test (2001) Vol. 10, No. 2, pp. 301-308
Least squares estimators in measurement error models under the balanced loss function

Shalabh
Department of Statistics, Panjab University, India
Abstract: The ultrastructural form of the measurement error model is considered, and a comparison of the direct and the reverse regression estimators is made under the balanced loss function, which explicitly takes into account both the accuracy of predictions and the precision of estimation.
Key Words: Balanced loss function, direct and reverse regression, measurement errors, ultrastructural model.
AMS subject classification: 62J05
1 Introduction

There are two popular strategies for least squares estimation of the parameters in a linear regression relationship involving only two variables. One is the direct regression method, based on the regression of the study variable on the explanatory variable; the other is the reverse regression method, based on the regression of the explanatory variable on the study variable. Considering the regression coefficient, the direct regression estimator is not only unbiased but also performs better, at least asymptotically, than the reverse regression estimator with respect to its mean squared error and the goodness of model fit, provided that there are no measurement errors in the observations. When the observations are contaminated by measurement errors, there is a dramatic change in the performance properties of both estimators; see Cheng and Van Ness (1999) and Fuller (1987) for an interesting account.

(Correspondence to: Shalabh, Department of Statistics, Panjab University, Chandigarh - 160 014, India. Email: [email protected], [email protected]. Received: May 2000; Accepted: May 2001.)

In a seminal article, Zellner (1994) has recommended that the efficiency of any estimator should be examined by both the precision of estimation
and the goodness of model fit, which essentially reflects the accuracy of predictions. Accordingly, he has proposed the use of the balanced loss function, which succeeds in taking account of both criteria; see, e.g., Giles, Giles and Ohtani (1996), Ohtani (1998) and Wan (1994) for some interesting applications. Utilizing such a loss function, we compare the direct and reverse regression estimators in the context of the linear ultrastructural model considered by Dolby (1976).
2 The Main Result

Consider a set of $n$ observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ on the explanatory variable $X$ and the study variable $Y$. They are assumed to be error-ridden, so that we can write
$$y_i = Y_i + u_i, \qquad x_i = X_i + v_i \qquad (i = 1, 2, \ldots, n), \tag{2.1}$$
where $Y_i$ and $X_i$ denote the true but unavailable counterparts, while $u_i$ and $v_i$ are the measurement errors. It is assumed that the errors $u_1, u_2, \ldots, u_n$ are independently and identically distributed with mean 0 and variance $\sigma_u^2$. Similarly, the errors $v_1, v_2, \ldots, v_n$ are independently and identically distributed with mean 0 and variance $\sigma_v^2$. We also assume that $X_1, X_2, \ldots, X_n$ are random variables with means $m_1, m_2, \ldots, m_n$ but the same variance $\sigma^2$. Finally, it is assumed that all these quantities are mutually independent. This completes the specification of a linear ultrastructural model; see Dolby (1976). It reduces to the structural form of the linear measurement error model when $m_1, m_2, \ldots, m_n$ are all equal. When $X_1, X_2, \ldots, X_n$ are assumed to be nonstochastic, so that $\sigma^2 = 0$, we obtain the functional form of the linear measurement error model. Lastly, it becomes the classical linear model, free from measurement errors, when $\sigma_v^2$ and $\sigma^2$ are both equal to 0.

If $\beta$ denotes the slope parameter in the linear regression relationship of $Y$ on $X$, an application of the least squares procedure provides the following direct regression estimator of $\beta$:

$$b_D = \frac{s_{xy}}{s_{xx}}, \tag{2.2}$$
where

$$s_{xx} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad s_{xy} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}), \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n}y_i. \tag{2.3}$$
Similarly, if we apply the least squares procedure to the regression of $X$ on $Y$, we obtain the reverse regression estimator of $\beta$ as follows:

$$b_R = \frac{s_{yy}}{s_{xy}}, \tag{2.4}$$

where
$$s_{yy} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2. \tag{2.5}$$
Performance properties of the estimators (2.2) and (2.4) with respect to the consistency, unbiasedness, variance and mean squared error criteria are well discussed in Cheng and Van Ness (1999) and Fuller (1987); see also Srivastava and Shalabh (1997) for the asymptotic properties under nonnormality of the distributions.

If $\hat{\beta}$ denotes any estimator of $\beta$, the balanced loss function introduced by Zellner (1994) is specified by

$$L(\hat{\beta}) = \frac{w}{n}\sum_{i=1}^{n}\left[(y_i - \bar{y}) - \hat{\beta}(x_i - \bar{x})\right]^2 + \frac{1-w}{n}(\hat{\beta} - \beta)^2 \sum_{i=1}^{n}(x_i - \bar{x})^2$$
$$= \hat{\beta}^2 s_{xx} + w\left(s_{yy} - 2\hat{\beta} s_{xy}\right) + (1-w)\left(\beta^2 - 2\beta\hat{\beta}\right) s_{xx}, \tag{2.6}$$
where $w$ is a nonstochastic scalar lying between 0 and 1. The first quantity on the right hand side of (2.6) measures the accuracy of predictions, while the second quantity measures the closeness of the estimate to the true parameter value. The constant $w$ reflects the weighting assigned to the accuracy of predictions in relation to the precision of estimation.
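The algebraic identity between the two forms of (2.6) is easy to verify numerically; the sketch below (our own function names) evaluates the loss both ways:

```python
def balanced_loss(beta_hat, beta, x, y, w):
    """Balanced loss (2.6), first form: w times the average squared
    prediction error plus (1 - w) times the squared estimation error
    scaled by s_xx."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    s_xx = sum((xi - xbar) ** 2 for xi in x) / n
    fit = sum(((yi - ybar) - beta_hat * (xi - xbar)) ** 2
              for xi, yi in zip(x, y)) / n
    return w * fit + (1 - w) * (beta_hat - beta) ** 2 * s_xx


def balanced_loss_moment_form(beta_hat, beta, x, y, w):
    """Second form of (2.6), written in terms of s_xx, s_xy, s_yy."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    s_xx = sum((xi - xbar) ** 2 for xi in x) / n
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
    s_yy = sum((yi - ybar) ** 2 for yi in y) / n
    return (beta_hat ** 2 * s_xx + w * (s_yy - 2 * beta_hat * s_xy)
            + (1 - w) * (beta ** 2 - 2 * beta * beta_hat) * s_xx)
```

The second form shows that, for given data, the loss depends on the sample only through $s_{xx}$, $s_{xy}$ and $s_{yy}$, which is what makes the risk comparison below tractable.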
Let us now consider the difference in risks associated with the direct regression and the reverse regression estimators of $\beta$ under the loss function (2.6):

$$\Delta = E\left[L(b_R) - L(b_D)\right] = E\left[\left(\frac{s_{xx} s_{yy}}{s_{xy}} + (1 - 2w)\, s_{xy} - 2(1 - w)\,\beta\, s_{xx}\right)\left(\frac{s_{yy}}{s_{xy}} - \frac{s_{xy}}{s_{xx}}\right)\right]. \tag{2.7}$$
For the derivation of the asymptotic risk, it is assumed that the variance of $m_1, m_2, \ldots, m_n$ tends to a finite quantity, say $\sigma_m^2$, as $n$ tends to infinity. Such a specification, for instance, eliminates the presence of any trend in the explanatory variable; see, e.g., Schneeweiss (1991). It is easy to see that

$$E(s_{xx}) = \sigma_v^2 + \sigma^2 + \sigma_m^2 + O(1/n),$$
$$E(s_{xy}) = \beta(\sigma^2 + \sigma_m^2) + O(1/n),$$
$$E(s_{yy}) = \beta^2(\sigma^2 + \sigma_m^2) + \sigma_u^2 + O(1/n).$$
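These leading terms can be checked by simulation. The sketch below draws repeated samples from the ultrastructural model with the illustrative mean sequence $m_i = (-1)^i$, whose limiting variance is $\sigma_m^2 = 1$; all parameter values here are our own choices, not the paper's:

```python
import random


def average_moments(beta=1.0, var_u=1.0, var_v=1.0, var_x=1.0,
                    n=800, reps=200, seed=0):
    """Average s_xx, s_xy, s_yy over repeated samples from the
    ultrastructural model; var_x plays the role of sigma^2 and the
    means m_i = (-1)^i give sigma_m^2 = 1."""
    rng = random.Random(seed)
    means = [(-1.0) ** i for i in range(n)]
    acc_xx = acc_xy = acc_yy = 0.0
    for _ in range(reps):
        X = [rng.gauss(mi, var_x ** 0.5) for mi in means]            # true X_i
        x = [Xi + rng.gauss(0.0, var_v ** 0.5) for Xi in X]          # observed x_i
        y = [beta * Xi + rng.gauss(0.0, var_u ** 0.5) for Xi in X]   # observed y_i
        xbar = sum(x) / n
        ybar = sum(y) / n
        acc_xx += sum((xi - xbar) ** 2 for xi in x) / n
        acc_xy += sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
        acc_yy += sum((yi - ybar) ** 2 for yi in y) / n
    return acc_xx / reps, acc_xy / reps, acc_yy / reps
```

With the defaults ($\beta = 1$, $\sigma_u^2 = \sigma_v^2 = \sigma^2 = 1$, $\sigma_m^2 = 1$), the averages settle near $E(s_{xx}) = 3$, $E(s_{xy}) = 2$ and $E(s_{yy}) = 3$, in agreement with the expressions above.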
Using these results in (2.7), we obtain the expression for the leading term in $\Delta$ as follows:

$$\left[\frac{\sigma_u^2}{\beta}\left(1 + \frac{\sigma_v^2}{\sigma^2 + \sigma_m^2}\right) - (1 - 2w)\,\beta\,\sigma_v^2\right]\left[\frac{\sigma_u^2}{\beta(\sigma^2 + \sigma_m^2)} + \frac{\beta\,\sigma_v^2}{\sigma_v^2 + \sigma^2 + \sigma_m^2}\right], \tag{2.8}$$

provided that $\beta$ is different from zero.

When there are no measurement errors, so that $\sigma_v^2 = 0$ and the linear ultrastructural model reduces to the classical linear regression model, it is seen that the leading term in $\Delta$ is positive. This implies the well known result in the classical linear regression model that the direct regression estimator is, at least asymptotically, superior to the reverse regression estimator, whether the performance criterion is the accuracy of predictions, the precision of estimation, or a convex combination of both. This suggests that a practitioner should always use the direct regression estimator in preference to the reverse regression estimator for the estimation of the slope parameter in the classical linear regression model under the present criterion of performance.

It is observed from (2.8) that the superiority of $b_D$ over $b_R$, and vice versa, depends upon the magnitude of the regression coefficient ($\beta$), the error variances ($\sigma_u^2$ and $\sigma_v^2$), and the quantity ($\sigma^2 + \sigma_m^2$) characterizing the nature of the measurement error model (functional, structural or ultrastructural), apart from the scalar ($w$) specifying the loss function.

When $w = 0$, i.e., the criterion is the precision of estimation as measured by the mean squared error, the direct regression estimator is invariably superior to the reverse regression estimator when

$$\delta = \frac{\sigma_u^2}{\beta^2 \sigma_v^2} + \frac{\sigma_u^2}{\beta^2(\sigma^2 + \sigma_m^2)} > 1. \tag{2.9}$$

It may be observed that the first component in the specification of $\delta$ is related to the ratio of the error variances associated with the study and explanatory variables, while the second component is related to the ratio of the variance of the errors in the study variable and the variance of the true values of the study variable. Further, if we define

$$\lambda_x = \frac{\sigma^2 + \sigma_m^2}{\sigma_v^2 + \sigma^2 + \sigma_m^2}, \tag{2.10}$$

$$\lambda_y = \frac{\beta^2(\sigma^2 + \sigma_m^2)}{\sigma_u^2 + \beta^2(\sigma^2 + \sigma_m^2)}, \tag{2.11}$$

the quantity $\delta$ can be expressed in terms of the reliability ratios $\lambda_x$ and $\lambda_y$ associated with the explanatory and study variables respectively as follows:
$$\delta = \frac{1 - \lambda_y}{\lambda_y (1 - \lambda_x)}. \tag{2.12}$$
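The equivalence between (2.9) and (2.12) is easy to confirm numerically; in the following sketch the function names are ours, and `var_mx` stands for $\sigma^2 + \sigma_m^2$:

```python
def delta_direct(beta, var_u, var_v, var_mx):
    """delta as defined in (2.9); var_mx stands for sigma^2 + sigma_m^2."""
    return var_u / (beta ** 2 * var_v) + var_u / (beta ** 2 * var_mx)


def reliability_ratios(beta, var_u, var_v, var_mx):
    """Reliability ratios lambda_x and lambda_y of (2.10) and (2.11)."""
    lam_x = var_mx / (var_v + var_mx)
    lam_y = beta ** 2 * var_mx / (var_u + beta ** 2 * var_mx)
    return lam_x, lam_y


def delta_from_reliability(lam_x, lam_y):
    """delta expressed through the reliability ratios, as in (2.12)."""
    return (1 - lam_y) / (lam_y * (1 - lam_x))
```

For instance, with $\beta = 1$, $\sigma_u^2 = \sigma_v^2 = 1$ and $\sigma^2 + \sigma_m^2 = 2$, both routes give $\lambda_x = \lambda_y = 2/3$ and $\delta = 1.5$.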
When $\delta$ is less than 1, the opposite is true, i.e., the reverse regression estimator performs better than the direct regression estimator.

When $w = 1$, i.e., the criterion is the accuracy of predictions, it is interesting to note that the direct regression estimator is invariably superior to the reverse regression estimator.

Let us now consider the intermediate values of $w$ between 0 and 1. It is observed from (2.8) that the direct regression estimator is better than the reverse regression estimator as long as $w > 1/2$, i.e., as long as the accuracy of predictions is assigned higher weighting than the precision of estimation in the specification of the performance criterion. When the weighting is lower, i.e., $w < 1/2$, the superiority of the direct regression estimator over the reverse regression estimator continues to hold, provided that $\delta$ is greater than $1 - 2w$. Conversely, the reverse regression estimator performs better than the direct regression estimator when $\delta$ is less than $1 - 2w$, i.e., when $\delta$ is sufficiently small. In other words, when the accuracy of predictions is assigned lower weighting than the precision of estimation, such that $w$ is smaller than $(1 - \delta)/2$, the reverse regression estimator is superior to the direct regression estimator. Both estimators are, however, equally efficient when $\delta + 2w = 1$.

A summary view of the above findings for some selected values of $\delta$ and $w$ is presented in Table 1, where the superior estimator is stated. When both estimators are equally good, this is indicated by an asterisk.

                               w
   δ       0.0   0.1   0.3   0.5   0.7   0.9    1
   0.0      R     R     R     *     D     D     D
   0.1      R     R     R     D     D     D     D
   0.3      R     R     R     D     D     D     D
   0.5      R     R     D     D     D     D     D
   0.7      R     R     D     D     D     D     D
   0.9      R     D     D     D     D     D     D
   1        D     D     D     D     D     D     D

Table 1: Estimator with superior performance. D: direct regression estimator; R: reverse regression estimator; *: both equally good.

It is thus observed that a choice between the direct and reverse regression estimators is governed by the magnitude of $\delta$. When $\delta$ is not less than 1, the direct regression estimator is always preferable to the reverse regression estimator. This result remains true for $\delta$ less than 1 as long as the accuracy of predictions is given more weighting than the precision of estimation. On the other hand, when the accuracy of predictions receives less weighting than the precision of estimation in the specification of the performance criterion, the reverse regression estimator may be preferred for some values of $\delta$ smaller than 1.

It may be remarked that $\delta$ is generally unknown in practice. A simple solution for making a guess about the magnitude of $\delta$ is to use the sample data to estimate the various unknown quantities in the expression for $\delta$,
employing, for example, the method of moments. Further, the practitioner may often possess some information about the reliability ratios $\lambda_x$ and $\lambda_y$ lying between 0 and 1 from extraneous sources like past experience of other studies, repeated survey results, theoretical considerations, and long association with the experimental material and similar investigations; see, e.g., Gleser (1992). In such cases, an appropriate choice of an estimator, in conformity with the aim specifying the weighting to be given to the accuracy of predictions in relation to the precision of estimation, can be exercised.

Lastly, it may not be out of place to mention that we have restricted our attention to a simple bivariate linear regression model subject to measurement errors. It will be interesting to extend our results to the case where there are two or more explanatory variables in the model and/or the regression relationship is nonlinear.
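The decision rule summarized in Table 1 can also be illustrated by simulation. The following sketch estimates $\Delta = E[L(b_R) - L(b_D)]$ by Monte Carlo in the structural case (all $m_i$ equal, here 0, so that `var_mx` plays the role of $\sigma^2 + \sigma_m^2$); the parameter values are illustrative only:

```python
import random


def risk_gap(beta=1.0, var_u=2.0, var_v=1.0, var_mx=2.0,
             w=0.0, n=300, reps=400, seed=0):
    """Monte Carlo estimate of Delta = E[L(b_R) - L(b_D)] under the
    structural model, with L the balanced loss (2.6)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        X = [rng.gauss(0.0, var_mx ** 0.5) for _ in range(n)]        # true X_i
        x = [Xi + rng.gauss(0.0, var_v ** 0.5) for Xi in X]          # observed x_i
        y = [beta * Xi + rng.gauss(0.0, var_u ** 0.5) for Xi in X]   # observed y_i
        xbar = sum(x) / n
        ybar = sum(y) / n
        s_xx = sum((xi - xbar) ** 2 for xi in x) / n
        s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
        s_yy = sum((yi - ybar) ** 2 for yi in y) / n

        def loss(b):
            # balanced loss (2.6) evaluated at the estimate b
            fit = sum(((yi - ybar) - b * (xi - xbar)) ** 2
                      for xi, yi in zip(x, y)) / n
            return w * fit + (1 - w) * (b - beta) ** 2 * s_xx

        total += loss(s_yy / s_xy) - loss(s_xy / s_xx)
    return total / reps
```

With the default values $\delta = 3 > 1$, so the estimated gap is positive for every $w$ and the direct regression estimator is preferred. With, e.g., $\beta = 2$, $\sigma_u^2 = 0.5$, $\sigma_v^2 = 1$ and $\sigma^2 + \sigma_m^2 = 2$, one has $\delta \approx 0.19 < 1 - 2w$ at $w = 0$, and the estimated gap turns negative, in line with the reverse-regression region of Table 1.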
Acknowledgements
The author is extremely grateful to an associate editor and two referees for their helpful comments.
References

Cheng, C. and J.W. Van Ness (1999). Statistical Regression With Measurement Error. Arnold Publishers, London.

Dolby, G.R. (1976). The ultrastructural relation: a synthesis of the functional and structural relation. Biometrika, 63, 39-50.

Fuller, W.A. (1987). Measurement Error Models. John Wiley, New York.

Giles, J.A., D.E.A. Giles and K. Ohtani (1996). The exact risk of some pre-test and Stein-type regression estimates under balanced loss. Communications in Statistics - Theory and Methods, 25, 2901-2924.

Gleser, L.J. (1992). The importance of assessing measurement reliability in multivariate regression. Journal of the American Statistical Association, 87, 696-707.

Ohtani, K. (1998). The exact risk of a weighted average estimator of the OLS and Stein-rule estimators in regression under balanced loss. Statistics and Decisions, 16, 35-45.

Schneeweiss, H. (1991). Note on a linear model with errors in the variables and with trend. Statistical Papers, 32, 261-264.

Srivastava, A.K. and Shalabh (1997). Asymptotic efficiency properties of least squares estimator in ultrastructural model. Test, 6, 419-431.

Wan, A.T.K. (1994). Risk comparison of the inequality constrained least squares and other related estimators under balanced loss. Economics Letters, 46, 203-210.

Zellner, A. (1994). Bayesian and non-Bayesian estimation using balanced loss function. In Statistical Decision Theory and Related Topics V, 377-390 (S.S. Gupta and J.O. Berger, eds.). Springer-Verlag.