Errors-in-Variables Regression Using Stein Estimates

ALICE S. WHITTEMORE*

*Alice S. Whittemore is Professor of Epidemiology and Biostatistics, Department of Health Research and Policy, Stanford University School of Medicine, Stanford, CA 94305-5092. This work was supported by National Institutes of Health Grant CA 23214 and by a grant to the Societal Institute for Mathematical Sciences from the U.S. Environmental Protection Agency. The author is grateful to Daniel Stram for performing the simulations, and to Joseph B. Keller, James Ware, and Tor Tosteson for helpful discussions.
A method is proposed for estimating regression parameters from data containing covariate measurement errors by using Stein estimates of the unobserved true covariates. The method produces consistent estimates for the slope parameter in the classical linear errors-in-variables model and applies to a broad range of nonlinear regression problems, provided the measurement error is Gaussian with known variance. Simulations are used to examine the performance of the estimates in a nonlinear regression problem and to compare them with the usual naive ones obtained by ignoring error and with other estimates proposed recently in the literature.

KEY WORDS: Covariate measurement errors; James-Stein estimates; M-estimates.
1. INTRODUCTION

In the classical errors-in-variables regression problem described by Kendall and Stuart (1961), one wishes to estimate the intercept and slope of a regression line relating a response y to a scalar covariate x. The estimation is based on a sample of independent pairs (y_1, z_1), ..., (y_n, z_n), where y_i has mean θ_0 + θ_1 x_i and variance σ², and z_i is the value x_i contaminated with measurement error (i = 1, ..., n). Usually z_i is a Gaussian variable with mean x_i and variance δ. Here we propose a simple method for estimating the parameters in this and more general nonlinear regression models. The method consists of replacing the unobserved true covariates x_i by their James-Stein estimates, based on the observed covariates z_1, ..., z_n, and then estimating the regression parameters in the usual way.

Section 2 reviews the James-Stein estimate (James and Stein 1961; Stein 1955, 1981) and the empirical Bayes interpretation given it by Efron and Morris (1975). Section 3 shows why the proposed method has promise for errors-in-variables regression under Gaussian measurement error assumptions. In Section 4 a small simulation based on a nonlinear regression model is used to compare the new estimate with the naive one obtained by regressing the response on the observed covariates and with corrected estimates proposed by Stefanski (1985) and Whittemore and Keller (1988).
2. THE JAMES-STEIN ESTIMATE

Suppose that n ≥ 4 observed values z_1, ..., z_n are independently normally distributed with E z_i = x_i and known variance δ. The unknown vector of means (x_1, ..., x_n) is to be estimated with sum of squared error loss

    L(x, x̂) = Σ_{i=1}^n (x̂_i - x_i)²,
where x̂ = (x̂_1, ..., x̂_n) is the estimate of x. The maximum likelihood estimate (MLE) z = (z_1, ..., z_n) has constant risk

    E Σ_{i=1}^n (z_i - x_i)² = nδ,
where E denotes expectation over the n-variate Gaussian distribution described previously. James and Stein (1961) introduced the estimate e(z) = (e_1(z), ..., e_n(z)) given by

    e_i(z) = Bμ_i + (1 - B)z_i,   i = 1, ..., n,                          (1)

with (μ_1, ..., μ_n) any initial guess at x, B = (n - 2)δ/S, and S = Σ (z_i - μ_i)². This estimate has risk strictly less than nδ for all x. Furthermore, if x_i = μ_i for all i, the risk is 2δ, which is much smaller than the risk nδ for the MLE when n is large.

Efron and Morris (1973, 1975) showed that the James-Stein estimate e(z) arises quite naturally in an empirical Bayes context. If the x_i are themselves a sample from a Gaussian prior
    x_i ~ N(μ, τ)   (iid),   i = 1, ..., n,                               (2)

then the Bayes estimate of x_i is its posterior mean given the data:

    E[x_i | z_i] = Bμ + (1 - B)z_i,   B = δ/(δ + τ).                      (3)
In the empirical Bayes situation, μ and τ are unknown, but they can be estimated because marginally the z_i are independent normal with common mean μ and common variance τ + δ. Thus z̄ = n^{-1} Σ z_i is an unbiased estimate of μ, and, as shown by Efron and Morris (1975, Secs. 1 and 2), δ(n - 3)/S, where S = Σ (z_i - z̄)², is an unbiased estimate of B = δ/(δ + τ). Substitution of these values for the unknown μ and B in (3) gives the James-Stein estimate (1), with μ_i = z̄ and B = δ(n - 3)/S:

    e_i(z) = Bz̄ + (1 - B)z_i.                                             (4)
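For concreteness, the shrinkage in (4) can be computed as in the following sketch (my illustration, not code from the paper; Python with NumPy is assumed, and the truncation of B at 1 anticipates the improvement noted in the next paragraph):

```python
import numpy as np

def james_stein(z, delta):
    """James-Stein / empirical Bayes estimate (4) of the true covariates:
    shrink each z_i toward the grand mean z-bar by B = delta*(n - 3)/S,
    where S = sum((z_i - z-bar)^2).  Requires n >= 4 and known delta."""
    z = np.asarray(z, dtype=float)
    n = z.size
    zbar = z.mean()
    S = np.sum((z - zbar) ** 2)
    B = delta * (n - 3) / S
    B = min(1.0, B)   # truncation at 1 (Efron and Morris 1973); see the next paragraph
    return B * zbar + (1.0 - B) * z
```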
A further improvement, having even smaller risk for all x (Efron and Morris 1973), is the estimate (4) with B = min[1, δ(n - 3)/S].

In estimating the scale factor B of (3), it was assumed that δ is known. If δ is unknown it too can be estimated, provided we have repeated measurements of the z_i. More explicitly, suppose that z_{ij} (j = 1, ..., J) represent J iid measurements, each normally distributed with mean x_i and variance δ (i = 1, ..., n). Then the Bayes estimate (3) for x_i becomes

    E[x_i | z_{i1}, ..., z_{iJ}] = Bμ + (1 - B)z̄_i,   B = J^{-1}δ/(J^{-1}δ + τ),
where z̄_i = J^{-1} Σ_j z_{ij}. Efron and Morris (1972, Sec. 7) suggested estimating B by B̂ = J^{-1}δ̂(n - 3)/Ŝ, where δ̂ = [n(J - 1)]^{-1} Σ_{i=1}^n Σ_{j=1}^J (z_{ij} - z̄_i)² and Ŝ = Σ_{i=1}^n (z̄_i - z̄)². [Here z̄ = (Jn)^{-1} Σ_{ij} z_{ij} is the grand mean.] The good large-sample properties of the Stein estimate have been demonstrated in theory and in practice by several investigators (e.g., Efron and Morris 1973, 1975; Laird and Louis 1987).
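When δ must itself be estimated from replicates, the same plug-in formulas translate directly. A minimal sketch, assuming the replicate measurements are held in an n × J NumPy array (again my own illustration of the formulas above, not the paper's code):

```python
import numpy as np

def james_stein_replicated(z):
    """Stein estimate of the x_i from an (n, J) array of replicate measurements,
    with delta estimated from the within-unit scatter and B estimated by the
    Efron-Morris (1972, Sec. 7) plug-in B = J^{-1} delta_hat (n - 3) / S_hat."""
    z = np.asarray(z, dtype=float)
    n, J = z.shape
    zbar_i = z.mean(axis=1)          # unit means  z-bar_i
    zbar = z.mean()                  # grand mean  z-bar
    delta_hat = np.sum((z - zbar_i[:, None]) ** 2) / (n * (J - 1))
    S_hat = np.sum((zbar_i - zbar) ** 2)
    B_hat = (delta_hat / J) * (n - 3) / S_hat
    return B_hat * zbar + (1.0 - B_hat) * zbar_i
```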
3. ERRORS-IN-VARIABLES REGRESSION USING THE JAMES-STEIN ESTIMATE
Now consider the following errors-in-variables regression problem. We wish to estimate a vector parameter θ governing the distribution of a random variable y when the distribution also depends on a covariate x that is erroneously measured as z. Conditional on x, z has a Gaussian distribution with mean E(z) = x and known variance δ. In the absence of measurement error, inferences on θ are based on an M-estimate θ̂(y, x) that is a solution to an equation

    Σ_{i=1}^n ψ(y_i, x_i, θ) = 0.                                         (5)
Here ψ is a given function, such as a likelihood or quasi-likelihood score or a robust estimating function. For example, the ordinary least squares estimate for the linear model satisfies (5) with ψ(y, x, θ) = (y - θx)x. The usual naive estimate θ̂(y, z), obtained by solving (5) with x_i = z_i (i = 1, ..., n), is generally inconsistent. I propose the alternative estimate θ̂[y, e(z)], obtained by replacing each x_i with its James-Stein estimate (4) and then solving (5).

The rationale for the estimate θ̂[y, e(z)] can be seen by reconsidering the classical linear regression problem when the x_i themselves represent a random sample from the Gaussian distribution (2). Conditional on x_i, y_i and z_i are independent Gaussian variables with means E[y_i | x_i] = θ_0 + θ_1 x_i and E[z_i | x_i] = x_i (i = 1, ..., n). Then marginally the z_i also have the Gaussian distribution (2), but with τ replaced by τ + δ. Thus, as surrogates for the x_i, the z_i are too variable, and they need to be shrunk toward their mean. To determine the amount of shrinkage needed, note that the ordinary least squares slope estimate θ̂_1(y, z) is the sample covariance of y and z divided by the sample variance of z, and thus it converges in probability to cov(y, z)/var z, which equals θ_1 τ/(τ + δ). Multiplying the z's by a constant c has the effect of multiplying θ̂_1(y, z) by c^{-1}. Therefore, to correct the asymptotic bias in θ̂_1(y, z), one should scale the z's by the factor τ/(τ + δ). Since the coefficient 1 - B of the z_i in (4) is an unbiased estimate of τ/(τ + δ), the James-Stein estimate (4) provides the right amount of shrinkage for the z's. A similar argument shows that θ̂_0[y, e(z)] is consistent for the intercept θ_0.
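The proposal is easy to state in code. The sketch below (my own, under this section's assumptions of a scalar θ and known δ) solves the estimating equation (5) by a one-dimensional root search and contrasts the naive estimate with the Stein-substituted one for a no-intercept linear model; james_stein is the helper sketched in Section 2, and SciPy's brentq is used only for convenience.

```python
import numpy as np
from scipy.optimize import brentq

def m_estimate(y, x, psi, bracket=(-10.0, 10.0)):
    """Solve equation (5), sum_i psi(y_i, x_i, theta) = 0, for a scalar theta."""
    return brentq(lambda th: np.sum(psi(y, x, th)), *bracket)

# OLS score for a no-intercept linear model, psi(y, x, theta) = (y - theta*x) * x.
psi_lin = lambda y, x, th: (y - th * x) * x

rng = np.random.default_rng(0)
n, theta, tau, delta = 2000, 1.0, 1.0, 0.5
x = rng.normal(0.0, np.sqrt(tau), size=n)          # true covariates, prior (2) with mu = 0
z = x + rng.normal(0.0, np.sqrt(delta), size=n)    # error-contaminated measurements
y = theta * x + rng.normal(0.0, 0.3, size=n)       # linear response

naive = m_estimate(y, z, psi_lin)                     # attenuated toward theta*tau/(tau + delta) = 0.67
new = m_estimate(y, james_stein(z, delta), psi_lin)   # proposed estimate theta[y, e(z)], close to 1.0
```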
4. SIMULATION RESULTS
Simulations were used to evaluate the performance of the new estimate when the probability model relating y to x was nonlinear and non-Gaussian. These simulations were done by Daniel Stram. The true and measured covariates x and z were generated from a bivariate Gaussian distribution with mean 0 and covariances σ_xx = σ_xz = 1 and σ_zz = 1 + δ. Conditional on x, y was generated from an exponential distribution with mean e^{-θx}. Five hundred simulated data sets were generated for each of the three sample sizes n = 50, 100, 500 and each of the four values δ = .10, .25, .50, 1.00. Table 1 gives the mean, standard deviation (SD), and root mean squared error (RMSE) of the 500 values of each of the following estimates. The error-free estimate is the MLE based on the data (y_1, x_1), ..., (y_n, x_n) and is, therefore, the solution to (5) with ψ(y, x, θ) = (1 - ye^{θx})x.
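A single replication of this design might be sketched as follows (an illustration of the setup just described, not the simulation code used for Table 1; it reuses m_estimate and james_stein from the earlier sketches):

```python
import numpy as np

def simulate_once(n=100, theta=1.0, delta=0.25, rng=None):
    """One replication of the Section 4 design: x standard normal,
    z = x + Gaussian error with variance delta (so cov(x, z) = 1 and
    var(z) = 1 + delta), and y | x exponential with mean exp(-theta * x).
    Returns the error-free, naive, and proposed (Stein) estimates of theta."""
    rng = rng or np.random.default_rng()
    x = rng.normal(0.0, 1.0, size=n)
    z = x + rng.normal(0.0, np.sqrt(delta), size=n)
    y = rng.exponential(scale=np.exp(-theta * x))                 # E[y | x] = exp(-theta * x)
    psi = lambda y_, x_, th: (1.0 - y_ * np.exp(th * x_)) * x_    # score (1 - y e^{theta x}) x
    return (m_estimate(y, x, psi),                       # error-free
            m_estimate(y, z, psi),                       # naive
            m_estimate(y, james_stein(z, delta), psi))   # new
```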
Table 1. Performance of Various Estimates for θ (true θ = 1; sample size = n)

                               n = 50               n = 100              n = 500
Estimate                  Mean   SD   RMSE     Mean   SD   RMSE     Mean   SD   RMSE
Error-free                1.01   .15   .15     1.01   .10   .10     1.00   .04   .04

δ = .10
  Naive                    .91   .15   .18      .91   .10   .14      .91   .05   .10
  New                     1.00   .17   .17     1.00   .12   .12      .99   .05   .05
  Stefanski               1.00   .17   .17      --    .12   .12      .99   .05   .05
  Whittemore and Keller   1.00   .17   .17     1.01   .12   .12      --    .05   .05

δ = .25
  Naive                    .80   .17   .26      .81   .11   .22      .81   .05   .20
  New                     1.01   .20   .20     1.00   .13   .13     1.00   .06   --
  Stefanski                .98   .21   .21      .98   .14   .14      .97   .07   --
  Whittemore and Keller   1.01   .21   .21     1.02   .14   .14     1.02   .07   --

δ = .50
  Naive                    .65   .16   .38      .67   .11   .35      .67   .06   .34
  New                      .99   .28   .28      --    .19   .19      --    .08   .08
  Stefanski                .89   .24   .26      .90   .16   .19      .90   .08   .13
  Whittemore and Keller   1.00   .25   .25     1.03   .17   .18     1.04   .09   .10

δ = 1.00
  Naive                    .50   .15   .52      .50   .11   .51      .50   .06   .50
  New                     1.00   .42   .42     1.03   .33   .33      .99   .12   .12
  Stefanski                .78   .26   .34      .77   .18   .29      .76   .09   .25
  Whittemore and Keller   1.05   .32   .32     1.06   .23   .24     1.06   .13   .14

NOTE: It was assumed that (y | x) is exponential with mean e^{-θx} and that (x, z) are Gaussian with mean (0, 0) and covariances σ_xx = σ_xz = 1, σ_zz = 1 + δ. Entries marked "--" are not legible in the source.