
Journal of Statistical Computation and Simulation

Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t713650378

A Comparison of James-Stein regression with least squares in the Pitman nearness sense

Jerome P. Keating a; Veronica Czitrom a. a Division of Mathematics, Computer Science, and Statistics, The University of Texas at San Antonio, San Antonio, Texas, USA

To cite this Article: Keating, Jerome P. and Czitrom, Veronica (1989) 'A comparison of James-Stein regression with least squares in the Pitman nearness sense', Journal of Statistical Computation and Simulation, 34: 1, 1-9.
To link to this Article: DOI: 10.1080/00949658908811202
URL: http://dx.doi.org/10.1080/00949658908811202


A COMPARISON OF JAMES-STEIN REGRESSION WITH LEAST SQUARES IN THE PITMAN NEARNESS SENSE

JEROME P. KEATING and VERONICA CZITROM

Division of Mathematics, Computer Science, and Statistics, The University of Texas at San Antonio, San Antonio, Texas 78285, USA

The least squares estimator and the James-Stein estimator of the vector-valued parameter in a multiple linear regression model are compared in the Pitman nearness sense. A table of the comparison is presented, and the James-Stein estimator is found to be uniformly preferred to the least squares estimator in this sense. In many practical situations the James-Stein estimate is extremely close to the least squares estimate, and so is barely preferred in the Pitman nearness sense. However, there are practical situations in which the James-Stein estimate is sufficiently different from the least squares estimate to be substantially better in the Pitman nearness sense.


KEY WORDS: Multiple regression, Rao's noncentral F.

1. INTRODUCTION

For many years regression analysts have called for methods of estimation other than least squares. The most promising alternative has been the James-Stein estimator, which was reviewed by Draper and Van Nostrand (1979). Jennrich and Oman (1986) compared several estimators using the mean squared error criterion, and gave conditions under which the James-Stein estimator is the most appropriate. However, authors such as Halperin (1970) have called for comparison criteria other than the commonly used mean squared error and mean absolute deviation. This paper uses the Pitman nearness (PN) criterion to compare two James-Stein type estimates (of which the least squares and James-Stein estimates are special cases) of the vector-valued parameter in multiple linear regression. In this comparison we parallel the arguments of C. R. Rao et al. (1986). The investigations given herein are for small to moderate sample sizes. More general asymptotic results on the comparison of estimators based on Pitman nearness can be found in the seminal article of P. K. Sen (1986).

2. PITMAN NEARNESS FOR THE LINEAR REGRESSION MODEL

Consider the usual linear regression model


$$Y = X\beta + e, \qquad e \sim N_m(0, \sigma^2 I), \tag{1}$$

where $Y$ is the $m \times 1$ random vector of observations, $X$ is an $m \times p$ design matrix of known constants and of rank $p$ ($m > p$), and $\beta$ is a $p \times 1$ vector of unknown parameters. The ordinary least squares estimate of $\beta$ is $b = (X'X)^{-1}X'Y$, and an unbiased estimator of $\sigma^2$ is $s^2 = (Y'Y - b'X'Y)/(m - p)$. The normality assumption in (1) implies that $b \sim N_p(\beta, (X'X)^{-1}\sigma^2)$, that $(m-p)s^2/\sigma^2 \sim \chi^2_{m-p}$, and that $b$ and $s^2$ are stochastically independent. Let $\hat\beta_1$ and $\hat\beta_2$ be two vector-valued estimators of the vector $\beta$. The Pitman nearness (PN) of $\hat\beta_1$ relative to $\hat\beta_2$ in estimating $\beta$ is given by

$$\mathrm{PN}_\beta(\hat\beta_1, \hat\beta_2) = P\left[\mathcal{L}(\hat\beta_1, \beta) \le \mathcal{L}(\hat\beta_2, \beta)\right], \tag{2}$$

where $\mathcal{L}(\cdot\,,\cdot)$ is a convex loss function. One should prefer the estimator $\hat\beta_1$ over the estimator $\hat\beta_2$ if $\mathrm{PN}_\beta(\hat\beta_1, \hat\beta_2) \ge 1/2$ for all $\beta$, in which case $\hat\beta_2$ is said to be inadmissible with respect to $\hat\beta_1$. If $\hat\beta_1$ and $\hat\beta_2$ are linear forms of a common statistic $b^*$ which has a nonsingular covariance matrix $\Sigma$, then it is reasonable and customary to employ the Mahalanobis loss function

$$\mathcal{L}(\hat\beta_i, \beta) = (\hat\beta_i - \beta)'\,\Sigma^{-1}(\hat\beta_i - \beta). \tag{3}$$
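To make the criterion in Eq. (2) concrete before specializing to regression, here is a toy illustration (ours, not from the paper): two estimators of a normal mean, the sample mean and a fixed shrinkage of it, compared by simulating the probability that one incurs the smaller squared-error loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# PN of a shrinkage rule 0.9 * xbar versus the sample mean xbar, for
# theta = 0.5 and n = 10 observations, estimated by Monte Carlo.
theta, n, n_sim = 0.5, 10, 200_000
xbar = theta + rng.standard_normal(n_sim) / np.sqrt(n)  # sampling dist. of the mean
loss_shrunk = (0.9 * xbar - theta) ** 2
loss_mean = (xbar - theta) ** 2
print(np.mean(loss_shrunk <= loss_mean))  # > 1/2 here: shrinkage is Pitman-nearer
```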

In the regression framework we consider two James-Stein type estimators $\hat\beta_1$ and $\hat\beta_2$ of $\beta$ of the form

$$\hat\beta_i = \left(1 - \frac{c_i s^2}{b'X'Xb}\right) b, \qquad c_i \ge 0, \; i = 1, 2, \tag{4}$$

where $b'X'Xb > 0$ since $X'X$ is positive definite. Note that (4) reduces to the least squares estimate $b$ if $c_i = 0$, and to the James-Stein estimate if $c_i = (m-p)(p-2)/(m-p+2)$. Lindley and Smith (1972) discuss the James-Stein estimate as a special case of shrinkage estimators. Efron and Morris (1973) propose an empirical Bayes interpretation of the James-Stein estimate.

In the comparison of two James-Stein type estimators of the form (4), the loss function (3) becomes $\mathcal{L}(\hat\beta_i, \beta) = (\hat\beta_i - \beta)'X'X(\hat\beta_i - \beta)/\sigma^2$ for $b^* = b$ and $\Sigma = (X'X)^{-1}\sigma^2$. To determine the Pitman nearness of $\hat\beta_1$ relative to $\hat\beta_2$ (taking $c_1 > c_2$), it can be shown that $\mathcal{L}(\hat\beta_1, \beta) \le \mathcal{L}(\hat\beta_2, \beta)$ can be expressed as

$$b'X'X(b - \beta) \ge \frac{c_1 + c_2}{2}\, s^2. \tag{5}$$

If we define

$$\bar C = (c_1 + c_2)/2, \qquad \delta^2 = \beta'X'X\beta/4\sigma^2,$$

then (5) can be written as

$$U - \delta^2 \ge \bar C\, V/(m-p), \tag{6}$$

where $U = (b - \beta/2)'X'X(b - \beta/2)/\sigma^2$ and $V = (m-p)s^2/\sigma^2$. Setting

$$W = (U - \delta^2)/V, \tag{7}$$

the Pitman nearness of the James-Stein type estimators $\hat\beta_1$ and $\hat\beta_2$ of the parameter $\beta$ in a linear regression model can be expressed as

$$\mathrm{PN}_\beta(\hat\beta_1, \hat\beta_2) = P\left[W \ge \bar C/(m-p)\right]. \tag{8}$$
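Equation (8) invites a quick numerical check. The sketch below is ours, not the authors' IMSL code, and the form of $W$ in Eq. (7) follows the reconstruction above, so the result should be read with that caveat; it estimates the probability in Eq. (8) by Monte Carlo for the James-Stein versus least squares comparison.

```python
import numpy as np

rng = np.random.default_rng(1)

def pn_js_vs_ls(p, m, delta2, n_sim=400_000):
    """Monte Carlo estimate of Eq. (8) with c2 = 0 (least squares).

    numpy's `nonc` equals the sum of squared means, i.e. delta2 here,
    which corresponds to Poisson mixing weights with mean delta2 / 2.
    """
    c1 = (m - p) * (p - 2) / (m - p + 2)     # James-Stein choice in Eq. (4)
    U = rng.noncentral_chisquare(p, delta2, n_sim)
    V = rng.chisquare(m - p, n_sim)
    W = (U - delta2) / V                     # Eq. (7)
    return np.mean(W >= (c1 / 2) / (m - p))  # C-bar = c1/2 when c2 = 0

# Section 5's worked example has p = 4, m = 20, delta^2 = 3.01, PN = 0.7272.
print(pn_js_vs_ls(4, 20, 3.01))
```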

3. DISTRIBUTION OF W

Under the normality assumption in Eq. (1), $V$ has a central chi-square distribution $\chi^2_{m-p}$ with $m-p$ degrees of freedom, while $U$ has a noncentral chi-square distribution $\chi^2_p(\delta^2/2)$ with $p$ degrees of freedom and noncentrality parameter $\delta^2/2$. Since the distribution of $W$ appears quite often, it is worthwhile to note its canonical formulation and to ascribe priority to its founder.

DEFINITION Let $U \sim \chi^2_k(\delta^2/2)$ and $V \sim \chi^2_r$, where $U$ and $V$ are independent random variables. Then the random variable

$$W = (U - \delta^2)/V$$

is said to follow Rao's noncentral F-distribution with noncentrality parameter $\delta^2/2$, $k$ degrees of freedom in the numerator and $r$ degrees of freedom in the denominator. We name this distribution in honor of C. R. Rao, who first derived it in 1984. Observe that if the noncentrality parameter is zero ($\delta = 0$), then $W$ is distributed as $F_{k,r}\,k/r$, where $F_{k,r}$ has (Snedecor's) central F distribution with $k$ degrees of freedom in the numerator and $r$ degrees of freedom in the denominator. C. R. Rao et al. (1986) gave the following properties of $W$. When $k = 2q$, where $q$ is a positive integer, the density of $W$ for $w > 0$ is given by an infinite series (Eq. (9)) whose terms carry the Poisson weights $p(j; \delta^2/2)$,

where $p(j; \lambda) = \lambda^j e^{-\lambda}/j!$ is the mass function associated with a Poisson distribution. The cumulative distribution function of $W$ is given by a companion infinite series (Eq. (10)), where $F_{k,r}(\cdot)$ is the cumulative distribution function of a central F distribution with $k$ and $r$ degrees of freedom; the explicit forms of Eqs. (9) and (10) appear in Rao et al. (1986). An algorithm is available from the authors that will calculate the probability statement given in Eq. (10) for various values of $\delta$, $q$ and $r$.
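Even without the closed-form series, the distribution function of $W$ is easy to evaluate numerically under the reconstructed form $W = (U - \delta^2)/V$: condition on $V$ and integrate. The sketch below (ours; `cdf_W` is an illustrative helper name) does this with SciPy.

```python
import numpy as np
from scipy import integrate, stats

def cdf_W(w, k, r, delta):
    """P[W <= w] for W = (U - delta^2)/V, U ~ ncx2(k, delta^2), V ~ chi2(r).

    scipy's noncentrality `nc` is the sum of squared means, so nc = delta^2
    matches Poisson weights with mean delta^2 / 2. Integration is over V.
    """
    d2 = delta ** 2
    integrand = lambda v: stats.ncx2.cdf(d2 + w * v, k, d2) * stats.chi2.pdf(v, r)
    value, _ = integrate.quad(integrand, 0, np.inf)
    return value

# PN of Eq. (8) for p = 4, m = 20, delta^2 = 3.01 (cf. Section 5's 0.7272):
p, m, d2 = 4, 20, 3.01
c1 = (m - p) * (p - 2) / (m - p + 2)
print(1 - cdf_W((c1 / 2) / (m - p), p, m - p, np.sqrt(d2)))
```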

4. EVALUATION OF THE PITMAN NEARNESS

In this section we give a table that shows that, in the Pitman nearness sense, the James-Stein estimator is preferred to the least squares estimator of the vector of parameters $\beta$ in linear regression. We also show that, in the Pitman nearness sense, the James-Stein estimator can be improved upon.

Let $\hat\beta_1$ be the James-Stein estimate (given by Eq. (4) with the James-Stein choice $c_1 = (m-p)(p-2)/(m-p+2)$) and $\hat\beta_2 = b$ be the least squares estimate (Eq. (4) with $c_2 = 0$). Table 1 gives the Pitman nearness of the James-Stein estimate $\hat\beta_1$ relative to the least squares estimate $b$ for several values of $p$, $m$ and $\delta$. The PN values are tabulated using the result in Eq. (8) and the infinite series expression in (10). The values in Table 1 were tabulated using IMSL subroutines for the Poisson weights and values of the CDF of Snedecor's F distribution. The accuracy of the results for $\delta = 0$ can be verified from the CDF of Snedecor's F distribution. Note that the case of $p = 2$ parameters is not tabulated because the James-Stein estimate then reduces to the least squares estimate $b$.

Since all values of the Pitman nearness given in Table 1 are greater than one-half, the least squares estimate $b$ is inadmissible with respect to the James-Stein estimator in the Pitman nearness sense for all tabulated values of $\delta$, $m$ and $p$. These numerical results support the analytical result of Sen et al. (1989), who prove that the Pitman nearness is always greater than one-half, so that the least squares estimator is inadmissible with respect to the James-Stein estimator in the Pitman nearness sense. The tabulated values show how much greater than one-half the values of Pitman nearness are for various values of $\delta$, $m$ and $p$. Keating and Mason (1988) show that the magnitude of preference based on PN does play an influential role in mean squared error comparisons.

The following conclusions can be reached from examining Table 1. For given values of $p$ and $m$, the smaller the value of $\delta$, the stronger the preference for the James-Stein estimate over the least squares estimate. Also, for given values of $\delta$ and $p$, there is a slight decrease in the strength of the preference for the James-Stein estimate as the number of observations $m$ increases. Furthermore, for given values of $\delta$ and $m$, the preference for the James-Stein estimate over the least squares estimate increases slightly as the number of parameters $p$ increases. Thus the preference for the James-Stein estimate over the least squares estimate decreases slightly as the number of observations $m$ increases, and increases slightly as the number of parameters $p$ in the model increases.



Table 1  Values of PN comparing the James-Stein estimator to the least squares estimator in linear regression

It can be shown that the optimal choice of $c_i$ in Eq. (4) in the Pitman nearness sense is $c^* = (m-p)M$, where $M$ is the median of the random variable $W$. Thus if $c_i = c_1$ is shrunk towards the median value $c^*$, we have an estimator which is better than the James-Stein estimator in the Pitman nearness sense. In fact, Sen et al. (1989) observe that the positive-part Stein rule is uniformly superior to the Stein rule in the sense of PN. However, $c^*$ depends upon the unknown parameter $\delta^2$, so that Eq. (4) with $c_i = c^*$ is not an estimator. These observations imply that, in the sense of Pitman nearness, there are better choices of $c_i$ in Eq. (4) than the James-Stein choice $c_i = (m-p)(p-2)/(m-p+2)$.
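Under the reconstructed form of $W$, the PN-optimal constant $c^* = (m-p)M$ can be approximated by simulation. A minimal sketch (ours), showing how far the James-Stein choice sits from $c^*$ at given values of $\delta$:

```python
import numpy as np

rng = np.random.default_rng(2)

def c_star(p, m, delta, n_sim=500_000):
    """Approximate c* = (m - p) * median(W); it depends on the unknown
    delta^2, so it is a benchmark rather than a usable estimator."""
    d2 = delta ** 2
    U = rng.noncentral_chisquare(p, d2, n_sim)
    V = rng.chisquare(m - p, n_sim)
    return (m - p) * np.median((U - d2) / V)

p, m = 4, 20
c_js = (m - p) * (p - 2) / (m - p + 2)
print(c_js, c_star(p, m, delta=0.0), c_star(p, m, delta=2.0))
```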

5. ESTIMATION OF $\delta^2$ AND CONCLUSIONS

In this section we give the maximum likelihood estimate $\hat\delta^2$ of $\delta^2$, and express the James-Stein estimate in terms of it. We see that the James-Stein estimate is preferable to the least squares estimate in the Pitman nearness sense when the model is misspecified (such as over-parameterized) and/or the model does not fit well (large $s^2$). We relate $\hat\delta^2$ to the F-statistic of the hypothesis test of $H_0\colon \beta = 0$. This leads to the conclusion that, since $H_0$ can be very strongly rejected in practice, the James-Stein estimate can be very close to the least squares estimate, and in such cases is only a slightly better estimate in the Pitman nearness sense. However, there are practical situations in which $H_0$ is strongly rejected, but the James-Stein estimate is sufficiently different from the least squares estimate to make it a better estimate in the Pitman nearness sense.

The maximum likelihood estimate of $\delta^2$, which is related to the signal-to-noise ratio, is

$$\hat\delta^2 = b'X'Xb/4s^2. \tag{11}$$

Observe that $\hat\delta^2$ is small when $b$ is close to zero (i.e., the model may be over-parameterized or otherwise misspecified), when $Xb$ is small (i.e., the estimated value of the observations is small), and when $s^2$ is large (i.e., the model doesn't fit well). The James-Stein estimate can be written in terms of $\hat\delta^2$ as

$$\hat\beta_1 = \left(1 - \frac{c_1}{4\hat\delta^2}\right) b. \tag{12}$$

This expression shows that the larger the value of $\hat\delta^2$, the closer the James-Stein (JS) estimate $\hat\beta_1$ is to the least squares (LS) estimate $b$. On the other hand, from Table 1 we saw that the larger the value of $\delta$, the smaller the preference for the JS estimate $\hat\beta_1$ over the LS estimate $b$ in the Pitman nearness (PN) sense. In other words, the larger $\delta$ is, the closer $\hat\beta_1$ is to $b$ and the smaller the preference of $\hat\beta_1$ over $b$. This interpretation makes the results of Table 1 plausible. We conclude that the preference for the JS estimate over the LS estimate is large in the PN sense (and the two estimates are not too close) when $\hat\delta^2$ is small. This occurs when the model is misspecified (such as over-parameterized), the estimated value of the observations $Xb$ is small, or $s^2$ is large (the model doesn't fit well).
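Equations (11) and (12) are easy to illustrate numerically. The sketch below is ours, not the authors'; the design matrix and parameter values are hypothetical, chosen only to show the computation.

```python
import numpy as np

rng = np.random.default_rng(3)

m, p = 20, 4                                   # sizes matching the worked example below
X = rng.standard_normal((m, p))                # hypothetical design matrix
beta = np.array([0.5, -0.3, 0.2, 0.1])         # hypothetical true parameters
Y = X @ beta + rng.standard_normal(m)

b = np.linalg.solve(X.T @ X, X.T @ Y)          # least squares estimate
s2 = (Y @ Y - b @ X.T @ Y) / (m - p)           # unbiased estimate of sigma^2
delta2_hat = (b @ X.T @ X @ b) / (4 * s2)      # Eq. (11)

c1 = (m - p) * (p - 2) / (m - p + 2)           # James-Stein choice of c_i
beta_js = (1 - c1 / (4 * delta2_hat)) * b      # Eq. (12)

# The relative difference ||beta_js - b|| / ||b|| equals c1 / (4 * delta2_hat):
print(delta2_hat, np.linalg.norm(beta_js - b) / np.linalg.norm(b))
```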

Table 2  Values of $\delta_\alpha$ and of PN for different levels of significance ($\alpha = 0.05$, $0.01$, $0.001$, $0.0001$) in the regression model

Let us now consider the test of the null hypothesis $H_0\colon \beta = 0$ against the alternative $H_1\colon$ at least one $\beta_i \ne 0$. If the null hypothesis $H_0$ is true, then all the parameters in the model are zero, including the constant term (if any); in this case $Y = X\beta + e = e$, so that the observations are random error, and the model is misspecified or over-parameterized. The F-statistic for the hypothesis test is

$$F = \frac{MSR}{MSE}, \tag{13}$$

where $MSR = b'X'Xb/p$ is the mean square for regression and $MSE = s^2$ is the mean square for error. The null hypothesis $H_0\colon \beta = 0$ is rejected at the $\alpha$ level of significance if $F$ is greater than the tabulated value $F_{p,m-p;\alpha}$ of the F distribution. We can write $\hat\delta^2$ in Eq. (11) in terms of the F-statistic in Eq. (13) as $\hat\delta^2 = (p/4)F$. Using this expression we see that the stronger the rejection of $H_0\colon \beta = 0$ (i.e., the larger the value of $F$ and consequently that of $\hat\delta^2$), the closer $\hat\beta_1$ is to $b$, and the smaller the preference of the JS estimate $\hat\beta_1$ over the LS estimate $b$ in the PN sense.

In practice, the statistician initially tests $H_0\colon \beta = 0$ for some set of regressor variables. However, the practitioner usually deletes the regressor variables for which $\hat\beta_i$ is not significantly different from zero. The subsequent model is refitted so that no component of $\hat\beta$ is zero. Therefore, in practice $F$ can be quite large, which means that the JS estimate $\hat\beta_1$ has only a slight edge over the LS estimate $b$ in the PN sense. Conversely, the JS estimate $\hat\beta_1$ will be substantially better than the LS estimate $b$ in the PN sense only when the model does not fit extremely well.

The quantity $\delta_\alpha = \sqrt{(p/4)F_{p,m-p;\alpha}}$ represents the smallest value of $\hat\delta$ at which the null hypothesis $H_0\colon \beta = 0$ is rejected at the $\alpha$ level of significance. For example, if $\alpha = 0.05$, $p = 4$ and $m = 20$, then $F_{4,16;0.05} \approx 3.01$ and $\delta_{0.05} = 1.73$. For $\delta = \delta_{0.05}$, the relative difference between the JS estimate $\hat\beta_1$ and the LS estimate $b$ is $\|\hat\beta_1 - b\|/\|b\| = c_1/4\delta_{0.05}^2 = 0.148$, and the value of PN can be computed as 0.7272. Note that both the relative difference between $\hat\beta_1$ and $b$ and the PN are fairly large.
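The arithmetic of this worked example is easy to reproduce; a minimal sketch:

```python
import numpy as np
from scipy import stats

alpha, p, m = 0.05, 4, 20
F_crit = stats.f.ppf(1 - alpha, p, m - p)     # F_{4,16;0.05}, about 3.01
delta_alpha = np.sqrt((p / 4) * F_crit)       # about 1.73
c1 = (m - p) * (p - 2) / (m - p + 2)
rel_diff = c1 / (4 * delta_alpha ** 2)        # ||b_JS - b|| / ||b||, about 0.148
print(F_crit, delta_alpha, rel_diff)
```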

The columns of Table 2 give $\delta_\alpha$ as well as the values of PN for $\delta = \delta_\alpha$, that is, the

values of PN at which one would reject the null hypothesis $H_0\colon \beta = 0$, at significance levels $\alpha = 0.05$, $0.01$, $0.001$, and $0.0001$, for $p = 4, 6, 10$ and $m = 20, 60, 100$. Table 2 is very reminiscent of the results found in Jennrich and Oman (1986). Observe that as $\alpha$ decreases, the value of PN decreases. However, even at $\alpha = 0.0001$, the preference of the James-Stein estimator over the least squares estimator is too large to be ignored. Consequently we have three regions of the parameter space to be concerned with. For values of $\delta$ near zero, the outcome is immaterial because the regression model does not fit the data. For moderate values of $\delta$ (i.e., $p/2 \le \delta^2 \le 2p$, say), the regression model cannot be rejected and the James-Stein estimator is significantly better than the least squares estimator. For $\delta^2 > 2p$, Table 1 reveals that the improvement obtained by the James-Stein estimator is negligible.

The adaptive estimator (based on the outcome of an F-test) suggested in the paragraph above is similar in construction to the estimator suggested by Waikar et al. (1984). They coin the phrase "testimator" to describe such adaptive estimators predicated on the outcome of certain tests of hypotheses. In fact such "testimators" date back to Bancroft (1972) and Hogg (1974).

Consider the usual case in which there is a constant term $\beta_1$ in the model. The test statistic $F'$ for testing the null hypothesis $H_0'\colon \beta_i = 0$, $i \ne 1$, is related to the test statistic $F$ for $H_0\colon \beta_i = 0$ for all $i$ (i.e., $H_0\colon \beta = 0$) by Eq. (14). This expression shows that $F$ and $F'$ are directly proportional, so that using $F'$ one can reach conclusions that are analogous to those found using $F$. Computer programs usually express their analysis of variance tables in terms of sums of squares corrected for the mean, and of $F'$.
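As an illustration, the sketch below (ours) computes both statistics from simulated data using the standard ANOVA definitions, the uncorrected $F$ of Eq. (13) and the mean-corrected $F'$, rather than the closed form of Eq. (14):

```python
import numpy as np

rng = np.random.default_rng(4)

m, p = 20, 4
X = np.column_stack([np.ones(m), rng.standard_normal((m, p - 1))])  # constant term first
Y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.standard_normal(m)

b = np.linalg.solve(X.T @ X, X.T @ Y)
s2 = (Y @ Y - b @ X.T @ Y) / (m - p)

F = (b @ X.T @ X @ b / p) / s2                 # Eq. (13): uncorrected for the mean
ssr_corr = b @ X.T @ Y - m * Y.mean() ** 2     # regression SS corrected for the mean
F_prime = (ssr_corr / (p - 1)) / s2            # the usual ANOVA F'
print(F, F_prime)
```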

Acknowledgement  The authors presented the contents of this manuscript at the 1987 Joint Statistical Meetings in San Francisco.

Bibliography

Bancroft, T. A. (1972). Some recent advances in inference procedures using preliminary tests of significance. Statistical Papers in Honor of George W. Snedecor. Iowa State University Press, 19-30.

Draper, N. R. and Van Nostrand, R. C. (1979). Ridge regression and James-Stein estimation: Review and comments. Technometrics 21, 451-466.

Efron, B. and Morris, C. (1973). Stein's estimation rule and its competitors-an empirical Bayes approach. Journal of the American Statistical Association 68, 117-130.

Halperin, M. (1970). On inverse estimation in linear regression. Technometrics 12, 727-736.

Hogg, R. V. (1974). Adaptive robust procedures: A partial review and some suggestions for future applications and theory. Journal of the American Statistical Association 69, 909-923.

Jennrich, R. I. and Oman, S. D. (1986). How much does Stein estimation help in multiple linear regression? Technometrics 28, 113-121.


Keating, J. P. and Mason, R. L. (1988). James-Stein estimation from an alternative perspective. The American Statistician 42, 160-164.

Lindley, D. V. and Smith, A. F. M. (1972). Bayes estimates for the linear model. Journal of the Royal Statistical Society, Series B 34, 1-41.

Rao, C. R., Keating, J. P. and Mason, R. L. (1986). The Pitman nearness criterion and its determination. Communications in Statistics A15, 3173-3191.

Sen, P. K. (1986). Are BAN estimators the Pitman-closest ones too? Sankhya, Series A 48, 51-58.

Sen, P. K., Kubokawa, T. and Saleh, A. K. M. E. (1989). The Stein paradox in the sense of the Pitman measure of closeness. Annals of Statistics, to appear.

Waikar, V. B., Schuurmann, F. J. and Raghunathan, T. E. (1984). On a two-stage shrinkage testimator of the mean of a normal distribution. Communications in Statistics A13, no. 15, 1901-1913.
