USING BOOTSTRAPPED CONFIDENCE INTERVALS FOR IMPROVED INFERENCES WITH SEEMINGLY UNRELATED REGRESSION EQUATIONS

Paul Rilstone and Michael Veall

York University (416) 736-2100 and McMaster University (905) 525-3829. First version: November 1993. This version: June 1995.

The authors would like to thank three anonymous referees and the editor for their comments. Any remaining errors are the responsibility of the authors. Research funding for both authors was provided by the Social Sciences and Humanities Research Council of Canada.

Running Title: Bootstrapped Confidence Intervals

Corresponding Author's address: Paul Rilstone Department of Economics York University North York, Ontario CANADA, M3J 1P3


Abstract

The usual standard errors for the regression coefficients in a Seemingly Unrelated Regression model have a substantial downward bias. Bootstrapping the standard errors does not seem to improve inferences. In this paper, Monte Carlo evidence is reported which indicates that bootstrapping can result in substantially better inferences when applied to t-ratios rather than to standard errors.


1. Introduction

Systems of Seemingly Unrelated Regression (SUR) equations have received extensive attention in the econometrics and statistics literature. A well documented result in this context is that the usual standard errors for the regression coefficients have a substantial downward bias. The central implication of this is that the usual hypothesis tests have a tendency to overreject. A natural suggestion is that better standard errors might be obtained via the bootstrap. However, research in this direction has suggested that bootstrapping may not be a panacea. Marais (1986) reported Monte Carlo evidence that the usual bootstrapped standard errors, while better than the conventional estimates, were still biased downward. More recently, Atkinson and Wilson (1992) demonstrated analytically that this is the case and provided further Monte Carlo evidence that bootstrapped standard errors are biased downwards. An apparent implication is that the bootstrap provides little improvement for inferences over the traditional techniques. We say apparent since the primary reason one estimates standard errors (which themselves are only nuisance parameters) is to make inferences, and it would seem more reasonable to focus attention, say, on estimation of confidence intervals. The purpose of this paper is to demonstrate that, when some second order refinements are utilized, the bootstrap can provide much better inferences in the SUR context than is evidenced by the results on standard error estimates.

In the econometrics literature, Efron's (1979) bootstrap has been used primarily to estimate properties of the asymptotic distribution of an estimator which may be analytically intractable. In this way, the bootstrap has been used to avoid the derivation of asymptotic results and/or their numerical calculation. While the bootstrap is certainly a valuable tool when employed in this way, more recent research has focused on the fact that, in certain contexts or with certain modifications, the bootstrap estimate of the distribution function of a statistic is actually more accurate than that provided by asymptotic theory. Efron's (1987) accelerated bias correction (the BCa method) produces bootstrap estimates of the distribution of an estimator which are second order accurate. Singh (1981), Hartigan (1986), Wu (1986), Beran (1988) and Hall (1988) have shown that the bootstrap estimate of the distribution function of an asymptotically pivotal statistic (the percentile-t or prepivoting method) is also, in a higher order sense, closer to the true distribution than the usual estimate based on asymptotic theory. In contrast, the traditional bootstrap estimate of the distribution of a (nonpivotal) statistic is generally not better than that furnished by asymptotic theory.

To see whether these analytical results hold with respect to inferences with SUR models, we reconsidered the Monte Carlo experiment employed by Atkinson and Wilson. As will be seen, the results support a much more positive image of the bootstrap when the percentile-t method is used, although the BCa method does not seem to lead to any great improvement in this context over traditional techniques. The discussion proceeds as follows. In the next section we briefly review various approaches to bootstrapping. In Section 3 we describe the simulations and results. Section 4 concludes.

2. Improved bootstrapped confidence intervals

There are several formal and detailed comparisons of various bootstrap methods, including Hall (1988, 1992) and Beran (1988). We summarize the ideas here. Let $\hat\theta$ be an estimator of a parameter $\theta$ based on a sample $X$. Two ways in which Efron's (1979) bootstrap has been used in econometrics to construct confidence intervals for $\theta$ are as follows. Let $\hat\theta^*$ denote a bootstrapped estimator of $\theta$ based on a resampling (random and with replacement) from $X$. The respective distribution functions are denoted

$$G(x) = \Pr[\hat\theta \le x] \quad\text{and}\quad \hat G(x) = \Pr[\hat\theta^* \le x \mid X]. \tag{2.1}$$

An asymptotically valid $(1-\alpha)100\%$ confidence interval for $\theta$ is given by

$$I_{BD} = \left[ \hat G^{-1}(\alpha/2),\ \hat G^{-1}(1-\alpha/2) \right]. \tag{2.2}$$

In a practical context, the $\alpha/2$ and $1-\alpha/2$ quantiles from the empirical distribution of $B$ estimates of $\hat\theta^*$ from resampling would be used to construct the confidence intervals.

Another way in which the bootstrap has been used to construct confidence intervals (explained in detail in Efron and Tibshirani, 1993), and that implicitly employed in the papers concerned with standard errors, is as follows. Suppose $\hat\theta$ is assumed asymptotically or approximately normal with mean $\theta$ and standard deviation $\sigma(\hat\theta)$. Let $\sigma^*(\hat\theta)$ denote the standard deviation of the bootstrap estimator of $\hat\theta$, denote the standard normal cumulative distribution function by $\Phi(z)$ and put $z_x = \Phi^{-1}(x)$. The confidence interval for $\theta$ is

$$I_B = \left[ \hat\theta - \sigma^*(\hat\theta)\, z_{1-\alpha/2},\ \hat\theta - \sigma^*(\hat\theta)\, z_{\alpha/2} \right]. \tag{2.3}$$

In practice, confidence intervals are constructed in the usual manner with the usual standard error replaced by the standard deviation of $B$ resampled estimates of $\theta$.
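As an illustration, the two constructions (2.2) and (2.3) can be sketched in a few lines of Python. This is our own illustrative sketch for a single-equation OLS setting with pairwise resampling, not the code used in the paper; the function names are ours.

```python
import numpy as np
from statistics import NormalDist

def bootstrap_intervals(y, X, alpha=0.10, B=200, seed=0):
    """Sketch of the I_BD (2.2) and I_B (2.3) intervals for OLS
    coefficients, resampling observation pairs with replacement."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    boot = np.empty((B, k))
    for b in range(B):
        idx = rng.integers(0, n, size=n)      # bootstrap resample indices
        boot[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    # I_BD: alpha/2 and 1 - alpha/2 quantiles of the bootstrap distribution
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2], axis=0)
    i_bd = np.column_stack([lo, hi])
    # I_B: normal interval with sigma*(theta-hat) = bootstrap standard deviation
    z = NormalDist().inv_cdf(1 - alpha / 2)
    s = boot.std(axis=0, ddof=1)
    i_b = np.column_stack([beta_hat - z * s, beta_hat + z * s])
    return i_bd, i_b
```

Note that $I_B$ uses the bootstrap only to replace the standard error inside an otherwise conventional normal interval, which is why its accuracy is tied to that of the bootstrapped standard error.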

To improve on inferences using $I_{BD}$, Efron (1987) introduced the $BC_a$ method to adjust for second order bias and skewness. In this case the confidence interval is

$$I_{BC_a} = \left[ \hat G^{-1}\!\left(\Phi(z[\alpha/2])\right),\ \hat G^{-1}\!\left(\Phi(z[1-\alpha/2])\right) \right] \tag{2.4}$$

where $z[x] = m + (m + z_x)/(1 - a(m + z_x))$ and $m$ and $a$ are measures of the bias and skewness of the estimator.¹

An alternative (and simpler) way to achieve higher order accuracy is the percentile-t method. Again, suppose $\hat\theta$ is asymptotically normal with asymptotic standard deviation $\sigma(\hat\theta)$. Define the usual t-statistic and its bootstrap by

$$\hat t = \frac{\hat\theta - \theta}{\hat\sigma(\hat\theta)}, \qquad \hat t^* = \frac{\hat\theta^* - \hat\theta}{\hat\sigma^*(\hat\theta^*)} \tag{2.5}$$

(where $\hat\sigma(\hat\theta)$ and $\hat\sigma^*(\hat\theta^*)$ denote the usual, estimated, standard errors based on a sample and a resample respectively) and their respective distribution functions by

$$H(x) = P[\hat t \le x], \qquad \hat H(x) = P[\hat t^* \le x \mid X]. \tag{2.6}$$

A $(1-\alpha)100\%$ confidence interval for $\hat t$ is $[t_{\alpha/2},\ t_{1-\alpha/2}] = [\hat H^{-1}(\alpha/2),\ \hat H^{-1}(1-\alpha/2)]$. The percentile-t method therefore provides alternative critical points for the usual t-statistic in place of those provided by asymptotic theory. Confidence intervals for $\theta$ are constructed as

$$I_{Bt} = \left[ \hat\theta - \hat\sigma(\hat\theta)\, t_{1-\alpha/2},\ \hat\theta - \hat\sigma(\hat\theta)\, t_{\alpha/2} \right]. \tag{2.7}$$
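A sketch of the percentile-t construction, again for a single-equation OLS setting and with our own function names, may clarify the reversal of the critical points in (2.7):

```python
import numpy as np

def ols_with_se(y, X):
    """OLS coefficients and conventional standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    s2 = e @ e / (len(y) - X.shape[1])
    return beta, np.sqrt(s2 * np.diag(XtX_inv))

def percentile_t_interval(y, X, alpha=0.10, B=200, seed=0):
    """Sketch of the percentile-t interval I_Bt (2.7): bootstrap the
    t-statistic (2.5) and use its empirical quantiles as critical values."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    beta, se = ols_with_se(y, X)
    t_star = np.empty((B, k))
    for b in range(B):
        idx = rng.integers(0, n, size=n)
        bb, sb = ols_with_se(y[idx], X[idx])
        t_star[b] = (bb - beta) / sb      # t-hat*, centered at the sample estimate
    t_lo, t_hi = np.quantile(t_star, [alpha / 2, 1 - alpha / 2], axis=0)
    # upper endpoint uses the lower bootstrap critical point, as in (2.7)
    return np.column_stack([beta - se * t_hi, beta - se * t_lo])
```

Each resample requires both the estimate and its standard error, which is the extra computational cost of prepivoting noted in Section 3.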

3. Monte Carlo design and results

The experiments were based directly on the experimental design used by Atkinson and Wilson. A two-equation model was constructed and the various estimation techniques were compared for a variety of parameter values. The model was thus

$$\begin{pmatrix} y_{1i} \\ y_{2i} \end{pmatrix} = \begin{pmatrix} X_{1i}' & 0 \\ 0 & X_{2i}' \end{pmatrix} \beta + \begin{pmatrix} \epsilon_{1i} \\ \epsilon_{2i} \end{pmatrix}, \qquad i = 1, 2, \ldots, N, \tag{3.1}$$

or more concisely

$$y_i = X_i'\beta + \epsilon_i, \qquad i = 1, 2, \ldots, N. \tag{3.2}$$

$X_{1i}$ and $X_{2i}$ were both $k \times 1$, $k = 3$, vectors, the first element of each being a constant. The other regressors were i.i.d., jointly normal with mean zero, unit variance and covariance $\rho_x$. These were drawn once so that inferences were conditional on that particular draw. The $\epsilon_i$'s were also i.i.d., jointly normal with mean zero, unit variance and covariance $\rho_\epsilon$. All the $\beta_j$'s were equal to two. The experiments were conducted for a range of parameter values and sample sizes, specifically $\rho_x \in \{0, .5\}$, $\rho_\epsilon \in \{.25, .75\}$ and $N \in \{25, 50, 100\}$. Each experiment was replicated 1000 times. All of the estimators were of the form

$$\hat\beta = \left( \sum_{i=1}^N X_i \hat\Sigma^{-1} X_i' \right)^{-1} \sum_{i=1}^N X_i \hat\Sigma^{-1} y_i. \tag{3.3}$$

With least squares, $\hat\Sigma$ was set equal to the identity matrix. The SUR estimators differed by their choice of $\hat\Sigma$. The usual or restricted estimators used the second moment matrix of the usual least squares residuals. The unconstrained or unrestricted estimators used least squares residuals from the projections of $y_{1i}$ and $y_{2i}$ on both $X_{1i}$ and $X_{2i}$. An incompletely resolved issue with SUR estimation is the appropriate normalizing factor or degrees of freedom for the estimation of covariance matrices and standard errors. In all cases we followed Atkinson and Wilson and used $N^{-1}$.

For each replication, five sets of 90, 95 and 99 percent confidence intervals were computed ($\alpha = .10, .05, .01$). The first was based on the usual standard errors for each estimator, using the $N(0,1)$ distribution to calculate the confidence intervals. The two sets of confidence intervals based on (2.2) and (2.3), as well as those produced by the percentile-t method (2.7), are straightforward. The accelerated bias correction needs a little explanation. Following Efron (1987), for coefficient $j$ the bias parameter was estimated by $\hat m_j = \Phi^{-1}(\hat G(\hat\beta_j))$. For the OLS and SUR estimation, define $U_{ji}$ as the sample influence function of the residual $e_i$ on estimate $\hat\beta_j$, so that $U_{ji} = S_{(j)} X_i \hat\Sigma^{-1} e_i$, where $S_{(j)}$ is the $j$'th row of the inverse of $\sum_{i=1}^N X_i \hat\Sigma^{-1} X_i'$. The skewness measure for $\hat\beta_j$ was estimated by

$$\hat a_j = \frac{1}{6} \sum_{i=1}^N U_{ji}^3 \bigg/ \left( \sum_{i=1}^N U_{ji}^2 \right)^{3/2}. \tag{3.4}$$

¹ As Hall (1992, p. 134) points out, there are several asymptotically equivalent ways to write this correction. We experimented with several; none of the other forms seemed to perform any better.
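As a concrete illustration of (3.3), a minimal sketch of the restricted ("usual") feasible SUR estimator for a two-equation system is given below. The function and variable names are ours, not from the paper; the divisor $N$ for the residual covariance follows the text.

```python
import numpy as np

def sur_fgls(y1, X1, y2, X2):
    """Two-equation 'usual' SUR estimator of the form (3.3), with the
    disturbance covariance estimated from equation-by-equation OLS
    residuals using divisor N."""
    N = len(y1)
    # first-stage OLS residuals, equation by equation
    e1 = y1 - X1 @ np.linalg.lstsq(X1, y1, rcond=None)[0]
    e2 = y2 - X2 @ np.linalg.lstsq(X2, y2, rcond=None)[0]
    E = np.column_stack([e1, e2])
    Sigma_inv = np.linalg.inv(E.T @ E / N)
    k1, k2 = X1.shape[1], X2.shape[1]
    A = np.zeros((k1 + k2, k1 + k2))   # accumulates sum_i X_i Sigma^-1 X_i'
    c = np.zeros(k1 + k2)              # accumulates sum_i X_i Sigma^-1 y_i
    for i in range(N):
        Zi = np.zeros((2, k1 + k2))    # X_i' in (3.1): block-diagonal rows
        Zi[0, :k1] = X1[i]
        Zi[1, k1:] = X2[i]
        A += Zi.T @ Sigma_inv @ Zi
        c += Zi.T @ Sigma_inv @ np.array([y1[i], y2[i]])
    return np.linalg.solve(A, c)
```

Setting `Sigma_inv` to the identity matrix instead recovers equation-by-equation least squares; the unrestricted variant would replace the first-stage residuals with those from regressions on both $X_{1i}$ and $X_{2i}$.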

The only difference from the Atkinson and Wilson experiment (apart from the random numbers) was that we set $B = 200$ rather than 100, because $B = 100$ does not provide for sensible confidence intervals with $\alpha = .01$. For any particular draw, even 200 might seem small, particularly for computing quantiles at the tails of the distribution; however, the errors in large measure cancel in a Monte Carlo setting. (In the context of a specific empirical example, where a large number of Monte Carlo trials need not be performed, it would probably be worthwhile to increase $B$ to 1000 as suggested by Efron (1987). We also conducted the experiment with $N = 25$, $\rho_\epsilon = .25$ and $\rho_x = 0$ using $B = 2000$. The results changed very little.) The endpoints for the $(1-\alpha)100\%$ confidence intervals were then computed as the $\alpha/2$ and $1-\alpha/2$ empirical quantiles. There is some argument as to how to choose these, particularly for relatively small $B$. With $B$ bootstrap estimates of $\hat\theta$ we used a traditional approach, with the $p$'th quantile estimated by $\hat\theta^*_p = (1-g)\hat\theta^*_{(j)} + g\hat\theta^*_{(j+1)}$, where $j = \mathrm{int}[(B+1)p]$ for $p = \alpha/2,\ 1-\alpha/2$, $(B+1)p = j + g$, and $\hat\theta^*_{(j)}$ is the $j$'th order statistic of the bootstrapped $\hat\theta^*$'s.

To compare the intervals we calculated the percentage of times that the intervals included the true values of the parameters. If the confidence interval estimates are unbiased, these should be equal to the nominal $(1-\alpha)100\%$ levels. For each replication there were six numbers corresponding to each of the estimated coefficients. To summarize the results succinctly, the numbers given in Tables 1-9 represent the average "acceptance" rates over the six coefficients. In the tables, "IC" refers to the usual, classical, confidence intervals.

A number of points can be made with respect to the results. First, changing the parameter values did not have a great effect on the relative performance of the techniques. The coverage of the confidence intervals was slightly increased for larger amounts of correlation amongst the regressors ($\rho_x = .5$). Second, the results certainly reflect the well established fact that the usual standard errors are downwardly biased.
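The interpolation rule for the empirical quantiles described above can be written out directly. This sketch is ours; it assumes the bootstrap estimates are supplied as a plain sequence, and the guard for small $B$ at extreme $p$ is our addition.

```python
def boot_quantile(theta_star, p):
    """Empirical p'th quantile of B bootstrapped estimates via the
    interpolation rule in the text: j = int[(B+1)p], g = (B+1)p - j,
    quantile = (1-g)*theta_(j) + g*theta_(j+1), where theta_(j) is
    the j'th order statistic."""
    s = sorted(theta_star)
    B = len(s)
    h = (B + 1) * p
    j = int(h)
    g = h - j
    j = min(max(j, 1), B)              # guard against j = 0 or j = B for small B
    nxt = s[j] if j < B else s[B - 1]  # theta_(j+1), clipped at the maximum
    return (1 - g) * s[j - 1] + g * nxt
```

The guard makes concrete why $B = 100$ fails at $\alpha = .01$: there $(B+1)p = 101 \times .005 \approx 0.5$, so no order statistic lies below the required quantile.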
In all of the experiments, the coverage of the classical confidence intervals was substantially below the nominal coverage. The results of Atkinson and Wilson were also confirmed in that in most (but not all) cases, the confidence intervals based on a bootstrapped estimate of the variance also had too little coverage, although in general their coverage was slightly better than that of the intervals based on the classical procedure. (None of the techniques were substantially skewed. The percentile-t intervals were generally wider than the other intervals.) It is interesting that the coverage errors for $I_C$, $I_B$ and $I_{BD}$ are relatively similar, for while all of the two-sided equal-tailed confidence intervals studied here are second-order accurate in terms of coverage (Hall, 1992, p. 104), these three intervals all have critical points that are only first order accurate, not second order accurate as in the case of percentile-t intervals. (We also calculated one-sided intervals. The ranking of the techniques was the same.)

The overall results were strongly in favour of the percentile-t method. Of the 108 different experiments reported in the tables, the percentile-t method had the best coverage accuracy in all but 14 cases. Moreover, in these 14 cases, the coverage of the percentile-t method was still very close to nominal levels, the largest difference being just over one percentage point and the average difference over the fourteen cases being less than one third of a percentage point. Overall, the percentile-t method was typically within one percentage point of nominal coverage, and the maximum difference was about 2.5, 1.5 and .5 percentage points respectively for the 90%, 95% and 99% confidence intervals. For all the other methods, the comparable differences were at least 5, 4 and 3 percentage points. Perhaps the most important comparison is with respect to the estimators' performance with the small ($N = 25$) samples. This is, after all, one of the motivations for using the newer techniques. (With the larger samples, all the techniques reflect the asymptotic results.) With samples of size 25, those techniques not based on prepivoting had coverage errors often as high as 5 points or more. In contrast, the percentile-t confidence intervals seldom had coverage errors exceeding 1.5 percentage points. The performance of the $BC_a$ method was disappointing: in many situations it performed less well than the classical approach. This was consistent with results in Rilstone (1993). One possible reason for this, pointed out by a referee and discussed by Hall (1992), is nonmonotonicity (with respect to $\alpha$) in the $BC_a$ interval. This has the effect of shortening the intervals. Since the $BC_a$ correction has various representations that are second order equivalent, we attempted these, but without any success. The $BC_a$ method tended to do better for the larger samples and smaller $\alpha$'s.
Since most bootstrapping estimates are obtained by Monte Carlo methods, a nontrivial consideration concerns the differences in computational complexity. Each of the bootstraps used in the simulations requires $B = 200$ resamples. In addition, $I_{BD}$, $I_{BC_a}$ and $I_{Bt}$ require sorting the $B$ estimates. $I_{BC_a}$ requires slightly more time than $I_{BD}$ to calculate the bias correction. The percentile-t method requires additional time to compute both the estimates and their standard errors for each resample. Our personal experience, however, is that this is offset by the simple construction of the test statistic. In contrast, the form of the accelerated bias correction is not obvious in many situations. From a practitioner's perspective, the differences in computing time may be a relatively minor consideration.

We also conducted the simulations with larger SUR models (2 equations with 8 regressors in each, and 3 equations with 4 regressors in each) as well as with t(5) disturbances rather than normal. With the larger SUR models, the ranking of the confidence intervals remained the same in terms of coverage errors. With the t(5) disturbances, the relative ranking of the various bootstrapped intervals was essentially the same; the usual classical intervals were occasionally better.

4. Conclusion

Our simulations support recent results which indicate that the accuracy of the bootstrap can be substantially increased when it is applied to asymptotically pivotal statistics. In the SUR model, the percentile-t method led to substantial reductions in coverage error as compared to more traditional bootstrap confidence intervals. On the other hand, the accelerated bias correction of Efron (1987) did not lead to any substantial improvements. We should add that these results do not imply that the bootstrap will provide substantially better confidence intervals in more complicated situations, for example where nonlinearities or nonnormal disturbances are involved. These issues are being investigated.

References

Atkinson, Scott E. and Paul W. Wilson, The bias of bootstrapped versus conventional standard errors in the general linear and SUR models, Econometric Theory 8 (1992), 258-275.
Beran, R.J., Prepivoting test statistics: A bootstrap view of asymptotic refinements, Journal of the American Statistical Association 83 (1988), 687-697.
Efron, B., Bootstrap methods: Another look at the jackknife, Annals of Statistics 7 (1979), 1-26.
Efron, B., Better bootstrap confidence intervals (with discussion), Journal of the American Statistical Association 82 (1987), 171-200.
Efron, B. and R. Tibshirani, An Introduction to the Bootstrap, Chapman and Hall, New York, 1993.
Hall, Peter, Theoretical comparison of bootstrap confidence intervals (with discussion), Annals of Statistics 16 (1988), 927-953.
Hall, Peter, The Bootstrap and Edgeworth Expansion, Springer-Verlag, New York, 1992.
Hartigan, J.A., Comment, Statistical Science 1 (1986), 75-77.
Marais, M.L., On the finite sample performance of estimated generalized least squares in seemingly unrelated regression: nonnormal disturbances and alternative standard error estimators, working paper, Graduate School of Business, University of Chicago, 1986.
Rilstone, Paul, Some improvements for bootstrapping regression estimators under first-order serial correlation, Economics Letters 42 (1993), 335-339.
Singh, K., On the asymptotic accuracy of Efron's bootstrap, Annals of Statistics 9 (1981), 1187-1195.
Wu, C.F.J., Jackknife, bootstrap and other resampling methods in regression analysis, Annals of Statistics 14 (1986), 1261-1295.


Table 1: Percentage coverage of nominal 90 per cent confidence intervals based on OLS estimates of β.

                  N = 25                          N = 50                          N = 100
ρε,ρx    .25,0  .25,.5  .75,0  .75,.5    .25,0  .25,.5  .75,0  .75,.5    .25,0  .25,.5  .75,0  .75,.5
IC       .8523  .8530   .8547  .8527    .8797  .8860   .8843  .8878    .8898  .8907   .8877  .8877
IB       .8480  .8488   .8543  .8490    .8788  .8805   .8817  .8823    .8873  .8873   .8845  .8847
IBt      .8818  .8798   .8773  .8787    .8975  .9005   .8948  .8942    .8967  .8993   .8952  .8913
IBD      .8517  .8500   .8565  .8550    .8787  .8833   .8795  .8815    .8863  .8872   .8873  .8883
IBCa     .8462  .8447   .8482  .8427    .8730  .8755   .8733  .8735    .8803  .8815   .8793  .8807

Table 2: Percentage coverage of nominal 90 per cent confidence intervals based on usual SUR estimates of β.

                  N = 25                          N = 50                          N = 100
ρε,ρx    .25,0  .25,.5  .75,0  .75,.5    .25,0  .25,.5  .75,0  .75,.5    .25,0  .25,.5  .75,0  .75,.5
IC       .8430  .8448   .8595  .8592    .8785  .8793   .8858  .8875    .8885  .8868   .8898  .8873
IB       .8620  .8620   .9253  .9228    .8877  .8865   .9462  .9448    .8940  .8938   .9465  .9457
IBt      .8860  .8835   .8993  .8968    .8980  .8990   .9042  .9022    .8973  .8973   .8982  .8978
IBD      .8628  .8648   .9263  .9228    .8888  .8892   .9453  .9457    .8970  .8962   .9473  .9452
IBCa     .8548  .8583   .9192  .9155    .8805  .8838   .9388  .9368    .8880  .8890   .9410  .9383

Table 3: Percentage coverage of nominal 90 per cent confidence intervals based on unconstrained SUR estimates of β.

                  N = 25                          N = 50                          N = 100
ρε,ρx    .25,0  .25,.5  .75,0  .75,.5    .25,0  .25,.5  .75,0  .75,.5    .25,0  .25,.5  .75,0  .75,.5
IC       .8190  .8215   .8215  .8238    .8677  .8673   .8653  .8627    .8840  .8808   .8808  .8802
IB       .8642  .8642   .9318  .9277    .8892  .8860   .9470  .9450    .8938  .8938   .9475  .9457
IBt      .8852  .8843   .8860  .8868    .8968  .8977   .8947  .8943    .8962  .8952   .8930  .8945
IBD      .8665  .8660   .9320  .9288    .8900  .8892   .9462  .9458    .8973  .8965   .9487  .9465
IBCa     .8585  .8590   .9275  .9237    .8800  .8835   .9392  .9378    .8878  .8888   .9427  .9387

Table 4: Percentage coverage of nominal 95 per cent confidence intervals based on OLS estimates of β.

                  N = 25                          N = 50                          N = 100
ρε,ρx    .25,0  .25,.5  .75,0  .75,.5    .25,0  .25,.5  .75,0  .75,.5    .25,0  .25,.5  .75,0  .75,.5
IC       .9080  .9072   .9107  .9112    .9350  .9355   .9357  .9362    .9430  .9443   .9407  .9417
IB       .9093  .9078   .9145  .9140    .9342  .9347   .9340  .9353    .9420  .9453   .9402  .9402
IBt      .9380  .9365   .9375  .9357    .9473  .9480   .9422  .9453    .9495  .9505   .9440  .9438
IBD      .9108  .9117   .9142  .9137    .9322  .9365   .9313  .9358    .9418  .9447   .9397  .9410
IBCa     .9045  .9043   .9057  .9042    .9287  .9288   .9287  .9290    .9340  .9375   .9317  .9335

Table 5: Percentage coverage of nominal 95 per cent confidence intervals based on usual SUR estimates of β.

                  N = 25                          N = 50                          N = 100
ρε,ρx    .25,0  .25,.5  .75,0  .75,.5    .25,0  .25,.5  .75,0  .75,.5    .25,0  .25,.5  .75,0  .75,.5
IC       .9052  .9038   .9185  .9168    .9318  .9315   .9372  .9380    .9427  .9427   .9440  .9405
IB       .9172  .9172   .9603  .9577    .9380  .9415   .9727  .9727    .9480  .9482   .9748  .9743
IBt      .9395  .9402   .9485  .9455    .9485  .9477   .9515  .9488    .9490  .9512   .9478  .9493
IBD      .9182  .9192   .9608  .9595    .9372  .9403   .9733  .9723    .9470  .9478   .9735  .9740
IBCa     .9127  .9142   .9555  .9508    .9317  .9347   .9708  .9682    .9410  .9410   .9710  .9707

Table 6: Percentage coverage of nominal 95 per cent confidence intervals based on unconstrained SUR estimates of β.

                  N = 25                          N = 50                          N = 100
ρε,ρx    .25,0  .25,.5  .75,0  .75,.5    .25,0  .25,.5  .75,0  .75,.5    .25,0  .25,.5  .75,0  .75,.5
IC       .8862  .8877   .8880  .8890    .9253  .9247   .9223  .9248    .9393  .9395   .9365  .9345
IB       .9187  .9183   .9632  .9605    .9382  .9413   .9730  .9733    .9482  .9482   .9747  .9748
IBt      .9383  .9375   .9412  .9392    .9473  .9478   .9472  .9462    .9498  .9503   .9437  .9462
IBD      .9205  .9197   .9648  .9618    .9382  .9405   .9733  .9733    .9473  .9477   .9737  .9742
IBCa     .9150  .9160   .9607  .9550    .9328  .9357   .9702  .9682    .9418  .9413   .9705  .9705

Table 7: Percentage coverage of nominal 99 per cent confidence intervals based on OLS estimates of β.

                  N = 25                          N = 50                          N = 100
ρε,ρx    .25,0  .25,.5  .75,0  .75,.5    .25,0  .25,.5  .75,0  .75,.5    .25,0  .25,.5  .75,0  .75,.5
IC       .9700  .9707   .9720  .9728    .9865  .9867   .9850  .9852    .9870  .9882   .9875  .9873
IB       .9717  .9723   .9730  .9740    .9835  .9847   .9832  .9825    .9868  .9878   .9843  .9862
IBt      .9853  .9852   .9870  .9873    .9905  .9895   .9883  .9878    .9918  .9915   .9897  .9920
IBD      .9752  .9728   .9727  .9745    .9848  .9860   .9838  .9847    .9883  .9885   .9855  .9873
IBCa     .9587  .9553   .9587  .9593    .9738  .9743   .9722  .9747    .9777  .9788   .9743  .9780

Table 8: Percentage coverage of nominal 99 per cent confidence intervals based on usual SUR estimates of β.

                  N = 25                          N = 50                          N = 100
ρε,ρx    .25,0  .25,.5  .75,0  .75,.5    .25,0  .25,.5  .75,0  .75,.5    .25,0  .25,.5  .75,0  .75,.5
IC       .9670  .9670   .9755  .9743    .9825  .9842   .9847  .9860    .9870  .9867   .9883  .9880
IB       .9740  .9755   .9902  .9907    .9852  .9863   .9937  .9930    .9880  .9892   .9952  .9948
IBt      .9873  .9860   .9913  .9900    .9903  .9898   .9898  .9905    .9917  .9918   .9917  .9917
IBD      .9765  .9753   .9885  .9895    .9853  .9873   .9937  .9938    .9902  .9905   .9957  .9962
IBCa     .9627  .9588   .9832  .9830    .9750  .9772   .9897  .9905    .9812  .9813   .9918  .9918

Table 9: Percentage coverage of nominal 99 per cent confidence intervals based on unconstrained SUR estimates of β.

                  N = 25                          N = 50                          N = 100
ρε,ρx    .25,0  .25,.5  .75,0  .75,.5    .25,0  .25,.5  .75,0  .75,.5    .25,0  .25,.5  .75,0  .75,.5
IC       .9550  .9562   .9563  .9575    .9782  .9810   .9802  .9795    .9862  .9857   .9862  .9870
IB       .9753  .9757   .9907  .9917    .9855  .9865   .9937  .9932    .9878  .9892   .9952  .9950
IBt      .9855  .9852   .9882  .9873    .9893  .9888   .9882  .9877    .9918  .9920   .9908  .9907
IBD      .9770  .9758   .9898  .9907    .9852  .9872   .9940  .9940    .9903  .9900   .9957  .9962
IBCa     .9635  .9602   .9850  .9848    .9747  .9768   .9900  .9907    .9807  .9810   .9918  .9918