Inference in Two-Step Panel Data Models with Time-Invariant Regressors

Scott E. Atkinson
Department of Economics, University of Georgia, Athens, GA 30602

Christopher Cornwell
Department of Economics, University of Georgia, Athens, GA 30602

March 14, 2006

DRAFT: Do not cite or quote without the permission of the authors.

Abstract

The primary advantage of panel data is the ability they afford to control for unobserved heterogeneity or effects. The fixed-effects (FE) estimator is by far the most popular technique for exploiting this advantage, but it eliminates any time-invariant regressors in the model along with the unobserved effects. This problem can be easily solved in a second-step regression of residuals, constructed from the FE estimator and the time means of the data, on the time-invariant variables. Hausman and Taylor (1981) proposed such a two-step estimator, allowing for some of the time-invariant variables to be correlated with the effects. Unfortunately, the Hausman-Taylor procedure appears to have been overlooked in empirical applications where the time-invariant variables are of interest to policy makers. In this paper, we resurrect the Hausman-Taylor estimator and derive its asymptotic covariance matrix without imposing conditional homoscedasticity or serial independence on the model errors. Because of the inherent complication in computing the asymptotic formula and the finite-sample bias in the asymptotic standard errors, we consider bootstrapping alternatives for standard-error estimation. Using Monte Carlo methods, we compare the size of the asymptotic and bootstrap alternatives. In addition, we apply our two-step estimator and the alternative inference methods to the problem of estimating the returns to education.


1. Introduction

The primary advantage of panel data is the ability they afford to control for unobserved heterogeneity or effects. The fixed-effects (FE) estimator is by far the most popular technique for exploiting this advantage, because it makes no assumption about the relationship between the explanatory variables in the model and the effects. However, a well-known problem with the FE estimator is that it eliminates any time-invariant regressors in the model along with the effects. This problem can be easily solved in a second-step regression of residuals, constructed from the FE estimator and the time means of the data, on the time-invariant variables. Hausman and Taylor (1981) proposed such a two-step estimator, allowing for some of the time-invariant variables to be correlated with the effects. Unfortunately, the Hausman-Taylor procedure appears to have been overlooked in empirical applications where the time-invariant variables are of interest to policy makers. In such cases, the FE estimator is usually abandoned entirely for a random-effects (RE) approach. Two-step procedures are even omitted from several panel-data econometrics texts. The reader is either left with the impression that the partial effects of time-invariant variables are irretrievably lost in a FE regression or given no guidance about how to recover inference about them. For example, Baltagi (2005) correctly states on page 13 that the "... FE estimator cannot estimate the effect of any time-invariant variable like sex, race, religion, schooling, or union participation. These time-invariant variables are wiped out ...". However, no mention is made of the ability to recover these effects and their estimated standard errors. In this paper, we resurrect the Hausman-Taylor estimator and derive its asymptotic covariance matrix without imposing conditional homoscedasticity or serial independence


on the model errors. Because of the inherent complication in computing the asymptotic formula and the finite-sample bias in the asymptotic standard errors, we consider bootstrapping alternatives for standard-error estimation. Following Davidson and MacKinnon (2004), we adapt the naive bootstrap, the pairs bootstrap, and the wild bootstrap to our two-step estimator. Using Monte Carlo methods, we compare the size of the asymptotic and bootstrap alternatives. In addition, we apply our two-step estimator and the alternative inference methods to the problem of estimating the returns to education. Education is typically completed before a person enters the sample and is generally assumed to be correlated with the unobserved effect.

The remainder of this paper is organized as follows. In section 2 we introduce our model and two-step estimator. In section 3 we present the asymptotic formula for correcting the second-stage estimated standard errors and the bootstrap alternatives. Section 4 reports the results of our Monte Carlo simulations computing the size of tests based on our four estimators of standard errors. In section 5, we report the results of our estimation methods applied to a panel data set. Conclusions follow in section 6.

2. The Two-Step Model and Parameter Estimation

We consider the estimation of linear panel-data models of the form
$$y_{it} = x_{it}\beta + z_i\gamma + c_i + e_{it}, \qquad i = 1, \ldots, N; \; t = 1, \ldots, T, \eqno(2.1)$$
where $x_{it}$ is a $K$-vector of time-varying regressors, $z_i$ is a $G$-vector of time-invariant regressors, $c_i$ is an unobserved effect that is fixed for the cross-section unit, and $e_{it}$ is an error term.

For most of the discussion that follows we will want to work with the form of the model that combines all $T$ observations for each cross-section unit:
$$y_i = X_i\beta + (j_T \otimes z_i)\gamma + j_T c_i + e_i, \eqno(2.2)$$
where $y_i$ and $e_i$ are $T \times 1$ vectors, $X_i$ is $T \times K$, and $j_T$ is a $T$-vector of ones.
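As a concrete illustration, the structure of (2.1)-(2.2) can be simulated in a few lines. The dimensions, distributions, and the device of adding $c_i$ into the regressors to induce correlation with the effect are all hypothetical choices for this sketch, not part of the paper's design.

```python
import numpy as np

# Hypothetical simulation of model (2.1): N units, T periods,
# K time-varying regressors x_it and G time-invariant regressors z_i.
rng = np.random.default_rng(0)
N, T, K, G = 40, 10, 3, 2
beta, gamma = np.ones(K), np.ones(G)

c = rng.normal(size=N)                              # unobserved effects c_i
x = rng.normal(size=(N, T, K)) + c[:, None, None]   # correlated with c_i
z = rng.normal(size=(N, G)) + c[:, None]            # correlated with c_i
e = rng.normal(size=(N, T))                         # idiosyncratic errors

# y_it = x_it beta + z_i gamma + c_i + e_it; row i stacks unit i's
# T observations, as in the unit-level form (2.2)
y = x @ beta + (z @ gamma)[:, None] + c[:, None] + e
print(y.shape)
```

Storing the panel as an `(N, T, ...)` array keeps each cross-section unit's block together, which is convenient for the block-resampling schemes discussed later.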

Our interest is in estimating $\gamma$, allowing for the possibility that some of the variables in $X_i$ and $z_i$ are correlated with the unobserved effect. Thus we define the following set of orthogonality conditions through which the model parameters will be identified:
$$E(X_{i1}'c_i) = 0 \quad \text{and} \quad E(z_{i1}'c_i) = 0, \eqno(2.3)$$
where $X_{i1}$ and $z_{i1}$ have column dimensions $K_1$ and $G_1$, respectively. In addition, we will maintain throughout that $X_i$ and $z_i$ are strictly exogenous with respect to $e_i$.

Our approach to the estimation of $\gamma$ begins with the fixed-effects (FE) estimator of $\beta$, which is consistent even if $E(X_{i1}'c_i) \neq 0$:

$$\hat\beta_{FE} = \Big(\sum_i X_i'Q_iX_i\Big)^{-1}\sum_i X_i'Q_iy_i, \eqno(2.4)$$
where $Q_i = I_T - j_T(j_T'j_T)^{-1}j_T'$ is the familiar projection that time de-means the data. The second step uses $\hat\beta_{FE}$ to compute the group residuals
$$\hat\varepsilon_i = \bar y_i - \bar x_i\hat\beta_{FE}, \eqno(2.5)$$
and then formulates the regression model
$$\hat\varepsilon_i = z_i\gamma + u_i, \eqno(2.6)$$
where
$$u_i = \varepsilon_i - \bar x_i(\hat\beta_{FE} - \beta), \qquad \varepsilon_i = c_i + \bar e_i, \eqno(2.7)$$
and the over-bar indicates the sample-period mean for unit $i$ (e.g., $\bar x_i = \frac{1}{T}\sum_t x_{it}$).
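The first step and the construction of the second-step regression can be sketched as follows, using a hypothetical simulated panel (a sketch, not the authors' code; the data-generating choices are illustrative assumptions).

```python
import numpy as np

# Hypothetical data-generating process as a stand-in for a real panel
rng = np.random.default_rng(1)
N, T, K, G = 40, 10, 3, 2
beta, gamma = np.ones(K), np.ones(G)
c = rng.normal(size=N)
x = rng.normal(size=(N, T, K)) + c[:, None, None]
z = rng.normal(size=(N, G)) + c[:, None]
y = x @ beta + (z @ gamma)[:, None] + c[:, None] + rng.normal(size=(N, T))

# Step 1: FE (within) estimator (2.4) -- time de-mean each unit's data,
# which is what multiplying by Q_i accomplishes
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
XQX = np.einsum('itk,itl->kl', xd, xd)   # sum_i X_i' Q_i X_i
XQy = np.einsum('itk,it->k', xd, yd)     # sum_i X_i' Q_i y_i
beta_fe = np.linalg.solve(XQX, XQy)

# Step 2: group residuals (2.5), the dependent variable in regression (2.6)
eps_hat = y.mean(axis=1) - x.mean(axis=1) @ beta_fe
print(eps_hat.shape)
```

Note that the within transformation wipes out both $c_i$ and $z_i$, which is exactly why the residuals $\hat\varepsilon_i$, one per cross-section unit, are needed to recover $\gamma$.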

We then estimate $\gamma$ by applying instrumental variables (IV) to (2.6), utilizing instruments


implied by (2.3). The estimator, which we label $\hat\gamma_{FE}$ because it is derived from the FE estimator of $\beta$, can be written as
$$\hat\gamma_{FE} = \Big[\sum_i z_i'w_i\Big(\sum_i w_i'w_i\Big)^{-1}\sum_i w_i'z_i\Big]^{-1}\sum_i z_i'w_i\Big(\sum_i w_i'w_i\Big)^{-1}\sum_i w_i'\hat\varepsilon_i, \eqno(2.8)$$
where $w_i$ is a $J \times 1$ vector of instruments satisfying
$$E(w_i'u_i) = 0. \eqno(2.9)$$
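The second-step IV estimator (2.8) is a standard 2SLS computation on the unit-level data. The sketch below assumes, purely for illustration, that all of $z_i$ is exogenous, so $w_i = z_i$ is a valid (just-identified) instrument vector; in the Hausman-Taylor setting the instrument set would instead combine the exogenous $z_{i1}$ with means of the exogenous time-varying regressors.

```python
import numpy as np

# Hypothetical setup: z_i exogenous here, so w_i = z_i is a valid
# instrument; x_it remains correlated with the unobserved effect c_i.
rng = np.random.default_rng(2)
N, T, K, G = 400, 5, 2, 2
beta, gamma = np.ones(K), np.ones(G)
c = rng.normal(size=N)
x = rng.normal(size=(N, T, K)) + c[:, None, None]
z = rng.normal(size=(N, G))                       # uncorrelated with c_i
y = x @ beta + (z @ gamma)[:, None] + c[:, None] + rng.normal(size=(N, T))

# First step: FE estimator and group residuals, as in (2.4)-(2.5)
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
beta_fe = np.linalg.solve(np.einsum('itk,itl->kl', xd, xd),
                          np.einsum('itk,it->k', xd, yd))
eps_hat = y.mean(axis=1) - x.mean(axis=1) @ beta_fe

# Second step, eq. (2.8): [Z'W (W'W)^-1 W'Z]^-1 Z'W (W'W)^-1 W'eps
w = z
ZW, WW, We = z.T @ w, w.T @ w, w.T @ eps_hat
bread = ZW @ np.linalg.solve(WW, ZW.T)
gamma_fe = np.linalg.solve(bread, ZW @ np.linalg.solve(WW, We))
print(gamma_fe.shape)
```

In the just-identified case shown here, (2.8) collapses to the IV estimator $({W'Z})^{-1}W'\hat\varepsilon$; the full formula matters when $J > G$.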

This sort of procedure was first proposed by Hausman and Taylor (1981) as a "consistent but inefficient" estimator of the coefficients of the time-invariant variables in linear FE models when some of those variables may be correlated with the effects. However, they did not derive the asymptotic covariance matrix for $\hat\gamma_{FE}$.

3. Computing Second-Stage Standard Errors

3.1. Asymptotic Covariance Matrix

Murphy and Topel (1985) provide a framework for asymptotic inference with two-step estimators. We adapt their result to the problem of deriving the asymptotic covariance matrix of $\hat\gamma_{FE}$, allowing for conditional heteroscedasticity. We begin by writing the sampling error of $\hat\gamma_{FE}$ as
$$\hat\gamma_{FE} - \gamma = \Big[\sum_i z_i'w_i\Big(\sum_i w_i'w_i\Big)^{-1}\sum_i w_i'z_i\Big]^{-1}\sum_i z_i'w_i\Big(\sum_i w_i'w_i\Big)^{-1}\sum_i w_i'u_i. \eqno(3.1)$$
Using standard arguments, we can show that $\sqrt{N}(\hat\gamma_{FE} - \gamma)$ is asymptotically normal with a limiting covariance matrix that can be expressed as
$$(B_{zw}B_{ww}^{-1}B_{wz})^{-1}B_{zw}B_{ww}^{-1}\,A\,B_{ww}^{-1}B_{wz}(B_{zw}B_{ww}^{-1}B_{wz})^{-1}, \eqno(3.2)$$
where, e.g., $B_{zw} = \text{plim}\,\frac{1}{N}\sum_i z_i'w_i$. As implied by (2.7),
$$A = \text{plim}\,\frac{1}{N}\sum_i \sigma_i^2 w_i'w_i + \text{plim}\,\frac{1}{N}\sum_i w_i'\bar x_i V_{\hat\beta_{FE}}\bar x_i'w_i, \eqno(3.3)$$
where $V_{\hat\beta_{FE}}$ is the limiting covariance matrix of $\sqrt{N}(\hat\beta_{FE} - \beta)$.
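A finite-sample analogue of the sandwich formula (3.2)-(3.3) can be assembled directly, with the first-stage covariance estimated in the robust form of (3.4). The data-generating process and the exogenous-$z$ instrument choice below are illustrative assumptions, and the code works with finite-sample quantities rather than the scaled plims, so it is a sketch of the construction, not the authors' exact scaling.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, K, G = 400, 5, 2, 2
beta, gamma = np.ones(K), np.ones(G)
c = rng.normal(size=N)
x = rng.normal(size=(N, T, K)) + c[:, None, None]
z = rng.normal(size=(N, G))            # exogenous for illustration; w_i = z_i
y = x @ beta + (z @ gamma)[:, None] + c[:, None] + rng.normal(size=(N, T))

# Two-step point estimates, as in (2.4)-(2.8)
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
XQX = np.einsum('itk,itl->kl', xd, xd)
beta_fe = np.linalg.solve(XQX, np.einsum('itk,it->k', xd, yd))
xbar = x.mean(axis=1)
eps_hat = y.mean(axis=1) - xbar @ beta_fe
w = z
ZW, WW = z.T @ w, w.T @ w
bread = np.linalg.inv(ZW @ np.linalg.solve(WW, ZW.T))
gamma_fe = bread @ ZW @ np.linalg.solve(WW, w.T @ eps_hat)

# Robust first-stage covariance, cf. (3.4):
# (X'QX)^-1 [sum_i X_i'Q e_i e_i'Q X_i] (X'QX)^-1
ed = yd - xd @ beta_fe                       # within residuals
g = np.einsum('itk,it->ik', xd, ed)          # X_i' Q_i e_hat_i, one row per unit
V_beta = np.linalg.inv(XQX) @ (g.T @ g) @ np.linalg.inv(XQX)

# Sample analogue of A in (3.3): heteroskedastic term plus first-stage term
u_hat = eps_hat - z @ gamma_fe               # second-step residuals
s = np.einsum('ik,kl,il->i', xbar, V_beta, xbar)   # xbar_i V xbar_i' (scalar)
A_hat = np.einsum('i,ij,il->jl', u_hat**2 + s, w, w)

# Sandwich (3.2) in finite-sample form
H = ZW @ np.linalg.inv(WW)
V_gamma = bread @ H @ A_hat @ H.T @ bread
se_gamma = np.sqrt(np.diag(V_gamma))
print(se_gamma.shape)
```

The second term of `A_hat` is the correction for first-stage estimation; dropping it yields the "unadjusted" standard errors compared in the Monte Carlo section.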

A consistent estimator of the asymptotic covariance matrix of $\hat\gamma_{FE}$ hinges on the consistent estimation of $A$. The latter is accomplished by utilizing the robust covariance matrix estimator of $V_{\hat\beta_{FE}}$,
$$\hat V_{\hat\beta_{FE}} = \Big(\sum_i X_i'Q_iX_i\Big)^{-1}\sum_i X_i'Q_i\hat e_i\hat e_i'Q_iX_i\Big(\sum_i X_i'Q_iX_i\Big)^{-1} \eqno(3.4)$$
(see Arellano (1987)), and extracting an estimator of $\sigma_i^2$ from the second-step residuals.

3.2. Bootstrap Methods

Bootstrapping standard errors is often easier than computing standard-error estimators based on the asymptotic formula. In addition, bootstrap standard errors are typically less biased in small samples and produce an actual size for t-tests that is closer to the nominal size. The first bootstrap estimator we consider for the second-stage standard errors is the naive bootstrap, which assumes i.i.d. errors. Given this assumption on the error terms, the naive bootstrap does not produce heteroskedasticity-robust estimated standard errors, as shown by Lancaster (2003).

3.2.1. Naive Bootstrap Estimator

Step 1: Estimate $\hat\beta_{FE}$ in (2.4).

Step 2: Using $\hat\beta_{FE}$, compute
$$\hat\varepsilon_i = \bar y_i - \bar x_i\hat\beta_{FE}. \eqno(3.5)$$

Step 3: Estimate $\hat\gamma_{FE}$ in (2.8).

Step 4: In (2.1), let $\eta_{it} = c_i + e_{it}$. Then compute
$$\hat\eta_{it} = y_{it} - x_{it}\hat\beta_{FE} - z_i\hat\gamma_{FE}, \eqno(3.6)$$
where $\hat\eta_{it}$ must be weighted by $[NT/(NT - K - G)]$, the standard inflation factor for bootstrap residuals.

Step 5: For each of the $i = 1, \ldots, N$ blocks, draw randomly with replacement $T$ observations with probability $1/T$ from $\hat\eta_{it}$ to obtain $\eta_{it}^*$.

Step 6: Generate
$$y_{it}^* = x_{it}\hat\beta_{FE} + z_i\hat\gamma_{FE} + \eta_{it}^*. \eqno(3.7)$$

Step 7: Compute the fixed-effects estimator $\beta_{FE}^*$ using the starred data:
$$\beta_{FE}^* = \Big(\sum_i X_i'Q_iX_i\Big)^{-1}\sum_i X_i'Q_iy_i^*. \eqno(3.8)$$

Step 8: Using $\beta_{FE}^*$ from the previous step, compute
$$\hat u_i^* = \hat\varepsilon_i - \bar x_i(\beta_{FE}^* - \hat\beta_{FE}). \eqno(3.9)$$

Step 9: Randomly resample with replacement from $\hat u_i^*$ to obtain $u_i^*$. Then compute
$$\varepsilon_i^* = z_i\hat\gamma_{FE} + u_i^*. \eqno(3.10)$$

Step 10: Compute the second-stage estimator $\gamma_{FE}^*$ using the starred data:
$$\gamma_{FE}^* = \Big[\sum_i z_i'w_i\Big(\sum_i w_i'w_i\Big)^{-1}\sum_i w_i'z_i\Big]^{-1}\sum_i z_i'w_i\Big(\sum_i w_i'w_i\Big)^{-1}\sum_i w_i'\varepsilon_i^*.$$

Step 11: Repeat steps 5-10 1,000 times and compute the sample standard deviation of $\gamma_{FE}^*$ as an estimator of the standard error of $\hat\gamma_{FE}$.
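The naive bootstrap steps above can be sketched in a short loop. The data-generating process and the exogenous-$z$ instrument choice (so the second step reduces to a regression of residuals on $z$) are illustrative assumptions, and the number of replications is set to 200 here rather than the paper's 1,000 to keep the sketch fast.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, K, G = 40, 10, 2, 1
beta, gamma = np.ones(K), np.ones(G)
c = rng.normal(size=N)
x = rng.normal(size=(N, T, K)) + c[:, None, None]
z = rng.normal(size=(N, G))                       # exogenous, so w_i = z_i
y = x @ beta + (z @ gamma)[:, None] + c[:, None] + rng.normal(size=(N, T))

def fe(xm, ym):
    """Within estimator (2.4) on an (N, T, ...) panel."""
    xd = xm - xm.mean(axis=1, keepdims=True)
    yd = ym - ym.mean(axis=1, keepdims=True)
    return np.linalg.solve(np.einsum('itk,itl->kl', xd, xd),
                           np.einsum('itk,it->k', xd, yd))

def second_step(eps, zm):
    """Eq. (2.8) with w_i = z_i (just-identified case)."""
    return np.linalg.solve(zm.T @ zm, zm.T @ eps)

beta_fe = fe(x, y)                                # Steps 1-3
eps_hat = y.mean(axis=1) - x.mean(axis=1) @ beta_fe
gamma_fe = second_step(eps_hat, z)

# Step 4: composite residuals, inflated by NT/(NT - K - G)
eta_hat = (y - x @ beta_fe - (z @ gamma_fe)[:, None]) * (N*T / (N*T - K - G))

gammas = []
for b in range(200):                              # Steps 5-10, repeated
    idx = rng.integers(0, T, size=(N, T))         # resample within each block
    eta_star = np.take_along_axis(eta_hat, idx, axis=1)
    y_star = x @ beta_fe + (z @ gamma_fe)[:, None] + eta_star     # (3.7)
    beta_star = fe(x, y_star)                                     # (3.8)
    u_star = eps_hat - x.mean(axis=1) @ (beta_star - beta_fe)     # (3.9)
    u_res = rng.choice(u_star, size=N, replace=True)
    eps_star = z @ gamma_fe + u_res                               # (3.10)
    gammas.append(second_step(eps_star, z))

boot_se = np.std(np.array(gammas), axis=0)        # Step 11
print(boot_se.shape)
```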

3.2.2. Pairs Resampling Bootstrap Estimator

Lancaster (2003) shows that the "pairs" bootstrap estimator yields heteroskedasticity-robust estimated standard errors. We construct this estimator by drawing pairs randomly. As an extension, we develop the following resampling bootstrap estimator:

Step 1: Estimate $\hat\beta_{FE}$ in (2.4).

Step 2: Using $\hat\beta_{FE}$, compute
$$\hat\varepsilon_i = \bar y_i - \bar x_i\hat\beta_{FE}. \eqno(3.11)$$

Step 3: Estimate $\hat\gamma_{FE}$ in (2.8).

Step 4: For each of the $i = 1, \ldots, N$ blocks, draw randomly with replacement $T$ observations with probability $1/T$ from $\{y_{it}, x_{it}, z_{it}, w_{it}\}$ to obtain $\{y_{it}^*, x_{it}^*, z_{it}^*, w_{it}^*\}$.

Step 5: Compute the fixed-effects estimator $\beta_{FE}^*$ using the starred data:
$$\beta_{FE}^* = \Big(\sum_i X_i^{*\prime}Q_iX_i^*\Big)^{-1}\sum_i X_i^{*\prime}Q_iy_i^*. \eqno(3.12)$$

Step 6: Using $\beta_{FE}^*$ from the previous step, compute
$$\varepsilon_i^* = \bar y_i^* - \bar x_i^*\beta_{FE}^*. \eqno(3.13)$$

Step 7: Randomly resample by pairs from $\varepsilon_i^*$ and $z_i^*$ to obtain $\tilde\varepsilon_i$ and $\tilde z_i$.

Step 8: Compute the second-stage estimator $\gamma_{FE}^*$ using the starred data:
$$\gamma_{FE}^* = \Big[\sum_i \tilde z_i'w_i^*\Big(\sum_i w_i^{*\prime}w_i^*\Big)^{-1}\sum_i w_i^{*\prime}\tilde z_i\Big]^{-1}\sum_i \tilde z_i'w_i^*\Big(\sum_i w_i^{*\prime}w_i^*\Big)^{-1}\sum_i w_i^{*\prime}\tilde\varepsilon_i.$$

Step 9: Repeat steps 4-8 1,000 times and compute the sample standard deviation of $\gamma_{FE}^*$ as an estimator of the standard error of $\hat\gamma_{FE}$.
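The pairs resampling steps can be sketched as below, under the same illustrative assumptions as before (hypothetical data-generating process, $w_i = z_i$, 200 replications instead of 1,000). Because $z_i$ is time-invariant, resampling periods within block $i$ leaves $z_i$ unchanged.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, K, G = 40, 10, 2, 1
beta, gamma = np.ones(K), np.ones(G)
c = rng.normal(size=N)
x = rng.normal(size=(N, T, K)) + c[:, None, None]
z = rng.normal(size=(N, G))                       # exogenous, so w_i = z_i
y = x @ beta + (z @ gamma)[:, None] + c[:, None] + rng.normal(size=(N, T))

def fe(xm, ym):
    """Within estimator (2.4) on an (N, T, ...) panel."""
    xd = xm - xm.mean(axis=1, keepdims=True)
    yd = ym - ym.mean(axis=1, keepdims=True)
    return np.linalg.solve(np.einsum('itk,itl->kl', xd, xd),
                           np.einsum('itk,it->k', xd, yd))

gammas = []
for b in range(200):
    # Step 4: within each block i, draw T (y, x) observations with replacement
    idx = rng.integers(0, T, size=(N, T))
    y_star = np.take_along_axis(y, idx, axis=1)
    x_star = np.take_along_axis(x, idx[..., None], axis=1)
    beta_star = fe(x_star, y_star)                                    # (3.12)
    eps_star = y_star.mean(axis=1) - x_star.mean(axis=1) @ beta_star  # (3.13)
    # Step 7: resample (eps*, z) pairs across units
    j = rng.integers(0, N, size=N)
    eps_t, z_t = eps_star[j], z[j]
    # Step 8: second-stage estimator with w = z (just-identified (2.8))
    gammas.append(np.linalg.solve(z_t.T @ z_t, z_t.T @ eps_t))

boot_se = np.std(np.array(gammas), axis=0)        # Step 9
print(boot_se.shape)
```

Resampling the $(\varepsilon_i^*, z_i)$ pairs jointly is what preserves any heteroskedasticity in the second-stage errors, in contrast to the naive scheme, which resamples residuals independently of the regressors.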

3.2.3. Wild Bootstrap Estimator

Davidson and MacKinnon (2002) show that the wild bootstrap estimator produces heteroskedasticity-robust estimated standard errors. We construct this estimator following the steps used to generate the naive bootstrap estimator, except that in Step 4 we replace $\hat\eta_{it}$ with
$$f(\hat\eta_{it})\tilde v_{it}, \eqno(3.14)$$
where
$$f(\hat\eta_{it}) = \frac{\hat\eta_{it}}{(1 - h_{it})^{1/2}},$$
$h_{it}$ is the corresponding diagonal element of the projection matrix for the regression in (3.7), and $\tilde v_{it}$ is defined as
$$\tilde v_{it} = \begin{cases} \phantom{-}1 & \text{with probability } \tfrac{1}{2}, \\ -1 & \text{with probability } \tfrac{1}{2}. \end{cases} \eqno(3.15)$$
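The leverage adjustment (3.14) and the two-point draw (3.15) can be sketched directly; the pooled regressor matrix and the residuals here are hypothetical stand-ins for the first-stage quantities.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 200, 3
X = rng.normal(size=(n, k))                  # hypothetical pooled regressors
eta_hat = rng.normal(size=n)                 # stand-in first-stage residuals

# h_it: diagonal of the projection ("hat") matrix X (X'X)^-1 X'
h = np.einsum('ij,jk,ik->i', X, np.linalg.inv(X.T @ X), X)
f = eta_hat / np.sqrt(1.0 - h)               # f(eta_hat), eq. (3.14)
v = rng.choice([1.0, -1.0], size=n)          # two-point draw, eq. (3.15)
eta_star = f * v                             # wild bootstrap residuals
print(eta_star.shape)
```

Because each draw only flips the sign of the rescaled residual, the scheme preserves the observation-specific variance of $\hat\eta_{it}$, which is the source of its robustness to heteroskedasticity.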

See MacKinnon (2002) for details.

4. Size Calculations

We now perform a number of Monte Carlo experiments designed to compare the actual sizes of tests based on the asymptotic formula with those based on the bootstrap methods and on the uncorrected estimator of the second-stage standard errors. For the Monte Carlo experiments, we utilize 1,000 replications and 400 bootstrap trials. The first-stage model has 3 explanatory variables, while the second-stage model has 3 explanatory variables plus an intercept. The correlations of these variables range from .1 to .6 and their means are all set to 1. In order to compute size, the null must be true, so we set all true coefficient values to $\beta = \gamma = 0$. For each Monte Carlo trial we generate new data for all variables assuming a variance-covariance matrix derived from the correlation matrix and using the assumed mean vector.

We generate $y_{it}$ for the first-stage regression using (2.2), where $c_i$ and $e_{it}$ are generated with mean zero and standard deviation 2 for the $NT$ observations. We then compute the actual size of the t-statistic for the naive, pairs, and wild bootstrap estimators as well as for the asymptotic estimator and the unadjusted estimator, which makes no adjustment for the first-stage estimation.

Results are reported in Tables 1-3 for $N = 40$, $T = 40$; $N = 40$, $T = 20$; and $N = 40$, $T = 10$, respectively, based on a nominal size of .05. The pairs bootstrap estimator under-rejects by the greatest amount, with an actual size of about .02 at the smallest value of $T = 10$. The naive bootstrap estimator, although it consistently under-rejects, has the most accurate size of approximately .04 when $T = 40$, but it under-rejects by increasing amounts as $T$ falls. The wild bootstrap estimator consistently over-rejects but appears to be the most accurate across all values of $T$. The asymptotic estimator typically has an actual size of about .10 and is the poorest performer of all the estimators that attempt to correct the second-stage estimated standard errors for the first-stage estimation. The actual size of the unadjusted estimator is the largest of any estimator and is included strictly to provide an upper bound.
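For reference, "actual size" here is simply the rejection frequency of a nominal 5% two-sided t-test across Monte Carlo replications. A minimal sketch, with stand-in t-statistics drawn from the test's ideal N(0,1) null distribution rather than from the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(7)
R = 1000                                        # Monte Carlo replications
t_stats = rng.standard_normal(R)                # stand-in null t-statistics
actual_size = np.mean(np.abs(t_stats) > 1.96)   # nominal size .05
print(round(float(actual_size), 3))
```

In the actual experiments, each replication's t-statistic divides the second-stage coefficient estimate by one of the five candidate standard errors, so departures of `actual_size` from .05 measure how well that standard-error estimator captures the true sampling variability.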


6. Conclusions

Panel data are desirable because they allow the researcher to control for unobserved effects. The FE estimator is commonly the panel-data procedure of choice because it produces inference conditional on the effects (by partialing them out). The downside to FE estimation is that any time-invariant variables are eliminated with the effects. That the partial effects of time-invariant variables can be recovered in a second-step regression is generally omitted from most panel-data econometrics texts (Wooldridge (2002) is an exception).

In this paper, we resurrect the two-step "consistent, but inefficient" estimator proposed by Hausman and Taylor (1981) and derive its asymptotic covariance matrix without imposing conditional homoscedasticity or serial independence on the model errors. Because of the inherent complication in computing the asymptotic formula and the finite-sample bias in the asymptotic standard errors, we consider bootstrapping alternatives for standard-error estimation. Following Davidson and MacKinnon (2004), we adapt the naive bootstrap, the pairs bootstrap, and the wild bootstrap to our two-step estimator. Using Monte Carlo methods, we compare the size of these alternative estimators of the second-stage standard errors and find that the heteroscedasticity-robust bootstrap methods have more accurate size in small samples. In the next version we will compare these methods in the context of a standard human capital wage regression, in which the variable of interest, education, does not vary over the sample period.


References

Baltagi, B. H., 2005, Econometric Analysis of Panel Data, 3rd edition, John Wiley and Sons: New York.

Cornwell, C. and P. Rupert, 1988, "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variables Estimators", Journal of Applied Econometrics 3, 149-155.

Davidson, R. and J. G. MacKinnon, 2004, "Bootstrap Methods in Econometrics", unpublished manuscript, Department of Economics, Queen's University.

Greene, W. H., 2003, Econometric Analysis, 5th edition, Prentice Hall: Upper Saddle River, N.J.

Hausman, J. A. and W. Taylor, 1981, "Panel Data and Unobservable Individual Effects", Econometrica 49, 1377-1399.

Hsiao, Cheng, 2004, Analysis of Panel Data, 2nd edition, Cambridge University Press: Cambridge, U.K.

Lancaster, T., 2003, "A Note on Bootstraps and Robustness", unpublished manuscript, Department of Economics, Brown University.

Lee, Myoung-jae, 2002, Panel Data Econometrics, Academic Press: San Diego.

MacKinnon, J. G., 2002, "Bootstrap Inference in Econometrics", working paper, Department of Economics, Queen's University, Kingston, Ontario, Canada.

Murphy, K. M. and R. H. Topel, 1985, "Estimation and Inference in Two-Step Econometric Models", Journal of Business and Economic Statistics 3, 88-97.

Wooldridge, J. M., 2002, Econometric Analysis of Cross Section and Panel Data, The MIT Press: Cambridge, Massachusetts.


Table 1: Monte Carlo Size Calculation
(Nominal Size = .05; N = 40, T = 40)

                 Bootstrap
       Pairs   Naive   Wild    Asymptotic   Unadjusted
  1    .044    .034    .050      .082         .098
  2    .040    .082    .072      .090         .090
  3    .038    .082    .076      .074         .074
  4    .038    .094    .078      .086         .086

Table 2: Monte Carlo Size Calculation
(Nominal Size = .05; N = 40, T = 20)

                 Bootstrap
       Pairs   Naive   Wild    Asymptotic   Unadjusted
  1    .032    .022    .034      .082         .108
  2    .032    .068    .058      .073         .073
  3    .043    .062    .068      .100         .100
  4    .038    .062    .074      .108         .108

Table 3: Monte Carlo Size Calculation
(Nominal Size = .05; N = 40, T = 10)

                 Bootstrap
       Pairs   Naive   Wild    Asymptotic   Unadjusted
  1    .012    .022    .040      .092         .124
  2    .024    .046    .056      .096         .098
  3    .028    .036    .060      .110         .110
  4    .024    .074    .072      .068         .068