October 24, 1995
GMM-Estimation of Nonlinear Models on Panel Data

Jörg Breitung
Institut für Statistik und Ökonometrie, Humboldt-Universität zu Berlin, Spandauer Straße 1, D-10178 Berlin

and

Michael Lechner
Institut für Volkswirtschaftslehre und Statistik, Universität Mannheim, D-68131 Mannheim
Abstract

We show that the Generalized Method of Moments (GMM) methodology is a useful tool to obtain the asymptotic properties of some existing estimators for nonlinear panel data models as well as to construct new ones. Many non-linear panel data models imply conditional moments which do not depend on parameters from the off-diagonal part of the intertemporal covariance matrix of the error terms. The pooled maximum likelihood estimator, the sequential ML estimator based on minimum distance estimation in the second step, and previously suggested alternative GMM estimators are based on these moments. Although the pooled ML estimator is asymptotically the least efficient of the estimators considered, the Monte Carlo study indicates that it may have good small sample properties. We use a low dimensional approximation of the optimal instrument matrix to obtain an estimator which appeared to be nearly as efficient as FIML. However, GMM estimators are easier to compute and also possess desirable small sample properties.

The research for this paper was carried out within Sonderforschungsbereich 373 at the Humboldt University Berlin and was printed using funds made available by the Deutsche Forschungsgemeinschaft. The second author gratefully acknowledges financial support by the Deutsche Forschungsgemeinschaft (DFG). We would like to thank Andreas Ziegler, a referee of this journal, and participants of the econometric seminar in Tilburg and the Fifth Conference on Panel Data, Paris, for helpful comments and suggestions. Thanks are also due to Nadine Riede for able research assistance.
1 Introduction

In this paper we investigate the asymptotic and small sample properties of various estimators for non-linear models applicable to panel data sets with a large number of individual units and a moderate number of time periods. In general, full information maximum likelihood (FIML) estimation of such models relies on tight restrictions concerning the multivariate distribution of the error terms and involves a considerable computational burden. Accounting for possible serial correlation of the errors requires high dimensional numerical integration over probability density functions, which is (for many distributions) infeasible when T is larger than 3 or 4. Although there has recently been some progress using approximations of these integrals by simulation methods (see Hajivassiliou 1993, Keane 1994), these methods are very computer intensive. Furthermore, in most cases the numerical properties of iterative procedures deteriorate rapidly with an increasing number of covariance parameters to be estimated. To sidestep these difficulties alternative approaches have been suggested. Butler and Moffitt (1982) proposed an efficient algorithm for computing the maximum likelihood estimator of binary probit models with a one-factor random effects structure. Although the one-factor model generalizes the pure random effects model, it still excludes many realistic and important error processes (see Heckman 1981). Moreover, an incorrectly specified one-factor structure may lead to inconsistent estimates. Chamberlain (1980, 1984) suggested a sequential estimation procedure: in a first step maximum likelihood is applied to each cross-section separately, and in a second step these estimates are combined optimally to a full sample estimator using a minimum distance technique (Kodde, Palm and Pfann 1990). In this paper we consider the Generalized Method of Moments methodology (GMM, Hansen 1982) to estimate the nonlinear panel data model with
arbitrary covariance structure. Such estimators were initially suggested for the multiperiod probit model by Avery, Hansen and Hotz (1983) and are widely used to estimate linear dynamic panel data models (Holtz-Eakin et al. 1988, Arellano and Bond 1991). The GMM methodology has a number of attractive features. First, it offers a menu of choices for selecting appropriate members of a broad class of consistent estimators involving quite different computational burdens. Second, it is not necessary to specify the covariance structure of the errors explicitly. Moreover, the GMM framework nests some well-known estimators as special cases, so that they can be analysed conveniently on a theoretical level. The purpose of this paper is to consider a class of GMM estimators with attractive asymptotic properties and a reasonable computational burden. A number of different estimators are suggested and compared with respect to their asymptotic and small sample properties. It turns out that simple GMM variants have the potential to attain the efficiency of the ML estimator in relevant sample sizes and, in some instances, even outperform the ML estimator. The paper is organized as follows. Section 2 introduces the notation of the non-linear model and summarizes the properties of GMM estimators. Section 3 discusses conditional moment restrictions and states useful properties of GMM estimators to be used later on. The equivalence of Chamberlain's (1980, 1984) sequential estimator to a particular GMM estimator is shown in Section 4. The results of the Monte Carlo study for the multiperiod probit model are considered in more detail in Section 5. Section 6 concludes.
2 The nonlinear model

We will consider the nonlinear model given by
$$y_{it} = F(x_{it}; \beta_0) + u_{it}, \qquad i = 1, 2, \dots, N, \quad t = 1, 2, \dots, T, \qquad (1)$$
where $y_{it}$ is the dependent variable observed at cross sectional unit $i$ and time period $t$, $x_{it}$ is a $k \times 1$ vector of time varying¹ explanatory variables and $u_{it}$ is a random error. The vector $\beta_0 \in \Theta$ is an unknown $p \times 1$ vector of parameters to be estimated and $\Theta$ is a compact subspace of $\mathbb{R}^p$. As a special case we will consider the generalized linear model (GLIM) letting $F(x_{it}; \beta_0) = F(x_{it}'\beta_0)$, where $k = p$. The following assumptions are assumed to hold:

[¹ This assumption is made merely for notational convenience. The necessary modification for the inclusion of time invariant variables is briefly noted in footnote 4.]

Assumption 1: With respect to model (1) it is assumed that
(i) $F(\cdot)$ is twice continuously differentiable with respect to $\beta_0$;
(ii) $E(u_{js} \mid x_{it}) = 0$ for $i, j \in \{1, \dots, N\}$ and $t, s \in \{1, \dots, T\}$;
(iii) $\sum_{t=1}^{T} \sum_{i=1}^{N} x_{it} x_{it}'$ is positive definite;
(iv) $E(u_{is} u_{jt}) = 0$ for $j \neq i$ and $t, s \in \{1, \dots, T\}$;
(v) $|E(u_{it} u_{is})| < \infty$ for $i \in \{1, \dots, N\}$ and $t, s \in \{1, \dots, T\}$.
It should be noticed that no distributional assumptions are imposed on the errors $u_{it}$. In a fully parameterized model additional nuisance parameters are introduced to make explicit the serial correlation pattern and the form of the distribution function. We neglect such extra information for two reasons. First, in many practical applications there is hardly any prior information concerning the generating process of the errors. Second, the nonlinear setup generally complicates inference in a
completely specified model. Even for the simplest discrete choice models, for instance, a maximum likelihood framework involves a $T$-fold integral over the likelihood contributions, so that either computer intensive numerical methods must be employed or restrictive distributional assumptions yielding explicit expressions for these integrals have to be imposed. The framework applied here allows for so-called "random effects", i.e., cross section specific random effects, implying a particular form of serial correlation for the errors. It may also be possible to accommodate a "fixed effects" specification using the framework advocated by Chamberlain (1984). Such models are, however, beyond the scope of the present paper. In what follows we will consider Generalized Method of Moments (GMM) estimators, which provide convenient and efficient estimators in a large number of applications (e.g. Ogaki 1993). To introduce the class of GMM estimators let $y_i = [y_{i1}, \dots, y_{iT}]'$ and $X_i = [x_{i1}, \dots, x_{iT}]'$. Moreover, assume that the moment conditions can be written as $E[\psi(y_i, X_i; \beta)] = 0$ for all $i \in \{1, \dots, N\}$. The GMM estimator is defined by solving the minimization problem
$$\hat\beta = \arg\min_{\beta \in \Theta} \Big[\frac{1}{N} \sum_i \psi(y_i, X_i; \beta)\Big]' A_N \Big[\frac{1}{N} \sum_i \psi(y_i, X_i; \beta)\Big], \qquad (2)$$
where the weight matrix $A_N$ is positive definite. As a convenient framework the following set of assumptions is imposed.

Assumption 2: $[y_i, X_i]$ are independent draws from the set of random variables $[y, X]$ and for the $m \times 1$ function $\psi(y, X; \beta)$ it is assumed that
(i) $E[\psi(y, X; \beta)]$ exists for all $\beta \in \Theta$ and is zero at $\beta_0$, which is in the interior of $\Theta$.
(ii) $\psi(y, X; \beta)$ is continuous and differentiable in $\beta$.
(iii) $A_N$ converges almost surely to the deterministic matrix $A$.
(iv) The parameters are identified by the moment constraints: $E[\psi(y, X; \beta)]' A\, E[\psi(y, X; \beta)] = 0 \;\Rightarrow\; \beta = \beta_0$.
(v) $N^{-1} \sum_i \psi(y_i, X_i; \beta)$ converges almost surely and uniformly in $\beta$ to $E[\psi(y, X; \beta)]$.
(vi) $N^{-1} \sum_i \partial\psi(y_i, X_i; \beta)/\partial\beta$ converges almost surely and uniformly in $\beta$ to $\partial E[\psi(y, X; \beta)]/\partial\beta$.
(vii) $E\,\|\psi(y, X; \beta)\|^2 < \infty$.
The i.i.d. assumption may be replaced by less restrictive assumptions, e.g. along the lines of Hansen (1982). However, to keep the exposition reasonably simple the assumptions of Gourieroux and Monfort (1989) are used in what follows. Given Assumption 2 the GMM estimator is consistent and asymptotically normally distributed with
$$\sqrt{N}\,(\hat\beta - \beta_0) \;\xrightarrow{d}\; N(0, V),$$
where $V = (D'AD)^{-1}$ and $D = \partial E[\psi(y_i, X_i; \beta)]/\partial\beta'$ (Gourieroux and Monfort 1989, p. 339). The optimal choice of $A$ is
$$A_0 = E[\psi(y, X; \beta_0)\,\psi(y, X; \beta_0)']^{-1}, \qquad (3)$$
or a sequence of random matrices converging to this expectation.
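To illustrate how such an estimator can be put to work in practice, the following minimal Python sketch minimizes the criterion (2) with an identity weight matrix in a first step and with an estimate of the optimal weight matrix (3) in a second step. The function names and the generic moment function psi are illustrative choices of ours, not part of the original exposition.

```python
import numpy as np
from scipy.optimize import minimize

def gmm_two_step(psi, data, beta_start):
    """Two-step GMM for moment conditions E[psi(data_i, beta)] = 0.
    psi(d, beta) returns the m-vector of moments for one unit d."""
    beta_start = np.asarray(beta_start, dtype=float)
    m = len(psi(data[0], beta_start))

    def gbar(beta):
        # sample mean of the moments, (1/N) sum_i psi_i(beta), cf. eq. (2)
        return np.mean([psi(d, beta) for d in data], axis=0)

    def crit(beta, A):
        g = gbar(beta)
        return float(g @ A @ g)

    # step 1: identity weight matrix yields a consistent initial estimate
    b1 = minimize(crit, beta_start, args=(np.eye(m),), method="BFGS").x

    # step 2: weight matrix estimating A_0 of eq. (3) from first-step moments
    S = np.mean([np.outer(psi(d, b1), psi(d, b1)) for d in data], axis=0)
    A2 = np.linalg.inv(S)
    b2 = minimize(crit, b1, args=(A2,), method="BFGS").x
    return b2, A2
```

The two-step structure mirrors the remark above: any positive definite weight matrix delivers a consistent first-step estimate, and re-weighting with an estimate of (3) attains the efficiency bound within the chosen class of moments.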
3 Conditional moment restrictions

The GMM approach is an easy device for constructing consistent estimators. An important problem is, however, to find appropriate moments which identify the
parameters of interest and satisfy the regularity conditions. With respect to Assumption 1 the following set of conditional moments will be exploited
$$E(u_i \mid X_i) = 0, \qquad i = 1, \dots, N, \qquad (4)$$
where $u_i = [u_{i1}, \dots, u_{iT}]'$. Using the law of iterated expectations, the conditional moment restrictions can be expressed as a set of unconditional moment restrictions

$$E[B(X_i)\,u_i] = 0, \qquad (5)$$

where $B(X_i)$ is an $m \times T$ matrix of functions of $X_i$. As has been shown by Newey (1993), the optimal choice is
$$B_i = E(\nabla_i' \mid X_i)\,\Omega_i^{-1}, \qquad (6)$$
where $\nabla_i = \partial E(u_i \mid X_i)/\partial\beta'$ and $\Omega_i = E(u_i u_i' \mid X_i)$. This result, however, is not particularly useful in our case. Since the error process is left unspecified, the covariance matrix $\Omega_i$ is unknown in general.² We will therefore limit our attention to a class of "feasible" moment conditions yielding simple estimators that are more efficient than alternative estimators. The class of moment conditions the analysis will be based upon is defined as

$$E[\psi(y_i, X_i; \beta)] = E[Z_i' G_i u_i] = 0, \qquad (7)$$

where $Z_i$ is an instrumental variable (IV) matrix not depending on unknown parameters and $G_i = G(X_i; \beta)$ is a "weight matrix".³

[² For an alternative approach that involves nonparametric estimation of $\Omega_i$ see Bertschek and Lechner (1995).]
[³ Of course the distinction between $Z_i$ and $G_i$ is somewhat arbitrary, because it is the product of these matrices that Newey calls the "instruments". However, following Avery et al. (1983) we will use this notion in order to separate terms depending on unknown parameters from observable quantities.]

Usually the instrumental variables are derived from the explanatory variables. For instance,
the matrix $Z_i$ may contain all past, present and future observations of the regressors. In this case the IV matrix is
$$Z_i = X_i^+ = I_T \otimes \mathrm{vec}(X_i)', \qquad (8)$$
where $\mathrm{vec}(X_i)'$ stacks the columns of $X_i$ into a $1 \times kT$ row vector.⁴ It is interesting to note that the class of estimators defined by (7) entails some well-known estimators as special cases. Examples include the pseudo-ML estimator suggested by Liang and Zeger (1986) and the GMM estimator for multiperiod probit models suggested by Avery et al. (1983). In general the weight matrix $G_i = G(X_i; \beta)$ depends on $\beta$. Hence, it is often convenient to apply a two-step approach. First, $G_i$ is estimated using a simple initial estimator, yielding $\hat G_i = G(X_i; \tilde\beta)$, where $\tilde\beta$ denotes the initial estimate. A possible first-step estimator could be a GMM estimate letting $G_i$ be the identity matrix. In the second stage the GMM estimator is computed treating $\hat G_i$ as fixed. Such a two-step estimator possesses the same limiting distribution as the corresponding one-step estimator (e.g. Newey and McFadden 1994). Obviously, GMM estimates can be based on various forms of the instrument matrix $Z_i$ obeying (7). To compare the asymptotic efficiency of different estimators from this class of GMM estimators the following proposition is useful.
[⁴ If the analysis is extended to time invariant explanatory variables stacked in the $\bar k \times 1$ vector $\bar x_i$, the corresponding matrix is $X_i^+ = I_T \otimes [\mathrm{vec}(X_i)', \bar x_i']$ so that the number of instruments is $T^2 k + T \bar k$.]
Proposition 1: Let $Z_i^1$ and $Z_i^2$ be $T \times m_1$ and $T \times m_2$ matrices, respectively, where $m_1 < m_2$. These matrices are assumed to satisfy (i) $E(Z_i^{1\prime} G_i u_i) = 0$ and (ii) $E(Z_i^{2\prime} G_i u_i) = 0$, with a given matrix $G_i$. Then, if there exists an $m_2 \times m_1$ matrix $K$ such that $Z_i^1 = Z_i^2 K$, the GMM estimator based on (ii) is asymptotically at least as efficient as the GMM estimator based on (i).

Proof: The criterion function corresponding to $Z_i^1$ can be written as
$$\Big(\sum_i u_i' G_i' Z_i^1\Big)\, A_1\, \Big(\sum_i Z_i^{1\prime} G_i u_i\Big) = \Big(\sum_i u_i' G_i' Z_i^2\Big)\, K A_1 K'\, \Big(\sum_i Z_i^{2\prime} G_i u_i\Big),$$
where $A_1$ is the efficient weight matrix for the estimator using (i). Thus, the GMM estimator using $Z_i^1$ can be seen as a GMM estimator using $Z_i^2$ and an inefficient weight matrix $K A_1 K'$. □

For illustration assume that the model is written as a GLIM so that $u_{it} = y_{it} - F(x_{it}'\beta_0)$. Treating the observations for the time periods $t = 1, \dots, T$ as $T$ separate (cross section) samples gives the optimal moment conditions

$$\psi_t(y_{it}, x_{it}; \beta) = x_{it}\, g_{it}\, u_{it}, \qquad (t = 1, \dots, T),$$

where $g_{it} = f(x_{it}'\beta)/E(u_{it}^2 \mid x_{it})$ and $f(\omega) = \partial F(\omega)/\partial\omega$. The "pooled estimator" is defined by

$$\psi_p(y_i, X_i; \beta) = X_i' \Gamma_i u_i,$$

where $\Gamma_i$ is a diagonal matrix with $g_{it}$ as typical diagonal element. Using Proposition 1 it is easy to construct estimators that are asymptotically more efficient than pooled estimators. First, let $\tilde X_i = \mathrm{diag}(x_{i1}', \dots, x_{iT}')$. Then, the estimator

$$\psi_s(y_i, X_i; \beta) = \tilde X_i' \Gamma_i u_i$$
is called the "sequential estimator" for reasons that become apparent in Section 4. Second, we may further improve the efficiency of the GMM estimator by using the moment conditions

$$\psi_+(y_i, X_i; \beta) = X_i^{+\prime} \Gamma_i u_i,$$

where $X_i^+$ is defined in (8).
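To make the three sets of moment conditions concrete, the sketch below builds, for a single unit of a probit-type GLIM, the pooled, sequential and $X_i^+$ instrument matrices and the corresponding moment vectors $Z_i'\Gamma_i u_i$. The probit weights $g_{it} = \phi(x_{it}'\beta)/[\Phi(x_{it}'\beta)(1-\Phi(x_{it}'\beta))]$ match the pooled estimator used in Section 5.2; the function and variable names are our own.

```python
import numpy as np
from scipy.stats import norm

def moment_vectors(y_i, X_i, beta):
    """y_i: (T,) binary outcomes, X_i: (T, k) regressors of one unit.
    Returns the pooled, sequential and X_i^+ moment contributions Z' Gamma_i u_i."""
    T, k = X_i.shape
    index = X_i @ beta
    F = norm.cdf(index)
    u = y_i - F                                   # u_it = y_it - F(x_it' beta)
    g = norm.pdf(index) / (F * (1.0 - F))         # g_it = f_it / [F_it (1 - F_it)]
    Gu = g * u                                    # Gamma_i u_i with Gamma_i = diag(g_it)

    Z_pooled = X_i                                # X_i, shape T x k
    Z_seq = np.zeros((T, T * k))                  # diag(x_i1', ..., x_iT'), shape T x Tk
    for t in range(T):
        Z_seq[t, t * k:(t + 1) * k] = X_i[t]
    vec_Xi = X_i.flatten(order="F")               # vec(X_i): stacked columns, length kT
    Z_plus = np.kron(np.eye(T), vec_Xi[None, :])  # I_T (x) vec(X_i)', shape T x kT^2

    return Z_pooled.T @ Gu, Z_seq.T @ Gu, Z_plus.T @ Gu
```

Stacking these per-unit contributions over i and averaging gives the sample moments entering the criterion (2), with instrument counts k, Tk and kT² for the pooled, sequential and $X_i^+$ variants, respectively.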
4 The relationship to the minimum distance estimator

Chamberlain (1980, 1984) suggests a sequential estimator that can be applied to the models considered here. The first stage consists in estimating the $T$ time periods separately, i.e. the data is treated as a sequence of cross sections. At the second step a minimum distance (MD) procedure is applied to impose the constraints implied by the multiperiod model. In this section we discuss the relationship between the MD estimator and the GMM principle. We will assume that the cross section estimator applied at the first stage can be written as a GMM estimator according to (7). Let $\pi_t$ be the vector of coefficients for time period $t$. By virtue of model (1) there are constraints on these vectors, i.e. $\pi_t = \beta$ for all $t$. Let the first stage estimate for the $t$'th period be denoted by $\hat\pi_t$ and $\hat\pi = [\hat\pi_1', \dots, \hat\pi_T']'$. In matrix notation the first stage estimates result from employing the moment conditions

$$E\left[\begin{pmatrix} Z_i^{(1)\prime} & 0 & \cdots & 0 \\ 0 & Z_i^{(2)\prime} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & Z_i^{(T)\prime} \end{pmatrix}\begin{pmatrix} G_i^{(1)} & 0 & \cdots & 0 \\ 0 & G_i^{(2)} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & G_i^{(T)} \end{pmatrix}\begin{pmatrix} v_{i1}(\pi_1) \\ v_{i2}(\pi_2) \\ \vdots \\ v_{iT}(\pi_T) \end{pmatrix}\right] = 0$$

or $E[\tilde Z_i' \tilde G_i v_i(\pi)] = 0$,
where $v_{it}(\pi_t) = y_{it} - F(x_{it}; \pi_t)$. Then the minimum distance estimator is given by minimizing the generalized distance between $\hat\pi$ and $\varphi(\beta)$, where the function $\varphi(\beta)$ represents the relationship between the "reduced form" parameters $\pi_1, \dots, \pi_T$ and the "structural" parameters $\beta$. In the following proposition it is shown that this sequential procedure is asymptotically equivalent to a GMM estimator using the matrix of instruments $\tilde Z_i$ and the weight matrix $\tilde G_i$ but formulating the residuals in terms of $\beta$.

Proposition 2: Let $\hat\pi$ be a GMM estimator for $\pi = [\pi_1', \dots, \pi_T']'$ using the moment function $\psi(y_i, X_i; \pi) = \tilde Z_i' \tilde G_i v_i(\pi)$. The minimum distance estimator is given by
$$\tilde\beta_{MD} = \arg\min_{\beta \in \Theta}\, [\hat\pi - \varphi(\beta)]' V_{\hat\pi}^{-1} [\hat\pi - \varphi(\beta)], \qquad (9)$$

where $\varphi(\beta) = \iota_T \otimes \beta$, $\iota_T$ is a $T$-vector of ones, and $V_{\hat\pi}$ denotes the covariance matrix of the estimate $\hat\pi$. Then, $\tilde\beta_{MD}$ is asymptotically equivalent to a GMM estimator of $\beta$ using $\psi(y_i, X_i; \beta) = \tilde Z_i' \tilde G_i u_i$.

Proof: The MD estimate results from solving the first order conditions
$$\Delta' V_{\hat\pi}^{-1} [\hat\pi - \varphi(\beta)] = 0,$$

where

$$V_{\hat\pi} = \Big[E(\tilde Z_i' \tilde G_i \nabla_i)\Big]^{-1} A_0^{-1} \Big[E(\nabla_i' \tilde G_i' \tilde Z_i)\Big]^{-1}, \qquad A_0 = \Big[E(\tilde Z_i' \tilde G_i u_i u_i' \tilde G_i' \tilde Z_i)\Big]^{-1},$$
$$\nabla_i = \partial v_i(\pi)/\partial\pi', \qquad \Delta = \partial\varphi(\beta)/\partial\beta'.$$
Since the moment equations for $\hat\pi$ are just identified, we have
$$\hat\pi - \varphi(\beta_0) = \Big[E(\tilde Z_i' \tilde G_i \nabla_i)\Big]^{-1} N^{-1}\sum_i \tilde Z_i' \tilde G_i u_i^0 + o_p(N^{-1/2}),$$
where $u_i^0$ is a $T \times 1$ vector with typical element $u_{it}^0 = y_{it} - F(x_{it}; \beta_0)$. The first order condition admits the expansion
$$\begin{aligned}
\Delta' V_{\hat\pi}^{-1} [\hat\pi - \varphi(\beta)] &= \Delta' E(\nabla_i' \tilde G_i' \tilde Z_i)\, A_0\, E(\tilde Z_i' \tilde G_i \nabla_i)\, [\hat\pi - \varphi(\beta)] + o_p(N^{-1/2}) \\
&= \Delta' E(\nabla_i' \tilde G_i' \tilde Z_i)\, A_0\, N^{-1}\sum_i \tilde Z_i' \tilde G_i u_i + o_p(N^{-1/2}).
\end{aligned}$$
Thus, the moment conditions of the MD estimator admit the same asymptotic expansion as a GMM estimator using the moment conditions $\psi(y_i, X_i; \beta) = \tilde Z_i' \tilde G_i u_i$. □

Although Proposition 2 shows that the MD estimator is asymptotically identical to a specific GMM estimator, the MD estimator may be preferred for the following two reasons. (i) The first step estimation does not depend on possibly incorrect coefficient restrictions implied by the panel structure. This allows one to separate the specification testing of the model: the validity of the other statistical assumptions can be tested using the first step estimates, whereas the coefficient constraints can be checked by the MD procedure (cf. Lechner 1992). (ii) Various alternative panel specifications can be based on the same first step estimates. These may involve time varying coefficients and heteroscedasticity over time and are obtained by different second step estimates. Since the second step estimation does not use the data, computer time can be saved by using the MD procedure.
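A compact way to see the second step at work: given the stacked first-stage estimates $\hat\pi$ and an estimate of their covariance matrix $V_{\hat\pi}$, the MD step with $\varphi(\beta) = \iota_T \otimes \beta$ is a weighted least-squares problem with a closed-form solution. The sketch below is our own and assumes the first-stage quantities are already available.

```python
import numpy as np

def minimum_distance_step(pi_hat, V_pi, T, p):
    """Second-step MD estimator for beta under the restriction pi_t = beta for all t.
    pi_hat: stacked (T*p,) first-stage estimates, V_pi: (T*p, T*p) covariance matrix."""
    Delta = np.kron(np.ones((T, 1)), np.eye(p))   # d phi(beta)/d beta' with phi(beta) = iota_T kron beta
    W = np.linalg.inv(V_pi)                       # optimal MD weight: inverse covariance of pi_hat
    A = Delta.T @ W @ Delta
    beta_md = np.linalg.solve(A, Delta.T @ W @ pi_hat)
    V_beta = np.linalg.inv(A)                     # asymptotic covariance of the MD estimator
    return beta_md, V_beta
```

Because the criterion (9) is quadratic in $\beta$ once the first-stage quantities are fixed, no numerical optimization is needed in the second step, which is the computational advantage mentioned under (ii).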
5 A Monte Carlo Study for the multiperiod probit model

5.1 The model and data generating processes

To investigate the small sample properties of several variants of the GMM estimator, we will consider the multiperiod probit model, which is considered
by, e.g., Avery et al. (1983). In this model $F(x_{it}; \beta) = \Phi(x_{it}'\beta)$, where $\Phi(\cdot)$ denotes the c.d.f. of the standard normal distribution. The errors take the value $-\Phi(x_{it}'\beta)$ with probability $1 - \Phi(x_{it}'\beta)$ and $1 - \Phi(x_{it}'\beta)$ with probability $\Phi(x_{it}'\beta)$. For the same cross section unit the errors are correlated across different time periods. In panel data analysis it is often assumed that the serial correlation is due to an individual specific error component. Furthermore, there may be other error components generated by a time series process such as an autoregressive scheme. In our Monte Carlo design we will employ such a prototypical panel data setup. More precisely, the data generating process for the Monte Carlo study can be characterized by the following equations:
$$\begin{aligned}
y_{it} &= 1\!\mathrm{I}(\beta_C + \beta_D x^D_{it} + \beta_N x^N_{it} + v_{it} > 0), & & \\
x^D_{it} &= 1\!\mathrm{I}(\tilde x^D_{it} > 0), & P(\tilde x^D_{it} > 0) &= 0.5, \\
x^N_{it} &= a\, x^N_{i,t-1} + b\, t + \xi_{it}, & \xi_{it} &\sim U[-1, 1], \\
v_{it} &= \lambda\, c_i + \varepsilon_{it}, & c_i &\sim N(0, 1), \\
\varepsilon_{it} &= \rho\, \varepsilon_{i,t-1} + \tilde\eta_{it}, & \tilde\eta_{it} &\sim N(0, 1), \\
& & i = 1, \dots, N,\quad t &= 1, \dots, T.
\end{aligned}$$
The parameters $(\beta_C, \beta_D, \beta_N, a, b, \lambda, \rho)$ are fixed coefficients and $1\!\mathrm{I}(\cdot)$ is an indicator function, which is one if its argument is true and zero otherwise. All random numbers are drawn independently over time and individuals. The first regressor is a serially uncorrelated indicator variable, whereas the second regressor is a smooth variable with bounded support. The dependence on lagged values and on a time trend induces a correlation over time. This type of regressor has been suggested by Nerlove (1971) and was also used, for example, by Heckman (1981). The error terms may exhibit correlation over time due to an individual specific effect as well as a first order autoregression. It should be noted that our Monte Carlo experiment excludes a possible correlation between the explanatory variables and the individual specific error term. The analysis of the bias in such a "fixed effects specification" is left
for future research. In order to diminish the impact of initial conditions, the dynamic processes have been started at $t = -10$ with $x^N_{i,-11} = \varepsilon_{i,-11} = 0$. The sample size $T$ has been set to 5 or 10 and $N$ to 100 or 1600 in order to study the behaviour in fairly small and really large samples. Since all estimators are $\sqrt{N}$-consistent, the standard errors for the small sample size should be four times as large as for the larger one. Table 1 and Table 2 present some statistics for the different DGPs used in the simulation study. All DGPs have the common feature that the unconditional mean of the indicator variable is close to 0.5 in order to obtain maximum variance and thus to contain maximum information about the underlying latent variable. Furthermore, $\beta_C = -0.75$ and $\beta_D = \beta_N = 1$ have been used in all simulations and $T^{-1}\sum_{t=1}^T \mathrm{var}(v_{it}) = 1$. For ease of notation let $\mu_{it} = \beta_C + \beta_D x^D_{it} + \beta_N x^N_{it}$. Table 1 gives some summary statistics for the part of the DGP related to the regressors.
[ - insert Table 1 about here - ]

The coefficients $a$ and $b$ are used to generate different correlation patterns of $\mu_{it}$ over time. Most of the simulations are based on the second configuration; the first and last ones are merely considered as extreme cases. Table 2 presents similar statistics for the error terms.
[ - insert Table 2 about here - ]

We consider three standard cases (see Table 2).⁵

[⁵ In the working paper version of this paper we also consider models with heteroskedastic errors and endogenous regressors. The results can be obtained from the authors upon request.]

In the first DGP the errors are uncorrelated. The second DGP adds a classical individual specific random effect and the third DGP removes the equicorrelation pattern by
adding a first order autoregressive process. Depending on the DGP, 500 or 1000 replications (R) have been performed.
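For concreteness, a minimal simulation of one replication of this design might look as follows. The parameter names lam (the loading on the individual effect) and rho (the AR(1) coefficient) follow our reconstruction of the DGP equations above; the configuration-specific values of a, b, lam and rho are those reported in Tables 1 and 2 and are therefore left as arguments here.

```python
import numpy as np

def simulate_dgp(N, T, a, b, lam, rho, beta=(-0.75, 1.0, 1.0), burn=10, seed=0):
    """One Monte Carlo replication of the multiperiod probit DGP."""
    rng = np.random.default_rng(seed)
    bC, bD, bN = beta
    xN = np.zeros(N)                       # x^N starts at zero before the burn-in
    eps = np.zeros(N)                      # AR(1) error component starts at zero
    c = rng.standard_normal(N)             # individual effect c_i ~ N(0, 1)
    Y, XD, XN = [], [], []
    for t in range(-burn, T):              # start at t = -10 to wash out initial conditions
        xD = (rng.standard_normal(N) > 0).astype(float)      # indicator regressor, P(xD = 1) = 0.5
        xN = a * xN + b * t + rng.uniform(-1.0, 1.0, N)      # trending AR regressor
        eps = rho * eps + rng.standard_normal(N)             # first order autoregressive error
        v = lam * c + eps                                    # composite error v_it
        if t >= 0:                                           # keep only the T sample periods
            y = (bC + bD * xD + bN * xN + v > 0).astype(float)
            Y.append(y); XD.append(xD); XN.append(xN)
    return np.array(Y).T, np.array(XD).T, np.array(XN).T     # each of shape (N, T)
```

Rescaling lam and rho so that the average variance of v_it equals one reproduces the normalization used in the study.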
5.2 Estimators

For the first two specifications given in Table 2, ML estimators are available. In the case of uncorrelated errors the standard ML procedure for dichotomous probit models applied to the pooled sample is asymptotically efficient. This estimator can be seen as a GMM estimator using
$$Z_i = \begin{pmatrix} 1 & x^D_{i1} & x^N_{i1} \\ 1 & x^D_{i2} & x^N_{i2} \\ \vdots & \vdots & \vdots \\ 1 & x^D_{iT} & x^N_{iT} \end{pmatrix}$$
and $G_i$ is a diagonal matrix with $f_{it}/[F_{it}(1 - F_{it})]$ on the leading diagonal. This estimator will be labelled the pooled estimator in what follows. From Proposition 1 we know that in the case of correlated errors asymptotically more efficient estimators can be constructed using a larger set of instruments. Accordingly, we will construct GMM estimates with a maximal number of instruments, i.e., we use $X_i^+$ as defined in Section 3. The weight matrix $G_i$ is the same as for the pooled estimator. The optimal weight matrix is computed using the residuals of the pooled estimate, $\hat u_i$, and
$$\hat A_N = \Big(\sum_i X_i^{+\prime} G_i \hat u_i \hat u_i' G_i' X_i^{+}\Big)^{-1}.$$
The resulting estimator will be called GMM-W-opt. Unfortunately, the matrix $\hat A_N$ is of order $(kT^2) \times (kT^2)$, so that computing $\hat A_N$ involves a formidable computational effort in most applications. If the $x_{it}$ are independent draws from $k$ random variables, as we have assumed in Assumption 2, a substantial simplification is possible. Letting
$z_i = \mathrm{vec}(X_i)$ we have

$$\begin{aligned}
A_N^{-1} &= E\Big[\sum_i X_i^{+\prime} G_i u_i u_i' G_i' X_i^{+}\Big] \\
&= E\Big[\sum_{i=1}^N (G_i u_i \otimes z_i)(G_i u_i \otimes z_i)'\Big] \\
&= E\Big[\sum_{i=1}^N (G_i u_i u_i' G_i' \otimes z_i z_i')\Big] \\
&= \sum_{i=1}^N (G_i \Omega_i G_i' \otimes \Sigma) \\
&= \Big(\sum_{i=1}^N G_i \Omega_i G_i'\Big) \otimes \Sigma,
\end{aligned}$$

where $\Sigma = E(z_i z_i')$ for all $i$. This suggests using

$$\tilde A_N = \Big(\sum_{i=1}^N G_i \hat u_i \hat u_i' G_i'\Big)^{-1} \otimes \Big(N^{-1} \sum_{i=1}^N z_i z_i'\Big)^{-1}, \qquad (10)$$
where $\hat u_i$ is the residual vector from an estimation of the model using the pooled estimator. The GMM estimator employing $\tilde A_N$ as weight matrix will be labelled GMM-W. As has been noted in Section 3, an optimal GMM estimator can be constructed whenever consistent estimates of $\Omega_i$ are available. However, since the estimation of $\Omega_i$ generally involves the estimation of nuisance parameters, such estimates are difficult to construct. In the case of a random effects model it is assumed that the serial correlation is due to an individual specific error $\alpha_i$ such that $F_{it} = F(\alpha_i + x_{it}'\beta)$ and $\alpha_i \sim N(0, \sigma_\alpha^2)$. Then we can approximate the resulting covariance matrix of the errors by using a linear Taylor series expansion around $\alpha_i = 0$. In the Appendix we derive the approximation
$$E(v_i v_i') = \Omega_i \approx \sigma_\alpha^2 f_i f_i' + \Lambda_i, \qquad (11)$$
where $f_i$ is a $T \times 1$ vector with typical element $f(x_{it}'\beta_0)$ and $\Lambda_i$ is a $T \times T$ diagonal matrix with the elements $F(x_{it}'\beta_0)[1 - F(x_{it}'\beta_0)]$. A least-squares estimator for $\sigma_\alpha^2$ is suggested and the optimal GMM estimator according to (6) is computed (for details see the Appendix). This estimator is labelled GMM-SS. Finally, we consider a sequential estimator as described in Section 4. We compute this estimator using a minimum distance procedure and a GMM estimation procedure. The former estimator will be denoted by sequential and the latter is called GMM-xt.
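The Kronecker structure in (10) is what makes GMM-W cheap relative to GMM-W-opt: instead of inverting a $kT^2 \times kT^2$ matrix, one inverts a $T \times T$ and a $kT \times kT$ block. A sketch of this computation (our own helper names; G_list, u_hat and X_list are assumed to hold $G_i$, $\hat u_i$ and $X_i$ for each unit) could read:

```python
import numpy as np

def gmm_w_weight(G_list, u_hat, X_list):
    """Kronecker-structured weight matrix A_tilde of eq. (10):
    (sum_i G_i u_i u_i' G_i')^{-1}  kron  (N^{-1} sum_i z_i z_i')^{-1}."""
    N = len(G_list)
    T = G_list[0].shape[0]
    S_u = np.zeros((T, T))
    S_z = None
    for G_i, u_i, X_i in zip(G_list, u_hat, X_list):
        Gu = G_i @ u_i
        S_u += np.outer(Gu, Gu)                   # accumulates sum_i G_i u_i u_i' G_i'
        z_i = X_i.flatten(order="F")              # z_i = vec(X_i)
        S_z = np.outer(z_i, z_i) if S_z is None else S_z + np.outer(z_i, z_i)
    return np.kron(np.linalg.inv(S_u), np.linalg.inv(S_z / N))
```

The returned matrix has the full $kT^2 \times kT^2$ dimension required by the $X_i^+$ moments, but only the two small factors are ever inverted.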
5.3 Results

In order to compare the performance of the different estimators we compute various measures of the accuracy of the estimates (see Table 3). $\hat\beta_r$ denotes the estimate of the true value $\beta_0$ from the $r$'th replication of the model, and $\mathrm{std}(\hat\beta_r)$ denotes the corresponding estimated standard error.
[ - insert Table 3 about here - ]

Since in binary choice models identification is only up to scale and location, the ratio of estimated coefficients is also of interest. For reasons of space the statistics related to the constant terms are omitted. Table 4 presents the results of the simulations with independent errors for T = 5 and T = 10. In this case all estimators under consideration are asymptotically equivalent to the pooled estimator and, thus, they are asymptotically efficient. This theoretical result is confirmed by the results for N = 1600, where it is seen that all estimators perform very similarly. For the small sample size (N = 100) the pooled and GMM-SS estimators dominate all other estimators. The estimated standard errors for GMM-xt, GMM-W-opt and sequential are seriously downward biased. This problem increases for T = 10. Note that for T = 10 GMM-W-opt involves more than 100 moment equations, so that the estimate of the optimal weight matrix A is singular for T = 10 and N = 1600. We therefore do not present results for these estimates in the case of T = 10.
[ - insert Table 4 about here - ]

The simulations for the pure random effects error process (Table 5) also include ML-RE, which is asymptotically efficient provided the number of Hermite integration points is sufficiently large. Even with five integration points it dominates the other estimators except GMM-SS for both sample sizes by all measures. For the large sample all coefficient estimates are nearly unbiased, but for the small sample the GMM-xt and GMM-W-opt coefficient estimates are upward biased. For the larger sample GMM-W dominates the pooled estimator in terms of RMSE and variance. For the small samples GMM-xt and GMM-W-opt perform worse by all measures.⁶ Adding an AR(1) process to the random effect leads basically to the same conclusions.
[ - insert Table 5 and 6 about here - ]

Increasing the time dimension leads to the availability of more instruments for GMM-W and GMM-SS, and hence their merits compared to other estimators become more apparent. Furthermore, for N = 100 the pooled estimator is clearly superior to the sequential and GMM-xt estimators. Increasing the correlation of the regressors over time results in convergence problems due to near multicollinearity, but none of the substantial conclusions changes. This is also true for the case of uncorrelated regressors.

[⁶ Note that GMM-W is only given when GMM-W-opt did not converge properly or exhibited a singular weighting matrix.]
6 Conclusion

The ML estimation of non-linear models on panel data either involves tight restrictions on the multivariate distribution of the error terms or requires high dimensional numerical integration over probability density functions. Both approaches have their shortcomings, so that it is appealing to consider an alternative framework such as the class of GMM estimators. We show that this methodology is a useful tool to obtain the asymptotic properties of some existing estimators and to construct new ones. By concentrating on conditional moments which do not depend on parameters from the off-diagonal part of the covariance matrix of the error terms, the suggested methods sacrifice some efficiency compared to FIML but are much easier to compute. We show that the pooled maximum likelihood estimator, the sequential estimator, which is based on minimum distance estimation in the second step (Chamberlain 1984), and the GMM estimators suggested by Avery et al. (1983) are all members of this class of GMM estimators. Although the pooled ML estimator is asymptotically the least efficient of the estimators considered, the Monte Carlo study showed that it may have good small sample properties. Another important conclusion from the Monte Carlo study is that, even in small samples, substantial efficiency gains are possible. However, the problem with constructing more efficient estimators is that inflating the number of instruments leads to a poor performance in small samples. Thus, the optimal instruments should be approximated in a lower dimension. One such approximation is the suggested small-sigma approximation. For future work it will be interesting to see whether our Monte Carlo results carry over to other non-linear models with observed variables that are more informative about the underlying latent variables, such as ordered probit or tobit models.
Appendix: The small-sigma approximation

A linear Taylor series expansion of $F(\alpha_i + x_{it}'\beta)$ around $\beta_0$ and $\alpha_i = 0$ gives
$$F(\alpha_i + x_{it}'\beta) = F(x_{it}'\beta_0) + f(x_{it}'\beta_0)\,\alpha_i + O_p(\sigma_\alpha^2),$$

which means that the approximation works better the smaller the variance of the individual effects is. Inserting this in the regression function gives
$$y_{it} = F(x_{it}'\beta_0) + u_{it},$$

where $u_{it} = f(x_{it}'\beta_0)\,\alpha_i + v_{it}$. Let $f_i$ be a vector with typical element $f(x_{it}'\beta_0)$ and let $\Lambda_i$ denote a diagonal matrix with the elements $F(x_{it}'\beta_0)[1 - F(x_{it}'\beta_0)]$ on the leading diagonal. Then the covariance matrix of $u_i$ is

$$\Omega_i = \sigma_\alpha^2 f_i f_i' + \Lambda_i. \qquad (12)$$
This matrix can be used to construct an optimal GMM estimator according to (6). A simple estimator for $\sigma_\alpha^2$ is obtained as follows. In a first step the model is estimated ignoring the individual effect. Since random individual effects do not invalidate the moment restrictions, these estimates are consistent. From the residual vector, denoted by $\hat u_i$, the statistic $s_i = \mathrm{vech}(\hat u_i \hat u_i')$ is constructed, where the operator "vech" stacks the non-redundant off-diagonal elements of a symmetric matrix in a vector. Similarly, we construct $h_i = \mathrm{vech}(f_i f_i')$. Since $\Lambda_i$ is a diagonal matrix, it follows that $\mathrm{vech}(\Lambda_i)$ gives a vector of zeros. Then, an estimator for $\sigma_\alpha^2$ can be obtained from an OLS regression
$$s_i = \mathrm{vech}(\Omega_i) + e_i = \sigma_\alpha^2 h_i + e_i, \qquad (13)$$
where $e_i$ is a $T(T-1)/2 \times 1$ error vector. If $\sigma_\alpha^2$ is not "too large" the estimate should be close to the true value. Inserting this estimate in (12), estimates of the covariance matrices $\Omega_i$ for $i = 1, \dots, N$ are obtained.
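To illustrate the least-squares step, the following sketch (our own construction) regresses the stacked off-diagonal elements of $\hat u_i \hat u_i'$ on those of $f_i f_i'$ across all units, which yields the OLS estimate of $\sigma_\alpha^2$ in (13), and then assembles the approximated $\Omega_i$ of (12).

```python
import numpy as np

def small_sigma_omega(u_hat, f, F):
    """u_hat, f, F: arrays of shape (N, T) holding pooled residuals, densities
    f(x_it' beta) and c.d.f. values F(x_it' beta). Returns the OLS estimate of
    sigma_alpha^2 and the approximated covariance matrices Omega_i of eq. (12)."""
    N, T = u_hat.shape
    low = np.tril_indices(T, k=-1)                               # off-diagonal "vech" positions
    s = np.concatenate([np.outer(u, u)[low] for u in u_hat])     # stacked s_i
    h = np.concatenate([np.outer(fi, fi)[low] for fi in f])      # stacked h_i
    sigma2 = float(h @ s) / float(h @ h)                         # OLS slope in s = sigma2 * h + e
    Omegas = [sigma2 * np.outer(fi, fi) + np.diag(Fi * (1.0 - Fi))
              for fi, Fi in zip(f, F)]
    return sigma2, Omegas
```

The resulting matrices can be plugged into the optimal-instrument formula (6) to obtain the GMM-SS estimator described in Section 5.2.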
References

Arellano, M. and Bond, S. (1991): Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations, Review of Economic Studies, 58, 277-297.

Avery, R., Hansen, L. and Hotz, V. (1983): Multiperiod Probit Models and Orthogonality Condition Estimation, International Economic Review, 24, 21-35.

Bertschek, I. and Lechner, M. (1995): GMM-Estimation of Panel Probit Models: Nonparametric Estimation of the Optimal Instruments, SFB 272 Discussion Paper No. 25, Humboldt University Berlin.

Butler, J.S. and Moffitt, R. (1982): A Computationally Efficient Quadrature Procedure for the One-Factor Multinomial Probit Model, Econometrica, 50, 761-764.

Chamberlain, G. (1980): Analysis of Covariance with Qualitative Data, Review of Economic Studies, 47, 225-238.

Chamberlain, G. (1984): Panel Data, in Griliches, Z. and Intriligator, M.D. (eds.), Handbook of Econometrics, Vol. II, Ch. 22, Amsterdam: North-Holland.

Gourieroux, C. and Monfort, A. (1989): Statistique et Modèles Économétriques, Paris: Economica.

Guilkey, D.K. and Murphy, J.L. (1993): Estimation and Testing in the Random Effects Probit Model, Journal of Econometrics, 59, 301-317.

Hajivassiliou, V.A. (1993): Simulation Estimation Methods for Limited Dependent Variable Models, in Maddala, G.S., Rao, C.R. and Vinod, H.D. (eds.), Handbook of Statistics, Vol. 11: Econometrics, Ch. 19, Amsterdam: North-Holland.

Hansen, L.P. (1982): Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, 50, 1029-1055.

Hamerle, A. and Nagl, W. (1987): Misspecification in Models for Discrete Panel Data: Applications and Comparisons of Some Estimators, Discussion Paper 105/S, University of Konstanz.

Heckman, J.J. (1981): The Incidental Parameters Problem and the Problem of Initial Conditions in Estimating a Discrete Time - Discrete Data Stochastic Process and Some Monte Carlo Evidence, in Manski, C. and McFadden, D. (eds.), Structural Analysis of Discrete Data, Cambridge: MIT Press.

Holtz-Eakin, D., Newey, W. and Rosen, H.S. (1988): Estimating Vector Autoregressions with Panel Data, Econometrica, 56, 1371-1395.

Keane, M.P. (1994): A Computationally Practical Simulation Estimator for Panel Data, Econometrica, 62, 95-116.

Kodde, D.A., Palm, F.C. and Pfann, G.A. (1990): Asymptotic Least-Squares Estimation: Efficiency Considerations and Applications, Journal of Applied Econometrics, 5, 229-243.

Lechner, M. (1992): Some Specification Tests for Static Limited Dependent Variable Models Estimated on Panel Data, Beiträge zur angewandten Wirtschaftsforschung, 474-92, University of Mannheim.

Liang, K.-Y. and Zeger, S.L. (1986): Longitudinal Data Analysis Using Generalized Linear Models, Biometrika, 73, 13-22.

Nerlove, M. (1971): Further Evidence on the Estimation of Dynamic Economic Relations From a Time Series of Cross Sections, Econometrica, 39, 359-383.

Newey, W.K. (1993): Efficient Estimation of Models with Conditional Moment Restrictions, in Maddala, G.S., Rao, C.R. and Vinod, H.D. (eds.), Handbook of Statistics, Vol. 11: Econometrics, Ch. 16, Amsterdam: North-Holland.

Newey, W.K. and McFadden, D.L. (1994): Large Sample Estimation and Hypothesis Testing, in Engle, R.F. and McFadden, D.L. (eds.), Handbook of Econometrics, Vol. IV, Amsterdam: North-Holland.

Ogaki, M. (1993): Generalized Method of Moments: Econometric Applications, in Maddala, G.S., Rao, C.R. and Vinod, H.D. (eds.), Handbook of Statistics, Vol. 11: Econometrics, Ch. 17, Amsterdam: North-Holland.