Consistent estimation of discrete-choice models for panel data with multiplicative effects1
Alban Thomas LEERNA-INRA, Universit´e des Sciences Sociales de Toulouse
January 2002
Preliminary draft. Do not quote.
Abstract. This paper presents an extension to the fixed-effect Logit for panel-data discrete-choice models, where the error component structure is multiplicative (individual effects multiplied by time effects). In linear models with such an error-component structure as investigated by Ahn, Lee and Schmidt (2001), usual fixed-effect estimators are generally inconsistent. We propose a conditional Logit estimator based on a different sufficient statistic, for the case where multiplicative time effects are known. When not the case, we discuss the implementation of the Modified Profile Likelihood based on a transformation of incidental parameters. The last estimator is an extension of Honor´e and Lewbel (2000) semiparametric estimator. We investigate smallsample properties of these estimators with a Monte Carlo experiment.
1
Corresponding address: INRA-LEERNA, Universit´e des Sciences Sociales de Toulouse, 21 All´ee de Brienne, 31000 Toulouse, France. email
[email protected], Phone +33 (0)5 61 12 85 21, Fax +33 (0)5 61 12 85 20.
1
1
Introduction
The fixed-effect Logit is widely used in applications of discrete-choice models with panel data. Whereas fixed-effect procedures are of interest in linear models because of their ability to filter out individual effects that may be correlated with explanatory variables, in discrete-choice models however, the key feature of these procedures is to alleviate the incidental parameter problem2 . Consider the following binary choice model for panel data yit = 1I(yit∗ > 0) with yit∗ = x0it γ + αi + εit ,
(1)
i = 1, 2, . . . , N , t = 1, 2, . . . , T , where 1I(.) is the indicator function, xit is a K × 1 vector of explanatory variables, αi is the individual effect and εit is i.i.d. across units and time periods. This is the standard discrete-choice specification of the literature. Specifying a probability distribution for εit produces a set of individual contributions to the sample likelihood, that may be maximized using conventional, gradient numerical procedures. Alternatively, semiparametric methods also exist that do not require distributional assumptions on εit , but may impose parametric identification restrictions (maximum score of Manski 1985, semiparametric estimator of Honor´e and Lewbel 2000). It is well known that the Maximum Likelihood estimator (MLE) of the Logit model with individual effects is not consistent when T is fixed (see Hsiao 1992). This is due to the dependence between the MLE for γ and for the N incidental parameters αi , as the Logit model is nonlinear. When the number of time periods is small, the MLE estimate α ˆ i is not consistent, even when N → ∞, and this inconsistency is reported to the MLE of γ. This is the incidental parameter problem (Neyman and Scott 1948, Lancaster 2000). The conditional maximum likelihood estimation principle (Andersen 1973) has been suggested as a convenient way to remove individual effects from the Logit model. In the standard Logit model with an additive individualP effect, a sufficient statistic for the latter is the sum of positive outcomes ( t yit ) for that given individual. Hence, constructing a new set of probabilities conditional on this statistic forms the basis of a modified Maximum Likelihood criterion, whose maximization yields consistent parameter estimates.3 2
See, e.g., Heckman (1981), Hsiao (1992), and Bertschek and Lechner (1998) for a survey on alternative, random-effect models.
2
The ability to construct a conditional version of the Logit model relies on the additivity of the individual effect in the index function. In this standard version of the discrete choice model, unobserved individual heterogeneity has the same impact on the probability that yit = 1, no matter the time period. This property allows one to compare the structure of the model with the conditional (McFadden) Multinomial Logit where original parameters are constant across the M (M > 2) alternatives, and estimated parameters are related to differences in the level of explanatory variables across alternatives. The alternative model with individual-specific explanatory variables and alternative-specific parameters would have, in the panel data framework, parameters indexed by the time period. And, because explanatory variables would be individual-specific, the individual heterogeneity term would also be affected by a time-varying parameter, hence leading to a multiplicative, time-varying individual effect. Multiplicative effects in panel data models have been proposed in the literature on linear models as an alternative specification (Ahn, Lee and Schmidt 1999, Nauges and Thomas 1999, Holtz-Eakin et al. 1988, Cr´epon, Kramarz and Trognon 1997). The basic intuition behind these models is that individual heterogeneity has a different impact on the dependent variable, depending on the time period. As a consequence, the marginal effect of unobserved heterogeneity is time-varying, which allows for more flexibility in modeling individual choices, as the standard model with linear additive individual effects is a special case of the multiplicative effects specification. GMM procedures based on model transformation by quasi-differencing produces consistent parameter estimates under mild regularity conditions, for static or dynamic models. Incorporating multiplicative effects in a discrete-choice framework raises interesting questions for empirical applications when Logit models are considered. First, the motivation for such a model can be found in economic conditions under which there may be a exogenously-driven tendency for all individuals or firms to move toward an equilibrium level in which the event characterized by yit = 1 always (or never) happens, when explanatory variables xit are stationary. This would be the case of a monotonic trend, either increasing or decreasing. As T increases and eventually reaches values outside of the observed sample, all units would be located in either of the two 3
In the process, individuals with 0 or T positive outcomes are discarded, as their contribution to the log-likelihood is zero.
3
possible equilibrium regimes, depending on the value of their associated unobserved heterogeneity component, αi . Of course, this might not be observed in practice, depending on the relative magnitude of x0it γ and the heterogeneous trend in the neighborhood of 0 for yit∗ , when the number of time periods is limited. Some examples are the adoption of a new technology by firms under time-varying market conditions, the likelihood that an unemployed person finds a job given exogenous labor market shocks, and so on. Another motivation for incorporating a multiplicative structure embedding both individual effects and time effects is to allow for heterogeneous sensitivity of economic agents to common shocks, not necessarily trends. Macro-economic conditions for instance, may condition individuals’ response in terms of the discrete-choice model, and the marginal response to the common shock may be assumed different across the population of agents. Second, the question of parameter consistency of the usual fixed-effect Logit model has to be addressed, when the true model has multiplicative effects. Third, the possibility to construct a conditional version of the Logit model may rely on prior knowledge on time effects. When the latter are unobservable and are treated as structural parameters, the conditional fixedeffect Logit procedure may not be feasible. In this case, a possibility would be to construct a modified profile likelihood along the lines of Cox and Reid (1987), and maximize it with respect to structural parameters. It is well known that such a procedure would not be consistent for fixed T , but would have a lower bias (of order O(1/T )) that the simple concentrated-likelihood procedure. Alternatively, a semiparametric version of the discrete-choice model may be called for. The rest of the paper proceeds as follows. The usual Logit model for panel data with fixed effects is presented in Section 2, where we recall the incidental-parameter problem, and the way around it by means of constructing a conditional likelihood based on a sufficient statistic. In section 3, we present the discrete-choice with multiplicative effects in the Logit framework, and show that, when those effects are known, a consistent conditional Logit estimator exists. In section 4, we consider the case where time effects are unknown, and propose a modified profile likelihood technique to estimate jointly time effects and structural parameters. Section 5 deals with an extension toward a semiparametric estimator of the model, along the lines of Honor and Lewbel (2000). We present Monte Carlo experiment results in section 6. Concluding remarks are in section 7. 4
2
Overview of the usual Logit model
In Model (1), the distribution of αi given xi is left unrestricted (fixed-effect model). On the other hand, a crucial assumption is that εit is i.i.d. across time periods and individuals. This implies that period-specific random terms are uncorrelated, a somewhat restrictive assumption. Magnac (2001) suggests a characterization of distribution functions for εit such that the “global cut” property is satisfied, as in the logistic case. His results are valid P for the case T = 2, where the global cut property is that P rob(yi |xi , αi )/P rob( 21 yit = 1|xi , αi ) be independent from αi . Under independence between εit on the one hand, xit and αi on the other, the global cut condition is that P rob(εi1 > −xi1 γ − αi , εi2 ≤ −xi2 γ − αi )/P rob(εi1 ≤ −xi1 γ − αi , εi2 > −xi2 γ − αi ) is a function c(xi1 − xi2 ). Magnac presents joint characterization of the distribution of εit and of function c(.). Parametric identification has been studied intensively by Chamberlain (1992), who showed that there may be local underidentification for fixed T , unless the distribution of εit is Logistic, when explanatory variables have a bounded support.4 In the model yit = 1I(x0it γ + αi + εit > 0), assume εit is distributed according to a continuous distribution with density probability function and cumulative density function respectively λ(.) and Λ(.): λ(x0it γ + αi ) =
exp(x0it γ + αi ) , [1 + exp(x0it γ + αi )]2
Λ(x0it γ + αi ) =
exp(x0it γ + αi ) . 1 + exp(x0it γ + αi )
Let λit = λ(x0it γ + αi ), Λit = Λ(x0it γ + αi ), and hit = λit /[Λit (1 − Λit )]. We have hit = 1 for the Logit case. First-order conditions from the log-likelihood maximization problem are T
∂li X = hit (yit − Λit )xit = 0, ∂γ t=1
(2)
T
X ∂li = hit (yit − Λit ) = 0, ∂αi t=1 4
The proof relies on the linearity of the log odd ratios.
5
(3)
where PN li is individual’s i contribution to the sample log-likelihood, log L(γ) = i=1 li . The MLE of γ maximizes the concentrated log-likelihood: γˆ = arg max γ
N X
li [γ, α ˆ i (γ)],
i=1
where α ˆ i solves (3). First-order conditions corresponding to the concentrated log-likelihood are finally N N ˆ i (γ)) ∂li (γ, α ˆ i (γ)) ∂ α ˆ i (γ) 1 X ∂li (γ, α ˆ i (γ)) 1 X ∂li (γ, α + = N T i=1 ∂γ ∂αi ∂γ N T i=1 ∂γ N T 1 XX = hit [yit − Λit (γ, α ˆ i (γ))] xit = 0. N T i=1 t=1
(4)
When T is fixed, individual effect estimates α ˆ i (γ) are not consistent and, as a consequence, the MLE of γ is not consistent either, as the information matrix in (γ, αi ) is not block diagonal (Lancaster 1999). The conditional Maximum Likelihood estimation principle (Andersen 1973) has been suggested as a convenient way to remove individual effects from the Logit model. According to this principle, if a minimum sufficient statistic τi exists for incidental parameters αi and this statistic does not depend upon structural parameter γ, the conditional density of observations is, in vector form: f (yi |γ, αi ) f (yi |γ, τi ) = , i = 1, 2, . . . , N, g(τi |γ, αi ) for some density g(τi |γ, αi ). This conditional density PN function does not depend on incidental parameters and maximizing i f (yi |γ, τi ) yield consistent estimates for γ. P In the Logit model, it is easily seen from condition (3) that τi = t yit is a sufficient statistic for αi . Hence, maximizing the conditional log likelihood (based on all possible sequences of yit ’s such that the sum of positive outcomes is equal to τi ) produces consistent estimates for γ. The key element to this conditional log-likelihood is the probability of the T vector yi = (yi1 , yi2 , . . . , yiT )0 conditional on τi : P T 0 exp y x γ it it t τ !(T − τi )! P × i P rob(yi |τi ) = P , (5) T T! 0 exp d x γ d∈Bi t it it 6
P where Bi is the set of all possible T sequences dit such that Tt dit = τi . Because individual effects appear linearly in the model, they are easily removed by grouping together all sequences with the same value for τi . For example, when T = 2, the conditional log-likelihood is X log L = {ωi log Λ[(xi2 − xi1 )0 γ] + (1 − ωi ) log [1 − Λ[(xi2 − xi1 )0 γ]]} , i∈B1
(6) where Λ(.) = exp(.)/[1 + P exp(.)], and BP 1 = {i|yi = (0, 1), (1, 0)} (sequences with all events identical: t yit = T or t yit = 0 contribute nothing to the log-likelihood).
3
A Logit model with multiplicative effects (h(t) linear and known)
Consider now the following specification with multiplicative effects: yit∗ = x0it γ + h(t)αi + εit ,
(7)
where h(t) is a known, deterministic function of time. When yit∗ is observed and takes real values, removing individual effects can be done by quasidifferencing. The special case of a linear trend h(t) = t, has been examined recently by Verbeek and Knaap (1999) in a GMM framework. In our discrete choice framework, this multiplicative specification indicates a heterogenous trend in the probability of an event (yit = 1) for individual i at time t, and may originate, e.g., from a random-coefficient model. Assume h(t) is linear and known up to two integers a and b, such that h(t) = a + bt. The probability associated with the T vector yi is, under the logistic assumption: P PT PT T 0 y + b ty ) y x γ + α (a exp it i t=1 it it t=1 it t=1 . (8) P rob(yi ) = QT 0 {1 + exp[x γ + (a + bt)α ]} i it t=1 The derivative of the individual contribution to the log-likelihood li with respect to αi is T
X ∂li = (a + bt) {yit − F [x0it γ + (a + bt)αi ]} ∂αi t=1 7
=a
T X t=1
yit + b
T X
tyit −
t=1
T X exp[x0it γ + (a + bt)αi ](a + bt) . 0 1 + exp[x γ + (a + bt)α ] i it t=1
When equation (7) is the true data generating process, the fixed-effect Logit estimation procedure described in the previous section is consistent when h(t) is constant, i.e., when b = 0, ∀a. This is because the conditional likelihood function above is based on identical values of the sufficient statistic τi , where each time observation has the same weight. On the other hand, when h(t) 6= h(s) for t 6= s, i.e., when b 6= 0, αi is weighted differently according to the time period and is not filtered out from the conditional likelihood. The conditional probability for yi given τi is in this case : P PT PT T 0 exp t yit xit γ + αi (a t yit + b t=1 tyit ) P P rob(yi |τi ) ∝ P PT PT T 0 d∈Bi exp t dit xit γ + αi (a t dit + b t=1 tdit ) P PT T 0 exp t yit xit γ + αi b t tyit P , =P (9) PT T 0 exp d x γ + α b td it i it it d∈Bi t t P P P PT T T 0 0 where τi = t dit , and is different from exp t yit xit γ / d∈Bi exp t dit xit γ when b 6= 0. The inconsistency of the usual fixed-effect Logit is due to the presence of unobserved heterogeneity which is not fully filtered out. In the example above with T = 2, assume xi1 = 0 and xi2 = 1, and let ∆h = h(2) − h(1) = b. We then have P rob(yi = (0, 1)|τi = 2) =
exp [γ + (a + 2b)αi ] exp (γ + αi ∆h) = exp [γ + (a + 2b)αi ] + exp [(a + b)αi ] 1 + exp (γ + αi ∆h)
exp (γ + bαi ) . 1 + exp (γ + bαi ) P PN1 Let N1 = N i 1I(yi1 + yi2 = 1) and n1 = i 1I(yi1 = 0, yi2 = 1). When h(1) = h(2), differentiating the conditional log-likelihood with respect to γ yields γˆ = log (n1 /(N1 − n1 )) and plim γˆ = γ. Assuming now an heterogenous trend, the usual fixed-effect Logit estimator is not consistent because plim log [n1 /(N1 − n1 )] = γ + log E[exp(αi ∆h)] 6= γ unless ∆h = 0. =
On the other PT hand, it is clear from Equation (9) that a sufficient statistic ∗ will be τi = t=1 tyit ∀i, provided a = 0 ∀b. In this case, the denominator of 8
the conditional has to be constructed from all possible sequences PT probability PT ∗ such that t=1 tdit = τi where dit = {0, 1}, instead of t=1 dit = τi as before. A conditional Logit based on this statistic will remove individual effects even if parameter b is unknown, as the conditioning on τi∗ is invariant to any multiplicative transformation. This is of course due to the restrictive nature of our model with heterogeneous trends, and this result will not extend to more general, nonlinear trend specifications.5 Table 1 presents possible combinations and values of sufficient statistics τi and τi∗ , for selected values of T , and a linear trend h(t) = t. Table 1: Sufficient statistics for Combinations τi (0,1),(1,0) 1 (0,0,1),(0,1,0),(1,0,0) 1 (1,1,0),(1,0,1),(0,1,1) 2 (1,0,0,0),(0,1,0,0),(0,0,1,0),(0,0,0,1) 1 (1,1,0,0),(1,0,1,0,),(1,0,0,1),(0,1,1,0), 2 (0,0,1,1),(0,1,0,1) (1,1,1,0),(1,0,1,1),(1,1,0,1),(0,1,1,1) 3
T 2 3 3 4 4 4
Logit models Combinations (1,1,0),(0,0,1)
τi∗ 3
(1,1,0,0),(0,0,1,0) (1,0,1,0),(0,0,0,1)
3 4
(1,0,0,1),(0,1,1,0)
5
We have the following proposition.
5
When h(t) is unknown and is considered for estimation along with γ, the condition that the sufficient statistic be independent from structural parameters is lost.
9
Proposition. Assume model P (7) holds with h(t) a + bt, where a and b P= T T ∗ are two integers. Define τi = t yit and τi = t h(t)yit . Let Bi and Bi∗ denote the set of P all possible sequences {dit = (0, 1)} for unit i such that PT T ∗ t=1 dit = τi and t=1 dit = τi respectively. a) When b 6= 0, τi is not a sufficient statistic, and the fixed-effect Logit estimator obtained by maximizing the conditional log-likelihood based on conditional probabilities ! ! T T X X X exp (10) P rob(yi |τi ) = exp yit x0it γ / dit x0it γ , t=1
d∈Bi
t=1
is not consistent. b) When a = 0 ∀b, τi∗ is a sufficient statistic, and maximizing the conditional log-likelihood based on the conditional probabilities ! ! T T X X X P rob(yi |τi∗ ) = exp yit x0it γ / exp d∗it x0it γ , (11) d∗ ∈Bi∗
t=1
t=1
produces consistent estimates for γ when T ≥ 3. Conditional probabilities in (11) above can be rewritten in an analog fashion as for the usual conditional Logit (10). Let y˜it P = yit h(t), PTx˜it = T ˜ ˜ ˜ ˜ ˜ ˜ xit /h(t), Bi = {(di1 , di2 , . . . , diT ), dit = dit h(t), dit = 0 or 1, t dit = t y˜it }. Then the conditional probability (refeq6) becomes P T ∗ 0 exp ˜it x˜it γ + bαi τi t=1 y P P rob(yi |τi∗ ) = P T ˜it x˜0 γ + bαi PT d˜it exp d ˜ ˜ it d∈Bi t=1 t=1 y˜it x˜0it γ P . =P T 0 ˜ ˜it γ ˜ B ˜i exp d∈ t=1 dit x exp
P
T t=1
(12)
P In the conditional probability (12), the terms Tt y˜it (in the numerator) P and Tt d˜it (in the denominator) cancel out, but because of the restriction on 10
the shape of h(t), non-zero contributions to the log-likelihood can be found only when T − 2 > 0, i.e., when T ≥ 3. In the linear trend case where h(t) are T − 2 possible values PT = t, thereP ∗ ∗ for τi with τi ≤ T (different from t yit = 0 or Tt yit = T ). The number of possible combinations is 2 for (T ≥ 3, τ ∗ = 3, 4), 3 for (T ≥ 5, τ ∗ = 5), 4 for (T ≥ 6, τ ∗ = 6) and so on. The fixed-effect Logit estimator proposed above is consistent and very easy to compute. Let D denote a 2T × T matrix containing all possible ˜ = D×A, where A is a T ×T matrix sequences of 0 and 1 for a T vector, and D 0 with h = (h(1), h(2), . . . , h(T )) in its main diagonal and zeroes elsewhere. ˜ eT is a T vector of ones, Let the T vector d˜k denote row k of matrix D, and F denotes a 2T vector with typical row element Fk = 1I(τi∗ = d˜0k eT ), k = 1, 2, . . . , 2T . The probability given in (11) is then computed in vector form as exp(˜ yi0 x˜0i γ) ∗ P rob(yi |τi ) = . (13) ˜ x0 γ)]0 F exp(D˜ i
4
Orthogonality and modified profile likelihood (h(t) unknown)
When the trend function h(t) is unknown, the conditional Logit procedure deP scribed above is not feasible anymore, because the set of statistics t h(t)dit will not in general contain enough admissible sequences to construct the conditional probabilities. This is obvious in particular when elements of h(t) are real-valued, logarithmic or quadratic functions of integer numbers. Consequently, if time effects are to be estimated and when there does not exist a consistent conditional Logit estimator, a remaining possibility is to write the concentrated log-likelihood in terms of all structural parameters and maximize the Modified Profile Likelihood (MPL). The idea is to center the original log-likelihood, so as to obtain a lower bias (of order O(1/T )) when maximizing the concentrated log-likelihood (see Cox and Reid 1987). An important question is first to check for identification of the concentrated log-likelihood. Assume from now h(t) = θt , t = 1, 2, . . . , T and set β = (γ, θ) the vector of structural parameters, where θ0 = (θ1 , θ2 , . . . , θT ). The in-
11
P formation matrix corresponding to log L = N ˆ i (β)) is E(∂ 2 log L/∂β∂β 0 ) i=1 li (β, α 2 2 2 ∂ log L ∂ log L ∂ log L ∂ 2 log L E ∂γ∂θ1 . . . E ∂γ∂θt . . . E ∂γ∂θT E ∂γ 2 2 2 2 2 ∂ log L ∂ log L ∂ log L ∂ log L E ∂θ1 ∂γ E . . . E ∂θ1 ∂θt . . . E ∂θ1 ∂θT 2 ∂θ 1 = .. .. .. .. .. .. . . . . . . E
∂ 2 log L ∂θT ∂γ
∂ 2 log L ∂θ1 ∂θT
E
∂ 2 log L ∂θT ∂θt
... E
... E
∂ 2 log L 2 ∂θT
with the following approximations: E
∂ 2 log L ∂γ 2
E
∂ 2 log L ∂γ∂θt
E
≈ −(1/N T )
N X T X
λit (β, α ˆ i (β))xit x0it ,
i=1 t=1
∂ 2 log L ∂θt2
≈ −(1/N T )
N X
α ˆ i (β)λit (β, α ˆ i (β))xit ,
i=1
≈ −(1/N T )
N X
α ˆ i2 (β)λit (β, α ˆ i (β)),
i=1
E
2
∂ log L ∂θt ∂θs
= 0.
P Assume plim (1/N T ) i,t xit x0it has rank K. The estimated information matrix is not singular provided the main diagonal has non-zero elements, a condition obviously satisfied for α ˆ i (β) such that the density values λit (β, α ˆ i (β)) do not tend to 0. Hence, identification requires that α ˆ i (β) does 6 not ˆ i (β) solves the first-order condition ∂li /∂αi = PT tend to −∞ P.T Because α h(t)yit − t=1 Λit h(t) = 0, a necessary condition for identification is t=1P that Tt=1 h(t)yit > 0. As in the usual Logit case, units for which yit = 0 ∀ t have to be removed from the analysis. Following the lines of Cox and Reid (1987), we can show that there exist incidental parameters µi , i = 1, 2, . . . , N , such that information orthogonality condition holds for the reparameterized log-likelihood. Note also that cross derivatives E ∂ 2 log L/∂γ∂θt tend to 0 if λit either tends to 0, or to 1 and αi is not correlated with xit for any given t. In any case, this does not preclude the information matrix to be nonsingular. 6
12
Suppose we consider li∗ (β, µi ) = li [β, α(µi , β)], such that, conditional on xi and αi , we have 2∗ ∂ li (β, µi ) E |xi , αi = 0 i = 1, 2, . . . , N. ∂β∂µi We have
−1 ∂αi E∂ 2 li ∂li2 =− ∂β ∂β∂αi ∂αi2
P where li = Tt=1 {yit log Λit + (1 − yit )[1 − log Λit ]}. Single and second derivatives of li with respect to αi and components of β are: T
T
∂li X = (yit − Λit )xit , ∂γ t=1
X ∂li = θt (yit − Λit ), ∂αi t=1 T
X ∂ 2 li =− θt λit , 2 ∂αi t=1
∂li = αi (yit − Λit ), ∂θt ∂ 2 li = −αi2 λit , ∂θt2
∂ 2 li = (yit − Λit ) − αi θt λit , ∂αi ∂θt
T
X ∂ 2 li =− θt λit xit , ∂αi ∂γ t=1
∂ 2 li = −αi λit xit , ∂γ∂θt T
X ∂ 2 li = − λit xit x0it . ∂γ 2 t=1 We have 2 ∂ 2 li ∂ li ∂ 2 li ∂ 2 li ∂ 2 li =− , , ,..., , ∂β∂αi ∂γ∂αi ∂θ1 ∂αi ∂θ2 ∂αi ∂θT ∂αi so that 2 ∂αi ∂ li /∂γ∂αi ∂ 2 li /∂θ1 ∂αi ∂ 2 li /∂θ2 ∂αi ∂ 2 li /∂θT ∂αi =− , 2 , 2 ,..., , . ∂β ∂ 2 li /∂αi2 ∂ li /∂αi2 ∂ li /∂αi2 ∂ 2 li /∂αi2 13
Since
PT θt λit xit ∂ 2 li /∂γ∂αi = − Pt=1 , 2 T 2 2 ∂ li /∂αi t=1 θt λit
we have ∂αi ∂ 2 αi =− ∂γ∂µi ∂µi
"P
T 0 t=1 θt λit (xit + θt (∂αi /∂γ) PT 2 t=1 θt λit
# ,
where λ0it (X) = ∂λit (X)/∂X. Also,
⇔
∂ 2 li /∂γ∂αi (yit − Λit ) − αi θt λit = P 2 2 ∂ li /∂αi − Tt=1 θt2 λit " # P ∂ 2 αi ∂αi 2θt λit + αi θt2 λ0it + (∂αi /∂θt ) Tt=1 θt3 λ0it =− PT 2 ∂θt µi ∂µi t=1 θt λit
(14)
P The term in brackets for ∂ 2 αi /∂γ∂µi is equal to ∂ log( t θt2 λit )/∂γ, and the one for ∂ 2 αi /∂θt ∂µi is equal to P P ∂ log( Tt=1 θt2 λit ) ∂αi ∂ log( Tt=1 θt2 λit ) + . ∂θt ∂θt ∂αi Hence, because ∂ 2 αi ∂β∂µi
∂αi ∂µi
−1
∂αi ∂ , = log ∂β ∂µi
we have that " T #−1 Z x0it γ+θt αi T X X ∂αi 2 2 = θt λit ⇔ µi = θt λ(r)dr ∂µi −∞ t=1 t=1 =
T X
θt2 Λ(x0it γ + θt αi ).
(15)
t=1
Let µ ˆi (β) denote the new incidental parameter obtained as a function of the structural parameters, by concentrating the log-likelihood function. The individual contribution to the MPL is liM (β) = li∗ [β, µ ˆi (β)] −
1 log −d∗µµi (β, µ ˆi (β)) , 2
14
(16)
where µ ˆi (β) is the MLE of µi given β, and d∗µµi (β, µi ) = ∂ 2 li∗ /∂µ2i . It is more interesting to view the MPL in terms of the original parameterization: li∗ (β, µ ˆi (β)) = li (β, α ˆ i (β)) because of the parametric invariance property of the MLE. The relationship between the second-order derivatives with respect to original and new auxiliary parameters can be found from 2 2 2 ∂ 2 li ∂αi ∂li ∂ 2 αi ∂ 2 li∗ ∂li2 ∂αi ∂ 2 li∗ = + ⇔ = . ∂µ2i ∂αi ∂µi ∂αi ∂µ2i ∂µ2i ∂αi2 ∂µi |µi =ˆµi (β) Consequently, the MPL is liM (β)
5
2 1 ∂ li ∂µi − 2 (β, α ˆ i (β)) + log |α =αˆ (β) = li (β, α ˆ i (β)) − 2 ∂αi ∂αi i i " T # X 1 = li (β, α ˆ i (β)) + log θt2 λ (x0it γ + α ˆ i (β)θt ) . (17) 2 t=1
A semiparametric estimator (h(t) unknown)
We present in this section the semiparametric approach developed by Honor and Lewbel (2000), applied to our model with multiplicative effects. These authors briefly present our model as an extension to their semiparametric analysis for discrete choice with panel data. The model is yit = 1I(ηit + x0it γ + αi θt + εit ),
(18)
where ηit is a single explanatory variable with coefficient normalized to 1, independently distributed from αi and εit conditionally upon xit and a vector of instruments denoted zi . xit and zi are respectively K ×1 and L×1 vectors. For purpose of identification, it is assumed that the conditional distribution F (ηit |xit , zi ) exists and is continuous, with density ft (ηit |xit , zi ). Let eit = αi θt + εit the stochastic error component distributed on (−∞, ∞), with eit independent from v it conditionally upon xit and zi . eit is distributed on the domain Ωe . The only distributional requirement so far is that the distribution ft (ηit ) be continuous, while individual effects αi may be correlated 15
with xit and zi . On the other hand, instruments zi are assumed uncorrelated with εit disturbances. Let [yit − 1I(ηit > 0)] , ft (ηit |xit , zi )
yit∗ =
then E(yit∗ |xit , zi ) = x0it γ + E(eit |xit , zi ).
Dropping subscripts for ease of notation and letting s(x, e) = −x0 γ − e, we have Z ∞ E[y − 1I(η > 0)|η, x, z] E[y − 1I(η > 0)|η, x, z]f dη ∗ E(y |x, z) = E |x, z = f (η|x, z) f (η|x, z) −∞ Z ∞Z = [1I(η + x0 γ + e > 0) − 1I(η > 0)] dFe (e|η, x, z)dη −∞
Ωe
Z
Z
∞
[1I(η > s) − 1I(η > 0)] dηdFe (e|η, x, z)
= −∞
Ωe
Z
Z
∞
[1I(s ≤ η < 0)1I(s ≤ 0) − 1I(0 < η ≤ s)1I(s > 0)] dηdFe (e|η, x, z)
= Ωe
−∞
Z
Z 1I(s ≤ 0)
= Ωe
1
Z dη − 1I(s > 0)
0
Z =
s
dη dFe (e|x, z)
0
(x0 γ + e)dFe (e|x, z) = x0 γ + E(e|x, z).
(19)
Ωe
In the standard case described by the authors, when T ≥ 2 and θt is constant, γ is identified by running a 2SLS regression in first-difference of ∆yit∗ on ∆xit , using zi as instruments. In our model with multiplicative effects however, ∗ moment conditions are to be based on elements θs yit∗ − θt yis for time periods s, t, s 6= t, where θt ’s are parameters to be estimated. More precisely, we have E(yit∗ |xit , zi ) = x0it γ + θt E(αi |xit , zi ) + E(εit |xit , zi ) ∗ ⇔ θs E(yit∗ |xit , zi )−θt E(yis |xis , zi ) = (θs x0it −θt x0is )γ+θs θt E(αi |xit , zi )−θt θs E(αi |xis , zi ).
Moment conditions are therefore E [zi0 (θs εit − θt εis )] = 0 ∗ ⇔ E [zi0 (θs yit∗ − θt yis − (θs xit − θt xis )γ)] = 0,
16
∀t, s, t 6= s.
(20)
For T ≥ 2, there are L × T (T − 1)/2 such restrictions. A nonlinear GMM procedure can be implemented to jointly estimate γ and θt , t = 2, . . . , T , noting that θ1 can be normalized to 1. Because of multiplicative effects, the model is not covariance-stationary, and a two-stage estimation procedure might be used to account for the particular nature of heteroskedasticity in transformed error terms. When T = 3 for example, the variance-covariance submatrix of moment conditions above for individual i is σε2 zi Σzi0 with Σ = E(ui u0i ), εi2 − θ2 εi1 εi3 − θ3 εi1 θ2 εi3 − θ3 εi2 ui = εi4 − θ4 εi1 , θ2 εi4 − θ4 εi2 θ3 εi4 − θ4 εi3 where θ1 is normalized to 1, and 1 + θ22 θ2 θ3 −θ3 θ2 θ4 −θ4 0 2 θ2 θ4 θ3 θ2 θ3 θ4 0 0 2 2 −θ θ θ + θ 0 θ θ −θ Σ= 3 2 3 4 2 θ4 2 3 2 2 θ2 θ4 θ3 θ4 θ2 θ3 0 θ1 + θ4 θ2 0 −θ4 −θ2 θ3 θ3 θ2 θ3 θ32 + θ42
.
Starting from a unit matrix for Σ, consistent estimates for γ, θt , t = 2, 3, . . . , T ) can be obtained in a first stage. Given initial θˆt , the consistent weighing maˆ 0 can then be used in the second stage of the nonlinear GMM trix zi Σ(θ)z i procedure. Finally, the conditional density ft (ηit |xit , zi ) is typically estimated by a multivariate nonparametric kernel procedure. Let wit = (xit , zi ) denote the K + L vector of explanatory and instrument variables, and uit = (ηit , wit ) (a K + L + 1 vector). fˆ(ηit , wit ) and fˆ(wit ) respectively denote the estimated joint density function of ηit and components of wit , and the joint density associated to components of wit . These densities are NT X uit − uj K+L+1 −1 ˆ f (ηit , wit ) = N T h Km1 , h j=1 fˆ(wit ) =
Z
fˆ(ηit , wit )dηit = N T hK+L+1
NT −1 X j=1
17
Z Km1
ηit − ηj wit − wj , h h
dηit
= NTh
K+L −1
NT X
Km2
j=1
wit − wj h
,
where h is the window, Km1 (.) andR Km2 (.) are two multivariate kernels such R that Km2 (x) = Km1 (x, y)dy and Km2 (x)dx = 1.7 The conditional density is then estimated by fˆ(ηit , wit ) fˆt (ηit |xit , zi ) = . fˆ(wit )
6
Small-sample behavior
To investigate the small-sample behavior of fixed-effect Logit and MPL estimators, we undertake a Monte Carlo simulation experiment inspired from Heckman (1981). The data generating process is yit∗ = ηit + xit γ + (t/T )αi + εit , xit = 0.1t + 0.5xi,t−1 + νit , where νit is uniform on [−1/2, 1/2], εit has a logistic distribution with mean 0 and variance σε2 , and αi has a logistic distribution with mean 0 and variance σα2 . For means of comparison between the semiparametric and the Logit estimators, ηit is a random variable included in the model, as a normal variate with mean 0, variance 1 and a correlation coefficient of 0.35 with xit . To investigate the sensitivity of parameter estimates to the ratio σα /σε , we consider the following values for the variances: (σε2 = 1, σα2 = 0.25, σα2 = 1.0, σα2 = 4.0). The sample size is (N = 200, T = 4) and the number of replications is 5000. The true value of the parameter of interest γ = 1. Pis T With these values, the proportion of individual sequences with t = T or PT t = 0 is between 0.15 and 0.2 for all case parameters. In the simulation experiment, we consider five different estimators: the fixed-effect Logits with sufficient statistic τi (usual conditional Logit) and τi∗ (our procedure for heterogeneous trends), the MLE and Modified Profile Likelihood, and the semiparametric estimator. For the MLE and MPL, we solve numerically first-order conditions ∂li /∂αi = 0 in αi , i = 1, 2, . . . , N , and 7
See, e.g., Park and Marron (1990) on plug-in methods for selecting the optimal window.
18
replace these estimates in the (concentrated) log-likelihood function, which is then maximized with respect to β = (γ, θ). The semiparametric estimator is implemented using univariate and multivariate Gaussian Kernels for fˆ(η|x), from the Kernel Gauss package (Koning 1996). As ηit is non correlated with εit , the GMM (or 2SLS) procedure suggested by Honor´e and Lewbel (2000) reduces to a simple nonlinear least squares problem. Table 2 reports the mean and standard error of the five estimators. As can be seenPfrom this table, our conditional Logit estimator (based on the statistic τi∗ = t tyit ) has P limited small-sample bias compared to the usual Logit based on τi = t yit . Both fixed-effects Logit estimators tend to perform better when the ratio σα2 /σε2 is equal to 1. Although consistent against the usual alternative, our Logit procedure is less efficient for such limited number of time periods. This is because it uses less information to construct the conditional probabilities, as less possible sequences are available for τi∗ than for τi . Turning now to the MLE estimators, the bias in the “plain” concentrated log-likelihood MLE is clearly diminished in a drastic way by the Modified Profile Likelihood procedure. This is particularly true when σα2 = 0.25 and σα2 = 4.0, where the MPL performs at least equally well than the conditional Logit estimator in terms of small-sample bias. As for efficiency however, the MPL estimator has a much lower standard error, and can be seen therefore as an interesting candidate for such discrete choice models with multiplicative effects. This is even more so as in this case, time effects h(t) = θt are unknown parameters, whereas their structure has to be specified in the conditional Logit case. Finally, the semiparametric estimator suggested by Honor´e and Lewbel (2000), not surprisingly, exhibits a lower efficiency in the estimation of structural parameter γ than its Maximum Likelihood counterparts. The smallsample bias seems also to be significant, especially for large values of σα2 relative to σε2 .
7
Conclusion
This paper is an attempt toward extending panel data model specifications with multiplicative effects, i.e., when the individual effects are modulated by associated, common time effects, to nonlinear models. As the fixed-effects 19
Logit is popular in applications of discrete-choice models with panel data, it seems legitimate to consider such an extension, provided that adequate economic models of individual choice match such a specification. One might think of market conditions under which an exogenously-driven tendency may exist for all economic units to move toward an equilibrium level in which positive outcomes always (or never) happen, and when explanatory variables are stationary. A special case of this situation would be, e.g., a monotonic trend, either increasing or decreasing. As the number of time periods increases and eventually reaches values outside of the observed sample, all units would be located in either of the two possible equilibrium regimes, depending on the value of their associated unobserved heterogeneity component. Of course, this would depend on the relative magnitude of independent variables and the heterogeneous trend in the neighborhood of 0 for the underlying latent variable, when the number of time periods is limited. Alternatively, our model may find its motivation in the possibility to allow for heterogeneous sensitivity of economic units to common shocks, not necessarily trends. Global market conditions may condition individuals’ response in terms of the discrete-choice model, and the marginal response to the common shock may be assumed different across the population of agents. When the multiplicative time effects structure is known a priori, we propose a conditional Logit Maximum Likelihood estimator based on a sufficient statistic that explicitely accounts for the fact that positive outcomes are associated with different weights according to the position of the particular time period in the sequence. When this structure is unknown, we discuss the implementation of the Modified Profile Likelihood based on a transformation of incidental parameters. Finally, when the Logit assumption is relaxed and a semiparametric approach is preferred, we consider an extension suggested by Honor´e and Lewbel (2000) of their “pseudo-regression” semiparametric procedure. Small-sample properties of these alternative estimators are investigated with a Monte Carlo experiment. Simulation results reveal that our conditional Logit procedure performs reasonably well whereas the usual Logit clearly exhibits significant small-sample bias. On the other hand, the Modified Profile Likelihood estimator is seen as a very interesting candidate, as it does not require prior knowledge of the time effects structure.
20
References [1] Ahn, S.C., Y.H. Lee and P. Schmidt, 2001, GMM Estimation of Linear Panel Data Models with Time-varying Individual Effects, Journal of Econometrics 101, 219-255. [2] Andersen, E.B., 1973, Conditional inference and models for measuring (Mentalhygiejnisk Forlag, Copenhague). [3] Arellano, M., 2001, Discrete choices with panel data, working paper 0101, CEMFI. [4] Bertschek, I. and M. Lechner, 1998, Convenient estimators for the panel probit model, Journal of Econometrics 87, 329-371. [5] Chamberlain, G., 1992, Binary response models for panel data: identification and information, working paper, Harvard University. [6] Cox, D.R. and N. Reid, 1987, Parameter orthogonality and approximate conditional inference (with discussion), Journal of the Royal Statistical Association, Series B 49, 1-39. [7] Cr´epon, B., F. Kramarz and A. Trognon, 1997, Parameters of interest, nuisance parameters and orthogonality conditions: An application to autoregressive error component models, Journal of Econometrics 82, 135-156. [8] Heckman, J., 1981, The incidental parameters problem and the problem of initial conditions in estimating a discrete time-discrete data stochastic process, in: C.F. Manski and D. McFadden, eds., Structural analysis of discrete data with econometric applications (MIT Press, Cambridge) 179-195. [9] Holtz-Eakin, D., W. Newey and H. Rosen, 1988, Estimating vector autoregressions with panel data, Econometrica 56, 1371-1395. [10] Honor´e, B.E. and A. Lewbel, 2000, Semiparametric binary choice panel data models without strictly exogeneous regressors, working paper, Boston College.
21
[11] Hsiao, C., 1992, Logit and Probit models, in: L. Matyas and P. Sevestre, eds., The econometrics of panel data, chap. 11 (Kluwer Academic Publishers, Dordrecht) 223-241. [12] Koning, R.H., 1996, Kernel: A Gauss library for kernel estimation, www.xs4all.nl/˜rhkoning/gauss.htm [13] Lancaster, T., 1999, Orthogonal parameters and panel data, working paper, Brown University. [14] Lancaster, T., 2000, The incidental parameter problem since 1948, Journal of Econometrics 95, 391-413. [15] Lewbel, A., 2000, Semiparametric qualitative response model estimation with unknown heteroscedasticity or instrumental variables, Journal of Econometrics 97, 145-177. [16] Magnac, T., 2001, Binary variables and fixed effects: generalizing conditional logit using global cuts, working paper, CREST. [17] Manski, C.F., 1985, Semi-parametric analysis of discrete response: Asymptotic properties of the Maximum Score estimator, Journal of Econometrics, 27, 313-333. [18] Nauges, C. and A. Thomas, 1999, Consistent estimation of dynamic panel data models with time-varying individual effects, paper presented at the North American Summer Meeting of the Econometric Society, Madison, Wisconsin, June 23-27. [19] Neyman, J. and E.L. Scott, 1948, Consistent estimates based on partially observed observations, Econometrica 16, 1-32. [20] Park, B.U. and J.S. Marron, 1990, Comparison of data-driven bandwidth selectors”,Journal of the American Statistical Association 85, 6672. [21] Verbeek, T.J. and T. Knaap, 1999, Estimating a dynamic panel data model with heterogenous trends, Annales d’Economie et de Statistique, 56-56, 331-349.
22
Table 2: Simulation experiment. Parameter estimates σα2 = 0.25 σα2 = 1.0 σα2 = 4.0 Usual conditional Logit 0.8109 0.9188 0.7996 (0.2371) (0.1823) (0.2033) Our Logit 0.9472 1.0623 0.9743 (0.4507) (0.3130) (0.4074) Logit MLE 1.3360 1.2552 1.3672 (0.2371) (0.2571) (0.3287) Logit MPL 1.0649 0.9965 0.8682 (0.224) (0.2323) (0.2075) Semiparametric estimator 1.0838 1.1455 1.1214 (0.3873) (0.4189) (0.4792) Notes. Standard errors are in parentheses. N = 200 and T = 4. σε2 = 1. True value is γ P = 1. Based on 5000 replications. The usual conditional Logit uses τi = t yit as a sufficient P statistic, whereas our Logit procedure for multiplicative effects uses τi∗ = t h(t)yit with h(t) known. Logit MPL is the Modified Profile Likelihood, along the lines of Cox and Reid (1987). The semiparametric estimator is the Honor´e and Lewbel (2000) proposed extension of their additive individual-effects model.
23