Aug 15, 2008 - in school and identified as the only wife of the household head. There are .... 5 Note that age, age squared/100, high school, university, Aegean, North West, North East and constant in ...... Publishing Company, Darien, CONN.
CORRECTION OF SELECTIVITY BIAS IN MODELS OF DOUBLE SELECTION Berk Yavuzoglua , Insan Tunalib , Elvan Ceyhanc a Department
of Economics, University of Wisconsin-Madison, Madison, Wisconsin 53706-1393, USA
b Department c Department
of Economics, Koc University, Sariyer, I·stanbul 34450 Turkey
of Mathematics, Koc University, Sariyer, I·stanbul 34450, Turkey
August15, 2008 ——————————————————————————————————————
Abstract Edgeworth expansions are known to be useful for approximating probability distributions and moments. In our case, we exploit the expansion to specify models of double selection by relaxing the trivariate normality assumption. We assume bivariate normality among the error terms in the two selection equations and allow the distribution of the error term in the regression equation to be free. This sets the stage for a simple test for the more common trivariate normality speci…cation. Other recently proposed methods for handling multiple categories are Multinomial Logit based selection correction models. An empirical example about the wage determinants of regular and casual female workers in Turkey is presented to document the di¤erences among the results obtained from our selectivity correction approach, trivariate normality speci…cation and Multinomial Logit based selection correction models. Keywords: double selection models; Edgeworth expansion; female labor supply; Multinomial Logit based selection correction models; selectivity bias JEL codes: C34, C35, C63. ——————————————————————————————————————
1
Motivation
Availability of a random sample provides the conventional setting for estimation of population parameters. In this process, if the outcome variable is observed just for a subsample of the original sample and this subsample itself fails to be a random sample, we encounter what is called selectivity. In case of selectivity, reduced form estimates obtained by conventional linear regression will be inconsistent for the population parameters. Selectivity problem, which refers to the non-randomness of a sample caused by selection according to speci…ed rules, has been a topic of interest for many researchers in econometrics and statistics in the last 30 years. To solve the selectivity problem, most of the literature deals with one selection equation according to which the non-random sample is formed. Heckman (1976, 1979) and Lee (1976, 1979) solved this problem by assuming bivariate normality. Lee (1982) extended this literature by dropping the bivariate normality assumption and relying on an Edgeworth expansion. Tunali (1986) built on Heckman’s and Lee’s original work by bringing a second selection equation into the scene and was able to handle complete and incomplete classi…cations using a double selection model. He assumed trivariate normality among the random disturbances of these three equations. Our contribution on the subject is relaxing the trivariate normality assumption by extending Lee (1982). In obtaining our correction, we do not impose any condition on the form of the distribution of the random disturbance in the regression equation, but conveniently assume bivariate normality between the random disturbances of the two selection equations. This apparently strong distributional assumption can be defended in light of Ruud (1983) and Stoker (1986), or quasi-maximum likelihood methodology. As documented in Miller (2005), Edgeworth expansion was introduced by F. Y. Edgeworth in 1905 and dubbed "Edgeworth’s series" by E. P. Elderton in 1906. H. Cramer established the theory of "Edgeworth’s series" in the 1920s. The presently popular term Edgeworth expansion was used for the …rst time by David et al. (1951). The topic has received considerable attention in statistics and econometrics. In order to sample the contributions, one may look at Bhattacharya and Rao (1976), Bhattacharya and Ghosh (1978), Rothenberg (1983) and Kolassa and McCullagh (1990). Although double selection problem can be solved by employing other methodologies, we
2
opt for Edgeworth expansion because it is easier to implement and yields a richer functional form for selectivity correction. This form nests the version obtained under trivariate normality and provides a test of that restriction. We illustrate our robust selectivity bias correction in the empirical context of Tunali and Baslevent (2006), where the goal is to estimate a wage equation subject to participation and mode of employment choices. Multinomial Logit based selection correction models have also been developed for the purpose of handling two or more rounds of selection. These models have a very di¤erent structure. They de…ne a selection equation for each possible state, and the random disturbances of these selection equations are assumed to be independent and identically Gumbel distributed (Bourguignon et al., 2007). Our model has the advantage of allowing correlation between the random disturbances of the two selection equations. Another advantage of our model is the accommodation of states which are not disjoint with another state (as in Tunali and Baslevent, 2006). In such cases, using a Multinomial Logit based selection correction model is equivalent to ignoring the structure of the problem. Nonetheless, we present the results obtained from some well-known Multinomial Logit based selection models: Lee (1983), Dubin and McFadden (1984) and its two variants proposed by Bourguignon et al. (2007), and Dahl (2002). In Section 2, we introduce the theory related to our problem. We start with the de…nition of the double selection problem; then, discuss various solution strategies and reasons for employing Edgeworth expansion. After explaining Edgeworth expansion in detail and developing a theoretical solution for the double selection problem, we conclude with an explanation of the estimation procedure. In Section 3, we provide an empirical example which illustrates our selectivity correction. In the following section, we revisit the example using Multinomial Logit based selection correction models. We compare these results with those based on our selectivity correction approach. In the …nal section of the paper, we present the conclusion.
3
2
Theory
2.1
Double Selection Problem
We take as our point of departure the following generic representation of the double selection problem: y1i = y2i = y3i =
0
1 X1i 0
2 X2i 0
3 X3i
+
1 u1i
(…rst selection rule),
(1)
+
2 u2i
(second selection rule),
(2)
+
3 u3i
(regression equation).
(3)
For k = 1; 2; 3 and individual i, Xki ’s are vectors of explanatory variables,
k ’s
are the
corresponding vectors of unknown coe¢ cients, uki ’s are the random disturbances, and
k ’s
are unknown scale parameters. The variables y1i and y2i are unobserved continuous random variables, but without loss of generality, functions of them classify individuals to di¤erent categories according to a sample selection regime denoted by
. From now on, we drop
subscript i to avoid notational clutter. Let X = X1 [ X2 [ X3 . The feature of interest is E(y3 j X; ) =
0
3 X3
+
3 E(u3
j X; ):
(4)
Selectivity is manifested via E(u3 j X; ) 6= 0, which renders conventional linear regression inconsistent. Parametric approaches exploit additional assumptions which yield a functional form for E(u3 j X; ), so that adjustment for selectivity can be implemented. It is customary to assume that (u1 ; u2 ; u3 ) is distributed independently across individuals and of Xk for k = 1; 2; 3. Once the selectivity adjustment component is consistently estimated, linear regression becomes a viable choice. The goal, then, is to consistently estimate on the sample selection regime determined by
3
conditional
.
In the cases covered in Tunali (1986), the selection rule can be expressed as
= fD1 ; D2 g
where D1 and D2 are two dichotomous variables indicating the outcomes of the two selection rules as follows:
D1 =
8 9 > < 1 if y > 0 > = 1 > : 0 if y1 6 0 > ;
and D2 =
8 9 > < 1 if y > 0 > = 2 > : 0 if y2 6 0 > ; 4
:
(5)
Then, we can characterize the outcomes resulting from the two selection equations in the 2 table, where Sj is the set of individuals in the j th subsample for j = 1; 2; 3; 4.
following 2
D2 0
1
D1 0
S1
S2
1
S3
S4
Figure 1. Possible outcomes resulting from selection equations. The case in which we observe all the cells of 2
2 table yields a complete classi…cation
of the original sample. However, incomplete classi…cation cases, in which only three or two distinct cells of 2
2 table are observed, are also possible. Various sample selection regimes
which …t this set-up are discussed in Tunali (1986). Without loss of generality, he assumes that y3 is observed for the subsample S4 , so that the conditional expectation function of interest given in Equation (4) specializes to:
E(y3 j X; D1 = 1; D2 = 1) =
0
3 X3
=
+
3 X3
+
3 E(u3 j 3 E(u3
X; D1 = 1; D2 = 1) j
1 u1
>
0
1 X1 ;
2 u2
>
0
2 X2 ):
(6)
The conditioning on X is implicit throughout, but we drop it for notational convenience. Tunali (1986) assumes that u1 , u2 and u3 have a trivariate normal distribution and sets 1
=
2
choices of
= 1. The latter is an innocuous assumption for his set-up. However, for other , this is no longer the case. In obtaining our solution, we relax both restrictions.
In particular, we let the form of the distribution of the random disturbance in the regression equation be free. To prevent any confusion, under our distributional assumption, we denote this term by u ~3 instead of u3 , and assume that it has expectation E(~ u3 ) = 0 and variance V ar(~ u3 ) = 1. However, we maintain that u1 and u2 have a standard bivariate normal distribution. In Section 2:3, after setting of E(~ u3 j u1 >
0
1 X1 ; u 2
>
1
=
2 X2 )
strategy for consistent estimation of
2
= 1 for convenience and giving the functional form
obtained via an Edgeworth expansion, we provide a 3.
The methodology can easily be extended to other
cases covered in Tunali (1986). Presently, we situate our approach within a broader distri5
butional context and justify our choice. We relax the variance restriction when we confront the empirical problem in Tunali and Baslevent (2006).
2.2
Solution Strategies
Given our distributional assumptions, numerous approaches can be pursued for the purposes of estimation. Kolassa (1994) o¤ers a broad list of series approximations and discusses the computational details. One commonly used approximation is the Normal Inverse Gaussian, examined in detail in Eriksson et al. (2004). Unlike Maximum Likelihood Estimation, this method involves an incomplete speci…cation: the conditional expectation is written without specifying the joint distribution. Even though this method works well for the univariate case, its behavior is not known in the multivariate case. Edgeworth expansion also involves an incomplete speci…cation, but is easier to apply compared to the alternatives. For our purposes, the most appealing characteristic of Edgeworth expansion is the fact that it nests the conventional trivariate normality assumption. Under the trivariate normality assumption, conditional expectation function E(u3 j u1 ; u2 ) becomes a linear function of u1 and u2 . In other words, trivariate normality allows us to express E(u3 j
) using only the
…rst and second central moments. Edgeworth expansion provides an improvement because higher order moments are introduced. These allow us to relax the conventional restrictions imposed on the skewness and kurtosis of the distribution. Skewness is a measure of the degree of asymmetry of a distribution. If the left tail is more pronounced than the right tail, the function is said to have negative skewness or said to be left skewed. If the reverse is true, it has positive skewness or is right skewed. If two tails are symmetric as in a normal distribution, then the density function has zero skewness. The skewness of a multivariate distribution is de…ned by a function of 3rd order central moments (Mardia, 1970b). For the univariate case, it is given by
=
3=2 3= 2
where
i
= E(x E(x))i .
Kurtosis is the degree of peakedness of a distribution compared to the normal distribution, and is de…ned as a function of 4th order central moments (Mardia, 1970b). For the univariate case, the kurtosis is de…ned as & =
2 4= 2
.
Since only the …rst and second central moments are used under the trivariate normality assumption, the conventional method will not work well if our data are skewed and/or kurtic. In the derivation of our Edgeworth expansion, we also use the third and fourth central 6
moments. This allows us to test the restrictions on skewness and kurtosis. Most real life data reveal some amount of skewness and/or kurtosis. Data are said to be skewed or kurtic if a non-normal distribution is involved in their generation. Lahiri and Song (1999) provide an example in the context of smoking related diseases.
2.3
Trivariate Edgeworth expansion
Let f (u1 ; u2 ; u ~3 ) be the joint density of u1 , u2 and u ~3 . The triplet (u1i ; u2i ; u ~3i ) is assumed to be independently and identically distributed across individulas with zero mean vector and covariance matrix 2 12 6 1 6 =6 6 12 1 4 13
23
13 23
1
3
7 7 7; 7 5
and is independent of Xki for k = 1; 2; 3. It is helpful to keep in mind that u1 , u2 and u ~3 are the random disturbances of Equations (1) - (3). For brevity, we suppress the conditioning on covariate matrix X and the individual subscript i throughout the section. We denote the standard trivariate normal density STVN ( and the standard bivariate normal density SBVN (
12 )
12 ; 13 ; 23 )
by g(u1 ; u2 ; u3 )
by g(u1 ; u2 ). Under some general
conditions given by Chambers (1967) and conditional on existence of all the moments of u1 , u2 , and u ~3 , the trivariate density f (u1 ; u2 ; u ~3 ) can be expanded in terms of a series of derivatives of g(u1 ; u2 ; u3 ):
f (u1 ; u2 ; u ~3 ) = g(u1 ; u2 ; u3 ) +
X
r+s+p 3
( 1)r+s+p Arsp Dur 1 Dus 2 Dup 3 g(u1 ; u2 ; u3 ); r!s!p!
(7)
where Arsp ’s are functions of the moments of f (u1 ; u2 ; u ~3 ), and Dur 1 Dus 2 Dup 3 g(u1 ; u2 ; u3 ) =
@ r+s+p g(u1 ; u2 ; u3 ) : @ur1 @us2 @up3
(8)
The iterative relation between the derivatives in Equation (8) can be captured with the help of Hermite polynomials. In our case, a bivariate version is su¢ cient. The bivariate
7
Hermite polynomial of (r; s)th order, denoted by Hrs (u1 ; u2 ), satis…es Dur 1 Dus 2 g(u1 ; u2 ) = ( 1)r+s Hrs (u1 ; u2 )g(u1 ; u2 ):
(9)
In other words, Hrs (u1 ; u2 ) is a function of the ratio of (r; s)th to (0; 0)th order derivatives of g(u1 ; u2 ). With the help of Equation (9), we can deduce from Equation (7) that the joint density of u1 and u2 can be written as: 2
h(u1 ; u2 ) = 41 +
X
r+s 3
3 1 Ars0 Hrs (u1 ; u2 )5 g(u1 ; u2 ): r!s!
(10)
Lee (1984) exploits this expression for testing bivariate normality in single selection models. Employing the usual practice in the literature (see Lee, 1982; Lee, 1984; Lahiri and Song, 1999), we consider terms up to 4th order (r + s + p = 4 in Equation (7)). This amounts to truncating the series approximation. While this practice is known as a Gram Charlier Expansion for the univariate case, for the bivariate case, it is called a Type AaAa surface in Pretorious (1930) and a Type AA surface in Mardia (1970a). For the trivariate case, we may adopt Mardia’s (1970a) designation and identify the expansion as a Type AAA surface. We further know that for r + s + p
4, Arsp = Krsp where Krsp ’s denote trivariate cumulants
(Lahiri and Song, 1999). Returning to the generic representation of the double selection problem, we assume (u1 ; u2 )
SBVN (
12 )
in Equations (1) and (2), but allow the distribution of u ~3 to be free
with E(~ u3 ) = 0 and V ar(~ u3 ) = 1 in Equation (3) and set
1
=
2
= 1. Extending the steps
given in Mardia (1970a), we get the following operationally useful formula for the Type AAA surface: 1 E(~ u3 j u1 ; u2 ) = K101 H10 (u1 ; u2 ) + K011 H01 (u1 ; u2 ) + K201 H20 (u1 ; u2 ) 2 1 1 + K021 H02 (u1 ; u2 ) + K111 H11 (u1 ; u2 ) + K301 H30 (u1 ; u2 ) 2 6 1 1 1 + K031 H03 (u1 ; u2 ) + K211 H21 (u1 ; u2 ) + K121 H12 (u1 ; u2 ): (11) 6 2 2
8
Equivalently,
E(~ u3 j u1 ; u 2 ) =
1
1 + K201 H20 (u1 ; u2 ) 2 1 1 1 + K021 H02 (u1 ; u2 ) + K111 H11 (u1 ; u2 ) + K301 H30 (u1 ; u2 ) 2 6 1 1 1 + K031 H03 (u1 ; u2 ) + K211 H21 (u1 ; u2 ) + K121 H12 (u1 ; u2 ): (12) 6 2 2 2 [( 13 12
12 23 )u1
+(
23
12 13 )u2 ]
The derivation is provided in Appendix 6:1. Appendix 6:2 gives explicit expressions for the bivariate Hermite polynomials in Equation (11), and Appendix 6:3 presents derivations of moments of the truncated SBVN distribution up to the third order.1 Incorporating the explicit formulas for the Hermite polynomials into Equation (12) and using the moment formulas of the truncated SBVN distribution, we obtain:
E(~ u3 j u1 >
0
1 X1 ; u 2
>
2 X2 )
=
13 1
+
1 + K301 6
6
23 2
1 + K201 2
1 + K031 6
7
Explicit formulas of ’s are given in Appendix 6:4. Note that
3
+ K111
1 + K211 2 1
and
2
8
4
1 + K021 2
1 + K121 2
9:
5
(13)
are the terms given
in Tunali (1986) and omission of the remaining terms in Equation (12) corresponds to the standard trivariate normality speci…cation. In theory, we can exploit Equation (10) and use approximation to the same order (r + s = 4) for the bivariate component, h(u1 ; u2 ), instead of assuming standard bivariate normality. However, this will add 9 terms to the denominator. Since each of these terms includes a di¤erent cumulant, the problem becomes intractable. We therefore rely on the SBVN distribution. Two strands in the econometrics literature help us justify our choice. First, as Ruud (1983) and Stoker (1986) have shown, slope coe¢ cients of index-function models can be consistently estimated up to a factor of proportionality using any commonly used technique like probit or logit. Second is the work on quasi maximum likelihood methodology: If the correct speci…cation of the distribution is in the linear exponential family and we incorrectly specify the model by using another distribution from that family, we still get consistent 1
Our moment derivations of the truncated SBVN distribution up to the second order corroborate with Henning and Henningsen’s (2007) formulas, but di¤er from Rosenbaum (1961). In our view, Rosenbaum (1961) made a sign error while calculating integrals. Besides, Henning and Henningsen (2007) cross-check their formulas using numerical integration and Monte Carlo simulation.
9
estimates. As seen from the formulas of ’s in Appendix 6:4, the model is highly non-linear. Therefore, adopting the practice in the literature, we rely on Heckman’s (1979) two-step estimation procedure to estimate Equation (12). In the …rst step, we maximize the relevant likelihood 2 12 .
function to get consistent estimates of ’s and
Then, we …t the following linear regression
for the individuals in S4 :
y3 =
0
3 X3
+
where E( j u1 >
^ +
1 1
0
^ +
2 2
1 X1 ; u 2
>
^ +
3 3
^ +
4 4
2 X2 )
^ +
5 5
^ +
6 6
^ +
^ +
7 7
8 8
^ +
9 9
3
; (14)
= 0. We may refer to ^ ’s as selectivity correction
terms. Then, we can test for the presence of selectivity bias by testing joint signi…cance of all selectivity correction terms ^ 1 , ^ 2 , :::, ^ 9 . Furthermore, we can set
3
=
4
= ::: =
9
= 0 to
test for the conventional trivariate normality speci…cation under the maintained assumption that the random disturbances in the two selection equations are bivariate normally distributed.3 The latter test can also be thought as a test for linearity of the conditional expectation of u ~3 given u1 and u2 .
3
Application
3.1
Problem
We illustrate our methodology in the context of the empirical work undertaken in Tunali and Baslevent (2006). The data come from October 1988 Household Labor Force Survey conducted by the State Institute of Statistics. The working sample consists of 8962 married women between 20-54 years old who reside in urban areas (population > 20000), are not in school and identi…ed as the only wife of the household head. There are four labor force participation categories in the data set: 7770 non-participants, 745 wage workers (or regular/casual wage and salary earners), 181 self-employed which include employers, own-account work and unpaid family workers, and 266 unemployed. The goal of the paper is to estimate labor supply elasticities for wage workers. This calls for estimation of a wage equation for 2
The formation of the likelihood function according to di¤erent sample selection regimes is explained in detail by Tunali (1986). 3 Lahiri and Song (1999) provide a Lagrange Multiplier procedure for testing trivariate normality assumption in a sequential selection context.
10
the subsample of wage workers, taking selection into consideration. Tunali and Baslevent (2006) assume that home-work (or non-participation), self-employment and wage work utilities can be expressed as follows. Home-work utility : U0 = Self-employment utility : U1 = Wage work utility : U2 =
0
0z 0
1z 0
2z
where z is a vector of observed variables, coe¢ cients and
j ’s
+
0;
(15)
+
1;
(16)
+
2;
(17)
j ’s
are the corresponding vectors of unknown
are the random disturbances. Assuming that individuals choose the
state with highest utility, their decision can be captured using the utility di¤erences: y1 = U1
U0 = (
y2 = U2
U1 = (
0
0
0
0
1 2
0 )z 1 )z
+(
1
0)
=
+(
2
1)
=
0
1z 0
2z
+
1 u1 ;
(18)
+
2 u2 :
(19)
This pair of selection equations has the same form as the ones in Equations (1) and (2) with X1 = X2 = z. Note that y1 can be expressed as the propensity to be self-employed or the excess of the utility gained from self-employment compared to home-work and y2 as the incremental propensity to engage in wage work rather than self-employment. Then, y1 + y2 gives the excess utility under wage work over home-work. Even though we do not know the preferences of unemployed women, Tunali and Baslevent (2006) follow Magnac (1991) and de…ne them as people obtaining higher utility either from self-employment or wage work relative to home-work. The four way classi…cation observed in the sample can be obtained via the categorical variable LF P (Labor Force Participation)
LF P =
8 > > 0 = home-work, if y1 < 0 and y1 + y2 < 0; > > > > > < 1 = self-employment, if y > 0 and y < 0; 1 2 > > 2 = wage labor, if y2 > 0 and y1 + y2 > 0; > > > > > : 3 = unemployed, if y > 0 or y + y > 0: 1 1 2
Hence, the sample selection regime is given by
9 > > > > > > > = > > > > > > > ;
= fLF P g for this problem. In Appendix
6:5, the methodology given in Section 2:3 is adapted to handle the current case.
11
(20)
Since probability of being unemployed is de…ned as a union of self-employed and wage work probabilities, we conclude that our support is broken down into three mutually exclusive regions, which correspond to LF P = 0; 1; 2. Considering Equation (20), we see that the classi…cation in our sample is obtained via y1 , y2 and y1 + y2 . Suppose we were to normalize the variances of y1 and y1 + y2 to 1, but this has an implication for the variance of y2 (
2 2
=
2
12 ).
Hence, we can apply the normalization to one of
11
=
2 1
and
22
and must let the other variance be free. In our analysis, we apply the normalization and let
22
=
2, 2
11
=1
be free. In the …rst step, we rely on maximum likelihood estimation and obtain
consistent estimates of
1,
2,
12
and
2
subject to
1
= 1. The relevant likelihood function
is given by
L=
Y
Y
P0
LF P =0
P1
LF P =1
Y
LF P =2
P2
Y
P3 ;
(21)
LF P =3
where Pj = P r(LF P = j) for j = 0; 1; 2; 3 and their explicit de…nitions are provided in Appendix 6:5. The regression equation for this problem is a Mincer-type wage equation obtained by setting y3 = log(wage) in Equation (3) where X3 includes human capital variables and labor market characteristics:
log(wage) =
0
3 X3
+
The aim is to estimate
3 u3 :
3
(22)
for wage workers, who have LF P = 2. As in Equation (14),
after forming the estimates of selectivity correction terms via …rst step estimates, we run a linear regression equation with 9 selectivity correction terms in the second step.
3.2
Results
Table 1 provides the summary statistics for the variables used in the …rst stage of the analysis. We …rst run a normalized (
11
= 1) maximum likelihood estimation as Tunali and Baslevent
(2006) did. This step is estimated using derivative free Maximum Likelihood option (lf) of Stata by restricting the correlations and variances according to their theoretical ranges using non-linear transformations. Besides, the probability weights coming from the data are used
12
Table 1. Sample Means (Standard Deviations) of Variables by Labor Force Participation Status Variable Full NonSelfWage Unsample participant employed worker employed Own wage 1.39 (3.17) Age 33.29 33.41 34.28 32.82 30.65 Age squared/100 11.75 11.86 12.37 11.18 9.84 Experience 20.29 20.82 21.31 15.76 16.93 Experience squared/100 4.87 5.08 5.26 3.11 3.39 Illiterate (reference) 0.25 0.27 0.21 0.055 0.13 Literate without a diploma 0.090 0.095 0.12 0.034 0.071 Elementary school 0.49 0.52 0.47 0.22 0.48 Middle school 0.058 0.055 0.10 0.060 0.094 High school 0.087 0.059 0.072 0.35 0.20 University 0.030 0.0069 0.022 0.28 0.023 Husband self-employed 0.33 0.35 0.39 0.16 0.18 Children aged 0-2 0.26 0.26 0.18 0.22 0.22 Children aged 3-5 0.34 0.35 0.28 0.27 0.34 Female children aged 6-14 0. 41 0.42 0.49 0.32 0.36 Male children aged 6-14 0.44 0.45 0.49 0.34 0.43 Extended household 0.12 0.13 0.14 0.11 0.079 Ext. hh × Children aged 0-2 0.029 0.029 0.028 0.031 0.019 Ext. hh × Children aged 3-5 0.036 0.036 0.033 0.039 0.019 Ext. hh × Female ch. aged 6-14 0.045 0.046 0.044 0.032 0.030 Ext. hh × Male ch. aged 6-14 0.047 0.049 0.050 0.039 0.030 Share of textiles 0.31 0.31 0.39 0.30 0.31 Share of agriculture 0.17 0.18 0.20 0.15 0.19 Share of finance 0.060 0.059 0.053 0.064 0.056 Migration rate 0.024 0.023 0.021 0.035 0.018 Marmara (reference) 0.35 0.35 0.32 0.37 0.28 Aegean 0.11 0.11 0.10 0.14 0.21 South 0.12 0.12 0.26 0.099 0.13 Central 0.19 0.18 0.10 0.23 0.17 North West 0.040 0.037 0.028 0.055 0.075 East 0.078 0.081 0.088 0.044 0.086 South East 0.081 0.087 0.088 0.039 0.015 North East 0.033 0.034 0.0055 0.027 0.041 Population 200,000 or more 0.67 0.66 0.69 0.74 0.60 Population 1 million or more 0.51 0.50 0.43 0.61 0.46 Share of Welfare Party 0.073 0.074 0.059 0.062 0.058 Share of Left of Center 0.35 0.35 0.33 0.36 0.36 Sample size 8962 7770 181 745 266
13
since sampling frame relies on strati…cation. Robust standard errors are reported throughout the section. Table 2 provides the results of the …rst step. Based on the likelihood ratio statistics, 4-way model …ts the data well (p value < 0:0001). Since ^12 is very close to
1, computational fragility is an issue. A cure for computational
fragility in multinomial probit models may be adding alternative speci…c variables for each election equation as shown by Keane (1993). Since alternative speci…c variables are not available, this venue is not a viable option. To explore the issue, Tunali and Baslevent (2006) estimate restricted versions of the model in which they set
22
=B
11
for B = 0:35, 0:5; 1,
and 2 to see the sensitivity of the results. They state that all these restricted versions lead to very similar inferences. However, they only report the estimates of the restricted version with B = 0:35 since ^ 22 is equal to 0:358 in the normalized version.4 This version improves the precision of several slope estimates.5 Detailed discussion of the substantive results can be found in Tunali and Baslevent (2006). We review some of these results in Section 4:3 in order to compare them with the …rst stage results of Multinomial Logit based selection correction models. The negative sign of ^12 can be interpreted as follows: if the unobserved characteristics of a woman make her more likely to choose self-employment rather than home-work, these unobserved characteristics also make her less likely to choose wage work over self-employment. (Tunali and Baslevent, 2006) We provide three least squares estimates of the wage equation in Table 3: without selectivity correction terms, with conventional correction (corresponds to trivariate normality speci…cation), and with robust correction (corresponds to the speci…cation of this paper). We exclude 10 observations in these regressions due to missing wage information. Note that experience corresponds to potential labor market experience, which equals age
7
years
of schooling for individuals having elementary school education or above and age
12 for
individuals who have not …nished elementary school. The regression equation is estimated after excluding 15 variables that appear in the selection equations and putting experience and experience squared/100 instead of age and 4
We do not provide the …rst step estimates of the restricted version with B = 0:35 here since the coe¢ cient estimates di¤er at most by 4% from the normalized version. Furthermore, we only need point estimates of the normalized version in order to correct selectivity bias in the regression equation. 5 Note that age, age squared/100, high school, university, Aegean, North West, North East and constant in the …rst selection equation, and literate without a diploma, middle school, share of …nance and population 1 million or more in the second selection equation become signi…cant in the restricted version with B = 0:35 in addition to the signi…cant variables in Table 2.
14
Table 2. Maximum Likelihood Estimates of Reduced Form Participation Equations (Normalized Version) Variable First Selection Second Selection Coefficient Standard Coefficient Standard Error Error Age 0.077 0.066 -0.010 0.029 Age squared/100 -0.124 0.096 0.024 0.041 Literate without a diploma 0.256*** 0.099 -0.185 0.122 Elementary school 0.124 0.085 -0.048 0.075 Middle school 0.471*** 0.138 -0.194 0.187 High school 0.534 0.580 0.090 0.116 University 0.931 1.050 0.149 0.154 Husband self-employed -0.081 0.242 -0.152* 0.092 Children aged 0-2 -0.178** 0.075 0.077 0.089 Children aged 3-5 -0.078 0.077 0.012 0.053 Female children aged 6-14 0.055 0.103 -0.075 0.094 Male children aged 6-14 0.031 0.089 -0.072 0.065 Extended household 0.189 0.133 -0.167 0.130 Ext. hh. × Children aged 0-2 0.096 0.181 -0.047 0.182 Ext. hh. × Children aged 3-5 -0.058 0.178 0.091 0.149 Ext. hh. × Female ch. aged 6-14 -0.252 0.165 0.221 0.161 Ext. hh. × Male ch. aged 6-14 -0.083 0.161 0.103 0.137 Share of textiles 0.058 0.242 0.172 0.153 Share of agriculture -0.759* 0.432 0.952*** 0.356 Share of finance -3.686* 2.121 2.918 2.637 Migration rate -3.642** 1.613 4.214*** 1.515 Aegean -0.229 0.228 0.326* 0.180 South -0.038 0.101 -0.015 0.093 Central -0.508* 0.263 0.547* 0.292 North West -0.392 0.306 0.526** 0.262 East -0.381 0.233 0.424* 0.233 South East -0.291* 0.151 0.149 0.201 North East -0.850 0.546 0.970* 0.528 Share of Welfare Party -5.969*** 2.063 5.628** 2.857 Share of Left of Center -0.697 0.695 1.172** 0.508 Population 200,000 or more 0.114 0.088 -0.029 0.123 Population 1 million or more -0.160* 0.092 0.139 0.099 Constant -1.688 1.340 -0.242 0.643 σ11 1 [normalized] σ22 0.358 (0.514) ρ12 -0.971*** (0.046) No. of observations 8962 Log-likelihood without covariates -3247.25 Log-likelihood with covariates -2443.05 Robust standard errors are reported in parentheses. * significant at 10%; ** significant at 5%; *** significant at 1%.
15
Table 3. Least Squares Estimates of the Wage Equation Variable With robust With conventional correction correction Coef. Std. Coef. Std. Error Error Experience 0.032*** 0.012 0.034*** 0.013 Experience sq./100 -0.064** 0.033 -0.072** 0.037 Literate w/o a diploma -0.119 0.174 -0.080 0.181 Elementary school -0.055 0.116 -0.028 0.120 Middle school 0.215 0.149 0.256* 0.144 High school 0.594*** 0.191 0.437** 0.184 University 1.254*** 0.272 1.083*** 0.250 Share of Textiles -0.308** 0.150 -0.282** 0.143 Share of Agriculture 0.098 0.237 0.085 0.238 Share of Finance 3.437*** 1.319 2.557** 1.225 Aegean -0.135* 0.072 -0.169** 0.073 South 0.162** 0.081 0.172** 0.080 Central 0.035 0.072 0.007 0.073 North West -0.151 0.100 -0.170* 0.099 East 0.109 0.114 0.096 0.115 South East 0.130 0.118 0.086 0.114 North East -0.137 0.117 -0.210* 0.113 -0.044 0.687 0.171 0.146 ˆ λ1 2.441 3.132 -0.458** 0.182 λˆ2
Without selectivity correction terms Coef. Std. Error 0.031** 0.013 -0.064* 0.037 -0.112 0.186 -0.011 0.121 0.231* 0.125 0.465*** 0.119 1.038*** 0.128 -0.407*** 0.122 0.150 0.238 4.241*** 1.158 -0.131* 0.069 0.159* 0.082 0.029 0.073 -0.166* 0.096 0.142 0.113 0.191* 0.100 -0.129 0.115 ― ―
λˆ3 λˆ
0.135
0.412
―
―
2.033
2.784
―
―
λˆ5 λˆ
2.076
2.897
―
―
-0.060
0.215
―
―
0.625
1.006
―
―
0.595
1.291
―
―
4
6
λˆ7 λˆ 8
-1.149 2.080 ― λˆ9 Constant -1.087** 0.519 -0.927*** 0.345 No. of obs. 735 735 R2 0.387 0.378 Robust standard errors are in parentheses. * significant at 10%; ** significant at 5%; *** significant at 1%.
― -0.923*** 0.194 735 0.367
age squared/100. In order to be con…dent that exclusion of these variables do not cast any doubt in our …ndings, we conduct Sargan’s (1984) test for overidentifying restrictions on our robust correction.6 The regression of the residuals from the wage equation on the 6
We do not include age and age squared/100 in this test since experience is a highly collinear function of age.
16
15 excluded variables generates an R2 = 0:0035. Let n denote the sample size. Then, nR2 = 2:5725
2 (15),
which is well within the acceptance region. Hence, the exclusions are
warranted. Using robust correction results, a test for the joint signi…cance of all the selectivity correction terms yields strong evidence in favor of non-random selection (p Moreover, the test
3
=
4
= ::: =
against conventional correction (p
9
value = 0:0006).
= 0 provides evidence in favor of our robust correction
value = 0:0424).
Only a fraction of women in urban areas participate and participation probability increases dramatically with education. In light of the …rst stage results, one would think that participants do not constitute a random subsample and this has implications for the estimate of the wage equation. Indeed, comparison of the substantive results from the three models underscores the importance of proper adjustment for selection. According to our speci…cation, while the coe¢ cient of middle school becomes insigni…cant (implying people having middle school education or less earn the same wage), the coe¢ cients of high school and university dramatically increase in magnitude. Evidently, the minority which opts for wage work is highly selective of unobservables and failure to account for this yields underestimation of the returns to higher levels of education. With illiterates as the reference category,high school results in 81 percent higher returns on average according to our robust correction whereas it is 55 percent on average under conventional correction. Compared to illiterates, while university results in 250 percent higher returns on average under our robust correction, this statistic is 195 percent on average in the trivariate normality speci…cation. According to our robust correction, average incremental returns are 21.9 percent per additional year of high school and 28 percent per additional year of university education. The other signi…cant variables under our robust correction have coe¢ cient estimates between that of random selection and conventional correction. Our robust correction implies that wage is maximized at 25 years of potential labor market experience. This statistics is given by 24 in the conventional correction. Furthermore, regions are jointly statistically signi…cant with a p
value of 0:0158. Besides, a one percent increase
in share of textiles causes a 0:31 percent decrease in wages while a one percent increase in share of …nance increases wages by 3:4 percent on average. These e¤ects are underestimated in the trivariate normality speci…cation. 17
3.3
Multicollinearity
One issue that arises in the context of our robust correction approach is the large standard errors of individual selectivity correction terms (see Table 3). Letting Rj2 for j = 1; 2; ::9 denote the coe¢ cient of determination obtained by regressing ^ j on the remaining selectivity correction terms, we obtain values in the range 0:995 < Rj2 < 1. This implies extremely high multicollinearity. At a …rst glance, one may think that this is caused by the fragility of our selection model since ^12 is very close to model by restricting
22
1. To get rid of this fragility, we reestimated the
= 1. This version yields ^12 =
0:746 and 0:964 < Rj2 < 0:9997 for
j = 1; 2; ::9. To assure that multicollinearity issue is not speci…c to this problem, we also generated ^ ’s for Tunali’s (1986) migration-remigration problem, which is another example of a double selection problem. The generic statement of this problem was given in Section 2:1. The empirical context of the problem is as follows. There are three earnings pro…les: ys under the stay option, yo under the one-time move option, yf under the frequent move option. Let 1
= yo
ys denote anticipated earnings gain from the one-time move to the best potential
destination, and
1
be the cost of that move. Let
2
= yf
yo denote anticipated incremental
earnings gain from the frequent move involving the best potential combination of intermediate and …nal destinations, and
2
be the cost of that move. Tunali (1986) assumes that individuals
are rational, in that sense anticipations are unbiased. Then, the two selection equations in Equations (1) and (2) are obtained as y1 =
1
1
and y2 =
2
2.
Upon introducing the
variable r = supf0; y1 ; y1 + y2 g, the decision rule becomes Stay if r = 0;
(23)
Move once if r = y1 ;
(24)
Move more than once if r = y1 + y2 :
(25)
However, Tunali’s (1986) modi…ed decision rule treats r = y1 + y2 > 0 > y1 (move more than once will be optimal in this case) as a stay decision on the grounds that the …rst move will not be feasible when 0 > y1 . Using the two dichotomous variables given in Equation (5), we can de…ne the decision
18
rule as:
Stay if D1 = 0;
(26)
Move once if D1 = 1 and D2 = 0;
(27)
Move more than once if D1 = 1 and D2 = 1:
(28)
Thus, D2 is observed () D1 = 1. The sample selection regime for this problem is a 3-way classi…cation with
= fD1 ; D2 g, where cells in the …rst row of Figure 1 are combined. A more
detailed description can be found in Tunali (1986). For this problem, we …nd ^12 = 0:389. Nevertheless, high multicollinearity is present: 0:910 < Rj2 < 0:9999 for j = 1; 2; ::9. Note that STATA had no trouble in computing standard errors and test statistics despite extremely high multicollinearity. We are able to explain this issue by exploiting linear alge~ 3 as (X31 j X32 ), where X31 includes all explanatory bra. Partition the full covariate matrix X variables of the model without selectivity correction terms (X31 = X3 in terms of the notation given in the generic statement of the double selection model), and X32 includes all the selectivity correction terms. We can write:
0
~3X ~3) (X
1
0
0
0
1
B X31 X31 X31 X32 C =@ A 0 0 X32 X31 X32 X32
1
:
(29)
0
~ X ~ We use the lower right block of the matrix (X 3 3)
1
in calculating standard errors of
^ ’s. This block may be computed as [(X 0 X ) X 0 X (X 0 X ) 32 32 32 31 31 31
1X 0 X ] 1 31 32
via submatrix 0
inverse theorem (see Goldberger, 1991). Extremely large Rj2 ’s imply that the matrix (X32 X32 ) is nearly singular. However, this matrix is not inverted. Instead, the di¤erence between 0
0
0
(X32 X32 ) and X32 X31 (X31 X31 )
1X 0 X 31 32
is inverted. Therefore, it is possible to compute
standard errors for ^ ’s.
19
4
Multinomial Logit Based Selection Correction Models
4.1
Selection Step
Multinomial Logit based selection correction models are used as alternatives to Multivariate Normal based models in the literature. These models de…ne a selection equation for each possible state. In the context of Tunali and Baslevent (2006), let7
yj =
0
jz
+
for j = 0; 1; 2; 3;
j
(30)
where yj ’s denote the utility obtained from the choice of labor force participation status j and are unobserved, z is the vector of explanatory variables8 , of unknown coe¢ cients and
j ’s
j ’s
are the corresponding vectors
are the random disturbances. Let r = max (y0 ; y1 ; y2 ; y3 ).
The four way classi…cation can obtained via the following categorical variable:
LF P 0 =
8 > > 0 = home-work, if r = y0 ; > > > > > < 1 = self-employment, if r = y ; 1 > > 2 = wage labor, if r = y2 ; > > > > > : 3 = unemployed, if r = y : 3
Hence, the sample selection regime is given by
9 > > > > > > > = > > > > > > > ;
(31)
= fLF P 0 g and Equation (30) corresponds
to the selection equations. We assume that
j ’s
satisfy Independence of Irrelevant Alternatives (IIA) hypothesis, so
that they are independent and identically Gumbel distributed. McFadden (1973) proves that this speci…cation corresponds to the Multinomial Logit model (Bourguignon et al., 2007). The choice probabilities are
j
0
= Pr(LF P = j j z) =
0
exp( j z) 3 X
; j = 0; 1; 2; 3:
(32)
0
exp( z)
=0
7
Individual subscripts are dropped from Equation (30) in order to avoid notational clutter. We deliberately use the same set of explanatory variables speci…ed in the Equations (18) and (19) for logit based selection equations. The aim is to compare the empirical results of these models and our selectivity correction approach in Section 4:3. 8
20
Since
3 X
k
= 1, we choose non-participants as the reference group and set
0
= 0. The
k=0
estimation of these models relies on two step estimation as well. In the …rst step, we obtain consistent estimates of
L=
Y
LF P =0
0
j ’s
Y
LF P =1
1
by maximizing the likelihood function in Equation (33): Y
LF P =2
2
Y
3:
(33)
LF P =3
Note that our selectivity correction approach allows correlation between the random disturbances of the two selection equations, but computational di¢ culties limit its generalization9 (unless simulation based estimation is used at the …rst stage). On the other hand, Multinomial Logit based selection correction models do not allow correlation between the random disturbances of the selection equations, but can handle a large number of selection equations. Because of the IIA assumption, these models treat unemployment as a distinct state. However, unemployment probability is de…ned as a union of self-employment and wage labor options in our problem [see Equation (20)]. Therefore, another potential limitation of these models is the failure to capture the choice structure appropriately. Despite this failure, we still discuss second (regression) step estimation for some well-known Multinomial Logit based selection correction models (Lee (1983), Dubin and McFadden (1984) and its two variants proposed by Bourguignon et al. (2007), and Dahl (2002)) in the following subsection.
4.2
Regression Step
Regression equation given in Equation (22) remains intact. Our aim is to estimate the parameter vector of the regression equation under the speci…ed Multinomial Logit based 0
selection correction models for the individuals with LF P = 2. Returning to the generic statement of the selectivity problem in Equation (4), once again we are confronted with E(u3 j X; ) 6= 0. Multinomial Logit based models di¤er in the assumptions needed for obtaining a parametric form for E(u3 j X; ). The regression step for each model is discussed in detail below. Denote the correlations between 9
j
and u3 by
j
for j = 0; 1; 2; 3 and let
This is the case for any multinomial probit model. For example, the aML software package (Lillard and Panis, 2003) may allow at most 4 alternatives.
21
l
=
l
2
for l = 0; 1; 3. In addition, considering Equation (30), we de…ne
2
= max (yl
y2 ) for l 2 f0; 1; 3g:
(34)
Then, LF P 0 = 2 () 4.2.1
2
< 0:
(35)
Lee’s (1983) Model
Lee (1983) generalizes the two-step selection correction methodology provided by Heckman (1976, 1979) and Lee (1976, 1979) by extending it to polychotomous choice. Thus, this method can be applied to Multinomial Logit based selectivity problems. Assume that the cumulative distribution function for
2
is given by F 2 (
2
0
j
z) where
is a matrix including the vectors of unknown coe¢ cients in each selection equation, i.e. =(
0
j
J 2(
where
1
j
2
j
2
0
j
3 ).
Using the translation method, we can de…ne 1
z) =
(F (
2
0
j
z));
(36)
is the standard normal distribution function. Lee (1983) makes the following as-
sumption:
The joint distribution of (u3 ; J 2 (
2
j
0
z)) does not depend on the index
0
z:
(37)
This assumption is contested in the literature since it has strong implications. First, the transformation J 2 (
2
j
0
z) implies that the marginal distribution is independent of
0
z, but
it does not tell us anything about the joint distribution. Besides, Schmertmann (1994) shows that under Lee’s (1983) index independence assumption, all moreover, if we assume
l
2 ’s
are identically distributed,
l ’s l ’s
must have the same sign;
will be identical for l = 0; 1;
3 (Bourguignon et al., 2007). Lee (1983) also assumes that
E(u3 j
2;
0
z) = c2 J 2 (
2
j
0
z);
(38)
22
where c2 is the correlation between u3 and J 2 (
2
j
0
z). In the literature, some ways of avoid-
ing the additional assumption in Equation (38) are available. If u3 is normally distributed, this equation follows directly. When u3 is not normally distributed, this equation may be viewed as a linear approximation to the conditional expectation function. Another way to justify Equation (38) is to translate the distribution of u3 to standard normal. Following Equation (36), we denote this translation as Ju3 (u3 ). Then, assuming that the joint distribution of (Ju3 (u3 ), J 2 ( and J 2 (
2
0
j
2
0
j
z)) does not depend on
0
z (i.e. the correlation between Ju3 (u3 )
z) is zero), we obtain Equation (38).
To recapitulate, under Lee (1983)’s assumptions, we get 0
0
E(u3 j LF P = 2; z) = where
2
=
c2 and
j 2
=
j 2 2;
(J 2 (0 j F 2 (0 j
(39) 0 0
z))
z)
. Note that
j 2
is the familiar term in conventional
selection correction. As before, after obtaining the consistent estimate of so for
j 2,
from the …rst step,
the second step estimates of Lee’s (1983) model (LEE) are acquired by estimating
a linear regression:
log(wage) =
4.2.2
0
3 X3
j 3 2 2:
+
(40)
Dubin and McFadden’s (1984) Model and Its Two Variants
The conditional expectation function of the regression equation error is assumed to be a linear function of
E(u3 j
j
E(nj )’s, yielding Equation (41):10
1; 2; 3; 4)
=
p
3
6X
j( j
E(nj )):
(41)
j=0
In this expression, consistent with the set-up of the selection step,
j ’s
are assumed to have
independent extreme value distributions. It can be shown that
E(
2
0
0
E(n2 ) j LF P = 2; z) =
log(
2 );
10
(42)
Linear conditional expectation function assumption is a convenient starting point going back to Olsen (1980).
23
E(
0
0
E(nl ) j LF P = 2; z) =
l
log( l ) for l = 0; 1; 3: 1 l
l
(43)
where log denotes the natural logarithm. To assure that E(u3 j X3 ) = 0, they also assume 3 X
j
= 0:
(44)
j=0
Under the assumptions given by Equations (41) and (44), the regression equation becomes
log(wage) =
0
3 X3
+
3
p
6 X
l sl ;
(45)
l=0;1;3
where sl =
l
log( l ) 1 l
+ log(
2)
for l = 0, 1, 3. After forming the three selectivity correction
terms via selection step estimates, we run a least squares on Equation (45) to obtain the second step estimates of DMF model. Even though the estimation will not provide us a consistent estimate for
2,
we may calculate it from Equation (44) as ^2 =
(^0 + ^1 + ^3 ).
Since only restriction on the correlation structure is via Equation (44), DMF can be regarded as superior to LEE. Bourguignon et al. (2007) show via Monte Carlo experiments that the restriction in Equation (44) results in biased results when incorrectly imposed; in contrast, e¢ ciency loss is small by not imposing it when the restriction holds. Hence, they drop the restriction in Equation (44) and refer to it as the …rst variant of Dubin and McFadden’s (1984) model (DMF 1). For this model, the regression equation is given by
log(wage) =
0
3 X3
+
3
p
2
64
X
l sl
+
2 s2
l=0;1;3
where sl = (
0
+
1
+
l
log( l ) 1 l
3)
for l = 0, 1, 3 and s2 =
3
5; log(
(46)
2 ).
Observe that if we set ^2 =
in this equation, we get Equation (45). Hence, DMF 1 is more general
compared to DMF. After forming the four selectivity correction terms via selection step estimates, we run a least squares on Equation (46) to obtain the second step estimates of the DMF 1 model. Since the assumption in Equation (41) restricts the distribution of u3 , Bourguignon et al. (2007) proposes a second variant of Dubin and McFadden’s (1984) model (DMF 2). Assuming
24
that the cumulative distribution function for
j
is given by G( j ) and using the trick given
in Lee (1983), they make the transformation 1
= J( j ) =
j
(G( j )):
(47)
Then, they assume that conditional expectation function of the regression equation error is a linear function
E(u3 j where
j
1;
j ’s:
2;
3;
4 X
4) =
j j;
(48)
j=1
is the correlation between u3 and
by setting
j
=
j.
Observe that Equation (41) can be obtained
E( j ) in this equation; therefore, DMF 2 is a generalized version of DMF
j
1. Note that the assumption in Equation (48) holds if u3 and distributed. Assuming that (:) denotes the density function of
M(
j)
=
Z
J(v
log(
j ))
(v)dv for j = 0; 1; 2; 3:
j ’s j,
are multivariate normally let
(49)
Then, Bourguignon et al. (2007) derive the following: 0
0
0
0
E(
2
j LF P = 2; z) = M (
2 ):
E(
l
j LF P = 2; z) = M ( l )
(50) l l
1
for l = 0; 1; 3:
(51)
As a result, wage equation can be written as
0
log(wage) =
where sl = M ( l )
3 X3
l l
1
+
2
X
34 l sl l=0;1;3
+
2 s2
3
5:
for l = 0, 1, 3 and s2 = M (
(52)
2 ).
Even though M ’s do not have closed
form solutions, they are computed using Gauss-Laguerre quadrature. In this calculation, methods given by Davis and Polonsky (1964) are used. Second step estimates of DMF 2 are obtained by running a linear regression on Equation (52) after obtaining consistent …rst step estimates and then forming selectivity correction terms.
25
4.2.3
Dahl’s (2002) Model
Dahl (2002) proposes a semi-parametric model. Estimation of the model becomes more di¢ cult as the number of categories increases. To simplify matters, he makes an index su¢ ciency 0
assumption by restricting the set of known probabilities to a subset Q of the full choice set Q=f
0;
Q since
1; 3
'(u3 ;
2g
=1
2
j
in order to deal with large number of categories. We do not include (
0
0+
1+
z) = '(u3 ;
2)
2
3
in
This assumption can be written as follows:
j Q) = '(u3 ;
2
0
jQ)
where ' is the conditional density function of u3 and
(53)
2.
In other words, Dahl (2002) makes the following restriction in order to reduce the dimensionality of the problem: 0
0
0
0
0
E(u3 j LF P = 2; z) = E(u3 j LF P = 2; Q ) = (Q ):
(54)
is a unique function, which Dahl (2002) obtains by assuming that the probabilities are invertible. Then, the regression function becomes
log(wage) =
0
3 X3
+ (
2 ):
(55)
This approach has two shortcomings. First, as stated by Dahl (2002), we generally can not test the index su¢ ciency assumption. Second, reducing dimensionality is equivalent to making a correlation assumption, i.e.
j,
the correlation between
j
and u3 , is a function of
0
the elements of Q for j = 0; 1; 2; 3 (Bourguignon et al., 2007). However, compared to other Multinomial Logit based selection correction models, Dahl (2002) does not assume a linear conditional expectation function. 0
A special case of this model proposed by Dahl (2002) is to assume that the set Q is a singleton containing only the chosen alternative, in our case 2. Then, the regression equation becomes
log(wage) =
0
3X
+ (
2)
(56)
26
We assume that
is a fourth order polynomial as in Bourguignon et al. (2007), and call
this model DAHL 1. Another approach, following Bourguignon et al. (2007), is to express as a function of all probabilities up to fourth order plus the interactions between them. This latter approach is identi…ed as DAHL 2 in our analysis. In both approaches,
’s are
approximated by a series expansion after obtaining …rst step estimates. Unlike the other logit based methods, DAHL’s approach does not yield consistent estimates of
4.3
33
and ’s.
Application Revisited
We revisit the application given in Section 3 and estimate it using the Multinomial Logit based selection correction models. Conveniently, the results for both of the steps can be obtained by using the Stata module selmlog developed by Bourguignon et al. (2007).11 The results obtained from the selection step are presented in Table 4. Observe that while Tunali and Baslevent (2006) and we set P3 = P1 + P2 , Multinomial Logit based selection correction models have
3
=1
(
0+
1+
2 ).
The restriction for unemployment probability in Tunali
and Baslevent (2006) and our model is not satis…ed in Table 4 since coe¢ cients in columns 1 and 2 do not sum to those in column 3. Since our model captures the logic of participation better, it may be safe to assume that it gives more accurate results. Then, do …rst step results of Multinomial Logit based models yield us astray? Even though most qualitative results are similar except those regarding unemployment (for obvious reasons) with those of the multinomial probit based models12 , there are some di¤erences. These di¤erences can be seen as costs of simplicity. As seen from Table 4, with illiterates as the reference category, being a middle school or university graduate increase the log odds for self-employment relative to non-participation. High school can also be regarded as a signi…cant variable for the selection equation regarding self-employment since it has a p
value of 0:101. As people get more educated, they become
more able, so they may set up their own business or work for their own account. In that sense, these …ndings are consistent with expectations. Besides, education levels starting from elementary school are signi…cant for the selection equation regarding wage workers, and the coe¢ cient estimates increase with education. This implies the probability of being a wage 11
The STATA ado …le is available at http://www.pse.ens.fr/gurgand/selmlog13.html Throughout the section, in order to make a comparison, we use the results of the restricted version with 22 = 0:35 since it gives more precise estimates. 12
27
Table 4. Selection Step for Multinomial Logit Based Selection Correction Models Variable Self-employed Wage Worker Unemployed Coef. Std. Coef. Std. Coef. Std. Error Error Error Age 0.085 0.090 0.575*** 0.067 0.128 0.087 Age squared/100 -0.113 0.126 -0.839*** 0.097 -0.264** 0.129 Literate without a diploma 0.402 0.282 0.371 0.262 0.310 0.295 Elementary school 0.194 0.221 0.500*** 0.185 0.287 0.207 Middle school 1.083*** 0.311 1.493*** 0.233 0.863*** 0.282 High school 0.572 0.348 3.226*** 0.189 1.642*** 0.244 University 1.630*** 0.559 5.089*** 0.235 1.735*** 0.476 Husband self-employed 0.082 0.160 -1.318*** 0.125 -0.863*** 0.166 Children aged 0-2 -0.362 0.232 -0.390*** 0.134 -0.505*** 0.174 Children aged 3-5 -0.163 0.193 -0.429*** 0.118 -0.176 0.147 Female children aged 6-14 0.381** 0.178 -0.260** 0.114 -0.086 0.151 Male children aged 6-14 0.160 0.180 -0.405*** 0.113 0.020 0.152 Extended household 0.634* 0.343 0.094 0.256 -0.156 0.400 Ext. hh. × Children aged 0-2 0.220 0.562 0.260 0.338 0.099 0.566 Ext. hh. × Children aged 3-5 -0.067 0.523 0.446 0.308 -0.450 0.543 Ext. hh. × Female ch. aged 6-14 -0.702 0.487 -0.007 0.312 0.183 0.505 Ext. hh. × Male ch. aged 6-14 -0.301 0.476 0.293 0.306 -0.090 0.511 Share of textiles -0.163 0.539 1.212*** 0.433 0.823* 0.498 Share of agriculture -1.115 1.248 1.939** 0.841 -1.397 1.039 Share of finance -6.208 4.435 -2.910 3.615 -4.184 4.793 Migration rate -10.18*** 3.356 7.469** 3.070 -3.402 3.406 Aegean -0.729** 0.310 0.551** 0.227 0.507* 0.260 South -0.105 0.291 -0.283 0.240 -0.033 0.315 Central -1.739*** 0.361 0.670** 0.292 0.090 0.343 North West -1.559*** 0.552 1.012** 0.398 0.350 0.412 East -1.386** 0.543 0.495 0.415 0.118 0.481 South East -0.451 0.468 -0.223 0.361 -1.750*** 0.609 North East -2.790*** 1.062 1.037** 0.463 0.347 0.485 Share of Welfare Party -16.509*** 2.978 2.749 2.309 -4.006 2.941 Share of Left of Center -4.545*** 1.240 2.400* 1.346 3.757*** 1.465 Population 200,000 or more 0.766*** 0.263 0.303* 0.174 0.088 0.230 Population 1 million or more -0.372 0.259 0.126 0.185 -0.523** 0.249 Constant -1.993 1.693 -14.708*** 1.343 -5.258*** 1.605 No. of observations 8962 Log-likelihood w/o covariates -4603.94 Log-likelihood with covariates -3528.64 Robust standard errors are in parentheses. * significant at 10%; ** significant at 5%; *** significant at 1%. The reference group is non-participation.
worker compared to non participation increases with education level where illiterates constitute the reference group. These results are consistent with those of the multinomial probit based models. Self employment is more likely for women who live in extended households or have female children aged 6-14 compared to non-participation. However, probability of wage labor over
28
non-participation is not a¤ected by extended households and decreases with the presence of children. Typically, women are responsible for care of dependents (child and/or elderly people) due to their comparative advantage and/or the social role assigned to them in the Turkish culture. As the number of dependents in the household increases, not only time required for care responsibilities but also living expenses increase. That probably induces women to work, but they do not have enough time to work in a regular job due to the time constraint arising from increased care responsibilities. As a result, they may end up being a self-employed rather than a wage worker. The results of the multinomial probit based models imply that extended household and presence of children except those aged 0-2 do not a¤ect participation probabilities. Children aged 0-2 decreases the probability of being a self-employed over non-participation. Hence, the results di¤er. While probability of being a wage worker compared to non-participation has a concave pro…le with respect to age, log odds for self-employment relative to non-participation is not a¤ected by age. The part regarding wage workers are consistent with the theory of labor supply over the life cycle. However, self-employment part does not comply with this theory, and it is contrary to the results of the multinomial probit based models. Although shares of textiles and agriculture increase the probability of being a wage worker, they do not a¤ect the probability of self-employment compared to non-participation. The …rst pattern is inconsistent for agriculture sector, and observe that share of textiles is an insigni…cant variable for both of the selection equations in the analysis of the multinomial probit based models. Moreover, both the probabilities of being a wage worker and selfemployed relative to non-participation are not a¤ected by share of …nance, which again disagrees with the results in Table 2. Notice that the interpretation of migration rate remains the same. If the husband is self employed, log odds of being a wage worker relative to non-participation decreases. Furthermore, nearly all of the region dummies have interpretations consistent with the results of the multinomial probit based models. Likewise, the interpretations regarding share of votes received by welfare and left of center parties agree with the multinomial probit based models except that the probability of self-employment compared to non-participation decreases with an increase in share of votes received by left of center parties. The variable population 200; 000 or more is signi…cant for self-employment and wage worker selection 29
equations, and this does not comply with the results of of the multinomial probit based models. Unlike our model, logit based models use an independent equation for the unemployed. The results for the unemployed paint a reasonable picture of the labor market. In the selection equation regarding unemployment, we see that as age increases, the probability of being unemployed relative to non-participation become less likely. This is reasonable since more and more people give up looking for a job as they get older. With illiterates as the reference group, the probability of being unemployed compared to non-participation increases with middle school education or more. This is as expected since price of leisure is higher for more educated people, and thus they look for a job. Moreover, having children aged 02 decreases unemployment probability compared to non-participation since women refrain from looking for a job in order to nurture young children. Besides, women with self employed husband have a lower unemployment probability compared to non-participation probability. The reason may be that if these women would like to work, they probably work with their husbands. Share of textiles increases the unemployment probability relative to non-participation. As new textile jobs become available, some low-educated and inexperienced non-participants may decide to look for a job. Furthermore, as votes received by left of center parties increase, unemployment probability increases compared to non-participation. This is in line with the Tunali and Baslevent’s (2006) assertion that people would have a more liberal view about the female labor in the provinces where the share of votes received by center of left parties is higher. In the second stage, selectivity correction terms are formed according to the assumptions of the speci…ed models and the regression estimates for wage workers are obtained using the selmlog module. Standard errors are estimated using the bootstrap method by drawing with replacement 100 and 400 samples where the sample size is determined by the size of the sample in use, which is 735. Since additional bootstrapping does not lead to big di¤erences in the standard errors of the variables, we only report the estimates of the version with 100 replications. Table 5 gives the results based on LEE where weighted least squares is used to account for heteroskedasticity. The weights are given in the appendix of Bourguignon et al. (2007). 30
The selmlog module provides consistent estimates of
33
and c2 . It does not restrict c2 to
the [ 1; 1] interval.
Table 5. LEE Estimates of the Wage Equation Variable Coefficient Std. Error Experience 0.033** 0.014 Experience sq./100 -0.068* 0.041 Literate w/o a diploma -0.110 0.190 Elementary school -0.003 0.124 Middle school 0.250* 0.143 High school 0.486*** 0.184 University 1.067*** 0.256 Share of Textiles -0.391*** 0.144 Share of Agriculture 0.156 0.254 Share of Finance 4.211*** 1.194 Aegean -0.132** 0.067 South 0.153 0.095 Central 0.029 0.077 North West -0.171 0.110 East 0.137 0.130 South East 0.184* 0.112 North East -0.129 0.125 j -0.009 0.114 λˆ2 Constant -0.967*** 0.321 σ33 0.279*** 0.062 c2 -0.016 0.178 No. of obs. 735 Standard errors are estimated using 100 bootstrap replications. * significant at 10%; ** significant at 5%; *** significant at 1%. The …nding from LEE supports random selection since selectivity correction term has ap
value = 0:939. This is inconsistent with the results obtained both from the trivariate
normality and our robust speci…cation. In fact, results based on LEE are almost exactly the same with the results from the model without selectivity correction reported in Table 3. Table 6 reports the second step estimates from DMF, DMF 1 and DMF 2. This step, for all these models, is estimated by employing weighed least squares to deal with heteroskedasticity. The relevant weights can be found in the appendix of Bourguignon et al. (2007). Note that, for j = 0; 1; 2; 3, sj ’s denote di¤erent variables for each model and their de…nitions were already given in Section 4:2:2. The selmlog module provides consistent estimates of j
and
j
33 ,
for j = 0; 1; 2; 3 by not restricting the correlation estimates to their theoretical
ranges. 31
Table 6. DMF, DMF1 and DMF2 Estimates of the Wage Equation Variable DMF 2 DMF 1 Coef. Std. Coef. Std. Error Error Experience 0.031** 0.012 0.028** 0.014 Experience sq./100 -0.064* 0.034 -0.057 0.036 Literate w/o a diploma -0.127 0.195 -0.138 0.190 Elementary school -0.030 0.137 -0.044 0.137 Middle school 0.191 0.176 0.167 0.187 High school 0.453** 0.197 0.409** 0.186 University 1.044*** 0.299 1.041*** 0.285 Share of Textiles -0.419** 0.166 -0.377* 0.197 Share of Agriculture 0.176 0.278 0.146 0.271 Share of Finance 4.593*** 1.728 4.400** 2.166 Aegean -0.115 0.085 -0.114 0.121 South 0.155* 0.105 0.155 0.134 Central 0.045 0.088 0.044 0.107 North West -0.150 0.125 -0.138 0.135 East 0.156 0.143 0.149 0.191 South East 0.199 0.140 0.178 0.192 North East -0.101 0.173 -0.077 0.176 -0.142 0.256 -0.361 0.410 sˆ0
DMF Coef. 0.029** -0.060* -0.128 -0.032 0.208 0.436*** 1.052*** -0.329** 0.105 3.920*** -0.113 0.159 0.035 -0.131 0.145 0.140 -0.090 -0.352
Std. Error 0.013 0.036 0.182 0.124 0.143 0.162 0.225 0.160 0.234 1.247 0.095 0.102 0.081 0.111 0.129 0.111 0.125 0.346
-0.667 -0.025 0.090
1.415 0.974 0.546
-0.290 -0.064 0.084
2.475 0.060 0.413
0.136 ― 0.273
0.544 ― 0.392
-1.002*** 0.309*** ―
0.325 0.132 ―
-1.052** 0.417 -0.718
0.439 0.420 0.529
-1.001*** 0.432 -0.686
0.352 0.347 0.476
ρ1
―
―
-0.576
2.588
0.266
0.784
ρ2
―
―
-0.127
0.096
―
―
ρ3
―
―
0.168
0.610
0.532
0.574
ρ 0*
-0.255
0.432
―
―
―
―
ρ1*
-1.200
1.932
―
―
―
―
-0.046 0.160 ― ― ρ 2* 0.163 0.831 ― ― ρ 3* No. of obs. 735 735 Standard errors are estimated using 100 bootstrap replications. * significant at 10%; ** significant at 5%; *** significant at 1%.
―
―
―
―
sˆ1 sˆ2 sˆ3 Constant σ33 ρ0
Since s^0 , s^1 and s^3 are jointly statistically insigni…cant for DMF (p s^0 , s^1 , s^2 and s^3 are jointly statistically insigni…cant for DMF 1 (p
735
value = 0:667), and value = 0:8021) and
DMF 2 (p value = 0:9564), these models provide evidence in favor of random selection. This
32
conclusion contradicts the results based on trivariate normality and our robust speci…cations. Like LEE, results obtained from these models are very similar to that of random selection, reported in Table 3. Table 7 gives the second step estimates obtained from DAHL 1 and DAHL 2. Note that we cannot test for non-random selection since the selmlog module neither provides robust standard errors for correction terms nor computes R2 ’s.
Table 7. DAHL 1 and DAHL 2 Estimates of the Wage Equation Variable DAHL 2 DAHL 1 Coef. Std. Coef. Std. Error Error Experience 0.027* 0.016 0.033** 0.014 Experience sq./100 -0.050 0.039 -0.067* 0.039 Literate w/o a diploma -0.204 0.187 -0.140 0.182 Elementary school -0.090 0.120 -0.041 0.120 Middle school 0.158 0.162 0.182 0.146 High school 0.592*** 0.201 0.533*** 0.170 University 1.066*** 0.291 1.147*** 0.240 Share of Textiles -0.274 0.195 -0.370*** 0.121 Share of Agriculture 0.114 0.276 0.140 0.233 Share of Finance 4.474*** 1.636 4.138*** 1.168 Aegean -0.122 0.109 -0.142* 0.077 South 0.146 0.112 0.130 0.086 Central 0.098 0.097 0.029 0.073 North West -0.103 0.125 -0.181** 0.091 East 0.206 0.152 0.113 0.113 South East 0.003 0.165 0.172 0.109 North East -0.074 0.161 -0.122 0.113 Constant -267.265 418.920 -1.030*** 0.223 No. of obs. 735 735 Standard errors are estimated using 100 bootstrap replications. significant at 10%; ** significant at 5%; *** significant at 1%. Comparing the results in Table 7 with those obtained from our robust correction in Table 3, observe that South in both models, and experience sq./100, share of textiles and Aegean in DAHL 2 become insigni…cant. Moreover, North West becomes signi…cant at 5 percent in DAHL 1. The point estimates of the signi…cant variables in both DAHL 1 and our speci…cation di¤er up to 20 percent whereas the discrepancy rises to 30 percent with DAHL 2. To recapitulate, Multinomial Logit based selection correction models fail to detect selectivity in wage equation estimates and yield magnitudes di¤ering much from ours. The reason 33
might be that these models ignore the logic of participation. Among these models, closest estimates to ours are given by DAHL 1. This is surprising since by comparing root mean squared errors via Monte Carlo experiments, Bourguignon et al. (2007) …nd that DAHL 1 performs better than other models in only 1 out of 100 cases. Furthermore, their results support using variants of DMF or DAHL 2, which perform badly in our case.
5
Conclusion
This paper discusses correction of selectivity bias in models of double selection. The approach proposed here relaxes the conventional trivariate normality assumption. To assess the merits of the new method, we compare our methodology with multinomial based selection correction models, which have been developed for handling two or more rounds of selection in the literature. The conventional selectivity correction mechanism used in double selection models relies on trivariate normality among the random disturbances of the regression and the two selection equations. However, when tests are done, this assumption is typically rejected. We relax this assumption by relying on an Edgeworth expansion, extending what Lee (1982) did for simple selection models with one selection equation. We assume bivariate normality among the random disturbances in the two selection equations, but do not impose any restriction on the form of the distribution of the random disturbance in the regression equation. Since it relaxes the skewness and/or kurtosis assumptions, our methodology is superior to trivariate normality based approaches. Besides, we provide simple tests for the presence of selectivity bias and the trivariate normality speci…cation under our distributional assumption. The drawback of our model is that extensions beyond 2 selection equations are cumbersome and computationally intensive. To handle multiple selection, Multinomial Logit based correction models are used frequently. These models de…ne a selection equation for each possible category and impose the IIA structure. Our model has the advantage of allowing correlation between the random disturbances of the two selection equations. In addition, problems in which the states are not pairwise disjoint are tractable via our speci…cation. Cross-equation restrictions are impossible for Multinomial Logit based models due to the maintained IIA property.
34
To illustrate the di¤erences among various selection correction models, an empirical example involving the wage determinants of regular and casual female workers in Turkey is presented. Only a fraction of women living in urban areas of Turkey participate and participation probability increases dramatically in response to an increase in education level. Thus, there is reason to believe that wage workers are not a random subsample. Indeed, both the conventional correction method and the robust version support this conjecture. Notably, the example provides evidence in favor of our robust speci…cation, and against the trivariate normality assumption. We conclude that it is important to allow kurtosis and skewness. As examples of Multinomial Logit models, we analyzed the methods provided by Lee (1983), Dubin and McFadden (1984) and its two variants proposed by Bourguignon et al. (2007), and Dahl (2002). These models fail to capture selectivity. Furthermore, the substantive results di¤er from those obtained under our robust speci…cation. Our model captures the logic of participation better, so we believe that our robust results are more reliable and conclude that the speci…ed Multinomial Logit based selection correction models are far from providing reasonable estimates in this empirical example.
35
6
Appendix
6.1
Derivation of Equations (11) and (12)
Let f (u1 ; u2 ; u ~3 ) be the joint density of u1 , u2 and u ~3 . The triplet (u1i ; u2i ; u ~3i ) is assumed to be independently and identically distributed across individulas with zero mean vector and covariance matrix 2 12 6 1 6 =6 6 12 1 4 13
13 23
1
23
3
7 7 7; 7 5
and is independent of Xki for k = 1; 2; 3. For brevity, we suppress the conditioning on covariate matrix X and the individual subscript i throughout the section. Our aim is to obtain an expression for E(~ u3 j u1 ; u2 ). We denote the standard trivariate normal density STVN ( and the standard bivariate normal density SBVN (
12 )
12 ; 13 ; 23 )
by g(u1 ; u2 ; u3 )
by g(u1 ; u2 ). Let k(~ u3 j u1 ; u2 ) be
the conditional density of u ~3 given u1 and u2 . We assume that k(~ u3 j u1 ; u2 )g(u1 ; u2 ) = f (u1 ; u2 ; u ~3 ). That is, we specify the joint distribution of (u1 ; u2 ) as standard bivariate normal, but allow the conditional distribution of u ~3 given u1 and u2 to be non-normal. Let q 1
A=
a= p
(
2 12
1 1
2 12
+
2 13
+
2 23
2
12 13 23 );
(A.1.1)
;
(A.1.2)
b1 =
13
12 23 ;
(A.1.3)
b2 =
23
12 13 ;
(A.1.4)
c=
a (b1 u1 + b2 u2 ) ; A
(A.1.5)
u1 (u1 ; u2 ) = u1 = a(u1
12 u2 );
(A.1.6)
u2 (u1 ; u2 ) = u2 = a(u2
12 u1 );
(A.1.7)
u3 =
u3 Aa
c:
(A.1.8)
36
We can write:
E
u ~3 j u1 ; u2 g(u1 ; u2 ) = Aa
Z
1 1
u ~3 k(~ u3 j u1 ; u2 )g(u1 ; u2 ) d~ u3 : Aa
(A.1.9)
Equivalently,
E(~ u3 j u1 ; u 2 )
g(u1 ; u2 ) = Aa
Z
1 1
u ~3 f (u1 ; u2 ; u ~3 ) d~ u3 : Aa
(A.1.10)
For the Type AAA surface de…ned in Section 2:3, we may expand f (u1 ; u2 ; u ~3 ) in terms of a series of derivatives of g(u1 ; u2 ; u3 ) via Equation (7). We truncate the expansion by keeping terms with r + s + p
4. This allows us to set Arsp = Krsp where K’s denote cumulants
(Lahiri and Song, 1999). Thus, the truncated version of Equation (A:1:10) is:
E(~ u3 j u1 ; u 2 )
+
4 X
r+s+p=3
Z
1
u3 g(u1 ; u2 ; u3 ) du3 1 Aa Z 1 ( 1)r+s+p u3 r s p Krsp Du1 Du2 Du3 g(u1 ; u2 ; u3 )du3 : (A.1.11) r!s!p! 1 Aa
g(u1 ; u2 ) = Aa
Let
I1 =
Z
1 1
u3 g(u1 ; u2 ; u3 ) du3 ; Aa
(A.1.12)
and
Irsp =
Z
1 1
u3 r s p D D D g(u1 ; u2 ; u3 )du3 : Aa u1 u2 u3
(A.1.13)
Then, Equation (A:1:11) may be expressed as
E(~ u3 j u1 ; u 2 )
g(u1 ; u2 ) = I1 + Aa
4 X
r+s+p=3
( 1)r+s+p Krsp Irsp : r!s!p!
37
(A.1.14)
Claim: g(u1 ; u2 ; u3 ) =
(u1 )
(u2 ) (u3 ) A
, where
is the univariate standard normal density.
Proof: Let g(u2 j u1 ) be the conditional density function of u2 given u1 and g(u3 j u1 ; u2 ) be the conditional density of u3 given u1 and u2 . Then, g(u1 ; u2 ; u3 ) can be written as a product of marginal and conditional distributions:
g(u1 ; u2 ; u3 ) = g(u1 )g(u2 j u1 )g(u3 j u1 ; u2 ):
(A.1.15)
Clearly, g(u1 ) = (u1 ). We know from Goldberger (1991) that
g(u2 j u1 ) = a (u2 ) :
(A.1.16)
Now, let us …nd g(u3 j u1 ; u2 ). Using the formula in Goldberger (1991), we obtain E(u3 j u1 ; u2 ) = a2 (b1 u1 + b2 u2 );
(A.1.17)
V ar(u3 j u1 ; u2 ) = A2 a2 :
(A.1.18)
Then,
g(u3 j u1 ; u2 ) =
(u3 ) : Aa
(A.1.19)
Substituting Equation (A:1:16) and Equation (A:1:19) into Equation (A:1:15), we obtain
g(u1 ; u2 ; u3 ) =
(u1 ) (u2 ) (u3 ) : A
Q:E:D:
(A.1.20)
Substituting Equation (A:1:20) into Equation (A:1:13) and making the transformation z3 = u3 , so Aadz3 = du3 , we get
Irsp =
Let
(p)
Z
1 1
(z3 + c) Dur 1 Dus 2 Dup 3 (u1 ) (u2 ) (z3 )adz3 :
(A.1.21)
denote the pth derivative of the univariate standard normal density. Observe that
Dup 3 (z3 ) =
(p)
(z3 )
1 Aa
p
;
(A.1.22)
38
and Dzp3 (z3 ) =
(p)
(z3 ):
(A.1.23)
Hence, 1 Aa
p
Dzp3 (z3 ) = Dup 3 (z3 ):
(A.1.24)
Inserting Equation (A:1:24) into Equation (A:1:21), we obtain
Irsp
1 = p p A a
1
Z
1 1
(z3 + c) Dur 1 Dus 2 Dzp3 (u1 ) (u2 ) (z3 )dz3 :
(A.1.25)
Making another transformation z3 = u3 (abusing the notation), so Dzp3 (z3 ) = Dup 3 (u3 ) and dz3 = du3 , we have
Irsp =
1 Ap ap
1
Z
1 1
(u3 + c) Dur 1 Dus 2 Dup 3 (u1 ) (u2 ) (u3 )du3 :
(A.1.26)
The de…nition of rth order univariate Hermite polynomial can be obtained by setting s = 0 in Equation (9). In other words, Dur 1 (u1 ) = ( 1)r Hr (u1 ) (u1 ). Using this, we can deduce H0 (a) = 1 and H1 (a) = a. Then, we can write 0 1 1 X B 1 C i u3 + c = H1 (u3 + c) = @ A c H1 i (u3 ); i i=0 0
(A.1.27)
1
B 1 C where @ A denotes the binomial coe¢ cient. Incorporating Equation (A:1:27) into Equai tion (A:1:26), we get
Irsp =
1)p
( Ap ap
Dr Ds (u1 ) (u2 ) 1 u1 u2
1 X
0
1
B 1 C i @ Ac i i=0
39
Z
1 1
Hp (u3 )H1 i (u3 ) (u3 )du3 : (A.1.28)
For any x, the orthogonality property of Hermite polynomials allows us to write Z
where 1
1
Hn (x)Hm (x) exp
n;m ,
(A.1.29)
1
n;m
is the Kronecker product,
i () i = 1 Z
p x2 =2 dx = n! 2
1
p () p
n;m
1,
Hp (u3 )H1 i (u3 ) (u3 )du3 =
1
8 9 > < 1 if n = m > = = . Since > : 0 if n 6= m > ;
Z
1
Hp (u3 )H1 i (u3 )
1
p = p! 2
Also observe that
R1
1 Hp (u3 )H1 i (u3 )
1 2
p;1 i p
exp
p
u23 =2 2
= p! for p
p;1 i
= 1 () p =
du3
1:
(A.1.30)
(u3 )du3 = 0 for p > 1. Returning to Equation
(A:1:28), we conclude that
Irsp = 0 for p > 1:
(A.1.31)
Moreover, substituting Equation (A:1:30) into Equation (A:1:28), we obtain:
Irsp =
0
1
( 1)p r s B 1 C 1 D D (u ) (u ) @ Ac 1 u u 2 1 2 Ap ap 1 1 p
p
p! for p
1:
(A.1.32)
Equivalently,
Irsp =
Substituting
Irsp =
( 1)p r s c1 p D D for p (u ) (u ) 1 2 Ap ap 1 u1 u2 (1 p)! a A (b1 u1
1:
(A.1.33)
+ b2 u2 ) back instead of c, we obtain
( 1)p 1 Dr Ds (u1 ) (u2 ) (b1 u1 + b2 u2 )1 2p 2 Aa (1 p)! u1 u2
p
for p
1:
(A.1.34)
We can conclude from Equations (A:1:12) and (A:1:13) that I1 = I000 . Then, we can write I1 as
I1 =
a2 (u1 ) (u2 ) (b1 u1 + b2 u2 ) : A
(A.1.35)
40
Setting p = 0 and 1 in Equation (A:1:34); we respectively get:
Irs0 =
a2 r s D D (u1 ) (u2 ) (b1 u1 + b2 u2 ) ; A u1 u2
(A.1.36)
and
Irs1 =
1 r s D D (u1 ) (u2 ) : A u1 u2
(A.1.37)
After substituting Equations (A:1:31), (A:1:35), (A:1:36), and (A:1:37) into Equation (A:1:16) and doing some manipulations, we get
E(~ u3 j u1 ; u2 )g(u1 ; u2 ) = a3 (u1 ) (u2 ) (b1 u1 + b2 u2 ) 4 X ( 1)r+s Krs0 a3 Dur 1 Dus 2 (u1 ) (u2 ) (b1 u1 + b2 u2 ) + r!s! r+s=3
3 X ( 1)r+s Krs1 aDur 1 Dus 2 (u1 ) (u2 ) : + r!s!
(A.1.38)
r+s=2
Since a (u1 ) (u2 ) = g(u1 ; u2 ), which may be obtained by multiplying Equation (A:1:16) by (u1 ), Equation (A:1:38) can be written as:
E(~ u3 j u1 ; u2 )g(u1 ; u2 ) = a2 g(u1 ; u2 ) (b1 u1 + b2 u2 ) 4 X ( 1)r+s Krs0 a2 Dur 1 Dus 2 g(u1 ; u2 ) (b1 u1 + b2 u2 ) + r!s! r+s=3
+
3 X ( 1)r+s Krs1 Dur 1 Dus 2 g(u1 ; u2 ): r!s!
(A.1.39)
r+s=2
Now, let us compute Dur 1 Dus 2 g(u1 ; u2 ) (b1 u1 + b2 u2 ) via the chain rule:
Dur 1 Dus 2 g(u1 ; u2 ) (b1 u1 + b2 u2 ) = (b1 u1 + b2 u2 ) Dur 1 Dus 2 g(u1 ; u2 ) + g(u1 ; u2 )Dur 1 Dus 2 (b1 u1 + b2 u2 ) : Observe that Dur 1 Dus 2 (b1 u1 + b2 u2 ) = 0 for r + s
41
(A.1.40)
2. Using the formula for bivariate Hermite
polynomials given in Equation (9), we obtain
Dur 1 Dus 2 g(u1 ; u2 ) (b1 u1 + b2 u2 ) = (b1 u1 + b2 u2 ) Dur 1 Dus 2 g(u1 ; u2 ) = (b1 u1 + b2 u2 ) ( 1)r+s Hrs (u1 ; u2 )g(u1 ; u2 ) for r + s
2: (A.1.41)
Incorporating Equation (A:1:41) into Equation (A:1:39) and doing some manipulations, we get "
E(~ u3 j u1 ; u2 ) = a2 (b1 u1 + b2 u2 ) 1 + +
3 X
r+s=2
Since (u1 ; u2 )
1+
4 X
r+s=3
SBVN (
12 )
4 X
r+s=3
# 1 Krs0 Hrs (u1 ; u2 ) r!s!
1 Krs1 Hrs (u1 ; u2 ): r!s!
(A.1.42)
Equation (10) allows us to write
1 Krs0 Hrs (u1 ; u2 ) = 1: r!s!
(A.1.43)
This expression may be veri…ed by exploiting the moment formulas in Stuart and Ord (1994), after showing that Kijk =
ijk
for i + j + k = 3 using their technique where
’s denote
trivariate central moments. Thus, when h(u1 ; u2 ) = g(u1 ; u2 ), Equation (A:1:42) becomes K201 H20 (u1 ; u2 ) + K111 H11 (u1 ; u2 ) 2 K301 K211 K021 H02 (u1 ; u2 ) + H30 (u1 ; u2 ) + H21 (u1 ; u2 ) + 2 6 2 K121 K031 + H12 (u1 ; u2 ) + H21 (u1 ; u2 ): 2 6
E(~ u3 j u1 ; u2 ) = a2 (b1 u1 + b2 u2 ) +
(A.1.44)
When we express a, b1 and b2 as in Equations (A:1:2), (A:1:3) and (A:1:4) respectively, we get Equation (12). Using Equations (A:2:1) and (A:2:2) given in the next section, and the cumulant formulas K101 =
13
and K011 =
23
(Stuart and Ord, 1994), it can be shown that
a2 (b1 u1 + b2 u2 ) = K101 H10 (u1 ; u2 ) + K011 H01 (u1 ; u2 );
Combining Equation (A:1:45) with Equation (A:1:44), we obtain Equation (11).
42
(A.1.45)
6.2
Expressions for Bivariate Hermite Polynomials
The expressions given below were obtained by solving Equation (9) for various values of r and s using Maple. The code may be obtained from the author upon request.
H10 (u1 ; u2 ) = au1 ;
(A.2.1)
H01 (u1 ; u2 ) = au2 ;
(A.2.2)
H20 (u1 ; u2 ) =
a + u 12 ;
(A.2.3)
H02 (u1 ; u2 ) =
a + u 22 ;
(A.2.4)
H11 (u1 ; u2 ) =
6.3
12
+ u1 u2 ;
(A.2.5)
H30 (u1 ; u2 ) = au1 H20
2a2 ;
(A.2.6)
H03 (u1 ; u2 ) = au2 H02
2a2 ;
(A.2.7)
H21 (u1 ; u2 ) = aH11 u1
a3 (u2
12 u1 );
(A.2.8)
H12 (u1 ; u2 ) = aH11 u2
a3 (u1
12 u2 ):
(A.2.9)
Moments of the Truncated SBVN Distribution
The moment formulas of the truncated SBVN distribution up to the second order may be found in Rosenbaum (1961). However, in our independent derivation, we discovered that Rosenbaum made a sign error while calculating integrals. Our expressions up to the second order agree with those in Henning and Henningsen (2007), who cross-check their formulas using numerical integration and Monte Carlo simulation. We also provide some third order moments of the truncated SBVN distribution below. The derivations may be obtained from the author upon request. We denote the univariate standard normal density and distribution function respectively by
(:) and
(:). Let Cti =
ti Xti
for t = 1; 2 for individual i. We
suppress the individual subscript throughout the section to avoid notational clutter. We denote the moments of the truncated SBVN distribution for the subsample S4 (de…ned in
43
Section 2:1) as mj;k = E(uj1 uk2 j u1 >
q=
p
1 p
2 12
2
C2 ). Recall a = p
1 1
2 12
and let
1 = p ; a 2
(A.3.1)
D2 = C12
D(C1 ; C2 )
C1; u2 >
2
12 C1 C2
+ C22 ;
(A.3.2)
C1 (C1 ; C2 )
C1 = a(C1
12 C2 );
(A.3.3)
C2 (C1 ; C2 )
C2 = a(C2
12 C1 );
(A.3.4)
C1 (C1 ; C2 )
C1 = C1 +
12 C2 ;
(A.3.5)
C2 ( C1 ; C2 )
C2 = C2 +
12 C1 ;
(A.3.6)
(C1 ; C2 ) = ( C1 ) (C2 );
(A.3.7)
(C1 ; C2 ) = ( C2 ) (C1 );
(A.3.8)
(C1 ; C2 ) =
(A.3.9)
(aD) ;
and
L(C1 ; C2 ;
12 )
=
Z
1 C2
Z
1
g(u1 ; u2 )du1 du2 :
(A.3.10)
C1
Then,
m0;1 =
m1;0 = m0;2 = m2;0 = m1;1 =
(C1 ; C2 ) + (C1 ; C2 ) ; L(C1 ; C2 ; 12 )
(A.3.11)
(C1 ; C2 ) + 12 (C1 ; C2 ) ; L(C1 ; C2 ; 12 )
(A.3.12)
12
L(C1 ; C2 ;
12 )
L(C1 ; C2 ;
12 )
2 C 12 1
(C1 ; C2 ) C2 (C1 ; C2 ) + q L(C1 ; C2 ; 12 )
2 C (C ; C ) + q C1 (C1 ; C2 ) 1 2 12 2 L(C1 ; C2 ; 12 )
12 L(C1 ; C2 ; 12 )
12 C1
(C1 ; C2 ) L(C1 ; C2 ;
44
12 C2 12 )
12
(C1 ; C2 )
12
(C1 ; C2 )
;
(A.3.13)
;
(A.3.14)
(C1 ; C2 ) + q (C1 ; C2 )
;
(A.3.15)
m0;3 =
1 L(C1 ; C2 ;
12 )
[2L(C1 ; C2 ;
+ C22 (C1 ; C2 )
m3;0 =
1 L(C1 ; C2 ;
12 )
q
1 L(C1 ; C2 ;
+ (1
m2;1 =
2 12
+
1 L(C1 ; C2 ;
+ (1
2 12
+
12 )
q
[2
12 C1
[2
12 )m1;0
+
1
2 12
+
3 2 12 C1
12
1
2 12
+
3 2 12 C2
(C1 ; C2 )];
(C1 ; C2 )
(C1 ; C2 )
(C1 ; C2 )
(C1 ; C2 ) (A.3.17)
+
2 12 C2
(C1 ; C2 )
qC2 (C1 ; C2 )];
12 L(C1 ; C2 ; 12 )m1;0
2 2 12 C2 )
12
(A.3.16)
12 L(C1 ; C2 ; 12 )m0;1
2 2 12 C1 )
12 )
+
(C1 ; C2 )];
[2L(C1 ; C2 ;
+ C12 (C1 ; C2 )
m1;2 =
12 C2
12 )m0;1
+
2 12 C1
(A.3.18)
(C1 ; C2 )
qC1 (C1 ; C2 )]: (A.3.19)
6.4
Expressions of ’s
We obtain below formulas using Equation (12) and the formulas provided in Appendices 6:1 and 6:2 via a Maple code.
1
=
(C1 ; C2 ) ; L(C1 ; C2 ; 12 )
(A.4.1)
2
=
(C1 ; C2 ) ; L(C1 ; C2 ; 12 )
(A.4.2)
3
=
C1 (C1 ; C2 ) q L(C1 ; C2 ;
4
=
C2 (C1 ; C2 ) q L(C1 ; C2 ;
5
=
q (C1 ; C2 ) ; L(C1 ; C2 ; 12 )
12
(C1 ; C2 )
12 ) 12 12 )
(C1 ; C2 )
;
(A.4.3)
;
(A.4.4) (A.4.5)
45
6
6.5
(C1 ; C2 ) + C22 (C1 ; C2 ) + q L(C1 ; C2 ;
12 (aC1
+ C1 ) (C1 ; C2 )
12 ) 12 (aC2
+ C2 ) (C1 ; C2 )
;
(A.4.6)
;
(A.4.7)
7
=
8
=
aqC1 (C1 ; C2 ) ; L(C1 ; C2 ; 12 )
(A.4.8)
9
=
aqC2 (C1 ; C2 ) ; L(C1 ; C2 ; 12 )
(A.4.9)
12 )
Expressions of ’s for Tunali and Baslevent’s (2006) Problem
Let
12
2 u2 ) 22
(C1 ; C2 ) + C12 (C1 ; C2 ) + q L(C1 ; C2 ;
=
+
=
1 2 12 ,
BVN (0; 0; 12 .
11
=
11 ; a
2 1
2 ; e)
and
22
=
2. 2
p
where a =
When (u1 ; u2 ) 11
+
22
+2
12
SBVN ( and e =
12 ), 11
+
(
1 u1 ; 12 .
1 u1
+
Let b =
Denote the probability of LF P = j by Pj and the standard bivariate normal
cumulative distribution function with upper thresholds s1 , s2 and correlation coe¢ cient r by (s1 ; s2 ; r). Assume that state probabilities are given as:
P0 =
(C01 ; C02 ; C03 );
(A.5.1)
P1 =
(C11 ; C12 ; C13 );
(A.5.2)
P2 =
(C21 ; C22 ; C23 );
(A.5.3)
P3 = 1
P0 :
(A.5.4)
Unemployment probability follows from the adopted de…nition. Considering the categorical variable LFP in Equation (20), observe that
P0 = P (y1 < 0; y1 + y2 < 0) = P 0
1 u1
= P @u1
0) = P
2 u2
2 u2
!
0
2z
= P @u2 >
2
0
0
2z
= P @u2