Identification of Quantile Treatment Effects Exploiting Instrumental Variation∗
Margherita Fort †
Richard Spady ‡
This version: January 7, 2009.

Abstract: This paper discusses the strategies proposed in the literature to identify the causal effect of an endogenous regressor on the distribution of an outcome variable. It highlights the relative merits and drawbacks of each approach and the links between them. To further illustrate the differences and analogies, it concludes with an application of the surveyed methodologies to the analysis of the effects of education on the distribution of wages in Europe.

Keywords: endogeneity, heterogeneity, monotonicity, non-additive models, rank invariance.
JEL codes: C3.
PRELIMINARY AND INCOMPLETE
1 Introduction

The recent increase in the attention that the evaluation literature devotes to quantile treatment effects is due to their intrinsic ability to characterize the heterogeneous impact of a treatment at various points of the outcome distribution.1 Indeed, the heterogeneity of the response is empirically important (Sen [1997], Sen [2000], Cabrera and Evans [2000]) and mean impacts miss distributional effects (Heckman et al. [1997], Bitler et al. [2006]). Although one may in principle be interested in the distribution of the impact, the focus often shifts to quantile treatment effects (QTEs), i.e. the change in the response, attributable to the treatment, that is required to stay in the same conditional quantile. This happens mainly because identification of QTEs generally rests on weaker assumptions than those needed for the identification of the distribution of the impact. Identification of QTEs may be informative about some features of the impact distribution, but it identifies the impact distribution itself only in the special case where the latter is degenerate, i.e. everyone with the same observed characteristics experiences the same impact. Indeed, as Heckman et al. [1997] highlighted, identification of the impact distribution requires information on the joint distribution of the outcomes with and without the treatment, and this cannot be retrieved, even from experimental data, without clarifying the degree of dependence between outcomes in the presence and in the absence of the treatment.

This review discusses strategies aimed at identifying quantile treatment effects which exploit the exogenous variation in the potentially endogenous regressor induced by an instrumental variable.2 There is relatively little that is novel in this discussion. The goal is instead to establish the links between different approaches and to discuss the issues related to the empirical implementation of such strategies. The structure of the paper is as follows: section 2 reviews the instrumental variable strategies aimed at identifying QTEs; section 3 discusses the links between them; section 4 illustrates these links via an empirical example aimed at assessing distributional impacts of education on wages in Europe. Section 5 concludes.

∗ Early stages of this project were developed while Fort was a Max Weber Fellow at the European University Institute in Florence in the academic year 2006-2007. Fort gratefully acknowledges financial support from the Max-Weber Programme. The usual disclaimer applies.
† University of Bologna and CHILD. Mailing address: Department of Economics, Piazza Scaravilli 2, Bologna Italy. E-mail: [email protected]. Home-page: http://www2.dse.unibo.it/fort/eng/index.htm.
‡ Johns Hopkins University and cemmap.
1 Conditional quantiles are also suitable objects for the discussion on identification in nonseparable non additive models: quoting [Chesher, 2003b, p. 23], “when a structural function is nonseparable in its latent variate, conditional mean independence is an uninformative restriction unless there are highly restrictive conditions on the functional form of the structural function. If the structural function is restricted to be monotonic in its latent variate, a conditional quantile restriction is informative when a structural function is non separable.”
2 Strategies based on the assumption of selection on observables (unconfoundedness), such as Firpo [2007], are not considered here.
2 Identification of Quantile Treatment Effects: A Survey

In what follows, Y denotes the outcome; D denotes the regressor (potentially endogenous in the data generating process); X denotes a set of K exogenous regressors; Z denotes a set of L instrumental variables. The discussion, unless otherwise stated, refers to the case where L = 1.3 Restrictions on the scale of the variables are introduced later. At this stage, D, Y, Z can be either continuous or discrete random variables.4 Both Y and D can be decomposed into two components: one which is deterministic (typically some function of the covariates that is at least partially unknown, often unknown only up to a finite number of parameters) and one which is stochastic. The two components need not be additively separable. The stochastic components account for differences in the distribution of D and Y across individuals with identical values of X. The econometric models place restrictions on the distribution (joint, marginal, conditional) of these stochastic components and X.

Data consist of a random sample of size n, $\{y_j, d_j, x_j, z_j\}_{j=1,\dots,n}$, from (Y, D, X, Z). In the population, the outcome for individual j can be denoted as $y_{dj} = g_d(\mathbf{x}, \mathbf{d})$, where $g_d(\cdot)$ denotes a generic function (unrestricted for now), $\mathbf{x} = [x_1 \cdots x_N]'$, $\mathbf{d} = [d_1 \cdots d_N]'$, and N is the number of individuals in the population. The assumption that individuals do not interact and that there are no general equilibrium effects is maintained throughout, so that one has $y_{dj} = g_d(x_j, d_j)$.5 For the purpose of the following discussion, let us assume n = N. For the sake of simplicity the subscript j, denoting the individual, may be omitted: $Y_d$ denotes the potential outcomes (Neyman [1923], Fisher [1935], Cox [1958], Rubin [1974]) and $F_{Y_d}(\cdot, x)$ the corresponding cumulative distribution function, where $F_{Y_d}(y, x) \equiv \Pr[Y_d \le y \mid X = x]$.

3 Exogeneity assumptions are defined in terms of distributions of observed variables as in Engle et al. [1983], not in terms of conditional expectations.
4 The scale of X is generally not relevant for the discussion on identification but may affect the choice of the estimation method.
5 The assumption is often referred to as the “stable unit treatment value assumption” (SUTVA) (Rubin [1980], Rubin [1990]).
When D is binary, the quantile treatment effect is defined as the horizontal distance between the distribution function under the effect of the treatment and the distribution function in the absence of the treatment (Doksum [1974], Lehmann [1974]), as in (1):

$$\delta(\tau, x) = F^{-1}_{Y_1}(\tau, x) - F^{-1}_{Y_0}(\tau, x), \qquad \tau \in (0, 1), \quad \tau \equiv F_{Y_0}(y, x) \tag{1}$$

Changing variables, one has $\delta(\tau, x) = F^{-1}_{Y_1}(F_{Y_0}(y, x), x) - y$, $\tau \in (0, 1)$. Indeed, the quantile treatment effect represents the change in the response required to stay in the $\tau$-th conditional quantile. Maintaining the definition of quantile treatment effect as in (1), one can formulate a quantile regression model for the binary treatment case as in (2), which is essentially a reparametrization of the problem:

$$Q_Y(\tau \mid D, X) = \alpha(\tau, x) + \delta(\tau, x) D, \qquad \alpha(\tau, x) \equiv F^{-1}_{Y_0}(\tau, x) \tag{2}$$
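As an illustration of definition (1) in the simplest setting (binary D assigned exogenously and no covariates X), the sketch below computes the horizontal distance between two empirical distribution functions. The data and the helper names `empirical_quantile` and `qte` are hypothetical and serve only to show how quantile effects can differ from the mean effect.

```python
import math


def empirical_quantile(sample, tau):
    """Smallest order statistic y such that the empirical CDF at y is >= tau."""
    ordered = sorted(sample)
    index = max(math.ceil(tau * len(ordered)) - 1, 0)
    return ordered[index]


def qte(treated, control, tau):
    """Horizontal distance between the two empirical distributions at tau."""
    return empirical_quantile(treated, tau) - empirical_quantile(control, tau)


# Hypothetical outcomes: the treatment barely moves the bottom of the
# distribution and shifts the top considerably.
y0 = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
y1 = [11, 13, 16, 19, 22, 26, 30, 34, 38, 42]

effects = {tau: qte(y1, y0, tau) for tau in (0.1, 0.5, 0.9)}
mean_effect = sum(y1) / len(y1) - sum(y0) / len(y0)

print(effects)
print(mean_effect)
```

In this made-up sample the effect grows along the distribution, which a single mean comparison would not reveal.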
When D takes a finite number of p + 1 distinct values (see Koenker [2005]), one can extend (2) to (3):

$$Q_Y(\tau \mid D, X) = \alpha(\tau, x) + \sum_{j=1}^{p} \delta_j(\tau, x) D_j, \qquad \alpha(\tau, x) \equiv F^{-1}_{Y_0}(\tau, x) \tag{3}$$

where $D_j$ indicates receipt of treatment j, and the treatment effect of each treatment j is defined with respect to the control treatment j = 0. When D takes a continuum of values, one can define the parameter of interest as the marginal effect of D on the conditional quantile, replacing (1) with (4):

$$\delta(\tau, x) = \frac{\partial Q_Y(\tau \mid D, X)}{\partial D} \tag{4}$$
One can also contrast the marginal (with respect to X) distributions of Y by D and obtain expressions similar to (4) and/or (1). Since we do not rule out the possibility that the observed value of D is chosen by individuals on the basis of gains (as in a standard Roy model) or, more generally, on the basis of the (potential) outcomes $Y_j$ for specific levels j of the treatment, the standard quantile regression set-up cannot be directly exploited to draw
inference on the parameter of interest from observational data without further restrictions. Table 1 gives an overview of the models reviewed in the following subsections. Key features of these models are that: (i) identification is attained nonparametrically; (ii) the identifying assumptions do not restrict heterogeneity of treatment effects to the heterogeneity driven by observed covariates. Note that estimation of the treatment parameter(s) of interest may nonetheless require more restrictive models, typically parametric and/or distributional assumptions: results driven by these models may be seen as best approximations of the structural features of interest (Abadie [2003], [Chesher, 2003c, pp.4-5]). Besides, as Table 1 shows: (i) all models specify two “equations”: one equation describes the outcome variable Y, the other describes the potentially endogenous treatment variable/regressor D; (ii) all models incorporate an exclusion restriction: the “instrument” Z affects Y only via D and not directly; it is “excluded” from the Y equation.6 The role of the instrument in these models is to isolate exogenous variation in D, i.e. variability in D not driven by expected gains or by the outcome $Y_j$ under any level j of the treatment.
6 Indeed, the ignorability assumption used by [Abadie et al., 2002, p.93], $(Y_1, Y_0, D_1, D_0) \perp Z \mid X$, as highlighted by the authors, subsumes random assignment of Z and the requirement that Z does not directly affect Y (exclusion restriction). Nevertheless, the ignorability assumption is weaker than assuming random assignment of Z and the exclusion restriction.
Table 1: Approaches to the Identification of the Quantile Treatment Effects (QTEs) and the Exogenous Impact Function (EIF): Model Representations.

Approach: Abadie et al. [2002] (focus on QTE)
Description:
  $Y_d = Y(D = d, X)$, $d \in \{0, 1\}$
  $D_z = D(Z = z, X)$, $z \in \{0, 1\}$
  $(Y_1, Y_0, D_1, D_0) \perp Z \mid X$
  $E[D_1 \mid X] \neq E[D_0 \mid X]$
  $\Pr[D_1 \geq D_0 \mid X] = 1$
  $Q_{Y \mid X, D, D_1 > D_0}(\tau) = \alpha(\tau) D + X'\beta(\tau)$
Comments: The identification of the marginal distributions of $Y_d$, $d \in \{0, 1\}$, does not require parametric assumptions. The set-up has not yet been generalized to the case where D takes more than 2 values or is continuous. This is a local identification result. QTEs are identified but, without further assumptions, features of the impact distribution are not.

Approach: Chesher [2001] (focus on EIF)
Description:
  $Y = h_Y(D, X, \varepsilon, \nu)$
  $D = h_D(X, Z, \nu)$
  $h_Y(\cdot, \cdot, \varepsilon, \cdot)$ strictly monotonic
  $h_Y(D, \cdot, \cdot, \cdot)$ differentiable
  $h_Y(\cdot, \cdot, \cdot, \nu)$ differentiable
  $h_D(\cdot, \cdot, \nu)$ strictly monotonic
  $h_D(\cdot, Z, \cdot)$ differentiable
  $h_D(\cdot, \cdot, \nu)$ differentiable
  $\tau_\varepsilon$-conditional quantile $\perp (\nu, X, Z)$
  $\tau_\nu$-conditional quantile $\perp (X, Z)$
  Y, D continuous r.v.
Comments: It guarantees local point identification of features of the impact distribution. When Z and D assume a finite number of values, the set-up can be extended (see Chesher [2003b], Chesher [2005]) but it ensures only interval identification of features of the impact distribution. Notably, this is done without imposing conditions on the evolution of ranks across different levels of D.

Approach: Chernozhukov and Hansen [2005] (focus on QTE)
Description:
  $Y_d = q_d(d, x, U_d)$, $U_d \sim U(0, 1)$
  $D = \delta(Z, X, V)$
  $q_d(\cdot, \cdot, \tau)$ strictly monotonic
  $U_d \perp Z \mid X$
  (a) $U_d \equiv U$ (rank invariance), or (b) $U_d \overset{d}{\approx} U \mid V$ (rank similarity)
Comments: It guarantees global identification of conditional quantiles of $Y \mid D, X$. It applies to settings with continuous outcomes and both discrete and continuous instrument and treatment. It imposes restrictive conditions on the evolution of ranks across treatment states. It requires that there is a single latent variable that governs the evolution of ranks.

Legend: This table reports the key reference for the 3 main approaches (Approach); a short description of the assumptions of each approach (Description; the notation used may differ from the one used in the original paper); some comments (Comments), mainly information about the generalizability of the approach to slightly different settings and whether it is a global or local identification result.
2.1 Instrumental Variable Quantile Regression by Abadie, Angrist and Imbens

Abadie et al. [2002] generalize the approach of Angrist et al. [1996] to the estimation of the effect of a binary (potentially endogenous) treatment D on the quantiles of the distribution of a scalar continuously distributed outcome Y. Their approach requires the availability of a binary instrumental variable Z and identifies QTEs for the subpopulation of individuals whose treatment status is affected by the instrument (the so-called compliers). The approach has not yet been extended to cover the case where D takes more than two values or is continuously distributed. D takes the value 1 when unit j receives the treatment and the value 0 when the unit is not treated. Z takes the value 1 when unit j is assigned to the treatment, 0 otherwise. Assignment is assumed to be random or at least ignorable (Rubin [1978], [Rosenbaum and Rubin, 1983, p. 43], Rosenbaum [1984]). The response observed on each unit is represented by $Y_j$,

$$Y_j \equiv Y_{0j}(1 - D_j) + D_j Y_{1j} \tag{5}$$

where $Y_1$ and $Y_0$ denote the potential outcomes, respectively the outcome in the presence and in the absence of the treatment. Similarly, the treatment status observed on each unit is represented by $D_j$:

$$D_j \equiv D_{0j}(1 - Z_j) + Z_j D_{1j} \tag{6}$$
where $D_1$, $D_0$ denote the potential treatment status, $D_1 \equiv D_j(Z_j = 1)$ and $D_0 \equiv D_j(Z_j = 0)$, one for each assignment status. In this framework, one can classify units according to the dependence, at the individual level, between the treatment and the instrument by using the potential treatment indicators $D_1$ and $D_0$, as shown in Figure 1. In principle, one can distinguish: (i) always-takers, individuals who will always be treated, whether assigned to the treatment group or not; (ii) never-takers, individuals who will never be treated, whether assigned to the treatment group or not; (iii) compliers, individuals who will be treated if assigned to the treatment group but will not be treated if not assigned to the treatment group; (iv) defiers, individuals who will not be treated if assigned to the treatment group and will be treated if not assigned to the treatment group. Always-takers, never-takers and defiers are jointly referred to as non-compliers. The key results then are
that, given the assumptions listed in Table 2: (i) the set of defiers is empty; (ii) conditional on X and on the population of compliers, i.e. on the subpopulation for which $D_1 > D_0$, the treatment status D is independent of the potential outcomes, i.e. $(Y_1, Y_0) \perp D \mid X, D_1 > D_0$. Therefore, in the population of compliers, comparisons by treatment status conditional on X may be given a causal interpretation. Figure 2 represents the correspondence between observed groups and compliance types, which is crucial for identification in Angrist et al. [1996], Imbens and Rubin [1997], Abadie et al. [2002], Abadie [2003]: individuals who are not treated ($D_j = 0$) but were assigned to the treatment ($Z_j = 1$) inform about $Y_0$ in the sub-population of never-takers; individuals who are treated ($D_j = 1$) but were not assigned to the treatment ($Z_j = 0$) inform about $Y_1$ in the sub-population of always-takers; and the other two groups inform about $Y_0$ and $Y_1$ for sub-populations which are mixtures of compliers and never-takers and of compliers and always-takers, respectively. Weights in these mixtures are functions of the proportions of individuals in the sub-populations (never-takers, always-takers, compliers), which can be consistently estimated using observed data. As a consequence, the correspondence allows one to recover the marginal distributions of $Y_1$ and $Y_0$ for compliers (Imbens and Rubin [1997], Abadie [2002]). However, the individual members of the population of compliers cannot be identified from observed data. As a consequence, causal parameters, including $\delta(\tau, x)$ (see equation (1)), cannot be estimated directly even for this sub-population.7 Abadie [2003] suggests a solution based on re-weighting the population using a weighting function that “finds compliers in an average sense”. His results include those previously presented in the literature on instrumental variables models as special cases.
He first shows that the proportion of compliers in the population is identified from observed conditional moments; then he shows that the moments of any measurable function g of (Y, D, X) with finite moments are identified (see Lemma 2.1 and Theorem 3. in Abadie [2003]). The weighting scheme proposed by Abadie [2003] stems from the fact that, under the assumptions of the model, the population can be partitioned into 3 groups (compliers, never-takers and always-takers). Formally,

$$E[g(Y, D, X) \mid D_1 > D_0] = \frac{1}{\Pr[D_1 > D_0]} \, E\left[\left(1 - \frac{D(1 - Z)}{\Pr[Z = 0 \mid X]} - \frac{(1 - D)Z}{\Pr[Z = 1 \mid X]}\right) g(Y, D, X)\right] \tag{7}$$
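The identity in (7) can be checked on a small made-up population. The sketch below enumerates three compliance types with hypothetical shares and potential outcomes, omits the covariates X, treats Pr[Z = 1] as known, and computes exact expectations rather than sample averages. The names `TYPES`, `P_Z1`, `kappa` and `expectation` are illustrative assumptions, not notation from Abadie [2003].

```python
# Hypothetical population: (type, share, Y0, Y1). No defiers (monotonicity).
TYPES = [
    ("never_taker", 0.3, 1.0, 1.5),
    ("complier", 0.5, 2.0, 5.0),
    ("always_taker", 0.2, 4.0, 4.5),
]
P_Z1 = 0.4  # assumed known Pr[Z = 1]; there are no covariates in this sketch


def treatment(unit_type, z):
    """Treatment actually taken by a unit of the given type under assignment z."""
    return {"never_taker": 0, "always_taker": 1, "complier": z}[unit_type]


def kappa(d, z, p_z1):
    """Weight appearing inside the expectation on the right-hand side of (7)."""
    return 1.0 - d * (1 - z) / (1.0 - p_z1) - (1 - d) * z / p_z1


def expectation(fn):
    """Exact expectation of fn(y, d, z) over types and random assignment."""
    total = 0.0
    for unit_type, share, y0, y1 in TYPES:
        for z, p_z in ((0, 1.0 - P_Z1), (1, P_Z1)):
            d = treatment(unit_type, z)
            y = y1 if d == 1 else y0  # observed outcome, as in equation (5)
            total += share * p_z * fn(y, d, z)
    return total


# The mean of the weights recovers the share of compliers ...
complier_share = expectation(lambda y, d, z: kappa(d, z, P_Z1))
# ... and the weighted mean of Y recovers E[Y | complier].
complier_mean_y = expectation(lambda y, d, z: kappa(d, z, P_Z1) * y) / complier_share

print(complier_share, complier_mean_y)
```

The weights are negative for always-takers observed with Z = 0 and for never-takers observed with Z = 1, so that the contributions of both groups cancel on average.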
The weighting scheme proposed by Abadie [2003] may produce negative weights. Abadie et al. [2002] propose an alternative definition that guarantees nonnegative weights (see Lemma 3.2, p. 96 in Abadie et al. [2002]). Estimation in this set-up requires two steps: in the first step the weighting function is estimated; then the estimated weights are used to estimate the parameter of interest. Estimators and inference are illustrated in Abadie [2003] and Abadie et al. [2002].

7 Indeed, for each individual either the value of $D_0$ or the value of $D_1$ is observed.

Figure 1: Compliance Types
  $D_j(Z_j = 0) = 0$, $D_j(Z_j = 1) = 0$: NEVER-TAKER, $\forall j$, $D(Z_j) = 0$
  $D_j(Z_j = 0) = 0$, $D_j(Z_j = 1) = 1$: COMPLIER, $\forall j$, $D(Z_j) = Z_j$
  $D_j(Z_j = 0) = 1$, $D_j(Z_j = 1) = 0$: DEFIER, $\forall j$, $D(Z_j) = 1 - Z_j$
  $D_j(Z_j = 0) = 1$, $D_j(Z_j = 1) = 1$: ALWAYS-TAKER, $\forall j$, $D(Z_j) = 1$

Figure 2: Compliance Types by Observed Treatment Status and Assignment to the Treatment given Monotonicity
  $Z_j = 0$, $D_j = 0$: NEVER-TAKER or COMPLIER
  $Z_j = 1$, $D_j = 0$: NEVER-TAKER
  $Z_j = 0$, $D_j = 1$: ALWAYS-TAKER
  $Z_j = 1$, $D_j = 1$: ALWAYS-TAKER or COMPLIER
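The correspondence in Figure 2 implies that the shares of the three compliance types can be read off observed treatment rates by assignment status. The following sketch assumes random assignment of Z, monotonicity and no covariates; the function name `compliance_shares` and the sample counts are hypothetical.

```python
def compliance_shares(records):
    """Estimate type shares from (z, d) pairs with z, d in {0, 1}.

    Under monotonicity, units treated with Z = 0 are always-takers and
    units untreated with Z = 1 are never-takers; compliers are the rest.
    """
    n_by_z = {0: 0, 1: 0}
    treated_by_z = {0: 0, 1: 0}
    for z, d in records:
        n_by_z[z] += 1
        treated_by_z[z] += d
    p_d1_z0 = treated_by_z[0] / n_by_z[0]
    p_d1_z1 = treated_by_z[1] / n_by_z[1]
    return {
        "always_taker": p_d1_z0,
        "never_taker": 1.0 - p_d1_z1,
        "complier": p_d1_z1 - p_d1_z0,  # first-stage difference
    }


# Hypothetical sample with 100 units per assignment arm.
records = [(0, 1)] * 20 + [(0, 0)] * 80 + [(1, 1)] * 70 + [(1, 0)] * 30
shares = compliance_shares(records)
print(shares)
```

The complier share coincides with the first-stage difference, which is why a non-zero first stage (assumption 3 in Table 2) is needed for the complier population to be non-empty.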
Table 2: Abadie et al. [2002]’s Approach: Key Features

Y-scale: continuous
D-scale: binary
Assumptions (with comments):
  1. $\exists$ a binary instrument Z s.t. $(Y_1, Y_0, D_0, D_1) \perp Z$: ignorability (exclusion restriction and random assignment of Z)
  2. $P[Z = 1 \mid X] \in (0, 1)$: non trivial assignment
  3. $E[D_1 \mid X] \neq E[D_0 \mid X]$: first stage/instrument relevance
  4. $P[D_1 \geq D_0 \mid X] = 1$: monotonicity
  5. $Q_{Y \mid X, D, D_1 > D_0}(\tau) = \alpha(\tau) D + X'\beta(\tau)$: linearity of the conditional quantile for compliers
References: Abadie et al. [2002]; see also Abadie [2003], Imbens and Rubin [1997], Angrist et al. [1996].
2.2 Chesher’s Causal Chain Model8

Recently, Chesher made several contributions to the literature on identification of non-additive structural functions (Chesher [2001], Chesher [2003a], Chesher [2003b], Chesher [2005], Chesher [2007a], Chesher [2007b], Chesher [2008]). Chesher’s approach relies on four main ingredients: (i) the triangular structure of the equation system both in the observable variables (Chesher [2001]) and in the latent variables (error terms/stochastic components); (ii) the monotonicity of the (unknown) non-additive structural functions in the latent variables; (iii) local invariance conditions on conditional quantiles; (iv) no excess variation: in other words, the number of latent variables admitted in the model is not greater than the number of endogenous regressors. Assumptions (i), (ii), (iii) characterize the structures admitted by Chesher.9 The assumptions of Chesher’s causal chain model (Koenker [2005]) guarantee: (i) local (non-parametric) point identification of features of the impact distribution, namely the exogenous impact function, a functional of the joint distribution of (D, Y), if Y, D, Z are continuous random variables; (ii) local (non-parametric) interval identification of the quantile treatment effects if Y is a continuous random variable, D assumes only a finite number of values, and Z has a rich support and a strong impact on the distribution of D.10 Currently, the set-up has not been generalized to the case in which: (i) D is binary; (ii) Z is binary; (iii) there are more latent variables than observed outcomes.

8 Koenker [2005] refers to the model proposed by Chesher using this terminology.
9 Here we consider a 2 equation system, one for Y, one for D, but the set-up applies to the case where there are M equations provided that at most M latent variables are admitted.
10 If one is willing to make parametric assumptions that guarantee some kind of smoothness of the conditional quantile function, point identification may be achieved. Essentially, continuous variation or some kind of smoothness is required for an unambiguous definition of conditional quantiles (Chesher [2001]). In Chesher [2002], there is no requirement on the scale of the regressors and of the instruments but a completeness condition has to be met. More generally, “parametric restrictions allow ‘interpolation’ between points at which nonparametric identification is feasible” ([Chesher, 2005, p.1525]).

We first consider the set-up proposed by Chesher [2003a] (see the first row of Table 3). The feature whose identification is sought in this framework is a functional of the joint distribution of Y and D conditional on X and Z, namely the exogenous impact function, that is the marginal change in Y corresponding to an exogenous variation in D for an individual with specific levels of the observed characteristics X, Z and of the unobserved characteristics $\nu$ and $\varepsilon$. Formally, the exogenous impact function (see [Chesher, 2001, p.3]) is described in equation (8)

$$\pi(\tau_\varepsilon, \tau_\nu, x, z) \overset{\text{def}}{\equiv} \nabla_D h_Y\big(h_D(x, z, Q_\nu(\tau_\nu)), x, Q_\varepsilon(\tau_\varepsilon), Q_\nu(\tau_\nu)\big) = \nabla_D h_Y(d, x, \varepsilon, \nu), \tag{8}$$
where $\nabla_D$ denotes the partial derivative with respect to D. Chesher [2001] and Chesher [2003a] show that, under the monotonicity and quantile invariance assumptions and assuming the triangular structure of the model, the exogenous impact function in (8) can be written as a function of observed conditional quantiles (see equation (9) below). The equivariance of quantile estimates to monotone transformations plays a key role in achieving the result.11 To prove it, Chesher first shows how the $\tau$-th conditional quantile of Y can be written as a function of the quantiles of the latent variables $\nu$ and $\varepsilon$; then he clarifies the links between the exogenous impact function as defined in (8) and observed conditional quantiles. By applying the chain rule12 and exploiting the model assumptions, he obtains the relation reported in equation (9), where $\pi(\tau_\varepsilon, \tau_\nu, x, z)$ is defined as in equation (8).

$$\pi(\tau_\varepsilon, \tau_\nu, x, z) = \nabla_D Q_{Y \mid D, X, Z}\big(\tau_\varepsilon, Q_{D \mid X, Z}(\tau_\nu, x, z), x, z\big) + \frac{\nabla_Z Q_{Y \mid D, X, Z}\big(\tau_\varepsilon, Q_{D \mid X, Z}(\tau_\nu, x, z), x, z\big)}{\nabla_Z Q_{D \mid X, Z}(\tau_\nu, x, z)} \tag{9}$$
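To see how the correction term in (9) operates, consider a simple linear triangular system without covariates. This is an illustrative special case chosen here, not a model taken from Chesher's papers; the coefficients $\beta$, $\rho$, $\gamma$ are hypothetical, with $\gamma \neq 0$, $\varepsilon$ independent of $(\nu, Z)$ and $\nu$ independent of $Z$.

```latex
\begin{align*}
Y &= \beta D + \varepsilon + \rho \nu, \qquad D = \gamma Z + \nu
  && \text{(assumed structure)} \\
Q_{D \mid Z}(\tau_\nu, z) &= \gamma z + Q_\nu(\tau_\nu) \\
Q_{Y \mid D, Z}(\tau_\varepsilon, d, z)
  &= (\beta + \rho)\, d - \rho \gamma z + Q_\varepsilon(\tau_\varepsilon)
  && \text{since } \nu = d - \gamma z \text{ given } (D, Z) = (d, z) \\
\nabla_D Q_{Y \mid D, Z} &= \beta + \rho, \qquad
\nabla_Z Q_{Y \mid D, Z} = -\rho \gamma, \qquad
\nabla_Z Q_{D \mid Z} = \gamma \\
\pi(\tau_\varepsilon, \tau_\nu, z)
  &= (\beta + \rho) + \frac{-\rho \gamma}{\gamma} = \beta
   = \nabla_D h_Y(d, \varepsilon, \nu)
\end{align*}
```

In this special case the derivative of the observed conditional quantile with respect to D overstates the structural effect by $\rho$, and the ratio term in (9) removes exactly that amount.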
As noted by Chesher [2001] and Koenker and Ma [2006], the formula in (9) suggests retrieving the exogenous impact function by applying a “bias correction”.13

11 Let $h(\cdot)$ be a non decreasing function and Y a random variable; then $Q_{h(Y)}(\tau) = h(Q_Y(\tau))$. The list and discussion of the equivariance properties of quantile estimates can be found in Koenker [2005].
12 See Simon and Blume [1994], chapter 14, sections 5-6.
13 Recall that, in the linear regression model, one of the alternative ways to obtain the two stage least squares estimator is to estimate via ordinary least squares a regression of the dependent variable Y on the full set of covariates, excluding the instrument, and on the residual of the first stage regression, i.e. the one where the potentially endogenous covariate D is regressed on the instrument Z. The rationale is that the estimate of the parameter associated with D in this auxiliary regression will deliver an inflated estimate of the impact of D on Y because of the potential endogeneity of D in the Y equation. Introducing the first stage residual as a control variate helps to isolate the exogenous variation in D: the estimate of the impact of D on Y is “corrected” so that it reflects only the variation in Y induced by changes in D driven by Z, i.e. (given the model assumptions) exogenous changes in D.

Estimation in Chesher’s causal chain model may rely on analog estimation procedures; Koenker and Ma [2006] develop estimation and inference for the continuous case (Y, D, Z continuous r.v.) using a control-function approach. The framework developed by Chesher [2003a] for models with continuous outcomes and covariates can be extended to the case where outcomes and/or covariates exhibit discrete variation (see Table 3 and Table 4). Chesher [2007a] maintains the triangular structure of the model in both the observed and unobserved variables when the outcome and the covariates are discrete but requires that the treatment variable is continuous. In this case, structural partial difference functions are point identified locally. The structural partial difference function is defined by equation (10).

$$\Delta h_Y(z, z', z^*) \equiv h_Y(\bar{d}(z'), z^*, \bar{\varepsilon}, \bar{\nu}) - h_Y(\bar{d}(z), z^*, \bar{\varepsilon}, \bar{\nu}) \tag{10}$$

Equation (11) presents an estimator directly based on the identifying correspondence illustrated by Chesher [2007a] in Theorems 1 and 2. Under the restrictions imposed by Chesher [2007a], it can be shown that $\Delta h_Y(z, z', z^*) \equiv \Delta h_Y(z, z')$, and the latter can be estimated using the estimator in equation (11), which is very similar to the Wald estimator for linear parametric models.

$$\frac{\widehat{Q}_{Y \mid DZ}(\bar{\varepsilon} \mid \bar{d}(z'), z', \bar{\nu}) - \widehat{Q}_{Y \mid DZ}(\bar{\varepsilon} \mid \bar{d}(z), z, \bar{\nu})}{\widehat{Q}_{D \mid Z}(\bar{\nu} \mid Z = z') - \widehat{Q}_{D \mid Z}(\bar{\nu} \mid Z = z)} \tag{11}$$

When the endogenous treatment variable takes only a finite number of values, more restrictive assumptions are required even to guarantee only local partial identification. Indeed, the extension by Chesher [2005] to the case with discrete endogenous variables maintains the triangular structure of the model in the observed variables but further restricts the structure of the model in the unobserved components and imposes a rank-order condition on the unobserved factors driving the outcome and treatment equation. As a consequence, although the treatment parameter still exhibits some heterogeneity after conditioning on observed covariates, this heterogeneity in the response to changes of the level of the treatment is restricted to be driven by a single latent factor instead of two distinct factors as in Chesher [2003a]. The model by Chernozhukov and Hansen [2005] illustrated in the next section imposes the same restriction.
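The linear-model analogy behind the “bias correction” (two stage least squares obtained by adding the first stage residual as a control variate) can be verified numerically. The sketch below uses a simulated linear design with one instrument and no other covariates; the data generating process and all variable names are hypothetical, and the two-regressor OLS is solved by hand to keep the example dependency-free.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance (divisor n) of two equally long lists."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


random.seed(0)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]
nu = [random.gauss(0, 1) for _ in range(n)]
eps = [0.8 * v + random.gauss(0, 1) for v in nu]  # eps depends on nu: D is endogenous
d = [zi + vi for zi, vi in zip(z, nu)]
y = [2.0 * di + ei for di, ei in zip(d, eps)]  # hypothetical true effect of D is 2

# First stage: OLS of D on Z (with intercept); keep the residuals.
gamma = cov(z, d) / cov(z, z)
d_bar, z_bar = mean(d), mean(z)
resid = [di - d_bar - gamma * (zi - z_bar) for di, zi in zip(d, z)]

# Control-function step: OLS of Y on D and the first stage residual.
# With an intercept, the slopes solve the 2x2 system in sample covariances.
s_dd, s_rr, s_dr = cov(d, d), cov(resid, resid), cov(d, resid)
s_dy, s_ry = cov(d, y), cov(resid, y)
beta_cf = (s_rr * s_dy - s_dr * s_ry) / (s_dd * s_rr - s_dr ** 2)

beta_iv = cov(z, y) / cov(z, d)  # 2SLS with a single instrument
beta_ols = s_dy / s_dd           # ignores the endogeneity of D

print(beta_ols, beta_iv, beta_cf)
```

In this design the control-function coefficient on D reproduces the instrumental variable estimate up to floating-point error, while plain OLS is inflated by the positive dependence between the two error terms.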
Table 3: Chesher’s Causal Chain Model: Key Features (continuous endogenous variables). For a complete summary see also Table 4.

References: Chesher [2001], Chesher [2003a]
Y-scale: continuous (scalar)
D-scale: continuous (scalar)
Assumptions:
  $Y = h_Y(D, X, \varepsilon, \nu)$
  $D = h_D(X, Z, \nu)$
  $h_Y(\cdot, \cdot, \varepsilon, \cdot)$ strictly monotonic
  $h_Y(D, \cdot, \cdot, \cdot)$ differentiable
  $h_Y(\cdot, \cdot, \cdot, \nu)$ differentiable
  $h_D(\cdot, \cdot, \nu)$ strictly monotonic
  $h_D(\cdot, Z, \cdot)$ differentiable
  $h_D(\cdot, \cdot, \nu)$ differentiable
  $\tau_\varepsilon$-conditional quantile $\perp (\nu, X, Z)$
  $\tau_\nu$-conditional quantile $\perp (X, Z)$
  Z continuous r.v.
Comments: It guarantees local point identification of features of the impact distribution. It can be extended to the case where there are M endogenous variables provided that there are at most M unobserved latent variables. M represents the irreducible number that remains after combination of free outcomes and covariates.

References: Chesher [2002], Chesher [2007a]
Y-scale: continuous/discrete (scalar)
D-scale: continuous (scalar)
Assumptions:
  $Y = h_Y(D, X, \varepsilon, \nu)$
  $D = h_D(X, Z, \nu)$
  Z discrete r.v., $\bar{V}$ set of values of Z
  $\forall z', z, z^* \in \bar{V}$:
  (i) $D \mid Z = z$ continuous r.v.
  (ii) $h_Y(\bar{d}(z), z, \varepsilon, \bar{\nu})$ strictly monotonic function of $\varepsilon$
  (iii) $\varepsilon(z) = \varepsilon(z') = \varepsilon$
  (iv) $h_Y(\bar{d}(z'), z', \bar{\varepsilon}, \bar{\nu}) = h_Y(\bar{d}(z'), z, \bar{\varepsilon}, \bar{\nu})$, useful if $\bar{d}(z') \neq \bar{d}(z)$ (rank condition)
  quantile function difference $\equiv Q_{Y \mid DZ}(\bar{\varepsilon} \mid Q_{D \mid Z}(\bar{\nu} \mid z'), z') - Q_{Y \mid DZ}(\bar{\varepsilon} \mid Q_{D \mid Z}(\bar{\nu} \mid z), z)$
  structural function partial difference $\equiv h_Y(\bar{d}(z'), z^*, \bar{\varepsilon}, \bar{\nu}) - h_Y(\bar{d}(z), z^*, \bar{\varepsilon}, \bar{\nu})$
Comments: It guarantees local point identification of features of the impact distribution. Restrictions (i) and (ii) guarantee local point identification of the conditional quantile function of $Y \mid Z$. Restrictions (i) to (iv) guarantee identification of the structural function partial difference from the quantile function difference. The number of partial differences that can be identified depends on both the support of the instrument Z and the strength of the relationship between Z and D. Parametric restrictions may increase the number of identifiable features via extrapolation.
Table 4: Chesher’s Causal Chain Model: Key Features (discrete endogenous variables). For a complete summary see also Table 3.

References: Chesher [2003b], Chesher [2005]
Y-scale: continuous or discrete or mixed discrete-continuous (scalar)
D-scale: discrete (scalar)
Assumptions:
  $Y = h_Y(D, Z, \varepsilon)$
  $D = h_D(Z, \nu)$
  $\varepsilon \sim$ p.d.f. $f_\varepsilon$, continuous r.v.
  $\nu \sim U(0, 1)$
  $h_Y(\cdot, \cdot, \varepsilon)$ weakly monotonic
  $Q_{Y \mid D, Z}(\varepsilon \mid \nu, z)$ monotonic in $\nu$
  $p_k(z) \equiv F_{D \mid Z}(d_k \mid z)$, $p_0(z) \equiv 0$, $k \in \{1, \dots, M\}$
  $\exists z_m, z_{m-1}$: $p_m(z_m) \leq \tau_D \leq p_{m-1}(z_{m-1})$
  Z discrete r.v.
Comments: It guarantees local interval identification of QTEs ([Chesher, 2005, p. 1530]). Compared to the case outlined in Chesher [2001] and Chesher [2003a], it requires that $\nu$ is excluded from the Y equation. Results do not hold for extreme values of the support of D but only for interior points of the support of D; in particular they do not hold for the binary case. The scale of Z plays a relevant role for identification: the combination of a weak impact of Z on the conditional distribution of D and a sparse support may lead to underidentification.

References: Chesher [2007b], Chesher [2008]
Y-scale: discrete (scalar)
D-scale: discrete/continuous (scalar)
Assumptions:
  $Y = h_Y(D, \varepsilon)$
  $\varepsilon \sim U(0, 1)$
  codomain of Y independent of D
  $h_Y(\cdot, \varepsilon)$ weakly monotonic, thus $h_Y(d, \varepsilon) = y_m$, $m \in \{1, \dots, M\}$, if $p_{m-1}(d) < \varepsilon \leq p_m(d)$, s.t. $p_0(d) = 0$, $p_M(d) = 1$ $\forall d$
  $\exists Z$ s.t. $\forall \tau \in (0, 1)$: $\text{Prob}[\varepsilon \leq \tau \mid Z = z] = \tau$ $\forall z \in \mathcal{Z}$
  Z may have discrete or continuous support; it does not have to be a r.v.
Comments: It is a local partial identification result. The size of the identified sets depends on the density of the support of Y and on the richness of the support of Z with respect to the density of the support of D. Differently from previous work by Chesher in the area, here he considers the identifying power of a single-equation model. The model imposes no restrictions on the way the endogenous variable D is generated. Z is excluded from the structural function $h(\cdot, \cdot)$.
2.3 Instrumental Variable Quantile Regression Model by Chernozhukov and Hansen14 Chernozhukov and Hansen [2005] propose an Instrumental Variables (IV) IV model of quantile treatment effects to recover causal effects in quantile regression with endogenous factors. Their identification result apply to the whole population, not only for the subpopulation of compliers (as in section 2.1), and it is obtained imposing some structure on the evolution of ranks across treatment states The approach is applicable when the outcome variable Y is continuous, while D and Z can be either continuous or discrete random variables. Their model implies that the conditional quantiles qYd |X [τ |x] can be retrieved from the conditional quantiles of the observed outcome Y given X and Z. Chernozhukov and Hansen [2005] focus on the conditional quantiles of potential outcomes qYd |X (τ, x) (but denote it as q(d, x, τ )). The model assumes that: (i) the potential outcomes can be represented as Yd = qYd |X (Ud , X) and q(·) is strictly increasing in Ud , thus ruling out non-continuous outcome variables; (ii) conditional on X, the rank variables Ud are independent of Z; (iii) conditional on X and Z, D is determined by a random component V whose correlation with the Ud ’s drives endogeneity; (iv) conditional on X, Z and V, Ud are equal to each other or identically distributed. The latter condition is referred to as rank similarity, which essentially weakens rank invariance to allow for nonsystematic differences in ranks between potential outcomes. In words, rank similarity implies that rank invariance holds allowing for “slippages” in one’s rank that reflect some random variation. Chernozhukov and Hansen [2005]’s crucial result (theorem 1 in the paper) can equivalently be stated by equating to zero the τ -th quantile of the random variable Y − qYD |X : Pr[Y − qYD |X [τ |x] ≤ 0|X, Z] = τ .
(12)
The latter formulation suggests an estimation procedure in two steps: first, compute the conditional quantiles of the random variable $Y - q_{Y_D}[\tau|X]$ given X and Z; then, choose as the estimate of $q_{Y_D}[\tau|X]$ the one that minimizes the absolute value of the coefficient associated with Z in the first step. Note that this procedure requires a candidate value of $q_{Y_D}[\tau|X]$ in the first step. Chernozhukov and Hansen [2006] consider linear quantile regression models $q_{Y_D}[\tau|X] = \alpha D + \beta X$ and suggest taking a grid over α to compute $Y - q_{Y_D}[\tau|X]$ in the first step. The grid should be centered around the two-stage quantile regression estimate, that is, the estimate of α in the quantile regression of Y on $\hat{D}$ and X, where $\hat{D} \equiv E[D|Z]$. The number of grid points to be evaluated increases with the dimension of D.15

14 This section borrows from the review in Battistin and Fort [2008].
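As an illustration, the two-step grid procedure described above can be sketched in Python for the simplest possible case. This is a minimal sketch under stated assumptions, not the implementation used in section 4: the instrument is binary and there are no covariates X, so the first-step quantile regression of Y − αD on a constant and Z is saturated and its coefficient on Z reduces (up to the interpolation convention for sample quantiles) to the difference between the τ-quantiles of the two instrument groups. The function name `ivqr_grid_search`, the simulated design and all parameter values are hypothetical.

```python
import numpy as np

def ivqr_grid_search(y, d, z, tau, alpha_grid):
    """Grid search over alpha for a binary instrument and no covariates.

    For each candidate alpha, the coefficient on z in the tau-quantile
    regression of (y - alpha * d) on a constant and z is computed as the
    difference between the tau-quantiles of the two instrument groups.
    The estimate is the alpha that makes this coefficient closest to zero.
    """
    gaps = []
    for alpha in alpha_grid:
        resid = y - alpha * d
        coef_z = np.quantile(resid[z == 1], tau) - np.quantile(resid[z == 0], tau)
        gaps.append(abs(coef_z))
    return float(alpha_grid[int(np.argmin(gaps))])

# Simulated data (hypothetical design, not the data used in the paper).
rng = np.random.default_rng(0)
n = 20_000
u = rng.uniform(size=n)              # rank variable, independent of z
z = rng.integers(0, 2, size=n)       # binary instrument
w = rng.uniform(size=n)              # extra noise in the selection rule
d = (0.5 * z + 0.5 * u + 0.5 * w > 0.6).astype(float)  # endogenous: depends on u
y = u + d * (1.0 + u)                # q(d, tau) = tau + d * (1 + tau)

tau = 0.5
alpha_grid = np.linspace(0.5, 2.5, 201)
alpha_hat = ivqr_grid_search(y, d, z, tau, alpha_grid)

# Contrast of observed quantiles by treatment status, for comparison only.
naive = np.quantile(y[d == 1], tau) - np.quantile(y[d == 0], tau)
print(f"true QTE: {1 + tau:.2f}  grid estimate: {alpha_hat:.2f}  naive contrast: {naive:.2f}")
```

In this design the ranks are invariant across treatment states, so the τ-quantile of Y − αD does not depend on Z exactly when α equals the true quantile treatment effect 1 + τ. With covariates or a non-binary instrument, the group quantiles would have to be replaced by a proper quantile regression in the first step.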
3 Identification of Quantile Treatment Effects: A Critical Discussion

Heckman et al. [1997] first stressed the importance of learning about the features of the distribution of programme impacts for evaluation purposes. It is possible, and it is instrumental for the purpose of this paper, to link the approaches presented in sections 2.1 to 2.3 to the ideas developed by Heckman et al. [1997] and Heckman and Smith [1998]. Heckman et al. [1997] carefully discuss the assumptions needed for the identification of the impact distribution from experimental and non-experimental data. The authors propose two different approaches to achieve identification: (i) a model that preserves perfect positive dependence among programme outcomes under different treatment states; (ii) the use of assumptions on the agents’ participation decision rules and their implications for the structure of dependence between potential outcomes.16 Formally, we have

(i) Under perfect positive rank dependence and assuming absolutely continuous distributions of the potential outcomes, the deterministic impact function is $\delta(y_0) = F_1^{-1}(F_0(y_0|D=1)) - y_0$; under randomization, this function can be identified since $F_0(y_0|D=1) = F_0(y_0|D=0) = F_0(y_0)$ and $F_1(y_1|D=1) = F_1(y_1|D=0) = F_1(y_1)$, and $F_0(y_0|D=0)$ and $F_1(y_1|D=1)$ are observed by the analyst. The function can also be identified using non-experimental data provided that the assumptions underlying matching are satisfied, namely $F_0(y_0|D=1,X) = F_0(y_0|D=0,X) = F_0(y_0|X)$ and $0 < \Pr[D=1|X] < 1$. The latter set of assumptions rules out the case where participation depends on $Y_0$ and allows for the possibility that participation depends on $Y_1$. This approach can be extended to the case of a discrete treatment.

(ii) The model assumes that individuals are rational, risk averse (their utility function U(·) is concave and common across individuals and treatment states) and are uncertain about the future realization of their outcomes in the treatment state ($Y_1$) and in the no-treatment state ($Y_0$) but know $F_0(y_0)$ and $F_1(y_1)$. It follows that $D = 1\left(\int U(y_0)\,dF_0(y_0) \le \int U(y_1)\,dF_1(y_1)\right)$, where 1(A) takes the value 1 if condition A is satisfied and 0 otherwise. One should have $\Pr[Y_1 \ge Y_0 | Y_0 = y_0, D = 1] = 1$, that is ‘in the participating population (. . .) all of the mass of the Y1 distribution conditional on Y0 is to the right of y0’. Without assuming any dependence between $Y_0$ and $Y_1$ but only conditioning on realized values, one has the following implication: $\Pr[Y_1 \le y_1 | Y_0 = y_0, D = 1]$ is non-increasing in $y_0$ for all $y_1$. The joint distribution of $Y_1$ and $Y_0$ for participants can be obtained from observed experimental data if one is willing to make the additional assumption that participation is determined only by programme gains ($Y_1 - Y_0$) and that these are independent of the base state $Y_0$ for participants, as shown by [Heckman and Honore, 1990, theorem 9, theorem 12] and by [Heckman and Smith, 1998, appendix A]. As highlighted by Heckman and Smith [1998], this is the case for the standard Roy model. If a generalized Roy model is considered, where unobservables other than those in the outcome equation enter the participation equation, the result no longer holds [Heckman and Smith, 1998, p. 21].

15 This estimation procedure will be exploited in section 4.
16 Bounds are not considered here. Similarly, we do not review approaches that do not restrict the set of admissible impact distributions, such as permutations.

The model proposed by Chernozhukov and Hansen [2005] is closely linked to the assumptions proposed within approach (i) by Heckman et al. [1997] whereas the
approaches by Abadie et al. [2002] and Chesher [2003a] are closely linked to the assumptions proposed within approach (ii). We start the discussion of the different approaches from a standard Roy model with separable additive errors and a binary treatment D.17

$$Y_1 = \mu_1(X) + U_1, \qquad E[U_1|X] = 0 \tag{13}$$

$$Y_0 = \mu_0(X) + U_0, \qquad E[U_0|X] = 0 \tag{14}$$

$$D = 1(Y_1 \ge Y_0) \tag{15}$$
Using equation (13) and equation (14), the equation for the observed outcome Y becomes

$$Y = \mu_0(X) + D\,\Big[\,\overbrace{\bar{\alpha}(X)}^{\mu_1(X)-\mu_0(X)} + \overbrace{\varepsilon}^{U_1-U_0}\,\Big] + U_0 \tag{16}$$

From equation (16), the conditional quantile of the distribution of the observed outcome is

$$Q_Y[\tau_y|X,D] = \mu_0(X) + D\,\delta\Big(\overbrace{F_\varepsilon^{-1}(\tau_\varepsilon),\,X}^{F^{-1}_{U_1,U_0}(\tau_{U_1},\tau_{U_0}),\,X}\Big) + F_{U_0}^{-1}(\tau_{U_0}) \tag{17}$$
Note that the model in equation (16) and equation (15) is characterized by additive errors and by the assumption that the impact of the treatment D is additive. These assumptions restrict the heterogeneity of the impact of D on Y in equation (17), imposing that $\delta(F_\varepsilon^{-1}(\tau_\varepsilon), X) = \bar{\alpha}(X) + \varepsilon$ where $\varepsilon \equiv U_1 - U_0$. Since neither Abadie et al. [2002]’s nor Chernozhukov and Hansen [2005]’s approach assumes error additivity, we refer to the general expression reported in (17).

17 The equivalence result by Vytlacil [2002] suggests that this is a sensible approach to contrast the approaches of Chernozhukov and Hansen [2005] and Abadie et al. [2002]. Indeed, Vytlacil [2002] shows that, given the assumptions of Abadie et al. [2002] and $\Pr[D = 1|X] \in (0, 1)$, it is possible to construct a latent-index model that generates $D_0$ and $D_1$, and that latent-index assignment with constant coefficients and independent errors implies the assumptions of Abadie et al. [2002]. As will be apparent in the text, the Roy model has proven a useful starting point to link the above mentioned strategies with the one proposed by Chesher [2003a] as well.
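The role of self-selection in the Roy model of equations (13) to (15) can be illustrated with a small simulation. The sketch below is only illustrative and rests on assumptions that are not in the text: no covariates, independent standard normal errors, and the hypothetical values µ1 = 0.5 and µ0 = 0. It compares the true quantile treatment effect at the median with the contrast between the observed medians of participants and non-participants.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
u1 = rng.normal(size=n)
u0 = rng.normal(size=n)

y1 = 0.5 + u1             # equation (13) with mu_1 = 0.5 (hypothetical value)
y0 = 0.0 + u0             # equation (14) with mu_0 = 0 (hypothetical value)
d = y1 >= y0              # equation (15): participation when the gain is non-negative
y = np.where(d, y1, y0)   # observed outcome

tau = 0.5
# Difference between the tau-quantiles of the two potential outcome distributions.
true_qte = np.quantile(y1, tau) - np.quantile(y0, tau)
# Contrast of observed tau-quantiles by participation status.
naive_contrast = np.quantile(y[d], tau) - np.quantile(y[~d], tau)

print(f"true QTE at the median: {true_qte:.3f}")
print(f"observed contrast:      {naive_contrast:.3f}")
```

Because participants are selected on high values of U1 relative to U0 and non-participants on high values of U0 relative to U1, the two observed distributions are both shifted to the right and their contrast differs from the quantile treatment effect. Any other choice of error distribution or of µ1 − µ0 would change the size of the discrepancy.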
An asymptotically unbiased estimate of the coefficient of D in equation (17), for any fixed $\tau_y$, cannot be obtained by standard quantile regression methods. To identify the parameter, one needs, alternatively:

(a) to set $F_\varepsilon^{-1}(\tau_\varepsilon)$ to zero conditional on X, $U_0$;

(b) to impose perfect dependence between $F_\varepsilon^{-1}(\tau_\varepsilon)$ and $F_{U_0}^{-1}(\tau_{U_0})$ conditional on X or, equivalently, between $F_{U_1|U_0}^{-1}(\tau_{U_1})$ and $F_{U_0}^{-1}(\tau_{U_0})$ conditional on X, $U_0$;

(c) to be able to vary $F_\varepsilon^{-1}(\tau_\varepsilon)$ independently of $F_{U_0}^{-1}(\tau_{U_0})$ conditional on X or, equivalently, to be able to vary $F_{U_1|U_0}^{-1}(\tau_{U_1})$ and $F_{U_0}^{-1}(\tau_{U_0})$ independently.
Under set-up (a), the treatment parameter is fully determined by the level of X.18 Set-up (b) implies that the treatment parameter may be heterogeneous across individuals with the same level of X, but the heterogeneity is driven by only one latent factor, namely the relative position of the individual in the distribution of $Y_0$, the potential outcome in the absence of the treatment. Set-up (c) allows the treatment parameter to depend on both $U_0$ and ε or, equivalently, on $U_0$ and $U_1|U_0$.19

Chernozhukov and Hansen [2005], in the spirit of Heckman et al. [1997]’s approach (i), propose the identifying assumptions of independence of $U_d$, $d \in \{0, 1\}$, from an instrumental variable Z, conditional on X, and rank invariance or rank similarity of the latent factors $U_d$. Chernozhukov and Hansen [2005] assume that the selection equation takes the form $D \equiv f(Z, X, V)$ for some unknown function f and a random vector V. As clarified below, the assumption imposes no additional restrictions when the rank invariance assumption is maintained but is crucial for identification under rank similarity. The joint distribution of the latent factors conditional on X, Z can be written as

$$F_{U_1,U_0}(u_1, u_0 | Z = z, X = x) = F_{U_0}(u_0 | Z = z, X = x)\, F_{U_1|U_0}(u_1 | U_0 \le u_0, Z = z, X = x).$$

18 This assumption, implicit in the model, imposes restrictions across quantile regressions run at different quantiles of Y.
19 The joint distribution of $U_1$, $U_0$ can indeed be written as the product of the marginal distribution of $U_0$ and the conditional distribution of $U_1$ given $U_0$.
Under the independence and the rank invariance assumptions, for all $(u_0, u_1)$ such that $u_0 \le u_1$ (so that $U_0 \le u_0$ implies $U_0 \le u_1$), one has

$$\begin{aligned}
F_{U_1,U_0}(u_1, u_0 | Z = z, X = x) &= \mathrm{Prob}[U_0 \le u_0 \cap U_1 \le u_1 | Z = z, X = x] \\
&= \mathrm{Prob}[U_0 \le u_0 \cap U_0 \le u_1 | Z = z, X = x] \\
&= \mathrm{Prob}[U_0 \le u_0 | Z = z, X = x]\,\mathrm{Prob}[U_0 \le u_1 | Z = z, X = x, U_0 \le u_0] \\
&= \mathrm{Prob}[U_0 \le u_0 | Z = z, X = x] \cdot 1.
\end{aligned}$$

Recall that in the IV-QTE model by Chernozhukov and Hansen, the assumption of rank similarity is equivalent to the assumption that $U_1$ and $U_0$ are identically distributed conditional on V, the random component in the selection equation. Under the independence and the rank similarity assumptions, for all $(u_0, u_1)$ such that $u_0 \le u_1$,

$$\begin{aligned}
F_{U_1,U_0}(u_1, u_0 | Z = z, X = x) &= \mathrm{Prob}[U_0 \le u_0 \cap U_1 \le u_1 | Z = z, X = x] \\
&= \int \mathrm{Prob}[U_0 \le u_0 \cap U_1 \le u_1 | Z = z, X = x, V = v]\, dP[V = v | X = x, Z = z] \\
&= \int \mathrm{Prob}[U_0 \le u_0 \cap U_0 \le u_1 | Z = z, X = x, V = v]\, dP[V = v | X = x, Z = z] \\
&= \int \mathrm{Prob}[U_0 \le u_0 | Z = z, X = x, V = v]\,\mathrm{Prob}[U_0 \le u_1 | Z = z, X = x, U_0 \le u_0, V = v]\, dP[V = v | X = x, Z = z] \\
&= \int \mathrm{Prob}[U_0 \le u_0 | Z = z, X = x, V = v] \cdot 1\, dP[V = v | X = x, Z = z] \\
&= \mathrm{Prob}[U_0 \le u_0 | Z = z, X = x].
\end{aligned}$$
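The rank invariance case can also be checked numerically. When the two ranks coincide, the joint event {U0 ≤ a0, U1 ≤ a1} is the same as {U0 ≤ min(a0, a1)}, so the joint distribution collapses to the marginal distribution of U0. The sketch below is only an illustration on simulated uniform ranks, without conditioning variables; the function names and evaluation points are hypothetical. For contrast it also evaluates the joint distribution when the two ranks are drawn independently.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
u0 = rng.uniform(size=n)
u1 = u0.copy()                      # rank invariance: same rank in both treatment states
u1_indep = rng.uniform(size=n)      # contrast case: ranks drawn independently

def joint_cdf(first, second, a0, a1):
    """Empirical Prob[first <= a0 and second <= a1]."""
    return float(np.mean((first <= a0) & (second <= a1)))

def marginal_cdf(sample, a):
    """Empirical Prob[sample <= a]."""
    return float(np.mean(sample <= a))

points = [(0.3, 0.7), (0.7, 0.3), (0.5, 0.5)]   # (a0, a1) pairs
checks = [
    np.isclose(joint_cdf(u0, u1, a0, a1), marginal_cdf(u0, min(a0, a1)))
    for a0, a1 in points
]

invariant_joint = joint_cdf(u0, u1, 0.3, 0.7)          # equals the marginal at 0.3
independent_joint = joint_cdf(u0, u1_indep, 0.3, 0.7)  # close to 0.3 * 0.7
print(checks, round(invariant_joint, 3), round(independent_joint, 3))
```

Under rank invariance a single latent factor governs both potential outcomes, which is why the joint distribution carries no information beyond the marginal one; with independent ranks the joint distribution is instead the product of the marginals.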
To sum up, the IV-QTE model restricts the heterogeneity in the treatment parameter. The model assumes that the variability in the treatment parameter is driven by the observed variables X and by a single unobserved factor. Besides, the model assumes that the latent factor driving the heterogeneity in the treatment parameter is the same unobserved factor determining the relative ranking of the individuals in the outcome distribution.

Abadie et al. [2002], in the spirit of Heckman et al. [1997]’s approach (ii), suggest exploiting the information delivered by observed participation decisions, albeit with a different take. Unlike what Heckman et al. [1997] propose, the set-up by Abadie et al. [2002] does not exploit the dependence between potential outcomes induced by self-selection but uses the instrumental variable to identify a subpopulation where conditional independence holds. The identification result by Abadie et al. [2002] is valid for an instrument-specific subpopulation of the participants, the so-called compliers. As reviewed in section 2.1, the binary instrumental variable Z defines a partition of the population into four subpopulations or types (Imbens [2006]). When the instrument is binary, the type indicator
is (D(Z = 0), D(Z = 1)) where D(Z = z) is the potential treatment status. The type indicator can be seen as a function of Z and ε, T(ε, Z), where ε is the single latent factor that affects the selection into the treatment group, as in the IV-QTE model. In the Roy model, ε is equivalent to $U_1 - U_0$. The endogeneity of the treatment in equation (17) is driven by the correlation between ε and $U_0$. Conditioning on the type, under the assumptions of Abadie et al. [2002], the treatment is exogenous; in particular, it is independent of $U_0$.20 This means that, conditioning on the type, $F_\varepsilon^{-1}(\tau_\varepsilon)$ can be varied independently of $F_{U_0}^{-1}(\tau_{U_0})$ and, as a consequence, the parameters in equation (17) can be identified for the types for which D is not degenerate (Imbens [2006]). Note that the treatment parameter can be identified on the support of $U_0|\text{type}, Z$ or $U_0|T(\varepsilon, Z), Z$.21 The approach proposed by Abadie et al. [2002] allows the treatment parameter to vary across types but is informative about the value of the treatment parameter only for the types for which D is not degenerate. When both the treatment and the instrument are binary, there is a single subpopulation for which the condition holds and the approach identifies the treatment parameter for the subpopulation of compliers. When the instrument takes more values, there are more complier types and this approach is informative about the heterogeneity in the impact across subpopulations (Imbens [2006]). In this sense, Abadie et al. [2002] rely on weaker identifying assumptions than Chernozhukov and Hansen [2005].

Now consider the case where the treatment D is continuous. Imbens [2006] provides a description of the model that is well-suited to the two cases discussed so far. He considers a two-equation recursive model such as the one described by equation (18) and equation (19).

$$D = h_d(Z, \varepsilon) \tag{18}$$

$$Y = h_y(D, \nu) \tag{19}$$

20 In Abadie et al. [2002] the participation equation (15) is D = 1(p(Z) > U) where 1(·) is the indicator function, p(Z) = Pr[D = 1|Z = z] is a non-trivial function of Z and U is assumed to be independent of Z. See [Vytlacil, 2002, p. 336] for the characterization of U.
21 Conditioning on observed exogenous covariates X is omitted for simplicity.
Imbens [2006] highlights that, in this model, the type indicator may be characterized as $F_{D|Z}(D|Z)$, under the assumptions that $h_d(z, \varepsilon)$ is continuous in both its arguments and strictly monotone in ε, and normalizing ε on the interval [0, 1]. This suggests using $F_{D|Z}(D|Z)$ to estimate the type and then exploiting the conditional independence condition to identify the parameters of interest. When D is not continuous, the functional $F_{D|Z}(D|Z)$ does not provide a point estimate of the type, but it can still be informative by providing an interval estimate for the type. Although the approach by Abadie et al. [2002] has not been formally extended to deal with the continuous treatment case, the framework proposed by Imbens [2006] seems the natural extension. As in the framework by Abadie et al. [2002], the treatment parameters will be identified for the subpopulation of types where D exhibits some variation (see Imbens [2006]). Crucial for this approach are the monotonicity assumption and the fact that in equation (18), the selection equation, there is a single unobserved factor. When there are two unobserved factors in the selection equation, the strategy breaks down, essentially because conditioning on the type is not enough to control for all the unobserved factors affecting the heterogeneity of the treatment parameter that are correlated with the unobserved factor affecting the outcome.

The model described by equation (18) and equation (19) with continuous outcomes differs from the one proposed by Chesher [2003a] because this system is triangular in the observables but not in the unobservables. The model proposed by Chesher has the structure described by equation (20) and equation (21), with the additional requirement that Z is also a continuous random variable.

$$D = h_d(Z, \varepsilon) \tag{20}$$

$$Y = h_y(D, \varepsilon, \nu) \tag{21}$$
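In both recursive systems above, the selection equation is used to recover the latent factor ε through the conditional distribution of D given Z. The following sketch illustrates this step on simulated data. It is a sketch under stated assumptions, not part of the original analysis: the selection function, the three-valued instrument and the sample size are hypothetical, ε is normalized to be uniform on [0, 1], and h_d is strictly increasing in ε, so that $F_{D|Z}(D|Z)$ equals ε.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30_000
z = rng.integers(0, 3, size=n)        # discrete instrument with three values
eps = rng.uniform(size=n)             # latent type, normalized to [0, 1]
d = 1.0 + z + 2.0 * eps + z * eps     # hypothetical h_d(z, eps), strictly increasing in eps

# Estimate F_{D|Z}(d | z) at each observation by the within-group empirical rank of d.
eps_hat = np.empty(n)
for value in np.unique(z):
    idx = np.flatnonzero(z == value)
    ranks = np.empty(idx.size)
    ranks[np.argsort(d[idx])] = np.arange(1, idx.size + 1)
    eps_hat[idx] = ranks / idx.size

max_error = float(np.max(np.abs(eps_hat - eps)))
print(f"largest gap between the estimated and the true type: {max_error:.4f}")
```

The estimated type can then be used as a conditioning variable in the outcome equation. If D had mass points, the within-group ranks would be tied and would only bracket the type, which corresponds to the interval estimate mentioned in the text.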
Take equation (20) as a selection equation. Note that, as in all the models reviewed so far, there is only one unobserved factor affecting the selection equation and this drives the endogeneity of the treatment D in the outcome equation (21). In the standard Roy model, the outcome equation (16) has one unobserved component, $U_0$, because the second unobserved component $\varepsilon \equiv U_1 - U_0$ only enters in the coefficient of D. The endogeneity of D in the outcome equation is driven by the fact that ε and $U_0$ cannot be independently varied in the population. Chesher [2003a] does not impose the conditional independence condition, i.e. (ε, ν) ⊥ Z, but only the weaker assumption of conditional quantile invariance, i.e. that the conditional quantile of ν given Z, ε is invariant with respect to the conditioning variables and that the conditional quantile of ε given Z does not depend on Z.22 Chesher [2003a] imposes neither rank similarity nor rank invariance conditions between ε and ν. However, he imposes more structure. The triangular structure in both observed and unobserved variables is crucial for identification. The exclusion of Z from the outcome equation makes it possible to isolate the variation in D from the variation in ε. More specifically, the selection equation is exploited to get an estimate of ε. Conditioning on ε and on the other exogenous variables in the outcome equation, since there is no feedback and under the quantile invariance assumptions, ν can be varied independently of ε. The strategy breaks down in the case of feedback. Similarly to what happens in the framework by Abadie et al. [2002], the treatment parameter in the framework proposed by Chesher [2003a] may depend on two latent factors, both ν and ε. However, the causal-chain model by Chesher [2003a] does not impose restrictions on the values of ν at which the treatment parameters can be identified. In this sense, it is more general than the previous approaches.

22 Observed conditioning variables X are omitted for simplicity.
4 Illustration: Examining the Effect of Education on the Distribution of Wages in Europe23

In this section, we illustrate the identification approaches discussed in the previous section using the analysis of the effects of education on the distribution of wages in Europe (Brunello et al. [2009]) as a guiding example. We distinguish four cases according to the scale on which the endogenous regressor (schooling in the example) and the instrument are recorded (see Table 5). We then identify and estimate the returns to schooling in the different set-ups using the available approaches. More specifically, following the classification in Table 5, we are able to contrast point estimates under the identifying assumptions of Abadie et al. [2002] and Chernozhukov and Hansen [2005] in set-ups B1 and B2, and point estimates under the identifying assumptions of Chernozhukov and Hansen [2005] and Chesher [2003a] in set-up A2.24

The recoding of both the treatment variable and the instrument is not innocuous: it affects both the choice of the identification strategy and the interpretation of the results. To cast the analysis in the framework by Abadie et al. [2002] we coded the instrument and the treatment variable on a binary scale as illustrated in Table 5. To provide support to the monotonicity assumption, we plot the empirical cumulative

23 The ECHP data used in this paper are from the release to the project “Labour market and living condition: dynamic measures and comparative analysis between Italy and other principal member states in the European Union” within the project “Dynamics and inertia in the Italian labour market and policy evaluation (databases, measurement issues, substantive analyses)” supported by the Italian Ministry of Education and Scientific Research. This paper uses data from SHARE 2004. The SHARE data collection has been primarily funded by the European Commission through the 5th framework programme (project QLK6-CT-2001-00360 in the thematic programme Quality of Life). Additional funding came from the US National Institute on Aging (U01 AG09740-13S2, P01 AGO05842, P01 AGO8291, P30 AG12815, Y1-AG-4553-01 and OGHA 04-064). Data collection in Austria (through the Belgian Science Policy Office) and Switzerland (through BBW/OFES/UFES) was nationally funded. The SHARE data set is introduced in Börsch-Supan et al. (2005); methodological details are in Börsch-Supan and Jürges (2005).
24 To achieve point identification under the causal chain model of Chesher [2003a] in this application, we exploit the parametric specification of the quantile regressions.
Figure 3: Testing the monotonicity assumption of the Abadie et al. [2002]’s approach.
[Figure: two panels, “Residuals e.c.d.f, Males” and “Residuals e.c.d.f, Females”. Vertical axis: empirical c.d.f. (0 to 1). Horizontal axis: 1st stage residuals (−10 to 20). Lines: pre-reform and post-reform.]
Note. The OLS regressions of yedu (years of education) used to calculate the residuals were run separately for men and women and included a constant, country dummies, a country specific quadratic trend over cohorts, survey dummies, age, age squared, the lagged country specific unemployment rate and GDP per capita, the country and gender specific labour force participation rate at the estimated time of labour market entry, the country specific GDP per head and unemployment rate at the age affected by the country specific reform.
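The comparison summarized in Figure 3 can be sketched as follows. The code is a minimal illustration on simulated data, not the computation behind the figure: the single covariate `x`, the size of the reform effect and the tolerance used to flag a crossing are all hypothetical, and only the variable name `yedu` is taken from the note above. Residuals are obtained from an OLS regression that excludes the reform indicator, and the empirical c.d.f. of the residuals is then compared across pre-reform and post-reform cohorts.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
x = rng.normal(size=n)                      # stand-in for the individual characteristics
post = rng.integers(0, 2, size=n) == 1      # True for cohorts affected by the reform
yedu = 10.0 + 0.5 * x + 1.0 * post + rng.normal(size=n)   # simulated years of education

# OLS of yedu on the characteristics only; the reform indicator is left out.
design = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(design, yedu, rcond=None)
resid = yedu - design @ beta

def ecdf(sample, grid):
    """Empirical c.d.f. of `sample` evaluated at each point of `grid`."""
    return np.searchsorted(np.sort(sample), grid, side="right") / sample.size

grid = np.linspace(resid.min(), resid.max(), 400)
gap = ecdf(resid[post], grid) - ecdf(resid[~post], grid)

# A post-reform distribution shifted to the right has a c.d.f. that lies below
# the pre-reform one, so the gap should never be noticeably positive.
max_crossing = float(gap.max())
largest_shift = float(gap.min())
print(f"largest positive gap: {max_crossing:.4f}, largest negative gap: {largest_shift:.4f}")
```

A gap that stays at or below zero over the whole grid corresponds to two curves that do not cross, which is the pattern the text reads as support for monotonicity. A formal test of stochastic dominance would require accounting for sampling variability, which this sketch does not attempt.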
Table 5: Characterization of the set-up and available identification strategies within each set-up.

Instrument \ Treatment | $1(S_{ij} \ge \bar{s}_j)$∗ | $S_{ij}$ ≡ actual years of schooling
$1(C_{ij} \ge \bar{c}_j)$∗ | case B1: Abadie et al. [2002], Chernozhukov and Hansen [2005] | case A1: Chernozhukov and Hansen [2005]
prescribed years of compulsory education | case B2: Abadie et al. [2002], Chernozhukov and Hansen [2005] | case A2: Chernozhukov and Hansen [2005], Chesher [2003a]

Legend: $S_{ij}$: education or qualification of individual i of country j; $C_{ij}$: cohort of individual i of country j; $\bar{c}_j$: pivot cohort for country j.
∗ $\bar{c}_j$ is the pivot cohort for country j; $\bar{s}_j$ is the post-reform minimum school leaving age in country j. See Table 1 in Brunello et al. [2009].
distribution function of the residuals of an OLS regression of the variable years of schooling on country and individual specific characteristics. Figure 3 reports two lines denoting the pre- and post-reform conditional distribution of years of education net of country and individual specific characteristics. The solid line denotes the pre-reform conditional distribution whereas the dashed line denotes the post-reform distribution. The graphs show that the post-reform distribution is shifted to the right with respect to the pre-reform distribution and that the two distributions do not cross, for both males and females. Note that the difference between the two distributions vanishes at higher quantiles. Given the pattern of the graphs reported in Figure 3, we conclude that the evidence supports the monotonicity assumption, conditional on some country and individual specific characteristics.

A more serious issue is related to the consequences of potential violations of the exclusion restrictions (see Angrist and Imbens [1995]).25 Angrist and Imbens [1995] show that ‘when a variable treatment is incorrectly parametrized as binary, the resulting estimate tends to be too large relative to the average per-unit effect along the length of the response function. On the other hand, by virtue of the monotonicity, the sign’ of the average causal response ‘is still consistently estimated’. Brunello et al. [2009] show that the instrument ycomp (the years of mandatory schooling prescribed by the law) affects the conditional distribution of years of education at more than one point. This is confirmed by the graphs in Figure 3, as discussed above. Thus, we expect that the estimates of the causal impact of education on wages computed under the approach by Abadie et al. [2002] will be higher than those reported by Brunello et al. [2009].

25 Fort thanks J. Angrist for pointing this out during a discussion on early stages of a related research project.

Table 7 reports the estimates of the first stage equation. The figures suggest that the proportion of women affected by the reforms is nearly 6% whereas the proportion of men affected is around 3%. There is also some indication that the instrument is weak for men. Table 6 presents some evidence of heterogeneity of the impact of Z on the distribution of D, with the impact being generally stronger at lower quantiles of the treatment distribution. Table 9 reports estimates of the association between education and wages by gender. The estimates using the years of education or a qualification indicator are broadly consistent. The figures in the top panel of Table 9 suggest that an additional year of schooling is associated with an increase in wages of between 2% and 4% for males and of between 3% and 5% for females. The figures in the bottom panel of Table 9 report the association between wages and the qualification of individuals. More specifically, the binary treatment indicator D takes the value 1 if the individual has completed upper secondary education and zero otherwise.
The figures suggest that completing upper secondary education is associated with a nearly 18% increase in wages for men and with an increase of between 17% and 30% for women. Since upper secondary education (ISCED 3) corresponds to between 8 and 12 years of formal education in the countries considered (see Table 1 in Brunello et al. [2009]) and to 9 years of formal education on average, we can reconcile the
Table 6: First stage effect of the instrument Z on the endogenous regressor/treatment D. Sample size: 18,328. Identification strategy: Chesher [2003a]’s causal chain model. Treatment (D∗): actual years of schooling (approximately continuous). Instrument (Z): years of compulsory schooling prescribed by law (discrete).26
Coef.
Males τa = 0.10 τa = 0.30 τa = 0.50 0.120∗∗∗ 0.056∗∗∗ 0.354∗∗∗
τa = 0.70 τ = 0.90 0.026 0.078∗∗∗
F-test p-value
2146.6 0.000
4.86 .027
Coef.
Females τa = 0.10 τa = 0.30 τa = 0.50 0.072∗∗∗ 0.284∗∗∗ 0.416∗∗∗
τa = 0.70 τ = 0.90 0.135∗∗∗ 0.219∗∗∗
F-test p-value
643.8 0.000
57.4 0.000
s.e.
s.e.
0.007
0.016
0.012
19.1 0.000
0.020
195.4 0.000
0.006
307.6 0.000
0.007
88.7 0.000
0.035
0.029
0.071
0.13 0.714
0.065
4.26 0.039
figures of the two panels. The association between education and wages increases moving from the bottom to the top decile of the conditional wage distribution, which suggests that a rise in education is associated with increasing conditional wage inequality.
References

A. Abadie. Bootstrap Tests for Distributional Treatment Effects in Instrumental Variable Models. Journal of the American Statistical Association, 97(457):284–292, 2002.
A. Abadie. Semiparametric Instrumental Variable Estimation of Treatment Response Models. Journal of Econometrics, 113:231–263, 2003.
Table 7: First stage effect of the instrument Z on the endogenous regressor/treatment D. Sample size: 18,328. Identification strategy: Abadie et al. [2002]. Linear probability model estimates. Treatment (D): whether the individual completed at least $\bar{s}_j$ years of education (binary)∗. Instrument (Z): whether the individual was born after the pivot year (binary).

Proportion of compliers | Males | Females
Coef. (s.e.) | 0.029∗∗ (0.012) | 0.054∗∗∗ (0.013)
F-test (p-value) | 5.50 (0.02) | 18.35 (0.00)

Note. ∗ $\bar{s}_j$ is the post-reform minimum school leaving age in country j. See Table 1 in Brunello et al. [2009]. Each regression, run separately for males and females, included a constant, country dummies, a country-specific quadratic trend, survey dummies, age, age squared, the lagged country specific unemployment rate and GDP per capita, the country and gender specific labour force participation rate at the estimated time of labour market entry, the country specific GDP per head and unemployment rate at the age affected by the country specific reform. Robust standard errors are shown in parentheses. Three stars, two stars and one star for statistically significant coefficients at the 1%, 5%, 10% level.
Table 8: Effect of the instrument Z on the endogenous regressor/treatment D. Sample size: 18,328. Identification strategy: Chernozhukov and Hansen [2005]. A. Treatment (D): actual years of schooling (discrete). A1. Instrument (Z): whether the individual was born after the pivot year (binary). A2. Instrument (Z): years of compulsory schooling prescribed by law (discrete). B. Treatment (D): whether the individual completed at least $\bar{s}_j$ years of education (binary)∗. B1. Instrument (Z): whether the individual was born after the pivot year (binary). B2. Instrument (Z): years of compulsory schooling prescribed by law (discrete).

Coefficient of Z in E[D|Z].

Set-up | Males: Coef. (s.e.) | Males: F-test (p-value) | Females: Coef. (s.e.) | Females: F-test (p-value)
A1 | 0.048 (0.136) | 0.13 (0.72) | 0.614∗∗∗ (0.150) | 16.70 (0.00)
A2 | 0.117∗ (0.069) | 2.92 (0.09) | 0.243∗∗∗ (0.077) | 9.91 (0.00)
B1 | 0.029∗∗ (0.012) | 5.50 (0.02) | 0.054∗∗∗ (0.013) | 18.35 (0.00)
B2 | 0.020∗∗∗ (0.006) | 11.95 (0.00) | 0.026∗∗∗ (0.006) | 20.91 (0.00)

Note. ∗ $\bar{s}_j$ is the post-reform minimum school leaving age in country j. See Table 1 in Brunello et al. [2009]. Each regression, run separately for males and females, included a constant, country dummies, a country-specific quadratic trend, survey dummies, age, age squared, the lagged country specific unemployment rate and GDP per capita, the country and gender specific labour force participation rate at the estimated time of labour market entry, the country specific GDP per head and unemployment rate at the age affected by the country specific reform. Robust standard errors are shown in parentheses. Three stars, two stars and one star for statistically significant coefficients at the 1%, 5%, 10% level.
Table 9: Association between education and wages in Europe. Standard quantile regression. Outcome (Y): logarithm of real gross wage in PPS (continuous). Treatment A: D ≡ years of schooling (approximately continuous). Treatment B: D ≡ 1(if the individual completed at least $\bar{s}_j$ years of education)∗ (binary).

Treatment A: D ≡ years of schooling
 | τ = 0.10 | τ = 0.30 | τ = 0.50 | τ = 0.70 | τ = 0.90
Males | 0.019∗∗∗ (0.002) | 0.026∗∗∗ (0.001) | 0.033∗∗∗ (0.001) | 0.035∗∗∗ (0.001) | 0.039∗∗∗ (0.002)
Females | 0.027∗∗∗ (0.003) | 0.037∗∗∗ (0.001) | 0.043∗∗∗ (0.001) | 0.050∗∗∗ (0.001) | 0.051∗∗∗ (0.002)

Treatment B: D ≡ 1(if the individual completed at least $\bar{s}_j$ years of education)∗
 | τ = 0.10 | τ = 0.30 | τ = 0.50 | τ = 0.70 | τ = 0.90
Males | 0.172∗∗∗ (0.023) | 0.152∗∗∗ (0.012) | 0.184∗∗∗ (0.015) | 0.179∗∗∗ (0.018) | 0.178∗∗∗ (0.026)
Females | 0.224∗∗∗ (0.033) | 0.237∗∗∗ (0.017) | 0.260∗∗∗ (0.014) | 0.260∗∗∗ (0.019) | 0.324∗∗∗ (0.030)

Note. ∗ $\bar{s}_j$ is the post-reform minimum school leaving age in country j. See Table 1 in Brunello et al. [2009]. Each regression, run separately for males and females, included a constant, country dummies, a country-specific quadratic trend, survey dummies, age, age squared, the lagged country specific unemployment rate and GDP per capita, the country and gender specific labour force participation rate at the estimated time of labour market entry, the country specific GDP per head and unemployment rate at the age affected by the country specific reform. Robust standard errors are shown in parentheses. Three stars, two stars and one star for statistically significant coefficients at the 1%, 5%, 10% level.
Table 10: Heterogeneous impact of education on the distribution of wages in Europe. Identification strategy: Chesher [2003a]’s causal chain model. Treatment: actual years of schooling (approximately continuous). Outcome: logarithm of real gross wage in PPS (continuous). Treatment (D): actual years of schooling (approximately continuous). Instrument (Z): years of compulsory schooling prescribed by law (discrete).
Males τa = 0.10
τu = 0.10 τu = 0.30 0.0583∗∗∗ 0.0748∗∗∗ 0.004
0.004
0.003
0.004
0.006
τa = 0.30
0.0625∗∗∗
0.0476∗∗∗
0.0462∗∗∗
0.0420∗∗∗
0.0503∗∗∗
τa = 0.50
∗∗∗
∗∗∗
∗∗∗
∗∗∗
0.0469∗∗∗
0.007
0.0665 0.006
0.004
0.0492
τu = 0.50 τu = 0.70 τu = 0.90 0.0598∗∗∗ 0.0550∗∗∗ 0.0555∗∗∗ 0.005
0.003
0.0432
0.0478
0.004
0.004
0.004
0.006 0.006
τa = 0.70
0.0486
0.0396
0.0448
0.0411
0.0471∗∗∗
τa = 0.90
0.0468∗∗∗
0.0329∗∗∗
0.0384∗∗∗
0.0332∗∗∗
0.0452∗∗∗
Mean effect+
0.0598
0.0456
0.0465
0.0429
0.0499
Females τa = 0.10
τu = 0.10 τu = 0.30 0.0780∗∗∗ 0.0952∗∗∗ 0.007
0.004
0.004
0.005
0.007
τa = 0.30
0.0838∗∗∗
0.0701∗∗∗
0.0713∗∗∗
0.0730∗∗∗
0.0702∗∗∗
τa = 0.50
∗∗∗
∗∗∗
∗∗∗
∗∗∗
0.0646∗∗∗
∗∗∗
0.006 0.006
0.007
0.0847 0.006
∗∗∗
∗∗∗
0.004
0.004
0.004 0.004
0.003
0.0679
0.004
0.003
0.005 0.006
τu = 0.50 τu = 0.70 τu = 0.90 0.0759∗∗∗ 0.0820∗∗∗ 0.0788∗∗∗ 0.004
0.003
0.0707
0.0690
0.004
0.003
0.004
∗∗∗
0.006 0.006
τu = 0.70
0.0689
0.0573
0.0588
0.0615
0.0612∗∗∗
τu = 0.90
0.0631
0.0502
0.0527
0.0555
0.0567
Mean effect+
0.0792
0.0645
0.0655
0.0655
0.0674
∗∗∗
0.005
0.006
∗∗∗
∗∗∗
∗∗∗
0.003
0.003
0.003
0.003
0.003
0.004
0.005
0.006
Note: see [Brunello et al., 2009, Table 5] and details in the text. τu denotes the quantile of the distribution of labour market fortune and τa denotes the quantile of the distribution of ability. Bootstrapped standard errors are based on 100 replications.
+ Mean effect: average (over τa) quantile treatment effect.
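The two summary quantities mentioned in the note to Table 10 can be illustrated with a minimal sketch. It assumes that the mean effect is a simple unweighted average over the τa rows of a grid of quantile treatment effects, and that the standard errors come from a generic nonparametric bootstrap with 100 replications; the resampling scheme is not described in the note, so that part is an assumption. The function names and the toy grid are hypothetical and are not taken from the paper.

```python
import numpy as np


def mean_effect_over_ability(qte_grid):
    """Average a (tau_a x tau_u) grid of QTE estimates over its tau_a rows.

    Returns one mean effect per tau_u column (unweighted average; assumption).
    """
    grid = np.asarray(qte_grid, dtype=float)
    return grid.mean(axis=0)


def bootstrap_se(sample, statistic, n_replications=100, seed=0):
    """Nonparametric bootstrap standard error of statistic(sample).

    Resamples observations with replacement; this generic scheme is an
    assumption, not necessarily the one used for Table 10.
    """
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = sample.shape[0]
    draws = [
        statistic(sample[rng.integers(0, n, size=n)])
        for _ in range(n_replications)
    ]
    return float(np.std(draws, ddof=1))


# Hypothetical 2 x 3 grid (rows: tau_a, columns: tau_u); not from Table 10.
toy_grid = [[0.06, 0.05, 0.04],
            [0.04, 0.03, 0.02]]
mean_effects = mean_effect_over_ability(toy_grid)

# Bootstrap standard error of the sample median of a toy sample.
toy_sample = np.arange(100.0)
median_se = bootstrap_se(toy_sample, np.median)
```

In an application, `statistic` would be replaced by the estimator of the quantile treatment effect at a given (τa, τu) pair, applied to the resampled data.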
Table 11: Heterogeneous impact of education on the distribution of wages in Europe. Identification strategy: Abadie et al. [2002]. Outcome (Y): logarithm of real gross wage in PPS (continuous). Treatment (D): whether the individual completed at least s̄_j^∗ years of education (binary). Instrument (Z): whether the individual was born after the pivot year (binary).
               τ = 0.10   τ = 0.30   τ = 0.50   τ = 0.70   τ = 0.90
Males
Females
Table 12: Heterogeneous impact of education on the distribution of wages in Europe. Identification strategy: Chernozhukov and Hansen [2005]. Outcome (Y): logarithm of real gross wage in PPS (continuous). Treatment (D): whether the individual completed at least s̄_j^∗ years of education (binary). Instrument (Z): whether the individual was born after the pivot year (binary).
               τ = 0.10   τ = 0.30   τ = 0.50   τ = 0.70   τ = 0.90
Males
Females
Table 13: Heterogeneous impact of education on the distribution of wages in Europe. Identification strategy: Chernozhukov and Hansen [2005]. Outcome (Y): logarithm of real gross wage in PPS (continuous). Treatment (D): actual years of schooling (approximately continuous). Instrument (Z): years of compulsory schooling prescribed by law (discrete).
               τ = 0.10   τ = 0.30   τ = 0.50   τ = 0.70   τ = 0.90
Males
Females
A. Abadie, J. Angrist, and G. Imbens. Instrumental Variable Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings. Econometrica, 70(1):91–117, January 2002.
J.D. Angrist and G.W. Imbens. Two-Stage Least Squares Estimation of Average Causal Effects in Models with Variable Treatment Intensity. Journal of the American Statistical Association, 90(430):431–442, June 1995.
J.D. Angrist, G.W. Imbens, and D.B. Rubin. Identification of Causal Effects Using Instrumental Variables. Journal of the American Statistical Association, 91(434):444–455, June 1996. With discussion.
E. Battistin and M. Fort. What’s Missing from Policy Evaluation: Identification and Estimation of the Distribution of Treatment Effects. Atti della XLIV Riunione Scientifica, Società Italiana di Statistica, pages 127–134, 2008.
M. Bitler, J. Gelbach, and H. Hoynes. What Mean Impacts Miss: Distributional Effects of Welfare Reform Experiments. American Economic Review, 96(4):988–1012, September 2006.
G. Brunello, M. Fort, and G. Weber. Changes in Compulsory Schooling, Education and the Distribution of Wages in Europe. The Economic Journal, 119, March 2009. Forthcoming.
N. Cabrera and V.J. Evans. Welfare Reform and Its Consequences: What Questions are Left Unanswered? Poverty Research News 4(6), Joint Center for Poverty Research, Nov.-Dec. 2000. Issue: “What Policymakers Want to Know”.
V. Chernozhukov and C. Hansen. An IV Model of Quantile Treatment Effects. Econometrica, 73(1):245–261, January 2005.
V. Chernozhukov and C. Hansen. Instrumental Quantile Regression Inference for Structural and Treatment Effect Models. Journal of Econometrics, 2006.
A. Chesher. Exogenous Impact and Conditional Quantile Functions. Working Paper CWP01/01, Centre for Microdata Methods and Practice, October 2001.
A. Chesher. Instrumental Values. Working Paper CWP17/02, Centre for Microdata Methods and Practice, March 2002.
A. Chesher. Identification in Nonseparable Models. Econometrica, 71:1405–1441, 2003a.
A. Chesher. Nonparametric Identification Under Discrete Variation. Working Paper CWP19/03, Centre for Microdata Methods and Practice, December 2003b.
A. Chesher. Identification - Course Notes. Course Notes for the 2003 EEA Summer School in Microeconometrics, September 2003c.
A. Chesher. Nonparametric Identification Under Discrete Variation. Econometrica, 73(5):1525–1550, September 2005.
A. Chesher. Instrumental Values. Journal of Econometrics, 139:15–34, 2007a. Special Issue: Endogeneity, Instruments and Identification.
A. Chesher. Endogeneity and Discrete Outcomes. Working Paper CWP05/07, Centre for Microdata Methods and Practice, March 2007b.
A. Chesher. Instrumental Variable Models for Discrete Outcomes. Working Paper CWP30/08, Centre for Microdata Methods and Practice, November 2008.
D.R. Cox. The Planning of Experiments. Wiley, New York, 1958.
K. Doksum. Empirical Probability Plots and Statistical Inference for Nonlinear Models in the Two-Sample Case. The Annals of Statistics, 2:267–277, 1974.
R.F. Engle, D.F. Hendry, and J.-F. Richard. Exogeneity. Econometrica, 51(2):277–304, 1983.
S. Firpo. Efficient Semiparametric Estimation of Quantile Treatment Effects. Econometrica, 75(1):259–276, January 2007.
R.A. Fisher. Design of Experiments. Oliver and Boyd, London, 1935.
J. Heckman and B.E. Honore. The Empirical Content of the Roy Model. Econometrica, 58(5):1121–1149, 1990.
J. Heckman and J. Smith. Evaluating the Welfare State. Working Paper 6542, National Bureau of Economic Research, May 1998.
J.J. Heckman, J. Smith, and N. Clements. Making the Most Out of Programme Evaluations and Social Experiments: Accounting for Heterogeneity in Programme Impacts. Review of Economic Studies, 64(4):487–535, Oct. 1997. Special Issue: Evaluation of Training and Other Social Programmes.
G. Imbens. Nonadditive Models with Endogenous Regressors. Paper available at http://elsa.berkeley.edu/~imbens/wp.shtml, February 2006.
G.W. Imbens and D.B. Rubin. Estimating the Outcome Distribution for Compliers in Instrumental Variables Models. Review of Economic Studies, 64:555–574, 1997.
R. Koenker. Quantile Regression. Cambridge University Press, 2005.
R. Koenker and L. Ma. Quantile Regression Methods for Recursive Structural Equation Models. Journal of Econometrics, 134(2):471–506, October 2006.
E.L. Lehmann. Nonparametrics: Statistical Methods Based on Ranks. San Francisco, CA: Holden-Day, 1974.
J. Neyman. Statistical Problems in Agricultural Experiments. Journal of the Royal Statistical Society, 2(2):107–180, 1923. Supplement.
P. Rosenbaum and D. Rubin. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika, 70(1):41–55, 1983.
P.R. Rosenbaum. From Association to Causation in Observational Studies: The Role of Tests of Strongly Ignorable Treatment Assignment. Journal of the American Statistical Association, 79(385):41–48, 1984.
D. Rubin. Bayesian Inference for Causal Effects: The Role of Randomization. The Annals of Statistics, 6(1):34–58, 1978.
D. Rubin. Randomization Analysis of Experimental Data: The Fisher Randomization Test. Comment. Journal of the American Statistical Association, 75(371):591–593, September 1980.
D. Rubin. [On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9.] Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science, 5(4):472–480, November 1990.
D.B. Rubin. Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies. Journal of Educational Psychology, 66:688–701, 1974.
A. Sen. Handbook of Income Distribution, chapter Social Justice and the Distribution of Income. North-Holland, Amsterdam, The Netherlands, 2000. A.B. Atkinson and F. Bourguignon, editors.
A. Sen. On Economic Inequality. Oxford, Clarendon Press, 1997.
C.P. Simon and L. Blume. Mathematics for Economists. Norton & Company, New York, 1994.
E. Vytlacil. Independence, Monotonicity, and Latent Variable Models: An Equivalence Result. Econometrica, 70:331–341, 2002.