ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS
Jeff Wooldridge
Michigan State University
BGSE/IZA Course in Microeconometrics
July 2009

1. Introduction
2. Homogeneous Treatment Effect and Treatment Effects as a Function of Observables
3. Heterogeneous Treatment Effects and LATE
4. Control Function Methods for Heterogeneous Treatment Effects
5. A Different Approach to Allowing Heterogeneous Treatment Effects
6. Nonlinear Response Models
1. Introduction
∙ Consider now the case where unconfoundedness, or selection on observables, does not hold. Allow units to select into treatment based on unobservables that affect the response.
∙ Even if eligibility is randomly assigned, actual enrollment in programs may suffer from self-selection. But randomized eligibility can often be used as an instrumental variable.
∙ Identification and estimation are very simple in the constant treatment effect case. Contemporary interest is often in the heterogeneous treatment effect case.
∙ Three general approaches: (i) Study simple IV estimators to see if they estimate anything of value under fairly weak assumptions; (ii) Add more assumptions and try to estimate a parameter such as $\tau_{ate}$; (iii) Use “structural” economic models without parametric assumptions, and try to identify interesting policy or structural parameters.
2. Homogeneous Treatment Effect and Treatment Effects as a Function of Observables
∙ In the simplest case we assume $Y_1 - Y_0 = \tau$ is constant. Then
$$Y = Y_0 + W(Y_1 - Y_0) = Y_0 + \tau W \equiv \mu_0 + \tau W + V_0,$$
where $\mu_0 \equiv E[Y_0]$ and $V_0 \equiv Y_0 - \mu_0$. So, we have a simple regression model with a constant slope.
∙ Suppose we have a single instrumental variable, Z. (At this point, Z could be binary, continuous, or anything in between.) Recall the two requirements for an instrument:
$$\text{Cov}(Z, W) \neq 0$$
$$\text{Cov}(Z, V_0) = \text{Cov}(Z, Y_0) = 0$$
∙ The first requirement (relevance) is fairly easily checked. The second (exogeneity) is maintained. Note that having Z predetermined or randomly assigned does not guarantee its exogeneity: the value of Z could affect $Y_0$ through other channels.
∙ Conveniently, the nature of Y is not restricted, but the treatment effect is assumed constant. $\tau = \tau_{ate} = \tau_{att}$ is consistently estimated by the IV estimator. [Remember, IV is not usually unbiased.]
∙ Even if we restrict attention to consistency, it is not true that one should use a “slightly” endogenous instrument rather than OLS.
∙ Why? It is easy to show:
$$\text{plim}\,\hat{\tau}_{OLS} = \tau + \left(\frac{\sigma_{V_0}}{\sigma_W}\right)\text{Corr}(W, V_0)$$
$$\text{plim}\,\hat{\tau}_{IV} = \tau + \left(\frac{\sigma_{V_0}}{\sigma_W}\right)\frac{\text{Corr}(Z, V_0)}{\text{Corr}(Z, W)}$$
So, if $\text{Corr}(Z, W)$ is small – that is, Z is a “weak” instrument – then even a small correlation between Z and $V_0$ can produce a larger asymptotic bias than OLS.
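∙ A small simulation sketch (mine, not from the slides) of these plim formulas. All data-generating values are hypothetical, chosen so the instrument is weak and slightly endogenous:

```python
# Sketch: a weak, "slightly" endogenous instrument can have a larger
# asymptotic bias than OLS. All parameter values are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
N, tau = 100_000, 1.0                 # large N approximates the plim
corr_zv = 0.02                        # mild instrument endogeneity
corr_zw = 0.05                        # weak first stage

Z = rng.normal(size=N)
V0 = 0.5 * rng.normal(size=N) + corr_zv * Z      # V0 mildly related to Z
W = corr_zw * Z + 0.3 * V0 + rng.normal(size=N)  # W depends on V0: endogenous
Y = tau * W + V0

tau_ols = np.cov(W, Y)[0, 1] / np.var(W, ddof=1)   # Cov(W,Y)/Var(W)
tau_iv = np.cov(Z, Y)[0, 1] / np.cov(Z, W)[0, 1]   # Cov(Z,Y)/Cov(Z,W)
print(f"true tau = {tau}, OLS = {tau_ols:.3f}, IV = {tau_iv:.3f}")
# Here the IV bias (~0.36) exceeds the OLS bias (~0.07), exactly as the
# plim formulas above predict when Corr(Z, W) is small.
```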
∙ In economics, it is very common to see IV estimates that are larger in magnitude than OLS estimates. Usually, explanations other than a bad IV are given; measurement error and heterogeneous treatment effects (later) are among them.
∙ Weak instruments lead to large asymptotic standard errors, too:
$$\text{Avar}\left[\sqrt{N}(\hat{\tau}_{IV} - \tau)\right] = \frac{\sigma^2_{V_0}}{\sigma^2_W\,\rho^2_{Z,W}}.$$
When $\rho^2_{Z,W}$ is small, the asymptotic variance can be very large. The formula for the OLS estimator omits $\rho^2_{Z,W}$.
Adding Other Covariates and Instruments
∙ Suppose we need to add covariates before our instruments are appropriately exogenous:
$$Y_g = \mu_g + (x - \psi_x)\beta_g + u_g, \quad g = 0, 1,$$
where $\mu_g = E[Y_g]$ and $\psi_x = E[x]$.
∙ We have made a functional form assumption once we assume $u_g$ has a zero mean conditional on x. We also assume that netting out x makes z exogenous. Easiest way to impose both:
$$E[u_g|x, z] = 0, \quad g = 0, 1.$$
∙ In effect, we have made the standard exclusion restrictions plus a linear functional form. Write
$$\begin{aligned} Y &= Y_0 + W(Y_1 - Y_0) \\ &= \mu_0 + (x - \psi_x)\beta_0 + u_0 + W(\mu_1 - \mu_0) + W(x - \psi_x)(\beta_1 - \beta_0) + W(u_1 - u_0) \\ &= \alpha_0 + \tau W + x\beta_0 + W(x - \psi_x)\delta + u_0 + W(u_1 - u_0), \end{aligned}$$
where $\tau = \mu_1 - \mu_0$, $\delta = \beta_1 - \beta_0$, and $\alpha_0 = \mu_0 - \psi_x\beta_0$.
∙ The last term – the interaction of the treatment and the unobserved gain from treatment, $e \equiv u_1 - u_0$ – causes difficulties.
∙ In the simplest case, there is no heterogeneity except in the intercept: $\delta = 0$, $u_1 = u_0$. Then
$$Y = \alpha_0 + \tau W + x\beta_0 + u_0,$$
and we can estimate the parameters by 2SLS using instruments $(1, x, z)$.
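∙ A minimal sketch of this 2SLS estimator on simulated data. Everything below (names, parameter values) is hypothetical; the just-identified IV formula $(Z'X)^{-1}Z'Y$ is coded directly:

```python
# Sketch: 2SLS for Y = a0 + tau*W + x*b0 + u0 with instruments (1, x, z).
import numpy as np

rng = np.random.default_rng(1)
N = 5_000
x = rng.normal(size=(N, 2))
z = rng.normal(size=N)                       # excluded instrument
c = rng.normal(size=N)                       # unobservable driving selection
W = (0.5 * z + x @ [0.2, -0.1] + c > 0).astype(float)
u0 = 0.6 * c + rng.normal(size=N)            # u0 correlated with c => W endogenous
Y = 1.0 + 2.0 * W + x @ [1.0, 0.5] + u0      # true tau = 2.0

X = np.column_stack([np.ones(N), W, x])      # regressors (1, W, x)
Zmat = np.column_stack([np.ones(N), z, x])   # instruments (1, z, x)

beta_iv = np.linalg.solve(Zmat.T @ X, Zmat.T @ Y)   # just-identified IV
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
print("tau_IV  =", round(beta_iv[1], 3))     # close to 2.0
print("tau_OLS =", round(beta_ols[1], 3))    # biased upward here
```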
∙ How might we exploit the binary nature of W?
1. (Not Recommended): Take the expected value conditional on all exogenous variables:
$$E[Y|x, z] = \alpha_0 + \tau E[W|x, z] + x\beta_0 + E[u_0|x, z] = \alpha_0 + \tau P(W = 1|x, z) + x\beta_0$$
because W is binary.
∙ Two-step procedure: (i) Estimate $P(W = 1|x, z)$ by probit (or logit) and obtain the first-stage fitted probabilities, say $\hat{\Phi}_i = \Phi(\hat{\gamma}_0 + x_i\hat{\gamma}_1 + z_i\hat{\gamma}_2)$. Should be able to reject $\gamma_2 = 0$ fairly strongly. (ii) Use $\hat{\Phi}_i$ in place of $W_i$: regress $Y_i$ on $1, \hat{\Phi}_i, x_i$, $i = 1, \ldots, N$.
∙ Why not recommended? (a) Inconsistent generally if the model (probit in this case) for $P(W = 1|x, z)$ is wrong. Then $E[Y|x, z] \neq \alpha_0 + \tau\Phi(\gamma_0 + x\gamma_1 + z\gamma_2) + x\beta_0$; (b) Need to adjust standard errors for the “generated regressor”; (c) No known efficiency properties; (d) Tempting to achieve identification off of the nonlinearity in the probit in the absence of instruments.
2. (Recommended): Use $\hat{\Phi}_i$ as an IV, not a regressor. That is, estimate
$$Y_i = \alpha_0 + \tau W_i + x_i\beta_0 + u_{i0}$$
by IV using instruments $(1, \hat{\Phi}_i, x_i)$.
∙ Not the same as using $\hat{\Phi}_i$ as a regressor! The first stage when $\hat{\Phi}_i$ is used as an IV is
$$\hat{W}_i = \hat{\theta}_0 + \hat{\theta}_1\hat{\Phi}_i + x_i\hat{\theta}_2$$
and $\hat{\theta}_0 \neq 0$, $\hat{\theta}_1 \neq 1$, $\hat{\theta}_2 \neq 0$.
∙ Why recommended? (a) Misspecification of the probit model does not matter, provided W is partially correlated with $\Phi(\gamma_0 + x\gamma_1 + z\gamma_2)$! So this method, like using $(1, x_i, z_i)$ as instruments, is robust to having the model for $P(W = 1|x, z)$ wrong. (b) No need to account for generated instruments in standard errors – see Wooldridge (2002, Chapter 6). (c) Estimator is the efficient IV estimator if $\text{Var}(u_0|x, z) = \text{Var}(u_0)$ and the probit model for W is correct.
∙ As a practical matter, there might not be much difference in the point estimates between the approaches. Certainly not if the coefficients in the regression $W_i$ on $1, \hat{\Phi}_i, x_i$ are close to 0, 1, 0. But using the fitted values as a regressor has no advantages. A sketch of the recommended procedure follows.
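∙ A sketch of the recommended procedure, continuing the simulated data from the 2SLS sketch above; the probit is fit with statsmodels (an assumed dependency):

```python
# Sketch: probit first stage, then use Phi_hat as an *instrument* for W.
import numpy as np
import statsmodels.api as sm

# (i) Probit of W on (1, x, z); first-stage fitted probabilities
probit_X = np.column_stack([np.ones(N), x, z])
Phi_hat = sm.Probit(W, probit_X).fit(disp=0).predict(probit_X)

# (ii) IV estimation of Y = a0 + tau*W + x*b0 + u0,
#      instruments (1, Phi_hat, x)
X_mat = np.column_stack([np.ones(N), W, x])
Z_mat = np.column_stack([np.ones(N), Phi_hat, x])
beta = np.linalg.solve(Z_mat.T @ X_mat, Z_mat.T @ Y)
print("tau_hat =", round(beta[1], 3))
# Probit misspecification only weakens the instrument, it does not
# destroy consistency; no generated-instrument correction is needed.
```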
∙ Easy to modify to allow treatment to interact with x. After the first-stage probit, estimate
$$Y_i = \alpha_0 + \tau W_i + x_i\beta_0 + W_i(x_i - \bar{x})\delta + error_i$$
by IV, using instruments $(1, \hat{\Phi}_i, x_i, \hat{\Phi}_i(x_i - \bar{x}))$. Benefits same as without the interaction.
∙ Can safely ignore estimation error in the sample mean, $\bar{x}$.
∙ Note we have now allowed heterogeneous treatment effects, but only as a function of x. In particular,
$$\hat{\tau}(x) = \hat{\tau} + (x - \bar{x})\hat{\delta},$$
and we can insert different values for x to see how the ATE varies with observed characteristics.
3. Heterogeneous Treatment Effects and LATE
The Local Average Treatment Effect
∙ What does IV estimate, in general, if the gain from treatment, $Y_1 - Y_0$, is not constant? Imbens and Angrist (1994): now treatment is counterfactual, too. Let Z be the binary instrumental variable, which is often randomized eligibility, so that Z is zero or one. Let $W_0$ be treatment status if $Z = 0$ and let $W_1$ be treatment status if $Z = 1$. (Some eligibles may not choose to participate; some non-eligibles may find other ways to “participate.”)
∙ We only observe one of the counterfactual treatments:
$$W = (1 - Z)W_0 + ZW_1$$
Now write
$$Y = (1 - W)Y_0 + WY_1 = Y_0 + W_0(Y_1 - Y_0) + Z(W_1 - W_0)(Y_1 - Y_0)$$
Assumptions:
LATE.1: Z is independent of $(W_0, W_1, Y_0, Y_1)$
LATE.2: $P(W = 1|Z = 1) \neq P(W = 1|Z = 0)$
LATE.3: $W_1 \geq W_0$ (no defiers, or monotonicity)
∙ This last condition means that if a unit would participate in the program when not eligible, the unit would also participate if made eligible. (Vietnam draft lottery: a defier would be someone who would serve if not drafted but would not serve if drafted. LATE.3 rules this out.)
∙ Imbens and Angrist define the local average treatment effect (LATE) as
$$\tau_{LATE} = E[Y_1 - Y_0|W_1 - W_0 = 1],$$
which is the average treatment effect for those induced into treatment by assignment. (That is, $W_0 = 0$ and $W_1 = 1$.)
∙ Imbens and Angrist call the subpopulation with $W_1 = 1$, $W_0 = 0$ the compliers. These individuals comply with their assignment: if they are assigned to the control group, they do not participate, and if they are assigned to the treatment group, they do participate.
∙ One criticism of LATE is that it is a treatment effect for an unidentifiable segment of the population: we do not know, on the basis of observing $(Y_i, W_i, Z_i)$, whether unit i is a complier.
∙ Consequently, LATE may not have “external validity,” that is, it may not apply to evaluations of other programs even if the population is the same.
∙ A related point is that two different binary instruments lead to different groups of compliers in the same population. For example, if Y is log of earnings, W is college attendance, and Z is living near a college, the compliers are those who would attend college if one is nearby, but not otherwise. But if Z is whether a grant is offered, the compliers consist of those who would not attend college if they do not receive a grant, but would attend if they do.
∙ The key result of IA is that, under the three LATE assumptions, the usual IV estimator consistently estimates $\tau_{LATE}$. To see this, use the independence assumption to write
$$E[Y|Z] = E[Y_0] + E[W_0(Y_1 - Y_0)] + Z \cdot E[(W_1 - W_0)(Y_1 - Y_0)]$$
so that
$$E[Y|Z = 1] - E[Y|Z = 0] = E[(W_1 - W_0)(Y_1 - Y_0)].$$
Now
$$\begin{aligned} E[(W_1 - W_0)(Y_1 - Y_0)] &= 1 \cdot E[Y_1 - Y_0|W_1 - W_0 = 1]P(W_1 - W_0 = 1) \\ &\quad + 0 \cdot E[Y_1 - Y_0|W_1 - W_0 = 0]P(W_1 - W_0 = 0) \\ &\quad + (-1) \cdot E[Y_1 - Y_0|W_1 - W_0 = -1]P(W_1 - W_0 = -1) \\ &= E[Y_1 - Y_0|W_1 - W_0 = 1]P(W_1 - W_0 = 1) \end{aligned}$$
because $P(W_1 - W_0 = -1) = 0$ by the no defiers assumption.
∙ Because $P(W_1 - W_0 = -1) = 0$, $W_1 - W_0$ is a binary variable, and
$$P(W_1 - W_0 = 1) = E[W_1 - W_0] = E[W_1] - E[W_0].$$
∙ But, again, Z is independent of $(W_0, W_1)$, and so
$$E[W|Z] = (1 - Z)E[W_0] + ZE[W_1],$$
which implies
$$E[W_1] - E[W_0] = E[W|Z = 1] - E[W|Z = 0] = P(W = 1|Z = 1) - P(W = 1|Z = 0) \neq 0$$
by LATE.2.
∙ We have shown that
$$\tau_{LATE} = E[Y_1 - Y_0|W_1 - W_0 = 1] = \frac{E[Y|Z = 1] - E[Y|Z = 0]}{P(W = 1|Z = 1) - P(W = 1|Z = 0)},$$
and this is the probability limit of the simple IV estimator when Z and W are both binary. In fact, the IV estimate is just the Wald estimate,
$$\hat{\tau}_{LATE} = \frac{N_1^{-1}\sum_{i=1}^N Z_i Y_i - N_0^{-1}\sum_{i=1}^N (1 - Z_i)Y_i}{N_1^{-1}\sum_{i=1}^N Z_i W_i - N_0^{-1}\sum_{i=1}^N (1 - Z_i)W_i},$$
where $N_1 = \sum_{i=1}^N Z_i$ and $N_0 = N - N_1$. (Sometimes called a “grouping estimator.”) A minimal implementation follows.
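∙ A minimal sketch of the grouping estimator; y, w, z are NumPy arrays, and the simulated design (hypothetical) satisfies monotonicity by construction:

```python
# Sketch: Wald/grouping estimator of LATE for binary Z and binary W.
import numpy as np

def wald_late(y, w, z):
    """Difference in mean outcomes over difference in treatment rates."""
    dy = y[z == 1].mean() - y[z == 0].mean()
    dw = w[z == 1].mean() - w[z == 0].mean()   # first stage; must be != 0
    return dy / dw

rng = np.random.default_rng(2)
z = rng.integers(0, 2, size=10_000)
eps = rng.normal(size=10_000)
w = ((0.8 * z + eps) > 0.5).astype(float)      # latent index: no defiers
y = 1.5 * w + rng.normal(size=10_000)          # constant effect 1.5
print(round(wald_late(y, w, z), 3))            # approx. 1.5
```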
∙ This interpretation of IV has had wide influence in the treatment effect literature. It is probably relied on too much to explain why an IV estimate is larger than OLS; one sometimes forgets the possibility that the IV is not exogenous.
∙ The literature on allowing covariates, x, and on instruments z that are not binary scalars is vast. The flavor of the results carries through.
4. Control Function Methods for Heterogeneous Treatment Effects
∙ Suppose we think there are heterogeneous treatment effects but are not satisfied with estimating LATE (or possibly the necessary assumptions do not hold). The “old-fashioned” approach is to add more assumptions and apply control function methods. This leads to the so-called “switching regression” model, which is the same as a random coefficient model.
∙ Recall that if $Y_g = \mu_g + x\beta_g + u_g$ we can write
$$Y = \alpha_0 + \tau W + x\beta_0 + W(x - \psi_x)\delta + u_0 + We,$$
where $e = u_1 - u_0$. Note that the ATE $\tau$ is still the coefficient on W.
∙ We have to take seriously the underlying functional forms, $E[Y_g|x] = \mu_g + x\beta_g$, and we will further restrict the conditional distribution of $Y_g$. In effect, this setup applies to $Y_g$ continuous. (Contrast the approaches based on unconfoundedness, particularly PS weighting and matching.)
∙ Even if we add the exogeneity assumptions $E[u_0|x, z] = E[e|x, z] = 0$, it is generally not true that
$$E[We|x, z] = E[We],$$
and this is the condition for the earlier IV estimators to be consistent for $\tau$ if the interaction term $We$ is in the error term – see Wooldridge (2003, Economics Letters).
∙ As it turns out, this condition can hold for continuous treatments, in which case only $\alpha_0$ is estimated inconsistently. The IV estimator is consistent for $\tau_{ate}$.
∙ For common binary response models for W, this mean independence assumption is known not to hold.
∙ Instead, use a control function approach, that is, find $E[Y|W, x, z]$:
$$E[Y|W, x, z] = \alpha_0 + \tau W + x\beta_0 + W(x - \psi_x)\delta + E[u_0|W, x, z] + WE[e|W, x, z]$$
∙ We can estimate the expectations if we model W. Probit is convenient, and we now have to take the model seriously:
$$W = 1[\gamma_0 + x\gamma_1 + z\gamma_2 + c \geq 0]$$
$$(u_0, u_1, c)|x, z \sim \text{Multivariate Normal}$$
∙ Can relax this a bit: $E[u_0|c, x, z]$ and $E[u_1|c, x, z]$ linear in c (and not dependent on x, z), along with $D(c|x, z)$ standard normal.
∙ The expectations have the well-known “inverse Mills ratio” form:
$$E[Y|W, x, z] = \alpha_0 + \tau W + x\beta_0 + W(x - \psi_x)\delta + \rho_1 W\frac{\phi(r\gamma)}{\Phi(r\gamma)} - \rho_0(1 - W)\frac{\phi(r\gamma)}{1 - \Phi(r\gamma)},$$
where $\phi$ is the standard normal pdf, $\Phi$ is the standard normal cdf, and $r = (1, x, z)$.
∙ This is an estimating equation for $\tau$, and $\tau(x)$, too, when we want $\tau(x) = \tau + (x - \psi_x)\delta$.
∙ Two-Step Method (Heckman, but for “switching regression,” or “endogenous treatment,” not sample selection): (i) As before, run probit of $W_i$ on $1, x_i, z_i$ and obtain $\hat{\phi}_i = \phi(r_i\hat{\gamma})$ and $\hat{\Phi}_i = \Phi(r_i\hat{\gamma})$. (ii) Run the OLS regression
$$Y_i \text{ on } 1, W_i, x_i, W_i(x_i - \bar{x}), W_i\hat{\phi}_i/\hat{\Phi}_i, (1 - W_i)\hat{\phi}_i/(1 - \hat{\Phi}_i)$$
∙ There is a “generated regressor” problem: need to adjust standard errors and inference for the first-stage estimation. Can use “heckit” in Stata applied to the treated and nontreated separately. A sketch of the pooled two-step regression follows.
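∙ A sketch of the pooled two-step regression under joint normality; the simulated design below is hypothetical:

```python
# Sketch: control function ("switching regression") two-step estimator.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(3)
N = 10_000
x = rng.normal(size=(N, 2))
z = rng.normal(size=N)
c = rng.normal(size=N)
W = (0.3 + 0.7 * z + x @ [0.4, -0.2] + c > 0).astype(float)
u0 = 0.5 * c + rng.normal(size=N)
u1 = 0.8 * c + rng.normal(size=N)                # heterogeneous gains
Y = np.where(W == 1, 2.0 + x @ [1.0, 0.5] + u1,
                     0.5 + x @ [0.8, 0.3] + u0)  # true ATE = 1.5

# (i) probit of W on r = (1, x, z); pdf and cdf at the fitted index
r = np.column_stack([np.ones(N), x, z])
idx = r @ sm.Probit(W, r).fit(disp=0).params
pdf, cdf = norm.pdf(idx), norm.cdf(idx)

# (ii) OLS of Y on 1, W, x, W*(x - xbar), and the two Mills-ratio terms
xc = x - x.mean(axis=0)
mills1 = W * pdf / cdf                           # treated term
mills0 = (1 - W) * pdf / (1 - cdf)               # untreated term
X2 = np.column_stack([np.ones(N), W, x, W[:, None] * xc, mills1, mills0])
res = sm.OLS(Y, X2).fit()
print("tau_hat (ATE) =", round(res.params[1], 3))
# Standard errors still need a generated-regressor adjustment, e.g.
# bootstrap both steps jointly.
```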
∙ If, say, we estimate the coefficients $(\hat{\alpha}_0, \hat{\beta}_0')'$ and $(\hat{\alpha}_1, \hat{\beta}_1')'$ using heckit on the $W_i = 0$ and $W_i = 1$ subgroups, it is important to form
$$\hat{\tau}(x) = (\hat{\alpha}_1 - \hat{\alpha}_0) + x(\hat{\beta}_1 - \hat{\beta}_0)$$
and, as a special case,
$$\hat{\tau} = \hat{\tau}_{ate} = (\hat{\alpha}_1 - \hat{\alpha}_0) + \bar{x}(\hat{\beta}_1 - \hat{\beta}_0).$$
It makes no sense to just look at $\hat{\alpha}_1 - \hat{\alpha}_0$ unless $\hat{\beta}_1 = \hat{\beta}_0$ has been imposed or $\bar{x}$ is, coincidentally, close to zero.
∙ The ATEs are never obtained by looking at $\hat{E}[Y_i|W_i = 1, x_i, z_i] - \hat{E}[Y_i|W_i = 0, x_i, z_i]$ and averaging these differences across i. We already know how $\tau$ and $\tau(x)$ depend on the parameters. They do not depend on z!
∙ If the ATEs were based on $E[Y|W, x, z]$ there would be no identification problem and never any functional form restrictions, because we could always estimate $E[Y|W, x, z]$ nonparametrically.
∙ Again, we use the expression for $E[Y|W, x, z]$ as an estimating equation, not for computing treatment effects.
∙ Whether separate estimations are carried out or the pooled regression is used, can use the bootstrap for inference.
∙ The Stata command “treatreg” (typically using MLE, but sometimes using the CF approach) does not apply to the full switching regression case. In fact, it seems to apply only to the simple model with a homogeneous effect,
$$Y = \alpha_0 + \tau W + x\beta_0 + u_0,$$
in which case more robust IV methods are available. The CF regression in this case is
$$Y_i \text{ on } 1, W_i, x_i, W_i\hat{\phi}_i/\hat{\Phi}_i - (1 - W_i)\hat{\phi}_i/(1 - \hat{\Phi}_i)$$
∙ That is, treatreg imposes the restriction $\rho_1 = \rho_0 = \rho$.
∙ As usual, need to make sure z has at least one element that predicts treatment once x is controlled for. (Use the first-stage probit.)
∙ In the pooled estimation in the general case, can use a joint test of the two terms $W_i\hat{\phi}_i/\hat{\Phi}_i$ and $(1 - W_i)\hat{\phi}_i/(1 - \hat{\Phi}_i)$ to test the null that W is exogenous. Under the null, do not need to account for generated regressors, so use a heteroskedasticity-robust Wald test.
∙ Extension of the usual Heckman approach. Suppose we allow lots of heterogeneity in the two counterfactual equations:
$$Y_{ig} = a_{ig} + x_i b_{ig} \equiv \mu_g + x_i\beta_g + x_i d_{ig} + u_{ig},$$
where $(a_{ig}, b_{ig})$ is independent of $(x_i, z_i)$. We can write
$$Y_i = \alpha_0 + \tau W_i + x_i\beta_0 + W_i(x_i - \psi_x)\delta + u_{i0} + W_i e_i + x_i d_{i0} + W_i x_i g_i,$$
where $g_i = d_{i1} - d_{i0}$.
∙ Under sufficient normality and linear conditional expectations of the heterogeneity given $c_i$ (the error in the probit for $W_i$), we get the control function equation
$$E[Y_i|W_i, x_i, z_i] = \alpha_0 + \tau W_i + x_i\beta_0 + W_i(x_i - \psi_x)\delta + \rho_0 h(W_i, r_i\gamma) + \rho_1 W_i h(W_i, r_i\gamma) + h(W_i, r_i\gamma)x_i\pi_0 + W_i h(W_i, r_i\gamma)x_i\pi_1,$$
where
$$h(W_i, r_i\gamma) \equiv W_i\lambda(r_i\gamma) - (1 - W_i)\lambda(-r_i\gamma)$$
and $\lambda(\cdot) = \phi(\cdot)/\Phi(\cdot)$ is the inverse Mills ratio.
∙ So the generalized residual from the probit, $h(W_i, r_i\hat{\gamma})$, gets added on its own, interacted with $W_i$, interacted with $x_i$, and interacted with $W_i x_i$. (A total of $2(K + 1)$ terms.)
∙ The null that W i is exogenous is a test that all terms have zero coefficients. Usual heteroskedasticity-robust Wald test can be used.
∙ As usual with a two-step estimation method, need to adjust standard errors if we conclude $W_i$ is endogenous. (Bootstrap.)
∙ Even more flexibility: allow $W_i$ to follow a heteroskedastic probit, that is,
$$W_i = 1[r_i\gamma + \exp(r_{oi}\xi)c_i \geq 0]$$
(where $r_{oi}$ omits a constant), in which case the generalized residual is
$$h(W_i, r_i\hat{\gamma}, r_{oi}\hat{\xi}) = W_i\lambda(\exp(-r_{oi}\hat{\xi})r_i\hat{\gamma}) - (1 - W_i)\lambda(-\exp(-r_{oi}\hat{\xi})r_i\hat{\gamma})$$
∙ A lot of scope for studying robustness of findings even within parametric models.
5. A Different Approach to Allowing Heterogeneous Treatment Effects
∙ Based on Wooldridge (2007, Advances in Econometrics). Go back to
$$Y = \alpha_0 + \tau W + x\beta_0 + W(x - \psi_x)\delta + u_0 + We$$
∙ Rather than find $E[u_0|W, x, z]$ and $E[e|W, x, z]$, now write $We = E[We|x, z] + a$, where $E[a|x, z] = 0$ by definition. Then
$$Y = \alpha_0 + \tau W + x\beta_0 + W(x - \psi_x)\delta + E[We|x, z] + u_0 + a,$$
where now the composite error, $q \equiv u_0 + a$, has zero mean conditional on $(x, z)$.
∙ It turns out $E[We|x, z]$ is remarkably simple when W follows a probit:
$$E[We|x, z] = \rho\,\phi(\gamma_0 + x\gamma_1 + z\gamma_2)$$
∙ The estimating equation is
$$Y = \alpha_0 + \tau W + x\beta_0 + W(x - \psi_x)\delta + \rho\,\phi(r\gamma) + q, \quad E[q|x, z] = 0.$$
∙ But W is still endogenous in this equation, so we need an instrument. (This is why the method differs from standard control function approach.)
∙ The added regressor, $\phi(r\hat{\gamma})$, is exogenous and acts as its own instrument. The natural instrument for W is $\Phi(r\hat{\gamma})$.
∙ Two-step procedure: (i) As before, probit of W on $(1, x, z)$. Now get $\hat{\phi}_i$ in addition to $\hat{\Phi}_i$. (ii) Estimate
$$Y_i = \alpha_0 + \tau W_i + x_i\beta_0 + W_i(x_i - \bar{x})\delta + \rho\hat{\phi}_i + error_i$$
by IV, using instruments $(1, \hat{\Phi}_i, x_i, \hat{\Phi}_i(x_i - \bar{x}), \hat{\phi}_i)$.
∙ Generated regressor problem with $\hat{\phi}_i$, so use the delta method or bootstrap. (Do not need to worry about generated instruments.) A sketch follows.
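∙ A sketch of the correction function two-step, reusing the simulated data and first-stage probit from the previous sketch:

```python
# Sketch: add rho*phi_hat as an exogenous regressor; instrument W with
# Phi_hat. Uses x, W, Y, N, pdf, cdf from the switching regression sketch.
import numpy as np

phi_hat, Phi_hat = pdf, cdf
xc = x - x.mean(axis=0)

# Regressors (1, W, x, W*(x - xbar), phi_hat); W is still endogenous
X_mat = np.column_stack([np.ones(N), W, x, W[:, None] * xc, phi_hat])
# Instruments (1, Phi_hat, x, Phi_hat*(x - xbar), phi_hat)
Z_mat = np.column_stack([np.ones(N), Phi_hat, x, Phi_hat[:, None] * xc, phi_hat])

beta = np.linalg.solve(Z_mat.T @ X_mat, Z_mat.T @ Y)   # just-identified IV
print("tau_hat =", round(beta[1], 3))
# The t statistic on phi_hat (after IV) tests whether W*(u1 - u0)
# matters; phi_hat is a generated regressor, so use the delta method
# or the bootstrap for standard errors.
```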
∙ A test of whether $W(u_1 - u_0)$ is needed is just the usual (heteroskedasticity-robust) t statistic on $\hat{\phi}_i$ after IV estimation. Not a test of endogeneity of W, because W can still be correlated with $u_0$.
∙ Drawback to this approach: does not work with binary Z without covariates: $\phi(\gamma_0 + \gamma_2 Z)$ and $\Phi(\gamma_0 + \gamma_2 Z)$ are perfectly collinear when Z is binary (each is just a linear function of Z). Generally, might be sensitive to weak instruments. Need Z to vary sufficiently.
∙ Control function approach can work with binary Z, at least in some cases. But in CF case, test of interaction not robust to nonnormality (endogeneity and functional form tied together).
∙ Can learn other things by studying the estimating equation. Assume we have no covariates, so drop x and write
$$Y = \alpha_0 + \tau W + \rho\,\phi(\gamma_0 + \gamma_2 Z) + q, \quad E[q|Z] = 0.$$
Via simulations, Angrist (1990, NBER Technical Working Paper) studied the properties of the usual IV estimator in the equation without $\rho\,\phi(\gamma_0 + \gamma_2 Z)$. In other words, the implicit error term is $\rho\,\phi(\gamma_0 + \gamma_2 Z) + q$. In general, the IV, Z (which is what Angrist used, not $\Phi(\gamma_0 + \gamma_2 Z)$), is correlated with $\phi(\gamma_0 + \gamma_2 Z)$. But not always. In fact, in Angrist’s simulations, $\phi(\gamma_0 + \gamma_2 Z) = \phi(\gamma_2 Z)$ and Z has a symmetric distribution about zero.
∙ Just as Z and $Z^2$ are uncorrelated when Z is symmetrically distributed about zero, $\text{Cov}(Z, \phi(\gamma_2 Z)) = 0$ (because $\phi$ is symmetric about zero). We already know $\text{Cov}(Z, q) = 0$, and so Z is uncorrelated with $\rho\,\phi(\gamma_2 Z) + q$.
∙ This likely explains why Angrist’s simulation results for the simple IV estimator look so promising for estimating $\tau$.
∙ This method is likely less efficient than control function methods in general. But it can be quite a bit easier with multiple treatment levels. For example, suppose we replace W with a vector $(W_1, \ldots, W_J)$. Then the control function method requires computing the expectations $E[u_0|W, x, z]$ and $E[e|W, x, z]$, which can be very difficult, whereas the “correction function” method requires only calculation of $E[W_j e|x, z]$ for each j separately.
6. Nonlinear Response Models
∙ What if Y is clearly a discrete outcome, such as a binary response? Could take the Imbens and Angrist perspective and settle for (hopefully) estimating LATE. But what if we want to estimate the ATE as a function of covariates x?
∙ Unfortunately, no general, robust procedures. Joint MLE is the only reliable method. That is, specify a model for W as a function of x, z and also a model for Y.
∙ Some poor practices linger, especially in complicated discrete response models (multinomial and conditional logit, for example). Even if Y is binary, one often sees a two-step method applied:
(1) Probit of $W_i$ on $x_i, z_i$ and get fitted values, $\hat{\Phi}_i$. (2) Probit of $Y_i$ on $\hat{\Phi}_i, x_i$.
∙ The coefficient on $\hat{\Phi}_i$ is meaningless (except maybe for the sign). How do we get the ATE from this? No current justification for these kinds of approaches.
∙ Is it somehow possible that these two-step procedures nevertheless approximate the ATE?
∙ Can think of a similar Tobit method (when Y is a corner solution response) that attempts to mimic “two-stage least squares.”
∙ What can we do that at least works under a set of assumptions?
∙ Suppose the $Y_g$ are binary, and we are willing to assume, for $g = 0, 1$,
$$Y_g = 1[\alpha_g + x\beta_g + u_g > 0]$$
$$u_g|x, z \sim \text{Normal}(0, 1)$$
∙ We know that the ATE as a function of x is
$$\tau(x) = \Phi(\alpha_1 + x\beta_1) - \Phi(\alpha_0 + x\beta_0),$$
which is easy to estimate given estimates of the parameters.
∙ If we again assume
$$W = 1[\gamma_0 + x\gamma_1 + z\gamma_2 + c \geq 0]$$
and $(u_0, u_1, c)|x, z \sim$ Multivariate Normal, then estimation using standard software is straightforward: apply the Heckman selection approach to the probit model for the control ($W_i = 0$) and treated ($W_i = 1$) groups separately.
∙ In Stata, “heckprob.” This gives $(\hat{\alpha}_0, \hat{\beta}_0)$ ($W_i = 0$; need an instrument) and $(\hat{\alpha}_1, \hat{\beta}_1)$ ($W_i = 1$), and then we have
$$\hat{\tau}(x) = \Phi(\hat{\alpha}_1 + x\hat{\beta}_1) - \Phi(\hat{\alpha}_0 + x\hat{\beta}_0),$$
which we can study at various values of x, or average across subpopulations. The estimate of $\tau_{ate} = E[\Phi(\alpha_1 + x\beta_1) - \Phi(\alpha_0 + x\beta_0)]$ is
$$\hat{\tau}_{ate} = N^{-1}\sum_{i=1}^N\left[\Phi(\hat{\alpha}_1 + x_i\hat{\beta}_1) - \Phi(\hat{\alpha}_0 + x_i\hat{\beta}_0)\right].$$
A sketch of this averaging step follows.
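∙ A sketch of this averaging step; the coefficient arrays below stand in for heckprob output on the two groups and are purely hypothetical placeholders:

```python
# Sketch: turn switching-probit estimates into tau_hat(x) and tau_hat_ate.
import numpy as np
from scipy.stats import norm

a0, b0 = -0.2, np.array([0.5, 0.3])   # (alpha0_hat, beta0_hat): W_i = 0 group
a1, b1 = 0.4, np.array([0.6, 0.1])    # (alpha1_hat, beta1_hat): W_i = 1 group

x = np.random.default_rng(4).normal(size=(10_000, 2))    # covariate sample

tau_x = norm.cdf(a1 + x @ b1) - norm.cdf(a0 + x @ b0)    # tau_hat(x_i)
print("tau_hat_ate =", round(tau_x.mean(), 3))           # average over x_i
```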
∙ Simple test for the null that W is exogenous. Can show that the score (Lagrange multiplier) test can be obtained as the following variable addition test: (1) Probit of $W_i$ on $1, x_i, z_i$ to get $\hat{\gamma}$ and the generalized residual,
$$gr_i \equiv W_i\lambda(r_i\hat{\gamma}) - (1 - W_i)\lambda(-r_i\hat{\gamma}).$$
(2) Probit of $Y_i$ on $1, x_i, W_i, gr_i$ and use the usual MLE t statistic. Under the null, no need to adjust the standard error for first-stage estimation. A sketch follows.
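∙ A sketch of the variable addition test on hypothetical simulated data:

```python
# Sketch: score test for exogeneity of W in the binary-response model.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(5)
N = 10_000
x, z, c = rng.normal(size=(N, 2)), rng.normal(size=N), rng.normal(size=N)
W = (0.5 * z + x @ [0.3, -0.2] + c > 0).astype(float)
u = 0.6 * c + rng.normal(size=N)                 # endogeneity through c
Y = (0.2 + 1.0 * W + x @ [0.5, 0.4] + u > 0).astype(float)

# (1) probit of W on (1, x, z); generalized residual
r = np.column_stack([np.ones(N), x, z])
idx = r @ sm.Probit(W, r).fit(disp=0).params
gr = W * norm.pdf(idx) / norm.cdf(idx) - (1 - W) * norm.pdf(idx) / norm.cdf(-idx)

# (2) probit of Y on (1, x, W, gr); t statistic on gr is the test
X2 = np.column_stack([np.ones(N), x, W, gr])
res = sm.Probit(Y, X2).fit(disp=0)
print("t stat on gr:", round(res.tvalues[-1], 2))   # large => reject exogeneity
```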
∙ Cannot be theoretically justified as a correction under $H_1$, but one wonders for “local” forms of endogeneity.
∙ Rather than compute $\hat{\tau}_{ate}$ as earlier, can extend Blundell and Powell (2003):
$$\hat{\tau}(x) = N^{-1}\sum_{i=1}^N\left[\Phi(x\hat{\beta} + \hat{\tau} + \hat{\xi}\,gr_i) - \Phi(x\hat{\beta} + \hat{\xi}\,gr_i)\right],$$
where $\hat{\tau}$ is the probit coefficient on $W_i$ and $\hat{\xi}$ is the coefficient on $gr_i$. Then average out across $x_i$ to approximate $\hat{\tau}_{ate}$. (Caution: I really have no idea if or when this works.)
∙ For testing the null, can do lots of different things. For example, with multiple binary treatments $W_{i1}, \ldots, W_{iG}$, can (assuming enough instruments) compute a GR for each $W_{ig}$, then do probit of $Y_i$ on $1, x_i, W_{i1}, \ldots, W_{iG}, gr_{i1}, \ldots, gr_{iG}$ and do a usual Wald test for joint significance of $gr_{i1}, \ldots, gr_{iG}$. Under the null, they should be jointly insignificant.
∙ For a test, could use the Pearson (standardized) residual in place of the generalized residual:
$$pr_i = \frac{W_i - \Phi(r_i\hat{\gamma})}{\sqrt{\Phi(r_i\hat{\gamma})[1 - \Phi(r_i\hat{\gamma})]}}$$
As with adding GR, might this approximate the average treatment effect when averaged out of the second-stage probit?
∙ Recent work [Wooldridge (2008, manuscript)]: if $W_i$ (scalar) is endogenous and follows a probit, can use the same bivariate probit (quasi-) log likelihood when $Y_i$ is a fractional response following a probit mean function, without further restricting the distribution of $Y_i$.
∙ If $Y_g$ is a corner (labor supply, charitable contributions), might specify
$$Y_g = \max(0, \alpha_g + x\beta_g + u_g), \quad g = 0, 1,$$
and use the assumptions as above, except allow $\sigma_0^2$ and $\sigma_1^2$ as variances of $u_0, u_1$. Under the multivariate normality of $(u_0, u_1, c)$ (and independence from $(x, z)$), can use MLE. The estimated average treatment effect as a function of x is
$$\hat{\tau}(x) = \Phi\!\left(\frac{\hat{\alpha}_1 + x\hat{\beta}_1}{\hat{\sigma}_1}\right)(\hat{\alpha}_1 + x\hat{\beta}_1) + \hat{\sigma}_1\phi\!\left(\frac{\hat{\alpha}_1 + x\hat{\beta}_1}{\hat{\sigma}_1}\right) - \Phi\!\left(\frac{\hat{\alpha}_0 + x\hat{\beta}_0}{\hat{\sigma}_0}\right)(\hat{\alpha}_0 + x\hat{\beta}_0) - \hat{\sigma}_0\phi\!\left(\frac{\hat{\alpha}_0 + x\hat{\beta}_0}{\hat{\sigma}_0}\right)$$
and average this across $x_i$ to estimate $\tau_{ate}$. A small helper for this formula follows.
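∙ A small helper (a sketch, not from the slides) evaluating this formula given parameter estimates from the switching Tobit MLE; the inputs shown are hypothetical:

```python
# Sketch: corner-solution (Tobit) ATE from estimated parameters.
import numpy as np
from scipy.stats import norm

def tobit_mean(index, sigma):
    """E[max(0, index + u)] for u ~ Normal(0, sigma^2)."""
    return norm.cdf(index / sigma) * index + sigma * norm.pdf(index / sigma)

def tau_ate_tobit(x, a0, b0, s0, a1, b1, s1):
    """Average tau_hat(x_i) across the sample of covariates."""
    return np.mean(tobit_mean(a1 + x @ b1, s1) - tobit_mean(a0 + x @ b0, s0))

x = np.random.default_rng(6).normal(size=(5_000, 2))   # covariate sample
print(round(tau_ate_tobit(x, 0.1, np.array([0.5, 0.2]), 1.0,
                             0.6, np.array([0.7, 0.1]), 1.2), 3))
```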
∙ For other functional forms, we can get by with fewer restrictions, although often the same parameters are imposed across regimes.
∙ Write $E[Y|W, x, z, r_1] = \exp(x\beta + \tau W + r_1)$, where $r_1$ is an unobservable. Terza’s (1998) control function method is based on
$$E[Y|W, x, z] = \exp(x\beta + \tau W)\,E[\exp(r_1)|W, x, z].$$
∙ When W follows a probit,
$$E[Y|W, x, z] = \exp(x\beta + \tau W)\,h(W, \gamma_0 + x\gamma_1 + z\gamma_2, \rho),$$
where
$$h(W, r\gamma, \rho) = \exp(\rho^2/2)\left[W\,\frac{\Phi(r\gamma + \rho)}{\Phi(r\gamma)} + (1 - W)\,\frac{1 - \Phi(r\gamma + \rho)}{1 - \Phi(r\gamma)}\right].$$
∙ Estimate using a first-stage probit, as before. Then, say, Poisson QMLE with the above mean function to estimate $\beta$, $\tau$, and $\rho$. A sketch follows.
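∙ A sketch of the two-step approach: first-stage probit, then the Poisson quasi-log-likelihood maximized numerically with the corrected mean function. The data-generating design is hypothetical:

```python
# Sketch: Terza-style two-step for an exponential mean with a probit W.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(7)
N = 10_000
x, z, c = rng.normal(size=(N, 2)), rng.normal(size=N), rng.normal(size=N)
W = (0.4 * z + x @ [0.3, -0.2] + c > 0).astype(float)
r1 = 0.5 * c                                    # unobservable tied to c
Y = rng.poisson(np.exp(0.2 + x @ [0.4, 0.3] + 0.7 * W + r1))

# Step 1: probit index
r = np.column_stack([np.ones(N), x, z])
idx = r @ sm.Probit(W, r).fit(disp=0).params

# Step 2: Poisson quasi-log-likelihood with the corrected mean
def negqll(theta):
    eta, b1, b2, tau, rho = theta
    h = np.exp(rho**2 / 2) * (W * norm.cdf(idx + rho) / norm.cdf(idx)
            + (1 - W) * norm.cdf(-idx - rho) / norm.cdf(-idx))
    mu = np.exp(eta + x @ [b1, b2] + tau * W) * h
    return -(Y * np.log(mu) - mu).sum()

theta_hat = minimize(negqll, np.zeros(5), method="BFGS").x
print("tau_hat =", round(theta_hat[3], 3),
      " rho_hat =", round(theta_hat[4], 3))     # approx. 0.7 and 0.5
```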
∙ Then the ATE as a function of x is estimated as
$$\hat{\tau}(x) = \exp(\hat{\rho}^2/2)\left[\exp(x\hat{\beta} + \hat{\tau}) - \exp(x\hat{\beta})\right]$$
∙ What is the score test of $H_0: \rho = 0$ (W is exogenous)? Can show the variable addition test is: (1) Probit of $W_i$ on $1, x_i, z_i$, as usual; (2) Poisson (say) regression of $Y_i$ on $1, x_i, W_i, lgr_i$, where $lgr_i$ is the log version of the usual generalized residual:
$$lgr_i = W_i\log\Phi(r_i\hat{\gamma}) - (1 - W_i)\log\Phi(-r_i\hat{\gamma})$$
∙ IV methods that work for any distribution of W are available [Mullahy (1997)]. If $E[Y|W, x, z, r_1] = \exp(x\beta + \tau W + r_1)$, and $r_1$ is independent of z, then
$$E[\exp(-x\beta - \tau W)Y\,|\,z] = E[\exp(r_1)|z] = 1,$$
where $E[\exp(r_1)] = 1$ is a normalization. The moment conditions are
$$E[\exp(-x\beta - \tau W)Y - 1\,|\,z] = 0.$$
A GMM sketch follows.
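∙ A GMM sketch based on these moment conditions, reusing the simulated count data from the previous sketch; with instruments $(1, x, z)$ the model is just identified:

```python
# Sketch: GMM on E[exp(-x*beta - tau*W)*Y - 1 | z] = 0 (Mullahy-style).
import numpy as np
from scipy.optimize import minimize

Zmat = np.column_stack([np.ones(N), x, z])     # instrument matrix, L = 4

def moments(theta):
    eta, b1, b2, tau = theta
    resid = np.exp(-(eta + x @ [b1, b2] + tau * W)) * Y - 1.0
    return Zmat * resid[:, None]               # N x L moment contributions

def gmm_obj(theta, Wgt):
    gbar = moments(theta).mean(axis=0)
    return gbar @ Wgt @ gbar

step1 = minimize(gmm_obj, np.zeros(4), args=(np.eye(4),), method="BFGS")
G = moments(step1.x)
Wopt = np.linalg.inv(G.T @ G / N)              # optimal weight matrix
step2 = minimize(gmm_obj, step1.x, args=(Wopt,), method="BFGS")
print("tau_hat =", round(step2.x[3], 3))       # compare with 0.7 above
# Note the normalization E[exp(r1)] = 1 is absorbed into the intercept.
```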