Robust system estimation of causal effects on binary outcomes, with application to effect of alcohol abuse on employment

M. Christopher Auld¹
University of Calgary
November 2002

Abstract This paper discusses estimation of causal effects of non-randomly assigned binary treatments on binary outcomes. The approach is full information maximum likelihood estimation of endogenously switching nonlinear regressions with a multivariate Student’s-t error structure. An application measures the effect of problem drinking on probability of employment.

JEL Classification: I1, C3

¹ I thank the Alberta Heritage Foundation for Medical Research for financial support.

1 Introduction.

Researchers are commonly faced with the problem of estimating the causal effect of a non-randomly assigned "treatment" on binary outcomes. For example, clinical researchers may be interested in experimental data subject to non-ignorable missingness when the outcomes are dichotomous, such as "did the patient live?" Social scientists may be interested in the effect of a job training program on the probability of employment, or the effect of having a college degree on the probability of having a child. The inference problem centers on non-random assignment to the "treatment" and "control" groups, which generally implies that simple comparisons will confound causal effects with selection on unobservables. Recently, structural models for such problems have been of much interest (e.g., Heckman and Robb 1985, Angrist, Imbens, and Rubin 1996, Heckman 1997), most often in the context of a binary treatment and continuous outcome measures. These models attempt to estimate subject-specific counterfactuals by estimating state-specific outcomes along with the process determining whether the unit is in the treatment or control group.

This paper discusses full information maximum likelihood estimation of counterfactual probabilities when the errors follow a multivariate Student's t distribution (MVT). An econometric approach is developed to estimate counterfactuals building on the work of Heckman, Tobias, and Vytlacil (2000) and Aakvik, Heckman, and Vytlacil (2000). The former paper discusses two-step estimation of causal effects on continuous outcomes when the errors are multivariate Student's t, whereas the latter develops system maximum likelihood estimation of causal effects on binary outcomes with a Gaussian common factor error structure. The approach developed in this paper extends this framework to frequentist full information maximum likelihood estimation when the errors are MVT. A Bayesian approach to this framework is discussed by Chib and Hamilton (2001).

The MVT model's key advantage over the textbook Gaussian model is that it is closer to a semiparametric approach, imposing less restrictive a priori information on the joint distribution of the disturbances. In an application, the effect of heavy drinking on probability of employment is estimated. The estimated treatment effects are shown to be sensitive to the distributional assumptions.

The MVT model is a convenient alternative to the Gaussian because it nests the Gaussian as a special case, such that likelihood ratios provide simple specification tests. Further, the Fisher information in the model is block diagonal with respect to the degrees of freedom parameter (Lange, Little, and Taylor 1989), which implies that covariance estimates do not have to be adjusted after a prior search over degrees of freedom for the distribution. Finally, the MVT distribution has the property that its marginals are also MVT, which reduces computational demands.

The paper is organized as follows. Section 2 presents the analytical framework, including expressions for the evaluation of subject-specific and averaged treatment effects (average, marginal, local, and treatment-on-the-treated). Section 3 discusses specification tests. An application to drinking and employment is discussed in Section 4, and Section 5 concludes.

2 Econometric model.

Let $D \in \{0, 1\}$ denote a quasi-randomly assigned treatment and let $Y \in \{0, 1\}$ denote a binary outcome. The observed outcomes for the $t$th subject are $(D_t, W_t, Y_{tj})$, where $Y_{tj}$ is $t$'s outcome given that his treatment is $D_t = j$ and $W_t$ is a vector of exogenous covariates. The inference problem arises because the counterfactual outcome $Y_{t,1-j}$ is not observed. Dropping subscripts, a structural model cast in terms of latent outcomes can be written

\[
\begin{aligned}
D^* &= Z\gamma + U_D \\
Y_1^* &= X\beta_1 + U_1 \\
Y_0^* &= X\beta_0 + U_0,
\end{aligned}
\tag{1}
\]

where $(D, Y_1, Y_0) = (1[D^* > 0], 1[Y_1^* > 0], 1[Y_0^* > 0])$, $Z, X \subseteq W$ are vectors of covariates, $(\gamma, \beta_1, \beta_0)$ are parameters to be estimated, and $U = (U_D, U_1, U_0)$ is a vector of error terms. Throughout, assume $U$ is independent of $(X, Z)$. It is convenient to write the observed outcome in terms of the treatment and potential outcomes as

\[
Y = D Y_1 + (1 - D) Y_0. \tag{2}
\]

An appropriate empirical method must impose enough structure to identify the distributions of $Y_1$ and $Y_0$ given the data.

2.1 Robust error structure.

Estimation using a likelihood-based method requires distributional assumptions for the errors $U$. The MVT structure has long been advocated in the Bayesian literature; see Zellner (1976), Lange, Little, and Taylor (1989), Geweke (1993), Hamilton (2000), and Chib and Hamilton (2001). Under the MVT assumption, the density of $U$ can be expressed

\[
t_\nu(u, \Sigma) = \frac{\Gamma\left(\tfrac{1}{2}(\nu + 3)\right)}{\Gamma(\nu/2)\sqrt{|\Sigma|(\nu\pi)^3}} \left(1 + \frac{1}{\nu}\, u'\Sigma^{-1}u\right)^{-\frac{1}{2}(\nu + 3)} \tag{3}
\]

where $\nu$ is the "degrees of freedom" of the distribution, $\Gamma(\cdot)$ is the Gamma function, and (for $\nu > 2$) $\Sigma$ is the covariance matrix. The multivariate-t distribution has the advantages that it converges to the normal distribution as $\nu \to \infty$, while its "fat tails" allow more robust inference in the presence of unusual observations when $\nu$ is small. The correlation between $U_1$ and $U_0$ does not enter the likelihood and is therefore not identified. Define $\sigma_{D0} = \mathrm{corr}(U_D, U_0)$ and $\sigma_{D1} = \mathrm{corr}(U_D, U_1)$.

The correlations reflect selection on unobservables, whereas correlations between $Z\gamma$ and $X\beta_j$, $j = 0, 1$, reflect selection on observables. The covariance matrix of the errors takes the form

\[
V(U) = \Sigma = \begin{pmatrix} 1 & \sigma_{D1} & \sigma_{D0} \\ \sigma_{D1} & 1 & 0 \\ \sigma_{D0} & 0 & 1 \end{pmatrix} \tag{4}
\]
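As a minimal sketch, the density in equation (3) can be evaluated in closed form for the matrix in equation (4). The code assumes the standard form of the density with quadratic form $u'\Sigma^{-1}u$; the function names are illustrative, not the author's. Because $\mathrm{Cov}(U_1, U_0)$ is fixed at zero, the determinant reduces to $1 - \sigma_{D1}^2 - \sigma_{D0}^2$, so the two correlations must satisfy $\sigma_{D1}^2 + \sigma_{D0}^2 < 1$.

```python
import math


def quad_form_and_det(u, sigma_d1, sigma_d0):
    """Return (u' Sigma^{-1} u, |Sigma|) for the Sigma of equation (4)."""
    u_d, u_1, u_0 = u
    a, b = sigma_d1, sigma_d0
    det = 1.0 - a * a - b * b
    if det <= 0.0:
        raise ValueError("Sigma is not positive definite")
    # Closed-form inverse of [[1, a, b], [a, 1, 0], [b, 0, 1]] via its adjugate.
    quad = (u_d * u_d
            - 2.0 * a * u_d * u_1
            - 2.0 * b * u_d * u_0
            + (1.0 - b * b) * u_1 * u_1
            + 2.0 * a * b * u_1 * u_0
            + (1.0 - a * a) * u_0 * u_0) / det
    return quad, det


def mvt_density(u, sigma_d1, sigma_d0, nu):
    """Trivariate Student's-t density of u = (u_D, u_1, u_0), equation (3)."""
    quad, det = quad_form_and_det(u, sigma_d1, sigma_d0)
    log_const = (math.lgamma(0.5 * (nu + 3.0)) - math.lgamma(0.5 * nu)
                 - 0.5 * math.log(det) - 1.5 * math.log(nu * math.pi))
    return math.exp(log_const - 0.5 * (nu + 3.0) * math.log1p(quad / nu))


def gaussian_density(u, sigma_d1, sigma_d0):
    """Trivariate normal density with the same Sigma (the nu -> infinity limit)."""
    quad, det = quad_form_and_det(u, sigma_d1, sigma_d0)
    return math.exp(-0.5 * quad) / math.sqrt(det * (2.0 * math.pi) ** 3)


if __name__ == "__main__":
    point = (0.5, -1.0, 2.5)
    for nu in (5, 9, 20, 1e6):
        print(nu, mvt_density(point, 0.3, 0.6, nu))
    print("Gaussian", gaussian_density(point, 0.3, 0.6))
```

Printing the density at a fixed point for increasing $\nu$ shows the convergence to the Gaussian value described above.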

The lack of identification of $\mathrm{Cov}(U_0, U_1)$ is problematic in that some of the estimated distributions to be discussed below depend on this parameter. Aakvik, Heckman, and Vytlacil (2000) handle this problem by imposing more structure on the covariance matrix such that $\mathrm{Cov}(U_1, U_0)$ is estimated by $\sqrt{\mathrm{Cov}(U_D, U_1)}\,\sqrt{\mathrm{Cov}(U_D, U_0)}$. Another approach is to set this parameter to zero, as in Chib and Hamilton (2001). The latter approach is followed here.

The model is formally identified by the distributional assumptions even when all covariates appear in each equation; however, the credibility of the estimates is greatly increased by exclusion restrictions on the outcome equations. It is desirable that $Z$ contains at least one column $Z_i$ with the property that, after conditioning on the other covariates, $Z_i$ is correlated with $D$ but uncorrelated with $Y_j$, $j = 0, 1$.
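To make the selection problem concrete, the following sketch simulates the model in equations (1), (2), and (4) and compares a naive treated-versus-control difference with the true average effect. The index values, correlations, and degrees of freedom are illustrative assumptions, not estimates from the paper. The draws use the standard representation of a multivariate t variate as a correlated normal vector divided by $\sqrt{\chi^2_\nu/\nu}$.

```python
import math
import random


def draw_errors(sigma_d1, sigma_d0, nu, rng):
    """One draw of (U_D, U_1, U_0) from the MVT with the Sigma of equation (4)."""
    a, b = sigma_d1, sigma_d0
    # Cholesky factor of Sigma with Cov(U_1, U_0) = 0 imposed.
    l22 = math.sqrt(1.0 - a * a)
    l32 = -a * b / l22
    l33 = math.sqrt(1.0 - b * b - l32 * l32)
    e_d, e_1, e_0 = (rng.gauss(0.0, 1.0) for _ in range(3))
    z = (e_d, a * e_d + l22 * e_1, b * e_d + l32 * e_1 + l33 * e_0)
    # Normal scale mixture: dividing by sqrt(chi2_nu / nu) gives Student's t.
    scale = math.sqrt(rng.gammavariate(0.5 * nu, 2.0) / nu)
    return tuple(v / scale for v in z)


def simulate(n, z_gamma, x_beta1, x_beta0, sigma_d1, sigma_d0, nu, seed=0):
    """Simulate n subjects; each row is (D, Y, Y1, Y0)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u_d, u_1, u_0 = draw_errors(sigma_d1, sigma_d0, nu, rng)
        d = int(z_gamma + u_d > 0)
        y1 = int(x_beta1 + u_1 > 0)
        y0 = int(x_beta0 + u_0 > 0)
        y = d * y1 + (1 - d) * y0  # equation (2): only one outcome is observed
        rows.append((d, y, y1, y0))
    return rows


def naive_and_true_effect(rows):
    """Naive difference in observed means versus the mean of Y1 - Y0."""
    treated = [y for d, y, _, _ in rows if d == 1]
    control = [y for d, y, _, _ in rows if d == 0]
    naive = sum(treated) / len(treated) - sum(control) / len(control)
    true_ate = sum(y1 - y0 for _, _, y1, y0 in rows) / len(rows)
    return naive, true_ate


if __name__ == "__main__":
    rows = simulate(20000, 0.0, 0.5, 1.0, sigma_d1=0.3, sigma_d0=0.9, nu=5)
    print(naive_and_true_effect(rows))
```

With positive correlations between the selection error and both outcome errors, the naive comparison overstates the effect of treatment relative to the true average, which is the confounding that the structural model is designed to remove.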

2.2 Causal effects.

This section derives the treatment effect parameters of interest. See Angrist and Imbens (1994), Heckman (1997), Heckman, Ichimura, and Todd (1997), and Pearl (2000) for discussion of estimation of causal effects. The analysis here draws heavily on that of Aakvik, Heckman, and Vytlacil (2000) and Heckman, Tobias, and Vytlacil (2000); see also Balke and Pearl (1994), Imbens and Rubin (1997), Chib and Hamilton (2001), and Aakvik et al. (2002) for further examples of analyses of causal effects on dichotomous outcomes. The causal effect of treatment for an individual is given by

\[
\Delta = Y_1 - Y_0. \tag{5}
\]

When outcomes are binary, $\Delta$ can only take on the values $-1$, $0$, and $1$. If $\Delta = 1$, the outcome is unitary only if treatment occurs. If $\Delta = 0$, the outcome is unaffected by treatment status. If $\Delta = -1$, the outcome is unitary only if treatment does not occur. The average treatment effect for an individual with characteristics $X$ is defined as

\[
ATE(X) = E[Y_1 - Y_0 \mid X] = T_\nu(X\beta_1) - T_\nu(X\beta_0) \tag{6}
\]

where $T_\nu(\cdot)$ denotes the MVT distribution function with $\nu$ degrees of freedom. Averaging $ATE(X)$ over the sample, which is equivalent to integrating $ATE(X)$ with respect to the empirical distribution function of $X$, gives an estimate of the unconditional effect of treatment, $ATE$. This estimand answers the question, "If a randomly chosen respondent is treated, how much will the probability that $Y = 1$ change?" This effect is rarely of much interest as it includes effects for individuals who are unlikely to ever be treated and for individuals who are likely to be treated regardless of incremental changes in incentives.
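Equation (6) only requires the univariate Student's-t distribution function, so it can be sketched with the standard library alone. The version below integrates the density with Simpson's rule, which is an assumption made to keep the example self-contained; a library routine such as `scipy.stats.t.cdf` would normally be used instead. The index values passed in are hypothetical.

```python
import math


def student_t_pdf(x, nu):
    """Univariate Student's-t density with nu degrees of freedom."""
    log_c = (math.lgamma(0.5 * (nu + 1.0)) - math.lgamma(0.5 * nu)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - 0.5 * (nu + 1.0) * math.log1p(x * x / nu))


def student_t_cdf(x, nu, steps=2000):
    """CDF via composite Simpson's rule on [0, |x|], using symmetry about 0."""
    if x == 0.0:
        return 0.5
    upper = abs(x)
    h = upper / steps  # steps must be even for Simpson's rule
    total = student_t_pdf(0.0, nu) + student_t_pdf(upper, nu)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * student_t_pdf(i * h, nu)
    area = total * h / 3.0
    return 0.5 + area if x > 0 else 0.5 - area


def ate(x_beta1, x_beta0, nu):
    """Equation (6): ATE(X) = T_nu(X beta_1) - T_nu(X beta_0)."""
    return student_t_cdf(x_beta1, nu) - student_t_cdf(x_beta0, nu)


def mean_ate(index_pairs, nu):
    """Average ATE(X) over a sample of (X beta_1, X beta_0) index pairs."""
    return sum(ate(b1, b0, nu) for b1, b0 in index_pairs) / len(index_pairs)


if __name__ == "__main__":
    sample = [(0.4, 0.9), (1.1, 1.3), (-0.2, 0.5)]  # hypothetical index values
    for nu in (5, 9, 20):
        print(nu, mean_ate(sample, nu))
```

Running the same index values through several values of $\nu$ gives a quick sense of how the implied effect depends on the assumed tail thickness.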

A more economically meaningful question is, "How much did treatment affect outcomes for those who were actually treated?" The effect of treatment on the treated is

\[
\begin{aligned}
TT(X) &= E[Y_1 - Y_0 \mid X, D = 1] \\
&= \int_{-X\beta_1}^{\infty}\int_{-Z\gamma}^{\infty} t_\nu(s_1, s_2; \Sigma_{D1})\, ds_2\, ds_1 - \int_{-X\beta_0}^{\infty}\int_{-Z\gamma}^{\infty} t_\nu(s_1, s_2; \Sigma_{D0})\, ds_2\, ds_1,
\end{aligned}
\tag{7}
\]

where $t_\nu(\cdot)$ denotes the bivariate Student's-t density with $\nu$ degrees of freedom, zero means, and covariance matrix $\Sigma$, and $\Sigma_{Dj}$ denotes the covariance matrix of $(U_D, U_j)$. Averaging $TT(X)$ over the sample for which $D = 1$ yields $TT$, the mean effect of treatment on the treated.

Consider also the effect of treatment on a respondent who is indifferent over treatment status ($U_D = -Z\gamma$), the marginal treatment effect

\[
MTE(X) = E[Y_1 - Y_0 \mid X, D^* = 0] = \int_{-X\beta_1}^{\infty} t_\nu(s, -Z\gamma; \Sigma_{D1})\, ds - \int_{-X\beta_0}^{\infty} t_\nu(s, -Z\gamma; \Sigma_{D0})\, ds. \tag{8}
\]

The sample mean of $MTE(X)$ over all individuals is a consistent estimator of $MTE$, the mean marginal causal effect. Finally, the effect of treatment in a subsample of respondents induced into treatment at $z' = Z'\gamma$ but not at $z = Z\gamma$ is

\[
\begin{aligned}
LATE(X) &= E[Y_1 - Y_0 \mid X, D(z) = 0, D(z') = 1] \\
&= \frac{1}{\Pr(z < u_D < z')} \left[ \int_{-X\beta_1}^{\infty}\int_{z}^{z'} t_\nu(s_1, s_2; \Sigma_{D1})\, ds_2\, ds_1 - \int_{-X\beta_0}^{\infty}\int_{z}^{z'} t_\nu(s_1, s_2; \Sigma_{D0})\, ds_2\, ds_1 \right].
\end{aligned}
\tag{9}
\]

The local average treatment effect, LATE, is then a parameter defined by the choice of instruments and how much they are varied (Angrist and Imbens 1994).

2.3 Estimation.

Estimation is carried out by obtaining maximum likelihood estimates of the parameters in the system defined by equations (1) through (4). The likelihood can be expressed

\[
\begin{aligned}
L = {} & \prod_{D=0,\,Y_0=0} \int_{-\infty}^{-Z\gamma}\int_{-\infty}^{-X\beta_0} t_\nu(s; \Sigma_{D0})\, ds \\
\times {} & \prod_{D=0,\,Y_0=1} \int_{-\infty}^{-Z\gamma}\int_{-X\beta_0}^{\infty} t_\nu(s; \Sigma_{D0})\, ds \\
\times {} & \prod_{D=1,\,Y_1=0} \int_{-Z\gamma}^{\infty}\int_{-\infty}^{-X\beta_1} t_\nu(s; \Sigma_{D1})\, ds \\
\times {} & \prod_{D=1,\,Y_1=1} \int_{-Z\gamma}^{\infty}\int_{-X\beta_1}^{\infty} t_\nu(s; \Sigma_{D1})\, ds
\end{aligned}
\tag{10}
\]

where $\Sigma_{Dj}$ denotes the $2 \times 2$ covariance matrix of $(U_D, U_j)$. This expression follows from the properties of the multivariate Student distribution; see Johnson and Kotz (1972). The likelihood is maximized over the parameters $\theta \equiv \{\gamma, \beta_0, \beta_1, \sigma_{D0}, \sigma_{D1}\}$. Estimation requires an algorithm to integrate the bivariate Student distribution over rectangular areas.

Given the estimates $\hat{\theta}$, the estimated treatment effects for each respondent can be evaluated by the expressions in Section 2.2.
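Each factor of the likelihood in equation (10) is the probability that a bivariate Student's-t vector falls in one of four quadrant-shaped regions. The sketch below approximates those probabilities by simulation, which is a stand-in chosen for transparency and is not the integration algorithm used in the application; it is far too slow and noisy for actual estimation. All function names and argument values are illustrative assumptions.

```python
import math
import random


def draw_bivariate_t(rho, nu, rng):
    """One draw of (U_D, U_j) with unit scales, correlation rho, nu d.f."""
    e1 = rng.gauss(0.0, 1.0)
    e2 = rng.gauss(0.0, 1.0)
    scale = math.sqrt(rng.gammavariate(0.5 * nu, 2.0) / nu)
    return e1 / scale, (rho * e1 + math.sqrt(1.0 - rho * rho) * e2) / scale


def cell_probability(d, y, z_gamma, x_beta, rho, nu, draws=50_000, seed=0):
    """Simulated Pr(D = d, Y = y): one factor of the likelihood (10)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        u_d, u_j = draw_bivariate_t(rho, nu, rng)
        d_sim = int(z_gamma + u_d > 0)   # D = 1 when U_D > -Z gamma
        y_sim = int(x_beta + u_j > 0)    # Y_j = 1 when U_j > -X beta_j
        hits += (d_sim == d and y_sim == y)
    return hits / draws


def log_likelihood(observations, sigma_d1, sigma_d0, nu, draws=20_000):
    """Sum of log cell probabilities over (d, y, z_gamma, x_beta1, x_beta0)."""
    total = 0.0
    for d, y, z_gamma, x_beta1, x_beta0 in observations:
        # Treated subjects contribute through (U_D, U_1), controls through (U_D, U_0).
        x_beta, rho = (x_beta1, sigma_d1) if d == 1 else (x_beta0, sigma_d0)
        total += math.log(cell_probability(d, y, z_gamma, x_beta, rho, nu, draws))
    return total


if __name__ == "__main__":
    data = [(1, 1, 0.2, 0.8, 1.0), (0, 1, -0.5, 0.6, 1.2), (0, 0, -0.1, 0.1, -0.3)]
    print(log_likelihood(data, sigma_d1=0.3, sigma_d0=0.6, nu=9))
```

A useful check is the quadrant probability at the origin, which equals $1/4 + \arcsin(\rho)/(2\pi)$ for any elliptical distribution, including the bivariate t.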


2.4 Resampling method for the standard errors of treatment parameters.

Standard errors for both respondent-specific and averaged causal effects can be calculated using a parametric bootstrap. Under standard regularity conditions, the maximum likelihood estimates $\hat{\theta}$ are asymptotically distributed

\[
\hat{\theta} \overset{a}{\sim} N\left(\theta_0, \left(-E_0[H]\right)^{-1}\right) \tag{11}
\]

where $N(\cdot)$ denotes the multivariate normal distribution, $\theta_0$ denotes the population parameters, $E_0$ the expectation operator under the DGP, and $H$ the Hessian of the log-likelihood. The resampling procedure replaces $\theta_0$ and $H$ with estimates and proceeds by drawing replications $\theta^r$ from

\[
\theta^r \sim N\left(\hat{\theta}, -\hat{H}^{-1}\right), \tag{12}
\]

where $r \in \{1, \ldots, R\}$ indexes replications. The causal effects are evaluated for each respondent at each replication. The standard deviation observed in the bootstrap samples is an estimate of the standard error of the parameter. For example, the variance of the $t$th respondent's treatment-on-the-treated effect can be estimated by

\[
V(TT_t) \simeq R^{-1} \sum_{r=1}^{R} \left(TT_t^r(\theta^r) - \overline{TT}_t\right)^2, \tag{13}
\]

where $\overline{TT}_t$ denotes the mean over replications of $TT_t^r$.
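The resampling scheme in equations (12) and (13) can be sketched for a two-parameter case. For brevity the effect here is a Gaussian-error average treatment effect evaluated at two scalar index values, and the point estimates and covariance matrix are hypothetical numbers, not estimates from the paper. The covariance argument plays the role of the estimated inverse of the negative Hessian.

```python
import math
import random


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ate_gaussian(theta):
    """Effect evaluated at theta = (X beta_1, X beta_0) under Gaussian errors."""
    b1, b0 = theta
    return normal_cdf(b1) - normal_cdf(b0)


def draw_normal(mean, cov, rng):
    """Draw from a bivariate normal using a hand-coded 2x2 Cholesky factor."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (mean[0] + l11 * e1, mean[1] + l21 * e1 + l22 * e2)


def bootstrap_se(effect, theta_hat, cov_hat, reps=500, seed=0):
    """Parametric bootstrap: draw theta^r as in (12), then apply (13)."""
    rng = random.Random(seed)
    draws = [effect(draw_normal(theta_hat, cov_hat, rng)) for _ in range(reps)]
    mean = sum(draws) / reps
    variance = sum((v - mean) ** 2 for v in draws) / reps  # divides by R as in (13)
    return math.sqrt(variance), draws


if __name__ == "__main__":
    theta_hat = (0.5, 0.9)                     # hypothetical index estimates
    cov_hat = [[0.04, 0.01], [0.01, 0.02]]     # hypothetical covariance matrix
    se, draws = bootstrap_se(ate_gaussian, theta_hat, cov_hat)
    ordered = sorted(draws)
    print(se, ordered[int(0.05 * len(ordered))], ordered[int(0.95 * len(ordered))])
```

The sorted replications also give the bootstrap percentiles, which is how intervals such as the 5th and 95th percentiles of a mean effect can be read off.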

3 Model selection.

The degrees of freedom parameter $\nu$ indexes a family of distributions. When $\nu = 1$, the errors are multivariate Cauchy, and have no moments. As $\nu$ rises, the distribution becomes more tightly centered. At nine degrees of freedom it approximates the logistic distribution, and it approaches the Gaussian distribution as $\nu \to \infty$. Since the information matrix is block-diagonal with respect to $\nu$, the parameters $\theta$ may be estimated holding $\nu$ fixed with no need to adjust the standard errors (Lange, Little, and Taylor 1989).

An advantage of maximum likelihood estimation of the system is that the maximized likelihood provides a measure of fit which can be easily used to compare distributional assumptions (Pinheiro, Liu and Wu 1997). If $L_\nu > L_{\nu'}$ denote values of the log-likelihood maximized over $\theta$ holding the degrees of freedom at $\nu$ and $\nu'$, then

\[
2[L_\nu - L_{\nu'}] \overset{a}{\sim} \chi^2_1 \tag{14}
\]

under the null that the population value of the degrees of freedom equals $\nu'$. Estimation of the model is computationally intensive, such that an exhaustive search over values of $\nu$ low enough to substantially differ from the Gaussian distribution will often be prohibitive. However, grid searches over $\nu$ can be conducted, and the statistic above easily calculated to facilitate model selection.
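The statistic in equation (14) is simple to compute from maximized log-likelihoods. The sketch below applies it to the log-likelihood values reported in Table 1, using the identity that the upper tail of a $\chi^2_1$ variate at $x$ equals $\mathrm{erfc}(\sqrt{x/2})$; the function names are illustrative.

```python
import math


def chi2_1_sf(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def lr_test(loglik_high, loglik_low):
    """Equation (14): statistic 2[L_nu - L_nu'] and its chi-square(1) p-value."""
    stat = 2.0 * (loglik_high - loglik_low)
    return stat, chi2_1_sf(stat)


# Maximized log-likelihoods by degrees of freedom, as reported in Table 1.
LOGLIK_MVT = {5: -5786.26, 9: -5783.16, 10: -5782.93, 20: -5781.85}
LOGLIK_GAUSSIAN = -5781.10

if __name__ == "__main__":
    for nu, loglik in LOGLIK_MVT.items():
        stat, p_value = lr_test(LOGLIK_GAUSSIAN, loglik)
        print(f"nu={nu:>2}  chi2={stat:6.2f}  p={p_value:.3f}")
```

A grid search over $\nu$ amounts to repeating the estimation at each candidate value and passing the resulting log-likelihoods through this function.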

4 Application: Drinking and Employment

4.1 Problem.

This section revisits the problem of estimating the causal effect of problem drinking on probability of employment previously discussed by Mullahy and Sindelar (1996) and Terza (2002). Mullahy and Sindelar (1996) used the 1988 Alcohol Supplement of the National Health Interview Survey to estimate the effect of heavy drinking (alcohol consumption percentile > 90) on the probabilities of employment, unemployment, and out of labor force. Terza (2002) considered the male sample of the same data, but dropped the linear probability framework used by Mullahy and Sindelar, estimating a multinomial logit model of employment status treating the drinking indicator as endogenous by using a common factor error structure. Terza (2002) reports that the linear probability structure is misleading, as he recovers larger, statistically significant, and heterogeneous effects. I have used Terza's (2002) data made available in the Journal of Applied Econometrics archives.

This section simplifies the problem by treating the outcome as either employed or not employed. This simplification retains the core economic issue but allows the econometric model developed in previous sections to be implemented. Like Terza's (2002) approach, the model does not impose linearity, but also has the advantages of heterogeneous response on both observed and unobserved determinants, a more flexible error structure, and the economically meaningful distinctions between the various treatment effects. In particular, the effects of treatment on the treated and the local average treatment effect, which here estimate the effect of drinking among drinkers and the effect of policy changes on employment, are of interest.

To facilitate comparisons, the same data and core specification as in Mullahy and Sindelar and Terza are used, such that differences in estimated causal effects reflect differences in econometric models rather than differences across datasets or assumptions about causal paths. See Mullahy and Sindelar (1996) for further discussion of the data, summary statistics, and relation to other literature. The endogenous outcomes are defined as follows:

D     indicator for heavy drinking
Y1    indicator for employment if heavy drinker
Y0    indicator for employment if not heavy drinker

The covariates in the outcome and selection equations include demographic characteristics such as age, family structure, health status, and region of residence. The drinking equation also includes state-level measures of per-capita alcohol consumption and its square, beer and cigarette tax rates and their squares, and indicators for alcoholic relatives during childhood (biological mother, father, and other). See Mullahy and Sindelar (1996) and Terza (2002) for discussion of motivation for the exclusion restrictions and relevant specification tests.

4.2 Estimation.

The model was estimated using simplex and Newton-Raphson based optimization methods to maximize the likelihood (10). Integrals over the bivariate Student's-t distribution were evaluated using algorithms discussed in Genz (2001) and Genz and Bretz (2002), setting the minimum accuracy of the integral evaluations to $10^{-10}$. These specialized algorithms are faster and more accurate than a generic numerical quadrature approach. Likewise, the causal effects parameters were evaluated at the maximum likelihood structural parameter estimates using these algorithms. 500 bootstrap replications were used to estimate the standard errors of the mean treatment parameters as described in Section 2.4. All routines were written in Fortran 90 and executed on a 2.4 gigahertz Pentium 4 based machine. The outer product of the gradient approximation to the information matrix was used to estimate the covariance matrix of the parameter estimates.

4.3 Model selection.

Table 1 displays values of the maximized log–likelihood at different values of the degrees of freedom parameter. In this example, the textbook Gaussian model fits the data better than the MVT alternatives, suggesting that even the Gaussian distribution may have tails which are “too thick.” However, we could not know the Gaussian distribution fits best without estimating the fatter–tailed alternatives, and those alternatives allow comparison of the best model to misspecified alternatives. Likelihood ratio tests against the Gaussian model, also displayed in Table 1, demonstrate that MVT distributions with fewer than 10 degrees of freedom can be rejected at 5% significance.

4.4 Results.

Table 2 presents estimates of the structural parameters for the outcomes equations for the Gaussian model. The estimates for the drinking equation are quite similar to those presented by Mullahy and Sindelar (1996) and Terza (2002) and are suppressed. However, it is worth noting that many of the parameter estimates in that equation, notably on the excluded instruments, are highly significant, indicating that selection on observables is important.

Determinants of employment are very similar for drinkers and non-drinkers. Correlations between unobserved determinants of drinking and employment vary across drinking status, with much higher positive correlation for non-drinkers than for drinkers.

Table 3 presents estimates of the mean ATE, TT, LATE, and MTE, the standard deviation of these measures across individuals at the maximum likelihood structural estimates, resampled standard errors of the mean estimates, and the 5th and 95th percentiles of the bootstrap distributions of the mean effects. The LATE is defined with respect to a policy change: the tax on beer is varied from its sample minimum (0.045) to its maximum (2.37). Thus, the LATE measures the effect of drinking on employment among individuals induced to stop drinking by an increase in the beer tax from its national minimum at the time of the sampling to its national maximum.

Randomly selecting an individual and forcing him to change status from non-drinker to drinker reduces his probability of employment by an average of 14.4 percentage points. The standard deviation of average treatment effects around this mean is 8.1 percentage points. The mean marginal, local, and on-the-treated effects are somewhat smaller in magnitude, at -0.135, -0.133, and -0.126 respectively. Drinkers experience treatment effects that are smaller in magnitude than non-drinkers, suggesting that individuals select into heavy drinking based on comparative advantage. However, the bootstrapped 90% confidence interval for the mean treatment effects is quite large and admits positive effects of heavy drinking on employment for the average treatment effect.

Table 4 characterizes how $LATE(X)$ varies over the covariates in the employment equations. Note the standard errors have not been corrected; if this were a linear model the fit would be perfect and the estimated parameters simply $(\hat{\beta}_1 - \hat{\beta}_0)$. Table 4 shows that older, more educated, and particularly healthier respondents' employment is less affected by heavy drinking than that of respondents with lower human capital. Such results would be impossible to recover in a framework which assumes the causal effect operates only through the constant, such as a bivariate probit approach.

Figure 1 displays kernel density estimates of the distribution of average and local average treatment effects across individuals for the Gaussian model. The marginal and treatment-on-the-treated densities are similar to that of the local average effect and are not displayed. The average effect is more dispersed across individuals than the local effect, and places more probability mass on more negative effects. The average effect could then be misleading, as these estimates suggest that a randomly selected person would experience more deleterious effects on their labor market outcomes if they drank heavily than individuals who are likely to be influenced to drink heavily by policy changes.

Recall the MVT distribution with nine degrees of freedom has approximately the same moments as the extreme value distribution. Evaluation of the treatment effects parameters under this distribution for the errors yields substantially different results than under Gaussian errors. In particular, the average treatment effect is estimated as -21%, which is similar to Terza's (2002) estimate of -28% from a similar model with extreme value errors. Heckman, Tobias, and Vytlacil (2000) report similar discrepancies as the degrees of freedom are allowed to vary in the case of two-step estimation on continuous outcomes.

5 Conclusions.

This paper has discussed estimation of causal effects on binary outcomes using a structural model with a flexible error structure. Full information maximum likelihood estimation of the structural parameters yields asymptotically efficient estimates, and the causal effects can be evaluated in a straightforward manner. The properties of the multivariate Student's-t distribution frequently simplify and reduce the computational demands posed by the model, and maximum likelihood estimation provides simple tests for model selection. An application considered the effect of heavy drinking on probability of employment. The textbook Gaussian model was found to fit better than fatter-tailed alternatives for these data, and the estimated causal effects were shown to be sensitive to the assumed distribution of the errors. A specification approximating the moment structure of the multinomial logit model (with correlated errors), for example, produced an average treatment effect of -21%, as opposed to -14% with Gaussian errors. The estimates suggest that heavy drinking reduces probability of employment by about 13 percentage points for those likely to be influenced by policy variables such as alcohol taxes, although this effect is not precisely estimated.


6 References

Aakvik, A., T. Holmas, and E. Kjerstad (2002) "A low-key social insurance reform - Treatment effects for back pain in Norway," presented at Eleventh European Workshop on Econometrics and Health Economics, Lund, Sweden.

Aakvik, A., J. Heckman, and E. Vytlacil (2000) "Treatment effects for discrete outcomes when responses to treatment vary among observationally identical persons: An application to Norwegian Vocational Rehabilitation Programs," forthcoming, Journal of Econometrics.

Angrist, J. and G. Imbens (1994) "Identification and estimation of local average treatment effects," Econometrica 62:467-75.

Balke, A. and J. Pearl (1994) "Counterfactual probabilities: Computational methods, bounds, and applications," in R. Mantares and D. Poole (eds) Uncertainty in Artificial Intelligence 10, Kaufman.

Chib, S. and B. Hamilton (2001) "Bayesian analysis of cross-section and clustered treatment models," Journal of Econometrics 97(1):25-50.

Genz, A. (2001) "Numerical computation of rectangular bivariate and trivariate normal and student's t probabilities," Manuscript, Department of Mathematics, Washington State University.

Genz, A. and F. Bretz (2002) "Methods for the computation of multivariate t probabilities," forthcoming, Journal of Computational and Graphical Statistics.

Geweke, J. (1993) "Bayesian treatment of the independent student t linear model," Journal of Applied Econometrics 8:S19-S40.

Heckman, J. (1997) "Instrumental variables: A study of implicit behavioral assumptions used in making program evaluations," Journal of Human Resources 32:441-462.

Heckman, J., H. Ichimura, J. Smith, and P. Todd (1998) "Characterizing selection bias using experimental data," Econometrica 66:1017-1098.

Heckman, J., J. Tobias, and E. Vytlacil (2000) "Simple estimators for treatment parameters in a latent variable framework with an application to estimating the returns to schooling," NBER #7950.

Imbens, G. and D. Rubin (1997) "Bayesian inference for causal effects in randomized experiments with noncompliance," Annals of Statistics 25:305-327.

Johnson, N. and S. Kotz (1972) Distributions in Statistics, Wiley.

Lange, K., R. Little, and J. Taylor (1989) "Robust statistical modeling using the t distribution," Journal of the American Statistical Association 84:881-896.

Mullahy, J. and J. Sindelar (1996) "Employment, unemployment, and problem drinking," Journal of Health Economics 15:409-434.

Pearl, J. (2000) Causality. Cambridge: Cambridge University Press.

Terza, J. (2002) "Alcohol abuse and employment: A second look," Journal of Applied Econometrics 17:393-404.

Zellner, A. (1976) "Bayesian and non-Bayesian analysis of the regression model with multivariate student error terms," Journal of the American Statistical Association 71:400-405.

Table 1
Log-likelihoods as error distribution varies

d.f.    log-likelihood    χ²(1)    p-value
5       -5,786.26         10.32    < 0.01
9       -5,783.16          4.12    0.042
10      -5,782.93          3.66    0.055
20      -5,781.85          1.50    0.221
∞       -5,781.10

Notes:
- Each row corresponds to estimates of the structural model conditional on the errors having a multivariate Student's-t distribution with the listed degrees of freedom, where infinite degrees of freedom denotes the Gaussian model.
- The χ² statistic tests the null that the errors have the listed Student's-t distribution against the Gaussian distribution.

Table 2
Structural estimates of employment probability

                         Drinker              Not drinker
Variable              Estimate  t-ratio    Estimate  t-ratio
unemployment rate      -0.064    1.336      -0.052    3.578
age                     0.115    2.304       0.103    5.634
age2                   -0.001    2.350      -0.001    6.226
schooling               0.076    2.506       0.038    5.624
married                 0.306    1.903       0.268    5.226
family size             0.009    0.157       0.008    0.631
white                   0.375    2.224       0.394    8.097
excellent               2.088    4.889       1.606   15.207
very good               1.965    5.126       1.599   14.978
good                    2.041    5.281       1.341   12.688
fair                    1.635    4.266       0.850    7.423
Northeast              -0.184    0.878       0.048    0.750
Midwest                 0.331    1.778       0.098    1.756
South                   0.176    0.920       0.208    3.847
center city             0.033    0.189      -0.053    1.020
other MSA              -0.001    0.008       0.086    1.685
quarter 1               0.044    0.270      -0.108    1.981
quarter 2               0.034    0.205      -0.058    1.035
quarter 3              -0.057    0.326      -0.033    0.588
constant               -4.438    3.792      -2.680    6.538

Correlation           Estimate  t-ratio
σD0                     0.907    3.917
σD1                     0.291    0.662

Notes:
- Estimated simultaneously with drinking equation.
- σD0 is correlation between unobserved determinants of drinking and employment-if-not-drinker.
- σD1 is correlation between unobserved determinants of drinking and employment-if-drinker.

Table 3
Heterogeneous causal effect estimates

                                       Bootstrapped
                                                  percentiles
effect      mean effect   std. dev.   std. err.   5th      95th
ATE(X)        -0.144        0.081       0.222    -0.666    0.050
TT(X)         -0.126        0.144       0.112    -0.400   -0.050
MTE(X)        -0.135        0.137       0.142    -0.492   -0.039
LATE(X)       -0.133        0.137       0.139    -0.481    0.038

Notes:
- LATE defined with respect to a change in the beer tax from its national minimum to its national maximum.
- "Standard deviation" measures variation of effects across individuals, whereas "standard error" and the percentile estimates are estimates of the sampling distribution of the mean effect.

Table 4
How LATE(X) varies with covariates

variable             coefficient    t-ratio
unemployment rate      -0.012        -48.54
age                     0.020         61.29
age2                   -0.000        -62.91
schooling               0.015        123.69
married                 0.069         67.27
family size             0.004         18.75
white                   0.075         73.86
excellent               0.617        258.14
very good               0.590        246.30
good                    0.602        250.27
fair                    0.479        177.43
Northeast              -0.037        -32.81
Midwest                -0.059        -59.05
South                  -0.029        -29.43
center city            -0.016        -16.61
other MSA              -0.004         -5.33
quarter 1              -0.008         -8.62
quarter 2               0.005          6.04
quarter 3              -0.007         -8.01
constant               -1.332       -188.44

Notes:
- OLS estimates of regression of LATE(X) on X. Positive parameter estimates mean the effect of drinking on employment is less negative for higher values of the covariate.
- Standard errors not corrected to reflect sampling distribution of structural parameters.
- LATE defined with respect to a change in the beer tax from its national minimum to its national maximum.

Figure 1. Average and local average treatment effects. Kernel density estimates of heterogeneous causal effect of heavy drinking on probability of employment. LATE defined with respect to a change in the beer tax from its national minimum to its national maximum.
