March 19, 2011

Using Regression Discontinuity with Implicit Partitions: The Impacts of Comunidades Solidarias Rurales on Schooling in El Salvador

Alan de Brauw and Daniel Gilligan∗

Abstract

Regression discontinuity design (RDD) is a useful tool for evaluating programs when a single variable is used to determine program eligibility. RDD has also been used to evaluate programs when eligibility is based on multiple variables that have been aggregated into a single index using explicit, often arbitrary, weights. In this paper, we show that under specific conditions, regression discontinuity can be used in instances when more than one variable is used to determine eligibility, without assigning explicit weights to map those variables into a single measure. The RDD approach used here groups observations that are common across multiple criteria through the use of a distance metric and by creating an implicit partition between groups. We apply this model to the case of Comunidades Solidarias Rurales in El Salvador, which used partitioned cluster analysis to determine the order in which communities would enter the program as a function of the poverty rate and severe stunting rate. Using data collected for the evaluation as well as data from the 6th National Census of El Salvador, we demonstrate that the program increased both parvularia and primary school enrollment among children aged 6 to 12. Among children of primary school age, we further show that enrollment gains were largest among younger children and older girls.

∗ Alan de Brauw and Daniel Gilligan are Senior Research Fellows, International Food Policy Research Institute, 2033 K Street NW, Washington, DC 20006. We thank Mauricio Shi Artiga, Doug Miller, Amber Peterman, Margarita Beneke de Sanfeliú, Mauricio Sandoval, and seminar participants at the University of California, Davis, for contributions and suggestions that have strengthened this paper. The evaluation data used in the study were collected on behalf of the Government of El Salvador through the Fondo de Inversión Social para el Desarrollo Local (FISDL). Please direct correspondence to Alan de Brauw at [email protected] or at the address listed above. All remaining errors are our responsibility.

Regression discontinuity methods have become increasingly popular in the economics literature for evaluating the impacts of social programs. In general, evaluations have applied RDD around the threshold of a single metric that determines program eligibility (see Lee and Lemieux, 2010, for a review). Since the threshold is arbitrary from the perspective of the unit of intervention, units that are just eligible for the program, or have values of the metric "close" to the threshold, can be compared with units that are just not eligible to measure the local average treatment effect of the program.

Program eligibility need not be determined by a single metric. Governments or agencies charged with determining eligibility may instead choose to use two or more metrics. In that case, a common way to map them into a single measure is through a mathematical function that assigns an explicit weight to each metric. If this procedure is used, regression discontinuity remains simple to apply, provided specific assumptions are met (e.g. Imbens and Lemieux, 2008). A common example is a proxy means test, which collapses several measures into one score that then determines strict program eligibility (see, for example, Galasso, 2006; Chaudhury and Parajuli, 2008; Filmer and Schady, 2009; and Ponce and Bedi, 2010).

However, one does not necessarily need a well-defined function to determine program eligibility. For example, Kane (2003) considers the case in which students graduating from high school in California became eligible for grants only if they achieved a minimum GPA and had income and financial assets below specific thresholds.
In several other cases, authors use the distance to a city boundary as a forcing variable, which technically makes the threshold depend on two different variables, longitude and latitude (e.g. Black, 1999; Bayer et al., 2007).

Anti-poverty programs, however, are often distributed on the basis of a proxy means test or a similar function that transforms several variables into a single number. One can argue that such functions use arbitrary or politically determined weights assigned by program managers to determine the score. The approach to RDD proposed here groups observations that are similar across several characteristics in a less arbitrary way. In this approach, partitioned cluster analysis is used to classify individual observations into clusters of similar observations. If a subset of those clusters is then assigned a treatment, treatment status is completely determined by cluster membership, and therefore by the metrics used in assigning units to clusters. But an explicit threshold between treatment clusters and control clusters does not exist, so one cannot immediately apply regression discontinuity to determine program impacts.

In this paper, we develop a set of additional assumptions needed to use standard regression discontinuity methods to evaluate programs that determine treatment status using partitioned cluster analysis or similar methods. The idea behind the estimator is that we use the distance metric that determines clustering in the data to implicitly define the threshold between treatment and control groups as a function of the distance between cluster centers. Under quite reasonable assumptions, we show that this threshold can then be used in a sharp regression discontinuity estimator, with the distance from each point to the threshold serving as the forcing variable in estimation.
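The idea of an implicit threshold can be illustrated with a minimal sketch. It is not necessarily the construction developed in this paper: it assumes Euclidean distance, exactly two clusters, and simulated data. Under those assumptions, the boundary between two cluster centers is the set of points equidistant from both, and the signed distance to that boundary can serve as a one-dimensional forcing variable. The function names `lloyd_kmeans` and `signed_distance_to_partition` and all numbers are hypothetical.

```python
import numpy as np

def lloyd_kmeans(points, init_centers, n_iter=100):
    """Plain Lloyd's algorithm: alternate assignment and center updates."""
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        updated = centers.copy()
        for j in range(len(centers)):
            if np.any(labels == j):
                updated[j] = points[labels == j].mean(axis=0)
        if np.allclose(updated, centers):
            break
        centers = updated
    # Final assignment against the final centers.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)

def signed_distance_to_partition(points, treated_center, control_center):
    """Signed Euclidean distance to the set of points equidistant from both
    centers; positive values lie on the treated center's side."""
    direction = treated_center - control_center
    midpoint = (treated_center + control_center) / 2.0
    return (points - midpoint) @ direction / np.linalg.norm(direction)

# Simulated (hypothetical) units described by two eligibility metrics,
# e.g. a poverty rate and a severe stunting rate.
rng = np.random.default_rng(0)
high = rng.normal([0.60, 0.40], 0.05, size=(40, 2))
low = rng.normal([0.40, 0.25], 0.05, size=(40, 2))
points = np.vstack([high, low])

centers, labels = lloyd_kmeans(points, init_centers=points[[0, -1]])
treated = int(np.argmax(centers[:, 0]))  # assumption: the poorer cluster is treated
control = 1 - treated
forcing = signed_distance_to_partition(points, centers[treated], centers[control])
```

In this sketch, a unit belongs to the treated cluster exactly when `forcing` is positive, so treatment is a deterministic function of a single constructed variable with an implicit threshold at zero.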
We then apply this methodology to evaluate a specific program, Comunidades Solidarias Rurales (CSR) in El Salvador, which used partitioned cluster analysis to determine which municipalities would receive CSR and the order in which they entered. Using data from both the evaluation of CSR and the census of El Salvador, we compare schooling outcomes among households in municipios that entered CSR in 2006 with those in municipios entering in 2007. We find that, close to the threshold, children in the 2006 entry group are more likely to have enrolled in parvularia at age 6, and that school enrollment rates among primary school age children increase by 4 percentage points. Using the census data, we further disaggregate these results by age and gender.

The paper proceeds as follows. First, we briefly review the one-dimensional regression discontinuity estimator, including the assumptions necessary for the estimator to provide an unbiased estimate of the treatment effect. Second, we provide a brief description of partitioned cluster analysis. Third, we develop conditions for an N-dimensional regression discontinuity estimator to be valid. The fourth section presents basic information about CSR and describes the data sources used for analysis. The fifth section presents results and the sixth section concludes.

1 Regression Discontinuity Design

Regression discontinuity designs are typically classified as either sharp or fuzzy. The estimator we develop follows the sharp design, so we review it here. Following the notation of Imbens and Wooldridge (2008), we consider two potential outcomes for unit $i$, namely $Y_i(0)$ and $Y_i(1)$, where the difference $Y_i(1) - Y_i(0)$ defines the causal effect of the treatment. The observed outcome is equal to:

$$Y_i = (1 - W_i) \cdot Y_i(0) + W_i \cdot Y_i(1) \qquad (1)$$

where $W_i \in \{0, 1\}$ is the treatment indicator variable. The idea behind a sharp regression discontinuity evaluation is that there is a variable $X_i$ that completely determines whether or not a unit receives the treatment. Denoting the threshold value of $X_i$ by $c$, a unit receives the treatment if $X_i \ge c$, which implies:

$$W_i = \mathbf{1}\{X_i \ge c\} \qquad (2)$$

In a sharp regression discontinuity design, all units with a value of $X_i$ of at least $c$ receive the treatment, and units with a value of $X_i$ below $c$ do not, effectively becoming the control group. The average treatment effect at the threshold, $\delta$, is the difference between the mean outcome for units with values of $X_i$ just above the threshold ($Y^+$) and just below the threshold ($Y^-$). It can be written as the difference in conditional expectations between units just above and just below the threshold:

$$\delta = Y^+ - Y^- = \lim_{\varepsilon \to 0} \left[ E(Y_i(1) \mid X_i = c + \varepsilon) - E(Y_i(0) \mid X_i = c - \varepsilon) \right] \qquad (3)$$
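Equations (1) to (3) can be illustrated with simulated data. The sketch below is an illustration under stated assumptions, not the paper's estimator: the threshold, the size of the jump, the outcome model, and the window width `eps` are all invented, and the limit in equation (3) is approximated by comparing mean outcomes in a narrow window on each side of the threshold.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000
c = 0.0             # threshold (assumed)
true_delta = 0.04   # assumed jump in the outcome at the threshold

x = rng.uniform(-1.0, 1.0, n)   # forcing variable X_i
w = (x >= c).astype(int)        # equation (2): sharp assignment

# Potential outcomes: Y_i(0) varies smoothly with X_i; Y_i(1) adds a constant effect.
y0 = 0.80 + 0.05 * x + rng.normal(0.0, 0.02, n)
y1 = y0 + true_delta
y = (1 - w) * y0 + w * y1       # equation (1): observed outcome

# Equation (3), approximated with a small but finite window on each side of c.
eps = 0.05
above = (x >= c) & (x < c + eps)
below = (x < c) & (x >= c - eps)
delta_hat = y[above].mean() - y[below].mean()
```

Because the window is finite and $Y_i(0)$ slopes upward in $X_i$, `delta_hat` carries a small bias that shrinks as `eps` falls, at the cost of using fewer observations.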

for $\varepsilon > 0$. To estimate $\delta$, one needs estimates of both $Y^+$ and $Y^-$, which can be written quite generally in the form of non-parametric regressions:



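$$\hat{Y}^+ = \frac{\sum_{i: X_i \ge c} Y_i \, K\!\left(\frac{X_i - c}{h}\right)}{\sum_{i: X_i \ge c} K\!\left(\frac{X_i - c}{h}\right)}, \qquad \hat{Y}^- = \frac{\sum_{i: X_i < c} Y_i \, K\!\left(\frac{c - X_i}{h}\right)}{\sum_{i: X_i < c} K\!\left(\frac{c - X_i}{h}\right)}$$

where $K(\cdot)$ is a kernel function and $h$ is the bandwidth.

[Table body not recoverable.]
Source: Impact Evaluation Baseline Survey, Comunidades Solidarias Rurales, 2008.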
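The kernel-weighted one-sided means used for $\hat{Y}^+$ and $\hat{Y}^-$ in Section 1 can be sketched as follows. This is a minimal illustration rather than the code behind the paper's results: the function name `sharp_rd_kernel`, the default Epanechnikov kernel, the bandwidth, and the toy data are assumptions made for the example.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def sharp_rd_kernel(y, x, c, h, kernel=epanechnikov):
    """Kernel-weighted mean outcomes just above and just below the threshold c.

    Returns (delta_hat, y_plus, y_minus), with delta_hat = y_plus - y_minus.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    above = x >= c                       # treated side, matching W_i = 1{X_i >= c}
    below = ~above                       # control side
    k_above = kernel((x[above] - c) / h)
    k_below = kernel((c - x[below]) / h)
    if k_above.sum() == 0 or k_below.sum() == 0:
        raise ValueError("no observations within the bandwidth on one side of c")
    y_plus = np.sum(y[above] * k_above) / k_above.sum()
    y_minus = np.sum(y[below] * k_below) / k_below.sum()
    return y_plus - y_minus, y_plus, y_minus

# Toy data: the outcome is flat on each side of c = 0 and jumps by 0.04 at c.
x = np.linspace(-1.0, 1.0, 201)
y = 0.85 + 0.04 * (x >= 0.0)
delta_hat, y_plus, y_minus = sharp_rd_kernel(y, x, c=0.0, h=0.25)
```

A local-constant estimator of this kind is biased at the boundary when the outcome slopes in $X_i$ near the threshold, which is one reason local linear regression is often preferred in applied work.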

Table 2. School Enrollment, Children Aged 7-12, in Year Previous to Survey for Baseline and at Present for follo…
[Table body not recoverable; surviving row label: Enrolled in 2007.]
Notes: Standard errors in parentheses clustered at the canton level. The averages for 2006 are …

[Fragment of a further table; body not recoverable; surviving row label: 2007.]
Source: Censo Nacional de Población y Vivienda de 2007, El Salvador.

Table 5. School Enrollment Rates, 2007, Rural El Salvador, by Poverty Group
Columns: Poverty Group; Enrollment Rate, 6-12 Year Olds; Number of Observations.
Rows: Severe; High; All Other Rural Areas.
[Cell values not recoverable.]
Source: Censo Nacional de Población y Vivienda de 2007, El Salvador.

Table 6. Regression Discontinuity Results for Impact of Transfer Associated …
[Table body not recoverable; surviving row label: Number of Obs.]
Notes: Standard errors in parentheses clustered at the municipio level. * indicates significance at the 10 percent level; ** indicates significance at the 5 percent level; *** indicates significance at the 1 percent level. For kernel estimates, standard errors are bootstrapped using 100 replications of the data.

[Fragment of a further table; body not recoverable; surviving row label: Number of Observations.]
Notes: Standard errors clustered at the municipio level are in parentheses. * indicates significance at the 10 percent level, and ** indicates significance at the 5 percent level. For kernel estimates, standard errors are bootstrapped using 100 replications of the data. All regressions include a full set of age and gender dummies.
Source: Censo Nacional de Población y Vivienda de 2007, El Salvador.

Table 8. Impact of Comunidades Solidarias Rurales on School Enrollment, by Age and Gender, Band…
[Table body not recoverable; surviving row labels are ages 8, 9, 10, 11, and 12.]
Notes: Standard errors clustered at municipio in parentheses. Each cell represents a separate regression. * indicates significance at the 10 percent level; ** indicates significance at the 5 percent level. For kernel estimates, standard errors are bootstrapped using 100 replications of the data. Regressions compare individuals in 2006 entry municipios …

[Fragment of a further table; body not recoverable; surviving row labels: Regression; Gaussian; Epanechnikov; Number of Obs.]
Notes: Standard errors clustered at the municipio level are in parentheses. * indicates significance at the 10 percent level, and ** indicates significance at the 5 percent level. For kernel estimates, standard errors are bootstrapped using 100 replications of the data.
Source: Censo Nacional de Población y Vivienda de 2007, El Salvador.

Table 10. Regression Discontinuity Results for Impact of Transfer Associated …
Rows: Impact at Threshold; Difference in Distance; Difference in Distance × Threshold; Gender (1=male); Age (in years); Female Headed Household (1=yes); Head is Literate (1=yes); Age of Household Head; Number of Obs.
[Cell values not recoverable.]
Notes: Standard errors clustered at the municipio level are in parentheses. * indicates significance at the 10 percent level, and ** indicates significance at the 5 percent level.
Source: Censo Nacional de Población y Vivienda de 2007, El Salvador.

Appendix Table 1. Regression Discontinuity Results for Impact of Transfer Associated …
[Table body not recoverable.]
Notes: Standard errors clustered at the municipio level are in parentheses. * indicates significance at the 10 percent level, and ** indicates significance at the 5 percent level. For kernel estimates, standard errors are bootstrapped using 100 replications of the data. All regressions include a full set of age and gender dummies.
Source: Censo Nacional de Población y Vivienda de 2007, El Salvador.