Exploiting Impact Heterogeneity: A Statistical Model of Programme Selection∗

Jonas Staghøjᵃ†, Michael Svarerᵃ & Michael Rosholmᵇ

a) School of Economics and Management, University of Aarhus
b) Department of Economics, Aarhus School of Business
May, 2007
Abstract

In Denmark and many other countries, Active Labour Market Policies (ALMPs) are heavily used instruments to combat unemployment. Some types of ALMP are supposed to enhance the general skills of the unemployed, while other programmes are of a more specific nature. Some programmes may even have the main purpose of being unpleasant, thereby motivating the unemployed to search harder for a job because the value of staying unemployed is lowered. This variety of programmes raises the question of how to allocate unemployed individuals to ALMPs in an optimal way. In this paper we estimate a statistical model capable of providing caseworkers with information about the impacts of different programmes conditional on observed characteristics of the unemployed. A multivariate duration model with unobserved heterogeneity is used to estimate heterogeneous treatment effects, and the allocation of programmes can then be based on these estimated effects. We compare different allocation mechanisms, and the results show that the choice of allocation mechanism has a considerable impact on the average duration of unemployment.

Keywords: Profiling, Targeting, Statistical Treatment Rules, Heterogeneous Effects.
∗ Preliminary Draft.
† Corresponding author. Tel.: +45 8942 1584. E-mail address: [email protected]
1 Introduction
In this paper we develop a model that tries to improve the allocation of unemployed workers into different types of Active Labour Market Programmes (ALMPs). During the last decade, ALMPs have received increasing emphasis in many countries, and many politicians have seen them as a promising way of decreasing unemployment. But evaluation studies typically find the effects of these programmes to be insignificant or even negative.¹ One reason for this discrepancy could be that politicians are actually right about the value of ALMPs, but have not been able to implement them in an optimal way. If the effects of ALMPs are heterogeneous, a possible scenario is that the different programmes are simply not allocated to the unemployed in an optimal way. Our results show that the choice of allocation mechanism has a considerable impact on the average duration of unemployment and hence on the unemployment rate.

Allocation of the unemployed to different types of ALMPs typically involves a large degree of caseworker discretion. Some deterministic rules, such as meeting certain broad criteria regarding unemployment, may initially screen the unemployed and state that programmes should be allocated to a particular group. But such deterministic rules are often just an initial screening device, and the actual choice of a particular programme is usually decided by the caseworker and perhaps the individual unemployed. In this paper we argue that this may not be an optimal way of allocating programmes. Instead we propose a model for statistical programme selection which should be able to illuminate statistically important patterns that caseworkers are not able to see. We estimate a multivariate duration model and use the "timing-of-events" approach developed by Abbring & van den Berg (2003) to take unobserved heterogeneity into account.

The idea of using a statistical model for the allocation of programmes to the unemployed is not new, and quite a few countries have developed and run experiments involving similar models in the last decade. Following Frölich, Lechner & Steiger (2004), the models can be divided into two broad categories labelled Profiling and Targeting. Profiling models attempt to identify and allocate programmes to those unemployed at risk of staying unemployed for a long period; examples of this kind of model include the Worker Profiling and Reemployment System (WPRS), implemented in 1995 in the United States, and A Danish Profiling System, implemented in Denmark in 2004.²
¹ See e.g. Heckman et al. (1999) for a general review of treatment effects or Bolvig et al. (2003), Rosholm & Skipper (2003) and Munch & Skipper (2003) for analyses of Danish ALMPs.
The effectiveness of these systems rests on past research suggesting that ALMPs are more effective if the group of potentially long-term unemployed can be identified and offered intensive treatment very early in the unemployment spell. This may be a sensible way to allocate the unemployed, but if the overall goal of the system is to maximize the effects of ALMPs, a more direct approach is to allocate the unemployed directly on the basis of their estimated effects of ALMPs. This alternative modelling approach is called Targeting, and models of this type have been developed in Switzerland (SAPS) and Germany (TrEffeR).³ In Switzerland, the SAPS model has been implemented in a randomized pilot study to evaluate the effectiveness of the model, and results from this study will be available in 2007. Germany is about to start similar experiments with its TrEffeR model. The model we use in this paper is quite similar to these models, although we use another econometric model which is based on somewhat less strict assumptions.

Other important papers about statistical treatment models include Berger, Black & Smith (2000) and Manski (2004). Berger, Black & Smith (2000) contains a theoretical discussion of how to design profiling systems, which provided the inspiration for the present paper, and it also contains some results on possible trade-offs between equity and efficiency. Manski has written several papers about statistical treatment models with a stricter, model-based approach, and in Manski (2004) he analyzes how a utilitarian planner should use information on treatment outcomes obtained from a classical randomized experiment to provide an optimal allocation of treatments. Recently, an entire issue of the Economic Journal was devoted to profiling models, and the introduction by Manski contains a good description of possible uses of profiling models (Manski (2006)).

The paper is organized as follows. In section 2 we provide a framework for analyzing the effectiveness of profiling and targeting models. In section 3 we describe the econometric model, and section 4 contains a description of the data. Section 5 explains the actual estimation and proposed implementation of the system, and section 6 provides the results. Finally, section 7 concludes.
² See Eberts, O'Leary & Wandner (2002) for descriptions and evaluations of profiling and targeting systems in the United States and Rosholm, Staghøj, Svarer & Hammer (2006) for a description of the Danish model.
³ See Frölich (2006) or Behncke, Frölich & Lechner (2006) for more details about the SAPS (Statistically Assisted Programme Selection) model and Stephan, Rässler & Schewe (2006) for information about TrEffeR (Treatment Effects and Prediction).
2 Profiling and Targeting models
2.1 Framework for analyzing statistical models of programme selection
Profiling, Targeting and Statistical Programme Selection are different names for similar mechanisms. In this section we discuss how and when to use these methods in different settings. The potential scope for such mechanisms is very broad and includes applications in Finance, Medicine, Insurance, Criminology, Marketing and Data Mining.⁴ The main purpose of using a statistical programme selection mechanism is to reveal systematic relationships between some observed variables, X, and an outcome variable, Y, and then use this information to provide better allocations of some kind of treatment. Denoting {0, 1, ..., R} as the set of possible programmes, we are looking for a mapping of characteristics and outcomes into the set of programmes:

{X, Y} → {0, 1, ..., R}

In this paper we analyze the allocation of ALMPs to the unemployed using a statistical programme selection mechanism. This is an interesting problem because of the increased use of ALMPs in many countries and because resource constraints imply the need for a precise targeting of expensive programmes. Demographic changes in the coming years, contributing to an increased demand for labour, may very well magnify the importance of a precise targeting system.

Before considering how to analyze the relationships between X, Y and the programmes, we define what the system is supposed to achieve, because the goals are important for analyzing how to set up and implement the system in an optimal way. Also, with clearly defined goals, we can more easily evaluate the system afterwards. Ideally we would like to specify a welfare function for society and let this function guide the implementation of the system. For a given welfare function, the programmes should always be allocated so as to maximize that function, by allocating programmes to those benefiting the most from participation, measured in terms of changes in the overall welfare function. Even though this way of setting up the problem is very general, and perhaps not very useful for practical purposes, it can be helpful for structuring the
⁴ For applications of profiling in other kinds of literature see e.g. Auerhahn (1999) for an application in Criminology, Yeo et al. (2001) for an application about Insurance, Shaw et al. (2001) for an application in Marketing, and Khan et al. (2001) for an application in Medicine.
thoughts about how to define some more concrete goals. Ultimately, the maximization of many sorts of welfare functions boils down to maximizing equity and/or efficiency. Increasing equity should only be the goal if it is the most effective way of increasing the welfare function; a maximin welfare function is an example of a welfare function leading to this choice of goal. Similarly, if improving efficiency is the goal, the reasoning should be that an increase in the welfare function is most effectively achieved by improving efficiency. In this paper we consider a goal of maximizing efficiency.

Having defined the goals of the system, we can move on to discuss how to implement it. The first objective is to define the outcome variable, Y, that is, the variable which is going to be used for the statistical programme selection. The stated goals for the system should guide the choice of outcome variable, but even with clearly defined goals, the definition of the outcome variable(s) is not given. The choice of outcome variable may also depend on the available X variables, or rather, on how precisely we are able to relate X to Y. Usually Y is chosen as a scalar, but it could in principle be an index measuring different aspects related to the goals. In the case of ALMPs, possible Y variables include the duration of unemployment, the duration of the subsequent job, average time spent in unemployment over some time period, or the wage in the next job. To summarize, the outcome variable should be highly correlated with the goals, it should vary over the relevant population, and we should be able to explain significant parts of this variation. We choose the impact on the duration of unemployment as the outcome variable.

As an illustration, consider the concrete example of using a statistical model to identify individuals at greatest risk of becoming long-term unemployed. Equity could be one of the goals behind this system, if these individuals are generally those with the lowest welfare, and if the programmes in fact improve the situation of the participants. But this allocation mechanism could also result from a goal of increasing efficiency, if these individuals are actually among those who benefit the most from the assigned programmes. Rosholm, Staghøj, Svarer & Hammer (2006) estimate such a model and define the outcome variable, Y, as the probability of remaining in unemployment 26 weeks after the point in time where the prediction is made. Caseworkers then receive this information and decide on the allocation of programmes to the unemployed. When defined in this way, the interesting statistical relationship is that between X and Y, and because of the definition of the outcome variable, the most important feature becomes predictive ability. A duration model for the time spent in unemployment is estimated and used to calculate the relevant probabilities, but in this case any econometric model
that reveals the systematic relationship between X and Y could have been used, without thinking about causal relationships. A model like this is what we refer to as a Profiling model, because it only takes the ex ante situation of the individuals into account when profiling.

When we instead define the outcome variable, Y, as the impact on the duration of unemployment, this clearly serves a goal of increasing efficiency. With this specification of the outcome variable we have to be more careful when estimating the relationship between X and Y in order to get consistent estimates of causal effects. We then also have to consider a way of relating outcomes to programmes. A model like this is what we refer to as a Targeting model, because it does not only take the ex ante situation into account, but also uses information about the predicted situation after the programmes are assigned. This is the problem we analyze in this paper, and it involves a more detailed econometric analysis compared to the pure profiling approach.
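To make the operational difference concrete, the sketch below contrasts the two outcome definitions. It is a minimal illustration, not the system estimated in this paper; the function names and the fitted prediction routines they call are hypothetical.

```python
import numpy as np

def profiling_score(x, predict_survival):
    """Profiling: ex ante risk only - the predicted probability of still
    being unemployed 26 weeks ahead, given characteristics x.
    `predict_survival` stands in for some fitted prediction routine."""
    return predict_survival(x, weeks=26)

def targeting_choice(x, predict_duration, n_programmes=4):
    """Targeting: predicted impact of each programme on the duration of
    unemployment, relative to programme 0 (No Programme)."""
    durations = np.array([predict_duration(x, r)
                          for r in range(n_programmes + 1)])
    impacts = durations[0] - durations[1:]   # weeks of unemployment saved
    if impacts.max() <= 0:
        return 0                             # No Programme is best
    return int(np.argmax(impacts)) + 1       # best programme, 1..R
```

The contrast shows why the econometric requirements differ: profiling needs only predictive ability for one observed outcome, whereas targeting requires an estimate of R causal impacts for every individual.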
2.2 Statistical model vs. caseworker discretion
As mentioned in the introduction, current allocation mechanisms typically consist of a combination of an initial deterministic screening device and discretionary power granted to caseworkers. In this section we therefore discuss the contributions of a statistical model and compare it to caseworkers. The first question to address is whether the system should be implemented as an alternative to caseworkers or as a tool offered to caseworkers. If implemented as an alternative to caseworkers, the system would ensure equal treatment of similar people⁵, which might very well be an important criterion, and it would of course offer considerable scope for cost savings since no caseworker salaries would have to be paid. But we argue that using the system to equip caseworkers with additional information is a better idea in the setup considered in this paper. Since the institutions handling the allocation of the programmes are often publicly funded, implementing the system as an information system might also be seen as an attempt to compensate for the lack of a natural pricing mechanism guiding caseworkers on which programmes are the most valuable for a particular type of unemployed.

When considering a mixture of a statistical programme selection system and caseworker discretion, we need to think about the interactions between the different mechanisms.
⁵ Analyses in Denmark have shown huge differences in the performance of local institutions, and differences between individual caseworkers may exist as well.
A good statistical system provides caseworkers with new and relevant information, so the important ability of the system becomes the additional degree of explanation it can bring. An important question to ask is when in the process the two mechanisms should be applied. Although we want to exploit heterogeneity, it might be reasonable to expect the statistical model to work best when considering somewhat similar individuals. So perhaps caseworkers could be given the task of sorting the unemployed into groups of somewhat homogeneous individuals; for example, into one group of individuals who are just unemployed and another group of individuals who have other problems as well. The statistical predictions would probably not be very accurate for a person with an alcohol abuse problem or similar problems not observed in the data, but for relatively "mainstream" unemployed the system may do a good job in assisting the caseworkers.

The main contribution of a statistical system is the possibility to include literally hundreds of thousands of observations in performing inference on previously treated unemployed and then use this information in an attempt to predict the future. Even experienced caseworkers will only meet a very limited number of unemployed, and furthermore, caseworkers might not be able to follow the unemployed over a sufficiently long period to actually see the results after the programmes are finished. Also, it may be impossible for caseworkers to recognize the effects from a very small sample of unemployed if, for example, the effect is a 25% increase in an already very small probability of finding a job in a given time period. One way to address this lack of feedback could be to implement a statistical information system.

So the statistical system observes and systematically analyzes a lot of information for many different types of individuals, but the caseworker typically observes some information about the individuals which cannot be included in the statistical system. Motivation and ability are typical examples of this kind of local information. These concepts could be seen as unobserved variables which are important to take into account and may be partially observed by the caseworker, but are not included in the statistical model. If these unobserved variables are sufficiently important for estimating the true relationships between the X and Y
variables, then caseworkers may actually do a better job than the statistical model. Lechner & Smith (2006), however, show some evidence suggesting that caseworkers are not very good at predicting outcomes or, interpreting the results differently, that they are not seeking to maximize the expected outcomes. In their paper they actually show that caseworkers are no better than a random selection mechanism.⁶
⁶ Their results are conditional on the estimated model, so strictly speaking, the caseworkers could still be superior to the model if local information is very important. The results from the ongoing pilot study in Switzerland will add information on this issue.
Similar results are found in Bell & Orr (2002) and in Aakvik, Heckman & Vytlacil (2005), where caseworkers are actually found to target the unemployed with the lowest effects.

An obvious, and perfectly valid, objection often raised against a statistical programme selection system is that the statistical system cannot estimate the outcomes with sufficient precision. We therefore have to pay attention to the uncertainty of the model predictions and also consider possible ways to present the results of the model. But caseworkers, and other allocation mechanisms, make mistakes as well, and with a statistical programme selection system we can at least try to measure the uncertainty, so this argument should not ex ante be a major obstruction to the estimation of a statistical model.
3 Econometric Model
3.1 Potential outcomes
In this paper we estimate a model which is supposed to maximize the efficiency of the programmes, measured in terms of their ability to decrease the duration of unemployment. To operationalize this, we have to predict the future outcome for each individual, conditional on participating in each of the possible programmes. Then we can measure the impact as the difference in the predicted outcomes compared to the option of No Programme, and choose the programme which maximizes the impact (choosing No Programme if all impacts are negative). In the econometric exercise of estimating these impacts we encounter what is called the fundamental evaluation problem, and to discuss this we introduce the concept of potential outcomes.⁷ For each individual we define the potential outcomes as Y0, Y1, ..., YR, where {0, 1, ..., R} is the set of possible programmes, with 0 denoting the option of No Programme. By constructing these hypothetical outcomes we are able to analyze various features of the distribution of conditional outcomes, and hence also the expected impact compared to No Programme. The fundamental evaluation problem is that for each individual we observe at most one of the potential outcomes, since the programmes are mutually exclusive.

⁷ Rubin (1974).
All other potential outcomes remain counterfactuals, and we need additional identifying assumptions in order to estimate them. If assignment to programmes were completely random, the estimation would be straightforward, but since the assignment of the unemployed into programmes is probably not random, we have to distinguish causal effects from selection effects. One possible identifying assumption, used when estimating the Swiss SAPS model, is the Conditional Independence Assumption (CIA), which can be stated as

(Y0, Y1, ..., YR) ⊥ D | X = x,   ∀x ∈ χ
where D ∈ {0, 1, ..., R} is an indicator of which programme the individual is assigned to, and χ is the relevant set of characteristics. With this assumption we get that

E(Yr | X, D = r) = E(Yr | X, D = r′) = E(Yr | X)

and hence we can estimate the actually experienced outcomes, as well as the counterfactuals, as long as we condition on X. If the assignment of programmes is random, this assumption is clearly fulfilled when the support conditions are satisfied, but the selection into programmes is also allowed to depend on the observed variables, X. Another way to explain this assumption is that we have to observe and condition on all variables which influence both the potential outcomes and the selection process into the programmes. Proper use of this identifying assumption will typically require a very rich data set containing detailed data on individual characteristics as well as market-specific information. For the present analysis we actually have access to quite detailed data, so it might be reasonable to assume that the CIA is fulfilled. But if there exist unobserved variables which influence the selection process as well as the potential outcomes, the CIA approach will result in biased estimates. As described in the following section, we use the "timing-of-events" model developed by Abbring & van den Berg (2003) to estimate the potential outcomes, and this actually allows some kinds of unobserved heterogeneity to be present, so strictly speaking we do not need the CIA to be fulfilled. As will be explained later on, the unobserved heterogeneity we introduce is however quite restrictive, so we will probably still have to argue that the CIA is at least approximately fulfilled. Finally, we assume that the Stable Unit Treatment Value Assumption (SUTVA) is fulfilled, which means that the potential outcomes for each individual do not depend on the treatment of other individuals.⁸ This implies that we ignore possible general equilibrium effects and restrict ourselves to a partial equilibrium analysis.
⁸ Rubin (1980).
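As a minimal illustration of the identification argument above (and only that; the paper itself uses the duration model of section 3.2 rather than regression), one could estimate E(Yr | X) by fitting a separate regression within each programme group and predicting all potential outcomes for every individual. The code below is a hedged sketch under the CIA, with hypothetical function names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_potential_outcome_models(X, y, d, R):
    """Under the CIA, E(Y_r | X) is identified from the participants in
    programme r alone: fit one regression per programme group 0..R."""
    return [LinearRegression().fit(X[d == r], y[d == r])
            for r in range(R + 1)]

def predicted_impacts(models, X):
    """Predicted outcomes under each option; the impact is the reduction
    in duration relative to programme 0 (No Programme)."""
    y_hat = np.column_stack([m.predict(X) for m in models])
    return y_hat[:, [0]] - y_hat[:, 1:]
```

The argmax of each row of `predicted_impacts` then gives the efficiency-maximizing allocation rule described at the start of this subsection.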
3.2 Duration analysis
We want to allocate individuals to the programme where they experience the shortest expected duration of unemployment. Hence, the stochastic variable of interest is the duration of unemployment, Tu ∈ (0, ∞). To model the selection process into programmes, we define another stochastic variable, Tp ∈ (0, ∞), as the duration until assignment to ALMP. If Tp < Tu we observe when the individual is assigned to ALMP, and if Tp ≥ Tu then Tp is right-censored at Tu and the individual has not participated in ALMP before leaving unemployment. The duration until participation in a programme, Tp, is modelled from the durations until assignment into each of the 4 types of programmes, Tp1, Tp2, Tp3, Tp4.⁹ We only evaluate each individual's first ALMP, so if we observe the start of a second programme, the duration is treated as right-censored.

The identifying strategy in the timing-of-events approach is to use exogenous variation in the time until the unemployed are assigned to programmes. This strategy is well suited for an evaluation of ALMPs in Denmark, because we actually observe a lot of variation in the time until individuals are assigned to a programme. Some unemployed are assigned to a programme very early in their unemployment spell, and if there is exogenous variation in the timing of the assignments, we can use similar unemployed, not yet assigned to a programme, as the relevant counterfactuals. We model the selection processes into programmes conditional on observable variables, and argue that the remaining variation is exogenous. One possible source of exogenous variation in the timing is supply constraints at the local unemployment office.

A possible problem with the timing-of-events approach in this context could be that the unemployed in Denmark are required to participate in some kind of ALMP after a certain period of unemployment. After one year of what we label "open" unemployment, the unemployed enter the activation period, in which they should in principle participate in a programme 75% of the time. But apparently this is not enforced too strictly by the labour market authorities, so even at longer unemployment durations, we still have some individuals who can be used as counterfactuals. Another important assumption underlying this model is that the unemployed do not know the precise timing of when they will be assigned to a programme, although they are allowed to know the distribution
⁹ The types of programmes are described in the data section.
of time until assignment to a programme. If the unemployed are told in advance when they will be assigned to a programme, they may change their behaviour before entering the programme, and hence we cannot use them as proper counterfactuals. Again, we argue that this is probably not a serious problem in the Danish case, as programme participation is typically not planned a long time before actually taking place.
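To fix ideas about the censoring conventions introduced at the start of this subsection, the following sketch builds one spell record; all field names are hypothetical illustrations, not the actual data layout.

```python
def build_spell_record(t_u, exit_observed, t_p_first=None, programme=None):
    """One unemployment spell.
    t_u:           weeks until leaving unemployment (or end of window)
    exit_observed: True if the exit from unemployment is observed (C_i = 1)
    t_p_first:     weeks until the first programme starts, None if never
    programme:     type of the first programme, in {1, 2, 3, 4}
    """
    assigned = t_p_first is not None and t_p_first < t_u
    return {
        "t_u": t_u,
        "c_i": 1 if exit_observed else 0,       # censoring dummy for T_u
        "t_p": t_p_first if assigned else t_u,  # T_p right-censored at T_u
        "p_censored": not assigned,
        "programme": programme if assigned else None,  # first ALMP only
    }
```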
3.2.1 Hazard functions
The central concept in duration analysis is the hazard function, so to estimate the model we have to specify this. A hazard function gives the probability of exit from one state to another at time t, conditional on having survived in the state until time t:

θ(t|x_t, v) = lim_{dt→0} P(t < T < t + dt | T > t, x_t, v) / dt    (1)
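Equation (1) ties the model to the survivor function through the standard duration-analysis identity (a textbook result, stated here because the likelihood below is built from exactly these two objects):

S(t|x, v) = P(T > t | x, v) = exp( − ∫₀ᵗ θ(s|x_s, v) ds )

so that the density of a completed spell is f(t|x, v) = θ(t|x_t, v) S(t|x, v): uncensored spells contribute the hazard at the exit time multiplied by the survivor function, while censored spells contribute the survivor function alone.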
We model the hazard functions as Mixed Proportional Hazards, which means that each hazard is modelled as the product of a baseline hazard, λ(t), depending on time, and a scaling function, φ(x_t, v), depending on the observed characteristics and possibly also some unobserved heterogeneity terms.¹⁰ This functional form is often used in duration analysis because it is convenient for estimation and can be made quite flexible by specifying the baseline hazard as a flexible function. The Mixed Proportional Hazard form is needed for identification in the timing-of-events approach because the unobserved heterogeneity is identified from the observed non-proportionality in the hazard rates conditional on observed characteristics.¹¹

We allow for unobserved heterogeneity by introducing the stochastic variables Vu and Vp = (Vp1, Vp2, Vp3, Vp4) as unobserved variables, which are allowed to have direct effects on the hazards into employment and programmes.¹² We restrict the distributions of the unobserved variables to be discrete with only two mass points, and allow the distributions of Vu and Vp to be correlated. This way of introducing unobserved heterogeneity is based on Heckman & Singer (1984) and is an often used method in econometric duration analysis. Van den Berg (2001) concludes that single-spell duration analysis is very sensitive to the specification of the distribution of unobserved heterogeneity, but also writes that "... a consensus has emerged that multi-spell data allow for reliable inference that is robust with respect to the specification of the unobserved heterogeneity distribution."¹³
¹⁰ As a result of the time-varying covariates, this scaling function is actually time-varying, but we use the conventional notation and denote it as a scaling function.
¹¹ See Van den Berg (2001) for a detailed discussion of identification issues in duration models.
¹² It is well known that problems with unobserved heterogeneity are particularly important to handle in duration models. In contrast to usual regression models, even unobserved heterogeneity which is uncorrelated with the covariates implies biased results in duration models if not taken properly into account.
The reason is that with multi-spell data, identification does not rest completely on the proportionality assumption when we assume the unobserved heterogeneity term to be constant over time for each individual. And since we observe multiple spells in our event history data, we argue that the discrete distribution for unobserved heterogeneity is applicable in this context.

One of the mass points in each marginal distribution is normalized to zero, so Vu ∈ {v_u^1 = 0, v_u^2} and Vpj ∈ {v_pj^1 = 0, v_pj^2}, j = 1, 2, 3, 4. The distributions of the unobservable terms in the 4 different activation hazards are assumed to be perfectly correlated, which means that if an individual is more likely to be assigned to a particular type of programme, this individual is also more likely to be assigned to the other types of programmes. This assumption can be relaxed to allow for different selection on unobservables into the different programmes, but the perfect-correlation restriction simplifies the estimation process. The correlation between Vu and Vp is important, because this is how the procedure allows selection on unobservables without a resulting bias in the estimated effects. The associated probabilities for all the possible combinations from the discrete distributions are defined as

P1 = Pr(Vu = v_u^1, Vp1 = v_p1^1, Vp2 = v_p2^1, Vp3 = v_p3^1, Vp4 = v_p4^1)
P2 = Pr(Vu = v_u^2, Vp1 = v_p1^1, Vp2 = v_p2^1, Vp3 = v_p3^1, Vp4 = v_p4^1)
P3 = Pr(Vu = v_u^1, Vp1 = v_p1^2, Vp2 = v_p2^2, Vp3 = v_p3^2, Vp4 = v_p4^2)
P4 = Pr(Vu = v_u^2, Vp1 = v_p1^2, Vp2 = v_p2^2, Vp3 = v_p3^2, Vp4 = v_p4^2)

where 0 ≤ Pj ≤ 1, j = 1, 2, 3, 4, and Σ_{j=1}^{4} Pj = 1. With these assumptions, the hazards can be written as

θ(t|x_t, v) = λ(t)φ(x_t, v)    (2)
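In practice, the restrictions 0 ≤ Pj ≤ 1 and Σj Pj = 1 are usually imposed through a logit-type transformation of unrestricted parameters. The sketch below shows one common parameterization; this particular transform is our assumption for illustration, as the paper does not state which one is used.

```python
import numpy as np

def masspoint_probs(a):
    """Map 3 unrestricted parameters to probabilities (P1, ..., P4) with
    P_j > 0 and sum(P_j) = 1, via a multinomial-logit transformation."""
    z = np.concatenate(([0.0], np.asarray(a, dtype=float)))  # a_1 fixed at 0
    e = np.exp(z - z.max())                                  # overflow guard
    return e / e.sum()
```

Maximizing the likelihood over the unrestricted parameters then automatically respects the probability constraints.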
The baseline hazard, λ(t), is flexibly specified as a piecewise-constant hazard, in which we divide the time line into a number of intervals. For the unemployment hazard, for example, we divide the time line into M = 12 intervals measured in weeks (0-4, 4-8, 8-12, 12-16, 16-20, 20-25, 25-35, 35-52, 52-79, 79-104, 104-156, 156-) and we let λu(t) = (λu1, ..., λu12) denote the estimated parameters in these intervals.
¹³ Van den Berg (2001) suggests ways to use economic restrictions to provide a more robust analysis, and Gaure, Røed & Zhang (2005) show in a Monte Carlo analysis that the method proposed by Heckman & Singer is actually quite robust.
The scaling function has to be positive and is specified as

φ(x_t, v) = exp(x_t β + v)

We estimate 5 hazard functions: the main hazard out of unemployment, θu(tu|x_t, vu), and 4 hazards into the different kinds of programmes, θpj(tpj|x_t, vpj), j = 1, 2, 3, 4. The hazard into programmes is then defined as the sum of all 4 programme hazards,

θp(tp|x_t, vp) = Σ_{j=1}^{4} θpj(tpj|x_t, vpj)    (3)
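A minimal sketch of the specification in equations (2)-(3), using the interval boundaries listed above; the parameterization in log baseline levels and all argument shapes are placeholders for illustration, not the estimated model.

```python
import numpy as np

# Interval boundaries (weeks) for the piecewise-constant baseline hazard
CUTS = np.array([0, 4, 8, 12, 16, 20, 25, 35, 52, 79, 104, 156])

def mph_hazard(t, x, beta, log_lambda, v=0.0):
    """theta(t|x, v) = lambda(t) * exp(x'beta + v), with lambda(t) constant
    within each of the 12 intervals (the last one is open-ended)."""
    m = np.searchsorted(CUTS, t, side="right") - 1  # interval containing t
    return np.exp(log_lambda[m]) * np.exp(x @ beta + v)

def programme_hazard(t, x, betas, log_lambdas, vs):
    """Equation (3): the hazard into any programme is the sum of the four
    programme-specific mixed proportional hazards."""
    return sum(mph_hazard(t, x, betas[j], log_lambdas[j], vs[j])
               for j in range(4))
```

Working with log baseline levels keeps the hazard positive without constrained optimization, mirroring the role of the exponential scaling function.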
Using these hazards and defining a censoring dummy, Ci, equal to 1 if the unemployment spell of individual i is uncensored, we can construct the likelihood function for individual i with K unemployment spells as

Li(vu, vp) = Π_{k=1}^{K} θp[tpk|x_tk, vp]^{1[tpk