Research Article

Received 14 June 2016, Revised 6 January 2017, Accepted 31 January 2017

Published online in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/asmb.2242

Heterogeneity versus duration dependence with competing risks: an application to the labor market

Richard Robb(a)*†, Halina Frydman(b) and Andrew Robertson(c)

Two hypotheses can explain the declining probability of gaining employment as an unemployment spell wears on: heterogeneity of the unemployed versus duration dependence. The nonparametric tests developed in the literature for testing duration dependence do not account for the fact that an unemployment spell can terminate in ways other than employment. The nonparametric tests developed in this paper extend, under certain conditions, those tests to competing risks. We illustrate our test using US unemployment data, in which we find little consistent evidence for duration dependence. © 2017 The Authors. Applied Stochastic Models in Business and Industry published by John Wiley & Sons, Ltd.

Keywords: duration dependence; heterogeneity; competing risks; nonparametric tests

1. Introduction

Six hundred and twenty-nine people entered the US Current Population Survey (CPS) in March 2013 as unemployed. In each of the following 3 months, the proportion of unemployed individuals who found employment declined: 18.8% in April, 17.0% in May, and 7.7% in June. There are two explanations for why the probability of finding a job goes down as an unemployment spell persists: (i) people are heterogeneous and the surviving population from any cohort contains the least employable or (ii) each person’s chance of employment goes down over time.

The literature refers to the first hypothesis as “no duration dependence” (e.g., [1]). It arises in the job search model of Lippman and McCall [2] in which unemployed people receive wage offers at constant rates drawn from a stable distribution over an infinite horizon. The distribution of waiting times to find a job is therefore a “mixture of exponentials”; each unemployment spell is exponentially distributed. Hazard rates vary across individuals, so the rate at which the surviving population finds a job declines as a function of the length of the unemployment spell.

The alternative explanation, negative duration dependence, could arise most simply from diminishing returns to job search. After unemployed people exhaust their best leads, they have to look for work in less promising places. Other sources for negative duration dependence include dissipating skills the longer a person is out of work as in [3] or a stigma that lowers the worker’s wage offers (e.g., [4]). Theoretically, positive duration dependence could arise from declining assets that motivate the unemployed person to accept a lower wage, unemployment insurance running out, or a declining time horizon as in the Lippman and McCall model that lowers the benefit of holding out for a higher wage (e.g., [5]).

Knowing whether duration dependence characterizes unemployment spells has important policy implications. In the dissipating skills or stigma theories (but not with diminishing returns to job search), unemployment causes unemployment, and society benefits from putting people in jobs before they start a downward spiral. If this is the case, on the margin, government should provide less unemployment insurance and favor macroeconomic policies that smooth troughs in the business cycle.

Ever since [5], following a suggestion in [6], economists have known that these two hypotheses can sometimes be distinguished from data on single spells. To take the simplest example, consider a cohort of individuals who are unemployed

a School of International and Public Affairs, Columbia University, New York, U.S.A.
b New York University Stern School of Business, New York, U.S.A.
c 11 Waterloo Place, London, UK.
*Correspondence to: Richard Robb, School of International and Public Affairs, Columbia University, New York, U.S.A.
† E-mail: [email protected]
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.


Appl. Stochastic Models Bus. Ind. 2017

R. ROBB, H. FRYDMAN AND A. ROBERTSON

at $t = 0$ and whose employment status is observed again at three points in time, $t = 1, 2, 3$. Assuming a mixture of exponentials, it does not matter how long they have been unemployed before $t = 0$. Assume for now that the only way an individual can leave unemployment is for employment, thus ignoring unemployed individuals who drop out of the labor force or go missing. Let $\pi_i$ be the probability that an individual from a cohort gains employment (fails) in period $i$, and

$$\theta_i = \frac{\pi_i}{1 - \sum_{k=1}^{i-1} \pi_k}, \quad i \geq 1.$$

Here, $\theta_i$ is the hazard rate in period $i$, defined as the conditional probability of failure in period $i$ given survival up to the beginning of the period. Writing the $\pi_i$'s in terms of the $\theta_i$'s,

$$\pi_i = \theta_i \prod_{k=1}^{i-1} (1 - \theta_k).$$

In the case of three periods, necessary and sufficient conditions for the mixture of exponentials hypothesis are the following inequalities (e.g., [1], p. 584):

$$\pi_1 \geq \frac{\pi_2}{1 - \pi_1} \geq \frac{\pi_3}{1 - \pi_1 - \pi_2} \tag{1a}$$

$$\pi_1 \pi_3 \geq \pi_2^2. \tag{1b}$$
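The definitions above can be sketched numerically. The short Python example below converts per-period hazards into unconditional failure probabilities and evaluates conditions (1a) and (1b). It is fed the March 2013 hazards quoted in the Introduction (18.8%, 17.0%, 7.7%); the function names are illustrative choices and do not come from the paper.

```python
def hazards_to_probs(theta):
    """pi_i = theta_i * prod_{k<i} (1 - theta_k)."""
    probs, surviving = [], 1.0
    for th in theta:
        probs.append(th * surviving)
        surviving *= 1.0 - th
    return probs


def probs_to_hazards(pi):
    """theta_i = pi_i / (1 - sum_{k<i} pi_k)."""
    hazards, surviving = [], 1.0
    for p in pi:
        hazards.append(p / surviving)
        surviving -= p
    return hazards


def satisfies_1a(pi):
    """Condition (1a): the hazard rate is non-increasing over three periods."""
    th = probs_to_hazards(pi)
    return th[0] >= th[1] >= th[2]


def satisfies_1b(pi):
    """Condition (1b): pi_1 * pi_3 >= pi_2 ** 2."""
    return pi[0] * pi[2] >= pi[1] ** 2


# Hazards for the March 2013 cohort, as quoted in the Introduction.
theta_march_2013 = [0.188, 0.170, 0.077]
pi_hat = hazards_to_probs(theta_march_2013)

print("pi_hat:", [round(p, 4) for p in pi_hat])
print("(1a) holds:", satisfies_1a(pi_hat))
print("(1b) holds:", satisfies_1b(pi_hat))
```

Working with the hazards and the unconditional probabilities side by side makes it easy to see that (1a) is a statement about the former while (1b) is a statement about the latter.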

Condition (1a) can be rephrased as $\theta_1 \geq \theta_2 \geq \theta_3$. It states that the failure probabilities decline from one period to the next conditional on surviving up to the beginning of the period or, equivalently, that the hazard rate declines from one period to the next. Assuming that we observe a declining hazard rate as in (1a), condition (1b) provides a restriction for testing the hypothesis of a mixture of exponentials against a declining hazard rate not generated by a mixture of exponentials.

Focusing on transitions from unemployment to employment in isolation, the sample proportions in the March 2013 cohort violate (1b): $\hat\pi_1 = 0.188$, $\hat\pi_2 = (1 - 0.188) \times 0.170 = 0.138$, $\hat\pi_3 = (1 - 0.188) \times (1 - 0.170) \times 0.077 \approx 0.052$, and $\hat\pi_1 \hat\pi_3 < \hat\pi_2^2$.

The intuition behind (1b) is straightforward. The mixture of exponentials places a lower bound on $\pi_3$ as a function of $\pi_1$ and $\pi_2$. In 2013, the decline in the hazard rate from April to May was fairly gentle, dropping from 18.8% to 17.0%. This implies a limited amount of heterogeneity assuming the mixture of exponentials hypothesis. Then the hazard falls abruptly in the third month: only 7.7% of the individuals who were unemployed through May found jobs in June. Under the mixture of exponentials hypothesis, the hazard should not fall so sharply in the third month, considering we already know there is not much heterogeneity in this population based on evidence from the first 2 months. Negative duration dependence, which is the alternative hypothesis, is consistent with any pattern of declining hazards. With negative duration dependence, the transition rate for each individual could theoretically drop to zero in the third period; under a mixture of exponentials, it could not.

Nonparametric tests for duration dependence considered in the literature assume that failure times are generated by a single risk.
In studies of US unemployment, this is a troubling assumption, because gross flows from unemployment (U) to out of the labor force (O) are as large as the flows from unemployment to employment (E) while flows from unemployment to missing from the survey (M) are also surprisingly large. Table I shows the sample hazard rates averaged over all 161 cohorts that entered the sample in each month from January 2000 through July 2013. The data only consider individuals who entered the survey as unemployed. A person who finds employment the month after entering the survey counts as U → E in Month 1. If he remains unemployed in Month 1 and finds employment in Month 2, he counts as U → E in Month 2 and so on. For instance, a person who enters as unemployed and stays unemployed for the next 2 months has on average a 15.6% chance of U → E, a 13.9% chance of U → O, a 5.1% chance of U → M, and a 65.4% chance of staying in U in the fourth and final time he participates in the survey.

Table I. Hazard rates out of unemployment: all cohorts combined, January 2000–July 2013.

Month   U → E (%)   U → O (%)   U → M (%)
1       21.7        22.5        7.0
2       18.0        16.5        6.1
3       15.6        13.9        5.1

Source: US Current Population Survey; see Section 6 for methodology.

Competing risks confound many other problems in which an analyst would like to distinguish between duration dependence and the mixture of exponentials hypothesis. In models of fecundability in [7], a woman might die before having a child, so the time to first birth is never observed. Censoring is likely to be nonrandom because mortality risk is correlated with fecundability. Diseases such as Hodgkin’s exhibit declining incidence as a function of age over certain ages. Is the surviving population made up of individuals who are naturally less vulnerable to Hodgkin’s or do individuals become biologically stronger over time? The conventional test would assume a single risk of Hodgkin’s, yet in reality, people die of other cancers or other causes. Similarly, the default rate of low-rated bonds declines over time. Is this because the best
companies survive or do these companies tend to repair their financial condition? Again, the company may be subject to risk of merger or may choose to become unrated. In both cases, a company will drop out of the rating agencies’ data sets. To take one final example, the probability that a young driver files his first insurance claim on any given day goes down as time passes. Some drivers may be naturally more cautious (heterogeneity), or alternatively, drivers may become more skilled with practice (duration dependence). A subset of drivers switches insurance companies, so their time-to-first-accident is not observed. The decision to switch companies could be correlated with a driver’s ability. The insurance company would like to distinguish between these hypotheses: to the extent that heterogeneity dominates learning, a prudent insurance company would raise rates following an accident, thereby weeding out careless customers.

This paper investigates tests for heterogeneity in a primary risk in the presence of competing risks. Assuming that each person has a constant hazard rate of failing from the primary risk, we establish the following results for discrete data:

• The same restrictions on the discrete-time failure probabilities that characterize a mixture of exponential distributions with a single risk in isolation also apply when each individual has a competing risk with a hazard rate that is a completely monotone function (Theorem 1). (These restrictions are presented in the next section.) Because a constant function is completely monotone, this result covers the special case where the competing risk has a constant hazard rate.
• With three time periods, the identical necessary and sufficient conditions (1a)–(1b) apply to the probabilities of failure over equally sized intervals from a primary risk when each individual’s hazard rate of failure from one or more competing risks is non-increasing (Theorem 2).
• With four or more time periods, there is no analogous result to Theorem 2; the conditions that characterize heterogeneity in a single risk case do not extend to competing risks (Theorem 3).

Theorem 2 leads to tests suited to US unemployment data. It is plausible to assume that both O and M fulfill the conditions for Theorem 2; this assumption is supported by the data in Table I, which shows that the empirical hazard rates slope downwards.

The paper is organized as follows. Section 2 presents the results for single risks. The proof of Theorem 1 is presented in Section 3 and of Theorem 2 in Section 4. Section 5 develops the statistical test of condition (1b). The empirical results are discussed in Section 6. Section 7 concludes.
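To make the competing-risks bookkeeping behind Table I concrete, the following Python sketch copies the averaged hazards from the table, checks that the O and M hazards are non-increasing, and converts the hazards into unconditional probabilities of leaving unemployment for each cause. This is only an illustration on the aggregate figures; it is not the cohort-level methodology of Section 6, and the dictionary and function names are hypothetical.

```python
# Monthly hazard rates out of unemployment from Table I, written as fractions.
TABLE_I = {
    "E": [0.217, 0.180, 0.156],  # U -> employment
    "O": [0.225, 0.165, 0.139],  # U -> out of the labor force
    "M": [0.070, 0.061, 0.051],  # U -> missing from the survey
}


def stay_probs(hazards):
    """Probability of remaining unemployed in each month, given still unemployed."""
    n_months = len(next(iter(hazards.values())))
    return [1.0 - sum(h[i] for h in hazards.values()) for i in range(n_months)]


def cause_specific_probs(hazards, cause):
    """Unconditional probability of exiting via `cause` in month 1, 2, 3."""
    probs, still_unemployed = [], 1.0
    for h, stay in zip(hazards[cause], stay_probs(hazards)):
        probs.append(h * still_unemployed)
        still_unemployed *= stay
    return probs


def is_non_increasing(values):
    return all(a >= b for a, b in zip(values, values[1:]))


for cause in ("O", "M"):
    print(cause, "hazard non-increasing:", is_non_increasing(TABLE_I[cause]))

pi_E = cause_specific_probs(TABLE_I, "E")
print("stay probabilities:", [round(s, 3) for s in stay_probs(TABLE_I)])
print("unconditional U -> E probabilities:", [round(p, 4) for p in pi_E])
```

The key difference from the single-risk case is that the population at risk shrinks through all three exits, not only through employment.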

2. Notation and summary of main results for single risks

Let $T$ denote the continuous failure time due to a single risk, for example, time to being hired for an unemployed person or time to Hodgkin’s diagnosis for a healthy person. $T$ is counted from some reference time $t = 0$. If each person faces a single risk with a constant failure rate, $\theta$, the probability of surviving beyond the $i$th period is as follows:

$$S_i = P(T > i) = \int_0^\infty e^{-\theta i} \phi(\theta)\, d\theta, \quad i = 0, 1, 2, \ldots \tag{2}$$

where $\phi(\theta)$ is the probability density function of the failure rate in a population. The necessary and sufficient condition for any sequence $\{S_i\}$ to have a representation in the form of (2) is that $\{S_i\}$ be completely monotone (e.g., [8], p. 9 or [9], pp. 225–226):

$$(-1)^j \sum_{i=0}^{j} \binom{j}{i} (-1)^{j-i} S_{m+i} \geq 0 \quad \text{for all } j, m = 0, 1, 2, \ldots$$

or equivalently,

$$(-1)^j \sum_{i=0}^{j} \binom{j}{i} (-1)^{j-i} \pi_{m+i+1} \geq 0 \quad \text{for all } j, m = 0, 1, 2, \ldots \tag{3}$$
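As an informal check of the complete monotonicity condition, the sketch below builds the survival sequence $S_i$ for a discrete two-type mixture of exponentials (the discrete analogue of (2)) and evaluates the signed finite differences over every pair $(j, m)$ that fits in a finite sequence. The hazard rates and weights are made-up illustrative values, and a finite check of this kind can only fail to find a violation; it does not prove the infinite condition.

```python
import math


def survival_sequence(thetas, weights, n_terms):
    """S_i = sum_k w_k * exp(-theta_k * i) for i = 0, ..., n_terms - 1."""
    return [
        sum(w * math.exp(-th * i) for th, w in zip(thetas, weights))
        for i in range(n_terms)
    ]


def signed_difference(seq, j, m):
    """(-1)^j * sum_{i=0}^{j} C(j, i) * (-1)^(j - i) * seq[m + i]."""
    total = sum(
        math.comb(j, i) * (-1) ** (j - i) * seq[m + i] for i in range(j + 1)
    )
    return (-1) ** j * total


def is_completely_monotone(seq, tol=1e-12):
    """Check every (j, m) whose terms all lie inside the finite sequence."""
    n = len(seq)
    return all(
        signed_difference(seq, j, m) >= -tol
        for j in range(n)
        for m in range(n - j)
    )


# Hypothetical population: half with hazard 0.1, half with hazard 0.5.
S = survival_sequence(thetas=[0.1, 0.5], weights=[0.5, 0.5], n_terms=8)
print("mixture of exponentials:", is_completely_monotone(S))

# A decreasing sequence that is not convex fails already at j = 2.
print("non-convex sequence:", is_completely_monotone([1.0, 0.5, 0.4, 0.1]))
```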

In practice, we collect data over a finite number of periods, $n$, from which we can estimate $\pi_1, \pi_2, \ldots, \pi_n$. For $n = 3$, conditions (1a)–(1b) imply that the corresponding $\{S_i\}$, $i = 1, 2, 3$, have a representation of the form (2). The connection between conditions (1a)–(1b) and (3) is the following:

Proposition 1
Suppose an infinite sequence of probabilities $\pi_1, \pi_2, \ldots$ satisfies (3). Then $\pi_1, \pi_2, \ldots, \pi_n$ satisfies conditions (2.3a)–(2.4b) in [1], that is, certain Hankel matrices formed from the $S_i$ are positive semidefinite. Conversely, if the sequence $\pi_1, \pi_2, \ldots, \pi_n$ satisfies conditions (2.3a)–(2.4b) in [1], then there exists an infinite sequence of probabilities $\pi_{n+1}, \pi_{n+2}, \ldots$ such that the sequence $\pi_1, \pi_2, \ldots, \pi_n, \pi_{n+1}, \pi_{n+2}, \ldots$ satisfies (3). For $n = 3$, conditions (2.3a)–(2.4b) are the same as (1a)–(1b).

Proof
See [10] (p. 74) or Theorem 4 in [1] (p. 584).

3. The competing risk’s hazard rate is completely monotone

Assume each person has a constant hazard rate $\theta$ of failing from the primary risk and hazard rate function $\gamma\lambda(t)$ of failing from a competing risk, where $\gamma$ is an individual-specific constant and $\lambda(t)$ is a completely monotone function that may differ across individuals:

$$(-1)^k \lambda^{(k)}(t) \geq 0, \quad k = 0, 1, 2, \ldots, \quad t \geq 0. \tag{4a}$$

Some examples of completely monotone hazard functions are as follows: $\lambda(t) = \alpha$, $\alpha > 0$; $\lambda(t) = e^{-\alpha t}$, $\alpha \geq 0$; $\lambda(t) = \alpha t^{\alpha - 1}$, $1 \geq \alpha > 0$; and $\lambda(t) = (t + \alpha)^{-\beta}$, $\alpha > 0$, $\beta \geq 0$. If $\phi(\theta, \gamma)$ is the joint probability density function of the distribution of hazard rates in a population, the probability that a randomly chosen person fails because of the primary risk in period $i$ is

$$\pi_{1i} = \int_0^\infty \int_0^\infty \left[ \int_{i-1}^{i} \theta\, e^{-\left[\gamma \int_0^t \lambda(u)\,du + \theta t\right]}\, dt \right] \phi(\theta, \gamma)\, d\theta\, d\gamma, \quad i \geq 1. \tag{4b}$$

In the case where $\phi(\theta, \gamma)$ is a discrete probability function, the expressions for $\pi_{1i}$ are obtained from (4b) by replacing integrals with summations.

Theorem 1
Consider the competing risk model defined by (4a)–(4b). Then
(i) $\pi_{1i}$, $i = 1, 2, \ldots$ satisfy (3) with $\pi_i$ replaced by $\pi_{1i}$.
(ii) $\pi_{1i}$, $1 \leq i \leq n$, satisfy (2.3a)–(2.4b) in [1] with $\pi_i$ replaced by $\pi_{1i}$.
(iii) $\pi_{1i}$, $1 \leq i \leq 3$, satisfy (1a)–(1b) with $\pi_i$ replaced by $\pi_{1i}$. This is a special case of (ii).

Proof
The proof is similar to [1] (p. 583) for the single risk case. First, assume that $\lambda(t)$ is a common function. Because completely monotone functions generate completely monotone sequences ([11], p. 158), we can prove (i) by showing that

$$e^{-\left[\gamma \int_0^t \lambda(u)\,du + \theta t\right]}$$

is completely monotone. If $F(x)$ is completely monotone and $\Psi(x)$ is nonnegative with a completely monotone derivative, then $F(\Psi(x))$ is completely monotone. (See [9], p. 441.) Set $F(x) = e^{-\gamma x}$ and $\Psi(x) = \int_0^x \lambda(u)\,du + (\theta/\gamma)x$ for $x > 0$. The derivative of $\Psi(x)$ is a completely monotone function of $x$ due to (4a). Given our assumptions, Theorem 1(ii) follows from Theorem 1(i) by Proposition 1.

Next, we relax the assumption that $\lambda(t)$ is a common function. Assume that $\lambda(t)$ is randomly selected with two potential outcomes: $P[\lambda(t) = \lambda_a(t)] = \alpha$ and $P[\lambda(t) = \lambda_b(t)] = 1 - \alpha$, where both $\lambda_a(t)$ and $\lambda_b(t)$ are completely monotone on $t \in [0, n]$:

$$\pi^{j}_{1i} = \int_0^\infty \int_0^\infty \theta \int_{i-1}^{i} e^{-\left[\gamma \int_0^t \lambda_j(u)\,du + \theta t\right]}\, dt\, \phi(\theta, \gamma)\, d\theta\, d\gamma, \quad i \geq 1 \text{ and } j = a, b,$$

and $\pi_{1i} = \alpha \pi^{a}_{1i} + (1 - \alpha) \pi^{b}_{1i}$. We know that the Hankel matrices (2.3a)–(2.4b) in [1] formed from $\alpha \pi^{a}_{1i}$ for $i = 1, \ldots, n$ as well as the Hankel matrices formed from $(1 - \alpha) \pi^{b}_{1i}$ are both positive semidefinite, so the sum of these matrices is also positive semidefinite. By induction, we can extend this argument to any set $\{\lambda_k(t) : k \in K\}$, where $K$ is any countable set. This holds for any discrete distribution, so we can pass to the continuous case by observing that any integral with respect to a probability measure of positive semidefinite matrices is positive semidefinite.

Restrictions in Theorem 1(ii) form the basis of the tests for all $n \geq 3$. Theorem 1 extends easily to multiple competing risks. Simply combine all competing risks into a single risk and name it risk 2. Theorem 1 holds as long as the hazard rate of risk 2, which is the sum of the hazards over all competing risks, is a completely monotone function. This covers the case where each competing risk is completely monotone because the sum of completely monotone functions is completely monotone. It also covers the special case where the competing risk’s hazard rate is constant for each individual with no duration dependence either in the primary or the competing risk.
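A small numerical illustration of Theorem 1(iii) can be sketched as follows. It evaluates (4b) for a discrete two-type population, with the integrals over $\phi$ replaced by sums, using the completely monotone competing hazard $\lambda(t) = e^{-\alpha t}$ and Simpson's rule for the inner integral. The population weights, the values of $\theta$, $\gamma$ and $\alpha$, and the function names are all assumptions chosen for illustration.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def primary_failure_probs(types, alpha, n_periods=3):
    """pi_1i from (4b) for a discrete population.

    types: list of (weight, theta, gamma) triples.
    The competing hazard is gamma * exp(-alpha * t), alpha > 0, so its
    cumulative hazard is gamma * (1 - exp(-alpha * t)) / alpha.
    """
    def cumulative_lambda(t):
        return (1.0 - math.exp(-alpha * t)) / alpha

    probs = []
    for i in range(1, n_periods + 1):
        p = 0.0
        for weight, theta, gamma in types:
            def density(t, theta=theta, gamma=gamma):
                return theta * math.exp(-(gamma * cumulative_lambda(t) + theta * t))
            p += weight * simpson(density, i - 1, i)
        probs.append(p)
    return probs


# Hypothetical population: two equally likely types.
population = [(0.5, 0.1, 0.5), (0.5, 0.5, 1.0)]
pi1 = primary_failure_probs(population, alpha=0.5)

holds_1a = pi1[0] >= pi1[1] / (1 - pi1[0]) >= pi1[2] / (1 - pi1[0] - pi1[1])
holds_1b = pi1[0] * pi1[2] >= pi1[1] ** 2
print("pi_1i:", [round(p, 4) for p in pi1])
print("(1a) holds:", holds_1a, " (1b) holds:", holds_1b)
```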

4. The competing risk’s hazard rate is non-increasing

The following theorem is a generalization of Theorem 1 for the three-period case. We show that the competing risk’s hazard rate function only needs to be non-increasing. This is more general, because all completely monotone functions are non-increasing but not all non-increasing functions are completely monotone.

Theorem 2
If $(\pi_{11}, \pi_{12}, \pi_{13})$ are given by (4b) and $\lambda(t)$ is a non-increasing function on $[0, 3]$ that may differ across individuals, then (1a)–(1b) hold for all possible densities $\phi(\theta, \gamma)$ with $\pi_i$ replaced by $\pi_{1i}$.

Proof
See Appendix.

The proof of Theorem 2 is highly specialized to the case $n = 3$ and does not extend to the case $n \geq 4$.

Theorem 3
Suppose $\pi_{1i}$ is given by (4b) and $\lambda(t)$ is a non-increasing function. If $n \geq 4$, the restrictions in (3) do not hold in general.

Proof
It suffices to give a numerical counterexample. Let $n = 4$, let $\lambda(t)$ be a step function equal to $\lambda_i$ during period $i$, with $\lambda_1 = 0.05$, $\lambda_2 = 0.05$, $\lambda_3 = 0.03$, $\lambda_4 = 0$, and let $P(\theta = 0.1) = 1$, $P(\gamma = 1) = 1$. Then $\pi_{11} = 0.09286$, $\pi_{12} = 0.07993$, $\pi_{13} = 0.06947$, $\pi_{14} = 0.06190$ and $\pi_{11} - 3\pi_{12} + 3\pi_{13} - \pi_{14} = -0.000416$. This violates (3) for $j = 3$ and $m = 0$ (with $n = 4$ periods).
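The counterexample in the proof of Theorem 3 can be reproduced with a few lines of Python. The sketch assumes, as the reported numbers suggest, that the step function takes the value $\lambda_i$ throughout period $i$, so that within each period both hazards are constant and (4b) has a closed form. The function name is illustrative.

```python
import math


def step_hazard_probs(theta, gamma, lam_steps):
    """pi_1i from (4b) when lambda(t) equals lam_steps[i - 1] on period i.

    Within a period the total exit rate is theta + gamma * lam, so the chance
    of a primary failure in that period, given survival to its start, is
    theta / rate * (1 - exp(-rate)).
    """
    probs, surviving = [], 1.0
    for lam in lam_steps:
        rate = theta + gamma * lam
        probs.append(surviving * theta / rate * (1.0 - math.exp(-rate)))
        surviving *= math.exp(-rate)
    return probs


# Parameters taken from the proof of Theorem 3.
pi1 = step_hazard_probs(theta=0.1, gamma=1.0, lam_steps=[0.05, 0.05, 0.03, 0.0])
third_difference = pi1[0] - 3 * pi1[1] + 3 * pi1[2] - pi1[3]

print("pi_1i:", [round(p, 5) for p in pi1])
print("pi_11 - 3 pi_12 + 3 pi_13 - pi_14 =", round(third_difference, 6))
```

A negative third difference is exactly what (3) rules out, even though the first three probabilities on their own still satisfy (1a)–(1b).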

5. Testing for duration dependence with three periods

Our interest is in testing $\pi_{11}\pi_{13} - \pi_{12}^2 \geq 0$ or equivalently

$$H_0: \eta \geq 0 \qquad H_1: \eta < 0,$$

where $\eta = \log \pi_{11} - 2 \log \pi_{12} + \log \pi_{13}$. Let $\hat\eta = \log \hat\pi_{11} - 2 \log \hat\pi_{12} + \log \hat\pi_{13}$, where $\hat\pi_{1i} = n_{1i}/N$, $i = 1, 2, 3$, is the maximum likelihood estimator of $\pi_{1i}$, $n_{1i}$ is the number of individuals who failed in the $i$th period due to cause 1, and $N$ is the sample size. Because estimated multinomial probabilities $\hat\pi_{1i}$ are asymptotically normally distributed, so are $\log \hat\pi_{1i}$, and hence $\hat\eta$ is also approximately normally distributed. We use as test statistic TS:

$$TS = \frac{\hat\eta}{\widehat{SD}(\hat\eta)}, \tag{5}$$

where $\widehat{SD}(\hat\eta)$ is the estimated standard deviation of $\hat\eta$:

$$\widehat{SD}(\hat\eta) = \sqrt{N^{-1}\left(\frac{1}{\hat\pi_{11}} + \frac{4}{\hat\pi_{12}} + \frac{1}{\hat\pi_{13}}\right)}. \tag{6}$$
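Equations (5) and (6) translate directly into code. The Python sketch below computes $\hat\eta$, its estimated standard deviation and TS from cause-1 failure counts. The counts in the usage lines are hypothetical, and the lower-tail normal probability returned alongside TS is an assumption of this sketch, motivated by the one-sided alternative $\eta < 0$, rather than a quantity defined in the text.

```python
import math


def eta_test(n11, n12, n13, N):
    """Return (eta_hat, sd_hat, TS, lower_tail) following (5) and (6).

    n11, n12, n13: numbers failing from cause 1 in periods 1, 2, 3.
    N: cohort size. All three counts must be positive.
    """
    p1, p2, p3 = n11 / N, n12 / N, n13 / N
    eta_hat = math.log(p1) - 2.0 * math.log(p2) + math.log(p3)
    sd_hat = math.sqrt((1.0 / p1 + 4.0 / p2 + 1.0 / p3) / N)
    ts = eta_hat / sd_hat
    # Standard normal CDF at TS (assumed summary for the alternative eta < 0).
    lower_tail = 0.5 * (1.0 + math.erf(ts / math.sqrt(2.0)))
    return eta_hat, sd_hat, ts, lower_tail


# Hypothetical cohort in which the hazard drops sharply in the third period.
eta_hat, sd_hat, ts, lower_tail = eta_test(n11=188, n12=138, n13=52, N=1000)
print(f"eta_hat={eta_hat:.4f}  SD={sd_hat:.4f}  TS={ts:.3f}  P(Z<=TS)={lower_tail:.4f}")

# Hypothetical boundary case: pi_11 * pi_13 equals pi_12 ** 2, so eta_hat is 0.
print(eta_test(n11=100, n12=50, n13=25, N=1000))
```

Working on the log scale keeps the statistic a linear combination of the $\log \hat\pi_{1i}$, which is what makes the simple variance expression in (6) possible.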

For the derivation of $\widehat{SD}(\hat\eta)$, see [12]. When the sample size $N$ is large and $\eta = 0$, TS has approximately a standard normal distribution. Denoting the standard normal random variable by $Z$ and the $(1 - \alpha)$ percentile of the $Z$ distribution by $z_\alpha$, the $\alpha$-level decision rule rejects $H_0$ if TS