Applied Quantile Regression

DISSERTATION
of the University of St. Gallen,
Graduate School of Business Administration, Economics, Law and Social Sciences (HSG)
to obtain the title of Doctor of Philosophy in Economics and Finance

submitted by
Blaise Melly
from
Ayer-Sierre (Valais)
Approved on the application of Prof. Dr. Michael Lechner and Prof. Dr. Bernd Fitzenberger
Dissertation no. 3255 Gutenberg AG, Schaan 2006
The University of St. Gallen, Graduate School of Business Administration, Economics, Law and Social Sciences (HSG) hereby consents to the printing of the present dissertation, without hereby expressing any opinion on the views herein expressed. St. Gallen, June 16, 2006 The President:
Prof. Ernst Mohr, PhD
Acknowledgments

First and foremost, I would like to express my gratitude to my supervisor Prof. Dr. Michael Lechner, who competently guided me but, at the same time, granted me the freedom to explore the topics I was interested in. In addition, I would like to thank my co-supervisors Prof. Dr. Bernd Fitzenberger and Prof. Dr. Fabio Trojani for their constructive discussions, suggestions and comments. I am also grateful to Stefanie Behncke, Martin Dietz, Dragana Djurdjevic, Markus Frölich, Benita von Lindeiner, Ruth Miquel, Patrick Puhani, Heidi Steiger, Stephan Wiehler, Stephan Werner, and Conny Wunsch for their input. My parents supported me wherever possible throughout my life and enabled me to pursue my education. They deserve my sincere gratitude. Last but not least, I am indebted to Katia for her incessant loving support, attention and encouragement. Our son Arnaud accompanied us during the last months of this thesis and gave me further motivation to complete this work.
St. Gallen, September 2006
Blaise Melly
Summary

The effects of policy variables on distributional outcomes beyond simple averages are of interest in many research areas and in particular in labor economics. For instance, the distributional consequences of minimum wages, training programs and education are of primary interest to policy makers. The ability of quantile regression models, as introduced by Koenker and Bassett (1978), to characterize the impact of variables on the whole distribution of the outcome of interest makes them appealing in these economic applications. This technique has recently received a great deal of attention in both theoretical and empirical research. A number of papers propose new models or new estimators that deal with various extensions of the original model: censored quantile regression, instrumental variable quantile regression and nonparametric quantile regression, among others. At the same time, there is a rapidly expanding empirical quantile regression literature in economics. Buchinsky (1998), Koenker and Hallock (2001) and Koenker (2005) have recently surveyed this literature. The goals of my thesis are to offer some further improvements that extend the range of applications of quantile regression and to apply quantile regression to questions on which it brings new insights that cannot be obtained with other estimators. The first chapter proposes estimators of unconditional distribution functions in the presence of covariates and derives their asymptotic distribution. Monte Carlo simulations and an application to the black-white wage gap illustrate the usefulness of the estimators. In the second chapter, we use the parametric estimator defined in the first chapter to reassess the sources of changes in the distribution of wages in the United States between 1973 and 1989 using hourly wage data from the May Current Population Survey (CPS) and from the outgoing rotation groups of the CPS. Unlike most previous studies, we find that residuals account for only about 20% of the total growth in wage inequality. In the third chapter, we apply the instrumental variable quantile regression model of Chernozhukov and Hansen (2004b and 2006) to examine the wage structure in the public and private sectors.
Table of Contents

1. Estimation of counterfactual distributions using quantile regression
1.1 Introduction
1.2 Parameters of interest and identification strategies
1.3 Parametric estimator
1.3.1 Model and estimators
1.3.2 Asymptotic results
1.3.3 Estimation of the asymptotic variance
1.3.4 Extension: effects of residuals
1.4 Semiparametric estimator
1.4.1 Model and estimators
1.4.2 Asymptotic results
1.5 Monte-Carlo simulations
1.5.1 Parametric estimator
1.5.2 Comparison to Machado and Mata (2005) estimator
1.5.3 Semiparametric estimator
1.6 Applications: black-white wage differentials
1.6.1 Parametric estimator
1.6.2 Semiparametric estimator
1.7 Conclusion
Appendix A: Proof of theorem 1
Appendix B: Proof of theorem 2
Appendix C: Efficiency bounds
Tables for chapter 1
Figures for chapter 1

2. Decomposition of differences in distribution using quantile regression
2.1 Introduction
2.2 Estimating distribution functions in the presence of covariates
2.2.1 Definition and motivation of the estimator
2.2.2 Decomposition of differences in distribution
2.3 Changes in US wage inequality between 1973 and 1989
2.3.1 Introduction
2.3.2 Data
2.3.3 First step quantile regression results
2.3.4 Decomposition results
2.4 Conclusion
Tables for chapter 2
Figure for chapter 2
Appendix A: Changing the order of the decomposition
Appendix B: Changing the number of quantile regression
Appendix C: Changes in the distribution of wages over sub-periods
Appendix D: Monte Carlo comparison of three decomposition methods

3. Public and private sector wage distributions controlling for endogenous sector choice
3.1 Introduction
3.2 Endogeneity in the quantile regression model
3.3 Estimation of interacted models
3.3.1 Combination of sample selection and IV quantile regression
3.3.2 Integration of nonparametric first step estimates
3.3.3 Finite sample properties of these estimators
3.4 Decomposition of differences in distribution
3.5 Data, descriptive statistics and instruments
3.6 Empirical results
3.6.1 Exogenous sector choice
3.6.2 Choice between private and public sector
3.6.3 Endogenous dummy variable
3.6.4 Endogenous sector choice with fully interacted covariates
3.6.5 Comparison with Abadie, Angrist and Imbens (2002) estimator
3.6.6 Validity of the instruments
3.7 Summary and conclusion
Appendix A: Instrumental Variable Quantile Regression
Appendix B: Asymptotic distribution of the SIVQR estimator
Appendix C: Asymptotic distribution of the MDIVQR estimator
Tables for chapter 3
Figures for chapter 3

Bibliography
Curriculum Vitae
List of Tables

Table 1.1: Monte Carlo simulation, parametric first step, point estimates.
Table 1.2: Monte-Carlo simulation, parametric first step, estimation of the standard errors.
Table 1.3: Monte-Carlo simulation, point estimates of qc(0.5) with 400 observations.
Table 1.4: Monte Carlo simulation, nonparametric first step, point estimates.
Table 1.5: Monte Carlo simulation, nonparametric first step, estimation of the standard errors.
Table 1.6: Descriptive statistics, means
Table 2.1: Mean of the covariates, median regression coefficients and interdecile ranges
Table 2.2: Decomposition of changes in measures of wage dispersion using quantile regression
Table D1: Monte Carlo simulation for the first data generating process
Table D2: Monte Carlo simulation for the second data generating process
Table 3.1: Monte-Carlo simulations, without interaction terms
Table 3.2: Monte-Carlo simulations, with interaction terms
Table 3.3: Definition of the variables
Table 3.4: Descriptive statistics, means
Table 3.5: Median regression using different estimators
Table 3.6: Estimation of the selection equation, dependent variable: psect
Table 3.7: P-values on the instrumental quantile regression process
Table 3.8: Results disaggregated by the position of the father within the public sector
List of Figures

Figure 1.1: Correlation between the proposed estimator and the MM estimator as function of m
Figure 1.2: MSE of the proposed estimator and the MM estimator as a function of m
Figure 1.3: Decomposition of the black/white wage gap using parametric quantile regression
Figure 1.4: Difference between the unconditional quantiles implied by the model and the sample quantiles
Figure 1.5: Decomposition of the black wage gap using nonparametric quantile regression
Figure 2.1: Decomposition of differences in distribution using quantile regression
Figure 2.A1: Effects of residuals found by changing the order of the decomposition
Figure 2.A2: Effects of coefficients found by changing the order of the decomposition
Figure 2.A3: Effects of characteristics found by changing the order of the decomposition
Figure 2.B1: Effects of residuals with different numbers of quantile regressions
Figure 2.B2: Effects of coefficients with different numbers of quantile regressions
Figure 2.B3: Effects of characteristics with different numbers of quantile regressions
Figure 2.C1: Decomposition of changes in distribution between 1973 and 1979
Figure 2.C2: Decomposition of changes in distribution between 1979 and 1984
Figure 2.C3: Decomposition of changes in distribution between 1984 and 1989
Figure 3.1: Kernel density estimates of the wage distributions
Figure 3.2: Public sector wage “premium” at different quantiles
Figure 3.3: Decomposition of public private sector wage differential at different quantiles
Figure 3.4: Public sector wage premium using instrumental quantile regression
Figure 3.5: Decomposition of public private sector wage differential correcting for endogeneity
Figure 3.6: Decomposition of public private sector wage differential correcting for endogeneity
Chapter 1
Estimation of counterfactual distributions using quantile regression

This chapter proposes estimators of unconditional distribution functions in the presence of covariates. The conditional distribution is estimated by (parametric or nonparametric) quantile regression. In the parametric setting, we propose an extension of the Oaxaca / Blinder decomposition of means to the full distribution. In the nonparametric setting, we develop an efficient local-linear regression estimator for quantile treatment effects. We show $\sqrt{n}$-consistency and asymptotic normality of the estimators and present analytical estimators of their variance. Monte-Carlo simulations show that the procedures perform well in finite samples. An application to the black-white wage gap illustrates the usefulness of the estimators.

Keywords: Quantile Regression, Quantile Treatment Effect, Oaxaca / Blinder Decomposition, Wage Differentials, Racial Discrimination.
JEL classification: C13, C14, C21, J15, J31.
Chapter 1: Counterfactual distributions

1.1 Introduction
Most of the econometric literature in which the effects of a binary treatment under exogeneity are estimated has focused on average treatment effects. In the parametric setting, discrimination studies are dominated by the Oaxaca (1973) / Blinder (1973) decomposition. In the nonparametric setting, the matching literature surveyed by Imbens (2004) has focused almost entirely on the estimation of average treatment effects. Nevertheless, in many research areas, the effects of policy variables on distributional outcomes beyond simple averages are of special interest. In particular in labor economics, the distributional consequences of minimum wages, training programs and education are of primary importance to policy makers. Motivated by this interest and by the increase in wage inequality during the last decades, studying changes in the distribution of wages has recently become an active area of research.[1] However, this literature focuses almost entirely on estimation without providing asymptotic justification or inference procedures, and it relies mostly on parametric restrictions. In this chapter, we propose a quantile equivalent of the Oaxaca / Blinder decomposition and derive its asymptotic distribution. Then, in order to relax the parametric restrictions, we propose a local-linear-regression-based estimator for quantile treatment effects and derive its asymptotic distribution. A regression strategy is applied in this chapter. We first estimate the whole conditional distribution by (parametric or nonparametric) quantile regression. In a second step, we integrate the conditional distribution over the distribution of the covariates in order to obtain an estimate of the unconditional distribution. The advantages of these estimators are the natural interpretability of the first-step estimation and the clarity of the assumptions made. The quantile regression framework is intuitive and flexible.
Because quantile regression can capture heterogeneous effects, its theoretical properties have been studied extensively and it has been used in many empirical studies; see, for example, Koenker and Bassett (1978), Powell (1986), Koenker and Portnoy (1987), Chaudhuri (1991), Gutenbrunner and Jureckova (1992), Buchinsky (1994), Koenker and Xiao (2002), and Angrist, Chernozhukov and Fernández-Val (2006).

[1] For instance, Juhn, Murphy and Pierce (1993), DiNardo, Fortin and Lemieux (1996), Gosling, Machin and Meghir (2000), Donald, Green and Paarsch (2000), Machado and Mata (2005), Lemieux (2006), Autor, Katz and Kearney (2005a and 2005b).

This chapter contributes to the existing literature in four dimensions. First, while the basic idea of estimating the conditional distribution function by parametric quantile regression and integrating it to obtain the unconditional distribution is not new,[2] we propose an estimator that is faster to compute. In Section 1.5.2 we show that the Machado and Mata (2005) estimator, which is the most common quantile-regression-based decomposition, becomes numerically identical to our proposed estimator as the number of simulations used in the Machado and Mata procedure goes to infinity.[3] Hence, our asymptotic results also apply to their estimator and, since it is never possible to compute an infinite number of simulations, our estimator actually uses more information. Second, we derive the asymptotic distribution of the parametric estimator and use the asymptotic results to propose an analytical estimator of its variance. Bootstrapping the results is time-consuming and sometimes simply impossible if the number of observations is very large. The Monte-Carlo simulations show that the asymptotic results are useful approximations in finite samples; in our simulations, the analytical standard errors perform better than the bootstrap standard errors. Third, we propose a new estimator based on nonparametric quantile regression that does not require any parametric restriction.
We prove its $\sqrt{n}$-consistency and asymptotic normality, and show that it achieves the semiparametric efficiency bounds. This procedure can be seen as the quantile equivalent of the estimator proposed by Heckman, Ichimura and Todd (1998) for the mean. A consistent procedure for the estimation of the variance is also presented. The estimators perform well in Monte Carlo simulations. Finally, we apply both estimators to issues concerning racial discrimination in the USA. We first decompose the black-white wage gap using linear quantile regression. Since this parametric assumption is rejected by the data, we then use nonparametric quantile regression in the first step. The differences in basic human capital characteristics explain about one-third of the differences in the level of wages. We find that the amount of discrimination depends on the quantile at which it is evaluated, but we cannot interpret the results as a glass ceiling effect.

[2] Gosling, Machin and Meghir (2000) and Machado and Mata (2005) were the first to propose such a procedure.
[3] The Machado and Mata estimator is a simulation-based estimator.

The structure of the chapter is as follows. Section 1.2 defines and discusses the estimands of interest. In Section 1.3, a parametric estimator of unconditional distributions in the presence of covariates is defined and we show how it can be used to decompose differences in distribution. Its asymptotic distribution is then derived and an analytical estimator of its variance is proposed. Section 1.4 is devoted to the local-linear-regression-based matching estimator for quantile treatment effects. Section 1.5 presents results from different Monte-Carlo simulations. The application is presented in Section 1.6, and Section 1.7 concludes.
1.2 Parameters of interest and identification strategies
We are interested in the effect of a binary treatment $T$ on an outcome $Y$. We have a sample of $n$ units indexed by $i$, with $n_0$ control units and $n_1$ treated units. $T_i = 0$ if unit $i$ receives the control treatment and $T_i = 1$ if unit $i$ receives the active treatment. “Treatment” should not be taken in a restrictive sense: in the application of Section 1.6, $T = 0$ for whites and $T = 1$ for blacks. We use the potential-outcome notation of Neyman (1923) and characterize each unit by a pair of potential outcomes: $Y_i(0)$ for the outcome under the control treatment and $Y_i(1)$ for the outcome under the active treatment. In addition, each unit has a $K$-dimensional vector of covariates $X_i$. In the econometric literature, the most commonly studied estimands are the overall average treatment effect (ATE),

$$E[Y(1)] - E[Y(0)],$$

and the average treatment effect on the treated (ATET),[4]

$$E[Y(1) \mid T = 1] - E[Y(0) \mid T = 1].$$

We extend this literature by considering quantile treatment effects for the same populations, namely the overall $\theta$th quantile treatment effect (QTE),

$$F^{-1}_{Y(1)}(\theta) - F^{-1}_{Y(0)}(\theta),$$

and the $\theta$th quantile treatment effect on the treated (QTET),
$$F^{-1}_{Y(1)}(\theta \mid T = 1) - F^{-1}_{Y(0)}(\theta \mid T = 1),$$

where $F^{-1}_Y(\theta)$ is the $\theta$th quantile of $Y$. Note that we identify and estimate the difference between the quantiles and not the quantile of the difference. With the assumptions made in this chapter, we can only identify the marginal distributions of the potential outcomes, not their joint distribution. That is, we can identify the effect of a treatment on the mean, the variance, the kurtosis, the Gini coefficient, etc., of the distributions of the potential outcomes, but not the distribution of the individual treatment effects. In some applications, this is sufficient to answer economically meaningful questions. In welfare economics, for instance, a basic assumption is anonymity: in order to compare two distributions, all permutations of personal labels are regarded as distributionally equivalent (Cowell 2000), so the joint distribution is not required. The joint distribution can be deduced from the marginal distributions if we make an additional assumption: rank invariance. This means that the treatment does not alter the ranking of the units conditionally on $X$. This assumption is likely to be satisfied in several applications; for instance, it seems difficult to imagine that gender or race can change the ranking of an individual in the potential wage distributions. In other cases, if the rank invariance assumption is not likely to be satisfied for all observations, we can allow for given levels of overlap and bound the quantile treatment effects using the approach of Heckman, Smith and Clements (1997). In any case, knowledge of all QTEs is more informative than knowledge of the ATE, because the mean can always be recovered by integrating over the quantiles. Since QTEs have been recognized as a useful way of summarizing the information about the distributions of the potential outcomes, we propose both estimators and inference procedures for them.

[4] These are population measures. Imbens (2004) and Abadie and Imbens (2006) also consider the same measures conditional on the sample. For the quantiles, as for the mean effects, the only difference between the two estimands concerns the asymptotic variance, which is discussed later.

Potential outcomes are only partially observed because only $Y_i = (1 - T_i) Y_i(0) + T_i Y_i(1)$ is observable. We thus need to assume that some restrictions are satisfied in order to identify the estimands of interest. In this chapter, we follow the matching literature, surveyed by Imbens (2004), and assume that all regressors are exogenous. An alternative to this assumption would be the use of instrumental variables or sample selection procedures,[5] but we do not explore that approach in this chapter. Our key identifying assumption is unconfoundedness:

$$\big(Y(0), Y(1)\big) \perp T \mid X.$$

This assumption implies, for instance, that

$$E[Y(0) \mid T = 1, X] = E[Y(0) \mid T = 0, X] = E[Y(0) \mid X]$$

but also that
$$F^{-1}_{Y(0)}(\theta \mid T = 1, X) = F^{-1}_{Y(0)}(\theta \mid T = 0, X) = F^{-1}_{Y(0)}(\theta \mid X).$$

[5] Abadie, Angrist and Imbens (2002), Chesher (2003) and Chernozhukov and Hansen (2006), for instance, have proposed IV estimators for conditional quantile functions. Once we have obtained the coefficients corrected for endogeneity, we can use the procedure proposed in this chapter to estimate quantile treatment effects (Melly 2006b).

When assuming unconfoundedness, parametric assumptions are a first way to identify and estimate counterfactual means and quantiles. Oaxaca (1973) and Blinder (1973) assume that the expected value of $Y$ conditional on $X$ is a linear function of $X$. $E[Y(0) \mid T = 1]$ can then be consistently estimated by $\bar{X}_1 \hat{\beta}^0_{OLS}$, where $\bar{X}_1 = n_1^{-1} \sum_{i:T_i=1} X_i$ and $\hat{\beta}^0_{OLS}$ is the vector of coefficients obtained by regressing $Y$ on $X$ using only control observations. They decompose the difference between $\bar{Y}_1 = n_1^{-1} \sum_{i:T_i=1} Y_i$ and $\bar{Y}_0 = n_0^{-1} \sum_{i:T_i=0} Y_i$ into

$$\bar{Y}_1 - \bar{Y}_0 = \left[\bar{X}_1 \hat{\beta}^1_{OLS} - \bar{X}_1 \hat{\beta}^0_{OLS}\right] + \left[\bar{X}_1 \hat{\beta}^0_{OLS} - \bar{X}_0 \hat{\beta}^0_{OLS}\right].$$
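To make the two brackets concrete, here is a minimal Python sketch of the mean decomposition with the control group as reference. It is our own illustration, not code from this chapter: it assumes NumPy, a covariate matrix passed without a constant (the hypothetical function `oaxaca_blinder` adds an intercept), and 0/1 treatment indicators. Because OLS with an intercept reproduces each group mean exactly ($\bar{Y}_t = \bar{X}_t \hat{\beta}^t_{OLS}$), the two brackets add up to the raw gap.

```python
import numpy as np


def oaxaca_blinder(y, X, t):
    """Two-fold Oaxaca / Blinder decomposition, control group (t == 0) as reference.

    y: (n,) outcomes; X: (n, k) covariates WITHOUT a constant; t: (n,) 0/1 indicator.
    Returns (raw gap, effect of coefficients, effect of characteristics).
    """
    Z = np.column_stack([np.ones(len(y)), X])  # intercept => group means fitted exactly
    treated, control = t == 1, t == 0
    beta1, *_ = np.linalg.lstsq(Z[treated], y[treated], rcond=None)  # OLS, treated
    beta0, *_ = np.linalg.lstsq(Z[control], y[control], rcond=None)  # OLS, controls
    zbar1, zbar0 = Z[treated].mean(axis=0), Z[control].mean(axis=0)
    coefficients = zbar1 @ beta1 - zbar1 @ beta0      # first bracket
    characteristics = zbar1 @ beta0 - zbar0 @ beta0   # second bracket
    gap = y[treated].mean() - y[control].mean()
    return gap, coefficients, characteristics
```

With simulated data in which the treated have shifted covariates and a lower intercept, the first bracket picks up the intercept shift and the second the covariate shift, and `gap` equals their sum up to rounding error.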
The first bracket represents the effect of coefficients, typically interpreted as discrimination in numerous studies, and the second bracket gives the effect of characteristics (the justified differential). Under these assumptions, the first bracket is an estimate of $E[Y(1) \mid T = 1] - E[Y(0) \mid T = 1]$, which makes clear that the Oaxaca / Blinder decomposition estimates the average treatment effect on the treated. If we take the treatment group as the reference, the average treatment effect on the untreated is estimated instead. In order to extend this procedure to quantiles, we need to estimate the counterfactual quantile $F^{-1}_{Y(0)}(\theta \mid T = 1)$. We assume that all quantiles of $Y$ conditional on $X$ are linear in $X$. The conditional quantiles of $Y$ can then be estimated by linear quantile regression. Since the unconditional quantile is not the same as the integral of the conditional quantiles, we must first invert the conditional quantile function in order to obtain the conditional distribution function. Then, the unconditional distribution function can be estimated by integrating the conditional distribution function over the distribution of the covariates. Finally, the unconditional distribution function can be inverted in order to obtain the unconditional quantiles of interest. The details of the procedure are developed in Section 1.3. For the parametric approach, we do not need to assume anything about the support of the covariates because the parametric assumption can be used to make out-of-support predictions. Obviously, one might worry about the parametric assumption, which is often arbitrary. If we want to relax the parametric restrictions, we need an additional assumption: the common support condition. In order to estimate nonparametrically the counterfactual distribution of a treated unit with characteristics $X$, we need to find a control unit with (almost) the same characteristics. Using the notation $p(X) = \Pr(T = 1 \mid X)$ and
$p = \Pr(T = 1)$, we can state this assumption as follows:

overlap: $0 < p(x) < 1$ for all $x$ in the support of $X$.

This is the condition necessary to identify the overall quantile treatment effect. If the common support assumption is not satisfied, we can estimate the effects for the subpopulation satisfying the common support or bound the effects (Lechner 2001). For simplicity, we assume that the overlap restriction is satisfied for the whole population.

Various methods have been proposed to estimate average treatment effects assuming unconfoundedness and overlap but without imposing any parametric restriction. Following Imbens (2004), we can classify these estimators into three groups: matching estimators compare outcomes for pairs of observations with (almost) the same value of $X$; propensity score estimators do not adjust directly for the covariates but for the propensity score; regression methods rely on the estimation of $E[Y \mid X, T = j]$ for $j = 0, 1$ and then estimate the unconditional expected value by integrating over the distribution of $X$. All strategies can be applied to estimate quantile treatment effects. Frölich (2005), for instance, follows the first approach; Firpo (2006), the second one; we follow the third approach and propose a nonparametric regression estimator for quantile treatment effects. We estimate the conditional distribution function by local-linear quantile regression (Chaudhuri 1991). Then, the unconditional distribution is again obtained by integrating the conditional distribution function over the distribution of $X$. This estimator is similar to the kernel-based estimator of Heckman, Ichimura and Todd (1998). We derive its asymptotic distribution in Section 1.4.
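The parametric procedure outlined earlier in this section (estimate the conditional quantile process for the controls, integrate the implied conditional distribution over the covariates of the treated, invert the result) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the chapter's own implementation, whose details are given in Section 1.3: the design matrix `Z` is assumed to contain an intercept column, the quantile process is approximated on a midpoint grid of $m$ values of $\tau$ with equal weights $1/m$, and each linear quantile regression is solved exactly as a linear program with SciPy's `linprog`. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def quantile_regression(y, Z, tau):
    """Linear quantile regression at level tau, solved as a linear program.

    Minimises sum_i rho_tau(y_i - Z_i b) by writing each residual as u_i - v_i,
    with u_i, v_i >= 0. Z is assumed to contain an intercept column.
    """
    n, k = Z.shape
    cost = np.concatenate([np.zeros(k), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([Z, np.eye(n), -np.eye(n)])  # Z b + u - v = y
    bounds = [(None, None)] * k + [(0, None)] * (2 * n)
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:k]


def fit_quantile_process(y0, Z0, taus):
    """Step 1: control-group coefficients beta_0(tau_j) on a grid of quantile levels."""
    return np.array([quantile_regression(y0, Z0, tau) for tau in taus])  # (m, k)


def counterfactual_quantiles(betas, Z1, thetas):
    """Steps 2 and 3: integrate over the treated covariates, then invert.

    With equal weights 1/m, the estimated distribution function
    F(q) = n1^{-1} sum_i m^{-1} sum_j 1{Z1_i beta(tau_j) <= q}
    is the empirical CDF of the n1 * m pooled fitted values, so its
    theta-quantile inf{q : F(q) >= theta} is an order statistic.
    thetas must lie in (0, 1].
    """
    fitted = np.sort((Z1 @ betas.T).ravel())  # all Z1_i beta(tau_j), sorted
    cdf = np.arange(1, fitted.size + 1) / fitted.size
    idx = np.searchsorted(cdf, thetas, side="left")  # first index with cdf >= theta
    return fitted[idx]
```

A typical call is `counterfactual_quantiles(fit_quantile_process(y0, Z0, (np.arange(m) + 0.5) / m), Z1, thetas)`, with `y0`, `Z0` from the control group and `Z1` from the treated group. Machado and Mata (2005) instead draw quantile levels at random and pair each estimated coefficient vector with a randomly drawn covariate row; evaluating every covariate row at every grid point, as above, removes that simulation noise.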
1.3
Parametric estimator
1.3.1
Model and estimators
In this section, we assume that the conditional quantiles of Y are linear in X. Extensions to general parametric assumptions are straightforward. We present an estimator of unconditional distribution functions in the presence of covariates, which is then used to decompose differences in distributions in analogy to the Oaxaca/Blinder decomposition. Notation: $F_Y(q)$ represents the cumulative distribution function of the random variable Y at q and $f_Y(q)$ the density of Y at the same point; $F_Y^{-1}(\theta)$ represents the inverse of the distribution function, commonly called the quantile function, evaluated at $0 < \theta < 1$; $F_Y(q|X_i)$ represents the conditional cumulative distribution function of Y evaluated at q given $X = X_i$. We make the following assumptions for $t = 0,1$:

P.i. The conditional quantiles of $Y(t)$ given X are linear in X: $F^{-1}_{Y(t)}(\tau|X_i) = X_i\beta_t(\tau)$ for all $\tau \in (0,1)$;
P.ii. There exists a positive definite matrix $D_{t0}$ such that $\lim_{n\to\infty} n_t^{-1}\sum_{i:T_i=t} X_i'X_i = D_{t0}$;
P.iii. For all $\tau \in (0,1)$, there exists a positive definite matrix $D_{t1}(\tau)$ such that $\lim_{n\to\infty} n_t^{-1}\sum_{i:T_i=t} f_{Y(t)}\left(F^{-1}_{Y(t)}(\tau|X_i)\,|\,X_i\right) X_i'X_i = D_{t1}(\tau)$;
P.iv. For all X in the support, the distribution function $F_{Y(t)}(\cdot|X)$ is absolutely continuous and has a continuous density with $0 < f_{Y(t)}(u|X) < \infty$ on $\{u : 0 < F_{Y(t)}(u|X) < 1\}$ and $\sup_u f'_{Y(t)}(u|X) < \infty$;
P.v. $F_{Y(t)}(q)$ is absolutely continuous and has a continuous density with $0 < f_{Y(t)}\left(F^{-1}_{Y(t)}(\theta)\right) < \infty$;
P.vi. For $t \in \{0,1\}$ and $t' \in \{0,1\}$, $F_{Y(t)}(q|T=t')$ is absolutely continuous and has a continuous density with $0 < f_{Y(t)}\left(F^{-1}_{Y(t)}(\theta|T=t')\,|\,T=t'\right) < \infty$;
P.vii. $\{Y_i, X_i, T_i\}_{i=1}^n$ are independent and identically distributed across i and have compact support.

Assumptions P.i.-P.iv. are traditional assumptions made in quantile regression models. Note that all assumptions are made for all $\tau \in (0,1)$ and $t \in \{0,1\}$ since we need to identify the whole conditional distribution of Y given X for treated and control units. Assumptions P.v. and P.vi. ensure that $F^{-1}_{Y(0)}(\theta)$, $F^{-1}_{Y(1)}(\theta)$, $F^{-1}_{Y(0)}(\theta|T=0)$, $F^{-1}_{Y(0)}(\theta|T=1)$, $F^{-1}_{Y(1)}(\theta|T=0)$ and $F^{-1}_{Y(1)}(\theta|T=1)$ are well defined and unique. They are implied by P.iv. if the distribution of X satisfies some restrictions, for instance if at least one regressor is continuously distributed on $\mathbb{R}$. To simplify the analysis and because all applications use micro-data, we assume iid sampling and compactness of the support. Koenker and Bassett (1978) show that, for $t \in \{0,1\}$, $\beta_t(\tau)$ can be estimated by
$$\hat\beta_t(\tau) = \arg\min_{b \in \mathbb{R}^K} n_t^{-1} \sum_{i:T_i=t} \rho_\tau(Y_i - X_i b), \tag{1}$$

where $\rho_\tau$ is the check function $\rho_\tau(z) = z\left(\tau - 1(z \le 0)\right)$ and $1(\cdot)$ is the indicator function. $\beta_t(\tau)$ is estimated separately for each $\tau$. Asymptotically, we could estimate an infinite number of quantile regressions. In finite samples, Portnoy (1991) shows that the number of numerically different quantile regressions is $O(n\log(n))$ and each coefficient vector prevails on an interval. Let $(\tau_0 = 0, \tau_1, \ldots, \tau_J = 1)$ be the points where the solution changes.6 $\hat\beta_t(\tau_j)$ prevails from $\tau_{j-1}$ to $\tau_j$ for $j = 1, \ldots, J$.7 The $\tau$th conditional quantile of $Y(t)$ given $X_i$ is consistently estimated by $X_i\hat\beta_t(\tau)$. Theoretically, it is easy to estimate the conditional distribution function by inverting the conditional quantile function. However, the estimated conditional quantile function is not necessarily monotonic and thus cannot simply be inverted. To overcome this problem, we use the following property of the conditional distribution function:

$$F_{Y(t)}(q|X_i) = \int_0^1 1\left(F^{-1}_{Y(t)}(\tau|X_i) \le q\right) d\tau = \int_0^1 1\left(X_i\beta_t(\tau) \le q\right) d\tau.$$

Thus, a natural estimator of the conditional distribution of $Y(t)$ given $X_i$ at q is given by:

$$\hat F_{Y(t)}(q|X_i) = \int_0^1 1\left(X_i\hat\beta_t(\tau) \le q\right) d\tau = \sum_{j=1}^J (\tau_j - \tau_{j-1})\, 1\left(X_i\hat\beta_t(\tau_j) \le q\right). \tag{2}$$

This implies that we can estimate the unconditional distribution functions simply by

$$\hat F_{Y(t)}(q|T=t) = \int \hat F_{Y(t)}(q|x)\, dF_X(x|T=t) = n_t^{-1} \sum_{i:T_i=t} \hat F_{Y(t)}(q|X_i). \tag{3}$$
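Equations (2) and (3) only require the quantile regression coefficients at the breakpoints $\tau_j$, and (2) remains well defined even when the fitted conditional quantiles cross. A minimal sketch, assuming a hypothetical coefficient process (intercept and slope) with four breakpoints; in practice the breakpoints and coefficients would come from solving (1).

```python
def conditional_cdf(q, x, taus, betas):
    """F_hat_{Y(t)}(q | x) from (2).

    taus  : breakpoints tau_1 < ... < tau_J = 1 (tau_0 = 0 is implicit)
    betas : betas[j] is the coefficient vector prevailing on (tau_{j-1}, tau_j]
    """
    total, prev = 0.0, 0.0
    for tau_j, beta_j in zip(taus, betas):
        if sum(xk * bk for xk, bk in zip(x, beta_j)) <= q:  # x'beta(tau_j) <= q
            total += tau_j - prev
        prev = tau_j
    return total


def unconditional_cdf(q, xs, taus, betas):
    """F_hat_{Y(t)}(q | T = t) from (3): average of (2) over the group's covariates."""
    return sum(conditional_cdf(q, x, taus, betas) for x in xs) / len(xs)


# Hypothetical coefficient process (intercept, slope) on four breakpoints.
taus = [0.25, 0.5, 0.75, 1.0]
betas = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
xs = [(1.0, 0.0), (1.0, 1.0)]  # covariate vectors including a constant
print(unconditional_cdf(1.5, xs, taus, betas))  # 0.375
```

The sum over breakpoints is the exact integral in (2) because each coefficient vector is constant on its interval.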
Often, we are more interested in the unconditional quantile function than in the unconditional distribution function since the former can be more easily interpreted.8 Following the convention of taking the infimum of the set, a natural estimator of the $\theta$th quantile of the unconditional distribution of Y is given by

$$\hat q_t(\theta) = \inf\left\{q : n_t^{-1}\sum_{i:T_i=t}\hat F_{Y(t)}(q|X_i) \ge \theta\right\}. \tag{4}$$

6 In order to simplify the notation, we do not show the dependence of $\tau_j$ on t.
7 We derive the results by assuming that all quantile regression coefficients have been estimated. However, the asymptotic results are also valid if we estimate quantile regression coefficients only along a grid of quantiles whose mesh is sufficiently small (a mesh size of order $O(n^{-1/2-\varepsilon})$ will work).
8 Juhn, Murphy and Pierce (1993), Gosling, Machin and Meghir (2000), Donald, Green and Paarsch (2000), for instance, present results for the unconditional quantile function.
Naturally, the quantiles of the unconditional distribution can be estimated consistently by the sample quantiles (Glivenko-Cantelli theorem). We will see in the next subsection that $\hat q_t(\theta)$ is more precise than the sample quantile. However, the main interest in this estimator is the possibility of simulating counterfactual quantiles that can be used to decompose differences in distribution and to estimate quantile treatment effects. For instance,

$$\hat q_c(\theta) = \inf\left\{q : n_1^{-1}\sum_{i:T_i=1}\hat F_{Y(0)}(q|X_i) \ge \theta\right\} \tag{5}$$

is the $\theta$th quantile of the distribution that we would observe if the treated units had not been treated. A decomposition of the difference between the $\theta$th quantile of the unconditional distribution of the treated and the untreated is given by:

$$\hat q_1(\theta) - \hat q_0(\theta) = \left[\hat q_1(\theta) - \hat q_c(\theta)\right] + \left[\hat q_c(\theta) - \hat q_0(\theta)\right], \tag{6}$$

where the first bracket represents the effect of coefficients (QTET) and the second gives the effect of characteristics. In the next subsection, we concentrate on the quantiles and give the joint asymptotic distribution of $\hat q_1(\theta)$, $\hat q_0(\theta)$ and $\hat q_c(\theta)$, thus providing a full description of the decomposition (6). The results for quantiles of other distributions and for the estimation of other quantile treatment effects (such as the overall QTE) can be derived analogously. We consider the asymptotic distribution for a single quantile in order to simplify the notation by suppressing the dependence on $\theta$, but results for the joint distribution of several quantiles are straightforward to derive.
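The infimum in (4) and (5) can be computed exactly: the averaged conditional distribution is a right-continuous step function that jumps only at fitted values $X_i\hat\beta(\tau_j)$, so it suffices to scan those values in increasing order. The sketch below, using hypothetical coefficient processes for the two groups, computes $\hat q_0$, $\hat q_1$ and $\hat q_c$ and the two brackets of (6).

```python
def averaged_cdf(q, xs, taus, betas):
    """n^{-1} sum_i F_hat(q | X_i), with F_hat(q | X_i) from (2)."""
    total = 0.0
    for x in xs:
        prev = 0.0
        for tau_j, beta_j in zip(taus, betas):
            if sum(a * b for a, b in zip(x, beta_j)) <= q:
                total += tau_j - prev
            prev = tau_j
    return total / len(xs)


def quantile(theta, xs, taus, betas):
    """inf{q : averaged_cdf(q) >= theta}, as in (4) and (5).

    The infimum is attained at one of the fitted values x'beta(tau_j).
    """
    candidates = sorted({sum(a * b for a, b in zip(x, beta)) for x in xs for beta in betas})
    for q in candidates:
        if averaged_cdf(q, xs, taus, betas) >= theta:
            return q
    raise ValueError("taus must end at 1 and theta must lie in (0, 1]")


taus = [0.25, 0.5, 0.75, 1.0]
betas0 = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]  # hypothetical control process
betas1 = [(0.0, 2.0), (2.0, 2.0), (4.0, 2.0), (6.0, 2.0)]  # hypothetical treated process
xs0 = [(1.0, 0.0), (1.0, 1.0)]  # control covariates
xs1 = [(1.0, 1.0), (1.0, 2.0)]  # treated covariates

q0 = quantile(0.5, xs0, taus, betas0)  # control covariates, control coefficients
q1 = quantile(0.5, xs1, taus, betas1)  # treated covariates, treated coefficients
qc = quantile(0.5, xs1, taus, betas0)  # treated covariates, control coefficients (5)
print(q1 - q0, q1 - qc, qc - q0)  # 4.0 3.0 1.0
```

Here the total median gap of 4 splits into a coefficient effect of 3 and a characteristics effect of 1, as in (6).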
1.3.2
Asymptotic results
THEOREM 1: Under assumptions P.i. to P.vii., $\hat q_0$, $\hat q_1$ and $\hat q_c$ defined by (4) and (5) are consistent and asymptotically normally distributed. Define $q_0$, $q_1$ and $q_c$ to be the true values. For $t = 0,1$,

$$\sqrt{n}(\hat q_t - q_t) \xrightarrow{d} N\left(0, \frac{E\left[\left(\theta - F_{Y(t)}(q_t|X)\right)^2 \mid T=t\right] + \Omega_t}{\Pr(T=t)\, f_{Y(t)}(q_t|T=t)^2}\right),$$

where $\Omega_t$ is equal to

$$\Omega_t = \iint f_{Y(t)}(q_t|x)\, f_{Y(t)}(q_t|z)\; x'\,\mathrm{cov}\left(\hat\beta_t\left(F_{Y(t)}(q_t|x)\right), \hat\beta_t\left(F_{Y(t)}(q_t|z)\right)\right) z \; dF_X(x|T=t)\, dF_X(z|T=t)$$

and

$$\mathrm{cov}\left(\hat\beta_t(\tau), \hat\beta_t(\tau')\right) = \left(\min(\tau,\tau') - \tau\tau'\right) D_{t1}(\tau)^{-1} D_{t0}\, D_{t1}(\tau')^{-1}.$$

Moreover,

$$\sqrt{n}(\hat q_c - q_c) \xrightarrow{d} N\left(0, \frac{E\left[\left(\theta - F_{Y(0)}(q_c|X)\right)^2 \mid T=1\right]/p + \Omega_c/(1-p)}{f_{Y(0)}(q_c|T=1)^2}\right), \tag{7}$$

where $\Omega_c$ is equal to

$$\Omega_c = \iint f_{Y(0)}(q_c|x)\, f_{Y(0)}(q_c|z)\; x'\,\mathrm{cov}\left(\hat\beta_0\left(F_{Y(0)}(q_c|x)\right), \hat\beta_0\left(F_{Y(0)}(q_c|z)\right)\right) z \; dF_X(x|T=1)\, dF_X(z|T=1).$$

$\hat q_0$ and $\hat q_1$ are independent. The normalized asymptotic covariance between $\hat q_1$ and $\hat q_c$ is equal to

$$\frac{E\left[\left(\theta - F_{Y(1)}(q_1|X)\right)\left(\theta - F_{Y(0)}(q_c|X)\right) \mid T=1\right]}{f_{Y(1)}(q_1|T=1)\, f_{Y(0)}(q_c|T=1)}$$

and the normalized asymptotic covariance between $\hat q_0$ and $\hat q_c$ is

$$\frac{\iint f_{Y(0)}(q_0|x)\, f_{Y(0)}(q_c|z)\; x'\,\mathrm{cov}\left(\hat\beta_0\left(F_{Y(0)}(q_0|x)\right), \hat\beta_0\left(F_{Y(0)}(q_c|z)\right)\right) z \; dF_X(x|T=0)\, dF_X(z|T=1)}{f_{Y(0)}(q_0|T=0)\, f_{Y(0)}(q_c|T=1)}.$$
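Once its ingredients have been estimated (Section 1.3.3 discusses each of them), the variance in (7) is a plug-in computation. A minimal sketch of that final assembly step, assuming the ingredients are already available as plain lists plus a function returning the coefficient covariance matrix; all argument names are hypothetical. The example uses a constant-only model, in which $\hat q_c$ reduces to the control-group sample quantile, so the result should equal $\theta(1-\theta)/((1-p)f^2)$.

```python
def avar_counterfactual_quantile(theta, tau_hat, f_cond, f_uncond,
                                 x_treated, cov_beta, n, n0):
    """Plug-in estimate of the asymptotic variance of sqrt(n)(q_c_hat - q_c) in (7).

    tau_hat[i]   : estimate of F_{Y(0)}(q_c | X_i) for treated unit i
    f_cond[i]    : estimate of f_{Y(0)}(q_c | X_i)
    f_uncond     : estimate of f_{Y(0)}(q_c | T = 1)
    x_treated[i] : covariate vector X_i of treated unit i (tuple of length K)
    cov_beta     : function (tau, tau') -> K x K nested list, the asymptotic
                   covariance of the control-group coefficient estimates
    n, n0        : total sample size and number of control units
    """
    n1, k = len(x_treated), len(x_treated[0])
    # E[(theta - F(q_c|X))^2 | T = 1] / p, with p estimated by n1 / n
    first = n / n1 ** 2 * sum((theta - t) ** 2 for t in tau_hat)
    # Omega_c / (1 - p): double sum of f_i f_j X_i' cov X_j over treated units
    second = 0.0
    for i in range(n1):
        for j in range(n1):
            c = cov_beta(tau_hat[i], tau_hat[j])
            quad = sum(x_treated[i][a] * c[a][b] * x_treated[j][b]
                       for a in range(k) for b in range(k))
            second += f_cond[i] * f_cond[j] * quad
    second *= n / (n0 * n1 ** 2)
    return (first + second) / f_uncond ** 2


# Constant-only model: q_c_hat is then the control-group sample quantile.
v = avar_counterfactual_quantile(
    theta=0.5, tau_hat=[0.5, 0.5], f_cond=[1.0, 1.0], f_uncond=1.0,
    x_treated=[(1.0,), (1.0,)], cov_beta=lambda t, s: [[min(t, s) - t * s]],
    n=4, n0=2)
print(v)  # 0.5 = theta(1 - theta) / ((1 - p) f^2) with p = 0.5, f = 1
```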
The proof of THEOREM 1, which can be found in appendix A, is an application of theorem 2 in Chen, Linton and Van Keilegom (2003). Here, we concentrate on the interpretation of and the intuition for the results. All variances consist of two parts: the variance that we would obtain if we knew the conditional quantiles and the variance coming from the estimation of the conditional quantiles. Note that the variance of the $\theta$th sample quantile of a random variable Y can also be decomposed in this way by applying the law of total variance:
$$\frac{\mathrm{var}\left(1(Y \le q)\right)}{f_Y(q)^2} = \frac{\mathrm{var}\left(E[1(Y \le q)|X]\right) + E\left[\mathrm{var}\left(1(Y \le q)|X\right)\right]}{f_Y(q)^2} = \frac{E\left[\left(\theta - F_Y(q|X)\right)^2\right] + E\left[F_Y(q|X)\left(1 - F_Y(q|X)\right)\right]}{f_Y(q)^2},$$

where $F_Y(q) = \theta$. Thus, the first part of the variances of $\hat q_0$, $\hat q_1$ and $\hat q_c$ corresponds to the first term, $\mathrm{var}\left(E[1(Y \le q)|X]\right)$. If we consider a deterministic sample or if the estimands are defined conditionally on the sample (see, e.g., the discussion in Imbens, 2004, and Abadie and Imbens, 2006), the variance of the estimates consists only of the second term. In this case, uncertainty arises only from the estimation of the conditional quantile functions since the distribution of X is considered to be known. As for the estimation of the ATE, the variance is lower if we estimate the sample quantity instead of the population quantity.

We also observe that the first element of the asymptotic variance of $\hat q_0$ and $\hat q_1$ is the same as the first element of the variance of the sample quantile. However, the second element is lower than for the sample quantile. The intuition is simple: the linear quantile regression model assumes that the conditional quantiles of Y given X are linear in X, so all observations are used to estimate the conditional distribution function at each $X_i$, whereas this information does not enter the sample quantile. The price to pay is a more restrictive model. If the conditional quantile model is misspecified, $\hat q_t$ is not consistent for $q_t$, whereas the sample quantiles are consistent in any case. Therefore, a simple specification test of the conditional model, in the spirit of a Hausman (1978) test, consists of testing whether both estimates differ. If they differed significantly, it would imply that the linear quantile regression model is too restrictive.
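The law of total variance used above can be checked by hand on a small discrete example. The sketch assumes a hypothetical covariate with two equally likely values whose conditional probabilities $F_Y(q|X)$ at a fixed q are 0.7 and 0.3; exact fractions avoid rounding. The "between" term is the part shared by the sample quantile and the estimators of this section, the "within" term is the part that the parametric estimator replaces by $\Omega_t$.

```python
from fractions import Fraction as Fr

# Hypothetical discrete covariate: two values with probability 1/2 each,
# and conditional probabilities F_Y(q | X) = 0.7 and 0.3 at a fixed q.
probs = [Fr(1, 2), Fr(1, 2)]
cond_F = [Fr(7, 10), Fr(3, 10)]

theta = sum(p * F for p, F in zip(probs, cond_F))                    # F_Y(q)
between = sum(p * (theta - F) ** 2 for p, F in zip(probs, cond_F))   # E[(theta - F(q|X))^2]
within = sum(p * F * (1 - F) for p, F in zip(probs, cond_F))         # E[F(q|X)(1 - F(q|X))]
total = theta * (1 - theta)                                          # var(1(Y <= q))

print(theta, between, within, total)  # 1/2 1/25 21/100 1/4
```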
The second part of the variances of $\hat q_0$, $\hat q_1$ and $\hat q_c$ is similar to the variance of the trimmed mean estimator of Koenker and Portnoy (1987) and Gutenbrunner and Jurečková (1992). The differences arise because they integrate directly over the estimated coefficients while we integrate over the estimated quantiles, and because of our different assumptions concerning heteroscedasticity. An intuition for this element can be given as follows. The asymptotic variance of $\sqrt{n}\,X_i\hat\beta_t(\tau_j)$ is $X_i'\,\mathrm{var}\left(\hat\beta_t(\tau_j)\right)X_i$. However, when estimating the $\theta$th quantile of $Y(t)$, $\hat\beta_t(\tau_j)$ plays a role only for those observations with $X_i\hat\beta_t(\tau_j) = q_t$. Moreover, the importance of each observation in estimating $q_t$ is proportional to the density of Y given X at $q_t$. For instance, if the characteristics have a positive effect on Y, observations with a high value of X have a very small probability of playing a role in the estimation of a low quantile of Y. Finally, $\sqrt{n}\,X_i\hat\beta_t\left(F_{Y(t)}(q_t|X_i)\right)$ and $\sqrt{n}\,X_j\hat\beta_t\left(F_{Y(t)}(q_t|X_j)\right)$ have a covariance of

$$X_i'\,\mathrm{cov}\left(\hat\beta_t\left(F_{Y(t)}(q_t|X_i)\right), \hat\beta_t\left(F_{Y(t)}(q_t|X_j)\right)\right) X_j$$

because all quantile regression coefficients are correlated. The form of the asymptotic variance-covariance matrix for different quantile regressions was given by Gutenbrunner and Jurečková (1992). In order to make inference on the decomposition (6), we give the covariances between $\hat q_c$ and $\hat q_0$ and between $\hat q_c$ and $\hat q_1$. They are not zero because $\hat q_c$ and $\hat q_0$ are computed with the same quantile regression coefficients, and $\hat q_c$ and $\hat q_1$ are computed with the same covariates, which induces co-variation of the conditional quantiles.
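The Gutenbrunner and Jurečková covariance that enters $\Omega_t$, $(\min(\tau,\tau') - \tau\tau')\,D_{t1}(\tau)^{-1}D_{t0}D_{t1}(\tau')^{-1}$ from Theorem 1, can be evaluated directly once $D_{t0}$ and $D_{t1}(\cdot)$ are available. The sketch hard-codes 2×2 matrices and uses a homoscedastic illustration, $D_{t1}(\tau) = f(\tau)\,D_{t0}$ with hypothetical density values, for which the formula collapses to $(\min(\tau,\tau') - \tau\tau')/(f(\tau)f(\tau'))\,D_{t0}^{-1}$; the matrices are assumptions, not estimates.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def qr_coef_cov(tau, tau_p, d0, d1_tau, d1_tau_p):
    """(min(tau, tau') - tau tau') D1(tau)^{-1} D0 D1(tau')^{-1}."""
    scale = min(tau, tau_p) - tau * tau_p
    core = matmul(matmul(inv2(d1_tau), d0), inv2(d1_tau_p))
    return [[scale * v for v in row] for row in core]


# Homoscedastic illustration: D1(tau) = f(tau) * D0 with hypothetical densities.
d0 = [[1.0, 0.0], [0.0, 2.0]]
f_tau, f_tau_p = 0.5, 0.25
cov = qr_coef_cov(0.25, 0.5, d0,
                  [[f_tau * v for v in row] for row in d0],
                  [[f_tau_p * v for v in row] for row in d0])
print(cov)
```

With these inputs the result is $0.125 \cdot 8\,D_{t0}^{-1}$, i.e. diag(1, 0.5).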
1.3.3
Estimation of the asymptotic variance
The variance of the estimators proposed in Section 1.3.1 can be estimated by bootstrapping the results.9 However, since such estimators are often used with large if not huge datasets, bootstrapping the results is typically infeasible. We therefore propose to use the asymptotic results of Section 1.3.2 to construct an analytical estimator of the asymptotic variance. Consistent estimation of the asymptotic variance of $\hat q_c$ requires consistent estimation of $p$, $F_{Y(0)}(q_c|x)$, $f_{Y(0)}(q_c|T=1)$, $\mathrm{cov}\left(\hat\beta_0(\tau), \hat\beta_0(\tau')\right)$ and $f_{Y(0)}(q_c|x)$.10 We now discuss the estimation of each of these elements. A natural estimator of $p = \Pr(T=1)$ is $n_1/n$. An estimator of $F_{Y(0)}(q_c|x)$ was given by (2); since $q_c$ is not known, we replace it by its consistent estimate $\hat q_c$. $f_{Y(0)}(q_c|T=1)$ is the derivative of $F_{Y(0)}(q|T=1)$ with respect to q, evaluated at $q_c$; equivalently, it is the reciprocal of the derivative of the counterfactual quantile function $q_c(\theta)$. Thus, a first possibility to estimate this element is to use the idea of Siddiqui (1960):

$$\hat f_{Y(0)}\left(q_c(\theta)|T=1\right) = \frac{2h_n}{\hat q_c(\theta + h_n) - \hat q_c(\theta - h_n)}.$$
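A minimal sketch of the Siddiqui difference quotient, assuming the counterfactual quantile function can be evaluated at arbitrary $\theta$ (in practice by (5)). For the bandwidth it borrows the Hall and Sheather (1988) rule in the form given by Koenker (2005), which the text below uses for the Hendricks and Koenker estimator; using it here is an assumption for illustration. The standard normal quantile function stands in for $\hat q_c(\cdot)$, so the true density at the median, $1/\sqrt{2\pi} \approx 0.399$, is known.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def hall_sheather_bandwidth(n, theta, alpha=0.05):
    """Hall-Sheather (1988) bandwidth in the form given by Koenker (2005)."""
    x0 = STD_NORMAL.inv_cdf(theta)
    f0 = STD_NORMAL.pdf(x0)
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return n ** (-1 / 3) * z ** (2 / 3) * (1.5 * f0 ** 2 / (2 * x0 ** 2 + 1)) ** (1 / 3)


def siddiqui_density(quantile_fn, theta, h):
    """Density at q(theta): 2h / (q(theta + h) - q(theta - h))."""
    if not (0 < theta - h and theta + h < 1):
        raise ValueError("theta +/- h must lie inside (0, 1)")
    return 2 * h / (quantile_fn(theta + h) - quantile_fn(theta - h))


# Hypothetical stand-in for q_c_hat(.): the standard normal quantile function.
h = hall_sheather_bandwidth(n=1000, theta=0.5)
f_hat = siddiqui_density(STD_NORMAL.inv_cdf, 0.5, h)
print(round(h, 4), round(f_hat, 4))
```

The quotient is a secant approximation of $1/q_c'(\theta)$, so it is slightly biased for any fixed bandwidth; the bandwidth rule trades this bias against the noise in $\hat q_c$.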
A second possibility is the use of a kernel estimator. Since we need to estimate the density of a counterfactual, unobserved distribution, we first simulate this distribution by estimating a large number of quantiles $\hat q_c(\theta_d)$ for $\{\theta_d\}_{d=1}^D$ taken from a uniform grid between 0 and 1.11 We then use a normal kernel and the Silverman (1986) rule of thumb (other choices are of course also possible) and obtain:

$$\hat f_{Y(0)}\left(q_c(\theta)|T=1\right) = \frac{1}{D h_n}\sum_{d=1}^{D} K\left(\frac{\hat q_c(\theta_d) - \hat q_c(\theta)}{h_n}\right). \tag{8}$$

9 The regularity conditions for bootstrap consistency given in theorem B in Chen, Linton and Van Keilegom (2003) can be verified in the same way as the conditions for asymptotic normality, which are verified in the appendix.
10 The estimation of other variances or covariances requires the estimation of the same types of elements.
11 In the Monte Carlo simulations and in the application we set D = 10000.
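Equation (8) applies an ordinary kernel density estimator to the simulated counterfactual sample $\hat q_c(\theta_1), \ldots, \hat q_c(\theta_D)$. A minimal sketch, assuming grid midpoints $\theta_d = (d - 0.5)/D$ and the variant of the Silverman rule $0.9\min(\mathrm{sd}, \mathrm{IQR}/1.34)\,D^{-1/5}$ (the one implemented by R's `bw.nrd0`); the text does not state which variant is used. The normal quantile function is a hypothetical stand-in for $\hat q_c(\cdot)$.

```python
from statistics import NormalDist, quantiles, stdev

STD_NORMAL = NormalDist()


def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth 0.9 * min(sd, IQR / 1.34) * D^(-1/5)."""
    q1, _, q3 = quantiles(sample, n=4)
    spread = min(stdev(sample), (q3 - q1) / 1.34)
    return 0.9 * spread * len(sample) ** (-0.2)


def kernel_density(point, sample, h):
    """Normal-kernel estimate (1 / (D h)) * sum_d K((sample_d - point) / h), as in (8)."""
    return sum(STD_NORMAL.pdf((s - point) / h) for s in sample) / (len(sample) * h)


# Simulated counterfactual distribution q_c_hat(theta_d) on a grid of D points.
D = 2000
grid = [(d - 0.5) / D for d in range(1, D + 1)]
simulated = [STD_NORMAL.inv_cdf(theta) for theta in grid]  # stand-in for q_c_hat
h = silverman_bandwidth(simulated)
f_hat = kernel_density(STD_NORMAL.inv_cdf(0.5), simulated, h)
print(round(h, 3), round(f_hat, 4))
```

Because the grid is deterministic, the only approximation error here is the kernel smoothing bias; with estimated quantiles the sampling error of $\hat q_c$ adds to it.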
A large literature already deals with the estimation of the covariance matrix of the quantile regression coefficients.12 In this chapter, we would like to avoid the bootstrap in order to keep the computation time reasonable. Moreover, we cannot use rank-based estimators, since we need to estimate the whole covariance matrix. Finally, we want to allow for arbitrary dependence between the residuals and the regressors. Therefore, only two estimators can reasonably be used to estimate the variance of the quantile regression parameters: the Powell (1984) kernel estimator and the Hendricks and Koenker (1991) estimator. Normally, a disadvantage of the second estimator is that it needs more computation time because it requires the estimation of two additional quantile regressions for each quantile. But since we have already estimated the whole quantile regression process anyway, this estimator is in our case as fast as the kernel estimator. Since the Hendricks and Koenker estimator also appears to be more precise in small samples, we focus on it to estimate $\mathrm{cov}\left(\hat\beta_0(\tau), \hat\beta_0(\tau')\right)$ by

$$n_0\left(\min(\tau,\tau') - \tau\tau'\right)\left[\sum_{i:T_i=0} X_i'X_i\,\hat f_{\varepsilon_\tau}(0|X_i)\right]^{-1}\left[\sum_{i:T_i=0} X_i'X_i\right]\left[\sum_{i:T_i=0} X_i'X_i\,\hat f_{\varepsilon_{\tau'}}(0|X_i)\right]^{-1},$$

where $\hat f_{\varepsilon_\tau}(0|X_i) = \dfrac{2h_n}{X_i\left(\hat\beta_0(\tau + h_n) - \hat\beta_0(\tau - h_n)\right)}$ and $h_n$ is a bandwidth that follows the Hall and Sheather (1988) rule. Finally, $f_{Y(0)}(q_c|x)$ is estimated in the same way by

$$\hat f_{Y(0)}(q_c|x) = \frac{2h_n}{x\left(\hat\beta_0\left(\hat F_{Y(0)}(\hat q_c|x) + h_n\right) - \hat\beta_0\left(\hat F_{Y(0)}(\hat q_c|x) - h_n\right)\right)}.$$

Thus, we estimate the variance of $\hat q_c$ by

$$\frac{n\,n_1^{-2}\sum_{i:T_i=1}(\theta - \hat\tau_i)^2}{\hat f_{Y(0)}(\hat q_c|T=1)^2} + \frac{n\,n_0^{-1}n_1^{-2}\sum_{i:T_i=1}\sum_{j:T_j=1}\hat f_{Y(0)}(\hat q_c|X_i)\,\hat f_{Y(0)}(\hat q_c|X_j)\,X_i'\,\widehat{\mathrm{cov}}\left(\hat\beta_0(\hat\tau_i), \hat\beta_0(\hat\tau_j)\right)X_j}{\hat f_{Y(0)}(\hat q_c|T=1)^2},$$

where $\hat\tau_i = \hat F_{Y(0)}(\hat q_c|X_i)$ in order to alleviate the notation. The proof of consistency of this estimator follows from the consistency of the different elements of the variance, which has already been established in the cited papers and above, and from the Slutsky and continuous mapping theorems. The proof is standard and is not discussed here.

12 See chapter 3 in Koenker (2005) for a recent survey.

1.3.4
Extension: effects of residuals
Juhn, Murphy and Pierce (1993) and Lemieux (2006), among others, decompose the differences in distribution into three factors: coefficients, characteristics and residuals. Since identifying these three sources of differences in distribution is of theoretical interest in several applications, we show how the decomposition of the preceding section can be extended to separate the effects of coefficients into the effects of median coefficients and residuals. This decomposition was developed and applied independently by Melly (2005b) and Autor, Katz and Kearney (2005a and 2005b). We use the same framework as Juhn, Murphy and Pierce (1993) to decompose the differences in wage distributions between the treated and control units. If we take the median as a measure of central tendency of a distribution, we can write a simple wage equation for each group:

$$Y_i(t) = X_i\beta_t(0.5) + u_{t,i}, \qquad t = 0,1.$$
We can isolate the effects of differences in characteristics, median coefficients and residuals. The effect of characteristics can be estimated as in Section 1.3.1. To separate the effect of coefficients from the effect of residuals, note that the $\tau$th quantile of the residual distribution conditional on $X_i$ is consistently estimated by $X_i\left(\hat\beta_t(\tau) - \hat\beta_t(0.5)\right)$. We define $\hat\beta_{m1,r0}(\tau_j) = \hat\beta_1(0.5) + \hat\beta_0(\tau_j) - \hat\beta_0(0.5)$. Then, we estimate the distribution that would prevail if the median return to characteristics were the median return in the treated group but the residuals were distributed as in the control group by

$$\hat q_{m1,r0}(\theta) = \inf\left\{q : n_1^{-1}\sum_{i:T_i=1}\sum_{j=1}^{J}(\tau_j - \tau_{j-1})\,1\left(X_i\hat\beta_{m1,r0}(\tau_j) \le q\right) \ge \theta\right\}.$$

Therefore, the difference between $\hat q_{m1,r0}(\theta)$ and $\hat q_c(\theta)$ is due to differences in coefficients since characteristics and residuals are kept at the same level. Finally, the difference between $\hat q_1(\theta)$ and $\hat q_{m1,r0}(\theta)$ is due to residuals. The asymptotic distribution of this decomposition is straightforward to derive. All quantile regression coefficients estimated within the treated group are independent of their control-group analogs. The covariance between different quantile regression coefficients was given in Section 1.3.2.
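The three-way decomposition only requires one extra coefficient process, $\hat\beta_{m1,r0}(\tau_j)$, which shifts the control-group process so that its median equals the treated-group median. A minimal sketch with the same hypothetical coefficient processes as a toy example; it assumes, for simplicity, that 0.5 is one of the breakpoints so that $\hat\beta(0.5)$ can be read off directly.

```python
def fitted(x, beta):
    """x'beta for tuples x and beta."""
    return sum(a * b for a, b in zip(x, beta))


def quantile(theta, xs, taus, betas):
    """inf{q : n^{-1} sum_i sum_j (tau_j - tau_{j-1}) 1(x_i'beta(tau_j) <= q) >= theta}."""
    weights = [t - s for s, t in zip([0.0] + taus[:-1], taus)]
    for q in sorted({fitted(x, b) for x in xs for b in betas}):
        share = sum(w for x in xs for w, b in zip(weights, betas) if fitted(x, b) <= q)
        if share / len(xs) >= theta:
            return q
    raise ValueError("taus must end at 1 and theta must lie in (0, 1]")


def median_shifted(taus, betas0, betas1):
    """beta_{m1,r0}(tau_j) = beta_1(0.5) + beta_0(tau_j) - beta_0(0.5).

    Hypothetical simplification: 0.5 must be one of the breakpoints.
    """
    k = taus.index(0.5)
    med1, med0 = betas1[k], betas0[k]
    return [tuple(m1 + b - m0 for m1, b, m0 in zip(med1, beta, med0)) for beta in betas0]


taus = [0.25, 0.5, 0.75, 1.0]
betas0 = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]  # hypothetical control process
betas1 = [(0.0, 2.0), (2.0, 2.0), (4.0, 2.0), (6.0, 2.0)]  # hypothetical treated process
xs0 = [(1.0, 0.0), (1.0, 1.0)]
xs1 = [(1.0, 1.0), (1.0, 2.0)]

q0 = quantile(0.5, xs0, taus, betas0)
qc = quantile(0.5, xs1, taus, betas0)
qm = quantile(0.5, xs1, taus, median_shifted(taus, betas0, betas1))
q1 = quantile(0.5, xs1, taus, betas1)
print(qc - q0, qm - qc, q1 - qm)  # characteristics, coefficients, residuals: 1.0 2.0 1.0
```

The three pieces add up to the total gap $\hat q_1 - \hat q_0$ by construction.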
1.4
Semiparametric estimator
1.4.1
Model and estimators
The consistency of the estimators proposed above depends on the parametric assumption of the first-step estimation. We have considered only linear quantile regression, but nonlinear or censored quantile regression could also be used. In this case, we would have to change the form of $\mathrm{cov}\left(\hat\beta_t(\tau), \hat\beta_t(\tau')\right)$, but the other results would remain valid. The parametric assumption can be relaxed by using polynomial series or dummy variables. However, it is sometimes better to abandon parametric assumptions completely and to estimate the conditional quantile functions nonparametrically. We propose an estimator based on local linear quantile regression (Chaudhuri 1991). This procedure can be seen as the quantile equivalent of the estimator proposed by Heckman, Ichimura and Todd (1998) for the mean. They estimate the conditional mean function by local
constant or local linear regression. Hahn (1998) computes the conditional mean function by series estimation. This is an alternative approach, but we do not explore it in this chapter. In order to derive the asymptotic properties of these estimators, we make the following assumptions:

S.i. $\{Y_i, X_i, T_i\}_{i=1}^n$ are independent and identically distributed across i and have compact support; X is a d-dimensional continuously distributed variable13 with $f_X(X_i)$ continuously differentiable and bounded for all $X_i$ in the support;
S.ii. For $\tau \in (0,1)$, $F^{-1}_{Y(t)}(\tau|X)$ is p-smooth,14 where $p > d$;
S.iii. For all X in the support, the distribution function $F_{Y(t)}(\cdot|X)$ is absolutely continuous and has a continuous density with $0 < f_{Y(t)}(u|X) < \infty$ on $\{u : 0 < F_{Y(t)}(u|X) < 1\}$ and $\sup_u f'_{Y(t)}(u|X) < \infty$;
S.iv. The bandwidth sequence $h_n$ satisfies $\mathrm{plim}_{n\to\infty}\, h_n/a_n = h_0 > 0$ for some deterministic sequence $\{a_n\}$ that satisfies $n a_n^d/\log n \to \infty$ and $n a_n^{2p} \to c < \infty$ for some $c \ge 0$;
S.v. The kernel function $K(\cdot)$ is symmetric, supported on a compact set and Lipschitz continuous;
S.vi. The kernel function $K(\cdot)$ has moments of order 1 through $p-1$ that are equal to zero.

13 Discrete regressors do not matter asymptotically.
14 We call a function p-smooth when it is p times continuously differentiable and its pth derivative is Hölder continuous.
These assumptions are in principle the same as those made by Heckman, Ichimura and Todd (1998), but some differences arise from the different estimands. Condition S.ii. guarantees that the conditional quantile functions are smooth enough to be estimated by local linear quantile regression. Condition S.iii. ensures that the conditional quantiles are well defined and unique. Since the distribution of X is assumed to be continuous by S.i., this also implies that the unconditional quantiles of Y are well defined and unique. Undersmoothing and a higher-order kernel with compact support are necessary in order to control the bias and the rate of convergence of the kernel regression estimator. The procedure is very similar to the estimator that relies on linear quantile regression in the first step. The difference, however, is that the quantile regression coefficients depend on the point at which they are estimated. Formally, let
$$\hat\beta_t(\tau, x) = \arg\min_b\; n_t^{-1}\sum_{i:T_i=t} K\left(\frac{X_i - x}{h_n}\right)\rho_\tau(Y_i - X_i b)$$

be the $\tau$th quantile regression coefficient estimated locally at x. We can allow the bandwidth to depend on x and $\tau$. Then the procedure is similar to that of Section 1.3.1:

$$\hat F^S_{Y(t)}(q|X_i) = \int_0^1 1\left(X_i\hat\beta_t(\tau, X_i) \le q\right) d\tau = \sum_{j=1}^{J}(\tau_j - \tau_{j-1})\,1\left(X_i\hat\beta_t(\tau_j, X_i) \le q\right),$$

$$\hat F^S_{Y(t)}(q|T=t) = n_t^{-1}\sum_{i:T_i=t}\hat F^S_{Y(t)}(q|X_i) \tag{9}$$

and

$$\hat q^S_t(\theta) = \inf\left\{q : n_t^{-1}\sum_{i:T_i=t}\hat F^S_{Y(t)}(q|X_i) \ge \theta\right\}. \tag{10}$$

Naturally, we can estimate counterfactual quantiles by

$$\hat q^S_c(\theta) = \inf\left\{q : n_1^{-1}\sum_{i:T_i=1}\hat F^S_{Y(0)}(q|X_i) \ge \theta\right\}$$

and use them to estimate the quantile treatment effect on the treated:

$$\widehat{QTET}(\theta) = \hat q^S_1(\theta) - \hat q^S_c(\theta).$$
In the same way, we estimate the overall quantile treatment effect by

$$\widehat{QTE}(\theta) = \inf\left\{q : n^{-1}\sum_{i=1}^{n}\hat F^S_{Y(1)}(q|X_i) \ge \theta\right\} - \inf\left\{q : n^{-1}\sum_{i=1}^{n}\hat F^S_{Y(0)}(q|X_i) \ge \theta\right\}.$$
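A minimal sketch of the semiparametric first step and of (10) for a scalar continuous covariate (d = 1), assuming an Epanechnikov kernel (symmetric, compactly supported and Lipschitz, so it meets S.v. and, with p = 2, S.vi.) and a fixed bandwidth rather than an undersmoothing sequence as in S.iv. The weighted check-function objective is convex and piecewise linear, so some minimiser passes through two observations; enumerating all such lines is exact but costs $O(n^3)$ per fit and is only meant for tiny samples. A grid of J equally weighted $\tau$ midpoints replaces the exact breakpoints (footnote 7 allows a grid). Function names are hypothetical.

```python
def check_loss(z, tau):
    """Check function rho_tau(z) = z * (tau - 1(z <= 0))."""
    return z * (tau - (1.0 if z <= 0 else 0.0))


def epanechnikov(u):
    """Symmetric, compactly supported, Lipschitz kernel (second order)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def local_linear_quantile(x0, xs, ys, tau, h):
    """Fitted tau-th conditional quantile at x0 from kernel-weighted linear QR.

    Minimises sum_k K((x_k - x0) / h) * rho_tau(y_k - a - b * (x_k - x0)) over
    (a, b) by enumerating lines through pairs of observations, and returns a.
    Needs two distinct x_k within h of x0.
    """
    w = [epanechnikov((x - x0) / h) for x in xs]
    best_loss, best_a = float("inf"), None
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] == xs[j]:
                continue
            b = (ys[j] - ys[i]) / (xs[j] - xs[i])
            a = ys[i] + b * (x0 - xs[i])  # the line through i and j, evaluated at x0
            loss = sum(wk * check_loss(y - a - b * (x - x0), tau)
                       for wk, x, y in zip(w, xs, ys) if wk > 0)
            if loss < best_loss:
                best_loss, best_a = loss, a
    return best_a


def semiparametric_quantile(theta, x_eval, xs, ys, h, J=9):
    """q_hat^S(theta): invert the average of F_hat^S(q | X_i) over x_eval.

    (xs, ys) is the first-step sample; x_eval holds the covariates to integrate
    over (the same group for (10), the treated group for the counterfactual).
    """
    taus = [(j - 0.5) / J for j in range(1, J + 1)]  # grid midpoints, weight 1/J each
    fits = [[local_linear_quantile(x, xs, ys, tau, h) for tau in taus] for x in x_eval]
    for q in sorted({v for row in fits for v in row}):
        if sum(v <= q for row in fits for v in row) / (J * len(x_eval)) >= theta:
            return q
    raise ValueError("theta must lie in (0, 1]")


# Toy data on an exact line, so every local fit returns 1 + 2 x.
xs = [k / 10 for k in range(11)]
ys = [1 + 2 * x for x in xs]
q_s = semiparametric_quantile(0.5, xs, xs, ys, h=0.35, J=5)
```

For this noiseless toy sample the result coincides with the sample median of Y, 2.0, in line with the asymptotic equivalence stated in Theorem 2; with real data a linear-programming solver would replace the pair enumeration.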
1.4.2
Asymptotic results
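THEOREM 2: Under assumptions S.i. to S.vi., $\hat q^S_0$ and $\hat q^S_1$ are $\sqrt{n}$-consistent and asymptotically equivalent to the sample quantiles:

$$\sqrt{n}(\hat q^S_t - q_t) \xrightarrow{d} N\left(0, \frac{\theta(1-\theta)}{\Pr(T=t)\, f_{Y(t)}(q_t|T=t)^2}\right).$$

$\hat q^S_c$ is $\sqrt{n}$-consistent and asymptotically normally distributed:

$$\sqrt{n}(\hat q^S_c - q_c) \xrightarrow{d} N\left(0, \frac{E\left[\left(\theta - F_{Y(0)}(q_c|X)\right)^2 \mid T=1\right]/p + \Omega^S_c/(1-p)}{f_{Y(0)}(q_c|T=1)^2}\right), \tag{11}$$

where $\Omega^S_c = E\left[F_{Y(0)}(q_c|X)\left(1 - F_{Y(0)}(q_c|X)\right)\dfrac{f_X(X|T=1)}{f_X(X|T=0)} \,\Big|\, T=1\right]$.

$\hat q^S_0$ and $\hat q^S_1$ are independent. The normalized asymptotic covariance between $\hat q^S_0$ and $\hat q^S_c$ is

$$\frac{E\left[\min\left(F_{Y(0)}(q_c|X), F_{Y(0)}(q_0|X)\right) - F_{Y(0)}(q_c|X)\,F_{Y(0)}(q_0|X) \mid T=1\right]}{f_{Y(0)}(q_c|T=1)\, f_{Y(0)}(q_0|T=0)\,(1-p)}.$$

The normalized asymptotic covariance between $\hat q^S_c$ and $\hat q^S_1$ is

$$\frac{E\left[\left(F_{Y(0)}(q_c|X) - \theta\right)\left(F_{Y(1)}(q_1|X) - \theta\right) \mid T=1\right]}{f_{Y(0)}(q_c|T=1)\, f_{Y(1)}(q_1|T=1)\, p}.$$

Thus, $\widehat{QTET}$ and $\widehat{QTE}$ are consistent and asymptotically normally distributed:15

$$\sqrt{n}\left(\widehat{QTET} - QTET\right) \xrightarrow{d} N\left(0,\ \mathrm{avar}(\hat q^S_1) + \mathrm{avar}(\hat q^S_c) - 2\,\mathrm{acov}(\hat q^S_1, \hat q^S_c)\right);$$

$$\sqrt{n}\left(\widehat{QTE} - QTE\right) \xrightarrow{d} N\left(0,\ \Omega^S_1 + \frac{\Omega^S_2}{p\, f_{Y(0)}\left(F^{-1}_{Y(0)}(\theta)\right)^2} + \frac{\Omega^S_3}{(1-p)\, f_{Y(1)}\left(F^{-1}_{Y(1)}(\theta)\right)^2}\right),$$

where

$$\Omega^S_1 = \frac{\mathrm{var}\left(F_{Y(0)}\left(F^{-1}_{Y(0)}(\theta)|X\right)\right)}{f_{Y(0)}\left(F^{-1}_{Y(0)}(\theta)\right)^2} + \frac{\mathrm{var}\left(F_{Y(1)}\left(F^{-1}_{Y(1)}(\theta)|X\right)\right)}{f_{Y(1)}\left(F^{-1}_{Y(1)}(\theta)\right)^2} - 2\,\frac{E\left[\left(F_{Y(0)}\left(F^{-1}_{Y(0)}(\theta)|X\right) - \theta\right)\left(F_{Y(1)}\left(F^{-1}_{Y(1)}(\theta)|X\right) - \theta\right)\right]}{f_{Y(0)}\left(F^{-1}_{Y(0)}(\theta)\right) f_{Y(1)}\left(F^{-1}_{Y(1)}(\theta)\right)},$$

$$\Omega^S_2 = E\left[F_{Y(0)}\left(F^{-1}_{Y(0)}(\theta)|X\right)\left(1 - F_{Y(0)}\left(F^{-1}_{Y(0)}(\theta)|X\right)\right)\frac{f_X(X)}{f_X(X|T=0)}\right],$$

$$\Omega^S_3 = E\left[F_{Y(1)}\left(F^{-1}_{Y(1)}(\theta)|X\right)\left(1 - F_{Y(1)}\left(F^{-1}_{Y(1)}(\theta)|X\right)\right)\frac{f_X(X)}{f_X(X|T=1)}\right].$$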
COROLLARY: $\widehat{QTET}$ and $\widehat{QTE}$ achieve the efficiency bounds derived by Firpo (2006).

The proofs of Theorem 2 and its corollary can be found in appendices B and C. Although the estimators in this section are based on nonparametric methods, they are $\sqrt{n}$-consistent because the first-step infinite-dimensional estimates are integrated over all observations to obtain the finite-dimensional second-step estimate. The average derivative quantile regression estimator of Chaudhuri, Doksum and Samarov (1997) is similar in this respect. The asymptotic equivalence of the sample quantile and $\hat q^S_t(\theta)$ could be surprising, but the reason is clear: asymptotically, the bandwidth is zero and no assumption is made about the dependence between Y and X. Note that if the conditional quantiles of Y are linear in X, then the parametric estimator uses the optimal, infinite bandwidth while the nonparametric estimator constrains the bandwidth to go to zero. This explains why the parametric estimator is more efficient than the nonparametric one in this case. The efficiency gain of the estimator of Section 1.3 results from the parametric assumptions: if the parametric restrictions are satisfied, we increase precision; if they are not satisfied, the estimator may be inconsistent.

When comparing the asymptotic variances of $\hat q_c$ and $\hat q^S_c$, we note that both consist of two parts and that the first parts are identical. This is the variance that we would obtain if we knew the true conditional quantiles; therefore, this part does not depend on the method used to estimate the conditional quantiles. The second part is the contribution of the first-step estimation to the second-step variance, which differs between $\hat q_c$ and $\hat q^S_c$. While $X_i\hat\beta_0(\tau)$ and $X_j\hat\beta_0(\tau')$ are correlated, $X_i\hat\beta_0(\tau, X_i)$ and $X_j\hat\beta_0(\tau', X_j)$ are asymptotically independent if $X_i \ne X_j$ because the coefficients are only locally estimated. Thus, we do not need to account for these covariances, and the double integral appearing in the asymptotic variance of $\hat q_c$ disappears for $\hat q^S_c$. Finally, we can use the form of the asymptotic variance of $\hat\beta_0(\tau, X_i)$ to simplify $\mathrm{avar}(\hat q^S_c)$.

15 avar and acov are the normalized asymptotic variances and covariance given above.
We show in the corollary of Theorem 2 that $\widehat{QTET}$ and $\widehat{QTE}$ achieve the semiparametric efficiency bounds without knowledge of the propensity score derived by Firpo (2006). Moreover, he proves that his propensity score weighting estimators also achieve the semiparametric efficiency bounds. Thus, both estimators of quantile treatment effects are asymptotically equivalent, just as the Heckman, Ichimura and Todd (1998) and the Hirano, Imbens and Ridder (2003) estimators of average treatment effects are. Naturally, their finite-sample properties may be very different. The relative advantages of both approaches are discussed in the conclusion.
The asymptotic variance of $\widehat{QTET}$ and $\widehat{QTE}$ is in principle simpler to estimate than that of the parametric estimators. We only need to estimate unconditional distribution (and quantile) functions and unconditional densities. Unconditional distribution and quantile functions are estimated by (9) and (10), respectively. For the estimation of unconditional densities, we use kernel density estimates with the Silverman (1986) bandwidth. If we must estimate the density of an unobserved distribution, we use the principle described in (8) for the parametric estimator, that is, we apply a kernel density estimator to the estimated unobserved distribution.
1.5
Monte-Carlo simulations
Asymptotic results are interesting partly because we hope that they approximately describe the behavior of the estimators in finite samples. In this section, we examine how the proposed estimators behave in finite samples. We first study the parametric estimator, which we call QQR (quantile based on quantile regression). Then we compare it to the estimator proposed by Machado and Mata (2005), since this estimator is frequently applied. Finally, we consider the estimator based on nonparametric first-step quantile regression (QNQR). Software to implement the proposed estimators in R and to replicate the Monte Carlo simulations is available at the author's website.16

1.5.1
Parametric estimator
We consider a simple model with three correlated covariates and a constant:

$$Y(t) = 1 + X_1 + X_2 + X_3 + \varepsilon(t)(1 + X_1), \qquad t = 0,1,$$

where $X_1 \sim U(0,1)$, $X_2 \sim B(0.5)$, $X_3 \sim N(0,1)$, $\mathrm{cor}(X_1, X_2) = 0.4$, $\mathrm{cor}(X_1, X_3) = 0.49$, $\mathrm{cor}(X_2, X_3) = 0.4$, $\varepsilon(0) \sim t(1)$, $\varepsilon(1) \sim N(0,1)$ and $\Pr(T=1|X) = 0.5$. The distribution of the covariates and the median coefficients do not depend on the treatment status, but the error term is normally distributed for the treated and Cauchy distributed for the control units. Thus, the quantile treatment effect is positive below the median and negative above. This design also allows us to compare the behavior of the estimators in the presence of a standard normal and an extremely fat-tailed distribution. We consider three different sample sizes, $n_0 = n_1$ = 100, 400 and 1600, and set the number of replications to 10000, 5000 and 2500, respectively. We report the results for $\hat q_0(\theta)$, $\hat q_1(\theta)$ and $\widehat{QTET} = \hat q_1(\theta) - \hat q_c(\theta)$, each evaluated at three different quantiles: 5%, 25% and 50%.

16 R is an open-source programming environment for conducting statistical analysis and graphics that can be downloaded at no cost from the site www.r-project.org.

Table 1.1 reports the bias, standard error, skewness, kurtosis and mean squared error (MSE) of the estimates. The relative MSE of the sample quantile is also given for $\hat q_0(\theta)$ and $\hat q_1(\theta)$ in order to evaluate the efficiency gains achieved by the QQR. As expected, the bias is smaller in the center of the distribution and with normal error terms. Where there is a bias, it tends to disappear as the sample size increases. The analytically established convergence rate of the estimator is confirmed, since quadrupling the sample size halves the standard errors and cuts the MSE by about 75%. Considering the skewness and kurtosis, the distribution of the estimates already appears fairly close to the normal distribution with 100 observations for the median. A larger sample size is necessary for lower quantiles in the presence of Cauchy distributed error terms, but the convergence to the values of the normal distribution is clear. Finally, the QQR is almost always more efficient than the sample quantile; the only exception arises for the 5th percentile with small sample sizes and Cauchy distributed error terms.

We also evaluate the performance of the analytical estimator of the variance proposed in Section 1.3.3 and compare it with that of the bootstrap. In order to keep the computation time reasonable, the results for the bootstrap are based on only 4000, 2000 and 1000 replications for sample sizes of 100, 400 and 1600, respectively. Within each Monte Carlo replication, 100 bootstrap replications were drawn. We present results only for the $\widehat{QTET}$, but they are
representative of the results for other estimands. Table 1.2 gives different criteria that allow us to evaluate the estimators. It first reports the rejection frequencies of a Wald test of the true null hypothesis at three different significance levels. Second, since this first evaluation does not allow us to evaluate the precision of the estimates, the median bias and the median absolute deviation (MAD) from the true value are also given for both estimators. We take the empirical standard errors obtained in the Monte Carlo simulations as the “true” values.17 The empirical sizes of the tests confirm that both the analytic estimator and the bootstrap are consistent for the standard error of the QQR. With the exception of the 5th percentile with low sample sizes, both are reasonable estimators with empirical sizes near the theoretical ones. If we consider the MAD of both estimators, we note that the analytic estimator is more precise than the bootstrap (with two exceptions). Thus, the analytic estimator of the variance is not only faster to compute but also more efficient, and its use in applications can be recommended.

1.5.2
Comparison to Machado and Mata (2005) estimator
Machado and Mata (2005, MM hereafter) also propose using quantile regression to estimate counterfactual unconditional wage distributions. Their estimator is widely used in applications; see, for instance, Albrecht, Björklund and Vroman (2003), Melly (2005a) and Autor, Katz and Kearney (2005a and 2005b). However, neither asymptotic results nor a method to estimate the variance consistently have been provided [18]. We show in this section that the MM estimator becomes numerically identical to our estimator as the number of simulations goes to infinity. The results of this chapter therefore also apply to their estimator.

[17] Another possibility would be to compute the asymptotic standard error analytically. However, what we want to estimate is the finite-sample (empirical) variance of the estimate, not its asymptotic variance.
[18] Albrecht, Van Vuuren and Vroman (2004) derive the asymptotic distribution under the special assumption that the number of replications is of the same order as the number of observations. They therefore obtain different results. Their assumption entails a loss of efficiency of the estimator, as explained below.
The idea underlying their technique is the probability integral transformation theorem: if $U$ is uniformly distributed on $[0,1]$, then $F^{-1}(U)$ has distribution function $F$. Thus, for a given $X_i$ and a random $\theta \sim U[0,1]$, $X_i\beta_0(\theta)$ has the same distribution as $Y(0) \mid X_i$. If we draw a random $X$ from the control population instead of keeping $X_i$ fixed, $X\beta_0(\theta)$ has the same distribution as $Y(0) \mid T=0$. Formally, the procedure proposed by MM involves 4 steps:

1. Generate a random sample of size $m$ from $U[0,1]$: $u_1, \dots, u_m$.
2. Estimate $m$ different quantile regression coefficient vectors: $\hat\beta_0(u_i)$, $i = 1, \dots, m$.
3. Generate a random sample of size $m$ with replacement from $\{X_i\}_{i:T_i=0}$, denoted by $\{\tilde X_i\}_{i=1}^m$.
4. $\{\tilde Y_i = \tilde X_i\hat\beta_0(u_i)\}_{i=1}^m$ is a random sample of size $m$ from the unconditional distribution of $Y(0) \mid T=0$.

Naturally, other distributions can be estimated by drawing $X$ from another distribution and using different coefficient vectors. As noted by Autor, Katz and Kearney (2005a), this procedure is equivalent to numerically integrating the estimated conditional quantile functions over the distributions of $X$ and $\theta$. The principles of the MM estimator and of the QQR are therefore identical; they differ only in how this integral is evaluated. First, since the observations are assumed to be iid, the QQR combines each estimated coefficient vector with all observations instead of a single randomly drawn one. Second, as $m \to \infty$, the probability that a coefficient vector $\hat\beta_0(\tau_j)$ is chosen is exactly equal to
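$\tau_j - \tau_{j-1}$, which is the weight that the QQR assigns to it. Indeed, the estimated quantile regression process is piecewise constant, so that for all $\tau_{j-1} \le u_i \le \tau_j$
$$\hat\beta_0(u_i) = \hat\beta_0(\tau_j) \quad \text{and} \quad \Pr\left(\tau_{j-1} \le u_i \le \tau_j\right) = \tau_j - \tau_{j-1}.$$
In other words, as $m \to \infty$, the MM estimator becomes numerically identical to the QQR.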
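The limiting argument can be checked numerically with a short illustrative sketch, which rests on assumptions that are ours rather than the chapter's. The first-step coefficients come from a hypothetical location-scale model with one regressor (function `beta_grid`), held piecewise constant on an equally spaced grid of $J$ quantile levels, and the covariates are simulated. `qqr` takes the $\theta$-quantile of all fitted values $X_i\hat\beta_0(\tau_j)$, each weighted by $\tau_j - \tau_{j-1}$. `mm` follows the four MM steps with $m$ draws.

```python
import math
import random
from statistics import NormalDist

def beta_grid(J):
    """Hypothetical first-step output: on (tau_{j-1}, tau_j] the process equals the
    true coefficients of Y = 1 + 2X + (1 + 0.5X) e, e ~ N(0, 1), at the cell midpoint."""
    nd = NormalDist()
    betas = []
    for j in range(1, J + 1):
        z = nd.inv_cdf((j - 0.5) / J)
        betas.append((1.0 + z, 2.0 + 0.5 * z))  # (intercept, slope)
    return betas

def weighted_quantile(values, weights, theta):
    """Smallest value whose cumulative weight reaches theta * total weight."""
    pairs = sorted(zip(values, weights))
    target = theta * sum(weights)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= target:
            return v
    return pairs[-1][0]

def qqr(xs, theta, betas):
    """QQR: theta-quantile of all fitted values x * beta(tau_j),
    each weighted by tau_j - tau_{j-1} = 1 / J."""
    J = len(betas)
    values = [a + b * x for x in xs for (a, b) in betas]
    return weighted_quantile(values, [1.0 / J] * len(values), theta)

def mm(xs, theta, betas, m, rng):
    """MM: m draws of u ~ U[0,1] and of x (with replacement) from xs."""
    J = len(betas)
    draws = []
    for _ in range(m):
        u = rng.random()
        j = min(J, max(1, math.ceil(u * J)))  # cell with tau_{j-1} < u <= tau_j
        a, b = betas[j - 1]
        draws.append(a + b * rng.choice(xs))
    draws.sort()
    return draws[math.ceil(theta * m) - 1]  # empirical theta-quantile

rng = random.Random(0)
xs = [rng.uniform(0.0, 2.0) for _ in range(200)]  # hypothetical covariates
betas = beta_grid(99)
target = qqr(xs, 0.5, betas)
for m in (200, 2000, 100000):
    print(m, round(target, 4), round(mm(xs, 0.5, betas, m, random.Random(1)), 4))
```

The QQR value contains no simulation noise. The MM value should move toward it as $m$ grows, which is the pattern described for Figures 1.1 and 1.2.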
A Monte Carlo simulation illustrates this result. We keep the same data-generating process as in Section 1.5.1 and estimate $q_c(0.5)$ in 5000 replications using a sample of 400 observations [19]. Figure 1.1 plots the correlation between the MM estimator and the QQR as a function of $m$; the equality of both estimators as $m \to \infty$ is clear. Figure 1.2 shows that the imperfect correlation between the QQR and the MM estimator is simply due to the noise added by the resampling (bootstrap) step of MM. The MSE of the QQR is always lower than the MSE of MM, but both converge as $m \to \infty$. Almost all applications of the MM procedure set $m$ equal to the sample size. For $m = n_1 = n_0 = 400$, the MSE of the MM estimator is more than twice as large as the MSE of the QQR. The efficiency loss is therefore substantial in most applications. Table 1.3 shows that the bias of the MM estimator does not depend on $m$, as expected, but that the standard errors of the estimates diminish as we increase the number of replications. Thus, a large number of replications is necessary to obtain good MSE properties. Naturally, a large number of replications is time consuming, especially when the number of observations is high and estimating the whole quantile regression process is not feasible. The QQR can be computed faster and uses the information contained in the data more efficiently. Simulation procedures are useful when there is no analytical solution to the problem. They are unnecessary, however, when we can, as in our case, use moment conditions to derive an analytical estimator for the parameters of interest.

1.5.3 Semiparametric estimator
We now present the results of a Monte Carlo simulation using nonparametric quantile regression in the first step. We consider a nonlinear model with a single regressor and a constant. The error term is again multiplied by a linear heteroscedastic scale factor. Formally,
$$Y(t) = 5 + X + 4\cos(X) + 2\sin(3X) + \varepsilon(t)\,(0.5 + X), \qquad t = 0,1,$$
where $X \mid T=0 \sim N(0,4)$, $X \mid T=1 \sim N(0,1)$, $\varepsilon(0) \sim t(1)$ (i.e. Cauchy) and $\varepsilon(1) \sim N(0,1)$.

[19] Other quantiles, sample sizes or estimands lead to exactly the same conclusions.
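The design above can be simulated directly. The following is a minimal sketch, assuming that $N(0,4)$ denotes a variance of 4 (standard deviation 2). The function name `draw_sample` and the seeds are hypothetical. The Cauchy draw for $\varepsilon(0)$ uses the inversion $\tan(\pi(U - 1/2))$.

```python
import math
import random

def draw_sample(n, treated, rng):
    """Draw n pairs (x, y) from the semiparametric Monte Carlo design (sketch).
    Assumes N(0, 4) means variance 4, i.e. standard deviation 2."""
    sd_x = 1.0 if treated else 2.0
    sample = []
    for _ in range(n):
        x = rng.gauss(0.0, sd_x)
        if treated:
            eps = rng.gauss(0.0, 1.0)  # eps(1) ~ N(0, 1)
        else:
            # eps(0) ~ t(1), i.e. standard Cauchy, drawn by inversion
            eps = math.tan(math.pi * (rng.random() - 0.5))
        y = 5 + x + 4 * math.cos(x) + 2 * math.sin(3 * x) + eps * (0.5 + x)
        sample.append((x, y))
    return sample

rng = random.Random(2006)
controls = draw_sample(400, treated=False, rng=rng)       # n0 = 400
treated_units = draw_sample(400, treated=True, rng=rng)   # n1 = 400
```

The heavy Cauchy tails in the control group are what make the lower quantiles of $Y(0)$ hard to estimate in small samples.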
We consider 3 different quantiles (5%, 25% and 50%) and 4 different sample sizes $n_0 = n_1$: 100, 400, 1600 and 6400. The number of replications was set to 8000, 4000, 2000 and 1000, respectively. We use an Epanechnikov kernel and estimate 100 quantile regressions at each observation. Choosing a bandwidth for a semiparametric estimator is difficult because the bandwidth does not appear in the first-order approximation of the asymptotic distribution. Here, it is even harder because we must choose not one but a large number of bandwidths: one for each quantile regression. We make the simplifying assumptions of Yu and Jones (1998). These imply that the optimal bandwidth [20] for one quantile can be derived from the optimal bandwidth for another quantile, so that we are left with the choice of a single bandwidth. We set the bandwidth of the median regression, quite arbitrarily, to $n^{-1/4}\,\mathrm{sd}(X_i)$.

[20] This is the optimal bandwidth for the nonparametric first-step estimator and is therefore generally not the optimal bandwidth for the second-step estimator. However, we can hope that it is a sensible bandwidth once we have corrected for the convergence rate. In any case, the asymptotic properties remain valid without the optimal bandwidth.

Table 1.4 reports the bias, the standard errors, the mean squared error (MSE), the skewness and the kurtosis of $\hat q_0^S$, $\hat q_1^S$ and $\widehat{QTET}^S$. The relative MSE of the sample quantile is also given for $\hat q_0^S$ and $\hat q_1^S$. The Monte Carlo simulations confirm the consistency, the convergence rate and the asymptotic normality of the estimates. More observations are needed, however, when the error terms are Cauchy distributed than when they are normally distributed. The relative MSE of the sample quantiles converges to 1, as predicted by the asymptotic results. Once again we note a difference between $\hat q_0^S$ and $\hat q_1^S$. In finite samples, the QNQR tends to have a higher MSE than the sample quantiles in the presence of Cauchy disturbances and a smaller MSE in the presence of normal disturbances. Table 1.5 evaluates the analytic variance estimator using the same criteria as Table 1.2. Bootstrapping the results was not possible because of the computation time. The analytical standard errors tend to be close to the observed standard errors and are fairly precise. With at least 400 observations, the empirical sizes are close to the nominal ones. These results lead us to conclude that the proposed procedures constitute a complete system for estimating QTEs and for making consistent inference.
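The criteria of Tables 1.2 and 1.5 are simple to compute once the Monte Carlo output is stored. The sketch below assumes that each replication supplies a point estimate and an estimated standard error. The function names and the simulated inputs are hypothetical, and the Wald test is the two-sided test of a scalar null against a $\chi^2(1)$ critical value.

```python
import random
from statistics import NormalDist, median

def empirical_size(estimates, std_errors, true_value, level):
    """Rejection frequency of the Wald test of H0: parameter = true_value.
    The statistic ((est - true) / se)^2 is compared with the (1 - level)
    quantile of a chi-squared(1), i.e. the squared normal quantile."""
    crit = NormalDist().inv_cdf(1.0 - level / 2.0) ** 2
    rejections = sum(1 for est, se in zip(estimates, std_errors)
                     if ((est - true_value) / se) ** 2 > crit)
    return rejections / len(estimates)

def se_precision(std_errors, true_sd):
    """Median bias and median absolute deviation (MAD) of the estimated
    standard errors around the empirical ('true') standard deviation."""
    dev = [se - true_sd for se in std_errors]
    return median(dev), median(abs(d) for d in dev)

# Hypothetical replications: normal estimates whose standard errors are
# estimated with some noise, so sizes should be close to the nominal levels.
rng = random.Random(42)
true_value, true_sd = 0.0, 0.17
estimates = [rng.gauss(true_value, true_sd) for _ in range(20000)]
std_errors = [true_sd * rng.uniform(0.9, 1.1) for _ in estimates]
for level in (0.01, 0.05, 0.10):
    print(level, empirical_size(estimates, std_errors, true_value, level))
print(se_precision(std_errors, true_sd))
```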
1.6 Applications: black-white wage differentials
As explained in the introduction and in Section 1.5.2, several estimators similar to the QQR have already been applied in different contexts. Examples include Gosling, Machin and Meghir (2000), Albrecht, Björklund and Vroman (2003), Machado and Mata (2005), and Autor, Katz and Kearney (2005a and 2005b). In this section, we use another application to show how the estimation of QTEs complements the estimation of the ATE. We also show how the semiparametric estimator allows us to relax overly restrictive assumptions. Racial differentials in labor market outcomes are persistent. Although earnings appeared to converge during most of the postwar period, the black-white wage gap has stagnated over the last two decades. We complement the traditional decomposition of the racial wage gap (see Altonji and Blank, 1999, for a survey) by considering the wage gap at different points of the distribution, which allows us to answer different questions about the racial wage gap. In particular, we can test hypotheses such as the presence of a glass ceiling or of sticky floors. Usually, the literature has identified the existence of a glass ceiling when the pay gap is significantly larger at the top of the distribution. Arulampalam, Booth and Bryan (2005) identify a sticky floor when the wage gap is significantly larger at the bottom of the wage distribution. Both hypotheses have been put forward as explanations for the black-white wage gap. Some scholars have argued that blacks have become increasingly divided into two economic worlds. On one side is an emerging black middle class that is joining the white middle class; on the other is an excluded black underclass, left out of the white economic world. This sticky floor hypothesis
should appear in our results as a black wage gap that decreases in absolute value as we move up the wage distribution. Alternatively, black employees may be discriminated against in promotion, that is, they may have a lower probability of being promoted to jobs with higher responsibilities even when they have the same ability distribution as white employees. In that case we should observe a glass ceiling pattern, i.e. a larger racial wage gap at the top of the distribution. We use data from the Merged Outgoing Rotation Groups of the Current Population Survey for the year 2001. We restrict the sample to men between 16 and 65 years old. To simplify the analysis, we simply multiply the censored (top-coded) observations by 1.33. This has virtually no effect on the results since less than 1% of the observations are censored. An alternative would be to estimate censored quantile regressions. We consider the differences in log wages between white and black workers and define $T_i = 0$ for white and $T_i = 1$ for black workers. Descriptive statistics for the variables of interest are given in Table 1.6. The covariates consist of education, potential experience and three regional dummies (South is the excluded category). The means of the relevant variables show that black workers are less educated, slightly more experienced and concentrated in the South.

1.6.1 Parametric estimator
Figure 1.3 plots the decomposition (6) of the black wage gap with a 95% confidence interval obtained with the analytical variance estimator of Section 1.3.3. The estimated total differential shows that the black wage gap is larger in absolute value at the high end of the distribution than at the low end. This could be interpreted as an indicator of a glass ceiling. However, it could also arise from different distributions of characteristics for white and black workers. In fact, after correcting for the effects of characteristics, we find that the black wage gap first increases and then remains stable from the 30th percentile to the top of the distribution. We cannot really interpret this pattern as a glass ceiling effect, because we would then expect the racial gap to increase particularly at the high end of the distribution. Thus, neither of the two hypotheses (glass ceiling and sticky floor) is supported, and we observe a lower
racial wage gap at the low end of the distribution. We see two possible explanations for this pattern. Discrimination is probably more difficult to justify [21] for very basic jobs, where all employees perform the same task. Customer discrimination may also be less relevant for some low-paid jobs, which are occupied predominantly by black workers. The consistency of this decomposition depends crucially on the parametric assumption. A simple test of the functional form compares the sample quantiles with the quantiles implied by the linear quantile regression model. Figure 1.4 plots the differences between both estimates for white [22] workers with a 95% bootstrap confidence interval. The model is clearly misspecified: the implied quantiles are too high in the tails of the distribution and too low in the middle. For the majority of quantiles, the differences are significantly different from 0. To relax the parametric assumption, we now estimate the first step nonparametrically.

1.6.2 Semiparametric estimator
Since there are only 11 different values for education and 4 different regions, we can use exact nonparametric matching on these variables and need to smooth only over experience. In this dimension, we use the same kernel and bandwidths as in Section 1.5.3. Using Figure 1.4, we can now check whether the quantiles implied by the model and the raw quantiles are similar. The differences are now flat rather than U-shaped, as they were with the parametric first step. Therefore, we trust these results more than those of Section 1.6.1. The decomposition plotted in Figure 1.5 does not contradict the above interpretation. The analytically estimated standard errors are higher and the estimates are less smooth, but the main message remains unchanged. The different distribution of characteristics explains about one third of the level of the wage gap and a large part of the glass ceiling pattern. Neither a glass ceiling nor a sticky floor phenomenon can be observed, but racial wage discrimination is lower at the bottom of the distribution.

[21] In order to avoid a lawsuit.
[22] The differences are not significantly different from 0 for black workers, but the sample size is much lower.
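The first step of Section 1.6.2 combines exact matching on the discrete covariates with kernel smoothing over experience. The sketch below illustrates that idea under simplifying assumptions that are ours, not the chapter's:

- a local-constant (kernel-weighted) conditional quantile replaces local polynomial quantile regression;
- the kernel is Epanechnikov;
- the median bandwidth is $n^{-1/4}\,\mathrm{sd}(X)$, as in Section 1.5.3;
- other quantiles use the rule-of-thumb rescaling commonly attributed to Yu and Jones (1998), $h_\tau = h_{0.5}\left[\tau(1-\tau)\phi(\Phi^{-1}(\tau))^{-2} / \left(0.25\,\phi(0)^{-2}\right)\right]^{1/5}$.

All function names and the toy data are hypothetical.

```python
from statistics import NormalDist, stdev

def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def median_bandwidth(exper):
    """Bandwidth of the median regression: n^(-1/4) * sd(X)."""
    return len(exper) ** -0.25 * stdev(exper)

def quantile_bandwidth(h_median, tau):
    """Assumed Yu-Jones-type rescaling of the median bandwidth to quantile tau."""
    nd = NormalDist()
    def factor(t):
        return t * (1.0 - t) / nd.pdf(nd.inv_cdf(t)) ** 2
    return h_median * (factor(tau) / factor(0.5)) ** 0.2

def local_quantile(data, cell, exper, tau, h):
    """Kernel-weighted tau-quantile of log wages among observations that match
    `cell` (education, region) exactly and are close in experience.
    `data` holds hypothetical (cell, experience, log_wage) triples."""
    pts = sorted((y, epanechnikov((e - exper) / h)) for c, e, y in data if c == cell)
    pts = [(y, w) for y, w in pts if w > 0.0]
    total = sum(w for _, w in pts)
    if total == 0.0:
        return None  # no support in this cell near `exper`
    cum = 0.0
    for y, w in pts:
        cum += w
        if cum >= tau * total:
            return y
    return pts[-1][0]

data = [(("HS", "South"), 10.0, 2.0), (("HS", "South"), 11.0, 3.0),
        (("HS", "South"), 30.0, 9.0), (("BA", "West"), 10.0, 5.0)]
h = quantile_bandwidth(2.0, 0.5)  # equals 2.0 at the median
print(local_quantile(data, ("HS", "South"), 10.5, 0.5, h))  # 2.0
```

The rescaling widens the bandwidth in the tails, where fewer observations carry information about the conditional quantile.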
1.7 Conclusion
This chapter proposes and implements parametric and semiparametric procedures to estimate unconditional distributions in the presence of covariates. This allows us to estimate counterfactual distributions and quantile treatment effects. The estimators are based on the estimation of the conditional distribution by parametric or nonparametric quantile regression. The first-step estimates are then integrated over the distribution of the covariates in order to obtain the unconditional distribution.
We show $\sqrt{n}$ consistency and asymptotic normality of both estimators and provide analytical procedures to estimate their variances. We also show that the parametric estimator of unconditional distributions is more precise than the sample quantile [23]. The semiparametric estimator of quantile treatment effects achieves the efficiency bound. Monte Carlo simulations show that the asymptotic results are useful approximations in medium sample sizes. We apply the proposed estimators to decompose the black-white earnings gap and find no glass ceiling effect for blacks.

The estimators proposed in this chapter rely on the unconfoundedness assumption. Three types of estimators have been proposed for quantile or average treatment effects under this assumption: regression estimators, matching estimators (in the narrow sense) and estimators using the propensity score. Our estimators are clearly of the first type since we estimate the conditional distribution function by quantile regression. If fully nonparametric procedures are used, all approaches yield numerically identical results. In applications, however, a fully nonparametric approach is often not feasible, and the different restrictions will affect the estimates differently. The more parametric the specification, the more the choice of approach matters. If the sample size is too small or the number of covariates too high, the two tractable competitors are propensity score matching and the QQR. Propensity score matching estimators assume that $p(X)$ follows a parametric model. The QQR instead assumes that we know how $Y$ depends on $X$ up to a finite number of parameters. Which of these assumptions is more likely to be satisfied depends on the application in question. We prefer the second type of assumption because it is often easier to interpret [24] and because no distributional assumption is necessary [25].

New directions of research naturally arise from this chapter. The efficiency of the parametric estimator can certainly be improved by using weighted quantile regression (Zhao 1999). It would be interesting to investigate whether this weighted estimator attains an efficiency bound. A method to choose the bandwidths is the most urgent development needed to fully specify the estimator using nonparametric quantile regression. The optimal choice of smoothing parameters arises in the implementation of many semiparametric estimators proposed during the last decade, and no fully satisfactory solution has been developed so far. An additional problem, specific to the proposed estimator, is that the optimal bandwidth probably depends on the quantile of the regression. A huge number of different bandwidths must therefore be chosen. The computational burden may simply be too high for a large range of methods, and simplifying assumptions, such as the ones used in Section 1.5.3, may be unavoidable.

[23] Naturally, the sample quantiles can only be used to estimate observed distributions.
[24] The coefficients have a natural interpretation as rates of return to human capital characteristics. Theoretical models can help to choose the parametric specification.
[25] Probit or logit estimators are consistent only if the latent error term is normally or logistically distributed, respectively.
Appendix A: Proof of theorem 1

Theorems 1 and 2 are applications of the results of Chen, Linton and Van Keilegom (2003, CLV hereafter). CLV extend the results of Newey (1994) and Andrews (1994) to non-smooth objective functions, allowing for a nonparametric first-step estimation. We follow the notation of CLV as closely as possible. We must, however, replace their $\theta$ by $q$, since $\theta$ already denotes the quantile of interest in the quantile regression framework. We derive the asymptotic distribution of $\hat q_c(\theta)$; the other results can be derived similarly. Define
$$Z_i = (Y_i, X_i, T_i),$$
$$m\left(X_i, q, \beta(\cdot)\right) = \theta - \int_0^1 1\left(X_i'\beta(\tau) - q \le 0\right)d\tau,$$
$$M\left(q, \beta(\cdot)\right) = E\left[m\left(X, q, \beta(\cdot)\right) \,\middle|\, T=1\right], \qquad M_n\left(q, \beta(\cdot)\right) = n_1^{-1}\sum_{i:T_i=1} m\left(X_i, q, \beta(\cdot)\right).$$
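Numerically, the integral in $m(\cdot)$ is the share of quantile levels $\tau$ at which the conditional quantile $X_i'\beta(\tau)$ lies below $q$. For a monotone quantile function this share equals $F_{Y(0)}(q \mid X_i)$, which is the identity used repeatedly below. The minimal sketch uses a midpoint grid over $\tau$. The quantile function `cond_quantile` is a hypothetical normal location-scale example, not part of the chapter.

```python
from statistics import NormalDist

def cond_quantile(x, tau):
    """Hypothetical conditional quantile x'beta(tau) of a normal
    location-scale model: Y | X = x ~ N(1 + 2x, (1 + 0.5x)^2), x >= 0."""
    return 1.0 + 2.0 * x + (1.0 + 0.5 * x) * NormalDist().inv_cdf(tau)

def integrated_indicator(x, q, grid_size=2000):
    """Midpoint-rule approximation of int_0^1 1(x'beta(tau) <= q) dtau."""
    hits = sum(cond_quantile(x, (k + 0.5) / grid_size) <= q for k in range(grid_size))
    return hits / grid_size

def sample_moment(xs, q, theta, grid_size=2000):
    """M_n(q, beta) = theta - n1^{-1} sum_i int_0^1 1(X_i'beta(tau) <= q) dtau."""
    return theta - sum(integrated_indicator(x, q, grid_size) for x in xs) / len(xs)

# The integral recovers the conditional distribution function F(q | x):
print(integrated_indicator(1.0, 4.0), NormalDist(3.0, 1.5).cdf(4.0))
```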
The moment condition $M$ is satisfied since, at the true parameters $\beta_0(\cdot)$ and $q_c$,
$$M\left(q_c, \beta_0(\cdot)\right) = E\left[\theta - \int_0^1 1\left(X'\beta_0(\tau) - q_c \le 0\right)d\tau \,\middle|\, T=1\right] = \theta - E\left[\Pr\left(Y(0) \le q_c \mid X\right) \,\middle|\, T=1\right] = \theta - E\left[F_{Y(0)}(q_c \mid X) \,\middle|\, T=1\right] = \theta - F_{Y(0)}(q_c \mid T=1) = 0.$$
The asymptotic distribution of the first-step parametric quantile regression process has been derived by Gutenbrunner and Jureckova (1992):
$$\sqrt{n}\left(\hat\beta_0(\tau) - \beta_0(\tau)\right) \Rightarrow b(\tau),$$
where $b(\cdot)$ is a mean zero Gaussian process with covariance function
$$\mathrm{cov}\left(b(\tau), b(\tau')\right) = \left(\min(\tau,\tau') - \tau\tau'\right) D_{1t}(\tau)^{-1} D_{0t} D_{1t}(\tau')^{-1}. \qquad (12)$$
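The scalar factor $\min(\tau,\tau') - \tau\tau'$ in (12) is the covariance of the indicators $1(U \le \tau)$ and $1(U \le \tau')$ for $U \sim U[0,1]$, i.e. the covariance kernel of a Brownian bridge. A quick Monte Carlo check of that factor follows. It is a sketch only and does not reproduce the matrix part $D_{1t}(\tau)^{-1} D_{0t} D_{1t}(\tau')^{-1}$.

```python
import random

def indicator_cov(tau1, tau2, draws=200000, seed=0):
    """Monte Carlo covariance of 1(U <= tau1) and 1(U <= tau2), U ~ U[0, 1]."""
    rng = random.Random(seed)
    s1 = s2 = s12 = 0
    for _ in range(draws):
        u = rng.random()
        a, b = u <= tau1, u <= tau2
        s1 += a
        s2 += b
        s12 += a and b  # 1 only if both indicators are 1
    p1, p2 = s1 / draws, s2 / draws
    return s12 / draws - p1 * p2

for t1, t2 in [(0.25, 0.5), (0.5, 0.5), (0.1, 0.9)]:
    print(t1, t2, round(indicator_cov(t1, t2), 4), min(t1, t2) - t1 * t2)
```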
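The consistency of $\hat q_c$ is straightforward to show and follows from the consistency of the quantile regression coefficients. We therefore concentrate on asymptotic normality and examine the 6 conditions of theorem 2 in CLV.

Condition (2.1): $\hat q_c$ is, by definition, the $\theta$th quantile of the sample $\left\{\left\{X_i'\hat\beta_0(\tau_j)\right\}_{j=1}^J\right\}_{i:T_i=1}$, where each “observation” is weighted by $\tau_j - \tau_{j-1}$. Koenker and Bassett (1978) show that quantiles can also be defined as solutions to optimization problems. In particular, $\hat q_c$ solves
$$\hat q_c = \arg\min_q\, n_1^{-1}\sum_{i:T_i=1}\sum_{j=1}^J\left(\tau_j - \tau_{j-1}\right)\rho_\theta\left(X_i'\hat\beta_0(\tau_j) - q\right) = \arg\min_q\, n_1^{-1}\sum_{i:T_i=1}\int_0^1 \rho_\theta\left(X_i'\hat\beta_0(\tau) - q\right)d\tau,$$
where $\rho_\theta(u) = u\left(\theta - 1(u < 0)\right)$ is the check function. Up to sign, $M_n(q, \hat\beta_0)$ is the (sub)derivative of this objective function with respect to $q$. Koenker and Bassett (1978) show in theorem 3.3 that $M_n(\hat q_c, \hat\beta_0) = o(n^{-1/2})$; condition (2.1) is therefore satisfied.

Condition (2.2):
$$\Gamma_1 = \frac{\partial M\left(q_c, \beta_0(\cdot)\right)}{\partial q} = \frac{\partial}{\partial q} E\left[\theta - \int_0^1 1\left(X'\beta_0(\tau) - q \le 0\right)d\tau \,\middle|\, T=1\right]\Bigg|_{q=q_c} = -\frac{\partial E\left[F_{Y(0)}(q \mid X) \,\middle|\, T=1\right]}{\partial q}\Bigg|_{q=q_c} = -\frac{\partial F_{Y(0)}(q \mid T=1)}{\partial q}\Bigg|_{q=q_c} = -f_{Y(0)}(q_c \mid T=1).$$
By assumption P.vii., $f_{Y(0)}(q_c \mid T=1)$ is continuous and nonzero, so conditions 2 (i) and (ii) of CLV are satisfied.

Condition (2.3): $\Gamma_2\left(q, \beta_0(\cdot)\right)\left[\hat\beta_0(\cdot) - \beta_0(\cdot)\right]$, the pathwise derivative of $M\left(q, \beta_0(\cdot)\right)$, exists and
$$\Gamma_2\left(q, \beta_0(\cdot)\right)\left[\hat\beta_0(\cdot) - \beta_0(\cdot)\right] = E\left[f_{Y(0)}(q \mid X)\,X'\left(\hat\beta_0\left(F_{Y(0)}(q \mid X)\right) - \beta_0\left(F_{Y(0)}(q \mid X)\right)\right) \,\middle|\, T=1\right]$$
since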
$$\frac{\partial m\left(X_i, q, \beta_0(\cdot)\right)}{\partial \beta(\tau_k)} = \frac{\partial}{\partial\beta(\tau_k)}\left[\theta - \int_0^1 1\left(X_i'\beta_0(\tau) - q \le 0\right)d\tau\right] = 0 \quad \text{if } \tau_k \ne F_{Y(0)}(q \mid X_i)$$
and
$$\frac{\partial m\left(X_i, q, \beta_0(\cdot)\right)}{\partial \beta(\tau_k)} = -\frac{\partial}{\partial\beta(\tau_k)}\int_0^1 1\left(X_i'\beta_0(\tau) - q \le 0\right)d\tau = f_{Y(0)}(q \mid X_i)\,X_i \quad \text{if } \tau_k = F_{Y(0)}(q \mid X_i).$$
The last equality follows from $\int_0^1 1\left(X_i'\beta_0(\tau) \le q\right)d\tau = F_{Y(0)}(q \mid X_i)$. Now, by assumption P.iv., $f_{Y(0)}(q_c \mid X)$ is continuous and its derivative $f_{Y(0)}'(q_c \mid X)$ is bounded. Moreover, by assumption P.vii., $X_i$ is bounded. Thus condition 3 (i) is satisfied with $c = \sup\left|f_{Y(0)}'(q_c \mid X)\,X\right|$, and condition 3 (ii) is satisfied for the same reasons.

Condition (2.4): The first-step estimator is a parametric, $\sqrt{n}$-consistent estimator and thus satisfies condition (2.4).

Condition (2.5): We verify condition (2.5) by applying theorem 3 of CLV. By definition,
$$\left|m\left(X_i, q', \beta'(\cdot)\right) - m\left(X_i, q, \beta(\cdot)\right)\right|^2 \le \int_0^1 \left|1\left(X_i'\beta'(\tau) \le q'\right) - 1\left(X_i'\beta(\tau) \le q\right)\right|d\tau$$
$$\le \int_0^1 \left|1\left(X_i'\beta'(\tau) \le q'\right) - 1\left(X_i'\beta(\tau) \le q'\right)\right|d\tau + \int_0^1 \left|1\left(X_i'\beta(\tau) \le q'\right) - 1\left(X_i'\beta(\tau) \le q\right)\right|d\tau.$$
We consider only the last term on the right-hand side. The other term can be treated similarly, using the fact that $X_i$ is bounded by assumption P.vii.:
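$$E\left[\sup_{|q'-q|\le\delta}\int_0^1\left|1\left(X'\beta(\tau) \le q'\right) - 1\left(X'\beta(\tau) \le q\right)\right|d\tau\right] \le E\left[\int_0^1\left(1\left(X'\beta(\tau) \le q+\delta\right) - 1\left(X'\beta(\tau) \le q-\delta\right)\right)d\tau\right]$$
$$\le E\left[F_{Y(0)}(q+\delta \mid X) - F_{Y(0)}(q-\delta \mid X)\right] \le K\delta$$
for some $K < \infty$. The last inequality follows from the assumption that $\sup_{u,X} f_{Y(0)}(u \mid X) < \infty$. Hence condition 3.2 is satisfied with $r = 2$ and $s = 1/2$, and condition 3.3 holds by remark 3(ii) in CLV.

Condition (2.6): We now verify condition (2.6'), which implies condition (2.6). Condition (2.6') (i) is trivially satisfied: $M_n\left(q_c, \beta_0(\cdot)\right) = n_1^{-1}\sum_{i:T_i=1} m\left(X_i, q_c, \beta_0(\cdot)\right)$, $E\left[m\left(X, q_c, \beta_0(\cdot)\right) \,\middle|\, T=1\right] = 0$ (shown above) and
$$\mathrm{var}\left(m\left(X, q_c, \beta_0(\cdot)\right) \,\middle|\, T=1\right) = E\left[\left(\theta - F_{Y(0)}(q_c \mid X)\right)^2 \,\middle|\, T=1\right] \le 1.$$
To verify condition (2.6') (ii), recall that
$$\Gamma_2\left(q_c, \beta_0(\cdot)\right)\left[\hat\beta_0(\cdot) - \beta_0(\cdot)\right] = E\left[f_{Y(0)}(q_c \mid X)\,X'\left(\hat\beta_0\left(F_{Y(0)}(q_c \mid X)\right) - \beta_0\left(F_{Y(0)}(q_c \mid X)\right)\right) \,\middle|\, T=1\right].$$
We now substitute the Bahadur representation for $\hat\beta_0\left(F_{Y(0)}(q_c \mid X)\right) - \beta_0\left(F_{Y(0)}(q_c \mid X)\right)$, interchange integral and summation, and approximate. This gives
$$\Gamma_2\left(q_c, \beta_0(\cdot)\right)\left[\hat\beta_0(\cdot) - \beta_0(\cdot)\right] = n_1^{-1}\sum_{i:T_i=1} f_{Y(0)}(q_c \mid X_i)\,X_i'\left(E\left[f_{Y(0)}\left(F_{Y(0)}^{-1}\left(F_{Y(0)}(q_c \mid X_i) \mid X\right) \,\middle|\, X\right)XX' \,\middle|\, T=0\right]\right)^{-1} n_0^{-1}\sum_{j:T_j=0}\omega_{ij} + o_p(n^{-0.5})$$
$$= n_1^{-1}\sum_{i:T_i=1}\psi(Z_i) + o_p(n^{-0.5}),$$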
where $\omega_{ij} = X_j\left(F_{Y(0)}(q_c \mid X_i) - 1\left(Y_j \le X_j'\beta_0\left(F_{Y(0)}(q_c \mid X_i)\right)\right)\right)$. The normalized sum $n_0^{-0.5}\sum_{j:T_j=0}\omega_{ij}$ is asymptotically normally distributed with mean 0 and variance $E\left[F_{Y(0)}(q_c \mid X_i)\left(1 - F_{Y(0)}(q_c \mid X_i)\right)XX' \,\middle|\, T=0\right]$. Therefore, $E\left(\psi(Z_i)\right) = 0$ and $\mathrm{Var}\left(\psi(Z_i)\right) < \infty$ by assumptions P.iv and P.vii. We can now derive
$$V_1 \equiv E\left[\left(m(X_i, q_c, \beta_0) + \psi(Z_i)\right)\left(m(X_i, q_c, \beta_0) + \psi(Z_i)\right)'\right].$$
Note first that $m(X_i, q_c, \beta_0)$ and $\psi(Z_i)$ are uncorrelated. We have already derived
$$\mathrm{var}\left(m\left(X_i, q_c, \beta_0(\cdot)\right) \,\middle|\, T=1\right) = E\left[\left(\theta - F_{Y(0)}(q_c \mid X_i)\right)^2 \,\middle|\, T=1\right].$$
Then, using the notation introduced in (12), we find that $\mathrm{var}\left(\psi(Z_i)\right)$ is equal to
$$\iint f_{Y(0)}(q_c \mid x)\,f_{Y(0)}(q_c \mid z)\,x'\,\mathrm{cov}\left(\hat\beta_0\left(F_{Y(0)}(q_c \mid x)\right), \hat\beta_0\left(F_{Y(0)}(q_c \mid z)\right)\right)z\,dF_X(x \mid T=1)\,dF_X(z \mid T=1)$$
or, integrating over the quantiles instead of over the distribution of $X$,
$$\int_0^1\int_0^1 E\left[f_{Y(0)}(q_c \mid X)\,X' \,\middle|\, F_{Y(0)}(q_c \mid X) = \tau\right]\mathrm{cov}\left(\hat\beta_0(\tau), \hat\beta_0(\tau')\right)E\left[f_{Y(0)}(q_c \mid X)\,X \,\middle|\, F_{Y(0)}(q_c \mid X) = \tau'\right]d\tau\,d\tau'.$$
Since all conditions of theorem 2 of CLV are satisfied, we apply this theorem and obtain (7). All other results of our theorem 1 can be derived similarly.
Appendix B: Proof of theorem 2

As for the parametric estimator, we derive only the asymptotic distribution of the estimator of the counterfactual quantile $\hat q_c^S$. The structure of the proof is basically the same as for the estimator with a parametric first step, so we discuss only the differences. Define
$$m^N\left(X_i, q, \beta(\cdot, X_i)\right) = \theta - \int_0^1 1\left(X_i\beta(\tau, X_i) - q \le 0\right)d\tau.$$
Conditions (2.1), (2.2), (2.3) and (2.5) can be verified in the same way as for the parametric estimator, replacing $\beta(\tau)$ by $\beta(\tau, X_i)$. Assumptions S.iv., S.v. and S.vi. ensure that the bias of the nonparametric quantile regression goes to zero faster than $n^{-1/2}$. They also ensure that the convergence rate of $\hat\beta_0(\tau, X_i)$ is at least of order $n^{-1/4}$, so that condition (2.4) is satisfied. To verify condition (2.6') (ii), we use the Bahadur representation for $X_i\left(\hat\beta_0(\tau, X_i) - \beta_0(\tau, X_i)\right)$ derived by Chaudhuri (1991):
$$X_i\left(\hat\beta_0(\tau, X_i) - \beta_0(\tau, X_i)\right) = \frac{1}{n_0h_n}\sum_{j:T_j=0}\frac{\tau - 1\left(Y_j \le F_{Y(0)}^{-1}(\tau \mid X_i)\right)}{f_X(X_i \mid T=0)\,f_{Y(0)}\left(F_{Y(0)}^{-1}(\tau \mid X_i) \mid X_i\right)}K\left(\frac{X_j - X_i}{h_n}\right) + o_p(n^{-0.5}).$$
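We then obtain the following representation:
$$\Gamma_2\left(q_c, \beta_0(\cdot)\right)\left[\hat\beta_0(\cdot) - \beta_0(\cdot)\right] = E\left[f_{Y(0)}(q_c \mid X)\left\{X\hat\beta_0\left(F_{Y(0)}(q_c \mid X), X\right) - X\beta_0\left(F_{Y(0)}(q_c \mid X), X\right)\right\} \,\middle|\, T=1\right]$$
$$= n_1^{-1}\sum_{i:T_i=1}\frac{f_{Y(0)}(q_c \mid X_i)}{n_0h_n\,f_X(X_i \mid T=0)\,f_{Y(0)}(q_c \mid X_i)}\sum_{j:T_j=0}\omega_{ij}^S\,K\left(\frac{X_j - X_i}{h_n}\right) + o_p(n^{-0.5})$$
$$= n_1^{-1}\sum_{i:T_i=1}\frac{1}{n_0h_n\,f_X(X_i \mid T=0)}\sum_{j:T_j=0}K\left(\frac{X_j - X_i}{h_n}\right)\omega_{ij}^S + o_p(n^{-0.5}) = n_1^{-1}\sum_{i:T_i=1}\psi^S(Z_i) + o_p(n^{-0.5}),$$
where $\omega_{ij}^S = F_{Y(0)}(q_c \mid X_i) - 1\left(Y_j \le X_j\beta_0\left(F_{Y(0)}(q_c \mid X_i), X_i\right)\right)$. The densities cancel because $f_{Y(0)}\left(F_{Y(0)}^{-1}\left(F_{Y(0)}(q_c \mid X_i) \mid X_i\right) \mid X_i\right) = f_{Y(0)}(q_c \mid X_i)$. Given $X_i$, $\omega_{ij}^S$ has (asymptotically) mean zero and variance $F_{Y(0)}(q_c \mid X_i)\left(1 - F_{Y(0)}(q_c \mid X_i)\right)$. Therefore, $E\left(\psi^S(Z_i)\right) = 0$ and
$$\mathrm{var}\left(\psi^S(Z_i)\right) = E\left[F_{Y(0)}(q_c \mid X)\left(1 - F_{Y(0)}(q_c \mid X)\right)\frac{f_X(X \mid T=1)}{f_X(X \mid T=0)} \,\middle|\, T=1\right],$$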
which is finite by the common support assumption. Thus, we can apply theorem 2 of CLV and obtain (11). The other results of our theorem 2 can be proven similarly. For instance, using the same procedure as for $\hat q_c^S$, we obtain
$$\sqrt{n_1}\left(\hat q_1^S - q_1\right) \xrightarrow{d} N\left(0,\; \frac{E\left[\left(\theta - F_{Y(1)}(q_1 \mid X)\right)^2 \,\middle|\, T=1\right] + E\left[F_{Y(1)}(q_1 \mid X)\left(1 - F_{Y(1)}(q_1 \mid X)\right) \,\middle|\, T=1\right]}{f_{Y(1)}(q_1 \mid T=1)^2}\right).$$
By the law of total variance, this is the asymptotic variance of the sample $\theta$th quantile of $Y \mid T=1$.
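The law-of-total-variance step can be written out explicitly. Let $F(X) = F_{Y(1)}(q_1 \mid X)$, condition on $T=1$ throughout and use $E\left[F(X) \mid T=1\right] = F_{Y(1)}(q_1 \mid T=1) = \theta$:
$$\theta(1-\theta) = \mathrm{var}\left(1(Y \le q_1) \,\middle|\, T=1\right) = E\left[\mathrm{var}\left(1(Y \le q_1) \mid X\right) \,\middle|\, T=1\right] + \mathrm{var}\left(E\left[1(Y \le q_1) \mid X\right] \,\middle|\, T=1\right)$$
$$= E\left[F(X)\left(1 - F(X)\right) \,\middle|\, T=1\right] + E\left[\left(\theta - F(X)\right)^2 \,\middle|\, T=1\right].$$
The numerator of the asymptotic variance above is therefore $\theta(1-\theta)$, and the variance reduces to $\theta(1-\theta)/f_{Y(1)}(q_1 \mid T=1)^2$, the familiar asymptotic variance of a sample quantile.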
Appendix C: Efficiency bounds

Firpo (2006) derives the efficiency bounds for the QTE and the QTET assuming unconfoundedness, overlap and uniqueness of quantiles. His notation is almost totally different from ours. Nevertheless, we show in this appendix that the asymptotic variances of the proposed estimators $\widehat{QTET}$ and $\widehat{QTE}$ are equal to the efficiency bounds. First, the efficiency bound for the QTE is given by Firpo (2006):
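$$E\left[\frac{\mathrm{var}\left(g_{1,\theta}(Y) \mid X, T=1\right)}{p(X)} + \frac{\mathrm{var}\left(g_{0,\theta}(Y) \mid X, T=0\right)}{1 - p(X)} + \left(E\left[g_{1,\theta}(Y) \mid X, T=1\right] - E\left[g_{0,\theta}(Y) \mid X, T=0\right]\right)^2\right],$$
where
$$g_{j,\theta}(Y) = -\frac{1\left(Y \le q_{j,\theta}\right) - \theta}{f_{Y(j)}(q_{j,\theta})}, \qquad q_{j,\theta} = F_{Y(j)}^{-1}(\theta) \quad \text{for } j = 0,1, \qquad p(X) = \frac{f_X(X \mid T=1)}{f_X(X)}\,p.$$
Note that
$$\mathrm{var}\left(g_{1,\theta}(Y) \mid X, T=1\right) = \frac{F_{Y(1)}(q_{1,\theta} \mid X)\left(1 - F_{Y(1)}(q_{1,\theta} \mid X)\right)}{f_{Y(1)}(q_{1,\theta})^2}.$$
Thus,
$$E\left[\frac{\mathrm{var}\left(g_{1,\theta}(Y) \mid X, T=1\right)}{p(X)}\right] = \int\frac{F_{Y(1)}(q_{1,\theta} \mid x)\left(1 - F_{Y(1)}(q_{1,\theta} \mid x)\right)}{p(x)\,f_{Y(1)}(q_{1,\theta})^2}f_X(x)\,dx = \int\frac{F_{Y(1)}(q_{1,\theta} \mid x)\left(1 - F_{Y(1)}(q_{1,\theta} \mid x)\right)f_X(x)}{f_{Y(1)}(q_{1,\theta})^2\,f_X(x \mid T=1)\,p}f_X(x)\,dx$$
$$= \frac{E\left[F_{Y(1)}(q_{1,\theta} \mid X)\left(1 - F_{Y(1)}(q_{1,\theta} \mid X)\right)\dfrac{f_X(X)}{f_X(X \mid T=1)}\right]}{f_{Y(1)}(q_{1,\theta})^2\,p}.$$
Similar calculations show that
$$E\left[\frac{\mathrm{var}\left(g_{0,\theta}(Y) \mid X, T=0\right)}{1 - p(X)}\right] = \frac{E\left[F_{Y(0)}(q_{0,\theta} \mid X)\left(1 - F_{Y(0)}(q_{0,\theta} \mid X)\right)\dfrac{f_X(X)}{f_X(X \mid T=0)}\right]}{f_{Y(0)}(q_{0,\theta})^2\,(1 - p)}.$$
Finally, using unconfoundedness and $E\left[F_{Y(j)}(q_{j,\theta} \mid X)\right] = \theta$,
$$E\left[\left(E\left[g_{1,\theta}(Y) \mid X\right] - E\left[g_{0,\theta}(Y) \mid X\right]\right)^2\right] = E\left[E\left[g_{1,\theta}(Y) \mid X\right]^2 + E\left[g_{0,\theta}(Y) \mid X\right]^2 - 2E\left[g_{1,\theta}(Y) \mid X\right]E\left[g_{0,\theta}(Y) \mid X\right]\right]$$
$$= E\left[\frac{\left(F_{Y(1)}(q_{1,\theta} \mid X) - \theta\right)^2}{f_{Y(1)}(q_{1,\theta})^2} + \frac{\left(F_{Y(0)}(q_{0,\theta} \mid X) - \theta\right)^2}{f_{Y(0)}(q_{0,\theta})^2} - 2\frac{\left(F_{Y(1)}(q_{1,\theta} \mid X) - \theta\right)\left(F_{Y(0)}(q_{0,\theta} \mid X) - \theta\right)}{f_{Y(1)}(q_{1,\theta})\,f_{Y(0)}(q_{0,\theta})}\right]$$
$$= \frac{\mathrm{var}\left(F_{Y(1)}(q_{1,\theta} \mid X)\right)}{f_{Y(1)}(q_{1,\theta})^2} + \frac{\mathrm{var}\left(F_{Y(0)}(q_{0,\theta} \mid X)\right)}{f_{Y(0)}(q_{0,\theta})^2} - 2\frac{E\left[\left(F_{Y(1)}(q_{1,\theta} \mid X) - \theta\right)\left(F_{Y(0)}(q_{0,\theta} \mid X) - \theta\right)\right]}{f_{Y(1)}(q_{1,\theta})\,f_{Y(0)}(q_{0,\theta})},$$
which establishes the equality between the asymptotic variance of $\widehat{QTE}$ and the efficiency bound.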
Now, for the $\widehat{QTET}$, Firpo derives the following bound:
$$E\left[\frac{p(X)\,\mathrm{var}\left(g_{1,\theta|T=1}(Y) \mid X, T=1\right)}{p^2} + \frac{p(X)^2\,\mathrm{var}\left(g_{0,\theta|T=1}(Y) \mid X, T=0\right)}{p^2\left(1 - p(X)\right)} + \frac{p(X)\left(E\left[g_{1,\theta|T=1}(Y) \mid X, T=1\right] - E\left[g_{0,\theta|T=1}(Y) \mid X, T=0\right]\right)^2}{p^2}\right],$$
where, in our notation, $q_{1,\theta|T=1} = q_1$ and $q_{0,\theta|T=1} = q_c$. We use the same results as for the QTE and note that
$$p(X) = \frac{f_X(X \mid T=1)}{f_X(X)}\,p, \qquad 1 - p(X) = \frac{f_X(X \mid T=0)}{f_X(X)}\,(1 - p) \qquad \text{and} \qquad \frac{p(X)}{1 - p(X)} = \frac{f_X(X \mid T=1)\,p}{f_X(X \mid T=0)\,(1 - p)}.$$
This yields
$$E\left[\frac{p(X)\,\mathrm{var}\left(g_{1,\theta|T=1}(Y) \mid X, T=1\right)}{p^2}\right] = \int\frac{p(x)\,F_{Y(1)}(q_1 \mid x)\left(1 - F_{Y(1)}(q_1 \mid x)\right)}{p^2\,f_{Y(1)}(q_1 \mid T=1)^2}f_X(x)\,dx = \int\frac{F_{Y(1)}(q_1 \mid x)\left(1 - F_{Y(1)}(q_1 \mid x)\right)}{p\,f_{Y(1)}(q_1 \mid T=1)^2}f_X(x \mid T=1)\,dx$$
$$= \frac{E\left[F_{Y(1)}(q_1 \mid X)\left(1 - F_{Y(1)}(q_1 \mid X)\right) \,\middle|\, T=1\right]}{p\,f_{Y(1)}(q_1 \mid T=1)^2}$$
and
$$E\left[\frac{p(X)^2\,\mathrm{var}\left(g_{0,\theta|T=1}(Y) \mid X, T=0\right)}{p^2\left(1 - p(X)\right)}\right] = \int\frac{p(x)^2\,F_{Y(0)}(q_c \mid x)\left(1 - F_{Y(0)}(q_c \mid x)\right)}{p^2\left(1 - p(x)\right)f_{Y(0)}(q_c \mid T=1)^2}f_X(x)\,dx = \int\frac{F_{Y(0)}(q_c \mid x)\left(1 - F_{Y(0)}(q_c \mid x)\right)f_X(x \mid T=1)}{f_{Y(0)}(q_c \mid T=1)^2\,f_X(x \mid T=0)\,(1 - p)}f_X(x \mid T=1)\,dx$$
$$= \frac{E\left[F_{Y(0)}(q_c \mid X)\left(1 - F_{Y(0)}(q_c \mid X)\right)\dfrac{f_X(X \mid T=1)}{f_X(X \mid T=0)} \,\middle|\, T=1\right]}{(1 - p)\,f_{Y(0)}(q_c \mid T=1)^2}.$$
Similar calculations show that
$$E\left[\frac{p(X)\left(E\left[g_{1,\theta|T=1}(Y) \mid X, T=1\right] - E\left[g_{0,\theta|T=1}(Y) \mid X, T=0\right]\right)^2}{p^2}\right] = \frac{\mathrm{var}\left(F_{Y(0)}(q_c \mid X) \,\middle|\, T=1\right)}{p\,f_{Y(0)}(q_c \mid T=1)^2} + \frac{\mathrm{var}\left(F_{Y(1)}(q_1 \mid X) \,\middle|\, T=1\right)}{p\,f_{Y(1)}(q_1 \mid T=1)^2}$$
$$- 2\frac{E\left[\left(F_{Y(0)}(q_c \mid X) - \theta\right)\left(F_{Y(1)}(q_1 \mid X) - \theta\right) \,\middle|\, T=1\right]}{p\,f_{Y(0)}(q_c \mid T=1)\,f_{Y(1)}(q_1 \mid T=1)}.$$
Thus, the asymptotic variance of $\widehat{QTET}$ is the same as the efficiency bound.
Tables for chapter 1

Table 1.1: Monte Carlo simulation, parametric first step, point estimates.

| Sample size | Bias | St. dev. | MSE | Skew. | Kurt. | Relative MSE of the sample quantile |
|---|---|---|---|---|---|---|
| 5th percentile, control units, true value: -7.5032 | | | | | | |
| 100 | -2.0064 | 6.049 | 40.6123 | -2.8884 | 21.2894 | 0.5156 |
| 400 | -0.4751 | 2.1886 | 5.0148 | -0.967 | 5.2957 | 0.888 |
| 1600 | -0.1059 | 1.0357 | 1.0834 | -0.498 | 3.4822 | 1.0279 |
| 25th percentile, control units, true value: -0.0795 | | | | | | |
| 100 | -0.0413 | 0.411 | 0.1706 | -0.3001 | 3.3652 | 1.107 |
| 400 | -0.0099 | 0.1986 | 0.0395 | -0.1669 | 3.2084 | 1.1776 |
| 1600 | -0.0038 | 0.1004 | 0.0101 | -0.128 | 2.9951 | 1.154 |
| 50th percentile, control units, true value: 1.8887 | | | | | | |
| 100 | 0.0090 | 0.3273 | 0.1072 | 0.0661 | 3.0657 | 1.2014 |
| 400 | 0.0007 | 0.1640 | 0.0269 | 0.0085 | 2.9647 | 1.2251 |
| 1600 | 0.0003 | 0.0815 | 0.0066 | -0.0403 | 2.9281 | 1.2555 |
| 5th percentile, treated units, true value: -1.2798 | | | | | | |
| 100 | 0.0193 | 0.3068 | 0.0945 | -0.0128 | 2.9902 | 1.4539 |
| 400 | 0.0022 | 0.1528 | 0.0234 | -0.0155 | 3.0263 | 1.4868 |
| 1600 | 0.0009 | 0.0754 | 0.0057 | -0.0509 | 2.9009 | 1.5398 |
| 25th percentile, treated units, true value: 0.5125 | | | | | | |
| 100 | 0.0131 | 0.2301 | 0.0531 | 0.0227 | 2.8731 | 1.3348 |
| 400 | 0.0002 | 0.1159 | 0.0134 | -0.0162 | 2.9842 | 1.3453 |
| 1600 | -0.0002 | 0.0564 | 0.0032 | 0.0191 | 3.0702 | 1.3089 |
| 50th percentile, treated units, true value: 1.8918 | | | | | | |
| 100 | 0.0098 | 0.2320 | 0.0539 | 0.0041 | 2.9503 | 1.3240 |
| 400 | 0.0001 | 0.1152 | 0.0133 | -0.0103 | 2.9435 | 1.3615 |
| 1600 | 0.0004 | 0.0574 | 0.0033 | 0.0071 | 3.1482 | 1.3525 |
| 5th percentile, QTET, true value: 6.2250 | | | | | | |
| 100 | 2.1150 | 6.2848 | 43.9678 | 3.0180 | 21.9114 | |
| 400 | 0.4812 | 2.2014 | 5.0768 | 0.9617 | 5.2372 | |
| 1600 | 0.1059 | 1.0374 | 1.0869 | 0.4756 | 3.4395 | |
| 25th percentile, QTET, true value: 0.5919 | | | | | | |
| 100 | 0.0549 | 0.4377 | 0.1946 | 0.3465 | 3.4224 | |
| 400 | 0.0108 | 0.2068 | 0.0429 | 0.1663 | 3.1594 | |
| 1600 | 0.0036 | 0.1037 | 0.0108 | 0.0654 | 2.8879 | |
| 50th percentile, QTET, true value: 0.0031 | | | | | | |
| 100 | -0.0009 | 0.3443 | 0.1186 | -0.0391 | 3.1260 | |
| 400 | -0.0002 | 0.1674 | 0.0280 | -0.0179 | 3.0266 | |
| 1600 | -0.0001 | 0.0823 | 0.0068 | 0.0349 | 2.9636 | |

The number of replications is 10000, 5000 and 2500 for 100, 400 and 1600 observations, respectively.
Table 1.2: Monte Carlo simulation, parametric first step, estimation of the standard errors.

Estimator    Empirical size at nominal level      Median bias  MAD
             1%        5%        10%
5th percentile, 100 observations, “true” value: 6.2848
analytic     0.0778    0.1251    0.1632    -2.1858      3.5338
bootstrap    0.039     0.0695    0.0935    -0.4586      3.6826
5th percentile, 400 observations, “true” value: 2.2014
analytic     0.0404    0.0810    0.1236    -0.2115      0.6732
bootstrap    0.0195    0.0450    0.0820    0.1052       0.5969
5th percentile, 1600 observations, “true” value: 1.0374
analytic     0.0240    0.0644    0.1148    -0.0475      0.1695
bootstrap    0.0210    0.0510    0.1030    -0.0071      0.1559
25th percentile, 100 observations, “true” value: 0.4377
analytic     0.0098    0.0413    0.0833    -0.0044      0.0571
bootstrap    0.0073    0.0323    0.0650    0.0372       0.0691
25th percentile, 400 observations, “true” value: 0.2068
analytic     0.0114    0.0530    0.0986    -0.0018      0.0131
bootstrap    0.0115    0.0535    0.1070    0.0026       0.0165
25th percentile, 1600 observations, “true” value: 0.1037
analytic     0.0100    0.0548    0.1032    -0.0025      0.0038
bootstrap    0.0090    0.0650    0.1030    -0.0019      0.0192
50th percentile, 100 observations, “true” value: 0.3443
analytic     0.0083    0.0438    0.0853    0.0072       0.0278
bootstrap    0.0060    0.0380    0.0848    0.0222       0.0370
50th percentile, 400 observations, “true” value: 0.1674
analytic     0.0114    0.0502    0.1000    -0.0018      0.0064
bootstrap    0.0100    0.0515    0.0990    0.0002       0.0104
50th percentile, 1600 observations, “true” value: 0.0823
analytic     0.0088    0.0472    0.1016    0.0001       0.0016
bootstrap    0.0120    0.0520    0.1070    -0.0001      0.0235

For the analytic estimator, the number of replications is 10000, 5000 and 2500 for 100, 400 and 1600 observations, respectively. For the bootstrap estimator, the number of replications is 4000, 2000 and 1000 for 100, 400 and 1600 observations, respectively.
Table 1.3: Monte Carlo simulation, point estimates of $q_c(0.5)$ with 400 observations (true value: 1.8887).

Estimator          Bias      Standard error   MSE       Relative MSE   Correlation
MM, m = 100        0.0074    0.3968           0.1575    5.8711         0.3855
MM, m = 400        0.0014    0.2416           0.0584    2.1762         0.6689
MM, m = 1000       -0.0006   0.2015           0.0406    1.5138         0.8167
MM, m = 10000      0.0008    0.1683           0.0283    1.0552         0.9760
MM, m = 100000     0.0007    0.1643           0.0270    1.0059         0.9976
q̂_c(0.5)           0.0004    0.1638           0.0268    1.0000         1.0000

Results based on 5000 replications.
Table 1.4: Monte Carlo simulation, nonparametric first step, point estimates.

Sample size  Bias      St. dev.  MSE       Skew.     Kurt.     Relative MSE of the sample quantile
5th percentile, control units (true value: -9.9967)
100          -4.0656   11.1005   139.735   -3.993    33.843    0.2791
400          -2.4228   3.9373    21.3688   -1.2106   5.6593    0.3807
1600         -1.3499   1.6497    4.5423    -0.4672   3.3023    0.4303
6400         -0.5805   0.7238    0.8603    -0.2607   3.0978    0.5537
25th percentile, control units (true value: 2.4906)
100          -0.3502   1.2111    1.5892    -0.4817   3.2603    0.794
400          -0.1555   0.5905    0.3728    -0.3264   3.1843    0.9056
1600         -0.0678   0.2967    0.0926    -0.174    3.1047    0.9555
6400         -0.0169   0.1489    0.0224    -0.0896   2.9079    1.0227
50th percentile, control units (true value: 6.2308)
100          0.031     0.5568    0.3109    0.0107    3.2204    0.9467
400          0.0366    0.2691    0.0737    0.0688    2.9178    0.9985
1600         0.0205    0.1296    0.0172    -0.1041   3.0533    1.008
6400         0.0144    0.0646    0.0044    0.0381    2.9033    1.0051
5th percentile, treated units (true value: 3.7015)
100          0.0153    0.936     0.8763    -0.8658   4.296     0.9438
400          0.0042    0.443     0.1962    -0.447    3.4442    1.0759
1600         -0.0032   0.2247    0.0505    -0.1422   2.9996    1.0655
6400         0.0056    0.1126    0.0127    -0.0372   3.2399    1.0746
25th percentile, treated units (true value: 6.6709)
100          0.0555    0.3272    0.1101    -0.1129   3.0459    1.067
400          0.0316    0.1607    0.0268    0.0481    3.2409    1.0713
1600         0.014     0.0806    0.0067    -0.0026   2.9702    1.0525
6400         0.0088    0.0396    0.0016    -0.0718   2.9824    1.0211
50th percentile, treated units (true value: 8.5198)
100          0.0195    0.3479    0.1214    0.0825    3.1471    1.2787
400          0.0163    0.1833    0.0339    0.0612    3.0014    1.1576
1600         0.0074    0.092     0.0085    0.0528    2.9641    1.1052
6400         0.0037    0.0483    0.0023    0.0553    2.8465    1.0846
5th percentile, QTET (true value: 6.3287)
100          5.0822    27.7069   793.4025  30.2677   1372.56
400          1.485     2.7951    10.0162   1.4287    6.5665
1600         0.5487    1.0287    1.3588    0.4466    3.2192
6400         0.2156    0.4775    0.2742    0.2293    3.0544
25th percentile, QTET (true value: 1.3986)
100          0.2217    0.7256    0.5756    1.0586    7.8828
400          0.0819    0.276     0.0828    0.1927    3.1587
1600         0.0293    0.1362    0.0194    0.1029    3.0828
6400         0.0142    0.0688    0.0049    0.0764    3.0489
50th percentile, QTET (true value: 0.8783)
100          -0.0225   0.5303    0.2817    -0.031    3.2076
400          -0.0031   0.2459    0.0605    -0.0399   2.9529
1600         0.0011    0.1232    0.0152    -0.0979   2.8415
6400         0.004     0.0613    0.0038    -0.2134   2.8404

The number of replications is 8000, 4000, 2000 and 1000 for 100, 400, 1600 and 6400 observations, respectively.
Table 1.5: Monte Carlo simulation, nonparametric first step, estimation of the standard errors.

Sample size  Empirical size at nominal level      “True” value  Median bias   MAE
             1%        5%        10%
5th percentile
100          0.1438    0.2300    0.2903    122.6105      -119.5533     119.5667
400          0.0230    0.0635    0.1065    2.7951        -0.6860       1.0394
1600         0.0090    0.0405    0.0860    1.0287        -0.0160       0.1694
6400         0.0100    0.0520    0.1070    0.4775        0.0059        0.0431
25th percentile
100          0.0078    0.0295    0.0534    0.7256        0.2345        0.2459
400          0.0048    0.0343    0.0768    0.2760        0.0204        0.0269
1600         0.0075    0.0400    0.0885    0.1362        0.0069        0.0087
6400         0.0080    0.0370    0.0990    0.0688        0.0016        0.0022
50th percentile
100          0.0076    0.0399    0.0790    0.5303        0.0159        0.0666
400          0.0043    0.0315    0.0675    0.2459        0.0220        0.0235
1600         0.0060    0.0330    0.0760    0.1232        0.0095        0.0095
6400         0.0070    0.0300    0.0700    0.0613        0.0048        0.0048

The number of replications is 8000, 4000, 2000 and 1000 for 100, 400, 1600 and 6400 observations, respectively.
Table 1.6: Descriptive statistics, means.

Variable                 All       Whites    Blacks
Ln(wage)                 2.1980    2.2159    1.9895
Experience               19.9547   19.9307   20.2333
Education                13.5631   13.6138   12.9750
South                    0.2806    0.2560    0.5665
Midwest                  0.2877    0.2972    0.1771
West                     0.2325    0.2429    0.1115
Northeast                0.1993    0.2039    0.1449
Number of observations   40349     37147     3202
Figures for chapter 1
Figure 1.1: Correlation between the proposed estimator and the MM estimator as a function of m.
[Plot. Vertical axis: correlation between MM estimator and the proposed estimator. Horizontal axis: log of the number of replications. Results based on 5000 Monte Carlo replications.]
Figure 1.2: MSE of the proposed estimator and the MM estimator as a function of m.
[Plot. Vertical axis: mean squared error. Horizontal axis: log of the number of replications. Series: estimator proposed in this paper; MM estimator. Results based on 5000 Monte Carlo replications.]
Figure 1.3: Decomposition of the black/white wage gap using parametric quantile regression.
[Plot. Horizontal axis: quantile. Series: total differential; coefficients; characteristics.]
Figure 1.4: Difference between the unconditional quantiles implied by the model and the sample quantiles.
[Plot. Horizontal axis: quantile. Series: parametric estimator; nonparametric estimator. A 95% confidence interval obtained by bootstrapping the results 100 times is plotted.]
Figure 1.5: Decomposition of the black wage gap using nonparametric quantile regression.
[Plot. Horizontal axis: quantile. Series: total differential; coefficients; characteristics.]
Chapter 2
Decomposition of differences in distribution using quantile regression

This chapter proposes a semiparametric estimator of distribution functions in the presence of covariates. The method is based on the estimation of the conditional distribution by quantile regression. The conditional distribution is then integrated over the range of covariates. Counterfactual distributions can be estimated, allowing the decomposition of changes in distribution into three factors: coefficients, covariates and residuals. Sources of changes in wage inequality in the USA between 1973 and 1989 are examined. Unlike most of the literature, we find that residuals account for only 20% of the explosion of inequality in the 80s.

Keywords: Wage Inequality, Quantile Regression, Oaxaca Decomposition.
JEL classification: J31.
Chapter 2: Decomposition of differences in distributions

2.1 Introduction
The pronounced increase in wage inequality in several countries, and particularly in the United States, since the early 80s has motivated the study of changes in the distribution of wages. The most common practice is to calculate, compare and decompose summary indices of inequality like the Gini coefficient. However, as is well known in the income distribution literature, different summary measures of inequality can yield different rankings, since they put different weights on different parts of the distribution. As a result, recent research has increasingly focused on more global methods that describe changes in the whole distribution of wages.

There has been a surge of methodologies extending the Oaxaca (1973) and Blinder (1973) decomposition of differences at the mean to decompositions of the whole distribution. Juhn, Murphy and Pierce (1993, JMP hereafter) have proposed a simple extension of the Oaxaca decomposition that takes account of the distribution of residuals. A problem with this decomposition is that it does not account for heteroscedasticity. In the original paper, JMP formally allow the distribution of residuals to depend on the covariates, but they do not explain how to implement this empirically. Most other applications of this decomposition do not condition on the covariates. If the error term is really independent and normally distributed, this procedure is efficient. However, if the location model is inappropriate, this decomposition can produce misleading results: the whole conditional distribution of wages, and not only its first two moments, can depend on the covariates.

This lack of flexibility has motivated new, less restrictive estimators. One of the most interesting approaches is the re-weighting estimator proposed by DiNardo, Fortin and Lemieux (1996) and extended by Lemieux (2002). The counterfactual weights are chosen in a way that makes the distribution of skills constant across time.
The main advantage of this approach is the absence of restrictions on covariate effects and density shapes. However, if there are many variables, in particular continuous ones, it becomes impossible to estimate counterfactual distributions nonparametrically. The probability of being in one period given the characteristics must then be estimated with a probit model. Naturally, probit (or logit) estimates are consistent only if the error term is homoscedastic and normally (or logistically) distributed and if the specification is correct. Thus, the nonparametric character of the estimator is not preserved. Moreover, this first-step estimation has no economic interpretation: what does “the probability of being in 1989” mean for an observation in 1973?

In this study we propose a new estimator of distribution functions in the presence of covariates. The whole conditional wage distribution is estimated by quantile regression. Then, the conditional distribution is integrated over the range of covariates to obtain an estimate of the unconditional distribution. The approach can be qualified as semiparametric: we must assume that the conditional quantiles satisfy a parametric restriction, but no distributional assumption is needed and the covariates are allowed to influence the whole conditional distribution. The first-step estimation has not only a statistical but also a natural economic interpretation, since the quantile regression coefficients can be interpreted as rates of return to skills at different points of the wage distribution (Buchinsky 1994). Finally, estimating the conditional distribution allows us to integrate it in order to obtain the unconditional distribution, a procedure that is not possible with the conditional mean.

Of course, we are not the first to propose a decomposition procedure based on quantile regression. Machado and Mata (2005) and Gosling, Machin and Meghir (2000) have proposed similar procedures. We extend their work by solving the problem of crossing quantile curves and by deriving the asymptotic distribution of the estimator.
We use the proposed approach to reassess the sources of changes in the distribution of wages in the United States between 1973 and 1989, using hourly wage data from the May Current Population Survey (CPS) and from the outgoing rotation groups of the CPS. Unlike most previous studies (JMP, Katz and Autor 2000, Acemoglu 2002, for instance), we find that residuals account for only about 20% of the total growth in wage inequality. Changes in characteristics, on the contrary, explain about half of the increase. The reason for the difference between our results and those commonly accepted in the literature is that quantile regression accounts for heteroscedasticity, while other methods, like the JMP decomposition, assume independent error terms. In fact, the variance of the residuals increases with education and experience and is smaller among unionized workers and in certain sectors (public administration, manufacturing). The fact that the population is becoming more educated and less unionized, and that employment in sectors with low variance is declining, puts more weight on groups with higher within-group inequality. This is a composition effect and not an increase in the price of unmeasured skills, as traditionally concluded. A recent paper by Lemieux (2006) that uses the re-weighting approach, which also accounts for composition effects in changes in inequality, reaches a similar conclusion. The main difference between Lemieux's estimator and ours is that different parametric restrictions are involved. We assume that the conditional quantiles of wages are linear in characteristics, while he assumes that the probability of being in year t relative to the base year satisfies the restrictions of the linear logit model. The fact that we obtain similar results under different assumptions suggests that the findings are robust and that the widely cited results of JMP may be inaccurate.

The chapter is organized as follows. In section 2.2 we present the methodology and show how differences in distribution can be decomposed into three factors: coefficients, covariates and residuals. Section 2.3 applies the methodology to the distribution of wages in the United States between 1973 and 1989, and section 2.4 concludes.
2.2 Estimating distribution functions in the presence of covariates

2.2.1 Definition and motivation of the estimator
Let $\{y_i, x_i\}_{i=1}^{N}$ be an independent sample from some population, where $x_i$ is a $K \times 1$ vector of regressors. We follow Koenker and Bassett (1978) and assume that

$$F_{y|x}^{-1}(\tau \mid x_i) = x_i \beta(\tau), \qquad \forall \tau \in (0,1),$$

where $F_{y|x}^{-1}(\tau \mid x_i)$ is the $\tau$th quantile of y conditional on $x_i$. A linear relationship is assumed between the quantiles of y and x, similarly to OLS, which assumes a linear relationship between the mean of y and x. Of course, this assumption is restrictive, but it can be relaxed by using dummy variables, polynomial expansions and interaction terms, as is done for models that assume a linear functional form for the mean. In this application, the dependent variable is the logged wage and the covariates are human capital characteristics. Thus, the quantile regression coefficients can be interpreted as rates of return to the different characteristics at the specified quantile of the conditional distribution. Koenker and Bassett (1978) show that $\beta(\tau)$ can be estimated by

$$\hat\beta(\tau) = \arg\min_{b \in \mathbb{R}^K} \frac{1}{N} \sum_{i=1}^{N} (y_i - x_i b)\left(\tau - 1(y_i \le x_i b)\right),$$
where $1(\cdot)$ is the indicator function. $\beta(\tau)$ is estimated separately for each $\tau$. Asymptotically, we could estimate an infinite number of quantile regressions. In finite samples, Portnoy (1991) shows that the number of numerically different quantile regression solutions is $O(N \log N)$ and that each prevails on an interval. Let $(\tau_0 = 0, \tau_1, \ldots, \tau_J = 1)$ be the points where the solution changes; $\hat\beta(\tau_j)$ prevails from $\tau_{j-1}$ to $\tau_j$ for $j = 1, \ldots, J$. Let $\hat\beta$ be the vector of all different quantile regression coefficients: $\hat\beta = \left(\hat\beta(\tau_1), \ldots, \hat\beta(\tau_j), \ldots, \hat\beta(\tau_J)\right)$. This is a model for the conditional quantiles of y, but we want to estimate the unconditional quantiles of y. We therefore need to integrate the conditional distribution over the whole range of the distribution of the regressors. However, a problem with quantile regression is the potential lack of monotonicity: in finite samples, $\tau_j \le \tau_k \not\Rightarrow x_i\hat\beta(\tau_j) \le x_i\hat\beta(\tau_k)$, that is, estimated quantile curves may cross. To overcome this problem, consider the following property of $q_0$, the population's $\theta$th quantile of y:
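$$
\begin{aligned}
q_0 = F_Y^{-1}(\theta)
&\iff \int 1(y \le q_0)\, dF_Y(y) = \theta \\
&\iff \int \left( \int 1(y \le q_0)\, f_{Y|X}(y \mid x)\, dy \right) dF_X(x) = \theta \\
&\iff \int \int_0^1 1\left( F_{Y|X}^{-1}(\tau \mid x) \le q_0 \right) d\tau \, dF_X(x) = \theta .
\end{aligned}
$$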
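The last equivalence is obtained by changing the variable of integration to $\tau = F_{Y|X}(y \mid x)$, so that $y = F_{Y|X}^{-1}(\tau \mid x)$, and noting that $\tau$ is uniformly distributed on (0,1) conditional on x, i.e. $f_\tau(\tau) = 1$ for all $\tau \in (0,1)$. Thus, replacing $F_{y|x}^{-1}(\tau_j \mid x_i)$ by its consistent estimate $x_i\hat\beta(\tau_j)$, and following the convention of taking the infimum of the set if the finite-sample solution is not unique, the sample analog of $q_0$ is given by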
$$\hat q(\hat\beta, x) = \inf\left\{ q : \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{J} (\tau_j - \tau_{j-1})\, 1\left( x_i\hat\beta(\tau_j) \le q \right) \ge \theta \right\}. \tag{1}$$
Assuming the usual restrictions of the quantile regression model, one can prove that $\hat q$ is a consistent and asymptotically normally distributed estimator of $q_0$.²⁶ Given the difficulty of estimating the asymptotic variance, statistical inference will be conducted with bootstrap procedures.

26 A formal proof and the asymptotic variance can be found in Melly (2006a).
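Formula (1) amounts to taking a weighted quantile of the N × J pooled fitted values. The Python sketch below assumes that the fitted conditional quantiles $x_i\hat\beta(\tau_j)$ have already been computed by some quantile regression routine; the function name unconditional_quantile and the nested-list input layout are illustrative choices, not code from this thesis. Because the pooled values are sorted before the infimum is taken, crossing quantile curves do not affect the result.

```python
from typing import Sequence


def unconditional_quantile(fitted: Sequence[Sequence[float]],
                           taus: Sequence[float],
                           theta: float) -> float:
    """Sample analog of q0 in (1).

    fitted[i][j] is the fitted conditional quantile x_i * beta_hat(tau_j) of
    observation i, where beta_hat(tau_j) prevails on (tau_{j-1}, tau_j].
    taus = (tau_1, ..., tau_J) is increasing with tau_J = 1; tau_0 = 0.
    """
    n = len(fitted)
    lower = [0.0] + list(taus[:-1])
    widths = [t - s for s, t in zip(lower, taus)]  # tau_j - tau_{j-1}
    # Pool the N*J fitted values, each weighted by (tau_j - tau_{j-1}) / N.
    # Sorting the pool is what makes crossing quantile curves harmless.
    pooled = sorted(
        (row[j], widths[j] / n) for row in fitted for j in range(len(taus))
    )
    cumulative = 0.0
    for value, weight in pooled:
        cumulative += weight
        # Smallest q whose weighted share of fitted values <= q reaches theta;
        # the 1e-12 tolerance only guards against floating-point rounding.
        if cumulative >= theta - 1e-12:
            return value
    return pooled[-1][0]


# Two observations on a regular grid; the second row is non-monotone in tau.
fitted_values = [[1.0, 2.0, 3.0, 4.0], [2.0, 5.0, 3.0, 4.0]]
print(unconditional_quantile(fitted_values, [0.25, 0.5, 0.75, 1.0], 0.5))
```

With a regular grid all weights are equal, so the function reduces to an ordinary sample quantile of the pooled fitted values, in line with the remark that the weights are then unnecessary.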
The formulas above, and especially (1), may seem complicated to practitioners. However, estimating the $\theta$th quantile of y is very simple and consists of a straightforward two-step procedure:

1. Estimation of the whole quantile regression process $y = x\beta(\tau)$. In R,²⁷ this can be done with a single command using the package quantreg written by Roger Koenker. Estimating the whole quantile regression process can take a very long time if the number of observations is large. However, the asymptotic results remain valid if the quantile regressions are estimated along a grid of $\tau$-values whose mesh is sufficiently small (a mesh size of order $O(N^{-1/2-\varepsilon})$ will work). If the estimation is based on a large dataset (roughly N > 3000, depending on the available computing power), a smaller number of quantile regressions should be estimated. We recommend using the interior point algorithm of Portnoy and Koenker (1997), as implemented in R.

2. Estimation of the $\theta$th quantile of the sample $\{\{x_i\hat\beta(\tau_j)\}_{j=1}^{J}\}_{i=1}^{N}$, weighting each “observation” by $(\tau_j - \tau_{j-1})$. The weights are not necessary if a regular grid of quantiles has been used.

2.2.2 Decomposition of differences in distribution
Estimating the unconditional distribution of a variable by quantile regression as in (1) is of limited use in itself, since the sample quantiles are consistent anyway (Glivenko-Cantelli theorem) and are simpler to compute. The main interest of this estimator is the possibility of simulating counterfactual distributions that can be used to decompose differences in distribution. We use the same framework as JMP to decompose the differences in wage distributions between 1973 and 1989.

27 R is an open-source programming environment for conducting statistical analysis and graphics. The software can be downloaded at no cost from the site www.r-project.org. See R Development Core Team (2003) for details.

Taking the median as a measure of the central tendency of a distribution, we can write a simple wage equation for each year:

$$y_{it} = x_{it}\beta_t(0.5) + u_{it}, \qquad t = 73, 89,$$

where $\beta_t(0.5)$ is the coefficient vector of the median regression in year t. We can now isolate the effects of changes in characteristics x, coefficients $\beta(0.5)$ and residuals u. We first estimate the counterfactual distribution of wages that would have prevailed in 1973 if the distribution of individual attributes had been as in 1989, by evaluating (1) over the distribution of x in 1989 with the coefficients estimated in 1973. Formally,
$$\hat q(\hat\beta^{73}, x^{89}) = \inf\left\{ q : \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{J} (\tau_j - \tau_{j-1})\, 1\left( x_i^{89}\hat\beta^{73}(\tau_j) \le q \right) \ge \theta \right\}$$
is the $\theta$th quantile of this counterfactual distribution of wages. Thus, the difference between $\hat q(\hat\beta^{73}, x^{89})$ and $\hat q(\hat\beta^{73}, x^{73})$ is explained by changes in characteristics. This decomposition is less restrictive than the JMP decomposition because the characteristics are allowed to influence the whole conditional distribution of y. To separate the effects of coefficients from the effects of residuals, note that the $\tau$th quantile of the residual distribution conditional on x is consistently estimated by $x\left(\hat\beta(\tau) - \hat\beta(0.5)\right)$. We define the $J \times 1$ vector $\hat\beta^{m89,r73}$, whose jth element is given by

$$\hat\beta^{m89,r73}(\tau_j) = \hat\beta^{89}(0.5) + \hat\beta^{73}(\tau_j) - \hat\beta^{73}(0.5).$$

Thus, we estimate the distribution that would have prevailed if the median returns to characteristics had been the same as in 1989 but the residuals had been distributed as in 1973 by $\hat q(\hat\beta^{m89,r73}, x^{89})$. Therefore, the difference between $\hat q(\hat\beta^{m89,r73}, x^{89})$ and $\hat q(\hat\beta^{73}, x^{89})$ is due to changes in coefficients, since characteristics and residuals are kept at the same level. Finally, the difference between $\hat q(\hat\beta^{89}, x^{89})$ and $\hat q(\hat\beta^{m89,r73}, x^{89})$ is due to residuals. The final decomposition is the following:
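$$
\begin{aligned}
\hat q(\hat\beta^{89}, x^{89}) - \hat q(\hat\beta^{73}, x^{73})
={}& \left( \hat q(\hat\beta^{89}, x^{89}) - \hat q(\hat\beta^{m89,r73}, x^{89}) \right) \\
&+ \left( \hat q(\hat\beta^{m89,r73}, x^{89}) - \hat q(\hat\beta^{73}, x^{89}) \right) \\
&+ \left( \hat q(\hat\beta^{73}, x^{89}) - \hat q(\hat\beta^{73}, x^{73}) \right)
\end{aligned}
\tag{2}
$$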
where the first bracket represents the effect of changes in residuals, the second the effects of changes in (median) coefficients and the third the effects of changes in the distribution of the covariates. Note that we can decompose all statistics (variance, difference between the 9th and the 1st deciles, Gini coefficient, coefficient of variation,…) since we can estimate the whole counterfactual distribution.
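The decomposition (2) can be sketched in the same way. The Python example below is a toy illustration under stated assumptions: a single regressor plus an intercept, four grid points ($\tau_j$ = 0.25, 0.5, 0.75, 1), and invented coefficients and covariate values rather than estimates from the CPS. The hypothetical helper median_89_residuals_73 builds $\hat\beta^{m89,r73}(\tau_j)$ exactly as defined above, and decompose returns the total difference followed by the three brackets of (2) in order (residuals, coefficients, characteristics).

```python
from typing import Dict, Sequence, Tuple

Coef = Tuple[float, float]  # hypothetical model: (intercept, slope)


def q_hat(coefs: Dict[float, Coef], xs: Sequence[float], theta: float) -> float:
    """Formula (1) with x_i = (1, x_i); coefs maps tau_j -> beta_hat(tau_j)."""
    taus = sorted(coefs)
    widths = [t - s for s, t in zip([0.0] + taus[:-1], taus)]
    pooled = sorted(
        (coefs[t][0] + coefs[t][1] * x, w / len(xs))
        for x in xs for t, w in zip(taus, widths)
    )
    cumulative = 0.0
    for value, weight in pooled:
        cumulative += weight
        if cumulative >= theta - 1e-12:  # tolerance for float rounding only
            return value
    return pooled[-1][0]


def median_89_residuals_73(b89: Dict[float, Coef],
                           b73: Dict[float, Coef]) -> Dict[float, Coef]:
    """beta^{m89,r73}(tau_j) = beta89(0.5) + (beta73(tau_j) - beta73(0.5))."""
    m89, m73 = b89[0.5], b73[0.5]
    return {
        t: tuple(m + (b - c) for m, b, c in zip(m89, b73[t], m73))
        for t in b73
    }


def decompose(b73, b89, x73, x89, theta):
    """Return (total, residuals, coefficients, characteristics) as in (2)."""
    b_m = median_89_residuals_73(b89, b73)
    q_89 = q_hat(b89, x89, theta)   # q(beta89, x89)
    q_m = q_hat(b_m, x89, theta)    # q(beta^{m89,r73}, x89)
    q_c = q_hat(b73, x89, theta)    # q(beta73, x89)
    q_73 = q_hat(b73, x73, theta)   # q(beta73, x73)
    return q_89 - q_73, q_89 - q_m, q_m - q_c, q_c - q_73


# Invented toy inputs (not estimates from the CPS data).
B73 = {0.25: (0.8, 0.05), 0.5: (1.0, 0.06), 0.75: (1.2, 0.07), 1.0: (1.4, 0.08)}
B89 = {0.25: (0.7, 0.05), 0.5: (1.0, 0.08), 0.75: (1.3, 0.10), 1.0: (1.6, 0.12)}
X73 = [10.0, 12.0, 12.0, 16.0]
X89 = [12.0, 13.0, 16.0, 16.0]

for theta in (0.1, 0.5, 0.9):
    print(theta, decompose(B73, B89, X73, X89, theta))
```

By construction the three components add up to the total difference. Decomposing a dispersion measure such as the 90-10 gap only requires evaluating the same four counterfactual quantile functions at the two relevant values of θ.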
2.3 Changes in US wage inequality between 1973 and 1989

2.3.1 Introduction
A large and growing literature documents the changes in the US wage structure during the past three decades. Many researchers, using a variety of measures and datasets, have found that wage inequality increased substantially during the 80s. An important finding is that residual inequality accounts for most of the growth in wage inequality. Katz and Autor (2000) survey this literature. They present a between- and within-group decomposition of the growth of the variance and find that residual inequality accounts for about 60% of the increase in inequality. Acemoglu (2002) summarizes four salient facts from the post-war US economy, one of which states: “Overall wage inequality rose sharply beginning in the early 1970s. Increases in within-group (residual) inequality account for much of this rise.” JMP implemented the decomposition presented in the introduction and found that 56% of the rise in the 90-10 wage differential from 1964 to 1988 is explained by residuals. In all these studies, by contrast, the effect of changes in characteristics on the rise in inequality is negligible, less than 10% in JMP for instance. Acemoglu (2002), Aghion (2002) and others use these results as building blocks for models of technical change and economic growth.

However, Lemieux (2006) shows that this literature does not account for the possible dependence between residuals and characteristics. As suggested by Mincer's (1974) famous human capital earnings model, residual wage dispersion should increase with experience and education. The literature on union²⁸ and public sector²⁹ wage effects also shows that union membership and public sector status reduce the variance of the unexplained component of earnings. Thus, changes in characteristics affect not only the level of wages but also higher moments of the distribution. Part of the increase in the variance of residuals found in the literature may be due to changes in the composition of the workforce rather than to higher returns to unobservable skills. The decomposition presented in section 2.2 can distinguish between these two causes.

2.3.2 Data
We use hourly wage data from the May 1973 Current Population Survey (CPS) and from the 1989 outgoing rotation group files of the CPS. The samples used are broadly similar to those of DiNardo, Fortin and Lemieux (1996), Card and DiNardo (2002) and Lemieux (2006). The period has been chosen principally for three reasons. First, a major theme in the discussion on the widening of the wage distribution is the effect of the minimum wage. However, the minimum wage introduces a nonlinearity in the wage function. JMP and Lemieux (2002) estimate linear regressions but, as recognized by Lemieux, this can only be an inadequate approximation in years when the real value of the minimum wage is high. To avoid this problem, we have simply chosen two years in which the minimum wage was quite low and very few observations were at or below it (0.33% in 1973, 0.28% in 1989). The differences in the distribution of earnings between these years therefore cannot be caused by changes in the real value of the minimum wage. Second, several studies (such as Card and DiNardo, 2002) find that 1988 or 1989 is a turning point in the evolution of wage inequality: wage inequality appears to have stabilized, and no noticeable change can be seen between 1989 and 2001. Finally, the period 1973-1989 offers the possibility of comparing the results of the proposed estimator with the numerous empirical works covering this period.

The wage measure we use is the hourly wage for workers who are paid by the hour, and usual weekly earnings divided by usual hours of work for the others. Allocated earnings are excluded because they can bias the results, in particular in distributional analysis, since the mean value of the wage given the covariates is imputed. Unfortunately, flags indicating which observations are allocated are not available in 1989. However, Hirsch and Schumacher (2004) explain how allocated earners can be identified using unedited weekly earnings, and we use their method to exclude allocated earnings. We use a broad sample of male workers and weight observations by the product of the CPS sample weight and usual hours of work, to obtain a wage distribution representative of the total number of hours worked in the economy. We deflate wages to 1982-84 dollars using the CPI-U. Only men aged 16 to 65 reporting an hourly wage above $1 are kept in the sample. All observations with a missing value for any of the variables are excluded. The sample sizes are approximately 23,000 in 1973 and 75,000 in 1989.

The vector of regressors x consists of a quartic in potential experience (defined as $\max(0, \text{age} - 5 - \text{years in school})$), 11 education dummies, 6 interaction terms between education and experience,³⁰ a part-time dummy, union status, 5 race dummies, 3 region dummies, and 17 industry dummies. The means of all covariates in both periods can be found in the first two columns of table 2.1. The level of potential experience decreased between 1973 and 1989 because of the entry of the baby-boom generation into the labor market and because of longer schooling. Educational attainment increased clearly over the period; for instance, the percentage of workers with a college degree increased by 60%. As is well known, de-unionization was striking, with an 11.4% fall in union members.

28 See Card, Lemieux and Riddell (2004) for a recent contribution to this issue.
29 Borjas (2003) documents the shifts that occurred in the wage structures of the public and private sectors between 1960 and 2000.
30 Only interaction terms that were significant in at least one of the two periods were kept as regressors.

2.3.3 First step quantile regression results
Since some of the earnings are top-coded, the conditional distributions have been estimated with censored quantile regressions. We have used the three-step censored quantile regression algorithm suggested by Chernozhukov and Hong (2002). Their estimator requires a separation restriction on the censoring probability, which costs a small reduction in generality but preserves the plausible semiparametric, distribution-free and heteroscedastic features of the model. It has the advantage of being easily computable. Because of the number of observations, it is simply not possible to estimate the whole quantile regression process. Therefore, we have estimated 200 different quantile regressions uniformly distributed between 0 and 1.

We consider first the coefficients of the median regressions in the third and fourth columns of table 2.1. Points and stars indicate significant differences from zero, with standard errors estimated by bootstrapping the results 100 times.³¹ The coefficients generally have the expected signs and are consistent with previous studies. The negative public sector wage differential is surprising, but it is offset by the large positive coefficient on the public administration sector, where about 40% of public sector employees work. Between 1973 and 1989, the wage-experience profile changed: the linear and cubic terms decreased but the quadratic and quartic terms increased. Therefore, the return to experience decreased for levels of experience below 10 years and increased above. Returns to education, particularly for persons with a degree higher than high school, increased.

31 The bootstrap is known to estimate the distribution of $\hat\beta(\theta)$ consistently (Hahn 1995). The observations are resampled with replacement. The number of replications must be kept reasonable because of the computation time (about 1.5 hours is needed per replication).

Coefficients of the median regression indicate how the level of wages depends on covariates. To analyze the effects of characteristics on the dispersion of earnings, the fifth and sixth columns of table 2.1 report the difference between the quantile regression coefficients at the 9th decile and the coefficients at the 1st decile. If the error term is independent of a characteristic, the coefficient on this variable does not vary with the quantile, and thus $\hat\beta(0.9) - \hat\beta(0.1)$ should not be significantly different from zero. If the difference between the 9th and 1st decile coefficients on a covariate is positive (negative), a higher value of this variable increases (decreases) within-group inequality. The results show that for more than half of the variables the interdecile difference is significantly different from zero. Thus, methods that are inconsistent under heteroscedasticity will yield biased results. Consistent with Mincer (1974), within-group inequality grows with experience,³² and the interdecile range even increased between the two periods. Within-group inequality also tends to grow with education, with low or negative interdecile differences at low educational levels and significant positive interdecile differences at the highest levels. As is well known in the literature, the variance of wages is significantly lower for union members, and this difference increased between 1973 and 1989. Finally, we observe that within-group inequality differs between sectors of employment.

Given that within-group inequality depends on characteristics, changes in characteristics will affect overall inequality. A straightforward example helps to understand the importance of this composition effect.
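Before the empirical illustration, the mechanism can be isolated in a stylized two-group sketch. In the Python example below, the log-wage distributions of the two groups (normal, with invented parameters) are held fixed, so neither returns nor within-group dispersion change; only the share of the high-dispersion group moves. The bisection-based quantile function is an illustrative choice and not part of the estimator used in this chapter.

```python
from statistics import NormalDist

# Hypothetical log-wage distributions of two groups; the "high" group has both
# a higher median and more within-group dispersion (numbers are invented).
LOW = NormalDist(mu=2.0, sigma=0.3)
HIGH = NormalDist(mu=2.4, sigma=0.6)


def mixture_quantile(share_high: float, theta: float) -> float:
    """theta-quantile of the pooled log-wage distribution, found by bisection."""
    def cdf(y: float) -> float:
        return (1.0 - share_high) * LOW.cdf(y) + share_high * HIGH.cdf(y)

    lo, hi = -10.0, 15.0  # brackets all relevant quantiles for these parameters
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < theta:
            lo = mid
        else:
            hi = mid
    return hi


def gap_90_10(share_high: float) -> float:
    """Overall 90-10 gap of log wages for a given share of the high group."""
    return mixture_quantile(share_high, 0.9) - mixture_quantile(share_high, 0.1)


# Within-group 90-10 gaps do not depend on the group shares.
within = {
    name: dist.inv_cdf(0.9) - dist.inv_cdf(0.1)
    for name, dist in (("low", LOW), ("high", HIGH))
}
print(within)
# Raising the share of the high-dispersion group from 30% to 50% widens the
# overall 90-10 gap although no group-specific parameter has changed.
print(gap_90_10(0.3), gap_90_10(0.5))
```

The same logic underlies the interdecile coefficient differences discussed above: in a location-scale model $y = x\beta + (x\gamma)u$ with $x\gamma > 0$ and u independent of x, the conditional quantile coefficients are $\beta(\tau) = \beta + \gamma F_u^{-1}(\tau)$, so $\beta(0.9) - \beta(0.1) = \gamma\,(F_u^{-1}(0.9) - F_u^{-1}(0.1))$ is zero for a covariate exactly when that covariate does not affect the scale of the error.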
In order to keep the illustration simple, we consider first only x 73 and x 89 , the means of the characteristics distributions in 1973 and 1989. In the quantile regression framework, the interdecile range in 1973 evaluated at x 73 can be estimated by 32
However, his model predicts first declining and then increasing residual variance as a function of experience. We do not find evidence for the first prediction.
72
Chapter 2: Decomposition of differences in distributions
(
)
x 73 βˆ 73 ( 0.9 ) − βˆ 73 ( 0.1) . Using the results of table 2.1, we get a value of 0.883 for the interdecile range. Now, if we evaluate the interdecile range in 1973 at x 89 , we obtain a value of 0.934. This increase cannot possibly be due to changes
in residuals or coefficients, since these were kept unchanged. It must arise from changes in characteristics, such as the increase in the share of college and post-college degrees, de-unionization and the fall of employment in the manufacturing sector. On the other hand, if we keep the mean characteristics at $\bar{x}_{73}$ but replace $\hat{\beta}_{73}(0.9)-\hat{\beta}_{73}(0.1)$ by its 1989 value, the estimated interdecile range of wages reaches 0.936. Thus, part of the increase in within-group inequality can be attributed to residuals, but this effect is strongly overestimated if we do not correct for the composition effect.
2.3.4 Decomposition results
These first results indicate that the composition effect is potentially important. To also account for the effects of changes in coefficients and in the distribution of characteristics, we have estimated the decomposition (2) proposed in the second section. Figure 2.1 plots the decomposition results at 999 different quantiles ($\theta = 0.001, 0.002, \ldots, 0.999$) placed on the x-axis. Table 2.2 presents decomposition results for the median and for various measures of wage dispersion: the standard deviation and the 90-10, 90-50, 50-10, 75-25 and 95-5 gaps of the log wage distribution. Standard errors, computed by bootstrapping the results 100 times, are given in parentheses. For each statistic, the relative importance of each component of the decomposition is given in italics in the second row. As documented in many other studies, there is a clear widening of the unconditional wage distribution over this time period. Real wages for workers at the 10th percentile declined by about 20%, while they rose by about 2% for workers at the 90th percentile. As a result, the standard deviation of log wages and all other measures of wage inequality, such as the interquartile and the interdecile ranges, increased significantly. Perhaps more surprising, but entirely consistent with the findings of previous studies, is that the mean and the median real wage were lower in 1989 than in 1973. The positive effect of characteristics on the median indicates that if workers' attributes had been rewarded the same in 1989 as in 1973, wages would have risen, not fallen, in 1989. The lower level of wages is explained by changes in coefficients, that is, by how workers' characteristics are rewarded. This is mainly the consequence of a lower constant and not of lower returns to human capital characteristics. Naturally, the effect of the residuals on the median is not significantly different from zero. These results could have been obtained with the traditional Oaxaca / Blinder decomposition, but the decomposition of indices of inequality is much more interesting. We note first that the six indices of inequality yield similar results about the relative importance of each component. Residuals account for about 20% (between 15 and 25%, depending on the statistic) of the increase in inequality. This is much less than the widely accepted view in the literature, which was not a subject of controversy until recently: the literature briefly surveyed in section 2.3.1 finds that residuals account for most of the increase in inequality. Coefficients account for about 30% (between 25 and 35%) of the growth in overall wage inequality. This is a standard result, explained by the rise in the returns to education. It is not surprising that our estimates are almost the same as those of JMP, because the two methodologies are essentially similar as far as the effects of coefficients are concerned. Finally, the most important part of the growth in inequality is explained by changes in the distribution of characteristics. This again contradicts the literature quoted above and is explained by the composition effect of characteristics on residuals.
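The mechanism behind this composition effect, already visible in the interdecile-range example evaluated at the mean characteristics, can be sketched numerically. The sketch below assumes, as in that example, that the predicted 90-10 gap at the means is $\bar{x}(\hat{\beta}(0.9)-\hat{\beta}(0.1))$; the three covariates and all numbers are made up for illustration and are not the estimates of table 2.1.

```python
def interdecile_range(x_bar, spread):
    """x_bar'(beta(0.9) - beta(0.1)): 90-10 gap of the conditional distribution at x_bar."""
    if len(x_bar) != len(spread):
        raise ValueError("x_bar and spread must have the same length")
    return sum(x * s for x, s in zip(x_bar, spread))


# Hypothetical covariate means: [constant, college share, union share].
x_bar_73 = [1.0, 0.16, 0.32]
x_bar_89 = [1.0, 0.25, 0.20]

# Hypothetical spreads beta(0.9) - beta(0.1) for the same covariates:
# college widens and union membership compresses the conditional distribution.
spread_73 = [0.90, 0.25, -0.10]
spread_89 = [0.95, 0.30, -0.15]

base = interdecile_range(x_bar_73, spread_73)         # 1973 means, 1973 spreads
only_x = interdecile_range(x_bar_89, spread_73)       # characteristics move, spreads fixed
only_spread = interdecile_range(x_bar_73, spread_89)  # spreads move, characteristics fixed

print(f"composition effect:     {only_x - base:+.4f}")
print(f"residual-spread effect: {only_spread - base:+.4f}")
```

With these made-up numbers the composition effect is +0.0345 and the residual-spread effect +0.042: even with unchanged spreads, a shift toward college graduates and away from union members widens the predicted 90-10 gap, which a method ignoring this dependence would wrongly attribute to residuals.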
We have seen in table 2.1 that wage dispersion increases with experience and education but is smaller for union members and in the construction sector. Thus, the increase in education, de-unionization and the fall of employment in the manufacturing sector not only affect the level of wages but also increase within-group inequality. Methods that do not account for the dependence between residuals and characteristics overstate the effects of residuals and understate the effects of characteristics (see also the discussions in Lemieux, 2002 and 2006, and Machado and Mata, 2005). We note that the estimates are fairly precise, considering that we control for 48 independent variables and that it is more difficult to estimate changes in inequality than levels. In order to strengthen the conclusions, different robustness checks were performed. They are not given here in detail but are available on the author's web page. First, the order of the decomposition has been changed. Since there are 3 components, there are 6 possible orders. All 6 decompositions give similar results. Second, the number of quantile regressions in the first step of the estimation was set to 10, 100 and 400. The results are almost identical and not sensitive to this change. Finally, we have estimated the decomposition over sub-periods: 1973-1979, 1979-1984 and 1984-1989. Overall inequality first decreased between 1973 and 1979, then increased substantially between 1979 and 1984, and also increased, though somewhat more slowly, in the third period. The effect of changes in characteristics on inequality is positive over the whole period. It is logical that changes in characteristics are more continuous than other changes, since it takes time to increase the level of education or experience, for instance. Changes in coefficients reduce inequality in the first period, account for most of the increase in inequality during the second period and are almost insignificant in the third period. Residuals have a moderately positive effect on inequality in all sub-periods. The effects of residuals and of coefficients seem to go in opposite directions during the first period, which would contradict the prediction of a single-index model of skills.
It is also interesting to note that the effects of coefficients seem to be connected with the level of the minimum wage. Indeed, over the 1973-1979 period, both the real value of the minimum wage and its coverage rose substantially. By contrast, the real value of the minimum wage decreased markedly from 1979 to 1989.
2.4 Conclusion
In this chapter, we have proposed and implemented a flexible, intuitive and semiparametric estimator of distribution functions in the presence of covariates. The conditional wage distribution is estimated by quantile regression. Then, the conditional distribution is integrated over the range of the covariates to obtain estimates of the unconditional distribution. Counterfactual distributions can be estimated, allowing the decomposition of changes in distribution into three factors: changes in regression coefficients, changes in the distribution of covariates and changes in residuals. This decomposition is in the spirit of the JMP decomposition, but it allows the covariates to influence the whole distribution of the dependent variable. We have applied this methodology to US data for the period 1973-1989, a period during which earnings inequality increased quite dramatically. We find that about half of the increase in inequality can be explained by changes in the distribution of characteristics. Increases in the returns to skills, particularly education, also account for a substantial proportion of the increase in inequality. By contrast, changes in residuals account for only about 20% of the growth in inequality, suggesting that there was only a moderate increase in the price of unmeasured skills. These results, which differ from those of most other studies but are similar to Lemieux (2006), show how important it is to allow the covariates to affect the whole residual distribution and not only its first moment(s).
Tables for chapter 2
Table 2.1: Mean of the covariates, median regression coefficients and interdecile ranges

                      Mean                  β̂(0.5)                β̂(0.9) − β̂(0.1)
Variable              1973       1989       1973       1989       1973       1989
Constant              1          1          0.888**    0.784**    0.956**    0.793**
Experience            19.9836    18.0228    0.072**    0.065**    0.028**    0.045**
Experience^2          588.635    462.846    -0.003**   -0.002**   -0.001·    -0.003**
Experience^3          20414.7    14303.2    5.2E-5**   4.1E-5**   3.6E-5·    6.9E-5**
Experience^4          774768     495823     -3.5E-7**  -2.8E-7**  -3.6E-7·   -6.8E-7**
Grade1to4             1.78%      0.83%      0.094      0.053      0.009      -0.064
Grade5to6             2.45%      1.52%      0.032      0.039      0.036      -0.199
Grade7to8             7.98%      2.49%      0.125      0.123*     0.112      -0.098
Grade9                3.92%      2.02%      0.210*     0.189**    0.179      -0.059
Grade10               6.44%      3.50%      0.336**    0.298**    0.000      -0.009
Grade11               6.02%      3.90%      0.352**    0.396**    0.044      -0.054
Grade12               2.63%      1.99%      0.421**    0.445**    0.062      -0.032
High school           34.17%     34.76%     0.470**    0.468**    0.098      0.075
Some college          17.99%     23.34%     0.593**    0.626**    0.186      0.146
College               9.03%      14.44%     0.837**    0.942**    0.239      0.216·
Post-college          7.28%      10.94%     1.021**    1.116**    0.360·     0.339**
Exp.*grade5to6        0.84915    0.41998    0.004*     0.000      -0.004     0.004
Exp.*grade7to8        2.66514    0.78727    0.003**    0.002      -0.003     0.003
Exp.*grade9           1.02476    0.47018    0.002·     0.002      -0.005·    0.001
Exp.*grade11          1.16156    0.65354    0.001      -0.002**   -0.001     0.002
Exp.*grade12          0.34661    0.24719    -0.001     -0.003**   -0.001     0.004
Exp.*college          1.36963    2.24567    0.005**    0.001      0.001      0.003·
Part-time             5.16%      5.97%      -0.126**   -0.174**   0.090**    0.050*
Union member          31.61%     20.21%     0.148**    0.179**    -0.109**   -0.164**
Black non-Hispanic    8.31%      9.04%      -0.138**   -0.139**   0.013      0.000
Mexican               2.86%      5.85%      -0.145**   -0.138**   -0.031     0.010
Other Hispanic        1.87%      2.91%      -0.118**   -0.128**   -0.017     0.070*
Other non-white       1.23%      2.86%      -0.109**   -0.128**   0.244·     0.067*
Northeast             22.21%     19.55%     0.089**    0.125**    -0.048*    -0.023
Midwest               28.03%     25.40%     0.103**    0.021**    -0.058**   -0.017
Bootstrap with 100 replications. ·: significant at the 5% level, *: significant at the 1% level, **: significant at the 0.1% level.
Table 2.1 (cont.): Mean of the covariates, median regression coefficients and interdecile ranges
                      Mean                  β̂(0.5)                β̂(0.9) − β̂(0.1)
Variable              1973       1989       1973       1989       1973       1989
West
18.35%
20.89%
-0.080**
0.020
15.82%
14.64%
0.125** -0.038·
0.128**
Public sector
-0.071**
-0.004
-0.113**
Construction
9.95%
9.62%
0.453**
0.305**
-0.264**
-0.210**
Manufacturing durable goods
21.67%
17.25%
0.266**
0.250**
-0.484**
-0.304**
Other manufacturing
11.27%
9.30%
0.227**
0.201**
-0.447**
-0.229**
Transport
5.35%
6.47%
0.362**
0.245**
-0.356**
-0.155**
Utilities services
3.72%
3.71%
0.336**
0.352**
-0.478**
-0.297**
Wholesale trade
4.95%
5.40%
0.258**
0.128**
-0.371**
-0.203**
Retail trade
12.75%
13.68%
0.073*
-0.018
-0.344**
-0.167**
Finance
3.66%
4.61%
0.283**
0.264**
-0.284**
-0.071
Business services
3.09%
5.76%
0.167**
0.112**
-0.251**
-0.059
-0.292**
-0.072
-0.015
-0.110·
Personal services
1.02%
1.54%
0.004
-0.062·
Entertainment services
0.77%
1.04%
0.111·
-0.027
Health services
0.63%
0.96%
0.229**
0.062·
-0.333**
-0.047
Hospitals Educational services Social services
1.73%
1.97%
0.121*
0.071*
-0.362**
-0.098
5.49%
5.28%
0.081*
-0.017
-0.494**
-0.195**
1.11%
0.66%
-0.487**
-0.129**
0.229
-0.172*
1.59%
3.07%
0.348**
0.164**
-0.425**
0.111·
7.01%
5.86%
0.365**
0.246**
-0.376**
-0.192**
Other professional services Public administration
Bootstrap with 100 replications. ·: significant at the 5%, *: significant at the 1%, **: significant at the 0.1%.
Table 2.2: Decomposition of changes in measures of wage dispersion using quantile regression

                                     Effects of:
Statistic            Total change    Residuals       Coefficients    Characteristics
Median               -9.71 (0.45)    -0.3 (0.31)     -14.85 (0.42)   5.43 (0.29)
                     100%            3.08% (3.22)    152.87% (6.23)  -55.94% (4.92)
Standard deviation   6.76 (0.36)     1.2 (0.31)      2.23 (0.31)     3.33 (0.18)
                     100%            17.79% (4.43)   32.95% (4.24)   49.26% (2.81)
90-10                20.94 (0.9)     4.01 (0.71)     6.26 (0.89)     10.68 (0.52)
                     100%            19.14% (3.23)   29.89% (3.6)    50.97% (2.62)
50-10                9.36 (0.55)     1.42 (0.46)     3.28 (0.48)     4.66 (0.31)
                     100%            15.14% (4.37)   35.1% (4.1)     49.77% (3.24)
90-50                11.59 (0.64)    2.59 (0.53)     2.98 (0.54)     6.02 (0.36)
                     100%            22.37% (4.34)   25.68% (4.47)   51.94% (3.81)
75-25                13.32 (0.48)    3.42 (0.39)     3.7 (0.43)      6.2 (0.31)
                     100%            25.69% (2.68)   27.76% (2.98)   46.55% (2.41)
95-5                 22.55 (1.19)    3.81 (1.08)     7.91 (1.16)     10.83 (0.63)
                     100%            16.91% (4.64)   35.06% (4.34)   48.03% (2.83)
Note: all numbers have been multiplied by 100. Bootstrap standard errors with 100 replications in parentheses. The second row of each statistic gives the share of the total change.
Figure for chapter 2
Figure 2.1: Decomposition of differences in distribution using quantile regression
[Plot: total change and effects of characteristics, residuals and coefficients; horizontal axis: quantile.]
Note: Decomposition results obtained by applying formula (2) at each of the 999 per-mille points. Bootstrap standard errors with 100 replications.
Appendix A: Changing the order of the decomposition
In the decomposition presented in table 2.2 and figure 2.1, we first estimated the effect of residuals, then the effect of coefficients and finally the effect of characteristics. This order is somewhat arbitrary, and we could imagine 5 other orders that are not necessarily less sensible. If different orders gave totally different results, the results would be questionable. Therefore, we have estimated all possible decompositions. The effects of residuals, coefficients and characteristics are plotted in figures 2.A1, 2.A2 and 2.A3, respectively. The capital letters represent the orders of the decomposition, as defined by the following formulas:
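$$
\begin{aligned}
\text{A.}\quad & \left(\hat q(\hat\beta_{89},x_{89})-\hat q(\hat\beta_{m89,r73},x_{89})\right)+\left(\hat q(\hat\beta_{m89,r73},x_{89})-\hat q(\hat\beta_{73},x_{89})\right)+\left(\hat q(\hat\beta_{73},x_{89})-\hat q(\hat\beta_{73},x_{73})\right)\\
\text{B.}\quad & \left(\hat q(\hat\beta_{89},x_{89})-\hat q(\hat\beta_{m73,r89},x_{89})\right)+\left(\hat q(\hat\beta_{m73,r89},x_{89})-\hat q(\hat\beta_{73},x_{89})\right)+\left(\hat q(\hat\beta_{73},x_{89})-\hat q(\hat\beta_{73},x_{73})\right)\\
\text{C.}\quad & \left(\hat q(\hat\beta_{89},x_{89})-\hat q(\hat\beta_{89},x_{73})\right)+\left(\hat q(\hat\beta_{89},x_{73})-\hat q(\hat\beta_{m89,r73},x_{73})\right)+\left(\hat q(\hat\beta_{m89,r73},x_{73})-\hat q(\hat\beta_{73},x_{73})\right)\\
\text{D.}\quad & \left(\hat q(\hat\beta_{89},x_{89})-\hat q(\hat\beta_{89},x_{73})\right)+\left(\hat q(\hat\beta_{89},x_{73})-\hat q(\hat\beta_{m73,r89},x_{73})\right)+\left(\hat q(\hat\beta_{m73,r89},x_{73})-\hat q(\hat\beta_{73},x_{73})\right)\\
\text{E.}\quad & \left(\hat q(\hat\beta_{89},x_{89})-\hat q(\hat\beta_{m89,r73},x_{89})\right)+\left(\hat q(\hat\beta_{m89,r73},x_{89})-\hat q(\hat\beta_{m89,r73},x_{73})\right)+\left(\hat q(\hat\beta_{m89,r73},x_{73})-\hat q(\hat\beta_{73},x_{73})\right)\\
\text{F.}\quad & \left(\hat q(\hat\beta_{89},x_{89})-\hat q(\hat\beta_{m73,r89},x_{89})\right)+\left(\hat q(\hat\beta_{m73,r89},x_{89})-\hat q(\hat\beta_{m73,r89},x_{73})\right)+\left(\hat q(\hat\beta_{m73,r89},x_{73})-\hat q(\hat\beta_{73},x_{73})\right)
\end{aligned}
$$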
The results presented in the chapter are those obtained with order A. The conclusions of the chapter are not sensitive to the choice of the order of the decomposition.
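The six orders can be sketched as telescoping sums along paths that switch one component at a time from 1989 to 1973. This is a toy illustration, not the chapter's estimator. We read the subscript "m89, r73" as combining the 1989 median coefficients with the 1973 residual (quantile-specific) part of the coefficients; the letter-to-order mapping is inferred from which orders share a curve in figures 2.A1 to 2.A3; and the function `toy_q` with its numbers is a hypothetical stand-in for the estimated unconditional quantiles.

```python
from itertools import permutations

# Component switched from 1989 to 1973 first, second and third:
# "r" residuals, "m" median coefficients, "x" covariate distribution.
ORDERS = {
    "A": ("r", "m", "x"),
    "B": ("m", "r", "x"),
    "C": ("x", "r", "m"),
    "D": ("x", "m", "r"),
    "E": ("r", "x", "m"),
    "F": ("m", "x", "r"),
}
assert sorted(ORDERS.values()) == sorted(permutations("rmx"))


def sequential_decomposition(q, order):
    """Telescoping split of q(all 1989) - q(all 1973) along one path."""
    state = {"m": "89", "r": "89", "x": "89"}
    effects = {}
    for component in order:
        before = q(**state)
        state[component] = "73"
        effects[component] = before - q(**state)
    return effects


# Hypothetical toy 90th percentile: x_bar * (median coefficient + residual spread * z).
X_BAR = {"73": 1.0, "89": 1.2}
MEDIAN_COEF = {"73": 2.0, "89": 1.9}
RESIDUAL_SPREAD = {"73": 0.5, "89": 0.6}
Z_90 = 1.2816


def toy_q(m, r, x):
    return X_BAR[x] * (MEDIAN_COEF[m] + RESIDUAL_SPREAD[r] * Z_90)


total = toy_q("89", "89", "89") - toy_q("73", "73", "73")
results = {name: sequential_decomposition(toy_q, order) for name, order in ORDERS.items()}
for name, effects in results.items():
    rounded = {k: round(v, 4) for k, v in effects.items()}
    print(name, rounded, "sum:", round(sum(effects.values()), 4))
```

Whatever the quantile function, every order sums to the same total change, and orders that switch a component at the same stage with the same other components give identical effects (A and E for residuals, B and F for coefficients, A and B for characteristics, and so on), which is why these curves coincide in the figures.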
Figure 2.A1: Effects of residuals found by changing the order of the decomposition
[Plot: curves for orders A and E, order B, order C, and orders D and F; horizontal axis: quantile.]
Figure 2.A2: Effects of coefficients found by changing the order of the decomposition
[Plot: curves for order A, orders B and F, orders C and E, and order D; horizontal axis: quantile.]
Figure 2.A3: Effects of characteristics found by changing the order of the decomposition
[Plot: curves for orders A and B, orders C and D, order E, and order F; horizontal axis: quantile.]
Appendix B: Changing the number of quantile regressions
The results presented in the chapter are based on the estimation of 200 quantile regressions in the first step. The asymptotic results of Melly (2006a) are valid only if the number of quantile regressions goes to infinity as the number of observations goes to infinity. Thus, the highest possible number of quantile regressions should be run in the first step. On the other hand, Gosling, Machin and Meghir (2000) propose a similar procedure but estimate only 9 quantile regressions. A possible concern is that the model could be overparametrized with too many quantile regressions. Therefore, as a robustness test, we compare the decomposition results obtained with different numbers of quantile regressions: 10, 100, 200 and 400. Figures 2.B1, 2.B2 and 2.B3 plot the effects of residuals, coefficients and characteristics based on different numbers of quantile regressions. The results are very similar, and we can barely distinguish the results with 100, 200 or 400 quantile regressions. The results with 10 quantile regressions are noisier but not fundamentally different. It follows that too many quantile regressions are not a problem. There is no overparametrization, since each quantile regression estimates a different quantity.
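How the number J of first-step quantile regressions enters the integration over the covariates can be sketched in a few lines. This is a stdlib sketch under stated assumptions, not the chapter's code: `cond_quantile` is a hypothetical stand-in for the fitted quantile regressions, replaced here by the known conditional quantiles of the reference-year heteroscedastic DGP of Appendix D (reading N(0, 0.5) as a standard deviation of 0.5), and the quantile levels are placed at midpoints (j − 0.5)/J with equal mass 1/J, which may differ from the grid used in the chapter.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def cond_quantile(tau, x):
    """Stand-in for a first-step prediction x'beta(tau).

    Hypothetical model: y = 1 + x + (0.5 + 0.125 x) * 0.5 * u with u ~ N(0, 1).
    """
    return 1.0 + x + (0.5 + 0.125 * x) * 0.5 * STD_NORMAL.inv_cdf(tau)


def integrated_quantile(theta, xs, n_regs):
    """theta-quantile of F_hat(q) = mean_i (1/J) sum_j 1{q_i(tau_j) <= q}.

    Uses J = n_regs quantile levels tau_j = (j - 0.5) / J, each with mass 1 / J.
    """
    taus = [(j - 0.5) / n_regs for j in range(1, n_regs + 1)]
    points = sorted(cond_quantile(tau, x) for x in xs for tau in taus)
    # All points carry equal mass, so F_hat first reaches theta at the k-th point.
    k = max(math.ceil(theta * len(points) - 1e-9), 1)
    return points[k - 1]


def exact_quantile(theta, xs, lo=-20.0, hi=30.0, iterations=100):
    """theta-quantile of the exact mixture over xs, by bisection on its CDF."""
    def cdf(q):
        return sum(
            STD_NORMAL.cdf((q - 1.0 - x) / (0.5 * (0.5 + 0.125 * x))) for x in xs
        ) / len(xs)

    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < theta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = random.Random(42)
xs = [rng.gauss(4.0, 1.0) for _ in range(200)]  # made-up covariate sample
target = exact_quantile(0.9, xs)
for n_regs in (10, 100, 200, 400):
    approx = integrated_quantile(0.9, xs, n_regs)
    print(f"J = {n_regs:3d}: q(0.9) = {approx:.4f}   exact: {target:.4f}")
```

With this grid, the integrated distribution function differs from the exact mixture CDF by at most 0.5/J, so the approximation error shrinks as J grows; more quantile regressions refine the grid rather than adding parameters to a single model.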
Figure 2.B1: Effects of residuals with different numbers of quantile regressions
[Plot: curves for 10, 100, 200 and 400 quantile regressions; horizontal axis: quantile.]
Figure 2.B2: Effects of coefficients with different numbers of quantile regressions
[Plot: curves for 10, 100, 200 and 400 quantile regressions; horizontal axis: quantile.]
Figure 2.B3: Effects of characteristics with different numbers of quantile regressions
[Plot: curves for 10, 100, 200 and 400 quantile regressions; horizontal axis: quantile.]
Appendix C: Changes in the distribution of wages over sub-periods
Some insights can be gained by considering periods shorter than the 16 years from 1973 to 1989. In this appendix, we present results for the three sub-periods 1973-1979, 1979-1984 and 1984-1989. Exactly the same procedure was used for each sub-period as for the results presented in the chapter. Figures 2.C1, 2.C2 and 2.C3 plot the results. Overall inequality first decreased between 1973 and 1979, then increased substantially between 1979 and 1984, and also increased, though somewhat more slowly, in the third period. The effect of changes in characteristics on inequality is positive over the whole period. It is logical that changes in characteristics are more continuous than other changes, since it takes time to increase the level of education or experience, for instance. Changes in coefficients reduce inequality in the first period, account for most of the increase in inequality during the second period and are almost insignificant in the third period. Residuals have a moderately positive effect on inequality in all sub-periods. The effects of residuals and of coefficients seem to go in opposite directions during the first period, which would contradict the prediction of a single-index model of skills. It is also interesting to note that the effects of coefficients seem to be connected with the level of the minimum wage. Indeed, over the 1973-1979 period, both the real value of the minimum wage and its coverage rose substantially. By contrast, the real value of the minimum wage decreased markedly from 1979 to 1989.
Figure 2.C1: Decomposition of changes in distribution between 1973 and 1979
[Plot: total change and effects of characteristics, residuals and coefficients; horizontal axis: quantile.]
Figure 2.C2: Decomposition of changes in distribution between 1979 and 1984
[Plot: total change and effects of characteristics, residuals and coefficients; horizontal axis: quantile.]
Figure 2.C3: Decomposition of changes in distribution between 1984 and 1989
[Plot: total change and effects of characteristics, residuals and coefficients; horizontal axis: quantile.]
Appendix D: Monte Carlo comparison of three decomposition methods
In order to compare the quantile regression decomposition with the JMP and DFL decompositions, we have performed a Monte Carlo simulation. We consider two years, r and o, and we want to decompose the changes in the median and in the difference between the 90th and the 10th percentiles of the distribution of the dependent variable y. The wage structure in year r is the reference one. We consider two different data generating processes (DGP) for y r and y o . The first is the model that JMP have in mind:
$$y_r = 1 + x_r + u_r,\quad x_r \sim N(4,1),\quad u_r \sim N(0, 0.5)$$
$$y_o = 0.8 + 1.1\,x_o + u_o,\quad x_o \sim N(5,1),\quad u_o \sim N(0, 0.6).$$
The level of the independent variable increased between r and o, but it does not influence the variability of wages since the error term is independent of x. The return to the covariate increased but the constant decreased. Finally, the variance of the error term increased. The second DGP is a standard location-scale model with a linear heteroscedastic scale function:
$$y_r = 1 + x_r + (0.5 + 0.125\,x_r)\,u_r,\quad x_r \sim N(4,1),\quad u_r \sim N(0, 0.5)$$
$$y_o = 0.8 + 1.1\,x_o + (0.5 + 0.125\,x_o)\,u_o,\quad x_o \sim N(5,1),\quad u_o \sim N(0, 0.5).$$
The changes in characteristics and coefficients are exactly the same as in the first DGP. However, the increase in the level of characteristics now increases the variability of wages because of the heteroscedastic error term. The distribution of residuals conditional on x does not change between the two periods. To implement the JMP decomposition, we assume an independent error term and do not condition on x when we estimate the residual distribution. Most applications of the JMP approach follow a similar procedure; however, it is not clear whether JMP themselves implemented it in this way. The DFL decomposition was implemented as described in the original paper, with a probit.
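For the first DGP the true components can be checked analytically: with independent normal x and u, y is normal, so its median is a + b E[x] and its 90-10 gap is 2 z₀.₉ (b² Var(x) + σ_u²)^(1/2). The sketch below switches residuals, then coefficients, then characteristics from year o to year r, the order used in the chapter. It assumes the second argument of N(0, 0.5) and N(0, 0.6) is a standard deviation; under this reading the interdecile effects come out close to the true values 11.4 and 23.1 of table D1, whereas reading them as variances does not. The names are illustrative.

```python
from statistics import NormalDist

Z_90 = NormalDist().inv_cdf(0.9)

# First DGP of Appendix D. Assumption: the second argument of N(0, .) for u
# is a standard deviation. Keys: "o" = new period, "r" = reference period.
COEF = {"o": (0.8, 1.1), "r": (1.0, 1.0)}        # (intercept, slope)
COVARIATES = {"o": (5.0, 1.0), "r": (4.0, 1.0)}  # (mean of x, sd of x)
RESIDUAL_SD = {"o": 0.6, "r": 0.5}


def median_and_idr(coef, covariates, residuals):
    """Median and 90-10 gap of y = a + b x + u with independent normal x and u."""
    a, b = COEF[coef]
    mean_x, sd_x = COVARIATES[covariates]
    sd_y = (b ** 2 * sd_x ** 2 + RESIDUAL_SD[residuals] ** 2) ** 0.5
    return a + b * mean_x, 2.0 * Z_90 * sd_y


def difference(first, second):
    return [u - v for u, v in zip(first, second)]


# Switch residuals, then coefficients, then characteristics from "o" to "r".
start = median_and_idr("o", "o", "o")
after_residuals = median_and_idr("o", "o", "r")
after_coefficients = median_and_idr("r", "o", "r")
end = median_and_idr("r", "r", "r")

effects = {
    "residuals": difference(start, after_residuals),
    "coefficients": difference(after_residuals, after_coefficients),
    "characteristics": difference(after_coefficients, end),
}
for name, (d_median, d_idr) in effects.items():
    print(f"{name:15s}  median: {100 * d_median:7.2f}   90-10 gap: {100 * d_idr:7.2f}")
```

Multiplied by 100, the median effects are 0, 30 and 100, matching the true values in table D1; the characteristics effect on the 90-10 gap is exactly zero because the variance of x does not change.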
Fourth- and fifth-order polynomials in x were used as regressors in the probit estimation for the first and the second DGP, respectively. These orders were chosen by minimizing the mean squared error (MSE). The quantile regression decomposition presented in the second section of the chapter was implemented with 200 different quantile regressions uniformly distributed between 0 and 1. The results are given in tables D1 and D2. For each estimate, we give the true value, the mean, the standard error and the MSE obtained from 1000 simulations with 2000 observations. The results show that the decompositions of changes in the median are very similar across estimators for both DGPs. The standard errors of the DFL estimates are the highest, as expected, since this estimator is the least restrictive one and does not assume a linear relationship between y and x. The standard errors of the JMP estimates are the lowest for the effects of residuals and coefficients, as expected since the error term is normally distributed. However, the quantile regression decomposition has smaller standard errors for the effects of characteristics. For the decomposition of the 90-10 gap in the first DGP, all estimators are consistent and the ordering of the standard errors is the same as for the changes in the median. For the second DGP, however, the results differ between the JMP decomposition and the two others. The JMP decomposition wrongly attributes to residuals the part of the difference that is due to characteristics. With this DGP, the residuals do not change between the two periods, so it is clearly wrong to attribute part of the increase in inequality to residuals. This error arises from the heteroscedastic error term: the change in the distribution of characteristics puts more weight on the observations with higher within-group inequality.
This is the type of situation that we find in the application. The quantile regression and DFL decompositions are consistent with this type of error term and rightly find that characteristics account for part of the increase in inequality. Note, however, that the standard errors of the components of the DFL decomposition are so high that the MSE of the JMP estimates is smaller, although the JMP estimates are inconsistent. To conclude, these simulations show that the JMP decomposition is the most restrictive decomposition but the most efficient one if all restrictions are satisfied. In particular, assuming independent residuals strongly affects the conclusions in the presence of heteroscedasticity. In this case, the quantile regression decomposition is a consistent alternative. Finally, to be fair to the DFL decomposition, we could imagine a third DGP with a nonlinear functional form. In this case, DFL would be the only consistent method. The choice of the estimator in applications must be guided by the plausibility of the assumptions. No estimator dominates the others in all situations.
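The summary columns of tables D1 and D2 (mean, σ, MSE) can be computed from the simulated estimates with a helper like the one below; the function name is hypothetical. It assumes the MSE is the mean squared deviation from the true value, so that MSE = σ² + bias² when σ uses divisor R (with 1000 replications the choice of divisor hardly matters). The reported MSE column appears consistent with this formula applied on the original scale and then multiplied by 100: for QR, residual effect on the interdecile range in table D1, σ = 3.25 and bias = 12.2 − 11.4 give about 0.11.

```python
import statistics


def monte_carlo_summary(estimates, true_value):
    """Mean, standard deviation and MSE of simulated estimates around the truth."""
    mean = statistics.fmean(estimates)
    sd = statistics.pstdev(estimates)
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    return mean, sd, mse


if __name__ == "__main__":
    # Table D1 (QR, changes in the interdecile range, residuals), original scale:
    # sigma = 3.25 / 100 and bias = (12.2 - 11.4) / 100.
    sigma, bias = 0.0325, 0.008
    print(f"implied MSE x 100: {100 * (sigma ** 2 + bias ** 2):.2f}")
```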
Table D1: Monte Carlo simulation for the first data generating process

         Effects of residuals        Effects of coefficients     Effects of characteristics
Method   true   mean   σ      mse    true   mean   σ      mse    true   mean   σ      mse
Changes in the median
QR       0      -0.04  1.54   0.02   30     30.1   2.63   0.07   100    100    3.67   0.13
JMP      0      0.03   1.01   0.01   30     29.9   2.26   0.05   100    99.96  4.52   0.20
DFL                                  30     30.3   3.46   0.12   100    99.83  4.59   0.21
Changes in the interdecile range
QR       11.4   12.2   3.25   0.11   23.1   22.5   5.00   0.25   0      0.11   5.43   0.30
JMP      11.4   11.8   2.68   0.07   23.1   22.7   4.25   0.18   0      0.13   7.17   0.51
DFL                                  34.5   34.6   9.66   0.93   0      0.18   10.7   1.14
Note: all numbers have been multiplied by 100; 1000 replications; 2000 observations; details in text.
Table D2: Monte Carlo simulation for the second data generating process

         Effects of residuals        Effects of coefficients     Effects of characteristics
Method   true   mean   σ      mse    true   mean   σ      mse    true   mean   σ      mse
Changes in the median
QR       0      -0.03  1.65   0.03   30     30.27  2.71   0.07   100    99.86  3.57   0.13
JMP      0      -0.41  0.85   0.01   30     30.21  2.27   0.05   100    100.3  4.49   0.20
DFL                                  30     30.47  3.82   0.15   100    99.76  4.91   0.24
Changes in the interdecile range
QR       0      0.07   3.14   0.10   22.5   22.75  4.83   0.23   7.5    7.53   5.22   0.27
JMP      0      7.25   2.22   0.57   22.5   22.87  4.34   0.19   7.5    0.11   7.16   1.05
DFL                                  22.5   23.37  10.3   1.08   7.5    6.72   11.1   1.24
Note: all numbers have been multiplied by 100; 1000 replications; 2000 observations; details in text.
Chapter 3
Public and private sector wage distributions controlling for endogenous sector choice
We apply the instrumental variable quantile regression estimator of Chernozhukov and Hansen (2004b and 2006) to examine the wage structure in the public and private sectors in Germany. Assuming exogenous sector choice, we find a negative mean public sector wage premium and show that the wage distribution is more compressed in the public sector. Correcting for endogenous sector choice reverses the findings concerning the mean premium but preserves the more compressed structure of the public sector earnings distribution. Since the original estimator loses its good properties if we allow the public sector premium to vary with the covariates, we propose computationally convenient estimators that achieve good small-sample properties. Applying these estimators, we find that returns to experience and education are generally higher in the private sector.
Keywords: Wage Inequality, Quantile Regression, Instrumental Variables, Wage Differentials, Public and Private Sector.
JEL classification: C13, C14, C21, J31, J45.
Chapter 3: Public/private sector wage distributions
3.1 Introduction
Public sector pay attracts public attention for two reasons. First, public sector labor markets are large and the size of the public sector wage bill has implications for both monetary and fiscal policy. Thus, all taxpayers are concerned whether the government is run on an efficient basis. Second, public sector labor markets are different. There are a number of reasons, surveyed by Bender (1998), why earning differentials between the private and the public sector exist. Most importantly, the public sector is subject to political constraints and not to profit constraints: the goal of a politician is to be re-elected while the objective of a firm is to make a profit (or, at least no deficit). Therefore, issues such as pay equity and fairness can survive in the political market place more than in the economic market place and, hence, the whole distribution of wages is potentially different in the public sector. Given these differences in the wage setting procedures and the possible consequences on the economic well-being of a country, many researchers have sought to ascertain whether an identical employee would earn the same in both sectors. Early research (Smith 1976, Gunderson 1979) used least-squares regressions to compare the predicted wages in both sectors conditionally on basic human capital characteristics. More recently, two different directions of research have complemented these first results. One strand of research concentrates on the mean public sector wage differential and takes account of possible non-random selection by the Heckman (1979) / Lee (1978) correction for selectivity, endogenous switching regression models or fixed effect panel models.33 Most of them do find a significant selection bias, but the results differ widely, probably because of weak (or absence of valid) instruments. 
A second strand of research complements the results for the mean by estimating the distribution of the public sector wage premium, usually applying quantile regression.34 This approach is 33
See for example Goddeeris (1988), Gyourko and Tracy (1988) and Dustmann and Van Soest (1998). 34 See for instance Poterba and Rueben (1995), Mueller (1998) and Melly (2005a).
Chapter 3: Public/private sector wage distributions
99
however based on the assumption that the sector status is exogenous. The main conclusion that can be drawn from these studies is that the public sector seems to compress the conditional distribution of earnings. The objective of this chapter is to reconcile these two strands of research. Until recently, it was impossible to allow simultaneously for endogenous sector choice and heterogeneous public sector gaps at different points of the distribution. During the last few years, however, different instrumental variable methods for estimating endogenous quantile treatment effect have been proposed35. After comparing the different models and estimators, we decided to use the proposal of Chernozhukov and Hansen (2004b and 2006) to estimate the effects of the public sector status on the entire wage distribution controlling for endogenous sector choice. We also present estimates obtained through the procedure of Abadie, Imbens and Angrist (2002) and obtain similar results. We use data from the German Socio-Economic Panel36, which contains background information on parents’ economic status and provides us with reasonable instruments, such as the ones used for instance by Dustmann and Van Soest (1998). The results indicate that there is considerable heterogeneity in the effect of the public sector status and that controlling for endogenous sector choice is important. This chapter contributes to the literature both methodologically and substantially. The estimator proposed by Chernozhukov and Hansen is a convenient estimator if the number of endogenous variables is small, but becomes more and more difficult to implement when this number increases. In order to allow the public sector wage premium to vary with the covariates, we need to estimate a fully interacted model which would be practically infeasible in term of computation time. Therefore, we propose two estimators that are tractable in this configuration. 
Their asymptotic distributions are derived, and a Monte Carlo simulation shows that they have good finite-sample properties.

³⁵ Abadie, Angrist and Imbens (2002), Chernozhukov and Hansen (2004b and 2006), Chesher (2003 and 2005), Ma and Koenker (2006), Lee (2004), Honore and Hu (2004), Hong and Tamer (2003), among others.
³⁶ For an English-language description of the GSOEP, see SOEP Group (2001).
This chapter also contributes to the literature on the public-private sector wage differential. To the best of our knowledge, it is the first study that both controls for endogenous sector choice and analyzes the public and private sector wage distributions. Heterogeneous public sector gaps and endogenous sector choice are shown to be present and important. In summary, correcting for endogenous sector choice reverses the findings concerning the mean premium but preserves the more compressed structure of the public sector earnings distribution. Applying the new estimators, we find that returns to education are higher and that the experience-wage profile is more concave in the private sector.

In Section 3.2, we compare the different estimators that have recently been proposed to correct for endogeneity in the quantile regression model, and we present the estimator of Chernozhukov and Hansen (2004b and 2006) in detail. In Section 3.3, we propose two convenient estimators for the case where the endogenous variable is fully interacted with the covariates, and we present the results of Monte Carlo simulations. In Section 3.4, we show how the unconditional wage distribution can be recovered when the conditional wage distribution has been estimated by quantile regression, and how this result can be used to decompose differences in distribution. Section 3.5 describes the data set, including the instruments, along with some descriptive statistics. Section 3.6 presents the empirical results, and Section 3.7 concludes.
3.2
Endogeneity in the quantile regression model
The basic quantile regression model specifies the conditional quantile as a linear function of covariates. Let Y be the dependent variable of interest and X be a vector of exogenous explanatory variables. It is assumed that

$$Y = X'\beta(\tau) + \varepsilon \quad \text{and} \quad F_{\varepsilon}^{-1}(\tau \mid X) = 0,$$

where $F_{\varepsilon}^{-1}(\tau \mid X)$ denotes the τth quantile of ε conditional on X. Koenker and Bassett (1978) propose to estimate the τth regression quantile by solving

$$\hat\beta(\tau) = \arg\min_{\beta \in \mathbb{R}^K} \sum_{i=1}^{N} \rho_\tau\left(y_i - x_i'\beta\right),$$

where $\rho_\tau(z) = z\left(\tau - 1(z \le 0)\right)$ is the check function and $1(\cdot)$ is the indicator function. They show $\sqrt{N}$-consistency and asymptotic normality of $\hat\beta(\tau)$. As noted by Buchinsky (1991), this estimator has a GMM interpretation, since the true $\beta(\tau)$ satisfies the moment condition

$$E\left[\left(\tau - 1\left(Y < X'\beta(\tau)\right)\right) X\right] = 0.$$

By increasing τ continuously from 0 to 1, we can trace the entire distribution of Y conditional on X. Complementing a model of conditional central tendency with a family of models for conditional quantiles gives a more complete picture of the effect of the covariates on the dependent variable, because it allows the covariates to influence the location, scale and shape of the response distribution. For instance, the distributional consequences of minimum wages, training programs and education are of primary interest to policy makers.

Unfortunately, in most cases, the treatment is self-selected or endogenous, making conventional quantile regression inappropriate. Amemiya (1982) was the first to seriously consider quantile regression methods in the presence of endogenous regressors. He shows the consistency and asymptotic normality of a class of two-stage median regression estimators. Subsequent work by Powell (1983) and Chen and Portnoy (1996) extended this approach but maintained the focus primarily on the conditional median problem; the main motivation of these works was the robustness of median regression. Chernozhukov and Hansen (2001) show that this "fitted value" approach is not consistent when the quantile treatment effect differs across quantiles, which is precisely the main motivation for using quantile regression.

Other approaches have been considered as well. Chesher (2003) develops a general nonparametric model which may be viewed as an extension of the recursive causal chain models discussed by Strotz and Wold (1969). He shows the nonparametric identification of the parameters of interest. Based on his results, Ma and Koenker (2006) propose two estimators, but they assume a finite-dimensional parametric restriction and integrate over the nonparametric estimates. The identification strategy of Chesher (2003) requires the dependent variable, the endogenous variables and the instruments to be continuous. Although Chesher (2005) shows that extensions to discrete variables are possible, he excludes the case of a binary endogenous variable, which is the situation encountered in the public-private sector application.

Abadie, Angrist and Imbens (2002) propose a parametric estimator based on the LATE model of Imbens and Angrist (1994). Their estimator applies only to a special case: a binary treatment variable D and a binary instrument Z.³⁷ They impose an independence condition (Z is independent of the errors in the outcome and selection equations) and a monotonicity condition (the direction of the effect of Z on the participation decision is the same for all individuals). They show that under these assumptions the marginal distributions of the potential outcomes are identified for the subpopulation of compliers, and they suggest an ingenious estimator that can be interpreted as a reweighted standard quantile regression estimator.

If we assume that the quantile treatment effects are homogeneous, a more direct estimation strategy can rely on the exclusion restrictions (instruments) in the GMM framework. Suppose we have a structural relationship defined by
$$Y = D'\alpha(U) + X'\beta(U), \qquad U \mid X, Z \sim \text{Uniform}(0,1), \tag{1}$$

$$\tau \mapsto D'\alpha(\tau) + X'\beta(\tau) \text{ is strictly increasing in } \tau, \tag{2}$$

$$D = \delta(X, Z, V). \tag{3}$$

In these equations, Y is the scalar outcome of interest, U is a scalar unobserved random variable, D is a vector of endogenous variables determined by (3), X is a vector of exogenous control variables, Z is a vector of instrumental variables, and V is a vector of unobserved random variables possibly correlated with U. Conditions (1) and (2) imply that

$$\Pr\left(Y \le D'\alpha(\tau) + X'\beta(\tau) \mid X, Z\right) = \tau, \tag{4}$$

thus providing the moment conditions

$$E\left[\left(\tau - 1\left(Y \le D'\alpha(\tau) + X'\beta(\tau)\right)\right)\left(X', Z'\right)'\right] = 0. \tag{5}$$

³⁷ Extensions that allow for continuous instruments in a similar way as the local instrumental variable estimator for the mean (Heckman and Vytlacil, 1999) are possible. See Carneiro and Lee (2005) for recent developments.
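To illustrate condition (4), the following Python sketch simulates a hypothetical instance of model (1) to (3) with one binary endogenous variable D, a constant as the only element of X, and a binary instrument Z. The functional forms α(u) = 1 + 0.5Φ⁻¹(u) and β(u) = Φ⁻¹(u), the selection rule D = 1(2Z − 1 + V > 0), and the correlation of 0.7 between Φ⁻¹(U) and V are assumptions chosen for the example, not taken from the thesis. Within each instrument group, the share of observations with Y ≤ Dα(τ) + β(τ) should be close to τ. Conditioning on the endogenous D instead of Z breaks this property.

```python
import numpy as np
from statistics import NormalDist

def simulate(n, rho=0.7, seed=0):
    """Draw (Y, D, Z) from a hypothetical instance of model (1)-(3).

    U = Phi(e) is Uniform(0,1) and independent of Z; V is correlated with e,
    so the binary treatment D = 1(2Z - 1 + V > 0) is endogenous.
    """
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)                     # binary instrument
    e = rng.standard_normal(n)                         # e = Phi^{-1}(U)
    v = rho * e + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    d = (2 * z - 1 + v > 0).astype(int)                # selection equation (3)
    # Assumed forms: alpha(u) = 1 + 0.5 * Phi^{-1}(u), beta(u) = Phi^{-1}(u)
    y = d * (1 + 0.5 * e) + e                          # structural equation (1)
    return y, d, z

def structural_quantile(tau, d):
    """D * alpha(tau) + beta(tau); increasing in tau for D in {0, 1}, as in (2)."""
    q = NormalDist().inv_cdf(tau)
    return d * (1 + 0.5 * q) + q

tau = 0.25
y, d, z = simulate(200_000)
hit = y <= structural_quantile(tau, d)
# Condition (4): the hit rate equals tau within each instrument group ...
share_by_z = [hit[z == k].mean() for k in (0, 1)]
# ... but not within groups defined by the endogenous D
share_by_d = [hit[d == k].mean() for k in (0, 1)]
print("by Z:", np.round(share_by_z, 3), "by D:", np.round(share_by_d, 3))
```

Because the structural quantile function is increasing in τ for each value of D, the event Y ≤ Dα(τ) + β(τ) is the same as U ≤ τ. This is why the check holds only for the variable that is independent of U.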
Assuming iid sampling, compactness of the support of the variables and of the parameter space, and some full rank conditions ensuring that the parameters are identified,³⁸ we could estimate α(τ) and β(τ) by traditional GMM. This strategy was used by Hong and Tamer (2003), Chen, Linton and Van Keilegom (2003) and Honore and Hu (2004) to construct estimators.³⁹ Hong and Tamer (2003) also discuss conditions under which this model is identified. Abadie (1995) noted the computational difficulty of solving the optimization problem. The objective function is "million-modal" and has zero derivative almost everywhere. A grid search over a subset of $\mathbb{R}^{\dim(\alpha)+\dim(\beta)}$ is therefore needed, which makes these estimators almost impossible to apply in the data sets typically found in microeconometrics.⁴⁰

The instrumental variable quantile regression (IVQR) estimator proposed by Chernozhukov and Hansen (2004b) can be viewed as a computationally attractive method of approximately solving the moment condition (5). Their basic idea is simple. If we knew the true coefficients α(τ), we could estimate β(τ) consistently by regressing Y − D'α(τ) on X with traditional quantile regression. In reality, we do not know α(τ), but we have instruments Z. Thus, we can try different values for α(τ) and regress Y − D'α(τ) on X and Z. If the model is identified, the true value of α(τ) is the only one for which the coefficients on Z are zero. In practice, we choose the value that makes the estimated coefficients on Z closest to zero, as measured by a Wald statistic. This reduces the computation time considerably, since the grid search is needed only over the dim(α)-dimensional space, which is frequently small. Moreover, quantile regression can be solved very fast using interior point algorithms (Portnoy and Koenker, 1997).⁴¹

In the public-private sector application, we have an endogenous dummy variable. Thus, only the approaches of Abadie, Angrist and Imbens (2002) and Chernozhukov and Hansen (2004b) can be applied. A comparison of the two models shows that Abadie, Angrist and Imbens (2002) impose more restrictive conditions on the choice equation but allow for heterogeneity⁴² in responses. When the assumptions of both models are satisfied, they estimate the same quantity. A comparison of both sets of estimates therefore provides a useful robustness check.

We concentrate principally on the approach of Chernozhukov and Hansen for two reasons. First, we have five binary instruments in our application and a relatively small sample, which makes it very difficult to estimate heterogeneous quantile treatment effects. The estimator of Chernozhukov and Hansen (2004b) allows us to use all five instruments to estimate the quantile treatment effects. For comparison, we also apply the estimator of Abadie, Angrist and Imbens (2002) with only one instrument, the most significant one. Second, only Chernozhukov and Hansen (2005b) have derived the properties of the instrumental quantile regression process and have proposed consistent testing procedures based on that process. This is important in the present application. Significant results cannot be obtained by considering a single instrumental variable quantile regression, but only by using the whole instrumental variable quantile regression process.

³⁸ See for instance assumption R2 in Chernozhukov and Hansen (2006). Basically, it requires that a density-weighted covariance matrix between D and Z is of full rank.
³⁹ Honore and Hu use the moment condition directly. Chen, Linton and Van Keilegom and Hong and Tamer use a minimum distance framework with nonparametric first-step estimation. Chen, Linton and Van Keilegom propose an extension for partially linear models and Hong and Tamer for censored models.
⁴⁰ This applies also to the robust LIML estimator of Sakata (2006). Moreover, the main objective of his estimator is to robustify the traditional LIML estimator, not to estimate the effect of endogenous variables on the distribution of the potential outcome.
⁴¹ Details about the procedure and the asymptotic distribution of the estimator are given in Appendix A.
⁴² Here we mean heterogeneity conditional on X and the quantile of interest. Naturally, both estimators allow for different treatment effects at different quantiles.
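The inverse quantile regression idea of Chernozhukov and Hansen can be sketched in Python for the simplest configuration, which needs no linear programming solver: one binary endogenous D, a constant as the only exogenous regressor, and one binary instrument Z. The quantile regression of Y − Dα on (1, Z) is then saturated. Its coefficient on Z therefore equals the difference between the τth quantiles of Y − Dα in the two instrument groups. Two simplifications are assumptions of this sketch. First, it minimizes the absolute value of that coefficient instead of the Wald statistic used by Chernozhukov and Hansen. Second, the simulated model (α(u) = 1 + 0.5Φ⁻¹(u), β(u) = Φ⁻¹(u), with endogenous selection through V correlated with U) is hypothetical.

```python
import numpy as np

def sample_quantile(a, tau):
    """Smallest order statistic whose empirical CDF is >= tau.

    This value minimizes sum(rho_tau(a - q)) over q, i.e. it is an exact
    tau-th quantile regression of `a` on a constant.
    """
    a = np.sort(np.asarray(a))
    k = int(np.ceil(tau * a.size - 1e-9))
    return a[max(k, 1) - 1]

def ivqr_binary(y, d, z, tau, grid):
    """Grid search: pick alpha whose coefficient on Z is closest to zero."""
    best_alpha, best_stat = None, np.inf
    for alpha in grid:
        r = y - d * alpha                          # candidate outcome Y - D * alpha
        # Saturated QR of r on (1, Z): the Z coefficient is a difference of group quantiles
        gamma = sample_quantile(r[z == 1], tau) - sample_quantile(r[z == 0], tau)
        if abs(gamma) < best_stat:
            best_alpha, best_stat = alpha, abs(gamma)
    return best_alpha

# Hypothetical data: alpha(u) = 1 + 0.5 * Phi^{-1}(u), beta(u) = Phi^{-1}(u)
rng = np.random.default_rng(1)
n, rho, tau = 50_000, 0.7, 0.5
z = rng.integers(0, 2, size=n)
e = rng.standard_normal(n)                         # e = Phi^{-1}(U)
v = rho * e + np.sqrt(1 - rho**2) * rng.standard_normal(n)
d = (2 * z - 1 + v > 0).astype(int)                # endogenous treatment
y = d * (1 + 0.5 * e) + e

grid = np.round(np.arange(-1.0, 3.0 + 1e-9, 0.01), 2)  # steps of length 0.01
alpha_iv = ivqr_binary(y, d, z, tau, grid)
# Naive saturated QR of Y on (1, D), which ignores endogeneity
alpha_naive = sample_quantile(y[d == 1], tau) - sample_quantile(y[d == 0], tau)
print("true alpha(0.5) = 1.0, IVQR:", alpha_iv, "naive QR:", round(float(alpha_naive), 3))
```

In this design the true median effect is α(0.5) = 1, and the grid search should land near it. The naive quantile regression of Y on (1, D) is pulled upward because individuals with high U select into D = 1. With more exogenous regressors, the inner step would be a genuine quantile regression solved by linear programming, and the grid would still run only over α.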
3.3
Estimation of interacted models
In order to allow the public sector wage premium to vary with X, we need to estimate a fully interacted model. This dramatically increases the number of endogenous covariates to dim(D)·dim(X), which is equal to 8 in our application. Although the moment conditions will be highly correlated, we can generate a sufficient number of instruments by interacting the instruments with the exogenous regressors. However, the computation time needed to perform grid searches over such high-dimensional parameter spaces makes the procedure infeasible. Therefore, this section proposes two computationally tractable estimators for fully interacted models. The first uses the sample selection correction of Buchinsky (1998) to estimate the slope parameters and the estimator of Chernozhukov and Hansen to estimate the constants. The second applies the estimator of Chernozhukov and Hansen locally at different points of the distribution of X and then, in a second step, estimates the global parameters in the minimum distance framework.

3.3.1
Combination of sample selection and IV quantile regression
Buchinsky (1998) proposes a sample selection procedure for quantile regression. His estimator can be considered the quantile regression equivalent of the series estimator suggested by Newey (1988). The key assumption is a single index restriction on the error term: its distribution is assumed to depend on the regressors X and the instruments Z only through an index function, which can be estimated in a first step. The bias term can then be approximated by a power series in the estimated index. The constant is estimated "at infinity" using an idea suggested by Heckman (1990) and Andrews and Schafgans (1996) for the estimation of the constant term in mean regression with selectivity. In the public-private sector application, D is an endogenous dummy variable that is equal to 0 if the person works in the private sector and equal to 1 if the person works in the public sector. The idea of the estimator proposed in this section is to reformulate a fully interacted instrumental variable model as a switching regression model:
$$Y_0 = \alpha_0(\tau) + X'\beta_0(\tau) + \varepsilon_0,$$
$$Y_1 = \alpha_1(\tau) + X'\beta_1(\tau) + \varepsilon_1,$$
$$Y = DY_1 + (1 - D)Y_0.$$

Therefore, by considering the two wage equations separately, we can estimate them using the sample selection correction of Buchinsky (1998). The constant terms can theoretically be estimated "at infinity", as proposed by Buchinsky, if there are some observations with Pr(D = 1) → 1 and others with Pr(D = 1) → 0. Such a method has two drawbacks. First, it requires very strong large-support conditions. Second, estimation that directly follows the identification strategy involves estimation on "thin sets" and thus a slow rate of convergence. The estimation often rests on just a handful of observations surpassing the growing threshold, which may be hard to distinguish from outliers.

Thus, we propose to estimate only the slope coefficients with the sample selection procedure of Buchinsky, which yields $\sqrt{N}$-consistent and asymptotically normally distributed estimates $\hat\beta_0(\tau)$ and $\hat\beta_1(\tau)$. In a second step, both constant terms are estimated using a slightly modified version of the instrumental quantile regression estimator: we use the Chernozhukov and Hansen (2004b) estimator with $Y - (1 - D)X'\hat\beta_0(\tau) - DX'\hat\beta_1(\tau)$ as the dependent variable and only a constant as exogenous regressor. Using traditional results for sequential GMM estimators, we can prove that $\hat\beta_0(\tau)$, $\hat\beta_1(\tau)$, $\hat\alpha_0(\tau)$ and $\hat\alpha_1(\tau)$ are $\sqrt{N}$-consistent and asymptotically jointly normally distributed. All the details about the estimator (called SIVQR), its asymptotic distribution and the proofs can be found in Appendix B.

3.3.2
Integration of nonparametric first step estimates
When the distribution of the covariate vector X has finite support, the minimum distance framework provides an alternative estimation procedure. Buchinsky (1991, chapter 1, Section 9) and Chamberlain (1994) derive and apply such an estimator for the exogenous case. In the presence of endogenous regressors, the idea is to estimate an IV quantile regression separately in each cell and then use the minimum distance framework to obtain $\sqrt{N}$-consistent and asymptotically normally distributed estimates of the coefficients. The asymptotic distribution of this estimator (called MDIVQR) can be found in Appendix C. Contrary to the method proposed in Section 3.3.1, this approach can be directly extended to the case of continuous endogenous variables. Moreover, it does not require the single index assumption for consistency.

On the other hand, the problem becomes more complicated if some of the exogenous variables are continuous. In that case, we can first estimate the quantile function nonparametrically at each observation using a locally weighted version of the instrumental quantile regression estimator, and then use the minimum distance framework to obtain an estimate of the finite-dimensional parameters. Under some conditions on the kernel and the bandwidth, the second step is $\sqrt{N}$-consistent, since we integrate over all observations and there are only a finite number of parameters to estimate. Chen, Linton and Van Keilegom (2003) provide the basic framework for deriving the asymptotic distribution of this estimator. We will not follow this method in this chapter, however. First, the Monte Carlo simulation in the following subsection shows that the small sample properties of the minimum distance estimator are worse than those of the SIVQR. Second, we would lose the computational advantage that is the principal reason for using this estimator.
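For a discrete X, the second step of the MDIVQR can be sketched as a classical minimum distance problem. In this hypothetical Python example, a scalar cell-level estimate π̂_c with variance s²_c is available for each of C cells (for instance, an IV quantile treatment effect estimated within the cell), and the restriction to be imposed is π_c = r_c'θ. Since estimates from different cells are independent, the optimal weight matrix is W = diag(1/s²_c). The estimator is then θ̂ = (R'WR)⁻¹R'Wπ̂, with estimated covariance (R'WR)⁻¹. The scalar-per-cell setup, the cell design and all numbers are assumptions of this sketch. The asymptotic theory of the actual estimator is in Appendix C.

```python
import numpy as np

def minimum_distance(pi_hat, var_hat, R):
    """Optimal minimum distance estimator for independent cell-level estimates.

    pi_hat : (C,) cell-specific first-step estimates
    var_hat: (C,) their estimated variances
    R      : (C, K) cell regressors in the restriction pi_c = R[c] @ theta
    Returns theta_hat, its estimated covariance, and the overidentification
    statistic, which is asymptotically chi2(C - K) under the restriction.
    """
    W = np.diag(1.0 / var_hat)                 # inverse-variance weights
    RtW = R.T @ W
    cov = np.linalg.inv(RtW @ R)               # (R'WR)^{-1}
    theta = cov @ RtW @ pi_hat                 # weighted least squares solution
    resid = pi_hat - R @ theta
    j_stat = float(resid @ W @ resid)
    return theta, cov, j_stat

# Hypothetical design: 6 cells = 3 education levels x 2 experience groups
educ = np.array([0, 0, 1, 1, 2, 2])
exper = np.array([0, 1, 0, 1, 0, 1])
R = np.column_stack([np.ones(6), educ, exper])  # pi_c = t0 + t1*educ + t2*exper
theta_true = np.array([-0.05, 0.02, -0.03])
pi_exact = R @ theta_true                        # noiseless cell estimates
var_hat = np.array([0.010, 0.020, 0.015, 0.010, 0.030, 0.025]) ** 2

theta_hat, cov, j_stat = minimum_distance(pi_exact, var_hat, R)
print(theta_hat, np.sqrt(np.diag(cov)), j_stat)
```

With noiseless cell estimates the restriction holds exactly, so θ̂ reproduces θ and the overidentification statistic is zero. With real first-step estimates, cells with smaller variances receive more weight.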
3.3.3
Finite sample properties of these estimators
In order to compare the performance of the two estimators proposed above with the IVQR, and to evaluate the costs of allowing for fully interacted models in terms of variance, we present the results of a Monte Carlo simulation.⁴³ We choose the simplest data generating process that permits us to consider these issues:

$$Y = \alpha + X\beta_x + D\beta_d + XD\beta_{xd} + (1 + 0.2D)U, \tag{6}$$
$$X \sim b(0.5), \quad Z \sim N(0,1), \quad D = 1(0.5 - X + Z + \varepsilon < 0),$$
$$\varepsilon \sim t_3, \quad U \sim t_3, \quad \operatorname{Cov}(\varepsilon, U) = 0.8.$$

We have only one endogenous binary variable, one exogenous binary variable and one instrument. The amount of endogeneity is quite high, with a correlation of 0.8 between the two error terms. Traditional parametric binary choice models are not consistent here, since the error term ε follows a t-distribution with 3 degrees of freedom. The treatment effect varies with the quantile because of heteroscedasticity. We consider two different sets of parameters: a model without interaction term ($\alpha = 0$, $\beta_x = 1$, $\beta_d = 1$, $\beta_{xd} = 0$) and a model with interaction term ($\alpha = 0$, $\beta_x = 2$, $\beta_d = 1$, $\beta_{xd} = -1$).

We set the number of observations to 100, 400 and 1600 and draw 10000 replications. In each replication, we apply the traditional quantile regression estimator (QR), the IVQR of Chernozhukov and Hansen with and without interaction term, the SIVQR and finally the MDIVQR. The Chernozhukov and Hansen estimators are obtained by searching on regular grids with steps of length 0.01. Since we have only one endogenous and one exogenous regressor, the grid search is only two-dimensional if we allow for an interaction term. In the application, we have 8 exogenous regressors, and grid search becomes infeasible. For the SIVQR, the selection equation is estimated by the Klein and Spady (1993) semiparametric estimator. Since the results are similar for different quantiles, we present only the outputs for the median regression.

Tables 3.1 and 3.2 give the bias, the standard error (S.E.) and the mean squared error (MSE) of each estimator, for the cases without and with interaction term, respectively. The simulations confirm that the three estimators that allow for an interaction term between D and X are consistent in both cases. The IVQR without an interaction term, by contrast, converges to values that are difficult to interpret when the true model contains one. The $\sqrt{N}$ convergence rates are confirmed, since quadrupling the sample size divides the standard errors by about 2 and the MSE by about 4. We also note that QR is heavily biased, which is due to the high level of endogeneity in the data generating processes.

Naturally, allowing for endogeneity and an interaction term has a price in terms of variance: the standard errors of the IVQR are up to 80% higher than those of QR, and allowing for an interaction term can double the standard errors of the estimates. Finally, comparing the three estimators that allow for an interaction term and endogeneity, we find that the SIVQR is always more precise than the other two, and that the differences are not negligible (from 20% up to 100%). The MDIVQR is asymptotically equivalent to the IVQR that allows for an interaction term, but has better small sample properties. Thus, based on the computational and statistical properties of the estimators and the results of these Monte Carlo simulations, we apply the SIVQR estimator in Section 3.6.4.

⁴³ Software to implement the proposed estimators in R and to replicate the Monte Carlo simulations is available at the author's website. R is an open-source programming environment and can be downloaded at no cost from the site www.r-project.org.
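The following minimal Python sketch draws from the data generating process (6) for the model with interaction term (α = 0, βx = 2, βd = 1, βxd = −1). The text does not say how the correlated t₃ errors are generated. Here we assume a bivariate t distribution built from correlated normals and a common χ²₃ mixing variable, and we read the 0.8 as a correlation, as the surrounding text does. Because X and D are both binary, the median regression of Y on (1, X, D, XD) is saturated. The naive QR estimate of βd therefore equals the difference in medians between treated and untreated observations with X = 0. The true median effect at X = 0 is βd = 1, since U has median 0. The large sample size is only meant to make the endogeneity bias visible.

```python
import numpy as np

def simulate_dgp6(n, a=0.0, bx=2.0, bd=1.0, bxd=-1.0, rho=0.8, df=3, seed=0):
    """Draw (Y, X, D, Z) from DGP (6); the bivariate-t construction is assumed."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n)              # X ~ b(0.5)
    z = rng.standard_normal(n)                    # Z ~ N(0, 1)
    g1 = rng.standard_normal(n)
    g2 = rho * g1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    w = np.sqrt(rng.chisquare(df, size=n) / df)   # shared mixing variable
    eps, u = g1 / w, g2 / w                       # t_3 marginals, correlation rho
    d = (0.5 - x + z + eps < 0).astype(int)       # D = 1(0.5 - X + Z + eps < 0)
    y = a + x * bx + d * bd + x * d * bxd + (1 + 0.2 * d) * u
    return y, x, d, z

y, x, d, z = simulate_dgp6(200_000)
# Saturated naive median regression: the D coefficient is a difference of cell medians
naive_bd = np.median(y[(x == 0) & (d == 1)]) - np.median(y[(x == 0) & (d == 0)])
print("true beta_d = 1.0, naive median-regression estimate:", round(float(naive_bd), 3))
```

Since D = 1 when ε is small and ε is positively correlated with U, treated observations have systematically low U. The naive estimate therefore falls far below the true effect, in line with the heavy QR bias reported in Tables 3.1 and 3.2.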
3.4
Decomposition of differences in distribution
The most basic approach to exploring wage differentials between groups or sectors involves estimating an earnings regression on pooled data, including a dummy variable for a worker's sector of employment. This specification can be estimated for the conditional mean or the conditional quantiles of the dependent variable. If the sector of employment is considered to be endogenous, the conditional mean of the log wage can be estimated by traditional instrumental variable methods. For quantile regression, the recent developments presented in Section 3.2 allow one to correct for endogeneity. This simple dummy variable approach is easy to estimate, and the results are trivial to interpret, since the "discrimination" part of the difference is the same for all observations at the same point of the distribution. However, this specification implies a very strong restriction: the returns to human capital characteristics are constrained to be equal across sectors, so the effect of a worker's sector of employment is limited to an intercept effect. Since this restriction is often violated by the data, alternative methodologies have been put forward.

The first step naturally consists of estimating the wage equation separately for each sector, which allows the discrimination to differ at different points of the distribution of the covariates. One way to present the results is to consider the expected wage rates or the quantiles of the wage distributions in the public and private sectors for reference individuals. Another common procedure consists of aggregating the results. The Oaxaca (1973) / Blinder (1973) decomposition is the best known decomposition procedure for differences at the mean. It makes it easy to decompose the total difference into a part explained by different characteristics and a part explained by different coefficients. Decomposing differences in distribution is a more complex problem because, contrary to the mean, the quantile of a linear function is not equal to the linear function of the quantile. Melly (2006a) proposes an intuitive procedure to decompose differences at different quantiles of the unconditional distribution. In a first step, the conditional distribution is estimated by quantile regression. In the second step, the conditional distribution is integrated over the range of the covariates. Formally, let
$$\hat\beta = \left(\hat\beta(\tau_1), \ldots, \hat\beta(\tau_j), \ldots, \hat\beta(\tau_J)\right)$$

be the quantile regression coefficients estimated at J different quantiles $0 < \tau_j < 1$, $j = 1, \ldots, J$. Integrating over all quantiles and over all observations, a natural estimator of the θth unconditional quantile of the dependent variable is given by

$$q(\theta, X, \hat\beta) = \inf\left\{ q : \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{J}\left(\tau_j - \tau_{j-1}\right) 1\left(x_i'\hat\beta(\tau_j) \le q\right) \ge \theta \right\}.$$
Melly (2006a) shows that this estimator is consistent and asymptotically normally distributed. Consistent estimators of the variances are also proposed. Now, we can estimate counterfactual distributions by replacing either the estimated coefficients or the distribution of characteristics in one sector by the estimated coefficients or the distribution of characteristics in the other sector. It is thus possible to split the difference at each quantile of the unconditional distribution into a part explained by coefficients and a part explained by characteristics:
$$q(\theta, X_{pub}, \beta_{pub}) - q(\theta, X_{priv}, \beta_{priv}) = \left[q(\theta, X_{pub}, \beta_{pub}) - q(\theta, X_{pub}, \beta_{priv})\right] + \left[q(\theta, X_{pub}, \beta_{priv}) - q(\theta, X_{priv}, \beta_{priv})\right],$$

where the first bracket represents the effect of differences in coefficients (discrimination) and the second bracket represents the effect of differences in the distribution of characteristics (justified differential). In the presence of endogenous sector choice, this procedure can be applied with the coefficients estimated by the procedures proposed in Section 3.3. We can then estimate the wage distributions that we would observe without the sample selection bias. The difference between the quantiles of the unconditional public sector distribution and those of the unconditional private sector distribution can thus be decomposed into three components: the effect of endogenous sector choice, the effect of differences in coefficients, and the effect of differences in the distribution of characteristics.
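The estimator of the unconditional quantile and the two-part decomposition can be sketched in Python as follows. The sketch pools the N×J fitted values x_i'β̂(τ_j), gives each one weight (τ_j − τ_{j−1})/N, and returns the smallest fitted value at which the cumulative weight reaches θ. We assume τ₀ = 0, which the formula needs for j = 1, and a grid τ_j = (j − 0.5)/J. The covariates and coefficient paths are invented for illustration and are not estimates from the GSOEP.

```python
import numpy as np
from statistics import NormalDist

def unconditional_quantile(theta, X, B, taus):
    """theta-th unconditional quantile implied by quantile regression coefficients.

    X    : (N, K) covariate matrix (first column = constant)
    B    : (K, J) coefficients, column j holds beta_hat(tau_j)
    taus : (J,) increasing grid in (0, 1); tau_0 = 0 is assumed, and theta
           must not exceed taus[-1]
    """
    n = X.shape[0]
    fitted = (X @ B).ravel()                          # x_i' beta(tau_j), row-major (i, j)
    step = np.diff(np.concatenate(([0.0], taus)))     # tau_j - tau_{j-1}
    weights = np.tile(step, n) / n                    # same (i, j) order as `fitted`
    order = np.argsort(fitted, kind="stable")
    cum = np.cumsum(weights[order])                   # estimated unconditional CDF
    k = np.searchsorted(cum, theta, side="left")      # first point where CDF >= theta
    return fitted[order][k]

def decompose(theta, X_pub, B_pub, X_priv, B_priv, taus):
    """Split the quantile gap into coefficient and characteristics parts."""
    def q(X, B):
        return unconditional_quantile(theta, X, B, taus)
    total = q(X_pub, B_pub) - q(X_priv, B_priv)
    coef = q(X_pub, B_pub) - q(X_pub, B_priv)         # differences in coefficients
    char = q(X_pub, B_priv) - q(X_priv, B_priv)       # differences in characteristics
    return total, coef, char

# Hypothetical inputs: a constant and years of schooling; coefficients are made up
rng = np.random.default_rng(0)
taus = (np.arange(1, 101) - 0.5) / 100
z = np.array([NormalDist().inv_cdf(t) for t in taus])
X_priv = np.column_stack([np.ones(2000), rng.integers(9, 19, size=2000)])
X_pub = np.column_stack([np.ones(1000), rng.integers(10, 19, size=1000)])
B_priv = np.vstack([1.8 + 0.40 * z, 0.08 + 0.010 * z])   # wider conditional spread
B_pub = np.vstack([2.2 + 0.25 * z, 0.06 + 0.005 * z])

for theta in (0.1, 0.5, 0.9):
    total, coef, char = decompose(theta, X_pub, B_pub, X_priv, B_priv, taus)
    print(f"theta={theta}: total={total:.3f}, coefficients={coef:.3f}, characteristics={char:.3f}")
```

The counterfactual q(θ, X_pub, β_priv) combines public sector characteristics with private sector coefficients. The two brackets therefore add up to the total gap by construction.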
3.5
Data, descriptive statistics and instruments
The analysis in this chapter draws on data from the German Socio-Economic Panel (GSOEP) for the year 2003. It would be interesting to use the panel structure of the data to estimate a fixed effects model. Unfortunately, there is not enough movement between the public and the private sector to obtain useful results. Therefore, we concentrate in this chapter on the last wave of the panel, and we control for endogeneity of the sector choice by instrumental variable methods. After reunification, the panel was extended to include the eastern part of Germany, but we focus here on West Germany because substantial economic differences persist between East and West Germany. Since many public sector jobs are not open to foreign nationals, the analysis is based on the subsample of Germans only. Furthermore, the sample is restricted to men who were between 17 and 65 years old and who were in full-time or part-time employment.⁴⁴ As the sample includes only wage earners, the results must be interpreted conditional on the selected sample. However, since we concentrate on males, we expect this selection bias to be unimportant. Finally, all observations with a missing value for one of the variables have been excluded.⁴⁵ The final dataset contains 3125 observations.

Table 3.3 defines the variables we use for our empirical analyses. Y, the dependent variable, is Lnghwage, the logged gross hourly wage. X, the vector of regressors assumed to be exogenous, contains a quadratic in potential experience and five educational dummies. D, the endogenous variable, is Psect, a dummy variable equal to 1 if the person is employed in the public sector and 0 if he is employed in the private sector. We do not distinguish between civil servants (Beamte) and other public sector employees, since the pay scales are the same and apply to all public sector workers at the federal, state and local level.

Table 3.4 presents descriptive statistics for public and private sector employees. The means of the relevant variables show that average hourly earnings are higher in the public sector than in the private sector. They also show that public sector employees are, on average, better educated than private sector employees. For instance, 22.7% of the employees in the public sector have a university degree (Ed level 6), as opposed to 13% in the private sector.
Public sector employees have also acquired more labor market experience. These differences in work experience and education may explain the higher average wages of public sector employees.

⁴⁴ As a sensitivity check, we have also repeated the estimation procedure (except for the bootstraps) with men between 30 and 50 years of age only, and we have found no noteworthy difference.
⁴⁵ 245 observations have been excluded because of missing values.

A first visual summary of the public and private sector wage distributions is provided by Figure 3.1. The density functions were estimated using an Epanechnikov kernel estimator, and the bandwidth was chosen according to the Silverman (1986) rule of thumb. The figure shows that the distributions are quite distinct between sectors. The public sector earnings distribution has a density function that is higher around the mode and has lower dispersion, so it lies "within" the private sector distribution. Public sector employees at the 10th percentile of the public sector earnings distribution enjoy an earnings advantage over private sector employees at the same point of the private sector wage distribution. The reverse holds at the 90th percentile of the two distributions. With "higher floors" and "lower ceilings", the public sector compresses the unconditional wage distribution.

Given that workers choose whether to work in the public or the private sector, there is potential for a sample selection bias. To correct for endogenous sector choice, identification requires some exclusion restrictions. In many studies, the data are not rich enough to provide appropriate instruments, and the identification assumptions are sometimes doubtful. The GSOEP is a rich dataset that contains a large range of background variables usually not available in other studies. We use the last five variables defined in Table 3.3, namely those related to the parents' occupational status. Dustmann and Van Soest (1998) have used very similar exclusion restrictions. The most important instrument is Fcivil, a dummy variable equal to 1 if the father was a civil servant when the employee was 16 years old. Table 3.4 shows strong correlations between the instruments and public sector status. For instance, if the father works in the public sector, his son also works in the public sector with a probability of 36%. If the father does not work in the public sector, this probability is only 21%.
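The density estimates behind Figure 3.1 can be sketched in Python with an Epanechnikov kernel density estimator and Silverman's (1986) rule-of-thumb bandwidth 0.9·min(s, IQR/1.34)·n^(−1/5). The wage samples below are simulated stand-ins, not GSOEP data. The exact bandwidth constant used in the thesis is also an assumption here. This sketch applies the normal-reference constant directly to the Epanechnikov kernel, whereas some implementations rescale it for non-Gaussian kernels.

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman (1986) rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    spread = min(x.std(ddof=1), (q75 - q25) / 1.34)
    return 0.9 * spread * x.size ** (-0.2)

def epanechnikov_kde(x, grid, h):
    """Density estimate with kernel K(u) = 0.75 * (1 - u^2) for |u| <= 1."""
    u = (grid[:, None] - np.asarray(x)[None, :]) / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    return k.mean(axis=1) / h

# Hypothetical log hourly wages: the "public" sample is less dispersed
rng = np.random.default_rng(0)
lw_priv = rng.normal(2.8, 0.45, size=2000)
lw_pub = rng.normal(2.9, 0.30, size=1000)

grid = np.linspace(min(lw_priv.min(), lw_pub.min()) - 1.0,
                   max(lw_priv.max(), lw_pub.max()) + 1.0, 800)
dx = grid[1] - grid[0]
h_priv, h_pub = silverman_bandwidth(lw_priv), silverman_bandwidth(lw_pub)
dens_priv = epanechnikov_kde(lw_priv, grid, h_priv)
dens_pub = epanechnikov_kde(lw_pub, grid, h_pub)
print("bandwidths:", round(h_priv, 3), round(h_pub, 3))
print("integrals:", round(dens_priv.sum() * dx, 3), round(dens_pub.sum() * dx, 3))
print("peak heights:", round(dens_priv.max(), 3), round(dens_pub.max(), 3))
```

The less dispersed sample produces the higher density around the mode. This is the pattern Figure 3.1 shows for the public sector.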
3.6
Empirical results
3.6.1
Exogenous sector choice
As a benchmark, we first estimate the public-private sector wage differential assuming that the sector choice is exogenous. The first method used is the simple dummy variable approach: we regress the logged wage on X and on the public sector dummy with traditional quantile regression. The estimated public sector gap as a function of θ is plotted in Figure 3.2 with a 95% confidence interval. Unless stated otherwise, all standard errors in this chapter were estimated by the sample analogs of the asymptotic variances.⁴⁶ The densities were estimated by the method of Powell (1984), with a normal kernel and a bandwidth following the Bofinger (1975) rule. The estimated coefficients for the median regressions are given in Table 3.5. At this point of the distribution, public sector employees earn 8.4% less than private sector employees with the same characteristics, and this coefficient is significantly different from zero. The results of Figure 3.2 show that the public sector compresses wages by giving a positive premium at the low end of the conditional distribution and a significant negative premium at the upper tail of the distribution.

These results are correct only if the returns to individual characteristics are the same in both sectors. In order to test this restriction, we estimate a fully interacted model in which all characteristics are interacted with the public sector dummy, and we test whether the interaction terms are significantly different from zero. The null hypothesis is clearly rejected for most quantiles and for the whole quantile regression process. Therefore, we have estimated 100 quantile regressions separately in each sector. Using the procedure described in Section 3.4, we have then decomposed the differences between the quantiles of the unconditional distributions into two parts. The first is explained by different distributions of characteristics. The second is explained by different coefficients and could be interpreted as a wage premium or discrimination.

Figure 3.3 plots the decomposition results with a 95% confidence interval for all estimates. The compression of the unconditional public sector wage distribution can be seen in the total differential. The 10% quantile of the public sector wage distribution is higher than the 10% quantile of the private sector wage distribution, but the contrary holds for the 90% quantile. This is just another way of presenting the results of Figure 3.1. The part explained by characteristics is significantly positive, reflecting the fact that public sector employees are better educated and have more experience than private sector employees. We cannot reject the hypothesis that the part explained by characteristics is constant across the distribution. Therefore, the lower wage dispersion in the public sector is not caused by a lower dispersion of the characteristics. Finally, the part explained by coefficients is very similar to the results of the dummy variable approach in Figure 3.2: the premium is significantly negative at the median and decreases monotonically from the low end to the high end of the distribution. Similar results have been found by Melly (2005a). However, these results were obtained by assuming exogenous sector choice. This assumption is unlikely to be satisfied, and it will be abandoned in Sections 3.6.3 and 3.6.4.

⁴⁶ That is, we use the proposals of Powell (1984) for traditional quantile regression, of Buchinsky (1998) for the sample selection correction, of Chernozhukov and Hansen (2004b) for the IVQR and of Melly (2006a) for the decomposition procedures.

3.6.2
Choice between private and public sector
We estimate the probability of working in the public sector conditional on X and Z. This describes the selection process between the two sectors and provides the first-step estimates for the sample selection correction procedure of Buchinsky (1998). The estimation is performed by a logit. Since the consistency of the logit depends heavily on the distributional assumption, we also use the semiparametric estimator of Klein and Spady (1993). We implement the Klein and Spady estimator as in Gerfin (1996).47

The estimated coefficient vectors are normalized such that the slope coefficients have norm 1. The results of the logit and of the Klein and Spady estimations are given in Table 3.6. The estimated coefficients are not fundamentally different. As expected, the standard errors of the logit estimates are generally lower than those of the semiparametric estimates, but the differences are not large. The probability of working in the public sector increases with (potential) experience and education.

One of the fundamental assumptions of the instrumental variable quantile regression estimator is the presence of at least one instrument that has an effect on the endogenous variable. We can test this assumption and find at least two significant instruments: Fcivil is significantly different from zero at the 0.1% level and Mnwork is significant at the 5% level. If we believe the logit estimates, Fwhithe is also significant at the 10% level. The Wald statistic for the hypothesis that the coefficients of all 5 instruments are equal to zero is 19 with the logit and 23 with the Klein and Spady estimator. We can thus reject the null hypothesis at all conventional significance levels.
3.6.3
Endogenous dummy variable
In this section, we correct for the endogeneity of the sector choice by using the estimator of Chernozhukov and Hansen (2004b and 2006) described in Appendix A. Given the relatively small sample size, we do not try to weight the observations. We choose $A(\theta, \alpha)$ to be the inverse of the asymptotic covariance matrix of $\sqrt{n}\left(\hat\gamma(\theta, \alpha) - \gamma(\theta, \alpha)\right)$. The parameter space for α is taken to be [-2, 2] for θ < 0.2 or θ > 0.8 and [-1, 1] for the other quantiles. We use equally spaced grids with a step size of 0.001. Figure 3.4 plots the coefficients on the public sector dummy at each percentile.
47 A bandwidth of 0.3 was chosen. Note that the estimates of the sector choice equation are not very sensitive to the choice of the bandwidth and that the second-step estimates (wage equations corrected for selection) are even less sensitive.
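The grid search just described can be illustrated with a minimal pure-Python sketch in the simplest configuration of Appendix A. It assumes one binary endogenous variable D, one binary instrument Z and no covariates other than a constant. The quantile regression of Y - Dα on a constant and Z is then saturated, so the coefficient on Z is simply the difference between the θ-quantiles of Y - Dα in the Z = 1 and Z = 0 groups. The estimate is the grid point where this difference is closest to zero.

The grid bounds follow the text. Several other choices are assumptions made for illustration only:

- the coarser step (0.01 instead of 0.001);
- the unweighted criterion |γ̂(α)|, instead of the Wald form, which targets the same zero in this scalar case;
- the simulated design;
- all function names.

```python
import math
import random


def sample_quantile(values, theta):
    """Smallest v with empirical CDF F_n(v) >= theta."""
    ordered = sorted(values)
    return ordered[max(math.ceil(theta * len(ordered)) - 1, 0)]


def alpha_grid(theta, step=0.01):
    """Candidate values for alpha: [-2, 2] if theta < 0.2 or theta > 0.8, else [-1, 1]."""
    bound = 2.0 if (theta < 0.2 or theta > 0.8) else 1.0
    n_steps = int(round(2 * bound / step))
    return [-bound + i * step for i in range(n_steps + 1)]


def ivqr_binary(y, d, z, theta, step=0.01):
    """Inverse quantile regression with one binary instrument and only a constant.

    For each alpha, the saturated QR of y - d * alpha on (1, z) has coefficient
    gamma(alpha) = q_theta(residuals | z = 1) - q_theta(residuals | z = 0);
    the estimate is the grid point that makes |gamma(alpha)| smallest.
    """
    best_alpha, best_gap = None, float("inf")
    for alpha in alpha_grid(theta, step):
        resid = [yi - di * alpha for yi, di in zip(y, d)]
        q1 = sample_quantile([r for r, zi in zip(resid, z) if zi == 1], theta)
        q0 = sample_quantile([r for r, zi in zip(resid, z) if zi == 0], theta)
        if abs(q1 - q0) < best_gap:
            best_alpha, best_gap = alpha, abs(q1 - q0)
    return best_alpha


def simulate(n, true_alpha, seed=0):
    """Hypothetical design: Z shifts D, and D also depends on the outcome error U."""
    rng = random.Random(seed)
    y, d, z = [], [], []
    for _ in range(n):
        zi = 1 if rng.random() < 0.5 else 0
        u = rng.gauss(0.0, 1.0)
        di = 1 if zi + 0.5 * u + 0.5 * rng.gauss(0.0, 1.0) > 0.7 else 0
        y.append(true_alpha * di + u)
        d.append(di)
        z.append(zi)
    return y, d, z


if __name__ == "__main__":
    y, d, z = simulate(4000, true_alpha=0.5)
    print("IVQR estimate at the median:", ivqr_binary(y, d, z, theta=0.5))
```

In this design, Y - 0.5D equals the error U, which is independent of Z. The estimate should therefore land close to 0.5. A naive comparison of medians by D would instead be contaminated by the correlation between D and U that the design builds in.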
If we compare these results to those of Section 3.6.1, we see that the premium is about 40% higher after the correction for endogeneity. While the differential was negative over most of the distribution with quantile regression, it is now positive over 75% of the distribution. The correction for endogenous sector choice reverses the conclusion: the majority of public sector employees are overpaid, not underpaid. Thus, there is positive selection into the private sector and negative selection into the public sector. The direction of the selection effect is correctly predicted by the Roy (1951) model (see also the discussion in Heckman and Honoré 1990). Employees with an absolute disadvantage have a comparative advantage in the sector in which earnings are more concentrated. Thus, individuals will be positively selected into the sector with higher wage inequality, the private sector. The shape of the premium as θ varies between 0 and 1 hardly differs whether or not we control for endogeneity. The premium declines more or less monotonically from high positive values at the lower end of the distribution to negative values at the upper end. The differences are even more pronounced if we control for endogenous sector choice, with estimates ranging from -1 to 1.5. However, given the variances of the estimates, these extreme results should be interpreted with caution. In any case, the different wage distributions in the two sectors are not induced by different distributions of unobserved ability. It is not possible to draw significant conclusions for a single quantile because the standard errors of the estimated coefficients are almost 10 times higher if we allow for endogeneity. As explained in Appendix A, an efficient estimator can be obtained by using estimated optimal instruments and weights.
We implemented the optimal estimator but were disappointed because the standard errors of the estimates did not decrease noticeably. However, we can obtain significant results by considering the whole instrumental quantile regression process. Chernozhukov and Hansen (2006) propose inference procedures to evaluate the impact of the treatment on the entire distribution of outcomes. They suggest a resampling procedure to compute asymptotically valid critical values for these tests. They develop a method of score resampling, but we prefer to recompute the estimates in each replication. This avoids estimating conditional densities, which is a difficult task and requires a rather arbitrary choice of bandwidths. We use the Smirnov-Cramér-von Mises statistic and estimate Anderson-Darling weights by resampling. We estimate the critical values by constructing 1000 replications with 250, 2000 and 3125 observations drawn with replacement,48 and we estimate the instrumental quantile regression process on $\tau \in [0.1, 0.9]$. The p-values for five hypotheses are given in Table 3.7. In general, the larger the subsample size, the more conservative the tests, but all subsample sizes lead to the same conclusions at the 5% level. As expected, we can reject the hypothesis that there is no difference between the two sectors. The tests also strongly reject the null hypothesis of a constant effect, which was taken to be the weighted trimmed mean on $\tau \in [0.1, 0.9]$. Furthermore, the tests reject the hypothesis of exogeneity, confirming the need for instruments for the sector choice. This confirms the visual impression of Figure 3.4 and shows that there is positive selection into the private sector. Finally, we reject the hypothesis that the wage distribution in the private sector dominates the distribution in the public sector, but we cannot reject the opposite. Thus, both the heterogeneity of the public sector wage gap and the endogeneity of the sector choice are significant at the 5% level.
3.6.4
Endogenous sector choice with fully interacted covariates
In Section 3.6.3 we assumed that the returns to characteristics are the same in both sectors. In the exogenous case, we have shown that this restriction is not satisfied. Therefore, we now apply the estimators proposed in Section 3.3. Given the results of the Monte Carlo simulations, we concentrate on the SIVQR of Section 3.3.1. Results for the MDIVQR (not presented) are similar but have higher variances. The sample selection procedure of Buchinsky (1998) is used to estimate the slope coefficients. The index was estimated by the Klein and Spady estimator (Section 3.6.2), and we approximate the bias term with a second-order power series expansion of the inverse Mills ratio. The constants are then estimated by the Chernozhukov and Hansen (2004b) estimator. For α, we use the same grids as in Section 3.6.3. The instrument was taken to be the index estimated by the Klein and Spady estimator. The results of the median regressions, given in the third and fourth columns of Table 3.5, show that returns to education and experience are not the same in both sectors. Formal tests reject the null hypothesis of equal slopes at the 1% significance level. Similar results arise for the other quantiles, although the p-values are somewhat higher in the more extreme parts of the distribution. A joint test of equal slopes in both sectors at the 0.1, 0.25, 0.5, 0.75 and 0.9 quantiles rejects the null hypothesis at the 0.01% level. Returns to education are generally lower in the public sector. Thus, not only within-group inequality but also between-group inequality is lower in the public sector. Returns to (potential) experience are also higher in the private sector for younger employees. However, since the experience profile is more concave in the private sector, the ranking is reversed late in the working life (after more than 28 years of experience).
We have estimated the coefficient vectors corrected for endogeneity at 100 quantiles uniformly distributed between 0 and 1 ($\theta = 0.005, 0.015, \ldots, 0.995$). The procedure described in Section 3.4 allows the estimation of the potential wage distributions in both sectors. It also yields the counterfactual distribution that would prevail if public sector employees were paid like private sector employees. Figure 3.5 plots the results. No confidence intervals are plotted to avoid overloading the figure. At the median, the standard errors of the estimates are 1.8%, 14%, 15% and 1.9% for the uncorrected differential, the corrected differential, the effects of coefficients and the effects of characteristics, respectively. The uncorrected differential is taken from Figure 3.3 and represents the observed differences between the quantiles of the wage distribution in the public sector and in the private sector. The corrected differential is the differential that we would observe if employees sorted randomly between sectors conditional on their characteristics. The corrected differential is much higher than the uncorrected one, showing that there is positive selection into the private sector and negative selection into the public sector. We then decompose the corrected differential into the part explained by different distributions of characteristics and the part explained by different coefficients (often interpreted as discrimination). The effects of characteristics are positive and stable across the distribution, as they were in Section 3.6.1. The effects of coefficients decrease as we move up the wage distribution but remain positive at all quantiles, indicating that public sector employees indeed receive a positive wage premium. The wage premium is not significantly different from zero at most quantiles. Nevertheless, a joint test of the absence of a premium at the 0.1, 0.25, 0.5, 0.75 and 0.9 quantiles rejects the null hypothesis at the 1% level.
48 We want to check the sensitivity of the results to the choice of the block size. Chernozhukov and Hansen (2006) recommend choosing a block size of $k n^{2/5}$ with $k$ between 3 and 10; $k = 10$ gives a block size of 250. Because of the computation time, only 500 replications have been drawn for the bootstrap.
3.6.5
Comparison with Abadie, Angrist and Imbens (2002) estimator
Abadie, Angrist and Imbens (2002) propose an alternative estimator based on the LATE model. They impose an independence condition (Z is independent of the errors in the outcome and selection equations) and a monotonicity condition (the direction of the effect of Z on the participation decision is the same for all individuals). They show that the marginal distributions of the potential outcomes are identified for the subpopulation of compliers. They suggest an ingenious estimator that can be interpreted as a reweighted standard quantile regression estimator. The estimands of Chernozhukov and Hansen and of Abadie, Angrist and Imbens are generally not the same, and the sets of assumptions differ. However, if both sets of assumptions are satisfied and the compliers are representative of the whole population, then both estimators converge to the same value. If these conditions are not satisfied, they will have different probability limits. Thus, comparing the results of both estimators provides a useful robustness check.

Since the Abadie, Angrist and Imbens estimator allows only for a single instrument, we use only the most powerful one, the public sector status of the father. As they suggest, we implement their estimator by running weighted quantile regressions. We estimate the weights by power series. Probably because of the relatively small sample size, only the first-order terms of the polynomial were found to have explanatory power. The fifth and sixth columns of Table 3.5 give the coefficients of the median regression for the public and the private sector, respectively. The results obtained with the SIVQR are confirmed: the constant is higher in the public sector, and returns to education and experience are higher in the private sector.

Figure 3.6 plots the decomposition defined in Section 3.4 using the coefficients of 100 quantile regressions estimated by the method of Abadie, Angrist and Imbens. Again, there appears to be positive selection into the private sector. The differences between the corrected and the uncorrected differentials are slightly smaller. This is probably a consequence of the weaker instrument, which may bias the results toward those obtained by traditional quantile regression. Overall, these differences appear small relative to sampling variation, and one would not draw substantively different conclusions from either set of estimates.
3.6.6
Validity of the instruments
The crucial assumptions for the consistency of the estimators used in this chapter (apart from the parametric restrictions) are the presence of instruments and the exclusion of these instruments from the outcome equation. In Section 3.6.2, we find that at least two instruments (three if we rely on the logit estimates) have an effect on D, confirming the first assumption. With 5 instruments for a single endogenous variable, we can also partially test the second assumption. We have chosen the weighting matrix $A(\theta, \alpha)$ to be the inverse of the asymptotic covariance matrix of $\sqrt{n}\left(\hat\gamma(\theta, \alpha) - \gamma(\theta, \alpha)\right)$. Therefore, the objective function of the IVQR is asymptotically $\chi^2_{\dim(\gamma)}$-distributed under the null hypothesis that the exclusion restrictions are satisfied. We apply this test and cannot reject the null hypothesis at the 1% significance level for any percentile. For instance, the value of the objective function is 6.74 at the median, far below standard critical values (the 5% critical value of a $\chi^2_5$ distribution is 11.07). However, these tests do not necessarily have high power, and this is particularly true here because all instruments are of the same type.

Therefore, we try to assess the plausibility of the exclusion restrictions. Our motivation for using these instruments is that children learn by imitating the adults who live in their neighborhood, particularly their parents. The image the parents convey to their child serves as a basis for the child's own development. For a son, the father sets an example, a role model. Thus, the instruments we use should influence the sector choice but should have no impact on potential wages. The exclusion restrictions would be violated, for instance, if the father had better connections in his sector of employment and his son could use these connections to obtain a higher wage. This would explain why the children of a father in the public sector are overrepresented in the public sector. As a first indication against this hypothesis, we find that the sector choice of the father is not important (and not significant) for his daughter but is strongly significant for his son. Similarly, the sector of the mother plays no role in the sector choice of sons but has a large and significant effect for daughters. These results suggest that the example a father sets for his son is the reason for the intergenerational correlation in sector choice. Another way of assessing the exclusion restrictions is to use information about the position of the father within the public sector.
Assume that the ability of a public sector employee to favor his son increases with his position in the public sector hierarchy. Under this assumption, we can test the exclusion restrictions using additional information on the father's position. The data tell us not only whether the father was working in the public sector, but also give an indication of his position within it. The second column of Table 3.8 gives the results of the logit estimation of the sector choice equation. Instead of a single dummy for the father's public sector status, we use five different categories related to the position of the father within the public sector. The resulting five coefficients are very close to each other, and we cannot reject the hypothesis that they are equal (p-value of 90%). We can, however, clearly reject the hypothesis that they are all zero (p-value of 0.1%). In the third column of Table 3.8, we present the median public sector wage gap obtained using only one category as the instrument. The level of the estimates varies slightly but remains positive with all instruments, and no trend across categories arises. Thus, nothing indicates that sons of high-level public sector employees are more inclined to work in the public sector or that they obtain a higher public sector wage premium.
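The χ² comparisons used in Sections 3.6.2 and 3.6.6 can be reproduced with a few lines of standard-library Python. The sketch below implements the closed-form survival function of a χ² distribution with an integer number of degrees of freedom. It applies that function to three statistics reported in the text:

- the Wald statistic of 19 (logit) for the five instruments;
- the Wald statistic of 23 (Klein and Spady) for the same hypothesis;
- the IVQR objective value of 6.74 at the median.

As in the text, each statistic is referred to a χ² distribution with dim(γ) = 5 degrees of freedom. The function name `chi2_sf` is our own and is not part of any library.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square with a positive integer number of degrees of freedom.

    Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!.
    Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_j x^(j-1) / (1*3*...*(2j-1)).
    """
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))
    if df > 1:
        term, series = 1.0, 1.0
        for j in range(2, (df - 1) // 2 + 1):
            term *= x / (2 * j - 1)
            series += term
        total += math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * series
    return total


if __name__ == "__main__":
    reported = [("Wald test, logit", 19.0), ("Wald test, Klein-Spady", 23.0),
                ("IVQR objective at the median", 6.74)]
    for label, stat in reported:
        print(f"{label}: p = {chi2_sf(stat, 5):.4f}")
```

The resulting p-values are roughly 0.002 and below 0.001 for the two Wald statistics, and about 0.24 for the overidentification statistic. This matches the conclusions drawn in the text.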
3.7
Summary and conclusion
In this chapter, we have examined the public/private sector wage differential, allowing for the first time simultaneously for endogenous sector choice and for heterogeneous public sector gaps at different points of the distribution. We have applied the instrumental variable quantile regression estimator of Chernozhukov and Hansen (2004b and 2006) to data from the German Socio-Economic Panel, which provides us with reasonable instruments. The empirical findings can be summarized as follows. The results assuming exogenous sector choice yield a negative mean public sector wage premium and show that the wage distribution is more compressed in the public sector. Correcting for endogeneity reverses the findings concerning the mean premium but preserves the more compressed structure of the public sector earnings distribution. Thus, we find positive selection into the private sector, the sector with higher wage inequality, as predicted by the Roy (1951) model.

This chapter also contributes methodologically to the literature. The estimator proposed by Chernozhukov and Hansen is convenient if the number of endogenous variables is small, but it becomes increasingly difficult to implement as this number grows. In order to allow the public sector wage premium to vary with the covariates, we need to estimate a fully interacted model, which is practically infeasible in terms of computation time. Therefore, we propose two estimators that are tractable in this configuration. Their asymptotic distributions are derived, and Monte Carlo simulations show that they perform well in finite samples.

Applying these new estimators, we find that the public sector also reduces between-group inequality by yielding smaller returns to education. Thus, the government refuses to pay low wages to its less skilled employees and very high wages to its most skilled employees.49 Finally, the experience-wage profile is more concave in the private sector. Unlike in the private sector, returns to experience in the public sector remain positive almost until the end of the career. This is probably a consequence of the rigid hierarchical pay structure and the automatic salary increases with seniority. In summary, statistical tests have shown that the sector choice is endogenous, that the public sector wage premium differs across the distribution, and that returns to education and experience differ between the public and the private sector.
49 This is true for both observed and unobserved (by the econometrician) skills.
Appendix A: Instrumental Variable Quantile Regression
Chernozhukov and Hansen (2005) focus on the modeling and the nonparametric identification and show how the results can be derived from primitive conditions. The estimator for a single quantile is defined and examined in Chernozhukov and Hansen (2004b). The properties of the instrumental variable quantile regression process, and of the inference procedures and test statistics derived from it, are established in Chernozhukov and Hansen (2006). An application can be found in Chernozhukov and Hansen (2004a). Chernozhukov and Hansen (2004b and 2006) note that
$$\Pr\left(Y \le D'\alpha(\theta) + X'\beta(\theta) \mid X, Z\right) = \theta$$
is equivalent to the statement that 0 is the $\theta$th quantile of $Y - D'\alpha(\theta) - X'\beta(\theta)$ conditional on $(X, Z)$. Thus, the problem is to find parameters such that
$$0 = \arg\min_{\gamma} E\left[\rho_\theta\left(Y - D'\alpha(\theta) - X'\beta(\theta) - Z'\gamma\right)\right]. \tag{A.1}$$
The finite-sample analog of this procedure is simple and requires only the estimation of quantile regressions along a $\dim(\alpha)$-dimensional grid. In the simplest case, where D and Z are one-dimensional (one instrument and one endogenous variable), the procedure simply consists of finding the α such that the traditional quantile regression of $Y - D'\alpha$ on Z and X gives a coefficient of zero on Z. To formalize the estimator in the general case, allowing for estimated weights and instruments, define the quantile regression objective function:
$$Q_N(\theta, \alpha, \beta, \gamma) \equiv \frac{1}{N}\sum_{i=1}^{N} \rho_\theta\left(Y_i - D_i'\alpha - X_i'\beta - \hat\Phi_i(\theta)'\gamma\right)\hat V_i(\theta),$$
where $\Phi_i(\theta) \equiv \Phi(\theta, X_i, Z_i)$ is an r-vector of instruments and $V_i(\theta) \equiv V(\theta, X_i, Z_i) > 0$ is a weight function. $\hat\Phi_i(\theta)$ and $\hat V_i(\theta)$ are consistent estimates of $\Phi_i(\theta)$ and $V_i(\theta)$. The estimation procedure is defined as follows:
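$$\hat\alpha(\theta) = \arg\inf_{\alpha\in\mathcal{A}} W_N(\theta, \alpha), \qquad W_N(\theta, \alpha) := N\,\hat\gamma(\theta, \alpha)'\hat A(\theta, \alpha)\hat\gamma(\theta, \alpha),$$
$$\left(\hat\beta(\theta, \alpha), \hat\gamma(\theta, \alpha)\right) = \arg\inf_{\beta, \gamma} Q_N(\theta, \alpha, \beta, \gamma),$$
so that
$$\left(\hat\alpha(\theta), \hat\beta(\theta)\right) = \left(\hat\alpha(\theta), \hat\beta(\theta, \hat\alpha(\theta))\right).$$
Here $\hat A(\theta, \alpha) = A(\theta, \alpha) + o_p(1)$, and $A(\theta, \alpha)$ is positive definite uniformly in $\alpha \in \mathcal{A}$. It is convenient to set $A(\theta, \alpha)$ equal to the inverse of the asymptotic covariance matrix of $\sqrt{N}\left(\hat\gamma(\theta, \alpha) - \gamma(\theta, \alpha)\right)$. In this case, more weight is given to the instruments whose effects are more precisely estimated, and $W_N(\theta, \alpha)$ is the Wald statistic for testing $\gamma(\theta, \alpha) = 0$. Under some technical regularity conditions, Chernozhukov and Hansen (2004b) derive the asymptotic distribution of $\left(\hat\alpha(\theta), \hat\beta(\theta)\right)$:
$$\sqrt{N}\begin{pmatrix}\hat\alpha(\theta) - \alpha(\theta)\\ \hat\beta(\theta) - \beta(\theta)\end{pmatrix} \xrightarrow{d} N\left(\begin{pmatrix}0\\ 0\end{pmatrix},\ \Lambda(\theta) = (K', L')'\,S\,(K', L')\right), \tag{A.2}$$
where, for $\Psi = V \cdot [X', \Phi']'$ and $\varepsilon = Y - D'\alpha(\theta) - X'\beta(\theta)$,
$$S = \theta(1-\theta)E[\Psi\Psi'], \quad K = \left(J_\alpha' H J_\alpha\right)^{-1} J_\alpha' H, \quad H = J_\gamma' A(\theta, \alpha) J_\gamma, \quad L = J_\beta M,$$
$$M = I_{k+r} - J_\alpha K, \quad J_\alpha = E\left[f_\varepsilon(0 \mid X, Z, D)\,\Psi D'\right],$$
and $[J_\beta', J_\gamma']'$ is a partition of $E\left[f_\varepsilon(0 \mid X, Z)\,\Psi\Psi'/V\right]^{-1}$ such that $J_\beta$ is a $k \times (k+r)$ matrix and $J_\gamma$ is an $r \times (k+r)$ matrix. Efficiency can be achieved by choosing $V^* = f_\varepsilon(0 \mid X, Z)$ and $\Phi^* = E[D\upsilon^* \mid X, Z]/V^*$, where $\upsilon^* = f_\varepsilon(0 \mid D, X, Z)$. Then the asymptotic variance simplifies to $\theta(1-\theta)E[\Psi\Psi']^{-1}$ and attains the efficiency bound.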
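The variance in (A.2), like the other asymptotic variances used in this chapter, depends on error densities at zero. Section 3.6.1 estimates these with Powell's (1984) kernel method, a normal kernel and the Bofinger (1975) bandwidth.

The minimal sketch below covers the simplest case, a location model with only a constant. There, Powell's estimator reduces to a kernel density estimate of the residuals at zero. The standard error of the sample θ-quantile is then √(θ(1-θ))/(f̂ √n). With covariates, the same kernel weights enter Powell's matrix estimator (1/(nh)) Σ K(û_i/h) x_i x_i'.

The Bofinger rule gives a bandwidth on the probability scale. Mapping it to the residual scale through a normal-quantile spread times a robust scale estimate is an assumption of this sketch, not necessarily the choice made in this thesis. The function names are illustrative.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bofinger_bandwidth(theta, n):
    """Bofinger (1975) bandwidth, expressed on the probability scale."""
    z = STD_NORMAL.inv_cdf(theta)
    ratio = 4.5 * STD_NORMAL.pdf(z) ** 4 / (2 * z * z + 1) ** 2
    return n ** (-0.2) * ratio ** 0.2


def powell_density_at_zero(residuals, theta):
    """Normal-kernel estimate of the error density at 0 from quantile residuals.

    Assumption of this sketch: the probability-scale bandwidth h is mapped to the
    residual scale as (Phi^{-1}(theta + h) - Phi^{-1}(theta - h)) * min(sd, IQR / 1.34).
    """
    n = len(residuals)
    h = bofinger_bandwidth(theta, n)
    lo, hi = max(theta - h, 1e-6), min(theta + h, 1 - 1e-6)
    ordered = sorted(residuals)
    iqr = ordered[int(0.75 * (n - 1))] - ordered[int(0.25 * (n - 1))]
    mean = sum(residuals) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))
    h_resid = (STD_NORMAL.inv_cdf(hi) - STD_NORMAL.inv_cdf(lo)) * min(sd, iqr / 1.34)
    return sum(STD_NORMAL.pdf(r / h_resid) for r in residuals) / (n * h_resid)


def quantile_with_se(sample, theta):
    """Sample theta-quantile and its standard error sqrt(theta(1-theta)) / (f(0) sqrt(n))."""
    ordered = sorted(sample)
    n = len(ordered)
    q = ordered[max(math.ceil(theta * n) - 1, 0)]
    f_hat = powell_density_at_zero([y - q for y in sample], theta)
    return q, math.sqrt(theta * (1 - theta)) / (f_hat * math.sqrt(n))


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    print(quantile_with_se(draws, 0.5))
```

The Bofinger rule is symmetric in θ and 1 - θ and shrinks at the rate n^(-1/5). The bandwidth choice therefore matters most in small samples, which is one reason the resampling tests in Section 3.6.3 avoid density estimation altogether.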
Appendix B: Asymptotic distribution of the SIVQR estimator
The estimator described in Section 3.3.1 combines the sample selection correction of Buchinsky (1998) with the instrumental variable quantile regression estimator of Chernozhukov and Hansen. The notation needed to describe its asymptotic distribution is complicated because it is a three-step estimator: the asymptotic variance of the first-step estimate appears in the second step, and the asymptotic variance of the second-step estimate appears in the third step. However, the procedure is intuitively straightforward:
1st step: We regress D on X and Z and obtain the estimated coefficients $\hat\alpha_X$ and $\hat\alpha_Z$ and their asymptotic covariance matrix $\Lambda_\alpha$. This step can be performed with different existing parametric (logit, probit) or semiparametric (Klein and Spady (1993), Ichimura (1993)) estimators.50
2nd step: Denote by $\hat g = X\hat\alpha_X + Z\hat\alpha_Z$ the estimated index and by $P_S(\hat g) = \left(P_{S1}(\hat g), \ldots, P_{SS}(\hat g)\right)$ a polynomial vector in $\hat g$ of order S. We now estimate the quantile regression of Y on X and $P_S(\hat g)$ separately for observations with D = 0 and D = 1:
$$Y_i = X_i\hat\beta_0(\theta) + P_S(\hat g_i)\hat\kappa_0(\theta) + \hat\varepsilon_{0i}(\theta), \quad \{i : D_i = 0\},$$
$$Y_i = X_i\hat\beta_1(\theta) + P_S(\hat g_i)\hat\kappa_1(\theta) + \hat\varepsilon_{1i}(\theta), \quad \{i : D_i = 1\}.$$
The bias terms induced by sample selection are approximated by $P_S(\hat g_i)\hat\kappa_0(\theta)$ and $P_S(\hat g_i)\hat\kappa_1(\theta)$. The asymptotic distribution of $\hat\beta_1(\theta)$ can be derived directly following Buchinsky (1998):
50 In principle, the results of Chen, Linton and Van Keilegom (2003) allow the use of nonparametric first-step estimators if we use higher-order kernels and undersmoothing. Thus, an estimator in the spirit of Ahn and Powell (1993) could be built for quantile regression. Note, however, that the single-index assumption must be maintained in the second step. Moreover, the finite-sample properties of such an estimator are likely to be poor, particularly with a sample size like the one in our application.
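$$\sqrt{N}\left(\hat\beta_1(\theta) - \beta_1(\theta)\right) \xrightarrow{d} N\left(0, \Lambda_{\beta_1}(\theta)\right),$$
where $\Lambda_{\beta_1}(\theta)$ is the $(k-1) \times (k-1)$ top-left submatrix of
$$\Delta_{fr1}^{-1}\left(\theta(1-\theta)\Delta_{rr1} + \Delta_{frx1}\Lambda_\alpha\Delta_{frx1}^{T}\right)\Delta_{fr1}^{-1},$$
where
$$\Delta_{fr1} = E\left[f_{\varepsilon_1(\theta)}(0 \mid r_1)\, r_1 r_1'\right], \quad \Delta_{frx1} = E\left[f_{\varepsilon_1(\theta)}(0 \mid r_1)\left(\frac{dh_{\theta,1}(g)}{dg}\right)'\kappa_1(\theta)\, r_1 X\right],$$
$$\Delta_{rr1} = E\left[r_1 r_1'\right], \quad r_1' = D \cdot \left(X, h_{\theta,1}(g)\right), \quad h_{\theta,1} = P_S(g_i)\kappa_1(\theta).$$
Similarly, the asymptotic distribution of $\hat\beta_0(\theta)$ is given by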
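$$\sqrt{N}\left(\hat\beta_0(\theta) - \beta_0(\theta)\right) \xrightarrow{d} N\left(0, \Lambda_{\beta_0}(\theta)\right),$$
where $\Lambda_{\beta_0}(\theta)$ is the $(k-1) \times (k-1)$ top-left submatrix of
$$\Delta_{fr0}^{-1}\left(\theta(1-\theta)\Delta_{rr0} + \Delta_{frx0}\Lambda_\alpha\Delta_{frx0}^{T}\right)\Delta_{fr0}^{-1},$$
where
$$\Delta_{fr0} = E\left[f_{\varepsilon_0(\theta)}(0 \mid r_0)\, r_0 r_0'\right], \quad \Delta_{frx0} = E\left[f_{\varepsilon_0(\theta)}(0 \mid r_0)\left(\frac{dh_{\theta,0}(g)}{dg}\right)'\kappa_0(\theta)\, r_0 X\right],$$
$$\Delta_{rr0} = E\left[r_0 r_0'\right], \quad r_0' = (1-D) \cdot \left(X, h_{\theta,0}(g)\right), \quad h_{\theta,0} = P_S(g_i)\kappa_0(\theta).$$
The asymptotic covariance between $\hat\beta_0(\theta)$ and $\hat\beta_1(\theta)$ is given by the $(k-1) \times (k-1)$ top-left submatrix of
$$\Delta_{fr0}^{-1}\Delta_{frx0}\Lambda_\alpha\Delta_{frx1}^{T}\Delta_{fr1}^{-1}.$$
Note that $\hat\beta_0(\theta)$ and $\hat\beta_1(\theta)$ are correlated even though they use different sets of observations, because both rely on the same first-step estimate of $\Pr(D = 1 \mid X, Z)$. We define $\Lambda_\beta$ to be the whole variance-covariance matrix of $\hat\beta_0(\theta)$ and $\hat\beta_1(\theta)$.
3rd step: We use the Chernozhukov and Hansen (2004b) estimator with $Y - (1-D)X\hat\beta_0 - DX\hat\beta_1$ as the dependent variable and only a constant as the exogenous regressor. The new weighted quantile regression objective function is given by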
$$Q_N(\theta, \alpha, \delta, \gamma) \equiv \frac{1}{N}\sum_{i=1}^{N} \rho_\theta\left(Y_i - \alpha - D_i\delta - (1-D_i)X_i'\hat\beta_0 - D_iX_i'\hat\beta_1 - \hat\Phi_i(\theta)'\gamma\right)\hat V_i(\theta).$$
The estimation procedure is defined as follows:
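$$\hat\delta(\theta) = \arg\inf_{\delta\in\mathcal{A}} N\,\hat\gamma(\theta, \delta)'\hat A(\theta, \delta)\hat\gamma(\theta, \delta),$$
$$\left(\hat\alpha(\theta, \delta), \hat\gamma(\theta, \delta)\right) = \arg\inf_{\alpha, \gamma} Q_N(\theta, \alpha, \delta, \gamma),$$
so that
$$\left(\hat\alpha_0(\theta), \hat\alpha_1(\theta), \hat\beta_0(\theta), \hat\beta_1(\theta)\right) = \left(\hat\alpha(\theta, \hat\delta(\theta)),\ \hat\alpha(\theta, \hat\delta(\theta)) + \hat\delta(\theta),\ \hat\beta_0(\theta),\ \hat\beta_1(\theta)\right).$$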
Here, we consider only the exactly identified case in order to keep the notation tractable, but the overidentified case can be derived in a similar way by weighting the moment conditions. The asymptotic distribution of $\hat\alpha_0(\theta)$ and $\hat\delta(\theta)$ can be derived by applying the results for two-step GMM estimators (for instance, Newey 1984),51 since the true values $\alpha_0(\theta)$ and $\delta(\theta)$ solve the following moment conditions:
$$E\left[\left(1\left(Y - \alpha_0(\theta) - D\delta(\theta) - (1-D)X'\beta_0(\theta) - DX'\beta_1(\theta) < 0\right) - \theta\right)\cdot\left[1, \Phi(\theta)'\right]'V(\theta)\right] = 0.$$
The covariance matrix of these moment conditions is given by $S = \theta(1-\theta)E[\Psi(\theta)\Psi(\theta)']$, where $\Psi(\theta) = V(\theta)\cdot[1, \Phi(\theta)']'$. The derivatives of these moment conditions with respect to α and δ are given by
$$J_\theta = E\left[f_\varepsilon\left(0 \mid X, \Phi(\theta), D\right)\Psi(\theta)\,[1, D]\right].$$
51 Note that the results of Newey are valid only for smooth moment conditions. However, assume that the density of the error term at 0 is continuous and bounded away from zero. Then the moment conditions of the quantile regression estimators are asymptotically smooth, and the results of Powell (1984) or, more generally, of Newey and McFadden (1994, Section 7) allow us to apply the GMM framework.
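If $\beta_0(\theta)$ and $\beta_1(\theta)$ were known rather than estimated, the asymptotic variance of $\left(\hat\alpha_0(\theta), \hat\delta(\theta)\right)$ would be $J_\theta^{-1}SJ_\theta^{-1\prime}$. However, the derivative of the moment condition with respect to $(\beta_0, \beta_1)$ is not zero but is given by
$$J_\beta = \left(E\left[(1-D)V(\theta)f_\varepsilon\left(0 \mid X, \Phi(\theta)\right)\Psi(\theta)X\right],\ E\left[DV(\theta)f_\varepsilon\left(0 \mid X, \Phi(\theta)\right)\Psi(\theta)X\right]\right).$$
We follow standard results for multi-step GMM estimators and use the fact that the moment conditions of the second and third steps are uncorrelated. The asymptotic distribution of $\left(\hat\alpha_0(\theta), \hat\delta(\theta)\right)$ is therefore
$$\sqrt{N}\begin{pmatrix}\hat\alpha_0(\theta) - \alpha_0(\theta)\\ \hat\delta(\theta) - \delta(\theta)\end{pmatrix} \xrightarrow{d} N\left(0,\ J_\theta^{-1}\left(S + J_\beta\Lambda_\beta J_\beta'\right)J_\theta^{-1\prime}\right).$$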
Finally, since the measures of inequality are functions of more than one quantile estimate, we need the covariance matrix of the estimates at distinct quantiles. This is straightforward to derive using the results of Buchinsky (1998) and Chernozhukov and Hansen (2004b). The covariance matrix of the estimates at the quantiles $\theta_1$ and $\theta_2$ is the same as the covariance matrix of the estimate for a single quantile θ, with two changes. First, $\min(\theta_1, \theta_2) - \theta_1\theta_2$ replaces $\theta(1-\theta)$. Second, every matrix that appears twice ($\Delta_{fr}$, $\Delta_{frx}$, $J_\beta$, $J_\theta$) is evaluated once at $\theta_1$ and once at $\theta_2$.
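To see why the min(θ1, θ2) - θ1θ2 term matters for inequality measures, consider the simplest possible case: i.i.d. draws and no covariates. There the estimate at quantile θ is the sample quantile, and the covariance of the √n-scaled estimates at θ1 and θ2 is (min(θ1, θ2) - θ1θ2)/(f(q1)f(q2)). The sketch computes the asymptotic variance of the 90-10 differential q̂(0.9) - q̂(0.1), assuming (hypothetically) standard normal log wages. The positive covariance between the two quantile estimates lowers the variance of the gap relative to treating them as independent. The function names and the normal example are illustrative choices, not part of the thesis.

```python
from statistics import NormalDist


def quantile_cov(theta1, theta2, f1, f2):
    """Asymptotic covariance of sqrt(n)-scaled sample quantiles at theta1 and theta2.

    f1 and f2 are the densities at the two population quantiles; with theta1 == theta2
    this reduces to theta(1 - theta) / f^2.
    """
    return (min(theta1, theta2) - theta1 * theta2) / (f1 * f2)


def gap_avar(theta_lo, theta_hi, dist):
    """n times the asymptotic variance of q(theta_hi) - q(theta_lo) for i.i.d. draws."""
    f_lo = dist.pdf(dist.inv_cdf(theta_lo))
    f_hi = dist.pdf(dist.inv_cdf(theta_hi))
    return (quantile_cov(theta_hi, theta_hi, f_hi, f_hi)
            + quantile_cov(theta_lo, theta_lo, f_lo, f_lo)
            - 2 * quantile_cov(theta_lo, theta_hi, f_lo, f_hi))


if __name__ == "__main__":
    std_normal = NormalDist()
    f90 = std_normal.pdf(std_normal.inv_cdf(0.9))
    print("with covariance:", gap_avar(0.1, 0.9, std_normal))
    print("ignoring covariance:", 2 * 0.9 * 0.1 / f90 ** 2)
```

For the standard normal, the two numbers are about 5.19 and 5.84. Ignoring the cross-quantile covariance would thus overstate the variance of the 90-10 gap by roughly 12%.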
Appendix C: Asymptotic distribution of the MDIVQR estimator
Suppose that $X_i$ comes from a discrete distribution, so that there is a finite number, say J, of different possible vectors $X^{(j)}$, $j = 1, \ldots, J$. The Chernozhukov and Hansen estimator can be applied separately in each cell. Of course, only a constant $\alpha_j(\tau)$ and the quantile treatment effect (the coefficient on D) $\delta_j(\tau)$ are estimated. The asymptotic distribution of $\hat\alpha_j(\tau)$ and $\hat\delta_j(\tau)$ is derived directly from (A.2):
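$$\sqrt{n}\begin{pmatrix} \hat{\alpha}_j(\tau) - \alpha_j(\tau) \\ \hat{\delta}_j(\tau) - \delta_j(\tau) \end{pmatrix} \xrightarrow{d} N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\; \frac{\Lambda_j(\tau)}{\Pr\left(X = x^{(j)}\right)}\right),$$

where $\Lambda_j(\tau)$ is equal to $\Lambda(\tau)$ defined in (A.2), except that all expected values are conditional on $X = x^{(j)}$ and the vector of regressors consists only of a constant term. We also note that $(\hat{\alpha}_j(\tau), \hat{\delta}_j(\tau))$ is independent of $(\hat{\alpha}_{j'}(\tau), \hat{\delta}_{j'}(\tau))$ for $j \neq j'$.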
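Within a single cell, the Chernozhukov and Hansen estimator with only a constant and $D$ reduces to a one-dimensional search over $\delta$. The sketch below is a simplified version under loudly stated assumptions: a single binary instrument $Z$, so that the $\tau$-quantile regression of $Y - D\delta$ on $(1, Z)$ is saturated (its intercept is the $\tau$-quantile in the $Z = 0$ group and its slope is the difference of the two group quantiles); a grid search that minimizes the absolute value of that slope, rather than the variance-weighted norm used by Chernozhukov and Hansen; and a hypothetical simulated cell with $\alpha(U) = U$ and $\delta = 1$. The function names, grid and data-generating process are all illustrative.

```python
import math
import random

def empirical_quantile(values, tau):
    """Order statistic y_(ceil(tau * n)), a minimizer of the tau check loss."""
    ys = sorted(values)
    k = max(math.ceil(tau * len(ys)) - 1, 0)
    return ys[k]

def ivqr_cell(y, d, z, tau, grid):
    """Simplified inverse quantile regression inside one covariate cell (hypothetical helper).

    For each candidate delta, the tau-quantile regression of y - d * delta on
    (1, z) is saturated when z is binary: intercept = quantile in the z == 0
    group, slope = difference of the two group quantiles. Pick the delta whose
    slope is closest to zero; alpha is the corresponding intercept.
    """
    best = None
    for delta in grid:
        resid = [yi - di * delta for yi, di in zip(y, d)]
        q0 = empirical_quantile([r for r, zi in zip(resid, z) if zi == 0], tau)
        q1 = empirical_quantile([r for r, zi in zip(resid, z) if zi == 1], tau)
        if best is None or abs(q1 - q0) < best[0]:
            best = (abs(q1 - q0), delta, q0)
    _, delta_hat, alpha_hat = best
    return alpha_hat, delta_hat

# Hypothetical cell: rank U ~ Uniform(0, 1), alpha(U) = U, constant effect delta = 1.
rng = random.Random(12345)
y, d, z = [], [], []
for _ in range(4000):
    u = rng.random()
    zi = 1 if rng.random() < 0.5 else 0
    di = 1 if u + 0.6 * zi + 0.3 * rng.random() > 0.8 else 0  # D depends on U: endogenous
    y.append(u + 1.0 * di)
    d.append(di)
    z.append(zi)

grid = [i / 100 for i in range(201)]  # candidate deltas 0.00, 0.01, ..., 2.00
alpha_hat, delta_hat = ivqr_cell(y, d, z, tau=0.5, grid=grid)
```

At the median the true values are $\alpha(0.5) = 0.5$ and $\delta = 1$; the search should recover them up to sampling noise and the grid step.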
In order to obtain a condensed presentation of the results, we can now use the minimum distance framework to estimate the global parameters. Recall that we assume that the conditional quantiles of Y given X are linear in each sector. That is,
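$$\alpha_j(\tau) = X^{(j)\prime}\beta_0(\tau), \qquad \alpha_j(\tau) + \delta_j(\tau) = X^{(j)\prime}\beta_1(\tau),$$

where the parameter vectors $\beta_0(\tau)$ and $\beta_1(\tau)$ are the same for $j = 1, \ldots, J$. Define $G$ to be a $J \times k$ matrix (with $J \geq k$) with rows $X^{(1)\prime}, \ldots, X^{(J)\prime}$, let $\alpha(\tau) = (\alpha_1(\tau), \ldots, \alpha_J(\tau))'$ and $\delta(\tau) = (\delta_1(\tau), \ldots, \delta_J(\tau))'$, and let $\hat{W}_s(\tau)$ be a $J \times J$ matrix that converges with probability one to $W_s(\tau)$, a positive-definite matrix, for $s = 0, 1$. The minimum distance estimators of $\beta_0(\tau)$ and $\beta_1(\tau)$ are then defined by

$$\hat{\beta}_0(\tau) = \arg\min_{\beta}\, \left(\hat{\alpha}(\tau) - G\beta\right)' \hat{W}_0(\tau) \left(\hat{\alpha}(\tau) - G\beta\right)$$

and

$$\hat{\beta}_1(\tau) = \arg\min_{\beta}\, \left(\hat{\alpha}(\tau) + \hat{\delta}(\tau) - G\beta\right)' \hat{W}_1(\tau) \left(\hat{\alpha}(\tau) + \hat{\delta}(\tau) - G\beta\right).$$

Then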
$$\sqrt{n}\left(\hat{\beta}_s(\tau) - \beta_s(\tau)\right) \xrightarrow{d} N\left(0,\; \left(G'W_s(\tau)G\right)^{-1} G'W_s(\tau)\,\Omega_s(\tau)\,W_s(\tau)G\left(G'W_s(\tau)G\right)^{-1}\right),$$
for $s = 0, 1$, where $\Omega_0(\tau)$ is a $J \times J$ diagonal matrix whose $j$th diagonal element equals the asymptotic variance of $\hat{\alpha}_j(\tau)$, and $\Omega_1(\tau)$ is a $J \times J$ diagonal matrix whose $j$th diagonal element equals the asymptotic variance of $\hat{\alpha}_j(\tau) + \hat{\delta}_j(\tau)$. An efficient minimum distance estimator is obtained by setting $\hat{W}_s(\tau)$ equal to a consistent estimator of $\Omega_s^{-1}(\tau)$. Note that the efficiently weighted minimum distance estimator is asymptotically equivalent to the estimator of Chernozhukov and Hansen with optimal instruments and weights if the model is correctly specified. If the model is misspecified, the two estimators will generally converge to different values, since they attach different weights to different points in the support of $X$.$^{52}$
52 See Angrist, Chernozhukov and Fernández-Val (2006) for a discussion of misspecified quantile regression.
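The minimum distance step has a closed form, $\hat{\beta} = (G'WG)^{-1}G'W\hat{\alpha}$, and the sandwich variance collapses to $(G'\Omega^{-1}G)^{-1}$ under efficient weighting $W = \Omega^{-1}$. The plain-Python sketch below checks both facts on hypothetical inputs: three cells with regressor rows $(1, x)$ for $x = 0, 1, 2$, made-up cell estimates and made-up diagonal variances. The Gauss-Jordan inverse and all function names are generic illustrative helpers, not code from this thesis.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small, well-conditioned matrices)."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def md_estimate(a_hat, G, W):
    """argmin_b (a_hat - G b)' W (a_hat - G b) = (G'WG)^{-1} G'W a_hat."""
    GtW = matmul(transpose(G), W)
    bread = inverse(matmul(GtW, G))
    return [row[0] for row in matmul(bread, matmul(GtW, [[v] for v in a_hat]))]

def md_avar(G, W, Omega):
    """Sandwich (G'WG)^{-1} G'W Omega W G (G'WG)^{-1}; W is assumed symmetric."""
    GtW = matmul(transpose(G), W)
    bread = inverse(matmul(GtW, G))
    meat = matmul(matmul(GtW, Omega), transpose(GtW))
    return matmul(matmul(bread, meat), bread)

def diag(values):
    return [[v if i == j else 0.0 for j in range(len(values))] for i, v in enumerate(values)]

# Hypothetical example: J = 3 cells, regressors (1, x) with x = 0, 1, 2.
G = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
alpha_hat = [1.0, 3.0, 5.0]                 # cell estimates lying exactly on 1 + 2x
Omega = diag([0.5, 1.0, 2.0])               # asymptotic variances of the cell estimates
W_eff = diag([1 / 0.5, 1 / 1.0, 1 / 2.0])   # efficient weights: Omega^{-1}
W_id = diag([1.0, 1.0, 1.0])                # identity weights for comparison

beta_eff = md_estimate(alpha_hat, G, W_eff)
V_eff = md_avar(G, W_eff, Omega)
V_id = md_avar(G, W_id, Omega)
```

With these numbers the slope variance is about 0.54 under efficient weights and 0.625 under identity weights, illustrating why weighting by $\Omega_s^{-1}$ is preferred.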
Tables for chapter 3

Table 3.1: Monte-Carlo simulations, without interaction terms

          α (true value: 0)    βx (true value: 1)   βd (true value: 1)   βxd (true value: 0)
Estimator Bias   S.E.   MSE    Bias   S.E.   MSE    Bias   S.E.   MSE    Bias   S.E.   MSE
100 observations
QR        -0.81  0.3    0.74   0.33   0.3    0.2    1.19   0.31   1.51
IVQR      0.04   0.46   0.21   -0.05  0.39   0.15   -0.02  0.55   0.3
IVQR      0.04   0.68   0.46   -0.07  0.78   0.62   -0.02  0.84   0.7    0.01   1.2    1.44
SIVQR     0.01   0.52   0.27   -0.03  0.55   0.3    0.01   0.65   0.42   0      0.64   0.41
MDIVQR    0.04   0.64   0.41   -0.06  0.73   0.54   -0.03  0.79   0.62   0.01   1.08   1.17
400 observations
QR        -0.81  0.15   0.67   0.33   0.15   0.13   1.19   0.15   1.43
IVQR      0      0.21   0.04   -0.01  0.17   0.03   0      0.25   0.06
IVQR      0.01   0.29   0.08   -0.01  0.34   0.11   -0.01  0.37   0.14   0      0.5    0.25
SIVQR     0      0.24   0.06   -0.01  0.25   0.06   0.01   0.3    0.09   0      0.3    0.09
MDIVQR    0.01   0.29   0.08   -0.01  0.34   0.11   -0.01  0.37   0.14   0      0.5    0.25
1600 observations
QR        -0.81  0.07   0.65   0.33   0.07   0.12   1.18   0.07   1.4
IVQR      0      0.1    0.01   0      0.08   0.01   0      0.12   0.02
IVQR      0      0.14   0.02   0      0.16   0.03   0      0.18   0.03   0      0.25   0.06
SIVQR     0      0.11   0.01   0      0.12   0.01   0      0.15   0.02   0      0.14   0.02
MDIVQR    0      0.14   0.02   0      0.16   0.03   0      0.18   0.03   0      0.25   0.06
Results based on 10,000 replications. QR: traditional quantile regression; IVQR: instrumental variable quantile regression estimator of Chernozhukov and Hansen; SIVQR: instrumental variable quantile regression estimator using the sample selection correction procedure of Buchinsky; MDIVQR: minimum distance instrumental variable quantile regression. The data generating process is given by (6).
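The reported MSE figures appear to follow the decomposition MSE = Bias² + S.E.², up to rounding of the two-decimal entries. A quick check on the 100-observation α column of Table 3.1 (values copied from the table; the tolerance of 0.01 is an assumption reflecting that rounding):

```python
# Table 3.1, 100 observations, parameter alpha: (estimator, bias, S.E., reported MSE).
rows = [
    ("QR", -0.81, 0.30, 0.74),
    ("IVQR", 0.04, 0.46, 0.21),
    ("IVQR", 0.04, 0.68, 0.46),
    ("SIVQR", 0.01, 0.52, 0.27),
    ("MDIVQR", 0.04, 0.64, 0.41),
]

def mse(bias, se):
    """Mean squared error as squared bias plus variance."""
    return bias ** 2 + se ** 2

# Gap between the recomputed and the reported MSE, in table order.
gaps = [mse(bias, se) - reported for _, bias, se, reported in rows]
max_gap = max(abs(g) for g in gaps)
```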
Table 3.2: Monte-Carlo simulations, with interaction terms

          α (true value: 0)    βx (true value: 2)   βd (true value: 1)   βxd (true value: 1)
Estimator Bias   S.E.   MSE    Bias   S.E.   MSE    Bias   S.E.   MSE    Bias   S.E.   MSE
100 observations
QR        -0.8   0.38   0.79   0.33   0.46   0.32   1.19   0.44   1.62   -0.02  0.59   0.35
IVQR      0.42   0.52   0.45   -0.61  0.4    0.54   -0.55  0.57   0.62
IVQR      0.05   0.68   0.46   -0.07  0.78   0.62   -0.03  0.84   0.7    0.02   1.17   1.38
SIVQR     0.01   0.52   0.27   -0.03  0.55   0.3    0.01   0.65   0.42   0      0.64   0.41
MDIVQR    0.04   0.64   0.4    -0.06  0.73   0.54   -0.03  0.79   0.62   0.01   1.08   1.16
400 observations
QR        -0.82  0.18   0.71   0.35   0.23   0.18   1.21   0.21   1.5    -0.05  0.29   0.09
IVQR      0.38   0.24   0.2    -0.57  0.18   0.36   -0.52  0.26   0.34
IVQR      0.01   0.29   0.09   -0.01  0.34   0.11   -0.01  0.37   0.14   0      0.5    0.25
SIVQR     0      0.24   0.06   -0.01  0.25   0.06   0.01   0.3    0.09   0      0.3    0.09
MDIVQR    0.01   0.29   0.08   -0.01  0.34   0.11   -0.01  0.37   0.14   0      0.5    0.25
1600 observations
QR        -0.82  0.09   0.69   0.36   0.11   0.14   1.21   0.11   1.47   -0.06  0.15   0.02
IVQR      0.38   0.12   0.16   -0.57  0.09   0.33   -0.53  0.13   0.3
IVQR      0      0.14   0.02   0      0.16   0.03   0      0.18   0.03   0      0.25   0.06
SIVQR     0      0.11   0.01   0      0.12   0.01   0      0.15   0.02   0      0.14   0.02
MDIVQR    0      0.14   0.02   0      0.16   0.03   0      0.18   0.03   0      0.25   0.06
Results based on 10,000 replications. QR: traditional quantile regression; IVQR: instrumental variable quantile regression estimator of Chernozhukov and Hansen; SIVQR: instrumental variable quantile regression estimator using the sample selection correction procedure of Buchinsky; MDIVQR: minimum distance instrumental variable quantile regression. The data generating process is given by (6).
Table 3.3: Definition of the variables

Variable      Description
Lnghwage      The natural logarithm of gross hourly earnings from employment. Gross hourly wages are derived by dividing gross monthly earnings by monthly actual hours worked.
Expr          Number of years of potential work experience the individual has accumulated, measured by min(age - schooling - 6, age - 18).
Ed level      Ordered variable on education:
Ed level 1    Dummy; 1 if no degree or basic or intermediate schooling with no training.
Ed level 2    Dummy; 1 if basic schooling with apprenticeship.
Ed level 3    Dummy; 1 if intermediate schooling with apprenticeship.
Ed level 4    Dummy; 1 if high school (Abitur or Fachabitur) with no training or with apprenticeship.
Ed level 5    Dummy; 1 if high school with technical school or polytechnic.
Ed level 6    Dummy; 1 if university.
Psect         Dummy; 1 if employed in the public sector.
Fcivil        Dummy; 1 if father was a civil servant when the respondent was 16 years old.
Fblue         Dummy; 1 if father was a blue-collar worker when the respondent was 16 years old.
Fself         Dummy; 1 if father was self-employed when the respondent was 16 years old.
Fwhite        Dummy; 1 if father was a white-collar worker when the respondent was 16 years old.
Mnwork        Dummy; 1 if mother did not work when the respondent was 16 years old.
Table 3.4: Descriptive statistics, means

Variable                All      Public sector   Private sector
Lnghearn                2.693    2.745           2.677
Expr                    22.09    24.15           21.47
Education:
  Ed level 1            9.7%     5.7%            10.9%
  Ed level 2            30.1%    22.3%           32.4%
  Ed level 3            24.7%    26.4%           24.2%
  Ed level 4            8.4%     9.3%            8.1%
  Ed level 5            11.9%    13.5%           11.4%
  Ed level 6            15.3%    22.7%           13%
Fcivil                  10.3%    16.2%           8.6%
Fblue                   39.9%    35%             41.4%
Fself                   12%      12.6%           11.8%
Fwhite                  21.6%    23.2%           21.2%
Mnwork                  21.2%    25.2%           20%
Number of observations  3125     717             2408
Table 3.5: Median regression using different estimators

            RQ            IVQR          SIVQR public  SIVQR private AAI public    AAI private
Constant    1.614**       1.522**       1.961**       1.601**       1.732**       1.170**
            (0.056)       (0.064)       (0.244)       (0.066)       (0.138)       (0.378)
Expr        0.062**       0.063**       0.051**       0.060**       0.052**       0.0811**
            (0.004)       (0.005)       (0.009)       (0.006)       (0.007)       (0.017)
Expr^2      -1e-3**       -0.001**      -7e-4**       -1e-3**       -7e-4**       -1e-3**
            (9e-5)        (1e-4)        (2e-4)        (1e-4)        (2e-4)        (3e-4)
Ed level 2  0.198**       0.26**        0.085         0.251**       0.089         0.268**
            (0.041)       (0.047)       (0.094)       (0.050)       (0.087)       (0.087)
Ed level 3  0.317**       0.32**        0.187*        0.307**       0.196*        0.427**
            (0.042)       (0.049)       (0.094)       (0.053)       (0.087)       (0.112)
Ed level 4  0.34**        0.322**       0.256*        0.254*        0.295**       0.413**
            (0.053)       (0.07)        (0.114)       (0.077)       (0.095)       (0.093)
Ed level 5  0.587**       0.587**       0.406**       0.594**       0.435**       0.717**
            (0.045)       (0.055)       (0.104)       (0.064)       (0.087)       (0.104)
Ed level 6  0.709**       0.659**       0.481**       0.684**       0.525**       0.848**
            (0.045)       (0.069)       (0.112)       (0.079)       (0.084)       (0.112)
Psect       -0.084**      0.31
            (0.02)        (0.237)
Column 1: exogenous quantile regression; column 2: instrumental variable quantile regression; columns 3 and 4: SIVQR in the public and private sectors; columns 5 and 6: Abadie, Angrist and Imbens (2002) estimator in the public and private sectors. *: significant at the 5% level; **: significant at the 1% level. Analytical heteroscedasticity-consistent standard errors are given in parentheses.
Table 3.6: Estimation of the selection equation, dependent variable: psect

            Logit                       Klein and Spady
Variable    Coefficient   Std. error    Coefficient   Std. error
Constant    -1.6247***    0.2195
Expr        0.0240**      0.0120        0.0270**      0.0143
Expr^2      -0.0001       0.0002        -0.0001       0.0003
Ed level 2  -0.0218       0.1185        -0.1711       0.1748
Ed level 3  0.3014***     0.0806        0.2229***     0.1316
Ed level 4  0.4449***     0.0922        0.3964***     0.1118
Ed level 5  0.3469***     0.0876        0.3352***     0.1230
Ed level 6  0.5863***     0.0611        0.6126***     0.1005
Fcivil      0.4182***     0.1038        0.4767***     0.1246
Mnwork      0.1337**      0.0646        0.1705**      0.0791
Fblue       0.1079        0.0754        0.0120        0.1015
Fself       0.1307        0.0927        0.0514        0.0966
Fwhite      0.1569*       0.0853        0.1287        0.1017
Standard errors are obtained by bootstrapping the results 100 times for the Klein and Spady estimator and 1000 times for the logit estimator. *: significant at the 10% level, **: significant at the 5% level, ***: significant at the 1% level.
Table 3.7: P-values on the instrumental quantile regression process

                              Subsample size
Null hypothesis               250      2000     3125
No effect: α ( ⋅) = 0