Decomposing the Gender Wage Gap with Sample Selection Adjustment: Evidence from Colombia 1
Alejandro Badel
Federal Reserve Bank of St. Louis
Ximena Peña
Universidad de Los Andes
June, 2009
Abstract
Despite a strong convergence in the distribution of labor market characteristics between men and women in Colombia, relative hourly wages between the genders have not converged in the same order of magnitude and a sizeable gender wage gap persists. We employ quantile regression techniques to examine the degree to which differences in the distribution of observable characteristics can explain the gender gap, and rather find that the gap is largely explained by differences in the rewards to human capital characteristics. The remaining gap after controlling for observable factors is unevenly spread across the distribution, primarily affecting women at the top and the bottom of the distribution. We find that self selection is important, explaining roughly -50% of the gap, implying that able women self-select into work.
Keywords: Gender gap, semiparametric, quantile regression, selection.
JEL classification numbers: C21, J22, J31.
1 The authors are grateful to James Albrecht, Susan Vroman and participants at the Universidad de Los Andes and Rosario Seminars, NIP-Colombia and LACEA Conferences and BIARI 2009 for their comments. The usual disclaimer applies.
Introduction
In the past decades, the labor market indicators and characteristics of women in Colombia have improved relative to those of men. Whereas the participation rate of men has been fairly stable around 75%, the participation rate of women almost doubled from 30% in 1976 to nearly 60% in 2006 and is now among the highest in Latin America (Duryea et al., 2001). Between 1984 and 2006 in our sample 2, male participation was around 95% for the whole period, whereas female participation increased from 40% to 70%. Median hours of work remained at 48 hours for both groups throughout the period, which is the legal working week for manufacturing in the country. The fraction of women in the primary sector, industry and services rose from 15%, 29% and 43% to 27%, 37% and 51%, respectively. Comparing working men and women, women have substantially improved the characteristics that they bring to the labor market. First, the existing difference in potential experience has receded due to the increase in the participation rate of mothers (Amador et al., work). Average years of potential experience for men and women went from 18.0 and 16.6 to 19.0 and 18.7, respectively. In terms of education, women widened their educational advantage over men; average years of schooling rose from nearly 8.2 for men and 8.5 for women to 10.3 and 10.7. Furthermore, women reversed the education gap in college attainment and are now more educated than men (see Peña, 2006). Among those working, the ratio of the fraction of men with college education to the fraction of women with college education was 1.96 in 1986; by 2006 it was 0.94. Despite a strong convergence in the distribution of characteristics, relative hourly wages between the genders have not converged in the same order of magnitude and a sizeable gender wage gap persists.
The unconditional gender wage gap, that is, the difference in average wages between men and women, went from 23% to 14% during the same period 3. However, between 1986 and 2006 not only did the labor market characteristics of men and women change, but so did the market returns or ‘prices’ paid for such characteristics. To isolate the effects of these two sources of change we estimated a mean wage regression with 1986 data to obtain the average returns to observable characteristics in 1986. The gender wage gap implied by the 1986 returns and the characteristics of men and women in 2006 is 2%; as mentioned earlier, the observed gap in 2006 was 14%. These facts appear puzzling: the gap did not fall much although the other measures converged strongly, and therefore the unexplained portion of the gap increased substantially. In this paper we employ quantile regression techniques to study the gender gap. We find that the analysis of means is misleading. Higher moments of the distribution of characteristics go a long way in explaining the gender gap. Men are paid significantly more than women and the raw gap displays a U-shape: women's wages fall further below
2 See the data section for a full description of our sample.
3 The conditional gap, that is, the gender gap after controlling for labor market characteristics, has decreased from 20.5% at the beginning of the period to 11.4% in 2006.
men's at the extremes of the distribution, whereas they are closer around the middle. We employ the Machado-Mata (MM hereafter) decomposition technique to decompose the gap into a component due to differences in human capital characteristics such as education and age (the composition effect) and a component due to differences in the rewards to these characteristics (the price effect). We examine the degree to which the distribution of male and female characteristics can explain the gender gap in 2006 and find that it is largely explained by the price effect. The remaining gap after controlling for observable factors is unevenly spread across the distribution, primarily affecting women at the top and the bottom of the distribution. The MM technique has been used to decompose wage gaps across the distribution in several developed economies (see for example Albrecht et al., 2003, for Sweden; de la Rica et al., 2007, for Spain). Regarding developing countries, several papers calculate and decompose the gender wage gap along the distribution (see for example Ganguli and Terrell, 2005, for Ukraine; Ñopo, 2006, for Chile; Fernández, 2006, for Colombia). However, these papers do not control for sample selection, which is often an issue in these calculations. Albrecht et al. (2007, AVV in what follows) propose an extension of the MM technique to account for selection following Buchinsky (1998); this paper applies the AVV methodology. Although Colombia has one of the highest female labor participation rates in Latin America, self-selection of women into work is important, explaining roughly -50% of the gap. We find a positive selection effect, that is, able women self-select into work. The analysis of the gender wage gap is important for a developing country because it is a relevant measure of how unequal a society is. Colombia is a highly unequal country; its income inequality, as measured by the Gini index, is among the highest in Latin America.
Thus, gender gap calculations and decompositions are especially interesting. To the best of our knowledge, there are no papers that decompose the selection-corrected gender wage gap along the distribution for developing countries.
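The counterfactual exercise described in this introduction (1986 returns applied to 2006 characteristics) can be illustrated with a minimal sketch. It is not the authors' code: the function names `fit_mean_returns` and `implied_gap` are hypothetical, the data are synthetic, and the sketch assumes gender-specific mean regressions, which the text does not specify (a pooled regression corresponds to passing the same coefficient vector for both groups).

```python
import numpy as np

def fit_mean_returns(X, log_wage):
    """OLS coefficients (intercept first) of a mean log-wage regression."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, log_wage, rcond=None)
    return beta

def implied_gap(X_men, X_women, beta_men, beta_women):
    """Gap in mean log wages implied by given returns and characteristics."""
    xbar_men = np.concatenate(([1.0], np.mean(X_men, axis=0)))
    xbar_women = np.concatenate(([1.0], np.mean(X_women, axis=0)))
    return float(xbar_men @ beta_men - xbar_women @ beta_women)

# Synthetic data for illustration only (columns: schooling, experience).
rng = np.random.default_rng(0)

def simulate(mean_school, mean_exper, intercept, n=2000):
    X = np.column_stack([rng.normal(mean_school, 3.0, n),
                         rng.normal(mean_exper, 8.0, n)])
    y = intercept + 0.08 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(0, 0.5, n)
    return X, y

X_m86, y_m86 = simulate(8.2, 18.0, intercept=6.9)   # hypothetical "1986 men"
X_w86, y_w86 = simulate(8.5, 16.6, intercept=6.7)   # hypothetical "1986 women"
X_m06, _ = simulate(10.3, 19.0, intercept=6.9)      # hypothetical "2006 men"
X_w06, _ = simulate(10.7, 18.7, intercept=6.7)      # hypothetical "2006 women"

beta_m86 = fit_mean_returns(X_m86, y_m86)
beta_w86 = fit_mean_returns(X_w86, y_w86)

# Gap implied by 1986 returns combined with 2006 characteristics.
print(implied_gap(X_m06, X_w06, beta_m86, beta_w86))
```

Comparing this implied gap with the observed gap separates the part of the change that comes from characteristics from the part that comes from returns, which is the logic behind the comparison of 2% and 14% above.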
Descriptive Statistics and Data
We use the Colombian Household Survey (CHS), a repeated cross-section carried out by the Statistics Department. It collects information on demographic and socioeconomic characteristics such as gender, age, marital status and educational attainment, as well as labor market variables for the population aged 12 or more, including occupation, job type, income and sector of employment. We use the June 1986, 1996 and 2006 waves to analyze the evolution of the raw gap and then focus on the latest wave to perform the selection correction and decomposition exercises.
Our analysis focuses on the seven main cities, which account for 60% of the urban population; according to 2005 Census data, 78% of Colombians live in urban areas 4. In the 7 main cities 93% of men between 25 and 55 years of age work, while only 69% of women do. When we compare Bogotá and the other cities, we find that even though the levels of male participation are comparable, female participation is significantly higher in Bogotá: 75% vs. 65%. We use only observations with a complete set of covariates and restrict our sample to prime-aged individuals (between 25 and 55 years of age) who report working between 16 and 84 hours per week 5 and earn more than one dollar per day. Table 1 shows the sample selection for 2006, which leaves 15,423 observations, equivalent to nearly 4 million using weights, 47% of which are female. The sample selection process was designed to minimize measurement error in the log hourly wage.

Table 1: Sample Selection, April-June 2006
                                     No. Observations   Weighted      % Men
7 main cities, 12+ years             46.439             14.200.850    0,44
Ages 25 to 55 years…                 23.915             6.047.089     0,43
who work,                            16.513             4.302.923     0,51
report 16-84 hours per week          15.563             4.012.872     0,52
and earn more than US$1 per day.     15.423             3.978.580     0,52
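The sample restrictions described above can be written as a single boolean filter. The following is a minimal sketch, not the authors' code; the function names and the argument names (`age`, `works`, `weekly_hours`, `daily_earnings_usd`) are hypothetical and do not correspond to actual CHS variable names.

```python
import numpy as np

def estimation_sample_mask(age, works, weekly_hours, daily_earnings_usd):
    """True for prime-aged workers with 16-84 weekly hours earning > US$1/day."""
    age = np.asarray(age)
    works = np.asarray(works, dtype=bool)
    weekly_hours = np.asarray(weekly_hours)
    daily_earnings_usd = np.asarray(daily_earnings_usd)
    return (
        works
        & (age >= 25) & (age <= 55)
        & (weekly_hours >= 16) & (weekly_hours <= 84)
        & (daily_earnings_usd > 1.0)
    )

def weighted_share(indicator, weights):
    """Weighted fraction of observations with indicator == 1 (e.g. % men)."""
    indicator = np.asarray(indicator, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(indicator * weights) / np.sum(weights))
```

Applying the mask to the survey weights and summing them would give the weighted population counts of the kind reported in the second column of Table 1.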
In addition to the aforementioned differences in participation rates, men and women also display important differences in hours worked per month. Even though in our sample both have median hours of 208, men work on average 220 hours per month while women work 197 hours. The dependent variable is the log hourly wage. The explanatory variables included in the estimations are: age and its square 6, four education groups 7, and dummies for marital status 8 and head of household. The descriptive statistics are summarized in Table 2. First, men earn higher mean hourly wages than women: the average log wage for men is 7.86 and
4 Bogotá accounts for 45% of the population in the 7 main cities but, given the design of the CHS, only 15% of the sample. Sample weights are used to get representative results. Instead of extending the MM methodology to include sample weights, we perform calculations for Bogotá and Elsewhere separately, and then build the weighted distribution as follows:
a) Let q_i be the percentiles of the log wage distribution for i = {Bogotá, Elsewhere}.
b) For each q_i, calculate the percentile levels at which it lies in the other distribution j, and call these P_j. E.g. P_Bog = F_Bog(q_else).
c) The percentiles q_else correspond to the Pr(z=Bog)∗P_Bog + (1−Pr(z=Bog))∗(0.01, 0.02, 0.03, ..., 0.99) percentile levels of the country distribution (and analogously for q_Bog).
d) Obtain the country percentiles by linear interpolation.
5 The legally defined full time work is 48 hours per week in Colombia.
6 There is no available information in the survey regarding work experience, nor information about the number of births per woman; the latter is only identifiable for the head of household or spouse. Therefore, we use age and its square to proxy for experience instead of a transformation of age and schooling.
7 The education groups are: no completed education, completed primary, completed secondary and completed tertiary.
8 We summarize the marital status information into two categories: ‘together’, including individuals married or cohabiting, which we refer to as married, and ‘alone’, which includes the categories single, divorced, separated and widowed.
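Steps a) to d) of footnote 4 can be sketched as follows. This is an illustrative reading of the footnote, not the authors' implementation: the function name `country_percentiles` is hypothetical, each group's distribution function is approximated by linear interpolation through its own 99 percentiles, and values outside a group's observed percentile range are clamped to levels 0 or 1, which is an assumption of this sketch.

```python
import numpy as np

LEVELS = np.arange(1, 100) / 100.0  # 0.01, 0.02, ..., 0.99

def country_percentiles(q_bog, q_else, pr_bog, levels=LEVELS):
    """Combine group percentiles into percentiles of the weighted mixture.

    q_bog, q_else : increasing arrays of log-wage percentiles at `levels`
    pr_bog        : weighted population share of Bogotá, Pr(z = bog)
    """
    q_bog = np.asarray(q_bog, dtype=float)
    q_else = np.asarray(q_else, dtype=float)

    # Step b: level of each group's percentiles in the other distribution,
    # e.g. P_Bog = F_bog(q_else), using linear interpolation.
    p_bog = np.interp(q_else, q_bog, levels, left=0.0, right=1.0)
    p_else = np.interp(q_bog, q_else, levels, left=0.0, right=1.0)

    # Step c: levels of q_else and q_bog in the country distribution.
    level_of_q_else = pr_bog * p_bog + (1.0 - pr_bog) * levels
    level_of_q_bog = pr_bog * levels + (1.0 - pr_bog) * p_else

    # Step d: pool the (level, value) pairs, sort by value, and read off
    # the country percentiles by linear interpolation.
    values = np.concatenate([q_bog, q_else])
    country_levels = np.concatenate([level_of_q_bog, level_of_q_else])
    order = np.argsort(values, kind="stable")
    return np.interp(levels, country_levels[order], values[order])
```

With the weighted share from this footnote, a call would look like `country_percentiles(q_bog, q_else, pr_bog=0.45)`. When both groups have the same percentiles the function returns them unchanged, which is a quick sanity check on the construction.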
7.72 for women. There are sizeable differences between the traditional labor market characteristics of working and non-working women, which is suggestive of non-random selection into work. The distribution of age and schooling is very similar between men and working women. Working men and women have similar average age, whereas non-working women are nearly 2 years older. Working women are the most educated, followed by men and finally non-working women; the education distribution of working women first-order stochastically dominates that of working men, which in turn first-order stochastically dominates that of non-working women. Working men and non-working women display similar proportions of married individuals, 69% and 67% respectively, whereas only 48% of working women report being married. Males are more often head of household than females: 69% of men are head of household, while only 30% of working women and 17% of non-working women are.

Table 2. Descriptive Statistics, Wage Equation
                       Men        Women      Women
                       Working    Working    Not Working
Log Wage               7,86       7,72
                       (0,76)     (0,82)
Age                    38,33      38,01      39,93
                       (8,57)     (8,34)     (9,17)
Education
  < Primary            0,07       0,07       0,10
  Primary +            0,34       0,31       0,39
  Secondary +          0,41       0,40       0,40
  University           0,18       0,22       0,10
Married                0,69       0,48       0,67
Head of Household      0,69       0,30       0,17
Bogotá                 0,43       0,47       0,33
Home Ownership         0,49       0,52       0,57
# Children 2-6yrs 2 0,18 0,13 0,15 1 0,03 0,02 0,02 0,04 0,02 0,02 # Children