A Comparison of Propensity Score and Linear Regression ... - CiteSeerX

Joint Statistical Meetings - Social Statistics Section

A COMPARISON OF PROPENSITY SCORE AND LINEAR REGRESSION ANALYSES OF GENDER GAPS IN COMPUTER SYSTEMS ANALYSTS CAREERS

Elaine Zanutto
The Wharton School, University of Pennsylvania, 466 JMHH, Philadelphia, PA 19104

KEY WORDS: information technology careers, salary, job satisfaction, SESTAT, family power theory

1. Introduction

The field of Information Technology (IT) has provided extraordinary job growth in the United States (Council of Economic Advisers 2000). However, the impact of possible gender and racial gaps in IT on the nation's future IT preparation is of great concern (Gearan 2000). This paper extends current research on gender gaps in IT salary and job satisfaction by analyzing data on computer systems analysts from the National Science Foundation's 1997 SESTAT (Scientists and Engineers Statistical Data System) database (NSF 99-337). The SESTAT database contains information from several national surveys of people with at least a bachelor's degree in science or engineering, or with at least a bachelor's degree in a non-science-and-engineering field but working in science and engineering (for a detailed description of the coverage limitations see NSF 99-337). The SESTAT database has the advantage of very large sample sizes and high response rates (generally over 80%) compared with surveys used previously in the gender gap literature (e.g., Kirchmeyer 1998; Shapiro 1994), and it also contains many more detailed education, job-related, and family-related variables than other national surveys such as the Current Population Survey.

We use two statistical techniques for assessing gender gaps: multiple linear regression and propensity score analysis. Multiple linear regression is currently the most commonly used technique (Finkelstein and Levin 2001; Gray 1993); however, the statistical literature suggests that propensity score analysis has several advantages over multiple linear regression (Perkins et al. 2000; Rubin 1997). In particular, propensity score analyses are less sensitive to the specification of the model than linear regression (Dehejia and Wahba 1999; Rubin 1997; Drake 1993). Also, propensity score analysis has the intuitive appeal of creating groups of similar men and women, which is similar in concept to the audit pairs commonly used in labor discrimination experiments (Darity and Mason 1998).

This research was supported by NSF CISE-ITWF grant #0089872.

2. Previous Research

Literature on gender gaps in salary and job satisfaction in IT careers is limited, but the Council of Economic Advisers (2000) reports that median earnings for women in IT are on average 12% lower than those for men, even after accounting for differences in age, education, race, and occupational composition. Shapiro (1994) suggests that even where women hold the necessary qualifications and experience, they may still experience problems in progressing their careers. Related research on the gender gap in salaries in engineering occupations (NSF 99-352) reports that although the median salary for women engineers was 13% less than the median salary for men in 1995, the difference was lowered to only 2% after controlling for covariates such as years of experience, engineering specialty, education, employment sector, and region of the country, suggesting the importance of controlling for these types of covariates in gender comparisons. Finally, research on gender differences in management careers suggests that spouse's employment status is an important predictor of income in mid-career managers (Kirchmeyer 1998). This study also reports that even though the women in the sample earned less income than the men, they perceived their careers to be as successful. Stroh, Brett and Reilly (1992) found that in a sample of managers, even though the women had similar education, worked in similar industries, had similar geographic mobility, and had similar levels of “family power” (i.e., percentage of family income contributed by the employee) as men, women were still disadvantaged with regard to salary progression.

3. The Data

We analyze data from the 1997 SESTAT database, the most recent SESTAT database to include both salary and job satisfaction information. This database contains information about 9324 IT professionals including computer systems analysts, computer software engineers, computer programmers, and others. Due to differences between IT occupations, each occupation should be analyzed separately. This analysis focuses on the sample of 2035 computer systems analysts (1497 men, 538 women) who were working full-time as computer systems analysts in the United States. Due to the nature of the SESTAT database, all of these systems analysts possess at least a bachelor’s degree. We focus on systems analysts who have had their degree for at least two years. In the (unadjusted) survey-weighted data, the average salary of women is 7.6% ($4,500.74) less than the average salary for men in this population and there is a 3% (0.05 point) difference in average job satisfaction. Revised estimates of the gender differences that control for relevant background characteristics are presented in Section 5.

4. Methodology

4.1 Multiple Linear Regression

The most common form of regression analysis used to study gender salary gaps predicts salary from gender and other important covariates such as education, experience, productivity, and other market factors such as region of the country. If the coefficient for gender is statistically significant after controlling for the other confounders, then a statistically significant wage gap is declared. Extension of this model to two separate models, one for men and one for women, has been advocated by several researchers (Finkelstein and Levin 2001, Gray 1993, Gastwirth 1993) so as not to restrict the analysts to considering only differences that can be represented by a shift in the regression line for women relative to men. In this methodology the coefficients of the explanatory variables from the two separate regression equations must be compared in order to determine if there are gender differences. The two-equation model is equivalent to a single-equation model containing interaction terms between the gender indicator variable and each of the other factors. Bonferroni adjustments, or other adjustments for multiple comparisons, should be used when testing the interaction coefficients or, equivalently, when comparing the coefficients from the two separate models.

4.2 Propensity Score Adjustments

As an alternative to multiple linear regression, a propensity score analysis (Rubin 1997, Rosenbaum and Rubin 1984) can be used to create groups of men and women that have similar characteristics so that salary and job satisfaction comparisons can be made within these matched groups. As a first step in the propensity score analysis it is useful to assess the initial imbalance in the distributions of the covariates between the two groups. This can be accomplished with two-sample t-tests for continuous covariates or two-sample tests of proportions for binary covariates (see Endnote 1).
This will highlight covariates that have different distributions (i.e., are imbalanced) across the two groups and for which we will therefore want to control. It also provides a benchmark against which to verify that the propensity score adjustment methodology has in fact produced groups with more balance in the covariates.

After the assessment of initial imbalance, we fit a logistic regression model for estimating the probability that a person is male. The predictor variables should include all variables that may affect salary and job satisfaction. Formally, we estimate the propensity to be male (i.e., a “propensity score”) given values of the background covariates. Unlike other propensity score applications, we cannot imagine here that given similar background characteristics gender was randomly assigned. Nevertheless, we can use the propensity score framework to create groups of men and women who share similar background characteristics.

After the propensity scores are estimated, the overlap in the distribution of the propensity scores for men and women should be examined. Since the propensity score can be thought of as a one-number summary of the characteristics of a person, checking for overlap in the propensity scores verifies that there are comparable men and women in the data set. If there is little or no overlap in the propensity score distributions, this is an indication that the men and women in the sample are very different and comparisons between these groups should be made with extreme caution or not at all. This ability to easily check that the data can support comparisons between the two groups is one of the advantages of a propensity score analysis over a regression analysis. The overlap in the propensity scores will also indicate the range over which comparisons can be made (e.g., Dehejia and Wahba 1999). All further analyses are restricted to men and women within the overlap range.
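The restriction to the overlap range can be sketched in a few lines. This is only an illustration under stated assumptions: the paper does not say how the overlap range was computed, so the min/max rule below is one common convention, and the function names and propensity scores are hypothetical rather than SESTAT values.

```python
def overlap_range(scores_men, scores_women):
    """Return (low, high), the propensity score range covered by both groups.

    Assumption: overlap is defined by the min/max rule, i.e. from the larger
    of the two group minima to the smaller of the two group maxima.
    """
    low = max(min(scores_men), min(scores_women))
    high = min(max(scores_men), max(scores_women))
    if low > high:
        raise ValueError("the two groups have no overlapping propensity scores")
    return low, high


def restrict_to_overlap(scores, low, high):
    """Keep only the scores that fall inside the overlap range."""
    return [s for s in scores if low <= s <= high]


# Hypothetical estimated propensities to be male (illustration only).
men = [0.55, 0.62, 0.70, 0.81, 0.93, 0.97]
women = [0.35, 0.48, 0.58, 0.66, 0.74, 0.90]

low, high = overlap_range(men, women)
kept_men = restrict_to_overlap(men, low, high)
kept_women = restrict_to_overlap(women, low, high)
print(low, high, len(kept_men), len(kept_women))
```

In this toy example the two men with the highest scores and the two women with the lowest scores fall outside the range and would be set aside, which mirrors the idea that people with no comparable counterpart in the other group are excluded from further analysis.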
The next step is to subclassify the sample into five groups according to propensity score quintile to create five groups of men and women with similar background characteristics. Cochran (1968) and Rosenbaum and Rubin (1984) have shown that creating five strata based on propensity score quintiles can be expected to remove approximately 90% of the bias due to the imbalance in the distribution of the covariates between the two groups.

As a check on the adequacy of our propensity score model, we conduct a series of two-way ANOVA analyses to assess the covariate balance in the five groups of matched men and women. For each covariate, we fit a two-way ANOVA where the covariate is the dependent variable and the two factors are gender and the propensity score quintile (see Endnote 2). If this analysis shows a nonsignificant main effect of gender and a nonsignificant interaction between propensity score quintile and gender, then this indicates that men and women within the five propensity score groups have similar distributions (that is, are balanced) on this covariate. Again, the problem of multiple comparisons should be kept in mind when conducting these tests. We would expect 5% of these tests to be significant at the 0.05 level purely by chance, so if 5% or fewer of our tests are significant, then we have achieved as much or more balance than would be expected in a randomized experiment and we can proceed to the next step of the analysis.

If important differences between the two groups remain after subclassification, this indicates an inadequacy in the propensity score model. The next step would then be to re-estimate the propensity score model including interactions and quadratic terms of variables that remain out of balance, subclassify on the new propensity scores, and reassess balance. This procedure is repeated until adequate balance is achieved. If important differences remain after repeated modeling attempts, covariance adjustments can be used at the final stage to adjust for remaining differences (Dehejia and Wahba 1999, Rosenbaum 1986).

Finally, to estimate the average salary difference between men and women, controlling for background covariates, we first estimate the average difference within each propensity score quintile group by subtracting the average female salary from the average male salary within each group. Then an overall estimate of the difference is obtained by averaging the differences across all five quintiles, weighting each quintile by its share of the sample. This procedure is summarized by the following formula:

\[
\Delta = \sum_{k=1}^{5} \frac{n_k}{N} \left( \bar{y}_{Mk} - \bar{y}_{Fk} \right)
\]

where $\Delta$ is the expected overall gender difference, $k$ indexes the propensity score quintile, $N$ is the total sample size, $n_k$ is the sample size in propensity score quintile $k$, and $\bar{y}_{Mk}$ and $\bar{y}_{Fk}$, respectively, are the average salaries for males and females within propensity score quintile $k$. The estimated standard error of this estimated difference is commonly calculated as

\[
\hat{s}(\Delta) = \sqrt{ \sum_{k=1}^{5} \frac{n_k^2}{N^2} \left( \frac{s_{Mk}^2}{n_{Mk}} + \frac{s_{Fk}^2}{n_{Fk}} \right) }
\]

where $n_{Mk}$ and $n_{Fk}$ are the numbers of males and females, respectively, in quintile $k$, and $s_{Mk}^2$ and $s_{Fk}^2$ are the sample variances of salary for males and females, respectively, in quintile $k$.

4.3 Complex Survey Design Considerations

Both linear regression and propensity score analyses are further complicated when the data have been collected using a complex sampling design, as is the case with the SESTAT data. One strategy in this situation is to fit the regression model using both ordinary least squares and a survey-weighted version of this approach. Large differences between the two analyses may indicate model misspecification (DuMouchel and Duncan 1983). Similar advice should apply to propensity score analyses.

5. Data Analysis
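Before turning to the data, the subclassification estimate of Section 4.2 and its standard error can be sketched in code. This is a minimal illustration, not the code used for the analysis: the function name `stratified_gap` and the salary figures are hypothetical, and only two strata are shown for brevity where the paper uses five quintiles.

```python
import math
from statistics import mean, variance


def stratified_gap(strata):
    """Return (delta, se) for a list of (male_salaries, female_salaries) strata.

    delta = sum_k (n_k / N) * (mean_Mk - mean_Fk)
    se    = sqrt(sum_k (n_k / N)^2 * (s2_Mk / n_Mk + s2_Fk / n_Fk))
    """
    total = sum(len(m) + len(f) for m, f in strata)  # N
    delta = 0.0
    var = 0.0
    for males, females in strata:
        weight = (len(males) + len(females)) / total  # n_k / N
        delta += weight * (mean(males) - mean(females))
        # statistics.variance is the sample variance (n - 1 denominator).
        var += weight ** 2 * (
            variance(males) / len(males) + variance(females) / len(females)
        )
    return delta, math.sqrt(var)


# Hypothetical salaries in $1,000s for two strata (not SESTAT data).
strata = [
    ([52, 56], [50, 54]),
    ([60, 64, 68], [58, 62]),
]
delta, se = stratified_gap(strata)
print(f"gap = {delta:.3f}, s.e. = {se:.3f}")
```

Each stratum needs at least two men and two women for the sample variances to be defined, which is one practical reason to check overlap before subclassifying.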

5.1 Variables

In estimating gender differences, we control for educational, job-related, and demographic characteristics of the employees listed in Table 1. We have included family-related variables (marital status, whether or not the spouse works full-time or part-time, willingness to move for a job, and the percent of household income earned by the employee) in our analysis in order to address the family power theory of career progression (e.g., Stroh et al. 1992). This theory proposes that the family member who provides the family with the most financial resources has the greatest power within the family and the family will tend to optimize that person’s career.

We comment here on a few of the other variables for clarification. The work activities variables represent whether each activity represents at least 10% of the employee’s time during a typical workweek (1=yes, 0=no). The supervisor work variable represents whether the employee's job involves supervising the work of others (1=yes, 0=no). Employer size is measured on a scale of 1-7 (1=under 10 employees, 2=10-24 employees, 3=25-99 employees, 4=100-499 employees, 5=500-999 employees, 6=1000-4999 employees, 7=5000 or more employees). We treat this as a quantitative variable in the regression since larger values are associated with larger employers. Similarly, job satisfaction is measured by a single question with responses on a scale of 1-4 (1=very satisfied, 2=somewhat satisfied, 3=somewhat dissatisfied, 4=very dissatisfied). Again we treat this as a quantitative variable for the purposes of this analysis. The binary variables representing whether spouse works full-time and whether spouse works part-time (1=yes, 0=no) are coded as zero for anyone without a spouse. The variable measuring willingness to move is a crude proxy measuring whether or not the employee’s current job is in a different region of the country than where they received their most recent degree (1=yes, 0=no).
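The coding rules just described can be made concrete with a short sketch. The function names and the string labels for spouse status are hypothetical; only the category boundaries and the zero coding for employees without a spouse come from the description above.

```python
from bisect import bisect_right

# Lower bounds of employer size categories 2 through 7, as described above.
SIZE_CUTS = [10, 25, 100, 500, 1000, 5000]


def employer_size_code(n_employees):
    """Map a head count to the 1-7 employer size scale."""
    if n_employees < 1:
        raise ValueError("an employer must have at least one employee")
    return bisect_right(SIZE_CUTS, n_employees) + 1


def spouse_indicators(married, spouse_status=None):
    """Return (works_full_time, works_part_time) as 0/1 indicators.

    Both indicators are 0 for anyone without a spouse. The labels
    "full-time" and "part-time" are assumed for illustration.
    """
    if not married:
        return (0, 0)
    return (int(spouse_status == "full-time"), int(spouse_status == "part-time"))


print([employer_size_code(n) for n in (9, 10, 99, 100, 4999, 5000)])
print(spouse_indicators(False), spouse_indicators(True, "part-time"))
```

Treating the 1-7 code as quantitative, as the analysis does, assumes roughly equal salary effects for each step up the scale even though the underlying head-count bands are far from evenly spaced.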
Percent of household income earned by the employee is measured for 1996 since the surveys asked about total household income only for the previous year. Also, since respondents to the National Survey of Recent College Graduates (graduates during the past two years) were not asked about total household income for the previous year, the database provides total household income information only for respondents who had graduated at least two years ago (SESTAT component surveys are repeated every two years). Finally, the regression models contain squared terms for years since most recent degree and years in current job since the rate of growth of salaries may slow as employees acquire more experience (Gray 1993). To reduce multicollinearity between each of these variables and its square, these variables have been mean-centered.

5.2 Regression Results

We summarize results from two regression models predicting salary. In the “basic” model, salary is predicted from education, job-related variables, and demographic variables. In the “full” model, salary is predicted from these variables plus family-related variables. The basic model is similar to models that have been used in other studies (e.g., NSF 99-352). The full model extends the basic model to include family-related variables that may influence work-related decisions (e.g., relocation).

Results from the basic model indicate a preliminary salary gap of $2,198.47 (s.e.=$962.44, p=0.02) with men earning more than women. This preliminary estimate of the salary gap, with women earning approximately 3.7% less than men, is consistent with the salary gap found from a similar model for salaries of male and female engineers (NSF 99-352). However, after including family-related variables, results from the full model, shown in Table 1, show that the gender gap is no longer statistically significant (p=0.93). This suggests that it is important to consider the impact of family-related variables that can influence career strategies, as suggested by the family power theory.

We note that for both of these regression models, diagnostic measures were satisfactory. These models were also fit including all two-way interactions with gender, but none of the interactions were found to be significant after making Bonferroni adjustments for multiple comparisons, suggesting that separate models for men and women are unnecessary.
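The Bonferroni adjustment applied to the interaction tests can be illustrated as follows. This is a sketch only: the helper name and the p-values are hypothetical and are not results from the models described above.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests that remain significant after a Bonferroni adjustment.

    Each p-value is compared with alpha / m, where m is the number of tests.
    Returns the list of flags and the adjusted threshold.
    """
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values], threshold


# Hypothetical p-values for five gender-by-covariate interaction terms.
p_values = [0.012, 0.03, 0.20, 0.47, 0.81]

flags, threshold = bonferroni_significant(p_values)
unadjusted = [p < 0.05 for p in p_values]
print(f"threshold = {threshold:.3f}")
print("significant before adjustment:", sum(unadjusted))
print("significant after adjustment: ", sum(flags))
```

In this toy example two interactions look significant when each is tested at the 0.05 level, but neither survives the adjusted threshold of 0.01, which is the pattern that leads to keeping a single pooled model.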
Similar gender difference estimates and regression diagnostics were obtained from regression models using log salary and from survey-weighted regressions.

Similar analyses were performed for predicting job satisfaction. Four models were fit to predict job satisfaction: the basic and full models, plus two additional models formed by adding salary as a predictor to the full and basic models. The gender effect was not significant in any of these models. Again, similar results were obtained from survey-weighted regression.

5.3 Propensity Score Results

As in the linear regression analyses, propensity score analyses were performed with and without controlling for family variables (the full and basic models, respectively). For both models, after subclassifying on the propensity score, all of the covariates are balanced (i.e., none of the imbalances are significant at the 5% level). In comparison, in the raw (unweighted) data, 15 covariates have differences with p-values of less than 0.05, which is many more than we expect by chance alone.

For the full model, the estimated propensity scores overlapped for 520 women and 1452 men, making comparisons for these people possible. For the basic model, the estimated propensity scores overlapped for 529 women and 1462 men. Although they are difficult to classify, the 46 men who were dropped from the analysis in the full model because they could not be matched to similar women have slightly higher average incomes ($73,121.33) than the total sample of men, earned a very large portion of the household income (96% on average), didn’t have spouses that worked full-time, and had a high incidence of supervisory responsibilities (76%). The 17 women who were dropped because they could not be matched to similar men had slightly lower incomes ($49,732.47) than the total sample of women, earned a very small percent of the household income (5% on average), had spouses who worked full-time (86%), and had a slightly lower incidence of supervisory responsibilities than the total sample of women (29.4%). Less extreme differences were found in the 9 women and 35 men dropped from the basic model.

Propensity-adjusted estimates of gender differences are similar to the regression estimates. In the basic model without family variables the preliminary estimate of the gender gap is $1,902.29 with men earning more than women, but the difference is not statistically significant (s.e.=$1,483.76, p=0.20). In the full model including family variables the gender effect is not statistically significant (p=0.79).
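The reported p-values can be related to the estimates and standard errors with a quick check. The sketch below assumes a two-sided test with a normal reference distribution, which is an approximation chosen here for illustration (the paper does not state the reference distribution, and with samples of about 2,000 a t distribution would give nearly identical values). The function name is hypothetical.

```python
import math


def two_sided_p(estimate, std_error):
    """Two-sided p-value for estimate / std_error under a normal approximation."""
    z = abs(estimate / std_error)
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


# Estimates and standard errors quoted in Sections 5.2 and 5.3 and in Table 1.
p_regression_basic = two_sided_p(2198.47, 962.44)
p_propensity_basic = two_sided_p(1902.29, 1483.76)
p_regression_full = two_sided_p(82.75, 1010.47)

print(f"regression, basic model: p = {p_regression_basic:.2f}")
print(f"propensity, basic model: p = {p_propensity_basic:.2f}")
print(f"regression, full model:  p = {p_regression_full:.2f}")
```

The two basic-model point estimates are close to each other; the difference in significance comes mainly from the larger standard error of the subclassification estimate.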
Although the regression and propensity score analyses disagree on the statistical significance of the gender gap when not controlling for family-related variables, estimates from the basic models should be used with caution. Both methods are susceptible to omitted-variable bias, which is present in both basic models because the family-related variables, which we know to be important predictors, have been omitted from these models.

As in the linear regressions, similar analyses for job satisfaction (4 models: with and without family variables, with and without salary) yield no significant gender differences in job satisfaction. Again, similar results are obtained from survey-weighted analyses for both salary and job satisfaction.

6. Discussion

After controlling for educational, job-related, demographic and family-related variables, we found no statistically significant difference in salary or job satisfaction between male and female computer systems analysts in the 1997 SESTAT data. These results suggest that although educational, job-related, and some demographic variables can account for much of the difference between the salaries of men and women, it is also important to consider family-related variables that may affect career strategies when assessing gender gaps in population-wide studies.

With regard to the statistical methodology, multiple linear regression and propensity score analysis are both useful methods of estimating gender gaps in salaries or job satisfaction. Although the main results from the multiple linear regression and propensity score analyses agreed in these analyses, they are not guaranteed to do so. We plan further comparisons of these two methods, in particular when there is less overlap in the covariate distributions of the two groups. Future analyses will also examine whether similar results are found for other IT occupations such as computer software engineers and computer programmers.

Endnotes

1. Chi-squared tests can be used for multi-category variables; however, to simplify our analysis we transformed these variables into a series of dummy variables and then used two-sample tests of proportions.

2. For binary covariates it would be more appropriate to use logistic regressions with the covariate as the dependent variable and dummy variables for the propensity score quintiles and gender and all two-way interactions, but for the large samples in our analysis ANOVAs should give adequate approximations. Similar loglinear models could be used for multi-category variables; however, transforming these variables to a series of dummy variables can simplify the analysis, allowing the use of two-way ANOVAs or logistic regressions rather than loglinear models.

References

Cochran, W. G. (1968), “The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies,” Biometrics, 24: 205-213.

Council of Economic Advisers (2000), “Opportunities and Gender Pay Equity in New Economy Occupations,” White Paper, May 11, 2000, Washington, D.C.: Council of Economic Advisers.

Darity, W. A., and Mason, P. L. (1998), “Evidence on Discrimination in Employment: Codes of Color, Codes of Gender,” The Journal of Economic Perspectives, 12: 63-90.

Dehejia, R. H., and Wahba, S. (1999), “Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs,” Journal of the American Statistical Association, 94: 1053-1062.

Drake, C. (1993), “Effects of Misspecification of the Propensity Score on Estimators of Treatment Effect,” Biometrics, 49: 1231-1236.

DuMouchel, W. H., and Duncan, G. J. (1983), “Using Sample Survey Weights in Multiple Regression Analyses of Stratified Samples,” Journal of the American Statistical Association, 78: 535-543.

Finkelstein, M. O., and Levin, B. (2001), Statistics for Lawyers, Second Edition, New York: Springer-Verlag.

Gastwirth, J. L. (1993), “Comment on ‘Can Statistics Tell Us What We Do Not Want to Hear? The Case of Complex Salary Structures’,” Statistical Science, 8: 165-171.

Gearan, A. (2000), “Clinton Chides Tech Biz Over Pay Gap,” Associated Press (May 11, 2000).

Gray, M. (1993), “Can Statistics Tell Us What We Do Not Want to Hear? The Case of Complex Salary Structures,” Statistical Science, 8: 144-179.

Kirchmeyer, C. (1998), “Determinants of Managerial Career Success: Evidence and Explanation of Male/Female Differences,” Journal of Management, 24: 673-692.

National Science Foundation (NSF 99-337), SESTAT: A Tool for Studying Scientists and Engineers in the United States. Authors: Nirmala Kannankutty and R. Keith Wilkinson, Arlington, VA, 1999.

National Science Foundation (NSF 99-352), How Large is the Gap in Salaries of Male and Female Engineers? Arlington, VA, 1999.

Perkins, S. M., Tu, W., Underhill, M. G., Zhou, X.-H., and Murray, M. D. (2000), “The Use of Propensity Scores in Pharmacoepidemiologic Research,” Pharmacoepidemiology and Drug Safety, 9: 93-101.

Rosenbaum, P. R. (1986), “Dropping Out of High School in the United States: An Observational Study,” Journal of Educational Statistics, 11: 207-224.

Rosenbaum, P. R., and Rubin, D. B. (1984), “Reducing Bias in Observational Studies Using Subclassification on the Propensity Score,” Journal of the American Statistical Association, 79: 516-524.

Rubin, D. B. (1997), “Estimating Causal Effects from Large Data Sets Using Propensity Scores,” Annals of Internal Medicine, 127 (Part 2): 757-763.

Shapiro, G. (1994), “Informal Processes and Women's Careers in Information Technology Management,” Women, Work and Computerization, 57: 423-437.

Stroh, L. K., Brett, J. M., and Reilly, A. H. (1992), “All the Right Stuff: A Comparison of Female and Male Managers’ Career Progression,” Journal of Applied Psychology, 77: 251-260.

Table 1. Regression Results: Full Model (with family-related variables)

Coefficient       S.E.  t-statistic  P-value  Variable
  21,464.89   4,896.14         4.38     0.00  Intercept
Education:
     520.44      62.90         8.27     0.00  Years since most recent degree
     -17.32       4.18        -4.14     0.00  (Years since most recent degree)^2
   2,908.17     854.83         3.40     0.00  Most recent degree in computer/math (yes/no)
Type of most recent degree (reference category: Bachelor’s):
   7,260.00   1,003.32         7.24     0.00    Master’s
  15,196.62   1,661.09         9.15     0.00    Doctorate
  -1,157.77   1,264.71        -0.92     0.36  College courses after most recent degree (yes/no)
Job:
Employment sector (reference category: Business/Industry):
  -6,341.00   1,392.03        -4.56     0.00    Government
 -14,199.13   1,780.53        -7.97     0.00    Education
     312.14      67.21         4.64     0.00  Hours worked during a typical week
     -54.37      97.48        -0.56     0.58  Years in current job
      31.67       7.58         4.18     0.00  (Years in current job)^2
Work activities:
     895.95   1,533.44         0.58     0.56    Basic Research (yes/no)
   1,025.72   1,215.44         0.84     0.40    Applied Research (yes/no)
   4,268.80   2,199.87         1.94     0.05    Computer Applications (yes/no)
  -1,317.84   1,174.25        -1.12     0.26    Development (yes/no)
   3,009.66     893.28         3.37     0.00    Design (yes/no)
   1,912.14   1,193.43         1.60     0.11    Management and Administration (yes/no)
   3,230.22   1,206.31         2.68     0.01  Supervisory work (yes/no)
    -849.13     917.75        -0.93     0.36  Attended work related training during the past year (yes/no)
    -479.95     233.16        -2.06     0.04  Employer size
Demographics:
      82.75   1,010.47         0.08     0.93  Male
Race (reference category: White):
  -4,406.29   1,659.23        -2.66     0.01    Asian
  -1,975.16   1,414.16        -1.40     0.16    Other
Current location (reference category: Pacific):
  -1,907.67   1,906.39        -1.00     0.32    New England
     446.92   1,424.63         0.31     0.75    Mid Atlantic
  -6,360.90   1,500.81        -4.24     0.00    East North Central
  -7,831.82   1,877.84        -4.17     0.00    West North Central
  -6,280.06   1,378.31        -4.56     0.00    South Atlantic
  -9,795.13   2,747.37        -3.57     0.00    East South Central
  -7,703.46   1,650.83        -4.67     0.00    West South Central
  -8,970.04   2,089.91        -4.29     0.00    Mountain
U.S. citizen (reference category: Native):
   3,622.26   1,601.00         2.26     0.02    Naturalized
   6,458.93   2,312.66         2.79     0.01    Permanent Resident
  -1,568.41   4,281.12        -0.37     0.73    Temporary
   5,991.85  1,303.006         4.60     0.00  Married
    -713.92   1,357.54        -0.53     0.60  Spouse works full-time (yes/no)
     419.25   1,588.39         0.26     0.79  Spouse works part-time (yes/no)
     637.78     923.82         0.69     0.49  Willing to move (yes/no)
  12,663.32   2,561.29         4.94     0.00  Percent of household income earned by employee

Regression summary statistics: R2=0.24, overall F-statistic=F(39,1995)=15.94 (P=0.00), n=2035
