Psychological Bulletin 1991, Vol. 110, No. 3, 596-610
Copyright 1991 by the American Psychological Association, Inc. 0033-2909/91/$3.00
Validity Generalization Procedures Using Sample-Based Estimates: A Comparison of Six Procedures

Jorge L. Mendoza and Robert N. Reinhardt
Texas A&M University

This Monte Carlo study examined the accuracy of 6 validity generalization (VG) procedures when the procedures were applied without a set of assumed artifact distributions. Because in most cases the relevance of the artifact distributions is difficult if not impossible to establish, it is important to investigate the accuracy of VG procedures when these are based only on available data. The study showed that VG procedures that are based on the available data are accurate if the usual VG assumptions are met and the selection ratio is not too small. When there was little or no variability in the selection ratio across situations, the TSA1 procedure (Raju & Burke, 1983) was the most accurate. For conditions in which there is variability in the selection ratio, we recommend Hedges's procedure (Hedges, 1988).
For years, researchers have observed that the validity of a test changes from one situation to another. This led Ghiselli (1966) to review hundreds of validity studies in an effort to categorize and identify situations in which tests are valid. He suggested that artifacts such as unreliability of the predictor and criterion, range restriction, and sampling error account for some of the variability of validity coefficients across situations. Not until the seminal work of Schmidt and Hunter (1977), however, did researchers have a way to estimate the magnitude of these effects.

The Schmidt and Hunter (1977) work stimulated other researchers; a number of validity generalization (VG) procedures appeared in the literature shortly thereafter. Using a procedure somewhat different from that of Schmidt and Hunter, Callender and Osburn (1980) suggested a multiplicative model, and Raju and Burke (1983) presented their alternative VG procedures. Although the procedures developed by all these investigators (including those of Schmidt, Gast-Rosenberg, & Hunter, 1980) are similar and generally yield equivalent results, they can yield different results under certain circumstances. A number of computer studies have investigated these similarities and differences (e.g., Callender, Osburn, Greener, & Ashworth, 1982; Raju & Burke, 1983). These VG procedures rely on a decomposition of the variance of observed correlation coefficients. We refer to them as variance adjustment procedures. More recently, a different procedure, both conceptually and computationally, has been suggested by Hedges (1988). This procedure adjusts the correlation coefficients directly, only indirectly adjusting the
variance. We do not refer to this procedure as a variance adjustment procedure.

The primary goal of a VG procedure is to estimate the mean and variance of true validity, p, across situations; true validity is the validity of a test (or construct) when the reliabilities are one and there is no range restriction. After making a number of assumptions, the procedures provide a way to estimate the mean and variance of each artifact variable. Using these means and variances, one adjusts the variance of the observed validity coefficients to obtain an estimate of the true validity variance. Because reliabilities and range restriction information are frequently unavailable, both Jones (1980) and Schmidt and Hunter (1977) proposed using hypothetical (assumed) distributions of artifacts.

Callender and Osburn (1980) were the first to use a Monte Carlo simulation to evaluate the accuracy of VG procedures. They created a distribution of true validities, attenuated this distribution with the assumed distributions of Schmidt and Hunter (1977), and applied the VG procedures. Callender and Osburn (1980) tested the accuracy of the procedures under three sets of assumptions (or cases). In Case 1, true validity was held constant, and reliability and range restriction were varied; Case 2 was the opposite of Case 1; and in Case 3, all three variables were allowed to vary.

In an effort to lessen the independence restrictions of previous procedures, Raju and Burke (1983) introduced two procedures: Taylor Series Approximation 1 (TSA1) and Taylor Series Approximation 2 (TSA2). The procedures require only that the covariances among the artifacts and true validity be zero, instead of requiring independence. Raju and Burke (1983) compared their procedures to the interactive and noninteractive procedures given in Schmidt et al. (1980) and to the multiplicative procedure of Callender and Osburn (1980). Raju and Burke's Monte Carlo simulation compared the procedures in the absence of sampling error. The bias was similar in the five procedures; TSA1 was in general slightly better than the others.
We are pleased to acknowledge the support of Air Force Office of Scientific Research Grant F496020-85-C0013/SB5851-0360, which greatly facilitated the work on this project. We are grateful for the helpful advice supplied by Malcolm Ree from the Human Resources Laboratory, Brooks Air Force Base, Texas. Correspondence concerning this article should be addressed to Jorge L. Mendoza, who is now at the Department of Psychology, University of Oklahoma, 455 West Lindsey Street, Norman, Oklahoma 73019-0535.
For Case 1 (constant true validity), the noninteractive procedures tended to underestimate the variance, but the other procedures generally overestimated the variance of true validity. For Case 2, the procedures generally overestimated the variance. For Case 3 (i.e., everything varies), the procedures also overestimated true validity and variance, but only by small amounts. Later studies found similar results (e.g., Kemery, Mossholder, & Roth, 1987; Spector & Levine, 1987).

In the original procedure presented by Schmidt and Hunter (1977), the observed correlations (rs) were transformed to Fisher's z before carrying out the calculations. This transformation reduces the correlation, observed over situations, between p and sampling error. (Without the r-to-z transformation, the sampling distribution of the correlation coefficient is negatively skewed; more important, the estimation of p is biased, inducing a correlation between p and e.) In addition, James, Demaree, and Mulaik (1986) provided evidence showing that the error variance in a VG study is underestimated when the rs are not transformed. The effect is negligible, however, when the samples are large (i.e., n_i > 50; for details, see James et al., 1986; Silver & Dunlap, 1987; and Strube, 1988). We elaborate on this point later. Presently, VG researchers seldom use the r-to-z transformation.

VG procedures appear to be reasonably accurate when the hypothetical distributions are representative of the data collected and the assumptions are met. But the procedures can be misleading when the hypothetical distributions are not representative of the data. The consequences of assuming unrepresentative hypothetical distributions have been investigated by Dribin (1981), by Raju, Fralicx, and Steinhaus (1986), by Paese and Switzer (1988), and more recently by Paese and Switzer (1990). These studies demonstrated that artifact variance can be estimated incorrectly when the hypothetical distributions are not representative of the data, and they recommended caution in using hypothetical distributions.

The purpose of this study was to assess the accuracy of the VG procedures when only sample data are used and to present a statistical discussion demarcating the application of these procedures. We compared the Hedges (1988) VG procedure with the variance adjustment procedures, using a Monte Carlo simulation. The procedures were compared under complete data and two missing-data cases, obtained by deleting values at random from the complete-data sets. In the first missing-data case, 50% of the validation studies did not have any artifact estimates; in the second, 75% of the studies were missing one artifact estimate. We did not include a more extreme missing-data case because, when combined with a small selection ratio, it would have resulted in very small samples.

In a VG study, empirical artifact estimates are typically available for some validation studies but not for others. A number of investigators have considered methods for handling missing data. For example, Schmidt, Hunter, and Caplan (1981) took an eclectic approach in creating the assumed distributions for the well-known American Petroleum Institute data. The criterion reliability distributions were constructed mirroring the available data by assigning equal (relative) frequencies to the observed reliability estimates. In contrast, the assumed distributions for the reliabilities of the tests were established by a logical argument that considered the nature of the tests in each group-test combination.
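To make this construction concrete, the following sketch (ours, not part of the original article, and using hypothetical reliability values) forms a criterion reliability distribution by assigning equal relative frequencies to the available estimates and computes the mean and variance of the resulting distribution.

```python
# Minimal sketch: forming an empirical artifact distribution from available
# estimates, with equal relative frequencies (hypothetical values).
import numpy as np

# Hypothetical criterion reliability estimates reported in a subset of studies
observed_ryy = np.array([0.55, 0.60, 0.65, 0.70, 0.80])

# Equal relative frequency assigned to each observed estimate
rel_freq = np.full(observed_ryy.shape, 1.0 / observed_ryy.size)

# Mean and variance of the resulting artifact distribution
mean_ryy = np.sum(rel_freq * observed_ryy)
var_ryy = np.sum(rel_freq * (observed_ryy - mean_ryy) ** 2)

print(f"Mean criterion reliability: {mean_ryy:.3f}")
print(f"Variance of criterion reliability: {var_ryy:.4f}")
```

In practice, the variance adjustment procedures typically work with the square roots of such reliabilities (the attenuation factors), but the bookkeeping is the same: only the mean and variance of the artifact distribution are carried into the adjustment.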
Callender and Osburn (1981) took an analogous approach when dealing with the same American Petroleum Institute data and recommended using the available artifact data to estimate the mean and variance of the artifact distributions. Like Schmidt et al. (1981), they also used a logical argument to establish some of the distributions for criterion reliabilities.

Hedges (1988) has been critical of these approaches to the missing-data problem. He argued that for these procedures to work, the missing data must be a random sample of all studies actually conducted, an assumption that would be hard to meet in most VG studies. He proposed instead a procedure that was based on least squares regression.

A Statistical Argument for the Situation-Based Analysis

In this section, we review and integrate the research providing a statistical foundation for VG procedures. Then we show how the foundation can be expanded for situational estimates under a set of specific assumptions. Callender and Osburn (1980) presented a model for VG that was based on the following decomposition of the observed correlation coefficient between a predictor X and a criterion Y:

r_i = p_i + e_i,

where r_i is a sample correlation coefficient that is based on n_i subjects observed in the ith situation (i = 1, . . . , k), p_i is the population correlation (unattenuated and unrestricted), and e_i is the error of estimate. Although Schmidt and Hunter (1977) did not explicitly specify a model in their article, their VG procedure in essence assumes the same model.

The number of situations, k, in a VG study can be conceptualized either as a random sample from a (large) finite or infinite number of situations (Hedges & Olkin, 1985, p. 243; James, Demaree, Mulaik, & Mumford, 1988) or as a finite number of situations that one can observe. These conceptualizations of k are analogous to assuming, respectively, a random or fixed model in the analysis of variance. A number of VG studies have not been clear about this point, but apparently most of them implicitly have assumed a fixed model. We also assume that k is finite and that all situations are observable, to simplify the statistical presentation. For now, we also assume that the observed correlations have not been attenuated by unreliability of measurement or restriction in range. As we develop the model, we will relax these assumptions.
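Before turning to the variance decomposition, a small simulation may help fix ideas. The sketch below is ours, not part of the original article; it assumes k fixed situations with hypothetical true validities p_i and sample sizes n_i, draws each observed correlation from bivariate normal predictor-criterion data, and recovers the error of estimate e_i = r_i - p_i.

```python
# Minimal sketch of the simple model r_i = p_i + e_i for k fixed situations,
# with no attenuation or range restriction (hypothetical parameter values).
import numpy as np

rng = np.random.default_rng(1991)

k = 10                                   # number of situations (fixed)
p = rng.uniform(0.30, 0.50, size=k)      # true validities p_i
n = rng.integers(60, 120, size=k)        # sample sizes n_i

r = np.empty(k)
for i in range(k):
    # Bivariate normal predictor-criterion scores with correlation p_i
    cov = [[1.0, p[i]], [p[i], 1.0]]
    scores = rng.multivariate_normal([0.0, 0.0], cov, size=n[i])
    r[i] = np.corrcoef(scores[:, 0], scores[:, 1])[0, 1]

e = r - p                                # errors of estimate e_i
print("Observed r_i:", np.round(r, 3))
print("Errors e_i:  ", np.round(e, 3))
print("V(r) =", round(r.var(ddof=1), 4), " V(p) =", round(p.var(ddof=1), 4))
```

With a large enough k, the variance of the observed r_i is approximately the variance of the p_i plus the sampling error variance, which is the decomposition taken up next.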
Statistical Bases for a Simple VG Model

In the derivation of their VG procedure, Callender and Osburn (1980) assumed e and p to be uncorrelated over independent situations to express the variance of the observed correlations as the sum

V(r) = V(p) + V(e). (1)

But Hedges (1988) has shown, assuming normality for X and Y, that e and p are correlated. Fortunately, this correlation is generally small and negative, especially when the ns are large. Specifically, Hedges (1988) showed that the covariance between e and p is a function of p and n_i, Cov(