
PERSONNEL PSYCHOLOGY 1990, 43

CONSTRUCT VALIDITY OF TWO CATEGORIES OF ASSESSMENT CENTER DIMENSION RATINGS

TED H. SHORE
Department of Management and Marketing, Kennesaw State College

GEORGE C. THORNTON, III
Colorado State University

LYNN MCFARLANE SHORE
Department of Management, Georgia State University

The construct validity of assessment center final dimension ratings was examined within a nomological network of cognitive and personality measures. Four hundred forty-one employees of a large midwestern petroleum company were assessed on 11 dimensions in two broad categories and completed four tests. Results showed that several cognitive ability measures related more strongly to performance-style dimension ratings than to interpersonal-style dimension ratings, providing evidence for convergent and discriminant validity. Correlation analysis and factor analysis support the two a priori interpersonal- and performance-style categories. The results suggest that final dimension ratings possess construct validity and that assessors can differentiate between two broad categories of assessment dimensions.

Validation of any measurement procedure requires a variety of different types of evidence to establish the psychological constructs being measured (Landy, 1986). Predictive and concurrent validity coefficients of the overall assessment rating with many different criteria have been thoroughly investigated (e.g., Gaugler, Rosenthal, Thornton, & Bentson, 1987; Hunter & Hunter, 1984; Thornton & Byham, 1982). Procedures for establishing the content relatedness of assessment centers have also been set forth (Sackett, 1987). What has been less thoroughly investigated is the relationship of dimension ratings within an assessment center to a wider array of external measures. Assessment center ratings have not been investigated within a nomological network of related constructs (Cronbach & Meehl, 1955). The purpose of this study was to investigate the construct validity of assessment center final dimension ratings utilizing the approach espoused by Cronbach and Meehl (1955).

We wish to thank Virginia Rogers for providing the data used in this study. Correspondence and requests for reprints should be addressed to Ted H. Shore, Department of Management and Marketing, Kennesaw State College, Marietta, GA 30061.

COPYRIGHT © 1990 PERSONNEL PSYCHOLOGY, INC.

Construct validation is a process of gathering evidence to support the use of a test as a measure of a construct. As Landy (1986) and Binning and Barrett (1989) argued persuasively, there are many types of information that can be gathered to understand the constructs being measured by any test, but there is really only a unitary process of validating a test. Tenopyr (1977) asserted that content, criterion-related, and construct validity strategies are not easily distinguishable from one another. She further stated that for inferences from test scores to be useful for personnel decisions, they must be based on underlying constructs common to both test performance and job behavior, and that "underlying constructs are involved in all measurement" (p. 53).

Developers of assessment centers have usually assumed content validity by employing exercises that are representative samples of behavior of the job domain. However, there has been relatively little effort to study the psychological meaning of assessor judgments about performance on dimensions. In fact, research on assessment centers has produced limited evidence concerning their construct validity (Adams & Thornton, 1988). It is important to know if assessor judgments accurately reflect the underlying constructs in dimensions, since these ratings provide the basis for judgments of overall management potential as well as for planned developmental experiences.

Several approaches may be used to study the construct validity of dimension ratings. Some studies have used the multitrait-multimethod matrix approach (Campbell & Fiske, 1959). Within this framework, dimensions serve as traits, and exercises are the methods. A demonstration of construct validity requires that (1) monotrait-heteromethod correlations (ratings of the same ability across different exercises) be substantial (convergent validity) and (2) heterotrait-monomethod correlations (ratings of different abilities within the same exercise) be relatively low (discriminant validity).

Convergent validity evidence for within-exercise dimension ratings has been reported in several studies. Hinrichs and Haanpera (1976) reported an average monotrait-heteromethod correlation of .49 for 14 dimensions, with correlations ranging from -.04 to .73. Sackett and Dreher (1982) reported that for three different organizations' assessment centers, correlations among ratings of the same dimension across different exercises were substantially lower than correlations among different dimension ratings within exercises. They concluded that "there is virtually no support for the view that the assessment center technique generates dimension scores that can be interpreted as representing complex constructs" (p. 409). Several other studies have reported moderate to high monotrait-heteromethod correlations; however, these correlations were
consistently lower than heterotrait-monomethod correlations (Archambeau, 1979; Bycio, Alvares, & Hahn, 1987; Neidig, Martin, & Yates, 1979; Russell, 1987). Thus, research on the construct validity of within-exercise dimension ratings has provided some evidence for convergent validity, but very little evidence for discriminant validity.

Factor analyses of within-exercise dimension ratings have also been used to determine their construct validity. Support for construct validity would result if the underlying factors represented dimensions rather than exercises. Sackett and Dreher's (1982) factor analyses revealed that the resulting factors represented exercises, not dimensions, in all three assessment centers studied. Similarly, Bycio, Alvares, and Hahn (1987), using confirmatory factor analysis, found that exercise variance dominated assessor dimension ratings. These findings do not support construct validity, since the factors that emerged reflected exercises rather than the intended constructs (i.e., dimensions).

The construct validity of final (across-exercise) dimension ratings has also been investigated in several studies. In general, these studies have reported significant intercorrelations between final dimension ratings (Archambeau, 1979; Outcalt, 1988), suggesting that across-exercise dimension ratings do not reflect distinct constructs. Factor analyses of final dimension ratings have generally yielded two to four factors (Adams & Thornton, 1988). For example, Schmitt's (1977) factor analysis of 17 dimensions produced three factors: an Administrative Skills factor, an Interpersonal Skills factor, and an Activity factor.

The present research studied the construct validity of final assessment center dimension ratings within a nomological network of related constructs.
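The multitrait-multimethod comparisons reviewed above can be sketched in a few lines. The ratings array below is simulated (stable candidate traits plus exercise-specific noise) purely for illustration; it is not the data from any of the cited studies:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 candidates rated on 3 dimensions in 3 exercises.
n_cand, n_dim, n_ex = 100, 3, 3
latent = rng.normal(size=(n_cand, n_dim))  # stable candidate traits
ratings = np.stack(
    [latent + rng.normal(size=(n_cand, n_dim)) for _ in range(n_ex)],
    axis=2,
)  # shape: (candidate, dimension, exercise)

def monotrait_heteromethod(r):
    """Mean correlation of the SAME dimension across DIFFERENT exercises
    (convergent validity evidence)."""
    cs = []
    for d in range(r.shape[1]):
        for e1 in range(r.shape[2]):
            for e2 in range(e1 + 1, r.shape[2]):
                cs.append(np.corrcoef(r[:, d, e1], r[:, d, e2])[0, 1])
    return float(np.mean(cs))

def heterotrait_monomethod(r):
    """Mean correlation of DIFFERENT dimensions within the SAME exercise
    (should be comparatively low for discriminant validity)."""
    cs = []
    for e in range(r.shape[2]):
        for d1 in range(r.shape[1]):
            for d2 in range(d1 + 1, r.shape[1]):
                cs.append(np.corrcoef(r[:, d1, e], r[:, d2, e])[0, 1])
    return float(np.mean(cs))

conv = monotrait_heteromethod(ratings)
disc = heterotrait_monomethod(ratings)
print(f"mean monotrait-heteromethod r = {conv:.2f}")
print(f"mean heterotrait-monomethod r = {disc:.2f}")
```

With independent simulated traits, the convergent mean lands well above the discriminant mean, which is the pattern the reviewed studies often failed to find in real assessor ratings.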
Evidence for construct validity is obtained by observing the pattern of interrelationships between measures of the constructs being studied (assessment center dimension ratings) and independent measures of other constructs expected to be both related and unrelated to those of interest (ability and personality test scores). This approach has not often been applied to assessment center ratings. Russell (1987) conducted the only known published study in which final dimension ratings were related to other measures in an assessment center. He found that a self-reported measure of interpersonal behavior was more strongly related to a variety of postexercise ratings from an interpersonal skill exercise (interview) than to the same dimensions rated in a non-interpersonal skill exercise (in-basket).

The present study evaluated the construct validity of final assessment center dimension ratings in two ways. First, the construct validity of two broad categories of assessment dimensions was examined.


In the assessment center studied, dimensions were classified on rational grounds into an interpersonal-style or a performance-style category. Interpersonal-style dimensions purportedly measured the candidate's typical style when working with others (orientation toward people). By contrast, performance-style dimensions were designed to determine the candidate's work style and capabilities (orientation toward the task itself). Support for this approach is found in previous research suggesting that assessors reduce a large set of dimensions to a smaller, more manageable number of categories (Thornton & Byham, 1982). Second, the construct validity of individual assessment center dimensions was also evaluated.

Theoretical and empirical evidence suggests that the two major sets of dimensions should be related in different ways to the cognitive and personality tests. Furthermore, the two types of dimensions should be associated with different constructs within the personality realm. At the broadest level, Cronbach (1970) distinguished between maximal performance and typical performance. The former is an indication of how well a person can perform at his or her best and includes such attributes as general mental ability (e.g., intelligence), special mental abilities (e.g., reasoning, mechanical aptitude), and proficiency and skills. Typical performance refers to the behavior a person is likely to display in a given situation and includes personality, interests, and interpersonal relations. In the present study, the cognitive measures (general reasoning, quantitative ability, reading speed, and reading comprehension) served as maximal performance measures, whereas the personality test (16PF) provided evidence about the candidate's typical performance.

The empirical distinction between these two broad classes of attributes is demonstrated in research by McCrae and Costa (1985). A factor analysis of a large set of adjectives and personality test scales showed that ability measures did not load highly on factors in the self-reports of five personality characteristics. These results are consistent with the treatment of cognitive abilities and personality characteristics as distinct variables in the professional literature (Anastasi, 1982).

The correspondence between subsets of assessment dimensions and different personality constructs can be posited on the basis of theories of personality structure. Although there is no generally agreed-upon list of personality constructs, a five- or six-factor set of dimensions has emerged in the writings of several theorists (Hogan, 1982; McCrae & Costa, 1985; Norman, 1963). Among these factors are attributes that refer to performance capabilities (e.g., conscientiousness and intelligence) and others that refer to interpersonal relations (e.g., extraversion, agreeableness, and adjustment). Due to the diverse nature of the personality factors measured by the 16PF test, individual assessment dimensions in
both the performance- and interpersonal-style categories were expected to relate to specific 16PF scales. Thus, it was hypothesized that:

Hypothesis 1: Cognitive ability measures would relate more strongly to the performance-style dimension ratings than to the interpersonal-style dimension ratings.

Hypothesis 2: Interpersonal-style dimension ratings would relate more strongly to conceptually similar personality dimensions than to conceptually dissimilar personality dimensions.

Hypothesis 3: Performance-style dimension ratings would relate more strongly to conceptually similar personality dimensions than to conceptually dissimilar personality dimensions.

Hypothesis 4: Performance-style dimensions would be more highly intercorrelated with each other than with interpersonal-style dimensions. Interpersonal-style dimensions would be more highly intercorrelated with each other than with performance-style dimensions.

Method

Participants

Assessment center ratings were obtained from a large midwestern petroleum company in which 441 candidates were assessed between 1980 and 1985. The assessment center was designed to identify the level of management (ranging from lower to upper) for which the candidate had potential. Candidates were nominated by their immediate supervisors to attend the center if they had obtained average or better recent job performance ratings and were perceived by their supervisors to possess management potential. The average job tenure of the participants was 7.06 years. All candidates were exempt employees who performed in a variety of technical/professional or lower-level supervisory positions in various company divisions throughout the United States.

Assessment Center Procedure

During each assessment center, 12 candidates participated over a three-day period in a variety of exercises, including an individual interview, three leaderless group discussions (a selection exercise, a case analysis, and a manufacturing exercise), an oral presentation, and an in-basket. In addition, all candidates took the following standardized tests: the School and College Ability Test (Educational Testing Service, 1961), the Miller Analogies Test (Miller, 1975), the Davis Reading Test (Davis & Davis, 1957), and the 16PF test (Cattell, 1978). The cognitive abilities tests
were treated as a set in the present study. The rationale for this is that all the cognitive measures were significantly intercorrelated (mean r = .57). Reading speed and general reasoning were the most highly correlated (r = .74, p < .001), and the least correlated were reading comprehension and quantitative ability (r = .44, p < .001). In addition, two projective tests were administered to the candidates (the Thematic Apperception Test, 1943, and a sentence completion test developed by the organization) but were not included in the present study because their scores were not independent of the dimension ratings (see procedure below).

Following each exercise, assessors prepared a narrative report based on observations of the candidate's behavior. After all tests and exercises were completed, the integration committee, composed of the three assessors and two psychologists, met to evaluate each candidate. The integration committee was presented with all exercise reports, peer ratings, and projective test results prior to rating the dimensions. The intellectual test results were withheld from the assessors until after all dimensions were rated so as not to bias their ratings. In addition, the 16PF test was administered to all candidates on an experimental basis; the results were never made known to the assessors and were therefore not used in the evaluation of candidates. Thus, the assessment dimension ratings were not contaminated by either the intellectual or 16PF test results.

Dimensions were rated on a 1 (low) to 5 (high) point scale, including half scale points (e.g., 1, 1.5, 2, ..., 5). For each dimension, each full scale point was anchored with a verbal description. Assessors independently rated a candidate on a dimension and then shared their ratings. In the event of more than a full scale point discrepancy among the assessors, a discussion was held in order to achieve consensus.
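The discrepancy rule just described (discussion is required only when independent ratings differ by more than one full scale point) can be expressed as a simple check. The function name and the example ratings are illustrative, not taken from the study:

```python
def needs_discussion(ratings, threshold=1.0):
    """Flag a dimension for consensus discussion when independent assessor
    ratings span more than one full scale point on the 1-5 scale."""
    return max(ratings) - min(ratings) > threshold

print(needs_discussion([3.0, 3.5, 4.0]))  # spread of exactly 1.0 -> False
print(needs_discussion([2.0, 3.5, 3.0]))  # spread of 1.5 -> True
```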
After all 11 dimensions were rated, the intellectual test results for general reasoning (derived from the MAT and the SCAT Verbal section), quantitative ability (based on the SCAT Quantitative section), and reading speed and reading comprehension (based on the Davis Reading Test) were presented to the committee on a 5-point scale. The raw test scores were converted to the same 5-point (stanine) scale used for the assessment dimensions by means of internal test norms developed by the organization. This was done to maintain consistency in the scaling of all assessment center data for use by the assessors as well as in the summary assessment reports.

Finally, an overall rating of the candidate's management potential was assigned by the committee. Since this overall assessment rating was contaminated by knowledge of the intellectual abilities test results, it was not used in the present study. Definitions, means, and standard deviations of the dimensions are shown in Appendix A and of the test variables in Appendix B.
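A norm-based conversion of this kind can be sketched as follows. The cut points below are invented for illustration; the organization's actual internal norm tables are not reported in the article:

```python
import numpy as np

# Hypothetical internal norms: raw-score upper bounds that map each test onto
# the same 1-5 scale used for the dimension ratings. These boundaries are
# placeholders, not the organization's proprietary norms.
NORM_CUTS = {
    "general_reasoning":     [30, 45, 60, 75],
    "quantitative_ability":  [25, 40, 55, 70],
    "reading_speed":         [20, 35, 50, 65],
    "reading_comprehension": [22, 38, 54, 70],
}

def to_scaled_score(test: str, raw: float) -> int:
    """Convert a raw test score to the 1-5 scale via the norm cut points.

    searchsorted returns how many cut points the raw score exceeds, so
    adding 1 yields a scaled score between 1 and 5.
    """
    return int(np.searchsorted(NORM_CUTS[test], raw)) + 1

print(to_scaled_score("general_reasoning", 50))  # in the 46-60 band -> 3
```

Keeping every measure on the same 1-5 metric is what let the committee read test results and dimension ratings side by side in the summary reports.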


Results

The first hypothesis was that the cognitive ability measures would relate more strongly to the performance-style dimension ratings than to the interpersonal-style dimension ratings. Table 1 shows the correlations of the dimensions with the cognitive tests. As predicted, the average correlation between the cognitive abilities and the performance-style dimensions (mean r = .25) was significantly greater than the average correlation between the cognitive abilities and the interpersonal-style dimensions (mean r = .09; t = 3.32, p < .001). Furthermore, analyses were conducted separately for each of the four cognitive test measures. These analyses revealed that the performance-style dimensions were significantly more strongly correlated with general reasoning (t = 3.73, p < .001), quantitative ability (t = 3.06, p < .001), reading speed (t = 3.61, p < .001), and reading comprehension (t = 2.90, p < .001) than were the interpersonal-style dimensions. These results provide evidence of convergent and discriminant validity for the performance- and interpersonal-style dimension categories.

The second and third hypotheses dealt with relationships between the 16PF scales and the assessment dimensions. Hypotheses about which specific 16PF scales would relate to individual assessment dimensions were developed in the following way. Each of the authors independently hypothesized relationships between dimensions and 16PF scales on the basis of the scale and dimension definitions. Any relationship predicted by two or three of the authors was used as a hypothesis in the study. Each of the authors was very familiar with the assessment center dimensions used in the present study. Furthermore, each had experience teaching testing and measurement and was well acquainted with the 16PF scales.

Hypothesis 2 predicted that the interpersonal-style assessment dimensions would relate more strongly to conceptually similar 16PF scales than to conceptually dissimilar scales.
The top part of Table 2 shows the correlations between the interpersonal-style dimensions and the conceptually similar 16PF scales. Assessment ratings on amount of participation and impact were both highly correlated with all conceptually similar 16PF scales, and were significantly more strongly correlated with conceptually similar than with dissimilar 16PF scales (t = 4.61, p < .001, and t = 4.19, p < .001, respectively). Assessment ratings on personal acceptability were significantly correlated with two of the five conceptually similar 16PF scales, but there was no significant difference between the average correlations for similar and dissimilar 16PF scales.
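Comparisons of this kind, testing whether one set of correlations is on average larger than another, can be sketched with a Fisher z transformation and a t statistic. The individual coefficients below are illustrative placeholders consistent with the reported means (.25 vs. .09), not the published table values, and the use of Welch's t on transformed coefficients is an assumption; the article does not specify its exact test formula:

```python
import numpy as np

# Placeholder correlations of the cognitive measures with the 7 performance-
# style and 4 interpersonal-style dimensions (illustrative, not Table 1 data).
perf_r = np.array([0.30, 0.22, 0.28, 0.20, 0.27, 0.23, 0.25])
inter_r = np.array([0.14, 0.13, -0.05, 0.07])

def fisher_z(r):
    """Variance-stabilizing transform so correlations can be compared."""
    return np.arctanh(r)

def welch_t(a, b):
    """Welch's t statistic for two independent samples of unequal size."""
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    return float((a.mean() - b.mean()) / np.sqrt(va + vb))

t = welch_t(fisher_z(perf_r), fisher_z(inter_r))
print(f"t = {t:.2f}")  # positive t: performance-style correlations are larger
```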


TABLE 1
Relationship Between Assessment Dimensions and Cognitive Measures

Interpersonal-style dimensions: Amount of participation, Impact, Personal acceptability, Understanding of people

Performance-style dimensions: Originality, Oral communication, Recognizing priorities, Need for structure, Thoroughness, Work quality, Work drive

[The correlation entries of this table did not survive extraction; among the interpersonal-style values, only .14**, .13**, -.05, and .07 remain legible.]
