PERSONNEL PSYCHOLOGY 2007, 60, 201–230

PERFORMANCE APPRAISAL OF BEHAVIOR-BASED COMPETENCIES: A RELIABLE AND VALID PROCEDURE

VICTOR M. CATANO
Saint Mary's University

WENDY DARR
Royal Canadian Mounted Police

CATHERINE A. CAMPBELL
Royal Canadian Mounted Police

A new performance appraisal system, developed for promotions in the Royal Canadian Mounted Police non-commissioned officer ranks, fairly differentiated among candidates. Members (N = 6,571) illustrated their performance on core competencies with behavioral examples. Supervisors and then review boards used a BARS procedure to reliably rate performance. Both candidates and supervisors supported the system. The performance appraisal scores predicted career advancement in the organization.

We thank all the members and staff of the RCMP who participated in this project and contributed to its success. We also thank the external reviewers and the editor for their comments and suggestions. Correspondence and requests for reprints should be addressed to Victor M. Catano, Department of Psychology, Saint Mary's University, Halifax, NS, Canada B3H 3C3; [email protected]. Copyright © 2007 Blackwell Publishing, Inc.

The advantages and disadvantages of using performance assessment in making employment decisions are well documented (e.g., Murphy & Cleveland, 1995). The limitations of performance assessment, such as inflated ratings, lack of consistency, and the politics of assessment (Tziner, Latham, Price, & Haccoun, 1996), often lead to the abandonment of such systems. Managers responsible for delivering performance reviews who are uncomfortable with the rating system may give uniformly high ratings that do not discriminate between ratees. Poor ratings detract from organizational uses and increase employee mistrust in the performance appraisal system (Tziner & Murphy, 1999). Employees on the receiving end of the appraisal often express dissatisfaction with both the decisions made as a result of performance assessment and the process of performance assessment (Milliman, Nason, Zhu, & De Cieri, 2002), which may have longitudinal effects on overall job satisfaction (Blau, 1999) and commitment (Cawley, Keeping, & Levy, 1998).

The extensive research on performance appraisal (see Arvey & Murphy, 1998; Fletcher, 2001; Fletcher & Perry, 2001; Latham & Mann, 2006; Murphy & Cleveland, 1995; Smither, 1998 for reviews) has not addressed the fundamental problems of the performance appraisal process: that performance appraisal is influenced by a variety of relevant, nonperformance factors such as cultural context (Latham & Mann, 2006), that it does not provide either valid performance data or useful feedback to individuals (Fletcher, 2001), and that performance appraisal instruments often measure the "wrong things" (Latham & Mann, 2006, p. 302).

In 1992, the Auditor-General of Canada singled out the use of performance assessments in the non-commissioned officer (NCO) promotion system of the Royal Canadian Mounted Police (RCMP) for just such criticism. As a result, a new promotion system based on a written examination of police knowledge and a behavior-based structured interview was introduced in 1993 and used over two 2-year promotion cycles. Performance appraisals were excluded from the new system, which was based on recommendations from a panel of human resource management experts and designed to incorporate the "best practices" of industrial-organizational psychology. The job dimensions underlying the measures were derived through functional job analysis carried out with what were considered to be key occupations among the 500 available for the NCO ranks of constable, corporal, sergeant, and staff sergeant.

This promotion system resulted in considerable negative reaction. Dissatisfaction centered on the lack of any direct assessment of the candidate's performance, the lack of a meaningful role in the system for the candidate's immediate supervisor, and the restriction of the interview to a limited number of members based on a rank ordering of exam scores.

At 70 town hall meetings held across Canada, with 50–100 members in attendance at each, in written submissions, and through meetings with key stakeholders, both members and managers cited the need to put a direct measure of performance back into the promotion system as an incentive, a motivator, and a reward for good performance (Catano & Associates, 1997). This was somewhat surprising in light of Bowman's (1999, p. 557) definition of performance appraisal as something "given by someone who does not want to give it to someone who does not want to get it."

The challenge then became how to build a performance appraisal system that could be used for promotional purposes. Murphy and Cleveland (1995, p. 404) state that "a system that did nothing more than allow the making of correct promotion decisions would be a good system, even if the indices used to measure performance were inaccurate or measured the wrong set of constructs." No assessment system, however, would meet with success if it did not have the support of those it assessed. RCMP members wanted a new performance appraisal system to be open, transparent, and fair. They wanted all promotional candidates to have the opportunity to be assessed (candidates who had not proceeded to the structured interview were demoralized and considered themselves to be "failures"). Members of the organization also urged that supervisors receive training in using the assessment system and that the supervisory ratings be monitored. Furthermore, the assessed performance dimensions had to be common to all NCO positions and not job specific (Catano & Associates, 1997).

In addition to meeting the requirements of the members, the performance appraisal system had to be capable of handling approximately 6,500 candidates seeking promotion to the three highest NCO ranks. For policy reasons, it could not be based on any existing RCMP performance appraisal system used for personnel development. All forms associated with the system had to be compatible with, and completed on, the RCMP's computer network and its FormFlow data management software, which was used to design, store, and retrieve all official forms in the organization. The performance assessment procedure could not cost more to operate than the structured interview boards it would replace. It also had to be legally defensible and nondiscriminatory with respect to women, aboriginals, and visible minorities, and it had to be available in both English and French, as the RCMP is a bilingual organization.

Finally, in developing the new performance appraisal system, we were guided as much as possible by past research that has identified a number of factors leading to greater acceptance of appraisals by employees. First, legally sound performance appraisals should be objective and based on a job analysis; they should also be based on behaviors that relate to specific functions that are controllable by the ratee; and the results of the appraisal should be communicated to the employee (Malos, 1998).

Second, the appraisals must be perceived as fair. Procedural fairness is improved when employees participate in all aspects of the process, when there is consistency in all processes, when the assessments are free of supervisor bias, and when there is a formal channel for employees to challenge or rebut their evaluations (Gilliland & Langdon, 1998). In addition to perceptions of fairness, participation by employees in the appraisal process is related to motivation to improve job performance, satisfaction with the appraisal process, increased organizational commitment, and the utility or value that employees place on the appraisal (Cawley et al., 1998).

The New Performance Assessment System

To overcome the problem of job-specific performance dimensions, the performance assessment system was based on behaviorally defined core competencies (Dubois, 1993; Klein, 1996). The core competencies had been previously identified through an extensive process as being common to all NCO positions; these competencies were to become the basis for training new RCMP recruits and for the continuous development of existing RCMP members (Himelfarb, 1996; RCMP Human Resource Directorate, 1997).

Fletcher and Perry (2001, p. 137) stated that "the elements constituting what we normally think of as [performance appraisal] will increasingly be properly integrated into the human resources policies of the organization—using the same competency framework for all HR processes, linking individual objectives with team and business unit objectives, framing the input of appraisal to promotion assessment in an appropriate manner, and so on," making it "a more effective mechanism and less of an annual ritual that appears to exist in a vacuum." Along the same lines, Smither (1998) argued that a competency model must be developed from a job analysis and content validation process based on an organization's strategic goals, with the competencies defined at the behavioral level and including criteria for differentiating between levels of expertise. Smither (1998) went on to note that the same competency model should guide "numerous human resource initiatives" (p. 540).

The competency development process used for this study followed the suggestions of Fletcher and Perry (2001) and Smither (1998) and included a review of functional job analysis data for general police constables that covered a majority of the different job positions. In this sense, the competencies were "blended" by incorporating the RCMP's values and specific attributes (Schippmann et al., 2000). A blended approach couples an organization's strategy in the derivation of the broad competencies with the methodological rigor of task analysis.

As Lievens, Sanchez, and De Corte (2004) note, a blended approach is likely to improve the accuracy and quality of inferences made from the resulting competency model because it capitalizes on the strengths of each method: strategy is used as a frame of reference to guide subject matter experts in identifying worker attributes or competencies that are aligned with the organization's strategy, and the task statements are then used to provide more concrete referents for the associated job behaviors (Lievens et al., 2004). Although the competencies remained the same across all positions and ranks, the behavioral expectations of the people who fill those positions vary with their level of responsibility (Trainor, 1997).

The RCMP Core Competencies were leadership, service orientation and delivery, thinking skills, personal effectiveness/flexibility, organization and planning, interpersonal relations, communication, and motivation. Table 1 contains the behavioral definitions of these competencies.

TABLE 1
RCMP Core Competencies

Leadership. Ability to effectively set and accomplish goals through the involvement and teamwork of others; inspiring others to perform to the highest standard; ability to attract and mobilize energies and talents to work toward a shared purpose in the best interests of the organization, the people comprising it and the people it serves; ability to gain and sustain the interest and support of others for strategies, which will realize objectives; respecting, consulting, informing, empowering, and developing employees; managing risk and accepting responsibility for one's actions; encouraging constructive questioning of policies and practices; ability to lead through integrity, credibility, and by example in a manner that supports and promotes the mission, vision, and values of the RCMP.

Service Orientation and Delivery. Ability to maintain a client focus and to adhere to clear and visible service quality values/standards within a community policing framework; establishing partnerships and balancing competing interests of clients; sensitivity to clients'/partners' needs and concerns; commitment to improving quality and efficiency on an ongoing basis; commitment to the provision of quality service to clients/partners and providing clients/partners with appropriate opportunities for active participation and consultation on decisions that are relevant to their needs and concerns.

Thinking Skills. Ability to identify, define, comprehend, and analyze problems and situations using rational processes (including inductive and deductive reasoning) that result in the drawing of accurate conclusions and viable solutions; problem sensitivity and analysis; information gathering and integration; fluency and originality of ideas; ability to accurately evaluate risks and potential outcomes of various actions before making decisions; ability to identify priorities and issues, make decisions (decisiveness), and take appropriate courses of action; objectivity; sound judgement (including whom to inform, of what to inform them, and when) and the use of appropriate strategies to achieve objectives.

Personal Effectiveness and Flexibility. Ability to effectively adapt one's behavior to changing circumstances to reach a goal or address diverse and changing client needs; tolerance for ambiguity; responding with new approaches to changing priorities and conditions; reevaluating goals and priorities as situations change; remaining calm in uncertain or stressful situations; maintaining performance under time constraints, conflicting demands, opposition, unpleasant working conditions, and perhaps danger; resilience and ability to learn from one's mistakes; willingness to express one's views when appropriate but to "get on side" once decisions have been made by superiors; patience; self-management.

Organization and Planning. Ability to establish effective courses of action for oneself and others; translation of organizational strategies into action plans; appropriate assignment and delegation of responsibility; setting realistic timeframes and diary dates; ability to make effective use of time and resources; appropriate prioritization (sequencing) of tasks.

Interpersonal Relations. Ability to interact sensitively and respectfully with diverse individuals and groups in ways that advance the work of the organization by developing respect, mutual understanding, and productive working relationships; commitment to establishing and maintaining positive working relationships; functioning effectively as a team member; demonstrating tact, honesty, and empathy in interactions with others; perceiving and reacting to the needs of others; willingness and ability to effectively resolve conflict through negotiation and appropriate compromise.

Communication. Ability to shape others' understanding in ways, which accurately convey information, capture interest, and gain support through listening, interpreting, speaking, writing, presenting, educating, and counseling; ability to listen to and understand other perspectives and to tailor/modify one's approach to enhance communication or achieve results; ability to clearly and persuasively present and defend one's ideas and convictions.

Motivation. Ongoing, significant, and active involvement in work activities; enthusiastic commitment of one's energies to achieving organizational goals; seeking opportunities to initiate action; demonstrating initiative and taking action in anticipation of imminent demands or circumstances; persistence and perseverance in face of challenging or tedious work requirements; willingness to work on difficult problems, to strive to accomplish something significant, and to do an outstanding job; organizational citizenship; willingness to perform beyond the normal range of job expectations and requirements when necessary; commitment to continuous learning and self-development and to apply one's knowledge to the job.

As part of the 1998 promotion process, members had to demonstrate how their job performance satisfied each competency. Candidates completed a Performance Report for Promotion (PRP) by providing two behavioral examples for each competency that they believed best illustrated their performance on the core competencies. References were provided in support of each behavioral example. This procedure was adapted from the Accomplishment Record (AR) developed by Hough (1984) for use in selecting and promoting professionals. The AR is a self-report instrument that focuses on biodata and an individual's prior achievements rather than on reports of past competency-related behaviors.

The PRP form, created in FormFlow software, contained the definition of each competency followed by a behaviorally anchored rating scale (BARS), which was adopted to deal with the issues of openness, transparency, and fairness. The BARS used to assess the leadership competency at the corporal rank is presented in Figure 1 as an example of one of the 24 scales developed in English. BARS development involved an exhaustive definition of rank-appropriate performance in each of the core competencies through the generation of behavioral examples for different levels of performance by subject matter experts (SMEs). Participation of RCMP members from different occupational backgrounds and geographic regions as SMEs was thought to lead to greater acceptance of the rating system as one reflecting the level of job performance expected by other RCMP members. Finally, the transparency of the behavioral anchors and their presence on the PRP form would give both the supervisor and the candidate a standard expectation of performance levels and increase the perceived fairness of the system. To prevent faking (that is, the creation of behavioral examples written to fit the behavioral anchors), at least one of the candidate's references was randomly selected and contacted regarding the veracity of the example.

Figure 1: Example of Supervisor's Rating form.

Completed PRP forms were reviewed by the candidate's immediate supervisor, whose assessment was based on, but not limited to, the behavioral examples presented on the PRP. The supervisor's rating reflected all verifiable information that spoke to the member's performance, and supervisors could add comments to the PRP in support of their ratings. The supervisor met with the candidate to review the ratings being recommended. A candidate could object to any rating and suggest ones they thought were more appropriate; if the supervisor and candidate could not reach consensus, candidates could present their suggested ratings along with the supervisor's.

PRPs were next reviewed by an independent Promotion Review Board (PRB) consisting of three well-respected members at or above the target promotional rank. Over 200 PRBs were established in RCMP divisions to deal with the anticipated volume of promotions (divisions correspond, for the most part, to Canadian provinces). The purpose of the board review was to resolve disagreements between supervisors and candidates, to control for supervisors who gave uniformly high or uniformly low ratings, and to achieve consistency of supervisory ratings across candidates. After due deliberation, the board could change a supervisor's rating by unanimous vote; such a change had to be accompanied by a written justification. The board's ratings were used for promotional purposes.

The final performance score was simply the sum of the eight competency ratings assigned by the PRB. The performance score was averaged with a score from a written examination called the job simulation exercise (JSE)1, a variant on a situational judgment test, to form the promotion list. An important difference between the PRP and the JSE is that the former assesses candidates with respect to performance in the current rank, whereas the latter focuses on requirements with respect to the next rank.

1 The JSE, a low-fidelity simulation (cf. Motowidlo & Tippins, 1993), took the form of a paper-and-pencil situational judgment test. A JSE was created for each of the three target ranks. Each exam contained 48 scenarios or hypothetical situations likely to be encountered in the next rank. Each scenario described the role of the focal person in the situation (e.g., investigator, shift supervisor), the context (e.g., rural area, large detachment, administrative function), a main issue (e.g., coworker conflict, planning issues, leadership), and the key stakeholders involved in the situation (e.g., public, coworker, superior). Candidates were required to choose from five possible response options containing a best response (two points), a satisfactory response (one point), and three ineffective responses (zero points each). The scenarios and response options were provided by subject matter experts (SMEs) performing in the target rank; the SMEs were representative of various functions and demographic groups in the organization. Eight scenarios representing critical incidents typically encountered in the target rank were developed for each of six core competencies (leadership, thinking skills, planning and organizing, service orientation, personal effectiveness and flexibility, and interpersonal relations). The scenarios were chosen and written for application across various functional areas in the organization. As such, the JSE may be seen as a measure of general problem-solving ability at the supervisory level, rather than of job-specific knowledge (RCMP Research Branch, 1999).

It would have been preferable from a psychometric standpoint if the PRB had made an independent assessment of the candidate without knowledge of the supervisor's ratings. However, the Liaison Committee that oversaw the development and implementation of the new promotion system, which included influential representatives of the members, believed that in the interest of transparency the supervisors' assessments and the candidates' responses to them should be seen by the PRB. This procedure was accepted to increase the chances of buy-in for the new system from members seeking promotion.

Training played an important part in the new performance assessment system. In addition to reducing rating errors and increasing accuracy (Day & Sulsky, 1995), training was essential to gaining acceptance of the system. Every available resource was expended to provide effective training in use of the performance assessment system. A variety of approaches were taken, geared to the needs and accessibility of the target groups, perceived efficiency, and time and cost effectiveness. Because of the sheer numbers involved, candidates and supervisors were trained through print and video instruction; PRB members were trained through workshops carried out in each RCMP Division. Details on the training procedures and development of the BARS follow in the next section.

Different BARS were produced for each promotional rank. Because the RCMP operates in two official languages, all promotional materials, including the PRP and videos, were produced in both English and French versions. The performance-based promotion system was accepted for implementation by the RCMP, and a modified version has now been used over three promotion cycles (see Discussion section).

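As a minimal sketch of the scoring arithmetic described above: the PRP score is the sum of the eight board-assigned competency ratings (maximum 80), the JSE awards 0–2 points on each of 48 scenarios (maximum 96), and the two components are averaged with equal weight to build the promotion list. The function names, and the rescaling of each component to a 0–100 metric before averaging, are illustrative assumptions; the article does not specify how the two scales were equated.

```python
# Illustrative sketch of the promotion scoring described in the text.
# Names and the rescale-to-percent step are assumptions, not RCMP specifics.

def prp_score(board_ratings):
    """Sum of the eight competency ratings assigned by the PRB (max 80)."""
    assert len(board_ratings) == 8
    return sum(board_ratings)

def jse_score(responses):
    """48 scenarios: best response = 2, satisfactory = 1, ineffective = 0 (max 96)."""
    assert len(responses) == 48
    return sum(responses)

def merit_score(board_ratings, responses):
    """Equal-weight (50/50) average of the PRP and JSE components, each
    rescaled to 0-100 so they contribute comparably (rescaling is assumed)."""
    prp_pct = 100.0 * prp_score(board_ratings) / 80.0
    jse_pct = 100.0 * jse_score(responses) / 96.0
    return 0.5 * prp_pct + 0.5 * jse_pct

# Example: a candidate rated 7 on every competency who earns 80 of 96 JSE points
ratings = [7] * 8
answers = [2] * 32 + [1] * 16
print(round(merit_score(ratings, answers), 1))  # prints 76.7
```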
To overcome some fairness and morale issues identified in the previous promotion system, all applicants for promotion under the new system were evaluated on both the JSE and PRP during the 1998 promotion cycle. A promotion cycle lasts for 2 years. During that period, promotions were made in a top-down manner from a merit list, based on the average of the JSE and PRP scores, each weighted at 50%.

Results presented here represent the evaluation of the performance assessment procedure based on data from the first promotion cycle and longitudinal promotion data. The data provide useful information related to the assessment of competencies, supervisor ratings, and the reliability of board ratings.

Methods

Development of the Behavioral Anchors

Three focus groups of 15–16 SMEs each developed behavioral anchors over a 3-day workshop; each group wrote anchors for one promotional rank only. The SMEs included members from the target and next three higher ranks and were nominated by the commanding officers of RCMP Divisions. The SMEs had to have demonstrated high performance in the core competencies and to have served 3–7 years in their current rank. Each focus group represented the organization in terms of geographic area, job type, type of policing, gender, and aboriginal and visible minority status.

After an orientation session, SMEs each wrote three to four behavioral examples of poor, average, and excellent performance related to each competency. Over 1,100 behavioral examples were generated by each focus group. The SMEs then rated all the behavioral statements produced by their group using a 10-point scale. The statements were coded and retranslated, with only statements receiving 70% agreement being retained. The procedure that was followed is described in detail elsewhere (Shaw, Schneier, & Beatty, 1991).

Following the workshops, the retained descriptors were edited, grouped by assigned rating, and labeled to form anchors. The draft anchors were reviewed and edited by the SMEs, who were also asked to provide additional high-end descriptors. The revised anchors were reviewed for their appropriateness and then submitted to new groups of SMEs for retranslation and confirmation of the ratings. This process resulted in further editing of the anchors.

The prototype PRPs and instructions were pilot tested with 70 volunteer candidates and their supervisors using the FormFlow software. Changes were made to the PRP form, the anchors, the instructions, and the process, based on evaluation of the pilot test data and participant feedback. Information from the pilot study was also useful in identifying training needs for candidates and supervisors. French language versions of the PRPs were prepared by the RCMP Translation Section after the English versions were finalized.
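The 70% retention rule applied during retranslation can be illustrated with a short sketch. The data structures and names are hypothetical, and the sketch assumes "agreement" means the share of SMEs who independently reassign a statement to its intended performance category; the article does not detail the computation.

```python
# Hypothetical sketch of the retranslation retention rule: keep a behavioral
# statement only if at least 70% of SMEs independently reassign it to the
# performance category (poor / average / excellent) it was written for.

def retain_statements(ratings_by_statement, threshold=0.70):
    """ratings_by_statement maps (intended_category, statement_text) to the
    list of categories the SMEs assigned during retranslation. The tuple
    key is an assumed encoding, not the RCMP's actual coding scheme."""
    retained = []
    for (intended, text), assigned in ratings_by_statement.items():
        agreement = assigned.count(intended) / len(assigned)
        if agreement >= threshold:
            retained.append(text)
    return retained

sme_ratings = {
    ("excellent", "Writing used as a model by other members"):
        ["excellent"] * 13 + ["average"] * 2,   # 87% agreement -> retained
    ("poor", "Reports frequently returned for correction"):
        ["poor"] * 9 + ["average"] * 6,         # 60% agreement -> dropped
}
print(retain_statements(sme_ratings))
# ['Writing used as a model by other members']
```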
Bilingual SMEs compared the French versions to the original English versions for similarity and identified discrepancies. The SMEs and the translator discussed the discrepancies until they reached consensus on an appropriate rewording.

The final anchors on each of the three PRP forms were progressively more demanding for their respective ranks. Examples of the anchors for the final "exceptional" category of the communications competency were:

Constable to corporal. Explains complicated points in different ways to ensure understanding; written correspondence used as a model by other members; advice sought by others regarding how to make effective presentations; written reports are concise, understandable, and lead to defendable and convincing conclusions; listens carefully to opinions and concerns of others and uses their views to help find solutions.

Corporal to sergeant. Demonstrates enthusiasm in all communication; presentations stick to the topic and do not wander, and are given in a lively and informative manner; writing is clear, concise and error-free, easy to comprehend, and contains a full explanation of the information being offered; writing is used as a model by other members; makes very effective public presentations on major issues in clear and simple language with little advance preparation; incorporates new information technology wherever available, as part of communication strategy.

Sergeant to staff sergeant. Written correspondence used as a model by other members; instructs others in how to make public presentations; presents technical issues in plain language that can be understood by the intended audience; does not use jargon in communicating; makes very effective public presentations on major issues in clear and simple language with little advance preparation.

Training

Candidates. Candidates were provided with extensive information and instructions on how to complete the PRP, how to present behavioral examples of competencies, and how the process worked. In addition, candidates had access to a 10-minute video explaining the PRP process and to a 40-minute training video developed for supervisors that illustrated different competency levels. As described in the next section, the supervisor's video contained extensive information and illustrations on the use of the new PRP. The liaison group overseeing implementation of the new promotion system, which included representatives of RCMP members, felt that having candidates view the longer supervisor's video was both appropriate in terms of transparency and helpful to candidates in completing the PRP.

Supervisors. Five thousand copies of a training video and a related guide were distributed to supervisors, who were required to review this material prior to assessing a subordinate. The video used role-play scenarios, behavioral examples, and question-and-answer formats to cover the essential components of a rater training program (Pulakos, 1991). The video introduced the PRP form and procedures, the core competencies, and use of the BARS. Video examples illustrated performance for selected competencies; the guide contained PRP materials related to the video examples. Supervisors had to pause the video and rate the behavioral examples before comparing their ratings to those assigned by the supervisor in the video. The video also included accuracy training (with a practice and feedback session), error training (video examples illustrating common rating errors), and a segment on meeting with the candidate to review ratings (illustrated with a video example). The guide also provided a performance appraisal checklist adapted from Tziner et al.'s (1996) questionnaire for measuring perceived political considerations in performance appraisal.

Performance review boards. Training content was influenced by results from a series of mock boards in which 18 SMEs rated PRPs produced for the pilot study. Two training teams conducted more than 25 one-day training workshops throughout RCMP divisions for members who had been selected to serve on the PRBs. Each workshop was limited to 21 participants. Board members reviewed the supervisor's instructional video and the participant's guide. This was followed by a discussion of rating accuracy, accountability, and monitoring (i.e., whether one supervisor is out of line with other supervisors). The training session also covered the knowledge and abilities required of board members, their roles and responsibilities, the role of the board chairperson, guidelines for conducting boards, and a PRB rating rationale that was developed to assist the PRBs in providing feedback to the supervisor and candidate. The training groups practiced rating competencies. Once they felt comfortable with the rating process, the trainees were divided into seven 3-person boards and practiced rating PRPs as a board. The trainers monitored progress and answered questions during the practice sessions. In plenary sessions, members discussed their board ratings and raised any concerns (e.g., how to resolve difficulties in reaching a consensus). A short test was given to evaluate the PRB members.

Feedback Questionnaire

Following completion of PRPs, but before candidates received their performance scores, a 14-item questionnaire was distributed electronically throughout the RCMP. The purpose of the questionnaire was to obtain feedback from both candidates and supervisors on the PRP as a component of the promotion system and to identify user satisfaction and improvements that should be made before its use in the next cycle. Six general items (e.g., "the addition of the PRP has improved the promotion process") were directed to both candidates and supervisors, with an additional four each for candidates (e.g., "I feel that the supervisor's comments and recommended ratings are an important part of the PRP process") and for supervisors (e.g., "I found the behavioral descriptors (anchors) useful in forming an accurate picture of my subordinate's performance").

A relatively low number of members responded to the request for feedback (N = 1,077; approximately a 10% to 13% response rate), with 610 responses coming from members who identified themselves as candidates, 147 from supervisors, and 291 from members who had participated both as supervisors of other candidates and as candidates themselves for promotion to a higher rank. The low response rate may be due to the timing of the survey. The questionnaire was distributed early in an attempt to secure feedback that was not biased by the scores the participants received from the PRB; as a result, the dissemination of the survey overlapped with the period in which some members were still completing their own PRPs or serving as board members on a PRB.

Results and Discussion

At the time of this analysis, data were available from 4,056 constables, 1,814 corporals, and 701 sergeants who had applied for promotion (N = 6,571). Separate data analyses were conducted by rank as well as for the three ranks combined. In the latter case, we recognized that the three PRP forms had different anchors, but the similarity of the process suggested the data could be combined. With rare exceptions, the results for each rank were almost identical to those for the combined data. Except where there are meaningful differences across the three ranks, we present the combined data.

PRP Scores Assigned by the Boards

The maximum attainable score on the PRP was 80. The distribution of scores at each rank was relatively normal, with a larger number of scores clustering at the center of the distribution and a fairly symmetric decline toward either end. For the constable distribution, skewness and kurtosis were sk = −.23 and ku = 1.69, respectively; for the corporal distribution, sk = −.41 and ku = 2.13; and for the sergeant distribution, sk = .03 and ku = −.36. Tabachnick and Fidell (2001) advise caution in interpreting these values in large samples, as is the case here. Instead, they suggest visual examination of the distribution of scores along with the actual size of these values (departures from zero), adding that with larger samples the impact of departures from zero on the normality of the distribution diminishes. Frequency histograms conformed to superimposed normal distributions. The average PRP score across all candidates was M = 53.3 (SD = 7.96). PRP scores increased from constables (M = 52.0; SD = 7.89) to corporals (M = 54.4; SD = 7.79) to sergeants (M = 58.0; SD = 6.52), F(2, 6570) = 202.95, p < .001. As well, the mean score on each of the eight competencies replicated the results for the overall average score; the mean rating on each competency was significantly higher for sergeant applicants (p < .001) than for corporals, whose ratings in turn were higher than those for constables. The higher mean scores across the eight competencies for applicants to the higher ranks are consistent with the expectation that these competencies should be more developed in higher-ranking members, who have had a greater amount and variety of work experience. The culture of the organization was such that members expected to retire as corporals and persisted in applying for promotion regardless of a lack of success in previous cycles. That was not the case at higher ranks, where only those members who felt they had a realistic chance of advancement put themselves forward for promotion. In effect, at higher ranks there was likely a self-selecting out of members who believed they would not be competitive or were not likely to meet the new criteria for promotion.

Equity Issues

The RCMP is subject to government legislation with respect to nondiscrimination for certain protected groups, and the selection and promotion policies of government agencies must comply with the relevant legislation. PRP scores were reviewed at all rank levels for potential discrimination on the basis of language, gender, visible minority status, and aboriginal group membership. There were no significant differences in PRP scores across any of these variables, except that visible minority (M = 48.9) and aboriginal (M = 47.8) constables scored significantly lower than majority group constables (M = 52.2). Controlling for education and years of service did not remove these differences. There were no statistical differences among visible minority, aboriginal, and majority group members on the PRP at the corporal and sergeant ranks. These results were similar to those from the knowledge exam used in the previous promotion system and are mostly attributable to the majority (70%) of these groups having entered the RCMP as "special constables," without having to meet the same stringent entry requirements as other members, in an attempt to make the composition of the RCMP more similar to that of Canadian society. Initially, special constables were to remain at that rank with limited duties, but a subsequent policy decision "regularized" this group and made them eligible for the full range of duties and advancement in rank. The differences between minority and majority members disappeared when the analyses were restricted to minority members who met the regular entry standards, particularly those who had already advanced to the corporal and sergeant ranks. The results for visible minorities and aboriginals in the constable ranks may thus be an artifact of those policy decisions; nonetheless, they remained a concern with respect to discrimination and were addressed through special training initiatives directed at these two groups. In addition, future analyses should include equivalency studies, such as differential item functioning with respect to language, gender, visible minority, and aboriginal status, as part of the review of potential discrimination against these groups.

Intercorrelations Between Competencies

Table 2 shows the intercorrelations between the eight core competencies assessed by the PRP.

TABLE 2
Correlations Between Competencies

              1     2     3     4     5     6     7     8
1. Lead       –    .69   .71   .71   .69   .65   .65   .70
2. Service   .62    –    .63   .65   .64   .62   .61   .64
3. Think     .61   .58    –    .69   .69   .58   .64   .64
4. Flex      .61   .57   .64    –    .67   .65   .61   .65
5. Org       .60   .57   .62   .63    –    .55   .63   .65
6. IR        .55   .54   .51   .58   .53    –    .59   .57
7. Com       .57   .57   .57   .56   .59   .55    –    .60
8. Motive    .59   .58   .58   .60   .61   .53   .61    –

Notes. PRB ratings are below the diagonal; supervisor ratings are above the diagonal. Lead = leadership; Service = service orientation & delivery; Think = thinking skills; Flex = personal effectiveness & flexibility; Org = organization/planning; IR = interpersonal relations; Com = communications; Motive = motivation.

All competencies are significantly and
moderately correlated with each other (p < .001). The coefficients range from r = .51 to r = .64. The same pattern of intercorrelations, although somewhat higher in magnitude, is evident in the supervisor's recommended ratings, which range from r = .55 to r = .71 and are also shown in Table 2. This pattern generally implies an underlying conceptual basis for the performance measure and further suggests that the eight competencies are separate items measuring the same overall performance construct. These high intercorrelations were expected because high intercorrelations were found between the similar job dimensions that were the foundation of the structured interview and examinations used in the previous two promotion cycles (Catano & Associates, 1997). The high intercorrelations between the eight competencies were not a concern because only the total score was used for promotion decisions. These results do, however, raise interesting questions for competency-based systems. The competencies used here are very similar to those found in most competency dictionaries (e.g., Dubois, 1993; Gorsline, 1996; Slivinski et al., 1996). Similarly, Atkins and Wood (2002), Beehr, Ivanitskaya, Hansen, Erofeev, and Gudanowski (2001), and Fletcher, Baldry, and Cunningham-Snell (1998) found high intercorrelations among the competencies used in their studies. Atkins and Wood (2002) reported that highly trained assessment center raters had trouble separating individual competencies from an overall halo effect. As well, research with structured interviews generally finds a high degree of intercorrelation between interview dimensions (Conway & Peneno, 1999; Huffcutt, Weekley, Wiesner, DeGroot, & Jones, 2001; Pulakos & Schmitt, 1995). The lack of discriminant validity among interview dimensions is one reason why the recommended
procedure is to use the total interview score in decision making (Campion, Palmer, & Campion, 1998). The appraisal system and the training procedures that were implemented were among the most comprehensive used; yet, the intercorrelations suggest the presence of “halo errors” that may represent adaptive responses that are consciously adopted by the raters to the wider work situation (Cleveland & Murphy, 1992); that is, the raters are forming an overall view of competence, which is then reflected in each competency dimension (Murphy & Cleveland, 1995). Alternatively, the high intercorrelations may represent the presence of a general performance factor (Viswesvaran, Schmidt, & Ones, 2005). Can individuals discriminate performance on different competencies, and can individuals make meaningful distinctions between competencies even when they are behaviorally defined? Regardless of the resolution of this issue, rating the eight competencies assures a higher degree of reliability for the composite score in that they provide distinct but related views of overall performance.
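The closing point, that eight correlated competency ratings still yield a highly reliable composite, can be illustrated with the standard Spearman-Brown composite formula. This is a general psychometric identity, not a computation reported in this article, and the average intercorrelation of .6 used below is merely an illustrative value in the range shown in Table 2:

```python
def spearman_brown(mean_r, k):
    """Reliability of a k-item composite given the average
    correlation among the k items (Spearman-Brown prophecy)."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# Eight competency ratings with an average intercorrelation of about .6
# (illustrative) yield a composite reliability of roughly .92.
composite_rel = spearman_brown(0.6, 8)
```

Even modestly correlated dimensions therefore produce a dependable total score, which is consistent with basing promotion decisions on the composite rather than on individual competency ratings.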

Correlations among JSE Competency Assessments, PRB Scores, and Supervisor Ratings

Scores on the three JSEs ranged between 11 and 89 out of a total possible score of 96. For the corporal exam, the mean score was 66.9 (SD = 7.7); for the sergeant JSE, 68.7 (SD = 8.4); and for the staff sergeant JSE, 66.3 (SD = 7.4). The internal consistency reliability estimates for the corporal, sergeant, and staff sergeant JSEs were .56, .63, and .54, respectively (RCMP Research Branch, 1999). The relatively low internal consistencies were not surprising in that the JSEs were intended to assess six different competencies rather than a unitary construct. These estimates are within the range reported for low-fidelity simulations (e.g., .56; Motowidlo, Dunnette, & Carter, 1990) and are likely a function of the scoring format (i.e., points for less effective responses) and of item selection intended to maximize the distinctiveness of items from one another (RCMP Research Branch, 1999). There was a low but significant correlation between total PRP and total JSE scores (r = .18; p < .001). The correlations between individual competencies ranged from .03 to .10 (Table 3). Similar relationships were found for each of the three ranks. These results show that the JSE and PRP are assessing different aspects of the core competencies. They are nonredundant measures, each providing a different perspective on the member's performance. The JSE addresses the practical knowledge of the members, whereas the PRP assesses the translation of that knowledge into performance. Both aspects are relevant to promotion decisions.


TABLE 3
Correlations Between Competencies Assessed by the PRB and JSE

Competency     r
1. Lead       .03
2. Service    .10
3. Think      .03
4. Flex       .04
5. Org        .05
6. IR         .06
7. Total      .18

Notes. Communications and Motivation were not measured by the JSE. Lead = leadership; Service = service orientation & delivery; Think = thinking skills; Flex = personal effectiveness & flexibility; Org = organization/planning; IR = interpersonal relations.

Correlations between JSE scores and supervisor ratings on the PRP ranged between .15 and .17 across the ranks. Overall, the correlations are low but appear to be consistent with McDaniel, Morgeson, Bruhn Finnegan, Campion, and Braverman's (2001) meta-analytic finding of a corrected validity coefficient of .34 between situational judgment tests and supervisor ratings. Using the average reliability estimates for the JSE and PRB ratings to correct for unreliability increases the correlation between these measures to about .32. This modest association may be due to a combination of factors related to the large cognitive ability component of SJTs (McDaniel et al., 2001) and to job experience. For example, Farrell and McDaniel (2001) found that for more complex jobs, associations between general cognitive ability and supervisor ratings decreased, dropping below .25, as experience increased beyond 7 years. Because all promotional candidates in our sample had at least 5 years of service in order to participate in the promotion cycle, this may explain these results.

Supervisor Ratings

The distribution of supervisor ratings for the three combined ranks was normal (sk = −.22; ku = −.01); the ratings at each rank were also normal, with values of sk and ku similar to those for the combined distribution. The supervisors' recommended performance scores (M = 58.12) were considerably higher than those assigned by the PRBs (M = 53.3), t(3140) = −33.01, p < .001. This difference was not unexpected, as one role of the PRB was to control rating inflation. Differences between rating sources are more likely in organizations like the RCMP that have rigid, hierarchical structures. At the supervisor–subordinate level, the most immediate consequence of assigning low ratings is to jeopardize the supervisor–subordinate relationship by provoking negative reactions from the subordinate (Murphy & Cleveland, 1995). Most members (83%) agreed with their supervisor's recommended rating. The recommended rating for these members (M = 58.43) was much higher than for those who disagreed with their supervisor (M = 52.25), t(2869) = 12.81, p < .001. The results confirm the expectation of inflated supervisory ratings and support the review of those ratings by the PRBs.

Impact of Inflated Ratings

The PRBs lowered 69.1% of the supervisor ratings, raised 19.7%, and upheld 11.2%. On average, the PRBs lowered PRP scores by 4.5, 6.0, and 3.5 points for constables, corporals, and sergeants, respectively. The promotion list was rank ordered based on the average of the total PRP score and the score from the job simulation exercise. We ranked candidates on both the supervisor's rating and the PRB score; across the three promotional groups, 55.8% of the candidates held higher positions on the PRB list than on the supervisor rating list, 38.0% held lower positions, and 6.0% remained the same. Although this result may seem counterintuitive, it shows the impact of inflated scores: lowering one score may drop that individual 10 positions on the promotion list, causing 10 other individuals to move up in position.
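The list dynamics described above can be sketched with hypothetical candidates and scores (none of these values are RCMP data):

```python
def positions(scores):
    """Rank candidates best-first; returns {candidate: list position}."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(order)}

supervisor = {"A": 70, "B": 68, "C": 66, "D": 64}   # inflated rating for A
board = dict(supervisor, A=63)                      # board lowers A's score
before, after = positions(supervisor), positions(board)
# Lowering one inflated score drops A from 1st to 4th,
# and B, C, and D each move up one position.
```

The sketch shows why correcting a minority of inflated ratings can still move a majority of candidates upward on the final list.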

Correlation of PRB and Supervisor Ratings

Are rankings based on supervisor ratings more accurate than those based on PRB ratings? This question can never be satisfactorily answered in the absence of true performance data. However, we can determine whether rankings based on the two rating sources are so different that they raise concerns about the validity of any promotional decision based on either set of ratings. PRB and supervisor ratings were positively and significantly correlated with each other (r = .72, p < .001). The two ratings are not independent, in that the PRB rating is based on the supervisor's score. As noted earlier, complete independence between the two sets of scores was sacrificed to obtain buy-in for the system from members, who argued for complete transparency with respect to the supervisor's ratings and the member's response to them. Nonetheless, the consistency between the two sources indicates that the PRB and the supervisor are viewing the candidate's performance similarly, even though they might disagree on the exact value to place on that performance.


Estimates of PRP Reliability

Interrater reliability and intrarater reliability (internal consistency) have both been used to assess performance appraisal reliability (Murphy & Cleveland, 1995; Pulakos, 1997; Viswesvaran, Ones, & Schmidt, 1996), and both methods have limitations. Interrater agreement, the agreement between two or more PRBs rating the same individual, confounds information about the reliability of ratings with their validity and at best provides only partial information about reliability (Murphy & Cleveland, 1995, pp. 271–272). In addition, the requirement of having two boards rate each PRP made this an impractical procedure, from a cost standpoint, to include in the implementation. Intrarater reliability determines how consistently each rater, or in this case each PRB, assigned ratings to the different dimensions comprising the performance appraisal measure across all the candidates that a board rated. On the one hand, measures of internal consistency may produce artificially low estimates of reliability if all the dimensions comprising a performance appraisal provide unique information (Murphy & Cleveland, 1995). On the other hand, if the dimensions are moderately correlated, as is the case here, the measure will likely produce higher estimates than those from interrater reliability methods (Pulakos, 1997, p. 294). Recognizing these limitations, intrarater reliabilities were, initially, the only reliability estimates that could be obtained. Table 4A presents the average intrarater reliability coefficient for the PRBs by rank within each RCMP division (divisions correspond mostly to Canadian provinces). The procedure identified by Shrout and Fleiss (1979; see their Table 2) for calculating intraclass correlations was used to estimate intrarater reliability for each PRB. For each board, a k (number of PRPs evaluated by the board, where k > 1) × 8 (competencies) matrix was developed, with competencies as the row variable. The resulting coefficient alpha was used as an estimate of intrarater, or PRB, reliability. The mean intrarater reliability was r = .89 (based on 44 mean coefficients) and ranged from r = .57 to r = .94 across divisions and ranks. These values (Table 4A) are consistent with the average intrarater coefficients reported for job performance ratings (r = .86) in Viswesvaran et al.'s (1996) meta-analysis, which was based on data from 89 reliability studies. One difference in the present case is that the relatively high degree of intercorrelation among the competencies may lead to overestimates of intrarater reliability. Although the intrarater reliabilities are estimates, the reliability of the PRP nonetheless appears to meet or exceed the values reported for intrarater reliability measures of job performance in the professional literature.
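The board-level alpha computation described above can be sketched with a generic coefficient alpha function. This is a minimal stdlib sketch, shown in the conventional cases-by-items layout (the k PRPs a board rated as rows, the eight competencies as columns); the scores are invented for illustration, not taken from the study:

```python
from statistics import variance

def cronbach_alpha(rows):
    """Coefficient alpha for a matrix of ratees (rows) x items (columns)."""
    k = len(rows[0])                                   # number of items
    item_vars = sum(variance(col) for col in zip(*rows))
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Four hypothetical PRPs rated on three dimensions; perfectly consistent
# ratings across dimensions yield alpha = 1.0.
alpha = cronbach_alpha([[2, 3, 4], [4, 5, 6], [6, 7, 8], [8, 9, 10]])
```

As the text notes, highly intercorrelated dimensions push such estimates upward, which is why the interrater estimates reported later are lower.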


TABLE 4A
Average Intrarater Reliability Coefficients for PRBs Within Each Division

               Constable              Corporal              Sergeant
Division  Average α  No. of PRBs  Average α  No. of PRBs  Average α  No. of PRBs
HQ          0.89          2         0.86          2         0.87          2
A           0.85          5         0.78          3         0.82          1
B           0.84          7         0.86          4         0.90          1
C           0.92          7         0.90          2         0.89          1
D           0.88         11         0.91          5         0.91          3
E           0.88         22         0.88          7         0.73          4
F           0.88          8         0.88          4         0.84          2
G           0.88          4         0.84          2         0.94          1
H           0.87         11         0.89          3         0.94          1
J           0.91          5         0.84          6         0.91          2
K           0.88         15         0.88          7         0.86          2
L           0.91          2         0.90          1         0.92          1
M           0.85          2         0.86          1         0.57          1
O           0.92         10         0.93          4         0.84          1
Depot        –            –         0.84          6         0.88          1

Note. – = no PRBs were available at this rank.

TABLE 4B
Estimates of Interrater Reliability

Constable (n = 638)   Corporal (n = 329)   Sergeant (n = 72)
        .60                  .46                  .59

The RCMP Act, the federal legislation that governs the RCMP, provides an internal, but independent, mechanism for the redress of grievances. The number of promotion-related grievances has generally been high since 1994, when more structure was introduced into the promotion process through the use of quantitative assessment tools (e.g., a job knowledge exam and a structured interview). The two cycles of the previous promotion system resulted in approximately 2,000 grievances, or roughly 1,000 per cycle (Catano & Associates, 1997). We had hoped that the number of grievances would decrease, but in fact it increased. Following receipt of their scores on the PRP, as determined by the PRBs, a number of members challenged either the process or the accuracy of their scores. Of the 2,873 grievances submitted by the end of the 2-year promotion cycle, approximately 15% pertained to
the JSEs, 4.5% related to PRP process and policy issues, and the remaining 80.5% to PRB score changes. Members who grieved their PRP score were offered the opportunity to have their PRP rescored by a second board composed of members who had previously served on a PRB. The final score assigned to the member was the average of the two PRB scores; grievers thus faced some risk, in that the final score could be lower than the initial one. Rescored PRPs for 638 constables, 329 corporals, and 72 sergeants were available for additional analyses. Although one might expect individuals with low scores to file grievances, those who did had higher scores than the population of candidates (constable: M = 62.81, SD = 8.05; corporal: M = 63.87, SD = 7.77; sergeant: M = 68.75, SD = 6.37). The distributions of scores for this sample were very similar to those for the respective populations. The higher mean scores reflect both the competitiveness of RCMP members and the impact of the boards in lowering high scores recommended by supervisors. We correlated the original PRP scores with the second set of scores obtained from the redress boards to estimate interrater reliability. The second PRB was unaware of the original score assigned by the first PRB and as such constituted a second, independent assessment of the PRP. Approximately 50% of the applicants received the same score from the second PRB, with 19% receiving decreased scores and the remaining 31% receiving increased scores. These percentages were fairly constant across ranks, except that the percentages of corporals receiving decreases (12%) and increases (41%) differed from those for constables and sergeants. This difference is also reflected in a lower interrater reliability for the corporal scores.
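The interrater estimate just described is the Pearson correlation between the totals assigned by the two boards; a minimal sketch, with invented paired scores standing in for the first-board and redress-board totals:

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation between two paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# First-board vs. redress-board totals for five hypothetical grievers.
first_board = [62, 58, 65, 70, 60]
second_board = [60, 59, 63, 68, 62]
r = pearson(first_board, second_board)
```

Applied to the real paired scores within each rank, this is the computation that yields the interrater coefficients reported next.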
The interrater reliabilities based on the correlation of the scores assigned by the two independent PRBs were, as expected, lower than the intrarater reliability estimates reported in Table 4A: constable, r = .60; corporal, r = .46; sergeant, r = .59 (Table 4B). These estimates were consistent with, or higher than, those reported by Viswesvaran et al. (1996); however, they may be overestimates because both PRBs were aware of the supervisor's original rating when scoring the PRPs. Nonetheless, both the intrarater and interrater reliability estimates strongly support the reliability of the PRP assessments.

Validity of the PRP

With respect to the PRP, the issue of validity is one of making correct inferences that higher PRP scores reflect higher performance levels. In developing the PRP, a content validation strategy was used: SMEs agreed that the behaviors measured by the PRPs fairly represented the behaviors associated with performance levels in the three ranks with respect to the core competencies. Later, tentative
evidence for predictive validity was established using a measure of career advancement in the organization. We were able to gather promotion data over a 7-year period following the administration of the PRP in 1998. As an objective criterion, this measure overcomes, to some extent, the weaknesses of leniency and bias associated with subjective measures of performance such as supervisory ratings (Sturman, Cheramie, & Cashen, 2005), although "objective" measures such as promotions and salary history often reflect subjective performance appraisals. Objective measures have been used as criteria in previous validation studies. For example, 35% of the validity coefficients analyzed in Schmitt, Gooding, Noe, and Kirsch's (1984) meta-analysis used objective measures such as status change, earnings, and tenure. Meyer (1987) examined time to promotion over a 21-month period when validating, with some success, an aptitude battery to select computer operators. As the PRP was aimed at identifying those with the potential to assume higher levels of responsibility effectively, the rate of promotion provides some indication of the assessment tool's ability to predict future leadership potential. Such a measure also captures the dynamic aspect of individual performance by permitting an estimate over time (cf. Sturman et al., 2005). The rate of promotion was calculated as the ratio of the number of promotions between February/March 1998 and August 2005, or the date of retirement (for those who retired before August 2005), to the number of months between these two dates. As the PRB was created with the intent of discriminating between individuals with the potential to succeed at the next adjacent rank, promotion into that rank for those with higher PRB scores was a given. Consequently, promotion into the adjacent rank potentially confounded the criterion.
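The rate-of-promotion criterion defined above is a simple ratio of promotion count to months observed; a minimal sketch (the counts and the 89-month window are hypothetical, anticipating the worked example in the text):

```python
def promotion_rate(n_promotions, months_observed):
    """Promotions per month over the observation window."""
    return n_promotions / months_observed

# A member promoted three times over an 89-month window, computed with
# and without the initial promotion into the adjacent rank.
rate_all = promotion_rate(3, 89)        # includes all promotions
rate_beyond = promotion_rate(2, 89)     # excludes the adjacent-rank promotion
```

Computing the index both ways, as shown, is how the potential confound of near-guaranteed promotion into the adjacent rank was handled.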
For example, a constable who obtained a high PRB score was almost guaranteed promotion into the corporal rank; however, promotion into subsequent ranks (i.e., sergeant or inspector) depended on performance on other assessment tools, including a JSE and a revised PRP designed for the higher rank. To account for this potential confound, the rate of promotion was calculated twice, with and without including a member's promotion into the next immediate rank at the time of the PRB. For example, for a constable who completed the PRB and was promoted to corporal in 2001, subsequently to sergeant in 2002, and then to staff sergeant in 2004, the rate of promotion was calculated twice (3/89 months = .0337 and 2/89 months = .0224). In addition, the partial correlation between each predictor and the number of subsequent promotions was calculated, controlling for initial promotion into the adjacent rank. Promotion into the commissioned officer rank of inspector was examined as another potential criterion. (Members acting in a position at the target rank prior to the PRP were able to backdate their promotion to that date; August 2005 was the date of receipt of the criterion data.)

TABLE 5
Correlations Between Assessment Predictors and Rate of Promotion

                                     JSE     PRB   Combined score
Constables (N = 4,137)
  Rate of promotion (a)             .367    .357       .463
  Rate of promotion (b)             .253    .297       .356
  Promotions beyond adjacent (c)    .130    .203       .230
  Promoted to officer (d)           .088    .126       .141
Corporals (N = 1,830 to 1,913)
  Rate of promotion (a)             .433    .326       .499
  Rate of promotion (b)             .296    .248       .354
  Promotions beyond adjacent (c)    .170    .169       .235
  Promoted to officer (d)           .206    .174       .245
Sergeants (N = 705 to 732)
  Rate of promotion (a)             .330    .384       .467
  Rate of promotion (b)             .188    .155       .221
  Promotions beyond adjacent (c)    .185    .161       .238
  Promoted to officer (d)           .220    .173       .253

Note. (a) Includes all promotions. (b) Includes only promotions beyond the adjacent rank. (c) Partial correlation controlling for promotion into the adjacent rank. (d) Coded 1 = promoted, 0 = not promoted.

Correlations for each predictor (the PRB score for the PRP, the JSE, and the equally weighted combination of JSE and PRB scores) with the three criterion measures are reported in Table 5, broken down by rank. All correlations are significant at the .01 level. When all promotions are factored into the rate-of-promotion index, coefficients are moderately high (.326 to .433) for each predictor and highest for the weighted combined JSE and PRB score (.463 to .499) across all ranks. When promotion into the next immediate rank is excluded from the promotion index, the coefficients are slightly lower. The validity coefficients for the PRB appear to be stronger in predicting the rate of promotion for constables (.297) and corporals (.248) than for sergeants (.155). The same trend is evident when the combined score is used, although the coefficients are higher (.354 for constables, .356 for corporals, and .221 for sergeants). Controlling for initial promotion into the adjacent rank, the PRB continues to have significant, albeit slightly lower, associations with subsequent promotions. The lower validity coefficients at higher ranks are perhaps due to the generally lower number of promotions at higher ranks: the criterion data for constables revealed 44.1% corporal promotions, 13.4% sergeant promotions, 1.5% staff sergeant promotions, and 1% inspector promotions, and the same trend is evident for the corporal and sergeant data. Nevertheless, the obtained validity coefficients are comparable to those reported in Schmitt et al.'s (1984) meta-analysis; the overall uncorrected validity coefficients for objective criteria such as status change (.359) and wages (.378) are not much higher than those observed in the present study.

Feedback Survey

Because of the low response rate on the survey (10% to 13%), the results from the feedback survey may not be representative of the target populations and should be interpreted with caution. The results do, however, offer some support for the validity of using scores from the PRP to make inferences about a member's suitability for promotion. Most candidates (58%) felt that their supervisor's ratings on the PRP reflected their performance, as opposed to 28% who did not share that view. As well, 51% of the supervisors who responded believed that the behavioral descriptors on the PRP were useful in forming an accurate picture of their subordinates' performance, as opposed to 29% who did not. More supervisors (48%) than not (34%) believed that use of the PRP allowed them to make an assessment of a member's performance, and more respondents than not believed that the ratings from the PRP were a good reflection of their performance. Most believed that the PRP should be retained (55% vs. 31%) but revised (52%). There was strong agreement with the need to modify certain aspects of the PRP, with 62% agreeing with the need to modify training, followed by PRP procedures, policies, format/content, and method of delivery. Only a small percentage of respondents (8.5%) reported having received their PRB results before they filled out the survey, which suggests that the scores they received had little to do with their perceptions of the process. Their views were likely shaped by the experience of adapting to the change in the system, the requirement to produce behavioral examples for each competency from memory, and the relatively short time frame for completing the PRP. We conducted a content analysis of the comments made by respondents; 35% of the comments concerned the length of time needed to complete the PRP (M = 37.7 hours; SD = 23.6), which they felt detracted from time spent on other duties.
The next largest category of comments (27%) identified the behavioral examples as a concern, mostly citing insufficient time to develop them, questioning the need for two examples for each competency, or suggesting that one example be allowed to serve more than one competency. Twenty percent of the comments addressed the role of the PRB, with many arguing that their supervisor's rating was the only score that was needed and that they should not be rated by a board of "three strangers." The results from the feedback survey were helpful in making changes to the promotion system
since the first administration of the PRP. The most notable change was the move to a multiple-hurdle system in which only those candidates who received a passing score on the JSE are invited to prepare a PRP. In the first cycle, all applicants were eligible to complete the PRP regardless of their score on the JSE. That policy was adopted to overcome some of the morale issues that led to the external review of the previous promotion system; however, the cost of having everyone, including those with no realistic chance of promotion, prepare a PRP and then grieve their scores led to changes in the system. Further consultation with members reaffirmed their commitment to having a direct measure of performance included in the promotion system; they also saw the PRP as the fairest means of providing that assessment. In view of these considerations, the PRP was placed at the back end of a multiple-hurdle procedure. The second change was to have a PRB review only the top three to five candidates who applied for a promotion, on the basis of their JSE scores; those eligible candidates now submit unscored PRPs that are reviewed by a selection committee, along with a structured resume prepared by the candidates. The three-member selection committee now includes the hiring manager for the promotional position for which the candidates have applied, a staff relations representative, and another member at the target rank or higher. Selection committees also have the option of asking all candidates to undergo a structured interview when, after reviewing the PRP and structured resume, they deem additional information necessary to make a selection recommendation.

Implications for Practice

Competency modeling is one of the fastest-growing alternatives to traditional job analysis. Lievens et al. (2004) report that a search of the ABI/INFORM database identified over 500 competency-related articles published between 1995 and 2003, as opposed to only 87 published between 1985 and 1995. When competency modeling is done rigorously, it can provide useful information (Schippmann et al., 2000). Performance appraisal based on a competency model must meet the same standards as any other performance appraisal system. Latham, Almost, Mann, and Moore (2005, p. 78) state that an organization is most likely to win court challenges when: “(1) the appraisal instrument is based on a written job analysis; (2) it is behavioral; (3) there is a written manual for appraising and then coaching an employee; (4) reliability and validity of the appraisal decisions have been documented; (5) the results of an appraisal have been reviewed with the employee; and (6) organizations can show that appraising and coaching of employees is fair.”

As Pearlman (2002) noted, many, if not most, competency-based systems lack legal defensibility. Rarely do such systems document the impact of competency-based selection on subsequent performance, in part because of the ill-defined nature of most competencies. Focusing on behaviors, as we have done here, increases the likelihood that a competency can be measured fairly and reliably. The core competencies used here incorporated task information from a functional job analysis. Lievens et al. (2004) showed that such “blended” competencies enhance the inferences that can be drawn about competency requirements and increase the methodological rigor of the competency model. Assessing the competencies, whatever they may be, is the Achilles’ heel of competency modeling. As we have shown here, basing the competency profile on competency-related behaviors increases both the reliability and the validity of the inferences drawn from those profiles. Our results also suggest that much more work needs to be done on how decisions are made from a profile. Here, decisions were based on the total composite score obtained from eight competencies. Consistent with other research, ratings across the eight competencies were highly correlated, suggesting that implementations of competency modeling need to review the use of competency profiles and whether candidates should be required to meet minimum requirements on each and every competency.

Conclusions

The PRP system is a valid and reliable performance appraisal instrument based on behavioral competencies that meets the requirements for legal defensibility. Supervisors use behaviorally anchored rating scales to assess performance on eight core competencies, which promotion candidates illustrate with behavioral examples; the supervisors’ ratings are then reviewed by an independent board. The system successfully differentiated promotion candidates and was not influenced by factors such as gender. The PRP boards controlled supervisor-rating inflation. The boards rated competencies with high internal consistency, although both boards and supervisors may have formed overall, rather than competency-specific, judgments. The PRP compared very well with other performance appraisal methods and was supported by candidates and supervisors who had gone through the process. Most importantly, the new performance appraisal system predicted advancement through the organization. It has potential for use in other organizations.
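Rater consistency of the kind reported for the PRP boards is commonly quantified with an intraclass correlation (Shrout & Fleiss, 1979, cited in the references). As a hedged illustration, not the article's actual analysis, the ratings below are invented: a one-way ICC(1) for four candidates each scored by three board members can be computed from the between- and within-candidate mean squares.

```python
# One-way ICC(1) (Shrout & Fleiss, 1979), computed from scratch.
# The ratings matrix is invented for illustration: 4 candidates (rows)
# each rated by 3 board members (columns) on a 7-point BARS.

def icc1(ratings):
    n = len(ratings)       # targets (candidates)
    k = len(ratings[0])    # raters per target
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # between-targets and within-targets mean squares (one-way ANOVA)
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(ratings, row_means)
                    for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

ratings = [[6, 6, 5],
           [3, 4, 3],
           [5, 5, 6],
           [2, 2, 3]]
print(round(icc1(ratings), 2))  # → 0.88
```

An ICC near 1 indicates that board members largely agree in how they rank candidates; values near 0 would indicate that rater disagreement swamps true candidate differences.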


REFERENCES

Arvey RD, Murphy KR. (1998). Performance evaluations in work settings. Annual Review of Psychology, 49, 141–168.
Atkins PWB, Wood RW. (2002). Self- versus others’ ratings as predictors of assessment center ratings: Validation evidence for 360-degree feedback programs. PERSONNEL PSYCHOLOGY, 55, 871–904.
Beehr TA, Ivanitskaya L, Hansen CP, Erofeev D, Gudanowski DM. (2001). Evaluation of 360 degree feedback ratings: Relationships with each other and with performance and selection predictors. Journal of Organizational Behavior, 22, 775–788.
Blau G. (1999). Testing the longitudinal impact of work variables and performance appraisal satisfaction on subsequent overall job satisfaction. Human Relations, 52, 1099–1113.
Bowman JS. (1999). Performance appraisal: Verisimilitude trumps veracity. Public Personnel Management, 28, 557–576.
Campion MA, Palmer DK, Campion JE. (1998). Structuring employment interviews to improve reliability, validity, and users’ reactions. Current Directions in Psychological Science, 7, 77–82.
Catano & Associates. (1997). The RCMP NCO promotion system: Report of the external review team. Ottawa, ON: RCMP.
Cawley BD, Keeping LM, Levy PE. (1998). Participation in the performance appraisal process and employee reactions: A meta-analytic review of field investigations. Journal of Applied Psychology, 83, 615–633.
Cleveland JN, Murphy KR. (1992). Analyzing performance appraisals as goal-directed behavior. Research in Personnel and Human Resources Management, 10, 121–185.
Conway JM, Peneno GM. (1999). Comparing structured interview question types: Construct validity and applicant reactions. Journal of Business and Psychology, 13, 485–506.
Day DV, Sulsky LM. (1995). Effects of frame-of-reference training and rate information configuration on memory organization and rater accuracy. Journal of Applied Psychology, 80, 156–167.
Dubois D. (1993). Competency-based performance: A strategy for organizational change. Boston: HRD Press.
Farrell JN, McDaniel MA. (2001). The stability of validity coefficients over time: Ackerman’s (1988) model and the general aptitude test battery. Journal of Applied Psychology, 86, 60–79.
Fletcher C. (2001). Performance appraisal and management: The developing research agenda. Journal of Occupational and Organizational Psychology, 74, 473–487.
Fletcher C, Baldry C, Cunningham-Snell N. (1998). The psychometric properties of 360 degree feedback: An empirical study and cautionary tale. International Journal of Selection and Assessment, 6, 19–34.
Fletcher C, Perry EL. (2001). Performance appraisal and feedback: A consideration of national culture and a review of contemporary research and future trends. In Anderson ND, Ones DS, Sinangil HK, Viswesvaran C (Eds.), Handbook of industrial, work and organizational psychology (pp. 127–144). London: Sage.
Gilliland SW, Langdon JC. (1998). Creating performance management systems that promote perceptions of fairness. In Smither JW (Ed.), Performance appraisal: State of the art in practice (pp. 209–243). San Francisco: Jossey-Bass.
Gorsline K. (1996). A competency profile for human resources: No more shoemaker’s children. Human Resource Management, 35, 53–66.
Himelfarb F. (1996). Training and executive development in the RCMP. Ottawa: RCMP.
Hough LM. (1984). Development and evaluation of the “accomplishment record” method of selecting and promoting professionals. Journal of Applied Psychology, 69, 135–146.


Huffcutt AI, Weekley JA, Wiesner WH, DeGroot TG, Jones C. (2001). Comparison of situational and behavior description interview questions for higher level positions. PERSONNEL PSYCHOLOGY, 54, 619–644.
Klein AL. (1996). Validity and reliability for competency-based systems: Reducing litigation risks. Compensation and Benefits Review, 28, 31–37.
Latham GP, Almost J, Mann S, Moore C. (2005). New developments in performance management. Organizational Dynamics, 34, 77–87.
Latham GP, Mann S. (2006). Advances in the science of performance appraisal: Implications for practice. International Review of Industrial and Organizational Psychology, 21, 295–337.
Lievens F, Sanchez JI, De Corte W. (2004). Easing the inferential leap in competency modeling: The effects of task-related information and subject matter expertise. PERSONNEL PSYCHOLOGY, 57, 881–905.
Malos SB. (1998). Current legal issues in performance appraisal. In Smither JW (Ed.), Performance appraisal: State of the art in practice (pp. 49–94). San Francisco: Jossey-Bass.
McDaniel MA, Morgeson FP, Bruhn Finnegan E, Campion MA, Braverman EP. (2001). Use of situational judgment tests to predict job performance: A clarification of the literature. Journal of Applied Psychology, 86, 730–740.
Meyer HH. (1987). Predicting supervisory ratings versus promotional progress in test validation studies. Journal of Applied Psychology, 72, 696–697.
Milliman J, Nason S, Zhu C, De Cieri H. (2002). An exploratory assessment of the purposes of performance appraisals in North and Central America and the Pacific Rim. Asia Pacific Journal of Human Resources, 40, 105–122.
Motowidlo SJ, Dunnette MD, Carter GW. (1990). An alternative selection procedure: The low fidelity simulation. Journal of Applied Psychology, 75, 640–647.
Motowidlo SJ, Tippins N. (1993). Further studies of the low-fidelity simulation in the form of a situational inventory. Journal of Occupational and Organizational Psychology, 66, 337–344.
Murphy KR, Cleveland JN. (1995). Understanding performance appraisal: Social, organizational, and goal-based perspectives. Thousand Oaks, CA: Sage.
Pearlman K. (2002, June). Competency modeling: Mirror into the 21st century workplace or just smoke. Paper presented at the 26th Annual IPMAAC Conference on Personnel Assessment, New Orleans.
Pulakos ED. (1991). Behavioral performance measures. In Jones J, Steffy BD, Bray DW (Eds.), Applying psychology in business: The handbook for managers and human resource professionals (pp. 307–313). New York: Lexington Books.
Pulakos E. (1997). Ratings of job performance. In Whetzel DL, Wheaton GR (Eds.), Applied measurement methods in industrial psychology (pp. 291–318). Palo Alto, CA: Davies-Black.
Pulakos ED, Schmitt N. (1995). Experience-based and situational interview questions: Studies of validity. PERSONNEL PSYCHOLOGY, 48, 289–308.
RCMP Human Resource Directorate. (1997). RCMP core competencies. Ottawa, ON: Royal Canadian Mounted Police.
RCMP Research Branch. (1999). RCMP corporal job simulation exercise: Cycle 3 (Technical Report). Ottawa, ON: RCMP.
Schmitt N, Gooding RZ, Noe RA, Kirsch M. (1984). Meta-analyses of validity studies between 1964 and 1982 and the investigation of study characteristics. PERSONNEL PSYCHOLOGY, 37, 407–422.
Schippmann JS, Ash RA, Battista M, Carr L, Eyde LD, Hesketh B, et al. (2000). The practice of competency modeling. PERSONNEL PSYCHOLOGY, 53, 703–740.


Shaw DG, Schneier CE, Beatty RW. (1991). Managing performance with a behaviorally based appraisal system. In Jones J, Steffy BD, Bray DW (Eds.), Applying psychology in business: The handbook for managers and human resource professionals (pp. 314–325). New York: Lexington Books.
Shrout PE, Fleiss JL. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420–428.
Slivinski L, Donoghue E, Chadwick M, Ducharme FA, Gavin DW, Lorimer A, et al. (1996). The wholistic competency profile: A model. Ottawa, ON: Staffing Policy and Program Development Directorate, Public Service Commission of Canada.
Smither JW. (1998). Lessons learned: Research implications for performance appraisal and management practice. In Smither JW (Ed.), Performance appraisal: State of the art in practice (pp. 537–547). San Francisco: Jossey-Bass.
Sturman MC, Cheramie RA, Cashen LH. (2005). The impact of job complexity and performance measurement on the temporal consistency, stability, and test–retest reliability of employee job performance ratings. Journal of Applied Psychology, 90, 269–283.
Tabachnick BG, Fidell LS. (2001). Using multivariate statistics (4th ed.). Boston: Allyn and Bacon.
Trainor NL. (1997). Five levels of competency. Canadian HR Reporter, 10, 12–13.
Tziner A, Latham GP, Price BS, Haccoun R. (1996). Development and validation of a questionnaire for measuring perceived political considerations in performance appraisal. Journal of Organizational Behavior, 17, 179–190.
Tziner A, Murphy KR. (1999). Additional evidence of attitudinal influences in performance appraisal. Journal of Business and Psychology, 13, 407–419.
Viswesvaran C, Ones DS, Schmidt FL. (1996). Comparative analysis of the reliability of job performance ratings. Journal of Applied Psychology, 81, 557–574.
Viswesvaran C, Schmidt FL, Ones DS. (2005). Is there a general factor in ratings of job performance? A meta-analytic framework for disentangling substantive and error influences. Journal of Applied Psychology, 90, 108–131.