International Journal of Selection and Assessment
Volume 23 Number 4 December 2015
The Cross-cultural Transportability of Situational Judgment Tests: How does a US-based integrity situational judgment test fare in Spain? Filip Lievens*, Jan Corstjens*, Miguel Angel Sorrel**, Francisco José Abad**, Julio Olea** and Vicente Ponsoda** *Department of Personnel Management and Work and Organizational Psychology, Ghent University, Henri Dunantlaan 2, 9000 Ghent, Belgium.
[email protected] **Universidad Autónoma de Madrid, Madrid, Spain
Despite the globalization of HRM, there is a dearth of research on the potential use of contextualized selection instruments such as situational judgment tests (SJTs) in countries other than those where the instruments were originally developed. Therefore, two studies were conducted to examine the transportability of an integrity SJT originally developed in the United States to a Spanish context. Study 1 showed that most SJT scenarios (16 out of 19) that were developed in the United States were also considered realistic in a Spanish context. In Study 2, the item option endorsement patterns converged with the original scoring scheme, with the exception of two items. In addition, there were high correlations between the original US empirical scoring scheme and two empirical scoring schemes that were tailored to the Spanish context (i.e., mode consensus scoring and proportional consensus scoring). Finally, correlations between the SJT integrity scores and ratings on a self-report integrity measure did not differ significantly from each other according to the type of scoring key (original US scoring vs. Spanish scoring keys). Overall, these results shed light on potential issues and solutions related to the cross-cultural use of contextualized selection instruments such as SJTs.
1. Introduction

The globalization of the economy requires multinational organizations to view the labor market in an international scope and to develop staffing systems that can be used across multiple countries. The design of personnel selection procedures across various countries represents one crucial component of any global staffing system (Huo, Huang, & Napier, 2002; Ryan, McFarland, Baron, & Page, 1999; Ryan & Tippins, 2009; Ryan, Wiechmann, & Hemingway, 2003). Today's increased emphasis on cross-cultural selection systems stands in sharp contrast with the majority of current personnel selection studies. So far, selection research has predominantly focused on within-country (domestic) examinations (Lievens, 2006). One area wherein progress has been
made is the cross-cultural use of sign-based selection procedures such as cognitive ability tests (e.g., Salgado, Anderson, Moscoso, Bertua, & De Fruyt, 2003; Salgado, Anderson, Moscoso, Bertua, De Fruyt, & Rolland, 2003) and personality inventories (e.g., Salgado, 1997), with most research finding support for the cross-cultural use of sign-based selection procedures. However, it is difficult to generalize the results obtained with these sign-based predictors to the cross-cultural applicability and transportability of sample-based selection procedures (Briscoe, 1997; Lievens, 2006). Sample-based selection procedures present a series of situations to candidates that are representative of the situations they might encounter in their job. They include both low-fidelity (e.g., situational judgment tests [SJT]) and high-fidelity formats (e.g., assessment center exercises). The key underlying
© 2015 John Wiley & Sons Ltd, 9600 Garsington Road, Oxford, OX4 2DQ, UK and 350 Main St., Malden, MA, 02148, USA
characteristic of the sample-based paradigm is that it capitalizes on point-to-point correspondence with the criterion domain and on contextualization. Although these notions have produced adequate validities for sample-based selection procedures (Schmidt & Hunter, 1998), they might also come with potential problems. That is, they might increase the probability that contextual factors affect the presentation of the situations and the scoring of the responses, so that a particular SJT can be used only in a specific context (Briscoe, 1997; Lievens, 2006; Ployhart & Weekley, 2006). Such contextual effects can be situated at various levels (Johns, 2006); examples are cultural, industry, organizational, departmental, and job effects. In this study, we focus on one type of contextual influence, namely cultural effects. We also focus on one particular kind of sample-based selection procedure, namely the SJT. SJTs confront applicants with descriptions of job-related scenarios and ask them to indicate how they would react by choosing an alternative from a list of predetermined responses (Christian, Edwards, & Bradley, 2010; McDaniel, Hartman, Whetzel, & Grubb, 2007; McDaniel, Morgeson, Finnegan, Campion, & Braverman, 2001). As compared with high-fidelity sample-based predictors, SJTs might be easily deployed via the Internet in a global context due to their efficient administration and automatic scoring (Motowidlo, Dunnette, & Carter, 1990; Ployhart, Weekley, Holtz, & Kemp, 2003). In light of their contextualization, researchers have questioned whether SJTs can be used across countries. For example, Ployhart and Weekley (2006) mentioned the following key challenge: 'it is incumbent on researchers to identify the cross-cultural generalizability - and limits - of SJTs. . . One might ask whether it is possible to create a SJT that generalizes across cultures. Given the highly contextual nature of SJTs, that poses a very interesting question' (p. 349).
Unfortunately, few studies have examined cultural differences in terms of the item stems, response options, or scoring keys of SJTs. Taken together, with selection practice increasingly operating on a global scale, there is a need for more insight into the international use of selection procedures (Ryan & Tippins, 2009). This is especially the case for selection procedures such as SJTs that lend themselves to efficient and quick Internet screening of applicants in multiple countries. However, at this time there exists little conceptual insight and empirical evidence related to the cross-cultural transportability of SJTs. Hence, practitioners are left in the dark as to whether SJTs can be deployed in countries other than those where they were originally developed. Therefore, we conduct two studies that examine whether an SJT can be used in a country other than the one where it was originally developed. To this end, the transportability of the item stems, item options, and scoring scheme is scrutinized. This study's specific context deals with transporting an existing SJT with integrity situations (Becker, 2005) from the United States to Spain. To
our knowledge, Becker was one of the first to propose SJTs as an alternative approach for measuring integrity (Sackett, Burris, & Callahan, 1989; Van Iddekinge, Raymark, & Odle-Dusseau, 2012). His conceptualization of integrity also differs from the integrity conceptualizations embraced in traditional overt and personality-based integrity tests. Whereas such traditional integrity tests primarily tap into honesty or conscientiousness, Becker (1998) draws on the objectivist ethics literature to conceptualize integrity as acting in accordance with principles and morally justifiable rational values. This study's US–Spanish context is relevant because in cultural value frameworks the United States and Spain are typically situated in different clusters. In particular, in Hofstede (2001), Spain appears in a cluster of European and Latin American countries defined by moderate levels of individualism and high levels of power distance and uncertainty avoidance. Conversely, Anglo-Saxon cultures are characterized by low power distance, high individualism, and fairly low scores on uncertainty avoidance.
2. Study background

2.1. Overview of prior international SJT research
Most existing SJT research in other countries has taken an indigenous or emic approach in which SJTs were developed and validated with the local culture as the point of reference. For example, Chan and Schmitt (2002) developed an SJT for civil service positions in Singapore. This implied that the job analysis, the collection of situations, the derivation of response alternatives, the development of the scoring key, and the validation took place in Singapore. Chan and Schmitt (2002) found that in Singapore the SJT was a valid predictor of overall performance and had incremental validity over cognitive ability, personality, and job experience. This corresponds to the meta-analytic research base on SJTs in the United States (McDaniel et al., 2007). Other examples of this emic SJT development approach include SJTs developed in Germany (Behrmann, 2004; Bledow & Frese, 2009; Funke & Schuler, 1998; Kleinmann & Strauss, 1998; Schuler, Diemand, & Moser, 1993), Belgium (Lievens, Buyse, & Sackett, 2005), the Netherlands (Born, 1994; Born, Van der Maesen de Sombreff, & Van der Zee, 2001), Korea (Lee, Choi, & Choe, 2004), Iran (Banki & Latham, 2010), and China (Jin & Wan, 2004). In each of these country-specific applications, the development of the SJT ensured that the job-relevant scenarios, response options, and scoring rubrics were derived from the input of local subject-matter experts. However, the emic approach also has drawbacks. As an indigenous approach implies the use of different instruments for different countries, it is a costly and time-consuming strategy. Another challenge for the emic approach is to contribute to the cumulative knowledge in a specific domain,
which is typically built around generalizable concepts (Leong, Leung, & Cheung, 2010; Morris, Leung, Ames, & Lickel, 1999). Contrary to the emic approach, the imposed etic approach assumes that the same instrument can be applied universally across different cultures (Berry, 1969; Church & Lonner, 1998). So, according to the imposed etic approach, a selection procedure developed in a given country can be exported for use in other countries, provided that guidelines for test translation and adaptation are taken into consideration (International Test Commission, 2001). Hence, the imposed etic approach represents an efficient strategy for cross-cultural assessment. However, the imposed etic approach is not without limitations either. Even when tests are appropriately translated and adapted, the content of the transported instruments might predominantly reflect the culture from which the instrument is derived, thereby potentially omitting important emic aspects of the local culture (Cheung et al., 1996; Leong et al., 2010). In particular, as discussed below, the contextualized nature of SJT items creates various potential drawbacks of transporting an SJT from one country to another.
2.2. SJT item characteristics affecting cross-cultural transportability of SJTs
Contextualized measures such as SJTs are developed with a specific criterion domain in mind. That is, SJT items are directly developed or sampled from the criterion behaviors that the test is designed to predict (Chan & Schmitt, 2002). Therefore, SJT items are typically embedded in a particular context or situation that is representative of future job tasks. As noted before, this contextualized nature of SJT items makes them prone to various contextual influences. In this study, we focus on cultural differences. It is generally known that the culture wherein one lives acts like a lens, guiding the interpretation of events and defining appropriate behaviors (Heine & Buchtel, 2009; Lytle, Brett, Barsness, Tinsley, & Janssens, 1995). Accordingly, it might create boundary conditions for the use of an SJT across countries in at least four different ways (see Lievens, 2006). Specifically, we posit that the item stems, response alternatives, scoring keys, and item–construct linkages might affect the cross-cultural use of SJTs.

2.2.1. Item stems
First, the contextualization in SJTs is shown in the kind of problem situations (i.e., the item stems) that are presented to candidates. These problem situations are generated from a job analysis and from critical incidents provided by high and low performers on a specific criterion (job). When SJTs are transported to another country, the issue then becomes whether the SJT situations still make sense. Some situations might no longer be relevant in the country to which the SJT is transported.
Think, for example, about the differences in organizing meetings across countries. If one does not take account of these differences, it might well be that applicants are presented with an SJT item stem that is simply not considered realistic in their country. To our knowledge, no empirical studies have tested whether the situations of an SJT that was originally developed in one country still make sense in another country. However, the cultural differences between the United States and Spain outlined above make it important to examine this. Therefore, we formulate the following first research question:

Research Question 1: How many of the situations included in an integrity SJT that was developed in the United States are considered realistic in a Spanish context?

2.2.2. Item options
The SJT response alternatives are a second factor that might impede the transportability of SJTs across countries. In SJTs, the response options typically reflect possible ways of dealing with a given situation, as provided by high and low performers on a specific criterion (job). In an international context, one might question whether all response alternatives given to applicants are transportable from one country to another. In addition, a good distractor (e.g., 'you have witnessed that a co-worker steals money and goods from the store in which both of you work. To stop the thefts you decide to talk to him, instead of revealing this to your supervisor') in the country in which the SJT was developed might not be endorsed by many applicants in another country (e.g., a country high in power distance). So far, one unpublished study has examined whether the preference for response alternatives differs across countries.
Nishii, Ployhart, Sacco, Wiechmann, and Rogg (2001) conducted a study among incumbents of a multinational food chain in different countries (Canada, Germany, Korea, Mexico, Spain, the United Kingdom, and Thailand) and investigated whether the endorsement of response options to five SJT items was affected by culture. Results revealed that people of different cultural backgrounds were differentially attracted to specific response alternatives, and that these differences were consistent with theoretical expectations. Specifically, people from individualistic cultures chose response options that were task-oriented and that involved communicating directly with others, whereas, for the same item, people from collectivistic cultures tended to choose response options with a focus on group harmony and protecting others' face. In light of the cultural differences between the United States and Spain (see above), it is important to examine in this study whether in the Spanish context the item options of an SJT are still regularly endorsed. Along these lines, it should be noted that a low endorsement frequency of item options cannot be unambiguously interpreted as
indicative of a potential SJT cross-cultural transportability problem, because even in the original country some SJT item options might not be endorsed often (i.e., these options might simply be poor distractors). One can reasonably expect, though, that the item options that were considered the best in the original country should also be the ones that are most frequently endorsed in the country to which the SJT is being transported (Liu, Harris, & Schmidt, 2006). Therefore, we limit our second research question to an inspection of the endorsement percentages of the best (i.e., 'keyed') item option:

Research Question 2: To what extent are the keyed item options of an integrity SJT that was developed in the United States also most frequently endorsed in a Spanish context?
2.2.3. Scoring key
Apart from item stems and response alternatives, the SJT scoring key is a third factor potentially impeding the cross-cultural transportability of SJTs, because cultural differences might affect the effectiveness of response options. For instance, in one country (e.g., a culture high in collectivism) answers that promote group harmony might be considered more effective, whereas the reverse might be true in countries belonging to a different cultural region (a culture low in collectivism). Along these lines, Nishii et al. (2001) posited: 'if a scoring key for a SJT is developed in one country and is based on certain cultural assumptions of appropriate or desirable behavior, then people from countries with different cultural assumptions may score lower on these tests. Yet these lower scores would not be indicative of what is considered appropriate or desirable response behavior in those countries.' Thus, given the aforementioned cultural differences between the United States and Spain, it is important to examine the transportability of the original US scoring key to a Spanish context. This might be done by comparing the US empirical scoring key to a newly developed empirical scoring key that is tailored to the host culture. In this study, we therefore compared the original US empirical scoring scheme of the integrity SJT (Becker, 2005) to two other empirical scoring schemes (i.e., mode consensus scoring and proportion consensus scoring; Barchard, Hensley, & Anderson, 2013; Legree, Psotka, Tremble, & Bourne, 2005; Mayer, Caruso, & Salovey, 2000; Mayer, Salovey, & Caruso, 2002; Whetzel & McDaniel, 2009) that were developed specifically in the Spanish context. In mode consensus scoring, the answer that is most frequently endorsed by respondents is considered the only correct item option. In proportion consensus scoring, the score associated with each item option reflects the percentage of respondents who endorsed that response.
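Both consensus keys can be derived mechanically from a response matrix. The following Python sketch is illustrative only (the article does not publish its computation code, and the function names are ours):

```python
import numpy as np

def consensus_keys(responses, n_options=4):
    """Build mode-consensus (MS) and proportion-consensus (PS) keys from a
    response matrix (rows = respondents, columns = items, values = chosen
    option coded 0..n_options-1)."""
    n_items = responses.shape[1]
    prop_key = np.zeros((n_items, n_options))
    for j in range(n_items):
        counts = np.bincount(responses[:, j], minlength=n_options)
        # PS: score of each option = proportion of respondents endorsing it
        prop_key[j] = counts / counts.sum()
    # MS: 1 for the most-endorsed option, 0 elsewhere
    # (a tie would key several options and calls for a tie-breaking rule)
    mode_key = (prop_key == prop_key.max(axis=1, keepdims=True)).astype(float)
    return mode_key, prop_key

def total_score(responses, key):
    """Sum, over items, the key value of each respondent's chosen option."""
    items = np.arange(responses.shape[1])
    return key[items, responses].sum(axis=1)
```

Under the PS key each item's option scores sum to 1, so a respondent's total is simply the summed popularity of his or her choices.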
In sum, this leads to the following research question: Research Question 3: What is the correlation between SJT scores computed on the basis of a US scoring key and SJT
scores computed on the basis of the two Spanish scoring keys?
2.2.4. Item–construct linkages
A fourth item characteristic that might be prone to cultural differences is the link between response options and the constructs they are assumed to indicate. We expect that the item–construct relationship in SJTs is more susceptible to deficiency and contamination because of possible cross-cultural differences in the meaning or interpretation of situations, or of responses to the same situation. For example, given the same written situation (e.g., a situation depicting a meeting between an older supervisor and a group of employees), the same behavior (e.g., clearly and openly defending one's views about work standards in front of the supervisor with all employees present) might be linked to a specific construct (e.g., assertiveness) in one country (a culture low in power distance), whereas it might be an indicator of another construct (e.g., rudeness, impoliteness) in another country (a culture high in power distance). If item–construct linkages (and scoring keys) differ across countries, then the validity of the scores might be affected. Again, we found only one unpublished study that speaks to this issue. Such and Schmidt (2004) validated an SJT in three countries. Results in a cross-validation sample showed that the SJT was valid in two of the three countries, namely the United Kingdom and Australia (which share the same heritage). Conversely, it was not predictive in Mexico. Against the backdrop of the above, this study's final research question dealt with examining whether different scoring schemes (US vs. Spanish) might affect the criterion-related validity of the SJT integrity scores.
Although the link between overt and personality-based integrity scores and counterproductive behavior is well established (see meta-analyses of Ones, Viswesvaran, & Schmidt, 1993; Van Iddekinge et al., 2012), it is not known whether this criterion-related validity evidence would also be observed when an SJT is used for measuring integrity and whether scoring keys developed in different countries would affect criterion-related validity. So, we examined the relationship between SJT integrity scores (computed on the basis of the different scoring schemes) and ratings on a Spanish self-report counterproductive behavior measure (i.e., first factor of the POLINTE measure, Abad, Olea, Ponsoda, & Garrido, 2007). Specifically, our research question dealt with possible differences in criterion-related validity depending on the scoring scheme used, namely: Research Question 4: Are the correlations between SJT scores computed on the basis of either US or Spanish scoring schemes and self-report ratings on counterproductive behavior significantly different from each other?
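Research Question 4 requires testing whether two correlations that share the same criterion variable (here, the counterproductive-behavior ratings) differ significantly. The article does not state which test it used; one common choice for dependent correlations of this kind is Williams's t (recommended by Steiger, 1980), sketched here under that assumption:

```python
import math

def williams_t(r_jk, r_jh, r_kh, n):
    """Williams's t for H0: r_jk = r_jh, where r_jk and r_jh are two
    correlations sharing variable j (the criterion) and r_kh is the
    correlation between the two predictors (e.g., SJT scores under two
    different keys). The statistic has n - 3 df under H0."""
    det = 1 - r_jk**2 - r_jh**2 - r_kh**2 + 2 * r_jk * r_jh * r_kh
    r_bar = (r_jk + r_jh) / 2
    num = (r_jk - r_jh) * math.sqrt((n - 1) * (1 + r_kh))
    den = math.sqrt(2 * det * (n - 1) / (n - 3) + r_bar**2 * (1 - r_kh)**3)
    return num / den
```

Note that when the two keys are highly correlated, as one would expect for alternative keyings of the same SJT, even a small validity difference yields a comparatively large t because the shared variance shrinks the standard error of the difference.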
3. Study 1 Study 1 was conducted to examine our first research question regarding how many of the situations included in the original US-based integrity SJT made sense in a Spanish context.
3.1. Participants and procedure
The sample of Study 1 was composed of 92 employed Spanish participants. Although this was a nonprobabilistic snowball-type sample, we took various precautions to obtain a heterogeneous and as-representative-as-possible sample. We emailed a sample of Spanish professionals drawn from different industrial sectors. In line with the National Classification of Occupations of the Spanish Institute of Statistics, five industries were considered: (1) management, office, and administrative support occupations (25%); (2) education, training, and library occupations (27%); (3) food serving, personal care, and service occupations (15%); (4) life, physical, and social science occupations (15%); and (5) computer, financial, legal, and art occupations (17%). These professionals were asked to complete the questionnaire and to distribute it further to other potentially interested respondents. We then applied several criteria for including professionals in our final sample. Our inclusion criteria were that respondents should (1) have a job; (2) be proficient in the Spanish language; and (3) be a native of Spain. Although these inclusion criteria might have slightly decreased the representativeness of the final sample, we believe they were important to guarantee that respondents had a strong understanding of Spanish culture and language, which was a prerequisite given the purpose of Study 1. All of this resulted in 92 participants (42 males, 50 females; age between 22 and 67 years, with a mean of 34.2 years, SD = 11.3).
3.2. Measures
We translated the integrity SJT originally developed by Becker (2005) into Spanish, thereby taking into account the principles and guidelines for translating and using tests across cultures (Van de Vijver, 2003). Originally, this SJT was composed of 20 job-related scenarios, each one involving a potential workplace dilemma (e.g., about rational decision making, benevolence, honesty, independent thinking). We present an example item in Appendix A. All original SJT integrity items and the empirical US scoring key are publicly available in the appendix of Becker (2005). In the scoring key developed by Becker, item 3 had zeros for all response options; we therefore removed this item from the SJT. Becker found adequate construct-related validity evidence for the SJT scores, as they correlated with ratings on various integrity-relevant outcomes such as career potential (r = 0.26), leadership (r = 0.18), in-role performance (r = 0.24), and overall job performance (r = 0.22). To examine our first research question, participants were presented only with the item stems. So, the item options were stripped from the SJT. Participants were asked to rate the realism of each item stem on a 6-point scale: 1 = absolutely not realistic, 2 = very unrealistic, 3 = somewhat realistic, 4 = quite realistic, 5 = very realistic, and 6 = absolutely realistic.
3.3. Results
As we were dealing with ordinal variables, we computed the median realism rating for each situation and then the mean of these medians. Across the 19 situations, the average realism rating was 4.00 (SD = 1.14). Items 10, 15, and 16 received the lowest ratings (medians of 2.5, 1, and 2, respectively). As shown in Table 1, these three items dealt with situations in which a generally respected employee (e.g., a surgeon) behaves in a way that runs counter to one's expectations (e.g., he/she is about to operate despite having been drinking alcohol). The mean after excluding these items was 4.41 (SD = 0.61)¹. In short, we found that most of the scenarios (16 out of 19; 84.2%) developed in the US context were also considered realistic in Spain.
Table 1. Item stems considered unrealistic in a Spanish context Scenario 10. You’re a new clerk in a clothing store and are being trained by Angie, a veteran employee. She quietly tells you that because employees are paid minimum wage, most people sometimes take home clothes for themselves. Employees who don’t are considered dumb and arrogant. At closing time, Angie hands you a scarf to take home. Scenario 15. You’re seated at a table on which there is a hat. A job interviewer puts a dollar under the hat and says, ‘I’m going to leave. You decide whether to take the dollar and if you want to tell me you took or didn’t take it. When I return, I’ll ask you if you took it and then guess whether you’re lying. If you tell the truth, you get fifty cents. If you lie and I guess you told the truth, you get the dollar. If you lie and I guess it, you get nothing.’ He leaves. Scenario 16. You are a nurse. Dr. Jones arrives to perform a heart operation, and you smell beer on his breath. You tell your supervisor, but she says that you should keep quiet.
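The realism screening of Study 1 reduces to per-item medians plus a flagging step. A minimal sketch, assuming ratings are collected in a respondents-by-items matrix (the cutoff is hypothetical; the article reports the three lowest-rated items rather than applying a fixed rule):

```python
import numpy as np

def realism_medians(ratings, threshold=3):
    """ratings: respondents x items matrix of 1-6 realism ratings.
    Returns per-item medians and the indices of items whose median
    falls below `threshold`."""
    medians = np.median(ratings, axis=0)
    return medians, np.where(medians < threshold)[0]
```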
4. Study 2
We conducted Study 2 to examine our other three research questions. These dealt, respectively, with the item option endorsement, the scoring key, and the criterion-related validity of the integrity SJT in the Spanish context.
4.1. Participants and procedure
The sample of Study 2 was composed of 179 trainees (166 men, 13 women; mean age = 26.44 years, SD = 2.14, range 22–31 years) who had gone through a selection process for a job as a Spanish local police officer in the Madrid metropolitan area. This selection process had four main parts: (1) cognitive ability and personality tests; (2) physical ability tests; (3) health checks; and (4) a multiple-choice knowledge test (police laws). Candidates passing these four exams were granted access to the 6-month Police Academy training program. Given that the personality test used in the selection process measured only the Big Five traits, the authorities were interested in adding an integrity test to improve the selection process. As is often the case, the plan was to first use the new test in an experimental stage among already selected people (in this case, trainees). Therefore, to obtain the empirical data of Study 2, we were given permission to administer the test forms (both the integrity SJT and the POLINTE, see below) during participants' first day at the Police Academy. Their participation was voluntary and they provided anonymous responses. On-site respondent attitudes (as observed by the test administration proctors) made us confident in the quality of the data gathered.
4.2. Measures

4.2.1. Integrity SJT
The same SJT as in Study 1 was used. However, in Study 2, the SJT was administered as is typically done. That is, participants were presented with the SJT items (item stems and item options) and were asked what they would do in each situation. Completing the SJT took about 20–30 min. As noted above, we computed SJT scores on the basis of three scoring schemes. The first one was the original empirical scoring (OS) scheme as proposed by Becker (2005). This original scoring key was a criterion-referenced empirical scoring scheme because it was derived from the correlations between responses of 307 upper-level business students to a given SJT item and their self- and other-ratings of their integrity. That is, Becker first dummy-coded participants' responses to every option (A–D) as 0 or 1 (option not chosen or chosen). Next, he correlated responses to each option with ratings of integrity (e.g., 'This person is extremely honest') provided by the students and others who knew them well (family members, friends, and coworkers). In addition to this original empirical US-based scoring scheme, two empirical scoring schemes were tailored to the Spanish context. As noted above, we developed a proportional consensus scoring (PS) scheme on the basis of this study's data². In this scoring scheme, each option was scored according to the proportion of respondents selecting it. Thus, option correctness was understood as a continuous variable in the range between 0 and 1. The other scoring scheme involved mode consensus scoring (MS), in which the most frequently chosen option was scored as correct (1), whereas the remaining alternatives were treated as incorrect (0).

4.2.2. POLINTE measure
To gather criterion-related validity evidence, we used the Spanish 44-item POLINTE measure (Abad et al., 2007). The POLINTE includes 12 subscales (see the frameworks of Sackett & DeVore, 2001; Wanek, Sackett, & Ones, 2003) grouped in two factors: counterproductive behavior (theft thoughts; drugs/alcohol/tobacco use; social conformity and rule abidance; driving violations; perception of dishonesty norms; and association with criminals) and socialization (level of social maturity and integrity related to personality characteristics such as locus of control, home life/upbringing, emotional stability, extraversion/introversion, safety/accident proneness, and diligence/orderliness). An example item is: 'I acknowledge having had some problems with the law in the past.' The response format was a 5-point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree). Abad et al. (2007) found factor-analytic evidence that the 12 facets could indeed be grouped in two factors: counterproductive behavior and socialization. In this study, this two-factor structure was supported by a confirmatory factor analysis: χ²(53) = 59.611, p = .25; CFI = 0.982, TLI = 0.978, SRMR = .05, RMSEA = .02.
In addition, in this study, internal consistency reliability coefficients for the ratings were 0.79 (counterproductive behavior) and 0.79 (socialization). As noted above, this study's research question dealt only with the first factor of the POLINTE measure (counterproductive behavior). Therefore, we report only the results for the six subscales related to this first factor. Note that, as expected, correlations between the SJT scores and ratings on the subscales related to the second factor (socialization) were not significant.
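Becker's dummy-coding procedure described above (code each option choice as 0/1 and correlate it with integrity ratings) can be sketched as follows. The sketch is illustrative, not a reconstruction of his exact analysis, which pooled self- and other-ratings from multiple raters; the published key in Becker (2005) remains the authoritative source:

```python
import numpy as np

def empirical_option_key(responses, integrity_ratings, n_options=4):
    """Criterion-referenced keying in Becker's (2005) style: dummy-code
    each option choice (1 = chosen, 0 = not chosen) and correlate it
    with respondents' integrity ratings."""
    n_resp, n_items = responses.shape
    key = np.zeros((n_items, n_options))
    for j in range(n_items):
        for o in range(n_options):
            chosen = (responses[:, j] == o).astype(float)
            if chosen.std() == 0:  # option chosen by all or by nobody:
                continue           # correlation undefined, leave key at 0
            key[j, o] = np.corrcoef(chosen, integrity_ratings)[0, 1]
    return key
```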
4.3. Results
Our second research question dealt with the extent to which the keyed item options of the original integrity SJT were also most frequently endorsed in a Spanish context. To this end, an analysis of item option endorsement sheds light on the plausibility of endorsement of the correct (keyed) item option and the incorrect item options. As
Table 2. Endorsement percentages of item options (A–D) in the Spanish context, for items 1, 2, and 4–20