Scandinavian Journal of Psychology, 1999, 40, 229–233

Norwegian short-form of the Marlowe-Crowne Social Desirability Scale

FLOYD W. RUDMIN
Department of Psychology, University of Tromsø, Norway

Rudmin, F. W. (1999). Norwegian short-form of the Marlowe-Crowne Social Desirability Scale. Scandinavian Journal of Psychology, 40, 229–233.

A tutorial example demonstrates the effects of social desirability bias on fictional multiculturalism and mental health data and how bias can be moderated by partial correlations using social desirability measures of different degrees of validity. The 33-item Marlowe-Crowne Social Desirability Scale was translated from English to Norwegian and presented to 117 university students and 124 non-students. Using psychometric criteria, and a “seed-crystal” method of scale accretion, a 10-item Norwegian short-form of the Marlowe-Crowne Scale was produced.

Key words: Cross-cultural, Marlowe-Crowne, Norway, response bias, social desirability, translation.

Floyd W. Rudmin, Dept. of Psychology, University of Tromsø, Tromsø N-9037, Norway. Tel: +47 77 64 59 53, Fax: +47 77 64 52 91, E-mail:
[email protected]
INTRODUCTION

Self-report questionnaires are widely used in social, clinical, and personnel psychology. However, self-reports are not accurate reports (Anastasi, 1976). In addition to random error, various kinds of systematic error, or bias, have been identified (Frederiksen, 1965; Paulhus, 1991; Spector, 1992). Social desirability bias is “the tendency for individuals to portray themselves in a generally favorable fashion” (Holden, 1994, p. 429). This tendency varies across individuals and contexts, and may entail a trait of high self-regard and/or deliberate impression management. Social desirability was described by Edwards in 1957 and has long been discussed within psychometrics (Block, 1965; Tedeschi, 1981).

Measures of social desirability are useful for identifying susceptible items during the development of psychometric scales (Streiner & Norman, 1995), and for covariance computations to reduce bias effects during statistical analyses (Hough et al., 1990). A meta-analysis by Ones et al. (1996) has shown that carefully developed personality scales still retain residual susceptibility to social desirability, but that in some applied contexts covariance control is unnecessary.

With less strictly developed scales and in contexts requiring covariance control of social desirability, the impact of bias can be estimated from the size of bias variance relative to true variance. If bias is considered to be the common variance for two otherwise randomly related measures (Magnusson, 1967), then the false correlation caused by bias would be r = bias variance / (true variance + bias variance). If the true measures and bias had the same unit variance, then r = 1 / (1 + 1) = 0.50. If bias variance were 25% of true variance, then r = 0.25 / (1 + 0.25) = 0.20, and so forth. Clearly, a little bias goes a long way.
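The arithmetic of the false correlation can be sketched in a few lines of Python. This is only an illustration of the formula just stated; the function name `spurious_correlation` and its arguments are hypothetical, and the sketch assumes that both measures have the same true variance and carry the same additive bias.

```python
def spurious_correlation(bias_variance: float, true_variance: float = 1.0) -> float:
    """False correlation between two otherwise unrelated measures
    that share the same additive bias (population values, no sampling error)."""
    if bias_variance < 0 or true_variance <= 0:
        raise ValueError("variances must be non-negative (true variance positive)")
    return bias_variance / (true_variance + bias_variance)


if __name__ == "__main__":
    # Bias variance expressed as a fraction of the true variance.
    for ratio in (1.0, 0.25, 0.10):
        print(f"bias/true = {ratio:.2f} -> r = {spurious_correlation(ratio):.2f}")
```

With the two cases from the text, the function returns 0.50 for equal variances and 0.20 when bias variance is 25% of true variance.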
In practice, bias variance is unknown, and sampling error complicates these simple estimations. Table 1 presents fabricated data for 20 persons (P1 to P20) on two variables of interest and on social desirability (SD) bias. The first three columns list the “true” measures independent of the self-report process. In practice, these values are unknown and unknowable. The numbers in the first column are random normal (generated mean = 7, variance = 16). The numbers in the second column are a randomization of the numbers in the first column. Thus, the theoretical correlation of Var. 1 and Var. 2 is r = 0.00. The correlation computed on this sample of 20 is r = 0.09. The social desirability data in column three are random normal (generated mean = 7, variance = 16).

On the right in Table 1 are the known, but biased, self-report data corresponding to the true measures on the left. The self-report data for Var. 1 and Var. 2, shown in columns four and five, were created by adding the true social desirability bias to the true values for Var. 1 and Var. 2. Data for three different self-report measures of social desirability appear on the far right. SD1 was created by adding the constant 4 to the true SD Bias in column three. SD2 was created by adding positive and negative random error to SD1. SD3 was created by adding still further error. The validity correlations for the three measures, shown below the three columns, are the correlations of each SD measure with the true measure of SD bias in column three. Thus, SD1 is a perfect measure of social desirability bias (r = 1.00). SD2 is a rather good measure (r = 0.89). SD3 is still worse (r = 0.56). In practice, the validity of social desirability measures cannot be estimated with such precision.

Var. 1 and Var. 2 are variables susceptible to social desirability bias, for example, “multiculturalism” and “mental health”. Most people would not want to appear to
© 1999 The Scandinavian Psychological Associations. Published by Blackwell Publishers, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden, MA 02148, USA. ISSN 0036-5564.
others or to themselves to have low scores on these measures, suggesting racism and instability. So, intentionally or not, people may report themselves higher on these measures. The first person, P1, is truly a racist, with moderate stability, and a moderate tendency to bias self-report answers. P2 is also a racist, with good mental health, but no tendency to bias answers. P4 has a strong tendency for social desirability bias, causing both self-report scores to be high.

Social desirability bias causes the variances of Var. 1 and Var. 2 to be increased above their true values. If only the self-report measures were available, the correlation for Var. 1 and Var. 2 would be r = 0.47 (n = 20, p < 0.05), and the conclusion would be that multiculturalism and mental health are positively correlated. If a perfect measure of social desirability bias were available, SD1, the partial correlation correction of the self-report scores would yield r = 0.12, not significant. Partial correlation corrections with less than perfect measures, SD2 and SD3, also succeed in avoiding mistaken conclusions. In practice, social desirability bias has complex interactions with specific questions and with the unique demand characteristics of the study. But these data do illustrate how a systematic bias by some people can cause the false appearance of a significant correlation. These data also illustrate how measures of social desirability, even a relatively poor measure, can help avoid mistaken conclusions.

Table 1. Fictional data illustrating social desirability bias

         True but unknown data    Known but biased self-report data
                                                                Three SD measures
Person    Var. 1  Var. 2  SD Bias  Var. 1 + Bias  Var. 2 + Bias   SD1   SD2   SD3
P1             0       8        7              7             15    11    11    11
P2             0      11        0              0             11     4     5     5
P3             2       0        2              4              2     6     0    11
P4             2       7       12             14             19    16    16    16
P5             4       0        7             11              7    11    11     0
P6             6      14        5             11             19     9     9     9
P7             6      12        4             10             16     8     8     8
P8             7       7        8             15             15    12    12    12
P9             7       6        9             16             15    13    17    17
P10            7       8       13             20             21    17    14     8
P11            8      10        6             14             16    10     8    14
P12            8       2       10             18             12    14    16    16
P13            8       4        4             12              8     8     6     6
P14            9      12        9             18             21    13    13    13
P15           10       7       10             20             17    14    13    13
P16           11       2       10             21             12    14    17    17
P17           11       6        9             20             15    13    12    12
P18           12      11        5             17             16     9     9     9
P19           12       8        3             15             11     7     5     5
P20           14       9        4             18             13     8     8     8
Mean         7.2     7.2      6.9           14.1           14.1  10.9  10.5  10.5
Variance    16.1    16.1     11.9           31.7           22.6  11.9  20.5  20.5

Validity of SD scales: SD1 r = 1.00*; SD2 r = 0.89*; SD3 r = 0.56*
Correlations of Var. 1 with Var. 2:
  Unbiased: r = 0.09
  Biased: r = 0.47*
  With partial-correlation control using SD1: r = 0.12; SD2: r = 0.17; SD3: r = 0.38
*Statistically significant, p < 0.05.

It is essential that every nation have scales to measure social desirability and that these have psychometric properties suitable for covariance corrections of a diversity of self-report measures in a wide variety of studies. If social desirability measures are to be useful and used, they should be independent of common demographic variables, robust for various demand characteristics, and unobtrusive. Psychologists now realize that excessively long psychometric tasks are inefficient and abusive of people’s time and goodwill.

Of the established social desirability scales, the Marlowe-Crowne Scale is one of the oldest and most widely used (Crowne & Marlowe, 1960). It was developed to be a measure of bias towards affirming social norms and to be independent of psychopathology. Holden and Fekken (1989) found that this scale is independent of gender and has items focused on interpersonal sensitivity and considerateness. Paulhus (1991), in a factor study of ten social desirability scales, concluded that the Marlowe-Crowne Scale is a measure of impression management and, to a lesser degree, self-deception. Two faults with the Marlowe-Crowne Scale are (1) excessive length and (2) unbalanced positive and negative keying. Positively and negatively keyed items should be equal in number if acquiescence bias, or the tendency to answer “yes”, is to be cancelled out. The purpose of the present study was to produce a Norwegian short-form of the Marlowe-Crowne Scale.
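The Table 1 demonstration can be re-computed with a short Python sketch. The data are the fictional values printed in Table 1; the helper names `pearson` and `partial_corr` are hypothetical, and the first-order partial correlation formula used here is the standard textbook one, which is assumed (not confirmed by the article) to be the computation behind the tabled values.

```python
from math import sqrt

# "True" values from Table 1, persons P1 to P20.
var1 = [0, 0, 2, 2, 4, 6, 6, 7, 7, 7, 8, 8, 8, 9, 10, 11, 11, 12, 12, 14]
var2 = [8, 11, 0, 7, 0, 14, 12, 7, 6, 8, 10, 2, 4, 12, 7, 2, 6, 11, 8, 9]
bias = [7, 0, 2, 12, 7, 5, 4, 8, 9, 13, 6, 10, 4, 9, 10, 10, 9, 5, 3, 4]

# Self-report scores are the true value plus the person's bias.
report1 = [v + b for v, b in zip(var1, bias)]
report2 = [v + b for v, b in zip(var2, bias)]

# Three social desirability measures of decreasing validity.
sd1 = [b + 4 for b in bias]  # true bias plus a constant
sd2 = [11, 5, 0, 16, 11, 9, 8, 12, 17, 14, 8, 16, 6, 13, 13, 17, 12, 9, 5, 8]
sd3 = [11, 5, 11, 16, 0, 9, 8, 12, 17, 8, 14, 16, 6, 13, 13, 17, 12, 9, 5, 8]


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


if __name__ == "__main__":
    print("unbiased r:", round(pearson(var1, var2), 2))
    print("biased r:  ", round(pearson(report1, report2), 2))
    for name, sd in (("SD1", sd1), ("SD2", sd2), ("SD3", sd3)):
        print(name, "validity:", round(pearson(sd, bias), 2),
              "partial r:", round(partial_corr(report1, report2, sd), 2))
```

Run on these data, the unbiased and biased correlations come out at 0.09 and 0.47, and the partial correlation controlling for SD1 at 0.12, in agreement with Table 1.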
METHOD

Instrument

The 33 Marlowe-Crowne items were separated into 4 blocks of 8 or 9 items. For each block, forward-and-back translation from English to Norwegian to English was carried out by two independent teams of bilingual university students. The two translations were then compared, and a best translation was negotiated.

A two-page questionnaire for the 33 Norwegian items was prepared, with two manipulations, producing four variations of the response form: (1) a confidence manipulation asked people to be sure of their answers on half of the forms, and to be impulsive on the other half; (2) a self-focus manipulation provided space on half of the forms for people to write their names and no such space on the other half. As with the original English version, the response options for the 33 items were true or false. Gender and age were requested at the end of the questionnaire.

The cover letter invited participants to help with the development of a questionnaire to be used in social science research. It also explained that participation was voluntary and that the recording and reporting of the data would be anonymous. A summary of the results was offered to those requesting it.
Procedures

Data collection was done by 16 students (5 men, 11 women) taking a research methods course. Participants were approached individually and completed the questionnaire at the time of recruitment. When finished, participants were asked to identify any items that seemed vague or poorly worded.

Participants

Participants were of two types: (1) university students, and (2) non-student adults. A balance of men and women was sought for each type. Students were recruited in university cafeterias and dormitories; non-students in airport and ferry waiting rooms, at hospital worksites, and in an art museum cafe. The student refusal rate was approximately 6%, usually for reasons of being too busy. The non-student refusal rate was approximately 30%, for reasons of privacy, tiredness, disdain for questionnaires, as well as being busy. Of the 117 responding students, 61 were female, 56 male. Mean age was 26 (SD = 5 years). Of the 124 responding non-students, 59 were female, 65 male. Mean age was 39 (SD = 14 years).
Statistics plan

Responses to the items were dummy coded as true = 1 and false = 0 for the positively keyed items and the reverse of this for the negatively keyed items. Thus, summated scale scores indicate a tendency for social desirability bias. For each item, descriptive information was tabulated by which to identify weak items for elimination from the short-form. Cronbach alpha coefficients and inter-item correlations were used to evaluate reliability.
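The scoring rule and the reliability coefficient named in the statistics plan can be sketched as follows. This is a minimal illustration, not the software used in the study: the function names and the tiny data set are hypothetical, and Cronbach's alpha is computed with the usual formula, k / (k − 1) × (1 − sum of item variances / variance of total scores).

```python
from statistics import variance


def score_responses(responses, keys):
    """Dummy-code true/false answers so that 1 always indicates the
    socially desirable answer.

    responses: one list of booleans per person (True = answered "true").
    keys: +1 for positively keyed items, -1 for negatively keyed items.
    """
    return [
        [int(ans) if key > 0 else 1 - int(ans) for ans, key in zip(row, keys)]
        for row in responses
    ]


def cronbach_alpha(scored):
    """Cronbach's alpha for a person-by-item matrix of scored responses."""
    k = len(scored[0])
    item_variances = [variance(column) for column in zip(*scored)]
    total_variance = variance([sum(row) for row in scored])
    if k < 2 or total_variance == 0:
        return 0.0  # alpha is undefined here; 0.0 is a placeholder choice
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical answers of four people to one positively keyed
    # and one negatively keyed item.
    answers = [[True, False], [False, True], [True, False], [False, True]]
    scored = score_responses(answers, keys=[+1, -1])
    print(scored)                  # scored item matrix
    print(cronbach_alpha(scored))  # reliability of the two-item scale
```

Summing each scored row gives the person's scale score, so that higher sums indicate a stronger tendency toward socially desirable answers.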
RESULTS

Even though people were recruited while engaged in other activities, over 90% of them answered all 33 items. One person omitted 19 items, 2 people 2 items, and 15 people 1 item. The name manipulation was not fully effective: of the 125 people with forms with space for their names, only 57 wrote their name. Comments indicated that omission of name was due to concern about privacy and to inadvertence.

Eight criteria identified items for elimination: (a) items frequently identified as vague or poorly worded (worst item was identified 19 times); (b) items frequently not answered (worst was unanswered 4 times); (c) items with low variance (worst had SD = 0.19); (d) items correlated with gender (worst had r = −0.19); (e) items correlated with age (worst had r = 0.39); (f) items negatively correlated with signing one’s name (worst had r = −0.12); (g) items with low item-total correlations (worst had r = 0.07); and (h) items with different item-total correlations for students and non-students (worst had r = 0.13 and r = 0.46 respectively). By these criteria, 16 items were excluded.

The remaining 11 positively keyed items and 6 negatively keyed items were then examined for inclusion in a short-form. This was an iterative process, which began with a core, four-item scale, like a “seed crystal”, for the accretion of additional items according to psychometric criteria. The “seed” consisted of the two positively keyed items and the two negatively keyed items with the highest item-total correlations in the item-analysis of the full 33-item scale. The “seed” was thus defined to be highly representative of the original scale. In Table 2, items 1, 2, 4 and 8 comprised the “seed”, with item-total correlations of r = 0.42, r = 0.43, r = 0.36 and r = 0.41, respectively. Successive pairs of positively and negatively keyed items were then selected based on their psychometric characteristics and tested for compatibility with this “seed”. Compatibility criteria required that the enlarged scale: (1) have a higher Cronbach alpha coefficient, (2) show low correlations with gender, age, and instructions, and (3) approximate a normal distribution. By this iterative process, the 10-item short-form shown in Table 2 was derived.
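The “seed-crystal” accretion described above can be approximated by a simple greedy procedure. The sketch below is an assumption-laden illustration, not the author's procedure: it automates only compatibility criterion (1), the increase in Cronbach alpha, and omits criteria (2) and (3) and any judgment in choosing candidate pairs. All function names are hypothetical, and `simulate` produces made-up data purely so that the example runs.

```python
import random
from math import sqrt
from statistics import mean, variance


def alpha(data, items):
    """Cronbach alpha for the chosen item columns of 0/1 scored data."""
    rows = [[row[i] for i in items] for row in data]
    k = len(items)
    total_var = variance([sum(r) for r in rows])
    if k < 2 or total_var == 0:
        return 0.0
    item_var = sum(variance(col) for col in zip(*rows))
    return k / (k - 1) * (1 - item_var / total_var)


def item_total_r(data, item):
    """Correlation of one item with the summed score on all items."""
    x = [row[item] for row in data]
    t = [sum(row) for row in data]
    mx, mt = mean(x), mean(t)
    sxt = sum((a - mx) * (b - mt) for a, b in zip(x, t))
    sxx = sum((a - mx) ** 2 for a in x)
    stt = sum((b - mt) ** 2 for b in t)
    return sxt / sqrt(sxx * stt) if sxx > 0 and stt > 0 else 0.0


def grow_short_form(data, positive, negative, target_len=10):
    """Grow a balanced scale from a four-item seed, one pair at a time."""
    def ranked(pool):
        return sorted(pool, key=lambda i: item_total_r(data, i), reverse=True)

    pos, neg = ranked(positive), ranked(negative)
    scale = pos[:2] + neg[:2]  # seed: two best positive, two best negative
    pos, neg = pos[2:], neg[2:]
    while len(scale) + 2 <= target_len and pos and neg:
        # Assumption: try every remaining pair and keep the one giving
        # the highest alpha; stop when no pair improves on the current scale.
        pairs = [(p, n) for p in pos for n in neg]
        best = max(pairs, key=lambda pair: alpha(data, scale + list(pair)))
        if alpha(data, scale + list(best)) <= alpha(data, scale):
            break
        scale.extend(best)
        pos.remove(best[0])
        neg.remove(best[1])
    return scale


def simulate(n_people=80, n_items=12, seed=0):
    """Made-up scored responses driven by one latent tendency per person."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_people):
        tendency = rng.random()
        data.append([1 if rng.random() < 0.2 + 0.5 * tendency else 0
                     for _ in range(n_items)])
    return data


if __name__ == "__main__":
    data = simulate()
    short = grow_short_form(data, positive=list(range(6)),
                            negative=list(range(6, 12)))
    print(short, round(alpha(data, short), 2))
```

Because items enter only in positive-negative pairs, the resulting scale stays balanced in keying, which is the property the short-form was designed to have.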
Table 2. Short-form of the Marlowe-Crowne Social Desirability Scale

English
(1) No matter who I’m talking to, I’m always a good listener.
(2)* There have been a few occasions when I took advantage of someone.
(3)* I sometimes try to get even, rather than forgive and forget.
(4) When I don’t know something, I don’t at all mind admitting it.
(5)* There have been occasions when I felt like smashing things.
(6) I never resent being asked to return a favor.
(7) I have almost never felt the urge to tell someone off.
(8)* I am sometimes irritated by people who ask favors of me.
(9)* I sometimes think when people have a misfortune they only got what they deserved.
(10) I have never deliberately said something that hurt someone’s feelings.

Norwegian
(1) Jeg er en god lytter uansett hvem jeg snakker med.
(2)* Det har hendt at jeg har utnyttet folk.
(3)* Noen ganger vil jeg heller ta igjen enn å tilgi og glemme.
(4) Når det er noe jeg ikke vet, koster det meg ikke noe å innrømme det.
(5)* Det har vært stunder da jeg har hatt lyst til å smadre ting.
(6) Jeg har aldri noe imot å bli spurt om å gjengjelde en tjeneste.
(7) Jeg har nesten aldri hatt lyst til å skjelle noen ut.
(8)* Av og til blir jeg irritert på folk som ber meg om tjenester.
(9)* Av og til når folk mislykkes synes jeg de får som fortjent.
(10) Jeg har aldri sagt noe med den hensikt å såre.

Asterisks (*) indicate items that are negatively keyed, meaning that agreement indicates lack of social desirability tendencies.
DISCUSSION

Table 3 compares the psychometric properties of the long- and short-forms of this Norwegian translation of the Marlowe-Crowne Scale. The short-form has the advantages of having 70% fewer items, a mean nearer the mid-point of the response scale, a larger standard deviation, and fewer criticisms of language. Despite these changes, the two forms have a strong positive correlation (r = 0.83, n = 224, p < 0.001), indicating that the “seed” selection process did make the short-form a reliable equivalent of the long-form. The Cronbach alpha coefficient does decrease for the short-form, from α = 0.78 to α = 0.65; but the short-form has better inter-item correlations than the long-form. Alpha tends to decrease as the number of items decreases and is therefore not a useful statistic for comparing scales composed of different numbers of items. Alpha coefficients for the two groups of participants are more similar on the short-form than on the long-form. In Fig. 1, the histogram for the short-form shows that it approximates a normal distribution.

Although direct studies of validity remain to be done, Pearson correlations with dummy-coded criterion variables show the short-form to be insensitive to gender (0 = male, 1 = female) and age. If instructions (0 = guess, 1 = be sure) and experimenter gender (0 = male, 1 = female) are presumed to embody different demand characteristics, then both versions are robust for such sources of variation. A measure of social desirability should be positively correlated with activation of the respondent’s self-schema, which was operationalized in this study by writing one’s name at the top of the questionnaire (0 = no name, 1 = name). This name manipulation was not fully effective; nevertheless, the expected positive correlation was weakly evident. Possibly the individualized process of recruiting participants had activated the self-schema to the point that writing one’s name had little further effect. It is also plausible that people susceptible to this manipulation declined to participate or refused to write their name. Finally, the negative correlation of social desirability with higher education (r = −0.18) computed in a meta-analysis of nine studies (Ones et al., 1996) was replicated with dummy-coded categories (0 = non-student, 1 = student) for the long-form (r = −0.19, n = 224, p = 0.005) and the short-form (r = −0.13, n = 234, p = 0.05).

An unpublished Norwegian translation of Schuessler’s (1982) 10-item English short-form of the Marlowe-Crowne Scale was made by Wichstrøm (1995). Schuessler had selected 5 positively keyed and 5 negatively keyed items based on highest item-total correlations. His short-form and the one developed here both included items 1, 2, 3, 5, 6 and 10 in Table 2. Comparing reliability coefficients for these short-forms, Wichstrøm reported α = 0.54 for Norwegian youths; the present study found α = 0.64 for Norwegian university students, and Schuessler reported α = 0.72 for U.S. university students. Reliability coefficients will increase if a four-point Likert response option is used instead of true–false (Halpin et al., 1994). The mean social desirability score for U.S. students was 6.4 (SD = 2.45), which is almost 100% higher than the Norwegian students’ mean of 3.4 (SD = 1.98). Although Norway and the United States are kindred cultures, they are still different cultures, as this comparison demonstrates. The borrowing and translation of psychometric scales between cultures must proceed with caution.
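The dependence of alpha on the number of items, noted in the discussion of Table 3, can be illustrated with two standard psychometric formulas. The sketch below is an aside, not part of the reported analysis: it assumes the standardized form of alpha (computed from the mean inter-item correlation) and the Spearman-Brown prophecy formula, uses the rounded values printed in Table 3, and the function names are hypothetical.

```python
def standardized_alpha(n_items: int, mean_r: float) -> float:
    """Standardized alpha from the number of items and the mean
    inter-item correlation."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a scale is lengthened or shortened by
    length_factor, assuming items of average quality."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


if __name__ == "__main__":
    # Values taken from Table 3 (rounded as printed there).
    print(round(standardized_alpha(33, 0.10), 2))    # long-form
    print(round(standardized_alpha(10, 0.16), 2))    # short-form
    # Alpha expected if 10 items of merely average quality had been kept.
    print(round(spearman_brown(0.78, 10 / 33), 2))
```

Under these assumptions the formulas give about 0.79 for the long-form and 0.66 for the short-form, close to the reported 0.78 and 0.65, whereas cutting the long-form to 10 average items would be expected to yield only about 0.52. The comparison suggests that the retained items cohere better than the average item of the long-form, which is consistent with the higher mean inter-item correlation in Table 3.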
The following people assisted with this study: Svein Bergvik, Lars Bjørum, Wenche Bjørnstad, Trine Dalen Bergersen, Lene Danielsen, Astrid Eriksen Lorem, Britt Eriksen, Sturla Fossum, Turid Moen, Therese Nilsen, Audhild Sinnes, Anne Silviken, Marianne Strand, Kristina Østvik, Jon Even Aasum, and Rune Wilhehmsen. Svein Bergvik and Britt Eriksen suggested that experimenter gender be considered a possible correlate of social desirability bias. Monica Martinussen helped with statistical theory. An anonymous reviewer suggested that the mathematics of reliability be adopted to explain the impact of bias.
Table 3. Psychometric comparison of long- and short-forms

                                                        Long-form  Short-form
DESCRIPTIVE MEASURES:
  Number of Items Summed for the Scale:                        33          10
  Mean Response on the Scale (Range 0 to 1):                 0.33        0.37
  Response Standard Deviation:                               0.15        0.21
  Mean Reports of Vague or Poorly Worded Items:             0.005       0.002
RELIABILITY MEASURES:
  Cronbach Alpha Coefficient:                                0.78        0.65
  Cronbach Alpha Coefficient for Student Group:              0.81        0.64
  Cronbach Alpha Coefficient for Non-Student Group:          0.72        0.66
  Mean Inter-Item Correlation:                               0.10        0.16
  Lowest Inter-Item Correlation:                            −0.14       −0.03
  Proportion of Negative Inter-Item Correlations:            0.12        0.04
VALIDITY MEASURES:
  Correlation of Scale with Gender:                         −0.07       −0.01
  Correlation of Scale with Age:                             0.12        0.09
  Correlation of Scale with Instructions:                    0.03        0.00
  Correlation of Scale with Gender of Experimenter:         −0.02        0.03
  Correlation of Scale with Writing One’s Name:              0.05        0.11
Fig. 1. Frequency distribution of the Norwegian social desirability short-form.
REFERENCES

Anastasi, A. (1976). Psychological testing (4th ed.). New York: Macmillan.
Block, J. (1965). The challenge of response sets: Unconfounding meaning, acquiescence, and social desirability in the MMPI. New York: Appleton-Century-Crofts.
Crowne, D. P. & Marlowe, D. (1960). A new scale of social desirability independent of psychopathology. Journal of Consulting Psychology, 24, 349–354.
Edwards, A. L. (1957). The social desirability variable in personality assessment and research. New York: Dryden Press.
Frederiksen, N. (1965). Response set scores as predictors of performance. Personnel Psychology, 18, 225–244.
Halpin, G., Halpin, G. & Arbet, S. (1994). Effects of number and type of response choices on internal consistency reliability. Perceptual and Motor Skills, 79, 928–930.
Holden, R. R. (1994). Social desirability. In R. J. Corsini (ed.). Encyclopedia of psychology (2nd ed., vol. 3, pp. 429–430). New York: John Wiley.
Holden, R. R. & Fekken, G. C. (1989). Three common social desirability scales: Friends, acquaintances, or strangers? Journal of Research in Personality, 23, 180–191.
Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D. & McCloy, R. A. (1990). Criterion-related validities of personality constructs and the effect of response distortion on those validities. Journal of Applied Psychology, 75, 581–595.
Magnusson, D. (1967). Test theory. London: Addison-Wesley.
Ones, D. S., Viswesvaran, C. & Reiss, A. D. (1996). Role of social desirability in personality testing for personnel selection: The red herring. Journal of Applied Psychology, 81, 660–679.
Paulhus, D. L. (1991). Measurement and control of response bias. In J. P. Robinson, P. R. Shaver & L. S. Wrightsman (eds.). Measures of personality and social psychological attitudes. Vol 1: Measures of social psychological attitudes (pp. 17–59). Toronto: Academic Press.
Schuessler, K. F. (1982). Measuring social life feelings. San Francisco: Jossey-Bass.
Spector, P. E. (1992). Summated rating scale construction: An introduction. Newbury Park, CA: Sage.
Streiner, D. L. & Norman, G. R. (1995). Health measurement scales: A practical guide to their development and use (2nd ed.). Oxford: Oxford University Press.
Tedeschi, J. T. (Ed.) (1981). Impression management theory and social psychological research. New York: Academic Press.
Wichstrøm, L. (1995). Harter’s Self-Perception Profile for Adolescents: Reliability, validity, and evaluation of the question format. Journal of Personality Assessment, 65, 100–116.

Received 18 December 1997, accepted 23 April 1998