Research Article


An Experimental Validation Method for Questioning Techniques That Assess Sensitive Issues

Morten Moshagen, Benjamin E. Hilbig, Edgar Erdfelder, and Annie Moritz
University of Mannheim, Germany

Experimental Psychology 2014; Vol. 61(1), 48–54. DOI: 10.1027/1618-3169/a000226

Abstract. Studies addressing sensitive issues often yield distorted prevalence estimates due to socially desirable responding. Several techniques have been proposed to reduce this bias, including indirect questioning, psychophysiological lie detection, and bogus pipeline procedures. However, the increase in resources required by these techniques is warranted only if there is a substantial increase in validity as compared to direct questions. A convincing demonstration of superior validity necessitates the availability of a criterion reflecting the "true" prevalence of a sensitive attribute. Unfortunately, such criteria are notoriously difficult to obtain, which is why validation studies often proceed indirectly by simply comparing estimates obtained with different methods. Comparative validation studies, however, provide only weak evidence, since the exact increase in validity (if any) remains unknown. To remedy this problem, we propose a simple method that allows for measuring the "true" prevalence of a sensitive behavior experimentally. The basic idea is to elicit normatively problematic behavior in a way that ensures conclusive knowledge of the prevalence rate of this behavior. This prevalence measure can then serve as an external validation criterion in a second step. An empirical demonstration of this method is provided.

Keywords: social desirability, indirect questioning, randomized-response technique, validity

Self-reports are among the most pervasive data sources in the social and behavioral sciences (Baumeister, Vohs, & Funder, 2007). Indeed, requesting that individuals provide some information or judgment about themselves is arguably the most common form of empirically investigating research questions, both in basic research and in applied fields of psychology and beyond. A frequent aim is to estimate the proportion of respondents who hold a certain attitude, show certain dispositional tendencies, or behave in a certain way. To this end, it is most common to simply ask respondents directly about the issue under consideration and to use the observed proportion of a particular response as a prevalence estimate for the respective attribute. It is well known, however, that responses to direct questions need not reflect an individual's true status. Individuals may distort their answers in order to appear more favorable or to avoid (legal) sanctions (Birkeland, Manson, Kisamore, Brannick, & Smith, 2006; McFarland & Ryan, 2000). As a result, the tendency to present oneself in the best possible light will systematically bias responses toward respondents' perceptions of what is socially acceptable (Tourangeau & Yan, 2007). Consequently, self-report measures will consistently underestimate the prevalence of socially undesirable attitudes, tendencies, and behaviors (such as academic cheating, doping, drug use, or xenophobia; e.g., Ostapczuk, Musch, & Moshagen, 2009; Simon, Striegel, Aust, Dietz, & Ulrich, 2006) and overestimate the prevalence of socially desirable ones (such as hygiene practices, physical activity, moral courage, or voting; e.g., Adams et al., 2005; Moshagen, Musch, Ostapczuk, & Zhao, 2010).

Several methods have been proposed to overcome this social desirability bias, including the bogus pipeline procedure (Jones & Sigall, 1971), indirect questioning formats such as the randomized-response technique (RRT; Chaudhuri & Christofides, 2007; Warner, 1965), psychophysiological lie detection (Iacono, 2000), and the use of so-called lie scales (Paulhus, 1984). All of these approaches are associated with an increase in resources or, more generally, research effort. Indirect questioning techniques, for example, require extensive explanation of the procedure to respondents, need comparatively large sample sizes to compensate for the increase in sampling error, and make it difficult to compute measures of association as a result of the loss of individual-level information. Since such increased costs are justified only if there is also a substantial increase in validity, it is vital that any such technique can be shown to actually outperform direct questioning in terms of validity.

Types of Validation Studies

Three types of validation studies can be distinguished, providing weak, intermediate, or strong evidence, respectively, for the validity of the questioning method under scrutiny (cf. Lensvelt-Mulders, Hox, van der Heijden, & Maas, 2005). These three types differ in the strength of the criterion that is used to judge the validity of the procedure under investigation. Specifically, the to-be-estimated prevalence of the sensitive attribute is entirely unknown in weak, partially known in intermediate, and conclusively known in strong validation studies.

Weak validation studies merely assess whether the method under investigation yields different prevalence estimates than another data collection mode, typically (but not necessarily) a simple direct question administered in a comparable setting (for recent examples, see Coutts & Jann, 2011; Moshagen, Hilbig, & Musch, 2011). Thus, most typically, respondents are randomly assigned to either a direct questioning condition or the to-be-validated questioning format, and the respective prevalence estimates are compared. In this approach, the true prevalence of the to-be-estimated attribute is unknown, and the method under investigation is considered valid if it yields significantly higher prevalence estimates than the direct question in the case of socially undesirable and thus underreported attributes (and vice versa). Importantly, the evidence provided by such studies is entirely comparative and thus relative to a baseline method (typically direct questioning). The conclusions that can be drawn from such weak validation studies therefore rest completely on the validity of the "more is better" assumption (Umesh & Peterson, 1991). However, this assumption is questionable. Given that the true criterion value is unknown, it is not possible to conclude that the approach under consideration in fact improves validity. Even if the investigated approach yields significantly higher estimates of a socially undesirable attribute than a baseline approach, it remains possible that the investigated approach severely over- or underestimates the true prevalence.

In what we call intermediate validation studies, the prevalence of the attribute under consideration is unknown in the current sample of respondents, but known with respect to a certain population segment. This approach has been pursued, for example, by Blair, Sudman, Bradburn, and Stocking (1977), who compared reports of alcohol consumption in a national probability sample with national consumption statistics derived from tax receipts. As such, the investigation comprised an outside criterion against which prevalence estimates could be pitted. Clearly, such intermediate validation studies improve upon weak validation studies; however, their strength of evidence is necessarily limited due to possible sampling biases. If the sample of participants is nonrepresentative (which in and of itself is difficult to test), the prevalence of the sensitive attribute in the sample may differ considerably from the prevalence in the underlying population.

As these limitations imply, strong evidence regarding the validity of a questioning procedure requires that the prevalence of the sensitive attribute is conclusively known for a particular sample of respondents.[1] In strong validation studies, one can directly compare the estimate generated by the method under scrutiny against the known "true" prevalence and thus immediately judge the validity of this estimate. Ideally, the method will yield an estimate that closely matches the known "true" prevalence. In addition, the estimates generated by different formats – to which respondents are randomly assigned – can thus be compared by simply determining which is closer to the conclusively known criterion.

In view of these different types of validation studies, it is clear that validity can only be confidently asserted given an external criterion that is conclusively known for a particular sample of respondents. The major challenge for validation studies that aim to produce strong evidence is thus to obtain a suitable criterion. For example, the criterion may be obtained through a gold-standard measure, such as blood or saliva samples, which are frequently used in determining the prevalence of smoking and drug use (e.g., Colon, Robles, & Sahai, 2001; Johnson & Fendrich, 2005). However, acquiring data of this type is often impossible, almost always cost-intensive, and further complicated by legal and ethical problems (Dalton, Daily, & Wimbush, 1997). The consequence of these hurdles is that most studies attempting to demonstrate the validity of a questioning procedure rely on weak or, at best, intermediate evidence (Roese & Jamieson, 1993; Umesh & Peterson, 1991). Indeed, only 6 of the 38 studies attempting to validate a randomized-response model considered in the meta-analysis by Lensvelt-Mulders et al. (2005) were able to obtain strong evidence.

[1] A further differentiation concerns whether the true status of each single individual or merely the prevalence at the group level is known. Because this difference is of lesser importance for the present work, we treat both types as strong validation studies. Note, however, that they have different implications in terms of protecting respondents' anonymity.

A New Experimental Validation Method

In this paper, we propose and demonstrate a new experimental validation method that provides strong evidence and can be applied to any technique tailored to estimate the prevalence of socially (un)desirable attributes. The basic idea is to elicit normatively problematic behavior in a way that ensures conclusive knowledge of the prevalence rate of this behavior without any information on what single individuals did (thus perfectly preserving anonymity). Thereafter, the to-be-validated questioning technique(s) can be applied and the prevalence estimate(s) compared against the conclusively known value. Following this reasoning, we sought a paradigm that elicits behavior (a) that is clearly socially unacceptable and (b) whose prevalence can be assessed exactly at the group level, while (c) nothing is known at the individual level, so that anonymity is preserved. Also, the procedure should not involve ethically questionable elements such as deceiving participants (Hertwig & Ortmann, 2001).

Specifically, the method proposed and applied herein is a variant of the dice-under-the-cup paradigm (e.g., Fischbacher & Heusi, 2008; Hilbig & Hessler, 2013). Participants roll a die in secret and merely report which outcome the roll produced. They thus have the opportunity to cheat by reporting an outcome different from the one they actually obtained. To elicit cheating, certain outcomes are associated with incentives such as monetary rewards. Each participant can claim this reward without revealing in any way whether it was rightfully earned or not. However, the probability of each outcome occurring across participants is known, and it is thus straightforward to estimate the proportion of participants who can be expected to have cheated. Consequently, the prevalence of cheating is known at the aggregate level, whereas nothing can be concluded about single individuals. In a second step, one can then apply the to-be-validated questioning procedure and determine whether the obtained prevalence estimate of cheating corresponds with the "true" prevalence. We illustrate the details of this validation method by means of an experiment designed to exemplify and demonstrate the approach.
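To make the aggregate-level logic explicit, the following short derivation (our notation; the article itself states only the resulting estimator in the Results section) shows how the observed claim rate identifies the share of cheaters among claimants when one of six equiprobable outcomes is rewarded:

$$\lambda \;=\; \underbrace{\tfrac{1}{6}}_{\text{honest winners}} \;+\; \underbrace{\tfrac{5}{6}\,c}_{\text{cheating non-winners}}, \qquad P(\text{cheated} \mid \text{claimed}) \;=\; \frac{\tfrac{5}{6}\,c}{\lambda} \;=\; \frac{\lambda - \tfrac{1}{6}}{\lambda},$$

where $\lambda$ is the probability of claiming the target outcome and $c$ is the (unknown) probability that a participant who missed the target falsely claims it. Substituting the observed claim rate $\hat{\lambda}$ yields the estimator applied below; this presupposes that true winners never deny their win, which seems safe given that denying would forfeit the reward.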

Methods

In line with the more general sketch provided above, our validation experiment consisted of two phases. In the first phase, cheating was elicited and the "true" prevalence of this behavior thus determined. In the second phase, two questioning techniques were used to estimate the experimentally induced factual prevalence, namely a simple direct question and a variant of the RRT as the to-be-validated procedure. The direct question served as a manipulation check of whether the allegedly sensitive behavior actually is sensitive: If so, the prevalence estimate obtained in the direct questioning condition should significantly underestimate the "true" prevalence.

Participants

A total of 386 respondents volunteered to participate in the study. Among those, N = 138 (35.75%) reported having rolled the target outcome and therefore entered the second phase of the study. Participants' mean age was 22.9 years (SD = 3.65), and 87 (63.1%) were male. Participants entering the second phase were randomly assigned to one of three questionnaire conditions: direct questioning (n = 31), and the Stochastic Lie Detector (SLD) with a low (n1 = 54) or a high (n2 = 53) randomization probability (see below). A higher number of participants was assigned to the latter conditions to compensate for the loss of efficiency associated with the randomization procedure.

Procedures

The experiment was carried out at a prominent spot on the campus of the University of Mannheim, Germany, where many students and faculty from diverse fields pass by. It was advertised as a lottery with the possibility of winning various attractive vouchers (e.g., free entrance to local clubs or museums)[2] while taking only 5 min or less. Individuals willing to participate were given a fair die and an opaque cup. They were guaranteed that the die was fair, and they were also allowed to test this. Next, the experimenter told them their target outcome: Upon rolling this outcome, they would receive one of the vouchers. The six possible target outcomes occurred equally often across participants, while the allocation of participants to target outcomes was random. Most importantly, participants were told to roll the die in secret behind a screen without revealing the outcome to anyone. After the roll, the experimenter would ask them whether they had obtained their target outcome. Participants were explicitly asked to answer only "yes" or "no" in response to this question. Responding "yes" automatically meant winning a voucher, provided that a very brief follow-up questionnaire was filled out.

Participants who reported having rolled the target outcome by responding "yes" to the experimenter entered the second phase of the study. Depending on the experimental condition, they were given one of two one-page questionnaires. Participants were asked to complete the questionnaire in secret behind the screen, fold it, and finally drop it into a ballot box without showing it to anyone. After dropping the questionnaire into the ballot box, they received their voucher and were thanked and debriefed.

[2] The monetary equivalent of the vouchers ranged from roughly €2 to €25 with a mean of approximately €5 (about 7 USD). We are indebted to the many local sponsors of this study who provided us with free vouchers.
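As an illustration, here is a minimal simulation sketch of the first phase (our code, not part of the original study; the function name and the `cheat_rate` parameter are hypothetical, since the real procedure neither sets nor observes cheating at the individual level). It shows that the claim rate alone recovers the aggregate prevalence of cheating:

```python
import random

def simulate_phase1(n_participants: int, cheat_rate: float, seed: int = 1) -> dict:
    """Simulate the dice-under-the-cup phase.

    cheat_rate is the assumed probability that a participant who missed
    the target outcome falsely claims it (a hypothetical parameter).
    """
    rng = random.Random(seed)
    claims = 0
    for _ in range(n_participants):
        rolled_target = rng.randrange(6) == 0          # fair die, one target outcome
        cheats = (not rolled_target) and rng.random() < cheat_rate
        claims += rolled_target or cheats              # participant claims the target
    claim_rate = claims / n_participants
    # Aggregate-level identification: share of claimants who cheated
    true_prevalence = (claim_rate - 1 / 6) / claim_rate
    return {"claim_rate": claim_rate, "true_prevalence": true_prevalence}

print(simulate_phase1(n_participants=100_000, cheat_rate=0.23))
```

With a cheat rate of .23 among non-winners, the simulated claim rate is about .358 and the recovered prevalence about .53, close to the values actually observed in the experiment reported below.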

Measures

Direct Question

Participants were asked to respond truthfully with "yes" or "no" to the statement: "I cheated in the previous phase by reporting an outcome I did not actually roll." The questionnaire did not contain any reference to the respondent and was completed in secret to mimic the usual case of an anonymous questionnaire.

Stochastic Lie Detector

The SLD (Moshagen, Musch, & Erdfelder, 2012) is an extension of Mangat's (1994) RRT variant, which was originally developed to improve the statistical efficiency of the classical randomized-response model suggested by Warner (1965). In both Mangat's variant and the SLD, participants are simultaneously provided with both the sensitive statement (the same as in the direct questioning condition, denoted Statement A) and its negation (Statement B: "I reported the outcome I actually rolled"), but are instructed to respond to only one of these statements. Respondents who did not cheat in the previous phase are instructed to participate in a randomization process (such as rolling a die) to determine whether to respond to Statement A (selected with probability p) or Statement B (selected with the complementary probability 1 − p). In contrast, respondents who actually cheated are instructed to respond to Statement A regardless of the outcome of the randomization process. Since this procedure guarantees that a particular response is no longer incriminating, the confidentiality of the individual responses is protected (Ljungqvist, 1993; Ostapczuk, Moshagen, Zhao, & Musch, 2009). However, if the probability distribution of the randomization device is known, the prevalence of the sensitive attribute at the group level can be determined by straightforward probability calculations. For example, in Mangat's variant, the proportion of cheaters is estimated by

$$\hat{\pi} = \frac{\hat{\lambda} - 1 + p}{p},$$

where $\hat{\lambda}$ denotes the observed relative frequency of "yes" responses and $p$ is the (a priori known) probability of being prompted to respond to Statement A.

Unlike both Mangat's variant and previous randomized-response models, the SLD considers the possibility that carriers of the sensitive attribute may fail to respond truthfully, despite the increased anonymity offered by the randomization procedure (e.g., Campbell, 1987). The SLD thereby takes incomplete truthful responding into account and thus allows for obtaining undistorted prevalence estimates of the sensitive attribute. Accordingly, the SLD assumes that carriers of the sensitive attribute respond truthfully with an unknown probability $t \le 1$ and fail to respond truthfully with probability $1 - t$. Thus, in the SLD, the probability of a "yes" response is $\lambda = \pi t + (1 - \pi)(1 - p)$, where $\pi$ denotes the prevalence of the sensitive attribute. The SLD therefore contains two unknown parameters, which cannot be estimated on the basis of only one proportion of "yes" responses. For the SLD parameters to be statistically identifiable, it is necessary to draw two independent random samples and to use different probabilities of being prompted to respond to Statement A or Statement B in each of the two samples ($p_1$ and $p_2$). By drawing two samples with different randomization probabilities, two independent proportions of "yes" responses are obtained, which allow for estimating the SLD parameters. In particular, given the numbers of observations, $n_1$ and $n_2$, the known randomization probabilities, $p_1$ and $p_2$, and the observed proportions of "yes" responses, $\hat{\lambda}_1$ and $\hat{\lambda}_2$, for the two samples, the prevalence of the sensitive attribute can be estimated by

$$\hat{\pi} = \frac{\hat{\lambda}_2 - \hat{\lambda}_1 + (p_2 - p_1)}{p_2 - p_1}$$

with associated variance

$$\widehat{\mathrm{VAR}}(\hat{\pi}) = \frac{\hat{\lambda}_1 (1 - \hat{\lambda}_1)}{n_1 (p_1 - p_2)^2} + \frac{\hat{\lambda}_2 (1 - \hat{\lambda}_2)}{n_2 (p_2 - p_1)^2}.$$
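These estimators are simple enough to compute directly. The following Python sketch (our illustration, not the authors' code) implements both, using the sample sizes and response proportions reported in the Results section below as a check:

```python
import math

def mangat_estimate(lam_hat: float, p: float) -> float:
    """Mangat's variant: pi_hat = (lambda_hat - 1 + p) / p."""
    return (lam_hat - 1 + p) / p

def sld_estimate(lam1: float, lam2: float,
                 p1: float, p2: float,
                 n1: int, n2: int) -> tuple[float, float]:
    """SLD prevalence estimate and its standard error, computed from two
    independent samples with different randomization probabilities p1, p2."""
    pi_hat = (lam2 - lam1 + (p2 - p1)) / (p2 - p1)
    var = (lam1 * (1 - lam1) / (n1 * (p1 - p2) ** 2)
           + lam2 * (1 - lam2) / (n2 * (p2 - p1) ** 2))
    return pi_hat, math.sqrt(var)

# Values reported in the Results section below:
pi_hat, se = sld_estimate(lam1=.944, lam2=.509, p1=.083, p2=.917,
                          n1=54, n2=53)
print(f"pi_hat = {pi_hat:.3f}, SE = {se:.4f}")  # pi_hat = 0.478, SE = 0.0905
```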


In the present investigation, the month of birth was used as the randomization device, because it is simple, transparent, and obviously unknown to the experimenter (Moshagen & Musch, 2012). The instructions provided to the participants in the first SLD condition read as follows: "Below, you find two mutually exclusive statements, labeled Statement A and Statement B. Please respond to only one of these statements:

• If you actually rolled the target outcome, please respond to
  – Statement A, if you were born in January.
  – Statement B, if you were born between February and December.
• If you did not actually roll the target outcome, simply respond to Statement A regardless of when you were born."

Instructions in the second SLD condition were largely identical, except that the randomization probabilities of the two SLD conditions were switched (participants were asked to respond to Statement B if they were born in January and to Statement A otherwise). According to official birth statistics published by the German Federal Agency for Statistics, the randomization probabilities p1 and p2 were .083 and .917, respectively. The instructions additionally explained thoroughly how the confidentiality of responses was protected owing to the randomization procedure.

Results

As reported above, 138 of the 386 participants (35.75%) claimed to have rolled the target outcome in the first phase of the experiment. Given that a fair die was used and that rolling only one particular outcome was associated with receiving a voucher, the statistically expected proportion of winners is 1/6 = .1667. Thus, the maximum likelihood estimate of the experimentally determined "true" prevalence of cheating (conditional on having reported the target outcome), $\pi_0$, is $\hat{\pi}_0 = (.3575 - .1667)/.3575 = .534$ (SE = .042). In other words, about half of those who received a voucher most likely cheated.

The prevalence estimates of cheating obtained through direct questioning and the SLD, respectively, in the second phase of the study are shown in Table 1. The SLD estimates were obtained by applying the equations provided above with $n_1 = 54$, $n_2 = 53$, $p_1 = .083$, $p_2 = .917$, and the observed proportions of "yes" responses in the two SLD samples, $\hat{\lambda}_1 = .944$ and $\hat{\lambda}_2 = .509$.

Table 1. Parameter estimates, standard errors (in parentheses), and z tests of H0: π ≥ π0 against H1: π < π0

| Mode                                       | Prevalence estimate (SE) | z test; upper-tail prob. |
|--------------------------------------------|--------------------------|--------------------------|
| Direct question (n = 31)                   | .355 (.0859)             | z = 1.87; p = .03        |
| Stochastic Lie Detector (n1 = 54; n2 = 53) | .478 (.0905)             | z = 0.65; p = .26        |

Note. π0 = "true" prevalence of cheating ($\hat{\pi}_0$ = .534, SE = .042); table entries give the prevalence of cheating as estimated through direct questioning and the Stochastic Lie Detector, respectively.


Differences across the estimated prevalence rates were tested by standard asymptotic z tests for proportions, taking into account the higher variance associated with the SLD estimate. Note that, for a questioning procedure to be valid, the crucial test in strong validation studies is whether its associated prevalence parameter differs from the "true" prevalence rate. If it does not, the procedure yields an estimate that can be considered valid. Providing evidence for a procedure's validity thus no longer requires that it yield estimates differing significantly from the estimates of competing approaches (unlike in weak validation studies).[3]

When questioned directly, 35.5% (SE = 8.6%) of participants admitted to having reported an outcome they did not actually roll. As predicted, this proportion is significantly lower than the "true" prevalence of cheating, z = 1.87, p = .03, indicating that cheating in order to receive a voucher can be considered a sensitive behavior that is not readily admitted in an anonymous direct questionnaire. In contrast, the SLD yielded a higher prevalence estimate (47.8%; SE = 9.1%) than direct questioning, one that was numerically close to and not significantly different from the estimated "true" prevalence of cheating, z = 0.56, p = .29. Thus, these results show that the SLD not only provides numerically higher but also more valid prevalence estimates of cheating compared to direct questioning.

[3] Consider, for example, a comparative validation study involving two ideal questioning procedures, both of which reproduce the true prevalence perfectly. It is clear from this example that the adequate test concerning the procedures' validities is whether their prevalence parameters differ from the "true" prevalence, not whether there is a significant difference between the deviations of the prevalence parameters from the "true" prevalence. In fact, such a test may even be misleading when one approach overestimates and another approach underestimates the "true" prevalence (say, one differs by +5%, the other by −5%). Testing for the difference of deviations from the "true" prevalence could then suggest different validities for these approaches, although both actually differ from the "true" prevalence to the same absolute degree.
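For reproducibility, the following Python sketch (ours) recomputes the key quantities. The binomial standard error over the 138 claimants and the quadrature combination of standard errors in the z tests are our reading of the analysis, as the article does not spell out these formulas; both reproduce the reported values:

```python
import math
from statistics import NormalDist

# Phase 1: claim rate among all 386 participants
lam = 138 / 386                          # .3575
pi0 = (lam - 1 / 6) / lam                # .534, "true" prevalence among claimants
se0 = math.sqrt(pi0 * (1 - pi0) / 138)   # .0425, matching the reported SE = .042
                                         # (binomial SE over 138 claimants; our assumption)

def z_test(estimate: float, se: float) -> tuple[float, float]:
    """One-sided z test of H0: pi >= pi0 against H1: pi < pi0,
    combining the two standard errors in quadrature (our assumption)."""
    z = (pi0 - estimate) / math.sqrt(se ** 2 + se0 ** 2)
    return z, 1 - NormalDist().cdf(z)

print(z_test(.355, .0859))  # direct question: z ≈ 1.87, p ≈ .03
print(z_test(.478, .0905))  # SLD: z ≈ 0.56, p ≈ .29 (as in the text)
```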

Discussion

A common problem in studies involving sensitive issues is the tendency of individuals to respond in agreement with social norms rather than with the true state of affairs. To overcome this problem, different approaches that aim to counteract socially desirable responding have been proposed. However, all of these require increased resources as compared to simple direct questioning. Therefore, it is vital to demonstrate that such alternative approaches yield a substantial increase in validity. Conclusively demonstrating superior validity, in turn, requires the availability of an external criterion reflecting the "true" prevalence of a sensitive attribute. Because such data are notoriously difficult to obtain, many validation studies proceed indirectly by merely comparing estimates obtained with different methods. This state of affairs is unsatisfactory, given that validation studies based on comparing methods provide only weak evidence.

In the present work, we propose a procedure that enables researchers to measure the "true" prevalence of a sensitive behavior experimentally. Specifically, we resorted to a cheating paradigm (Hilbig & Hessler, 2013) which allows for an estimate of sensitive behavior at the aggregate level. This can then serve as an external criterion to produce strong evidence concerning the validity of the questioning approach under scrutiny. At the same time, the procedure fully protects respondents, as researchers gain no knowledge of individuals' status concerning the sensitive attribute. Also, it refrains from deceiving or misinforming participants, which is not only a matter of research ethics but also a practical one, given that deception may "pollute" subject pools (Ortmann & Hertwig, 2002).

Our application of the proposed procedure shows that it reliably elicits an ethically problematic behavior that is subsequently not readily admitted. That is, some participants cheated and mostly denied having done so when asked via a traditional anonymous questionnaire, which significantly underestimated the true prevalence of cheating. The proposed procedure thus provides an adequate means to judge the validity of approaches that aim to achieve superior prevalence estimates of sensitive attributes. In addition, the present experiment provided further evidence for the validity of the SLD (Moshagen et al., 2012). Importantly, however, the proposed validation method can be applied to any other procedure designed to reduce socially desirable responding.

A possible objection is that successful validation of a to-be-validated method in one domain of problematic behavior (here: cheating) does not necessarily imply that this method is also valid in other domains (e.g., drug use). Nonetheless, unequivocal validation in a single domain suffices to reject the systematic-bias hypothesis. According to this hypothesis, threats to the validity of a questioning method do not arise from unsystematic errors that are present in some domains and absent in others. Rather, threats to validity arise from systematic biases (i.e., over- or underestimations) induced by the way the method estimates prevalence rates. Systematic biases, if they exist, must show up similarly in any domain of sensitive behavior, even though the magnitude of bias may vary. In line with this reasoning, our results show that direct questioning underestimated the prevalence of cheating and is thus likely to be susceptible to systematic bias (underestimation) in other domains as well. By contrast, if a questioning technique (such as the SLD in our experiment) provides an adequate estimate of the prevalence of a clearly sensitive behavior (such as cheating), it can be concluded that this technique does not suffer from systematic bias. Arguably, this conclusion implies that the technique can safely be applied in various domains.

Overall, the proposed procedure has several advantages compared to more traditional strong validation studies: It offers a relatively simple way to obtain strong evidence for or against the validity of a questioning technique. At the same time, it does not necessitate knowing the true status of participants from gold-standard measures or external records. The latter are typically difficult to obtain, often cost-intensive, and can be ethically problematic. Furthermore, studies applying the procedure proposed herein can be replicated quite easily, without any geographical constraints. This is not generally the case for strong validation studies relying on official records (e.g., due to laws on data protection that vary between nations). Likewise, the current procedure can be used to compare various techniques in a series of studies, as there are, in principle, no inherent limits regarding the sample size.

In order to use the proposed validation procedure, it is useful if (a) many participants cheat by reporting an outcome illegitimately, and vital that (b) there is a sufficiently high hurdle to admitting having done so. The first aspect derives from the fact that only participants claiming to have obtained the target outcome are subsequently questioned via the to-be-validated technique; they consequently constitute only a subsample of the total sample taking part in the study. Hence, the more prevalent cheating is, the fewer participants need to be sampled in total. Second, the procedure requires that the act of cheating is perceived as sensitive, that is, it must be considered a moral transgression that is not readily admitted under direct questioning. Both requirements are directly tied to the incentive to cheat that is implemented. If the perceived value of the incentive is too low, there is little reason to cheat and especially little pressure to deny cheating behavior. Although it is generally reasonable to assume that larger incentives render cheating more likely and pose higher hurdles to admitting having cheated, very high incentives may seriously conflict with individual moral standards and thus interfere with the goal of eliciting cheating (Mazar, Amir, & Ariely, 2008). Indeed, research using similar dice-rolling paradigms has demonstrated that participants avoid both minor and major lies (Shalvi, Handgraaf, & De Dreu, 2011).

In summary, the proposed experimental validation method offers a simple means to obtain strong evidence regarding the validity of questioning techniques that aim to reduce the problem of socially desirable responding. Given that such approaches and techniques are generally associated with increased effort, which is acceptable only to the extent that a substantial increase in validity is gained, it is desirable that strong evidence concerning the validity of such techniques is routinely sought.

Acknowledgments

The work reported herein was supported by a grant from the German Research Foundation to the second author [HI 1600/1-1].

References

Adams, S. A., Matthews, C. E., Ebbelin, C. B., Moore, C. G., Cunningham, J. E., Fulton, J., & Hebert, J. R. (2005). The effect of social desirability and social approval on self-reports on physical activity. American Journal of Epidemiology, 161, 389–398.

Baumeister, R. F., Vohs, K. D., & Funder, D. C. (2007). Psychology as the science of self-reports and finger movements: Whatever happened to actual behavior? Perspectives on Psychological Science, 2, 396–403.

Birkeland, S. A., Manson, T. M., Kisamore, J. L., Brannick, M. T., & Smith, M. A. (2006). A meta-analytic investigation of job applicant faking on personality measures. International Journal of Selection and Assessment, 14, 317–335.

Blair, E., Sudman, S., Bradburn, N. M., & Stocking, C. (1977). How to ask questions about drinking and sex: Response effects in measuring consumer behavior. Journal of Marketing Research, 14, 316–321.

Campbell, A. (1987). Randomized response technique. Science, 236, 1049.

Chaudhuri, A., & Christofides, T. C. (2007). Item count technique in estimating the proportion of people with a sensitive feature. Journal of Statistical Planning and Inference, 137, 589–593.

Colon, H. M., Robles, R. R., & Sahai, H. (2001). The validity of drug use responses in a household survey in Puerto Rico: Comparison of survey responses of cocaine and heroin use with hair tests. International Journal of Epidemiology, 30, 1042–1049.

Coutts, E., & Jann, B. (2011). Sensitive questions in online surveys: Experimental results for the randomized response technique (RRT) and the unmatched count technique (UCT). Sociological Methods and Research, 40, 169–193.

Dalton, D. R., Daily, C. M., & Wimbush, J. C. (1997). Collecting "sensitive" data in business ethics research: A case for the Unmatched Count Technique (UCT). Journal of Business Ethics, 16, 1049–1057.

Fischbacher, U., & Heusi, F. (2008). Lies in disguise: An experimental study on cheating. Research Paper Series 40, Thurgau Institute of Economics, University of Konstanz.

Hertwig, R., & Ortmann, A. (2001). Experimental practices in economics: A methodological challenge for psychologists? The Behavioral and Brain Sciences, 24, 383–403.

Hilbig, B. E., & Hessler, C. M. (2013). What lies beneath: How the distance between truth and lie drives dishonesty. Journal of Experimental Social Psychology, 49, 263–266.

Iacono, W. (2000). The detection of deception. In J. Cacioppo, L. Tassinary, & G. Berntson (Eds.), Handbook of psychophysiology (2nd ed., pp. 772–793). New York, NY: Cambridge University Press.

Johnson, T., & Fendrich, M. (2005). Modeling sources of self-report bias in a survey of drug use epidemiology. Annals of Epidemiology, 15, 381–389.

Jones, E. E., & Sigall, H. (1971). The bogus pipeline: A new paradigm for measuring affect and attitude. Psychological Bulletin, 76, 349–364.

Lensvelt-Mulders, G. J. L. M., Hox, J. J., van der Heijden, P. G. M., & Maas, C. J. M. (2005). Meta-analysis of randomized response research: Thirty-five years of validation. Sociological Methods and Research, 33, 319–348.

Ljungqvist, L. (1993). A unified approach to measures of privacy in randomized response models. Journal of the American Statistical Association, 88, 97–103.

Mangat, N. (1994). An improved randomized-response strategy. Journal of the Royal Statistical Society: Series B, 56, 93–95.

Mazar, N., Amir, O., & Ariely, D. (2008). The dishonesty of honest people: A theory of self-concept maintenance. Journal of Marketing Research, 45, 633–644.

McFarland, L. A., & Ryan, A. M. (2000). Variance in faking across noncognitive measures. Journal of Applied Psychology, 85, 812–821.

Moshagen, M., Hilbig, B. E., & Musch, J. (2011). Defection in the dark? A randomized-response investigation of cooperativeness in social dilemma games. European Journal of Social Psychology, 41, 638–644.

Moshagen, M., & Musch, J. (2012). Surveying multiple sensitive attributes using an extension of the randomized-response technique. International Journal of Public Opinion Research, 24, 508–523.

Moshagen, M., Musch, J., & Erdfelder, E. (2012). A stochastic lie detector. Behavior Research Methods, 44, 222–231.

Moshagen, M., Musch, J., Ostapczuk, M., & Zhao, Z. (2010). Reducing socially desirable responses in epidemiologic surveys: An extension of the randomized-response technique. Epidemiology, 21, 379–382.

Ortmann, A., & Hertwig, R. (2002). The costs of deception: Evidence from psychology. Experimental Economics, 5, 111–131.

Ostapczuk, M., Moshagen, M., Zhao, Z., & Musch, J. (2009). Assessing sensitive attributes using the randomized-response technique: Evidence for the importance of response symmetry. Journal of Educational and Behavioral Statistics, 34, 267–287.

Ostapczuk, M., Musch, J., & Moshagen, M. (2009). A randomized-response investigation of the education effect in attitudes towards foreigners. European Journal of Social Psychology, 39, 920–931.

Paulhus, D. L. (1984). Two-component models of socially desirable responding. Journal of Personality and Social Psychology, 46, 598–609.

Roese, N. J., & Jamieson, D. W. (1993). Twenty years of bogus pipeline research: A critical review and meta-analysis. Psychological Bulletin, 114, 363–375.

Shalvi, S., Handgraaf, M. J. J., & De Dreu, C. K. W. (2011). Ethical manoeuvring: Why people avoid both major and minor lies. British Journal of Management, 22, S16–S27.

Simon, P., Striegel, H., Aust, F., Dietz, K., & Ulrich, R. (2006). Doping in fitness sports: Estimated number of unreported cases and individual probability of doping. Addiction, 101, 1640–1644.

Tourangeau, R., & Yan, T. (2007). Sensitive questions in surveys. Psychological Bulletin, 133, 859–883.

Umesh, U., & Peterson, R. (1991). A critical evaluation of the randomized response method: Applications, validation and research agenda. Sociological Methods and Research, 20, 104–138.

Warner, S. (1965). Randomized-response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60, 63–69.

Received January 19, 2013
Revision received May 26, 2013
Accepted May 27, 2013
Published online August 16, 2013

Morten Moshagen
Lehrstuhl Psychologie III, University of Mannheim
Schloss, EO 254
68131 Mannheim
Germany
Tel. +49-621-1812124
Fax +49-621-1813997
E-mail [email protected]
