
Psychophysiology (2010), 1–12. Wiley Periodicals, Inc. Printed in the USA. Copyright © 2010 Society for Psychophysiological Research. DOI: 10.1111/j.1469-8986.2010.01148.x


Psychophysiological and behavioral measures for detecting concealed information: The role of memory for crime details

GALIT NAHARIᵃ AND GERSHON BEN-SHAKHARᵇ

ᵃ Department of Criminology, Bar Ilan University, Ramat Gan, Israel
ᵇ Department of Psychology, The Hebrew University of Jerusalem, Jerusalem, Israel

Abstract

This study examined the role of memory for crime details in detecting concealed information using the electrodermal measure, the Symptom Validity Test, and the Number Guessing Test. Participants were randomly assigned to three groups: guilty, who committed a mock theft; informed-innocents, who were exposed to crime-relevant items; and uninformed-innocents, who had no crime-relevant information. Participants were tested immediately or 1 week later. Results showed that (a) all tests detected the guilty in the immediate condition, and combining the tests improved detection efficiency; (b) the tests’ efficiency declined in the delayed condition, mainly for peripheral details; and (c) no distinction between guilty and informed innocents was possible in the immediate condition, yet some distinction emerged in the delayed condition. These findings suggest that, while time delay may somewhat reduce the ability to detect the guilty, it also diminishes the danger of accusing informed-innocents.

Descriptors: Concealed Information Test, Symptom Validity Test, Skin conductance response, Memory

Scientists and forensic experts have attempted for many years to develop instruments and methods for the purpose of detecting deception (e.g., Vrij, 2008). One notable approach, which has spawned several methods over the past century, is the use of psychophysiological responses (see, e.g., Ben-Shakhar & Furedy, 1990; Marston, 1917; Raskin, 1989; Reid & Inbau, 1977). In this study, we focus on just one of the two prominent methods of psychophysiological detection, known as the Guilty Knowledge Test (GKT) or the Concealed Information Test (CIT). This method, which is designed to detect concealed knowledge, rather than deception, is based on sound theoretical principles and proper controls and therefore satisfies the necessary requirements of an objective test (see Ben-Shakhar, Bar-Hillel, & Kremnitzer, 2002; Ben-Shakhar & Elaad, 2002a; Lykken, 1974, 1998).

The CIT (Lykken, 1959, 1960) utilizes a series of multiple-choice questions, each having one relevant alternative (e.g., a feature of the crime under investigation) and several neutral (control) alternatives, chosen so that an unknowledgeable (innocent) suspect would not be able to discriminate them from the relevant alternative (Lykken, 1998). These relevant items are significant only for knowledgeable (guilty) individuals and, thus, if the suspect’s physiological responses to the relevant alternative are consistently larger than to the neutral alternatives, knowledge about the event (e.g., crime) is inferred. As long as information about the event has not leaked out and assuming that each alternative appears equally plausible to an individual with no guilty knowledge, the probability that an innocent suspect would produce consistently larger responses to the relevant than to the neutral alternatives depends only on the number of questions and the number of alternative answers per question, and hence it can be controlled such that maximal protection for the innocent is provided.

Extensive research conducted since the early 1960s has demonstrated that the CIT can be successfully used for detecting relevant information and discriminating between knowledgeable (guilty) and innocent individuals (e.g., Ben-Shakhar & Furedy, 1990; Ben-Shakhar & Elaad, 2003; Elaad, 1998; Lykken, 1959, 1960, 1998). In the last decade, interest in the CIT seems to be growing, and various studies examining the mechanisms underlying this method, as well as applied questions related to its possible use as an aid in criminal investigations, have been published (e.g., Gamer, Bauermann, Stoeter, & Vossel, 2007; Gamer & Berti, 2010; Langleben et al., 2005; Rosenfeld et al., 2008; Rosenfeld, Shue, & Singer, 2007; Verschuere, Crombez, De Clercq, & Koster, 2004; Verschuere, Crombez, & Koster, 2004). However, in spite of the extensive research conducted on the CIT and its impressive validity estimates, the method has been applied extensively only in Japan (see Nakayama, 2002; Osugi, 2010). Many possible accounts have been offered to explain this gap between research and practice (e.g., Iacono, 2010; Krapohl, 2010; Podlesny,

This research was funded by grants from the Israel Science Foundation to Gershon Ben-Shakhar. We thank Keren Maoz, Assaf Breska, and Tamar Pelet for their assistance in this research and Ewout Meijer for his helpful comments. Address correspondence to: Galit Nahari, Department of Criminology, Bar-Ilan University, Ramat Gan, 52900, Israel. E-mail: naharig@mail.biu.ac.il


1993), but one notable limitation of the bulk of CIT research conducted so far is its questionable external validity. Estimates of CIT validity were based almost exclusively on mock-crime studies, which differ in many important respects from real polygraph examinations. Roughly, these differences can be classified into two major categories: (a) motivational-emotional factors, related to the differences between committing a real crime and following the instructions of an experimenter to commit a mock crime, as well as differences related to the possible consequences of an incriminating polygraph test compared with failing a laboratory CIT (which typically means that the participant will not receive a bonus of a few dollars); and (b) cognitive factors, related to processing the critical information during the crime and the ability to remember this information during the test. As the present study focuses only on the second category of cognitive factors, only this category will be elaborated.

In the typical mock-crime experiment, it is guaranteed that all subjects learn all the relevant items (e.g., six features of the mock crime, such as the color of an envelope stolen and the amount of money it contained). Furthermore, subjects are typically tested immediately after being exposed to this critical information; thus, memory does not play a role in the experimental situation. In real life, things are typically entirely different. The guilty person is faced with a complex scene, and it cannot be assumed that all details were indeed noticed, processed, and stored in memory. Criminal suspects are very rarely tested immediately after committing the criminal act. In most cases they are tested days, weeks, and sometimes months after the crime was committed (see Ben-Shakhar & Furedy, 1990; Carmel, Dayan, Naveh, Raveh, & Ben-Shakhar, 2003). Carmel et al.
(2003) were the first to systematically examine these cognitive aspects of the external validity of CIT experiments by comparing the standard mock-crime procedure with a more realistic type of mock crime and by comparing immediate and delayed CITs. The results of this study revealed that the “realistic” mock crime was associated with overall lower recall rates and weaker detection efficiency than the standard procedure. However, these effects were mediated by the type of CIT questions used, such that the decline in memory and detection efficiency was observed mainly for peripheral items that were not directly related to the mock crime (e.g., a picture on the wall), but not for items that were central to the event (e.g., the amount of money stolen). The results further indicated that a CIT based exclusively on the central items was unaffected by the type of mock-crime procedure. More recently, Gamer, Kosiol, and Vossel (2010) also demonstrated that central items, but not peripheral ones, are recalled after a 2-week period. Thus, these studies imply that a careful selection of central items (e.g., modus operandi, type of weapon used) can produce high accuracy levels, not only in the artificial laboratory conditions, but also in more realistic settings.

Another potential limitation of the CIT is the possibility that, in actual criminal cases, some critical information may leak out to innocent suspects. Leakage of information to unaware suspects may lead to enhanced responses to these items and eventually to a misclassification of the informed innocent suspects as guilty (e.g., Bradley, Barefoot, & Arsenault, 2010). Several studies examined the effects of exposing the critical information to “innocent” subjects in mock-crime experiments (e.g., Ben-Shakhar, Gronau, & Elaad, 1999; Bradley, MacLaren, & Carle, 1997; Bradley & Rettinger, 1992; Bradley & Warfield, 1984) and generally demonstrated that, although informed innocent


subjects showed smaller responses to the critical items than guilty subjects, they did show significantly larger responses to these items when compared with uninformed innocent subjects. Bradley and his colleagues (e.g., Bradley & Warfield, 1984; Bradley et al., 1997) proposed a method, labeled the Guilty Action Test (GAT), in which subjects are asked about their actions rather than their knowledge. Bradley et al. (1997) demonstrated that, while the GAT was associated with a smaller rate of false positive outcomes in informed innocents than the standard version of the CIT, it still produced a much larger rate of false positive outcomes in informed innocents compared with uninformed innocents. Recently, Gamer, Verschuere, Crombez, and Vossel (2008) used the GAT and compared “guilty” subjects with “informed innocents” both when tested immediately after committing a mock crime and when tested 2 weeks later. They found that, while “guilty” subjects tended to forget only the peripheral items during this 2-week period, the informed innocents forgot all items. Consequently, detection of guilty subjects remained stable (i.e., the areas under the Receiver Operating Characteristic (ROC) curve were 0.89 and 0.90 in the immediate and delayed conditions, respectively), whereas erroneous detection of informed innocents was significantly reduced in the delayed condition (the ROC areas were 0.95 and 0.75 in the immediate and delayed conditions, respectively).

The purpose of the present study is to continue and extend the line of research initiated by Carmel et al. (2003) and Gamer et al. (2010). Specifically, we used the more realistic type of mock crime proposed by Carmel et al. (2003) and a 3 × 2 between-subjects design with guilt (“guilty,” “informed innocents,” and “uninformed innocents”) and time of testing (immediate vs. delayed by 1 week) as the two orthogonal factors.
Furthermore, in addition to measuring skin conductance, which has been demonstrated as the most efficient autonomic measure in CIT research (e.g., Gamer, Verschuere, Crombez, & Vossel, 2008), we examined two behavioral measures that have been rarely applied for detecting concealed information. Both of these measures are based on asking examinees, who deny knowledge of some critical items, to guess these items. Effective concealment is possible when guessing is random (i.e., where the critical alternative is guessed with the same probability as all other alternatives), but producing random guesses may be very difficult for those who are actually aware of the true alternatives. Consequently, the outcome of multiple guessing attempts may differentiate knowledgeable (who would not be able to produce random guessing) and unknowledgeable examinees (whose guesses will be random). Specifically, we adopted the Symptom Validity Test (SVT), which is a forced-choice self-report test (with two alternative answers for each question) that has been used to detect malingering in various contexts (e.g., Merckelbach, Hauer, & Rassin, 2002; Pankratz, Fausti, & Peed, 1975; Verschuere, Meijer, & Crombez, 2008). The SVT may be a promising tool for detecting concealed information because it is based on an entirely different rationale than the physiological measures and thus may add non-redundant information. Recently, Meijer, Smulders, Johnston and Merckelbach (2007) demonstrated that the SVT can be a valuable tool for detecting concealed knowledge and, at least in some conditions, it can increase the validity of CITs based on skin conductance response (SCR). The second measure adopted in this study was derived from the Number Guessing Test (NGT) proposed by Lieblich and Ninio (1972) and by Lieblich, Shaham, and Ninio (1976). It is


based on a similar rationale to the SVT, but relies on guessing values of continuous variables, rather than guessing which of two possible alternatives is the correct one. Specifically, this method utilizes several numerical items (e.g., the house number where a crime was committed, the day of the month when the event occurred). As in the SVT, examinees are asked to guess the correct value of each item, and the detection measure is based on the correlation between the true profile and the profile guessed by each examinee. It is expected that knowledgeable examinees will produce larger correlations (either positive or negative) than unknowledgeable individuals.

In the present experiment, we examine the utility of the three detection measures (SCRs based on the CIT, SVT, and NGT) in differentiating between “guilty,” “informed innocents,” and “uninformed innocents” both when examined immediately after committing a mock crime and 1 week later. In addition, we examine whether memory of the critical items and detection efficiency depend on the type of items used (central vs. peripheral).

Methods

Participants

One hundred and twenty Hebrew University of Jerusalem undergraduate students (86 females and 34 males) participated in the experiment for course credit or payment (they received 40 New Israeli Shekels (NIS), which is equivalent to US$10.50). Their mean age was 24.06 (SD = 3.22) years. Participants were recruited through ads placed on notice boards throughout the campus. All participants signed a consent form indicating that participation was voluntary and that they could withdraw from the experiment at any time without penalty. Eleven participants were eliminated due to unusually high skin resistance levels or excessive movements during the experiment, and eight additional participants were eliminated because they did not commit the mock crime or failed to show up to the second part of the experiment.
These participants were replaced, so the total number of participants remained at 120.

Apparatus

Skin conductance was measured by a constant voltage system (0.5 V; Atlas Researches, Hod Hasharon, Israel). Two Ag/AgCl electrodes (0.8-cm diameter) were used with a 0.05 M NaCl electrolyte. The experiment was conducted in an air-conditioned laboratory, and an NEC CF-500 computer was used to control the stimulus presentation and compute skin conductance changes. The stimuli were displayed on the computer monitor.

Design

A 3 × 2 between-participants design was used, with the following two orthogonal factors: (a) group: participants either performed a mock crime (“guilty” condition), did not perform a mock crime but were informed about the relevant details (“informed-innocent” condition), or did not perform the mock crime and had no knowledge of the relevant details (“uninformed-innocent” condition); and (b) time of test: immediately after the first stage of the experiment (see below; immediate condition), or after 1 week (delayed condition). The participants were randomly allocated to the six conditions created by this design, with 20 participants in each condition.

Procedure

The experiment was conducted in two stages:

Stage 1

Participants arrived at the laboratory individually at the predetermined time. They were met by an assistant who read out loud the instructions appropriate for their particular condition.

Guilty participants were instructed to go to an office of a staff member and ask for a particular numbered article. They were told that, if the staff member was not in his office, they should open the office using a key that was handed to them in advance, enter the room, and find the particular article in a pile of numbered articles placed on the desk. In addition, they were requested to take advantage of the situation and steal an envelope with money and a jewel, to hide it in a mail box that was indicated to them, and then to enter the laboratory and hand over the requested article to the assistant. Actually, the staff member was never in the office, and thus all participants in this experimental condition were able to steal the envelope. Upon arrival at the designated office, participants faced a locked door with the name of the staff member and the office number typed on it. They opened the office using the key, and, when they entered the office, they saw that the light was turned on.
On the desk, they found a pile of numbered articles with a newspaper on top of it. Beside the pile were a family photo and a soft drink bottle. They found the requested article and looked for the envelope in the room. The envelope was located in the first drawer of a cabinet. It was a colored envelope with a date on it. The envelope was open, and contained Euro bills and a jewel. A note with a name was attached to the bills by an office clip. After checking its contents, the participants stole the envelope, dropped it in the mail box, and returned to the lab with the requested article.

A total of six profiles of items were used in the CIT. Each of these profiles was composed of 11 items, described in Table 1.

Table 1. Profiles of Items Used in the Experiments

Profile | Envelope color* | Name on note | Victim’s family name* | Soft drink | Newspaper | Jewel* | Article number* | Office number* | Sum of Euros* | Date | Sex of victim
Buffer | Yellow | Lisa | Topaz | Mineral water | Hazofe | Brooch | 6 | 5 | 26 | 15 | –
a | Green | Marsha | Koren | Sprite | Haaretz | Earrings | 27 | 15 | 8 | 11 | Male
b | Orange | Lora | Morag | Ice-tea | Yediot aharonot | Ring | 15 | 10 | 22 | 26 | Female
c | Blue | Susan | Carmel | Coca-cola | Maariv | Necklace | 19 | 25 | 6 | 28 | Male
d | Red | Judy | Marom | Orange juice | Israel hayom | Neck-pendent | 22 | 20 | 14 | 6 | Female
e | Purple | Ashlee | Zamir | Soda water | Globes | Bracelet | 12 | 30 | 4 | 19 | Female

Note: ‘Sex of victim’ is the only item among the 11 items that did not appear in the CIT, but only on the SVT. Thus, it doesn’t have a buffer profile.
* These items were classified as central.


Table 2. Items Included Only in the SVT

Question | Correct answer | Alternative answer
Currency* | Euro | Dollar
Victim’s title* | Dr. | Professor
Envelope’s condition* | Open | Closed
Family photo | Present | Absent
Position of newspaper* | On top of the pile | Not on top
Drawer* | First | Second
Glasses in drawer | Absent | Present
Light in office* | On | Off
Victim’s gender | See Table 1 | See Table 1

* These items were classified as central.

One profile, the buffer profile, was used only in the interrogation phase of the experiment, and was never used as the relevant profile. One of the other five profiles of items (a–e) was randomly chosen as the relevant profile for each participant, such that each profile served as the relevant profile for 20% of the participants. Eight additional relevant items, which were identical for all participants, were used in this experiment for the SVT. These items are described in Table 2.

Informed-innocent participants read an article entitled “A Scandal: Theft in the Campus.” The article described the mock crime and included all the relevant details (according to the particular profile assigned to the participant). To give the impression that the article was real, it was embedded among other articles in a student newspaper. Participants were not asked to memorize the details in order to preserve the more realistic nature of the manipulation (see Carmel et al., 2003). After reading the article, as a control assignment, the participants were requested to go to the teaching assistants’ mail boxes, where they found a short questionnaire dealing with personal hobbies and interests. They were requested to fill out the questionnaire and drop it into another mail box that was pointed out to them (the same mail box in which the guilty participants were instructed to put the stolen envelope).

Uninformed-innocent participants were exposed to the same procedure as the informed-innocent participants, except that the article they read didn’t reveal any of the relevant details.

Stage 2

The CIT, SVT, and NGT were administered to all participants. Participants in the immediate condition took the tests immediately after Stage 1, and those in the delayed condition took them 1 week later.
An experimenter, who was unaware of the experimental condition to which the examinee was assigned, informed the participants that a theft was committed in the Psychology Department, and that they are suspects in committing this theft. He/she explained that the experiment was designed to test whether they could cope with lie detection tests and convince the examiner that they are innocent of stealing the money and jewel. It was emphasized that beating these tests is a difficult assignment that only few people can succeed in, and they were promised a bonus of 10 NIS (about $2.50) for a successful performance of the task. Subsequently, the participant was attached to the electrodes, and the CIT examination was conducted. The CIT questions were presented after an initial rest period of 2 min, during which skin conductance baseline was recorded. All examinees were presented with ten different questions, each targeting a different relevant detail of the mock crime (the envelope color, the name of the person written on the note, the family name of the office owner, the brand of soft drink, the name of the newspaper, the type of jewel, the number of the requested article, the sum of money, the office’s number, and the date written on the


envelope). The questions were simultaneously presented on the computer monitor and heard through the computer speakers. Each question was followed by a buffer item, designed to absorb the initial orienting response, and a set of five items (the relevant item and four neutral control items). The order of the questions as well as the order of the five items within each question was randomized. Each question was presented for 10 s, and each item (alternative answer) was presented for 5 s. The inter-stimulus interval (blank screen) ranged randomly from 16 to 24 s with a mean of 20 s. Participants were asked to respond verbally, saying “no” to every item. A short, participant-terminated break was given after presentation of five questions.

Upon completion of the CIT, participants were detached from the electrodes and performed the SVT and NGT, using a PC. The SVT consisted of 15 questions, each with 2 alternative answers: the relevant detail (correct answer) and a non-relevant detail (wrong answer). Six of the SVT questions resembled those of the CIT (the envelope color, the name of the person that was written on the note, the family name of the office’s owner, the brand of soft drink, the name of the newspaper, and the type of jewel). For each of these 6 questions, the alternative to the correct answer was chosen randomly from among the 4 control items included in the CIT. The other 9 questions appeared only on the SVT; 8 of them had a fixed alternative answer, while for the 9th (the victim’s gender), the answer depended on the specific profile. These questions, along with the correct and incorrect alternative answers, are displayed in Table 2. The questions appeared on the screen, one at a time, with the two alternative answers.
The participants were instructed as follows: ‘‘Please choose one alternative answer each time and if you do not know the answer, just guess it!’’ Participants were not aware of the length of the test, and thus would have had difficulty adjusting their performance in accordance with chance. The NGT consisted of 4 open questions referring to numerical relevant details, which were included in the CIT (the number of the requested article, sum of money, the office’s number, and the date that was written on the envelope). The participants were informed that answers should be within the range of 1 to 30, and were instructed as follows: ‘‘Please type your answer by using the keyboard and if you do not know the answer, just guess it.’’ Before each test, the experimenter indicated to the participants that the correct answers were known only to the thief. Following Carmel et al. (2003) and Gamer et al. (2010), the 19 questions included in this experiment were classified as either central (questions directly related to the execution of the mock crime) or peripheral (questions related to items that were present in the crime scene, but were unrelated to its execution). Tables 1 and 2 specify for each question whether it was classified as central or peripheral. At the end of the questioning session, the experimenter thanked the participants, and asked them to wait until the computer program processed the data of the tests and reached a decision as to whether they were found ‘‘guilty’’ or ‘‘innocent.’’
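As a rough illustration of what guessing "in accordance with chance" means for a 15-question, two-alternative SVT, the sketch below computes how likely a pure guesser is to land at least a given distance from the expected 7.5 correct answers. It assumes independent, unbiased guesses on every question; the function name `prob_deviation_at_least` is hypothetical and not part of the original study.

```python
from math import comb


def prob_deviation_at_least(n_questions: int, k: float) -> float:
    """Probability that the number of correct answers differs from
    n_questions / 2 by at least k, assuming each answer is an
    independent fair coin flip (an illustrative assumption)."""
    half = n_questions / 2
    favourable = sum(
        comb(n_questions, correct)
        for correct in range(n_questions + 1)
        if abs(correct - half) >= k
    )
    return favourable / 2 ** n_questions


# With 15 questions: 3 or fewer, or 12 or more, correct answers.
p_extreme = prob_deviation_at_least(15, 4.5)
print(f"P(<=3 or >=12 correct out of 15 by guessing) = {p_extreme:.4f}")
```

Under these assumptions, scores this far from chance in either direction are rare for a guesser, which is why a large deviation can be informative.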


The processing took 1 min, and subsequently two memory tests were administered to examine whether participants recalled the relevant items: the first was a recall memory test consisting of the 19 questions used in the three detection tests, and the second was a recognition memory test in which participants were requested to choose the correct alternative on a copy of the SVT and NGT which, together, covered all the 19 questions that were used. Guilty and informed-innocent participants responded to both recall and recognition memory tests, while uninformed-innocents responded only to the recognition memory test. All participants were asked to attempt to recall or recognize the relevant details and guess only when they didn’t know the answer. Level of confidence was rated for each answer on a 6-point scale ranging from 1 (not confident at all) through 6 (very confident). In addition, participants filled out a questionnaire regarding their performance in the experiment. Specifically, they were asked about their motivation to beat the tests, whether or not they used a strategy during the tests, etc. Finally, all participants were debriefed and compensated.

Scoring of the Dependent Measures

SCR

Responses were transmitted in real time to the computer. SCR was defined as the maximal increase in conductance obtained from the examinee, from 1 s to 5 s after stimulus onset, and computed using an A/D (NB-MIO-16) converter with a sampling rate of 50 Hz. To eliminate individual differences in responsivity and permit meaningful comparisons of the responses of different examinees, each participant’s SCR was transformed into within-examinee standard scores (Ben-Shakhar, 1985). To minimize habituation effects, within-block standard scores were used (see Ben-Shakhar & Elaad, 2002b; Elaad & Ben-Shakhar, 1997).
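The within-block standardization can be sketched as follows. This is a minimal illustration, not the authors' software: it assumes equal-sized consecutive blocks, uses the population standard deviation (the text does not say which estimator was used), and the names `within_block_z` and `detection_score` are hypothetical.

```python
from statistics import mean, pstdev


def within_block_z(responses, block_size):
    """Standardize each response relative to the mean and SD of its own block.

    Assumption: blocks are consecutive runs of `block_size` responses and the
    population SD is used; a block with zero variance maps to z = 0.
    """
    z_scores = []
    for start in range(0, len(responses), block_size):
        block = responses[start:start + block_size]
        m, s = mean(block), pstdev(block)
        z_scores.extend((x - m) / s if s > 0 else 0.0 for x in block)
    return z_scores


def detection_score(z_scores, critical_indices):
    """Average the standardized responses elicited by the critical items."""
    return mean(z_scores[i] for i in critical_indices)


# Toy data: two blocks of three responses; the last item of each is critical.
z = within_block_z([1.0, 2.0, 3.0, 10.0, 20.0, 30.0], block_size=3)
print(detection_score(z, critical_indices=[2, 5]))
```

Standardizing within blocks puts early and late responses on a common scale, so a general decline in responding over the session does not by itself lower the score of items presented later.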
The 60 items (see Table 1 for a description of the 10 questions used in the CIT, with 6 alternative items for each question) were divided into 2 blocks, each consisting of 30 items. Thus, the z scores used in this study were computed relative to the mean and standard deviation of the participant’s responses to the 30 items of each block. Finally, two detection scores were computed for each participant (one for each item-type category) by averaging the standardized SCRs elicited by the critical items within each item-type category.

SVT

An unknowledgeable individual (uninformed innocent) is expected to guess the answers on the SVT and thus give about 50% correct answers (chance level). It is hypothesized that a person who is aware of the critical items will be unable to ignore this information when answering the SVT and consequently deviate from chance level performance. Although it is reasonable to assume that individuals attempting to conceal critical items will display below chance level performance on the SVT (e.g., Verschuere et al., 2008), we defined a detection measure based on the SVT as the absolute deviation of the percent of correct answers from chance level (50%). Specifically, this measure was defined as |P − 50%|, where P is the percent of the participant’s correct answers. We used this measure because in some cases knowledgeable individuals may use their knowledge to guess above chance level.

NGT

The NGT-based detection measure was defined as the absolute value of the Pearson correlation coefficient


between the actual values and the values guessed by the participant.¹

Data Analysis and Statistics

Each dependent measure (rates of correctly recognized items and the three detection scores constructed for SCR, SVT, and NGT²) was subjected to a mixed 2 × 3 × 2 analysis of variance (ANOVA), with item-type (central vs. peripheral) serving as a within-subjects factor and group (“guilty,” “informed innocents,” and “uninformed innocents”) and time of CIT (immediate vs. delayed) serving as the 2 between-subjects factors. This was followed by two sets of orthogonal planned contrasts.

The first set, which was designed to examine more closely the effect of the item-type factor and its interaction with the other factors by excluding the “uninformed innocents” (for whom no item-type effect is expected), included the following contrasts: (1) the dependent measure difference between central and peripheral items among “guilty” participants was compared with the respective difference among “informed innocents” (i.e., examining the item-type × group interaction, excluding “uninformed innocents”); (2) the dependent measure difference between central and peripheral items in the immediate condition was compared with the respective difference in the delayed condition (i.e., examining the item-type × time of testing interaction, excluding “uninformed innocents”); and (3) a contrast examining whether the item-type differences reflect a group × time interaction (i.e., whether the item-type differences among “guilty” participants are less affected by delaying the test than the respective differences among “informed innocents”).

The second set of contrasts, which was designed to examine more closely the effects of the between-subjects factors, included the following four planned contrasts: (1) combined “guilty” and “informed innocents” (knowledgeable participants) were compared with the “uninformed innocents”; (2) “guilty” were compared with “informed innocents”; (3) the time effect (defined as the dependent variable difference between the immediate and the delayed conditions) among knowledgeable participants was compared with the time effect among “uninformed innocents”; and (4) the time effect among “guilty” participants was compared with the time effect among “informed innocents.” A rejection region of p < .05 was used for all statistical tests, and effect size estimates were computed using Cohen’s f (Cohen, 1988). One-tailed tests were used to test directional, a priori formulated hypotheses.
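The two behavioral detection measures defined under Scoring of the Dependent Measures can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the SVT score is expressed in percentage points, and returning 0 when a participant gives the same guess for every item (so that the correlation is undefined) is an assumption, not something the text specifies.

```python
from math import sqrt


def svt_score(answers_correct):
    """|P - 50|, where P is the percent of correct SVT answers."""
    percent_correct = 100.0 * sum(answers_correct) / len(answers_correct)
    return abs(percent_correct - 50.0)


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either variable has no variance
    (an assumption made for this sketch)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def ngt_score(true_values, guesses):
    """Absolute value of the correlation between true and guessed values."""
    return abs(pearson_r(true_values, guesses))


# A participant who answers only 3 of 15 SVT questions correctly.
print(svt_score([True] * 3 + [False] * 12))

# Numerical items of profile a in Table 1, with guesses that mirror them.
print(ngt_score([27, 15, 8, 11], [3, 15, 22, 19]))
```

Taking absolute values in both measures means that systematically avoiding the correct answers is scored the same way as systematically choosing them.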

¹ We used a slightly different measure than the one employed by Lieblich and Ninio (1972) and Lieblich et al. (1976). They transformed each negative correlation into the absolute value of the observed correlation plus one. This measure was inefficient because many uninformed innocents produced negative correlations (which were expected when participants are guessing with no prior knowledge) and adding 1 to the absolute value of these correlations inflated the detection measure among these participants and resulted in a high rate of false positives.

² As the NGT is based on a limited number of numerical items, it was impossible to compare central and peripheral items in this context, and thus the item-type factor was not included in the NGT analysis. Thus, a 3 × 2 between-subjects ANOVA, with group and time as the two orthogonal factors, was conducted.

Results

Memory Tests

As the patterns of results of the recall and recognition tests were essentially similar, only the results of the recognition tests are presented. The recognition results of four participants were lost, and thus the following analyses are based on 116 participants. The mean rates of correctly recognized items were computed across the 12 central and the 7 peripheral items, and they are displayed in Figure 1 as a function of experimental condition.

Figure 1. Means and Standard Errors of the rate of correctly recognized items, computed across the 12 central and 7 peripheral questions within each experimental condition.

The ANOVA conducted on the recognition rates yielded the following outcomes. The analysis of the within-subjects factor revealed a statistically significant interaction between item-type and group (F(2,110) = 12.40, f = 0.31, p < .05). The main effect of item-type was not statistically significant (F(1,110) = 2.23, f = 0.07), mainly because item-type differences are neither expected nor observed in the “uninformed innocent” condition. The interaction of item-type with time of testing as well as the triple interaction produced very small and non-significant effects (F < 1 in both tests). The three orthogonal contrasts conducted after excluding the “uninformed innocents” revealed that, consistent with our hypothesis, the advantage of central over peripheral items (i.e., higher recognition rates) was significantly more pronounced among “guilty” than among “informed innocent” participants (t(110) = 3.49, f = 0.27, p < .001). However, in contrast to our hypothesis, the advantage of central over peripheral items was not more pronounced in the delayed than in the immediate test (t(110) = 0.17). Finally, the item-type differences did not reflect a group × time interaction (t(110) = 1.87, f = 0.10).

The analysis of the between-subjects factors revealed statistically significant results for both main effects (F(2,110) = 78.84, f = 1.16, p < .001 for the group factor and F(1,110) = 17.99, f = 0.38, p < .001 for the time of testing factor). The interaction between these two factors was also statistically significant (F(2,110) = 10.07, f = 0.40, p < .001). The four orthogonal contrasts conducted following this analysis revealed that: (1) Combined “guilty” and “informed innocents” (knowledgeable participants) displayed significantly higher rates of correctly recognized items than unknowledgeable participants (t(110) = 8.88, f = 0.82, p < .001). (2) The difference in the rate of correctly recognized items between the “guilty” and the “informed innocents” was not statistically significant (t(110) = 0.95). (3) As expected, the time effect (i.e., a smaller rate of correctly recognized items in the delayed than in the immediate condition) was significantly larger for knowledgeable than for unknowledgeable participants (t(110) = 4.01, f = 0.36, p < .001). (4) Similarly, a significantly larger time effect was found for “informed innocents” than for “guilty” participants (t(110) = 2.66, f = 0.23, p < .01).

SCR

The means of the SCR detection scores, computed across participants within each experimental condition and each item-type category, are displayed in Figure 2. These data were subjected to the same ANOVA conducted for the recognition results. The

Figure 2. Means and Standard Errors of the Standardized SCRs to the Relevant Items, computed across the 6 central and 4 peripheral items within each experimental condition.


item-type factor showed a statistically significant main effect (F(1,114) = 14.18, f = 0.23, p < .001), indicating that central items elicited larger relative SCRs than the peripheral items, but it did not show any statistically significant interactions with the other factors. However, these non-significant interactions may be due to the inclusion of the uninformed innocents, for whom neither item-type nor time of CIT should make a difference. Indeed, the three orthogonal contrasts conducted after excluding the “uninformed innocents” revealed that, consistent with our hypothesis, the advantage in detection of central over peripheral items was more pronounced in the delayed CIT than in the immediate test (t(114) = 1.75, p < .05, one-tailed, f = 0.11). On the other hand, in contrast to our hypothesis, the advantage in detection of central over peripheral items was not more pronounced in the “guilty” than in the “informed innocents” (t(114) = 1.46, f = 0.08). Finally, the contrast examining whether the item-type differences reflect a group × time interaction did not yield a statistically significant result (t(114) = 0.49).

The analysis of the between-subjects factors revealed a statistically significant group effect (F(2,114) = 14.59, f = 0.48, p < .001) and a smaller time effect (F(1,114) = 3.56, p < .05, one-tailed, f = 0.15), reflecting larger relative SCRs in the immediate than in the delayed condition. The group factor also showed a statistically significant interaction with time (F(2,114) = 4.49, f = 0.24, p < .05). This interaction was expected, as time of CIT should affect only knowledgeable participants.
The four orthogonal contrasts, conducted to examine more closely the group effect and its interaction with time, revealed that: (1) knowledgeable participants showed a significantly larger SCR detection score than non-knowledgeable participants (t(114) = 4.20, f = 0.37, p < .001); (2) “guilty” participants did not differ significantly from “informed innocents” (t(114) = 0.47); (3) the time effect (larger detection score in the immediate than in the delayed condition) was significantly larger for knowledgeable participants than for “uninformed innocents” (t(114) = 1.72, p < .05, one-tailed, f = 0.13); and (4) the comparison of the time effect on “guilty” vs. “informed innocents” did not yield a statistically significant outcome (t(114) = −1.02, f = 0.02).

SVT

The means of the SVT detection scores computed across participants are displayed in Figure 3 as a function of item-type and experimental condition.

The data of Figure 3 were subjected to the same analyses applied to the CIT and the recognition results. Surprisingly, central items produced a significantly smaller SVT detection score than peripheral items (F(1,114) = 3.86, f = 0.11, p = .052). However, an inspection of Figure 3 reveals that this trend was due to differences between the two item-types in the uninformed innocents, who are obviously guessing and are unable to differentiate between central and peripheral items. Indeed, when the uninformed innocents were excluded, the differences between the two item-types were no longer significant. In addition, no statistically significant interactions between item-type and the other factors were found. The same three planned contrasts involving the item-type factor were computed as in the previous analyses, and none revealed a statistically significant outcome (t(114) = 1.26, f = 0.06 for the item-type × time interaction; t(114) = 0.45 for the item-type × group interaction; and t(114) = −1.51, f = 0.09 for the triple interaction).

The analysis of the between-subjects factors revealed that both main effects and their interaction produced statistically significant outcomes (F(2,114) = 8.58, f = 0.36, p < .001 for the group factor; F(1,114) = 3.23, p < .05, one-tailed, f = 0.14, for the time factor, reflecting a larger SVT detection score in the immediate than in the delayed condition; and F(2,114) = 3.46, f = 0.20, p < .05 for the group × time interaction, indicating that, as expected, the reduction over time in the detection measure was small in the “guilty” condition, but much more pronounced with the “informed innocents”). To examine these effects more closely, we conducted the same four planned contrasts computed for the CIT and recognition data.
The results of these analyses were generally similar to the SCR results, indicating that, while knowledgeable participants displayed a larger average value of the SVT detection score than unknowledgeable participants (t(114) = 4.27, f = 0.38, p < .001), there were no significant differences between “informed innocents” and “guilty” participants (t(114) = −0.12). The effect of time of testing was larger for knowledgeable participants than for unknowledgeable participants (t(114) = 3.17, f = 0.27, p < .001) and, unlike the SCR results, it was larger for “informed innocents” as compared with “guilty” (t(114) = 2.44, f = 0.25, p < .01).

NGT

The means of the NGT detection scores computed across participants within each condition are presented in Figure 4. These

Figure 3. Means and Standard Errors of the SVT-based detection measure, computed across the 9 central and 6 peripheral items within each experimental condition.


Figure 4. Means and Standard Errors of the NGT-based detection measure computed within each experimental condition.

data are based on 113 participants, as seven participants guessed the same value for all 4 questions, and thus it was impossible to compute a detection measure for them. A 3 × 2 between-subjects ANOVA, with group and time as the two orthogonal factors, was conducted on the data of Figure 4. This analysis yielded a statistically significant group × time of test interaction (F(2,107) = 3.47, f = 0.21, p < .05). To further examine the nature of this interaction and possible group differences, we conducted the same 4 planned contrasts computed for the analysis of the recognition, CIT, and SVT data. Knowledgeable participants showed significantly larger NGT detection scores than unknowledgeable participants (t(107) = 2.24, f = 0.19, p < .05), but there was no significant difference between the “guilty” and the “informed innocents” (t(107) = 1.02, f = 0.02). In addition, the reduction in the detection score over time of testing was significantly larger with knowledgeable than with unknowledgeable participants (t(107) = 2.57, f = 0.22, p < .01), but no time effect differences were found between the “guilty” and the “informed innocents” (t(107) = 0.44).

ROC Curves

An additional approach for describing and comparing detection efficiency was adopted from Signal Detection Theory (SDT; e.g., Green & Swets, 1966; Swets, Tanner, & Birdsall, 1961). This approach is particularly useful for analyzing psychophysiological as well as behavioral detection data, and it has been applied extensively in this area (e.g., Ben-Shakhar & Elaad, 2003; National Research Council, 2003). Typically, detection

efficiency is defined in terms of the relationship between the detection measure and the actual guilt (or knowledge of the relevant items). In SDT terms, this is measured by a ROC curve reflecting the degree of separation between the distributions of the detection score of “guilty” and “innocent” participants. In the present experiment, there are two groups of knowledgeable participants, and the ROC for each of these groups was constructed by comparing the detection score distribution of the knowledgeable participants (either “guilty” or “informed innocents”) with the respective distribution of the “uninformed innocents.” These ROCs were constructed within each experimental condition for each measure, based on the 12 central, the 7 peripheral, as well as all 19 items.

In addition, we examined the possibility of combining the three detection measures and constructed additional ROC curves, one based on a combination of the SCR and SVT and another based on a combination of all three measures. The measures were combined by using simple averages of the standardized detection measures (each measure was first transformed into standard scores based on the entire sample, and then the three standardized measures were averaged). We did not apply optimal weights to the three detection measures to avoid the possibility of inflating detection efficiency estimates due to capitalization on chance.

Table 3 displays the areas under the ROC curves of the various measures as a function of item-types and experimental conditions. An inspection of Table 3 reveals that detection efficiency of “guilty” participants, as reflected by the ROC area, ranged in the immediate testing from 0.69 to 0.82 when a single

Table 3. Areas Under the ROC Curves Computed for Each Detection Measure and for 2 Combinations of these Measures, Within Each Item Category, Across Categories, and Within Each Experimental Condition

| Group | Time of test | CIT: All | CIT: Central | CIT: Peripheral | SVT: All | SVT: Central | SVT: Peripheral | NGT: All | CIT+SVT: All | All 3 Tests: All |
|---|---|---|---|---|---|---|---|---|---|---|
| Guilty | Immediate | 0.82** | 0.78** | 0.77** | 0.77** | 0.69* | 0.73* | 0.81** | 0.94** | 0.97** |
| Guilty | Delayed | 0.76** | 0.80** | 0.55 | 0.68 | 0.77** | 0.62 | 0.49 | 0.84** | 0.80** |
| Informed-innocent | Immediate | 0.91** | 0.76** | 0.87** | 0.87** | 0.81** | 0.83** | 0.70* | 0.97** | 0.97** |
| Informed-innocent | Delayed | 0.64 | 0.74* | 0.44 | 0.54 | 0.64 | 0.50 | 0.46 | 0.65 | 0.66 |

*p < .05; **p < .01.


measure is considered, and it increased to a level of 0.94, or even 0.97, when two or three measures were combined. This increase in the ability to differentiate “guilty” from “uninformed innocents” with the addition of the two behavioral measures can be accounted for by the fact that these behavioral measures reflect different psychological processes than the psychophysiological measure. Indeed, the Pearson correlation coefficients among the three measures, computed across all knowledgeable participants, were nearly zero (ranging between −0.05 and 0.02).

In the delayed testing condition, detection efficiency generally decreased, and both the SVT and NGT produced detection efficiency estimates that did not significantly exceed a chance level of 0.50. The SCR, on the other hand, remained relatively stable and produced an area of 0.76 when all items were considered and 0.80 when only central items were used. The addition of the SVT further increased the area to a level of 0.84 in the delayed condition. While this may be seen as good news, it must be qualified by the relatively high areas obtained for the “informed innocents,” which means that the risk of false-positive outcomes when critical information is leaked may be severe. This danger is particularly severe in immediate testing, but much less so when the test is delayed. In fact, in almost all cases, the areas computed for the “informed innocents” in the delayed testing were not significantly larger than chance. For example, the ROC area computed for the “informed innocents” decreased from 0.91 to 0.64 when only the SCR was used and from 0.97 to 0.66 when all three measures were used.
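The combination rule described in this section (standardize each detection measure over the entire sample, then take a simple unweighted average) can be sketched as follows. This is an illustrative sketch only: the function names and the example scores are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which SD estimator was used.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Convert raw scores to z-scores using the mean and SD of the whole sample."""
    m = mean(scores)
    sd = pstdev(scores)  # population SD; an assumption of this sketch
    if sd == 0:
        raise ValueError("cannot standardize a constant measure")
    return [(x - m) / sd for x in scores]


def combine_measures(*measures):
    """Average the z-scores of several measures, participant by participant."""
    z_columns = [standardize(m) for m in measures]
    return [mean(per_participant) for per_participant in zip(*z_columns)]


# Hypothetical detection scores for five participants (not data from the study).
scr = [0.9, 0.1, 0.4, -0.2, 0.6]
svt = [3.0, 1.0, 2.0, 0.0, 4.0]
combined = combine_measures(scr, svt)
print([round(c, 2) for c in combined])
```

Equal weights involve no parameters fitted to the sample, which is in line with the stated aim of avoiding capitalization on chance.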
To further examine whether “guilty” and “informed innocents” can be differentiated, additional ROC curves were constructed, such that sensitivity represented the rate of correctly classifying “guilty” participants and the false-positive rate represented the proportion of “informed innocents” classified as “guilty.” The results of this analysis revealed that all the areas under the ROC curves in the immediate condition were around a chance level of 0.50, but that they increased somewhat in the delayed condition (e.g., the area increased from 0.48 to 0.65 for the SCR and from 0.44 to 0.72 for the combination of SCR and SVT), implying that false-positive errors due to information leakage may be attenuated when the test is delayed.
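The area under an empirical ROC curve equals the probability that a randomly chosen member of the target group has a higher detection score than a randomly chosen member of the comparison group, with ties counted as one half. The sketch below uses this equivalence; it is not the authors' procedure, and the function name and the example scores are hypothetical.

```python
def roc_area(target_scores, comparison_scores):
    """Area under the empirical ROC curve, computed by pairwise comparison.

    Each (target, comparison) pair scores 1 if the target value is higher,
    0.5 if the two values are tied, and 0 otherwise.
    """
    wins = 0.0
    for t in target_scores:
        for c in comparison_scores:
            if t > c:
                wins += 1.0
            elif t == c:
                wins += 0.5
    return wins / (len(target_scores) * len(comparison_scores))


# Hypothetical detection scores (not data from the study).
guilty = [0.9, 0.4, 0.7, 0.2]
uninformed = [0.1, 0.5, -0.3, 0.0]
print(roc_area(guilty, uninformed))  # 0.875
```

An area of 0.50 means the two score distributions cannot be separated, which is the chance level referred to in the text; passing “guilty” scores as the target and “informed innocent” scores as the comparison group gives the second kind of ROC analysis described above.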

Discussion

The results of this experiment join many previous studies in demonstrating that the CIT can be a powerful tool in differentiating between individuals possessing critical information and those who were not exposed to this information. However, the present results also demonstrate that, at least when tested immediately, individuals who actually committed the mock crime cannot be differentiated from those who were just exposed to the critical information in a neutral context. This pattern was revealed in each of the three detection measures employed in this study as well as when participants’ memory for the critical items was examined after they took the various tests. It can be argued that this result reflects the fact that the standard version of the CIT (the GKT), rather than the GAT proposed by Bradley and his colleagues (e.g., Bradley et al., 1997), was used in this experiment. On the other hand, our results with respect to the informed innocents are quite similar to those reported recently by Gamer et al. (2010), who used the GAT. Furthermore, in an additional study, Gamer (2010) directly compared the GAT with the standard CIT (or GKT) and found that, while both formats


were equally effective in differentiating between knowledgeable and unknowledgeable individuals, they were also equally ineffective in differentiating between guilty and informed innocents.

All measures used in this experiment reflect, as expected, an effect of time among knowledgeable participants. More interestingly, most measures revealed a stronger time effect (a decrement of the detection measure in the delayed condition relative to the immediate condition) among “informed innocents” as compared with “guilty” participants. However, this tendency was statistically significant only in the ANOVAs conducted for the recognition test and the SVT, but not when the SCRs and the NGT were used. The differential time effect is also revealed in the ROC analysis (see Table 3), where the decline in the area statistic was smaller among “guilty” participants (e.g., from 0.82 to 0.76 for SCR; from 0.97 to 0.80 for all measures combined) than among “informed innocents” (e.g., from 0.91 to 0.64 for SCR and from 0.97 to 0.65 for all three measures). This finding is consistent with the results reported by Gamer et al. (2010), who used a combination of autonomic measures and demonstrated that, while the area statistic did not show any decline in the delayed test for the “guilty” participants (0.89 and 0.90 in the immediate and delayed tests, respectively), “informed innocents” showed a considerable decline (from 0.95 to 0.75).

This result may reflect the roles of involvement and active task-participation in memory. Individuals who actually committed the mock crime took an active part in producing the items to be remembered, while “informed innocents” became aware of the critical details through reading a newspaper.
This difference between the two groups does not affect their responses in the immediate testing, but it does affect memory and, consequently, differential responding to the critical items shows greater decline with time among “informed innocents” than among “guilty.” This account is consistent with an extensive literature on the “generation effect” in memory (e.g., Slamecka & Graf, 1978; deWinstanley, 1995; deWinstanley & Bjork, 2004), demonstrating that individuals tend to remember information better when they take an active part in producing it. For example, participants who generated words by themselves (e.g., generated the opposite of a given word) subsequently remembered them better than participants who read the same words (Slamecka & Graf, 1978). Similarly, the superiority of memory for actions (self-performed tasks) over memory for verbally learned material (verbally learned tasks) has been demonstrated to be highly robust (“the enactment effect”; Engelkamp, 1998). By the same token, “guilty” participants, who actually experienced the event, enacted the mock crime, and had direct contact with the critical items, were more involved in the task and thus remembered these items better than “informed innocents,” who were exposed to the concealed items by reading about them.

The practical implication of these results is that, although great caution must be exercised against the possibility of information leakage, this problem may be less severe in actual applications of the CIT, because CITs are typically not conducted immediately after a crime is committed, and it may often take a few weeks to identify potential suspects and design a CIT. Ideally, of course, the CIT should not be conducted at all with suspects who were informed about the critical information, and sometimes such suspects can be identified by a proper pre-test interview.
However, suspects in criminal offenses may be reluctant to disclose knowledge of crime-related items, even when they did not commit the crime and the critical information was leaked to them, because they cannot be certain that they will be believed to


have obtained this guilty knowledge through leakage. For example, such suspects are often unable to explain how they became aware of the crime-related details (see Ben-Shakhar et al., 1999, for a more detailed discussion of this issue). Consequently, it is impossible to guarantee, in practice, that guilty and only guilty suspects have knowledge of the critical information, and, therefore, any means that may minimize the risks involved in testing informed innocent suspects is important.

Our results differ somewhat from those reported by Gamer et al. (2010) in demonstrating attenuation in detection efficiency of “guilty” participants after 1 week. As indicated earlier, Gamer et al. (2010) did not find any time effect on ROC area with “guilty” subjects. Similarly, Carmel et al. (2003), who used only “guilty” subjects, reported identical ROC areas in the immediate and delayed conditions when the standard mock crime was applied (0.84 in both conditions), but with the more realistic version of the mock crime, detection efficiency showed some decline when the test was delayed (from 0.71 to 0.68). Thus, the present results and those reported by Carmel et al. (2003) suggest that, in a more realistic mock crime, some reduction in SCR-CIT detection efficiency may be expected when the test is delayed. However, as Gamer et al. (2010) reached a different conclusion, this issue may require further research.

Another important aspect of the present results is the differentiation between central and peripheral items. As predicted, central items produced higher SCR detection efficiency, and this effect was stronger when the test was delayed. This is most clearly reflected by the ROC analysis where, in the immediate CIT, the two item-types produced either similar areas (0.78 and 0.77 for central and peripheral items, respectively, with “guilty” participants) or even an advantage of peripheral items in the “informed innocents” (0.76 vs.
0.87 for central and peripheral items, respectively). In the delayed condition, on the other hand, the areas remained stable for the central items (0.80 and 0.74 for “guilty” and “informed innocents,” respectively) but declined drastically when only peripheral items were used (0.55 and 0.44, both not significantly different from a chance area of 0.50). The ROC analysis for the SVT reveals a similar pattern (see Table 3), although the ANOVA conducted on the SVT detection measure did not reveal a statistically significant item-type × time interaction. Furthermore, when the test is delayed, relying on just the central items results in larger areas than relying on all items, and this pattern is reflected by both the SCR and the SVT. In this respect, the present results strengthen the conclusion made by both Carmel et al. (2003) and Gamer et al. (2010), namely, that when constructing a CIT, an effort should be made to identify as many central items as possible. Ben-Shakhar and Elaad (2003) demonstrated that CITs based on at least five questions produce optimal detection efficiency. However, it is doubtful whether it would be possible to identify five central features of a crime in the realistic criminal context, and it is unclear from the present results as well as from Carmel et al. (2003) and Gamer et al. (2010) whether adding peripheral items would be beneficial. One option would be to use only central items and repeat each question several times (see Ben-Shakhar & Elaad, 2002b; Elaad & Ben-Shakhar, 1997), but this requires additional research, as the previous examinations of item-repetition effects did not relate to the distinction between central and peripheral items, nor did they relate to the crucial factor of delaying the test.

The inclusion of two behavioral measures in addition to SCRs allows us to examine how these measures are affected by delaying


the test and also to assess their incremental validity when combined with the physiological measure. Both the SVT and the NGT showed the expected time effect on knowledgeable participants. Furthermore, the SVT demonstrated a significantly larger time effect for “informed innocents” than for the “guilty” participants, implying that, in realistic conditions where the CIT is almost always delayed, its vulnerability to information leakage may be reduced. The present results also demonstrate that these behavioral measures may be useful when used in combination with physiological measures in enhancing the validity of the CIT. For example, when adding the SVT to the SCR measure, the area under the ROC curve for detecting “guilty” participants in the immediate test increased from 0.82 to 0.94, and adding the NGT further increased the area to 0.97. In the delayed test, the addition of the SVT increased the area from 0.76 to 0.84, but no further increase with the NGT was revealed. Clearly, the addition of these behavioral measures also increases the likelihood of false-positive outcomes in the “informed innocents,” at least in the immediate testing (the area increased from 0.91 to 0.97). Interestingly, erroneous detection of “informed innocents” in the delayed condition is relatively minor, and the addition of the SVT and NGT does not make any difference (i.e., the area increased slightly from 0.64 to 0.65 and 0.66, and none of these values is significantly larger than a chance area of 0.50). These results, which are consistent with the results of the second experiment reported by Meijer et al. (2007), imply that the SVT can be a valuable addition to the traditional physiological measures in applied settings. Of course, it is premature to make a definitive recommendation at this stage, and various aspects of this behavioral measure must be further investigated.
In particular, it will be important to study its vulnerability to countermeasures and to devise algorithms protecting it from countermeasure attempts. The present study did not include a systematic examination of the effects of countermeasures on the SVT, but a post-experiment interview with the participants revealed that some knowledgeable participants tried to use sophisticated strategies to produce a random pattern (e.g., ignoring the content of the questions and answers, always choosing the answers that appeared at the right (or left) side of the screen). This issue was examined by Verschuere et al. (2008), who coached half of their participants not to perform below chance level. Indeed, none of the coached participants performed below chance level, and consequently they were not detected, but 21% of these participants were detected when a run test was applied to detect deviations in the number of response alternations. Although these results cast doubt on the utility of the SVT as an aid in criminal investigations, it is possible that additional algorithms could be developed to detect deviations from randomness. Future studies should be conducted to further examine the vulnerability of the SVT to countermeasures and the effectiveness of various methods to detect deviations from randomness.

An even greater degree of caution should be exercised regarding the use of the NGT. Although the present results show that it may have potential, it has to be remembered that, in this experiment, the NGT was based on just four questions, and correlation coefficients derived from such a small profile may be unreliable. In addition, it should be noted that the use of only four NGT questions reflects an inherent difficulty in identifying critical numerical items. Thus, it is suggested that the validity of the NGT and its potential as an additional detection measure in forensic applications should be further explored before any conclusions are reached.
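The fragility of a four-item profile can be illustrated with the raw Pearson correlation between actual and guessed values, which is the quantity on which the NGT measure is built. The sketch below is an assumption-laden illustration rather than the authors' scoring code: the function name and the example values are hypothetical, and any further transformation applied to the correlation in the study is not reproduced here.

```python
from math import sqrt


def ngt_correlation(actual, guessed):
    """Pearson correlation between actual and guessed values.

    Returns None when either profile has zero variance (for example, when
    the same value is guessed for every question), because the correlation
    is then undefined.
    """
    n = len(actual)
    mean_a = sum(actual) / n
    mean_g = sum(guessed) / n
    ss_a = sum((a - mean_a) ** 2 for a in actual)
    ss_g = sum((g - mean_g) ** 2 for g in guessed)
    if ss_a == 0 or ss_g == 0:
        return None
    cov = sum((a - mean_a) * (g - mean_g) for a, g in zip(actual, guessed))
    return cov / sqrt(ss_a * ss_g)


# Hypothetical values for four numerical items (not data from the study).
actual_values = [3, 7, 12, 20]
print(ngt_correlation(actual_values, [4, 6, 15, 18]))  # strong positive value
print(ngt_correlation(actual_values, [5, 5, 5, 5]))    # None: constant guesses
```

The `None` case corresponds to the participants who guessed the same value on all four questions and for whom no detection measure could be computed. With only four pairs of values, changing a single guess can move the coefficient substantially, which is one way to see why such a short profile may be unreliable.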


Finally, it should be pointed out once again that this study focused on just two aspects differentiating the laboratory mock crime set-up from realistic criminal investigations (memory of various types of critical items when the test is delayed and leakage of critical information to innocent suspects). Clearly, there are various emotional and motivational differences between mock crime studies and criminal investigations that may affect differential responding to the critical items. It is, of course, an empirical question whether the present results as well as the large body of CIT research, which is based on mock crime paradigms, will generalize to the forensic usage of the CIT. We believe, following Lykken (1974) that, since the CIT, unlike

the Comparison Questions Test (CQT), focuses on the detection of specific knowledge stored in memory, rather than on the detection of deception, it would be relatively unaffected by the increased emotional arousal associated with realistic criminal investigations. A study by Kugelmass and Lieblich (1966) who successfully manipulated emotional arousal and stress and found a general increase in measures of physiological arousal, but no effect on differential responding to the relevant information, provides some empirical support for this belief. But clearly, further research focusing on the emotional and motivational factors and their effect on the outcomes of the CIT is needed.

REFERENCES

Ben-Shakhar, G. (1985). Standardization within individuals: A simple method to neutralize individual differences in psychophysiological responsivity. Psychophysiology, 22, 292–299.
Ben-Shakhar, G., Bar-Hillel, M., & Kremnitzer, M. (2002). Trial by polygraph: Reconsidering the use of the GKT in court. Law and Human Behavior, 26, 527–541.
Ben-Shakhar, G., & Elaad, E. (2002a). The Guilty Knowledge Test (GKT) as an application of psychophysiology: Future prospects and obstacles. In M. Kleiner (Ed.), Handbook of polygraph testing (pp. 87–102). San Diego, CA: Academic Press.
Ben-Shakhar, G., & Elaad, E. (2002b). Effects of questions’ repetition and variation on the efficiency of the guilty knowledge test: A reexamination. Journal of Applied Psychology, 87, 972–977.
Ben-Shakhar, G., & Elaad, E. (2003). The validity of psychophysiological detection of deception with the Guilty Knowledge Test: A meta-analytic review. Journal of Applied Psychology, 88, 131–151.
Ben-Shakhar, G., & Furedy, J. J. (1990). Theories and applications in the detection of deception: A psychophysiological and international perspective. New York: Springer-Verlag.
Ben-Shakhar, G., Gronau, N., & Elaad, E. (1999). Leakage of relevant information to innocent examinees in the GKT: An attempt to reduce false-positive outcomes by introducing target stimuli. Journal of Applied Psychology, 84, 651–660.
Bradley, M. T., Barefoot, C., & Arsenault, A. (2010). Leakage of information to innocents. In B. Verschuere, G. Ben-Shakhar, & E. Meijer (Eds.), Memory detection: Theory and application of the Concealed Information Test. Cambridge, UK: Cambridge University Press, Forthcoming.
Bradley, M. T., MacLaren, V. V., & Carle, S. B. (1997). Deception and nondeception in guilty knowledge and guilty actions polygraph tests. Journal of Applied Psychology, 81, 153–160.
Bradley, M. T., & Rettinger, J. (1992). Awareness of crime-relevant information and the guilty knowledge test. Journal of Applied Psychology, 77, 55–59.
Bradley, M. T., & Warfield, J. F. (1984). Innocence, information, and the guilty knowledge test in the detection of deception. Psychophysiology, 21, 683–689.
Carmel, D., Dayan, E., Naveh, A., Raveh, O., & Ben-Shakhar, G. (2003). Estimating the validity of the Guilty Knowledge Test from simulated experiments: The external validity of mock crime studies. Journal of Experimental Psychology: Applied, 9, 261–269.
Cohen, J. E. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum.
deWinstanley, P. A. (1995). A generation effect can be found during naturalistic learning. Psychonomic Bulletin & Review, 2, 538–541.
deWinstanley, P. A., & Bjork, E. L. (2004). Processing strategies and the generation effect: Implications for making a better reader. Memory & Cognition, 32, 945–955.
Elaad, E. (1998). The challenge of the concealed knowledge polygraph test. Expert Evidence, 6, 161–187.
Elaad, E., & Ben-Shakhar, G. (1997). Effects of items’ repetitions and variations on the efficiency of the guilty knowledge test. Psychophysiology, 34, 587–596.
Engelkamp, J. (1998). Memory for actions. East Sussex, UK: Psychology Press Publishers.


Gamer, M. (2010). Does the guilty action test allow for differentiating guilty participants from informed innocents? A re-examination. International Journal of Psychophysiology, 76, 19–24.
Gamer, M., Bauermann, T., Stoeter, P., & Vossel, G. (2007). Covariations among fMRI, skin conductance and behavioral data during processing of concealed information. Human Brain Mapping, 28, 1287–1301.
Gamer, M., & Berti, S. (2010). Task relevance and recognition of concealed information have different influences on electrodermal activity and event-related brain potentials. Psychophysiology, 47, 355–364.
Gamer, M., Kosiol, D., & Vossel, G. (2010). Strength of memory encoding affects physiological responses in the Guilty Action Test. Biological Psychology, 83, 101–107.
Gamer, M., Verschuere, B., Crombez, G., & Vossel, G. (2008). Combining physiological measures in the detection of concealed information. Physiology and Behavior, 95, 333–340.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: John Wiley & Sons.
Iacono, W. I. (2010). Encouraging the use of the guilty knowledge test (GKT): What the GKT has to offer to law enforcement. In B. Verschuere, G. Ben-Shakhar, & E. Meijer (Eds.), Memory detection: Theory and application of the Concealed Information Test. Cambridge, UK: Cambridge University Press, Forthcoming.
Kraphol, D. (2010). Practical limitations of the concealed information test in criminal cases. In B. Verschuere, G. Ben-Shakhar, & E. Meijer (Eds.), Memory detection: Theory and application of the Concealed Information Test. Cambridge, UK: Cambridge University Press, Forthcoming.
Kugelmass, S., & Lieblich, I. (1966). Effects of realistic stress and procedural interference in experimental lie detection. Journal of Applied Psychology, 50, 211–216.
Langleben, D. D., Loughead, J. W., Bilker, W. B., Ruparel, K., Childress, A. R., Busch, S. I., & Gur, R. C. (2005). Telling truth from lie in individual subjects with fast event-related fMRI. Human Brain Mapping, 26, 262–272.
Lieblich, I., & Ninio, A. (1972). Detection of suppressed involvement with information through a forced number-guessing technique. Acta Psychologica, 36, 381–387.
Lieblich, I., Shaham, E., & Ninio, A. (1976). Effects of time stress and stimulus-response set size on the efficiency of detection of involvement with suppressed information through the use of the forced number-guessing technique. Acta Psychologica, 40, 75–84.
Lykken, D. T. (1959). The GSR in the detection of guilt. Journal of Applied Psychology, 43, 385–388.
Lykken, D. T. (1960). The validity of the guilty knowledge technique: The effects of faking. Journal of Applied Psychology, 44, 258–262.
Lykken, D. T. (1974). Psychology and the lie detector industry. American Psychologist, 29, 725–739.
Lykken, D. T. (1998). A tremor in the blood: Uses and abuses of the lie detector. New York: Plenum Trade.
Marston, W. M. (1917). Systolic blood pressure symptoms of deception. Journal of Experimental Psychology, 2, 117–163.
Meijer, E. H., Smulders, F. T. Y., Johnston, J. E., & Merckelbach, H. L. G. J. (2007). Combining skin conductance and forced choice in the detection of concealed information. Psychophysiology, 44, 814–822.

Merckelbach, H. L. G. J., Hauer, B., & Rassin, E. (2002). Symptom validity testing of feigned dissociative amnesia: A simulation study. Psychology, Crime and Law, 8, 311–318.
Nakayama, M. (2002). Practical use of the concealed information test for criminal investigation in Japan. In M. Kleiner (Ed.), Handbook of polygraph testing (pp. 49–86). San Diego, CA: Academic Press.
National Research Council. (2003). The polygraph and lie detection. Committee to Review the Scientific Evidence on the Polygraph. Washington: The National Academies Press.
Osugi, A. (2010). Daily application of the CIT: Japan. In B. Verschuere, G. Ben-Shakhar, & E. Meijer (Eds.), Memory detection: Theory and application of the Concealed Information Test. Cambridge, UK: Cambridge University Press, Forthcoming.
Pankratz, L., Fausti, S. A., & Peed, S. (1975). A forced-choice technique to evaluate deafness in the hysterical or malingering patient. Journal of Consulting and Clinical Psychology, 43, 421–422.
Podlesny, J. A. (1993). Is the guilty knowledge polygraph technique applicable in criminal investigations? A review of FBI case records. Crime Laboratory Digest, 20, 57–61.
Raskin, D. C. (1989). Polygraph techniques for the detection of deception. In D. C. Raskin (Ed.), Psychological methods in criminal investigation and evidence (pp. 247–296). New York: Springer-Verlag.
Reid, J. E., & Inbau, F. E. (1977). Truth and deception: The Polygraph (“Lie Detection”) Technique. Baltimore: Williams and Wilkins.
Rosenfeld, J. P., Labkovsky, E., Winograd, M., Lui, M. A., Vandenboom, C., & Chedid, E. (2008). The Complex Trial Protocol (CTP): A new, countermeasure-resistant, accurate P300-based method for detection of concealed information. Psychophysiology, 45, 906–919.
Rosenfeld, J. P., Shue, E., & Singer, E. (2007). Single versus multiple probe blocks of P300-based concealed information tests for autobiographical versus incidentally learned information. Biological Psychology, 74, 396–404.
Slamecka, N. J., & Graf, P. (1978). The generation effect: Delineation of a phenomenon. Journal of Experimental Psychology: Learning, Memory & Cognition, 4, 492–604.
Swets, J. A., Tanner, W. P. Jr., & Birdsall, T. C. (1961). Decision processes in perception. Psychological Review, 68, 301–340.
Verschuere, B., Crombez, G., De Clercq, A., & Koster, E. (2004). Autonomic and behavioral responding to concealed information: Differentiating defensive and orienting responses. Psychophysiology, 41, 461–466.
Verschuere, B., Crombez, G., & Koster, E. (2004). Orienting to guilty knowledge. Cognition & Emotion, 18, 265–279.
Verschuere, B., Meijer, E., & Crombez, G. (2008). Symptom validity testing for the detection of simulated amnesia: Not robust to coaching. Psychology, Crime, & Law, 14, 523–528.
Vrij, A. (2008). Detecting lies and deceit. Pitfalls and opportunities (Second Edition). West Sussex: John Wiley and Sons.

(Received May 5, 2010; Accepted September 17, 2010)
