Proceedings of the 38th Hawaii International Conference on System Sciences - 2005

Modality Effects in Deception Detection and Applications in Automatic-Deception-Detection

Tiantian Qin, University of Arizona
Judee K. Burgoon, University of Arizona
J. P. Blair, The University of Texas at San Antonio
Jay F. Nunamaker, Jr., University of Arizona

{tqin, jBurgoon, jnunamaker}@cmi.arizona.edu; [email protected]

Abstract

Modality is an important context factor in deception, which is context-dependent. In order to build a reliable and flexible tool for automatic-deception-detection (ADD), we investigated the characteristics of verbal cues to deceptive behavior in three modalities: text, audio, and face-to-face communication. Seven categories of verbal cues (21 cues) were studied: quantity, complexity, diversity, verb nonimmediacy, uncertainty, specificity, and affect. After testing the interaction effects between modality and condition (deception or truth), we found significance only with specificity and observed that differences between deception and truth were in general consistent across the three modalities. However, modality had strong effects on verbal cues. For example, messages delivered face-to-face were largest in quantity (number of words, verbs, and sentences), followed by the audio modality; text messages were the sparsest. These modality effects are an important factor in building baselines in ADD tools, because they make it possible to adjust the baseline for an unknown modality according to a known baseline, thereby simplifying the process of ADD. The paper discusses in detail the implications of these findings on modality effects in three modalities.

1. Background

Deception detection is one of the most exciting research areas in communication. By definition, deception is a communication of information through which a sender undertakes to create a false impression on receivers [1, 2]. Modern technology has made possible the automatic detection of deception (ADD) [3, 4]. The goal of ADD is to identify reliable indicators (cues) that differentiate between deceptive and truthful messages, establish baselines, and predict the truthfulness of future messages. For example, a decision tree method can be applied to detect deception in some text-based communication, as in the steps shown in Figure 1 [4]: (1) if the average sentence length (ASL) of a message is greater than 15.75, the message is most likely true; (2) if ASL is less than 15.75 and the sentence complexity is greater than 22, the message is more likely deceptive; otherwise it is most likely true. Such decision trees are a “baseline” against which to differentiate between deception and truth under specific circumstances (text-only communication, high-risk scenario, etc.).

Figure 1. Sample baseline for deception detection: decision tree
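The decision rule quoted for Figure 1 can be written as a short function. This is only a minimal sketch: the function name and the output labels are illustrative assumptions, and the two thresholds are simply the values quoted in the text (15.75 for ASL, 22 for sentence complexity). The text does not say what happens when ASL equals 15.75 exactly; here that case falls through to the "otherwise" branch.

```python
def classify_message(avg_sentence_length: float, sentence_complexity: float) -> str:
    """Toy version of the Figure 1 decision tree (hypothetical helper)."""
    asl_threshold = 15.75        # threshold quoted in the text
    complexity_threshold = 22.0  # threshold quoted in the text

    # Step 1: long sentences on average -> most likely true.
    if avg_sentence_length > asl_threshold:
        return "truthful"
    # Step 2: short sentences but high complexity -> more likely deceptive.
    if avg_sentence_length < asl_threshold and sentence_complexity > complexity_threshold:
        return "deceptive"
    # Otherwise (including ASL exactly at the threshold) -> most likely true.
    return "truthful"
```

A call such as `classify_message(10.0, 30.0)` follows the second branch. Such a rule is only as good as the baseline it was learned from, which is why the thresholds would need to be re-estimated or adjusted for other contexts.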

0-7695-2268-8/05/$20.00 (C) 2005 IEEE

Although automatic baseline-building methods are feasible with computer programs, results of experimentation with them have not been fully satisfying. The reason is simple: deceptive behavior depends to some extent on a deceiver’s characteristics (gender, social skills [7]) or on the modality through which a message is delivered [17, 18]. Because some cues may distinguish deception in one context but not in others, numerous baselines may be needed for an automatic-deception-detection (ADD) tool, one for each context. Given existing techniques, it is unrealistic to consider all possible contexts. A manageable step toward a general approach is to learn about the features of cues under different contexts in order to identify a set of reliable cues that appear in more than one context. Reliable cues are those that significantly differentiate deception from truth. Since the set of reliable cues can be expected to shrink as the number of modalities increases, it is necessary to find a set of reliable cues that is suitable for a specific number of contexts.

In this study, we limited the contexts to three modalities through which messages were delivered: text only (Txt), audio based (Aud), and face-to-face (FtF). The research goals were to investigate the characteristics of linguistic cues in those modalities, to search for a set of reliable cues under each modality, and to reveal any modality effects on deceptive messages. In what follows, we first review previous work on linguistic cues and modality effects in deception; we then present and discuss the methods and results used to investigate modality effects.

1.1 The linguistic cues

Previous research suggests that quantities of words or sentences and language styles contribute to an understanding of people’s underlying thoughts, emotions, and motives [7]. Some cues exist only in a fixed modality. For example, most nonverbal cues (such as gestures, vocal pitch, etc.) are not present in text messages. Although they can be challenging to put into service, nonverbal cues are so important to deception detection that they will be considered together with verbal cues in future research plans.
In the current paper, we focused only on linguistic (verbal) characteristics of transcripts of audio and video recordings. Considering only verbal cues allowed us to perform a consistent modality comparison of the characteristics of cues that existed in all three modalities. Several well-known theories and criteria for systematic statement validity analysis can be applied to verbal cues: criteria-based content analysis (CBCA) [3, 8], reality monitoring (RM), scientific content analysis (SCAN), interpersonal deception theory (IDT), and some other theories [8, 9, 10].

Criteria-based content analysis (CBCA), known as the Undeutsch hypothesis [9], originated in 1967 to verify testimony in sexual abuse cases in German courts. CBCA is based on the assumption that false statements lack valuable details and contain detectable differences in logical structure. Not all 19 CBCA criteria are suitable for automatic deception detection. Some criteria, such as unstructured production and quantity of details, are closely tied to logical structure and content, and other highly subjective cues are difficult to capture automatically. For our purposes, we adapted some specific cues so that they could be calculated by word counting. For example, since CBCA considers detailed descriptions of events to be more credible [10, 7], deceivers should use fewer spatial and temporal words than truth-tellers. Spatial and temporal words can be counted with a pre-defined dictionary [10].

Reality monitoring (RM) is “the process by which a person attributes a memory to an actual experience (external source) or imagination (internal source)” [9]. The foundation for RM was the theory of memory characteristics by Johnson and Raye [11], who claimed that memories based on actual experience are more vivid and contain more perceptual, contextual, and affective information than those based on imagination. Truthful messages thus contain more sensory and affect words describing smells, sights, sounds, tastes, and feelings. Deceptive or imaginary messages, on the contrary, exhibit more cognitive operations; information is generated from logical inference. For example, a deceiver who lied about going out but actually stayed at home might say, “I went out…it was raining BECAUSE it had been raining for three days (I knew it was raining, not because I was outside but because it had been raining for some time and should continue for a while).”

Scientific content analysis (SCAN), a structural criterion of statement validity analysis, focuses on a written statement. An extensive study of SCAN was conducted by Driscoll [12]. The 10 criteria used in SCAN include automatically detectable cues such as pronouns, emotion words, first person singular, past tense, and so on. SCAN is sometimes treated as comparable to a polygraph examination [12].

Interpersonal deception theory (IDT), proposed by Burgoon and Buller [1], considers deception to include both strategic and nonstrategic elements. Deceivers attempt to control the way they deceive by managing the information content of messages, ancillary behaviors, and the overall image they are projecting. Among other things, information management can be accomplished by disassociating the sender from his or her message and by conveying uncertainty or vagueness. Uncertainty cues such as modal verbs may be employed. The number of modifiers may also be informative: because deceivers lack a real experience and are uncertain how to describe it, they may use fewer modifiers than truth-tellers.


Behavior management may include use of nonimmediacy, that is, terms that represent a distant relationship between speaker and listener (for example, the plural pronoun “we” is considered more immediate than the singular pronoun “I”). Quantity cues, such as fewer words and sentences, can also signal reticence and withdrawal. Image management refers to attempts to project a credible image, deny responsibility for the message, or decrease detectability. Some semantic verbal cues (not covered in this paper) are needed to detect image management, since it involves more contextual and subjective issues than information and behavior management.

Deceptive behavior is also nonstrategic in the sense that deceivers usually inadvertently signal arousal and nervousness, negative affect, and reduced conversational involvement. Therefore, deceivers may be less involved and more affectively negative than truth-tellers.

Based on the findings from the research reviewed, deceptive messages are expected to have fewer words, verbs, sentences, and references; less complexity (average sentence length, average word length, pausality); less affect (affect, imagery, activation, pleasantness); less temporal and spatial information; less diversity; more uncertainty; more passive voice; and more cognitive information [3, 9, 10, 13]. The cues and their respective categories are listed below; formal definitions appear in Appendix A. The 21 investigated cues are:

1. Quantity (number of words, number of verbs, number of sentences)
2. Complexity (average sentence length, average word length, pausality)
3. Uncertainty (modal verbs, modifiers)
4. Nonimmediacy (passive voice, reference)
5. Diversity (lexical diversity, content diversity, redundancy)
6. Specificity (temporal details, spatial details, overall specificity (temporal + spatial), and sensory)
7. Affect (affect, pleasantness, activation, imagery)

The linguistic cues came from theories and standards that are not confined to any modality. Consequently, the first hypothesis is expected to hold for all three modalities (text, audio, and face-to-face):

H1: Deceitful messages display more (a) uncertainty (more modal verbs and fewer modifiers) and (b) nonimmediacy, and less (c) quantity, (d) complexity, (e) diversity, (f) specificity, and (g) affect than true ones during text-, audio-, and face-to-face-based communications.
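Several of the cues listed above are simple counts and ratios, so a small sketch may make them concrete. The formulas below are assumptions chosen for illustration (the paper's formal definitions are in its Appendix A and may differ): average sentence length as words per sentence, average word length as characters per word, pausality as punctuation marks per sentence, and lexical diversity as unique words over total words. The two tiny word lists are hypothetical stand-ins for the pre-defined spatial and temporal dictionaries, and the study itself used GATE rather than code like this.

```python
import re
import string

# Hypothetical mini-dictionaries; real cue dictionaries would be far larger.
SPATIAL_WORDS = {"near", "behind", "inside", "outside"}
TEMPORAL_WORDS = {"then", "before", "after", "during"}

def linguistic_cues(message: str) -> dict:
    """Compute a few count-based cues under the assumed definitions above."""
    sentences = [s for s in re.split(r"[.!?]+", message) if s.strip()]
    words = re.findall(r"[A-Za-z']+", message)
    lowered = [w.lower() for w in words]
    n_punct = sum(1 for ch in message if ch in string.punctuation)
    n_words, n_sents = len(words), len(sentences)
    return {
        "words": n_words,
        "sentences": n_sents,
        "avg_sentence_length": n_words / n_sents if n_sents else 0.0,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "pausality": n_punct / n_sents if n_sents else 0.0,
        "lexical_diversity": len(set(lowered)) / n_words if n_words else 0.0,
        "spatial": sum(1 for w in lowered if w in SPATIAL_WORDS),
        "temporal": sum(1 for w in lowered if w in TEMPORAL_WORDS),
    }

cues = linguistic_cues("I went to class. I sat near the door, then I left.")
```

Cues such as verb counts, modal verbs, passive voice, or affect ratings need part-of-speech tagging or rated lexicons and are not covered by this sketch.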

1.2 Modality as a moderator between linguistic cues and deceptive behavior

The modality in which the statement is presented may affect the performance of deception. Many studies have investigated the ability to detect deceit across multiple modalities, but they did not reach a firm conclusion about which modality leaks the most deceptive cues [14, 15, 16, 17, 18]. More recent researchers have studied the detection advantages of the audio-visual modality. Most existing literature focused on nonverbal communication channels (smiling, voice pitch, eye blinking, etc.) in audio or video modalities and argued that the availability of those cues provides major advantages compared to the text modality. Atoum [17] suggested that a wider range of perceptual information from the audio and visual format could assist in the task of judging credibility, compared to other modalities that lack such information, while Stephen [18] found no significant difference.

Among the three modalities, the face-to-face modality has the closest distance between speakers and listeners. In turn, we expect speakers to feel the most excited, nervous, and aroused in FtF, and the level of excitement, nervousness, and arousal should decline as one moves from FtF to audio to text. Cognitive overload experienced by deceivers should also vary. When deceivers are confronted at closer range (e.g., FtF) and with fuller modalities, they should have more aspects of communication to manage and hence should experience greater cognitive difficulty than truth-tellers. This should translate into message production differences. For instance, quantity differences may be more evident between deceptive and truthful messages in FtF than in other modalities. Therefore, we predicted that there should be an interaction between modality and deception:

H2: Modality interacts with the condition of messages (deceptive or true); that is, whether deceivers show more or fewer cues of (a) quantity, (b) complexity, (c) uncertainty, (d) nonimmediacy, (e) diversity, (f) specificity, and (g) affect than truth-tellers is influenced by the modality (face-to-face, audio, or text).

If the interaction between modality and condition were significant, we planned to further investigate what portion of the differences between deception and truth is due to the interaction, and to study the characteristics of deception cues in different modalities. If the interaction were not significant, it would suggest that deceit does not vary by modality and that modality exerts main effects on behavioral displays. For instance, observing that deceivers talk more (quantity cues) in face-to-face than in text mode does not necessarily imply deception is influenced by


close distance in communication. In fact, if truth-tellers also talk more in face-to-face than in text modality, we consider face-to-face interaction to elicit more talking than text in general. The absence of an interaction between modality and deception condition also implies that the differences (in some cues) between deceivers and truth-tellers may be consistent across modalities. For example, deceivers may produce larger quantities in messages (words, sentences, etc.) than truth-tellers not only in text, but also in audio and face-to-face communication.

The next question is how to adjust the baseline for one modality so that it applies to an unknown case. For this purpose, we study the characteristics of the linguistic cues across modalities. Knowing such modality effects on cues is necessary to set up baselines for deception detection: when we have a baseline for one modality (A) but not for another (B), we can adjust the baseline for B according to the modality differences between A and B. Specifically, the baselines are boundary values; crossing from one side of a boundary to the other may signal abnormality, i.e., deception, as demonstrated in the previous example (Figure 1). If the quantity cues are significant indicators in a text-based modality, such that people write more (words or sentences) when lying than when telling the truth, we can set a certain value n as the baseline number of words for text-based communication, meaning that every message containing more than n words is likely to be deceptive (considered along with other cues). Furthermore, because of the modality effects, n should be higher in face-to-face than in text-based interaction, since people generally talk more in the former situation.
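The idea of carrying a baseline from a known modality A to an unknown modality B can be illustrated with a deliberately simple sketch. Nothing here is the authors' procedure: the function name, the two adjustment rules (shifting by the difference in modality means, or scaling by their ratio), and all numbers in the usage lines are hypothetical assumptions chosen only to show the direction of the adjustment.

```python
def adjust_baseline(known_threshold: float,
                    known_modality_mean: float,
                    target_modality_mean: float,
                    method: str = "shift") -> float:
    """Carry a cue threshold from modality A to modality B (illustrative only)."""
    if method == "shift":
        # Move the threshold by the mean difference between the two modalities.
        return known_threshold + (target_modality_mean - known_modality_mean)
    if method == "scale":
        # Rescale the threshold by the ratio of the two modality means.
        if known_modality_mean == 0:
            raise ValueError("known_modality_mean must be non-zero for scaling")
        return known_threshold * (target_modality_mean / known_modality_mean)
    raise ValueError(f"unknown method: {method}")

# Made-up numbers: a text baseline of n = 100 words, with hypothetical mean
# message lengths of 80 words in text and 200 words face-to-face.
n_ftf_shift = adjust_baseline(100, 80, 200, method="shift")
n_ftf_scale = adjust_baseline(100, 80, 200, method="scale")
```

Either rule raises n for the modality in which people talk more, which is the qualitative behavior the text calls for; which rule (if either) fits real data would have to be checked against the observed cue distributions.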

2. Method

All linguistic cues were extracted from a mock theft experiment [23]. Participants (N = 175) were recruited from a multi-sectioned communication class at a large midwestern university. They were offered extra credit for participation and the chance to win money if they were successful at their task. Half of the students were randomly assigned to be “thieves,” i.e., those who would be deceiving about a theft, and the other half became “innocents,” i.e., those who would be telling the truth. Interviewees in the deceptive condition were assigned to “steal” a wallet that was left in a classroom. In the truthful condition, interviewees were told that a “theft” would occur in class on an assigned day. All of the interviewees and interviewers then appeared for interviews according to a preassigned schedule. Interviewees could win $10 if they convinced a trained interviewer that their descriptions were truthful.

Interviews were conducted by one of three trained interviewers under one of three modalities: text chat, audio conferencing, or face-to-face. One section of the interview concerned the theft itself. We ignored the rest of the interview in this study to focus only on modality effects during that portion of the interview. Subjects were required to describe their activities prior to, during, and following the class from which the wallet was missing. All data were then processed via GATE (General Architecture for Text Engineering [19]), and scores were generated for the 21 linguistic cues.

3. Results

Due to some technical difficulties in audiovisual or text recording, sample sizes were uneven across modalities. There were 26 deceivers and 32 truth-tellers in the text modality; 29 deceivers and 22 truth-tellers in the audio modality; and 29 deceivers and 34 truth-tellers in the face-to-face modality, totaling 172 subjects. The experiment had a 2 (condition: deceptive/true) × 3 (modality) design with 21 dependent variables. A series of MANOVAs was conducted for the seven linguistic constructs to test Hypotheses 1 and 2. Tables 1 and 2 show the results of the statistical analyses for Hypotheses 1 and 2, respectively. Table 3 presents pair-wise comparisons of cue means under the three modalities, which show the general modality effects on cues.

Table 1 shows the F- and p-values of significant cues under all conditions. The key “D” in Table 1 means the cue was significantly greater (more present) under deception, and “T” means it was greater under truth. The significance level is 0.1, two-tailed. All means of the 21 cues in the 3 modalities are shown in Appendix B.

In text chat mode, deceivers used shorter sentences than truth-tellers, with average sentence length being significant, F(1, 57) = 2.87, p = 0.096. Deceivers also had simpler sentences with less pausality, F(1, 57) = 3.44, p = 0.07. Contrary to predictions, deceivers used fewer modal verbs than truth-tellers, F(1, 57) = 3.783, p = 0.057, suggesting that deceivers were less uncertain than truth-tellers. Similarly, deceivers appeared to use more, not fewer, reference terms than truth-tellers, F(1, 57) = 3.942, p = 0.05. Deceivers also had more spatial detail (F(1, 57) = 2.992, p = 0.09) and more total detail, spatial and temporal combined (F(1, 57) = 3.396, p = 0.07). Content word diversity was greater in truthful messages (p < 0.10). All other tests within text mode did not receive support.
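Each cue-level comparison reported above is an F statistic contrasting deceivers with truth-tellers within one modality. As a reminder of what such a statistic measures, here is a minimal pure-Python sketch of a one-way F ratio for two groups (between-group mean square over within-group mean square). It is a univariate illustration only, not the MANOVA procedure used in the study, and the function name and the scores in the usage line are made up.

```python
from statistics import mean

def one_way_f(group_a, group_b):
    """Return (F, df_between, df_within) for a two-group one-way ANOVA."""
    groups = [list(group_a), list(group_b)]
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical cue scores for deceivers and truth-tellers.
f_stat, df1, df2 = one_way_f([1, 2, 3], [2, 3, 4])
```

A larger F means the two conditions differ more relative to the spread within each condition; the p-value then comes from the F distribution with the returned degrees of freedom.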


Table 1. Significance and F (p-values) of cues in three modalities (Mock Theft Experiment)

Behavioral class / cue          TXT              AUD              FTF
Quantity
  Words                         -                -                -
  Verbs                         -                -                -
  Sentences                     -                -                -
Complexity
  Average sent. length          T 2.87 (0.1)     -                -
  Average word length           -                T 4.7 (0.04)     -
  Pausality                     T 3.44 (0.07)    T 2.37 (0.1)     -
Uncertainty
  Modal verbs                   T 3.78 (0.06)    -                -
  Modifiers                     -                -                -
VB nonimmediacy
  Passive voice                 -                -                -
  Reference                     D 3.94 (0.05)    -                -
Diversity
  Lexical diversity             -                -                -
  Content word diversity        -                -                -
  Redundancy                    -                -                -
Specificity
  Spatial & temporal details    D 3.4 (0.07)     -                -
  Temporal details              -                T 3.8 (0.06)     -
  Spatial details               D 2.9 (0.09)     -                -
  Sensory ratio                 -                -                -
Affect
  Affect                        -                -                D 6.4 (0.01)
  Imagery                       -                -                -
  Activation                    -                -                -
  Pleasantness                  -                -                -

In the audio modality, the F-test showed a trend toward a main effect for the deception condition on complexity, such that deceivers used shorter words (F(1, 50) = 4.7, p = 0.035) and less punctuation (F(1, 50) = 2.374, p = 0.1). Deceivers had fewer temporal details (F(1, 50) = 3.783, p = 0.057).

In the face-to-face modality, significant differences emerged on affect. The analyses on diversity and affect terms showed that deceivers used more diverse language with more affect terms than did truth-tellers, F(1, 62) = 6.356, p = 0.014.

Table 2 shows the MANOVA results of testing Hypothesis 2. Table 3 is a pair-wise comparison of cue means under the three modalities, which shows the general modality effects on cues. From Table 3 we can clearly see the modality effects on affect cues: the mean differences and the p-values.

For quantity of language, the test yielded a significant main effect for modality, Wilks’ Λ = 0.874; F(6, 328) = 3.73, p = .001. However, the main effect of condition was not significant, F(3, 164) = 1.97, p = 0.18. The interaction of modality and condition was not significant, Wilks’ Λ = 0.979, F(6, 328) = 0.59, p = 0.773. Thus H2(a) was not supported. All quantity cues were significantly different across the three modalities, with p < 0.001 for words and verbs, and p = 0.003 for sentences. Table 3 records significant trends (p < 0.001): quantity cues (words, verbs, and sentences) were greatest in FtF, lower in audio, and lowest in text.

Complexity was similar to quantity. The main effect of modality was significant, Wilks’ Λ = 0.895; F(6, 338) = 3.15, p = 0.006. The interaction of modality and condition was not significant, Wilks’ Λ = 0.987; F(6, 328) = 0.371, p = 0.0, and therefore did not support H2(b). The main effect of condition was not significant: Wilks’ Λ = 0.982; F(3, 164) = 0.99, p = 0.3. From Table 3, complexity cues were most apparent in FtF, with p
