Mark Adkins, Douglas P. Twitchell, Judee K. Burgoon, Jay F. Nunamaker Jr. Advances in Automated Deception Detection in Text-Based Computer-Mediated Communication. Proceedings of the SPIE Defense and Security Symposium, Orlando, Florida, 2004.
Advances in Automated Deception Detection in Text-Based Computer-Mediated Communication∗

Mark Adkins, Douglas P. Twitchell, Judee K. Burgoon, and Jay F. Nunamaker Jr.
University of Arizona, Center for the Management of Information, 114 McClelland Hall, Tucson, Arizona 85721, USA

ABSTRACT
The Internet has provided criminals, terrorists, spies, and other threats to national security a means of communication. At the same time, it also provides for the possibility of detecting and tracking their deceptive communication. Recent advances in natural language processing, machine learning, and deception research have created an environment where automated and semi-automated deception detection of text-based computer-mediated communication (CMC, e.g., email, chat, instant messaging) is a reachable goal. This paper reviews two methods for discriminating between deceptive and non-deceptive messages in CMC. First, message feature mining uses document features or cues in CMC messages combined with machine learning techniques to classify messages according to their deceptive potential. The method, which is most useful in asynchronous applications, also allows for the visualization of potential deception cues in CMC messages. Second, speech act profiling, a method for quantifying and visualizing synchronous CMC, has shown promise in aiding deception detection. The methods may be combined and are intended to be part of a suite of tools for automating deception detection.
1. INTRODUCTION
Deceptive communication has long been a problem for military, government, and business organizations. The Internet has provided another way to communicate deceptively, one that offers greater anonymity and leaner media for disguising intent. However, the lean media of chat, instant messaging, and email also provide opportunities for those who wish to detect deception. The persistence of Internet messages, dramatically illustrated in the Microsoft anti-trust trial, provides an opportunity for text analysis. Unlike unrecorded telephone or face-to-face messages, text-based Internet messages are often available for post-hoc scrutiny.
2. DECEPTION
Whether it is governments protecting their citizens from terrorists or corporations protecting their assets from fraud, many organizations are interested in finding and exposing deception, defined as the active transmission of messages and information to create a false conclusion.1 Messages that mislead without the sender's knowledge are not considered deceptive, as there is no intention to deceive. The problem is that most people are poor at detecting deception even when presented with all of the verbal and non-verbal information conveyed in a face-to-face discussion. This problem is more acute when the deception is conveyed in text such as written legal depositions, insurance claims, or everyday email, and it is nearly impossible when there are large numbers of documents to sift through. Furthermore, deception strategies may change from situation to situation as the deceiver anticipates the interaction and attempts to fool possible detectors. Current text and data mining technology may aid in detecting deception in on-line text. Deception provides fertile ground for machine learning techniques precisely because deception strategies are situation specific and context sensitive. For example, one might use few words when concealing but copious words when persuading, yet be deceiving in both cases. Such context-sensitive changes are difficult to encapsulate in a single deception model. Machine learning methods learn their models from training data, which can be limited to the context at hand.

∗Portions of this research were supported by funding from the U.S. Air Force Office of Scientific Research under the U.S. Department of Defense University Research Initiative (Grant #F49620-01-1-0394). The views, opinions, and/or findings in this report are those of the authors and should not be construed as an official Department of Defense position, policy, or decision.
[email protected]; phone 1-520-621-2640; http://www.cmi.arizona.edu
3. MESSAGE FEATURE MINING
The procedure we followed to classify deceptive and truthful documents based on document features can be divided into two major steps, extracting features and classification, each with its own substeps. Table 1 summarizes the procedure.

1. Extract features.
   (a) Choose appropriate features for deceptive intent.
   (b) Determine granularity of feature aggregation (i.e., sentence, paragraph, etc.).
   (c) Calculate features over desired text portions.
2. Classify.
   (a) Manually classify documents.
   (b) Prepare data for automatic classification.
   (c) Choose an appropriate classification method.
   (d) Train the model on a portion of the data.
   (e) Test the model on the remaining data.
   (f) Evaluate results and modify features, granularity, and/or classification method to improve results.

Table 1. Summary of the intent-based text classification procedure
3.1. Extracting Features
Extracting features includes choosing appropriate features for deceptive intent on which the documents will be classified, determining the granularity of feature aggregation, and calculating the features over the desired text. Of these steps, the most difficult is choosing the appropriate features: there are potentially an infinite number of possible features, and choosing those most appropriate for classifying based on deceptive intent requires knowledge of the deception domain. The document features we chose for deception originated in the Desert Survival Study described in Section 3.4 and in Zhou et al.,2 and are enumerated in Section 3.5. Interestingly, there was some overlap with the features Page used for automated essay grading.
3.2. Classifying Documents
Classifying the documents starts with manually classifying the training set, then preparing the data for automatic classification, choosing an appropriate classification method, training and testing the model, and evaluating the results. Because unsupervised learning may or may not create clusters based on deception, the method uses supervised learning and manual classification of the training and testing sets. Once the data set is manually classified, it needs to be cleaned and formatted for input into the machine learning algorithms. In this case the process consisted of creating an ARFF (Attribute-Relation File Format) file for input into WEKA (see Section 3.3.2 for more about WEKA). Once the data is ready for classification, an appropriate classification method or set of methods must be chosen. There are a number of methods to choose from, each with its own advantages and disadvantages.3,4 Furthermore, most methods have a number of parameters that adjust their behavior, resulting in a very large number of distinct methods. Choosing a set of methods can be daunting; however, some methods that have withstood the test of time are inductive decision trees, neural networks, and Bayesian classification. After the method or set of methods is chosen, it is a simple task to train and test the model and obtain accuracy results. Once obtained, the results can be used as feedback for modifying the features, the granularity, and/or the classification methods in an effort to improve the results.
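To make this step concrete, the following is a minimal sketch of loading a feature file and cross-validating a decision tree with WEKA's Java API (class locations as in recent WEKA 3 releases). The file name deception.arff, its attribute layout, and the embedded ARFF fragment are illustrative assumptions, not the study's actual files; the real feature set is the one described in Section 3.5.

```java
// Sketch of steps 2(b)-2(e) of Table 1 using the WEKA Java API.
//
// deception.arff might look like (illustrative only):
//   @relation deception
//   @attribute wordCount numeric
//   @attribute lexicalDiversity numeric
//   @attribute class {deceptive, truthful}
//   @data
//   142,0.61,deceptive
//   97,0.74,truthful
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public class DeceptionClassifier {
    public static void main(String[] args) throws Exception {
        // Load the manually classified, feature-extracted messages.
        Instances data = new Instances(
                new BufferedReader(new FileReader("deception.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        // An inductive decision tree (C4.5) is one of the methods
        // named above; WEKA implements it as J48.
        J48 tree = new J48();

        // Ten-fold cross-validation stands in for the train/test
        // split of steps 2(d) and 2(e).
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```

The summary output can then drive step 2(f), adjusting features or swapping classifiers when the accuracy disappoints.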
3.3. Tools
To implement feature-based text classification of deception, we used open-source, publicly available software including GATE 2.0 for natural language processing and WEKA 3.3.4 for machine learning. Both programs provide an extensible architecture and easy-to-use graphical interfaces.
3.3.1. GATE: General Architecture for Text Engineering
Created at the University of Sheffield, GATE 2.0 (General Architecture for Text Engineering) is a Java-based, object-oriented framework, architecture, and development environment for creating programs that analyze, process, or generate natural language.5 GATE's component-based architecture rests on two main component types: Language Resources (LRs) and Processing Resources (PRs). LRs are data-only resources such as single documents, corpora, ontologies, and lexicons. PRs are programmatic or algorithmic resources, such as parsers and part-of-speech taggers, that use or process LRs. For example, to parse all of the sentences in a document, one would create an LR that contains or represents the document. Next, a PR that contains the parser is created. It is common for a parser to require a lexicon, which could be loaded as an additional LR. Last, an application or pipeline is created wherein the parser PR is assigned to process the document LR. A pipeline can direct the processing of any number of PRs on any number of LRs. Detailed instructions on this process can be found in the GATE user guide,6 and general descriptions of GATE can be found in.5,7 GATE has been, and currently is, used on many projects (see8 and http://gate.ac.uk for a listing), including, for example, creation of the American National Corpus (http://americannationalcorpus.org) and text summarization.9 We used GATE to extract features from documents. To accomplish this we created a set of PRs, each of which extracted a feature or set of features from the document. For example, we built a PR using the GATE-provided Java Annotations Processing Engine (JAPE)6 that recognized and counted group references such as we, us, and ours; a simplified version of this extractor is sketched at the end of this section.

3.3.2. WEKA: Machine Learning Software in Java
WEKA is a platform produced at the University of Waikato for implementing machine learning algorithms. It comes equipped with a large number of classification, clustering, and attribute selection algorithms, including the ones used in this paper. It also provides a framework in which to build additional machine learning algorithms, as well as a number of preprocessing and graphing functions. A full treatment of WEKA and its uses in data mining can be found in4 and at http://www.cs.waikato.ac.nz/ml/weka.
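Returning to the group-reference PR mentioned in Section 3.3.1, the following is a simplified, GATE-free sketch of the same idea. The word list and the regex tokenization are assumptions for illustration; the actual extractor was a JAPE grammar operating on GATE's Token annotations.

```java
// Illustrative stand-in for the JAPE-based group-reference PR:
// count first-person-plural ("group") references in a message.
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class GroupReferenceCounter {
    // Assumed word list; the study's grammar may differ.
    private static final Set<String> GROUP_WORDS = new HashSet<>(
            Arrays.asList("we", "us", "our", "ours", "ourselves"));

    public static int countGroupReferences(String document) {
        int count = 0;
        // Naive tokenization on non-letter characters; in GATE,
        // the tokeniser PR would supply Token annotations instead.
        for (String token : document.toLowerCase().split("[^a-z']+")) {
            if (GROUP_WORDS.contains(token)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String msg = "We think our ranking puts us in good shape.";
        System.out.println(countGroupReferences(msg)); // prints 3
    }
}
```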
3.4. The Desert Survival Study
The Desert Survival Study was designed with two purposes in mind: first, to test a set of cues to deception, and second, to create a data repository for testing automated deception detection tools. To this end, the study utilized the Desert Survival Problem,10 which provides an environment for group communication and allowed us to produce a set of deceptive and truthful messages. This set of messages provided a test bed for determining deceptive cues and testing automated deception detection. The Desert Survival Problem places group members in a situation where they must rank 12 items according to how important each item is for survival in the desert. Before beginning the task, group members are given expert advice on how to survive in the desert, and one member of the group is instructed to be deceptive. Group members discuss the items and come to a consensus on how to rank them. The deceptive member is encouraged to move the group's consensus contrary to his or her own opinion. The following paragraphs give a summary of the study; a more detailed explanation can be found in2 and11.

Upon registration, the online experiment divided the subjects into dyads and assigned each dyad to one of two conditions: truth-truth and truth-deceptive. In the truth-truth condition, both subjects in the dyad were given simple instructions on what the task was and how to complete it. In the truth-deceptive condition, one set of subjects (consisting of a single member of each dyad) was given the same instructions as those in the truth-truth condition, while the other set was given instructions to deceive his or her partner. In addition, the subjects who received these instructions were given reminders to be deceptive each time they were required to send a message to their partner. After receiving the instructions, all the subjects were given information that described conditions in the desert and possible strategies for survival. Then they were given the scenario describing how their jeep crashed in the desert and the twelve items available to help them survive. The system asked all of the subjects to rank the items based on the information they then had. Half of the subjects, including all of the deceptive subjects, were told to send a message to their partner to start the dialog. The dialog continued for four days; each day half of
the subjects sent a message in the morning while the other half sent a message in the evening. All subjects filled out questionnaires on days two and four that asked general questions about their interaction with their partner. Deceivers filled out additional questionnaires each day indicating whether they were deceptive about any of the items specifically and how deceptive they felt their messages were during the interaction. On the final day, non-deceivers also filled out a questionnaire asking how deceptive they thought their partner's messages were. Additionally, to ensure they had a basis on which to deceive, the deceivers were given the "correct" rankings on day three. Finally, to give the game an air of reality and to elicit discussion, all subjects were given random scenarios on days two and three that eliminated one of the items from consideration.
3.5. The Data
The data consist of all of the messages sent by all of the participants on each day of the study. Each message is considered a document and is classified as deceptive or not based on whether the participant was instructed to be deceptive. Document features were gathered from each message using GATE processing resources. As document features, we adapted 23 of the 27 linguistics-based cues from Zhou et al.2 The features are grouped into seven measures (or trins): quantity, complexity, non-immediacy, expressiveness, diversity, informality, and specificity, and are summarized in Table 2.
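To make the cue definitions in Table 2 concrete, the following is a minimal sketch computing three of the surface cues (average sentence length, pausality, and lexical diversity) with naive sentence and word splitting. The actual system relied on GATE's tokeniser, sentence splitter, and part-of-speech tagger; the regex splitting here is an illustrative simplification.

```java
// Sketch of cues 5, 7, and 15 from Table 2 under naive splitting.
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CueCalculator {
    public static void main(String[] args) {
        String text = "We should keep the water. I am not sure about the mirror!";

        String[] sentences = text.split("[.!?]+\\s*");
        String[] words = text.toLowerCase().split("[^a-z']+");
        long punctuation = text.chars()
                .filter(c -> ",.;:!?".indexOf(c) >= 0).count();
        Set<String> unique = new HashSet<>(Arrays.asList(words));

        // Cue 5: average sentence length = words / sentences
        double avgSentenceLength = (double) words.length / sentences.length;
        // Cue 7: pausality = punctuation marks / sentences
        double pausality = (double) punctuation / sentences.length;
        // Cue 15: lexical diversity = unique words / words
        double lexicalDiversity = (double) unique.size() / words.length;

        System.out.printf("length=%.2f pausality=%.2f diversity=%.2f%n",
                avgSentenceLength, pausality, lexicalDiversity);
    }
}
```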
3.6. Findings
Zhou et al.2,15 used this method to attempt deception detection in a laboratory setting. In this study, pairs of students took part in an online decision-making exercise based on the Desert Survival Problem. The subjects were told they had crashed in the desert and were to rank 12 items according to how valuable each would be to survival. One of the subjects in some of the pairs was instructed to deceive his or her partner by recommending a ranking counter to his or her actual opinion. Using the fully automated message feature mining technique, the researchers were able to obtain approximately 80% classification accuracy.

Though message feature mining shows promise for aiding automated or semi-automated deception detection, it does have some drawbacks. To obtain their stated accuracy, Zhou et al. used a manually annotated training set of similar messages. Such a training set is difficult to obtain, restricting the applications of message feature mining to those where the increased accuracy and automation offset the cost of producing a training set sufficiently similar to the messages of interest. Furthermore, although message feature mining has shown promise with the relatively large messages found in asynchronous communication such as email, it has not been tested on the comparatively small messages of chat and instant messaging. Speech act profiling, explained in the next section, shows some ability to deal with the short, quick messages characteristic of dialog.
4. SPEECH ACT PROFILING
Speech act profiling16 is a method of analyzing and visualizing conversations and their participants according to how they go about conversing rather than what they talk about. Since people may deceive in any domain, it is useful to have an analysis technique that is domain independent. Speech act profiling provides a domain-independent analysis of conversations by combining the concepts of speech act theory, automated speech act classification, and fuzzy logic. Speech act theory posits that any utterance (usually a sentence) contains a propositional content part, c, and an illocutionary force, f.17 The propositional content is the meaning of the words that make up the utterance. For example, the statement It's cold in here has the propositional content that the room or area where the speaker is located is cold. The illocutionary force, however, is what the speaker is doing by speaking, which in this case is asserting that something about the world is true. Speakers can do many things with an utterance. They can assert, question, thank, declare, insult, order, and even make substantial changes in the world, such as marrying a couple or inaugurating a president. There may be more than one illocutionary force, or speech act, associated with an utterance, and the actual act is determined by the context in which the words are uttered. In the previous example, It's cold in here, if uttered by a general in the army to a private, might be an order to turn up the thermostat rather than just a simple statement.
Quantity:
1. Word a: a written character or combination of characters representing a spoken word.
2. Verb a: a word that characteristically is the grammatical center of a predicate and expresses an act, occurrence, or mode of being.
3. Modifier b: a word that describes another word or makes its meaning more specific. Two parts of speech are modifiers: adjectives and adverbs. Adjectives modify nouns and pronouns; adverbs modify verbs, adjectives, and other adverbs.
4. Sentence a: a word, clause, or phrase, or a group of clauses or phrases, forming a syntactic unit that expresses an assertion, a question, a command, a wish, an exclamation, or the performance of an action, and that usually begins with a capital letter and concludes with appropriate end punctuation.

Complexity:
5. Average sentence length12: (total # of words) / (total # of sentences)
6. Average word length: (total # of characters) / (total # of words)
7. Pausality12: (total # of punctuation marks) / (total # of sentences)

Non-immediacy:
8. Passive voice ratio: (total # of passive verbs) / (total # of verbs), where a passive verb is a form of a verb used when the subject is being acted upon rather than doing something.
9. Modal verb ratio a: (total # of modal verbs) / (total # of verbs), where a modal verb is an auxiliary verb that is characteristically used with a verb of predication and expresses a modal modification.
10. You reference ratio c: second-person pronouns.
11. Self reference ratio c: singular first-person pronouns.
12. Group reference ratio c: plural first-person pronouns.
13. Other reference ratio c: third-person pronouns.

Expressiveness:
14. Emotiveness12: (total # of adjectives + total # of adverbs) / (total # of nouns + total # of verbs)

Diversity:
15. Lexical diversity13: (total # of unique words) / (total # of words), the percentage of unique words among all words.
16. Redundancy: (total # of function words) / (total # of sentences), where a function word is a word expressing primarily a grammatical relationship (prepositions, articles, and conjunctions).
17. Content word diversity: (total # of unique content words) / (total # of content words), where a content word primarily expresses lexical meaning (i.e., is not a function word).

Informality:
18. Typo ratio: (total # of misspelled words) / (total # of words)

Specificity:
19. Affect ratio a,d: the conscious subjective aspect of a positive or negative emotion apart from bodily changes.
20. Sensory ratio d: indicates sensorial experiences such as sounds, smells, physical sensations, and visual details.14
21. Temporal immediate ratio d: temporal information indicating closeness in time.
22. Temporal nonimmediate ratio d: temporal information indicating distance in time.
23. Spatial close ratio d: information about locations or the spatial arrangement of people and/or objects that indicates closeness.
24. Spatial far ratio d: information about locations or the spatial arrangement of people and/or objects that indicates distance.

a Definition from http://www.webster.com
b Definition from http://englishplus.com/grammar/glossary.htm
c Frequency counts divided by the total # of words.
d Dictionary matches divided by the total # of words.

Table 2. Summary of Selected Cues and Measures (adapted from2)
Thus, every utterance has several illocutionary act potentials. Speech acts are important in deception detection for two reasons: first, they are the means by which deception is transmitted, and second, they provide a mechanism for studying deception in conversations in a content-independent manner. Deceptive speakers may express more uncertainty in their messages than truthtellers,18 and this uncertainty can be detected in the type of speech acts speakers use. For example, uncertain speakers should tend to use more opinions, maybe expressions, and questions than truthtellers do. Speech acts are also technically useful because a method has been created to identify them automatically.19 This method uses a manually annotated corpus of conversations to train n-gram language models and a hidden Markov model, which in turn identifies the most likely sequence of speech acts in a conversation. Using the principles of fuzzy logic, the probabilities from the hidden Markov model can be taken as the degrees to which an utterance belongs to a number of fuzzy sets representing the speech acts. Speech act profiling aggregates these fuzzy sets and subtracts from them a "normal" conversation profile (created from the training corpus) to create a profile for an entire conversation. An example profile is shown in Figure 1.
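A minimal sketch of this aggregation step follows, under stated assumptions: each utterance arrives with a posterior probability distribution over speech act tags (e.g., from a tagger in the style of Stolcke et al.19), and baseline holds the tag proportions of the training corpus. The input representation is assumed for illustration and is not the study's actual data structure.

```java
// Sketch of speech act profile aggregation: sum fuzzy memberships,
// normalize to proportions, and subtract the "normal" profile.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SpeechActProfiler {
    public static Map<String, Double> profile(
            List<Map<String, Double>> utterancePosteriors,
            Map<String, Double> baseline) {
        Map<String, Double> result = new HashMap<>();
        int n = utterancePosteriors.size();
        if (n == 0) {
            return result; // no utterances, empty profile
        }

        // Sum each tag's posterior probability over all utterances.
        Map<String, Double> sums = new HashMap<>();
        for (Map<String, Double> posterior : utterancePosteriors) {
            for (Map.Entry<String, Double> e : posterior.entrySet()) {
                sums.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }

        for (Map.Entry<String, Double> e : sums.entrySet()) {
            // Proportion of the conversation belonging to this act,
            // minus the "normal" proportion from the training corpus;
            // these deviations are what a profile like Figure 1 plots.
            double proportion = e.getValue() / n;
            double normal = baseline.getOrDefault(e.getKey(), 0.0);
            result.put(e.getKey(), proportion - normal);
        }
        return result;
    }
}
```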
[Figure 1: a radar-style speech act profile. The axes are speech act tags (e.g., sd, sv, qy, qw, maybe, commits) grouped into four quadrants labeled Assertives, Commissives, Directives, and Expressives, plotted on a scale of -0.06 to 0.06 as deviations from the normal profile, with one line per player: Air1, Intel1, and Space1.]
Figure 1. Sample speech act profile from the StrikeCom corpus showing submissive and uncertain behavior by the deceiver
The conversation profiled in Figure 1 was taken from a multi-player, grid-based war game called StrikeCom.20 The goal of the game was for the players to use their assets (such as satellites and spies) to find the enemy camps
on the board and destroy them. In a study using StrikeCom, one of the players, Space, was given incentive and instructed to deceive the other players. The profile in Figure 1 is from this study, so it is interesting to note the differences between Space's use of speech acts and that of the other players. In this case Space, the lightest line in the graph, used a relatively small proportion of statements (sd) and a relatively large proportion of maybes and opinions (sv), which could be interpreted as expressing uncertainty.
4.1. Findings
To test speech act profiling's ability to aid in deception detection, Twitchell et al.21 identified those speech acts related to uncertainty and summed their proportions for each participant in each conversation. They found that the deceptive participants in the conversations were significantly more uncertain than their fellow players. This result suggests that using speech act profiling as part of deception detection in text-based synchronous conversations is promising. Besides uncertainty, other groupings of speech acts could be tested, such as those indicating dominant or submissive behavior, which has been identified with deception.18
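A hedged sketch of this uncertainty measure follows: a participant's proportions over uncertainty-related speech acts are summed into a single score. The tag grouping below is an assumption for illustration and is not necessarily the exact set used by Twitchell et al.21

```java
// Sketch of an uncertainty score over a participant's speech act
// proportions (e.g., the output of SpeechActProfiler before the
// baseline subtraction).
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class UncertaintyScore {
    // Assumed grouping of uncertainty-related tags: opinions (sv),
    // maybes, and question types (qy, qw, qy^d).
    private static final Set<String> UNCERTAIN_TAGS = new HashSet<>(
            Arrays.asList("sv", "maybe", "qy", "qw", "qy^d"));

    /** Sums a participant's proportions over the uncertainty tags. */
    public static double score(Map<String, Double> actProportions) {
        double total = 0.0;
        for (String tag : UNCERTAIN_TAGS) {
            total += actProportions.getOrDefault(tag, 0.0);
        }
        return total;
    }
}
```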
5. CONCLUSION
The methods and findings reported in this paper represent the beginning steps being taken in deception detection over text-based media. Much work in this area remains to be done, not only in improving the methods presented here, but also in creating newer, more effective methods. Such methods might incorporate the time dimension as an indicator, or they might attempt to bring semantics and content into the picture.
REFERENCES
1. D. B. Buller and J. K. Burgoon, "Interpersonal deception theory," Communication Theory 6(3), pp. 203–242, 1996.
2. L. Zhou, D. P. Twitchell, T. Qin, J. K. Burgoon, and J. F. Nunamaker Jr., "An exploratory study into deception detection in text-based computer-mediated communication," in Thirty-Sixth Annual Hawaii International Conference on System Sciences (CD/ROM), Computer Society Press, (Big Island, Hawaii), 2003.
3. T. M. Mitchell, Machine Learning, McGraw-Hill Series in Computer Science, McGraw-Hill, New York, 1997.
4. I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java, The Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann, San Francisco, 2000.
5. H. Cunningham, "GATE, a general architecture for text engineering," Computers and the Humanities 36, pp. 223–254, 2002.
6. H. Cunningham, D. Maynard, K. Bontcheva, V. Tablan, C. Ursu, and M. Dimitrov, "Developing language processing components with GATE (a user guide)," user guide, The University of Sheffield, August 2002.
7. H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan, "GATE: A framework and graphical development environment for robust NLP tools and applications," in Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics, 2002.
8. D. Maynard, H. Cunningham, K. Bontcheva, R. Catizone, G. Demetriou, R. Gaizauskas, O. Hamza, M. Hepple, P. Herring, B. Mitchell, M. Oakes, W. Peters, A. Setzer, M. Stevenson, V. Tablan, C. Ursu, and Y. Wilks, "A survey of uses of GATE," 2000.
9. P. Lal, Text Summarisation, Master's thesis, Imperial College, 2002.
10. J. Lafferty and P. Eady, The Desert Survival Problem, Experimental Learning Methods, Plymouth, Michigan, 1974.
11. L. Zhou, J. K. Burgoon, J. F. Nunamaker Jr., and D. P. Twitchell, "Automated linguistics based cues for detecting deception in text-based asynchronous computer-mediated communication: An empirical investigation," Group Decision and Negotiation (in press), 2003.
12. J. K. Burgoon, M. Burgoon, and M. Wilkinson, "Writing style as predictor of newspaper readership, satisfaction and image," Journalism Quarterly 58(2), pp. 225–231, 1981.
13. H. Hollien, The Acoustics of Crime: The New Science of Forensic Phonetics, Plenum, New York, 1990.
14. A. Vrij, Detecting Lies and Deceit: The Psychology of Lying and Implications for Professional Practice, John Wiley & Sons, Chichester, 2000.
15. L. Zhou, P. Taylor, T. Qin, J. K. Burgoon, and J. F. Nunamaker Jr., "Toward the automatic prediction of deception - an empirical comparison of classification methods," Journal of Management Information Systems (in press), 2004.
16. D. P. Twitchell and J. F. Nunamaker Jr., "Speech act profiling: A probabilistic method for analyzing persistent conversations and their participants," in Thirty-Seventh Annual Hawaii International Conference on System Sciences (CD/ROM), IEEE Computer Society Press, (Big Island, Hawaii), 2004.
17. J. R. Searle, "A taxonomy of illocutionary acts," in Expression and Meaning: Studies in the Theory of Speech Acts, pp. 1–29, Cambridge University Press, Cambridge, UK, 1979.
18. B. M. DePaulo, B. E. Malone, J. J. Lindsay, L. Muhlenbruck, K. Charlton, and H. Cooper, "Cues to deception," (under review), 2000.
19. A. Stolcke, K. Ries, N. Coccaro, E. Shriberg, R. Bates, D. Jurafsky, P. Taylor, C. Van Ess-Dykema, R. Martin, and M. Meteer, "Dialogue act modeling for automatic tagging and recognition of conversational speech," Computational Linguistics 26(3), pp. 339–373, 2000.
20. K. Wiers and D. P. Twitchell, "A multi-player strategy game," in Workshop on Affective Dialogue Systems, (Kloster Irsee, Germany (under review)), 2004.
21. D. P. Twitchell, J. F. Nunamaker Jr., and J. K. Burgoon, "Using speech act profiling for deception detection," in Lecture Notes in Computer Science: Intelligence and Security Informatics: Proceedings of the Second NSF/NIJ Symposium on Intelligence and Security Informatics, (Tucson, Arizona (in press)), 2004.