A Rigidity Detection System for Automated Credibility Assessment

Nathan W. Twyman, Aaron C. Elkins, Judee K. Burgoon, and Jay F. Nunamaker Jr.

Nathan W. Twyman is a postdoctoral research scientist in the MIS Department at the University of Arizona, where he received his Ph.D. in MIS. His research interests span human-computer interaction, group support systems, virtual communities, health IS, and leveraging human and organizational factors in auditing, security, and forensic investigation systems. He has published articles in the Journal of Management Information Systems, Journal of the Association for Information Systems, and Information & Management.

Aaron C. Elkins is a postdoctoral researcher in both the MIS Department at the University of Arizona and the Department of Computing at Imperial College London. He holds a Ph.D. in MIS from the University of Arizona. He investigates how the voice, face, body, and language reveal emotion, deception, and cognition for advanced human-computer interaction and artificial intelligence applications. Complementary to the development of advanced artificial intelligence systems is their impact on the people using them to make decisions. Dr. Elkins also investigates how human decision makers are psychologically affected by, use, perceive, and incorporate the next generation of screening technologies into their lives.

Judee K. Burgoon is a professor of communications, a professor of family studies and human development, the director of human communication research for the Center for the Management of Information, and the site director of the Center for Identification Technology Research at the University of Arizona. She holds a Ph.D. in communication and educational psychology from West Virginia University. Her research interests are in deception, trust, interpersonal interaction, and new technologies.

Jay F. Nunamaker Jr.
is Regents and Soldwedel Professor of MIS, Computer Science and Communication and Director of the Center for the Management of Information and the National Center for Border Security and Immigration at the University of Arizona. He received his Ph.D. in operations research and systems engineering from Case Institute of Technology, an M.S. and B.S. in engineering from the University of Pittsburgh, and a B.S. from Carnegie Mellon University. He received his professional engineer's license in 1965. Dr. Nunamaker was inducted into the Design Science Hall of Fame in May 2008, received the LEO Award for Lifetime Achievement from the Association for Information Systems (AIS) in December 2002, and was elected a fellow of the AIS in 2000. He was featured in the July 1997 issue of Forbes Magazine on technology as one of eight key innovators in information technology. He is widely published, with an H index of 60. His specialization is in the fields of system analysis and design, collaboration technology, and deception detection. The commercial product GroupSystems' ThinkTank, based on Dr. Nunamaker's research, is often referred to as the gold standard for structured collaboration systems. He was a research assistant funded by the ISDOS project at the University of Michigan and an associate professor of computer science at Purdue University. He founded the MIS Department at the University of Arizona in 1974 and served as department head for 18 years.

Journal of Management Information Systems / Summer 2014, Vol. 31, No. 1, pp. 173-201. © 2014 M.E. Sharpe, Inc. All rights reserved. Permissions: www.copyright.com. ISSN 0742-1222 (print) / ISSN 1557-928X (online). DOI: 10.2753/MIS0742-1222310108.

Abstract: Credibility assessment is an area in which information systems research can make a major impact. This paper reports on two studies investigating a system solution for automatic, noninvasive detection of rigidity for automated interviewing. Kinesic rigidity has long been a phenomenon of interest in the credibility assessment literature, but until now was infeasible as a veracity indicator in practical use cases. An initial study unexpectedly revealed the occurrence of rigidity in a highly controlled concealed information test setting, prompting the design and implementation of an automated rigidity detection system for interviewing. A unique experimental evaluation supported the system concept. The results of the second study confirmed the kinesic rigidity found in the first and provided further theoretical insights explaining the rigidity phenomenon. Although additional research is needed, the evidence from this investigation suggests that credibility assessment can benefit from a rigidity detection system.

Key words and phrases: automated interviewing systems, computer vision, concealed information test, credibility assessment, deception detection, freeze response, kinesic rigidity.

Credibility assessment is a major concern in many organizations and is an area in which information systems (IS) research can have a major impact. KPMG Integrity Surveys report that nearly three-quarters of employees have firsthand knowledge of wrongdoing in their organization and half state that if such wrongdoing were made public, a significant loss of trust would result [42, 43]. The U.S. government estimates that less than 1 percent of drug trafficking proceeds were detected in a two-year span [74]. In these and many other examples that could be cited, noncredible information has proven difficult to detect, spurring interest among researchers in criminal justice, cognitive psychology, and more recently, IS. Unaided human judgment consistently performs near chance levels [9] in spite of chronic overconfidence by decision makers [22, 31], while current veracity system aids are cumbersome, criticized for validity problems, and labor intensive, thus limiting ubiquity. The most well-known and widely used methods of veracity assessment rely on skilled professionals using interviewing techniques that require time and specialized equipment. Extensive training, invasive sensors, and time limitations are among the factors that have limited the application of these traditional methods mostly to criminal investigations. From an academic standpoint, the validity of current techniques and the reliability of results have been questioned [51, 58]. In particular, lack of procedural control and potential for human error have been cited as potential concerns [58]. IS research can have a major impact in this area by integrating theory, methods, and technology to generate useful and creative solutions [59, 61]. Many efforts have
already begun. Proposed alternative approaches range from monitoring text-based communication in search of linguistic indicators [46, 90] to identifying telling vocalic or eye movement patterns in human screening [63, 79]. These methods differ from traditional approaches employed to assess veracity in the sensors used, questioning protocols administered and cues identified. The present paper builds on this work by investigating the veracity assessment potential of a body movement cue termed rigidity via the Nunamaker approach [59, 60, 61]. This paper has four objectives. First, it reports on the unique discovery of a rigidity effect in a concealed information test (CIT) protocol via a realistic exploratory mock crime experiment. Second, it proposes a system design for automatic recognition of rigidity in credibility assessment interviews. Third, it reports on an implementation of the proposed system design, and evaluation of the instance via a mock screening experiment. Fourth, it summarizes theoretical insights gained throughout this investigation. Reported here are proof-of-concept iterations, beginning by examining (1) results of an experiment that informed (2) a conceptual system design for rigidity detection, of which an instance was (3) built and evaluated to (4) generate valuable knowledge for advancing credibility assessment. The results of this work help establish the proof of concept for a system for automated rigidity detection and also feed back into a larger program of research investigating solutions for enhancing the accuracy, validity, ubiquity, and management of automated credibility assessment [23, 63, 64, 78].

Background

Credibility assessment and deception detection are gaining increasing interest in IS. Some IS research has focused on how decision makers interact with credibility assessment decision aids [6, 37, 38]. From an e-commerce perspective, IS research has begun to develop a deeper understanding of the key factors involved in e-commerce deception [8]. Automated extraction of linguistic cues to deception has been explored in computer-mediated communication [46, 90, 91], written criminal statements [28], and financial reports [32, 35]. Systems for exploring oculometric indicators of hidden knowledge have also been an emerging interest [63, 78, 79]. Particularly relevant to the current study, a stream of research has investigated the use of computer vision techniques to identify movement variables that may be relevant to deception detection [14, 39, 56]. Outside of IS circles, recent research has examined systems for credibility assessment using blinking patterns [27] as well as more invasive systems such as functional magnetic resonance imaging (fMRI) [29, 30, 45] and electroencephalography (EEG) [1]. Some IS credibility assessment research has emphasized the need for noninvasive, autonomic system solutions to increase the ubiquity and reliability of credibility assessment, allowing it to create value in nontraditional contexts such as employment screening, auditing, and physical security screening [23, 64, 78]. This forward-looking approach to credibility assessment inspired the concept of an automated conversational agent that monitors psychophysiological and behavioral indicators relevant to
credibility assessment [63, 66]. For such an approach to work, many research questions need to be addressed. Among these is the need to identify valid, reliable cues to deception that can be generated automatically, noninvasively, and in real or near-real time. Kinesic rigidity is one cue that has the potential to meet these criteria.

Kinesic Rigidity in Credibility Assessment

Rigidity is one of several kinesic (i.e., body movement) cues that have been identified in communication and psychology research as potential indicators of veracity. Kinesic rigidity is a temporal period of constricted body movement. During high-stakes deception, a liar tends to exhibit fewer noncommunicative movements, such as fewer instances of rubbing hands together or bouncing a leg. Expressive or illustrative gestures that do occur tend to be more confined and appear forced, as if they are being resisted [12, 85, 88]. Rigidity has been discovered in several studies featuring open-ended questioning protocols [17, 86, 87]. Despite decades of research into bodily rigidity and related kinesic cues to deception, the contextual boundaries of these phenomena are still not well understood, and defining the nuanced interrelationships among nonverbal behaviors and veracity is an active area of investigation in psychology, communication, criminology, and IS research. Several theories have been proposed as explanations for rigidity during periods of low veracity. Common theoretical explanations include cognitive load [24, 89] and behavioral control [20]; the results of the current study help make a case that hypervigilance may also be a plausible explanation. Proponents of a cognitive load explanation argue that lying takes more cognitive effort than telling the truth and assume that fabricating events requires more cognitive resources than simply recalling events. Because more cognitive resources are allocated to creating a plausible deception, other activities, including movement, are thought to receive less attention, leading to fewer illustrative or communicative gestures [21]. The second common explanation is overt behavioral control. Proponents of this theory emphasize that the general population holds the false belief that liars show increased nervousness in their body movements.
In reality, the opposite tends to be the case [75, 76]. According to behavioral control proponents, a deceiver therefore either reflexively or perhaps purposely becomes more rigid in an attempt to mimic his or her own false perception of what a truthful communication should look like [13, 92]. A third possible explanation of rigidity may be more basic, and a precursor to cognitive excitation or overt behavior. Rigidity may result from the body entering a state of hypervigilance during the biologically driven "stop, look, and listen" response to a perceived threat [10, 33]. When examinees perceive that a line of inquiry has the potential to expose their deception, their body may naturally gravitate toward this
hypervigilant state, which is characterized partially by bodily rigidity. Rigidity in hypervigilance is explained in more depth in the Study 1 Discussion section. Traditionally, rigidity has been measured using human coders, who review video recordings and subjectively rate interview segments according to the appearance of forced versus natural gesturing, given the type of gesture and the context in which it was made. Human coding is limited to the major movements that can be perceived by a given coder, and it remains subject to intercoder error; minute changes in movement can be imperceptible to human coders. Beyond natural human bias and limitation, the biggest restriction to wider adoption of subjective rigidity coding for deception detection is the large amount of time and labor involved: every hour of a recorded interview can take two to six hours of expert postprocess coding. An automated solution has the potential to greatly decrease these time and labor costs, and operator training costs can be eliminated altogether if sensors do not require attachment to or manual calibration with each examinee. A noninvasive, automated measurement method is thus a key contribution of this study and an integral component of the proposed rigidity detection system design. A second important contribution is the exploration of rigidity in CIT interviews, which has not previously been investigated. In addition to automated and noninvasive measurement, an effective system design requires a reliable questioning protocol. Several potential protocols were investigated in the experimental phase of Study 1, and a CIT structure was ultimately selected as the foundational questioning protocol in the system design. The CIT is detailed further in the Study 1 Discussion section.

Study 1: Initial Investigation into Automated Rigidity Detection

Based on the observations noted in the Background section, general requirements for an automated rigidity detection system were clear from the beginning of the investigation (Table 1). The design requirements were necessarily high level, given the novelty of the knowledge space. Additional requirements were added as the investigation progressed, and key considerations were revealed through prototyping and experimentation. Our initial efforts to track and measure movement in credibility assessment interviews used automated techniques for detecting the location of the face and hands in video and tracking two-dimensional changes in location over time. An initial experiment using a mock crime paradigm led to the discovery of rigidity in a CIT protocol, driving further understanding and development of an automated screening and rigidity detection system design.

Tracking Movement in Credibility Assessment Interviews

The context first selected for investigating movement was a standard interview setting with the examiner and examinee sitting across from one another, with cameras recording the examinee throughout the interaction. To measure movement, we
adapted existing computer vision algorithms for recognizing the hands and face in images [14, 56]. For the detection of face and hand/arm locations in video, we applied a skin blob tracking (SBT) technique recently introduced to deception detection research [14, 55, 56, 70]. The SBT technique analyzes video frame by frame. In each frame, the face is detected using the Viola-Jones algorithm [84]. Once the face is detected, hand/arm "blobs" are identified by searching for areas of similar (skin) color. The centroid of the face and of each hand/arm blob is recorded for every frame. Compared to hand/arm movement, minor changes in head movement ultimately proved more difficult to detect using full-body, standard-definition video. As an alternative method of collecting head movement data, a close-up video recording of the face was processed using the software suite ASM Face Tracker [41]. This software tracks the two-dimensional Cartesian coordinates of many points on a face. The computer vision technique is built on active shape modeling (ASM), which uses spatial-statistical models of shapes to match identified points on an object in one image to points on an object in a new image. The ASM algorithm tries to match the statistical model parameters to the image. Thus, the model can deform (e.g., stretch), but not beyond what would be naturally seen in a real-world object of similar features, given properly defined model parameters [19]. For faces, this means that identified facial points must represent the image of a face as a whole. For instance, a point on the chin cannot be accidentally identified as immediately adjacent to a point on the eye, as this would be outside the bounds of statistically normal model parameters.

Table 1. Initial Design Requirements for Automated Rigidity Detection

  Requirement number   Description
  1                    Automatic tracking of overall movement
  2                    Noninvasive measurement apparatus
  3                    Automatic identification of rigidity during deception

Experiment 1

The SBT and ASM body point location tracking algorithms were used to generate movement data through postprocessing of video recordings of interviews that were part of a realistic mock crime experiment. Mock crime experiments are appropriate for veracity assessment research because the realism involved can elicit reactions that closely mirror real-world scenarios [16]. This mock crime experiment was designed to explore many sensor and questioning technique combinations. The current paper emphasizes the portion of the experiment relevant to automated rigidity detection for credibility assessment.

Participants

Participants (n = 164) were recruited from the local community of a large university in the southwestern United States via newspaper and Craigslist (www.craigslist.org) listings and paper flyers placed in community centers. We recruited from the local community in order to obtain a sample more representative than students alone, which we felt was important for this more exploratory phase. Participants received $15 per hour for participation, plus a $50 bonus if they successfully convinced the examiner that they were innocent of the mock crime. Qualitative observations of the participants noted a broad diversity in economic and social status. Of the 164 enrolled participants, 134 (82 percent) followed instructions and completed the experiment; the remaining 18 percent were disqualified because they did not follow instructions, failed to consent to participate, or confessed during the interview. Because of technical problems with the video recording and analysis system, only 107 of the 134 cases produced usable data for analysis. Of these 107 participants, 40 "committed" the crime and 67 did not. In this subset, 63 percent were female, and the average age was 39.5 (standard deviation = 14.0).

Experimental Procedures

Participants in a simple two-treatment mock crime experiment were instructed to arrive at a room on an upper floor of an old apartment complex, where a prerecorded set of instructions was waiting for them. After listening to the instructions and signing a consent form, the participants left the apartment complex and walked to a nearby building. Per the instructions, the participants reported to a room on the top floor and asked for a Mr. Carlson. A confederate acting as a new receptionist who did not know Mr. Carlson asked the participant to wait while he went to locate Mr. Carlson. A camera in the room verified the participants' activities while they were waiting for the receptionist's return.
Participants in the Innocent condition simply waited, while those in the Guilty condition stole a diamond ring from the desk. Guilty participants took a key from a mug on the top of a desk and used it to open a blue cash box in the desk drawer that was hidden underneath a tissue box. They removed the ring from the cash box and hid it somewhere on their person. Upon returning, the receptionist directed the participants to another room on the bottom floor of the building, the layout of which is depicted in Figure 1. There, the participants were told that a crime had occurred in the building that day and that they would be interviewed to assess their possible involvement in the crime. All the participants were interviewed by one of four professional polygraph examiners provided by the National Center for Credibility Assessment (NCCA). The interviewers were trained and experienced, and were familiar with the purpose and procedure involved in administering various interviewing techniques, including CIT, a veracity assessment technique highly regarded in academic circles [5, 58, 80] but rarely used in practice, Japanese criminal investigations being the notable exception [57, 65]. The participants were offered a $50 bonus if they successfully convinced the interviewer that they were
innocent. This large monetary reward, together with the realism of the experiment, was important to induce behavioral effects and motivate participants to appear innocent in ways that would closely mirror real-world scenarios.

Figure 1. Layout of Interviewing Room for Experiment 1

Two studio-quality video cameras were placed directly in front of the chair in which each participant sat during the interview. The chair had a low back and no armrests, and no other furniture or objects were within reach. This setup ensured that inactive arms and hands would rest on legs during the CIT portion of the interview. Other cameras and sensors were also present in the room to examine their potential for credibility assessment (to be reported elsewhere). The locations of each hand/arm and the head were identified frame by frame using the SBT and ASM computer vision techniques. The interview consisted of several questioning techniques, including a CIT. The CIT was a major portion of the interview and became the focal procedural component of the rigidity detection system, as explained in later sections. To our knowledge, rigidity had never been investigated in a CIT format prior to this study, and exploring rigidity in the CIT was not initially a primary consideration. Rather, we sought to detect rigidity in alternative questioning techniques similar to previous work. However, the control and simplicity of the CIT, together with its potential for an automated system,
prompted an exploratory rigidity analysis. The three CIT questions, together with their associated target and nontarget items, are shown in Table 2. Video from the two cameras recorded during the interview was processed to generate overall movement data, as explained in the next section. A final questionnaire followed the interview portion of the experiment; it contained simple manipulation check questions, a question about perceived behavioral control, and measures of arousal and motivation levels.

Table 2. Questions Used in the Study 1 Concealed Information Test

Question 1: "If you are the person who stole the ring, you are familiar with details of the cash box it was stored in. Repeat after me these cash box colors:"
Words repeated by suspect: Green, Beige, White, Blue*, Black, Red

Question 2: "If you are the person who stole the ring, you moved an object in the desk drawer to locate the cash box containing the ring. Repeat after me these objects:"
Words repeated by suspect: Notepad, Telephone book, Woman's sweater, Laptop bag, Tissue box*, Brown purse

Question 3: "If you are the person who stole the ring, you know what type of ring it was. Repeat after me these types of rings:"
Words repeated by suspect: Emerald ring, Turquoise ring, Amethyst ring, Diamond ring*, Ruby ring, Gold ring

* Target items (i.e., correct answers).

Measuring Rigidity

This study took a novel approach to measuring movement, designed to circumvent the need for post hoc, manual subjective judgments. For the mock crime experiment, the centroid coordinates of each SBT-generated blob and the center coordinates of the ASM face model were generated for each frame between the end of an interviewer question and the beginning of the next question. Overall movement of the left and right hands/arms for each video segment was then calculated as the average Euclidean distance between centroid positions in consecutive frames during a given response:

    M_s = [ Σ_{i=1}^{j} √( (y_{i+1} − y_i)² + (x_{i+1} − x_i)² ) ] / j,

where (x_i, y_i) is the centroid position in frame i and j is the number of frame-to-frame transitions in the segment.
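A minimal sketch of this per-segment computation (the function and variable names are illustrative, not from the paper):

```python
import math

def movement_score(centroids):
    """Average Euclidean distance between centroid positions in
    consecutive frames for one body point over one response segment.

    centroids: sequence of (x, y) tuples, one per video frame.
    A perfectly rigid segment scores 0.0.
    """
    if len(centroids) < 2:
        return 0.0
    j = len(centroids) - 1  # number of frame-to-frame transitions
    total = sum(math.hypot(x2 - x1, y2 - y1)
                for (x1, y1), (x2, y2) in zip(centroids, centroids[1:]))
    return total / j

# Example: one 5-pixel jump followed by no motion, averaged over
# two transitions, yields 2.5
movement_score([(0, 0), (3, 4), (3, 4)])
```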
This produced an average overall movement score for each response for each individual. However, average overall movement during a response is certain to be affected by more than just veracity level. Culture, personality, mood, gender, and question type are examples of factors that may also affect overall movement or lack thereof. For instance, qualitative observations of the participants revealed that, on average, those from Western cultures tended to exhibit more movement overall when "sitting still" than those from Eastern cultures. Identifying, automatically measuring, and integrating all such potential global moderating factors is a difficult and complex task, well beyond the scope of this study. However, a repeated-measures interviewing protocol makes an alternative approach possible: individuals can be compared to an individual baseline rather than an overall population average [2, 82], sidestepping the need to account for factors such as gender, culture, or mood. The movement averages for each segment were thus standardized as within-subject z-scores. The z-scores were also body point specific, because natural variance is expected in the amount of movement that each point on the body will exhibit (e.g., a little movement of the head can be just as meaningful as a relatively large movement of a hand). In the case of the CIT questions, z-scores were also question specific to control for the possibility of question effects.

Results

As part of the postinterview questionnaire, the respondents self-reported their levels of motivation, effort, and tension, each on a seven-point scale (see the Appendix). Participants reported high levels of motivation and effort, and moderate levels of tension. Summary statistics are in Table 3. Within-subject comparisons of interquestion overall movement did not produce significant results for any tested interviewing protocol except the CIT.
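As an illustrative sketch (not the authors' code), the within-subject, body-point-specific, question-specific standardization underlying these movement scores could be computed with pandas; the column names are hypothetical:

```python
import pandas as pd

def standardize_movement(df):
    """Add within-subject z-scores of per-response movement averages.

    Scores are computed separately for each participant, body point,
    and question, so each response is compared against that
    individual's own baseline rather than a population average.
    """
    grp = df.groupby(["participant", "body_point", "question"])["movement"]
    out = df.copy()
    # transform("std") uses the sample standard deviation (ddof=1)
    out["movement_z"] = ((df["movement"] - grp.transform("mean"))
                         / grp.transform("std"))
    return out
```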
The rigidity results of tests other than the CIT are omitted for succinctness. For the CIT questions, a multilevel regression model was specified for overall movement during the response time for each foil item. Multilevel regression models use adjusted standard errors to reflect the uncertainty that arises from repeated measurements within a subject. The summation of standardized movement scores for the right hand, left hand, and head was used as the dependent variable. The independent variables included Condition (dummy coded: 1 = Guilty, 0 = Innocent), Participant, and Target Item (dummy coded: 1 = Correct Answer, 0 = Incorrect Answer). Question and interviewer were initially included as covariates but were found not to be significant predictors and were subsequently dropped from the model. The effect of greatest interest was the Condition × Target Item interaction, which reflected overall movement when Guilty participants responded to the correct answer. The results of the multilevel regression model are shown in Table 4, with Condition labeled "Guilty" to facilitate interpretation. The significant estimate of –0.624 for the Guilty × Target Item interaction can be interpreted to mean that when Guilty participants were asked to repeat the correct answer to a CIT question, they tended to be approximately 0.624 standard deviations
Table 3. Self-Reported Motivation, Effort, and Tension

  Self-report (seven-point scale)   Condition   Mean   Standard deviation
  Motivation to succeed             Innocent    6.12   1.31
                                    Guilty      6.15   1.29
  Effort                            Innocent    5.75   1.54
                                    Guilty      6.13   1.24
  Tension                           Innocent    3.00   1.56
                                    Guilty      3.21   1.75

Note: No comparisons were significantly different between groups.

Table 4. Overall Movement: Multilevel Regression Model Results

  Fixed effects          β             β standard error
  Intercept              0.044 n.s.    0.069
  Guilty                 0.102 n.s.    0.109
  Target Item            –0.197 n.s.   0.171
  Guilty × Target Item   –0.624*       0.267

Notes: N = 1,887. Model fit using maximum likelihood. * p
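For illustration only, a model of the shape reported in Table 4 could be fit with statsmodels' MixedLM, treating Participant as a random intercept. The data frame and column names below are hypothetical, and this is a sketch of the general approach rather than the authors' exact specification:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def fit_rigidity_model(df):
    """Random-intercept model: summed standardized movement regressed
    on Guilty, Target Item, and their interaction, grouped by
    participant to account for repeated measurements."""
    model = smf.mixedlm("movement_z_sum ~ guilty * target_item",
                        data=df, groups=df["participant"])
    return model.fit()

# Synthetic data in the experiment's rough shape: 40 participants with
# 10 foil items each; guilty participants suppress movement on targets.
rng = np.random.default_rng(7)
df = pd.DataFrame({
    "participant": np.repeat(np.arange(40), 10),
    "guilty": np.repeat([0, 1], 200),
    "target_item": np.tile([0, 0, 0, 0, 1, 0, 0, 0, 0, 0], 40),
})
df["movement_z_sum"] = (rng.normal(0.0, 1.0, len(df))
                        - 0.6 * df["guilty"] * df["target_item"])
result = fit_rigidity_model(df)
```

The fitted `guilty:target_item` coefficient plays the role of the Guilty × Target Item estimate in Table 4.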