
2015 IEEE International Multi-Disciplinary Conference on Cognitive Methods in Situation Awareness and Decision Support (CogSIMA)

Fidelity & Validity in Robotic Simulation

K. Elizabeth Schafer, Tracy Sanders, Theresa T. Kessler, Mitchell Dunfee, Tyler Wild, and P. A. Hancock
University of Central Florida, Orlando, FL

Keywords: Human-Robot Interaction; Trust; Simulation Validation


Abstract

This work assesses the relationship between common theoretical constructs involved in simulation design and evaluation. Specifically, the degree to which realism is a desired goal in design is examined through a thorough review of the available literature. It was found that, especially for training simulations, high fidelity does not always beget improved outcomes, and this finding was corroborated by the results of an experiment involving a simulated robot. In the within-subjects experiment, participants rated their trust in both live and simulated versions of a robot performing in both reliable and unreliable scenarios. As predicted, strong correlations in both the reliable and unreliable scenarios validate the RIVET simulation engine as a model for trust in HRI and provide further evidence that relatively low-fidelity simulations can sometimes be sufficient or superior to high-fidelity alternatives.


1. INTRODUCTION

The purpose of this work was to validate the use of the Robotic Interactive Visualization & Exploitative Technology (RIVET) simulation engine in modeling trust in human-robot interaction (HRI). RIVET displays a variety of currently utilized military robots through computer-delivered simulations for the purpose of modeling robot behavior in HRI experimentation. For the simulation to be useful, it is crucial that human users respond equivalently to the simulation and to interaction with a live robot. Previous work has looked into determining appropriate methods for validating simulations, with the overall consensus that simulation development is context-dependent [Feinstein and Cannon 2002; Law 2009]. Here, we focus on the use of this computer-delivered simulation to gain information about trust in HRI. Simulations are often evaluated by comparing data from the simulator to real-world data [Reimer et al., 2006]. In this experiment, we examine the efficacy of using the RIVET simulation to measure trust responses in HRI. As our live and simulated conditions present the same robot in the same scenario, we expect to find a high correlation of reported trust levels between the simulated and live trials.




1.1. Trust

Trust is an important component of HRI, as illustrated by its direct links to outcomes such as team effectiveness and performance [Lee and See 2004]. As robot capabilities and the domains in which robots are used continue to expand, the issue of trust becomes ever more important. Trust is one of the largest contributing factors to the success of interactions involving any kind of automated device [Sheridan and Ferrell 1974], and is therefore a determining factor in the success of any task involving robotic systems.

1.2. Clarifying Evaluation Constructs

Three broad, interdependent constructs used in evaluating simulations are fidelity, validity, and verification. Generally speaking, fidelity refers to a simulation's realism, validity refers to its practicality, and verification refers to the correctness of the model. Hughes and Rolek [2003] have suggested that fidelity is the ability of the simulation to represent reality, whereas validity refers to the actual usefulness of the model. Verification then deals with whether the model operates as intended: "Validation is the process of determining that we have built the right model, whereas verification is designed to see if we have built the model right" [Pegden, Shannon, and Sadowski 1995, 129]. The definitions of fidelity and validity in the literature are inconsistent and often overlap. Validity, especially when it is broken down into different types, often assumes characteristics that would otherwise be restricted to fidelity. For instance, the term "representational validity" (which, in turn, can be further broken down into subtypes) is the degree to which the simulation accurately represents the desired phenomenon [Feinstein and Cannon 2002], thus contradicting the aforementioned explanation provided by Hughes and Rolek [2003]. That is an example of validity encroaching on fidelity. This is not inherently wrong, but we feel that such an overlap practically nullifies the need for fidelity as a separate term. Several points nonetheless argue for retaining it: (1) fidelity's importance from a cost standpoint, (2) its role in the planning (as opposed to post-evaluation) stage of project development, and (3) the large differences in how realism and practicality are measured. Another reason for the fidelity-validity overlap may be an overemphasis on what fidelity should be, as opposed to what it simply is. From Hays and Singer [1988], we find this definition of fidelity: "how similar a training situation must be, relative to the operational situation, in order to train most efficiently" [p. 1]. In this instance, the key component of fidelity, degree of realism, is preserved. However, efficiency - though important - is a foreign addition to the term's meaning. Determining efficiency requires a cost-benefit analysis, in which fidelity could constitute the "cost" (in space, equipment, programmers, etc.), while validity must, by default, be estimated for the "benefit" side of the equation. Saying a simulation has high fidelity is simply a description; saying it has an appropriate level of fidelity requires a value judgment. It is similar to the difference between "That bucket is a dark blue" and "I like the color blue on that bucket", or between "That simulation is very realistic" and "The simulation's realism adds to its quality". Due to a historical bias towards (unnecessarily) high-fidelity simulations, a priority of Hays and Singer [1988] was to emphasize, as they put it, "efficiency" when planning for simulation fidelity. However, as we have explained, that is fidelity in good practice, not its core meaning. Additionally, fidelity is subject to the law of diminishing returns: tweaking certain aspects of a simulation, such as video resolution, may not be worth the time or resources relative to the final design goals. Increased realism may even be counterproductive. This concept was explored in the Alessi hypothesis [Alessi 1988] (see Figure 1).

Figure 1. The Alessi Hypothesis (Alessi 1988).

Furthermore, simulations can provide learning benefits that are sometimes limited or unavailable in real life and, likewise, in high-fidelity systems. Such features include: detailed, precise, and immediate feedback; the ability to track and adjust to trainee performance in a steady, systemized manner; unique time and pacing manipulation (e.g., pause, restart); and the simplification of complex task sequences. Thus, as Hays and Singer [1989] argue, such simplifying measures "reduce the realism of the training situation, but enhance learning". This may be especially true for novices, who can be over-stimulated by high-fidelity systems [Feinstein and Cannon 2002; Alessi 1988]. Therefore, fidelity is a means to the end goal of validity, but increases in fidelity do not necessarily beget improved outcomes. Accordingly, computer-based systems like RIVET can be capable simulations, as our evidence supports.

1.3. Current Work

Simulation engines are commonly used to model behavior in order to make predictions about human behavior in the field. However, simulations must be evaluated on a case-by-case basis depending on their intended use, necessitating unique evaluations for nearly every type of model. It is especially important to examine simulations that propose to model trust in HRI, since the amount of trust afforded to a simulation is sometimes different from the trust a user affords a live robot. For instance, it has been found that participants consider live robots more intelligent, helpful, and likeable than their simulated counterparts [Wainer et al., 2007], and are more likely to give them personal space and to obey their commands [Bainbridge et al., 2008]. However, other studies have shown similar trust ratings for live and simulated robots. Woods and colleagues [2006] found no significant differences in reported comfort levels between live robot and videotaped robot scenarios. Sometimes, the difference is not an issue of the robot at all, but of the live or simulated environment. Shinozawa [2005] found such differences to depend on visual incongruities between the two interaction environments: it was the look of the simulated environment, not the absence of the robot's physical presence, that caused them. The human user must trust the robotic system in order for HRI to be most productive. For that reason, we manipulated the reliability of the robot in order to manipulate user trust levels. Participants monitored the progress of both a live robot and a simulated robot in conditions where the robot was either reliable or unreliable, then rated the robot on trustworthiness. We expected to find a large correlation between trust ratings for the live and simulated robot conditions, and a significant difference in trust levels between the 100% reliable and 25% reliable conditions.


2. METHOD





2.1. Participants

Undergraduate students (N = 20) were recruited at the University of Central Florida through the Sona System, an online human subject pool management system used by universities. As compensation for their participation, participants were given Sona credits to be used as extra credit in previously approved psychology courses. Participants were 9 male and 11 female students with a mean age of 18.25 years (SD = 0.72).

2.2. Design

This 2 (presentation method: live or simulated) x 2 (reliability: 25% or 100%) within-participants design compared participants' trust ratings of live robots to their trust ratings of simulated robots, in conditions where the robot was either 25% reliable or 100% reliable. Each participant completed four 2-minute trials, counterbalanced to account for order effects (see the illustrative sketch following the Materials subsection):
· Live robot / 25% reliable
· Live robot / 100% reliable
· Simulated robot / 25% reliable
· Simulated robot / 100% reliable

2.3. Materials

Using a laptop computer, participants filled out questionnaires and provided basic demographic information. The Negative Attitude Toward Robots Scale (NARS) [Nomura et al., 2006] was used to evaluate pre-existing biases against robots. To assess individual differences in personality, the Mini-IPIP [Donnellan et al., 2006] was administered. Trust towards the robot was evaluated using the Robot Trust Scale [Schaefer, 2013], which consists of both pre- and post-interaction scales; the pre-interaction section provides the user with an image of the robot on which to base their trust ratings. Participants were also provided with a Robot Performance Worksheet to record the robot's progress on its task. The live conditions were enacted using a LEGO Mindstorms NXT 2.0 robot. The simulated conditions were deployed using the RIVET simulation engine on the same laptop computer that delivered the surveys. RIVET is a game-based, interactive, robotic modeling and simulation platform.
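To make the counterbalancing step concrete, the sketch below enumerates balanced presentation orders for the four trial conditions. It is a hypothetical reconstruction: the paper does not state which counterbalancing scheme was used, so a balanced Latin square is assumed, and all names and values are illustrative.

```python
# Hypothetical sketch of counterbalancing the four 2-minute trials.
# The paper does not state the exact scheme; a balanced Latin square
# over the four conditions is assumed here purely for illustration.

CONDITIONS = [
    "live / 25% reliable",
    "live / 100% reliable",
    "simulated / 25% reliable",
    "simulated / 100% reliable",
]

def balanced_latin_square(items):
    """Build orders so that each condition appears in each serial
    position exactly once (standard construction for an even number
    of conditions)."""
    n = len(items)
    orders = []
    for start in range(n):
        order, low, high = [], start, start + 1
        for k in range(n):
            idx = (low if k % 2 == 0 else high) % n
            order.append(items[idx])
            if k % 2 == 0:
                low -= 1
            else:
                high += 1
        orders.append(order)
    return orders

ORDERS = balanced_latin_square(CONDITIONS)

def trial_order(participant_id):
    # Rotate through the counterbalanced orders across participants.
    return ORDERS[participant_id % len(ORDERS)]

if __name__ == "__main__":
    for pid in range(4):
        print(pid, trial_order(pid))
```

With four conditions this yields four orders in which every condition occupies every serial position exactly once, so assigning participants to orders in rotation spreads order effects evenly.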

2.4. Procedure

After participants were greeted, they received an informed consent document to read and verbally accept. Next, they were asked to complete a demographics questionnaire, the Negative Attitude Toward Robots Scale (NARS) [Nomura et al., 2006], the Mini-IPIP [Donnellan et al., 2006], and the Robot Trust Scale (Pre-Interaction) [Schaefer, 2013] using a laptop computer. In the live conditions, participants monitored the LEGO Mindstorms robot in a small experimental area. In the simulated conditions, participants monitored the simulated robot while it performed the same task in a similar environment on a laptop computer. For the training trials in both settings (live and simulated), participants were asked to monitor the robot as it completed the task of locating an object (a box), then approaching and touching it. The participants were then asked to monitor the progress of the robot as it performed the same task with multiple objects (a box, a cone, a plant, and a pot), and to mark the robot's progress on a Robot Performance Worksheet. Participants monitored the robot performing its task during two trials for each setting: in one trial, the robot performed its task 100% reliably, and in the other trial, it performed its task 25% reliably. Following each trial, participants were asked to complete the Robot Trust Scale (Post-Interaction) [Schaefer, 2013]. The data collected from the Robot Performance Worksheets served as a manipulation check to confirm that participants understood the task, performed it properly, and noticed when the robot was not performing reliably. Following the final trial, participants were provided with post-experiment information describing the background and purpose of the study.

3. RESULTS


Trust ratings for the 20 participants were calculated in each of the four conditions (live reliable, live unreliable, simulated reliable, simulated unreliable) by subtracting each participant's initial Robot Trust Scale [Schaefer, 2013] score from their post-trial score, yielding a difference score for each participant per treatment condition.
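The sketch below illustrates how such difference scores could be computed; the data structure and values are invented for illustration and are not the study's records.

```python
# Hypothetical sketch: computing trust difference scores
# (post-interaction minus pre-interaction Robot Trust Scale score)
# for each participant in each condition. All values are invented.
from collections import defaultdict

# pre_scores[pid] -> pre-interaction trust score
# post_scores[(pid, condition)] -> post-trial trust score
pre_scores = {1: 80.0, 2: 72.5}
post_scores = {
    (1, "live/100%"): 95.0, (1, "live/25%"): 40.0,
    (1, "simulated/100%"): 93.0, (1, "simulated/25%"): 42.0,
    (2, "live/100%"): 88.0, (2, "live/25%"): 35.0,
    (2, "simulated/100%"): 85.0, (2, "simulated/25%"): 38.0,
}

def difference_scores(pre, post):
    """Return {condition: {pid: post - pre}} difference scores."""
    diffs = defaultdict(dict)
    for (pid, condition), post_score in post.items():
        diffs[condition][pid] = post_score - pre[pid]
    return dict(diffs)

if __name__ == "__main__":
    for condition, by_pid in difference_scores(pre_scores, post_scores).items():
        print(condition, by_pid)
```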




Figure 2. Each participant's trust ratings for all conditions.

The first analysis determined the correlation between the two 100% reliable conditions (live and simulated) and the two 25% reliable conditions (live and simulated).



After verifying assumptions, the correlation between the live and simulated conditions was evaluated using the Pearson product-moment correlation coefficient. Strong correlations were found in both the 100% reliable (r(20) = .878, p < .001; accounting for 77.0% of the variance) and 25% reliable (r(20) = .895, p < .001; accounting for 80.1% of the variance) scenarios.
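For readers who wish to reproduce this style of analysis, the sketch below computes a Pearson correlation and the corresponding proportion of variance (r squared) with SciPy; the arrays are placeholders rather than the study's data.

```python
# Hypothetical sketch: Pearson correlation between live and simulated
# trust difference scores within one reliability level. The arrays are
# placeholders; the paper's actual data are not reproduced here.
import numpy as np
from scipy import stats

live_100 = np.array([60.0, 55.0, 70.0, 42.0, 65.0, 58.0])       # invented
simulated_100 = np.array([58.0, 50.0, 68.0, 45.0, 60.0, 61.0])  # invented

r, p = stats.pearsonr(live_100, simulated_100)
variance_explained = r ** 2  # proportion of variance shared by the two settings

print(f"r = {r:.3f}, p = {p:.3g}, r^2 = {variance_explained:.3f}")
```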

Figure 3. Each participant's trust ratings for the 100% reliable condition.

The second analysis examined the difference between trust ratings in the reliable and unreliable conditions. As expected, the 100% reliable scenarios were rated significantly higher on trust scales than the 25% reliable scenarios [F(1, 19) = 157.48, p < .001].
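One way to carry out this comparison is sketched below. Because reliability is a two-level within-subjects factor, a paired t-test on each participant's mean trust score per reliability level is equivalent to the repeated-measures F(1, N-1) reported above (F = t^2); the numbers are placeholders, not the study's data.

```python
# Hypothetical sketch: comparing trust under 100% vs. 25% reliability.
# Scores are averaged over presentation method (live, simulated) per
# participant; with two levels of a within-subjects factor, the paired
# t statistic squared equals the repeated-measures F(1, N-1).
import numpy as np
from scipy import stats

trust_100 = np.array([62.0, 55.0, 71.0, 48.0, 66.0, 59.0])       # invented means
trust_25 = np.array([-30.0, -42.0, -18.0, -55.0, -25.0, -37.0])  # invented means

t, p = stats.ttest_rel(trust_100, trust_25)
f_equivalent = t ** 2  # F(1, N-1) for the two-level within-subjects factor

print(f"t = {t:.2f}, p = {p:.3g}, equivalent F = {f_equivalent:.2f}")
```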

Figure 4. Each participant's trust ratings in the 25% reliable condition.

4. DISCUSSION

4.1. Implications of Results

These results confirm our first hypothesis, with strong correlations of trust ratings across settings (live and simulated) in both the reliable and unreliable conditions. Additionally, the data show a significant difference in trust ratings between the 100% and the 25% reliable conditions, confirming our second hypothesis. Our conclusion from these correlations is twofold: 1) that RIVET has an appropriate degree of fidelity for HRI simulation of trust, and 2) that it is valid for modeling user trust.

4.3. Conclusion

In our experiment, this capability refers to RIVET's valid depiction of a phenomenon found to occur with live robots: that trust diminishes with robot error and unreliability. As discussed previously, the different types of validity generated by researchers can be troublesome for their overlap with each other and with other theoretical constructs, but we feel it is helpful to use a few of those terms here in order to fully illustrate the potential RIVET displayed in our experiment. In Feinstein and Cannon's excellent 2002 review, a number of helpful validity subtypes avoid encroaching on fidelity and verification, warranting their use. RIVET has content validity because it performs its intended function (modeling HRI), and conceptual validity because it adequately represents the real-world HRI trust system. Finally, RIVET is operationally valid because participant behavior in the simulation was consistent with that in the live sessions. Since this experiment utilized a simple design, future research should investigate more complicated tasks involving greater participant engagement to further test RIVET's capabilities. A greater degree of realism may be required, or more intricate HRI trust variables may lie outside RIVET's modeling scope. However, the strength of our data lends considerable support to RIVET, and to the use of simulations generally, in place of live robot sessions in HRI experiments.




References

Alessi, S. M., 1988, "Fidelity in the Design of Instructional Simulations", Journal of Computer-Based Instruction, 15, no. 2, (Spring): 40-47.

Bainbridge, Wilma A., Justin Hart, Elizabeth S. Kim, and Brian Scassellati. 2008. "The effect of presence on human-robot interaction." In Proceedings of the IEEE RO-MAN 2008 17th International Symposium on Robot and Human Interactive Communication, (Munich, Germany, August 1-3), 701-706.

Donnellan, Brent M., Frederick L. Oswald, Brendan M. Baird, and Richard E. Lucas, 2006, "The mini-IPIP scales: tiny-yet-effective measures of the Big Five factors of personality", Psychological Assessment, 18, no. 2, (June): 192.

Feinstein, Andrew H. and H. M. Cannon, 2002, "Constructs of simulation evaluation", Simulation & Gaming, 33, no. 4, (December): 425-440.

Hughes, Tom and Evan Rolek. 2003. "Fidelity and validity: issues of human behavioral representation requirements development." In Proceedings of the 2003 Winter Simulation Conference, (New Orleans, LA, December 7-10), 976-982.

Law, Averill M. 2009. "How to build valid and credible simulation models." In Proceedings of the 2009 Winter Simulation Conference, (Austin, TX, December 13-16), 24-33.

Lee, John D., and Katrina A. See, 2004, "Trust in automation: Designing for appropriate reliance", Human Factors, 46, no. 1, 50-80.

Nomura, Tatsuya, Tomohiro Suzuki, Takayuki Kanda, and Kensuke Kato. 2006. "Altered attitudes of people toward robots: Investigation through the Negative Attitudes toward Robots Scale." In Proceedings of the AAAI-06 Workshop on Human Implications of Human-Robot Interaction, (Boston, MA, July 17), 29-35.

Pegden, Claude Dennis, Robert E. Shannon, and Randall P. Sadowski. 1995. Introduction to Simulation Using SIMAN, Vol. 2. McGraw-Hill, New York, NY.

Reimer, Bryan, Lisa A. D'Ambrosio, Joseph F. Coughlin, Michael E. Kafrissen, and Joseph Biederman, 2006, "Using self-reported data to assess the validity of driving simulation data", Behavior Research Methods, 38, no. 2, (May): 314-324.

Schaefer, Kristin E., 2013. "The perception and measurement of human-robot trust." Doctoral Dissertation, University of Central Florida.

Sheridan, Thomas B., and William R. Ferrell. 1974. Man-Machine Systems: Information, Control, and Decision Models of Human Performance. MIT Press, Cambridge, MA.

Wainer, Joshua, David J. Feil-Seifer, Dylan A. Shell, and Maja J. Mataric. 2007. "Embodiment and human-robot interaction: A task-based perspective." In Proceedings of the IEEE RO-MAN 2007 16th International Symposium on Robot and Human Interactive Communication, (Jeju Island, Korea, August 26-29), 872-877.

Woods, S. N., M. L. Walters, Kheng Lee Koay, and Kerstin Dautenhahn. 2006. "Comparing human robot interaction scenarios using live and video based methods: towards a novel methodological approach." In Proceedings of the 9th IEEE International Workshop on Advanced Motion Control, (Istanbul, Turkey, March 27-29), 750-755.

Biographies

Elizabeth Schafer is a fourth-year undergraduate psychology student at the University of Central Florida (UCF). She is a research assistant for the UCF MIT2 laboratory and is currently collaborating on projects concerning human-robot interaction in team settings. Her research interests include group dynamics, power, aggression, and driving behavior. She plans to pursue a PhD in psychology with an emphasis on social psychology.

Tracy Sanders received her B.S. in Psychology in 2011 with a minor in Studio Art at the University of Central Florida. As an undergraduate, she studied trust in human-robot interaction, time perception, and 3-dimensional studio art. She also holds an A.S. in graphic design from The Colorado Institute of Art. Now a Ph.D. student in the Applied Experimental and Human Factors Psychology program at the University of Central Florida, she is a research assistant on the Robotics Collaborative Technology Alliance (RCTA) project, focusing her work on the aesthetic components of trust in robotics and the physiological indicators of trust in HRI.

Theresa Kessler is from Houma, LA. She received a B.Sc. in Marketing from Nicholls State University in 2001 and a B.Sc. in Psychology from the University of Central Florida in 2013, where she graduated with honors in the major. She is currently working on her Ph.D. in Applied Experimental and Human Factors Psychology at the University of Central Florida under the direction of Dr. Peter Hancock. She has worked in the MIT2 Laboratory since January of 2013 and is also a Graduate Teaching Assistant. Her research interests include trust and transparency in HRI, in addition to human-robot teaming.

Mitchell Dunfee is currently a graduate student in the EAMBA program at Rollins College and intends to work in the field of Forensic Engineering upon graduating. As an undergraduate, Mitchell attended UCF, where he majored in Psychology. In November 2012 he joined the MIT2 laboratory, where he aided in research and virtual environment design. Mitchell is interested in human factors research related to accident reconstruction, driving, commercial trucking, and conspicuity.

Tyler Wild is currently an undergraduate at the University of Central Florida, working on his B.S. in Psychology with a minor in Statistics. He holds a position as a research assistant for the MIT2 Lab at UCF, where he assists in research regarding Human Factors Psychology and aids in other tasks for the lab. Tyler is also interested in conducting his own research in the lab regarding pressure and its effects on performance. After completing his undergraduate studies at UCF, Tyler plans to further his education by focusing his graduate studies on Sport Psychology; specifically, he intends to pursue a Master's degree in Sport Psychology and would like to become a Sport Psychologist for a professional sports team.


