Consequences of shifting from one level of automation to another: main effects and their stability

Francesco Di Nocera 1, Bernd Lorenz 2, & Raja Parasuraman 3

1 Cognitive Ergonomics Laboratory, Department of Psychology, University of Rome “La Sapienza”, Rome, Italy
2 Institute of Flight Guidance, German Aerospace Center (DLR), Germany
3 ARCH Lab, George Mason University, Fairfax, VA, USA
In D. de Waard, K.A. Brookhuis, R. van Egmond, and Th. Boersema (Eds.) (2005), Human Factors in Design, Safety, and Management (pp. 363-376). Maastricht, the Netherlands: Shaker Publishing.

Abstract

A simulated space operations environment was used to investigate the effect of changes in the distance between levels of automation (LOA) on operator telerobotic performance. Participants were assigned to four experimental conditions corresponding to four different LOAs. During the simulation, trials either remained at the group's LOA or shifted to another level. Participants performed three tasks: i) a rover task: controlling the rover's cruise and picking up (by means of a mechanical arm) samples of rocks from the terrain; ii) a fault detection task: detecting the occurrence of a fault; and iii) a recovery task: remembering the location of each of the simulated operators within the station, in order to recover from the fault. No performance costs were observed when two successive trials involved the same LOA. In contrast, upward and downward shifts in LOA led to performance costs that were modulated by stage of processing and workload. In particular, when workload was high, shifting from decision support to manual control (LOA 2 and 0 respectively) impaired performance in the detection task, whereas the same shift led to better performance in the working memory task. Finally, a moderate LOA appears unsuited to supporting working memory tasks.

Introduction

Automation is being introduced into systems across many domains of work and everyday life, as part of the move towards “ubiquitous computing”. Automated subsystems now provide the human operator with valuable support in such domains as air, ground, space, and maritime transportation, military command and control, health care, and other areas. These types of computer support can be considered to define different levels of automation (LOA) between the extremes of fully manual and fully automated control (Sheridan, 2002). Between these two extremes a variety of intermediate LOA can be identified, each of which can be conceptualized as a compromise between human and machine responsibilities. A given system could be designed for a particular LOA on the basis of criteria such as system safety and efficiency, as well as human performance criteria such as the
maintenance of situation awareness and balanced workload (Endsley & Kaber, 1999; Parasuraman, Sheridan, & Wickens, 2000). LOA may also be modified in real time during system operations, as in so-called adaptive automation (Moray, Inagaki, & Itoh, 2000; Parasuraman, Molloy, & Singh, 1996; Scerbo, 1996; Scerbo et al., 2001). In adaptive systems, task allocation between the operator and the computer is flexible and context-dependent. Adaptive automation may reduce the human performance costs (unbalanced mental workload, reduced situation awareness, complacency, skill degradation, etc.) that are sometimes associated with high-level decision automation.

Several investigators have examined the effects of different LOA on performance. According to Parasuraman et al. (2000), high LOA can be usefully implemented for information acquisition and analysis functions, whereas decision-making functions are best supported by moderate LOA. Studies by Crocoll & Coury (1990), Sarter & Schroeder (2001), and Rovira, McGarry, & Parasuraman (2002) support this view by showing that unreliable decision automation leads to greater costs than unreliable information automation. Kaber, Onal, & Endsley (2000), Endsley & Kiris (1995), Endsley & Kaber (1999), and Kaber, Onal, & Endsley (1998) also provide support for a “moderate” LOA philosophy. The underlying rationale views moderate LOA as an optimal balance with respect to the performance trade-off between the benefits of reduced workload associated with higher LOA on the one hand and the better maintenance of situation awareness associated with lower LOA on the other. These studies induced rare automation failure events that required operators to return to full manual control. Typically, the higher the LOA prior to this event, the poorer the return-to-manual performance or, in other words, the higher the out-of-the-loop performance cost.

Lorenz, Di Nocera, Röttger, & Parasuraman (2002), however, have shown that a higher LOA in a complex fault-management task does not necessarily lead to poorer return-to-manual performance under automation failure in comparison to a moderate LOA, as long as the interface supports operator information sampling to maintain situation awareness. In fact, the moderate LOA was found to be linked to greater disengagement from sampling fault-relevant information. Apparently this LOA directed the operator's attention to lower-order manual implementation of fault-recovery actions at the expense of monitoring the impact of these activities on higher-order system constraints. This effect was mitigated in a follow-up study that used an integrated display in support of fault state monitoring (Lorenz, Di Nocera & Parasuraman, 2004). According to these studies, the LOA per se is not necessarily the crucial factor affecting out-of-the-loop performance costs. In general, there appear to be differential effects of LOA by stage of processing and interface type. Yet the experimental procedure used in these studies involved LOA shifts in different blocks, making it difficult to generalize the effects found to the adaptive automation domain. Indeed, adaptive automation assumes changes in LOA within shorter time frames, even from trial to trial, and there is very little research on such dynamic shifts in LOA. Furthermore, it is unclear whether the direction of the shift (up or down the LOA continuum) affects performance.
Accordingly, the present experiment was carried out to understand whether the distance (the extent of the “jumps” in the LOA hierarchy) and direction (upward vs. downward) of the shift from one LOA to another affect human performance. To this aim, the impact of different LOA at different stages of processing was examined during a complex task: a simulated space telerobotic operation. Subjects were assigned to four groups, each interacting with the system most of the time at a particular level of automation, and occasionally at other levels.

Two main hypotheses can be tested with this design. The first hypothesis is that, when considering only distance “zero” (no shift between levels), the performance of the four groups should be equivalent, independently of the task the subjects are carrying out. Indeed, the groups are set to act within a particular LOA and should show neither benefits nor costs associated with the execution of tasks within that level. The second hypothesis is that, when shifting from one LOA to another (as occurs in adaptive systems), specific distance and direction effects should be observed. A linear pattern is expected for downward shifts (the larger the downward distance, the larger the performance decrement), because of the costs associated with re-engagement of a previously dismissed process. This pattern can be expected from the studies cited above, which address the out-of-the-loop performance problem associated with varying LOA using the return-to-manual paradigm. Conversely, upward shifts should provide less predictable results. The opposite linear pattern (the larger the upward distance, the larger the performance increment) would suggest that there is no cost associated with the disengagement of a process, and hence that operators adopt a very passive role when interacting with the system. Considering that subjects in the present experiment did not deal with a 100% reliable system, one would expect them to show some resistance to the disengagement of a process.

Experiment

Method

Participants

Sixteen Catholic University of America undergraduate students (8 males and 8 females) volunteered to participate. Their mean age was 23 years, ranging from 18 to 35 years. All had normal or corrected-to-normal vision. All participants reported being right-handed and were naive as to the hypotheses of the experiment. Each student received $40 for participation.

Tasks

The task used in this experiment was composed of two separate simulations: 1) a telerobot simulation and 2) a station management simulation. One could consider the whole task as a videogame, with its rules and scores. The aim was to immerse the participants in a simulated scenario: the exploration of an area on Mars, pretending that they were scientists who controlled the entire operation from their workstation. The supposed location of the scientist was a space station in orbit around the planet, with no communication delay between station and planet surface. There were several
differences between actual telerobot workstations used in space research and this simulation: no communication delay was modelled, and the scientist had more responsibilities (e.g. station monitoring) than is typically the case. However, space telerobotics for scientific exploration is still in its early days, and no standard exists yet. Furthermore, the aim of this apparatus was to provide a challenging yet general task that allowed generalization of the results. The use of such microworlds is considered a viable compromise between the amount of complexity needed to derive meaningful conclusions and the amount of experimental control needed to assess the impact of target experimental variables with sufficient reliability (Gray, 2002; Lorenz, Di Nocera & Parasuraman, 2003).

Telerobot simulation. The telerobot simulation software was developed in Virtual Reality Modelling Language (VRML 2.0). Strictly speaking, VRML is neither virtual reality nor a modelling language: virtual reality typically implies an immersive 3D experience and 3D input devices, whereas VRML requires no immersion, although it does not preclude it. The scenario provided the participants with a picture close enough to the true operational environment. The terrain was a reasonable simulation of the Martian soil. The rover structure was very spartan, but complete enough to provide all the elements needed to operate. Sensors were attached to the joints in order to allow routing from a control panel (a set of sliders) to them. The manipulator and the lower arm moved only vertically, while the upper arm moved vertically, horizontally, and around an imaginary cone. The arm was dexterous enough to allow pick-up and release operations, but tricky enough that operating it remained a complex task. Due to programming constraints it was not possible to detect collisions between the arm and the samples; therefore a “rule” was implemented in the task: the samples were located in coloured boxes, defined as “areas of interest”, and to pick up a sample the participants had to immerse the manipulator in its box. To provide a rule for releasing as well, subjects needed to place the manipulator within a green frame prior to pressing the release button. Users' behaviour was collected using scripts associated with each element of the control panel and triggered as the rover approached a sample. In this way it was possible to obtain specific data regarding:
1. time needed to approach a specific sample;
2. time needed to collect the sample;
3. number of movements needed for the operation;
4. number of samples collected.

Station management simulation. The station management interface was developed using Microsoft Visual Basic 5.0. It provided a set of indicators and controls that the participants used to monitor and control the station environment, as well as some of the rover functions, namely the speed and the camera view. Another important function of this interface was to provide a set of tools for scientist-operator and system-scientist communication. Everything in the simulation was scripted; the timing of each event was strictly matched across participants. Audio warnings, also script-driven, started one minute before the time allowed for the area exploration expired.
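As an illustration of the pick-up rule described above, the following sketch implements a simple containment test. All names are hypothetical, and the axis-aligned box geometry is an assumption: the paper states only that the manipulator had to be immersed in the coloured box (the original was scripted in VRML, not Python).

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Hypothetical axis-aligned 'area of interest' around a sample."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    z_min: float
    z_max: float

    def contains(self, x: float, y: float, z: float) -> bool:
        """True if the point lies inside the box on all three axes."""
        return (self.x_min <= x <= self.x_max
                and self.y_min <= y <= self.y_max
                and self.z_min <= z <= self.z_max)

def can_pick_up(tip, area_of_interest):
    """Pick-up rule: the manipulator tip must be immersed in the box."""
    return area_of_interest.contains(*tip)

# Example: a box around a sample, and a manipulator tip inside it.
box = Box(0.5, 1.5, 0.0, 0.5, 3.0, 4.0)
print(can_pick_up((1.0, 0.2, 3.5), box))  # True
```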
Six environmental support indicators were placed in the right part of the interface. Four of them (Atmosphere, Temperature, Humidity, and Pressure) had a red scale with a green central part indicating the interval of normal values. Two other indicators were LEDs showing the amount of energy available and the level of radioactivity in the station environment. Further indicators included a small window showing updates of the operators' positions and a strip chart showing the mechanical stress transmitted from the terrain to the rover. The Intercom was a separate window invoked by the Intercom button, located in the lower right part of the interface. It provided three sets of mutually exclusive options corresponding to:
1. the ten faults that could occur (the first four indicators could show values that were too high or too low, while radioactivity could only be too high and power only too low);
2. the names of the three operators in charge of the different systems;
3. the three areas where the operators could be located.
The Automatic Aid System was represented by a window (usually blank) showing messages from the system. These messages were scripted, like everything else in the simulation, and provided help according to the four LOA used. Response times and subjects' choices were collected during the mission simulation.

The LOA used in this experiment were:
“System Action” (LOA 3): automation took control of the task from the operator and then sent the operator a notification message reporting what had been done;
“Notification + Suggestion” (LOA 2): automation notified the operator of events deviating from the normal behaviour of the system and suggested a possible solution to the problem;
“System Notification” (LOA 1): automation notified the operator of events deviating from the normal behaviour of the system;
“No Automation” (LOA 0): automation was absent, and the operator carried out the task without help.
Note that the LOA numbers used in this paper are unrelated to Sheridan's LOA scale; they simply represent a continuum of levels (0 is one step below 1, 1 one step below 2, and 2 one step below 3). Numbering the levels from 0 to 3 made it possible to compute shifts in terms of distances ranging from -3 (three steps down) to +3 (three steps up). See Table 1 for details.

Procedure

At the beginning of the experiment, the Immersive Tendencies Questionnaire (Witmer & Singer, 1994; 1998) was administered. After completing the questionnaire, participants sat in front of a computer monitor in a dark room. They were given a practice session (LOA 3 for all participants) and then randomly assigned to one of four experimental conditions corresponding to the four LOA described above. After the training, they engaged in three experimental sessions, each 45 minutes long. Participants in each group performed two tasks in the simulation: the fault detection task consisted of detecting the occurrence of a fault, and the recovery task involved remembering the position in the station of each of the operators, in order to recover from the fault. Each group executed the Detection (of a fault) and Recovery (remembering the recovery procedure) tasks receiving help from the system at the four levels of automation: on 40% of fault occurrences they received help at the LOA characterizing their group, while on the remaining occurrences they received help at the other three LOA (20% each). The order of the fault occurrences was
randomized across subjects. LOA distance and direction were defined as the difference between the LOA at which the system help was set on each trial (LOA[t]) and the LOA which characterized the group (LOA[g]). Hence, distance “0” means that on that trial the LOA[t] and the LOA[g] were identical; distance “1” means that the LOA[t] was one level above the LOA[g]; distance “-2” means that the LOA[t] was two levels below the LOA[g]; and so on. Table 1 shows how distances between LOA were computed.

Table 1. Distances between LOA (i.e., automation shifts), indicated in the cells. LOA[g] stands for the group's LOA, whereas LOA[t] stands for the task's LOA.

            LOA[g]=0      LOA[g]=1      LOA[g]=2      LOA[g]=3
LOA[t]=0    0 (no shift)  -1            -2            -3
LOA[t]=1    +1            0 (no shift)  -1            -2
LOA[t]=2    +2            +1            0 (no shift)  -1
LOA[t]=3    +3            +2            +1            0 (no shift)
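As a concrete illustration (not part of the original apparatus, which was scripted in VRML and Visual Basic), the shift computation reduces to a signed difference. A minimal Python sketch reproducing Table 1:

```python
def shift_distance(loa_group: int, loa_trial: int) -> int:
    """Signed distance of a trial's LOA from the group's habitual LOA:
    positive = upward shift (more automation than usual),
    negative = downward shift, 0 = no shift."""
    return loa_trial - loa_group

# Reproduce Table 1: rows are LOA[t], columns are LOA[g] = 0..3.
for loa_t in range(4):
    row = [shift_distance(loa_g, loa_t) for loa_g in range(4)]
    print(f"LOA[t]={loa_t}: {row}")
# LOA[t]=0: [0, -1, -2, -3]
# LOA[t]=1: [1, 0, -1, -2]
# LOA[t]=2: [2, 1, 0, -1]
# LOA[t]=3: [3, 2, 1, 0]
```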
Two workload conditions were used. In the low workload (single-task) condition subjects only monitored the station functions, whereas in the high workload (dual-task) condition they also controlled the rover. These conditions were introduced in order to test whether changes in task demand would differentially affect performance across the LOA conditions. After the three sessions, participants were asked to rate their trust in the automated system on a scale ranging from 1 (“I don't trust this system”) to 7 (“I trust this system”). Finally, the Presence Questionnaire (PQ) (Witmer & Singer, 1994; 1998) was administered to all participants following the conclusion of the task.

Design

Since very few errors occurred (1.6 mean errors per condition), only response times were analyzed. A comparison between “no-shift” trials (that is, trials where the distance LOA[t]-LOA[g] was 0, i.e., 0-0, 1-1, and 2-2) was carried out using a LOA[g] (0 vs. 1 vs. 2) by Workload Condition (low vs. high) by Task (detection vs. memory) ANOVA design. LOA[g]=3 was not taken into consideration here, because subjects did not provide any response at this level. Analyses on distances were carried out separately for the two tasks using a LOA[g] by Distance (2 vs. 1 vs. 0 vs. -1 vs. -2 vs. -3) by Workload nested design, with Distance(LOA[g]) as the nested factor. Indeed, this unbalanced design allowed for the analysis of distances 0, 1, and 2 (derived from LOA[g]=0), distances -1, 0, and 1 (derived from LOA[g]=1), distances -2, -1, and 0 (derived from LOA[g]=2), and distances -3, -2, and -1 (derived from LOA[g]=3). Additionally, one-way ANOVAs by LOA[g] were performed on 1) a composite index of telerobotic performance (number of telerobotic arm movements divided by number of samples collected), 2) the immersive tendencies and presence scores, and 3) the trust in automation reported by participants on the scale described above.
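To make the nesting concrete, the following sketch (illustrative Python, not the original analysis code) enumerates the distances each group can contribute, under the reading suggested by the design section that responses occur only at LOA[t] = 0, 1, and 2; this is why Distance must be treated as nested within LOA[g].

```python
# Each group experiences only the distances reachable from its own
# LOA[g]. Assumption from the design description: trials at LOA[t]=3
# ("System Action") produce no operator response, so only
# LOA[t] = 0, 1, 2 contribute response times.
RESPONDING_TASK_LOAS = (0, 1, 2)

for loa_g in range(4):
    distances = sorted(loa_t - loa_g for loa_t in RESPONDING_TASK_LOAS)
    print(f"LOA[g]={loa_g}: observable distances {distances}")
# LOA[g]=0: observable distances [0, 1, 2]
# LOA[g]=1: observable distances [-1, 0, 1]
# LOA[g]=2: observable distances [-2, -1, 0]
# LOA[g]=3: observable distances [-3, -2, -1]
```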
Results

The LOA[g] by Workload by Task ANOVA carried out only on distance-0 trials showed no significant differences between the response times associated with the three LOA[g] (F(2,8) = 0.65, n.s.). A main effect of Task was found (F(1,8) = 28.01, p