Do cognitive assistance systems reduce operator workload?

Stefan Röttger¹, Krisztina Bali², & Dietrich Manzey¹
¹ Berlin University of Technology, Berlin, Germany
² Budapest University of Technology and Economics, Budapest, Hungary
Abstract

As the operators’ tasks in complex human-machine systems are often difficult and cognitively demanding, today more and more cognitive functions are aided or completely carried out by automated assistance systems (e.g. decision or diagnostic support systems in process control). One major purpose of providing such automated assistance is to reduce operator workload. Adding a system, however, also means adding tasks and complexity to a work environment. Thus, workload reduction is not guaranteed when cognitive tasks are automated. The present study addresses the impact of the introduction of automated assistance systems on operator workload in process control. Twelve participants were asked to detect, diagnose and repair any malfunctions occurring in a process control simulation (AutoCAMS). This had to be performed manually or was supported by automated assistance systems differing in their degree of automation. Workload effects were assessed with subjective (NASA-TLX), physiological (heart rate variability, HRV), and performance indicators (secondary task reaction time, RT). Automation support significantly increased overall performance. Furthermore, subjective workload ratings were significantly lower in the assisted conditions as compared to the manual condition. This latter effect was not reflected in the HRV and RT measures, although these measures were sensitive to workload changes due to the presence or absence of faults. Possible explanations for the observed dissociation of workload measures are discussed.

Introduction

Many areas of human work are characterized by an increasing degree of automation, e.g. ground, sea and air transportation, chemical and power plants, and even health care (Sheridan, 2002). As the operators’ tasks in such systems are often difficult and cognitively demanding, today more and more cognitive functions are automated, i.e. aided or completely carried out by assistance systems.
In D. de Waard, G.R.J. Hockey, P. Nickel, and K.A. Brookhuis (Eds.) (2007), Human Factors Issues in Complex System Performance (pp. 1-11). Maastricht, the Netherlands: Shaker Publishing.

Besides increasing performance, safety, and reliability of the overall system, one purpose of
introducing such assistance systems often is to reduce operator workload. Adding a new (assistance) system, however, also means adding new tasks and complexity to a work environment. Thus, workload reduction is not guaranteed when cognitive tasks are automated. Some prominent examples of this are provided by approaches to cockpit automation in aviation. They have been criticized for often merely shifting pilots’ workload from one area to another (e.g. decreasing manual but increasing cognitive workload) instead of achieving an overall workload reduction (Billings, 1997). Similarly, the results of laboratory research provide some evidence for the ambiguous relationship between automated assistance and operator workload. Although an increase in workload due to automation has not been reported so far, the currently available evidence for workload reductions achieved by the introduction of assistance systems is equivocal at best. On the one hand, there are a number of studies which indeed report workload benefits of assistance systems. For example, McFadden, Vimalachandran, and Blackmore (2004) studied performance and workload in a naval radar tracking simulation. Participants in their studies were asked to keep track of up to 20 vessels on a radar display. Even though the demands of controlling such a high number of vessels are high, comparatively low workload ratings were obtained when a reliable automated tracking aid was available. A similar result has also been described by Goteman and Dekker (2003) with respect to pilots flying approaches with support of an area navigation system (RNAV). Although the use of RNAV was linked with additional monitoring requirements, perceived mental workload of both pilot flying and pilot not-flying remained low. In addition, there are studies which suggest a more or less direct link between operator workload and different levels of automation (LOA).
For example, Kaber, Onal, and Endsley (2000) investigated such a relationship in a telemanipulation task and found higher LOA associated with lower workload ratings. Kaber and Riley (1999) showed that mandatory adaptive automation in an air traffic control-like laboratory task can lower workload as indicated by secondary task performance. However, other studies have reported contradictory results. Using the same task as Kaber and Riley (1999), Kaber and Endsley (2004) failed to find any significant effect of LOA on secondary task performance and workload ratings. Endsley and Kiris (1995) investigated effects of different automated decision support systems on user performance and workload in a simulated planning task. Even though the assistance systems were found to affect performance, no clear relationships emerged with respect to the subjectively perceived workload in the different conditions. Similar results were also obtained by Lorenz, Di Nocera, Röttger, and Parasuraman (2002), who studied the effects of different automated assistance systems introduced to support operators’ fault identification and management in a simulated process control task, and who did not find effects of the LOA of the assistance system on subjective workload ratings either. However, one shortcoming of the above-mentioned studies is that workload effects rarely have been the main focus of investigation. Instead, these studies mainly have focussed on performance-related effects of automation where workload only was
addressed as some kind of control variable. As a consequence, most of the studies relied on only one single indicator of operator workload. This may have prevented them from fully capturing the pattern of workload-related effects associated with automated assistance systems, given that workload appears to represent a multidimensional concept, with workload assessment techniques differing in sensitivity and diagnosticity (O’Donnell & Eggemeier, 1986). Only recently, Metzger and Parasuraman (2005) examined workload effects of an assistance system for air traffic management using indicators from all three major categories of workload assessment techniques identified by O’Donnell and Eggemeier (1986): subjective ratings, performance-based measures and physiological measures. While the participants of this study subjectively did not report a significant workload reduction when working with the automated assistance, secondary task performance was better, indicating a higher amount of spare capacity in the automation condition. This latter result suggests that the introduction of the automated assistance system, on the one hand, successfully reduced the cognitive resources demanded by the primary tasks. However, because these “free” resources obviously were invested in order to enhance performance in the secondary task, the experienced overall workload remained constant. Although the results of the physiological data have been reported only partially, the study of Metzger and Parasuraman (2005) can be taken as an example of the possible advantages of analysing several workload measures at the same time in order to provide a more complete pattern of workload-related effects achieved by the introduction of automated assistance systems. The present study aims to further elaborate the relationship between operator workload and automated assistance systems.
In order to catch possible dissociations and to obtain a more comprehensive picture of automation consequences with respect to operator workload, effects of different types of automation (Parasuraman, Sheridan, & Wickens, 2000) were studied using different workload measures. For this purpose, participants in the present study were required to perform what has been referred to as a supervisory control task. More specifically, they had to monitor a process control system and to perform effective fault management in case of malfunctions. Fault management included the detection, identification and correction of any faults occurring in different subsystems of the overall process control. This latter task had to be performed manually or could be supported by one of two automated assistance systems varying in degree of automation. Workload effects were assessed by using subjective, behavioural (secondary-task performance), and physiological (heart rate variability) data. Three hypotheses were investigated: First, it was expected to find higher workload during periods where participants had to work on system malfunctions as compared to fault-free phases. Second, it was expected that fault management performance would be increased and workload decreased by the provision of automated assistance. Third, the effect of workload reduction was expected to depend on the degree of automation of the assistance system, with a higher degree of automation related to a greater reduction in workload.
Method

Participants

Twelve engineering students (five male, between 20 and 27 years old with a mean age of 24 years) were recruited for the experiment and were paid for participation. All of them were familiar with the experimental paradigm from a previous study (Manzey, Bahner & Hüper, 2006). This allowed for a marked reduction of the pre-experimental training time, and the task was considered to be difficult enough to observe automation-induced workload reductions even in participants with prior task experience. As heart rate variability (HRV) measures were used in this study, only participants reporting to be without cardiac disorders and without diabetes mellitus, which can cause chronically decreased HRV (Kudat et al., 2006), took part in the experiment.

Tasks

The experiment was conducted using a modified version of AutoCAMS (Hockey, Wastell, & Sauer, 1998; Lorenz et al., 2002), a PC-based simulation of a process control task. This simulation is based on the Cabin Air Management System (CAMS) task originally developed by Hockey et al. in order to investigate stress effects on complex human performance. AutoCAMS simulates an autonomously running life support system of a spacecraft consisting of five subsystems that are critical to maintain atmospheric conditions in the space cabin with respect to different parameters (oxygen, nitrogen, carbon dioxide, temperature and pressure). By default, all of these subsystems are automatically maintained within their target range. However, different faults may occur occasionally, due to a malfunction in any subsystem (e.g. leaks or blocks of a valve or defective sensors). The primary task of the operator involves supervisory control of the subsystems. In case of a malfunction, fault diagnosis and management are to be performed, and an appropriate repair order has to be selected and sent to the spacecraft. Malfunctions always were indicated by an unspecific master alarm represented by a light turning from green to red.
Fault diagnosis and management can be carried out manually or with support of an Automated Fault Identification and Recovery Agent (AFIRA). Such systems can vary in their degree of automation. For the present study two types of AFIRA were used. The first one (AFIRA_1) represents a system which, in case of a fault, displays both a fault diagnosis and a proposed sequence of actions for effective fault management, which then has to be implemented manually by the operator. Using the model of Parasuraman et al. (2000), AFIRA_1 can thus be described as high-level automation in the areas of information analysis and action selection. The second type, AFIRA_2, additionally offers a fully automated implementation of the suggested interventions, including sending the appropriate repair order, which constitutes a high-level automation of action implementation. In order to obtain a behavioural workload measure, participants are given a secondary task which has to be performed in addition to the process control task. This secondary task consists of a simple reaction time measure.
Participants have to respond as fast as possible by a mouse-click to the appearance of a “connection-check icon”, which is presented pseudo-randomly every 15 to 35 seconds. This task is introduced as a means of acknowledging seamless communication between control centre and remote life support system.

Measures

Data collection included four measures: primary task performance, secondary task performance, subjective workload, and heart rate variability as a physiological workload indicator. Primary task performance was analysed separately for fault diagnosis and fault management during system malfunctions. Diagnose time, defined as mean time between master alarm and selection of the correct repair order, was used as fault diagnosis performance indicator. Fault management performance was measured in terms of percent of time all relevant parameters remained within target limits. Workload measures were based on self-reports, performance, and physiological reactions. Subjective workload was assessed using the NASA Task Load Index (Hart & Staveland, 1988). Instead of using the original 100-point scaling, participants indicated perceived workload on a more common seven-point scale. Secondary task performance was defined by the median of connection-check reaction times. The mid-frequency (70 to 150 mHz) power spectrum component of heart rate variability was used in the present study. An all-pole autoregressive model was fitted to RR-interval time series, yielding the central frequency and bandwidth of the mid-frequency spectral peak. Spectral computations were carried out on a moving time window of 30 seconds, which was shifted along the data series in steps of one second. Thus, HRV values for each second of the experiment were obtained.

Procedure

The day before the experimental session, participants received a two-hour task training with AutoCAMS.
Because all participants had already participated in a previous experiment using this task, training was limited to refreshing their knowledge about the functionality of AutoCAMS components and the procedures for fault diagnosis and fault management. They were briefed on the experimental procedures of the following day and signed informed consent. During the experimental session, electrocardiogram data were collected with precordial bipolar leads and three electrodes (positive, negative and ground) placed on the participants’ chest. AutoCAMS was operated in three 30-minute blocks, containing six malfunctions each. After every block, self-reports on workload were collected and the extent of support for fault diagnosis and management was changed. Thus, every participant operated AutoCAMS manually, and with support of AFIRA_1 and AFIRA_2. Order of automation conditions was balanced across subjects. Participants were instructed to verify the AFIRA diagnosis before sending repair orders when working with AFIRA_1 or before starting automated action implementation of AFIRA_2, respectively.
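
The moving-window HRV computation described under Measures can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: it assumes the RR-interval series has already been resampled to an evenly spaced series with sampling rate `fs`, it fits the all-pole model with the Yule-Walker method (Levinson-Durbin recursion), and it integrates the model spectrum over the 0.07 to 0.15 Hz band instead of extracting the central frequency and bandwidth of the spectral peak. The model order, the resampling rate and all function names are hypothetical.

```python
import cmath
import math


def autocovariance(x, max_lag):
    """Biased autocovariance estimates r[0..max_lag] of a mean-removed window."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    return [sum(d[i] * d[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations.

    Returns (a, err) for x[t] + a[1]*x[t-1] + ... + a[p]*x[t-p] = e[t],
    with a[0] == 1 and err the variance of e.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:  # degenerate window, e.g. a constant signal
            break
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        refl = -acc / err  # reflection coefficient
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + refl * prev[m - k]
        a[m] = refl
        err *= 1.0 - refl * refl
    return a, err


def ar_band_power(x, fs, order, f_lo, f_hi, n_grid=64):
    """Power of the AR model spectrum between f_lo and f_hi (in Hz)."""
    a, err = levinson_durbin(autocovariance(x, order), order)
    if err <= 0.0:
        return 0.0
    df = (f_hi - f_lo) / n_grid
    power = 0.0
    for i in range(n_grid):
        f = f_lo + (i + 0.5) * df  # midpoint rule
        h = sum(a[k] * cmath.exp(-2j * math.pi * f * k / fs)
                for k in range(order + 1))
        power += 2.0 * err / (fs * abs(h) ** 2) * df  # one-sided PSD
    return power


def sliding_mid_band_power(rr_ms, fs, window_s=30.0, step_s=1.0, order=6,
                           band=(0.07, 0.15)):
    """One mid-frequency HRV value per step of a moving window."""
    win = int(round(window_s * fs))
    step = int(round(step_s * fs))
    return [ar_band_power(rr_ms[s:s + win], fs, order, band[0], band[1])
            for s in range(0, len(rr_ms) - win + 1, step)]
```

With a 30-second window shifted in one-second steps, a series of N seconds yields N - 29 values, so each second after the initial window receives its own HRV estimate. A higher model order or resampling rate may be preferable for real data; the defaults here are chosen only to keep the sketch small.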
Analysis

As the purpose of the assistance systems under study was to support operators during system malfunctions, workload effects of automation should be most pronounced during fault phases. However, awareness of the availability of an assistance system may influence operator workload during fault-free phases as well. For example, participants may not spend as much effort as in the manual condition on system monitoring to detect malfunctions and symptom patterns as early as possible. Moreover, higher workload levels were expected during fault phases as compared to fault-free phases, which should be reflected by workload indicators as well. Therefore, a 2 x 3 repeated measures analysis of variance (ANOVA) with factors Fault (present vs. absent) and Automation (manual, AFIRA_1, AFIRA_2) was calculated for heart rate variability and secondary task performance. As self-reports were collected only once after each block and performance measures are defined for fault-present phases only, these data were not available separately for fault-present and fault-absent phases, and analysis was limited to a repeated measures ANOVA with Automation being the only factor. Greenhouse-Geisser correction was applied to degrees of freedom whenever the sphericity assumption was violated.

Results

Mean values and standard deviations of all dependent variables are given in Table 1 for each level of factors Automation and Fault. ANOVA results are summarized in Table 2.

Table 1: Means and standard deviations of performance measures, subjective workload ratings and heart rate variability (HRV) in all experimental conditions.

Measure                           Manual          AFIRA_1         AFIRA_2         Normal          Fault
                                  Mean    SD      Mean    SD      Mean    SD      Mean    SD      Mean    SD
Diagnose Time (sec.)              107     57      45      13      56      16      --      --      --      --
Time Within Limits (%)            99.16   0.83    99.76   0.14    99.50   0.21    --      --      --      --
NASA-TLX                          46.9    20.4    33.7    12.9    21.8    11.0    --      --      --      --
Secondary Task Performance (ms)   1395    282     1471    302     1616    558     1316    325     1673    423
HRV (ms²)                         201     177     189     163     182     114     210     152     171     142
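
The performance measures summarised in Table 1 follow directly from their definitions in the Measures section. The sketch below shows one way they could be computed. It assumes a hypothetical log format (alarm and repair-order timestamps per fault, one boolean sample per time step for the parameter limits, and a list of reaction times); none of these names or structures come from AutoCAMS itself.

```python
from statistics import mean, median


def mean_diagnose_time(faults):
    """Mean time between master alarm and selection of the correct repair order.

    faults: list of (alarm_time_s, correct_repair_order_time_s) tuples,
    one per malfunction (hypothetical log format).
    """
    return mean(repair - alarm for alarm, repair in faults)


def percent_time_within_limits(in_range_flags):
    """Percent of equally spaced samples in which all parameters were in range.

    in_range_flags: one boolean per sample taken during fault phases.
    """
    return 100.0 * sum(in_range_flags) / len(in_range_flags)


def secondary_task_performance(reaction_times_ms):
    """Median reaction time to the connection-check icon, in milliseconds."""
    return median(reaction_times_ms)
```

The median is less affected than the mean by occasional very slow responses, which are likely when a connection check appears while the operator is busy with a fault.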
Performance

Automation significantly influenced both diagnose time and time within limits, with higher performance in assisted conditions. Post hoc t-test comparisons of diagnose times showed significant differences between Manual and AFIRA_1 (p=.002) as well as between Manual and AFIRA_2 (p=.01). Fault management performance, as indicated by the relative amount of time all parameters were kept within their predefined limits, was significantly higher with support of AFIRA_1 as compared to
fault management carried out manually (p=.035) and with support of AFIRA_2 (p=.009). Secondary task performance, in contrast, did not differ significantly between Automation conditions. However, factor Fault had a significant main effect on prompted reaction times, which were slower during fault phases than during normal operation phases. There was no Automation x Fault interaction effect on secondary task performance.

Table 2: ANOVA results.

Source         Diagnose Time     Time Within Limits   NASA-TLX          Sec. Task Performance   Heart Rate Variability
               df      F         df      F            df      F         df      F               df      F
Automation     1.3     11.8**    1.2     4.6*         1.15    11.9**    1.1     2.0             1.1     0.2
Fault          --      --        --      --           --      --        1       11.9*           1       17.6**
Aut. x Flt.    --      --        --      --           --      --        1.3     1.4             1.7     1.0
Error          13.8    (1763)    12.7    (0.41)       12.7    (274)     13.9    (313826)        18.9    (3254)

Notes: Greenhouse-Geisser correction was applied to degrees of freedom (df) whenever the sphericity assumption was violated. Uncorrected degrees of freedom are df = 2 for factor Automation, df = 1 for factor Fault, df = 2 for interaction, and df = 22 for error. Mean square error values are given in parentheses. *p < .05. **p < .01.
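
The relationship between the uncorrected and the Greenhouse-Geisser corrected degrees of freedom in Table 2 can be illustrated with a small sketch. It computes a one-way repeated measures ANOVA and the Greenhouse-Geisser epsilon in plain Python; the corrected df are simply epsilon times the uncorrected df (for instance, an epsilon of 0.65 would turn df = 2 into 1.3). The function names and the data used to exercise them are hypothetical and are not taken from the study.

```python
def rm_anova_oneway(data):
    """One-way repeated measures ANOVA.

    data: list of rows, one per participant, one value per condition.
    Returns (F, df_condition, df_error), all uncorrected.
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_err = ss_total - ss_cond - ss_subj  # condition x subject residual
    df_cond, df_err = k - 1, (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_err / df_err)
    return f_value, df_cond, df_err


def greenhouse_geisser_epsilon(data):
    """Greenhouse-Geisser epsilon from the double-centred covariance matrix."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    cov = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data)
            / (n - 1) for j in range(k)] for i in range(k)]
    row_means = [sum(cov[i]) / k for i in range(k)]
    total_mean = sum(row_means) / k
    centred = [[cov[i][j] - row_means[i] - row_means[j] + total_mean
                for j in range(k)] for i in range(k)]
    trace = sum(centred[i][i] for i in range(k))
    sum_sq = sum(v * v for row in centred for v in row)
    return trace ** 2 / ((k - 1) * sum_sq)


def corrected_df(data):
    """Greenhouse-Geisser corrected (df_condition, df_error)."""
    _, df_cond, df_err = rm_anova_oneway(data)
    eps = greenhouse_geisser_epsilon(data)
    return eps * df_cond, eps * df_err
```

Epsilon lies between 1/(k - 1) and 1 for k conditions; a value of 1 means sphericity holds and no correction is applied. The F value itself is unchanged by the correction; only the df used to look up the p-value shrink.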
Subjective workload

There was a significant main effect of Automation on subjective workload ratings. Post hoc t-test comparisons showed that workload was rated significantly lower in the AFIRA_1 condition as compared to the manual condition (p=.023). AFIRA_2 led to a further significant reduction of experienced workload (p=.001).

Heart rate variability

A significant main effect of factor Fault on heart rate variability was observed, with lower HRV indicating higher workload during fault periods as compared to fault-free periods of the experiment. There was no effect of Automation on HRV, and no Automation x Fault interaction was observed.

Discussion

Introducing an automated fault identification and recovery agent to support operators during system malfunctions in process control is beneficial for fault diagnosis and fault management. Diagnose times in both conditions with AFIRA support were reduced considerably as compared to unsupported performance. Fault management performance, in terms of the relative amount of time all parameters were within target range, was above 99 percent in all conditions. From this very high performance level, only AFIRA_1 significantly increased the time all parameters were within target range. At first glance, this is counterintuitive as AFIRA_2, not AFIRA_1, offers fault management support by automatically implementing all necessary interventions. However, these automatic implementations are realised on a step-by-step basis, allowing operators to monitor
and understand AFIRA_2 actions. In doing so, AFIRA_2 is somewhat slower than operators who manually implement the interventions suggested by AFIRA_1. Thus, the observed advantage of AFIRA_1 over AFIRA_2 is a matter of specific system design, not of general function allocation. On these grounds, it can be concluded that with respect to performance, assistance systems supporting information analysis and action selection with and without automated action implementation are both beneficial, and that there is no general performance advantage of leaving implementation to operators. With respect to workload it was assumed that fault management in general is more demanding than just monitoring the system in fault-free periods. Yet, the specific level of workload should be affected by the kind of automated support available. The first assumption got clear support from both measures used to assess differences in workload between fault-present and fault-absent periods. That is, during fault-present periods, heart rate variability was lower and secondary-task reaction times were slower than during fault-absent periods. However, the additional assumption about the effects of automated assistance on workload only got support from the subjective workload assessment. Participants’ workload ratings in the most difficult manual condition were rather moderate, most probably because of their prior experience with the task from a previous study. Nevertheless, as expected, relieving participants of large parts of the information acquisition, information analysis, and action selection tasks during fault phases by AFIRA_1 led to a significant decrease of experienced workload. Additionally automating action implementation (AFIRA_2) resulted in a further significant workload reduction on the subjective level.
Thus, one could conclude that automating fault identification and recovery in process control indeed can lead to a reduction of the workload perceived by operators, and that this reduction is the stronger the higher the level of automation. However, this kind of effect was not reflected in the more objective workload measures: Neither heart rate variability as an indicator of physiological effort, nor secondary-task performance as an indicator of spare capacity showed any systematic relationship to the extent of assistance available to operators during fault management. In fact, no differences emerged in these measures between the manual and both supported conditions. There are at least two possible explanations for this interesting dissociation between workload measures: First, it could be supposed that both objective workload indicators were less sensitive and, thus, not capable of reflecting the workload differences induced by the different experimental conditions in the present experiment. However, the fact that both workload measures indicated workload differences between fault-present and fault-absent periods shows that they actually were sensitive to workload changes due to altered task demands. This would at least suggest that the workload differences induced by the different assistance systems are only small, i.e. much smaller than those between fault-absent and fault-present periods. In this case they might have been strong enough to be recognized on a subjective level but too small to affect cardiac activity and secondary task performance. Inspection of
participants’ fault management behaviour supports this explanation: Operators carried out less information sampling and fewer interventions in the assisted conditions. This suggests that the assistance systems did relieve participants to some extent of activities necessary for fault management. This could have led to an experienced workload reduction which is correctly reflected in subjective workload reports. However, findings regarding a similar dissociation of subjective, performance and physiological indicators of workload (e.g. Derrick, 1988) rather point to a second explanation of the observed effect. It might be that secondary task performance and heart rate variability correctly indicate that both assistance systems do not lead to any gains with respect to workload. In this case, the fact that participants did report workload alleviation might only reflect an evaluation they have inferred from the functionality of the assistance system at hand, i.e. the assumption that there must be a workload benefit if they get supported by such a system. In any case, what seems to be a safe conclusion from the current study is that at least no increments of workload were introduced by the provision of the automated assistance systems. This result seems to be particularly important because both of the systems were found to yield significant benefits in performance. Evidently, these systems make it possible to gain an increment in performance while keeping operator workload at least constant. Furthermore, it seems that the differences in level of automation of the systems studied do not provide any considerable benefits or costs with respect to workload. Therefore, neither overall performance nor workload considerations may be sufficient to finally decide about the relative advantages of these systems with respect to implementation in real-world process control.
As a consequence, other possible aspects of human-automation interaction have to be taken into account in order to make and justify such a decision. These aspects might include all kinds of issues related to what has been labelled “out-of-the-loop-unfamiliarity” (Endsley & Kiris, 1995; Parasuraman et al., 2000), i.e. effects related to maintenance of situation awareness or avoidance of loss of skills after prolonged use of the automated systems. However, the currently available research does not provide clear guidelines in this respect (cf. Kaber & Endsley, 1999; Lorenz et al., 2002). More research clearly is needed in order to arrive at a more complete picture of automation consequences of cognitive assistance systems for fault management in process control.

Acknowledgements

We wish to thank Sabine Jatzev for her help during data acquisition and data analysis. We further highly appreciate the technical support from Marcus Bleil in configuring the experimental paradigm and setting up the equipment.

References

Billings, C.E. (1997). Aviation automation. The search for a human-centred approach. Mahwah, NJ: Lawrence Erlbaum Associates.
Derrick, W.L. (1988). Dimensions of operator workload. Human Factors, 30, 95-110.
Endsley, M.R., & Kiris, E.O. (1995). The out-of-the-loop performance problem and level of control in automation. Human Factors, 37, 381-394.
Goteman, Ö., & Dekker, S. (2003). Flight crew and aircraft performance during RNAV approaches: Studying the effects of throwing new technology at an old problem. Human Factors and Aerospace Safety, 3, 147-164.
Hart, S.G., & Staveland, L.E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In P.A. Hancock and N. Meshkati (Eds.), Human mental workload (pp. 139-183). Amsterdam: North-Holland.
Hockey, G.R.J., Wastell, D.G., & Sauer, J. (1998). Effects of sleep deprivation and user interface on complex performance: A multilevel analysis of compensatory control. Human Factors, 40, 233-253.
Kaber, D.B., & Endsley, M.R. (2004). The effects of level of automation and adaptive automation on human performance, situation awareness and workload in a dynamic control task. Theoretical Issues in Ergonomics Science, 5, 113-153.
Kaber, D.B., Onal, E., & Endsley, M.R. (2000). Design of automation for telerobots and the effect on performance, operator situation awareness, and subjective workload. Human Factors and Ergonomics in Manufacturing, 10, 409-430.
Kaber, D.B., & Riley, J.M. (1999). Adaptive automation of a dynamic control task based on secondary task workload measurement. International Journal of Cognitive Ergonomics, 3, 169-187.
Kudat, H., Akkaya, V., Sozen, A.B., Salman, S., Demirel, S., Ozcan, M., et al. (2006). Heart rate variability in diabetes patients. The Journal of International Medical Research, 34, 291-296.
Lorenz, B., Di Nocera, F., Röttger, S., & Parasuraman, R. (2002). Automated fault-management in a simulated spaceflight micro-world. Aviation, Space, and Environmental Medicine, 73, 886-897.
Manzey, D., Bahner, E.J., & Hüper, A.-D. (2006). Misuse of automated aids in process control: Complacency, automation bias, and possible training interventions. In HFES Proceedings of the Human Factors and Ergonomics Society 50th Annual Meeting (pp. 220-224). Santa Monica, CA: HFES.
McFadden, S.M., Vimalachandran, A., & Blackmore, E. (2004). Factors affecting performance on a target monitoring task employing an automatic tracker. Ergonomics, 47, 257-280.
Metzger, U., & Parasuraman, R. (2005). Automation in future air traffic management: Effects of decision aid reliability on controller performance and mental workload. Human Factors, 47, 35-49.
O’Donnell, R.D., & Eggemeier, F.T. (1986). Workload assessment methodology. In K.R. Boff, L. Kaufman & J.P. Thomas (Eds.), Handbook of perception and human performance (Vol. II, Cognitive Processes and Performance, pp. 42-41 - 42-49). New York: Wiley.
Parasuraman, R., & Riley, V. (1997). Humans and automation: Use, misuse, disuse, abuse. Human Factors, 39, 230-253.
Parasuraman, R., Sheridan, T.B., & Wickens, C.D. (2000). A model for types and levels of human interaction with automation. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 30, 286-297.
Sheridan, T.B. (2002). Humans and Automation: System Design and Research Issues. New York: Wiley-Interscience.