© Journal of the Indian Academy of Applied Psychology, January 2010, Vol. 36, No. 1, 151-158.
Performance Feedback, Mental Workload and Monitoring Efficiency

Anju L. Singh, Trayambak Tiwari, and Indramani L. Singh
Banaras Hindu University, Varanasi

The present study examined the effect of success and failure performance feedback on perceived mental workload and monitoring performance in a flight simulation task. The revised version of the multi-attribute task battery (MATB) was administered to 20 non-pilot participants. Performance was recorded as hit rates, false alarms and root mean square errors. Mental workload was assessed using the NASA-TLX questionnaire. A 2 (success-failure feedback) x 2 (30-min sessions) x 3 (10-min blocks) mixed factorial design with repeated measures on the last two factors was used. The results revealed that performance feedback did not have a significant effect on mental workload or malfunction detection. The findings support the notion that monitoring inefficiency (i.e., automation-induced complacency) is a robust phenomenon that can be observed in a multi-task environment with high static automation reliability.

Keywords: Mental workload, Feedback, Monitoring performance, Automation-induced complacency, Flight simulation task
The technological revolution has gradually removed the human operators of many complex systems from front-line levels of control, with their actions now relayed via an intervening mass of computers and microprocessors. Instead of being an active controller of the system, the operator of an automated system has become a passive observer. It may seem paradoxical, but automated systems can both reduce and increase mental workload. Therefore, mental workload is considered an important factor in automation research. One of the fundamental reasons for introducing automation in complex systems is to reduce workload, and thereby to reduce human error. However, evidence shows that this is not necessarily true in all situations. In fact, automation merely changes how work is accomplished (Woods, 1994). Further, Reinartz and Gruppe (1993) argued that automated systems present cognitive demands that increase workload. The performance of the operator is hindered by the increase in processing load resulting from the additional task of collecting information about the system state. This is further complicated by the extent of the operator's knowledge about the system. In the event of manual takeover, the operator must either disable interlocks to other systems, or else match his/her actions to those of related process functions.

Operators can use different strategies to cope with workload. Hart (1989) showed that experienced operators work in advance during periods of low workload in order to eliminate future workload peaks. Hockey (1993) has presented different strategies for coping with workload in a regulation model. Operators constantly compare their performance with the goal state. If the quality of the performance is not good enough according to the goal state, more effort will be invested. Up to a certain level this is an automatic process. An effort monitor evaluates the amount of effort that is required and when the effort increases too much, the
performance evaluation process is controlled at a higher cognitive level. Operators can apply different strategies to situations in which the performance level does not fit the goal state. They can decide to invest more effort or to lower the goal and accept a lower level of performance. When these strategies are not possible because the performance level is already low or the operator has already invested a maximum amount of effort, the situation leads to stress. Gaillard and Wientjes (1994) have shown that there are substantial costs involved when one has to invest a lot of mental effort to perform a highly demanding task.

It is well understood that feedback or knowledge of results (KR) is a crucial factor in the early stages of skill acquisition (e.g., Groeger, 1997). This has been applied to many diverse fields, from consumer products (Bonner, 1998) to aviation (White, Selcon, Evans, Parker, & Newman, 1997). In the latter study, it was found that providing redundant information from an additional source can actually elicit a performance advantage. One study relevant to the driving domain, conducted by Tucker and associates (Tucker, MacDonald, Sytnik, Owens, & Folkard, 1997), examined the effects of feedback on the performance of controlled and automatic tasks and found that feedback can reduce error rates on tasks requiring controlled processing. However, automatic tasks were found to be resistant to the effects of feedback. Furthermore, a vigilance decrement was observed only in the controlled task, suggesting that automatic responses do not suffer from such a decrement. This vigilance decrement was also found to be unaffected by feedback.

Research has also been conducted to compare novice and expert drivers in the context of automation. For example, Duncan, Williams and Brown (1991) compared the performance of a group of normal (experienced) drivers with that of novices and
Feedback and monitoring performance
experts on a subset of driving skills. They found that on half of the measured skills, the normal drivers actually performed worse than the novice drivers, who performed at a similar level to the experts. Results revealed that the normal (experienced) drivers succumbed to a range of bad habits in the absence of learning feedback.

In complex systems, such as modern fighter jets and helicopters, operators have to manage several tasks at the same time, which increases pilots' mental workload. An important and challenging problem in many multi-task environments is managing interruption (McFarlane & Latorella, 2002). Researchers have noted that proactive systems executing in environments such as aviation cockpits (Dismukes, Young, & Sumwalt, 1998; Latorella, 1996), control rooms (Stanton, 1994), in-vehicle displays (Lee, Hoffman, & Hayes, 2004) and office environments (Bailey & Konstan, 2006; Czerwinski, Cutrell, & Horvitz, 2000b; Jackson, Dawson, & Wilson, 2001) significantly interrupt the user's primary tasks. It has also been observed that when primary tasks are interrupted at random moments, users take longer to complete the tasks (Bailey & Konstan, 2006; Czerwinski, Cutrell, & Horvitz, 2000a; Rubinstein, Meyer, & Meyer, 2001), commit more errors (Kreifeldt & McCarthy, 1981; Latorella, 1996) and experience increased levels of frustration, annoyance and anxiety (Adamczyk & Bailey, 2004; Bailey & Konstan, 2006; Zijlstra, Roe, Leonora, & Krediet, 1999).

Although the foregoing review suggests that performance feedback is related to mental workload, how and to what extent performance feedback influences mental workload remains controversial. For example, Becker, Warm, Dember and Hancock (1995) found that performance feedback generally lowered mental workload in a monitoring task, whereas the results of Fairclough, May and
Anju L. Singh, Trayambak Tiwari and Indramani L. Singh
Carter (1997) suggest that time headway feedback had no effect on workload in a car-following scenario. In light of this inconsistency in findings, the present study attempts to examine how and to what extent performance feedback is related to mental workload in a multi-task situation. It was hypothesized that success feedback would reduce monitoring inefficiency more than failure feedback, and that participants would perceive lower mental workload in the success feedback condition than in the failure feedback condition, resulting in low monitoring efficiency.

Method

Participants: Participants in this study were 20 students of the Banaras Hindu University. Each participant had normal (20/20) or corrected-to-normal visual acuity, and their ages ranged from 18 to 23 years. None of the participants had prior experience with the flight simulation task.

Experimental Design: A 2 (feedback) x 2 (session) x 3 (block) mixed factorial design was employed in this experiment. The between-subjects variable had two levels of feedback, i.e., success feedback and failure feedback, whereas the within-subjects variables were sessions and blocks. Participants were randomly assigned to each experimental condition (success and failure feedback; n = 10 in each).

Tools:

Mental Workload Questionnaire: Participants completed the NASA-TLX (Hart & Staveland, 1988) before beginning the experiment. The NASA-TLX has six components reflecting the degree of mental demand, physical demand, temporal demand, performance, effort and frustration associated with a task. This scale provides an overall workload score based on a weighted average of ratings
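The weighted average behind the overall NASA-TLX score can be illustrated with a short sketch. It assumes the standard Hart and Staveland (1988) procedure, in which each of the six subscales is rated on a 0-100 scale and receives a weight from 0 to 5 derived from 15 pairwise comparisons, so that the weights sum to 15. The function name and all numbers below are hypothetical and are not data from this study.

```python
# Minimal sketch of a weighted NASA-TLX overall workload score.
# Assumption: ratings are on a 0-100 scale and each weight is the number of
# times a subscale was chosen in the 15 pairwise comparisons (weights sum to 15).

SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def tlx_overall(ratings, weights):
    """Return the weighted mean of the six subscale ratings."""
    total_weight = sum(weights[s] for s in SUBSCALES)
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    weighted_sum = sum(ratings[s] * weights[s] for s in SUBSCALES)
    return weighted_sum / total_weight


# Hypothetical ratings and weights for one participant (illustration only).
example_ratings = {"mental": 70, "physical": 20, "temporal": 60,
                   "performance": 40, "effort": 65, "frustration": 50}
example_weights = {"mental": 5, "physical": 0, "temporal": 3,
                   "performance": 2, "effort": 4, "frustration": 1}

overall = tlx_overall(example_ratings, example_weights)  # 920 / 15
```

Dividing by the sum of the weights rather than a fixed 15 keeps the function usable if a different weighting scheme is assumed.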
Flight Simulation Task: A revised version of the multi-attribute task battery (MATB; Comstock & Arnegard, 1992) with high automation reliability (87.5%) was used in this study. Automation reliability was defined as the percentage of malfunctions correctly detected by the automation routine in each 10-min block of the system-monitoring task. The MATB is a multi-task flight simulation package comprising system-engine monitoring, compensatory tracking, fuel resource management, communications, and scheduling tasks. In the present study, only the system-engine monitoring, tracking, and fuel-resource management tasks were used, and the system-monitoring task was automated during test sessions. These three tasks were displayed in separate windows on a 14" colour monitor (for details regarding the task, see Singh, Sharma & Singh, 2005; Singh & Singh, 2006).

Procedure: Upon arrival at the lab, participants were required to fill out a consent form and a background questionnaire. The Snellen Eye Chart was used to test the visual acuity of the participants. This test measures how well participants see at various distances. The participants were then asked to complete the pre-task NASA-Task Load Index. After the questionnaires were completed, the experimenter gave participants a brief introduction to the flight simulation task. In both experimental conditions, participants first completed a 10-minute practice session, which allowed them to become accustomed to the task before participating in the final test session. Correct and incorrect detections were recorded as the dependent measures for the system-monitoring task, and root mean square errors were recorded for the tracking and fuel management tasks. Participants who scored above 60% on hit rate were eligible for the final six 10-minute test blocks. In the test session, the system-monitoring task was automated, and the
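The dependent measures described above reduce to simple arithmetic, sketched here under stated assumptions. The function names are hypothetical, and the event counts and deviation values are invented for illustration; the source does not report how many malfunctions occurred per block, so 7 of 8 is only one count that yields the stated 87.5% reliability.

```python
import math

# Minimal sketch of the scoring arithmetic; not the MATB software itself.


def percent_correct(detected, total):
    """Percentage of events correctly detected (hit rate or automation reliability)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * detected / total


def rms_error(deviations):
    """Root mean square of deviations from the target (tracking or fuel level)."""
    if not deviations:
        raise ValueError("need at least one deviation")
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


def passes_practice(hit_rate_percent, criterion=60.0):
    """Eligibility rule from the procedure: hit rate above 60% in practice."""
    return hit_rate_percent > criterion


# Hypothetical counts: automation detects 7 of 8 malfunctions in a block.
reliability = percent_correct(7, 8)          # 87.5
# Hypothetical tracking deviations sampled during a block.
tracking_rmse = rms_error([2.0, -2.0, 2.0, -2.0])  # 2.0
```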
Results and Discussion

Practice Performance

Results of the practice session indicated that the mean correct detection (hits) of all subjects varied from 60% to 80%, irrespective of feedback condition. Hit rates did not differ significantly between the success and failure feedback conditions. Similar results were obtained for the remaining dependent measures: false alarms, tracking and fuel resource management. These findings demonstrate that all participants had a comparable level of performance on the experimental task before entering the final experiment.

Correct Detection Performance (hit rates)

Means and SDs for correct detection of malfunctions on the system-monitoring task were computed for each of the two sessions, i.e., before and after the manipulation of performance feedback. Mean performances indicated that participants obtained higher malfunction detection scores in the first session (M = 80.47; SD = 22.78) than in the second session (M = 67.13; SD = 17.81) under the success feedback condition. Similarly, participants achieved higher mean scores in the first session (M = 83.53; SD = 17.16) than in the second session (M = 48.37; SD = 26.74) under the failure feedback condition. The total mean performance across the six 10-min blocks was higher for success feedback (M = 73.80; SD = 20.29) than for failure feedback (M = 65.95; SD = 21.95). Results further revealed that participants' monitoring efficiency declined across blocks, irrespective of feedback type.

Correct monitoring performance data were then submitted to a 2 (feedback) x 2 (session) x 3 (block) analysis of variance with repeated measures on the last two factors to examine interaction effects, if any. The ANOVA results showed that the main effect of feedback was not significant, which indicates that the type of feedback, either success or failure, given prior to the detection of automation failures had no impact on monitoring performance. These results therefore do not support our first hypothesis that success or failure performance feedback would reduce monitoring performance, as it does on other psychomotor tasks. Moreover, the main effect of session was found significant (F (1, 18) = 23.04; p
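As a consistency check on the reported values, the sketch below recomputes two things from figures given in the text: the overall means for each feedback group from their two session means, and the degrees of freedom implied by the 2 x 2 x 3 mixed design with 20 participants. It assumes the usual partitioning for a design with one between-subjects factor and two within-subjects factors, and that each session contributes the same number of blocks, so the overall mean is the unweighted mean of the session means. The function names are hypothetical.

```python
# Minimal sketch: bookkeeping for a mixed design with one between-subjects
# factor (feedback) and two within-subjects factors (session, block).


def mixed_design_df(n_participants, groups, sessions, blocks):
    """Return (effect df, error df) for the main effects and session x block."""
    error_between = n_participants - groups          # subjects within groups
    sb = (sessions - 1) * (blocks - 1)
    return {
        "feedback": (groups - 1, error_between),
        "session": (sessions - 1, (sessions - 1) * error_between),
        "block": (blocks - 1, (blocks - 1) * error_between),
        "session x block": (sb, sb * error_between),
    }


def overall_mean(session_means):
    """Unweighted mean of session means (valid when sessions have equal blocks)."""
    return sum(session_means) / len(session_means)


df = mixed_design_df(n_participants=20, groups=2, sessions=2, blocks=3)
# df["session"] matches the F (1, 18) reported for the session main effect.

success_overall = overall_mean([80.47, 67.13])   # reported as 73.80
failure_overall = overall_mean([83.53, 48.37])   # reported as 65.95
```

Both recomputed means agree with the reported totals to two decimal places, and the session effect's degrees of freedom agree with the F (1, 18) in the text.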