“User Strategies in Recovering from Errors in Man Machine Systems”
by Tom Kontogiannis
Department of Production Engineering & Management, Technical University of Crete, Chania, Crete, GR 73100, Greece

ABSTRACT
Safety science has long been dominated by the concept of error suppression or prevention. With the increasing complexity of man-machine systems, however, error recovery may be seen as an important supplementary safety goal since total error prevention may be difficult to achieve.
This article presents
an elaborate
examination of the different processes of error recovery (i.e., detection, explanation, and correction), the stages at which they might occur, and the types of recovery goals that users may set themselves.
A research framework is proposed on the basis of a
taxonomy of user strategies which could support error recovery. User strategies can range from information search strategies (i.e., inner feedback, system error cueing, and communications) to planning behaviours and learning from errors. The research framework explores how error detection and correction may vary as a function of error types, recovery stages, and user strategies.
The benefits of this framework are
illustrated in the context of system design and training regimes which can enhance recovery from errors.
Keywords:
Error detection, error recovery, planning behaviours, interface design, training, human reliability
Manuscript revised in September 1998
User Strategies in Recovering from Errors in Man Machine Systems
Tom Kontogiannis
1. INTRODUCTION
For many years, developments in automation, operator support systems, and safety policies have been made in the context of suppressing or preventing human error. As Wioland and Amalberti (1996) argued, the “zero accident” policy, which remains the ultimate safety goal, has long been interpreted as a “zero error” policy. This exclusive reliance on the “error suppression” approach has recently been questioned by various researchers (Frese, 1991; Zapf et al., 1994; Mo and Crouzet, 1996). Even in advanced technological systems, malfunctions of automated safety systems and maladaptations of user interactions have often resulted in serious accidents (Woods et al., 1994). Moreover, there are always bound to be unanticipated situations where operators are required to devise complex strategies and use their knowledge under tight time constraints and psychological stress. Total elimination of human error, therefore, may be difficult even with advanced technologies. Another sensible way of combating the human error problem in high-risk technologies is through error detection and correction. Evidence from research studies shows that operators rely on these psychological processes to maintain their performance and manage to recover a significant proportion of their errors (Reason, 1990).
With the accumulation of expertise, operators develop protections and defences against their own cognitive deficiencies (Amalberti, 1992), and these abilities are now considered good indices of operator skill (Rizzo et al., 1994). We are witnessing a shift in safety paradigms towards system designs that focus on preventing the consequences of human error by providing opportunities for error recovery. There are also other reasons which justify the consideration of error recovery as a safety goal supplementary to error prevention.
First, errors can cause stress in
operating teams particularly in cases where errors take a long time to correct (Brodbeck et al., 1993).
Investigations in this research field, therefore, can be
justified as a means of improving the quality of working life (Zapf et al., 1994). Second, identification of forms of error detection and correction can provide valuable input to the design of error tolerant systems (Rouse and Morris, 1987). These systems can tolerate varying operator inputs and minor errors, reject unsafe acts, or minimise the consequences of error. Finally, detection and correction can play an important role in learning from errors.
Error feedback during training has been recognised as
extremely important to the learning process. Errors can take learners through “hidden” parts of the actual job, that is, aspects of the task that are not apparent under error-free performance, and produce new insights about the task activities (Seifert and Hutchins, 1992). In this sense, safety measures for error suppression may deprive operators of certain types of task feedback and on-the-job learning opportunities. In fact, a growing number of studies have been undertaken in the area of “error training” programmes. In the context of human-computer interaction, several studies (Brodbeck et al., 1993; Dormann and Frese, 1994) have explored learning environments that allowed errors to appear instead of restricting user strategies. Some of the advantages of “error training” include: tailoring instruction to the needs of the learner as more knowledge is acquired, enhancing the user's mental model of the system, developing emotional strategies to cope with errors and frustrations, and acquiring meta-strategies for error detection and correction (Frese and Zapf, 1994). These error management strategies also have a good chance of transferring to the actual job performance. It appears then that enhancing error recovery to mitigate the consequences of error would provide both opportunities for learning and safe performance outcomes. The purpose of this article is to contribute to the study of error detection and correction as well as the design of systems that support these processes. Specifically, a research framework has been developed that examines strategies employed by operators or users in handling their own errors. The proposed taxonomy of error recovery strategies provides a useful basis for making recommendations to improve aspects of job design. Finally, the framework is illustrated in the context of design and operator training that support the error handling processes.
2. ERROR HANDLING PROCESSES
2.1 Recovery processes, outcomes and stages
Studies in error recovery (Rizzo et al., 1994; van der Schaaf, 1995) have tended to distinguish three processes in error handling or error recovery, namely: (a) error detection - realising that an error is about to occur or suspecting that an error has occurred, independently of understanding the nature and cause of the error (Zapf and Reason, 1994), (b) error explanation or localisation - explaining why an error occurred, and (c) error correction - modifying an existing plan or developing a new one to compensate. Error handling and error recovery will be used interchangeably in this article to refer to user behaviours comprising these recovery processes. Depending on the nature of the error, its consequences on the system, and the available reaction time, operators may set different corrective goals. These correspond to the outputs of the error recovery process.
We can distinguish three
possible corrective goals (Mo and Crouzet, 1996), that is: (a) backward recovery in which the system is brought back to the original state occupied before the commission of the error. This means that operators have got the means to reverse the effects of their actions (e.g., the “undo” command in text editing). (b) forward recovery where
operators may seek to bring the system to an
intermediate stable state in order to “buy time” and find a better solution later on. This is more likely to happen when critical equipment has been damaged and the available response time is limited. (c) compensatory recovery in which operators may activate redundant equipment and bring the system to the desired goal state that was originally intended.
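To make the three corrective goals concrete, the following minimal sketch (not from the article; the class, method names, and example data are illustrative assumptions) shows in Python how backward recovery can be supported with an undo history, in the spirit of the “undo” command mentioned above, while forward and compensatory recovery instead select a new target state.

    class ReversibleController:
        """Keep an undo history so that backward recovery can restore the pre-error state."""

        def __init__(self, initial_state):
            self.state = initial_state
            self._undo_stack = []  # inverse actions, most recent last

        def apply(self, action, inverse_action):
            """Execute an action and remember how to reverse its effects."""
            self.state = action(self.state)
            self._undo_stack.append(inverse_action)

        def backward_recovery(self):
            """Undo the most recent action, bringing the system back to its original state."""
            if self._undo_stack:
                self.state = self._undo_stack.pop()(self.state)

        def forward_recovery(self, stable_intermediate_state):
            """Move to a stable intermediate state in order to "buy time" for a better solution."""
            self.state = stable_intermediate_state

        def compensatory_recovery(self, goal_state, activate_redundant_equipment):
            """Reach the originally intended goal state through redundant equipment."""
            activate_redundant_equipment()
            self.state = goal_state

    # Example: append text, then recover backwards after detecting the slip.
    editor = ReversibleController("hello")
    editor.apply(lambda s: s + " wrold", lambda s: s[:-6])
    editor.backward_recovery()  # state is "hello" again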
The interest of cognitive ergonomics in error recovery is twofold. First, there is a need to investigate what strategies experienced operators use in order to detect and correct errors. Second, on the basis of a taxonomy of error recovery strategies, we
need to examine what workplace factors and management factors can be optimised to enable timely error recovery.
Figure 1 shows a framework for the analysis of error
handling processes from an ergonomic perspective.
The distinction between slips
(e.g., wrong actions) and mistakes (e.g., wrong interpretation and plans) is very important in this framework since the mechanisms of recovery may vary for different types of error.
------------------------
Figure 1 here
------------------------
Before proceeding with descriptions of the error-handling processes, it is important to point out that the observation of adverse outcomes is not the only starting point in error recovery. In fact, ergonomics is more concerned with the detection and correction of errors before critical consequences ensue. It is interesting, therefore, to examine what user strategies can support error recovery while an action is being performed, or even at the stage where the action-plan is being formulated. Figure 2 shows three different stages of performance at which error recovery can occur.
a) Recovery in the outcome stage, whereby a mismatch between expected effects and observed outcomes can signal error detection. This may not be an easy task in complex systems since the effects of operator actions may be masked by other actions taken by the automated safety system, other persons, or even the same person in the past. This is the reason for emphasising the masking effects during the “delay interval” in Figure 2.
b) Recovery in the execution stage, whereby errors are “caught-in-the-act” and subsequently corrected. Recovery from slips can occur at the execution stage by comparing the actions being executed to the actions specified in the plan.
c) Recovery in the planning stage, whereby operators can recognise wrong intentions, or mismatches between intentions and formulated plans. Any mistakes in understanding the causes of a problem (fault diagnosis) or in specifying plans of action can be detected and corrected at this conceptualisation stage.
------------------------
Figure 2 here
------------------------
The first recovery stage occurs “after-the-fact” while the other stages rely on a self-monitoring function that captures errors before consequences ensue.
The
main purpose of the proposed framework is, therefore, to examine recovery processes as a function of the type of error committed (i.e., slip or mistake), the strategy devised by experienced users, and the performance stage at which recovery occurs. The remainder of this section looks deeper into several aspects of
error detection,
explanation, and correction in order to provide a basis for developing a taxonomy of user strategies in error recovery.
2.2 Error detection
Error detection has been the most widely studied process of error handling probably because it constitutes the starting point of recovery.
An important mechanism that
triggers detection seems to be a mismatch between “expected effects” and “observed outcomes”. Difficulties in perceiving or attending to actual outcomes, and setting-up or remembering “expectations about effects”, can result in failures of detection in the outcome stage of performance.
Job factors giving rise to such difficulties may
include system design, workload, and user strategy. The action outcomes, for instance, may not be perceptible because of poor interface design, may be masked by safety logic interventions, or may not be sufficiently
attended to because of high workload.
On the other hand, expectations about action effects may be ill-specified because of unfamiliarity with the work domain. Mathematical calculations or mental maths, for instance, are tasks where it is difficult to know what to expect in terms of action effects (Sellen, 1994). In complex systems, where equipment failures can occur and safety logic can intervene, undesired outcomes may not be attributed to operator action. Norman (1988) argued that users have to eliminate alternative error causes before assuming a self-produced error.
Hence, biased attitudes of “explaining away errors” can impede
detection and give rise to accidents.
Detection on the basis of outcome may be
difficult, particularly for complex sequences of actions, because of the memory burden
involved in keeping track of them; thus, previous actions can be forgotten and undesired outcomes can be attributed to equipment rather than one's own performance. Figure 3 summarises the main aspects of error detection in the three stages of performance.
------------------------
Figure 3 here
------------------------
Errors can also be detected in the execution stage where operators may notice a mismatch between actions being executed and actions specified in their plans. As argued by Sellen (1994), this “action-based detection” takes place through a perception of some aspect of the erroneous action, either auditorily, visually, or proprioceptively. A large percentage of typing errors, for instance, can be detected by skilled typists in the absence of any feedback from the display or keyboard; this implies that certain forms of slips can be detected proprioceptively (Rabbitt, 1978). Although many slips can be “caught-in-the-act”, action-based detection may also include cases where erroneous actions are recognised after their execution when operators have already proceeded further in their sequence of actions. In this case, a comparison is made between “memory of executed actions” and actions specified in the plan. Relatively little work has been done with regard to error detection at the conceptual or planning stage. Detection at the conceptual stage can take two forms, that is, (a) recognising mismatches between intentions and plans of action, and (b) recognising wrong intentions (i.e., higher-level goals).
The former refers to cases
where operators have formulated a plan that is not suitable for achieving their goals while the latter refers to higher-level goals that are unsatisfactory due to an inadequate understanding of the causes of the problem. Detection at the planning stage (e.g., detection of mistakes) is far more difficult than outcome or action-based detection because access to an adequate reference plan for comparison may be limited. In many cases, the same factors that caused the error in the first place may prevent its detection. In complex systems, a mechanism that impedes detection at the planning stage is “cognitive tunnel vision” in which evidence pointing to errors in the current plan is
disregarded. An analysis of several simulated nuclear emergencies (Woods, 1984), for instance, has found that the most common cause of failure to detect errors was “fixation” on the part of operators; wrong plans were mainly detected by other people who came to the emergency afresh.
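As a summary of the mechanisms discussed in this section, the sketch below (not from the article; function names, parameters, and the tolerance value are illustrative assumptions) expresses the three mismatch comparisons in Python: outcome-based, action-based, and plan-based detection.

    def outcome_based_detection(expected_effects, observed_outcomes, tolerance=0.05):
        """Flag parameters whose observed value departs from the expected effect."""
        suspects = {}
        for name, expected in expected_effects.items():
            observed = observed_outcomes.get(name)
            if observed is None or abs(observed - expected) > tolerance * max(abs(expected), 1.0):
                suspects[name] = (expected, observed)
        return suspects

    def action_based_detection(planned_actions, executed_actions):
        """Catch slips by comparing actions being executed with those specified in the plan."""
        return [(planned, done)
                for planned, done in zip(planned_actions, executed_actions)
                if planned != done]

    def plan_based_detection(intended_goal, goal_achievable_by_plan):
        """Detect a mistake when the formulated plan cannot satisfy the stated intention."""
        return intended_goal != goal_achievable_by_plan

    # Example: the operator expected a temperature rise in the vessel that did not occur.
    print(outcome_based_detection({"vessel_temp": 80.0}, {"vessel_temp": 42.0}))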
2.3 Error localisation or explanation
Once an error has been detected, operators may try to identify or explain the causes of the error.
Tracing the causes of mistakes appears to be more difficult than in cases of
slips. A mistake can be made either at the stage of problem interpretation, setting of higher-level goals, or formulation of plans of action. The need for a thorough explanation of errors will depend upon the time available to compensate and the extent to which localisation is necessary for developing a new plan. The contribution of the error localisation phase to the error handling process is not so well researched. Analysis of near-miss reports in a steel plant (van der Schaaf, 1995) has shown that hardly any recovery process went through the more analytic localisation phase. This could be attributed to the limited time available to compensate or develop a new plan. In the very last phase of an accident sequence, there may simply not be enough time for a time-consuming explanation of the error. On the other hand, there might be cases where an explanation of error causes may be necessary in order to compensate. This reflects some of the research problems about the extent to which an accurate assessment of the situation is necessary for developing a plan of action.
Error
explanation also has important implications for learning from one's own errors and improving knowledge. That is, feedback about things that operators do not yet know is important for improving their mental models of the system.
2.4 Error correction
In many respects, error correction involves similar mechanisms to the processes that caused the error in the first place (error causation). Operators will have to re-assess the problem situation in terms of the constraints set by an undesired system state, new developments of the initiating event, and their own capabilities. This re-assessment is necessary in order to decide upon the goal of recovery (e.g., forward, backward or compensatory recovery). Subsequently, a corrective plan must be formulated and
executed in a timely fashion. It is worth noting that the context of work in error correction may be different from the initial one since less time may be available, fewer safety resources could be used, and stress levels may have increased due to unsuccessful attempts.
Coping with frustrations and psychological stress due to
previous errors is an important aspect of error correction. Research in error training (Brodbeck et al., 1993; Dormann and Frese, 1994)
may offer a way forward in
developing efficient emotional strategies as well as co-operative strategies within the operating team.
3. USER STRATEGIES IN ERROR RECOVERY
This section proposes a taxonomy of user strategies in error recovery which can be used as a basis for improvements in the work environment to support the different processes of error handling.
The taxonomy of user strategies has been
based on an elaboration of previous research in error recovery
and planning
behaviours (Woods, 1984; Seifert and Hutchins, 1992; Kontogiannis, 1996; 1997). Figure 4 shows that error recovery strategies can be assigned to five groups:
Inner feedback. Errors can be detected and corrected utilising information from human memory (i.e., memory cues) or from the components of the action itself (e.g., action-based detection). This type of feedback is not related to the results of operator actions.
Exploring system feedback. Operators can explore feedback provided by the interface about the consequences of their actions in order to detect errors.
External communication.
Interactions with other team members can enhance
error detection and correction (e.g., team communication and supervision).
Planning behaviours. This group refers to a number of active strategies utilised by operators in situations entailing dynamic events, complex responses, and uncertainty.
Examples include: self-checking, planning ahead, anticipating
contingencies, arranging revision stages, and consulting external advice (e.g., emergency procedures).
Error informed strategies. Experience with errors made in other situations or training scenarios can help operators avoid similar errors in their jobs. Matching familiar error patterns to outcomes and coping with frustrations are examples of error informed strategies.
These operator strategies in error recovery can be encouraged with different kinds of support from the work environment. For instance, control panel design can support inner feedback and exploration of system feedback, training can augment planning behaviours, team training may facilitate external communications, and error training may promote learning from errors.
Each group of strategies is further
described below to provide a basis for appropriate support required from the work environment.
--------------------
Figure 4 here
--------------------
3.1 Inner feedback
Rizzo et al. (1994) refer to inner feedback as information available in the working memory which does not depend on feedback from the action outcome.
In other
words, errors can be “caught-just-after-the-act” when actions specified in the plan are compared to memory of actions executed in the past. To use an example from a diary study of Rizzo et al. (1994), “a person remembered that she left her book in a pub while checking what she should have had with her in the car; a feel of being lighter than before was the trigger for making this check”.
Another type of inner feedback
could be information coming from the components of the action being executed. Sellen (1994) refers to this episode as action-based detection (i.e., errors “caught-in-the-act”) which does not rely on the consequences of action. Detection of typing errors by skilled typists has been cited in the previous section as an example of utilising inner feedback effectively.
3.2 Exploring system feedback
Reason (1990) provided a good account of several ways in which users can exploit error cues in the environment or the control panel. Outcome-based detection relies on
external information provided in the environment about the consequences of action. Unfortunately, in complex systems, feedback may come when it is too late because of the long lags in the technical process.
Inferences about the possible state of a
parameter (e.g., temperature in a vessel) can be made by consulting feedback from other functionally related parameters (e.g., vessel pressure). Functional grouping of information on the control panel has been advocated as a sound ergonomic practice. Another source of difficulty in error detection is the masking of the effects of previous actions. In such cases, system feedback may confuse operators or contribute to “fixation” on previous assessments. The air crash in Kegworth (AAIB, 1990) is an example of fixation on an erroneous assessment of the situation due to the interventions of the safety logic which masked the actions of the crew. In that incident, switching off the right-hand engine (the healthy one) seemed to have cured the symptom temporarily (i.e., engine vibrations).
In actual fact, the vibrations
stopped because this action of the crew caused the auto-throttle to stop feeding the damaged left-hand engine with more fuel. In this sense, the mode-change of the auto-throttle masked the effects of the actions of the crew, who kept believing that the problem was in the right-hand engine. Coping with masking effects requires a good knowledge of the interventions of other agents, such as safety logic and team members, whose actions are not directly observable. Lewis and Norman (1986) identified several ways in which a system can respond to user errors. Some of them can be briefly mentioned here although they do not concern user strategies. The most unambiguous way in which the system can inform users that they have made an error is to block their onward progress. This is a “forcing function” which detects that some goal constraints have been violated and prevents performance of the intended action.
Forcing functions can guarantee error
detection in the execution stage but cannot help with error correction. Other types of system responses to errors include: warnings, self-correction of minor errors, starting a dialogue with the user, and so on.
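A minimal sketch (not from Lewis and Norman; the constraint representation, names, and example values are illustrative assumptions) of a forcing function: a proposed action is checked against goal constraints on its predicted effect and is blocked if any constraint would be violated.

    class ConstraintViolation(Exception):
        """Raised when a forcing function blocks an action."""

    def forcing_function(action, predict_effect, system_state, constraints):
        """Execute the action only if its predicted effect satisfies every goal constraint."""
        predicted_state = predict_effect(system_state, action)
        violated = [name for name, is_satisfied in constraints.items()
                    if not is_satisfied(predicted_state)]
        if violated:
            # Detection is guaranteed at the execution stage; correction is left to the user.
            raise ConstraintViolation(f"Action blocked, constraints violated: {violated}")
        return action(system_state)

    # Example: block a command whose predicted effect drops vessel pressure below a safe limit.
    state = {"pressure": 2.0}
    constraints = {"pressure above minimum": lambda s: s["pressure"] > 1.5}
    open_vent = lambda s: {**s, "pressure": 0.8}
    predict = lambda s, a: a(s)   # here prediction simply simulates the action
    # forcing_function(open_vent, predict, state, constraints) would raise ConstraintViolation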
3.3 External communication
Interactions with team members and supervisors can provide opportunities for error recovery by other persons.
External communication is probably one of the most
important mechanisms for detecting and correcting mistakes either in the planning stage or the outcome stage. Analyses of nuclear power incidents (Pew et al., 1981) and simulated nuclear scenarios (Woods, 1984) have shown that external communication was the main source of detection of erroneous diagnoses and plans. Good communications between team members and commanders proved to be a valuable resource in avoiding errors in diagnosis and planning in the military aviation sector as well (Serfaty and Entin, 1996). Hutchins (1994) argued that four elements are necessary for error detection by other persons. First, whoever detects the error must have access to the performance of the colleague who may be in error. Second, the detector must happen to attend to the aspect of the task that could result in error; high workload can divert attention to other parts of the job. Third, the detector must have knowledge of the task being performed or have some expectation about its correct outcome. Finally, the detector must have a perspective of the possible goals associated with the observed behaviour. This last point is very important since an action could seem very sensible to an outside observer but attain a different goal to that sought by the actual performer. Knowledge of possible goals associated with a task is very important in detecting mistakes. Supervisors and team members who used to do the same job in the past can understand the perspective of the task and act as error detectors. This brings us to the question of job design and the overlapping of knowledge or roles between team members. Multi-skill training can achieve a broadening of task perspectives and contribute to error detection by other team members.
It may also enhance detection of one's own errors. Seifert and Hutchins (1992) have quoted the example of the bearing-taker (i.e., the person who assumes the perspective of the “compass reader”) who does not know how this information is used by the plotter. Broadening the perspective of the bearing-taker to think about the bearing in terms of the co-ordinate space of the plot-chart (e.g., the physical location perspective) could help him notice reading errors. Adopting multiple perspectives can thus support error detection.
3.4 Planning behaviours
Meta-strategies or self-monitoring strategies are probably the most effective user strategies in recovering from both slips and mistakes in complex systems. Research
in planning behaviours (e.g., Allwood, 1984; Bagnara et al., 1987; Kontogiannis, 1996; 1997) has identified several strategies that can be effective in error recovery.
The
best-known self-monitoring strategy is a standard check made periodically by operators on their task progress (Allwood, 1984).
Simulations of production
planning exercises in a hot strip mill (Bagnara et al., 1987) showed that mistakes were discovered largely as a result of standard check behaviour.
Self-checking is a
proactive strategy which can take two forms. A decision may be made in advance that certain parts of a plan should be checked during execution to determine if any modifications are needed.
Self-checking may also come as a general work habit
where a routine check is made on previous actions, current ones, and actions that have been suspended or deferred; some future actions may also be previewed in order to remember their correct order. The reasons “why” and “when” an action sequence is checked depend on the constraints of the task, the context of work and the operator's idiosyncratic attitudes. Another self-monitoring strategy can take the form of contingency plans whereby operators articulate their decision criteria and identify those that cannot be satisfied in their current strategies (Kontogiannis, 1996). This is crucial for managing dynamic events because the uncertainty of the data and the urgency to find a solution would make even a substandard decision tolerable for a certain time period. Even for well-thought-out decisions, some risks may arise as the event could evolve in unanticipated ways. By making problem constraints explicit, operators can become aware of unsatisfied criteria and develop contingency plans to cope with the risks associated with their chosen option.
In this sense, contingency plans can help
operators to detect aspects of their action-plans that may prove to be inefficient in the long run and prompt them to specify ways of coping with contingencies. Error suspicion and curiosity is another form of proactive behaviour that helps operators to remain vigilant for subtle changes in the work environment. It is related to contingency planning, but works mainly for familiar tasks. Error suspicion and curiosity counteracts the syndrome of “complacency”, that is, a sense of self-satisfaction accompanied by unawareness of actual dangers or equipment deficiencies. Accident statistics of the US Forest Service (Jensen, 1989), for instance, showed that most aircraft accidents occur after the fire season is over and they usually
occur during what most pilots consider routine, point-to-point flights (e.g., running out of fuel, wheels-up landings, etc.).
Task-induced complacency occurs during
routine tasks where pilot expectations reduce vigilance to novel stimuli. Reliance on automation can also cause complacency, failures of error detection, and rigidity of expectations. In highly automated cockpits, pilots may be lulled into a complacent state in which they may be easily distracted from their monitoring tasks by unimportant events (Hurst and Hurst, 1982). Error suspicion and curiosity is an attitude that makes operators prepared for several contingencies. Landing on a short runway is a procedure well-known to many pilots. Suspicious and curious pilots may decide in advance that failure to touch down before a certain point on the runway will call for an immediate go-around. In this sense, pilots may prepare in advance their corrective plans for errors that may occur when landing on a short runway. Suspicion and curiosity are considered to be real virtues in aviation. Collins (1992) argued that “morbid as it sounds, the pilot who feels that the skies of the world are full of lurking hazards is likely to be the safest pilot”. Jensen (1995) has discussed several job design and attitude-change programmes that encourage error-suspicion and curiosity behaviours to counteract complacency. Two of the most frequent reactions to the discovery of an undesired outcome are
repeating the most recent action, or implementing an action that was just
forgotten. Instead of revising the way that a previous plan was carried out and reassessing the implications of the new situation, operators may tend to repeat the erroneous plan or apply parts of the plan that may be inappropriate in the new situation. Revision plans, although they may be reactive behaviours, are valuable in re-assessing the implications of undesired system states and in modifying previous action-plans. The following incident demonstrates the deleterious effects of failing to revise previous unsuccessful plans.
In a batch chemical reactor, the operator forgot to open a valve in the base of the reactor which would allow glycerol to pass through a heat exchanger and increase its temperature to react with ethylene oxide.
Although the indicated temperature had risen, the temperature of the reactor's content had not. This happened because the temperature point had been placed close to the discharge pump (a design fault) which got hot (it was running against the closed suction valve) and transmitted its heat. When he realised that he had left the valve shut, he opened it without any second thought.
Three tons of unreacted
ethylene oxide together with glycerol passed through the heat exchanger and a violent uncontrolled reaction occurred. (Kletz, 1994, pp. 56-58.)
Instead of revising the situation and evaluating the appropriateness of the forgotten action, the operator rushed into opening the valve.
The high pressure in the
reactor indicated that the ethylene oxide was not reacting, but the operator failed to consider this possibility since he did not revise the possible hazards of the new situation. The final category of planning behaviours in Figure 4 concerns the use of written procedures in complex systems. Contrary to traditional views, following procedures to cope with emergency scenarios is far from a routine activity.
A
common finding of many simulation studies (Roth et al., 1994; Jeffroy and Charron, 1997) is that operators must use procedures in an intelligent way to cope with unfamiliar situations. The crew, for instance, may use procedures as a guide for further information search to assess the situation (i.e., important parameters to check) rather than to find instructions for recommended actions. For situations requiring immediate actions, procedures can be used as a redundant system in order to verify that previous operator responses have not missed any important items (Degani and Wiener, 1993).
In other circumstances, operators may preview upcoming steps in order to get a feel for the complexity of a procedure. In all these cases, operators compare their plans and system-monitoring strategies to those specified in the procedures.
“External plan” comparison can be seen, therefore, as a form of
planning behaviour that goes beyond the rote following of external advice.
3.5 Error informed strategies
Although most of the previous strategies could be enhanced with the use of error feedback provided either on the job or in training programmes, two error-informed strategies which could benefit the most are: (a) matching familiar error patterns, and (b) coping with frustrations from errors.
The former strategy may be useful in
avoiding errors made in similar situations while the latter may help in developing some stress management skills. Matching familiar error patterns is a strategy that has long been recognised as an error detection mechanism in familiar situations. In other words, a match occurs between expectations about errors and actual outcomes. Allwood (1984) and Rizzo et al. (1987) refer to this detection strategy as “direct error-hypothesis formation”. In her diary study, Sellen (1994) quoted a case where a person detected that she had written “1989” instead of “1990” on her cheque; it was January and she had made a habit of looking for this error. This is a data-driven process where the tendency to err in certain situations sensitises people to familiar error patterns and prompts them to systematically inspect the product of their actions (Sellen, 1994). Errors, particularly those that take a long time to correct, can cause stress (Brodbeck et al., 1993).
Experience with “controlled” training situations, which do not restrict
job practices and allow errors to occur, may help people to develop some stress management skills to cope with associated frustrations. For this to occur, instructors must make special provisions since a “blame” culture may be developed instead. What types of instructions could enhance coping with frustrations is not yet well researched. Studies in error management training (Dormann and Frese, 1994) seem to be promising in exploring how error feedback in “controlled” environments can support such error coping mechanisms.
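A minimal sketch (not from the article; the pattern list and names are illustrative assumptions) of the error-pattern-matching strategy described above: known error forms, learned from past slips, are checked systematically against the product of an action, as in Sellen's cheque-dating example.

    import datetime

    def check_familiar_error_patterns(written_year, payee_written, payee_intended):
        """Inspect the product of an action against error patterns one has learned to expect."""
        warnings = []
        this_year = datetime.date.today().year
        if written_year == this_year - 1:
            warnings.append("Habit slip: last year's date written instead of the current year")
        if payee_written != payee_intended:
            warnings.append("Wrong payee written on the cheque")
        return warnings

    # Example: a person who knows her tendency to misdate cheques in January checks the year.
    print(check_familiar_error_patterns(datetime.date.today().year - 1, "Electricity Board", "Electricity Board"))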
4. TOWARDS A RESEARCH FRAMEWORK IN ERROR RECOVERY
From the above discussion it is clear that error recovery is a promising area of research in human reliability and ergonomics.
There is a need for a research
framework, therefore, that would put forward new hypotheses for testing in the
context of simulated scenarios, analysis of accidents, and near-misses. Specifically, we need to examine how error detection and correction may vary as a function of the three performance stages (planning, execution, and outcome stages), error types (slips and mistakes), and user strategies.
Table 1 shows an example of the sort of
hypotheses that we need to test in error recovery research.
---------------------
Table 1 here
---------------------
A standard check on progress, for instance, may be efficient in detecting and correcting slips at the execution and outcome stages while mistakes can be detected at the planning and outcome stages. Contingency plans can help operators detect things that can go wrong in the long run and so prepare them for the required plan corrections at the outcome stage. A revision plan is a reactive strategy for assessing the situation to detect slips and mistakes after an error is made; the extent to which mistakes can be corrected at the outcome stage can differ, depending on the situation. Similar hypotheses can be put forward for other planning behaviours. Error suspicion is a strategy mainly for detecting and avoiding mistakes;
a pilot, for
instance, may detect some instrumentation problems during take-off and abort the take-off when unable to understand the causes of the situation. Error suspicion and curiosity are also powerful for detecting and correcting slips in the outcome stage (e.g., detecting a wheels-up landing and correcting the problem by doing a go-around). Again, there is not enough research evidence to suggest how this behaviour can compensate for mistakes at the outcome stage. External plan comparison is a planning behaviour that makes intelligent use of written procedures (e.g., using procedures to guide information search, previewing upcoming steps, and reminding oneself of suspended or omitted checks). In this respect, mistakes can be detected even at the conceptual stage. The extent to which they can be corrected as well would depend upon the effectiveness of procedures. Slips can be detected and corrected, at least in the execution stage, by receiving feedback from both the system and the procedural checks. Focusing research on these issues would provide the basis for generating evidence about different ways (i.e., error
recovery strategies) which can support error detection and correction even at the planning stage of performance.
5. IMPLICATIONS FOR SYSTEM DESIGN AND OPERATOR TRAINING
The proposed framework can be used in the development of system design and training regimes that could potentially enhance the processes of error recovery. Inner feedback and exploration of system feedback are user strategies that can be supported by interface design. On the other hand, planning behaviours and error-informed strategies are best supported by appropriate training regimes.
External
communications can be supported by both training and good interface design (e.g., enhanced horizon of observation).
5.1 System design
On the basis of the proposed framework, this section examines three broad design principles: observability of undesired system states which promotes self-monitoring, traceability of actions and effects to revise understanding, and reversibility of errors to help timely correction. The first one, observability or transparency, requires that the design incorporates features that facilitate prediction of system changes and detection of undesired states.
This can promote error detection by providing appropriate and
immediate feedback.
In high risk systems, for instance, critical actions on the
interface that may produce different outputs can be mediated by a well differentiated action pattern.
Designing knobs and dials to feel different from one another can
enhance error detection during execution (i.e., tactile feedback).
Provision of
discriminating feedback, therefore, can support action-based detection. The use of “forcing functions”, which block further progress when operator commands have adverse system effects, is another design feature that supports outcome-based detection (Lewis and Norman, 1986). A common error with many safety devices is persevering with the activation of equipment that is kept inoperable by the safety logic.
Making the functioning of
safety devices transparent to operators can help them detect their errors. Feedback
about the success or failure of safety devices, for instance, is important for understanding the current system state; also, reminders about temporarily inoperable equipment due to safety trips are a good feature of display design. The “horizon of observation” allowed by a system also has crucial effects on team communications and error detection by team members. Any form of interface or computer-mediated communication that suppresses or eliminates important cues about the activities of other team members may reduce opportunities for error detection within the team.
Upon detection of an error, operators need to understand how their actions came to produce any unwarranted effects and develop better plans to correct the problem. Traceability refers to design features that help operators trace the causes of inadequate or erroneous actions. In slow-response systems, traceability is particularly crucial because it is difficult to understand which actions have led to undesired system states. Actions on slow-response equipment may take a long time before any effects are observed; in this sense, the results of previous actions are combined in ways that are difficult to understand. Providing reminders of actions taken in the past may increase traceability of patterns of actions and effects. Functional grouping of information has been mentioned, in section 3.2, as a design feature supporting inferences about controlled variables from other, functionally related, variables.
The third principle is based on incorporating design features that promote reversibility of actions. It allows operators to cancel the effects of their actions and develop plans to bring the system to new, safer states. Equipment redundancy is the simplest form of making allowances for recovery operations. In the context of text editing, “undo” commands are available which reverse the effects of previous actions. For complex systems, it may be possible to design queuing devices that delay the execution of an action (e.g., so that an action can be cancelled before its execution).
Other supportive
functions, such as simulation of actions and prediction of future system consequences, may also enhance error recovery at the stage of plan formulation. Recovery from situations which have led to technical failures is, in general, more difficult because new plans of action have to be developed under tighter time constraints.
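A minimal sketch (not from the article; the class, method names, and the delay value are illustrative assumptions) of a queuing device that delays the execution of commands, giving the operator a short window in which a detected error can still be cancelled, i.e. a simple form of reversibility.

    import heapq
    import time

    class DelayedCommandQueue:
        """Hold commands for a short period so that erroneous ones can be cancelled in time."""

        def __init__(self, delay_seconds=5.0):
            self.delay = delay_seconds
            self._pending = []        # heap of (due_time, command_id, action)
            self._cancelled = set()
            self._next_id = 0

        def submit(self, action):
            """Queue an action; it takes effect only after the delay has elapsed."""
            self._next_id += 1
            heapq.heappush(self._pending, (time.monotonic() + self.delay, self._next_id, action))
            return self._next_id

        def cancel(self, command_id):
            """Withdraw a pending action before it takes effect (the error recovery window)."""
            self._cancelled.add(command_id)

        def run_due_commands(self):
            """Execute queued actions whose delay has expired and which were not cancelled."""
            now = time.monotonic()
            while self._pending and self._pending[0][0] <= now:
                _, command_id, action = heapq.heappop(self._pending)
                if command_id not in self._cancelled:
                    action()

    # Example: an operator submits a command, detects the slip, and cancels it in time.
    queue = DelayedCommandQueue(delay_seconds=2.0)
    cmd = queue.submit(lambda: print("open discharge valve"))
    queue.cancel(cmd)
    queue.run_due_commands()  # nothing is executed: the command was cancelled within the delay window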
5.2 Training for error recovery
Improvements in system design can play a significant role in error recovery; however, there are always likely to be situations that are difficult to foresee, and system safety would then rely on operator training. So far, the most focused training on error recovery has been in the area of human-computer interaction, based on the concept of error management training (Frese, 1991; Brodbeck et al., 1993; Dormann and Frese, 1994). The main principle of this training philosophy is that “learning tasks” are constructed in ways that allow errors to occur without placing many restrictions on the required task practices.
The advantages of such learning environments
are that trainees
develop appropriate strategies for self-monitoring and coping with error frustrations while, at the same time, their knowledge is improved as they get to see the system through phases which may not be present under error-free performance. In addition, experience with certain error patterns can provide a basis for error detection in the actual job even before outcome feedback becomes available. Although the results of these studies have been very useful, it is rather difficult to extrapolate to more complex industrial domains that require managing high-risk tasks, team interactions, uncertain event progressions, and psychological stress. The taxonomy of planning behaviours (Figure 4 and Table 1) can provide a good basis for developing training programmes that foster error recovery on the part of the operators themselves. Self-checking (e.g., a standard check on progress), for instance, is a self-monitoring strategy that has been found useful in detecting mistakes. It is interesting therefore to examine what types of training can support this strategy. Training operators in incremental planning is a candidate approach. That is, carrying out a plan in “small pieces” rather than as a “whole” may be a case where a standard check on progress can result in the detection of wrong goals and plans. Incremental planning has been found to be an efficient strategy in coping with errors in the context of simulated emergency exercises (Woods and Roth, 1986). Training in contingency planning is another approach which can support error correction in complex and dynamic systems.
A contingency plan is a strategy for
thinking ahead of possible side-effects in the long run when operators appreciate that certain decision criteria cannot be satisfied in their current actions or that the situation may develop in unforeseen ways. The concept of a Decision Balance Sheet (Janis, 1989) can be useful in this respect.
To enable operators to anticipate
contingencies, a sheet can be developed specifying: the problem constraints, the importance of selection criteria, the operator’s confidence in achieving these criteria, and the potential side-effects arising from violations of these constraints.
An
example of how a Decision Balance Sheet can support contingency planning is presented by Kontogiannis (1996) in the context of an emergency scenario. Kaempf and Klein (1993) advocate the use of “mental simulation” techniques which enable operators to reason into the future about their actions and to devise contingency plans to cope with possible side-effects and other uncertainties. Apart from “playing out” a situation into the future, “mental simulation” can also be used to reason about how the current situation came to be (e.g., diagnosis of causes). Thus, “mental simulation” techniques may have a good potential for training both contingency planning and diagnostic skills. For familiar tasks, training for counteracting the syndrome of complacency can support recovery of slips and mistakes. Error suspicion and curiosity is a kind of vigilant behaviour that helps operators to correct slips and avoid mistakes. Jensen (1995) presented several training methods that can be used to counteract complacency.
Most of these methods focus on developing appropriate attitudes,
such as critiquing oneself, verbalising routine actions under high workload, repeating back consciously the comments of other team members, and asking “what if” questions. Another form of training relies on changing the composition of the team during simulated exercises.
That is, working with a new person may require
explaining one's expectations and, thus, may cause one to think about why certain expectations are set.
Errors based on expectations can therefore be reduced or
detected in a timely fashion. Multi-skill training could be another approach to supporting self-detection and error detection by other persons. Learning certain parts of the job that are normally carried out by other team members may help a person to understand how the data he collected can be used by others.
In section 3.3, it was mentioned that errors in
compass reading can be captured by the same person provided that he knows something about how these data are used by the plotter. However, these expectations about how one's input is used by another person may give rise to other types of error (Seifert and Hutchins, 1992).
The extent and kind of multiple perspectives to be
embedded in a certain job position is a challenging issue which needs to be resolved with the use of simulated exercises. In this respect, the taxonomy of
user strategies in error recovery, and
specifically the planning behaviours, can be used for generating several training hypotheses to be tested empirically.
Further research along the lines of the
relationships between recovery strategies, recovery stages, and error types can provide useful guidance for developing training scenarios tailored to error recovery skills.
6. CONCLUDING REMARKS
This article has sought to develop a framework for conducting research in error recovery and generating hypotheses about improvements in interface design and training regimes. These changes in the work environment can be optimised when they are based on an understanding of the different error handling processes and the recovery strategies of experienced users. The literature on error recovery has been mainly concerned with the domain of human-computer interaction and, for this reason, it was supplemented with findings of studies in problem solving in dynamic systems (Woods, 1984; Bagnara et al., 1987; Seifert and Hutchins, 1992; van der Schaaf, 1995; Kontogiannis, 1996; 1997). This has helped to address better the issue of recovery from mistakes, the role of external interactions, and aspects of recovery strategies regarding planning behaviours. The relationships between the various components of the proposed framework in Table 1 provide some directions for future research in error recovery, which remains a challenging area. The benefits of this approach have been illustrated in the context of interface design and operator training. In order to test the hypotheses derived from the proposed framework, there is a need for controlled laboratory studies, data collection from simulated exercises, and analysis of near misses. The latter source of information is very valuable since the data come from actual work conditions while relatively fewer resources are required to produce them.
Near miss collection, however, is only possible in environments
where a “blame” culture does not exist; otherwise there will be a cover-up of errors which hinders the reporting of errors and recovery factors. Confusing “error” and
“responsibility” can be a trap in the error recovery philosophy, according to de Keyser (1995). Understanding the error handling process would enable us to develop better human performance models by including factors influencing both error prevention and recovery. Good user models can be valuable in designing the operator interface and developing appropriate training regimes. They can also be used in human reliability studies for the quantification of error recovery probabilities and their incorporation in risk analysis. However, error recovery cannot be seen as an independent safety goal, separate from error prevention. System designers should try to anticipate all possible ways to prevent errors rather than leave the human operator to compensate for any residual design failures. Focusing on recovery should not neglect a user-centred design approach from the outset.
7. REFERENCES
AAIB (1990). Report on the Accident to Boeing 737-400 G-OBME near Kegworth, Leicestershire on 8th January 1989. Report no. 4/90, Air Accidents Investigation Branch, Department of Transport. HMSO, London.
Allwood, C.M. (1984). Error detection processes in statistical problem solving. Cognitive Science, 8, 413-437.
Amalberti, R. (1992). Safety in process control: an operator-centred point of view. Reliability Engineering and System Safety, 38, 99-108.
Bagnara, S., Stablum, F., Rizzo, A., Fontana, A. and Ruo, M. (1987). Error detection and correction: a study on human-computer interaction in a hot strip mill production planning and control system. Proceedings of the First European Meeting on Cognitive Science Approaches to Process Control, October 1987, Marcoussis, France.
Brodbeck, F.C., Zapf, D., Prumper, J. and Frese, M. (1993). Error handling in office work with computers: a field study. Journal of Occupational and Organisational Psychology, 66, 303-317.
Collins, R.L. (1992). Air Crashes. Thomasson-Grant, Virginia, USA.
Degani, A. and Wiener, E.L. (1993). Cockpit checklists: concepts, design and use. Human Factors, 35(2), 345-359.
De Keyser, V. (1995). Evolution of ideas regarding the prevention of human errors. Proceedings of the Sixth IFAC/IFIP/IFORS/IEA Symposium on the Analysis, Design and Evaluation of Man-Machine Systems, June 1995, Cambridge, MA.
Dormann, T. and Frese, M. (1994). Error training: replication and the function of exploratory behaviour. International Journal of Human Computer Studies, 6, 365-372.
Frese, M. (1991). Error management or error prevention: two strategies to deal with errors in software design. In Human Aspects in Computing: Design and Use of Interactive Systems (ed. H.J. Bullinger), pp. 776-782. Elsevier, Amsterdam.
Frese, M. and Zapf, D. (1994). Action as the core of work psychology: a German approach. In Handbook of Industrial and Organizational Psychology (eds H.C. Triandis, M.D. Dunnette and L.M. Hough), Vol. 4, Second edition, pp. 271-340. Consulting Psychologists Press, Palo Alto, CA.
Hurst, R. and Hurst, L.R. (1982). Pilot Error: The Human Factors. Jason Aronson, New York.
Hutchins, E. (1994). Cognition in the Wild. MIT Press, Cambridge.
Janis, I.L. (1989). Crucial Decision Making. Free Press, New York.
Jeffroy, J. and Charron, S. (1997). From safety assessment to research in the domain of human factors: the case of operation with computerised procedures. Proceedings of the Sixth IEEE Conference on Human Factors and Power Plants, June 1997, Orlando, Florida.
Jensen, R.S. (1989). Aviation Psychology. Gower Press, Brookfield, VT.
Jensen, R.S. (1995). Pilot Judgment and Crew Resource Management. Ashgate Publishing Limited, Aldershot.
Kaempf, G.L. and Klein, G. (1993). Aeronautical decision-making: The next generation. In Cockpit Resource Management (eds E. Wiener, B. Kanki and R. Helmreich), pp. 223-254. Academic Press, New York.
Kletz, T.A. (1994). What Went Wrong? Case Histories of Process Plant Disasters (Third Edition). Gulf Publishing Co., Houston.
Kontogiannis, T. (1996). Stress and operator decision making in coping with emergencies. International Journal of Human-Computer Studies, 45, 75-104.
Kontogiannis, T. (1997). A framework for the analysis of cognitive reliability in complex systems: a recovery centred approach. Reliability Engineering and System Safety, 58, 233-248.
Lewis, C. and Norman, D.A. (1986). Designing for error. In User Centered System Design (eds D. Norman and S. Draper), pp. 411-432. Lawrence Erlbaum Associates, New Jersey.
Mo, J. and Crouzet, Y. (1996). Human error tolerant design for air-traffic control systems. Proceedings of the Third Probability Safety Assessment and Management Conference, PSAM-III (eds P.C. Cacciabue and I.A. Papazoglou), June 1996, Crete, Greece.
Norman, D.A. (1988). The Psychology of Everyday Things. Basic Books, New York.
Pew, R.W., Miller, D.C. and Feeher, C.E. (1981). Evaluation of Proposed Control Room Improvements through Analysis of Critical Operator Decisions. Electric Power Research Institute, EPRI-1982, Palo Alto, California.
Rabbitt, P. (1978). Detection of errors by skilled typists. Ergonomics, 21(11), 945-958.
Reason, J.T. (1990). Human Error. Cambridge University Press, Cambridge.
Rizzo, A., Bagnara, S. and Visciola, M. (1987). Human error detection processes. International Journal of Man Machine Studies, 27, 555-570.
Rizzo, A., Ferrante, D. and Bagnara, S. (1994). Handling human error. In Expertise and Technology: Cognition & Human Computer Interaction (eds J.M. Hoc, P.C. Cacciabue and E. Hollnagel), pp. 195-212. Lawrence Erlbaum Associates, New Jersey.
Roth, E.M., Mumaw, R.J. and Lewis, P.M. (1994). An Empirical Investigation of Operator Performance in Cognitively Demanding Simulated Emergencies. NUREG/CR-6208, U.S. Nuclear Regulatory Commission, Washington, DC.
Rouse, W.B. and Morris, N.M. (1987). Conceptual design of a human error tolerant interface for complex engineering systems. Automatica, 23(2), 231-235.
van der Schaaf, T.W. (1995). Human recovery of errors in man-machine systems. Proceedings of the Sixth IFAC/IFIP/IFORS/IEA Symposium on the Analysis, Design and Evaluation of Man-Machine Systems, June 1995, Cambridge, MA.
Seifert, C.M. and Hutchins, E.L. (1992). Error as opportunity: learning in a cooperative task. Human Computer Interaction, 7, 409-435.
Sellen, A.J. (1994). Detection of everyday errors. Applied Psychology: An International Review, 43(4), 475-498.
Serfaty, D. and Entin, E.E. (1996). Team adaptation and coordination training. In Decision Making Under Stress: Emerging Themes and Applications (eds R. Flin, E. Salas, M. Strub and L. Martin), pp. 170-184. Ashgate Publishing, Aldershot.
Wioland, L. and Amalberti, R. (1996). When errors serve safety: towards a model of ecological safety. Proceedings of the First Conference on Cognitive Systems Engineering in Process Control, November 1996, Kyoto University, Japan.
Woods, D.D. (1984). Some results on operator performance in emergency events. Institute of Chemical Engineers Symposium Series, 90, 21-31.
Woods, D.D. and Roth, E. (1986). The Role of Cognitive Modeling in Nuclear Power Plant Personnel Activities: A Feasibility Study. NUREG-CR-4532, US Nuclear Regulatory Commission, Washington, DC.
Woods, D.D., Johannesen, L.J., Cook, R.I. and Sarter, N.B. (1994). Behind Human Error: Cognitive Systems, Computers and Hindsight. Crew Systems Ergonomics Information Analysis Center, Wright-Patterson AFB, Ohio.
Zapf, D., Maier, G.W., Rappensperger, G. and Irmer, C. (1994). Error detection, task characteristics, and some consequences for software design. Applied Psychology: An International Review, 43, 499-520.
Zapf, D. and Reason, J. (1994). Human error and error handling. Applied Psychology: An International Review, 43(4), 427-432.
FIGURES AND TABLES
Figure 1: A framework for the analysis of error handling processes
Figure 2: Performance stages at which error recovery can occur
Figure 3: Aspects of error handling processes
Figure 4: User strategies in error recovery
Table 1: Hypotheses about error recovery processes as a function of error types, user strategies and performance stages
Table 1: Hypotheses about error recovery processes as a function of error types, user strategies and performance stages

User strategies in error recovery        | Planning stage (Mistakes) | Execution stage (Slips) | Outcome stage (Slips) | Outcome stage (Mistakes)
Standard check (or monitoring plans)     | D                         | D-C                     | D-C                   | D
Contingency plans (for unfamiliar tasks) | D                         |                         |                       | D-C
Revision plans (reactive behaviours)     |                           |                         | D-C                   | D
Error suspicion (for familiar tasks)     | D-P                       |                         | D-C                   | D
External plan comparison                 | D-C                       | D-C                     | D-C                   | D-C

KEY: D = Error Detection; C = Error Correction; P = Error Prevention or Avoidance
Figure 1: A framework for the analysis of error handling processes
[Figure: the initiating event and error types (slips, mistakes) feed the processes of error detection, error explanation, and error correction, which are mediated by user strategies in error recovery and influenced by workplace and management factors.]
Figure 2: Performance stages at which error recovery can occur
[Figure: a PLAN-ACT-Monitor outcomes sequence - setting high-level goals (intentions), formulating/specifying plans, acting, waiting through a delay interval with possible masking effects, and monitoring outcomes - with recovery paths for backward, forward, and compensatory recovery.]
Figure 3: Aspects of error handling processes
[Figure: error handling comprises error detection (detecting mismatches between expectations and outcomes, between plans and executed actions, and between intentions and plans, and separating the effects of equipment failures from self-produced errors), error explanation (locating the error in the interpretation of the situation, in goals or plans, or in the specification of the task sequence), and error correction (re-assessing the situation, developing a corrective plan, and executing it).]

Figure 4: User strategies in error recovery
[Figure: five groups of strategies - inner feedback (memory cues, action feedback), exploring system feedback (system error cueing, coping with masking effects), external communication (interacting with team members, adopting multiple perspectives), planning behaviours (standard check, contingency plans, error suspicion and curiosity, revision plans, external plan comparison), and error informed strategies (matching familiar error patterns, coping with frustrations from errors).]