Development of a Human Cognitive Workload Assessment Tool
MCA Final Report

Dr. David Embrey, Dr. Claire Blackett, Dr. Philip Marsden, Mr. Jim Peachey
July 2006

Human Reliability Associates, School House, Higher Lane, Dalton, Lancashire
Tel: 01257 463121, www.humanreliability.com
© Crown Copyright 2006
Table of contents

Acknowledgements ... 8
1. Introduction ... 9
   1.1 Project Overview ... 9
2. Review of the Mental Workload Literature ... 13
   2.1 Approaches to workload assessment ... 13
   2.2 PIFs derived from the workload literature survey ... 14
3. Workload assessment in maritime operations: Incident Investigations & Stakeholder analysis ... 16
   3.1 Workload errors in shipping ... 16
      3.1.1 Databases examined ... 16
      3.1.2 Criteria for selecting accidents ... 17
      3.1.3 Number and type of accidents examined ... 17
      3.1.4 List of Performance Influencing Factors (PIFs) identified from the survey ... 19
      3.1.5 Conclusions regarding PIFs ... 20
   3.2 Stakeholder analysis ... 21
      3.2.1 Stakeholder definition ... 21
      3.2.2 Stakeholder categorisation ... 21
      3.2.3 Stakeholder requirements ... 23
      3.2.4 Conclusions on stakeholder requirements ... 24
   3.3 Analysis of safety critical tasks ... 25
4. Choosing the Methodology ... 26
   4.1 Requirements for the CMWL assessment tool ... 26
      4.1.1 Tender specifications ... 26
      4.1.2 Results of the survey of existing research and methods for measuring CMWL ... 27
      4.1.3 Results of the survey of marine accidents involving CMWL issues ... 28
      4.1.4 Stakeholder analysis ... 28
      4.1.5 Preliminary results from the task inventory investigations ... 28
   4.2 Determination of appropriate methodology ... 28
5. Methodology ... 31
6. Summary of procedure for developing an Influence Diagram (ID) Model ... 32
   6.1 Developing the ID model ... 32
   6.2 Weighting the factors ... 35
   6.3 Rating the factors using a scenario ... 35
   6.4 Reviewing and updating the model ... 36
7. Summary of results from the ID sessions ... 37
8. Conclusions and Future Research Directions ... 39
   8.1 Proposed future research ... 41

Appendix 1 – Approaches to Workload Assessment: A review of the literature ... 44
1. Introduction ... 45
2. Topic Overview ... 45
   2.1 Workload and error ... 46
      2.1.1 Optimal Performance (A2) ... 46
      2.1.2 Increased Workload Demand (A3-B-C) ... 46
      2.1.3 Decreased Task Demands (A1-D) ... 47
   2.2 Workload assessment and system safety ... 48
   2.3 Background Summary ... 48
3. Workload Theory ... 49
   3.1 The Single Channel Hypothesis ... 49
   3.2 Resource Theory and Mental Workload ... 51
      3.2.1 Single Resource Models ... 51
      3.2.2 Multiple Resource Models ... 55
   3.3 Composite Models ... 57
   3.4 Workload Theory: Comment ... 58
   3.5 Summary ... 59
4. Vigilance ... 61
   4.1 The Decrement Function ... 61
   4.2 Components of Vigilance Tasks ... 63
      4.2.1 Signal Modality ... 63
      4.2.2 Signal Salience ... 64
      4.2.3 Stimulus Uncertainty ... 64
      4.2.4 Background Context ... 65
      4.2.5 Stimulus Complexity ... 65
   4.3 Summary ... 66
5. Measures of Cognitive Workload ... 67
   5.1 Factors influencing choice of workload measure ... 67
      5.1.1 Sensitivity ... 67
      5.1.2 Diagnosticity ... 68
      5.1.3 Primary Task Intrusion ... 68
      5.1.4 Implementation Requirements ... 68
      5.1.5 Operator Acceptance ... 69
      5.1.6 Selectivity ... 69
   5.2 Subjective Workload Measures ... 69
      5.2.1 Modified Cooper-Harper Scale (MCH) ... 70
      5.2.2 NASA Task Load Index (NASA-TLX) ... 72
      5.2.3 Subjective Workload Assessment Technique (SWAT) ... 74
      5.2.4 Subjective Workload Assessment Techniques: Summary ... 77
   5.3 Performance-based Measures ... 78
      5.3.1 Primary Task Measures ... 79
      5.3.2 Secondary Task Measures ... 79
      5.3.3 Summary ... 80
   5.4 Physiological Measures ... 80
      5.4.1 Electroencephalogram (EEG) ... 81
      5.4.2 ECG and Associated Cardiovascular Measures ... 82
      5.4.3 Electrodermal Measures ... 83
      5.4.4 Electrooculogram (EOG) ... 84
      5.4.5 Summary ... 85
   5.5 Task Loading Methods ... 85
      5.5.1 Time Based Task Loading Models ... 85
      5.5.2 Task Analysis Workload Methodology (TAWL) ... 87
      5.5.3 Cognitive Task Analysis (OFM-COG) ... 91
      5.5.4 TAD & TADAM ... 94
      5.5.7 Summary of Task Loading Models ... 101
   5.6 Influence Diagrams ... 102
      5.6.1 Using Influence Diagrams to Model Causal Relationships ... 102
      5.6.2 Example Influence Diagram for a medical application ... 103
      5.6.3 Different types of ID ... 105
      5.6.4 Developing the Influence Diagram ... 106
      5.6.5 ID Calculations ... 106
      5.6.6 Applying IDEAS to CMWL Assessment ... 109
      5.6.7 Conclusions on the applications of IDEAS to CMWL Assessment ... 112
6. Industry Specific Workload Assessment Tools ... 114
7. Conclusions ... 117

Appendix 2 – Factors influencing workload in shipping incidents ... 128
1. Databases examined ... 129
2. Criteria for selecting accidents ... 130
3. Number and type of accidents examined ... 131
4. Examples of errors due to high workload ... 133
5. Examples of errors due to low workload ... 146
6. Examples of errors due to low cognitive workload switching to high cognitive workload ... 156
7. List of Performance Influencing Factors (PIFs) ... 160

Appendix 3 – Stakeholder analysis ... 162
1. Introduction ... 163
2. Stakeholder definition ... 163
3. Stakeholder categorisation ... 163
4. Stakeholder requirements ... 165
5. List of stakeholders (alphabetical order) ... 165
6. Evaluation of the importance of Stakeholder requirements ... 171

Appendix 4 – Bridge task inventory ... 176
1. Generic overview ... 177
   1.1 Deck department ... 177
   1.2 Engine department ... 177
   1.3 Hotel department ... 177
2. Deck Department Tasks ... 178
   2.1 Navigation and control of the ship ... 178
   2.2 Manoeuvring ... 179
   2.3 Control of seaworthiness ... 179
   2.4 Management of cargo operations (loading, unloading, stowage, securing) ... 179
   2.5 Maintenance of hull structure and fittings ... 179
   2.6 Emergency preparedness ... 180
   2.7 Safety Critical Tasks ... 180

Appendix 5 – Sample letter to shipping companies ... 181

Appendix 6 – Workshop sessions to develop workload models ... 184
1. The Influence Diagram Workshops ... 185
   1.1 Developing a seed model ... 185
      1.1.1 Example of an accident caused by overload ... 186
      1.1.2 The seed model ... 187
   1.2 The Influence Diagram process ... 190
      1.2.1 Add, remove and/or change factors ... 190
      1.2.2 Add weights to the model ... 195
      1.2.3 Check the Influence Rankings of the model ... 196
      1.2.4 Test the model with a scenario ... 199
2. The Completed Models ... 202
   2.1 Group 1: Maersk tanker vessel crew members ... 202
      2.1.1 Group 1: Overload model ... 202
      2.1.2 Group 1: Underload model ... 207
   2.2 Group 2: Maersk container vessel crew members ... 209
      2.2.1 Group 2: Overload model ... 209
      2.2.2 Group 2: Underload model ... 212
   2.3 Group 3: Condor Ferries crew members ... 214
      2.3.1 Comments about the specific factors in the overload model ... 214
      2.3.2 Testing the model with scenarios ... 217
   2.4 Group 4: James Fisher crew members ... 220
      2.4.1 Comments about the overload model ... 220
      2.4.2 Testing the model using scenarios ... 222
3. Analysis ... 225
   3.1 Analysis of the overload models ... 225
      3.1.1 Common factors across all overload models ... 225
      3.1.2 Differences between the common factors ... 227
      3.1.3 Differences between the overload models ... 228
   3.2 The composite overload model ... 230
      3.2.1 Verification of the composite overload model ... 233
      3.2.2 Testing the composite overload model ... 236
   3.3 Analysis of the underload models ... 239
   3.4 The composite underload model ... 240
      3.4.1 Verification of the composite underload model ... 241
      3.4.2 Testing the composite underload model ... 241
4. Feedback ... 246
   4.1 Feedback from the participants of the Influence Diagram sessions ... 246
5. Validation of the CLIMATE software tool ... 248

Appendix 7 – Diary study planning ... 250
1. Development of a pilot diary study ... 251
2. Results of the pilot diary study ... 255

Appendix 8 – List and description of overload and underload factors ... 257
1. List of primary overload factors and descriptions ... 258
2. List of secondary overload factors and descriptions ... 261
3. List of tertiary overload factors and descriptions ... 268
4. List of primary underload factors and descriptions ... 270
5. List of secondary underload factors and descriptions ... 272
6. List of tertiary underload factors and descriptions ... 276

Appendix 9 – Feedback Questionnaire for ID sessions ... 277

Appendix 10 – Sample Diary Study questionnaire and feedback questionnaire ... 281
Acknowledgements We would like to acknowledge the generous support of the shipping companies James Fisher, Condor Ferries and Maersk Shipping for their contribution to this research project. We would also like to give special thanks to the long-suffering participants from the companies, who had to endure our gruelling workshops.
1. Introduction
This document is the Final Report of work carried out on the above project during the period from the effective starting date of the project (August 1st 2005) to June 30th 2006. The report is organised into this main summary report together with a number of Appendices, as follows:

Appendix 1 – Approaches to Workload Assessment: A review of the literature
Appendix 2 – Factors influencing workload in shipping incidents
Appendix 3 – Stakeholder analysis
Appendix 4 – Bridge Task Inventory
Appendix 5 – Sample letter to shipping companies
Appendix 6 – Workshop sessions to develop workload models
Appendix 7 – Diary study planning
Appendix 8 – List and description of overload and underload factors
Appendix 9 – Feedback Questionnaire for ID sessions
Appendix 10 – Sample Diary Study questionnaire and feedback questionnaire
In subsequent sections of this summary report, each of the topic areas in the Appendices is discussed in outline. The reader is referred to the Appendices for more detailed information.

1.1 Project Overview
The overall aims of this project, as set out in the original specification, were as follows:

• Review current research into safe maximum and minimum human cognitive workload capabilities.
• Identify the safe maximum and minimum human cognitive workload levels for the maritime industry and, if necessary, for different trades or conditions of work within it.
• Develop a robust tool that can effectively and efficiently assess human cognitive workload levels.
• Test the tool using examples of rosters/shift patterns from the maritime industry.
In the response to the initial tender, Human Reliability Associates (HRA) set out a number of workpackages to achieve these project objectives; these are set out in Figure 1 below. The overall programme of work was based on this structure.

WP1: Approaches to workload assessment
   1.1 Topic Overview
   1.2 Subjective assessment methods
   1.3 Performance-based measures
   1.4 Physiological measures
   1.5 Task loading methods
   1.6 Influence Diagram methods
   1.7 Industry specific approaches
   1.8 Discussion and conclusions

WP2: Workload assessment in maritime ops
   2.1 Workload errors in shipping
   2.2 The stakeholder community
   2.3 Specification of stakeholder needs
   2.4 Analysis of safety critical tasks
   2.5 Performance influencing factors
   2.6 Tool Selection Criteria

WP3: Recommended Approach
   3.1 Summary of viable methods
   3.2 Finalise method selection criteria
   3.3 Prepare decision matrix
   3.4 Determine optimal method

WP4: Software Development
   4.1 Concept development
   4.2 Design specification
   4.3 Program implementation
   4.4 Software validation

WP5: Pilot Studies / Field Trials

Figure 1: Project Workpackages (WP)
The first project workpackage (WP 1) involved a comprehensive review of the Cognitive Mental Workload (CMWL) literature. This provided information on the theories and experimental literature, together with the tools and techniques that have been developed to measure workload. It also allowed us to survey any available information on the issue of recommended limits for CMWL underload and overload. The results are summarised in Section 2 and the full survey is provided in Appendix 1.

WP 2 focussed on the maritime industry. A review of the publicly available marine incident databases was performed in order to identify the extent to which CMWL issues had contributed to incidents, and to evaluate the types of factors (Performance Influencing Factors, PIFs) that appeared to have
contributed to underload or overload problems. This is reported in Section 3.1.4 and Appendix 2.

The identification of the potential users of the proposed CMWL assessment tool (referred to as stakeholders), and the possible applications for the tool in the marine industry, were then addressed. These results are summarised in Section 3.2 and Appendix 3.

A Task Inventory of the main activities carried out on the bridge was developed. The purpose of this activity was to provide a clearer understanding of the nature of the bridge activities, so that the CMWL assessment tool could be applied at the level of an individual bridge task if required. The results of this activity are reported in Section 3.3 and Appendix 4.

Based on the results of the stakeholder analyses carried out in WP 2, a set of criteria was developed in order to evaluate the available approaches to workload measurement from the perspective of the potential users of the proposed measurement tool.

WP 3 was concerned with developing a recommended approach to workload measurement, based on the results of the previous workpackages. The approach adopted was based on the use of the Influence Diagram methodology. This is a structured process for developing a graphical representation of the factors that experienced marine personnel perceive to be the primary drivers of CMWL, based on their operational experience. The methodology also allows the results of research findings and accident investigations to be incorporated into a comprehensive workload model. The justification for using this approach is set out in Section 4.

WP 4 involved two parallel activities. The first of these was the development of a series of workload models by means of workshops using different groups of mariners from a range of maritime activities. These included deep-sea operations, coasters and high-speed ferries. The second activity was the software development process.
This involved adapting an existing software tool for Influence Diagram modelling, called IDEAS (Influence Diagram Evaluation and Assessment System), for use in the workshop sessions when developing the workload models. A separate program called CLIMATE (Cognitive Loading Index for Mariners Assessment TEchnique) was developed to allow workload to be assessed by non-specialists in the mental workload area, such as shipping companies and accident investigators. CLIMATE uses the workload models developed in the workshops but provides a user-friendly interface to allow assessments of factors contributing to workload to be made. Once the initial CLIMATE assessment tool was developed, a process of refining and updating was carried out, following feedback from the MCA regarding the usefulness of the tool and comments on the interface.

WP 5 developed a framework to allow external validation of the model by means of an onboard data collection process. In this proposed diary study, OOWs collect data at the end of their watches regarding the factors that the models predict will be the primary drivers of CMWL. Information is also collected on the level of perceived overload or underload experienced during
the corresponding watches. The objective of this study is to evaluate the predictive power of the methodology and to update the model based on new factors that are identified as being significant influences on CMWL. A pilot study has been performed and, at the time of this report, a protocol for performing a long-term validation study was being set up. It is envisaged that this study would extend beyond the formal end of the project in July 2006, so that validation data could continue to be collected.
2. Review of the Mental Workload Literature
2.1 Approaches to workload assessment
The first part of the literature survey in WP 1 focussed on the current literature on Cognitive Mental Workload (CMWL). Relatively few treatments of the topic have considered workload error at both the upper and lower limits. The mental workload literature has tended to be mostly concerned with performance problems associated with tasks high in explicit demand (i.e., the overload or cognitive strain scenario). There is, however, a separate major strand of work aimed at exploring and modelling errors at the lower end of the task demand spectrum. In this area, investigators have focused primarily on the performance of tasks requiring vigilance (e.g., target detection, usually in unstimulating situations such as watchkeeping). Predictable performance shortfalls occurring during a vigil are known collectively as the Vigilance Decrement. Vigilance studies have been carried out extensively in maritime operations, where detection of infrequently occurring targets has historically been both problematic and a common feature of the work situation.

The overall theoretical position that emerges from the literature is that CMWL arises primarily from a mismatch between the demands of the task itself (e.g., judgements about manoeuvring and monitoring the position of a ship during berthing) and the mental resources available to meet these demands. These resources include the skills and training of the OOW, the supporting bridge team, and technical support systems such as radars and GPS that, if well designed, supplement these human resources.

The second part of the review focussed on workload assessment techniques. The review indicated five main approaches to workload assessment:

Subjective methods: These are based on the subjective experience of task demands. Data are in the form of beliefs, values, preferences and attitudes, and are collected via completion of self-report questionnaires.
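The subjective approach can be illustrated with the scoring procedure of the NASA Task Load Index, one of the self-report instruments reviewed in Appendix 1 (Section 5.2.2): six dimensions are rated, then weighted by how often each dimension was chosen across the 15 pairwise comparisons. The ratings and weights below are invented for illustration, not data from this study.

```python
# Sketch of the NASA-TLX weighted scoring procedure.
# Ratings and weights are hypothetical.

# Ratings on each dimension, 0 (low) to 100 (high).
ratings = {
    "mental_demand": 75, "physical_demand": 20, "temporal_demand": 80,
    "performance": 40, "effort": 70, "frustration": 55,
}

# Weights: how often each dimension was selected in the 15 pairwise
# comparisons; by construction they sum to 15.
weights = {
    "mental_demand": 5, "physical_demand": 0, "temporal_demand": 4,
    "performance": 2, "effort": 3, "frustration": 1,
}
assert sum(weights.values()) == 15

# Overall workload score: weighted mean of the ratings.
tlx = sum(ratings[d] * weights[d] for d in ratings) / 15
print(round(tlx, 1))  # → 69.3
```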
Task performance methods: These provide assessments in the form of speed/accuracy trade-off and/or error rate data for primary tasks, and/or measures of cognitive resource availability assessed via completion of secondary tasks.

Physiological monitoring: These methods are based on the recording of physiological states that are thought to correlate highly with the subjective experience, and possibly the performance decrements, associated with high and low CMWL situations.

Task loading: These approaches provide assessments of psychological load based on explicit measures of task demands, coupled with a consideration of
the known psychological factors governing the ability to react to those demands.

The first four of these approaches have been used to develop a wide range of tools for workload assessment, mainly in industries traditionally defined as high risk, such as defence, road transportation, railways, aerospace, process control and power generation. The choice of method is usually based on consideration of the special requirements of the industry concerned. For example, workload assessment for drivers of road vehicles is typically based on performance of a range of secondary test tasks completed concurrently with the primary driving task. Conversely, physiological monitoring techniques are frequently used in the defence industry, which has a ready supply of personnel, equipment and high fidelity simulators. Consequently, staff can be recruited to participate in realistic missions using advanced system simulators, with performance being monitored using a variety of sensor equipment.

In addition to these specific CMWL assessment methodologies, a more general framework for human performance modelling was also included, which appeared promising as a means of basing the workload prediction tool on the knowledge and experience of mariners:

Influence Diagram Evaluation and Assessment System (IDEAS): IDEAS is an application framework which allows the insights from theoretical research and from people with extensive practical experience to be combined in the form of a simple graphical model of workload.
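The way such a graphical model turns factor assessments into a workload estimate can be sketched as a weighted aggregation over the factors that directly influence the top node. The factor names, weights and scenario ratings below are invented for illustration; the actual models and weights are those elicited in the workshops (Appendix 6).

```python
# Minimal sketch of Influence Diagram aggregation for a CMWL index.
# Factors, weights and ratings are hypothetical.

# Each direct influence on CMWL carries a normalised weight
# (weights sum to 1) and a scenario rating on a 0-1 scale,
# where 1 represents the worst case for overload.
factors = {
    # factor: (weight, scenario rating)
    "traffic_density":  (0.40, 0.9),
    "bridge_manning":   (0.35, 0.5),
    "equipment_design": (0.25, 0.2),
}
assert abs(sum(w for w, _ in factors.values()) - 1.0) < 1e-9

# The top-node index is the weighted sum of its direct influences;
# in a full hierarchical model each influence would itself be
# computed the same way from lower-level factors.
cmwl_index = sum(w * r for w, r in factors.values())
print(round(cmwl_index, 3))  # → 0.585
```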
2.2 PIFs derived from the workload literature survey
The CMWL literature review revealed a number of generic psychological factors likely to influence task performance given the presence or absence of task load. In developing final versions of the CMWL models, these factors were supplemented by known PIFs associated with reliability evaluations and other factors specific to shipping operations.

High Workload Conditions

Competition for resources: In high workload situations, efficient time-sharing performance will occur when concurrent tasks impose differential demands on human information processing stages (e.g., perception, central processing, response execution), where they require different processing modalities (e.g., visual/auditory channels) and where they rely on different processing codes (e.g., verbal/spatial). Interference effects will become more prevalent where there is competition for the same structural resources.

Opportunity for adoption of workload management techniques: Efficient performance can be maintained where the opportunity exists to defer or transfer work tasks until resources become available. Performance errors will arise where one or more tasks impose strict time constraints. Workload can sometimes be managed when additional workforce resources are available.
Nature of the task demands

Examples of tasks with High Intrinsic Demand:
• Visually scanning or searching displays for multiple possible conditions
• Tasks requiring written instructions to be read
• Detecting visual differences between objects/targets
• Decision making
• Exercising judgement
• Estimation, calculation, conversion
• Remembering items/information
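The time-constraint component of high workload can be illustrated with a minimal time-based task loading calculation of the kind reviewed in Appendix 1 (Section 5.5.1): workload in a time window is approximated as the ratio of attention time demanded by concurrent tasks to the time available. The tasks and durations below are hypothetical.

```python
# Sketch of a simple time-based task loading calculation.
# Task names and durations are illustrative only.

WINDOW_MIN = 10.0  # assessment window, minutes

# (task, minutes of attention demanded within the window)
tasks = [
    ("radar scan and collision assessment", 4.0),
    ("position fix and chart work",         3.0),
    ("VHF traffic",                         2.0),
    ("log keeping",                         1.5),
]

demand = sum(minutes for _, minutes in tasks)
utilisation = demand / WINDOW_MIN

# Utilisation above 1.0 means the OOW cannot complete all tasks in
# the time available (overload risk); very low values suggest underload.
print(round(utilisation, 2))  # → 1.05
```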
Low Workload Conditions

Audible alarms: In low workload situations, the modality of the target signal can influence the likelihood of detection. Audio signalling techniques are generally superior to visual-only systems.

Signal Salience: The salience of the alerting signal is a factor likely to influence detection of a target in a positive way during a watchkeeping task. Increases to the amplitude of target signals midway into the vigil can reverse the vigilance decrement trend, as can increases to the duration of the target signal alerting device.

Signal Uncertainty: The degree to which a target event is anticipated will influence detection. Detection tends to be better where the uncertainty of an encounter is low (i.e., when an encounter is more frequent).

Foreground/background effects: Detection of a target event is much better where the foreground target and background non-target events occur with predictable regularity. Infrequently occurring random target events, embedded within randomly distributed background non-target events, are particularly problematic during performance of vigilance tasks.

Nature of the Task Demands

Examples of tasks with Low Intrinsic Demand:
• Discrete actuation of controls
• Continuous adjustment of controls (e.g., steering)
• Performance of well learned, highly routine activities
• Detection of simple sound/audio alarms
• Orientation to sounds
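The detection effects described above are conventionally quantified in the vigilance literature using signal detection theory, where sensitivity d′ is computed from hit and false-alarm rates. A minimal sketch, with illustrative rates (not data from the studies reviewed), is:

```python
# Signal-detection sketch for vigilance performance: sensitivity d'
# from hit and false-alarm rates. Rates are hypothetical.
from statistics import NormalDist

z = NormalDist().inv_cdf  # z-transform of a proportion

hit_rate = 0.80          # proportion of targets correctly detected
false_alarm_rate = 0.10  # proportion of non-targets reported as targets

# Higher d' means better discrimination of targets from background;
# the vigilance decrement appears as d' (or hit rate) falling over the vigil.
d_prime = z(hit_rate) - z(false_alarm_rate)
print(round(d_prime, 2))  # → 2.12
```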
The complete findings from the literature survey in WP 1 are provided in Appendix 1.
3. Workload assessment in maritime operations: Incident Investigations & Stakeholder analysis

The first purpose of WP 2 was to obtain an overview of the types of accident in which CMWL overload and underload are contributory factors. This was accomplished by a comprehensive analysis of the publicly available marine accident investigation databases. This activity identified some Performance Influencing Factors (PIFs) that can contribute to underload or overload, or may affect the likelihood of errors or accidents in conjunction with these extremes of loading. This supplemented the information that had emerged from the literature survey.

The next objective was to obtain insights into the types of facilities that should be available in the software tool in order to satisfy the needs of potential users in the stakeholder community. This involved investigating the nature of the tasks carried out on the bridge. A clear understanding of these tasks is necessary in order for the CMWL tool to be effective in the marine domain. The final objective was to develop a set of criteria to be used as an input to the tool selection process in WP 3.

3.1 Workload errors in shipping
The primary input to WP 2, described in detail in Appendix 2, is a comprehensive examination of the major publicly accessible databases on marine accidents. In this section, we describe the databases that were considered.

3.1.1 Databases examined

To gather a sample of accidents in which cognitive underload or overload was a factor, accident reports from four separate databases were examined. These contained collections of marine accident reports from four major investigative bodies:
• Australian Transport Safety Bureau (ATSB)
• Marine Accident Investigation Branch (MAIB, UK)
• National Transportation Safety Board (NTSB, USA)
• Transportation Safety Board of Canada (TSBC)
Each of these four bodies is an independent organisation. The aim of each organisation is to determine the circumstances, causes and contributory factors of accidents. The organisations do not apportion blame or liability, nor are they responsible for enforcing recommendations made at the end of an investigation.
The databases of reports from each organisation are freely available on the World Wide Web. Some of the databases contain more accessible accident reports than others. For example, the NTSB database contains approximately two hundred and twenty-three reports, but only thirty-four of these are accessible via the website. The MAIB database appears to contain the highest number of freely accessible reports, with two hundred and twenty reports between the years 1999 and 2005 alone. The reports tend to be quite similar in structure, and are generally divided into three or four sections:
• factual information;
• analysis;
• conclusions;
• recommendations.
However, the reports can vary in length and detail, depending upon the circumstances and severity of the accident, and the level of investigation required by the investigative body.

3.1.2 Criteria for selecting accidents

As expected, the four databases contain hundreds of marine accident reports, and so selection criteria had to be defined in order to choose a set of reports to use as case studies illustrating the effects of cognitive underload and overload on marine accidents. The project specification from the MCA gave an example of low workload as “maintaining watch with autopilot on in an open and calm sea at night”, and it was therefore decided that the sample of workload errors should focus on accidents involving watch crew and/or the Officer of the Watch (OOW). These types of accident tend to result from errors during navigation, manoeuvring and control of the ship.

A large number of accident reports from each organisation were examined. As mentioned previously, the reports tend to be quite similar in structure and, in most cases, contain a summary of the accident and probable causes at the beginning of the report. This made report selection somewhat easier, as the accident summary could be examined first to determine whether there was any indication of cognitive workload as a factor in the accident. If so, the full report was examined in more detail to determine how and why cognitive workload contributed to the accident.

3.1.3 Number and type of accidents examined

The types of accident investigated by the four organisations can vary greatly, even within the single marine domain. Typical accident types include
collision, grounding, capsizing, sinking, fire on board, loss of vessel, injury on board, shifting or loss of cargo, and various near-miss accidents.

Organisation    No. of Accidents    Range of Years
ATSB            25                  1995 – 2002
MAIB            220                 1999 – 2005
NTSB            34                  1994 – 2004
TSBC            32                  2001 – 2004

Figure 2 Number of accidents examined
Figure 2, above, shows the number of accidents that were examined for this study. In total, three hundred and eleven accident summaries were examined to determine whether cognitive workload was a factor in the accident. Of these, a sample of thirty accidents deemed to be due, at least in part, to cognitive workload errors was selected for more detailed examination. In some cases, even after examining the report in detail, it was difficult to determine whether cognitive workload played a part in the accident, because it was not commented upon in the accident report, or there was insufficient information available in the report to allow assessment of the workload of the watch crew. These accidents could not be chosen for the research sample.

As mentioned previously, thirty accidents were examined in more detail for examples of cognitive workload errors. The majority of accidents in this sample were groundings and collisions. This is typical of accidents where CMWL was a factor, as events such as groundings and collisions tend to occur due to workload errors in manoeuvring, navigating and controlling the ship. Other types of accident, such as fires, equipment failures, injuries, or cargo problems tend not to have CMWL as a factor, and usually do not involve members of the watch crew (while they are on watch, at least). Figure 3 below shows the thirteen accidents that were finally selected as useful examples of errors due to high or low cognitive workload.
The thirteen selected accidents are listed in Figure 3 by Report ID, Year, Organisation, Vessel Name(s) and Accident Type. They were investigated by the ATSB (reports 159, 190, 196 and 211, 2000–2005), the MAIB (reports 1/6/109, 32/2000, 34/2000, 7/2002, 11/2004 and 12/2005, covering accidents from 1999 to 2004) and the TSBC (reports M02C0064 and M04L0050, 2002 and 2004). The vessels involved were Star Sea Bridge / Sue M, Lancelot / Jenabar, Tauranga Chief, Spartia / Hannah Lee, Baltic Champ, Dole America, Elm / Suzanne, Betty James, Lomur, Cepheus J / Ileksa, Scot Venture, Canadian Prospector / Stellanova and Catherine Legardeur. The sample comprises six collisions, five groundings, one near-miss and one contact incident.

Figure 3 Examples of accidents due to high / low cognitive workload
3.1.4 List of Performance Influencing Factors (PIFs) identified from the survey

This section lists the Performance Influencing Factors noted during the examination of accidents from the four databases. These factors influence human behaviour and can contribute to accidents in which high or low cognitive workload is a factor.
The time of the incident: Many accidents where workload is an issue tend to occur early in the morning, between the hours of 0000 and 0600. At this time of day, people are naturally more tired and less alert than, say, in mid-afternoon.
The stage of the person’s shift: Many accidents occur at the beginning of a shift, when a person is assessing the situation and trying to build a mental model of it. Conversely, at the end of a shift, the person may be preparing to hand over, or may be distracted by thoughts of finishing the shift. In either case, the person may not be fully concentrating on the task(s) at hand.
The number of persons on watch: Many accidents tend to occur when a person is on watch alone, as that person is then responsible for all aspects of the navigation of the vessel. In cases where there are multiple people on watch together, distractions and lack of communication between people can also lead to accidents.
The type of vessel: This can sometimes contribute to the accident, especially if there is a large vessel navigating near a smaller vessel. In many cases, the smaller vessel cannot be sighted visually, due to visual restrictions in the wheelhouse.
The technology used: Many vessels now have numerous navigational aids, which can contribute to accidents in two ways. Firstly, people can become over-reliant on these aids and thus less vigilant in their visual lookout and in cross checking data between instruments. Conversely, people may not use the navigational aids as they prefer to navigate by visual references, or they may not fully understand how to use the aids.
Weather conditions: These can contribute to accidents by reducing visibility (e.g. rain, snow, fog, glare of sun), or making it more difficult to control a vessel.
The area of the incident: Collisions are more likely to occur during the voyage, when vessels are passing each other in narrow channels. Groundings, or contact with pier walls, etc., are more likely to occur when berthing, coming into port or exiting port.
Fatigue: There have been extensive studies into how fatigue can influence performance; when combined with high or low workloads, it will almost inevitably lead to an accident.

Environment: The environment around the person on watch can influence performance. For example, if the person can conduct all tasks whilst sitting down, it can make the person less vigilant, and reduce the ability to switch between low and high cognitive workloads.
3.1.5 Conclusions regarding PIFs

The review of marine accident databases provided some useful insights into the types of scenario that the CMWL measurement tool needed to address, and some of the PIFs that affect the level of workload. These and other PIFs also contribute to the likelihood of an error leading to an accident. Examples were identified of incidents involving underload, overload and the switch from underload to potentially overloaded situations. However, in common with many accident investigation databases, the investigation process used tended to focus on ‘what happened’ rather than the ‘why’ of accident causation, and hence the insights into specifically workload-related factors are quite limited. Nevertheless, the more general recurrent PIFs that have been identified are useful if the CMWL measurement tool is to be linked with a marine error prediction process, as will be discussed in Section 4.2.
3.2. Stakeholder analysis

The purpose of the stakeholder analysis was to ensure that the facilities and functions provided by the CMWL tool satisfy the needs of the potential user population. The work carried out on this issue is described in detail in Appendix 3. In this section, we provide a summary of the introduction and main results.

3.2.1 Stakeholder definition

The word “stakeholder” is used to mean a participant in the shipping industry, i.e. a person, group, organisation, body corporate, authority, etc., of any kind which in some way or other contributes to the provision of shipping services. Thus, a pilot, a ship’s crew, a classification society, a marine equipment manufacturer and a flag state administration are all stakeholders in the shipping industry. They, and many other similar types of individual and organisation, each contribute to the provision of a shipping service to end-users. These include manufacturers, importers, exporters, consumers, passengers, the general public and society at large, for whom the cost-effectiveness of shipping, and the risks associated with their reliance on shipping, are valid considerations.

Shipping is characterised by having a relatively large number of stakeholders, only a few of whom are likely actually to make use of the tool (although many more may make use of the information it generates). Of those who may actually use the tool, only a minority are likely to be primary users in the first instance. Their requirements should therefore dominate in terms of their influence on methodology selection.

3.2.2 Stakeholder categorisation
A categorisation of stakeholders is proposed in respect of the application of the workload assessment tool to the tasks performed by a watchkeeping navigating officer (i.e. the OOW, this scope having been agreed during the project kick-off meeting). Stakeholders are categorised according to whether, in respect of their core objectives, the OOW’s cognitive workload is either:

a) Directly relevant (stakeholder will use the tool and make decisions based on results);
b) Indirectly relevant (will not actually use the tool, but will make decisions informed by results); or
c) Not relevant at all (decisions made largely irrespective of results of tool).

This categorisation is not clear-cut, since complex interactions exist between many of the shipping industry stakeholders, and because the eventual scope and applicability of the workload assessment tool is yet to be determined. The table provided in Appendix 3 comprises a comprehensive list of all potential shipping industry stakeholders (as defined in 3.2.1 above). Figure 4 below contains only the direct stakeholders, which will be the primary source of requirements from the stakeholder community.

Accident investigators (Direct): Investigation of maritime accidents; dissemination of safety advice and recommendations.

Coastguard agencies (Direct): Coordination and/or provision of search and rescue services to the maritime community; operation of MRCCs (Marine Rescue Coordination Centres).

Maritime professional institutions (Direct): Ship safety; minimum safe manning; crew education, training and licensing; crew safety.

Maritime research organisations (Direct): Support for ship design and operation development/improvement.

Master (Direct): The senior nautical officer on board with overall day-to-day responsibility for operating a ship.

Navigating OOW (Direct): Watchkeeping nautical officer responsible for day-to-day navigation and operation of the ship.

Navigation simulators (Direct): Shore-based schools for navigational training of nautical watchkeeping officers.

Pilots (Direct): Ship handling; ship safety; en route / channel safety.

Port state control inspectors (Direct): Shipboard inspections to verify compliance with IMO SOLAS and other regulatory requirements.

Regulatory policy-makers (Direct): Flag State maritime policy development.

Ship design organisations (Direct): Design responsibility for new ship construction, refit and repair.

Ship managers (Direct): ISM compliance; pollution avoidance; crew training; training in routine operations and emergency preparedness.

Ship owners (Direct): Selection of ship design options; safety design optimisation; seeking exemptions or equivalencies to prescriptive regulations; negotiation with yards; negotiations with insurance companies; safety system investments; support for safety management investigations/decisions; ISM compliance; pollution avoidance; crew training; training in routine operations and emergency preparedness; accident and incident investigation; database structuring and incident/accident information collection and archiving; ship repair and maintenance; ship provisioning.

VTS / VTMS organisations (Direct): Provision and operation of vessel traffic information and management systems in ports and confined waterways.

Figure 4 Primary stakeholders and roles
3.2.3 Stakeholder Requirements

Apart from the specific functional requirements for stakeholders, which will be discussed in detail below, the following general requirements have been developed, based on the list of principal stakeholders in Figure 4:

a) Applicable at the overall task and/or sub-task level;
b) Suitable for predictive and reactive task analysis;
c) Able to accommodate safety critical tasks in both the high workload and ‘dormant’ states;
d) Minimal ambiguity associated with input parameters;
e) Clear user guide manual;
f) Repeatability;
g) Provide both qualitative and quantitative outputs;
h) Simple, robust and rapid version needed for use on board.

For stakeholders in category (a), the anticipated specific functional requirements are defined below in Figure 5:
Stakeholder Requirements: Definitions

Objectives: What are the key roles and objectives of this stakeholder within the maritime community?

Relevance: As a possible user of the tool, how relevant is cognitive workload to the key objectives of this stakeholder? Rated H/M/L.

User: Is this stakeholder a primary (P, will use it), secondary (S, will be directly affected by use) or tertiary (T, will only be indirectly affected by use) user? P scores high, T scores low.

Application: Will this stakeholder use the tool primarily in acute, demanding situations (Tactical); for routine operational work planning on board (Strategic); or as a long-term planning aid (Policy-setting)? T scores high, P scores low.

Practicalities: Is use of the tool by this stakeholder likely to be complex (C), middling (M) or simple (S) in terms of the required expertise, input data and time to use the tool? C scores high, S scores low.

Importance: Summation of the relevance, user class, application and practicality scores, on a scale of 1 (lowest) to 9 (maximum).

Specific Needs: Brief description of stakeholder needs in relation to the cognitive workload of the OOW.

Functional requirements: Specific functional requirements of the workload assessment tool, given this stakeholder’s specific needs (see also the list of common requirements).

Figure 5 Stakeholder Requirements - Definitions
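As a hypothetical illustration, the Importance summation defined in Figure 5 can be sketched in code. The report does not state the numeric values behind the H/M/L, P/S/T, Tactical/Strategic/Policy-setting and C/M/S ratings, so the 0–2 mappings below are assumptions chosen so that four summed ratings span the stated 1–9 scale:

```python
# Hypothetical scoring sketch for the "Importance" rating defined in Figure 5.
# The report states only that Importance is the summation of the relevance,
# user-class, application and practicality scores on a scale of 1 (lowest)
# to 9 (maximum); the 0-2 numeric mappings below are assumed for illustration.

RELEVANCE    = {"L": 0, "M": 1, "H": 2}   # relevance of CMWL, rated H/M/L
USER         = {"T": 0, "S": 1, "P": 2}   # P scores high, T scores low
APPLICATION  = {"P": 0, "S": 1, "T": 2}   # Tactical high, Policy-setting low
PRACTICALITY = {"S": 0, "M": 1, "C": 2}   # C scores high, S scores low

def importance(relevance, user, application, practicality):
    """Sum the four ratings and shift the 0-8 raw total onto the 1-9 scale."""
    total = (RELEVANCE[relevance] + USER[user]
             + APPLICATION[application] + PRACTICALITY[practicality])
    return total + 1

# Example: a pilot -- high relevance, primary user, tactical use, simple tool.
print(importance("H", "P", "T", "S"))  # → 7
```

Any such mapping preserves the ordering of the ratings; the specific numbers would need to be confirmed against the full analysis in Appendix 3.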
3.2.4 Conclusions on stakeholder requirements The analysis of stakeholder requirements was carried out by the member of the project team with extensive experience in the marine industry, and this has provided valuable insights into the potential needs of the marine stakeholders. This information was also an important input to the design specification for the CMWL tool, which will be discussed in Section 4.
3.3 Analysis of safety critical tasks
An initial Task Inventory has been compiled as a starting point for identifying Safety Critical Tasks (SCTs). Detailed information regarding the Task Inventory is available in Appendix 4; a summary is provided in this section.

Only tasks undertaken on board by the ship’s normal complement were considered. There are three departments: Deck, Engine and Hotel. Only the Deck department was considered in detail. The ship’s master is part of the deck department and has overall command and responsibility for all activities on board. Tasks are split into watch-keeping duties and other (off-watch) activities. The focus is primarily on watch-keeping activities, with off-watch activities only being considered to the extent that they overlap with or impinge on watch-keeping tasks. In addition, from an accident prevention perspective, only tasks undertaken prior to an event are considered, i.e. the study does not deal with emergency response activities by those on board after an accident (but does include emergency preparedness and actions taken / not taken to avert an imminent accident, e.g. collision avoidance). Tasks include the following:

1.1 Deck department:
a) Navigation and control of the ship (passage planning, conning, external communications)
b) Manoeuvring (including berthing, un-berthing, mooring, anchoring)
c) Control of seaworthiness (ballasting, stability, watertight integrity)
d) Management of cargo operations (loading, unloading, stowage, securing)
e) Maintenance of hull structure and fittings
f) Emergency preparedness (maintenance, training and drills with life-saving and fire-fighting appliances)
1.2 Engine department:
a) Control of main and auxiliary machinery (propulsion, steering, power generation, utility systems, etc.)
b) Maintenance of machinery, systems and equipment on board, including communication systems
1.3 Hotel department:
a) Provision of food, drinking water and medical care for crew and passengers
b) Shipboard house-keeping
All departments also have administrative tasks (liaison with shore authorities, record-keeping, compliance with security, safety and commercial management requirements, etc). A more detailed breakdown of these tasks is set out in Appendix 4.
4. Choosing the Methodology

This section describes the process used to select the methodology.

4.1 Requirements for the CMWL assessment tool
In deciding upon a specific approach to developing the MCA CMWL assessment tool, the following sources of information were taken into account:
• The original tender specifications
• The results of the survey of research in CMWL and the types of techniques currently available to assess it
• The results of the survey of marine accidents involving CMWL issues
• The stakeholder analysis
• Preliminary results from the task inventory investigations
Each of these sources of information will now be summarised as the starting point for the particular approach that was chosen for the CMWL measurement tool.

4.1.1 Tender specifications

The project objectives were stated as follows:
• Review current research into safe maximum and minimum human cognitive workload capabilities.
• Identify the safe maximum and minimum human cognitive workload levels for the maritime industry and, if necessary, for different trades or conditions of work within it.
• Develop a robust tool that can effectively and efficiently assess human cognitive workload levels.
• Test the tool using examples of rosters/shift patterns from the maritime industry.

The suggested work programme in the Invitation to Tender also provides insights into the capabilities of the tool anticipated by the MCA:
• Identification of the maximum and minimum cognitive workload limits that seafarers can safely adhere to
• Development of a computer-based cognitive workload assessment tool that can analyse specific workloads and report whether they fall within the identified safe limits (guidance or instructions in its use should also be provided)
• Using this tool to assess a selection of ship crew rotas and duties to identify points of highest risk, and so identify safe manning levels and task allocations for different roles on board ships
4.1.2 Results of the survey of existing research and methods for measuring CMWL

A number of criteria for assessing the effectiveness of CMWL tools emerged from the workload literature survey. These criteria are mainly drawn from those used to evaluate tools and techniques used in psychological research. Although these criteria are important, they were given less weight than those derived from the specific requirements of the maritime industry when choosing a methodological approach to CMWL for this project. The criteria are as follows:

Objectivity: The degree to which the measurement tool is independent of the person administering the test and the person analysing the results (i.e., freedom from experimenter effects).

Reliability: The degree to which the measurement tool can be depended upon to provide results that are replicable over time and across analysis teams. It is generally accepted that the degree of homogeneity, consistency and stability of results obtained from independent experts (exercising their professional judgement) provides an acceptable test of measure reliability within the mental workload evaluation domain.

Validity: The degree to which the measurement tool accurately measures what it is intended to measure. The concern here is with the ways the method deals with the presence or absence of uncontrolled confounding variables that may influence results.

Sensitivity: The degree to which obtained measures are responsive to actual changes in task workload. Ideally, fine-grained changes in task demand should result in fine-grained changes in the results of the analysis.

Diagnosticity: The degree to which the measure is able to discriminate between different kinds, or different sources, of workload. For example, can the technique distinguish tasks which demand perceptual resources from those which influence task execution mechanisms?

Generalisability: The extent to which workload measures obtained in a particular setting (e.g., simulator studies) with a particular group of people (e.g., lookouts) generalise to a defined universe of situations (e.g., navigation, propulsion management) and/or population of employees.
Usability: The extent to which a technique can be used with accuracy, efficiency and satisfaction by its users. In this context, the term satisfaction is also used to provide an indication of the acceptability of the method to the stakeholder community.

4.1.3 Results of the survey of marine accidents involving CMWL issues

The review of maritime incidents indicated that the CMWL tool needs to be able to include the effects of factors other than those directly related to CMWL underload or overload. There were clear indications that CMWL variables interact with other PIFs to give rise to the observed accidents, particularly, in the underload situation, those related to fatigue. PIFs identified in the review, such as weather conditions, bridge automation, number of persons on watch and vessel type, are typical of the factors that need to be considered. The CMWL assessment tool therefore had to have the capability to include environmental factors of this type. We anticipate that a proposed extension of the project, involving a continuing programme of information gathering at the level of day-to-day shipping operations, will provide additional insights into this area (see discussion in Section 8).

4.1.4 Stakeholder analysis

The stakeholder analysis produced a wide range of information regarding the required capabilities of the tool. In particular, it is clear that the tool must be usable in several different modes for different application areas and different types of user. For example, for on-board applications by pilots or masters, the tool will need to be simple, rapid to use and require a minimum of expertise. For more detailed off-line use, e.g. for the determination of suitable manning levels or for the purposes of policy-making, the tool can be more comprehensive and detailed.

4.1.5 Preliminary results from the task inventory investigations

The results of the task analyses provided a number of insights into the requirements of the CMWL tool. The first is the requirement to be able to apply the tool at whatever level of task decomposition is required. The tool also needs to be applicable to tasks that are predominantly cognitive in nature, such as decision-making and problem solving, as well as action-orientated tasks, information acquisition and communications.

4.2 Determination of appropriate methodology
In deciding upon the basic methodology for the CMWL measurement tool, it became apparent that the conventional approach of defining a set of selection criteria, evaluating each alternative approach against the criteria and then choosing the method in the matrix that scores highest on all the criteria, was not appropriate for this project.

The requirement in the tender to ‘identify safe maximum and minimum human cognitive workload levels for the maritime industry…..’ implies that CMWL, which is essentially an internal state of a person or group, has to be related to some external measurable parameters such as error probability. ‘Safe’ in this context does not refer to the safe limits of the effects of workload on the well-being of the individuals that experience its extremes. Rather, it implies an acceptable level of risk that a negative outcome will occur (e.g. an accident resulting from an error arising from underload or overload). This requires a workload measure that can be mapped on to an externally verifiable dimension such as error or accident probability.

Another fundamental requirement arises from the potential use of the CMWL tool as part of a risk-based assessment process. The tool must be able both to measure and to predict expected workload and performance measures, and these predictions must be capable of being objectively validated by, for example, incident data collection. A final essential requirement is that the methodology is able to model the effects on CMWL of the PIFs that are identified by mariners from their own experience as being significant drivers of workload.

Subjective methods: These methods are attractive from the point of view of their simplicity, ease of application and face validity. However, because of the wide variability within individuals in their experience of loading even under identical conditions, this group of techniques was ruled out as the basis for the measurement tool itself. Nevertheless, subjective measures are attractive as one of several methods for evaluating the predictions of workload measurement techniques in situations, such as near misses, where no observable effect of the workload, such as an actual error, has occurred. It is widely recognised that identifying these situations provides an indication of potential risks, where interventions should be considered in order to avoid an actual human error leading to a severe outcome. It would therefore be useful to record these situations, where they were associated with extremes of CMWL, together with a measurement of the factors included in the workload prediction tool. This would be part of a longer-term programme of work to validate the predictions of the tool, based on diary studies, which is described in Appendix 7.

Task performance methods: These methods essentially evaluate loading levels as a function of decrements in primary or secondary task performance. Although, as mentioned above, we believe that workload measurements need to be linked to error rates from the point of view of risk management, this is not the same as inferring CMWL from error rates in either primary or secondary tasks. This is because CMWL is only one contributor to error rates, and the size of its contribution is likely to vary depending on task type and other factors. These methods are more suited to laboratory research, where the resistance that would occur to performing additional tasks in operational situations would be less of a problem.
Physiological methods: Similar considerations apply to these techniques to the previous group. Although they could be used in a laboratory situation as another variable for validating the results of other techniques, their application in shipping operations would appear to be impractical, because of the requirement to monitor physiological variables using complex and sensitive equipment. Nevertheless, they may have applications in laboratory settings when validating the predictions of other techniques using a process of triangulation. Task loading methods: These methods, which involve an assessment of the demands arising from a task on the basis of a task analysis and the enumeration of the cognitive components of the tasks and sub-tasks, have a number of attractions. They involve an auditable process of evaluating the cognitive ‘primitives’ that make up tasks. If carried out with care, these evaluations can be highly consistent between assessors. However there are concerns about whether or not the taxonomies of cognitive task elements used to make these assessments have a strong basis in theory or experimentation, and the extent to which their components can be reliably assessed with regard to their individual contribution to workload. Nevertheless, they are intuitively appealing and hence there is a case for including them in the proposed CMWL methodology. Influence Diagram: This technique provides a modeling environment which allows the insights arising from theory based approaches, together with the knowledge and experience of mariners, to be combined in a form suitable for assessment. They also provide a method for allowing workload assessment to be linked to externally measurable data such as incident reports to allow verification of the predictions of the tool. 
Based upon the considerations discussed earlier in this section, the Influence Diagram methodology was chosen to develop the workload models used as the basis for the computer-based CMWL assessment tool. The following sections set out the way in which the methodology was developed and applied.
5. Methodology
Using the insights gained from the research conducted earlier in the project, such as the analysis of shipping incidents and the review of the mental workload literature, a framework model of the factors influencing CMWL was developed which expressed workload as a function of measurable characteristics of tasks, personnel and the context within which marine operations are performed. Separate models were developed for underload and overload situations, since, based on the research literature considered in the survey, the underlying processes that give rise to these states are very different. In order to populate the model, data were drawn from the research literature and from structured workshops with subject matter experts from the marine industry.

The framework model, including the factors influencing CMWL, was expressed in the form of a family of Influence Diagrams, using an already available Influence Diagram modelling software tool called IDEAS. A detailed description of Influence Diagrams is provided in Appendix 6. This framework model was used as the starting point (or seed model) for a series of interactive sessions with subject matter experts from the commercial marine industry, including Masters, First and Second Officers, and recently trained personnel. These sessions were used to develop workload models for different domains of shipping activity, from high speed ferries and coasters to deep sea operations such as tankers. Both overload and underload models were developed. These models were then validated by using them to assess scenarios known to the workshop participants.

The main benefit of this approach was that it was pragmatic and evidence based, and did not depend on the accuracy of specific theoretical approaches in order to produce a workable CMWL tool. The use of the IDEAS modelling tool allowed the project to build on an existing, tried and tested modelling environment that has been successfully utilised in a wide range of applications over the past 10 years.
This approach also allowed more of the software resources to be utilized for the development of the interface and user facilities.
6. Summary of procedure for developing an Influence Diagram (ID) Model
6.1 Developing the ID model
The first part of the ID procedure is the development of a seed model. This model is developed separately, without the input of the consensus group, and is based upon factors that are known to cause feelings of stress or pressure in the Officer of the Watch (OOW). These factors were mainly obtained from the review of shipping accident databases and the MAIB’s Bridge Watchkeeping Safety Study, but some insights from the theoretical literature were also included. The seed model used for these interactive sessions included eight primary factors and twenty-one subfactors. The initial seed model is shown below in Figure 6, and discussed in more detail in Appendix 6.

There are three main reasons for developing a seed model:
1. The seed model helps to establish the goal of the Influence Diagram by demonstrating how these factors contribute to overload. For example, as the number of distractions increases, the OOW might start to feel increasingly under pressure or overloaded. In contrast, as visibility improves, this might help to decrease feelings of loading.
2. The Influence Diagram process can be somewhat difficult to grasp at first, as it requires a different, less traditional way of thinking about the problem area. The seed model helps to demonstrate the conventions of Influence Diagram modelling.
3. Because IDs require an alternative way of thinking about scenarios, it can be difficult to stimulate thought and discussion when starting with a blank ID. By showing the group a seed model, the group is encouraged to discuss the factors already present, and to expand upon them or delete them as appropriate.

For many participants, the ID session requires a different way of thinking about the problem area, and so it is useful to begin with a general discussion about the factors that influence workload. The group is then shown the seed model, and the participants are invited to discuss the model and comment upon the factors included.
The group is asked to discuss the seed model to determine whether all of the factors are relevant and whether they are in the correct place. Some factors can be expanded upon (by adding subfactors) to more clearly define how they impact upon workload. Other factors may not need to be decomposed any further, as they are self-explanatory. The group may also decide that some
factors should be removed from the model, as they do not have a significant impact upon cognitive workload. At this stage, the group is encouraged to add factors influencing workload from their own experience, to expand upon those included in the seed model.
Figure 6 Initial Overload (Seed) Model
6.2 Weighting the factors
In addition, the group is asked to consider the placement of the factors. Factors that are lower down in the diagram (i.e., subfactors) will have less influence on CMWL than factors at higher levels: the strength of a factor becomes more diluted towards the bottom of the tree. If the group considers a particular subfactor to be very influential, that factor may be moved further up the tree to become a primary factor, which will increase its strength and influence in the overall model.

The next step is to add weights to each of the factors in the model. Not all factors will influence workload to the same degree, and the differences between the strengths of the individual factors can be reflected by adding weights to the factors at each level of the tree. When all of the weights have been assessed, the IDEAS software can use the combined Influence Rankings for all the factors in the model to show which factors will have the highest impact upon the overall outcome if they are changed from their current ratings or states. This is a useful exercise, as displaying the Influence Rankings allows the group to see how the different factors will impact upon the overall outcome at the top of the diagram.
6.3 Rating the factors using a scenario
When the group is satisfied with the weights and position of the factors in the model, it should then be tested for accuracy by entering ratings (usually on a scale of 1 to 100) for a particular scenario. The group is asked to think of a real scenario that they have experienced in which they felt highly pressured and on the brink of becoming so overloaded that, had anything else happened, they probably would have made an error or an accident might have occurred. On the basis of all the ratings entered, combined with the weights, the Seafarer’s Loading Index (SLI) is calculated. This index reflects the overall loading of the individual for the scenario represented by the ratings and the weights. The SLI is on a scale from 0 to 1: an SLI of 0 indicates that all of the factors contributing to loading are at their best case possible (i.e., in an overload situation the individual is not suffering from overloading), whereas an SLI of 1 indicates that all of the factors are at their worst case possible and the individual is completely overloaded. It should be noted that “best case” may actually represent a moderate level of loading, rather than a minimum level. This is because a moderate level of loading would represent the level at which the probability of human error arising from cognitive loading was also at its minimum.
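The general shape of this calculation can be sketched in code. The sketch below is purely illustrative and is not the IDEAS or CLIMATE implementation: the factor names, weights and ratings are invented, and the combination rule (a weight-normalised average propagated up the tree, rescaled to 0-1) is an assumption about how a weighted Influence Diagram could be evaluated.

```python
# Illustrative sketch of an SLI-style calculation over a weighted factor
# tree. All names, weights and ratings are invented examples; this is
# NOT the actual IDEAS/CLIMATE algorithm.

def sli(node):
    """Return a 0-1 loading score for a factor tree node.

    A leaf is (name, weight, rating) with the rating on a 1-100 scale
    (100 = worst case). An internal node is (name, weight, children);
    its score is the weight-normalised average of its children's scores.
    """
    name, weight, payload = node
    if isinstance(payload, list):            # internal node: combine children
        total_w = sum(child[1] for child in payload)
        return sum(child[1] / total_w * sli(child) for child in payload)
    return payload / 100.0                   # leaf: rescale rating to 0-1

# A tiny two-level overload model (hypothetical weights and ratings).
model = ("Overload", 1.0, [
    ("Distractions",       3.0, 80),   # heavily weighted, rated near worst case
    ("Bridge manning",     2.0, 40),
    ("Visibility",         1.0, [      # subfactors: influence diluted by depth
        ("Natural light",  1.0, 30),
        ("Weather",        2.0, 60),
    ]),
])

print(round(sli(model), 3))  # → 0.617
```

Note that under this combination rule a subfactor's effective influence on the top-level outcome is the product of the normalised weights along its path, which matches the observation in the text that influence becomes diluted towards the bottom of the tree.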
6.4 Reviewing and updating the model
When the model has been completed (i.e., all factors, weights and ratings have been added) and the SLI has been calculated, the group is asked whether the resultant SLI reflects their expectations and experience of the overload situation. If the group thinks that the model does not realistically reflect their views, the model can be altered accordingly. For example, weights can be changed to increase or decrease the impact of certain factors, or factors can be moved further up or down the tree to strengthen or dilute their influence on overload.

It is useful to try out different “what if” scenarios to get a better idea of how accurately the model reflects the group’s beliefs. For example, if the group believes that in a highly stressful situation the addition of an extra qualified person on the bridge (perhaps as a lookout, or to deal with communications) would make a significant difference to the loading level experienced by the OOW, then this can be tested by improving the rating of the manning levels factor and examining the resultant SLI. By testing the model with different scenarios, the group will eventually come to an agreement on a model structure that best reflects their own beliefs, opinions and experiences about the factors that influence overload. These insights are combined with the results from the literature and from incident investigation data to produce a model that includes all the available knowledge relevant to the domain being addressed.
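A “what if” test of this kind amounts to re-evaluating the model with one factor rating changed. The sketch below is a minimal, self-contained illustration under assumed factor names, weights and ratings (it is not a real CLIMATE model); it treats the loading index as a simple weighted average of worst-case ratings.

```python
# Illustrative "what if" test: recompute a hypothetical loading index
# after improving one factor's rating. All names, weights and ratings
# are invented examples, not values from any actual CLIMATE model.

def loading_index(ratings, weights):
    """Weighted average of 1-100 worst-case ratings, rescaled to 0-1."""
    total = sum(weights.values())
    return sum(weights[f] * ratings[f] for f in ratings) / (total * 100.0)

weights = {"manning": 2.0, "distractions": 3.0, "visibility": 1.0}
baseline = {"manning": 90, "distractions": 70, "visibility": 50}

# What if an extra qualified person joins the bridge? Model this as an
# improved (lower) manning rating and compare the resulting index.
what_if = dict(baseline, manning=40)

print(round(loading_index(baseline, weights), 3))  # baseline loading
print(round(loading_index(what_if, weights), 3))   # with extra crew member
```

Comparing the two outputs shows the group how sensitive the overall index is to the manning factor given its current weight; if the change seems too small or too large relative to their experience, the weight can be adjusted.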
7. Summary of results from the ID sessions

Four Influence Diagram (ID) sessions were conducted with groups of officers from three British shipping companies. The officers ranged in rank from master to third officer. Four separate overload models and two separate underload models were created during these sessions. The models are discussed briefly in this section, and in more detail in Appendices 6 and 8.

There are a number of factors and subfactors that are common across all of the overload models developed with the separate groups of seafarers. These factors are listed in Figure 7 below and are described in detail in Appendix 8.

• Quality of bridge automation
• Clarity of roles & responsibilities
• Primary task characteristics
• Communication
• Bridge manning levels
• Telephone/radio/other distractions
• Severity of perceived consequences
• Onboard relationships/other crew members
• Bridge crew competence
• Complexity of the task
• Environmental conditions
• Length of time on watch
• Distractions
• Disruption to watch patterns
• Degree of restricted visibility
• Quality of rest periods
• Fatigue
• Responsibility to additional duties
• Training
• Level of natural light
• Experience
• Weather conditions
• Flexibility of primary task
• Administrative tasks

Figure 7 List of common overload factors
As can be seen from Figure 6 (the seed model), these factors are all present in the seed model, and thus it could be argued that this is the only reason they are common to all four completed overload models. However, it is worth noting that some of the factors in the seed model have been removed by different groups, for different reasons, and this indicates that each of the groups did spend time discussing the factors in the seed model, before deciding whether to keep these factors or not. As mentioned previously, the factors in the seed model and in Figure 7 are commonly known to influence workload, regardless of the type of vessel or the situation under discussion. This is likely to account for their presence across all the overload models created during the Influence Diagram sessions.
Although there are twenty-four factors and subfactors that appear in each of the overload models, there are differences in how these factors have been weighted by the individual groups. A number of factors have been similarly weighted by each of the individual groups, meaning that, despite differences in vessel types and operations, these factors are largely seen to have the same effect upon the workload of the bridge crew members. These factors are:

• Bridge manning levels
• Fatigue
• Degree of restricted visibility
• Training (of bridge crew)
• Experience (of bridge crew)
• Communication
• Telephone / radio / other distractions
• Complexity of the task
• Disruptions to watch patterns
• Quality of rest periods
• Responsibility to additional duties
The disparities between the remainder of the factors may be attributable to the subjective opinions of the individual group members, but may also be due to the different trades of the individual groups (i.e., tankers, containers, fast ferries and coastal vessels). The differences in trades mean that the groups may have different priorities and goals, and thus the same factor might be of utmost importance to one group, but of relatively little importance to another.

There are a number of factors that appear in some, but not all, of the overload models, and these are discussed in detail in Appendix 6. There are several possible reasons for this:
• In some cases, the group may have renamed or combined a factor with another factor to clarify its meaning and impact upon workload.
• Some factors are particular to the type of trade the ship and its crew members are engaged in.
• Some factors may have been discussed in more detail with one group than with the other groups.

The very nature of these interactive group sessions means that the model is shaped by the thoughts and feelings of the group on the day, as well as by their more universal opinions on the causes of workload. The facilitator must be careful to control the conversation to ensure it remains within the scope of the Influence Diagram session, whilst at the same time giving the group enough
independence to ensure they are not guided or influenced by the facilitator’s own thoughts or opinions. Thus, some areas may have been investigated more thoroughly in one group than in the others. As mentioned previously, many factors and subfactors feature in one or two, but not all, of the overload models. The main reason for this may be that one group explored a particular factor or subfactor in more detail than the other groups. Some of these factors nevertheless apply to all bridge crew members, regardless of shipping trade or the type of vessel they serve on. A composite model was therefore developed in an attempt to capture all of this knowledge and combine it in a single overload model that is generic enough to be used across all shipping trades, and yet specific enough to include the opinions of all the officers in the Influence Diagram sessions. A composite model was also created for the underload domain. These models, and a detailed description of their development and validation, are included in Appendix 6.
8. Conclusions and Future Research Directions

This project has achieved its major objectives, including the development of a practical, evidence-based methodology for assessing CMWL that can be applied both proactively and retrospectively within the shipping industry and by regulators, accident investigators and other stakeholders (objective 2.3 in the original project specification). The CLIMATE computer program provides a simple-to-use but powerful tool for assessing CMWL without requiring a deep knowledge of workload theory. The process used to develop the underlying models that drive CLIMATE was able to incorporate all of the available sources of knowledge on CMWL, both theoretical and practical.

It was gratifying that the workshops used to develop the CLIMATE models were perceived by the participants as practical, useful and enjoyable. In most cases, the participants felt that this was a very practical line of research and were surprised that it had not been attempted before. This gives us confidence that the methodology and the resulting CLIMATE software will achieve a high degree of acceptability and application in the shipping industry.

The project has provided a comprehensive review of the research literature on CMWL, with particular reference to the tools available for its measurement, in accordance with objective 2.1 of the specification. Using the CLIMATE tool, we have obtained indicative measurements of levels of maximum and minimum acceptable CMWL to satisfy the requirement set out in objective 2.2 of the specification. In order to achieve objective 2.4 of the specification, we conducted a number of tests of the CLIMATE tool by using it to evaluate scenarios, provided by the workshop participants, where underload or overload was expected to occur. These scenarios included measurements of shift patterns and rosters which
might impact on fatigue and therefore on the level of CMWL. The results of these evaluations showed that the predictions of the CLIMATE tool, based on the CMWL models that it contains, were generally in accord with the perceptions of workload experienced by the participants in these scenarios. Although this test of the models’ predictive capability is based on a small sample size, part of the work carried out in the latter part of the project has been concerned with the development of a framework for a more comprehensive testing programme. Appendix 5 provides an example of a letter sent to shipping companies in order to set up the necessary data collection processes.
8.1 Proposed future research
There are several future areas of work that would enable the MCA to achieve an additional return on its research investment in the current project. These areas are discussed in detail below.

• A continuation of the programme of workshops to develop a set of more tailored models for particular shipping sectors.
Although members of the research team had prior experience with Influence Diagram workshops, this project required the development of much more complex models than had been covered in our previous modelling work. There was also a requirement to build up protocols to manage the workshops in the most efficient manner, in addition to extending the capabilities of the existing IDEAS modelling software tool. Whilst we believe that we have constructed a robust generic model of CMWL, our research showed that there are significant differences between trades and ship types in the factors influencing CMWL in each category. One of the areas in the proposed future research programme would therefore be to fine-tune the results to assess more accurately the predicted workload in specific sectors.

• More fully address underload and issues relating to changes from underload to overload.
All of the workshop sessions included the exploration of both underload and overload. However, although we believe that we produced a valid model of underload, there were socio-technical factors present that could have reduced the accuracy of this model. Participants were perhaps unwilling to admit that they had experienced periods of underload that could have impaired their performance, as this might be seen as negligence or a lack of sufficient motivation to overcome these conditions. Conversely, most participants were quite willing to admit that they had frequently experienced and successfully coped with overload, as this would be seen as affirming their capabilities and professionalism. These considerations suggest that this area requires more extensive exploration in order to overcome these possible biases. Discussions concerning the transition from underload to overload produced only limited responses from the participants, even when specific incidents involving this issue were cited. Despite these difficulties, we believe that underload and transition situations definitely need to be explored further, and we propose that this is included in a follow-up programme of work.

• Exploration of the relationship between the predicted workload and the likelihood of errors leading to particular types of high risk accidents.
The ability to map the CMWL dimension on to an error likelihood scale is a major benefit of the CLIMATE methodology that was not included in the original project specification. This capability needs to be developed further from both a modelling and application perspective. For example, the
workload models used by CLIMATE focussed on the factors necessary to predict CMWL. It is possible that certain categories of tasks need to consider additional error causation factors beyond those included in the workload models. It would also be useful if the models were extended to predict particular types of high consequence accidents with greater accuracy. This links with a more general issue regarding the different applications of CLIMATE, e.g. prediction for the purposes of risk management, such as in the Formal Safety Assessment (FSA) domain, or the generation of proactive guidance in areas such as manning levels. These issues could be addressed in a future project specifically aimed at accident prediction and error management. This work might be appropriate as a joint project between the MCA and the MAIB.

• Full implementation of the Diary Studies to enable the workload predictions of existing and future models to be validated, and to incorporate a wider range of factors from field experience.
The validation of the predictions of the model using the proposed onboard Diary study is one of the highest priorities for a future research programme. Having set up the protocols to carry out such a study by the end of this project, we need to carry out a more extended data collection effort. In order to benefit from the momentum of the current project and the positive feedback from the shipping company participants, it is desirable to implement this exercise as soon as possible. There are many potential collateral benefits to both the MCA and the shipping companies, in terms of being able to make decisions in areas such as staffing levels, training and procedures based on factual evidence and data collected by the Diary studies. These benefits would be in addition to achieving external validation of the CLIMATE tool. We believe that it would therefore be useful for the MCA and the shipping companies to discuss joint short-term funding to implement this programme until more long-term funding becomes available.

• A programme of work with shipping companies and other potential end users of the tool, to verify its practicality in use, and to develop insights into how it should be modified and extended to address needs and applications not anticipated in the existing research programme.
Although the tool has received some preliminary use within the MCA, a comprehensive study of the potential applications of CLIMATE by a range of potential users is desirable. This would include the provision of training in the CLIMATE assessment process, followed by a programme of trial application by a range of end users to refine the tool so that it addresses their needs more effectively. The immediate potential end users include the shipping companies and the MAIB, which could find an immediate application for the tool in assessing the contribution of CMWL factors to accident causation.

• Application of the existing methodology to develop a Voyage Simulator that would allow dynamic predictions of workload to be made during an entire voyage, so that appropriate corrective actions could be applied in the event of a predicted problem of severe underload or overload.

This topic area is a speculative, but interesting, extension of the present project. The CLIMATE models are currently static, in that they essentially measure workload at a particular point in time based on the description of a scenario as represented by the ratings of the workload influencing factors. However, many of these factors, such as weather conditions, fatigue arising from disturbed rest periods, manning level variations due to sickness, and traffic density, will vary dynamically during the course of the voyage. A Voyage Simulator would be able to represent these variations by means of a dynamic simulation of the variable factors over the entire voyage, or over other time intervals as required.

In addition to producing better predictions concerning the stages of a voyage where workload might reach unacceptable levels, such an approach would also provide the capability to manage these risks dynamically during the voyage. For example, if weather predictions fed into the model indicated that workload levels would rise to unacceptable levels during berthing operations, because of reduced manning levels arising from sickness, a decision might be made to stand off until weather conditions improved, or to bring engineering staff on to the bridge to assist in the berthing operation. Similarly, predictions of high levels of traffic in a particular sea lane, combined with a forecast of severe weather conditions, could lead to a decision to take an alternative route in order to avoid unacceptable loading levels and a high probability of errors and accidents. The extension of the model to produce such a simulator, which could predict and monitor future problems en route and help formulate improved safety strategies, would seem attractive, particularly as it would require a relatively simple extension to the existing methodology.
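The core of the Voyage Simulator idea, evaluating a static workload model at successive voyage stages and flagging those where loading would exceed an acceptable limit, can be sketched as follows. Everything here is an invented illustration: the factor names, weights, forecast ratings and the acceptability threshold are assumptions, not outputs of any actual CLIMATE model.

```python
# Speculative sketch of the proposed Voyage Simulator: evaluate a
# (hypothetical) static workload index at each stage of a voyage and
# flag stages where it crosses an assumed acceptability threshold.

WEIGHTS = {"traffic": 3.0, "weather": 2.0, "fatigue": 2.0, "manning": 3.0}
THRESHOLD = 0.7  # assumed upper limit of acceptable loading

def loading(ratings):
    """Weighted average of 1-100 worst-case ratings, rescaled to 0-1."""
    total = sum(WEIGHTS.values())
    return sum(WEIGHTS[f] * ratings[f] for f in ratings) / (total * 100.0)

# Forecast factor ratings for each stage of a made-up voyage.
voyage = [
    ("open sea", {"traffic": 20, "weather": 40, "fatigue": 30, "manning": 30}),
    ("sea lane", {"traffic": 85, "weather": 70, "fatigue": 55, "manning": 60}),
    ("berthing", {"traffic": 60, "weather": 80, "fatigue": 70, "manning": 90}),
]

for stage, ratings in voyage:
    score = loading(ratings)
    flag = "REVIEW" if score > THRESHOLD else "ok"
    print(f"{stage:9s} {score:.3f} {flag}")
```

In this invented example the berthing stage is flagged for review, which is the kind of output that could prompt the mitigations discussed above, such as standing off until conditions improve or bringing extra staff to the bridge.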
Appendix 1 – Approaches to Workload Assessment: A review of the literature
1. Introduction
This Appendix examines approaches to the assessment of cognitive workload in real-world situations. In particular, background issues, theoretical approaches, measurement methods and practical tools are each reviewed to provide a frame of reference and suitable starting point for the development of a computer-based assessment tool tailored to meet the specific needs of the maritime industry.
2. Topic Overview
The concept of cognitive workload can be defined in very general terms as the amount of mental resource a person needs to utilise to perform a particular task (or range of tasks) in a given environment. This definition implies that people have a limited amount of cognitive resource available, and that mismatches between environmental demands and the availability of cognitive resources are a common cause of human error in work systems. Whilst there is strong evidence to suggest that workload levels are indeed related to the occurrence of human error, research has also shown that the link between external demand and available internal resource is not straightforward. For example, whilst high levels of mental workload tend to give rise to “cognitive strain”, a condition in which the human information processing system is unable to cope with the large amounts of environmental stimuli competing for limited attentional resources, a reciprocal effect obtains when environmental demand is low. In this situation, people tend to become less vigilant, and attention begins to wander, increasing the probability of missing important information available for selection in the stimulus array.

The tendency for the quality of task performance to be hindered by strain (i.e., excessive demand) or boredom (i.e., insufficient demand) can be characterised with reference to the Yerkes-Dodson law (Yerkes and Dodson, 1908). This law proposes that optimal task performance occurs at an intermediate level of mental arousal (i.e., workload), with much poorer performance resulting from lower and higher arousal levels. When plotted on a graph, the Yerkes-Dodson law predicts an inverted-U relationship between arousal and performance quality. Unfortunately, the precise shape of the inverted-U function has been found to vary with changes to the nature of the task.
Thus, when tasks are easier, optimal performance tends to occur at much lower levels of arousal than would be predicted by the ideal Yerkes-Dodson curve. Conversely, higher levels of arousal are needed for effective performance of more difficult tasks. A further complication in developing a better understanding of cognitive workload arises from the finding that arousal/performance levels are extremely fluid. Consequently, people’s reactions to task demands
can vary widely within a given population and can even be different for the same person performing identical tasks on two separate occasions.

2.1 Workload and error

The complexity of the relationship between workload, task performance and task load can be illustrated with reference to the debate in which a number of investigators have aimed to answer the question “How much workload is too much?” (e.g., de Waard, 1996; Meijman and O’Hanlon, 1984; Teigen, 1994). To answer this question, investigators have found it useful to divide the Yerkes-Dodson inverted-U function into six task performance-related regions, as shown in Figure 8.
Figure 8 Task performance and workload as a function of demand (Adapted from de Waard, 1996)
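The six-region structure of Figure 8 can be sketched as a simple classifier. The demand scale and the region boundaries below are illustrative assumptions chosen for the example only; they are not values taken from de Waard (1996).

```python
# Illustrative classifier for the six task-performance regions of the
# inverted-U model (Figure 8). Demand is on an arbitrary 0-100 scale and
# the region boundaries are assumed for illustration only.

REGIONS = [
    (10,  "D",  "inattentive state; errors of omission prevalent"),
    (25,  "A1", "low demand; performance maintained only by vigilance"),
    (60,  "A2", "optimal region; operator copes easily"),
    (75,  "A3", "raised demand; performance held by extra cognitive effort"),
    (90,  "B",  "demand exceeds capacity; errors increasingly common"),
    (101, "C",  "overload; risk of losing control of the situation"),
]

def region(demand):
    """Map a 0-100 demand level to its performance-region label."""
    for upper, label, _description in REGIONS:
        if demand < upper:
            return label
    raise ValueError("demand must be in the range 0-100")

print(region(40))  # → A2 (mid-range demand falls in the optimal region)
```

A classifier of this kind makes the qualitative point of the sections that follow concrete: errors arise at both ends of the scale, but for different reasons (omissions in region D, capacity exceedance in regions B and C).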
2.1.1 Optimal Performance (A2)

When task demand is in the A2 region, the human operator is able to cope easily with the workload and performance remains at levels approaching optimal. Moderately increased demands do not lead to a significant increase in cognitive strain, and extrinsic factors do not unduly affect performance. If errors do occur, they tend to arise from factors other than those associated with task demand or cognitive workload.

2.1.2 Increased Workload Demand (A3-B-C)

When task demands increase to levels within the A3 region, measures do not typically show any noticeable decline in performance despite increased task
loading. The operator is only able to maintain adequate task performance levels, however, by increasing cognitive effort, or, depending upon the theoretical preferences of the analyst, by allocating more mental resources to processing activity. Limited amounts of time spent working in the A3 region are not thought to be detrimental to the well-being of the human operator and fall within normal operating parameters. Extended periods of time spent in the A3 region, as might occur when peak loads are frequently experienced in the work environment, can lead to the emergence of potentially harmful stress effects, which are to be avoided wherever possible (e.g., Mulders et al, 1988). The likelihood of a person experiencing stress is greater when the operator has no control over the work conditions giving rise to heightened workload.

With further increases in demand, performance levels transition to region B. Here the quality of task performance begins to decline because workload demands start to exceed the operator’s capacity to cope. Again, dependent upon the type of workload model being used, this can either mean that a limited-capacity channel has become fully occupied with a particular processing activity, or that available cognitive resources are insufficient to meet the current task demand profile. Whichever is the case, in this region performance errors become increasingly commonplace as task demand increases. Individuals no longer have the mental resources needed to recover the situation without adopting coping strategies, which in some situations could involve reducing demand by jettisoning some of the work activities contributing to cognitive overload. Finally, when workload levels exceed the threshold for entry into region C, the operator is at risk of losing control of the situation due to high workload levels and may begin to experience extreme effects of psychological stress.
In region C, performance levels are at their lowest and can only be improved by reducing workload demands.

2.1.3 Decreased Task Demands (A1-D)

In contrast to the high workload scenario, reduced task demands cause performance to transition from the A2 region into region A1, in which the maintenance of performance relies heavily on increased operator vigilance. It should be noted that, contrary to popular belief, vigilance is a cognitive activity, which implies increased, rather than decreased, cognitive workload levels, for reasons that will be outlined in due course. Acceptable levels of vigilance can only be maintained in the A1 region by increased cognitive effort, although in this case the effort needs to be directed towards the maintenance of a vigilant state, keeping the cognitive system in a state of readiness for response, rather than towards incoming task demands as in the case of high workload situations (Caggiano & Parasuraman, 2004; Warm, et al, 1996). As long as the individual is able to resource the vigilant state, performance will remain in the A1 region and there will be no workload-related errors. However, if the person is unable to provide sufficient mental resources, task performance will transition into region D, an inattentive state. Once this
occurs, errors of omission become prevalent in human performance, as the human information processing system is unable to maintain arousal levels sufficient to ensure the registration of the target events whose identification is necessary to trigger the appropriate response. Interestingly, the transition from region A1 to D tends to occur below the level of conscious awareness, which means that the operator is largely unaware that they are no longer attending to the vigilance task.

2.2 Workload assessment and system safety

The division of the inverted U function into six 'task performance' regions provides a valuable qualitative classification useful for discussing cognitive workload in practical contexts. In particular, the classification provides a means of establishing when safe upper and lower limits of workload demand have been transgressed, an activity which has proved especially problematic for most workload assessment tools. Using the classification described above, error producing workload conditions arise when workload demands exceed the A2-B threshold. Conversely, the risk of vigilance errors becomes greatly increased when task demand falls below the threshold values which define the A2-A1 boundary. Unfortunately, the classification is unable to provide insight into how the analyst might define the performance boundaries in absolute terms using task performance measures from real world situations.

2.3 Background Summary

Despite all the problems associated with the concept of mental workload, it is widely recognised that there are significant benefits to be gained from the ability to make estimates of mental workload levels in particular situations. Possible uses for a workload assessment tool in hazardous environments are:

• To identify unsafe error producing conditions in normal or abnormal work situations
• To evaluate deployment of new equipment (i.e., automation)
• To assess manning levels in high-hazard environments
• To evaluate the impact of proposed changes to work methods
• As an aid to incident investigation
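The qualitative six-region classification described in section 2.2 can be illustrated with a minimal sketch. The numeric thresholds below are hypothetical placeholders, since the classification itself does not define the region boundaries in absolute terms:

```python
# Hypothetical illustration of the six-region inverted-U classification.
# Threshold values are placeholders: the qualitative model does not
# define region boundaries in absolute terms.

REGIONS = [
    (0.15, "D",  "inattentive state; errors of omission prevalent"),
    (0.30, "A1", "vigilance maintained only by increased cognitive effort"),
    (0.60, "A2", "normal operating region; stable, error-free performance"),
    (0.75, "A3", "performance held up by increased cognitive effort"),
    (0.90, "B",  "demand exceeds capacity; errors increasingly common"),
    (1.01, "C",  "loss of control; performance at its lowest"),
]

def classify(demand: float) -> str:
    """Map a normalised task-demand level (0..1) to a performance region."""
    for upper, region, _ in REGIONS:
        if demand < upper:
            return region
    return "C"

assert classify(0.45) == "A2"   # moderate demand: normal operation
assert classify(0.95) == "C"    # extreme demand: loss of control
```

Such a mapping only becomes useful in practice once the thresholds can be anchored to task performance measures from real situations, which, as noted above, the qualitative classification cannot itself supply.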
3. Workload Theory
Evolution of mental workload theory has been driven largely through experimental work and field studies conducted from a human information processing (HIP) standpoint. The HIP approach represents cognitive activity in terms of information flow diagrams similar to those used to depict information flow in computer systems and includes consideration of all processes studied within mainstream cognitive psychology, such as perception, attention, memory, decision-making and psychomotor responses. With regard to the analysis of mental workload, HIP models have proved particularly useful due to the special prominence they tend to give to the idea that effective task performance depends crucially on the availability of internal mental resources needed to power information processing sub-systems (e.g., Kahneman, 1973; Moray, 1967; Norman and Bobrow, 1975; Navon and Gopher, 1979; Wickens, 1984). When internal mental resources are available, it is anticipated that cognitive workload levels will be acceptable and task performance will be error free. When sufficient resources are unavailable, however, for example in situations where task demands are high, it is expected that the natural course of information processing will tend to become disrupted and task performance to become error prone (e.g., Reason, 1990). Three main types of cognitive model have been invoked to account for performance failures attributable to unsatisfactory mental workload levels. Each approach is considered in turn.

3.1 The Single Channel Hypothesis
Donald Broadbent (1958) proposed one of the first information processing theories suitable for the analysis of mental workload. Broadbent wanted to provide a theoretical account for the phenomenon of selective attention and was particularly interested in explaining performance failures among subjects who were asked to attend to multiple spoken messages in dichotic listening tasks. He developed a theory of attention according to which there is a single channel central processor with limited capacity, capable of selecting only one sensory input at a time for conscious processing, thus accounting for the finding that messages presented in an unattended sensory channel were frequently lost to the individual. The central processor was assumed to operate quite slowly and could only switch between input channels approximately twice a second. According to Broadbent, results from divided attention experiments suggested that the proposed switching mechanism acted like a filter, selecting some inputs for further processing and blocking others containing irrelevant data. It was further suggested that the filter could be 'set' to admit information on the basis of its broad physical characteristics, including location, pitch and intensity, amongst other things. Although Broadbent's preliminary model was very quickly shown to be wrong in certain respects (Anne Treisman (1969), for example, was able to provide evidence of pre-attentive processing affecting stimulus selection), the idea that human
cognition could be characterised by a single, limited capacity information processing channel became prevalent in cognitive science throughout most of the 1960s and early 1970s. The approach was used as an explanatory framework to account for a variety of interference effects and operator errors observed to occur during the performance of concurrent tasks (e.g., in high workload situations).

"We may conclude this discussion….by reiterating a basic theme …that the human operator may be usefully regarded as a single-channelled communication system, whose capacity for receiving, processing, storing and acting upon information is limited. This assumption is fundamental to our understanding of skill…[and error]." (Reason, 1970: p. 342)

The single channel hypothesis proved highly influential for researchers working in the field of industrial psychology because it provided important insights into the nature of the human operator for designers of workplace equipment. For example, the hypothesis implied that the capture of an operator's attention, say in an alarm handling situation, could be greatly improved by segregating the target signal from background information via manipulation of the physical properties of the stimulus (e.g., modifying the pitch/intensity of high priority audio alarms). Broadbent would have argued that such manipulations allow effective utilisation of the selective attention filter. The same general principle could be applied to the design of visual display terminals. In Treisman's version of the model, careful use of design features such as screen object colour, line orientation and motion could be picked up by pre-attentive processes and used to grab the operator's attention in human-computer interaction (Preece et al, 2004).
Whilst the single channel hypothesis found some success in the formulation of several robust design heuristics, each still relevant today, the approach proved much less adequate as a theory of human information processing. Moray (1967), for example, pointed out that the single channel hypothesis clearly failed to apply to all tasks requiring divided or selective attention. Specifically, he suggested, single channel models do not account for the performance of well-trained observers on divided attention tasks, or indeed, for cases where perceptual inputs are compatible with psychomotor outputs, as is the case in skilled performance. In both cases, individuals are clearly able to share resources between concurrent tasks. Using the language of information theory, Moray advanced the view that human cognition should be considered a limited capacity processor rather than being viewed as a system dominated by a fixed, limited capacity channel. He proposed that it is the limitation of the central processor that influences performance rather than any limitation affecting input channels. In this view, mental operations performed on perceptual inputs consume capacity (an aspect of attention not considered by Broadbent or Treisman), and capacity was believed to be divided among different "processors" when multitasking is required. With this argument Moray was anticipating a human information processing approach
that would later come to be called the "allocation of resources hypothesis" or resource theory.

3.2 Resource Theory and Mental Workload
Following the apparent failure of the single channel hypothesis to account for multitasking performance outcomes, from the mid-1970s onwards cognitive psychologists began to develop and refine cognitive theories which proposed the existence of pools of mental resources that could be made available to the human information processing system to be spent on task performance. Two different types of resource theory were developed to account for task performance (i.e., mental workload) effects. The first viewed processing capacity in terms of a single pool of resources to be distributed throughout the cognitive system in response to competing task demands. The second proposed the existence of multiple pools of resources, each dedicated to particular types of processing activity. In this latter view, the likelihood that performance of concurrent tasks would give rise to the experience of cognitive strain would largely depend on whether the tasks placed demand on the same resource pool.

3.2.1 Single Resource Models

Described in outline, single resource theory (SRT) assumes that cognitive resource allocation works simply in terms of supply and demand. Performance on one or more tasks is assumed to suffer whenever resource demands exceed the available supply (see Figure 9). In some resource models the total supply of mental resources is assumed to be fixed (e.g., Navon and Gopher, 1979). Most SRT models, however, have assumed the amount of total resource to be variable, increasing or diminishing contingent on the presence or absence of psychological stressors and other Performance Influencing Factors (e.g., Kahneman, 1973).
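The basic supply-and-demand prediction of SRT can be sketched as follows; the normalised resource units and the proportional-degradation rule are illustrative assumptions, not part of any published model:

```python
# Sketch of the single-resource prediction: performance on concurrent
# tasks suffers only once combined demand exceeds the available supply.
# All quantities are hypothetical, normalised resource units.

def predicted_performance(demands, supply=1.0):
    """Return a 0..1 performance index for each task under SRT."""
    total = sum(demands)
    if total <= supply:
        return [1.0] * len(demands)          # enough resource: no decrement
    share = supply / total                   # proportional shortfall
    return [share for _ in demands]

assert predicted_performance([0.4, 0.5]) == [1.0, 1.0]   # within capacity
overloaded = predicted_performance([0.8, 0.8])           # both tasks degrade
```

Note that a single-pool model of this kind predicts interference purely from total demand; as discussed below, this is precisely the prediction that similarity-based interference findings call into question.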
Figure 9 SRT predicted relation between resources, demands and task performance
One early influential SRT model with implications for the analysis of mental workload was the capacity theory of attention and effort proposed by Daniel Kahneman (1973). Kahneman argued that the cognitive system has a single pool of limited capacity generally available to resource task performance at any one time. Difficult tasks were assumed to require large amounts of resources, leading to possible problems when these tasks were paired with other concurrent tasks requiring mental capacity. Conversely, easy, well practiced and highly automated tasks were likely to be time-shared with relatively few task performance implications. In Kahneman's view, the total pool of resources available at any one time is variable, dependent upon a number of influencing factors. Factors believed likely to affect the availability of mental resources included the enduring dispositions and momentary intentions of the actor, as well as the person's general level of psychological arousal and the complexity of the task (see Figure 10). When faced with a difficult, demanding task, arousal levels can be increased in the cognitive system, providing additional resources to cover the extra demand. It was suggested that this adaptive reaction to mental workload could be monitored via the collection of physiological data indicative of increased autonomic nervous system activity1.
1 The relationship between levels of psychological arousal and increased autonomic nervous system activity is well documented and will be discussed later in the section dealing with workload assessment techniques.
Figure 10 Kahneman's Capacity Model of Attention (Source: DiDomenico, 2003: p.13)
The SRT approach to workload analysis was further developed by Norman and Bobrow (1975). They distinguished between the effects of resource-limited processes and data-limited processes in human cognition. Resource-limited processes are those where task performance improves as long as sufficient resources are allocated to meet task demand. For these tasks, the subject always has the opportunity to improve performance by adopting strategies that lead to the freeing up of mental capacity. Data-limited processes, on the other hand, are those where the performance limitation is embedded within the structure of the task, as occurs, for example, in gambling situations and tasks involving judgments of uncertainty. For these situations, further allocation of mental resources is unlikely to improve task performance, and other, data-driven methods must be adopted to improve the quality of human performance. It should be noted, however, that in some situations apparent data-limited processes may simply reflect ceiling effects of the task. Figure 11 shows a Performance Resource Function curve plotted to exemplify the effects of resource- and data-limited processes on task performance (cf: Norman and Bobrow, 1975).
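The shape of a Performance Resource Function can be sketched numerically; the data limit of 0.8 is an arbitrary illustrative value:

```python
# Sketch of a Performance-Resource Function (PRF): performance rises
# with invested resources until a data limit is reached, after which
# further resources yield no improvement (cf. Norman and Bobrow, 1975).
# The data limit of 0.8 is an arbitrary illustrative value.

def prf(resources: float, data_limit: float = 0.8) -> float:
    """Performance as a function of invested resources (both 0..1)."""
    return min(resources, data_limit)

# Resource-limited region: more resources, better performance.
assert prf(0.3) < prf(0.5)
# Data-limited region: extra resources no longer help.
assert prf(0.85) == prf(1.0) == 0.8
```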
Figure 11 Performance Resource Function (PRF) showing resource-limited and data-limited regions
In many respects single resource theory provides a much more coherent account of the relation between human cognition and task performance than was the case for the single channel hypothesis. In particular, it has proved well suited to accounting for the observation that people are able to perform concurrent tasks under certain circumstances, and the approach seemed to provide a strong scientific foundation for workload assessment through physiological measurement techniques. Some investigators (Beatty, 1982; Just and Carpenter, 1993) took this further by proposing that spare capacity available in the resource pool could be measured by giving subjects secondary tasks to perform, leading to the prospect of developing an index of the capacity required to perform a primary task. In this way, the capacity demands of a variety of tasks can be measured and used to predict dual task performance. Despite the promise of the approach, however, SRT approaches to workload assessment have proved problematic. Contrary to the predictions of the model, task interference and performance decrements are still frequently observed even for multitasking scenarios involving simple tasks. Indeed, the amount of interference seems more related to task similarity than to task difficulty. Thus, two concurrent visual processing tasks interfere more readily than a visual and an auditory one, even though the estimated extrinsic task demands are viewed as comparable. Kahneman explained this finding by suggesting that a degree of structural interference between simultaneous tasks is to be expected when both tasks require use of a non-sharable mechanism or resource. As a final comment on SRT, it is worth noting that Kahneman's model of attention and effort proposed, perhaps for the first time, that people can exercise a degree of volition over how mental resources are allocated to tasks and can thus influence their reaction to task loading.
The strategies that people can adopt to help them cope with high task demands are very important in workload research and will be considered in more detail below.
3.2.2 Multiple Resource Models

Multiple resource theory has been proposed as an alternative explanatory framework to single resource theory for research involving the assessment of mental workload. Investigators working from this standpoint have assumed that the human information processing system is a multiple channel processor (i.e., it has multiple structures), and that each processor, or group of processors, has its own internal capacity. In MRT approaches, mental resources are often seen as analogous to fuel that is consumed by various activities, or as a tank of liquid to be divided among several competing tasks (Wickens, 1984). In stressful conditions, or multitasking situations, the amount of resource may become depleted and give rise to interference effects. One important feature of this view is the idea that the impact of changes in task demand on mental resources may not be purely quantitative, but may be qualitative (i.e., structural) as well. One of the most influential examples of a task performance model developed from within a multiple resource theoretic perspective has been that developed by Christopher Wickens (e.g., 1984; 1991). Wickens was interested primarily in providing an account of the effects of task similarity on dual task performance and proposed a model of human cognition in which pools of mental resources can be differentiated based on three dichotomous dimensions:

• Stage of processing: Perceptual, Central Processing, Response
• Perceptual modality: Visual and auditory
• Processing code used to represent information: Verbal and spatial
He argued that each of these resource pools can be utilised either jointly or independently, according to the demands of the information processing components that comprise the task. Wickens' model predicts that efficient time-sharing performance should occur when two tasks use different values on each of the three dimensions (see Figure 12). For example, listening to a message is a perceptual stage task, involving a verbal code and the auditory processing modality. Conversely, orientating a crosshair cursor to a target on a visual display terminal using a computer joystick is a psychomotor stage activity, reliant upon the visual processing modality and spatial coding. Consequently, these two tasks should pose little or no problem when performed together. Figure 13 provides a diagrammatic representation of Wickens' (1984) human performance model to complement the tabular representation shown in Figure 12.
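The time-sharing prediction can be illustrated with a minimal sketch; the task descriptions and the simple overlap count are illustrative devices, not part of Wickens' formal model:

```python
# Sketch of the multiple-resource prediction: two tasks interfere to the
# extent that they share values on Wickens' three dimensions. The task
# descriptions and the overlap count are illustrative, not part of the
# formal model.

def overlap(task_a: dict, task_b: dict) -> int:
    """Count shared dimension values (0 = efficient time-sharing)."""
    return sum(task_a[d] == task_b[d] for d in ("stage", "modality", "code"))

listening = {"stage": "perceptual", "modality": "auditory", "code": "verbal"}
tracking  = {"stage": "response",   "modality": "visual",   "code": "spatial"}
reading   = {"stage": "perceptual", "modality": "visual",   "code": "verbal"}

assert overlap(listening, tracking) == 0  # differ on all three: little conflict
assert overlap(listening, reading) == 2   # shared stage and code: interference
```

The listening and tracking tasks above correspond to the message/joystick example in the text; adding a reading task shows how similarity on stage and code would predict greater interference.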
                                          PROCESSING STAGE
PROCESSING   PROCESSING
CODE         MODALITY     Perceiving                Central Processing       Psychomotor Response
Verbal       Visual       Print                     Memory Rehearsal;        Voice
                                                    Mental Arithmetic
Verbal       Auditory     Speech
Spatial      Visual       Analogue Quantities;      Mental Rotation;         Manually guided
                          Spatial Patterns          Imagining
Spatial      Auditory

Figure 12 Sources of mental resource relative to information processing stage, code and modality (e.g., Wickens, 1991)
Figure 13 Diagrammatic representation of Wickens Multiple Resource Theory (Source: DiDomenico, 2003: p.16)
According to the Wickens model, task interference (i.e., mental workload) effects arise when the available processor specific resources are shared among competing tasks. Again, the strong assumption is that task loading is greater for difficult tasks than for easy ones, but in addition, the approach also suggests that performance of two complex but highly similar activities will lead to performance problems due to the need to use the same resource for competing tasks. When two concurrent, interacting tasks need to be carried out, it is suggested that difficult performance trade-offs often need to be made. As a rule, cross modal time-sharing is better for coping with high workload than cases where dual task demand is intramodal. Conversely, when two tasks demand separate resources, time-sharing will become more efficient and changes in the difficulty of one task are less likely to influence
performance on the other. This is a similar proposition to the idea of data-limited processing discussed in the previous section, where ceiling performance effects can be observed. Of course, when this occurs, it also means that resources withdrawn from one task will not be available to be used to the advantage of the other.

3.3 Composite Models
Before leaving the issue of mental workload theory, it is perhaps worth mentioning a relatively new type of cognitive modelling activity that aims to use elements of the single channel hypothesis and resource theory to account for workload effects in human performance. These composite models are usually represented in the form of computer programs that are applied to tasks and used to predict various aspects of human performance. The ability of these models to reproduce workload effects has frequently been mentioned as a justification for their development, although the extent to which they have managed to do this has been a matter of some debate. Perhaps the most frequently mentioned composite model in the workload literature has been the Executive-Process Interactive Control (EPIC) model developed by Kieras and Meyer (1997). EPIC was developed to provide a comprehensive computational theory of multiple task performance based on current theory (SCH/SRT/MRT) and empirical data. Output from the model was intended to provide quantitative predictions of mental workload levels and performance effects useful in the practical design of systems, training and personnel selection. The model's architecture adheres to assumptions inherent in the stage theoretic models of human performance described in the previous section, although it essentially uses production system modelling techniques to emulate operation of the main cognitive information processor. The model assumes that there are several separate perceptual processors with distinct operating characteristics, represented mainly in the form of their assumed time-dependencies. Separate motor processors are encoded for vocal, manual and ocular movements. In the model, information processing limitations are due to limitations of structural resources (i.e., as in the single channel hypothesis) rather than being attributable to a lack of available mental resources as specified in resource theoretic models.
Thus, the central processor can fire any number of rules simultaneously, but since the peripheral sense organs have limited channel architecture, the overall system is subject to hard-wired processing channel limitations. EPIC also assumes a knowledge base comprising declarative/procedural knowledge that represents information held in permanent memory. Task procedures (procedural knowledge) are modelled using production system architecture. Declarative knowledge is represented in the form of a pre-loaded database of information detailing factual knowledge. The architecture for EPIC is summarised in Figure 14 below.
Figure 14 EPIC Architecture
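A toy sketch of the production-system control cycle described above; the rules and working-memory contents are invented, and only the parallel rule firing and the separation of cognitive and motor output are taken from the EPIC description:

```python
# Toy sketch of a production-system control cycle of the kind EPIC uses
# for its cognitive processor: on each cycle, every rule whose condition
# matches working memory may fire in parallel (no central rule
# bottleneck). The rules and memory contents are invented illustrations.

working_memory = {"tone-heard"}

rules = [
    ({"tone-heard"}, "press-button"),
    ({"light-seen"}, "press-pedal"),
]

def cycle(memory: set) -> list:
    """Fire all rules whose conditions are satisfied; return motor commands."""
    return [action for cond, action in rules if cond <= memory]

assert cycle(working_memory) == ["press-button"]
working_memory.add("light-seen")
assert sorted(cycle(working_memory)) == ["press-button", "press-pedal"]
```

In the full architecture the limitation lies not in this rule-firing step but in the peripheral perceptual and motor processors, which is why EPIC attributes interference to structural channel limits rather than to a shortage of central resources.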
Psychological theory expressed in the form of computational models of human cognition can provide a useful test of the mechanisms assumed to be implicated in the experience of workload. However, despite this, the EPIC model has only been used on a few occasions to predict workload variations in particular settings. For example, the model was applied to the simulated task of a telephone operator interacting with a workstation to assist a customer in completing a call. Kieras and Meyer (1997) reported that the model successfully predicted time taken on task, reproduced the type and timings of keystroke data, and predicted that the major limitation on operator speed was not typing time but the rate at which the customer spoke the telephone number. Early results such as these illustrate the potential of the approach for assisting with the redesign of technology. However, the example also shows that composite model based approaches need to be developed far more fully before they can be used routinely to predict task demand and workload variation in complex tasks.

3.4 Workload Theory: Comment
As should be clear from the foregoing discussion, the debate about the number and nature of mental resources and indeed discussions regarding
whether the term mental resource is a useful concept at all (Navon, 1984; Allport, 1993), have been extremely productive. Experimental studies have provided a rich seam of data giving insight into how, why and in what ways human information processing becomes disrupted under conditions of high workload. In very general terms, it would be reasonable to conclude that multiple resource theoretic models have proved particularly useful. If cognitive resources are of more than one kind, as it seems that they are, then workload assessments must measure each aspect of workload individually rather than trying to capture task load in a single, overall measure. Similarly, and we must also include work completed within the SCH and SRT traditions here, research has provided the basis for a number of useful design heuristics that should enable the development of usable technology, particularly for work situations where multitasking is a likely outcome of environmental circumstances. Despite the above benefits and the intuitive appeal of workload theory, there are a number of surprisingly robust interference effects that seem to defy explanation. For example, in one experimental paradigm explored by Pashler (1994), subjects were required to make a speedy response to a tone by pressing a button. At the same time, they were also required to make a rapid foot-press response to a visual signal (e.g., a light on a panel) that occurred shortly after the tone. When the interval between the tone and the light was small, the foot-pressing response was delayed by several hundred milliseconds. This period, where the second task is disrupted due to interference from a prior task, has become known as the psychological refractory period. Interestingly, the magnitude of the interference gets smaller and eventually disappears altogether as the interval between the two tasks is lengthened.
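The refractory-period timing pattern just described can be reproduced by a simple waiting-time model, sketched below under the assumption of a single mechanism that handles one response selection at a time; the stage durations are invented illustrative values:

```python
# Sketch of a response-selection bottleneck account of the psychological
# refractory period: the second task's response selection must wait until
# the bottleneck is free. Stage durations (ms) are invented illustrative
# values, not experimental data.

PERCEPTION, SELECTION, EXECUTION = 100, 150, 80   # hypothetical stage times

def rt2(soa: float) -> float:
    """Reaction time to task 2 as a function of stimulus onset asynchrony."""
    bottleneck_free = PERCEPTION + SELECTION      # task 1 releases selection
    start = max(soa + PERCEPTION, bottleneck_free)
    return start + SELECTION + EXECUTION - soa

# Short SOA: task 2 is delayed by the busy bottleneck.
assert rt2(50) > rt2(400)
# Long SOA: the refractory effect disappears; RT2 reaches its baseline.
assert rt2(400) == PERCEPTION + SELECTION + EXECUTION
```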
Pashler argued that observation of a refractory effect is indicative of a structural bottleneck in processing, such that two tasks occurring in close temporal proximity, but using different cognitive processes and information processing modes, are competing for access to a limited capacity mechanism. The task arriving second must wait for this mechanism to become free and re-energised. Contrary to Broadbent, Pashler proposed that the processing bottleneck occurs late in the processing chain, just prior to response selection.

3.5 Summary
The main conclusions of the review of cognitive theory are as follows. The human information processing system is structurally limited and has finite resources available to service task performance. These limitations become particularly apparent in situations requiring multitasking. There is little agreement between models regarding the precise nature of the limitation; however, there is general agreement that interference effects manifest themselves in dual task performance situations. Task demand and performance decrement is greater when:
• Tasks are more complex
• The person has limited prior exposure to the task demand
• Two concurrent tasks to be performed are similar
• The two tasks make demands on the same cognitive structures

The individual's reaction to task loading is variable and differs contingent upon a number of factors, including:

o Levels of psychological arousal: measured by monitoring autonomic nervous system activity
o The presence or absence of psychological stressors: e.g., fatigue, stress
o The presence or absence of extrinsic influencing factors: quality of training, quality of work procedures, etc.
Psychological theory, as discussed above, has been mainly concerned with developing explanations of an individual's response to high workload situations. A different literature has evolved to cover performance decrement in situations requiring vigilance. This literature is explored in the following section.
4. Vigilance

"The time is World War II. A British patrol plane flies over the Bay of Biscay. Inside, an observer peers at a speckled, flickering screen looking for a tell-tale spot of light or 'blip' that will signal the presence of an enemy submarine on the surface of the sea. The observer has been on watch for a little over 30 minutes and nothing much has happened. Perhaps this mission, like so many others, will be fruitless. Suddenly, the 'blip' appears but the observer makes no response. The 'blip' appears a few more times. Still the observer fails to respond. Evidently, the signal has gone undetected and, as a result, so has the submarine" (cf: Warm, 1984, p. 1).
Vigilance is the ability of a person to maintain focus of attention and remain alert to target stimuli over long periods of time (e.g., Davies and Parasuraman, 1982). As the above quotation illustrates, the study of vigilance task performance came to prominence in the late 1940s, largely in response to the high numbers of detection failures observed among operators of the early radar and sonar workstations. Similar concerns with people's ability to maintain focus of attention during the performance of prolonged monitoring tasks, such as quality control of assembly line products, had already been noted in other industries. For example, as early as 1930, Wyatt and Langdon (1932) found time-related variations in the performance of inspectors working in a munitions factory. They found that the inspectors' ability to detect metal flaws and other defects varied over a 4-hour period, with the lowest performance levels occurring around the 90 minute mark. There are many examples of tasks in the modern world that rely crucially on a person's ability to maintain focus of attention over long periods of time. Indeed, due to the increasing trend to automate work environments, coupled with an associated shift away from active to supervisory control modes (e.g., Sheridan, 1970), such jobs are on the increase in the contemporary world of work. Examples include: industrial quality control, robotic manufacturing, air-traffic control, ship bridge watchkeeping, nuclear power plant operation, proof reading and anaesthesiology.

4.1 The Decrement Function
In response to the problems with radar and sonar operation, Norman Mackworth was asked by the Royal Air Force in the late 1940s, to carry out research to establish the causes and effects of detection failures in vigilance tasks with a view toward the establishment of corrective measures. Mackworth completed a series of ingenious experiments that have subsequently come to define work in the field. In the seminal study of vigilance, subjects are required to complete a ‘clock test’ designed to emulate the essential features of a radar display-monitoring task. In a clock test
scenario, subjects are asked to view movements of a black pointer along the circumference of a blank faced clock that contains no scale markers or reference points. Once every second, the pointer moves 0.3 inches to a new position. From time to time, according to a predefined schedule specified in the research design, the pointer will execute a 'double jump' of 0.6 inches and subjects are required to detect this event by pressing a key. Participation in vigilance tests often involves payment, and in some research designs payment is linked to performance to enhance motivation. The rate of occurrence of the target event can be manipulated in any number of ways. In Mackworth's initial experiments target events were set to occur at irregular intervals at a rate of 12 events each 30 minutes. In other experiments, however, presentation rates have been varied to be more or less frequent, and the intervening interval has been manipulated to present target events in more or less predictable fashion (e.g., "a target event once every 35 seconds", "22 events per session set to present randomly"). Irrespective of the rate of presentation, the typical vigilance task is prolonged and can last anything between 2-6 hours dependent upon the requirements of the analysis team. Results from a prototypical vigilance experiment are remarkably robust and have been confirmed over many thousands of experiments. Task performance has been found, with minor variations, to conform to the "decrement function", which is a plot charting loss of performance efficiency over time. The function predicts that approximately 25-30% loss of target detection efficiency will occur within the first 60 minutes of commencement of the vigil, with at least half that amount occurring in the first 15 minutes of task performance. In some situations, detection failures can be induced from the task onset.
Performance continues to degrade more slowly beyond the 60-minute mark and continues until time runs out or a floor effect is reached.
Figure 15 The 'standard' performance decrement in vigilance tasks (adapted from Mackworth, 1948)
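The shape of this decrement function can be sketched with a simple computational model. The exponential form below, and its parameter values (an efficiency floor of 0.70 and a 15-minute time constant), are illustrative assumptions chosen to reproduce the reported figures (roughly 25-30% loss by 60 minutes, with at least half of that loss in the first 15 minutes); they are not fitted to Mackworth's data.

```python
import math

def detection_efficiency(t_min, floor=0.70, tau=15.0):
    """Illustrative decrement function: detection efficiency decays
    exponentially from 1.0 toward a floor as the vigil proceeds.
    floor and tau are assumed values, not estimates from real data."""
    return floor + (1.0 - floor) * math.exp(-t_min / tau)

for t in (0, 15, 60, 120):
    loss = (1.0 - detection_efficiency(t)) * 100
    print(f"{t:>3} min into vigil: {loss:4.1f}% efficiency lost")
```

Under these assumed parameters the model loses about 19% of efficiency by 15 minutes and about 29% by 60 minutes, consistent with the decrement figures quoted above.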
4.2
Components of Vigilance Tasks
Performance efficiency in tasks requiring vigilance is clearly a product of several factors, and a significant challenge in developing a better understanding of the construct is the elaboration of the factors influencing efficient vigilant performance. In an attempt to answer this question, Jerison (1959) proposed a framework suggesting that performance on watchkeeping tasks is a function of factors including:
• The sensory modality of the target signal (auditory or visual)
• The salience or detectability of those signals
• Uncertainty in the position, time or nature of the signal to be detected
• Characteristics of the non-signal background events
• Complexity of the task
4.2.1 Signal Modality Vigilance experiments have been conducted in which the signal to be detected involved presentation of auditory, visual, or cutaneous stimulation. However, almost without exception it has been found that tasks involving auditory stimulation (e.g., audio alarms) are performed with greater efficiency, and are more stable over time, than tasks involving purely visual or cutaneous signalling techniques (e.g., Davies and Parasuraman, 1982). The efficacy of visual signals is enhanced by equipment design which provides
better coupling between the observer and the display to be monitored. For example, the provision of helmet-mounted displays can be used to elevate performance levels for tasks involving visual signals to levels obtained for tasks where detection of audio signals is required (Galinsky et al, 1990). In addition, task performance in systems that employ dual-mode alerting methods (e.g., a visual signal accompanied by an audio alarm) is superior to single-mode performance, irrespective of the primary modality (Doll and Hanna, 1989). 4.2.2 Signal Salience A common finding across the many experiments reporting sustained attention task performance effects is that target detection is considerably enhanced by increasing the amplitude and duration of the sought-for signal. Interestingly, Corcoran and his colleagues found that the prototypical performance decrement curve can be reversed when the amplitude of the input signal is abruptly increased midway through the vigil (Corcoran et al, 1977). This performance improvement tends to remain active for the remainder of the session. Increases to the duration of the signal can also improve vigilance. Radar signals of brief duration, for example, are more likely to be missed than those that remain visible for some time. Significant reductions in signal detection failure can be obtained by increasing the duration of target signals up to a limit of around four seconds. Few, if any, performance gains are likely to result from increases to the duration of the target signal beyond this range (e.g., Warm et al, 1970). 4.2.3 Stimulus Uncertainty Subjects in vigilance tasks are likely to be exposed to varying degrees of uncertainty with regard to the onset of the target event. Not surprisingly, research has shown performance to vary significantly as a function of the predictability of the target signal. Signal uncertainty can be manipulated in an experimental situation in two ways.
Temporal uncertainty arises when the probable timing of event signals is unknown. Spatial uncertainty occurs when the precise location of the event in the visual field is unknown. Tasks involving both temporal and spatial uncertainty are especially prone to detection failures. Experiments have been performed in which the density of target signals was manipulated. When critical signals are presented more frequently (in effect lowering the temporal uncertainty associated with signal rates), performance efficiency in the signal detection task is greater (Warm and Jerison, 1984). Furthermore, response times to detection increase as a linear function of uncertainty. Temporal uncertainty has also been examined in experiments that manipulate the time duration between presentations of target signals. When performance was compared for three conditions in which intervening times were described as predictable, irregular or unpredictable, it was found that both the speed and
accuracy of detection were far greater in the context of regular as opposed to irregular signal conditions (Warm, 1984). The effect of spatial uncertainty on vigilance has been tested using tasks in which signals can appear on any one of several display terminals. Under such conditions, performance efficiency is lowered and subjects tend to bias their attention towards locations in which the likelihood of signal appearance is perceived to be the greatest (Joshi et al, 1985). With regard to the issue of signal uncertainty, several studies have revealed an important effect. Subjects trained initially under conditions of high signal probability tend to be more resistant in subsequent testing to performance decrement effects. This occurs irrespective of actual signal probability in the work domain (Griffin et al, 1986). This finding is strongly suggestive of the potential of training to help minimise the influence of signal uncertainty in practical contexts. 4.2.4 Background Context Signal detection experiments tend to use stimulus materials consisting of relatively few target signals embedded within the context of a wide range of non-target background events. For example, the subject may have to react to bright flashes of light that appear within an array of dimmer lights, or respond to audio events of a particular tone presented in a sound recording of many different tones. In one obvious sense, background events are neutral with regard to performance of the vigilance task: they do not require an overt reaction. At another level, however, they are extremely influential in terms of their effect on a person’s ability to maintain focus of attention, and a number of studies have revealed that performance decrement is more pronounced in situations where background events occur at high frequency (Lanzetta et al, 1987). Interaction effects between the frequency of background events and the temporal regularity of the target signal have also been observed.
Synchronous relationships – those in which a regular target appears within a stable background of non-target events – tend to provide the most beneficial arrangement for vigilant performance. Asynchronous relationships, on the other hand (those in which an irregular target appears within a background of non-target events of varying regularity), provide the most challenging conditions for effective target detection and tend to yield the greatest number of detection failures. The effects of both event rate and event asynchrony are not trivial. They reveal that the efficiency of signal detection in a vigilance task is determined as much by what transpires in the background to the target event as it is by the critical event itself. 4.2.5 Stimulus Complexity Most vigilance studies rely on relatively simple perceptual discriminations during target detection, involving change to the intensity, extensity, duration or motion pathway of critical stimuli on a single display. Vigilance tasks in actual
operating conditions can of course be much more complex than this and might be expected to require operators to cope with multiple signal sources and types. At an intuitive level it would be anticipated that increases to the complexity of the stimulus configuration would lead to loss of performance efficiency, and this is exactly what early studies found. In a well-cited study reported by Jerison (1963), for example, subjects monitored three displays simultaneously, with the result that the decrement function was greatly enhanced. Indeed, significant detection failures were obtained within the first 2-3 minutes of commencement of the vigil – a period in which performance efficacy is usually quite good. In contrast to this, contemporary studies have revealed a more complex picture. Several studies have shown that performance decrement effects may be reduced by increasing task complexity if the overall workload (the total number and rate of target signals and background events) remains quite low. In these studies, observers were asked to monitor between 6 and 36 screens over a 6-hour period. The apparent improvement in task performance, however, is short-lived, and increases to the overall task complexity restore the decrement function (Loeb et al, 1987). In an interesting variation on this basic theme, Fisk and Schneider (1981) approached the issue of performance decrements as a function of task complexity from the human information processing perspective of controlled and automatic processing. According to this view, automatic processes represent fast, effortless, skill-based behaviours, whilst controlled processes are relatively slow, effortful and capacity-limited.
Using a methodology in which vigilance skills were acquired by operators over several hundred trials, Fisk and Schneider were able to demonstrate that changes to the vigilance decrement function as a result of changes to task complexity are largely restricted to people reliant upon controlled, capacity-limited processing, that is, the unskilled. Consequently, some authors conclude that the responsiveness of vigilance to task complexity might be overcome via the provision of training and practice. 4.3
Summary
The study of vigilance reveals that there are significant performance effects when people are asked to remain alert to target signals over prolonged periods. Specifically, experiments reveal that up to 50% of performance efficiency can be lost in the first 60 minutes following commencement of the vigil when operating conditions or the design of the task are unfavourable. Several factors can influence a person’s ability to react to task demands. The salience of the target object, the levels of uncertainty associated with the onset of the target event, the character and makeup of background materials, objects and events, and the complexity of the task are all factors likely to influence the quality of performance. Changes to any of these parameters can affect performance efficiency for the better or for the worse.
5.
Measures of Cognitive Workload
The goal of cognitive workload assessment is to evaluate the effect of the demands that a task places on the human operator or operating team. There are presently four generic approaches which may be used – either singly or in combination – to make assessments of cognitive workload. The first views workload as a subjective phenomenon and advocates use of self-report rating scales. The second adopts a behavioural perspective and looks for evidence of workload as it becomes manifest in a variety of task performance measures. A third assumes workload to be a physiological phenomenon and therefore aims to register its presence by monitoring a range of psychophysiological sensors producing output concurrently with performance of the task. The final approach adopts an engineering perspective in which workload estimates are formed on the basis of consideration of prevailing task demands, coupled with assessment of the influencing factors likely to affect a person’s ability to cope with that demand. Each of these approaches will be considered more fully below. Before this, however, it would be useful to consider some of the factors that are important in relation to choice of method. 5.1
Factors influencing choice of workload measure
Identification of criterion variables for evaluating the effectiveness of measures of cognitive workload has been discussed on a number of occasions (e.g., de Waard, 1996). For example, O’Donnell and Eggemeier (1986) suggested that workload measurement techniques vary according to the following properties: sensitivity, diagnosticity, primary-task intrusion, implementation requirements, and operator acceptance. Wickens later added two additional properties – ‘selectivity’ and ‘bandwidth and reliability’ – to the list of criteria. Because these criteria are of importance to the achievement of Workpackage 3 objectives (dealing with selection of a workload assessment method) each factor will be considered in more detail below². 5.1.1 Sensitivity Sensitivity refers to the ability of a workload assessment technique to detect changes in workload levels within a particular scenario. One form of sensitivity relates to the recognition of a differential experience of workload for a particular individual performing a specific task. A second form of the criterion relates to establishment of the differences apparent within populations of users each performing the same work activities. Notions of sensitivity are especially important where the requirement is to identify violations of safe upper and lower workload limits. With this in mind it is worth
² ISO 10075 Part 3 provides additional assessment criteria. These will be considered more fully in Workpackage 3.
noting that the sensitivity of a measure can be specified equally effectively in qualitative terms as in quantitative values. 5.1.2 Diagnosticity Diagnosticity is a criterion that attempts to map the workload assessment onto the psychological and environmental factors which give rise to the demand. Within the specific context of multiple resource theory, for example, a highly diagnostic measure is one that enables the analyst to conclude that the source of excessive workload arises because two visual processing tasks must compete for the same cognitive processor or channel. Conversely, and in some situations equally usefully, diagnosticity refers to the ability of the measure to help attribute workload levels to the design of the task or job (e.g., Wierwille & Eggemeier, 1993). 5.1.3 Primary Task Intrusion The extent to which application of a measurement technique intrudes upon task performance – thereby contributing to task load – is called primary task intrusion³. In general, primary task intrusion is considered a bad thing and is to be avoided wherever possible. Task intrusion resulting from application of a measurement instrument will tend to contaminate the workload estimate and lead to unreliable assessments. It can usually be avoided by modifying the design of the data acquisition cycle to allow workload measurement either prior to or following completion of the task of interest. Task intrusion is permissible, and even actively encouraged, in some assessment scenarios, as occurs, for example, when workload is assessed using secondary performance measures. 5.1.4 Implementation Requirements Implementation requirements refer to the costs which accrue from using a particular assessment method. For example, all workload assessment techniques consume resources in terms of human involvement, although some techniques are more costly than others.
Methods that require training, for example, can be relatively expensive when compared to methods which are designed to be used by the novice or intermediate analyst. Similarly, assessments requiring application of multivariate statistics will usually require the involvement of a specialist technician, in contrast to alternative approaches which rely on summary statistics to generate workload indices. Finally, some techniques require extensive investment in hardware and specialist laboratory equipment. This is especially the case for physiologically based assessments, which require large numbers of sensors and data
³ This is not to be confused with the interference effects which occur in dual-task performance. The term primary task intrusion is used here to refer to situations where the measurement technique itself becomes demanding of resource.
acquisition devices to capture a person’s physiological states of arousal (e.g., Unema, 1995). 5.1.5 Operator Acceptance The final criterion discussed by O’Donnell and Eggemeier (1986) concerns the issue of operator acceptance. In order for a technique to produce a useful indicator of workload, the results must generally accord with the operator’s experience, and operators must also be willing to accept its conclusions. In addition, the operator will need to be prepared to participate in evaluations. de Waard (1996) noted that operator acceptance tends to be higher when the assessment method is less intrusive. People are also more willing to accept assessments made in the operational environment, as opposed to ratings established on the basis of performance of analogous laboratory-based tasks. Whilst it is clear that user acceptance of an assessment technique is an important factor, it needs also to be remembered that users may not be fully aware of the implications of task demand on performance. For this reason, the extent to which users accept assessments needs to be treated with caution: it may be quite unreasonable to reject an assessment solely on the basis of the unwillingness of users to accept the evaluation results. 5.1.6 Selectivity The selectivity of measurement techniques was a criterion introduced by Wickens (1991). Selectivity refers to the ability of the measurement method to reflect changes in mental load but remain impervious to changes in physical load. This criterion is important when the task being assessed has both a significant cognitive and physical dimension. The ‘bandwidth and reliability’ criterion refers to the ability of assessment methods to identify upper and lower performance limits (i.e., the range of optimal task performance) with sufficient reliability. There should be a degree of stability relative to the bandwidth setting, insofar as general assessments would ideally transfer to other analogous domains.
Having briefly considered the issue of workload assessment method selection, we now turn to consideration of the measurement methods themselves. In the following section, work representative of each of the four approaches identified above will be surveyed. 5.2
Subjective Workload Measures
Perhaps the most frequently used technique for estimating cognitive workload involves asking the user directly. This approach assumes that cognitive workload is essentially a subjective phenomenon and therefore the most
appropriate way of eliciting workload measures involves asking the person with responsibility for completing the task directly: “if the person feels loaded and effortful, he is loaded and effortful, whatever the behavioural and performance measures show” (Johannsen et al, 1979). Several rating scales have been developed specifically for the purpose of measuring subjective workload. According to Lysaght et al (1989) they can be classified according to the level of measurement – nominal, ordinal, interval and ratio scale levels – although they note that most investigators elect to measure workload at the ordinal or interval levels. The following account is not intended to provide an exhaustive review of these scales. Instead, it concentrates on presenting the most widely used scales and, in so doing, draws heavily on the excellent review of subjective workload assessment methods provided by the NATO Human Factors and Medicine Panel in 2001 (NATO, 2001). 5.2.1 Modified Cooper-Harper Scale (MCH) One of the earliest subjective workload assessment rating scales was developed in the late 1960s for use in evaluating workload associated with the handling characteristics of military aircraft (Cooper and Harper, 1969). In its original form the Cooper-Harper scale was mainly concerned with mapping physical workload, and in consequence the tool provided little data of relevance to the assessment of cognitive workload. However, following the emergence of stage-theoretic models of human cognition, coupled with increasing practical concern with the prevention of pilot error, the scale was modified to incorporate assessments of cognitive workload (Wierwille and Casalli, 1983). A graphical representation of the Modified Cooper-Harper (MCH) Scale is provided in Figure 16. The MCH Scale is intended for situations – experiments or evaluations – where a global measure of cognitive workload is required.
In a typical MCH session, where the scale is administered to a representative population of users, Wierwille and Casalli (1983) recommended that:
• the order of presentation of scale items be randomised, to prevent bias from order effects;
• task ratings be obtained immediately following exposure to the task being assessed;
• subjects be briefed as to the purpose of data collection prior to assessment;
• subjects be given the opportunity to practice workload assessments prior to the actual data collection session.
Results were to be analysed using nonparametric statistics, reflecting the fact that ratings are made on an ordinal scale. The final computed workload score was an aggregate of
all assessments made by participants drawn from the user population used in the assessment. According to Geddies et al (2001) the advantages of the MCH are that the decision tree (shown in Figure 16) makes the rating task easier because it provides informants with qualitative guidance during assessment. They also point out that data collection is relatively quick and painless and that this proves popular with informants. On the negative side, it was felt that the assessment technique is founded upon the simplistic assumption of a linear relationship between performance and effort, an assumption which has been shown to be erroneous. Similarly, the overall metric appears to assume that low workloads are highly desirable. Again, this assumption has been shown to be incorrect, as low workload can lead to problems of vigilance and sustained attention. Finally, it was noted that the scale lacks diagnosticity insofar as it provides a general or global measure of workload without being diagnostic as to which cognitive factors are actually driving the subjective experience of task load.
Figure 16 Modified Cooper-Harper Scale (From Geddies et al., 2001)
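Because MCH ratings are ordinal, the recommended nonparametric aggregation can be sketched in a few lines; the ratings below are hypothetical, standing in for scores collected from a small rater population.

```python
from statistics import median

# Hypothetical MCH ratings (1 = very low workload ... 10 = impossible)
# from eight raters who each performed the task under assessment.
ratings = [3, 4, 4, 5, 3, 6, 4, 4]

# Ratings are ordinal, so the median (not the arithmetic mean) is the
# appropriate aggregate workload score for the rater population.
print("Aggregate MCH workload score:", median(ratings))  # → 4.0
```

Comparisons between task conditions would likewise use nonparametric tests (e.g., rank-based methods) rather than parametric statistics.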
5.2.2 NASA Task Load Index (NASA-TLX) The NASA TLX (Hart and Staveland, 1988) is a cognitive workload rating scale derived from the NASA Bipolar Rating Scale, in which self-report scores are collected from participants on six bi-polar subscales assumed to contribute to the total perceived workload experience. The NASA TLX is founded on the idea that cognitive workload is a hypothetical construct representing the cost that must be incurred by the operator in order to maintain an acceptable level of performance. According to Hart and Staveland (1988), workload is the product of the interaction between the requirements of the task, the circumstances under which the task must be performed, and the skills, behaviours and perceptions of the human operator. They argue that its multidimensional nature requires a broader assessment tool than is implied by the MCH scale. The workload sub-scales proposed by Hart and Staveland (1988) and their associated definitions are shown in Figure 17.
Figure 17 NASA-TLX rating scales definitions
The six subscales shown in Figure 17 can be grouped according to the three factors assumed to produce workload. The mental, physical and temporal demand sub-scales are believed to be properties of the task. The performance and effort sub-scales are assumed to be characteristics of behaviour and skill. Finally, the frustration sub-scale is assumed to be a characteristic of the individual. Each subscale is presented to the rater as a 20-point scale with values running from 0 to 100 along the scale. The extremes of each scale (the endpoints) have verbal descriptors.
The assessment process involves two steps. In the first phase, raters are asked to score the task relative to each subscale by placing a mark at some point along the 20-point scale. If the rater places a mark within one of the boxes then that mark is awarded. If the subject places a mark against the division indicator between two boxes, the higher mark is allocated to the task. Ratings on each scale are multiplied by 5 to give each scale a value in the range 0 to 100. In the second phase, raters are required to make paired comparisons using each of the 15 possible pairwise contrasts between the sub-scales. In these comparisons, raters are asked to identify which of the two factors is the more important with regard to workload in the task being evaluated. A rank order of the sub-scales, from 0 to 5, is derived from the results, and the individual subscale scores of the rated task are weighted by it. By summing the weighted scores and then dividing them by the sum of the weights, a Mean Weighted Workload Score is obtained that indicates the level of workload expressed as a percentage (see Figure 18). If two or more qualitatively different tasks are rated with regard to workload, a separate paired comparison has to be made for each task (e.g., NASA, 1986). The NASA TLX is a more sophisticated tool than the MCH and requires both the analyst and task rater to be familiar with the assessment method. It has been suggested that, with sufficient training, the subject needs approximately one minute to complete the event scoring and around three minutes to complete the paired comparisons. If these estimates are accurate then the method should score highly on the implementation criteria (e.g., O’Donnell and Eggemeier, 1986).
Figure 18 NASA-TLX scoring example (underlined scale rated most important)
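The two-phase TLX computation described above can be sketched as follows. The raw ticks and weights are hypothetical values for a single rater; a simple unweighted mean is computed alongside the Mean Weighted Workload Score for comparison.

```python
# Hypothetical single-rater data: raw tick marks (0-20) on the six TLX
# subscales, and weights (0-5) counting how often each subscale was
# chosen in the 15 pairwise comparisons (the weights must sum to 15).
raw_ticks = {"mental": 14, "physical": 4, "temporal": 12,
             "performance": 8, "effort": 13, "frustration": 6}
weights = {"mental": 5, "physical": 0, "temporal": 4,
           "performance": 2, "effort": 3, "frustration": 1}
assert sum(weights.values()) == 15

# Multiply each raw tick by 5 to rescale onto the 0-100 range.
ratings = {k: v * 5 for k, v in raw_ticks.items()}

# Mean Weighted Workload Score: sum of weighted ratings over sum of weights.
weighted = sum(ratings[k] * weights[k] for k in ratings) / sum(weights.values())
# Mean Unweighted Workload Score: plain average of the six subscale ratings.
unweighted = sum(ratings.values()) / len(ratings)

print(f"Mean Weighted Workload Score:   {weighted:.1f}%")    # 59.7
print(f"Mean Unweighted Workload Score: {unweighted:.1f}%")  # 47.5
```

The gap between the two scores illustrates how the weighting phase pulls the result toward the subscales the rater judged most important (here, mental demand and temporal demand).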
Despite the widespread popularity of the NASA TLX rating scale, the tool has been criticised on a number of occasions. In a study carried out by Pfendler (1991), for example, a more reliable estimate of workload demands was obtained using a Mean Unweighted Workload Score of the six scales, rather than the recommended technique that relies on the Mean Weighted Workload
Score for the development of an objective assessment of cognitive workload. Replications of these results have questioned whether the paired comparison element of the assessment method is required. A similar conclusion was reached in an evaluation of the method performed by Nygren (1991). He described the weighting procedure as ineffective and recommended that it be ignored when using the NASA TLX scale in applied situations (as opposed to situations involving laboratory-based assessments of cognitive workload). Further practical considerations associated with use of the scale have been discussed on several occasions (e.g., Beevis, 1992; Hancock et al, 1989; Hart and Wickens, 1990) and investigators have been both supportive (e.g., Hill et al, 1992) and critical (e.g., Veltman and Widdel, 1993). On the positive side, it is suggested that the method has good face validity and its assessments are well accepted by the user community. Ratings can be obtained quickly from subjects, and application of the method tends not to intrude into the normal work activities of the raters. Finally, the method is generally thought to be superior to the MCH and other global rating scale methods due to its multidimensional character, which is thought better able to capture the complexity of cognitive workload. On the negative side, doubts about the technique have been raised from a methodological standpoint. It has been suggested that the validity of certain sub-scales used by the method is questionable or irrelevant. Veltman and Galliard (1993), for example, reported that in many experimental assessments the sub-scales “frustration” and “physical demand” show only minimal influence on total workload. They floated the idea that these constructs might be dropped from the assessment method. Pfendler (1993), on the other hand, took issue with the validity of the workload computation.
He suggested that the TLX scale shows lower sensitivity (i.e., ability to respond to variations in workload) than is the case for other scales such as the Sequential Judgement Scale (SJS) or the Dutch ‘Effort’ Scale (BSMI). Less problematic, but still relevant, various authors have raised doubts about the practicality of the TLX. Because of the relatively large number of dimensions being assessed, the TLX scale is unsuitable for real-time assessments obtained concurrently during task performance in a field situation. The instructions for use were found to be too extensive – at least in the earlier versions of the scale – and contained technical terms not immediately comprehensible to all subjects. As the technique is based on the use of expert judgements, it is necessary that the user has direct recent experience of the task being assessed. This means that the method has limited applicability for use in an undeveloped system at the concept stage of design. 5.2.3 Subjective Workload Assessment Technique (SWAT) The subjective workload assessment technique (SWAT) is part of a family of workload assessment techniques in which subjects are asked to rate the workload of a task on the basis of the dimensions of time load, mental effort load and psychological stress load. These dimensions are derived from a definition of cognitive workload proposed by Sheridan and Simpson (1979). The method uses conjoint measurement and scaling techniques to combine
ordinal level assessments into a single, overall workload score which is a value on an interval scale. Each of the three subscales used for the assessment has three ordinal levels with verbal descriptors provided to aid the rating activity (see Figure 19).
Figure 19 High, Medium, Low descriptors for the three SWAT sub-scales (Reid and Nygren, 1988)
As in the case of the NASA TLX, administration of the SWAT scale involves a two-step procedure. The first phase of analysis involves scale development. Scale development is a complex procedure which requires a significant amount of time and resources to implement fully. Event rating, on the other hand, is a much easier activity, and an individual assessment of task demand can be obtained relatively quickly as the work activity is being performed. In the first part of the analysis, a unique scale is developed for the assessment using a card sorting procedure. During this stage, the subject is asked to rank order all 27 possible configurations of the three levels of the three dimensions according to his or her perceptions of increasing workload. Thus, for example, a card with the combination 1,1,1 for time load (T), mental effort load (E) and psychological stress load (S) would exemplify the lowest possible workload, whilst the card with the combination 3,3,3 would indicate the “worst case” scenario with the highest possible workload. The results of the card sorting procedure are submitted to computer analysis, in which conjoint measurement and scaling techniques are used to convert the rank order data into an interval scale solution with the range 0 to 100. Each of the 27 possible TES configurations has a fixed value on the interval scale. When the card sorting procedure is given to groups of people, each individual’s rank orderings are likely to be slightly different. If there is enough agreement between card sorts in a population, however, formulation of a single group scale can be achieved (Reid and Nygren, 1988). Degree of
concordance among a population of subjects can be tested using Kendall’s coefficient of concordance (W). A group scale solution is possible where W is 0.75 or higher. If, on the other hand, W is below this value, a secondary procedure called SWAT prototyping is applied. This secondary procedure divides groups of raters into homogeneous subgroups. From the SWAT dimensions of time load (T), mental effort (E) and psychological stress (S), six hypothetical prototypes are specified: TES, TSE, ETS, EST, STE, and SET. The TES prototype, for example, places greatest emphasis on the time load dimension and least on psychological stress. In contrast to this, the SET prototype rates stress to be the most influential contributor to workload and time constraints to be the least important. Using these prototypes, subjects can be divided into groups to explain the differential experience of cognitive workload within a given population. The individual’s assignment to a prototype is achieved by correlating his or her rank ordering with the prototypical rank order of each prototype. Subjects are assigned to the prototype which returns the highest correlation. Following scale development and prototype assignment, subjects are asked to rate specific tasks (i.e., event scoring) with regard to the time load, mental effort load and psychological stress load dimensions (e.g., 1,3,1). The scale value associated with this combination (e.g., 45.4) is returned as the workload assessment for that task, expressed as a percentage. Group metrics are obtained by combining assessments into a single value using appropriate descriptive statistics. The SWAT approach has been evaluated on a number of occasions and has been found generally to be a useful technique for eliciting subjective assessments of cognitive workload in a wide variety of situations (e.g., Damos, 1991; Hart and Wickens, 1988).
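The group-scale decision step can be sketched directly from the standard formula for Kendall’s W. The card sorts below are hypothetical and use a toy five-card deck rather than the full 27-card SWAT sort; the 0.75 threshold is the criterion quoted above.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance W for a list of rank
    orderings (one per rater; each a permutation of 1..n, no ties).
    W = 12*S / (m^2 * (n^3 - n)), where S is the sum of squared
    deviations of the per-card rank sums from their mean."""
    m, n = len(rankings), len(rankings[0])
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_sum = sum(rank_sums) / n
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical sorts from three raters over a toy deck of five cards.
sorts = [[1, 2, 3, 4, 5],
         [1, 3, 2, 4, 5],
         [2, 1, 3, 4, 5]]

w = kendalls_w(sorts)
decision = "form a single group scale" if w >= 0.75 else "apply SWAT prototyping"
print(f"W = {w:.2f}: {decision}")
```

For these sorts W is approximately 0.89, which clears the 0.75 criterion, so a single group scale would be formed; a lower W would trigger the prototyping step described above.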
The method provides a very quick, non-intrusive means of obtaining workload assessments, although this only really applies once the workload scale has been constructed or, alternatively, when subjects have been assigned to their SWAT profiles. The tendency of the method to express assessments in the form of a percentage provides a useful, intuitively compelling metric and raises the possibility that the method could be used to predict workload (Reid and Colle, 1988). There is very little evidence available in the public domain, however, to suggest that this is indeed the case. In the majority of situations reported, it appears that the method has been used most widely to detect workload variations in the attainment of work goals in real-time assessments (Corwin et al., 1989; Nataupsky and Abbott, 1987). Despite the appearance of rigour with regard to scale development and assignment of subjects to SWAT profiles, questions have been raised regarding the appropriateness of the three dimensions used to drive the assessment, and doubts have been voiced about the validity of the values used to rate each item. DiDomenico (2003), for example, notes that the three dimensions of the scale were developed on the basis of an intuitive analysis and there is very little empirical evidence to suggest that they are the best indicators of cognitive workload. She points to the work of Boyd (1983) who
tested Sheridan and Simpson’s (1979) original definition of workload and concluded that the dimensions are not independent. In his experiments, people’s TES ratings tended to increase for all three dimensions following a task demand change to a single dimension (e.g., increasing time load). Biers and Masline (1987) also questioned aspects of the method. They compared workload assessments computed using conjoint measurement and scaling techniques with alternative assessments obtained using summative statistics and multivariate analyses. They concluded that the resultant scores were equally sensitive and highly correlated. The conclusion was drawn that workload measurement with SWAT can be equally effective without scale development. 5.2.4 Subjective Workload Assessment Techniques: Summary Multidimensional assessment scales, such as the NASA TLX and SWAT, appear to offer increased diagnostic capability relative to single-index approaches such as the Cooper-Harper Scale. One weakness of the single scale approach is that specific effects – such as time pressure – can become obscured during computation of a single workload measure. Whilst multidimensional scales still permit characterisation of workload in terms of a global measure, they also allow the analyst the opportunity to look at the effect of sub-scale items such as time stress in isolation from the main result. On the downside, multidimensional scales generally take much longer to administer and can prove tedious for the participant to complete. In some cases, this may lead to the subject providing spurious assessments in order to complete the task – interestingly, this is in itself a strategy designed to reduce the subjective experience of workload. A further problem with multidimensional scales relates to establishing the meaning of the variable weights.
Nygren (1991) has shown that equal weighting of the scales during computation of the overall workload metric is just as good a predictor of subsequent task performance as computations in which weightings have been varied based on the data presented. Subjective workload measures are sometimes dissociated from other measures of workload. For example, Sirevaag et al (1993) collected TLX ratings from helicopter pilots who were about to perform a training mission in a high fidelity simulator. During the mission, communication demands were varied such that pilots were found to have trouble adhering to nap-of-the-earth altitude criteria under high communication demands. The greater load imposed by the communication demand was reflected in several workload measures but not in subjective ratings. When questioned about this, pilots indicated that they were indeed aware of the greater difficulty involved in the demanding communication condition, but that their rating did not reflect this because they felt that none of the conditions exceeded their capacity to perform the task successfully, and so they rated each scenario as equivalent. Subjective measures are just that, and subjects can adopt their own criteria for making the ratings.
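Nygren’s point can be made concrete with a small sketch. The TLX composite is a weighted mean of six subscale ratings, with weights derived from 15 pairwise comparisons, while the unweighted alternative simply averages the ratings; the ratings and weights below are invented for illustration:

```python
# NASA TLX subscale ratings on a 0-100 scale (hypothetical values).
ratings = {"mental": 70, "physical": 20, "temporal": 80,
           "performance": 40, "effort": 65, "frustration": 55}

# Weights from the 15 pairwise comparisons: each subscale is credited
# once per comparison it "wins", so the six weights sum to 15.
weights = {"mental": 4, "physical": 0, "temporal": 5,
           "performance": 2, "effort": 3, "frustration": 1}

# Weighted composite (the standard TLX computation) versus the
# equal-weighted "raw" average that Nygren found equally predictive.
weighted = sum(ratings[k] * weights[k] for k in ratings) / 15
unweighted = sum(ratings.values()) / len(ratings)
```

With these example figures the two computations give noticeably different composites, yet Nygren’s finding is that the simpler one predicts subsequent task performance just as well.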
Hendy et al. (1993) point out that both the TLX and SWAT go to great lengths to provide a composite measure of workload. They suggest that, if one is interested in a global measure, one could do just as well by asking subjects to estimate workload on a univariate scale. The procedures used by SWAT and TLX seem to assume that, although subjects may be able to give accurate estimates on specific workload components, they are not able to report reliably on the overall or global workload experience. In four studies, Hendy and colleagues (Hendy et al, 1993) compared the composite score on a modified TLX test with a univariate measure of global workload, asking subjects to use a magnitude estimation procedure to estimate the difficulty of various segments of flight relative to the difficulty of the take-off segment. They found that the univariate measure was more sensitive to variations in task difficulty than the TLX composite measure, or indeed any of its subscale scores. This suggests that, if a global measure of workload is needed, a simple univariate scale works as well as a multidimensional scale and does not attract biases due to people not taking care to complete all the assessments required for a full TLX/SWAT analysis. The work of Hendy et al raises an interesting question. How useful is it for a system designer to know that a workload problem stems from excessive mental effort, time pressure or frustration? Excessive time pressure, for example, could arise from a number of sources and the workload measure itself provides very little information regarding what aspect of the work situation would need to be changed to reduce workload levels. One way around this problem would be to use subjective assessment scales which focus upon appropriate system design features. 
For example, the knowledge that the human information processor encounters workload difficulties when concurrent tasks require the same information processing channels (visual perception) ought to allow the designer to infer that presentation of this data needs to be separated along a time continuum. Such assessment tools are currently available in the form of usability scales but these would need to be adapted to reflect potential workload problems. 5.3
Performance-based Measures
Psychologists routinely use performance data collected during the completion of well-defined experimental tasks as the dependent variable in psychological experiments, so it is perhaps only natural that a number of investigators have proposed that task performance measures can provide the basis for cognitive workload assessment (de Waard, 1996; Lysaght et al, 1989; O’Donnell and Eggemeier, 1986; Hart and Wickens, 1990). Two different types of performance-based workload assessment techniques can be identified from the psychological literature: primary task experiments, in which performance data collected during completion of a main task is used to make inferences about workload, and secondary task studies, in which an additional task is added to the performance requirement to consume any remaining available mental resource. Workload estimates are then based on the subjects’ ability to perform both tasks concurrently.
Irrespective of whether the workload assessment is based on analysis of primary task or secondary task performance, the data collected in performance-based workload assessments is typically one or more of the following four speed-accuracy measures: time taken to complete the task, response latency, accuracy of response, and error rates. 5.3.1 Primary Task Measures In a typical psychological experiment, inferences about human information processing are based upon measures of task performance taken relative to the completion of a primary task, and several attempts have been made to use this technique for estimating workload. For example, O’Donnell and Eggemeier (1986) proposed that measures of primary task performance, obtained in the psychological laboratory, can provide an indication of the overall effectiveness of the human-machine interface when considered from the perspective of cognitive workload. Other investigators, however, have taken issue with this proposition and have suggested that it is not at all clear that primary task measures are directly associated with mental workload. Errors in performance, or the time taken to complete a task, do not necessarily indicate high workload imposed by the primary task. Such errors can arise from a variety of sources, including situations in which prevailing task demands are at levels too low to engage the subject, as in the case of vigilance experiments in which the operator may miss signals because they are so infrequent. With reference to the inverted U performance function, de Waard (1996) suggested that whilst primary task performance may well degrade with additional task demand, stable performance within the optimal zone does not necessarily reflect lower task demands. In some cases, the subject may be able to maintain performance whilst working at levels very near to capacity, and this cannot be determined using primary task measures alone.
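The four speed-accuracy measures listed above can be summarised from trial-level records with a minimal sketch (the record format and function name are hypothetical, not taken from any published tool):

```python
def primary_task_measures(trials):
    """Summarise primary-task performance from (response_time_s, correct)
    trial records: mean response latency, accuracy and error rate.
    Note that stable scores here do not rule out an operator working
    close to capacity, which is the limitation discussed above."""
    n = len(trials)
    mean_rt = sum(rt for rt, _ in trials) / n
    accuracy = sum(1 for _, ok in trials if ok) / n
    return {"mean_rt_s": mean_rt,
            "accuracy": accuracy,
            "error_rate": 1.0 - accuracy}
```

Such summaries are cheap to compute from logged experimental data, but as the text notes they cannot by themselves distinguish comfortable performance from performance maintained at the very edge of capacity.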
In order to improve both the sensitivity and diagnosticity of performance-based measures, what is needed is the addition of a secondary task. 5.3.2 Secondary Task Measures Secondary task measures were originally used as tools to assess work capacity and limitations of the human operator, with regard to the performance of a primary task, as far back as the early 1940s (e.g., Bornemann, 1942). From this perspective, workload assessment is achieved by asking subjects to perform a secondary task at the same time as they complete a primary task. The approach is based upon the assumption – already discussed in relation to workload theory – that the human information processing system is severely limited, and spare capacity left by the primary task can be used to perform a secondary concurrent activity. Workload assessment by means of secondary task performance comes in two forms. In one scheme, subjects are instructed to maintain error-free
performance in relation to the secondary task. The workload assessment is then constructed based on analysis of primary task performance objectives. In the second scheme, the instruction is to maintain error-free performance relative to the primary task. Workload assessments are then formulated using indices obtained relative to the performance of the secondary task. The former approach has been called the “task loading paradigm” whilst the latter is known as a “subsidiary task” assessment. Although there is no reason to suppose that either approach is to be preferred to the other, there has been the suggestion that the task loading paradigm assesses the workload imposed by a task directly, in contrast to the subsidiary task approach, which is assumed to provide a measure of spare capacity (Brown and Poulton, 1961). 5.3.3 Summary It was pointed out earlier that there are a number of difficulties associated with using secondary task performance as a measure of spare capacity. Subjects may use various strategies for dealing with what is basically a dual task situation, and some might even prepare for the probe task despite the instruction to treat the task as secondary. In addition, the existence of multiple resources means that a given probe task, such as auditory detection, will vary in difficulty depending on its similarity to the primary task – due to variation in the overlap of the particular resources used by each task. Finally, the secondary task can be intrusive and therefore disruptive to performance of the primary task. As was discussed in relation to the problems associated with the evolution of single resource theory, even very simple, dissimilar tasks can produce interference when both of them require use of the same cognitive process. These observations suggest that secondary task measures need to be treated with caution when used as the main source for an index of cognitive workload. 5.4
Physiological Measures
Evidence that human physiology responds to variations in task demand has been available since the late 19th century. In a recent wide ranging review of the physiological literature, for example, Andreassi (2000) noted that skin conductivity (a measure of the activity of the eccrine sweat glands), cardiovascular activity, respiration, electrical activity in the brain, the peripheral nervous system and pupillary size, have all been shown to vary as a function of factors such as task difficulty, levels of attention, and activities involving decision making and problem-solving. These responses are largely involuntary and surprisingly sensitive which makes the assessment of cognitive workload on the basis of human physiology both a viable and attractive proposition (Ward and Marsden, 2004). The full range of physiological measures which have been nominated as potential indices of cognitive workload is very extensive and a full review is
beyond the scope of this particular survey. However, a flavour of the approach can be provided from consideration of some of the main candidate measures. 5.4.1 Electroencephalogram (EEG) The electroencephalogram (EEG) is a recording of the difference in electrical potential between various points on the surface of the scalp. EEG monitoring produces a rhythmic wave generated by cyclical changes in the membrane potentials of nerve cells located in the cortex. Disruption to the rhythmic pattern attributable to the response of the brain to an external event, i.e., the event related potential (ERP), can be identified by using averaging procedures. Thus, it is possible to associate ERPs with specific events or target activities, such as presentation of a tone of a particular pitch embedded within a series of non-target tones. When particular cognitive processes can be related to specific components of the EEG, it becomes possible to use ERPs to make inferences about workload. For example, one component of the EEG, the P300 amplitude, is known to provide a measure of how much attention a person allocates to a stimulus signal (Israel, 1980). When the subject is asked to perform multiple tasks, it has been observed that the P300 amplitude component of the EEG becomes attenuated following onset of the secondary task (e.g. Hoffman et al, 1985). The extent to which the measure changes was found to vary as a function of the complexity of the secondary task. A similar finding was reported by Sirevaag et al (1988; cited in de Waard, 1996). These investigators reported that dual task performance was associated with decreased Alpha and increased Theta wave activity. Thus, it would seem that changes to the Alpha and Theta wave profiles might provide a means for examining workload psychophysiology.
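The averaging procedure that isolates an ERP from the background EEG can be illustrated in a few lines (the numbers below are synthetic, not real EEG samples): activity that is not time-locked to the stimulus tends to cancel across epochs, while the event-related component survives.

```python
def average_epochs(epochs):
    """Point-by-point average of equal-length, stimulus-locked EEG
    epochs. Random background activity tends to cancel across trials,
    leaving the event-related potential visible in the average."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]

# Two toy epochs sharing a "response" at sample 1, with opposite-signed
# background noise at sample 2 that cancels in the average.
assert average_epochs([[0.0, 5.0, 0.5], [0.0, 5.0, -0.5]]) == [0.0, 5.0, 0.0]
```

Real ERP analysis averages over many trials and applies artefact rejection first; this sketch only shows why averaging exposes the stimulus-locked component such as the P300.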
Measuring ERPs associated with probe items presented concurrently with a primary task is clearly a variation on the secondary task method and it is fair to ask whether the information provided by this technique is worth the additional expense and complexity associated with EEG recording. One advantage of using the P300 relative to reaction time measures is that it appears to be sensitive to perceptual/central processes but not affected by the response/motor system. Reaction time, by way of contrast, is sensitive to limited capacity processes which intervene between input and output. Viewed in this way, P300 potentially offers increased diagnosticity over behavioural measures. A good example of EEG use in relation to workload assessment is provided by Humphrey and Kramer (1994). They presented subjects with a dual task scenario in which operators were asked to monitor a series of gauges for critical readings whilst periodically solving arithmetic problems presented on the same screen. The difficulty of each task was manipulated and P300s collected in response to the presentation of information in each task. The goal was to determine the difficulty level (and presumably workload) that subjects experienced by examining the P300 amplitudes. They found 90 percent discrimination accuracy could be obtained by using 1 to 11 seconds
of logged ERP data, a result which suggests that the P300 has a valuable role to play in the analysis of workload physiology. A similar study is reported by Makeig and Jung (1996). They found that EEG could be used to monitor a subject’s alertness, such that they could reliably predict, in real time, the likelihood that operators would miss occasional sonar signals by monitoring EEG power in particular frequency bands. If these results are replicated, it may become possible, in the near future, to provide sensors – using wearable technology devices – which warn operators when they have entered a state of low alertness. 5.4.2 ECG and Associated Cardiovascular Measures The cardiovascular system is extremely responsive to psychological stimuli, although individuals display considerable variation in their cardiovascular reactions to stressful events. Some individuals react quite strongly to provocation and show rapid, marked elevation in heart rate and blood pressure measures. Others can be quite resistant to stress, although most people show some degree of cardiovascular change in relation to the onset or withdrawal of psychological stressors (Hugdahl, 2001). In most research work, cardiovascular reactivity is defined as a deviation of a cardiovascular response parameter from a comparison or control value that results from an individual’s response to a discrete, environmental stimulus. This definition reflects the highly dynamic nature of cardiovascular measures, which are best characterised using bandwidth norms. Observations of cardiovascular effects are therefore usually made by comparing shifting averages relative to prevailing bandwidth norms. In practical terms this means that in psychological assessments (including workload studies) individuals usually have to act as their own controls in a repeated measures experimental design. There are several options available for the collection of cardiovascular data.
Robust heart rate measures can be derived using electrocardiography. An ECG measures electrical activity in the heart, and for assessments where the aim is to infer workload from consideration of an ECG, it is not the ECG itself that is of interest but the extent to which it enables the determination of the time duration between heartbeats. Heart rate (HR) is the number of times the heart beats within a specific time period – usually a minute. The mean heart period, or interbeat interval (IBI), on the other hand, is the average time duration between heartbeats within that period. Heartbeats have varying time durations, resulting in an IBI time series with characteristic patterns and frequency content. These variations are known as heart rate variability (HRV) measures, and when subjects are required to expend mental effort to complete tasks, the load is usually reflected in elevated heart rate (HR) and decreased heart rate variability (HRV) when compared to resting situations (Lambertus et al, 2005).
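The HR and HRV quantities described above follow directly from a series of R-peak timestamps. The sketch below is a simplification (real HRV analysis involves artefact rejection and frequency-domain measures); SDNN, the standard deviation of the interbeat intervals, is used here as a simple time-domain HRV index:

```python
import statistics

def hr_and_hrv(r_peaks_s):
    """Mean heart rate (beats/min) and a simple time-domain HRV index
    (SDNN: standard deviation of interbeat intervals, in ms) from a
    series of ECG R-peak timestamps given in seconds."""
    ibis = [b - a for a, b in zip(r_peaks_s, r_peaks_s[1:])]
    hr = 60.0 * len(ibis) / sum(ibis)          # 60 / mean IBI
    sdnn_ms = statistics.stdev(ibis) * 1000.0 if len(ibis) > 1 else 0.0
    return hr, sdnn_ms
```

Under mental load one would expect the first value to rise and the second to fall relative to a resting baseline, consistent with Lambertus et al (2005) as cited above.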
A less intrusive means of obtaining cardiovascular data to make inferences about workload involves measurement of a person’s blood volume pulse (BVP), obtained using a sensor placed on the finger or ear lobe of the subject. BVP sensors use photoplethysmography to detect blood pressure variations in the extremities of the body. Photoplethysmography is a process of applying a light source to the skin and measuring its reflection. At each contraction of the heart, blood is forced through the peripheral vessels under the light source, which modifies the amount of light returned to the photosensor. Systolic blood pressure values are obtained during peak pressure in the cardiac cycle. Diastolic pressure is obtained from identification of troughs in the BVP waveform. Since vasomotor activity – the activity which controls the size of blood vessels – is controlled by the sympathetic nervous system, the resultant BVP measure can display changes in sympathetic arousal. An increase in BVP amplitude is indicative of greater blood flow to the extremities, showing a heightened state of psychological arousal. 5.4.3 Electrodermal Measures Electrodermal activity (EDA) is a term used to refer to a range of techniques where the objective is to measure electrical phenomena associated with the skin. EDA techniques are frequently used in association with BVP measures, and together the two provide the basis for polygraph testing. The practice of electrodermal recording to infer psychological states, including workload, dates back to the late 19th century, and data can be obtained with or without the application of an external voltage. According to Boucsein (2005), EDA signals are fairly easy to measure and interpret. The Galvanic Skin Response (GSR) is probably the best known instance of an event-induced EDA. GSR is a measure of the skin conductance between two electrodes placed at different points on the subject’s body.
Electrodes can be placed in any number of locations, although the usual site is on two fingers of the same hand. Other locations have been used to collect GSRs for tasks requiring both hands for performance (e.g., driving). In very general terms, EDA is regarded as a physiological indicator of arousal-related stress-strain processes. Consequently, variations in EDA measures are associated with activities of the sympathetic division of the autonomic nervous system, such that sympathetic division domination tends to occur following the onset of stress or workload. Thus, enhanced EDA is taken as evidence of increased psychological arousal. Conversely, reduced EDA is assumed to reflect increasing parasympathetic division dominance, indicative of decreasing arousal levels. In laboratory settings, EDA measures have been used for investigating workload associated with a wide range of response-orientating situations and have been particularly useful for exploring the habituation process. Following presentation of a novel stimulus there will be a very rapid increase in skin
conductance, producing a peak in the GSR signal (indicative of the onset of an event-related skin potential). Average readings will then increase as long as attention remains directed to the stimulus. After a period of time, either when the stimulus is removed, or the organism’s reaction to it becomes habituated, the level of skin conductance will slowly degrade until it eventually returns to a baseline state. 5.4.4 Electrooculogram (EOG) The electrooculogram is a technique for recording a wide range of eye motions including saccadic movement, smooth pursuit and smooth compensatory movements, nystagmus and eye blinks, amongst others (Shackel, 1967). Of these, measurement of pupil size and endogenous eye blinks has been mentioned most frequently in relation to workload assessment. Endogenous eye blinks (EEBs) are those blinks which occur routinely in healthy adults in the absence of any identifiable, eliciting stimulus (de Waard, 1996). The sensitivity of three components of EEBs – eye blink rate, blink duration and eye blink latency – has been explored as a measure of cognitive workload. In one review of the field, for example, Kramer (1991) concluded that eye blink latency (speed of response of the blink following presentation of stimuli) and closure duration both decrease as a function of increases in task demand. Stern et al (1994), on the other hand, demonstrated that increased eye blink frequency was a strong indicator of fatigue. A similar observation was noted by Mallis and Dinges (2005), but in this case working from the perspective of vigilance decrements. They argued that decreased levels of alertness among subjects in a vigilance task can be identified on the basis of an analysis of eyelid closure in which the eyes close slowly over a period of time. The same authors also noted that significant practical problems need to be overcome in order to collect this data.
Specifically, detection of biobehavioural signs of alertness and drowsiness often requires very intrusive physiological monitoring, which often makes data collection in the field impractical. Pupil size is another ocular measure that has frequently been related to workload (e.g., Kahneman, 1973). Increased task demand is usually accompanied by increased pupil diameter when a person elects to engage with that demand. However, whilst pupil size can be considered a useful measure of increased arousal in the subject, it provides little data to help determine the magnitude of arousal, or indeed, the extent to which this heightened psychological state is likely to last. Consequently, the measure is perhaps best retained for use as one of a battery of tests utilised to determine the onset of arousal.
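Given matched blink onset and offset timestamps from an EOG trace, the two blink components most often cited above (blink rate and closure duration) reduce to simple arithmetic. The sketch below is illustrative only; the function and its parameters are hypothetical rather than drawn from any published blink-analysis tool:

```python
def blink_metrics(onsets_s, offsets_s, recording_s):
    """Endogenous eye blink measures from matched blink onset/offset
    timestamps (seconds) over a recording of length recording_s:
    blink rate (blinks/min) and mean closure duration (s). Rising
    blink rate has been linked to fatigue; shorter closure durations
    to higher task demand."""
    n = len(onsets_s)
    rate_per_min = 60.0 * n / recording_s
    mean_closure = sum(off - on for on, off in zip(onsets_s, offsets_s)) / n
    return rate_per_min, mean_closure
```

Blink latency, the third component discussed above, would additionally require the timestamps of the eliciting stimuli, which this sketch omits.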
5.4.5 Summary There is little doubt that physiological measures can be an invaluable aid in the analysis of mental workload in both the upper and lower ranges. As the above survey reveals, there is now a wealth of evidence to show that a person’s reaction to task demand is reflected across a whole range of physiological systems, including the central nervous system, the sympathetic and parasympathetic divisions of the autonomic nervous system, the cardiovascular system, as well as those systems with responsibility for controlling respiration and visual processing. However, it should also be clear that there are significant obstacles to be overcome before psychophysiology is able to achieve its full potential. Specifically, physiological data collection techniques are expensive to implement, the sessions can be time consuming to plan and run, data analysis can be problematic, and the equipment used is invasive and may possibly interfere with performance of the primary task in the same way as occurs in dual task performance. Whilst it is perfectly possible that these shortcomings will be overcome in the next few years with the advent of wearable computers which routinely monitor human physiology, these devices do not yet exist in forms which offer reliable measures of physiological status (Ward and Marsden, 2004; Picard and Klein, 2002). 5.5
Task Loading Methods
The final types of measures discussed here take an engineering approach to workload assessment. Task loading techniques aim to measure workload based upon an estimate of task demands compared with the resources required to satisfy these demands. In addition, the psychological and environmental factors known to influence performance (e.g., by influencing the effectiveness with which these resources are utilised) may be evaluated. Task loading models come in two main forms. As the name suggests, time-based evaluations tend to assume that task demands arise primarily in relation to time constraints and therefore provide indices which define workload as a function of the time needed to perform the task given the amount of time available to do the job. Values are set (typically between 25-75% fully loaded) which define acceptable workload limits. Cognitive transaction models, on the other hand, tend to evaluate the extent to which the structure of the task is likely to impact on the quality of human performance given the presence or absence of known performance influencing factors. Cognitive demands are typically calculated using tables which encode known cognitive loadings associated with the performance of particular activities. Various examples of the two types of task load model are provided in the following sections. 5.5.1 Time Based Task Loading Models A flow chart describing a prototypical time based, task loading approach to mental workload assessment is shown in Figure 20 below. This scheme was first used to assess the control room design of the THORP nuclear
reprocessing facility and was later refined by Penington et al (1993) and used to measure task demands implicit within the design of a newly automated chemical process control room. In the first stage of the assessment a task analysis is performed in which all the activities prescribed by the design of the process are documented using formal methods such as Hierarchical Task Analysis (HTA). HTA requires the analyst to decompose critical tasks into goals, sub-goals, and base-level events and to specify the plans which will describe how these activities are to be blended to produce an optimal job description.
Figure 20 Time-based workload assessment methods
In the second stage, each activity is plotted along a timeline. Two sets of timings are usually collected. The first timings relate to the amount of time needed to perform each of the prescribed activities. The second set describes critical process times, which may or may not be dependent upon the actions of the operator and/or operating team. When the method is being used prospectively, timing data is obtained using consensus panels of experts and operators. In the third stage of analysis, workload is calculated and a time plot is produced to illustrate how workload demands vary throughout the duration of a shift. In the specific case discussed by Penington et al (1993), workload was calculated using the formula: Workload = 100 × [SH + (TT × N)] / SL
Where: SH = Shift Handover Time, TT = Total Task Time, N = Number of operations per shift, and SL = Shift Length. All timing information was specified in minutes. Time-based evaluations are typically used to compare the workload implications of two or more systems and can be used either concurrently or, as in the case of THORP, prospectively. Time-based evaluations are often used to evaluate the adequacy of manning levels, where peak workload levels can often be managed by redistributing work activities among the members of the operating team. An example of the output from a recent time-based evaluation of train driver workload is shown in Figure 21. The data were collected using the recently developed Train Driver Mental Workload (TDMW) analysis tool (e.g., RSSB, 2005).
Figure 21 Sample time line graph (RSSB (2005) - Train Driver Mental Workload: The Time Line Analysis Guidance Note)
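The calculation in the Penington et al (1993) formula above is easily reproduced; the example figures in the comment are invented for illustration:

```python
def timeline_workload(sh, tt, n, sl):
    """Time-based workload (Penington et al, 1993):
    Workload = 100 * (SH + TT * N) / SL, where SH = shift handover
    time, TT = total task time per operation, N = number of operations
    per shift and SL = shift length, all in minutes."""
    return 100.0 * (sh + tt * n) / sl

# e.g. a 30 min handover plus 60 five-minute operations in an
# 8-hour (480 min) shift occupies 68.75% of the shift.
assert timeline_workload(30, 5, 60, 480) == 68.75
```

A value in the typically acceptable 25-75% band mentioned above would pass; repeating the calculation over successive intervals of the shift yields the kind of timeline plot shown in Figure 21.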
5.5.2 Task Analysis Workload Methodology (TAWL) The Task Analysis Workload (TAWL) Methodology is a task load approach to workload assessment developed within the context of military operations (specifically the US Army Light Helicopter Experimental Program – LHX). The method places primary emphasis on workload assessments based on construction of descriptions of the task or set of tasks to be performed, which can take either a descriptive or prescriptive form. TAWL is a partially time-based analysis technique insofar as timing data can be incorporated into the evaluation. However, this data is only used where time constraints are deemed to be an important performance influencing factor.
The TAWL approach has dominated US Army workload evaluations since the late 1980s. According to Mitchell (2000), present tools originate from two TAWL software-based tools developed in the early 1990s. MAN-SEVAL (Manpower System Evaluation Tool) was a workload assessment module incorporated into a suite of human performance assessment tools known as the Hardware Versus Manpower III (HARDMAN III) work package (cf: Allender et al, 1997). CrewCut (Little et al, 1993) provided more complex evaluations of workload and was developed to model dynamic relationships between workload and task performance.
MAN-SEVAL
System designers used the MAN-SEVAL module to predict likely workload demands for military systems at an early design stage. Assessments were achieved using VACP modelling techniques, in which workload is characterised in terms of the demand placed upon the visual, auditory, cognitive and psychomotor components of human cognition. VACP modelling is a practical implementation of the multiple resource theoretic models discussed in Section 3. In the first phase of a MAN-SEVAL assessment, a task analysis is constructed and areas requiring dual task performance are singled out for special attention. Each sub-task is then categorised according to the VACP model and rated using the values shown in Figure 15 below. Workload metrics can be computed in the range 1 to 7 (with 7 representing full resource deployment) using one of several schemes, dependent upon the issue being considered at a particular time.
CREWCUT
CrewCut is a tool used to compare the operating team workload resulting from the particular design of crew workstations. The application allows designers to consider how an operating team might react to different task demands using a default set of workload management strategies incorporated into the software tool.
The relationship between task demands and team performance with respect to workload is explored in the method using discrete event simulations, which are encoded using a task network simulation language (e.g., Hart and Wickens, 1990). CrewCut assumes that the evaluation of workload management allows designers to account for the fact that people adopt coping strategies to deal with workload demand. At the start of an assessment, a threshold value is set to denote the point at which task demand is considered excessive (see Figure 8, Section 2.1). The CrewCut assessment module provides six workload management strategies for dealing with overload conditions, and these are summarised in Figure 23.
Task Descriptors                                                    WL Rating

Visually Unaided Tasks (naked eye)
  Visually register or detect object                                    3.0
  Visually discriminate or detect visual difference                     5.0
  Discrete visual inspection or check in static conditions              3.0
  Visually locate or align                                              4.0
  Visually track or follow target (maintain orientation)                4.4
  Read                                                                  5.0
  Scan or search monitor for multiple possible conditions               6.0

Visually Aided Tasks (with NVGs)
  Visually register or detect object                                    5.0
  Visually discriminate or detect visual difference                     7.0
  Discrete visual inspection or check in static conditions              5.0
  Visually locate or align                                              5.0
  Visually track or follow target (maintain orientation)                5.4
  Scan or search monitor for multiple possible conditions               7.0

Auditory
  Detect or register sound                                              1.0
  Orientate to sound (general orientation or attention)                 2.0
  Orientate to sound (selective orientation or attention)               4.2
  Verify auditory feedback (detect occurrence of anticipated sound)     4.3
  Interpret semantic content of message (simple)                        3.0
  Interpret semantic content of message (complex)                       6.0
  Detect auditory difference between similar sounds                     6.6
  Interpret sound patterns (e.g., pulse rates, etc)                     7.0
  Speech (simple)                                                       2.0
  Speech (complex)                                                      4.0

Cognitive
  Automatic processing (simple association)                             1.0
  Selection between alternatives                                        1.2
  Sign or signal recognition                                            3.7
  Evaluation or judgement (single aspect)                               4.6
  Encoding or decoding recall                                           5.3
  Evaluation or judgement                                               6.8
  Estimation, calculation, conversion                                   6.8
  Rehearsal                                                             5.0

Psychomotor
  Discrete actuation                                                    2.2
  Continuous adjustive                                                  2.6
  Manipulative                                                          4.6
  Discrete adjustment                                                   5.5
  Symbolic production                                                   6.5
  Serial discrete manipulation                                          7.0

Figure 22 VACP workload ratings for task components
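The channel-based logic of a VACP assessment can be sketched as follows. This is an illustrative tally only: the sub-tasks and their channel ratings are hypothetical (chosen in the style of Figure 22), and the rule of summing concurrent demands per channel and flagging totals above 7 is just one of the several possible computation schemes mentioned above.

```python
# Illustrative VACP tally. The sub-tasks and ratings below are assumed
# values in the style of Figure 22; the summation/overload rule is one
# possible scheme, not the only one used by MAN-SEVAL.

# Hypothetical sub-tasks performed concurrently, each rated on the four
# VACP channels (Visual, Auditory, Cognitive, Psychomotor), scale 1-7.
subtasks = {
    "track target visually": {"V": 4.4, "A": 0.0, "C": 1.0, "P": 0.0},
    "interpret radio message": {"V": 0.0, "A": 6.0, "C": 5.3, "P": 0.0},
    "adjust heading": {"V": 3.0, "A": 0.0, "C": 1.2, "P": 2.6},
}

# Sum the demand on each channel across the concurrent sub-tasks.
channel_load = {ch: 0.0 for ch in "VACP"}
for ratings in subtasks.values():
    for ch, value in ratings.items():
        channel_load[ch] += value

# Flag channels whose summed demand exceeds the scale maximum of 7
# (full resource deployment), indicating a potential overload.
overloaded = [ch for ch, load in channel_load.items() if load > 7.0]
print(channel_load)
print("Overloaded channels:", overloaded)
```

Here the visual and cognitive channels exceed the scale maximum, which is exactly the kind of dual-task conflict the first phase of a MAN-SEVAL assessment is intended to single out.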
Strategy  Description

1  No effect: All tasks are performed, regardless of workload values. This is the default strategy.

2  Do not begin next task: The start of the following task is deferred or cancelled by the operating team. This strategy is known as "task shedding".

3  Perform tasks sequentially: Time sharing between tasks can occur when the operating team perform task elements sequentially.

4  Interrupt task performance: Performance of the ongoing task is interrupted in favour of a later but more pressing task or tasks; performance of the interrupted task restarts when workload falls to an acceptable value.

5  Reallocate next task to contingency operator: If a spare operator exists, performance of certain tasks can be reallocated to the spare person. On occasions the contingency operator can be an automated system (e.g., engagement of autopilot).

6  Reallocate task: Reallocate the ongoing task to a contingency operator or automated system.

Figure 23 Predefined Workload Management Strategies for CrewCut
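The report gives only the one-line form of a custom strategy ("If P > H, then 3, else 4"), so the following Python sketch of how such a rule might select among the predefined strategies is hypothetical; the `choose_strategy` function and the threshold check are assumptions for illustration, not CrewCut's actual statement syntax.

```python
# Sketch of a CrewCut-style custom workload management rule. The encoding
# is hypothetical; CrewCut itself expresses such rules as if-then-else
# statements over the variables P, H, T and S described in the text.

STRATEGIES = {
    1: "No effect",
    2: "Do not begin next task (task shedding)",
    3: "Perform tasks sequentially",
    4: "Interrupt task performance",
    5: "Reallocate next task to contingency operator",
    6: "Reallocate ongoing task",
}

def choose_strategy(P, H, T, S):
    """Example custom rule from the text: 'If P > H, then 3, else 4',
    applied only when total workload T exceeds the threshold S
    (the threshold check is an assumption added for the sketch)."""
    if T <= S:
        return 1            # below threshold: default, no effect
    return 3 if P > H else 4

# Next task has priority 4; the highest concurrent priority is 2; the
# operator's projected workload (9.5) exceeds the threshold (7.0).
print(STRATEGIES[choose_strategy(P=4, H=2, T=9.5, S=7.0)])
```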
In addition to the default strategies, system designers can also create their own strategies for the operator to use. These are created by combining predefined system variables and arithmetic and logical operators into "if-then-else" statements. The variables used by designers to establish the workload management strategy are:

P: Task priority (a value between 1 and 5)
H: Highest priority for any concurrent ongoing task set
T: Total workload for the operator after commencement of the next task
S: The operator's workload threshold value

An example of a defined strategy might be: If P > H, then 3, else 4, where 3 and 4 are the predefined strategies shown in Figure 23 above.

CrewCut output is used to optimise work flows in (usually) partially automated work systems. When potential problems with workload are predicted by the modelling activity, designers have the opportunity to revise job designs and information flows to allow effective management of foreseeable workload problems. Sometimes workload can be managed by making revisions to the design of the workstation. Sometimes task demand can be managed by changing procedure, and on other occasions, when other means fail, by increasing manning levels.

IMPRINT and WinCrew

The latest releases of MAN-SEVAL and CrewCut are presently available as options in the US Army improved performance research integration tool
(IMPRINT)4. MAN-SEVAL has been retained in this suite of tools to provide quick and ready data of use in the early system design phase of military systems development. CrewCut has been retained as an advanced workload analysis model intended for use during the detailed engineering design phase. The CrewCut module is also available as a stand-alone, commercially available application now called WinCrew.

5.5.3 Cognitive Task Analysis (OFM-COG)

The third example of workload assessment using task loading measures is interesting because the method was developed specifically for use in a maritime context, to assess the workload implications of ship-borne automation systems. The approach is based upon a state-transition type of task analysis, which is used to develop an Operator Function Model (OFM). The OFM provides a representation of the operator functions, sub-functions, behaviours and information needs, constructed using the language and methods of discrete control event modelling. The technique was first developed in the mid 1980s and has been used on several occasions to evaluate operator behaviour in high-risk work environments (e.g., Miller, 1985; Mitchell and Miller, 1986). Mitchell and Miller claim that OFMs provide a rich, dynamic, task analytic structure for defining job profiles, which can be employed as an alternative to traditional static task analysis methods (e.g., Mitchell, 1987). In order to apply the approach to the evaluation of mental workload, Lee and Sanquist (2000) extended the mathematical base of the OFM model to include a description of the cognitive transactions that impose cognitive load on the human operator. The resultant model, which has become known as OFM-COG, uses tables to compute cognitive load from the OFM task description. An example of one such table is provided in Figure 24.
Column 1 of the table describes the main cognitive transactions which may be used in the construction of the cognitive Operator Function Model component, together with a brief description of each transaction's function. Column 2 indicates the category of human information processing (HIP) implicated in completing the transaction. The HIP model used to drive the analysis is relatively simple and has only three information processing stages: Information Acquisition, Information Handling and Information Interpretation. Column 3 defines the cognitive resource requirement associated with the cognitive transaction. Lee and Sanquist propose nine resource types, as shown in Figure 25:
4 The tools are not generally available, except in the case of CrewCut, which has been released in commercial format as WinCrew.
Cognitive Agent Task

Input Select: Selecting what to pay attention to next
Filter: Straining out unimportant data streams
Detect: Is something there?
Search: Looking for something
Identify: What is it, what is its name?
Prepare Message: Prepare collection of symbols for sending as a meaningful statement
Queue to Channel: Lining up a process for near-future performance
Code: Translate item from one form to another
Transmit: Move something from one place to another
Store: Keep something for future use
Store in Buffer: Hold something temporarily
Compute: Figure out something logically or mathematically to answer a defined problem
Edit: Arranging or correcting things according to the rules
Display: Show something that makes sense
Purge: Clear data
Reset: Getting ready for some different action
Count: Keep track of how many
Control: Change an action according to a plan
Decide/Select: Choose a response to fit the situation
Plan: Matching resources in time to expectations
Test: Are things what they should be?
Interpret: What does this mean?
Categorise: Define and name a group of things
Adapt/Learn: Making and remembering new responses to a learned situation
Goal Image: A picture of the task well done

Each transaction is assigned to one of the three information processing categories (Acquisition, Handling or Interpretation), together with the HIP resources required to complete it, drawn from the nine resource types listed in Figure 25.

Figure 24 Miller's cognitive task transaction list and required IP resources
• Perceptual Sensitivity
• Perceptual Discrimination
• Selective Attention
• Distributed Attention
• Sustained Attention
• Working Memory
• Long-term Memory
• Response Precision
• Processing Strategy

Figure 25 The Nine Cognitive Resource Types
The presumed relationship between each information processing category and its associated resource demand is summarised in Figure 26.

Information processing category   Resource Demand

Information Acquisition           Perceptual sensitivity; Perceptual discrimination;
                                  Working memory; Response precision

Information Handling              Selective attention; Sustained attention;
                                  Distributed attention

Information Interpretation        Long-term memory; Processing strategy

Figure 26 Resource demand for each presumed stage of information processing
The OFM-COG workload evaluation is essentially a qualitative analysis, which uses expert judgement to explore the implications of task structure on mental workload. A table is constructed, similar to that required for a Failure Modes and Effects Analysis (FMEA), in which cognitive transactions are used to seed consideration of a variety of factors that will contribute to workload. For example, Figure 27 shows the results of an evaluation of the sub-function "Course Execution", which is deemed to consist of four activities: (a) Determine position, (b) Record position, (c) Monitor progress, and (d) Coordinate with VTS and Pilots. This task description is shown in Column 1 of the table. Column 2 records the transaction type; these are obtained from the descriptions already provided in Figure 24. It should be noted that the term cognitive agent is used here in its broadest sense and can refer to transactions implemented by the crew member or by automated systems. In column 3, the input channel is recorded; location of position uses GPS data, and this is shown in the analysis. Columns 4 and 5 note the resource demand (taken from Figure 26) and specify the expected output. The final column records the presence of factors that will influence efficient task performance.
Figure 27 OFM-COG Analysis of track keeping sub-function within ECDIS
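The construction of rows in this FMEA-like table can be sketched as follows. The stage-to-resource mapping is taken from Figure 26, but the specific entries (the output and influencing factors shown) are illustrative assumptions rather than the actual contents of Figure 27.

```python
# Illustrative OFM-COG analysis rows in the FMEA-like column format the
# text describes for Figure 27. The mapping of stages to resources is
# taken from Figure 26; the example entries are assumptions.

# Stage-to-resource mapping from Figure 26.
STAGE_RESOURCES = {
    "Acquisition": ["Perceptual sensitivity", "Perceptual discrimination",
                    "Working memory", "Response precision"],
    "Handling": ["Selective attention", "Sustained attention",
                 "Distributed attention"],
    "Interpretation": ["Long-term memory", "Processing strategy"],
}

def analysis_row(activity, transaction, stage, input_channel, output, factors):
    """Build one table row: the resource demand column is seeded from the
    transaction's information processing stage via the Figure 26 mapping."""
    return {
        "activity": activity,
        "transaction": transaction,
        "input": input_channel,
        "resource demand": STAGE_RESOURCES[stage],
        "output": output,
        "influencing factors": factors,
    }

# Hypothetical row for the 'Determine position' activity using GPS input.
row = analysis_row(
    activity="Determine position",
    transaction="Detect",
    stage="Acquisition",
    input_channel="GPS data",
    output="Current position fix",
    factors=["Display clutter", "Signal degradation"],
)
print(row["resource demand"])
```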
5.5.4 TAD & TADAM

The final example of a task loading approach to workload assessment discussed here is perhaps the most complex, although it arguably has the greatest potential benefit for workload assessment in a maritime context. The approach is known as the Target Audience Descriptor (TAD) model, and a TAD assessment is achieved through application of the TAD Application Methodology (TADAM). TAD and TADAM were developed in the mid 1990s to assist in workload evaluations of air traffic control scenarios, and were used particularly to explore the implications of automation on the ATCO role. TAD had three main purposes:

• To help identify possible areas of shortfall in the ability of ATCOs to address demand-specific task requirements (especially those involving performance of safety critical tasks)
• To evaluate changes to the ATCO role implied by proposed deployment of new technology
• To help specify skill bases and profiles useful for selection and recruitment of ATCO trainees
The basic premise of the method is that prediction of human performance needs to consider the demands placed on the operator in the light of the abilities (or resources) that he or she possesses to meet those demands. Essentially, human error is viewed as arising from a demand/resource mismatch. From this perspective, if a method can be developed for measuring the cognitive demands arising from a task, and the abilities of people to meet these demands, it would be possible to identify possible mismatches for existing or redesigned tasks. This, in essence, is what the TAD attempts to do. Any mismatches identified can then be overcome using the two classical human factors approaches. The 'fitting the person to the job' approach would either select or train the operator to eliminate the specific skill deficiency. The 'fitting the job to the person' strategy would involve a redesign of the task to eliminate the demands for which the required abilities were inadequate. In either case, a method for explicitly identifying the nature of the demand/resource mismatch would be essential. Implementation of a TAD analysis via the TAD Application Method (TADAM) involves three stages of work.

TAD Construction

In the first stage, a Target Audience Description (TAD) is prepared. The TAD is a profile of measures that are used to describe populations of stakeholders responsible for the completion of a particular job or work role. The ATCO TAD, for example, is a description of a given population of Air Traffic Controllers (ATCOs). Any TAD is made up of two different types of measures. The first set of measures is used to denote cognitive abilities: the range of cognitive functions which all people possess, but in different proportions (e.g., memory, convergent/divergent thought, perceptual speed/accuracy). Cognitive abilities are deployed in response to prevailing task demands and will influence how well the individual is able to respond to those demands. Cognitive abilities are generally defined and informed by cognitive theory, empirical study within the target population and error data.

A second set of descriptor measures is used to characterise an individual's personality: measurable factors that describe an individual's personality style and which are expected to influence their willingness and/or ability to stay on task (e.g., decisiveness, tenacity, stress tolerance). Again, identification and selection of appropriate personality attributes are informed by cognitive theory, assessment of the target population and error data.

Estimate of task demands (task demand profiling)

In the second stage, a task analysis is performed to assess the cognitive demands that are implicit within a task or proposed task. The results of the task analysis are captured in a Task Demand Profile (TDP), in which the task is described in terms of a set of Task Demand Descriptors (TDDs): descriptions of the cognitive primitives (e.g., detect, assess, diagnose) used to describe cognitive tasks in work environments. A partial list of TDDs is shown in Figure 28 for reference purposes.
Task Demand Descriptor    Definition

Detect: Become aware of the initial stages of a problem
Ascertain: Observe or collect data from instruments based upon a specific need (e.g. confirmation of an initial diagnosis)
Recognise: Identify system state directly from a pattern of indicators
Interpret: Evaluate the implications of an event that has occurred
Assess: Evaluate and select alternative goals
Formulate: Plan the path by which a goal will be successfully achieved
Decide: Choose or formulate a procedure to achieve the required objective
Act: Execute the chosen procedure following its selection
Initiate/Comply: Select the appropriate procedure
Check: Perform a visual examination to determine status
Monitor Status: Continually evaluate status, visually or aurally, in order to detect deviance from the expected state
Event Tracking: Continually track a sequence of events in order to detect deviance from an expected plan
Diagnose: Identify a pattern of indicators and associate them with an underlying cause or causes
Communicate: Send information to another person, or obtain or acquire an incoming message
Liaise: Communicate with more than one person or agency
Negotiate: Co-ordinate and agree activities with more than one person or agency

Figure 28 Sample list of Task Demand Descriptors (TDDs)
Development of a Task Demand Profile involves moving from the description of the task elements (captured in the task analysis) to an alternative description based on the Task Demand Descriptors defined in Figure 28. The purpose of this process is to provide a controlled set of terms (a lexicon) with which to express the mental and physical demands that are common to all tasks. The terms in the lexicon set out in Figure 28 can be used in a number of ways for this purpose. For example, if the terms are linked together by means of a flow chart, they can be used to express the way in which Task Demand Descriptors (TDDs) such as detection, communication, action and checking occur as a task progresses. Typically, such a process will be both iterative and interactive, with certain demands such as action, checking and event tracking occurring repeatedly. Such a representation of a task structure is useful for identifying the nature of the demands on the operator and where these may give rise to errors. The number of times that the various task demand descriptors occur within the task provides a simple measure of the overall level of demands. A more sophisticated estimate of task demands is obtained via the creation of the task demand profile, as shown in Figure 29. The abbreviations in the table are as follows:

RR: Resource Requirements
F: Frequency of occurrence of a TDD in a task
N: Sum of the frequencies of occurrence of each TDD in column 1
F/N: Relative frequency of each TDD in the task (calculated as F/N for each TDD frequency)

The frequencies and relative frequencies of the TDDs present in a sample task are shown in the second and third columns of Figure 29 and can be interpreted as follows. The most extensive demands arise from the TDDs 'Act' and 'Communicate', each of which accounts for 20% of the total task demand (see column 3). Diagnose and Event Tracking each account for 1% of the demands, and the rest are distributed across the TDDs as shown in column 3 of the table. This process provides a comprehensive profile of the demands arising from the tasks. However, it does not yet describe workload levels; these are evaluated by examining the interactions arising from the deployment of resources to meet the task demands.
Task Demand        Freq   F/N   Mem   Att   Verb  Spat  PSA   Motor  DM    CT    Avg RR
Descriptor (TDD)   (F)                                                            per TDD

Detect               7   0.03   0.40  0.75  0.15  0.55  0.70  0.10   0.45  0.45   0.444
Ascertain           12   0.05   0.40  0.80  0.40  0.50  0.60  0.15   0.40  0.50   0.469
Recognise           10   0.04   0.70  0.60  0.20  0.70  0.50  0.20   0.30  0.30   0.438
Interpret           16   0.07   0.70  0.70  0.30  0.60  0.40  0.20   0.60  0.60   0.512
Assess              11   0.05   0.55  0.60  0.20  0.20  0.40  0.20   0.80  0.70   0.456
Formulate           10   0.04   0.50  0.70  0.20  0.40  0.25  0.15   0.75  0.65   0.450
Decide              16   0.07   0.40  0.60  0.35  0.60  0.45  0.20   0.90  0.85   0.769
Act                 46   0.20   0.60  0.40  0.20  0.35  0.45  0.90   0.30  0.30   0.438
Initiate/Comply     11   0.05   0.75  0.55  0.30  0.30  0.60  0.60   0.75  0.60   0.556
Check               13   0.06   0.55  0.70  0.15  0.45  0.70  0.30   0.35  0.20   0.425
Stat. Monitoring    11   0.05   0.60  0.75  0.40  0.60  0.60  0.25   0.55  0.45   0.525
Event Tracking       3   0.01   0.70  0.80  0.40  0.55  0.54  0.30   0.60  0.55   0.555
Diagnose             2   0.01   0.60  0.70  0.35  0.60  0.44  0.15   0.90  0.75   0.561
Communicate         45   0.20   0.40  0.70  0.70  0.20  0.35  0.20   0.30  0.30   0.394
Liaise               9   0.04   0.40  0.60  0.75  0.30  0.30  0.15   0.40  0.40   0.413
Negotiate            7   0.03   0.40  0.60  0.80  0.25  0.30  0.10   0.70  0.60   0.469

Abilities Requirements
Profile (ARP)                   0.54  0.64  0.62  0.41  0.47  0.37   0.49  0.46

Ability columns: Mem = Memory, Att = Attention, Verb = Verbal Skills, Spat = Spatial Visualisation, PSA = Perceptual Speed and Accuracy, Motor = Motor Skills, DM = Decision Making, CT = Convergent Thinking. F = frequency of occurrence of each TDD in the task; F/N = Task Demand Profile (relative frequency of each TDD); the bottom row is the Abilities Requirements Profile (average RR across all TDDs); the final column is the average resource requirement per TDD.

Figure 29 Demand-Resource Matrix of Cognitive Skills and Abilities for an example task showing Ability Resource Requirements for each demand and the Abilities Requirements Profile (ARP)
Resource Requirements weighted by frequency of demands

Task Demand        Mem   Att   Verb  Spat  PSA   Motor  DM    CT    Overall Demand/
Descriptor (TDD)                                                    Resource Profile (DRP)

Detect             0.01  0.02  0.00  0.02  0.02  0.00   0.01  0.01   0.09
Ascertain          0.02  0.04  0.02  0.03  0.03  0.01   0.02  0.03   0.20
Recognise          0.03  0.03  0.01  0.03  0.02  0.01   0.01  0.01   0.15
Interpret          0.05  0.05  0.02  0.04  0.03  0.01   0.04  0.04   0.28
Assess             0.03  0.03  0.01  0.01  0.02  0.01   0.04  0.03   0.18
Formulate          0.02  0.03  0.01  0.02  0.01  0.01   0.03  0.03   0.16
Decide             0.03  0.04  0.25  0.04  0.03  0.01   0.06  0.06   0.52
Act                0.12  0.08  0.04  0.07  0.09  0.19   0.06  0.06   0.71
Initiate/Comply    0.04  0.03  0.01  0.01  0.03  0.03   0.04  0.03   0.22
Check              0.03  0.04  0.01  0.03  0.04  0.02   0.02  0.01   0.20
Stat. Monitoring   0.03  0.04  0.02  0.03  0.03  0.01   0.03  0.02   0.21
Event Tracking     0.01  0.01  0.01  0.01  0.01  0.00   0.01  0.01   0.70
Diagnose           0.01  0.01  0.00  0.01  0.00  0.00   0.01  0.01   0.50
Communicate        0.08  0.14  0.14  0.04  0.07  0.04   0.06  0.06   0.63
Liaise             0.02  0.02  0.03  0.01  0.01  0.01   0.02  0.02   0.14
Negotiate          0.01  0.02  0.03  0.01  0.01  0.00   0.02  0.02   0.12

Average DRP for whole task (CMWL measure)                            0.31

Ability column abbreviations as in Figure 29.

Figure 30 Resource Requirements for each TDD weighted by the relative frequency of the TDDs
Workload assessment

Workload assessment involves the creation of two further profiles: the Abilities Requirements Profile (ARP) and the Demand/Resource Profile (DRP). The first stage in the process is to evaluate the proportion of the abilities in the TAD likely to be required to meet the demands represented by each of the Task Demand Descriptors in Figure 28. This uses a structured expert judgement process in which a small team of psychologists and human factors experts independently rate the proportion of each ability that would be required to perform the task effectively on each occasion that a TDD is invoked. These ratings are set out in Figure 29. Thus, if the TDD 'Detect' was required each time a radar display had to be scanned, and this demand occurred seven times during a watch-keeping task, these demands are rated as involving 40% of the available memory resource, 75% of the available attention resource, and so on, as shown in the first row of Figure 29. It is assumed for the purpose of this example that the proportion of the resources required remains the same during each demand, but this assumption could be modified if a more detailed analysis were required. Figure 29 contains a subset of the TDDs shown in Figure 28, as it is assumed that only these are relevant to the example task assessment provided. If the Resource Requirements (RR) in Figure 29 are averaged vertically for each ability, the resulting scores (the bottom row) represent the proportions of each ability that are required across the TDDs invoked by the task. These averages are called the Abilities Requirements Profile (ARP) for the task (or task set). If the Resource Requirements in Figure 29 are averaged horizontally, i.e. within TDDs (see the last column of Figure 29), this provides a measure of the average resource requirements to satisfy each of the TDD demands.
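The averaging and frequency-weighting steps described above can be sketched numerically. The example below is deliberately reduced to three TDDs and three abilities, with illustrative Resource Requirement values; a full assessment would use the complete matrices of Figures 29 and 30.

```python
# Reduced numerical sketch of the TADAM workload computation: the ARP,
# the frequency-weighted Demand/Resource Profile (DRP) and a simple CMWL
# index. Three TDDs and three abilities with illustrative RR values; the
# full assessment uses all TDDs and eight abilities (Figures 29 and 30).

abilities = ["Memory", "Attention", "Motor Skills"]

# Resource Requirements: proportion of each ability consumed per demand.
rr = {
    "Detect":      [0.40, 0.75, 0.10],
    "Act":         [0.60, 0.40, 0.90],
    "Communicate": [0.40, 0.70, 0.20],
}
freq = {"Detect": 7, "Act": 46, "Communicate": 45}   # F for each TDD

N = sum(freq.values())
tdp = {t: f / N for t, f in freq.items()}            # relative frequencies

# Abilities Requirements Profile: average RR across TDDs for each ability.
arp = [sum(rr[t][i] for t in rr) / len(rr) for i in range(len(abilities))]

# DRP: weight each RR entry by the TDD's relative frequency, sum per TDD.
drp = {t: sum(v * tdp[t] for v in rr[t]) for t in rr}

# Simple CMWL index: the average of the DRP values.
cmwl = sum(drp.values()) / len(drp)
print({t: round(v, 2) for t, v in drp.items()}, round(cmwl, 2))
```

As in the full example, the high-frequency 'Act' demand dominates the profile, so interventions targeting action demands (or the resources they consume) would reduce the index most.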
However, in order to develop a CMWL index from these interactions between demands and resources, this profile of resource requirements needs to take into account the frequency of occurrence of the TDDs within the task or task set being evaluated. Each of the Resource Requirement entries in Figure 29 (columns 4-11) is multiplied by the Task Demand Profile (column 3), which is the relative frequency with which each TDD demand is invoked. The results of this process are the entries in columns 2-9 of Figure 30. The entries in the row for each TDD in Figure 30 are summed to give the Demand/Resource Profile (DRP) in the last column. The Demand/Resource Profile is an attractive approach to CMWL measurement, as it directly measures the interaction between demands and resources, which is central to the concept of CMWL. This column specifies the separate contribution of each of the task demands (TDDs) to the workload. Thus the amount of workload generated by the 'Act' component of the task demands (0.71) is considerably higher than that generated by the 'Recognise' component (0.15). This analysis would therefore suggest that reducing the action demands in the task would have a greater impact in reducing workload than reducing the recognition demands. Alternatively, interventions to reduce workload by improvements in resources, e.g. by means of training, would best be directed at improving the motor skills component of the 'Act' demand, since this is the greatest contributor (0.19) to the greatest demand (0.71) in the Demand/Resource Profile in the last column of Figure 30. A simple way to create a CMWL index for the task considered in this example is to average the values in the DRP column, giving a value of 0.31. If the TAD approach to workload evaluation were used in this way, it would be very important to apply the process at an appropriate level of aggregation for the task set. For example, if a very large task set were assessed in this way, the overall level of loading, as measured by the index, might appear to fall within an acceptable range. However, this might conceal the presence of certain tasks that exceed the acceptable workload, and which therefore have a higher probability of failure. Thus, the workload analysis needs to be preceded by a comprehensive task analysis stage.

5.5.7 Summary of Task Loading Models

Task loading models provide workload analyses that are driven largely by consideration of task demands. This approach has the advantage that the early stages of evaluation can be focused upon objectively verifiable data. As an example, the TAD Application Methodology comprises a set of analysis tools that can be used to assess cognitive workload in a variety of settings. The method comprises the following elements:

Task Demand Profile (TDP): The relative frequency with which each Task Demand Descriptor occurs in the specific task.

Abilities Requirements Profile (ARP): The average amount of each ability required to perform the task (obtained by averaging across Task Demand Descriptors).
Demand Resource Profile (DRP): The combination of task demands (TDDs) and the resources required to satisfy these demands, weighted by the frequency of occurrence of the demands.

One of the major advantages of the TAD Application Methodology is that it allows tasks to be analysed from several perspectives. These include a task-directed and an abilities-directed perspective, and a consideration of the interaction between abilities (resources) and demands. These perspectives allow human factors interventions to manage workload to be based on two alternative (or complementary) strategies. Task demands can be manipulated, e.g. by the provision of electronic support; alternatively, the effective personnel resources available can be increased, e.g. through training or selection policies that provide an improved set of abilities to satisfy demands. In terms of the usefulness of the TADAM approach as a workload measurement tool, the most vulnerable aspect of the methodology is the initial
assignment by the expert judges of the relative contributions of each of the components of the TDDs to the task. If this process can be shown to be consistent and reliable, then the TAD approach is very attractive because of its capability to evaluate the separate contributions of demands and resources to CMWL.

5.6 Influence Diagrams
Influence Diagrams (IDs) were first developed in the context of modelling and evaluating complex decision problems involving a range of stakeholders. They provide a graphical representation and process for modelling complex relationships between variables that influence the probability of outcomes of events. In the eighties, some researchers saw the potential for applying IDs to evaluating the probabilities of human errors in nuclear power emergencies and to modelling more general major accident scenarios. This approach emphasises the value of the ID as a means for identifying and structuring the factors that influence human and system failures, based on information gathered from a variety of sources. These include incident reports, insights from people who work in the system of interest, and any relevant scientific literature. The ID approach has mainly been applied to the analysis of human error in safety critical systems. However, a simple extension of the approach allows it to be applied to the measurement of Cognitive Mental Workload. In the following section, we will describe how the ID has been applied to modelling the factors influencing human error, and then illustrate how the technique can be modified for assessing cognitive mental workload.

5.6.1 Using Influence Diagrams to Model Causal Relationships

The development of a comprehensive model of the factors that influence the likelihood of occurrence of an event is an essential first step in many aspects of systems reliability assessment. In fact, any form of predictive modelling relies on knowledge of these factors. In general, the probabilities of hardware component failures are influenced by a relatively small set of factors, which do not vary in their effects. It is therefore usually straightforward to aggregate the frequency of failure data across a large number of components to estimate overall failure probabilities, which can then be made available in a database.
These data are then utilised in failure models such as fault trees to assess system reliability in areas such as Probabilistic Safety Analysis (PSA) or Quantitative Risk Analysis (QRA). However, in the case of human failures, the situation is much more complex. Human failures are influenced by a much wider range of variables than hardware, and similar considerations apply to modelling the factors that influence CMWL. In the following discussion, we shall first focus on modelling
human failures, and subsequently show how a similar approach can be applied to modelling CMWL. The ID can be regarded as a method for aggregating together all of the available knowledge of the factors that affect the likelihood of occurrence of a particular event, or that influence some other quantity such as CMWL. It is also possible to construct separate IDs for systems and subsystems at various levels of aggregation, and then to combine these using logical or other relationships to produce a more comprehensive model. In the case of states such as the subjectively perceived overload that a person experiences in a particular situation, where both the nature and the magnitude of the factors that impact on this state are likely to be difficult to quantify objectively, the modelling process has to be able to accommodate a combination of objective, subjective and other types of 'soft' data. In this initial description of the ID we will focus on the primary application of the technique, the quantification of human error probabilities in safety assessments. Later sections will then describe the application of the process to CMWL evaluation.

5.6.2 Example Influence Diagram for a medical application

An ID in its simplest form is a network of conditioning factors that influence the probability of the event of interest. Figure 31 illustrates how an ID could be used to model the factors conditioning the probability of an error in a medical context. This example concerns the injection of incorrect medications into the spinal cord of patients. The box at the top of the ID is the outcome of the scenario. This describes the probability (or frequency) of the event being analysed, given the state of the nodes below it. The boxes or nodes at the next level are the states of the factors that influence the conditional probability of the outcome event at the top of the tree.
The state of any box or node in the diagram can be influenced by the state of the nodes below it, and this node in turn can influence any node above it. Factors that influence the outcome box are called Performance Influencing Factors (PIFs).
Figure 31 Influence Diagram for a medical accident scenario
In the case of the incident under consideration, a number of analyses have suggested that the occurrence of this type of incident is influenced directly by the following PIFs:

Equipment similarity: The extent to which items of equipment (e.g. syringes containing different drugs) are readily confusable.

Level of distractions: If a number of procedures are being carried out simultaneously, or there are distracting events in the area where the drugs are being administered, then the likelihood of the error will be increased.

Operator fatigue: In this context, the 'operator' is likely to be a medical specialist such as a doctor or nurse, who often experiences excessive levels of fatigue.

Quality of labelling: For example, the extent to which the route of injection is clearly indicated in the labelling.

Communication: In incidents of this type, there are typically a number of communication failures, e.g. between the pharmacy and the ward, and between members of the ward team.

Operator competency: The knowledge and skills relevant to the task possessed by the individual or team performing it.

Quality of supervision: The extent to which work is effectively checked and monitored, thus increasing the likelihood of recovery if an error is made.
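As a minimal illustration of the structure described above, the direct influences in Figure 31 could be represented as a mapping from the outcome node to its PIFs. This is a sketch only, not the IDEAS tool itself; the outcome node label is ours.

```python
# Sketch of the Figure 31 influence structure: a mapping from a node to
# the PIFs that directly condition it. PIF names are taken from the text;
# the outcome label "Incorrect drug injected" is an illustrative name.
influence_diagram = {
    "Incorrect drug injected": [
        "Equipment similarity",
        "Level of distractions",
        "Operator fatigue",
        "Quality of labelling",
        "Communication",
        "Operator competency",
        "Quality of supervision",
    ],
}

def influencing_factors(diagram, node):
    """Return the PIFs that directly condition the given node."""
    return diagram.get(node, [])

print(influencing_factors(influence_diagram, "Incorrect drug injected"))
```

In a fuller model, each of these PIF nodes would itself map to the sub-factors that condition it, giving the layered network described in the text.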
In many cases, these PIFs are multi-attributed (i.e. made up of a number of constituent sub-factors), and hence they cannot easily be defined, or indeed measured, along a single dimension. In these cases, the factors are decomposed into their constituent PIFs, to the level at which they become measurable. A combination rule is then used to aggregate measurements at this level to give a score to the PIF to which they are connected. Thus, the first level factor Quality of procedures is decomposed into the factors Quality of ward procedures and Quality of pharmacy procedures. Because these states are difficult to measure at this level of decomposition, they are further decomposed into the three factors of procedure accuracy, availability and usability, all of which can be measured relatively easily. It should be noted that if the same procedures were used on the wards as in the pharmacy, then a single set of the influencing factors of accuracy, availability and usability would have linkages into both Quality of ward procedures and Quality of pharmacy procedures. This is an example of how the ID can model the dependencies that need to be considered in order to obtain accurate results in reliability analyses.

The decision regarding the depth of decomposition is a pragmatic one, depending on the application. In general, the greater the decomposition, the better specified and the more accurate the model. This may be particularly important if the model is to be used for quantitative applications. On the other hand, for some purposes, where accurate assessments of the state of a factor are either unnecessary or impractical (e.g. for highly subjective dimensions such as motivation), it may not be worth decomposing a factor to obtain a greater degree of resolution.

5.6.3 Different types of ID

The preceding example has illustrated an ID for a very specific type of medical scenario.
It is also possible to develop IDs for broad classes of situations, and to combine a number of discrete sub-models, using Boolean AND & OR gates, to produce more complex representations. In Figure 32 below, we illustrate a generic human error failure model, which represents a number of ways in which failures could arise, and then combines these together using an OR gate.
Figure 32 Generic ID human failure model showing use of logical combination rules
5.6.4 Developing the Influence Diagram

The ID is developed by combining information about causal relationships from a variety of sources. These can include research findings, the analysis of incident reports and near misses, and inputs from subject matter experts. Normally, an initial ID is constructed off-line, based on research findings, incident reports and other formal sources of evidence. This ID is then developed further in an interactive group setting, which includes subject matter experts with direct working knowledge of the domain of interest. This group is called a Consensus Group, and is designed to ensure that all available sources of evidence are combined within a highly structured process. The Consensus Group is led by a facilitator, whose role is to manage the group interaction effectively and to ensure that the views of all the participants are captured.

5.6.5 ID Calculations

Evaluation of Weights

Weights are developed in the ID to represent the relative effects of the PIFs acting at each node in the network of influences. Thus, if the three factors influencing the PIF 'Operator fatigue', i.e. Initial fatigue, Quality of breaks, and Time on shift, were evaluated as being equally important, they would each be given a weight of 0.333, because the total contribution of the PIFs is equally distributed between the three factors. If the last two of the three PIFs were half as influential as the first, the 'normalised weights' of 100/200, 50/200 and 50/200, i.e. 0.5, 0.25 and 0.25 respectively, would be assigned.
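The normalisation step can be sketched in a few lines of Python (the function name is ours; a sketch only, not the tool's internal code):

```python
def normalise(raw_weights):
    """Scale raw importance weights so that they sum to 1."""
    total = sum(raw_weights)
    return [w / total for w in raw_weights]

# Three equally important factors each receive a weight of 1/3 (0.333...).
print(normalise([1, 1, 1]))
# First factor twice as influential as each of the other two:
# 100/200, 50/200 and 50/200, i.e. 0.5, 0.25 and 0.25 as in the text.
print(normalise([100, 50, 50]))  # [0.5, 0.25, 0.25]
```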
The assignment of weights is not mandatory in IDEAS. If weights are not assigned explicitly, the model assumes that all factors at a node are of equal importance, and assigns the corresponding weights. Where there is substantive evidence that the equal weights assumption is invalid, different weights should be assigned to the factors in the ID model. The assessment of weights is carried out during the group consensus sessions and is based on a combination of available evidence, including any relevant research data, the field experience of the subject matter experts participating in the session, and information from incident analysis. As described in Section 3.3, weights can also be developed from empirical data from incident investigations.

Evaluation of ratings

The subject matter experts then numerically rate the current state of the bottom level PIFs, as specified by the structure of the Influence Diagram model, on a scale of, for example, 1 to 100. Normally the ends of the scale represent the credibly best or worst PIF conditions that could occur (these end points should not be taken to represent hypothetically best or worst cases, but rather those that could realistically arise or could be achieved). For example, a high level of time stress would be represented by a rating of 90, which would imply an increased level of errors. In the case of level of experience, 90 would represent the rating corresponding to a highly experienced individual, and the most experienced individuals in the system would be rated at 100.

Evaluation of the PIF rating at each node from the contributory PIF ratings and weights

The numerical ratings and weights of the PIFs at each of the nodes of the Influence Diagram are combined to give the derived PIF rating for the node.
The calculation of the derived PIF rating is carried out as follows:

Derived PIF Rating for node j =
  (Rating of node j on contributory PIF1 x weight of PIF1)
+ (Rating of node j on contributory PIF2 x weight of PIF2)
+ ...
+ (Rating of node j on contributory PIFn x weight of PIFn)

Generally:

Derived PIF Rating for node j = Σ Rij Wi

Where:
Rij = Rating of the jth node on the ith factor
Wi = Normalised importance weight for the ith factor (weights sum to 1)

Thus, if three PIFs, for example Quality of procedures, Level of experience and Time available to carry out the task, were rated at 40, 10 and 50 respectively, and the corresponding importance weights were 0.5, 0.25 and 0.25, the derived PIF rating would be calculated as follows:
PIF                                    Quality Rating    Relative Importance    Rating x Weight
                                       (on 0-100 scale)  Weight
Quality of procedures                        40               0.5                    20
Level of experience of personnel             10               0.25                    2.5
Time available to carry out the task         50               0.25                   12.5

Derived PIF Rating (sum of Rating x Weight products) = 35
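The worked example, and the 'what if' variations used later in the sensitivity analysis, can be reproduced with a short Python sketch (the function name is ours):

```python
def derived_rating(ratings, weights):
    """Derived PIF rating for a node: sum of rating x normalised weight."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(r * w for r, w in zip(ratings, weights))

# Quality of procedures, Level of experience, Time available.
weights = [0.5, 0.25, 0.25]
print(derived_rating([40, 10, 50], weights))  # 35.0, as in the table
print(derived_rating([40, 10, 90], weights))  # 45.0 (more time available)
print(derived_rating([90, 10, 50], weights))  # 60.0 (better procedures)
```

Repeating this calculation at every node, from the bottom of the tree upwards, is the 'rippling up' that produces the SLI described next.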
SLI Calculation

The Success Likelihood Index (SLI) is the derived rating at the top of the Influence Diagram that is obtained by repeating the calculations described above for each node in the tree. Once the ratings of the PIFs at the bottom of the ID have been made, the derived ratings for all the other nodes are obtained by carrying out the calculations specified above. The results of the bottom level ratings are 'rippled up' the tree using the weights at each node and the derived ratings from lower down the tree. The resulting SLI at the top of the tree provides an index which represents the extent to which the situation being assessed approaches the best (SLI=100) or worst case (SLI=0) conditions. The SLI can subsequently be calibrated to give an absolute probability, as discussed below.

Calibration

At the top level of the Influence Diagram, the SLI represents a relative measure of the likelihood that a situation being investigated will succeed or fail. In order to convert the SLI scale to a probability scale, calibration is required. If a reasonably large number of failure probabilities can be derived for different conditions, giving rise to different SLIs, then it is possible to perform a regression analysis (or other type of analysis) to find the function that gives the best fit between the SLI values and their corresponding success or failure probabilities (or frequencies). The resulting regression equation can then be used to calculate the probabilities for other events by substituting their SLIs into the equation. If there are insufficient data to allow the calculation of an empirical relationship between the SLIs and error probabilities, then a mathematical relationship has to be assumed.
The simplest assumption is that the relationship is linear, as shown below, and this is recommended in the absence of hard calibration data:

EP = A (SLI) + B

Where:
EP = Event Probability
A, B are constants

If data are available for at least two samples of tasks where average failure probabilities are available, and the average SLIs for these samples have been measured (assuming the same ID model applies), the constants A and B can be evaluated. These can be substituted into the above equation, which can then be used to evaluate probabilities from the SLIs measured in other situations where the same ID model applies. A similar process can be carried out graphically, as shown in Figure 33 below, where the dots represent SLI and error probability data evaluated from four samples of tasks.
[Figure: graph of Human Error Probability (or failure frequency), from 0 to 0.009, against Success Likelihood Index (SLI), from 0 to 100, showing data points from four samples of tasks]

Figure 33 Graphical method for converting the SLI to failure probabilities
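The two-sample case can be sketched as follows. The numbers are invented for illustration; the pairing of a higher SLI with a lower failure probability is assumed from Figure 33, and the function name is ours.

```python
def fit_linear_calibration(slis, probs):
    """Solve EP = A*SLI + B from two (SLI, EP) observations."""
    (s1, s2), (e1, e2) = slis, probs
    a = (e2 - e1) / (s2 - s1)  # slope A
    b = e1 - a * s1            # intercept B
    return a, b

# Hypothetical calibration data: two task samples with average SLIs of
# 20 and 80, and average failure probabilities of 0.008 and 0.002.
a, b = fit_linear_calibration((20, 80), (0.008, 0.002))
print(a * 50 + b)  # predicted failure probability for SLI = 50 (~0.005)
```

With more than two samples, the same constants would instead be obtained by least-squares regression, as described in the text.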
Sensitivity analysis

The nature of the IDEAS technique renders it suitable for 'what if' analyses to investigate the effects of changing the current ratings of the PIFs on the resulting error probabilities. For example, in the example in Section 5.6.5 above, the effect of increasing the time available to carry out the task can be investigated by assigning a rating of 90 instead of 50 to this factor. This increases the SLI from 35 to 45, which would decrease the expected probability of failure. However, an alternative intervention of improving the current quality of procedures from 40 to 90 would produce an even greater increase in the SLI, to 60, and consequently reduce the expected probability of failure even more. The choice between these interventions would need to consider their relative costs and feasibility.

5.6.6 Applying IDEAS to CMWL Assessment

Adaptation of the IDEAS process to develop a CMWL measurement tool requires a simple extension of the approach. An index of the CMWL for a task being performed in a particular context needs to be developed as a function of the Performance Influencing Factors in that situation. In many cases, the model will involve the decomposition of the PIFs into sub-factors to a level at which they can be reliably measured. A simple example of this approach is shown in Figure 34. The following diagrams are taken from an existing computer based Influence Diagram modelling tool called IDEAS, which has been developed in-house at Human Reliability Associates and used in a number of projects over the past ten years. This tool expresses the SLI as a number between 0 (worst case situation, all PIFs at their worst values) and
1 (best case situation, all PIFs optimal). This assessment appears in the top left of the top box of the model. The ratings of the PIFs are on a scale from 1 to 100.

In the context of CMWL measurement, the SLI is interpreted in a different manner from its original meaning as an index of the quality of the PSFs that determine the likelihood of human error. Instead, the SLI is used as a workload index, where SLI stands for 'Seafarers Loading Index'. There is a direct link between this interpretation and the more general meaning of the SLI discussed in previous sections. The SLI can be seen as a loading index, which is also related to the likelihood of error arising from overload. There is no intrinsic difference between this interpretation and that discussed in the medical scenario, except that the SLI considers only factors that contribute directly to workload, rather than including other factors that may not be related to workload but which could still contribute to error probability. In practice, nearly all factors that could affect error probability can be mapped onto a workload scale, because in nearly all tasks a mismatch between demands and resources is a major error mode. This is particularly the case in tasks of a dynamic nature, where systems such as cars, trains and ships are being controlled dynamically by one or more operators. The generalisation is less true in tasks where underload and overload are not likely to occur, but diagnostic errors arising from a misinterpretation of a situation are possible. Given this interpretation, we can use a similar calibration approach to convert a particular workload SLI to an error probability as we have used for the more general interpretation of the SLI discussed earlier (see Figure 33).
Figure 34 Example of an ID for High CMWL situations
In the simplified example shown in Figure 34, the CMWL for a high demand task is influenced by three primary PIFs: Task demands, Personnel resources, and PIFs that increase the severity of demands. These primary PIFs are decomposed into sub-factors. The numerical values in Figure 34 are interpreted as follows, using the PIFs box on the right as an example.
The quality of PIFs assessed by the derived rating in the PIF box is determined by the quality of the automation systems (e.g. radar, GPS), the weather conditions (more severe weather means that the complexity of information processing requirements in general will increase) and the level of fatigue (efficiency of cognitive processing will degrade at high fatigue levels). The ratings assigned to the sub-factors can be interpreted as follows:

Factor                             Initial  Interpretation                                  Range assumptions               Recalculated
                                   Rating                                                                                   Rating
Distractions                         80     There is a high level of distractions           (1=best case, 100=worst case)       80
Automation quality                   50     The quality of the bridge automation is average (1=worst case, 100=best case)       50
Extent of good weather conditions    70     The weather conditions are good                 (1=worst case, 100=best case)       30
Fatigue                              90     The bridge personnel are very tired             (1=best case, 100=worst case)       90

Figure 35 Assessments of PIFs in Figure 34 showing scale reversals
Since the PIF box in Figure 34 refers to PIFs that increase loading, as this rating increases, the loading on the person, and hence the error rate, is assumed to increase. Figure 34 shows four factors as influencing the PIF box: Distractions, Automation quality, presence of good weather conditions, and Fatigue. As some of these scales increase, they have a positive (decreasing) effect on workload; examples in Figure 34 are Automation quality and Good weather conditions. These ratings therefore need to be subtracted from 100 in order to ensure that the derived rating (62) moves in the correct direction as ratings on these scales increase. This reversal of scales is shown in the final column of Figure 35, and is carried out automatically by the software if a minus sign is appended to the weight (see Figure 34). The other factors, Distractions and Fatigue, increase the loading as their ratings increase, so these scales do not have to be reversed. The derived rating for the quality of the PIFs is calculated as follows:

Rating for quality of PIFs box = (80 + 50 + 30 + 90)/4 = 62.5 (rounded to 62, as shown in Figure 34)

Let us assume that the SLI scale generated by the factors in the ID in Figure 34 is related to error probability in the following manner (based on field observations):
Conditions            Error rate
Worst case (SLI=1)    0.1 (100 failures per 1000 demands)
Best case (SLI=0)     0.001 (1 failure per 1000 demands)

Figure 36 Calibration data for Figure 34
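The scale reversal and the conversion from a workload SLI to an error rate can be sketched as follows. This is an illustration only: equal weights within the PIF box and a linear calibration through the Figure 36 end points are assumed, and the function names are ours.

```python
def reverse(rating):
    """Reverse a 1-100 scale so that a higher value always means a
    higher contribution to loading (the automatic reversal applied when
    a minus sign is appended to a weight)."""
    return 100 - rating

# Figure 35 ratings: distractions 80, automation quality 50 (reversed),
# good weather 70 (reversed), fatigue 90 -- equal weights assumed.
recalculated = [80, reverse(50), reverse(70), 90]
pif_box = sum(recalculated) / len(recalculated)
print(pif_box)  # 62.5, rounded to 62 in Figure 34

# Linear calibration from Figure 36: SLI=0 -> 0.001, SLI=1 -> 0.1.
def error_rate(sli):
    return 0.001 + sli * (0.1 - 0.001)

print(round(error_rate(0.56) * 1000))  # about 56 failures per 1000 demands
```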
If the SLI model is calibrated using these data, the expected failure rate based on an overall SLI of 0.56 is 56 failures per 1000 demands (see Figure 34). If the fatigue level is reduced to its best case (a rating of 1), the predicted number of failures due to overload decreases to 49 per 1000 demands. Other similar analyses are possible to investigate the changes in loading and error rates as a function of varying the assessed conditions in the model.

A great strength of the IDEAS approach is that it allows the model developed by the experts to be verified by including some known scenarios in the assessment, and seeing whether the predicted workload is in accordance with their experience. The 'What if' capability also allows cost benefit analyses to be performed if alternative load management approaches are being investigated.

5.6.7 Conclusions on the applications of IDEAS to CMWL Assessment

IDEAS is not in itself a theoretically based approach. Rather, it is an application framework which allows the insights from a range of theoretical positions or pragmatic research to be cast into the form of a simple linear additive model. This model allows the development of a CMWL index as a function of the various factors that emerge from the theoretical and experimental research that has been carried out in this domain. The IDEAS approach is promising for CMWL evaluation from the following perspectives:

• Flexibility: the modelling approach allows a wide variety of factors that might affect CMWL to be included in the construction of a predictive model of CMWL.

• Breadth of modelling: the model developed in the IDEAS environment can include both 'soft factors' such as subjective opinions and 'hard data' such as length of shift and work break schedules.

• Predictive capability: although several of the techniques considered in this review could potentially provide predictions of CMWL, only IDEAS provides the capability to make these predictions as a function of the context, i.e. the characteristics of the task, the individual and the environment.

• Linkage with error prediction: as discussed earlier, the IDEAS approach can evaluate the contribution of CMWL as part of a larger predictive model of human error. This means that concepts of 'acceptable workload' at both the low and high demand ends of the spectrum can be defined operationally in terms of failure probabilities for the tasks being assessed. This is particularly valuable if these measures are to be used as part of an overall risk management framework. It would be possible, for example, to decide whether a given expenditure to produce a desired CMWL was defensible in terms of the expected level of risk reduction (i.e. reducing the likelihood of a particular undesirable outcome).
6. Industry Specific Workload Assessment Tools

This section provides a listing of a selection of the main workload assessment tools which have been developed for use in specific contexts. For each tool, several items are noted (where they are relevant or known): tool name, originating industry sector, sponsor organisation, theory base and core measurement methodology. This information will enable the reader to follow up any particular tool of interest. Because the most important tools have already been discussed at length elsewhere in this report, the list of assessment tools is presented without further commentary.
Tool                        Originating Industry Sector   Sponsor               Theory Base        Core Methodology
ATCO-TAD                    Air Traffic Control (ATC)     CAA                   Generic IP         Task load
AWAS                        ATC                           EuroControl           MRT                Subjective Assessment
Bedford Scale               Aviation                      RAE                   SRT                Subjective Assessment
BLV Questionnaire           Vehicle handling              NATO                  Generic IP         Subjective Assessment
CART                        Aviation                      AFRL                  -                  Task load
CrewCut / WinCrew           Defence                       ARL                   MRT                Task load
ECG response                -                             -                     Cardiovascular     Physiological
EEG response                -                             -                     Neurology          Physiological
Electrodermal response      -                             -                     Neurology          Physiological
EOG                         -                             -                     Visual Neurology   Physiological
FWTCI                       Naval Ops                     NAWC                  -                  Subjective Assessment
IMPRINT                     Defence (Land Ops)            ARL                   MRT                Task load
IPME                        Manual Ops                    MA&D                  -                  -
Lysaght Test Battery        Vehicle handling              -                     SCH                Secondary Task
M-Model Observer Pro        Generic Commercial            Noldus                Behavioural        Primary Task
MAN-SEVAL                   Defence (Land Ops)            ARL                   MRT                Task load
Modified Cooper-Harper      Aviation                      -                     Generic IP         Subjective Assessment
NASA TLX                    Aerospace                     NASA                  Generic IP         Subjective Assessment
OFM-COG                     Maritime Transportation       University R&D        Generic IP         Cognitive Task Analysis
OWLKNEST                    Military                      ARI-IFRU              Generic IP         IKBS
POP                         Manual Ops                    MA&D                  -                  Subjective Assessment
PUMA                        ATC                           NATS                  MRT                Task load
Qh                          Air/Rail Transportation       HEL                   MRT                Task load
Rummel Test Battery         Vehicle handling              NATO                  SCH                Secondary task
Sequential Judgement Scale  Vehicle handling              -                     MRT                Subjective Assessment
SHIPSHAPE                   Generic                       CIL                   -                  IKBS
SOLE                        Military                      DRDC                  Generic IP         Task load
STRES                       Aviation                      AFRL                  Generic IP         Secondary Task
SWAT                        Aerospace                     Armstrong Aerospace   Generic IP         Subjective Assessment
SWORD                       Aviation                      ARL                   MRT                Subjective Assessment
TAWL                        Defence (Aviation)            LHX                   VACP-IP            Cognitive Task Analysis
TDMW (timeline)             Rail                          RSSB                  Generic IP         Timeline
WCFIELDE                    Military                      DRDC                  MRT                Task Load

(- = not stated)

Figure 37 Industry specific workload assessment tools
Key to abbreviations:

RAE        Royal Aircraft Establishment
CART       Combat Automation Requirements Testbed
CAA        UK Civil Aviation Authority
RSSB       Rail Safety and Standards Board
ARL        US Army Research Laboratory
AFRL       Air Force Research Laboratory
ARI-IFRU   Army Research Institute Infantry Forces Research Unit
DRDC       Defence Research & Development Canada
CIL        Carlow International Ltd
MA&D       Micro Analysis and Design
NATS       National Air Traffic Services
NAWC       Naval Air Warfare Center
7. Conclusions
The review of the scientific literature on workload assessment reveals that the topic has proved an extremely rich, vibrant and productive work area, with investigators contributing to the creation of an extremely large body of knowledge. Mental workload studies have tended to be of two types: pure science, where investigators have a special interest in the development and evolution of cognitive theory, and applied work, where the objective has been to measure the impact of task demands on the performance of the human operator in work settings. With some important exceptions, earlier work tended to reflect pure science concerns. Later investigations have tended to reflect an applied concern with the role played by unsatisfactory workload levels in work-based accidents.

Within this latter strand of work, the literature further sub-divides into two sub-areas. One important area has concentrated on investigating human performance in conditions of high workload. Here the objective is mainly to identify psychological and environmental stressors that reduce a person's ability to complete a task or range of tasks. The second area has concentrated on conditions of low workload, where the operator must remain alert to low frequency but highly critical events. This latter work stream deals mainly with watchkeeping tasks, and there is particular interest in mapping the vigilance decrement errors that occur during target detection tasks.

Given the sheer size of the literature, it is difficult to provide a summary statement that encapsulates each individual approach. However, within applied work, it seems clear that each practitioner takes up a position in relation to the topic according to three parameters: background theory, preferred workload measure, and target industry.

There are four main theoretical positions with regard to the topic of mental workload. Three of these, the single channel hypothesis, single resource theory and multiple resource theory, are rooted in the view of the human as a processor of information, much in the vein of a computer system. The fourth (e.g. EPIC) provides a composite model rooted in cognitive science. Conclusive evidence to allow investigators to choose between the theories has not yet been forthcoming but, on balance, it seems fair to conclude that multiple resource theoretic models have been most successful in accounting for benchmark task interference effects. Consequently, most assessment tools tend to be based on the MRT perspective. This theory proposes that cognitive processes are limited both in terms of resources and cognitive structures, and that the likelihood that dual-task performance will be affected by task demands depends largely on the types of tasks to be performed.

With regard to workload measures, the review confirms that four different types of measures have been deployed in the assessment of mental workload: subjective assessment, psychophysiological data, primary and secondary task performance measures, and task loading and influence diagrams.
Consideration of the development of industry specific tools reveals that most industries involving a significant loss potential have invested in the creation of workload assessment tools. Perhaps the most interesting point here is the low level of tool migration and sharing of data between sectors. Each industry has preferred to develop its own tool, possibly due to a feeling that it has special requirements. For example, tools developed for use in assessments involving road vehicles have tended to favour measures obtained from the performance of primary or secondary tasks, presumably because these methods enable collection of workload data concurrent with performance of the task in a real-world driving situation.
References

Allender, L., Salvi, L., & Promisel, D. (1997). Evaluation of Human Performance under Diverse Conditions via Modeling Technology. Proceedings of Workshop on Emerging Technologies in Human Engineering Testing and Evaluation, NATO Research Study Group 24, Brussels, Belgium.

Allport, A. (1993). Attention and control: Have we been asking the wrong questions? A critical review of twenty five years. In D.E. Meyer and S. Kornblum (eds), Attention and Performance XIV. Cambridge, MA: MIT Press.

Andreassi, J.L. (2000). Psychophysiology: Human behaviour and physiological response (4th edition). Lawrence Erlbaum Associates.

Beatty, J. (1982). Task-evoked pupillary responses, processing load, and the structure of processing resources. Psychological Bulletin, 91, 276-292.

Beevis, D. (1992). Analysis techniques for man-machine systems design. Report AC/243 (Panel 8) TR/7 Vol. 2. Brussels: NATO Defence Research Group.

Beevis, D., Bost, R., Döring, B., Nordø, E., Papin, J.-P., Schuffel, I.H., and Streets, D. (1994). Analysis Techniques for Man-Machine System Design. AC/243 (Panel 8) TR/7. Defence Research Group, North Atlantic Treaty Organization.

Biers, D.W. and Masline, P.J. (1987). Alternative approaches to analyzing SWAT data. In Proceedings of the Human Factors Society 31st Annual Meeting (pp. 63-66). Santa Monica, CA: Human Factors and Ergonomics Society.

Bornemann, E. (1942). Untersuchungen über den Grad der geistigen Beanspruchung. II. Teil: Praktische Ergebnisse [Investigations of the degree of mental workload. Part II: Practical results]. Arbeitsphysiologie, 12, 173-191 (cf: NATO, 2001).

Boucsein, W. (2005). Electrodermal Measurement. In N.A. Stanton, A. Hedge, K. Brookhuis, E. Salas, H. Hendrick (eds), Handbook of Human Factors and Ergonomic Methods. CRC Press: London.

Boyd, S. (1983). Assessing the validity of SWAT as a workload measurement instrument. Proceedings of the Human Factors and Ergonomics Society 27th Annual Meeting (pp. 124-128). Santa Monica, CA: Human Factors Society.

Broadbent, D.E. (1958). Perception and communication. London: Pergamon.

Brown, I.D. & Poulton, E.C. (1961). Measuring the spare 'mental' capacity of car drivers by a subsidiary task. Ergonomics, 4, 35-40.
Caggiano, D.M., & Parasuraman, R. (2004). The role of memory representation in the vigilance decrement. Psychonomic Bulletin and Review, 11(5), pp. 932-937.

Cooper, G.E. & Harper, R.P. (1969). The use of pilot ratings in the evaluation of aircraft handling characteristics. Washington, DC: NASA, Report No. TN D-5153.

Corcoran, D.W.J., Mullin, J., Rainey, M.T. (1977). The effects of raised signal and noise amplitude during the course of vigilance tasks. In R.R. Mackie (ed), Vigilance: Theory, Operational Performance and Physiological Correlates. Plenum: New York.

Corwin, W.H., Sandry-Garza, D.L., Biferno, M.H., Boucek, G.P., Logan, A.L., Jonsson, J.E., and Metalis, S.A. (1989). Assessment of crew workload measurement; methods, techniques, and procedures: Volume 1. Process, methods, and results (Tech. Report WRDC-TR-89-7006). Wright-Patterson Air Force Base, OH: Wright Research and Development Center, Air Force Systems Command.

Damos, D.L. (Ed.) (1991). Multiple task performance. Washington, DC: Taylor & Francis.

Davies, D.R., & Parasuraman, R. (1982). The Psychology of Vigilance. Academic Press: London.

de Waard, D. (1996). The Measurement of Drivers' Mental Workload. PhD Thesis. The Traffic Research Centre VSC, University of Groningen, The Netherlands.

DiDimenico, A.T. (2003). An investigation on subjective assessments of workload and postural stability under conditions of joint mental and physical demands. PhD Thesis, Virginia Polytechnic Institute and State University.

Doll, T.J., & Hanna, T.E. (1989). Enhanced detection with bimodal sonar displays. Human Factors, 31, pp. 539-550.

Fisk, A.D., & Schneider, W. (1981). Control and automatic processing during tasks requiring sustained attention: A new approach to vigilance. Human Factors, 23, pp. 737-750.

Galinsky, T.L., Warm, J.S., Dember, W.N., Weiler, E.M., & Scerbo, M.W. (1990). Effects of event rate presentation of subjective workload in vigilance performance. Society for Philosophy and Psychology, Chicago, Illinois.
Geddie, J.C., Boer, L. C., Edwards, R. J., Enderwick, T. P., Graff, N. Pfendler, C. Ruisseau, J., van Loon, P. A. (2001) NATO Guidelines on Human Engineering Testing and Evaluation. RTO Technical Report 21 BP 25, Cedex: France
Griffin, J.A., Dember, W.N., & Warm, J.S. (1986). Effects of depression on expectancy in sustained attention. Motivation and Emotion, 10, pp. 195-205.

Booher, H.R. (Ed.) (1990). MANPRINT: An approach to systems integration. New York: Van Nostrand Reinhold.

Hancock, P.A., Meshkati, N., and Robertson, M.M. (1985). Physiological reflections of mental workload. Aviation, Space, and Environmental Medicine, 56(11), 1110-1114.

Hancock, P.A., and Warm, J.S. (1989). A dynamic model of stress and sustained attention. Human Factors, 31, 519-537.

Hart, S.G. & Staveland, L.E. (1988). Development of NASA-TLX (Task Load Index): results of empirical and theoretical research. In P.A. Hancock & N. Meshkati (Eds.), Human Mental Workload. Amsterdam: North-Holland.

Hart, S.G. and Wickens, C.D. (1990). Workload assessment and prediction. In H.R. Booher (Ed.), MANPRINT: An approach to systems integration (pp. 257-296). New York: Van Nostrand Reinhold.

Hendy, K.C., Hamilton, K.M., & Landry, L.N. (1993). Measuring subjective workload: When is one scale better than many? Human Factors, 35, pp. 579-601.

Hill, S., Iavecchia, H., Byers, J., Bittner, A.C., Zaklad, A.L., and Christ, R.E. (1992). Comparison of four subjective workload rating scales. Human Factors, 34(4), 429-439.

Hoffman, J.E., Houck, M.R., MacMillan, F.W., Simons, R.F., and Oatman, J.C. (1985). Event-related potentials elicited by automatic targets: A dual-task analysis. Journal of Experimental Psychology: Human Perception and Performance, 11, 50-61.

Hugdahl, K. (2001). Psychophysiology: The Mind-Body Perspective. Perspectives in Cognitive Neuroscience. Harvard University Press, Cambridge, MA.

Humphrey, D.G. & Kramer, A.F. (1994). Toward a psychophysiological assessment of dynamic changes in mental workload. Human Factors, 36, 3-26.

Israel, J.B., Chesney, G.L., and Donchin, E. (1980). The event-related brain potential as an index of display monitoring workload. Human Factors, 22, 211-224.

Jerison, H.J. (1959).
Effects of noise on human performance. Journal of Applied Psychology. 43: pp.96-101.
121
Jerison, H.J., (1963). On the decrement function in human vigilance. In D.N.Buckner & J.J.McGrath., Vigilance: A Symposium. McGraw Hill: New York. Johannsen, G., Moray, N., Pew, R., Rasmussen, J., Sanders, A., and Wickens, C. (1979). Final report of experimental psychology group. In N. Moray (Ed.), Mental workload, its theory and measurement. New York. Plenum Press, 101-114. Joshi, A., Dember, W.N., Warm, J.S., & Scerbo, M.W., (1985). Capacity demands in sustained attention. Psychonomic Society. Boston: Mass Just, M. A. and Carpenter, P. A. (1993). The intensity dimension of thought: Pupillometric indices of sentence processing, Canadian Journal of Experimental Psychology 47: 310-339. Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice Hall. Kieras, D. E. and Meyer, D. E. (1997). An overview of the EPIC architecture for cognition and performance with application to human-computer interaction. Human-Computer Interaction, 12, 391-438. Kramer, A.F. (1991). Physiological metrics of mental workload: a review of recent progress. In D.L. Damos (Ed.), Multiple-task performance. (pp. 279328). London: Taylor & Francis. Lambertus, J.M., deWaard, D., & Brookhuis, K.A., (2005). Estimating mental effort using heart rate and heart rate variability. In N.A.Stanton, A.Hedge., K.Brookhuis, E. Salas., H.Hendrick (eds) Handbook of Human Factors and Ergonomic Methods. CRC Press: London. Lanzetta, T.M., Dember, W.M., Warm, J.S., & Berch, D.B., (1987). Effects of task type and stimulus heterogeneity on the event rate function in sustained attention. Human Factors. 29: pp.625-633. Lee, J.D., and Sanquist, T.F., (2000). Augmenting the Operator Function Model with Cognitive Operations: Assessing the Cognitive Demands of Technological Innovation in Ship Navigation. IEEE Transactions on Systems, Man and Cybernetics – Part A: Systems and Humans. 30, (3), pp. 273-285. Little, R., (1993) Crew Reduction in Armored Vehicles Ergonomic Study Report number: ARL-CR-80, U.S. 
Army Research Laboratory. Loeb, M., Noonan, T.K., Ash, D.W., Holding., D.H., (1987) Limitations in the cognitive vigilance increment. Human Factors. 29: pp.661-674 Lysaght, R. J., Hill, S. G., Dick, A. O., Plamondon, B. D., Linton, P. M., Wierwille, .W., Zaklad, A. L., Bittner Jr, A. C., and Wherry, R. J. (1989). Operator workload: Comprehensive review and evaluation of workload
122
methodologies (ARI Technical Report 851). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences. Mackworth, N.H., (1948). The breakdown of vigilance during prolonged visual search. Quarterly Journal of Experimental Psychology 1: pp 6-21. Makeig, S., and T.P. Jung (1996) in Changes in alertness are a principal component of variance in the EEG spectrum, NeuroReport, 7(1), 213-216. Mallis, M.M., & Dinges, D.F., (2005) Monitoring Alertness by Eyelid Closure. In N.A.Stanton, A.Hedge., K.Brookhuis, E. Salas., H.Hendrick (eds) Handbook of Human Factors and Ergonomic Methods. CRC Press: London. Meijman, T.F. & O’Hanlon, J.F. (1984). Workload. An introduction to psychological theories and measurement methods. In P.J.D. Drenth, H. Thierry, P.J. Willems & C.J. de Wolff (Eds.), Handbook of Work and Organizational Psychology. (pp. 257-288). New York: Wiley. Miller, R.A., (1985) A systems approach to modeling discrete control performance, in W. B. Rouse (Ed.), Advances in Man-Machine Systems Research vol 11. Greenwich, CT: JAI, , pp. 177-248. Mitchell, C. M. (1996). GT-MSOCC: Operator models, model-based displays, and intelligent aiding. In W. B. Rouse (Ed.), Human/technology interaction in complex systems (Vol. 8, pp. 67-172). Greenwich, CT: JAI Press Inc. Mitchell, C.M., & Miller, R.A., (1986). A discrete control model of operator function: A methodology for information display design,” IEEE Trans. Syst., Man, Cybern., vol. SMC-16, pp. 343–357. Mitchell, C.M., (1987) “GT-MSOCC: A research domain for modelling humancomputer interaction and aiding decision making in supervisory control systems,” IEEE Trans. Syst. Man Cybern., vol. SMC-17, no. 4, pp. 553-570, July/Aug. 1987. Mitchell, D. K. (2000) Mental Workload and ARL Workload Modeling Tools. Army Research Laboratory, Report No ARL-TN-161. Moray, N. (1967). Where is attention limited? A survey and a model. Acta Psychologica, 27, 84-92. Muckler, F.A. & Seven, S.A. (1992). 
Selecting performance measures: ‘objective’ versus ‘subjective’ measurement. Human Factors, 34, 441-455. Mulders, H., Meijman, T., Mulder, B., Kompier, M., Broersen, S., Westerink, B. & O’Hanlon, J. (1988). Occupational stress in city bus drivers. In J.A. Rothengatter & R.A. de Bruin (Eds.), Road user behaviour: theory and research. Assen, The Netherlands: Van Gorcum.
123
NASA Task Load Index , (1986). TLX: Paper and pencil version. Moffet Field, CA: NASA-Ames Research Center, Aerospace Human Factors Research Division. Nataupsky, M. and Abbott, T. S. (1987). Comparison of workload measures on computer-generated primary flight displays. In Proceedings of the Human Factors Society 31st Annual Meeting (pp. 548-552). Santa Monica, CA: Human Factors Society. NATO (2001). Guidelines on Human Engineering: Testing and Evaluation. Final Report of the RTO Human Factors and Medicine Panel. Neuilly-SurSeine, CEDEX, France. Navon, D. (1984). Resources - a theoretical soup stone? Psychological Review, 91, 216-234. Navon, D. and Gopher, D. (1979). On the economy of the human processing system. Psychological Review, 86(3), 214-255. Norman, D. and Bobrow, D. (1975). On data-limited and resource-limited processing. Journal of Cognitive Psychology, 7, 44-60. Nygren, T.E. (1991). Psychometric properties of subjective workload measurement techniques: implications for their use in the assessment of perceived mental workload, Human Factors, 33, 17-33. O’Donnell, R.D. & Eggemeier, F.T. (1986). Workload assessment methodology. In K.R. Boff, L. Kaufman & J.P. Thomas (Eds.), Handbook of perception and human performance. Volume II, cognitive processes and performance. (pp 42/1-42/49). New York: Wiley. Pashler, H., (1994). Dual-task interference in simple tasks: data and theory. Psychology Bulletin. 116, 220– 244. Penington, J., Joy, M., Kirwan, B., (1993) A staffing assessment for a local control room. In B. Kirwan & L.K. Ainsworth., (1993) A guide to Task Analysis. London: Taylor Francis. Pfendler, C. (1982). Bewertung der Brauchbarkeit von Methoden zur Messung der mentalen Beanspruchung bei Kfz-Lenkaufgaben. Zeitschrift für Arbeitswissenschaft, 36, 170-174 (cf: NATO, 2001). Pfendler, C. (1991). Vergleichende Bewertung der NASA-TLX Skala und der ZEIS-Skala bei der Erfassung von Lernprozessen. Wachtberg: Forschungsinstitut für Anthropotechnik, FAT Report No. 
92. Pfendler, C. (1993). Vergleich der Zwei-Ebenen Intensitäts-Skala und des NASA Task Load Index bei der Beanspruchungsbewertung während Lernvorgängen. Zeitschrift für Arbeitswissenschaft, 47.(Cf: NATO, 2001).
124
Picard, R. W., & Klein, J. (2002). Computers that recognize and respond to user emotion: theoretical and practical implications. Interacting with Computers, 14, 141-169. Preece.J., Rodgers, Y., Sharp, H., Benyon, D., Holland, S., & Carey, T., (2004). Human-Computer Interaction., Adison-Wesley: London. Reason, J.T., (1990) Human Error. Cambridge University Press: Cambridge. Reason, J.T., (1970). Skilled Performance in D.S.Wright, A. Taylor, D.R.Davies, W. Sluckin, S.G.M. Lee and J.T.Reason (Eds). Introducing Psychology: An Experimental Approach. Penguin Modern Psychology Texts. Pengiun Books: Harmondsworth. Reid, G. B. and Nygren, T. E. (1988). The subjective workload assessment technique: a scaling procedure for measuring mental workload. In P.A. Hancock and N. Meshkati (Eds.), Human Mental Workload (pp. 185-218). Amsterdam: North- Holland. Reid, G.B. & Colle, H.A. (1988). Critical SWAT values for predicting operator overload. In Proceedings of the Human Factors Society 32nd annual meeting. Santa Monica, CA: Human Factors Society. RSSB (2005) Train Driver Mental Workload: The Time Line Analysis Guidance Note. Report No. 20-T147-IOECCD. London: RSSB. Shackel, B., (1967) Eye Movement recording by electro-oculography. In P.H. Vanables & I. Martin (Eds)., A manual of psychophysiological methods. Amsterdam: North Holland. Shackel, B., (1981). The concept of usability. Proceedings of the IBM Software and Information Usability Symposium. pp: 1-30. Sheridan, T.B. (1970). On how often the supervisor should sample. in IEEE Transactions on Systems Science and Cybernetics, SSC-6, Pp. 140–145. Sheridan, T.B. and Simpson, R.W. (1979). Toward the definition and measurement of the mental workload of transport pilots (FTL Report R79-4). Cambridge, MA: Flight Transportation Laboratory. Sirevaag, E., Kramer, A.F., De Jong, R. & Mecklinger, A. (1988). A psychophysiological analysis of multi-task processing demands. Psychophysiology, 25, 482. 
Sirevaag, E.J., Kramer, A.F., Wickens, C.D., Reisweber, M., Strayer, D.L. & Grenell, J.F. (1993). Assessment of pilot performance and mental workload in rotary wing aircraft. Ergonomics, 36, 1121- 1140. Stern, J.A., Boyer, D. & Schroeder, D. (1994). Blink rate: a possible measure of fatigue. Human Factors, 36, 285-297.
125
Teigen, K.H. (1994). Yerkes-Dodson: a law for all seasons. Theory & Psychology, 4, 525-547. Treisman, A., (1969). Strategies and models of selective attention. Psychological Review, 76, 282- 299. Unema, P. (1995). Eye movements and mental effort. PhD Thesis, TU Berlin. Aachen, Germany: Verlag Schalter. Veltman, J. A. and Gaillard, A. W. (1998). Physiological workload reactions to increasing levels of task difficulty. Ergonomics, 41(5), 656-669. Veltman, J.A. & Gaillard, A.W.K. (1993). Indices of mental workload in a complex task environment. Neuropsychobiology, 28, 72-75. Ward R. D. and Marsden P. H. (2004). Physiological responses to different web-page designs. International Journal of Human-Computer Studies 59 (1/2), 199-212. Warm, J.S., & Jerrison, H.J., (1984). The psychophysics of vigilance. In J.S., Warm (ed) An Introduction to Vigilance: Sustained Attention in Human Performance. John Wiley: London. Warm, J.S., Dember, W.N., & Hancock, P.A., (1996). Vigilance and workload in automated systems. In R. Parasuraman & M. Mouloua (Eds), Automation and Human Performance: Theory and Applications. Mahwah, NJ: Erlbaum. Warm, J.S., Loeb, M., & Alluisi, E.A., (1970). Variations in watchkeeping performance as a function of the rate and duration of visual signals. Perception and Psychophysics. 7: pp. 97-99. Warm. J.S., (1984). An introduction to Vigilance. In J.S.Warm (Ed) Sustained Attention in Human Performance. John Wiley: London. Wickens, C. (1984/1991). Engineering Psychology and Human Performance. Columbus, OH: Merrill. Wickens, C.D. (1991). Processing resources and attention. In D.L.Damos (Ed.), Multiple-task performance. London: Taylor & Francis. Wickens, C.D. (1991). Processing resources and attention. In D.L. Damos (Ed.), Multiple Task Performance. Washington, DC: Taylor & Francis. Wierwille, W.W. & Casali, J.G. (1983). A validated rating scale for global mental workload measurement application. In Proceedings of the Human Factors Society 27th Annual Meeting. 
Santa Monica, CA: Human Factors Society.
126
Wierwille, W.W. & Eggemeier, F.T. (1993). Recommendation for mental workload measurement in a test and evaluation environment. Human Factors, 35, 263-281. Wyatt, S., and Langdon, J.N., (1932). Inspection processes in industry. Industrial Health Research Board. Report No. 63. HMSO: London. Yerkes, R.M. & Dodson, J.D. (1908). The relation of strength of stimulus to rapidity of habit-formation. Journal of Comparative Neurology and Psychology, 18, 459-482. Young, M.S., & Stanton, N.A., (2005). Mental Workload. In N.A.Stanton, A.Hedge., K.Brookhuis, E. Salas., H.Hendrick (eds) Handbook of Human Factors and Ergonomic Methods. CRC Press: London.
127
Appendix 2 – Factors influencing workload in shipping incidents
This appendix gives examples of some workload errors in shipping. It first explains how these examples were gathered: the databases from which they were drawn and the criteria for selecting appropriate accidents. It then gives examples of workload errors, divided into three categories: (i) errors due to high workload; (ii) errors due to low workload; and (iii) errors due to low cognitive workload switching to high cognitive workload. The appendix ends by listing some Performance Influencing Factors noted during the examination of these accidents.
1. Databases examined
To gather a sample of accidents in which cognitive underload or overload was a factor, accident reports from four separate databases were examined. These databases contained collections of marine accident reports from four major investigative bodies:

• Australian Transport Safety Bureau (ATSB)
• Marine Accident Investigation Branch (MAIB, UK)
• National Transportation Safety Board (NTSB, USA)
• Transportation Safety Board of Canada (TSBC)
Each of these four bodies is an independent organisation whose aim is to determine the circumstances, causes and contributory factors of accidents. The organisations do not apportion blame or liability, nor are they responsible for enforcing recommendations made at the end of an investigation. The databases of reports from each organisation are freely available on the World Wide Web, although some contain more accessible accident reports than others. For example, the NTSB database contains approximately two hundred and twenty-three reports, but only thirty-four of these are accessible via the website. The MAIB database contains the highest number of freely accessible reports, with two hundred and twenty reports between the years 1999 and 2005 alone. The reports tend to be quite similar in structure, and are generally divided into three or four sections: (i) factual information; (ii) analysis; (iii) conclusions; and (iv) recommendations. However, the reports can vary in length and detail, depending upon the circumstances and severity of the accident and the level of investigation required by the investigative body.
2. Criteria for selecting accidents
As expected, the four databases contain hundreds of marine accident reports, so selection criteria had to be defined in order to identify a set of reports to use as case studies illustrating the effects of cognitive underload and overload on marine accidents. The project specification from the MCA gave an example of low workload as “maintaining watch with autopilot on in an open and calm sea at night”, and it was therefore decided that the sample of workload errors should focus on accidents involving watch crew and / or the Officer of the Watch (OOW). These types of accidents tend to result from errors during navigation, manoeuvring and control of the ship. A large number of accident reports from each organisation were examined. As mentioned previously, the reports tend to be quite similar in structure and, in most cases, contain a summary of the accident and its probable causes at the beginning of the report. This made report selection somewhat easier, as the summary could be examined first to determine whether there was any indication that cognitive workload was a factor in the accident. If so, the full report was examined in more detail to determine how and why cognitive workload contributed to the accident.
3. Number and type of accidents examined
The types of accidents investigated by the four organisations can vary greatly, even within the single marine domain. Typical accident types include collision, grounding, capsizing, sinking, fire on board, loss of vessel, injury on board, shifting or loss of cargo, and various near-miss accidents.

Organisation   No. of Accidents   Range of Years
ATSB           25                 1995 – 2002
MAIB           220                1999 – 2005
NTSB           34                 1994 – 2004
TSBC           32                 2001 – 2004

Figure 38 Number of accidents examined
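As a purely illustrative check (not part of the original study), the per-database counts in Figure 38 can be tallied to confirm the total number of accident summaries screened; the variable names below are invented for this sketch.

```python
# Illustrative sketch only: tallying the per-database counts from Figure 38.
# Number of accident summaries examined per investigative body.
reports_examined = {
    "ATSB": 25,   # Australian Transport Safety Bureau, 1995-2002
    "MAIB": 220,  # Marine Accident Investigation Branch, 1999-2005
    "NTSB": 34,   # National Transportation Safety Board, 1994-2004
    "TSBC": 32,   # Transportation Safety Board of Canada, 2001-2004
}

total_summaries = sum(reports_examined.values())
print(total_summaries)  # 311 summaries screened in total
```

The total of 311 matches the figure quoted in the text below.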
Figure 38 above shows the number of accidents that were examined for this study. In total, three hundred and eleven accident summaries were examined to determine whether cognitive workload was a factor in the accident. Of these, a sample of thirty accidents was selected as being due, in some part, to cognitive workload errors, and these were examined in more detail. In some cases, even after examining the report in detail, it was difficult to determine whether cognitive workload played a part in the accident, either because it was not commented upon in the accident report or because there was insufficient information available in the report to allow assessment of the workload of the watch crew. Such accidents could not be chosen for the research sample. As mentioned previously, thirty accidents were examined in more detail for examples of cognitive workload errors. The majority of accidents in this sample were groundings and collisions. This is typical of accidents in which cognitive workload is a factor, as groundings and collisions tend to occur through workload errors in manoeuvring, navigating and controlling the ship. Other types of accident, such as fires, equipment failures, injuries or cargo problems, tend not to have cognitive workload as a factor, and usually do not involve members of the watch crew (while they are on watch, at least). Figure 39 below shows the thirteen accidents that were selected as useful for this study, to demonstrate examples of errors due to high or low cognitive workload.
Report ID       Year   Organisation   Vessel Name(s)                     Accident Type
159             2000   ATSB           Star Sea Bridge / Sue M            Collision
196             2003   ATSB           Lancelot / Jenabar                 Collision
190             2003   ATSB           Tauranga Chief                     Grounding
211             2005   ATSB           Spartia / Hannah Lee               Collision
MAIB 1/6/109    1999   MAIB           Baltic Champ                       Grounding
32/2000         1999   MAIB           Dole America                       Collision
Elm             1999   MAIB           Elm / Suzanne                      Near-Miss
34/2000         2000   MAIB           Betty James                        Grounding
7/2002          2001   MAIB           Lomur                              Grounding
12/2005         2004   MAIB           Cepheus J / Ileksa                 Collision
11/2004         2004   MAIB           Scot Venture                       Contact
M02C0064        2002   TSBC           Canadian Prospector / Stellanova   Collision
M04L0050        2004   TSBC           Catherine Legardeur                Grounding

Figure 39 Examples of accidents due to high / low cognitive workload
The next three sections give examples of accidents to which errors due to cognitive workload contributed.
4. Examples of errors due to high workload
The following six accidents display errors due to high cognitive demand. In some cases, distractions combined with an already high workload to exacerbate the situation. In other cases, the errors occurred due to the stress of the high workload itself.

Report ID: Elm
Investigating Authority: MAIB
Near-Miss Incident between Elm and Suzanne

Accident summary:
The Suzanne left her home port of Plymouth at 0445 on the 11th of February 1999, bound for fishing grounds 12 miles south-east of the port. She reached the fishing grounds at approximately 0600, and the skipper began preparations for shooting the fishing gear. At 0630, Suzanne shot her fishing gear and began towing in a direction of 190° at a speed of 2.4 knots. The correct lights and shape were displayed for a fishing vessel engaged in trawling.

On board Elm, the master was on watch, with the night lookout also present on the bridge. Elm was on passage from Belfast to Teignmouth, steering a course of 075° at a speed of 10.1 knots. At 0645, with the master’s approval, the night lookout left the bridge because daylight was breaking. Shortly afterwards, the master selected the 16 mile range scale on the radar to obtain a range and bearing from Start Point. He then began entering waypoints into the GPS for the next leg of the voyage. While he was carrying out this task, the autopilot started operating erratically, applying port and starboard helm in rapid succession. The master then went to the autopilot control to try to rectify the fault.

On board Suzanne, sometime before 0700, the skipper detected Elm approaching from the west-south-west. He also noticed that Elm was not on a steady course, but steering erratically. Because of this, the skipper was unable to ascertain whether or not there was a risk of collision until the distance between the vessels had reduced to mile. Realising that the two vessels were on a collision course, the skipper of Suzanne called Elm on VHF radio to advise Elm’s watchkeeper of the situation.
When no reply was received, he fired a red distress flare in the direction of Elm to attract attention, and then began taking evasive action by releasing the trawl warps from the towing block. This had the effect of pulling Suzanne astern. Realising that this action alone was insufficient to avoid a collision, he used the main engine controls to come hard astern.

On board Elm, the master, who was still busy trying to rectify the fault with the autopilot, heard a call for an eastbound coaster on the VHF radio, and then saw a red flare. He went immediately to the port side of the bridge and saw Suzanne in close proximity on the port side. He assessed by then that it was
too late to make an alteration of course without making the situation worse. Elm was still steering erratically. At approximately 0700, the two vessels passed each other within a distance of 1 cable.

Additional information:
Before leaving the bridge at 0645, some 15 minutes before the incident, the night lookout failed to detect Suzanne. When he left the bridge she would have been bearing to port at a distance of 2.8 miles. The visibility at the time was very good and the sea state was calm. Suzanne was displaying the correct lights and shape for a vessel engaged in trawling, and should have been easily seen by the lookout. It is possible that the night lookout had been busying himself with other tasks in preparation for going off watch, or was in conversation with the master, and had inadvertently ceased his night look-out duties when dawn began to break. Consequently, when the night lookout left the bridge, neither he nor the master was aware that the two vessels were on or nearly on a collision course.

When the master allowed the night lookout to leave the bridge, he took over the lookout’s responsibility. Being unaware that his vessel was on or nearly on a collision course with Suzanne, he used the radar to start fixing the vessel’s position. Selecting the 16 mile range scale would have made it difficult to detect any targets in close proximity, especially with his mind focused on obtaining a range and bearing from Start Point. When the position was obtained, he began entering waypoints into the GPS. During that time, he was totally distracted from keeping a proper look-out. He was further distracted when the autopilot began operating erratically and, by trying to rectify the fault himself without calling assistance to the bridge, made it impossible for himself to maintain a proper look-out.
The master should have made a full appraisal of the situation by sight and all other available means, including the use of short range scales on radar, before he relieved the night lookout and began fixing the vessel’s position. Had this been done, he would probably have detected Suzanne on or nearly on a collision course, and avoiding action could have been taken in ample time. When the autopilot began operating erratically, the master should have called an additional person to the bridge, selected manual steering, and employed a dedicated helmsman so that he could act as lookout. Any work on the autopilot should have been left until the master could be relieved of his look-out duties.

HRA Comment:
The lookout was probably engaged in low cognitive activity, as it was the end of his watch and he was preparing to leave the bridge. The master, on the other hand, was engaged in high cognitive activity. Firstly, he was navigating
the vessel, and then when the lookout left the bridge, he was focused on fixing the vessel’s position. He was then distracted by the autopilot’s erratic behaviour, and thus was not able to react in time when he saw the Suzanne. If not for the action taken by the skipper of the Suzanne, the two vessels probably would have collided.
Report ID: 211
Investigating Authority: ATSB
Collision between Spartia and Hannah Lee

Accident summary:
At 0535 on the 15th of April 2005, the Greek registered bulk carrier Spartia collided with the Australian rock lobster fishing vessel Hannah Lee, 17 nautical miles off Cape Bouvard, on Western Australia’s south-west coast. Spartia was in ballast and making for the port of Bunbury to load alumina. Hannah Lee had departed the small port of Mandurah at 0345 to work its rock lobster pots, located approximately 37 nautical miles south-west of the port.

Hannah Lee’s skipper failed to observe Spartia in the time leading up to the collision, as he was preoccupied with keeping his vessel on course. The bridge team on Spartia had identified the fishing vessel about 20 minutes prior to the collision. They had assessed that a risk of collision existed but, as Hannah Lee was on their port side, they maintained the vessel’s course and speed, in accordance with the international collision regulations. When it became obvious to the bridge team that Hannah Lee was not going to give way, the master ordered avoiding action, consisting of a change in course and a turn to starboard. This action was ineffective in preventing the collision, and a short time later Hannah Lee struck Spartia’s port side, in way of number six hold. No one was injured in the collision and no pollution resulted.

Additional information:
On board the Hannah Lee, the skipper was on watch alone, and the two deckhands were asleep below in the sleeping quarters. The sea on the port bow meant that the skipper had to monitor the Hannah Lee’s position on the Electronic Chart System (ECS) and make frequent adjustments to the autopilot setting to keep Hannah Lee ‘on track’ (the autopilot and ECS were not interfaced). The sea on the port bow also meant that spray on the wheelhouse windows partially obscured the skipper’s visibility, and the vessel was not equipped with any means of washing the spray off.
During the passage out to the lobster pots, Hannah Lee was in company with another fishing vessel, to port and several nautical miles ahead, skippered by the son of Hannah Lee’s skipper. The two men had tuned their VHF radios to channel eight so they could communicate. Just before 0535, Hannah Lee’s skipper stood up from where he had been sitting at the helm position. He looked up over the helm and saw ‘a wall’ (Spartia) in front of his vessel. Realising that this was the side of a ship, he immediately swung the helm to port, and was in the process of disengaging the autopilot when the two vessels collided. Hannah Lee’s starboard shoulder had struck Spartia’s hull.
HRA Comment:
The skipper on board the Hannah Lee was engaged in high cognitive activity, as he had to constantly monitor and make adjustments to the autopilot in order to keep the vessel on track. This meant that he was unable to maintain a proper visual lookout, as he was concentrating solely on the autopilot.
Report ID: M04L0050
Investigating Authority: TSBC
Grounding of Catherine Legardeur

Accident summary:
At 1900 on the 26th of April 2004, the master and mate started their duty assignment on the Catherine Legardeur. The ferry made its scheduled crossings every half hour and, at 0315 on the 27th of April, was stopped for the night at Sorel as scheduled. Visibility was reported to be good and operations had been uneventful. Later that morning, between 0500 and 0510, during preparations for the resumption of service, the master noticed that visibility was somewhat reduced by fog. At 0510, visibility cleared for about five minutes, but was then reduced again as before. At 0530, winds were negligible, but the fog reduced visibility to less than 30 m as the vessel prepared to unberth and start its northbound crossing from Sorel. The ship’s whistle was activated and lookouts were posted forward and in the wheelhouse. Both radars were turned on and set to a relative motion, unstabilised, head-up presentation.

By 0535, the vessel was under way and, once it had cleared the Lanctôt basin and passed the jetties, it quickly fell off to starboard into the river current. The master and mate both noticed that the gyrocompass repeater heading was turning rapidly to the east. When the navigation personnel monitored the radars, both set on the 1.5 nautical mile scale, the echoes of the nearby landmass were seen to be shifting quickly, creating a blurred image from which the navigation personnel were unable to determine the vessel’s position. Without visual cues, and unable to interpret the radar information because of the blurred image caused by the swing, the master and mate used the GPS receiver’s changing readout to gain an appreciation of the vessel’s speed. As an attempt was made to bring the vessel to a stop with the help of this navigation instrument, the lookout reported seeing buoys ahead.
The master manoeuvred to avoid the buoys and, shortly thereafter, at approximately 0545, the vessel grounded.

Additional information:
Blind pilotage, the means of navigating safely in restricted visibility, requires knowing the vessel’s position, course made good and speed over the bottom, as well as its handling characteristics, so that the vessel can be guided along its intended track in a precise fashion. This is accomplished with the aid of the instruments at hand in the wheelhouse, since exterior visual aids are not necessarily available. Blind pilotage requires specific training in the use of navigation instruments; expertise is acquired through training in Simulated Electronic Navigation (SEN). The master had not received SEN training, and his expertise in using radar and other navigation aids in blind pilotage situations was limited. His proficiency at manoeuvring and navigating the vessel across the river was
reliant on using visual cues. When visibility was reduced by fog to less than 30 m, the master could not carry out the series of tasks he routinely used to cross the river, as all visual reference was lost. Once clear of the harbour, the vessel quickly fell off to starboard into the easterly flowing river current. The master, unable to interpret the changing radar display images or the information from other navigation equipment, became disoriented. Since no counter-rudder and engine manoeuvres were carried out to correct the situation in a safe and timely manner, the ferry veered off its intended course and ran aground.

HRA Comment:
The master of the vessel was used to navigating by visual references and thus, when visibility was severely reduced by fog, his cognitive workload increased as he tried to navigate using instruments with which he was not familiar. When the ferry fell off to starboard, he became disoriented and was unable to act sufficiently quickly to avoid the grounding.
Report ID: 32/2000
Investigating Authority: MAIB
Collision of Dole America with the Nab Tower

Accident summary:
At 0402 on the 7th of November 1999, Dole America collided with the Nab Tower in the eastern approaches to The Solent. The vessel had left her berth at Portsmouth at 0250 and proceeded under pilotage to the vicinity of the New Grounds buoy, about 2 miles to the north of the Nab Tower, where the pilot disembarked. Having dropped the pilot, the master began to increase speed, and ordered port helm to set a course to pass about 2 cables to the east of the Nab.

As the vessel approached the tower, the master saw what he thought was the red light of another vessel at close range on the starboard bow, presenting a risk of collision. He ordered starboard helm before going to the front of the bridge to confirm what he thought he had seen. The second officer, who was the only other person on the bridge apart from the helmsman, joined him, confirmed the presence of a red light, and said he saw a second light to starboard of the first. The master then ordered the helm hard to starboard. When no further lights were seen ahead, he ordered the helm hard to port, still intending to pass to the east of the Nab Tower. From his position by the window at the front of the bridge, the master was unaware of the vessel’s heading and her exact position in relation to the tower. He failed to take full account of the advance and transfer of his intended actions, given that the helm was hard to starboard and Dole America was swinging to starboard at the time of his order. Shortly afterwards, Dole America collided with the tower’s foundation, bounced off, and then made contact a second time.

Additional information:
A dedicated lookout was not posted during Dole America’s departure from Portsmouth.
Although the bridge was manned initially by four people, each had other duties assigned to him; the helmsman was steering the vessel, the chief officer was navigating, the pilot was conning, and the master was in overall command. The duties of the pilot, master and chief officer to some extent involved each of them keeping a lookout. However, there was no guarantee that at any one time a proper lookout was being maintained. The situation became less satisfactory when the master sent the chief officer below to rest. The master then found he was handling the additional task of navigating until the second officer arrived on the bridge. The situation further deteriorated when the second officer escorted the pilot to the pilot ladder, leaving the master to conn the vessel, navigate and maintain a lookout alone. The master’s perception and decision-making abilities may have been affected by fatigue and stress, but to what degree is uncertain.
HRA Comment: The master was in charge of conning the vessel, navigating and maintaining a lookout. When he saw what he thought were lights, he was distracted from his principal duties, and moved forward to check the lights. He then asked the second officer to move forward and confirm the presence of the lights. When the master and second officer realised they were going to collide with the source of the lights, they did not have sufficient time to fully apprise themselves of the changed situation.
Report ID: 11/2004 Investigating Authority: MAIB Contact with Number 16 buoy by Scot Venture Accident summary: At 0755 on 29 January 2004, the UK registered general cargo vessel Scot Venture made contact with Number 16 buoy in the Drogden Channel, Denmark, in restricted visibility. The vessel then anchored clear of the channel until towed to Malmo, Sweden, for inspection. Scot Venture’s propeller blades were distorted; the buoy was subsequently found to have been severed from its moorings. The contact occurred when Scot Venture was approaching the Drogden Channel from the south. The chief officer was the OOW. He was accompanied on the bridge by an AB lookout until the lookout was stood down at about 0715. At 0747, the chief officer called the master via intercom to wake him in preparation for taking over the watch at 0800. At about the same time, he recorded the visibility in the deck log, which he assessed to be about 2 miles, based on a recent sighting of a nearby vessel. After the vessel passed to the east of the Drogden Channel Lighthouse, he altered course to head towards the Drogden Channel’s southern entrance, in accordance with the voyage plan. The channel entrance was less than a mile away and marked by Numbers 16 and 17 lateral buoys. On completion of the alteration, the chief officer reduced the range scale on the port bridge radar display from 6 miles to 3 miles, and the starboard display from 3 miles to 1.5 miles. He did not detect any adverse weather conditions when doing so. Both radar displays were operating north up in relative motion, and were off-centred towards the south. Soon after the alteration, visibility significantly decreased because of snow. The precipitation also degraded the radar picture to the extent these buoys were no longer displayed. 
The chief officer immediately changed to hand steering, reduced the propeller pitch control to about 80%, giving a speed of about 12 knots, and called the master via intercom to go to the bridge immediately. Number 16 buoy was then sighted visually at close range off the starboard bow, and although avoiding action was taken, this was unsuccessful. Additional information: The reduction in speed, caused by moving the pitch control lever from 100% to 80% would have only been about 1.5 knots. As a result, Scot Venture continued to close the channel at about 12 knots, so reducing the chances of
the master reaching the bridge in time to influence the action and decisions required to be taken. The chief officer was aware that the ship’s track passed very close to Number 16 buoy, but remained on the planned course despite knowing there was sufficient water to the east. He also did not make any adjustment to the planned course to allow for the tidal stream. By changing to hand steering, with no-one else on the bridge to take the helm, the chief officer had to monitor the position of the buoy from near the centreline over the timber deck cargo. He was unable to move to the starboard bridge wing, from where a more accurate assessment of the ship’s movement and proximity to the buoy could have been made. As soon as the buoy was sighted, and port helm applied, the chief officer’s concentration on the relative position of the buoy was at the expense of monitoring the position of the rudder and the movement of the ship’s head via the rudder and gyro compass repeaters. When taking avoiding action, too much port helm was initially applied, and opposite helm was applied too little and/or too late to prevent the vessel’s stern from swinging into the buoy. The chief officer also lost his awareness of the relative position and proximity of the two vessels which were southbound in the channel. These factors indicate the chief officer would have been better placed to cope with the sudden worsening of the visibility, if he had been more familiar and practised with manoeuvring the vessel, and had not been alone on the bridge. HRA Comment: The chief officer was alone on the bridge, and responsible for the navigation of the ship, and thus was engaged in high cognitive workload. When the visibility suddenly reduced, he was placed in a dangerous situation where he had to make quick decisions in an unfamiliar area, on an unfamiliar vessel, near navigational dangers.
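The small margin bought by the pitch reduction can be checked with a short calculation. This is an illustrative sketch, not part of the original investigation: the roughly 13.5-knot pre-reduction speed is inferred from the report's stated 1.5-knot loss, and one mile approximates the "less than a mile" to the channel entrance.

```python
# Illustrative arithmetic for the Scot Venture approach. Figures are taken
# from the report; the ~13.5 kn pre-reduction speed is inferred from the
# stated 1.5 kn loss when pitch was moved from 100% to 80%.

def minutes_to_run(distance_nm: float, speed_knots: float) -> float:
    """Time in minutes to cover a distance at a steady speed."""
    return distance_nm / speed_knots * 60.0

# Time to reach a channel entrance roughly one mile away:
t_after = minutes_to_run(1.0, 12.0)    # at ~12 kn, after the reduction
t_before = minutes_to_run(1.0, 13.5)   # at ~13.5 kn, before the reduction

print(f"At 12 kn:   {t_after:.1f} min")                      # ~5.0 min
print(f"At 13.5 kn: {t_before:.1f} min")                     # ~4.4 min
print(f"Margin gained: {(t_after - t_before) * 60:.0f} s")   # ~33 s
```

Even after the reduction, the channel entrance was only about five minutes away, and the reduction itself gained only around half a minute, far too little to improve the chances of the master reaching the bridge in time.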
Report ID: 190 Investigating Authority: ATSB Grounding of Tauranga Chief Accident summary: Tauranga Chief arrived at Sydney from Port Kembla on 17 January 2003 on its normal liner route. It had sailed from Port Kembla the previous evening and arrived at the Sydney pilot boarding ground on schedule at 0300 local time. The pilot boarded as planned and the ship continued inwards toward the booked berth at White Bay container terminal. When the ship came to an intended course alteration position in the harbour, east of Bradleys Head, the pilot initiated the turn to starboard to round the headland. He firstly ordered 5° starboard rudder and, when the ship did not respond quickly enough, he increased the order to starboard 10°. The rate of swing increased markedly and so the pilot ordered port 20° to slow the swing. The seaman on the wheel made an error executing this last wheel order and instead applied starboard 20° wheel. Before the consequences of this error could be corrected, the ship ran aground on a mud/sand patch just south of the light on the southern end of the headland. Additional information: The error made by the helmsman was the ‘active’, unsafe act that led to the grounding. Such an error is what is known as an action slip, occurring at the skill-based level of performance. Slips are errors resulting from some failure in the execution of an action sequence and are due to a failure to monitor one’s own current intention. The evidence suggests that the pilot’s command was noted and recorded at the fringes of the helmsman’s consciousness, and correctly acknowledged but was not acted on because his attention was not focussed on the job at hand. None of the bridge team suggested that he did not understand the helm orders given. There were no indications of excess noise levels or aural disturbances to distract the helmsman at or about the time of the steering error. 
It is a common convention that, before ordering counter helm, a pilot orders the rudder to be put amidships. This practice directs the helmsman's attention towards a potential change of rudder direction, increasing its salience. Without this cue, therefore, the helmsman's expectation that a change of rudder direction might be imminent was reduced. The convention is a means of reducing the risk of exactly the type of error that occurred in this incident, and it would be reasonable for a helmsman to anticipate such an order before applying the rudder in the opposite direction, especially given that the pilot had used this order before. An increase in rudder angle in the same direction is usually not preceded by any special phrase (a reduction in rudder angle is normally either preceded by ‘port’ or ‘starboard’ as appropriate, or by the phrase ‘ease to . . .’).
The pilot, as a matter of routine, normally ordered the wheel amidships before ordering a rudder angle in the opposite direction. On this occasion it seems that he did not, due to some slip or lapse in his mental processing. Although the pilot could not recall whether or not he had ordered ‘midships’ prior to the order for counter rudder, the ship’s crew stated that there had been no order of ‘midships’ between the order to increase the rudder angle to 10° to starboard and the order for 20° of port rudder. The helmsman’s error may therefore have been further influenced by his expectation. The expectation may have existed that, prior to any counter helm orders, he (the helmsman) would be instructed to return the helm to midships. Conversely, if this order is not given, it is reasonable to assume that the helmsman would not expect a counter rudder order to follow. In its absence, the helmsman may have expected orders that either increased or decreased the rate of turn in the one direction (i.e., all orders to starboard), not an absolute change in rudder angle (i.e., from starboard to port). Consequently, when the order for counter helm was given (port 20°) without the expected cue ‘midships’, the second routine may have held greater salience and, given the natural increment from 5°–10° to 20° (which potentially agreed with expectation), there was little to challenge the helmsman’s mind set. The role of expectation may therefore have further exacerbated the potential for error. HRA Comment: The helmsman may have been suffering from fatigue and the effects of circadian dysrhythmia due to the time of day. Therefore, following the orders of the master and the pilot probably required a great deal of cognitive effort on his part. As he was probably expecting a ‘midships’ command before changing direction, when he did not hear this he did not correctly execute the pilot’s order to turn to port.
5. Examples of errors due to low workload
The following five accidents display errors due to low cognitive demand. In some cases, fatigue combined with the already low workload to exacerbate the situation. Report ID: 196 Investigating Authority: ATSB Collision between Lancelot and Jenabar Accident Summary: At 0400 on the 21st of August 2003, the bulk carrier Lancelot was off Diamond Head, on the New South Wales coast, heading south towards Newcastle. The visibility was good, and the second mate had earlier sighted the lights of a group of four fishing vessels to starboard. He used the automatic radar plotting aid (ARPA) to assess their movements. When the mate took over the watch, at 0405, he too used the ARPA to plot the movements of the approaching fishing vessels. The ARPA indicated that the nearest fishing vessel was on a reciprocal course and that its closest point of approach (CPA) was more than 1½ miles to starboard. He altered the ship’s course to starboard, clearing the other vessel, before ordering the original course resumed. The mate then realised that the second fishing vessel was on a collision course and he ordered the helm ‘hard to port’, attempting to steer away from it. However, the fishing vessel continued to close with the ship and, when the mate realised that a collision was imminent, he ordered full starboard rudder to minimise the angle of impact. At 0427 the two vessels collided. On board the Jenabar the watch changed at 0330, when a deckhand took over. He checked the radar, the chart plotter and the gauges for the engines’ revolutions and temperatures. He checked that the navigation lights were on before he sat behind the table in the wheelhouse, looking out and watching the fishing vessel ahead. The engine controls and helm were approximately two metres from where he was seated. The vessel was equipped with a ‘watchguard’ vigilance system to alert watchkeepers by sounding an alarm at predetermined times. However, this was not switched on. The watch was due to change at 0430.
At about 0425, the deckhand looked at the plotter and the radar. He noted, from the GPS, that Jenabar’s speed was 8.8 knots. There was clutter on the radar, but no echo of any approaching ship and he sat down again. When, a minute or so later, he looked up, he noticed the dark hull of a ship, extremely close to the starboard bow. He attempted to get to the helm to disengage the autopilot and alter course, but before he could do so, Jenabar collided with the ship.
Additional information: The deckhand had not maintained a proper visual lookout, but was seated at a table about two metres from the wheelhouse front. The lights of the leading fishing vessel would probably have been visible, giving him a point of reference for his own vessel’s course. The deckhand’s hours of work for the week leading to the collision were analysed, using the FAID programme, for the possibility that he might have been fatigued. Based only on his hours of work, the FAID analysis did not suggest that he would have been fatigued during his watch on the morning of the collision. However, in the evening before the collision, the deckhand had been out for a meal and he returned to the fishing vessel at midnight. At most, he would have had about three hours of sleep before being awakened for his watch. Given the time of day and the deckhand’s uncertain schedule before he returned to the fishing vessel when it sailed from Forster, fatigue at some level was probably present during his watch. HRA Comment: The deckhand was on watch for one hour. At the beginning of the watch he checked the various navigational instruments in the wheelhouse, before sitting down approximately two metres away from the engine controls and the helm. He did not make any visual checks before sitting down. Whilst sitting down, the deckhand was looking out at the fishing vessel ahead, and thus was engaged in low cognitive activity. There is no evidence to suggest that the deckhand checked any of his navigational equipment, or made any visual checks over the period of his watch, until about five minutes before his watch was due to end. At about 0425, the deckhand checked the plotter and the radar, but did not see any indication of approaching ships, even though the Lancelot would have been on his radar for about four minutes at this point. The deckhand sat down again, and about a minute later looked out and suddenly saw the Lancelot.
He did not have time to get to the helm and disengage the autopilot before the two vessels collided. Although the deckhand may have been suffering from some low level of fatigue, there is also evidence to suggest that he did not maintain a sufficient watch due to the low levels of cognitive activity required of him. The weather was good, the seas were relatively calm, and he was watching the lights of the fishing vessels ahead. As a result he was not aware of the impending collision, and unable to react in time to try to avoid it.
Report ID: 34/2000 Investigating Authority: MAIB Grounding of Betty James Accident Summary: On the evening of the 9th of July 2000, the merchant fishing vessel Betty James landed her catch in Mallaig, Scotland. The landing of 150 boxes was completed at 2130. Between 2145 and 2215 the three deckhands went ashore to the Marine Hotel bar where they were joined at about 2300 by the skipper. The crew remained there until returning to Betty James at midnight. They then had a cup of tea before sailing for the Hillies Edge fishing ground at 0015. The skipper took the steaming watch out of Mallaig until relieved by a deckhand at 0115, when approximately south of Sleat Head. The vessel was being steered by autopilot at a speed of 7 knots. Before leaving the wheelhouse the skipper asked the deckhand if he was fit to take the watch, and instructed him that if he was feeling too tired he was to call him or the next watch. He also directed that the passage to Hillies Edge was to be divided equally among the three deckhands. The skipper then went to bed. A few minutes later, the deckhand left the wheelhouse to make a sandwich. He returned at approximately 0125, turned on the wheelhouse lighting, reduced the volume on the radio, and sat down to read the Sunday newspapers. Between 0135 and 0145 he fell asleep and a planned alteration of course to take the vessel between the isles of Rhum and Eigg was missed. A watch alarm was fitted and working, but it failed to wake either the watchkeeper or the crew asleep below in the accommodation. At 0230 Betty James grounded on rocks on the south-east coast of the Isle of Rhum. Additional information: The watch alarm beeped intermittently every 4 minutes and, if not reset by one of two reset buttons within a minute, sounded a continuous tone. The alarm could be heard in the wheelhouse and galley area but not in the accommodation, and could be reset by the wheelhouse watchkeeper while seated.
A television, video recorder and a domestic radio were fitted in the wheelhouse, but there were no formal guidelines stating when, and under what circumstances, these were to be used at sea. The reading of newspapers while on watch was a practice accepted by the skipper. The use of wheelhouse general lighting at night while underway was not. The deckhand on watch probably had no more than 6 hours sleep in the 24 hours before the grounding. However, he was aware that he would be taking the first wheelhouse watch after sailing from Mallaig but opted to accompany the rest of the crew to the Marine Hotel bar, rather than taking the opportunity
to rest. By asking the deckhand if he was fit to take the watch, then telling him to call either himself or the next watchkeeper if he felt too tired, the skipper demonstrated that he was conscious of the need of the person on watch to be alert. He went to bed only after being assured by the deckhand that he was fit to take the watch. Taking charge of the navigational watch at 0115 on 10 July, the deckhand felt no more tired than he had on previous occasions. It is apparent, however, that he was more fatigued than either he or the skipper appreciated. Any tendency for him to fall asleep was probably further exacerbated by the working routines and ergonomics in the wheelhouse. The reliance placed on the video plotter for the vessel’s safe navigation, along with the lack of ship echoes on the radar, meant there was little to tax the deckhand on taking over the watch. Initially he felt he had sufficient time to leave the wheelhouse to make a sandwich, and on his return he considered it quiet enough to put on the bridge lights and sit down and read newspapers. He was able to monitor and control all key equipment and instrumentation while remaining seated, and from this position could reset the watch alarm. After a week of disrupted sleep and several beers, the deckhand was now sitting down in the wheelhouse with little to do except read the newspapers. With the radio on low volume in the background, the balance was tipped and he fell asleep. The skipper had firm views on when and under what circumstances the television, video recorder and domestic radio could be used in the wheelhouse. However, these were not formally laid down as guidelines to the crew. This resulted in different interpretations regarding their use. The skipper believed that in some circumstances such equipment, along with other activities such as reading, helped to alleviate boredom, particularly when towing.
This may have been true to a certain extent, but at least two problems would have been encountered. First, a watchkeeper watching the television could not have maintained a proper lookout and second, a recreational rather than a formal working environment may have been generated. In a recreational environment, a person is likely to feel comfortable, and, if the conditions are right, he may be more prone to falling asleep than otherwise might be the case. The use of general overhead lighting in the wheelhouse would have severely impaired the deckhand’s ability to maintain a proper visual lookout. HRA Comment: Although fatigue and alcohol were both factors in this accident, low cognitive workload also played an important part. The deckhand did not have any duties, either physical or cognitive, to keep him alert. The accepted practices of reading and listening to the radio on watch, whilst sitting down, contributed to his low cognitive workload, and ultimately, contributed to his falling asleep.
Report ID: MAIB 1/6/109 Investigating Authority: MAIB Grounding of Baltic Champ Accident summary: Baltic Champ arrived port side alongside the outer western berth in Kirkwall at 0905 on 3 February 1999, with a cargo of containers, stowed both above and below deck. Discharge operations were started at 0930, but were suspended by shore staff at 1100, due to strong winds. At 1300, the master was instructed by the harbour master to leave the berth to allow Contender, a regular-calling ro-ro vessel, sufficient sea room to berth stern first at the ro-ro berth, situated ahead of Baltic Champ. Initially, the master protested and offered to move astern to the outer limit of the pier. However, after being told that Contender’s bow thruster was inoperable, he agreed to anchor off the port. At 0130 on 4 February, the master relieved the chief officer on watch and checked the vessel’s position by radar and GPS navigator. The master was alone in the wheelhouse with a stand-by crewman stationed in the messroom. A watch alarm was not fitted on the vessel. The master drank a cup of coffee and sat in the starboard wheelhouse chair for about half-an-hour. He then walked around the wheelhouse, occasionally looking at the radar to check the vessel’s position. After a while, the master noticed from the radar display that the vessel was moving quickly astern. He called the stand-by crewman on the intercom and said, “Quickly to anchor”. He also telephoned the chief officer, and then put the propeller pitch ahead in an attempt to arrest the vessel’s drift astern, which he thought was due to the anchor cable having parted. The master instructed the crewman to get ready to heave or slack away more cable, expecting him to put the windlass into gear. Seeing that the vessel was close to the shore, he then instructed the crewman and the chief officer, who was proceeding forward, to heave the anchor cable.
The vessel grounded on her starboard side aft, approximately 0.65 miles from the anchorage position. Additional information: The master slept from 2100 on the 2nd of February to 0300 on the 3rd of February and from 2030 on the 3rd of February to 0130 on the 4th of February. He also had an hour’s rest on his cabin daybed between 0900 and 1100 on the 3rd of February. On taking the watch at 0130 on the 4th of February, he felt rested and fit for duty. He had not consumed alcohol and had not taken drugs or other medicines. Prior to the incident, he did not leave the wheelhouse, and at no time did he feel drowsy or fall asleep.
HRA Comment: The master was not fatigued or under the influence of alcohol or drugs. He was on watch alone in the wheelhouse and he, appropriately, walked around and checked radars to keep alert. However, he still had a low cognitive workload as the vessel was anchored, and not engaged in any activity. Therefore, he did not detect that the vessel was drifting in sufficient time to prevent the vessel from grounding.
Report ID: 12/2005 Investigating Authority: MAIB Collision between Cepheus J and Ileksa Accident summary: The container ship Cepheus J, and the general cargo ship Ileksa, were transiting the Kattegat off the Danish coast on the 22nd of November 2004 when they collided. Both were following the recommended route ‘T’ on a south-easterly course. The weather was overcast and windy with rain, localised sleet and poor to moderate visibility. It was still dark. Cepheus J was proceeding at 16 knots and Ileksa at 6.5 knots. On Cepheus J’s bridge, the chief officer had sent the lookout to clean the crew mess room, while he continued completing paperwork, standing at the chart table on the port side of the bridge. From there, he had an unrestricted view ahead and, close by to his right, the displays of the ECDIS and radar were available. He did not see Ileksa until after the collision. On board Ileksa, the third officer had just taken the watch and the master was also on the bridge, sending a daily report to the company. Cepheus J was noted astern by radar at between 3 and 3.5 miles and visually sighted astern at 1.5 miles. When the ships were 0.5 mile apart, Ileksa called Cepheus J by VHF radio to establish what her intentions were. When no reply was heard, and with the ships approximately 0.3 mile apart, Ileksa began to take evasive action, but with the wind ahead was not able to turn sufficiently to avoid collision. At 0519 UTC, the two ships collided, with Cepheus J’s bow striking the stern of Ileksa. The impact caused severe damage to Ileksa’s stern and holed Cepheus J above the waterline. Both vessels were able to resume their voyages; there were no injuries and no pollution. Additional information: The chief officer on board Cepheus J was required to compile a record of the temperatures maintained in the refrigerated cargo containers. This required that the temperature be recorded every 6 hours.
The method adopted was for the temperature to be obtained by the AB on watch and then transferred to a fair copy by the chief officer. There were 61 refrigerated containers on board Cepheus J at the time of the collision, which was fewer than usual; however, the chief officer still spent a considerable amount of time each day completing the fair copy temperature log. It was this task that the chief officer was engaged in at the time of the collision. As a Ukrainian national, the chief officer was naturally interested in the political situation of his country in the run up to the presidential elections. This is why he chose to listen to the radio during the time he spent on watch. He was listening to the radio, broadcasting the news in Russian, at the time of the collision.
Fatigue was not an issue in this accident, with the OOWs on both ships recording sufficient hours of rest. With the visibility restricted to 1.5 miles, Ileksa would have been visible from Cepheus J for 9 minutes before the collision. Ileksa was displaying the lights required for a vessel of her size, which would have meant that Cepheus J would have seen at least a single white sternlight. However, additional lighting was illuminated around the accommodation, which would have enhanced the visibility of the vessel from astern. Cepheus J’s OOW was involved in tasks which distracted him from his primary duty of lookout. This meant that he was not paying attention to the radar or to keeping a visual lookout. Because he was listening to the news on the radio, he was unable to monitor the VHF radio effectively so he missed yet another indication of the presence of Ileksa. HRA Comment: The OOW on the Cepheus J was engaged in tasks that distracted him from the navigation of the vessel. He was not paying attention to the radar or keeping a visual lookout. He also missed the VHF radio call because he was listening to the radio.
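The report's nine-minute figure follows directly from the quoted speeds and visibility. The following is a minimal sketch of the arithmetic, assuming both ships held course and speed on the same track:

```python
# Sanity check of the report's 9-minute visibility window. Speeds and
# visibility are the figures quoted in the report (illustrative only).

def closing_minutes(distance_nm: float, own_kn: float, other_kn: float) -> float:
    """Minutes for an overtaking vessel to close a given range,
    assuming both ships hold course and speed on the same track."""
    closing_speed = own_kn - other_kn        # 16.0 - 6.5 = 9.5 kn
    return distance_nm / closing_speed * 60.0

window = closing_minutes(1.5, 16.0, 6.5)
print(f"Ileksa visible for about {window:.0f} minutes")  # ~9 minutes
```

Nine minutes at a 9.5-knot closing speed would have been ample time for an attentive OOW to detect Ileksa and take avoiding action.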
Report ID: 7/2002 Investigating Authority: MAIB Grounding of the Lomur Accident summary: Lomur hauled her nets at 0530 on the 14th of June 2001 and started a 16-mile passage towards Scalloway from a position to the west of Vaila. Steering was by autopilot. The skipper was on watch in the wheelhouse and read several telex messages before sitting in the starboard wheelhouse chair at about 0540. The remaining crew worked the fish until about 0600, and then joined the skipper in the wheelhouse to discuss fish tallies and to smoke cigarettes. Several minutes later, the deckhand went to bed, and the engineer went down below to check the engines. He then drank a cup of coffee in the mess room. Alone again in the wheelhouse, the skipper called the fish market in Scalloway via mobile telephone for about 5 minutes to advise them of the catch he would be landing that morning. At about 0610, the vessel passed Skelda Ness and altered course to port, to head towards the entrance to the Middle Channel. About 20 minutes later, as Lomur passed the northern point of the Cheynies, course was adjusted several degrees to starboard to enter the Middle Channel, and the watch alarm was reset. The skipper remained seated and kept the vessel in autopilot during both of these alterations and, soon after the second, he fell asleep. Lomur continued on the course set on the autopilot until she grounded on Hoe Skerry at about 0635. Additional information: As the skipper had only managed about 7 hours sleep, taken in three separate periods, during the 3 days before the grounding, it is almost certain that he fell asleep while on watch in the wheelhouse, through fatigue caused by inadequate rest. Disrupted sleeping patterns, and lack of sleep while trawling, are common and recognised causes of fatigue among fishermen. On this occasion, the problem was exacerbated by fishing inshore, the duration of the tows, and by operating with a crew of three, one of whom was very inexperienced.
The skipper was unable to take adequate rest periods, and his intention to ask one of the off-crew to join while the vessel was in Scalloway was recognition of this. It is, therefore, concluded that Lomur was inadequately manned for the pattern and duration of work she was conducting. Given the skipper’s lack of sleep, it is not surprising that he was tired, but tiredness alone does not cause a person to fall asleep. Other preconditions are also required. The skipper was alone in the wheelhouse on a calm summer morning, sitting on a chair within reach of all key equipment, and with the steady throb of the engine being the only noise. He was also navigating by eye in very familiar waters, and steering by autopilot. Such an environment encouraged inactivity and undoubtedly caused the skipper to feel comfortable
and relaxed and, therefore, more likely to succumb to the effects of fatigue, and to fall asleep. Although it was usual practice for the skipper to be accompanied in the wheelhouse when entering and leaving harbour, this was not the case on this occasion. As the engineer was in the mess room drinking a cup of coffee when the vessel grounded, it is concluded that the practice was one of routine, rather than a requirement. Had the engineer also been in the wheelhouse, his presence might have either raised the skipper’s alertness and prevented him from falling asleep, or at least alerted the sleeping skipper as the vessel neared Hoe Skerry. A major function of a watch alarm is to wake a sleeping watchkeeper. To be effective, however, this must be done in sufficient time to allow action to avoid an accident. It follows, therefore, that the interval set on a watch alarm needs to be commensurate with traffic density and the proximity of navigational dangers. In this case, with Lomur on passage through confined waters, the watch alarm did not wake the skipper in time to take action to avoid Hoe Skerry because it was set at a 10-minute interval. The subsequent fitting of a watch alarm with a 3-minute interval is, therefore, considered to be prudent. HRA Comment: Both the skipper and the watchkeepers should take full account of the quality and quantity of rest taken when determining fitness for duty. Particular dangers exist when the watchkeeper is alone. It is all too easy to fall asleep, especially while sitting down in an enclosed wheelhouse. Watchkeepers should ensure they remain alert by moving around frequently, and ensuring good ventilation. A tired watchkeeper who is sitting down with little to do, is more likely to fall asleep than one who is busy. The skipper’s decisions to navigate by eye, and to remain in autopilot, contributed to his inactivity.
Although navigation by eye through the Middle Channel was reasonable, in view of the sea and weather conditions and the skipper’s local knowledge of the waters, the use of electronic aids to navigation would have been beneficial. Not only would they have provided a check on the skipper’s visual assessment, they would also have given him more to think about and so helped to keep him more alert. A lesser reliance on the autopilot would also have been advantageous. Had the skipper chosen to change to manual steering for the course alterations before the grounding, such action would have been navigationally prudent in confined waters and would have demanded a greater degree of concentration from him.
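The point about the watch-alarm interval reduces to simple arithmetic: the distance a vessel runs unmonitored between alarm resets is speed multiplied by interval. A minimal sketch follows; the 8-knot speed is an assumed, illustrative figure, as the extract does not give Lomur's actual speed.

```python
def distance_run_nm(speed_knots: float, interval_minutes: float) -> float:
    """Distance (nautical miles) a vessel covers during one watch-alarm interval."""
    return speed_knots * interval_minutes / 60.0

# Illustrative only: 8 knots is an assumed coastal passage speed.
print(distance_run_nm(8, 10))  # 10-minute alarm: roughly 1.33 nm run unmonitored
print(distance_run_nm(8, 3))   # 3-minute alarm: roughly 0.4 nm
```

A sleeping watchkeeper could thus carry the vessel well over a mile through confined water between 10-minute alarms, which is why the shorter interval is described as prudent.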
6. Examples of errors due to low cognitive workload switching to high cognitive workload

The following two accidents illustrate errors arising from situations in which the person had to move from a low cognitive workload to a high cognitive workload. This can lead to disorientation, confusion and an inability to react in time to prevent the accident.

Report ID: M02C0064
Investigating Authority: TSBC
Collision between Canadian Prospector and Stellanova

Accident summary: At approximately 1850 on the 12th of October 2002, the upbound Stellanova was going west through the South Shore Canal on the St. Lawrence Seaway while the downbound Canadian Prospector was preparing to enter the South Shore Canal eastbound just after transiting Lake St. Louis. As it approached Mile 12, the Stellanova was on the south side of the channel when the pilot called the Canadian Prospector and arranged a starboard-to-starboard passing. The master of the Canadian Prospector concurred with the arrangement and manoeuvred the vessel towards the north side of the channel. The Stellanova was manoeuvred in order to keep it on the south side of the channel, but it sheered to port and then again towards the centre of the channel, and the Stellanova and the Canadian Prospector collided. Both vessels sustained significant damage.

Additional information: The master of the Stellanova was acting as the officer of the watch, and was therefore responsible for navigation. However, he was doing some administrative tasks for some of the time the vessel was navigating in the confined waters of the channel. In the crucial seconds before the collision, only the pilot had a clear mental model of where the ship was in relation to the channel and to the Canadian Prospector. Keeping a close watch on the movement of a vessel is key to safe navigation in confined waters, and is a very important factor for the execution of manoeuvres.
It is therefore essential that each member of the bridge team clearly understands his/her role and ensures that all information relating to the conduct of the vessel is conveyed to other team members. This was not done.

HRA Comment: The master, acting as the officer of the watch, was doing administrative tasks which did not have a high cognitive demand, such as that required for
navigation. Thus, when the Stellanova suddenly sheered to port, the master had to switch quickly from low-demand to high-demand cognitive tasks. As a result, he did not have time to properly assess the situation and develop a clear mental model of where the ship was in relation to the Canadian Prospector or the channel, and so could not take appropriate action to avoid the collision.
Report ID: 150
Investigating Authority: ATSB
Collision between Star Sea Bridge and Sue M

Accident summary: At about 0110 on the 21st of June 2000, off Evans Head, New South Wales, the skipper of the prawn trawler Sue M and the deckhand were sorting their catch on the after deck. The two men sorted the catch for approximately half an hour, facing each other on opposite sides of the sorting tray above the icebox, the skipper on the starboard side, the deckhand on the port side. Eight or nine trawlers could be seen to the west, the closest about a mile away. Periodically the skipper would enter the wheelhouse to check the boat’s course.

On board the bulk carrier Star Sea Bridge, sometime between 2340 and 2350 on the 20th of June, the second mate arrived on the bridge. After checking the positions plotted by the third mate who had been on watch, he read some incoming messages and had a chat with the third mate. He went to the forward part of the wheelhouse and had a look around before returning to the chart table and taking over the watch at midnight. He had seen no other vessels, either visually or on radar. The third mate, on being relieved, did not go to his cabin but stayed at the chart table where he worked on the port log and some papers which he was to submit after the voyage. According to the two officers, during the following hour, the third mate had some brief conversations with the second mate when the latter went to the chart table to check the chart. Just after 0045, the lookout saw a group of fishing vessels on the starboard side, which he estimated were about 7–8 miles away. There was no other traffic around at the time. At 0050, he saw a ship, the Ever Able, right ahead and, using the radar, saw that it was about 14 miles away. At 0110, the second mate of Star Sea Bridge switched the steering from autopilot to manual, ordering an alteration of course of 5° to starboard to clear Ever Able.
While the ship was turning slowly back to its original course, the second mate noticed a white light about 3 to 4 points on the starboard bow at a distance which he initially estimated to be two miles off, perhaps less. The lookout, at the wheel, also saw this white light on the starboard side. The second mate could not make out what the white light was, but it was closing. He wondered how it could be getting nearer even though the ship was turning to port. According to his evidence, he sounded the ship’s whistle and ordered the lookout to steer ‘hard-a-port’. At this point, the white light appeared to close rapidly and, moments later, it made contact with the starboard side of Star Sea Bridge.
Additional information: Despite the fact that the lookout on duty with the second mate on Star Sea Bridge was supposed to have been maintaining a watch for other vessels, he did not notice Sue M until he was at the wheel, altering course for Ever Able. He had been talking to the lookout, who had been on duty with the third mate, until 0045 and this could well have distracted him from keeping a proper lookout. Just after this, he observed fishing vessels on the starboard side, which he reported to the second mate. By this time, however, Ever Able was visible and the attention of both the second mate and the lookout was then directed solely to that vessel. The second mate on Star Sea Bridge would have been aware of the presence of the third mate’s lookout on the bridge after midnight, but he did nothing about it. He spent some time at the chart table with the third mate, which suggests that he was distracted and omitted to ensure that a proper lookout was maintained. After Ever Able was sighted, he became engrossed in ensuring that its CPA was adequate, but he paid no attention to what appeared to him to be two white lights on the starboard bow of his own ship. It is possible that these lights were those of the fishing vessel Sue M, but it was only when a collision was imminent that he noticed a ‘white light’ close to the bulk carrier and attempted to take avoiding action. According to his evidence, he sounded the whistle at this time, but the deckhand on the fishing vessel did not hear it.

HRA Comment: The second mate and lookout on watch, on board the Star Sea Bridge, did not maintain a proper lookout, and were both distracted by talking to the crew members who had been on a previous watch. When the lookout sighted the Ever Able, all attention was directed solely to the task of avoiding that vessel.
As a result, Sue M was not sighted for some time and, once it was sighted, the lookout and the second mate took further time to determine what it was and where it was.
7. List of Performance Influencing Factors (PIFs)

This section describes a list of Performance Influencing Factors that were noted during the examination of accidents from the four databases. These factors can influence human behaviour and can also contribute to accidents in which high or low cognitive workloads are a factor.

The time of the incident: Many accidents where workload is an issue tend to occur early in the morning, between the hours of 0000 and 0600. At this time of day, people are naturally more tired and less alert than, say, in mid-afternoon.

The stage of the person’s shift: Many accidents occur at the beginning of a shift, when a person is assessing the current situation and trying to build a mental model of it. Conversely, at the end of a shift, the person may be preparing to hand over, or may be distracted by thoughts of finishing the shift. In either case, the person may not be fully concentrating on the task(s) at hand.

The number of persons on watch: Many accidents tend to occur when a person is on watch alone, as that person is then responsible for all aspects of the navigation of the vessel. In cases where there are multiple people on watch together, distractions and lack of communication between people can also lead to accidents.

The type of vessel: This can sometimes contribute to the accident, especially if there is a large vessel navigating near a smaller vessel. In many cases, the smaller vessel cannot be sighted visually, due to visual restrictions in the wheelhouse.

The technology used: Many vessels now have numerous navigational aids, which can contribute to accidents in two ways. Firstly, people can become over-reliant on these aids, and thus less vigilant in their visual lookout and in cross-checking data between instruments. Secondly, people may not use the navigational aids because they prefer to navigate by visual references, or they may not fully understand how to use the aids.

Weather conditions: These can contribute to accidents by reducing visibility (e.g. rain, snow, fog, glare of the sun), or by making it more difficult to control a vessel.

The area of the incident: Collisions tend to occur during the voyage, when vessels are passing each other in narrow channels. Groundings tend to occur when berthing, coming into port or exiting port.
Fatigue: There have been extensive studies into how fatigue can influence performance; when fatigue is combined with very high or very low workloads, the likelihood of an accident increases greatly.

Environment: The environment around the person on watch can influence performance. For example, if the person can conduct all tasks whilst sitting down, this can make the person less vigilant and reduce the ability to switch between low cognitive workloads and high cognitive workloads.
Appendix 3 – Stakeholder analysis
1. Introduction
For the purpose of selecting a methodological approach to cognitive workload assessment, the various stakeholders in the shipping industry were identified and their potential usage of the assessment tool anticipated. Shipping is characterised by having a relatively large number of stakeholders, only a few of whom are likely actually to make use of the tool (although many more may make use of information generated by use of the tool). Of those who may actually use the tool, only a minority are likely to be primary users in the first instance. Their requirements should therefore dominate in terms of their influence on methodology selection.
2. Stakeholder definition
The word “stakeholder” is used to mean a participant in the shipping industry, i.e. a person, group, organisation, body corporate, authority, etc of any kind which in some way or other contributes to the provision of shipping services. Thus, a pilot, a ship’s crew, a classification society, a marine equipment manufacturer and a flag state administration are all stakeholders in the shipping industry. They, and many other similar types of individual and organisation, each contribute to the provision of a shipping service to end-users. (In this context, end-users include manufacturers, importers, exporters, consumers, passengers, the general public and society at large, for whom the cost-effectiveness of shipping, and the risks associated with their reliance on shipping, are valid considerations).
3. Stakeholder categorisation
A categorisation of stakeholders is proposed in respect of the application of the workload assessment tool to the tasks performed by a watchkeeping navigating officer (i.e. the OOW, this scope having been agreed during the project kick-off meeting). This categorisation is as follows. Stakeholders for whom, in respect of the core objectives of that stakeholder, the OOW’s cognitive workload is either:

a) Directly relevant (stakeholder will use the tool and make decisions based on results);
b) Indirectly relevant (will not actually use the tool, but will make decisions informed by results); or
c) Not relevant at all (decisions made largely irrespective of results of tool).

Of course, this categorisation is not clear cut, since complex interactions exist between many of the shipping industry stakeholders, and also because the
eventual scope and applicability of the workload assessment tool is yet to be determined. The table below (see Section 5) comprises a near-comprehensive list of shipping industry stakeholders (as defined in Section 2 above). The role of each stakeholder is briefly indicated.
4. Stakeholder requirements

For stakeholders in category 3(a), their anticipated specific functional requirements for the cognitive workload assessment tool are developed in the tables in section 6 below. Requirements for stakeholders in category (b) have not been developed; and it is assumed that stakeholders in category (c) will impose no requirements. In addition to the specific functional requirements for category (a) stakeholders (given in section 6), generic functional requirements should include:

• Applicable at the overall task and/or sub-task level;
• Suitable for predictive and reactive task analysis;
• Able to accommodate safety critical tasks in both the high workload and ‘dormant’ states;
• Minimal ambiguity associated with input parameters;
• Clear user guide manual;
• Repeatability;
• Provide both qualitative and quantitative outputs; and
• Simple, robust and rapid version (i.e. sailor-proof) needed for use on board.

5. List of stakeholders (alphabetical order)
Stakeholder (Category): Role

• Accident investigators (Direct): Investigation of maritime accidents; dissemination of safety advice and recommendations.
• Accident/incident database managers (Indirect): Database structuring and incident/accident information collection and archiving.
• Aids to Navigation, lighthouse organisations (Indirect): Provision and maintenance of visual (lighthouses, leading lights), radio, etc navigational markers and information.
• Bridge/lock controllers (inland navigation) (Not relevant): Assessment of (ship type specific) risks (e.g. to optimise checklists for bridge/lock control surveys); acceptance/refusal of ships; fee estimation/adjustments; pollution avoidance; fulfilling national/local requirements for contingency planning, and provision of emergency services.
• Bunkering suppliers (Not relevant): Supply of fuel and other consumables to ships in port, anchored etc.
• Cargo insurance company (Not relevant): Provision of insurance cover for cargoes carried by sea.
• Charterers (Not relevant): Selection of proper ship for operation (vetting inspections); refusal of non-compliance vessels; cargo damages; Port State Control inspections and other surveys.
• Chief Engineer (Indirect): Management of operation and maintenance of main and auxiliary machinery on board under the direction of the ship’s master.
• Classification societies, construction surveys (Not relevant): Supervision of hull and machinery construction to ensure compliance with classification rules; type approval of ship’s equipment.
• Classification societies, periodic surveys (Not relevant): Survey and inspection of ships in operation to ensure continued compliance with classification rule requirements.
• Classification societies, rule development (Indirect): Pro-active approach to assess ship design safety; tool for (safety) equivalency evaluations; acceptance/refusal of ships’ safety design; tool for rule development/adjustment.
• Coastguard agencies (Direct): Accident and incident investigation; Port State Control inspections and other surveys; coordination and/or provision of search and rescue services to the maritime community; operation of MRCCs (Marine Rescue Coordination Centres).
• Consumer / environmental pressure groups (Not relevant): Persuasive activities.
• Crew managers (Indirect): Crew bargain agreements; minimum safe manning; crew education, training and licensing; crew leave and holidays; crew safety.
• Customs authorities (Not relevant): Cargo declaration procedures; cargo control.
• Drug enforcement authorities (Not relevant): Avoidance of smuggling, secret lodges, etc.
• Engineering OOW (Indirect): Day-to-day watchkeeping operation and monitoring of main and auxiliary machinery on board.
• Environmental authorities (Indirect): Information on/assessment of risks in certain areas (e.g. coastal); acceptance/refusal of ships; accident and incident investigation.
• EU (Indirect): Policy setting.
• Financial organisations, financier (Not relevant): Information on a ship’s risks; acceptance/refusal of financing; interest estimation/adjustments.
• Fire department (Not relevant): On board fire fighting training.
• Flag State Administration, policy setting and compliance (Indirect): Information on/assessment of risks; decision on safety regulations (through IMO); database structuring and incident/accident information collection and archiving.
• Harbourmasters (Indirect): Overall control of shipping movements, pilotage, tug services, etc in ports.
• Hull & machinery insurer (Not relevant): Information on/assessment of ship’s risks; acceptance/refusal of ships; premium estimation/adjustments; developing codes of good practice; loss reduction campaigns; advisory activities.
• Immigration authorities (Not relevant): Stowaway avoidance.
• IMO (Indirect): Policy setting; database structuring and incident/accident information collection and archiving.
• Individual crew member (Indirect): Day-to-day shipboard tasks under the direction of watchkeeping officer.
• Insurance brokers (Not relevant): Intermediaries between shipowner / charterer and insurance company.
• Intergovernmental bodies (Indirect): Support for policy setting.
• Marine equipment manufacturer (Indirect): Manufacture and supply of machinery, control systems, navigational and safety equipment, etc.
• Marine institutes (Indirect): Centres of professional expertise for people working in the maritime industry.
• Marine warranty surveyor (Not relevant): Surveys and inspections for assessing seaworthiness of ships.
• Maritime communications provider (Indirect): Organisations providing radio and satellite communication infrastructure for the maritime community.
• Maritime professional institutions (Indirect): Ship safety; minimum safe manning; crew education, training and licensing; crew safety.
• Maritime research organisations (Direct): Support for ship design and operation development/improvement.
• Maritime universities (Indirect): Provision of formal higher education in ship science, naval architecture and marine engineering.
• Master (Direct): The senior nautical officer on board with overall day-to-day responsibility for operating a ship.
• Mooring masters (Not relevant): Shore-based person responsible for safe mooring of a ship at a terminal.
• National governments (Indirect): Policy setting.
• Navigating OOW (Direct): Watchkeeping nautical officer responsible for day-to-day navigation and operation of the ship.
• Navigation simulators (Direct): Shore-based schools for navigational training of nautical watchkeeping officers.
• P&I Clubs (Indirect): Information on/assessment of (ship type specific) risks; acceptance/refusal of shipping companies; evaluation of fleets; developing codes of good practice; loss reduction campaigns; Port State Control inspections and other surveys.
• Passengers (Not relevant): Personal safety; clean environment.
• Pilots (Direct): Ship handling; ship safety; en route / channel safety.
• Political advisors (e.g. members of EU parliament) (Not relevant): Policy setting.
• Port authorities (Indirect): Assessment of (ship type specific) risks (e.g. to optimise checklists for port state control surveys); acceptance/refusal of ships; fee estimation/adjustments; pollution avoidance; fulfilling national/local requirements for contingency planning, and provision of emergency services; Port State Control inspections and other surveys.
• Port state control inspectors (Direct): Policy setting; shipboard inspections to verify compliance with IMO SOLAS and other regulatory requirements.
• Port terminal operators (Not relevant): Operation of shore facilities for loading/unloading cargo and passengers.
• Regional (waterway) authorities (Indirect): Information on/assessment of risks in certain areas (e.g. coastal); acceptance/refusal of ships; pollution avoidance; fulfilling national/local requirements for contingency planning, and provision of emergency services.
• Regional governments (Indirect): Policy setting.
• Regulatory policy-makers (Direct): Policy setting; Flag State maritime policy development.
• Salvage companies (Not relevant): Ship handling; ship safety; ship design.
• Seafarer education and training establishments (Indirect): Crew training; training in routine operations and emergency preparedness; education in safety management.
• Search and rescue (SAR) organisations (Not relevant): Training in routine operations and emergency preparedness; fulfilling national/local requirements for contingency planning, and provision of emergency services.
• Ship design organisations (Direct): Design responsibility for new ship construction, refit and repair.
• Ship managers (Direct): ISM compliance; pollution avoidance; crew training; training in routine operations and emergency preparedness.
• Ship owner associations (Indirect): Evaluation of fleets; developing codes of good practice; advisory actions; persuasive activities.
• Ship owners (Direct): Selection of ship design options; safety design optimisation; seeking exemptions or equivalencies to prescriptive regulations; negotiation with yards; negotiations with insurance companies; safety system investments; support for safety management investigations/decisions; ISM compliance; pollution avoidance; crew training; training in routine operations and emergency preparedness; accident and incident investigation; database structuring and incident/accident information collection and archiving.
• Ship repair company / drydock (Not relevant): Ship repair and maintenance; ship provisioning; ship maintenance, repair and refit.
• Ship sale and purchase brokers (Not relevant): Support for acceptance / refusal of ships; advisory activities; marketing.
• Ship surveyors (Indirect): Condition surveys of ships and machinery for owners, charterers, class, etc.
• Ship yards, shipbuilders (Not relevant): New ship construction; design development / optimisation.
• Ship’s crew (Indirect): Day-to-day shipboard tasks under the direction of watchkeeping officer.
• Ship’s equipment servicing/maintenance contractor (Not relevant): Routine servicing and repair of machinery, control systems, navigational and safety equipment, etc.
• Shippers, cargo forwarding agents, other carriers, etc (Not relevant): Cargo security/safety (avoidance of theft, damage); cargo declaration; cargo control.
• Stevedores (Not relevant): Cargo handling; load/discharge safety; cargo gear safety; cargo securing.
• Tour operators (Not relevant): Passenger accommodation (hotel) catering; passenger leisure; passenger safety (handicapped passengers).
• Trade Unions (officer) (Indirect): Ship safety; minimum safe manning; crew education, training and licensing; crew bargain agreement; crew safety; crew accommodation; crew provisioning; crew leisure.
• Trade Unions (ratings) (Indirect): ILO resolutions; crew bargain agreement; crew safety; crew accommodation; crew provisioning; crew leisure.
• Tug owners, tug operators (Not relevant): Ship handling, manoeuvring; ship design.
• VTS / VTMS organisation (Direct): Provision and operation of vessel traffic information and management systems in ports and confined waterways.
6. Evaluation of the importance of Stakeholder requirements

In this section, an attempt is made to prioritise the requirements of the primary stakeholders by developing an importance index for these stakeholders. The components of the index are defined as follows:

• Objectives: What are the key roles and objectives of this stakeholder within the maritime community?
• Relevance: As a possible user of the tool, how relevant is cognitive workload to the key objectives of this stakeholder? Rated H/M/L (High/Medium/Low).
• User class: Is this stakeholder a primary (P, will use it), secondary (S, will be directly affected by use) or tertiary (T, will only be indirectly affected by use) user? P scores high, T scores low.
• Application: Will this stakeholder use the tool primarily in acute, demanding situations (Tactical); for routine operational work-planning on board (Strategic); or as a long-term planning aid (Policy-setting)? T scores high, P scores low.
• Practicalities: Is use of the tool by this stakeholder likely to be complex (C), middling (M) or simple (S) in terms of the required expertise, input data and time to use the tool? C scores high, S scores low.
• Importance: Summation of the relevance, class, application and practicality scores, on a scale from 0 (the lowest combination, LTPS) to 9 (the maximum, HPTC).
• Specific needs: Brief description of stakeholder needs in relation to cognitive workload of the OOW.
• Functional reqts: Specific functional requirements of the workload assessment tool, given this stakeholder’s specific needs (see also the list of common requirements).
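The report does not publish the per-level weights behind the importance index, but one scoring scheme that reproduces every tabulated value (the HPTC = 9 and LTPS = 0 anchors, and rows such as ship owners M/P/P/M = 5 or coastguard agencies H/P/S/M = 7) is sketched below. The individual weights are an inference from those values, not taken from the report.

```python
# Inferred weights: each rating maps to a small integer and Importance is their sum.
# These values reproduce the anchors HPTC = 9 and LTPS = 0 and all tabulated rows,
# but the report itself does not state them -- treat this as a reconstruction.
WEIGHTS = {
    "relevance":      {"H": 3, "M": 2, "L": 0},
    "class":          {"P": 2, "S": 1, "T": 0},
    "application":    {"T": 2, "S": 1, "P": 0},
    "practicalities": {"C": 2, "M": 1, "S": 0},
}

def importance(relevance: str, user_class: str,
               application: str, practicalities: str) -> int:
    """Importance index as the sum of the four inferred component scores."""
    return (WEIGHTS["relevance"][relevance]
            + WEIGHTS["class"][user_class]
            + WEIGHTS["application"][application]
            + WEIGHTS["practicalities"][practicalities])

print(importance("H", "P", "T", "C"))  # HPTC: navigation simulators, accident investigators
print(importance("M", "P", "P", "M"))  # ship owners, ship managers, ship designers
```

Making the scheme explicit also shows why every primary stakeholder scores at least 5: all of them rate P on user class, which alone contributes 2 of the available 9 points.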
Ratings key: Relevance H/M/L (High/Medium/Low); Class P/S/T (Primary/Secondary/Tertiary); Application T/S/P (Tactical/Strategic/Policy); Practicalities C/M/S (Complex/Middling/Simple). Importance ranges from LTPS = 0 to HPTC = 9.

Ship owners (Commercial)
Objectives: Profitable seaborne trade; safety through statutory compliance; minimum manning.
Ratings: Relevance M; Class P; Application P; Practicalities M; Importance 5.
Specific needs: Improved performance, task scheduling and rostering of onboard staff. Justification of workload v. manning scales.
Functional reqts: Reflect particular trading pattern of ship, and of 3/O, 2/O, 1/O roles on board. Allow office-based rostering and scheduling of shipboard work. Allow comparison of 6/6, 4/8 and other watchkeeping profiles. Require only modest levels of expertise to use.

Ship managers (Commercial)
Objectives: Profitable ship operation; safety through statutory compliance; minimum manning.
Ratings: Relevance M; Class P; Application P; Practicalities M; Importance 5.
Specific needs: Improved performance, task scheduling and rostering of onboard staff. Justification of workload v. manning scales.
Functional reqts: Same as ship owners.

Coastguard agencies (Regulatory)
Objectives: Improved maritime safety; efficient coastal monitoring and response activities.
Ratings: Relevance H; Class P; Application S; Practicalities M; Importance 7.
Specific needs: Improved performance, task scheduling and rostering of MRCC staff. Justification of workload v. manning scales.
Functional reqts: Applicable to shore-based shift work. Allow rostering and work scheduling. Able to accommodate task sharing when workload levels rise.

Port state control inspectors (Regulatory)
Objectives: Marine accident prevention through ship-board inspections.
Ratings: Relevance M; Class P; Application T; Practicalities S; Importance 6.
Specific needs: Rapid seafarer workload assessment (acceptable, marginal or unacceptably high) during routine on-board inspections.
Functional reqts: Portable; readily useable on board during inspection visits or very short voyages. Rapidly provide qualitative workload rating. Require minimal time/difficulty/equipment to collect input data. Enable effect of multiple concurrent tasks to be assessed.

Maritime research organisations (Educational)
Objectives: Improved understanding of bridge activities and navigational ship safety.
Ratings: Relevance H; Class P; Application P; Practicalities C; Importance 7.
Specific needs: Sophisticated assessment tool to support continuing research into cognitive loading in the maritime domain.
Functional reqts: Allow fine-grained differentiation between different tasks and types of task. Allow sensitivity analysis, Monte Carlo simulation and risk assessment. High degree of flexibility to accommodate different types of data input (observation, measurement, etc). Reflect expert / highly intelligent usage and permit detailed interpretation of outputs.

Navigation simulators (Educational)
Objectives: Providing high-quality, good value, simulator training for ships’ officers.
Ratings: Relevance H; Class P; Application T; Practicalities C; Importance 9.
Specific needs: Repeatable and very flexible scenario modelling, focused on navigational tasks.
Functional reqts: Accommodate instructor-generated, real-time, multi-tasking simulations. Able to capture and record user behaviour and performance. Allow detailed feedback of subjective experience and comparison with model output.

Accident investigators (Regulatory)
Objectives: Enhancing maritime safety; determining causes of marine accidents.
Ratings: Relevance H; Class P; Application T; Practicalities C; Importance 9.
Specific needs: Painstaking assessment of workload as contributory factor in accident. May need to gather data rapidly during interviews and site examination.
Functional reqts: Same as maritime researchers, plus: require minimal time/difficulty/equipment to collect input data; enable effect of multiple concurrent tasks to be assessed.

Pilot (Individual)
Objectives: Protecting independence of employment; safety of ship and port facilities.
Ratings: Relevance H; Class P; Application T; Practicalities S; Importance 7.
Specific needs: Rapid, on-board, situation-specific assessments (acceptable, marginal, unacceptable) during routine operations.
Functional reqts: Short-term, simple forecasting of workload level (safe or not?). Minimal (and preferably formulaic) input required. Output suitable as an aid to decision-making, and for logging.

Ship design organisations (Commercial)
Objectives: Enhanced competitiveness through innovation; safety through compliance; minimum manning.
Ratings: Relevance M; Class P; Application P; Practicalities M; Importance 5.
Specific needs: Comparative analysis of alternative IT provision on a ship’s bridge; design of control and alarm systems for machinery, fire, cargo, ballasting, etc.
Functional reqts: Same as maritime researchers.

VTS / VTMS organisation (Regulatory)
Objectives: Safe and efficient management of ship movements within a port area.
Ratings: Relevance H; Class P; Application S; Practicalities M; Importance 7.
Specific needs: Improved performance, task scheduling and rostering of VTS staff. Justification of workload v. manning scales.
Functional reqts: Same as coastguard agencies.

Ships’ masters (Individual)
Objectives: Safe & timely navigation; efficient shipboard operations.
Ratings: Relevance H; Class P; Application T; Practicalities S; Importance 7.
Specific needs: Rapid, on-board, person- and situation-specific assessments (acceptable, marginal, unacceptable) during routine and emergency operations.
Functional reqts: Short-term, simple forecasting of workload level (safe or not?). Minimal (and preferably formulaic) input required. Output suitable as an aid to decision-making, and for logging.
Appendix 4 – Bridge task inventory
1. Generic overview
Only tasks undertaken on board by the ship’s normal complement are considered. There are three departments: Deck, Engine and Hotel. Only the deck department is considered in detail. The ship’s master is part of the deck department and has overall command and responsibility for all activities on board. Tasks are split into watch-keeping duties and other (off-watch) activities. Focus is primarily on watch-keeping activities, with off-watch activities only considered to the extent that they overlap with or impinge on watch-keeping tasks. In addition, from an accident prevention perspective, consideration is given only to tasks undertaken prior to an event, i.e. the study does not deal with emergency response activities by those on board after an accident (but does include emergency preparedness and actions taken / not taken to avert an imminent accident, e.g. collision avoidance). Tasks include the following:

1.1 Deck department
• Navigation and control of the ship (passage planning, conning, external communications)
• Manoeuvring (including berthing, unberthing, mooring, anchoring)
• Control of seaworthiness (ballasting, stability, watertight integrity)
• Management of cargo operations (loading, unloading, stowage, securing)
• Maintenance of hull structure and fittings
• Emergency preparedness (maintenance, training and drills with lifesaving and firefighting appliances)

1.2 Engine department
• Control of main and auxiliary machinery (propulsion, steering, power generation, utility systems, etc)
• Maintenance of machinery, systems and equipment on board, including communication systems.

1.3 Hotel department
• Provision of food, drinking water and medical care for crew and passengers
• Shipboard house-keeping.
All departments also have administrative tasks (liaison with shore authorities, record-keeping, compliance with security, safety and commercial management requirements, etc).
2. Deck Department Tasks

2.1 Navigation and control of the ship
• Route planning: Reference to voyage instructions, charts and pilot books; selection of way-points; reference to tidal stream information; tide offset calculations; derivation of courses to steer; chart plotting; identification of lights, marks and buoys; detailed passage planning for transit of designated shipping lanes
• Position fixing: GPS checks; range/bearing of marks and lights by compass/radar; radio position-fixing; star/sun sights; dead reckoning calculations
• Control of ship speed and course: Conning; steering instructions to helmsman; monitoring of heading, course made good and speed; setting autopilot; overseeing change-over from manual to auto steering (and vice versa); selection of engine power/revs
• Lookout for other vessels: Maintaining continual visual lookout; liaison with seafarer on lookout duty; maintaining radar watch (at ranges appropriate to circumstances); maintaining radio listening watch; use of binoculars/night-vision glasses; estimation of target ship's course and speed
• Collision avoidance: Maintaining close radar and visual watch over target vessel(s); estimation of CPA (closest point of approach) distances and times; estimation of stopping/turning distances and times; evaluation of own ship manoeuvring options in context of Colregs and target ship(s) behaviour(s); seeking to establish radio communication with target ship(s), and VTS where relevant
• Radio communication: Monitoring routine radio traffic; monitoring GMDSS; position input (if not automatic within GMDSS); operational ship-shore radio traffic (with ship's agents, owners, port and VTS authorities, etc)
• Weather forecasting: Receipt, collation and study of forecast weather information and sea-state observations; evaluating trends and anticipating changes; adjusting passage plan (heading, speed) if appropriate
• Logbook entries: Routine logbook entries at watch hand-over (position, course made good, speed, weather, etc); occurrence/event records
• Watch hand-over: Covering navigational aspects (as above), plus status information regarding seaworthiness, cargo, safety functions, emergency preparedness and hull maintenance (as below).
2.2 Manoeuvring

• Slow speed manoeuvring: Communication with machinery control room; ship handling; use of propeller(s), rudder(s) and transverse thruster(s) to control ship's position/direction/heading; steering instructions to helmsman; navigation with escort tug(s); navigation with tug assistance; picking up or being taken in tow; estimation of, and compensation for, windage and tidal stream effects; ship/terminal communications; ship/VTS communications
• Pilotage: Loitering at pilot station; heaving to; manoeuvring to create a lee for the pilot cutter; communication to/from pilot ladder deck party; pilot/OOW checklist completion; review of passage plan; OOW/pilot, and pilot/tug, communications during harbour transit
• Berthing, unberthing, mooring, anchoring: Control of, and communication to/from, mooring parties on forecastle/poop, also any mooring boats; monitoring mooring lines; establishing security watch; ship/terminal communications; maintaining anchor watch.
2.3 Control of seaworthiness

• On passage: Supervision, recording and monitoring of ballast water exchange/treatment; supervision, recording and interpretation of tank soundings; monitoring bilge alarms; stability monitoring; stability calculations; arrival draft calculations
• Alongside: Supervision of ballasting and de-ballasting operations; monitoring of drafts and soundings; departure draft calculations; stability calculations
• Watertight integrity: Monitoring integrity and status of weathertight/watertight doors, hatches and closures.
2.4 Management of cargo operations (loading, unloading, stowage, securing)

At the time of completion of the project, these tasks had not been fully defined, as they were not assessed as having a major impact on OOW workload in the types of scenario that were likely to be assessed by the CMWL tool.

2.5 Maintenance of hull structure and fittings
Considerations similar to those set out in Section 2.4 also apply to these tasks.
2.6 Emergency preparedness

Maintenance, training and drills with lifesaving and firefighting equipment and systems.

2.7 Safety Critical Tasks

The following deck department/OOW tasks are considered to be safety critical:
• Collision avoidance
• Cargo securing
• Ballast exchange at sea
• Passage planning/position fixing.
Appendix 5 – Sample letter to shipping companies
Dear Mr X,

Earlier this year you kindly agreed that Company Y would participate in a research project supported by the Maritime and Coastguard Agency (MCA) to develop a computer-based tool to assess the cognitive (i.e. mental as opposed to physical) workload that operational tasks impose on seafarers. Research has shown that both underload conditions, such as maintaining watch with the autopilot on in an open and calm sea at night, and overload situations (e.g. navigating into port in a very busy seaway) can be significant factors in accident causation. In addition to situations involving high levels of cognitive activity, this project will explicitly examine minimum levels of cognitive activity and their effects on the ability of individuals to readily switch into an active mode, e.g. during an unforeseen incident. This also has significance when considering the increased use of ship-borne automated systems that reduce the cognitive workload of crew members for significant periods of time.

The overall research objectives, as stated by the MCA, are as follows:
1. Review current research into safe maximum and minimum human cognitive workload capabilities (this part of the research is now largely complete).
2. Identify the safe maximum and minimum human cognitive workload levels for the maritime industry and, if necessary, for different trades or conditions of work within it.
3. Develop a robust tool that can effectively and efficiently assess human cognitive workload levels.
4. Test the tool using examples of rosters/shift patterns from the maritime industry.

The work is being conducted by Human Reliability Associates Ltd (HRA), which has been involved in analysing the human role in a wide variety of safety critical industries for the past 25 years. Information on the company is available on our website: www.humanreliability.com. HRA is being assisted in this project by Mr Jim Peachey, an experienced naval architect and internationally recognised expert in maritime safety.

The HRA contacts for the work are Dr Claire Blackett or Dr David Embrey.

At the MCA's request, the study is focussing on navigational aspects of ship operation, hence on the bridge and the role of the Officer of the Watch (OOW). As a first step, we have compiled a draft task inventory and identified some safety critical tasks associated with this watch-keeping role. These are contained in Appendix 1 (attached). Our intention is to examine the OOW's cognitive workload when performing these safety critical tasks. At this stage, therefore, we would value confirmation that we have adequately reflected the OOW's role. Please could we ask you to comment on Appendix 1 within, say, the next fortnight? We would like a corporate response in this respect, i.e. your company's comments on the appropriateness of the selected tasks.

The next stage of our work will comprise an information gathering process, using a short diary-based questionnaire, which asks the OOW to recount his experience of undertaking bridge watch-keeping tasks, describing, in
particular, his subjective view of the level of cognitive loading (both underload and overload) experienced when performing these tasks during a voyage.

As part of developing a cognitive workload assessment methodology, we envisage initially trialling this questionnaire amongst a few individual OOWs. This would involve participants recording their perceived loading levels during stages of a voyage, together with data on other factors (e.g. levels of distractions, overlapping tasks, fatigue) that may affect the perceived levels of loading. We would then like to follow this up by conducting short telephone-based debriefing interviews, using the information gathered during the voyage as an aide-mémoire (the practicalities are obviously dependent on the voyage pattern, and would be agreed with you beforehand). We would take care to ensure that this process would not have any negative impact on the efficiency and safety of your operations.

We trust that you are able to assist us in developing and trialling this tool in the way described. We appreciate that you have limited time and resources available, and we would therefore anticipate that you would be able to put us in touch with one or more individuals in your organisation with appropriate experience who would be able to coordinate with HRA.

In summary, we would like you to assist us in the following areas:
• Comment on the Task Inventory and Safety Critical Tasks provided in Appendix 1;
• Facilitate contacts with individuals in your organisation who could act as coordinators for the project;
• Assist us in the development and application of the data collection protocol to gather information on the factors which influence underload and overload in marine operations; and
• In the longer term, assist in validating the measurements of workload provided by the cognitive mental workload tool.

The contact details of the study team are as follows:

Mr Jim Peachey      [email protected]   01256 473974
Dr Claire Blackett  [email protected]   01257 463121
Dr David Embrey     [email protected]   01267 463121
We believe that this project will bring significant safety benefits to the marine industry, and we would therefore like to thank you in advance for your willingness to participate.

Yours sincerely,

Jim Peachey, FRINA, CEng
Appendix 6 – Workshop sessions to develop workload models
1. The Influence Diagram Workshops

This section describes the interactive procedure for the development of Influence Diagrams (IDs). The process described here refers to the development of an overload model with one specific consensus group, but the same procedure is used for all groups to create both an overload and an underload model.

1.1 Developing a seed model

The first part of the ID procedure is the development of a seed model. This model is developed separately, without the input of the consensus group, and is based upon factors that are known to cause feelings of stress or pressure in the Officer of the Watch (OOW). The seed model used for these interactive sessions included eight primary factors and twenty-one subfactors.

There are three main reasons for developing a seed model:
1. The seed model helps to establish the goal of the Influence Diagram by demonstrating how these factors contribute to the feeling of being overloaded. For example, as the number of distractions increases, the OOW might start to feel increasingly under pressure or overloaded. In contrast, as the degree of visibility improves, this might help to decrease feelings of loading.
2. The Influence Diagram process can be somewhat difficult to grasp at first, as it requires a different, less traditional way of thinking about the problem area. The seed model helps to demonstrate the conventions of Influence Diagram modelling.
3. Because IDs require an alternative way of problem solving, it can be difficult to stimulate thought and discussion when starting with a blank ID. By showing the group a seed model, the group is encouraged to discuss the factors already present, and to expand upon them or delete them as appropriate.

The factors that were included in the seed model are factors that are known to cause feelings of stress or pressure amongst bridge crew members, as reported in the MAIB's Bridge Watchkeeping Safety Study and a number of marine accident reports.
The Bridge Watchkeeping Safety Study identified the following factors as having an influence on feelings of stress or pressure amongst bridge crew members:
• Fatigue
• Manning levels (e.g., one man bridge operation)
• Distractions (e.g., leading to missed course alterations)
• Level of natural light
• Degree of visibility
• Use of bridge technology
A number of marine accident reports were also reviewed to determine the factors that may have caused feelings of overloading amongst bridge crew members. The following influential factors were identified:
• The time of the incident (e.g., very early morning)
• The stage of the OOW's shift (e.g., at the end of the shift)
• The number of persons on watch
• The type of vessel (e.g., larger vessels may not be able to see smaller vessels, and they may not show up on radar)
• The technology used (e.g., autopilot, GPS)
• Weather conditions
• The area of the incident (i.e., proximity to hazards)
• Fatigue
• Physical environment (e.g., layout of the bridge)
1.1.1 Example of an accident caused by overload

These factors, when taken individually, may only have a mild effect upon the person's feelings of stress or pressure, but, when they occur in conjunction, they can cause significant loading. Consider the following accident (taken from the MAIB's Bridge Watchkeeping Safety Study):

A general cargo vessel of 794gt sailed from a port on the east coast of England at 0050 bound for Le Havre in ballast. The master and the mate, who were the only two watchkeeping officers, had both been involved with cargo work, hold cleaning and then bunkering on the previous day. They had both slept for about 4 hours between 0200 and 0600, and the mate had been able to sleep between 2200 and the time of sailing, and again between the pilot disembarking at 0100 and 0300 when he relieved the master on the bridge. The usual watchkeeping pattern had been disrupted by the demands of the work in port.

The master went straight to bed when he was relieved, and fell asleep almost immediately. He had left no night orders for the mate, who was an experienced officer. However, the mate began to have trouble navigating soon after the master had left the bridge, but he was reluctant to ask the master to return as he knew that he was tired. He had been intending to navigate by eye from buoy to buoy along a pre-planned route. He failed to see one buoy, but carried on. Clutter was seriously affecting the radar picture, and spray was hampering the visibility from the wheelhouse. Despite failing to see the next two buoys, he still carried on, while trying desperately to establish the ship's position, until the vessel eventually grounded on the Goodwin Sands at 0420.
This is quite clearly an overload situation, in which a number of factors contributed to the mate's feelings of loading:
• The mate was more than likely fatigued due to the fact that he was sharing watchkeeping duties with only one other person.
• The watchkeeping patterns had been disrupted, which, again, would have contributed to fatigue.
• The mate was on watch on the bridge by himself.
• The mate was reluctant to call the master to the bridge, even though he was experiencing difficulties.
• The radar picture was distorted, which may have confused or distracted the mate.
• Visibility was low due to the spray.
• The mate continued with his course, even though he was unsure of the ship's position, and was having difficulties with navigating the ship.
The combination of these factors resulted in the mate failing to identify three buoys and failing to identify the ship's position, ultimately ending in the ship grounding.

1.1.2 The seed model

From the literature review, the factors that were identified as contributing to overload were arranged into a list of eight primary factors and twenty-one subfactors. The primary factors are those which are considered to have a significant impact upon cognitive loading. Subfactors are any factors which are considered to be contributory to the primary factors, but not significant enough to be considered primary factors by themselves. Figure 41 shows the seed model used for these ID sessions. Figure 40 lists the eight primary factors and their subfactors:
Factor                            Subfactors
Quality of the bridge automation  (no subfactors)
Primary task characteristics      Complexity; Flexibility; Severity of perceived consequences
Manning levels                    (no subfactors)
Fatigue                           Length of time on watch; Disruption to watch patterns; Quality of rest periods; Responsibilities to additional duties
Concurrent task demands           Navigation; Pilotage; Administrative tasks; Personal tasks; Traffic density
Bridge crew competence            Training; Experience; Clarity of roles and responsibilities; Communication
Environmental conditions          Level of natural light; Visibility; Weather conditions
Distractions                      Telephone / radio / other distractions; On board relationships / other crew members
Figure 40 The primary factors and subfactors of the seed model
Figure 41 Initial Overload Model
Appendix 8 contains a full list of overload and underload factors, including a detailed description of each factor.

1.2 The Influence Diagram process

For many participants, the ID session requires a different way of thinking about the problem area, and so it is useful to begin the ID session with a general discussion about the factors that influence workload. The group is then shown the seed model, and the participants are invited to discuss the model and comment upon the factors included.

1.2.1 Add, remove and/or change factors

To ensure that all participants understand how the ID process works, the facilitator first describes each factor in detail and demonstrates how that factor contributes to the overall concept of cognitive workload. For example, the facilitator will explain how disruptions to watch patterns may lead to irregular rest periods, which in turn can lead to an increase in fatigue, which in turn is likely to increase feelings of loading, as fatigue can severely impair the individual's mental capacity.

The group is asked to discuss the seed model to determine whether all of the factors are relevant and whether they are in the correct place. Some factors can be expanded upon (by adding subfactors) to more clearly define how they impact upon workload. For example, as Figure 42 shows, the group discussed how commercial pressure can have a significant impact upon feelings of loading experienced as part of concurrent task demands. Commercial pressure refers to the pressure to adhere to commercial schedules and timetables. The group discussed how this is a constant source of pressure that may affect different crew members in different ways (e.g., the master may feel more under pressure to adhere to commercial schedules than a third officer), but it is something that all crew members are aware of. Therefore, commercial pressure was added as a subfactor to concurrent task demands.
Figure 42 Close-up of overload model with added factor
Other factors may not need to be decomposed any further. For example, the participants of this session felt that manning levels on board the bridge will usually be either good or bad, and are not influenced by any other factors. As a rule of thumb, factors should be decomposed to a set of subfactors if they cannot easily be evaluated by the group at their existing level of detail.

The group may decide that some factors should be removed from the model, as they do not have a significant impact upon cognitive workload. For example, as Figure 42 shows, traffic density (i.e., collision avoidance) was not considered by this group to be a significant concurrent task, and thus the subfactor was removed from the diagram.

The group also decided that the factors navigation and pilotage were in the wrong place. They agreed that navigation is a significant primary task, and that pilotage is a subfactor of both navigation and watchkeeping (the two primary tasks). Therefore, the two factors were moved to a more suitable part of the diagram, as shown in Figure 43.
Figure 43 Close-up of primary task demands
The group is also asked to consider the placement of the factors. Factors that are lower down in the diagram (i.e., subfactors) will have less influence on those factors at higher levels. The strength of the factors becomes more diluted towards the bottom of the tree. If the group considers a particular subfactor to be very influential, then that factor may be moved further up the tree, to become a primary factor, which will increase its strength and influence in the overall model. Figure 44 shows the completed overload model as developed by this particular consensus group.
Figure 44 Completed overload model
Figure 45 Completed overload model with top level weights displayed
1.2.2 Add weights to the model

The next step is to add weights to each of the factors in the model. Figure 45 shows the completed overload model, with the top level weights displayed. Not all factors will influence workload to the same degree, and the differences between the strengths of the individual factors can be reflected by adding weights to the factors at each level of the tree. Weights are recorded in the upper right-hand corner of each factor, as can be seen in Figure 45.

The factors are weighted by their relative impact on the factor(s) to which they are connected in the Influence Diagram, usually on a scale of 1 to 100. For example, this group was asked to consider which of the top-level (primary) factors were the most influential. These factors were given a weight of 100, indicating that they have the biggest impact on overload. The group then arrived at a consensus view regarding which of the top factors was the least influential, and this was given a weight accordingly. For example, if the least influential factor is considered to be about half as influential (i.e. to have half the impact upon workload) as the factor(s) weighted at 100, then the least influential factor would be weighted at 50. The remaining factors are then weighted appropriately, somewhere in between, according to the consensus view of the group.
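The anchoring procedure just described can be sketched as follows. The factor names and "fraction of the anchor" judgments below are hypothetical, chosen only to illustrate the 100-anchor scheme:

```python
# The group anchors the most influential factor(s) at 100, then expresses
# each remaining factor's impact as a fraction of that anchor.
# Factor names and fractions are hypothetical, for illustration only.
ANCHOR = 100
relative_impact = {
    "manning levels": 1.0,   # judged most influential -> gets the anchor weight
    "fatigue": 0.9,          # judged 90% as influential as the anchor
    "distractions": 0.5,     # judged half as influential (the least)
}
weights = {name: round(frac * ANCHOR) for name, frac in relative_impact.items()}
print(weights)  # {'manning levels': 100, 'fatigue': 90, 'distractions': 50}
```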
Figure 46 Close-up of subfactors with weights added
This process is repeated for each level of factors. For example, as can be seen in Figure 46, degree of bridge crew competence has four subfactors that influence it: training, experience, clarity of roles and responsibilities, and communication. As Figure 46 shows, communication was considered the most important subfactor, and so was weighted at 100 to indicate that it has the highest impact upon degree of bridge crew competence. Clarity of roles and responsibilities, on the other hand, was weighted at 50, the lowest of all the subfactors, indicating that it is only half as influential as communication. The remaining two factors were weighted in between these two poles, with training weighted at 90 and experience at 80, indicating that experience is almost, but not quite, as influential as training.
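As a numerical sketch of what these relative weights mean, sibling weights can be expressed as fractions of their total. This normalization step is an assumption for illustration; the report does not document the internal arithmetic of the IDEAS software:

```python
# Relative weights assigned by the consensus group to the subfactors of
# "degree of bridge crew competence" (values from Figure 46).
weights = {
    "communication": 100,
    "training": 90,
    "experience": 80,
    "clarity of roles and responsibilities": 50,
}

# Normalize so that sibling weights sum to 1 -- an assumed convention,
# not a documented detail of the IDEAS software.
total = sum(weights.values())
normalized = {name: w / total for name, w in weights.items()}

for name, frac in sorted(normalized.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {frac:.4f}")
```

On these numbers communication carries a little under a third (100/320) of the node's total influence.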
1.2.3 Check the Influence Rankings of the model

The next step of the ID process is to ensure that the "direction" of each weighted factor is correct. For example, a factor such as quality of rest periods has an inverse influence upon fatigue: as the quality of the rest periods increases, fatigue would be expected to decrease, as the individual is more rested. This is reflected by placing a negative sign in front of the weight for that factor. The negative sign is simply a signal to the program that the direction of the scale is to be reversed, not that the weight of the factor is treated as negative in the influence diagram calculations. For example, if the rating for quality of rest periods moves from 10, meaning conceptually that the quality of rest periods is poor, to 90, meaning that the quality of rest periods is good, this will lead to a decrease in the resultant rating for fatigue.

When all of the weights have been assessed, the IDEAS software can use the combined Influence Rankings for all the factors in the model to show which factors will have the highest impact upon the overall outcome if they are changed from their current ratings or states. For example, as Figure 47 shows, in this overload model, manning levels is the factor that will have the highest impact upon the overload situation (i.e., it is the highest ranked factor). The next highest ranked factors are: usability of bridge systems; visibility; quality of alarm management; and quality of rest periods. This is a useful exercise, as displaying the Influence Rankings allows the group to see how the different factors will impact upon the overall outcome at the top of the diagram, in this case, the likelihood of an overload situation developing.
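The negative-sign convention can be illustrated with a small sketch. The flip rule below follows the text's description of a reversed 1-100 scale; it is an assumption about the implementation, not a documented IDEAS detail:

```python
def effective_rating(rating, reversed_scale):
    """Return the value a rating contributes to its parent factor.

    A factor flagged with a negative sign (reversed_scale=True) has its
    1-100 scale flipped, so a high rating for "quality of rest periods"
    (good rest) contributes a low value to "fatigue".
    """
    return (100 - rating) if reversed_scale else rating

# Quality of rest periods influences fatigue with a reversed scale:
print(effective_rating(90, reversed_scale=True))   # good rest -> contributes 10
print(effective_rating(10, reversed_scale=True))   # poor rest -> contributes 90
```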
At this point, the diagram is still very much a work in progress and if the group did not think that, for example, visibility would have such an impact upon overload (currently ranked as third), they could change the weighting of the factor so that it is less influential, or move it further down the tree to become a sub-subfactor, and thus dilute its effects.
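One plausible way to realise the "dilution" of influence lower down the tree is to score each factor by the product of the relative weights (each scaled to 0-1) along its path to the top of the diagram. This product rule, and the example weights, are assumptions for illustration only; the report does not publish the IDEAS ranking algorithm:

```python
# Score a factor by multiplying the relative weights (scaled to 0-1)
# along its path up to the top-level outcome. An assumed scheme, used
# here only to show why deeper factors rank lower.
def path_influence(weights_along_path):
    score = 1.0
    for w in weights_along_path:
        score *= w / 100
    return score

# A primary factor weighted 100 (e.g. manning levels) keeps full strength:
print(path_influence([100]))      # 1.0
# A subfactor one level down (e.g. quality of rest periods under fatigue,
# with a hypothetical weight of 80) is diluted by its depth:
print(path_influence([100, 80]))  # 0.8
```

Moving a subfactor up a level removes one multiplicative term from its path, which is consistent with the text's observation that promotion increases a factor's strength.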
Figure 47 Overload model with influence rankings displayed
Figure 48 Adding ratings to test the model
1.2.4 Test the model with a scenario

When the group is satisfied with the weights and position of the factors in the model, it should then be tested for accuracy by entering ratings (usually on a scale of 1 to 100) for a particular scenario. The group is asked to think of a real scenario that they have experienced in which they felt highly pressured and on the brink of becoming so overloaded that, had anything else happened, they probably would have made an error or an accident might have occurred. One participant (a master) outlined the following scenario:

The master was trying to berth a large container vessel in extremely bad weather, in an unfamiliar port. There was very heavy rain, so visibility was quite poor. There was a pilot on board the vessel, but the master said that he was not contributing or helping in any way, thus manning levels were considered inadequate. The task of berthing this vessel in an unknown port was quite complex, not flexible in any way, and the situation was worsened by the bad weather and poor visibility.

Figure 48 shows the completed overload model, with ratings added for this particular scenario (in the bottom left-hand corner of each box). Note that the ratings shown in blue indicate that these factors have been further decomposed, and thus the ratings are automatically calculated by the software, based on the ratings of the factor(s) below. To enter a rating, the group is asked: "In this overload scenario, where you felt highly stressed, how should a particular factor be rated?" It should be noted that the ratings are for the specific scenario being considered, whereas the weights are generic for all overload situations.

Take, for example, the factor degree of bridge crew competence, and its relevant subfactors, as shown in Figure 49. The master involved in the scenario was asked to rate each individual subfactor, on a scale of 1 to 100.
The master was first asked what the level of communication between bridge crew members was like during this scenario, with 100 meaning that communication was at its best case possible (i.e., communication between bridge crew members was excellent, the information being communicated was useful and understood by all, there were no language or cultural barriers that may have prevented communication, etc.). In this case, the master said that there were only two people on the bridge – the master himself and the pilot. The master reported that the pilot was very uncommunicative and did not attempt to help the master berth the vessel at all, and thus he considered communication to be very bad, almost at its worst case possible. The subfactor communication, therefore, was given a rating of 10, to indicate that it was almost at its worst case. Similarly, clarity of roles and responsibilities was rated quite low (at 20), as, according to the master, the pilot did not seem to know what he should have been doing, or what his responsibilities were on the bridge, and thus the
master had to berth the vessel by himself. Experience was rated at 50, as the master had plenty of sea experience, but was unfamiliar with this particular port. Training was given a relatively low rating (30) because the master was trained in how to berth the vessel, but was not trained in berthing the vessel at this particular port under these conditions.
Figure 49 Close-up of ratings for overload scenario
As Figure 49 shows, the overall rating for the factor degree of bridge crew competence was automatically calculated at 27 (as indicated by the blue colour), based upon the ratings entered for the four subfactors. On the basis of all the ratings entered, the Seafarer's Loading Index (SLI), in the bottom left-hand corner of the top box of the model, Overload Situation, will also be recalculated. This index reflects the overall loading of the OOW (in this case, the master) for the scenario represented by the ratings and the weights.

The SLI is on a scale from 0 to 1, with 0 meaning that the OOW is not suffering from overloading (i.e., best case), and 1 meaning that the OOW is experiencing the worst case possible, in which he/she is completely overloaded. An SLI of 1 indicates that all of the factors contributing to loading are at their worst case possible, whereas an SLI of 0 indicates that the factors are at their best case. It should be noted that "best case" may actually represent a moderate level of loading, rather than a minimum level. This is because a moderate level of loading would represent the level at which the probability of human error arising from cognitive loading was also at its minimum.

In the early stages of research, we used separate models to evaluate underload and overload. In each of these models, SLIs of 0 and 1 were taken to represent the best and worst case conditions respectively. However, intuitively it appeared easier to think of a cognitive workload index on a scale where 0 represents maximum underloading, and 1 maximum overloading. From this perspective, ideal loading would be represented by an SLI value of
about 0.5. Thus we anticipated that in the later stages of the project we would probably develop a revised scale with a single continuum from low to high cognitive loading, which could map directly on to a scale such as likelihood of error or frequency of near misses. In the event, however, we found it easier to retain separate models for the underload and overload situations. This was to avoid some factors having a different "direction" or sense (and thus a different sign, positive or negative, as discussed in section 1.2.3), depending on whether underload or overload is being considered.

As Figure 48 shows, the SLI for the specific scenario under consideration is 0.69, indicating a relatively high level of loading, which is to be expected in this scenario.

When the model has been completed (i.e., all factors, weights and ratings have been added) and the SLI has been recalculated, the group is then asked whether the resultant SLI reflects their expectations and experience of the overload situation. For example, given the scenario of trying to berth the ship in an unfamiliar port, in very bad weather with an unhelpful pilot, one would expect the SLI to be quite high. If the group thinks that the model does not realistically reflect their thoughts (e.g., if the SLI had been calculated at about 0.5), then the model can be altered accordingly. For example, weights can be changed to increase or decrease the impact of certain factors, or factors can be moved further up or down the tree to increase or dilute their influence on overload. It is useful to try out different "what if" scenarios to get a better idea of how accurately the model reflects the group's beliefs.
For example, if the group believes that in a highly stressful situation the addition of an extra qualified person on the bridge (perhaps as a lookout, or to deal with communications) would make a significant difference to the loading level experienced by the OOW, then this can be tested by increasing the rating of the manning levels factor, and examining the resultant SLI. By using trial and error, and testing the model with different scenarios, the group will eventually agree on a model that best reflects their own beliefs, opinions and experiences about the factors that influence overload. These insights are combined with the results from the literature and from incident investigation data to produce a model that incorporates all the available knowledge relevant to the domain being addressed.
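The rating, weighting and “what if” mechanics described above can be sketched in code. The report does not state the exact aggregation formula, so the following is a minimal sketch assuming a normalised weighted average of 1–100 ratings (1 = best case, 100 = worst case), scaled to the 0–1 SLI range; all factor names, weights and ratings below are hypothetical.

```python
# Hedged sketch: assumes the SLI is a weighted average of factor ratings
# (1 = best, 100 = worst), normalised to 0-1. All numbers are hypothetical.

def weighted_rating(children):
    """Roll subfactor (weight, rating) pairs up into a parent rating."""
    total_w = sum(w for w, _ in children)
    return sum(w * r for w, r in children) / total_w

def sli(factors):
    """Aggregate top-level (weight, rating) pairs into a 0-1 loading index."""
    return weighted_rating(factors) / 100.0

# A parent factor computed from four equally weighted subfactors, analogous
# to the "degree of bridge crew competence" rating of 27 mentioned above.
competence = weighted_rating([(1, 30), (1, 20), (1, 30), (1, 28)])

factors = [
    (30, 80),           # degree of restricted visibility
    (20, 70),           # concurrent task demands
    (20, 60),           # fatigue
    (15, 50),           # manning levels
    (15, competence),   # bridge crew competence (rolled up from subfactors)
]
sli_before = sli(factors)

# "What if" test: an extra qualified person on the bridge, i.e. a much
# better manning levels rating, should pull the SLI down.
factors[3] = (15, 20)
sli_after = sli(factors)
```

Moving a factor further up or down the tree, as described in the text, corresponds here to re-parenting a (weight, rating) pair under a different `weighted_rating()` call, which changes how much its value is diluted before reaching the top level.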
2. The Completed Models

This section reviews the specific models that were created during the Influence Diagram sessions. There were four groups of participants from three companies: The Maersk Company Ltd., Condor Ferries and James Fisher Shipping Services Ltd. Figure 50 shows the number of participants in each ID session.

Group Number   Company              Number of participants
1              Maersk (tanker)      5
2              Maersk (container)   6
3              Condor Ferries       8
4              James Fisher         5

Figure 50 Participants of the ID sessions
The models developed by each group are described in the following subsections, with comments on points of interest and difficulties experienced during the development of each model. It should be noted that only two of the groups (the Maersk tanker and the Maersk container groups) developed underload models. The concept of underload was discussed with all four groups, but the latter two (the Condor Ferries and the James Fisher groups) felt that they did not have sufficient experience of underload situations to be able to develop an accurate model.

2.1 Group 1: Maersk tanker vessel crew members

The participants for the first workshop included two 3rd Officers, a Chief Officer and two Masters. The staff had varying levels of experience and came from different backgrounds, but all had recent relevant experience on board a tanker vessel, and this became the basis for the Influence Diagram sessions.

2.1.1 Group 1: Overload model

• During the creation of the overload Influence Diagram, shown in Figure 51, it was agreed that visibility is one of the most important factors that will influence the OOW’s workload. As the degree of restricted visibility increases, overload also increases. However, the group commented that the impact of visibility is dependent upon the circumstances. If the vessel is in an area of high traffic density, high proximity to hazards, or restricted manoeuvrability, then bad visibility is very problematic. However, if the vessel is in open seas, with no traffic or hazards nearby, then visibility is not such a problem.
• Also important at the top level of the diagram were: manning levels, concurrent task demands and fatigue. As the demands from concurrent tasks and fatigue increase, overload also increases. However, as manning levels increase, the group agreed that overload would decrease, as there would be more people to share the physical workload, pressure and responsibility.

• The next most important factors at the top level of the diagram were environmental conditions and bridge crew. Environmental conditions refers to the degree of degradation of conditions such as the level of natural light and the weather. The group agreed that at night time or during the day, levels of natural light are not problematic, as they tend not to affect visibility. However, during the hours of dawn and dusk, the level of natural light can make it difficult to visually sight other vessels or hazards, thus increasing the workload. Therefore, in this model, as the level of natural light increases, it has a negative impact on overload.
• Bridge crew refers to the level of competence of the bridge crew, which in turn is influenced by factors such as training, experience, communication and the clarity of roles and responsibilities. As the level of bridge crew competence increases, it too reduces loading.
• The next most important factors at the top level of the diagram were the primary task characteristics, and distractions. Primary task characteristics refers to factors such as the difficulty of the primary task, the flexibility of the task (i.e., if it has to be done “to the letter” or if there are different ways to complete the task), the time constraints within which the task has to be completed, and the severity of the perceived consequences if the task is completed incorrectly. All of these factors, except for flexibility, will increase workload as they increase. As the flexibility in the way in which the task can be performed increases, the workload will decrease, since the OOWs can reorganise the work to cope with increasing demands.
• Distractions include radio, telephone and other distractions, and also distractions in the form of onboard relationships with other crew members and morale on board the bridge. However, the group agreed that as morale and relationships on board improve, workload decreases.
• The least important factor at the top level of the diagram, according to this group, is bridge automation. The group agreed that this factor is ten times less influential than factors such as manning levels, fatigue, visibility and concurrent task demands.
During this session, by considering a particular scenario, it was agreed that an SLI of 0.63 was the maximum allowable loading level. If overload increased beyond this level, then errors would start to occur. There was a general agreement that this is a realistic reflection of the loading levels under the conditions specified (i.e., low visibility, high traffic density, average manning levels, and high demands from concurrent tasks). It is possible to calibrate the SLI scale so that particular values of the SLI (representing specific scenarios with this level of loading) are associated with specific error or near miss frequencies (or probabilities). For illustrative purposes, it was assumed that under worst case conditions (corresponding to an SLI of 1), there would be a frequency of 50 near misses per hundred voyages. In the best case conditions, with an SLI of 0, it was assumed that the near miss rate would be 1 near miss per hundred voyages. Using these illustrative data, the model predicted that, with an SLI of 0.63, there would be 32 near misses in 100 voyages undertaken. The group agreed that if any of these factors were to worsen, then the likelihood of an accident occurring would significantly increase. It should be pointed out that in reality, the model would need to be calibrated using data from simulators or actual voyages in order to make these predictions of anticipated near miss rates. However, the data obtained from these sessions do provide an estimate of the maximum acceptable load on the basis of the scenarios considered by the expert group.
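The illustrative calibration just described can be reproduced by a simple linear interpolation between the two anchor points (1 near miss per hundred voyages at an SLI of 0, and 50 at an SLI of 1). Linearity is an assumption on our part rather than something the report states, but it matches the figure of 32 near misses quoted for an SLI of 0.63.

```python
# Assumed linear calibration between the two illustrative anchor points
# quoted in the text; real use would require data from simulators or voyages.

def near_misses_per_100_voyages(sli, best=1.0, worst=50.0):
    """Interpolate near-miss frequency between SLI = 0 and SLI = 1."""
    return best + sli * (worst - best)

print(round(near_misses_per_100_voyages(0.63)))  # 32, as quoted in the text
```

Under the same assumption, the underload figure reported later also falls out: an SLI of 0.83 gives 1 + 0.83 × 49 ≈ 42 near misses per hundred voyages.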
Figure 51 Group 1 overload model (tanker vessels)
Figure 52 Group 1 underload model (tanker vessels)
2.1.2 Group 1: Underload model

The group of participants was then invited to create a model to reflect an underload situation, as shown in Figure 52, and the factors that would influence that situation. Some of the factors for this model were the same as for the overload model, but they affected workload in a different way.

• According to the group, the most important factors at the top of the diagram are the primary task activity and fatigue. As the level of activity associated with the primary task increases, underload will decrease. The primary task activity is influenced by the degree of familiarity with the task, the monitoring of onboard activities, administrative tasks and navigation tasks. All of these factors have a positive impact upon primary task activity, with the exception of familiarity. As the degree of familiarity with the task increases, the feeling of underload will also increase, as the OOW will not have to concentrate as hard on the task.
• Fatigue, in this model, is influenced by the length of time the OOW has been on his / her watch, the quality of the rest breaks, the level of disruption to the watch patterns and cumulative stress. As in the previous model, as the quality of the rest breaks increases, fatigue will decrease.
• The next most important factor in this model is the level of external activity, which is influenced by the following factors: interaction with other personnel; traffic density; frequency of navigational hazards; radio and alarms on the bridge; and other external objects (non-hazardous, such as sightings of dolphins, etc.). As the level of external activity increases, feelings of underload will decrease, as the OOW will have more mental stimulation.
• The next most important factor is the level of physical activity, which is influenced by bad weather conditions, and physical movement around the bridge. Again, as the level of physical activity increases, the level of underload will decrease.
• The next most important factor in the underload model is the degree of comfort of the physical environment. This factor encapsulates the level of heating and lighting on the bridge, as well as the opportunity to remain seated whilst on watch. As the degree of comfort increases, underload can also increase, which indicates that as the OOW becomes more comfortable, he / she may also become less vigilant, and it may take more effort to maintain performance.
• The least important factor in the underload model is the duration of reduced visibility. The group agreed that in the initial period of restricted visibility, the OOW will be quite alert, but as the period of restricted visibility continues, the OOW may become less vigilant.
During this session, it was agreed that an SLI of 0.83 represents the maximum allowable level of underload. To arrive at this judgement, the group envisaged a scenario where the vessel was at deep sea during the night, with good weather. The OOW had had disrupted shift patterns, and less than average quality of breaks. Note that, for this particular scenario, visibility was not an influential factor, and therefore the weight was reduced to 0 so that it would not have an impact upon the SLI. With an SLI of 0.83, the model calculated that the frequency of a near miss occurring would be 42 out of 100 voyages. The group commented that it is difficult to determine the point at which factors will have a significant impact upon failures (or near misses). However, there was a general agreement that if the “boredom” of the OOW increased any more than in the scenario outlined above, then the likelihood of a near miss occurring would increase significantly. The group also commented that levels of “acceptable” underload can be very subjective, and can vary widely from person to person, and from organisation to organisation. Having successfully completed the overload and underload models, the group then attempted to model a situation in which there might be a rapid transition from an underload situation to an overload situation. However, the group commented that they had never experienced such a situation, and so they did not feel qualified to comment or speculate on the factors that might influence such a situation. It was agreed that this is a fuzzy area that would only be applicable in a minority of cases.
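Setting a weight to 0, as was done for visibility here, removes that factor’s influence entirely under any weighted-sum aggregation. The following minimal sketch illustrates this; the numbers and the weighted-average form of the SLI are our assumptions, not figures from the sessions.

```python
# With a weight of 0, a factor's rating cannot affect the SLI, however the
# rating is set. Numbers and the weighted-average form are illustrative.

def sli(factors):
    total_w = sum(w for w, _ in factors)  # zero-weight entries add nothing
    return sum(w * r for w, r in factors) / total_w / 100.0

base    = [(40, 70), (30, 60), (30, 55), (0, 10)]  # last pair: visibility, weight 0
changed = [(40, 70), (30, 60), (30, 55), (0, 95)]  # only the visibility rating moved

assert sli(base) == sli(changed)  # the excluded factor is irrelevant
```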
2.2 Group 2: Maersk container vessel crew members

The participants for the second workshop included two 3rd Officers, two 2nd Mates, a Chief Officer and a Master. Again the staff had varying levels of experience and different backgrounds, but all were currently working on container vessels, and this experience provided the basis for the Influence Diagram sessions.

2.2.1 Group 2: Overload model

This group were shown the seed overload Influence Diagram and invited to comment upon the factors already present, and to alter and add to the diagram as they saw fit. There are some subtle differences between the overload model created by this group, as seen in Figure 53, and that of Group 1. This may be due to the different nature of tasks on board container vessels as opposed to tanker vessels. These differences are discussed below.

• According to this group, the most important factor at the top level of the diagram is loading due to primary task characteristics. The group agreed that, on board a container vessel, the primary task is both navigation and watchkeeping, and that these tasks are influenced by the complexity of the task, the flexibility of the task procedures and demands from pilotage tasks.
• The second most important factor at the top of the diagram is fatigue. Again, this group agreed that fatigue is influenced by the length of time on watch, disruptions to watch patterns, the quality of the rest periods, and responsibility for additional duties. However, this group also commented that fatigue is influenced by the quality of rest management. By this, the group meant that each individual has a personal responsibility to ensure that they have enough rest breaks. For example, a crew member should not decide to sacrifice a rest break for the sake of spending a longer time ashore. However, the group also commented that the master on board the vessel should ensure that rest breaks are managed effectively, and that crew members get adequate time ashore, so that they should not have to sacrifice their rest breaks.
• The next most important factor is degree of bridge crew competence. This group also agreed that this factor is influenced by training, experience, communication, and the clarity of roles and responsibilities. As bridge crew competence increases, feelings of overload decrease.
Figure 53 Group 2 overload model (container vessels)
• Manning levels and adverse environmental conditions were given equal weighting by this group. The group agreed that manning levels will either be good or bad, and that the addition of one extra person on the bridge can significantly decrease overload. Adverse environmental conditions, on the other hand, are influenced by conditions both inside the bridge and outside the vessel. The conditions include the level of natural light, the degree of restricted visibility, the degree of bad weather, and the degree of departure from ideal ambient conditions inside the bridge, such as temperature, humidity, lighting and ventilation.
• The next most important factor at the top level of the diagram is the quality of bridge automation. This factor is influenced by the usability of the various bridge systems, and the quality of alarm management. The group commented that alarm management on board the vessel tends not to be very good, and that alarms tend to cascade, meaning that if one goes off, then suddenly almost all of the alarms will go off. This can be very distracting, particularly when trying to manoeuvre in or out of port, or trying to berth the vessel, and so can increase feelings of pressure or stress.
• At the other end of the scale, the least important factors influencing overload are distractions and concurrent task demands. Distractions are influenced by telephone, radio and other distractions of that nature, on board relationships with the other crew members, and the level of onboard operations, including noise and vibrations. Concurrent task demands are influenced by administrative tasks, personal tasks, and commercial pressures in the form of pressure to adhere to schedules, etc.
When the overload model had been completed, one of the members of the group, a captain, described an overload scenario in which he was trying to berth the vessel in extremely bad weather. There was heavy rain, so visibility was quite poor. There was also a pilot on board, but he was not helping the captain in any way, and so manning levels were considered to be quite low, as the pilot did not contribute. The task of berthing this vessel in an unknown port was quite complex, and not flexible in any way, and was not helped by the bad weather. With this scenario in mind, the group entered ratings into the overload model, and agreed that the calculated SLI of 0.71 is a realistic reflection of the overload that a crew member would experience in this situation. It was also agreed that this is probably the maximum allowable loading level for a scenario such as this, and that if any of the other factors had worsened, then an accident might have occurred.
2.2.2 Group 2: Underload model

In the final part of the workshop, Group 2 addressed the underload situation. Although this group were not very familiar with underload situations, they decided to address this class of scenarios by modifying the overload model developed previously. The justification for this approach was that some factors, such as fatigue, applied in both overload and underload scenarios, even though their weights and ratings might be different in these two cases. They also felt that certain factors that were present in overload but not underload conditions could be removed from the model by assigning zero weights. Some factors that were not present in the overload situation, e.g. physical activity, were also added to the underload model. The resulting model is shown in Figure 54 below. The participants in this session proposed that a rating of 50 be taken to represent an average, i.e. optimal, level of loading in the scenario. The degree to which a rating was less than 50 was then taken as the extent to which underload was potentially a problem. From this perspective, the rating scale ranges from 50 (best case) to 1 (worst case, i.e. the greatest possible level of underload). This can be compared with the approach adopted for the overload scenarios, where the rating scale was taken to range from 1 (best case) to 100 (worst case). There are certain attractions in using a bipolar scale where 1 to 50 is a measure of potential underload, and 50 to 100 measures potential overload. However, in this case, separate models with different factors, weights and ratings would need to be used. For the reasons previously outlined, it was decided to retain separate models for the underload and overload situations.
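The bipolar scale considered above can be made concrete. In this hypothetical sketch, a single 1–100 rating with 50 as the optimum is split into separate underload and overload severities, each on a 0–1 scale; the split convention is our own assumption, and, as noted, the project in fact retained separate models.

```python
# Hypothetical convention for the bipolar rating scale discussed in the text:
# 1 = worst underload, 50 = optimal loading, 100 = worst overload.

def split_bipolar(rating):
    """Return (underload_severity, overload_severity), each on a 0-1 scale."""
    if rating <= 50:
        return (50 - rating) / 49.0, 0.0   # rating 1 -> underload severity 1.0
    return 0.0, (rating - 50) / 50.0       # rating 100 -> overload severity 1.0

print(split_bipolar(50))  # (0.0, 0.0): optimal loading
```

One practical drawback, consistent with the report’s decision, is that the two halves of the scale would still need different factors and weights, so the single rating buys little over two separate models.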
Figure 54 Group 2 underload model (container vessels)
2.3 Group 3: Condor Ferries crew members

There were eight senior deck officers present for this workshop. The officers had varying backgrounds and levels of experience, ranging from working on deep sea vessels, tankers, other ferry companies and coastal vessels. At the beginning of the session, one of the officers asked how the end tool would be used. It was explained that, at that point, the MCA had not yet decided how the tool should be used, whether in a regulatory context, or whether it would be used by shipping companies. As with Maersk, the Condor Ferries officers expressed interest in the tool, and agreed that they would like to see the final tool and perhaps have a say in how it should be used by the MCA. A brief discussion was held with the officers to determine how often they experience overload and underload situations. Because the session was to be relatively short (08:30 until 15:30), it was decided that it would be more efficient to focus on creating an overload model. The officers confirmed that, due to the nature of their work on the high speed ferries, and the fact that their voyages last for a maximum of twelve hours, they do not often experience significant underload. However, they agreed that overload is quite often an issue for them, and so the time and resources available would be better spent focusing on this issue.

2.3.1 Comments about the specific factors in the overload model

The group were shown the seed overload diagram, and the eight primary influencing factors and their subfactors were explained in detail, to demonstrate how the Influence Diagram works. The group made some comments about various factors, which are listed below:

• Fatigue: Fatigue is also affected by environmental conditions, as adverse weather can wear the individual out quickly, especially if he/she is already suffering from fatigue.
• Experience: Experience affects the competence of the bridge crew, but also affects the individual’s ability to handle workload, distractions, etc. Experience is generic to all types of vessels. Bridge crew interaction and experience are more important on a high speed craft, as opposed to the one-man-bridge operation on tankers. The bridge crew of a high speed craft tend to be more qualified than on deep sea vessels (e.g., a high speed craft will always have a master, chief officer and chief engineer on the bridge, whereas a deep sea vessel may only have a single third officer on the bridge).
• Primary task characteristics: The group mentioned that commercial pressures, such as pressure to keep to schedule, can influence flexibility, when combined with, for example, severe weather conditions. Tools will also affect flexibility, for example, the quality of the tools available to handle situations, such as the manoeuvrability, newness and capabilities of the vessel (i.e., robustness of the vessel’s systems). One officer mentioned that Condor vessels are “simplex”, meaning that there is one system controlling everything. Other vessels tend to be “duplex”, i.e., two systems. On Condor vessels, if the system goes down it affects all aspects of the vessel, and can cause problems.

• Communication: This factor can be further decomposed to include language and culture, and the experience of working together. The group pointed out that communication is much easier when all crew members speak English as their first language, and when they come from similar backgrounds and cultures. In addition, experience of working together builds up trust and morale, which in turn improves communication.
• Flexibility: The group returned to this factor to make the distinction between operational flexibility and the flexibility of the task. Operational flexibility is more dependent on the vessel characteristics. The group had a discussion about the definition of the primary task and secondary, concurrent tasks. They agreed that, on the high speed craft, it is difficult to distinguish between primary and secondary tasks. The primary task is to get their passengers from point A to point B, and all other tasks are secondary and concurrent. Some tasks have a certain degree of flexibility, and others are not flexible at all. Some tasks can be delegated to other crew members.
• Individual experience: Again, the group pointed out that some people are more capable of handling high workload situations than others. It was explained that the function of training is to raise everyone to the same level of ability. The organisation accepts that there will be some differences in personal capabilities, but that this should be minimal if the training is effective.
• Age: There was a brief discussion of age, and how it might affect the ability to handle high workload situations. There was a general agreement that age can impair the ability to allocate resources, and that this might be a factor, but not relevant to this overload model.
• Navigational tasks: There was general agreement that pilotage and traffic density (collision avoidance) both influence navigational tasks. On a deep sea vessel, these would be separate, distinct tasks, but on the ferries they are part of navigation. One officer made the comment that there is sometimes a separate pilot on the bridge, but that this does not happen often. There was then a general agreement that pilotage and traffic density are separate concurrent tasks.
• Primary task characteristics: The group returned to this topic again to try to define what is meant by ‘primary’ task. There was a general agreement that the primary task can change over the course of the voyage. It was explained that the primary task is the most important of the concurrent tasks in which you are currently engaged. The primary task may be very complex, requiring many resources, which in itself makes you feel overloaded. Alternatively, the primary task may not be complex or require many resources, but, in combination with other concurrent tasks, distractions, fatigue, etc., it may lead to loading. It was mentioned that perhaps the difficulty in defining the primary task is due to the nature of the tasks on board the high speed craft. On deep sea vessels, primary tasks are generally more clearly defined (e.g., the OOW is involved in navigation and/or watchkeeping, which is the primary task, and chart correcting and administrative tasks, which are secondary concurrent tasks). However, on a ferry, there are many tasks of equal importance occurring at the same time, all of which can be defined as the primary task. Perhaps it would be better to have one factor to cover all tasks? It was mentioned that it would perhaps be better to think in terms of “task demands” rather than concurrency. There are many tasks that have several strands, and these can affect the individual’s loading in different ways.

• The group was asked about the mental demands that they experience from day to day. One officer said that manoeuvring out of port is a complex task, and that visibility can affect this. Another officer said that the awareness of the number of passengers on board, and their safety, contributes to loading. There is also the pressure to keep to the schedule.
Once the overload model had been constructed, as shown in Figure 55, the group then started to add weights to the factors. Some comments made during this part of the session are listed below:

• One officer mentioned that the general welfare of the passengers, vehicles, etc. is constantly in the back of your mind, and that this will take up attention, but it is not an aspect of ship operations. There was a discussion about whether this is a distraction or a separate issue, with agreement that it depends on the circumstances. For example, if someone has fallen down the stairs whilst the individual is trying to berth the ship, this is a significant distraction, as he/she still has to berth the ship before he/she can take any action. However, if it happens en route, then it becomes an additional task, as the individual has to contact the relevant authorities and arrange for medical help.
• There was some discussion about the importance of manning levels, and some disagreement about how important they are in overload situations. Some officers argued that manning levels do not significantly affect overload. However, the majority agreed that manning levels are quite important, and can affect overload in different ways (e.g., an extra person can make a significant difference in an emergency or critical situation).
2.3.2 Testing the model with scenarios

The group then discussed two scenarios to test the model.

Scenario 1: To add ratings to test the model, the group was asked to think of a specific scenario in which they felt overloaded. The first scenario was outlined as follows: The vessel was leaving Poole on two (of four) engines, with the promise of having a third engine on line by the time the vessel reached a particular buoy. The third engine did not come online, which meant that the vessel was sailing on half power. The weather conditions were bad, and a passenger on board had had an accident, and had to be taken off the vessel by helicopter. The SLI for this scenario was calculated at 0.56 (as seen in Figure 55), which was agreed to be about right. It was a complex situation, but not at the boundaries of overload.

Scenario 2: The second scenario was outlined as follows: The ferry was one mile away from another vessel that had hit a rock. There were 100 passengers on board the vessel that had to be rescued. Weather conditions, visibility, etc., were fine. This involved a very complex task of manoeuvring the ferry to pick up the passengers safely, as well as co-ordinating the rescue effort (helicopters, coast guard, other vessels, etc.). The calculated SLI for this scenario was 0.56.

Scenario 2 (revised): During the second scenario, it was pointed out that there was a slight discrepancy in how the ratings were being added. For some of the factors, the officer was asked whether that factor was good or bad, on a scale of 1 to 100. The officer would then pick a rating on this scale (for example, Complexity was given a rating of 80, as the task was a very complex one, but not the most complex situation imaginable). However, for some other factors, the officer would mention that the factor was good, but that this was normal, and so it was given a rating of 50.
For example, the officer said that the quality of rest periods was good, but that this was the norm, and so it was given a rating of 50. It was pointed out that 50 meant average conditions, which means that the factor could get a lot better or a lot worse. However, if the officer said that conditions were good, then surely the rating should be much higher than 50? Another example was the language and culture factor, which influences communication. Again, the officer said that this was not a problem, and that is normally the case, so it was given a rating of 50. However, the group was asked if the level of communication was actually ‘good’, rather than ‘normal’, and if this was because all of the officers on board the bridge were English. The officer replied that this was the case, and so changed the rating to 90.
The group assessed the model for the second scenario again, and checked the ratings to ensure they reflected the actual conditions for that scenario, rather than the ‘normal’ conditions for day-to-day voyages. The calculated SLI was now 0.53. The officer involved in that scenario agreed that this was a more realistic reflection of the actual loading he experienced during this scenario.
Figure 55 Condor Ferries overload model
2.4 Group 4: James Fisher crew members

There were five senior officers present at this workshop. Their ranks were as follows: master, chief officer, second officer, chief engineer, second engineer. The workshop began with a general discussion about the types of vessel that they work on, and the typical tasks on board. The group reported that they work mostly in Northern Europe. The sea passages can range from a minimum of about five hours to a maximum of about three days. The passages are generally short haul voyages, with a quick turnaround at each port. They occasionally get a few hours at anchor, usually when waiting for a tide. However, even when at anchor, the seafarers will be quite busy and will have a lot of work to do. This is typically when they catch up with administrative or maintenance tasks. The officers all agreed that, while they are always busy, they are not always under pressure.

The primary task, as identified by the group, is ship management. The officers agreed that it is difficult to define what the main primary task is, but agreed that navigation and watchkeeping encompass the majority of their time on watch. However, because the crew on board is relatively small (usually 8 persons), each crewmember will be involved in all aspects of running the ship. There are only two disciplines on board (deck and engineering), and everything is done by one or both of these departments. When on watch, the main duties involve lookout, monitoring radars and traffic, and monitoring the ship’s track. When coming into port, the watchkeeper will usually liaise with the pilot until about half an hour before berthing. The master will then come up to the bridge to berth the ship. The engineers in the group reported that, during normal operations, they are generally not under pressure. They only feel under pressure when things go wrong. Breakdowns are the most common problems in the engineroom, rather than fire or other emergencies.
Load switching can occur in the engineroom, particularly if the engineer is not on watch but the alarm goes off. The engineer will then have to get up and investigate. He will have no idea what the alarm is for until he gets to the engineroom. On the bridge, loading is always moderate to high. The group agreed that they have become used to it, and accept it as part of the job. They also agreed that they never experience underload situations.

2.4.1 Comments about the overload model

The whole group agreed that visibility has a huge impact on workload. Bridge technology also makes a significant difference. Some vessels have newer equipment, but it may not always be better. There can sometimes be too many alarms, or cascading alarms (i.e., one alarm triggers two or three more if not silenced in time). Engineroom alarms tend to be duplicated on the bridge.
Crew competence and experience is also a significant factor. When there is a multinational crew on board, there can sometimes be language barriers. The difficulties experienced are due to the language differences, however, rather than to crew competence. This is not as much of a problem as it has been in the past, as James Fisher now test foreign crewmembers for language competency (comprehension and use of English).

According to the group, manning levels are one of the most significant factors influencing workload. The entire group agreed that having just one extra person on deck can make a huge difference to the workload of the entire crew. There is a general feeling that the crew is always one person short. However, there is a danger on board some vessels that, if an extra person is put on board, the skipper will then hand over his watch to that person, and take extra rest time for himself, leaving the crew one person short again.

To ensure consistency, the group was shown the same initial overload diagram that had been shown during the previous three workshops. Each of the factors in the seed model was explained to the group, to demonstrate how the Influence Diagram works. The group was then invited to comment upon the seed model, and make any changes they felt necessary. Their comments are listed below:
Primary task characteristics: The group agreed that the primary tasks are watchkeeping and navigation.
Morale: The group commented that morale can affect workload as it can affect how quickly or how well a crewmember will do a task. Thus, morale was added to the model as a subfactor of vessel crew capabilities.
Primary task characteristics: There was some further discussion about the separation of tasks, with general agreement that navigation and watchkeeping are the primary tasks. However, in certain situations, many tasks may occur at once. For example, when coming into port, the OOW will be engaged in navigation, watchkeeping, pilotage, communications, administrative tasks, etc. Therefore, the speed at which tasks occur is important when dealing with workload. Specifically, if multiple failures/errors occur at the same time, this will cause considerable pressure.
Manning levels and Fatigue: There was a discussion about manning levels and how they relate to fatigue. The group explored whether fatigue and manning levels should be separate factors. There were some comments made that if manning levels are low, this can contribute to fatigue, because the same amount of work has to be completed by fewer people. However, there were also comments about how, even when manning levels are good, individuals can still suffer from fatigue. Therefore, it was agreed to keep these factors separate.
Fatigue: The group commented that fatigue is also affected by the time of year. For example, during the winter months, feelings of fatigue are intensified due to the extended hours of darkness, bad weather, etc. Conversely, when working during the summer months, the crewmembers commented that it is easier to get up when it’s bright outside, and that general levels of morale are better in the summer, combating the effects of fatigue.
Concurrent task demands: The loading experienced as a result of concurrent task demands depends on the individual’s ability to deal with concurrent tasks.
Because there were two engineers in the group, it was difficult to decide to whom the model should apply. In the previous workshops, the participants were all bridge officers, and so the model was created from the point of view of the officer of the watch. The group discussed this issue, and agreed that there is a case for creating a separate model for each rank of officer, as each individual will have different priorities according to his rank. However, for the purpose of this workshop it was agreed that it would be better to focus on the model as a representation of the feelings of the bridge team as a group. The completed model can be seen in Figure 56.

2.4.2 Testing the model using scenarios

The group was then invited to add ratings to test the model, and was asked to think of a scenario in which the crew (or officer of the watch) felt very overloaded. The master described the following scenario:

Scenario 1: The first scenario involved a vessel that was off the north coast of Holland, near the Frisian Islands. The vessel itself was about twenty-five years old, and very basic, with no automated systems on the bridge or in the engineroom. The weather conditions at the time were quite poor, and the vessel was, consequently, travelling at quite a slow speed. At about midnight, the crankshaft (which connects the engine to the propeller) broke. Over the next forty-eight hours, the crew were involved in a complex situation in which they had no propulsion, and were trying to co-ordinate a rescue mission with a number of local tugs, the coast guard, etc.

The workshop group were next invited to add ratings for each factor to reflect the scenario above. When all the ratings for this scenario had been added, the SLI was calculated at 0.39. The group agreed that this was a realistic reflection of the loading experienced during this situation. They commented that, because the scenario stretched out over forty-eight hours, the loading wasn't particularly high for the entire time, but fluctuated during the two days. The calculated SLI, therefore, probably represents an average score for the entire scenario.

Scenario 2: The group were next asked to think of a scenario in which they felt least loaded (a "best case" scenario). They described the scenario below:

The second scenario involved a new, modern vessel which was at anchor. It was about 2 pm, mid-August, and weather conditions, visibility, etc. were perfect. As the vessel was new, the crew were still being trained in how to use all of the vessel's systems, and, although they were an experienced crew, they were unfamiliar with this particular vessel.

The workshop group added ratings for this scenario, and the SLI was calculated at 0.12. The group agreed that this was a realistic reflection of the loading experienced for this scenario, as, although they were at anchor, they were still engaged in a number of activities, including catching up on paperwork, maintenance, and training for the new ship.
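The procedure used in these sessions (rate each factor for a scenario, then combine the ratings with the model's weights into a single index) can be sketched as follows. The report does not state the SLI formula at this point, so the normalized weighted sum below, a common SLIM-style aggregation, is an assumption; the three-factor model and all numbers are invented for illustration, not taken from the James Fisher model.

```python
# Hedged sketch of how a scenario's factor ratings might be combined into
# an SLI. ASSUMPTION: a normalized weighted sum (SLIM-style) is used; the
# miniature factor set and all numbers below are purely illustrative.

def sli(weights, ratings):
    """Normalized weighted sum of ratings, scaled to 0-1.

    weights: factor -> relative weight (0-100)
    ratings: factor -> scenario rating (1-100, higher = more loading)
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    weighted = sum(weights[f] * ratings.get(f, 0) for f in weights)
    return weighted / (total_weight * 100)

# Illustrative three-factor model (hypothetical weights).
weights = {"visibility": 100, "manning levels": 90, "fatigue": 100}

overloaded = {"visibility": 90, "manning levels": 80, "fatigue": 85}
quiet      = {"visibility": 10, "manning levels": 20, "fatigue": 15}

print(round(sli(weights, overloaded), 2))
print(round(sli(weights, quiet), 2))
```

With this aggregation, higher values indicate greater loading, so a heavily loaded scenario scores near 1 and a lightly loaded one near 0.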
Figure 56 James Fisher overload model
3. Analysis

This section looks at the similarities and the differences between the different models that were created, and discusses the possible reasons for these similarities and differences. It also discusses the development of a single composite model for overload, based on the four separate overload models, and of a single composite model for underload, based on the two separate underload models.

3.1 Analysis of the overload models

This subsection looks at the similarities and differences between the separate models, and discusses the reasons for and implications of these similarities and differences.

3.1.1 Common factors across all overload models

There are a number of factors and subfactors that are common across all of the overload models that have been developed with the separate groups of seafarers. These factors are listed in Figure 57 below (and described in detail in Appendix 8).

Name of Factor
• Quality of bridge automation
• Clarity of roles & responsibilities
• Primary task characteristics
• Communication
• Bridge manning levels
• Telephone/radio/other distractions
• Severity of perceived consequences
• Onboard relationships/other crew members
• Bridge crew competence
• Complexity of the task
• Environmental conditions
• Length of time on watch
• Distractions
• Disruption to watch patterns
• Degree of restricted visibility
• Quality of rest periods
• Fatigue
• Responsibility to additional duties
• Training
• Level of natural light
• Experience
• Weather conditions
• Flexibility of primary task
• Administrative tasks

Figure 57 List of common overload factors
As Figure 57 shows, these factors are all present in the seed model, and thus it could be argued that this is the only reason they are common to all four completed overload models. However, it is worth noting that some of the factors in the seed model have been removed by different groups, for different
reasons, and this indicates that each of the groups did spend time discussing the factors in the seed model, before deciding whether to keep these factors or not. As mentioned previously, the factors in the seed model and in Figure 57 are commonly known to influence workload, regardless of the type of vessel or the situation under discussion. This is likely to account for their presence across all the overload models created during the Influence Diagram sessions.
3.1.2 Differences between the common factors

Although there are twenty-four factors and subfactors that appear in each of the overload models, there are differences in how these factors have been weighted by the individual groups, as shown in Figure 58.

Name of Factor                               Group 1   Group 2   Group 3   Group 4
Quality of bridge automation                    10        60        50        10
Primary task characteristics                    75       100       100        15
Bridge manning levels                          100        70        75        90
Fatigue                                        100        90        80       100
Bridge crew competence                          85        80        75        30
Environmental conditions                        85        70        80        40
Distractions                                    75        35        30        25
Degree of restricted visibility                100       100        90       100
Flexibility of primary task                     50        80        80        40
Severity of perceived consequences             100        30       100        80
Training                                        75        90        90        80
Experience                                      75        80       100        70
Clarity of roles and responsibilities          100        50        50        80
Communication                                   90       100        80       100
Telephone/radio/other distractions             100       100       100       100
Onboard relationships/other crew members        70        60        90        50
Complexity of the task                         100       100        90       100
Length of time on watch                         50        70        60       100
Disruptions to watch patterns                   80        80        80        80
Quality of rest periods                        100       100       100        80
Responsibility to additional duties             25        25        30        50
Level of natural light                          30        75        20        50
Weather conditions                              50        50       100        70
Administrative tasks                            15        70        10        40
Figure 58 Individual weights of common overload factors
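The judgement of which factors were "similarly weighted" across the four groups can be made reproducible by computing the spread (maximum minus minimum) of each factor's group weights: a small spread indicates consensus. The sketch below uses a subset of the rows from Figure 58; the spread threshold of 30 is an arbitrary illustrative cut-off, not one used in the report.

```python
# Spread of group weights per factor, using a subset of Figure 58.
# ASSUMPTION: the <= 30 "consensus" threshold is illustrative only.

weights = {  # factor: (Group 1, Group 2, Group 3, Group 4)
    "Quality of bridge automation":       (10, 60, 50, 10),
    "Primary task characteristics":       (75, 100, 100, 15),
    "Fatigue":                            (100, 90, 80, 100),
    "Degree of restricted visibility":    (100, 100, 90, 100),
    "Telephone/radio/other distractions": (100, 100, 100, 100),
    "Disruptions to watch patterns":      (80, 80, 80, 80),
    "Severity of perceived consequences": (100, 30, 100, 80),
}

for factor, ws in weights.items():
    spread = max(ws) - min(ws)
    tag = "consensus" if spread <= 30 else "divergent"
    print(f"{factor}: spread={spread} ({tag})")
```

Run over the full table, this kind of check reproduces the split discussed below: factors such as fatigue and disruptions to watch patterns show small spreads, whereas primary task characteristics and severity of perceived consequences diverge sharply between groups.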
As Figure 58 demonstrates, there are a number of factors that have been similarly weighted by each of the individual groups, meaning that, despite differences in vessel types and operations, these factors are largely seen to have the same effect upon the workload of the bridge crew members. These factors are: bridge manning levels; fatigue; degree of restricted visibility; training (of bridge crew); experience (of bridge crew); communication; telephone/radio/other distractions; complexity of the task; disruptions to watch patterns; quality of rest periods; and responsibility to additional duties.

The disparities between the remaining factors may be due to the subjective opinions of the individual group members, but may also be due to the different trades of the individual groups (i.e., tankers, containers, fast ferries and coastal vessels). The differences in trades mean that the groups may have different priorities and goals, and thus the same factor might be of utmost importance to one group, but of relatively little importance to another.

For example, weather conditions was weighted quite highly (i.e., given a weight of 100) by the third group, a group of officers from Condor ferries, but was given a relatively low weight by the officers from the Maersk tanker and container vessels (both weighted it at 50). This may be because tanker and container vessels are much sturdier than high-speed ferries, and thus not as susceptible to weather conditions. The fourth group, consisting of officers from James Fisher coastal vessels, weighted weather conditions relatively highly (70), although not as highly as the fast ferries group. This may be because coastal vessels have to dock numerous times a day at different ports, and bad weather conditions may hamper their operations, thus increasing workload.

3.1.3 Differences between the overload models

There are a number of factors that appear in some, but not all, of the overload models. These are listed in Figure 59 below, along with their relative groups and weights.
Each of these factors appeared in some, but not all, of the groups' models; in the original figure, a weight is recorded for each group whose model included the factor, and a shaded cell indicates that the factor was not present in that group's model. Again, these factors are explained in detail in Appendix 8.

Name of Factor
• Concurrent task demands
• Difficulty of primary task
• Time constraints of primary task
• Level of morale on board
• Familiarity with task/route/area
• Monitoring onboard operations
• Navigation tasks
• Traffic density (collision avoidance)
• Watchkeeping tasks
• Distractions from onboard operations
• Quality of rest management
• Ambient conditions (on the bridge)
• Pilotage tasks
• Usability of bridge systems
• Quality of alarm management
• Personal tasks
• Commercial pressure
• Individual experience (of the OOW)
• Management of passenger issues
• Fatigue due to bad weather
• Operational flexibility
• Maintenance tasks
• Delegation of tasks
• Rescheduling capability
• Robustness of vessel systems
• Common language and culture
• Experience of working together
• Dealing with system failures
• Unannounced extended tours of duty
• Distractions from personal problems
• Distractions from alarms
• Experience/training in bridge systems
• Over-reliance on bridge systems
• Situational stress (leading to fatigue)
• Severity of consequences of systems failure
• Speed of escalation of systems failure
• Experience in dealing with systems failure
• Immediate consequences of making an error
• Commercial consequences of making an error

Figure 59 Other factors in the overload model
As Figure 59 shows, there are many factors that are present in one or two of the models, but that are not present in all four models. There are a number of possible reasons for this:

• In some cases, factors may have been renamed or combined with another factor to clarify their meaning and impact upon workload. For example, the fast ferries group had a lengthy discussion about the difference between primary and concurrent tasks, and came to the conclusion that, on board a high-speed ferry, there are no concurrent tasks. Thus, that factor was removed, and its subfactors were combined with those of the primary task characteristics factor, to reflect the types of tasks done on board a high-speed ferry.

• Some factors are particular to the type of trade the crew members are engaged in. For example, the factor management of passenger issues applies solely to the ferry trade, as the other trades do not take passengers on board their vessels. However, it may be possible to extend this factor slightly, to make it relevant to the other trades. This is explored in more detail in section 5.2.

• Some factors may have been discussed in more detail with one group than with the others. The very nature of these interactive group sessions means that the model is shaped by the thoughts and feelings of the group on the day, as well as by their more universal opinions on the causes of workload. The facilitator must be careful to control the conversation to ensure it remains within the scope of the Influence Diagram session, whilst at the same time giving the group enough independence to ensure they are not guided or influenced by the facilitator's own thoughts or opinions. Thus, there are some areas that may have been investigated more thoroughly in one group than in the others.
For example, when exploring how bridge crew competence contributes to overload, the high-speed ferry group discussed the issue of communication, specifically how having a multicultural crew, with language barriers and different cultures and backgrounds, can cause communication problems on board. Similarly, the fourth group discussed the issue of over-reliance on automated bridge systems, which had not been discussed with the previous groups. Common sense dictates that these are two factors which can affect any bridge crew, regardless of the type of vessel they work on or the trade that they are engaged in. Thus, the following section introduces a composite model, which attempts to capture all of this knowledge and use it in a single overload model.

3.2 The composite overload model

As discussed in the previous section, many factors and subfactors feature in one or two, but not all, of the overload models. One of the reasons for this may be that one group explored a particular factor or subfactor in more detail than the other groups. There are some factors amongst those in Figure 59 that
apply to all bridge crew members, regardless of shipping trade or type of vessel they serve on. To this end, a composite model was developed in an attempt to capture all of this knowledge and combine it in a single overload model that is generic enough to be used across all shipping trades, and yet specific enough to include the opinions of all the officers in the Influence Diagram sessions. The composite model is shown in Figure 60. The factors are described in detail in Appendix 8.
Figure 60 Composite overload model
As can be seen from Figure 60, the composite overload model contains all twenty-four factors and subfactors that were common to each of the separate overload models. However, it also contains a number of factors and subfactors that were present in only one or two of the models. During the development of the composite model, some professional judgement had to be used to determine whether these factors should be included or not. If a factor was clearly generic, it was included in the composite model, even though it may have appeared in only one of the individual overload models.

For example, the factor degree of common language and/or culture was discussed by only one group, but this factor could clearly apply to any vessel, regardless of differences in vessel size, trade, etc. If the crew on board share a common language or cultural background, then communication is likely to be more effective than on a vessel where the crew have language barriers or cultural differences.

Conversely, the factor loading due to concurrent task demands was not included in one of the individual models, but it was decided to include this factor in the composite model. During the development of the individual model in which that factor was removed, the group had a general discussion about the difference between primary and secondary tasks, and concluded that it was difficult to differentiate between the two. However, professional judgement was again employed, and it was decided that, whilst the primary and concurrent tasks may change during the OOW's watch, the factors that influence loading will not, and thus there is a distinction between the two.

3.2.1 Verification of the composite overload model

In order to verify the contents of the composite model, and to assign weights to the factors, it was decided to contact the previous participants of the Influence Diagram sessions by email and ask them to evaluate the new model.
The participants were asked to say whether they agreed that the list of factors included in the composite model is realistic, and whether it reflects their own views of the causes of mental workload. They were also asked if, in their opinion, there are any missing factors, or any factors that should not have been included in the model. The participants were also asked to weight the factors as they would have done during the consensus group (i.e., to weight the factors relative to one another).

Unfortunately, for various reasons, there was relatively little success in following up verification with the original ID session participants. Feedback was received from four participants, and the results are outlined as follows:

• All participants agreed that the composite model includes all of the factors that they consider contributory to feelings of mental overload.

• There are no spurious factors in the model.

• The placement of all of the factors within the model is correct.

• One participant felt that the length of time away from home can affect fatigue, but this can be considered part of the cumulative stress factor.

• Whilst most of the factors in the model were weighted differently by the four participants, the differences between the weights are generally relatively insignificant. There are some significant differences in weights, but these are in areas where differences would be expected, due to the disparity between the two types of shipping. For example, a fast ferry is likely to have more crew members on the bridge, and thus manning levels are not likely to be seen as an issue. Conversely, because there are fewer people on the bridge of a tanker, and the automated systems tend not to be as sophisticated as on a fast ferry, there are also likely to be fewer distractions, and so this is not seen as a major issue.
A consensus group was also convened in order to validate both the composite overload and underload models. Again, due to busy schedules, it was not possible for all of the previous participants of the Influence Diagram sessions to attend the meeting. However, four senior officers from two shipping companies were able to attend.

The participants were shown the composite overload model first, and were invited to discuss the factors and their weights. The group were asked to consider whether the model was general enough to be useful to different sectors within the shipping industry (i.e., tankers, coastal vessels, fast ferries, accident investigators, shipping company owners, etc.) and yet specific enough to allow the individual to model a specific scenario.

The overload model was discussed in some detail, but overall the group agreed that the model was realistic, and that the weights were as accurate as possible. One factor was added to the model – Degree of preparation for contingencies – as a subfactor of Dealing with contingencies. The participants felt that if the bridge crew have been trained in how to deal with a contingency situation, and/or there are procedures available on how to deal with the situation, then loading is likely to decrease. The validated overload model is shown in Figure 61.
Figure 61 Validated composite overload model
3.2.2 Testing the composite overload model

In order to test the robustness of the overload model, three different sets of weights were entered into the model. The first set is the average weight for each factor, based upon the weights assigned during the Influence Diagram sessions. The second set comprises the weights provided by the participant from the tanker vessel group. The third set comprises the weights provided by the participant from the fast ferry group.

Each version of the model was tested using the same scenario, as outlined below. This scenario was taken from one of the Influence Diagram sessions, and thus the ratings given during that session were used to evaluate the composite model. The SLI in the original session was 0.71 for this scenario.

A master was trying to berth a large container vessel in extremely bad weather, in an unfamiliar port. There was very heavy rain, so visibility was quite poor. There was a pilot on board the vessel, but the master said that he was not contributing or helping in any way, thus manning levels were considered inadequate. The task of berthing this vessel in an unknown port was quite complex, not flexible in any way, and the situation was worsened by the bad weather and poor visibility.

Each version of the model had the following ratings entered for each factor, as shown in Figure 62. (Note: there are some factors in the composite model which were not present in the original model used to test this scenario. In these cases, professional judgement was used to give an approximate rating for that factor.)
Factor                                                  Rating
Reliability of bridge systems on demand                   10
Degree of over-reliance on automated systems              10
Training in and/or experience of using bridge systems     50
Usability of bridge systems                               20
Quality of alarm management                               50
Time constraints of the task                              20
Complexity of the task                                   100
Familiarity with task, route, etc.                        20
Ability to delegate tasks                                 10
Ability to reschedule tasks                               10
Perceived immediate consequences of making an error       60
Perceived commercial consequences of making an error      10
Bridge manning levels                                     20
Length of time on watch                                   60
Disruption to watch patterns                              60
Quality of rest periods                                   70
Responsibility to additional duties                       50
Weather conditions                                       100
Quality of rest management                                50
Cumulative stress                                         50
Administrative tasks                                      10
Collision avoidance                                       80
Personal tasks                                            10
Monitoring other onboard operations                        1
Dealing with passenger / cargo / crew issues               1
Clarity of roles and responsibilities                     20
Degree of collective experience                           50
Training                                                  30
Unannounced extended tours of duty                         1
Adverse environmental conditions                          80
Degree of common language and/or culture                  10
Familiarity with working together                          1
Speed of escalation of contingency situation              60
Severity of consequences of contingency situation         60
Experience in dealing with contingency situation          50
Level of natural light                                    50
Quality of visibility                                     90
Severity of weather conditions                           100
Quality of ambient conditions on the bridge               70
Telephone / radio / other distractions                    50
Distractions from other crew members                      80
Personal problems                                         10
Unnecessary / cascading alarms                            20
Distractions from other onboard operations                50
Individual experience of the OOW                          70
Figure 62 Ratings used to test composite model
These ratings were entered into each version of the overload model, and yielded the following results:

Model version        Calculated SLI
Average weights           0.53
Tanker weights            0.53
Fast ferry weights        0.49
Figure 63 Calculated SLI for each version of composite model
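The robustness test above, in which the same scenario ratings are run against three different weight vectors, can be sketched generically. The normalized weighted sum used for the SLI below is an assumption (the report does not state the formula), and the three-factor weight sets and ratings are invented for illustration rather than taken from Figure 62.

```python
# Same ratings, different weight sets: a sketch of the robustness test.
# ASSUMPTION: SLI = sum(w * r) / (sum(w) * 100); all numbers are
# illustrative, not the actual report data.

def sli(weights, ratings):
    total = sum(weights.values())
    return sum(w * ratings[f] for f, w in weights.items()) / (total * 100)

# Illustrative scenario ratings (1-100).
ratings = {"complexity of the task": 100,
           "weather conditions": 100,
           "administrative tasks": 10}

# Three hypothetical weight vectors over the same factors.
weight_sets = {
    "average":    {"complexity of the task": 98,
                   "weather conditions": 68,
                   "administrative tasks": 34},
    "tanker":     {"complexity of the task": 100,
                   "weather conditions": 50,
                   "administrative tasks": 15},
    "fast ferry": {"complexity of the task": 90,
                   "weather conditions": 100,
                   "administrative tasks": 10},
}

for name, ws in weight_sets.items():
    print(f"{name}: SLI = {sli(ws, ratings):.2f}")
```

Because the weights are normalized before aggregation, swapping one weight vector for another shifts the SLI only moderately when the vectors broadly agree, which mirrors the closely grouped 0.53/0.53/0.49 results in Figure 63.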
As Figure 63 shows, the calculated SLI for each version of the composite model is significantly lower than that calculated during the original Influence Diagram session in which this berthing scenario was used to test an overload model (SLI of 0.71). There are a number of reasons why this might be the case:

• The composite model includes many factors that were not present in the original overload model developed during that Influence Diagram session, and thus the impact of certain factors may have been diluted.

• The ratings entered into the original overload model for this scenario were entered by the master who was directly involved in the scenario, with general consensus from the remainder of the group. Some of the ratings above may be inaccurate, as they are for factors that were not present in the original model, and so professional judgement had to be used.

• The weights for the three different versions of the model were not reached by consensus, but rather represent an average of the weights from the four individual models, and two individuals' opinions on the factors. Thus, the weights may be somewhat more subjective than if they had been determined by consensus.

The model was also tested during the consensus group by entering ratings for a particular real-life scenario. The scenario in question had occurred whilst one of the participants was Chief Officer on board a tanker near Iceland in late December, in extremely bad weather conditions. The participant reported that, although the crewmembers had experienced prolonged bad weather conditions, the crew were well trained and quite experienced, and manning levels were excellent. Therefore, he did not feel particularly loaded during the scenario. When the ratings for this scenario were entered into the overload model, the calculated SLI was 0.57, which the participant agreed was a relatively accurate reflection of how loaded he had felt at the time.

3.3 Analysis of the underload models

As mentioned previously, two of the four groups that participated in the Influence Diagram sessions reported that they did not have sufficient experience of underload situations to be able to develop a complete underload model. Unfortunately, this means that only two underload models have been developed within the scope of this project.

As can be seen from Figure 53 and Figure 54, there are a number of differences between the two underload models. Figure 64 outlines the common factors between the two models, and their respective weights.

Name of factor                        Group 1   Group 2
Primary task characteristics            100       100
Fatigue                                 100       100
Comfort of physical environment          40        20
Level of physical activity               50        50
Length of time on watch                  50        70
Quality of rest periods                 100       100
Disruptions to watch patterns            80        80
Stimulation from alarms                  70        90
Monitoring onboard activities            75        80
Administrative tasks                    100        70
Adverse weather conditions               50        50
Figure 64 Common underload factors
As Figure 64 shows, although there are relatively few common factors between the two models, their weights are quite similar, which implies that the two groups have quite similar feelings about the factors that cause underload, and the extent to which each factor impacts on underload. There are a number of differing factors between the two models, but it should be pointed out that the second group used the same model (with different weights) for both underload and overload situations. Any factors that were not relevant to the underload situation were given a zero weight, indicating to the software that the particular factor does not contribute to the overall SLI.

3.4 The composite underload model

To develop the composite underload model, the model developed by the second group of participants (Figure 54) was used as a base, as it is more comprehensive than that created by the first group (Figure 52). Firstly, a number of factors were removed from both models, as they had been given zero weights by the group participants, indicating that those factors do not have any impact on underload. The factors that were removed are:

• Perceived severity of the consequences of error
• Clarity of roles and responsibilities
• Flexibility of navigation task procedures
• Pilotage tasks associated with navigation
• Flexibility of watchkeeping task procedures
• Pilotage tasks associated with watchkeeping
• Duration of reduced visibility
The remainder of the factors were merged to create a single composite underload model, as shown in Figure 65. Again, professional judgement had to be used to decide which factors to include and which to remove. For example, the model developed by the first group of participants included navigational tasks as a subfactor of the primary task. However, in the interests of preserving the generality of the model, it was decided to remove this factor as the primary task may not always include navigational tasks. The weights were taken directly from the underload models that had been created by the two groups of participants. In the cases where the two groups had assigned different weights to a particular factor, an average weight was inserted into the model.
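The merge procedure described above (drop zero-weighted factors, take the union of the remaining factors, and average the weights where the groups differ) might be sketched as follows. The factor names and numbers are illustrative, and the real models are hierarchical Influence Diagrams, which this flat sketch ignores; in the report, professional judgement was also applied, which no simple rule captures.

```python
# Sketch of merging two flat weight tables into a composite model.
# ASSUMPTIONS: flat (non-hierarchical) factor lists; a zero weight means
# "does not contribute" and the factor is dropped; where the groups
# disagree, the weights are averaged. All data below are hypothetical.

def merge(model_a, model_b):
    composite = {}
    for factor in set(model_a) | set(model_b):
        ws = [m[factor] for m in (model_a, model_b) if factor in m]
        avg = sum(ws) / len(ws)
        if avg > 0:  # zero-weighted factors drop out of the composite
            composite[factor] = avg
    return composite

group1 = {"Fatigue": 100, "Level of physical activity": 50,
          "Pilotage tasks": 0}
group2 = {"Fatigue": 100, "Level of physical activity": 50,
          "Length of time on watch": 70}

print(merge(group1, group2))
```

Note that a factor present in only one model is simply kept here with its single weight; in the report that decision was made case by case (e.g. navigational tasks was removed to preserve the generality of the composite model).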
3.4.1 Verification of the composite underload model

The composite underload model was also validated by the consensus group of four senior officers. Again, the four participants agreed that the underload model was generally accurate, and only minor changes were made to the wording of some factors (e.g., ‘Quality of bridge automation’ was re-phrased as ‘Degree of interaction with automated bridge systems’). The factor Ambient conditions was considered particularly important in underload situations, and thus was promoted from being a subfactor of Adverse environmental conditions to a primary factor that directly contributes to feelings of underload. The validated composite underload model is shown in Figure 66.

3.4.2 Testing the composite underload model

In order to test the composite underload model, ratings based on the following scenario were entered into the model, to determine the SLI.

An OOW was on watch by himself on the bridge. It was late at night and quite dark outside, although the visibility was very good. The weather was very calm and there was no other traffic around the vessel. The OOW had been on watch for approximately three hours, and was experiencing a moderate degree of fatigue, as his watch patterns over the past few days had been quite disrupted due to a recent visit to a port. Recent bad weather had prevented the OOW from getting adequate rest when not on watch, due to the deep pitch and roll of the vessel. The OOW had minimal contact with other crew members. His primary task was lookout. There was a deadman alarm, but it had been switched off by the previous OOW.
Figure 65 Composite underload model
Figure 66 Validated composite underload model
When this scenario was used to test the original underload model during the Influence Diagram session, the SLI was calculated at 0.83. Figure 67 displays the ratings that were entered for this scenario:

Factor                                          Rating
Adverse weather conditions                      5
Physical movement around the bridge             20
Quality of bridge automation                    50
Complexity of the primary task                  1
Familiarity with the primary task               90
Manning levels                                  50
Length of time on watch                         75
Disruption to watch patterns                    60
Quality of the rest periods                     20
Responsibility to additional duties             50
Quality of rest management                      70
Cumulative stress                               85
Administrative tasks                            1
Personal tasks                                  20
Monitoring other onboard operations             1
Quality of alarm management                     90
Provision of a deadman alarm                    1
Interaction with other crew members             30
Frequency of navigational hazards               1
Training including dealing with overload        50
Experience in dealing with overload             50
Frequency of communications                     50
Level of natural light                          50
Visibility                                      80
Degree of bad weather                           1
Ambient conditions                              90
Telephone / radio / other distractions          10
Distractions from other crew members            10
Distractions from other onboard operations      80
Personal / domestic pre-occupations             1
Commercial pressure                             1

Figure 67 Ratings for composite underload model
When the ratings for this scenario were entered into the composite underload model, the calculated SLI was 0.74, which is not greatly different from the original calculated SLI of 0.83.

The composite underload model was also tested during the consensus group meeting, using a real-life scenario experienced by one of the participants, in which he was master on board a vessel that had been at sea for a week. The master was on the bridge with an AB, acting as lookout, for the 0000-0400 watch. The weather was very calm; there was a full moon, but it was cloudy. Somebody had previously turned the magnetic heading alarm off, and the automatic steering had put five degrees of starboard helm on. The vessel had drifted more than ten degrees off course, but the alarm did not sound. The master and the AB were both quite drowsy, due to lack of activity, and did not notice that the vessel was going around in a slow circle. Suddenly, the moon came out from behind a cloud, alerting the master and the AB, and they realised that they were facing in the wrong direction.

Ratings for this scenario were entered into the underload model, and the calculated SLI was 0.81, which the master agreed was an accurate reflection of the level of loading experienced during that scenario.
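The Influence Diagram software derives the SLI from the factor ratings and the model weights; the exact formula is not reproduced in this section. A minimal sketch of an SLI-style calculation, assuming a weighted sum of 1-100 ratings normalised to a 0-1 index, with invented weights and a small subset of the Figure 67 ratings:

```python
# Hypothetical sketch of an SLI-style calculation. The weights below are
# illustrative only; the actual weights were elicited during the ID sessions.
def sli(ratings, weights):
    """Normalised weighted sum of ratings, giving an index between 0 and 1."""
    total_weight = sum(weights.values())
    return sum(weights[f] * ratings[f] / 100 for f in ratings) / total_weight

# A subset of the Figure 67 ratings, paired with made-up weights.
ratings = {
    "Length of time on watch": 75,
    "Quality of the rest periods": 20,
    "Cumulative stress": 85,
}
weights = {
    "Length of time on watch": 2,
    "Quality of the rest periods": 1,
    "Cumulative stress": 2,
}

print(round(sli(ratings, weights), 2))  # (1.5 + 0.2 + 1.7) / 5 -> 0.68
```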
4. Feedback

This section outlines the validation methods that were employed and their results to date.

4.1 Feedback from the participants of the Influence Diagram sessions

After each of the Influence Diagram sessions, the participants were emailed a feedback questionnaire to ask their opinions on the sessions and their results. A copy of the email is contained in Appendix 2 of this report. The participants were asked to answer questions on four different topics:

• General questions about the ID sessions – for example, did they find the session interesting, did they think the subject of cognitive workload was fully explored, etc.
• Questions about the ID process – for example, did they find the process easy to follow, do they think it resolved all issues raised, etc.
• General questions about the results of the sessions – for example, do they think the results will be of practical use in helping to reduce underload or overload, etc.
• Specific questions about the results of the sessions – for example, do they agree with the results, are there any factors missing, etc.

Unfortunately, due to some participants being on leave and some incorrect email addresses, feedback was not received from all participants. However, of the feedback that was received, the majority of the participants were very positive about the whole experience and the results that came out of the ID sessions:

• Most participants found the ID sessions very interesting and informative.
• Most participants commented that this was the first opportunity they had had to discuss cognitive workload, and how it affects them personally.
• The participants were pleased to see that work is being done to try to understand the causes and effects of mental workload.
• The participants felt that, whilst a lot of useful work was done during the ID sessions, this needs to be pursued further in order to fully understand the causes of mental workload.
• Specifically, the participants commented that it would be useful to widen the audience to include input from shipping company managers, insurance companies, vessel designers, etc.
• Most participants commented that the results could be used to investigate accidents where workload may have been an issue.
• The results could also be integrated with formal bridge teamwork training, regular situational trialling, crew resource and management training, etc.
Overall, the participants felt that the results of these ID sessions have the potential to be very useful and to give a greater insight into the factors that cause errors/failures on board a vessel. However, it is essential that the input into these sessions comes from as wide an audience as possible to get a true representation of the factors and how they affect different sectors of the shipping industry.
5. Validation of the CLIMATE software tool

The Marine Accident Investigation Branch (MAIB) was invited to view and test the CLIMATE software tool using MAIB scenarios, and to provide feedback on the potential usefulness of the tool for their accident investigation work. It was anticipated that, whilst the tool would not be able to specifically identify the cause(s) of an accident, it would be able to indicate which factors are more likely than others to have been contributory, thereby helping to identify systemic causes.

Two accident scenarios were analysed using CLIMATE. The first scenario was a collision of a vessel with its loading ramp. This accident investigation is still underway, and thus specific information regarding the accident cannot be published. When ratings for the specific factors were entered into the overload model, the CLI was calculated at 0.36, which coincides with the MAIB's opinion that high workload was not a causal factor in this accident.

The second accident scenario analysed using CLIMATE was the grounding of the Lerrix off the Darss peninsula in the Baltic Sea on 10th October 2005 [5]. The following is a summary of the accident:

At 2342 on 10th October 2005, the British registered general cargo vessel Lerrix ran aground off the Darss peninsula in the Baltic Sea. The single-hold vessel was carrying a cargo of second-hand vehicles destined for Klaipeda in Lithuania. Twenty-five minutes later the master re-floated the vessel using astern propulsion, narrowly avoiding a second grounding as he did so. It was the master's first command with the company. Earlier the same day, the vessel had transited the Kiel Canal, and the master reported that his rest period between midnight and 0600 had been disturbed by nervous tension brought on by the vessel's approach to and navigation down the River Elbe.
During the afternoon the master suffered a second disturbed rest period while transiting the canal, making several visits to the bridge to check progress and, finally, to pilot the vessel outbound from the canal lock to sea. At about 2230, the lookout requested and was granted permission to proceed below to complete cleaning the galley. A short while later the master fell asleep in the bridge chair. As a result, the vessel missed a planned alteration of course at 2242 within the TSS and continued on a 090 heading at 10.5 knots until grounding at 2342. The vessel's movements were monitored by Warnemunde VTS and, when it became apparent that the vessel was not following the prescribed route, the VTS operators made several attempts to contact Lerrix by VHF, but received no response.

[5] Report number 14/2006. The full report is available online at: http://www.maib.gov.uk/cms_resources/Lerrix.pdf
When the mate arrived on the bridge at midnight the master, who had woken seconds before, was seen at the engine control lever with maximum astern power set. The general alarm was sounded, soundings were taken and at about 0007 the vessel floated free and proceeded to anchor close by. The master was breathalysed for alcohol consumption and the test proved negative.

The accident was analysed using the underload model and the CLI was calculated at 0.80, which corresponded with the MAIB's view that underload was a significant factor in the accident.

Overall, the MAIB commented that the tool seemed straightforward to use and relatively accurate, in so far as the CLIs calculated by CLIMATE for the two scenarios examined corresponded with the MAIB's own opinion on the extent to which workload had contributed to them. The MAIB inspectors commented that they would wish to apply the tool to many different accident scenarios to gain confidence in its applicability and usefulness, but that, in doing this, trends of causation could emerge which could be valuable. For example, the results could help to explain why an OOW was overloaded/underloaded or perhaps indicate that, in a particular situation, he was not overloaded/underloaded and that other factors were more important. The MAIB observed that excessive cognitive workload is a common factor in many marine incidents, and equally that modern automated bridge systems often remove the stimulus of watchkeeping activity, so that physical fatigue is no longer kept at bay in underload situations. The tool was therefore considered potentially very useful.
Appendix 7 – Diary study planning
1. Development of a pilot diary study

In order to validate the factors and weights of the overload and underload models, it was decided to develop a diary study to be distributed amongst the shipping companies that were involved in the original Influence Diagram sessions. The diary study would consist of a short questionnaire to be completed by the Officer of the Watch at the end of his/her watch. An example of a diary study questionnaire is contained in Appendix 10 of this report.

It was decided to run an initial pilot study with a single shipping company, to determine how the questionnaires should be distributed and to get initial feedback from the participants about how easy or difficult the questionnaires were to complete, how relevant the questions were, etc. A diary study questionnaire was developed according to the overload model that had been created by that particular shipping company during the ID sessions.

As the overload models tend to be quite large, it was decided to include questions about the most influential factors only, as the OOW would be expected to complete the questionnaires at the end of his/her shift, before taking a rest break. It was considered unrealistic to ask the OOW to fill out a questionnaire containing fifty questions, so a decision had to be made regarding which questions to ask. As explained previously, when developing the Influence Diagram there is an option to display the rank order of the factors according to their impact upon workload. It was decided to use this ranking as the basis for the questions that should be asked. It was also decided that no more than fifteen questions should be asked, to ensure that the questionnaire is relatively quick to complete, and thus that more crew members would be willing to participate.

The overload model used as the basis for this questionnaire was examined to determine the factors that were ranked as having an influence of between one and ten. These factors are listed in Figure 68.
As the figure shows, some factors were ranked as having the same amount of influence on overload (for example, quality of rest periods and situational stress are both ranked 6). However, as there are only thirteen factors with an influence ranking of between one and ten in this model, it was decided to use all thirteen in the questionnaire. The questions were divided into categories, according to the respective primary factors they are each linked to, as the following list shows:
Name of factor                                  Rank
Available crew resources (manning levels)       1
Severity of perceived immediate consequences    2
Severity of perceived commercial consequences   3
Length of time on watch                         4
Collision avoidance                             5
Disruption to watch patterns                    6
Quality of rest periods                         6
Situational stress                              6
Quality of visibility                           7
Pilotage (concurrent task demands)              7
Adverse weather conditions                      8
Responsibility to additional duties             9
Complexity of the primary task                  10

Figure 68 Top most influential factors for this model
• Manning levels
  o Available crew resources
• Severity of the perceived consequences
  o Immediate consequences
  o Commercial consequences
• Fatigue
  o Length of time on watch
  o Disruption to watch patterns
  o Quality of rest periods
  o Situational stress
  o Responsibility to additional duties
• Concurrent task demands
  o Collision avoidance
  o Pilotage tasks
• Severity of the environmental conditions
  o Quality of visibility
  o Adverse weather conditions
• Primary task characteristics
  o Complexity of the primary task
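The rank-based selection described above can be sketched as a simple filter: keep the factors ranked between one and ten (retaining ties at the same rank), subject to the fifteen-question cap. The factor names and ranks follow Figure 68; the cap and cut-off are as stated in the text.

```python
# Sketch of the questionnaire factor selection: factors ranked 1-10 are kept,
# capped at fifteen questions. Ties (e.g. three factors ranked 6) all survive
# the rank filter.
MAX_QUESTIONS = 15
RANK_CUTOFF = 10

factors = [
    ("Available crew resources (manning levels)", 1),
    ("Severity of perceived immediate consequences", 2),
    ("Severity of perceived commercial consequences", 3),
    ("Length of time on watch", 4),
    ("Collision avoidance", 5),
    ("Disruption to watch patterns", 6),
    ("Quality of rest periods", 6),
    ("Situational stress", 6),
    ("Quality of visibility", 7),
    ("Pilotage (concurrent task demands)", 7),
    ("Adverse weather conditions", 8),
    ("Responsibility to additional duties", 9),
    ("Complexity of the primary task", 10),
]

selected = [name for name, rank in factors if rank <= RANK_CUTOFF][:MAX_QUESTIONS]
print(len(selected))  # -> 13: all thirteen factors fit within the fifteen-question cap
```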
The development of the questionnaire raised the issue of the difference between variable and static factors, and between factors which can be improved and those which are beyond the control of the individuals, the shipping companies, the regulators, etc. For example, adverse weather conditions is a factor that cannot be changed; manning levels is a fairly static factor once the voyage is underway, as the number of crew on board will remain the same at least until the next port; and fatigue is a variable factor that some people may be able to handle better than others.
However, it is still important to ask about the influence that these static, variable and unchangeable factors have upon workload, as they provide a context for the factors that can be improved. For example, consider the following scenario:

A vessel is in open waters, approximately six hours from the next port. The total number of crew members on board the vessel is eight, two of whom are engine-room officers. The weather conditions are quite bad, and the OOW is feeling quite fatigued due to the severe pitch and roll of the vessel. When the OOW has finished his watch, he has a number of non-essential administrative tasks to complete before he can go to bed. However, he is due to be on watch again in six hours, to act as lookout when the vessel is coming into port. If he stays up to complete his administrative tasks, he will be quite tired by the time he has to act as lookout.

In this case, there are a number of factors that cannot be changed or improved in any way. The weather conditions are likely to remain quite bad, meaning that the pitch and roll of the vessel will continue to wear out the crew members. The manning levels will remain the same, so the OOW is still required to act as lookout in six hours' time. However, the master of the vessel could tell the OOW to postpone his administrative tasks until after they have left port, which means that the OOW could get six hours' rest before going on watch again, reducing his fatigue and making him more alert during his next watch.

Thus, knowledge of the adverse weather conditions provides a context by which the situation can be judged. Had the weather conditions been calm, the OOW would probably not be suffering from fatigue to the same extent, and thus would not have to postpone his other duties in order to get sufficient rest.
Therefore, it was decided that questions about factors such as environmental conditions and manning levels are relevant, to provide a context by which the remaining factors can be evaluated.

Once the questionnaire had been completed, it was printed and posted to the shipping company for dissemination. It was decided that, for this pilot study, the OOWs of two vessels would be asked to complete these questionnaires after each watch, over a period of two days. On each vessel there were six crew members responsible for watchkeeping, and each OOW would be expected to have two watches per day. Over a two-day period, a maximum of forty-eight completed questionnaires was therefore expected to be returned. The shipping company was also given a number of copies of a feedback questionnaire, which was designed to find out what the OOWs thought of the diary study and the questions they were asked, and any suggestions they
had. This feedback questionnaire is also included in Appendix 10, with the diary study questionnaire.
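The expected return from the pilot, as described above, is simple arithmetic:

```python
# Expected maximum number of completed questionnaires in the two-day pilot:
# two vessels, six watchkeepers per vessel, two watches per OOW per day.
vessels = 2
watchkeepers_per_vessel = 6
watches_per_day = 2
days = 2

expected = vessels * watchkeepers_per_vessel * watches_per_day * days
print(expected)  # -> 48
```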
2. Results of the pilot diary study

Unfortunately, this first phase of the pilot study raised a number of unanticipated issues with the dissemination of materials, rendering this phase of the study unsuccessful. However, as a result of this unsuccessful attempt, a number of courses of action have been developed to ensure the success of the next phase of the diary study:

• Face-to-face interaction: It is important to have some level of face-to-face interaction with the people who will be completing the questionnaires. The participants of the Influence Diagram sessions reported that they already have a huge amount of administrative work to complete on board, so there is a danger that if the OOW is simply handed a questionnaire to complete, he/she may put it aside and forget about it. A face-to-face interaction will offer an opportunity to provide more information about the overall project and the purpose of the diary study, as well as an opportunity to answer any questions from the end users. In other words, it will provide an opportunity to "sell" the diary study to the proposed participants, and to explain that this is an opportunity for them to report their own opinions and feelings on the factors that influence overload. Of course, it would be impossible to meet each proposed participant of the diary study face-to-face, as some of the shipping companies involved will have huge fleets of vessels and thousands of potential participants. Therefore, it has been suggested that the masters and first officers of a number of vessels should be invited to participate in a face-to-face meeting; ultimately these officers have the most authority on board the vessels, and thus will be able to ensure that their crew members understand the importance of the diary study.

• A diary study training session: During the face-to-face meeting with the masters and first officers, it has been proposed that a short training session be given to fully equip them with all of the information they will need to ensure the successful participation of their crew members. During this session, the masters and first officers will be trained in how to disseminate the questionnaires, what information they should give the crew members about the questionnaires, how the questionnaires should be completed, how the results should be returned, etc.

• An onboard facilitator: It has been decided that an onboard facilitator would further help to ensure the success of the diary study. It is accepted that, once on board, the master and the first officer may not have time to check that crew members are completing the questionnaires. Therefore, it would be useful if the master or first officer could appoint another crew member to facilitate the diary study, i.e., to check that crew members are completing the questionnaires regularly. It would also be useful to contact the facilitator on a relatively regular basis (e.g., once a week) via telephone or email, to check on the progress of the study and to answer any questions or issues that may have arisen. It may be helpful, therefore, to include the facilitator in the training session.

• A diary study information pack: Many of the participants of the Influence Diagram sessions commented that they were not given enough information about the overall project before the ID sessions, and so did not know what to expect. It would be useful, therefore, to provide all proposed participants of the diary study with an information pack (e.g., a fact sheet explaining cognitive workload, instructions on how to complete the questionnaire, etc.) with the questionnaires. This would provide additional background information for the participants. Any additional questions or comments could be answered by the master, first officer, facilitator or by Human Reliability.
These actions have been discussed in detail, and will be implemented in the next phase of the diary study.
Appendix 8 – List and description of overload and underload factors
1. List of primary overload factors and descriptions

The following table (Figure 69) lists and describes the primary overload factors as defined in the composite overload model (Figure 60). The table also lists the secondary factors which influence these primary factors, according to the composite overload model. These secondary factors are described in Figure 70.

Factor 1: Quality of automated bridge systems (influenced by factors 12, 13, 14, 15, 16)
Description: This factor refers to automated systems such as radars, autopilot and GPS, as well as communication systems. Good quality bridge automation means that the systems are easy to use, reliable, well designed and useful. As the quality of the bridge automation increases, loading is likely to decrease.

Factor 2: Loading due to primary task characteristics (influenced by factors 17, 18, 19)
Description: This factor refers to the mental demands experienced by the individual due to primary tasks such as navigation and watchkeeping. It is specifically influenced by task characteristics such as the difficulty and the flexibility of the task. If there is little or no loading from the primary task, the primary task characteristics are not putting the individual under too much pressure. If there is a high level of loading from the primary task, the individual feels under pressure which, in turn, is likely to increase overall feelings of loading.

Factor 3: Severity of the perceived consequences (influenced by factors 20, 21)
Description: This factor refers to the severity of the perceived consequences of making an error, and can be further divided into the immediate consequences and the commercial consequences of making an error. If the individual perceives the consequences of making an error to be severe, then feelings of loading are likely to increase.

Factor 4: Bridge manning levels
Description: There are sufficient qualified and/or experienced bridge crew members available for the tasks that need to be performed. As the number of qualified and/or experienced crew members increases, individual loading is likely to decrease, as there are more people to share the workload.

Factor 5: Fatigue (influenced by factors 22, 23, 24, 25, 26, 27, 28)
Description: Fatigue can cause the individual to feel overloaded, as it affects the amount of mental resources available to the individual and can slow reaction times, the ability to process information, etc. Therefore, as feelings of fatigue increase, feelings of loading are also likely to increase.

Factor 6: Loading due to concurrent task demands (influenced by factors 29, 30, 31, 32, 33)
Description: This factor refers to other tasks that may have to be performed by the individual in conjunction with his/her primary tasks. These tasks may include administrative tasks, monitoring onboard operations, monitoring nearby traffic or personal tasks. If there are too many demands from concurrent tasks, the individual will have fewer cognitive resources available, and thus loading is likely to increase.

Factor 7: Quality of the bridge crew competence (influenced by factors 34, 35, 36, 37, 38)
Description: This factor refers to the quality of the bridge crew competence, and includes the level of training, experience, morale, degree of communication, etc. of the bridge crew. If the quality of the bridge crew competence is good, loading is likely to decrease, as the individual does not have to worry about whether the other bridge crew are carrying out their responsibilities and duties correctly. Conversely, if the quality of the bridge crew competence is poor, the individual is more likely to suffer loading, as his/her own workload may increase because he/she feels the need to double-check the work of the other crew.

Factor 8: Dealing with contingencies (influenced by factors 39, 40, 41)
Description: This factor refers to dealing with contingency situations such as a system or parts failure. In these situations, mental resources may very quickly be consumed, as the individual has to deal with a sudden change in routine and may not have previous experience of this type of situation. Therefore, as the number or seriousness of such situations increases, loading may also increase.

Factor 9: Severity of the environmental conditions (influenced by factors 42, 43, 44, 45)
Description: This factor refers to environmental conditions including the level of natural light, the weather conditions and the degree of visibility. Very severe environmental conditions means that each of these factors is at, or close to, its worst case (i.e., poor natural light such as at dawn or dusk, severe weather such as gale-force winds and very rough seas, and poor visibility due to heavy fog, snow or rain). As the severity of the environmental conditions increases, loading is also likely to increase, as the individual tries to deal with and compensate for these conditions.

Factor 10: Number of distractions (influenced by factors 46, 47, 48, 49, 50)
Description: This factor refers to distractions from sources such as unnecessary telephone or radio transmissions, other crew members and personal problems. As the number of distractions increases, the individual is more likely to experience loading, as his/her cognitive resources are used up in trying to deal with these distractions.

Factor 11: Degree of individual experience (of the OOW)
Description: As the degree of individual experience increases, the OOW should be more capable of dealing with the tasks he/she is responsible for, and thus his/her feelings of loading should decrease.

Figure 69 List and description of primary overload factors
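The numbered cross-references in Figure 69 lend themselves to a simple data representation. The following is a sketch only, covering a subset of the factors using the numbering above; the structure is an illustration, not the CLIMATE tool's actual internal format.

```python
# Sketch of the factor network: each primary factor keeps the numbers of the
# secondary factors that influence it, mirroring Figure 69's "Influenced by"
# column. Only a subset of the primary factors is shown here.
primary_factors = {
    1: ("Quality of automated bridge systems", [12, 13, 14, 15, 16]),
    2: ("Loading due to primary task characteristics", [17, 18, 19]),
    3: ("Severity of the perceived consequences", [20, 21]),
    5: ("Fatigue", [22, 23, 24, 25, 26, 27, 28]),
}

# Simple integrity check: no secondary factor number clashes with a primary one.
for number, (_name, influences) in primary_factors.items():
    assert all(i not in primary_factors for i in influences)

print(primary_factors[5][1])  # -> [22, 23, 24, 25, 26, 27, 28]
```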
2. List of secondary overload factors and descriptions The following table (Figure 70) lists and describes the secondary overload factors as defined in the composite overload model (Figure 60). The table also lists the tertiary factors which influence these secondary factors, according to the composite overload model. The tertiary factors are described in Figure 71.
No.
Factor & Description Reliability of bridge systems on demand
12
Description: The bridge systems are reliable and work correctly when needed. The information given by the bridge systems (e.g. radars, etc.) is accurate, reliable and up-to-date. As the reliability of the bridge systems increases, the quality of the bridge systems also increases.
Degree of over-reliance on automated systems 13
Description: As individuals become more familiar with bridge systems, there is a danger that they may become over reliant or complacent, for example, cancelling alarms before checking out the source of the alarm or checking data provided by automated systems such as the ARPA. If the bridge systems allow individuals to become over-reliant, then it can be assumed that the quality of the bridge systems is decreasing.
Training in and/or experience of using bridge systems 14
Description: The crew have received training for the specific bridge systems on board this vessel and/or they have experience in using these bridge systems. As the level of training and experience increases, the quality of bridge systems will increase.
Usability of bridge systems 15
Description: The bridge systems are physically well laid out and radars, telephones etc. are nearby. In addition, the system interfaces are user-friendly and easy to understand. As the bridge systems become more userfriendly, the quality of the bridge systems will increase.
Quality of alarm management 16
Description: For example, unnecessary, spurious and/or cascading alarms are kept to a minimum. As the quality of alarm management increases, the quality of the bridge systems will increase.
261
Influenced by
Difficulty of the primary task 17
Description: This factor refers to the degree of difficulty of the primary task. As the task becomes more difficult to do, loading due to the primary task characteristics is likely to increase as the individual uses more cognitive resources to perform the task.
51, 52
Flexibility of primary task
18
Description: This factor refers to the degree of flexibility of the primary task. If the individual can postpone the task, or delegate the task to another crew member, then the task can be considered very flexible. As a task becomes more flexible, feelings of loading due to the primary task characteristics are likely to be reduced, as the individual does not have to perform the task if needs be.
Time constraints of task 19
Description: If the primary task is not subject to any time constraints, then the task does not have to be performed within a certain timescale. As the time constraints increase, feelings of loading due to the primary task characteristics are also likely to increase.
Perceived immediate consequences of making an error 20
Description: For example, there is no potential for serious harm or injury to any crew member, or damage to the vessel. If the perceived immediate consequences of making an error are not severe, then feelings of loading are likely to decrease.
Perceived commercial consequences of making an error 21
Description: For example, there are no commercial consequences if the vessel is late coming into port. If the perceived commercial consequences of making an error are not severe, then feelings of loading are likely to decrease.
Length of time on watch 22
Description: Crew members are more likely to feel fatigued towards the end of their watch, rather than at the beginning of their watch.
Disruptions to watch patterns
23
Description: Watch patterns may be disrupted if, for example, crew members have to work an extra watch to cover sudden illness, before getting to the next port. Alternatively, extreme circumstances, such as very bad weather or a system failure, may require crew members to work extra hours until the situation can be dealt with. As watch patterns become more disruption, rest periods may also be interrupted, or may be irregular, preventing individuals from getting adequate rest. As the disruptions to watch patterns decrease, however, fatigue
262
53, 54
is also likely to decrease.
Quality of rest periods
24
Description: Rest periods may be of poor quality if, for example, crew members have to cover extra watches due to sudden illness of other crew members, or if there is very bad weather or other extreme circumstances that prevent crew members from getting regular, uninterrupted, quality rest. As the quality of rest periods increases, however, fatigue is likely to decrease.
Responsibilities to additional duties
25
Description: Additional duties might include administrative, cargo, maintenance, etc. tasks that the individual must complete after his/her watch, and before he/she can take a rest period. If there are an excessive number of tasks, or if the person is unable to complete the tasks in time, then rest periods may become disrupted, leading to increased fatigue.
Weather conditions
26
Description: Weather conditions (for example, rough seas or winds increasing the pitch and roll of the vessel) may contribute to fatigue by preventing crew members from getting adequate rest. Weather conditions may also lead to fatigue as the individuals may become physically tired due to compensating for the pitch and roll of the vessel or due to dealing with bad weather when on deck. However, as the severity of weather conditions decreases, fatigue is also likely to decrease
Quality of rest management
27
Description: Rest management refers to how well rest periods are controlled and allocated by the master, and managed by the individual. For example, if the master (or person responsible for the roster) does not allocate an individual adequate shore time, then the individual may have to sacrifice rest periods to spend more time ashore. However, the individual also has a responsibility to ensure that he/she gets enough rest time and doesn't sacrifice that time for more shore leave or to catch up on tasks that should have been completed whilst on duty. As the quality of rest management increases, therefore, fatigue is likely to decrease.
Cumulative stress
28
Description: Cumulative stress may be caused by the stress of being away from home for extended periods of time, the stress of being on board the vessel in prolonged adverse weather conditions, the stress of the job itself, etc. As this stress accumulates, the individual may find it more difficult to get adequate or quality rest, which can lead to an increase in fatigue.
Administrative tasks
29
Description: This factor refers to the demands the individual may experience from having to complete administrative tasks (for example, chart corrections, logbook entries) as well as having to perform his/her primary tasks. As the demands from administrative tasks increase, the demands from concurrent tasks are also likely to increase.
Collision avoidance
30
Description: Collision avoidance incorporates all tasks to do with dealing with other traffic and/or hazards. This includes keeping a watch for traffic/hazards, making contact with other vessels where necessary, manoeuvring around other traffic/hazards, giving way to traffic, etc. As the demands from collision avoidance decrease (i.e. there is less traffic and/or there are fewer hazards about), there are fewer concurrent tasks to consume the individual's mental resources, and thus feelings of loading are also likely to decrease.
Personal tasks 31
Description: Personal tasks may include telephoning or writing home (especially if away from home for an extended period of time), or other daily personal tasks such as balancing a chequebook or sorting out personal finances, that the individual may be engaged in.
Monitoring other onboard operations
32
Description: For example, the individual may also be responsible for monitoring, from the bridge, other onboard operations such as maintenance or cargo operations, by keeping in telephone or radio contact with the crew members performing the operations, and making notes in a logbook. As the demands from other onboard operations decrease, there are fewer concurrent tasks to consume the individual's mental resources, and thus feelings of loading are also likely to decrease.
Dealing with passenger / cargo / crew issues 33
Description: Examples include dealing with passenger queries or complaints, dealing with cargo issues such as spoilt cargo, or dealing with crew issues such as rosters, etc. As the number of issues decreases, there are fewer concurrent tasks to consume the individual's mental resources, and thus feelings of loading are also likely to decrease.
Clarity of roles and responsibilities 34
Description: For example, all crew members know what their tasks are and what they are responsible for whilst on watch on the bridge. As the clarity of roles and responsibilities increases, the quality of bridge crew competence also increases.
Degree of collective experience 35
Description: The bridge crew have a good degree of experience of working on this vessel, or this type of vessel, and of working in these conditions (route, weather, etc.). As the degree of collective experience increases, the quality of the bridge crew competence also increases.
Training 36
Description: The bridge crew are all fully trained for their roles and responsibilities. As the degree of training increases, the quality of the bridge crew competence also increases.
Level of morale on board
37
Description: If the level of morale on board the bridge is quite poor, this can have an adverse effect on bridge crew competence, as individuals may be reluctant to "go the extra mile" during their watch. Conversely, if the level of morale on board is good, then this can have a positive effect on competence, which in turn decreases feelings of loading.
Influenced by: 55, 56
Degree of communication
38
Description: This factor refers to the degree of communication amongst the bridge crew, and includes both the level of information sharing and the quality of information being shared. Very good communication means that information sharing is optimal, and that the information being shared is necessary and correct. As the quality of information increases, the quality of the bridge crew competence will also increase.
Influenced by: 57, 58
Speed of escalation of contingency situation 39
Description: Contingency situations may include a failure on board the vessel (e.g. of the automated system or of a part such as the rudder or engine) or contact with another vessel/hazard, etc. As the speed of escalation of the contingency situation increases, loading is also likely to increase.
Severity of consequences of contingency situation 40
Description: For example, there is no potential for serious injury or harm to any crew members or passengers (if any), or of damage to the vessel or cargo. As the consequences of the contingency situation become more severe, loading is likely to increase.
Experience in dealing with contingency situation 41
Description: For example, the crew have experienced this situation before, and know how to act. As the experience of dealing with this situation increases, loading is likely to decrease.
Level of natural light 42
Description: For example, it is not dawn or dusk (when natural light levels tend to be quite low), and, hence, it is not difficult to visually identify targets. If the level of natural light is good, then the severity of environmental conditions decreases.
Quality of visibility 43
Description: Poor visibility may be caused by heavy rain, fog, snow, etc., which makes it difficult to visually identify hazards, other vessels, targets on the radar etc. As the quality of visibility increases, the severity of the environmental conditions decreases.
Severity of weather conditions 44
Description: Severe weather conditions might include gale force winds, heavy rain, rough seas, heavy fog, etc. As the severity of the weather conditions increases, the severity of the environmental conditions also increases.
Quality of ambient conditions on the bridge
45
Description: Ambient conditions, such as the temperature, humidity, lighting and ventilation, affect the level of comfort of the individual on the bridge. If the bridge is too hot, the individual may feel sleepy; if it is too cold, manual dexterity may be reduced. If the individual becomes too comfortable, he/she may become drowsy or less alert. If the individual becomes too uncomfortable, his/her manual dexterity may be reduced, or he/she may become distracted by attempting to get comfortable. As the quality of the ambient conditions increases, the severity of the environmental conditions decreases.
Telephone / radio / other distractions
46
Description: Unnecessary or irrelevant telephone or radio calls can be a source of distraction for individuals on the bridge (for example, telephone calls from other crew members on the vessel, or to other crew members on the bridge, when the individual is trying to berth the vessel or manoeuvre around another vessel). As the number of distractions from the telephone or radio increases, loading is also likely to increase.
Distractions from other crew members 47
Description: For example, other crew members coming to the bridge, or chatting on the bridge can be a source of distraction. As the number of distractions from other crew members increases, loading is also likely to increase.
Personal problems 48
Description: Sometimes personal problems (financial problems, family issues, bereavement, etc.) can also be a source of distraction. As the number of distractions from personal problems increases, loading is also likely to increase.
Unnecessary / cascading alarms 49
Description: Unnecessary or cascading alarms (for example, engine room alarms that also sound on the bridge) can be a source of distraction. As the number of distractions from these types of alarms increases, loading can also increase.
Distractions from other onboard operations 50
Description: For example, there may be distractions in the form of noise, vibrations, etc. from other parts of the vessel. If the number of distractions from other onboard operations increases, then feelings of loading are likely to increase.
Figure 70 List and description of secondary overload factors
3. List of tertiary overload factors and descriptions The following table (Figure 71) lists and describes the tertiary overload factors as defined in the composite overload model (Figure 60).
No.
Factor & Description
Complexity of the task 51
Description: Tasks may be considered complex if the individual is required to do many calculations (e.g. for route planning), or to gather and compare data from different sources (e.g. compare ARPA data with GPS data), etc. However, as the task becomes less complex, the individual is less likely to experience loading due to the primary task characteristics.
Familiarity with task, route, etc. 52
Description: As individuals become more familiar with the task they are performing, or the route they are travelling, etc. they tend to become more confident in performing that task, and so are less likely to experience loading.
Ability to delegate tasks 53
Description: For example, if the primary task includes both watchkeeping and making chart corrections, the individual may be able to delegate one of these tasks to another crew member, for example, a lookout or a junior officer. If the primary task, or aspects of the primary task can be delegated to other crew members, then the individual has more cognitive resources available, and thus loading is likely to decrease.
Ability to reschedule tasks 54
Description: For example, there may be certain elements of the primary task (for example, recording information in a logbook) that may be rescheduled to another more convenient time. If the primary task, or aspects of the primary task can be rescheduled, then the individual has more cognitive resources available, and thus loading is likely to decrease.
Unannounced extended tours of duty 55
Description: An unannounced or unplanned extended tour of duty may occur when an individual is asked to stay aboard a vessel for a longer period of time than anticipated to cover, for example, illness. As the number of unannounced extended tours of duty increases, morale is likely to decrease, which can have an adverse effect on the quality of bridge crew competence.
Adverse environmental conditions 56
Description: Environmental conditions can have an effect on morale, for example, if the bridge crew are working in prolonged periods of darkness (i.e., during winter), bad weather, cold temperatures, etc. As the environmental conditions improve, however, the level of morale on board is also likely to increase, which is likely to have a positive effect on the quality of bridge crew competence.
Degree of common language and/or culture 57
Description: For example, bridge crew members are from similar backgrounds and cultures and there are no language barriers. As the degree of common language and culture increases, communication between bridge crew members is likely to increase, which in turn increases the quality of the bridge crew competence.
Familiarity with working together 58
Description: As the bridge crew become more familiar with each other's working practices, communication between the crew members is likely to increase, which in turn will increase the quality of the bridge crew competence.
Figure 71 List and description of tertiary overload factors
4. List of primary underload factors and descriptions The following table (Figure 72) lists and describes the primary underload factors as defined in the composite underload model (Figure 65). The table also lists the secondary factors which influence these primary factors, according to the composite underload model. These secondary factors are described in Figure 73.
No.
Factor & Description
Influenced by
Level of physical activity 1
Description: As the individual is engaged in more physical activity, the individual will receive more mental stimulation and thus is less likely to experience underload.
Influenced by: 11, 12
Quality of bridge automation 2
Description: As the quality of the automated bridge systems increases, underload is also likely to increase as the individual has to do less mental computation to get the information he/she needs to complete his/her tasks.
Loading due to primary task characteristics 3
Description: The primary task is the foremost, or most important, task that the individual is currently engaged in. This may be watchkeeping, navigation, administrative tasks, lookout, etc. As the loading from the primary task characteristics increases, underload is likely to decrease.
Influenced by: 13, 14
Manning levels 4
Description: For example, there are a sufficient number of qualified crew members available to complete the necessary tasks.
Fatigue 5
Description: As the level of fatigue increases, the individual has fewer cognitive resources available to him/her. If the individual becomes very fatigued, he/she may become less aware of the mental stimulation around him/her, and thus is more likely to experience underload.
Influenced by: 15, 16, 17, 18, 19, 20
Concurrent task demands 6
Description: If the individual is engaged in a number of concurrent tasks, in addition to the primary task, he/she will receive more mental stimulation and is less likely to experience underload.
Influenced by: 21, 22, 23
Level of external activities 7
Description: External activities include any activities going on around the individual, either inside or outside the vessel. As the number of external activities increases, so too will the level of mental stimulation of the individual, and so he/she is less likely to experience underload.
Influenced by: 24, 25, 26
Degree of bridge crew competence 8
Description: As the degree of competency of the bridge crew increases, they will be more prepared and equipped with techniques and knowledge of how to deal with underload situations.
Influenced by: 27, 28, 29
Adverse environmental conditions 9
Description: As the environmental conditions become more severe, the individual must use more cognitive resources to deal with and compensate for the conditions, and thus underload is likely to decrease.
Influenced by: 30, 31, 32, 33
Distractions 10
Description: As the number of distractions increases, the individual will receive more mental stimulation and thus underload is likely to decrease.
Influenced by: 34, 35, 36, 37, 38
Figure 72 List and description of primary underload factors
5. List of secondary underload factors and descriptions The following table (Figure 73) lists and describes the secondary underload factors as defined in the composite underload model (Figure 65). The table also lists the tertiary factors which influence these secondary factors, according to the composite underload model. These tertiary factors are described in Figure 74.
No.
Factor & Description
Adverse weather conditions 11
Description: If the vessel is pitching and rolling, this is likely to help stave off underload, as the individual has to concentrate on keeping his/her balance on board.
Physical movement around the bridge 12
Description: For example, to compare data from different sources or to use different equipment. The individual is less likely to suffer from underload when moving about than when standing or sitting still on the bridge.
Complexity of the primary task 13
Description: As the complexity of the primary task increases, the individual is less likely to experience underload as he/she has to use more cognitive resources to complete the primary task.
Flexibility of the primary task
14
Description: This factor refers to the degree of flexibility of the primary task. If the individual can postpone the task, or delegate the task to another crew member, then the task can be considered very flexible. As a task becomes more flexible, feelings of loading due to the primary task characteristics are likely to be reduced, as the individual does not have to perform the task at that particular moment.
Length of time on watch 15
Description: Individuals are more likely to feel the effects of fatigue towards the end of their watch than at the start.
Disruption to watch patterns 16
Description: As watch patterns become more disrupted, rest periods are also likely to become disrupted, which leads to an increase in fatigue.
Quality of rest periods 17
Description: As the quality of rest periods increases (i.e., the individual gets frequent, uninterrupted rest), the individual is less likely to suffer from fatigue.
Responsibility to additional duties 18
Description: If the individual has many duties in addition to his/her watchkeeping duties, these may reduce the amount of rest time available to the individual, which in turn can lead to an increase in fatigue.
Quality of rest management
19
Description: Rest management refers to how well rest periods are controlled and allocated by the master, and managed by the individual. For example, if the master (or person responsible for the roster) does not allocate an individual adequate shore time, then the individual may have to sacrifice rest periods to spend more time ashore. However, the individual also has a responsibility to ensure that he/she gets enough rest time and doesn't sacrifice that time for more shore leave or to catch up on tasks that should have been completed whilst on duty. As the quality of rest management increases, therefore, fatigue is likely to decrease.
Cumulative stress
20
Description: Cumulative stress may be caused by the stress of being away from home for extended periods of time, the stress of being on board the vessel in prolonged adverse weather conditions, the stress of the job itself, etc. As this stress accumulates, the individual may find it more difficult to get adequate or quality rest which can lead to an increase in fatigue.
Administrative tasks 21
Description: As the number of administrative tasks increases, the individual has to use more cognitive resources, and so is less likely to experience underload.
Personal tasks 22
Description: As the number of personal tasks increases, the individual has to use more cognitive resources, and so is less likely to experience underload.
Monitoring other onboard activities 23
Description: As the number of other onboard activities that the individual is responsible for monitoring increases, the individual has to use more cognitive resources, and so is less likely to experience underload.
Stimulation from alarms 24
Description: Alarms can provide a vital level of mental stimulation (for example, the deadman alarm), which can help to prevent underload from developing. However, if there are too many unnecessary or spurious alarms on the bridge, there is a danger that the individual may become complacent when cancelling the alarms, and may not check the source of new alarms before cancelling them.
Influenced by: 39, 40
Interactions with other crew members 25
Description: As the number of interactions with other crew members increases, the individual is less likely to experience underload.
Frequency of navigational hazards 26
Description: Navigational hazards can include other vessels, shallow draughts, narrow channels, etc. As the number of these hazards increases, the individual is forced to be more alert to navigate safely around them, and thus is less likely to experience underload.
Training in dealing with underload 27
Description: For example, the bridge crew have been trained in how to deal with underload situations, how to stay alert, etc.
Experience in dealing with underload 28
Description: For example, the bridge crew have previously experienced underload situations and have developed techniques (either formally or informally) in how to stay alert during periods of low mental stimulation.
Frequency of communication 29
Description: As the quality of communication between bridge personnel increases, individuals are less likely to experience underload as they are engaged in a mentally stimulating exercise.
Level of natural light 30
Description: For example, it is not dawn or dusk, when the level of natural light tends to be quite low. If the level of natural light is good, then the severity of the environmental conditions decreases.
Visibility 31
Description: Poor visibility may be caused by heavy rain, snow, fog, etc., which makes it difficult to visually identify hazards, other vessels, targets on the radar, etc. As the quality of visibility increases, the severity of the environmental conditions decreases.
Degree of bad weather 32
Description: Severe weather conditions might include gale force winds, heavy rain, rough seas, dense fog, etc. As the severity of the weather conditions increases, the severity of the environmental conditions also increases.
Ambient conditions 33
Description: Ambient conditions include the temperature, lighting, humidity, ventilation and comfort of the physical surroundings on the bridge. If these are too comfortable, the individual may become drowsy and may find it difficult to retain alertness.
Telephone / radio / other distractions 34
Description: For example, frequent unnecessary telephone or radio communications. As the number of distractions increases, the individual uses up more cognitive resources to deal with these distractions, and thus is less likely to suffer from underload.
Distractions from other crew members 35
Description: For example, other crew members coming to the bridge to chat. As the number of distractions increases, the individual uses up more cognitive resources to deal with these distractions, and thus is less likely to suffer from underload.
Distractions from other onboard operations 36
Description: For example, noise or vibrations from other areas of the vessel. As the number of distractions increases, the individual uses up more cognitive resources to deal with these distractions, and thus is less likely to suffer from underload.
Personal / domestic pre-occupations 37
Description: For example, the individual may be preoccupied with personal financial or family problems. As the number of distractions increases, the individual uses up more cognitive resources to deal with these distractions, and thus is less likely to suffer from underload.
Commercial pressure 38
Description: As the level of commercial pressure increases (i.e., pressure to adhere to schedules, etc.), the individual is more likely to consume cognitive resources, and thus less likely to experience underload.
Figure 73 List of secondary underload factors and descriptions
6. List of tertiary underload factors and descriptions The following table (Figure 74) lists and describes the tertiary underload factors as defined in the composite underload model (Figure 65).
No.
Factor & Description
Quality of alarm management 39
Description: Good quality alarm management means that there are few or no spurious or unnecessary alarms sounding on the bridge. If the alarms are poorly designed (for example, cascading alarms, or alarms from other areas of the vessel sounding on the bridge), then this can cause individuals to become complacent when cancelling alarms, and may reduce their alertness when a genuine alarm sounds.
Provision of deadman alarm 40
Description: The deadman alarm can be set to sound at regular intervals throughout the watch and is designed to keep the individual alert.
Figure 74 List and description of tertiary underload factors
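The influence diagram models described in the tables above combine factor ratings (as gathered on the 1-10 questionnaire scales) with importance weightings to produce an overall loading assessment. As an illustration only, the aggregation might be sketched as a normalised weighted sum; note that the factor names and weights below are hypothetical examples, not values taken from this report, and the report's actual ID aggregation procedure may differ.

```python
# Illustrative sketch: combine factor ratings (1-10 scale, as in the diary
# questionnaire) with importance weights into a single loading index.
# Factor names and weight values are hypothetical, for demonstration only.

def loading_index(ratings, weights):
    """Return the weighted average of factor ratings on the 1-10 scale."""
    total_weight = sum(weights[factor] for factor in ratings)
    if total_weight == 0:
        raise ValueError("weights must not all be zero")
    return sum(ratings[factor] * weights[factor] for factor in ratings) / total_weight

# Example using three of the primary underload factors from Figure 72,
# with made-up weights reflecting their relative importance.
weights = {"physical_activity": 3, "fatigue": 5, "distractions": 2}
ratings = {"physical_activity": 2, "fatigue": 8, "distractions": 4}

print(round(loading_index(ratings, weights), 2))  # -> 5.4
```

A weighted average keeps the index on the same 1-10 scale as the underlying ratings, which makes it directly comparable with the individual questionnaire responses.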
Appendix 9 – Feedback Questionnaire for ID sessions
Hello,

I would like to take this opportunity to thank you once again for your participation in our Influence Diagram workshops to determine the factors that drive Cognitive Mental Workload. We enjoyed meeting you and hearing your thoughts and feelings on the factors that contribute to overload and underload situations. We found the workshops very interesting and informative, and we hope you did too!

Further to these workshops, we have developed a short questionnaire to get some feedback from you about what you thought of the sessions. We have also compared the results of the different workshops, and would like to get some feedback about whether you think these results accurately reflect your experiences of mental workload.

The questionnaire is included below, and we would greatly appreciate it if you could take some time to answer the questions and send this email back to the following address:
[email protected]

I'd like to remind you that all information given is strictly confidential!

Best regards,
Dr. Claire Blackett, Consultant, Human Reliability Associates.
Phone: 01257 463121
Email:
[email protected] Web: http://www.humanreliability.com
========================================================
COGNITIVE MENTAL WORKLOAD STUDY - FOLLOWUP QUESTIONNAIRE
========================================================
Please fill out your answer below each question.
--------------------------------------------------------
General Questions:
--------------------------------------------------------
1. Did you find the workshops interesting?
2. Do you think that we fully explored the subject of cognitive mental workload (overload and/or underload) in these workshops?
3. Did the facilitator allow everybody to make a contribution?
4. Are there any areas / issues that you would like to see explored in more detail?
--------------------------------------------------------
The Influence Diagram Process:
--------------------------------------------------------
5. Did you find the process for creating the Influence Diagrams easy to follow and understand?
6. Were there any parts of the process that you think should be explained in greater detail?
7. Did you find it easy to differentiate between the importance of different factors (i.e., assign weightings)?
8. Did you find it easy to judge the quality of different factors in a given scenario (i.e., assign ratings)?
9. If not, what would help?
10. Do you think the process fully explored and resolved all the issues?
11. Do you have any suggestions for improving the process?
12. Was the length of the sessions too short/too long?
--------------------------------------------------------
The Results of the workshops:
--------------------------------------------------------
13. Do you think the results of these workshops will be of practical use in helping to reduce situations in which mental workload may contribute to errors?
14. How do you think the end tool of this research project should be used (i.e., by regulatory authorities, by shipping companies, by the captain on board a vessel, etc.)?
--------------------------------------------------------
Results Feedback:
--------------------------------------------------------
As promised, we have compared the results of the different workshops to determine which factors are most influential in overload and underload situations. The comparison of the results of the different workshops has shown that the top five most influential factors are:

Factors affecting overload:
1. Manning levels
2. Factors affecting fatigue (e.g., quality of rest periods, disruption to shift patterns, etc.)
3. Factors affecting environmental / weather conditions (e.g., degree of restricted visibility, sea conditions, adverse weather, etc.)
4. Distractions on the bridge (e.g., unnecessary telephone / radio transmissions)
5. Level of training and experience of the bridge crew

Factors affecting underload:
1. The level of physical activity of the Officer of the Watch
2. Factors affecting fatigue (e.g., quality of rest breaks, disruption to shift patterns, length of time on watch, etc.)
3. Factors affecting weather conditions (e.g., degree of bad weather, sea conditions, etc.)
4. The physical environment of the bridge (e.g., the degree of physical comfort on the bridge, layout of the bridge, etc.)
5. The frequency of radio transmissions and alarms on the bridge

15. Do you agree that these factors are most likely to influence overload and/or underload?
16. If not, what factors do you think are most influential in an overload or an underload situation?
--------------------------------------------------------
Further information:
--------------------------------------------------------
17. We would be grateful if you could give us a brief summary of your past experience (i.e., the different types of vessels you may have worked on) and your current post, for our records? We would like to stress that we do not need any personal information, and that all information given here is strictly confidential!

18. Would you mind if we contacted you again to clarify information or to ask your opinion on some matters?
Appendix 10 – Sample Diary Study questionnaire and feedback questionnaire
ABOUT THIS QUESTIONNAIRE Human Reliability Associates are working with the Maritime and Coastguard Agency to develop a software tool that will measure levels of cognitive (mental) workload in seafarers. By completing this questionnaire at the end of each watch, you will help us to gain a better understanding of the types of tasks that you perform during your watch and of the factors that increase or decrease mental workload during these tasks. The questions on the next two pages have been developed with the help of four separate groups of seafarers, and represent their opinions of the factors that influence workload. We would like you to answer the questions and also to give comments or examples where possible to supplement the information we have received from these four groups of officers. When you have completed the questionnaire, please place it in the box provided. If you wish to remain anonymous, then do not enter your name in the first box on the next page. Many thanks for your help.
AN EXPLANATION OF SOME OF THE TERMS USED Mental loading:
Mental loading can be defined as the amount of mental resources a person needs to use in order to perform a particular task. High levels of mental workload (overload situations) may indicate that there are many things happening at once, and the person cannot take in all of the information directed at him/her, and feels under a lot of pressure. On the other hand, low levels of workload (underload situations) may indicate that the person is not getting enough mental stimulation from his/her surroundings, and may become less vigilant and their attention may begin to wander.
The primary task:
The primary task is the foremost task that you must perform at any given moment. For example, when berthing the vessel, your primary task may be manoeuvring the vessel safely. When in open seas, your primary task may be keeping a visual lookout and monitoring the radar.
Concurrent tasks:
Concurrent tasks are other tasks that must be done, in addition to the primary task. For example, if the vessel is in open waters and your primary task is navigation and watchkeeping, a concurrent task may be keeping a log of all course corrections or monitoring radio communications.
INSTRUCTIONS FOR FILLING IN THIS QUESTIONNAIRE In order to mark your response to each question, please circle the appropriate number as in the example below. In this example, the person has circled number 8, which would indicate that manning levels were quite good at this time, but were not at their best case, which would be indicated by circling number 10. Manning levels on board the bridge were very good (e.g. there are sufficient qualified crew members for the tasks that must be performed).
Strongly      Disagree      Neither agree      Agree      Strongly
disagree                    nor disagree                  agree
   1     2     3     4     5     6     7     8     9     10
Manning levels were good, but an extra lookout on the bridge would have been helpful as there was a lot of traffic in the area during my watch.
The person has also added a comment with some extra information. Please bear in mind that your answers should be based upon the watch that you have just completed.
PERSONAL DETAILS:

Name (optional):
Position / Rank:
Length of watch:
Weather conditions:
Please indicate your main responsibilities during the watch you have just completed:
_____________________________________________
_____________________________________________
_____________________________________________
_____________________________________________
DEGREE OF LOADING: Can you please make a mark on the scale below to indicate how loaded you felt during this watch:

  Very underloaded    Somewhat underloaded    Neither overloaded nor underloaded    Somewhat overloaded    Very overloaded
  |------------|------------|------------|------------|------------|------------|------------|------------|------------|
QUESTIONS: Look at the questions below and indicate whether you agree or disagree. Please write any comments you may have about these factors in the space provided.

MANNING LEVELS

1. Manning levels on board the bridge were very good (e.g. there were sufficient qualified crew members for the tasks that had to be performed).
  Strongly disagree     Disagree     Neither agree nor disagree     Agree     Strongly agree
       1        2        3        4        5        6        7        8        9        10

Please write any additional comments here:
SEVERITY OF THE PERCEIVED CONSEQUENCES

2. There were no immediate consequences if I had made an error during my watch (e.g. no potential for injury or harm to another crew member or the vessel).
  Strongly disagree     Disagree     Neither agree nor disagree     Agree     Strongly agree
       1        2        3        4        5        6        7        8        9        10

3. There were no commercial consequences if I had made an error during my watch (e.g. no potential for financial loss or damage to the company’s reputation).

  Strongly disagree     Disagree     Neither agree nor disagree     Agree     Strongly agree
       1        2        3        4        5        6        7        8        9        10
FATIGUE

4. I did not feel any more tired or weary at the end of my watch than I did at the beginning (i.e. the length of time on watch did not affect fatigue).

  Strongly disagree     Disagree     Neither agree nor disagree     Agree     Strongly agree
       1        2        3        4        5        6        7        8        9        10

5. My shift patterns have been very regular recently (e.g. I have not had to unexpectedly work extra shifts to cover illness or other circumstances).

  Strongly disagree     Disagree     Neither agree nor disagree     Agree     Strongly agree
       1        2        3        4        5        6        7        8        9        10

6. The quality of my rest periods has been very good recently (e.g. no interruptions or bad weather conditions to prevent me getting adequate rest).

  Strongly disagree     Disagree     Neither agree nor disagree     Agree     Strongly agree
       1        2        3        4        5        6        7        8        9        10
7. I have not suffered from any situational stress recently (e.g. stress accumulating from issues on board or any recent incidents or near-misses).

  Strongly disagree     Disagree     Neither agree nor disagree     Agree     Strongly agree
       1        2        3        4        5        6        7        8        9        10

8. My responsibilities to other duties outside of my watch have not prevented me from getting sufficient rest (e.g. they have not been overly time-consuming).

  Strongly disagree     Disagree     Neither agree nor disagree     Agree     Strongly agree
       1        2        3        4        5        6        7        8        9        10

Please write any additional comments here:
CONCURRENT TASK DEMANDS

9. Traffic density in the area was acceptable, and did not require an excessive amount of attention (e.g. no potential for collision with any other vessels).

  Strongly disagree     Disagree     Neither agree nor disagree     Agree     Strongly agree
       1        2        3        4        5        6        7        8        9        10

10. I was not engaged in any pilotage tasks, or, if I was, they did not require an excessive amount of attention (e.g. the vessel was in open waters, or there was a pilot on the bridge).

  Strongly disagree     Disagree     Neither agree nor disagree     Agree     Strongly agree
       1        2        3        4        5        6        7        8        9        10
SEVERITY OF THE ENVIRONMENTAL CONDITIONS

11. The visibility was very good during my watch (e.g. I had no difficulty seeing the horizon or visually identifying any targets or hazards on the radar).

  Strongly disagree     Disagree     Neither agree nor disagree     Agree     Strongly agree
       1        2        3        4        5        6        7        8        9        10

12. The weather conditions were very good during my watch (e.g. clear skies, calm seas, little or no wind, and warm weather).

  Strongly disagree     Disagree     Neither agree nor disagree     Agree     Strongly agree
       1        2        3        4        5        6        7        8        9        10
PRIMARY TASK CHARACTERISTICS

13. My primary task(s) during my watch were not very complex (e.g. did not require many calculations, or collection or comparison of data from various sources).

  Strongly disagree     Disagree     Neither agree nor disagree     Agree     Strongly agree
       1        2        3        4        5        6        7        8        9        10
ADDITIONAL FACTORS:

Are there any other factors that, in your opinion, have an influence upon workload that haven’t been covered in this questionnaire? How do these factors influence workload?

Are there any other comments that you would like to make about your watch?
FEEDBACK FORM

NOTE: This feedback form should only be filled out once, at the end of your voyage.

Firstly, we would like to thank you for participating in our workload study. We would also like to ask you to fill out this feedback form to let us know what you thought of the questionnaires. When you have completed this form, please place it in the box/envelope provided. If you wish to remain anonymous, do not fill out your name below. However, we would appreciate it if you could fill out the remainder of the Personal Details box for statistical purposes. Many thanks for your participation in this study.
PERSONAL DETAILS Name (optional):
____________________________________________________________________________________________
Position / Rank:
____________________________________________________________________________________________
Number of years with company:
____________________________________________________________________________________________
Number of years in current office:
____________________________________________________________________________________________
IMPORTANCE OF THE FACTORS

Below is a list of ten factors that are known to influence workload. Please rate each factor according to how much of an influence it would have on workload, on a scale of 1 to 100, with 100 meaning it is highly influential and 1 meaning it is not influential at all.

For example, if manning levels were at their worst case possible, how influential would that be on your feelings of loading? If you think that this factor would have the biggest impact upon your feelings of loading, then enter 100 into the ‘Rating’ box. If you think it would have a moderate impact, then enter 50 into the box. If you don’t think it would have any impact, then enter 1 into the box.
Factor                                                                                    Rating

Quality of bridge automation – This factor refers to the usability and/or reliability of radars, autopilot, communication systems, etc.  [    ]
Primary task characteristics – This factor refers to the difficulty, complexity and/or flexibility of the primary task.  [    ]
Severity of perceived consequences – This factor refers to the perceived cost of making an error, i.e. the potential for injury or damage.  [    ]
Manning levels – This factor refers to the number of qualified and experienced crew available for the task.  [    ]
Fatigue – This factor refers to the level of fatigue as caused by inadequate rest, excessive duties, etc.  [    ]
Concurrent task demands – This factor refers to the tasks that must be completed in addition to the primary task.  [    ]
Crew competence – This factor refers to the experience and training of the crew members.  [    ]
Severity of environmental conditions – This factor refers to the weather conditions, visibility and/or level of natural light.  [    ]
Distractions – This factor refers to any unnecessary telephone communications, alarms and/or interruptions from other crew members.  [    ]
System failures – This factor refers to the speed of escalation of the failure and/or the experience of dealing with similar situations.  [    ]
FEEDBACK QUESTIONS
How easy did you find the questionnaires to use? ____________________________________________________________________________________________________________________________ ____________________________________________________________________________________________________________________________ On average, how long did it take for you to fill out each questionnaire? ____________________________________________________________________________________________________________________________ ____________________________________________________________________________________________________________________________
Did the questions make sense to you, and were they relevant to your watch? ____________________________________________________________________________________________________________________________ ____________________________________________________________________________________________________________________________ Did you need clarification on any of the questions, and who did you ask for clarification? ____________________________________________________________________________________________________________________________ ____________________________________________________________________________________________________________________________ Did you have any trouble understanding the concepts behind the questions (i.e. loading, primary tasks, etc.)? ____________________________________________________________________________________________________________________________ ____________________________________________________________________________________________________________________________ Was it easy to determine how loaded you felt during your watch, and to mark this on the scale provided? ____________________________________________________________________________________________________________________________ ____________________________________________________________________________________________________________________________ Were there any additional factors that you felt were important, but that were not dealt with in the questionnaire? ____________________________________________________________________________________________________________________________ ____________________________________________________________________________________________________________________________ If so, how would you rate the importance of these factors, on a scale of 1 to 100 (100 meaning having the most impact on workload)? 
____________________________________________________________________________________________________________________________ ____________________________________________________________________________________________________________________________ Did you find it difficult to rate the factors on the previous page of this feedback form, using the scale of 1 to 100? ____________________________________________________________________________________________________________________________ ____________________________________________________________________________________________________________________________