Using Bayesian Networks to Model Accident Causation in the UK Railway Industry
William Marsh, RADAR Group, Queen Mary, University of London, Mile End Road, E1 4NS, London, UK
George Bearfield, Atkins Rail, Euston Tower, 286 Euston Road, NW1 3AT, London, UK
Abstract
We describe a method of modelling organisational causes of accidents, using Bayesian Networks. A rigorous method is used to relate interactions within the organisation operating the system to causal factors for accidents. Using examples from a model of the causes of SPAD incidents in the UK railways, we describe how such a model can be used for risk assessment.
1 Introduction
It is widely recognised that accidents in which ‘human error’ plays a part are often not solely attributable to errors made by an operator but have deeper causes, arising from the behaviour of many others within the organisational context of a system. To analyse the risk of accidents and to improve safety, these ‘organisational factors’ need to be understood and included in the models of possible accidents and their causes that underlie risk assessment of operational systems. Several frameworks for analysing the organisational context of accidents have been proposed [3, 7], but without the capability to assess risks numerically. We outline an alternative method, using Bayesian Networks. The presentation uses extracts of a model we are developing of the causes of drivers passing signals at danger on the UK rail network.
1.1 Signals Passed at Danger (SPADs)
A SPAD (Signal Passed at Danger) is defined as ‘an incident when a train has passed a stop signal at danger without authority’ (Railtrack Company Procedure, RT/D/P/010). SPADs vary in severity; a ‘serious’ SPAD (severity 3 and greater) occurs when the distance travelled past the signal exceeds the safe overlap. Following serious rail crashes at Watford (1996), Southall (1997) and Ladbroke Grove (1999), all involving a signal passed at danger, there has been a sustained focus on reducing the incidence of SPADs on the UK rail network. The Rail Safety and Standards Board (RSSB) reports monthly on the occurrence of SPADs on the network (for example [2]). Figure 1 shows the reduction in SPADs since 1997.
[Figure 1: line chart. Vertical axis: number of SPADs, 0 to 700. Horizontal axis: year, 1997 to 2002. Series: all severities; serious (severity 3 to 8).]
Figure 1. Annual incidence of SPADs (category A)
In 1997 there were 229 serious (severity 3 to 8) SPADs (dotted line) on UK mainline railways; by 2002/2003 the incidence had reduced to 140. The percentage of serious SPADs has remained consistently at around 40% of the total number.
2 Organisational Factors in Operational Safety
Investigations of accidents in complex systems have shown that seemingly ‘random’ events leading to accidents have underlying causes. Events attributed to human error and blamed on an operator have systemic causes, such as procedural or organisational weaknesses. Reason [8] describes the gradual relaxation of safety alertness following a period of safe operation, followed by increased alertness after an accident, as ‘currents in the safety space’. Rasmussen [7] stresses the changing nature of organisations; pressure to increase efficiency or reduce workload, unless checked, will cause the operation of a system to migrate towards the boundary of safety.
2.1 Organisational Causes of SPADs
The inquiry into the Ladbroke Grove accident found evidence of performance pressures affecting the safety culture of the UK railways at that time: ‘Within the workforce there is a perception that emphasis on performance has affected attitudes to safety. … The disparity in sanctions between those for failures in performance and those for failures in safety may well have conveyed to the industry that performance was of top priority’ [1, page 3].
Until 1999/2000, around 40% of SPADs occurred at signals that had previously experienced a SPAD. Investigation of the causes of these so-called multi-SPAD events, such as poorly sighted signals, had reduced their proportion to 27% by 2003 [2]. As the proportion of multi-SPAD events falls, this reactive approach needs to be replaced with more proactive analysis of the causes of SPADs, including organisational factors.
However, it is less clear how investigation of organisational factors is to be achieved. An assessment of the investigation of SPAD incidents [9] comments that ‘the industry is generally poor at identifying organisational issues that may underpin SPAD incidents unless those incidents are so serious as to warrant significant input from senior managers.’
One aspect of this weakness is modelling accidents just as event sequences. Procedural and organisational weaknesses have only an indirect effect on the accident and are therefore not readily represented as events. Ignoring causal factors undermines risk assessment and limits the scope of accident investigation; it also means organisational audit findings are not linked to possible accidents. Therefore, we use Bayesian Networks (BNs) to model accidents, with explicit representation of both events and root causes.
3 Causal Modelling with Bayesian Networks
A BN is a graph (such as that shown in Figure 2) with a set of probability tables. The nodes represent uncertain variables and the arcs represent the causal/relevance relationships between the variables. There is a probability table for each node, providing the probabilities of each state of the variable, for each combination of parent states. The model of cause is probabilistic rather than deterministic and this makes it possible to include factors that influence the frequency of events, but do not determine their occurrence.
Although the underlying theory (Bayesian probability) has been around for a long time, executing realistic BN models only became possible in the late 1980s, with new algorithms. Methods for building large-scale BNs are even more recent [4], and it is this work that has made it possible to apply BNs to the problems of systems engineering. The RADAR group at QMUL, in collaboration with Agena Ltd, has built BN-based applications that have shown the technology to be effective. Several existing BN applications are for dependability assessment, notably the TRACS tool [5] used to assess vehicle reliability by QinetiQ (on behalf of the MOD). Recently, the SCORE project [6] has applied BNs to model the influence of organisational culture on the safety of operational systems, using an air-traffic control case study.
[Figure 2: BN graph. Node labels: mis-interpretation; signal not located; sighting obstruction; brakes not applied; read-across at proceed; SPAD; read across; phantom proceed; distraction; late sighting; late brake application.]
Figure 2. A Simplified BN Model of SPAD Events
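The definition above (one probability table per node, indexed by each combination of parent states) can be illustrated with a minimal sketch in Python. The node names are borrowed from Figure 2, but the arcs chosen here and every number are assumptions made purely for illustration; they are not taken from the SPAD model. Inference is done by brute-force enumeration of the joint distribution, which is practical only for very small networks and is not the algorithm used by BN tools.

```python
from itertools import product

# Hypothetical structure and numbers, for illustration only.
# Each node maps to (parents, table); the table gives
# P(node = True | parent states) for every combination of parent states.
NETWORK = {
    "sighting_obstruction": ((), {(): 0.05}),
    "distraction": ((), {(): 0.10}),
    "late_sighting": (
        ("sighting_obstruction", "distraction"),
        {
            (False, False): 0.01,
            (False, True): 0.10,
            (True, False): 0.20,
            (True, True): 0.40,
        },
    ),
    "spad": (("late_sighting",), {(False,): 0.001, (True,): 0.05}),
}


def joint_probability(assignment):
    """Probability of one complete True/False assignment to every node."""
    p = 1.0
    for node, (parents, table) in NETWORK.items():
        p_true = table[tuple(assignment[parent] for parent in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def query(target, evidence=None):
    """P(target = True | evidence), summing the joint over all assignments."""
    evidence = evidence or {}
    nodes = list(NETWORK)
    numerator = 0.0
    denominator = 0.0
    for values in product([False, True], repeat=len(nodes)):
        assignment = dict(zip(nodes, values))
        if any(assignment[name] != state for name, state in evidence.items()):
            continue  # inconsistent with the observed evidence
        p = joint_probability(assignment)
        denominator += p
        if assignment[target]:
            numerator += p
    return numerator / denominator


# Prediction (cause to effect) and diagnosis (effect to cause) use the same model.
prior_spad = query("spad")
spad_given_obstruction = query("spad", {"sighting_obstruction": True})
obstruction_given_spad = query("sighting_obstruction", {"spad": True})
```

The same tables support reasoning in both directions: from a causal factor to the chance of a SPAD, and from an observed SPAD back to the likely presence of the factor.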
4 Developing a Causal Model
We follow four steps to identify causal factors and relate them to the structure of the operational system. The causal factors and events leading to an accident are formed into a causal model, represented by a BN.
4.1 Modelling the Organisational Context
The first step is to understand the interaction of ‘actors’ in the organisation contributing to incidents; we use an adaptation of Rasmussen’s AcciMap [7] for this. Trains on the UK railway are operated by Train Operating Companies (TOCs); Figure 3 shows part of a model of the TOC organisation, relevant to the occurrence of SPAD incidents.
[Figure 3: diagram of actors. Labels: Driver Training; Driver Management; Driver.]
Figure 3. Organisational Actors in a Train Operating Company
The organisational model is made up of:
• Actors: each actor is a role; it is described in terms of its responsibilities relevant to the incident.
• Interactions: these are modelled as information exchanged between actors. For example, the driver receives shift instructions from a driver manager, and the ‘driver training’ role trains new drivers and ensures that existing drivers maintain their ‘route knowledge’.
Some interactions are expected to occur, following standard procedures (e.g. for the maintenance of route knowledge), but all the interactions of actual working practices should be included. In addition, measurements that provide evidence for each interaction and its effectiveness are also identified. Some of the interactions have conflicting influences: for example, the ‘driver trainer’ teaches defensive driving but the driver manager also tries to ensure the timetable is met.
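One way to record such an organisational model is as plain data: a set of actors and a list of interactions between them. The sketch below is an assumption about representation only. The class names, field names and the example measurements are hypothetical; the actors and the information exchanged are those mentioned in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actor:
    name: str
    responsibility: str  # responsibility relevant to the incident


@dataclass(frozen=True)
class Interaction:
    source: str        # actor providing the information
    target: str        # actor receiving the information
    information: str   # what is exchanged
    evidence: tuple = ()  # measurements of effectiveness (hypothetical examples)


ACTORS = {
    actor.name: actor
    for actor in [
        Actor("driver", "drives the train and responds to signals"),
        Actor("driver_management", "issues shift instructions, monitors timetable"),
        Actor("driver_training", "trains drivers, maintains route knowledge"),
    ]
}

INTERACTIONS = [
    Interaction("driver_management", "driver", "shift instructions"),
    Interaction("driver_management", "driver", "pressure to meet the timetable"),
    Interaction("driver_training", "driver", "route knowledge",
                evidence=("route knowledge assessments up to date",)),
    Interaction("driver_training", "driver", "defensive driving",
                evidence=("training records",)),
]


def received_by(actor_name):
    """All interactions whose information flows to the named actor."""
    return [i for i in INTERACTIONS if i.target == actor_name]


def undefined_actors():
    """Actor names used in interactions but missing from ACTORS."""
    used = {i.source for i in INTERACTIONS} | {i.target for i in INTERACTIONS}
    return used - set(ACTORS)


driver_inputs = received_by("driver")
```

Listing everything received by one actor makes conflicting influences visible, such as defensive driving and timetable pressure both arriving at the driver.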
4.2 Modelling Information: Entities and Attributes
The interactions within the organisation influence the frequency of accidents. The next step in the derivation of a causal model for predicting the accident frequency is to identify the attributes or properties to be represented as variables in the causal model. We use a standard relational data modelling approach. Each data entity derives either from an actor or from an interaction in the organisational model. Entities in the SPAD model include actors like ‘driver’ and interactions such as the ‘timetable’, since a tight timetable may pressure a driver to drive faster than the ‘defensive driving’ principle would recommend.
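As a rough sketch of this step, each entity can be recorded with its origin (actor or interaction) and its attributes, and every attribute then becomes a candidate variable for the causal model. The `Entity` class and the attribute names below are illustrative assumptions; ‘route knowledge’, ‘alertness’ and a tight timetable are mentioned in the paper, but the paper does not give a schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    origin: str        # "actor" or "interaction" in the organisational model
    attributes: tuple  # properties that may become BN variables


# Hypothetical extract of an information model.
ENTITIES = [
    Entity("driver", "actor", ("route_knowledge", "alertness")),
    Entity("timetable", "interaction", ("tightness",)),
]

# Relational links between entities (assumed for illustration).
RELATIONS = [("driver", "timetable")]


def candidate_variables(entities):
    """One candidate BN variable per entity attribute, named entity.attribute."""
    return [f"{e.name}.{attribute}" for e in entities for attribute in e.attributes]


def related_entities(name):
    """Entities directly linked to the named entity in RELATIONS."""
    linked = set()
    for left, right in RELATIONS:
        if left == name:
            linked.add(right)
        elif right == name:
            linked.add(left)
    return linked


variables = candidate_variables(ENTITIES)
```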
4.3 Accident Scenarios: Events and Influences
Possible scenarios leading to an accident are analysed. Each scenario consists of:
• Events: the sequence of events leading to the accident. Each event has a probability (or frequency) of occurring. Following the MARS model [10], we consider the approach to the signal to be composed of ‘detection’, ‘decision’ and ‘response’ stages. An event is a failure at one of these stages.
• Causal factors: attributes that influence the probability that an event follows the previous event in the sequence.
SPAD incident reports can be used to identify SPAD scenarios. Figure 4 shows one scenario modelled by a BN: the driver passes the signal at danger, having mistakenly read the signal aspect for an adjacent line. A SPAD of this type occurred in March 2003 at signal E744; the investigation concluded that the driver read an adjacent signal that was displaying a single yellow cautionary aspect (see http://www.hse.gov.uk/railways/spad/investigation/2003/27-03-2003.htm). The event nodes are shown shaded and other nodes are the immediate influences on these events.
[Figure 4: BN graph. Node labels: driver route knowledge; pressure on driver; driver alertness; infrastructure factors; late brake application; read across; read-across at proceed; brakes not applied; SPAD.]
Figure 4. SPAD Scenario: Read Across an Adjacent Signal
The unshaded nodes show the influences on the likelihood of the SPAD scenario; for example, infrastructure influences read-across. The investigation noted above found that the two signals are approached on a left-hand curve and this brings the adjacent signal into the driver’s direct line of sight for a short time. Many infrastructure factors for SPADs are known, with checklists used for risk assessment of individual signals. The final step to complete the causal model is to merge the event chains for each scenario (see Figure 2) and to elaborate the causal factors, as described in the next section.
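To make the idea of a scenario concrete, the sketch below treats the read-across scenario as a chain of events in which each step has a conditional probability, and a causal factor shifts the probability of one step. Only the influence of infrastructure on read-across is stated in the text; the link from driver alertness to ‘brakes not applied’, the assumption that the two factors are independent, and all of the numbers are hypothetical placeholders.

```python
# All probabilities are hypothetical placeholders, per approach to a signal
# at danger; they are not elicited or measured values.
P_POOR_INFRASTRUCTURE = 0.1  # adjacent signal falls in the driver's line of sight
P_LOW_ALERTNESS = 0.2

# P(read across | infrastructure), keyed by "poor infrastructure".
P_READ_ACROSS = {False: 0.001, True: 0.02}
# P(adjacent signal shows a proceed aspect | read across).
P_READ_ACROSS_AT_PROCEED = 0.5
# P(brakes not applied | read-across at proceed), keyed by "low alertness".
P_BRAKES_NOT_APPLIED = {False: 0.5, True: 0.9}
# P(SPAD | brakes not applied).
P_SPAD = 0.8


def scenario_probability(poor_infrastructure, low_alertness):
    """P(whole event chain occurs) for fixed states of the causal factors."""
    return (
        P_READ_ACROSS[poor_infrastructure]
        * P_READ_ACROSS_AT_PROCEED
        * P_BRAKES_NOT_APPLIED[low_alertness]
        * P_SPAD
    )


def marginal_scenario_probability():
    """Average the chain probability over the (assumed independent) factors."""
    total = 0.0
    for infrastructure in (False, True):
        p_infra = P_POOR_INFRASTRUCTURE if infrastructure else 1 - P_POOR_INFRASTRUCTURE
        for alertness in (False, True):
            p_alert = P_LOW_ALERTNESS if alertness else 1 - P_LOW_ALERTNESS
            total += p_infra * p_alert * scenario_probability(infrastructure, alertness)
    return total


worst_case = scenario_probability(poor_infrastructure=True, low_alertness=True)
best_case = scenario_probability(poor_infrastructure=False, low_alertness=False)
overall = marginal_scenario_probability()
```

Comparing `worst_case` with `best_case` shows how causal factors change the frequency of the scenario without determining whether it occurs.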
4.4 Completing the Causal Model
The causal factors of the scenarios are attributes of entities in the information model (section 4.2); each factor may in turn have causes. For example, the driver training activity within the TOC will influence the average driver’s route knowledge. The causal relationships between the attributes and their strengths are determined by interviews with ‘experts’ in each process, such as drivers and driver trainers. Although we have not yet completed these interviews, we expect that the relational structure of the information model can be used to simplify the identification of causal links, with most causes found in nearby related entities. Data on past incidents is also useful. For example, signal E744 has been passed at danger seven times previously since 1994; given an estimate of the demand rate, an estimate of the SPAD rate per demand for E744 can be derived.
Data on the overall incidence of SPADs, such as that reported monthly by RSSB, can also be used to calibrate the BN model with the overall frequency of SPADs and available breakdowns of this frequency.
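The rate-per-demand estimate for a single signal can be sketched as follows. The count of seven SPADs comes from the text; the observation period of about nine years (1994 to early 2003) is an approximation, and the demand rate (approaches to the signal while at danger, per year) is a hypothetical figure, since the paper gives none. The second function shows one possible way to combine the signal's own record with network-wide data, using a Beta prior; this is a standard Bayesian technique chosen for illustration, not a method described in the paper, and the prior parameters are invented.

```python
SPAD_COUNT = 7            # from the text: SPADs at signal E744 since 1994
YEARS_OBSERVED = 9        # assumption: roughly 1994 to early 2003
DEMANDS_PER_YEAR = 2000   # hypothetical: approaches to the signal at danger


def spad_rate_per_demand(spad_count, demands_per_year, years):
    """Point estimate: SPADs divided by the estimated number of demands."""
    demands = demands_per_year * years
    if demands <= 0:
        raise ValueError("number of demands must be positive")
    return spad_count / demands


def posterior_mean_rate(spad_count, demands, prior_alpha, prior_beta):
    """Mean of the Beta posterior after observing spad_count SPADs in demands."""
    return (prior_alpha + spad_count) / (prior_alpha + prior_beta + demands)


total_demands = DEMANDS_PER_YEAR * YEARS_OBSERVED
point_estimate = spad_rate_per_demand(SPAD_COUNT, DEMANDS_PER_YEAR, YEARS_OBSERVED)

# Hypothetical prior with mean 1e-4 SPADs per demand, standing in for a
# network-wide rate derived from aggregate data.
calibrated_estimate = posterior_mean_rate(
    SPAD_COUNT, total_demands, prior_alpha=1, prior_beta=9999
)
```

With these placeholder figures the calibrated estimate lies between the prior mean and the signal's own point estimate, which is the behaviour one would want when a signal has only a short incident history.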
5 Model Applications
We have described a systematic method of deriving a causal model for the occurrence of operational accidents. The model is derived from an analysis of the interactions within an organisation operating a system, from accident scenarios and from elicited ‘expert’ judgements of the strength of cause and effect relationships. Since the model is represented by a BN it can be used to calculate accident frequencies and therefore risk levels. In the case of SPADs, one possible use of the model would be to identify high-risk signals, using data characterising each signal. The model could also be used to monitor changes in an organisation’s susceptibility to SPADs, using data gathered from audits of activities such as driver training.
The causal model includes organisational factors and so meets some of the objectives described in [7]. A key advance is for audit, risk assessment and, possibly, accident investigation to use a single model of accident cause. We do not yet explicitly model dynamic aspects of the organisation, such as the slow decay of safe working practices after a long period of safe operation. Further, the chosen case study has some special features: data is readily available on SPADs and they follow a small number of known scenarios. In future, we hope to show that similar models can be constructed for other accidents.
References
1. Cullen, The Ladbroke Grove Rail Inquiry – Part 2 Report, HSE Books, 2002.
2. Davis L., Downes M. SPAD Report for October 2003. Rail Safety and Standards Board, November 2003. Available from www.rssb.co.uk.
3. Leveson N. A new accident model for engineering safer systems. Safety Science, Elsevier Science Ltd., 2003 (to appear).
4. Neil M., Fenton N. Building Large Scale Bayesian Networks. Knowledge Engineering Review, 15(3), 257-284, Sept 2000.
5. Neil M., Fenton N., Forey S., Harris R. Using Bayesian Belief Networks to Predict the Reliability of Military Vehicles. IEE Computing and Control Engineering J, 12(1), 11-20, 2001.
6. Neil M., Shaw R., Johnson S., Malcolm B., Donald I., Cheng Qui Xie. Measuring & Managing Culturally Inspired Risk, in Proceedings of the Eleventh Safety-critical System Symposium, Bristol, UK, 4-6 February 2003.
7. Rasmussen J. Risk Management in a Dynamic Society: A Modelling Problem. Safety Science, vol. 27, No. 2/3, Elsevier Science Ltd., 1997, pages 183-213.
8. Reason J. Managing the Risks of Organizational Accidents. Ashgate Publishing, ISBN 1 84014 105 0, 1997.
9. Sitwell G., Purcell S. Assessment of Investigations into Signals Passed at Danger (SPADs), BL2077 004 – TR06, WS Atkins Rail Limited, June 2001.
10. Wright K., Embrey D. Using the MARS Model for Getting at the Causes of SPADs, Rail Professional, 2000.