Performance of Early Warning Scoring Systems to Detect Patient Deterioration in the Emergency Department

Mauro D. Santos, David A. Clifton, and Lionel Tarassenko

Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, United Kingdom
{mauro.santos,david.clifton,lionel.tarassenko}@eng.ox.ac.uk

Abstract. Acute hospital wards in the UK are required to use Early Warning Scoring (EWS) systems to monitor patients’ vital signs. These are often paper-based, and involve the use of heuristics to score the vital signs, which are measured every 2-4 hours by nursing staff. If these scores exceed pre-defined thresholds, the patient is deemed to be at risk of deterioration. In this paper we compare the performance of EWS systems that use different approaches to score abnormal vital signs in identifying acutely ill patients attending the Emergency Department (ED). We incorporate data acquired from bed-side monitors into the EWS system, thereby offering the possibility of performing patient observations automatically between manual observations.

Keywords: Emergency Department, Early Warning Scoring System, Receiver Operator Characteristic Analysis.

1 Introduction

The ED is often one of the busiest wards in the hospital due to unscheduled admissions, diverse patient clinical conditions, and the requirement within the UK NHS to diagnose, provide initial treatment for, and discharge 98% of patients within 4 hours [1]. In these conditions, patient deterioration may be missed between clinical observations. Vital signs are often monitored using paper-based Track and Trigger (T&T) charts, where an alert is generated if the combined score of heart rate (HR), respiration rate (RR), oxygen saturation (SpO2), systolic blood pressure (Sys BP), temperature (Temp), and Glasgow Coma Scale (GCS) is higher than a predefined threshold. T&T charts use Early Warning Scoring (EWS) systems to score vital signs and, although a wide variety of EWS systems have been proposed, there is no clear evidence for their validity, reliability, utility, or performance [2,3]. Continuous monitoring systems such as bed-side monitors are also present in some areas of the ED and can identify, during the intervals between the intermittent nurse observations, patients who will require escalation of care in the ED. However, studies have shown that these systems suffer from a high rate of false alerts, such that they are often ignored [4].

J. Gibbons and W. MacCaull (Eds.): FHIES 2013, LNCS 8315, pp. 159–169, 2014. © Springer-Verlag Berlin Heidelberg 2014

1.1 Patient Care in the Emergency Department

Depending on the triage result, the patient is sent to one of three areas in the ED¹: “Minors”, “Majors”, or the “Resuscitation room” (Resus). The Minors area admits patients who do not require immediate treatment, and whose condition is deemed to be of low severity. The Majors area accommodates adult patients who have a high likelihood of needing further treatment in the hospital, and Resus includes patients with life-threatening illness or injury. The frequency of the clinical observations taken by nurses depends on the area of the ED to which the patient has been admitted and on the patient’s condition. Patients in Majors and Resus will typically also be continuously monitored via bed-side monitors.

2 Methods

2.1 Dataset

We used a dataset obtained from 3039 patients (age > 15 years) in an observational study conducted in 2012 in the Majors area of the ED of the John Radcliffe Hospital, Oxford (Figure 1). A total of 6812 clinical observations, comprising the vital signs previously mentioned, were collected from ED T&T paper charts. About 6325 hours of continuous vital-sign data were obtained from bed-side monitors, comprising HR, RR, and SpO2 sampled approximately every 30 seconds, and BP sampled every 30 minutes. Patients are observed by nursing staff every hour in the Majors area if the vital signs are normal, and at 30- and 15-minute intervals when the EWS is greater than or equal to 3. Nurses record in the hospital’s Electronic Patient Record (EPR) system the times at which patients move between different ED clinical areas. The times of patients moving from the Majors to the Resus area were collected from this system. All data were anonymised with study numbers.

2.2 Early Warning Scoring Systems

Fig. 1. CONSORT diagram showing the total number of patients attending the ED Majors area between April 23rd and June 10th, 2012, and the number of patients with and without escalation events used to study the performance of EWS in identifying physiological deterioration.

¹ This set of ED areas exists in the John Radcliffe Hospital, Oxford; equivalents may be found in most UK hospitals.

One single-parameter EWS and three multi-parameter EWS systems are used in the retrospective analysis of the dataset:

1. The single-parameter EWS is the Medical Emergency Team (MET, Table 2) calling criteria [5], which are used to call a Medical Emergency Team when there are gross changes in a single vital sign or a sudden fall in the level of consciousness. It was implemented on the assumption that the physiological processes underlying catastrophic deterioration, such as cardiopulmonary arrest, are identifiable and treatable 6 to 8 hours before the condition occurs [5,6]. This system has the advantage of being easy for clinical staff to follow.
2. The modified EWS (MEWS, Table 3) [7] is a multi-parameter system in which abnormal vital signs (HR, RR, SpO2, Sys BP, Temp, and GCS) are scored as 0, 1, 2 or 3 according to their level of abnormality. If one of the individual scores is 3, or if the aggregated score is greater than or equal to a threshold of 3, 4 or 5 (see below), the patient is considered to be at risk of deteriorating and a clinical review is requested. In our analysis, we test aggregated-score thresholds of 3 and 4 for MEWS to trigger an alert. The scoring thresholds for each vital sign were assigned according to expert opinion from doctors working at the John Radcliffe ED [8].
3. The centile-based EWS (CEWS, Table 4) [9] assigns scores to the vital signs depending on their centile value in a distribution computed from 64,622 h of vital-sign data. We also perform the analysis with alerting thresholds of 3 and 4 for the aggregated CEWS score.
4. The National EWS (NEWS, Table 5) [10] is a system in which the scores are optimised for predicting 24-hour mortality. It uses scores of 3 and 5 for the single- and multi-parameter alerting criteria, respectively. This system uses an additional parameter: the provision of supplemental oxygen to the patient, which adds 2 points to the total score.
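The multi-parameter alerting rule described in this list (an alert if any single vital sign scores 3, or if the aggregated score reaches the chosen threshold) can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the heart-rate and respiration-rate bands are placeholder values chosen for the example, not the values of Tables 3 to 5.

```python
from typing import Dict, List, Tuple

# A band is (lower bound inclusive, upper bound exclusive, score).
Band = Tuple[float, float, int]

# HYPOTHETICAL bands, for illustration only. They are NOT the MEWS, CEWS or
# NEWS values of Tables 3 to 5.
EXAMPLE_BANDS: Dict[str, List[Band]] = {
    "HR": [(0, 40, 3), (40, 50, 1), (50, 100, 0), (100, 120, 1),
           (120, 140, 2), (140, 1000, 3)],
    "RR": [(0, 8, 3), (8, 12, 1), (12, 21, 0), (21, 25, 2), (25, 1000, 3)],
}


def score_vital(name: str, value: float, bands: Dict[str, List[Band]]) -> int:
    """Return the 0-3 score of one vital sign from its band table."""
    for low, high, score in bands[name]:
        if low <= value < high:
            return score
    raise ValueError(f"{name}={value} is outside every band")


def ews_alert(observation: Dict[str, float], bands: Dict[str, List[Band]],
              aggregate_threshold: int,
              single_threshold: int = 3) -> Tuple[int, bool]:
    """Return (aggregated score, alert flag) for one set of vital signs."""
    scores = [score_vital(name, value, bands)
              for name, value in observation.items()]
    total = sum(scores)
    # Alert on the aggregated score OR on any single highly abnormal sign.
    alert = (total >= aggregate_threshold
             or any(s >= single_threshold for s in scores))
    return total, alert
```

With these placeholder bands, an observation of HR 110 and RR 22 aggregates to 3, so it alerts at an aggregated threshold of 3 but not at 4, which is the kind of difference the two MEWS and CEWS thresholds are intended to expose. The supplemental-oxygen term of NEWS is not modelled here.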

2.3 Application of EWS to Bed-Side Monitor Data

In this paper we use the EWS systems to score physiological abnormality both on the intermittent vital signs collected by nurses (i.e. the clinical observations) and on the continuous vital-sign data collected by bed-side monitors. EWS systems were created primarily to manage the patient’s condition using intermittent observations. Applying an EWS to continuous data requires a different methodology to trigger an alert, since artefacts caused by body movement or probe disconnection could easily result in a false alert. We therefore use a “persistence criterion”: an alert is deemed to begin if a physiological abnormality (EWS above alerting threshold) is scored for 4 minutes in a 5-minute window, and is deemed to end if less than 1 minute of physiological abnormality exists in a 3-minute window. This criterion has been applied in other patient monitoring studies [11].
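A minimal sketch of such a persistence criterion is given below. It assumes one abnormality flag per sample and evenly spaced samples every 30 seconds (the monitor data were sampled only approximately at this rate), and it reads "for 4 minutes" as "for at least 4 minutes". The function name, the parameter names and the choice of the sample at which the criterion is first met as the alert start are assumptions made for illustration, not details taken from [11].

```python
from typing import List, Sequence, Tuple


def persistence_alerts(
    abnormal: Sequence[bool],
    sample_period_s: float = 30.0,   # assumed regular sampling interval
    start_window_s: float = 300.0,   # 5-minute window to begin an alert
    start_abnormal_s: float = 240.0, # at least 4 minutes abnormal to begin
    end_window_s: float = 180.0,     # 3-minute window to end an alert
    end_abnormal_s: float = 60.0,    # less than 1 minute abnormal to end
) -> List[Tuple[int, int]]:
    """Return alerts as (start_index, end_index) pairs, end exclusive."""
    n_start = round(start_window_s / sample_period_s)
    n_end = round(end_window_s / sample_period_s)
    alerts: List[Tuple[int, int]] = []
    in_alert = False
    start = 0
    for i in range(len(abnormal)):
        if not in_alert:
            # Trailing window of up to n_start samples ending at sample i.
            window = abnormal[max(0, i - n_start + 1): i + 1]
            if sum(window) * sample_period_s >= start_abnormal_s:
                in_alert, start = True, i
        else:
            # Trailing window of up to n_end samples ending at sample i.
            window = abnormal[max(0, i - n_end + 1): i + 1]
            if sum(window) * sample_period_s < end_abnormal_s:
                alerts.append((start, i))
                in_alert = False
    if in_alert:
        # The alert was still active when the recording ended.
        alerts.append((start, len(abnormal)))
    return alerts
```

Under these assumptions, a 90-second burst of abnormal samples, such as might be produced by a movement artefact, never opens an alert, whereas four consecutive minutes of abnormality do.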

2.4 Performance of EWS Systems in the ED

This paper follows the patient-based analysis formulated in [12]. From the 3039 patients, we focus on “stable” and “unstable” (or “event”) patients for whom both complete clinical observation documentation and bed-side monitor data (continuous data) were available. In this analysis, “stable” patients are those who did not move to Resus and who were discharged home at the end of their stay in the Majors area (494 patients). 65 patients needed to be removed due to incomplete timestamp documentation. “Event” patients are those who were escalated from Majors to Resus due to physiological abnormality, including neurological abnormality, during their stay in the ED (41 patients). Inaccuracies may exist in the reporting of these data in the EPR; therefore, only escalations occurring at least 30 minutes after the patient’s arrival in the ED were considered. Earlier escalations could have already been planned at patient arrival (for example, for patients arriving by ambulance). The remaining patients, who had both clinical observations and continuous data but were admitted elsewhere in the hospital, were removed from this analysis. These patients may have had other escalations that did not require them to move to Resus. The analysis was subsequently carried out on a total of 494 patients, with a total of 1519 observation sets and about 1318 hours of continuous bed-side monitor data.

True Positives (TP) are defined as the “event” patients whose move to Resus was successfully detected by an abnormal period scored by the EWS system. False Negatives (FN) are “event” patients who were not detected because no abnormal periods were identified by the EWS system under consideration. We define a TP to have occurred if the alert is generated between an interval t before the first escalation and 10 min after the escalation (because the times of escalation to Resus are not precise). For this study, the performance of the systems is evaluated for t = 1 hour. A True Negative (TN) is a “stable” patient for whom no alerts were generated, and a False Positive (FP) is a “stable” patient for whom at least one alert was generated. One effect of this methodology is that the specificity, defined as $\mathrm{TN}/(\mathrm{FP}+\mathrm{TN})$, will be the same in all cases, since the TN and FP do not depend on the time t.
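The patient-level labelling described above can be sketched as follows. The function name and the representation of times as minutes since arrival are assumptions made for illustration. A patient with no escalation time is treated as “stable”.

```python
from typing import Optional, Sequence


def classify_patient(
    alert_times_min: Sequence[float],
    escalation_time_min: Optional[float],
    t_before_min: float = 60.0,  # interval t before escalation (1 hour here)
    t_after_min: float = 10.0,   # tolerance after the recorded escalation
) -> str:
    """Label one patient as 'TP', 'FN', 'TN' or 'FP'.

    Times are minutes since arrival (an assumed convention).
    escalation_time_min is None for a "stable" patient.
    """
    if escalation_time_min is None:
        # "Stable" patient: any alert at all counts as a false positive.
        return "FP" if len(alert_times_min) > 0 else "TN"
    window_start = escalation_time_min - t_before_min
    window_end = escalation_time_min + t_after_min
    detected = any(window_start <= a <= window_end for a in alert_times_min)
    return "TP" if detected else "FN"
```

Because the interval t enters only the “event” branch, changing t can move patients between TP and FN but never between TN and FP, which is why the specificity is the same for every choice of t.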



The performance of the EWS systems as a tool to detect patient deterioration in “event” patients is analysed for five different patient monitoring cases: (a) using nurses’ intermittent clinical observation data only; (b) using bed-side monitor continuous data only; (c) using bed-side monitor continuous data with the persistence criterion (as explained in Section 2.3); (d) using clinical observations and continuous data; and (e) using clinical observations and continuous data with the persistence criterion. Cases (a), (c) and (e) simulate three possible scenarios of patient monitoring in an ED. Cases (b) and (c) allow the effect of the persistence criterion on alert generation from continuous data to be studied. The accuracy, $\mathrm{ACC} = (\mathrm{TP}+\mathrm{TN})/(\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN})$, and the Matthews correlation coefficient (MCC) were used to study the performance. The MCC is a balanced measure of the quality of binary classifications, which can be used even if the classes are of very different sizes².
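As a worked illustration of these measures, the sketch below computes sensitivity, specificity, accuracy and MCC from the TP and TN counts together with the numbers of “event” and “stable” patients. The function name is hypothetical, and returning 0 for the MCC when its denominator is zero is a common convention adopted here as an assumption.

```python
import math
from typing import Dict


def performance(tp: int, tn: int, n_event: int, n_stable: int) -> Dict[str, float]:
    """Compute SE, SP, ACC and MCC from patient-level counts."""
    fn = n_event - tp    # "event" patients that were not detected
    fp = n_stable - tn   # "stable" patients with at least one alert
    se = tp / (tp + fn)
    sp = tn / (fp + tn)
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is reported as 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return {"SE": se, "SP": sp, "ACC": acc, "MCC": mcc}


# MET row of Table 1: TP = 19, TN = 390, with 41 "event" and 494 "stable" patients.
met = performance(tp=19, tn=390, n_event=41, n_stable=494)
```

For the MET row of Table 1 this gives a sensitivity of about 46%, a specificity of about 79%, an accuracy of about 76% and an MCC of about 0.16, in agreement with the tabulated values.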

3 Results

The median time to escalation to the Resus area for the “event” patients considered in this paper was 1.8 hours (range 38 min to 9 hours, Figure 2). Patients who were escalated to the Resus area within the first 30 min were assumed to have been escalated at arrival. In this paper we are interested in reviewing the use of EWS in patients who were escalated after arrival.

Fig. 2. a) Distribution of the time from patients’ arrival in the ED Majors area to their escalation to the Resus area. The escalations that happened within 30 min of the patient’s arrival in the ED were removed. The mean and median times to escalation for a total of 41 patients are 2.4 hours and 1.8 hours respectively (range 38 min to 9 hours).

² $\mathrm{MCC} = \dfrac{\mathrm{TP} \times \mathrm{TN} - \mathrm{FP} \times \mathrm{FN}}{\sqrt{(\mathrm{TP}+\mathrm{FP})(\mathrm{TP}+\mathrm{FN})(\mathrm{TN}+\mathrm{FP})(\mathrm{TN}+\mathrm{FN})}}$. The MCC returns a value between -1 and 1: 1 represents a perfect prediction, 0 a prediction no better than random, and -1 total disagreement between prediction and observation.



Fig. 3. Performance of EWS systems when considering patients escalated to Resus after arrival for three monitoring cases, and two different aggregated score thresholds for MEWS and CEWS. When appropriate, the aggregated score threshold used is presented next to the EWS system in the legend.

Fig. 4. Performance of EWS systems when considering patients escalated to Resus after arrival for the ideal patient monitoring cases, with and without persistence criterion on the continuous data and two different aggregated score thresholds for MEWS and CEWS. When appropriate, the aggregated score threshold used is presented next to the EWS system in the legend.

Table 1. Results of performance of EWS systems to identify physiological deterioration within 1 hour of the escalation to Resus. Results are sorted by accuracy (ACC).

EWS System   TP   TN    SE (%)   SP (%)   ACC (%)   MCC
MET          19   390   46       79       76        0.16
CEWS 4       20   345   49       70       68        0.107
MEWS 4       26   328   63       66       66        0.165
CEWS 3       26   303   63       61       61        0.134
MEWS 3       27   295   66       60       60        0.138
NEWS 5       26   289   63       59       59        0.118
Total        41   494   -        -        -         -



Figure 3 shows the performance results for detecting physiological abnormality, using the EWS systems previously described, within 1 hour of the escalation time, when using only the clinical observations or only the continuous data. These are shown in a Receiver Operator Characteristic (ROC) plot. The system with the best accuracy and MCC was MET when using the clinical observations (ACC=88% and MCC=0.21, Table 6). When using only clinical observations, MEWS shows the next best performance, but when using continuous data CEWS shows higher accuracy than MEWS, though lower MCC. The best-performing systems present high specificity (70 to 80%) but low sensitivity (30 to 50%). The use of continuous data without the persistence criterion generates many false alerts, contributing to a lower accuracy.

Figure 4 shows the performance of EWS systems when using both clinical observations and continuous data, and consequently the combination of their alerting criteria. Case (d), with no persistence criterion for the continuous data, presents a low accuracy (