ROBUST SENSOR FUSION IMPROVES HEART RATE ESTIMATION: CLINICAL EVALUATION
Jeffrey M. Feldman, MD, MSE,¹ Mehboob H. Ebrahim,² and Izhak Bar-Kana, PhD²
Feldman JM, Ebrahim MH, Bar-Kana I. Robust sensor fusion improves heart rate estimation: clinical evaluation. J Clin Monit 1997; 13: 379–384
ABSTRACT. Objective. To determine whether Robust Sensor Fusion (RSF), a method designed to fuse data from multiple sensors with redundant heart rate information, can be used to improve the quality of heart rate data, and whether the improved estimate of heart rate can reduce the number of false and missed heart rate alarms.
Methods. A total of 85 monitoring periods were investigated: 12 from the operating room, 60 from the adult ICU and 13 from the pediatric ICU. The operating room periods began with induction of anesthesia and ended at the completion of the anesthetic. For the ICU data, four-hour blocks of time were studied. For each monitoring period, heart rate (HR) values were recorded at intervals of 5 seconds or less from the electrocardiogram (ECG), pulse oximeter (SpO2) and intraarterial catheter (IAC) using a SpaceLabs Medical Gateway™ connected to a SpaceLabs Medical PC2™. Fused estimates of HR were derived for every time point using RSF, and all results were accepted regardless of confidence value. Data were annotated manually to identify the "reference" HR (the HR value most likely to be correct) at all time points. All HR values from the sensors and the fused estimate that differed from the reference HR by more than 5 beats/min were considered inaccurate. For each monitoring period, the total time per hour that data were either inaccurate or unavailable was calculated for each sensor as well as for the fused estimates. The total time of false and missed HR alarms was found for all sensors and the fused estimate by comparing the data to thresholds for high HR alarms at 150, 130 and 110 bpm and for low HR alarms at 50, 40 and 30 bpm.
Results. The fused estimate of HR was consistently as good as or better than the estimate available from any individual sensor. The fused estimates also consistently reduced the incidence of false alarms compared with individual sensors without an unacceptable incidence of missed alarms.
Discussion. Redundancy in sensor measurements can be used to improve HR estimation in the clinical setting. Methods like RSF, which improve the quality of monitored data and reduce nuisance alarms, will enhance the value of patient monitors to clinicians.
KEY WORDS. Patient monitoring, heart rate, electrocardiogram, alarms, sensor fusion, robust estimation.
From the Departments of ¹Anesthesiology, Allegheny University of the Health Sciences, and ²Electrical and Computer Engineering, Drexel University, Philadelphia, PA, U.S.A.
Received Dec 16, 1996, and in revised form Sep 9, 1997. Accepted for publication Sep 9, 1997.
Address correspondence to Jeffrey M. Feldman, Department of Anesthesiology, Allegheny University of the Health Sciences, MCP Division, 3300 Henry Ave., Philadelphia, PA 19129, U.S.A. E-mail: [email protected]
Journal of Clinical Monitoring 13(6): 379–384, November 1997. © 1997 Kluwer Academic Publishers. Printed in the Netherlands.

INTRODUCTION
The technology available for physiologic monitoring during anesthesia and in the critical care unit has advanced dramatically over the years. Despite these advances, it has been said that "the best monitor is a drop of glue applied between a finger and the temporal pulse." Although it is not a popular sentiment among developers of monitoring technology, this simple statement contains certain truths which should be heeded. First, it emphasizes that the human is still needed as part of the monitoring effort to analyze data and make decisions. Second, implicit in the recommendation to feel the pulse directly is a lack of confidence in the data available from physiologic monitors.

Despite concerns about the quality of data available from physiologic monitors, there are few studies which document the accuracy of monitored data. Wesseling and Smith showed that continuous blood pressure data are lacking almost 10% of the time when a continuous flush is used [1]. We have found that estimates of heart rate are frequently in error [2]. Certainly clinical experience indicates that monitored data are often incorrect. Clinicians are also well aware of the useless alarms which blare for no physiologic reason [3].

Although the role of technology in patient care may be subject to debate, one cannot dispute that the role of the human remains firm; machines are not likely to replace the human in patient care, nor should they. Instead, the focus should be on improving the machines to serve as better tools for the human. This effort is ongoing and includes projects to create more sophisticated alarm algorithms and advanced information systems for organizing clinical data. Although these systems have the potential to aid the clinician greatly, they all depend on one basic building block: good quality data! Without quality data these systems are unlikely to be effective.

This project is directed towards evaluating a new statistical methodology for combining estimates derived from sensor readings, called Robust Sensor Fusion (RSF) [4] (Figure 1). The underlying concept is to combine or "fuse" redundant data from different sensors to obtain the best estimate of the common information, and to identify when a particular sensor is in error. In addition to looking for consistency between sensors, the technique also uses consistency with past information and with physiologic expectations to find the best estimate of heart rate. If RSF is effective, the fused estimate should be more consistently reliable than the estimates available from any individual sensor.

Although the technique is applicable to any setting where there is more than one source of the same information, this study focuses on heart rate estimation. Continuous heart rate measurements are available from several sensors, including the electrocardiogram (ECG), pulse oximeter (SpO2) and intraarterial catheter (IAC). Since heart rate estimates are available from multiple sensors, it is possible to combine the heart rate information and analyze that information in concert to determine how likely it is that each individual sensor has the correct estimate. The primary question we sought to address is whether we could combine information from multiple signals to improve heart rate estimation relative to the accuracy of the heart rate estimates available from any individual sensor. Secondarily, we sought to examine whether the improved estimation of heart rate could reduce the incidence of false heart rate alarms.

Fig. 1. Diagrammatic representation of sensor fusion process. Sample data represent a trend plot of derived heart rate values.

METHODS

A total of 85 monitoring periods were investigated: 12 from the operating room, 60 from the adult ICU and 13 from the pediatric ICU. Patient consent was not sought since no alteration in clinical management was required and anonymity was maintained. The operating room periods began with induction of anesthesia and ended at the completion of the anesthetic. For the ICU data, four-hour blocks of time were studied. For each monitoring period, HR values were recorded at intervals of 5 seconds or less from the ECG, SpO2 and IAC using a SpaceLabs Medical Gateway™ (Redmond, WA) connected to a SpaceLabs Medical PC2™ patient monitor. The values for blood pressure and percent oxygen saturation were also recorded from the IAC and SpO2, respectively.

Fused estimates of HR were derived using RSF whenever a new sensor reading became available [4]. Since RSF is a statistical method, each estimate derived from multiple sensors has an associated confidence value which expresses how likely the fused estimate is to be correct. All fused estimates were recorded along with the associated confidence values.

Data recorded from each sensor were annotated manually by one of the authors to identify the "reference" HR (the HR value most likely to be correct) at all time points. Whenever a HR estimate from a sensor or the fused result differed from the reference HR by more than 5 beats/min, the estimate was considered inaccurate. Any data missing at a given time point were considered unavailable. For each monitoring period, the total time per hour that heart rate data were either inaccurate or unavailable was calculated for each individual sensor (Ti, where i is the sensor number) as well as for the fused estimates (Tf). The sensor fusion advantage was defined as the ratio of Ti to Tf, so that a larger ratio indicates a relatively greater amount of faulty data from the individual sensor when compared with the fused estimate.

Since RSF is a statistical methodology, a confidence value is generated that can be used to optimize the performance of the technique. By changing the threshold of the confidence value at which a fused estimate is accepted, the ability to discriminate between good and bad data is altered. To evaluate the impact of decision threshold on performance, the number of minutes of false positive and false negative estimates was found for confidence value thresholds of 0.0, 0.2, 0.4, 0.6, 0.8 and 1.0.
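The faulty-time and fusion-advantage definitions above can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names are hypothetical, and it assumes evenly spaced samples, `None` for an unavailable value, and a reference HR at every time point.

```python
from typing import Optional, Sequence

TOLERANCE_BPM = 5.0  # Methods: more than 5 beats/min from the reference is inaccurate


def faulty_minutes_per_hour(
    estimates: Sequence[Optional[float]],
    reference: Sequence[float],
) -> float:
    """Minutes per hour that a source was inaccurate or unavailable.

    Assumptions (not from the paper): samples are evenly spaced, a missing
    value is None, and the reference HR is available at every time point.
    """
    if len(estimates) != len(reference) or not reference:
        raise ValueError("estimates and reference must be non-empty and equal in length")
    faulty = 0
    for est, ref in zip(estimates, reference):
        if est is None or abs(est - ref) > TOLERANCE_BPM:
            faulty += 1
    # With even sampling, the faulty fraction of samples scales directly to 60 min.
    return 60.0 * faulty / len(reference)


def fusion_advantage(t_sensor: float, t_fused: float) -> float:
    """Ratio Ti / Tf; larger means relatively more faulty data from the sensor."""
    if t_fused == 0:
        raise ValueError("ratio is undefined when the fused estimate is never faulty")
    return t_sensor / t_fused


# Totals reported in Table 1 (minutes over the whole case; the ratio is the
# same whether the times are expressed per case or per hour).
for name, total in (("ECG", 94.4), ("SpO2", 146.8), ("IAC", 123.8)):
    print(name, round(fusion_advantage(total, 26.7), 1))
```

Run on the Table 1 totals, the ratios round to 3.5, 5.5 and 4.6, matching the fusion advantage column reported there.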
A false positive estimate was defined as an incorrect HR value identified as good. A false negative estimate was defined as a correct estimate labeled as bad. Performance compromises were evaluated by examining the change in false positive and false negative estimates at different confidence thresholds. As the confidence value threshold for accepting a fused estimate as correct is increased, one would expect the number of minutes of false positive estimates to decrease (less bad data accepted as good) and the number of false negative minutes to increase (more good data rejected as bad).

The impact of heart rate estimates on heart rate alarms was evaluated by calculating the number of data points that would trigger an alarm for high HR alarms at 150, 130 and 110 bpm and for low HR alarms at 30, 40 and 50 bpm. The total time of false and missed HR alarms was found for all sensors and for the fused estimate. False alarms were defined as those sensor values or fused estimates that violated the alarm threshold when the reference heart rate was within the threshold. Missed alarms were defined as those sensor values or fused estimates that did not violate the alarm threshold when the reference heart rate did. The relative numbers of false and missed alarms were calculated at confidence value thresholds of 0.0, 0.2, 0.4, 0.6, 0.8 and 1.0.

RESULTS

When the number of minutes of unavailable plus artifactual data per hour is compared for each monitoring period, the fused estimate of heart rate is as good as or better than the estimate derived from any individual sensor (Figure 2). Note that the IAC was consistently the worst source of heart rate information, followed by the pulse oximeter and then the electrocardiogram. No sensor was consistently reliable, however, so the fusion method had to identify which sensors were correct at every point in time.

Although the individual sensors performed quite well for many of the monitoring periods, there were a large number of monitoring periods with significant artifact. Of the monitoring periods studied, almost one-third had 10% or more of the individual sensor data annotated as faulty. In many cases, all three sensors had significant artifact at different times during the monitoring period. The performance results from a representative case (abdominal aortic aneurysm repair) demonstrate the fusion results when all three sensors had significant artifact. In this instance, the total case duration was 373 min and each of the individual sensors was either faulty or unavailable for 25% or more of the case time (Table 1).
Even at a confidence value of 0.0, where all fused estimates are accepted as good, the total time of false positive fused estimates was 23.5 min, much less than the total time of faulty heart rate estimates from the individual sensors. These results can also be used to demonstrate the impact upon performance of changing the confidence value threshold for accepting a fused estimate as correct (Figure 3). As the threshold for accepting a value as correct is increased, fewer bad fused estimates are accepted as good, i.e., there are fewer false positives. The penalty is more false negative results, i.e., some good fused estimates are rejected as false because the confidence value is too low.

Fig. 2. Total number of minutes of unavailable plus artifactual data per hour for the fused estimate and individual sensors. The monitoring periods are ranked from worst to best. All fused estimates accepted as correct (confidence value 0.0).

Both low and high heart rate alarms were improved for many of the monitoring periods when alarms were based upon the fused estimate of heart rate. This was true for both false and missed alarms. There were more false and missed heart rate alarms based upon the arterial pressure signal and the pulse oximeter than upon the fused estimate of heart rate and the ECG. In all cases except one, the false and missed alarms from RSF were fewer than those obtained from the ECG. Increasing the confidence value at which fused estimates would be accepted reduced the number of false positive alarms and increased the number of missed alarms. Low and high heart rate alarm results are presented for the monitoring period with the greatest number of false and missed alarms, comparing heart rates obtained from the ECG (the best individual sensor) with the fused estimate (Tables 2, 3).
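The false and missed alarm definitions given in the Methods can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, samples are assumed evenly spaced, a violation is taken to be strictly above a high threshold or strictly below a low one, and a missing value (`None`) is assumed to raise no alarm.

```python
from typing import Optional, Sequence, Tuple

HIGH_THRESHOLDS_BPM = (150, 130, 110)  # high HR alarm limits used in the study
LOW_THRESHOLDS_BPM = (30, 40, 50)      # low HR alarm limits used in the study


def violates(hr: Optional[float], threshold: float, kind: str) -> bool:
    """True if hr breaches the alarm limit. Strict inequality is an assumption."""
    if hr is None:
        return False  # assumption: an unavailable value raises no alarm
    if kind == "high":
        return hr > threshold
    if kind == "low":
        return hr < threshold
    raise ValueError("kind must be 'high' or 'low'")


def alarm_minutes_per_hour(
    estimates: Sequence[Optional[float]],
    reference: Sequence[float],
    threshold: float,
    kind: str,
) -> Tuple[float, float]:
    """Return (false alarm, missed alarm) time in minutes per hour."""
    if len(estimates) != len(reference) or not reference:
        raise ValueError("estimates and reference must be non-empty and equal in length")
    false_count = 0
    missed_count = 0
    for est, ref in zip(estimates, reference):
        est_alarm = violates(est, threshold, kind)
        ref_alarm = violates(ref, threshold, kind)
        if est_alarm and not ref_alarm:
            false_count += 1   # estimate alarms, reference is within limits
        elif ref_alarm and not est_alarm:
            missed_count += 1  # reference breaches the limit, estimate does not
    scale = 60.0 / len(reference)  # even sampling: sample fraction -> min/hour
    return false_count * scale, missed_count * scale
```

Like the analysis in the paper, this counts every offending data point, so the result is a time total rather than a count of distinct alarm events.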
Table 1. Performance assessment for RSF applied to data obtained during abdominal aortic aneurysm repair. The total case duration was 373 minutes. Each of the individual sensors either provided artifact or was unavailable for a greater amount of time than the fused estimates. The fusion advantage is the ratio of the total time each sensor was unavailable or artifactual to the total time for the fused estimate.

          Unavailable (mins)   Artifact (mins)   Total (mins)   Fusion advantage
ECG               3.1                91.3             94.4             3.5
SpO2             97.7                49.1            146.8             5.5
IAC              19.1               104.7            123.8             4.6
Fusion            3.2                23.5             26.7

DISCUSSION
The results of this study indicate that fused estimates of heart rate obtained using RSF are consistently better than the estimates that can be obtained from any individual sensor. Not only is the quality of the heart rate estimates improved, but fewer false high and low heart rate alarms will occur if the alarms are based upon the fused estimate of heart rate. The alarm data must be interpreted carefully. Our analysis counted every data point that would trigger an alarm. In reality, a run of consecutive data points beyond the threshold would sound as a single alarm rather than as one alarm per data point.
Fig. 3. The change in performance for fused estimates is indicated in relation to the confidence value decision threshold. As the threshold is raised, fewer bad fused estimates are accepted as good (false positives) but the number of good fused estimates which are considered bad increases (false negatives). Data obtained from an actual case (abdominal aortic aneurysm repair).
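The trade-off shown in Fig. 3 can be reproduced in outline by sweeping the decision threshold over annotated fused estimates. The sketch below is a hedged illustration, not the RSF implementation: `FusedSample` and `sweep_thresholds` are hypothetical names, a fused estimate is assumed to be accepted when its confidence is greater than or equal to the threshold, and samples are assumed evenly spaced.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence, Tuple

TOLERANCE_BPM = 5.0  # more than 5 beats/min from the reference is incorrect
THRESHOLDS = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)  # confidence thresholds from the study


@dataclass
class FusedSample:
    hr: float          # fused heart rate estimate (bpm)
    confidence: float  # confidence value attached to the estimate, 0.0 to 1.0
    reference: float   # manually annotated reference heart rate (bpm)


def sweep_thresholds(
    samples: Sequence[FusedSample],
    thresholds: Iterable[float] = THRESHOLDS,
    sample_interval_s: float = 5.0,  # assumption: one sample every 5 seconds
) -> Dict[float, Tuple[float, float]]:
    """Map each threshold to (false positive minutes, false negative minutes)."""
    results = {}
    for threshold in thresholds:
        false_pos = 0
        false_neg = 0
        for s in samples:
            correct = abs(s.hr - s.reference) <= TOLERANCE_BPM
            accepted = s.confidence >= threshold  # acceptance rule is an assumption
            if accepted and not correct:
                false_pos += 1  # bad estimate accepted as good
            elif correct and not accepted:
                false_neg += 1  # good estimate rejected as bad
        minutes_per_sample = sample_interval_s / 60.0
        results[threshold] = (false_pos * minutes_per_sample,
                              false_neg * minutes_per_sample)
    return results
```

With this acceptance rule, a threshold of 0.0 accepts every fused estimate, so it yields no false negatives, and raising the threshold can only lower false positive time and raise false negative time.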
Nevertheless, the positive impact upon alarms is documented. In many cases, the amount of bad data from any individual sensor is not that great, in which case the improvement to be realized by the fusion approach is marginal. In those cases where there are large amounts of bad data from individual sensors, the advantage of the fusion approach becomes clear. RSF should therefore be of particular assistance in difficult situations where patients are moving or the signals are worsening due to hypoperfusion or dysrhythmias. An important advantage of the RSF technique is that for any given measurement it does not matter which sensor is providing bad data.

Table 2. False and missed high heart rate alarms for the worst case monitoring period using ECG data and the fused estimate of heart rate, in minutes per hour. All fused estimates were accepted as correct, that is, the decision threshold for accepting a fused estimate was a confidence value of 0.0.

Alarm threshold (bpm)     150     130     110
False alarms    ECG       0.6     1.0     1.9
                RSF       0.1     0.1     0.2
Missed alarms   ECG       1.63    1.8     5.3
                RSF       0.1     0.5     1.6

Table 3. False and missed low heart rate alarms for the worst case monitoring period using ECG data and the fused estimate of heart rate, in minutes per hour. All fused estimates were accepted as correct, that is, the decision threshold for accepting a fused estimate was a confidence value of 0.0.

Alarm threshold (bpm)     30      40      50
False alarms    ECG       4.7     5.2     5.8
                RSF       2.1     2.2     2.3
Missed alarms   ECG       0       0       0.8
                RSF       0       0       0

The method of analysis employed in this study was designed to document the quality of the data obtained. For any monitoring system there are design compromises that influence the quality of the data. Efforts to reject artifact will invariably be limited by concern about rejecting useful data. Ultimately, a compromise is reached where a certain amount of artifact is accepted so as not to label an unacceptable amount of good data as artifactual. The statistical method employed here generates a confidence value which can be used as a basis for decision making and will influence the performance of the method. The analytical approach documents the impact of selecting different confidence values as the decision threshold. Improvements in the method will be evident as better discrimination between good and bad data at a lower decision threshold.

It is desirable to identify the optimal confidence value to use as a decision threshold, one that will maximize the ability to discriminate good from bad data, or true from false alarms. The data obtained for this study come from patients and clinical settings diverse enough that it is not possible to pool the data to explicitly determine the optimal confidence value decision threshold. Additional study is needed in similar patient groups to find the optimal confidence value threshold.

One limitation of the study is that we do not know the true heart rate with certainty. The reference heart rate used for data analysis was obtained by manual annotation of the data files. Each observer (the authors) indicated the reference heart rate by inspecting all of the available signals and making a best guess of the true heart rate. If too much noise was present, the reference heart rate was considered unavailable. The results of the analysis are, however, consistent across many cases, which provides confidence that the manual annotation is appropriate.

An important limitation on performance imposed by the data is the use of digitized values from the sensors. This type of data is only available at discrete intervals that are fairly long relative to changes in heart rate. It is therefore impossible to obtain a fused estimate on a continuous basis. In addition, the heart rate extracted from individual sensors is derived from a pre-processed signal which may be influenced by signal processing artifact such as long periods of averaging. Future plans include application of this methodology to analog signals, in which case it should be possible to be very exact about the quality of the signal.
Beat-to-beat changes can be quantified, and the additional information in the analog signal should improve the fusion approach. Furthermore, once the heart rate information can be reliably used to evaluate the quality of the analog signal, it should be possible to improve feature extraction (e.g., blood pressure) from these signals and thereby improve the quality of all data obtained from the signal.

The utility of current patient monitors is limited by the frequency of faulty data and the inability to reliably document the quality of any individual measurement. These results indicate that it is possible to use robust sensor fusion to both improve and document the quality of data from multiple sensors. Although technology is unlikely to completely replace the value of the finger on the pulse, sensor fusion represents a significant advance which enhances the value of patient monitoring to the clinician.

This work was presented originally at the annual meeting of the Society for Technology in Anesthesia, January 1996. Special thanks are due to SpaceLabs Medical (Redmond, WA) and Stephen French, PhD, for making this project possible. A substantial portion of this work was completed at Albert Einstein Medical Center, Philadelphia, PA. The efforts of Aurel Cernaianu, MD, in facilitating the collection of Intensive Care Unit data at Cooper Hospital–University Medical Center, Camden, NJ, are greatly appreciated.
REFERENCES
1. Wesseling KH, Smith NT. Availability of intraarterial pressure waveforms from catheter-manometer systems during surgery. J Clin Monit 1985; 1: 11–16.
2. Feldman JM, Ebrahim M. Which sensor measures heart rate best? Anesthesiology 1995; 83 (3A): A478.
3. Kestin IG, Miller BR, Lockhart CH. Auditory alarms during anesthesia monitoring. Anesthesiology 1988; 69: 106–109.
4. Ebrahim M, Feldman JM, Bar-Kana I. A robust sensor fusion method for heart rate estimation. J Clin Monit 1997; in press.