Hybrid Outcome Prediction Model for Severe Traumatic Brain Injury

JOURNAL OF NEUROTRAUMA Volume 24, Number 1, 2007 © Mary Ann Liebert, Inc. Pp. 136–146 DOI: 10.1089/neu.2006.0113

BOON CHUAN PANG,1 VELLAISAMY KURALMANI,2 ROHIT JOSHI,3 YIN HONGLI,3 KAH KEOW LEE,1 BENG TI ANG,1 JINYAN LI,2 TZE YUN LEONG,3 and IVAN NG1

ABSTRACT

Numerous studies addressing different methods of head injury prognostication have been published. Unfortunately, these studies often incorporate different head injury prognostication models and study populations, making direct comparison difficult, if not impossible. Furthermore, newer artificial intelligence tools such as machine learning methods have evolved in the field of data analysis, alongside more traditional methods of analysis. This study targets the development of a set of integrated prognostication models combining different classes of outcome and prognostic factors. Methodologies such as discriminant analysis, logistic regression, decision tree, Bayesian network, and neural network were employed in the study. Several prognostication models were developed using prospectively collected data from 513 severe closed head-injured patients admitted to the Neurocritical Unit at the National Neuroscience Institute of Singapore from April 1999 to February 2003. The correlation between prognostic factors at admission and outcome at 6 months following injury was studied. Overfitting error, which may falsely distinguish different outcomes, was compared graphically. A tenfold cross-validation technique, which reduces overfitting error, was used to validate outcome prediction accuracy. The overall prediction accuracy achieved ranged from 49.79% to 81.49%. Consistently high outcome prediction accuracy was seen with logistic regression and decision tree. By combining the logistic regression and decision tree models, a hybrid prediction model was then developed. This hybrid model would more accurately predict the 6-month post–severe head injury outcome using baseline admission parameters.

Key words: adult brain injury; assessment tools; Bayesian network; decision tree; discriminant analysis; human studies; logistic regression; neural network; outcome measures; traumatic brain injuries

INTRODUCTION

TRAUMATIC BRAIN INJURY (TBI) continues to be a major public healthcare problem and a common cause of death and disability amongst the young (Jennett, 1996; Thurman et al., 1999). Management of severe TBI is costly and labor-intensive, with a significant emotional toll on caregivers and healthcare providers. As such, accurately predicting outcome in severe TBI would allow healthcare institutions to organize rational treatment paradigms and strike a balance between the escalating cost of modern TBI management and limited resources. In addition, a more realistic outlook may be offered to anxious caregivers.

Choi and Barnes (1996) compared the accuracy of several prediction models using data from the Traumatic Coma Data Bank (TCDB). The group demonstrated a predictive accuracy with an upper limit of between 75% and 80% in severe TBI patients based on data available at admission. The inaccuracies were attributed to different treatment strategies, unknown prognostic factors, and inherent patient variability. Whilst there was good reliability in predicting extremes of outcome (i.e., excellent or poor outcome), the accuracy dropped drastically to 50–60% when prediction was attempted on the intermediate outcome group.

Many different variables have been studied in an attempt to improve prediction accuracy in severe TBI patients (Bowers and Marshall, 1980; Braakman et al., 1980; Choi et al., 1983, 1988, 1991, 1994; Clifton et al., 1992; Overgaard et al., 1973; Signorini et al., 1999; Eisenberg, 1990; Gennarelli, 1982; Lokkeberg and Grimes, 1984; Marshall, 1991). Very often, past studies were based on different patient populations, and conclusions were drawn from analyses that employed various statistical methods. This has made direct comparison of results difficult. Furthermore, these statistical models often utilize the entire dataset in prediction model building, thus falsely elevating the prediction accuracy of the models studied. To overcome this, our study compared outcome prediction using various statistical and artificial intelligence tools on a prospectively collected dataset of severe TBI patients from a single neurosurgical centre.

1Acute Brain Injury Research Laboratory, Department of Neurosurgery, National Neuroscience Institute, Singapore.
2Data Mining Lab, Knowledge Discovery Department, Institute for Infocomm Research, Singapore.
3Medical Computing, Department of Computer Science, School of Computing, National University of Singapore, Singapore.

METHODS

With institutional review board and medical ethics committee approval, data were prospectively collected from April 1999 to February 2003. A total of 513 patients with non-penetrating severe TBI (Glasgow Coma Scale [GCS] score of 8 or less) were admitted to the Neurocritical Unit at the National Neuroscience Institute in Singapore (Table 1). The severe TBI treatment protocol in our unit follows an algorithm supporting aggressive monitoring of physiological parameters and prompt intervention to address secondary systemic and neurological insults (Ng et al., 1998). An incremental escalation of treatment intensity to achieve normal physiological parameters was adopted. This includes achieving cerebral perfusion pressure (CPP) of more than 60 mm Hg, whilst limiting intracranial pressure (ICP) to 20 mm Hg or less, and maintaining euvolemia, normothermia, and euglycemia. Bifrontal craniectomy was indicated in patients with persistently raised intracranial pressure refractory to maximal medical therapy (mannitol, cerebrospinal fluid [CSF] drainage, permissive hyperventilation, and barbiturate coma).

Data collection included patients' demographics, injury details, presence of coagulopathy, hypoxia (defined as SpO2 < 90%), hypotension (defined as systolic blood pressure < 90 mm Hg), and pre- and post-resuscitation findings for both GCS and pupillary light response. Pre-resuscitation status refers to the clinical status on arrival at the emergency department prior to any medical intervention. Post-resuscitation refers to the clinical status following correction of preexisting hypotension or hypoxemia. This parameter was collected following stabilization and transfer to the neurocritical unit. A single independent scorer (either in outpatient clinic or via telephone contact) determined patients' outcomes by measuring the Glasgow Outcome Scale (GOS) score at 6 months post-injury.

Sixteen variables previously identified to be highly predictive of outcome (Choi et al., 1983, 1988, 1991, 1994; Signorini et al., 1999; Eisenberg, 1990; Gennarelli, 1982; Lokkeberg and Grimes, 1984; Marshall, 1991) were measured in this study (Table 1). In cases where patients were transferred directly from the emergency department to the operating theatre, no post-resuscitation GCS or pupillary light response was recorded (30% of cases). However, all such patients would have been fully resuscitated before the decision to transfer to the operating theatre was made. In such instances, only 14 prognostic variables were recorded.

TABLE 1. PROGNOSTIC FACTORS(a) IN 513 PATIENTS WITH SEVERE HEAD INJURY

Factor                           No.            %
Age (mean)                       44.4 ± 19.7
Gender
  Male                           406            79.1%
  Female                         107            20.9%
Ethnic group
  Chinese                        357            69.6%
  Malay                          66             12.9%
  Indian                         38             7.4%
  Others                         52             10.1%
Mechanism of injury
  MVA                            231            45.0%
  Fall from height               89             17.3%
  Fall                           136            26.5%
  Assault and violent            16             3.1%
  Others                         36             7.0%
  Not known                      5              1.0%
Presence of TSAH                 247            48.1%
Presence of hypoxia              58             11.3%
Presence of hypotension          58             11.3%
Presence of coagulopathy         173(b)         33.7%
Median pre-GCS                   8
Pre-pupillary abnormality        168(c)         32.7%
Median post-GCS                  7
Post-pupillary abnormality       205(d)         40.0%
Glasgow Outcome Score
  Death                          182            35.5%
  Vegetative state               34             6.6%
  Severe disability              43             8.4%
  Moderate disability            59             11.5%
  Good recovery                  195            38.0%

(a) Prognostic factors excluded: type of motor vehicle accident, cervical injuries, multiple injuries, other contributing factors (e.g., depression).
(b) 14 missing values. (c) Three missing values. (d) 14 missing values.
MVA, motor vehicle accident; TSAH, traumatic subarachnoid hemorrhage; GCS, Glasgow Coma Scale.

TABLE 2. TABULATED PREDICTION ACCURACY FOR MODELS WITH 337 PATIENTS AND 16 PROGNOSTICATION FACTORS

               Five-category GOS: model a (337,16,5)(a)    Three-category GOS: model b (337,16,3)(b)
Methodology    Training    Test      Overall               Training    Test      Overall
DA             68.80%      70.80%    69.43%                71.90%      70.80%    71.55%
LR             73.40%      74.08%    70.63%                75.11%      74.08%    72.70%
NN             49.79%      55.56%    48.37%                70.82%      81.49%    69.44%
BN             75.25%      57.89%    73.29%                77.59%      60.53%    75.67%
DT             78.97%      74.08%    75.08%                77.26%      81.49%    75.08%

(a) Data from 337 patients, incorporating 16 prognostic factors, were used to build each prediction model. Patient outcome was segregated into five distinct categories.
(b) Data from 337 patients, incorporating 16 prognostic factors, were used to build each prediction model. Patient outcome was segregated into three distinct categories.
DT, decision tree; LR, logistic regression; DA, discriminant analysis; BN, Bayesian network; NN, neural network.

Outcomes were stratified into three or five categories. A three-category GOS (GOS 3) segregated outcomes into death (GOS 1), disabled (GOS 2), and good recovery (GOS 3) groups. A five-category GOS (GOS 5) segregated outcomes into death, vegetative, severely disabled, moderately disabled, and good recovery groups. The data collected were effectively divided into two sets (Fig. 1):

• Dataset 1: 337 patients with 16 prognostic factors (incorporating pre- and post-resuscitation data for both GCS and pupillary light response). Pre-resuscitation refers to findings on arrival in the emergency department. Post-resuscitation refers to findings following arrival in the intensive care unit with physiological abnormalities such as hypoxia and hypotension fully corrected.
• Dataset 2: 513 patients with 14 prognostic factors utilizing the most recent GCS and pupillary light response (that is, the post-resuscitation findings). In the absence of post-resuscitation findings, pre-resuscitation GCS and pupillary light response were used instead.
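The two derivations described above (the "most recent" GCS and pupil values for Dataset 2, and the collapse from five to three outcome categories) can be sketched in a few lines. This is only an illustration: the function names and dictionary keys are hypothetical, and the grouping of GOS 2 to 4 as "disabled" follows the legend of Figure 2.

```python
from typing import Optional


def most_recent(pre: Optional[int], post: Optional[int]) -> Optional[int]:
    """Use the post-resuscitation finding when recorded, else the pre-resuscitation one."""
    return post if post is not None else pre


def collapse_gos(gos5: int) -> int:
    """Map five-category GOS (1..5) onto the three-category grouping.

    1 = death, 2 = disabled (vegetative, severe or moderate disability),
    3 = good recovery.
    """
    if gos5 not in (1, 2, 3, 4, 5):
        raise ValueError(f"GOS must be 1..5, got {gos5}")
    if gos5 == 1:
        return 1
    if gos5 == 5:
        return 3
    return 2


def to_dataset2_row(patient: dict) -> dict:
    """Build a Dataset 2 style record (hypothetical keys) from a raw patient record."""
    return {
        "gcs": most_recent(patient.get("pre_gcs"), patient.get("post_gcs")),
        "pupil": most_recent(patient.get("pre_pupil"), patient.get("post_pupil")),
        "gos3": collapse_gos(patient["gos5"]),
    }


# A patient taken straight to theatre has no post-resuscitation findings.
row = to_dataset2_row({"pre_gcs": 6, "post_gcs": None,
                       "pre_pupil": 0, "post_pupil": None, "gos5": 4})
```

Patients with both sets of findings would keep all 16 factors and also qualify for Dataset 1.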

Outcome Prediction Strategies

Five prognostication methods were used to predict outcome for each scenario described earlier, giving a total of 20 predictive models (Tables 2 and 3). Ninety percent of the data in each model were used to create an outcome prediction model. The remaining 10% were used as a test sample. This would allow higher confidence in the prediction accuracy when the model is applied to new data. In addition, a 10-fold cross-validation method was used to reduce overfitting in model development.
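A 90%/10% split of this kind might be implemented as below. The sketch assumes a simple random split with a fixed seed; the paper does not state how patients were allocated to the test sample, and `holdout_split` is a hypothetical helper name.

```python
import random


def holdout_split(n_patients: int, test_fraction: float = 0.10, seed: int = 0):
    """Randomly split patient indices into training and test lists.

    Assumption: allocation is purely random (not stratified by outcome).
    """
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)
    n_test = round(n_patients * test_fraction)
    return indices[n_test:], indices[:n_test]


# Dataset 1 has 337 patients, so roughly 34 are held out for testing.
train_idx, test_idx = holdout_split(337)
```

The model is then built on `train_idx` only, and its accuracy on `test_idx` is reported separately from the training accuracy.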

TABLE 3. TABULATED PREDICTION ACCURACY FOR MODELS WITH 513 PATIENTS AND 14 PROGNOSTICATION FACTORS

               Five-category GOS: model c (513,14,5)(a)    Three-category GOS: model d (513,14,3)(b)
Methodology    Training    Test      Overall               Training    Test      Overall
DA             67.30%      66.90%    67.18%                69.60%      69.00%    69.43%
LR             71.55%      62.00%    69.01%                71.83%      62.00%    69.21%
NN             68.51%      58.00%    65.89%                66.86%      56.00%    64.33%
BN             65.12%      61.67%    64.72%                67.33%      60.00%    66.47%
DT             68.79%      66.00%    67.45%                70.17%      68.00%    69.40%

(a) Data from 513 patients, incorporating 14 prognostic factors, were used to build each prediction model. Patient outcome was segregated into five distinct categories.
(b) Data from 513 patients, incorporating 14 prognostic factors, were used to build each prediction model. Patient outcome was segregated into three distinct categories.
DT, decision tree; LR, logistic regression; DA, discriminant analysis; BN, Bayesian network; NN, neural network.

Prediction Models

Each suggested methodology has its own characteristics. In discriminant analysis (DA), variables are grouped based upon shared characteristics (Choi, 1996; Afifi and Clark, 2004; Morrison, 1990). Logistic regression (LR) is advantageous in cases of unequal group sizes. Neural networks (NN) provide learning capability, especially when the data have a non-linear structure. Decision tree (DT) is useful in identifying patterns from a vast dataset; it then maps successive splitting of each variable based on its true information into subgraphs (Quinlan, 1986). The Bayesian network (BN) model is able to infer future events based on analyses of past events. This approach allows identification of relationships between different variables. For the purpose of this study, Cheng's Bayesian network methodology was selected.

Overfitting occurs when a prediction model differentiates patients' outcomes using clinically nonsignificant factors. This degrades the model's ability to predict outcome when applied to a new population, despite very good accuracy achieved on the training population. Thus, a comparison of overfitting error provides some reassurance of the true accuracy and reliability of each prediction model.

A model's prediction accuracy depends on the amount of data used in algorithm development. Often the highest prediction accuracy (at a cost of increased overfitting error) is achieved when the entire data collection is used for algorithm development. This difficulty was addressed in this study by adding a 10-fold cross-validation technique to each analysis. This technique divides the data into 10 parts and rotates them through the training algorithm, holding one part out for testing each time. This is repeated until all of the data have been utilized for algorithm development. This allows all data to be used in model building while minimizing overfitting errors.
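The rotation just described can be illustrated with a short, library-free sketch. It assumes the records have already been shuffled. The `fit` and `predict` callables and the majority-class toy classifier are hypothetical stand-ins for the five methods, not the authors' software.

```python
from collections import Counter


def k_fold_indices(n: int, k: int = 10):
    """Yield (train, test) index lists; every index appears in exactly one test fold."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_validated_accuracy(features, labels, fit, predict, k: int = 10) -> float:
    """Fit on k-1 folds, score on the held-out fold, and pool the results."""
    correct = 0
    for train, test in k_fold_indices(len(labels), k):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, features[i]) == labels[i] for i in test)
    return correct / len(labels)


# Toy stand-in classifier: always predicts the most common training label.
def fit_majority(xs, ys):
    return Counter(ys).most_common(1)[0][0]


def predict_majority(model, x):
    return model


toy_labels = [1] * 60 + [3] * 40
toy_features = [None] * 100
accuracy = cross_validated_accuracy(toy_features, toy_labels,
                                    fit_majority, predict_majority)
```

Because every patient is scored exactly once by a model that never saw that patient, the pooled accuracy is less optimistic than accuracy measured on the training data.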

FIG. 1. A schematic representation of an integrated prognostication model. All 16 prognostic factors are listed in Table 1. In patients transferred from the emergency department to the operating theatre, post-resuscitation GCS and post-resuscitation pupillary reaction to light are not available. GOS-5, five-category outcome scale; GOS-3, three-category outcome scale.

FIG. 2. Decision tree modeling for model B (337 patients, 16 prognostic factors, and 3 outcome categories). Generated by splitting data into training (model building)/testing (model selection)/validation (model accuracy): 70%:20%:10%. RGCS, post-resuscitation GCS; Rpupil, post-resuscitation pupillary light response (yes = fixed, unreactive pupil; no = reactive pupil); PreGCS, pre-resuscitation GCS; TSAH, traumatic subarachnoid hemorrhage (SAH); Coagu, presence of coagulopathy in first blood result; G. Recovery, good recovery (GOS 5); dis, disability (GOS 2,3,4); death, (GOS 1).

RESULTS

The demographic profile of patients is shown in Table 1. The two commonest causes of TBI are falls and motor vehicle accidents, affecting 43% and 45% of the patients, respectively. Prediction accuracies for training, test, and overall data are summarized in Tables 2 and 3. When comparing predictions made using 16 prognostic factors with outcomes stratified into five-category GOS (Table 2), DT was most accurate. However, when analysis was limited to the test sample only, LR was as accurate as DT, both achieving a prediction accuracy of 74.08%. Comparing prediction results of groups with 14 prognostic factors and five-category GOS gave DT as the second most reliable method in terms of accuracy. The accuracies were 2.76% and 1.56% lower than those achieved with LR using training and overall data, respectively. The prediction accuracies achieved with 14 prognostic factors were considerably lower than those achieved with 16 prognostic factors for the various models. Overall the accuracies were lower by 10.18%, 7.18%, and 6.7% when training, test, and overall data were compared, respectively. Results remained comparable in the three-category GOS (Table 3) group.

In analyses involving groups with three-category GOS and 16 prognostic factors, DT gave the highest prediction accuracy for analyses using test and overall data. However, the accuracy achieved using DT was 1.66% lower than that of LR when comparing analysis using training data only. In contrast, very low prediction accuracy was seen with NN in general. The results obtained from analyses using groups with 14 prognostic factors were slightly better than those achieved using 16 prognostic factors.

For the dataset with five-category GOS and 16 prognostic factors, the prediction accuracies achieved ranged from as low as 48.37% for NN to as high as 75.08% for DT. In the dataset involving five-category GOS and 14 prognostic factors, the results ranged from 65.89% for NN to 69.01% for DA. For the dataset with three-category GOS and 16 prognostic factors, prediction accuracies ranged from 69.44% for NN to 75.08% for DT. In the dataset involving three-category GOS and 14 prognostic factors, the results ranged from 64.33% for NN to 69.43% for DA. These results show a narrower range and generally higher prediction accuracy for groups with three-category GOS or 16 prognostic factors when compared to groups with five-category GOS or 14 prognostic factors. This implies a reduction in prediction accuracy when the number of outcome categories is increased or fewer prognostic factors are employed in the prediction model.

A DT analysis of 337 patients with 16 prognostic factors and three-category GOS is represented in Figure 2. This DT model gave the highest prediction accuracy


FIG. 3. (A) Overfitting error for models A and B. DT, decision tree; LR, logistic regression; DA, discriminant analysis; BN, Bayesian network; NN, neural network. (B) Overfitting error for models C and D. DT, decision tree; LR, logistic regression; DA, discriminant analysis; BN, Bayesian network; NN, neural network.


TABLE 4. OVERALL PERFORMANCE OF EACH METHODOLOGY

Methodology                             DT       LR        DA        BN        NN
Average performance of models a–d       73.1%    70.51%    69.39%    65.67%    63.38%

DT, decision tree; LR, logistic regression; DA, discriminant analysis; BN, Bayesian network; NN, neural network.

overall. The patients were initially split into two groups based on a most recent GCS cut-off of five and more. Subsequent splits were based on the most recent pupillary light response, pre-GCS, and age. A poor pre-resuscitation or most recent GCS score was noted to be a strong predictor of poor outcome. An unreactive pupil, especially in patients above 65 years of age, was also highly correlated with a very poor outcome.

While poor outcome is generally expected in patients with a poor GCS score, this is not inevitable. Figure 2 shows potential good outcome in patients with RGCS 4 and pre-GCS 6. This could represent prompt medical intervention (including rapid and appropriate surgical evacuation of mass lesions) following clinical deterioration in conditions amenable to surgical treatment, such as acute epidural hematoma. The older (>65 years old) group was associated with a poor outcome despite aggressive intensive care intervention. The mortality rate for this group approached 72.7%.

Figure 3 compares the prediction errors among the models using training and test data. NN was most accurate for models a and b. For models c and d, DA provided the least overfitting error. In contrast, when the overall (average) performance for each model was compared, DT had the highest and NN the lowest prediction accuracy (Table 4).

To verify the robustness of this result, a repeat analysis of each method was carried out using a 10-fold cross-validation technique. The results achieved with 10-fold cross-validation (Table 5) can be summarized as follows: (1) The addition of post-resuscitation GCS and post-resuscitation pupillary light response (model b [Fig. 4] and model d [Fig. 5]) significantly improves prediction accuracy (compared to models a and c). (2) DT ranks third behind LR and NN in overall performance. However, the actual difference is small. Coupled with its high degree of accuracy and robustness, LR has the advantage of deployment at the patient's bedside without the need for a computation device.

In severe head-injured patients, outcome prognostication is dependent on both methodology and prognostication factors. Overall, LR performed best in the absence of post-resuscitation GCS and pupillary light response data. In the presence of post-resuscitation data, DT produced the most accurate result. A single combined DT and LR model (Tables 6 and 7) would ensure a high degree of prediction accuracy across the different scenarios described in Figure 1.

TABLE 5. PERCENTAGE ACCURACY ACHIEVED WITH TEN-FOLD CROSS-VALIDATION TECHNIQUE

       GOS5/337     GOS3/337     GOS5/513     GOS3/513     Average
       (model a)    (model b)    (model c)    (model d)    (overall)
DT     64.3917      69.7329      65.4971      68.8109      67.1082
DA     61.7211      66.7656      61.7934      65.4971      63.9443
LR     67.0623      70.3264      66.0819      66.4717      67.4856
NN     64.9852      70.6231      66.0819      67.2515      67.2354
BN     65.8754      65.2819      65.1072      63.5478      64.9531

DT, decision tree; LR, logistic regression; DA, discriminant analysis; BN, Bayesian network; NN, neural network.

DISCUSSION

Decision-making in a TBI patient's management often relies on outcome prediction made soon after head injury. In a patient who eventually succumbs to severe head injury, an early and accurate prediction allows finite resources to be put to better use. However, this runs the risk of treatment withdrawal, leading to a self-fulfilling prophecy if the prediction is inaccurate. Clinicians' outcome assessment is notoriously inaccurate. Prediction accuracy is about 50–60% as evidenced by the TCDB literature (Choi, 1996). The prognostic significance of a variable may be intricately connected to other predictive factors. For example, Yono et al. (2001) showed presence of subarachnoid blood in head-injured patients with


TABLE 6. MODEL A: LOGISTIC REGRESSION

               Coefficients
Variables      GOS1      GOS2      GOS3      GOS4
PreGCS         0.2771    0.4314    0.4842    0.2405
RGCS           0.8969    0.4418    0.0252    0.4328
RPupil         1.6201    0.9435    1.5378    0.309
Intercept      4.4666    2.1173    0.7528    1.4946

PreGCS, pre-resuscitation (emergency department) GCS value; RGCS, post-resuscitation GCS (intensive care/stabilized physiological parameters after admission to intensive care); Rpupil, post-resuscitation pupillary light response (yes (1) = fixed, unreactive pupil; no (0) = reactive pupil); Intercept, intercept on y axis; GOS1, death; GOS2, vegetative; GOS3, severe disability; GOS4, mild disability.

Outcome probabilities, with $z_n = C_{n1}\,\mathrm{PreGCS} + C_{n2}\,\mathrm{RGCS} + C_{n3}\,\mathrm{Rpupil} + \mathrm{Intercept}_n$:

\[ P(\mathrm{GOS}5) = \frac{1}{\sum_{n=1}^{4} \exp(z_n) + 1}, \qquad P(\mathrm{GOS}n) = \frac{\exp(z_n)}{\sum_{m=1}^{4} \exp(z_m) + 1}, \]

where $C_{n1}$, $C_{n2}$, $C_{n3}$ are the coefficients and $n$ = GOS 1 to 4. An Excel format spreadsheet to automate the calculations is available at www.nni.com.sg.
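As a rough illustration of how the outcome probabilities in the table note are evaluated, the sketch below implements the same multinomial form with GOS5 as the reference category. The function name and data layout are hypothetical. The coefficient values in the usage example are placeholders chosen for illustration, not the published coefficients, so the published values should be checked against the original article before any real use.

```python
import math


def gos_probabilities(coefficients: dict, intercepts: dict, values: dict) -> dict:
    """Return P(GOS1..GOS5) for one patient.

    coefficients: {"GOS1": {"PreGCS": c, ...}, ..., "GOS4": {...}}
    intercepts:   {"GOS1": b1, ..., "GOS4": b4}
    values:       {"PreGCS": x, "RGCS": x, "Rpupil": 0 or 1}
    GOS5 is the reference category and has no coefficients of its own.
    """
    exp_scores = {}
    for outcome, coefs in coefficients.items():
        linear = intercepts[outcome] + sum(coefs[v] * values[v] for v in coefs)
        exp_scores[outcome] = math.exp(linear)
    denominator = 1.0 + sum(exp_scores.values())
    probabilities = {o: s / denominator for o, s in exp_scores.items()}
    probabilities["GOS5"] = 1.0 / denominator
    return probabilities


# Placeholder coefficients (NOT taken from the table), for illustration only.
demo_coefficients = {
    "GOS1": {"PreGCS": -0.3, "RGCS": -0.9, "Rpupil": 1.6},
    "GOS2": {"PreGCS": -0.4, "RGCS": -0.4, "Rpupil": 0.9},
    "GOS3": {"PreGCS": -0.5, "RGCS": 0.0, "Rpupil": 1.5},
    "GOS4": {"PreGCS": -0.2, "RGCS": -0.4, "Rpupil": 0.3},
}
demo_intercepts = {"GOS1": 4.5, "GOS2": 2.1, "GOS3": 0.8, "GOS4": 1.5}
demo = gos_probabilities(demo_coefficients, demo_intercepts,
                         {"PreGCS": 6, "RGCS": 7, "Rpupil": 0})
```

The five probabilities always sum to one, and the predicted outcome is the category with the largest probability.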

acute subdural hematoma and similar GCS was associated with a poorer outcome. This has led investigators to use various computational methods in predicting outcome. As physicians may alter management decisions in response to results from outcome prediction models, it is vital that such predictive models be rigorously assessed for accuracy and robustness.

While outcome prediction models have been exhaustively described in the literature, these models are often described in isolation or with limited comparison to alternative models. Additionally, the data accrued may be pooled from multiple institutions, for example, TCDB. In the absence of perfect sampling, direct comparison between different published models is not informative. Using data from a single center, this study was able to directly compare the discriminative power of five different prediction models.

Direct comparison reveals improved prediction accuracy when outcome groups are reduced from five to three in this study. This reinforces earlier publications (Choi et al., 1991, 1996; Hukkelhoven et al., 2005), which described higher accuracy in predicting extremes of outcome, such as very good outcome or death. In contrast, intermediate outcomes of varying degrees of disability are much harder to predict.

Choi et al. (1991) used a decision tree to predict the outcome in 555 severe TBI patients. Using four variables (pupillary light response, age, motor response, and intracerebral hematoma) recorded on admission, they predicted good outcome, moderate disability, and death with an accuracy of 82.3%, 60.5%, and 81.5%, respectively. While the overall prediction accuracy was 77.7%, the accuracy for the group with moderate disability was only 60.5%. Recently, Rovlias and Kotosou (2004) used eight variables (GCS, pupillary light response, age, subarachnoid hemorrhage, intracranial diagnosis, whole blood cell count on admission, and glucose level on admission and on day 2) to construct a DT with an overall accuracy rate of 86.8%. Despite showing great promise, the DT was not tested on an independent test sample. Prediction accuracy obtained from test data better reflects the true accuracy. In contrast, accuracy achieved from training data is often biased.

DT accuracy obtained using overall data for the three-category GOS (cohort 2a) was the highest among the different strategies tested. This method avoids the burden of a complicated computational algorithm. Interactions between the different variables may be read off the tree with ease by following the splits along the decision tree. Patients who deteriorated after admission (poorer RGCS compared to pre-GCS) may still have a good outcome if prompt intervention is carried out (Fig. 3). In contrast, poor GCS and unreactive pupils following resuscitative efforts often predict a poor outcome. Despite these observations, caution should still be taken when variables are near cut-off points, especially in predicting disability. This is because each prognostic factor in a decision tree analysis is artificially split at each node. It may benefit physicians to assume the better outcome when predictions straddle the boundary of two separate outcomes, in order to avoid a self-fulfilling prophecy resulting in a poor patient outcome (Steinberg and Colla, 1995, 1997; Rovlias et al., 2004).

Choi et al. (1988) used DA and the variables age, motor score, and pupillary light response to achieve an outcome prediction accuracy of 78.4%. More recently, LR is often preferred over DA as it requires fewer assumptions. Furthermore, it better accommodates non-linearly related independent variables, unequal group sizes, and both categorical and continuous variables. LR models have been extensively used in the literature for predicting severe head injury outcome (Choi et al., 1994; Signorini et al., 1999; Murkherjee et al., 2000; Gomez et al., 2000; Pillai et al., 2003). Our study demonstrated higher prediction accuracy for LR over DA when important independent variables (post-resuscitation GCS and pupillary response) were made available. In contrast, DA was able to tolerate loss of such data better, albeit at the expense of a lower prediction accuracy. Therefore, when limited variables are available, DA remains a useful option.

NN represents another non-parametric pattern recognition tool in artificial intelligence. It is able to determine the probability of each relationship among different factors and translate this into an outcome prediction tool. It has the advantage of accommodating all data types and is tolerant of missing data. Unfortunately, the training process requires a huge amount of data. This limits its accuracy in settings of relatively small population size that are often adequate for other, less demanding statistical tools. This is reflected by the low accuracy seen when processing data from the four models, a–d. The exception is the high accuracy achieved with model b. This is difficult to explain as the inner workings of neural networks cannot be deciphered. Alternatively, NN may be described as a black box in which variables keyed into it are

TABLE 7. MODEL C: LOGISTIC REGRESSION

               Coefficients
Variables      GOS1      GOS2      GOS3      GOS4
Age            0.9193    0.3906    0.367     0.1976
PreGCS         0.418     0.4459    0.5346    0.2488
RGCS           0.8824    0.4193    0.068     0.4291
RPupil         1.1465    1.0222    1.1976    0.1435
Intercept      1.7487    0.866     0.824     0.6944

PreGCS, pre-resuscitation (emergency department) GCS; RGCS, post-resuscitation GCS (intensive care/stabilized physiological parameters after admission to intensive care); Rpupil, post-resuscitation pupillary light response (yes = fixed, unreactive pupil; no = reactive pupil); age in years; Intercept, intercept on y axis; GOS1, death; GOS2, vegetative; GOS3, severe disability; GOS4, mild disability.

Outcome probabilities, with $z_n = C_{n1}\,\mathrm{Age} + C_{n2}\,\mathrm{PreGCS} + C_{n3}\,\mathrm{RGCS} + C_{n4}\,\mathrm{Rpupil} + \mathrm{Intercept}_n$:

\[ P(\mathrm{GOS}5) = \frac{1}{\sum_{n=1}^{4} \exp(z_n) + 1}, \qquad P(\mathrm{GOS}n) = \frac{\exp(z_n)}{\sum_{m=1}^{4} \exp(z_m) + 1}, \]

where $C_{n1}$, $C_{n2}$, $C_{n3}$, $C_{n4}$ are the coefficients and $n$ = GOS 1–4. An Excel format spreadsheet to automate the calculations is available at www.nni.com.sg.

FIG. 4. Model B: decision tree (model built using 10-fold cross-validation). RGCS, post-resuscitation GCS (intensive care/stabilized physiological parameters after admission to intensive care); Rpupil, post-resuscitation pupillary light response (yes = fixed, unreactive pupil; no = reactive pupil); Prepupil, pre-resuscitation (emergency department) pupillary light response; PreGCS, pre-resuscitation (emergency department) GCS; age in years; GOS1, death; GOS2, vegetative; GOS3, severe disability; GOS4, mild disability; GOS5, good recovery.

FIG. 5. Model D: decision tree (model built using 10-fold cross-validation). RGCS, post-resuscitation GCS (intensive care/stabilized physiological parameters after admission to intensive care); when RGCS/Rpupil are not available, replace values with PreGCS/Prepupil; Rpupil, post-resuscitation pupillary light response (yes = fixed, unreactive pupil; no = reactive pupil); Prepupil, pre-resuscitation (emergency department) pupillary light response; PreGCS, pre-resuscitation (emergency department) GCS; age in years; GOS1, death; GOS2, vegetative; GOS3, severe disability; GOS4, mild disability; GOS5, good recovery.


PANG ET AL. transformed into an outcome result with the intervening analysis hidden from the user. In contrast, Lang et al. (1997) did not show any difference in predictive power between NN and LR. However, the population of their study was significantly more, as his database numbered one thousand sixty-six consecutive severe head injured patients. Validity of traditional Bayes method applied to TBI prognostication has been queried by Stablein et al. (1980). Sequential Bayes method makes the assumption of statistical independence for prognostic factors used. This is not always true. Instead, we utilized Bayesian network, a form of artificial intelligence, for our analysis. BN uses inferential statistics to predict future events based upon analyses of past events. Its usefulness is limited by the reliability of the available data. BN has the highest overall accuracy for cohort 2a. Unfortunately, the discriminative power was not preserved when the model was tested using test data. This implies a reduced performance in the presence of insufficient variables. Despite this, BN has the added benefit of allowing mapping of variable interactions and permitting contribution of each factor to be adjusted through expert opinion. The 10-fold cross-validation technique utilized to build prediction algorithm maximizes data availability for algorithm development without increasing sample biasness. It is very robust and mimics closely the actual result when the prediction model is applied onto unknown dataset. It is therefore not surprising that its prediction accuracy lies between that achieved by using all (100%) and 70% of the data. The accuracy of different prediction models (using 10-fold method) seems to be about 80% for predicting poor (GOS 1 and 2) or good outcome (GOS 5). While highly accurate, this leaves a 20% prediction error. A poor prognosis may lead to less aggressive treatment with resultant poor outcome. 
In the literature, inaccurate predictions are attributed to differing treatment protocols and inherent patient variability. We believe that the predictive error in this study is due largely to inherent physiological variability, as head injury management in our intensive care unit is protocol driven. The two commonest causes of TBI are falls and motor vehicle accidents, affecting 43% and 45% of the patients studied, respectively. In Europe, the figures are 37% and 40%, respectively (Tagliaferri, 2005). The corresponding figures in the United States are lower, at 21% and 25%, respectively (Langlois, 2004). Despite protocol-driven modern head injury management, the outcome of severe head injury has not improved as a result of existing treatment. Addressing interpatient variability would require surrogate physiological or radiological markers and the development of a treatment regime tailored to each individual.

In the United States, the TBI-related death rate has declined 20% since 1980, a success attributed to good preventive strategies (Thurman, 1999). Have similar results been reproduced with modern severe TBI management? Regular monitoring in a specialized unit and control of intracranial pressure and cerebral perfusion pressure have been introduced to address physiological parameters in the hope of improving patient outcome. In the TCDB study of the 1980s, the mortality rate in the severe TBI cohort was 33% at discharge and 36.3% at last contact (up to 2 years after injury) (Marshall, 1991a). An important outcome of the study was the clear correlation of hypoxia and hypotension with poor outcome. The presence of both insults raises the mortality rate to about 60% (Kelly, 1999). In the early 1990s (1991–1993), the mortality rate at Addenbrooke’s Hospital was 22.7%. This was reduced to 20.2% following the introduction of a specialist neurosciences critical care unit, which utilizes an algorithm for severe TBI management focusing on parameters such as intracranial pressure and cerebral perfusion control (Patel, 2002). Such historical comparisons seem to suggest a significant improvement in severe TBI mortality as a result of neurosurgical care over the past two decades. However, a definitive conclusion cannot be drawn because of the inherent bias of historical controls.

Patient outcome is dependent on prognostic, treatment-specific, and patient-specific (random) factors. The accuracy in predicting TBI outcomes using data available from the TCDB was about 70–75% (Choi, 1996). A subsequent model using 21 prognostic factors available in the emergency department increased the prediction accuracy to 78% (Choi, 1988). With modern data mining tools, we were able to marginally increase the prediction accuracy using admission data to about 80%. More importantly, the robustness of our study confers good reliability on the result. 
We believe that 80% approximates the upper limit of prediction accuracy using data available on admission. Further clinically significant improvement in prediction accuracy is unlikely to be achieved by analyzing admission data alone. Such improvement would require analysis of treatment effect and individual patient response. This requires analysis of physiological and biochemical parameters, with emphasis on the temporal progression of these parameters with treatment. Traditional physiological parameters such as intracranial pressure and cerebral perfusion pressure would need to be considered alongside newer biochemical profiles such as tissue lactate, glycerol, and pyruvate. These newer modalities act as early indicators of the tissue ischemic response to injury and treatment. This will play a crucial role in improving the poor accuracy achieved in predicting outcomes for GOS of 2, 3, or 4. Currently, the accuracy in predicting disability (GOS 2, 3, or 4) is about 50–60%.

In conclusion, although a wealth of statistical and artificial intelligence techniques is available for severe TBI outcome prediction, the benefits and demands of each tool are quite different. Among the models, DT and LR are the most reliable and accurate in predicting head injury outcome. However, DT analysis has the added benefit of a simple visual representation of its prediction algorithm. This facilitates clinical bedside use and is a good compromise between ease of use and prediction accuracy. In the case of five-category GOS, LR performs better than the other models. In general, the proposed hybrid model would satisfy the different clinical scenarios encountered at admission. Also of note in this study is the maximum prediction accuracy of 70–80%, limited by the use of prognostic factors available at admission only. Future work to improve upon this would require evaluation of the temporal sequence of clinical, radiological, and biochemical parameters.

ACKNOWLEDGMENTS

Grant support is provided by SingHealth Foundation Funds (SHF/FG116S/2005). Our School of Computing collaborators are funded by the Biomedical Research Council and the National University of Singapore (Medical Computing Laboratory research grant no. BM/00/007). At the commencement of this study, Dr. Kuralmani was among the collaborators from the School of Computing, National University of Singapore. He has since taken up a position as a research scientist at the Institute of Infocomm Research.

REFERENCES

ABDELMONEM, A., and CLARK, V.A. (2004). Discriminant Analysis, Computer-Aided Multivariate Analysis, 4th ed. Chapman & Hall/CRC: London.
BOWERS, S.A., and MARSHALL, L.F. (1980). Outcome in 200 consecutive cases of severe head injury treated in San Diego County: a prospective analysis. Neurosurgery 6, 237–242.
BRAAKMAN, R., GELPKE, G.J., and HABBEMA, J.D.F. (1980). Systematic selection of prognostic features in patients with severe head injury. Neurosurgery 6, 362–370.
BRAIN TRAUMA FOUNDATION. (2002). Management and prognosis of severe traumatic brain injury. Available at: www.braintrauma.org/guidelines/index.php. Accessed November 1, 2006.
BULLOCK, H.R., and POVLISHOCK, J.T. (1996). Guidelines for the management of severe head Injury. J. Neurotrauma 13, 639–731.
CHENG, J., BELL, D.A., and LIU, W. (1997). Learning belief networks from data: an information theory based approach. Presented at the Sixth ACM International Conference on Information and Knowledge Management.
CHOI, S.C., and BARNES, T.Y. (1996). Predicting outcome in the head-injured patient. Neurotrauma 53, 779–792.
CHOI, S.C., WARD, J.D., and BECKER, D.P. (1983). Chart for outcome prediction in severe head injury. J. Neurosurg. 59, 294–297.
CHOI, S.C., NARAYAN, R.K., ANDERSON, R.L., and WARD, J.D. (1988). Enhanced specificity of prognosis in severe head injury. J. Neurosurg. 69, 381–385.
CHOI, S.C., MUIZELAAR, J.P., BARNES, T.Y., MARMAROU, A., BROOKS, D.M., and YOUNG, H.F. (1991). Prediction tree for severely head-injured patients. J. Neurosurg. 75, 251–255.
CHOI, S.C., BARNES, T.Y., ROSS BULLOCK, M.S., GERMANSON, T.A., MAARMAROU, A., and YOUNG, H.F. (1994). Temporal profile of outcome in severe head injury. J. Neurosurg. 81, 169–173.
CLIFTON, G.L., HAYES, R.L., and LEVIN, H.S. (1992). Outcome measures for clinical trials involving traumatically brain-injured patients: report of a conference. Neurosurgery 31, 975–978.
EISENBERG, H.M., GARY, H.E., JR., ALDRICH, E.F., et al. (1990). Initial CT findings in 753 patients with severe head injury. A report from the NIH Traumatic Coma Data Bank. J. Neurosurg. 73, 688–698.
GENNARELLI, T.A., SPIELMAN, G.M., LANGFITT, T.W., et al. (1982). Influence of the type of intracranial lesion on outcome from severe head injury. J. Neurosurg. 56, 26–32.
GOMEZ, P.A., LOBATO, R.D., BOTO, G.R., DE LA LAMA A., GONZALEZ, P.J., and DE LA CRUZ, J. (2000). Age and outcome after head injury. Acta Neurochir (Wien) 142, 373–381.
HUKKELHOVEN, C.W. (2005). Predicting outcome after traumatic brain injury: development and validation of a prognostic score based on admission characteristics. J. Neurotrauma 22, 1025–1039.
JENNETT, B. (1996). Epidemiology of head injury. J. Neurol. Neurosurg. Psychiatry 60, 362–369.
JENNETT, B., and BOND, M. (1975). Assessment of outcome after severe brain damage. Lancet 1, 480–484.
KELLY, D.F. (1999). Emergency department management, in: D.W. Marion (ed), Traumatic Brain Injury. Thieme: NY, pps. 67–80.
LANGLOIS, J.A. (2004). Traumatic Brain Injuries in the United States: Emergency Department Visits, Hospitalizations, and Deaths. CDC: Atlanta.
LOKKEBERG, A.R., and GRIMES, R.M. (1984). Assessing the influence of non-treatment variables in a study of outcome from severe head injuries. J. Neurosurg. 61, 254–262.
MARSHALL, L.F. (1991a). The outcome of severe closed head injury. J. Neurosurg. 53, S28–S36.
MARSHALL, R.J. (1991b). Mapping disease and mortality rates using empirical Bayes estimators. J. R. Stat. Soc. Ser. C Appl. Stat. 40, 283–294.
MORRISON, D.F. (1990). Multivariate Statistical Methods, 3rd ed. McGraw-Hill: New York.
MUKHERJEE, K.K., SHARMA, B.S., RAMANATHAN, S.M., KHANDELWAL, N., and KAK, V.K. (2000). A mathematical outcome prediction model in severe head injury: a pilot study. Neurol. India 48, 43–48.
NG, I., LEW, T.W., YEO, T.T., et al. (1998). Outcome of patients with traumatic brain injury managed on a standardized head injury protocol. Ann. Acad. Med. Singapore 27, 332–339.
OVERGAARD, J., CHRISTENSEN, S., and HVID-HANSEN, O. (1973). Prognosis after head injury based on early clinical examination. Lancet 2, 631–635.
PATEL, H.C. (2002). Specialist neurocritical care and outcome from head injury. Intensive Care Med. 28, 547–553.
PILLAI, S.V., KOLLURI, V.R., and PRAHARAJ, S.S. (2003). Outcome prediction model for severe diffuse brain injuries: development and evaluation. Neurol. India 51, 345–349.
QUINLAN, J.R. (1986). Induction of decision trees. Mach. Learn. 1, 81–106.
ROVLIAS, A., and KOTOSOU, S. (2004). Classification and regression tree for prediction of outcome after severe head injury using simple clinical and laboratory variables. J. Neurotrauma 21, 886–893.
SAS. (2006). Enterprise Miner 9.1. SAS Institute Inc.: Cary, NC.
SIGNORINI, D.F., ANDREWS, P.J.D., JONES, P.A., WARLAW, J.M., and MILLER, J.D. (1999). Predicting survival using simple clinical variables: a case study in traumatic brain injury. J. Neurol. Neurosurg. Psychiatry 66, 20–25.
SPSS. (2006). SPSS 12.0 for Windows. SPSS Inc. USA: Chicago.
STABLEIN, D.M., MILLER, J.D., CHOI, S.C., et al. (1980). Statistical methods for determining prognosis in severe head injury. Neurosurgery 6, 243–246.
STEINBERG, D., and COLLA, P.L. (1995). CART: Tree-Structured Nonparametric Data Analysis. Salford Systems: San Diego.
STEINBERG, D., and COLLA, P.L. (1997). CART: Classification and Regression Trees. Salford Systems: San Diego.
THURMAN, D.J. (1999). Traumatic Brain Injury in the United States: A Report to Congress. CDC: Atlanta.
THURMAN, D.J., ALVERSON, C., DUNN, K.A., GUERRERO, J., and SNIEZEK, J.E. (1999). Traumatic brain injury in the United States: a public health perspective. J. Head Trauma Rehabil. 14, 602–615.
WILSON, W., PENN, C., SAFFER D., and AGHDASI, F. (2002). Improving the prediction of outcome in severe acute closed head injury by using discriminant function analysis of normal auditory brainstem response latencies and amplitudes. J. Neurosurg. 97, 1062–1069.
YONO, J., YAMAURA, A., KUBOTA, M., OKIMURA, Y., and ISOBE, K. (2001). Outcome prediction in severe head injury: analyses of clinical prognostic factors. J. Clin. Neurosci. 8, 120–123.

Address reprint requests to:
Ivan Ng, M.D.
Department of Neurosurgery
National Neuroscience Institute
11, Jalan Tan Tock Seng
Singapore 308433

E-mail: [email protected]
