HEPAR: An Intelligent System for Hepatitis Prognosis and Liver Transplantation Decision Support

Constantinos Koutsojannis, Andrew Koupparis, and Ioannis Hatzilygeroudis

Department of Computer Engineering & Informatics, School of Engineering, University of Patras
Rion, 265 00 Patras, Hellas (Greece)
Abstract. In this paper, we present the clinical evaluation of HEPAR, an intelligent system for hepatitis prognosis and liver transplantation decision support, on a UCI medical database. The prognosis process, the linguistic variables and their values were modeled on expert knowledge, the statistical analysis of the patient records in the existing medical database, and the relevant literature. The system infers the elements relating to prognosis and liver transplantation from rules, combining risk scores with weights that can vary dynamically through fuzzy calculations. First we introduce the medical problem, the design approach to the development of the fuzzy expert system and the computing environment. Then we present the HEPAR architecture and its reasoning techniques in comparison with the probabilistic characteristics of the medical database. Finally we give a few details of the implementation. The expert system has been implemented in FuzzyCLIPS. The fuzzy rules are organized in groups so as to simulate the diagnosis process. Experimental results showed that HEPAR did quite as well as the expert did.
C. Koutsojannis and S. Sirmakessis (Eds.): Tools & Appli. with Artificial Intel., SCI 166, pp. 163–180. springerlink.com © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

1.1 Problem Description

According to the rules of evidence-based medicine, every clinical decision should be based on the best verified knowledge available. In addition, novel scientific evidence applicable to an individual case arises every day from studies published all over the world. Checking day-to-day clinical decisions against the latest biomedical information available worldwide is a complex and time-consuming job, despite the advent of almost universal computer-based indexes. Most physicians lack the necessary time, skills or motivation to routinely search, select and appraise vast sets of online sources, looking for the most appropriate and up-to-date answers to a given clinical question concerning a single patient. The preferred option is still to ask more expert colleagues. When, as is often the case, no experts are available in time, the physician has to rely on personal experience, which includes memory of guidelines, policies and procedures, possibly refreshed by consulting handbooks or journals that happen to be at hand and do not necessarily contain information that is specific, relevant and up-to-date.

On the other hand, patients, especially those affected by chronic or progressive diseases, may have the time and motivation to search systematically for new sources of information that could be related to their case. Though their knowledge may generally be inaccurate as a result of medical illiteracy, distorted media reports or wishful thinking, they more and more frequently ask questions about new findings or scientific advances of which physicians are often completely unaware. Intelligent systems for diagnosis, prognosis and decision support are therefore still desirable today. Among the numerous applications which have been found for intelligent systems in the medical field, many aim at solving problems in gastroenterology [1].

1.2 Liver Transplantation

In this paper the issue of evaluating prognosis in hepatitis is addressed. It is often difficult to identify patients at greater risk and to foretell the outcome of the disease. The grading of the disease is an estimation of its severity, regardless of whether the problem is acute or chronic, active or not, mild or severe. Many tests, and grading systems that combine them, have been devised in order to increase the validity of these estimations [2]. Their importance is obvious, as they are used for liver transplantation decisions. A better discriminative value means better management of transplants, less time on the waiting list and, finally, fewer deaths.

An example of such a scoring system is the Child-Turcotte-Pugh score, invented in 1964 and modified in 1973, which was designed to estimate the risk of certain surgical operations. It uses the values of blood bilirubin, prothrombin time and albumin, the grade of ascites and the severity of encephalopathy. The score ranges from 5 to 15 and divides the patients into 3 classes. Classification in class B (score ≥ 7) has been used as a strong criterion for transplantation. A newer system is MELD (Model for End-stage Liver Disease), introduced by the Mayo Clinic in 2002 and also originally intended for a surgical operation [3, 4, 5].
It uses creatinine values (to estimate kidney function), bilirubin and INR (a test related to prothrombin time), and adds a factor for non-alcoholic causes. A 15% reduction in the number of deaths on liver transplantation waiting lists has been reported since its introduction. The development of such grading systems always focuses on specific conditions and a well-defined population [6]. However, researchers try to evaluate their application to other clinical problems beyond the original scope, so that their validity is confirmed in a wider range of situations [7, 8, 9, 10].

In this paper, we present a fuzzy expert system for hepatitis prognosis, called HEPAR. The system primarily aims to support liver transplantation decisions made by physicians who are not experts. It can also be used by medical students for training purposes.
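To make the Child-Turcotte-Pugh grading described above concrete, the following Python sketch computes the score and the class from the five inputs. It is an illustration only and not part of HEPAR. The bilirubin and albumin cut-offs are the ones quoted later in this paper; the prothrombin time cut-offs (4 and 6 seconds of prolongation) and the 1 to 3 coding of ascites and encephalopathy are the commonly cited convention and are assumptions here, as are all function and parameter names.

```python
def grade(value, low, high):
    """Return 1 below `low`, 2 within [low, high], 3 above `high`."""
    if value < low:
        return 1
    if value <= high:
        return 2
    return 3


def child_pugh(bilirubin_mg_dl, albumin_g_dl, pt_prolongation_s,
               ascites_grade, encephalopathy_grade):
    """Return (score, class) for a Child-Turcotte-Pugh style grading.

    ascites_grade and encephalopathy_grade are already coded 1 (absent),
    2 (mild) or 3 (severe); this coding is an assumption of the sketch.
    """
    score = (
        grade(bilirubin_mg_dl, 2.0, 3.0)
        + (4 - grade(albumin_g_dl, 2.8, 3.5))   # low albumin is the bad direction
        + grade(pt_prolongation_s, 4.0, 6.0)
        + ascites_grade
        + encephalopathy_grade
    )
    if score <= 6:
        return score, "A"
    if score <= 9:
        return score, "B"
    return score, "C"
```

With all five parameters in their best range the score is 5 (class A); with all five in their worst range it is 15 (class C), which matches the 5 to 15 range stated above.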
2 Knowledge Modeling

2.1 Database Description

Appropriate diagnosis requires doctors with long experience in gastroenterology. To evaluate HEPAR we finally used the data from the UCI Machine Learning Repository (hepatitis.data). These include the outcome of the disease (live – die), age, sex, history information (use of steroids, antivirals, biopsy request), symptoms (fatigue, malaise, anorexia), signs (big liver, firm liver, palpable spleen, ascites, spider-like spots, varices) and laboratory results (bilirubin, alkaline phosphatase, SGOT, albumin, prothrombin time) [11].
The description (file hepatitis.names) is incomplete, but it sums up to:
● 6 continuous variables,
● 12 boolean (1-2 = yes-no),
● class (1-2 = die-live) and
● gender (1-2 = male-female).
Missing data are noted with “?”. Every line comprises the 20 variables for one of the 155 cases.

2.2 Database Descriptive Statistics

For boolean variables, the number of cases in each category is given in Table 1a. The percentage of those who die is given in brackets. Statistically significant differences are underlined in the printed table.

Table 1a. Number of cases in each variable category in UCI hepatitis database

Input              Cases per category (% who die)
Class              32 DIE            123 LIVE
Sex                139 Male (23%)    16 Female (0%)
Steroids           76 Yes (26%)      78 No (15%)
Antivirals         24 Yes (8%)       131 No (23%)
Fatigue            100 Yes (30%)     54 No (4%)
Malaise            61 Yes (38%)      93 No (10%)
Anorexia           32 Yes (31%)      122 No (18%)
Big Liver          25 Yes (12%)      120 No (20%)
Firm Liver         60 Yes (22%)      84 No (17%)
Spleen Palpable    30 Yes (40%)      120 No (16%)
Spider             51 Yes (43%)      99 No (9%)
Ascites            20 Yes (70%)      130 No (13%)
Varices            18 Yes (61%)      132 No (15%)
Biopsy             70 Yes (10%)      85 No (29%)
Table 1b. Number of cases for continuous variables, average and range in UCI hepatitis database

Input/Class             Class DIE             Class LIVE
Age                     46.59 (30 - 70)       39.8 (7 - 78)
Bilirubin               2.54 (0.4 - 8)        1.15 (0.3 - 4.6)
Alkaline Phosphatase    122.38 (62 - 280)     101.31 (26 - 295)
SGOT                    99.83 (16 - 528)      82.44 (14 - 648)
Albumin                 3.15 (2.1 - 4.2)      3.98 (2.7 - 6.4)
Prothrombin time        43.5 (29 - 90)        66.57 (0 - 100)
Fig. 1 also shows the primary statistics of the input variables.
Fig. 1. The HEPAR variables' distributions
3 System Architecture and Design

3.1 Data Entry

Function readdatafacts is defined in file functions.clp to take care of facts entry. The function handles [12, 13]:
1. the conversion of commas (“,”) to spaces (“ ”) in the input line, so that the explode$ function can operate and give us easy access to every value,
2. the assertion of a fact (patient (value 1) (value 2) (…)) using default values: fuzzy values with equal representation 0.5 over their whole range,
3. the modification of the fact for each variable in the input line (if it has a numeric value and not a “?”), so that the new value is stored and we finally end up with a fact storing the input values, or the defaults for those missing,
4. the initialization of the facts (class live) and (class die) with a certainty factor of 0.01.
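The four steps above can be sketched in Python as follows. This is only an illustration of the described logic, not the FuzzyCLIPS function itself: the name read_data_facts is hypothetical, the field names are taken from the patient template in Section 4.2, and None is used as a stand-in for the default fuzzy value with membership 0.5 over the whole range.

```python
FIELDS = [
    "prognosis", "age", "sex", "steroid", "antivirals", "fatigue", "malaise",
    "anorexia", "bigliver", "firmliver", "spleen", "spider", "ascites",
    "kirsoi", "bilirubin", "alkpho", "sgot", "albumin", "protime", "biopsy",
]


def read_data_facts(line):
    """Turn one line of hepatitis.data into a patient record and class facts."""
    # Step 1: commas become spaces so the line can be split into tokens.
    tokens = line.replace(",", " ").split()
    if len(tokens) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(tokens)}")

    # Step 2: start from defaults (None stands in for the 'unknown' fuzzy value).
    patient = {name: None for name in FIELDS}

    # Step 3: overwrite the default wherever a numeric value is present.
    for name, token in zip(FIELDS, tokens):
        if token != "?":
            patient[name] = float(token)

    # Step 4: both class facts start with a certainty factor of 0.01.
    classes = {"live": 0.01, "die": 0.01}
    return patient, classes
```

Keeping the missing values as an explicit default, rather than dropping the case, is what lets every one of the 155 records pass through the rules.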
3.2 Fuzzy Rules

Rules reside in file rules.clp. The last rule to be activated sets the result to the global variable out:

;This rule defines the outcome in the range 1 - 2
(defrule patientgoingtodie ; last rule to be activated
  (declare (salience -1000))
  ?f [...]

[...] 40
  0 100 years
  ( (young (0 1) (30 1) (45 0))
    (middle age (30 0) (50 1) (65 0))
    (old (50 0) (65 1))
  )
)
Fig. 2. Linguistic value and membership functions of ‘AGE’
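As an illustration of how the breakpoints of the age template translate into membership degrees, the following Python sketch interpolates linearly between the listed points and holds the first and last degree constant outside them. The function name membership and the dictionary AGE are hypothetical; the breakpoints are copied from the listing above.

```python
def membership(points, x):
    """Piecewise-linear membership degree from (value, degree) breakpoints."""
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            # Linear interpolation between two neighbouring breakpoints.
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


AGE = {
    "young": [(0.0, 1.0), (30.0, 1.0), (45.0, 0.0)],
    "middle age": [(30.0, 0.0), (50.0, 1.0), (65.0, 0.0)],
    "old": [(50.0, 0.0), (65.0, 1.0)],
}

degrees_at_40 = {term: membership(points, 40.0) for term, points in AGE.items()}
```

For a 40-year-old patient this gives a degree of one third for “young”, one half for “middle age” and zero for “old”, so the same patient partially activates rules written for two different age groups.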
Gender

1 represents “male” and 2 represents “female”.

(deftemplate fz_sex ;Not necessary - just to be sure [in our data no female dies]
  1 2
  ( (male (1 1) (1.01 0))
    (female (1.99 0) (2 1))
  )
)

Yes/No

1 represents “yes” and 2 represents “no”. There are some problems in the calculation of certainty factors if ((1 1) (1 0)) is used, so we use ((1 1) (1.01 0)). In theory, the two classes could overlap so that the probability of error could be taken into account; after all, many variables that use this template are highly subjective. However, this doesn’t seem to help in the case of our UCI data.

(deftemplate fz_yesno ; yes/no
  1 2
  ( (yes (1 1) (1.01 0)) ; (1 1) (1 0) complicates things with certainty factors (complementary probability perhaps)
    (no (1.99 0) (2 1))
  )
)

Bilirubin

We define the classes based on the Child-Pugh scale, which divides bilirubin values into three ranges (<2, 2–3 and >3 mg/dL). We define values from 2 to 3 as “high” with a certainty increasing from 0 to 1, so that values of 3 or more are definitely high.

(deftemplate fz_bilirubin ; normal values 0.3 - 1 we have range 0.3-8
  0 10 mg/dL
  ( (normal (0.3 1) (1 1) (2 0)) ; we don't care for low values
    (high (2 0) (3 1))
  )
)

Alkaline Phosphatase

We consider normal values 30 – 120 U/dL, according to Harrison’s Principles of Internal Medicine, 15th ed., as we don’t have normal values according to the lab. We
assume that values higher than double the Upper Normal Value (UNV) are definitely “high”, while an increase of up to 33% over the UNV is definitely not “high”. This assumption is compatible with the expected increase in hepatitis.

(deftemplate fz_alkpho ; normal values 30 - 120, we have range 26-295, shows obstruction, in other diseases >1000
  0 400 U/dL
  ( (normal (30 1) (120 1) (180 0)) ; terminate normal at 1.5 x UpperNormalValue
    (high (160 0) (240 1))
  )
)

SGOT

We consider normal values 5 – 50 U/L. We define that a value up to three times the UNV is not “high”, while a value more than five times the UNV definitely is. This assumption seems compatible with the UCI data.

(deftemplate fz_sgot ; difficult to define normal values (not bell shaped 5 - 50), we have range 14-648, in other diseases >10000
  0 1000 U/L
  ( (normal (5 1) (30 1) (50 0.8) (100 0)) ; terminate normal at 2 x UpperNormalValue
    (high (150 0) (250 1)) ; start at 3x UNV, established at 5x UNV
  )
)

Albumin

Based on the Child-Pugh score once more, we define values less than 2.8 as definitely “low”, while values over 3.5 are definitely normal, i.e. not low [15].

(deftemplate fz_albumin ; we have range 2.1-6.4, Child score divides: >3.5, 2.8-3.5, [...]

[...] 100 (useless for our data as all patients have [...] 50. We use this to define a class “extreme” for values >50.

(deftemplate fz_pt ; normally this is compared to control... I take 12-15 as normal (range 21-100)
  0 100 sec
  ( (normal (0 0) (12 1) (15 1) (19 0)) ; Child divides: normal+6
    (high (15 0) (21 1))
    (extreme (45 0) (50 1))
  )
)

3.4 Fuzzy Rule Description

3.4.1 Protective Rules

These rules assert the fact (class live) when one of the protective factors is present. In total, these conditions may be regarded as protective (Table 2):
1. young age
2. use of antivirals
3. female gender
4. use of steroids
Regarding gender, there is no clear evidence. Although there are reports that women have a better prognosis, many believe that this difference is not statistically significant. In the UCI data none of the 16 women dies, but we don’t believe this can be used. In the rule femalegender the certainty factor is 0 (inactive).

There is no clear evidence for the use of steroids either. In most cases of (viral) hepatitis steroids are not recommended. Under very specific conditions, for example in autoimmune hepatitis, they may have some positive result. In our data a possible negative effect on prognosis is indicated. We don’t know the cause of the disease in every case, so a more specific rule (for example, a positive effect in acute or autoimmune hepatitis) cannot be formulated. The rule onsteroids doesn’t seem to affect the program results, so we set its certainty factor to 0 (inactive).

It is certain that the use of antivirals in viral hepatitis improves prognosis. However, in the UCI data we don’t see a statistically significant difference. We set a CF of 0.1 for the onantivirals rule.

Age is a very important factor. We assume that younger patients have a better prognosis; 20% more deaths are reported in patients of older age. The youngage rule has a CF of 0.2. We also assume that young age and use of antivirals are independent factors, so that a patient with both factors has a better prognosis. A rule antivir_young is formulated with a CF of 0.3.
Table 2. Protective fuzzy rules

Rule             (age young)   (antivirals yes)   (steroids yes)   (sex female)   CF
youngage         +             -                  -                -              0.2
onantivirals     -             +                  -                -              0.1
onsteroids       -             -                  +                -              0.0
femalegender     -             -                  -                +              0.0
antivir_young    +             +                  -                -              0.3
The certainty factor for the fact (class live) ranges from 0.01 to 0.3, as we see from these rules. Using only the three active rules, we get an estimate of 0.719 for the area under the ROC curve, which means that a randomly chosen survivor receives a better score than a randomly chosen non-survivor in about 72% of such pairs. Indeed, if we assume that only those cases for which a rule has been activated survive, we have an accuracy of 67.7%, which shows mostly the effect of age.
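The stated range of 0.01 to 0.3 is what one would obtain if repeated assertions of (class live) keep the largest certainty factor. The following Python sketch illustrates this under two simplifying assumptions that are not taken from the rule listings: the rule conditions are treated as crisp (fully true or false), and identical facts are combined by taking the maximum. The names PROTECTIVE_RULES and live_certainty are hypothetical.

```python
# (rule name, required conditions, certainty factor), as described in Section 3.4.1
PROTECTIVE_RULES = [
    ("youngage", {"age": "young"}, 0.2),
    ("onantivirals", {"antivirals": "yes"}, 0.1),
    ("onsteroids", {"steroids": "yes"}, 0.0),       # inactive
    ("femalegender", {"sex": "female"}, 0.0),       # inactive
    ("antivir_young", {"age": "young", "antivirals": "yes"}, 0.3),
]


def live_certainty(patient, initial=0.01):
    """Certainty of (class live) after all matching protective rules fire."""
    cf = initial
    for _name, conditions, rule_cf in PROTECTIVE_RULES:
        if all(patient.get(key) == value for key, value in conditions.items()):
            cf = max(cf, rule_cf)   # assumption: identical facts keep the larger CF
    return cf
```

Under these assumptions a young patient on antivirals reaches 0.3, while a patient matching none of the active rules stays at the initial 0.01.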
Fig. 3. The general structure of HEPAR
3.5 Intermediate Rules

We use these rules for an estimation of the general condition of the patient, the existence of portal hypertension (and therefore cirrhosis) and a test for the factors used by the Child-Pugh scoring system.

The general condition is a combination of three variables: fatigue, malaise and anorexia. We assume that these are at least partially independent and that the existence of more than one is a sign of worse general condition. Fatigue and malaise show a statistically significant difference between the two classes, always regarding the UCI data. There are reports that fatigue doesn’t have any prognostic value, though its frequent appearance in chronic hepatitis is recognized. A group of rules asserts the fact (generalcondition bad) when these symptoms are reported. The other combination of clinical evidence is used to evaluate the existence of portal hypertension, which is a result of cirrhosis and therefore a bad prognostic feature.
Table 3. Intermediate fuzzy rules

Rule                   (fatigue yes)   (malaise yes)   (anorexia yes)   CF
generalcondition1      +               -               -                0.5
generalcondition2      -               +               -                0.5
generalcondition3      -               -               +                0.4
generalcondition12     +               +               -                0.7
generalcondition23     -               +               +                0.7
generalcondition31     +               -               +                0.7
generalcondition123    +               +               +                0.9
Of the available data, portal hypertension is connected with palpable spleen, varices, spider spots, ascites (which is not used in these rules) and maybe a firm liver. A group of rules estimates the existence of portal hypertension and asserts the fact (portal hypertension). Combinations of the variables are pointless in this case, as the existence of one is enough for the diagnosis.

Table 4. Diagnosis fuzzy rules

Rule            (spleen yes)   (kirsoi yes)   (spider yes)   (firmliver yes)   CF
hyperportal1    +              -              -              -                 0.9
hyperportal2    -              +              -              -                 0.9
hyperportal3    -              -              +              -                 0.9
hyperportal4    -              -              -              +                 0.2
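The two groups of intermediate rules in Tables 3 and 4 can be read as lookups from reported findings to a certainty factor. The Python sketch below illustrates that reading; it treats the findings as crisp yes/no values and keeps the largest certainty factor among the rules that match, both of which are simplifying assumptions. The certainty factors for the combined symptoms follow the layout of Table 3 as it could be recovered, and all names in the code are hypothetical.

```python
GENERAL_CONDITION_RULES = {
    frozenset({"fatigue"}): 0.5,
    frozenset({"malaise"}): 0.5,
    frozenset({"anorexia"}): 0.4,
    frozenset({"fatigue", "malaise"}): 0.7,
    frozenset({"malaise", "anorexia"}): 0.7,
    frozenset({"anorexia", "fatigue"}): 0.7,
    frozenset({"fatigue", "malaise", "anorexia"}): 0.9,
}

PORTAL_HYPERTENSION_RULES = {
    "spleen": 0.9,
    "kirsoi": 0.9,      # varices
    "spider": 0.9,
    "firmliver": 0.2,
}


def general_condition_bad(findings):
    """Largest CF among the symptom combinations that are fully present."""
    present = {name for name in ("fatigue", "malaise", "anorexia") if findings.get(name)}
    matching = [cf for combo, cf in GENERAL_CONDITION_RULES.items() if combo <= present]
    return max(matching, default=0.0)


def portal_hypertension(findings):
    """Largest CF among the single signs that are present."""
    matching = [cf for sign, cf in PORTAL_HYPERTENSION_RULES.items() if findings.get(sign)]
    return max(matching, default=0.0)
```

The difference between the two groups is visible in the data structures: symptoms of general condition reinforce each other, so combinations have their own entries, whereas a single sign is enough for portal hypertension.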
The Child-Pugh scoring system uses the values of bilirubin, albumin, prothrombin time, severity of ascites and severity of encephalopathy. Each is graded from 1 to 3, so the result ranges from 5 to 15. For a score of 10 or more, the patient is classified in class C, where the risk is greatest. We don’t have the data to evaluate encephalopathy; however, the presence of three of the other four is enough to assume a greater danger. A group of rules asserts the fact (child high) when these conditions are met.

Table 5. Child fuzzy rules

Conditions: (albumin low), (bilirubin high), (protime high), (ascites yes)

Rule                                  Conditions met                                   CF
highChildscore1 to highChildscore4    three of the four (one combination per rule)     0.8
highChildscoreall                     all four                                         1.0
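The Child rule group amounts to counting how many of the four available conditions hold. A minimal Python sketch of that reading follows; it assumes crisp true/false conditions and uses the certainty factors of Table 5 (0.8 for any three conditions, 1.0 for all four). The function name child_high_cf is hypothetical.

```python
def child_high_cf(albumin_low, bilirubin_high, protime_high, ascites):
    """Certainty of (child high) from the four Child-Pugh factors available."""
    met = sum([bool(albumin_low), bool(bilirubin_high), bool(protime_high), bool(ascites)])
    if met == 4:
        return 1.0      # highChildscoreall
    if met == 3:
        return 0.8      # one of highChildscore1 to highChildscore4
    return 0.0          # fewer than three conditions: no rule fires
```

Requiring at least three conditions compensates for the missing encephalopathy grade: a patient cannot reach (child high) on a single abnormal laboratory value.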
High risk rules

The factors used for the assertion of (class die) and the rules that handle them are:
1. child score: this is the most reliable, CF 0.9
2. general condition: this is not objective, CF 0.1
3. portal hypertension: this uses clinical observations, CF 0.2
4. ascites: the existence of ascites alone is a very important prognostic feature; 70% of patients with ascites die in the UCI data. Although the child score uses this as well, it is reported as an independent factor. CF 0.5
5. alkaline phosphatase: not considered a statistically important factor, it is acknowledged as a possible tool, CF 0.35
6. SGOT and bilirubin values: inspired by the Emory score, which uses among others bilirubin and SGPT, we use bilirubin and the available transaminase SGOT, CF 0.5
7. age, PT, bilirubin: this is the only rule using the “extreme” PT, based on the King’s College criteria, CF 0.7
Rules with a low certainty factor are useful when the data cannot support a rule of higher validity. For example, when we don’t have the results of laboratory tests we can use the less objective clinical evidence.

Table 6. Risk calculation fuzzy rules

Rule              (child high)   (generalcondition bad)   (portal hypertension)   (ascites yes)   CF
Ischildhigh       +              -                        -                       -               0.9
Badcondition      -              +                        -                       -               0.1
Hasportalhyper    -              -                        +                       -               0.2
Hasacites         -              -                        -                       +               0.5

Table 7. Alternative fuzzy rules

Rule                      (alkpho high)   (sgot high)   (bilirubin high)   (age old)   (protime extreme)   CF
highalcpho                +               -             -                  -           -                   0.35
emory                     -               +             +                  -           -                   0.50
KingsCollegeTransplant    -               -             +                  +           +                   0.70
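As an illustration of how the risk rules of Tables 6 and 7 could accumulate into the certainty of (class die), the Python sketch below scales each rule's certainty factor by the certainty of its evidence and keeps the largest result. Both the scaling and the maximum are assumptions about how certainty factors propagate; the actual rule bodies are not reproduced in the text, and the mapping of the final difference onto the output scale is not shown either. All names are hypothetical.

```python
RISK_RULE_CF = {
    "child_high": 0.9,
    "generalcondition_bad": 0.1,
    "portal_hypertension": 0.2,
    "ascites": 0.5,
    "alkpho_high": 0.35,
    "emory": 0.5,
    "kings_college": 0.7,
}


def die_certainty(evidence, initial=0.01):
    """Certainty of (class die) given the certainty of each piece of evidence.

    `evidence` maps a rule's condition to a certainty in [0, 1]; missing
    entries count as 0 (the rule does not fire).
    """
    cf = initial
    for condition, rule_cf in RISK_RULE_CF.items():
        # Assumption: conclusion CF = rule CF x evidence CF, largest one kept.
        cf = max(cf, rule_cf * evidence.get(condition, 0.0))
    return cf


def certainty_difference(cf_live, cf_die):
    """The quantity the final rule is described as using: live minus die."""
    return cf_live - cf_die
```

For instance, (child high) asserted with certainty 0.8 would contribute 0.9 x 0.8 = 0.72, which dominates a simultaneous ascites contribution of 0.5.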
3.6 Prognosis Rules

The majority of the rules are based on an existing grading system or on reports on the data of the UCI database. Many modern scales cannot be used, as the data necessary for their calculation are unavailable. For example, MELD uses creatinine values and INR, Child-Pugh uses information about encephalopathy, etc. In addition, every scale is intended for hepatitis of a specific etiology, like the Lille score for alcoholic hepatitis, and this important information is also absent from the UCI data.

These variables are not used in any rules: gender, steroid use, liver size and decision to perform biopsy. Liver size, while useful for diagnosis, cannot help in the evaluation of prognosis. It is difficult to have an objective estimation of liver size. Moreover, only the change over time could be useful in the evaluation of the severity of the disease. In the UCI data, the difference between the two classes is not statistically significant.

The decision to perform a biopsy may be a sign of chronic disease, which could lead to a worse prognosis. However, the biopsy result may lead to better treatment. Statistical processing of the UCI data suggests a difference, with fewer deaths among those patients who were subjected to liver biopsy. We cannot support a reasonable rule using this information; moreover, random tests with such a rule don’t seem to improve the prognostic value of the rules.
4 Implementation Issues

4.1 FuzzyCLIPS 6.10d

The system has been implemented in FuzzyCLIPS 6.10d. Two main facts are used, (class die) and (class live). Every rule representing risk factors asserts the first, while every protective rule asserts the second. The final result is the difference between the certainty factors of these facts. While forming the rules, we keep in mind that some of the data can be combined (as all grading systems do) to form values useful for prognosis. The representation of the data is achieved through a fuzzy template which defines “normal” and “abnormal” for every variable. Default fuzzy values with equal representation over their whole range are used when a value is missing. The program is segmented in 4 files.

Input-output processes

We begin with batch file start.bat:

(defglobal ?*out* = "2")
(defglobal ?*in* = "")
(load "definitions.clp")
(load "functions.clp")
(load "rules.clp")
(open "../Hepatitis-uci/hepatitis.data" infile "r")
(open "results.csv" outfile "w")
(while (stringp (bind ?*in* (readline infile)))
  (readdatafacts ?*in*)
  (run)
  (printout outfile ?*in* "," ?*out* t)
  (reset)
)
(close infile)
(close outfile)
The variables out and in are defined. The variable in stores a line of input from file hepatitis.data, and out stores a value in [1,2], where 1 classifies the patient to class DIE and 2 to class LIVE. If no rule is activated, out retains its default value of 2. Fuzzy templates are defined in file definitions.clp, a helper function readdatafacts is defined in file functions.clp and the rules are stored in file rules.clp. Input and output files are opened. The input file contains lines of this format:

2,30,2,1,2,2,2,2,1,2,2,2,2,2,1.00,85,18,4.0,?,1

A while loop begins, where
1. the next line of the input file is stored to variable in
2. function readdatafacts converts this line to facts
3. rules are executed with (run) and thus the value of out is set
4. the input line and the result are stored to the output file
5. facts are cleared with (reset)

The loop terminates when in doesn’t contain a string, in other words when the end of the file has been reached and the EOF symbol is returned.
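The five steps of the loop can be mirrored in a few lines of Python. The sketch below is only an analogue of the batch file, not a replacement for it: the classify argument stands in for the combination of readdatafacts and (run), and the names run_batch and default_only are hypothetical. In-memory text streams are used so that the example is self-contained.

```python
import io


def run_batch(infile, outfile, classify):
    """Read one case per line, classify it, and append the result as CSV."""
    for raw_line in infile:                  # step 1: next input line
        line = raw_line.strip()
        if not line:
            continue
        out = classify(line)                 # steps 2-3: facts are built and rules run
        outfile.write(f"{line},{out}\n")     # step 4: input line plus result
        # step 5: nothing to clear here, each call to classify starts fresh


def default_only(line):
    """Placeholder classifier: returns the default outcome when no rule fires."""
    return 2


source = io.StringIO("2,30,2,1,2,2,2,2,1,2,2,2,2,2,1.00,85,18,4.0,?,1\n")
results = io.StringIO()
run_batch(source, results, default_only)
```

Because each output line repeats the full input line, the result file keeps the known outcome (first column) next to the computed value (last column), which is what the later threshold analysis needs.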
Fig. 4. Receiver Operating Characteristic curves
Finally, the files are closed. The output file is a comma-separated values (CSV) file and can be readily processed by spreadsheet programs (like Excel).
4.2 Template Definition

The following fuzzy templates are defined in file definitions.clp:
1. fz_age
2. fz_sex
3. fz_yesno
4. fz_bilirubin
5. fz_alkpho
6. fz_sgot
7. fz_albumin
8. fz_pt

A template is also defined for the patient. Each given variable for the case is stored in a slot:

(deftemplate patient
  (slot prognosis)
  (slot age (type FUZZY-VALUE fz_age))
  (slot sex (type FUZZY-VALUE fz_sex))
  (slot steroid (type FUZZY-VALUE fz_yesno))
  (slot antivirals (type FUZZY-VALUE fz_yesno))
  (slot fatigue (type FUZZY-VALUE fz_yesno))
  (slot malaise (type FUZZY-VALUE fz_yesno))
  (slot anorexia (type FUZZY-VALUE fz_yesno))
  (slot bigliver (type FUZZY-VALUE fz_yesno))
  (slot firmliver (type FUZZY-VALUE fz_yesno))
  (slot spleen (type FUZZY-VALUE fz_yesno))
  (slot spider (type FUZZY-VALUE fz_yesno))
  (slot ascites (type FUZZY-VALUE fz_yesno))
  (slot kirsoi (type FUZZY-VALUE fz_yesno))
  (slot bilirubin (type FUZZY-VALUE fz_bilirubin))
  (slot alkpho (type FUZZY-VALUE fz_alkpho))
  (slot sgot (type FUZZY-VALUE fz_sgot))
  (slot albumin (type FUZZY-VALUE fz_albumin))
  (slot protime (type FUZZY-VALUE fz_pt))
  (slot biopsy (type FUZZY-VALUE fz_yesno))
)
5 Experimental Results

The program returns a continuous value with a minimum of 1 for the cases definitely belonging to class “die”, whereas if only protective rules are activated, we get values larger than 2, with a maximum of 2.3. We need to set a threshold above which cases are classified to class “live”. Afterwards, we can perform the classical analysis of false and true positive and negative results, and calculate the sensitivity and the specificity. For example, for a threshold of 1.87 we have 96.9% sensitivity (only one false negative) and 69.9% specificity. Of those classified to “die” only 45% do indeed die, whereas 98.9% of those classified to “live” actually survive. If we set a threshold of 1.68, specificity rises to 91.9%, but sensitivity goes down to 62.5%.

This relationship between sensitivity and specificity is readily observed in Receiver Operating Characteristic curves¹, where sensitivity vs. (1 – specificity) is plotted for every threshold point [18, 19]. The area under the curve is a measure of the discriminative value of the test, with 1 showing perfect discrimination while 0.5 is a useless test (equal to flipping a coin). For the estimation of the curve we use the Johns Hopkins University program on the website http://www.jrocfit.org/ (some numeric conversions are necessary for compatibility). The calculated area under the ROC curve is 0.911 (standard error 0.0245), which is considered a very good result. It is possible that changes in the certainty factors or in the boundaries in the definitions of abnormal could give slightly better results. The calculated ROC curve and 95% confidence intervals are shown in Fig. 4. The estimation for the area under the curve:

AREA UNDER ROC CURVE:
Area under fitted curve (Az) = 0.9110
Estimated std. error = 0.0245
Trapezoidal (Wilcoxon) area = 0.9069
Estimated std. Error = 0.0364
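The figures quoted for the threshold of 1.87 can be reproduced from a confusion matrix. In the Python sketch below, the counts (31 true positives, 1 false negative, 86 true negatives, 37 false positives, with “die” as the positive class) are back-calculated from the reported percentages and the class sizes of 32 and 123; they are not listed in the text. The trapezoidal area function is a generic illustration and uses only the two operating points mentioned above, so it is not expected to reproduce the area computed from the full curve. All names are hypothetical.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and predictive values from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),      # share of predicted "die" who do die
        "npv": tn / (tn + fn),      # share of predicted "live" who survive
    }


def trapezoidal_auc(points):
    """Area under a ROC curve given (1 - specificity, sensitivity) points.

    The list must include the end points (0, 0) and (1, 1).
    """
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


at_187 = confusion_metrics(tp=31, fn=1, tn=86, fp=37)
at_168 = confusion_metrics(tp=20, fn=12, tn=113, fp=10)

coarse_auc = trapezoidal_auc([
    (0.0, 0.0),
    (1 - at_168["specificity"], at_168["sensitivity"]),
    (1 - at_187["specificity"], at_187["sensitivity"]),
    (1.0, 1.0),
])
```

With only two interior points the trapezoidal estimate is a lower bound on the area obtained from the complete set of thresholds, which is why a dedicated ROC program is used for the reported values.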
¹ The name originates from the ability of a radar operator to discriminate whether a dot on the radar is an enemy target or a friendly ship. ROC curves are part of signal detection theory, developed during the Second World War.

6 Related Work

Various systems have been created for hepatitis diagnosis, but none for liver transplantation. One of the most successful is HERMES [20]. The focus of that system is the evaluation of prognosis in chronic liver diseases. Our focus is the prognosis in hepatitis, regardless of whether it is acute or chronic. The systems therefore overlap in the case of chronic hepatitis: our system also covers cases of acute hepatitis, while HERMES also covers chronic liver diseases other than hepatitis. Note that UCI doesn’t specify the patient group in hepatitis, so it is not certain whether our data cover both acute and chronic disease.

HERMES evaluates prognosis as a result of liver damage, therapy, hemorrhagic risk and neoplastic degeneration effects; it calculates intermediate scores which are linearly combined (with relevant weights) to estimate prognosis. Our data can cover estimations of liver damage and therapy (antiviral) effects on prognosis, and partially (through the existence of varices) hemorrhagic risk; every rule has a certainty factor, similar to the weights of HERMES. In general, HERMES makes use of more functional tests; its authors report 49 laboratory tests. Some rules make use of temporal data as well.

Our variables are fuzzy. For example, HERMES would assert a glycid score (a functional test we don’t have in our data) if it is > 200, or if we have two tests > 140. A similar variable in our data would be defined as definitely abnormal if it was > 200, definitely normal if it was, for example, < 120, and 50% abnormal if it was 140. This gives our program power over marginal scores: the rule would be activated in our program if we had 199, but not in HERMES. Note that the HERMES description does not cover all of its variables and rules. It isn’t specified whether the program gives some result when many variables are missing. The authors don’t give any results, so we cannot evaluate the HERMES approach, at least not from this article, where the prototype is discussed.

Another system, for the diagnosis, treatment and follow-up of hepatitis B, is described in [21]. The focus of this system is the diagnosis, through serological tests (we don’t have any in the UCI data), of the exact type of the virus, and an estimation (using biochemical data and the patient’s history) of a possible response to interferon therapy. The system uses rules and is thus judged a very powerful learning tool. However, “in real-life cases, consultation of an expert system cannot serve as a substitute of the consultation of a human expert”. This is one of the many examples of expert systems focused on diagnosis. In viral hepatitis a multitude of serological data and their changes over time must be evaluated in order to establish a diagnosis with certainty, a task rather complicated for a non-expert doctor or a student, and that is why most systems aim at this problem. The rules are well-known and simple, which makes the use of expert systems ideal.

Neural networks have been used as well. For hepatitis B, one can define different classes (healthy subjects, healthy carriers, chronic patients), and neural networks can be used to discriminate between them. One such system, which performs data differentiation and parameter analysis of a chronic hepatitis B database with an artificial neuromolecular system, is described in [22], where the authors use similar data (plus a serological test), but their aim is diagnosis and they do not use rules.
7 Conclusions and Future Work

In this paper, we present the design, implementation and evaluation of HEPAR, a fuzzy expert system that deals with hepatitis prognosis and liver transplantation. The diagnosis process was modeled on the expert’s knowledge and the existing literature. Linguistic variables were specified based again on the expert’s knowledge and the statistical analysis of the records of 155 patients from the UCI hepatitis database. Linguistic values were determined with the help of the expert, the statistical analysis and bibliographical sources.

It cannot be denied that an expert system able to provide a correct prognosis in the field of chronic liver disease is an intrinsically ambitious project, for which there is no guarantee of final success. The large number of variables involved, the fact that knowledge of various aspects of the problem is still incomplete and, above all, the presence of a variable which is impossible to quantify, i.e. the peculiarity of each patient, may represent a barrier to achieving the intended results, especially with reference to individual patients. Finally, setting up a prototype of this kind may provide new information about the mechanisms which regulate the development of disease severity and the risk for liver transplantation, providing a more rational foundation for those assumptions which have so far been based only on the intuition and experience of the non-expert physician.

Experimental results showed that HEPAR did quite well. A possible improvement could be the re-determination of the values (fuzzy sets) of the linguistic variables and their membership functions. Better choices may give better results. On the other hand, the use of more advanced representation methods, like hybrid ones [23], including genetic algorithms or neural networks, may give better results.
References

[1] Disorders of the Gastrointestinal System: Liver and Biliary Tract disease. In: Harrison’s Principles of Internal Medicine, 15th edn (2001)
[2] Johnston: Special Considerations in Interpreting Liver Function Tests. American Family Physician 59(8) (1999)
[3] Shakil, et al.: Acute Liver Failure: Clinical Features, Outcome Analysis, and Applicability of Prognostic Criteria. Liver Transplantation 6(2), 163–169 (2000)
[4] Dhiman, et al.: Early Indicators of Prognosis in Fulminant Hepatic Failure: An Assessment of the Model for End-Stage Liver Disease (MELD) and King’s College Hospital Criteria. Liver Transplantation 13, 814–821 (2007)
[5] Soultati, A., et al.: Predicting utility of a model for end stage liver disease in alcoholic liver disease. World J. Gastroenterol 12(25), 4020–4025 (2006)
[6] Kamath, Kim: The Model for End-Stage Liver Disease (MELD). Hepatology (March 2007)
[7] Schepke, et al.: Comparison of MELD, Child-Pugh, and Emory Model for the Prediction of Survival in Patients Undergoing Transjugular Intrahepatic Portosystemic Shunting. AJG 98(5) (2003)
[8] Cholongitas, E., et al.: Review article: scoring systems for assessing prognosis in critically ill adult cirrhotics. Aliment Pharmacol Ther. 24, 453–464 (2006)
[9] Louvet, et al.: The Lille Model: A New Tool for Therapeutic Strategy in Patients with Severe Alcoholic Hepatitis Treated with Steroids. Hepatology 45(6) (2007)
[10] Botta, et al.: MELD scoring system is useful for predicting prognosis in patients with liver cirrhosis and is correlated with residual liver function: a European study. Gut. 52, 134–139 (2003)
[11] Hepatitis UCI database. G. Gong (Carnegie-Mellon University) via Bojan Cestnik, Jozef Stefan Institute, Ljubljana Yugoslavia (1988)
[12] Chan, et al.: Evaluation of Model for End-Stage Liver Disease for Prediction of Mortality in Decompensated Chronic Hepatitis B. American Journal of Gastroenterology (2006)
[13] Katoonizadeh, et al.: MELD score to predict outcome in adult patients with nonacetaminophen-induced acute liver failure. Liver International (2007)
[14] Jones: Fatigue Complicating Chronic Liver Disease. Metabolic Brain Disease 19(3/4) (December 2004)
[15] Forrest, Evans, Stewart, et al.: Analysis of factors predictive of mortality in alcoholic hepatitis and derivation and validation of the Glasgow alcoholic hepatitis score. Gut. 54, 1174–1179 (2005)
[16] Wadhawan, et al.: Hepatic Venous Pressure Gradient in Cirrhosis: Correlation with the Size of Varices, Bleeding, Ascites, and Child’s Status. Dig. Dis. Sci. 51, 2264–2269 (2006)
[17] Vizzutti, et al.: Liver Stiffness Measurement Predicts Severe Portal Hypertension in Patients with HCV-Related Cirrhosis. Hepatology 45(5) (2007)
[18] Fawcett, T.: ROC Graphs: Notes and Practical Considerations for Researchers (2004)
[19] Lasko, T.A., et al.: The use of receiver operating characteristic curves in biomedical informatics. Journal of Biomedical Informatics 38, 404–415 (2005)
[20] Bonfà, C., Maioli, F., Sarti, G.L., Milandri, P.R.: Dal Monte HERMES: an Expert System for the Prognosis of Hepatic Diseases, Technical Report UBLCS-93-19 (September 1993)
[21] Neirotti, R., Oliveri, F., Brunetto, M.R., Bonino, F.: Software and expert system for the management of chronic hepatitis B. J. Clin. Virol 34(suppl. 1), 29–33 (2005)
[22] Chen, J.: Data differentiation and parameter analysis of a chronic hepatitis B database with an artificial neuromolecular system. Biosystems 57(1), 23–36 (2000)
[23] Medsker, L.R.: Hybrid Intelligent Systems. Kluwer Academic Publishers, Boston (1995)