ELSEVIER
International Journal of Bio-Medical Computing 40 (1996) 179-185
A new approach to differential diagnosis of diseases
M.R. Ortiz-Posadas *a,b, J.F. Martinez-Trinidad c, J. Ruiz-Shulcloper ".d
a CINVESTAV-IPN, Electrical Engineering Department, Computer Science Section, Av. Instituto Politécnico Nacional 2508, Col. San Pedro Zacatenco, C.P. 07300 México, D.F., México
b UAM-Iztapalapa, Electrical Engineering Department, Av. Michoacán y la Purísima s/n, Col. Vicentina, Del. Iztapalapa, C.P. 09340 México, D.F., México
c BUAP, Benemérita Universidad Autónoma de Puebla, Physics and Mathematics Sciences Faculty, Computer Science College, Av. San Claudio y 14 Sur, Ciudad Universitaria, Puebla, Pue., México
d ICIMAF, Instituto de Cibernética, Matemática y Física, Academy of Science from Cuba, Calle E No. 309 Esq. a 15, Vedado, C.P. 10400, Ciudad Habana, Cuba
Received 14 August 1995; accepted 20 September 1995
Abstract
The main goal of this paper is to show the usefulness of the logic-combinatory approach in pattern recognition theory for developing auxiliary criteria for differential medical diagnosis, based on the methodology presented by Heathfield et al. (J Biomed Eng 13 (1991) 51-57). Firstly, we propose a change in the characterization base, from disease characterization to patient characterization; we then suggest a k-valued treatment of variables, which allows us to assign them values in a wider range in order to represent different degrees of symptom manifestation. Secondly, the proposed methodology is based on Testor Theory. This theory allows us to obtain the minimum combination of features (symptoms) and the set of all equally discriminant feature combinations (typical testors) among the diseases considered. Then, by applying some classification algorithm that uses typical testors, physicians will have more flexibility in making differential diagnosis.
Keywords: Computer-assisted diagnosis; Decision support; Differential diagnosis; Medical decision making; Pattern recognition; Testor theory
1. Introduction
A quick review of recent papers reveals great interest in developing decision support systems for medical diagnosis using several approaches, such as the probabilistic approach [1,2], the logical approach (related to expert systems) [3,4] and knowledge-based systems that incorporate other decision support techniques [5-7]. This work belongs to the last category. The main goal of this paper is to show the usefulness of the logic-combinatory approach in pattern recognition theory [8,9] for developing auxiliary criteria for differential medical diagnosis, based on the methodology presented in Heathfield et al. [5]. That work describes five different diseases in terms of five associated symptom manifestations, incorporated within a matrix called a decision table, in which the diagnosis is reduced to finding the minimum cover of the matrix. This approach is a particular case of the calculation of typical testors in Testor Theory [10,11]. It will also be demonstrated that if the patient is characterized instead of the disease (which is the situation physicians are normally in), differential diagnosis of diseases can also be performed.
* Corresponding author, email: [email protected]
0020-7101/96/$15.00 © 1996 Elsevier Science Ireland Ltd. All rights reserved SSDI 0020-7101(95)01143-3
2. Knowledge representation
2.1. Boolean treatment of variables
Representing the knowledge involved in the process of medical diagnosis is an important problem because of its inherent subjectivity. Many approaches exist for accomplishing this. For example, in Heathfield et al. [5], the symptom manifestation that characterizes certain diseases is represented by five levels expressed as A, H, M, L, N. The intermediate levels (H, M, L) have an uncertainty associated with them, and the A and N levels represent absolute knowledge, i.e. a symptom is manifested or not. Nevertheless, it is necessary to transform these variables (which can be considered linguistic [12]) into magnitudes that allow computational processing in order to use them in an automated system. There are many ways of transforming them; in Heathfield et al. [5], a transformation into boolean values is proposed, expressing the presence or absence of the symptoms in terms of ones or zeros, respectively. It is important to realize that the symbol '?' used in the decision table means absence of information about a particular disease (we do not know whether the symptom manifests or not). Heathfield et al. propose substituting symbols A and N by one and zero, respectively, and the remaining entries in the table (H, M, L) by the symbol '?' because of their associated uncertainty.
Some aspects must be pointed out here. First of all, although it is true that symbols A and N represent absolute knowledge and can therefore be substituted directly by the codes one and zero, respectively, it is also true that we cannot ignore the remaining symbols H, M, L, because these represent the symptom manifestation in certain degrees in the disease characterization, and ignoring them means loss of information. In Heathfield et al. [5] the authors consider this fact and point out that, depending on the interest of the user and on the application, these uncertainty levels can be gradually incorporated in the transformation to the boolean matrix. For example, to include all those symptoms which have an H assigned we would substitute them by one, in addition to the A and N symbols previously mentioned.
Although in Heathfield et al. [5] there is no explanation of how to incorporate the next uncertainty levels (M, L), we propose that the M symbol be substituted by one or zero and the L symbol by zero, in such a way that knowledge representation with the uncertainty levels proposed in Heathfield et al. [5], along with the criteria proposed here, results in the transformation to a boolean matrix shown in Table 1.
Some other aspects must also be pointed out. As mentioned earlier, Heathfield et al. [5] consider incorporating the uncertainty levels of knowledge in the transformation to boolean variables; however, this transformation is erroneous. A implies the absolute presence of the symptom, and although H implies that the symptom is frequently present, it does not strictly mean the same as A, so substituting the H symbol by one would imply that both are the same, which is not strictly correct. Likewise, symbol L, which means that the symptom occasionally manifests, cannot be substituted by zero because this would mean that the symptom never appears, which is false. Therefore, it may be concluded that even when all the entries are considered in the transformation to a boolean matrix, loss of information still exists. However, the use of these criteria in the transformation of variables is justified because the methodology proposed in Heathfield et al. [5] allows only boolean values.
Table 1
Boolean treatment for the variables
Iterations   A   H   M   L   N   ?
1a.          1   ?   ?   ?   0   ?
2a.          1   1   ?   ?   0   ?
3a.          1   1   ?   0   0   ?
4a.          1   1   1   0   0   ?
5a.          1   1   0   0   0   ?
2.2. K-valued treatment of variables
Table 2
An example of k-valued assignment to variables
Variable              A   H   M   L   N   ?
k-valued assignment   4   3   2   1   0   ?
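The substitutions listed in Tables 1 and 2 can be written as lookup tables, which makes the loss of information under the boolean treatment easy to see. The following is only a minimal sketch: it assumes a reading of Table 1 in which every level not yet substituted stays '?', and the names `BOOLEAN_ITERATIONS`, `K_VALUED` and `encode` are illustrative and do not come from Heathfield et al. [5].

```python
UNKNOWN = "?"  # absence of information, kept as-is in every treatment

# Table 1 (as read here): one substitution per iteration, 1a. to 5a.
BOOLEAN_ITERATIONS = {
    "1a": {"A": 1, "H": UNKNOWN, "M": UNKNOWN, "L": UNKNOWN, "N": 0},
    "2a": {"A": 1, "H": 1, "M": UNKNOWN, "L": UNKNOWN, "N": 0},
    "3a": {"A": 1, "H": 1, "M": UNKNOWN, "L": 0, "N": 0},
    "4a": {"A": 1, "H": 1, "M": 1, "L": 0, "N": 0},
    "5a": {"A": 1, "H": 1, "M": 0, "L": 0, "N": 0},
}

# Table 2: an example of k-valued assignment with k = 5.
K_VALUED = {"A": 4, "H": 3, "M": 2, "L": 1, "N": 0}


def encode(symbols, mapping):
    """Substitute each linguistic level; anything unmapped stays '?'."""
    return [mapping.get(symbol, UNKNOWN) for symbol in symbols]


levels = ["A", "H", "M", "L", "N", "?"]
boolean_row = encode(levels, BOOLEAN_ITERATIONS["4a"])
k_valued_row = encode(levels, K_VALUED)
```

Under iteration 4a the levels A, H and M all collapse to one, whereas the k-valued assignment keeps them apart as 4, 3 and 2.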
The fact that information is lost during the transformation of variables to boolean values suggests a certain risk in the decision making process. That is why, in order to recover such information, we propose a k-valued treatment of variables, meaning that values are assigned from the set {0, 1, ..., k - 1}, allowing us to have a range for the symptom manifestation. For example, regarding knowledge representation in Heathfield et al. [5], instead of assigning only ones and zeros, it is possible to assign values in a wider range to variables (as shown in Table 2). The absence of the symptom would be assigned a zero (as in the boolean case) but its absolute manifestation would be assigned a 4; besides, we can still use the symbol '?' to represent absence of information. Expressing knowledge in terms of k-valued variables yields a matrix like the one shown in Heathfield et al. [5] but with a wider range of assignments. The problem that can arise is how to know whether symptom manifestations with different values are similar or not, i.e. how to compare different values of the same variable. This problem is solved by introducing the following function. Let $C_i$ be a function, called comparison criterion, defined as:
$$C_i : M_i \times M_i \rightarrow \Delta$$
where $M_i$ is the set of allowed values for variable $x_i$, plus '?' to represent absence of information, in such a way that
$$M_i = \{?, \alpha_1, \ldots, \alpha_k\}$$
and $\Delta$ is any bounded representation space for variables, which depends on the information levels required for the problem solution. The $C_i$ function can be used for similarity comparison or difference comparison between variable values, representing the magnitude of similarity (or difference, respectively) between two different values of the same variable.
3. The extended method
When considering a supervised classification problem with the logic-combinatory approach, it is necessary to construct a learning matrix (LM), like the one shown in Table 3. Let $O$ be a finite set of objects. A description $I(O_i)$ is defined for every object $O_i \in O$. The description of every object (pattern) is represented as a finite sequence of values of the features (variables) $x_i$, $i = 1, \ldots, n$: $I(O) = (\alpha_1, \ldots, \alpha_n)$, $\alpha_i \in M_i$. The set $O$ is known to be composed of a finite number of subsets $K_1, \ldots, K_l$ called classes. Each class $K_i$, $i = 1, \ldots, l$, is defined by a number of objects. This information about classes is called training information.
Table 3
Learning matrix with m objects, n variables and l classes
Classes   Objects   Variables
                    x_1     ...   x_n
K_1       O_1       α_11    ...   α_1n
          :         :             :
          O_p       α_p1    ...   α_pn
...
K_l       O_q       α_q1    ...   α_qn
          :         :             :
          O_m       α_m1    ...   α_mn
If the problem is related to medicine and the number of objects in each class is greater than one, characterization is done on patients by means of their symptom manifestations associated with a particular disease: the objects $O_i$ in LM are the patients, the features $x_i$ are the symptoms and the classes $K_i$ are the different diseases a patient can suffer from. If the number of objects in every class is exactly one, the learning matrix is the same as the transposed decision table presented in Heathfield et al. [5], because in this case characterization was done on the disease; there is only one description for each disease incorporated in the diagnosis (each disease represents a class) and the variables, as in the previous case, correspond to the symptoms considered in the disease characterization. The algorithm presented in Heathfield et al. [5], expressed as a supervised classification problem with the logic-combinatory approach, is shown below.
3.1. The algorithm
In general terms, the algorithm is 'attempting to improve upon the minimum cover technique as described by Reggia et al. with adaptation of hypergraphs partitioning' [5]. The diagnosis problem is reduced to finding the cover for all diseases considered in the decision table with the minimum number of features, i.e. to being able to differentiate among diseases with the minimum number of symptoms. We will show that this methodology is equivalent to the calculation of typical testors within Testor Theory [10,11]. Assuming that LM is the transposed decision table used in Heathfield et al. [5], $O_j$ denotes a disease, $x_i$ denotes a symptom and $x_i(O_j)$ is the value of variable $x_i$ for disease $O_j$; moreover, every class has only one object: the description of the disease $O_j$. Let $E(x)$ be the set of all pairs of descriptions of different diseases that are similar in symptom $x$, defined as follows:
$$E(x) = \{(O_p, O_q) \mid O_p \in K_i \wedge O_q \in K_j \wedge i \neq j \wedge C(x(O_p), x(O_q)) = 1\}$$
where $C$ is a boolean similarity comparison criterion. We also define
$$E'(x) = \bigcup_{i,j = 1, \ldots, l} (K_i \times K_j) - E(x)$$
as the set of pairs of disease descriptions differentiable by symptom $x$, i.e. the value of $x$ is different for the two diseases (different classes). It is then necessary to calculate $E'(x_i)$ for $i = 1, \ldots, n$ and to determine the set of all symptoms capable of differentiating between every pair of diseases, so that, knowing the sets $E'(x_1), \ldots, E'(x_n)$,
$$x(K_{ij}) = \{x \mid O_p \in K_i \wedge O_q \in K_j \wedge (O_p, O_q) \in E'(x)\}$$
is the set of all symptoms that differentiate between two descriptions of diseases $(O_p, O_q)$. This corresponds in Heathfield et al. [5] to the set $e_{kl}$, which contains the symptoms that differentiate the disease pair $(P_k, P_l)$. These symptoms have a different value for each disease. It must be pointed out that, depending on the number of objects in each class, the set $x(K_{ij})$ will have different interpretations. We will discuss this aspect in the next section.
3.1.1. The number of objects in each class is greater than one
(a) There exists a pair of descriptions of different diseases that has different values for all their symptoms $x_r$, $r = 1, \ldots, s$:
$$\exists (O_p, O_q) \in K_i \times K_j,\ (i \neq j) = 1, \ldots, l, \text{ such that } \forall r = 1, \ldots, s:\ (O_p, O_q) \in E'(x_r)$$
(b) There exists a pair of descriptions of different diseases that has different values for at least one of their symptoms $x_r$, $r = 1, \ldots, s$:
$$\exists (O_p, O_q) \in K_i \times K_j,\ (i \neq j) = 1, \ldots, l, \text{ such that } \exists r = 1, \ldots, s:\ (O_p, O_q) \in E'(x_r)$$
(c) All pairs of descriptions of different diseases have different values in all their symptoms $x_r$, $r = 1, \ldots, s$:
$$\forall (O_p, O_q) \in K_i \times K_j,\ (i \neq j) = 1, \ldots, l, \text{ such that } \forall r = 1, \ldots, s:\ (O_p, O_q) \in E'(x_r)$$
(d) All pairs of descriptions of different diseases have different values in at least one of their symptoms $x_r$, $r = 1, \ldots, s$:
$$\forall (O_p, O_q) \in K_i \times K_j,\ (i \neq j) = 1, \ldots, l, \text{ such that } \exists r = 1, \ldots, s:\ (O_p, O_q) \in E'(x_r)$$
Interpretations (a) and (b) consider only the comparison between two objects, each belonging to a different class and having different values in one or all of their features. This leaves open the possibility that the remaining objects in each class can be mistaken for one another; therefore neither interpretation is useful for the proposed goal: we want to differentiate between classes $K_i$ and $K_j$ no matter which pair of objects is selected. Interpretation (c) is too rigorous for what classification needs. It requires all pairs of objects from different classes to be differentiable in all their features. This is not the real case in medical diagnosis, because it would mean that two patients suffering from different diseases must have different manifestations in all their symptoms, which is not necessarily true, since there are diseases characterized by similar symptoms. For all these reasons, if the number of elements in class $K_i$ is greater than one (meaning patient characterization), the real interest is to be able to discriminate among all objects grouped in different classes, and therefore the useful interpretation is (d): it does not matter how many features differentiate between objects of different classes; it is enough that at least one feature does.
3.1.2. The number of objects in each class is exactly one
Determining the set $x(K_{ij})$ when the cardinality of $K_i$ is 1, as in Heathfield et al. [5], means that every class contains just one object description, which reduces the four possible interpretations to only two: (a) = (c) and (b) = (d). As in the previous case, the useful interpretation is (d) = (b).
Table 4
Learning matrix constructed by transposing the 'decision table' [5]
Diseases   Symptoms
           F1   F2   F3   F4   F5
P1         ?    0    1    ?    0
P2         1    ?    1    1    1
P3         1    0    ?    0    0
P4         1    1    1    1    1
P5         0    ?    0    1    0
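The four interpretations (a) to (d) of Section 3.1.1 can be checked mechanically once a difference criterion is fixed. The sketch below is only an illustration under stated assumptions: the criterion treats two k-valued entries as different when both are known and their absolute difference exceeds a threshold, a '?' never differentiates, and the patient data are invented. None of the function names come from the original methodology.

```python
from itertools import combinations

UNKNOWN = "?"  # absence of information


def make_difference_criterion(threshold=0):
    """Assumed boolean difference criterion for k-valued variables."""
    def differs(a, b):
        if a == UNKNOWN or b == UNKNOWN:
            return False  # assumption: '?' never differentiates
        return abs(a - b) > threshold
    return differs


def differentiating_features(obj_p, obj_q, differs):
    """Indices r of the features x_r for which (O_p, O_q) is in E'(x_r)."""
    return {r for r, (a, b) in enumerate(zip(obj_p, obj_q)) if differs(a, b)}


def holds(interpretation, classes, features, differs):
    """Check interpretation 'a', 'b', 'c' or 'd' for every pair of classes."""
    if interpretation not in ("a", "b", "c", "d"):
        raise ValueError(interpretation)
    wanted = set(features)
    # (a), (b): some pair of objects; (c), (d): all pairs of objects.
    over_pairs = any if interpretation in ("a", "b") else all

    def pair_ok(diff):
        # (a), (c): differ in all features; (b), (d): in at least one.
        if interpretation in ("a", "c"):
            return wanted <= diff
        return bool(wanted & diff)

    return all(
        over_pairs(
            pair_ok(differentiating_features(p, q, differs))
            for p in objs_i for q in objs_j
        )
        for objs_i, objs_j in combinations(classes.values(), 2)
    )


# Hypothetical patients: two diseases, three symptoms valued in {0, ..., 4}.
patients = {
    "K1": [(4, 0, 2), (3, 0, 1)],
    "K2": [(0, 0, 2), (1, UNKNOWN, 3)],
}
differs = make_difference_criterion(threshold=0)
```

With these invented data, interpretation (d) holds for the full symptom set while (c) does not, because no pair of patients differs in the second symptom; this mirrors the argument that (c) is too strict for patient characterization.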
Considering again the algorithm presented in Heathfield et al. [5] and expressing the total covering matrix E in terms of the logic-combinatory approach, it is defined as follows:
$$Q(LM) = \{x(K_{ij}) \mid x(K_{ij}) \neq \emptyset\}$$
Each feature $x_i$ is associated with a subset of $Q(LM)$ which denotes the disease pairs it differentiates:
$$x_i \rightarrow \{x(K_{ij}) \in Q(LM) \mid x_i \in x(K_{ij})\} \quad \forall i \neq j = 1, \ldots, l$$
where $x(K_{ij})$ represents a pair of disease descriptions such that
$$O_p \in K_i \wedge O_q \in K_j$$
The minimum set of features which covers the learning matrix LM will be denoted by
$$\tau = \{x_{i_1}, \ldots, x_{i_s}\}$$
and has the following properties:
P1. $\forall (O_p, O_q) \in K_i \times K_j,\ (i \neq j) = 1, \ldots, l,\ \exists x_{i_r} \in \tau:\ (O_p, O_q) \in E'(x_{i_r})$, and;
P2. $\neg\exists\, \tau' \subset \tau$ such that $\tau'$ satisfies P1.
The set $\tau$ is a Testor Theory concept called typical testor [10,11]. It is important to realize that Heathfield et al. [5] present an algorithm that determines the most important symptoms for making differential diagnosis among the diseases considered, yielding as a result the set {(F1, F4)}, and point out that these symptoms do not have the capability of discriminating between diseases P2 and P4. If the decision table presented in Heathfield et al. [5] is transposed (Table 4), a learning matrix with disjoint classes is generated and the set {(F1, F4)} corresponds to an irreducible feature combination [10,11]. This is the typical testor concept previously mentioned, applied to disjoint classes. If we use any algorithm to calculate all typical testors [13] in the matrix shown in Table 4, we obtain the following set of typical testors:
$$\tau = \{(F_1, F_5), (F_1, F_4), (F_2, F_4), (F_4, F_5)\}$$
In Table 5 we can see that the description for each testor in each class is different for every symptom pair (there are no equal rows in any of the matrices), in such a way that each of these symptom combinations is equally discriminant among diseases P1, P3, P5.
Table 5
All typical testors corresponding to the matrix in Table 4
[Sub-matrices for the testors (F1, F4), (F1, F5), (F2, F4) and (F4, F5); the entries are not recoverable from the source.]
4. Classification
The histopathological diagnosis of breast disease presented in Heathfield et al. [5] is made by symptom exploration, depending on the value assigned to each symptom and relating these values to the minimum covering of matrix E: {(F1, F4)}. For example, if the symptom values were F1 = 1 and F4 = 0, the disease having this combination (in the matrix shown in Table 4) would be P3, so this would be the diagnosis. It is important to realize, however, that the diagnosis depends on the manifestation of these particular symptoms. If they are not observed, it is not possible to make a decision, and this limits the system. Calculating typical testors (and continuing with the methodology in Heathfield et al. [5]) yields all feature combinations that are equally discriminant among the different classes, in such a way that when we want to classify a new object, its symptom manifestations are compared against the different testors obtained. This allows us to classify the object in one or more classes. In this sense, this methodology does not limit the features to a single number or combination, which means more flexibility for the physician in making a decision about the disease the patient suffers from. There are several algorithms that use typical testors for solving supervised classification problems with a logic-combinatory approach, such as voting algorithms [14], KORA-3 [15], representative sets [16], etc. The idea of these algorithms is to classify objects according to their description and their similarity to the objects already classified in the classes $K_i$. It would be possible to apply any of these algorithms to the learning matrix in order to classify a new object, obtaining the decision criteria for the class it belongs to.
5. Conclusions
Regarding knowledge representation, this work first of all provides wider criteria for obtaining the boolean matrix presented in Heathfield et al. [5], allowing us to incorporate the different uncertainty levels gradually. It also proposes the k-valued treatment of variables to recover information lost with the boolean treatment of the matrix, allowing us to express symptom manifestations by degrees. Likewise, it introduces the comparison criterion function $C_i$, which allows us to compare different values of the same variable, expressed as boolean values or k-values, in order to know whether they are similar or not. We found that the methodology in Heathfield et al. [5] is a particular case of the calculation of typical testors in Testor Theory. This theory allows us to obtain the minimum combination of features (cover E) and the set of all equally discriminant feature combinations (typical testors) between the different classes incorporated in the learning matrix. By applying some classification algorithm that uses typical testors, physicians will have more flexibility in the system for making differential diagnosis. The methodology introduced in this work allows us to process information both when the number of objects in every class is equal to one and when it is greater than one, depending on the object characterization. In the medical context this means disease characterization or patient characterization, respectively. As we said, patient characterization is the situation physicians are normally in. The methodology shown in this paper is extremely useful, not only in the histopathological diagnosis of breast diseases [5], but in the differential diagnosis of any disease, whenever the disease or the patient is characterized by symptom manifestations. Finally, we are now working on fuzzy knowledge representation and the tools needed to compare variable values expressed in the fuzzy domain.
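To make properties P1 and P2 of Section 3.1 and the testor-based comparison of Section 4 concrete, the following sketch enumerates typical testors by exhaustive search and then classifies a new object with a simple vote. It is an illustration only: the search is brute force rather than the BT or TB algorithms [13], the vote is a simplified stand-in and not ALVOT [14], a '?' is assumed never to differentiate two values, and the learning matrix is invented.

```python
from itertools import combinations

UNKNOWN = "?"  # absence of information


def differs(a, b):
    """Assumed criterion: known and unequal values differ; '?' never does."""
    return a != UNKNOWN and b != UNKNOWN and a != b


def is_testor(classes, features):
    """P1: every pair of objects from different classes differs in some feature."""
    for objs_i, objs_j in combinations(classes.values(), 2):
        for p in objs_i:
            for q in objs_j:
                if not any(differs(p[f], q[f]) for f in features):
                    return False
    return True


def typical_testors(classes, n_features):
    """Exhaustive search by increasing size; P2 means no proper subset is a testor."""
    found = []
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            if any(set(t) <= set(subset) for t in found):
                continue  # contains a smaller testor, so it is reducible
            if is_testor(classes, subset):
                found.append(subset)
    return found


def classify(classes, testors, new_obj):
    """Simplified vote: a stored object votes for its class once per testor
    on which it does not differ from the new object."""
    votes = {name: 0 for name in classes}
    for testor in testors:
        for name, objs in classes.items():
            for obj in objs:
                if not any(differs(obj[f], new_obj[f]) for f in testor):
                    votes[name] += 1
    best = max(votes.values())
    if best == 0:
        return []
    return sorted(name for name, count in votes.items() if count == best)


# Hypothetical learning matrix: three diseases, four k-valued symptoms.
learning_matrix = {
    "D1": [(4, 0, 2, 1), (3, 0, 2, 1)],
    "D2": [(0, 3, 2, 1), (0, 4, 1, 1)],
    "D3": [(4, 4, 0, 1)],
}
testors = typical_testors(learning_matrix, n_features=4)
diagnosis = classify(learning_matrix, testors, (4, 0, UNKNOWN, 1))
```

Because several testors are used, a new description with a missing symptom value can still be compared through the testors that do not depend on that symptom, which is the flexibility argued for above.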
Acknowledgement
The authors wish to thank the National Council of Science and Technology (CONACYT) of México for the support provided.
References
[1] Blinowska AG, Chatellier G, Wojtasik A and Bernier J: Diagnostica - a bayesian decision aid system - applied to hypertension diagnosis. IEEE Trans Biomed Eng, 3 (1993) 230-235.
[2] Edwards NH: The accuracy of a bayesian computer program for diagnosis and teaching in acute abdominal pain of childhood. Comput Meth Prog Biomed, 23 (1986) 155-160.
[3] Alonso-Betanzos AV, Moret-Bonillo J and Hernández-Sande C: Fetus: An expert system for fetal assessment. IEEE Trans Biomed Eng, 38 (1991) 199-211.
[4] Okada M, Okada M: Knowledge representation and compilation for symptom-disease-test relationships. IEEE Trans Biomed Eng, 36 (1989) 547-551.
[5] Heathfield HA, Winstanley G, Kirkham H: Decision support system for the differential diagnosis of breast disease. J Biomed Eng, 13 (1991) 51-57.
[6] Ortiz PM: Diarrheic diseases diagnostic by analytic hierarchy method. Proc 5th Int Conf on Systems Science on Health Care, Prague, 1992, pp. 1148-1151.
[7] Tohfi JC, Fuentes P, Soto MA: Algorithm for assisting medical diagnosis. Comput Meth Prog Biomed, 39 (1993) 303-309.
[8] Ruiz-Shulcloper J, Lazo-Cortés M: In Modelos matemáticos para el reconocimiento de patrones, (Ed: UNAM), Aportaciones-textos, IMATE, México, 1994.
[9] Ruiz-Shulcloper J. et al: In Introducción al reconocimiento de patrones: enfoque lógico-combinatorio, (Ed: CINVESTAV-IPN), Serie verde, México, 1994.
[10] Ruiz-Shulcloper J. et al: In Tópicos acerca de la teoría de testores, Colección amarilla No. 134, (Ed: CINVESTAV-IPN), México, 1994, pp. 1-51.
[11] Ruiz-Shulcloper J. et al: Introducción a la teoría de testores, (Ed: CINVESTAV-IPN), Serie verde, México, 1994.
[12] Zadeh LA: The concept of a linguistic variable and its application to approximate reasoning 1. Inf Sci, 8 (1975) 199-249.
[13] Ruiz-Shulcloper J. et al: Algoritmos BT y TB para el cálculo de todos los testores típicos. Revista Ciencias Matemáticas, 2 (1985) 11-18.
[14] Ruiz-Shulcloper J. et al: ALVOT, sistema de programas de algoritmos de votación para la clasificación. Revista Ciencias Matemáticas, 7 (1986) 41-60.
[15] De la Vega-Doria LA: Extensión al caso difuso del algoritmo de clasificación KORA-3. In Tesis para obtener el grado de Maestro en Ciencias (Ed: CINVESTAV-IPN), México, 1994.
[16] Carrasco-Ochoa JA: Clasificadores basados en conjuntos representantes. In Tesis para obtener el grado de Maestro en Ciencias (Ed: CINVESTAV-IPN), México, 1994.
[17] Bankowitz R and Miller RA: Computer assisted medical diagnostic consultation service. Ann Intern Med, 110 (1989) 824-832.
[18] Boom AR: Looking for indicants in the differential diagnosis of jaundice. Med Decision Making, 6 (1986) 36-41.
[19] Segaar RW, Wilson J, Habbema J, Hilden J: A computer aid for early diagnostic classification of jaundice (The COMIP program). Comput Meth Prog Biomed, 28 (1989) 131-136.
[20] Boom AR, Fonseca L, Yañez C, et al: Differential diagnosis between amoebic liver abscess and acute cholecystitis. J Med Systs, 3 (1983) 205-212.
[21] Szolovits P, Pauker SG: Computers and clinical decision making: whether, how and for whom?. Proc IEEE, 67 (1979) 1224-1226.
[22] Waxman HS, Worley WE: Computer-assisted adult medical diagnosis: subject review and evaluation of a new microcomputer-based system. Medicine, 68 (1990) 125-136.
[23] Ruiz-Shulcloper J: Modelos de algoritmos de reconocimiento con aprendizaje parcial. Proc III Congreso Iberoamericano de Inteligencia Artificial IBERAMIA'92, La Habana 1992, 541-559.
[24] Shortliffe EH: Computer programs to support clinical decision making. J Am Med Assoc, 258 (1987) 61-66.