Generating fuzzy rules for constructing interpretable ... - Springer Link

1 downloads 0 Views 1MB Size Report
Mohamadi H, Habibi J, Abadeh MS, Saadi H (2008) Data mining with a simulated annealing based fuzzy classification system. Pattern Recognit 41:1841–1850.
Australas Phys Eng Sci Med DOI 10.1007/s13246-012-0155-z

REVIEW

Generating fuzzy rules for constructing interpretable classifier of diabetes disease Nesma Settouti • M. Amine Chikh Meryem Saidi



Received: 7 May 2011 / Accepted: 9 July 2012  Australasian College of Physical Scientists and Engineers in Medicine 2012

Abstract Diabetes is a type of disease in which the body fails to regulate the amount of glucose necessary for the body. It does not allow the body to produce or properly use insulin. Diabetes has widespread fallout, with a large people affected by it in world. In this paper; we demonstrate that a fuzzy c-means-neuro-fuzzy rule-based classifier of diabetes disease with an acceptable interpretability is obtained. The accuracy of the classifier is measured by the number of correctly recognized diabetes record while its complexity is measured by the number of fuzzy rules extracted. Experimental results show that the proposed fuzzy classifier can achieve a good tradeoff between the accuracy and interpretability. Also the basic structure of the fuzzy rules which were automatically extracted from the UCI Machine learning database shows strong similarities to the rules applied by human experts. Results are compared to other approaches in the literature. The proposed approach gives more compact, interpretable and accurate classifier. Keywords Interpretable classification  Fuzzy rules  FCM  Neuro-fuzzy ANFIS  UCI machine learning database

N. Settouti (&)  M. A. Chikh  M. Saidi Biomedical Engineering Laboratory, Tlemcen University, Tlemcen, Algeria e-mail: [email protected] M. A. Chikh e-mail: [email protected] M. Saidi e-mail: [email protected]

Introduction Diabetes is a chronic disease that occurs when the pancreas does not produce enough insulin, or when the body cannot effectively use the insulin it produces. Hyperglycemia, or raised blood sugar, is a common effect of uncontrolled diabetes and over time leads to serious damage to many of the body’s systems, especially the nerves and blood vessels [1]. Diabetes is the most rapidly growing chronic disease of our time (see Fig. 1). It has become an epidemic that affects a large number of people in the world. The two main kinds of diabetes are: Type 1 diabetes, which is characterized by a lack of insulin production and therefore dependent upon insulin administration and Type 2 diabetes, which is characterized by an ineffective use of insulin. Type 1 diabetes is the most common type in young people whereas Type 2 is the most common diabetes and affects primarily but not exclusively adults and it is largely the result of obesity and physical inactivity [2]. Our work will take an interest more specifically in Type 2 of diabetes. Diagnose of diabetes for medical expert is a difficult task. For this reason a much research effort has been put till today in diagnosis of diabetes disease literature. In Temurtas et al. [4] a multi-layer neural network and a probabilistic neural network were used for diagnosing Pima Indian diabetes. They have reported, respectively 79.62 and 78.05 % in terms of correct classification rate. The recognition rate obtained by Kayaer and Yildirim [5] using general regression neural network was 80.21 %, while using multilayer neural network with LM algorithm was 77.08 % with tenfold CV and 82.37 % accuracy with conventional (one training and one test) validation method. They have also reported 78.05 %. Purnami et al. [6] have reported better performance in classifying diabetes disease diagnostic with accuracy

123

Australas Phys Eng Sci Med

in ‘‘Experimentation’’ section. In ‘‘Conclusion’’ section, the results are presented and discussed. Finally, ‘‘References’’ section concludes the findings.

Theory FCM algorithm

Fig. 1 Prevision of diabetes in different region [3]

93.20 %. They use a smooth support vector machine (SSVM) called Multiple Knot Spline SSVM. Fuzzy approaches have been also widely and successfully used in many areas such as knowledge discovery [7], data analysis [8] and image processing [9]. In general, fuzzy rules are generated from human expert knowledge or heuristics, which brings about good high-level semantic generalization capability. On the other hand, some researchers have made efforts to build fuzzy models from numeric data, leading to many interesting results [10–14]. Several efforts have been also made to deal with the problem of interpretability of data-driven fuzzy systems [15–22]. Actually fuzzy logic and neural networks have provided attractive structures to complex systems. Adaptive network-based fuzzy inference system (ANFIS) is a specific approach in neuro-fuzzy modeling which utilizes the neural networks to tune the rule-based fuzzy systems [11]. Successful applications of ANFIS in biomedical engineering have been reported recently in data identification [23–27], and pattern recognition [28, 29]. The exponential increasing number of fuzzy rules in neuro-fuzzy classifier is one of the difficult problems to be overcome in this paper, such problem is solved by adopting the FCM clustering method [30] for the structure identification in the ANFIS, wish is one of the most widely used neuro-fuzzy models proposed by Jang [11], against diabetes diagnosis problem. The algorithm fuzzy c-means (FCM) [30], is a typical clustering algorithm, which was used in a wide variety of engineering and scientific disciplines such as modeling [31] the decision [32], pattern recognition and classification [33], segmentation [34] as well as areas of medical applications [35]. The remainder of the paper is organized as follows, the adaptive neuro-fuzzy inference system (ANFIS) is proposed based on the model of Takagi–Sugeno. Pima Indian Database and neuro-fuzzy Classifier learning are presented

123

The FCM clustering algorithm was first introduced by Dunn [36] and later extended by Bezdek [30]. It is based on the concept of fuzzy c-partition, introduced by Ruspini [37]. FCM is a method of clustering which allows one piece of data to belong to two or more clusters. This method is frequently used in pattern recognition. It is based on minimization of the following objective function: J(U,V) ¼

C X N  X

lij

 m  x i  v j  2

ð1Þ

j¼1 i¼1

It requires every data point in the data set to belong to a cluster to some membership degree. The purpose of the FCM is to group data points into different specific clusters. Let X = {x1, x2,…xN} be a collection of data. By minimizing the objective function (1), X is classified into c homogeneous clusters. Where lij is the membership degree of data xi to a fuzzy cluster set vj, V = {v1,v2,…vC} are the cluster centers. U = (lij)N9C is a fuzzy partition matrix, in which each lij indicates the membership degree for each data point in the data set to the cluster j. The value of U should satisfy the following conditions: lij 2 ½0; 1; 8i ¼ 1; . . .N; 8j ¼ 1; . . .; C Xc l ¼ 1; 8i ¼ 1; . . .N j¼1 ij

ð2Þ ð3Þ

  The xi  vj  is the Euclidean distance between xi and vj. The parameter m is called fuzziness index, which control the fuzziness of membership of each datum. The goal is to iteratively minimize the aggregate distance between each data point in the data set and cluster centers until no further minimization is possible. The whole FCM process can be described in the following steps. Step 1: Initialize the membership matrix U with random values, subject to satisfying conditions (2) and (3). Step 2: Calculate the cluster center V by using following equation PN   m lij xi vj  Pi¼1 8j  1; . . .; C ð4Þ N  m ; i¼1 lij Step 3: Get the new distance: dij ¼ kxi  yi k;

8i ¼ 1; . . .; N;

8j ¼ 1; . . .; C

ð5Þ

Australas Phys Eng Sci Med

Fig. 2 Chart of the FCM algorithm

Step 4: Update the fuzzy partition matrix U: If dij = 0 (means xi = vj) lij ¼

Pc

k¼1

1 2  m1

ð6Þ

dij dik

Else lij = 1 Step 5: If the termination criteria have been reached, then stop. Else go back to step 2. The use of the FCM algorithm (Fig. 2) requires the determination of several parameters, namely, c, m, . In addition, the set U (0) of initial cluster centers must be defined. Although no theoretical basis is available for the proper value of m, 1.1 \ m \ 5 is generally reported as the most useful values [38]. ”

ANFIS Today, fuzzy logic concept is widely used in many implementations like data classification, pattern recognition, diagnostic and many others. Fuzzy logic forms a bridge between the two areas of qualitative and quantitative modeling. Thus, one of the advantages of fuzzy approach is that they allow to describe fuzzy rules, which fit the description of real-world processes to a greater extent. Another advantage of fuzzy approach is its interpretability; it means that it is possible to explain why a particular value appeared at the output of a fuzzy model. In turn, some of the main disadvantages of fuzzy approach are that expert input or instructions are needed in order to define fuzzy rules, and that the process of tuning of the fuzzy system parameters (e.g., parameters of the membership functions) often requires a relatively long time, especially if there is a high number of fuzzy rules in the model. Both these disadvantages are related to the fact that it is not possible to train fuzzy models. A diametrically opposite situation can be observed in the field of neural networks. User can train neural

networks, but it is extremely difficult to use a priori knowledge about the considered system and it is almost impossible to explain the behavior of the neural system in a particular situation. In order to compensate the disadvantages of one model with the advantages of another model, several researchers tried to combine fuzzy models with neural networks. A hybrid system named ANFIS has been proposed by Jang in [11]. The ANFIS is a fuzzy inference system (FIS) based on the model of Takagi–Sugeno [10]. It is the fuzzy logic based paradigm that grasps the learning abilities of neural networks to enhance the intelligent system’s performance using a priori knowledge. For reasons of representation, we will consider an ANFIS system with two inputs and one output and also consider a model of the first order using two rules: If x1 is A1 and x2 is B1 then y1 = f1(x1,x2) = a1x1 ? b1x 2 ? c 1. If x1 is A2 and x2 is B2 then y2 = f2(x1,x2) = a2x1 ? b2x 2 ? c 2. The ANFIS architecture that allows representing the basic rules is carried out by an adaptive network that contains fixed nodes (circular) and adaptive nodes (square) as illustrated in Fig. 3. Each node square or circular applies a function on its input signals and for a given layer nodes have the same type of function. The output Oki of a node i of the k layer (called node (i, k)) depends on the signals from the layer k-1 and parameters of the node (i, k).   k1 Oki ¼ f Ok1 . . .O ; a; b; c; . . . ð7Þ 1 nk1 nk–1 is the number of nodes in the (k-1) layer, and a, b, c are the parameters of the (i, k) node. It should be noted that a circular node has no parameters. Layer 1 Nodes of this layer are all adaptive nodes. This layer performs fuzzification of the inputs; it determines the membership of each input: O1i ¼ lAi ð xÞ

ð8Þ

x input of i node, Ai : linguistic variable, lAi (x) : degree of membership of x to Ai. The parameters of a node in this layer are those of the corresponding membership function, these are the premise parameters. Layer 2 The nodes of this layer are fixed nodes. They receive the output signals from the previous layer and send their product output.

123

Australas Phys Eng Sci Med Fig. 3 ANFIS architecture

O1 1

x1

A1

1

O

A2

w1

x . B1 x2

v1

A1 1

w1 w1 w2

O4 1

v1 .

f1

x1

v1

a1 x

1

b1 x

2

2

O2 1

B1

2

B2

x2 x2

w2

A2 x1 . B2 x2

v2

w2 w1 w2

4

O2

f v2. 2

x1 v2 a2

b2x2

c1

O15 y c2

i

vi . fi

O2

wi ¼ lAi ðx1 Þ  lBi ðx2 Þ;

i ¼ 1; 2

ð9Þ

wi is the degree of fulfillment of the rule i.

nodes in the first layer is equal to the total number of linguistic terms defined.

Layer 3

Experimentation

Each neuron in this layer calculates the normalized degree of fulfillment of the fuzzy rule. wi vi ¼ ð10Þ w1 þ w2

Diabetes disease database

The result out of each node represents the contribution of this rule to the final result. Layer 4 The nodes in this layer are adaptive and perform the consequent of the rules. The output of a node i is given by: O4i ¼ vi  fi ¼ vi ðai x1 þ bi x2 þ ci Þ;

i ¼ 1; 2

ð11Þ

The parameters in this layer (ai, bi, ci) are to be determined and are referred to as the consequent parameters. Layer 5 This layer consists of a single neuron circular makes the sum of signals from the previous layer to give the final output of the network: X O51 ¼ y ¼ vi  f i ð12Þ i

The generalization of the system to a system with multiple inputs does not pose any problem. The number of

123

We have used Pima Indian Dataset taken from UCI machine learning repository [41] in our applications. This dataset is commonly used among researchers who use machine learning method for diabetes disease classification. In the Diabetes PIMA data set the clinical classes concern diabetes Type 2 knowing that people develop Type 2 diabetes due to the cells in the muscles, liver, and fat do not use insulin properly. Eventually, the pancreas cannot make enough insulin for the body’s needs. As a result, the amount of glucose in the blood increases while the cells are starved of energy. Over the years, high blood glucose damages nerves and blood vessels, leading to complications such as heart disease, stroke, blindness, kidney disease, nerve problems, gum infections, and amputation. To confirm the diagnosis, the Diabetes blood tests must be done with Fasting blood glucose level—diabetes is diagnosed if it is higher than 126 mg/dL. Diabetes screening is recommended for: • • • •

Overweight children who have other risk factors for diabetes, starting at age 10 and repeated every 2 years. Overweight adults (BMI greater than 25) who have other risk factors. Adults over age 45. Family history and genes play a large role in Type 2 diabetes. Low activity level, poor diet, and excess body weight around the waist increase your risk.

Australas Phys Eng Sci Med

The database contains 768 records and two classes. The class distribution is: Class 0: non-diabetics (500) (65.1 %) Class 1: diabetics (268) (34.9 %). All patients in this database are Pima Indian women at least 21 years old and living near Phoenix, Arizona, USA. Notion that there is some zero values of attributes that were not reasonable for humans’ physical situation. After removing these records with unreasonable physical data, the total number of records is 392. Each record has eight attributes: 1. 2. 3. 4. 5. 6. 7. 8.

Npreg: Number of times pregnant. Glu: Plasma glucose concentration a 2 h in oral glucose tolerance test. BP: Diastolic blood pressure (mmHg). Skin: Triceps skin fold thickness (mm). Insulin: 2-h serum insulin (l U/mL). BMI: Body mass index (weight in kg/(height in m2)). Ped: Diabetes pedigree function. Age (years).

Neuro-fuzzy classifier learning Extraction of fuzzy rules has been one of the most important points in the design of fuzzy inference model. In this work, two phases of learning are required: First, a structure learning phase is used to determine proper input space partitioning (number of membership’s functions for each input). Second, a parameter learning phase is used to finetune the membership functions and consequent parameters. There are several ways that structure learning and parameter learning can be combined in a neuro-fuzzy model classifier. They can be performed sequentially: structure learning is used first to find the appropriate structure of a neuro-fuzzy system; and parameter learning is then used to fine-tune the parameters. In other cases, only parameter learning or structure learning is necessary when structure (fuzzy rules) or parameters (membership functions) are given by human experts, also the structure in some neuro-fuzzy model [11] is fixed a priori. In this work, the tenfold cross-validation technique is performed. The training data (392 records) are randomly splitted into 10 mutually exclusive and approximately equal partitions. Each one is used in turn for testing while the remainder is used for training i.e., 9/10 of data is used for training and 1/10 for testing. We repeat the whole procedure 10 times. Thus, ten different test results exist for each training-test configuration.

structure initialization is uniform partitioning of each input variable range into fuzzy sets, resulting to a fuzzy grid. This approach is followed in ANFIS, a well-known TSK model learning algorithm [11]. The proposed structure learning algorithm in this study is accomplished by using x-means algorithm 1 [39] this technique reveals the true number of classes in the underlying distribution, and that it is much faster than repeatedly using accelerated K-means [40] for different values of K. X-means give a plot of the within groups sum of squares by number of clusters extracted who can help to determine the appropriate number of clusters..

Algorithm 1. The x-means algorithm. 1. Improve-Params. 2. Improve-Structure. 3. If K [ Kmax stop and report the best scoring model found during the search. Else, goto 1.

The Improve-Params operation consists of running conventional K-means to convergence. The Improve-Structure operation finds out if and where new centroids should appear. As shown in Fig. 4 the x-means algorithm has allowed us to determine an optimal cluster number from 1 to 5 clusters for parameter learning phase. Our choice is to select c = 2 where the sum of squares within groups is 2 9 10-5 (relatively small value). To consolidate the results of this algorithm we applied the cluster validity index Silhouette to judge the quality of all clustering solutions, this technique was tested in my research for the master degree [43] and gives very good results. Therefore, we confirm this choice of cluster with the Silhouette validation technique [44] which calculates the

Structure learning phase Structure learning is a more difficult task than parameter learning; in general it is tackled by off-line, Trial-and-error approaches [42–45]. One of the most known methods for

Fig. 4 Number of clusters with x-means

123

Australas Phys Eng Sci Med Fig. 5 Silhouette width for 2 centers and c = 3

silhouette width for each sample, average silhouette width for each cluster and overall average silhouette width for a total data set. Using this approach each cluster could be represented by so-called silhouette (Fig. 5), which is based on the comparison of its tightness and separation. The average silhouette width could be applied for evaluation of clustering validity and also could be used to decide how good is the number of selected clusters. The largest overall average silhouette was obtained for c = 2 with 0.44 (Fig. 5), then 0.38 for c = 3. That indicates the best clustering is equal to 2 (number of cluster). Therefore, the number of cluster with maximum overall average silhouette width is taken as the optimal number of the clusters. Parameter learning phase Hybridization FCM–ANFIS This work concentrates on optimization of diabetes disease ANFIS-based fuzzy classifier. We propose to use FCM clustering strategy. The data clustering is to identify structure based on the scatter partition. So, this approach can be adopted to reduce the dimension of the classifier as well as training time since the number of fuzzy rules equals the number of membership functions regardless of the dimension inputs [46]. FCM method is used to predict the distribution of fuzzy membership functions by universe of discourse partition (see Fig. 6).

123

Fig. 6 Architecture of FCM–ANFIS

Fuzzy c-means tries to partition numerical data into clusters. The belongingness of a data point to a specific cluster is given by the membership value of the data point to that cluster. The membership value is calculated by minimization of a FCM function which emerging research affiliation with the least error. (see Fig. 7). The function needs approximate cluster centers, as well as a metric for membership evaluation as input, e.g., the Euclidean distance. The minimization is an iterative process where new cluster centers are computed as weighted averages of all data points, where the membership values are the weights. The stop criterion is when the membership

Australas Phys Eng Sci Med Fig. 7 Diagram representing the distribution of clusters with FCM into membership functions

Fig. 8 Error convergence for FCM

values have changed very little in comparison with the previous iteration. The error convergence for the 5 first clusters after 63 iteration is showed in Fig. 8, we notice that the error convergence for c = 2 has the lowest value around 0.1 in the other hand the error is around 0.28 for c = 5. The main purposes of the hybridization FCM–ANFIS classifier are to diagnose the diabetes diseases by using a

reduced number of fuzzy rules with relatively small number of linguistic labels, removing the similarity of the membership functions, preserving the meaning of the linguistic labels, and in same time improving the classification performances. It can be shown in Fig. 9 the repartition of two cluster centers obtained for the height feature inputs of the data base Indians PIMA. As an example we notice in Fig. 8 that the glucose is the most important attribute where the patients are regrouped into two groups where the first group has less than 126 mg/dL and the second one has over than 126 mg/dL. This repartition is confirmed by the expert medical knowledge. In other hand Age does not help enough to discriminate between diabetics and non-diabetics, the 6 others attributes contribute differently in the discrimination between classes. In this paper, hybrid learning algorithm is employed to determine the optimum values of the FIS parameters of Sugenotype. It applies a combination of the least squares method and the back propagation gradient descent method for training FIS membership function parameters to emulate a given training data set [47, 48]. It means that, in the forward pass the premise parameters are assumed fixed and the optimal values of the consequent parameters are estimated using the least squares method. Then using these parameters, the output of the system is calculated and the resulting error is used to adjust the premise parameters through a standard back propagation algorithm. In

123

Australas Phys Eng Sci Med

Fig. 9 Clustering results

this backward pass, the consequent parameters are assumed fixed and a gradient descent algorithm is used to find the optimal values of the premise parameters. In order to optimize the values of the neuro-fuzzy parameters, training and validation subsets are applied into various model classifiers. After the cross validation the best result is obtained in 100 epochs (MSE training = 0.1012, MSE validation = 0.1023), it permits to realize a good learning of the model classifier. Test subset data were applied to neuro-fuzzy model classifier and the accuracy is

123

83.85 %. Table 1 presents final modal points. We have two MFs for each attributes (see Fig. 10) with their modal points presented in Table 1.

Knowledge extracted from the FCM–ANFIS model The FCM–ANFIS classifier will thus generate a knowledge base of 2 rules for classification.

Australas Phys Eng Sci Med Table 1 Final modal points with FCM–ANFIS Npreg

Medium: 1–3–9

High: 0–5–17

Glu

Normal: 71.6–101.3–180

High: 85.1–155.8–200

BP

Normal: 55–71–92

High: 30–70–110

Skin

Medium: 16–33–51

High: 9–30–70

Insulin

Low: 14–100–460

High: 80–380–690

BMI

Medium: 18.2–31.83–51

High: 19–34.79–59

PED

Low: 0.085–0.5–1.45

High: 0.85–0.6–2.42

Age

Medium: 21–27.94–56.1

High: 21–35.09–61.8

Rule 1: If (Npreg is High) and (Glu is High) and (BP is High) and (Skin is Medium) and (Insulin is High) and (BMI is High) and (PED is High) and (Age is High) then (Class is Diabetic). Rule 2: If (Npreg is Medium) and (Glu is Normal) and (BP is Low) and (Skin is High) and (Insulin is Low) and (BMI is Medium) and (PED is Low) and (Age is Medium) then (Class is Non-Diabetic).

The degree of solicitation and the prototypes for each rule are shown in the Table 2. Their respective definitions in our work are as follows: 1.

2.

Degree of solicitation is equal to the number of examples activating a rule more than 50 % over the total number of examples in the database. A prototype: is an example that activates a rule more than 90 %.

In this work, each input attribute has two membership functions, so 256 fuzzy rules have been extracted logically, the overall complexity of the knowledge base of the fuzzy classifier increases substantially. To reduce this complexity, application of FCM algorithm is proposed. By applying this technique, the number of rules has been reduced from 256 to 2, thereby reducing the complexity of the knowledge base significantly. As shown in Table 2, 75.38 % of

Fig. 10 Final membership functions (after learning)

123

Australas Phys Eng Sci Med



Table 2 Degrees of solicitation for the two rules with FCM–ANFIS Degree of solicitation for diabetic Rule 1

Degree of solicitation for nondiabetic Rule 2

Prototype patient in test phase

Prototype patient in learning phase

d.

Diabetics patients

34.61 %

12.30 %

15

17

No diabetic patients

20 %

75.38 %

46

46

non-diabetics examples have solicited fuzzy Rule 2 (with degree fulfillment more than 50 %). In other hand just 34.61 % of diabetic’s examples have solicited fuzzy Rule 1 (with degree fulfillment more than 50 %). As stated in Rule 2 is requested for 75.38 % it provides for middle-aged people with normal glucose and weight denoting overweight are classified as non-diabetics. Rule 1 consistent with the reasoning of experts to diagnose diabetes with high glucose, a large age and weight showing obesity is sought at 34.61 %. The degree 12.30 % Rule 2 diabetes accounts for real cases misclassified we can see that in the performance Table 4. Experimental results The performances of the implemented classifier was evaluated by computing the percentages of sensitivity (SE), specificity (SP) and classification accuracy, the respective definitions are as follows: a.

Classification accuracy % PjTj

accuracyðTÞ ¼ 1;

assessðtÞ ¼ f0;

i¼1

c.

Confusion matrix

A confusion matrix contains information about actual and predicted classifications done by a classification system. After learning phase, 392 (262 healthy, 130 diabetics) testing data with tenfold cross validation were used to evaluate the classifier in terms of correct classification; SE; SP and matrix of confusion (see Table 3). Table 4 indicates that by using FCM–ANFIS we can get high accuracy with fewer rules. On the contrary, by using ANFIS more rules are needed to get a lower accuracy. Moreover the features projected partition in ANFIS is ambiguous and cannot preserve the meaning of the linguistic labels. The best number of the rules is a trade-off between the accuracy and the rules number, also with a minimum of clusters (c = 2) and just two fuzzy rules, FCM–ANFIS approach has given the best results with accuracy = 83.85 %, SE = 82.05 % and SP = 84.62 % comparing to the other cases. Table 5 gives the confusion matrix showing the classification results of this classifier on 90–10 % training-test split from the diabetes diseases dataset. The results validate the choice, we obtain a small number of misclassified cases with FP = 3 and FN = 2. • Patient no. 107 is correctly classified as diabetic case (TP) (Fig. 11). The Table 6 presents the characteristics of a diabetic patient no. 107, this example is correctly classified as

assessðti Þ  100 jTj

if classif ðtÞ¼t:c otherwise

where T is the test set, t € T, t.c is the class of the item t and classify (t) returns the classification of t by the algorithm. b. SE % sensitivity ¼

• •

True negatives: classifies no diabetic as no diabetic; False positives: classifies no diabetic as diabetic; False negatives: classifies diabetic as no diabetic.

TP  100 TP þ FN

SP %

Table 3 Confusion matrix Actual

Predicted Negative

Positive

Negative

TN

FP

Positive

FN

TP

Table 4 Performances of FCM–ANFIS classifier with different clusters Cluster number

8 attributes (ANFIS)

c=2

c=3

c=4

TN specificity ¼  100 TN þ FP

Accuracy %

73.08

83.85

83.08

81.54

SE %

66.67

82.05

69.23

82.05

where TP, TN, FP and FN denote, respectively:

SP %

75.82

84.62

89.01

81.32

Number of rules

256



True positives: classifies diabetic as diabetic;

123

2

3

4

Australas Phys Eng Sci Med Table 5 Confusion matrices for different cluster number using a 90–10 % training-test split Actual

Predicted Negative

Negative

Positive 3

Positive

2

12

Negative Positive

22 4

4 10

c=3

Negative

21

5

c=4

3

Patients Number

107

28

102

68

Cluster number

23

Positive

Table 6 Parameters of patients

c=2

11

Npreg

3

1

2

2

Glu

173

87

93

157

BP

78

68

78

74

Skin

39

34

64

35

Insulin

185

77

32

440

BMI

33.8

37

38

39.4

PED

0.97

0.401

0.674

0.134

Age

31

24

23

30

Class

1

0

1

0

diabetic with an apparent readiness to heredity (PED) and high values of glucose and insulin, respectively. Fuzzy c-means-adaptive network-based fuzzy inference system allowed having a more targeted with Rule 1 diabetic who was activated over 70 %. •

Patient no. 28 is correctly classified as non-diabetic case (TN) (Fig. 12).

Seventy-seven patients without diabetes were correctly classified; Table 6 shows an example of a patient no. 28 correctly recognized as no diabetic case. The FCM–ANFIS classifier has correctly recognized the no diabetic case; we noticed clearly that the Rule 2 is activated with degree of fulfillment equal 71 % (nondiabetic). • Patient no. 102 (diabetic) recognized as no diabetic (FN) (Fig. 13).

Fig. 12 The fuzzy rules for patient no. 28 who was correctly classified as TN



Patient no. 68 (no diabetic) recognized as diabetic (FP) (Fig. 14).

Discussion

Fig. 11 The fuzzy rules for patient no. 107 who was correctly classified as TP

As it could be noticed from Figs. 10, 11, 12 and 13, fuzzy rules are solicited, according to input vector parameters, then the fuzzy logic inference engine evaluates these inputs using the knowledge base and then diagnoses the patient class. These solicited fuzzy rules are closer to the medical expert reasoning. The neuro-fuzzy classifier justifies its results by the most solicited fuzzy rules. So the expert can easily check the fuzzy model classifier for plausibility, and can verify why a certain classification result was obtained for a certain patient, by checking the degree of fulfillment

123

Australas Phys Eng Sci Med

Fig. 13 The fuzzy rules for patient no. 102 (diabetic) recognized as non-diabetic (FN)

Fig. 14 The fuzzy rules for patient no. 68 (FN) recognized as diabetic (FP)

of the individual fuzzy rules. Instead of a simple classification the physician also gets a description of the patients in terms of the fuzzy rules that are active for this case. The Fig. 14 shows a TN recognized as a diabetic (false alarm) (patient has a high value of Plasma glucose concentration); the most activated fuzzy rule is Rule 1 with a degree of fulfillment equals to: 60.71 %. Experimental results have shown that the proposed approach with FCM is simple and effective in explaining the interpretability of neuro-fuzzy classifier by reducing the 256 rules to 2 efficient rules while preserving the model accuracy at a satisfactory level. Recently a similar work

123

using FCM–ANFIS approach [49] was applied on database of diabetes collected by the Faculty of Computer Science and Information University of Technology Malaysia; the results have reached 72.66 %. Many studies using ANFIS structure have been performed in diagnosis of diabetes disease. Polat and Gunes [50] have reported 89.47 % classification accuracy using principle components analysis and ANFIS. But when principle components are applied for fuzzy rule-based approaches, interpretation of produced rules naturally becomes more difficult or even impossible. Also, the result obtained by [50] was not reproducible, since Temurtas et al. [4] that used the same methods on the same database obtained 66.78 % classification accuracy. Vosoulipour et al. [51] have applied four features witch selected by a genetic algorithm, to a neural network and second to an ANFIS structure, they have reported, respectively 77.60 and 81.30 % classification accuracy. But the obtained results are not interpretable. In Dogantekin et al. [52], they obtained 84.61 % classification accuracy using linear discriminant analysis and ANFIS. The common criticism of this method is that it does not provide explicit knowledge to the user. Recently Ubeyli [53] applied ANFIS for diagnosing Pima Indian diabetes. She has obtained the highest classification accuracy 98.14 %. However, this improvement of the classification accuracy is not expected because of that a lot of atypical values in Pima Indian diabetes disease database were used in training; also she has used the conventional validation (one training base and one testing base) instead of K-fold cross-validation technique. This type of training and testing can be toughed as biased methods. Globally these cited works focused on classification accuracy instead of interpretability (participation of the fuzzy rules in the final decision). The detail accuracy of these studies can be seen in Polat and Gunes [50]. The results obtained in this paper are very interesting than the cited studies in literature, since our neuro-fuzzy classifier built in this work aims at finding with FCM a reduced set of fuzzy rules that can be interpreted linguistically. The clinician obtains information about why the diagnosis was selected.

Conclusion This study presents a fuzzy classification model of diabetes. Here, two criteria are used to evaluate the proposed method. The first criterion is the good accuracy performances of the classifier and the other is comprehensibility of obtained results. This study shows that the contribution of the FCM algorithm on ANFIS has reduced the size of models as well as learning time, since the number of centers is equal to the number of membership functions

Australas Phys Eng Sci Med

regardless of the size of incoming. With the hybridization of these methods extracting of knowledge has been made to provide rules that are reliable, accurate and sufficiently simple to be understood; all in improving performance with a rate reaching 83.85 %.

References 1. World Health Organization diabetes (2010) http://www.who.int/ diabetes/en/. Accessed 16 Mar 2011 2. Medical Dictionary (2011) http://medicaldictionary.thefreedictionary. com/diabetes. Accessed 05 Feb 2011 3. OMS se´rie de rapports techniques (2000) La pre´vention du diabe`te sucre´. Rapport d’un groupe d’e´tude de l’OMS. Technical report. Gene`ve 844:56–57 4. Temurtas H, Nejat Y, Temurtas F (2009) A comparative study on diabetes disease diagnosis using neural networks. Expert Syst Appl 36:8610–8615 5. Kayaer K, Yildirim T (2003) Medical diagnosis on Pima Indian diabetes using general regression neural networks. In: Proceedings of the international conference on artificial neural networks and neural information processing, 26–29 June 2003, Springer, Istanbul, pp 181–184 6. Purnami SW, Embong A, Zain JM, Rahayu SP (2009) A new smooth support vector machine and its applications in diabetes disease diagnosis. J Comput Sci 5(12):1003–1008 7. Mohamadi H, Habibi J, Abadeh MS, Saadi H (2008) Data mining with a simulated annealing based fuzzy classification system. Pattern Recognit 41:1841–1850 8. Er MJ, Zhou Y (2008) Automatic generation of fuzzy inference systems via unsupervised learning. Neural Netw 21:1556–1566 9. Nakashima T, Schaefer G, Yokota Y, Ishibuchi H (2007) A weighted fuzzy classifier and its application to image processing tasks. Fuzzy Sets Syst 158:284–294 10. Takagi T, Sugeno M (1985) Fuzzy identification of systems and its applications to modeling and control. IEEE Trans Syst Man Cybern 15(1):116–132 11. Jang JSR (1993) ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans Syst Man Cybern 23(3):665–685 12. Wang LX (1994) Adaptive fuzzy systems and control: design and stability analysis. Prentice-Hall, Inc., Upper Saddle River 13. Gan Q, Harris C (1999) Fuzzy local linearization and local basis function expansion in nonlinear system modeling. IEEE Trans Syst Man Cybern B 29:559–565 14. Harris C, Hong X, Gan Q (2002) Adaptive modelling, estimation and fusion from data: a neurofuzzy approach. Springer-Verlag, London 15. Jin Y (2000) Fuzzy modeling of high-dimensional systems: complexity reduction and interpretability improvement. IEEE Trans Fuzzy Syst 8(2):212–221 16. Hoppner F, Klawonn F (2000) Obtaining interpretable fuzzy models from fuzzy clustering and fuzzy regression. In: Proceedings of the international conference on knowledge-based intelligent engineering systems & allied technologies, pp 162–165 17. Roubos H, Setnes M (2001) Compact and transparent fuzzy models and classifiers through iterative complexity reduction. IEEE Trans Fuzzy Syst 9(4):516–524 18. Guillaume S (2001) Designing fuzzy inference systems from data: an interpretability-oriented review. IEEE Trans Fuzzy Syst 9:426–442 19. Johansen TA, Shorten R, Murray-smith R (2000) On the interpretation and identification of dynamic Takagi–Sugeno fuzzy models. IEEE Trans Fuzzy Syst 8:297–313

20. Casillas J, Cordon O, Herrera F, Magdalena L (2003) Interpretability improvements to find the balance between interpretabilityaccuracy in fuzzy modeling: an overview. In: Casillas J, Cordon O, Herrera F, Magdalena L (eds) Interpretability issues in fuzzy modeling of studies in fuzziness and soft computing, vol 128. Springer, Berlin, pp 3–24 21. Zhou SM, Gan JQ (2004) Improving the interpretability of Takagi–Sugeno fuzzy model by using linguistic modifiers and a multiple objective learning scheme. In: Proceedings of international joint conference on neural networks (IJCNN), Budapest, pp 2385–2390 22. Gan JQ, Zhou SM (2006) A new fuzzy membership function with applications in interpretability improvement of neurofuzzy models. In: Proceedings of the 2006 international conference on intelligent computing: part II, ICIC’06. Springer-Verlag, Berlin, pp 183–194 23. Guler I, Ubeyli E (2005) Adaptive neuro-fuzzy inference system for classification of EEG signals using wavelet coefficients. J Neurosci Methods 148(2):113–121 24. Xu W, Li L, Zou S (2007) Detection and classification of microcalcifications based on DWT and ANFIS. In: The 1st international conference on bioinformatics and biomedical engineering, ICBBE 2007 ¨ beyli ED (2008) Adaptive neuro-fuzzy inference systems for 25. U automatic detection of breast cancer. J Med Syst 18:157–174 26. Heydari Z, Farahmand F, Arabalibeik H, Parnianpour M (2008) Adaptive neuro-fuzzy inference system for classification of ACLruptured knees using arthrometric data. Ann Biomed Eng 36(9):1449–1457 27. Korytkowski M, Rutkowski L, Scherer R (2008) From ensemble of fuzzy classifiers to single fuzzy rule base classifier. ICAISC ‘08. Springer-Verlag, Berlin 28. Belal SY, Taktak AF, Nevill AJ, Spencer SA, Roden D, Bevan S (2002) Automatic detection of distorted plethysmogram pulses in neonates and pediatric patients using an adaptive-network-based fuzzy inference system. Artif Intell Med 24(2):149–165 29. Nauck D, Kruse R (1995) NEFCLASS—a neuro-fuzzy approach for the classification of data. Paper of Symposium on Applied Computing 1995 (SAC’95) in Nashville 30. Bezdek J (1981) Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York 31. Miao Y, Liu ZQ (2000) On causal inference in fuzzy cognitive maps. IEEE Trans Fuzzy Syst 8(1):107–119 32. Liu ZQ, Satur R (1999) Contextual fuzzy cognitive map for decision support in geographic information systems. IEEE Trans Fuzzy Syst 7(5):495–507 33. Chang X, Li W, Farrell J (2006) A c-means clustering based fuzzy modeling method. The ninth IEEE international conference on fuzzy systems, vol 2, pp 937–940. BIME J 6(1) 34. Kannan SR (2008) A new segmentation system for brain MR images based on fuzzy techniques. Appl Soft Comput 8:1599–1606 35. Murugavalli S, Rajamani V (2006) A high speed parallel fuzzy c-mean algorithm for brain tumor segmentation. ICGST international journal on bioinformatics and medical engineering. BIME J 6:29–34 36. Dunn J (1974) A fuzzy relative of the ISODATA process and its use in detecting compact, well-separated clusters. J Cybern 3:32–57 37. Ruspini EH (1969) A new approach to clustering. Inf Control 15(1):22–32 38. Cannon RL, Dave JV, Bezdek JC (1986) Efficient implementation of the fuzzy c-means clustering algorithms. IEEE Trans Pattern Anal Mach Intell 8(2):248–255 39. Pelleg D, Moore AW (2000) X-means: extending k-means with efficient estimation of the number of clusters. In: Proceedings of the seventeenth international conference on machine learning,

123

Australas Phys Eng Sci Med

40.

41.

42. 43.

44. 45.

46.

47.

ICML ‘00. Morgan Kaufmann Publishers Inc., San Francisco, pp 727–734 MacQueen JB (1967) Some methods for classification and analysis of multivariate observations, vol 1. University of California Press, Los Angeles, pp 281–297 Michael K, UCI Machine Learning Repository. Washington University, St. Louis. http://archive.ics.uci.edu/ml/datasets/ Diabetes. Accessed 06 Feb 2011 Sugeno M, Kang GT (1988) Structure identification of fuzzy model. Fuzzy Sets Syst 28:15–33 Settouti N (2011) Renforcement de l’Apprentissage Structurel pour la reconnaissance du Diabe`te. Technical report. http:// hal.inria.fr/inria-00605627/fr/ Rousseeuw PJ, Trauwaert E, Kaufman L (1995) Fuzzy clustering with high contrast. J Comput Appl Math 64:81–90 Tanaka K, Sano M, Watanabe H (1995) Modeling and control of carbon monoxide concentration using a neuro-fuzzy technique. IEEE Trans Fuzzy Syst 3:271–279 Dan S, Jun M, Zongyuan H (2006) Diagnosis of inverter faults in PMSM DTC drive using time-series data mining technique. Advanced data mining and applications, second international conference, ADMA 2006, Xi’an, 14–16 Aug 2006, pp 741–748 Jang JSR, Mizutani E (1996) Levenberg-Marquardt method for ANFIS learning. In: Biennial conference of the north american fuzzy information processing society, June 1996, pp 87–91

123

48. Jana A, Yang PH, Auslander DM, Dave RN (1996) Real time neuro-fuzzy control of a nonlinear dynamic system. In: Biennial conference of the north american fuzzy information processing society, June 1996, pp 210–214 49. Othman MFB, Thomas MSY (2007) Neuro fuzzy classification and detection technique for bioinformatics problems. In: Proceedings of the first Asia international conference on modelling & simulation, IEEE Computer Society, Washington, pp 375–380 50. Polat K, Gunes S (2007) An expert system approach based on principal component analysis and adaptive neuro-fuzzy inference system to diagnosis of diabetes disease. Digit. Signal Process 17:702–710 51. Vosoulipour A, Teshnehlab M, Moghadam HA (2008) Classification on diabetes mellitus data-set based-on artificial neural networks and ANFIS. In: IFMBE proceedings of the 4th Kuala Lumpur International Conference on Biomedical Engineering 2008, pp 27–30 52. Dogantekin E, Akif D, Derya A, Levent A (2010) An intelligent diagnosis system for diabetes on linear discriminant analysis and adaptive network based fuzzy inference system: LDA–ANFIS. Digit. Signal Process 20:1248–1255 53. Ubeyli ED (2010) Automatic diagnosis of diabetes using adaptive neuro-fuzzy inference systems. Expert Syst 27(4):259–266