Robust Medical Data Mining Using a Clustering and Swarm Based Framework
Ali Mohammadi, Mohammad Saniee Abadeh
Faculty of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran
E-mail: {ali.mohammadi, saniee}@modares.ac.ir
Objective: This paper presents a novel fuzzy classification approach, implemented on a swarm-based framework, for extracting fuzzy rules from labeled numerical data in order to build efficient medical diagnosis systems. The interpretability of fuzzy rules extracted from medical datasets is one of the most important problems in the medical domain. Interpretability is commonly measured as the sum of rule lengths (SORL) and the number of rules (NOR) in a rule-based system, but an issue that is usually overlooked is the variance of the final results (accuracy, SORL, NOR). This paper treats the variances of accuracy, SORL, and NOR as essential interpretability measures. Method: We propose a parallel swarm-based framework that generates multi-objective fuzzy rule-based systems on three medical datasets, decreasing the variances of accuracy, SORL, and NOR while simultaneously improving their final values. Results and conclusion: Algorithms with low variance in terms of accuracy, SORL, and NOR are called robust; this means the output does not differ from run to run. Clearly, as the variances of these measures decrease, the robustness of the final result improves correspondingly. We show that we have been successful in improving two objectives that are negatively correlated and, accordingly, in generating robust fuzzy rule-based systems. Our results demonstrate that both the accuracy and the interpretability reported in other works can be increased significantly by using our framework. The proposed method is based on PSO with some modifications and is implemented on a parallel framework; other algorithms can be implemented on this framework as well.

1. Introduction
Modeling of high-performance and highly interpretable fuzzy rule-based systems has been studied in [1-3].
Modeling of FRBSs can be considered an optimization task or a search problem, and evolutionary algorithms [4] are used to achieve this objective. Interpretability should be regarded as a major subject in the field of fuzzy data mining [5]. The NOR of an FRBS and the SORL are two major pillars of interpretability. The flexibility of fuzzy systems makes them applicable to a wide range of problems; among these, problems with multiple conflicting objectives are of particular interest to researchers. In such problems, the enhancement of one objective leads to the decline of the others, so no unique solution can be found that satisfies all objectives. In this study, we try to keep accuracy high while improving interpretability, since interpretability is one of the major principles in the medical domain. In recent years few studies have considered the variances of accuracy, SORL, and NOR directly; in fact, to our knowledge, very few give importance to variance at all. We regard the variance of these measures as another measure of interpretability. Alonso et al. [6] proposed a grouping containing two main measures, description and explanation. In [7] a grouping in terms of low-level and high-level interpretability was suggested: low-level interpretability is achieved on the fuzzy sets by optimizing membership functions with respect to semantic measures, and high-level interpretability is acquired on the fuzzy rules by considering the number of variables and rules and their consistency. The complexity problem has been addressed in the literature using different approaches. Some studies use techniques such as [8], [9], [10], and [11] to decrease the number of membership functions, rule selection to decrease the NOR [12-18], or rule-learning methods such as [16] and [18] that directly acquire simple models.
A GA approach was suggested in [17] for rule selection in classification problems, trying to increase the number of correctly classified patterns and to decrease the NOR. This leads to a less complex model, thanks to minimizing the NOR and using the "don't care" term in the antecedent part of the rules. Another two-objective genetic algorithm for obtaining non-dominated solutions was proposed in [19], which tried to optimize both objectives. Like ours, [20] is an application-driven paper; it suggests a fuzzy-genetic method that produces systems with two prime characteristics, high accuracy and high interpretability (NOR only). Later, in [13], both single-objective and two-objective algorithms were studied to carry out rule selection on an initial set of rules with "don't care" terms, considering the mentioned objectives of accuracy and NOR. Moreover, Ishibuchi et al. [16] suggest a multi-objective evolutionary algorithm with three criteria: increasing the number of correctly classified patterns, decreasing the NOR, and decreasing the SORL. They consider two approaches, one for rule selection and one for rule learning, but their algorithm does not consider the variances of these measures. In [17], the authors study the effect of fuzzy partitioning and term selection in order to find a good trade-off between NOR and accuracy. Some other works, such as [21], focus on medical knowledge discovery, but emphasize accuracy and the number of parameters more than our study does. In works such as [22], the authors pursue two objectives, improving accuracy and interpretability (NOR only); the method used in [22] is based on Pareto-based elitist multi-objective evolutionary algorithms. A new procedure for regression problems is proposed in [23].
They discuss the design of linguistic models with high interpretability by GA-based machine learning algorithms using a Pittsburgh approach. They demonstrate how the membership-function modeling problem can be managed by single-objective and multi-objective GAs, with a three-objective GA that tries to decrease the error of the rule set, the NOR, and the SORL. In the single-objective GA, they apply a weighted sum of the three objectives as a single objective. They also regard "don't care" as a supplementary antecedent fuzzy set, which is required for high-dimensional linguistic patterns. But they do not consider the variances of these measures (accuracy, NOR, and SORL). Using an enhanced MOEA (Multi-Objective Evolutionary Algorithm), Ishibuchi et al. suggest a multi-objective genetic algorithm with local search [24] for classification problems, following the same approach as in [16], with three objectives: increasing the number of correctly classified patterns, decreasing the NOR, and decreasing the SORL. The approach contains two phases: first, the method generates candidate rules, and then it applies multi-objective rule selection. They use two well-known data mining measures, confidence and support, to find a manageable number of candidate fuzzy rules. The important point is that these methods do not consider the variances of accuracy, NOR, and SORL as interpretability criteria. In this study we propose a parallel framework on which other fuzzy rule learning algorithms can be implemented. Using this framework yields a significant increase in accuracy and interpretability. Moreover, the obtained results are robust: by employing the proposed framework, the results vary very little across runs, and the framework yields essentially the same fuzzy rule set even if the learning algorithm is run several times.
The paper is organized as follows. In the next section, we present preliminaries in two sub-sections. In section 3, we explain our proposed method in two stages. Section 4 shows experimental results, and the last section concludes the paper.

2. Preliminaries
This section contains two sub-sections: the first gives an overview of the medical data, and the second describes multi-objective fuzzy rule-based systems.

2.1. Medical data overview
We use three medical datasets for mining fuzzy rules: Pima Indians Diabetes, Statlog (Heart), and Contraceptive Method Choice (CMC). Below we briefly discuss these datasets; the following information is taken directly from the UCI repository.
Pima Indians Diabetes: includes 8 attributes: number of times pregnant, plasma glucose concentration at 2 hours in an oral glucose tolerance test, diastolic blood pressure, triceps skin fold thickness, 2-hour serum insulin, body mass index, diabetes pedigree function, and age. All of them are numeric variables. The dataset contains 768 patterns. We try to extract fuzzy rules that detect the presence or absence of diabetes with low variance across runs.
Statlog (Heart): includes 13 attributes: age, sex, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina, oldpeak (ST depression induced by exercise relative to rest), the slope of the peak exercise ST segment, number of major vessels (0-3) colored by fluoroscopy, and thal (3 = normal; 6 = fixed defect; 7 = reversible defect). It contains 270 patterns with two classes. We convert all attributes to numeric type and try to detect the presence or absence of heart disease. Hopefully, many forms of heart disease can be prevented or treated with healthy lifestyle choices.
Contraceptive Method Choice (CMC): includes 9 attributes: wife's age, wife's education, husband's education, number of children ever born, wife's religion, whether the wife is now working, husband's occupation, standard-of-living index, and media exposure. We convert all attributes to numeric type. The dataset contains 1473 instances with three classes. We try to mine fuzzy rules that suggest a suitable contraceptive method, with low variance across runs.

2.2. Multi-Objective Fuzzy Rule-Based Systems (MOFRBS) and the fuzzy reasoning method
For n-dimensional pattern classification problems, the following fuzzy if-then rules are applied in the design of our robust method:
Rule R_q: If x_1 is A_{q1} and ... and x_n is A_{qn}, then Class C_q with CF_q,   q = 1, 2, ..., N,

where q is the rule index, x_i denotes a linguistic variable, A_{qi} is a linguistic fuzzy term, c is the number of classes, C_q denotes the consequent class, CF_q is the certainty grade of the rule in the unit interval [0, 1], and N is the number of rules. To improve the interpretability of the fuzzy rules, we use linguistic values in this study: each variable x_i takes values from a set of linguistic terms that uniformly cover its domain, plus a "don't care" value that matches the whole domain. With K linguistic terms per variable, the total number of possible antecedent combinations for n features is (K + 1)^n, so searching this space exhaustively is computationally very expensive.

A multi-objective evolutionary algorithm can consider multiple conflicting objectives instead of a single one. The hybridization of multi-objective evolutionary algorithms and fuzzy systems is known as multi-objective evolutionary fuzzy systems [25]. We pursue two main objectives in this study: maximizing accuracy and enhancing interpretability, where the latter is achieved by minimizing SORL, minimizing NOR, and decreasing the variances of these measures. Our task is a c-class problem in an n-dimensional space with numeric attributes, for which m labeled training patterns x_p = (x_{p1}, ..., x_{pn}), p = 1, ..., m, are given. Our fuzzy learning method searches for a small number of robust fuzzy if-then rules that detect the consequent class with high accuracy and low variance along with high interpretability. In this classification system, the consequent class and certainty grade of each rule are calculated following [26] as below.

First, the compatibility of each training pattern x_p with the fuzzy rule R_q is calculated as

\mu_{A_q}(x_p) = \prod_{i=1}^{n} \mu_{A_{qi}}(x_{pi}),

where \mu_{A_{qi}}(\cdot) is the membership function of A_{qi}.

Second, the total compatibility grade with R_q is calculated for each class:

\beta_h(R_q) = \sum_{x_p \in \mathrm{Class}\, h} \mu_{A_q}(x_p), \quad h = 1, ..., c.

In the next step, the consequent class \hat{C}_q is the class with the maximum total compatibility grade:

\beta_{\hat{C}_q}(R_q) = \max\{\beta_1(R_q), ..., \beta_c(R_q)\}.

If more than one class attains this maximum value, we do not assign any consequent class to R_q. In step 4 we calculate the certainty grade of R_q; if the consequent class of R_q could not be determined, the certainty grade must be initialized to zero:

CF_q = \frac{\beta_{\hat{C}_q}(R_q) - \bar{\beta}}{\sum_{h=1}^{c} \beta_h(R_q)}, \qquad \bar{\beta} = \frac{\sum_{h \neq \hat{C}_q} \beta_h(R_q)}{c - 1}.

After applying the above formulas to each rule, the consequent class and certainty grade of each rule are identified. For an input pattern x_p, the winner rule R_w is calculated by

\mu_{A_w}(x_p) \cdot CF_w = \max\{\mu_{A_q}(x_p) \cdot CF_q \mid q = 1, ..., N\}.
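As an illustration of this reasoning scheme, the following sketch implements the compatibility, certainty-grade, and single-winner steps. The triangular membership functions and the tuple-based rule encoding are our own assumptions for the example, not the authors' implementation.

```python
# Sketch of the fuzzy reasoning method above. A rule is (antecedent terms,
# consequent class, CF); an antecedent term is None ("don't care") or an
# (a, b, c) triangle. These encodings are illustrative assumptions.

def tri_mf(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def compatibility(terms, pattern):
    """mu_Aq(x_p): product of antecedent memberships; None means don't care."""
    mu = 1.0
    for term, x in zip(terms, pattern):
        if term is not None:
            mu *= tri_mf(x, *term)
    return mu

def learn_consequent(terms, X, y, n_classes):
    """Consequent class and certainty grade CF_q of a rule antecedent."""
    beta = [sum(compatibility(terms, xp) for xp, yp in zip(X, y) if yp == h)
            for h in range(n_classes)]
    total = sum(beta)
    best = max(range(n_classes), key=lambda h: beta[h])
    # tie on the maximum, or zero coverage: no consequent class, CF = 0
    if total == 0.0 or sum(1 for b in beta if b == beta[best]) > 1:
        return None, 0.0
    beta_bar = (total - beta[best]) / (n_classes - 1)
    return best, (beta[best] - beta_bar) / total

def classify(rules, pattern):
    """Single-winner reasoning: class of the rule maximizing mu * CF."""
    best_score, best_class = 0.0, None
    for terms, cls, cf in rules:
        score = compatibility(terms, pattern) * cf
        if score > best_score:
            best_score, best_class = score, cls
    return best_class
```

With one antecedent term per rule, `learn_consequent` reproduces the certainty-grade formula and `classify` the winner-rule selection above.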
Thus, we can identify the consequent class of an input pattern with the aforementioned formula. The multi-objectivity of our work is embedded within the framework; we will discuss it in the next section.

3. Proposed Method
Our proposed method consists of two stages. In the first stage, it fills the main rules pool with high-quality if-then rules, using a rule-extraction algorithm run in a parallel manner, and in the next stage we select the combination of main-pool rules with the highest accuracy and interpretability and a low variance (robustness).
Fig.1 RS_PSO schema with rules pool, belief space and RSA process
3.1. Stage One (Filling the main pool)
To fill the main pool within an acceptable execution time, we use a parallel approach. Various algorithms can be used in this stage, and we have the option to select the best combination of the best rule-extraction algorithms: the better the rule-extraction algorithms are, the better the results achieved in the next stage. We refer to each of these parallel algorithm instances as a "thread".
In this stage we use our own algorithm for filling the main pool. We generate random rules for each class and put them in a random pool. After filling it, we hand the pool to an imitation-based algorithm such as PSO, with an innovation that we call RS_PSO (Rule Selector Particle Swarm Optimization). It contains a new process for analyzing rule sets, called RSA, as demonstrated in fig. 1. This process uses a belief space, as in cultural evolutionary algorithms; it runs iteratively and deletes unfavorable rules in each iteration. It also takes care not to decrease the test accuracy, which is estimated on a validation dataset. The RS_PSO population consists of binary particles whose length equals the number of rules in the random pool; each dimension of a particle is set to 1 or 0, where 1 indicates the presence of the i-th rule of the random pool and 0 its absence. Initialization is random, with the probability of a rule's absence higher than that of its presence; in essence, we prefer rule sets with few rules, because such rule sets have high interpretability with low variance. Our PSO favors good rule sets rather than isolated single good rules, and if it is executed for several iterations it yields rule sets that cover a large space and improve our method's results in stage two. After initialization, we evaluate the fitness of each rule set, defined as the number of training patterns that are classified correctly. We also evaluate rule sets on the validation dataset, for two reasons: first, to avoid a reduction of accuracy during training, and second, to prevent the presence of rules whose fitness values are lower than a constant (one of the input parameters). This operation is accomplished in the next step of the algorithm, called RSA. We can use Min_NCP (the minimum number of correctly classified patterns) and other criteria as the belief space.
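A minimal sketch of the RS_PSO ingredients just described follows. All names, the presence bias, and the reduction of the belief space to a single Min_NCP threshold are our own illustrative assumptions; the binary PSO velocity and position updates are omitted for brevity.

```python
import random

# Binary rule-set particles: bit i marks presence of the i-th pool rule.
# p_present < 0.5 biases initialization toward small, interpretable rule sets.

def init_particles(n_particles, pool_size, p_present=0.2):
    """Random binary particles biased toward absence of rules."""
    return [[1 if random.random() < p_present else 0
             for _ in range(pool_size)] for _ in range(n_particles)]

def rule_set_fitness(particle, pool, X, y, classify):
    """Fitness: number of training patterns classified correctly."""
    rules = [r for bit, r in zip(particle, pool) if bit]
    if not rules:
        return 0
    return sum(1 for xp, yp in zip(X, y) if classify(rules, xp) == yp)

def rsa_prune(particle, pool, X_val, y_val, classify, min_ncp):
    """RSA step: zero out rules whose validation NCP falls below min_ncp."""
    pruned = list(particle)
    for i, (bit, rule) in enumerate(zip(particle, pool)):
        if bit:
            ncp = sum(1 for xp, yp in zip(X_val, y_val)
                      if classify([rule], xp) == yp)
            if ncp < min_ncp:
                pruned[i] = 0
    return pruned
```

Here `classify` is any rule-set classifier with the single-winner interface of Section 2.2; `rsa_prune` sketches only the thresholding part of RSA.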
In this work the belief space had a low impact, but in other settings it may be more effective. In the RSA process we can also apply other ideas to obtain good rules; for example, we can select rules discovered in various regions of the dataset to make the main pool more complete. In the RSA step, we eliminate the rules of a rule set whose fitness values on the validation dataset are lower than Min_NCP; this operation improves our main pool, because we will have sturdy rules at the end of stage two with high interpretability, since SORL, NOR, and the variances of these measures decrease. Usually, the better rules have more "don't care" terms in their antecedents; for this reason, the interpretability of our rules is enhanced. All remaining steps of the pool-filling algorithm are as in binary PSO, and the procedure runs for a fixed number of epochs (another input parameter). We carried out this idea with various PSO topologies, such as Star, Ring, 4-cluster, Cube, and Von Neumann, but because the results were similar in terms of accuracy, interpretability, and variance, we do not report them. The general schema of stage one and its relation to the second stage are illustrated in fig. 2, which shows the parallel framework of stage one and how the main pool is filled.

3.2. Stage 2: Rule Selection
After filling the main rule pool, we must analyze it; the Pool Analyzer Process (PAP) carries out this task. In PAP, we analyze the pool and perform a rule-based clustering algorithm, so that the best rules end up in the first clusters. The number of clusters can be constant or dynamic according to the state of the rules; in this study, we assume a constant number of clusters. As the number of clusters increases, the robustness of our output rule set grows, so we should choose a value that does not undermine accuracy.
Setting it to a suitable value is an easy procedure. We divide the rules in the pool by their fitness values (the number of correctly classified patterns): we first sort the rules by the fitness values calculated in stage 1, and then divide
Fig.2 A new framework for generating robust rule set
them into clusters. The output of PAP is a set of sub-pools containing clusters of rules. After PAP, we explore the sub-pools one by one. We can use different settings for the exploration of each cluster, which gives us an advantage: every part of the search space has its own unique characteristics, and we can explore each part with parameters matched to its properties. For example, the first clusters have high NCPs and do not require deep exploration, whereas the last clusters have low NCPs and require deep exploration, so we set the parameters accordingly; we will discuss this later. The results in the next section show that this analysis of the pool satisfies our various objectives. Since we place the best rules in the top clusters and we want to find the best groups of rules in these clusters, the final accuracy is acceptable, because the better rules are selected in the first steps. The final rule sets have many "don't care" terms in their antecedents, which enhances interpretability, and if good rules are selected in the first steps, the NOR also decreases. Moreover, clustering and exploring a smaller space decreases our variances. If the NOR in the main pool is n, we must explore a space of size 2^n, but if we cluster the pool into k clusters, the search space is reduced to about k * 2^(n/k), where k is a constant number. After PAP we use an RSP (Rule Selection Process), as in stage one except that sub-pools are used instead of random pools; in fact we use RS_PSO, but anyone can choose their own RSP. Technically, our PSO suits this application because it tends to find the best rule set instead of isolated good rules. We now have k clusters of rules, with the best rules in the first clusters. We run RS_PSO as our RSP on the clusters in sequence: after executing RS_PSO on the first cluster, we save the best extracted rule set, then run RS_PSO on the next cluster plus all rules that were not selected in the previous execution. This time, evaluation is different.
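The clustering step of PAP described above can be sketched as follows; the even fitness-ordered split and the function names are our own simplification of the paper's rule-based clustering.

```python
# Sketch of the Pool Analyzer Process (PAP): sort pool rules by their
# stage-one fitness and split them into k clusters, best rules first; any
# remainder joins the last (weakest) cluster.

def pap_clusters(pool, fitness_values, k):
    """Partition pool rules into k fitness-ordered clusters (best first)."""
    order = sorted(range(len(pool)), key=lambda i: fitness_values[i],
                   reverse=True)
    size = len(pool) // k
    clusters = [[pool[i] for i in order[j * size:(j + 1) * size]]
                for j in range(k)]
    for i in order[k * size:]:   # leftover weakest rules
        clusters[-1].append(pool[i])
    return clusters
```

Each returned cluster then becomes a sub-pool for one sequential RSP run.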
We must evaluate the current population's rule sets together with the previous best rule set; we call this approach the linear topology. Besides this linear topology, one can use other topologies, such as a hierarchical topology, in which the RSP is first executed on each cluster, all best output rule sets are put into another pool, clustering is performed on that pool, and the RSP is run on it again. We can also use a ring topology for hard datasets, in which, after the last RSP finishes on the last cluster, we return to the first cluster again, and so on. After the last execution of the RSP, we evaluate the final rule set on the test dataset. To divide the original datasets into train, validation, and test sets, we take a portion of the patterns at the end of the file as the test dataset and use the rest as the train and validation datasets.

4. Experimental results
In this section we present the experimental results. First, we show results that highlight the impact of using the main pool and of clustering it in the proposed framework; then we compare our work with other fuzzy rule classifier systems. To assess the contribution of the main pool and its clustering, we execute three algorithms: RS_PSO1, which fills the main pool; RS_PSO2, which performs rule selection without clustering; and RS_PSO3, which performs rule selection with clustering. We will see that RS_PSO2 is better than RS_PSO1 in terms of all our objectives, and RS_PSO3 is better than RS_PSO2 in terms of interpretability but not necessarily accuracy. This is expected, since our objectives interact and we must trade them off: when we increase interpretability, the probability of a reduction in accuracy grows. We ran each of the three algorithms on each of the datasets 30 times; hence there are nine results, each corresponding to an algorithm and a dataset.
For each algorithm and dataset, accuracy, SORL, and NOR are measured, and for each of these three, average, variance, standard deviation, and range (max minus min) are calculated.
Our objective is not to beat previous studies in accuracy, but to propose a framework on which other algorithms can be implemented in order to decrease variance and increase interpretability. While doing so, accuracy may stay unchanged, but it often improves and it does not decrease. We use variance, but because it is squared, the measurement unit changes; we therefore also calculate the standard deviation, which has the same unit as the input. The range is reported so that the spread of the output is predictable beforehand. With these experiments we want to show the effect of using a main pool, and of clustering that main pool, on accuracy, NOR, SORL, and the variances of all of them. We divide the datasets into train, validation, and test subsets according to table 1.

Table 1: Dividing the datasets into train, validation, and test sets

Dataset                              Train patterns   Validation patterns   Test patterns
Pima Indians Diabetes                318              150                   300
Statlog (Heart)                      120              50                    100
Contraceptive Method Choice (CMC)    800              200                   473
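The split can be reproduced with the end-of-file rule described earlier (test patterns taken from the end of the file); the function name and the example counts, taken from the Pima row of Table 1, are ours.

```python
# Sketch of the dataset split: test patterns come from the end of the file,
# the remaining patterns become train and validation sets.

def split_dataset(patterns, n_train, n_val, n_test):
    """Split a list of patterns into (train, validation, test)."""
    assert n_train + n_val + n_test == len(patterns)
    return (patterns[:n_train],                 # train
            patterns[n_train:n_train + n_val],  # validation
            patterns[-n_test:])                 # test (end of file)
```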
Table 2 shows the parameter values; for RS_PSO3, hyphen-separated numbers correspond to the successive clusters.

Table 2: Parameter settings of the three algorithms

Algorithm   Dataset   Number of epochs     Number of particles    Pool rules   Number of clusters   Min_NCP
RS_PSO1     PID       100                  100                    500          0                    1
RS_PSO1     S_heart   100                  100                    400          0                    1
RS_PSO1     CMC       100                  100                    900          0                    2
RS_PSO2     PID       100                  100                    300          0                    2
RS_PSO2     S_heart   100                  100                    200          0                    2
RS_PSO2     CMC       100                  100                    200          0                    2
RS_PSO3     PID       50-50-30-30-30-30    100-50-50-50-50-50     300          6                    3-2-2-2-1-1
RS_PSO3     S_heart   50-50-30-30-30-30    100-50-50-50-50-50     200          6                    2-2-1-1-0-0
RS_PSO3     CMC       50-50-30-30          100-50-50-50           200          4                    7-7-6-6
Before we show our results, it is useful to define the measures used in this study. Accuracy is calculated according to the following formula:

Accuracy = (number of correctly classified patterns / total number of patterns) * 100.

The variance is the average of the squared differences from the mean; we apply this formula to the 30 values resulting from the 30 executions of each algorithm:

\sigma^2 = \frac{1}{30} \sum_{i=1}^{30} (x_i - \bar{x})^2.

The standard deviation is a measure of how spread out the numbers are; it is the square root of the variance:

\sigma = \sqrt{\sigma^2}.

The following charts show our results on the three datasets for three measures (accuracy, SORL, and NOR) obtained over 30 runs.
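The robustness measures defined above can be computed over the 30 per-run values as follows; variance is the population form, as in the formula, and the function name is ours.

```python
import math

# Per-measure robustness statistics over a list of run results: average,
# variance (mean squared deviation from the mean), STD, and range (max - min).

def run_statistics(values):
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return {"average": mean, "variance": var,
            "std": math.sqrt(var), "range": max(values) - min(values)}
```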
Fig 3. Results of the three algorithms on the three datasets for three measures (Accuracy, SORL, NOR): a. Accuracy on the Pima dataset; b. SORL on the Pima dataset; c. NOR on the Pima dataset; d. Accuracy on the Heart dataset; e. SORL on the Heart dataset; f. NOR on the Heart dataset; g. Accuracy on the CMC dataset; h. SORL on the CMC dataset; i. NOR on the CMC dataset.
The results show that all of our objectives improve when the learning algorithm employs the main pool (RS_PSO2) and performs rule-based clustering of the main pool (RS_PSO3). If the main pool lacks good rules, our results get worse; for this reason we enhance plain PSO with RSA, though another powerful algorithm could be used in stages one and two. The order of the outputs of the 30 runs is not meaningful, so bar charts would suffice, but we use line charts to make the variance across runs visible. As table 3 shows, after using the main pool (RS_PSO2) or the main pool with clustering (RS_PSO3), all measures improve. Although we used plain PSO in stage one of our framework, our results compare well with other rule-based algorithms.

Table 3: Results on the three datasets after 30 runs
Accuracy
Dataset                              Algorithm   Average   Variance   STD      Max-Min
Pima Indians Diabetes                RS_PSO1     68.981    24.593     4.959    18.162
Pima Indians Diabetes                RS_PSO2     81.558    6.400      2.530    8.766
Pima Indians Diabetes                RS_PSO3     81.428    0.039      0.198    0.649
Statlog (Heart)                      RS_PSO1     63.333    90.916     9.535    32
Statlog (Heart)                      RS_PSO2     81.067    3.236      1.799    5
Statlog (Heart)                      RS_PSO3     81.967    0.033      0.182    1
Contraceptive Method Choice (CMC)    RS_PSO1     41.183    10.043     3.169    13.742
Contraceptive Method Choice (CMC)    RS_PSO2     49.584    1.230      1.109    4.651
Contraceptive Method Choice (CMC)    RS_PSO3     54.221    0.073      0.270    1.268

Sum of Rule Length (SORL)
Dataset                              Algorithm   Average   Variance   STD      Max-Min
Pima Indians Diabetes                RS_PSO1     62.467    132.802    11.524   47
Pima Indians Diabetes                RS_PSO2     23.667    27.402     5.235    20
Pima Indians Diabetes                RS_PSO3     16.233    0.185      0.430    1
Statlog (Heart)                      RS_PSO1     30.367    92.448     9.615    38
Statlog (Heart)                      RS_PSO2     14.700    4.145      2.036    6
Statlog (Heart)                      RS_PSO3     9.567     0.254      0.504    1
Contraceptive Method Choice (CMC)    RS_PSO1     58.100    296.218    17.211   83
Contraceptive Method Choice (CMC)    RS_PSO2     61.200    20.712     4.551    16
Contraceptive Method Choice (CMC)    RS_PSO3     30.000    0          0        0

Number of Rules (NOR)
Dataset                              Algorithm   Average   Variance   STD      Max-Min
Pima Indians Diabetes                RS_PSO1     19.567    7.081      2.661    10
Pima Indians Diabetes                RS_PSO2     14.600    9.145      3.024    11
Pima Indians Diabetes                RS_PSO3     11.000    0          0        0
Statlog (Heart)                      RS_PSO1     10.700    6.631      2.575    10
Statlog (Heart)                      RS_PSO2     7.933     1.306      1.1426   3
Statlog (Heart)                      RS_PSO3     6.000     0          0        0
Contraceptive Method Choice (CMC)    RS_PSO1     20.700    25.735     5.073    23
Contraceptive Method Choice (CMC)    RS_PSO2     31.400    3.834      1.958    7
Contraceptive Method Choice (CMC)    RS_PSO3     16.800    0.234      0.484    1
To compare the overall performance of RS_PSO1, RS_PSO2, and RS_PSO3, we can average the results of each run over our three datasets, as shown in Fig. 4. It shows that on all datasets our framework not only improves accuracy and interpretability (SORL and NOR), but also reduces the variances of accuracy, SORL, and NOR, which are pillars of interpretability. In fact, by clustering the main pool into k clusters, the search space becomes smaller; since the best rules are in the earlier clusters and the search space is smaller, accuracy increases. Setting suitable parameter values has a great impact on increasing accuracy and decreasing the variances of accuracy and interpretability.
Fig.4 Average of Results on Three Datasets
Here we use Knowledge Extraction based on Evolutionary Learning (KEEL) [27], an open-source software package available at http://www.keel.es, to compare our results with others. The algorithms used in the comparison and their parameters are listed in table 4; we tried to set the best parameter values for these algorithms. For every algorithm, the maximum number of rules is set equal to the average NOR of RS_PSO3.

Table 4: KEEL algorithms and parameters

Name                       Author             Runs per dataset   Max rules                  Num of linguistic terms
GFS-AdaBoost-C [28]        del Jesus et al.   30                 average NOR of RS_PSO3     5
FH-GBML-C [29]             Ishibuchi et al.   30                 average NOR of RS_PSO3     5
GFS-GP-C [30]              Otero et al.       30                 average NOR of RS_PSO3     3
SGERD-C [31]               Mansoori et al.    30                 average NOR of RS_PSO3     5
SLAVE-C [32]               González et al.    30                 average NOR of RS_PSO3     5
GFS-MaxLogitBoost-C [33]   Sánchez et al.     30                 average NOR of RS_PSO3     3
GFS-LogitBoost-C [30]      Otero et al.       30                 average NOR of RS_PSO3     5
GFS-SP-C [34]              Sánchez et al.     30                 average NOR of RS_PSO3     5
GFS-GCCL-C [35]            Ishibuchi et al.   30                 average NOR of RS_PSO3     5
GFS-GPG-C [34]             Sánchez et al.     30                 average NOR of RS_PSO3     5
Table 5: The results of the comparisons

Pima Indians Diabetes
Algorithm                  Accuracy average   Accuracy STD
GFS-AdaBoost-C             73.652             0.687
FH-GBML-C                  75.591             2.239
GFS-GP-C                   79.222             3.889
SGERD-C                    75.234             3.281
SLAVE-C                    73.574             2.568
GFS-MaxLogitBoost-C        80.61              1.51
GFS-LogitBoost-C           80.0               1.37
GFS-SP-C                   76.16              2.36
GFS-GCCL-C                 69.55              2.26
GFS-GPG-C                  74.66              2.75
RS_PSO3                    81.428             0.198

Statlog (Heart)
Algorithm                  Accuracy average   Accuracy STD
GFS-AdaBoost-C             62.667             2.645
FH-GBML-C                  77.222             4.584
GFS-GP-C                   79.222             3.889
SGERD-C                    70.63              5.559
SLAVE-C                    78.67              3.897
GFS-LogitBoost-C           225.2              8533
GFS-SP-C                   2.582              2514
GFS-GCCL-C                 76.42              2518
GFS-GPG-C                  24522              8528
GFS-MaxLogitBoost-C        26584              2516
RS_PSO3                    81.967             0.182

Contraceptive Method Choice (CMC)
Algorithm                  Accuracy average   Accuracy STD
GFS-AdaBoost-C             47.884             2.058
FH-GBML-C                  42.284             3.562
GFS-GP-C                   45.119             2.836
SGERD-C                    50.611             1.924
SLAVE-C                    36.395             7.807
GFS-LogitBoost-C           43.16              1.28
GFS-SP-C                   45.36              2.1
GFS-GCCL-C                 43.41              2.95
GFS-GPG-C                  83512              35.9
GFS-MaxLogitBoost-C        8852.              3532
RS_PSO3                    54.221             0.270
We have tried to apply the same conditions to the KEEL algorithms as used in our algorithm. We compare our work only in terms of accuracy and the variance of accuracy, because the number of rules (NOR) in the KEEL algorithms is a constant input parameter; since we set it ourselves, we do not compare the KEEL algorithms with our algorithm in terms of NOR and SORL. We set this parameter to the mean NOR of our algorithm (RS_PSO3); the other algorithm parameters are shown in table 4. The results of the comparisons are given in table 5. After executing the ten KEEL algorithms on the three datasets 30 times each, the results show that we simultaneously improve two criteria that are usually negatively correlated.

5. Conclusion
In this study we consider the variances of accuracy, SORL, and NOR as interpretability measures and propose a new framework that generates rule sets n times simultaneously. It puts the output rule sets in a main pool and, after performing rule-based clustering and analysis on the main pool, runs a binary rule-selection algorithm to generate a robust rule set. In essence, we increase accuracy and interpretability and decrease their variances on three medical datasets. Averaged over the three datasets, accuracy increases from 57.8% to 72.5% by using RS_PSO3 instead of RS_PSO1, while its variance decreases from 41.8 to 0.05. Likewise, by using RS_PSO3 instead of RS_PSO1, the SORL decreases from 50.3 to 18.6 and the NOR from 16.3 to 11.2, while their variances decrease from 174 to 0.15 and from 13.1 to 0.08, respectively. Finally, we compared our results with ten other works, and the results showed that our work performs better, especially in terms of standard deviation (STD). In this paper we have shown that we have been successful in improving two objectives that are negatively correlated and, accordingly, in generating robust fuzzy rule-based systems.
References
1. Gacto, M.J., R. Alcalá, and F. Herrera, Interpretability of linguistic fuzzy rule-based systems: An overview of interpretability measures. Information Sciences, 2011. 181(20): p. 4340-4360.
2. Cordón, O., A historical review of evolutionary learning methods for Mamdani-type fuzzy rule-based systems: Designing interpretable genetic fuzzy systems. International Journal of Approximate Reasoning, 2011. 52(6): p. 894-913.
3. Shun Ngan, P., et al., Medical data mining using evolutionary computation. Artificial Intelligence in Medicine, 1999. 16(1): p. 73-96.
4. Levenick, J., Showing the way: a review of the second edition of Holland's adaptation in natural and artificial systems. Artificial Intelligence, 1998. 100(1-2): p. 331-338.
5. Mencar, C., et al., Interpretability assessment of fuzzy knowledge bases: A cointension based approach. International Journal of Approximate Reasoning, 2011. 52(4): p. 501-518.
6. Alonso, J.M., L. Magdalena, and G. González-Rodríguez, Looking for a good fuzzy system interpretability index: An experimental approach. International Journal of Approximate Reasoning, 2009. 51(1): p. 115-134.
7. Zhou, S.-M. and J.Q. Gan, Low-level interpretability and high-level interpretability: a unified view of data-driven interpretable fuzzy system modelling. Fuzzy Sets and Systems, 2008. 159(23): p. 3091-3131.
8. Guillaume, S. and B. Charnomordic, A new method for inducing a set of interpretable fuzzy partitions and fuzzy inference systems from data, in Interpretability issues in fuzzy modeling. 2003, Springer. p. 148-175.
9. Guillaume, S. and B. Charnomordic, Generating an interpretable family of fuzzy partitions from data. Fuzzy Systems, IEEE Transactions on, 2004. 12(3): p. 324-335.
10. Liu, F., C. Quek, and G.S. Ng, A novel generic hebbian ordering-based fuzzy rule base reduction approach to Mamdani neuro-fuzzy system. Neural Computation, 2007. 19(6): p. 1656-1680.
11. Pulkkinen, P., J. Hytönen, and H. Koivisto, Developing a bioaerosol detector using hybrid genetic fuzzy systems. Engineering Applications of Artificial Intelligence, 2008. 21(8): p. 1330-1346.
12. Alcalá, R., et al., A multi-objective genetic algorithm for tuning and rule selection to obtain accurate and compact linguistic fuzzy rule-based systems. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2007. 15(05): p. 539-557.
13. Gacto, M.J., R. Alcalá, and F. Herrera, Adaptation and application of multi-objective evolutionary algorithms for rule reduction and parameter tuning of fuzzy rule-based systems. Soft Computing, 2009. 13(5): p. 419-436.
14. Gacto, M.J., R. Alcalá, and F. Herrera, Integration of an index to preserve the semantic interpretability in the multiobjective evolutionary rule selection and tuning of linguistic fuzzy systems. Fuzzy Systems, IEEE Transactions on, 2010. 18(3): p. 515-531.
15. Ishibuchi, H., T. Murata, and I.B. Türkşen, Single-objective and two-objective genetic algorithms for selecting linguistic rules for pattern classification problems. Fuzzy Sets and Systems, 1997. 89(2): p. 135-150.
16. Ishibuchi, H., T. Nakashima, and T. Murata, Three-objective genetics-based machine learning for linguistic rule extraction. Information Sciences, 2001. 136(1): p. 109-133.
17. Ishibuchi, H., et al., Selecting fuzzy if-then rules for classification problems using genetic algorithms. Fuzzy Systems, IEEE Transactions on, 1995. 3(3): p. 260-270.
18. Ishibuchi, H. and T. Yamamoto, Fuzzy rule selection by multi-objective genetic local search algorithms and rule evaluation measures in data mining. Fuzzy Sets and Systems, 2004. 141(1): p. 59-88.
19. Ishibuchi, H., T. Murata, and I. Turksen, Selecting linguistic classification rules by two-objective genetic algorithms, in Systems, Man and Cybernetics, 1995. Intelligent Systems for the 21st Century, IEEE International Conference on. 1995. IEEE.
20. Pena-Reyes, C.A. and M. Sipper, A fuzzy-genetic approach to breast cancer diagnosis. Artificial Intelligence in Medicine, 1999. 17(2): p. 131-155.
21. Gadaras, I. and L. Mikhailov, An interpretable fuzzy rule-based classification methodology for medical diagnosis. Artificial Intelligence in Medicine, 2009. 47(1): p. 25-41.
22. Jiménez, F., G. Sánchez, and J.M. Juárez, Multi-objective evolutionary algorithms for fuzzy classification in survival prediction. Artificial Intelligence in Medicine, 2014.
23. Ishibuchi, H. and Y. Nojima, Analysis of interpretability-accuracy tradeoff of fuzzy systems by multiobjective fuzzy genetics-based machine learning. International Journal of Approximate Reasoning, 2007. 44(1): p. 4-31.
24. Ishibuchi, H. and T. Murata, Multi-objective genetic local search algorithm, in Evolutionary Computation, 1996, Proceedings of IEEE International Conference on. 1996. IEEE.
25. Fazzolari, M., et al., A review of the application of multiobjective evolutionary fuzzy systems: Current status and further directions. Fuzzy Systems, IEEE Transactions on, 2013. 21(1): p. 45-65.
26. Ishibuchi, H., K. Nozaki, and H. Tanaka, Distributed representation of fuzzy rules and its application to pattern classification. Fuzzy Sets and Systems, 1992. 52(1): p. 21-32.
27. Alcalá-Fdez, J., et al., KEEL: a software tool to assess evolutionary algorithms for data mining problems. Soft Computing, 2009. 13(3): p. 307-318.
28. Del Jesus, M.J., et al., Induction of fuzzy-rule-based classifiers with evolutionary boosting algorithms. Fuzzy Systems, IEEE Transactions on, 2004. 12(3): p. 296-308.
29. Ishibuchi, H., T. Yamamoto, and T. Nakashima, Hybridization of fuzzy GBML approaches for pattern classification problems. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, 2005. 35(2): p. 359-365.
30. Otero, J. and L. Sánchez, Induction of descriptive fuzzy classifiers with the logitboost algorithm. Soft Computing, 2006. 10(9): p. 825-835.
31. Mansoori, E.G., M.J. Zolghadri, and S.D. Katebi, SGERD: A steady-state genetic algorithm for extracting fuzzy classification rules from data. Fuzzy Systems, IEEE Transactions on, 2008. 16(4): p. 1061-1071.
32. González, A. and R. Pérez, Selection of relevant features in a fuzzy genetic learning algorithm. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, 2001. 31(3): p. 417-425.
33. Sánchez, L. and J. Otero, Boosting fuzzy rules in classification problems under single-winner inference. International Journal of Intelligent Systems, 2007. 22(9): p. 1021-1034.
34. Sánchez, L., I. Couso, and J.A. Corrales, Combining GP operators with SA search to evolve fuzzy rule based classifiers. Information Sciences, 2001. 136(1): p. 175-191.
35. Ishibuchi, H., T. Nakashima, and T. Murata, Performance evaluation of fuzzy classifier systems for multidimensional pattern classification problems. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, 1999. 29(5): p. 601-618.