Generating of Derivative Membership Functions for Fuzzy Association Rule Mining by Particle Swarm Optimization
Fatemeh Alikhademi, Suhaila Zainudin
Faculty of Information Science and Technology, Centre for Artificial Intelligent technology, UKM 43600 Bangi, Selangor Darul Ehsan, Malaysia
Abstract—Association Rule Mining (ARM) is a data mining task that extracts relations between items based on their frequency, and it regards items with high frequency as more interesting than items with low frequency. In quantitative datasets, each item can take a large range of values, so individual values tend to have low frequencies and may not be considered interesting. As a result, potentially interesting relations between these items are less likely to be extracted, which makes this deficiency a challenging issue in the field. Most existing methods for quantitative ARM handle this problem with sharp boundary discretization methods or clustering methods. These methods group the values of each item into non-overlapping intervals with crisp boundaries, which introduces problems of its own, such as ignoring or overemphasizing values near the interval boundaries. To deal with the problem of quantitative ARM, this paper proposes combining S- and Z-shaped fuzzy sets with Particle Swarm Optimization (PSO) to generate appropriate membership functions for each item. Fuzzy logic groups the values of each item into overlapping intervals, and fuzzy rules are then generated from the interesting items. The proposed method is evaluated on the Bilkent datasets and compared with a clustering method (Fuzzy C-Means), first in terms of its capability to transform data into fuzzy data and then in terms of the quality of the generated rules. The results show that the proposed method extracts rules of higher quality.
Keywords—Fuzzy membership functions; fuzzy association rule mining; PSO algorithm
I. INTRODUCTION
In many domains, extracting useful patterns from raw data has attracted researchers' attention, and many techniques have emerged for extracting more informative patterns. Traditionally, analysts used statistical techniques, which are very time-consuming, and this drawback led researchers to employ other techniques. Data mining is one of these techniques: it combines algorithms and concepts from machine learning, statistics, artificial intelligence and data management to analyze data from different perspectives and summarize it into useful information. Data mining includes several tasks, such as association, classification and clustering. In this work, we focus on the association rule mining task. Association rule mining finds correlations between different attributes in a dataset and their effects on each other, and much research has been done on extracting these relations. Agrawal and his team introduced association rule mining in 1993 [1]. The relations are represented as rules (IF-THEN statements). One of the issues in association rule mining is dealing with quantitative values. Traditional algorithms such as Apriori are designed for categorical attributes, since most of them focus on binary or discrete values. However, data in real-world applications usually include quantitative values, so designing data mining algorithms that can deal with different types of data is a challenge for researchers in this field [2]. A common, traditional approach is to discretize each attribute by partitioning its values into intervals, as in the partitioning method [3]. Using fuzzy concepts for association rule mining over quantitative data allows the results to be interpreted in human language and gives smoother transitions between interval boundaries. Researchers who have employed fuzzy association rule mining can generally be divided into two groups according to how they transform the original data into fuzzy data. The first group uses predefined membership functions to perform this transformation. For instance, Hong and his team used predefined membership functions and generated fuzzy rules from quantitative data based on the Apriori algorithm [4]. However, it is unrealistic to expect experts to define the most appropriate fuzzy sets and membership functions for every dataset, so the second group tries to generate appropriate membership functions for each dataset.
For instance, Hong and his group used genetic fuzzy mining in 2006 to extract the membership functions [5]. As mentioned earlier, predefined membership functions provided by experts are not suitable for all kinds of datasets, and the membership functions can affect the results of association rule mining. Therefore, in this work we propose a framework based on Particle Swarm Optimization (PSO) to generate membership functions from quantitative data, and we perform experiments to evaluate the effectiveness of the proposed approach. The remainder of this paper is organized as follows. Section II reviews fuzzy set systems, fuzzy association rule mining and the PSO algorithm. Section III presents the most common criteria for interpreting fuzzy intervals. Section IV explains the methodology of our work, and Section V discusses its performance. Finally, Section VI concludes the paper.
II. RELATED WORK
This section is divided into three parts: fuzzy set concepts, fuzzy association rule mining, and the PSO algorithm.
A. Fuzzy Set
Nowadays, many intelligent systems are based on fuzzy set theory, mainly because in the real world we deal with incompleteness and inaccuracy, which makes fuzzy logic relevant in many domains. Fuzzy set theory was introduced by Lotfi Zadeh in 1965. To represent concepts, he used natural-language words, which may have ambiguous meanings. He also pointed out that each object can take any membership value in the interval [0, 1], whereas traditional logic assigns only one or zero to each object. Thus, the membership of an object can lie anywhere between full membership and non-membership. The degree of membership is defined by a membership function $\mu_A(u): U \to [0, 1]$, which gives the degree to which item $u$ belongs to set $A$. Many studies have addressed the optimization of membership functions, and they can generally be divided into two groups: derivative methods and non-derivative methods [6]. In derivative methods, the shape of the fuzzy sets is predefined, and their performance depends on the parameters that form the shape. For instance, for a triangular shape, the parameters are the centers and half-widths of the triangles. Unlike derivative methods, non-derivative methods do not use predefined parameters and can work with different forms of membership functions. This capability makes them more robust than derivative methods; however, they converge more slowly. In this paper, we propose a derivative method for generating membership functions.
B. Fuzzy Association Rule Mining
The problem of association rule mining on quantitative data can be addressed with fuzzy logic. The main role of fuzzy sets and membership functions is to transform quantitative values into linguistic terms, which reduces the number of possible itemsets in the mining process [7].
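To make the membership function $\mu_A(u)$ and the transformation into linguistic terms more concrete, the following minimal Python sketch uses a parameterized (derivative-style) triangular membership function defined by a center and a half-width, and maps one quantitative value to the terms Low, Middle and High. The attribute, the term names and all parameter values are hypothetical illustrations; this is not the method proposed in this paper, which uses S- and Z-shaped sets.

```python
def triangular(x: float, center: float, half_width: float) -> float:
    """Membership degree of x in a triangular fuzzy set given by its center and half-width."""
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    return max(0.0, 1.0 - abs(x - center) / half_width)


def fuzzify(x: float, terms: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Map a quantitative value to its membership degree in each linguistic term."""
    return {name: triangular(x, c, w) for name, (c, w) in terms.items()}


# Hypothetical linguistic terms for an attribute such as "age": (center, half-width).
AGE_TERMS = {"Low": (20.0, 15.0), "Middle": (35.0, 15.0), "High": (50.0, 15.0)}

if __name__ == "__main__":
    print(fuzzify(30.0, AGE_TERMS))
```

With these assumed parameters, the value 30 belongs partly to Low (about 0.33) and partly to Middle (about 0.67), and not at all to High, so one crisp value is spread over overlapping linguistic terms.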
Although several approaches for mining quantitative association rules exist in the literature and address part of the problem, they introduce other problems [3]. For instance, Srikant and Agrawal used sharp boundaries between intervals, which either ignore or overemphasize values near the interval boundaries.
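The sharp-boundary problem can be illustrated with a small, hypothetical example. With a single crisp cut point at 30, the nearly identical values 29.9 and 30.1 fall into different intervals, each with full membership, whereas two overlapping fuzzy sets give both values almost equal degrees in each set. The cut point, the fuzzy-set centers and the half-width below are illustrative assumptions, not values taken from [3].

```python
def crisp_interval(x: float, cut_points: list[float]) -> int:
    """Index of the crisp (non-overlapping) interval that contains x."""
    for idx, cut in enumerate(cut_points):
        if x < cut:
            return idx
    return len(cut_points)


def fuzzy_degrees(x: float, centers: list[float], half_width: float) -> list[float]:
    """Degrees of x in overlapping triangular fuzzy sets (one per center)."""
    return [max(0.0, 1.0 - abs(x - c) / half_width) for c in centers]


if __name__ == "__main__":
    for value in (29.9, 30.1):
        # Crisp: the two values land in different intervals with full membership.
        # Fuzzy: both values belong to "Low" and "High" with degrees close to 0.5.
        print(value, crisp_interval(value, [30.0]), fuzzy_degrees(value, [20.0, 40.0], 20.0))
```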
Several researchers have applied fuzzy concepts to extract fuzzy association rules. Hong and his team transformed quantitative values into fuzzy data using membership functions, calculated the scalar cardinality of each linguistic term over the transaction data, and finally extracted rules based on the fuzzy counts [8]. Papadimitriou and his group proposed an FP-tree-based method for mining fuzzy association rules [9]. Many other studies in this domain exist in the literature. In the above algorithms, the membership functions are defined by experts. However, it is not easy for experts to identify the most appropriate fuzzy sets in a dynamic environment, so some researchers have designed methods that generate the membership functions automatically. For example, Hong and his team modified their previous work and proposed a new algorithm for extracting both the membership functions and the rules [5]. Kaya and his team proposed a clustering-based Genetic Algorithm to obtain the best membership functions [10]. In this paper, we propose a PSO-based method for generating both membership functions and fuzzy rules.
C. Particle Swarm Optimization
Particle swarm optimization was proposed by Eberhart and Kennedy in 1995, inspired by the social behavior of bird flocks. In PSO, each bird represents a candidate solution to the problem and is called a particle. Each particle has a position and a velocity that allow it to move to other positions based on its own experience and the experience of its neighboring particles. The neighborhood of a particle is either the whole swarm or the particles nearest to it. Therefore, PSO combines local search (birds learn from their own experience) and global search (birds learn from the experience of others around them) to find the optimal solution [11].
Each solution in PSO is represented as a particle with two properties: a position $x_i$ and a velocity $v_i$, both of dimension $N$, where $N$ is the dimension of the problem. The velocity of each particle is updated using $pbest_i$, the best position that particle $i$ has visited so far, and $gbest$, the best position visited so far by any particle in the swarm. Equations (1) and (2) give the update formulas for the velocity and position of each particle:
$$v_i(t) = w\,v_i(t-1) + c_1 r_1\,\big(pbest_i - x_i(t-1)\big) + c_2 r_2\,\big(gbest - x_i(t-1)\big) \tag{1}$$
$$x_i(t) = x_i(t-1) + v_i(t) \tag{2}$$
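Equations (1) and (2) can be sketched in Python as a small, generic PSO minimizer. This is an illustration rather than the configuration used in this paper: the parameter values (w = 0.7, c1 = c2 = 1.5, Vmax = 1), the swarm size, the iteration budget used as the termination criterion, and the function names are all assumptions. The velocity is clipped to [-Vmax, Vmax], r1 and r2 are redrawn for every dimension of every update, and pbest and gbest are tracked as the best positions found so far by each particle and by the whole swarm.

```python
import random


def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, v_max=1.0, rng=random):
    """One update of a single particle: velocity by Eq. (1), then position by Eq. (2)."""
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()  # redrawn for every use, in [0, 1)
        vd = w * v[d] + c1 * r1 * (pbest[d] - x[d]) + c2 * r2 * (gbest[d] - x[d])
        vd = max(-v_max, min(v_max, vd))  # the new velocity must not exceed Vmax
        new_v.append(vd)
        new_x.append(x[d] + vd)
    return new_x, new_v


def minimize(f, dim, n_particles=20, iters=100, lo=-5.0, hi=5.0, seed=0):
    """Generic PSO loop that tracks pbest per particle and gbest for the swarm."""
    rng = random.Random(seed)
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(x) for x in xs]
    pbest_f = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = list(pbest[g]), pbest_f[g]
    for _ in range(iters):  # fixed iteration budget as the termination criterion
        for i in range(n_particles):
            xs[i], vs[i] = pso_step(xs[i], vs[i], pbest[i], gbest, rng=rng)
            fx = f(xs[i])
            if fx < pbest_f[i]:
                pbest[i], pbest_f[i] = list(xs[i]), fx
                if fx < gbest_f:  # gbest updated immediately (asynchronous variant)
                    gbest, gbest_f = list(xs[i]), fx
    return gbest, gbest_f
```

For example, `minimize(lambda p: sum(t * t for t in p), dim=2)` should return a point close to the origin. Updating gbest as soon as any particle improves on it is one common design choice; a synchronous variant would update it once per iteration instead.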
Here, $w$ is the inertia weight, which controls the impact of the previous velocity on the current velocity of the particle, and the new velocity should not exceed the maximum velocity $V_{max}$. $c_1$ and $c_2$ are positive constants, and $r_1$ and $r_2$ are random numbers in the range [0, 1] that are regenerated each time they are used. After updating the velocity and position of each particle,
the particles move to their new positions, and this process is repeated until the termination criterion is satisfied.
III. INTERPRETABILITY OF FUZZY INTERVALS
The formal criteria for evaluating the interpretability of fuzzy intervals with respect to the shape of their membership functions are [12]:
• Normality: A fuzzy set is normal if at least one element of the universe of discourse U has full membership (degree one) in it; normality requires every fuzzy set $F_i$ to be normal:
$$\forall F_i,\ \exists x \in U: F_i(x) = 1 \tag{3}$$
• Convexity: The membership values of each membership function are monotonically increasing, monotonically decreasing, or monotonically increasing and then monotonically decreasing. Mathematically:
$$\forall a, b, c \in U: a \le c \le b \rightarrow F_i(c) \ge \min\{F_i(a), F_i(b)\} \tag{4}$$
• Leftmost and rightmost fuzzy sets: For each variable, the leftmost and rightmost linguistic terms should have full membership (value one) at the left and right limits of the universe of discourse, respectively:
$$\exists F_1, F_c \in F: F_1(\min U) = 1,\ F_c(\max U) = 1 \tag{5}$$
• Coverage / α-completeness: The universe of discourse U should be fully covered by the fuzzy sets, so that every value belongs to at least one fuzzy set with a degree of at least α:
$$\forall x \in U,\ \exists F_i \in F: F_i(x) \ge \alpha \tag{6}$$
α is the coverage level and is usually set so that α ≥ 0.5, so that the fuzzy sets cover all data in the domain.
• Distinguishability: Neighboring fuzzy sets should not overlap too much. Mathematically:
$$\forall F_i, F_{i+1} \in F: \sup_{x \in U} \min\{F_i(x), F_{i+1}(x)\} \le \beta \tag{7}$$
The overlap level β is usually set so that β ≥ 0.5.
IV. METHODOLOGY
This section explains our framework for generating appropriate membership functions and fuzzy rules. In the first phase, PSO generates the most suitable membership functions for each attribute, using S- and Z-shaped fuzzy sets. At the final iteration, the best membership functions are collected and used to transform the quantitative data into fuzzy data, and the resulting dataset is passed to the next phase, which generates the fuzzy rules.
A. Swarm Representation
Each of the two phases uses its own particle representation.
Swarm representation for generating membership functions: In the first phase, each particle encodes the membership functions of one item with K parameters, where K is the number of fuzzy intervals; in this paper, we use three intervals. Each parameter is the center of one interval, a real number between the minimum and maximum values of the item, and the centers must satisfy $c_1 \le c_2 \le c_3$, where $c_i$ is the center of interval $i$. The particle for an item is therefore encoded as $(c_1, c_2, c_3)$. The three corresponding membership functions are Z-shaped, a combination of Z- and S-shaped, and S-shaped, respectively. This encoding reduces computation time and memory requirements. The overlap point $X_i$ of two neighboring fuzzy sets is calculated as:
$$X_i = \frac{c_i + c_{i+1}}{2} \tag{8}$$
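The first-phase encoding can be sketched in Python as follows: a particle for one item is a list of K = 3 centers inside the item's value range, and the overlap points are the midpoints of neighboring centers. The paper states the ordering constraint $c_1 \le c_2 \le c_3$ but not how it is enforced, so the `repair` step (clip into the value range, then sort) is our assumption, as are all function names.

```python
import random


def init_particle(lo: float, hi: float, k: int = 3, rng=random) -> list[float]:
    """Random particle (c_1, ..., c_K) for one item, with lo <= c_1 <= ... <= c_K <= hi."""
    return sorted(rng.uniform(lo, hi) for _ in range(k))


def repair(centers: list[float], lo: float, hi: float) -> list[float]:
    """Assumed repair step: clip centers into [lo, hi] and sort them into order."""
    return sorted(min(hi, max(lo, c)) for c in centers)


def overlap_points(centers: list[float]) -> list[float]:
    """Overlap points X_i = (c_i + c_{i+1}) / 2 of neighboring fuzzy sets."""
    return [(a + b) / 2.0 for a, b in zip(centers, centers[1:])]


if __name__ == "__main__":
    centers = repair([50.0, -3.0, 20.0], lo=0.0, hi=40.0)
    print(centers, overlap_points(centers))
```

Applying `repair` after every position update is one simple way to keep particles feasible; penalizing infeasible particles in the fitness function would be an alternative.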
Swarm representation for generating fuzzy rules: In this phase, a binary representation is used, and the length of each particle is 4 × (number of attributes). Each attribute uses 4 bits. The first bit (ER) indicates whether the attribute exists in the rule: 0 means it does not, and 1 means it does. The second bit (PR) gives its position in the rule: the antecedent (0) or the consequent (1). The last two bits (AV) give the attribute value, which is in the Low (00), Middle (01 or 10) or High (11) fuzzy set. Fig. 1 shows the swarm representation for generating rules.
[Fig. 1: particle layout; Attribute 1 ... Attribute N, each encoded as ER | PR | AV AV]
Fig. 1. Representation of the particles in the second phase
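The binary encoding above can be decoded into a readable IF-THEN rule with a short Python sketch. The attribute names, the `Rule` container and the validity check (at least one item in the antecedent and one in the consequent) are hypothetical illustrations; the paper does not state how particles that encode incomplete rules are handled.

```python
from typing import NamedTuple

# AV bits to fuzzy set: Low (00), Middle (01 or 10), High (11).
TERMS = {(0, 0): "Low", (0, 1): "Middle", (1, 0): "Middle", (1, 1): "High"}


class Rule(NamedTuple):
    antecedent: list[tuple[str, str]]
    consequent: list[tuple[str, str]]


def decode(bits: list[int], attributes: list[str]) -> Rule:
    """Decode a 4*N-bit particle (ER, PR, AV, AV per attribute) into a rule."""
    if len(bits) != 4 * len(attributes):
        raise ValueError("particle length must be 4 * number of attributes")
    antecedent, consequent = [], []
    for j, name in enumerate(attributes):
        er, pr, av1, av2 = bits[4 * j: 4 * j + 4]
        if er == 0:  # attribute does not appear in the rule
            continue
        item = (name, TERMS[(av1, av2)])
        (consequent if pr == 1 else antecedent).append(item)
    return Rule(antecedent, consequent)


def is_valid(rule: Rule) -> bool:
    """Assumed check: a usable rule needs at least one item on each side."""
    return bool(rule.antecedent) and bool(rule.consequent)


if __name__ == "__main__":
    rule = decode([1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1], ["A", "B", "C"])
    print(rule, is_valid(rule))  # IF A is Low THEN B is High; C is absent
```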
B. Fitness Function
In the first phase, the goodness of the membership functions of each item is evaluated using the criteria described in Section III. To satisfy these criteria, we use a combination of S- and Z-shaped fuzzy sets for the membership functions. To guarantee the leftmost and rightmost fuzzy sets criterion, a Z-shaped function is used for the leftmost fuzzy set and an S-shaped function for the rightmost fuzzy set. Equation (9) describes the membership function that combines the S and Z shapes for the fuzzy set centered at $c_{i+1}$. The overlap point $X_i$ has a membership degree of 0.5, and each fuzzy set has full membership (degree 1) at its center, so the normality criterion is also guaranteed.
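$$\mu_F(x) = \begin{cases}
0, & x \le c_i \\
\frac{1}{2}\left(\frac{x - c_i}{X_i - c_i}\right)^2, & c_i < x \le X_i \\
1 - \frac{1}{2}\left(\frac{x - c_{i+1}}{c_{i+1} - X_i}\right)^2, & X_i < x < c_{i+1} \\
1, & x = c_{i+1} \\
1 - \frac{1}{2}\left(\frac{x - c_{i+1}}{X_{i+1} - c_{i+1}}\right)^2, & c_{i+1} < x \le X_{i+1} \\
\frac{1}{2}\left(\frac{x - c_{i+2}}{c_{i+2} - X_{i+1}}\right)^2, & X_{i+1} < x < c_{i+2} \\
0, & x \ge c_{i+2}
\end{cases} \tag{9}$$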
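The membership functions for the three-interval case can be sketched in Python, following the shape described for Equation (9): degree 1 at each center and degree 0.5 at each overlap point, with a Z-shaped leftmost set (Low), a combined S-then-Z middle set (Middle) and an S-shaped rightmost set (High). The sketch also checks normality, the leftmost/rightmost condition, coverage and distinguishability on a grid of points. The grid resolution, the defaults α = β = 0.5 and all function names are our assumptions, and a finite grid only approximates the formal definitions of Section III.

```python
def s_curve(x: float, a: float, b: float) -> float:
    """S-shaped rise from 0 at a to 1 at b, with value 0.5 at the midpoint (a + b) / 2."""
    if x >= b:
        return 1.0
    if x <= a:
        return 0.0
    m = (a + b) / 2.0  # overlap point X_i when a = c_i and b = c_{i+1}
    if x <= m:
        return 0.5 * ((x - a) / (m - a)) ** 2
    return 1.0 - 0.5 * ((x - b) / (b - m)) ** 2


def memberships(x: float, c: tuple[float, float, float]) -> tuple[float, float, float]:
    """Degrees of x in Low (Z-shaped), Middle (S then Z) and High (S-shaped)."""
    c1, c2, c3 = c
    low = 1.0 - s_curve(x, c1, c2)
    middle = s_curve(x, c1, c2) if x <= c2 else 1.0 - s_curve(x, c2, c3)
    high = s_curve(x, c2, c3)
    return low, middle, high


def check_criteria(c, lo, hi, alpha=0.5, beta=0.5, n=1001):
    """Approximate check of the Section III criteria on n grid points of [lo, hi]."""
    xs = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    table = [memberships(x, c) for x in xs]
    return {
        # normality: each fuzzy set reaches degree 1 at its own center
        "normality": all(memberships(cj, c)[j] == 1.0 for j, cj in enumerate(c)),
        "leftmost_rightmost": table[0][0] == 1.0 and table[-1][2] == 1.0,
        "coverage": all(max(row) >= alpha for row in table),
        "distinguishability": all(
            min(row[j], row[j + 1]) <= beta for row in table for j in range(2)
        ),
    }


if __name__ == "__main__":
    centers = (10.0, 25.0, 40.0)  # hypothetical centers for an item ranging over [0, 50]
    print(memberships(17.5, centers), check_criteria(centers, 0.0, 50.0))
```

Because Low and Middle sum to 1 between $c_1$ and $c_2$ (and Middle and High between $c_2$ and $c_3$), the maximum degree is at least 0.5 and the overlap is at most 0.5 everywhere, which is why this shape satisfies coverage and distinguishability at the 0.5 levels.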