2011 Second International Conference on Intelligent Systems, Modelling and Simulation
Multi-level Fuzzy Association Rules Mining via Determining Minimum Supports and Membership Functions

Ehsan Vejdani Mahmoudi
Elahe Sabetnia
Masood Niazi Torshiz
Mehrdad Jalali
Ghamarnaz Tadayon Tabrizi
Ehsan Vejdani Mahmoudi and Elahe Sabetnia: Islamic Azad University, Mashhad Branch, Young Researchers Club, Mashhad, Iran
Masood Niazi Torshiz, Mehrdad Jalali, and Ghamarnaz Tadayon Tabrizi: Department of Computer Engineering, Islamic Azad University - Mashhad Branch, Mashhad, Iran
e.vejdani@mshdiau.ac.ir
[email protected] [email protected] [email protected] [email protected]
Abstract— Association rule mining searches a fairly large data set for items whose relationships are consequential. Traditional association mining based on a uniform minimum support either misses interesting patterns of low support or suffers from the bottleneck of itemset generation. An alternative solution relies on exploiting support constraints, which specify the minimum support required of itemsets. This paper proposes an ACS-based algorithm to determine membership functions for each item, followed by computing minimum supports. It then immediately runs the fuzzy multi-level mining algorithm to extract the knowledge implicit in quantitative transactions. The new approach offers three benefits: it specifies the membership functions for each item, computes the minimum support for each item with regard to that item's characteristics in the database, and makes the system automatic. The proposed algorithm can effectively derive multiple-level association rules under multiple item supports.

Keywords- fuzzy data mining; multiple minimum supports; association rule; membership functions; ant colony system.

978-0-7695-4336-9/11 $26.00 © 2011 IEEE DOI 10.1109/ISMS.2011.20

I. INTRODUCTION

Data mining extracts implicit, previously unknown, and potentially useful information from databases. The discovered information and knowledge are useful for various applications, including market analysis, decision support, fraud detection, and business management [1]. Particularly interesting are association rules that reflect relationships among items in datasets. Recall that, in general, associations express specific semantics linking data items together, in the sense that if X → Y is such an association then "occurrence of X is associated with occurrence of Y", where X and Y are attributes of data items. Recently, fuzzy set theory [2] has been used more and more frequently in intelligent systems because of its simplicity and similarity to human reasoning [3]. Several fuzzy learning algorithms for inducing rules from given sets of data have been designed and used to good effect in specific domains [4], [5]. As to fuzzy data mining, Hong and Kuo proposed a mining approach that integrated fuzzy-set concepts with the Apriori mining algorithm [6] to find interesting itemsets and fuzzy association rules in transaction data with quantitative values. However, these mining algorithms are mostly based on the assumption that users can specify the minimum support appropriate to their databases, and are thus referred to as Apriori-like algorithms [7], [8]. Han et al. [9] have pointed out that setting the minimum support is quite subtle, which can hinder the widespread application of these algorithms. Our own experience of mining transaction databases also tells us that the setting is by no means an easy task. With existing algorithms that assume a single minimum support, the best one can do is to apply such algorithms at the lowest minimum support specified and filter the result using the other minimum supports. This approach generates many candidates that are later discarded. Hong et al. proposed a method for multi-level fuzzy mining with multiple minimum supports [10], [11], but in that approach the user must specify a minimum support for each item. Furthermore, taxonomic relationships among items often appear in real applications. For example, wheat bread and white bread are two kinds of bread; bread is thus a higher-level concept than wheat bread or white bread. The information needed by decision makers in some applications need not be detailed down to the primitive concept level, but may be at a higher one. For example, the association rule "bread → milk" may be more helpful to decision makers than the rule "wheat bread → juice milk". Discovering association rules at different levels may thus provide more information than mining only at a single level [12], [13]. Basically, fuzzy mining algorithms first use membership functions to transform each quantitative value into a fuzzy set of linguistic terms, and then use a fuzzy mining process to find fuzzy association rules.
Because items have their own characteristics, different minimum supports and membership functions may be specified for different items. This paper consists of three phases. In the first phase, membership functions are extracted for each item by the ACS algorithm of Vejdani et al., without specifying an actual minimum support [14]. In the second phase, a minimum support is computed for each item in the database from its own features. In the third phase, the fuzzy multi-level mining algorithm starts with multiple
minimum supports of items to extract implicit knowledge from transactions stored as quantitative values. The remainder of this paper is organized as follows. The ACS algorithm, multiple-level mining, and the computation of a minimum support for each item are reviewed in Section II. The proposed algorithm, a modified framework for multi-level fuzzy mining with multiple minimum supports, is described in Section III. An example illustrating the proposed algorithm is given in Section IV. Conclusions are given in Section V.

II. PRELIMINARIES

In this section, we explain the ACS algorithm and multiple-level mining, and then illustrate how the minimum support for each item can be computed.

A. The ACS-based fuzzy mining algorithm

Recently, Ant Colony Systems (ACS) have been successfully applied to several difficult NP-hard problems, such as the Traveling Salesman Problem (TSP), the Job Scheduling Problem (JSP), and Vehicle Routing Problems (VRP) [15], [16], [17]. They are heuristic approaches inspired by the behavior of social insects. Ants deposit chemical trails called "pheromone" on the ground to communicate with one another; by following the pheromone, ants can find the shortest path between a source and a destination. The proposed algorithm is divided into two phases. The first phase searches for an appropriate set of membership functions for the items by the ACS mining algorithm. After the search for solutions in the first phase is finished, the best set of membership functions is used for fuzzy data mining in the second phase. The Ant Colony System algorithm plays an important role in extracting the membership functions; an ACS algorithm was proposed for extracting membership functions without specifying a minimum support [14]. We assume the parameters of the membership functions are discrete values and use the ACS algorithm to find them, transforming the extraction of membership functions into a route-search problem. A route then represents a possible set of membership functions, and artificial ants, which are virtual ants used to solve this problem, can find a nearly optimal solution. Each item has a set of membership functions, which are assumed to have the shape of an isosceles triangle for simplicity. The membership functions stand for linguistic terms, such as low, middle, and high. Each membership function thus has two parameters: a center and half the spread (called the span) [14]. Here, we use the taxonomy to obtain membership functions for each group at the first level. The membership function of each parent node in the taxonomy tree is obtained by averaging the centers and spans of the membership functions of its children. Thus, each class at the first level has its own membership functions, and the children in each class use the membership functions of their parents at the first level of the taxonomy tree when converting quantitative values to fuzzy values.

B. Mining multiple-level association rules

Previous studies on data mining focused on finding association rules at a single concept level. Mining association rules at multiple concept levels may, however, lead to the discovery of more general and important knowledge from data. Relevant item taxonomies are usually predefined in real-world applications and can be represented as hierarchy trees. Terminal nodes on the trees represent actual items appearing in transactions; internal nodes represent classes or concepts formed from lower-level nodes [18]. A simple example is given in Figure 1, where the root node "Food" is at level 0, the internal nodes representing categories (such as "Milk") are at level 1, the internal nodes representing flavors (such as "Chocolate") are at level 2, and the terminal nodes representing brands (such as "Kaleh") are at level 3. Only terminal nodes appear in transactions. Han and Fu proposed a method for finding level-crossing association rules at multiple levels [13]. Their method could find flexible association rules not confined to strict, pre-arranged conceptual hierarchies. Nodes in predefined taxonomies are first encoded using sequences of numbers and the symbol "*" according to their positions in the hierarchy tree [10]. For example, the internal node "Milk" in Figure 1 is represented by 1**, the internal node "Chocolate" by 11*, and the terminal node "Pegah" by 111. A top-down, progressively deepening search approach is used, and exploration of "level-crossing" association relationships is allowed [10].

C. Computing the multiple minimum supports

A variety of mining approaches based on the Apriori algorithm have been proposed, each for a specific problem domain, a specific data type, or for improving efficiency. In these approaches, the minimum support for all the items or itemsets is set at a single value. Liu et al. proposed an approach for mining association rules with non-uniform minimum support values [19]. In reality, there are many good reasons why the minimum support should not be uniform. First, deviations and exceptions often have much lower support than general trends [20]. For example, rules for accidents are much less supported than rules for non-accidents, but the former are often more interesting than the latter. Second, the support requirement often varies with the support of the items contained in an itemset: rules containing bread and milk usually have higher support than rules containing food processor and pan. Third, item presence has less support than item absence. Fourth, the support requirement often varies at different concept levels of items [12], [13]. Fifth, hierarchical classification as in [21] requires feature terms to be discovered at different concept levels, thereby requiring a non-uniform minimum support. Finally, in recommender
systems [22], recommendation rules are required to cater to both big and small groups of customers. In general, rules of high support are well known to the user; it is the rules of low support that may provide interesting insights and need to be discovered [20]. Hong et al. proposed multi-level fuzzy mining with multiple minimum supports [10], [11]; in their method, the user must specify a minimum support for each item. However, it has been recognized that setting the minimum support is a difficult task for users, which can hinder the widespread application of these algorithms. In this paper, we propose a method for computing the minimum support of each item from its own characteristics in the database. The important criteria for computing the minimum support are the number of occurrences of the item in the database and the sum of the item's values in the database. For example, suppose item A occurs 10 times in the database with a value sum of 20, while item B occurs only twice with a value sum of 20. Clearly, in the mining process item A is more valuable than item B, so we compute a minimum support for item B under which it cannot satisfy the minimum support. Accordingly, we suggest Equation (1):

minsup(i) = Si / (T × N × P)   (1)

Let I = {i1, i2, ..., im} be a set of items and D = {t1, t2, ..., tn} be a set of transactions. N is the total number of transactions, T is the number of occurrences of an item in the database, Si is the sum of the values of the item in database D, and P is a constant in the interval [0, 1].

III. THE PROPOSED ALGORITHM

The proposed algorithm modifies the framework of multi-level fuzzy mining with multiple minimum supports [10], [11]. The proposed mining algorithm integrates fuzzy set concepts, data mining, the ant colony system, and a multiple-level taxonomy to find fuzzy association rules in a given transaction data set. The derived knowledge is represented by fuzzy linguistic terms and is thus easily understandable by human beings. In contrast to the earlier algorithm [10], [11], which uses one fixed set of membership functions for all items and requires the user to specify a minimum support for each item, we use membership functions specified for each item and a minimum support computed from each item's own characteristics. In real-world applications, such as those working on the transactional data of chain stores, items have different quantities; hence, using different membership functions and minimum supports for each item when mining association rules is an efficient idea. In this work, unlike previous ones in which the user specified the minimum supports and membership functions, the minimum supports are computed by preprocessing all items, and the membership functions are obtained by the ACS algorithm for each item. The minimum support for an itemset is set as the maximum of the minimum supports of the items contained in the itemset [10]. The proposed fuzzy mining algorithm first encodes the items (nodes) in a given taxonomy, as Han and Fu's approach did [13]. It then filters out unpromising itemsets: the count of each fuzzy region is checked to determine whether it is larger than the support threshold. In this phase, a set of membership functions is used to transform the quantitative transactions into fuzzy values. The proposed algorithm then finds all the large itemsets for the given transactions by comparing the fuzzy count of each candidate itemset with its support threshold. The details of the proposed fuzzy mining algorithm are described with an example below.

IV. AN EXAMPLE

In this section, a simple example is given to demonstrate the proposed fuzzy mining algorithm, which generates a set of fuzzy taxonomic association rules from a given quantitative transaction dataset under the maximum-itemset minimum-taxonomy support constraint with multiple minimum supports and multiple membership functions. Assume the quantitative transaction dataset includes the ten transactions shown in Table III. Each transaction includes a transaction ID and some purchased items. Each item is represented by a tuple (item name, item amount); in Table III, item names are coded by the method described in Section II. Assume the predefined taxonomy shown in Figure 1. The food in Figure 1 falls into four classes: milk, bread, cookies, and beverages. Milk can be further classified into chocolate milk and plain milk. There are two brands of chocolate milk, Pegah and Kaleh. The other nodes can be explained similarly. The proposed algorithm has three phases: in the first phase, the ACS algorithm is used to extract membership functions for each item; in the second phase, a minimum support is computed for each item in the database using the formula given in Section II; and then the multi-level fuzzy mining with multiple minimum supports and multiple membership functions starts.

Figure 1. The predefined taxonomy in the example

A. Determining membership functions

In the first phase, the method of Vejdani et al. is used to obtain membership functions for each item [14], and membership functions are then specified by the ACS algorithm for each class at the first level, as explained in Section II. In this example, amounts are represented by three fuzzy regions: Low, Middle, and High. Thus, three fuzzy membership values are produced for each item amount according to the obtained membership functions.
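The triangular membership functions and the parent-averaging rule described above can be sketched as follows. This is a minimal illustration using the (center, span) values reported for the Milk class later in this section, not the authors' ACS implementation; the function names are ours.

```python
def triangular(x, center, span):
    """Isosceles-triangle membership: 1 at the center, falling to 0 at +/- span."""
    return max(0.0, 1.0 - abs(x - center) / span)

def parent_mf(children):
    """Average the children's (center, span) pairs, region by region
    (Low, Middle, High), to obtain the parent class's membership functions."""
    n = len(children)
    return [
        (sum(c[r][0] for c in children) / n, sum(c[r][1] for c in children) / n)
        for r in range(3)
    ]

# (center, span) triples for the four milk items, as given in the example:
# Pegah chocolate, Kaleh chocolate, Pegah plain, Kaleh plain.
milk_children = [
    [(5, 5), (10, 5), (15, 5)],
    [(4, 4), (8, 4), (14, 5)],
    [(3, 3), (10, 5), (16, 4)],
    [(4, 4), (9, 5), (16, 6.5)],
]
milk_class = parent_mf(milk_children)
print(milk_class)  # [(4.0, 4.0), (9.25, 4.75), (15.25, 5.125)]

# Fuzzifying an amount of 5 with the Milk class's Low region (4, 4):
print(triangular(5, 4, 4))  # 0.75
```

The amount 5 thus maps to 0.75/Low, matching the fuzzy value used for the milk amount of transaction T1 in the example.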
B. Determining minimum supports

In the second phase, Equation (1) is used to compute the minimum support of each item, which depends on the number of occurrences of the item in the database and on the sum of the item's values in the database. For example, consider the item "Razavi white bread", encoded as "211" in Table III: the sum of its values is 9 + 5 + 4 + 1 + 2 = 21, its number of occurrences is T = 5, the total number of transactions is N = 10, and P is set to 30% in this example. Thus:

minsup(211) = (9 + 5 + 4 + 1 + 2) / (5 × 10 × 0.30) = 1.4

The other items are handled in the same way. The results are shown in Table I.

TABLE I. THE COMPUTED MINIMUM SUPPORT FOR EACH ITEM

Item name (terminal node)    Minimum Support
Pegah chocolate milk         2
Kaleh chocolate milk         1.55
Pegah plain milk             0.33
Kaleh plain milk             2
Razavi white bread           1.4
Khayam white bread           1.16
Razavi wheat bread           1.44
Khayam wheat bread           1.1
Naderi chocolate cookies     1.55
Dorna chocolate cookies      0.66
Naderi lemon cookies         2.3
Dorna lemon cookies          1.55
Ahmed black tea beverage     1.33
Golestan black tea beverage  1.4
Ahmed green tea beverage     1.6
Golestan green tea beverage  1.33

Figure 2 shows the membership functions used in the class Milk. There are four items, "Pegah chocolate milk", "Kaleh chocolate milk", "Pegah plain milk", and "Kaleh plain milk", whose membership functions were obtained by the ACS algorithm; each of the regions "Low", "Middle", and "High" is given as a pair (Center, Span), and the average of the children's membership functions gives the membership functions of the Milk class. The membership functions of Pegah chocolate milk are (5, 5), (10, 5), (15, 5); those of Kaleh chocolate milk are (4, 4), (8, 4), (14, 5); those of Pegah plain milk are (3, 3), (10, 5), (16, 4); and those of Kaleh plain milk are (4, 4), (9, 5), (16, 6.5). Thus, the membership functions of the class Milk are (4, 4), (9.25, 4.75), (15.25, 5.125). The membership functions of bread, cookies, and beverages, obtained as described above, are shown in Figure 3.

Figure 2. The membership functions used in the class of Milk

Figure 3. Membership functions used in the classes of bread, cookies and beverage

C. Third phase: the mining algorithm

Step 1) Each item name is first encoded using the predefined taxonomy. The results are shown in Table II. For example, the item "Kaleh chocolate milk" is encoded as '112', in which the first digit '1' represents the class 'milk' at level 1, the second digit '1' represents the flavor 'chocolate' at level 2, and the third digit '2' represents the brand 'Kaleh' at level 3.
Step 2) The item names in the transaction data are translated according to the encoding scheme. The results are shown in Table III.
Step 3) Membership functions are specified by the ACS algorithm for each class at the first level, as explained in Section II.
Step 4) k is initially set to 1, where k stores the level number being processed.
Step 5) All the items in the transactions are first grouped at level one and their corresponding amounts are added. Take the items in transaction T1 as an example: the items (411, 4) and (412, 6) are grouped into (4**, 10). Results for all the transaction data are shown in Table IV.
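The per-item minimum-support computation of Equation (1) can be sketched as follows. This is a small illustration using the occurrences of item 211 ("Razavi white bread") from the example transactions; the function and variable names are ours, not from the paper.

```python
def min_support(values, n_transactions, p):
    """Equation (1): minsup(i) = Si / (T * N * P), where Si is the sum of
    the item's quantities, T its number of occurrences, N the total number
    of transactions, and P a constant in [0, 1]."""
    s_i = sum(values)   # Si: sum of the item's values in the database
    t = len(values)     # T: number of occurrences of the item
    return s_i / (t * n_transactions * p)

# Quantities of item 211 in the ten example transactions (Table III).
razavi_white_bread = [9, 5, 4, 1, 2]
minsup_211 = min_support(razavi_white_bread, n_transactions=10, p=0.30)
print(minsup_211)  # 1.4
```

With P fixed, an item that occurs often receives a lower threshold relative to its value sum than a rare item with the same sum, which is exactly the bias toward frequent, valuable items motivated in Section II.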
TABLE II. CODES OF ITEM NAMES

Code  Item name (terminal node)
111   Pegah chocolate milk
112   Kaleh chocolate milk
121   Pegah plain milk
122   Kaleh plain milk
211   Razavi white bread
212   Khayam white bread
221   Razavi wheat bread
222   Khayam wheat bread
311   Naderi chocolate cookies
312   Dorna chocolate cookies
321   Naderi lemon cookies
322   Dorna lemon cookies
411   Ahmed black tea beverage
412   Golestan black tea beverage
421   Ahmed green tea beverage
422   Golestan green tea beverage

Code  Item name (internal node)
1**   milk
2**   bread
3**   cookies
4**   beverage
11*   chocolate milk
12*   plain milk
21*   white bread
22*   wheat bread
31*   chocolate cookies
32*   lemon cookies
41*   black tea beverage
42*   green tea beverage

TABLE III. ENCODED TRANSACTION DATA IN THE EXAMPLE

TID  Items
T1   (112,5) (211,9) (222,2) (411,4) (412,6)
T2   (211,5) (221,3) (411,1) (412,3) (422,5)
T3   (212,3) (222,4) (322,7) (412,5)
T4   (111,3) (211,4) (212,2) (311,3) (411,7) (412,1)
T5   (211,1) (322,5) (412,6)
T6   (212,4) (322,2) (311,9) (422,3)
T7   (112,6) (121,1) (221,7) (311,2)
T8   (111,9) (122,6) (212,1) (312,2)
T9   (112,3) (212,3) (222,4) (321,5) (421,2)
T10  (211,2) (212,8) (221,3) (321,9) (421,8)

TABLE IV. LEVEL-1 REPRESENTATION FOR THE TRANSACTION DATA IN THE EXAMPLE

TID  Items
T1   (1**,5) (2**,11) (4**,10)
T2   (2**,8) (4**,9)
T3   (2**,7) (3**,7) (4**,5)
T4   (1**,3) (2**,6) (3**,3) (4**,8)
T5   (2**,1) (3**,5) (4**,6)
T6   (2**,4) (3**,11) (4**,3)
T7   (1**,7) (2**,7) (3**,2)
T8   (1**,15) (2**,1) (3**,2)
T9   (1**,3) (2**,7) (3**,5) (4**,2)
T10  (2**,13) (3**,9) (4**,8)

Step 6) The quantitative values of the items at level 1 are represented as fuzzy sets using the membership functions obtained in Step 3. Take the first item in transaction T3 as an example: the amount in (2**, 7) is mapped to the bread class's membership functions, giving Low with value 0.25, Middle with value 0.75, and High with value 0. The transformed fuzzy set is then represented as 0.25/2**.Low + 0.75/2**.Mid. This step is repeated for the other items, and the results are shown in Table V, where the notation item.term is called a fuzzy region.
Step 7) The fuzzy regions with membership values larger than zero are collected as the candidate set C1. The scalar cardinality of each region in C1 is then calculated as its count value. Take the fuzzy region 3**.Middle as an example: its cardinality is 0.8 + 0.2 + 0.2 + 0.8 + 0.4 + 0.4 + 0.8 = 3.6. This step is repeated for the other regions, and the results are shown in Table VI.
Step 8) The fuzzy region with the highest count among the three possible regions for each item is found. Take item '1**' as an example: its count is 2.5 for Low, 0.63 for Middle, and 0.95 for High. Since the count for Low is the highest of the three, the region Low is used to represent item '1**' in later mining processes. This step is repeated for the other items: 'Middle' is chosen for 2**, 'Middle' for 3**, and 'High' for 4**. Each chosen region is then checked against the maximum of the minimum supports of the items included in it. According to the computed minimum supports shown in Table I, the minimum supports of 1**, 2**, 3**, and 4** are 2, 1.44, 2.3, and 1.6, respectively. Since the count values of 1**.Low, 2**.Middle, 3**.Middle, and 4**.High satisfy their respective minimum supports, they are put into the set of large 1-itemsets L1.
Step 9) Since L1 is not null, the next step is done.
Step 10) The candidate 2-itemsets C2 are generated from L1, such as (1**.Low, 2**.Middle). They are shown in Table VII.

TABLE V. THE LEVEL-1 FUZZY SETS TRANSFORMED FROM THE DATA IN TABLE IV

TABLE VI. THE COUNTS OF THE LEVEL-1 FUZZY REGIONS

Itemset     Count
1**.Low     2.5
1**.Middle  0.63
1**.High    0.95
2**.Low     2.75
2**.Middle  4
2**.High    1.75
3**.Low     2.6
3**.Middle  3.6
3**.High    1.8
4**.Low     2
4**.Middle  2.34
4**.High    3.32
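The region selection and minimum-support check of Steps 7 and 8 can be sketched as follows. The region counts are copied from Table VI and the class-level minimum supports from Step 8; this is a minimal illustration rather than the authors' implementation.

```python
# Level-1 fuzzy-region counts from Table VI.
counts = {
    "1**": {"Low": 2.5, "Middle": 0.63, "High": 0.95},
    "2**": {"Low": 2.75, "Middle": 4.0, "High": 1.75},
    "3**": {"Low": 2.6, "Middle": 3.6, "High": 1.8},
    "4**": {"Low": 2.0, "Middle": 2.34, "High": 3.32},
}
# Class-level minimum supports: the maximum of the children's minimum
# supports in Table I.
minsup = {"1**": 2.0, "2**": 1.44, "3**": 2.3, "4**": 1.6}

# Step 8: keep, for each item, its highest-count region, and accept it as a
# large 1-itemset only if the count reaches the item's minimum support.
large_1 = {}
for item, regions in counts.items():
    best = max(regions, key=regions.get)
    if regions[best] >= minsup[item]:
        large_1[item] = best

print(large_1)  # {'1**': 'Low', '2**': 'Middle', '3**': 'Middle', '4**': 'High'}
```

All four chosen regions pass their thresholds here, reproducing the set L1 of the example.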
TABLE VII. THE CANDIDATE 2-ITEMSETS AND THEIR COUNTS AT LEVEL 1

2-Itemset                 Count
(1**.Low, 2**.Middle)     1.75
(1**.Low, 3**.Middle)     1.35
(1**.Low, 4**.High)       1.41
(2**.Middle, 3**.Middle)  2.1
(2**.Middle, 4**.High)    1.75
(3**.Middle, 4**.High)    0.8

Step 11) The following substeps are done for each newly generated candidate 2-itemset in C2:
a) The fuzzy membership value of the candidate 2-itemset is calculated for each transaction, using the minimum operator for the intersection. Take (1**.Low, 2**.Middle) as an example: its derived membership value for transaction T1 is min(0.75, 0.25) = 0.25. The results for the other transactions are shown in Table VIII.
b) The scalar cardinality (count) of each candidate 2-itemset in C2 is calculated. The results for this example are shown in Table VII.
c) Since only the count value of (2**.Middle, 4**.High) is larger than its support threshold (the maximum of the minimum supports of its items), it is put into the set of large 2-itemsets L2. That is, L2 = {(2**.Middle, 4**.High)}.

TABLE VIII. THE MEMBERSHIP VALUES FOR (1**.LOW, 2**.MIDDLE)

TID  1**.Low  2**.Middle  1**.Low ∩ 2**.Middle
T1   0.75     0.25        0.25
T2   0        1           0
T3   0        0.75        0
T4   0.75     0.5         0.5
T5   0        0           0
T6   0        0           0
T7   0.25     0.75        0.25
T8   0        0           0
T9   0.75     0.75        0.75
T10  0        0           0

Step 12) r is set at 2, where r represents the number of regions stored in the current large itemsets.
Step 13) Since L2 is not null, the next step is done.
Step 14) Since there is only one 2-itemset in L2, the set of candidate 3-itemsets generated at level 1 is null. Then k = k + 1 = 2, and Step 5 is repeated.
Step 15) Steps 5 to 14 are executed again for level 2. The large itemsets found for level 2 are shown in Table IX. After the large itemsets for level 2 have been found, Steps 5 to 12 are executed again for level 3. The large itemsets for level 3 are shown in Table X.
Step 16) Since the set of candidate 2-itemsets generated at level 3 is null, the algorithm goes to Step 13. In addition, since the last level of the given taxonomy has been reached, Step 17 is then executed.

TABLE IX. THE SET OF LARGE ITEMSETS AT LEVEL 2 IN THE EXAMPLE

Large r-itemsets       Count
11*.Low                2.75
21*.Low                5
22*.Low                4.25
31*.Low                2.2
32*.Middle             3
41*.Middle             2.3
42*.Low                2
(21*.Low, 22*.Low)     2.25
(21*.Low, 41*.Middle)  1.58
(21*.Low, 42*.Low)     2

TABLE X. THE SET OF LARGE ITEMSETS AT LEVEL 3 IN THE EXAMPLE

Large r-itemsets  Count
112.Low           2
211.Low           2.5
212.Low           4
221.Low           1.75
222.Low           2.5
312.Low           0.8
322.Middle        1.8
412.Middle        2.66
422.Low           1.34

Step 17) The association rules for each large q-itemset, q ≥ 2, are constructed by the following substeps:
a) All possible association rules are formed; they are shown in Table XI.
b) The confidence factors of the association rules are calculated. Take the possible association rule "If 2** = Middle, then 4** = High" as an example. Its confidence is calculated as:

∑(2**.Middle ∩ 4**.High) / ∑(2**.Middle) = 1.75 / 4 = 0.43
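The fuzzy intersection of Step 11 and the confidence computation of Step 17b can be sketched as follows. The per-transaction membership values are copied from Table VIII and the counts from Tables VI and VII; this is a hedged sketch, not the paper's code.

```python
# Per-transaction membership values of 1**.Low and 2**.Middle (Table VIII).
low_1 = [0.75, 0, 0, 0.75, 0, 0, 0.25, 0, 0.75, 0]
mid_2 = [0.25, 1, 0.75, 0.5, 0, 0, 0.75, 0, 0.75, 0]

# Step 11a/b: intersect with the minimum operator, then take the scalar
# cardinality (count) of the candidate 2-itemset.
count = sum(min(a, b) for a, b in zip(low_1, mid_2))
print(count)  # 1.75

# Step 17b: confidence of "If 2** = Middle then 4** = High", i.e. the count
# of the intersection (Table VII) over the count of the antecedent region
# (Table VI). The paper reports this value rounded as 0.43.
confidence = 1.75 / 4
print(confidence)  # 0.4375
```

The same minimum-operator intersection extends to longer itemsets by taking the minimum over all regions of the itemset in each transaction.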
TABLE XI. ALL POSSIBLE ASSOCIATION RULES WITH MINIMUM CONFIDENCE

Rule ID  Fuzzy Association Rule           Confidence
1        If 4** = High then 2** = Middle  0.53
2        If 2** = Middle then 4** = High  0.43
3        If 21* = Low then 22* = Low      0.45
4        If 22* = Low then 21* = Low      0.52
5        If 21* = Low then 42* = Low      0.4
6        If 42* = Low then 21* = Low      1
7        If 21* = Low then 41* = Middle   0.32
8        If 41* = Middle then 21* = Low   0.69

Step 18) The confidence values of the above association rules are compared with the predefined confidence threshold. Assume the threshold is set at 0.5 in this example; the rules with IDs 1, 4, 6, and 8 are then accepted, so four fuzzy association rules are generated. The proposed algorithm can thus find the large itemsets level by level without backtracking.

V. CONCLUSIONS
In this paper, we have proposed a mining algorithm that integrates fuzzy set concepts, data mining, the ant colony system, and a multiple-level taxonomy to find fuzzy
association rules in a given transaction data set; it consists of three phases. In the first phase, membership functions are extracted by the ACS algorithm without specifying an actual minimum support; in the second phase, minimum supports are computed for each item in the database from its own features; and in the third phase, the fuzzy multi-level mining algorithm starts with the multiple minimum supports of the items to extract implicit knowledge from transactions stored as quantitative values. In the proposed algorithm, the minimum support for an item at a higher taxonomic concept level is set as the maximum of the minimum supports of the items belonging to it. Our approach has three benefits: it specifies the membership functions for each item, computes the minimum support for each item with regard to that item's characteristics in the database, and makes the system automatic. The most important difference between our algorithm and existing fuzzy association rule mining algorithms is that our model does not require the user to supply membership functions or minimum-support thresholds. An example was also given to demonstrate that the proposed mining algorithm can derive multiple-level association rules under multiple item supports in a simple and effective way.

REFERENCES
[1] C. I. Chang, H. E. Chueh, N. P. Lin, "Sequential Patterns Mining with Fuzzy Time-Intervals", Sixth International Conference on Fuzzy Systems and Knowledge Discovery, 2009, pp. 165-169.
[2] L. A. Zadeh, "Fuzzy sets", Information and Control, vol. 8, no. 3, 1965, pp. 338-353.
[3] A. Kandel, Fuzzy Expert Systems, CRC Press, Boca Raton, 1992, pp. 8-19.
[4] A. Gonzalez, "A learning methodology in uncertain and imprecise environments", International Journal of Intelligent Systems, vol. 10, 1995, pp. 57-37.
[5] T. P. Hong and J. B. Chen, "Processing individual fuzzy attributes for fuzzy rule induction", Fuzzy Sets and Systems, vol. 112, no. 1, 2000, pp. 127-140.
[6] T. P. Hong, C. S. Kuo, S. C. Chi, "Trade-off between time complexity and number of rules for fuzzy mining from quantitative data", International Journal of Uncertainty, Fuzziness, and Knowledge-Based Systems, vol. 9, no. 5, 2001, pp. 587-604.
[7] S. Zhang, J. Lu, C. Zhang, "A fuzzy-logic-based method to acquire user threshold of minimum-support for mining association rules", International Journal of Information Sciences, vol. 164, 2004, pp. 1-16.
[8] C. Zhang, S. Zhang, "Association rules mining: Models and algorithms", Lecture Notes in Computer Science, Springer-Verlag, vol. 2307, 2002, p. 243.
[9] J. Han, J. Wang, Y. Lu, P. Tzvetkov, "Mining top-k frequent closed patterns without minimum support", Second IEEE International Conference on Data Mining (ICDM'02), 2002, pp. 211-217.
[10]