(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 9, December 2010
Extracting Membership Functions Using ACS Method via Multiple Minimum Supports

Ehsan Vejdani Mahmoudi
Islamic Azad University, Mashhad Branch, Young Researchers Club, Mashhad, Iran
[email protected]

Masood Niazi Torshiz
Department of Computer Engineering, Islamic Azad University - Mashhad Branch, Mashhad, Iran
[email protected]

Mehrdad Jalali
Department of Computer Engineering, Islamic Azad University - Mashhad Branch, Mashhad, Iran
[email protected]
Abstract- Ant Colony Systems (ACS) have been successfully applied to a variety of optimization problems in recent years. However, only a few works have applied the ACS method to data mining. This paper addresses that gap by proposing an ACS-based algorithm to extract membership functions in fuzzy data mining. The membership functions are encoded into binary bits and then given to the ACS method to discover the optimum set of membership functions. This approach allows a comprehensive search and supports system automation, since the proposed model does not require any user-specified minimum support threshold. We evaluated the approach experimentally, and the results show a significant improvement in the extracted membership functions.

Keywords- fuzzy data mining; multiple minimum supports; association rule; membership functions; ant colony system.

I. INTRODUCTION

Recently, fuzzy set theory has been used more and more frequently in intelligent systems because of its simplicity and similarity to human reasoning [1]. As to fuzzy data mining, Hong and Kuo proposed a mining approach that integrated fuzzy-set concepts with the Apriori mining algorithm [2].

Ant Colony Optimization (ACO) is a branch of a larger field referred to as Swarm Intelligence (SI). SI is the property of a system whereby the collective behaviors of simple agents interacting locally with their environment cause coherent functional global patterns to emerge [3]. It is the behavioral simulation of social insects such as bees, ants, wasps, and termites. This behavioral simulation came about for many reasons; optimization of systems and learning about self-organization are two of the reasons why scientists are interested in simulating these insects. More specifically, ACO simulates the collective foraging habits of ants: ants venture out for food and bring their discovered food back to the nest. Ants have poor vision and poor communication skills, and a single ant faces a poor probability of longevity. However, a large group, or swarm, of ants can collectively perform complex tasks with proven effectiveness, such as gathering food, sorting corpses, or performing division of labor [4]. Ant colony algorithms are a heuristic approach inspired by the behavior of social insects. Ants deposit chemical trails called "pheromone" on the ground to communicate with others. By following the pheromone, ants can find the shortest path between the source and the destination. Recently, Ant Colony Systems (ACS) have been successfully applied to several difficult NP-hard problems, such as the quadratic assignment problem [5], communication strategies [6], the production sequencing problem [7], the Job Schedule Problem (JSP) [8], the traveling salesman problem [9], [10], and Vehicle Routing Problems (VRP) [11].

Basically, fuzzy mining algorithms first use membership functions to transform each quantitative value into a fuzzy set in linguistic terms and then use a fuzzy mining process to find fuzzy association rules. Because items have their own characteristics, different minimum supports should be specified for different items. Han, Wang, Lu, and Tzvetkov [12] pointed out that setting the minimum support is quite subtle, which can hinder the widespread application of these algorithms. Our own experience of mining transaction databases also tells us that the setting is by no means an easy task. Therefore, we propose a method for computing the minimum support of each item in the database from the item's own features. This approach makes the global search more effective and efficient and supports system automation, because our model does not require a user-specified minimum support threshold.

Numerical experiments on the proposed algorithm are also performed to show its effectiveness. The remaining parts of the paper are organized as follows. Section II presents an ACS-based mining framework. The proposed algorithm based on this framework is described in Section III. Numerical simulations are shown in Section IV. Conclusions are given in Section V.

II. THE ACS-BASED FUZZY MINING FRAMEWORK
In this section, the ACS-based fuzzy mining framework [13] is shown in Fig. 1, where each item has its own membership function set. These membership function sets are then fed into the ant colony system to search for the final proper sets. When the termination condition is reached, the best membership function set (the one with the highest fitness value) can then be used to mine fuzzy association rules from a database.
http://sites.google.com/site/ijcsis/ ISSN 1947-5500
The proposed framework modifies the ACS-based framework for fuzzy data mining in [13]. The framework is divided into two phases. The first phase searches for an appropriate set of membership functions for the items using the ACS mining algorithm. The second phase uses the best membership functions found in the first phase for fuzzy data mining.
The ACS algorithm plays an important role in extracting the membership functions. In the past, Parpinelli et al. proposed the AntMiner to discover association rules [14]. They worked on categorical attributes and discrete values, and showed that the ACS algorithm performed well on handling discrete values in a solution space. In this work, we treat the parameters of the membership functions as discrete values and use the ACS algorithm to find them. We transform the extraction of membership functions into a route-search problem, in which a route represents a possible set of membership functions. The artificial ants, that is, the virtual ants used to solve this problem, can then be used to find a nearly optimal solution.

Figure 1. The ACS-based framework for fuzzy data mining

III. THE ACS-BASED FUZZY DATA MINING ALGORITHM

A. Initializations
Since representing the membership functions of all items together results in a long code, we encode the membership functions of each item into a binary code. We use the coding algorithm presented in [13]. Furthermore, we use the state transition rule, pheromone updating rule, local updating rule, and global updating rule defined in [15].

In this work, each item has a set of isosceles-triangular membership functions. The membership functions stand for linguistic terms such as low, middle, and high. Transforming quantitative values into linguistic terms requires a feasible population. Therefore, we need to initialize and update a population during the evolution process. In this work, we use the fitness function proposed by Chen et al. [16] to obtain a good set of membership functions.

B. ACS-based fuzzy data mining algorithm
Although the algorithms in [15] and [16] use one constant minimum support for all items, we apply a separately determined minimum support to each item. In real-world applications, such as transactional data from chain stores, items occur in different quantities. Hence, using a different minimum support for each item when extracting membership functions is an efficient idea. Unlike previous approaches, in which the user specified the minimum supports, the new approach obtains the minimum supports by preprocessing all items. In other words, the minimum support of each item is set automatically to a value that corresponds to the quantity of the item. We considered a method for computing the minimum support of each item from its characteristics in the database. Significant criteria for computing the minimum support include the number of times each item occurs in the database and the sum of the values of each item in the database. For example, suppose item A occurs 10 times in the database with a value sum of 20, and item B occurs 2 times with a value sum of 20. Clearly, item A is more valuable than item B in the mining process. We compute the minimum support for item B so that this item cannot satisfy it. Accordingly, we suggest (1):

min_Sup(I) = ∑ … ∗ … ∗ …    (1)

Let I = {i1, i2, ..., im} be a set of items and D = {t1, t2, ..., tn} be a set of transactions. N is the total number of transactions. T is the number of times an item occurs in the database. Si is the sum of the values of an item in database D. P is a constant in the interval [0, 1]. In addition, we use the parameters defined in [13]: the number of artificial ants, the minimum pheromone ratio of an ant, the evaporation ratio of pheromone, the local updating ratio, and the global updating ratio. The proposed ACS-based algorithm for mining membership functions and fuzzy association rules is given as follows.

INPUT: a) quantitative transaction data, b) a set of m items, each with l predefined linguistic terms, c) a maximum number of iterations G, d) a constant P in the interval [0, 1].
OUTPUT: An appropriate set of membership functions for all items in fuzzy data mining.
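To make the notion of a set of isosceles-triangular membership functions concrete, the following minimal Python sketch maps one quantitative value to membership degrees for the linguistic terms low, middle, and high. The (center, half_width) parameterisation, the function names, and the numeric values are assumptions chosen for illustration; they are not the binary encoding of [13].

```python
from typing import Dict, Tuple

# Hypothetical parameterisation: each linguistic term is an isosceles
# triangle described by (center, half_width). Values are illustrative only.
MembershipSet = Dict[str, Tuple[float, float]]


def triangular_membership(value: float, center: float, half_width: float) -> float:
    """Degree of `value` in an isosceles triangle that peaks (1.0) at `center`
    and falls linearly to 0.0 at center - half_width and center + half_width."""
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    return max(0.0, 1.0 - abs(value - center) / half_width)


def fuzzify(value: float, terms: MembershipSet) -> Dict[str, float]:
    """Map one quantitative value to a membership degree per linguistic term."""
    return {
        name: triangular_membership(value, center, half_width)
        for name, (center, half_width) in terms.items()
    }


# Illustrative membership function set for a single item (assumed numbers).
example_terms: MembershipSet = {
    "low": (2.0, 4.0),
    "middle": (6.0, 4.0),
    "high": (10.0, 4.0),
}
degrees = fuzzify(5.0, example_terms)
```

With these assumed triangles, a purchased quantity of 5 belongs to low with degree 0.25, to middle with degree 0.75, and to high with degree 0.0.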
step 1) Let p = 1, where p is used to keep the identity number of the item to be processed.
step 2) Represent the fuzzy mining problem as a multi-stage graph consisting of a set of nodes and a set of edges, in which each edge runs from a node in the i-th stage to a node in the (i+1)-th stage. Initially set the pheromone on every edge to 0.5.
step 3) Let the initial generation g = 1.
step 4) Set up the complete route for each artificial ant by the following substeps.
a) Select the edges from start to end according to the state transition rule.
b) Update the pheromone of the edges passed through by the ant according to the local updating rule.
step 5) Evaluate the fitness value of the solution (membership functions) obtained by each artificial ant according to the following substeps.
a) For each transaction datum $t_i$, i = 1 to n, transfer its quantitative value $v_p^{(i)}$ for item $I_p$ into a fuzzy set $f_p^{(i)}$ according to the membership functions obtained from the ant. That is, $v_p^{(i)}$ is represented as in (2):

$$ f_p^{(i)} = \frac{f_{p1}^{(i)}}{R_{p1}} + \frac{f_{p2}^{(i)}}{R_{p2}} + \cdots + \frac{f_{pk}^{(i)}}{R_{pk}} + \cdots + \frac{f_{pl}^{(i)}}{R_{pl}} \qquad (2) $$

where $R_{pk}$ is the k-th fuzzy region (linguistic term) of item $I_p$, $f_{pk}^{(i)}$ is the fuzzy membership value of $v_p^{(i)}$ in region $R_{pk}$, and l is the number of fuzzy membership functions.
b) The scalar cardinality of each region in the transactions is calculated as in (3):

$$ count_{pk} = \sum_{i=1}^{n} f_{pk}^{(i)} \qquad (3) $$

where $f_{pk}^{(i)}$ is the fuzzy membership value of region $R_{pk}$ from the i-th datum.
c) Check for each $R_{pk}$ whether its $count_{pk}/n$ is larger than or equal to the minimum support threshold. If $R_{pk}$ satisfies this condition, put it in the set of large 1-itemsets ($L_1$).
d) Calculate the fitness value of the solution from the ant by dividing the number of large 1-itemsets in $L_1$ by the suitability, as in (4):

$$ fitness = \frac{|L_1|}{suitability} \qquad (4) $$

step 6) Once all the artificial ants have found their entire routes, the one holding the highest fitness value is used to update the pheromone according to the global updating rule.
step 7) If the generation g is equal to G, output the current best set of membership functions of item $I_p$ for fuzzy data mining; otherwise, set g = g + 1 and go to step 4.
step 8) If p ≠ m, set p = p + 1 and go to step 2 for another item; otherwise, stop the algorithm.

The final set of membership functions output in step 7 and the 1-itemsets obtained are then used to mine fuzzy association rules from the given database.

IV. NUMERICAL SIMULATION

We experimentally evaluated the proposed algorithm to assess its performance. The experiments were implemented in C/C++ on a computer with an Intel Core 2 Duo 2.66 GHz processor and 4 GB of main memory, running the Microsoft Windows 7 operating system. We used two datasets. The first is the dataset from [13], with a total of 64 items and 10,000 transactions. The second is a real dataset called FOODMART, from an anonymous chain store [17]. The FOODMART dataset contains quantitative transactions about the products sold in the chain store; there were 21,556 transactions with 1600 items in the dataset used in the experiments. The initial number of ants was set to 10. The parameters of the ACS algorithm were set as follows: the initial ratio of pheromone was 0.05, the minimum pheromone of ants was 0.2, the evaporation ratio was 0.9, the local updating ratio was 0.1, and the global updating ratio was 0.9. The minimum support was set to 0.0015 for the FOODMART dataset and to 0.04 for dataset [13]. The constant P in (1) was set to 0.05 for the FOODMART dataset and to 0.02 for dataset [13].

The average fitness values of the artificial ants over different numbers of generations for the two datasets are shown in Fig. 2 and Fig. 3.

[Plot: average fitness values versus generations for ACS with constant minimum support and ACS with multiple minimum supports]
Figure 2. The average fitness values along with different numbers of generations with dataset [13]

It can be seen from Fig. 2 and Fig. 3 that, with our approach, the average fitness values increased by an offset compared with the previous approach and became stable within fewer generations. In addition, Fig. 4 uses a smaller range of generations to compare our model with the existing one, which uses a static constant minimum support. It shows that our model achieved its best fitness at 300 generations, whereas the existing one reached its best fitness at 500 generations.
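As a reading aid for the fitness values reported in the experiments, the evaluation in step 5 of the algorithm (fuzzification, scalar cardinality, the large 1-itemset check, and the fitness ratio) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the (center, half_width) triangle parameterisation and the function names are hypothetical, the minimum support is passed in rather than computed from (1), and the suitability term, which is defined in [16], is treated as a precomputed input.

```python
from typing import Dict, List, Tuple

Triangle = Tuple[float, float]  # (center, half_width): an assumed encoding


def membership(value: float, tri: Triangle) -> float:
    """Isosceles-triangular membership degree of `value`."""
    center, half_width = tri
    return max(0.0, 1.0 - abs(value - center) / half_width)


def evaluate_fitness(
    values: List[float],          # quantitative value of one item per transaction
    regions: Dict[str, Triangle], # candidate membership functions from one ant
    min_support: float,           # threshold for this item (assumed given)
    suitability: float,           # defined in [16]; assumed precomputed here
) -> Tuple[List[str], float]:
    """Return the large 1-itemsets of the item and the ant's fitness value."""
    n = len(values)
    # Substeps a) and b): fuzzify every value and sum the membership
    # degrees per region to obtain the scalar cardinalities.
    counts = {name: 0.0 for name in regions}
    for v in values:
        for name, tri in regions.items():
            counts[name] += membership(v, tri)
    # Substep c): keep regions whose support count/n reaches the threshold.
    large = [name for name, c in counts.items() if c / n >= min_support]
    # Substep d): fitness is the number of large 1-itemsets over suitability.
    return large, len(large) / suitability


# Illustrative data only: four transactions and three assumed triangles.
large_1, fitness = evaluate_fitness(
    values=[1.0, 2.0, 6.0, 6.0],
    regions={"low": (2.0, 4.0), "middle": (6.0, 4.0), "high": (10.0, 4.0)},
    min_support=0.4,
    suitability=1.0,
)
```

In this toy run, the scalar cardinalities are 1.75 for low, 2.0 for middle, and 0.0 for high, so low and middle pass the 0.4 threshold and the fitness is 2.0. Passing the threshold per item is what allows a different minimum support to be used for each item.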
[Plot: average fitness values versus generations for ACS with constant minimum support and ACS with multiple minimum supports]
Figure 3. The average fitness values along with different numbers of generations with FOODMART dataset

[Plot: average fitness values versus generations for ACS with constant minimum support and ACS with multiple minimum supports]
Figure 4. The average fitness values along with different numbers of generations with dataset [13] (in smaller scale)

[Plot: average fitness values versus generations for ACS with constant minimum support and ACS with multiple minimum supports]
Figure 5. The average fitness values along with different numbers of generations with FOODMART dataset (in smaller scale)

As shown in Fig. 5, the result of running the ACS algorithm with multiple minimum supports on the FOODMART dataset is much better than that of the ACS algorithm with a constant minimum support, since it has higher average fitness values. The FOODMART dataset contains a very large number of items, which makes it difficult for the artificial ants to optimize the membership functions. Nevertheless, the ACS algorithm with multiple minimum supports handled this case and extracted membership functions with high average fitness values.

The numbers of large 1-itemsets over different generations are shown in Fig. 6. The curve of the existing method stabilized after about three thousand generations, while the curve of our approach remained constant after one thousand generations. Moreover, the number of large 1-itemsets of our approach is clearly much higher.

[Plot: number of large 1-itemsets versus generations for ACS with constant minimum support and ACS with multiple minimum supports]
Figure 6. The numbers of large 1-itemsets along with different numbers of generations with dataset [13]

Fig. 7 illustrates the numbers of large 1-itemsets over different generations for the FOODMART dataset. The proposed ACS algorithm increased the number of large 1-itemsets between 50 and 500 generations and stabilized after about 500 generations, while the existing method showed no change as the number of generations increased, since the existing algorithm cannot cope with the FOODMART dataset, which has a large number of items.
[Plot: number of large 1-itemsets versus generations for ACS with constant minimum support and ACS with multiple minimum supports]
Figure 7. The numbers of large 1-itemsets along with different numbers of generations with FOODMART dataset

Fig. 8 and Fig. 9 show the execution time of the ACS algorithms for different numbers of generations. The execution time increased with the number of generations in both line graphs. Our approach has the same execution time as the existing one for smaller numbers of generations, and a slightly higher execution time for large numbers of generations.

[Plot: time (seconds) versus generations for ACS with constant minimum support and ACS with multiple minimum supports]
Figure 8. The execution time of the ACS mining algorithm with dataset [13]

[Plot: time (seconds) versus generations for ACS with constant minimum support and ACS with multiple minimum supports]
Figure 9. The execution time of the ACS mining algorithm with FOODMART dataset

In the following study, we assessed the efficiency of the ACS algorithms with a scalability test on the two datasets. The number of generations was fixed at 500 for all runs. The average fitness values of the artificial ants for different sizes of dataset [13] are shown in Fig. 10. As the size of the dataset increases, more accurate membership functions are extracted, and the artificial ants can learn more and find proper solutions, whereas the existing algorithm shows no change as the size of the dataset increases. Fig. 11, obtained on the FOODMART dataset, shows the same behavior as Fig. 10.

[Plot: average fitness values versus size of dataset (0% to 100%) for ACS with constant minimum support and ACS with multiple minimum supports]
Figure 10. The average fitness values along with different size of dataset [13]

[Plot: average fitness values versus size of dataset (0% to 100%) for ACS with constant minimum support and ACS with multiple minimum supports]
Figure 11. The average fitness values along with different size of FOODMART dataset
The numbers of large 1-itemsets for different sizes of the datasets are shown in Fig. 12 and Fig. 13. As the size of the dataset increased, the number of large 1-itemsets increased as well. The existing algorithm, ACS with constant minimum support, had high values at first but remained steady thereafter.

[Plot: number of large 1-itemsets versus size of dataset (0% to 100%) for ACS with constant minimum support and ACS with multiple minimum supports]
Figure 12. The numbers of large 1-itemsets along with different size of dataset [13]

[Plot: number of large 1-itemsets versus size of dataset (0% to 100%) for ACS with constant minimum support and ACS with multiple minimum supports]
Figure 13. The numbers of large 1-itemsets along with different size of FOODMART dataset

Fig. 14 and Fig. 15 show the execution time of the ACS algorithms for different dataset sizes. As can be observed, the execution times of the two algorithms are nearly equal. This indicates that the proposed algorithm improves efficiency without increasing the execution time, which encourages us to employ it for extracting membership functions.

[Plot: time (seconds) versus size of dataset (0% to 100%) for ACS with constant minimum support and ACS with multiple minimum supports]
Figure 14. The execution time of the ACS mining algorithm along with different size of dataset [13]

[Plot: time (seconds) versus size of dataset (0% to 100%) for ACS with constant minimum support and ACS with multiple minimum supports]
Figure 15. The execution time of the ACS mining algorithm along with different size of FOODMART dataset

V. CONCLUSIONS

In this paper, we investigated the application of the ACS algorithm to extracting membership functions for fuzzy data mining and proposed an algorithm for this purpose. The approach delivers two benefits: the use of multiple minimum supports, and system automation. The computational results show that our work can serve as an alternative for effective association rule mining.

The most significant difference between our algorithm and older ACS algorithms for extracting membership functions is its independence from a user-specified minimum support threshold. The experimental results encourage us to improve the system further and to apply this strategy in real-world applications.
REFERENCES
[1] A. Kandel, "Fuzzy expert systems," CRC Press, pp. 8-19, 1992.
[2] T. P. Hong, et al., "Trade-off between time complexity and number of rules for fuzzy mining from quantitative data," International Journal of Uncertainty Fuzziness and Knowledge-Based Systems, vol. 9, pp. 587-604, 2001.
[3] E. Bonabeau, et al. (1999). Swarm intelligence from natural to artificial systems. Available: http://link.library.utoronto.ca/eir/EIRdetail.cfm?Resources__ID=432271&T=F
[4] W. B. Tao, et al., "Object segmentation using ant colony optimization algorithm and fuzzy entropy," Pattern Recognition Letters, vol. 28, pp. 788-796, May 1 2007.
[5] V. Maniezzo and A. Colorni, "The ant system applied to the quadratic assignment problem," IEEE Transactions on Knowledge and Data Engineering, vol. 11, pp. 769-778, Sep-Oct 1999.
[6] M. Dorigo, et al., "Ant system: Optimization by a colony of cooperating agents," IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics, vol. 26, pp. 29-41, Feb 1996.
[7] P. R. McMullen, "An ant colony optimization approach to addressing a JIT sequencing problem with multiple objectives," Artificial Intelligence in Engineering, vol. 15, pp. 309-317, Jul 2001.
[8] A. Colorni, et al., "Ant system for job-shop scheduling," Belgian Journal of Operations Research, Statistics and Computer Science, vol. 34, pp. 39-53, 1994.
[9] S. C. Chu, et al., "Ant colony system with communication strategies," Information Sciences, vol. 167, pp. 63-76, Dec 2 2004.
[10] M. Dorigo and L. M. Gambardella, "Ant colony system: a cooperative learning approach to the traveling salesman problem," IEEE Transactions on Evolutionary Computation, vol. 1, pp. 53-66, 1997.
[11] A. Wade and S. Salhi, "An ant system algorithm for the mixed vehicle routing problem with backhauls," in Metaheuristics: Computer Decision-Making, Norwell, MA: Kluwer, pp. 699-719, 2004.
[12] J. Han, et al., "Mining top-k frequent closed patterns without minimum support," in Proceedings of the 2002 IEEE International Conference on Data Mining, pp. 211-218, 2002.
[13] T. P. Hong, et al., "An ACS-based framework for fuzzy data mining," Expert Systems with Applications, vol. 36, pp. 11844-11852, Nov 2009.
[14] R. S. Parpinelli, et al., "An Ant Colony Based System for Data Mining: Application to Medical Data," The Genetic and Evolutionary Computation Conference, pp. 791-798, 2001.
[15] T. P. Hong, et al., "Extracting membership functions in fuzzy data mining by Ant Colony Systems," Proceedings of 2008 International Conference on Machine Learning and Cybernetics, Vols 1-7, pp. 3979-3984, 2008.
[16] C. H. Chen, et al., "Cluster-based evaluation in fuzzy-genetic data mining," IEEE Transactions on Fuzzy Systems, vol. 16, pp. 249-262, Feb 2008.
[17] Microsoft Corporation, "Example Database FoodMart of Microsoft Analysis Services."