Extracting Membership Functions Using ACS

(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 9, December 2010

Extracting Membership Functions Using ACS Method via Multiple Minimum Supports

Ehsan Vejdani Mahmoudi

Masood Niazi Torshiz

Mehrdad Jalali

Islamic Azad University, Mashhad Branch, Young Researchers Club, Mashhad, Iran [email protected]

Department of Computer Engineering, Islamic Azad University - Mashhad Branch, Mashhad, Iran [email protected]

Department of Computer Engineering, Islamic Azad University - Mashhad Branch, Mashhad, Iran [email protected]

Abstract: Ant Colony Systems (ACS) have been successfully applied to a variety of optimization problems in recent years. However, few studies have applied the ACS method to data mining. This paper addresses that gap by proposing an ACS-based algorithm to extract membership functions in fuzzy data mining. The membership functions are encoded into binary bits and then given to the ACS method, which searches for the optimum set of membership functions. This allows a comprehensive exploration of the search space and automates the system, since the proposed model does not require any user-specified minimum support threshold. Our experimental evaluation shows that the approach significantly improves the extracted membership functions.

Keywords: fuzzy data mining; multiple minimum supports; association rule; membership functions; ant colony system.

I. INTRODUCTION

Recently, fuzzy set theory has been used more and more frequently in intelligent systems because of its simplicity and similarity to human reasoning [1]. As to fuzzy data mining, Hong and Kuo proposed a mining approach that integrated fuzzy-set concepts with the Apriori mining algorithm [2].

Ant Colony Optimization (ACO) is a branch of a larger field referred to as Swarm Intelligence (SI). SI is the property of a system whereby the collective behaviors of simple agents interacting locally with their environment cause coherent functional global patterns to emerge [3]. It is the behavioral simulation of social insects such as bees, ants, wasps and termites. Scientists are interested in simulating these insects for many reasons, among them the optimization of systems and learning about self-organization. More specifically, ACO simulates the collective foraging habits of ants: ants venture out for food and bring the food they discover back to the nest. Ants have poor vision and poor communication skills, and a single ant has a poor chance of survival. However, a large group, or swarm, of ants can collectively perform complex tasks with proven effectiveness, such as gathering food, sorting corpses or performing division of labor [4]. ACO is a heuristic approach inspired by the behavior of social insects. Ants deposit chemical trails called "pheromone" on the ground to communicate with others. By following the pheromone, ants can find the shortest path between the source and the destination. Recently, Ant Colony Systems (ACS) have been successfully applied to several difficult NP-hard problems, such as the quadratic assignment problem [5], communication strategies [9], the production sequencing problem [7], the Job Schedule Problem (JSP) [8], the traveling salesman problem [6], [10], and Vehicle Routing Problems (VRP) [11].

Basically, fuzzy mining algorithms first use membership functions to transform each quantitative value into a fuzzy set in linguistic terms and then use a fuzzy mining process to find fuzzy association rules. Because items have their own characteristics, different minimum supports may be specified for different items. Han, Wang, Lu, and Tzvetkov [12] pointed out that setting the minimum support is quite subtle, which can hinder the widespread application of these algorithms. Our own experience of mining transaction databases also tells us that the setting is by no means an easy task. Therefore, we propose a method for computing a minimum support for each item in the database from the item's own features. This approach makes the global search effective and efficient and automates the system, because our model does not require a user-specified minimum support threshold.

Numerical experiments on the proposed algorithm are also performed to show its effectiveness. The remaining parts of the paper are organized as follows. Section II presents an ACS-based mining framework. The proposed algorithm based on this framework is described in Section III. Numerical simulations are shown in Section IV. Conclusions are given in Section V.

II. THE ACS-BASED FUZZY MINING FRAMEWORK

The ACS-based fuzzy mining framework [13] is shown in Fig. 1, where each item has its own membership function set. These membership function sets are fed into the ant colony system to search for the final proper sets. When the termination condition is reached, the best membership function set (the one with the highest fitness value) can be used to mine fuzzy association rules from a database.

The proposed framework modifies the ACS-based framework for fuzzy data mining in [13]. The framework is divided into two phases. The first phase searches for an appropriate set of membership functions for the items using the ACS mining algorithm. The second phase uses the best membership functions found in the first phase for fuzzy data mining.

The ACS algorithm plays an important role in extracting the membership functions. In the past, Parpinelli et al. proposed the AntMiner to discover association rules [14]. They worked on categorical attributes and discrete values, and showed that the ACS algorithm performed well on handling discrete values in a solution space. In this work, we treat the parameters of the membership functions as discrete values and use the ACS algorithm to find them. We transform the extraction of membership functions into a route-search problem, in which a route represents a possible set of membership functions. The artificial ants, that is, the virtual ants used to solve this problem, can then be used to find a nearly optimal solution.

Figure 1. The ACS-based framework for fuzzy data mining

III. THE ACS-BASED FUZZY DATA MINING ALGORITHM

A. Initializations

Because representing the membership functions of all items results in a long code, we encode the membership functions of each item into a binary code. We use the coding algorithm presented in [13]. Furthermore, we use the state transition rule, the pheromone updating rule, the local updating rule and the global updating rule defined in [15].

In this work, each item has a set of isosceles-triangular membership functions. The membership functions stand for linguistic terms such as low, middle and high. Transforming quantitative values into linguistic terms requires a feasible population. Therefore, we need to initialize and update a population during the evolution process. In this work, we use the fitness function proposed by Chen et al. [16] to obtain a good set of membership functions.

B. ACS-based fuzzy data mining algorithm

Although the algorithms considered in [15] and [16] use one constant minimum support for all items, we apply a separately determined minimum support to each item. In real-world applications, such as transactional data from chain stores, items are sold in different quantities. Hence, using a different minimum support for each item when extracting membership functions is an efficient idea. Whereas in previous approaches the user specified the minimum supports, in the new approach the minimum supports are obtained by preprocessing all items. In other words, the minimum support for each item is set automatically to a value that corresponds to the quantity of the item. We consider a method for computing the minimum support of each item from its characteristics in the database. Significant criteria for computing the minimum support include the number of times each item occurs in the database and the sum of its values in the database. For example, suppose item A occurs 10 times in the database with a sum of values of 20, and item B occurs 2 times with a sum of values of 20. Clearly, item A is more valuable than item B in the mining process, so we compute the minimum support for item B such that this item cannot satisfy it. Accordingly, we suggest the minimum support in (1):

$$\mathrm{min\_Sup}(I_i) = \cdots \ast \cdots \ast \cdots \qquad (1)$$

Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of items and $D = \{t_1, t_2, \ldots, t_n\}$ be a set of transactions. $N$ is the total number of transactions. $T$ is the number of times an item occurs in the database. $S_i$ is the sum of the values of an item in database $D$. $P$ is a constant in the interval [0, 1]. In addition, we use the parameters defined in [13]: the number of artificial ants, the minimum pheromone ratio of an ant, the evaporation ratio of pheromone, the local updating ratio, and the global updating ratio. The proposed ACS-based algorithm for mining membership functions and fuzzy association rules is given as follows.

INPUT:
a) quantitative transaction data,
b) a set of $m$ items, each with $l$ predefined linguistic terms,
c) a maximum number of iterations $G$,
d) a constant $P$ in the interval [0, 1].

OUTPUT: An appropriate set of membership functions for all items in fuzzy data mining.

step 1) Let $p = 1$, where $p$ is used to keep the identity number of the item to be processed.
step 2) Build the multi-stage graph for the fuzzy mining problem, consisting of a set of nodes and a set of edges. Each node is identified by its stage and its position within that stage, and each edge connects a node in one stage to a node in the next stage. Initially set the pheromone on every edge to 0.5.
step 3) Let the initial generation $g = 1$.
step 4) Set up the complete route for each artificial ant by the following substeps.
a) Select the edges from start to end according to the state transition rule.
b) Update the pheromone of the edges passed through by the ant according to the local updating rule.

step 5) Evaluate the fitness value of the solution (membership functions) obtained by each artificial ant according to the following substeps.
a) For each transaction datum $D^{(i)}$, $i = 1$ to $n$, transfer its quantitative value $v_p^{(i)}$ for item $I_p$ into a fuzzy set $f_p^{(i)}$ according to the membership functions obtained from the ant, as in (2). That is, $v_p^{(i)}$ is represented as:

$$\left( \frac{f_{p1}^{(i)}}{R_{p1}} + \frac{f_{p2}^{(i)}}{R_{p2}} + \cdots + \frac{f_{pk}^{(i)}}{R_{pk}} + \cdots + \frac{f_{pl}^{(i)}}{R_{pl}} \right) \qquad (2)$$

where $R_{pk}$ is the $k$-th fuzzy region (linguistic term) of item $I_p$, $f_{pk}^{(i)}$ is the fuzzy membership value of $v_p^{(i)}$ in region $R_{pk}$, and $l$ is the number of fuzzy membership functions.
b) The scalar cardinality of each region $R_{pk}$ in the transactions is calculated as in (3):

$$count_{pk} = \sum_{i=1}^{n} f_{pk}^{(i)} \qquad (3)$$

where $f_{pk}^{(i)}$ is the fuzzy membership value of region $R_{pk}$ from the $i$-th datum.
c) Check for each $R_{pk}$ whether its $count_{pk} / n$ is larger than or equal to the minimum support threshold. If $R_{pk}$ satisfies this condition, put it in the set of large 1-itemsets ($L_1$).
d) Calculate the fitness value of the solution from the ant by dividing the number of large 1-itemsets in $L_1$ by the suitability, as in (4):

$$fitness = \frac{|L_1|}{suitability} \qquad (4)$$

step 6) Once all the artificial ants have found their entire routes, the one holding the highest fitness value is used to update the pheromone according to the global updating rule.
step 7) If the generation $g$ is equal to $G$, output the current best set of membership functions of item $I_p$ for fuzzy data mining; otherwise, set $g = g + 1$ and go to step 4.
step 8) If $p \neq m$, set $p = p + 1$ and go to step 2 for another item; otherwise, stop the algorithm.

The final set of membership functions output in step 7 and the large 1-itemsets obtained are then used to mine fuzzy association rules from the given database.

IV. NUMERICAL SIMULATION

We experimentally evaluated the performance of the proposed algorithm. The experiments were implemented in C/C++ on a computer with an Intel Core(TM) 2 Duo 2.66 GHz processor and 4 GB of main memory, running the Microsoft Windows 7 operating system. We used two datasets. The first is the dataset from [13], with a total of 64 items and 10,000 transactions. The second is a real dataset called FOODMART from an anonymous chain store [17]. The FOODMART dataset contains quantitative transactions about the products sold in the chain store; 21,556 transactions with 1600 items were used in the experiments. The initial count of ants was set at 10. The parameters of the ACS algorithm were set as follows: the initial ratio of pheromone was 0.05, the minimum pheromone of ants was 0.2, the evaporation ratio was 0.9, the local updating ratio was 0.1 and the global updating ratio was 0.9. The minimum support was set to 0.0015 for the FOODMART dataset and to 0.04 for dataset [13]. The constant $P$ in (1) was set to 0.05 for the FOODMART dataset and to 0.02 for dataset [13].

The average fitness values of the artificial ants along with different numbers of generations for the two datasets are shown in Fig. 2 and Fig. 3.

Figure 2. The average fitness values along with different numbers of generations with dataset [13] [x-axis: Generations; y-axis: Average fitness values; series: ACS with constant minimum support, ACS with multiple minimum supports]

It can be seen from Fig. 2 and Fig. 3 that in our approach the average fitness values increased by an offset compared with the previous one and became stable within a smaller number of generations. In addition, we used smaller numbers of generations in Fig. 4 to compare our model with the existing one, which uses a constant minimum support. Our model achieved its best fitness at 300 generations, whereas the existing one reached its best fitness at 500 generations.
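The fitness evaluation in step 5 of the algorithm in Section III can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: each isosceles triangle is described by a hypothetical `(center, half_span)` pair, the suitability factor of [16] is passed in as a plain number because its formula is not spelled out here, and all function names are chosen for this example.

```python
from typing import List, Sequence, Tuple

# Assumption: an isosceles-triangular region is given as (center, half_span).
Triangle = Tuple[float, float]


def membership(value: float, tri: Triangle) -> float:
    """Membership degree of a quantitative value in one triangular region."""
    center, half_span = tri
    if half_span <= 0:
        return 0.0
    return max(0.0, 1.0 - abs(value - center) / half_span)


def scalar_cardinalities(values: Sequence[float],
                         regions: Sequence[Triangle]) -> List[float]:
    """Steps 5(a)-(b): fuzzify every value and sum the memberships per region."""
    return [sum(membership(v, tri) for v in values) for tri in regions]


def large_regions(values: Sequence[float], regions: Sequence[Triangle],
                  min_sup: float) -> List[int]:
    """Step 5(c): indices of regions whose count / n reaches the minimum support."""
    n = len(values)
    counts = scalar_cardinalities(values, regions)
    return [k for k, count in enumerate(counts) if count / n >= min_sup]


def fitness(values: Sequence[float], regions: Sequence[Triangle],
            min_sup: float, suitability: float) -> float:
    """Step 5(d): number of large 1-itemsets divided by the suitability."""
    return len(large_regions(values, regions, min_sup)) / suitability


# One item, five transactions, three regions (low, middle, high).
item_values = [2, 4, 6, 8, 10]
item_regions = [(3, 3), (6, 3), (9, 3)]
print(large_regions(item_values, item_regions, min_sup=0.3))  # -> [1]
```

In this toy example only the middle region reaches the threshold, so the fitness is 1 divided by whatever suitability value is supplied. With multiple minimum supports, `min_sup` would be the per-item value obtained from (1) rather than one global constant.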
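The route construction and pheromone updates in steps 2, 4 and 6 of the algorithm in Section III rely on rules defined in [15], which are not reproduced in this paper. As a hedged illustration, the sketch below uses the standard ACS rules of Dorigo and Gambardella [10] instead. The parameter names `q0`, `rho` and `alpha`, the use of the best fitness as the deposited amount, and the two-nodes-per-stage (binary) graph are assumptions made for this example only.

```python
import random
from typing import Dict, List, Tuple

# Assumption: an edge is (stage, bit left behind, bit entered); the start node is bit 0.
Edge = Tuple[int, int, int]


def init_pheromone(stages: int, tau0: float = 0.5) -> Dict[Edge, float]:
    """Step 2: every edge of the multi-stage graph starts with the same pheromone."""
    return {(s, a, b): tau0 for s in range(stages) for a in (0, 1) for b in (0, 1)}


def choose_bit(tau: Dict[Edge, float], stage: int, prev: int,
               q0: float, rng: random.Random) -> int:
    """State transition: exploit the strongest edge with probability q0,
    otherwise sample an edge in proportion to its pheromone."""
    weights = [tau[(stage, prev, 0)], tau[(stage, prev, 1)]]
    if rng.random() < q0:
        return 0 if weights[0] >= weights[1] else 1
    return rng.choices((0, 1), weights=weights, k=1)[0]


def build_route(tau: Dict[Edge, float], stages: int, q0: float, rho: float,
                tau0: float, rng: random.Random) -> List[int]:
    """Step 4: pick one bit per stage and apply the local update on each edge used."""
    route, prev = [], 0
    for s in range(stages):
        bit = choose_bit(tau, s, prev, q0, rng)
        edge = (s, prev, bit)
        tau[edge] = (1.0 - rho) * tau[edge] + rho * tau0  # local updating rule
        route.append(bit)
        prev = bit
    return route


def global_update(tau: Dict[Edge, float], best_route: List[int],
                  best_fitness: float, alpha: float) -> None:
    """Step 6: reinforce only the edges on the route of the best ant."""
    prev = 0
    for s, bit in enumerate(best_route):
        edge = (s, prev, bit)
        tau[edge] = (1.0 - alpha) * tau[edge] + alpha * best_fitness
        prev = bit
```

A route produced by `build_route` is a bit string; decoding it into membership function parameters would follow the coding scheme of [13], which is outside this sketch.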


Figure 3. The average fitness values along with different numbers of generations with FOODMART dataset [x-axis: Generations; y-axis: Average fitness values; series: ACS with constant minimum support, ACS with multiple minimum supports]

Figure 4. The average fitness values along with different numbers of generations with dataset [13] (in smaller scale) [x-axis: Generations; y-axis: Average fitness values; series: ACS with constant minimum support, ACS with multiple minimum supports]

Figure 5. The average fitness values along with different numbers of generations with FOODMART dataset (in smaller scale) [x-axis: Generations; y-axis: Average fitness values; series: ACS with constant minimum support, ACS with multiple minimum supports]

As shown in Fig. 5, the ACS algorithm with multiple minimum supports performs much better on the FOODMART dataset than the ACS algorithm with a constant minimum support, since it reaches higher average fitness values. The FOODMART dataset contains a very large number of items, which makes it difficult for the artificial ants to optimize the membership functions. Nevertheless, the ACS algorithm with multiple minimum supports handled this case and extracted membership functions with high average fitness values.

The numbers of large 1-itemsets along with different generations are shown in Fig. 6. The curve of the existing method stabilized after about three thousand generations, while the curve of our approach remained constant after one thousand generations. Besides, the number of large 1-itemsets of our approach is clearly much higher.

Figure 6. The numbers of large 1-itemsets along with different numbers of generations with dataset [13] [x-axis: Generations; y-axis: Number of large 1-itemsets; series: ACS with constant minimum support, ACS with multiple minimum supports]

Fig. 7 illustrates the numbers of large 1-itemsets along with different generations for the FOODMART dataset. The proposed ACS algorithm increased the number of large 1-itemsets between 50 and 500 generations and stabilized after about 500 generations. The existing method showed no change as the number of generations increased, since the existing algorithm cannot cope with the FOODMART dataset, which has a large number of items.


Figure 7. The numbers of large 1-itemsets along with different numbers of generations with FOODMART dataset [x-axis: Generations; y-axis: Number of large 1-itemsets; series: ACS with constant minimum support, ACS with multiple minimum supports]

Figure 9. The execution time of the ACS mining algorithm with FOODMART dataset [x-axis: Generations; y-axis: Time (second); series: ACS with constant minimum support, ACS with multiple minimum supports]

Fig. 8 and Fig. 9 show the execution time of the ACS algorithms for different numbers of generations. The execution time increased with the number of generations in both line graphs. Our approach has the same execution time as the existing one for small numbers of generations and a slightly higher one for large numbers of generations.

Figure 8. The execution time of the ACS mining algorithm with dataset [13] [x-axis: Generations; y-axis: Time (second); series: ACS with constant minimum support, ACS with multiple minimum supports]

Figure 10. The average fitness values along with different size of dataset [13] [x-axis: Size of dataset; y-axis: Average fitness values; series: ACS with constant minimum support, ACS with multiple minimum supports]

In the following study, we assessed the efficiency of the ACS algorithms with a scalability test on the two datasets. The number of generations was fixed at 500 for all runs. The average fitness values of the artificial ants for different sizes of dataset [13] are shown in Fig. 10. As the size of the dataset increases, more accurate membership functions are extracted, because the artificial ants can learn more and find proper solutions, whereas the existing algorithm shows no change as the dataset grows. Fig. 11, obtained on the FOODMART dataset, shows the same behavior as Fig. 10.
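The scalability test described above can be organized as a small harness that reruns the same evaluation on growing portions of a dataset. The sketch below is only an assumption about how such a test might be arranged: it takes leading prefixes of the transaction list (the paper does not say how the subsets were drawn), and `evaluate` is a hypothetical callable standing in for a full ACS run with the number of generations fixed at 500.

```python
from typing import Callable, Dict, Sequence


def scalability_curve(transactions: Sequence,
                      evaluate: Callable[[Sequence], float],
                      fractions: Sequence[float] = (0.2, 0.4, 0.6, 0.8, 1.0)
                      ) -> Dict[float, float]:
    """Run `evaluate` on increasing portions of the dataset.

    Returns a mapping from the dataset fraction to the measured value,
    for example the average fitness or the number of large 1-itemsets.
    """
    results = {}
    for fraction in fractions:
        # Assumption: the subset is a prefix of the transaction list.
        size = max(1, round(fraction * len(transactions)))
        results[fraction] = evaluate(transactions[:size])
    return results
```

Passing a function that returns the execution time instead of the fitness would give curves of the kind plotted against the size of the dataset.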

Figure 11. The average fitness values along with different size of FOODMART dataset [x-axis: Size of dataset; y-axis: Average fitness values; series: ACS with constant minimum support, ACS with multiple minimum supports]


The numbers of large 1-itemsets of the artificial ants for different sizes of the datasets are shown in Fig. 12 and Fig. 13. As the size of the dataset increased, the number of large 1-itemsets increased as well. The existing algorithm (ACS with constant minimum support) started with high values but then remained steady as the dataset grew.

Figure 12. The numbers of large 1-itemsets along with different size of dataset [13] [x-axis: Size of dataset; y-axis: Number of large 1-itemsets; series: ACS with constant minimum support, ACS with multiple minimum supports]

Figure 13. The numbers of large 1-itemsets along with different size of FOODMART dataset [x-axis: Size of dataset; y-axis: Number of large 1-itemsets; series: ACS with constant minimum support, ACS with multiple minimum supports]

Fig. 14 and Fig. 15 show the execution time of the ACS algorithms for different sizes of the datasets. As can be observed, the execution times of both algorithms are nearly equal. This indicates that the proposed algorithm improves efficiency without increasing the execution time, which encourages us to employ it for extracting membership functions.

Figure 14. The execution time of the ACS mining algorithm along with different size of dataset [13] [x-axis: Size of dataset; y-axis: Time (second); series: ACS with constant minimum support, ACS with multiple minimum supports]

Figure 15. The execution time of the ACS mining algorithm along with different size of FOODMART dataset [x-axis: Size of dataset; y-axis: Time (second); series: ACS with constant minimum support, ACS with multiple minimum supports]

V. CONCLUSIONS

In this paper, we examined the application of the ACS algorithm to extracting membership functions for fuzzy data mining and proposed an algorithm for this purpose. The approach delivers two benefits: the use of multiple minimum supports, and system automation. The computational results show that our work can serve as an alternative for effective association rule mining.

The most significant difference between our algorithm and earlier ACS algorithms for extracting membership functions is its independence from a user-specified minimum support threshold. The experimental results of this new approach encourage us to improve the system and apply this strategy in real-world applications.


REFERENCES

[1] A. Kandel, "Fuzzy expert systems," CRC Press, pp. 8-19, 1992.
[2] T. P. Hong, et al., "Trade-off between time complexity and number of rules for fuzzy mining from quantitative data," International Journal of Uncertainty Fuzziness and Knowledge-Based Systems, vol. 9, pp. 587-604, 2001.
[3] E. Bonabeau, et al. (1999). Swarm intelligence: from natural to artificial systems. Available: http://link.library.utoronto.ca/eir/EIRdetail.cfm?Resources__ID=432271&T=F
[4] W. B. Tao, et al., "Object segmentation using ant colony optimization algorithm and fuzzy entropy," Pattern Recognition Letters, vol. 28, pp. 788-796, May 1 2007.
[5] V. Maniezzo and A. Colorni, "The ant system applied to the quadratic assignment problem," IEEE Transactions on Knowledge and Data Engineering, vol. 11, pp. 769-778, Sep-Oct 1999.
[6] M. Dorigo, et al., "Ant system: Optimization by a colony of cooperating agents," IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics, vol. 26, pp. 29-41, Feb 1996.
[7] P. R. McMullen, "An ant colony optimization approach to addressing a JIT sequencing problem with multiple objectives," Artificial Intelligence in Engineering, vol. 15, pp. 309-317, Jul 2001.
[8] A. Colorni, et al., "Ant system for job-shop scheduling," Belgian Journal of Operations Research, Statistics and Computer Science, vol. 34, pp. 39-53, 1994.
[9] S. C. Chu, et al., "Ant colony system with communication strategies," Information Sciences, vol. 167, pp. 63-76, Dec 2 2004.
[10] M. Dorigo and L. M. Gambardella, "Ant colony system: a cooperative learning approach to the traveling salesman problem," IEEE Transactions on Evolutionary Computation, vol. 1, pp. 53-66, 1997.
[11] A. Wade and S. Salhi, "An ant system algorithm for the mixed vehicle routing problem with backhauls," in Metaheuristics: Computer Decision-Making, Norwell, MA: Kluwer, pp. 699-719, 2004.
[12] J. Han, et al., "Mining top-k frequent closed patterns without minimum support," in Proceedings of the 2002 IEEE International Conference on Data Mining, pp. 211-218, 2002.
[13] T. P. Hong, et al., "An ACS-based framework for fuzzy data mining," Expert Systems with Applications, vol. 36, pp. 11844-11852, Nov 2009.
[14] R. S. Parpinelli, et al., "An Ant Colony Based System for Data Mining: Application to Medical Data," The Genetic and Evolutionary Computation Conference, pp. 791-798, 2001.
[15] T. P. Hong, et al., "Extracting membership functions in fuzzy data mining by Ant Colony Systems," Proceedings of 2008 International Conference on Machine Learning and Cybernetics, Vols 1-7, pp. 3979-3984, 2008.
[16] C. H. Chen, et al., "Cluster-based evaluation in fuzzy-genetic data mining," IEEE Transactions on Fuzzy Systems, vol. 16, pp. 249-262, Feb 2008.
[17] Microsoft Corporation, "Example Database FoodMart of Microsoft Analysis Services."
