An Ant Colony Optimization Approach for Stacking Ensemble

Yijun Chen
Department of Computing and Decision Science, Lingnan University, Hong Kong
Email: [email protected]
Abstract—An ensemble in data mining is a strategy that combines a set of different classifiers to generate an integrated classification system for classifying new instances. Early research showed that an ensemble can outperform any of its individual components. Stacking is one of the most influential of the proposed ensemble schemes. Stacking applies a two-level structure: the base-level classifiers output their own predictions, and the meta-level classifier takes these outputs as its input to generate the final decision. Most existing studies focus on the choice of the meta-level classifier, and few address determining the configuration of the base-level classifiers and the meta-level classifier together. This work is inspired by Ant Colony Optimization, which is good at solving combinatorial optimization problems. We propose an ACO-Stacking ensemble approach and perform preliminary experiments to compare it with some well-known ensembles. The preliminary results show that the performance of the new approach is promising.

Keywords—Data Mining; Ensemble; ACO; Stacking; Metaheuristic
Man Leung Wong, IEEE Member
Department of Computing and Decision Science, Lingnan University, Hong Kong
Email: [email protected]

I. INTRODUCTION

Nowadays, data mining is considered a very important technique by academic researchers and industry when facing the flood of information in modern society. Applying data mining techniques can help people discover novel knowledge from massive data repositories. One of the most frequently applied data mining functions is classification, for example, classifying whether a potential customer will respond to a marketing promotion [1]. How to build a more reliable classification system and how to increase classification accuracy are the main questions attracting research effort. An ensemble, which combines several classifiers to construct a more powerful classifier, is considered a possible solution [2] to these problems. Ensembles have been studied and shown to perform more accurately than any of their individual components. Some well-developed ensemble techniques are applied in numerous studies, namely Bagging [3], Boosting [4] and Stacking [5].

In order to generate a well-performing ensemble, two key points should be carefully considered. The first is to inject enough diversity into the components of the ensemble. The second is to select a suitable method for combining the outputs of the components. In Bagging, diversity is achieved by training models of the same classification algorithm on different data subsets drawn by bootstrap random sampling with replacement; Bagging uses majority voting as the combining method. Boosting also achieves diversity by manipulating the training data set; however, Boosting assigns weights to the instances and forces the classifiers to focus on the important instances, which receive greater weights when they were misclassified previously. Boosting applies weighted majority voting to combine the outputs of its components. Stacking is quite different from Bagging and Boosting. Stacking uses a two-layer structure, namely the base-level classifiers and the meta-level classifier. The base-level classifiers are applied to achieve diversity, and the meta-level classifier is applied to combine them. The base-level classifiers can be classifiers of different algorithms trained on the same training data set, or classifiers of the same algorithm trained on different data sets. The outputs of the base-level classifiers are processed by the meta-level classifier, which can be generated by a classification algorithm or a voting scheme. The meta-level classifier takes the outputs of the base-level classifiers as its input to generate the final decision. Though Stacking has proved to be a promising ensemble algorithm, how to configure the base-level classifiers and the meta-level classifier to obtain reliable and good performance is a difficult problem. If the numbers of candidate base-level classifiers and candidate meta-level classifiers are small enough, an exhaustive search could be applied to find the best configuration. However, the reality is not so desirable, so exhaustive search is not a good choice.
Even if an exhaustive search could find the optimal configuration for one application, that configuration may not be optimal for other applications whose domains differ considerably. One solution for finding a good Stacking configuration is an adaptive metaheuristic search, which has a smaller search space and faster convergence than exhaustive search. In this work, an Ant Colony Optimization (ACO) Stacking algorithm is proposed. The ACO algorithm was first introduced by Dorigo and his colleagues in the early 1990s as a nature-inspired method that simulates the foraging behavior of ants to solve combinatorial optimization (CO) problems [6]. Though the prototype of ACO is simple, ACO has proved powerful in several applications. In this paper, we extend the application of ACO to Stacking configuration optimization.

This paper is organized as follows. Section II gives some background on Stacking and ACO. Section III introduces our approach of applying ACO to optimize the Stacking configuration. Section IV presents our experiments comparing our approach with some well-known and recent approaches. Finally, Section V draws some conclusions.
II. STACKING AND ACO

In this section, the basics of Stacking and Ant Colony Optimization (ACO) are introduced in the following subsections.

A. Stacking

Stacking is the abbreviation of Stacked Generalization [5]. The basic idea of Stacking is to combine classifiers from different classification algorithms, such as decision trees, multilayer perceptrons and naive Bayes, to generate a higher-level classification system. As mentioned in the introduction, the diversity of the base-level classifiers is important when generating an ensemble. Each algorithm applies a different hypothesis to generate its classifications, so the errors and bias of one algorithm may differ from those of the others. These differences between classifiers are considered uncorrelated, which achieves the diversity of the base classifiers. Stacking uses the meta-level classifier to map the outputs of the base-level classifiers to the final decision. Once the base-level classifiers are trained, their outputs on each training instance are taken as the independent attributes, and the real class labels of the instances are taken as the dependent attribute. Over all training instances, this new training set is generated to train the meta-level classifier. When the training of the base-level classifiers and the meta-level classifier is finished, a Stacking ensemble is obtained. To classify a new instance, the meta-level classifier takes the predictions of the base-level classifiers as its input and gives its prediction as the decision.

Since Stacking was proposed, the question of how to obtain the right configuration to optimize its performance has been asked, and some research effort has been devoted to this problem. Most work on Stacking focuses on the selection of the meta-level data and the algorithm used to generate the meta-level classifier [11] [12].
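The meta-level data flow described above can be sketched as follows. This is a minimal, self-contained illustration, not the paper's WEKA implementation: the threshold "classifiers" and the majority-vote meta rule are hypothetical stand-ins for trained learners such as C4.5 or naive Bayes.

```python
import random

random.seed(0)

# Toy training data: each instance is (feature vector, class label).
train = [((random.random(), random.random()), random.randint(0, 1))
         for _ in range(20)]

# Stand-ins for base-level classifiers of different algorithms
# (hypothetical threshold rules, purely for illustration).
base_classifiers = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.0),
]

# Meta-level training set: the base classifiers' outputs on each training
# instance become the independent attributes; the true label stays the
# dependent attribute.
meta_data = [([clf(x) for clf in base_classifiers], label)
             for x, label in train]

# A trivial stand-in for the learned meta-level classifier (the paper
# trains a C4.5 tree on meta_data); here: majority vote over base outputs.
def meta_classify(base_outputs):
    return int(sum(base_outputs) > len(base_outputs) / 2)

# Classifying a new instance: base-level predictions feed the meta level.
new_x = (0.7, 0.2)
decision = meta_classify([clf(new_x) for clf in base_classifiers])
```

The point of the sketch is the two-level data flow: base outputs become meta inputs, both at training and at prediction time.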
DEA (Data Envelopment Analysis), a non-parametric method from operations research and economics, has also been brought into finding the configuration of Stacking [21]. Metaheuristic approaches that are good at solving combinatorial optimization problems, such as Genetic Algorithms, have also been applied to configure Stacking [23] [22]. In this work, we use another well-known metaheuristic approach, Ant Colony Optimization (ACO), to optimize the configuration of Stacking. To the best of our knowledge, this is the first work applying ACO to configure Stacking. In the next subsection, the background of ACO is introduced.

B. Ant Colony Optimization

ACO is an appealing metaheuristic inspired by the foraging behavior of real ants. Dorigo and his colleagues proposed the idea in the early 1990s [6]. Though the idea of ACO is simple, it has been widely applied to combinatorial optimization problems, such as the Traveling Salesman Problem (TSP) [6] [7], graph coloring [8], data mining [24] and others. The idea of ACO is based on the collective behavior of real ant colonies, which are able to find the shortest path from their nest to a food source. Each ant has only limited intelligence for finding the best or shortest path; however, it can communicate indirectly with other ants. When an ant walks, it deposits a chemical substance called pheromone on the ground. The ants can smell the pheromone and use it to choose their way. The ants choose paths in a probabilistic manner: paths with strong pheromone concentrations are selected with larger probabilities. If pheromone is absent, an ant chooses a path randomly. After a period, the shorter path is chosen more frequently, which means more ants walk on it and pheromone accumulates faster. On paths the ants do not choose, the pheromone evaporates. The accumulation of pheromone is a positive feedback that encourages the ants to choose the shortest path.
However, some ants may be "stupid" enough to select paths with less pheromone; this behavior is very important for the colony to escape a locally shortest path and find another way towards the globally shortest path. If the new path is shorter than the current one, pheromone accumulates on it and attracts more ants, and the optimal path changes to this one. In conclusion, although the ability of each ant is limited, the shortest path can be found through the collective behavior of ants using indirect communication.

The first ACO algorithm, called Ant System, was proposed to solve TSP problems, but its appealing performance led researchers to keep developing it and applying it to other problems. In data mining applications specifically, many approaches have been proposed. Parpinelli et al. proposed Ant-Miner (Ant Colony-based Data Miner) to extract classification rules from data [24]. Al-Ani applied ACO to the feature selection task [25]. Abraham and Ramos proposed an artificial ant colony clustering and linear genetic programming approach for web usage mining [26]. In this work, we apply ACO to search for the optimal configuration of a Stacking ensemble. The details of the algorithm are introduced in the next section.

III. ACO-STACKING

In an ACO-Stacking construction task, given a set C of m candidate base-level classifiers, the ACO search process tries to find a subset S of n classifiers (n ≤ m, S ⊆ C) that maximizes the classification accuracy. The notation used in the approach is as follows:
• C = {c1, ..., cm}: the m classifiers in the base-level candidate set.
• k: the number of artificial ants in the search space.
• µi: the pheromone associated with ci in C.
• Sj: the configuration constructed by the j-th ant, with Sj ⊆ C and Sj ≠ ∅.
• τ: the evaporation rate, τ ∈ (0, 1).
• L: the maximum number of iterations.

At the beginning of ACO-Stacking, a base-level classifier candidate set C is given. In the current approach, the meta-level classifier is set to a C4.5 decision tree. Like other ACO approaches, ACO-Stacking executes iteratively. At the start, the pheromone µi of each base-level classifier ci is initialized to a small positive number. In each iteration, each ant generates its best configuration and updates the pheromones of the classifiers it selected. When the j-th ant begins searching for its configuration by adding candidates to it, Sj is initialized as the empty set ∅, and the ant's current best configuration S′ is also initialized as ∅. A candidate classifier not yet in the current configuration Sj is selected based on the pheromone probability distribution, using the roulette wheel selection technique.
The probability of selecting a candidate ci is given by pi as defined in Equation 1:

p_i = \begin{cases} \dfrac{\mu_i}{\sum_{t=1}^{m-r} \mu_t} & \text{if } c_i \notin S_j \\ 0 & \text{otherwise} \end{cases} \quad (1)

where r is the number of classifiers already included in the configuration Sj. A new configuration S′j = Sj ∪ {ci} is then generated. If Sj is empty, Sj and S′ are replaced by S′j, and the ant's search continues. If Sj is not empty, S′j is trained and verified on the training set, and its accuracy is used as the criterion to compare the new solution S′j with the current best solution S′. If the classification accuracy of S′j is poorer than that of S′, the j-th ant stops its search. If the new configuration is superior, S′j replaces both Sj and S′, and the j-th ant goes on to select the next candidate classifier to add to the new Sj until no better configuration can be found.
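A minimal sketch of the roulette wheel selection defined by Equation 1 (the pheromone values and classifier indices are illustrative):

```python
import random

def select_candidate(pheromones, selected, rng=random):
    """Roulette wheel selection of a classifier not yet in the configuration.

    `pheromones` maps classifier index -> pheromone mu_i; `selected` holds
    the r indices already in S_j.  A remaining candidate i is drawn with
    probability p_i = mu_i / sum(mu_t over the m - r remaining candidates),
    as in Eq. 1; already-selected candidates have probability 0.
    """
    remaining = [i for i in pheromones if i not in selected]
    pick = rng.uniform(0.0, sum(pheromones[i] for i in remaining))
    running = 0.0
    for i in remaining:
        running += pheromones[i]
        if pick <= running:
            return i
    return remaining[-1]  # guard against floating-point round-off

# Hypothetical pheromone values: classifier 2 has the strongest trail, so
# it is the most likely pick among the unselected candidates {1, 2, 3}.
mu = {0: 0.01, 1: 0.05, 2: 0.30, 3: 0.01}
choice = select_candidate(mu, selected={0})
```

Stronger trails are picked more often, but weaker candidates keep a non-zero chance, which is what lets the search escape local optima.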
1) For i from 1 to m, initialize the pheromone µi of ci in C; initialize L to 0
2) While the maximum number of iterations is not reached:
   a) For j from 1 to k, the j-th ant begins its search:
      • Initialize its configuration Sj = ∅
      • Initialize the current best configuration S′ = ∅
      • Set the flag for adding a new classifier to true
      • While the flag is true:
         – Use the roulette wheel technique to select a ci ∉ Sj and generate a new configuration S′j = Sj ∪ {ci}
         – IF the current configuration Sj = ∅, THEN set S′ = S′j, Sj = S′j
         – ELSE:
            ∗ Train and verify S′j on the training set
            ∗ Compare the accuracy of S′j with that of S′
               · IF S′j is superior, THEN update the pheromone µi of ci and set S′ = S′j, Sj = S′j
               · ELSE, set the flag for adding a new classifier to false
   b) Evaporation occurs when the iteration finishes
   c) L = L + 1
3) Use the same search process of an ant to generate the final Stacking configuration

Figure 1. The ACO-Stacking algorithm.
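As an illustrative sketch, the search of Figure 1 might be implemented as follows. The `evaluate` function is a hypothetical stand-in for training and verifying a Stacking configuration, and the pheromone-deposit factor is one plausible reading of the update rule; this is not the authors' implementation.

```python
import random

def evaluate(config):
    """Placeholder for training a Stacking with base-level set `config` and
    verifying it on the training data; returns a synthetic accuracy here."""
    useful = {1, 3}  # hypothetical informative classifiers
    return 0.5 + 0.2 * len(useful & config) - 0.05 * len(config - useful)

def roulette(mu, selected, rng):
    # Eq. 1: draw an unselected candidate in proportion to its pheromone.
    remaining = [i for i in mu if i not in selected]
    pick = rng.uniform(0.0, sum(mu[i] for i in remaining))
    running = 0.0
    for i in remaining:
        running += mu[i]
        if pick <= running:
            return i
    return remaining[-1]

def aco_stacking(m=5, k=10, L=3, tau=0.1, seed=0):
    rng = random.Random(seed)
    mu = {i: 0.01 for i in range(m)}  # pheromone initialization

    def ant_search():
        s_j, best = set(), None
        while len(s_j) < m:
            c = roulette(mu, s_j, rng)
            acc = evaluate(s_j | {c})
            if best is None:            # first classifier is always kept
                s_j, best = s_j | {c}, acc
            elif acc > best:            # superior: keep and deposit pheromone
                mu[c] *= 1.0 + (acc - best) / best  # one reading of Eq. 2
                s_j, best = s_j | {c}, acc
            else:                       # no improvement: the ant stops
                break
        return s_j

    for _ in range(L):                  # iterations
        for _ in range(k):              # each of the k ants searches
            ant_search()
        for i in mu:                    # evaporation at iteration end (Eq. 3)
            mu[i] *= 1.0 - tau

    return ant_search()                 # final greedy construction (step 3)

config = aco_stacking()
```

Each ant grows its configuration greedily and stops at the first non-improving addition, while shared pheromone biases later ants towards classifiers that improved accuracy earlier.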
When the j-th ant stops its search, the next ant begins searching for its configuration, until all k ants have completed their search in the iteration. During the ants' search, once a candidate ci is chosen by an ant for its configuration, the pheromone µi accumulates, increasing the probability that other ants select this candidate. Equation 2 shows the pheromone update rule:

\mu_i = \mu_i \cdot \Delta(\alpha_{S_j}, \alpha_{S'}) \quad (2)

where αS refers to the classification accuracy achieved by configuration S and ∆(αSj, αS′) is the percentage of improvement in the accuracy of Sj. The pheromone is retained across all iterations. Meanwhile, the pheromones evaporate over time. Equation 3 gives the evaporation rule; in our setting, evaporation occurs when an iteration ends:

\mu_i = (1 - \tau) \cdot \mu_i \quad (3)
In this strategy, the pheromones of the "good" candidates accumulate and the pheromones of the "poor" candidates vanish. After the search process completes, the final optimal configuration is generated. The pseudocode of ACO-Stacking is given in Figure 1.

IV. EXPERIMENT

The experiments were conducted on an open-source data mining platform, WEKA (Waikato Environment for Knowledge Analysis) [9]. This well-developed platform includes many data mining algorithms and experiment facilities.
A. Datasets

In this work, 12 datasets from different domains in the UCI machine learning repository [10] were tested. These datasets have been widely used in other relevant works. The details of the datasets are summarized in Table I.

B. Candidate Algorithms

Ten classification algorithms in WEKA are selected as the candidates for the base-level classifiers. Since the approach presented here is preliminary, the meta-level classifier is predefined as a C4.5 decision tree. The algorithms are introduced below.
• Naive Bayes [13]. Naive Bayes uses a naive probabilistic estimator to classify instances.
• Logistic-based classifier [20]. Classification is made by building and using a multinomial logistic regression model with a ridge estimator.
• IB1 [18]. IB1 is an instance-based nearest-neighbour classifier using normalized Euclidean distance. If more than one instance has the same smallest distance, the first one found is used.
• IBk. IBk is similar to IB1 but uses the k nearest neighbours instead of one. Here, the five nearest neighbours are used.
• KStar [19]. KStar is an instance-based classifier. The class label of a test instance is decided by entropy-based functions.
• OneR [15]. It uses the minimum-error attribute for prediction.
• PART [14]. PART builds a partial C4.5 decision tree in each iteration and turns the "best" leaf into a classification rule using the separate-and-conquer strategy.
• ZeroR. It uses a 0-R classifier for prediction.
• Decision Stump [16]. Decision Stump generates a one-level decision tree to classify instances.
• C4.5 Decision Tree [17]. An advanced version of the decision tree algorithm.

C. Other Approaches

The proposed ACO-Stacking approach is compared with the following well-known approaches: Bagging, AdaBoost (a variant of Boosting) [27], StackingC and Random Forest [28]. Bagging and AdaBoost are two widely used ensemble approaches. StackingC is a derivative of the original Stacking approach that is more efficient. Random Forest serves as a benchmark classifier.
• Bagging: Bagging generates several classifiers using the same classification algorithm but with different data subsets drawn by random sampling with replacement. Majority voting is used to generate the final decision from the outputs of the classifiers. We compared two Bagging approaches, one with the C4.5 algorithm and another with REP Tree as the classification algorithm.
• AdaBoost: AdaBoost generates a series of classifiers using the same classification algorithm but with different weights associated with the data instances, so that the more informative instances are emphasized. The final decision is generated by weighted majority voting over the outputs of the classifiers. We also compared two AdaBoost approaches, one with C4.5 and another with Decision Stump.
• StackingC: StackingC is a more efficient version of Stacking proposed in [12] and [11]. The meta-level classifier of StackingC is a Multi-Response Model Tree (MMT). The base-level classifiers of StackingC are a C4.5 decision tree, naive Bayes and KStar, as used in [11].
• Random Forest: Random Forest is constructed by combining a large number of unpruned decision trees with different feature subsets.

D. Experiment Setting

In the preliminary experiment, we use the following settings:
• L = 3
• τ = 0.1
• k = 10
The pheromone is initialized to 0.01. The final configuration generated by ACO-Stacking is used to construct a Stacking ensemble. To compare with the other approaches, ten-fold cross-validation was applied in the WEKA environment.
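A generic sketch of the ten-fold cross-validation protocol used in the comparison (the fold construction and the scorer are illustrative placeholders, not WEKA's implementation):

```python
import random

def ten_fold_indices(n, seed=0):
    """Split instance indices 0..n-1 into 10 disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::10] for f in range(10)]

def cross_validate(n, train_and_score, seed=0):
    """Average score over 10 folds.  `train_and_score(train_idx, test_idx)`
    is a placeholder for building the ensemble on the training fold and
    returning its accuracy on the held-out fold."""
    folds = ten_fold_indices(n, seed)
    scores = []
    for f in range(10):
        test = set(folds[f])
        train = [i for i in range(n) if i not in test]
        scores.append(train_and_score(train, sorted(test)))
    return sum(scores) / len(scores)

# Example with a dummy scorer that just reports the held-out fraction.
acc = cross_validate(100, lambda tr, te: len(te) / 100)
```

Every instance is held out exactly once, so the averaged accuracy uses all of the data for testing while never testing an ensemble on its own training instances.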
Table I
DATASET DESCRIPTION

Dataset     Attributes  Instances  Classes  Missing Values?
Breast-w    11          699        2        no
Chess       37          3196       2        no
Colic       27          368        2        yes
Credit-A    15          690        2        no
Credit-G    21          1000       2        no
Diabetes    9           768        2        no
Heart-C     14          303        2        no
Hepatitis   20          155        2        no
Ionosphere  35          351        2        no
Labor       17          57         2        yes
Sonar       61          208        2        no
Vote        17          435        2        yes

E. Experiment Result

To evaluate the performance of the approaches, two tests are used. The w/t/l test evaluates the performance of ACO-Stacking against another approach: an entry w/t/l means that ACO-Stacking wins in w data sets, ties in t data sets and loses in l data sets compared with the corresponding approach. The Percent Improvement (PI) test measures the cumulative percentage of improvement that ACO-Stacking achieves over the 12 datasets compared with the corresponding approach. The formula of PI is:

p = \sum_{i} \frac{\alpha_i - \alpha'_i}{\alpha_i}

where αi is the accuracy of ACO-Stacking on the i-th data set and α′i is the accuracy of the compared approach.

The average accuracies from the ten-fold cross-validation are given in Table II. Taking the average accuracy on each data set as the criterion, ACO-Stacking is the best in four of the twelve datasets, StackingC wins in five of the datasets and AdaBoost with C4.5 wins in two. The w/t/l results against the other approaches are given in Table III. ACO-Stacking gets 8/0/4 against Bagging(REP), which means ACO-Stacking wins in 8 of the 12 data sets, ties in 0 and loses in 4 compared with Bagging with REP tree. ACO-Stacking gets 8/1/3 against Bagging(C4.5), 9/1/2 against AdaBoost(DS), 7/1/4 against AdaBoost(C4.5), 9/1/2 against Random Forest and 6/0/6 against StackingC. The PIs of ACO-Stacking are all positive, indicating that ACO-Stacking does improve on the other approaches: 30.32% against Bagging(REP), 37.2% against Bagging(C4.5), 31.82% against AdaBoost(DS), 33.05% against AdaBoost(C4.5), 28.73% against Random Forest and 17.66% against StackingC.

Besides the empirical tests, some statistical tests were also conducted; however, no statistical significance was achieved. Thus, judging only from the empirical tests, ACO-Stacking is superior to Bagging with REP tree, Bagging with C4.5, AdaBoost with Decision Stump, AdaBoost with C4.5 and Random Forest, but it is not significantly superior to StackingC.

Table II
THE AVERAGE ACCURACY OF THE ENSEMBLES IN TEN-FOLD CROSS-VALIDATION

DataSet     Bagging(REP)  Bagging(C4.5)  AdaBoost(DS)  AdaBoost(C4.5)  Random Forest  StackingC  ACO-Stacking
Breast-W    95.708        95.136         94.993        96.424          95.994         97.282     96.996
Chess       99.124        99.437         93.836        99.499          98.905         99.437     99.343
Colic       70.109        67.935         82.609        70.924          71.467         64.13      82.88
Credit-A    86.232        86.377         84.348        84.348          84.348         86.812     84.348
Credit-G    74.4          74             69.5          69.6            74.1           74.7       74.8
Diabetes    74.609        74.089         74.349        72.396          72.526         76.5625    74.479
Heart-C     78.878        78.878         83.498        76.898          79.208         84.158     81.188
Hepatitis   83.871        83.226         82.581        85.807          80.645         81.936     83.226
Ionosphere  90.883        93.447         90.883        93.162          93.447         90.883     92.023
Labor       85.965        84.211         87.719        89.474          87.719         89.474     91.228
Sonar       77.404        74.519         71.645        77.885          80.769         81.731     82.692
Vote        95.862        96.322         95.402        95.862          95.862         96.782     95.632
PI          30.32%        37.2%          31.82%        33.05%          28.73%         17.66%     -

Table III
THE W/T/L TEST ON ACO-STACKING AND OTHER APPROACHES

        Bagging(REP)  Bagging(C4.5)  AdaBoost(DS)  AdaBoost(C4.5)  Random Forest  StackingC
w/t/l   8/0/4         8/1/3          9/1/2         7/1/4           9/1/2          6/0/6

V. CONCLUSION AND FUTURE WORK

In this work, a preliminary version of ACO-Stacking, a novel approach that searches for good configurations of Stacking ensembles for specific data sets by means of Ant Colony Optimization, is proposed. In this preliminary version, the metaheuristic ACO is used to optimize the configuration of the base-level classifiers in the Stacking ensemble with a predefined meta-level classifier. Compared with the other well-known approaches in the empirical experiments, ACO-Stacking is superior. Though the performance of the preliminary approach is promising, it needs further development and refinement. The settings, such as the pheromone update rule, the evaporation rate and the numbers of ants and iterations, need further exploration. It would be a good enhancement if the approach could choose the meta-level classifier to adapt to different configurations of base-level classifiers. In the experiments, some Stackings generated by the ACO-Stacking approach contain most of the candidate classifiers, which costs more training time while helping little to improve accuracy; it would be helpful to add a restriction on the maximum number of base-level classifiers when generating a Stacking. Though accuracy is very important for evaluating a classification system, other criteria are needed in some domains. For example, in cost-sensitive problems, cost-related criteria are more important than accuracy alone. In the future, cost-sensitive criteria could be added to enhance the ability of ACO-Stacking on cost-sensitive problems.

ACKNOWLEDGMENT

This research is funded by XXX. The authors would also like to thank...

REFERENCES

[1] Cui G., Wong M. L., and Lui H. K., Machine Learning for Direct Marketing Response Models: Bayesian Networks with Evolutionary Programming, Management Science, 52, pp.597-612, 2006.
[2] Polikar R., Ensemble Based Systems in Decision Making, IEEE Circuits and Systems Magazine, 6, pp.21-45, 2006.
[3] Breiman L., Bagging Predictors, Machine Learning, 24, pp.123-140, 1996.
[4] Schapire R., The Strength of Weak Learnability, Machine Learning, 5(2), pp.197-227, 1990.
[5] Wolpert D., Stacked Generalization, Neural Networks, 5, pp.241-259, 1992.
[6] Dorigo M., and Stützle T., Ant Colony Optimization, MIT Press, 2004.
[7] Dorigo M., Optimization, Learning and Natural Algorithms, Ph.D. Thesis, Politecnico di Milano, Italy, 1992.
[8] Costa D., and Hertz A., Ants Can Colour Graphs, Journal of the Operational Research Society, 48, pp.295-305, 1997.
[9] Witten I., and Frank E., Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 2000.
[10] Blake C., and Merz C., UCI Repository of Machine Learning Databases. http://www.uci.edu/ mlearn/MLRepository.html
[11] Džeroski S., and Ženko B., Stacking with Multi-Response Model Trees, In Proceedings of the International Workshop on Multiple Classifier Systems, 2364, pp.201-211, June 2002.
[12] Seewald A. K., How to Make Stacking Better and Faster While Also Taking Care of Unknown Weakness, In Proceedings of the Nineteenth International Conference on Machine Learning, pp.554-561, 2002.
[13] John G. H., and Langley P., Estimating Continuous Distributions in Bayesian Classifiers, In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp.338-345, 1995.
[14] Frank E., and Witten I. H., Generating Accurate Rule Sets Without Global Optimization, In Proceedings of the Fifteenth International Conference on Machine Learning, pp.144-151, 1998.
[15] Holte R. C., Very Simple Classification Rules Perform Well on Most Commonly Used Datasets, Machine Learning, 11(1), pp.63-91, 1993.
[16] Iba W., and Langley P., Induction of One-Level Decision Trees, In Proceedings of the Ninth International Conference on Machine Learning, pp.233-240, 1992.
[17] Quinlan R., C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, CA, 1993.
[18] Aha D., Kibler D., and Albert M. K., Instance-based Learning Algorithms, Machine Learning, 6(1), pp.37-66, 1991.
[19] Cleary J. G., and Trigg L. E., K*: An Instance-based Learner Using an Entropic Distance Measure, In Proceedings of the Twelfth International Conference on Machine Learning, pp.108-114, 1995.
[20] le Cessie S., and van Houwelingen J. C., Ridge Estimators in Logistic Regression, Applied Statistics, 41(1), pp.191-201, 1992.
[21] Zhu D., A Hybrid Approach for Efficient Ensembles, Decision Support Systems, 48(3), pp.480-487, 2010.
[22] Ordóñez F. J., Ledezma A., and Sanchis A., Genetic Approach for Optimizing Ensembles of Classifiers, In Proceedings of the Twenty-First International FLAIRS Conference, pp.89-94, 2008.
[23] Ledezma A., Aler R., and Sanchis A., Heuristic Search-Based Stacking of Classifiers, Heuristic and Optimization for Knowledge Discovery, pp.54-67, Idea Group Publishing, 2002.
[24] Parpinelli R. S., Lopes H. S., and Freitas A. A., Data Mining with an Ant Colony Optimization Algorithm, IEEE Transactions on Evolutionary Computation, 6(4), pp.321-332, 2002.
[25] Al-Ani A., Feature Subset Selection Using Ant Colony Optimization, International Journal of Computational Intelligence, 2(1), pp.53-58, 2006.
[26] Abraham A., and Ramos V., Web Usage Mining Using Artificial Ant Colony Clustering and Linear Genetic Programming, In Proceedings of the Congress on Evolutionary Computation (CEC2003), Canberra, Australia, pp.1384-1391, 2003.
[27] Freund Y., and Schapire R., A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, Journal of Computer and System Sciences, 55(1), pp.119-139, 1997.
[28] Breiman L., Random Forests, Machine Learning, 45(1), pp.5-32, 2001.