Handling Continuous Attributes in Ant Colony Classification Algorithms

Fernando E. B. Otero, Alex A. Freitas, and Colin G. Johnson
Abstract— Most real-world classification problems involve continuous (real-valued) attributes as well as nominal (discrete) attributes. The majority of Ant Colony Optimisation (ACO) classification algorithms have the limitation of only being able to cope with nominal attributes directly. Extending the approach for coping with continuous attributes presented by cAnt-Miner (Ant-Miner coping with continuous attributes), in this paper we propose two new methods for handling continuous attributes in ACO classification algorithms. The first method allows a more flexible representation of continuous attributes' intervals. The second method exploits the attribute interaction that originates from the way continuous attributes are handled in cAnt-Miner, in order to implement an improved pheromone updating method. Empirical evaluation on eight publicly available data sets shows that the proposed methods facilitate the discovery of more accurate classification models.
I. INTRODUCTION
THE classification task in data mining aims at predicting the value of a given goal attribute for an example, based on the values of a set of predictor attributes for that example [1]. Since real-world classification problems are generally described by both nominal (discrete) and continuous (real-valued) attributes, classification algorithms need to be able to cope with both types of attribute in order to build a classification model. However, most Ant Colony Optimisation (ACO) [2] classification algorithms have the limitation of being able to cope only with nominal attributes. Continuous attributes, if present, need to be transformed into nominal attributes by creating discrete intervals in a preprocessing step. There are two potential drawbacks of not coping with continuous attributes directly. Firstly, a discretisation procedure is required as a preprocessing step. Secondly, less information is available to the classification algorithm, since the discretisation procedure creates a fixed number of discrete intervals for each continuous attribute.

In this paper, we focus on extending the ideas of cAnt-Miner [3] for coping with continuous attributes. cAnt-Miner pioneered coping with both types of attribute (nominal and continuous) directly, taking full advantage of all the information in continuous attributes and not requiring a discretisation procedure in a preprocessing step. We propose two new methods for handling continuous attributes in ACO classification algorithms. The first method gives the ability to use continuous attribute intervals with both lower and upper bound values (i.e. v_lower ≤ attribute < v_upper). The second method explores the attribute interaction introduced by the way that continuous attributes are handled internally, which is not taken into account by cAnt-Miner. We present empirical evaluation results to validate the proposed methods.

The authors are with the Computing Laboratory, University of Kent, Canterbury, Kent, United Kingdom (email: {febo2, A.A.Freitas, C.G.Johnson}@kent.ac.uk).

The remainder of this paper is organised as follows. Section II presents a brief overview of Ant-Miner [4], the first ACO classification algorithm, and cAnt-Miner. Section III discusses the two proposed methods for handling continuous attributes. The empirical evaluation results are presented in Section IV. Finally, Section V draws the conclusions of the paper and presents future research directions.

II. BACKGROUND

Ant Colony Optimisation (ACO) systems simulate the behaviour of real ants using a colony of artificial ants, which cooperate in finding good solutions to optimisation problems. Each artificial ant, representing a simple agent in the system, builds candidate solutions to the problem at hand and communicates indirectly with other artificial ants by means of pheromone levels. While the ants perform a global search for new solutions, the search is guided towards better regions of the search space based on the quality of the solutions found so far. The system converges to good solutions as a result of the collaborative interaction among the ants. This iterative process of building candidate solutions and updating pheromone values allows an ACO algorithm to converge to optimal or near-optimal solutions. In the context of discovering classification rules in data mining, ACO algorithms have been successfully applied to several different classification problems [5].

In this section, we present an overview of Ant-Miner, the first implementation of an ACO algorithm for the classification task of data mining [4], and cAnt-Miner [3], the first (to the best of our knowledge) ACO classification algorithm able to cope with continuous attributes without requiring a discretisation procedure in a preprocessing step.

A. Ant-Miner Overview

Ant-Miner aims at extracting IF-THEN classification rules of the form IF (term_1) AND (term_2) AND ... AND (term_n) THEN (class) from data. Each term in the rule is a triple (attribute, operator, value), where operator represents a relational operator and value represents a value in the domain of attribute (e.g. sex = male). The IF part corresponds to the rule's antecedent and the THEN part corresponds to the rule's consequent, which represents the class to be predicted by the rule. An example that satisfies the rule's antecedent
will be assigned the class predicted by the rule. As Ant-Miner only works with nominal (categorical or discrete) attributes, the only valid relational operator is "=" (the equality operator). Continuous attributes need to be discretised in a preprocessing step.

A high-level pseudo-code of Ant-Miner is presented in Algorithm 1 [4]. In summary, Ant-Miner works as follows. It starts with an empty rule list and iteratively (while loop) adds one rule at a time to that list while the number of uncovered training examples is greater than a user-specified maximum value. In order to construct a rule, an ant starts with an empty rule (no terms in its antecedent) and adds one term at a time to the rule antecedent (repeat-until loop). Terms are probabilistically chosen to be added to the current partial rule based on the amount of pheromone (τ) and a problem-dependent heuristic information (η) associated with terms (vertices in the construction graph). A pheromone value and a heuristic value are associated with each possible term, i.e. each possible triple (attribute, operator, value). As usual in ACO, heuristic values are fixed (based on an information-theoretic measure of the predictive power of the term), while pheromone values are iteratively updated based on the quality of the rules built by the ants. An ant keeps adding terms to its partial rule until any further term added to the rule's antecedent would make the rule cover fewer training examples than a user-specified threshold (in order to avoid too specific and unreliable rules), or until all attributes have already been used. The latter rule construction stopping criterion is necessary because an attribute can occur at most once in the antecedent of a rule, in order to avoid inconsistencies such as a rule containing both (sex = male) and (sex = female). Once the rule construction process has finished, the rule constructed by an ant is pruned to remove irrelevant terms from the rule antecedent.
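As a concrete illustration, the rule representation and the pheromone-guided term selection described above can be sketched as follows. This is a minimal sketch under our own naming and data layout, not Ant-Miner's original implementation:

```python
import random

# A rule antecedent is a list of (attribute, operator, value) triples;
# with nominal attributes the only operator is "=".
def satisfies(example, antecedent):
    """True if the example matches every term of the antecedent."""
    return all(example[attr] == value
               for (attr, op, value) in antecedent
               if op == "=")

# Roulette-wheel choice of the next term, with probability
# proportional to pheromone (tau) times heuristic value (eta).
def choose_term(candidates, tau, eta, rng=random.random):
    weights = [tau[t] * eta[t] for t in candidates]
    total = sum(weights)
    threshold = rng() * total
    cumulative = 0.0
    for term, weight in zip(candidates, weights):
        cumulative += weight
        if cumulative >= threshold:
            return term
    return candidates[-1]  # guard against floating-point round-off

example = {"sex": "male", "smoker": "yes"}
rule = [("sex", "=", "male"), ("smoker", "=", "yes")]
matched = satisfies(example, rule)  # the example satisfies both terms
```

With equal pheromone on all candidate terms, `choose_term` favours terms with higher heuristic value on average, which is exactly how the search is biased in the first iteration, before any pheromone feedback.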
Then, the consequent of the rule is chosen to be the class value most frequent among the set of training examples covered by the rule in question. Finally, pheromone trails are updated using the best rule created by the ants, based on a rule quality measure Q. The process of constructing a rule is repeated until a user-specified number of iterations has been reached, or until the best rule of the current iteration is exactly the same as the best rule constructed over a predefined number of previous iterations, which works as a rule convergence test. The best rule found along this iterative process is added to the rule list, and the covered training examples (the training examples that satisfy the antecedent of the best rule) are removed from the training set.

B. cAnt-Miner Overview

In order to overcome Ant-Miner's limitation of coping only with nominal attributes, Otero et al. [3] have proposed an Ant-Miner extension, named cAnt-Miner (Ant-Miner coping with continuous attributes), which can dynamically create thresholds on continuous attributes' domain values during the rule construction process. Since cAnt-Miner has the ability to cope with continuous attributes "on-the-fly", continuous attributes do not need to be discretised in
Algorithm 1: High-level pseudo-code of Ant-Miner.

begin Ant-Miner
    tr_set ← all training examples;
    rule_list ← ∅;
    while |tr_set| > MaxUncoveredExamples do
        τ ← initialise pheromones;
        rule_best ← ∅;
        i ← 0;
        repeat
            CreateRules();
            ComputeConsequents();
            PruneRules();
            current_best ← BestRule();
            UpdatePheromones(τ, current_best);
            if Q(current_best) > Q(rule_best) then
                rule_best ← current_best;
            end
            i ← i + 1;
        until i ≥ MaxIterations OR Convergence();
        rule_list ← rule_list + rule_best;
        tr_set ← tr_set \ CoveredExamples(rule_best);
    end
end
a preprocessing step. cAnt-Miner extended Ant-Miner in several ways, as follows.

Firstly, cAnt-Miner includes vertices to represent continuous attributes in the construction graph. For each nominal attribute x_i and value v_ij (where x_i is the i-th nominal attribute and v_ij is the j-th value belonging to the domain of x_i), a vertex (x_i = v_ij) is added to the construction graph, as in Ant-Miner. Furthermore, for each continuous attribute y_i, a vertex (y_i) is added to the construction graph, unlike in Ant-Miner. Note that continuous attribute vertices do not represent a valid term, since they do not have a relational operator and value associated with them in the construction graph, in contrast to nominal attribute vertices. The relational operator and a threshold value are determined when an ant selects a continuous attribute vertex as the next term to be added to the rule (an example of a continuous attribute term is 'age > 21'). This makes the choice of relational operator and value tailored to the current candidate rule being constructed, rather than chosen in a static preprocessing step.

Secondly, in order to compute the heuristic information for continuous attributes, cAnt-Miner incorporates a dynamic entropy-based discretisation procedure. In Ant-Miner, the heuristic value of each nominal vertex (x_i = v_ij) involves a measure of entropy associated with the partition of examples which have the specific value v_ij for the attribute x_i. The entropy measure, which is derived from information theory and often used in data mining, quantifies the impurity of a collection of examples. Since continuous attribute vertices (y_i) do not represent a partition of examples as nominal attribute vertices do, a threshold value v needs to be selected in order to dynamically partition the set of examples into two intervals: y_i < v and y_i ≥ v. The best threshold value
v is the value that minimises the entropy of the resulting partition, computed as

entropy(y_i, v) = (|S_{y_i < v}| / |S|) · entropy(S_{y_i < v}) + (|S_{y_i ≥ v}| / |S|) · entropy(S_{y_i ≥ v})

where S_{y_i < v} and S_{y_i ≥ v} are the sets of examples falling into the intervals y_i < v and y_i ≥ v, respectively, and |S| is the total number of examples.
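The dynamic discretisation step above can be sketched as follows. This is an illustrative sketch under our own naming; following the text, candidate thresholds are drawn from the attribute's own values, and the threshold minimising the weighted entropy of the two resulting intervals is selected:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a collection of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def best_threshold(values, labels):
    """Pick the threshold v (among the attribute's observed values)
    minimising the weighted entropy of the split y < v vs y >= v."""
    best_v, best_e = None, float("inf")
    for v in sorted(set(values)):
        below = [c for y, c in zip(values, labels) if y < v]
        above = [c for y, c in zip(values, labels) if y >= v]
        if not below or not above:
            continue  # skip degenerate (empty) partitions
        e = (len(below) / len(values)) * entropy(below) + \
            (len(above) / len(values)) * entropy(above)
        if e < best_e:
            best_v, best_e = v, e
    return best_v, best_e

# 'age' values and class labels: splitting at v = 25 yields two
# pure partitions, so the weighted entropy drops to zero there.
ages = [18, 19, 20, 25, 30, 40]
classes = ["no", "no", "no", "yes", "yes", "yes"]
v, e = best_threshold(ages, classes)
```

Once v is found, the two intervals y_i < v and y_i ≥ v play the role that the fixed discrete intervals of a preprocessing step would play in Ant-Miner, but recomputed for the examples covered by the current partial rule.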