IJCSET | December 2011 | Vol 1, Issue 11, 722-724

Constructing Near Optimal Binary Decision Tree Classifier Using Genetic Algorithm

Ayman Khalafallah
Department of Computer and Systems Engineering, Faculty of Engineering, Alexandria University, Alexandria, Egypt
[email protected]
Abstract—Decision trees are extensively used in classification, pattern recognition and data mining. Classical decision tree building algorithms such as Iterative Dichotomiser 3 (ID3) [3] test a single attribute at each internal node, so the decision boundaries are parallel to the axes, and they build the tree one node at a time. In this paper a new method is proposed in which a hyperplane based on all attributes is selected at each internal node; this hyperplane partitions the training set into two disjoint sets. Our method also tries to build most of the tree in a single optimization problem, using a genetic algorithm as the optimization method. The tree produced by the proposed algorithm may be more compact and more accurate on classification problems.
Keywords: Decision Tree, Genetic Algorithm, Classifier, Optimization

I. INTRODUCTION

A classification problem is a problem in which a label is assigned to an object based on the object's attributes. A classical example is character recognition: a character is scanned and the scanned image is used to classify the character. The attributes might be the size, the orientation, the contour or the scanned black-and-white matrix of the character. Classification is also widely used in biology, for example to detect possible diseases based on the outcome of a gene test. To solve a classification problem a classifier has to be built, and the classifier is built using training examples. In the case of supervised training, considered in this paper, the training examples consist of objects identified by their attributes and labeled with their classes. The training process of the classifier is called the learning process.
Decision trees are an important tool in decision making, pattern recognition, classification and data mining. Because of their importance, many algorithms exist to build decision trees [1][2][3][6]. The purpose of these algorithms is to build as compact a decision tree as possible to classify the data from the training examples; it is believed that a shorter decision tree is the best possible tree according to the Occam's Razor principle [1][2][3]. It has to be noted, though, that constructing an optimal decision tree is an NP-hard problem [4][5], so one must accept a heuristic to build a near optimal decision tree. The references [6][7][10] show many attempts to build near optimal decision trees. In this paper a new method to construct a near optimal binary decision tree is presented; this method differs from the above methods in the following ways:
• At each node a hyperplane test is used to decide which branch to follow, not just a single-attribute test.
• The algorithm tries to construct a complete binary tree, to make sure that the tree is as compact and balanced as possible.
• The algorithm does not attempt to completely construct the decision tree, in order to reduce the complexity of the chromosome representation in the genetic algorithm.

II. GENETIC ALGORITHM

A genetic algorithm (GA) is an optimization method that tries to mimic the natural selection and mutation of living organisms [9]. It is an iterative optimization technique in which a pool of possible alternative solutions to an optimization problem is maintained. This pool is called the population, and from it a new generation of the population is produced using the basic GA operations, crossover and mutation [9]. Genetic algorithms are widely used in optimization because they combine the benefits of random search and steepest-descent-like search, giving a powerful mix of exploration and exploitation. To use a genetic algorithm, a possible solution of the problem has to be encoded as a chromosome; a chromosome in GA is nothing more than an array of bits or digits. The basic operation of GA is crossover, which can be described as in Figure 1.
Figure 1. Crossover in GA: two parent chromosomes producing two offspring.

Here a random point is chosen and the two chromosomes exchange content at this point. Another GA operation is mutation, in which a single bit or digit changes value randomly, typically with a very small probability. Crossover ensures that a new population is generated from the old one, while mutation ensures diversity and keeps the algorithm from being trapped in a local minimum.
Important to the genetic algorithm is the fitness of a chromosome. The fitness of a chromosome is problem dependent: it depends on the kind of optimization problem and on whether it is a minimization or a maximization. Following survival of the fittest, the higher the fitness value of a chromosome, the higher the chance that it or its offspring survive into the next population.
Many parameters have to be chosen for a GA, including crossover and mutation probabilities, population size, crossover type and number of generations, but the most important part is how to represent a possible solution as a chromosome and how to evaluate its fitness.
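To make these two operations concrete, the following is a minimal sketch of one-point crossover and per-gene mutation over numeric chromosomes. The function names and the mutation rate are illustrative assumptions, not part of the paper.

import random

MUTATION_RATE = 0.01  # assumed small per-gene mutation probability

def crossover(parent1, parent2):
    """One-point crossover: cut both chromosomes at a random point
    and swap the tails, producing two offspring."""
    point = random.randrange(1, len(parent1))
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])

def mutate(chromosome, low=-100.0, high=100.0):
    """With a small probability, replace a gene by a random value."""
    return [random.uniform(low, high) if random.random() < MUTATION_RATE else gene
            for gene in chromosome]

# Example with two chromosomes of equal length.
p1 = [3, 1, 4, -1, 7, 9]
p2 = [7, 6, 11, 12, 45, 22]
child1, child2 = crossover(p1, p2)
print(child1, mutate(child2))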
III. DECISION TREES

Figure 2. An example partition of the search space.
A decision tree is a tree data structure with the additional properties that each node carries a test on a set of attributes to decide which branch to follow, and each leaf is labeled with the name of a class. The root and each internal node are called decision nodes. The attribute test at a node can be looked at as a partition of the search space. Classical decision tree building algorithms [3] use a single-attribute range test at each internal node, so each node partitions the search space parallel to a basic axis. A quick look at Figure 2 shows that this partition might be highly inefficient, and generally a hyperplane test of the following form might be more useful:
$$\sum_{i=1}^{n} a_i x_i$$
Using the above equation, samples are classified to the left or right branch depending on the sign of this sum. In this equation the $x_i$ are the attributes of the sample and the $a_i$ are the weights the algorithm must find in order to increase the efficiency of the hyperplane cut.
There are mainly three alternatives to building decision trees. The main objective of each method is to build as compact a decision tree as possible, in the hope that it will be the most useful tree according to the debated but widely accepted Occam's Razor [3] principle. These three are:
1. Build the tree one node at a time, using a selected attribute at each node to partition the search space.
2. Use hyperplanes to partition the search space.
3. Build the tree as a single optimization problem.
The survey in [6] gives more concrete examples of these methods. Each of the three approaches has its merits but suffers from some shortcomings. The first alternative is the simplest and tends to produce trees from which rules can be extracted easily, but a look at Figure 2 makes clear that trees built this way might not be optimal; Method 1 is nevertheless often preferred because the resulting tree and the rules associated with traversing it are easier to read and interpret. Method 2, although superior at each individual node, may suffer from local anomalies, as the cuts defined at upper levels may result in bad cuts further down the tree. Method 3 avoids the deficiency of Method 1 and tends to be more robust, but the optimization problem associated with it turns out to be much more difficult, with no guarantee of reaching the global optimum in a reasonable amount of time. Also, allowing the tree to be of arbitrary structure tends to render the problem very difficult and time consuming, and the gain from such a method might not be justified.

IV. PROPOSED ALGORITHM

The proposed method tries to overcome all the previous shortcomings. It builds a binary search tree using a hyperplane as the separation method at each node. Because of the complexity of optimizing both the tree structure and the hyperplanes simultaneously, a decision was made to fix a particular binary tree structure, namely a complete binary tree. A complete binary tree, as shown in Figure 3, is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible. The number of leaves is chosen to be the number of classes, which means the number of decision nodes equals the number of classes minus one. Any binary decision tree has a number of leaves greater than or equal to the number of classes, and among all binary trees of a given size the complete binary tree has the shortest depth and least average depth, making it the most compact of all decision trees. Another advantage of the complete binary tree is that it has a compact array representation; the following array represents the tree in Figure 3:

A B C D E F G H I
Figure 3. A complete binary tree with nodes A through I.

Basically, the children of the node at position i in the array, if they exist, are at positions 2i and 2i+1. This achieves two goals:
1. Traversing the tree is easy.
2. Encoding the tree as a chromosome in GA is simple.
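As an illustration of how this array encoding combines with the hyperplane test of Section III, here is a minimal sketch of classification by tree traversal. The helper names and weight values are hypothetical, not from the paper.

def hyperplane_sign(weights, x):
    """Value of sum(a_i * x_i); its sign decides which branch to follow."""
    return sum(a * xi for a, xi in zip(weights, x))

def classify(internal_nodes, x):
    """internal_nodes[i-1] holds the weight vector of node i (1-indexed).
    For n internal nodes the leaves occupy positions n+1 .. 2n+1, so the
    returned value is a 0-based leaf (class) index."""
    n = len(internal_nodes)
    i = 1
    while i <= n:                      # children of node i are at 2i and 2i+1
        i = 2 * i if hyperplane_sign(internal_nodes[i - 1], x) > 0 else 2 * i + 1
    return i - n - 1

# Hypothetical 3-class problem: 2 internal nodes, 3 leaves, 2 attributes.
nodes = [[1.0, -1.0],   # node 1 tests: x1 - x2 > 0 ?
         [0.5, 0.5]]    # node 2 tests: 0.5*x1 + 0.5*x2 > 0 ?
print(classify(nodes, [2.0, 1.0]))  # -> 1 (reaches the leaf at array position 4)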
The first step of the algorithm uses GA to find a possibly optimal search tree of this structure; as noted above, the representation of such a tree as a chromosome is simple. Since there is no guarantee either that such a tree exists as a solution to the classification problem or that the GA will be able to find one if it does exist, another method has to be carried out from the leaves of the tree found in step 1. Method 2, namely finding a hyperplane cut at each leaf node of the step 1 tree, is chosen to complete the tree, again with GA as the optimization method. Figure 4 shows a possible tree resulting from step 1, identifying the places where the hyperplane cuts must be found by the numbers 1 to 7.

Figure 4. A possible tree resulting from step 1, with the positions of the required hyperplane cuts numbered 1 to 7.
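Since each internal node of the complete tree holds one hyperplane, a step 1 chromosome can be sketched as the concatenated weight vectors of the tree's internal nodes. The following is a minimal illustration assuming this flat layout; the paper does not fix a concrete encoding, so the layout and names are assumptions.

import random

def random_chromosome(num_classes, num_attributes, low=-1.0, high=1.0):
    """A complete tree with num_classes leaves has num_classes - 1 internal
    nodes; each node needs one weight per attribute, so the chromosome is
    a flat array of (num_classes - 1) * num_attributes random weights."""
    length = (num_classes - 1) * num_attributes
    return [random.uniform(low, high) for _ in range(length)]

def node_weights(chromosome, i, num_attributes):
    """Decode the weight vector of internal node i (1-indexed)."""
    start = (i - 1) * num_attributes
    return chromosome[start:start + num_attributes]

ch = random_chromosome(num_classes=4, num_attributes=3)
print(len(ch))                 # 9 weights: 3 internal nodes x 3 attributes
print(node_weights(ch, 2, 3))  # weights of internal node 2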
The last part is choosing the fitness function. For step 2 the usual entropy-based [3] fitness is used. The step 1 fitness is a bit tricky, though, as near-complete classification is preferred over minimum entropy, so the fitness function is defined as follows: first, count the number of classes staying alive at each leaf (a class is alive if its probability of reaching that leaf is more than 2%); denote this number by v. Second, the fitness of each leaf is (2 - v) / (total number of classes). The fitness of the whole tree is the sum of the fitnesses of its leaves.
The logic behind this is that if a leaf node has only one live class, it will not need to be extended in step 2, which is the best possible case. If it has two classes, classical methods in pattern recognition can separate them, and the contribution of this node to the fitness of the tree is zero. If a leaf has more than two live classes, it will need more than one level in step 2 of the algorithm; this is a bad outcome, so the leaf is assigned a negative value that subtracts from the fitness of the tree. The more classes that end up alive in a leaf node, the lower its fitness value.
Finally, Table I gives simple pseudocode for step 1 of the algorithm; step 2 is more or less straightforward.
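A minimal sketch of this step 1 fitness follows. It assumes the probability of a class reaching a leaf is estimated by the fraction of that class's training patterns that land there (the paper does not spell this estimate out), and it takes the leaf-assignment routine as a parameter, e.g. the classify() sketched in Section IV; the helper names are illustrative.

from collections import Counter, defaultdict

ALIVE_THRESHOLD = 0.02  # per the text: a class "stays alive" at a leaf if its
                        # probability of reaching that leaf exceeds 2%

def tree_fitness(leaf_of, patterns, num_leaves, num_classes):
    """Step 1 fitness: each leaf contributes (2 - v) / num_classes, where v
    is the number of classes alive at that leaf; the tree fitness is the sum.
    leaf_of maps an attribute vector to a leaf index; patterns is a list of
    (attributes, label) pairs."""
    class_totals = Counter(label for _, label in patterns)
    leaf_counts = defaultdict(Counter)          # per-leaf, per-class counts
    for x, label in patterns:
        leaf_counts[leaf_of(x)][label] += 1
    fitness = 0.0
    for leaf in range(num_leaves):
        v = sum(1 for label, n in leaf_counts[leaf].items()
                if n / class_totals[label] > ALIVE_THRESHOLD)
        fitness += (2 - v) / num_classes        # 1 class -> +1/K, 2 -> 0, more -> negative
    return fitness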
TABLE I. STEP 1 TREE BUILDING PSEUDOCODE

function Classify(Chromosome ch, Pattern t){
  i = 1
  while (i <= ch.length){           // ch.length = number of internal nodes
    if (HyperplaneValue(ch[i], t) > 0) i = 2*i
    else i = 2*i + 1 }
  return i - ch.length - 1 }        // 0-based leaf index

function EvaluateFitness(Chromosome ch){
  for every pattern t: Classify(ch, t)
  return the fitness as described in the text above }

procedure InitializePopulation(){
  generate an array of arrays of random weights
  // outer array size is the population size
  // inner array sizes are the chromosome size }

procedure Crossover(Chromosome CH1, Chromosome CH2){
  // do crossover between CH1 and CH2 }

procedure Mutation(Chromosome CH){
  // mutate chromosome CH }

procedure GA(){
  InitializePopulation()
  repeat
    repeat
      pick two chromosomes CH1, CH2
      Crossover(CH1, CH2)
      EvaluateFitness(Offspring1)
      EvaluateFitness(Offspring2)
    end repeat
    mutate some chromosomes
  end repeat }

V. CONCLUSION AND FUTURE WORK

In this paper a novel method for constructing a near optimal binary search tree is given, and arguments are given to show why this method should be robust. Experimental results are needed to confirm the benefits of the proposed algorithm. Experiments must also be carried out to test different fitness functions, and the possibility of pruning the tree resulting from the first part of the algorithm before the second step has also to be explored.

REFERENCES
[1] Mitchell, T. M., Machine Learning, McGraw-Hill, 1997.
[2] Duda, R. O., Hart, P. E., and Stork, D. G., Pattern Classification, 2nd Ed., Wiley-Interscience, 2001.
[3] Quinlan, J. R., Induction of decision trees, Machine Learning, 1(1), 81-106, 1986.
[4] Hyafil, L. and Rivest, R. L., Constructing optimal binary decision trees is NP-complete, Information Processing Letters, Vol. 5, No. 1, 15-17, 1976.
[5] Bodlaender, H. L. and Zantema, H., Finding Small Equivalent Decision Trees is Hard, International Journal of Foundations of Computer Science, Vol. 11, No. 2, World Scientific Publishing, 2000, pp. 343-354.
[6] Safavian, S. R. and Landgrebe, D., A survey of decision tree classifier methodology, IEEE Transactions on Systems, Man and Cybernetics, Vol. 21, No. 3, pp. 660-674, 1991.
[7] Gehrke, J., Ganti, V., Ramakrishnan, R., and Loh, W., BOAT - Optimistic Decision Tree Construction, in Proc. of the ACM SIGMOD Conference on Management of Data, 1999, pp. 169-180.
[8] Zhao, Q. and Shirasaka, M., A Study on Evolutionary Design of Binary Decision Trees, in Proceedings of the Congress on Evolutionary Computation, Vol. 3, IEEE, 1999, pp. 1988-1993.
[9] Goldberg, D. E., Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, 1989.
[10] Cha, S. and Tappert, C. C., Constructing Binary Decision Trees using Genetic Algorithms, in Proceedings of the International Conference on Genetic and Evolutionary Methods, July 14-17, 2008, Las Vegas, Nevada.