Recent Researches in Computers and Computing

A Genetic Algorithm for Classification

RAUL ROBU, ŞTEFAN HOLBAN
Department of Automation and Applied Informatics, Department of Computers
“Politehnica” University of Timisoara
Vasile Parvan Blvd. 2, 300223, Timisoara
ROMANIA
[email protected], [email protected]

Abstract: - The paper presents aspects regarding genetic algorithms, their use in data mining and, in particular, their use in the discovery of classification rules. The fitness functions of the genetic algorithms used for mining classification rules are presented synthetically. A genetic algorithm with a new fitness function for mining classification rules is proposed. The proposed algorithm was tested on the classic datasets Car, Zoo and Mushroom. The same datasets were also classified with the classic algorithms NaiveBayes and J48. The results obtained by applying the three algorithms are presented.

Key-Words: - genetic algorithm, data mining, classification

1 Introduction

Genetic algorithms are adaptive techniques that can be successfully used to solve complex search and optimization problems [1]. They are based on the principles of genetics and on Darwin’s theory of natural selection (“survival of the fittest”). Genetic algorithms have been used successfully in data mining to determine classification rules [2], to search for appropriate cluster centers [1], to select the attributes of interest in predicting the value of a target attribute [3], etc. Classification of instances has also been performed with hybrid algorithms that combine genetic algorithms with particle swarm optimization [4] and with Naïve Bayes and k-Nearest Neighbors [5], respectively. A few applications in which genetic algorithms have been successfully applied to classification problems are print classification, heart disease classification and the classification of emotions on the human face.

The fitness functions of the genetic algorithms used for mining classification rules may contain metrics concerning predictive accuracy, rule comprehensibility and rule interestingness [2][3]. Various studies propose genetic algorithms whose fitness functions combine these metrics in different ways. This paper synthetically presents the fitness functions suggested for mining classification rules and proposes a new fitness function. The genetic algorithm that incorporates the suggested function was implemented in Java and tested on three classic datasets: Zoo, Car and Mushroom. The paper also presents the steps of the proposed algorithm. The three datasets were additionally classified with the Naive Bayes and J48 algorithms from WEKA. The results obtained with the three algorithms are comparable and are presented in Section 4.
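Since the paper does not include source code, the listing below is only a minimal, generic sketch of the selection, crossover and mutation cycle on which genetic algorithms rely; it is not the algorithm proposed in this paper, and the Individual interface, the tournament selection and all other names are assumptions made for the illustration.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Generic GA skeleton (illustrative only, not the algorithm proposed in this paper). */
public class SimpleGeneticAlgorithm {

    interface Individual {
        double fitness();                                  // problem-specific evaluation
        Individual crossover(Individual other, Random rnd);
        Individual mutate(double rate, Random rnd);
    }

    static Individual tournamentSelect(List<Individual> pop, Random rnd) {
        // Pick the better of two randomly chosen individuals (tournament of size 2).
        Individual a = pop.get(rnd.nextInt(pop.size()));
        Individual b = pop.get(rnd.nextInt(pop.size()));
        return a.fitness() >= b.fitness() ? a : b;
    }

    static Individual evolve(List<Individual> population, int generations,
                             double mutationRate, Random rnd) {
        for (int g = 0; g < generations; g++) {
            List<Individual> next = new ArrayList<>();
            while (next.size() < population.size()) {
                Individual p1 = tournamentSelect(population, rnd);
                Individual p2 = tournamentSelect(population, rnd);
                next.add(p1.crossover(p2, rnd).mutate(mutationRate, rnd));
            }
            population = next;                             // replace the old generation
        }
        // "Survival of the fittest": return the best individual found.
        return population.stream()
                .max(Comparator.comparingDouble(Individual::fitness))
                .orElseThrow(IllegalStateException::new);
    }
}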

2 Fitness functions used in mining classification rules

Genetic algorithms are used to discover classification rules for data, rules that are then used for prediction. The discovered rules must have high prediction accuracy and they must be comprehensible and interesting [2][3]. The rules have the form IF X THEN Y, where X is the antecedent of the rule, formed from a conjunction of conditions, and Y is the consequent of the rule, that is, the predicted class. The fitness functions used for the discovery of classification rules often use the following factors from the confusion matrix:
• True positive (tp): the actual class is Y and the predicted class is also Y.
• False positive (fp): the actual class is not Y, but the predicted class is Y.
• True negative (tn): the actual class is not Y and the predicted class is also not Y.
• False negative (fn): the actual class is Y, but the predicted class is not Y.

In [3] the following fitness function is suggested:

Fitness = w1 · (CF · Comp) + w2 · Simp                                    (1)

The first term of relation (1) measures prediction accuracy and the second one measures rule comprehensibility. CF = tp / (tp + fp) is the confidence factor and Comp = tp / (tp + fn) is the completeness factor. Simp represents the simplicity of the rule and is inversely proportional to the number of conditions in the antecedent of the rule. w1 and w2 are weights defined by the user. In [6] the following fitness function is used:

Fitness = (tp / (tp + A·fn)) · (tn / (tn + B·fp))                         (2)

Where 0.2
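As an illustration of how the confusion-matrix counts and the two fitness functions above can be evaluated for one candidate rule, a minimal sketch follows. It is not the authors' implementation: the representation of a rule as a list of attribute-value conditions, the choice Simp = 1 / (number of conditions), and the handling of the weights w1, w2 and of the constants A and B as plain parameters are assumptions made only for this example (zero-denominator guards are omitted for brevity).

import java.util.List;
import java.util.Map;

/** Sketch of confusion-matrix counts and fitness functions (1) and (2) for one rule. */
public class RuleFitness {

    /** One attribute-value condition from the rule antecedent. */
    public static class Condition {
        final String attribute;
        final String value;
        Condition(String attribute, String value) {
            this.attribute = attribute;
            this.value = value;
        }
        boolean covers(Map<String, String> instance) {
            return value.equals(instance.get(attribute));
        }
    }

    /** Counts {tp, fp, tn, fn} for a rule "IF antecedent THEN predictedClass". */
    public static int[] confusionCounts(List<Condition> antecedent,
                                        String predictedClass,
                                        List<Map<String, String>> instances,
                                        String classAttribute) {
        int tp = 0, fp = 0, tn = 0, fn = 0;
        for (Map<String, String> instance : instances) {
            boolean covered = antecedent.stream().allMatch(c -> c.covers(instance));
            boolean isClass = predictedClass.equals(instance.get(classAttribute));
            if (covered && isClass) tp++;
            else if (covered && !isClass) fp++;   // predicted Y, actual class is not Y
            else if (!covered && !isClass) tn++;
            else fn++;                            // actual class is Y, predicted is not Y
        }
        return new int[] {tp, fp, tn, fn};
    }

    /** Fitness (1): w1 * (CF * Comp) + w2 * Simp, with user-defined weights. */
    public static double fitness1(int tp, int fp, int fn,
                                  int numConditions, double w1, double w2) {
        double cf = tp / (double) (tp + fp);      // confidence factor
        double comp = tp / (double) (tp + fn);    // completeness factor
        double simp = 1.0 / numConditions;        // one simple choice for Simp
        return w1 * (cf * comp) + w2 * simp;
    }

    /** Fitness (2): tp/(tp + A*fn) * tn/(tn + B*fp), with constants A and B from [6]. */
    public static double fitness2(int tp, int fp, int tn, int fn, double a, double b) {
        return (tp / (tp + a * fn)) * (tn / (tn + b * fp));
    }
}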