Generating Effective Classifiers with Supervised Learning of Genetic Programming

Been-Chian Chien 1, Jui-Hsiang Yang 1, and Wen-Yang Lin 2

1 Institute of Information Engineering, I-Shou University
1, Section 1, Hsueh-Cheng Rd., Ta-Hsu Hsiang, Kaohsiung County, Taiwan 840, R.O.C.
{cbc, m9003012}@isu.edu.tw
2 Department of Information Management, I-Shou University
1, Section 1, Hsueh-Cheng Rd., Ta-Hsu Hsiang, Kaohsiung County, Taiwan 840, R.O.C.
[email protected]

Abstract. A new approach to learning classifiers using genetic programming has been developed recently. Most previous research generates classification rules to classify data. However, generating rules is time consuming and the recognition accuracy is limited. In this paper, an approach to learning classification functions by genetic programming is proposed. Since a classification function deals with numerical attributes only, the proposed scheme first transforms nominal data into numerical values by rough membership functions. The learning technique of genetic programming is then used to generate classification functions. To improve the accuracy of classification, we propose an adaptive interval fitness function. Combining the learned classification functions with the training samples, an effective classification method is presented. A number of data sets selected from the UCI Machine Learning repository are used to show the effectiveness of the proposed method and to compare it with other classifiers.

1 Introduction

Classification is one of the important tasks in machine learning. A classification problem is a supervised learning task: given a data set with pre-defined classes, referred to as training samples, classification rules, decision trees, or mathematical functions are learned from the training samples to classify future data whose class is unknown. Owing to the versatility of human activities and the unpredictability of data, such a task is a challenge. Many different methods have been proposed for solving classification problems. Most previous classification methods are based on mathematical models or theories. For example, probability-based classification methods are built on Bayesian decision theory [5][7]; the Bayesian network is one of the important classification methods based on statistical models.

Many improvements on Naïve Bayes, such as NBTree [7] and SNNB [16], also provide good classification results. Another well-known approach is the neural network [17]. In this approach, a multi-layered network with m inputs and n outputs is trained on a given training set. Given an input vector, the network produces an n-dimensional output vector, and the input is assigned to the class with the maximum output. Another type of classification approach uses decision trees, such as ID3 and C4.5 [14]. A decision tree is a flow-chart-like tree structure in which each internal node denotes a decision on an attribute, each branch represents an outcome of the decision, and leaf nodes represent classes. Generally, a classification problem can be represented clearly in a decision tree.

Recently, some modern computational techniques have begun to be applied by researchers to develop new classifiers. For example, CBA [10] employs data mining techniques to develop a hybrid rule-based classification approach by integrating classification rule mining with association rule mining. Evolutionary computation is another interesting technique. The most common techniques of evolutionary computing are genetic algorithms and genetic programming [4][6]. For solving a classification problem, a genetic algorithm first encodes a random set of classification rules as bit strings. These bit strings are then replaced by new bit strings after applying evolution operators such as reproduction, crossover, and mutation. After a number of generations have evolved, bit strings with good fitness are obtained, and a set of effective classification rules can be derived from the final set of bit strings satisfying the fitness function. With genetic programming, either classification rules [4] or classification functions [6] can be learned to accomplish the classification task. The main advantage of classifying by functions instead of rules is conciseness and efficiency, because evaluating functions is easier than inducing rules.

The technique of genetic programming (GP) was proposed by Koza [8][9] in 1987. Genetic programming has been applied to several applications such as symbolic regression, robot control programs, and classification. Genetic programming can discover underlying data relationships and present these relationships as expressions. The algorithm begins with a population, which is a set of randomly created individuals. Each individual represents a potential solution and is represented as a binary tree constructed from all possible compositions of the sets of functions and terminals. A fitness value of each tree is calculated by a suitable fitness function. According to the fitness values, a set of individuals with better fitness is selected, and these individuals are used to generate the new population of the next generation with genetic operators. Genetic operators generally include reproduction, crossover, and mutation [8]. After the evolution of a number of generations, an individual with a good fitness value is obtained.

Previous research on classification using genetic programming has shown the feasibility of learning classification functions by designing an accuracy-based fitness function [6][11] and special evolution operations [2]. However, there are two main disadvantages in the previous work.
First, only numerical attributes can be handled by classification functions; it is difficult for genetic programming to handle cases with nominal attributes containing categorical data. The second drawback is that classification functions may conflict with one another, and such conflicts decrease the accuracy of classification. In this paper, we propose a new learning scheme that defines a rough attribute membership function to solve the problem of nominal attributes and gives a new fitness function for genetic programming to generate a function-based classifier. Fifteen data sets are selected from the UCI data repository to show the performance of the proposed scheme and to compare the results with other approaches.

This paper is organized as follows: Section 2 introduces the concepts of rough set theory and rough membership functions. In Section 3, we discuss the proposed learning algorithm based on rough attribute membership and genetic programming. Section 4 presents the classification algorithm. The experimental results are shown and compared with other classifiers in Section 5. Conclusions are drawn in Section 6.

2 Rough Membership Functions

Rough set theory, introduced by Pawlak [12], is a powerful tool for the identification of common attributes in data sets. Its mathematical foundation is based on the set approximation of a partition space on sets. Rough set theory has been successfully applied to knowledge discovery in databases. It provides a powerful foundation for revealing and discovering important structures in data and for classifying complex objects. An attribute-oriented rough set technique can reduce the computational complexity of learning processes and eliminate unimportant or irrelevant attributes so that knowledge can be learned from large databases efficiently.

The idea of rough sets is based on the establishment of equivalence classes on a given data set S and supports two approximations called the lower approximation and the upper approximation. The lower approximation of a concept X contains the equivalence classes that certainly belong to X without ambiguity. The upper approximation of a concept X contains the equivalence classes that cannot be described as not belonging to X. A vague concept description can contain boundary-line objects from the universe that cannot be classified with absolute certainty as satisfying the description of the concept. Such uncertainty is related to the idea of the membership of an element in a concept X. We use the following definitions to describe the membership of a concept X on a specified set of attributes B [13].

Definition 1: Let U = (S, A) be an information system, where S is a non-empty, finite set of objects and A is a non-empty, finite set of attributes. For each B ⊆ A, there is an equivalence relation EA(B) such that EA(B) = {(x, x′) ∈ S² | ∀ a ∈ B, a(x) = a(x′)}. If (x, x′) ∈ EA(B), we say that objects x and x′ are indiscernible.

Definition 2: apr = (S, E) is called an approximation space. An object x ∈ S belongs to one and only one equivalence class. Let [x]B = {y | x EA(B) y, x, y ∈ S} and [S]B = {[x]B | x ∈ S}. The notation [x]B denotes the equivalence class of EA(B) containing x, and [S]B denotes the set of all equivalence classes [x]B for x ∈ S.

Definition 3: For a given concept X ⊆ S, the rough attribute membership function of X on the set of attributes B is defined as

µ_B^X(x) = |[x]_B ∩ X| / |[x]_B|,

where |[x]_B| denotes the cardinality of the equivalence class [x]_B and |[x]_B ∩ X| denotes the cardinality of the set [x]_B ∩ X. The value of µ_B^X(x) lies in the range [0, 1].
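To make Definitions 1 to 3 concrete, the following is a minimal sketch (not part of the original paper) of how the rough attribute membership could be computed from a small table. The toy data, attribute names, and helper functions are illustrative assumptions only.

```python
from collections import defaultdict

def equivalence_classes(objects, attrs):
    """Group objects into equivalence classes of E_A(B): objects agreeing on every attribute in attrs."""
    classes = defaultdict(list)
    for obj in objects:
        key = tuple(obj[a] for a in attrs)
        classes[key].append(obj["id"])
    return list(classes.values())

def rough_membership(objects, attrs, concept_ids, x_id):
    """Compute mu_B^X(x) = |[x]_B ∩ X| / |[x]_B| for object x_id and concept X (a set of object ids)."""
    for eq_class in equivalence_classes(objects, attrs):
        if x_id in eq_class:
            overlap = len(set(eq_class) & set(concept_ids))
            return overlap / len(eq_class)
    return 0.0

# Hypothetical example: membership of object 1 in the concept {1, 3}
# when only the nominal attribute "color" is observed.
data = [
    {"id": 1, "color": "red"},
    {"id": 2, "color": "red"},
    {"id": 3, "color": "blue"},
]
print(rough_membership(data, ["color"], concept_ids={1, 3}, x_id=1))  # 0.5
```

Here objects 1 and 2 are indiscernible by "color", so the equivalence class of object 1 is {1, 2}, of which only object 1 belongs to the concept, giving a membership of 0.5.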

3 The Learning Algorithm of Classification Functions

3.1 The Classification Functions

Consider a given data set S with n attributes A1, A2, …, An. Let Di be the domain of Ai, Di ⊆ R for 1 ≤ i ≤ n, and A = {A1, A2, …, An}. Each data point xj in S is written xj = (vj1, vj2, ..., vjn), where vjt ∈ Dt is the value of attribute At in xj. Let C = {C1, C2, …, CK} be the set of K predefined classes. We say that ⟨xj, cj⟩ is a sample if the data xj belongs to class cj, cj ∈ C. We define a training set (TS) to be a set of samples, TS = {⟨xj, cj⟩ | xj ∈ S, cj ∈ C, 1 ≤ j ≤ m}, where m = |TS| is the number of samples in TS. Let mi be the number of samples belonging to class Ci; then m = m1 + m2 + ⋅⋅⋅ + mK, 1 ≤ i ≤ K.

A classification function for the class Ci is defined as fi : R^n → R, where fi is a function that can determine whether a data point xj belongs to the class Ci or not. Since the image of a classification function is a real number, we decide whether xj belongs to the specified class by means of the specific range into which xj is mapped. If we find a set of functions that can recognize all K classes, a classifier is constructed. We define a classifier F for the set of K predefined classes C as F = {fi | fi : R^n → R, 1 ≤ i ≤ K}.

3.2 The Transformation of Rough Attribute Membership

The classification function defined in Section 3.1 has a limitation on attributes: since the calculation of functions accepts only numerical values, classification functions cannot work if the data set contains nominal attributes. In order to let genetic programming train on all data sets, we use the rough attribute membership defined in Section 2 to transform each nominal attribute into a set of numerical attributes. Let A = {A1, A2, …, An} be the attribute set of a data set S, with Di the domain of Ai, and let xj ∈ S, xj = (vj1, vj2, ..., vjn), vji ∈ Di. If Di ⊆ R, Ai is a numerical attribute; we set Ãi = Ai and let wjk, the value of xj in Ãi, be wjk = vji. If Ai is a nominal attribute, we assume that S is partitioned into pi equivalence classes by attribute Ai. Let [sl]_Ai denote the l-th equivalence class partitioned by attribute Ai, where pi is the number of equivalence classes partitioned by attribute Ai. Thus, we have

[S]_Ai = ∪_{l=1}^{pi} [sl]_Ai, where pi = |[S]_Ai|.

We transform the original nominal attribute Ai into K numerical attributes Ãi. Let Ãi = (Ai1, Ai2, ..., AiK), where K is the number of predefined classes as defined in Section 3.1 and the domains of Ai1, Ai2, ..., AiK are in [0, 1]. The method of transformation is as follows. For a data point xj ∈ S, xj = (vj1, vj2, ..., vjn), if vji ∈ Di and Ai is a nominal attribute, then vji is transformed into (wjk, wj(k+1), ..., wj(k+K-1)), wjk ∈ [0, 1], with

wjk = µ_Ai^C1(xj), wj(k+1) = µ_Ai^C2(xj), ..., wj(k+K-1) = µ_Ai^CK(xj),

where

µ_Ai^Ck(xj) = |[sl]_Ai ∩ [xj]_Ck| / |[sl]_Ai|, if xj ∈ [sl]_Ai.

After the transformation, we get the new data set S′ with attributes Ã = {Ã1, Ã2, ..., Ãn}, and a data point xj ∈ S, xj = (vj1, vj2, ..., vjn), is transformed into yj = (wj1, wj2, …, wjn′), where n′ = (n - q) + qK and q is the number of nominal attributes in A. Thus, the new training set becomes TS′ = {⟨yj, cj⟩ | yj ∈ S′, cj ∈ C, 1 ≤ j ≤ m}.

3.3 The Adaptive Interval Fitness Function

The fitness function is important for genetic programming to generate effective solutions. As described in Section 3.1, a classification function fi for the class Ci should be able to distinguish positive instances from a set of data by mapping the data yj into a specific range. To find such a function, we define an adaptive interval fitness function. The mapping interval for positive instances is defined as follows. Let ⟨yj, cj⟩ be a positive instance of the training set TS′ for the class cj = Ci. We urge the values of fi(yj) for positive instances to fall in the interval [X̄i^(gen) - ri, X̄i^(gen) + ri]. At the same time, we also wish the values of fi(yj) for negative instances to be mapped outside the interval [X̄i^(gen) - ri, X̄i^(gen) + ri]. Here X̄i^(gen) is the mean value of an individual fi over the positive instances in the gen-th generation of evolution,

X̄i^(gen) = ( Σ_{⟨yj, cj⟩ ∈ TS′, cj = Ci} fi(yj) ) / mi, 1 ≤ j ≤ mi, 1 ≤ i ≤ K.

Let ri be the maximum distance between X̄i^(gen) and the positive instances for 1 ≤ j ≤ mi. That is,

ri = max_{1 ≤ j ≤ mi} | X̄i^(gen) - fi(yj) |, 1 ≤ i ≤ K.

We measure the error of a positive instance for the (gen+1)-th generation by

Dp = 0 if cj = Ci and |fi(yj) - X̄i^(gen)| ≤ ri,
Dp = 1 if cj = Ci and |fi(yj) - X̄i^(gen)| > ri,

and the error of a negative instance by

Dn = 1 if cj ≠ Ci and |fi(yj) - X̄i^(gen)| ≤ ri,
Dn = 0 if cj ≠ Ci and |fi(yj) - X̄i^(gen)| > ri.

The fitness value of an individual fi is then evaluated by the following fitness function:

fitness(fi, TS′) = Σ_{j=1}^{m} (Dp + Dn).

The fitness value of an individual represents the degree of error between the target function and the individual, so the fitness value should be as small as possible.
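As a concrete illustration, the following is a minimal sketch (our own, not the paper's GPQuick implementation) of how the adaptive interval fitness of one candidate function could be evaluated. The function names, the `candidate` callable, and the data layout are assumptions for illustration.

```python
def interval_parameters(candidate, samples, target_class):
    """Interval centre and radius computed from the positive instances of one generation."""
    positives = [candidate(y) for y, c in samples if c == target_class]
    x_mean = sum(positives) / len(positives)          # mean of f_i over positives
    r = max(abs(x_mean - v) for v in positives)       # maximum distance from the mean
    return x_mean, r

def adaptive_interval_fitness(candidate, samples, target_class, x_mean, r):
    """Count errors: positives mapped outside [x_mean - r, x_mean + r] plus
    negatives mapped inside it (smaller is better)."""
    errors = 0
    for y, c in samples:                              # samples: list of (feature vector, class label)
        inside = abs(candidate(y) - x_mean) <= r
        if c == target_class and not inside:          # D_p = 1
            errors += 1
        elif c != target_class and inside:            # D_n = 1
            errors += 1
    return errors
```

In each generation, `interval_parameters` would be recomputed and the resulting centre and radius reused when scoring the next generation, mirroring the generation-dependent X̄i^(gen) and ri above.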

3.4 The Learning Algorithm

The learning algorithm for classification functions using genetic programming is described in detail as follows.

Algorithm: The learning of classification functions
Input: The training set TS.
Output: A classification function.
Step 1: Set the initial values i = 1, k = 1.
Step 2: Transform nominal attributes into rough attribute membership values. For a sample ⟨xj, cj⟩ ∈ TS, xj = (vj1, vj2, …, vjn), 1 ≤ j ≤ m: if Ai is a numerical attribute, wjk = vji and k = k + 1; if Ai is a nominal attribute, wjk = µ_Ai^C1(xj), wj(k+1) = µ_Ai^C2(xj), ..., wj(k+K-1) = µ_Ai^CK(xj) and k = k + K. Repeat Step 2 until yj = (wj1, wj2, …, wjn′) is generated, where n′ = (n - q) + qK and q is the number of nominal attributes in A.
Step 3: j = j + 1; if j ≤ m, go to Step 1.
Step 4: Generate the new training set TS′ = {⟨yj, cj⟩ | yj = (wj1, wj2, …, wjn′), cj ∈ C, 1 ≤ j ≤ m}.
Step 5: Initialize the population. Let gen = 1 and generate the initial set of individuals Ω1 = {h1^1, h2^1, …, hp^1}, where Ω(gen) is the population in generation gen and hi^(gen) stands for the i-th individual of generation gen.
Step 6: Evaluate the fitness value of each individual on the training set. For all hi^(gen) ∈ Ω(gen), compute the fitness values Ei^(gen) = fitness(hi^(gen), TS′), where the fitness function fitness() is defined in Section 3.3.
Step 7: Check the termination conditions. If the best fitness value Ei^(gen) satisfies the termination condition (Ei^(gen) = 0) or gen is equal to the specified maximum number of generations, the hi^(gen) with the best fitness value is returned and the algorithm halts; otherwise, gen = gen + 1.
Step 8: Generate the next generation of individuals and go to Step 6. The new population Ω(gen) of the next generation is generated according to the ratios Pr, Pc, and Pm, where Pr, Pc, and Pm represent the probabilities of the reproduction, crossover, and mutation operations, respectively.
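Putting Steps 5 through 8 together, here is a compact, hypothetical sketch of the evolutionary loop; it is not the GPQuick implementation the authors used. It reuses the `interval_parameters` and `adaptive_interval_fitness` helpers sketched after Section 3.3, represents individuals as simple expression trees over the four basic operations, and, unlike the paper, recomputes the interval for every individual and omits crossover for brevity.

```python
import random

# A GP individual is a nested tuple: ('var', k), ('const', c), or (op, left, right),
# using only the four basic operations of Table 1 (with protected division).
OPS = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
       '*': lambda a, b: a * b, '/': lambda a, b: a / b if abs(b) > 1e-9 else 1.0}

def random_tree(n_features, depth=3):
    if depth == 0 or random.random() < 0.3:
        return ('var', random.randrange(n_features)) if random.random() < 0.7 \
               else ('const', random.uniform(-1.0, 1.0))
    op = random.choice(list(OPS))
    return (op, random_tree(n_features, depth - 1), random_tree(n_features, depth - 1))

def evaluate(tree, y):
    kind = tree[0]
    if kind == 'var':
        return y[tree[1]]
    if kind == 'const':
        return tree[1]
    return OPS[kind](evaluate(tree[1], y), evaluate(tree[2], y))

def mutate(tree, n_features):
    """Replace a randomly chosen subtree with a freshly generated one."""
    if tree[0] in ('var', 'const') or random.random() < 0.3:
        return random_tree(n_features, depth=2)
    op, left, right = tree
    return (op, mutate(left, n_features), right) if random.random() < 0.5 \
           else (op, left, mutate(right, n_features))

def learn_function(samples, target_class, pop_size=200, max_gen=100):
    """Evolve one classification function f_i for target_class (Steps 5-8, simplified)."""
    n_features = len(samples[0][0])
    population = [random_tree(n_features) for _ in range(pop_size)]
    best = population[0]
    for gen in range(max_gen):
        scored = []
        for tree in population:                                   # Step 6: evaluate fitness
            f = lambda y, t=tree: evaluate(t, y)
            centre, radius = interval_parameters(f, samples, target_class)
            scored.append((adaptive_interval_fitness(f, samples, target_class,
                                                     centre, radius), tree))
        scored.sort(key=lambda pair: pair[0])
        best_fitness, best = scored[0]
        if best_fitness == 0:                                     # Step 7: termination
            break
        # Step 8 (simplified): keep the best, refill by mutating good individuals.
        parents = [t for _, t in scored[:pop_size // 4]]
        population = [best] + [mutate(random.choice(parents), n_features)
                               for _ in range(pop_size - 1)]
    return best
```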

4 The Classification Algorithm

After the learning phase, we obtain a set of classification functions F that can recognize the classes in TS′. However, these functions may still conflict with each other in practical cases. To avoid conflict and rejection, we propose a scheme based on the Z-score of a statistical test. In the classification phase, we calculate the Z-values of every classification function for a data point of unknown class and compare these Z-values. If the Z-value of an unknown-class object yj for the classification function fi is minimal, then yj is assigned to the class Ci. We present the classification approach in the following.

For a classification function fi ∈ F corresponding to the class Ci and the positive instances ⟨yj, cj⟩ ∈ TS′ with cj = Ci, let X̄i be the mean of the values fi(yj) defined in Section 3.3. The standard deviation of the values fi(yj), 1 ≤ j ≤ mi, is defined as

σi = sqrt( ( Σ_{⟨yj, cj⟩ ∈ TS′, cj = Ci} ( fi(yj) - X̄i )² ) / mi ), 1 ≤ j ≤ mi, 1 ≤ i ≤ K.

For a data point x ∈ S and a classification function fi, let y ∈ S′ be the data with all numerical values transformed from x using rough attribute membership. The Z-value of the data y for fi is defined as

Zi(y) = | fi(y) - X̄i | / σi,

where 1 ≤ i ≤ K. We use the Z-value to determine the class to which the data should be assigned. The detailed classification algorithm is listed as follows.

Algorithm: The classification algorithm
Input: A data point x.
Output: The class Ck to which x is assigned.
Step 1: Set the initial value k = 1.
Step 2: Transform the nominal attributes of x into numerical attributes. Assume that x ∈ S, x = (v1, v2, …, vn). If Ai is a numerical attribute, wk = vi and k = k + 1. If Ai is a nominal attribute, wk = µ_Ai^C1(x), w(k+1) = µ_Ai^C2(x), ..., w(k+K-1) = µ_Ai^CK(x) and k = k + K. Repeat Step 2 until y = (w1, w2, …, wn′) is generated, where n′ = (n - q) + qK and q is the number of nominal attributes in A.
Step 3: Initially, i = 1.
Step 4: Compute Zi(y) with the classification function fi(y).
Step 5: If i < K, then i = i + 1 and go to Step 4; otherwise, go to Step 6.
Step 6: Find k = arg min_{1 ≤ i ≤ K} Zi(y); the data x is assigned to the class Ck.
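A minimal sketch of this classification phase follows; the function and variable names are our own, and the learned functions are assumed to be plain Python callables rather than GP trees. The per-class mean and standard deviation are estimated from the training positives as in Sections 3.3 and 4.

```python
import math

def class_statistics(f, samples, target_class):
    """Mean and standard deviation of f over the positive training instances of target_class."""
    values = [f(y) for y, c in samples if c == target_class]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, std

def classify(y, functions, class_stats):
    """Assign y to the class whose function gives the smallest Z-value.
    functions: {class_label: f_i}, class_stats: {class_label: (mean, std)}."""
    def z_value(label):
        mean, std = class_stats[label]
        return abs(functions[label](y) - mean) / max(std, 1e-9)  # guard against zero deviation
    return min(functions, key=z_value)

# Hypothetical usage with two toy "learned" functions on 2-dimensional data:
functions = {'A': lambda y: y[0], 'B': lambda y: y[1]}
train = [((0.9, 0.1), 'A'), ((0.8, 0.2), 'A'), ((0.1, 0.9), 'B'), ((0.2, 0.7), 'B')]
class_stats = {c: class_statistics(functions[c], train, c) for c in functions}
print(classify((0.85, 0.15), functions, class_stats))  # prints 'A'
```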

Table 1. The parameters of GPQuick used in the experiments

Parameters                   Values
Node mutate weight           43.5%
Mutate constant weight       43.5%
Mutate shrink weight         13%
Selection method             Tournament
Tournament size              7
Crossover weight             28%
Crossover weight annealing   20%
Mutation weight              8%
Mutation weight annealing    40%
Population size              1000
Set of functions             {+, -, ×, ÷}
Initial value of X̄i          0
Initial value of ri          10
Generations                  10000

5 The Experimental Results

The proposed learning algorithm based on genetic programming is implemented by modifying GPQuick 2.1 [15]. Since GPQuick is open source on the Internet, it gives us more confidence in building and evaluating the classifier learning algorithm. The parameters used in our experiments are listed in Table 1. We define only the four basic operations {+, -, ×, ÷} for the final functions; that is, each classification function contains only these four basic operations. The population size is set to 1000 and the maximum number of generations of evolution is set to 10000 for all data sets. Although the number of generations is high, GPQuick still performs well in computation time.

The experimental data sets are selected from the UCI Machine Learning repository [1]. We take 15 data sets from the repository in total, including three nominal data sets, four composite data sets (with both nominal and numeric attributes), and eight numeric data sets. The size of the data and the number of attributes in the data sets are quite diverse. The related information on the selected data sets is summarized in Table 2. Since GPQuick evolves quickly, the training of each classification function over 10000 generations can be done in a few seconds or minutes, depending on the number of cases in the training data set. The proposed learning algorithm is efficient compared with the training time of more than an hour reported in [11]. We do not know exactly why GPQuick is so fast in evolution, but the source code [15] is easy to obtain, and anyone can modify the problem class to reproduce the results.

The performance of the proposed classification scheme is evaluated by the average classification error rate of 10-fold cross validation over 10 runs. We summarize the experimental results and compare the effectiveness of different classification models in Table 3. These models include statistical models such as Naïve Bayes [3], NBTree [7], and SNNB [16], the decision-tree-based classifier C4.5 [14], and the association-rule-based classifier CBA [10]. The error rates in Table 3, except those of the GP-based classifier, are cited from [16]. Since the proposed GP-based classifier is randomized, we also show the standard deviations in the table for the readers' reference.

From the experimental results, we observe that the proposed method obtains lower error rates than CBA in 12 of the 15 domains and higher error rates in three domains. It obtains lower error rates than C4.5 rules in 13 domains; only one domain has a higher error rate, and one domain results in a draw. Compared with NBTree and SNNB, the results are also better in most cases. Compared with Naïve Bayes, the proposed method wins in 13 domains and loses in 2 domains. Generally, the classification results of the proposed method are better than the others on average. However, on some data sets the GP-based result is much worse than the others; for example, on the "labor" data the average error rate is 20.1%. The main reason for the high error rate in this case is the small number of samples in the data set: "labor" contains only 57 instances in total and is divided into two classes. When such a small data set is tested with 10-fold cross validation, overfitting occurs in both of the two classification functions. That is also why rule-based classifiers like C4.5 and CBA have similar classification results to ours on the labor data set.
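For completeness, here is a small, hypothetical sketch of the evaluation protocol described above (10-fold cross validation averaged over several runs); it assumes generic `train` and `predict` callables and is not tied to the GPQuick setup.

```python
import random

def cross_validation_error(data, train, predict, folds=10, runs=10, seed=0):
    """Average error rate over `runs` repetitions of `folds`-fold cross validation.
    data: list of (features, label); train(train_set) -> model; predict(model, features) -> label."""
    rng = random.Random(seed)
    errors = []
    for _ in range(runs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        for k in range(folds):
            test = shuffled[k::folds]                  # every folds-th sample forms the test fold
            training = [s for i, s in enumerate(shuffled) if i % folds != k]
            model = train(training)
            wrong = sum(predict(model, x) != c for x, c in test)
            errors.append(wrong / len(test))
    return sum(errors) / len(errors)
```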

6 Conclusions

Classification is an important task in many applications. Classification using genetic programming is a new approach developed recently. However, handling nominal attributes in genetic programming is a difficult problem. In this paper, we proposed a scheme based on the rough membership function to classify data with nominal attributes using genetic programming. Further, to improve the accuracy of classification, we proposed an adaptive interval fitness function and used the minimum Z-value to determine the class to which a data point should be assigned. The experimental results demonstrate that the proposed scheme is feasible and effective. In the future, we will try to reduce the dimensionality of attributes for all possible data sets and to cope with data having missing values.

Table 2. The information of data sets

Data set     Nominal attributes  Numeric attributes  Classes  Cases
australian   8                   6                   2        690
german       13                  7                   2        1000
glass        0                   9                   7        214
heart        7                   6                   2        270
ionosphere   0                   34                  2        351
iris         0                   4                   3        150
labor        8                   8                   2        57
led7         7                   0                   10       3200
lymph        18                  0                   4        148
pima         0                   8                   2        768
sonar        0                   60                  2        208
tic-tac-toe  9                   0                   2        958
vehicle      0                   18                  4        846
waveform     0                   21                  3        5000
wine         0                   13                  3        178

Table 3. The average error rates (%) for compared classifiers

Data sets    NB    NBTree  SNNB  C4.5  CBA   GP-Ave.  S.D.
australian   14.1  14.5    14.8  15.3  14.6  9.5      1.2
german       24.5  24.5    26.2  27.7  26.5  16.7     2.2
glass        28.5  28.0    28.0  31.3  26.1  22.1     2.9
heart        18.1  17.4    18.9  19.2  18.1  11.9     2.7
ionosphere   10.5  12.0    10.5  10.0  7.7   7.2      2.3
iris         5.3   7.3     5.3   4.7   5.3   4.7      1.0
labor        5.0   12.3    3.3   20.7  13.7  20.1     3.0
led7         26.7  26.7    26.5  26.5  28.1  18.7     2.7
lymph        19.0  17.6    17.0  26.5  22.1  13.7     1.5
pima         24.5  24.9    25.1  24.5  27.1  18.3     2.8
sonar        21.6  22.6    16.8  29.8  22.5  5.6      1.9
tic-tac-toe  30.1  17.0    15.4  0.6   0.4   5.2      1.6
vehicle      40.0  29.5    28.4  27.4  31.0  24.7     2.4
waveform     19.3  16.1    17.4  21.9  20.3  11.7     1.8
wine         1.7   2.8     1.7   7.3   5.0   4.5      0.7

References

1. Blake, C., Keogh, E. and Merz, C. J.: UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, Irvine, University of California, Department of Information and Computer Science (1998)
2. Brameier, M. and Banzhaf, W.: A Comparison of Linear Genetic Programming and Neural Networks in Medical Data Mining, IEEE Transactions on Evolutionary Computation, 5, 1 (2001) 17-26
3. Duda, R. O. and Hart, P. E.: Pattern Classification and Scene Analysis, New York: John Wiley and Sons (1973)
4. Freitas, A. A.: A Genetic Programming Framework for Two Data Mining Tasks: Classification and Generalized Rule Induction, Proc. the 2nd Annual Conf. on Genetic Programming, Stanford University, CA, USA: Morgan Kaufmann Publishers (1997) 96-101
5. Heckerman, D. M. and Wellman, P.: Bayesian Networks, Communications of the ACM, 38, 3 (1995) 27-30
6. Kishore, J. K., Patnaik, L. M., and Agrawal, V. K.: Application of Genetic Programming for Multicategory Pattern Classification, IEEE Transactions on Evolutionary Computation, 4, 3 (2000) 242-258
7. Kohavi, R.: Scaling Up the Accuracy of Naïve-Bayes Classifiers: a Decision-Tree Hybrid, Proc. Int. Conf. on Knowledge Discovery & Data Mining, Cambridge/Menlo Park: AAAI Press/MIT Press (1996) 202-207
8. Koza, J. R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press (1992)
9. Koza, J. R.: Introductory Genetic Programming Tutorial, Proc. the First Annual Conf. on Genetic Programming, Stanford University, Cambridge, MA: MIT Press (1996)
10. Liu, B., Hsu, W., and Ma, Y.: Integrating Classification and Association Rule Mining, Proc. the Fourth Int. Conf. on Knowledge Discovery and Data Mining, Menlo Park, CA: AAAI Press (1998) 443-447
11. Loveard, T. and Ciesielski, V.: Representing Classification Problems in Genetic Programming, Proc. the Congress on Evolutionary Computation, COEX Center, Seoul, Korea (2001) 1070-1077
12. Pawlak, Z.: Rough Sets, International Journal of Computer and Information Sciences, 11 (1982) 341-356
13. Pawlak, Z. and Skowron, A.: Rough Membership Functions, in: R. R. Yager, M. Fedrizzi and J. Kacprzyk (Eds.), Advances in the Dempster-Shafer Theory of Evidence (1994) 251-271
14. Quinlan, J. R.: C4.5: Programs for Machine Learning, San Mateo, California: Morgan Kaufmann Publishers (1993)
15. Singleton, A.: Genetic Programming with C++, http://www.byte.com/art/9402/sec10/art1.htm, Byte, Feb. (1994) 171-176
16. Xie, Z., Hsu, W., Liu, Z., and Lee, M. L.: SNNB: A Selective Neighborhood Based Naïve Bayes for Lazy Learning, Proc. the Sixth Pacific-Asia Conf. on Advances in Knowledge Discovery and Data Mining (2002) 104-114
17. Zhang, G. P.: Neural Networks for Classification: a Survey, IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews, 30, 4 (2000) 451-462
