Discretization of Continuous Attributes for Learning Classification Rules

Aijun An and Nick Cercone
Department of Computer Science, University of Waterloo
Waterloo, Ontario N2L 3G1, Canada
Abstract. We present a comparison of three entropy-based discretization methods in the context of learning classification rules. We compare binary recursive discretization with a stopping criterion based on the Minimum Description Length Principle (MDLP) [3], a non-recursive method which simply chooses a number of cut-points with the highest entropy gains, and a non-recursive method that selects cut-points according to both information entropy and the distribution of potential cut-points over the instance space. Our empirical results show that the third method gives the best predictive performance among the three methods tested.
1 Introduction

Recent work on entropy-based discretization of continuous attributes has produced positive results [2, 6]. One promising method is Fayyad and Irani's binary recursive discretization with a stopping criterion based on the Minimum Description Length Principle (MDLP) [3]. The MDLP method is reported to be successful for discretization in decision tree learning and naive-Bayes learning environments [2, 6]. However, little research has been done to investigate whether the method works well with other rule induction methods. We report our performance findings for MDLP discretization in the context of learning classification rules. The learning system we use for our experiments is ELEM2 [1], which learns classification rules from a set of training data by selecting the most relevant attribute-value pairs. We first compare the MDLP method with an entropy-based method that simply selects a number of cut-points with the lowest entropy values. The results show that the MDLP method fails to find sufficient useful cut-points, especially on small data sets. The experiments also reveal that the other method tends to select cut-points from a small local area of the entire value space, especially on large data sets. To overcome these problems, we introduce a new entropy-based discretization method that selects cut-points based on both information entropy and the distribution of potential cut-points. Our conclusion is that MDLP does not give the best results on most of the tested data sets; the proposed method performs better than MDLP in the ELEM2 learning environment.
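To make the non-recursive baseline concrete, the following is a minimal Python sketch of a method that ranks candidate cut-points by the class information entropy defined in Section 2 and keeps the k lowest (equivalently, the k highest entropy gains). It is our own illustration, not the paper's or ELEM2's implementation; the function names and the choice of midpoints between adjacent distinct values as candidates are our assumptions.

    from collections import Counter
    from math import log2

    def class_entropy(labels):
        # Ent(S): Shannon entropy of the class distribution in `labels`.
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def partition_entropy(values, labels, t):
        # E(A, T; S): class information entropy of the binary split at cut-point t.
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        n = len(labels)
        return (len(left) / n) * class_entropy(left) + \
               (len(right) / n) * class_entropy(right)

    def lowest_entropy_cutpoints(values, labels, k):
        # Non-recursive baseline: evaluate every midpoint between adjacent
        # distinct values and keep the k cut-points with lowest E(A, T; S).
        xs = sorted(set(values))
        candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
        candidates.sort(key=lambda t: partition_entropy(values, labels, t))
        return candidates[:k]

Because Ent(S) is fixed for a given set of instances, minimizing E(A, T; S) and maximizing the entropy gain select the same cut-points; this sketch also makes visible why such a method can cluster its selections in one small region of the value range, the problem the proposed method addresses.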
2 The MDLP Discretization Method

Given a set S of instances, an attribute A, and a cut-point T, the class information entropy of the partition induced by T, denoted E(A, T; S), is defined as
E(A, T; S) = \frac{|S_1|}{|S|}\,\mathrm{Ent}(S_1) + \frac{|S_2|}{|S|}\,\mathrm{Ent}(S_2),

where \mathrm{Ent}(S_i) is the class entropy of the subset S_i, defined as

\mathrm{Ent}(S_i) = -\sum_{j=1}^{k} P(C_j, S_i) \log P(C_j, S_i),
where there are k classes C_1, \ldots, C_k and P(C_j, S_i) is the proportion of examples in S_i that have class C_j. For an attribute A, the MDLP method selects the cut-point T_A for which E(A, T_A; S) is minimal among all the boundary points. The training set is then split into two subsets by this cut-point. Subsequent cut-points are selected by recursively applying the same binary discretization method to each of the newly generated subsets, until the following stopping condition is achieved:

\frac{\log_2(N - 1)}{N} + \frac{\Delta(A, T; S)}{N} \ge \mathrm{Gain}(A, T; S),

where N is the number of instances in S, \mathrm{Gain}(A, T; S) = \mathrm{Ent}(S) - E(A, T; S) is the information gain of the cut-point T, and

\Delta(A, T; S) = \log_2(3^k - 2) - \bigl[k\,\mathrm{Ent}(S) - k_1\,\mathrm{Ent}(S_1) - k_2\,\mathrm{Ent}(S_2)\bigr],

with k_1 and k_2 the numbers of classes represented in S_1 and S_2, respectively.
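For concreteness, the recursive procedure and the stopping test can be sketched in a few lines of Python. This is our illustration of the Fayyad-Irani criterion as stated above, not the implementation used in the paper; it scans all midpoints between distinct adjacent values (the entropy-minimizing cut-point is known to fall on a class boundary, so this wider candidate set is safe) and recurses only when a split passes the MDLP test.

    from collections import Counter
    from math import log2

    def ent(labels):
        # Ent(S): class entropy.
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def mdlp_cutpoints(values, labels):
        # Binary recursive discretization with the MDLP stopping criterion.
        pairs = sorted(zip(values, labels))
        ys = [y for _, y in pairs]
        n = len(pairs)
        base = ent(ys)                  # Ent(S)
        best = None                     # (gain, split index, cut-point value)
        for i in range(1, n):
            if pairs[i - 1][0] == pairs[i][0]:
                continue                # cut-points lie between distinct values
            e = (i / n) * ent(ys[:i]) + ((n - i) / n) * ent(ys[i:])
            gain = base - e             # Gain(A, T; S) = Ent(S) - E(A, T; S)
            if best is None or gain > best[0]:
                best = (gain, i, (pairs[i - 1][0] + pairs[i][0]) / 2)
        if best is None:
            return []                   # no admissible cut-point
        gain, i, cut = best
        k, k1, k2 = len(set(ys)), len(set(ys[:i])), len(set(ys[i:]))
        delta = log2(3 ** k - 2) - (k * base - k1 * ent(ys[:i]) - k2 * ent(ys[i:]))
        if gain <= log2(n - 1) / n + delta / n:
            return []                   # stopping condition met: reject the split
        left = [x for x, _ in pairs[:i]]
        right = [x for x, _ in pairs[i:]]
        return mdlp_cutpoints(left, ys[:i]) + [cut] + mdlp_cutpoints(right, ys[i:])

Because the threshold on the right-hand side shrinks only as 1/N, small subsets rarely justify a split under this test, which is consistent with the observation in Section 1 that MDLP fails to find sufficient cut-points on small data sets.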