Ninth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing
A New Classifier to Deal with Incomplete Data

Jun Wu, Yo Seung Kim, Chi-Hwa Song and Won Don Lee
Abstract—Classification is a very important research topic in knowledge discovery and machine learning. The decision tree is one of the well-known data mining methods used in classification problems. But sometimes the data set for classification contains vectors that are missing one or more of the feature values; such data is called incomplete data. Generally, the existence of incomplete data degrades the learning quality of classification models, so handling incomplete data well is important and necessary for building a high-quality classification model and for applying the classifier to real-life applications. In this paper a new decision tree is proposed to solve the incomplete data classification problem, and it shows very good performance. At the same time, the new method solves two other important problems, the rule refinement problem and the importance preference problem, which gives the proposed approach outstanding advantages. Significantly, this is the first classifier which can deal with all of these problems at the same time.
I. INTRODUCTION
Classification is an important problem which has been researched for many years, but the existence of incomplete data always decreases the quality of classification models. To show the definition of missing data more intuitively, take the following example: if X1=(1,2,3,4), then (?,2,3,4) is X1 with 25% incomplete data, and (1,?,?,4) is X1 with 50% incomplete data. The classification problem can be separated into two phases: the learning phase and the classification phase. The learning phase builds the classification model from a set of training data, and the classification phase classifies unknown cases into one of the predefined classes. Many methods dealing with incomplete data in the classification problem have been proposed, but most of them only focus on handling incomplete data in the training set for learning.
Jun Wu is a Master's degree candidate at the Department of Computer Science and Engineering, ChungNam National University (e-mail: [email protected]).
Yo Seung Kim is a Master's degree candidate at the Department of Computer Science and Engineering, ChungNam National University (e-mail: [email protected]).
Chi-Hwa Song is a professor at the Department of Computer Science and Engineering, ChungNam National University (e-mail: [email protected]).
Won Don Lee is a professor at the Department of Computer Science and Engineering, ChungNam National University, DaeJeon, Korea, since 1987 (corresponding author; phone: +82 42 821-5448; e-mail: [email protected]).
For incomplete values appearing in the classification phase, most of the current approaches cannot work. The decision-tree classifier can predict the class of data from their given attribute values, and this methodology has been used successfully in many applications. C4.5 is currently the only algorithm that can work on the incomplete data classification problem (for both nominal and numerical data) in both the learning phase and the classification phase. But C4.5 is not very efficient when handling incomplete data, as analyzed in Section 2. To solve the incomplete data classification problem smoothly in both the learning phase and the classification phase, a new decision tree method is proposed, and it solves the incomplete data classification problem very well. Besides the incomplete data classification problem, the proposed method solves two other important problems: the rule refinement problem and the importance preference problem. The rule refinement problem is how to add newly arriving data to a constructed decision tree to generate a more refined tree. The importance preference problem is how to combine the information from different sensors in different environments with different importance. All these problems and their solutions are discussed in detail in Section 3. In Section 4 the experimental results are presented, and they show the very good performance of the proposed method. It is noticed that the rules generated by the new decision tree can easily be combined with newly arriving data to generate a new tree with more information; the experiments show the excellent performance of the proposed method in this respect. The conclusion is given in Section 5.

II. PREVIOUS WORKS FOR INCOMPLETE DATA CLASSIFICATION PROBLEM
In the past decades, many algorithms dealing with the problem of classification with incomplete data have been proposed, and they mainly follow two strategies: (1) ignore the data with incomplete values, or (2) fill the incomplete value with an appropriate alternative. Most of these algorithms only focus on handling incomplete data in the training set for learning, not on incomplete values appearing in the classification phase. The famous C4.5 developed by J. Ross Quinlan [1] is a representative of these strategies. C4.5 can deal with both nominal data and numerical data, and it is the only algorithm which can currently work on the incomplete data classification
problem in both the learning phase and the classification phase. However, it is not very efficient when handling incomplete data; the results of C4.5 will be compared with those of the proposed new decision tree in the experiments. The related terms in C4.5 are defined as follows:

A: an attribute in the data set.
U: the set of training events at each node.
U_Aj: the subset of U which has the outcome value j for a single attribute A.
|U|: the number of events in the set U.
|U_Aj|: the number of events in the set U_Aj.
Class: C_1, C_2, ..., C_(k-1), C_k.
Based on a single attribute A, the outcomes are defined as {O_A1, O_A2, ..., O_A(n-1), O_An}. If A is a continuous attribute, it has two outcomes according to a specific threshold Z; the process of calculating the threshold Z is described later.
freq(C_i, U): the number of events in U that have the class value C_i.

The functions needed to calculate info(U) and info(U_Aj) are as follows:

info(U) = − Σ_{i=1}^{k} ( freq(C_i, U) / |U| ) · log₂( freq(C_i, U) / |U| )    (1)

info_A(U) = Σ_{j=1}^{n} ( |U_Aj| / |U| ) · info(U_Aj)    (2)

info(U_Aj) = − Σ_{i=1}^{k} ( freq(C_i, U_Aj) / |U_Aj| ) · log₂( freq(C_i, U_Aj) / |U_Aj| )    (3)
Here, k is the number of class values. Only the events without incomplete data are taken into account in equations (1), (2) and (3). Gain(A) is then calculated as:

Gain(A) = P · ( info(U) − info_A(U) )    (4)

where P is the fraction of events whose value of A is known, i.e. the number of events without incomplete data divided by the number of all events. A C4.5 decision tree consists of test nodes and leaf nodes, and the test nodes are selected from the attributes. C4.5 normalizes Gain(A) with Split_info(A), which is defined as:

Split_info(A) = − Σ_{j=1}^{n+1} ( |U_Aj| / |U| ) · log₂( |U_Aj| / |U| )    (5)
Here, the events with incomplete data are regarded as an additional group, so the number of subsets runs from n to n+1. The gain ratio of the attribute A is then defined as:

Gain_ratio(A) = Gain(A) / Split_info(A)    (6)
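As a rough illustration of equations (1)-(6), the following Python sketch computes the gain ratio of one nominal attribute in the way described above. It is only our simplified reading of C4.5's treatment of unknown values (function and variable names are ours, not Quinlan's), assuming unknown entries are marked as None.

```python
import math
from collections import Counter

def entropy(labels):
    """info(U): entropy of a list of class labels, equation (1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def c45_gain_ratio(values, labels):
    """Gain ratio of one nominal attribute, equations (1)-(6).
    Unknown values are passed as None: they are left out of the gain,
    the gain is scaled by the fraction of known values, and the unknown
    events form one extra group in the split information."""
    known = [(v, c) for v, c in zip(values, labels) if v is not None]
    if not known:
        return 0.0
    p_known = len(known) / len(values)

    info_u = entropy([c for _, c in known])
    groups = {}
    for v, c in known:
        groups.setdefault(v, []).append(c)
    info_a = sum(len(g) / len(known) * entropy(g) for g in groups.values())
    gain = p_known * (info_u - info_a)                      # equation (4)

    sizes = [len(g) for g in groups.values()]
    n_unknown = len(values) - len(known)
    if n_unknown:
        sizes.append(n_unknown)                             # the extra (n+1)-th group
    split = -sum((s / len(values)) * math.log2(s / len(values)) for s in sizes)
    return gain / split if split > 0 else 0.0
```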
The attribute that has the biggest Gain_ratio among the attributes is selected to be a test node, and this process goes on until a leaf node is reached. For continuous attributes, the optimal threshold Z must be calculated first, and the attribute is then divided into two subsets by this threshold. The process of calculating the optimal threshold Z is as follows (a small code sketch of this search is given after Table 1). First, all the different values of the attribute are sorted in order; assume the ordered values are W1, W2, W3, ..., Wi, ..., Wm, where m is the number of different values of this attribute. Second, assume the threshold lies between Wi and Wi+1; the attribute is then divided into two subsets and equations (1), (2) and (3) are used to calculate the entropy. Third, the threshold with the smallest entropy is chosen as the optimal threshold. After the optimal threshold is obtained, the Gain_ratios of all attributes can be calculated using equations (4), (5) and (6), and the attribute with the biggest Gain_ratio is set to be the test node. This is the process of deciding a test node; the same process is applied repeatedly to each subset until a leaf node is reached. If U has only one class value, it is set to be a leaf node; a leaf node can also contain several classes if the tree is pruned.

However, this method wastes much useful information while handling incomplete data, because it ignores an event with an incomplete value when calculating the gain and thereby wastes the information of the other attributes of that event. For example, in Table 1, event 1 is ignored when calculating the gain because its Temperature value is missing, but this wastes the information of its "Outlook" and "Windy?" attributes. The concern, therefore, is how to use all the useful information in the original data to generate a more efficient classifier.

Table 1 Training data set in the example

Event# | Outlook  | Temperature | Windy? | Class
1      | sunny    | ?           | false  | Don't Play
2      | ?        | cool        | true   | Don't Play
3      | rain     | hot         | ?      | Play
4      | rain     | warm        | ?      | Play
5      | sunny    | ?           | false  | Play
6      | ?        | cool        | true   | Play
7      | overcast | ?           | false  | Don't Play
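The threshold search described above can be sketched as follows. The midpoint rule for the candidate cut is our assumption, since the text only says the threshold lies between Wi and Wi+1; the entropy helper mirrors equation (1).

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Search the threshold Z for a continuous attribute: try a cut between
    each pair of adjacent sorted values and keep the one whose weighted
    entropy (equations (2)/(3)) is smallest."""
    pairs = sorted((v, c) for v, c in zip(values, labels) if v is not None)
    xs = [v for v, _ in pairs]
    best_z, best_info = None, float("inf")
    for i in range(len(xs) - 1):
        if xs[i] == xs[i + 1]:
            continue
        z = (xs[i] + xs[i + 1]) / 2.0       # assumed midpoint between W_i and W_{i+1}
        left = [c for v, c in pairs if v <= z]
        right = [c for v, c in pairs if v > z]
        info = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if info < best_info:
            best_z, best_info = z, info
    return best_z
```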
III. A NEW DECISION TREE FOR INCOMPLETE DATA

A. Solving the Incomplete Data Classification Problem

To retain all the useful information and obtain a solution for the incomplete data classification problem, a new data expression is proposed in Table 2. In this data expression, each row is transformed from an original event, and there is a weight value for each event, which shows how important the event is. For example, an event with a weight of 20 has the same importance as 20 events with a weight of 1. This kind of data expression looks simple, but it improves C4.5 greatly because it makes it possible to express each attribute value probabilistically.
This means it is not necessary to give a single value to a particular attribute of one event; instead it is possible to give a sense of probability to an attribute. In this case, from all of the training data, the corresponding probability for the incomplete data in a given attribute can be calculated. Given a particular attribute and class value, the number of events with each possible value of this attribute is counted and divided by the total number of events for this class value, and the resulting fractions are assigned to the incomplete entry. For example, the attribute "Outlook" has 3 possible values: "sunny", "overcast" and "rainy". There are 1 "sunny" event and 1 "overcast" event for the class value "Don't Play", so the total number of events for "Don't Play" is 2. And there are 2 "rainy" events and 1 "sunny" event for the class value "Play", so the total number of events for "Play" is 3. For event #2 the Outlook attribute is incomplete and the class value is "Don't Play", so it is assigned to be 1/2 "sunny", 1/2 "overcast" and 0 "rainy". For event #6 the Outlook attribute is incomplete and the class value is "Play", so it is assigned to be 1/3 "sunny", 0 "overcast" and 2/3 "rainy". In this way, Table 2 can be filled.

Table 2 New data expression from Table 1
(Outlook: sunny, overcast, rainy; Temperature: cool, warm, hot; Windy?: True, False; Class: Play, Don't Play)

Event # | Weight | sunny | overcast | rainy | cool | warm | hot | True | False | Play | Don't Play
1 | 1 | 1   | 0   | 0   | 1   | 0   | 0   | 0   | 1   | 0 | 1
2 | 1 | 1/2 | 1/2 | 0   | 1   | 0   | 0   | 1   | 0   | 0 | 1
3 | 1 | 0   | 0   | 1   | 0   | 0   | 1   | 1/2 | 1/2 | 1 | 0
4 | 1 | 0   | 0   | 1   | 0   | 1   | 0   | 1/2 | 1/2 | 1 | 0
5 | 1 | 1   | 0   | 0   | 1/3 | 1/3 | 1/3 | 0   | 1   | 1 | 0
6 | 1 | 1/3 | 0   | 2/3 | 1   | 0   | 0   | 1   | 0   | 1 | 0
7 | 1 | 0   | 1   | 0   | 1   | 0   | 0   | 0   | 1   | 0 | 1

Using this data expression, new functions are derived to build a new decision tree. The definitions and functions are as follows:

Class membership weight: C_1(m), C_2(m), ..., C_(k-1)(m), C_k(m). C_i(m) represents how much the mth event belongs to the class C_i. Here, Σ_{i=1}^{k} C_i(m) = 1.

Outcome membership weight: O_A1(m), O_A2(m), ..., O_A(n-1)(m), O_An(m). O_Aj(m) shows how much the outcome value j of the attribute A can happen in the mth event. Here, Σ_{j=1}^{n} O_Aj(m) = 1.

For the incomplete data, the missing value O_Aj(m'_ci) is filled from the events of the same class:

O_Aj(m'_ci) = Σ_{m_ci} O_Aj(m_ci) / N_ci    (7)

where m_ci denotes an event of class C_i without incomplete data, m'_ci denotes an event of class C_i with incomplete data, and N_ci is the number of events which belong to class C_i.

U_Aj: the subset of U which has the outcome value j for the attribute A. For example, if the attribute Outlook has the three values sunny, overcast and rain, the set U is divided into three subsets.
n(U): the number of events in the set U.
Weight(m, U): the weight value of the mth event in the set U.
freq(C_i, U): the number of events in the set U which have the class value C_i.
|U|: the number of events in the set U. One event means an event with the weight value 1, so an event with the weight value X counts as X events with the same distribution of attribute and class values.
|U_Aj|: the number of events in the set U_Aj.
freq(C_i, U_Aj): the number of events in the set U_Aj which have the class value C_i.

Using these new definitions, the entropy functions can be derived as follows:

info(U) = − Σ_{i=1}^{k} ( Σ_{m=1}^{n(U)} Weight(m, U) · C_i(m) / |U| ) · log₂( freq(C_i, U) / |U| )    (8)

info(U_Aj) = − Σ_{i=1}^{k} ( Σ_{m=1}^{n(U)} Weight(m, U) · C_i(m) · O_Aj(m) / |U_Aj| ) · log₂( freq(C_i, U_Aj) / |U_Aj| )    (9)

|U_Aj| = Σ_{m=1}^{n(U_Aj)} Weight(m, U) · O_Aj(m)    (10)

As shown in equation (10), |U_Aj| is calculated by multiplying O_Aj(m) and Weight(m, U_Aj) and adding up all the products; O_Aj(m) is the outcome possibility value of each event in U_Aj and Weight(m, U_Aj) is the weight of each event m. The functions to calculate the Gain_ratio are then given in (11), (12) and (13):

Split_info(A) = − Σ_{j=1}^{n} ( |U_Aj| / |U| ) · log₂( |U_Aj| / |U| )    (11)

Gain(A) = info(U) − info_A(U), with info_A(U) = Σ_{j=1}^{n} ( |U_Aj| / |U| ) · info(U_Aj) as in (2)    (12)

Gain_ratio(A) = Gain(A) / Split_info(A)    (13)
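The construction of this data expression and the weighted entropy of equations (8)-(13) can be sketched roughly as below. This is only our reading of the method: the dictionary-based event representation, the function names and the tie handling are assumptions, not the paper's code.

```python
import math
from collections import defaultdict

# A raw event is assumed to be a dict with a weight, a crisp class label and
# raw attribute values (None when missing), e.g.
# {"weight": 1, "class": "Don't Play", "Outlook": None, ...}
# fill_missing() adds one membership distribution O_Aj(m) per attribute
# ("Outlook_dist", ...); "cls_dist" holds the class membership weights C_i(m),
# crisp {"Play": 0.0, "Don't Play": 1.0} for ordinary training events.

def fill_missing(events, attr, values, cls_names):
    """Fill a missing value of `attr` with the class-conditional
    distribution of its observed values (Table 2 / equation (7))."""
    for cls in cls_names:
        totals, count = defaultdict(float), 0.0
        for e in events:
            if e["class"] == cls and e.get(attr) is not None:
                totals[e[attr]] += 1.0
                count += 1.0
        for e in events:
            if e["class"] == cls and e.get(attr) is None:
                e[attr + "_dist"] = ({v: totals[v] / count for v in values}
                                     if count else {v: 1.0 / len(values) for v in values})
    for e in events:                                   # known values become crisp 0/1
        if e.get(attr) is not None:
            e[attr + "_dist"] = {v: float(v == e[attr]) for v in values}

def weighted_info(events, cls_names, member=lambda e: 1.0):
    """Equations (8)/(9): entropy of U (or U_Aj) under membership weights."""
    size = sum(e["weight"] * member(e) for e in events)        # |U| or |U_Aj|, eq. (10)
    if size == 0:
        return 0.0, 0.0
    info = 0.0
    for cls in cls_names:
        freq = sum(e["weight"] * member(e) * e["cls_dist"][cls] for e in events)
        if freq > 0:
            info -= (freq / size) * math.log2(freq / size)
    return info, size

def new_gain_ratio(events, attr, values, cls_names):
    """Equations (11)-(13) for one nominal attribute."""
    info_u, total = weighted_info(events, cls_names)
    info_a, split = 0.0, 0.0
    for v in values:
        member = lambda e, v=v: e[attr + "_dist"][v]           # O_Aj(m)
        info_j, size_j = weighted_info(events, cls_names, member)
        info_a += (size_j / total) * info_j
        if size_j > 0:
            split -= (size_j / total) * math.log2(size_j / total)
    return (info_u - info_a) / split if split > 0 else 0.0
```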
Therefore, the attribute which has the biggest Gain_ratio in the node can be decided by using the above functions. To illustrate the process of generating the new decision tree, take the data in Table 2 as an example. The attribute Outlook in Table 2 is chosen to obtain Table 3.

Table 3 Information for the attribute Outlook

         | Play | Don't play | total
Sunny    | 4/3  | 3/2        | 17/6
Overcast | 0    | 3/2        | 3/2
Rainy    | 8/3  | 0          | 8/3
total    | 4    | 3          | 7
Using the above functions, the following values can be calculated: Info(outlook)=0.98523, InfoA(outlook)=0.3819 and Split_info(outlook)=1.53479, so Gain_ratio(outlook)=0.393098. Similarly, it can be calculated that Gain_ratio(Temperature)=0.25783 and Gain_ratio(Windy?)=0.600544; the attribute Windy? is therefore chosen as the test node because it has the largest Gain_ratio. In this way, the decision tree of Fig.1 is generated.

Fig.1 New decision tree built from Table 2

For incomplete data in the testing data set, the proposed method goes through all the possible branches below the current node and takes into account that some are more probable than others. In the new decision tree the different outcomes carry probabilities, so all the probabilities of a particular outcome are added and compared with those of the other outcomes, and the outcome with the largest probability is selected as the final outcome.
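A minimal sketch of this classification step might look as follows, assuming the tree is stored as nested dictionaries: when the tested attribute is missing, the class distributions returned from every branch are combined, weighted by the branch probabilities observed during training. The node layout and key names are our assumptions, not the paper's.

```python
from collections import defaultdict

def classify(node, event):
    """Return a class distribution for one test event. A leaf stores
    "class_dist"; an internal node stores the tested attribute, its
    children and the branch probabilities |U_Aj|/|U| seen in training."""
    if "class_dist" in node:                       # leaf node
        return node["class_dist"]
    value = event.get(node["attr"])                # None if missing
    if value is not None and value in node["children"]:
        return classify(node["children"][value], event)
    result = defaultdict(float)
    for v, child in node["children"].items():      # missing: follow every branch
        for cls, p in classify(child, event).items():
            result[cls] += node["branch_prob"][v] * p
    return dict(result)

def predict(node, event):
    dist = classify(node, event)
    return max(dist, key=dist.get)                 # outcome with the largest probability
```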
B. Rule Refinement Problem

It is well known that the rules generated from a decision tree are simpler than the original data and give predictions as accurate as the decision tree. How to build a new decision tree by adding new data to an already constructed tree is called the rule refinement problem. Is it necessary to add the new data to the original data and build the new tree from all of it? After all, training is a time-consuming process, and it costs much storage to keep all the original data. In the proposed method, the rule refinement problem is solved very well. Assume that only the tree in Fig.1 is available and that the original data was deleted after building the tree. Rules are generated from Fig.1 as follows. For example, if the leftmost leaf node is chosen, trace back from the leaf node to the test nodes; the Windy? attribute, the Temperature attribute and the Outlook attribute are selected on this path. For the attribute Outlook, sunny is assigned 1, and overcast and rain are assigned 0. For the attribute Temperature, cool is assigned 1, and warm and hot are assigned 0. For the attribute Windy?, True is assigned 1 and False is assigned 0. For the class, Play is assigned 1/3 and Don't Play is assigned 1/2. So for this leaf node, the generated rule is Event #1 in Table 4. An attribute that does not appear on the path to a leaf node is taken to be "don't care", because it was not selected as a test node while constructing the decision tree, and it is therefore assigned equal probabilities. For example, for the rightmost leaf node in Fig.1 there is no test on Outlook, so sunny, overcast and rain are all assigned the equal probability 1/3 for this event in Table 4. New data can easily be added to the rules in Table 4 because they share the same data expression: in Table 5, events 9 to 12 are new coming data, and they are combined with the rules in Table 4, so that Table 5 can be used to generate a new decision tree. To evaluate this kind of decision tree, experiments are also designed in Section 4.

Table 4 Rule table generated from Fig.1
(Outlook: sunny, overcast, rain; Temperature: cool, warm, hot; Windy?: True, False; Class: Play, Don't Play)

Event # | Weight | sunny | overcast | rain | cool | warm | hot | True | False | Play | Don't Play
1 | 1 | 1   | 0   | 0   | 1 | 0 | 0 | 1 | 0 | 1/3 | 1/2
2 | 1 | 0   | 1   | 0   | 1 | 0 | 0 | 1 | 0 | 0   | 1/2
3 | 1 | 0   | 0   | 1   | 1 | 0 | 0 | 1 | 0 | 2/3 | 0
4 | 1 | 1/3 | 1/3 | 1/3 | 0 | 0 | 0 | 1 | 0 | 1/2 | 0
5 | 1 | 1/3 | 1/3 | 1/3 | 0 | 0 | 1 | 1 | 0 | 1/2 | 0
6 | 1 | 1/3 | 1/3 | 1/3 | 1 | 0 | 0 | 0 | 1 | 1/3 | 2
7 | 1 | 1/3 | 1/3 | 1/3 | 0 | 1 | 0 | 0 | 1 | 5/6 | 0
8 | 1 | 1/3 | 1/3 | 1/3 | 0 | 0 | 1 | 0 | 1 | 5/6 | 0
Table 5 Rule table with new coming data added
(Outlook: sunny, overcast, rain; Temp(°F): 60, 70, 80; Windy?: True, False; Class: Play, Don't Play)

Event # | Weight | sunny | overcast | rain | 60 | 70 | 80 | True | False | Play | Don't Play
1  | 1  | 1   | 0   | 0   | 1 | 0 | 0 | 1 | 0 | 1/3 | 1/2
2  | 1  | 0   | 1   | 0   | 1 | 0 | 0 | 1 | 0 | 0   | 1/2
3  | 1  | 0   | 0   | 1   | 1 | 0 | 0 | 1 | 0 | 2/3 | 0
4  | 1  | 1/3 | 1/3 | 1/3 | 0 | 0 | 0 | 1 | 0 | 1/2 | 0
5  | 1  | 1/3 | 1/3 | 1/3 | 0 | 0 | 1 | 1 | 0 | 1/2 | 0
6  | 1  | 1/3 | 1/3 | 1/3 | 1 | 0 | 0 | 0 | 1 | 1/3 | 2
7  | 1  | 1/3 | 1/3 | 1/3 | 0 | 1 | 0 | 0 | 1 | 5/6 | 0
8  | 1  | 1/3 | 1/3 | 1/3 | 0 | 0 | 1 | 0 | 1 | 5/6 | 0
9  | 10 | 0   | 1   | 0   | 1 | 0 | 0 | 1 | 0 | 1   | 0
10 | 1  | 1   | 0   | 0   | 0 | 1 | 0 | 1 | 0 | 0   | 1
11 | 1  | 0   | 1   | 0   | 0 | 0 | 1 | 0 | 1 | 1   | 0
12 | 1  | 0   | 0   | 1   | 1 | 0 | 0 | 1 | 0 | 1   | 0
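A rough sketch of this rule refinement step: each leaf of the tree is turned back into one weighted event (one rule row of Table 4) by tracing the path from the root, attributes not tested on that path get a uniform distribution, and the new events are simply appended before the tree is rebuilt. The tree layout, the function names and the `build_tree` callable are our assumptions.

```python
def leaf_rules(node, attrs, path=None):
    """Turn every leaf of the tree back into one weighted event (one row
    of Table 4): attributes tested on the path get a crisp 0/1 value,
    untested ("don't care") attributes get a uniform distribution."""
    path = path or {}
    if "class_dist" in node:                               # leaf reached
        rule = {"weight": 1, "cls_dist": dict(node["class_dist"])}
        for attr, values in attrs.items():
            if attr in path:
                rule[attr + "_dist"] = {v: float(v == path[attr]) for v in values}
            else:
                rule[attr + "_dist"] = {v: 1.0 / len(values) for v in values}
        return [rule]
    rules = []
    for v, child in node["children"].items():
        rules += leaf_rules(child, attrs, {**path, node["attr"]: v})
    return rules

def refine(tree, new_events, attrs, build_tree):
    """Rule refinement: rules from the old tree plus the new events are fed
    to the tree builder again; the original training data is not needed."""
    return build_tree(leaf_rules(tree, attrs) + new_events)
```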
C. Importance Preference Problem

In C4.5 all the data is considered to be equally important, but consider the case where some training data comes from a more reliable environment and some from a less reliable one. For example, suppose there are two sensors collecting the data: sensor 1 is in a reliable environment and obtains almost true data, while sensor 2 is in a bad environment and obtains less reliable data. The first sensor's data is therefore more reliable and important than the second sensor's. How can this be handled? In the proposed method it is solved smoothly by modifying the weight, that is, by giving the more important events a higher weight, as shown in Table 6. It looks very simple, but it is quite efficient and useful for combining different sensors with different importance.

Table 6 New data expression of different sensors with different importance
(Outlook: sunny, overcast, rain; Temperature: cool, warm, hot; Windy?: True, False; Class: Play, Don't Play)

Sensor | Event # | Weight | sunny | overcast | rain | cool | warm | hot | True | False | Play | Don't Play
1 | 1 | 2 | 1   | 0   | 0   | 2/4 | 1/4 | 1/4 | 0   | 1   | 0 | 1
1 | 2 | 2 | 2/5 | 1/5 | 2/5 | 1   | 0   | 0   | 1   | 0   | 0 | 1
1 | 3 | 2 | 0   | 0   | 1   | 0   | 0   | 1   | 2/5 | 3/5 | 1 | 0
1 | 4 | 2 | 0   | 0   | 1   | 0   | 1   | 0   | 2/5 | 3/5 | 1 | 0
2 | 5 | 1 | 1   | 0   | 0   | 2/4 | 1/4 | 1/4 | 0   | 1   | 1 | 0
2 | 6 | 1 | 2/5 | 1/5 | 2/5 | 1   | 0   | 0   | 1   | 0   | 0 | 1
2 | 7 | 1 | 0   | 1   | 0   | 2/4 | 1/4 | 1/4 | 0   | 1   | 0 | 1
IV. EXPERIMENTS

To evaluate and compare the performance of the proposed method, data sets from the UCI Machine Learning Repository [14] are used in the experiments. The Iris data set has 150 events and the Pima data set has 768 events. The data sets are modified into incomplete data sets by randomly deleting a given percentage of the data. The incomplete training rate is shown in the first column of Table 7 and Table 8, and the incomplete testing rate is shown in the first column of Table 9 and Table 10. The generation of incomplete data has two constraints: (1) each event keeps at least one attribute value, and (2) each attribute keeps at least one value (a small sketch of this deletion procedure is given after Fig.2). The result of the experiment is evaluated by the average error rate of 10-fold cross validation of each data set over 10 runs. 10-fold cross validation divides the data set into 10 blocks; 9 blocks are merged as the training data and the remaining block is used as the testing data.

The experiment is divided into two parts. In the first part incomplete data is used only in the training data set and the testing data is complete; Table 7 and Table 8 are the results of this part. In this part the "Rule+data" setting is designed as follows: after merging 9 blocks into one training data set, this training data set is divided randomly into 2 blocks. One block is used to generate the New Tree and obtain the rules, and these rules are then added to the other block. This Rule+data set is used to generate the New Tree again for classification. The results show that although the data of the first block is deleted, the rules from the New Tree of the first block can be combined with the newly added data and still give a good result. This process is shown in Fig.2.

Fig.2 The process of classifying incomplete data
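A minimal sketch of the deletion procedure under the two constraints, as we read them; the exact sampling scheme used by the authors is not specified, so this is only one plausible implementation.

```python
import random

def make_incomplete(events, attrs, rate, seed=0):
    """Randomly delete `rate` of all attribute values while keeping at least
    one known value in every event and in every attribute column."""
    rng = random.Random(seed)
    cells = [(i, a) for i in range(len(events)) for a in attrs]
    rng.shuffle(cells)
    to_delete, deleted = int(rate * len(cells)), 0
    for i, a in cells:
        if deleted >= to_delete:
            break
        event_known = sum(events[i][b] is not None for b in attrs)
        attr_known = sum(e[a] is not None for e in events)
        if event_known > 1 and attr_known > 1:     # respect both constraints
            events[i][a] = None
            deleted += 1
    return events
```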
In the second part of the experiment, incomplete data is used in both the training data set and the testing data set. First, training data with 10% incomplete data is used to generate the decision tree, then testing data with a specified incomplete rate is used to obtain the error rate; Table 9 and Table 10 are the results of this part. In both parts, the results of the New Tree and Rule+data are better than those of C4.5. It is noticed that in the Rule+data cases, half of the original data is deleted after generating the new decision tree; the rules generated from that tree are combined with the other half of the original data and used to generate a new decision tree, and the performance of this Rule+data setting is still better than C4.5 when incomplete data exists. This means the proposed method has a very good ability to generate useful rules, and it is not even necessary to store the original data, because the rules generated from the decision tree can easily be combined with newly added data. This also shows that the proposed new decision tree is very efficient and useful.

Table 7 Error rate of classifying Iris data with complete testing data

Training data's incomplete rate | C4.5 | New Tree | Rule+data
0   | 4.7%   | 4.7%  | 5.33%
5%  | 5.69%  | 4.7%  | 5.33%
10% | 6.16%  | 5.33% | 6.0%
20% | 7.7%   | 6.0%  | 6.7%
30% | 12.09% | 7.32% | 7.99%
Table 8 Error rate of classifying Pima data with complete testing data

Training data's incomplete rate | C4.5 | New Tree | Rule+data
0   | 25.4% | 25.4%  | 25.65%
5%  | 25.7% | 25.5%  | 25.65%
10% | 26.3% | 25.92% | 26.18%
20% | 29.1% | 26.58% | 26.84%
30% | 28.6% | 27.11% | 27.5%

Table 9 Error rate of classifying Iris data with 10% incomplete training data

Testing data's incomplete rate | C4.5 | New Tree | Rule+data
0   | 4.7%   | 4.7%   | 5.33%
5%  | 10.74% | 6.0%   | 9.3%
10% | 13.46% | 8.66%  | 12.6%
20% | 19.7%  | 15.33% | 17.32%
30% | 29.4%  | 21.33% | 22.98%

Table 10 Error rate of classifying Pima data with 10% incomplete training data

Testing data's incomplete rate | C4.5 | New Tree | Rule+data
0   | 25.4%  | 25.4%  | 25.65%
5%  | 27.43% | 26.44% | 26.57%
10% | 28.12% | 26.97% | 27.23%
20% | 32.45% | 28.03% | 29.87%
30% | 34.57% | 30.52% | 31.97%

V. CONCLUSION

In this paper, a new method of generating a decision tree is proposed to deal with the problem of classification with incomplete data, and the experiments show the good performance of the proposed method. The new classifier can also deal with two other problems: the rule refinement problem and the importance preference problem. When handling the rule refinement problem, the rules generated from the new decision tree can easily be combined with new coming data to generate another new decision tree with more information; the experiments show that the decision tree generated from the rules plus the new data gives better results than C4.5 when incomplete data exists. To solve the importance preference problem, the weight value is adjusted to fit different sensors with different importance, which means the method can be used to combine sensors in different environments. This is the first classifier which can handle all of these problems at the same time. Because of all these advantages, the proposed new decision tree method is strongly recommended.

REFERENCES
[1] J. R. Quinlan, C4.5: Programs for Machine Learning, San Mateo, Calif.: Morgan Kaufmann, 1993.
[2] Dong-Hui Kim, Dong-Hyeok Lee and Won Don Lee, "Classifier using Extended Data Expression," IEEE Mountain Workshop on Adaptive and Learning Systems, pp. 154-159, July 2006.
[3] J. H. Friedman, "A recursive partitioning decision rule for non-parametric classification," IEEE Transactions on Computers, 1977, pp. 404-408.
[4] J. W. Grzymala-Busse, "Rough set strategies to data with incomplete attribute values," in Proceedings of the Workshop on Foundations and New Directions in Data Mining, associated with the Third IEEE International Conference on Data Mining, November 19-22, 2003, pp. 56-63.
[5] R. J. Hathaway and J. C. Bezdek, "Fuzzy c-means clustering of incomplete data," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 31, No. 5, 2001.
[6] M. Kryszkiewicz, "Rough set approach to incomplete information systems," Information Sciences, Vol. 112, 1998, pp. 39-49.
[7] J. R. Quinlan, "Unknown attribute values in induction," in Proceedings of the Sixth International Workshop on Machine Learning, 1989, pp. 164-168.
[8] A. P. Dempster, N. M. Laird and D. B. Rubin, "Maximum-likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Vol. B39, 1977, pp. 1-38.
[9] J. W. Grzymala-Busse, "On the unknown attribute values in learning from examples," in Proceedings of ISMIS-91, 6th International Symposium on Methodologies for Intelligent Systems, Lecture Notes in Artificial Intelligence, Vol. 542, Springer-Verlag, Berlin Heidelberg New York, 1991, pp. 368-377.
[10] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001.
[11] T.-P. Hong, L.-H. Tseng and B.-C. Chien, "Learning fuzzy rules from incomplete numerical data by rough sets," in Proceedings of the 2002 IEEE International Conference on Fuzzy Systems, pp. 1438-1443.
[12] I. Kononenko, I. Bratko and E. Roskar, "Experiments in automatic learning of medical diagnostic rules," Technical Report, Jozef Stefan Institute, Ljubljana, 1984.
[13] R. Slowinski and J. Stefanowski, "Handling various types of uncertainty in the rough set approach," in Proceedings of the International Workshop on Rough Sets and Knowledge Discovery, 1993, pp. 366-376.
[14] UCI data summary, http://www.ics.uci.edu/~mlearn/MLSummary.html