Journal of Information, Control and Management Systems, Vol. 3, (2005), No. 1
FUZZY DECISION TREE FOR PARALLEL PROCESSING SUPPORT

Vitaly LEVASHENKO, Penka MARTINCOVÁ
Faculty of Management and Informatics, University of Žilina, Slovakia
e-mail: [email protected], [email protected]

Abstract

Decision tree construction is an important data-mining problem. In this paper we introduce an approach that uses cumulative information estimates for fuzzy decision tree induction. We present a new type of fuzzy decision tree: the ordered tree. This tree is oriented to parallel processing of attributes with differing costs.

Keywords: classification, cumulative information estimation, ordered fuzzy decision trees, parallel processing.

1 INTRODUCTION

Recent times have seen an explosive growth in the availability of various kinds of data. This has resulted in an unprecedented opportunity to develop automated, data-driven techniques for extracting useful knowledge. Data mining, an important step in this process of knowledge discovery, consists of methods that discover interesting, non-trivial, and useful patterns hidden in the data.

Decision tree induction has been widely used for extracting knowledge from feature-based examples for classification and decision-making. An induction algorithm is used to learn a classifier, which maps the space of feature values into the set of class values [11]. Each feature value can be obtained by a diagnostic test of an input attribute, and each test has an associated integrated (financial and temporal) cost. An interesting problem here is to find a method that searches for an optimal (or sub-optimal) sequence of tests to be undertaken when recognizing a new subject, in order to minimize the cost of diagnostics. One approach to solving this problem is the decision tree approach. In addition, the approach should be suitable for parallel processing.

In this paper, we present our approach, which deals with fuzzily defined data. Such data reflect the real world around us more accurately. Our approach should also be able to analyze the order in which different diagnostic tests are performed, so as to minimize the diagnostic costs while guaranteeing a desired, predefined level of accuracy. For these purposes, we use a technique for computing cumulative information estimates of fuzzy sets [8]. The application of such estimates allows inducing minimum-cost decision trees based on new optimality criteria. We obtain a new type of fuzzy decision tree: the ordered tree. An ordered tree differs from an unordered fuzzy decision tree
in the way attributes are tested. In an ordered tree, the order of attribute tests does not depend on the results of previous tests, so subsequent attributes can be tested in parallel. This decreases the cost of attribute testing.

The paper is structured as follows. Section 2 contains brief information about fuzzy logic and fuzzy decision trees. Section 3 briefly describes how cumulative information estimates are calculated, and Section 4 shows how these estimates are used for fuzzy decision tree induction, illustrated with a simple example.
2 FUZZY LOGIC AND FUZZY DECISION TREES

In many real-world applications the data used are inherently imprecise and subjective. When an expert tries to analyze a certain event or fact in a real-world environment, she or he usually expresses the estimates with some degree of confidence and operates with terms like "highly favorable", "rather favorable", "absolutely inauspicious" and so on. These estimates can be represented as numerical or logical values. The use of numerical values makes the search for common characteristics among the input attributes more difficult because of the variety of real values. The use of Boolean or multi-valued logic approaches does not always give the necessary reliability. There is another significant drawback of such a crisp approach to data evaluation: in many cases it is against the nature of human beings. People use their subjective feelings, background knowledge and short-term memory, rather than any frequency criteria, to distinguish different data. Fuzzy logic is a popular approach to capturing this vagueness of information [19]. The basic idea is to move from the "crisp" values 1 and 0 to a degree of truth or confidence in the interval [0,1].

Definition 1. A fuzzy set A with respect to a universe U is characterized by a membership function µA : U → [0,1], which assigns an A-membership degree, µA(u), to each element u in U. µA(u) represents the degree to which u ∈ U belongs to the set A.

Definition 2. The cardinality measure of a fuzzy set A is defined by M(A) = Σu∈U µA(u), which is a measure of the size of A.

Fuzzification of the initial data is performed by analyzing the corresponding values of a membership function. Here, each attribute value can be seen as a likelihood estimate. In this paper we analyze the particular case in which the sum of the membership values of all linguistic terms for an attribute equals 1. An algorithm for transforming numeric data into triangular fuzzy data was presented in [4].

A typical classification problem can be described as follows [18]. A universe of objects U = {u} is described by N training examples and n input attributes A = {A1,...,An}. Each attribute Ai (1 ≤ i ≤ n) measures some important feature and is represented by a group of discrete linguistic terms. We assume that each group is a set of mi (mi ≥ 2) values of fuzzy subsets {Ai,1,…,Ai,j,…,Ai,mi}. The cost of an attribute Ai, denoted Costi, is an integrated measure that accounts for the financial and temporal costs required to determine the value of Ai for a given subject. We assume that each object u in the universe is classified into a set of classes {B1,...,Bmb}. This set is described by the output attribute B.
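To tie Definitions 1 and 2 to data of this kind, here is a minimal Python sketch; the variable names and example membership degrees are ours, not the paper's. It also checks the assumption stated above that the memberships of one attribute's linguistic terms sum to 1 for every object.

```python
import numpy as np

def cardinality(mu):
    """M(A) = sum over u in U of mu_A(u) -- Definition 2."""
    return float(np.sum(mu))

# Membership degrees of the three linguistic terms of one input attribute
# for five hypothetical objects (rows: terms, columns: objects).
terms = np.array([[0.9, 0.1, 0.0, 0.7, 0.3],
                  [0.1, 0.7, 0.2, 0.3, 0.7],
                  [0.0, 0.2, 0.8, 0.0, 0.0]])

print(cardinality(terms[0]))                 # size of the first fuzzy subset: 2.0
print(np.allclose(terms.sum(axis=0), 1.0))   # memberships sum to 1: True
```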
One approach to solving the classification task is based on decision tree induction. R. Quinlan proposed a general top-down mutual-information algorithm for designing crisp decision trees [14]. A well-known survey of crisp decision tree induction was presented in [15]. Modifications of the ID3 and C4.5 algorithms are usually used in the traditional way to select effective input attributes for the output classification. The first idea of a fuzzy decision tree was introduced in [1], which presented the structure of fuzzy decision trees and a search method and described the difference between crisp and fuzzy decision trees. However, that work did not address the issue of fuzzy decision tree induction. Generalizing the ID3 algorithm to fuzzy sets resulted in the Fuzzy ID3 algorithm; it and its known variants can be found in [1, 2, 3, 13, 17]. Yuan and Shaw proposed constructing fuzzy decision trees by reducing classification ambiguity with accumulated fuzzy evidence [18]; De Luca and Termini's entropy and an ID3-like algorithm were used. This algorithm was developed further in [16]. Non-ordered fuzzy decision trees were investigated in [12].

Let us consider the following example, which will be used throughout this paper.

Example 1. An object is presented with four input attributes A = {A1, A2, A3, A4} and one output attribute B. The attributes have the values A1 = {A1,1, A1,2, A1,3}, A2 = {A2,1, A2,2, A2,3}, A3 = {A3,1, A3,2}, A4 = {A4,1, A4,2} and B = {B1, B2, B3}. The membership degrees for these attributes and their costs are presented in Table 1.

Table 1: A small training set (adapted from [16])

No    | A1,1  A1,2  A1,3 | A2,1  A2,2  A2,3 | A3,1  A3,2 | A4,1  A4,2 | B1   B2   B3
------+------------------+------------------+------------+------------+---------------
  1   | 0.9   0.1   0.0  | 1.0   0.0   0.0  | 0.8   0.2  | 0.4   0.6  | 0.0  0.8  0.2
  2   | 0.8   0.2   0.0  | 0.6   0.4   0.0  | 0.0   1.0  | 0.0   1.0  | 0.6  0.4  0.0
  3   | 0.0   0.7   0.3  | 0.8   0.2   0.0  | 0.1   0.9  | 0.2   0.8  | 0.3  0.6  0.1
  4   | 0.2   0.7   0.1  | 0.3   0.7   0.0  | 0.2   0.8  | 0.3   0.7  | 0.9  0.1  0.0
  5   | 0.0   0.1   0.9  | 0.7   0.3   0.0  | 0.5   0.5  | 0.5   0.5  | 0.0  0.0  1.0
  6   | 0.0   0.7   0.3  | 0.0   0.3   0.7  | 0.7   0.3  | 0.4   0.6  | 0.2  0.0  0.8
  7   | 0.0   0.3   0.7  | 0.0   0.0   1.0  | 0.0   1.0  | 0.1   0.9  | 0.0  0.0  1.0
  8   | 0.0   1.0   0.0  | 0.0   0.2   0.8  | 0.2   0.8  | 0.0   1.0  | 0.7  0.0  0.3
  9   | 1.0   0.0   0.0  | 1.0   0.0   0.0  | 0.6   0.4  | 0.7   0.3  | 0.2  0.8  0.0
 10   | 0.9   0.1   0.0  | 0.0   0.3   0.7  | 0.0   1.0  | 0.9   0.1  | 0.0  0.3  0.7
 11   | 0.7   0.3   0.0  | 1.0   0.0   0.0  | 1.0   0.0  | 0.2   0.8  | 0.3  0.7  0.0
 12   | 0.2   0.6   0.2  | 0.0   1.0   0.0  | 0.3   0.7  | 0.3   0.7  | 0.7  0.2  0.1
 13   | 0.9   0.1   0.0  | 0.2   0.8   0.0  | 0.1   0.9  | 1.0   0.0  | 0.0  0.0  1.0
 14   | 0.0   0.9   0.1  | 0.0   0.9   0.1  | 0.1   0.9  | 0.7   0.3  | 0.0  0.0  1.0
 15   | 0.0   0.0   1.0  | 0.0   0.0   1.0  | 1.0   0.0  | 0.8   0.2  | 0.0  0.0  1.0
 16   | 1.0   0.0   0.0  | 0.5   0.5   0.0  | 0.0   1.0  | 0.0   1.0  | 0.5  0.5  0.0
Costi |       2.5        |       1.7        |    2.0     |    1.8     |
Costi is the cost of accomplishing the required tests and procedures. In this paper we introduce an approach for finding a sub-optimal sequence of expanded-attribute tests, i.e. for determining the values of the input attributes {A1,…,An} of a new subject in a way that allows correct diagnostics. The problem is that the sub-optimal sequence should guarantee correct diagnostics with a predefined level of accuracy while reaching the minimum cost of accomplishing the required tests and procedures.
3 MUTUAL CUMULATIVE INFORMATION AND CUMULATIVE ENTROPY

We proposed cumulative information estimates in [5]. Information and entropy are accumulated in the following way. Consider a sequence of q input attributes Ai1,…,Aiq and one output attribute B.

Definition 3. The cumulative joint information of the sequence of values Uq = {Ai1,j1,...,Aiq,jq} (q ≥ 2) and Bj (j = 1,…,mb) is

  I(Bj, Uq) = −log M(Bj × Ai1,j1 × … × Aiq,jq) bits.   (1)

Definition 4. The cumulative conditional entropy between the attributes B, Aiq and the sequence of values Uq−1 = {Ai1,j1,...,Aiq−1,jq−1} of the attributes {Ai1,...,Aiq−1} is the uncertainty of the values of the output attribute B when the sequence Uq−1 and the value Aiq,jq (or the attribute Aiq) are known:

  H(B | Uq−1, Aiq,jq) = Σ (j = 1,…,mb) M(Bj × Uq) × (I(Bj, Uq) − I(Uq)) bits,
  H(B | Uq−1, Aiq) = Σ (jq = 1,…,miq) H(B | Uq−1, Aiq,jq) bits,   (2)

where I(Bj, Uq) is the cumulative joint information (1) of the sequence of values Uq and Bj, and I(Uq) is defined analogously for the sequence Uq alone.

Definition 5. The cumulative mutual information in the output attribute B about the attribute Aiq and the sequence of values Uq−1 = {Ai1,j1,...,Aiq−1,jq−1} reflects the influence of the attribute Aiq on B when the sequence Uq−1 of attributes is known:

  I(B; Uq−1, Aiq) = H(B | Uq−1) − H(B | Uq−1, Aiq) bits,   (3)

where H(B | Uq−1) and H(B | Uq−1, Aiq) are cumulative conditional entropies (2).

Using these information estimates, a criterion for fuzzy decision tree induction can be formed, for example

  I(B; Ai1,…,Aiq−1, Aiq) / Cost(Aiq) → max   for an ordered FDT.
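To make these definitions concrete, the following Python sketch computes the estimates for one column of Table 1. Two details are our assumptions, since the paper does not spell them out here: the relative cardinality M(·)/N is used inside the logarithm (so its argument stays in [0,1]), and the product t-norm serves as the fuzzy intersection ×.

```python
import numpy as np

# Columns from Table 1: the term A2,1 of attribute A2 and the classes
# B1, B2, B3 of the output attribute B (N = 16 training examples).
A2_1 = np.array([1.0, 0.6, 0.8, 0.3, 0.7, 0.0, 0.0, 0.0,
                 1.0, 0.0, 1.0, 0.0, 0.2, 0.0, 0.0, 0.5])
B = [np.array([0.0, 0.6, 0.3, 0.9, 0.0, 0.2, 0.0, 0.7,
               0.2, 0.0, 0.3, 0.7, 0.0, 0.0, 0.0, 0.5]),   # B1
     np.array([0.8, 0.4, 0.6, 0.1, 0.0, 0.0, 0.0, 0.0,
               0.8, 0.3, 0.7, 0.2, 0.0, 0.0, 0.0, 0.5]),   # B2
     np.array([0.2, 0.0, 0.1, 0.0, 1.0, 0.8, 1.0, 0.3,
               0.0, 0.7, 0.0, 0.1, 1.0, 1.0, 1.0, 0.0])]   # B3

def M(mu):
    """Relative cardinality M(.)/N; the normalization by N is our assumption."""
    return float(np.sum(mu)) / mu.size

def I(mu):
    """Cumulative information -log2 M(.), cf. Eq. (1); 0 for an empty set."""
    m = M(mu)
    return -np.log2(m) if m > 0 else 0.0

# Cumulative joint information I(Bj, U) for U = {A2,1}, Eq. (1):
for j, Bj in enumerate(B, start=1):
    print(f"I(B{j}, A2,1) = {I(Bj * A2_1):.3f} bits")

# Contribution of the value A2,1 to H(B | A2), first line of Eq. (2):
H_val = sum(M(Bj * A2_1) * (I(Bj * A2_1) - I(A2_1)) for Bj in B)
print(f"H(B | A2,1) = {H_val:.3f} bits")
```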
4 ORDERED FUZZY DECISION TREE INDUCTION

We propose a new interpretation of Fuzzy ID3 based on cumulative information estimates [8]. Apart from the selection of expanded attributes, the determination of leaf nodes is another important issue in fuzzy decision tree induction. The key points of the proposed algorithm for inducing fuzzy decision trees are (a) a heuristic for selecting expanded attributes and (b) rules for transforming nodes into leaves. Expanded attributes are the attributes according to whose values the tree is expanded at the considered nodes.

The induction of an ordered fuzzy tree has lower complexity, since it does not require information-estimate calculations for each branch of the tree. To choose an expanded
attribute, it is sufficient to maximize the increment of information about the output attribute at minimum cost:

  argmaxi { I(B; Uq−1, Ai) / Costi },   (4)

where the operator argmaxi Vi selects the parameter i with the maximal value of Vi, and Uq−1 = {Ai1,...,Aiq−1}. I(B; Uq−1, Ai) is calculated using (3).
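A sketch of the level-wise greedy selection implied by criterion (4) is given below. It reuses numpy and the M and I helpers from the previous listing; branch weighting, the stopping thresholds α and β, and leaf creation are omitted, so this is only an outline of the attribute-ordering step, under the same normalization and t-norm assumptions as before.

```python
def select_order(columns, cost, B, levels):
    """Greedily order attributes by criterion (4).

    columns: dict attr name -> list of term-membership arrays (one per A_{i,j})
    cost:    dict attr name -> Cost_i
    B:       list of class-membership arrays for the output attribute
    """
    order = []
    branches = [np.ones(16)]           # root prefix: membership 1 everywhere
    for _ in range(levels):
        def H(prefixes):
            # cumulative conditional entropy over the current branches, Eq. (2)
            return sum(M(Bj * u) * (I(Bj * u) - I(u))
                       for u in prefixes for Bj in B)
        def score(attr):
            # increment of information per unit cost, criterion (4)
            expanded = [u * a for u in branches for a in columns[attr]]
            return (H(branches) - H(expanded)) / cost[attr]
        best = max((a for a in columns if a not in order), key=score)
        order.append(best)
        # ordered FDT: the chosen attribute extends every branch of the level
        branches = [u * a for u in branches for a in columns[best]]
    return order
```

Run on the Table 1 columns with levels = 3, a loop of this kind should reproduce the ordering A2, A4, A1 seen in Figure 2, provided the assumptions above match the paper's implementation.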
The fuzzy decision tree induced from the data in Table 1 (with α = 0.16 and β = 0.75) is shown in Figure 2.

[Figure 2: Fuzzy decision tree. The root tests A2 and has three branches with frequencies f = 0.381, 0.350 and 0.269; two second-level nodes test A4, producing branches with f = 0.131, 0.250, 0.158 and 0.192; two third-level nodes test A1. Each node carries its class distribution over B1, B2 and B3.]

Induction of an ordered decision tree reduces the classification time thanks to the possibility of checking attribute values in parallel at several levels of the decision tree. Indeed, when classifying an instance at level s (s = 2,…) of an unordered decision tree, we need to determine the value of the attribute Ais, and it is impossible to know in advance which attribute value will have to be evaluated at the next level (s + 1). In an ordered decision tree, on the contrary, one attribute is associated with all branches of a level, so the sequence of attributes to be evaluated is known in advance for every branch. Obviously, the construction of an ordered decision tree implies additional costs, but its use can be beneficial in situations where the time factor is critical and there is a possibility to check several attributes simultaneously. Parallel processing is possible because the choice of the next attribute does not depend on the values of the preceding attributes. The order of attribute tests is independent of the situation; only the number of attribute tests depends on it. This feature allows us to test the next attribute even if we do not yet know the results of testing the preceding attributes.
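Because the attribute sequence is fixed before any result arrives, the tests themselves can be launched concurrently. A minimal sketch follows; the test functions are hypothetical stand-ins for the real diagnostic procedures, with durations equal to the Costi values of Table 1.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical diagnostic tests; sleep() stands in for the real procedure,
# with durations equal to the Cost_i values from Table 1.
def test_A2(): time.sleep(1.7); return "value of A2"
def test_A4(): time.sleep(1.8); return "value of A4"
def test_A1(): time.sleep(2.5); return "value of A1"

# In an ordered tree the sequence A2, A4, A1 is known in advance, so all
# three tests can be started at once on three processors (threads here).
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(t) for t in (test_A2, test_A4, test_A1)]
    values = [f.result() for f in futures]

# Wall time is about max(1.7, 1.8, 2.5) = 2.5 instead of the sequential
# 1.7 + 1.8 + 2.5 = 6.0 (time.sleep releases the GIL, so threads overlap).
```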
The testing can be executed on more than one processor, which decreases the total execution time. Let us consider the example in Figure 3. If we test the attributes sequentially, executing the tests on one processor, the total testing time (Figure 3a) is

  Cost of FDT1 = Cost(A2) × 1.0 + Cost(A4) × (0.381 + 0.350) + Cost(A1) × (0.250 + 0.192) = 4.1208.

If we use two processors (Figure 3b), the cost will be

  Cost of FDT2 = max(Cost(A2), Cost(A4)) × 1.0 + Cost(A1) × (0.250 + 0.192) = 2.905.

If we use three processors (Figure 3c), the total execution time will be

  Cost of FDT3 = max(Cost(A2), Cost(A4), Cost(A1)) × 1.0 = 2.5.
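The arithmetic can be checked in a few lines; the branch weights are the f values read off Figure 2, and the costs come from Table 1.

```python
cost = {"A1": 2.5, "A2": 1.7, "A3": 2.0, "A4": 1.8}  # Cost_i from Table 1

# One processor: tests run one after another; tests deeper in the tree are
# weighted by the frequency with which that part of the tree is reached.
fdt1 = cost["A2"] * 1.0 + cost["A4"] * (0.381 + 0.350) \
       + cost["A1"] * (0.250 + 0.192)                        # 4.1208

# Two processors: A2 and A4 start together; A1 runs afterwards where needed.
fdt2 = max(cost["A2"], cost["A4"]) * 1.0 \
       + cost["A1"] * (0.250 + 0.192)                        # 2.905

# Three processors: all tests start at once; the makespan is the slowest test.
fdt3 = max(cost["A2"], cost["A4"], cost["A1"]) * 1.0         # 2.5

print(fdt1, fdt2, fdt3)
```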
[Figure 3: Using parallel processing. Gantt-style schedules of the attribute tests with durations 2.5, 1.8 and 1.7 minutes on processors P1, P2 and P3: (a) one processor, (b) two processors, (c) three processors.]

This simple example shows that exploiting parallel processing can reduce the total execution time. The problem of mapping processes to processors (in a distributed system or on an SMP) is an NP-hard task [10] and is usually solved using heuristic algorithms.
5 CONCLUSION

We have proposed an induction technique for a new type of fuzzy decision tree, which is simple to understand and apply. The use of cumulative information estimates allows precise estimation of the mutual influence of attributes. These estimates, based on an analogue of Shannon's entropy, are a tool for analyzing groups of training examples. The use of such estimates allows inducing minimum-cost decision trees based on different criteria. We introduced the cost of diagnosing expanded attributes into the considered algorithms. We have shown the possibility of reducing the execution time at the first stage of building a decision tree thanks to ordered processing of attributes.

Parallel processing has been used to resolve time-consuming tasks in many scientific problems [5], [6], [10], [20]. The problem of parallelizing the process of building a fuzzy decision tree can be treated stage by stage. In the first step, the fuzzification of the continuous data can be parallelized: if one process is created for each interval, the degree of parallelism equals the number of intervals. A second improvement is possible in the search for the maximum increment of information about an attribute at minimum cost, which can also be done in parallel. A further possibility is the parallel checking of attribute values, which was discussed earlier in this article. Using parallel processes can also decrease the time needed for mining association rules. The programming model of the processes [6], [7] must be selected carefully in order to reflect their real configuration. Execution-time improvements are not straightforward, because they depend on data localization, memory requirements, scalability and load balancing. Building a fuzzy decision tree is a process that requires communication between nodes as well as access to parts of the data. Exact estimation of communication, memory requirements and scalability are important factors that influence the resulting benefits of exploiting parallel processes.

Acknowledgements

This research was supported by grant No. 1/1053/04 of the Scientific Grant Agency of the Ministry of Education of the Slovak Republic and the Slovak Academy of Sciences (VEGA).

REFERENCES

[1] Chang R.L.P., Pavlidis T.: Fuzzy Decision Tree Algorithms, IEEE Trans. on Systems, Man and Cybernetics, 7, pp. 28-35, 1977.
[2] Chiang I-Jen, Hsu Jane Yung-jen: Fuzzy Classification Trees for Data Analysis, Fuzzy Sets and Systems, 130, pp. 87-99, 2002.
[3] Janikow C.Z.: Fuzzy Decision Trees: Issues and Methods, IEEE Trans. on Systems, Man, and Cybernetics — Part B: Cybernetics, Vol. 28, 1, pp. 1-14, 1998.
[4] Lee H.M., Chen C.M., Chen J.M., Jou Y.L.: An Efficient Fuzzy Classifier with Feature Selection Based on Fuzzy Entropy, IEEE Trans. on Systems, Man and Cybernetics — Part B: Cybernetics, Vol. 31, 3, pp. 426-432, 2001.
[5] Kollár J.: Metódy a prostriedky pre výkonné paralelné výpočty (Methods and Tools for High-Performance Parallel Computing), Academic Press elfa, s.r.o., Košice, 2003, 110 pp., ISBN 80-89066-70-4.
[6] Kollár J.: Paralelné programovanie (Parallel Programming), Academic Press elfa, s.r.o., Košice, 1999, 96 pp., ISBN 80-88964-14-8.
[7] Kollár J., Václavík P., Porubän J.: The Classification of Programming Environments, Acta Universitatis Matthiae Belii, 10, pp. 51-64, 2003, ISBN 80-8055-662-8.
[8] Levashenko V., Zaitseva E.: Usage of New Information Estimations for Induction of Fuzzy Decision Trees, Proc. of the 3rd IEEE Int. Conf. on Intelligent Data Engineering and Automated Learning, 2002.
[9] Martincová P.: Software Environment for Cluster Computing, Journal of Information, Control and Management Systems, Vol. 1, No. 1, University of Žilina, Faculty of Management Science and Informatics, 2003, ISSN 1336-1716.
[10] Martincová P., Kňaze M.: Using Parallel Processes for Optimization Tasks, Proc. of MOSMIC 2003 (Modelling and Simulation in Management, Informatics and Control), EDIS — Žilina University Publisher, pp. 139-144, ISBN 80-8070-139-3.
[11] Mitchell T.M.: Machine Learning, McGraw-Hill, 273 pp., 1997.
[12] Mitra S., Konwar K.M., Pal S.K.: Fuzzy Decision Tree, Linguistic Rules and Fuzzy Knowledge-Based Network: Generation and Evaluation, IEEE Trans. on Systems, Man and Cybernetics — Part C: Applications and Reviews, Vol. 32, 4, pp. 328-339, 2002.
[13] Olaru C., Wehenkel L.: A Complete Fuzzy Decision Tree Technique, Fuzzy Sets and Systems, 138, pp. 221-254, 2003.
[14] Quinlan J.R.: Induction of Decision Trees, Machine Learning, 1, pp. 81-106, 1986.
[15] Safavian S.R., Landgrebe D.: A Survey of Decision Tree Classifier Methodology, IEEE Trans. on Systems, Man and Cybernetics, 21, pp. 660-674, 1991.
[16] Wang X., Chen B., Qian G., Ye F.: On the Optimization of Fuzzy Decision Trees, Fuzzy Sets and Systems, 112, pp. 117-125, 2000.
[17] Yeung D.S., Tsang E.C.C., Wang X.: Fuzzy Rule Mining by Fuzzy Decision Tree Induction Based on Fuzzy Feature Subset, Proc. of the IEEE Int. Conf. on Systems, Man and Cybernetics, 4, pp. 599-604, 2002.
[18] Yuan Y., Shaw M.J.: Induction of Fuzzy Decision Trees, Fuzzy Sets and Systems, 69, pp. 125-139, 1995.
[19] Zadeh L.: Fuzzy Sets, Information and Control, 8, pp. 407-428, 1965.
[20] Joshi M.V., Han E., Karypis G., Kumar V.: Parallel Algorithms in Data Mining, http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm

Referee: doc. Ing. Ján Kollár, PhD., Faculty of Electrical Engineering and Informatics, Technical University of Košice