Visual Techniques for the Interpretation of Data Mining Outcomes

Ioannis Kopanakis1, Nikos Pelekis2, Haralampos Karanikas3, and Thomas Mavroudkis4

1 Technological Educational Institute of Crete, Heraklion Crete, Greece
2 Univ. of Piraeus, Piraeus, Greece
3 UMIST, Manchester, UK
4 National & Kapodistrian Univ. of Athens, Knowledge Management Lab., Athens, Hellas

Abstract. The human visual sense has a unique status, offering a very broadband channel for information flow. Visual approaches to analysis and mining attempt to take advantage of our ability to perceive pattern and structure in visual form and to make sense of, or interpret, what we see. Visual data mining techniques have proven to be of high value in exploratory data analysis, and they also have high potential for mining large databases. In this work, we investigate and extend the area of visual data mining by proposing a new 3-dimensional visual data mining technique for the representation and mining of classification outcomes and association rules.

Keywords: Visual Data Mining, Association Rules, Classification, Visual Data Mining Models.

Categories: I.2.4, I.2.6

Research Paper: Data Bases, Work Flow and Data mining

P. Bozanis and E.N. Houstis (Eds.): PCI 2005, LNCS 3746, pp. 25–35, 2005. © Springer-Verlag Berlin Heidelberg 2005

1 Introduction and Motivation

Classification is a primary method for machine learning and data mining [Frawley, 92]. It is used either as a stand-alone tool to gain insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a pre-processing step for other algorithms operating on the detected clusters. The main questions that the knowledge engineer usually asks when trying to understand the classification outcomes are: How well separated are the different classes? Which classes are similar or dissimilar to each other? What kind of surface separates the various classes (i.e. are the classes linearly separable)? How coherent or well formed is a given class? Those questions are difficult to answer by applying conventional statistical methods to the raw data produced by the classification algorithm. Unless the user is supported by a visual representation that acts as his/her navigational tool in the N-dimensional classified world, drawing inferences will be a tedious task [Keim, 95]. Our main aim is therefore to visually represent and understand the spatial relationships between the various classes in order to answer questions such as those above.

Furthermore, mining for association rules, a central task of data mining, has been studied extensively by many researchers. Much of the existing research, however, focuses on how to generate rules efficiently. Limited work has been done on helping the user understand and use the discovered rules. In real-life applications, though, the knowledge engineer first wants a good understanding of a set of rules before trusting them and using the mining outcomes [David, 01]. Investigating and comprehending rules is a critical prerequisite for their application. These issues become even more pressing if we consider the “large resulting rule set”, the “hard to understand” and the “rule behaviour” problems [Zhao, 01].

In this paper, the proposed visual data mining model constructs 3D graphical representations of the classification outcomes produced by common data mining processes. Furthermore, association rules are also visualized in that representation, revealing each association rule’s “state” in its original N-dimensional world. Our aim is to equip the knowledge engineer with a tool for gaining insight into the mined knowledge, presenting as much of the extracted information as possible in a humanly perceivable way. The proposed model has distinctive advantages, addressing the commonly tedious issues that the knowledge engineer handles when exploiting the classification outcomes. Furthermore, it brings us one step closer to making the human part of the data mining process, in order to exploit the human’s unmatched abilities of perception.
In Section 2 we introduce our application domain, and in Section 3 we present our 3D Class-Preserving Projection Technique. In Section 4 we investigate the application of this model to the visualization of association rules, which is followed by two case studies in Sections 5 and 6. Finally, the related work is presented in Section 7 and we summarize our work in Section 8.

2 Visualizing Data Mining Classification Outcomes

In our attempt to graphically reveal the knowledge extracted by a classifier, we have mainly based our research on the underlying ideas of geometric projection techniques [Dhillon, 98]. Among the several geometric projection techniques that we studied, the most interesting was the Class-Preserving Projection Algorithm [Dhillon, 99], owing to its robust behaviour and its moderate computational complexity. The main characteristic of classified data embedded in high-dimensional Euclidean space is that proximity in R^n implies similarity. During the mapping, class-preserving projection techniques carry the properties of the classified data in R^n over to the projection plane, so as to construct representations from which accurate inferences can be drawn. Our study of those techniques led to a new geometric projection technique that extends the existing methods for visualizing classified data. The new technique, named the 3D Class-Preserving Projection technique, projects from R^n to R^3 and is capable of preserving the class distances (i.e. discriminating) among a larger number of classes.

3 3D Class-Preserving Projection Technique

In this section we introduce 3D class-preserving projections of multidimensional data. The main advantage of those projections is that they maintain the high-dimensional class structure by means of linear projections, which can be displayed on a computer screen. The challenge lies in the choice of the projection subspaces and the associated projections. Considering the problem of visualizing high-dimensional data that have been categorized into various classes, our goal is to choose the projections that best preserve inter-class and intra-class distances, so that inferences regarding the relationships between classes can be drawn.

In our attempt to extend the existing projection techniques, we worked on the definition of a projection scheme that results in the construction of a 3D world. Compared to the existing 2D class-preserving projection techniques, the proposed 3D technique yields an information-rich representation, owing to the freedom provided by the additional dimension of the projection world. In order to project onto the 3D space, we define our orthonormal projection vectors based on four points. If we choose those four points to be the class-means of the classes of interest, we maximize the inter-class distances among those four classes in our projection. Such an approach provides the flexibility of distinguishing among four classes instead of three, in addition to the move into the 3D projection space.

We consider the case where the data is divided into four classes. Let x1, x2, …, xn be all the N-dimensional data points, and let m1, m2, m3, m4 denote the corresponding class-centroids. Let w1, w2 and w3 be an orthonormal basis of the candidate 3D world of projection. The point xi is projected to (w1^T xi, w2^T xi, w3^T xi) and, consequently, the means mj are mapped to (w1^T mj, w2^T mj, w3^T mj), j = 1, 2, 3, 4.
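The mapping of points and class-centroids described above can be illustrated with a minimal sketch in plain Python. The function names `class_centroids` and `project`, the toy data set and the choice of the first three coordinate axes as the orthonormal basis are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import defaultdict

def class_centroids(points, labels):
    """Return {label: mean vector} for the labelled points."""
    groups = defaultdict(list)
    for x, y in zip(points, labels):
        groups[y].append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)]
            for y, xs in groups.items()}

def project(x, basis):
    """Map x to (w1^T x, w2^T x, w3^T x) for an orthonormal basis (w1, w2, w3)."""
    return tuple(sum(wi * xi for wi, xi in zip(w, x)) for w in basis)

# Hypothetical 4-dimensional data set with four classes, two points each.
points = [[0, 0, 0, 0], [2, 0, 0, 0], [0, 4, 0, 2], [0, 6, 0, 2],
          [1, 1, 5, 0], [1, 1, 7, 0], [3, 3, 3, 8], [5, 3, 3, 8]]
labels = [1, 1, 2, 2, 3, 3, 4, 4]

# Assumed basis for illustration only: the first three coordinate axes.
basis = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]

centroids = class_centroids(points, labels)
projected_points = [project(x, basis) for x in points]
projected_means = {j: project(m, basis) for j, m in centroids.items()}
```

Because the projection is linear, projecting a class-centroid gives the same result as averaging the projected points of that class, which is why the means can be mapped directly.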
One way to obtain good separation of the projected classes is to maximize the difference between the projected means. This may be achieved by choosing vectors w1, w2, w3 ∈ R^n such that the objective function

$$
C(w_1, w_2, w_3) = \sum_{i=1}^{3} \Big\{ |w_i^T (m_2 - m_1)|^2 + |w_i^T (m_3 - m_1)|^2 + |w_i^T (m_4 - m_1)|^2 + |w_i^T (m_3 - m_2)|^2 + |w_i^T (m_4 - m_2)|^2 + |w_i^T (m_4 - m_3)|^2 \Big\}
$$

is maximized. The above may be rewritten as

$$
C(w_1, w_2, w_3) = \sum_{i=1}^{3} w_i^T \Big\{ (m_2 - m_1)(m_2 - m_1)^T + \dots + (m_4 - m_3)(m_4 - m_3)^T \Big\} w_i = w_1^T S_B w_1 + w_2^T S_B w_2 + w_3^T S_B w_3 = \operatorname{tr}(W^T S_B W)
$$

where

$$
W = [w_1, w_2, w_3], \quad w_i^T w_i = 1, \quad w_i^T w_j = 0, \quad i \neq j, \quad i, j = 1, 2, 3
$$

and

$$
S_B = (m_2 - m_1)(m_2 - m_1)^T + \dots + (m_4 - m_3)(m_4 - m_3)^T
$$
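The equivalence between the two forms of the objective can be checked numerically. The sketch below, in plain Python, builds S_B from the six pairwise differences of four class-centroids and evaluates the objective both as a sum of squared projected mean differences and as a sum of quadratic forms wi^T S_B wi. The centroid values, the basis and all function names are hypothetical and serve only as an illustration.

```python
from itertools import combinations

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def between_class_scatter(means):
    """S_B: sum of outer products (mb - ma)(mb - ma)^T over all pairs of means."""
    n = len(means[0])
    scatter = [[0.0] * n for _ in range(n)]
    for ma, mb in combinations(means, 2):
        d = sub(mb, ma)
        for r in range(n):
            for c in range(n):
                scatter[r][c] += d[r] * d[c]
    return scatter

def objective_from_means(basis, means):
    """Sum over i and over all pairs of |w_i^T (mb - ma)|^2."""
    return sum(dot(w, sub(mb, ma)) ** 2
               for w in basis for ma, mb in combinations(means, 2))

def objective_from_scatter(basis, scatter):
    """Sum over i of the quadratic forms w_i^T S_B w_i."""
    return sum(dot(w, [dot(row, w) for row in scatter]) for w in basis)

# Hypothetical class-centroids m1..m4 in a 4-dimensional space.
means = [[1, 0, 0, 0], [0, 5, 0, 2], [1, 1, 6, 0], [4, 3, 3, 8]]
# Assumed orthonormal basis for illustration: the first three coordinate axes.
basis = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]

s_b = between_class_scatter(means)
c_means = objective_from_means(basis, means)
c_scatter = objective_from_scatter(basis, s_b)
```

The two values agree for any orthonormal basis, so maximizing the separation of the projected means reduces to a question about S_B alone.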


The positive semi-definite matrix SB can be interpreted as the inter-class or between-class scatter matrix. Note that SB has rank ≤ 3, since (m3 − m2) ∈ span{(m2 − m1), (m3 − m1)}, (m4 − m2) ∈ span{(m4 − m1), (m2 − m1)} and (m4 − m3) ∈ span{(m4 − m1), (m3 − m1)}. It is clear that the search for the maximizing w1, w2 and w3 can be restricted to the column (or row) space of SB. But, as noted above, this space has dimension at most 3. Thus, in general, the optimal w1, w2 and w3 must form an orthonormal basis spanning the space determined by the vectors (m2 − m1), (m3 − m1) and (m4 − m1). This technique can be applied to any number of classes; the constructed visual representation, though, will best discriminate the four selected classes.
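One possible way to obtain such a basis is Gram-Schmidt orthonormalization of the three difference vectors. The text does not prescribe a particular construction, so the procedure, the function names and the centroid values below are assumptions made for illustration. The sketch also checks that the distances between the four class-centroids are unchanged by the projection, which is what one would expect when all pairwise differences lie in the spanned subspace.

```python
import math
from itertools import combinations

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize the vectors, skipping any that are linearly dependent."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)  # remove the component along b
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

# Hypothetical class-centroids m1..m4 in a 4-dimensional space.
means = [[1, 0, 0, 0], [0, 5, 0, 2], [1, 1, 6, 0], [4, 3, 3, 8]]

# Basis of span{m2 - m1, m3 - m1, m4 - m1}.
diffs = [sub(m, means[0]) for m in means[1:]]
basis = gram_schmidt(diffs)

# Project the centroids and compare pairwise distances before and after.
projected = [tuple(dot(w, m) for w in basis) for m in means]
max_error = max(abs(math.dist(a, b) - math.dist(pa, pb))
                for (a, pa), (b, pb) in combinations(zip(means, projected), 2))
```

Any other orthonormal basis of the same subspace would give the same value of the objective, since the objective depends only on the subspace spanned by w1, w2 and w3; the bases differ only by a rotation of the resulting 3D view.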

4 Class-Preserving Projection Techniques and Association Rules

Class-preserving projection techniques can also be applied to the visual mining of association rules. Even in the case of association rules, inventing new visual data mining models amounts to conceiving new mapping techniques from the multidimensional space to a lower-dimensional space. As each attribute participating in a rule adds a dimension to our data space, we try to map each association rule existing in R^n to a lower-dimensional space. Those notions conform to the fundamental theory of class-preserving projection techniques. Theoretically, each rule can be perceived as an n-dimensional surface which encloses a sub-space of the high-dimensional data space. The boundaries of that area are defined by the conditions of the rule’s sub-expressions, which pose the limits in each dimension (i.e. the sub-expressions of the association rule IF ((L1