Exploratory Visualization for Association Rule Rummaging

Julien BLANCHARD

Fabrice GUILLET

Henri BRIAND

IRIN – Polytech'Nantes – University of Nantes rue Christian Pauc – BP 50609 44306 Nantes cedex 3 33 2 40 68 32 57

{julien.blanchard, fabrice.guillet, henri.briand}@polytech.univ-nantes.fr

ABSTRACT

On account of the enormous number of rules that can be produced by data mining algorithms, knowledge validation is one of the most problematic steps in an association rule discovery process. In order to comprehend this bulk of rules and to find relevant knowledge for decision-making, the user needs to really rummage through the rules. Visualization can be very beneficial in supporting him/her in this task by improving the intelligibility of the large rule sets and enabling the user to navigate inside them. In this article, we propose to answer the association rule validation problem by designing a visualization for the rule rummaging task. This new approach, based on a specific rummaging model, relies on interactive rule focusing and on rule quality measures. A first prototype implementing our representation has been developed. It allows the user to take in the interesting rules by navigating through a voluminous rule set by trial and error via successive limited subsets.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications – Data mining; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; H.5.2 [Information Interfaces and Presentation]: User Interfaces – Graphical user interfaces, Screen design, Interaction styles; I.3.8 [Computer Graphics]: Applications

General Terms
Design, Experimentation, Human Factors

Keywords
knowledge discovery in databases, knowledge validation, information visualization, interactive post-processing of association rules, 3D representation, information landscape

The copyright of these papers belongs to the paper's authors. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. MDM/KDD'03, August 27, 2003, Washington, DC, USA.

1. INTRODUCTION

Generally speaking, Knowledge Discovery in Databases (KDD) is the process of finding new and useful information inside large datasets [10]. Among the knowledge models used in KDD, association rules [2] have become a major concept and received significant research attention. These rules are implicative tendencies of the form X → Y where X and Y are conjunctions of database items (boolean variables). Such a rule means that most of the records which verify X in the database verify Y too. For instance, in market basket analysis, where this concept is widely used to model customer transactions, an association rule (pizza, crisps) → (beer) means that if a customer buys a pizza and crisps then he/she most probably buys beer too. The main drawback of association rule mining is the enormous number of rules that can be produced by the combinatorial extraction algorithms. This is due to the unsupervised nature of rule discovery: the user (a decision-maker) does not make his/her goals explicit and does not specify any endogenous variable. As a result, the number of item conjunctions handled by the algorithms increases dramatically. In practice, it is very difficult for the user to find interesting knowledge for decision-making in a corpus that can hold hundreds of thousands or even millions of rules with large business databases. This is the reason why knowledge interpretation and validation is still one of the most problematic steps in an association rule discovery process, giving rise to a second discovery challenge called knowledge mining.

Visualization can be very beneficial to KDD [11]. Indeed the KDD process is by nature highly iterative and interactive and requires user involvement [36]. Visualization techniques are a very effective means of introducing the necessary human subjectivity into each step of the process while taking advantage of human perceptual and cognitive capabilities. In particular, they are a possible answer to the association rule validation problem by providing intelligible visual representations to explore the rule sets instead of barely comprehensible textual lists. However, although the design of appropriate metaphors for visualization in data mining has been addressed (see for example [37]), few works deal with the association rule visualization problem. Moreover, the main drawback of the proposed representations is that they become heavily cluttered when the rules are too numerous, mainly because they suffer from a lack of interactivity.

In this article, we have opted for defining the user's task in the rule validation problem as a prerequisite. Indeed, in order to efficiently support the user in his/her knowledge search, the KDD process should be considered not from the point of view of the discovery algorithms but from that of the user, as a human-centered and task-oriented decision support system [7, 14]. From our previous works on association rules with the firm PerformanSE S.A. [27, 28], we learned that a successful way of considering the user's task during the validation step is to define it as a "rule rummaging" task, i.e. navigating through the rules by focusing on successive subsets according to the user's interest. The rule set is thus explored piece by piece so that the user does not need to take it in entirely. Hence, we propose in this article to answer the association rule validation problem by designing a visualization method for the rule rummaging task. This new approach, based on our rummaging model, relies on rule focusing and on rule quality measures, i.e. indexes which allow the user to evaluate the relevance of the rules on the basis of various criteria (refer to [20] or [39] for a survey). A first prototype implementing our new representation has been developed.

After recalling the previous works on association rule visualization in a first section, we present the rule rummaging task and its requirements in terms of rule set structure and visualization. Then we describe our visualization method and the choices we made for the rule set structure, the visual metaphor and the interactions. Finally, we conclude on the rule rummaging tool we developed and on future work.

2. SURVEY ON ASSOCIATION RULE VISUALIZATION

In KDD, information visualization techniques can be used as knowledge extraction methods on their own, which is sometimes called Visual Data Mining (see for example [24, 32]), or they can collaborate with data mining algorithms to assist the supervision of the KDD process [16, 1]. In this paper we are interested in the second case since we are trying to visualize the outputs of data mining algorithms. More precisely, the association rule sets we want to represent are lists of rules where each rule consists of the set of items on the left-hand side, the set of items on the right-hand side (sets of items are called itemsets), and a set of numerical values, the rule's quality measures. At least two measures are always used to describe a rule set: support and confidence [2], which are exploited with filtering thresholds by data mining algorithms to prune the rules during their extraction. Support evaluates the generality of a rule, and confidence its validity (success rate).

One way of representing association rule sets is the matrix approach. SGI MineSet, IBM DB2 Intelligent Miner Visualization, SAS Enterprise Miner, DBMiner [15], and Hofmann & Wilhelm [21] give different implementations of it. In an item-to-item matrix, all the left-hand and right-hand side items are arranged along two orthogonal axes (figure 1). A rule between two items is symbolized in the matching cell by a mark (2D shape, 3D bar chart…) which is used to map the quality measures to visual form. The user can focus on some rules by filtering and sorting on items and measures. This technique was improved into rule-to-item matrices [41], which are less cluttered and allow a more efficient representation of rules with more than two items. The main limitation of these approaches is that the matrices reach considerable sizes in the case of a large rule set over numerous items, complicating the user's search.
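As a concrete reading of the two baseline measures recalled above, the following minimal sketch computes support and confidence directly from a list of transactions. The function names and the toy transactions are illustrative assumptions of ours, not part of any tool cited here, and real extraction algorithms obtain these values far more efficiently.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= t) / len(transactions)


def confidence(lhs: Iterable[str], rhs: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Among transactions containing `lhs`, fraction that also contain `rhs`."""
    lhs_support = support(lhs, transactions)
    if lhs_support == 0:
        return 0.0  # convention chosen for this sketch when lhs never occurs
    return support(frozenset(lhs) | frozenset(rhs), transactions) / lhs_support


# Hypothetical market basket data (four transactions).
transactions = [
    frozenset({"pizza", "crisps", "beer"}),
    frozenset({"pizza", "crisps", "beer"}),
    frozenset({"pizza", "crisps"}),
    frozenset({"beer"}),
]

# Rule (pizza, crisps) -> (beer)
rule_support = support({"pizza", "crisps", "beer"}, transactions)
rule_confidence = confidence({"pizza", "crisps"}, {"beer"}, transactions)
```

In this toy example the rule holds in two of the four transactions (support 0.5) and in two of the three transactions that contain both pizza and crisps (confidence 2/3).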

Figure 1. An item-to-item matrix

Association rule sets can also be visualized using a directed 2D graph [15, 35], the nodes and edges respectively representing the items and the rules (see figure 2 where letters denote items). The quality measures are symbolized on the edges (for example with color or thickness), which does not allow many measures to be integrated into the visualization. Though this is a very intuitive representation, it does not suit the visualization of large rule sets over numerous items either: the graph becomes overloaded with nodes and crossing edges, all the more so when rules with more than two items are considered. To extend the intelligibility limits of the representation, the same method has been used in 3D with a self-organization algorithm to guarantee a more efficient graph layout [18]. We also proposed in [27] a dynamic 2D graph which is a subgraph of the itemset lattice well known from association rule extraction algorithms (see figure 3). In this graph, the nodes do not represent the items but the itemsets, so that a rule (A,B) → (C) is symbolized by an edge between the nodes (AB) and (ABC). Only rules with one item on the right-hand side are visualized here. The resulting graph is acyclic with fewer edge crossings but more nodes. The user can dynamically develop the graph as he/she wishes by clicking on the nodes. However, the graph gets bigger with each such interaction.

Figure 2. A rule graph with items as nodes (nodes shown: A, B, C, D, E)

While the previous visual representations are structured by items, handling quality measures as additional information, the visualization proposed in [34] is organized by confidence and support. It consists of another kind of rule matrix: here the two orthogonal axes are dedicated to support and confidence, and moving the mouse pointer over a cell in the matrix pops up a tooltip that shows the names of the rules contained in the cell. Dynamic queries on support, confidence and items via sliders and check boxes allow the user to focus on selected rules. While this method is efficient for searching rules using constraints on quality measures, it does not suit the search for rules using item constraints, all the more so since rules are very numerous and can fall into the same cell. This approach is the closest to the one proposed in this paper, which also uses a spatial mapping to highlight the quality measures.

Figure 3. A rule graph with itemsets as nodes

Other representation methods for association rule sets have been proposed. They do not deal with the visualization of the whole rule set but with the visualization of a rule pattern (rules with given items on the left-hand and right-hand sides). These methods allow a restricted number of rules to be studied thoroughly, making their interpretation easier and helping to understand the context in which they appear. They fall into the framework of supervised knowledge discovery (when the user knows what he/she is interested in). Examples include Hofmann & Wilhelm's mosaic plots [21] for rules with categorical variables, and Fukuda et al. [12] and Han & Cercone [17] for numerical rules. Some techniques inspired by parallel coordinates have also been considered to visualize patterns of classification rules [16] or association rules [26].

3. ASSOCIATION RULE RUMMAGING

In this paper, we work on the set of all the rules that can be discovered in a database by data mining algorithms. As we are interested in rule visualization and not in rule extraction, we do not care whether in reality this rule set is completely extracted with a global algorithm (post-analysis of rules) or extracted only piece by piece with a local algorithm, according to what needs to be visualized (inductive database approach, introduced in [23]).

3.1 User's task

During the knowledge validation step, the user has the heavy task of searching for rules that are interesting for decision-making in a bulky rule list described by quality measures. This task is feasible when the user knows what he/she is looking for and only wants to verify some hypotheses. In this case, expressing the corresponding constraints using a query language or an interactive tool should be useful and sufficient to find pertinent rules. But association rule discovery is by nature an unsupervised process, which means that it is typically applied when the user does not know precisely enough what he/she is looking for to express it with the rule terminology. Faced with rule profusion, it is not possible to explore the whole rule set, so how should one proceed to help the user in his/her task without explicit constraints?

To address this problem, we make some hypotheses on the nature of the user's decision task. Inspired by research works on the user's behavior in a knowledge discovery process [4] on the one hand, and also by cognitive principles of information processing in the context of decision models [31] on the other hand, we consider that the user applies a "focusing strategy": faced with a large amount of information, a decision-maker focuses his/her attention on a limited subset of potentially useful data and changes the subset he/she is focusing on until he/she is able to reach a decision. Bearing this in mind, we learned from our experiments on rule validation with the firm PerformanSE S.A. on marketing and human resource management data [27, 28] that a successful way of enabling the user to find interesting rules in an unsupervised context is to consider his/her task as a "rule rummaging" task. Rummaging is a means of supporting the user's focusing strategy: by rummaging among the rules, we mean navigating through the voluminous rule set with a kind of trial and error strategy by focusing on successive limited subsets according to the user's interest (figure 4). To be more accurate, the user carries out a series of local explorations through the whole rule set via restricted and so more intelligible subsets. At each step, the user chooses the next subset to explore. This decision is made by the user according to his/her feelings and current interests, which allows subjectivity to be expressed in the rule validation process. In the end, only the portion of the set selected by the user will have been visited, gradually. Considering the validation problem as an optimization problem, the user acts here as the exploration heuristic. Exploiting a human or subjective heuristic is coherent because the function to be optimized, i.e. the user's interestingness, is subjective too.

Organizing this rule rummaging process implies two requirements: structuring the rule set into a rule network to allow the user to navigate inside, and putting at his/her disposal a visual representation which can be used for supporting rule focusing and navigation.

Figure 4. Rule rummaging with local explorations of limited subsets (LE: rule subset focused on for local exploration; the subsets LE1 to LE4 lie within the whole rule set)

3.2 Structuring the rule set with neighborhood relations

At the output of the algorithms the rule set is simply a long list (see Table 1 for a sample of rules with a few quality measures). The rules must be structured to allow the user to isolate rule subsets one by one and navigate from one to another. More precisely, we need to group the rules together into virtual subsets and connect these subsets to one another through neighborhood relations (figure 5). At each navigation step, to access a new rule subset after browsing the current one, the user has the choice among all the neighboring rule subsets, i.e. those reachable by the neighborhood relation. If this relation gives a graph structure to the rule set, the rummaging process is very much like the incremental graph exploration methods that constitute a rather new approach to graph visualization [22, 33, 19]. The main difference is that with regard to rule rummaging only the nodes interest the user and must be visually represented, while the edges are only used as navigation vectors.

Table 1. An excerpt from a rule set discovered in market basket data

rule                           support (%)   confidence (%)   implication intensity
pizza → beer                   0.41          74.55            0.82
crisps → beer                  0.50          81.97            0.68
pizza, crisps → beer           0.34          79.07            0.91
shampoo → soap                 0.84          93.33            0.12
soap → shampoo                 0.84          85.71            0.49
beef steak → barbecue sauce    0.39          57.35            0.90
tea → biscuits                 0.61          62.24            0.34
coffee → biscuits              0.65          59.09            0.41
coffee → milk                  0.84          76.36            0.25
diaper → beer                  0.13          86.67            0.99

The neighborhood relation structures the rule set and induces the navigation operators. Of course it must be defined with the help of the user and have a pertinent meaning for him/her in order to support his/her search strategy. Instead of directly expressing precise constraints as he/she would do by querying the rule set, the user exploits these relations, which can be viewed as generalizations of constraints (classes of constraints). For example, the neighborhood relation could be based on item patterns, exception rules [38], or rule summaries [29, 30]. One could also consider exploiting several neighborhood relations, so that the user can combine several types of moves in the same rule set exploration. At present, in our rummaging prototype, we use a relation of specialization/generalization among the rules. Choosing neighborhood relations based on items while letting the quality measures be symbolized in the visual representation seems to us the most judicious approach.
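The navigation scheme described above can be sketched as a small exploration loop in which the neighborhood relation is a pluggable function and the user's choice is a callback. This is only an illustrative sketch: the names `rummage`, `neighbors` and `choose`, as well as the toy graph of subsets, are assumptions of ours and do not describe the prototype's implementation.

```python
from typing import Callable, Hashable, Iterable, List, Optional


def rummage(start: Hashable,
            neighbors: Callable[[Hashable], Iterable[Hashable]],
            choose: Callable[[Hashable, List[Hashable]], Optional[Hashable]],
            max_steps: int = 100) -> List[Hashable]:
    """Visit rule subsets one at a time.

    `neighbors` encodes the neighborhood relation; `choose` stands in for
    the user, who picks the next subset or returns None to stop.
    """
    visited = [start]
    current = start
    for _ in range(max_steps):
        candidates = list(neighbors(current))
        chosen = choose(current, candidates)
        if chosen is None:
            break
        if chosen not in candidates:
            raise ValueError(f"{chosen!r} is not a neighbor of {current!r}")
        visited.append(chosen)
        current = chosen
    return visited


# Hypothetical neighborhood graph over four subsets.
toy_graph = {
    "LE1": ["LE2", "LE3"],
    "LE2": ["LE1", "LE4"],
    "LE3": ["LE1"],
    "LE4": ["LE2"],
}

# A scripted "user" who explores LE2, then LE4, then stops.
script = iter(["LE2", "LE4", None])
path = rummage("LE1", lambda s: toy_graph[s], lambda cur, cands: next(script))
```

Only the subsets actually chosen are ever visited, which is the point of the approach: the rest of the rule set never has to be displayed, and several neighborhood relations could be combined simply by merging the candidates they return.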

Figure 5. The neighborhood relation allows navigating among the subsets (legend: neighborhood relation, explored subsets)

The novelty here is not focusing on rule subsets but navigating among them, which complements and reinforces the focusing. In the literature, the tools for rule validation are supplied with basic focus functionalities for sorting and filtering by explicit queries or by more intuitive visual dynamic queries, generally on items and quality measures (cf. the tools described in section 2 and the textual rule browser described in [25]). Though it would be possible to carry out the rule rummaging task with these classical functionalities, this would be very tedious because the user would have to implement the neighborhood relations by queries on his/her own. Indeed, querying only allows one to question the whole rule set and then to refine the results by questioning the last outputs. This does not suit the rummaging task.

3.3 Designing a rule visualization

The user needs a visual representation that allows him/her to drive the rule rummaging process. We can take advantage of the fact that the user makes a succession of local explorations by only representing the current subset at each step of the exploration. This allows fewer rules to be drawn on the screen at the same time and so improves the intelligibility of the visualization. As the rule rummaging task falls into the framework of unsupervised knowledge discovery, the representation for each rule subset must highlight the quality measures, and the good quality rules have to be brought to the fore. The visualization should also be able to integrate many measures (not only support and confidence, as is often the case) and to represent rules with as many items as necessary. Finally, the subset representation must support large numbers of rules while remaining comprehensible and must allow the rules to be filtered dynamically on quality measures.

With regard to the navigation requirements, the neighborhood relations must be triggered from the visualization. In order to do so, interactive controls for subset selection should be integrated in the visual representation or provided in the user interface or in a separate interface. Navigation would also be improved by inserting minimum contextual information about the subsets that can be reached from the current one. For example, being able to anticipate that a subset is worth exploring would be of great help.

4. VISUALIZATION FOR RULE RUMMAGING

In this section, the rules are described by three quality measures: support, confidence [2] and implication intensity. Implication intensity is a probabilistic index which evaluates the statistical surprisingness of a rule [13, 6]. This choice of measures is not a limitation of our approach and other indexes can be used.

4.1 Relation of specialization/generalization

To structure the rule set for subset focusing, we group together the rules into subsets identified by the name of an itemset. Each subset contains two kinds of rules: the specific and the general ones. The specific rules are all the rules with the subset name on the left-hand side, and the general rules are all the rules that can be built from the items in the subset name (refer to [5] for a formal definition). For example, in the subset named ABC where letters denote items, the specific rules are (A,B,C) → (D), (A,B,C) → (E), (A,B,C) → (F) and so on, and the general rules are (A,B) → (C), (A,C) → (B) and (B,C) → (A). Only rules with one item on the right-hand side are considered with this relation (this restriction is often made to make the rules more intelligible).
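To make the two kinds of rules concrete, here is a minimal sketch that enumerates the candidate specific and general rules of a subset from its name. The function names are hypothetical, the formal definition is the one given in [5], and the sketch only lists candidate rules: in practice a subset would contain only those that were actually extracted. Skipping rules with an empty left-hand side for single-item subsets is an assumption of ours.

```python
from typing import FrozenSet, Iterable, List, Tuple

# A rule is (left-hand side items, single right-hand side item).
Rule = Tuple[Tuple[str, ...], str]


def specific_rules(subset: FrozenSet[str], all_items: Iterable[str]) -> List[Rule]:
    """Candidate rules with the whole subset name on the left-hand side."""
    lhs = tuple(sorted(subset))
    return [(lhs, item) for item in sorted(all_items) if item not in subset]


def general_rules(subset: FrozenSet[str]) -> List[Rule]:
    """Candidate rules built only from the items of the subset name."""
    items = sorted(subset)
    rules = []
    for rhs in items:
        lhs = tuple(i for i in items if i != rhs)
        if lhs:  # assumption: no rule with an empty left-hand side
            rules.append((lhs, rhs))
    return rules


abc = frozenset("ABC")
specific = specific_rules(abc, "ABCDEF")  # (A,B,C) -> D, E, F
general = general_rules(abc)              # (B,C) -> A, (A,C) -> B, (A,B) -> C
```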

Figure 6. The relations of specialization and generalization among the rule subsets (nodes: subsets A, B, C, AB, AC, BC and ABC)

We use as neighborhood relation a relation of specialization among the subsets and its symmetrical relation of generalization (see the graph on figure 6). Specializing a subset amounts to adding an item to its name, whereas generalizing removes an item from the name. These relations thus allow the user to navigate gradually inside the rule set from the general to the specific rules, or in the other direction from the specific to the general ones. The rules are more specific (respectively more general) in the sense that they involve more items (resp. fewer items) and concern fewer records (resp. more records) in the data set, i.e. their supports are smaller (resp. higher). Such relations make sense for the user by allowing the study of more and more particular phenomena or more and more global phenomena. For example, the user finds in a subset an interesting rule (pizza, crisps) → (beer) and decides to examine this population who likes pizzas, crisps and beer. To answer the question "Which other items are purchased with pizzas, crisps and beer?", he/she just has to specialize the current subset.

4.2 Visual metaphor

We here describe how we have chosen to map the rules to visual form to support the rule rummaging process. To represent each rule subset, we use a 3D "information landscape" representation [9]. This visualization method, associated with viewpoint control functionalities, has been shown to be efficient for browsing wide information corpuses (for example, large file system hierarchies with Silicon Graphics' FSN, hypertext document graphs with Harmony [3], or websites with Bray's Web visualization [8]), which is what we need with our large rule sets. Moreover, with the use of 3D, volumes can be exploited as marks in the landscape space, which offers more graphical properties per mark and so allows a maximum of quality measures to be represented. Here we symbolize the rules by spheres put on top of cones in the landscape, thus obtaining three straightforward graphical characteristics to represent quality measures: sphere diameter, cone height and color. For each rule subset, the landscape is divided into two areas: one is dedicated to the specific rules, and the other one to the general rules (figure 7). The representation size depends only on the number of rules in the subset and not on the number of items.

As spatial position is the perceptually dominant visual coding, the most important quality measures are represented by using a spatial mapping in order to be highlighted. Several rules can have the same quality, therefore only two axes can be used to encode the measures, leaving the third dimension free to allow rules to be spread over it. All things considered, we have chosen to use only one axis for the spatial mapping and so to spatially represent only one quality measure: in the specific rule and in the general rule areas, the spheres on cones are laid out in the landscape on a wooden arena arranged in steps, which means that the further away a sphere is, the higher it is placed (figure 8). This allows a better perception of the depth dimension and reduces occlusions, thus mitigating two main drawbacks of 3D. As can be seen on figures 7 and 8, the lawn and the sky in the landscape are displayed and some trees are dispersed in order to give the user some points of reference while making the representation as intuitive and natural as possible.

Figure 7. Two arenas facing each other in a landscape representation

Weighing it all up, we have opted for the following visual encodings for the rule subsets (see figure 9):
– the sphere and cone position represents the implication intensity,
– the sphere visible area represents the support,
– the cone height represents the confidence,
– the sphere and cone color is used redundantly to represent a weighted mean of the three measures, which gives a synthetic idea of the rule quality.

More precisely, a large red sphere laid out on a tall cone placed at the front of an arena, on the lower steps, represents a rule whose support, confidence and implication intensity are high, while a little blue sphere laid out on a small cone placed at the back, on the upper steps, represents a rule whose three measures are weak.
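The encodings listed above can be illustrated with a short sketch that turns the three measures of a rule into glyph parameters. Several points are assumptions made for illustration only: the measures are taken to be already rescaled to [0, 1] within the subset, the color ramp is a simple linear blend from blue to red, the weights of the mean are equal by default, and the names `Glyph` and `encode_rule` are hypothetical. The prototype's actual scaling is not specified here.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Glyph:
    depth: float                       # 0.0 = front (lower steps), 1.0 = back (upper steps)
    sphere_radius: float
    cone_height: float
    color: Tuple[float, float, float]  # (red, green, blue), each in [0, 1]


def encode_rule(support: float, confidence: float, intensity: float,
                weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
                max_radius: float = 1.0, max_height: float = 1.0) -> Glyph:
    """Map three measures, assumed rescaled to [0, 1], to a sphere-on-cone glyph."""
    # Position: high implication intensity is placed at the front of the arena.
    depth = 1.0 - intensity
    # Visible area encodes support, so the radius grows with its square root.
    radius = max_radius * math.sqrt(support)
    # Cone height encodes confidence.
    height = max_height * confidence
    # Color redundantly encodes a weighted mean of the three measures.
    w_s, w_c, w_i = weights
    quality = (w_s * support + w_c * confidence + w_i * intensity) / (w_s + w_c + w_i)
    color = (quality, 0.0, 1.0 - quality)  # assumed ramp: blue (weak) to red (strong)
    return Glyph(depth, radius, height, color)


glyph = encode_rule(support=0.25, confidence=0.8, intensity=1.0)
```

Taking the square root for the radius keeps the visible area, rather than the diameter, proportional to support, which matches the stated encoding; swapping the roles of the measures only requires changing the assignments inside `encode_rule`.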

Figure 8. The rules are symbolized by spheres on top of cones

Ideally, this metaphor should be adapted for each user: one can choose for example to change the mappings or to represent one more quality measure with color or by using two axes for the spatial mapping. Furthermore, some complementary text labels appear above each sphere to display the corresponding rule name. They provide the numerical values for support, confidence and implication intensity too and thus complete the qualitative information given by the visual representation.

Finally, for each rule subset, the user can filter the rules in the landscape thanks to dynamic queries (first reported by Williamson and Shneiderman [40]) on the quality measures via sliders. Only the rules with all the measures between the lower and the upper thresholds are represented.
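The filtering criterion just described is simple enough to be written down directly. The following sketch assumes that each rule is a dictionary of measures and that the slider positions are given as (lower, upper) pairs; the function name, the data layout and the three toy rules are illustrative assumptions.

```python
from typing import Dict, List, Tuple

RuleMeasures = Dict[str, float]


def filter_rules(rules: Dict[str, RuleMeasures],
                 thresholds: Dict[str, Tuple[float, float]]) -> List[str]:
    """Keep the rules whose every thresholded measure lies within its bounds."""
    return [
        name
        for name, measures in rules.items()
        if all(low <= measures[m] <= high for m, (low, high) in thresholds.items())
    ]


# Toy rule subset with made-up values.
rules = {
    "r1": {"support": 0.4, "confidence": 75.0, "intensity": 0.8},
    "r2": {"support": 0.8, "confidence": 93.0, "intensity": 0.1},
    "r3": {"support": 0.1, "confidence": 87.0, "intensity": 0.9},
}

# Slider state: confidence between 70 and 100, intensity between 0.5 and 1.0.
visible = filter_rules(rules, {"confidence": (70.0, 100.0), "intensity": (0.5, 1.0)})
```

Here `r2` is hidden because its implication intensity falls below the lower threshold, even though its confidence is the highest of the three.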

Figure 9. Metaphor description (labels in the figure: surprising rules, little surprising rules, support, confidence)

4.3 Interactions

The user can interact in three different ways with the visual representation: by visiting a rule subset, by navigating from subset to subset, and by filtering the rules on the quality measures in a subset.

For each rule subset, when the browsing of a 3D landscape begins, the user is placed between the two arenas and benefits from an overall and synthetic view. With this comprehensive vision, the chosen visual mappings emphasize the good quality rules, which are easier to see and to reach than the worse rules. The user can then wander freely over the two arenas in the 3D landscape to browse the rules and examine them more closely (standard primitives for viewpoint control are implemented in the 3D space browsers: walk, fly, or rotate and zoom). To facilitate the user's browsing, there also exist predefined viewpoints in each landscape for the overall vision of each arena and for the close vision of each sphere.

To drive the rule rummaging process, the user just has to click on the spheres or cones in the visual representation. By clicking on a specific rule (respectively a general rule) in a subset, he/she triggers the relation of specialization (resp. generalization), the current rule subset is thus replaced by a new more specific subset (resp. a new more general subset), and the visualization is updated. For example, in the subset named ABC, the specific rule (A,B,C) → (D) is an access to the subset ABCD, and the general rule (A,B) → (C) is an access to the subset AB. At any time during the rummaging process, the user can decide to use one or the other of the neighborhood relations and thus change the navigation direction. Also, to assist the navigation, some marbles and small cones are visible by transparency inside each sphere (figure 9). They are a shrunk version of the spheres and cones of the next subset, i.e. the one that will be reached by clicking on the sphere. This allows the user to anticipate that a subset contains good quality rules or, on the contrary, contains no rules at all.
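The effect of a click on the current subset can be summarized by a small function that returns the name of the subset reached. This is a sketch under the conventions of the example above (subsets named by itemsets, rules with a single right-hand side item); the function name and the rule representation are assumptions of ours, not the prototype's code.

```python
from typing import FrozenSet, Iterable, Tuple

Rule = Tuple[Tuple[str, ...], str]  # (left-hand side items, right-hand side item)


def next_subset(current: Iterable[str], clicked_rule: Rule) -> FrozenSet[str]:
    """Return the subset reached by clicking on a rule of the current subset."""
    current_set = frozenset(current)
    lhs, rhs = clicked_rule
    lhs_set = frozenset(lhs)
    if lhs_set == current_set and rhs not in current_set:
        # Specific rule: specialization adds the right-hand side item to the name.
        return current_set | {rhs}
    if lhs_set | {rhs} == current_set:
        # General rule: generalization removes the right-hand side item.
        return lhs_set
    raise ValueError("the clicked rule does not belong to the current subset")


# From subset ABC: (A,B,C) -> (D) leads to ABCD, and (A,B) -> (C) leads to AB.
specialized = next_subset("ABC", (("A", "B", "C"), "D"))
generalized = next_subset("ABC", (("A", "B"), "C"))
```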

5. CONCLUSION

In this paper, we have presented a visualization specially designed to support association rule validation in a KDD process. This new approach is based on our rule rummaging model. It enables the user to drive his/her navigation through the voluminous rule set by trial and error via the successive limited subsets he/she focuses on. The visualization thus helps him/her comprehend the bulk of rules and find the relevant ones for decision-making. We have chosen to use an intuitive 3D information landscape representation capable of integrating a large number of rules described by several quality measures while remaining intelligible. Moreover, contrary to most association rule set representations, our visualization takes advantage of both the items, used in the interactions to dynamically change the visualized rule subset, and the quality measures, used in the visual representation.

A rule rummaging tool implementing our visualization method has been developed. The 3D landscape is generated dynamically in VRML by a CGI program according to the outputs of the Felix rule retrieval program [28]. The visualization prototype runs in a standard web browser equipped with a suitable VRML plug-in. It can be used with shutter glasses to provide a stereoscopic display in order to improve the perception of depth.

Our future work will mainly concern:
– implementing additional neighborhood relations to improve the navigation through the rules,
– coupling the visualization with local extraction algorithms that the user would drive interactively via the representation in order to produce only the necessary rules.

6.

REFERENCES

[1] Aggarwal, C.C. 2002. Towards effective and interpretable data mining by visual interaction. SIGKDD Explorations 3(2), 11-22.
[2] Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD'93, 207-216.
[3] Andrews, K. 1995. Visualising cyberspace: information visualisation in the Harmony internet browser. In Proc. of the IEEE Symposium on Information Visualization InfoVis'95, 97-104.
[4] Bhandari, I. 1994. Attribute focussing: machine-assisted knowledge discovery applied to software production process control. Knowledge Acquisition 6, 271-294.
[5] Blanchard, J., Guillet, F., and Briand, H. 2003. A virtual reality environment for knowledge mining. In Proc. of the 14th Mini-Euro Conference on Human Centered Processes, 175-179.
[6] Blanchard, J., Kuntz, P., Guillet, F., and Gras, R. 2003. Implication intensity: from the basic statistical definition to the entropic version. In Statistical Data Mining and Knowledge Discovery, Chapman & Hall / CRC, H. Bozdogan (Ed.), 475-487.
[7] Brachman, J.R., and Anand, T. 1996. The process of knowledge discovery in databases: a human-centered approach. In Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, U.M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth (Eds.), 37-58.
[8] Bray, T. 1999. Measuring the Web. In Readings in Information Visualization: Using Vision To Think, Morgan Kaufmann Publishers, S. Card, J.D. Mackinlay, and B. Shneiderman (Eds.), 469-492.
[9] Chalmers, M., Ingram, R., and Pfranger, C. 1996. Adding imageability features to information displays. In Proc. of ACM Symposium on User Interface Software and Technology, ACM Press, 33-39.
[10] Fayyad, U.M., Piatetsky-Shapiro, G., and Smyth, P. 1996. From data mining to knowledge discovery: an overview. In Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (Eds.), 1-34.
[11] Fayyad, U.M., Grinstein, G.G., and Wierse, A. (Eds.). 2001. Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann Publishers.
[12] Fukuda, T., Morimoto, Y., Morishita, S., and Tokuyama, T. 2001. Data mining with optimized two-dimensional association rules. ACM Transactions on Database Systems 26(2), 179-213.
[13] Gras, R. et coll. 1996. L'Implication Statistique - Nouvelle Méthode Exploratoire de Données. La pensée sauvage éditions (in French).
[14] Grinstein, G.G. 1996. Harnessing the human in knowledge discovery. In Proc. of the 2nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, AAAI Press, 384-385.
[15] Han, J., Chiang, J., Chee, S., Chen, J., Chen, Q., Cheng, S., Gong, W., Kamber, M., Koperski, K., Liu, G., Lu, Y., Stefanovic, N., Winstone, L., Xia, B., Zaiane, O.R., Zhang, S., and Zhu, H. 1997. DBMiner: a system for data mining in relational databases and data warehouses. In Proc. of CASCON'97: Meeting of Minds, 249-260.
[16] Han, J., and Cercone, N. 2000. RuleViz: a model for visualizing knowledge discovery process. In Proc. of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, 244-253.
[17] Han, J., and Cercone, N. 2000. AViz: a visualization system for discovering numeric association rules. In Proc. of PAKDD 2000, 269-280.
[18] Hao, M.C., Dayal, U., Hsu, M., Sprenger, T., and Gross, M.H. 2001. Visualization of directed associations in e-commerce transaction data. In Proc. of VisSym'01, 185-192.
[19] Herman, I., Melancon, G., and Marshall, M.S. 2000. Graph visualization and navigation in information visualization: a survey. IEEE Transactions on Visualization and Computer Graphics 6(1), 24-43.
[20] Hilderman, R., and Hamilton, H. 2001. Knowledge discovery and measures of interest. Kluwer Academic Publishers.
[21] Hofmann, H., and Wilhelm, A. 2001. Visual comparison of association rules. Computational Statistics 16(3), 399-415.
[22] Huang, M.L., Eades, P., and Wang, J. 1998. Online animated graph drawing using a modified spring algorithm. Journal of Visual Languages and Computing 9(6).
[23] Imielinski, T., and Mannila, H. 1996. A database perspective on knowledge discovery. Communications of the ACM 39(11), 58-64.
[24] Keim, D.A. 2002. Information visualization and visual data mining. IEEE Transactions on Visualization and Computer Graphics 8(1), 1-8.
[25] Klemettinen, M., Mannila, H., and Toivonen, H. 1996. Interactive exploration of discovered knowledge: a methodology for interaction, and usability studies. Technical report C-1996-3, University of Helsinki.
[26] Kopanakis, I., and Theodoulidis, B. 2001. Visual data mining and modeling techniques. In KDD 2001 Workshop on Visual Data Mining.
[27] Kuntz, P., Guillet, F., Lehn, R., and Briand, H. 2000. A user-driven process for mining association rules. In Proc. of the 4th European Conference of Principles of Data Mining and Knowledge Discovery, Springer, L.N.A.I. 1910, 160-168.
[28] Lehn, R., Guillet, F., Kuntz, P., Briand, H., and Philippé, J. 1999. Felix: an interactive rule mining interface in a KDD process. In Proc. of the 10th Mini-Euro Conference on Human Centered Processes, 169-174.
[29] Liu, B., Hsu, W., and Ma, Y. 1999. Pruning and summarizing the discovered associations. In Proc. of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, 125-134.
[30] Liu, B., Hu, M., and Hsu, W. 2000. Multi-level organization and summarization of the discovered rules. In Proc. of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 208-217.
[31] Montgomery, H. 1983. Decision rules and the search for dominance structure: toward a process model of decision-making. In Analyzing and Aiding Decision Processes, P.C. Humphreys, O. Svenson, and A. Vari (Eds.), 471-483.
[32] Nagel, H.R., Granum, E., and Musaeus, P. 2001. Methods for visual mining of data in virtual reality. In PKDD 2001 International Workshop on Visual Data Mining.
[33] North, S. 1995. Incremental layout in DynaDAG. In Proc. of the Symposium on Graph Drawing GD'95, Springer, L.N.A.I. 1027, 409-418.
[34] Ong, K.H., Ong, K.L., Ng, W.K., and Lim, E.P. 2002. CrystalClear: active visualization of association rules. In ICDM'02 International Workshop on Active Mining AM2002.
[35] Rainsford, C.P., and Roddick, J.F. 2000. Visualisation of temporal interval association rules. In Proc. of the 2nd International Conference on Intelligent Data Engineering and Automated Learning, Springer, 91-96.
[36] Silberschatz, A., and Tuzhilin, A. 1996. User-assisted knowledge discovery: how much should the user be involved. In ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery.
[37] Simoff, S.J. 2001. Form-semantics-function, a framework for designing visualization models for visual data mining. In KDD 2001 Workshop on Visual Data Mining.
[38] Suzuki, E. 1997. Autonomous discovery of reliable exception rules. In Proc. of the 3rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, AAAI Press, 259-262.
[39] Tan, P., Kumar, V., and Srivastava, J. 2002. Selecting the right interestingness measure for association patterns. In Proc. of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, 32-41.
[40] Williamson, C., and Shneiderman, B. 1992. The dynamic HomeFinder: evaluating dynamic queries in a real-estate information exploration system. In Proc. of the 15th ACM SIGIR Conference on Research and Development in Information Retrieval, ACM Press, 338-346.
[41] Wong, P.C., Whitney, P., and Thomas, J. 1999. Visualizing association rules for text mining. In Proc. of the IEEE Symposium on Information Visualization InfoVis'99, 120-123.
