A scalable association rule visualization towards displaying large amounts of knowledge

Olivier Couturier 1, Tarek Hamrouni 2, Sadok Ben Yahia 2, Engelbert Mephu Nguifo 1

1 CRIL CNRS FRE 2499, Université d’Artois, IUT de Lens, Rue de l’université, SP-16, F-62307 Lens Cedex, France. {couturier, mephu}@cril.univ-artois.fr
2 Département des Sciences de l’Informatique, Faculté des Sciences de Tunis, Campus Universitaire, 1060 Tunis, Tunisie. {tarek.hamrouni, sadok.benyahia}@fst.rnu.tn

Abstract
Providing users with efficient and easy-to-use graphical tools is a promising challenge in data mining (DM). Such tools must be able to generate explicit knowledge and to present it back to the user. Visualization techniques have proven to be an efficient way to achieve this goal. Although it is considered a key step in the mining process, the visualization of association rules has received much less attention than their extraction. Nevertheless, some graphical tools have been developed to extract and visualize association rules. In those tools, various approaches are proposed to filter the huge number of association rules before the visualization step. However, both DM steps (association rule extraction and visualization) are treated separately, in a one-way process. Our approach differs in that it uses meta-knowledge to guide the user during the mining process. Standing at the crossroads of DM and Human-Computer Interaction (HCI), we present an integrated framework covering both steps of the DM process. Furthermore, our approach can easily integrate previous techniques of association rule visualization.

Keywords: Association rule mining, Clustering, Visualization, Human Computer Interaction.

1 Introduction

Data mining (DM) techniques have been proposed and studied to help users better understand and scrutinize huge amounts of collected and stored data. In this respect, extracting association rules has attracted the interest of the DM community. The last decade has thus been marked by a determined algorithmic effort to reduce the computation time of the interesting itemset extraction step. The success obtained is primarily due to a substantial programming effort combined with strategies for compacting data structures in main memory. However, this frenzied activity seems to have lost sight of the essential objective of this step, i.e., extracting reliable knowledge of a size that users can exploit. Indeed, the unmanageably large association rule sets, compounded with their low precision, often make the perusal of knowledge ineffective and its exploitation time-consuming and frustrating for users. Moreover, this still young field seems to deliver results that run counter to the evolving "knowledge management" topic. DM techniques therefore benefit from presenting discovered knowledge in an interactive, graphical form that often fosters new insights. The user is thus encouraged to form and validate new hypotheses, with the aim of solving problems better and gaining deeper domain knowledge [7]. Visual data analysis is a way to achieve this goal, especially when it is tightly coupled with the management of meta-knowledge used to handle the vast amounts of extracted knowledge.

We propose in this paper to exploit generic bases [2] in order to losslessly reduce the number of rules. Such bases contain the most informative association rules, i.e., those with minimal premises (minimal generators) and maximal conclusions (closed itemsets). Their use can be seen as a "sine qua non" condition to prevent the visualization step from falling short when dealing with large amounts of rules. In fact, visualization techniques are frequently employed to present the extracted knowledge in a form that is understandable to humans, since information presented in the form of images is more direct and more easily understood [6]. This makes graphical visualization tools more appealing when handling large datasets with complex relationships.

During the last few years, different graphical approaches have been proposed to present association rules in order to solve the major problem mentioned above. Various works already exist to help user analysis. Their common main drawback is that they do not offer a global and detailed view at the same time.
Indeed, no existing representation satisfies this requirement, which is crucial in Human Computer Interaction (HCI) [20]. In our work, we are interested in presenting a scalable graphical-based visualization prototype, called CBVAR (1), for handling association rules. Two features of our prototype are noteworthy:
1. It encompasses the step of generic association rule extraction. As output, this step provides an XML file recording the set of generic association rules.
2. It provides a "guided" exploration based on a clustering of association rules. Such clustering implements Shneiderman’s recommendation "Overview first...". The "... then details on demand" part is provided in a fisheye view fashion once the user puts the focus on the desired cluster. Relying on the "divide and conquer" idea, this clustering makes it possible to partition the association rule set into smaller pieces of knowledge that can be easily handled.
Interestingly enough, the scope of the prototype is not limited to generic bases, since CBVAR allows the visualization of any set of association rules. The choice of handling generic bases is nevertheless quite natural. Indeed, they offer a promising way to losslessly reduce the overwhelming number of association rules that can be extracted even from small real-life datasets. Moreover, these generic bases present an intrinsic structure that makes it possible to automatically recommend the most adequate graphical visualization technique.

The remainder of this paper is organized as follows. Section 2 is dedicated to a detailed description of the CBVAR visualization prototype, after which Section 3 presents its evaluation. Section 4 presents the experiments carried out to validate our approach. Section 5 provides a review of related work and discusses in depth the visualization scalability issue. The last section ends with conclusions and future work.
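To make the notion of generic rule used in this introduction more concrete, the following minimal Python sketch derives exact generic rules (minimal generator → closure minus generator) from a toy transaction database. It is only an illustration: the toy data, the function names and the brute-force enumeration are assumptions of ours, not the extraction algorithm embedded in the prototype, and approximate generic rules are left out.

```python
from itertools import combinations

# Toy transaction database (hypothetical, for illustration only).
TRANSACTIONS = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("bd"),
    frozenset("abd"),
]


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def closure(itemset, transactions):
    """Largest itemset shared by all transactions containing `itemset`."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return frozenset(itemset)
    return frozenset.intersection(*covering)


def is_minimal_generator(itemset, transactions):
    """True if dropping any single item strictly increases the support.

    Because support is anti-monotone, checking the subsets obtained by
    removing one item is enough to cover all proper subsets.
    """
    s = support(itemset, transactions)
    return all(support(itemset - {i}, transactions) > s for i in itemset)


def exact_generic_rules(transactions, minsup):
    """Brute-force enumeration of exact rules g -> closure(g) minus g."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            g = frozenset(combo)
            if support(g, transactions) < minsup:
                continue
            if not is_minimal_generator(g, transactions):
                continue
            c = closure(g, transactions)
            if c != g:  # premise strictly smaller than its closure
                rules.append((g, c - g))
    return rules
```

On this toy database, with an absolute minsup of 2, the sketch yields the three rules a → b, c → ab and d → b: each premise is a minimal generator and each conclusion completes it to its closed itemset.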
2 CBVAR (Clustering-based Visualizer of Association Rules)
2.1 The prototype

Human Computer Interaction (HCI) works focus on the Visual Information-Seeking Mantra formulated by Shneiderman: "Overview first, zoom and filter, then details on demand". First of all, graphical tools must provide a global view of the system so that the user can identify its main points. Thanks to this global view, the user must be able to determine the starting point of his analysis. During his analysis, he must be able to explore particular areas in depth if desired. Indeed, not all details need to be displayed at the same time. Unfortunately, none of the current representations respects this crucial point, which is necessary in order to visualize a large set of knowledge. Our purpose is to obtain a global and detailed view at the same time, which is not the case in related work. Indeed, the key to the success of a visualization prototype is full compliance with such recommendations.

In the following, we present a detailed description of the CBVAR prototype, which is implemented in JAVA and runs on UNIX systems (2). The main feature of CBVAR is that it offers an integrated framework for extracting and shrewdly visualizing association rules. It thus avoids modifying the output format of existing association rule mining algorithms to fit the input requirements of the visualization module. The user only has to provide a data file and, at the end of the process, association rules can be visualized.

CBVAR is composed of two independent parts: generation of clusters and visualization of clustered association rules. The first part is not detailed in this paper because it does not concern visualization. Briefly, in order to obtain our clusters, we use several existing tools which are gathered within a shell script. The well-known K-MEANS algorithm [19] is applied to the data. As output, we obtain disjoint clusters such that each one contains a set of association rules of reduced cardinality. The choice of the K-MEANS algorithm is motivated by the fact that it is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. Moreover, the K-MEANS algorithm makes it possible to obtain a significant set of clusters [22]. The visualization part is described hereafter.

(1) CBVAR stands for Clustering-based Visualizer of Association Rules.
(2) Available at http://www.cril.univ-artois.fr/~couturier/cbvar/.
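The clustering step is not detailed in the paper, so the following is only a minimal sketch of how K-MEANS could partition a rule set into disjoint clusters. It assumes, purely for illustration, that each rule is described by the numeric pair (support, confidence); the feature choice, the deterministic initialization and the function name `kmeans` are hypothetical and do not describe the shell-script tool chain used by the prototype.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns one cluster index per point.

    Initialization is deterministic (the first k points are the initial
    centroids) so that the example is reproducible.
    """
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])),
            )
            for pt in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Hypothetical rules described by (support, confidence).
rule_features = [(0.90, 0.95), (0.10, 0.20), (0.85, 0.90), (0.15, 0.25)]
labels = kmeans(rule_features, k=2)
```

Each rule receives exactly one label, so the resulting clusters are disjoint, as required by the visualization part.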
2.2 Association rule visualization: Unraveling from global to detailed views
All graphical representations present advantages and drawbacks. For example, trees and graphs are easily interpretable thanks to their hierarchical structure, but they become unreadable on dense data. 2D and 3D matrices use colors in order to highlight metrics, but they require cognitive effort. Virtual reality brings the user into the process so that he can focus on a data subset, but the choice of metaphor requires a good understanding of the data. No representation is the best in every respect. Nevertheless, the 3D matrix is a good compromise. We focused on the "itemsets-itemsets" 3D matrix, which is based on a 2D matrix but exploits the third dimension. The different kinds of association rule representations also present advantages and limitations, which must be taken into account depending on users’ preferences. All these representations have a common limitation: they are not global and detailed at the same time. Indeed, global representations quickly become unreadable, whereas detailed representations do not present all the information. One of the limitations of the existing representations is thus that they display all rules in the same screen space: the user’s visibility and understanding decrease as the number of rules grows. In addition, these representations are not adapted to the visualization of clusters.

Figure 1: Displaying clustered association rules with CBVAR.

Several kinds of known representations have been proposed to study a large set of information, such as lenses and hyperbolic trees [16], perspective walls [18], fisheye views (FEV) [11] or superposition of transparent views [14]. In our previous work, we showed that FEV is adapted to this kind of problem [9]. Indeed, FEV shows the point of interest in detail and the overview of the display in the same representation. This is done by distorting the picture which, for this purpose, is not uniformly scaled. Objects far away from the focus point are shrunk while objects near the focus point are magnified. The degree of visual distortion depends on the distance from the focus point. Several tools implement FEV, such as AISEE (3), INFOVIS [10] and LARM [8]. A first approach using a FEV in an association rule visualization method was proposed in [9]. In that work, the authors exploited a merging representation in order to represent a set of rules, the focus point being a detailed rule. In a similar manner, we use it in our new prototype to visualize a set of clusters. Thus, we merge both 2D and 3D representations.

The 2D representation gives a global view of the association rules by showing one rule of each cluster. This rule is selected according to an interestingness measure such as lift [4], support or confidence [1]. Note that the lift measure is used by default, i.e., when the user does not make any choice. Each area thus represents one rule of the corresponding cluster and, if FEV is active, we use the focus point of the FEV to detail the corresponding cluster. To do so, the magnified focus point displays either a 3D cabinet-projection representation that handles occlusions [8], for sparse contexts, or a 2D representation, for dense contexts. It is worth noting that a dense context is characterized by the presence of at least one exact rule, since minimal generators are different from their respective closures, which is not the case for sparse contexts. In this view, users can vary several metrics in real time: support and confidence. Moreover, the main information and the rules of the cluster are given in text areas.

CBVAR is dedicated to cluster visualization. Its purpose is not to present in-depth information about the current cluster but to visualize all clusters at the same time. With a double click on the chosen cluster, we propose to launch LARM, which is able to provide this detailed information. The main purpose of this hybrid representation is to combine the advantages of both representations in order to solve our main problem. The next section presents an evaluation of our prototype.

(3) Available at http://www.aisee.com.
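The selection of the rule that represents each cluster in the 2D overview can be sketched as follows. This is a minimal illustration under stated assumptions: the `Rule` record, its field names and the `representative` helper are hypothetical, and lift is computed with its usual definition, confidence(X → Y) divided by support(Y), with supports expressed as relative frequencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical record for a rule premise -> conclusion."""
    premise: str
    conclusion: str
    support: float             # relative support of premise and conclusion together
    confidence: float          # support(rule) / support(premise)
    conclusion_support: float  # relative support of the conclusion alone

    @property
    def lift(self):
        # lift > 1 means premise and conclusion occur together more
        # often than expected if they were independent.
        return self.confidence / self.conclusion_support


def representative(cluster, measure="lift"):
    """Rule with the highest value of `measure` (lift when not specified)."""
    return max(cluster, key=lambda rule: getattr(rule, measure))


cluster = [
    Rule("a", "b", support=0.4, confidence=0.8, conclusion_support=0.5),
    Rule("c", "d", support=0.3, confidence=0.9, conclusion_support=0.9),
]
by_default = representative(cluster)                  # ranked by lift
by_confidence = representative(cluster, "confidence")
```

In this example the default choice is a → b, whose lift exceeds that of c → d, whereas ranking by confidence selects c → d; this shows why the measure chosen by the user changes the overview.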
3 Evaluation
3.1 Methodology

We carried out a preliminary evaluation of our cluster representation. The aim of this study was to establish whether people can understand the representation of a set of clusters. The dataset we used in our evaluation is divided into five clusters. Moreover, we wished to evaluate how intuitive our interface is. To avoid biasing the results, we selected a set of computer science students who had no prior knowledge of the toy example. Each test involves one student and one observer. Each student goes through three parts: a free exploration (FE), a demonstration (Demo) and a second exploration (SE). The total evaluation time is 15 minutes.

Firstly, during the FE part, which lasts five minutes, the student is placed in front of our representation running on a computer. He is given no instructions on the software or on its functionalities. His aim is just to explore as he wishes. However, he must comment on his actions and explain what the visualization represents for him. During this step, an observer is therefore needed, not to help the student but to transcribe his actions and comments. The main questions on the evaluation form aimed to determine whether or not the student established the links within the subsets of data (i.e., the clusters), and how the student reacted to the 2D and 3D notions. The second step (Demo) is a three-minute demonstration of the main functionalities by the observer. Finally, the student can carry out a last exploration (SE) for seven minutes. In this part, the aim was to question the student about the software (functionalities, ease of use, etc.). This part was just an informal discussion.
3.2 Results
Our sample is small and does not allow statistically significant conclusions. However, the evaluation methodology used (discount usability testing) [21] provides a sound evaluation of how real users initially perceive the visualization methodology of CBVAR. This kind of approach is efficient in highlighting the main advantages and drawbacks of a user interface. Some major outlines [21] emerged during this evaluation.

On the one hand, all students understood without explanation that the visualization represents several data subsets. They did not know FEV notions, but they used words such as groups or sets to describe their perception of the representation. Eight students discovered that the 2D/3D options (sliders and radio buttons) depended on the representation. The comments of six students relate to the size of the data (much, large, important). We can draw two preliminary conclusions. Firstly, CBVAR seems to be intuitive for representing subsets of data. Secondly, CBVAR seems to be relevant for the students to analyze large datasets. These two points support the visual approach of CBVAR.

On the other hand, three students mentioned that the visual deformation helps to search for particular information. However, seven students thought that 2D is more comprehensible than 3D. Consequently, it is difficult for them to understand the information which is zoomed. Eight students nevertheless guessed that colors must have a meaning because they saw that colors vary. Several students criticized some functionalities of CBVAR, such as the interaction with the 3D representation needed to obtain detailed information. Six students understood that the cluster details are given as text, but they thought this representation is not easy to understand. A global conclusion is that the visualization process using CBVAR is user-friendly, and its overall perception is clearly positive.
Nevertheless, the functionalities of CBVAR can still be improved. In the next section, we present our experimental results.
4 Experimentations
It was worth the effort to assess in practice the potential benefits of the proposed prototype. Experiments were conducted on a Pentium IV 3 GHz with 1 GB of main memory, running Fedora Linux. The benchmark datasets used during these experiments are taken from the FIMI website (4). In the remainder, we put the focus on the MUSHROOM and the T10I4D100K datasets. The first dataset, composed of 8,124 transactions for 119 items, is considered a typical dense dataset. The second dataset, composed of 100,000 transactions for 1,000 items, is considered a typical sparse dataset. A large number of association rules is expected to be mined from both datasets, especially when lowering the minsup and minconf values. The number of extracted generic rules (denoted # generic rules) increases as we lower these two thresholds: from 537 (when minsup = 0.4 and minconf = 0.4) to 7,057 (when minsup = 0.2 and minconf = 0.2) for the MUSHROOM dataset, and from 594 (when minsup = 0.006 and minconf = 0.006) to 3,623 (when minsup = 0.004 and minconf = 0.004) for the T10I4D100K dataset. The number of clusters is chosen according to the number x of association rules that will be contained in each cluster. Our results are reported in Table 1.

Dataset      minsup   minconf   # rules      # generic rules   Without clusters   x = 100   x = 50   x = 25
MUSHROOM     0.4      0.4       7,020        537               828                375       375      594
MUSHROOM     0.3      0.3       94,894       2,184             2,250              1,782     3,250    5,750
MUSHROOM     0.2      0.2       19,197,504   7,057             -                  17,953    -        -
T10I4D100K   0.006    0.006     928          594               1,453              516       813      1,594
T10I4D100K   0.005    0.005     2,216        1,231             2,475              1,375     2,329    4,125
T10I4D100K   0.004    0.004     8,090        3,623             -                  6,078     11,750   27,000

The last four columns give the displaying time in ms; the columns x = 100, x = 50 and x = 25 correspond to a display with (# generic rules / x) clusters. A dash marks a setting for which no displaying time is reported.

Table 1: Displaying time for both MUSHROOM and T10I4D100K datasets vs. the variation of the number of clusters.

Without clustering, none of the XML files can be read in a single screen space because of the large number of rules. Our approach allows visualizing these benchmark datasets in the same screen space in approximately the same time with a number of clusters equal to (# generic rules / 100). Our tests with other numbers of clusters (i.e., for x = 25 and x = 50) generally require more time to be displayed. However, they make it possible to reduce the displaying time of each cluster taken separately from the others, since each cluster contains fewer association rules (fewer than 100). Table 1 also compares the total number of association rules (5) (denoted # rules) with that of generic ones. The statistics clearly show the benefit of generic bases for presenting users with a set of interesting association rules that is as compact as possible.
In order to interact in real time, it is necessary to load all rules in main memory. However, we can observe that with a minsup (resp. minconf) value equal to 0.2 (resp. 0.2) for the MUSHROOM dataset and with a minsup (resp. minconf) value equal to 0.004 (resp. 0.004) for the T10I4D100K dataset, it is no longer possible to load them in a classical visualization without clustering. Our approach allows visualizing the first dataset in 17,953 ms with (# generic rules / 100) clusters. The associated set contains 7,057 rules, which are represented in the same screen space. The second dataset is displayed with our approach for the different values of x used in our tests. In this case, the association rule set contains 3,623 rules. We did not use higher minsup values since the number of generated rules would then be too small and a classical representation could handle them, whereas our main focus is a scalable graphical-based visualization prototype.
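The arithmetic behind these figures can be illustrated with a short sketch. The rule counts come from Table 1; rounding the quotient (# generic rules / x) up to the next integer is an assumption of ours, since the paper does not state how a non-integer quotient is handled, and the function names are hypothetical.

```python
import math


def cluster_count(n_generic_rules, x):
    """Number of clusters when each cluster holds about x rules.

    Rounding up is an assumption: the paper only gives the quotient.
    """
    return math.ceil(n_generic_rules / x)


def reduction_ratio(n_rules, n_generic_rules):
    """How many times smaller the generic basis is than the full rule set."""
    return n_rules / n_generic_rules


# MUSHROOM with minsup = minconf = 0.2 (values taken from Table 1).
mushroom_clusters = cluster_count(7057, 100)
mushroom_ratio = reduction_ratio(19_197_504, 7057)

# T10I4D100K with minsup = minconf = 0.004 and x = 25.
t10_clusters = cluster_count(3623, 25)
```

Under this rounding assumption, the 7,057 generic rules of MUSHROOM are spread over 71 clusters for x = 100, and the generic basis is more than 2,700 times smaller than the 19,197,504 rules of the full set.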
5 Related work and scalability issue
Compared to the intense algorithmic effort devoted to extracting frequent (condensed) itemsets, only a few works have paid attention to visualizing association rules. The work of Hofmann et al. [15] mainly focused on the visual representation of association rules and presented the relationships between related rules using either interactive mosaic plots (histograms) or double decker plots. In both approaches, there is no focus on rule reduction and the user’s interaction is limited. Liu et al. [17], in contrast, presented an integrated system for finding interesting associations and visualizing them. Han and Cercone [13] proposed an interactive model for visualizing the entire process of DM. They mainly focus on the "parallel coordinates" technique to visualize rule induction algorithms. Blanchard et al. [3] proposed an experimental prototype, called ARVIS, which is intended to tackle the association rule validation problem by designing a human-centered visualization method for the rule rummaging task. This approach, based on a specific rummaging model, relies on rule interestingness measures and on interactive rule subset focusing and mining. The work of Zaki and Phoophakdee [24], introducing the MIRAGE tool, is the only one handling a generic basis of association rules (more precisely, that defined by the first author in [23]).
(4) http://fimi.cs.helsinki.fi/data.
(5) Provided by the implementation of B. Goethals available at: http://www.adrem.ua.ac.be/~goethals/software/.
Even though this basis is not informative (and, hence, entails information loss), as outlined in [12], this system presents a new framework for mining and visually exploring minimal rules (i.e., with minimal premises and also minimal conclusions) using a lattice-based interactive rule visualization approach. Buono and Costabile [5] described a visual strategy that exploits a graph-based technique and parallel coordinates to visualize the results of association rule mining algorithms. In all these methods, it remains extremely difficult to visualize and understand the association information of a large dataset even when all rules are available. Our approach helps to avoid such limitations.

In fact, the ultimate goal is to bring the power of visualization technology to every desktop to allow a better, faster, and more intuitive exploration of very large data resources [7]. Thus, we believe that the scalability of graphical visualizations is of paramount importance. The use of the term "scalability" in this context may be misleading: we mean by it that a user should be able to explore both large and small amounts of knowledge with the same ease. Usually, the foggy and hazy interface that a user has to face when browsing a large amount of association rules is never shown when graphical visualization systems are advertised (e.g., the PURPLE INSIGHT software (6), the ASSOCIATION RULE VIEWER (7) and the MIRAGE prototype [24]). Indeed, faced with the deluge of association rules, a user may have to rely on a tedious and boring use of his perceptual abilities to explore such a set of association rules. Even if we consider that we only have to visualize generic association rules, their sufficiently high number (8) can dissuade users from performing a thorough exploration.
So, to go beyond this limitation, we provide in this work a means to tackle the "scalability" issue by performing a clustering on the generic association rule set combined with the use of a FEV technique.
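The kind of distortion a FEV applies (objects near the focus magnified, distant objects shrunk, as described in Section 2.2) can be sketched in one dimension. The paper does not say which distortion function CBVAR uses; the sketch below swaps in the well-known graphical fisheye function of Sarkar and Brown, G(t) = (d + 1)t / (dt + 1), where t is the normalized distance from the focus and d a distortion factor. The function name `fisheye_1d` and its parameters are hypothetical.

```python
def fisheye_1d(x, focus, width, d=3.0):
    """Distort coordinate x on a screen axis [0, width] around `focus`.

    Uses the Sarkar-Brown function G(t) = (d + 1) * t / (d * t + 1),
    an assumed stand-in for the (unspecified) distortion of the prototype.
    With d = 0 the mapping is the identity; larger d magnifies more.
    """
    boundary = width if x >= focus else 0.0
    span = boundary - focus
    if span == 0:
        return x  # focus lies on the screen edge on this side
    t = (x - focus) / span           # normalized distance in [0, 1]
    g = (d + 1) * t / (d * t + 1)    # distorted normalized distance
    return focus + g * span


# Points 10 units apart on a 100-unit axis, focus in the middle.
near = fisheye_1d(60, focus=50, width=100) - fisheye_1d(50, focus=50, width=100)
far = fisheye_1d(100, focus=50, width=100) - fisheye_1d(90, focus=50, width=100)
```

With these values, the interval next to the focus is stretched while the interval next to the screen edge is compressed, and both the focus and the screen edges keep their positions, which is what keeps the overview and the detail in the same screen space.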
6 Conclusions and future work
In this paper, we presented the CBVAR prototype, which aims at a scalable visualization of association rule sets. Its main originality is that it uses back-end meta-knowledge to guide the exploration of association rules through their clustering. Furthermore, this meta-knowledge allows a cooperative exploration of generic bases of association rules. The experiments carried out showed that thousands of rules can be displayed in a few seconds and that the proposed clustering-based visualization can be an efficient tool to handle a large number of association rules, thanks to a merging representation which exploits both a 2D representation and a FEV. We presented a first evaluation which supports this choice. This approach makes it possible to obtain a global and detailed view at the same time. Nevertheless, our goal is not to automatically select the most relevant rules, as might be done with other methods that use interestingness measures, but to propose an interactive and intuitive approach that allows the user to visualize the whole set of generated rules, and consequently to shed light on the relevant ones. This approach is thus complementary to any kind of selective association rule method, and can easily integrate any other visualization tool which is suitable for small sets of rules. Another avenue for future work is the integration of other clustering algorithms as well as other visualization techniques such as perspective walls.

(6) Available at http://www.purpleinsight.com/.
(7) Available at http://www.lifl.fr/ jourdan/.
(8) Especially those extracted from dense contexts.
Acknowledgments This work is partially supported by the French-Tunisian project PAI-CMCU 05G1412.
References

[1] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. Fast discovery of association rules. In Proceedings of Advances in Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, California, USA, pages 307–328, 1996.
[2] Y. Bastide, N. Pasquier, R. Taouil, G. Stumme, and L. Lakhal. Mining minimal non-redundant association rules using frequent closed itemsets. In Proceedings of the 1st International Conference on Computational Logic (DOOD 2000), Springer-Verlag, LNAI, volume 1861, London, UK, pages 972–986, July 2000.
[3] J. Blanchard, F. Guillet, and H. Briand. Exploratory visualization for association rule rummaging. In Proceedings of the 4th International Workshop on Multimedia Data Mining MDM/KDD’03, Washington, DC, USA, pages 107–114, 2003.
[4] S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing association rules to correlation. In Proceedings of the ACM-SIGMOD International Conference on Management of Data, Tucson, Arizona, USA, pages 265–276, May 1997.
[5] P. Buono and M. F. Costabile. Visualizing association rules in a framework for visual data mining. In M. Hemmje et al., editor, From Integrated Publication and Information Systems to Virtual Information and Knowledge Environments, Springer-Verlag, pages 221–231, February 2004.
[6] P. Buono, M. F. Costabile, and F. A. Lisi. Supporting data analysis through visualizations. In Proceedings of the Workshop on Visual Data Mining, Freiburg, Germany, pages 67–78, September 2001.
[7] B. Bustos, D. A. Keim, J. Schneidewind, T. Schreck, and M. Sips. Pattern visualization. Technical report, University of Konstanz, Germany, Computer Science department, December 2003.
[8] O. Couturier, E. Mephu Nguifo, and B. Noiret. A formal approach to occlusion and optimization in association rules visualization. In Proceedings of International Symposium of Visual Data Mining (VDM) of IEEE 9th International Conference on Information Visualization (IV@VDM’05), Poster, London, UK, July 2005.
[9] O. Couturier, J. Rouillard, and V. Chevrin. An interactive approach to display large sets of association rules. In Proceedings of the 12th International Conference on Human-Computer Interaction (HCI’07), Beijing, China, 2007.
[10] J-D. Fekete. The Infovis toolkit. In Proceedings of the 10th IEEE Symposium on Information Visualization (INFOVIS’04), pages 167–174, 2004.
[11] G. W. Furnas. A fisheye follow-up: Further reflections on focus+context. In Proceedings of the ACM conference on Human Factors in Computing Systems (CHI’06), ACM Press, Montréal, Canada, pages 999–1008, 2006.
[12] Gh. Gasmi, S. Ben Yahia, E. Mephu Nguifo, and Y. Slimani. IGB: A new informative generic base of association rules. In Proceedings of the 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’05), Hanoi, Vietnam, pages 81–90, 2005.
[13] J. Han and N. Cercone. RULEVIZ: A model for visualizing knowledge discovery process. In Proceedings of 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, pages 244–253, August 2000.
[14] B. L. Harrison and K. J. Vicente. An experimental evaluation of transparent menu usage. In Proceedings of the ACM conference on Human Factors in Computing Systems (CHI’96), ACM Press, New York, USA, pages 391–398, 1996.
[15] H. Hofmann, A. Siebes, and A. F. X. Wilhelm. Visualizing association rules with interactive mosaic plots. In Proceedings of 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, pages 227–235, August 2000.
[16] J. Lamping, R. Rao, and P. Pirolli. A focus+context technique based on hyperbolic geometry for visualizing large hierarchies. In Proceedings of the ACM conference on Human Factors in Computing Systems (CHI’95), ACM Press, Denver, Colorado, USA, pages 401–408, 1995.
[17] B. Liu, W. Hsu, K. Wang, and S. Chen. Visually aided exploration of interesting association rules. In Proceedings of the 3rd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’99), Beijing, China, pages 380–389, 1999.
[18] J. D. Mackinlay, G. G. Robertson, and S. K. Card. The perspective wall: Detail and context smoothly integrated. In Proceedings of the ACM conference on Human Factors in Computing Systems (CHI’91), ACM Press, pages 173–179, 1991.
[19] J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, University of California Press, number 1, pages 281–297, 1967.
[20] B. Shneiderman. The eyes have it: A task by data type taxonomy for information visualization. In Proceedings of IEEE Symposium on Visual Languages, Boulder, Colorado, USA, pages 336–343, 1996.
[21] B. Shneiderman and C. Plaisant. Designing the user interface. Boston: Addison-Wesley, 2005.
[22] X. Yan, H. Cheng, J. Han, and D. Xin. Summarizing itemset patterns: A profile-based approach. In Proceedings of the 11th ACM-SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’05), Chicago, Illinois, USA, pages 314–323, 2005.
[23] M. J. Zaki. Mining non-redundant association rules. Data Mining and Knowledge Discovery: An International Journal, 9:223–248, 2004.
[24] M. J. Zaki and B. Phoophakdee. MIRAGE: A framework for mining, exploring and visualizing minimal association rules. Technical report (http://www.cs.rpi.edu/research/), Department of Computer Science, Rensselaer Polytechnic Institute, USA, July 2003.