Exploiting Knowledge Representation for Pattern Interpretation

Mariângela Vanzin, Karin Becker

Pontifícia Universidade Católica do Rio Grande do Sul – PUCRS
Av. Ipiranga, 6681, Porto Alegre, Brazil
{mvanzin, kbecker}@inf.pucrs.br
Abstract. Web Usage Mining (WUM) is the application of data mining techniques to web server logs in order to extract navigation usage patterns. Semantic Web Usage Mining aims at combining the Semantic Web and WUM. Its main goal is to improve the process and the results of WUM by exploiting the new semantic structure of the Web. Pattern analysis is a critical phase in WUM for two main reasons: a) mining algorithms yield a huge number of patterns; b) there is a significant semantic gap between URLs and the events performed by users. This paper discusses the use of ontologies available on the Semantic Web to support the interpretation of web usage sequential patterns. The functionality is targeted at supporting the comprehension of patterns, as well as the identification of potentially interesting ones through interactive pattern rummaging.
1 Introduction
Web Mining aims at discovering insights about Web resources and their usage [1][2]. Web Usage Mining (WUM) is the application of data mining techniques to extract navigation usage patterns from records of page requests made by visitors of a Web site. Access patterns mined from Web logs can represent useful knowledge in practice. They can help improve the design of Web sites, analyze users' reactions and motivation, build adaptive Web sites, and improve site content, among others.

The comprehension of mined patterns is difficult due to the primarily syntactical nature of web data [3]. Thus, the formalization of the semantics of Web resources and navigation behavior is increasingly required. The Semantic Web is the proposal of enriching the Web with machine-processable information to better support users in their tasks [4]. Semantic Web Mining aims at combining these two research areas [3, 5, 6]. The main goal is, on the one hand, to improve the results of Web Mining by exploiting the new semantic structures available in the Web and, on the other hand, to make use of Web Mining for building up the Semantic Web. Recently, many approaches have started exploiting the semantic structures stored in the ontology layer [7] of the Semantic Web architecture.

The WUM process is divided into three generic phases [1]: preprocessing, pattern discovery and pattern analysis. Pattern analysis remains a key issue in the area of WUM. Typically, mining techniques (e.g. association, sequence) yield a huge number of patterns, and most of them are useless, incomprehensible or uninteresting to users [8].
Due to the large number of patterns, users have difficulty identifying the ones that are interesting with regard to the domain.

This paper discusses the use of ontologies, possibly available on the Semantic Web, to support pattern interpretation. Ontologies are exploited for addressing three interrelated problems: a) to represent patterns in a more intuitive form, b) to identify patterns related to some subject of interest, and c) to identify potentially interesting patterns through concept-oriented, interactive pattern rummaging. Other features complement this approach, such as pattern grouping and visual pattern representation.

The remainder of this paper is structured as follows. Section 2 presents the proposed ontology-based functionality targeted at supporting the analysis phase. It describes the ontology properties and their use for conceptual pattern representation, pattern rummaging, pattern retrieval and concepts merging. Section 3 describes a usage scenario. Section 4 compares related work with the proposed approach. Conclusions and future work are addressed in Section 5.
2 An Ontology-based Approach for Pattern Analysis
Given the output of the pattern discovery phase, the goal of the pattern analysis phase is to eliminate irrelevant patterns and to extract the interesting ones, i.e. those that constitute knowledge. Pattern analysis is not an easy task because: a) the number of patterns yielded by mining algorithms can easily exceed a human user's capacity to identify interesting results; b) the output of Web mining algorithms is not suitable for human interpretation; and c) frequently in a WUM process the user does not know what he is looking for, i.e. in most cases the search for interesting patterns is exploratory and does not involve hypothesis verification.

Our approach makes use of ontologies, possibly available in the ontology layer of the Semantic Web, to support the interpretation of web usage sequential patterns. Ontologies are exploited for addressing three interrelated problems: a) to represent patterns in a more intuitive form, thus reducing the gap between URLs and site events, b) to identify patterns that are related to some subject of interest, and c) to identify potentially interesting patterns through concept-oriented, interactive pattern rummaging. Other features complement this approach, such as the grouping of patterns by different similarity criteria and visual pattern representation and manipulation.

The remainder of this section describes the underlying assumptions for developing the pattern analysis, the ontology structure, and the functionality proposed to support the pattern analysis activity. The next section illustrates the use of the functionality using the prototype currently under implementation.

2.1 WUM Process Assumptions

Our approach is targeted at the pattern analysis phase. The preprocessing phase considers a set of URLs as data source, which are processed using typical activities such as data cleaning, user and session identification, and path completion [1]. Preprocessing does not assume any particular data enrichment. If available, a semantic log composed of records with formal semantics based on an ontology underlying the site could be used as well (e.g. [6]).

Because we are interested in usage patterns, we assume the application of the sequence technique in the pattern discovery phase, using the algorithm of [9]. As in [10], we assume running the mining algorithm with a very low minimum support threshold. Higher values can make a mining algorithm run faster, but at the risk of reducing the usefulness of the data mining results. The basic idea is to accept the execution time required for mining, as well as the huge number of patterns returned. The pattern analysis functionality described in the remainder of this section is then used to focus on a subset of patterns, to interpret their meaning, and to identify the potentially interesting ones.

2.2 Ontology Representation

Ontologies available on the Semantic Web can be used to represent the events of a web site, which can be roughly categorized as service (e.g. buying, finding) and content (e.g. Hamlet) [5]. Thus, they can be used to associate meaning with web pages and with user actions over pages. Our approach exploits the semantics of the pages visited along users' paths, where meaningful application events are mapped into domain knowledge.

The domain events are represented at two levels: conceptual and physical. The conceptual level is composed of an ontology that specifies concepts and the relationships among these concepts. At the physical level, events are represented by URLs. The conceptual level corresponds to the ontology layer in the Semantic Web architecture.

Ontologies represent and support relationships among concepts, providing them with meaning. Three types of relationship are considered in this work: generalization/specialization, which is a powerful abstraction for sharing similarities among classes while preserving their differences; aggregation (part-whole and part-of relationships), in which classes representing the components are associated with the class representing the entire assembly; and binary relationships, representing any other type of relationship that connects two concepts.
URLs are then mapped into ontology concepts according to two dimensions: service and content. A URL can be mapped into one service, one content, or both. If a URL is mapped into both a service and a content, the predominant dimension must be defined. The same ontology concept can be used in the mapping of various URLs. Not all URLs need to be mapped (e.g. auxiliary pages [1]). Figure 1 describes the structure of the ontology using a UML class diagram.

[Figure 1: UML class diagram relating the classes Ontology, Concept, Content, Service, URL and Relation, with relation types binary, aggregation and generalization/specialization.]

Fig. 1. Ontology structure class diagram.
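The ontology structure described above can be sketched as a small data model. This is only an illustrative sketch and not the schema used by the prototype: the class names `Concept`, `Relation` and `UrlMapping`, their fields, and the validation rules are assumptions derived from the textual description (two dimensions, three relationship types, and a predominant dimension when a URL is mapped to both a service and a content).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Dimension(Enum):
    SERVICE = "service"
    CONTENT = "content"


class RelationType(Enum):
    GENERALIZATION = "generalization/specialization"
    AGGREGATION = "aggregation"
    BINARY = "binary"


@dataclass(frozen=True)
class Concept:
    name: str
    dimension: Dimension


@dataclass(frozen=True)
class Relation:
    # Assumed reading: for generalization, source is the specialized concept
    # and target the general one; for aggregation, source is the part and
    # target the whole; for binary relations, label names the relationship.
    kind: RelationType
    source: Concept
    target: Concept
    label: str = ""


@dataclass
class UrlMapping:
    """Maps one URL to a service concept, a content concept, or both."""
    url: str
    service: Optional[Concept] = None
    content: Optional[Concept] = None
    predominant: Optional[Dimension] = None

    def __post_init__(self):
        if self.service is None and self.content is None:
            raise ValueError("a mapped URL needs a service, a content, or both")
        if self.service is not None and self.content is not None:
            # The text requires a predominant dimension in this case.
            if self.predominant is None:
                raise ValueError("predominant dimension must be defined")
        else:
            # With a single dimension, that dimension is trivially predominant.
            self.predominant = (
                Dimension.SERVICE if self.service is not None else Dimension.CONTENT
            )
```

URLs that are not mapped at all (e.g. auxiliary pages) would simply have no `UrlMapping` instance in such a model.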
Figure 2 illustrates this ontology structure by describing the semantics of a web-based learning site. The web-based learning site offers resources (services and contents) that support student learning. Services include chat, email, student assessment, assignment submission, etc. Content is related to the material available in the site, or to the subject related to some services (e.g. a forum has emails about "distance education"). All pages of the site (i.e. URLs representing static or dynamic pages) are mapped into an ontology concept. In Figure 2, page URL1 was mapped to the ontology concept Send-File, which represents a service offered by the site; page URL2 was mapped to both the Glossary and Visualize-Information concepts, which represent content and service, respectively. Page URL3 is mapped to a content concept referred to as Virtual-Environment. Notice that URL2 was mapped according to two perspectives, where the content dimension was defined as the default.

[Figure 2: conceptual level with Service and Content concepts (Task-Submission, Load-File, Cancel, Send-File, Visualize-Information, Distance-Education, Learning-Process, Glossary, Virtual-Environment) linked by is-a, has-part and binary relations ("about", "has words about"), mapped to the physical level (URL1, URL2, URL3). Legend: concept, aggregation, generalization, binary relation.]
Fig. 2. Mapping URLs to semantic concepts.
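As an illustration of how the mapping of Figure 2 turns a URL sequence into a conceptual pattern, consider the following minimal sketch. The table `URL_MAP` and the function `to_conceptual` are hypothetical names; the fallback to the default dimension when a URL has no concept in the requested dimension, and the skipping of unmapped URLs, are assumptions that the text does not specify.

```python
# Hypothetical mapping table mirroring Figure 2:
# URL -> service concept, content concept, and default (predominant) dimension.
URL_MAP = {
    "URL1": {"service": "Send-File", "content": None, "default": "service"},
    "URL2": {"service": "Visualize-Information", "content": "Glossary",
             "default": "content"},
    "URL3": {"service": None, "content": "Virtual-Environment",
             "default": "content"},
}


def to_conceptual(pattern, dimension="both"):
    """Translate a sequence of URLs into primitive ontology concepts.

    dimension is "service", "content" or "both"; with "both", each URL is
    shown through its default dimension.
    """
    concepts = []
    for url in pattern:
        entry = URL_MAP.get(url)
        if entry is None:
            continue  # assumption: unmapped URLs (e.g. auxiliary pages) are skipped
        if dimension == "both":
            concept = entry[entry["default"]]
        else:
            # assumption: fall back to the default dimension when the URL
            # has no concept in the requested dimension
            concept = entry[dimension] or entry[entry["default"]]
        concepts.append(concept)
    return concepts


print(" -> ".join(to_conceptual(["URL1", "URL2"], "service")))
print(" -> ".join(to_conceptual(["URL1", "URL2"])))
```

With the table above, the first call uses the service dimension for URL2 and the second call uses its default (content) dimension.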
The task of mapping URLs into ontology concepts can be laborious, but it pays off by greatly simplifying the interpretation activity, as described in the remainder of this section. The future Semantic Web is expected to contribute to reducing this effort [11], in that the creation of the respective ontology layer will be part of any site design.

2.3 Pattern Interpretation Functionalities

Visual Conceptual Pattern Representation. Patterns yielded by the sequential mining algorithm are sequences of URLs, which are often hard to interpret. In order to reduce the semantic gap between URLs and the events performed by users in the Web site, our approach exploits the semantics of the pages visited by users. Thus, the sequential patterns presented to the analyst are not composed of URLs, but rather of the primitive concepts of the ontology into which they were mapped. Considering the ontology illustrated in Figure 2, a pattern of the form URL1 → URL2 is displayed using the concepts that represent the corresponding primitive events in the site, such as Send-File → Glossary. This pattern representation provides the analyst with a more intuitive meaning of the pattern. By exploring the dimensions, the analyst can interpret the patterns according to his interests. For instance, the pattern URL1 → URL2 can be represented as Send-File → Visualize-Information if the analyst is interested in the service dimension, or as Send-File → Glossary if both dimensions are of interest. According to the content dimension, the pattern URL2 → URL3 can be interpreted as Glossary → Virtual-Environment. The generalization/specialization and aggregation relationships can be explored to provide various abstraction levels over the same pattern. For instance, the pattern Send-File → Virtual-Environment can also be represented as Task-Submission → Virtual-Environment, Task-Submission → Distance-Education, and so on.

Interactive Pattern Rummaging. The interactive pattern rummaging functionality allows the analyst to exploit the ontology in different ways to identify relevant patterns. The analyst can visualize the patterns at different abstraction levels, exploring the generalization and aggregation relationships through operations similar to "roll-up" and "drill-down" in OLAP (On-line Analytical Processing). The roll-up operation represents a concept by the concept it is related to through a generalization or aggregation relationship. The drill-down operation explores these relationships in the inverse direction. Roll-up and drill-down operations can be used for two different purposes: to better understand the events represented by the pattern, and to obtain abstract patterns that actually represent a set of patterns.

Figure 3 illustrates the use of roll-up operations over individual elements of a pattern for understanding their meaning through more abstract concepts. This task is called pattern comprehension. In this example, the original pattern reveals that users access some page about the subject "virtual environment", access the glossary, and then load and send a specific file. By rolling up the concept Virtual-Environment, the user understands that it is part of the distance education content, which possibly motivates the users to look for other definitions available in the glossary. He also understands that loading and sending are activities related to the submission of an assignment. With the same purpose of pattern comprehension, binary relations can be used to complement the information about the pattern events, by showing other related concepts on demand. The user selects a concept and asks for the relationships in which it participates. For instance, the Glossary concept has a binary relationship with the concept Learning-Process, as shown in Figure 2. Thus, the analyst can understand that the glossary has words about the learning process.

[Figure 3: pattern Virtual-Environment → Glossary → Load-File → Send-File (pattern support: 23.85), with roll-up/drill-down showing Virtual-Environment as part of Distance-Education and Load-File and Send-File rolled up to Task-Submission.]
Fig. 3. Rolling-up the concepts of pattern comprehension.
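The roll-up and drill-down operations used for pattern comprehension can be sketched as lookups over the concept hierarchy. The `PARENT` table below is a hypothetical fragment inspired by Figure 2; it merges generalization and aggregation links into a single child-to-parent map, which is a simplification made only for this sketch.

```python
# Hypothetical fragment of the ontology of Figure 2: concept -> more abstract
# concept reached through a generalization or aggregation relationship.
PARENT = {
    "Load-File": "Task-Submission",
    "Send-File": "Task-Submission",
    "Cancel": "Task-Submission",
    "Virtual-Environment": "Distance-Education",
}


def roll_up(concept):
    """Return the more abstract concept, or the concept itself at the top."""
    return PARENT.get(concept, concept)


def drill_down(concept):
    """Return the concepts that roll up directly to the given concept."""
    return sorted(child for child, parent in PARENT.items() if parent == concept)


def roll_up_pattern(pattern, positions):
    """Roll up only the pattern elements at the selected positions."""
    return [roll_up(c) if i in positions else c for i, c in enumerate(pattern)]


pattern = ["Virtual-Environment", "Glossary", "Load-File", "Send-File"]
print(roll_up_pattern(pattern, {2, 3}))
print(drill_down("Task-Submission"))
```

Rolling up positions 2 and 3 of the example pattern yields the abstract pattern discussed next, while `drill_down` lists the specializations the analyst could inspect.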
Another use of the roll-up operation is to obtain an abstract pattern, i.e. a pattern that actually represents a set of patterns. For that purpose, the user replaces one or more pattern elements with their corresponding abstract concept, as depicted in Figure 4. In this example, the user is interested in patterns where a group of users access a page about virtual environment, then the glossary, followed by the use of two task submission activities (e.g. load, visualize, cancel and send a file, according to the ontology). Notice that in doing so, the support of the abstract pattern must be recalculated. For instance, the abstract pattern may match both Virtual-Environment → Glossary → Load-File → Send-File and Virtual-Environment → Glossary → Load-File → Cancel, which are found in the rule set. Our approach for recalculating the support is inspired by [10], and it is not discussed here due to space limitations.

[Figure 4: abstract pattern Virtual-Environment → Glossary → Task-Submission → Task-Submission (pattern support: 35.52), with Virtual-Environment shown as part of Distance-Education.]
Fig. 4. Abstract pattern obtained by rolling-up
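To make the notion of an abstract pattern concrete, the sketch below finds the patterns of a rule set that match an abstract pattern. It assumes that a concrete pattern matches when it has the same length and each of its elements equals, or rolls up to, the abstract element at the same position. The names `PARENT`, `matches` and `matching_patterns` are hypothetical, and the sketch deliberately does not recalculate support, since that procedure is not described here.

```python
# Hypothetical concept hierarchy (child -> parent), assumed to be acyclic.
PARENT = {
    "Load-File": "Task-Submission",
    "Send-File": "Task-Submission",
    "Cancel": "Task-Submission",
    "Virtual-Environment": "Distance-Education",
}


def ancestors_or_self(concept):
    """Return the concept followed by all concepts it rolls up to."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def matches(abstract, concrete):
    """True if every concrete element is, or rolls up to, the abstract one."""
    return len(abstract) == len(concrete) and all(
        a in ancestors_or_self(c) for a, c in zip(abstract, concrete)
    )


def matching_patterns(abstract, rule_set):
    return [p for p in rule_set if matches(abstract, p)]


RULE_SET = [
    ("Virtual-Environment", "Glossary", "Load-File", "Send-File"),
    ("Virtual-Environment", "Glossary", "Load-File", "Cancel"),
    ("Glossary", "Send-File"),
]
ABSTRACT = ("Virtual-Environment", "Glossary", "Task-Submission", "Task-Submission")

for p in matching_patterns(ABSTRACT, RULE_SET):
    print(" -> ".join(p))
```

Drilling down from the abstract pattern then amounts to listing the matching patterns together with their individual supports.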
The roll-up and drill-down operations allow users to analyze the rule set provided by the mining algorithm in an exploratory manner, based on the events captured by the ontology. For instance, the user starts with the pattern illustrated in Figure 3 and, after having the insight that loading and sending a file are tasks related to the submission of assignments, he rolls up to the abstract pattern of Figure 4, such that all patterns that match this abstraction can be found in the rule set. Then, by drilling down, he becomes aware of the use of other distinct task submission services that support the abstract pattern. By analyzing the support of these rules, he may realize that the number of students who canceled the submission after loading the file is greater than the number of students who actually sent their assignments. He may then conclude that the assignment submission service of the site is not intuitive for the students, and that it should be redesigned or a help/tutorial should be provided.

Pattern Retrieval. The user can interact with the ontology to retrieve patterns that are related to concepts of interest. The analyst visualizes graphically all concepts and relationships defined for the domain, and selects the ones in which he/she is interested. Related patterns are then retrieved and displayed for further inspection. For example, if the user expresses interest in patterns that involve the forum concept, all patterns that include this concept will be presented. Additionally, the ontology is explored such that any rule involving another Communication-Resource (e.g. chat, email) will also be presented. We are currently working on measures to rank the patterns based on the distance of the concepts in the hierarchy. These measures represent the potential interest of the pattern to the user with regard to the selected concept.

Concepts Merging. Patterns can be composed of many events, and their length makes their interpretation difficult. A useful functionality is to merge a sequence of similar concepts according to the ontology. The concepts merging feature represents a sequence of pattern elements by a higher-level abstraction, supporting user interpretation through the exhibition of simplified (i.e. shorter) patterns. For example, the pattern of Figure 3 can be simplified if the two submission activity concepts are replaced by a single entry (i.e. Virtual-Environment → Glossary → Task-Submission), in case the user is not interested in this level of detail.

Pattern Clustering. Given the very low support value used for running the mining, the input file for the analysis phase probably contains thousands of sequential patterns formed by sets of URLs. Similar patterns can be grouped in families according to different criteria, allowing the analyst to focus on a pattern set for applying the rummaging functionality. Presently, two grouping criteria are considered: maximal sequence patterns (i.e. patterns that are not a sub-pattern of any other pattern [9]) and conceptual similarity.
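The concepts merging feature described above can be sketched as follows. Since this functionality is still under development, the sketch only illustrates one possible reading: consecutive pattern elements are treated as "similar" when they share the same direct parent concept, and each such run is replaced by that parent. The `PARENT` table and the `merge_concepts` function are hypothetical.

```python
# Hypothetical concept hierarchy (child -> parent).
PARENT = {
    "Load-File": "Task-Submission",
    "Send-File": "Task-Submission",
    "Cancel": "Task-Submission",
    "Virtual-Environment": "Distance-Education",
}


def merge_concepts(pattern, parent=PARENT):
    """Replace each run of consecutive siblings by their common parent.

    Assumption: two concepts are similar when they have the same direct
    parent. Elements that are not part of a run of two or more siblings
    are kept unchanged.
    """
    merged = []
    i = 0
    while i < len(pattern):
        p = parent.get(pattern[i])
        j = i
        # Extend the run while the next element has the same parent.
        while p is not None and j + 1 < len(pattern) and parent.get(pattern[j + 1]) == p:
            j += 1
        merged.append(p if j > i else pattern[i])
        i = j + 1
    return merged


pattern = ["Virtual-Environment", "Glossary", "Load-File", "Send-File"]
print(" -> ".join(merge_concepts(pattern)))
```

Applied to the pattern of Figure 3, the two submission activities collapse into a single Task-Submission element, while the isolated Virtual-Environment element is left as is.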
3 Discovering Interesting Web Usage Patterns in a Web-based Learning Site

We are developing a prototype in Java, called PIT (Pattern Interpretation Tool), that implements the pattern interpretation functionalities described above. In this section we present a scenario in which PIT functionality is explored to support the interpretation of mined sequential patterns. We use as case study a real Web-based learning application designed using the course-management infrastructure WebCT (footnote 1). WebCT supports the design and management of learning sites, providing functionality that includes tools for content propagation (e.g. texts, videos), synchronous and asynchronous communication (e.g. chat, email, forum), assignment submission, and performance evaluation (e.g. quiz), among others.

This case study analyzes navigation behavior related to an intensive extracurricular course with 15 students. The preprocessed log contains nearly 6000 access records, which result in 943 sequential patterns using the sequential algorithm of [9] with zero as support threshold. Due to the unavailability of an ontology layer for this site, the corresponding domain ontology and URL mappings were manually input into the PIT database. For this example, 499 URLs were mapped to domain ontology concepts.

The analyst uses PIT to interpret the patterns contained in a text file and to identify the ones that are interesting. After loading the file, the user becomes aware of the existence of 943 rules. Overwhelmed by this huge volume of rules, he decides to use the clustering feature to reduce the rule set and to choose a group of rules to focus on. He then clusters these patterns according to the maximal sequence criterion, reducing the rule volume to 46 maximal rules. With this, the rule set becomes simpler and more manageable.
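The clustering step by the maximal sequence criterion can be illustrated with the sketch below. It assumes that a sub-pattern is an order-preserving, not necessarily contiguous, subsequence of another pattern; this reading, the function names `is_subsequence` and `cluster_by_maximal`, and the toy patterns are assumptions, not the PIT implementation.

```python
def is_subsequence(short, long):
    """True if short occurs in long in the same order (gaps allowed)."""
    it = iter(long)
    # `item in it` advances the iterator until item is found, so the
    # relative order of the elements is enforced.
    return all(item in it for item in short)


def cluster_by_maximal(patterns):
    """Group patterns under the maximal patterns that subsume them.

    A pattern is maximal if it is not a sub-pattern of any other pattern.
    Returns a dict: maximal pattern -> list of subsumed patterns.
    """
    patterns = [tuple(p) for p in patterns]
    maximal = [
        p for p in patterns
        if not any(p != q and is_subsequence(p, q) for q in patterns)
    ]
    return {
        m: [p for p in patterns if p != m and is_subsequence(p, m)]
        for m in maximal
    }


# Toy rule set with hypothetical concept names.
rules = [("A", "B", "C"), ("A", "C"), ("B",), ("D",)]
for maximal, members in cluster_by_maximal(rules).items():
    print(maximal, "subsumes", members)
```

Each key of the returned dictionary corresponds to one cluster, and its value lists the patterns subsumed by that maximal pattern. A pattern subsumed by several maximal patterns appears in each of their clusters.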
Figure 5 displays part of the prototype's main interface, divided into three areas: Clustering Area (Figure 5.a), Detailed Patterns Area (Figure 5.b) and Rummaging Area (Figure 5.c). The Clustering Area displays all clusters formed according to the chosen criterion, in this example the maximal patterns. The content of a cluster can be inspected at any moment in the Detailed Patterns Area (Figure 5.a-b), in this example the patterns subsumed by the selected maximal pattern. Notice that although the rules in the input file are composed of URLs, the user always sees them according to the primitive ontology concepts to which they were mapped. For this visualization, the user chooses which dimension he is interested in: service, content or both (the default). This selection can be performed in the Rummaging Area.

Footnote 1: http://www.webct.com/
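[Figure 5: screenshot of the PIT main interface, showing (a) the Clustering Area, (b) the Detailed Patterns Area and (c) the Rummaging Area.]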
Fig. 5. PIT prototype main interface.
The analyst then browses the Clustering and Detailed Patterns Areas until he finds a pattern that he wants to investigate, either because it is possibly interesting, or because he wants to better understand what the pattern reveals about the site. To interpret that pattern, the user selects it, and the corresponding graphical visualization is displayed in the Rummaging Area. At this point (Figure 5.c) the Rummaging Area displays graphically the maximal pattern exactly as it appears textually (i.e. using the primitive ontology concepts), together with the corresponding support.

The analyst chooses a pattern that reveals that students log in to WebCT, access a course area, access an activity list and interact with the chat. To better understand the meaning of this pattern, he applies the roll-up operation over the Activity-List and Chat ontology concepts (Figure 6). He then learns that the activity list is part of the pedagogic structure (which is in turn a content) and that chat is one among many communication resources (which is in turn a service). Since the user is still not sure about the meaning of the activity list, he explores the binary relationships associated with the concept and learns that it consists of a list of activities proposed for that course. Notice that patterns are presented in a form that is more suitable for human interpretation, and that the analyst does not need to be an expert in the application domain and site design to understand the meaning of patterns. He just has to be familiar with the site, and the ontology will provide deeper knowledge about it.

The analyst concludes that students check the list of activities they have to perform during the course, and then use the chat to clarify doubts about it. The analyst then becomes curious whether, in this situation, students also access other types of communication resource to interact with other students and teachers, such as mail or forum. He then creates the abstract pattern of Figure 7, which becomes visible in the Rummaging Area.
Fig. 6. Interactive pattern rummaging.
Because the support of the abstract pattern increases with regard to the one used to generate it (Figure 6), the user realizes that this abstract pattern matches other patterns, thus indicating that other communication tools are used. To know which ones, and their support, the analyst drills down, and the Detailed Patterns Area is used to display all patterns that match the abstract one. He then becomes aware that email is used, but forum, another possible communication tool, is not. Thus, he infers that students prefer chat or mail because answers arrive faster than in a forum.
Fig. 7. Abstract pattern and contained patterns.
We are currently developing the pattern retrieval functionality, merging functionality, as well as adding new clustering criteria. The main idea of the retrieval functionality is to display all the ontology concepts to the user, such that he can select a concept according to his interest. Then, all patterns that include that concept, or related ones, are retrieved and ranked according to a similarity function. This functionality allows users familiar with the domain to explore the ontology for finding potentially interesting patterns. The user is also not required to learn the syntax of a mining language, nor of a rule filter mechanism. The merging functionality has the goal of facilitating pattern interpretation by providing a more concise representation.
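A possible shape for the retrieval functionality is sketched below. The ranking measure is still under development, so the similarity function used here, the number of edges between two concepts in the hierarchy with a cut-off `max_distance`, is purely a placeholder assumption. The `PARENT` table reuses concept names from the scenario (Chat, Forum, Email, Communication-Resource, Activity-List), while Pedagogic-Structure, Login and all function names are hypothetical.

```python
# Hypothetical concept hierarchy (child -> parent), assumed to be acyclic.
PARENT = {
    "Chat": "Communication-Resource",
    "Forum": "Communication-Resource",
    "Email": "Communication-Resource",
    "Activity-List": "Pedagogic-Structure",
}


def path_to_root(concept):
    path = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        path.append(concept)
    return path


def distance(a, b):
    """Edges between a and b through their closest common ancestor, or None."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    common = [c for c in path_a if c in path_b]
    if not common:
        return None  # unrelated concepts
    closest = common[0]
    return path_a.index(closest) + path_b.index(closest)


def retrieve(concept, patterns, max_distance=2):
    """Return patterns containing the concept or a related one, closest first."""
    scored = []
    for pattern in patterns:
        dists = [d for d in (distance(concept, c) for c in pattern) if d is not None]
        if dists and min(dists) <= max_distance:
            scored.append((min(dists), pattern))
    scored.sort(key=lambda pair: pair[0])  # stable: ties keep input order
    return [pattern for _, pattern in scored]


patterns = [("Activity-List", "Chat"), ("Login", "Forum"), ("Login", "Activity-List")]
for p in retrieve("Forum", patterns):
    print(" -> ".join(p))
```

With this placeholder measure, a pattern that contains the selected concept itself ranks before patterns that only contain sibling concepts, and patterns with no related concept are not retrieved.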
4 Related Work
Several approaches in the literature address issues related to pattern interpretation. They can be divided into syntactical, semantic and visualization approaches.

Syntactical approaches are based on objective and subjective measures [8]. The former rely on the inherent structure of mined patterns, i.e. statistics such as support and confidence [9, 10]. Objective measures help reduce the number of rules generated by the mining algorithm, but they do not help interpret the meaning and significance of a pattern in the domain. Additionally, rules with high support are not necessarily interesting. Our approach recommends a very low minimum support threshold, as in [10], because even patterns with low support can be potentially interesting. The disadvantage is the huge number of generated patterns. To deal with this problem, we propose the clustering functionality, in which patterns are grouped according to specific similarity criteria. The current implementation assumes the maximal sequence criterion, but other approaches could be applied as well, such as [12, 13]. Subjective measures depend on prior beliefs [2, 8], which express domain knowledge. Mining results are considered interesting or uninteresting depending on whether they support or contradict these beliefs. Thus, the effectiveness of these measures is related to the ability to express beliefs for a given domain. Our approach is complementary to this one, in that it supports an exploratory search and interpretation of patterns, for which prior beliefs may not exist (yet).

Semantic approaches are required for providing meaning to mined patterns with regard to the domain. WUM patterns are often represented as a set of URLs. This type of pattern is often hard to interpret because a URL does not necessarily express intuitive knowledge about an event in the site. Thus, in the WUM context, pattern analysis has to deal with the semantic gap between URLs and events on the site. Application events are defined according to the application domain, a non-trivial task that amounts to a detailed formalization of the site's business model, i.e. a description of user behavior, interests and intentions. Our approach proposes the notion of "conceptual pattern" to facilitate pattern interpretation. This represents an important contribution, considering as input a traditional web log file (e.g. in the CLF format). In case the input to the mining phase is a semantic log, the output would straightforwardly be the conceptual pattern, since the mapping between the physical level and the primitive concepts of the conceptual level would no longer be necessary. However, all proposed functionalities (e.g. rummaging, clustering) remain extremely useful for pattern interpretation.

Integrating domain knowledge into the WUM environment is essential for making the pattern interpretation task easier. Conceptual hierarchies (taxonomies) are a primitive form of knowledge representation, and they have been extensively used in data mining. For instance, [14] employs taxonomies for preprocessing log data. Their work considers dynamic pages, where concepts are extracted from a combination of URI scripts, database contents, and site functionality interpretation. Although multiple concepts can enrich log data, the user has to choose a single one for mining patterns. The work in [9] proposes mining algorithms that generate more generic rules, with the aim of overcoming problems related to the support threshold. The disadvantage of this approach is the large number of rules generated, which makes the exploratory task difficult. Also, the relationship between concepts is disregarded, making it hard to relate generic rules to the corresponding specialized ones. In our approach, the relationship between concepts is maintained for pattern interpretation, and we consider other types of relationships (i.e. aggregation and binary relationships). In [15], the authors make use of taxonomies to express pattern template filters. The disadvantages are that users: (1) should be able to specify their interest to filter patterns for hypothesis verification, and (2) must know the filtering language. Our approach allows exploratory analysis, because it does not assume any prior knowledge or belief for pattern interpretation. Additionally, users interact with a visual representation of the ontology, selecting the concepts and relationships in which they are interested. Thus, it does not require learning any new language. Although taxonomies are in general useful for these applications, the semantics they express is limited, being restricted to is-a relationships. Our approach assumes other types of relationship between concepts, exploring these relationships in pattern analysis.

Recent works in the WUM area (e.g. [6, 16]) have shown the usefulness of exploiting domain knowledge represented as an ontology, a promising path considering the Semantic Web. The Semantic Web enriches the current "syntactical" web with formal semantics in the form of ontologies, which capture the meaning of pages and links in a machine-understandable form. For instance, Oberle et al. [6] propose the semantic log definition, where users' requests are described in semantic terms. Using ontology concepts, the multitude of user interests expressed by a visit to one page can be captured, in a process referred to as conceptual user tracking. The work in [16] uses the semantics extracted from page content or structure to discover domain-level web usage profiles, to be used in Web personalization. However, these works are limited to the content of pages, disregarding services. In our approach, both dimensions are considered.

Finally, visual approaches are targeted at the graphical representation of rules [12, 15]. Our approach compares to [12] in that it proposes rummaging functionality, but we do not address advanced visualization issues.
5 Conclusions and Future Work
In this paper we proposed an approach that exploits domain knowledge to support pattern interpretation. The approach is intended to make the results of pattern analysis more easily comprehensible to human users, as well as to support the visual and interactive evaluation and identification of potentially interesting patterns. We illustrated in a real scenario how the proposed functionality can be exploited for pattern interpretation, using the prototype under development. The interpretation examples discussed correspond to real problems detected in a WUM application [17].

The functionality addresses three main problems related to pattern interpretation: a) a more intuitive representation of patterns in order to reduce the gap between URLs and site events, b) the identification of patterns that are related to some subject of interest, and c) the identification of potentially interesting patterns through concept-oriented, interactive pattern rummaging. The grouping of patterns by different similarity criteria and visual pattern representation and manipulation complement the approach.

The main advantages of the approach are: a) effective support for exploratory pattern analysis; b) analysts need only be familiar with the domain, and are not necessarily experts; c) no laborious preprocessing activities, which account for 60-80% of the effort in the whole process, are assumed; d) abstract patterns are generated only on demand, thus reducing the size of the rule set; e) users do not have to learn the syntax of a language to write queries or pattern templates in order to focus on a set of patterns, because the ontology is manipulated graphically.
Further research includes, among other topics, other similarity functions for clustering patterns, concept-based pattern similarity, and analyst profile learning for functionality personalization and recommendation actions.
6 References
[1] Cooley, R., Mobasher, B., and Srivastava, J. Data preparation for mining world wide web browsing patterns. Knowledge and Information Systems 1, 1 (1999), 5-32.
[2] Cooley, R. The use of web structure and content to identify subjectively interesting web usage patterns. ACM Transactions on Internet Technology 3, 2 (2003).
[3] Berendt, B., Hotho, A., and Stumme, G. Towards semantic web mining. In The Semantic Web - ISWC 2002, First International Semantic Web Conference (2002), I. Horrocks and J. A. Hendler, Eds., vol. 2342 of Lecture Notes in Computer Science, pp. 264-278.
[4] Berners-Lee, T., Hendler, J., and Lassila, O. The semantic web. Scientific American 284, 5 (May 2001), 35-43.
[5] Stumme, G., Hotho, A., and Berendt, B. Usage mining for and on the semantic web. In National Science Foundation Workshop on Next Generation Data Mining (Baltimore, USA, 2002).
[6] Oberle, D., Berendt, B., Hotho, A., and Gonzalez, J. Conceptual user tracking. In International Atlantic Web Intelligence Conference (Madrid, 2003), Springer, pp. 142-154.
[7] Hendler, J. Agents and the semantic web. IEEE Intelligent Systems Journal 16, 2 (March-April 2001), 30-37.
[8] Silberschatz, A., and Tuzhilin, A. What makes patterns interesting in knowledge discovery systems. IEEE Transactions on Knowledge and Data Engineering 8, 6 (1996), 970-974.
[9] Srikant, R., and Agrawal, R. Mining sequential patterns: generalizations and performance improvements. In Lecture Notes in Computer Science (1996), vol. 1057, Springer, pp. 3-17.
[10] Hipp, J., and Güntzer, U. Is pushing constraints deeply into the mining algorithms really what we want?: an alternative approach for association rule mining. SIGKDD Explor. Newsl. 4, 1 (2002), 50-55.
[11] Berendt, B., Hotho, A., and Stumme, G. Towards semantic web mining. In The Semantic Web - ISWC 2002, First International Semantic Web Conference (2002), I. Horrocks and J. A. Hendler, Eds., vol. 2342 of Lecture Notes in Computer Science, pp. 264-278.
[12] Blanchard, J., Guillet, F., and Briand, H. Exploratory visualization for association rule rummaging. In 4th International Workshop on Multimedia Data Mining MDM'03 in conjunction with KDD'03 (2003), pp. 107-114.
[13] Jorge, A. Hierarchical clustering for thematic browsing and summarization of large sets of association rules. In International Conference on Data Mining (2004).
[14] Berendt, B., and Spiliopoulou, M. Analysing navigation behaviour in web sites integrating multiple information systems. The VLDB Journal 9 (2000), 56-75.
[15] Klemettinen, M., Mannila, H., Ronkainen, P., Toivonen, H., and Verkamo, A. I. Finding interesting rules from large sets of discovered association rules. In Proceedings of the Third International Conference on Information and Knowledge Management (1994), pp. 401-407.
[16] Dai, H., and Mobasher, B. Using ontologies to discover domain-level web usage profiles. In Semantic Web Mining Workshop (2002).
[17] Machado, L., and Becker, K. Distance education: A web usage mining case study for the evaluation of learning sites. In The 3rd IEEE International Conference on Advanced Learning Technologies (Athens, 2003), IEEE Press, pp. 360-361.