Semantically enhanced sequential patterns for content adaptation on the web
Mehdi Adda (1,3), Petko Valtchev (1), Rokia Missaoui (2), Chabane Djeraba (3)
(1) DIRO, Université de Montréal, CP 6128 succ Centre-Ville, Montréal QC H3C 3J7, Canada
(2) Département d’informatique et d’ingénierie, Université du Québec en Outaouais, CP 1250 succ B, Gatineau QC J8X 3X7, Canada
(3) LIFL - UMR CNRS 8022 - Bâtiment M3, 59655 Villeneuve d’Ascq Cédex, FRANCE
{addamehd,valtchev}@iro.umontreal.ca,
[email protected],
[email protected]
Abstract: Content adaptation on the Web aims at reducing the total amount of information served by a site or application to the items matching the needs, preferences and tastes of a specific user. As a major trend thereof, recommender systems typically rely on the computation of a relevance score of a content object with respect to the anticipated user’s needs. Using the navigation patterns of a user or a group of users as a support for the relevance guess is a classical approach to recommendation that boils down to discovering, or mining, associations among content objects based on the way these are navigated through. As feeding domain knowledge into the mining process has proven to both increase the precision of these associations and ease their interpretation, we consider the "extreme" case, i.e., the availability of a full-scale domain ontology whose concepts and properties characterize the content objects and their relationships. We thus tackle the problem of frequent pattern extraction from sequences of content objects corresponding to user sessions, which are further described within a domain ontology. Here we define a theoretical framework for the resolution of the mining task which comprises a pair of languages for data and pattern description, respectively, and a hierarchically organized pattern space split into levels. The latter underlies an Apriori-like level-wise method for frequent pattern generation and evaluation.
I. INTRODUCTION
The spectacular growth of the Web over the last decade has led to a situation where a user is faced with a huge quantity of information whereas only a tiny part of it might be relevant to the user’s needs (preferences, tastes, sensibilities, etc.). Even on the scale of a single site, the variety of content objects served by an application may require a specific mechanism to help the user find items of interest. Unlike search engines, which require a good knowledge of object descriptions in terms of keywords, content adaptation mechanisms in general and recommender systems in particular are less intrusive. Adaptation relies on the assessment of the relevance degree of an object with respect to the anticipated user’s needs which remain, in most cases, implicit. Therefore, the relevance is approximated based on data about the adequacy of the object to other (similar) users or of other (similar) objects to the same user. In the latter case, objects are compared based on the heuristic guess that likeness will reflect similar relevance degrees. Likeness scores are obtained in various manners, such as: (i) similarity computation from structured descriptions
of objects, (ii) association extraction from co-occurrences of objects in user logs or (iii) aggregation of direct user votes on the relevance of particular objects.
We are currently investigating an association-based recommendation approach whose key task is the mining of meaningful associations from the raw data about user clicks (or object "visits"). Unlike the classical association mining task [1], the approach relies on additional knowledge about the domain underlying the served content (e.g., e-tourism, on-line sale of PCs and appliances, etc.). Our hypothesis is that incorporating such knowledge in pattern extraction should help increase the relevance of the discovered associations while granting them a higher degree of interpretability.
Ontologies have become the standard way of expressing knowledge about a domain and the key to interoperability on the Web. Therefore, a growing number of information providers on the Web power their applications with an ontology describing the content that is served. Hence, in our approach we hypothesize that an ontology is available and retrieve two types of knowledge about content objects: membership in generic concepts (e.g., categories of products to be sold on-line at various levels of abstraction) and existence of inter-object links (e.g., compatibility, part-of, bonus product links, etc.). The data we work on is therefore derived from the raw object sequences by integrating semantic elements, such as the existing links between objects of the sequence and, later on, the domain concepts to which objects belong and the concept roles underlying the links. The corresponding pattern mining problem amounts to a combination of structured pattern mining (sequences [2], trees or graphs) and generalized pattern mining [3] which, to the best of our knowledge, has not been dealt with before.
The key difficulty here is the complex inner structure of the patterns, which vary on two dimensions: the is-a hierarchy of the ontology and the network that roles induce on top of it, resulting in a combinatorial explosion in the space of all potential patterns. Hence the design of efficient and effective mining methods requires insightful procedures for pattern space traversal, candidate generation and frequency tests. In [4], we have presented the main elements of our theoretical framework, i.e., two descriptive languages, a generality relation
on patterns, a level-wise splitting of the pattern space and an algorithm that takes advantage of that splitting “à la Apriori”. The present work completes the theoretical bases and explores the application of semantically enhanced patterns for content adaptation on the web. Topics such as efficiency, scalability and the definition of reduced representations of the interesting patterns, such as closed and maximal ones, are left for future development.
The rest of the paper is organized as follows. Section II summarizes related work on sequential pattern mining. Section III presents the theoretical basis of an approach to mine a new category of sequential patterns: semantically enhanced sequential patterns. A new recommendation strategy based on the mined patterns is presented in Section V. Section VI gives an example that illustrates the way our algorithm xPMiner works and presents some advantages of exploiting the mined patterns in a recommendation process compared to the classical approaches usually used in recommendation systems. Concluding remarks and discussions on future work are given in Section VII.
II. RELATED WORK
Apriori is a classical algorithm for frequent pattern mining [1]. Although it was defined for plain itemsets, the underlying strategy was later adapted to data formats involving some inner structure, such as sequences, trees, graphs, etc. The sequential pattern mining problem was first introduced by Agrawal and Srikant [2], where patterns involve items only. The initial definition was later extended to cover domain categories (classes), which led to the notion of generalized sequential patterns [3]. Mannila et al. [5] proposed a new approach in which patterns are composed of episodes. An episode is defined as a collection of events that occur relatively close to each other in a given partial order. Instead of mining simple patterns, Pinto et al.
have proposed multidimensional sequential patterns [6], [7] where sequential patterns are generated from multidimensional data. Bettini et al. [8] and Wang et al. have addressed the problem of sequential metapattern mining on object sequences [9], [10], where a metapattern is a pattern of patterns. Existing approaches are generally limited to mining objects and/or classes, while other domain knowledge such as inter-object links and inter-class relations is not taken into account. Our preliminary studies (see [11] and [4]) explore association rule mining based on domain ontology links and relations as a mechanism for system recommendation and semantically enhanced sequential pattern extraction. By including links and relations in the mining process, complex structures such as graphs need to be handled, as in the graph mining field, where notions and algorithms for discovering frequent topological substructures over a set of graphs are studied. As in sequential pattern mining, graph mining relies mostly on the Apriori algorithm proposed in association rule mining [1]. For example, Kuramochi and Karypis [12] describe an algorithm, called FSG, which incrementally mines frequent
subgraphs by adding a new edge at each pass. Inokuchi et al. [13] present another Apriori-based algorithm, called AGM, where a new node is added at each pass. Other studies were concerned with mining frequent subtrees using the Apriori scheme, and algorithms such as TreeMiner [14] and FreeTreeMiner [15] have been proposed. Other graph mining algorithms do not rely on the Apriori principle and use depth-first or breadth-first search strategies. This is the case for gSpan proposed by Yan and Han [16] and DSPM proposed by Cohen et al. [17]. Algorithms such as CloseGraph [18], CMTreeMiner [19] and Spin [20] allow the mining of frequent and closed subgraphs.
Before presenting our own approach, we go back to Apriori which, more than a concrete method, represents a strategy for pattern mining that we have adopted in our knowledge-rich context. Indeed, the algorithm could be seen as a combinatorial generation procedure that uses additional validity criteria. In this sense, it represents a traversal of the pattern space in a top-down, breadth-first search. The space is layered, with layers corresponding to pattern cardinality. Moreover, to move between levels, Apriori uses optimization tricks: new level elements are generated by combining two elements from the level immediately above, while interestingness measures (e.g., support) use already tested candidates from the same level. Finally, two languages are implicit in the approach: the data language and the pattern language. In the classical version, both are made of itemsets and hence are indistinguishable. Both relations that are inherent to such pairs of languages, namely instantiation, which links data records (transactions) to the patterns that "characterize" them, and generality between patterns, reduce to mere set-inclusion. The next sections explain how the above elements are spelled out in our own environment.
III. PATTERNS LANGUAGES AND RELATIONS
The presentation of the descriptive formats for data and patterns starts with a brief recall of some basic terms from the ontology engineering field, followed by the description of the underlying languages with their syntax and semantics.
A. Ontologies and ontological languages
An ontology is an explicit conceptualization of a specific domain, for instance, the tourism domain. In particular, it allows for a representation of domain concepts and their relations, or roles, as well as for the representation of some instances of those concepts that are further related by links instantiating generic roles. Roles are usually directed and behave like functions: a role maps the instances of a source concept to those of a target one. For instance, Figure 1 depicts a partial view of an ontology, called Travel¹, as shown by the Protégé plug-in ezOWL. Concepts like Destination, Activity and Accommodation are shown. Furthermore, roles like hasActivity and hasAccommodation are expressed in the ontology, although the selected view does not show them. Instances and their links, for example, a Destination object called Montréal with an Accommodation called Hotel Crowne Plaza, may appear in the ontology.
¹ http://protege.stanford.edu/plugins/owl/owl-library/travel.owl
Ontologies have recently gained interest with the emergence of the Semantic Web [21], and some standardization efforts are underway in the field of ontological languages (e.g., DAML+OIL [22], OWL [23]). OWL (Web Ontology Language) is a de facto standard for ontologies on the web. Its formal background lies in the description logic (DL) field [24]. This means that an ontology $\Omega$ is expressed using a language whose building blocks are a set of (atomic) concept names $T_C$ and role names $T_R$. These are further combined into more complex descriptions by a set of constructors, such as conjunction, disjunction, role restrictions, etc. Individuals are introduced as a set of identifiers $O_\Omega$ (or simply $O$). Although ontological languages provide powerful mechanisms for reasoning, such as subsumption computation for descriptions, we limit our usage of an ontology to its strict representation functions. Hence we assume an asserted generality (is-a) relationship between descriptions, i.e., both concepts and roles, in $\Omega$, say $\subseteq_\Omega$, and a computed subsumption relationship $\sqsubseteq_\Omega$ that extends $\subseteq_\Omega$ to all descriptions of the language behind $\Omega$. For example, just like Hotel is a subclass of Accommodation, hasHotel is a sub-role of hasAccommodation. Moreover, we use two functions to express the connections between the roles and the concepts they describe. Thus, given a relation $r \in T_R$, $dom(r)$ provides the most general concepts that are specified as having the role $r$, while $ran(r)$ provides the most general concepts that have been specified by the ontology designer as target concepts. In a restricted version, $ran(r, c)$ where $c \in T_C$, the latter function only computes the most general target concepts whenever the source is $c$. Furthermore, as $\subseteq_\Omega$ is usually a partial order, it makes sense to speak about predecessors ($pred_\Omega(c_i)$) and successors ($succ_\Omega(c_i)$) of a concept/role in the respective is-a hierarchy.
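The ontology-related functions introduced above can be illustrated with a minimal sketch. The `Ontology` class, its field names and the toy hierarchy below are assumptions made for illustration only (they are not part of the framework, and the hierarchy is a simplified, hypothetical excerpt in the spirit of the Travel ontology); `pred` is taken here to return all strict predecessors, and `ran(r, c)` is approximated by filtering declared (domain, range) pairs.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology: asserted is-a edges plus (domain, range) pairs per role."""
    class_parents: dict = field(default_factory=dict)  # concept -> direct super-concepts
    role_parents: dict = field(default_factory=dict)   # role -> direct super-roles
    signatures: dict = field(default_factory=dict)     # role -> [(domain, range), ...]

    @staticmethod
    def _ancestors(parents, x):
        """All strict predecessors (more general descriptions) of x."""
        seen, stack = set(), list(parents.get(x, ()))
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.add(p)
                stack.extend(parents.get(p, ()))
        return seen

    def pred(self, c):
        return self._ancestors(self.class_parents, c)

    def subsumed(self, a, b):
        """Concept a is-a concept b (reflexive)."""
        return a == b or b in self.pred(a)

    def role_subsumed(self, r1, r2):
        """Role r1 is a sub-role of r2 (reflexive)."""
        return r1 == r2 or r2 in self._ancestors(self.role_parents, r1)

    def incomparable(self, a, b):
        return not self.subsumed(a, b) and not self.subsumed(b, a)

    def dom(self, r):
        return {d for d, _ in self.signatures.get(r, [])}

    def ran(self, r, c=None):
        """Declared ranges of r; with c given, only those whose domain covers c."""
        return {t for d, t in self.signatures.get(r, [])
                if c is None or self.subsumed(c, d)}


# Hypothetical data, loosely inspired by the Travel ontology.
onto = Ontology(
    class_parents={"Hotel": {"Accommodation"}, "LuxuryHotel": {"Hotel"},
                   "Capital": {"Destination"}},
    role_parents={"hasHotel": {"hasAccommodation"}},
    signatures={"hasAccommodation": [("Destination", "Accommodation")],
                "hasHotel": [("Destination", "Hotel")]},
)
```

With this toy data, `onto.subsumed("LuxuryHotel", "Accommodation")` holds through the intermediate class Hotel, and `onto.ran("hasAccommodation", "Capital")` returns the range declared for the domain Destination.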
Descriptions which are neither in predecessor nor in successor relationships are incomparable, and the underlying relationship will be denoted by $\perp$. The ontology-related notations are summarized in Table I.
The descriptions of an ontology language such as OWL are typically provided with an abstract denotational semantics which requires an interpretation domain and an interpretation function. However, for our purposes, we limit that domain to the set of all instances $O$, which means the semantics of a concept $c$ is the set of all individuals that are explicitly stated to belong to $c$ or to a sub-concept thereof. Similarly, a role $r$ is interpreted as a set of links between individuals from $O$. These concrete semantics are jointly denoted $[\,\cdot\,]_\Omega$. In the remainder of this article we will use OWL terms like class, object and relation to indicate concept, individual and role, respectively.
B. Descriptions of data and patterns
The raw data we process consists of sequences of object URLs as recorded in the Web site logs. These are easily translated into object identifiers from the ontology. Thus, on the ontology side, a data entry belongs to $O^\omega = (O \times O \times \ldots)$, the set of all sequences on $O$. Although patterns could be described at the object level, these would have only limited
TABLE I
BASIC ONTOLOGICAL DEFINITIONS

Expression                      Definition
$\Omega$                        domain ontology
$T_C$                           set of all ontology concept names $c_1, c_2, ..., c_n$
$T_R$                           set of all ontology relation names $r$
$\sqsubseteq_\Omega$            concept subsumption relation on $\Omega$
$dom(r)$                        domain concepts of the relation $r$
$ran(r)$                        range concepts of the relation $r$
$ran(r, c)$                     range of $r$ whenever its domain restricts to $c$
$succ_\Omega(c_i)$              set of successors of concept $c_i$ in $\Omega$
$pred_\Omega(c_i)$              set of predecessors of concept $c_i$ in $\Omega$
$c_i \perp c_j$                 $c_i$, $c_j$ are incomparable if $c_i$ is neither parent nor child of $c_j$
$r_i \perp r_j$                 relations $r_i$ and $r_j$ are incomparable if $r_i$ is neither successor nor predecessor of $r_j$
$Max(\{c_1, c_2, ..., c_n\})$   set of concepts $c_i$ which are incomparable and have highest levels of abstraction
interest since, for a large number of objects in $O$, the chances of finding frequent combinations are relatively low. In contrast, moving to the class level substantially increases those chances. Therefore, we shall be looking exclusively at patterns whose elements are ontology classes. Moreover, relations between classes are incorporated in order to add more semantic precision to the resulting patterns. To that end, the initial object sequences are first extended by incorporating all the valid links from the ontology. For conciseness reasons, we limit the inserted links to those which are "co-linear" with the order among objects in the sequence (the source object of the link is before the target one).
1) Pattern syntax: Assume a set of object (ground) sequences $D$ and let $D^o_\Omega$ be the set of extended object sequences, where extended means adding the links to the initial structure as described above. Then, let $\Gamma_O$ denote the universe of all possible object sequences (see class sequences for a formal definition). Let also $\Gamma_\Omega$ be the universe of all extended class sequences (called simply sequences whenever confusion is excluded) that may be produced on top of an ontology $\Omega$. An extended sequence is the pattern we are looking for. It is made of a basic part, which is an ordinary list, and an extension, which is a set of relations given with the indices of the connected classes in the sequence. Formally speaking, the abstract syntax of our pattern language is defined as follows:
Definition 3.1: An extended class sequence is a pair $S = \langle str, rel \rangle$ where:
• $str = \langle c_i \rangle_{i=1..n}$ is a sequence of concept names from $T_C$ such that $c_i \neq c_{i+1}$ for $1 \le i < n$;
• $rel = \{r(l, m) \mid 1 \le l \le m \le n\}$ is a set of relations (triplets) such that:
  - $r \in T_R$ is an ontology role name;
  - $\exists c \in dom(r)$ such that $c_l \sqsubseteq_\Omega c$, which we will denote by $c_l \sqsubseteq_\Omega dom(r)$;
  - $c_m \sqsubseteq_\Omega ran(r, c_l)$.
In the remainder of this paper we denote $\langle c_i \rangle_{i=1..n}$ by $S.str$, $\{r(l, m) \mid 1 \le l \le m \le n\}$ by $S.rel$ and $c_i$ by $S(i)$.
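As an illustration, the conditions of Definition 3.1 can be checked mechanically. The sketch below is not the authors' implementation: the function `is_valid_pattern`, the single-parent hierarchy `PARENT` and the table `SIGNATURE` (one domain/range pair per role, which simplifies $ran(r, c_l)$) are hypothetical names and data introduced here.

```python
# Hypothetical single-parent is-a hierarchy and one (domain, range) pair per role.
PARENT = {"Hotel": "Accommodation", "Sightseeing": "Activity"}
SIGNATURE = {
    "hasAccommodation": ("Destination", "Accommodation"),
    "hasActivity": ("Destination", "Activity"),
}


def is_a(c, d):
    """True when class c equals d or is a descendant of d."""
    while c is not None:
        if c == d:
            return True
        c = PARENT.get(c)
    return False


def is_valid_pattern(classes, rels):
    """Check the conditions of Definition 3.1 on a pair (str, rel).

    classes: tuple of class names; rels: set of (role, l, m) with 1-based indices.
    """
    n = len(classes)
    # Consecutive classes of the basic sequence must differ.
    if any(classes[i] == classes[i + 1] for i in range(n - 1)):
        return False
    for role, l, m in rels:
        # The role must be known and its indices ordered and within the sequence.
        if role not in SIGNATURE or not (1 <= l <= m <= n):
            return False
        domain, range_ = SIGNATURE[role]
        # Source and target classes must fall under the role's domain and range.
        if not (is_a(classes[l - 1], domain) and is_a(classes[m - 1], range_)):
            return False
    return True


# A pattern over Destination, Accommodation and Activity with two roles.
S1 = (("Destination", "Accommodation", "Activity"),
      {("hasAccommodation", 1, 2), ("hasActivity", 1, 3)})
```

Representing `rel` as a set of `(role, l, m)` triples keeps the relation indices explicit, which mirrors the $r(l, m)$ notation of the definition.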
Fig. 1. Partial view of the travel ontology: nodes in the graph represent domain concepts and the labels associated with some of them represent the number of sub-concepts not shown in the figure. Lines represent the binary relationship is-a between concepts.
For example, assume that in Figure 1 a role hasActivity is defined between the classes Destination and Activity, and that hasAccommodation is a role of Destination whose co-domain is Accommodation. Consider now the sequence in Figure 2, in which three classes, Destination, Accommodation and Activity, drawn as ovals, are given in the order of their appearance (to be read from left to right) and connected by the respective roles. The corresponding extended class sequence, say $S_1$, is represented as follows: $\langle \langle Destination, Accommodation, Activity \rangle, \{hasAccommodation(1, 2), hasActivity(1, 3)\} \rangle$.

[Figure 2: the classes Destination, Accommodation and Activity, connected by the roles hasAccommodation and hasActivity.]
Fig. 2. Example of an extended class sequence.

The pattern language is endowed with the empty element $S_\emptyset = \langle \varepsilon, \emptyset \rangle$, where $\varepsilon$ is the empty sequence.
2) Pattern semantics: As for the semantics of the above pattern constructs, one has to ask which data entries they represent. Thus, a pattern has to be interpreted in terms of extended object sequences, which means that, given $\Omega$ and $O$, our interpretation domain is $\Gamma_O$. Now, the function assigning to a pattern the set of all object sequences that are "summarized" by that pattern is based on the following intuition. First, the structure of the object sequence is reflected in the pattern in a possibly reduced manner. Reduction here means that several objects from the data may be represented by a single common class in the pattern. In addition, not all objects need to be represented in the pattern: some may simply vanish. Similarly, relational links may be jointly represented in the pattern as a relation, or simply vanish. To put it differently, given a pattern and an object sequence, all classes from the pattern need to have some "basis" objects in the sequence, and the object packets representing class bases follow the order among classes in the pattern (i.e., pattern and sequence are aligned). Moreover, for each relation from the pattern, at least one link exists between the respective bases of the classes adjacent to the relation. More rigorously, we may express the instantiation between a sequence and a pattern as the existence of a partial mapping from the sequence objects to the pattern classes such that:
• an object is only mapped to a class it is an instance of,
• the order in the object sequence is preserved: if the mapping is defined for two objects, then their image classes are necessarily in the same order,
• the mapping is surjective,
• for each relation in the pattern, there is a link which is an instance of the relation and whose adjacent objects are respectively mapped to the adjacent classes of the relation.
Clearly, the above mapping is not necessarily unique for a pair of an object sequence and a pattern. Formally, the mapping is defined as follows:
Definition 3.2: Given $s \in \Gamma_O$, $S \in \Gamma_\Omega$, $S$ represents $s$ ($s \in [S]_{\Gamma_\Omega}$) if there exist a set $I \subseteq [1..|s|]$ and a surjective monotonically non-decreasing map $\psi : I \to [1..|S|]$ such that:
• $\forall i \in I$, $s(i) \in [S(\psi(i))]_\Omega$;
• $\forall r(i_1, i_2) \in S.rel$, $\exists (j_1, j_2) \in I^2$ such that $\psi(j_1) = i_1$, $\psi(j_2) = i_2$, and $\exists r'(j_1, j_2) \in s.rel$ such that $r' \sqsubseteq_\Omega r$.
Based on pattern semantics, a generality relationship between patterns can be established. Thus, given a couple of extended sequences $S_1, S_2$ in $\Gamma_\Omega$, $S_1$ is more general than $S_2$, denoted $S_2 \le S_1$, if and only if $[S_2]_\Omega \subseteq [S_1]_\Omega$. In other terms, generalization amounts to the inclusion of pattern interpretations.
3) Pattern subsumption: The definition of the generalization relation given above is impractical, as the set of all object sequences is not available. Instead, like in FOL [24], we define a syntactically-bound relation called subsumption which is provably equivalent to semantic generalization. The definition we use is close to the one provided for instantiation, which underscores the similarity of the structures of extended object sequences and patterns.
Definition 3.3: Given $(S_1, S_2) \in (\Gamma_\Omega)^2$, $S_2$ subsumes $S_1$ ($S_1 \sqsubseteq_{\Gamma_\Omega} S_2$) if there exist a set $I \subseteq [1..|S_1|]$ and a surjective monotonically non-decreasing map $\psi : I \to [1..|S_2|]$ such that:
• $\forall i \in I$, $S_1(i) \sqsubseteq_\Omega S_2(\psi(i))$;
• $\forall r(i_1, i_2) \in S_2.rel$, $\exists (j_1, j_2) \in I^2$ such that $\psi(j_1) = i_1$, $\psi(j_2) = i_2$, and $\exists r'(j_1, j_2) \in S_1.rel$ such that $r' \sqsubseteq_\Omega r$.
Definition 3.3 basically grounds pattern subsumption on graph morphism. Moreover, unlike generalization relations on similar knowledge formalisms (e.g., conceptual graphs [25]), the above relationship is a partial order, i.e., $S_1 \sqsubseteq_{\Gamma_\Omega} S_2$ and $S_2 \sqsubseteq_{\Gamma_\Omega} S_1$ imply $S_1 = S_2$. Figure 3 illustrates pattern subsumption according to the Travel ontology.
While Views A and C show two patterns that are subsumed by the pattern presented in View D according to the above definition, View B shows a pattern for which the conditions of Definition 3.3 are not satisfied. In fact, the sequence order in the two patterns is not the same. It may now be stated formally that subsumption can be used to test generalization.
Proposition 3.4: Let $S_1, S_2$ be two extended sequences from $\Gamma_\Omega$ such that $S_1 \sqsubseteq_{\Gamma_\Omega} S_2$; then $S_1 \le S_2$.
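For small patterns, the subsumption test of Definition 3.3 can be sketched as an exhaustive search over candidate maps $\psi$. This brute-force enumeration is an illustrative assumption, not the procedure used by the authors; the function names, the single-parent hierarchies `CLASS_PARENT` and `ROLE_PARENT`, and the sample patterns are hypothetical (they only echo the kind of patterns discussed around Figure 3).

```python
from itertools import product

# Hypothetical single-parent hierarchies (child -> parent).
CLASS_PARENT = {"Capital": "Destination", "LuxuryHotel": "Accommodation",
                "Museum": "Activity"}
ROLE_PARENT = {"hasLuxuryHotel": "hasAccommodation", "hasMuseum": "hasActivity"}


def leq(x, y, parent):
    """True when x equals y or is a descendant of y in a single-parent hierarchy."""
    while x is not None:
        if x == y:
            return True
        x = parent.get(x)
    return False


def subsumes(general, specific):
    """Brute-force test that `general` subsumes `specific` (Definition 3.3).

    A pattern is a pair (classes, rels) with rels a set of (role, l, m), 1-based.
    """
    g_cls, g_rel = general
    s_cls, s_rel = specific
    n, k = len(s_cls), len(g_cls)
    # psi[i] is None when position i + 1 of `specific` is outside I,
    # otherwise it is the 1-based index of its image class in `general`.
    for psi in product([None] + list(range(1, k + 1)), repeat=n):
        image = [p for p in psi if p is not None]
        if image != sorted(image) or set(image) != set(range(1, k + 1)):
            continue  # psi is not monotone non-decreasing or not surjective
        if not all(p is None or leq(s_cls[i], g_cls[p - 1], CLASS_PARENT)
                   for i, p in enumerate(psi)):
            continue  # some mapped class is not a specialization of its image
        # Every relation of `general` needs a more specific witness in `specific`.
        if all(any(psi[j1 - 1] == i1 and psi[j2 - 1] == i2
                   and leq(r2, r, ROLE_PARENT)
                   for (r2, j1, j2) in s_rel)
               for (r, i1, i2) in g_rel):
            return True
    return False


general = (("Destination", "Accommodation", "Activity"),
           {("hasAccommodation", 1, 2), ("hasActivity", 1, 3)})
same_order = (("Capital", "LuxuryHotel", "Museum"),
              {("hasLuxuryHotel", 1, 2), ("hasMuseum", 1, 3)})
other_order = (("Capital", "Museum", "LuxuryHotel"),
               {("hasMuseum", 1, 2), ("hasLuxuryHotel", 1, 3)})
extra_class = (("Capital", "Contact", "LuxuryHotel", "Museum"),
               {("hasLuxuryHotel", 1, 3), ("hasMuseum", 1, 4)})
```

The search is exponential in the length of the more specific pattern, so it is only meant to make the definition concrete: `same_order` and `extra_class` are subsumed by `general`, whereas `other_order` is not because its classes appear in a different order.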
[Figure 3: four patterns over the travel ontology, labelled View A, View B, View C and View D, built from the classes Destination, Capital, Accommodation, LuxuryHotel, Campground, Activity, Museum and Contact and the roles hasAccommodation, hasLuxuryHotel, hasActivity, hasMuseum and hasContact.]
Fig. 3. Subset of an extended class sequence interpretation
C. Pattern operations
Although the subsumption computation no longer requires the availability of the entire universe of extended object sequences, it is still tricky to check whether two patterns are comparable, mainly because of the constraints related to the ground morphism. To ease that, we further decompose the morphism into a sequence of simple transformations that turn the more general pattern into the more specific one. This derivational model of subsumption relies on the definition of four operations that jointly represent a complete and consistent calculus for subsumption. In other terms, two patterns are in generality relation if and only if one of them can be obtained from the other one by applying a finite number of operations of the set. The four operations consist in, respectively, specializing a single occurrence of a class in the sequence (i.e., replacing it by a more specific class), adding a new class at a specific location of the sequence, specializing an existing occurrence of a relation, and adding a new relation between two classes of the sequence. They are described in the remainder of this section.
Specialize a class: replace the class at position $j$ of a sequence $S = \langle str, rel \rangle$ with a class $c$ from $T_C$, provided that $c$ is more specific than $S(j)$ in the ontology $\Omega$ ($c \sqsubseteq_\Omega S(j)$), $c$ remains within the ranges of all the relations $r(\_, j)$ from $rel$, and $c$ is different from its neighbor classes in the sequence. Formally, we have the following:
Definition 3.5: Let $S \in \Gamma_\Omega$, $c \in T_C$. The operation $splCls(S, c, j)$ yields a sequence $\langle str, rel \rangle$ with:
• $str = \langle S(1), ..., S(j-1), c, S(j+1), ..., S(|S|) \rangle$;
• $rel = S.rel$,
where the following conditions are satisfied:
1) $c \sqsubseteq_\Omega S(j)$;
2) $c \neq S(j-1)$;
3) $c \neq S(j+1)$;
4) $\forall r(i, j) \in S.rel : c \sqsubseteq_\Omega ran(r, S(i))$, $1 \le i < j \le |S|$.
For example, by replacing Activity with the more specific class Sightseeing in the pattern $\langle \langle Destination, Activity \rangle, \{\} \rangle$ we obtain the pattern $\langle \langle Destination, Sightseeing \rangle, \{\} \rangle$.
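The four operations of this section (class specialization, class addition, relation specialization and relation addition) can be sketched as plain functions over pairs `(classes, rels)`. Everything below is a hedged illustration under stated assumptions: the function names `spl_cls`, `add_cls`, `spl_rel`, `add_rel`, the helper `_require`, the single-parent hierarchies and the one-pair-per-role `SIGNATURE` table are hypothetical. In `add_cls`, the inserted class is compared with its two neighbors in the resulting sequence, and relation indices at or beyond the insertion point are shifted by one.

```python
# Hypothetical toy ontology: child -> parent, and role -> (domain, range).
CLASS_PARENT = {"Sightseeing": "Activity", "Museum": "Activity",
                "Hotel": "Accommodation", "Capital": "Destination"}
ROLE_PARENT = {"hasMuseum": "hasActivity"}
SIGNATURE = {"hasActivity": ("Destination", "Activity"),
             "hasMuseum": ("Destination", "Museum"),
             "hasAccommodation": ("Destination", "Accommodation")}


def leq(x, y, parent):
    """True when x equals y or is a descendant of y."""
    while x is not None:
        if x == y:
            return True
        x = parent.get(x)
    return False


def _require(condition, message):
    if not condition:
        raise ValueError(message)


def spl_cls(S, c, j):
    """Replace the class at 1-based position j by a more specific class c."""
    classes, rels = S
    n = len(classes)
    _require(1 <= j <= n and leq(c, classes[j - 1], CLASS_PARENT), "c must specialize S(j)")
    _require(j == 1 or c != classes[j - 2], "c equals its left neighbor")
    _require(j == n or c != classes[j], "c equals its right neighbor")
    _require(all(leq(c, SIGNATURE[r][1], CLASS_PARENT) for (r, _, m) in rels if m == j),
             "c leaves the range of an incoming relation")
    return classes[:j - 1] + (c,) + classes[j:], rels


def add_cls(S, c, j):
    """Insert class c at position j, shifting relation indices >= j by one."""
    classes, rels = S
    n = len(classes)
    _require(1 <= j <= n + 1, "position out of range")
    _require(j == 1 or c != classes[j - 2], "c equals its left neighbor")
    _require(j == n + 1 or c != classes[j - 1], "c equals its right neighbor")
    shift = lambda x: x + 1 if x >= j else x
    return (classes[:j - 1] + (c,) + classes[j - 1:],
            frozenset((r, shift(l), shift(m)) for (r, l, m) in rels))


def spl_rel(S, r, i, j, r2):
    """Replace the occurrence r(i, j) by a more specific relation r2(i, j)."""
    classes, rels = S
    _require((r, i, j) in rels and leq(r2, r, ROLE_PARENT), "r2 must specialize r(i, j)")
    return classes, (rels - {(r, i, j)}) | {(r2, i, j)}


def add_rel(S, r, i, j):
    """Add r(i, j) when its domain/range fit and no specialization of r links i and j."""
    classes, rels = S
    domain, range_ = SIGNATURE[r]
    _require(1 <= i <= j <= len(classes), "indices out of range")
    _require(leq(classes[i - 1], domain, CLASS_PARENT)
             and leq(classes[j - 1], range_, CLASS_PARENT), "domain or range mismatch")
    _require(not any((l, m) == (i, j) and leq(r2, r, ROLE_PARENT) for (r2, l, m) in rels),
             "a specialization of r already links i and j")
    return classes, rels | {(r, i, j)}


EMPTY = frozenset()
```

For instance, `add_cls((("Destination", "Activity"), frozenset({("hasActivity", 1, 2)})), "Hotel", 2)` moves the relation to positions (1, 3), since Activity is pushed from position 2 to position 3. Invalid arguments raise `ValueError` rather than returning a malformed pattern.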
Add a class: inserting a class $c$ from $T_C$ into an extended sequence $S$ at a position $j$ is possible provided that $j$ is within the range of the sequence size and $c$ is not identical to its neighbor classes in the sequence.
Definition 3.6: Let $S \in \Gamma_\Omega$, $c \in T_C$. The operation $addCls(S, c, j)$ yields a sequence $\langle str, rel \rangle$ with:
• $1 \le j \le |S| + 1$;
• $c \neq S(j-1)$;
• $c \neq S(j)$;
• $str = \langle S(1), ..., S(j-1), c, S(j), ..., S(|S|) \rangle$;
• $rel = \{r(l, m) \mid r(l, m) \in S.rel, 1 \le l < m < j\} \cup \{r(l, m+1) \mid r(l, m) \in S.rel, 1 \le l < j \le m\} \cup \{r(l+1, m+1) \mid r(l, m) \in S.rel, 1 \le j \le l < m\}$.
For instance: $addCls(\langle \langle Destination, Activity \rangle, \{hasActivity(1, 2)\} \rangle, Hotel, 2) = \langle \langle Destination, Hotel, Activity \rangle, \{hasActivity(1, 3)\} \rangle$.
Specialize a relation: replace a relation occurrence $r(i, j)$ of an extended sequence $S = \langle str, rel \rangle$ with another relation occurrence $r'(i, j)$, provided that $r'$ is a specialization of $r$. This is captured by the following definition:
Definition 3.7: Let $S \in \Gamma_\Omega$, $r, r' \in T_R$. The operation $splRel(S, r, i, j, r')$ yields a sequence $\langle str, rel \rangle$ with:
• $str = S.str$;
• $r(i, j) \in S.rel$;
• $rel = S.rel - \{r(i, j)\} \cup \{r'(i, j)\}$, provided that $r' \sqsubseteq_\Omega r$.
For instance: $splRel(\langle \langle Capital, Museum \rangle, \{hasActivity(1, 2)\} \rangle, hasActivity, 1, 2, hasMuseum) = \langle \langle Capital, Museum \rangle, \{hasMuseum(1, 2)\} \rangle$.
Add a relation: adding a relational link $r$, from $T_R$, between the classes at positions $i$ and $j$, respectively, of an extended sequence $S$ is possible provided that these positions are properly ordered and lie within the range of the sequence size, that the corresponding sequence members are at least sub-classes of a valid pair of domain and range classes for $r$ in the ontology $\Omega$, and that there is no relation in the sequence with the same indices which is a specialization of $r$.
Definition 3.8: Let $S \in \Gamma_\Omega$, $r \in T_R$. The operation $addRel(S, r, i, j)$ yields a sequence $\langle str, rel \rangle$ with:
• $str = S.str$;
• $rel = S.rel \cup \{r(i, j)\}$,
where the following conditions are satisfied:
1) $1 \le i \le j \le |S|$;
2) $S(i) \sqsubseteq_\Omega dom(r)$;
3) $S(j) \sqsubseteq_\Omega ran(r, S(i))$;
4) $\nexists r'(i, j) \in S.rel$ such that $r' \sqsubseteq_\Omega r$.
For example, inserting the relation hasMuseum into the pattern $\langle \langle Destination, Museum \rangle, \{\} \rangle$ has the following effect: $addRel(\langle \langle Destination, Museum \rangle, \{\} \rangle, hasMuseum, 1, 2) = \langle \langle Destination, Museum \rangle, \{hasMuseum(1, 2)\} \rangle$.
The above operations do cover all the possible cases of specialization among patterns. However, an effective generation procedure using the Apriori level-wise traversal principle requires a precise definition of what a level is. Unlike regular itemsets, where the level notion relies on the size in number of items, there is no straightforward way of defining levels in our pattern space. In the next paragraph we propose such a definition, exploiting the key fact that levels are actually another way of defining the precedence order of the generality relationship between patterns.
D. Rank-based level partition of the pattern space
To motivate our level construct, observe that in the Boolean lattice of all itemsets, the immediate successors of a pattern $p$ w.r.t. the generality relationship are all super-itemsets $p'$ that have exactly one item more than $p$. Thus, they lie in level $k + 1$ of the lattice, given that $p$ is on level $k$, i.e., has $k$ items. Thus, in the Apriori traversal of the lattice, one moves from a frequent $p$ to the test of all or at least some of its successors by, basically, adding a single new item to $p$ and looking at the resulting candidates². A similar principle could be imagined in our case, provided we can specify what generality precedence looks like. To that end, we need to define elementary or atomic versions of the above operations, i.e., transformations that turn a pattern into one of its immediate successors.
² Actually, the generation of candidates is slightly more complex, as database joins are performed on patterns of level $k$, but the end effect is nevertheless exactly as indicated.
The goal is, given an operation, to make sure no pattern may be more specific than the argument pattern while strictly more general than the operation result. Intuitively, this can be achieved by limiting the scope of both operation sorts, i.e., addition and specialization of elements. Thus, the specialization-bound operations will only perform a single downward step in the respective generalization hierarchies, that is, only immediate successors of classes and relations can be used in substitutions. Similarly, for addition, only maximally general classes and relations can be used.
Although plausible, the above principles do not admit a well-founded definition of levels in the pattern space. Indeed, while the Boolean lattice of itemsets shows a highly regular structure, allowing easy computation of the level to which a pattern belongs, this is not the case in our pattern space, where the generalization hierarchies of the ontology may introduce irregularities. Hence, our approach is to construct the definition of a level on top of a measure of generality for patterns. The measure, just like the itemset size in the case of Apriori, is to be a linear extension of the pattern space order but, in addition, needs to reflect the potentially irregular character of the space whenever it is induced by an ontology.
Clearly, the measure should reflect the cumulated specificity of the pattern elements, relations and classes, in their respective hierarchies. To that end, individual measures for both element sorts must first be defined. Unsurprisingly, these will have to depend on the location of the elements within the hierarchy, i.e., will reflect their "depth" in the partial order the structure represents. There is no standard definition of depth. From a graph theory point of view, one may choose between the shortest or longest path in the precedence DAG
of the partial order leading to the node and starting with a topmost element (typically, an ontology has a unique Thing class). As we need to reflect relative generality, the measure must be monotone, that is, more specific elements must have greater depth values. Hence we choose the longest-path metric, as the shortest-path one is not necessarily monotone. Mathematically, a step-wise definition for both measures, called height, may be provided.

Definition 3.9: Let $c \in T_C$, $r \in T_R$. The height functions for classes and relations, $h^c_\Omega$ and $h^r_\Omega$, respectively, are defined on $\Omega$ as follows:
• $h^c_\Omega : T_C \to \mathbb{N}$ is defined as:
$$h^c_\Omega(c) = \begin{cases} 1, & \text{if } pred_\Omega(c) = \emptyset \\ \max\{h^c_\Omega(c') \mid c' \sqsubseteq_\Omega c\} + 1, & \text{otherwise.} \end{cases}$$
• $h^r_\Omega : T_R \to \mathbb{N}$ is defined as:
$$h^r_\Omega(r) = \begin{cases} 1, & \text{if } pred_\Omega(r) = \emptyset \\ \max\{h^r_\Omega(r') \mid r' \sqsubseteq_\Omega r\} + 1, & \text{otherwise.} \end{cases}$$

In other words, the above definition ensures that the height of each element exceeds the heights of its predecessors by at least one.

We can now define a rank function that underlies the level structure within the pattern space $\langle \Gamma_\Omega, \sqsubseteq_{\Gamma_\Omega} \rangle$. The rank value reflects the generality of a sequence by combining the generality of its member classes and relations. As the combination is additive, it indirectly reflects the number of classes and relations in the pattern as well. Technically speaking, the rank of an extended class sequence, $\rho : \Gamma_\Omega \to \mathbb{N}$, is the sum of the heights of its classes and of the included relations:
$$\rho(S) = \sum_{c \in S.str} h^c_\Omega(c) + \sum_{r \in S.rel} h^r_\Omega(r).$$
For example, according to the ontology used in this work, the rank of the pattern S = ⟨⟨Destination, Hotel⟩, {hasHotel(1, 2)}⟩ is $\rho(S) = 5$, as Destination has a depth of one, while Hotel and hasHotel are both of depth two in their respective hierarchies.

The rank, as defined above, decreases monotonically with the generalization relation, as indicated by the next proposition.

Proposition 3.10: Let $S_1$, $S_2$ be two extended sequences from $\Gamma_\Omega$. $S_1 \sqsubseteq_{\Gamma_\Omega} S_2$ entails $\rho(S_1) \geq \rho(S_2)$.

We may now define the level k in $\langle \Gamma_\Omega, \sqsubseteq_{\Gamma_\Omega} \rangle$ as being composed of all the sequences of rank k. It is noteworthy that there may be natural numbers k for which level k is empty.

Returning to the elementary operations, we look for those that not only produce an immediate successor pattern, but also increase the pattern rank by exactly one. This means that the result will lie on the next generality level, something which is not ensured by merely requiring precedence between argument and result.

The new operations are merely constrained versions of those defined in the previous section. We name them by adding a simple 'E' (for elementary) suffix to the original names, yielding addClsE(), addRelE(), splClsE() and splRelE().

For class additions, we only authorize maximally general classes that satisfy the conditions of Definition 3.6. This gives the following definition.

Definition 3.11: Let $S \in \Gamma_\Omega$, $c \in T_C$. The operation addClsE(S, c, j) is defined as: addClsE(S, c, j) = addCls(S, c, j) only for the classes c from the set $Max(T_C) - \{S(j-1), S(j)\}$. Otherwise it is undefined.

For example: addClsE(⟨⟨Destination, Activity⟩, {hasActivity(1, 2)}⟩, Accommodation, 2) = ⟨⟨Destination, Accommodation, Activity⟩, {hasActivity(1, 3)}⟩.

For elementary concept specialization, in turn, only replacements with an immediate successor from the ontology are admitted. Moreover, to respect our level constraint, the height of this successor must be exactly one plus the height of the original concept.

Definition 3.12: Let $S \in \Gamma_\Omega$, $c \in T_C$. The operation splClsE(S, c, j) is defined as: splClsE(S, c, j) = splCls(S, c, j) only for the classes c from the set $Max(succ_\Omega(S(j)))$ s.t. $h^c_\Omega(c) = h^c_\Omega(S(j)) + 1$. Otherwise it is undefined.

For instance: splClsE(⟨⟨Destination, Activity⟩, {}⟩, Sightseeing, 2) = ⟨⟨Destination, Sightseeing⟩, {}⟩.

Similarly, for relation additions, we only authorize maximally general relations that satisfy the conditions of Definition 3.8.

Definition 3.13: Let $S \in \Gamma_\Omega$, $r \in T_R$. The operation addRelE(S, r, i, j) is defined as: addRelE(S, r, i, j) = addRel(S, r, i, j) only for the relations r from the set $Max(T_R)$. Otherwise it is undefined.

For instance: addRelE(⟨⟨Capital, LuxuryHotel⟩, {}⟩, hasAccommodation, 1, 2) = ⟨⟨Capital, LuxuryHotel⟩, {hasAccommodation(1, 2)}⟩.

For relation specialization, only replacements with immediate successors from the relation is-a hierarchy are admitted, and only with those whose height is greater by exactly one.

Definition 3.14: Let $S \in \Gamma_\Omega$, $(r, r') \in (T_R)^2$. The operation splRelE(S, r, i, j, r') is defined as: splRelE(S, r, i, j, r') = splRel(S, r, i, j, r') only for the relations r' from the set $Max(succ_\Omega(r))$ s.t. $h^r_\Omega(r') = h^r_\Omega(r) + 1$. Otherwise it is undefined.

For instance: splRelE(⟨⟨Capital, Museum⟩, {hasActivity(1, 2)}⟩, hasActivity, 1, 2, hasSightseeing) = ⟨⟨Capital, Museum⟩, {hasSightseeing(1, 2)}⟩.

Finally, while the above restrictions on the global traversal principles limit the number of generations of a specific pattern (it can only be generated by a subset of its immediate
predecessors in the pattern space), there could still be a large number of such generations. To further reduce redundancy, we add constraints to the admissible operations of candidate pattern generation. Ideally, there should be a single, canonical generating predecessor, although this seems out of our reach for the time being. Nevertheless, by restricting the place within the sequence where operations may apply, we further reduce the combinatorics. Indeed, we set the sequence end as the unique place where the operations must be performed. More specifically, only final classes are eligible for specialization, while new classes can only be added at the end of the sequence. For relations, this means that the target class must necessarily be the final one of the sequence. The resulting operations, called canonical, work as restrictions on the respective elementary ones. They are named by adding a 'C' suffix (for canonical) to the base name: addClsC(), splClsC(), addRelC() and splRelC(). Instead of providing separate definitions of those operations, we refer the reader to Algorithms 4, 6, 5, and 7, respectively, which describe the computation behind each of them.

IV. FREQUENT PATTERN MINING

Mining sequential patterns from the extended sequences $D^o_\Omega$ amounts to finding all the members of $\Gamma_\Omega$ having a support of at least minsup, a user-provided threshold. The target set is thus $\Gamma^s_\Omega = \{S \in \Gamma_\Omega \mid supp(S) \geq minsup\}$, where the support of S is the size of the subset of $D^o_\Omega$ that passes the instantiation test for S. In the following paragraphs we present our mining method that, similarly to Apriori, performs a top-down level-wise search through the pattern space. We start with a description of the instantiation test (see Section III-B.1), i.e., the procedure that determines whether an extended object sequence is a part of the interpretation of a (candidate for frequent) pattern.

A. Instantiation test

The goal is to check whether a class pattern represents a given object sequence. To that end, an algorithm that closely follows Definition 3.2 is used. Its principle may be summarized as follows: for each class and each relation occurrence from the pattern, look for a supporting set, or image, in the extended object sequence and make sure the images for concepts and relations are consistent with each other. Classes are mapped to contiguous subsequences of their instances, whereas relations are supported by at least one link with the same relation label. In addition, such a link should (i) start at an object from the image of the source class of the relation occurrence and (ii) arrive at an object from the target class image.

Technically speaking, in order to construct the graph morphism, the two sequences, s and S, are traversed in parallel. For each class c of the sequence S, its image in s is determined, i.e., the subsequence of yet unprocessed objects that are members of c. It is noteworthy that this subsequence is both contiguous and maximal. Moreover, its determination is a tentative process, as it also depends on the matching between relations. This means that several attempts on several
places in s may be necessary before the correct image of c is discovered or, alternatively, the absence of such an image is established, in which case the entire matching fails. More precisely, once a candidate image subsequence is determined, it is further checked for relation match, i.e., it is checked whether all the incoming relations for c have an equivalent among the relations of s.rel that are incoming for at least one object from the target image subsequence. To that end, a marking mechanism is used to delimit the current candidate subsequence. Failure to find the match of even a single relation for c invalidates the candidate image, and the search for a new candidate subsequence must go further in the sequence s.

Algorithm 1 Instantiation procedure
procedure INST(S: pattern; s: extended object sequence)
    i ← 1; j ← 1
    Reset(Images)
    while i ≤ |S| do
        while j ≤ |s| and s(j) ∉ [S(i)]_Ω do
            j++
        end while
        if j > |s| then
            return false
        end if
        while j ≤ |s| and s(j) ∈ [S(i)]_Ω do
            Images[i] ← Images[i] ∪ {j}
            j++
        end while
        concord ← true
        for all r(k, i) in S.rel do
            if ∄ r′(m, n) ∈ s.rel s.t. r′ ⊑_Ω r, m ∈ Images[k], n ∈ Images[i] then
                concord ← false
                Images[i] ← ∅
            end if
        end for
        if concord then
            i++
        end if
    end while
    return true
end procedure

The pseudo-code of the above test is provided in Algorithm 1, which is a greedy procedure. Moreover, Algorithm 1 can easily be adapted for testing subsumption between patterns, essentially by replacing the object sequence with a class sequence and the ontology membership test for (object, class) pairs with a subsumption test ⊑_Ω for class pairs. The instantiation test is a key building block of our frequent pattern mining method, which is described in the following paragraph.

B. The xPMiner algorithm

Our mining algorithm, called xPMiner, is a top-down level-wise miner similar in spirit to Apriori.
This means that at the (k + 1)-th level, xPMiner uses the frequent patterns generated at level k to compose candidates and then checks their frequency in order to establish which ones are to be kept. The pseudo-code of xPMiner is given in Algorithm 2.

Algorithm 2 xPMiner
Input:
    Ω    ▷ Domain ontology
    D    ▷ Set of object sequences
Output:
    Γ^s_Ω    ▷ Set of frequent patterns
Initialization:
    D^o_Ω ← ISEQTRANS(D)    ▷ Set of extended object sequences
    Γ^1_Ω ← {⟨⟨c⟩, ∅⟩ | c ∈ Max(T_C)}
    Γ^1_Ω ← CANDTEST(Γ^1_Ω)
Method:
    for k = 2; Γ^{k−1}_Ω ≠ ∅; k++ do
        Γ^k_Ω ← ∅
        for all S ∈ Γ^{k−1}_Ω do
            Γ^k_Ω ← Γ^k_Ω ∪ ADDCLS(S) ∪ SPLCLS(S) ∪ ADDREL(S) ∪ SPLREL(S)
        end for
        Γ^k_Ω ← CANDTEST(Γ^k_Ω)
    end for
    return ∪_k Γ^k_Ω
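The control flow of Algorithm 2 can be illustrated with a minimal Python sketch restricted to class-only patterns, so that relations (and hence ADDREL and SPLREL) are omitted. The toy ontology, the objects, the three sessions and all function names below are hypothetical and chosen for illustration only; the instantiation test is the greedy class matching of Algorithm 1 without the relation check, and this is not the authors' implementation.

```python
from typing import Dict, List, Optional, Set, Tuple

Pattern = Tuple[str, ...]

# Hypothetical toy ontology: class -> parent (None marks a maximally general class).
PARENT: Dict[str, Optional[str]] = {
    "Destination": None, "Activity": None,
    "UrbanArea": "Destination", "Sightseeing": "Activity",
}
# Hypothetical objects with their most specific class.
CLASS_OF: Dict[str, str] = {
    "Paris": "UrbanArea", "Rome": "UrbanArea", "Alps": "Destination",
    "Louvre": "Sightseeing", "Colosseum": "Sightseeing",
}
TOPS: List[str] = [c for c, p in PARENT.items() if p is None]


def is_instance(obj: str, cls: str) -> bool:
    """True if obj belongs to cls or to one of its subclasses."""
    c: Optional[str] = CLASS_OF[obj]
    while c is not None:
        if c == cls:
            return True
        c = PARENT[c]
    return False


def inst(pattern: Pattern, seq: List[str]) -> bool:
    """Greedy instantiation test, classes only (no relation check)."""
    j = 0
    for cls in pattern:
        while j < len(seq) and not is_instance(seq[j], cls):
            j += 1                      # skip objects that are not members
        if j == len(seq):
            return False                # no image left for cls
        while j < len(seq) and is_instance(seq[j], cls):
            j += 1                      # consume the maximal contiguous image
    return True


def candidates(p: Pattern) -> Set[Pattern]:
    """Canonical class operations, both applied at the end of the sequence.
    In a tree every child is exactly one height unit below its parent,
    so the height condition on specialization holds automatically here."""
    out = {p + (c,) for c in TOPS if c != p[-1]}             # append a top class
    out |= {p[:-1] + (c,) for c, par in PARENT.items()       # specialize last class
            if par == p[-1] and (len(p) == 1 or c != p[-2])}
    return out


def mine(db: List[List[str]], minsup: int) -> Set[Pattern]:
    def frequent(cands: Set[Pattern]) -> Set[Pattern]:
        return {p for p in cands if sum(inst(p, s) for s in db) >= minsup}

    level = frequent({(c,) for c in TOPS})    # rank-1 seeds
    found: Set[Pattern] = set()
    while level:                               # stop when a level is empty
        found |= level
        level = frequent(set().union(*(candidates(p) for p in level)))
    return found


db = [["Paris", "Louvre"], ["Rome", "Colosseum"], ["Alps"]]
result = mine(db, minsup=2)
```

On this toy input the loop runs through four non-empty levels. The pattern (UrbanArea, Sightseeing) is reached only through (UrbanArea, Activity), since specialization is allowed on the final class alone.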
The algorithm works on an input made of an ontology Ω and a set of object sequences D. At its initialization step, it transforms the sequences into extended object sequences, i.e., it adds all the links from the ontology that connect two objects in the same source-target order as they appear in the sequence (primitive ISEQTRANS(D)^3). Moreover, the structure carrying the candidate patterns is initialized with all patterns of rank one, i.e., the patterns made of a single topmost concept. These candidates are then evaluated for frequency in the data by carrying out instantiation tests for each entry in $D^o_\Omega$ (primitive CANDTEST(), described in Algorithm 3). At each subsequent iteration step, the algorithm performs two main operations. First, using the four auxiliary functions representing the four canonical operations, the candidates of rank k + 1 are generated from those of rank k. The four operations are class append (Algorithm 4), relation addition (Algorithm 5), class specialization (Algorithm 6) and relation specialization (Algorithm 7). Then, the frequency of the candidates is evaluated by testing them against the dataset by means of CANDTEST().

3 Pseudo-code skipped since straightforward.

Algorithm 3 Candidate test
procedure CANDTEST(P: set of patterns)
    Γ^temp_Ω ← ∅
    D^o_Ω: set of extended object sequences
    for all S ∈ P do
        count ← 0
        for all s ∈ D^o_Ω do
            if INST(S, s) then
                count++
            end if
        end for
        if count ≥ minsup then
            Γ^temp_Ω ← Γ^temp_Ω ∪ {S}
        end if
    end for
    return Γ^temp_Ω
end procedure

Algorithm 4 Generate patterns by adding a class
procedure ADDCLS(S: pattern)
    k ← |S|
    Φ ← ∅
    for all c ∈ Max(T_C) − {S(k)} do
        Φ ← Φ ∪ {⟨S.str & c, S.rel⟩}
    end for
    return Φ
end procedure

Algorithm 5 Generate patterns by adding a relation
procedure ADDREL(S: pattern)
    k ← |S|
    Φ ← ∅
    for all r ∈ Max(T_R) s.t. S(k) ⊑_Ω ran(r) do
        for all i s.t. S(k) ⊑_Ω ran(r, S(i)) do
            if ∄ r′(i, k) ∈ S.rel s.t. r′ ⊑_Ω r then
                Φ ← Φ ∪ {⟨S.str, S.rel ∪ {r(i, k)}⟩}
            end if
        end for
    end for
    return Φ
end procedure

Algorithm 6 Generate patterns by specializing a class
procedure SPLCLS(S: pattern)
    k ← |S|
    Φ ← ∅
    for all c ∈ Max(succ_Ω(S(k))) s.t. h^c_Ω(c) = h^c_Ω(S(k)) + 1 do
        if k = 1 or c ≠ S(k − 1) then
            Φ ← Φ ∪ {⟨S.str[(k)←c], S.rel⟩}
        end if
    end for
    return Φ
end procedure

Algorithm 7 Generate patterns by specializing a relation
procedure SPLREL(S: pattern)
    k ← |S|
    Φ ← ∅
    for all r(i, k) ∈ S.rel do
        for all r′ ∈ Max(succ_Ω(r)) s.t. h^r_Ω(r′) = h^r_Ω(r) + 1 do
            Φ ← Φ ∪ {⟨S.str, S.rel + {r′(i, k)} − {r(i, k)}⟩}
        end for
    end for
    return Φ
end procedure

With such a procedure, a question must be asked about its capacity to retrieve exactly the target pattern set, that is, all frequent patterns with respect to minsup. Although both consistency (no infrequent pattern is retrieved) and completeness (all frequent patterns are retrieved) are at stake, due to the specific nature of the computation, only completeness needs to be examined. The following Proposition 4.1 states the completeness of xPMiner.

Proposition 4.1: For an arbitrary natural number k, if patterns of rank k + 1 exist, then any of them can be generated by at least one pattern of rank k by means of the canonical operations.

A corollary of the above proposition is that all the patterns may be generated by an unconstrained level-wise search (induction on k). Frequent patterns form a contiguous subset of the space (an order filter in technical terms), so for any frequent pattern, all its super-patterns are also frequent. Hence there is always a frequent pattern of rank k above any frequent pattern of rank k + 1 that enables the generation of the latter.

An example of the application of our method is provided in the next section, which is dedicated to its use as a recommendation tool.

V. RECOMMENDATION STRATEGY

A recent survey of the state of the art in recommendation systems is presented by G. Adomavicius and A. Tuzhilin in [26]. According to the authors, depending on the portion of domain knowledge used to represent data, such systems can be split into two categories: item-based and concept-based recommendation systems. In item-based recommendation systems, items (objects) frequently accessed together by a user or a group of users are recommended when at least one of them is selected by a new user. For concept-based recommendation systems, access frequencies are sought on the set of domain concepts at different levels of abstraction, and then instances of concepts appearing together are recommended.

A. Concept-based versus item-based recommendation

Generally speaking, concept-based systems offer more flexibility than item-based systems due to the varying abstraction levels of concepts in concept hierarchies. However, both categories of systems present specific limitations that decrease their overall precision rates. Precision is the capacity
of a recommendation system to suggest objects that closely reflect the behavior underlying the patterns used in the current recommendation process. For instance, item-based systems suffer from the "new item problem", which basically means that objects not yet rated or visited are difficult to recommend with precision. Concept-based systems, in turn, suffer from low precision due to the difficulty of choosing, among the set of concept instances, those that are most likely to interest the user.

To illustrate the lack of precision of concept-based systems, we present a simple, realistic example. The scenario is as follows: we consider a set of three users, each of whom has travelled to a city and visited a museum there. The pairs of concerned objects are, respectively: ⟨Montreal, The_Montreal_Museum_of_Fine_Arts⟩, ⟨Paris, Louvre⟩, ⟨London, British_Museum⟩. Now, suppose there is a fourth user who has just visited the city New_York. The question is to find the items that are most likely to be relevant to this user according to the navigation behavior of the first three users. We observe that the first three users have explored museums located in the city they visited. Intuitively, the fourth user should get a suggestion to visit museums located in New_York city, such as American_Folk_Art, American_Numismatic_Society, and Bronx_Museum_of_the_Arts. However, such a suggestion and the underlying reasoning are straightforward neither for item-based systems nor for concept-based systems. In fact, for item-based systems, the object New_York was not yet visited and thus no recommendation can be made. In the case of concept-based systems, the behavior of the three users can be summarized by the pattern ⟨City, Museum⟩. This pattern may suggest all instances of the concept Museum in the ontology Travel, whether they are situated in New_York city or not.

B. Our approach

The above example shows the limitations of recommendation systems based either on items or on domain concepts. To overcome such limitations, one should consider patterns at a high level of abstraction and at the same time keep a high precision. It is noticeable that concepts of a high level of abstraction, as well as relationships between concepts (and hence between their instances), bring added value to recommendation systems. In this section, we propose a third category of recommendation systems based on both concepts and relations belonging to a domain ontology. Patterns composed of concepts and relations of different levels of abstraction are discovered using the algorithm xPMiner, as shown in Section IV-B. For example, in the situation presented above, user behavior will be represented by the pattern ⟨⟨City, Museum⟩, {hasMuseum(1, 2)}⟩, and for the user consulting the New_York object, only instances of the concept Museum involved in a hasMuseum link starting at the New_York object in the ontology Travel will be suggested. Hence, only museums situated in New_York will be recommended to the new user. We believe that the proposed approach combines the precision of item-based systems and
the flexibility of concept-based systems.

In the remainder of the section, we present a recommendation algorithm, SemRAr, based on semantically enhanced sequential patterns. Its most original feature is that objects are recommended according to the way they are connected with other objects in the ontology. More specifically, the algorithm considers the sequence of objects already visited by the user (the user session) and tries to match it against the known patterns of user behavior. Whenever a prefix of a pattern completely matches the user session, its remaining suffix is used to determine the class of the objects to recommend, i.e., the target class, as well as all the links from the session objects to the objects of the target class that might hold. The pseudo-code of SemRAr is provided in Algorithm 8.

Algorithm 8 SemRAr
Input:
    Ω    ▷ Domain ontology
    Γ^s_Ω    ▷ Frequent patterns generated by xPMiner
    O_I    ▷ List of last visited objects
Output:
    H_o    ▷ Hashtable of objects to suggest
Initialization:
    H_o ← ∅
Method:
    s ← ISEQTRANS(O_I)
    for all S ∈ Γ^s_Ω do
        i ← COVERS(S, s, Images)
        if i ≤ |S.str| then
            H_o ← H_o ∪ GETINSTS(Ω, S, O_I, i, Images)
        end if
    end for
    return H_o
To find the objects to be recommended, algorithm SemRAr looks for the subset of frequent patterns that cover the sequence of last visited objects. The coverage of an object sequence by a pattern is very similar to the instantiation between extended object sequences and patterns. Indeed, for each object sequence O in $O_\omega$, there is an associated extended object sequence s that may be tested for instantiation. Thus, a pattern S covers O if and only if it summarizes s or, alternatively, s is an instance of S.

Definition 5.1: Given $S \in \Gamma_\Omega$, $O_i$ in $O_\omega$ and s the associated extended object sequence, S covers $O_i$ iff $s \in [S]_{\Gamma_\Omega}$.

For instance, the object sequence O1 = ⟨Surfline, Paris, Louvre⟩ is covered by the pattern S1 = ⟨⟨City, Museum, LuxuryHotel⟩, {hasMuseum(1, 2), hasLuxuryHotel(1, 3)}⟩.

Given the above observations, it is not surprising that the algorithm that computes coverage follows the same control structure as Algorithm 1. The only noticeable difference is that, instead of a Boolean output, the new algorithm yields an integer, which is the index of the first class in S.str that was not matched within s. Moreover, as a byproduct, the algorithm also exports the Images structure, which embeds the mapping between S and s. The latter is necessary for the computation of the set of suggested objects.

Algorithm 9 Coverage computation
procedure COVERS(S: pattern; s: extended object sequence)
    i ← 1; j ← 1
    Reset(Images)
    while i ≤ |S| do
        while j ≤ |s| and s(j) ∉ [S(i)]_Ω do
            j++
        end while
        if j > |s| then
            return i
        end if
        while j ≤ |s| and s(j) ∈ [S(i)]_Ω do
            Images[i] ← Images[i] ∪ {j}
            j++
        end while
        concord ← true
        for all r(k, i) in S.rel do
            if ∄ r′(m, n) ∈ s.rel s.t. r′ ⊑_Ω r, m ∈ Images[k], n ∈ Images[i] then
                concord ← false
                Images[i] ← ∅
            end if
        end for
        if concord then
            i++
        end if
    end while
    return i
end procedure

Once a pattern S is partially aligned with the sequence s, the resulting mapping may be used in the computation of the target object set to recommend. To that end, all the instances of the first class from S that was not matched to s, say c, are considered as candidates. This corresponds to the intuition that the pattern represents a scenario which is unfolding and that the current step corresponds to the class that has yet to be "realized". Among all instances of c, only those having at least one link to objects from s.str in concordance with the relations from S.rel are kept. The underlying test uses the partial mapping between both structures (Images, established during the instantiation test) to check the concordance condition. The test may have two outcomes: either there are some objects that satisfy the conditions (in the extreme case all objects do, e.g., when no relations exist for c in S.rel) or there are no such objects. In the latter case, the pattern does not contribute to the recommendation. The test is described in Algorithm 10.

Algorithm 10 GETINSTS(Ω: the ontology, S: a pattern, O: the object sequence, i: index of the first unmatched concept, Images: the partial mapping)
    O_t ← ∅
    for all o ∈ ([S(i)]_Ω − asSet(O)) do
        for all r(k, i) ∈ S.rel do
            if ∃ j s.t. j ∈ Images[k] and (s(j), o) ∈ [r]_Ω then
                O_t ← O_t ∪ {o}
            end if
        end for
    end for
    return O_t
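The filtering step of Algorithm 10 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `get_insts`, its parameters, the link triples and the museum data are hypothetical; relation subsumption is ignored (labels must match exactly); and the branch that returns every unvisited instance when the target class has no incoming relation follows the prose description above rather than the listing.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str, str]     # (relation label, source object, target object)
Incoming = Tuple[str, int]      # relation r(k, i) seen from class i: (label, k)


def get_insts(target_instances: Set[str], session: List[str],
              incoming: List[Incoming], images: Dict[int, List[int]],
              links: Set[Link]) -> Set[str]:
    """Keep the unvisited instances of the first unmatched class that are
    linked to the session objects as required by the pattern relations."""
    visited = set(session)
    recommended: Set[str] = set()
    for obj in target_instances - visited:
        if not incoming:
            # Assumption taken from the prose: no relation constrains the class.
            recommended.add(obj)
            continue
        for label, k in incoming:
            # images[k] holds the session positions (0-based) matched to class k.
            if any((label, session[j], obj) in links for j in images.get(k, [])):
                recommended.add(obj)
                break
    return recommended


# Hypothetical data mirroring the New_York example of Section V.
museums = {"Louvre", "British_Museum", "Bronx_Museum_of_the_Arts"}
links = {
    ("hasMuseum", "Paris", "Louvre"),
    ("hasMuseum", "London", "British_Museum"),
    ("hasMuseum", "New_York", "Bronx_Museum_of_the_Arts"),
}
session = ["New_York"]
images = {1: [0]}               # class 1 (City) is matched by session position 0

suggested = get_insts(museums, session, [("hasMuseum", 1)], images, links)
unconstrained = get_insts(museums, session, [], images, links)
```

With the hasMuseum(1, 2) relation only the New_York museum is kept, whereas the relation-free call returns every Museum instance, which is the behavior attributed to purely concept-based patterns.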
For example, if we consider the sequences O1 and S1 given above as being the list of last visited objects and the test pattern, respectively, then the set of objects to be recommended will be: Le_Petit_Manoir, Royal_Garden, Le_Walt, Clarion_St.James, which are instances of LuxuryHotel involved in the relational link hasLuxuryHotel with the instance Paris.

VI. EXAMPLE OF APPLICATION

In this section, we illustrate the way xPMiner operates and show how the resulting semantically enhanced sequential patterns can be used in a recommendation process.

A. Patterns extracted with xPMiner

As public domain datasets for web usage of ontology-powered sites are not available yet, we used a synthetic set of object sequences in the validation of our method. The underlying ontology, Travel, is in the public domain. Table II shows a set of eleven object sequences, while Table III shows the correspondence between the individual objects and their membership classes in Travel. Moreover, the minimum (absolute) support is set to 6 (minsup = 6).

TABLE II: SEQUENCES FORMED OF OBJECTS BELONGING TO THE TRAVEL ONTOLOGY
Id | Object sequence
1 | ⟨France, Grenoble, Paris, Louvre⟩
2 | ⟨Morocco, Rabat, Sofitel_Diwan_Rabat⟩
3 | ⟨Coonabarabran, Warrumbungle_National_Park, GUMIN_GUMIN_HOMESTEAD⟩
4 | ⟨Australia, Cairns, Clifton_Beach, Cape_York_Safari, Montra_Trilogy⟩
5 | ⟨Canada, Forillon, Marche_2Tiers, The_2Campers⟩
6 | ⟨Italy, Roma, Mausoleum_of_Augustus, Grand_Hotel_Plaza⟩
7 | ⟨Bulgaria, Sophia, The_National_Gallery, Hotel_Anel⟩
8 | ⟨Algeria, Tizi, Djurdjura, Hotel_El_Arz, Tala_Guilef⟩
9 | ⟨Paris, Rodin, Royal_Garden⟩
10 | ⟨Cairns, Gazelle_Tanzania_Safari, Clifton_Beach⟩
11 | ⟨Teritoires_Nord_Ouest, Wood_Buffalo, Little_Buffalo⟩

TABLE III: CLASS / INSTANCE MAPPING
Class | Instances
Country | Australia, Algeria, Bulgaria, Canada, Italy, Morocco
Capital | Paris, Rabat, Roma, Sophia
City | Grenoble, Cairns
Town | Coonabarabran, Tizi
Museum | Louvre, Mausoleum_of_Augustus, Rodin, The_National_Gallery
Hotel | Montra_Trilogy, Hotel_El_Arz
LuxuryHotel | Sofitel_Diwan_Rabat, Grand_Hotel_Plaza, Royal_Garden, Hotel_Anel
NationalPark | Warrumbungle_National_Park, Forillon, Wood_Buffalo, Djurdjura
Campground | GUMIN_GUMIN_HOMESTEAD, Little_Buffalo, The_2Campers
Beach | Clifton_Beach
Safari | Gazelle_Tanzania_Safari, Cape_York_Safari
Hiking | Marche_2Tiers
Skiing | Tala_Guilef

In the following, for the sake of simplicity, instead of presenting all the patterns at different levels, we follow the evolution of a single pattern starting from the initialization phase.

At the first step, the candidate patterns have only one concept: S1 = ⟨⟨Accommodation⟩, {}⟩, S2 = ⟨⟨AccommodationRating⟩, {}⟩, S3 = ⟨⟨Activity⟩, {}⟩, S4 = ⟨⟨Contact⟩, {}⟩, S5 = ⟨⟨Destination⟩, {}⟩. Then, the candidate test procedure (Algorithm 3) calculates for each candidate the number of its instances in the object sequences. For example, S1 fails the test with the first and the tenth sequences but passes the instantiation test with the nine remaining sequences. The support of the five patterns is presented in Table IV. From this table, we observe that S2 and S4 have a null support. In fact, it is easy to see that neither AccommodationRating in S2 nor Contact in S4 has instances in the set of object sequences. At this point, the anti-monotony property of frequency ensures that all patterns that can be generated from a non-frequent pattern are non-frequent. All other candidates have a support of at least the fixed minsup of 6. Thus, S1, S3, and S5 are marked as frequent and will be considered as the seed set for the next pass.

TABLE IV: PATTERNS OF RANK 1 AND THEIR ASSOCIATED SUPPORT
Pattern | Support
S1 | 9
S2 | 0
S3 | 8
S4 | 0
S5 | 11
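The rank-1 supports of Table IV can be recomputed from Tables II and III with a few lines of Python. In this sketch the grouping of the Table III classes under the top-level concepts Destination, Activity and Accommodation is an assumption inferred from the patterns discussed in this section, the variable and function names are hypothetical, and object spellings are normalized (Clifton_Beach, Djurdjura, Hotel_Anel). For a single-class pattern the instantiation test reduces to checking that the sequence contains at least one instance of the class.

```python
from typing import Dict, List

# Assumed top-level concept of each Table III class (not stated in the tables).
TOP_OF_CLASS: Dict[str, str] = {
    "Country": "Destination", "Capital": "Destination", "City": "Destination",
    "Town": "Destination", "NationalPark": "Destination", "Beach": "Destination",
    "Museum": "Activity", "Safari": "Activity", "Hiking": "Activity",
    "Skiing": "Activity",
    "Hotel": "Accommodation", "LuxuryHotel": "Accommodation",
    "Campground": "Accommodation",
}

# Table III, class -> instances.
INSTANCES: Dict[str, str] = {
    "Country": "Australia Algeria Bulgaria Canada Italy Morocco",
    "Capital": "Paris Rabat Roma Sophia",
    "City": "Grenoble Cairns",
    "Town": "Coonabarabran Tizi",
    "Museum": "Louvre Mausoleum_of_Augustus Rodin The_National_Gallery",
    "Hotel": "Montra_Trilogy Hotel_El_Arz",
    "LuxuryHotel": "Sofitel_Diwan_Rabat Grand_Hotel_Plaza Royal_Garden Hotel_Anel",
    "NationalPark": "Warrumbungle_National_Park Forillon Wood_Buffalo Djurdjura",
    "Campground": "GUMIN_GUMIN_HOMESTEAD Little_Buffalo The_2Campers",
    "Beach": "Clifton_Beach",
    "Safari": "Gazelle_Tanzania_Safari Cape_York_Safari",
    "Hiking": "Marche_2Tiers",
    "Skiing": "Tala_Guilef",
}

# Table II, one string per object sequence.
SEQUENCES: List[List[str]] = [s.split() for s in [
    "France Grenoble Paris Louvre",
    "Morocco Rabat Sofitel_Diwan_Rabat",
    "Coonabarabran Warrumbungle_National_Park GUMIN_GUMIN_HOMESTEAD",
    "Australia Cairns Clifton_Beach Cape_York_Safari Montra_Trilogy",
    "Canada Forillon Marche_2Tiers The_2Campers",
    "Italy Roma Mausoleum_of_Augustus Grand_Hotel_Plaza",
    "Bulgaria Sophia The_National_Gallery Hotel_Anel",
    "Algeria Tizi Djurdjura Hotel_El_Arz Tala_Guilef",
    "Paris Rodin Royal_Garden",
    "Cairns Gazelle_Tanzania_Safari Clifton_Beach",
    "Teritoires_Nord_Ouest Wood_Buffalo Little_Buffalo",
]]

# Object -> top-level concept; objects absent from Table III stay unmapped.
top_of = {obj: TOP_OF_CLASS[cls]
          for cls, objs in INSTANCES.items() for obj in objs.split()}


def support(top_class: str) -> int:
    """Number of sequences containing at least one instance of top_class."""
    return sum(any(top_of.get(o) == top_class for o in seq) for seq in SEQUENCES)


supports = {c: support(c) for c in
            ["Accommodation", "AccommodationRating", "Activity", "Contact", "Destination"]}
frequent = sorted(c for c, n in supports.items() if n >= 6)   # minsup = 6
```

Under these assumptions the computed supports agree with Table IV, and the three frequent seeds are Accommodation, Activity and Destination.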
Each 1-frequent pattern is tentatively extended by means of canonical operations, thus yielding the set of 2-candidates. For instance, concept specialization on S5 produces the following set of 2-candidates: ⟨⟨BackpakerDestination⟩, {}⟩, ⟨⟨Beach⟩, {}⟩, ⟨⟨BudgetHotelDestination⟩, {}⟩, ⟨⟨FamillyDestination⟩, {}⟩, ⟨⟨QuietDestination⟩, {}⟩, ⟨⟨RetireeDestination⟩, {}⟩, ⟨⟨RuralArea⟩, {}⟩ and ⟨⟨UrbanArea⟩, {}⟩. Alternatively, by concept insertion, the following set of 2-candidates is obtained: ⟨⟨Destination, Accommodation⟩, {}⟩, ⟨⟨Destination, AccommodationRating⟩, {}⟩, ⟨⟨Destination, Contact⟩, {}⟩ and ⟨⟨Destination, Activity⟩, {}⟩. The list of all 2-candidates is presented in Table V.

TABLE V: CANDIDATES OF RANK 2 AND THEIR ASSOCIATED SUPPORT (2-CANDIDATES)
Pattern | Support
⟨⟨BackpakerDestination⟩, {}⟩ | 0
⟨⟨Beach⟩, {}⟩ | 2
⟨⟨BudgetHotelDestination⟩, {}⟩ | 0
⟨⟨FamillyDestination⟩, {}⟩ | 0
⟨⟨QuietDestination⟩, {}⟩ | 0
⟨⟨RetireeDestination⟩, {}⟩ | 0
⟨⟨RuralArea⟩, {}⟩ | 4
⟨⟨UrbanArea⟩, {}⟩ | 9
⟨⟨Destination, Accommodation⟩, {}⟩ | 9
⟨⟨Destination, AccommodationRating⟩, {}⟩ | 0
⟨⟨Destination, Contact⟩, {}⟩ | 0
⟨⟨Destination, Activity⟩, {}⟩ | 8

It is noteworthy that operations involving relations are not applicable until step two of the process. Now, the 2-candidates are filtered through the instantiation test in order to determine the 2-frequent patterns. We observe that only the patterns ⟨⟨Destination, Accommodation⟩, {}⟩, ⟨⟨UrbanArea⟩, {}⟩ and ⟨⟨Destination, Activity⟩, {}⟩ are frequent (see Table VI). These patterns, having rank 2, are then used to generate the 3-frequent patterns.

TABLE VI: FREQUENT PATTERNS OF RANK 2 AND THEIR ASSOCIATED SUPPORT (2-FREQUENTS)
Pattern | Support
⟨⟨UrbanArea⟩, {}⟩ | 9
⟨⟨Destination, Accommodation⟩, {}⟩ | 9
⟨⟨Destination, Activity⟩, {}⟩ | 8
As an illustration, consider the extension of the pattern ⟨⟨Destination, Activity⟩, {}⟩. Candidates generated by concept specialization (on Activity) are: ⟨⟨Destination, Adventure⟩, {}⟩, ⟨⟨Destination, Relaxation⟩, {}⟩, ⟨⟨Destination, Sightseeing⟩, {}⟩ and ⟨⟨Destination, Sports⟩, {}⟩. Observe that several concept insertions generate the patterns ⟨⟨Destination, Activity, Accommodation⟩, {}⟩, ⟨⟨Destination, Activity, AccommodationRating⟩, {}⟩, ⟨⟨Destination, Activity, Contact⟩, {}⟩, and ⟨⟨Destination, Activity, Destination⟩, {}⟩. The pattern ⟨⟨Destination, Activity⟩, {hasActivity(1, 2)}⟩ is the only one obtained by relation insertion. Table VII provides the frequency of the candidates at the 3rd step.

TABLE VII: FREQUENT PATTERNS OF RANK 3 AND THEIR ASSOCIATED SUPPORT
Pattern | Support
⟨⟨City⟩, {}⟩ | 8
⟨⟨Destination, Accommodation⟩, {hasAccommodation(1, 2)}⟩ | 9
⟨⟨UrbanArea, Activity⟩, {}⟩ | 9
⟨⟨UrbanArea, Accommodation⟩, {}⟩ | 8
⟨⟨Destination, Activity⟩, {hasActivity(1, 2)}⟩ | 7
⟨⟨Destination, Activity, Accommodation⟩, {}⟩ | 6
At the fourth iteration, after the generation and the subsequent instantiation test, a set of eight 4-frequent patterns is obtained (see Table VIII).

TABLE VIII: FREQUENT PATTERNS OF RANK 4 AND THEIR ASSOCIATED SUPPORT
Pattern | Support
⟨⟨City, Activity⟩, {}⟩ | 6
⟨⟨UrbanArea, Accommodation⟩, {hasAccommodation(1, 2)}⟩ | 8
⟨⟨UrbanArea, Hotel⟩, {}⟩ | 6
⟨⟨UrbanArea, Sightseeing⟩, {}⟩ | 6
⟨⟨UrbanArea, Activity⟩, {hasActivity(1, 2)}⟩ | 7
⟨⟨Destination, Hotel⟩, {hasAccommodation(1, 2)}⟩ | 6
⟨⟨Destination, Activity, Accommodation⟩, {hasAccommodation(1, 3)}⟩ | 6
⟨⟨Destination, Activity, Accommodation⟩, {hasActivity(1, 2)}⟩ | 6
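The rank values that head these tables can be recomputed from the heights of Definition 3.9. The following Python sketch computes heights as longest paths from a topmost element and ranks as sums of heights. The parent maps are assumptions inferred from the examples in this paper (for instance, hasHotel directly below hasAccommodation and City below UrbanArea) and cover only the elements needed here; the function names are hypothetical.

```python
from functools import lru_cache
from typing import Callable, Dict, List, Sequence, Tuple

Relation = Tuple[str, int, int]   # r(i, j) as (label, i, j)

# Assumed fragments of the class and relation hierarchies (element -> predecessors).
CLASS_PARENTS: Dict[str, List[str]] = {
    "Destination": [], "UrbanArea": ["Destination"], "City": ["UrbanArea"],
    "Accommodation": [], "Hotel": ["Accommodation"],
    "Activity": [], "Sightseeing": ["Activity"],
}
REL_PARENTS: Dict[str, List[str]] = {
    "hasAccommodation": [], "hasHotel": ["hasAccommodation"],
    "hasActivity": [], "hasSightseeing": ["hasActivity"],
}


def make_height(parents: Dict[str, List[str]]) -> Callable[[str], int]:
    """Height = 1 for topmost elements, else 1 + the largest predecessor height,
    i.e. the length of the longest path from a topmost element."""
    @lru_cache(maxsize=None)
    def height(x: str) -> int:
        preds = parents[x]
        return 1 if not preds else 1 + max(height(p) for p in preds)
    return height


class_height = make_height(CLASS_PARENTS)
rel_height = make_height(REL_PARENTS)


def rank(classes: Sequence[str], relations: Sequence[Relation]) -> int:
    """Rank of a pattern: sum of class heights plus sum of relation heights."""
    return (sum(class_height(c) for c in classes)
            + sum(rel_height(label) for label, _, _ in relations))


# Example from Section III: <<Destination, Hotel>, {hasHotel(1, 2)}>.
example_rank = rank(["Destination", "Hotel"], [("hasHotel", 1, 2)])

# The eight patterns of Table VIII.
table_viii = [
    (["City", "Activity"], []),
    (["UrbanArea", "Accommodation"], [("hasAccommodation", 1, 2)]),
    (["UrbanArea", "Hotel"], []),
    (["UrbanArea", "Sightseeing"], []),
    (["UrbanArea", "Activity"], [("hasActivity", 1, 2)]),
    (["Destination", "Hotel"], [("hasAccommodation", 1, 2)]),
    (["Destination", "Activity", "Accommodation"], [("hasAccommodation", 1, 3)]),
    (["Destination", "Activity", "Accommodation"], [("hasActivity", 1, 2)]),
]
table_viii_ranks = [rank(c, r) for c, r in table_viii]
```

Under the assumed hierarchies every pattern of Table VIII has rank 4, and the Section III example has rank 5. Because each predecessor list may hold several entries, the same function also handles classes with multiple parents, where the longest path matters.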
Table IX and Table X present the frequent patterns of rank 5 and 6, respectively. Table XI presents the list of 7-candidates with their associated support; the seventh step is the final one.

TABLE IX: FREQUENT PATTERNS OF RANK 5 AND THEIR ASSOCIATED SUPPORT
Pattern | Support
⟨⟨UrbanArea, Hotel⟩, {hasAccommodation(1, 2)}⟩ | 6
⟨⟨UrbanArea, Sightseeing⟩, {hasActivity(1, 2)}⟩ | 6
⟨⟨Destination, Hotel⟩, {hasHotel(1, 2)}⟩ | 6
⟨⟨Destination, Activity, Accommodation⟩, {hasActivity(1, 2), hasAccommodation(1, 3)}⟩ | 6

TABLE X: FREQUENT PATTERNS OF RANK 6 AND THEIR ASSOCIATED SUPPORT
Pattern | Support
⟨⟨UrbanArea, Hotel⟩, {hasHotel(1, 2)}⟩ | 6
⟨⟨UrbanArea, Sightseeing⟩, {hasSightseeing(1, 2)}⟩ | 6

TABLE XI: CANDIDATE PATTERNS OF RANK 7 AND THEIR ASSOCIATED SUPPORT
Pattern | Support
⟨⟨UrbanArea, LuxuryHotel⟩, {hasHotel(1, 2)}⟩ | 4
⟨⟨UrbanArea, Hotel, AccommodationRating⟩, {hasHotel(1, 2)}⟩ | 0
⟨⟨UrbanArea, Hotel, Contact⟩, {hasHotel(1, 2)}⟩ | 0
⟨⟨UrbanArea, Hotel, Destination⟩, {hasHotel(1, 2)}⟩ | 0
⟨⟨UrbanArea, Hotel, Accommodation⟩, {hasHotel(1, 2)}⟩ | 0
⟨⟨UrbanArea, Hotel, Activity⟩, {hasHotel(1, 2)}⟩ | 1
⟨⟨UrbanArea, Museum⟩, {hasSightseeing(1, 2)}⟩ | 4
⟨⟨UrbanArea, Safari⟩, {hasSightseeing(1, 2)}⟩ | 1
⟨⟨UrbanArea, Sightseeing, AccommodationRating⟩, {hasSightseeing(1, 2)}⟩ | 0
⟨⟨UrbanArea, Sightseeing, Contact⟩, {hasSightseeing(1, 2)}⟩ | 0
⟨⟨UrbanArea, Sightseeing, Destination⟩, {hasSightseeing(1, 2)}⟩ | 0
⟨⟨UrbanArea, Sightseeing, Accommodation⟩, {hasSightseeing(1, 2)}⟩ | 1
⟨⟨UrbanArea, Sightseeing, Activity⟩, {hasSightseeing(1, 2)}⟩ | 4

Actually, the process stops for one of the following reasons: (i) there are no more frequent patterns, or (ii) no new candidate can be generated. In our case, the mining process stops at the 7th iteration because none of the 7-candidates is frequent.

B. Instance recommendation

In this subsection we highlight the advantages of using the discovered patterns to suggest or recommend objects according to the last visited ones. The basis of recommendation systems is to apply data analysis techniques to generate a list of
recommended content products for each user according to his/her past behavior. In our approach, the past behavior is represented by means of sequential patterns that combine conceptual and relational information. In the following, we use our example to illustrate the item-based, concept-based, and concept and relation-based recommendation approaches. In the first approach, items (objects) frequently accessed together by many users are recommended when at least one of them is selected by a new user. From the object sequences of Table II the only objects that appear more than once are Paris and Cairns with a frequency less than minsup. In the concept-based approach, access frequencies are computed on the set of domain concepts at different levels of abstraction, and then, instances of concepts appearing together are recommended. The patterns based on concepts are identical to those mined in Subsection VI-A except for the fact that they do not cover ontology relations. For instance, the patterns hU rbanArea, Hoteli and hU rbanArea, Sightseeingi correspond respectively to the patterns hhU rbanArea, Hoteli, {hasHotel(1, 2)}i, and hhU rbanArea, Sightseeingi, {hasSightseeing(1, 2)}i. In our approach, concepts and inter-concept relationships are integrated into the pattern mining and hence the recommendation processes as shown in Subsection VI-A. Because of the essential role of object co-occurrences in item-based recommendation systems, an object will not be suggested if it is not frequently visited. For example, in Table XII there is no object to be recommended for a sequence of
accessed objects made of Montreal and Central_Florida. In fact, these objects do not appear at all in the initial data sequences. In contrast, a concept-based recommendation strategy would find some objects to recommend, namely the instances of the concepts Hotel and Sightseeing. However, in doing so, it makes no distinction between hotels in Montreal and hotels elsewhere. The same holds for Sightseeing, whose instances further belong to either Safari or Museum and will be recommended regardless of whether they are located in the Montreal area. Although the association is immediate for a human reader, nothing in the pattern structure helps deduce the geographical links between urban areas, hotels and sightseeing. As a result, the recommendation quality would suffer.

With our ontology-based approach, the information that avoids inconsistent recommendations of this sort is provided by inter-concept relations. For instance, for users selecting Montreal, our approach would use the above patterns ⟨⟨UrbanArea, Hotel⟩, {hasHotel(1, 2)}⟩ and ⟨⟨UrbanArea, Sightseeing⟩, {hasSightseeing(1, 2)}⟩ to recommend The_Montreal_Museum_of_Fine_Arts and Pointe_à_Callière, both instances of Museum, and Le_Saint_Malo and Pierre_du_Calvet, instances of Hotel. All of them are connected to Montreal within Travel. For users selecting Florida_Beach, no 6-frequent pattern (see Table X) matches the choice. However, if the patterns from Table IX are considered, then the pattern ⟨⟨Destination, Activity, Accommodation⟩, {hasActivity(1, 2), hasAccommodation(1, 3)}⟩ suggests an instance of Surf (a subclass of Activity) that is associated to Florida_Beach by a hasSurf link. The other pattern, ⟨⟨Destination, Hotel⟩, {hasHotel(1, 2)}⟩, however, fails.

Another advantage of using the proposed patterns is the ability to spot possibly incoherent data in the mining process. Such incoherent data remain out of the reach of purely concept-based pattern discovery.
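A minimal sketch of this difference, under stated assumptions: a pattern is taken to match an object sequence when some order-preserving subsequence instantiates its concepts and every relation constraint holds between the objects at the indicated 1-based positions. The matching semantics, the Beach concept, the function name, and the absence of subsumption handling are simplifications introduced for this example only.

```python
from itertools import combinations

# Toy typing of the objects; concept names other than City and Safari are assumed.
INSTANCE_OF = {
    "Cairns": "City",
    "Gazelle_Tanzania_Safari": "Safari",
    "Cifton_Beach": "Beach",
}

# Relation assertions (subject, relation, object). The text reports no apparent
# link between Cairns and the safari, so none is asserted here.
FACTS = set()


def matches(sequence, concepts, relations=()):
    """Return True if the sequence instantiates the pattern.

    concepts  -- list of concept names, in order
    relations -- iterable of (relation, i, j) with 1-based pattern positions
    """
    for idx in combinations(range(len(sequence)), len(concepts)):
        objs = [sequence[i] for i in idx]
        if any(INSTANCE_OF.get(o) != c for o, c in zip(objs, concepts)):
            continue
        if all((objs[i - 1], r, objs[j - 1]) in FACTS for r, i, j in relations):
            return True
    return False


seq = ["Cairns", "Gazelle_Tanzania_Safari", "Cifton_Beach"]
concept_only = matches(seq, ["City", "Safari"])
with_relation = matches(seq, ["City", "Safari"], [("hasSightseeing", 1, 2)])
print(concept_only, with_relation)
```

With these toy data the concept-only pattern accepts the sequence, whereas the relation-constrained pattern rejects it.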
For example, if we consider the object sequence ⟨Cairns, Gazelle_Tanzania_Safari, Cifton_Beach⟩ in concept-based pattern discovery, this sequence will be considered an element of the interpretation set of the pattern ⟨City, Safari⟩. However, it seems logical to rule out such a sequence because there is no apparent link between Gazelle_Tanzania_Safari and Cairns (although this hypothesis might be wrong).

VII. DISCUSSION AND FUTURE WORK

In this paper, we have investigated the problem of mining frequent patterns from sequences of objects described by means of a domain ontology. We proposed a language, with its syntax and semantics, to describe the target patterns. The structure of the underlying pattern space was studied and a mining procedure to explore it in a level-wise manner was defined. The design is based on a careful analysis of the Apriori framework, and the procedure is complete in the sense that it discovers all the frequent patterns.
TABLE XII. OBJECTS SUGGESTED ACCORDING TO DIFFERENT RECOMMENDATION STRATEGIES
Last object visited | Classic approaches: item-based | Classic approaches: generalized | Ontology-based (xPMiner)
Montreal | {} | {All direct and indirect instances of classes Hotel and Sightseeing from the ontology Travel} | {The_Montreal_Museum_of_Fine_Arts, Pointe_à_Callière}, {Le_Saint_Malo, Pierre_du_Calvet}
Central_Florida | {} | {All direct and indirect instances of classes Activity and Accommodation from the ontology Travel} | {The_Westin_Diplomat_Resort_and_Spa}, {Surfline}
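The contrast summarized in Table XII can be sketched for the Montreal case as follows. This is only an illustration: the dictionaries form a toy fragment of the Travel ontology, Montreal is assumed to be a direct instance of UrbanArea, and the function names are hypothetical rather than part of xPMiner.

```python
# Toy fragment of the Travel ontology (assumed layout, names taken from the text).
SUBCLASS_OF = {"Museum": "Sightseeing", "Safari": "Sightseeing"}
INSTANCE_OF = {
    "Montreal": "UrbanArea",  # assumed to be a direct instance
    "The_Montreal_Museum_of_Fine_Arts": "Museum",
    "Pointe_à_Callière": "Museum",
    "Le_Saint_Malo": "Hotel",
    "Pierre_du_Calvet": "Hotel",
    "Gazelle_Tanzania_Safari": "Safari",
}
FACTS = {
    ("Montreal", "hasHotel", "Le_Saint_Malo"),
    ("Montreal", "hasHotel", "Pierre_du_Calvet"),
    ("Montreal", "hasSightseeing", "The_Montreal_Museum_of_Fine_Arts"),
    ("Montreal", "hasSightseeing", "Pointe_à_Callière"),
}


def is_a(obj, concept):
    """True if obj is a direct or indirect instance of concept."""
    current = INSTANCE_OF.get(obj)
    while current is not None:
        if current == concept:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def concept_based(last_visited, concepts):
    """Recommend every instance of the second concept of a two-concept pattern."""
    first, second = concepts
    if not is_a(last_visited, first):
        return set()
    return {o for o in INSTANCE_OF if is_a(o, second)}


def ontology_based(last_visited, concepts, relation):
    """Same as concept_based, but keep only objects linked by the relation."""
    candidates = concept_based(last_visited, concepts)
    return {o for o in candidates if (last_visited, relation, o) in FACTS}


generalized = concept_based("Montreal", ("UrbanArea", "Sightseeing"))
relational = ontology_based("Montreal", ("UrbanArea", "Sightseeing"), "hasSightseeing")
hotels = ontology_based("Montreal", ("UrbanArea", "Hotel"), "hasHotel")
print(sorted(generalized))
print(sorted(relational))
print(sorted(hotels))
```

In this toy setting the generalized strategy also returns Gazelle_Tanzania_Safari, while the relation-aware variant keeps only the objects linked to Montreal.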
Devising an effective mining procedure for the new family of patterns is only the first step of our study. Indeed, it is well known that level-wise algorithms may perform poorly on some datasets. The next goal will be to improve the success rate by designing algorithms that explore the pattern space in a depth-first manner. Another topic to tackle is the definition of reduced representations of the frequent pattern family, such as closed or maximal patterns. Recently, xPMiner has been implemented as a plug-in for the Protégé ontology manipulation platform (http://protege.stanford.edu). Since the ultimate goal of our study is recommendation, the mining component will be integrated into a complete architecture for extracting and exploiting behavioral patterns (e.g., for predicting next hit/choice/move operations).

REFERENCES
[1] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” Proc. 20th International Conference on Very Large Data Bases, VLDB94, 1994.
[2] R. Agrawal and R. Srikant, “Mining sequential patterns,” Proceedings of the Eleventh International Conference on Data Engineering, IEEE Computer Society, pp. 3–14, 1995.
[3] R. Srikant and R. Agrawal, “Mining sequential patterns: Generalizations and performance improvements,” Proc. 5th Int. Conf. Extending Database Technology, EDBT, Avignon, France, vol. 1057, pp. 3–17, 1996.
[4] M. Adda, P. Valtchev, R. Missaoui, and C. Djeraba, “On the discovery of semantically enhanced sequential patterns,” IEEE Proceedings, Fourth International Conference on Machine Learning and Applications, pp. 383–390, December 2005.
[5] H. Mannila, H. Toivonen, and A. Verkamo, “Discovery of frequent episodes in event sequences,” Data Mining Knowledge Discovery journal, vol. 1, no. 3, pp. 259–289, July 1997.
[6] H. Pinto, “Multi-dimensional sequential pattern mining,” M.Sc. thesis, Computing Science, Simon Fraser University, April 2001.
[7] H. Pinto, J. Han, J. Pei, K. Wang, Q. Chen, and U. Dayal, “Multidimensional sequential pattern mining,” Proc. Int. Conf. on Information and Knowledge Management (CIKM’01), Atlanta, GA, November 2001.
[8] C. Bettini, X. Wang, S. Jajodia, and L. Jia-Ling, “Discovering temporal relationships with multiple granularities in time sequences,” IEEE Transactions on Knowledge and Data Engineering, vol. 10, no. 2, 1998.
[9] W. Wang, J. Yang, and P. S. Yu, “Meta-patterns: Revealing hidden periodic patterns,” Proceedings of the 2001 IEEE International Conference on Data Mining, 2001.
[10] J. Yang, W. Wang, and P. S. Yu, “Discovering high order periodic patterns,” Knowledge and Information Systems, Springer-Verlag London Ltd, pp. 243–268, May 2004.
[11] M. Adda, R. Missaoui, P. Valtchev, and C. Djeraba, “Recommendation strategy based on relation rule mining,” IJCAI-2005 Workshop on Intelligent Techniques for Web Personalization, 2005.
[12] M. Kuramochi and G. Karypis, “An efficient algorithm for discovering frequent subgraphs,” IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 9, pp. 1038–1051, 2004.
[13] A. Inokuchi, T. Washio, and H. Motoda, “An apriori-based algorithm for mining frequent substructures from graph data,” PKDD ’00: Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, pp. 13–23, 2000.
[14] M. J. Zaki, “Efficiently mining frequent trees in a forest,” KDD ’02: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, pp. 71–80, 2002.
[15] Y. Chi, Y. Yang, and R. R. Muntz, “Indexing and mining free trees,” ICDM 2003: Proceedings of the International Conference on Data Mining, 2003.
[16] X. Yan and J. Han, “gspan: Graph-based substructure pattern mining,” ICDM ’02: Proceedings of the 2002 IEEE International Conference on Data Mining, p. 721, 2002.
[17] M. Cohen and E. Gudes, “Diagonally subgraphs pattern mining,” Proceedings of the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2004.
[18] X. Yan and J. Han, “Closegraph: Mining closed frequent graph patterns,” KDD ’03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 286–295, 2003.
[19] Y. Chi, Y. Yang, and R. R. Muntz, “Cmtreeminer: Mining both closed and maximal frequent subtrees,” PAKDD ’04: The Eighth Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2004.
[20] J. Huan, W. Wang, J. Prins, and J. Yang, “Spin: Mining maximal frequent subgraphs from graph databases,” Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 581–586, 2004.
[21] T. Berners-Lee, J. Hendler, and O. Lassila, “The semantic web,” Scientific American, May 2001.
[22] D. J. Commette, “Daml+oil language,” http://www.daml.org/2001/03/daml+oil-index.html, March 2001.
[23] D. L. McGuinness and F. van Harmelen, “Owl web ontology language overview,” http://www.w3.org/TR/owl-features/, 2004.
[24] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider, “The description logic handbook: Theory, implementation and applications,” Cambridge University Press, p. 574, 2003.
[25] G. W. Mineau, M. Bernard, and J. F. Sowa, “Conceptual graphs for knowledge representation,” Lecture Notes in Artificial Intelligence 699, Springer-Verlag, Berlin, 1993.
[26] G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, pp. 734–749, June 2005.