Closed patterns and abstraction beyond lattices

Henry Soldano
Université Paris 13, Sorbonne Paris Cité, L.I.P.N UMR-CNRS 7030, F-93430 Villetaneuse, France

Abstract. Recently, pattern mining has investigated closure operators in families of subsets of an attribute set that are not lattices. In particular, various authors have investigated closure operators starting from a context, in the Formal Concept Analysis (FCA) sense, in which objects are described as usual according to their relation to attributes, and in which a closed element is a maximal element of the equivalence class of elements sharing the same support, i.e. occurring in the same objects. The purpose of this paper is twofold. First, we thoroughly investigate this framework and relate it to FCA, defining in particular a structure called a pre-confluence, weaker than a lattice, in which we can define a closure operator with respect to a set of objects. Second, we show that the requirements allowing us to define abstract concept lattices also allow us to define corresponding abstract Galois pre-confluences.

1 Introduction

Until recently, searching for closed motifs or patterns when exploring data was restricted to pattern languages that are lattices. A pattern in some language L is said to be closed whenever it can be obtained by applying a closure operator to some pattern. This subject has been thoroughly explored, from both a mathematical and an algorithmic point of view, in formal concept analysis, Galois analysis and, more recently, data mining. Most of this work considers support-closed patterns. In this case, we also have a set of objects O, and a motif may or may not occur in each object. The support of a motif is then the subset of objects in which the motif occurs. The language is a lattice with respect to a general-to-specific ordering and each object is described by a motif. A motif then occurs in an object whenever the motif is more general than the object description. Motifs that cannot be specialized without losing some object in their support are said to be support-closed. Clearly, there is some redundancy in enumerating all motifs when we are concerned with properties relative to their support, such as frequency, and it is then worthwhile to consider only support-closed motifs. In a lattice, support-closed motifs may be efficiently searched for because there exists a closure operator on the lattice that returns, as the closure f(t) of some pattern t, the unique support-closed pattern sharing the same support as t. The most investigated pattern language is the power set 2^X of some attribute set X, ordered by set-theoretic inclusion. Formal Concept Analysis [1] as well as Galois analysis [2,3] rely on the relation between objects and attributes. In data mining, these ideas have been investigated under the name of itemset mining and rely on the same relation [4].
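The support closure on a power set 2^X described above can be illustrated with a minimal Python sketch. The attribute set `X`, the object descriptions `OBJECTS` and the function names are hypothetical and chosen only for illustration; the closure of an itemset is computed as the intersection of the descriptions of the objects in its support.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]

# Hypothetical toy data: the attribute set X and three object descriptions.
X: Itemset = frozenset("abcd")
OBJECTS: List[Itemset] = [frozenset("ab"), frozenset("abc"), frozenset("abcd")]


def ext(pattern: Itemset) -> FrozenSet[int]:
    """Support of a pattern: indices of the objects whose description contains it."""
    return frozenset(i for i, d in enumerate(OBJECTS) if pattern <= d)


def closure(pattern: Itemset) -> Itemset:
    """Greatest itemset with the same support as the pattern."""
    result = X  # the meet of an empty family is the top element X
    for i in ext(pattern):
        result = result & OBJECTS[i]  # intersect the supporting descriptions
    return result


if __name__ == "__main__":
    for word in ["a", "c", "d", "abd"]:
        t = frozenset(word)
        print(word, "->", "".join(sorted(closure(t))), "support", sorted(ext(t)))
```

In this toy dataset the itemsets a and b both occur in every object, so each of them closes to ab, the single support-closed representative of their equivalence class.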

Recently, pattern mining has gone beyond this general framework in two directions. First, various mining problems have been investigated that come down to searching for closed motifs which cannot be considered, strictly speaking, as support-closed motifs, such as convex hulls of subsets of a given set of points, or sequential motifs with wildcards. Solving the problem then means defining and building the corresponding closure operator [5]. To characterize such closure operators, the authors make use of a well-known theorem stating that in a finite lattice T there is a one-to-one correspondence between the families closed under the meet operator and the closure operators on T.

Second, various mining problems have been addressed in which the pattern language is not a lattice, in particular problems where closed motifs are support-closed motifs with respect to some dataset of objects. A framework has been proposed for that purpose in which the language is a family F included in a host lattice 2^X. The pair (F, 2^X) is called a set system. For instance, consider the set of the subgraphs generated by a subset of the set X of the edges of a given graph G = (V, X). Such a subgraph can be represented as a subset of X, and therefore searching for support-closed patterns can be performed as a standard 2^X lattice mining problem. However, if we want to consider as a language the family F of connected subgraphs of G, then F is not a lattice¹. Still, there is a closure operator that relates a connected subgraph to a support-closed connected subgraph. This means that we can use the same kind of algorithm as in the lattice mining case: specialize a closed pattern, compute the support of the new pattern and close it. In their paper [6], M. Boley and coauthors state in particular the necessary and sufficient conditions that the family F of a set system (F, X) has to fulfill in order to guarantee that, whatever the dataset O of objects is², there exists a closure operator to compute support-closed patterns. These conditions on set systems define the property of confluence, which requires a kind of local union closure on F: given three non-empty elements t, t1, t2 of F, if t1 and t2 are greater than or equal to t then necessarily t1 ∪ t2 belongs to F. This condition is clearly satisfied by the family of connected subgraphs in the representation mentioned above: consider two connected subgraphs represented by their edge sets t1 and t2, each including an edge set t; as the subgraph generated by t is connected and non-empty, t1 ∪ t2 also generates a connected subgraph and therefore belongs to F.

Our contribution concerns the two directions mentioned above. First, we state sufficient conditions to obtain closed patterns for structures weaker than lattices. This extends the theorem on finite lattices mentioned above to finite partial orders in which there exists a local meet operator, and that we call pre-confluences. We then obtain that the set f[F] of closed elements of a pre-confluence F is also a pre-confluence. The main condition requires that, given three elements t, t1, t2 of F, if t1 and t2 belong to the upset ↑t then there exists a greatest lower bound of t1 and t2 in the upset ↑t of F. This local meet element is denoted by t1 ∧_t t2. In the case of set systems, the family F = {a, b, abc, abd, abcd} is a pre-confluent family. Here we have that abc ∧_a abd = a and abc ∧_b abd = b, i.e. there are two maximal lower bounds of abc and abd in F because ab does not belong to F. The left part of Figure 1 represents a pre-confluent family F where a, b, c, d are the edges of a graph.

¹ The intersection of two such connected subgraphs is not necessarily connected.
² With some mild restriction that we discuss further.

Second, we show that when adding to the pre-confluence property some condition on the elements of a set of objects O, there exists a closure operator returning the support-closed elements of F with respect to O. When requiring the existence of such a closure operator for any database O whose objects are represented as elements of a lattice T ⊇ F, we need a property, denoted by confluence*, which is stronger than confluence on set systems and whose scope is extended to any host lattice T. The property requires that the local join condition be satisfied even when t is the bottom element of T and belongs to F. The pre-confluence on the left part of Figure 1 is a confluence* hosted by the whole set of subgraphs generated by subsets of {a, b, c, d}.

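[Figure 1, three panels. Left: the family F of connected subgraphs. Middle: the objects o1, o2, o3. Right: the support-closed pre-confluence f[F], whose elements are labelled with the extensions {o1, o2, o3}, {o2, o3} and {o3}.]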

Fig. 1. The diagram on the left represents a family F of connected subgraphs, each generated by a subset (represented by a word) of the edges {a, b, c, d} of the original graph. The subgraphs generated by a and by b are the minimal elements. F is a pre-confluent family in which, for instance, {abc, abd} has two maximal lower bounds: a = abc ∧_a abd, greater than or equal to a, and b = abc ∧_b abd, greater than or equal to b. The diagram on the right represents the support-closed pre-confluence f[F] (see Section 3.2) with respect to the set of subgraphs O = {o1, o2, o3} represented in the middle part of the figure. The closed patterns abc and abcd represent the greatest connected subgraphs whose extensions are respectively {o2, o3} and {o3}. The thick box around the closed patterns a and b indicates that both patterns have the same extension {o1, o2, o3}. Elements a and b are the closed patterns of the bottom elements of the projected concept lattices built respectively from (F^a, O^a) and (F^b, O^b), and represented respectively as the up sets f[F]^a and f[F]^b.

We can also observe in Figure 1 that for any element x of the pre-confluence F, the upset ↑x is a lattice. This is a general and straightforward result that allows us to link closure operators on pre-confluences to closure operators on lattices, and therefore to relate FCA to the analysis of support-closed patterns in pre-confluences. Finally, a last contribution consists in noticing that when F is a pre-confluence, by applying an interior operator to the extensional space 2^O, therefore obtaining abstract supports, we can build an abstract support closure operator. This means that we extend abstract Galois lattices, such as alpha Galois lattices [7], to abstract Galois pre-confluences.

2 Closure subsets of a partial order

We are interested here in closed elements of an ordered set. When this ordered set refers to a language for pattern mining, we call patterns the elements of the ordered set.

2.1 Preliminaries

We first recall the definitions of closure and dual closure operators:

Definition 1 Let E be an ordered set and f : E → E be a map that is monotone, i.e. x ≤ y ⟹ f(x) ≤ f(y) for any x, y ∈ E, and idempotent, i.e. f(f(x)) = f(x). Then:
– if f is extensive, i.e. f(x) ≥ x, f is called a closure operator;
– if f is intensive, i.e. f(x) ≤ x, f is called a dual closure operator, an interior operator, or a projection.
In the first case, an element x such that x = f(x) is called a closed element.

We define hereunder a closure subset of an ordered set E as the range f[E] of a closure operator on E. We then give a necessary and sufficient condition for a subset of E to be a closure subset. This condition answers the general question of which subsets of some pattern language are sets of closed patterns. The set of upper bounds of some element x in E is denoted as the up set ↑x = {y | y ≥ x}, also denoted as E^x when more than one partial order is concerned. In the same way, the set of lower bounds of x is denoted as the down set ↓x = {y | y ≤ x}, also denoted as E_x.

Definition 2 (T.S. Blyth [8]) A subset C of an ordered set E is called a closure subset if there is a closure f : E → E such that C = f[E].

Proposition 1 (T.S. Blyth [8]) A subset C of an ordered set E is a closure subset of E if and only if for every x ∈ E the set ↑x ∩ C has a bottom element x*. The closure f : E → E is then unique and defined as f(x) = x*.

However, this property gives no direct information on which pattern languages contain closed patterns, nor on the conditions under which closure operators exist. Such direct information is provided by a well-known result on closure subsets of complete ∧-semilattices [1]. This result states that in such a pattern language, the closure subsets are the subsets closed under the meet operator ∧. When the language is the power set of some set X, the meet operator is simply the intersection operator ∩.

Proposition 2 Let T be a lattice. A subset C of T is a closure subset if and only if C is closed under meet. The closure f : T → T is then unique and defined as f(x) = ⋀_{c ∈ C ∩ ↑x} c, and C is a lattice.
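Proposition 2 can be illustrated on a power set, where the meet is the intersection. The following Python sketch is only an illustration under stated assumptions: the families `C_GOOD` and `C_BAD` and the function names are hypothetical, and closure under arbitrary meets is checked through the top element (the empty meet) and pairwise intersections, which is sufficient for a finite family.

```python
from itertools import combinations
from typing import FrozenSet, Set

Pattern = FrozenSet[str]


def is_meet_closed(C: Set[Pattern], top: Pattern) -> bool:
    """True if C contains the top element (empty meet) and every pairwise intersection."""
    return top in C and all((c1 & c2) in C for c1, c2 in combinations(C, 2))


def closure_from_family(x: Pattern, C: Set[Pattern], top: Pattern) -> Pattern:
    """f(x): intersection of all members of C that contain x."""
    result = top
    for c in C:
        if x <= c:
            result = result & c
    return result


# Hypothetical families in the power set of {a, b, c}.
TOP = frozenset("abc")
C_GOOD = {frozenset("abc"), frozenset("ab"), frozenset("a")}
C_BAD = {frozenset("abc"), frozenset("ab"), frozenset("ac")}  # ab ∩ ac = a is missing

if __name__ == "__main__":
    print(is_meet_closed(C_GOOD, TOP), is_meet_closed(C_BAD, TOP))
    print(sorted(closure_from_family(frozenset("b"), C_GOOD, TOP)))
```

With `C_BAD`, the intersection of the members above a is a itself, which does not belong to the family: ↑a ∩ `C_BAD` has no bottom element, so `C_BAD` is not a closure subset.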

All ordered sets considered here are finite, and as all lattices are finite lattices they are also complete lattices: any subset of a lattice T then has a meet and a join. Note that when saying that C is closed under meets, we intend here that the meet of ∅ also belongs to C. Therefore ⊤ = ⋀_{c ∈ ∅} c belongs to C. We will also further need the dual proposition, which states that a subset A of T is a dual closure subset, also called an abstraction, whenever A is closed under joins. The interior operator p : T → T is then defined as p(x) = ⋁_{a ∈ A ∩ ↓x} a, A is a lattice and ⊥ belongs to A. In particular, when T is a powerset 2^K, p(x) = ⋃_{a ∈ A, a ⊆ x} a. We are now interested in pre-confluences, which are structures weaker than lattices.

2.2 Closure subsets in pre-confluences

Definition 3 Let F be an ordered set such that for any t ∈ F, ↑t is a ∧-semilattice and has a top element. F is called a pre-confluence, x ∧_t y is a local infimum or local meet, and ⊤_t a local top.

Lemma 1 Let F be a pre-confluence. Then for any t in F and x, y ∈ F ∩ ↑t:
1. ↑t is a lattice whose join, denoted as x ∨_F y, is the least element of ↑x ∩ ↑y.
2. If t′ ≥ t then ↑t′ is a sublattice of ↑t.

Proof
1. As F is a pre-confluence, ↑t is a finite ∧-semilattice (with meet x ∧_t y) and has a top element (⊤_t). As a consequence of a well-known result of lattice theory, ↑t is a lattice. The join x ∨_t y is the least upper bound of {x, y} in ↑t, i.e. the least element of ↑t ∩ ↑x ∩ ↑y, which is also ↑x ∩ ↑y, as both x and y are greater than or equal to t. As it does not depend on t, we simply denote it as x ∨_F y.
2. For any t′ ≥ t and x, y in ↑t′, x and y also belong to ↑t. As a consequence, x ∧_{t′} y is also a lower bound of {x, y} in ↑t, and therefore t′ ≤ x ∧_{t′} y ≤ x ∧_t y. But this means that x ∧_t y belongs to ↑t′ and therefore is also smaller than or equal to x ∧_{t′} y. As a consequence, we have that x ∧_{t′} y = x ∧_t y. As ↑t′ has the same meet and join as ↑t, it is a sublattice of ↑t. □

Furthermore, we only need the minimal elements of F to check whether F is a pre-confluence: whenever there is a local meet and a local top on the up set of each minimal element, there is also a local meet and a top element in the up set of any element of F.

Lemma 2 F is a pre-confluence if and only if for any m ∈ min(F), ↑m is a ∧-semilattice and has a top element.

Proof Let M = min(F). If F is a pre-confluence, as M ⊆ F, obviously all ↑m are ∧-semilattices and have a top element. Now suppose that all elements m of M are such that ↑m is a ∧-semilattice and has a top element. Consider some t ≥ m and two elements t1, t2 ∈ ↑t; we then have that t1, t2 ∈ ↑m. We know that t1 ∧_m t2 is the greatest lower bound of {t1, t2} in ↑m, and as t is a lower bound of {t1, t2} and t ∈ ↑m, we have that t1 ∧_m t2 ∈ ↑t. As a consequence, t1 ∧_m t2 is also the greatest lower bound of {t1, t2} in ↑t, so t1 ∧_t t2 exists, and this means that ↑t is a ∧-semilattice. Furthermore, ⊤_m also belongs to ↑t and therefore ↑t also has a greatest element. As for any t ∈ F there exists some m ∈ M such that t ≥ m, F is a pre-confluence. □

Definition 4 A subset C of a pre-confluence F is called closed under local meet whenever for any element t and any C′ ⊆ C ∩ ↑t we have that

    ⋀^t_{c ∈ C′} c belongs to C.

This means in particular that ⊤_t = ⋀^t_{c ∈ ∅} c belongs to any subset which is closed under local meet, and then, by definition, C is also a pre-confluence. The following theorem extends Proposition 2 to pre-confluences:

Theorem 1 Let F be a pre-confluence. A subset C of F is a closure subset if and only if C is closed under local meet. The closure f : F → F is then defined as f(t) = ⋀^t_{c ∈ C ∩ ↑t} c, and C = f[F] is a pre-confluence.

Proof We use Proposition 2 and the fact that ↑t in a pre-confluence is a lattice.
– ⇒ C is a closure subset of F means that there exists a closure operator f : F → F such that f[F] = C. As F is a pre-confluence, for any t ∈ F, ↑t is a lattice with meet operator ∧_t; let C^t = ↑t ∩ C. Furthermore, for any x ∈ ↑t, we have that f(x) ∈ ↑t (extensivity of f). We can then define f_t : ↑t → ↑t such that for any x ∈ ↑t, f_t(x) = f(x). It is straightforward that f_t is a closure on ↑t, as f is a closure on F. As a result, from Proposition 2 we have that C^t = f_t[↑t] is closed under the meet operator ∧_t of ↑t. But, as this is true for any t in F, this also means that C = ⋃_{t ∈ F} C^t is by definition closed under local meet.
– ⇐ Let C be a subset of F closed under local meet, and let, for any t in F, C^t = ↑t ∩ C. By hypothesis, for any x, y ∈ C^t, x ∧_t y belongs to C, and as x ∧_t y is the greatest lower bound of x and y in ↑t, we have that x ∧_t y belongs to C^t. This means that C^t is a subset of the lattice ↑t and is closed under the meet operator. As a result of Proposition 2, there exists a closure f_t : ↑t → ↑t such that for any x ∈ ↑t, f_t(x) = ⋀^t_{c ∈ ↑x ∩ C^t} c. Furthermore, as x ∈ ↑t, we have that ↑x ∩ C^t = ↑x ∩ C and therefore f_t(x) = ⋀^t_{c ∈ ↑x ∩ C} c, and also, as ↑x is a sublattice of ↑t, f_t(x) = f_x(x) = ⋀^x_{c ∈ ↑x ∩ C} c. Let us then define f : F → F as f(x) = f_x(x). It is straightforward that f is a closure:
  • f(x) = f_t(x) for any t ≤ x; therefore, as f_t is a closure, f_t(x) ≥ x. As there always exists such a t, f(x) ≥ x.
  • If x ≥ y, we have some t such that x, y ∈ ↑t; therefore f(x) = f_t(x) and f(y) = f_t(y), and therefore f(x) ≥ f(y).
  • We have that f(x) ≥ x and there is some t in F such that f(x) and x both belong to ↑t; therefore f(f(x)) = f_t(f_t(x)) = f_t(x) = f(x).

□

In summary, we have a generalization of the meet operator, which is the basis of most work on closed patterns in data mining, as well as of all work on formal concept analysis. This generalization, called the local meet operator, ensures the existence of closure operators whose ranges are the subsets closed with respect to the local meet operator. Whenever we consider a pre-confluence F as a subset of a finite powerset 2^X, we also call F a pre-confluent family. A typical example of such a structure is the set of subgraphs generated by the vertices (or edges) of a given graph. We consider here the family F = {a, b, abc, abd, abcd}, whose diagram is represented in the leftmost part of Figure 1. Here we have that abc ∧_a abd = a and abc ∧_b abd = b, i.e. there are two maximal lower bounds of abc and abd in F because ab does not belong to F. Note that the up sets F^a and F^b are lattices and share the same join operator, which in this case is the union operator.
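The family F = {a, b, abc, abd, abcd} can be checked mechanically. The Python sketch below is a hedged illustration, not the paper's algorithm: it represents patterns as frozensets ordered by inclusion, tests the pre-confluence property on the up sets of the minimal elements only (as Lemma 2 allows), and computes a closure from a candidate closure subset `C` as the least element of `C` above t (Proposition 1). All function names and the choice of `C` are assumptions made for the example.

```python
from typing import FrozenSet, Iterable, Optional, Set

Pattern = FrozenSet[str]


def P(word: str) -> Pattern:
    """Build a pattern from a word such as 'abc'."""
    return frozenset(word)


def up(t: Pattern, F: Iterable[Pattern]) -> Set[Pattern]:
    """Up set of t in F."""
    return {x for x in F if t <= x}


def greatest(S: Set[Pattern]) -> Optional[Pattern]:
    """Element of S above all others, or None if there is none."""
    for g in S:
        if all(s <= g for s in S):
            return g
    return None


def least(S: Set[Pattern]) -> Optional[Pattern]:
    """Element of S below all others, or None if there is none."""
    for low in S:
        if all(low <= s for s in S):
            return low
    return None


def local_meet(t: Pattern, x: Pattern, y: Pattern, F: Set[Pattern]) -> Optional[Pattern]:
    """Greatest lower bound of x and y inside the up set of t, or None."""
    lower = {z for z in up(t, F) if z <= x and z <= y}
    return greatest(lower)


def is_pre_confluence(F: Set[Pattern]) -> bool:
    """Lemma 2: check local tops and local meets above the minimal elements only."""
    minimal = [m for m in F if not any(z < m for z in F)]
    for m in minimal:
        U = up(m, F)
        if greatest(U) is None:
            return False
        if any(local_meet(m, x, y, F) is None for x in U for y in U):
            return False
    return True


def closure(t: Pattern, C: Set[Pattern]) -> Optional[Pattern]:
    """Proposition 1: f(t) is the least element of C above t, when it exists."""
    return least(up(t, C))


F = {P("a"), P("b"), P("abc"), P("abd"), P("abcd")}
C = {P("a"), P("b"), P("abc"), P("abcd")}  # hypothetical closure subset of F

if __name__ == "__main__":
    print(is_pre_confluence(F))
    print(sorted(local_meet(P("a"), P("abc"), P("abd"), F)))
    print(sorted(closure(P("abd"), C)))
```

The two calls `local_meet(P("a"), ...)` and `local_meet(P("b"), ...)` return a and b respectively, which reflects the two maximal lower bounds of abc and abd discussed above.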

3 Support closed patterns with respect to a set of objects

3.1 Support closures in lattices

The standard case in which closed patterns are searched for is when the language is a lattice and the closure of a pattern relies on the occurrences of the pattern in a set of objects. In data mining, the set of occurrences is known as the support of the pattern, whereas in Formal Concept Analysis the set of occurrences defines the extension of the pattern and the extent of the corresponding concept.

Definition 5 Let F be a partial order and O a set of objects. A relation of occurrence on F × O is such that if t1 ≥ t2 and t1 occurs in o, then t2 occurs in o. The extension of t in O is defined as ext(t) = {o ∈ O | t occurs in o}. The cover of o is defined as the part of F whose elements occur in the object o, i.e. S(o) = {t ∈ F | t occurs in o}. The cover of a subset e of objects is defined as the part of F whose elements occur in all objects of e, i.e. S(e) = ⋂_{o ∈ e} S(o).

We will hereafter say indifferently that t belongs to the cover of o, or that t occurs in o. The intuition here is that the order is a specificity order, and whenever a pattern occurs in some object o, a more general pattern will also occur in o. This also rewrites as t1 ≥ t2 ⇒ ext(t1) ⊆ ext(t2). When F is a lattice, the interesting case is the one in which objects can be described as elements of F:

Proposition 3 Let T be a lattice and O a set of objects. If for any object o the cover of o has a greatest element d(o), called the description of o in T, then, for any subset e of O,

    int(e) = ⋀_{o ∈ e} d(o)

is the greatest element of the cover S(e) of e, called the intension of e, and (int, ext) is a Galois connection on (2^O, T).

Proposition 4 int ∘ ext and ext ∘ int are closure operators on T and 2^O respectively, and the corresponding sets of closed elements are anti-isomorphic³ lattices whose related pairs (t, e) form a lattice called a Galois lattice.

In FCA, the lattice is a powerset 2^X of attributes, the description of an object i is the subset of attributes in relation with i, the table of this relation is a formal context, and the Galois lattice formed by pairs of corresponding closed elements in 2^X and 2^O, ordered following 2^O, is called a concept lattice. Note that any Galois connection between two lattices may be rewritten as a connection between two powersets, and therefore there is no strict gain in expressive power in the more general setting. However, the direct formulation as sets of closed elements of the lattice T is often useful [9,10,3]. Proposition 3 follows from, for instance, Theorem 2 in [3].

Projected or abstract Galois lattices have recently been defined by noticing that when applying an interior (or projection) operator on T [10,11] or on 2^O (or both) [11,7], when there exists a Galois connection between them, we again obtain closure operators and lattices of closure subsets. Because of the one-to-one correspondence between projections (dual closures) and abstractions (subsets closed under joins), the corresponding projected Galois lattices are also called abstract Galois lattices [12].

Proposition 5 Let (int, ext) be a Galois connection on (2^O, T).
– Let p be an interior operator on T; then (p ∘ int, ext) defines a Galois connection on (2^O, p(T)).
– Let p be an interior operator on 2^O; then (int, p ∘ ext) defines a Galois connection on (p(2^O), T).
In both cases the closure subsets are anti-isomorphic and form a Galois lattice, called respectively an intensional and an extensional abstract Galois lattice.

In the conditions of Proposition 3, when considering two elements as equivalent whenever they share the same extension with respect to O, a closed element f(t) = int ∘ ext(t) is the greatest element of the equivalence class associated to ext(t). More generally, in data mining extensions are called supports, and an element x of a pattern language is said to be support-closed with respect to O whenever for any element y > x we have that ext(y) ⊂ ext(x) [6]. In other words, a support-closed element x is a maximal element of the equivalence class associated to its support ext(x). The previous proposition says that, when its conditions are satisfied, support-closed elements are obtained using a support closure operator and there is exactly one such support-closed element in each equivalence class.

³ i.e. isomorphic to the dual of f[T].
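The extensional case of Proposition 5 can be sketched in Python on a power set 2^X. Everything below is an assumption made for illustration: the descriptions, the partition `GROUPS` of the objects, and the particular interior operator on 2^O, which keeps only the groups entirely included in an extension (the abstraction is then the family of unions of groups, which is closed under union). It is not claimed to be the operator used for alpha Galois lattices in [7].

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]
ObjectSet = FrozenSet[int]

# Hypothetical data: attribute set, object descriptions, and a partition of the objects.
X: Itemset = frozenset("abcd")
DESCRIPTIONS: List[Itemset] = [
    frozenset("ab"), frozenset("abc"), frozenset("abcd"), frozenset("bd"),
]
GROUPS: List[ObjectSet] = [frozenset({0, 1}), frozenset({2, 3})]


def ext(t: Itemset) -> ObjectSet:
    """Extension of t: indices of the objects whose description contains t."""
    return frozenset(i for i, d in enumerate(DESCRIPTIONS) if t <= d)


def intent(e: ObjectSet) -> Itemset:
    """Intension of e: intersection of the descriptions (X for the empty set)."""
    result = X
    for i in e:
        result = result & DESCRIPTIONS[i]
    return result


def interior(e: ObjectSet) -> ObjectSet:
    """p(e): union of the groups entirely included in e."""
    kept: ObjectSet = frozenset()
    for g in GROUPS:
        if g <= e:
            kept = kept | g
    return kept


def abstract_closure(t: Itemset) -> Itemset:
    """Closure int ∘ p ∘ ext obtained from the abstract extension of t."""
    return intent(interior(ext(t)))


if __name__ == "__main__":
    for word in ["a", "b", "c", "d"]:
        t = frozenset(word)
        print(word, "".join(sorted(intent(ext(t)))), "".join(sorted(abstract_closure(t))))
```

In this toy setting the pattern c occurs in objects 1 and 2, which cover no complete group, so its abstract extension is empty and its abstract closure is the top element abcd, whereas its standard closure is abc.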

3.2 Support closures in pre-confluences

We now discuss under which conditions support closures exist in pre-confluences. We will further denote the up set ↑t as F^t. First, we benefit from the fact that up sets of a pre-confluence are lattices in this straightforward corollary of Proposition 3:

Lemma 3 Let F be a pre-confluence, O be a set of objects, and consider O^t = ext(t). If, for any object o and any element t of F that occurs in o, S(o) ∩ F^t has a greatest element d_t(o), then, for any subset e of O^t,

    int_t(e) = ⋀^t_{o ∈ e} d_t(o)

is the greatest element of the cover of e in F^t, (int_t, ext) is a Galois connection on (2^{O^t}, F^t), and int_t ∘ ext is the support closure operator on F^t with respect to O.

We will further call d_t(o) the local description of o with respect to t. We then have the following theorem:

Theorem 2 Let F be a pre-confluence and O a set of objects. If for any object o and any element t of F that occurs in o, o has a local description d_t(o), then for any subset e of O and any t belonging to the cover S(e):
– int_t(e) is the greatest element of S(e) ∩ F^t;
– f defined by f(t) = int_t ∘ ext(t) is a support closure on F with respect to O.
The pairs (f(t), ext(t)) form a pre-confluence isomorphic to f[F], called a Galois pre-confluence.

Proof For any t in F that occurs in e, we have that the cover of e in F^t is S(e) ∩ F^t and, as a consequence, int_t ∘ ext(t) is the greatest element of S(e) ∩ F^t. Moreover, a consequence of Lemma 3 is that for any t in F, ext ∘ int_t(ext(t)) = ext(t). The closed element f(t) is the greatest element greater than or equal to t and sharing the same extension as t. This means that the support-closed elements of F form the closure subset f[F], f is a support closure operator and, by Theorem 1, f[F] is a pre-confluence. □

This means that the interesting case regarding pre-confluences is the one in which each object o has a local description with respect to any t that occurs in o. This also means that the subset F(e) of elements whose extension is some e is partitioned in such a way that each part has a greatest element t_m and contains the elements of F(e) smaller than t_m, and that t_m = f(t) for all these elements. The following lemma shows that pre-confluences generalize lattices and that Theorem 2 generalizes Proposition 3:

Lemma 4 Whenever a pre-confluence F has a bottom element ⊥_F, then:
1. F is a lattice.
2. If the occurrence relation is such that ⊥_F occurs in all objects of O, and if for any object o and any t ∈ S(o), o has a local description d_t(o) with respect to t, then o has a description d(o).

Proof Regarding (1), this is straightforward, as any ↑t is a lattice and therefore ↑⊥_F = F is also a lattice. Regarding (2), first remark that as ⊥_F belongs to every S(o), S(o) is non-empty; then again, as ↑⊥_F = F, we have ↑⊥_F ∩ S(o) = S(o) and therefore S(o) has a greatest element. □

In the next section we connect these structures to Formal Concept Analysis.

4 Galois pre-confluences as union of Galois lattices

We now consider the standard case of a lattice T in which each object o of O has a description d(o) in T, and we further consider that any element of T can be such a description. We are then interested in which subsets F of T have support closures with respect to any O. We connect here to the seminal result of M. Boley and co-authors [6] on confluent systems. To avoid confusion, up sets and down sets of T starting from an element x will be denoted respectively as T^x and T_x, whereas the notations F^t and F_t will be used for the up sets and down sets of the subset F. We will first need a lemma to characterize how an object, as an element x of T, can be represented in F; then we add a condition to pre-confluences to obtain confluences*, on which support closure operators are defined whatever the object set O is.

Lemma 5 Let F be a subset of a lattice T. If for any t ∈ F and any x ∈ T^t there exists a greatest element p_t(x) in F^t ∩ T_x, then the mapping p_t : T^t → T^t is an interior operator on T^t and p_t(T^t) = F^t.

Proof
1. p_t(x) ≤ x? The hypothesis ensures that p_t(x) belongs to T_x and therefore p_t is intensive.
2. x ≤ y ⇒ p_t(x) ≤ p_t(y)? We have that T_x is included in T_y, and therefore F^t ∩ T_x is included in F^t ∩ T_y, and the greatest element of F^t ∩ T_y is greater than or equal to the greatest element of F^t ∩ T_x. Therefore p_t(x) ≤ p_t(y).
3. p_t(p_t(x)) = p_t(x)? First note that p_t(x) ∈ F^t and therefore p_t(x) ∈ T^t. This means that p_t(p_t(x)) is defined. Let q = p_t(x); then p_t(q) = max(F^t ∩ T_q). But as q belongs to F^t and by definition q = max T_q, we conclude that q is the greatest element of F^t ∩ T_q, i.e. p_t(q) = q.
By definition of p_t, p_t(T^t) ⊆ F^t. Furthermore, we have seen that for any q ∈ F^t we have q = p_t(q), and therefore p_t(T^t) ⊇ F^t. As a consequence, p_t(T^t) = F^t. □

p_t(x) is the local description of x in F^t.

Proposition 6 Let F be a subset of a lattice T. The following properties are equivalent:
1. For any t ∈ F and any x ∈ T^t, there exists a greatest element p_t(x) in F^t ∩ T_x.
2. For any x, y, t in F such that x and y belong to F^t, x ∨ y belongs to F.
3. F is a pre-confluence with join ∨_F = ∨.
F is then called a confluence* on T and we have that p_t(x) = ⋁_{q ∈ F^t ∩ T_x} q.

Proof 1 implies 2, as T^t is a lattice and p_t is a projection on T^t, and therefore F^t = p_t(T^t) is closed under join (as projections are duals of closure operators and closure subsets of lattices are closed under meet). 2 implies 3, as for any t ∈ F we have that F^t is closed under join and has by definition t as its least element, and therefore F^t is a lattice and has a meet operator ∧_t. Finally, 3 implies 1, as when considering two maximal elements q and q′ in F^t ∩ T_x we have that q ∨ q′ is in F and therefore in F^t, and is also in T_x, and as a result we have that q = q′. The definition of the projection p_t on the lattice F^t as a join is a consequence of the dual result of Proposition 2. □

Theorem 3 Let F be a confluence* of a lattice T, O be a set of objects described as elements of T, and p_t denote the local description operators on F. Then

    f(t) = p_t ∘ int ∘ ext(t),

where (int, ext) is a Galois connection on (T, O), is a support closure operator on F with respect to O.

Proof As F is a confluence*, from Proposition 6 we deduce that the conditions of Proposition 2 are satisfied. Furthermore, recall that p_t is a projection on T^t and, by Lemma 5, that p_t(T^t) = F^t; following Proposition 5, we then have that p_t ∘ int ∘ ext is the support closure operator on F^t with respect to O^t. □

Conversely, in order to guarantee that such a support closure operator exists for any set of objects O described in T, a subset of T has to be a confluence*:

Proposition 7 Let F be a subset of the lattice T. The support closure operator on F with respect to any set O whose objects are described as elements of T exists if and only if F is a confluence*.
Proof We only need here to show that whenever F is not a confluence*, there is always an object set O such that the support closure operator does not exist. Recall that F not being a confluence* means that there is some t in F and some x, y in F^t such that x ∨ y does not belong to F (and so x ≠ y). Consider then O = {x ∨ y}. We have that S({x ∨ y}) ∩ F^t has both x and y as maximal elements, as any element that occurs in x ∨ y has to be smaller than or equal to x ∨ y and cannot be greater than both x and y, because it would then be greater than or equal to x ∨ y, the least upper bound of x and y. Now, a support closure operator f should be such that f(x) = x and f(y) = y, as they are both maximal elements of the cover of x ∨ y. Furthermore, consider that t is one of the maximal lower bounds of x and y (if not, we can replace t by one such element, and still have that F is not a confluence*). Then the support closure f(t) of t should be either x or y, as t is not a maximal element of S(x ∨ y). But, whatever the choice is, f is then not monotone and therefore f cannot be a closure operator. □
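The argument of this proof can be replayed on a small family in T = 2^X, where the join is the union. The Python sketch below is an illustration under assumptions: the families `F_BAD` and `F_GOOD`, the object sets and the function names are hypothetical. It tests condition 2 of Proposition 6 and, for the object set O = {x ∪ y} used in the proof, lists the maximal elements above t that share the support of t.

```python
from typing import FrozenSet, List, Set

Pattern = FrozenSet[str]


def P(word: str) -> Pattern:
    """Build a pattern from a word such as 'abc'."""
    return frozenset(word)


def is_confluence_star(F: Set[Pattern]) -> bool:
    """Condition 2 of Proposition 6 for T = 2^X: x ∪ y is in F whenever
    x and y are both above some t of F."""
    return all(
        (x | y) in F
        for t in F for x in F for y in F
        if t <= x and t <= y
    )


def maximal_same_support(t: Pattern, F: Set[Pattern], objects: List[Pattern]) -> Set[Pattern]:
    """Maximal elements of F above t that occur in exactly the same objects as t."""
    def ext(p: Pattern) -> FrozenSet[int]:
        return frozenset(i for i, d in enumerate(objects) if p <= d)

    same = [q for q in F if t <= q and ext(q) == ext(t)]
    return {q for q in same if not any(q < r for r in same)}


F_BAD = {P("a"), P("ab"), P("ac")}           # ab ∪ ac = abc is missing
F_GOOD = {P("a"), P("b"), P("abc"), P("abd"), P("abcd")}

if __name__ == "__main__":
    print(is_confluence_star(F_BAD), is_confluence_star(F_GOOD))
    # Object set O = {x ∪ y} with x = ab, y = ac, as in the proof.
    print(maximal_same_support(P("a"), F_BAD, [P("abc")]))
```

For `F_BAD`, the pattern a has two maximal same-support elements above it, ab and ac, so no single support-closed element can be assigned to a without breaking monotonicity.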

In [6], the lattice T is a powerset 2^X and a confluent system S is similar to the latter definition of confluences*, except that ⊥ = ∅ belongs to S but x ∪ y is only required to belong to S when x ⊇ t and y ⊇ t for some t ≠ ∅. Proposition 7 is a straightforward adaptation of the theorem of [6], which considers the case where T = 2^X, uses confluent systems in place of confluences*, and prohibits any attribute common to all objects in O. A useful result is the following:

Proposition 8 If F is a confluence*, q ≤ t, and x ∈ T^t, then p_t(x) = p_q(x).

Proof By definition, p_t(x) (resp. p_q(x)) is the greatest element of F^t ∩ T_x (resp. F^q ∩ T_x). As F^t ⊆ F^q, we also have that F^t ∩ T_x ⊆ F^q ∩ T_x. As both sets have greatest elements, the greatest element of F^t ∩ T_x is also the greatest element of F^q ∩ T_x. □

This means that to compute the support closure of some t we only need p_m where m ∈ min(F). Implicitly, this also means that whenever t is greater than two minimal elements m and m′, then p_m(t) = p_{m′}(t). For instance, in our example of connected subgraphs generated by edges of some graph, the minimal elements are the edges. As a consequence, connected subgraphs under some edge e are simply obtained by projecting subgraphs containing e on their connected component containing e. To summarize, first, the support closure set f[F] of a confluence* F on some lattice T forms a pre-confluence of T, and second, we only need the minimal elements of F and their associated interior operators to characterize the pre-confluence f[F]. When considering T = 2^X, T^t is 2^{X\t} and p_t is an interior operator on 2^{X\t}.

4.1 Implications

Another question regards the definition and construction of an implication basis whose implications have both left part and right part in $F$. An implication $p \to q$ holds on $F$ whenever $\mathrm{ext}(p) \subseteq \mathrm{ext}(q)$, and a basis of such implications is typically made of implications such that both $p$ and $q$ belong to the same equivalence class, i.e. $\mathrm{ext}(p) = \mathrm{ext}(q)$. Whenever $F$ is a lattice, the nodes of the concept lattice represent these equivalence classes and $q$ is a closed pattern, i.e. the greatest element of the class, and therefore we have $p \leq q$. As an example, the min-max basis is made of the implications $p \to q$ where $p \neq q$ and $p$ is a minimal element of the class of $q$ [13]. Whenever $F$ is a confluence*, we have seen that each such equivalence class is associated to several closed patterns $q_1, \ldots, q_m$, each being the greatest element of a subclass. We then have in the basis both implications of the form $p_i \to q_i$, where $p_i \leq q_i$ and both belong to subclass $i$, and implications of the form $p_j \to q_i$, where $j \neq i$ and therefore $p_j$ and $q_i$ are unordered. We extend the idea of the min-max basis to confluences* as follows:

Definition 6 Let $F$ be a confluence*, and $F(e) = \{t \in F \mid \mathrm{ext}(t) = e\}$. The min-max basis $B = B_i \cup B_e$ of implications in $F$ is defined as the set $\{p \to q \mid \mathrm{ext}(p) = \mathrm{ext}(q) = e,\ p \neq q,\ p \in \min(F(e)),\ q \in f[F(e)]\}$. The internal sub-basis $B_i$ is made of the implications of the form $p_i \to q_i$ where $p_i \leq q_i$, and the external sub-basis $B_e$ is made of the implications of the form $p_j \to q_i$ where $p_j$ and $q_i$ are unordered.

There are other implication bases, such as the minimal Guigues-Duquenne basis [14], that can likewise be extended to the case of confluences*.

4.2

Example

We consider here the example displayed in Figure 1. We have $F = \{a, b, abc, abd, abcd\}$ and $O = \{ab, abc, abcd\}$. To compute the closures in $F$ we take advantage of the fact that $F$ has two minimal elements $a$ and $b$ and that for any $t \geq a$ (resp. $t \geq b$) we can write $f(t) = p_a \circ \mathrm{int} \circ \mathrm{ext}(t)$ (resp. $f(t) = p_b \circ \mathrm{int} \circ \mathrm{ext}(t)$). We obtain then:

– $f(a) = p_a \circ \mathrm{int}(\{ab, abc, abcd\}) = p_a(ab) = a$
– $f(b) = p_b \circ \mathrm{int}(\{ab, abc, abcd\}) = p_b(ab) = b$
– $f(abc) = p_a \circ \mathrm{int}(\{abc, abcd\}) = p_a(abc) = abc$ (we could have used $p_b$, as $abc \in T^b$, with the same result $abc$)
– $f(abd) = p_a \circ \mathrm{int}(\{abcd\}) = p_a(abcd) = abcd$ (same remark as above)
– $f(abcd) = p_a \circ \mathrm{int}(\{abcd\}) = p_a(abcd) = abcd$ (same remark as above)

Note that the confluence* $F$ is the union of the two lattices $F^a = \{a, abc, abd, abcd\}$ and $F^b = \{b, abc, abd, abcd\}$. Therefore we have $f[F] = \{a, b, abc, abcd\}$, which is a pre-confluence whose minimal elements are $f(a) = a$ and $f(b) = b$. We have that $f[F] = f[F^a] \cup f[F^b]$, where $f[F^a]$ and $f[F^b]$ are the sets of closed patterns from the concept lattices built respectively from $(F^a, O^a)$ and from $(F^b, O^b)$. We have here $f[F^a] = \{a, abc, abcd\}$ and $f[F^b] = \{b, abc, abcd\}$.

Regarding the min-max implication basis, we first consider the set of extensions $\mathrm{ext}[F] = \{e_1 = \{ab, abc, abcd\},\ e_2 = \{abc, abcd\},\ e_3 = \{abcd\}\}$ together with the corresponding equivalence classes $F(e_1)$, $F(e_2)$, $F(e_3)$. Each equivalence class is divided into subclasses, each containing one closed element:

– $F(e_1) = \{a\} + \{b\}$
– $F(e_2) = \{abc\}$
– $F(e_3) = \{abd, abcd\}$

Figure 1 displays on the left the confluence* $F$, in the middle the object set $O$, and on the right the pre-confluence $f[F]$ of support closed patterns of $F$. The min-max implication basis is made of the internal basis $B_i = \{abd \to abcd\}$ (this implication holds both in $(F^a, O^a)$ and in $(F^b, O^b)$) plus the external basis $B_e = \{a \to b,\ b \to a\}$.
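The computations of this example can be replayed mechanically. The following is a minimal sketch, assuming that patterns and object descriptions are encoded as sets of attributes in $T = 2^X$ with $X = \{a, b, c, d\}$; the function names (`project`, `closure`, `intent`, `name`) are illustrative choices and not notation from the paper. It computes $f(t) = p_t \circ \mathrm{int} \circ \mathrm{ext}(t)$ for every $t \in F$ and then applies Definition 6 literally.

```python
from collections import defaultdict

# Data of the example: patterns and object descriptions as attribute sets.
F = [frozenset(s) for s in ("a", "b", "abc", "abd", "abcd")]
O = [frozenset(s) for s in ("ab", "abc", "abcd")]
X = frozenset("abcd")  # top of T = 2^X, used as int of the empty object set


def name(p):
    """Readable label of a pattern, e.g. frozenset('ab') -> 'ab'."""
    return "".join(sorted(p))


def ext(t):
    """Objects in which pattern t occurs."""
    return frozenset(o for o in O if t <= o)


def intent(e):
    """Greatest common description of the objects in e (X when e is empty)."""
    common = X
    for o in e:
        common &= o
    return common


def project(t, x):
    """p_t(x): greatest element of F lying above t and below x."""
    candidates = [q for q in F if t <= q <= x]
    top = max(candidates, key=len)
    assert all(q <= top for q in candidates), "no greatest element"
    return top


def closure(t):
    """Support closure f(t) = p_t(int(ext(t)))."""
    return project(t, intent(ext(t)))


closures = {name(t): name(closure(t)) for t in F}

# Group patterns by extension, then pair minimal elements with closed ones.
classes = defaultdict(list)
for t in F:
    classes[ext(t)].append(t)

internal, external = set(), set()
for members in classes.values():
    minimal = [p for p in members if not any(q < p for q in members)]
    closed = {closure(p) for p in members}
    for p in minimal:
        for q in closed:
            if p != q:
                (internal if p <= q else external).add((name(p), name(q)))

print(closures)
print(sorted(internal), sorted(external))
```

On this data the closed patterns are a, b, abc and abcd. The only internal implication produced is abd → abcd, since abd and abcd are the two members of the class of extension $\{abcd\}$, and the external implications are a → b and b → a.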

5

Abstract closed patterns in confluences*

In this section we consider abstract closed patterns as those obtained in extensionally abstract Galois lattices, denoted here as abstract Galois lattices for short, by constraining the space $2^O$. The general idea, as proposed in [12] and resulting in Proposition 5 in section 3.1, is that an abstract Galois lattice is obtained by selecting as an extensional space a subset $A$ of $2^O$ closed under union, i.e. an abstraction (or dual closure subset), and therefore such that $A = p_A(2^O)$ where $p_A$ is an interior operator on $2^O$. The intuitive meaning is that the abstract extension $\mathrm{ext}_A(t)$ of some pattern $t$ will then be the union of the elements of $A$ contained in its (standard) extension, i.e. $\mathrm{ext}_A = p_A \circ \mathrm{ext}$, and the corresponding abstract support closure operator with respect to $A$ is therefore $f_A = \mathrm{int} \circ p_A \circ \mathrm{ext}$. Intuitively, as noticed in [7], this is because the corresponding abstract Galois lattice is isomorphic to, and has the same support closure subset as, the Galois lattice associated to the object set $O(A)$, each object $a$ of which is an element of $A$ and is described in $T$ as $\mathrm{int}(a)$⁴. It is then straightforward that abstract Galois pre-confluences are simply the Galois pre-confluences obtained through this change of object set.

Theorem 4 Let $F$ be a confluence* of a lattice $T$, $O$ a set whose objects are described as elements of $T$, and $A = p_A(2^O)$ an abstraction of $2^O$. Let $p_t$ denote the local description operators on $F$. Then $f_A(t) = p_t \circ \mathrm{int} \circ p_A \circ \mathrm{ext}(t)$, where $(\mathrm{int}, p_A \circ \mathrm{ext})$ is a Galois connection on $(T, A)$, is a support closure operator on $F$ with respect to $A$, and $f_A[F]$ is a pre-confluence.

We continue here the example of section 4.2 by using the abstraction $A = \{\{o_1, o_2\}, \{o_1, o_3\}\} = \{\{ab, abc\}, \{ab, abcd\}\}$. Recall that $p_A(e) = \bigcup_{\{a \in A \mid a \subseteq e\}} a$. We obtain then:

– $f_A(a) = p_a \circ \mathrm{int} \circ p_A(\{o_1, o_2, o_3\}) = a$, as $p_A(\{o_1, o_2, o_3\}) = \{o_1, o_2, o_3\} = \{ab, abc, abcd\}$
– $f_A(b) = p_b \circ \mathrm{int} \circ p_A(\{o_1, o_2, o_3\}) = b$ (same reason as above)
– $f_A(abc) = p_a \circ \mathrm{int} \circ p_A(\{o_2, o_3\}) = \top^a = abcd$, as $p_A(\{o_2, o_3\}) = \emptyset$ and therefore $p_a \circ \mathrm{int}(\emptyset) = p_a(\top^a) = \top^a$
– $f_A(abd) = p_a \circ \mathrm{int} \circ p_A(\{o_3\}) = \top^a = abcd$, as $p_A(\{o_3\}) = \emptyset$ (as above)
– $f_A(abcd) = p_a \circ \mathrm{int} \circ p_A(\{o_3\}) = abcd$ (same as above)

$F$ is represented on the left of Figure 2. The corresponding abstract support closure pre-confluence $f_A[F]$ is displayed on the right of the figure. What happens here is that there are only two possible abstract extensions, as $\mathrm{ext}_A[F] = \{\emptyset, O\}$. As a result, the two minimal elements of $f_A[F]$ share the same abstract extension $O$, whereas the unique maximal element $\top^a = \top^b = abcd$ has an empty abstract extension.
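The abstract closures of this example can be recomputed with a few lines of code. This is a sketch under the same encoding assumptions as the example data (patterns as attribute sets, objects $o_1 = ab$, $o_2 = abc$, $o_3 = abcd$); the names `OBJECTS`, `p_A`, `project` and `abstract_closure` are illustrative. The interior operator is implemented exactly as recalled in the text: $p_A(e)$ is the union of the elements of $A$ included in $e$.

```python
# Patterns, objects and abstraction of the running example.
F = [frozenset(s) for s in ("a", "b", "abc", "abd", "abcd")]
OBJECTS = {"o1": frozenset("ab"), "o2": frozenset("abc"), "o3": frozenset("abcd")}
X = frozenset("abcd")  # int of the empty object set
A = [frozenset({"o1", "o2"}), frozenset({"o1", "o3"})]


def name(p):
    return "".join(sorted(p))


def ext(t):
    """Names of the objects whose description contains t."""
    return frozenset(o for o, d in OBJECTS.items() if t <= d)


def p_A(e):
    """Interior operator: union of the elements of A included in e."""
    out = frozenset()
    for a in A:
        if a <= e:
            out |= a
    return out


def intent(e):
    """Common description of the objects named in e (X when e is empty)."""
    common = X
    for o in e:
        common &= OBJECTS[o]
    return common


def project(t, x):
    """p_t(x): greatest element of F lying above t and below x."""
    candidates = [q for q in F if t <= q <= x]
    top = max(candidates, key=len)
    assert all(q <= top for q in candidates), "no greatest element"
    return top


def abstract_closure(t):
    """f_A(t) = p_t(int(p_A(ext(t))))."""
    return project(t, intent(p_A(ext(t))))


abstract_closures = {name(t): name(abstract_closure(t)) for t in F}
abstract_extents = {p_A(ext(t)) for t in F}
print(abstract_closures)
print(abstract_extents)
```

Only two abstract extensions appear, the empty set and the whole object set, which is why abc, abd and abcd all collapse onto abcd while a and b remain closed.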

6

Algorithmics

An algorithm to compute support closures on confluent families of $2^X$ was proposed in [6] for the case where $F$ is strongly accessible. This restriction⁵ ensures a polynomial delay in outputting support closed elements. This algorithm has further been implemented as a generic tool, designed in particular to be efficient on multi-core architectures, in PARAMINER [15]. Adapting it to confluences* is straightforward: it suffices to avoid computing the support closure of $\emptyset$. Basically, the algorithm performs a depth-first search

⁴ In fact we just need the $\cup$-irreducible elements of $A$ as objects.
⁵ For $(F, X)$ to be a strongly accessible set system, it is required that between any pair of elements $t_1, t_2$ with $t_1 \leq t_2$ in $F$ there is a path $t_1,\ t_1 \cup \{x_1\},\ \ldots,\ t_1 \cup \{x_1, \ldots, x_k\} = t_2$ all elements of which belong to $F$.

[Figure 2: diagrams over the nodes a, b, c, d, annotated with {o1, o2, o3} = pA({o1, o2, o3}), A = {{o1, o2}, {o1, o3}} and {} = p({o3}).]

Fig. 2. Diagram of the abstract support closed connected subgraphs pre-confluence $f_A[F]$ (on the left part of the figure) with respect to the abstraction $A = \{\{o_1, o_2\}, \{o_1, o_3\}\}$ of $O$. The support closed element $abc$ of $f[F]$ has been projected to the maximal element of $F$, $abcd$, because its extension $\{abc, abcd\}$ is projected on $\emptyset$, as no element of $A$ is included in $\{abc, abcd\}$.

each step of which consists in adding an attribute $x$ to the current closed pattern $t$, checking whether the resulting pattern $t \cup \{x\}$ is in $F$, and closing the pattern. A SELECT function states whether a pattern belongs to $F$, and the closure is only computed if it returns TRUE. The function has an ad hoc implementation according to the problem at hand. In terms of interior operators, SELECT implicitly tests whether $p_t(t \cup \{x\}) = t \cup \{x\}$ is true. A CLOSURE function computes the closure of any $t \in F$ by implicitly applying $p_t$ to $\mathrm{int}(\mathrm{ext}(t))$. Again the implementation is ad hoc, depending on the problem at hand. An open question is the construction and visualisation of the diagram of the pre-confluence of support closed elements and of the corresponding min-max implication basis.
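The search scheme described above can be sketched for connected edge sets of a graph, a family that is strongly accessible. This is a simplified illustration and not the algorithm of [6] nor the PARAMINER implementation: duplicates are filtered with a plain `seen` set, so no polynomial-delay guarantee is claimed. The path graph, the three objects and all function names (`component`, `select`, `closure`, `enumerate_closed`) are hypothetical.

```python
# Hypothetical path graph 1-2-3-4-5; attributes are its edges.
X = frozenset({(1, 2), (2, 3), (3, 4), (4, 5)})
# Hypothetical objects, each described by a set of edges.
O = [
    frozenset({(1, 2), (2, 3), (3, 4)}),
    frozenset({(1, 2), (2, 3), (4, 5)}),
    frozenset({(2, 3), (3, 4), (4, 5)}),
]


def component(seed, edges):
    """Edges of `edges` reachable from the edge set `seed` via shared vertices."""
    comp = set(seed)
    vertices = {v for e in comp for v in e}
    grown = True
    while grown:
        grown = False
        for e in edges - comp:
            if vertices & set(e):
                comp.add(e)
                vertices |= set(e)
                grown = True
    return frozenset(comp)


def select(p):
    """SELECT: p belongs to F, i.e. p is a non-empty connected edge set."""
    if not p:
        return False
    first = next(iter(p))
    return component({first}, p) == p


def closure(t):
    """CLOSURE: p_t(int(ext(t))), the component of t inside the common edges."""
    common = X
    for o in O:
        if t <= o:
            common &= o
    return component(t, common)


def enumerate_closed():
    """Depth-first search over closed patterns, starting from single edges."""
    seen = set()

    def expand(t):
        for x in X - t:
            candidate = t | {x}
            if not select(candidate):
                continue  # closure is only computed for patterns of F
            c = closure(candidate)
            if c not in seen:
                seen.add(c)
                expand(c)

    for m in X:  # minimal elements of F are the single edges
        c = closure(frozenset({m}))
        if c not in seen:
            seen.add(c)
            expand(c)
    return seen


closed_patterns = enumerate_closed()
for p in sorted(closed_patterns, key=lambda q: (len(q), sorted(q))):
    print(sorted(p))
```

Here `select` plays the role of the test $p_t(t \cup \{x\}) = t \cup \{x\}$, and `closure` projects the common description of the supporting objects on the connected component containing the pattern. The empty pattern is never closed, in line with the adaptation to confluences*.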

7

Conclusion

Motivated by the problem of finding closed patterns in languages such as the set of connected subgraphs of a graph, we have investigated an extension of FCA where the pattern language is a pre-confluence, i.e. a partial order defined through the existence of a local meet operator, and that can be expressed as a constrained union of a set of lattices. We have first extended the standard property that relates closure subsets and subsets closed under the meet operator to the case of pre-confluences. Then we have discussed the existence of support closure operators in pre-confluences, extending a result of [6], and we have called a Galois pre-confluence the pre-confluence of support closed patterns. Related FCA works concern indirect approaches in which a support closure operator is defined on sets of connected graphs, thus resulting in a standard concept lattice whose intents contain several support-closed connected graphs [16]. We have also shown that by applying interior operators to the powerset of objects we obtain, as in the lattice case, abstract support closures. The connection to FCA that we have attempted to establish raises some technical questions, such as the construction of diagrams of closure subsets, as well as more fundamental questions. For instance, when considering a support closed element as the intensional part of some concept, i.e. an intent, we may have two different concepts with the same extent, which is somewhat disturbing. On the other hand, we could consider that the extension defines the concept, i.e. is an extent, and in this case a concept may have several intents. Finally, regarding applications, it seems worthwhile to consider such structures, as they are frequent when modeling data using graphs.

Acknowledgements

Many thanks to Bernard Monjardet for his invaluable comments and to Sylvie Borne and Sophie Toulouse for their help in the preparation of this manuscript.

References

1. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer Verlag (1999)
2. Caspard, N., Monjardet, B.: The lattices of closure systems, closure operators, and implicational systems on a finite set: a survey. Discrete Appl. Math. 127(2) (2003) 241–269
3. Diday, E., Emilion, R.: Maximal and stochastic Galois lattices. Discrete Appl. Math. 127(2) (2003) 271–284
4. Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Efficient mining of association rules using closed itemset lattices. Information Systems 24(1) (1999) 25–46
5. Arimura, H., Uno, T.: Polynomial-delay and polynomial-space algorithms for mining closed sequences, graphs, and pictures in accessible set systems. In: SDM, SIAM (2009) 1087–1098
6. Boley, M., Horváth, T., Poigné, A., Wrobel, S.: Listing closed sets of strongly accessible set systems with applications to data mining. Theor. Comput. Sci. 411(3) (2010) 691–700
7. Ventos, V., Soldano, H.: Alpha Galois lattices: An overview. In Ganter, B., Godin, R., eds.: International Conference on Formal Concept Analysis (ICFCA). Volume 3403 of Lecture Notes in Computer Science. Springer Verlag (2005) 298–313
8. Blyth, T.S.: Lattices and Ordered Algebraic Structures. Universitext, Springer (2005)
9. Ferré, S., Ridoux, O.: An introduction to logical information systems. Information Processing and Management 40(3) (2004) 383–419
10. Ganter, B., Kuznetsov, S.O.: Pattern structures and their projections. ICCS-01, LNCS 2120 (2001) 129–142
11. Pernelle, N., Rousset, M.C., Soldano, H., Ventos, V.: Zoom: a nested Galois lattices-based system for conceptual clustering. J. of Experimental and Theoretical Artificial Intelligence 2/3(14) (2002) 157–187
12. Soldano, H., Ventos, V.: Abstract Concept Lattices. In Valtchev, P., Jäschke, R., eds.: International Conference on Formal Concept Analysis (ICFCA). Volume 6628 of LNAI. Springer, Heidelberg (2011) 235–250
13. Pasquier, N., Taouil, R., Bastide, Y., Stumme, G., Lakhal, L.: Generating a condensed representation for association rules. Journal of Intelligent Information Systems (JIIS) 24(1) (2005) 29–60
14. Guigues, J., Duquenne, V.: Famille non redondante d'implications informatives résultant d'un tableau de données binaires. Mathématiques et Sciences humaines 95 (1986) 5–18
15. Negrevergne, B., Termier, A., Rousset, M.C., Méhaut, J.F.: Paraminer: a generic pattern mining algorithm for multi-core architectures. Data Mining and Knowledge Discovery (2013) 1–41
16. Kuznetsov, S.O., Samokhin, M.V.: Learning closed sets of labeled graphs for chemical applications. In Kramer, S., Pfahringer, B., eds.: ILP. Volume 3625 of Lecture Notes in Computer Science. Springer (2005) 190–208
