
Extracting Semantics from Data Cubes using Cube Transversals and Closures
Alain Casali, Rosine Cicchetti, Lotfi Lakhal
LIF - Université de la Méditerranée, Marseille, France
{casali,cicchetti,lakhal}@iut.univ-aix.fr

ABSTRACT

In this paper we propose a lattice-based approach intended for extracting semantics from datacubes: borders of version spaces for supervised classification, closed cube lattice to summarize the semantics of datacubes w.r.t. COUNT, SUM, and covering graph of the quotient cube as a visualization tool of minimal multidimensional associations. With this intention, we introduce two novel concepts: the cube transversals and the cube closures over the cube lattice of a categorical database relation. We propose a levelwise merging algorithm for mining minimal cube transversals with a single database scan. We introduce the cube connection, show that it is a Galois connection and derive a closure operator over the cube lattice. Using cube transversals and closures, we define a new characterization of boundary sets which provide a condensed representation of version spaces used to enhance supervised classification. The algorithm designed for computing such borders improves the complexity of previous proposals. We also introduce the concept of closed cube lattice and show that it is isomorphic, on the one hand, to the Galois lattice and, on the other hand, to the quotient cube w.r.t. COUNT, SUM. Proposed in [16], the quotient cube is a succinct summary of a datacube preserving the Rollup/Drilldown semantics. We show that the quotient cube w.r.t. COUNT, SUM and the closed cube lattice have a similar expressive power but the latter has the smallest possible size. Finally we focus on the multidimensional association issue and introduce the covering graph of the quotient cube which provides the user with a visualization tool of minimal multidimensional associations.

Categories and Subject Descriptors
H.2.8 [Database Management]: Data Mining

Keywords
Algorithm, Closures, Datacubes, Hypergraph Transversals, Lattices, Version Spaces.

1. INTRODUCTION AND MOTIVATIONS

Hypergraph transversals [2, 8] and Galois closure of a finite binary relation [9] have various applications in data mining and various kinds of knowledge can be discovered: minimal keys and minimal functional dependencies [19, 17], implication rules [9], concise representation of frequent patterns [23, 25, 31], connection between positive and negative borders of theories [20], non-redundant association rules [1, 32], classification and conceptual clustering [21, 3, 27]. When mining minimal transversals and Galois closed sets, the search space to be explored is the powerset lattice of the binary attributes (values). We show in [4] that such a lattice is not really suitable when extracting semantics from datacubes [10], and suggest, as an alternative, an algebraic structure which is called cube lattice of a categorical database relation r. Such a cube lattice is the set of all possible and semantically valid multidimensional patterns and it can be seen as the union of the datacubes of r and of the complement of r. The cube lattice is a set of tuples representing multidimensional patterns, provided with a generalization/specialization order between tuples. Two operators Product and Sum are defined as the foundation of the lattice operations Join (Least Upper Bound) and Meet (Greatest Lower Bound). A similar lattice has been independently proposed by L.V.S. Lakshmanan, J. Pei, and J. Han [16]. Based on the latter structure, the authors define the quotient cube lattice, a succinct summary of the datacube with the nice property of preserving the Rollup/Drilldown semantics of the cube.

The cube lattice provides a sound basis for defining the search space to be explored when extracting semantics from the datacube such as Roll-Up dependencies [28], multidimensional associations [29], decision tables and classification rules [18], iceberg cubes [13], multidimensional constrained gradients [7], concise representation of high frequency multidimensional patterns [6] and reduced cubes [16, 30].

In this paper, following this semantic trend, we propose, within the groundwork of the cube lattice, the concepts of cube transversals and cube closures. Based on the introduced concepts, we provide a general and sound approach intended for extracting semantics from datacubes: borders of version spaces for supervised classification, closed cube lattice to summarize the semantics of datacubes w.r.t. COUNT, SUM, and the covering graph of the quotient cube as a visualization tool of minimal multidimensional associations. More precisely, we make the following contributions.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGKDD ’03, August 24-27, 2003, Washington, DC, USA. Copyright 2003 ACM 1-58113-737-0/03/0008...$5.00.

• We introduce two novel concepts for multidimensional data mining problems: the cube transversal and the cube closure over the cube lattice. Finding cube transversals is a sub-problem of hypergraph transversal discovery because there exists an order-embedding from the cube lattice to the powerset lattice of binary attributes. Using this result, we propose a levelwise merging algorithm for mining minimal cube transversals with one database scan. Algorithms addressing a similar issue in a binary context (i.e. mining minimal hypergraph transversals) require k database scans [19, 17]. We introduce the concept of cube connection and show that it is a Galois connection between the cube lattice of r and the powerset lattice of Tid(r) (the set of tuple identifiers in r). Hence we derive from the cube connection a closure operator and obtain what we call the closed cube lattice of r, which is a reduced representation of the original cube. Each element of the closed cube lattice is a closed tuple and cube keys are minimal generators of closed tuples.

and the related closure operator are presented in section 4. In section 5, we propose the closed cube lattice and show the relationship with the quotient cube. Section 6 characterizes the computation of its covering graph. Section 7 describes a multidimensional data mining application based on cube transversals and connection: discovering boundary sets of version spaces for supervised classification. Proofs of propositions, lemmas and theorems are given in [5].

2. BACKGROUND

2.1 Cube Lattice Framework

Throughout the paper, we make the following assumptions and use the introduced notations. Let r be a relation over the schema R. Attributes of R are divided into two sets: (i) D, the set of dimensions, also called categorical or nominal attributes, which correspond to analysis criteria for OLAP, classification or concept learning [22] and (ii) M, the set of measures (for OLAP) or class attributes. Moreover, attributes of D are totally ordered (the underlying order is denoted by g t′ and ∀t″ ∈ Space(r), t ≥g t″ g t′ ⇒ t = t″. In the multidimensional space of our relation example, we have: t5 ≥g t2, i.e. t5 is more general than t2 and t2 is more specific than t5, and t1 g t2. Moreover any tuple generalizes the tuple <∅, ..., ∅> and specializes the tuple <ALL, ..., ALL>.

[20]. Provided with multidimensional patterns at the level i, we only generate candidates of level i + 1, if they exist (else the constrained product yields ). Moreover, each tuple is generated only once. Let u and v be two tuples of Space(r), X = Attribute(u) and Y = Attribute(v).   t = u • v if X\max



Definition 9 [Cube Connection] Let Rowid : r → N be a mapping which associates each tuple with a single positive integer and Tid(r) = {Rowid(t) | t ∈ r} (i.e. the set of the tuple identifiers of the relation r). Let λ and σ be two functions defined as follows:

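\[ \lambda : CL(r) \to \langle \mathcal{P}(Tid(r)), \subseteq \rangle, \quad t \mapsto \bigcup \{ Rowid(t') \in Tid(r) \mid t \geq_g t' \text{ and } t' \in r \} \]

\[ \sigma : \langle \mathcal{P}(Tid(r)), \subseteq \rangle \to CL(r), \quad P \mapsto + \{ t \in r \mid Rowid(t) \in P \} \]

Figure 3: Hasse diagram of the closed cube lattice of r ('*' ⇔ ALL).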

minimal generators of t. Each tuple of Key(t) is a cube key.

Proposition 4 The cube connection rc = (λ, σ) is a Galois connection between the cube lattice of r and the powerset lattice of Tid(r).
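The two mappings of Definition 9 can be tried out on a small relation. The sketch below is only illustrative: the three-tuple relation `r`, the use of `None` for the tuple <∅, ..., ∅>, the convention that an empty Sum yields <∅, ..., ∅>, and the adjunction form used in the check (t ≥g σ(P) ⇔ P ⊆ λ(t)) are all assumptions made here, not taken from the paper, whose proof is in [5].

```python
from itertools import combinations, product

ALL = "ALL"
EMPTY = None  # stands for the tuple <∅, ..., ∅> (assumed encoding)


def generalizes(t, u):
    """Return True when t >=_g u, i.e. t is at least as general as u."""
    if u is EMPTY:
        return True   # every tuple generalizes <∅, ..., ∅>
    if t is EMPTY:
        return False
    return all(a == ALL or a == b for a, b in zip(t, u))


def cube_sum(tuples):
    """Sum (+): the most specific tuple generalizing all the given tuples."""
    tuples = list(tuples)
    if not tuples:
        return EMPTY  # assumption: an empty Sum yields <∅, ..., ∅>
    return tuple(col[0] if len(set(col)) == 1 else ALL for col in zip(*tuples))


def lam(t, r):
    """lambda(t): identifiers (1-based Rowid) of the tuples of r generalized by t."""
    return frozenset(i for i, u in enumerate(r, start=1) if generalizes(t, u))


def sigma(ids, r):
    """sigma(P): Sum of the tuples of r whose identifier belongs to P."""
    return cube_sum(r[i - 1] for i in ids)


def cube_lattice(r):
    """All tuples over (Dim(A) ∪ {ALL}) for each dimension, plus <∅, ..., ∅>."""
    domains = [sorted(set(col)) + [ALL] for col in zip(*r)]
    return list(product(*domains)) + [EMPTY]


def is_galois_connection(r):
    """Check t >=_g sigma(P)  <=>  P ⊆ lambda(t) for every t and every P."""
    ids = range(1, len(r) + 1)
    subsets = [frozenset(c) for k in range(len(r) + 1) for c in combinations(ids, k)]
    return all(
        generalizes(t, sigma(p, r)) == (p <= lam(t, r))
        for t in cube_lattice(r)
        for p in subsets
    )


# Hypothetical toy relation (not the paper's Table 1).
r = [("a1", "b1", "c1"), ("a1", "b2", "c1"), ("a2", "b1", "c1")]
print(lam(("a1", ALL, ALL), r))
print(sigma({1, 2}, r))
print(is_galois_connection(r))
```

The exhaustive check is only feasible because the toy lattice is tiny; it is meant to make the definition concrete, not to suggest an algorithm.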

Considering our relation example (Cf. Table 1), we have: Key() = {, }.

Definition 10 [Closure Operator] Let us define the operator C : CL(r) → CL(r) as follows:

\[ t \mapsto \begin{cases} +_{t' \in r}\; t' \mid t \geq_g t' & \text{if } \exists t' \in r \\ \langle \emptyset, \ldots, \emptyset \rangle & \text{elsewhere.} \end{cases} \]

Remark: Let us consider a database relation r over R. Minimal cube keys of a closed tuple are similar to minimal keys of r. The closure operator, however, is not similar, since R is closed under a set of functional dependencies over r [19].

5. CLOSED CUBE LATTICES

The closed cube lattice, defined in this section, is a summary of the datacube w.r.t. SUM, COUNT but, in contrast with the quotient cube, it does not preserve the semantics of the Rollup/Drilldown operators. It is thus especially relevant to show that the closed cube lattice and the quotient cube w.r.t. COUNT, SUM have a similar expressive power, because we can then benefit from the loss-less aggregated data property of quotient cubes while providing a sound representation of reduced size. In fact, it is the smallest possible representation of a datacube w.r.t. COUNT, SUM. We use Birkhoff's theorem [9] to construct a lattice associated with our closure operator. In contrast with Galois (concept) lattices [9], our resulting lattice is not unspecified: it remains coatomistic.

Example 11 - Considering the multidimensional space of the relation example we have: C(, r) = + = C(, r) =

Proposition 5 C is a closure operator over CL(r) under r and thus it satisfies the following properties [9]:
1. t ≥g t′ ⇒ C(t, r) ≥g C(t′, r) (monotony)
2. t ≥g C(t, r) (extensity)
3. C(t, r) = C(C(t, r), r) (idempotency).

The closure of each tuple is computed and results are gathered within a closure system. The minimal tuples w.r.t. ≥g originating the very same closure are called cube keys.

Theorem 3 The partially ordered set CCL(r) = ⟨C(r), ≥g⟩ is a complete and coatomistic lattice called closed cube lattice. Moreover, we have:
1. ∀ T ⊆ CCL(r), \( \bigwedge T = +_{t \in T}\, t \)
2. ∀ T ⊆ CCL(r), \( \bigvee T = C(\bullet_{t \in T}\, t, r) \)

Example 15 - Figure 3 exemplifies the cube closure lattice of our relation example (Cf. Table 1).

Definition 12 [Closure System] Let us assume that C(r) = { t ∈ CL(r) | C(t, r) = t}. C(r) is a closure system over r and its related closure operator is C. Any tuple belonging to C(r) is a closed tuple or a cube closure.
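As an illustration of the closure operator and of the closure system C(r), the following sketch computes C(t, r) for every tuple of a small cube lattice and checks the three properties of a closure operator by brute force. The toy relation, the encoding of <∅, ..., ∅> as `None`, and the helper names are assumptions introduced for this example only.

```python
from itertools import product

ALL = "ALL"
EMPTY = None  # stands for the tuple <∅, ..., ∅> (assumed encoding)


def generalizes(t, u):
    """Return True when t >=_g u."""
    if u is EMPTY:
        return True
    if t is EMPTY:
        return False
    return all(a == ALL or a == b for a, b in zip(t, u))


def cube_sum(tuples):
    """Sum (+): the most specific tuple generalizing all the given tuples."""
    tuples = list(tuples)
    if not tuples:
        return EMPTY
    return tuple(col[0] if len(set(col)) == 1 else ALL for col in zip(*tuples))


def cube_lattice(r):
    """All tuples over (Dim(A) ∪ {ALL}) for each dimension, plus <∅, ..., ∅>."""
    domains = [sorted(set(col)) + [ALL] for col in zip(*r)]
    return list(product(*domains)) + [EMPTY]


def closure(t, r):
    """C(t, r): Sum of the tuples of r generalized by t, <∅, ..., ∅> if none."""
    return cube_sum(u for u in r if generalizes(t, u))


def closure_system(r):
    """C(r): the tuples of the cube lattice that equal their own closure."""
    return {t for t in cube_lattice(r) if closure(t, r) == t}


def satisfies_closure_properties(r):
    """Brute-force check of monotony, extensity and idempotency."""
    lattice = cube_lattice(r)
    monotone = all(
        generalizes(closure(t, r), closure(u, r))
        for t in lattice for u in lattice if generalizes(t, u)
    )
    extensive = all(generalizes(t, closure(t, r)) for t in lattice)
    idempotent = all(closure(closure(t, r), r) == closure(t, r) for t in lattice)
    return monotone and extensive and idempotent


# Hypothetical toy relation (not the paper's Table 1).
r = [("a1", "b1", "c1"), ("a1", "b2", "c1"), ("a2", "b1", "c1")]
print(closure((ALL, ALL, ALL), r))
print(sorted(t for t in closure_system(r) if t is not EMPTY))
print(satisfies_closure_properties(r))
```

On this toy relation the closure system is much smaller than the cube lattice, which is the point of using closed tuples as a summary.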

5.1 Lattice-Isomorphism

Example 13 - Considering our relation example, we have: C(r) = {, , , , , , }.

Closed set computation has been widely addressed in a binary context. Defining the following lattice-isomorphism makes it possible to reuse results obtained in a binary context.

Definition 14 [Cube Key] Let t be a closed tuple. Key(t) = min≥g({t′ ∈ CL(r) | t′ ≥g t and C(t′, r) = t}) is the set of

Definition 16 [Lattice-Isomorphism] Let P and Q be two lattices. A mapping m : P → Q is a lattice-isomorphism iff, ∀X, Y ∈ P, it satisfies:
1. \( m(X \wedge_P Y) = m(X) \wedge_Q m(Y) \) (∧-morphism property)
2. \( m(X \vee_P Y) = m(X) \vee_Q m(Y) \) (∨-morphism property)
3. m is bijective

Let us assume that Br = {Φ(t) | t ∈ r}. The mapping h : L(r) → L(r), X ↦ ∩{Y ⊆ I | X ⊆ Y} is a closure operator on \( I = \bigcup_{A \in D} A.a, \forall a \in Dim(A) \), over Br [9]. Cl(Br) = {h(X) | X ⊆ I} is a closure system and L(Br) = ⟨Cl(Br), ⊆⟩ is a lattice of Br called concept lattice [9].

Theorem 4 The mapping Ψ : CCL(r) → L(Br) is a lattice-isomorphism and we have ∀t ∈ CCL(r), X ∈ L(Br):
• Ψ(t) = Φ(t)
• \[ \Psi^{-1}(X) = \begin{cases} \langle \emptyset, \ldots, \emptyset \rangle & \text{if } \exists A \in D \text{ and } a_1, a_2 \in Dim(A) \mid A.a_1 \text{ and } A.a_2 \in X \\ t \mid \forall A \in D,\ t[A] = \begin{cases} a & \text{if } A.a \in X \\ ALL & \text{elsewhere} \end{cases} & \text{otherwise.} \end{cases} \]

Figure 4: Lattice of closed-equivalence classes ('*' ⇔ ALL).

C(t, r) = C(t′, r). The equivalence class of t is given by [t] = {t′ ∈ CL(r) | t θ t′}. Thus we have max≥g([t]) = C(t, r) and min≥g([t]) = Key(C(t, r)) (Cf. [27] and theorem 4). The set of equivalence classes provided with the generalization order is a lattice (Cf. [27] and theorem 4) called lattice of closed-equivalence classes. The order relation within equivalence classes is also the generalization. Let us underline that this lattice is isomorphic to the closed cube lattice (Cf. [27] and theorem 4).

Consequences of theorem 4 are especially attractive: when mining closed (frequent) tuples, we can use either binary algorithms like Titanic [27], Charm [31], Closet [25] or algorithms fitting into the cube lattice framework [16]. Let us notice that when finding tuples under a conjunction of monotone and/or anti-monotone constraints in a binary framework, the former algorithms could return erroneous results [4] whereas the latter traverse a search space only encompassing semantically valid patterns.

Example 18 - The lattice of closed-equivalence classes of the relation illustrated in table 1 is given in figure 4. In this figure, each equivalence class is represented by its maximal tuple w.r.t. ≥g and by its minimal elements w.r.t. ≥g.

5.2 Relationships between Closed Cubes and Quotient Cubes

The closed cube is the smallest concise representation of a datacube computed by using the aggregative functions COUNT and SUM (when all the values are strictly positive). Nevertheless this structure does not preserve the Rollup/Drilldown semantics of the cube whereas the quotient cube [16] preserves it. In such a context, it is especially interesting to state a sound relationship between the closed cube and the quotient cube. To meet this objective, we use results presented in [27] to construct equivalence classes: we merge within a single equivalence class tuples having the very same cube closure. Each equivalence class is then represented by its closed tuple (i.e. the maximal tuple w.r.t. ≥g) and by the cube keys related to this closed tuple (i.e. the minimal tuples w.r.t. ≥g). The result is a lattice of closed-equivalence classes which is a quotient cube. Finally we state a novel link between key tuples and closed tuples. Then we show that the closed cube lattice has the same expressive power as the quotient cube lattice for the aggregative functions SUM or COUNT but its size is smaller. Thus the closed cube lattice is an especially good candidate for an optimized representation of the quotient cube lattice for the aggregative functions SUM or COUNT. In the remainder of this section, we assume that f is the aggregative function COUNT or SUM, and all the values of the measure M are strictly positive.

In the lattice of closed-equivalence classes, as in the quotient cube, each equivalence class has a single maximal element and several minimal tuples. Based on the isomorphism between the lattice of closed-equivalence classes and the closed cube lattice, we introduce the following theorem, which states that the latter lattice is isomorphic to the quotient cube.

Theorem 5 The lattice of closed-equivalence classes is a quotient cube and the closed cube lattice is isomorphic to the quotient cube w.r.t. SUM, COUNT.

The closed cube lattice is the smallest representation of a datacube w.r.t. COUNT, SUM (Cf. theorem 4 and [27, 9]). As a consequence, the quotient cube has the smallest number of equivalence classes (due to theorem 5). To take advantage of the loss-less representation of the quotient cube, our aim is to show that the closed cube lattice has a similar expressive power. In order to meet this objective, it is necessary to find the cube keys of a closed tuple being provided only with the other closed tuples.

Lemma 1 Let u be a tuple, v a closed tuple and u ≥g v, we have the following equivalence: Freq(u) = Freq(v) ⇔ ∄w ∈ DLB(t) : u ≥g w,

Definition 17 [Lattice of closed-equivalence classes] Let θ be the following equivalence relation: t θ t′ holds iff

where DLB(t) = {t′ ∈ CCL(r) | t′ g t}.

Theorem 6 ∀t ∈ CCL(r), Key(t) = cTr(DLB(t), {t})

Example 19 - In the lattice given in figure 3, for the particular tuple , we have DLB() = {<ALL, ALL, H>}. The keys of this tuple are and . We obtain results similar to the ones in figure 4.

This theorem proves that the quotient cube lattice for the aggregative functions SUM and COUNT can be obtained from the closed cube lattice. Thus the closed cube lattice has the same expressive power as the quotient cube and moreover it is the smallest possible representation.
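To make the link between closed tuples, cube keys and quotient-cube classes more tangible, the sketch below groups the tuples of a small cube lattice by their closure and reports, for each class, its closed tuple, its cube keys and the COUNT shared by all its members. It does not implement the cube-transversal computation of Theorem 6: keys are found by brute force inside each class, reading "minimal generators" as the most general members of the class, which is an interpretation adopted here. The toy relation and helper names are hypothetical.

```python
from collections import defaultdict
from itertools import product

ALL = "ALL"
EMPTY = None  # stands for the tuple <∅, ..., ∅> (assumed encoding)


def generalizes(t, u):
    """Return True when t >=_g u."""
    if u is EMPTY:
        return True
    if t is EMPTY:
        return False
    return all(a == ALL or a == b for a, b in zip(t, u))


def cube_sum(tuples):
    """Sum (+): the most specific tuple generalizing all the given tuples."""
    tuples = list(tuples)
    if not tuples:
        return EMPTY
    return tuple(col[0] if len(set(col)) == 1 else ALL for col in zip(*tuples))


def cube_lattice(r):
    domains = [sorted(set(col)) + [ALL] for col in zip(*r)]
    return list(product(*domains)) + [EMPTY]


def closure(t, r):
    """C(t, r): Sum of the tuples of r generalized by t."""
    return cube_sum(u for u in r if generalizes(t, u))


def count(t, r):
    """COUNT aggregate of t: number of tuples of r that t generalizes."""
    return sum(1 for u in r if generalizes(t, u))


def closed_equivalence_classes(r):
    """Group the tuples of the cube lattice by their cube closure."""
    classes = defaultdict(list)
    for t in cube_lattice(r):
        classes[closure(t, r)].append(t)
    return dict(classes)


def cube_keys(members):
    """Most general members of a class (assumed reading of 'minimal generators')."""
    return [t for t in members
            if not any(u != t and generalizes(u, t) for u in members)]


# Hypothetical toy relation (not the paper's Table 1).
r = [("a1", "b1", "c1"), ("a1", "b2", "c1"), ("a2", "b1", "c1")]
classes = closed_equivalence_classes(r)
for closed, members in classes.items():
    print(closed, cube_keys(members), count(closed, r), len(members))
```

Because every member of a class has the same COUNT, storing one aggregate per closed tuple is enough to answer COUNT queries for the whole lattice in this toy setting.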

6. COVERING GRAPHS OF QUOTIENT CUBES

Figure 5: Covering graph of quotient cube, a summary of multidimensional associations.

The covering graph of the quotient cube (w.r.t. COUNT) captures a summary of multidimensional associations by giving rules of minimal antecedent and maximal consequent. With such a covering graph, the user is provided with a visualization tool for minimal multidimensional associations. Given the closed-equivalence classes, we propose a novel theorem which is a sound basis for computing the covering graph. If two tuples t and t′ belong to the same class, then the rule Φ(t) → Φ(t′) is an exact rule; otherwise the rule is approximate. The confidence of a transitive rule is given by Conf(Φ(t) → Φ(t′)) ∗ Conf(Φ(t′) → Φ(u)) = Conf(Φ(t) → Φ(u)) [32].
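The product rule for transitive confidences can be checked with a short derivation. It assumes, as is usual for rules between comparable patterns, that for t ≥g t′ ≥g u the confidence is the ratio of frequencies; this reading of Conf is an assumption here, since the definition is not restated in this section, and the numeric values are purely illustrative.

```latex
% Assumption: for t \geq_g t', Conf(\Phi(t) \to \Phi(t')) = Freq(t') / Freq(t).
\[
Conf(\Phi(t) \to \Phi(t')) \cdot Conf(\Phi(t') \to \Phi(u))
  = \frac{Freq(t')}{Freq(t)} \cdot \frac{Freq(u)}{Freq(t')}
  = \frac{Freq(u)}{Freq(t)}
  = Conf(\Phi(t) \to \Phi(u))
\]
% Illustrative values: Freq(t) = 0.6, Freq(t') = 0.3, Freq(u) = 0.1
\[
\frac{0.3}{0.6} \cdot \frac{0.1}{0.3} = \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{6} = \frac{0.1}{0.6}
\]
```

Under the same assumption, two tuples of one closed-equivalence class have equal frequencies, so the rule between them has confidence 1, which matches the notion of exact rule.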

Lemma 2 Let u, v be two closed tuples and u ≠ , thus we have: u >g v ⇒ v\u is a cube transversal of Key(v) on CL(r).

Lemma 3 Let v be a closed tuple, if t is a minimal cube transversal of Key(v) on CL(r) then v\t is a closed tuple.

The following theorem is in the spirit of the work of J.L. Pfaltz et al. on the covering relation of closed sets in a binary framework [26].

Theorem 7 Let u, v be two closed tuples and u ≠ , u g v ⇔ v\u is a minimal cube transversal of Key(v) on CL(r).

Example 20 - The covering graph for the relation example (Cf. table 1) is given in figure 4. The graph in figure 5 results from applying the order-embedding on the tuples in the equivalence classes of the covering graph. Hence, a summary of multidimensional associations is illustrated in figure 5, where the full lines stand for approximate rules and the dotted lines symbolize exact rules (for uniformity, trivial associations are represented and can be removed).

7. VERSION SPACES

The version space framework is based on a partial ordering of the hypotheses in a concept language [22]. Such a partial ordering is issued from the relative generality of hypotheses. In our context, the concept language is the cube lattice framework and hypotheses are tuples.

7.1 Consistent Tuples

Let r be a categorical database relation over D ∪ {C} where Dim(C) = {'+', '−'}, r+ is the set of positive tuples (if r+ is empty, we consider that r+ = {}) and r− is the set of negative tuples (if r− is empty, we consider that r− = {}). A tuple t is consistent if and only if it satisfies:
• (C1) ∀t′ ∈ r+, t ≥g t′ and
• (C2) ∀t′ ∈ r−, t ≱g t′

C1 and C2 are equivalent to the constraints Freq(t, r+) = 1 and Freq(t, r−) = 0 respectively. The constraint C1 is antimonotone w.r.t. ≥g (i.e. t′ ≥g t and t satisfies C1 ⇒ t′ satisfies C1). The constraint C2 is monotone w.r.t. ≥g (i.e. t′ ≥g t and t′ satisfies C2 ⇒ t satisfies C2). The version space is defined as follows: VS(r) = {t ∈ CL(r) | t is consistent}.

In RDBMSs provided with OLAP functionalities, such as IBM DB2 or Microsoft SQL Server, VS(r) can be computed by using the Group By Cube operator [12] as follows:

    SELECT A1, ..., An FROM r+
    GROUP BY CUBE (A1, ..., An)
    HAVING COUNT(*) = |r+|
    MINUS
    SELECT A1, ..., An FROM r−
    GROUP BY CUBE (A1, ..., An);

We assume that |r+| is a constant given by the user. Due to the underlying Cube-By operations, it is obvious that this approach is especially time and space consuming, thus computing borders can be of great interest.

7.2 Boundary Sets

The version spaces are usually represented by the boundary sets S and G encompassing the most specific and the most general consistent tuples respectively. H. Hirsh [14] shows that finite sets satisfying the convexity property can be represented by boundary sets. The following proposition shows that the version space of a categorical database relation is a convex space of the cube lattice and therefore preserves the representation through the sets S and G.


Proposition 6 The version space VS(r) is a convex space of the cube lattice CL(r) with the upper set S = max≥g(VS(r)) and the lower set G = min≥g(VS(r)). H. Hirsh shows that if the concept language used is conjunctive, which is the case of cube lattices, then |S| = 1 [15].
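The notions of consistent tuple, version space and boundary sets can be illustrated by brute force on a very small relation. The sketch below is not the algorithm proposed in the paper: it simply enumerates the cube lattice, keeps the tuples satisfying C1 and C2, and extracts the most specific and most general ones. The data are the four EnjoySport training examples from Mitchell's textbook [22], used here as stand-in data under the assumption that they are representative; the tuple <∅, ..., ∅> is left out because it cannot satisfy C1 when r+ is not empty.

```python
from itertools import product

ALL = "ALL"


def generalizes(t, u):
    """Return True when t >=_g u."""
    return all(a == ALL or a == b for a, b in zip(t, u))


def cube_lattice(rows):
    """All tuples over (Dim(A) ∪ {ALL}); <∅, ..., ∅> is deliberately omitted."""
    domains = [sorted(set(col)) + [ALL] for col in zip(*rows)]
    return list(product(*domains))


def is_consistent(t, positives, negatives):
    """C1: t generalizes every positive tuple; C2: t generalizes no negative one."""
    return (all(generalizes(t, p) for p in positives)
            and not any(generalizes(t, n) for n in negatives))


def version_space(positives, negatives):
    lattice = cube_lattice(positives + negatives)
    return [t for t in lattice if is_consistent(t, positives, negatives)]


def most_specific(vs):
    """Boundary set S: consistent tuples generalizing no other consistent tuple."""
    return [t for t in vs if not any(u != t and generalizes(t, u) for u in vs)]


def most_general(vs):
    """Boundary set G: consistent tuples generalized by no other consistent tuple."""
    return [t for t in vs if not any(u != t and generalizes(u, t) for u in vs)]


# Stand-in data: Sky, AirTemp, Humidity, Wind, Water, Forecast.
positives = [
    ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),
    ("Sunny", "Warm", "High", "Strong", "Warm", "Same"),
    ("Sunny", "Warm", "High", "Strong", "Cool", "Change"),
]
negatives = [("Rainy", "Cold", "High", "Strong", "Warm", "Change")]

vs = version_space(positives, negatives)
S = most_specific(vs)
G = most_general(vs)
print(len(vs))
print(S)
print(G)
```

Every tuple lying between a member of G and the member of S is itself consistent on this data, which is the convexity stated in Proposition 6; the enumeration is exponential in the number of dimensions and is only meant for illustration.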

Let us suppose that the relation does not encompass erroneous data. Under such an assumption, the boundary sets S and G correctly classify unseen tuples t ∈ CL(r), as follows:

\[ t[C] = \begin{cases} \text{'+'} & \text{iff } s \geq_g t \\ \text{'−'} & \text{iff } \forall g \in G,\ g \not\geq_g t \\ \text{unknown} & \text{otherwise.} \end{cases} \]

7.3 Novel Characterization of Boundary Sets

In this section, we propose a novel characterization of version space borders. The major interest of those boundary sets is that they can be computed more efficiently than previous representations of borders [22]. By using the concepts of cube transversal and cube closure, we identify relationships between, on the one hand, negative tuples and G and, on the other hand, S and G. Since all consistent tuples satisfy the constraint C1 (Freq(t, r+) = 1), each tuple of S must generalize the closure of the tuple <ALL, ..., ALL> over r+. Moreover, consistent tuples satisfy the constraint C2 (Freq(t, r−) = 0), thus consistent tuples must specialize cTr(r−). VS(r) is a convex set which has a single specific tuple and each tuple of VS(r) has the same frequency over r (∀t ∈ VS(r), Freq(t, r) = |r+|/|r|), thus VS(r) is a closed-equivalence class (Cf. definition 17). This result is stated by theorem 8.

Theorem 8 Let VS(r) be a version space of a categorical database relation r = r+ ∪ r−, then:
1. G = cTr(r−, C(<ALL, ..., ALL>, r+)) on CL(r+)
2. S = {C(g, r)}, g ∈ G

7.4 Finding Boundary Sets

The standard algorithm finding boundary sets of version space is the Candidate-Elimination Algorithm (CEA) [22]. It discovers S and G simultaneously. In this section, we introduce the algorithm VSM (Version Space Mining) which is fundamentally different from CEA. It finds G and S sequentially, using cube transversal and connection. Firstly, it computes the closure of the tuple <ALL, ..., ALL> over r+. Then it finds G by computing the cube transversals of the complement of r− on CL(r+). These cube transversals are constrained with the previous closed tuple. Finally VSM mines S by using the cube connection. Computing the set G is critical because when fitting in the classical groundwork, which is the powerset lattice, its size grows exponentially with respect to the number of binary attributes (i.e. the number of all possible values, |E|) [14]. The size of G is bounded by \( \binom{|E|}{|E|/2} \), which is asymptotic to \( \sqrt{2/\pi}\; 2^{|E|} / \sqrt{|E|} \). Fortunately, when setting the cube lattice as the search space to be explored, this bound does not hold because the size of the largest level in the cube lattice is bounded by \( \binom{|D|}{|D|/2} \), which is asymptotic to \( \sqrt{2/\pi}\; 2^{|D|} / \sqrt{|D|} \) [4]. Thus, mining boundary sets of version spaces in the cube lattice framework is tractable because |D| is a constant, and the number of dimensions is incomparably smaller than the number of possible values of all the dimensions (some attributes vs. several thousands or millions of values).

Alg. 3 VSM Algorithm
Input: r+, r−
Output: S, G
1: if r+ = {∅} then r+ := {}
2: if r− = {∅} then r− := {}
3: S′ := {C(<ALL, ..., ALL>, r+)}
4: if S′ = {} then exit
5: if S′ is a cube transversal of r− on CL(r+) then
6:   G := cTr(r−, S′) on CL(r+) \\ use MCTR algorithm
7:   if G = {} then
8:     S = G = {} and exit
9:   else
10:     S := {C(g, r)}, g ∈ G
11:   end if
12: else
13:   S = G = {} and exit
14: end if
15: return S, G

Theorem 8 proves the correctness of the VSM algorithm. Its complexity is similar to the one of the CTR algorithm. VSM complexity is exactly \( |D| (|r^+| + 2^{|D|} * |r^-| + 2|r|) \), or approximately \( O(2^{|D|} |r| * |D|) \), which improves the complexity of the Candidate-Elimination Algorithm [22]: \( O(|S| * |G| * |r| + |S|^2 * |r^+| + |G|^2 * |r^-|) \).

Example 21 - Let us consider the following categorical database relation: r+ = {t ∈ r | t[EnjoySport] = 'Yes'} and r− = {t ∈ r | t[EnjoySport] = 'No'}. The three steps of the VSM algorithm are the following:
1. Compute S′ = {C(<ALL, ..., ALL>, r+)} = {}
2. Identify the minimal cube transversals of r− on CL(r+) which generalize S′, cTr(r−, S′) = {, };
3. Compute the cube closure of one of the tuples of G on r: S = {}

8. CONCLUSION

The presented work is a contribution to the foundation of data mining and data warehousing. It is a cross-fertilization between the fields of discrete mathematics, databases and machine learning. We introduce the concepts of the cube transversals and cube closures of a categorical database relation, and propose a merging algorithm for mining minimal cube transversals. These concepts are applied to supervised classification using version spaces, studied for many years in the field of machine learning. However, to the best of the authors' knowledge, it is the first time that the problem of mining version spaces of categorical database relations has been introduced and characterized using cube transversals and cube closures. This novel characterization makes it possible to improve T. Mitchell's work on the computational complexity of mining version space borders.

In the spirit of the quotient cube, we propose the new concept of closed cube lattice which is the most reduced summary of a datacube w.r.t. COUNT, SUM. We state the isomorphisms, on the one hand, between the closed cube lattice and the concept lattice, for which many fast algorithms are provided, and, on the other hand, between the closed cube lattice and the quotient cube lattice w.r.t. COUNT, SUM. Finally we characterize the covering graph of the quotient cube w.r.t. COUNT by making use of cube transversals. Such a covering graph is a succinct summary of multidimensional associations and can be used as a visualization tool. Defining set operations on constrained cube lattices (convex spaces) is an interesting direction for future work. It could be a basis for providing a convex space algebra in the cube lattice framework (with arbitrary monotone and/or antimonotone constraints given in [24]).

Acknowledgments
We would like to thank Laks V. S. Lakshmanan and Jian Pei for their fruitful and quick answers about the Quotient Cube.

9. REFERENCES

[1] Y. Bastide, N. Pasquier, R. Taouil, G. Stumme, and L. Lakhal. Mining Minimal Non-redundant Association Rules Using Frequent Closed Itemsets. In Proceedings of the 1st International Conference on Computational Logic, CL, pages 972–986, 2000.
[2] C. Berge. Hypergraphs: combinatorics of finite sets. North-Holland, Amsterdam, 1989.
[3] C. Carpineto and G. Romano. A Lattice Conceptual Clustering System and Its Application to Browsing Retrieval. In Machine Learning, volume 24(2), pages 95–122, 1996.
[4] A. Casali, R. Cicchetti, and L. Lakhal. Cube Lattices: a Framework for Multidimensional Data Mining. In Proceedings of the 3rd SIAM International Conference on Data Mining, SDM, pages 304–308, 2003.
[5] A. Casali, R. Cicchetti, and L. Lakhal. Lattice-Based Discovery of Semantics from Datacubes. Technical report, Université de la Méditerranée, 2003.
[6] A. Casali, R. Cicchetti, and L. Lakhal. Mining Concise Representations of Frequent Multidimensional Patterns. In Proceedings of the 11th International Conference on Conceptual Structures, ICCS, 2003.
[7] G. Dong, J. Han, J. Lam, J. Pei, and K. Wang. Multi-Dimensional Constrained Gradients in Data Cubes. In Proceedings of 27th International Conference on Very Large Data Bases, VLDB, pages 321–330, Italy, 2001.
[8] T. Eiter and G. Gottlob. Identifying The Minimal Transversals of a Hypergraph and Related Problems. In SIAM Journal on Computing, volume 24(6), pages 1278–1304, 1995.
[9] B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer, 1999.
[10] J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. In Data Mining and Knowledge Discovery, volume 1(1), pages 29–53, 1997.
[11] D. Gunopulos, H. Mannila, R. Khardon, and H. Toivonen. Data mining, hypergraph transversals, and machine learning. In Proceedings of the 16th Symposium on Principles of Database Systems, PODS, pages 209–216, 1997.
[12] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2001.
[13] J. Han, J. Pei, G. Dong, and K. Wang. Efficient Computation of Iceberg Cubes with Complex Measures. In Proceedings of the International Conference on Management of Data, SIGMOD, pages 441–448, 2001.
[14] H. Hirsh. Theoretical Underpinnings of Version Spaces. In Proceedings of the 12th International Joint Conference on Artificial Intelligence, IJCAI, pages 665–670, 1991.
[15] H. Hirsh. The Computational Complexity of the Candidate-Elimination Algorithm. Technical report, Rutgers University, 1992.
[16] L. Lakshmanan, J. Pei, and J. Han. Quotient Cube: How to Summarize the Semantics of a Data Cube. In Proceedings of the 28th International Conference on Very Large Databases, VLDB, pages 778–789, 2002.
[17] S. Lopes, J. Petit, and L. Lakhal. Efficient Discovery of Functional Dependencies and Armstrong Relations. In Proceedings of the 7th International Conference on Extending Database Technology, EDBT, pages 350–364, 2000.
[18] H. Lu and H. Liu. Decision Tables: Scalable Classification Exploring RDBMS Capabilities. In Proceedings of the 26th International Conference on Very Large Databases, VLDB, pages 373–384, 2000.
[19] H. Mannila and K. Räihä. The Design of Relational Databases. Addison Wesley, 1994.
[20] H. Mannila and H. Toivonen. Levelwise Search and Borders of Theories in Knowledge Discovery. In Data Mining and Knowledge Discovery, volume 1(3), pages 241–258, 1997.
[21] E. Mephu Nguifo and P. Njiwoua. Using Lattice-Based Framework as a Tool for Feature Extraction. In Proceedings of the 10th European Conference on Machine Learning, ECML, pages 304–309, 1998.
[22] T. M. Mitchell. Machine learning. McGraw-Hill Series in Computer Science, 1997.
[23] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In Proceedings of the 7th International Conference on Database Theory, ICDT, pages 398–416, 1999.
[24] J. Pei and J. Han. Constrained Frequent Pattern Mining: A Pattern-Growth View. In SIGKDD Explorations, volume 4(1), pages 31–39, 2002.
[25] J. Pei, J. Han, and R. Mao. CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets. In Workshop on Research Issues in Data Mining and Knowledge Discovery, DMKD, pages 21–30, 2000.
[26] J. Pfaltz and R. Jamison. Closure systems and their structure. In Information Sciences, volume 139(3-4), pages 275–286, 2001.
[27] G. Stumme, R. Taouil, Y. Bastide, N. Pasquier, and L. Lakhal. Computing Iceberg Concept Lattices with Titanic. In Data and Knowledge Engineering, volume 42(2), pages 189–222, 2002.
[28] T. Calders, R. T. Ng, and J. Wijsen. Searching for Dependencies at Multiple Abstraction Levels. In ACM Transactions on Database Systems, ACM TODS, volume 27(3), pages 229–260, 2002.
[29] A. Tung, H. Lu, J. Han, and L. Feng. Efficient Mining of Intertransaction Association Rules. In IEEE Transactions on Knowledge and Data Engineering, TKDE, volume 15(1), pages 43–56, 2003.
[30] W. Wang, H. Lu, J. Feng, and J. Yu. Condensed Cube: An Effective Approach to Reducing Data Cube Size. In Proceedings of the 18th International Conference on Data Engineering, ICDE, pages 213–222, 2002.
[31] M. Zaki and C. Hsiao. CHARM: An Efficient Algorithm for Closed Itemset Mining. In Proceedings of the 2nd SIAM International Conference on Data Mining, 2002.
[32] M. J. Zaki. Generating non-redundant association rules. In Proceedings of the 6th International Conference on Knowledge Discovery and Data Mining, KDD, pages 34–43, 2000.