2014 IEEE International Conference on Systems, Man, and Cybernetics October 5-8, 2014, San Diego, CA, USA

Enhancing the mining top-rank-k frequent patterns

Bac Le1, Bay Vo2,3,*, Quyen Huynh-Thi-Le1, Tuong Le2,3

1 Department of Computer Science, University of Science, VNU, Ho Chi Minh, Viet Nam
[email protected], [email protected]
2 Division of Data Science, Ton Duc Thang University, Ho Chi Minh, Viet Nam
3 Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh, Viet Nam
{vodinhbay, lecungtuong}@tdt.edu.vn

Abstract — Frequent pattern mining generates many candidate patterns, which consumes a great deal of memory and mining time. Moreover, in real applications only a small number of frequent patterns are actually used. The problem of mining top-rank-k frequent patterns (TRFPs) has therefore become an interesting topic in recent years. This paper proposes the iNTK algorithm for mining TRFPs. The algorithm employs the N-list structure, generated from a PPC-tree, to reduce memory usage, and uses the subsume concept to speed up the mining process. Experimental results show that iNTK outperforms NTK in terms of both mining time and memory usage.

Keywords—data mining; frequent pattern mining; top-rank-k frequent patterns; N-list.

1 INTRODUCTION

Frequent pattern mining finds itemsets, subsequences, or substructures that appear in large datasets with frequency no less than a given threshold. As one of several essential data mining tasks, mining frequent patterns has been studied extensively in the literature and plays an essential role in many important data mining tasks such as mining association rules [3, 16, 18], sequential patterns [2, 14], partial periodicity [9], classification [11-13], etc. Since the first introduction of frequent pattern mining [3], various algorithms [1, 9, 20-21] have been proposed to discover frequent patterns efficiently. The mainstream framework of frequent pattern mining is based on a minimum support threshold, which raises two problems [10] that may hinder its widespread use. First, setting the minimum support is quite subtle: too low a threshold may generate a huge number of patterns, whereas too high a threshold may generate no answers. Second, frequent pattern mining often produces a large number of patterns (and an even larger number of mined rules), and mining a long pattern unavoidably generates an exponential number of subpatterns due to the downward closure property of the mining process. Therefore, Deng et al. [4] proposed mining top-rank-k frequent patterns (TRFPs). The FAE [4] and VTK [8] algorithms were proposed for mining TRFPs. Recently, Deng [6] also introduced NTK for mining TRFPs based on the idea of PPC-trees (Pre-order Post-order Code trees), an FP-tree-like structure. NTK is effective because it represents patterns by Node-lists, a clever data structure that has proven very useful in frequent pattern mining [5, 7, 17, 19]. The experimental results in [6] show that NTK is more effective than FAE and VTK. In addition, Vo et al. [19] introduced the NSFI algorithm for mining frequent patterns, which uses the N-list and subsume index concepts to improve mining time and memory usage.

This paper proposes an efficient method for mining TRFPs called iNTK (an improved version of NTK). The main contributions of this paper are: (i) the use of the N-list structure with an improved N-list intersection function, and (ii) the use of the subsume index concept to reduce the runtime and memory usage of mining TRFPs. The remainder of the paper is organized as follows. Section 2 introduces the basic concepts. Section 3 develops the iNTK algorithm for mining TRFPs. Section 4 studies the performance of iNTK and NTK. Section 5 summarizes our study and points out some future research issues.

2 PROBLEM DEFINITION

978-1-4799-3840-7/14/$31.00 ©2014 IEEE

2.1 Problem of mining top-rank-k patterns

Deng et al. [4] described the problem of mining TRFPs as follows. Let I = {i1, i2, ..., im} be a set of items and DB = {T1, T2, ..., Tn} be a set of transactions, where each Ti (1 ≤ i ≤ n) is a transaction that has a unique identifier and contains a set of items. Given a pattern X and a transaction T, we say T contains X if and only if X ⊆ T.

TABLE I. THE EXAMPLE DATASET (DBE)

| Transaction | Items | Ordered frequent items |
|---|---|---|
| 1 | a, b | a, b |
| 2 | a, b, c, d | c, a, b, d |
| 3 | a, c, e | c, a, e |
| 4 | a, b, c, e | c, a, b, e |
| 5 | c, d, e, f | c, d, e |
| 6 | c, d | c, d |

Definition 1 (The support of a pattern). Given a DB and a pattern X (⊆ I), the support of X (SUPX) in DB is the number of transactions containing X.

Definition 2 (The rank of a pattern). Given a DB and a pattern X (⊆ I), the rank of X (RX) is defined as RX = |{SUPY | Y ⊆ I, Y ≠ ∅ and SUPY ≥ SUPX}|, i.e., the number of distinct support values that are no less than SUPX.

Definition 3 (TRFPs). Given a DB and a threshold k, a pattern X (⊆ I) belongs to the set of TRFPs (TRk) if and only if RX ≤ k.

Given a DB and a threshold k, TRFP mining finds the set of frequent patterns whose ranks are no greater than k. That is, TRk = {X | X ⊆ I and RX ≤ k}.

Example 1. The dataset DBE in Table 1 (the first and second columns) is used throughout the article. According to Definition 1, SUP{c} = 5 because the five transactions 2, 3, 4, 5 and 6 contain c. Table 2 shows the ranks and supports of all patterns in DBE. According to Table 2, SUP{c} is the largest support; therefore, R{c} = 1.
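Definitions 1-3 can be made concrete with a short sketch over DBE (Python is our choice here; the function names sup and rank are ours, not the paper's). The rank is computed naively, by enumerating every non-empty pattern:

```python
# A minimal sketch of Definitions 1-3 on the example dataset DBE.
from itertools import chain, combinations

DBE = [{"a", "b"}, {"a", "b", "c", "d"}, {"a", "c", "e"},
       {"a", "b", "c", "e"}, {"c", "d", "e", "f"}, {"c", "d"}]

def sup(pattern):
    """SUP_X: the number of transactions containing X (Definition 1)."""
    return sum(1 for t in DBE if pattern <= t)

def rank(pattern):
    """R_X: the number of distinct support values >= SUP_X, taken over
    all non-empty patterns with non-zero support (Definition 2)."""
    items = set().union(*DBE)
    all_patterns = chain.from_iterable(
        combinations(sorted(items), r) for r in range(1, len(items) + 1))
    supports = {sup(frozenset(p)) for p in all_patterns
                if sup(frozenset(p)) > 0}
    return len({s for s in supports if s >= sup(pattern)})
```

With this sketch, sup({"c"}) gives 5 and rank({"c"}) gives 1, matching Example 1 and Table 2.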


TABLE II. THE RANKS AND SUPPORTS OF ALL PATTERNS FOR DBE

| Rank | Support | Patterns |
|---|---|---|
| 1 | 5 | {c} |
| 2 | 4 | {a} |
| 3 | 3 | {ca}, {ce}, {d}, {cd}, {b}, {ab}, {e} |
| 4 | 2 | {cea}, {ae}, {cb}, {cba} |
| 5 | 1 | {cdba}, {ceba}, {cda}, {cfed}, {ceb}, {cfe}, {dfe}, {ced}, {df}, {adb}, {cdb}, {cfd}, {de}, {ef}, {aeb}, {ad}, {bd}, {f}, {be}, {cf} |

2.2 N-list structure

Deng [6] presented the PPC-tree, an FP-tree-like structure, together with its construction algorithm and the N-list structure, as follows.

Definition 4. A PPC-tree is a tree structure with the following features: (i) it consists of one root labeled "null" and a set of item prefix subtrees as the children of the root; (ii) each node consists of five fields: item-name, count, children-list, pre-order, and post-order. The value of item-name registers which item the node represents. The value of count registers the number of transactions represented by the portion of the path reaching the node. The value of children-list registers all children of the node. The values of pre-order and post-order are the pre-order and post-order ranks of the node. The PPC-tree construction algorithm [6] is re-presented in Fig. 1.

Input: A transaction dataset DB
Output: A PPC-tree
Method: Construct-PPC-tree
1. Scan DB to find all items and their supports.
2. Sort all items in support-descending order. If the supports of some items are equal, the order among them can be assigned arbitrarily.
3. Create the root of a PPC-tree, Tr, and label it "null".
4. For each transaction Trans in DB do
5.   Select the items in Trans and sort them in support-descending order. Let the sorted item list in Trans be P.
6.   Call Insert_Tree(P, Tr).
7. Scan the PPC-tree to generate the pre-order and post-order values of each node by pre-order and post-order traversals. (Pre-order traversal visits the root node, then traverses the left subtree, then the right subtree; post-order traversal traverses the left subtree, then the right subtree, and finally visits the root node.)

Function Insert_Tree(P, Tr)
1. Let p be the first element of P and set P = P \ p.
2. If Tr has a child N such that N.item-name = p, then N.count++.
3. Else create a new node N with N.count = 1 and N.item-name = p, and add it to Tr's children-list.
4. If P is nonempty, call Insert_Tree(P, N).

Figure 1. PPC-tree construction algorithm
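The construction of Fig. 1 can be rendered as a small Python sketch (class and variable names are our own choices, not the paper's). Note one assumption: unlike the third column of Table 1, we keep the infrequent item f of transaction 5 in the tree, since the paper's N-lists in Fig. 3 include NL{f}:

```python
# A runnable sketch of the Construct-PPC-tree algorithm of Fig. 1.
from collections import Counter

class PPCNode:
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, []
        self.pre = self.post = -1

def build_ppc_tree(db):
    freq = Counter(i for t in db for i in t)
    root = PPCNode(None)                       # step 3: root labeled "null"
    for trans in db:
        # step 5: sort transaction items in support-descending order
        # (ties broken lexically; the paper allows any tie-break)
        p = sorted(trans, key=lambda i: (-freq[i], i))
        node = root
        for item in p:                         # Insert_Tree, iteratively
            child = next((c for c in node.children if c.item == item), None)
            if child is None:
                child = PPCNode(item)
                node.children.append(child)
            child.count += 1
            node = child
    # step 7: assign pre-order and post-order ranks
    pre = post = 0
    def walk(n):
        nonlocal pre, post
        n.pre = pre; pre += 1
        for c in n.children:
            walk(c)
        n.post = post; post += 1
    walk(root)
    return root, freq

DBE = [["a", "b"], ["a", "b", "c", "d"], ["a", "c", "e"],
       ["a", "b", "c", "e"], ["c", "d", "e", "f"], ["c", "d"]]
root, freq = build_ppc_tree(DBE)
```

Running this on DBE reproduces the PP-codes quoted later in the text, e.g. the c node gets ⟨(3,10):5⟩ and the first a node gets ⟨(1,1):1⟩.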

Figure 2. The PPC-tree for DBE

Definition 5 (PP-code). For each node Ni in a PPC-tree, we call Pi = ⟨(Ni.pre-order, Ni.post-order): Ni.count⟩ the PP-code of Ni.

Property 1 (Ancestor-descendant relationship of PP-codes). Given two PP-codes Pi and Pj, Pi is an ancestor of Pj if and only if Pi.pre-order < Pj.pre-order and Pi.post-order > Pj.post-order.

Example 3. Let P1 = ⟨(4,6):3⟩ and P2 = ⟨(7,3):2⟩. By Property 1, P1 is an ancestor of P2 because P1.pre-order = 4 < P2.pre-order = 7 and P1.post-order = 6 > P2.post-order = 3.

Definition 6 (The N-list of a 1-pattern). Given a PPC-tree, the N-list of a frequent 1-pattern A, denoted NL(A), is the sequence of the PP-codes of all nodes registering the item A in the PPC-tree, arranged in ascending order of their pre-order values.

Figure 3. The N-lists of all 1-patterns in Example 1

Example 4. The N-list of b is NL{b} = {⟨(2,0):1⟩, ⟨(5,4):2⟩}. Fig. 3 shows the N-lists of all frequent items in Example 1.

Definition 7 (The N-list of a k-pattern). Let the N-list of P1 = {ix i1 i2 ... ik-2} be NL1 = {⟨(x11,y11):z11⟩, ⟨(x12,y12):z12⟩, ..., ⟨(x1m,y1m):z1m⟩} and the N-list of P2 = {iy i1 i2 ... ik-2} be NL2 = {⟨(x21,y21):z21⟩, ⟨(x22,y22):z22⟩, ..., ⟨(x2n,y2n):z2n⟩}. The N-list of P = {ix iy i1 i2 ... ik-2}, denoted NLP, is the sequence of PP-codes sorted in ascending order of pre-order and generated by the following rule: for any ⟨(x1p,y1p):z1p⟩ ∈ NL1 (1 ≤ p ≤ m) and ⟨(x2q,y2q):z2q⟩ ∈ NL2 (1 ≤ q ≤ n), if ⟨(x1p,y1p):z1p⟩ is an ancestor of ⟨(x2q,y2q):z2q⟩, then ⟨(x1p,y1p):z2q⟩ ∈ NLP. As shown in Fig. 3, NL{c} = {⟨(3,10):5⟩} and NL{f} = {⟨(11,7):1⟩}. According to Definition 7, NL{cf} = {⟨(3,10):1⟩}. Fig. 4 shows the whole procedure.
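Property 1 and the 2-pattern case of Definition 7 can be sketched as follows, representing each PP-code as a ((pre, post), count) tuple. The names are ours, and the quadratic double loop is for illustration only; the paper's linear-time merge is the function of Fig. 5:

```python
# A small sketch of Property 1 and Definition 7 (2-pattern case);
# the PP-code values below are taken from Fig. 3 / Example 3.

def is_ancestor(p, q):
    """Property 1: p is an ancestor of q iff
    p.pre-order < q.pre-order and p.post-order > q.post-order."""
    return p[0][0] < q[0][0] and p[0][1] > q[0][1]

def nlist_2pattern(nl1, nl2):
    """Definition 7 for 2-patterns: keep the ancestor's (pre, post)
    with the descendant's count, sorted by pre-order."""
    out = []
    for p in nl1:
        for q in nl2:
            if is_ancestor(p, q):
                out.append((p[0], q[1]))
    return sorted(out)

NL_c = [((3, 10), 5)]
NL_f = [((11, 7), 1)]
```

Here nlist_2pattern(NL_c, NL_f) yields [((3, 10), 1)], matching NL{cf} = {⟨(3,10):1⟩} in the text.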

Example 2. First, the items in each transaction are sorted in descending order of frequency; the dataset can thus be rewritten as the third column of Table 1. Second, the PPC-tree (Fig. 2) is generated for DBE.

Figure 4. The N-list of cf in Example 1


Property 2. Given the N-list of a pattern X, NLX = {⟨(x1,y1):z1⟩, ⟨(x2,y2):z2⟩, ..., ⟨(xl,yl):zl⟩}, SUPX = z1 + z2 + ··· + zl.

Example 5. We have NL{a} = {⟨(1,1):1⟩, ⟨(4,6):3⟩}. Hence, SUP{a} = 1 + 3 = 4.

Property 2 means that the support of a pattern can be obtained by simply scanning its N-list, without scanning the whole transaction dataset. This greatly reduces the time for computing pattern supports.

2.3 The subsume index of frequent 1-patterns

To reduce the search space, the concept of the subsume index was proposed in [15]. It is based on the function GX = {T.ID | T ∈ DB and X ⊆ T}, where T.ID is the ID of transaction T; GX is the set of IDs of the transactions that contain all items i ∈ X.

Example 6. We have G{c} = {2, 3, 4, 5, 6} because c occurs in transactions 2, 3, 4, 5 and 6.

Definition 8 [15]. The subsume index of a frequent 1-pattern A, denoted by SSA, is defined as SSA = {B ∈ I1 | GA ⊆ GB}.

Example 7. We have G{e} = {3, 4, 5} and G{c} = {2, 3, 4, 5, 6}. Because G{e} ⊆ G{c}, c ∈ SS{e}.

In [15], the following property of the subsume index was also presented, which can be used to speed up the frequent pattern mining process.

Property 3 [15]. Let the subsume index of a pattern X be {a1, a2, ..., am}. The support of each pattern obtained by combining X with one of the 2^m - 1 nonempty subsets of {a1, a2, ..., am} is equal to SUPX.

Example 8. According to Example 7, SS{e} = {c}, so the only nonempty subset of SS{e} is {c}. By Property 3, the support of the pattern obtained by combining this subset with e equals SUP{e}; in this case, SUP{ec} = SUP{e} = 3. Moreover, SUPXe = SUPXec for any pattern X. For example, {ae} is a frequent pattern with SUP{ae} = 2, so {aec} is a frequent pattern with SUP{aec} = 2.

3 THE INTK ALGORITHM

3.1 Lemma of mining TRFPs

Lemma 1. If X is not in TRk, then no pattern Y such that X ⊆ Y can be in TRk.

Proof. For any transaction T, if Y ⊆ T then X ⊆ T. By Definition 1, SUPX ≥ SUPY, and by Definition 2, RX ≤ RY. Therefore, if X is not in TRk, then Y is not in TRk either.

Lemma 1 establishes the anti-monotone property of TRFPs. We employ this property to discover all TRFPs from short patterns to long patterns using an iterative, level-wise search [1].

3.2 The N-list intersection function

Vo et al. [17, 19] proposed an improved N-list intersection function (Fig. 5) that computes the intersection of two N-lists in O(n + m) time, where n and m are the lengths of the two N-lists. (The printed pseudocode omitted the two else branches; they are restored below from the description in [17, 19].)

Function NL_intersection
Input: N-list NL1 and N-list NL2
Output: the resulting N-list and its support
1. NL3 ← ∅
2. Let i = 0, j = 0 and SUP = 0
3. While i < |NL1| and j < |NL2| do
4.   If NL1[i].pre-order < NL2[j].pre-order then
5.     If NL1[i].post-order > NL2[j].post-order then
6.       If |NL3| > 0 and the last element of NL3 has a pre-order value equal to NL1[i].pre-order then
7.         Increase the count of the last element of NL3 by NL2[j].count
8.       Else
9.         Add the tuple ⟨NL1[i].pre-order, NL1[i].post-order, NL2[j].count⟩ to NL3
10.      SUP = SUP + NL2[j++].count
11.    Else i++
12.  Else j++
13. Return NL3 and SUP

Figure 5. The improved N-list intersection function
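Assuming the same ((pre, post), count) representation of PP-codes used earlier, the merge of Fig. 5 can be written as the following Python sketch (variable names are ours). The two indices only ever advance, so the loop runs at most n + m times:

```python
# A Python rendering of the improved N-list intersection of Fig. 5.
# Each PP-code is ((pre, post), count); nl1 holds the ancestor side.

def nl_intersection(nl1, nl2):
    nl3, sup = [], 0
    i = j = 0
    while i < len(nl1) and j < len(nl2):
        (pre1, post1), _ = nl1[i]
        (pre2, post2), cnt2 = nl2[j]
        if pre1 < pre2:
            if post1 > post2:          # nl1[i] is an ancestor of nl2[j]
                if nl3 and nl3[-1][0][0] == pre1:
                    # merge consecutive results with the same pre-order
                    nl3[-1] = (nl3[-1][0], nl3[-1][1] + cnt2)
                else:
                    nl3.append(((pre1, post1), cnt2))
                sup += cnt2
                j += 1                 # consume the descendant
            else:
                i += 1                 # nl1[i] precedes nl2[j] but is no ancestor
        else:
            j += 1                     # nl2[j] has no ancestor left in nl1
    return nl3, sup
```

For instance, intersecting NL{c} = [((3,10),5)] with NL{e} = [((7,3),1), ((8,5),1), ((10,8),1)] yields ([((3,10),3)], 3), consistent with SUP{ce} = 3 in Table 2.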

3.3 The subsume index associated with each frequent 1-pattern

Vo et al. [17, 19] characterized the subsume index of each frequent 1-pattern in terms of N-lists, as follows.

Property 4. Let A be a frequent 1-pattern. The subsume index of A is SSA = {B ∈ I1 | ∀Pi ∈ NLA, ∃Pj ∈ NLB such that Pj is an ancestor of Pi}.

Example 9. We have NL{c} = {⟨(3,10):5⟩} and NL{e} = {⟨(7,3):1⟩, ⟨(8,5):1⟩, ⟨(10,8):1⟩}. According to Property 4, ⟨(7,3):1⟩, ⟨(8,5):1⟩ and ⟨(10,8):1⟩ ∈ NL{e} are all descendants of ⟨(3,10):5⟩ ∈ NL{c}. Therefore, c ∈ SS{e}.

Property 5. Let A, B, C ∈ I1 be three frequent 1-patterns. If A ∈ SSB and B ∈ SSC, then A ∈ SSC.

Procedure Find_Subsume
1. For i ← 1 to |I1| - 1 do
2.   For j ← i - 1 downto 0 do
3.     If j ∈ I1[i].subsume then continue
4.     If CheckSubsume(I1[i].N-list, I1[j].N-list) = true then  // using Property 4
5.       Add I1[j].item-name and its index j to I1[i].subsume
6.       Add all elements of I1[j].subsume to I1[i].subsume  // using Property 5

Function CheckSubsume(N-list a, N-list b)
1. Let i = 0 and j = 0
2. While j < |a| and i < |b| do
3.   If b[i].pre-order < a[j].pre-order and b[i].post-order > a[j].post-order then j++
4.   Else i++
5. If j = |a| then return true
6. Return false

Figure 6. The subsume index generation procedure
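Property 4 can be sketched directly in Python (the names check_subsume, subsume_index, and the NLISTS dictionary, filled from Fig. 3, are our own). This sketch checks the ancestor condition with a quadratic scan for clarity, whereas the CheckSubsume of Fig. 6 does it with a linear merge:

```python
# B is in the subsume index of A iff every PP-code of NL(A)
# has an ancestor PP-code in NL(B) (Property 4).
# PP-codes are ((pre, post), count) tuples taken from Fig. 3.

def check_subsume(nla, nlb):
    """True iff each PP-code in nla has an ancestor in nlb."""
    return all(
        any(pb[0][0] < pa[0][0] and pb[0][1] > pa[0][1] for pb in nlb)
        for pa in nla)

NLISTS = {
    "c": [((3, 10), 5)],
    "a": [((1, 1), 1), ((4, 6), 3)],
    "e": [((7, 3), 1), ((8, 5), 1), ((10, 8), 1)],
}

def subsume_index(item):
    return {b for b in NLISTS
            if b != item and check_subsume(NLISTS[item], NLISTS[b])}
```

On these N-lists, subsume_index("e") returns {"c"}, matching Example 9, while subsume_index("a") and subsume_index("c") are empty.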

3.4 The iNTK algorithm

iNTK employs t-patterns to explore (t+1)-patterns. By using N-lists, iNTK does not need to scan the dataset repeatedly to obtain the supports of (t+1)-patterns. Furthermore, the subsume index reduces the runtime of candidate generation: only items not belonging to the subsume index of the current pattern are used to generate new candidates, so fewer candidates are generated than in the original algorithm. Using the subsume index also saves the memory otherwise needed to store the N-lists of patterns that are certainly in the TRFPs and are not used to create further candidates. The processing steps are listed below.


(1) Scan the PPC-tree and generate the N-lists of all 1-patterns.
(2) Find the subsume index of each 1-pattern.
(3) Find TRFPs among the 1-patterns and insert them into the top-rank-k table. The top-rank-k table contains patterns and their supports; all patterns with the same support are stored in the same entry, and the number of entries is not more than the threshold k. When a 1-pattern is inserted into the table, the patterns formed by combining it with each nonempty subset of its subsume index are also inserted, with the same rank as the 1-pattern.
(4) Use the 1-patterns in the top-rank-k table to generate candidate 2-patterns, combining a 1-pattern only with 1-patterns that do not belong to its subsume index. If the support of a candidate 2-pattern is not less than the smallest support in the top-rank-k table, the candidate is inserted into the table.
(5) When a 2-pattern is inserted into the top-rank-k table, use Property 5 to find the 2^m - 1 nonempty subsets of its subsume index; combining each subset with the 2-pattern yields patterns whose supports equal that of the 2-pattern, and these patterns are inserted at the same rank. After each insertion, the top-rank-k table is checked to ensure the number of entries is not more than k; if it is larger, the entries whose support is less than the k-th largest support are deleted from the table.
(6) Repeat steps 4 and 5, using the t-patterns in the top-rank-k table (produced by the candidate generation function) to generate candidate (t+1)-patterns, until no new candidate patterns can be generated.

Input: A transaction dataset DB and a threshold k
Output: a top-rank-k table, Tabk, which has a fixed number of entries; each entry contains the patterns of one rank
Method:
1. Call the Construct-PPC-tree algorithm to build the PPC-tree
2. Determine the N-lists of I1 (the frequent 1-patterns)
3. Call Find_Subsume to compute the subsume index of each item in I1
4. Find the 1-patterns whose rank is not larger than k; denote this set by TR1
5. Insert the 1-patterns, together with the patterns produced by combining each with the 2^m - 1 nonempty subsets of its subsume index, into Tabk with their supports
6. For (j = 2; TRj-1 ≠ ∅; j++)
7.   CRj ← Candidate_gen(TRj-1)
8.   For each C ∈ CRj, where C is generated by P1 (∈ TRj-1) and P2 (∈ TRj-1)
9.     (C.N-list, SUPC) ← NL_intersection(P1.N-list, P2.N-list)
10.    If SUPC is not less than the smallest support in Tabk, insert C with its support into Tabk
11.    If C.subsume is not empty, let S be the set of nonempty subsets of C.subsume
12.    For each subset s ∈ S, insert C combined with s, with support SUPC, into Tabk
13.    TRj ← TRj ∪ {C}

Procedure Candidate_gen(TRj-1)
1. CRj ← ∅
2. For each Cu ∈ TRj-1
3.   For each Cv ∈ TRj-1 (Cv ≠ Cu)
4.     If (Cv[1] ∉ Cu.subsume) and (Cu[1] ≠ Cv[1] ∧ Cu[2] = Cv[2] ∧ ... ∧ Cu[j-2] = Cv[j-2] ∧ Cu[j-1] = Cv[j-1]) then
5.       C ← Cu[1]Cv[1]Cu[2]...Cu[j-2]Cu[j-1]
6.       C.subsume.add(Cu.subsume)
7.       If each (j-1)-subset of C belongs to TRj-1 then
8.         CRj ← CRj ∪ {C}
9. Return CRj

Figure 7. The iNTK algorithm
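The top-rank-k table bookkeeping of steps (3)-(5) can be sketched as follows. This is a simplified illustration with our own names: patterns are plain strings, and the subsume-combination step is omitted; it only shows how entries are grouped by support, how the smallest-support threshold gates insertions, and how the table is trimmed back to k entries:

```python
# A sketch of the top-rank-k table: one entry per support value,
# at most k entries (the k highest supports) kept at any time.
from collections import defaultdict

class TopRankKTable:
    def __init__(self, k):
        self.k = k
        self.entries = defaultdict(set)    # support -> set of patterns

    def threshold(self):
        """Smallest support in the table (0 while fewer than k entries)."""
        return min(self.entries) if len(self.entries) >= self.k else 0

    def insert(self, pattern, sup):
        if sup < self.threshold():
            return False                   # cannot enter the top-rank-k
        self.entries[sup].add(pattern)
        if len(self.entries) > self.k:     # trim entries below the k-th support
            del self.entries[min(self.entries)]
        return True

    def rank_of(self, pattern):
        for r, s in enumerate(sorted(self.entries, reverse=True), start=1):
            if pattern in self.entries[s]:
                return r
        return None
```

With k = 2 and the supports of Table 2, inserting {c} (support 5) and {a} (support 4) fills the table; a candidate with support 3 is then rejected, while another pattern with support 4 still joins rank 2, mirroring the "not less than the smallest support" rule of step (4).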

4 EXPERIMENTAL RESULTS

This section reports the running time and memory usage of iNTK and NTK. Two datasets¹ are used in this section. Table 3 shows their characteristics, including the number of transactions (#Trans) and the number of items (#Items). All experiments were performed on a PC with an Intel Core 2 Duo 2.66 GHz CPU and 2 GB of memory, running Microsoft Windows 7. All programs were coded in MS/Visual C#.

TABLE III. STATISTICAL SUMMARY OF THE EXPERIMENTAL DATASETS

| Dataset | #Trans | #Items |
|---|---|---|
| Mushroom | 8,124 | 120 |
| Retail | 88,162 | 16,470 |

4.1 The mining time

Owing to their different requirements, the input data for the two algorithms differ slightly: NTK requires the Node-list structure, while iNTK requires the N-list structure and the subsume index of each frequent 1-pattern. Preparing this input takes iNTK more time than NTK. However, this procedure is called only once, and its result is reused to mine top-rank-k patterns for different values of k, so it does not significantly affect the efficiency of iNTK. Table 4 shows the time for this procedure. Figs. 8-9 show the mining time of iNTK and NTK on the experimental datasets for different values of the threshold k. The experimental results show that iNTK is faster than NTK, especially for large thresholds and dense datasets.

TABLE IV. THE CONVERTING TIME

| Dataset | Converting time for Node-list (sec) | Converting time for N-list and Find_Subsume (sec) |
|---|---|---|
| Mushroom | 0.275016 | 0.3920224 |
| Retail | 2715.031 | 3893.403 |

Figure 8. The mining time of iNTK and NTK for Mushroom dataset

¹ Downloaded from http://fimi.ua.ac.be/data/

Figure 9. The mining time of iNTK and NTK for Retail dataset

4.2 The memory usage

Using the subsume index concept reduces not only the mining time but also the memory usage. Figs. 10-11 show the amount of memory used by the two algorithms for different values of the threshold k. In general, iNTK uses less memory than NTK.

Figure 10. The memory usage of iNTK and NTK for Mushroom dataset

Figure 11. The memory usage of iNTK and NTK for Retail dataset

5 CONCLUSIONS AND FUTURE WORK

This paper presented iNTK, which uses the N-list structure and the subsume concept for mining TRFPs, thereby reducing the number of generated patterns. Extensive experiments show that iNTK outperforms NTK in terms of mining time and memory usage on various datasets. For future work, methods for mining top-rank-k compressed frequent patterns, such as maximal frequent patterns or closed frequent patterns, are interesting topics.

ACKNOWLEDGMENTS

This research was funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.05-2013.20.

REFERENCES

[1] Agrawal R., Srikant R. (1994). Fast algorithms for mining association rules. In VLDB'94, pp. 487-499.
[2] Agrawal R., Srikant R. (1995). Mining sequential patterns. In ICDE'95, pp. 3-14.
[3] Agrawal R., Imielinski T., Swami A. (1993). Mining association rules between sets of items in large databases. In SIGMOD'93, pp. 207-216.
[4] Deng Z.-H., Fang G. (2007). Mining top-rank-k frequent patterns. In ICMLC, pp. 851-856.
[5] Deng Z.-H., Wang Z. (2010). A new fast vertical method for mining frequent patterns. International Journal of Computational Intelligence Systems, 3(6), 733-744.
[6] Deng Z.-H. (2014). Fast mining top-rank-k frequent patterns by using Node-lists. Expert Systems with Applications, 41(4), 1763-1768.
[7] Deng Z.-H., Wang Z., Jiang J.J. (2012). A new algorithm for fast mining frequent itemsets using N-lists. SCIENCE CHINA Information Sciences, 55(9), 2008-2030.
[8] Fang G., Deng Z.-H. (2008). VTK: Vertical mining of top-rank-k frequent patterns. In FSKD'08, pp. 620-624.
[9] Han J., Pei J., Yin Y. (2000). Mining frequent patterns without candidate generation. In SIGMOD'00, pp. 1-12.
[10] Han J., Wang J., Lu Y., Tzvetkov P. (2002). Mining top-k frequent closed patterns without minimum support. In ICDM'02, pp. 211-218.
[11] Liu B., Hsu W., Ma Y. (1998). Integrating classification and association rule mining. In KDD'98, pp. 80-86.
[12] Nguyen T.T.L., Vo B., Hong T.-P., Thanh H.C. (2012). Classification based on association rules: A lattice-based approach. Expert Systems with Applications, 39(13), 11357-11366.
[13] Nguyen T.T.L., Vo B., Hong T.-P., Thanh H.C. (2013). CAR-Miner: An efficient algorithm for mining class association rules. Expert Systems with Applications, 40(6), 2305-2311.
[14] Pham T.T., Luo J., Hong T.-P., Vo B. (2014). An efficient method for mining non-redundant sequential rules using attributed prefix-trees. Engineering Applications of Artificial Intelligence, 32, 88-99.
[15] Song W., Yang B., Xu Z. (2008). Index-BitTableFI: An improved algorithm for mining frequent itemsets. Knowledge-Based Systems, 21, 507-513.
[16] Vo B., Hong T.-P., Le B. (2012). DBV-Miner: A dynamic bit-vector approach for fast mining frequent closed itemsets. Expert Systems with Applications, 39(8), 7196-7206.
[17] Vo B., Coenen F., Le T., Hong T.-P. (2013). A hybrid approach for mining frequent itemsets. In SMC'13, pp. 4647-4651.
[18] Vo B., Hong T.-P., Le B. (2013). A lattice-based approach for mining most generalization association rules. Knowledge-Based Systems, 45, 20-30.
[19] Vo B., Le T., Coenen F., Hong T.-P. (2014). Mining frequent itemsets using N-list and subsume concepts. International Journal of Machine Learning and Cybernetics. DOI: 10.1007/s13042-014-0252-2.
[20] Zaki M.J. (2000). Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering, 12(3), 372-390.
[21] Zaki M.J., Gouda K. (2003). Fast vertical mining using diffsets. In SIGKDD'03, pp. 326-335.
