The 14th Conference on Information Technology and Applications in Outlying Islands
A List-based Algorithm for Efficiently Updating the Discovered High-Utility Itemsets

Jerry Chun-Wei Lin1,2, Wensheng Gan1,2, Tzung-Pei Hong3,4, and Mu-En Wu5

1 Innovative Information Industry Research Center (IIIRC)
2 Shenzhen Key Laboratory of Internet Information Collaboration
School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School
HIT Campus Shenzhen University Town, Xili, Shenzhen 518055, P.R. China
3 Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan, R.O.C.
4 Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan, R.O.C.
5 Department of Mathematics, SooChow University
[email protected],
[email protected],
[email protected],
[email protected]

Abstract
High-utility itemset mining (HUIM) has recently been studied to mine high-utility itemsets (HUIs) from a transactional database by considering additional factors such as profit and quantity, which overcomes a limitation of traditional association-rule mining (ARM). The FUP-HUI-DEL and PRE-HUI-DEL algorithms were previously proposed to maintain the discovered high transaction-weighted utilization itemsets (HTWUIs) and HUIs, respectively, when transactions are deleted from the original database. Both, however, still require the original database to be rescanned when itemsets with small transaction-weighted utilization in the original database have to be maintained. In this paper, an efficient algorithm (HUI-list-DEL) is developed to maintain the built utility-list structure under transaction deletion. Based on this maintenance of the utility-list structure, the HUIs can be produced directly, without candidate generation or numerous database scans. Substantial experiments indicate that the proposed maintenance approach for transaction deletion significantly outperforms the previous approaches in terms of execution time, memory consumption, and scalability.

Keywords: High-utility itemset mining, Dynamic, Transaction deletion, Maintenance, List structure.

1. Introduction
The main purpose of Knowledge Discovery in Database (KDD) is to discover meaningful and useful information from a collection of data. Association-rule mining (ARM) [2, 3, 6, 9, 11] is the fundamental way to discover the binary relationships between itemsets. Most ARM algorithms consist of two phases: they first mine the frequent itemsets in a level-wise way based on a minimum support threshold and then produce the association rules based on a minimum confidence threshold. Thus, only the occurrence frequencies of the itemsets are considered, while other significant factors such as profit, weight, and quantity are ignored, which makes ARM insufficient to reveal the importance of the itemsets in real-life applications. To address this limitation of ARM, high-utility itemset mining (HUIM) [5, 21, 22] was proposed to reveal high-utility itemsets (HUIs). HUIM can be regarded as an extension of frequent itemset mining and has emerged as an important topic in recent years. Several algorithms have been proposed to mine the HUIs from a static database [5, 8, 15, 17, 18, 20, 21]. Because they assume a static database, a batch-mode procedure has to reprocess the updated database to discover the updated HUIs whenever transactions in the original database are changed. It is thus necessary to develop a maintenance approach to maintain and update the discovered HUIs in the dynamic situation. In the past, several algorithms have been proposed to maintain and update the discovered frequent itemsets of ARM while transactions are changed [7, 10, 12, 14]. Lin et al. then developed several algorithms [13, 16] to effectively maintain and update the discovered HTWUIs and HUIs with transaction deletion. Since the developed
2015 Conference on Information Technology and Applications in Outlying Islands
algorithms are processed in a level-wise way based on the Apriori-like approach, numerous computations are still required to handle the discovered HUIs level by level. In this paper, an efficient maintenance algorithm for transaction deletion is proposed to maintain and update the built utility-list structure, so that HUIs can be mined directly without candidate generation or extra database scans. Two pruning strategies are also developed to speed up the computations for mining HUIs. The major contributions are as follows.
1. Unlike the past algorithms for mining HUIs with transaction deletion, the proposed algorithm does not need to rescan the original database to produce the updated HUIs.
2. Two early pruning strategies can be used to terminate the depth-first search procedure, thus reducing the search space in the enumeration tree.
3. The estimated utility co-occurrence structure (EUCS) strategy is also applied in the proposed algorithm to avoid the numerous computations of the generate-and-test approach for 2-itemsets.

2. Related Works
High-utility itemset mining (HUIM) is an extension of frequent itemset mining that considers both the quantities and the profits of itemsets as interesting factors, deriving more valuable and meaningful information than traditional ARM. Chan et al. first presented the mining of top-k high-utility closed patterns with both positive and negative utilities [5]. Yao et al. developed an algorithm to efficiently mine HUIs based on the mathematical properties of utility constraints [21, 22]. Liu et al. developed a two-phase model [19] for discovering high-utility itemsets by means of the designed transaction-weighted downward closure (TWDC) property. Lin et al. also developed a high-utility pattern (HUP)-tree algorithm [15] that compresses the original database into a tree structure and uses an FP-growth-like algorithm to derive HUIs without candidate generation. The above approaches overestimate utilities through the upper bounds of the high transaction-weighted utilization itemsets (HTWUIs) under the two-phase model. Liu et al. presented the HUI-Miner algorithm [18] to directly produce HUIs without candidate generation. Fournier-Viger et al. extended the HUI-Miner algorithm with a novel strategy for efficiently mining HUIs based on the co-occurrences among 2-itemsets [8].

The above algorithms mine HUIs from a static database. In real-life situations, however, data changes over time through transaction insertion or transaction deletion. Lin et al. designed the FUP-HUI-DEL [16] and PRE-HUI-DEL [13] maintenance algorithms for handling transaction deletion, which maintain and update the discovered HUIs and HTWUIs based on the two-phase model [19]. Both the FUP-HUI-DEL and PRE-HUI-DEL algorithms, however, still require numerous computations to scan the original database for discovering HTWUIs. An additional database scan is also needed to determine which of the remaining HTWUIs are actual HUIs.
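The relation between utility and transaction-weighted utilization (TWU) that underlies the TWDC property can be illustrated with a minimal Python sketch. The profit table, the quantities, and the function names below are hypothetical and chosen only for illustration; they are not data or code from this paper.

```python
# Hypothetical toy data: unit profit per item, and TID -> {item: quantity}.
PROFIT = {"A": 5, "B": 2, "C": 1}
DB = {
    1: {"A": 1, "B": 2},
    2: {"B": 1, "C": 4},
    3: {"A": 2, "C": 1},
}


def transaction_utility(tx):
    """Total utility of one transaction (quantity times unit profit)."""
    return sum(qty * PROFIT[item] for item, qty in tx.items())


def utility(itemset, db):
    """Exact utility of an itemset: summed over transactions containing it."""
    total = 0
    for tx in db.values():
        if all(item in tx for item in itemset):
            total += sum(tx[item] * PROFIT[item] for item in itemset)
    return total


def twu(itemset, db):
    """TWU: sum of the utilities of all transactions containing the itemset."""
    return sum(
        transaction_utility(tx)
        for tx in db.values()
        if all(item in tx for item in itemset)
    )


if __name__ == "__main__":
    for itemset in [("A",), ("C",), ("A", "C")]:
        print(itemset, "utility =", utility(itemset, DB), "TWU =", twu(itemset, DB))
```

In this toy database the TWU of an itemset is never smaller than its utility, and the TWU of {A, C} is not larger than that of {A} or {C}. Two-phase approaches exploit this anti-monotone upper bound to prune candidates, at the cost of an extra scan to compute the exact utilities.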
3. Problem Statement
Given a transaction database $D$, let $TU^D$ be the total utility of $D$ and $\delta$ the minimum utility threshold. Suppose a set of transactions $d$ is deleted from $D$, and let $TU^d$ be the total utility of $d$. The problem of HUIM for transaction deletion is to find, in the updated database $(D - d)$, the complete updated set of the k-itemsets whose utilities are no less than $\delta \times (TU^D - TU^d)$. Thus, the purpose of HUIM for transaction deletion is to efficiently mine the updated HUIs in the updated database. Existing approaches to HUIM with transaction deletion can be divided into two categories: (1) batch-mode algorithms, which re-mine the database whenever transactions are deleted from the original database, and (2) maintenance strategies, which maintain and update the discovered information of HUIs or HTWUIs. In this paper, we propose an efficient maintenance algorithm for maintaining and updating the discovered HUIs and HTWUIs with transaction deletion.
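As a point of reference for the problem statement, the following Python sketch applies the definition directly in batch mode: it recomputes the threshold on the updated database and enumerates every itemset by brute force. It is only a baseline that illustrates the definition, not the proposed maintenance algorithm; the toy database, the threshold value, and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: unit profit per item, and TID -> {item: quantity}.
PROFIT = {"A": 5, "B": 2, "C": 1}
DB = {
    1: {"A": 1, "B": 2},
    2: {"B": 1, "C": 4},
    3: {"A": 2, "C": 1},
}


def transaction_utility(tx):
    """Total utility of one transaction."""
    return sum(qty * PROFIT[item] for item, qty in tx.items())


def huis_after_deletion(db, deleted_tids, delta):
    """Brute-force HUIs of (D - d) for a minimum utility threshold delta."""
    tu_all = sum(transaction_utility(tx) for tx in db.values())         # TU^D
    tu_del = sum(transaction_utility(db[tid]) for tid in deleted_tids)  # TU^d
    min_util = delta * (tu_all - tu_del)

    remaining = {tid: tx for tid, tx in db.items() if tid not in deleted_tids}
    items = sorted({item for tx in remaining.values() for item in tx})

    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            u = sum(
                sum(tx[item] * PROFIT[item] for item in itemset)
                for tx in remaining.values()
                if all(item in tx for item in itemset)
            )
            if u >= min_util:
                result[itemset] = u
    return min_util, result


if __name__ == "__main__":
    print(huis_after_deletion(DB, deleted_tids={2}, delta=0.5))
```

Because the enumeration is exponential in the number of items, this baseline is usable only on very small examples, which is the cost that a maintenance approach tries to avoid.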
[Figure 1, panels: utility-lists of the 1-itemsets {E}, {B}, {A}, and {C}]
4. Proposed Algorithm
For HUIM, the utility-list structure must be built in advance for the later maintenance process. When transactions are deleted from the original database, the proposed maintenance procedure is then performed to maintain and update the utility-list structure. The construction of the utility-list structure is described in Algorithm 1.
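Before Algorithm 1 can combine lists, the utility-lists of the 1-itemsets have to exist. The Python sketch below shows one plausible way to build them, assuming that each entry is a (TID, Iutility, Rutility) triple and that the items of every transaction are sorted by ascending TWU, as in HUI-Miner [18]. The toy database and all function names are hypothetical, ties in TWU are broken alphabetically as an arbitrary choice, and no TWU-based pruning of items is performed.

```python
# Hypothetical toy data: unit profit per item, and TID -> {item: quantity}.
PROFIT = {"A": 5, "B": 2, "C": 1}
DB = {
    1: {"A": 1, "B": 2},
    2: {"B": 1, "C": 4},
    3: {"A": 2, "C": 1},
}


def build_utility_lists(db, profit):
    """Return (item order by ascending TWU, utility-lists of the 1-itemsets)."""
    # First pass: TWU of every item.
    twu = {}
    for tx in db.values():
        tu = sum(qty * profit[item] for item, qty in tx.items())
        for item in tx:
            twu[item] = twu.get(item, 0) + tu

    # Ascending TWU order; alphabetical tie-break (an assumption).
    order = sorted(twu, key=lambda item: (twu[item], item))
    rank = {item: pos for pos, item in enumerate(order)}

    # Second pass: one (TID, Iutility, Rutility) entry per item occurrence.
    lists = {item: [] for item in order}
    for tid in sorted(db):
        tx = db[tid]
        items = sorted(tx, key=rank.__getitem__)
        remaining = sum(tx[item] * profit[item] for item in items)
        for item in items:
            iutil = tx[item] * profit[item]
            remaining -= iutil  # utility of the items after this one
            lists[item].append((tid, iutil, remaining))
    return order, lists


if __name__ == "__main__":
    order, lists = build_utility_lists(DB, PROFIT)
    for item in order:
        print(item, lists[item])
```

Summing the Iutility values of a list gives the exact utility of the item, and summing Iutility plus Rutility gives an upper bound on the utility of any extension, which is what list-based pruning relies on.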
[Figure 1, panel: utility-list of the 1-itemset {D}; columns: TID, Iutility, Rutility]
Figure 1: A constructed utility-list structure of 1-itemsets.

Based on the utility-list structure, the search space of the proposed HUI-list-DEL algorithm can be represented as an enumeration tree in which the items are sorted in ascending order of their TWU values. Each node in the enumeration tree corresponds to an itemset and is an extension (superset) of its parent. The TWU value of a left sibling is smaller than the TWU value of its right sibling. An illustrated enumeration tree is shown in Figure 2.
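The shape of such an enumeration tree can be sketched in a few lines of Python. The generator below visits the nodes depth-first, extending each itemset only with items that come later in the given order. The order E, D, B, A, C is an assumption read off the node labels of Figure 2, and the function name is hypothetical; no pruning is applied, so every itemset is visited.

```python
def enumerate_tree(order, prefix=()):
    """Yield the itemsets of the enumeration tree in depth-first order.

    `order` lists the items in ascending TWU order; a node is extended only
    with items that follow its last item in that order.
    """
    for idx, item in enumerate(order):
        node = prefix + (item,)
        yield node
        # Children of `node` use only the items to the right of `item`.
        yield from enumerate_tree(order[idx + 1:], node)


if __name__ == "__main__":
    # Item order assumed from the labels in Figure 2.
    for node in enumerate_tree(["E", "D", "B", "A", "C"]):
        print("".join(node))
```

With five items the unpruned tree has 31 nodes; early pruning stops the recursion at a node so that its whole subtree is skipped.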
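Algorithm 1: Utility-list construction
INPUT: X.UL, the utility-list of an itemset X; Xa.UL and Xb.UL, the utility-lists of two extensions Xa and Xb of X, where X ⊆ Xa, X ⊆ Xb, and Xa ≠ Xb.
OUTPUT: Xab.UL, the utility-list of Xab.
BEGIN Procedure
1. Xab.UL = null.
2. for each element Ea ∈ Xa.UL do
3.     if ∃ Eb ∈ Xb.UL ∧ Ea.TID = Eb.TID then
4.         if X.UL ≠ null then
5.             find E ∈ X.UL such that E.TID = Ea.TID;
6.             Eab ← <Ea.TID, Ea.Iutility + Eb.Iutility − E.Iutility, Eb.Rutility>.
7.         else
8.             Eab ← <Ea.TID, Ea.Iutility + Eb.Iutility, Eb.Rutility>.
9.         end if
10.         Xab.UL ← Xab.UL ∪ Eab.
11.     end if
12. end for
13. return Xab.UL.
END Procedure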
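A minimal Python sketch of this construction step follows, assuming that utility-list entries are (TID, Iutility, Rutility) tuples and that the combined entry follows the HUI-Miner construction [18]. Dictionary lookups by TID replace the search in the pseudocode. The function name and the hand-written lists are hypothetical.

```python
def construct(ul_x, ul_xa, ul_xb):
    """Join the utility-lists of Xa and Xb into the utility-list of Xab.

    ul_x is the utility-list of the shared prefix X, or None when X is empty.
    Each entry is a (tid, iutil, rutil) tuple.
    """
    b_by_tid = {tid: (iutil, rutil) for tid, iutil, rutil in ul_xb}
    x_by_tid = {tid: iutil for tid, iutil, _ in ul_x} if ul_x is not None else {}

    ul_xab = []
    for tid, iutil_a, _rutil_a in ul_xa:
        if tid not in b_by_tid:
            continue  # the transaction does not contain both Xa and Xb
        iutil_b, rutil_b = b_by_tid[tid]
        iutil = iutil_a + iutil_b
        if ul_x is not None:
            # The utility of the prefix X is counted in both Ea and Eb.
            iutil -= x_by_tid[tid]
        ul_xab.append((tid, iutil, rutil_b))
    return ul_xab


if __name__ == "__main__":
    # Hypothetical lists for the item order B < C < A over two transactions.
    ul_b = [(1, 4, 6), (2, 2, 4)]
    ul_c = [(1, 1, 5), (2, 4, 0)]
    ul_a = [(1, 5, 0)]

    ul_bc = construct(None, ul_b, ul_c)
    ul_ba = construct(None, ul_b, ul_a)
    ul_bca = construct(ul_b, ul_bc, ul_ba)
    print(ul_bc, ul_ba, ul_bca)
```

In the usage example, the list of {B, C, A} is obtained from the lists of {B, C} and {B, A} alone, so no transaction has to be read again once the 1-itemset lists exist.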
[Figure 2: enumeration tree; the root has the children E, D, B, A, and C, and the subtree of E contains ED, EB, EA, EC, EDB, EDA, EDC, EDBA, EDBC, EDAC, ……]