JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008
An Effective Mining Algorithm for Weighted Association Rules in Communication Networks
Wu Jian
School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu, 610054, China
[email protected]
Li Xing ming
School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu, 610054, China
[email protected]
Abstract-The mining of weighted association rules is one of the primary methods used in communication alarm correlation analysis. With a large communication alarm database, traditional methods treat every item equally, which makes the mining of association rules time consuming. To improve efficiency, the items appearing in transactions are weighted using the analytic hierarchy process (AHP) to reflect their importance, which is more meaningful in some applications. This paper implements a fast and stable algorithm for mining weighted association rules based on ISS (item sequence sets). Experiments in a performance study are used to show the superiority of the new algorithm.
Index Terms-Association rules, Data mining, Mining methods and algorithms, Rule-based databases
I. INTRODUCTION
A. Background
Data mining is a process for discovering previously unknown and potentially useful abstractions from the content of large databases. Mining alarm association rules in networks can rapidly uncover effective alarm rules that reflect network faults. It has also been validated that a system which provides the user with compact alarm rules carrying complete information is valuable for alarm correlation analysis and for fault diagnosis, localization, and recovery in networks. In modern networks, alarm attributes have different levels and QoS requirements, so alarms are treated differently. Traditional methods that treat every item uniformly are therefore no longer appropriate. To address this problem, we assign a corresponding weight to each attribute according to its importance, an approach called weighted association rule mining.
B. Association Rules
The basic task in mining association rules is to determine the correlation between items belonging to a transactional database. Let I = {i1, i2, …, im} be a set of literals, called items. Let D be a set of transactions, where each transaction T is a set of items such that T ∈ D and T ⊆ I. An association rule is an implication of the form
© 2008 ACADEMY PUBLISHER
X ⇒ Y, where X ⊂ I, Y ⊂ I and X ∩ Y = ∅. In general, every association rule must satisfy two user-specified constraints: support and confidence. The support of a rule X ⇒ Y is defined as the fraction of transactions that contain X ∪ Y, while the confidence is defined as the ratio support(X ∪ Y)/support(X). The target is therefore to find all association rules that satisfy the user-specified minimum support and minimum confidence values. We call these strong rules to distinguish them from weak ones. It has been shown that the problem of discovering association rules can be reduced to two sub-steps:
1) Find all frequent itemsets for a predetermined support.
2) Generate the association rules from the frequent itemsets.
The rest of this paper is organized as follows. Section 2 summarizes the related work in weighted ARM. Section 3 introduces the AHP used to determine the weights of all items. Section 4 presents the proposed weighted association rule mining method, ISS. Experimental and comparison results are reported and discussed in Section 5. The last section concludes this paper.
II. RELATED WORK
When computing the weighted support of a rule, we can consider both the support and the weight factors. The classical Apriori algorithm [1] for finding binary association rules depends on the downward closure property, which states that every subset of a frequent itemset is also frequent. However, this property does not hold for the weighted case under our definition, so the Apriori algorithm cannot be applied directly. Another algorithm, MINWAL [2], was therefore designed for this purpose. Given a set of items I = {i1, i2, …, im}, we assign a weight wj to each item ij, with 0 ≤ wj ≤ 1 and j = 1, 2, …, m, to indicate its importance. We define the weighted support of the weighted association rule X ⇒ Y as
$$\left(\sum_{i_j \in (X \cup Y)} w_j\right) \times \mathrm{Support}(X \cup Y) \qquad (1)$$
A k-itemset X is called a frequent itemset if its weighted support is no less than the minimum weighted support threshold (wminsup), that is,
$$\left(\sum_{i_j \in X} w_j\right) \times \mathrm{Support}(X) \ge \mathit{wminsup} \qquad (2)$$
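As an illustration of (1) and (2), the following Python sketch computes the weighted support of an itemset and tests it against wminsup. The function names, the toy transactions, and the weight values are assumptions made for this example and are not taken from the paper's experiments. The example also shows a frequent itemset with an infrequent subset, which is why the downward closure property cannot be relied on in the weighted case.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def weighted_support(itemset, transactions, weights):
    """Left-hand side of (2): (sum of item weights) * Support(itemset)."""
    total_weight = sum(weights[item] for item in itemset)
    return total_weight * support(itemset, transactions)


def is_frequent(itemset, transactions, weights, wminsup):
    """Frequency test of inequality (2)."""
    return weighted_support(itemset, transactions, weights) >= wminsup


# Hypothetical toy data: four transactions, weights in [0, 1].
transactions = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
weights = {"a": 0.2, "b": 0.9, "c": 0.5}
wminsup = 0.4

# {a, b}: (0.2 + 0.9) * 2/4 is about 0.55, so it passes the threshold.
ab_frequent = is_frequent({"a", "b"}, transactions, weights, wminsup)
# {a}: 0.2 * 3/4 is about 0.15, so this subset of {a, b} fails.
a_frequent = is_frequent({"a"}, transactions, weights, wminsup)
```

Here `ab_frequent` is true while `a_frequent` is false, so pruning candidates whose subsets are infrequent, as Apriori does, would wrongly discard {a, b}.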
An association rule X ⇒ Y is called an interesting rule if X ∪ Y is a frequent itemset and the confidence of the rule is greater than or equal to a minimum confidence threshold. Here we want to strike a balance between two measures, weight and support. The weighted association rule involves three parameters: the weights of items, the support of itemsets, and the confidence factor. Given a database with T transactions, we define the support count of a frequent k-itemset X as the number of transactions containing X. Written as Support(X) in (3), it must satisfy:
$$\mathrm{Support}(X) \ge \frac{\mathit{wminsup} \times T}{\sum_{i_j \in X} w_j} \qquad (3)$$
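Inequality (3) follows from (2) once the support is written as the support count divided by T. The sketch below, with a hypothetical helper name and illustrative weights, returns the smallest integer support count that satisfies (3). It uses exact rational arithmetic so that rounding up is not affected by floating-point error.

```python
from fractions import Fraction
from math import ceil


def min_support_count(itemset, weights, wminsup, num_transactions):
    """Smallest integer count c with c >= wminsup * T / (sum of weights).

    Assumes the itemset has a strictly positive total weight.
    """
    total_weight = sum(Fraction(str(weights[item])) for item in itemset)
    bound = Fraction(str(wminsup)) * num_transactions / total_weight
    return ceil(bound)


# Hypothetical weights; T = 4 transactions and wminsup = 0.4.
weights = {"a": 0.2, "b": 0.9}
count_ab = min_support_count({"a", "b"}, weights, 0.4, 4)  # 1.6 / 1.1, rounded up
count_a = min_support_count({"a"}, weights, 0.4, 4)        # 1.6 / 0.2
```

With these values `count_ab` is 2 and `count_a` is 8. Since 8 exceeds the number of transactions, the itemset {a} can never be frequent on its own in such a database, whatever its actual count.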
Let I be the set of all items. Suppose that Y is a q-itemset, where q0 and would increase with Δci,j; on the contrary,
when Δcj,i 0, suppose the degree of significance of value ci is M times that of value cj (M = 1, 2, …, 10). In other words, the degree of significance of value cj is 1/M of that of value ci. The range of the degree of relative significance Δc and the corresponding M are listed in Table 1. After the two values are compared, the judgment matrix B can be obtained according to the value of Δc. Its form is shown in Table 2, where bij is the degree of relative significance between ci and cj, which could take the
corresponding value M from Table 1. The calculation of the weight of every ci from the judgment matrix can be summarized in the following steps:
1) Once the judgment matrix has been built, calculate the product Mi (i = 1, 2, ..., n) of every row by the following formula:
$$M_i = \prod_{j=1}^{n} b_{ij} \qquad (8)$$
2) Then compute the n-th root of every Mi:
$$\overline{W}_i = \sqrt[n]{M_i} \qquad (9)$$
3) Finally, normalize the vector $\overline{W} = [\overline{W}_1, \ldots, \overline{W}_n]^T$ as follows:
$$W_i = \frac{\overline{W}_i}{\sum_{j=1}^{n} \overline{W}_j} \quad (i = 1, 2, \ldots, n) \qquad (10)$$
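The three steps in (8) to (10) can be sketched in Python as follows. The function name and the example judgment matrix are assumptions chosen for illustration. The matrix is a perfectly consistent 3 x 3 case and is not the paper's Table 2.

```python
import math


def ahp_weights(judgment_matrix):
    """Weights from a pairwise judgment matrix via steps (8) to (10)."""
    n = len(judgment_matrix)
    # (8): product of the entries of each row.
    row_products = [math.prod(row) for row in judgment_matrix]
    # (9): n-th root of each row product.
    roots = [m ** (1.0 / n) for m in row_products]
    # (10): normalize so that the weights sum to 1.
    total = sum(roots)
    return [r / total for r in roots]


# Hypothetical judgment matrix: c1 is twice as significant as c2 and
# four times as significant as c3, with reciprocal entries below the diagonal.
B = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
W = ahp_weights(B)
```

For this matrix the row products are 8, 1, and 1/8, so the weights come out close to 4/7, 2/7, and 1/7. `math.prod` requires Python 3.8 or later.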
The value Wi represents the weight of value ci of attribute C.
Table 1. The degree of relative significance Δc and the corresponding M
$$CI = \frac{\lambda_{\max} - n}{n - 1} \qquad (12)$$
In (12), when CI = 0, the judgment matrix is completely uniform; otherwise, the larger CI is, the worse the uniformity of the judgment matrix. To check whether the uniformity of the judgment matrix is satisfactory, CI must be compared with the average stochastic uniform index RI (Table 3). The ratio of CI to RI of the same order is called the stochastic uniform proportion of the judgment matrix, denoted CR. Generally, when CR=CI/RI