Mining Negative Association Rules

Xiaohui Yuan, Bill P. Buckles
EECS Dept., Tulane University, New Orleans, LA 70118
[email protected]
[email protected]

Jian Zhang
EECS Dept., Tulane University, New Orleans, LA 70118
[email protected]

Zhaoshan Yuan
CSI Dept., Hefei Univ. of Tech., Hefei, Anhui, P.R.C. 230009
[email protected]
Abstract

The focus of this paper is the discovery of negative association rules. Such rules are complementary to the association rules most often encountered in the literature and take the form $X \rightarrow \neg Y$ or $\neg X \rightarrow Y$. We present a rule discovery algorithm that finds a useful subset of valid negative rules. In generating negative rules, we employ a hierarchical, graph-structured taxonomy of domain terms. The taxonomy records classification information and hence the similarity between items. Given the taxonomy, sibling rules, duplicated from positive rules with one or more items replaced, are derived together with their estimated confidence. Sibling rules that exhibit a large confidence deviation are considered candidate negative rules. Our study shows that negative association rules can be discovered efficiently from large databases.
1. Introduction

Data mining is the process of extracting implicit, previously unknown, and potentially useful information from large quantities of data. Through the accretion of current data with historical data, enterprises find themselves in possession of larger data sets in electronic form than at any time heretofore. Various techniques have been employed to convert the data into information, including clustering, classification, regression, association rule induction, sequence discovery, and so forth. In general, an association rule represents a relationship between two sets of items in the same database. It can be written in the form $X \rightarrow Y$, where $X$ and $Y$ are item sets (i.e., values from stipulated domains) and $X \cap Y = \emptyset$. The left-hand side (LHS) of the rule is called the antecedent, while the right-hand side (RHS) is called the consequent.

(Footnote 1: This work was supported in part by DoD EPSCoR and the State of Louisiana under grant F49620-1-0351.)
Much effort has been devoted to, and many algorithms proposed for, efficiently discovering association rules [1, 2, 13, 5, 12, 6]. These algorithms rely on two quantitative measures, support and confidence, which capture the generality and correctness of a rule, respectively. The support of an item set is the frequency with which it appears in the database; the support of a rule is the frequency of transactions that contain the union of the items in the antecedent and consequent. For instance, if 266 out of 1000 records in a given database contain "bread", then the support of "bread" is 26.6%. A rule with low support is usually unimportant. The relative frequency of co-occurrence, measured by confidence, is what identifies strong rules. Confidence is computed as the frequency of co-occurrence of the antecedent and consequent divided by the support of the antecedent; it measures the correctness, or predictability, of the rule. In the above example, if 218 records contain both "bread" and "butter", then the frequency of occurrence of "bread and butter" is 21.8%, and the confidence of the rule "bread → butter" (if bread is purchased, butter is also purchased) is 0.218/0.266 = 81.95%. The rule can thus be restated as "81.95% of transactions that include purchasing bread also include purchasing butter." The mining problem consists of finding all association rules whose support and confidence satisfy a predefined minimum support (minsup) and minimum confidence (minconf), respectively.

In addition to the sort of association rule discussed above, there are rules that imply negative relationships. Such rules are called negative association rules [9]. A negative association rule also describes a relationship between item sets, but it ties the occurrence of some item sets to the absence of others. In contrast to negative association rules, the classical rules most frequently studied are called positive association rules here. Algorithms for discovering negative rules are only occasionally discussed [9, 10]. Savasere et al. [9] generate negative rules based on a complex measure over rule parts. Brin et al. [10] generalize positive and negative rules into correlations and induce rules from the dependence between two item sets. In brief, mining negative association rules consists of defining a heuristic search procedure. Heuristics tend to favor one set of solutions over others, and the heuristics cited above are not designed to find all possible negative association rules; rather, the heuristic should target the most useful ones for discovery. Here we argue that the criterion for a negative association rule's utility is its relationship to a valid positive rule. In this article, we present a method to discover negative rules following the support-confidence scheme, together with prior knowledge of the relationships between term values in the item sets.

In the remainder of this paper, we present rule effectiveness measures in section 2, then give the problem statement and formal description in section 3. Section 4 explains the algorithm, and the experimental results follow in section 5. Discussion and conclusions are in section 6.
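To make the support and confidence computations concrete, the following minimal Python sketch (ours, not part of the paper) counts them directly over a list of transactions; the toy data merely mirrors the bread/butter proportions above.

    # Minimal sketch (ours): support and confidence by direct counting.
    # Transactions and itemsets are plain Python sets.

    def support(itemset, transactions):
        # Fraction of transactions containing every item in itemset.
        return sum(1 for t in transactions if itemset <= t) / len(transactions)

    def confidence(X, Y, transactions):
        # conf(X -> Y) = supp(X u Y) / supp(X)
        return support(X | Y, transactions) / support(X, transactions)

    if __name__ == "__main__":
        # Toy data echoing the bread/butter example (proportions only).
        transactions = ([{"bread", "butter"}] * 218 + [{"bread"}] * 48
                        + [{"milk"}] * 734)
        print(support({"bread"}, transactions))                  # 0.266
        print(confidence({"bread"}, {"butter"}, transactions))   # ~0.8195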
2. Rule Effectiveness Measures

Rule induction is the process of discovering rules that reach a predefined support-confidence threshold. If we consider, from an information-theoretic standpoint, the problem of determining which rules are meaningful, those with a high occurrence rate and correspondingly low entropy tend to qualify. Smyth and Goodman [11] proposed an entropy-based measure of the information level of rules, and Piatetsky-Shapiro [8] used a measure based directly on probabilities. However, in mining association rules the purpose is to find the most frequent regularities, so "goodness" must be measured differently. Fortunately, in our situation, namely mining negative association rules, measuring the deviation is more critical than measuring the information level itself. Generally, items from the same "market basket" tend to appear in the same set of transactions. If rules are modified by substituting items within the given basket, we naturally expect the rule to still hold with respect to the same measure. However, careful computation shows that the measure of some rules deviates greatly from what would be expected. Such rules include negative association rules. A simple example of a negative rule is "78% of transactions that include purchasing bread and milk do not include butter."

Example: Table 1(a) shows synthetic data for vehicle purchase information.
Assumption 1: The minimum support is 30% and the minimum confidence is 70%.
Assumption 2: The numeric attribute AGE ranges from 18 to 70 and is quantized into two groups: less than thirty and over thirty.

Item sets that satisfy minsup (= 30%) are listed in Table 1(b). The only rule that satisfies both the minimum support and minimum confidence criteria is "age < 30 → coupe", whose confidence is 75%.
Table 1. (a) Vehicle purchase information, (b) large item sets with minimum support of 30%

(a)
SSN        Name          Age  Vehicle Type
711205331  John Smith    23   Coupe
831213362  Tom Lee       45   Truck
914223733  Rachel White  50   Van
751238334  Jack Gates    27   Sedan
116243395  Peter Cruise  38   Sedan
271253036  Sally Lent    20   Coupe
318261337  Mike Swami    41   Truck
491273328  Erin Buckles  22   Coupe
510283339  Nick Park     39   Coupe
611294340  Jane Chen     40   Sedan

(b)
Item sets          Support
Age < 30           40%
Age > 30           60%
Coupe              40%
Sedan              30%
Age < 30, Coupe    30%
However, if we also look for negative association rules, there exists a rule "age > 30 → not purchasing coupe", which has a confidence of 83.3%. For the purpose of identifying purchase patterns, the latter clearly has better predictive ability.

The preceding example illustrates that negative association rules are as important as positive ones. However, at least two aspects make mining negative rules difficult. First, we cannot simply pick threshold values for support and confidence that are guaranteed to be effective in sifting both positive and negative rules. Second, a practical database contains thousands of items in its transaction records, yet there may be a large number of items in the domain of an attribute that do not appear in the database or appear an insignificant number of times. We define an absent item to be one that occurs an insignificant number of times in the transaction set. If we consider negative rules whose LHS or RHS is a combination of absent as well as frequently used items, then there will be a great many of them, and in that case a valid rule is not necessarily a useful one. This article addresses both issues.
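As a quick check of the example, the sketch below (ours; the data is transcribed from Table 1(a)) derives the confidence of "age > 30 → ¬coupe" by direct counting.

    # Minimal sketch (ours): confidence of the negative rule age>30 -> not coupe,
    # counted directly over the ten records of Table 1(a).

    ages     = [23, 45, 50, 27, 38, 20, 41, 22, 39, 40]
    vehicles = ["Coupe", "Truck", "Van", "Sedan", "Sedan",
                "Coupe", "Truck", "Coupe", "Coupe", "Sedan"]

    over30 = [v for a, v in zip(ages, vehicles) if a > 30]
    conf_neg = sum(1 for v in over30 if v != "Coupe") / len(over30)
    print(conf_neg)  # 5/6 = 0.8333..., i.e., the 83.3% stated in the text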
3. Problem Statement

The negative association rule differs from its positive counterpart not only in the mining procedure but also in
form. Before we present the problem of discovering negative association rules, we state its formal definition.
3.1. Definition of Negative Association Rule

Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of items, where each item $i_j$ corresponds to a value of an attribute and is a member of some attribute domain $D_h = \{d_1, d_2, \ldots, d_s\}$, i.e., $i_j \in D_h$. For simplicity, we make the domains discrete. Let $T = \{\tau_1, \tau_2, \ldots, \tau_p\}$ be a table, where $T \subseteq D_1 \times D_2 \times \cdots \times D_q$. The Cartesian product specifies all possible combinations of values from the underlying domains [3]. Each transaction $\tau_j = \langle g_1, g_2, \ldots, g_m \rangle$ in table $T$ is associated with an unordered attribute-value set $t_j = \{g_1, g_2, \ldots, g_m\}$ by the obvious transform, where $t_j \subseteq I$. We say that $t_j$ contains an itemset $X \subseteq I$ if $\forall i_k \in X$, $i_k \in t_j$.

A positive association rule represents a relationship between two sets of items in the form $X \rightarrow Y$, where $X \subseteq I$, $Y \subseteq I$, and $X \cap Y = \emptyset$. It might be noted that our definition of rules differs from previously published studies [1, 2] in that our item domains are not necessarily binary. We say that $t_j$ does not contain an itemset $X \subseteq I$ if $\exists i_k \in X$ such that $i_k \notin t_j$. A negative association rule is an implication of the form $X \rightarrow \neg Y$ (or $\neg X \rightarrow Y$), where $X \subseteq I$, $Y \subseteq I$, and $X \cap Y = \emptyset$. (Note that although a rule of the form $\neg X \rightarrow \neg Y$ contains negative elements, it is equivalent to the positive association rule $Y \rightarrow X$; it is therefore not considered a negative association rule.) In contrast to a positive rule, a negative rule couples the occurrence of one set of items with the absence of the other set. Our explanation is based on the former form, $X \rightarrow \neg Y$; we note in the following descriptions where the two forms differ.

The rule $X \rightarrow \neg Y$ has support $s\%$ in the data set if $s\%$ of the transactions in $T$ contain itemset $X$ but do not contain itemset $Y$. That is, the support of a negative association rule, $supp(X \rightarrow \neg Y)$, is the frequency of occurrence of transactions with itemset $X$ in the absence of itemset $Y$. Let $U$ be the set of transactions that contain all items in $X$. The rule $X \rightarrow \neg Y$ holds in the given data set with confidence $c\%$ if $c\%$ of the transactions in $U$ do not contain itemset $Y$. The confidence of a negative association rule, $conf(X \rightarrow \neg Y)$, can be calculated as $P(X \cup \neg Y)/P(X)$, where $P(\cdot)$ is the probability function.

Previous algorithms for discovering positive association rules, such as Apriori and AprioriTid [2], require multiple passes over the database, and the support and confidence of item sets are calculated during the iterations. However, it is difficult to count the support and confidence of non-existing items in transactions. To avoid counting them directly, we can compute these measures from those of the positive rules. Given $supp(X \rightarrow Y)$ and $conf(X \rightarrow Y)$, the support and
confidence of the negative rule $X \rightarrow \neg Y$ can be computed as:

$$supp(X \rightarrow \neg Y) = supp(X) - supp(X \rightarrow Y) \qquad (1)$$

$$conf(X \rightarrow \neg Y) = 1 - conf(X \rightarrow Y) \qquad (2)$$
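As a worked check of equations 1 and 2 (our example, using the Table 1 data), take $X = \{\text{age} > 30\}$ and $Y = \{\text{coupe}\}$. Then $supp(X) = 0.6$, $supp(X \rightarrow Y) = 0.1$, and $conf(X \rightarrow Y) = 0.1/0.6 = 1/6$. Equation 1 gives $supp(X \rightarrow \neg Y) = 0.6 - 0.1 = 0.5$, and equation 2 gives $conf(X \rightarrow \neg Y) = 1 - 1/6 \approx 83.3\%$, matching the direct counts: five of the ten records have age > 30 and no coupe, and five of the six age > 30 records contain no coupe.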
The computation of the support and confidence of negative rules of the form $\neg X \rightarrow Y$ differs, and is given in equations 3 and 4:
$$supp(\neg X \rightarrow Y) = supp(Y) - supp(Y \rightarrow X) \qquad (3)$$

$$conf(\neg X \rightarrow Y) = \frac{P(Y)}{1 - P(X)}\,\bigl(1 - conf(Y \rightarrow X)\bigr) \qquad (4)$$
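A small sketch (ours; the Table 1 data again) verifies equations 3 and 4 against direct counting for the rule $\neg(\text{age} < 30) \rightarrow \text{sedan}$:

    # Minimal sketch (ours): equations 3 and 4 vs. direct counting on Table 1(a).
    ages     = [23, 45, 50, 27, 38, 20, 41, 22, 39, 40]
    vehicles = ["Coupe", "Truck", "Van", "Sedan", "Sedan",
                "Coupe", "Truck", "Coupe", "Coupe", "Sedan"]
    n = len(ages)

    P_X = sum(a < 30 for a in ages) / n                    # P(age < 30) = 0.4
    P_Y = sum(v == "Sedan" for v in vehicles) / n          # P(sedan)    = 0.3
    supp_Y_to_X = sum(a < 30 and v == "Sedan"
                      for a, v in zip(ages, vehicles)) / n # supp(sedan -> age<30) = 0.1
    conf_Y_to_X = supp_Y_to_X / P_Y                        # = 1/3

    supp_eq3 = P_Y - supp_Y_to_X                           # eq. 3 gives 0.2
    conf_eq4 = P_Y / (1 - P_X) * (1 - conf_Y_to_X)         # eq. 4 gives 1/3

    # Direct counts over transactions with "not age<30", i.e., age >= 30:
    not_X = [(a, v) for a, v in zip(ages, vehicles) if not a < 30]
    print(supp_eq3, sum(v == "Sedan" for _, v in not_X) / n)          # 0.2 == 0.2
    print(conf_eq4, sum(v == "Sedan" for _, v in not_X) / len(not_X)) # 1/3 == 1/3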
3.2. Mining Negative Association Rules

Mining association rules, as most frequently presented, involves finding rules that satisfy both a user-specified minimum support and minimum confidence. We cannot simply "complement" the thresholds to find negative rules; that is, we cannot find positive rules with small support and confidence values such that, by applying equations 2 and 4, the corresponding negative rules are exposed. Doing so would produce many uninteresting rules. To eliminate unwanted rules and focus on potentially interesting ones, we predict possibly interesting negative association rules by incorporating domain knowledge of the data sets.

3.2.1 Locality of Similarity

Let $\mathcal{T}$ be the taxonomy of the given item sets. Taxonomy $\mathcal{T}$ consists of vertices and directed edges. Each vertex represents a class. The vertex with in-degree zero is the most general class, and the vertices with out-degree zero are the most specific classes, corresponding to items. Two vertices are connected by a directed edge, which represents an is-a relationship. An example is shown in Figure 1.
[Figure 1. Example of a taxonomy]

In most cases, taxonomies over the items can be devised; they provide domain knowledge of the item sets. Within
the taxonomy, two kinds of relationships are important: vertical relationships and horizontal ones [13, 4, 7]. The semantics of the vertical relationship is that lower-level vertex values are instances of the values of their immediate predecessor vertices, i.e., the is-a relationship. In [13], the vertical relationship is used to discover generalized association rules. The semantics of the horizontal relationship is that vertices on the same level having the same immediate predecessor (siblings, to borrow rooted-tree terminology) encapsulate similarity among classes. We call sibling relationships localities of similarity (LOS). Items belonging to the same LOS tend to participate in similar association rules. This is generally true because members of such groups tend to have similar activity patterns. For example, in a retail database, instances are items involved in transactions and customers are participants. If no customer has a brand preference, the purchase probability of each item will be evenly distributed over all brands. Items falling into the same LOS can be denoted $[i_1, i_2, \ldots, i_m]$, where $i_1, i_2, \ldots, i_m \in I$ are members of the LOS and are siblings. In the taxonomy shown in Figure 1, 'IBM Aptiva' and 'Compaq Deskpro' are on the same level and inherit from 'Desktop System'; thus they belong to one LOS, written ['IBM Aptiva', 'Compaq Deskpro']. The LOS can, however, be extended across levels under the same parent node. For instance, it is more reasonable to put 'IBM Aptiva', 'Compaq Deskpro', 'Notebook', and 'Parts' into one LOS when viewing the database at a more abstract level.

Intuitively, siblings are in the same LOS. Further, similarity is inheritable. Formally, the similarity assumption in this paper is stated as follows:

(a) Let $X = \{i_1, i_2, \ldots, i_h, \ldots, i_m\}$ and $Y = \{j_1, j_2, \ldots, j_l\}$, with $X, Y \subseteq I$ and $X \cap Y = \emptyset$; and let there exist $i_k \in I$, $i_k \notin X$, with $[i_h, i_k]$, such that $supp(i_h) \geq minsup$ and $supp(i_k) \geq minsup$.

(b) If there exists a rule $r: X \rightarrow Y$, consisting of antecedent $X$ and consequent $Y$, with $supp(X \rightarrow Y) \geq minsup$ and $conf(X \rightarrow Y) \geq minconf$, then there is a possibility that the inequality $conf(X' \rightarrow Y) \geq minconf$ holds, where the item set $X' = \{i_1, i_2, \ldots, i_k, \ldots, i_m\}$ is the same as $X$ except that $i_k$ is substituted for $i_h$. We denote the rule $X' \rightarrow Y$ as $r'$, and say that $r: X \rightarrow Y$ and $r': X' \rightarrow Y$ are sibling rules of each other.

3.2.2 Discovering Negative Rules

The use of localities of similarity provides clues to potentially useful negative rules. Suppose that items $i_h, i_k \in I$ are members of a LOS, that is, $[i_h, i_k]$. If rule $r: X \rightarrow Y$ is true and $i_h \in X$, then, based on the similarity assumption, substituting $i_k$ for $i_h$ in the antecedent of rule $r$ generates a sibling
positive rule $r': X' \rightarrow Y$. However, if this association is not supported, meaning the occurrence of $Y$ is not related to $X'$, then the corresponding negative association may exist. A salience measure is needed. In the context of the taxonomy, the salience measure is defined as the distance between confidence levels:
$$SM = |conf(r') - E(conf(r'))| \qquad (5)$$
where $conf(r')$ is the actual confidence of rule $r'$, computed with equation 2 or 4, and $E(conf(r'))$ is the estimated confidence of $r'$, defined under the similarity assumption to equal the confidence of $r$. A large value of $SM$ is evidence for accepting the hypothesis that $X' \rightarrow Y$ is false, that is, that $X' \rightarrow \neg Y$ may be true. From the information-theoretic perspective, a large $SM$ value signifies a large information gain; in a sense, "interestingness" is associated with high entropy. If the candidate negative association rule satisfies both the support and confidence criteria, it is kept. In brief, to qualify as a negative rule, a candidate must satisfy two conditions: first, there must be a large deviation between the estimated and actual confidence; second, the support and confidence must exceed the required minima.

3.2.3 Pruning

A major consideration in mining negative rules is avoiding the many uninteresting rules. This is accomplished mostly by the prediction procedure. However, beyond the pruning used when discovering positive rules, some redundancy introduced by negative rules needs to be handled. In constructing candidate negative rules, an equivalent or similar pair may be generated, such as $X \rightarrow \neg Y$ and $Y \rightarrow \neg X$. Obviously, if both rules appear in the result then, by equivalence, one of them suffices to represent the information carried by both, so one is pruned. Another redundancy arises when items from a LOS $[i_1, i_2, \ldots, i_m]$ constitute $m - k$ negative rules of which $k$ rules are coupled with positive ones, all being sibling rules. The pruning will keep either all the positive rules or all the negative rules with high confidence. An example is the pruning between the rules "Female → 'Buy Hat'" and "¬Male → 'Buy Hat'". In this case, the domain of the attribute gender includes only two values, Female and Male, so ¬Male is equivalent to Female. Thus only one of the two rules is saved.
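Putting sections 3.2.1 and 3.2.2 together, here is a minimal Python sketch (ours; the taxonomy encoding, LOS extraction, and candidate filter are illustrative assumptions, not the paper's implementation):

    # Minimal sketch (ours): candidate negative rules from sibling rules.
    # A taxonomy is encoded child -> parent; a LOS is a set of siblings.
    from collections import defaultdict

    def support(itemset, transactions):
        return sum(1 for t in transactions if itemset <= t) / len(transactions)

    def confidence(X, Y, transactions):
        # Assumes supp(X) > 0 (guaranteed by the minsup filter of assumption (a)
        # in the full algorithm).
        return support(X | Y, transactions) / support(X, transactions)

    def localities(parent):
        # Group items sharing an immediate predecessor (siblings) into LOS.
        los = defaultdict(set)
        for child, par in parent.items():
            los[par].add(child)
        return list(los.values())

    def candidate_negatives(X, Y, transactions, los_list, conf_deviate):
        # For a valid positive rule X -> Y, substitute each LOS sibling i_k for
        # i_h in the antecedent (similarity assumption). Where the actual
        # confidence of the sibling rule deviates from the expected confidence
        # (eq. 5, with E(conf(r')) = conf(r)), emit X' -> not Y with the
        # confidence given by eq. 2.
        conf_r = confidence(X, Y, transactions)
        out = []
        for los in los_list:
            for i_h in X & los:
                for i_k in los - X - Y:
                    X2 = (X - {i_h}) | {i_k}
                    conf_sibling = confidence(X2, Y, transactions)
                    if abs(conf_sibling - conf_r) > conf_deviate:  # salience, eq. 5
                        out.append((X2, Y, 1 - conf_sibling))      # conf(X' -> not Y)
        return out

Under assumption (a), only siblings with $supp(i_k) \geq minsup$ would be substituted, and candidates would additionally be checked against minsup and minconf before being kept; the sketch elides those filters for brevity.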
4. Algorithm

The discovery procedure can be decomposed into three stages: (1) find a set of positive rules; (2) generate negative
rules based on the existing positive rules and domain knowledge; (3) prune the redundant rules. The algorithm is shown in List 1. In generating negative rules, both RHS-negative and LHS-negative rules are developed; however, in identifying candidate negative rules, the salience measures for the two types are computed from different equations, namely equation 2 and equation 4.
List 1. Mining negative association rules

    // Find all positive rules
    FreqSet_1 = {frequent 1-itemsets};
    k = 2;
    while (FreqSet_{k-1} != {})
        for all transactions g in DataSet
            CandidateSet_t = subset(CandidateSet_k, g);
            for all candidates c in CandidateSet_t
                c.count = c.count + 1;
            endfor
        endfor
        FreqSet_k = {c in CandidateSet_t | c.count >= minsup};
        k = k + 1;
    endwhile
    // Generate positive rules with Apriori
    positiveRule = genRule(FreqSet_k);
    Rule = positiveRule;
    // Generate negative rules
    delete all items t from the taxonomy with t not in FreqSet_1;
    for all rules r in positiveRule
        tmpRuleSets = genNegCand(r);
        for all rules tr in tmpRuleSets
            if SM(tr.conf, r.conf) > confDeviate
                Rule = {Rule, Neg(tr) | Neg(tr).supp > minsup,
                                        Neg(tr).conf > minconf};
            endif
        endfor
    endfor
    // Pruning
    if all members of a LOS share a common itemset forming rules {r_1, ..., r_n} in Rule
        delete each r_k that falls in the categories of section 3.2.3;
    endif
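For concreteness, here is a compact Python rendering (ours) of the frequent-itemset loop in List 1; candidate generation is simplified to the standard Apriori self-join, and minsup is an absolute count here rather than a percentage.

    # Minimal sketch (ours): the frequent-itemset pass of List 1 in Python.

    def frequent_itemsets(dataset, minsup):
        # dataset: list of sets of items; minsup: absolute count threshold.
        items = {i for t in dataset for i in t}
        level = {frozenset([i]) for i in items
                 if sum(i in t for t in dataset) >= minsup}
        all_frequent = set(level)
        k = 2
        while level:
            # Apriori join: unions of frequent (k-1)-itemsets that have size k.
            candidates = {a | b for a in level for b in level if len(a | b) == k}
            level = {c for c in candidates
                     if sum(c <= t for t in dataset) >= minsup}
            all_frequent |= level
            k += 1
        return all_frequent

    rows = [{"age<30", "coupe"}, {"age>30", "truck"},
            {"age>30", "van"}, {"age<30", "sedan"}]
    print(frequent_itemsets(rows, 2))  # the two frequent 1-itemsets survive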
5. Results

Our experiments are performed on a database of TV cable transactions. A few sample records are listed in Table 2. Different support-confidence thresholds were tested. Table 3(a) gives the LOS used for discovering negative rules, and a few positive and negative rules are listed in Table 3(b); they were generated under support and confidence constraints of 18% and 70%, respectively. Note that the rules in the right column are negative rules discovered with respect to the positive
rules in the left column. Figure 2 shows the number of positive and negative rules vs. the user-specified minimum support and minimum confidence. The algorithm successfully generates negative rules, and the number of negative rules discovered is reasonable. From Figure 2 we can see that the number of negative rules tends to track the number of positive rules; however, it is inversely proportional to the minimum support threshold. The reason is that fewer and fewer high-support 1-itemsets survive as the support threshold increases, which significantly reduces the number of candidate negative rules. During the discovery of negative rules, the positive rules, which are prior information, generally remain unchanged. Nevertheless, after the negative rules are generated, both positive and negative rules are reconsidered together to reduce redundancy; that is, positive and negative rules are allowed to interact, and some rules are eliminated because they carry the same predictive information as others. In such cases, only the rules with higher confidence are kept.

Table 2. Sample data

Sex  Marital  TV Type          TV Program     Promotion
M    Married  Subscription     Sports         New Customer
M    Married  Subscription     Sports         Addon Package
M    Married  Subscription     ABC News       Renewal
M    Married  Subscription     ABC News       Renewal
F    Single   Subscription     ABC News       No Promotion
M    Married  Subscription     ABC News       Renewal
F    Single   Subscription     Super Station  No Promotion
F    Single   Pay Per View     Movies         New Customer
M    Married  Subscription     Movies         No Promotion
F    Single   Pay Per View     Movies         No Promotion
M    Married  Specific Events  Sports         New Customer
F    Single   Specific Events  Sports         No Promotion

Table 3. (a) LOS used in the example, (b) positive and negative rules generated

(a)
[M, F]
[Single, Married, Coreside]
[Subscription, Specific Events, Pay Per View]
[ABC, Fox, Super Station, TNT, Movie, Sports]
[Addon Package, New Customer, No Promotion, Renew]

(b)
Positive Rules     Conf.    Negative Rules    Conf.
ABC News → MAR     70%      ¬Sports → MAR     72.3%
Movies → SUB       86%      ¬Sports → SUB     99%
Movies → SE        99%
MAR: Married, SUB: Subscription, SE: Specific Events

[Figure 2. Positive and negative rules discovered under different supports and confidences]
6. Discussion and Conclusion

We have described a novel strategy for mining negative rules. The algorithm predicts useful negative rules with respect to existing positive rules by employing domain knowledge, namely a taxonomy of the data set. From the given taxonomy, sibling rules are derived together with their estimated confidence. Rules that show a large confidence deviation are considered candidate negative rules, and negative rules are generated from those candidates that satisfy the support and confidence constraints. Furthermore, the pruning step reduces redundancy among the negative rules. Our experiments indicate the efficacy of this strategy.

Given the number of positive rules $P$ and the average size of a LOS $L$, the complexity of the algorithm is $O(P \cdot L)$. Note that the complexity does not depend on the number of transactions, since it is assumed that the supports of the item sets have already been counted and stored for use in this as well as other mining applications. If, however, the positive rules must also be discovered, which is necessary for generating negative rules, the algorithm must examine all combinations of items. The complexity of discovering positive rules depends not only on the number of transactions but also on the sizes of the attribute domains and the number of attributes; the overall complexity is then proportional to that of discovering positive rules. Performance is also affected by the choice of minimum support: a lower minimum support produces more item sets and, under the same confidence constraint, more positive rules, which adds to the computational expense. From Figure 2 we can see the trend in the numbers of negative and positive rules under different minimum supports.

A contribution of the strategy described here, versus the mining of negative rules described elsewhere [9, 10], is that it provides a uniform measure of usefulness based on domain knowledge and takes advantage of the ready-to-use statistical data produced during the generation of positive rules. The computation of negative support and confidence is grounded in statistical theory. The experimental data suggest that by using LOS to predict the occurrence of negative rules, combined with a reasonable interestingness measure and a viable pruning strategy, our algorithm yields results that accord well with expectations.
References
[1] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Conference on Management of Data, Washington, D.C., May 1993.
[2] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. of the 20th Int'l Conf. on Very Large Databases, Santiago, Chile, September 1994.
[3] R. Elmasri and S. B. Navathe. Fundamentals of Database Systems, 3rd Edition. Addison-Wesley, 2000.
[4] S. Fortin and L. Liu. An object-oriented approach to multi-level association rules. In Proc. of the Int'l Conf. on Information and Knowledge Management, November 1996.
[5] J. Han and Y. Fu. Mining multiple-level association rules in large databases. IEEE Trans. on Knowledge and Data Engineering, 11(5), 1999.
[6] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A. I. Verkamo. Finding interesting rules from large sets of discovered association rules. In Proc. of the 3rd Int'l Conf. on Information and Knowledge Management, Gaithersburg, Maryland, November 1994.
[7] B. Lent, A. Swami, and J. Widom. Clustering association rules. In Proc. of the 13th International Conference on Data Engineering, 1997.
[8] G. Piatetsky-Shapiro. Discovery, analysis, and presentation of strong rules. In Knowledge Discovery in Databases, 1991.
[9] A. Savasere, E. Omiecinski, and S. Navathe. Mining for strong negative associations in a large database of customer transactions. In Proc. of the Int'l Conf. on Data Engineering, February 1998.
[10] S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing association rules to correlations. In Proc. of the ACM SIGMOD International Conference on Management of Data, 1997.
[11] P. Smyth and R. M. Goodman. Rule induction using information theory. In Knowledge Discovery in Databases, 1991.
[12] R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. SIGMOD Record, 25(2), June 1996.
[13] R. Srikant and R. Agrawal. Mining generalized association rules. Future Generation Computer Systems, 13(2-3), November 1997.