Partial rules: Another class of derivative and potentially uninteresting rules

Geoffrey I. Webb
Faculty of Information Technology
Monash University
Victoria, Australia, 3800
[email protected]

Shiying Huang
Faculty of Information Technology
Monash University
Victoria, Australia, 3800
[email protected]

ABSTRACT
A single base rule representing a fundamental inter-relationship between variables within a dataset can generate a large number of derivative rules. These are rules that appear interesting only due to the inter-relationship that is best represented by the base rule. This phenomenon is well understood with respect to derivative rules formed by adding unproductive conditions to the antecedent of the base rule. We identify a new type of derivative rule that is formed by removing conditions from the antecedent of the base rule. We present statistically sound and computationally tractable techniques for detecting such rules, and experimental evidence that for some analytic tasks many such rules can be generated.

Categories and Subject Descriptors H.2.8 [Database Management]: Database Applications— data mining; I.2.6 [Artificial Intelligence]: Learning; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

Keywords Rule discovery, Interestingness, Derivative rules

1. INTRODUCTION
Exploratory rule discovery (exploratory in the sense of exploratory data analysis [15]) finds all rules that satisfy some set of user-specified constraints with respect to some set of sample data. Exploratory rule discovery techniques include association rule discovery [1], k-optimal rule discovery [18], contrast or emerging pattern discovery [3, 8], interesting itemset discovery [10] and impact or quantitative rule discovery [2, 16, 20].

A well understood problem in exploratory rule discovery is that a single rule X → Y capturing an inter-relationship of interest between factors within the sample data can generate vast numbers of derivative rules that each appear to capture a potentially interesting inter-relationship but actually exist only by virtue of the inter-relationship captured by X → Y. For example, with respect to any rule X → Y, if there is a third factor Z that is independent of both X and Y, then, on average, X&Z → Y will appear exactly as interesting as X → Y using any measure of interest based on confidence (the proportion of the data that satisfies the antecedent that also satisfies the consequent). Techniques based on minimum improvement [4, 17] seek to identify rules that are derivative in this way. Another example arises in the case where one value is a specialization of another, or, to express this in another way, where one value is a necessary consequence of another. For instance, if there is an interesting relationship captured by the rule pregnant → Y, then, as pregnant is a specialization of female, it follows that pregnant&female → Y will cover exactly the same records from the data and hence will appear equally as interesting according to most common measures of interest. Techniques based on closed itemsets [13, 14, 19] are able to identify derivative rules of this type, as are the minimum improvement based approaches. Both of these forms of derivative rule result from the addition of unproductive clauses to the antecedent of a rule. Research has shown that large proportions of discovered rules can be derivative in these ways if steps are not taken to identify and discard them [17].

However, our experience with numerous exploratory rule discovery applications has led us to identify another class of derivative rule, one that is quite different in nature to those handled by existing techniques, and one that can also result in the 'discovery' of large numbers of spurious rules. Consider the situation where two factors A and B inter-relate to lift the frequency of factor Y when they both co-occur, but have no impact on Y when they occur in isolation. The rule A&B → Y will capture this underlying inter-relationship. However, by many measures of interest, the rules A → Y and B → Y will also appear of interest because both A and B will be individually correlated with Y. Table 1 illustrates such a situation. In a database of 16 records, y occurs in 50% of records except in the context of all of a, b, and c, for which it occurs in all records. Thus y occurs more frequently in the context of any combination of a, b and c than outside each of those contexts, and thus all seven rules with some combination of these three factors in the antecedent and y in the consequent will appear to capture a potentially interesting inter-relationship in the data. For example, the antecedent of the rule a&b → y covers 4 records, of which y appears in 3, and thus the confidence of this rule is 0.75, which represents a lift of 1.33 on the default frequency of 0.56 with which y occurs in the data as a whole.

Table 1: An artificial example
 1: (none)      5: b          9: c         13: b, c
 2: y           6: b, y      10: c, y      14: b, c, y
 3: a           7: a, b      11: a, c      15: a, b, c, y
 4: a, y        8: a, b, y   12: a, c, y   16: a, b, c, y

Such phenomena are not merely a theoretical possibility. As we show in section 5, in some real-world applications large proportions of the discovered rules can be derivative in this manner. First, however, in section 2 we define necessary terminology and notation. In section 3 we provide some detailed examples of real-world occurrences of the phenomenon. In section 4 we present techniques for identifying partial rules. We then present experimental evidence of the extent of this phenomenon in real-world data before concluding.

2. TERMINOLOGY AND NOTATION
We consider the following scenario. We have a database D that represents a finite set of entities {e1 . . . en } and a finite set of Boolean conditions C = {c1 . . . cm }, each of which is either true or false of each entity. We use Ei ⊆ C to represent the set of conditions that are true of ei . We seek rules X → y, such that X ⊆ C and y ∈ C, that satisfy some set of constraints with respect to D. X is referred to as the antecedent and y as the consequent. For convenience we limit our discussion to the case of consequents containing a single condition, as in many applications these are of particular interest, but the principles generalize in a straightforward manner to the case of multiple condition consequents. We define the set of entities covered by a set of conditions Z as follows:

coverset(Z, D) = {ei : ei ∈ D & Ei ⊇ Z}   (1)

We are typically interested in finding rules that satisfy constraints on the values of measures of potential interest of a rule such as:

coverage(X → y, D) = |coverset(X, D)|   (2)

support(X → y, D) = |coverset(X ∪ {y}, D)|   (3)

confidence(X → y, D) = support(X → y, D) / coverage(X → y, D)   (4)

lift(X → y, D) = confidence(X → y, D) / confidence(∅ → y, D)   (5)

where |·| represents the size of a set. We omit either or both of the arguments to coverset, coverage, support, confidence and lift when they are apparent from context.
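These definitions are simple to operationalize. The following is a minimal sketch (the dataset encoding and helper signatures are our own; the function definitions follow equations (1)–(5)) that reproduces the numbers quoted for Table 1:

```python
# Each entity is represented by the set of conditions true of it.
# The sixteen records of Table 1.
D = [set(), {"y"}, {"a"}, {"a", "y"}, {"b"}, {"b", "y"},
     {"a", "b"}, {"a", "b", "y"}, {"c"}, {"c", "y"},
     {"a", "c"}, {"a", "c", "y"}, {"b", "c"}, {"b", "c", "y"},
     {"a", "b", "c", "y"}, {"a", "b", "c", "y"}]

def coverset(Z, D):
    """Entities of which every condition in Z is true (equation 1)."""
    return [e for e in D if e >= set(Z)]

def coverage(X, y, D):          # equation (2)
    return len(coverset(X, D))

def support(X, y, D):           # equation (3)
    return len(coverset(set(X) | {y}, D))

def confidence(X, y, D):        # equation (4)
    return support(X, y, D) / coverage(X, y, D)

def lift(X, y, D):              # equation (5)
    return confidence(X, y, D) / confidence(set(), y, D)

# Reproduces the quoted numbers: a&b -> y covers 4 records, 3 of them ys,
# giving confidence 0.75 and a lift of 1.33 on y's default frequency 9/16.
print(confidence({"a", "b"}, "y", D))       # 0.75
print(round(lift({"a", "b"}, "y", D), 2))   # 1.33
```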

3. TWO EXAMPLES

To further elucidate the concept of partial rules and to demonstrate the practical value in detecting them, we here present two examples with reference to rules discovered from the datasets discussed in section 5. Note that we discuss here assessments of statistical significance; details of how these assessments are made are provided in sections 4 and 5.

We start with a rule discovered with respect to the shuttle dataset. The entities in this dataset represent features relevant to monitoring the space shuttle in flight. Most features are numeric and we have discretized them into three subranges each. The following rule indicates that a3=0 for 74.6% of the entities for which a1 is greater than zero. This is a substantial (and, as it turns out, statistically significant) increase on the 65.8% of all entities for which a3=0.

Rule 1: {a1>0} → a3=0, coverage=5665, support=4225, confidence=0.746.

On the face of it this is a potentially interesting rule. However, consider now the following rule.

Rule 2: {a1>0, class=1} → a3=0, coverage=4680, support=3609, confidence=0.771.

This second rule is a specialization of the first, created by adding class=1 to the antecedent. This specialization raises the confidence of the rule by an amount that turns out to be statistically significant. At face value we have two distinct rules, each of which is potentially interesting in its own right. The first rule has lower confidence but, because of its higher coverage, predicts substantially more entities with a3=0. However, consider the 985 entities in coverset({a1>0}) − coverset({a1>0, class=1}). Of these, a3=0 is true of 616, or 62.5%. This is less than the frequency with which a3=0 is true within the dataset as a whole. So, while it is true that the antecedent of rule 1 is associated with an increase in the frequency of its consequent, this is only by virtue of the association captured by its specialization, rule 2. The same is true for the other direct generalization of rule 2:

Rule 3: {class=1} → a3=0, coverage=22705, support=15061, confidence=0.663.

Both rules 1 and 3 cover large sub-populations that can be readily identified and which are, if anything, weakly negatively associated with the consequent rather than positively associated as the rules might suggest. In consequence, for many applications these rules would not be considered interesting, despite their superficial appearance of interestingness. Whereas in the current example rule 2 generates only two partial rules, where a complex inter-relationship is captured by a rule with many conditions in the antecedent there is the potential for many potentially uninteresting partial rules to be generated. To be exact, for a rule with k conditions in the antecedent, there is the potential for up to 2^k − 2 partial rules to be generated. We first became aware of this phenomenon when we were confronted with a dataset for which the majority of the rules produced by a rule discovery exercise were partial rules resulting from a very small number of base rules.

However, just because one generalization of a rule X → y is partial does not necessarily mean that all generalizations of X → y will be partial. Consider the following rules resulting from analysis of the splice-junction dataset.

Rule 4: {S30=G, S31=T, S34=G} → class=EI, coverage=340, support=322, confidence=0.947.
Rule 5: {S31=T, S34=G} → class=EI, coverage=393, support=322, confidence=0.819.
Rule 6: {S30=G, S31=T} → class=EI, coverage=478, support=379, confidence=0.793.
Rule 7: {S30=G, S34=G} → class=EI, coverage=418, support=324, confidence=0.775.

class=EI was true of 382 entities in the database of 1589 entities with respect to which these rules were assessed. All of these rules significantly improve upon the frequency of 24.0% with which class=EI occurs in the database as a whole and hence, at least superficially, appear to be potentially interesting rules. However, inspection reveals that rules 4 and 5 have identical support. Thus, of the 53 entities in coverset({S31=T, S34=G}) − coverset({S30=G, S31=T, S34=G}), class=EI is true of none. In many applications that seek positive associations, rule 5 will be of little interest given rule 4.

Next, consider rule 7. It has higher support than rule 4. Of the 78 entities in coverset({S30=G, S34=G}) − coverset({S30=G, S31=T, S34=G}), class=EI is true of 2 (2.6%). This is considerably lower than the frequency with which class=EI occurs in the dataset as a whole. Again, it is difficult to conceive of applications seeking positive associations in which rule 7 would be of interest given rule 4.

Finally, consider rule 6. Of the 138 entities in coverset({S30=G, S31=T}) − coverset({S30=G, S31=T, S34=G}), class=EI is true of 57 (41.3%). While the frequency of the consequent is lower in these entities outside those covered by rule 4, it is still considerably (and significantly) higher than the frequency of the consequent in the data as a whole, and hence rule 6 might be preferred over rule 4 in applications where higher coverage is more valuable than higher confidence. These rules suggest that S30=G and S31=T interact to increase the likelihood of class=EI, and that S34=G further strengthens this interaction, but does not do so in combination with either S30=G or S31=T alone. The appropriate rules to capture this knowledge, if only positive associations are sought, are rules 6 and 4. We turn our attention next to techniques for identifying such relationships between rules.
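The comparisons above follow directly from the quoted coverage and support counts. The following small sketch (our own helper, not part of the method itself) recomputes the excluded-entity frequencies from those counts:

```python
# Frequency of the consequent among entities covered by a generalization
# but not by its specialization, from quoted coverage/support counts.
def excluded_frequency(cov_gen, sup_gen, cov_spec, sup_spec):
    return (sup_gen - sup_spec) / (cov_gen - cov_spec)

# Rule 1 vs rule 2 (shuttle): 616 of 985 excluded entities have a3=0,
# i.e. 62.5%, below the 65.8% frequency of a3=0 overall.
print(excluded_frequency(5665, 4225, 4680, 3609))        # 0.625...

# Rule 5 vs rule 4 (splice): none of the 53 excluded entities are class=EI.
print(excluded_frequency(393, 322, 340, 322))            # 0.0

# Rule 7 vs rule 4: 2 of 78 excluded entities are class=EI (2.6%).
print(round(excluded_frequency(418, 324, 340, 322), 3))  # 0.026

# Rule 6 vs rule 4: 57 of 138 excluded entities are class=EI (41.3%),
# still well above the base frequency 382/1589 = 24.0%.
print(round(excluded_frequency(478, 379, 340, 322), 3))  # 0.413
```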

4. DETECTING PARTIAL RULES
In section 3 we provided some examples of rules that we believe would not generally be interesting to a data analyst seeking rules that capture positive associations. We argue that a rule X → y is partial with respect to another rule X ∪ {z} → y if they satisfy the following two conditions:

confidence(X ∪ {z} → y, Dom) > confidence(X → y, Dom)   (6)

|coverset(X ∪ {y}, Dom) − coverset({z}, Dom)| / |coverset(X, Dom) − coverset({z}, Dom)| ≤ |coverset({y}, Dom) − coverset(X, Dom)| / |Dom − coverset(X, Dom)|   (7)

where Dom represents the domain of data from which the sample D was drawn. (For ease of exposition we assume here that the domain is finite. The notation could trivially be extended to the case of an infinite domain, but at the cost of uninformative notational complexity.) (6) states that the frequency of y in the context of X ∪ {z} must be higher than its frequency in the context of X but not {z}. (7) states that the frequency of y in the context of X but not {z} must be no greater than its frequency outside the context of X. It might be thought that (7) should be replaced by a clause that compares the data covered by the antecedent of the generalization but not the specialization with the dataset as a whole, as per the following:

|coverset(X ∪ {y}, Dom) − coverset({z}, Dom)| / |coverset(X, Dom) − coverset({z}, Dom)| ≤ |coverset({y}, Dom)| / |Dom|   (8)

However, consider a dataset containing 1000 entities of which 500 are ys. Now consider a rule X → y with coverage = 500 and support = 300, and a rule X ∪ {z} → y with coverage = 100 and support = 100. The frequency with which y occurs in the context of X but not z is (300 − 100)/(500 − 100) = 0.5, which is exactly the frequency of y in the data as a whole. However, the frequency with which y occurs in the data that are not Xs is (500 − 300)/(1000 − 500) = 0.4, so X is positively associated with y above and beyond the effect of the association between X ∪ {z} and y.
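The arithmetic can be checked directly. A minimal sketch using the counts just described:

```python
# Check of the counterexample: 1000 entities, 500 of them ys.
n, n_y = 1000, 500
cov_X, sup_X = 500, 300        # X -> y: coverage and support
cov_Xz, sup_Xz = 100, 100      # X & {z} -> y: coverage and support

# Frequency of y in the context of X but not z.
f_excluded = (sup_X - sup_Xz) / (cov_X - cov_Xz)   # 200/400 = 0.5

# (8) compares against the frequency of y in the data as a whole ...
print(f_excluded <= n_y / n)                       # True: (8) would hold

# ... but (7) compares against the frequency of y outside the context of X.
f_outside = (n_y - sup_X) / (n - cov_X)            # 200/500 = 0.4
print(f_excluded <= f_outside)                     # False: (7) does not hold
```

Under (8) the rule X → y would thus be judged partial, even though X is positively associated with y beyond the effect of X ∪ {z}; (7) avoids this.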

4.1 Statistical tests

We wish to use sample data to draw inferences about the domain from which they were drawn. Thus it is appropriate to use a statistical hypothesis test. However, we wish to perform such a test multiple times, and hence must confront the problem that the risk of type-1 error (accepting a hypothesis when it is false) may be substantially increased. We must also guard against the risks of assessing models against the data from which they are inferred: specifically, we should expect models to perform better on the data from which they are inferred than on the domain from which the data are drawn. One solution to these problems is to repeatedly randomly perturb the data to instantiate the null hypothesis for the significance test, and then find settings of the data mining algorithm that produce acceptable levels of false discoveries on the perturbed data [12]. However, it is not apparent how one might perturb the data to produce the relevant null hypotheses for the statistical tests required for (6) and (7). In consequence, we use the holdout approach [17]. Under this approach the sample data are divided into two sets, the exploratory data and the holdout data. Exploratory rule discovery is applied to the exploratory data and the resulting models are subjected to statistical tests with respect to the holdout data. The Bonferroni adjustment is applied, so that the critical value for the tests is divided by the number of pairs of rules to be tested. By pairs of rules we mean each rule X → y to be tested together with each specialization X ∪ {z} → y against which it is to be tested. Note that it is possible to perform two tests (one each for (6) and (9)) on each pair of rules without further adjusting the critical value, as both must fall below the critical value for the rule to be rejected. Hence the multiple tests increase the risk of type-2 error (rejecting a rule that should be accepted) rather than type-1 error, and hypothesis tests seek to control the latter not the former.

Table 2: Contingency table for Fisher exact tests
        α0   α1
β0      a    b
β1      c    d

The Bonferroni adjustment is an extremely conservative adjustment that strictly controls the risk of falsely accepting any of the multiple hypotheses tested. Less conservative adjustments are available that control the expected rate of false errors, rather than the risk of any false errors [5, 6]. However, unlike the Bonferroni adjustment, these generally assume that the tests are independent of one another, an assumption that is strongly violated in the current context.

The next problem that we confront is that hypothesis tests cannot test for equality in frequencies between two groups, as they can never rule out the possibility that the frequencies differ by a margin that is too small to detect in whatever size sample is available. In consequence, rather than (7) we test for

|coverset(X ∪ {y}, Dom) − coverset({z}, Dom)| / |coverset(X, Dom) − coverset({z}, Dom)| < |coverset({y}, Dom) − coverset(X, Dom)| / |Dom − coverset(X, Dom)|   (9)

We use a one-tailed Fisher exact test for (6) and (9). With respect to two binary conditions α and β and the resulting four possible outcomes illustrated in Table 2, a one-tailed Fisher exact test calculates, as follows, the probability p that the observed number of co-occurrences of α0 and β0, or greater, would occur by chance: the number of ways in which αs and βs can be combined without altering the row or column sums or decreasing the value of a is divided by the total number of ways in which αs and βs can be combined without altering the row or column sums. While the Fisher exact test has a reputation for high computational complexity, and indeed it is infeasible to calculate by hand for any but the simplest of problems, its complexity is polynomial and its computational demands are negligible compared with many of the computations required during exploratory rule discovery. To assess (6) we set the contingency table for the Fisher exact test as illustrated in Table 3. To assess (9) we set the contingency table as illustrated in Table 4.

Table 3: Contingency table for (6)
                    coverset(X) − coverset({z})   coverset(X) ∩ coverset({z})
coverset({y})       a                             b
D − coverset({y})   c                             d

Table 4: Contingency table for (9)
                    coverset(X) − coverset({z})   D − coverset(X)
coverset({y})       a                             b
D − coverset({y})   c                             d
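In practice the test is available off the shelf. The following is a sketch assuming SciPy is available; scipy.stats.fisher_exact stands in for the hand computation described above, and the table-orientation convention is ours:

```python
# A sketch of the one-tailed Fisher exact test using SciPy.
from scipy.stats import fisher_exact

def one_tailed_p(a, b, c, d):
    """p that the top-left cell of [[a, b], [c, d]] would be as large as
    observed, or larger, by chance, with row and column sums held fixed."""
    _, p = fisher_exact([[a, b], [c, d]], alternative='greater')
    return p

# To assess (6), fill a, b, c, d from Table 3; to assess (9), from Table 4.
# In each case orient the table so that the direction being tested
# corresponds to a large top-left cell.
```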

4.2 Proposed method
We wish to assess each of the rules X → y discovered with respect to the exploratory data. To do so we test, for each condition z ∈ C − X − {y}, whether X → y passes (6) and (9) with respect to X ∪ {z} → y and the holdout data. Let s equal the number of ⟨X → y, X ∪ {z} → y⟩ pairs. We apply the Bonferroni correction by using a critical value of 0.05/s. Each rule X → y is rejected if the p values for (6) and (9) are both less than or equal to 0.05/s for any z ∈ C − X − {y}.
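A sketch of this assessment loop follows. It assumes rules are represented as (X, y) pairs with X a frozenset of conditions, C is the set of all conditions, and the two helpers test_condition_6 and test_condition_9 are hypothetical stand-ins returning Fisher exact p values computed against the holdout data as per Tables 3 and 4:

```python
def assess_rules(rules, C, holdout, alpha=0.05):
    """Return the rules that are not rejected as partial."""
    # s = number of <X -> y, X + {z} -> y> pairs to be tested.
    s = sum(len(C - X - {y}) for X, y in rules)
    critical = alpha / s                    # Bonferroni adjustment
    surviving = []
    for X, y in rules:
        rejected = False
        for z in C - X - {y}:
            p6 = test_condition_6(X, y, z, holdout)  # hypothetical helper
            p9 = test_condition_9(X, y, z, holdout)  # hypothetical helper
            # Reject as partial only if BOTH p values fall at or below
            # the critical value.
            if p6 <= critical and p9 <= critical:
                rejected = True
                break
        if not rejected:
            surviving.append((X, y))
    return surviving
```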

4.2.1 Feasibility

To test each rule against m comparators might appear computationally daunting. However, we are only testing rules that are returned by the rule discovery process, not all rules considered during rule discovery. Hence, the number of rules considered will usually be relatively constrained.

An extremely loose upper bound on the time complexity of calculating each of (6) and (9) is provided by O(n²). Each of the r rules is tested against fewer than m specializations, so an extremely loose upper bound on the time complexity of this process is provided by O(rmn²). In practice, the compute time required by this process is negligible.

4.2.2 Higher-order interactions

This approach will only test for the possibility of a rule X → y being partial with respect to its immediate specializations, rules created by adding a single condition to X. However, it is possible that a rule might fail (6) and (9) with respect to a specialization created by adding two conditions z1 and z2, or more. If X → y fails (6) and (9) with respect to X ∪ {z1, z2} → y, then there is a strong possibility that it will also fail (6) and (9) with respect to X ∪ {z1} → y or X ∪ {z2} → y, because these are also likely to be partial rules, with their confidence increased by virtue of the inter-relationship between X, z1, z2 and y. However, there remain the risks that both these intermediate rules might act to suppress the frequency of y, or that, while (6) and (9) still hold, the p values for the statistical tests applied to the intermediate rules happen to fall above the critical value while they would fall below it if applied to X ∪ {z1, z2} → y. Despite these risks, there are strong practical reasons not to investigate beyond the immediate specializations of a rule. The first is computational: the complexity of testing all two-condition specializations would increase to O(rm²n²). Perhaps a more critical issue, however, is the difficulty of controlling the risk of type-1 error. The number of tests that would need to be performed would be of order O(rm²). The critical value that would result from a Bonferroni adjustment of this magnitude would be extremely low and hence many partial rules would fail to be detected.

5. EXPERIMENTS

We have established that partial rules can be detected in a computationally feasible and statistically sound manner. The remaining issue we wish to explore is how much of an issue partial rules are in practice. To this end we conducted experiments using nine larger datasets from the UCI ML and KDD repositories [7, 9] together with the BMS-WebView-1 dataset [11]. These are all datasets used in previous rule-discovery research. They are described in Table 5.

Table 5: Data sets
Dataset              Entities   Conditions   Description
BMS-WebView-1        59,602     497          Basket data
Covtype              581,012    125          Geographic forest vegetation data
IPUMS LA 99          88,443     1,883        Census data
KDDCup98             52,256     4,244        Mailing list profitability data
Letter Recognition   20,000     74           Image recognition data
Mush                 8,124      127          Biological data
Shuttle              58,000     34           Space shuttle mission data
Soybean Large        683        119          Soybean disease data
Splice Junction      3,177      243          Gene sequence data
TICDATA 2000         5,822      709          Insurance policy holder data

Each record from a dataset was treated as an entity. Quantitative attributes were discretized into three subranges each, such that each subrange contained as close as possible to one-third of the entities in the exploratory data. Each value of each attribute in the resulting qualitative data was treated as a condition that was true of each entity for which the relevant attribute assumed the given value.

The Magnum Opus rule discovery system (commercial software distributed by Rulequest Research, http://www.rulequest.com) was applied to each dataset twice. The first time it was applied to find the 1000 rules with the highest support within the constraint that lift ≥ 2.0. The constraint on lift ensures that we are considering only rules that are likely to be interesting. We seek 1000 rules as, in our experience, this represents an upper bound on the number of rules that an analyst is likely to be interested in considering for a typical rule discovery task. We seek the rules with the highest support in order to emulate the widely used frequent itemset approach to rule discovery. To assess the effect of the quality of the rules found, this process was repeated with minimum lift set to 1.5, resulting in a set of rules that tended to have higher support but lower lift. We assumed that this would provide greater opportunity for specializations to occur with higher levels of lift than that of the selected rules.

Table 6 presents for each dataset the minimum support that resulted and the number of resulting rules that were assessed as partial by the approach described above, for each value of lift.

Table 6: Number of partial rules
                     Lift=1.5                         Lift=2.0
Dataset              Min. support  No. partial rules  Min. support  No. partial rules
BMS-WebView-1        68            0                  67            0
Covtype              153,223       60                 130,398       0
IPUMS LA 99          25,866        0                  19,496        136
KDDCup98             14,889        0                  11,209        0
Letter Recognition   991           16                 697           51
Mush                 1,323         98                 1,137         192
Shuttle              2,538         216                2,167         117
Soybean Large        164           0                  147           0
Splice Junction      140           21                 116           31
TICDATA 2000         1,710         0                  1,370         0

The numbers vary greatly from dataset to dataset. The rules discovered for half of the experimental conditions include no rules assessed as partial. Note that this in part reflects the conservatism of our assessment methodology. For example, seven of the rules discovered for BMS-WebView-1 with min-lift=2.0 would have been assessed as partial had the Bonferroni adjustment not been applied. For each level of lift, partial rules were detected for five of the ten datasets, although, surprisingly, the specific datasets varied slightly with the differing settings for lift. In the most extreme case, over 21% of all rules were assessed as partial. No obvious relationship is apparent in the results between the number of rules assessed as partial and any of the number of entities or the number of conditions in the data, the minimum support, or the minimum lift.

6. FUTURE RESEARCH
Our approach is extremely conservative, using a Bonferroni correction for a very large number of tests, and hence carries a high risk that rules that are partial will fail to be detected because the p value from the statistical test does not fall below the extremely low resulting critical value. There is much potential for the development of more sensitive techniques that are able to identify more partial rules. Our approach is also limited by considering only whether a rule is partial with respect to its immediate specializations. There is scope for investigating techniques for detecting rules that are partial with respect to higher-level specializations.

7. CONCLUSIONS
Derivative rules have long been identified as a serious issue for exploratory rule discovery systems. Previous research has identified and addressed the potential for derivative rules to result from the addition of unproductive clauses to the antecedents of the rules of central interest. In a number of applications, however, we have encountered difficulties with large numbers of derivative rules resulting from the deletion of clauses from rules of central interest. In this paper we have formalized this notion, defining partial rules, a type of rule that is unlikely to be of interest when rules representing positive associations are sought. We have presented statistically sound and computationally tractable techniques for identifying partial rules. Experiments demonstrated that the degree to which partial rules are an issue varies greatly from analytic task to analytic task. While for many tasks no partial rules were detected, in other circumstances more than 21% of all rules discovered were partial. We have shown that at least some partial rules are simple to detect and, if undetected, have the potential to seriously mislead the data analyst. In consequence, we recommend consideration of these techniques during exploratory discovery of positive rules.

8. ACKNOWLEDGMENTS
This research has been supported by Australian Research Council Grant DP0450219. We are grateful to Janice Boughton for assistance in the collation of experimental results and for comments on a draft of this paper.

9. REFERENCES
[1] R. Agrawal, T. Imielinski, and A. Swami. Mining associations between sets of items in massive databases. In Proc. 1993 ACM-SIGMOD Int. Conf. Management of Data, pages 207–216, Washington, DC, May 1993.
[2] Y. Aumann and Y. Lindell. A statistical theory for quantitative association rules. In Proc. Fifth ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining (KDD-99), pages 261–270, 1999.
[3] S. D. Bay and M. J. Pazzani. Detecting group differences: Mining contrast sets. Data Mining and Knowledge Discovery, 5(3):213–246, 2001.
[4] R. J. Bayardo, Jr., R. Agrawal, and D. Gunopulos. Constraint-based rule mining in large, dense databases. Data Mining and Knowledge Discovery, 4(2/3):217–240, 2000.
[5] Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: A new and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B, 57:289–300, 1995.
[6] Y. Benjamini and W. Liu. A step-down multiple hypotheses testing procedure that controls the false discovery rate under independence. The Journal of Statistical Planning and Inference, 82:163–170, 1999.
[7] C. Blake and C. J. Merz. UCI repository of machine learning databases. [Machine-readable data repository]. University of California, Department of Information and Computer Science, Irvine, CA, 2004.
[8] G. Dong and J. Li. Efficient mining of emerging patterns: Discovering trends and differences. In Proc. Fifth ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining (KDD-99), pages 15–18. ACM, 1999.
[9] S. Hettich and S. D. Bay. The UCI KDD archive. [http://kdd.ics.uci.edu]. Irvine, CA: University of California, Department of Information and Computer Science, 2004.
[10] S. Jaroszewicz and D. A. Simovici. Interestingness of frequent itemsets using Bayesian networks as background knowledge. In R. Kohavi, J. Gehrke, and J. Ghosh, editors, KDD-2004: Proc. Tenth ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 178–186, New York, NY, August 2004. ACM Press.
[11] R. Kohavi, C. Brodley, B. Frasca, L. Mason, and Z. Zheng. KDD-cup 2000 organizers' report: Peeling the onion. SIGKDD Explorations, 2(2):86–98, 2000.
[12] N. Megiddo and R. Srikant. Discovering predictive association rules. In Proc. Fourth Int. Conf. Knowledge Discovery and Data Mining (KDD-98), pages 27–78, Menlo Park, US, 1998. AAAI Press.
[13] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In Proc. Seventh Int. Conf. Database Theory (ICDT'99), pages 398–416, Jerusalem, Israel, January 1999.
[14] J. Pei, J. Han, and R. Mao. CLOSET: An efficient algorithm for mining frequent closed itemsets. In Proc. 2000 ACM-SIGMOD Int. Workshop on Data Mining and Knowledge Discovery (DMKD'00), pages 21–30, Dallas, TX, May 2000.
[15] J. W. Tukey. Exploratory Data Analysis. Addison-Wesley, Reading, Mass., 1977.
[16] G. I. Webb. Discovering associations with numeric variables. In Proc. Seventh ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining (KDD-2001), pages 383–388, New York, NY, 2001. The Association for Computing Machinery.
[17] G. I. Webb. Preliminary investigations into statistically valid exploratory rule discovery. In Proc. Australasian Data Mining Workshop (AusDM03), pages 1–9. University of Technology, Sydney, 2003.
[18] G. I. Webb and S. Zhang. K-optimal rule discovery. Data Mining and Knowledge Discovery, 10(1):39–79, 2005.
[19] M. J. Zaki. Generating non-redundant association rules. In Proc. Sixth ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining (KDD-2000), pages 34–43, New York, NY, August 2000. ACM.
[20] H. Zhang, B. Padmanabhan, and A. Tuzhilin. On the discovery of significant statistical quantitative rules. In Proc. Tenth Int. Conf. Knowledge Discovery and Data Mining (KDD-2004), pages 374–383, New York, NY, August 2004. ACM Press.
