Expert Systems with Applications 29 (2005) 691–699 www.elsevier.com/locate/eswa
Mining action rules from scratch*
Zengyou He a,*, Xiaofei Xu a, Shengchun Deng a, Ronghua Ma b
a Department of Computer Science and Engineering, Harbin Institute of Technology, 92 West Dazhi Street, P.O. Box 315, Harbin 150001, People’s Republic of China
b Nanjing Institute of Geography and Limnology, CAS, Nanjing 210008, People’s Republic of China
Abstract Action rules provide hints to a business user as to what actions (i.e. changes within some values of flexible attributes) should be taken to improve the profitability of customers, that is, which actions re-classify some customers from a less desired decision class to a more desired one. However, in previous work, each action rule was constructed from two rules, extracted earlier, defining different profitability classes. In this paper, we make a first step towards formally introducing the problem of mining action rules from scratch and present formal definitions. In contrast to previous work, our formulation provides a guarantee on verifying the completeness and correctness of discovered action rules. In addition to formulating the problem from an inductive learning viewpoint, we provide a theoretical analysis of the complexities of the problem and its variations. Furthermore, we present efficient algorithms for mining action rules from scratch. In an experimental study we demonstrate the usefulness of our techniques. © 2005 Elsevier Ltd. All rights reserved.
Keywords: Action rule; Positive example; Negative example; Association rules; Flexible attributes; Stable attributes; Data mining; CRM
* This work was supported by the High Technology Research and Development Program of China (No. 2003AA413310, No. 2003AA413021) and the IBM SUR Research Fund.
* Corresponding author. Tel./fax: +86 4516414906. E-mail addresses: [email protected] (Z. He), [email protected] (X. Xu), [email protected] (S. Deng).
0957-4174/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2005.04.031
1. Introduction Data mining, as well as its synonym knowledge discovery, is frequently referred to in the literature as the process of extracting interesting information or patterns from large databases. There are two major issues in data mining research and applications: patterns and interest. The techniques of pattern discovery include classification, association, outlier detection and clustering. Interest refers to patterns in business applications being useful or meaningful. Data mining may also be viewed as the process of turning the data into information, the information into action, and the action into value or profit (Ling, Chen, Yang, & Chen, 2002; Wang, Zhou, & Han, 2002). Other than general measures and domain specific measures, an important measure of interestingness is whether or not a pattern can be used in the decision making process of a business to increase its profit. Recent research efforts (Brijs, Goethals, Swinnen, Vanhoof, & Wets, 2000; Cleinberg, Papadimitriou, & Raghavan, 1998; Elovici & Braha, 2003; Liu, Hsu, & Ma, 2001; Lin, Yao, & Louie, 2002; Ling et al., 2002; Ras & Wieczorkowska, 2000; Wang & Su, 2002; Wang et al., 2002; Yang & Cheng, 2002; Yang, Yin, Lin, & Chen, 2003; Padmanabhan & Tuzhilin) focused more on the latter part of the process, i.e. mining those actionable patterns that the user can act on to his advantage. Hence, it is no longer enough for rule-discovery algorithms, for example, to only report a large number of rules. A successful analytic solution must make it easier for the user to grasp the significance of these rules in the context of the business action plan. Extensive research in rule discovery has been done. Despite such phenomenal success, most of these techniques stop short of the final objective of data mining: providing possible actions to maximize profit while reducing costs. While these techniques are essential to move the rules to the eventual application, they nevertheless require a great deal of manual effort by experts to post-process the mined rules. Most of the post-processing techniques have been limited to designing interestingness measures, but they do not directly suggest actions that would lead to an increase in an objective such as profit.
Z. He et al. / Expert Systems with Applications 29 (2005) 691–699
In Ras and Wieczorkowska (2000), the notion of an action rule was proposed. It is assumed that attributes in a database are divided into two groups: stable and flexible. By stable attributes we mean attributes whose values cannot be changed (age, marital status, and number of children are examples). On the other hand, attributes whose values can be changed or influenced (like percentage rate or loan approval to buy a house in a certain area) are called flexible. Rules are extracted from a decision table, giving preference to flexible attributes. In general, an action rule provides hints to a business user as to what changes within flexible attributes are needed to reclassify some customers from a lower profitability class to a higher one. However, to our knowledge, previous works construct each action rule from two classification rules extracted earlier and do not provide a formal definition of the problem of action rule mining. Hence, some meaningful action rules may be missed by these classification-based techniques, and existing algorithms cannot specify when and how the correct and complete set of underlying action rules is discovered.
In this paper, we make a first step towards formally introducing the problem of mining action rules from scratch and present formal definitions. In contrast to previous work, our formulation explicitly states it as a search problem in a support-confidence-cost framework and provides a guarantee on verifying the completeness and correctness of discovered action rules. In addition to formulating the problem, we provide a theoretical analysis of the complexities of the problem and its variations. Furthermore, we present efficient algorithms, designed in an Apriori-like (Agrawal & Srikant, 1994) fashion, for mining action rules from scratch. In an experimental study, we demonstrate the usefulness of our techniques.
The organization of this paper is as follows. First, we review related work that has appeared in the literature in Section 2. Problem formulation and complexity analysis are provided in Sections 3 and 4, respectively. Section 5 presents the algorithms for mining action rules from scratch. Empirical studies are provided in Section 6, and Section 7 concludes with remarks.
2. Related work The notion of an action rule was first proposed in Ras and Wieczorkowska (2000) and extended in Ras and Tsay (2003). Each action rule was constructed from two rules, extracted earlier, defining different profitability classes. More precisely, action rules are discovered after a rough set based classification process in Ras and Tsay (2003) and Ras and Wieczorkowska (2000). Ling et al. (2002) and Yang et al. (2003) discussed techniques for post-processing decision trees to maximize expected net profit. They also consider how to take actions to re-classify some customers from a less desired decision class to a more desired one. More importantly, they explicitly incorporate the cost of each action in maximizing expected net profit. However, their methods are tightly coupled with decision trees and do not provide rules explicitly, which makes them hard to use in practice. Although a decision tree is employed in Ling et al. (2002) and Yang et al. (2003) and a rough set based classification is utilized in Ras and Tsay (2003) and Ras and Wieczorkowska (2000), despite their apparent difference in the choice of classification technique, both lines of work can be regarded as classification-based action rule mining techniques. As we have argued in Section 1, if action rules are not mined from scratch, some meaningful action rules could be missed by such classification-based techniques, and existing algorithms cannot specify when and how the correct and complete set of underlying action rules could be discovered.
Yang and Cheng (2002) presented an approach that uses ‘role models’ for generating advice and plans. These role models are typical cases that form a case base and can be used for customer advice generation. For each new customer seeking advice, a nearest-neighbor algorithm is used to find a cost-effective and highly probable plan for switching the customer to the most desirable role model. However, such a method incurs high computation costs in generating actions. Furthermore, actions generated from the nearest neighbor alone will sometimes miss better, lower-cost actions.
Class association rules (CARs) are a small subset of association rules whose right hand sides are restricted to the class label. CARs were first proposed by Liu, Hsu, and Ma (1998), who integrate the techniques of association rule mining and classification rule mining by focusing on mining a special subset of association rules. The discovered CARs are then used to build a classifier. Li, Han, and Pei (2001) proposed another algorithm that uses multiple class association rules. The purpose of these two approaches is to construct classifiers, rather than to find interesting re-classification rules. Other research, e.g. Liu et al. (2001), mainly focuses on pruning rules based on predefined interestingness criteria and is thus beyond the scope of this paper.
3. Problem formulation
Definition 1. Consider a discrete finite attribute space of $m$ dimensions $E = D_1 \times D_2 \times \cdots \times D_m$, where $D_j$ ($j = 1, 2, \ldots, m$) represents a set of symbolic values whose cardinality is denoted by $|D_j|$. The attribute set of objects in $E$ is $A = \{A_1, A_2, \ldots, A_m\}$, where the value range of the $j$th attribute $A_j$ is $D_j$. $e_1^*, e_2^*, \ldots, e_n^*$ are examples or objects of $E$, where $* \in \{+, -\}$. A positive example of $E$ is denoted by $e_i^+ = (v_{i1}, v_{i2}, \ldots, v_{im})$, where $v_{ij} \in D_j$. All the other examples are negative examples. $PE$ and $NE$ are the positive example set and the negative example set, respectively:

$$PE = \{e_1^+, e_2^+, \ldots, e_{K_p}^+\}, \quad e_i^+ = (v_{i1}, v_{i2}, \ldots, v_{im}), \quad K_p \le n, \quad 1 \le i \le K_p$$

$$NE = \{e_1^-, e_2^-, \ldots, e_{K_n}^-\}, \quad e_j^- = (v_{j1}, v_{j2}, \ldots, v_{jm}), \quad K_n \le n, \quad 1 \le j \le K_n$$

In the action rule mining literature, it is assumed that the attributes in $A$ are divided into two groups: stable, denoted as $S$, and flexible, denoted as $F$. By stable attributes, we mean attributes whose values cannot be changed. On the other hand, those attributes whose values can be changed or influenced are called flexible. Moreover, $PE$ represents the set of most profitable customers and $NE$ represents the less profitable customers. Clearly, the goal is to increase profit by shifting some customers from the group $NE$ to $PE$. The discovered action rules can provide clues on how the values of flexible attributes of some customers can be changed so that these customers can be moved from $NE$ to $PE$.

Definition 2. Suppose $PE$ and $NE$ are the positive example set and negative example set of $E$, and $A = \{A_1, A_2, \ldots, A_m\}$ is the attribute set of $E$. We say that $PE$ and $NE$ are consistent with respect to $A$ if $PE \cup NE = E$ and $PE \cap NE = \emptyset$.

Definition 3. A selector is an expression of the form $[A_j = v_j]$, $v_j \in D_j$. A formula or complex $L$ is a conjunction of selectors, expressed as $L = \bigwedge_{j \in J} [A_j = v_j]$, where $J \subseteq M$, $M = \{1, 2, \ldots, m\}$. The length of formula $L$ is $|J|$.

Definition 4. A positive example $e^+$ is said to satisfy formula $L = \bigwedge_{j \in J} [A_j = v_j]$ under the background of a negative example $e^-$ if and only if $e^+$ satisfies $L$ while $e^-$ does not. $e^+$ satisfies formula $L$ under the background of negative set $NE$ if and only if $e^+$ does so under the background of each $e_i^-$, where $1 \le i \le K_n$. Similarly, $e^-$ satisfies formula $L$ under the background of positive set $PE$ if and only if $e^-$ does so under the background of each $e_i^+$, where $1 \le i \le K_p$.

Definition 5. An action rule is an expression of the form $NL \rightarrow PL$, where $NL = \bigwedge_{j \in J} [A_j = v_j]$ and $PL = \bigwedge_{j \in J} [A_j = w_j]$ are formulas that have the same structure. An action rule $NL \rightarrow PL$ is said to be valid if and only if:
(1) $e^-$ satisfies formula $NL$ under the background of positive set $PE$;
(2) $e^+$ satisfies formula $PL$ under the background of negative set $NE$.
The length of the action rule is $|J|$, which is equal to the length of $NL$ and $PL$. For both $e^-$ in (1) and $e^+$ in (2), we say that they satisfy $NL \rightarrow PL$.
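To make Definitions 3 to 5 concrete, the following is a minimal Python sketch, not the algorithm of this paper. It assumes that examples and formulas are represented as dictionaries from attribute names to values, and it reads conditions (1) and (2) as requiring that at least one negative (respectively positive) example satisfies the formula under the given background; that existential reading, the function names, and the toy attributes `age` and `rate` are all assumptions made for illustration.

```python
from typing import Dict, List

Example = Dict[str, str]  # attribute name -> symbolic value (assumed encoding)
Formula = Dict[str, str]  # conjunction of selectors [A_j = v_j]


def satisfies(example: Example, formula: Formula) -> bool:
    """True if the example matches every selector of the formula."""
    return all(example.get(attr) == value for attr, value in formula.items())


def satisfies_under_background(example: Example, formula: Formula,
                               background: List[Example]) -> bool:
    """Definition 4: the example satisfies the formula and no background example does."""
    return satisfies(example, formula) and not any(
        satisfies(other, formula) for other in background)


def is_valid(nl: Formula, pl: Formula,
             pe: List[Example], ne: List[Example]) -> bool:
    """Definition 5, read existentially (an assumption of this sketch)."""
    if nl.keys() != pl.keys():  # NL and PL must have the same structure
        return False
    cond1 = any(satisfies_under_background(e, nl, pe) for e in ne)
    cond2 = any(satisfies_under_background(e, pl, ne) for e in pe)
    return cond1 and cond2


# Hypothetical toy data: 'age' plays the role of a stable attribute, 'rate' a flexible one.
PE = [{"age": "young", "rate": "low"}, {"age": "old", "rate": "low"}]
NE = [{"age": "young", "rate": "high"}, {"age": "old", "rate": "high"}]

NL = {"age": "young", "rate": "high"}
PL = {"age": "young", "rate": "low"}
print(is_valid(NL, PL, PE, NE))
```

In this toy data the rule that lowers `rate` for young customers is valid, whereas a rule whose two sides are both `[age = young]` is not, because that formula is satisfied by examples in both PE and NE.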
When an action rule is applied to an example/customer, it is assumed that the value of attribute $A_j$ has been changed from $v_j$ to $w_j$. In other words, the property $[A_j = v_j]$ of a customer has been changed to the property $[A_j = w_j]$. This change, in which the value of an attribute is transformed from $v_j$ to $w_j$, corresponds to an action in an action rule. Note that if $v_j = w_j$ holds, the value of attribute $A_j$ has not been changed. Clearly, in a valid action rule, $NL \neq PL$. Otherwise, $PE$ and $NE$ are not consistent.
One might ask why stable attributes should be included in an action rule. The answer lies in the fact that many stable attributes are important in constructing valid and useful action rules when their attribute values are not changed, i.e. $v_j = w_j$. Furthermore, if some stable attribute has its value changed in an action rule, such a rule is clearly not meaningful, since changing the value of a stable attribute is not allowed, and it should be pruned according to the cost of the action rule, as introduced in the following.
Actions in each action rule incur costs. For instance, to change the property $[A_j = v_j]$ of a customer to the property $[A_j = w_j]$, the company has to spend money. We use $\mathrm{cost}\{[A_j = v_j] \rightarrow [A_j = w_j]\}$ to denote the money spent. Ignoring this fact would limit the practical use of action rules, since some action rules can be too costly to be useful. These costs have to be determined by domain experts. Since stable attributes cannot be changed with any reasonable amount of money, the cost of changing a stable attribute is set to a very large number. Obviously, $\mathrm{cost}\{[A_j = v_j] \rightarrow [A_j = w_j]\} = 0$ holds when $v_j = w_j$. The total cost of an action rule can be defined as the sum of the costs of the actions in it, as presented in Definition 6.

Definition 6. The cost of an action rule $NL \rightarrow PL$ is defined as follows:

$$\mathrm{cost}\{NL \rightarrow PL\} = \sum_{j \in J} \mathrm{cost}\{[A_j = v_j] \rightarrow [A_j = w_j]\}$$
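The cost in Definition 6 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the paper's implementation: the per-action costs are taken from a hypothetical expert-supplied table `cost_table`, the constant `STABLE_COST` stands in for the "very large number" assigned to changes of stable attributes, and the attribute names are invented for the example.

```python
from typing import Dict, Set, Tuple

Formula = Dict[str, str]                      # attribute name -> value
CostTable = Dict[Tuple[str, str, str], float]  # (attribute, old value, new value) -> cost

STABLE_COST = 1e9  # hypothetical stand-in for "a very large number"


def action_cost(attr: str, old: str, new: str,
                cost_table: CostTable, stable: Set[str]) -> float:
    """Cost of the single action [attr = old] -> [attr = new]."""
    if old == new:
        return 0.0           # no change, no cost
    if attr in stable:
        return STABLE_COST   # changing a stable attribute is prohibitively expensive
    return cost_table[(attr, old, new)]  # expert-supplied cost (assumed to be present)


def rule_cost(nl: Formula, pl: Formula,
              cost_table: CostTable, stable: Set[str]) -> float:
    """Definition 6: sum of the costs of all actions in NL -> PL."""
    return sum(action_cost(a, nl[a], pl[a], cost_table, stable) for a in nl)


# Hypothetical example: 'age' is stable, 'rate' is flexible.
costs = {("rate", "high", "low"): 50.0}
nl = {"age": "young", "rate": "high"}
pl = {"age": "young", "rate": "low"}
print(rule_cost(nl, pl, costs, {"age"}))
```

With a user-chosen cost threshold well below `STABLE_COST`, any rule that changes a stable attribute exceeds the threshold and is pruned, which matches the pruning argument given above.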
Besides the cost of an action rule, we also develop the objective measures support and confidence to evaluate the interestingness of action rules.

Definition 7. The support of an action rule $NL \rightarrow PL$ with respect to $NE$ is defined as the percentage of negative examples in $NE$ that satisfy $NL$ under the background of positive set $PE$, i.e. $\mathrm{SupN}\{NL \rightarrow PL\} = |\{e^- \mid e^- \text{ satisfies } NL \text{ under the background of positive set } PE\}| / |NE|$. Similarly, the support of an action rule $NL \rightarrow PL$ with respect to $PE$ is defined as the percentage of positive examples in $PE$ that satisfy $PL$ under the background of negative set $NE$, i.e. $\mathrm{SupP}\{NL \rightarrow PL\} = |\{e^+ \mid e^+ \text{ satisfies } PL \text{ under the background of negative set } NE\}| / |PE|$.
For brevity, in the remainder of the paper we use negative support and positive support to refer to $\mathrm{SupN}\{NL \rightarrow PL\}$ and $\mathrm{SupP}\{NL \rightarrow PL\}$, respectively.
If an action rule has larger support with respect to $NE$, this indicates that the rule can be applied to a larger portion of customers in $NE$. Similarly, $\mathrm{SupP}\{NL \rightarrow PL\}$ measures whether the rule is well supported on $PE$. Furthermore, from the viewpoint of utilizing the action rule to re-classify customers, positive support provides hints on how likely the rule is to succeed when used for re-classification. Hence, both support measures are useful in applications.

Definition 8. Let $NL \rightarrow PL$ be an action rule, and let $O$ be the set of negative examples that satisfy $NL$ under the background of positive set $PE$, i.e. $O = \{e^- \mid e^- \text{ satisfies formula } NL \text{ under the background of positive set } PE\}$. Assume that $T$ is the set of examples produced by applying $NL \rightarrow PL$ to $O$, i.e. the attribute values of each $e^-$ in $O$ have been changed accordingly. Clearly, $|O| = |T| = \mathrm{SupN}\{NL \rightarrow PL\} \cdot |NE|$. The examples in $T$ can be further divided into two sets, $TP$ and $TU$, where $TP \cup TU = T$ and $TP \cap TU = \emptyset$, $TP \subseteq PE$ and $TU$