Expert Systems with Applications 29 (2005) 867–878 www.elsevier.com/locate/eswa

Prioritization of association rules in data mining: Multiple criteria decision approach

Duke Hyun Choi a,*, Byeong Seok Ahn b, Soung Hie Kim a

a Graduate School of Management, Korea Advanced Institute of Science and Technology (KAIST), 207-43 Cheongryangri-Dong, Dongdaemun, Seoul 130-012, South Korea
b Department of Business Administration, Hansung University, 389 Samsun-Dong 2, Sungbuk, Seoul 136-792, South Korea

Abstract

Data mining techniques, which extract patterns from large databases, focus on the automatic exploration and analysis of large quantities of raw data in order to discover meaningful patterns and rules. In applying these methods, most business managers encounter a multitude of rules resulting from the data mining technique. Because such rules are multi-faceted, they are generally characterized by multiple conflicting criteria that are directly related to business values, such as expected monetary value or incremental monetary value. In this paper, we present a method for rule prioritization that takes into account business values comprising objective metrics or managers' subjective judgments. The proposed methodology is an attempt to create synergy between decision analysis techniques and data mining. We believe that this approach would be particularly useful for business managers who suffer from rule quality or quantity problems, conflicts between extracted rules, and difficulties in building a consensus when several managers are involved in rule selection. © 2005 Elsevier Ltd. All rights reserved.

Keywords: Rule prioritization; Rule conflict; Association rule mining; ELECTRE

* Corresponding author. Tel.: +82 2 958 3684; fax: +82 2 958 3604. E-mail addresses: [email protected] (D.H. Choi), bsahn@hansung.ac.kr (B.S. Ahn), [email protected] (S.H. Kim).
0957-4174/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2005.06.006

1. Introduction

Under intense competition that forces companies to develop and maintain competitive marketing activities, techniques from both data mining and decision analysis have been used extensively in the development of computerized decision aids. Data mining techniques, which extract patterns from large databases, focus on the automatic exploration and analysis of large quantities of raw data in order to discover meaningful patterns and rules (Agrawal, Imielinski, & Swami, 1993; Agrawal & Srikant, 1994; Song, Kim, & Kim, 2001). Decision analysis, on the other hand, is concerned with applying decision theory to real-world problems to help companies through the decision making process, in which decision makers' preferences, which are important for judging the desirability of alternative outcomes, are associated with criteria in a problem solving or decision making situation (White, 1990).

Although developed separately, data mining and decision analysis share common conceptual ground: both provide systematic methods for problem solving and decision making in terms of objectives (decision aiding) and delivery vehicle (the computer). From the perspective of many practical decision aiding applications, however, both data mining and decision analysis techniques have limitations. In particular, in decision support system development there has been little effort to generate synergies by letting each complement the other's limitations. More specifically, user preferences, which play a key role in decision aids based on decision analysis, are not explicitly considered in the current generation of data mining systems. Even where they are (indirectly) addressed, they are confined to the preferences of the data mining engineers, expressed through threshold values, rather than the decision makers' preferences, which should be continuously adjusted to the current dynamic business environment. Normative decision analysis, on the other hand, is usually built around a prescriptive and rigid problem


structure called a decision analysis model. Decision analysis is not well suited to extracting knowledge from today's large corporate databases, since it does not focus on the automatic generation of meaningful knowledge from raw data. It is therefore meaningful to complement data mining techniques with those from decision analysis: data mining for generating alternatives (i.e. rules) from large databases, and decision analysis for prioritizing those alternatives by reflecting decision makers' preferences.

In existing data mining practice, there are situations that make it necessary to prioritize rules so as to select and concentrate on the more valuable ones, owing to the large number of qualified rules (Tan & Kumar, 2000) and limited business resources. Even though the purpose of data mining is to extract rules (patterns) that are valuable for decision making, patterns are deemed 'interesting' merely on the basis of passing certain statistical tests such as support and confidence. To the enterprise, however, it remains unclear how such patterns can be used to maximize business objectives. The major obstacle lies in the gap between the statistic-based summaries (statistic-based pattern extraction) produced by traditional rule mining and the profit-driven action (value-based decision making) required by business decision making (Wang, Zhou, & Han, 2002), which is characterized by explicit consideration of conflicts between business objectives (Bar & Feigenbaum, 1981; Hayes-Roth, Waterman, & Lenat, 1983) and by the involvement of multiple decision makers in corporate decision making.

In summary, the research objective is to prioritize association rules resulting from data mining, taking into account their business values by explicitly incorporating the conflicting criteria of business values and the managers' preference statements on their trade-offs. For this purpose, a decision analysis method, e.g. the Analytic Hierarchy Process (AHP), is applied to aggregate the opinions of the group of decision makers on which criteria are relevant for evaluating the business values of rules and on the relative importance of those criteria. Association rule mining, one of the data mining techniques, is then performed to capture a set of competing rules with their various business values, which are in turn used as input for the rule prioritization. Thus, the final rule selected by an appropriate decision method, e.g. ELECTRE, reveals a meaningful result obtained from the combined use of machine learning and human intelligence.

2. Research background

2.1. Business values of rules

In supporting decision making for real business activities, decision makers are perplexed when they encounter too many extracted rules and/or when those rules are of little interest for decision making. The quality and quantity problems of these rules in traditional data mining applications stem from the fact that rules are extracted and selected using only statistical criteria, without regard to the business values that are closely related to business strategies. Since one of the most important considerations when firms extract rules from large databases is the business value they can expect to obtain from decisions that rely on those rules, ways of reflecting business requirements in the process of rule selection are necessary for effective decision aids.

The importance of each business value, on the other hand, varies according to the environment a firm faces and its business strategy. For example, just after an Internet shopping mall starts business, the frequency of rules might be considered relatively more important than other values, since more frequent transactions with customers may imply wide recognition in the market. Later, the monetary value of rules might become increasingly important for the purpose of maximizing profit from loyal customers. For firms dealing with fashionable items such as clothes, music CDs, or computer games, it might be important to capture trends in customers' purchasing patterns, since such items have relatively short life cycles.

Recognizing these various requirements, researchers in the data mining domain began to pay attention to business values in pattern extraction. Some researchers focused on discovering time trends and differences between datasets (Dong & Li, 1999; Song et al., 2001) or capturing trends of patterns by highlighting recent datasets (Zhang et al., 2003). Another group of studies deals with pruning or ranking association rules according to their frequency, statistical interdependence, and interestingness measures (Hilderman & Hamilton, 2001; Tan & Kumar, 2000). A third group covers recommending products to customers by considering expected profit (Kitts et al., 2000), selecting products based on association rules and product-specific profitability (Brijs, Goethals, Swinnen, Vanhoof, & Wets, 2000), and recommending target items and promotion strategies with the goal of maximizing net profit (Wang et al., 2002). The business values reflected in each group of studies can be broadly categorized into three types: time trends (recency), statistical significance (frequency), and profit (monetary value), respectively. The related research efforts and the types of business values they consider are shown in Table 1.

Table 1
Related studies considering business values of rule

| Type of business value | Definition in this study | Related studies | Description |
| Recency | Time trend of a rule | Dong and Li (1999); Song et al. (2001) | Discovering time trends and differences between datasets |
| | | Zhang et al. (2003) | Capturing trends of patterns by highlighting recent dataset |
| Frequency | Statistical significance of a rule in a time period | Hilderman and Hamilton (2001); Tan and Kumar (2000) | Pruning or ranking association rules according to their frequency, statistical interdependence, and interestingness measures |
| | | Brin et al. (1997) | Assessing the dependence between products |
| Monetary value | Profitability of a rule | Kitts et al. (2000) | Recommending products to customers considering expected profit |
| | | Brijs et al. (2000) | Selecting products based on association rules and product-specific profitability |
| | | Wang et al. (2002) | Recommending target items and promotion strategy with the goal of maximizing the net profit |

2.2. Association rule mining

Association rule mining has been widely used, from traditional business applications such as cross-marketing, attached mailing, catalog design, loss-leader analysis, store layout, and customer segmentation (Agrawal et al., 1993; Srikant & Agrawal, 1995) to e-business applications such as the renewal of web pages (Cooley, Mobasher, & Srivastava, 1999) and web personalization (Mobasher, Cooley, & Srivastava, 2000; Mulvenna, Anand, & Büchner, 2000). Given a set of transactions, where each transaction is a set of items (an itemset), an association rule takes the form X ⇒ Y, where X and Y are itemsets; X and Y are called the body and the head, respectively. A rule can be evaluated by two measures, called support and confidence. The support of the association rule X ⇒ Y is the percentage of transactions that contain both itemsets X and Y among all transactions. The confidence of the rule X ⇒ Y is the percentage of transactions that contain itemset Y among the transactions that contain itemset X. The support represents the usefulness of the discovered rules and the confidence represents their certainty.

Many algorithms can be used to discover association rules from data. The Apriori algorithm is one of the most widely used and best-known techniques for finding association rules (Agrawal et al., 1993; Agrawal & Srikant, 1994). Apriori operates in two phases. In the first phase, all itemsets with minimum support (frequent itemsets) are generated. This phase utilizes the downward closure property of support: if an itemset of size k is a frequent itemset, then all of its subsets of size (k − 1) or smaller must also be frequent itemsets. Using this property, candidate itemsets of size k are generated from the set of frequent itemsets of size (k − 1) by imposing the constraint that all subsets of size (k − 1) of any candidate itemset must be present in the set of frequent itemsets of size (k − 1). The second phase of the algorithm generates rules from the set of all frequent itemsets.

Association rule mining is a powerful solution for extracting alternative rules, because it aims to discover all rules in the data and is thus able to provide a complete picture of the associations in a large dataset. There are, however, two major problems with regard to association rule generation. The first concerns rule quantity and quality. If minimum support is set too high, rules involving rare items that could be of interest to decision makers will not be found. Setting minimum support low, however, may cause combinatorial explosion.

In other words, too many rules are generated regardless of their interestingness (Tan & Kumar, 2000). To deal with this problem, techniques have been proposed that allow the user to specify multiple minimum supports reflecting the frequencies of the individual items in the database (Liu, Hsu, & Ma, 1999), or that exploit support constraints specifying what minimum support is required for which items, so that only the necessary itemsets are generated (Wang, He, & Han, 2000). Those approaches, however, do not consider the heterogeneous business values of association rules. An approach is thus required for rule quantity and quality problems in which the compensations between criteria of business values are not clear.

The second problem, not entirely independent of the first, is the rule conflict problem, which was already raised in the domain of rule-based expert systems. If a set of rules that matches a context is identified, some approaches can be applied to resolve those conflicts (Bar & Feigenbaum, 1981; Hayes-Roth et al., 1983). One way to address this problem is to rely on the intervention of human intelligence. A satisfactory result can thus be obtained by using human preference judgments to resolve conflicting rules that are characterized by multi-faceted conflicting criteria.

In summary, both problems can be handled by prioritizing the rules resulting from data mining with explicit consideration of business values. In what follows, we introduce a decision method for resolving conflicting rules, a novel approach that aggregates multiple evaluations while minimizing the information required from managers, in order to alleviate the burden of articulating that information.

2.3. ELECTRE

Roy and Bouyssou (1993) identified situations in which the outranking approach can be justified: (1) when at least one criterion is not quantitative, so that ratios of preference intervals are meaningless; (2) when the units of the different criteria are so heterogeneous that coding them into one common scale seems very difficult or artificial; and (3) when the compensations between gains on some criteria


and losses on other criteria are not clear. These properties make an outranking approach such as ELECTRE suitable for prioritizing a set of association rules, since some of the multiple criteria reflecting business values can be heterogeneous or sometimes not quantitative, and it is thus difficult to assure any specific compensation ratio between losses and gains on those criteria.

ELECTRE uses the concept of an outranking relationship. The outranking relationship A → B says that even though two alternatives A and B do not dominate each other mathematically, the decision maker accepts the risk of regarding A as almost surely better than B. The method consists of a pairwise comparison of alternatives based on the degree to which evaluations of the alternatives and the preference weights confirm or contradict the pairwise dominance relationships between alternatives. It examines both the degree to which the preference weights are in agreement with pairwise dominance relationships and the degree to which weighted evaluations differ from each other. These stages are based on a concordance set and a discordance set; hence the method is also called concordance analysis.

The idea of the method is to rank highly those alternatives that are preferred on most of the criteria and yet do not cause an unacceptable level of discontent on any one criterion. This is accomplished by using the concordance and discordance indexes and the outranking relations. The concordance index measures the intensity of the agreement between the averaged opinions of the group of decision makers. The discordance index measures the intensity of the non-agreement between the averaged opinions of the group of decision makers. The concordance index, C(A, B), is based on the weights of the judgment criteria and is defined as

$$C(A, B) = \frac{1}{W} \sum_{j:\, g_j(A) \ge g_j(B)} w_j, \qquad \text{where } W = \sum_{j=1}^{n} w_j, \quad n: \text{number of criteria}$$

This index varies from 0 to 1 and can be considered a measure of the arguments in favor of the assertion that A outranks B. The concordance index reflects the relative importance of A with respect to B. A higher value of C(A, B) indicates that A is preferred to B as far as the concordance criteria are concerned. The discordance index, D(A, B), takes into account the total range of scale for a criterion i on which there is a discordance and is defined as

$$D(A, B) = \begin{cases} 0, & \text{if } g_i(A) \ge g_i(B) \ \ \forall i \\ \max\limits_{i:\, g_i(A) < g_i(B)} \dfrac{g_i(B) - g_i(A)}{\delta_i}, & \text{if } g_i(A) < g_i(B) \text{ for some } i \end{cases}$$

where $g_i(A)$ is the value of criterion i for alternative A, and $\delta_i = \max |g_i(C) - g_i(D)|$ for each criterion i, taken over any pair of alternatives (C, D).

The information contained in the concordance matrix differs significantly from that contained in the discordance matrix. Differences among weights are represented by means of the concordance matrix, whereas differences among attribute values are represented by means of the discordance matrix.

The ELECTRE II method supports the choice by giving the rank of each alternative. Multiple levels of concordance ($0 < p^- < p^0 < p^+ < 1$) and discordance ($0 < q^0 < q^+ < 1$) are specified to construct two outranking relationships (strong and weak outranking relations). The strong outranking relation, labeled $(A \mathrel{F} B)$, holds when one of the two conditions $(A \mathrel{F} B)_1$ and $(A \mathrel{F} B)_2$ is true:

$$(A \mathrel{F} B)_1 \iff C(A, B) \ge p^+, \ D(A, B) \le q^+, \ \text{and} \ W^+(A, B) \ge W^-(A, B)$$

$$(A \mathrel{F} B)_2 \iff C(A, B) \ge p^0, \ D(A, B) \le q^0, \ \text{and} \ W^+(A, B) \ge W^-(A, B)$$

The weak outranking relation, labeled $(A \mathrel{f} B)$, holds when the following conditions are true:

$$(A \mathrel{f} B) \iff C(A, B) \ge p^-, \ D(A, B) \le q^+, \ \text{and} \ W^+(A, B) \ge W^-(A, B)$$
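The indexes and outranking tests above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy decision table, and the threshold values are hypothetical, and the reading of W+ and W− as the total weight of criteria on which A is strictly better or strictly worse than B is an assumption, since the text does not define them.

```python
def concordance(a, b, weights):
    """C(A, B): share of total weight on criteria where A is at least as good as B."""
    total = sum(weights)
    return sum(w for ga, gb, w in zip(a, b, weights) if ga >= gb) / total


def criterion_ranges(table):
    """delta_i: largest gap on criterion i over any pair of alternatives."""
    return [max(col) - min(col) for col in zip(*table)]


def discordance(a, b, ranges):
    """D(A, B): largest range-normalised shortfall of A against B, 0 if there is none."""
    gaps = [(gb - ga) / r for ga, gb, r in zip(a, b, ranges) if ga < gb and r > 0]
    return max(gaps, default=0.0)


def weight_balance(a, b, weights):
    """Assumed reading of W+ and W-: weight where A beats B, and where B beats A."""
    w_plus = sum(w for ga, gb, w in zip(a, b, weights) if ga > gb)
    w_minus = sum(w for ga, gb, w in zip(a, b, weights) if ga < gb)
    return w_plus, w_minus


def _passes(a, b, weights, ranges, p, q):
    """One outranking condition: C >= p, D <= q, and W+ >= W-."""
    w_plus, w_minus = weight_balance(a, b, weights)
    return (concordance(a, b, weights) >= p
            and discordance(a, b, ranges) <= q
            and w_plus >= w_minus)


def strong_outranks(a, b, weights, ranges, p_plus, p_zero, q_plus, q_zero):
    """Strong relation: either of the two strong conditions holds."""
    return (_passes(a, b, weights, ranges, p_plus, q_plus)
            or _passes(a, b, weights, ranges, p_zero, q_zero))


def weak_outranks(a, b, weights, ranges, p_minus, q_plus):
    """Weak relation: the single weak condition holds."""
    return _passes(a, b, weights, ranges, p_minus, q_plus)


# Hypothetical decision table: three rules scored on three criteria.
table = [[0.8, 0.2, 0.5], [0.6, 0.4, 0.5], [0.1, 0.9, 0.3]]
weights = [0.5, 0.3, 0.2]
ranges = criterion_ranges(table)
print(concordance(table[0], table[1], weights))
print(discordance(table[0], table[1], ranges))
```

Normalising each shortfall by the per-criterion range keeps criteria measured on different scales comparable, which matters here because the business-value criteria are heterogeneous.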

These two relations lead to the construction of two outranking graphs (a strong graph and a weak graph). The strong graph is always a subgraph of the weak graph, but the distinction between strong performance and weak performance must be made to assure a complete ranking of the alternatives. These graphs are then used in an iterative procedure to obtain the rankings. The ELECTRE II approach uses two separate rankings, called forward ranking and reverse ranking, to arrive at the final ranking of the alternatives. There are five steps in the forward ranking procedure.

Step 1: Identify all nodes having no precedent (i.e. those nodes that have no arcs directed towards them) in the strong graph and denote this set as set A.
Step 2: Select all nodes in set A having no precedent in the weak graph and denote this set as set B. The nodes in set B are assigned rank one.
Step 3: Reduce the strong and weak graphs by eliminating all nodes in set B and all the arcs emanating from these nodes.
Step 4: With the reduced graphs, perform steps 1–3; the new set of nodes obtained is given rank two.
Step 5: Continue this iterative procedure until all nodes in both the strong and weak graphs are eliminated and all alternatives are ranked.

In reverse ranking, the first step is to reverse the direction of the arcs in the strong and weak graphs. If an alternative i is preferred to j in forward ranking, the alternative j is preferred to i in reverse ranking; a high concordance relationship becomes a low concordance and a low discordance relationship becomes a high discordance. The remaining steps are identical to the steps outlined for forward ranking, with one difference: the alternative that is ranked last is given rank one and the remaining alternatives are ranked in reverse order. This re-establishes the correct direction of the ranking process. The final ranking (r) is obtained, as suggested by Roy and Bertier (1971), by taking the average of the forward (r′) and reverse (r″) rankings, i.e. (r′ + r″)/2. The alternative with the smallest average value is ranked first, the alternative with the next value is ranked second, and so on until all alternatives are ranked.

The method is able to consider both objective and subjective criteria, and it requires the least amount of information from the decision maker while fully utilizing the information contained in the decision matrix (Roy & Vincke, 1981). It also has weak points, however: the arbitrariness of assigning weights to the criteria, and the absence of any process for generating the decision criteria. In this study, we suggest using Saaty's (1980, 1990) AHP as a complementary method to address these weak points. It permits us to collect all the relevant elements of the problem, whether objective or subjective, into one model and then to work out interactively their interdependencies and their perceived consequences. One of the main strengths of the AHP is the inconsistency measure as a means of measuring the consistency of the decision maker's judgments. Users need to know when they have made inconsistent judgments, especially if they are working as a group. In fact, it is the only approach for solving multicriteria decision problems that has such a capability (Carlsson & Walden, 1995). Furthermore, multiple decision makers' different opinions about the importance of each attribute can be easily collected and aggregated with geometric means. These strengths of the AHP are useful for determining the relative importance of each attribute in the ELECTRE method while reflecting group opinions.
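Returning to the ELECTRE II ranking procedure described above, the forward ranking, reverse ranking, and averaged final ranking can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: graphs are represented as sets of (source, target) arcs meaning "source outranks target", the function names and the toy graph are hypothetical, and the fallback used when Step 2 yields an empty set (for example because of a cycle) is an assumption not covered by the text.

```python
def _no_precedent(candidates, arcs, alive):
    """Nodes in candidates with no incoming arc from a node that is still alive."""
    return {n for n in candidates
            if not any(dst == n and src in alive for src, dst in arcs)}


def forward_ranking(nodes, strong, weak):
    """Assign ranks by repeatedly peeling off nodes without precedents."""
    alive, ranks, rank = set(nodes), {}, 1
    while alive:
        set_a = _no_precedent(alive, strong, alive)   # Step 1
        set_b = _no_precedent(set_a, weak, alive)     # Step 2
        if not set_b:
            # Assumed fallback so the loop always terminates.
            set_b = set_a or set(alive)
        for n in set_b:
            ranks[n] = rank
        alive -= set_b                                # Step 3
        rank += 1                                     # Steps 4 and 5
    return ranks


def reverse_ranking(nodes, strong, weak):
    """Rank on the reversed graphs, then flip so the last-ranked node gets rank one."""
    def flip(arcs):
        return {(dst, src) for src, dst in arcs}

    raw = forward_ranking(nodes, flip(strong), flip(weak))
    last = max(raw.values())
    return {n: 1 + last - r for n, r in raw.items()}


def final_ranking(nodes, strong, weak):
    """Average of forward and reverse ranks; smaller is better."""
    fwd = forward_ranking(nodes, strong, weak)
    rev = reverse_ranking(nodes, strong, weak)
    return {n: (fwd[n] + rev[n]) / 2 for n in nodes}


# Hypothetical graphs over four rules; the strong graph is a subgraph of the weak one.
nodes = ["A", "B", "C", "D"]
strong = {("B", "C"), ("C", "A")}
weak = strong | {("D", "C")}
for name, score in sorted(final_ranking(nodes, strong, weak).items(),
                          key=lambda kv: kv[1]):
    print(name, score)
```

Alternatives with the same averaged value share a rank, which is how ties between rules can arise in the final list.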

3. Methodology

In this section, we suggest a methodology for prioritizing association rules that takes into account their business values. The methodology consists of four phases as shown in Fig. 1. In the first phase, the criteria for prioritizing association rules are defined to clarify marketing objectives. In phase II, the relative business importance of those criteria is calculated; group decision making that reflects various stakeholders' opinions is performed with the AHP technique. In phase III, the target association ruleset is generated from the transaction dataset; we use the Apriori algorithm for this purpose. In phase IV, the criterion data for each target rule are calculated to prepare a decision table. In the final phase, a rule prioritization list for the given target association ruleset is produced by performing ELECTRE II on the decision table prepared in the previous phases. Each phase is described in more detail in the following subsections.
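The group aggregation in phase II can be sketched in Python as below. This is a simplified illustration, not the procedure used in the study: the pairwise comparison matrices are made up, the function names are hypothetical, and the weights are derived with the normalised row geometric mean, a common approximation assumed here in place of Saaty's eigenvector calculation. The inconsistency measure is not computed.

```python
import math


def aggregate_judgments(matrices):
    """Element-wise geometric mean of several pairwise comparison matrices."""
    n = len(matrices[0])
    m = len(matrices)
    return [[math.prod(mat[i][j] for mat in matrices) ** (1 / m)
             for j in range(n)]
            for i in range(n)]


def priority_weights(matrix):
    """Approximate AHP priorities: normalised geometric mean of each row."""
    n = len(matrix)
    row_means = [math.prod(row) ** (1 / n) for row in matrix]
    total = sum(row_means)
    return [r / total for r in row_means]


# Two hypothetical decision makers comparing recency, frequency, monetary value.
maker_1 = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
maker_2 = [[1, 1 / 2, 1], [2, 1, 2], [1, 1 / 2, 1]]
group = aggregate_judgments([maker_1, maker_2])
print(priority_weights(group))
```

The geometric mean keeps the aggregated matrix reciprocal, so the entry for criterion i against j stays the inverse of the entry for j against i.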

[Figure 1: flow diagram. The decision group applies AHP to evaluate the relative weights among the rule selection criteria; association rules are mined from the corporate DB and the rule selection criteria values are extracted for the association rule set; the association rule set, criteria values, and relative weights form a decision table; a dominance matrix is calculated with ELECTRE; a ranking of rules is then calculated (illustrated as rank 1: Rule2, Rule4; rank 2: Rule3; rank 3: Rule1).]
Fig. 1. The overall flow of association rule prioritization.


To explain each phase of our methodology, some notation is briefly defined as follows.

D_t: dataset at time t
|D_t|: number of transactions in dataset D_t
R_t: discovered association ruleset at time t
|R_t|: number of rules in ruleset R_t
r_i: each rule from ruleset R_t, where i = 1, 2, ..., |R_t|
f_{A,B}: number of transactions containing both A and B in dataset D_t
P(A, B): percentage of transactions containing both A and B in dataset D_t
Sup_t(r_i): support of r_i in D_t
Conf_t(r_i): confidence of r_i in D_t
Price(Y): per-unit price of item Y

We used real-world data to evaluate the proposed methodology. The data used in the experiment were transaction records for goods sold by the H department store, the third largest in Korea. The dataset contains customers' purchasing histories, which are identified and recorded through each customer's smart card issued by the department store, and it also contains product information such as price and brand. We constructed a data warehouse that aggregates historical data by individual customer. We prepared two datasets because the DoC measure must be calculated for each target association rule. The first dataset contains transaction records of certain customers who bought more than one electronic product from May to December, 2000. In addition to the purchasing data in the first dataset, the second dataset contains information on purchases of additional electronic products from January to April, 2001. The Apriori technique was applied to the second dataset to discover the target association rules that serve as the alternatives for rule prioritization in our study.
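As an illustration of the notation above, the support and confidence of a rule can be computed directly from a list of transactions. The sketch below is only a toy, with hypothetical function names and made-up item names; it does not reproduce the Apriori search or the department store data.

```python
def support(transactions, itemset):
    """Sup_t: fraction of transactions in D_t that contain every item of the itemset."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, body, head):
    """Conf_t: among transactions containing the body, the fraction also containing the head."""
    body_items = set(body)
    rule_items = body_items | set(head)
    body_count = sum(body_items <= set(t) for t in transactions)
    both_count = sum(rule_items <= set(t) for t in transactions)
    return both_count / body_count if body_count else 0.0


# Made-up dataset D_t with four transactions.
d_t = [{"tv", "dvd"}, {"tv"}, {"tv", "dvd", "audio"}, {"audio"}]
print(support(d_t, {"tv", "dvd"}))       # support of the rule tv => dvd
print(confidence(d_t, {"tv"}, {"dvd"}))  # confidence of the rule tv => dvd
```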

3.1. Phase I: decision hierarchy with criteria for prioritizing association rules

Since many factors affect the business values of association rules, decision groups should discuss which factors are most appropriate to explain business values. In this study, the recency, frequency, and monetary (RFM) method helps decision makers avoid focusing on less promising rules, thus allowing resources to be devoted to more valuable rules. Our approach to assessing the business values of rules parallels RFM analysis for assessing customer lifetime value, in that the attributes related to the recency, frequency, and monetary value of each rule may be worthy of special attention. Measuring RFM is an important method for assessing customer lifetime value, which is typically used to identify profitable customers and to develop strategies to target those customers. Bult and Wansbeek (1995) defined recency as the period since the last purchase, frequency as the number of purchases made within a certain period, and monetary as the money spent during a certain period. These RFM concepts are useful for determining the business value of each rule and for ranking a set of competing association rules. A hierarchical structure of criteria for evaluating the association rules is shown in Fig. 2, in which the term RFM is redefined with a little modification (see Table 1).

3.1.1. Recency

The term recency is defined in this study as the time trend of a rule between time intervals; a higher value implies that the rule deserves more attention. This factor can be measured with the attribute of the degree of change in support.

Degree of change (DoC). Even though most data mining techniques give attention to rules that have a large frequency of occurrence and ignore time trends, rules with a large growth rate or decreasing rate in

[Figure 2: hierarchy of criteria. Goal: business values of a rule. The 1st level criteria: Recency, Frequency, Monetary value. The 2nd level criteria: Degree of change (DoC) under Recency; Support (Sup), Confidence (Conf), and Interest factor (IF) under Frequency; Expected monetary value (EMV) and Incremental monetary value (IMV) under Monetary value.]
Fig. 2. An example for structuring rule selection problem.


occurrence may have significant implications for managers in a changing business environment, in spite of their relatively small occurrence. There are some studies on discovering trends and differences in patterns (see Dong & Li, 1999; Song et al., 2001). As we are interested in the attributes of each rule from ruleset R_t, we adopt the measure of the degree of change for the emerging pattern case from the study of Song et al. (2001), with modification for our study. Emerging patterns are defined as itemsets whose supports increase significantly from one dataset to another. More specifically, emerging patterns are itemsets whose growth rates are larger than a given threshold value. When applied to time stamped databases, emerging patterns can capture emerging trends in business or demographic data (Song et al., 2001). The modified measure for the degree of change in support, s_i, is defined in this study as follows:

$$s_i = \begin{cases} \dfrac{\mathrm{Sup}_t(r_i) - \mathrm{Sup}_{t-k}(r_i)}{\mathrm{Sup}_{t-k}(r_i)}, & \text{if } \mathrm{Sup}_{t-k}(r_i) \neq 0 \\ \max(s_1, s_2, \ldots, s_{|R|}), & \text{otherwise} \end{cases}$$

For rule 1, goodcd_4553001931000 ⇒ goodcd_4503620935000, which was generated from D_t in our study (see Table 3), we can compute the degree of change measure as follows.

$$\mathrm{Sup}_{t-1}(r_1) = 0.0012, \qquad \mathrm{Sup}_t(r_1) = 0.0032$$

$$s_1 = \frac{0.0032 - 0.0012}{0.0012} = 1.734$$

3.1.2. Frequency

The term frequency is defined in this study as the statistical significance of a rule in a time interval; higher frequency indicates greater statistical significance of a rule. This factor consists of three attributes: support, confidence, and interest factor.

Support (Sup). The support criterion is necessary in the model because it represents the statistical significance of a pattern. From the marketing perspective, the support of an itemset in retail sales data justifies the feasibility of promoting the items together. However, support alone may not serve as a reliable interestingness measure. For example, rules with high support quite often correspond to obvious knowledge about the domain that may not be informative (Tan & Kumar, 2000).

Confidence (Conf). Confidence was initially proposed to measure the strength of an association rule by measuring the conditional probability of events associated with a particular rule. For example, if a rule X ⇒ Y has confidence c, this means that c percent of all transactions that contain X also contain Y. However, it may produce counter-intuitive results, especially when strong negative correlations are present (Brin, Motwani, & Silverstein, 1997).

Interest factor (IF). Interest factor is another widely used measure for association patterns (Brin et al., 1997).


This metric is defined to be the ratio between the joint probability of two variables with respect to their expected probabilities under the independence assumption. The interest factor is a non-negative real number with a value of 1 corresponding to statistical independence. For we are interested in rules rather than itemset, we bring the measure of interest factor from the study of Tan and Kumar (2000) for association rules. The interest factor for the rule A1 A2 /Aj 0 AjC1 AjC2 / Ak can be defined as follows: IðA1 A2 /Aj 0 AjC1 AjC2 /Ak Þ Z

PðA1 ; A2 ; .; Ak Þ PðA1 ; A2 ; .; Aj ÞPðAjC1 ; AjC2 ; .; Ak Þ

For rule 1, goodcd_4553001931000 ⇒ goodcd_4503620935000, which was generated from $D^t$ in our study, the interest factor is computed as follows:

P(goodcd_4553001931000, goodcd_4503620935000) = 0.0033

P(goodcd_4553001931000) = 0.0337

P(goodcd_4503620935000) = 0.0469

$$I(r_1) = \frac{0.0033}{0.0337 \times 0.0469} = 2.090$$
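The interest factor computation for rule 1 can be sketched in a few lines of Python. This is only an illustration: the function name `interest_factor` and its parameter names are hypothetical, and the inputs are the rounded probabilities quoted in the text.

```python
def interest_factor(p_joint: float, p_body: float, p_head: float) -> float:
    """Interest factor of a rule body => head: P(body, head) / (P(body) * P(head))."""
    if p_body <= 0 or p_head <= 0:
        raise ValueError("marginal probabilities must be positive")
    return p_joint / (p_body * p_head)


# Rounded probabilities quoted for rule 1 in the text.
i_r1 = interest_factor(p_joint=0.0033, p_body=0.0337, p_head=0.0469)
print(round(i_r1, 3))
```

Because the inputs are rounded to four decimals, the printed value is close to, but need not match in the last digits, the 2.090 reported for rule 1. A value above 1 indicates that the body and head co-occur more often than independence would predict.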

3.1.3. Monetary value

The term monetary value is defined in this study as the profitability of a rule; a higher value indicates that the company should focus more on that rule. This factor is measured with two attributes: expected monetary value and incremental monetary value.

Expected monetary value (EMV). If we assume mutual independence between products, then the expected profit after buying a product X is equal to the probability of buying Y given X, multiplied by the profit of Y (Kitts et al., 2000). With a modification to suit our study, we use the following measure for the expected monetary value of a rule:

$$\mathrm{EMV}(A_1 A_2 \cdots A_j \Rightarrow A_{j+1} A_{j+2} \cdots A_k) = \mathrm{Conf}^{t}(A_1 A_2 \cdots A_j \Rightarrow A_{j+1} A_{j+2} \cdots A_k) \times \sum_{i=j+1}^{k} \mathrm{Price}(A_i)$$

For rule 1, goodcd_4553001931000 ⇒ goodcd_4503620935000, which was generated from $D^t$ in our study, the expected monetary value is computed as follows:

$\mathrm{Conf}(r_1) = 0.098$

Price(goodcd_4503620935000) = 100,000

$$\mathrm{EMV}(r_1) = 0.098 \times 100{,}000 = 9800$$

EMV can be seen as the recommendation profit of a rule on a per-recommendation basis; it factors in both the hit rate (i.e. confidence) and the profit of the recommended item(s).



Incremental monetary value (IMV). The idea behind the incremental profit of Kitts et al. (2000) is the expected profit minus the profit one would expect to receive from the natural course of a customer's purchasing. Incremental profit maximizes the profit of the item, minus the baseline profit associated with the item (Kitts et al., 2000). In this study, incremental monetary value is defined as follows:

$$\mathrm{IMV}(A_1 A_2 \cdots A_j \Rightarrow A_{j+1} A_{j+2} \cdots A_k) = \left( \mathrm{Conf}^{t}(A_1 A_2 \cdots A_j \Rightarrow A_{j+1} A_{j+2} \cdots A_k) - \frac{f_{A_{j+1}, A_{j+2}, \ldots, A_k}}{|D^{t}|} \right) \times \sum_{i=j+1}^{k} \mathrm{Price}(A_i)$$

For rule 1, goodcd_4553001931000 ⇒ goodcd_4503620935000, which was generated from $D^t$ in our study, the incremental monetary value is computed as follows:

$\mathrm{Conf}(r_1) = 0.098$

f(goodcd_4503620935000) = 154

$|D^t| = 3285$

Price(goodcd_4503620935000) = 100,000

$$\mathrm{IMV}(r_1) = \left( 0.098 - \frac{154}{3285} \right) \times 100{,}000 = 5112.024$$
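The two monetary measures differ only in whether the baseline purchase frequency of the head is subtracted from the confidence. A minimal sketch follows, using the rule 1 figures from the text; the function and parameter names are hypothetical and chosen for illustration.

```python
from typing import Sequence


def expected_monetary_value(conf: float, head_prices: Sequence[float]) -> float:
    """EMV: confidence of the rule times the summed price of the head items."""
    return conf * sum(head_prices)


def incremental_monetary_value(
    conf: float, head_count: int, n_transactions: int, head_prices: Sequence[float]
) -> float:
    """IMV: (confidence - baseline frequency of the head) times the summed head price."""
    baseline = head_count / n_transactions  # share of transactions containing the head
    return (conf - baseline) * sum(head_prices)


# Rule 1: Conf = 0.098, f(head) = 154, |D^t| = 3285, Price(head) = 100,000.
emv_r1 = expected_monetary_value(0.098, [100_000])
imv_r1 = incremental_monetary_value(0.098, 154, 3285, [100_000])
print(round(emv_r1, 3), round(imv_r1, 3))
```

When a head item is bought frequently regardless of the body, the baseline term approaches the confidence and the IMV shrinks toward zero even if the EMV is large.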

3.2. Phase II: relative weighting

The AHP was used to determine the relative importance of the rule attributes. The main steps of the AHP are as follows.

3.2.1. Step 1: perform pairwise comparisons

Decision makers are asked to make pairwise comparisons of the relative importance of the attributes. Two decision makers in our example, an administrative manager (AM) and a business manager (BM) in sales, judge the relative importance at each level. Their differing opinions are aggregated by geometric means, which are expressed in the form of pairwise comparison matrices for each level (Fig. 3).

3.2.2. Step 2: assess the consistency of pairwise judgments

Decision makers may make inconsistent judgments when making pairwise comparisons. Before the weights are computed, the degree of inconsistency is measured by an inconsistency index. Perfect consistency implies a zero inconsistency index. However, perfect consistency is seldom achieved, since humans are often biased and inconsistent when making subjective judgments. Therefore, an inconsistency index of less than 0.1 is considered acceptable (Harker, 1989). If the inconsistency index exceeds this value, the pairwise judgments may be revised before the weights of the attributes are computed.

3.2.3. Step 3: compute the relative weights

This step determines the weight of each decision element. This work employs eigenvalue computations to derive the weights of the attributes (Table 2). The eigenvector associated with the maximum eigenvalue of the first-level matrix gives weights of 0.330 for recency, 0.274 for frequency, and 0.396 for monetary value. In this example, the decision makers thus consider monetary value the most important first-level factor for prioritizing association rules. Computing the eigenvectors at the next level gives 0.602 for Sup, 0.237 for Conf, and 0.161 for IF as the sub-criteria of frequency, and 0.414 for EMV and 0.586 for IMV as the sub-criteria of monetary value. According to these assessments, the relative weights of DoC, Sup, Conf, IF, EMV, and IMV are 0.330, 0.165, 0.065, 0.044, 0.164, and 0.232, respectively. In this example, DoC is thus found to be the most important criterion, IMV the next, and so on.

Pairwise comparisons matrix of the first level criteria

| Business values of a rule | Recency | Frequency | Monetary value |
|---|---|---|---|
| Recency | 1 | 1 (1/2; 2)ᵃ | 1 (1/3; 3) |
| Frequency | | 1 | 0.577 (1/3; 1) |
| Monetary value | | | 1 |

Pairwise comparisons matrix of the second level criteria

| Frequency | Sup | Conf | IF |
|---|---|---|---|
| Sup | 1 | 2.449 (3; 2) | 3.873 (3; 5) |
| Conf | | 1 | 1.414 (2; 1) |
| IF | | | 1 |

| Monetary value | EMV | IMV |
|---|---|---|
| EMV | 1 | 0.707 (2; 1/4) |
| IMV | | 1 |

ᵃ Aggregated opinion (administrative manager's opinion; business manager's opinion).

Fig. 3. Pairwise comparisons matrix of the first level criteria.

Table 2. Relative weights of criteria

| Criteria | Level 1 | Level 2 | Relative weights |
|---|---|---|---|
| Recency_DoC | 0.330 | 1 | 0.330 |
| Frequency_Sup | 0.274 | 0.602 | 0.165 |
| Frequency_Conf | 0.274 | 0.237 | 0.065 |
| Frequency_IF | 0.274 | 0.161 | 0.044 |
| Monetary value_EMV | 0.396 | 0.414 | 0.164 |
| Monetary value_IMV | 0.396 | 0.586 | 0.232 |

DoC, degree of change; Sup, support; Conf, confidence; IF, interest factor; EMV, expected monetary value; IMV, incremental monetary value.

3.3. Phase III: target association rule set

Under the conditions of 0.25% minimum support, 8% minimum confidence, and a maximum itemset size of 2, the system found seven association rules in the second dataset. The target association rules are provided in Table 3 with the price of the head item. In this case, rule 1 and rule 4 have the same body, goodcd_4553001931000, and rules 2–4 have the same head item, goodcd_4504940939002. It is thus necessary for decision makers to rank those rules and establish priorities among them in the decision-making process.
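As a rough check on the Phase II weighting, the sketch below approximates the principal eigenvector of each aggregated pairwise comparison matrix from Fig. 3 and multiplies the level weights together. Power iteration is used here as a simple stand-in for whatever eigenvalue routine the study employed, and the helper names `ahp_weights` and `reciprocal_matrix` are hypothetical.

```python
def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector of a positive pairwise comparison
    matrix by power iteration, normalized so that the weights sum to 1."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        w = [x / total for x in w]
    return w


def reciprocal_matrix(upper):
    """Build a full pairwise matrix from its upper triangle (a_ji = 1 / a_ij)."""
    n = len(upper)
    m = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = upper[i][j]
            m[j][i] = 1.0 / upper[i][j]
    return m


# Aggregated (geometric-mean) judgments from Fig. 3; entries below the diagonal are ignored.
level1 = reciprocal_matrix([[1, 1.0, 1.0], [0, 1, 0.577], [0, 0, 1]])  # recency, frequency, monetary value
frequency = reciprocal_matrix([[1, 2.449, 3.873], [0, 1, 1.414], [0, 0, 1]])  # Sup, Conf, IF
monetary = reciprocal_matrix([[1, 0.707], [0, 1]])  # EMV, IMV

w1, wf, wm = ahp_weights(level1), ahp_weights(frequency), ahp_weights(monetary)
global_weights = {
    "DoC": w1[0],
    "Sup": w1[1] * wf[0],
    "Conf": w1[1] * wf[1],
    "IF": w1[1] * wf[2],
    "EMV": w1[2] * wm[0],
    "IMV": w1[2] * wm[1],
}
print({name: round(weight, 3) for name, weight in global_weights.items()})
```

Under these assumptions the global weights come out close to the relative weights reported in Table 2. Each global weight is the product of a first-level weight and a second-level weight, so the six values sum to 1.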

Table 3. Target association ruleset

| Alternatives | Body | | Head | Conf | Supp | Price of head item |
|---|---|---|---|---|---|---|
| Rule 1 | goodcd_4553001931000 | ⇒ | goodcd_4503620935000 | 0.098 | 0.003332 | 100,000 |
| Rule 2 | goodcd_4550410939000 | ⇒ | goodcd_4504940939002 | 0.081 | 0.002430 | 30,000 |
| Rule 3 | goodcd_4550410939001 | ⇒ | goodcd_4504940939002 | 0.089 | 0.002759 | 30,000 |
| Rule 4 | goodcd_4553001931000 | ⇒ | goodcd_4504940939002 | 0.080 | 0.002720 | 30,000 |
| Rule 5 | goodcd_4502160939001 | ⇒ | goodcd_4550390935070 | 0.086 | 0.002752 | 179,000 |
| Rule 6 | goodcd_4550900939001 | ⇒ | goodcd_4550900939000 | 0.141 | 0.004230 | 90,000 |
| Rule 7 | goodcd_4550900939000 | ⇒ | goodcd_4550900939001 | 0.086 | 0.004300 | 90,000 |

3.4. Phase IV: preparing decision table

The decision matrix of the rule prioritization problem is then prepared as shown in Table 4. For the DoC criterion, rule 2, rule 3, and rule 5 have negative values. This means that the support, in other words the usefulness, of those association rules decreased in 2001. Rule 5 has the largest EMV because its head item has the highest price, but it has the lowest IMV. The reason is that the IF of rule 5 is relatively low compared with its Conf. In other words, the accuracy of rule 5 stems from the high frequency of the head item, not from its relationship with the body item.

Table 4. Decision table

| Alternatives | DoC | Sup | Conf | IF | EMV | IMV |
|---|---|---|---|---|---|---|
| Rule 1 | 1.734 | 0.003332 | 0.098 | 2.090 | 9800 | 5112.024 |
| Rule 2 | −0.016 | 0.002430 | 0.081 | 20.468 | 2430 | 2311.279 |
| Rule 3 | −0.032 | 0.002759 | 0.089 | 22.490 | 2670 | 2551.279 |
| Rule 4 | 0.741 | 0.002720 | 0.080 | 20.215 | 2400 | 2281.279 |
| Rule 5 | −0.258 | 0.002752 | 0.086 | 1.002 | 15,394 | 27.790 |
| Rule 6 | 0.741 | 0.004230 | 0.141 | 2.842 | 12,690 | 8224.247 |
| Rule 7 | 0.741 | 0.004300 | 0.086 | 2.854 | 7740 | 5027.671 |

3.5. Phase V: calculating ranking

For each pair Rule i and Rule j, with i and j integers in [1, 7], the values of the concordance and discordance indexes are given in Tables 5 and 6, respectively. For the concordance and discordance thresholds of ELECTRE II, we set $p^{+} = 0.609$ (average of the concordance indexes plus 0.1), $p^{0} = 0.509$ (average of the concordance indexes), and $p^{-} = 0.409$ (average of the concordance indexes minus 0.1); $q^{0} = 0.693$ (average of the discordance indexes) and $q^{+} = 0.893$ (average of the discordance indexes plus 0.2).

Fig. 4 shows the results of ELECTRE II in the form of strong and weak relation graphs, with an illustration of the forward ranking steps. The final ranking (r) of the target association rules is obtained by taking the average of the forward (r′) and reverse (r″) rankings (see Table 7). The sensitivity of the selection of alternatives to changes in the threshold values ($p^{-}$, $p^{0}$, $p^{+}$, $q^{0}$, and $q^{+}$) was also studied, and the results of ELECTRE II for these changes are shown in Table 8. Table 8 shows that the ranks change slightly for different threshold values. However, in all the cases, rule 6, rule 3, and rule 5 get the first, the second, and the last rank, respectively.

Table 5. Matrix 7 × 7 of concordance

| Alternatives | Rule 1 | Rule 2 | Rule 3 | Rule 4 | Rule 5 | Rule 6 | Rule 7 |
|---|---|---|---|---|---|---|---|
| Rule 1 | – | 0.956 | 0.956 | 0.956 | 0.836 | 0.330 | 0.791 |
| Rule 2 | 0.044 | – | 0.330 | 0.505 | 0.606 | 0.044 | 0.044 |
| Rule 3 | 0.044 | 0.670 | – | 0.670 | 0.836 | 0.044 | 0.109 |
| Rule 4 | 0.044 | 0.495 | 0.330 | – | 0.606 | 0.374 | 0.374 |
| Rule 5 | 0.164 | 0.394 | 0.164 | 0.394 | – | 0.164 | 0.229 |
| Rule 6 | 0.670 | 0.956 | 0.956 | 0.626 | 0.836 | – | 0.461 |
| Rule 7 | 0.209 | 0.956 | 0.891 | 0.956 | 0.836 | 0.539 | – |

Table 6. Matrix 7 × 7 of discordance

| Alternatives | Rule 1 | Rule 2 | Rule 3 | Rule 4 | Rule 5 | Rule 6 | Rule 7 |
|---|---|---|---|---|---|---|---|
| Rule 1 | – | 0.855 | 0.949 | 0.843 | 0.431 | 0.705 | 0.518 |
| Rule 2 | 0.879 | – | 0.176 | 0.380 | 0.998 | 0.984 | 1.000 |
| Rule 3 | 0.886 | 0.008 | – | 0.388 | 0.388 | 0.852 | 0.824 |
| Rule 4 | 0.569 | 0.016 | 0.148 | – | 1.000 | 1.000 | 0.845 |
| Rule 5 | 1.000 | 0.906 | 1.000 | 0.894 | – | 1.000 | 0.828 |
| Rule 6 | 0.499 | 0.820 | 0.914 | 0.809 | 0.208 | – | 0.037 |
| Rule 7 | 0.499 | 0.820 | 0.914 | 0.808 | 0.589 | 0.902 | – |

Fig. 4. Strong and weak preference graphs and forward ranking steps.
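The pairwise indexes can be illustrated with a short sketch built on the decision table (Table 4) and the criterion weights (Table 2). The definitions used here are assumptions rather than quotations from the text: concordance is taken as the summed weight of the criteria on which one rule is at least as good as the other, and discordance as the largest range-normalized amount by which the other rule is better. The names `concordance` and `discordance` are hypothetical.

```python
CRITERIA = ["DoC", "Sup", "Conf", "IF", "EMV", "IMV"]
WEIGHTS = [0.330, 0.165, 0.065, 0.044, 0.164, 0.232]  # relative weights, Table 2

# Decision table (Table 4): criterion values per rule, in CRITERIA order.
TABLE = {
    1: [1.734, 0.003332, 0.098, 2.090, 9800, 5112.024],
    2: [-0.016, 0.002430, 0.081, 20.468, 2430, 2311.279],
    3: [-0.032, 0.002759, 0.089, 22.490, 2670, 2551.279],
    4: [0.741, 0.002720, 0.080, 20.215, 2400, 2281.279],
    5: [-0.258, 0.002752, 0.086, 1.002, 15394, 27.790],
    6: [0.741, 0.004230, 0.141, 2.842, 12690, 8224.247],
    7: [0.741, 0.004300, 0.086, 2.854, 7740, 5027.671],
}


def concordance(a: int, b: int) -> float:
    """Summed weight of the criteria on which rule a is at least as good as rule b."""
    return sum(w for w, x, y in zip(WEIGHTS, TABLE[a], TABLE[b]) if x >= y)


def discordance(a: int, b: int) -> float:
    """Largest amount by which rule b beats rule a on any criterion,
    divided by that criterion's range over all rules (assumed definition)."""
    worst = 0.0
    for k in range(len(CRITERIA)):
        column = [row[k] for row in TABLE.values()]
        spread = max(column) - min(column)
        worst = max(worst, (TABLE[b][k] - TABLE[a][k]) / spread)
    return worst


print(round(concordance(6, 1), 3), round(concordance(1, 2), 3))
print(round(discordance(1, 2), 3), round(discordance(2, 1), 3))
```

For the pairs printed above, these assumed definitions give values close to the corresponding entries of Tables 5 and 6. Rules 4, 6, and 7 tie on the rounded DoC value of 0.741, so pairs among them may not reproduce the published entries, which were presumably computed from unrounded data.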

Table 7. ELECTRE II method: direct, reverse and medium classification with the outranking relations

| Alternatives | Forward ranking, r′ | Reverse ranking, r″ | Average, (r′ + r″)/2 | Final ranking, r |
|---|---|---|---|---|
| Rule 1 | 2 | 2 | 2 | 2 |
| Rule 2 | 4 | 4 | 4 | 4 |
| Rule 3 | 1 | 3 | 2 | 2 |
| Rule 4 | 5 | 5 | 5 | 6 |
| Rule 5 | 4 | 5 | 4.5 | 5 |
| Rule 6 | 1 | 1 | 1 | 1 |
| Rule 7 | 3 | 3 | 3 | 3 |

Table 8. Sensitivity analysis

| | Case1a | Case1b | Case1c | Case2a | Case2b | Case2c |
|---|---|---|---|---|---|---|
| $p^{-}$; $p^{0}$; $p^{+}$ | 0.5; 0.6; 0.7 | 0.5; 0.6; 0.7 | 0.5; 0.6; 0.7 | 0.6; 0.7; 0.8 | 0.6; 0.7; 0.8 | 0.6; 0.7; 0.8 |
| $q^{0}$; $q^{+}$ | 0.5; 0.7 | 0.6; 0.8 | 0.7; 0.9 | 0.5; 0.7 | 0.6; 0.8 | 0.7; 0.9 |
| Rule 1 | 3 | 3 | 2 | 2 | 2 | 2 |
| Rule 2 | 4 | 4 | 4 | 3 | 4 | 4 |
| Rule 3 | 2 | 2 | 2 | 2 | 2 | 2 |
| Rule 4 | 6 | 6 | 6 | 3 | 3 | 4 |
| Rule 5 | 7 | 7 | 5 | 4 | 5 | 4 |
| Rule 6 | 1 | 1 | 1 | 1 | 1 | 1 |
| Rule 7 | 5 | 5 | 3 | 3 | 4 | 3 |

4. Conclusion

The business value that firms can obtain is the most important consideration when extracting rules from the large databases on which decision makers rely. Each rule has various attributes related to business value, and those attributes differ in importance according to the business strategy of the firm. Our work presents a novel rule prioritization methodology that incorporates decision makers' preferences in evaluating the promising association rules obtained from data mining. The proposed methodology is an attempt to bring decision analysis techniques to bear on problems in the domain of data mining. We believe that this approach would be particularly useful for business managers who suffer from rule quality or quantity problems, conflicts between extracted rules, and difficulties in building a group consensus for rule selection. As a further research area, we plan to extend our methodology to the adaptation process. With a feedback loop that lets decision makers agree or disagree with parts of the ranking, we expect to obtain a more refined rule prioritization.

References

Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association between sets of items in massive database. International proceedings of the ACM-SIGMOD international conference on management of data (pp. 207–216).

Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. Proceedings of the international conference on very large data bases (pp. 407–419).

Barr, A., & Feigenbaum, E. A. (Eds.). (1981). The handbook of artificial intelligence. Los Altos, CA: Morgan Kaufmann.

Brijs, T., Goethals, B., Swinnen, G., Vanhoof, K., & Wets, G. (2000). A data mining framework for optimal product selection in retail supermarket data: The generalized PROFSET model. Proceedings of the ACM-SIGKDD international conference on knowledge discovery and data mining (pp. 20–23).

Brin, S., Motwani, R., & Silverstein, C. (1997). Beyond market baskets: Generalizing association rules to correlations. International proceedings of the ACM-SIGMOD international conference on management of data (pp. 265–276).

Bult, J. R., & Wansbeek, T. J. (1995). Optimal selection for direct mail. Marketing Science, 14(4), 378–394.

Carlsson, C., & Walden, P. (1995). AHP in political group decisions: A study in the art of possibilities. Interfaces, 25, 14–29.

Cooley, R., Mobasher, B., & Srivastava, J. (1999). Data preparation for mining world wide web browsing patterns. Journal of Knowledge and Information Systems, 1(1), 5–32.

Dong, G., & Li, J. (1999). Efficient mining of emerging patterns: Discovering trends and differences. Proceedings of the international conference on knowledge discovery and data mining (KDD'99) (pp. 43–52).

Harker, P. T. (1989). The art and science of decision making: The analytic hierarchy process. The analytic hierarchy process application and studies. Berlin: Springer.

Hayes-Roth, F., Waterman, D. A., & Lenat, D. B. (1983). Building expert systems. Reading, MA: Addison-Wesley.

Hilderman, R. J., & Hamilton, H. J. (2001). Evaluation of interestingness measures for ranking discovered knowledge. Proceedings of the Pacific-Asia conference on knowledge discovery and data mining (PAKDD'01) (pp. 247–259).

Kitts, B., Freed, D., & Vrieze, M. (2000). Cross-sell: A fast promotion-tunable customer-item recommendation method based on conditionally independent probabilities. Proceedings of the ACM-SIGMOD conference on knowledge discovery and data mining (KDD'00) (pp. 437–446).

Liu, B., Hsu, W., & Ma, Y. (1999). Mining association rules with multiple minimum supports. Proceedings of the ACM-SIGMOD international conference on knowledge discovery and data mining (KDD'99) (pp. 15–18).

Mobasher, B., Cooley, R., & Srivastava, J. (2000). Automatic personalization based on web usage mining. Communications of the ACM, 43(8), 142–151.

Mulvenna, M. D., Anand, S. S., & Büchner, A. G. (2000). Personalization on the net using Web mining. Communications of the ACM, 43(8), 123–125.

Roy, B., & Bertier, B. (1971). La methode ELECTRE II: Une Methode de Classement en Presence de Criteres Multiples. Note de Travail No 142, Direction Scientifique, Groupe Metra.

Roy, B., & Bouyssou, D. (1993). Aide Multicritère à la Décision: Méthodes et cas. Economica.

Roy, B., & Vincke, P. (1981). Multicriteria analysis: Survey and new directions. European Journal of Operational Research, 8, 207–218.

Saaty, T. L. (1980). The analytic hierarchy process: Planning, priority setting, resource allocation. New York: McGraw-Hill.

Saaty, T. L. (1990). How to make a decision: The analytic hierarchy process. European Journal of Operational Research, 48, 9–26.

Song, H. S., Kim, J. K., & Kim, S. H. (2001). Mining the change of customer behavior in an internet shopping mall. Expert Systems with Applications, 21, 158–168.

Srikant, R., & Agrawal, R. (1995). Mining generalized association rules. Proceedings of the international conference on very large data bases (VLDB'95).

Tan, P. N., & Kumar, V. (2000). Interestingness measures for association patterns: A perspective. KDD 2000 Workshop on Postprocessing in Machine Learning and Data Mining, Boston, MA, August.

Wang, K., He, Y., & Han, J. (2000). Mining frequent itemsets using support constraints. Proceedings of the international conference on very large data bases (VLDB'00).

Wang, K., Zhou, S., & Han, J. (2002). Profit mining: From patterns to actions. Proceedings of international conference on extending data base technology (EDBT'02) (pp. 70–87).

White, C. C. (1990). A survey on the integration of decision analysis and expert systems for decision support. IEEE Transactions on System, Man, and Cybernetics, 20(2), 358–364.

Zhang, S., Zhang, C., & Yan, X. (2003). Post-mining: Maintenance of association rules by weighting. Information Systems, 28, 691–707.
