Author's personal copy
Expert Systems with Applications 36 (2009) 3937–3945
Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa
A practical extension of web usage mining with intentional browsing data toward usage

Yu-Hui Tao a,*, Tzung-Pei Hong b,1, Wen-Yang Lin c,2, Wen-Yuan Chiu d

a Department of Information Management, National University of Kaohsiung, 700 Kaohsiung University Road, Nan-Tzu District, Kaohsiung 811, Taiwan, ROC
b Department of Electrical Engineering, National University of Kaohsiung, 700 Kaohsiung University Road, Nan-Tzu District, Kaohsiung 811, Taiwan, ROC
c Department of Computer Science and Information Engineering, National University of Kaohsiung, 700 Kaohsiung University Road, Nan-Tzu District, Kaohsiung 811, Taiwan, ROC
d Taiwan Electronic Data Processing Corporation, 6F.-2, No. 171, Sanduo 2nd Road, Lingya District, Kaohsiung, Taiwan, ROC
Article info

Keywords: Web usage mining; Intentional browsing data; Web log files; Browsing behaviour; Fuzzy set concept

Abstract

Intentional browsing data is a new data component for improving Web usage mining that uses Web log files as the primary data source. Previously, the Web transaction mining algorithm was used in e-commerce applications to demonstrate how it could be enhanced by intentional browsing data on pages with item purchase and complemented by intentional browsing data on pages without item purchase. Although these two intention-based algorithms satisfactorily illustrated the benefits of intentional browsing data on the original Web transaction mining algorithm, three potential issues remain. First, why is there a need to separate the source data into purchased-item and not-purchased-item segments to be processed by two intention-based algorithms? Second, can the algorithms contain more than one browsing data type? Finally, can the numeric intention-based data counts be made more user-friendly for decision-making practices? To address these three issues, we propose a unified intention-based Web transaction mining algorithm that can efficiently process the whole data set at once with multiple intentional browsing data types, and that transforms the intentional browsing data counts into easily understood linguistic items using the fuzzy set concept. Comparisons and implications for e-commerce are also discussed. Rather than pursuing technical innovation, this extension study aims to make the revised intention-based Web usage mining algorithm much easier and more useful to apply in practice.

© 2008 Elsevier Ltd. All rights reserved.
* Corresponding author. Tel.: +886 7 5919220; fax: +886 7 5919328.
E-mail addresses: [email protected] (Y.-H. Tao), [email protected] (T.-P. Hong), [email protected] (W.-Y. Lin), [email protected] (W.-Y. Chiu).
1 Tel.: +886 7 5919191.
2 Tel.: +886 7 5919517.
0957-4174/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2008.02.058

1. Introduction

Web mining can be defined simply as the application of data mining techniques to Web data (Oren, 1996). The Web poses new challenges to Web mining due to its size, the complexity of Web pages, its dynamic nature, the broad diversity of user communities, and the low relevance of useful information (Han & Kamber, 2001). Cooley et al. classified Web mining into three categories: Web structure mining, which identifies authoritative Web pages; Web content mining, which classifies Web documents automatically or constructs a multi-layered Web information base; and Web usage mining (WUM), which discovers users' access patterns of Web pages (Cooley, Mobasher, & Srivastava, 1999). From the data-source perspective, both Web structure and Web content mining target the Web content, while WUM targets the Web access logs that typically include the host name or IP address, remote user name, login name, date stamp, retrieval method, HTTP completion code, and number of bytes in a file retrieved (http://www.w3.org/TR/WD-logfile).

The log file content has been shown to be valuable in many WUM studies (Bae, Park, & Ha, 2003; Bonchi et al., 2001; Chen, Park, & Yu, 1998; Girardi, Marinho, & de Oliveira, 2005; Huang, Kuo, Chen, & Jeng, 2006; Perkowitz & Etzioni, 2000; Spiliopoulou, 2000; Tanasa & Trousse, 2004; Zhang & Dong, 2002) and data mining studies (Chang & Lee, 2005; Liao, He, & Tang, 2004; Thelwall, 2001); however, it does not include all the user interactions that a WUM algorithm can utilise. As a result, intentional browsing data (IBD) (Tao, Su, & Hong, 2008), such as the scroll-bar, select, or save-as user interactions, was formally defined as a new data ingredient to be used in WUM. IBD has been shown to improve the estimation of Web browsing time (Tao, Hong, & Su, 2006) and has been used to illustrate Web transaction mining (WTM) algorithms (Yun & Chen, 2000) that explore the role of traversal and purchasing behaviour in e-commerce applications. Specifically in the case of WTM, IBD was used to enhance the intention-based WTM (IWTM) algorithm which focused on the purchased items (IWTMp) and complemented the potential benefits of the not-purchased items not considered before (IWTMnp).

These modified algorithms meet their intended purpose satisfactorily. However, a practical question arises: why is there a need to separate the data sources into purchased and not-purchased segments for two algorithms? It would be more efficient in practice to have one unified IWTM algorithm (IWTMu) to process the whole data set at one time. Furthermore, a single IBD type is not flexible enough for real-world applications. It would thus be good to allow multiple IBD types (IWTMum) such that different screens could address the most appropriate IBD types for better usage mining effects. Moreover, according to the idea of fuzzy logic, which is "to approximate human decision making using natural language terms instead of quantitative terms" (Bih, 2006), raw IBD counts are not user-friendly in practice. Consequently, this paper addresses these three practical issues by proposing a unified IWTMu with fuzzy treatment on IBD (IWTMumf) and discusses the comparative implications among IWTMp, IWTMnp, IWTMu, IWTMum, and IWTMumf.

The organisation of the paper is as follows. The IWTMp and IWTMnp algorithms are described in Section 2, the unified IWTMumf algorithm is defined in Section 3, an example of IWTMumf is illustrated in Section 4, a simulation experiment for obtaining the initial empirical implications of the proposed IWTMumf algorithm is presented in Section 5, and finally, the conclusions and implications of the study are discussed in Section 6.
2. Preliminaries

The original WTM algorithm assumed one merchandise item on one Web page, which was represented as $B\{i_1\}$, meaning a Web page B with item $i_1$. The purpose of IWTMp is to show that an average user's interest level in an item can be represented as the occurrence of a certain IBD, which can then enhance the predictive power of the original WTM association rules (Tao et al., 2008). For example, an IWTMp association rule on the browsing path of Web pages A–B–F–G, that is, $\langle ABFG: B\{i_1, b_1^4\} \rightarrow G\{i_6, b_6^1\}\rangle$, indicates additional information on Web page B with 4 occurrences of $b_1$-type IBD, and on Web page G with 1 occurrence of $b_6$-type IBD. One implication is that among users who have purchased $i_1$ on page B, the one with a higher $b_1$-type IBD is more likely to purchase $i_6$ on page G, especially if it is accompanied by any $b_6$-type IBD. Therefore, more resources and promotion strategies can be applied to those users with higher occurrences of the corresponding $b_1$ and $b_6$.

On the other hand, IWTMnp was used to probe, by means of IBDs, those Web pages without any purchase, which was not possible in the original WTM algorithm without the IBD data (Tao et al., 2008). For example, $\langle ABFG: F\{0, b_5^5\} \rightarrow G\{0, b_6^1\}\rangle$ implies that an average user with browsing path A–B–F–G may not purchase anything on Web pages F and G, but has a strong potential interest in page F with 5 occurrences of $b_5$-type IBD and some interest in page G with 1 occurrence of $b_6$-type IBD. Therefore, a proper promotion effort may stimulate a user who has purchased on neither page F nor page G but shows higher occurrences of the corresponding $b_5$ and $b_6$.

In actual practice, IWTMp and IWTMnp are used by feeding the purchase data into IWTMp and then the non-purchase data into IWTMnp. In the domain of Web mining, where speed is sometimes one of the critical performance criteria (Huang et al., 2006; Song & Shepperd, 2006), the time wasted in splitting the data for two algorithms needs to be eliminated. Accordingly, a unified and enhanced IWTM algorithm with fuzzy IBD linguistic terms is introduced in the next section.

3. The unified IWTM algorithm with fuzzy IBDs (IWTMumf)

From a practical viewpoint, the purpose of a unified IWTMu algorithm is to process the whole intended data set at once using only one algorithm, while IWTMum further removes the restriction of one IBD on one Web page. Finally, IWTMumf is meant to add fuzzy treatment on IBDs to IWTMum. We modify and enhance the original notations, definitions, and implication rules of IWTM (Tao et al., 2008) as follows. Let $N = \{n_1, n_2, \ldots, n_{p-1}, n_p\}$ be a set of Web pages of a Web site; $I = \{i_1, i_2, \ldots, i_{m-1}, i_m\}$ be the merchandise items listed in the Web site, assuming one Web page can have only one merchandise item for sale; and $BT = \{b_1^{o_1}, b_2^{o_2}, \ldots, b_{t-1}^{o_{t-1}}, b_t^{o_t}\}$ be the IBD t-tuple, assuming each Web page is associated with the IBDs in $BT$ and their corresponding occurrences $o_t$, where p, m, and t are non-zero positive integers and need not be the same values. Fig. 1 illustrates a Web transaction tree with associated IBDs, where A, B, ..., L represent the Web page names, and A is the entry page that usually contains no merchandise items.

[Fig. 1 tree diagram: entry page A with pages B to L as descendants.]

Node   Items   IBD
A      -       -
B      i1      b1
C      i2      b2
D      i3      b3
E      i4      b4
F      i5      b5
G      i6      b6
H      i7      b7
I      i8      b8
J      i9      b9
K      i10     b10
L      i11     b11

Fig. 1. A Web transaction tree and corresponding transaction data.

Definition 1. Let $\{s_1, s_2, \ldots, s_y\}$ be a path sequence, where $\{s_1, s_2, \ldots, s_y\} \subseteq N^y$.

Definition 2. Let z be the number of fuzzy regions for IBD counts, $i_m \subseteq I$ for $1 \le m \le x$, and $b \subseteq BT$. A transaction pattern is represented as $\langle s_1, s_2, \ldots, s_y : n_1\{i_1, b_1^{r_1}, b_2^{r_2}, \ldots, b_t^{r_z}\}, n_2\{i_2, b_1^{r_1}, b_2^{r_2}, \ldots, b_t^{r_z}\}, \ldots, n_x\{i_x, b_1^{r_1}, b_2^{r_2}, \ldots, b_t^{r_z}\}\rangle$, where $r_f$ is the fuzzy region of the fth IBD with $1 \le f \le z$, and $\{n_1, n_2, \ldots, n_x\} \subseteq \{s_1, s_2, \ldots, s_x\} \subseteq N^y$.

Definition 3. Let $\langle s_y: X \rightarrow Y\rangle$ be an association rule, where X and Y are both subsets of $(I, BT)$ and $X \cap Y = \emptyset$.

The detailed definitions and notations of fuzzy treatment can be found in Wang, Lo, and Hong (2002). Similar to either IWTMp or IWTMnp, the unified IWTMumf algorithm mines transaction patterns but with multiple fuzzy IBDs in each Web page, in which the IBD counts are transformed into fuzzy regions for linguistic terms by a given membership function. The procedure for IWTMumf is first summarised here, and an example is given in the next section:

Step 1: Sort all transaction records in ascending order of IDs.
Step 2: Generate a set of 1-transaction candidate patterns C1 from Step 1 with multiple fuzzy IBDs.
  Step 2-1: First, calculate the occurrences of purchased items in each Web page without repetition. For each user, count repeated purchases of the same item only once, but keep the exact occurrences of IBDs. In the case of one user having the same IBDs in different patterns, take the minimum value as a conservative estimate, since the IBD potential has yet to be realised. Then take the maximum value among all users having the same IBD occurrences. Repeat the same procedure for counterpart Web pages without item purchases.
  Step 2-2: For each item in C1, convert the occurrence values of multiple IBDs for each Web page into the membership values in all of the fuzzy regions according to the given membership function. An example of a membership function is shown in Fig. 2.
  Step 2-3: The scalar cardinality of each fuzzy region over all users is calculated as the count value.
  Step 2-4: The fuzzy region with the highest IBD count among all of the possible regions for each Web page is selected. A set of 1-transaction candidate patterns C1 is then completed with the semantic information of IBDs.
Step 3: Set two minimum support values, one for the patterns with item purchases and the other for those without purchases. Save all C1 items whose sums of occurrences are greater than or equal to the minimum support value into large 1-transaction patterns T1, which represent possible browsing paths for purchasing or not purchasing an item over a hurdle value.
Step 4: Generate a set of candidate 2-transaction patterns C2 by joining items in T1.
  Step 4-1: According to the Web browsing sequential paths, generate a set of candidate 2-transaction patterns C2 by joining items in T1.
  Step 4-2: Calculate the fuzzy membership values of all C2 patterns for all user IDs. For each C2 pattern, first take the minimum value of pattern components within the same user ID as a conservative estimate similar to Step 2-1, and then sum them up across user IDs. Repeat the same process for all C2 patterns.
  Step 4-3: Calculate the scalar cardinality (count) in C2 for all user IDs.
Step 5: Set the minimum support values and save all C2 items whose sums of occurrences are greater than or equal to the minimum support value into large 2-transaction patterns T2.
Step 6: Repeat Steps 4 and 5 until no large k-transaction sets can be generated.

IWTMumf differs from IWTMp or IWTMnp mainly in three aspects. First, all records enter into IWTMu in Step 1 and are processed differently in Step 2 for records with purchases and without purchases, as they are in IWTMp and IWTMnp, respectively. Second, items with purchases and without purchases are joined in a mixed way after Step 3. Therefore, minimum support values can be set differently, as in IWTMp and IWTMnp, in Steps 3 and 5 so that IWTMu is guaranteed to cover whatever outcomes IWTMp and IWTMnp have. Third, the multiple IBD counts, which are different in IWTMp and IWTMnp, are transformed into a linguistic representation in IWTMumf so that the association rules are much more user-friendly during business implementation or actual practice.
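Steps 2-2 to 2-4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Fig. 2 is not reproduced in the text, so the triangular shape and the breakpoints 1, 5, and 9 are guesses chosen so that a count of 4 maps to 0.25/Low + 0.75/Middle, as in the worked example of Section 4. Treating a count of 0 as belonging to no region is likewise an assumption, and the function names are hypothetical.

```python
from typing import Dict, List

REGIONS = ("Low", "Middle", "High")


def fuzzify(count: float) -> Dict[str, float]:
    """Step 2-2: map one IBD occurrence count to membership values.

    Assumed shape: Low falls from 1 at count 1 to 0 at count 5,
    Middle is a triangle over 1..5..9, High rises from 0 at 5 to 1 at 9.
    """
    if count <= 0:
        # Assumption: an IBD that never occurred contributes to no region.
        return {region: 0.0 for region in REGIONS}
    low = min(1.0, max(0.0, (5 - count) / 4))
    middle = max(0.0, 1 - abs(count - 5) / 4)
    high = min(1.0, max(0.0, (count - 5) / 4))
    return {"Low": low, "Middle": middle, "High": high}


def region_counts(counts: List[float]) -> Dict[str, float]:
    """Step 2-3: scalar cardinality of each region, summed over all users."""
    totals = {region: 0.0 for region in REGIONS}
    for count in counts:
        for region, value in fuzzify(count).items():
            totals[region] += value
    return totals


def best_region(counts: List[float]) -> str:
    """Step 2-4: the region with the highest count (ties go to the first)."""
    totals = region_counts(counts)
    return max(REGIONS, key=lambda region: totals[region])


# b1 occurrences for the no-purchase pattern on page B, user IDs 1..10;
# users without such a record are entered as 0.
b1_no_purchase = [5, 0, 4, 0, 2, 1, 2, 3, 2, 0]
print(region_counts(b1_no_purchase))
print(best_region(b1_no_purchase))
```

With these assumed breakpoints, the per-region totals for the sample list agree with the Low and Middle counts quoted for this pattern in Section 4.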
4. An example of IWTMumf

To demonstrate how the algorithm works, we consider only three IBDs, that is, $BT(b_1, b_2, b_3)$, on all Web pages. We illustrate IWTMumf using the example depicted in Fig. 1 as follows:

Step 1: Order user Web browsing transaction records in the ascending order of IDs. Table 1 lists the user IDs, Web sequential browsing paths, and sets of (item, BT) which represent purchase states and corresponding IBD occurrences of $b_1$, $b_2$, and $b_3$. Notice that a path can contain items both with and without purchases, such as $B\{0, b_1^5, b_2^1, b_3^0\}$, $C\{i_2, b_1^3, b_2^1, b_3^0\}$, and $E\{i_4, b_1^1, b_2^0, b_3^0\}$, which exist in the path ABCE for user ID 1, where item zero represents no purchase.

Table 1. The list of general transaction patterns by user ID
Columns: ID, Path, Purchase and browsing data.
[Table body not recoverable from the source extraction.]

Step 2: Generate the candidate set C1 from all the pages with or without item purchases.

  Step 2-1: For the with-purchase illustration, user ID 2 in Table 1 has purchased $i_1$ on page B twice, namely, $B\{i_1, b_1^5, b_2^5, b_3^0\}$ and $B\{i_1, b_1^3, b_2^0, b_3^0\}$, which are counted only once. Among the 10 users of IDs 1–10, only users 1–4 had purchased $i_1$ on page B, which leads to a support value of 4 for $i_1$ purchase on page B, as seen in the last column of Table 2. A similar process is applied to the IBD. The frequency of $b_1$ is calculated first as the minimum value for each user. For user ID 2, the two records above give $\min(b_1^5, b_1^3) = b_1^3$, $\min(b_2^5, b_2^0) = b_2^0$, and $\min(b_3^0, b_3^0) = b_3^0$. This means that the IBD occurrence tuple is (3, 0, 0) for user ID 2 on page B with purchase. On the other hand, $B\{0, BT\}$ of path AB, which is the counterpart of $B\{i_1, BT\}$, is obtained similarly with 0 item purchase and $\max\{b_1^5, b_1^4, b_1^2, b_1^1, b_1^2, b_1^3, b_1^2\} = b_1^5$ from IDs 1, 3, 5, 6, 7, 8, and 9. This calculation is repeated for all users with purchase items and without purchase items as seen in Table 2. Due to the length of the tabulation, only partial results are listed in Table 2.

Table 2. The occurrence values of IBD for all user IDs (partial)
Path  Behaviour        IBD    ID1  ID2  ID3  ID4  ID5  ID6  ID7  ID8  ID9  ID10  Sup
AB    $B\{0, BT\}$     $b_1$  5    -    4    -    2    1    2    3    2    -     7
AB    $B\{i_1, BT\}$   $b_1$  8    3    4    5    -    -    -    -    -    -     4
                       $b_2$  3    0    3    1    -    -    -    -    -    -
                       $b_3$  1    0    1    1    -    -    -    -    -    -
[The $b_2$ and $b_3$ rows of $B\{0, BT\}$ are not recoverable from the source extraction.]

  Step 2-2: The occurrence values of IBDs for each user on the Web page from Table 2 are converted into fuzzy sets by the given membership function in Fig. 2. For instance, the $b_1$ value of $B\{0, BT\}$ for user ID 3 is 4, which is converted into the fuzzy set (0.25/Low + 0.75/Middle + 0/High). The complete result of items in Table 2 can be seen in Table 3, in which each item has its values in the regions of Low, Middle, and High. Again, only partial results are listed in Table 3.

[Fig. 2 plot: membership values of the Low, Middle, and High regions against IBD counts.]
Fig. 2. The membership function for IBDs.

  Step 2-3: In Table 3, the scalar cardinality of each fuzzy region over all users is calculated as the count value. Taking the Low fuzzy region of $b_1$ for Web page B without purchase, $B\{0, BT\}$, as an example, its scalar cardinality equals (0.0 + 0.0 + 0.25 + 0.0 + 0.75 + 1.0 + 0.75 + 0.5 + 0.75 + 0.0) = 4, as indicated in the Count column of Table 3. This step is repeated for all other regions and for all behaviour items.

Table 3. The counts of fuzzy regions for candidate sets C1 (per-user columns ID1 to ID10 not recoverable from the source extraction)
Path  Behaviour        IBD    Region  Count  Sup
AB    $B\{0, BT\}$     $b_1$  Low     4      7
                              Middle  3.0
                              High    0
                       $b_2$  Low     2.5
                              Middle  0.5
                              High    0
                       $b_3$  Low     1.75
                              Middle  0.25
                              High    0
AB    $B\{i_1, BT\}$   $b_1$  Low     0.75   4
                              Middle  2.5
                              High    0.75
                       $b_2$  Low     2
                              Middle  1
                              High    0
                       $b_3$  Low     3
                              Middle  0
                              High    0

  Step 2-4: The fuzzy region with the highest count among the three possible regions for each Web page is selected. Taking the IBD of $b_1$ in $B\{0, BT\}$ as an example, its counts are 4 for Low (L), 3 for Middle (M), and 0 for High (H). Since the count for Low is the highest among the three counts, the region Low is used to represent the occurrences of $b_1$ for pattern $B\{0, BT\}$ in later mining processes, as shown in the first row of Table 4, path AB with pattern $B\{0, b_1^L, b_2^L, b_3^L\}$ and support value (Sup) 7. Notice that Sup is calculated in Step 2-1 as explained earlier. This step is repeated for all other IBDs. Steps 2-2 to 2-4 complete the fuzzy treatment of IBD counts, and the complete C1 is shown in Table 4.

Step 3: Assume that the minimum support value is 2 for item purchase and 6 for no item purchase. The rationale for the different minimum support values is that, among all the records, a practical Web site would have more no-purchase transactions than purchase transactions. Therefore, a higher initial minimum support value on no-purchase transactions will quickly filter out a large portion of insignificant no-purchase items. Save the transaction patterns with purchases whose support values are greater than or equal to 2 in the large 1-transaction pattern set T1. Similarly, save the transaction patterns without purchase whose support values are greater than or equal to 6 in the large 1-transaction pattern set T1, as seen in Table 5.

Step 4: Generate a set of candidate 2-transaction patterns C2 by joining items in T1.

  Step 4-1: According to the Web sequential browsing paths, generate the 2-transaction pattern candidate set C2 by joining items in T1, as shown in Table 6. Notice that the Sup value is determined by the occurrence counts of the behaviour items found in Table 1. If the count is 0, then the 2-transaction pattern is not shown in Table 6.
Table 4. 1-transaction pattern candidate set (C1)
Path    Pattern                                   IBD count  Sup
AB      $B\{0, b_1^L, b_2^L, b_3^L\}$             4          7
AB      $B\{i_1, b_1^M, b_2^L, b_3^L\}$           2.5        4
ABC     $C\{0, b_1^M, b_2^M, b_3^L\}$             3          5
ABC     $C\{i_2, b_1^M, b_2^L, b_3^L\}$           2.5        3
ABCD    $D\{0, b_1^L, b_2^0, b_3^M\}$             1          1
ABCD    $D\{i_3, b_1^L, b_2^M, b_3^L\}$           1          1
ABCE    $E\{0, b_1^L, b_2^L, b_3^L\}$             3.75       4
ABCE    $E\{i_4, b_1^L, b_2^L, b_3^L\}$           2.5        3
ABF     $F\{0, b_1^M, b_2^L, b_3^L\}$             2.25       4
ABF     $F\{i_5, b_1^H, b_2^M, b_3^M\}$           1.25       2
ABFG    $G\{0, b_1^M, b_2^L, b_3^L\}$             2.75       5
ABFG    $G\{i_6, b_1^L, b_2^M, b_3^M\}$           1          1
ABFGH   $H\{0, b_1^L, b_2^H, b_3^H\}$             3.5        5
ABFGH   $H\{i_7, b_1^M, b_2^M, b_3^L\}$           1          1
AI      $I\{0, b_1^L, b_2^M, b_3^L\}$             5          8
AI      $I\{i_8, b_1^H, b_2^H, b_3^M\}$           1          1
AIJ     $J\{0, b_1^M, b_2^L, b_3\}$               2.25       4
AIJ     $J\{i_9, b_1^H, b_2^H, b_3^H\}$           1.75       2
AIJK    $K\{0, b_1^M, b_2^L, b_3^L\}$             3.25       6
AIJK    $K\{i_{10}, b_1^0, b_2^0, b_3^0\}$        0          0
AL      $L\{0, b_1^M, b_2^L, b_3^L\}$             3.75       5
AL      $L\{i_{11}, b_1^0, b_2^0, b_3^0\}$        0          0

Table 5. Large 1-transaction pattern set (T1)
Path    Pattern                                   Sup
AB      $B\{0, b_1^L, b_2^L, b_3^L\}$             7
AB      $B\{i_1, b_1^M, b_2^L, b_3^L\}$           4
ABC     $C\{i_2, b_1^M, b_2^L, b_3^L\}$           3
ABCE    $E\{i_4, b_1^L, b_2^L, b_3^L\}$           3
ABF     $F\{i_5, b_1^H, b_2^M, b_3^M\}$           2
AI      $I\{0, b_1^L, b_2^M, b_3^L\}$             8
AIJ     $J\{i_9, b_1^H, b_2^H, b_3^H\}$           2
AIJK    $K\{0, b_1^M, b_2^L, b_3^L\}$             6

Table 6. 2-transaction pattern candidate set (C2)
Columns: Path, Pattern, Sup.
[Table body not recoverable from the source extraction.]

  Step 4-2: For each user ID, the fuzzy membership value of each 2-transaction pattern in candidate set C2 is calculated. For example, when calculating $B\{i_1, b_1^M, b_2^L, b_3^L\}$ of ID 1 as seen in Table 3, the fuzzy region value of $(0.25/b_1^M + 0.5/b_2^L + 1/b_3^L)$ is calculated as max(0.25, 0.5, 1) = 1, that is, the fuzzy value of $B\{i_1, b_1^M, b_2^L, b_3^L\}$ is 1. Repeat the fuzzy-value calculation of all patterns for each user. The minimum operator is used for the intersection. Taking $\langle ABC: B\{i_1, b_1^M, b_2^L, b_3^L\}, C\{i_2, b_1^M, b_2^L, b_3^L\}\rangle$ as an example, its membership value for ID 1 is calculated as min(1.0, 1.0) = 1.0. The overall membership value for $\langle ABC: B\{i_1, b_1^M, b_2^L, b_3^L\}, C\{i_2, b_1^M, b_2^L, b_3^L\}\rangle$ is 2.5, which is calculated by adding up the membership values of IDs 1–10 as shown in Table 7.

Table 7. The membership values for $\langle B\{i_1, b_1^M\}\ C\{i_2, b_2^M\}\rangle$ of all users
ID     $B\{i_1, b_1, b_2, b_3\}$   $C\{i_2, b_1, b_2, b_3\}$   $B\{i_1, b_1, b_2, b_3\} \cap C\{i_2, b_1, b_2, b_3\}$
1      1.0                         1.0                         1.0
2      0.5                         1.0                         0.5
3      0.75                        0                           0
4      1.0                         1.0                         1.0
5      0                           0                           0
6      0                           0                           0
7      0                           0                           0
8      0                           0                           0
9      0                           0                           0
10     0                           0                           0
Total                                                          2.5

  Step 4-3: The scalar cardinality (count) of each candidate 2-item set in C2 is calculated as seen in Table 8.

Step 5: Assuming both minimum support values are set to 2, only those patterns whose support values are greater than or equal to 2 are kept in the large 2-transaction pattern set T2, as seen in Table 9.

Step 6: According to the Web sequential browsing paths, generate the 3-transaction pattern candidate set C3 by joining the item sets from T2, as seen in Table 10. Since all the support values are less than the minimum support value 2, the algorithm ends here. Accordingly, the final results generate six fuzzy association rules as suggested by T2 in Table 9, which can be seen in Table 11 under Section 5.

The derived rules from IWTMumf are listed together with the ones from IWTMp, IWTMnp, IWTMu, and IWTMuf in Table 11. As we can see, the unified IWTMu algorithm not only covers all the (first four) rules derived from both IWTMp and IWTMnp, but also generates two new rules by rejoining the split data sets. In other words, IWTMu can efficiently obtain the same results as IWTMp and IWTMnp do while simultaneously enriching the rule base of previous IWTM algorithms. For instance, the new rule $\langle ABCE: B\{0, b_1^5\} \rightarrow E\{i_4, b_4^3\}\rangle$ implies that an average user who did not purchase on page B but showed a high interest level of $b_1^5$-type IBD may purchase items on Web page E with high levels of $b_1^5$-type or $b_4^3$-type IBDs, respectively. In practice, we can allocate more resources to promote any user who has presented a high occurrence of $b_1$-type IBD on page B for potential purchase on page E. Furthermore, more dedicated strategies can be deployed based on the interest levels of $b_1^5$ so that more accurate customer targeting and marketing performance may be achieved.
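The two aggregation operators in Step 4-2 can be sketched as follows: a maximum over the regions inside one pattern component, then a minimum across components for each user, summed over users. The function names are hypothetical; the input values are the ones quoted in Step 4-2 and listed in Table 7.

```python
from typing import Dict, List


def component_value(region_memberships: Dict[str, float]) -> float:
    """Fuzzy value of one pattern component for one user.

    The example in Step 4-2 takes the maximum over the component's
    selected regions, e.g. max(0.25, 0.5, 1) = 1.
    """
    return max(region_memberships.values())


def pattern_count(components: List[List[float]]) -> float:
    """Fuzzy count of a joined pattern.

    Each inner list holds one component's per-user values (user IDs 1..10).
    The minimum operator gives the per-user intersection, and the sum over
    users gives the overall membership value.
    """
    return sum(min(user_values) for user_values in zip(*components))


# Component value for user ID 1 on the purchase pattern of page B.
print(component_value({"b1_M": 0.25, "b2_L": 0.5, "b3_L": 1.0}))  # 1.0

# Per-user values of the two components, as listed in Table 7.
b_i1 = [1.0, 0.5, 0.75, 1.0, 0, 0, 0, 0, 0, 0]
c_i2 = [1.0, 1.0, 0, 1.0, 0, 0, 0, 0, 0, 0]
print(pattern_count([b_i1, c_i2]))  # 2.5
```

The same `pattern_count` call extends to 3-transaction patterns by passing three component lists, since the minimum is taken across however many components are joined.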
On the other hand, the rules derived from IWTMuf and IWTMu are identical except that the IBD counts in IWTMu are replaced by IBD linguistic terms L/H/M in IWTMuf. As a result, the IBD is expressed in a more natural and understandable way for decision makers to apply the association rules in practice as suggested in Bih (2006). Finally, IWTMumf has evolved into an integral version that preserves all the intended characteristics while removing the restriction of one IBD per page. Therefore, the derived rules are identical to the ones from either IWTMuf or IWTMu, except that more IBDs are included as extra clues for decision making. From the view of WTM evolution, the IWTMumf algorithm has been demonstrated to present a highly integrated view on IBD.
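Looking back at the unified front end of the algorithm, the per-user aggregation of Step 2-1 and the two-threshold filter of Step 3 can be sketched as below. This is a minimal illustration, not the authors' program: the record layout `(user_id, page, item, ibd_counts)`, the use of the string "0" for no purchase, and all function names are assumptions. Only the two records of user ID 2 and the $b_1$ values for the no-purchase case come from the text; the remaining numbers are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[int, str, str, Tuple[int, ...]]  # (user_id, page, item, IBD counts)
Pattern = Tuple[str, str]                       # (page, item); item "0" = no purchase


def aggregate(records: List[Record]) -> Dict[Tuple[str, str, int], Tuple[int, ...]]:
    """Step 2-1: keep one entry per (page, item, user).

    Repeated records of the same user are merged with an element-wise
    minimum, the conservative estimate described in the procedure.
    """
    per_user: Dict[Tuple[str, str, int], Tuple[int, ...]] = {}
    for user, page, item, ibd in records:
        key = (page, item, user)
        if key in per_user:
            per_user[key] = tuple(min(a, b) for a, b in zip(per_user[key], ibd))
        else:
            per_user[key] = ibd
    return per_user


def supports(per_user: Dict[Tuple[str, str, int], Tuple[int, ...]]) -> Dict[Pattern, int]:
    """Support of a pattern = number of distinct users showing it."""
    sup: Dict[Pattern, int] = defaultdict(int)
    for page, item, _user in per_user:
        sup[(page, item)] += 1
    return dict(sup)


def large_patterns(sup: Dict[Pattern, int], min_purchase: int,
                   min_no_purchase: int) -> Dict[Pattern, int]:
    """Step 3: apply a separate minimum support to each kind of pattern."""
    return {
        pattern: s
        for pattern, s in sup.items()
        if s >= (min_no_purchase if pattern[1] == "0" else min_purchase)
    }


records: List[Record] = [
    (1, "B", "i1", (8, 3, 1)),
    (2, "B", "i1", (5, 5, 0)),  # user ID 2 bought i1 on page B twice
    (2, "B", "i1", (3, 0, 0)),
    (3, "B", "i1", (4, 3, 1)),
    (4, "B", "i1", (5, 1, 1)),
]
# No-purchase visits to page B; b2 and b3 are set to 0 for illustration only.
records += [(u, "B", "0", (c, 0, 0))
            for u, c in zip([1, 3, 5, 6, 7, 8, 9], [5, 4, 2, 1, 2, 3, 2])]
# A hypothetical low-support no-purchase pattern that Step 3 should drop.
records += [(1, "C", "0", (1, 0, 0)), (2, "C", "0", (2, 0, 0))]

merged = aggregate(records)
print(merged[("B", "i1", 2)])                            # (3, 0, 0)
print(large_patterns(supports(merged), 2, 6))
```

Because both kinds of record pass through the same `aggregate` and `supports` functions and differ only in the threshold applied at the end, no up-front split of the data set is needed, which is the practical point of the unified algorithm.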
Table 8. 2-Transaction pattern candidate set with fuzzy counts
Columns: Path, Pattern, ID1 to ID10, Count, Sup.
[Table body not recoverable from the source extraction.]

Table 9. Large 2-transaction pattern set (T2)
Columns: Path, Pattern, Count, Sup.
[Table body not recoverable from the source extraction.]

Table 10. 3-Transaction pattern candidate set (C3)
Columns: Path, Behaviour, Sup.
[Table body not recoverable from the source extraction.]

[Fig. 3 tree diagram: item nodes A to W.]
Fig. 3. Item tree for the Web toy store.
5. A simulation experiment
Table 11
Comparisons of all IWTM algorithms
(values in parentheses are the annotations on the IBDs, listed as antecedent; consequent)

IWTMp (with purchase)
  Transaction behaviour rules (IBD counts):
    ⟨ABC: B{i1, b1} → C{i2, b2}⟩ (8; 5)
    ⟨ABF: B{i1, b1} → F{i5, b5}⟩ (8; 8)
    ⟨ABCE: C{i2, b2} → E{i4, b4}⟩ (5; 3)
  Implications to WTM: Enhancement

IWTMnp (without purchase)
  Transaction behaviour rules (IBD counts):
    ⟨AIJK: I{0, b8} → K{0, b10}⟩ (7; 4)
  Implications to WTM: Complement

IWTMu (unified algorithm)
  Transaction behaviour rules (IBD counts):
    ⟨ABC: B{i1, b1} → C{i2, b2}⟩ (8; 5)
    ⟨ABF: B{i1, b1} → F{i5, b5}⟩ (8; 8)
    ⟨ABCE: C{i2, b2} → E{i4, b4}⟩ (5; 3)
    ⟨AIJK: I{0, b8} → K{0, b10}⟩ (7; 4)
    ⟨ABCE: B{0, b1} → E{i4, b4}⟩ (5; 3)
    ⟨AIJK: J{i9, b9} → K{0, b10}⟩ (9; 4)
  Implications to WTM: Complete enhancement & complement

IWTMuf (unified algorithm with fuzzy IBD)
  Transaction behaviour rules (fuzzy IBD levels):
    ⟨ABC: B{i1, b1} → C{i2, b2}⟩ (M; M)
    ⟨ABF: B{i1, b1} → F{i5, b5}⟩ (M; H)
    ⟨AIJK: I{0, b8} → K{0, b10}⟩ (L; M)
    ⟨ABCE: C{i2, b2} → E{i4, b4}⟩ (M; L)
    ⟨ABCE: B{0, b1} → E{i4, b4}⟩ (L; L)
    ⟨AIJK: J{i9, b9} → K{0, b10}⟩ (H; M)
  Implications to WTM: Complete enhancement & complement with semantic IBD expression

IWTMumf (unified algorithm with multiple fuzzy IBDs)
  Transaction behaviour rules (fuzzy IBD levels):
    ⟨ABC: B{i1, b1, b2, b3} → C{i2, b1, b2, b3}⟩ (M L L; M L L)
    ⟨ABF: B{i1, b1, b2, b3} → F{i5, b1, b2, b3}⟩ (M L L; H M M)
    ⟨AIJK: I{0, b1, b2, b3} → K{0, b1, b2, b3}⟩ (L M L; M L L)
    ⟨ABC: B{0, b1, b2, b3} → C{i2, b1, b2, b3}⟩ (L L L; M L L)
    ⟨ABCE: B{0, b1, b2, b3} → E{i4, b1, b2, b3}⟩ (L L L; L L L)
    ⟨ABCE: C{i2, b1, b2, b3} → E{i4, b1, b2, b3}⟩ (M L L; L L L)
    ⟨ABF: B{0, b1, b2, b3} → F{i5, b1, b2, b3}⟩ (L L L; H M M)
  Implications to WTM: Complete enhancement & complement with multiple semantic IBDs

Table 12
Parameters setting

IBD number                           Typical business day           Holiday
One IBD (b1)                         300 users with 1000 records    700 users with 2000 records
Three IBDs (b1, b2, b3)              300 users with 1000 records    700 users with 2000 records
Six IBDs (b1, b2, b3, b4, b5, b6)    300 users with 1000 records    700 users with 2000 records

The merits of the integrated IWTMumf and some extra benefits have been clearly demonstrated through the examples in the above
section. In this section, a simulation experiment is used to further illustrate how this unified algorithm could be useful to a decision maker, using another example: a Web toy store. The scenario is that the daily volumes of browsing and sales in this Web toy store are stable but increase significantly during holidays such as Halloween and Christmas. The management of the store is therefore interested in knowing which toy items are more attractive to buyers, so that it can allocate the marketing budget for online promotion appropriately for a typical business day as well as for a national holiday.

Assume there are 22 different items, as shown in Fig. 3. One typical business day and one national holiday were simulated, assuming an average of 300 users with 1000 records for a typical business day and 700 users with 2000 records for a holiday. To explore the impact of the number of IBDs, the simulation runs were conducted with one IBD, three IBDs, and six IBDs, respectively. The experimental matrix of IBD number versus day type is listed in Table 12. The six IBDs are rolling the scroll-bar (b1), clicking a hyperlink (b2), hitting the back hyperlink (b3), selecting a range of context (b4), printing a page (b5), and copying text (b6). The first three IBDs are page manipulation behaviours, while the last three are data retrieval behaviours. To present the fuzzy semantics of the IBD information appropriately in this case, the membership function was modified into the five-level form shown in Fig. 4, with the semantics Very Low (S), Low (L), Middle (M), High (H), and Very High (X).

A simple program was implemented in Borland's C++ Builder, using text files as the input and output files. The program interface for executing IWTMumf is shown in Fig. 5; it sets up the mining data sources, minimum hurdles, and IBD counts for both purchased and non-purchased merchandise in area A, the path structure in area B, and the algorithm execution procedures in area C. Using the same random number seeds for the three simulation runs, five rules were mined for the typical business day and four for the national holiday under one, three, and six IBDs, as seen in Table 13.

Fig. 4. Modified membership functions for the IBD (horizontal axis: IBD counts, 0 to 9; vertical axis: membership function values, up to 1.0; levels: Very low, Low, Middle, High, Very high).
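The conversion of an IBD count into one of the five linguistic levels can be sketched as follows. This is only an illustration under stated assumptions: the peak positions and half-width of the triangular membership functions are hypothetical (the exact breakpoints of Fig. 4 are not given in the text), the level is chosen by maximum membership, and a count of zero is mapped to "0" on the assumption that the "0" entries in Table 13 mark IBDs with no recorded activity. The names `PEAKS`, `HALF_WIDTH`, `memberships`, and `fuzzify` are introduced here for illustration.

```python
from typing import Dict

# Hypothetical peak positions of the five levels on the 0-9 IBD count axis.
PEAKS: Dict[str, float] = {"S": 1.0, "L": 3.0, "M": 5.0, "H": 7.0, "X": 9.0}
HALF_WIDTH = 2.0  # assumed distance between neighbouring peaks


def memberships(count: float) -> Dict[str, float]:
    """Return the degree of membership of an IBD count in each level."""
    degrees: Dict[str, float] = {}
    for label, peak in PEAKS.items():
        if label == "S" and count <= peak:
            degrees[label] = 1.0  # left shoulder: small counts are fully Very Low
        elif label == "X" and count >= peak:
            degrees[label] = 1.0  # right shoulder: large counts are fully Very High
        else:
            # Triangular function centred on the peak, clipped at zero.
            degrees[label] = max(0.0, 1.0 - abs(count - peak) / HALF_WIDTH)
    return degrees


def fuzzify(count: float) -> str:
    """Pick the level with the largest membership (ties go to the lower level)."""
    if count == 0:
        return "0"  # assumption: no recorded activity for this IBD
    degrees = memberships(count)
    return max(degrees, key=degrees.get)


if __name__ == "__main__":
    for c in (0, 1, 4, 5, 7, 9):
        print(c, fuzzify(c))
```

With real data, the peaks would be replaced by the breakpoints of the membership functions actually used, and a different defuzzification rule could be substituted for the maximum-membership choice.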
Fig. 5. Interface for executing IWTMumf algorithm.
Table 13
Simulation results
(fuzzy IBD levels are listed as antecedent; consequent, in the order of the IBDs in the rule; levels that cannot be read from the source are marked as not legible)

One IBD (b1)

  Mining rules for the typical business day (1000 records and 293 users):
    1. ⟨AMN: M{0, b1} → N{i_n, b1}⟩, Sup = 0.096, Con = 28/145, levels (M; M)
    2. ⟨AMNO: M{0, b1} → O{i_o, b1}⟩, Sup = 0.055, Con = 16/145, levels (M; M)
    3. ⟨AMNP: M{0, b1} → P{i_p, b1}⟩, Sup = 0.055, Con = 16/145, levels (M; M)
    4. ⟨AMQR: M{0, b1} → R{i_r, b1}⟩, Sup = 0.058, Con = 17/145, levels (M; M)
    5. ⟨ATU: T{0, b1} → U{i_u, b1}⟩, Sup = 0.055, Con = 16/101, levels (M; M)

  Mining rules for the national holiday (2000 records and 654 users):
    1. ⟨AMN: M{0, b1} → N{0, b1}⟩, Sup = 0.274, Con = 179/277, levels (M; M)
    2. ⟨AMN: M{0, b1} → N{i_n, b1}⟩, Sup = 0.070, Con = 46/277, levels (M; M)
    3. ⟨AMNO: M{0, b1} → O{i_o, b1}⟩, Sup = 0.061, Con = 40/277, levels (M; M)
    4. ⟨AMN: M{i_m, b1} → N{0, b1}⟩, Sup = 0.075, Con = 49/79, levels (M; M)

Three IBDs (b1, b2, b3)

  Mining rules for the typical business day (1000 records and 293 users):
    1. ⟨AMN: M{0, b1, b2, b3} → N{i_n, b1, b2, b3}⟩, Sup = 0.096, Con = 28/145, levels not legible
    2. ⟨AMNO: M{0, b1, b2, b3} → O{i_o, b1, b2, b3}⟩, Sup = 0.055, Con = 16/145, levels (M S S; M M M)
    3. ⟨AMNP: M{0, b1, b2, b3} → P{i_p, b1, b2, b3}⟩, Sup = 0.055, Con = 16/145, levels (M S S; M M S)
    4. ⟨AMQR: M{0, b1, b2, b3} → R{i_r, b1, b2, b3}⟩, Sup = 0.058, Con = 17/145, levels (M S S; M S M)
    5. ⟨ATU: T{0, b1, b2, b3} → U{i_u, b1, b2, b3}⟩, Sup = 0.055, Con = 16/101, levels (M M S; M S M)

  Mining rules for the national holiday (2000 records and 654 users):
    1. ⟨AMN: M{0, b1, b2, b3} → N{0, b1, b2, b3}⟩, Sup = 0.274, Con = 179/277, levels not legible
    2. ⟨AMN: M{0, b1, b2, b3} → N{i_n, b1, b2, b3}⟩, Sup = 0.070, Con = 46/277, levels not legible
    3. ⟨AMNO: M{0, b1, b2, b3} → O{i_o, b1, b2, b3}⟩, Sup = 0.061, Con = 40/277, levels not legible
    4. ⟨AMN: M{i_m, b1, b2, b3} → N{0, b1, b2, b3}⟩, Sup = 0.075, Con = 49/79, levels (M S S; M M S)

Six IBDs (b1, b2, b3, b4, b5, b6)

  Mining rules for the typical business day (1000 records and 293 users):
    1. ⟨AMN: M{0, b1, b2, b3, b4, b5, b6} → N{i_n, b1, b2, b3, b4, b5, b6}⟩, Sup = 0.096, Con = 28/145, levels not legible
    2. ⟨AMNO: M{0, b1, b2, b3, b4, b5, b6} → O{i_o, b1, b2, b3, b4, b5, b6}⟩, Sup = 0.055, Con = 16/145, levels (M S S 0 0 0; M M M S S S)
    3. ⟨AMNP: M{0, b1, b2, b3, b4, b5, b6} → P{i_p, b1, b2, b3, b4, b5, b6}⟩, Sup = 0.055, Con = 16/145, levels (M S S 0 0 0; M M S 0 0 0)
    4. ⟨AMQR: M{0, b1, b2, b3, b4, b5, b6} → R{i_r, b1, b2, b3, b4, b5, b6}⟩, Sup = 0.058, Con = 17/145, levels (M S S 0 0 0; M S M 0 0 0)
    5. ⟨ATU: T{0, b1, b2, b3, b4, b5, b6} → U{i_u, b1, b2, b3, b4, b5, b6}⟩, Sup = 0.055, Con = 16/101, levels (M M S 0 0 0; M S M 0 0 0)

  Mining rules for the national holiday (2000 records and 654 users):
    1. ⟨AMN: M{0, b1, b2, b3, b4, b5, b6} → N{0, b1, b2, b3, b4, b5, b6}⟩, Sup = 0.274, Con = 179/277, levels not legible
    2. ⟨AMN: M{0, b1, b2, b3, b4, b5, b6} → N{i_n, b1, b2, b3, b4, b5, b6}⟩, Sup = 0.070, Con = 46/277, levels (M M S 0 0 0; M M S M H M)
    3. ⟨AMNO: M{0, b1, b2, b3, b4, b5, b6} → O{i_o, b1, b2, b3, b4, b5, b6}⟩, Sup = 0.061, Con = 40/277, levels not legible
    4. ⟨AMN: M{i_m, b1, b2, b3, b4, b5, b6} → N{0, b1, b2, b3, b4, b5, b6}⟩, Sup = 0.075, Con = 49/79, levels not legible

To avoid describing the lengthy analytical process, only the important observations from Table 13 with practical implications for decision making are included here.

First, the major strength of IWTMumf is that it can mine additional rules over all merchandise items, including those without any purchases. Take the first four mined rules for the typical business day under the one-IBD case as an example. Although there was no purchase of merchandise item i_m on page M, certain IBD activities on page M still imply subsequent purchases of merchandise items i_n on page N, i_o on page O, i_p on page P, or i_r on page R. Similar examples can be found in the mined rules for the holiday and in the three- and six-IBD cases. This simulation experiment therefore confirms again the integrated mining convenience of IWTMumf over IWTMp and IWTMnp.

Four new observations can be drawn from this simulation experiment.

First, applying IWTMumf separately to different types of day was necessary. Mined Rules 4 and 5 for the typical business day did not appear for the national holiday. This implies that different scenarios need to be identified when applying IWTMumf for better adoption effectiveness.

Second, the higher number of users and transaction records did not generate more mined rules. There were five mined rules for the typical business day but only four for the national holiday. This suggests that IWTMumf can be adopted quickly without a large data set, which makes it a low-entry-hurdle algorithm in practice.

Third, more IBDs in the mined rules provide a finer-grained capability to allocate resources for executing marketing strategies. For example, in the first mined rule for the typical business day, if only one IBD (b1) was recorded when two users browsed page M with the same M level of interest on b1, the same marketing strategy applies to both users. In the same situation with six IBDs recorded, however, a different marketing strategy can be deployed whenever any one of the remaining five IBDs is captured at a different level. This implies that multiple IBDs are preferred in practical applications of IWTMumf.

Finally, in contrast to the third observation, too many IBDs may degrade the usability of the mined rules. In the six-IBD case, mapping the large number of IBD semantic combinations (5^6 = 15,625) to marketing strategies becomes complicated. Even in the three-IBD case, there are already 125 (5^3) combinations. It is therefore recommended that only critical and representative IBDs be included in IWTMumf, in order to increase its usability in practical applications.
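The figures quoted from Table 13 can be checked with a few lines of arithmetic. The sketch below assumes that Sup is the number of sessions matching a rule divided by the number of users, and that the numerator of Con is that same session count; this reading is inferred from the reported numbers rather than from a definition given in this section. The function names `support` and `level_combinations` are introduced here for illustration.

```python
def support(rule_count: int, n_users: int) -> float:
    """Support rounded to three decimals, as printed in Table 13.

    Assumption: Sup = (sessions matching the rule) / (number of users).
    """
    return round(rule_count / n_users, 3)


def level_combinations(n_levels: int, n_ibds: int) -> int:
    """Number of distinct fuzzy-level assignments for one item set."""
    return n_levels ** n_ibds


# Con numerators read from Table 13 (assumed to be the rule counts).
BUSINESS_DAY_COUNTS = [28, 16, 16, 17, 16]  # typical business day, 293 users
HOLIDAY_COUNTS = [179, 46, 40, 49]          # national holiday, 654 users

business_sup = [support(c, 293) for c in BUSINESS_DAY_COUNTS]
holiday_sup = [support(c, 654) for c in HOLIDAY_COUNTS]

print(business_sup)  # [0.096, 0.055, 0.055, 0.058, 0.055]
print(holiday_sup)   # [0.274, 0.07, 0.061, 0.075]
print(level_combinations(5, 3), level_combinations(5, 6))  # 125 15625
```

Under these assumptions the computed supports agree with every Sup value in Table 13, and the combination counts grow from 125 for three IBDs to 15,625 for six, which is the usability concern raised in the last observation.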
6. Conclusions and future work

This paper focused on practical improvements that were not addressed in the illustrative work (Tao et al., 2008) on how potential benefits could be brought into the traditional WTM algorithm for e-commerce applications. We have shown how an integrated IWTMumf algorithm can capture the same outcomes as the IWTMp and IWTMnp algorithms while deriving additional association rules and accommodating multiple IBD types. Moreover, IBD counts in the derived association rules were converted into more readable linguistic terms via the fuzzy set concept, heeding the suggestion of Bih (2006). In other words, IWTMumf is an efficient replacement for the IWTMp and IWTMnp algorithms and an effective improvement in the volume of useful association rules for practical business applications.

Accordingly, the major contribution of this study as an extension work is that any electronic-commerce Web site can now more effectively and efficiently utilise IWTMumf to make recommendations to its customers based on their intentions, as no other WUM algorithm has done before. This is due to the newly defined IBD source, which was analytically demonstrated in the sample example in Section 4 and partially illustrated by the toy-store simulation experiment in Section 5. Nevertheless, establishing the capability of collecting IBDs from Web users is not trivial; thus, one immediate piece of future work may be to seek the cooperation of an electronic-commerce Web site for experimenting with IWTMnp in real-world operations, based on the work of Tao et al. (2008). Another piece of future technical work is to relax the constraint of one merchandise item per Web page in IWTMumf, as imposed by the original WTM algorithm (Yun & Chen, 2000). The purpose is to make IWTMumf an empirically validated method in practical applications.
References

Bae, S. M., Park, S. C., & Ha, S. H. (2003). Fuzzy Web ad selector based on Web usage mining. IEEE Intelligent Systems, 18(6), 62–69.
Bih, J. (2006). Paradigm shift – An introduction to fuzzy logic. IEEE Potentials, 25(1), 6–21.
Bonchi, F., Giannotti, F., Gozzi, C., Manco, G., Nanni, M., Pedreschi, D., et al. (2001). Web log data warehousing and mining for intelligent web caching. Data and Knowledge Engineering, 39(2), 165–189.
Chang, J. H., & Lee, W. S. (2005). Efficient mining method for retrieving sequential patterns over online data streams. Journal of Information Science, 31(5), 420–432.
Chen, M. S., Park, J. S., & Yu, P. S. (1998). Efficient data mining for path traversal patterns. IEEE Transactions on Knowledge and Data Engineering, 10(2), 209–221.
Cooley, R., Mobasher, B., & Srivastava, J. (1999). Data preparation for mining World Wide Web browsing patterns. Journal of Knowledge and Information Systems, 1(1), 5–32.
Girardi, R., Marinho, L. B., & de Oliveira, I. R. (2005). A system of agent-based software patterns for user modelling based on usage mining. Interacting with Computers, 17(5), 567–591.
Han, J., & Kamber, M. (2001). Data mining: Concepts and techniques. Academic Press.
Huang, Y.-M., Kuo, Y.-H., Chen, J.-N., & Jeng, Y.-L. (2006). NP-miner: A real-time recommendation algorithm by using web usage mining. Knowledge-Based Systems, 19(4), 272–286.
Liao, S. S., He, J. W., & Tang, T. H. (2004). A framework for context information management. Journal of Information Science, 30(6), 528–539.
Oren, E. (1996). The World Wide Web: Quagmire or gold mine. Communications of the ACM, 39, 65–68.
Perkowitz, M., & Etzioni, O. (2000). Towards adaptive Web sites: Conceptual framework and case study. Artificial Intelligence, 118(1–2), 245–275.
Song, Q., & Shepperd, M. (2006). Mining Web browsing patterns for e-commerce. Computers in Industry, 57, 622–630.
Spiliopoulou, M. (2000). Web usage mining for Web site evaluation. Communications of the ACM, 43(8), 127–134.
Tanasa, D., & Trousse, B. (2004). Advanced data preprocessing for intersites Web usage mining. IEEE Intelligent Systems, 19(2), 59–65.
Tao, Y. H., Hong, T. P., & Su, Y. M. (2006). Improving browsing time estimation with intentional behaviour data. International Journal of Computer Science and Network Security, 6(12), 35–39.
Tao, Y. H., Su, Y. M., & Hong, T. P. (2008). Web usage mining algorithm with intentional browsing data. Expert Systems with Applications, 35(4), 1893–1904.
Thelwall, M. (2001). A Web crawler design for data mining. Journal of Information Science, 27(5), 319–325.
Wang, S. L., Lo, W. S., & Hong, T. P. (2002). Discovery of fuzzy multiple-level Web browsing patterns. In Proceedings of the international conference on fuzzy systems and knowledge discovery. Singapore.
Yun, C. H., & Chen, M. S. (2000). Using pattern-join and purchase-combination for mining transaction patterns in an electronic commerce environment. In The 24th annual international conference on computer software and applications (pp. 99–104). Taiwan, ROC.
Zhang, D., & Dong, Y. (2002). A novel Web usage mining approach for search engines. Computer Networks, 39(3), 303–310.