Knowledge Discovery of Weighted RFM Sequential Patterns With Multi Time Interval From Customer Sequence Database
Chandni Naik, Prof. Ankit Kharwar
Prof. Mukesh Patel
Computer Engineering Uka-Tarsadia University Surat, India
[email protected] [email protected]
Information Technology Sarvajanik college of Engineering and Technology Surat, India
[email protected]
Abstract— Sequential pattern mining is a useful methodology for discovering customer purchasing behaviour from large sequence databases. It can be applied to medical records, marketing, sales analysis, web log analysis, and so on. Traditional sequential pattern mining does not indicate whether a pattern is recent and profitable, so RFM-based sequential pattern mining techniques were introduced. Although RFM-based sequential pattern mining yields buying patterns that are recently active and profitable, it does not give the time interval between each pair of items. To discover these time intervals, the RFM-TI algorithm is proposed. The advantage of considering multiple time intervals is that we can tell what a customer is likely to buy in the next "h" steps rather than only in the next step. The experimental evaluation shows that the proposed method can discover more valuable patterns than RFM-based sequential pattern mining.
Keywords— sequential pattern mining; RFM; knowledge discovery; data mining; multi-time-interval.
978-1-4799-4674-7/14/$31.00 ©2014 IEEE

I. INTRODUCTION
Data mining is widely used to mine implicit, previously unknown, and potentially useful information from databases. Many approaches have been proposed to extract such information, including association rule mining, temporal pattern mining, clustering, and classification. For temporal data in particular, techniques such as temporal association rule mining, time series analysis, and sequential pattern mining have been proposed. Among them, sequential pattern mining is one of the most important, with applications such as customer purchasing behavior analysis and web log analysis. A typical example of a sequential pattern is "After buying a mobile phone, a customer returns to buy a screen guard and a phone cover together".

Earlier research emphasizes only the concept of frequency. In real life, however, the environment may change frequently. The recent behavior of users is therefore not necessarily the same as their past behavior, and a pattern that occurred frequently in the past may not occur again in the future. Furthermore, a pattern with a low monetary value is not important to the retailer. To deal with these issues, RFM-based sequential pattern mining was introduced by Yen-Liang Chen et al. [4] and Ya-Han Hu et al. [3]. RFM stands for Recency, Frequency, and Monetary value.

Although RFM-based sequential pattern mining yields patterns that are recent and profitable, it does not reveal the time interval between each pair of items. Existing studies on discovering time intervals between items build on traditional sequential pattern mining. Chen et al. [20] proposed a technique that reveals the time interval between successive items, but if two items are not bought one after another, it cannot tell us the interval between them. Even though such a pattern is useful for determining when to take the next step, it does not tell us when to take the next h steps. For example, although the pattern […] indicates that the time interval between items c and d is […], it remains unclear which time interval lies between items c and b. To deal with this issue, Ya-Han Hu et al. [9] proposed a technique that gives the time interval between every pair of items, even if the items are not bought consecutively. For example, we may have a multi-time-interval sequential pattern […]. Here, the list of intervals […] before item d means that the interval between items d and a is […], between c and a is […], and between b and a is […].

In this study, we focus on mining patterns that are recently active and profitable and that reveal the time interval between every pair of items. We therefore incorporate the concept of multiple time intervals into RFM-based sequential pattern mining and develop the RFM-TI algorithm to discover RFM multi-time-interval sequential patterns.

The rest of this paper is organized as follows. Section II reviews the related work and the RFM problem definitions. Section III defines the problem and develops the RFM-TI algorithm. Section IV provides experimental results for performance evaluation. Finally, Section V presents our conclusion.

II. RELATED WORK
Sequential pattern mining (SPM) is a method of mining sequential patterns that satisfy a user-defined minimum support. SPM has been used extensively in areas such as recommendation systems, the reorganization of Web sites, and the use of symptom histories to forecast disease. It can also be used by organizations to learn about customer behaviour and provide better customer relationship management (CRM). Existing research on SPM focuses on two main issues: (1) enhancing the effectiveness of the mining process [6], and (2) proposing further time-related patterns, such as cyclic patterns, time-interval sequential patterns, hybrid sequential patterns, traversal patterns, frequent episodes, periodic patterns, multiple dimensional sequential patterns, and fuzzy sequential patterns [6]. These studies focus on the concept of frequency in selecting sequential patterns.

Two studies on RFM-based sequential pattern mining are discussed below. Yen-Liang Chen et al. [4] defined RFM sequential pattern mining and proposed the RFM-Apriori algorithm for discovering RFM sequential patterns (RFM-SPs) from customer purchasing databases by integrating the recency, frequency, and monetary concepts. They also designed a pattern segmentation framework for managerial decision making. Ya-Han Hu et al. [3] proposed the RFM-PostfixSpan algorithm for discovering the complete set of RFM-SPs from a customer sequence database. RFM-PostfixSpan outperforms the traditional PrefixSpan algorithm in terms of the number of generated patterns and runtime. Although the discovered RFM-SPs are recent and profitable, they do not give the time interval between the items in a pattern.

Previous research on discovering time intervals builds on traditional sequential pattern mining. Srikant and Agrawal [1] proposed the GSP algorithm to handle time constraints. To apply the algorithm, the sliding time window size, the minimum time gap, and the maximum time gap must be specified in advance. The discovered sequential pattern is thus a sequence of windows, each of which includes a set of items. Items within the same window are bought in the same period.
This approach has two problems. First, information about the time interval is not shown clearly in the patterns. Second, all items are restricted by the same time constraints, so different time intervals between different pairs of items are not allowed. Chen et al. [20] solve the first problem by generating patterns that give the time interval between successive items. The second problem is solved by Ya-Han Hu et al. [9], whose patterns give the time interval between every pair of items. To the best of our knowledge, however, this paper is the first to apply multi-time-interval techniques to RFM-based sequential pattern mining.

A. RFM-Problem definitions
The data-sequence $S$ of a customer can be represented by $S = \langle (s_1, pt_1, pq_1), (s_2, pt_2, pq_2), \ldots, (s_m, pt_m, pq_m) \rangle$, where $(s_j, pt_j, pq_j)$ represents that item $s_j$ was purchased with quantity $pq_j$ at time $pt_j$, $1 \le j \le m$, and $pt_{j-1} \le pt_j$ for $2 \le j \le m$. Items that occur at the same time in the data-sequence are ordered alphabetically. Several definitions are given below [3].

Definition 1: Let $I$ denote the set of all items in the database. Given a data-sequence $S = \langle (s_1, pt_1, pq_1), (s_2, pt_2, pq_2), \ldots, (s_m, pt_m, pq_m) \rangle$ and an itemset $I_p = \{i_1, i_2, \ldots, i_n\}$, where $I_p \subseteq I$ and $n \le m$, $I_p$ is contained in $S$ if there are $n$ integers $1 \le k_1 < k_2 < \cdots < k_n \le m$ such that $i_1 = s_{k_1}, i_2 = s_{k_2}, \ldots, i_n = s_{k_n}$ and $pt_{k_1} = pt_{k_2} = \cdots = pt_{k_n}$.

Definition 2 (Subsequence): Let $D = \langle I_1, I_2, \ldots, I_r \rangle$ be a sequence of itemsets, where each $I_x \subseteq I$, $1 \le x \le r$. Sequence $D$ is said to be contained in $S$, or to be a subsequence of $S$, if the following conditions are satisfied: (1) $t_{I_1} < t_{I_2} < \cdots < t_{I_r}$, where $t_{I_x}$ ($1 \le x \le r$) is the time at which $I_x$ occurs in $S$, and (2) each $I_x$ in $D$ is contained in $S$.

Definition 3 (Compactness constraint): Following Definition 2, assume that $D = \langle I_1, I_2, \ldots, I_r \rangle$ is a subsequence of $S$, and let ms_length be a user-defined maximum span length. $D$ is called a compact subsequence, or c-subsequence, of $S$ if $t_{I_r} - t_{I_1} \le$ ms_length.

Definition 4: A c-subsequence $D$ is said to cyclically occur $n$ times in $S$ if the number of repetitions of $D$ in $S$ equals $n$ and the concatenation of these repetitions is also a subsequence of $S$, i.e., the sequence $\langle D\,D \cdots D \rangle$ formed by $n$ copies of $D$ is a subsequence of $S$.

Definition 5 (Frequency subsequence): For a data-sequence $S$, assume that $D$ is a c-subsequence of $S$ that cyclically occurs in $S$. The frequency score of $D$, denoted Fscore(D, S), is the number of occurrences of $D$ in $S$. The total frequency score of $D$ in DB, denoted TFscore_DB(D), is the sum of the frequency scores over all data-sequences containing $D$:

$$TFscore_{DB}(D) = \sum_{S \in DB} Fscore(D, S) \qquad (1)$$
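As a rough illustration of Definitions 1 to 3, the following Python sketch checks whether a sequence of itemsets is a c-subsequence of a data-sequence. The function names, the (item, time, quantity) tuple layout, and the toy data are assumptions made for this sketch; they are not taken from [3].

```python
def occurrence_times(S, itemset):
    """Timestamps at which all items of `itemset` are bought together (Definition 1)."""
    by_time = {}
    for item, pt, _pq in S:
        by_time.setdefault(pt, set()).add(item)
    return sorted(t for t, items in by_time.items() if set(itemset) <= items)


def is_c_subsequence(S, D, ms_length):
    """True if the itemsets of D occur at strictly increasing times in S
    (Definition 2) and the span from the first to the last itemset is at
    most ms_length (Definition 3)."""
    candidates = [occurrence_times(S, itemset) for itemset in D]

    def extend(idx, first_t, prev_t):
        if idx == len(D):
            return True
        for t in candidates[idx]:
            if prev_t is not None and t <= prev_t:
                continue  # itemsets must occur in increasing time order
            if first_t is not None and t - first_t > ms_length:
                break  # times are sorted, so later ones violate the span too
            if extend(idx + 1, t if first_t is None else first_t, t):
                return True
        return False

    return extend(0, None, None)


# Hypothetical data-sequence: items a and b bought together at time 1,
# c at time 5, and a again at time 50.
S = [("a", 1, 2), ("b", 1, 1), ("c", 5, 2), ("a", 50, 1)]
print(is_c_subsequence(S, [{"a", "b"}, {"c"}], ms_length=10))
print(is_c_subsequence(S, [{"c"}, {"a"}], ms_length=10))
```

The search is a plain backtracking over candidate timestamps, which is adequate for short toy sequences but is not meant to reflect how a projection-based miner locates occurrences.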
Definition 6 (Recency subsequence): Assume that $D$ is a c-subsequence that cyclically occurs $n$ times in $S$, and let $D_r$ denote the $r$-th repetition of $D$ in $S$ for $1 \le r \le n$. The recency score of $D$ in $S$, denoted Rscore(D, S), is equal to $(1-\delta)^{t_{current} - t_{D_n}}$, where $t_{current}$ is the current timestamp, $\delta \in [0, 1]$ is a user-specified decay speed, and $t_{D_n}$ denotes the timestamp of the last itemset of $D_n$. For a sequence database DB, the total recency score of sequence $D$, denoted TRscore_DB(D), is the sum of the recency scores over all data-sequences containing $D$:

$$TRscore_{DB}(D) = \sum_{S \in DB} Rscore(D, S) \qquad (2)$$
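The recency score can be sketched in a few lines of Python. The function names and the decay speed of 0.1 are assumptions chosen for illustration; the current time of 110 matches the value used in the worked example of this paper, while the last-occurrence times are hypothetical.

```python
def rscore(t_current, t_last, delta):
    """Recency score (1 - delta) ** (t_current - t_last) for one data-sequence,
    where t_last is the timestamp of the last itemset of the last repetition."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return (1.0 - delta) ** (t_current - t_last)


def total_rscore(t_current, last_times, delta):
    """Total recency score: sum of Rscore over the data-sequences containing D."""
    return sum(rscore(t_current, t, delta) for t in last_times)


# Hypothetical: D last occurs at times 100 and 105 in two data-sequences.
print(total_rscore(110, [100, 105], 0.1))
```

Because the exponent grows with the age of the last occurrence, an occurrence at the current time scores 1 and older occurrences decay towards 0, which is why a recency threshold filters out patterns that have not been seen recently.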
Definition 7 (Monetary subsequence): As in the definition above, let $D$ be a c-subsequence that cyclically occurs $n$ times in $S$. Let $pq_i^{D_r}$ be the purchase quantity of item $s_i$ in the $r$-th repetition $D_r$ of $D$ in $S$, and let $P(s_i)$ denote the price of item $s_i$. Then the monetary score of $D$ in $S$, denoted Mscore(D, S), is defined as follows.

$$Mscore(D, S) = \sum_{r=1}^{n} \sum_{s_i \in D_r} P(s_i) \times pq_i^{D_r} \qquad (3)$$
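A minimal Python sketch of the monetary score follows, assuming that the score sums price times purchase quantity over every item of every repetition of D. The unit prices are those of Table II; the function names and the purchase quantities are hypothetical.

```python
# Unit prices from Table II.
PRICE = {"a": 10, "b": 150, "c": 25, "d": 45, "e": 80}


def mscore(repetitions, price=PRICE):
    """Monetary score of D in one data-sequence: price * quantity summed over
    every (item, quantity) pair of every repetition of D."""
    return sum(price[item] * qty for rep in repetitions for item, qty in rep)


def total_mscore(per_sequence_repetitions, price=PRICE):
    """Total monetary score: sum of Mscore over the data-sequences containing D."""
    return sum(mscore(reps, price) for reps in per_sequence_repetitions)


# Hypothetical occurrences of a pattern made of items a and b:
reps_s1 = [[("a", 2), ("b", 1)], [("a", 1), ("b", 2)]]  # two repetitions
reps_s2 = [[("a", 3), ("b", 1)]]                        # one repetition
print(mscore(reps_s1), mscore(reps_s2), total_mscore([reps_s1, reps_s2]))
```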
TABLE I. SEQUENCE DATABASE [3]

Sid    Sequence
10
20
30
40
50

The total monetary score of sequence $D$, denoted TMscore_DB(D), is the sum of the monetary scores over all data-sequences containing $D$:

$$TMscore_{DB}(D) = \sum_{S \in DB} Mscore(D, S) \qquad (4)$$

Definition 8: Let the recency, frequency, and monetary thresholds given by the user be denoted Rminsup, Fminsup, and Mminsup, respectively. A subsequence $D$ is an RFM-sequential pattern (RFM-SP) if TFscore_DB(D) ≥ Fminsup, TMscore_DB(D) ≥ Mminsup, and TRscore_DB(D) ≥ Rminsup. Table I and Table II are used throughout the examples.

TABLE II. LIST OF ITEM AND UNIT PRICE

Item   Price
a      10
b      150
c      25
d      45
e      80

Definition 9 (The compact postfix): Given a data-sequence $S = \langle (s_1, pt_1, pq_1), (s_2, pt_2, pq_2), \ldots, (s_m, pt_m, pq_m) \rangle$ and a sequence $D = \langle I_1, I_2, \ldots, I_r \rangle$, $D$ is a compact postfix of $S$ if and only if (1) $t_{I_r} = pt_m$, and (2) $D$ is a c-subsequence of $S$.

Definition 10 (The compact projection): Let $S = \langle (s_1, pt_1, pq_1), (s_2, pt_2, pq_2), \ldots, (s_m, pt_m, pq_m) \rangle$ and let $D$ be a c-subsequence of $S$. A subsequence $S' = \langle (s'_1, pt'_1, pq'_1), (s'_2, pt'_2, pq'_2), \ldots, (s'_y, pt'_y, pq'_y) \rangle$ of $S$ is called a compact projection of $S$ with respect to compact postfix $D$ if and only if (1) $S'$ is a c-subsequence of $S$, (2) $S'$ has compact postfix $D$, and (3) there exists no super-sequence $S''$ of $S'$ such that $S''$ is a c-subsequence of $S$ and also has compact postfix $D$.

Definition 11 (The compact prefix): Let $S' = \langle (s'_1, pt'_1, pq'_1), (s'_2, pt'_2, pq'_2), \ldots, (s'_y, pt'_y, pq'_y) \rangle$ be a compact projection of $S$ with respect to the compact postfix $D = \langle I_1, I_2, \ldots, I_r \rangle$. Let $s'_n$ be the first item in itemset $I_1$ and $pt'_n$ be the time at which $s'_n$ occurs in $S'$ ($1 \le n \le y$). Then $C = \langle (s'_1, pt'_1, pq'_1), (s'_2, pt'_2, pq'_2), \ldots, (s'_{n-1}, pt'_{n-1}, pq'_{n-1}) \rangle$ is the compact prefix of $S$ with respect to $D$.
The downward closure property no longer holds if the total monetary score (TMscore) is taken into account in the mining method: as more items are added to a non-RFMQ-SP, its total monetary score increases, so one of its super-sequences may become an RFMQ-SP. The downward closure property is important in pattern-growth methods (e.g., PrefixSpan and RFM-Q PostfixSpan) because it reduces the search space to a great extent. To solve this problem, two definitions are given as follows [3].

Definition 12 (The sequence monetary value): Given a sequence $S$, let $D$ be a compact postfix of $S$. SM(D, S) denotes the sequence monetary value of compact postfix $D$ in $S$, which equals the total monetary amount gained from all compact prefixes of $S$ with respect to $D$.

Definition 13 (The total sequence monetary value): The total sequence monetary value is the sum of the sequence monetary values of $D$ over all data-sequences in SDB:

$$TSM_{SDB}(D) = \sum_{S \in SDB} SM(D, S) \qquad (5)$$
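The role of the total sequence monetary value can be sketched as a relaxed monetary test: a pattern whose own monetary score is too low is still extended if its score plus the monetary value available in its prefixes could reach the threshold. The Python below is only an illustration under that reading; the function names are assumptions, and the thresholds and scores are taken from the worked example later in this paper, except for the TSM value 500, which is hypothetical.

```python
def is_rfm_sp(tr, tf, tm, rminsup, fminsup, mminsup):
    """Definition 8: all three total scores reach their thresholds."""
    return tr >= rminsup and tf >= fminsup and tm >= mminsup


def worth_extending(tr, tf, tm, tsm, rminsup, fminsup, mminsup):
    """Relaxed test: the monetary check uses tm + tsm, i.e. the pattern's own
    score plus the monetary value still obtainable from its compact prefixes."""
    return tr >= rminsup and tf >= fminsup and tm + tsm >= mminsup


THRESHOLDS = dict(rminsup=0.05, fminsup=2, mminsup=1500)

# A pattern with total scores (0.1221, 2, 5020) and TSM 7160.
print(is_rfm_sp(0.1221, 2, 5020, **THRESHOLDS))
print(worth_extending(0.1221, 2, 5020, 7160, **THRESHOLDS))

# A pattern with total scores (0.12236, 2, 1410); the TSM of 500 is hypothetical.
print(is_rfm_sp(0.12236, 2, 1410, **THRESHOLDS))
print(worth_extending(0.12236, 2, 1410, 500, **THRESHOLDS))
```

The second pattern fails the monetary threshold on its own but passes the relaxed test, so it would be kept as a basis for longer patterns rather than pruned.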
III. PROPOSED APPROACH
A. Problem Definition
RFM sequential pattern mining yields patterns that are recent and profitable, whereas RFM-TI sequential pattern mining additionally gives the time interval between every pair of items in a pattern. For example, we may have an RFM multi-time-interval sequential pattern […]. Here, the list of intervals (ti1, ti2, ti3) indicates that the time interval between item "a" and item "b" is ti1, between item "a" and item "c" is ti2, and between item "a" and item "d" is ti3.

Definition 1 (Multi-time-interval sequence): Let I = {i, i, …, i} be the set of all items and TI = {I, I, I, …, I} be the set of time intervals. A sequence B = (b, &, b, …, &, b) containing s items is a multi-time-interval sequence if
1. $b_i \in I$ for $1 \le i \le s$;
2. each &i is an interval list;
3. each &i is represented as (TR, TR, …, TR), where TRxy ∈ TI denotes the time interval between x ∈ I and y ∈ I.
For example, a multi-time-interval sequence B = (c, (I, I, I), a, (I, I), b, (I), d) has &4 = (I, I, I) = (TR, TR, TR), &3 = (I1, I2) = (TR, TR), and &2 = (I1) = (TR).

Property 1 (Descending property): The time intervals in each &i must be in descending order, that is, the u-th interval in the list must be no shorter than the v-th interval for 1 ≤ u < v ≤ i. In the above instance, with &4 = (I, I, I) = (TR, TR, TR), the time interval between c and d must be no shorter than the time intervals between c and a and between c and b. The three intervals in &4 must therefore be listed from longest to shortest.
Definition 2 (Contained): Let A = ((a1, t1, q1), (a2, t2, q2), (a3, t3, q3), ..., (an, tn, qn)) be a data sequence and B = (b, &, b, …, &, b) be a multi-time-interval sequence with s items. Then B is contained in A if there are integers $1 \le j_1 < j_2 < \cdots < j_s \le n$ satisfying
1. $b_1 = a_{j_1}, b_2 = a_{j_2}, \ldots, b_s = a_{j_s}$;
2. for all $2 \le i \le s$ and all $1 \le k \le i-1$, the time difference $t_{j_i} - t_{j_k}$ lies in the corresponding time interval of the interval lists of B.

For example, consider the data sequence A = ((a, 1, 2), (b, 3, 1), (c, 5, 2), (d, 6, 4), (e, 8, 5)) and the time intervals I0: t = 0, I1: 0 < t ≤ 3, I2: 3 < t ≤ 6, I3: 6 < t ≤ ∞. A multi-time-interval sequence (d, (I, I, I), c, (I, I), b, (I), a) is contained in A because
1. d = a4, c = a3, b = a2, a = a1;
2. every pairwise time difference lies in the required interval:
   t4 − t3 = 6 − 5 = 1 ∈ &3,3 = I1
   t4 − t2 = 6 − 3 = 3 ∈ &2,2 = I1
   t4 − t1 = 6 − 1 = 5 ∈ &1,1 = I2
   t3 − t2 = 5 − 3 = 2 ∈ &2,2 = I1
   t3 − t1 = 5 − 1 = 4 ∈ &1,1 = I2
   t2 − t1 = 3 − 1 = 2 ∈ &1,1 = I1
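The interval lookup used in this example can be sketched in Python. The sketch assumes the interval boundaries I0: t = 0, I1: 0 < t ≤ 3, I2: 3 < t ≤ 6, I3: t > 6 and uses hypothetical function names; it labels the time difference of every pair of items in the data sequence A.

```python
from itertools import combinations

# Upper bounds of I1 and I2; anything larger falls in I3 (assumed boundaries).
BOUNDS = [3, 6]


def interval_label(dt, bounds=BOUNDS):
    """Return the label of the time interval that contains the difference dt."""
    if dt < 0:
        raise ValueError("time difference must be non-negative")
    if dt == 0:
        return "I0"
    for idx, upper in enumerate(bounds, start=1):
        if dt <= upper:
            return f"I{idx}"
    return f"I{len(bounds) + 1}"


def pairwise_intervals(sequence):
    """Map (earlier item, later item) to the interval label of their time gap.
    Assumes the sequence is sorted by time and each item appears once."""
    return {
        (x, y): interval_label(ty - tx)
        for (x, tx, _qx), (y, ty, _qy) in combinations(sequence, 2)
    }


A = [("a", 1, 2), ("b", 3, 1), ("c", 5, 2), ("d", 6, 4), ("e", 8, 5)]
labels = pairwise_intervals(A)
print(labels[("c", "d")], labels[("b", "d")], labels[("a", "d")])  # I1 I1 I2
```

Labelling all pairs up front is convenient for small sequences; a projection-based miner would instead compute only the gaps between a candidate item and the items already in the pattern.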
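Definition 3 (Subsequence): A multi-time-interval sequence C = (d, (ti, ti), c, (ti), a) is a subsequence of the multi-time-interval sequence B = (d, (ti, ti, ti), c, (ti, ti), b, (ti), a) because
1. the items of C are items of B in the same order: b′1 = d, b′2 = c, and b′3 = a, taken from positions i1 = 1, i2 = 2, and i3 = 4 of B; and
2. the interval lists of C, &′ = (ti) = (TR) and &′ = (ti, ti) = (TR, TR), are the intervals of B between the retained items.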
Definition 4 (k-subsequence): The length of a multi-time-interval sequence is its total number of items. A sequence C is a k-subsequence of B when C is a subsequence of B and the length of C is k. For example, consider the multi-time-interval sequence B = (c, (I1, I2), b, (I1), a). The subsequences of B are c, b, a, (c, (I1), b), (c, (I2), a), (b, (I1), a), and (c, (I1, I2), b, (I1), a).

Definition 5 (Table): A table is used to count the support of candidate patterns. Let α = (b, &, b, …, &, b) be a pattern of length k. We construct a table in which each column corresponds to a frequent item γ in S|α and each row corresponds to one kind of multi-time-interval relationship &k = (TR, TR, …, TR).

B. RFM-TI PostfixSpan Algorithm
Input: A sequence database SDB, a maximum span length ms_length, and the support thresholds Rminsup, Fminsup, Mminsup, and Tminsup
Subroutine: RFM-TI PostfixSpan(α, l, SDB|α)
Parameters:
• l is the length of α (the pattern)
• S|α is the α-projected database
• α is an RFT-SP
Output: The complete set of RFT-SPs and RFMT-SPs
Method:
Each item in the sequence database SDB|α is attached before α to form α′, or is added into the first itemset of α to form α′
Scan the entire database SDB|α once and compute Rscore(α′), Fscore(α′), Mscore(α′), and the time intervals between the items of each α′
For each α′
    Calculate the support count
    IF support count ≥ Tminsup THEN
        Scan the database SDB|α once and calculate TRscore(α′), TFscore(α′), TMscore(α′) of each α′
End For
For each α′
    IF TRscore(α′) ≥ Rminsup, TFscore(α′) ≥ Fminsup, and TMscore(α′) + TSM_SDB(α) ≥ Mminsup THEN
        Output α′ as an RFT-SP
        IF TMscore(α′) ≥ Mminsup THEN output α′ as an RFMT-SP
        Construct the α′-projected database SDB|α′ and call RFM-TI PostfixSpan(α′, l+1, S|α′)
End For
Fig. 1. RFM-TI PostfixSpan Algorithm
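The support-counting table of Definition 5 can be sketched in Python for the simplest case, in which one item is attached before a single-item postfix. The data layout, the function names, the interval boundaries, and the toy projections are all assumptions made for this sketch. It also assumes that a candidate is kept when its support is at least Tminsup, which is the reading under which patterns with support 2 pass Tminsup = 2.

```python
from collections import Counter


def interval_label(dt, bounds=(3, 6)):
    """Label a time gap, assuming I0: t = 0, I1: 0 < t <= 3, I2: 3 < t <= 6, I3: t > 6."""
    if dt == 0:
        return "I0"
    for idx, upper in enumerate(bounds, start=1):
        if dt <= upper:
            return f"I{idx}"
    return f"I{len(bounds) + 1}"


def support_table(projections):
    """Count, for every (item, interval) candidate, the number of projections
    in which it occurs. Each projection is (prefix, postfix_time), where prefix
    lists the (item, time) purchases made before the postfix item."""
    counts = Counter()
    for prefix, postfix_time in projections:
        # A set, so that a projection supports each candidate at most once.
        seen = {(item, interval_label(postfix_time - t)) for item, t in prefix}
        counts.update(seen)
    return counts


# Hypothetical projections with respect to the postfix <(a)>.
projections = [
    ([("c", 8), ("d", 9)], 10),
    ([("b", 15), ("c", 18)], 20),
    ([("d", 29)], 30),
]
TMINSUP = 2
table = support_table(projections)
frequent = sorted(c for c, n in table.items() if n >= TMINSUP)
print(frequent)
```

Only the candidates that survive this filter need their total R, F, and M scores computed, which is where the time-interval constraint saves work compared with scoring every extension.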
C. Example of RFM-TI PostfixSpan Algorithm
The RFM-TI PostfixSpan algorithm is developed by modifying the RFM-PostfixSpan algorithm [3]. It solves the problem with a divide-and-conquer strategy. Initially, α is set to null. For a given α-projected database S|α, the algorithm first finds all frequent patterns of length 1 in S|α. Each of them is appended before α to form an RFM-TI sequential pattern α′; the total R, F, and M scores are calculated for each α′, and the α′-projected database S|α′ is constructed. If the length l > 1, then before the total R, F, and M scores are calculated, a table is constructed to record supports. If the support of a pattern is at least the minimum support (Tminsup), its total R, F, and M scores are calculated and checked against the constraints; if they are satisfied, the pattern is output as an RFM multi-time-interval sequence α′ of length (l+1), and the α′-projected database S|α′ is constructed. When the projected database is created for a pattern of length l > 1, only those prefix projections whose postfix satisfies the time intervals between the items of the postfix are included. The algorithm recursively finds the RFM multi-time-interval sequences in S|α′ and finally yields all RFM multi-time-interval sequences in S. The RFM-TI PostfixSpan algorithm is shown in Fig. 1.

Consider Table I and Table II for the database and the item prices, and let Tminsup = 2, tcurrent = 110, ms_length = 40, Rminsup = 0.05, Fminsup = 2, and Mminsup = 1500. Time intervals: I0: t = 0, I1: 0 […]. We start to find the 2-RFTI sequential patterns and 2-RFMTI sequential patterns with postfix <(a)>. For the first compact projection, sid10, the Rscore, Mscore, Fscore, and the time interval between the items are calculated. In sid10, the pattern […] has Rscore = 0.000626, Fscore = 1, and Mscore = 70, and the pattern […] has Rscore = 0.000626, Fscore = 1, and Mscore = 120. For the second projection, sid20, Rscore = 0.00179, Fscore = 1, and Mscore = 240.

Similarly, all the remaining projections can be traversed and the scores of each pattern calculated in the same manner. Then the number of occurrences of each pattern is counted. For example, a(I2)a occurs two times in the projected database, so the support count of a(I2)a is 2. The support counts of the other patterns are calculated similarly: b(I1)a = 1, b(I2)a = 1, c(I1)a = 2, c(I2)a = 1, d(I1)a = 2, and e(I2)a = 2. Only if the support count of a pattern is at least Tminsup (the minimum support threshold) are its total R, F, and M scores calculated. After counting each pattern in the <(a)>-projected database, we get […]: (0.07240, 2, 160), […]: (0.1221, 2, 5020), […]: (0.12236, 2, 1410), and […]: (0.4204, 2, 1890), where the notation "pattern: (TRscoreSDB, TFscoreSDB, TMscoreSDB)" gives a pattern and its associated total recency, total frequency, and total monetary scores. Pattern […] satisfies Rminsup, Fminsup, and Mminsup, with TMscoreSDB([…]) + TSMSDB([…]) = 5020 + 7160 = 12,180. We output […] as a 2-RFT-SP and 2-RFMT-SP. The algorithm then proceeds to build the projected database of […], which is shown in Fig. 3, and finds the 3-RFT-SPs and 3-RFMT-SPs with the postfix […]. Continuing in this way yields the complete set of RFT-SPs and RFMT-SPs in SDB, as shown in Fig. 4.

IV. EXPERIMENT EVALUATION
This section presents a simulation study that empirically compares the proposed algorithm with the RFM-PostfixSpan algorithm. Both algorithms are implemented in PHP and tested on an Intel(R) Core(TM) i3 2350M CPU (2.30 GHz) Windows 7 system with 4 gigabytes of main memory.

A. Real-life Datasets
We used a real-life sequential dataset called SC-POS, which contains the sales data of a supermarket chain in Taiwan. It consists of transactional records from twenty branches between 2002/03/01 and 2002/06/15. We take 10,000 records for our experiments. Each data sequence represents the transactions of one customer, where each transaction records the purchase time, the purchased item numbers, the item prices, and the purchased quantities. Data pre-processing and cleaning were performed, including merging the transactions of the same customer into one data-sequence. The final dataset contains 2904 items and 3152 customers' data-sequences. We use a small dataset because of space limitations; with a larger dataset, the difference in the number of patterns is larger.

B. Performance Evaluation
We compare the runtime and the number of patterns generated by the RFM-PostfixSpan and RFM-TI PostfixSpan algorithms while varying Mminsup from 100 to 250.
Fig. 4. The complete set of RFT-SP and RFMT-SP (columns: Postfix, RFT, RFMT)

We use Table III to compare the number of patterns produced by the RFM-PostfixSpan algorithm and the RFM-TI PostfixSpan algorithm for the selected values of the monetary threshold. All other values are held fixed: Rminsup = 0.05, Fminsup = 2, and Tminsup = 2.

TABLE III. MONETARY SUPPORT VS. NUMBER OF PATTERNS

Monetary Support     Number of Patterns
(Mminsup)            RFM      RFM-TI
100                  779      772
150                  615      609
200                  461      457
250                  365      363

Fig. 5. Runtime vs. Mminsup (x-axis: Mminsup, 100 to 250; y-axis: runtime in seconds; series: RFM, RFM-TI)

From the above comparison we can see that our proposed algorithm takes slightly more time but generates fewer patterns than the RFM-PostfixSpan algorithm. This is because our proposed algorithm trims uninteresting patterns at the cost of more complicated processing. As a result, the number of patterns is reduced (for example, from 779 to 772 at Mminsup = 100), while the difference in runtime remains small.

Second, we compare the runtime of both algorithms by varying Rminsup from 0.05 to 0.11, as shown in Fig. 6. We use Table IV to compare the number of patterns produced by the RFM-PostfixSpan algorithm and our proposed algorithm for the selected values of the recency threshold. All other values are held fixed: Tminsup = 2, Fminsup = 2, and Mminsup = 100.

Fig. 6. Runtime vs. Rminsup (x-axis: Recency, 0.05 to 0.11; y-axis: runtime in seconds; series: RFM, RFM-TI)

TABLE IV. RECENCY SUPPORT VS. NUMBER OF PATTERNS

Recency Support      Number of Patterns
(Rminsup)            RFM      RFM-TI
0.05                 779      772
0.07                 733      726
0.09                 700      693
0.11                 675      668

From the above result we can see that the recency constraint behaves similarly to the monetary constraint. Similarly, we compare the runtime of both algorithms by varying Fminsup from 2 to 5. We use Table V to compare the number of patterns produced by the RFM-PostfixSpan algorithm and our proposed algorithm for the selected values of the frequency threshold. All other values are held fixed: Mminsup = 100, Rminsup = 0.05, and Tminsup = 2.

Fig. 7. Runtime vs. Fminsup (x-axis: Frequency, 2 to 5; y-axis: runtime in seconds; series: RFM, RFM-TI)

TABLE V. FREQUENCY SUPPORT VS. NUMBER OF PATTERNS

Frequency Support    Number of Patterns
(Fminsup)            RFM      RFM-TI
2                    779      772
3                    543      541
4                    420      420
5                    345      345

Next, the scalability of both algorithms is compared. During the tests, we vary the database size |D| and keep all other parameters constant, in order to determine how the algorithms scale up as |D| increases. The number of records varies from 8,000 to 15,000. The following parameters are kept constant: Mminsup = 100, Rminsup = 0.05, Fminsup = 2, and Tminsup = 2. The results in Fig. 8 and Table VI indicate that the runtimes and numbers of patterns of these algorithms scale up approximately linearly with |D|.

Fig. 8. Runtime vs. |D| (x-axis: |D|, 8000 to 15000; y-axis: runtime in seconds; series: RFM, RFM-TI)

TABLE VI. |D| VS. NUMBER OF PATTERNS

|D|                  Number of Patterns
                     RFM      RFM-TI
8000                 545      543
10000                779      772
15000                1111     1104

The following tests are designed to evaluate how the time constraint influences the runtime and the number of patterns of the proposed algorithm. We vary only the value of Tminsup and keep all other parameters constant. Fig. 9 and Table VII show that as Tminsup increases, both the runtime and the number of patterns decrease.

Fig. 9. Runtime vs. Tminsup (x-axis: Tminsup, 2 to 5; y-axis: runtime in seconds; series: RFM-TI)

TABLE VII. TMINSUP VS. NUMBER OF PATTERN

Time Support         Number of Patterns
(Tminsup)            RFM-TI
2                    772
3                    658
4                    635
5                    628
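To make the size of the difference between the two algorithms concrete, the pattern counts reported in Tables III to VI can be compared directly. The short Python sketch below only re-expresses those reported counts as absolute and relative reductions; the variable and function names are chosen for illustration.

```python
# (parameter value, RFM patterns, RFM-TI patterns) as reported in Tables III to VI.
TABLES = {
    "Mminsup": [(100, 779, 772), (150, 615, 609), (200, 461, 457), (250, 365, 363)],
    "Rminsup": [(0.05, 779, 772), (0.07, 733, 726), (0.09, 700, 693), (0.11, 675, 668)],
    "Fminsup": [(2, 779, 772), (3, 543, 541), (4, 420, 420), (5, 345, 345)],
    "|D|": [(8000, 545, 543), (10000, 779, 772), (15000, 1111, 1104)],
}


def reduction(rfm, rfm_ti):
    """Absolute and percentage reduction in pattern count from RFM to RFM-TI."""
    diff = rfm - rfm_ti
    return diff, 100.0 * diff / rfm


for name, rows in TABLES.items():
    for value, rfm, rfm_ti in rows:
        diff, pct = reduction(rfm, rfm_ti)
        print(f"{name} = {value}: {diff} fewer patterns ({pct:.2f}%)")

largest = max(reduction(rfm, ti)[0] for rows in TABLES.values() for _, rfm, ti in rows)
```

On these figures the largest difference in any setting is 7 patterns, and at Fminsup = 4 and 5 the two algorithms report the same number of patterns.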
V. CONCLUSION
RFM sequential pattern mining yields patterns that are recently active and profitable, but it cannot give the time interval between every pair of items in a pattern. To discover these time intervals, we proposed the RFM-TI algorithm, which discovers the complete set of RFMT-SPs from a customer sequence database. From these patterns we can find what a customer might buy in the next "h" steps rather than only in the next step. With this capability, we can identify long-term customer behaviors, which helps us take suitable actions beforehand. A real-life dataset is used in our experiments. The results show that the proposed method generates fewer patterns than RFM-based sequential pattern mining while retaining results that are more meaningful to the user. In future work, fuzzy constraints can be added so that the interval limits are no longer rigid but flexible. The work can also be extended by adding further constraints such as location.

REFERENCES
[1] Micheline Kamber, Jiawei Han, Data Mining, Morgan Kaufmann, pp. 498-507, 2006.
[2] R. Agrawal and R. Srikant, "Mining Sequential Patterns," In Proceedings of the 1995 International Conference on Data Engineering, pp. 3-14, 1995.
[3] Tony Cheng-Kui Huang, Yu-Hua Kao, Ya-Han Hu, "Knowledge Discovery of Weighted RFM Sequential Patterns from Customer Sequence Databases," The Journal of Systems and Software, pp. 779-788, 2013.
[4] Mi-Hao Kuo, Shin-Yi Wu, Kwei Tang, Yen-Liang Chen, "Discovering Recency, Frequency, and Monetary (RFM) Sequential Patterns from Customers' Purchasing Data," Electronic Commerce Research and Applications, pp. 241-251, March 2009.
[5] Hu Ya Han (2007), The Research of Customer Purchase Behavior Using Constraint-based Sequential Pattern Mining Approach, Ph.D. Thesis, National Central University Institute of Information Management, Taiwan.
[6] Mi-Hao Kuo (2007), Discovering RFM sequential patterns from customers' purchasing data, Master's thesis, National Central University Institute of Information Management, Taiwan.
[7] Jiawei Han, Behzad Mortazavi-Asl, Helen Pinto, Qiming Chen, Umeshwar Dayal, Mei-Chun Hsu, Jian Pei, "PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth," In 17th International Conference on Data Engineering, 2001, pp. 215-224.
[8] Derya Birant, Data Mining Using RFM Analysis, Knowledge-Oriented Applications in Data Mining, Prof. Kimito Funatsu (Ed.), ISBN: 978-953-307-154-1, InTech, 2011.
[9] Tony Cheng-Kui Huang, Hui-Ru Yang, Yen-Liang Chen, Ya-Han Hu, "On mining multi-time-interval sequential patterns," Data & Knowledge Engineering, pp. 1112-1127, 2009.
[10] Jong-Hwa Lim, Raymond T. Ng, Kyuseok Shim, Chulyun Kim, "SQUIRE: Sequential pattern mining with quantities," The Journal of Systems and Software, pp. 1726-1745, 2007.
[11] C K Bhensdadia, Y P Kosta, "An Efficient Algorithm for Mining Frequent Sequential Patterns and Emerging Patterns with Various Constraints," International Journal of Soft Computing and Engineering, vol. 1, no. 6, January 2012.
[12] Chetna Chand, Amit Thakkar, Amit Ganatra, "Target Oriented Sequential Pattern Mining Using Recency and Monetary Constraints," International Journal of Computer Applications, vol. 45, no. 10, May 2012.
[13] Hao-En Chueh, "Mining Target-Oriented Sequential Pattern with Time-Intervals," International Journal of Computer Science & Information Technology, vol. 2, August 2010.
[14] Suh-Yin Lee, Ming-Yen Lin, "Efficient mining of sequential pattern with time constraint by delimited pattern growth," Knowledge and Information Systems, pp. 499-514, 2005.
[15] Deepak Garg, Preetam Singh Grover, Bhawna Mallick, "Constraint-Based Sequential Pattern Mining: A Pattern Growth Algorithm Incorporating Compactness, Length and Monetary," The International Arab Journal of Information Technology, 2012.
[16] Fan Wu, Tzu-Wei Yeh, Ya-Han Hu, "Considering RFM-Values of Frequent Patterns in Transactional Databases," In Software Engineering and Data Mining, Chengdu, 2010, pp. 422-427.
[17] Shih-Yen Lin, Hsin-Hung Wu, Jo-Ting Wei, "A review of the application of RFM model," African Journal of Business Management, vol. 4, pp. 4199-4206, December 2010.
[18] Jiawei Han, Manish Gupta, "Approaches for Pattern Discovery Using Sequential Data Mining," University of Illinois at Urbana-Champaign, USA.
[19] Bult, J. R. & Wansbeek, T. (1995), "Optimal selection for direct mail," Marketing Science, vol. 14, no. 4, (Fall 1995), pp. 378-394, ISSN: 0732-2399.
[20] Chen, Yen-Liang, Mei-Ching Chiang, and Ming-Tat Ko, "Discovering time-interval sequential patterns in sequence databases," Expert Systems with Applications 25.3 (2003): 343-354.
[21] Hui-Ru Yang (2003), "Discovering multi-time-interval sequential patterns in sequence database," Master Thesis, National Central University Department of Information Management.
[22] Bhawna Mallick, Deepak Garg and P.S. Grover (2012), "CFM-PrefixSpan: A Pattern Growth Algorithm Incorporating Compactness and Monetary," International Journal of Innovative Computing, 8 (7), 4509-4520.