Serv Bus (2009) 3:117–130
DOI 10.1007/s11628-009-0064-8

ORIGINAL PAPER

Comparison of customer response models

David L. Olson · Qing Cao · Ching Gu · Donhee Lee

Received: 20 February 2009 / Accepted: 24 February 2009 / Published online: 7 March 2009
© Springer-Verlag 2009
Abstract Segmentation of customers by likelihood of repeat business is a very important tool in marketing management. A number of approaches have been developed to support this activity. This article reviews basic recency, frequency, and monetary (RFM) methods on a set of data involving the sale of beef products. Variants of RFM are demonstrated. The classical data mining techniques of logistic regression, decision trees, and neural networks are also demonstrated. Results indicate a spectrum of tradeoffs: RFM methods are simpler, but less accurate. Considerations of balancing cell sizes as well as compressing data are examined. Both balancing expected cell densities and compressing the RFM variables into a value function provided more accurate models. The data mining algorithms all provided a noticeable increase in predictive accuracy. Relative tradeoffs among these data mining algorithms in the context of customer segmentation are discussed.

Keywords RFM · Customer segmentation · Neural networks · Decision tree models · Logistic regression
D. L. Olson (corresponding author) · D. Lee
Department of Management, University of Nebraska, Lincoln, NE 68588-0491, USA
e-mail: [email protected]
D. Lee
e-mail: [email protected]
Q. Cao · C. Gu
Rawls College of Business, Texas Tech University, Lubbock, TX 79409-2101, USA
Q. Cao
e-mail: [email protected]
C. Gu
e-mail: [email protected]
1 Introduction

The Recency, Frequency, and Monetary (RFM) approach is a method to identify customers who are more likely to respond to new offers. RFM groups customers by:

1. Recency: time since the customer made his/her most recent purchase,
2. Frequency: number of purchases the customer made within a designated time period,
3. Monetary: average purchase amount.
Bult and Wansbeek (1995) gave an early description of RFM for direct mail marketing. The three variables tend to be correlated, especially F and M. Yang (2004) suggested collapsing the data to a single variable "Value" = M/R. The RFM procedure is to code each customer on the three dimensions listed above. One approach is to sort on each dimension and divide each into five equal parts, thus yielding 125 combinations. Each of these 125 cells can be associated with a rate of positive response to a marketing campaign. RFM is limited in that there are usually more than three attributes important to a successful marketing program. Other attributes that may be important include product variation, customer age, customer income, customer lifestyle, and so on (Fitzpatrick 2001). However, RFM is the basis for a continuing stream of techniques to improve customer segmentation marketing (Elsner et al. 2003), and has been identified as widely used in a number of surveys (Verhoef et al. 2003). It has been found to work relatively well if the expected response rate is high (McCarty and Hastak 2007). This article demonstrates how RFM and other models work in classifying potential retail customers, using a sample of data from a beef product retailer. Section 2 describes the dataset used. Section 3 describes the RFM application and presents the results of variants of RFM. Section 4 gives results for the classical data mining classification models. Section 5 compares the results and gives conclusions.
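To make the basic coding step concrete, the following is a minimal sketch in Python of equal-width quintile coding, assuming a pandas DataFrame with one row per customer and hypothetical columns recency_days, frequency, and monetary. It illustrates the procedure only; the authors' own software is not specified.

```python
import pandas as pd

def rfm_code(df: pd.DataFrame) -> pd.DataFrame:
    """Assign equal-width quintile codes; group 5 is the most attractive."""
    out = df.copy()
    # Low recency is attractive, so the recency labels run in reverse.
    out["R"] = pd.cut(out["recency_days"], bins=5, labels=[5, 4, 3, 2, 1])
    out["F"] = pd.cut(out["frequency"], bins=5, labels=[1, 2, 3, 4, 5])
    out["M"] = pd.cut(out["monetary"], bins=5, labels=[1, 2, 3, 4, 5])
    # Concatenate into one of the 125 cells, e.g. '555'.
    out["cell"] = (out["R"].astype(str) + out["F"].astype(str)
                   + out["M"].astype(str))
    return out
```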
2 Data

A meat products company is one of the largest retailers of premier meat products, with worldwide operations. They process, market, and distribute a wide variety of premium steaks, red meats, and other gourmet foods. These products are custom cut and packaged to serve the needs of various markets. Markets nationwide and overseas include foodservice, mail order, incentive, telesales, retail stores, licensed restaurants, sales to specialty and food stores, and, since 1990, interactive sales. The complete dataset included 64,180 individual purchase orders (mail order division) from 10,000 individual customers from October 11, 1998 to October 03, 2003. The purchase orders included ordering date, ordering amount (price), and whether or not promotion was involved. Normally, RFM analysis involves the use of millions of data points. Our purpose, however, is to describe how the method works, and we extracted a small sample to do so. Of this data, 1,000 observations were used to build the models, and 1,000 were used for testing.
Table 1 Variable correlations

                   R          F          M
R                  1.0
F                 -0.385      1.0
M                 -0.307      0.808      1.0
Response 2003      0.380***   0.086***   0.048
Response 2003 $   -0.061*     0.428***   0.518***

* P ≤ 0.10; *** P ≤ 0.01
The building set was processed to identify the performance as of the end of 2002, using 2003 response as the predictive variable (binary, with value 1 if there was any activity in 2003, 0 if not). There is some correlation among these variables, as shown in Table 1. These correlations indicate that R is the most useful variable with respect to binary response, while F and M contribute additional minor predictive power. With respect to money spent in 2003, M is the strongest predictor, while F also has a strong relationship.
3 Research in RFM

There have been many studies applying RFM to customer analysis (Weng et al. 2006/2007). Efforts to improve results through application of data mining techniques abound, including Bayesian networks (Baesens et al. 2002; Cui et al. 2006), association rules (Yun et al. 2003), and statistical approaches such as logistic regression (McCarty and Hastak 2007). One of the elements of the RFM model is to allow differential weights for each of the three components. The use of the analytic hierarchy process (AHP) for RFM analysis has been studied and reported (Shih and Liu 2003; Liu and Shih 2005a, b).

3.1 RFM Application

RFM was initially applied by dividing the scales for each of the three components into five equally spaced groups. Table 2 shows the boundaries. Group 5 was designated the most attractive group, which for R was the minimum, and for F and M the maximum. RFM analysis typically divides the data into 125 cells, designated by the five groups. The most attractive cell would be 555, or Group 5 for each of the three variables. Using equally divided scales thus yields a maximum count of 1 for this most attractive cell (there is only one observation in Group 5 for F). This observation in fact falls in Group 5 for all three variables (R = 16, F = 56, and M = $8,070, as can be seen from the boundaries in Table 2). Table 3 displays the counts obtained for these 125 cells (most of them empty).

3.2 Response

As to predictive results, measured as response in 2003, all cells having R groups of 1 through 3 responded positively, and only one observation in R4 did not.
Table 2 RFM boundaries

Factor   Min    Max       Group 1       Group 2       Group 3       Group 4       Group 5
R        1      1,540     1,233–1,540   925–1,232     618–924       310–617       1–309
  Count                   80            41            59            100           720
F        1      56        1–12          13–23         24–34         35–45         46–56
  Count                   878           100           17            4             1
M        4.77   8,579.32  4–1,719       1,720–3,434   3,435–5,149   5,150–6,864   6,865–8,579
  Count                   902           74            18            4             2

Note the skewness of the data, which is often encountered: the smaller values dominate on all three metrics, which is attractive for Recency but unfavorable for Frequency and Monetary. Count gives the number of observations in each group.
Table 3 Count by RFM cell

RF       M1    M2    M3    M4    M5
55         0     0     0     0     1
54         0     0     2     1     0
53         1     9     5     2     0
52        51    39     8     1     1
51       572    23     3     0     0
43–45      1     0     0     0     0
42         0     0     0     0     0
41        98     1     0     0     0
32–35      0     0     0     0     0
31        58     1     0     0     0
22–25      0     0     0     0     0
21        41     0     0     0     0
12–15      0     0     0     0     0
11        80     0     0     0     0
The response rate in 2003 for group R5 was 456/720 = 0.63. The result has the counterintuitive outcome that the better the R group, the less the likelihood of positive response. This indicates the possibility of seasonality in the data that might be warping responses. For F groups, F1, F2, F3, and F4 had response rates of 630/878 = 0.72, 85/100 = 0.85, 16/17 = 0.94, and 2/4 = 0.50, respectively, and F5 a perfect response based on only one observation. For M groups, M1's response was 662/902 = 0.73, M2's 59/74 = 0.80, M3's 12/18 = 0.67, and M4 and M5 both perfect for their small count of three each. Using a model to predict response where R > 309 (R in Groups 1 through 4), or F > 23 (F in Groups 3 through 5), or M > 5,149 (M in Groups 4 or 5), 323 cases were selected out of the 1,000 total test cases. The coincidence matrix for this model is given in Appendix Table 9, along with results from all other models. This model thus had an accuracy of 0.415. Most of the error came from cases not selected that actually did respond.
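Written out directly, the selection rule just described is a simple disjunction; a sketch, using the hypothetical column names from the earlier example, is:

```python
def basic_rfm_select(recency_days: float, frequency: float,
                     monetary: float) -> bool:
    """Predict a responder when R > 309 days, F > 23, or M > $5,149."""
    return recency_days > 309 or frequency > 23 or monetary > 5149
```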
Table 4 RFM divisions based on dollars spent

                    Total   No response   Response   Average $
R range
  1–309 days        690     275           415        $204.67
  310–617 days      101     0             101        $132.42
  618–924 days      52      2             50         $114.07
  925–1,232 days    47      1             46         $121.43
  1,233+ days       110     0             110        $117.89
F range
  46+               2       0             2          $353.96
  35–45             4       0             4          $991.50
  24–34             6       0             6          $299.48
  13–23             85      12            73         $284.90
  1–12              903     266           637        $150.41
M range
  $6,865+           2       0             2          $1,576.83
  $5,150–$6,864     2       1             1          $404.96
  $3,435–$5,149     10      2             8          $830.13
  $1,720–$3,434     53      8             45         $286.66
  $0–$1,719         933     267           666        $150.11
Another model, based on response in terms of dollars spent, was generated, selecting R = 1, F = 2–5, and M = 2–5. The response in the test set yielded Table 4. With the exception of those cells involving very low counts, there is a consistent trend in dollars expended. Here, low R values performed best, as expected. The overall correct classification rate in terms of the binary response was 0.591, quite a bit higher than the model built on binary response alone. Detailed results are given in Appendix Table 9. Table 5 shows the results of balancing observations as much as possible within each of the 125 cells. The correlation between F and M (0.808 in Table 1) can be seen in Table 5 by looking at the R = 5 categories. In the M = 1 category, there are many F entries, declining through M = 2 and M = 3, and 0 for M = 4 and M = 5. Conversely, when F = 5, the only observation had M = 5. One of the problems with RFM is this skewness. Our dataset is small, and it would obviously be better to have millions of observations, which would increase cell counts. However, sometimes such data is not available. A second approach is thus to try to obtain cells with more equal density (size-coding). We can accomplish this by setting cell limits by count over the building set. We cannot obtain exactly eight counts per cell because we are dealing with three scales. However, we can come closer, as in Table 5. This was generated sequentially, starting by dividing R into five roughly equal groups. Within each group, F was then sorted into groups of roughly 40 each, and then within those 25 groups, M was divided into groups of roughly eight, as shown in Table 5. The roughness arose primarily because R and F are integer values, and tied values were not split across groups.
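A sketch of this sequential size-coding in Python follows, again assuming the hypothetical columns used earlier. pandas' qcut splits on quantiles; ranking first breaks ties, whereas the paper keeps tied integer values together, so its cell counts vary more than this sketch would.

```python
import pandas as pd

def quintile(s: pd.Series, reverse: bool = False) -> pd.Series:
    """Equal-count codes 1..5 (fewer if the group is tiny)."""
    q = min(5, len(s))
    if q < 2:
        return pd.Series(1, index=s.index)
    codes = pd.qcut(s.rank(method="first"), q, labels=False) + 1
    return (q + 1) - codes if reverse else codes

def size_coded_rfm(df: pd.DataFrame) -> pd.DataFrame:
    """R quintiles, then F quintiles within R, then M within each RF cell."""
    out = df.copy()
    out["R"] = quintile(out["recency_days"], reverse=True)  # most recent -> 5
    out["F"] = out.groupby("R")["frequency"].transform(quintile)
    out["M"] = out.groupby(["R", "F"])["monetary"].transform(quintile)
    return out
```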
Table 5 Balanced group limits

Factor   Group 1     Group 2   Group 3   Group 4   Group 5
R        554–1,540   165–553   57–164    25–56     1–24
  Count  201         200       199       198       202

RF   R         F      M1       M2           M3           M4           M5
55   1–24      13+    1–1,379  1,380–1,720  1,721–2,387  2,388–2,957  2,958+
       Count          8        8            8            8            8
54             8–12   1–600    601–802      803–1,119    1,120–1,486  1,487+
       Count          7        7            7            8            8
53             5–7    1–350    351–527      528–703      704–924      925+
       Count          10       10           10           10           11
52             4      1–242    243–291      292–360      361–472      472+
       Count          5        5            5            6            6
51             1–3    1–132    133–166      167–216      217–323      324+
       Count          9        9            9            10           10
45   25–56     12+    1–1,283  1,284–1,715  1,716–2,351  2,352–3,820  3,821+
       Count          7        7            7            7            7
44             7–11   1–513    514–749      750–856      857–1,202    1,203+
       Count          9        9            9            9            10
43             5–6    1–378    379–483      484–708      709–850      851+
       Count          6        6            7            7            7
42             4      1–192    193–267      268–333      334–407      408+
       Count          5        5            5            5            5
41             1–3    1–109    110–135      136–177      178–254      255+
       Count          11       12           12           12           12
35   57–164    13+    1–1,120  1,121–1,638  1,639–2,367  2,368–2,879  2,880+
       Count          7        7            8            8            8
34             9–12   1–678    679–997      998–1,152    1,153–1,508  1,509+
       Count          8        8            8            9            9
33             7–8    1–443    444–571      572–705      706–1,088    1,089+
       Count          6        7            7            7            7
32             5–6    1–307    308–375      376–564      565–728      729+
       Count          7        8            8            8            8
31             1–4    1–147    148–209      210–313      314–385      386+
       Count          9        9            9            9            10
25   165–553   9+     1–566    567–569      570–1,131    1,132–1,574  1,575+
       Count          8        9            9            9            9
24             6–8    1–465    466–608      609–789      790–941      942+
       Count          9        9            10           10           10
23             5      1–302    303–392      393–423      424–545      546+
       Count          6        6            6            7            7
22             3–4    1–180    181–220      221–309      310–441      442+
       Count          10       11           11           11           11
21             1–2    1–139    140–154      155–183      184–195      196+
       Count          4        5            4            4            5
15   554+      5+     1–358    359–425      426–637      638–862      863+
       Count          4        4            5            5            5
14             4      1–245    246–341      342–400      401–502      503+
       Count          3        4            4            4            4
13             3      1–127    128–194      195–302      303–538      539+
       Count          6        6            6            6            6
12             2      1–88     89–112       113–143      155–211      212+
       Count          12       14           14           13           14
11             1      1–45     46–53        54–71        72–105       106+
       Count          17       6            14           12           13

Count indicates the number of observations in each cell.
The unevenness of some cells (such as categories R = 1, F = 2 and R = 1, F = 1) is due to ties in the F category. Even our attempt to attain cells with eight observations each thus encountered a degree of variance in the counts. This approach does, however, provide data that can be used to calculate lift. (Again, it would obviously be much better to have many more observations, but the calculation was extremely time intensive, and our purpose is to demonstrate the method.)

3.3 Lift

Retailers and manufacturers know that they are wasting a lot of money on mass marketing. The concept of lift is critical to marketing promotion. Lift is the difference between the average probability of positive response and the response rate obtained in a segment. As an example of acting on such segment information, Fingerhut used neural network models to identify the overlaps between mailing patterns and order-filling telephone-call orders, which enabled them to staff their telephones more efficiently and handle heavy order loads. The purpose of lift analysis is to identify the most responsive segments. We are probably more interested in profit, however. We can identify the most profitable policy, but what really needs to be done is to identify the portion of the population to send promotional materials to. The basic objective of lift analysis in marketing is to identify those customers whose decisions will be influenced by marketing in a positive way (Lo 2003). In short, the methodology described earlier identifies those segments of the customer base that we would expect to make a purchase. This may or may not be due to the marketing campaign effort, though. The same methodology can be applied, but more detailed data is needed to identify those whose decisions would have been changed by the marketing campaign rather than simply those who would purchase anyway.
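As a sketch, segment-level lift can be computed from the coded cells and the binary response, assuming the column names from the earlier sketches. The text defines lift as a difference from the average response rate; the ratio form is an equally common convention, so both are shown.

```python
import pandas as pd

def lift_table(df: pd.DataFrame) -> pd.DataFrame:
    base_rate = df["response"].mean()            # e.g. 0.739 in the training set
    seg = df.groupby("cell")["response"].agg(["count", "mean"])
    seg["lift_diff"] = seg["mean"] - base_rate   # definition used in the text
    seg["lift_ratio"] = seg["mean"] / base_rate  # conventional ratio form
    return seg.sort_values("lift_ratio", ascending=False)
```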
Table 6 V values by cell

Cell   0    >0   Prob{>0}   Min V     Max V
20     15   35   0.70       134.203   1583.680
19     26   24   0.48       65.084    133.450
18     21   29   0.58       37.293    64.861
17     22   28   0.56       26.251    37.133
16     26   24   0.48       19.688    26.204
15     17   33   0.66       15.671    19.396
14     20   30   0.60       12.612    15.631
13     25   25   0.50       9.263     12.433
12     23   27   0.54       6.685     9.229
11     17   33   0.66       4.956     6.684
10     13   37   0.74       3.822     4.827
9      15   35   0.70       2.965     3.758
8      14   36   0.72       2.210     2.940
7      8    42   0.84       1.555     2.209
6      3    47   0.94       0.983     1.553
5      0    50   1.00       0.535     0.955
4      0    50   1.00       0.295     0.533
3      0    50   1.00       0.143     0.291
2      0    50   1.00       0.065     0.143
1      1    49   0.98       0.021     0.064
Fig. 1 Lift for equalized data groups
The lift chart for this data is given in Fig. 1. Lift peaks at group 554, the 73rd of the 125 cells. This cell had a response rate of 0.75, slightly above the training set average of 0.739. Of course, the point is not to maximize lift, but to maximize profitability, which requires knowing the expected profit rate for revenue and the cost of marketing. The test results for coding the data with an effort to balance cell sizes yielded an overall correct classification rate of 0.792, relatively better than the previous models.
Fig. 2 Lift of test set
3.4 Value function

The value function compresses the RFM data into one variable, V = M/R (Yang 2004). Since F is highly correlated with M, the analysis is simplified to one dimension. Dividing the training set into groups of 50, sorted on V, generates Table 6. Based on the probability of positive response, there seems to be a clear demarcation between cells 1 through 10 (higher probabilities of purchase) and cells 11 through 20. In cells 1–10, Prob{>0} is always at least 0.7, while in cells 11–20 it is generally below 0.7. Thus, low values of V (4.827 and lower) seem associated with more sales in this data. This implies that lower purchase amounts with longer recency seem more attractive. This seems counterintuitive, but we applied it to the test dataset. The overall correct classification rate was 0.677. The lift chart using sorted V from the test set as the horizontal axis (low values to the left) is shown in Fig. 2. The maximum lift is for the 9th group, associated with V ranging from 2.45 to 3.26.
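A sketch of the value function grouping, under the same column-name assumptions as before, mirrors the construction of Table 6:

```python
import pandas as pd

def value_groups(df: pd.DataFrame, group_size: int = 50) -> pd.DataFrame:
    """Sort customers on V = M/R and summarize groups of 50 (cf. Table 6)."""
    out = df.copy()
    out["V"] = out["monetary"] / out["recency_days"]
    out = out.sort_values("V").reset_index(drop=True)
    out["cell"] = out.index // group_size + 1   # cell 1 holds the lowest V
    return out.groupby("cell").agg(responders=("response", "sum"),
                                   prob=("response", "mean"),
                                   min_V=("V", "min"),
                                   max_V=("V", "max"))
```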
4 Data mining classification models

Three classical data mining classification models were applied to the data: logistic regression, decision trees, and neural networks.

4.1 Logistic regression model

Logistic regression is a classical statistical tool for modeling discrete outcome data, and is one of the three basic methods found in most commercial data mining software. McCarty and Hastak (2007) applied logistic regression, as well as a decision tree algorithm and RFM, to their two direct marketing datasets.
Table 7 Regression betas for logistic regression

Variable   Beta      Significance
Constant   -2.1772
R          0.0284    0.000***
F          0.1726    0.000***
M          0.0001    0.422

*** P ≤ 0.01
In our data, the outcome variable is dichotomous, coded as 0 (did not respond in 2003) or 1 (responded in 2003). The purpose of logistic regression is to classify cases into the most likely category. Logistic regression provides a set of b parameters for the intercept (or intercepts in the case of ordinal data with more than two categories) and independent variables, which can be applied to a logistic function to estimate the probability of belonging to a specified output class (Olson and Shih 2007). The probability that case i belongs to the stated class j is

P_j = 1 / (1 + e^{-(b_0 + Σ b_i x_i)}),
where the b coefficients are obtained from the logistic regression. A logistic regression model was run on the RFM variables, with the results shown in Table 7. M was not significant in this model. Application to the test data yielded an overall correct classification rate of 0.840. Logistic regression has the ability to include other variables. The only variable available beyond RFM was promotion; including it improved the model fit by the smallest of margins.
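The scoring step of the fitted model can be reproduced directly from the betas in Table 7; the fitting itself was done in the authors' statistical package and is not shown. A sketch:

```python
import math

B0, B_R, B_F, B_M = -2.1772, 0.0284, 0.1726, 0.0001  # betas from Table 7

def prob_response(recency_days: float, frequency: float,
                  monetary: float) -> float:
    """Estimated probability of responding in 2003 (classify as 1 if > 0.5)."""
    z = B0 + B_R * recency_days + B_F * frequency + B_M * monetary
    return 1.0 / (1.0 + math.exp(-z))

# The single 555-cell customer from Sect. 3.1 scores near certainty:
print(prob_response(recency_days=16, frequency=56, monetary=8070))
```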
branches. If all objects in a branch belong to the same output class, the node is labeled and the branch is terminated. If there are multiple classes on a branch, another attribute is selected as a node, with all possible attribute values branched. An entropy heuristic is used to select the attributes with the highest information content. In other data mining tools, other bases for selecting branches are often used. WEKA includes many decision tree algorithms, one of the most popular being J48. The J48 decision tree algorithm using 10-fold cross validation in WEKA was applied to the build dataset. The resultant decision tree was:

IF Recency <= 82 AND Frequency <= 9: no (364.0/137.0)
IF Recency <= 82 AND Frequency > 9: yes (132.0/34.0)
IF Recency > 82: yes (504.0)

This model did very well on the test data, with a correct classification rate of 0.837. Decision trees also provide a very easy to understand predictive model.

4.3 Neural network

Neural networks are the third classical data mining tool found in most commercial data mining software products. Cui et al. (2006) and Crone et al. (2006) applied these models to direct marketing applications. Probabilistic neural networks (PNN) use kernel-based approximation to form an estimate of the probability density functions of classes in a classification problem, one of the so-called Bayesian networks (Specht 1990). Probabilistic neural networks are known for their ability to train quickly on sparse datasets. PNN separates data into a specified number of output categories. PNN networks are three-layer networks wherein the training patterns are presented to the input layer, and the output layer has one neuron for each possible category. There must be as many neurons in the hidden layer as there are training patterns. The network produces activations in the output layer corresponding to the probability density function estimate for that category. The highest output represents the most probable category. Due to the ease of training and a sound statistical foundation in Bayesian estimation theory, PNN has become an effective tool for solving many classification problems (Jin et al. 2001; Hoya 2003). The greatest advantages of PNNs are the fact that the output is probabilistic, which makes interpretation easy, and the training speed: training a PNN consists mostly of copying training cases into the network, and so is as close to instantaneous as can be expected (Hoya 2003). The greatest disadvantage is the network size: a PNN network actually contains the entire set of training cases, and is therefore space consuming and slow to execute. There are many other neural network models with a number of parameters that can be selected. Running a number of these, the best fit was obtained with a PNN, yielding a correct classification rate of 0.849.
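A PNN can be sketched in a few lines of numpy: one Gaussian kernel per training case, averaged per class, with the prediction taken as the class of highest estimated density. The smoothing width sigma is a modeling choice, and inputs would normally be standardized first; the authors' package and settings are not specified.

```python
import numpy as np

def pnn_predict(X_train: np.ndarray, y_train: np.ndarray,
                X_test: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    classes = np.unique(y_train)
    scores = np.zeros((len(X_test), len(classes)))
    for j, c in enumerate(classes):
        Xc = X_train[y_train == c]          # hidden layer: one neuron per case
        # Squared distance from each test point to each pattern of class c.
        d2 = ((X_test[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=2)
        # Average kernel activation = Parzen density estimate for class c.
        scores[:, j] = np.exp(-d2 / (2.0 * sigma ** 2)).mean(axis=1)
    return classes[scores.argmax(axis=1)]
```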
5 Conclusions

The results for all of the classification models are quite similar, and considering that we tested only one dataset, none can be identified as clearly better. In fact, that is typical, and it is common practice to try logit regression, neural networks, and decision trees simultaneously on a given dataset. The decision tree model provides clear descriptive reasoning with its rules. Logit models have some interpretability for those with statistical backgrounds, but are not as clear as linear regression models. Neural network models very often fit complex data well, but have limited explanatory power. We have seen a variety of different models useful for taking basic order data and segmenting customers into those most likely to order or not. Table 8 compares them, with relative data requirements and advantages. Basic RFM analysis is based on the simplest data. It involves significant work to sort the data into cells. One traditional approach is to divide the data into 125 cells (five equally scaled divisions for recency, frequency, and monetary value). However, this approach leads to highly unequal cell observations. Analysts could apply the analysis to the number of customers (the RFM row in Table 8) or to dollar values purchased (the $ on RFM row in Table 8). Since dollar volume is usually of greater interest than simple customer count, the latter is expected to usually provide better decision-making information. This was what we observed in our data. The worst-performing cell on all three dimensions typically has much higher density than other cells (Miglautsch 2002). However, these customers may offer a lot of potential opportunities for growth. We demonstrated how to balance cell sizes, which involves quite a bit more data manipulation. However, it gave a much better predictive model when applied to our dataset. Another drawback is that the three variables (R, F, and M) are not independent: frequency and monetary value tend to be highly correlated. Yang's (2004) value function simplifies the data, focusing on one ratio. In our analysis, this data reduction led to improvements over the basic RFM model. We then applied three classic data mining classification algorithms, all of which performed better than the RFM variants, all three giving roughly equivalent results (the PNN neural network model gave a slightly better fit, but that was the best of the five neural network models applied).
Table 8 Model comparisons

Model                 Accuracy on test data   Benefits                  Drawbacks
RFM                   0.415                   Simplest data             Uneven cell densities
$ on RFM              0.591                   Also simple data          Uneven cell densities
Balanced cell sizes   0.792                   Better statistically      More data manipulation required
Value function        0.677                   Condense to one IV        Less information
Logistic regression   0.840                   Can use additional IVs    Formula hard to apply
Decision tree         0.837                   Easy to interpret model
Neural network        0.849                   Can fit nonlinear data    Hard to apply model
These models differ in their portability, however. Logistic regression provides a well-known formula of beta coefficients (although logistic output is more difficult for users to interpret than ordinary least squares regression output). Decision trees provide the simplest output to use, as long as the number of rules generated is kept small (here we only had two rules, but models can involve far more rules). Overall, the classification of customers to identify the most likely prospects in terms of future sales is very important. We have reviewed and demonstrated a number of techniques developed for this task, and have evaluated the relative benefits and drawbacks of each method, giving a rough idea of relative accuracy based on the sample data used.
Appendix

See Table 9.

Table 9 Model results

Model                 Correct       Actual no response,  Actual response,    Correct    Overall correct
                      no response   model response       model no response   response   classification
Total cases           278                                                    722
Basic RFM             185           93                   492                 230        0.415
RFM on $              273           5                    404                 318        0.591
Balanced cell sizes   235           42                   166                 557        0.792
Value function        219           59                   264                 458        0.677
Logistic regression   205           73                   87                  635        0.840
Decision tree         240           38                   125                 597        0.837
Neural network        214           64                   87                  635        0.849

The first row gives total cases; in subsequent rows, the middle two columns give model errors.
References

Baesens B, Viaene S, Van den Poel D, Vanthienen J, Dedene G (2002) Bayesian neural network learning for repeat purchase modeling in direct marketing. Eur J Oper Res 138(1):191–211
Bult JR, Wansbeek T (1995) Optimal selection for direct mail. J Market Sci 14(4):378–394
Crone SF, Lessmann S, Stahlbock R (2006) The impact of preprocessing on data mining: an evaluation of classifier sensitivity in direct marketing. Eur J Oper Res 173(3):781–800
Cui G, Wong ML, Lui H-K (2006) Machine learning for direct marketing response models: Bayesian networks with evolutionary programming. Manag Sci 52(4):597–612
D'Souza R, Krasnodebski M, Abrahams A (2007) Implementation study: using decision tree induction to discover profitable locations to sell pet insurance for a startup company. J Database Market Custom Strategy Manag 14(4):281–288
Elsner R, Krafft M, Huchzemeier A (2003) Optimizing Rhenania's mail-order business through dynamic multilevel modeling (DMLM). Interfaces 33(1):50–66
Fitzpatrick M (2001) Statistical analysis for direct marketers—in plain English. Direct Market 64(4):54–56
Hoya T (2003) On the capability of accommodating new classes within probabilistic neural networks. IEEE Trans Neural Netw 14(2):450–453
Jin X, Srinivasan D, Cheu RL (2001) Classification of freeway traffic patterns for incident detection using constructive probabilistic neural networks. IEEE Trans Neural Netw 12(5):1173–1187
Koonce DA, Fang C-H, Tsai S-C (1997) A data mining tool for learning from manufacturing systems. Comput Ind Eng 33(1–2):27–30
Liu D-R, Shih Y-Y (2005a) Integrating AHP and data mining for product recommendation based on customer lifetime value. Inf Manag 42(3):387–400
Liu D-R, Shih Y-Y (2005b) Hybrid approaches to product recommendation based on customer lifetime value and purchase preferences. J Syst Softw 77(2):181–191
Lo VSY (2003) The true lift model—a novel data mining approach to response modeling in database marketing. ACM SIGKDD 4(2):78–86
McCarty JA, Hastak M (2007) Segmentation approaches in data-mining: a comparison of RFM, CHAID, and logistic regression. J Bus Res 60(6):656–662
Miglautsch J (2002) Application of RFM principles: what to do with 1-1-1 customers? J Database Market 9(4):319–324
Olson DL, Shih Y (2007) Introduction to business data mining. Irwin/McGraw-Hill, New York
Shih Y-Y, Liu C-Y (2003) A method for customer lifetime value ranking—combining the analytic hierarchy process and clustering analysis. Database Market Custom Strategy Manag 11(2):159–172
Specht DF (1990) Probabilistic neural networks. Neural Netw 3:110–118
Van den Berg B, Breur T (2007a) Merits of interactive decision tree building: Part 1. J Target Measur Anal Market 15(3):137–145
Van den Berg B, Breur T (2007b) Merits of interactive decision tree building: Part 2: How to do it. J Target Measur Anal Market 15(4):201–209
Verhoef PC, Spring PN, Hoekstra JC, Leeflang PSH (2003) The commercial use of segmentation and predictive modeling techniques for database marketing in the Netherlands. Decis Support Syst 34(4):471–481
Weng SS, Chiu R-K, Wang B-J, Su S-H (2006/2007) The study and verification of a mathematical modeling for customer purchasing behavior. J Comput Inf Syst 47(2):46–57
Yang AX (2004) How to develop new approaches to RFM segmentation. J Target Measur Anal Market 13(1):50–60
Yun H, Ha D, Hwang B, Ryu KH (2003) Mining association rules on significant rare data using relative support. J Syst Softw 67:181–191