Serv Bus (2009) 3:117–130 DOI 10.1007/s11628-009-0064-8 ORIGINAL PAPER

Comparison of customer response models

David L. Olson · Qing Cao · Ching Gu · Donhee Lee

Received: 20 February 2009 / Accepted: 24 February 2009 / Published online: 7 March 2009 © Springer-Verlag 2009

Abstract Segmentation of customers by likelihood of repeat business is a very important tool in marketing management. A number of approaches have been developed to support this activity. This article reviews basic recency, frequency, and monetary (RFM) methods on a set of data involving the sale of beef products, and demonstrates variants of RFM. The classical data mining techniques of logistic regression, decision trees, and neural networks are also demonstrated. Results indicate a spectrum of tradeoffs. RFM methods are simpler, but less accurate. Considerations of balancing cell sizes and of compressing data are examined: both balancing expected cell densities and compressing the RFM variables into a value function were found to provide more accurate models. The data mining algorithms were all found to provide a noticeable increase in predictive accuracy. Relative tradeoffs among these data mining algorithms in the context of customer segmentation are discussed.

Keywords RFM · Customer segmentation · Neural networks · Decision tree models · Logistic regression

D. L. Olson (&) · D. Lee
Department of Management, University of Nebraska, Lincoln, NE 68588-0491, USA
e-mail: [email protected]
D. Lee e-mail: [email protected]

Q. Cao · C. Gu
Rawls College of Business, Texas Tech University, Lubbock, TX 79409-2101, USA
Q. Cao e-mail: [email protected]
C. Gu e-mail: [email protected]


1 Introduction

The Recency, Frequency, and Monetary (RFM) approach is a method to identify customers who are more likely to respond to new offers. RFM groups customers by:

1. Recency: time since the customer made his/her most recent purchase,
2. Frequency: number of purchases the customer made within a designated time period,
3. Monetary: average purchase amount.
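As a concrete illustration of these three definitions, the following minimal sketch computes R, F, and M per customer; the order table and its column names (customer_id, order_date, amount) are our own assumptions, not from the original data.

```python
import pandas as pd

# Hypothetical order-level data: one row per purchase order.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "order_date": pd.to_datetime(["2001-03-15", "2002-11-02", "1999-06-30",
                                  "2000-01-10", "2001-08-21", "2002-12-05"]),
    "amount": [120.0, 85.5, 300.0, 45.0, 60.0, 75.0],
})

cutoff = pd.Timestamp("2002-12-31")   # end of the observation window

rfm = orders.groupby("customer_id").agg(
    R=("order_date", lambda d: (cutoff - d.max()).days),  # recency in days
    F=("order_date", "count"),                            # number of purchases
    M=("amount", "mean"),                                 # average purchase amount
)
print(rfm)
```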

Bult and Wansbeek (1995) gave an early description of RFM for direct mail marketing. The three variables tend to be correlated, especially F and M. Yang (2004) suggested collapsing the data to a single variable "Value" = M/R. The RFM procedure codes each customer on the three dimensions listed above. One approach is to sort on each dimension and divide each into five equal parts, yielding 125 combinations. Each of these 125 cells can then be associated with its observed rate of positive response to a marketing campaign. RFM is limited in that there are usually more than three attributes important to a successful marketing program. Other attributes that may matter include product variation, customer age, customer income, customer lifestyle, and so on (Fitzpatrick 2001). However, RFM is the basis for a continuing stream of techniques to improve customer segmentation marketing (Elsner et al. 2003), and surveys have identified it as widely used (Verhoef et al. 2003). It has been found to work relatively well when the expected response rate is high (McCarty and Hastak 2007). This article demonstrates how RFM and other models classify potential retail customers, using a sample of data from a beef product retailer. Section 2 describes the dataset used. Section 3 describes the RFM application and presents the results of variants of RFM. Section 4 gives results for the classical data mining classification models. Section 5 compares the results and gives conclusions.

2 Data

A meat products company is one of the largest retailers of premier meat products, with worldwide operations. The company processes, markets, and distributes a wide variety of premium steaks, red meats, and other gourmet foods. These products are custom cut and packaged to serve the needs of various markets. Markets nationwide and overseas include foodservice, mail order, incentive, telesales, retail stores, licensed restaurants, sales to specialty and food stores, and, since 1990, interactive sales. The complete dataset included 64,180 individual purchase orders (mail order division) from 10,000 individual customers, spanning October 11, 1998 to October 3, 2003. The purchase orders included ordering date, ordering amount (price), and whether or not a promotion was involved. Normally, RFM analysis involves millions of data points; our purpose, however, is to describe how the method works, so we extracted a small sample. Of this data, 1,000 observations were used to build the models, and 1,000 were used for testing.

Table 1  Variable correlations

                   R          F          M
R                  1.0
F                  -0.385     1.0
M                  -0.307     0.808      1.0
Response 2003      0.380***   0.086***   0.048
Response 2003 $    -0.061*    0.428***   0.518***

* P ≤ 0.10; *** P ≤ 0.01

The building set was processed to identify performance as of the end of 2002, using 2003 response as the predicted variable (binary, with value 1 if there was any activity in 2003, 0 if not). There is some correlation among these variables, as shown in Table 1. These correlations indicate that R is the most useful variable with respect to binary response, while F and M contribute additional minor predictive power. With respect to money spent in 2003, M is the strongest predictor, while F also has a strong relationship.
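A correlation table such as Table 1 can be produced directly from the building set; a one-line sketch, assuming the rfm frame from the earlier sketch augmented with hypothetical response_2003 and response_2003_usd columns, is:

```python
# Pearson correlations among R, F, M and the two 2003 response measures,
# analogous to Table 1 (the response column names are our own assumptions).
cols = ["R", "F", "M", "response_2003", "response_2003_usd"]
print(rfm[cols].corr().round(3))
```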

3 Research in RFM

There have been many studies applying RFM to customer analysis (Weng et al. 2006/2007). Efforts to improve results through application of data mining techniques abound, including Bayesian networks (Baesens et al. 2002; Cui et al. 2006), association rules (Yun et al. 2003), and statistical approaches such as logistic regression (McCarty and Hastak 2007). One element of the RFM model is to allow differential weights on each of the three components. The use of the analytic hierarchy process (AHP) for RFM analysis has been studied and reported (Shih and Liu 2003; Liu and Shih 2005a, b).

3.1 RFM Application

RFM was initially applied by dividing the scale for each of the three components into five equally spaced groups. Table 2 shows the boundaries. Group 5 was assigned to the most attractive end of each scale, which for R was the minimum and for F and M the maximum. RFM analysis typically divides the data into 125 cells, designated by the five groups on each variable; the most attractive cell would be 555, Group 5 on all three variables. Using equally divided scales yields a maximum count of 1 for this most attractive cell, since there is only one observation in Group 5 for F. That observation in fact falls in Group 5 for all three variables (R = 16, F = 56, and M = $8,070; see the boundaries in Table 2). Table 3 displays the counts obtained for these 125 cells (most null).
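A minimal sketch of this equal-width coding, continuing the pandas example from Sect. 1 (the coding function and column names are our own), is:

```python
import pandas as pd

def equal_width_code(series, reverse=False):
    """Five equal-width groups coded 1..5, with 5 the most attractive."""
    labels = [5, 4, 3, 2, 1] if reverse else [1, 2, 3, 4, 5]
    return pd.cut(series, bins=5, labels=labels).astype(int)

# Small R (recent purchase) is attractive, so its coding is reversed.
rfm["R_code"] = equal_width_code(rfm["R"], reverse=True)
rfm["F_code"] = equal_width_code(rfm["F"])
rfm["M_code"] = equal_width_code(rfm["M"])

# One of 125 cells per customer, e.g. "555" for the most attractive.
rfm["cell"] = (rfm["R_code"].astype(str) + rfm["F_code"].astype(str)
               + rfm["M_code"].astype(str))
print(rfm["cell"].value_counts())
```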

Table 2  RFM boundaries

Factor   Min    Max       Group 1      Group 2      Group 3      Group 4      Group 5
R        1      1,540     1,233–1,540  925–1,232    618–924      310–617      1–309
  Count                   80           41           59           100          720
F        1      56        1–12         13–23        24–34        35–45        46–56
  Count                   878          100          17           4            1
M        4.77   8,579.32  4–1,719      1,720–3,434  3,435–5,149  5,150–6,864  6,865–8,579
  Count                   902          74           18           4            2

Note the skewness of the data, which is often encountered: the smaller values dominate on all three metrics, which is attractive for Recency but unfavorable for Frequency and Monetary. Count rows give the number of observations in each group.

Table 3  Count by RFM cell: rows give the RF combination (55 down to 11) against columns M1–M5. Most of the 125 cells are null, and counts concentrate in the F1, M1 corner: cell 511 holds 572 observations, 411 holds 98, 311 holds 58, 211 holds 41, and 111 holds 80; the single F = 5 observation falls in cell 555.

3.2 Response

As to predictive results, measured as response in 2003, all cells having R groups of 1 through 3 responded positively, and only one observation in R4 did not. The response rate in 2003 for group R5 was 456/720 = 0.63. This has the counterintuitive outcome that the better the R group, the lower the likelihood of positive response, indicating possible seasonality in the data that might be warping responses. For the F groups, F1, F2, F3, and F4 had response rates of 630/878 = 0.72, 85/100 = 0.85, 16/17 = 0.94, and 2/4 = 0.50, respectively, and F5 a perfect response based on only one observation. For the M groups, M1's response was 662/902 = 0.73, M2's 59/74 = 0.80, M3's 12/18 = 0.67, and M4 and M5 both perfect for their small counts of three each. Using a model to predict response where R > 309 (R in Groups 1 through 4), or F > 23 (F in Groups 3 through 5), or M > 5,149 (M in Groups 4 or 5), 323 of the 1,000 test cases were selected. The coincidence matrix for this model is given in the Appendix Table 9, along with results from all other models. This model had an accuracy of 0.415. Most of the error arose because most cases not selected actually did respond.
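This selection rule and its coincidence matrix can be scored in a few lines; the sketch below assumes a test DataFrame holding the 1,000 hold-out cases with columns R, F, M, and a binary response (our naming):

```python
import pandas as pd

# Select cases the rule predicts will respond.
predicted = ((test["R"] > 309) | (test["F"] > 23) | (test["M"] > 5149)).astype(int)

# Coincidence (confusion) matrix: actual outcome against model prediction.
print(pd.crosstab(test["response"], predicted,
                  rownames=["actual"], colnames=["predicted"]))

accuracy = (predicted == test["response"]).mean()
print(f"overall correct classification: {accuracy:.3f}")
```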

Table 4  RFM divisions based on dollars spent

Range                Total   No response   Response   Average $
R range
  1–309 days         690     275           415        $204.67
  310–617 days       101     0             101        $132.42
  618–924 days       52      2             50         $114.07
  925–1,232 days     47      1             46         $121.43
  1,233+ days        110     0             110        $117.89
F range
  46+                2       0             2          $353.96
  35–45              4       0             4          $991.50
  24–34              6       0             6          $299.48
  13–23              85      12            73         $284.90
  1–12               903     266           637        $150.41
M range
  $6,865+            2       0             2          $1,576.83
  $5,150–$6,864      2       1             1          $404.96
  $3,435–$5,149      10      2             8          $830.13
  $1,720–$3,434      53      8             45         $286.66
  $0–$1,719          933     267           666        $150.11

Another model, based on response in terms of dollars spent, was generated, selecting R = 1, F = 2–5, and M = 2–5. The response in the test set yielded Table 4. With the exception of cells involving very low counts, there is a consistent trend in dollars expended. Here, low R values performed best, as expected. The overall correct classification in terms of the binary response was 0.591, quite a bit higher than for the model built on binary response. Detailed results are given in the Appendix Table 9.

One of the problems with RFM is skewness. The correlation between F and M (0.808 in Table 1) can be seen in Table 3 by looking at the R = 5 rows: in the M = 1 column there are many entries, declining through M = 2 and M = 3 to none for M = 4 and M = 5, while the only F = 5 observation had M = 5. Our dataset is small, and it would obviously be better to have millions of observations, which would increase cell counts. However, sometimes such data are not available. A second approach, then, is to try to obtain cells of more equal density (size-coding) by setting cell limits based on counts in the building set. We cannot obtain exactly eight counts per cell because we are dealing with three scales, but we can come close, as in Table 5. Table 5 was generated sequentially, starting by dividing R into five roughly equal groups. Within each R group, F was then sorted into groups of roughly 40 each, and within those 25 groups, M was divided into groups of roughly eight. The rough part arose primarily because R and F are integer values, and ties were not split across groups.
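A sketch of this sequential size-coding using pandas quantile splits follows; because tied integer values are kept together (duplicates="drop"), group counts are only roughly balanced, as in Table 5. The frame and column names continue our earlier assumptions.

```python
import pandas as pd

def size_code(series, k=5):
    """Quantile-based coding into k roughly equal-count groups, 1..k.
    duplicates='drop' keeps tied values in one group, so counts vary."""
    return pd.qcut(series, k, labels=False, duplicates="drop") + 1

build = rfm.copy()                      # the 1,000-customer building set
build["R_code"] = size_code(build["R"])
# F is size-coded within each R group, then M within each (R, F) group.
build["F_code"] = build.groupby("R_code")["F"].transform(size_code)
build["M_code"] = build.groupby(["R_code", "F_code"])["M"].transform(size_code)
print(build.groupby(["R_code", "F_code", "M_code"]).size())
```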

Table 5  Balanced group limits (counts in parentheses)

Factor   Group 1     Group 2   Group 3   Group 4   Group 5
R        554–1,540   165–553   57–164    25–56     1–24
Count    (201)       (200)     (199)     (198)     (202)

RF   R        F      M1            M2                M3                M4                M5
55   1–24     13+    1–1,379 (8)   1,380–1,720 (8)   1,721–2,387 (8)   2,388–2,957 (8)   2,958+ (8)
54            8–12   1–600 (7)     601–802 (7)       803–1,119 (7)     1,120–1,486 (8)   1,487+ (8)
53            5–7    1–350 (10)    351–527 (10)      528–703 (10)      704–924 (10)      925+ (11)
52            4      1–242 (5)     243–291 (5)       292–360 (5)       361–472 (6)       472+ (6)
51            1–3    1–132 (9)     133–166 (9)       167–216 (9)       217–323 (10)      324+ (10)
45   25–56    12+    1–1,283 (7)   1,284–1,715 (7)   1,716–2,351 (7)   2,352–3,820 (7)   3,821+ (7)
44            7–11   1–513 (9)     514–749 (9)       750–856 (9)       857–1,202 (9)     1,203+ (10)
43            5–6    1–378 (6)     379–483 (6)       484–708 (7)       709–850 (7)       851+ (7)
42            4      1–192 (5)     193–267 (5)       268–333 (5)       334–407 (5)       408+ (5)
41            1–3    1–109 (11)    110–135 (12)      136–177 (12)      178–254 (12)      255+ (12)
35   57–164   13+    1–1,120 (7)   1,121–1,638 (7)   1,639–2,367 (8)   2,368–2,879 (8)   2,880+ (8)
34            9–12   1–678 (8)     679–997 (8)       998–1,152 (8)     1,153–1,508 (9)   1,509+ (9)
33            7–8    1–443 (6)     444–571 (7)       572–705 (7)       706–1,088 (7)     1,089+ (7)
32            5–6    1–307 (7)     308–375 (8)       376–564 (8)       565–728 (8)       729+ (8)
31            1–4    1–147 (9)     148–209 (9)       210–313 (9)       314–385 (9)       386+ (10)
25   165–553  9+     1–566 (8)     567–569 (9)       570–1,131 (9)     1,132–1,574 (9)   1,575+ (9)
24            6–8    1–465 (9)     466–608 (9)       609–789 (10)      790–941 (10)      942+ (10)
23            5      1–302 (6)     303–392 (6)       393–423 (6)       424–545 (7)       546+ (7)
22            3–4    1–180 (10)    181–220 (11)      221–309 (11)      310–441 (11)      442+ (11)
21            1–2    1–139 (4)     140–154 (5)       155–183 (4)       184–195 (4)       196+ (5)
15   554+     5+     1–358 (4)     359–425 (4)       426–637 (5)       638–862 (5)       863+ (5)
14            4      1–245 (3)     246–341 (4)       342–400 (4)       401–502 (4)       503+ (4)
13            3      1–127 (6)     128–194 (6)       195–302 (6)       303–538 (6)       539+ (6)
12            2      1–88 (12)     89–112 (14)       113–143 (14)      155–211 (13)      212+ (14)
11            1      1–45 (17)     46–53 (6)         54–71 (14)        72–105 (12)       106+ (13)

The unevenness of some cells (such as cells R = 1, F = 2 and R = 1, F = 1) is due to ties in the F category; even our attempt to attain cells of eight observations each thus encountered some variance in counts. This approach does, however, provide data that can be used to calculate lift. (Again, it would obviously be much better to have many more observations, but the calculation was time intensive, and our purpose is to demonstrate.)

3.3 Lift

Retailers and manufacturers know that they waste a great deal of money on mass marketing, and the concept of lift is critical to marketing promotion. Lift is the difference between the response rate obtained in a segment and the average probability of positive response. For example, neural network models were used at Fingerhut to identify overlaps between mailing patterns and order-filling telephone calls, enabling the firm to staff its telephones more efficiently and handle heavy order loads. The purpose of lift analysis is to identify the most responsive segments. We are usually more interested in profit, however: beyond identifying the most profitable policy, what really needs to be done is to identify the portion of the population to send promotional materials to. The basic objective of lift analysis in marketing is to identify those customers whose decisions will be influenced by marketing in a positive way (Lo 2003). In short, the methodology described earlier identifies those segments of the customer base that we would expect to make a purchase, which may or may not be due to the marketing campaign itself. The same methodology can be applied, but more detailed data are needed to identify those whose decisions would have been changed by the marketing campaign rather than simply those who would have purchased anyway.
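With the balanced cells in place, segment-level lift reduces to comparing each cell's response rate against the base rate. A sketch, continuing the build frame above with its assumed binary response column, is:

```python
# Base probability of positive response in the building set.
overall = build["response"].mean()

by_cell = (build.groupby(["R_code", "F_code", "M_code"])["response"]
           .agg(n="count", rate="mean")
           .sort_values("rate", ascending=False))
by_cell["lift"] = by_cell["rate"] - overall   # difference from the base rate

# Cumulative response rate if the best cells are mailed first.
by_cell["cum_rate"] = (by_cell["rate"] * by_cell["n"]).cumsum() / by_cell["n"].cumsum()
print(by_cell.head(10))
```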

Table 6  V values by cell

Cell   No response (0)   Response (>0)   Prob{>0}   Min V     Max V
20     15                35              0.70       134.203   1,583.680
19     26                24              0.48       65.084    133.450
18     21                29              0.58       37.293    64.861
17     22                28              0.56       26.251    37.133
16     26                24              0.48       19.688    26.204
15     17                33              0.66       15.671    19.396
14     20                30              0.60       12.612    15.631
13     25                25              0.50       9.263     12.433
12     23                27              0.54       6.685     9.229
11     17                33              0.66       4.956     6.684
10     13                37              0.74       3.822     4.827
9      15                35              0.70       2.965     3.758
8      14                36              0.72       2.210     2.940
7      8                 42              0.84       1.555     2.209
6      3                 47              0.94       0.983     1.553
5      0                 50              1.00       0.535     0.955
4      0                 50              1.00       0.295     0.533
3      0                 50              1.00       0.143     0.291
2      0                 50              1.00       0.065     0.143
1      1                 49              0.98       0.021     0.064

Fig. 1  Lift for equalized data groups

The lift chart for this data is given in Fig. 1. Lift peaks at cell 554, the 73rd of the 125 cells, which had a response rate of 0.75, slightly above the training set average of 0.739. Of course, the point is not to maximize lift but to maximize profitability, which requires knowing the expected profit rate on revenue and the cost of marketing. On the test data, coding the data with an effort to balance cell sizes yielded an overall correct classification rate of 0.792, a considerable improvement.

Fig. 2  Lift of test set

3.4 Value function

The value function compresses the RFM data into one variable, V = M/R (Yang 2004). Since F is highly correlated with M, the analysis is thus simplified to one dimension. Dividing the training set into groups of 50, sorted on V, generates Table 6. Based on the probability of response being greater than zero, there is a fairly clear demarcation between cells 1 through 10 (higher probabilities of purchase) and cells 11 through 20: in cells 1–10, Prob{>0} is always at least 0.7, while in cells 11–20 it is generally below 0.7. Thus, low values of V (4.827 and lower) are associated with more sales in this data, implying that lower purchase amounts combined with longer recency appear more attractive. This seems counterintuitive, but we applied it to the test dataset, where the overall correct classification rate was 0.677. The lift chart using sorted V from the test set as the horizontal axis (low values to the left) is shown in Fig. 2. The maximum lift is for the 9th group, associated with V ranging from 2.45 to 3.26.
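The value-function segmentation is easy to reproduce; a sketch building Table 6's quantities from the 1,000-case building set (again using our assumed frame and column names) is:

```python
build["V"] = build["M"] / build["R"]      # Yang's value function V = M/R

# Sort on V and split the 1,000 cases into 20 cells of 50.
build = build.sort_values("V").reset_index(drop=True)
build["V_cell"] = build.index // 50 + 1   # cell 1 = lowest V, cell 20 = highest

print(build.groupby("V_cell").agg(
    prob_response=("response", "mean"),   # the Prob{>0} column of Table 6
    min_V=("V", "min"),
    max_V=("V", "max"),
))
```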

4 Data mining classification models

Three classical data mining classification models were applied to the data: logistic regression, decision trees, and neural networks.

4.1 Logistic regression model

Logistic regression is a classical statistical tool for modeling discrete outcome data, and is one of the three basic methods applied in most commercial data mining software. McCarty and Hastak (2007) applied logistic regression, as well as a decision tree algorithm and RFM, to their two direct marketing datasets. In our data, the outcome variable is dichotomous, coded 0 (did not respond in 2003) or 1 (responded in 2003).

Table 7  Regression betas for logistic regression

Variable   Beta      Significance
Constant   -2.1772
R          0.0284    0.000***
F          0.1726    0.000***
M          0.0001    0.422

*** P ≤ 0.01

The purpose of logistic regression is to classify cases into the most likely category. Logistic regression provides a set of beta parameters for the intercept (or intercepts, in the case of ordinal data with more than two categories) and the independent variables, which can be applied to a logistic function to estimate the probability of belonging to a specified output class (Olson and Shih 2007). The probability that a case belongs to the stated class is

$$P_j = \frac{1}{1 + e^{-(b_0 + \sum_i b_i x_i)}},$$

where the b coefficients are obtained from the logistic regression. A logistic regression model was run on the RFM variables, with the results shown in Table 7. M was not significant in this model. Application to the test data yielded an overall correct classification rate of 0.840. Logistic regression can also include other variables; the only variable external to RFM available was promotion, and including it improved the model fit by only the smallest of margins.
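A sketch of this model fit, using statsmodels rather than the software used in the paper (build and test are the assumed frames from the earlier sketches), is:

```python
import statsmodels.api as sm

# Fit the logit on R, F, M; the summary reports betas and significance
# in the style of Table 7.
X_build = sm.add_constant(build[["R", "F", "M"]])
logit = sm.Logit(build["response"], X_build).fit()
print(logit.summary())

# Classify hold-out cases at the 0.5 probability threshold.
X_test = sm.add_constant(test[["R", "F", "M"]])
pred = (logit.predict(X_test) >= 0.5).astype(int)
print("accuracy:", (pred == test["response"]).mean())
```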

4.2 Decision tree

The second classical model considered was the decision tree. Decision trees in the context of data mining refer to tree structures of rules. They have been applied by many in the analysis of direct marketing data, including Van den Berg and Breur (2007a, b) and D'Souza et al. (2007). The data mining decision tree process involves collecting the variables that the analyst thinks might bear on the decision at issue and analyzing them for their ability to predict the outcome. Decision trees are useful for gaining further insight into customer behavior, as well as leading to ways to profitably act on results. An algorithm automatically determines which variables are most important, based on their ability to sort the data into the correct output category. The method has a relative advantage over neural networks and genetic algorithms in that a reusable set of rules is provided, explaining the model's conclusions. There are many examples where decision trees have been applied to business data mining, including classifying loan applicants, screening potential consumers, and rating job applicants (Olson and Shih 2007).

Decision trees provide a way to implement rule-based approaches. The ID3 system selects an attribute as a root, with branches for different values of the attribute (Koonce et al. 1997). All objects in the training set are classified into these branches. If all objects in a branch belong to the same output class, the node is labeled and the branch is terminated. If there are multiple classes on a branch, another attribute is selected as a node, with all possible attribute values branched. An entropy heuristic is used to select the attribute carrying the most information. Other data mining tools often use other bases for selecting branches. WEKA includes many decision tree algorithms, one of the most popular being J48. The J48 algorithm, using tenfold cross validation in WEKA, was applied to the build dataset. The resultant decision tree was:

IF Recency <= 82 AND Frequency <= 9: no (364.0/137.0)
IF Recency <= 82 AND Frequency > 9: yes (132.0/34.0)
IF Recency > 82: yes (504.0)

This model did very well on the test data, with a correct classification rate of 0.837. Decision trees also provide a very easy to understand predictive model.
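An analogous tree can be grown with scikit-learn's CART implementation; this is a stand-in for WEKA's J48 (a C4.5 implementation), so the split criteria differ and the rules will not match the ones above exactly:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Shallow tree to keep the rule set small, as in the J48 result above.
tree = DecisionTreeClassifier(max_depth=2, min_samples_leaf=50, random_state=0)
tree.fit(build[["R", "F", "M"]], build["response"])

print(export_text(tree, feature_names=["R", "F", "M"]))
print("accuracy:", tree.score(test[["R", "F", "M"]], test["response"]))
```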

4.3 Neural network

Neural networks are the third classical data mining tool found in most commercial data mining software products. Cui et al. (2006) and Crone et al. (2006) applied these models to direct marketing applications. Probabilistic neural networks (PNN) use kernel-based approximation to estimate the probability density functions of the classes in a classification problem, one of the so-called Bayesian networks (Specht 1990). PNNs are known for their ability to train quickly on sparse datasets, and they separate data into a specified number of output categories. A PNN is a three-layer network in which the training patterns are presented to the input layer and the output layer has one neuron for each possible category; there must be as many neurons in the hidden layer as there are training patterns. The network produces activations in the output layer corresponding to the probability density function estimates for the categories, and the highest output represents the most probable category. Due to ease of training and a sound statistical foundation in Bayesian estimation theory, PNN has become an effective tool for solving many classification problems (Jin et al. 2001; Hoya 2003). The greatest advantages of PNNs are that the output is probabilistic, which makes interpretation easy, and the training speed: training a PNN consists mostly of copying training cases into the network, and so is as close to instantaneous as can be expected (Hoya 2003). The greatest disadvantage is the network size: a PNN actually contains the entire set of training cases, and is therefore space consuming and slow to execute. There are many other neural network models, with a number of parameters that can be selected. Running a number of these, the best fit was obtained with a PNN, yielding a correct classification rate of 0.849.
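The PNN idea itself fits in a few lines of numpy. The sketch below is a minimal Gaussian-kernel classifier in the spirit of Specht's PNN, not the commercial implementation behind the reported 0.849 result; features should be standardized first and sigma tuned.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_new, sigma=1.0):
    """Minimal PNN: one Gaussian kernel per training pattern; each class
    score is the mean kernel activation, and the largest score wins."""
    classes = np.unique(y_train)
    preds = []
    for x in X_new:
        d2 = ((X_train - x) ** 2).sum(axis=1)     # squared distances
        k = np.exp(-d2 / (2.0 * sigma ** 2))      # kernel activations
        scores = [k[y_train == c].mean() for c in classes]
        preds.append(classes[np.argmax(scores)])
    return np.array(preds)

# Example call on standardized feature matrices Xb/Xt with labels yb:
# yhat = pnn_predict(Xb, yb, Xt, sigma=0.5)
```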

5 Conclusions

The results for all of the classification models are quite similar, and considering that only one dataset was tested, none can be identified as clearly better. That is in fact typical, and it is common practice to try logistic regression, neural networks, and decision trees simultaneously on a given dataset. The decision tree model provides clear descriptive reasoning through its rules. Logit models have some interpretability for those with statistical backgrounds, but are not as clear as linear regression models. Neural network models very often fit complex data well, but have limited explanatory power.

We have seen a variety of models useful for taking basic order data and segmenting customers into those most and least likely to order. Table 8 compares them, with relative data requirements and advantages. Basic RFM analysis is based on the simplest data, though it involves significant work to sort the data into cells. One traditional approach is to divide the data into 125 cells (five equally scaled divisions each for recency, frequency, and monetary value); however, this leads to highly unequal cell observations. Analysts could apply the analysis to the number of customers responding (the RFM row in Table 8) or to the dollar values purchased (the $ on RFM row in Table 8). Since dollar volume is usually of greater interest than simple customer count, the latter is expected to usually provide better decision-making information, and this is what we observed in our data. The cell representing the worst performance on all three dimensions typically has much higher density than the other cells (Miglautsch 2002), yet these customers may offer considerable potential for growth. We demonstrated how to balance cell sizes, which involves quite a bit more data manipulation but gave a much better predictive model when applied to our dataset. Another drawback of RFM is that the three variables are not independent; frequency and monetary value tend to be highly correlated. Yang's (2004) value function simplifies the data, focusing on one ratio. In our analysis, this data reduction led to improvement over the basic RFM model. We then applied three classic data mining classification algorithms, all of which performed better than the RFM variants, with all three giving roughly equivalent results (the PNN neural network model gave a slightly better fit, but that was the best of five neural network models applied).

Table 8  Model comparisons

Model                 Accuracy on test data   Benefits                  Drawbacks
RFM                   0.415                   Simplest data             Uneven cell densities
$ on RFM              0.591                   Also simple data          Uneven cell densities
Balanced cell sizes   0.792                   Better statistically      More data manipulation required
Value function        0.677                   Condense to one IV        Less information
Logistic regression   0.840                   Can use additional IVs    Formula hard to apply
Decision tree         0.837                   Easy to interpret model
Neural network        0.849                   Can fit nonlinear data    Hard to apply model


These models differ in their portability, however. Logistic regression provides a well-known formula of beta coefficients (although logistic output is more difficult for users to interpret than ordinary least squares regression output). Decision trees provide the easiest output to use, as long as the number of rules generated is kept small (here we had only two split rules, but models can involve far more). Overall, classifying customers to identify the most likely prospects for future sales is very important. We have reviewed and demonstrated a number of techniques developed for this purpose, and have evaluated the relative benefits and drawbacks of each method, giving a rough idea of relative accuracy based on the sample data used.

Appendix

See Table 9.

Table 9  Model results (1,000 test cases: 278 actual no response, 722 actual response)

Model                 Correct       Actual no response,   Actual response,    Correct    Overall correct
                      no response   model response        model no response   response   classification
Basic RFM             185           93                    492                 230        0.415
RFM on $              273           5                     404                 318        0.591
Balanced cell sizes   235           42                    166                 557        0.792
Value function        219           59                    264                 458        0.677
Logistic regression   205           73                    87                  635        0.840
Decision tree         240           38                    125                 597        0.837
Neural network        214           64                    87                  635        0.849

Entries in the two middle columns are model errors.

References

Baesens B, Viaene S, Van den Poel D, Vanthienen J, Dedene G (2002) Bayesian neural network learning for repeat purchase modeling in direct marketing. Eur J Oper Res 138(1):191–211
Bult JR, Wansbeek T (1995) Optimal selection for direct mail. Market Sci 14(4):378–394
Crone SF, Lessmann S, Stahlbock R (2006) The impact of preprocessing on data mining: an evaluation of classifier sensitivity in direct marketing. Eur J Oper Res 173(3):781–800
Cui G, Wong ML, Lui H-K (2006) Machine learning for direct marketing response models: Bayesian networks with evolutionary programming. Manag Sci 52(4):597–612
D'Souza R, Krasnodebski M, Abrahams A (2007) Implementation study: using decision tree induction to discover profitable locations to sell pet insurance for a startup company. J Database Market Custom Strategy Manag 14(4):281–288
Elsner R, Krafft M, Huchzemeier A (2003) Optimizing Rhenania's mail-order business through dynamic multilevel modeling (DMLM). Interfaces 33(1):50–66
Fitzpatrick M (2001) Statistical analysis for direct marketers—in plain English. Direct Market 64(4):54–56


Hoya T (2003) On the capability of accommodating new classes within probabilistic neural networks. IEEE Trans Neural Netw 14(2):450–453
Jin X, Srinivasan D, Cheu RL (2001) Classification of freeway traffic patterns for incident detection using constructive probabilistic neural networks. IEEE Trans Neural Netw 12(5):1173–1187
Koonce DA, Fang C-H, Tsai S-C (1997) A data mining tool for learning from manufacturing systems. Comput Ind Eng 33(1–2):27–30
Liu D-R, Shih Y-Y (2005a) Integrating AHP and data mining for product recommendation based on customer lifetime value. Inf Manag 42(3):387–400
Liu D-R, Shih Y-Y (2005b) Hybrid approaches to product recommendation based on customer lifetime value and purchase preferences. J Syst Softw 77(2):181–191
Lo VSY (2003) The true lift model—a novel data mining approach to response modeling in database marketing. ACM SIGKDD 4(2):78–86
McCarty JA, Hastak M (2007) Segmentation approaches in data-mining: a comparison of RFM, CHAID, and logistic regression. J Bus Res 60(6):656–662
Miglautsch J (2002) Application of RFM principles: what to do with 1-1-1 customers? J Database Market 9(4):319–324
Olson DL, Shih Y (2007) Introduction to business data mining. Irwin/McGraw-Hill, New York
Shih Y-Y, Liu C-Y (2003) A method for customer lifetime value ranking—combining the analytic hierarchy process and clustering analysis. Database Market Custom Strategy Manag 11(2):159–172
Specht DF (1990) Probabilistic neural networks. Neural Netw 3:110–118
Van den Berg B, Breur T (2007a) Merits of interactive decision tree building: Part 1. J Target Measur Anal Market 15(3):137–145
Van den Berg B, Breur T (2007b) Merits of interactive decision tree building: Part 2: How to do it. J Target Measur Anal Market 15(4):201–209
Verhoef PC, Spring PN, Hoekstra JC, Leeflang PSH (2003) The commercial use of segmentation and predictive modeling techniques for database marketing in the Netherlands. Decis Support Syst 34(4):471–481
Weng SS, Chiu R-K, Wang B-J, Su S-H (2006/2007) The study and verification of a mathematical modeling for customer purchasing behavior. J Comput Inf Syst 47(2):46–57
Yang AX (2004) How to develop new approaches to RFM segmentation. J Target Measur Anal Market 13(1):50–60
Yun H, Ha D, Hwang B, Ryu KH (2003) Mining association rules on significant rare data using relative support. J Syst Softw 67:181–191
