A Quantitative Model of Dynamic Customer ... - Google Sites

A Quantitative Model of Dynamic Customer Relationships Niels Stender University of Aarhus [email protected] December 15, 2006 Abstract A dynamic discrete choice model of the evolution of customer choice in a multiproduct and multi-consumption intensity setting is presented. The transition probability function is formulated conditionally on individual observed and unobserved heterogeneity, using a Hierarchical Bayesian formulation of the latter. The model is used in a study of the customer transaction history of a telecommunications company. Churn and revenue forecasts are derived on an individual level. Estimation is carried out using Metropolis-Hastings MCMC.

Contents 1 Introduction

32

2 Building Models for Marketing Decisions 2.1 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 State Dependence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Individual Level Heterogeneity . . . . . . . . . . . . . . . . . . . . . . . . .

33 33 34 34

3 Model 3.1 Transition Probabilities . . . . . . . . . . . . . 3.2 Interpretation . . . . . . . . . . . . . . . . . . 3.3 Related Model Interpretations . . . . . . . . . 3.4 Likelihood and Posterior . . . . . . . . . . . . 3.4.1 Simpli…ed Choice Probabilities . . . . . 3.4.2 Likelihood and Posterior . . . . . . . . 3.5 Metropolis-Hastings Simulation . . . . . . . . 3.5.1 A Prelude - The Accept-Reject Method 3.5.2 The Metropolis-Hastings Algorithm . . 3.5.3 Complications . . . . . . . . . . . . . . 3.5.4 Choice of Metropolis-Hastings Subtype 3.5.5 Estimation of Model Parameters . . . . 3.6 Sampling . . . . . . . . . . . . . . . . . . . . .

36 36 37 38 38 38 39 40 40 40 41 42 42 43

31

. . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

4 Application 4.1 Data . . . . . . . . . . . . . . . . . . . . . . . . 4.1.1 Event Histories . . . . . . . . . . . . . . 4.2 Multiple Choice Components . . . . . . . . . . . 4.2.1 Continuation Choice . . . . . . . . . . . 4.2.2 Product Choice . . . . . . . . . . . . . . 4.2.3 Consumption Intensity . . . . . . . . . . 4.2.4 Choice Components and Transition Data 4.2.5 Interaction Variables . . . . . . . . . . . 4.2.6 Sociodemographics . . . . . . . . . . . . 4.2.7 Duration Dependence . . . . . . . . . . . 4.3 More on Estimation . . . . . . . . . . . . . . . . 5 Results 5.1 Prediction . . . . . . . . . . . . . . . . . . . . 5.2 Exit Behavior . . . . . . . . . . . . . . . . . . 5.3 Who Buys ADSL? . . . . . . . . . . . . . . . 5.4 Lift - Critical Values . . . . . . . . . . . . . . 5.5 Importance of Individual Level Heterogeneity 5.6 Customer Value . . . . . . . . . . . . . . . . . 5.6.1 Example: Marketing Decision . . . . . 5.7 Limitations and Future Research . . . . . . . 5.8 Conclusion . . . . . . . . . . . . . . . . . . . . A Descriptive Statistics A.1 Sociodemographic Variables A.2 Correlation Table . . . . . . A.3 Histograms . . . . . . . . . A.4 Empirical Exit Probabilities

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

B Model Statistics B.1 Parameter Estimates . . . . . . . . . . . . B.2 Empirical Distribution for Individual Level B.3 Churn Sensitivity . . . . . . . . . . . . . . B.4 PSTN to ADSL Transition Sensitivity . . . B.5 5-year Conditional CLV Prediction . . . .

1

. . . .

. . . .

. . . . . . . . . . . . .

. . . Coe¢ . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . .

. . . . cients . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . .

44 44 44 45 46 46 46 47 47 49 50 51

. . . . . . . . .

51 52 52 54 54 56 56 57 57 58

. . . .

60 60 61 62 63

. . . . .

64 64 66 67 69 70

Introduction

The formation of an ongoing relationship between a costumer and a company is an integral feature of many products and services. The relationship can be dynamic in the sense that consumption intensity and type of product or service involved changes dynamically. Financial performance of such companies depends decisively on the distribution of type and intensity in the customer base, so an understanding of how and why the relationship change through time is an important contributor to informed decision making. Properties of such customer relationships have been studied using a patchwork of marketing research methods such as customer satisfaction, segmentation and churn studies. Recently, there has been an increased focus on creating marketing metrics, among which

32

the concept of customer lifetime value (CLV) calculation is very prominent. The Marketing Science Institute, a non-pro…t organization, ranked marketing metrics as the most important topic in 2002-04 and third place for 2004-06 in The 2002-2004 Research Priorities (2002) and The 2004-2006 Research Priorities (2004). Business functions such as accounting and budgeting can bene…t from revenue forecasts, …nancial analysts and risk managers have an interest in customer base pricing. Marketing management executives are trying to link marketing resource allocation decisions to expected changes in customer value. Schmittlein & Peterson (1994) presented one of the earliest advanced models of relationship duration and revenue, while Reinartz et al. (2003), Donkers et al. (2003) and Kumar et al. (2006) are recent examples. The Kumar et al. paper also presents a brief overview of the recent literature on CLV. A related strain of research is the literature on purchase timing, originating from the challenges posed by retail sector scanner data, such as Manchanda et al. (1999) and Chib et al. (2002). The purpose of this paper is to explore an individual level multi-product, multiconsumption intensity model with a view to CLV calculations. The point of departure is the customer transaction data that already exist in a company’s data archives, in this case, a telecommunications company. The model di¤er from former contributions in its focus on accomodating individual level heterogeneity along with a relatively large state space. More advanced individual level heterogeneity formulations have gained popularity recently due to the possibilities enabled by the rediscovery of the Metropolis-Hastings algorithm, see Chib & Greenberg (1995). In section 2, some of the issues in modeling philosophy are presented. Section 3 describe a dynamic choice model, basically a dynamic version of the multinomial logit model, how to specify individual level heterogeneity and how to estimate it. Section 4 estimates a three component choice model tailored to the needs of telecommunications data, investigates aspects of its out-of-sample predictive ability with respect to consumer behavior and concludes by forecasting CLV 5 years ahead.

2

Building Models for Marketing Decisions

In this section it is argued that state dependence and unobserved heterogeneity are key elements to business models of consumer behavior, and, that hierarchical Bayesian methodology and modern MCMC-estimation are suited to support construction of models containing those elements.

2.1

Challenges

It is challening to model consumer behavior and customer lifetime value. Lack of comprehensive data on the environment and on consumers’information sets, such as attributes of competitive o¤erings, the individual’s knowledge of and preference for these attributes, knowledge of competitive product attributes and switching costs. A second problem is the possible lack of stationarity. Competitors simultaneously change the properties of their product o¤erings, new products are o¤ered, customers gradually learn about new o¤erings and may even change inherent preferences. There are very few studies of consumer behavior where the researcher can claim to know most qualitatively important aspects of the information set that the agent is basing 33

his decisions upon. A study by Miravete & Palacios-Huerta (2002) is one of the few exceptions this author is aware of, when excluding the stream of research related to experimental economics. The mentioned elements spell trouble for those that wish to conduct inference and predict customers’future behavior and value. Rather than constructing an ideal, simple model containing a few variables based on …rst principles, the modeling of customer behavior calls for a pragmatic approach to the problem. For the modeling of key customer behaviors, in many cases, a business analyst or social scientist has few options but to take the available data as a point of departure. To compensate for the high degree of noise and lack of knowledge, any such model must have a great deal of statistical ‡exibility and robustness built into it. When discussing the construction of models for the understanding of customers, several areas from statistics and econometrics intersect. One is that of discrete panel data methods and discrete choice models, while the other is event history analysis with models of durations as a special case. Duration models are concerned with time dependence, speci…cally how a phenomena spends time occupying a given state and how that a¤ects the probability of continually occupying the state.

2.2

State Dependence

Functional form issues aside, the key problems in the modeling of transition data are that of state dependence and unobserved heterogeneity. If the distribution of the state of a service in period t+1 depends on the state in period t or earlier, some form of state-dependence is at play. In the absence of state dependence, the sequence of events (0, 1, 0, 1) for a given service would be just as likely as (1, 1, 0, 0). This will not hold in most realistic cases. For a customer having bought into a given service, …nancial switching costs, through fees required to terminate a contract or entering into new ones, can induce state dependence. Other possibilities include indirect costs such as time lost due to the actual transition, costs of search and information, psychological or rational habit formation, learning and perceived risks are other arguments for state dependence. Lack of state dependence should only be expected in the extreme case of zero transaction costs, a transparent market, identical products, a constant demand and a type of product that cannot be hoarded.

2.3

Individual Level Heterogeneity

Two customers with the same observed characteristics vector might display systematically di¤ering behavior over time. This could be due to elements inherent to the laws governing behavior, but it could also stem from the in‡uence of unobserved characteristics. As has been argued in previous sections, it is not only possible that relevant aspects of the customer’s characteristics are unobserved, it is to be expected. Unaccounted unobserved heterogeneity can lead to fundamental errors of inference on the relation between observed heterogeneity and behavior as argued in general by Heckman (1981), Lancaster (1990) and Hsiao (2003) and in particular for marketing data by Keane (1997). Hence, a useful model of customer behavior must address state dependence and unobserved heterogeneity. In accomodating the stated criteria, the researcher has to make some broad choices in what methodology to use. A nonparametric or parametric formulation? A Frequentist or Bayesian estimation approach? How should unobserved heterogeneity be speci…ed? 34

Nonparametric methods have been increasingly popular in social sciences, as they lessen the number of assumptions necessary to make on the data generating process. The degree of functional form ‡exibility is maximized at the cost, usually, of computation time and interpretability of model results. Unfortunately, little tractable guidance and standards exist in the nonparametric statistical and econometric literature about how to formulate, and estimate, models of dynamic phenomena such as consumer behavior observed over time. A parametric approach is thus followed in this paper. The context in further discussion is implicitly given as that of a discrete choice model, with individual unobserved heterogeneity entering through some linear relationship as an intercept term, though some statements might hold for more complicated and general settings. Allenby & Rossi (1999) and Rossi & Allenby (2000) outline arguments in favor of a hierarchical Bayesian approach using MCMC-methods over a Frequentist approach, when dealing with marketing applications. This paper concurs with their arguments as they are summarized in the following paragraphs. The Frequentist approach by default has as its objective to estimate some average e¤ect across the population, hence leading the researcher to treat individual unobserved heterogeneity as a nuisance parameter that may skew estimates of population parameters. Note that this is not inherent to the Frequentist approach, but a practice carried out in large parts of the applied literature. In a typical marketing scenario, the anatomy of individual unobserved heterogeneity can have important policy implications and might even be the object of the study. The marketeer will often be at least as interested in identifying likely extreme behavioral patterns among customers, as in understanding average behavior. Studies of churn can serve as an example: Accurately capturing tail behavior of individual heterogeneity is essential in detecting the most likely churners. Even more challenging, the marketing analyst will often …nd it useful and pro…table to get some estimate of individual level parameters, rather than just estimating the population distribution parameters. That argument alone will not lead to the use of a Bayesian model formulation. In a Frequentist Fixed E¤ects approach, individual level parameter estimates can often be derived. The matter is complicated by the common panel data problem, that the depth of individual event histories is small relative to the number of cases. Parameters dependent solely on these few datapoints are suspect to great variability. As an analog to specifying a Bayesian prior, the Frequentist can follow the Random E¤ects approach and introduce a random e¤ects/mixing distribution for individual level parameters and estimate the mixing distribution parameters. In what Allenby and Rossi coin an approximate Bayesian approach, individual level estimates can be dervied from the individual level likelihood, conditional on estimated mixing parameter averages. The lowered parameter variance is induced by what is known as parameter shrinkage. The advantage of introducing a mixing distribution comes at the cost of possible misspeci…cation and the necessity of evaluating possibly complicated integrals. The integrals will often have no closed form solution and must be solved numerically. Another issue is the possible high number of individual level parameters that would be part of the estimation. Few optimization routines are geared to solve for parameters numbering in the thousands. The misspeci…cation issues are in practice checked through predictive validity and graphic diagnostics, as presented in Allenby & Rossi (1999). As mentioned in section 2.1, a second pillar is to build as ‡exible models as possible. Thinking of the mixing distribution as a Bayesian prior allows the researcher to avoid the cumbersome integrals and the approximate element in the Random E¤ects approach, 35

when using modern MCMC methods. The advent of these methods has alleviated the former Achille’s heel of complicated hierarchical Bayesian models: The calculation of the posterior. An example of a MCMC method, the Metropolis-Hastings algorithm, is given in section 3.5. The choice of a discrete or continous individual level heterogeneity distribution is discussed next. The use of a discrete point mass heterogeneity distribution for the intercept term has been advocated by Heckman & Singer (1984), when modeling the related problem of durations. It has been the typical choice in many econometric applications. In the marketing science literature, latent class models have been popular. Such models, sometimes implicitly, assume data are generated by agents with unobserved group membership, each group being homogenous with respect to some behavioral parameters. This yields a discrete heterogeneity distribution with probability mass at sets of parameters. A di¤erent approach is advocated by Allenby & Rossi (1999). They argue that a continuous mixing distribution is called for instead, since the tails of the true heterogeneity distribution is unlikely to be well described by a discrete approach. A continous distribution is utilized in this paper, since tail behavior is deemed important.

3

Model

Let N denote the number of customers. For each customer i 2 f1; 2; :::; N g we observe customer choice, or customer states, yit 2 at discrete intervals t 2 f0; 1; 2; :::; Ti g: Ti fyit gt=0 is called an event history. Ti is the maximum time index for customer i’s event history. The choice set, or state space, is a set of discrete values the signify distinct aspect of customer behavior. An example using a choice set of = f0; 1; 2g and its interpretation is given in eq. 1. 8 < 0 do nothing 1 buy A yit = (1) : 2 buy B More examples of choice sets can be found in section 4.2, p. 45. Note that the terms choice, behavior and state are used interchangeably for the remainder of the paper. The terms will always be refering to variations in yit . The challenge now is to understand the evolution in event histories conditional on some observed characteristics. In the following sections a framework for modeling and estimation is laid out. Initially, the framework will only treat a one-dimensional choice situation, while the application section will work with three-dimensional choice.

3.1

Transition Probabilities

Attention is now turned to a formulation of transition probabilities for event histories. It is assumed that the probability distribution of yit depends on yi;t 1 ; a vector of exogeneous variables xit and some additional parameters. Such a mathematical object is called a discrete semi-Markov chain of order 1 (one). Discrete since yit takes on discrete values. Markovian of order one since the probability distribution of yit depends on yi;t 1 but not yi;t i for i 2: Semi-Markov since the probability distribution of yit is not governed by yi;t 1 alone, but may also be in‡uenced by xit . Imposing a Markov order of one is a step that greatly accomodates matematical and computational tractability, but it comes at the cost of less model realism. Example: A cus36

tomer i that displayed a variable consumption pattern of (yi0 ; yi1 ) = (0; 2) might be more likely to enter state 0 again in period t = 2, when compared to customer j with pattern (yj0 ; yj1 ) = (1; 2). To compensate for this phenomena, it is possible to let time-varying dummy-variables enter xit , at the cost of being forced to estimate more parameters. The dummy-variables entering this particular model is decribed in a subsequent section. Customer choice is speci…ed using a random utility model, a standard framework for linking observed customer decisions to customer and choice context characteristics. See Train (2003) for textbook treatment of discrete choice models and random utility. If for customer i at time t 1 we observe state s; that is, yi;t 1 = s; then the utility at time t of choosing state d is uisdt . The utility is speci…ed in eq. 2. The choice made at time t is the choice yielding the maximum utility, eq. 3.

uisdt =

isd

0 sd xit

+

+ "sdt

0

for d 6= s for d = s

(2) (3)

yit = arg max uisdt d

isd represents an individual-speci…c intercept. sd is a vector of coe¢ cients that govern the impact of the observed characteristics vector xit and is of commensurate length. 1 and t; through sd is common across all customers, but speci…c to the state at times t indices s and t. The random element in utility is introduced through the error term "sdt and is assumed to be independently distributed Gumbel/type I extreme value. The …rst element of xit is set to 1 (one), such that the …rst element of sd corresponds to the average intercept. This forces the average of isd to 0 (zero) and is done as a matter of computational convenience. A hierarchical Bayesian approach is adopted in specifying isd . An argument for this approach is presented in section 2.3. isd is given a Normal prior , eq. 4. The dispersion parameter in eq. 4 is also given a prior in the form of an Inverse Gamma distribution in eq. 5. The Inverse Gamma with respect to dispersion is conjugate to the zero-mean Normal distribution, in the sense that the resulting posterior is also a Normal distribution. 2

exp p

( j )=

2

( j ;

)=

2

2

(4)

2 2

2

2 2

2

1

exp

2

where eq. 5 is identical to an Inverse Gamma distribution with shape parameter 2

(5)

;

2

and

2

scale parameter 2 . The hyperparameters, and ; can be interpreted as, respectively, the prior variance and an integer degrees-of-freedom measure of the strength of that belief. Model estimates in subsequent sections are based on 2 = 0:1 and = 1; a vague prior when compared to the number of observations.

3.2

Interpretation

The utility, as speci…ed in eq. 2, is ordinal and can be used for ranking purposes only. Adding or subtracting a constant across a set of choice utilities will not change their ordering. It is therefore common practice to let one choice in the choice set serve as 37

benchmark. In this paper the practice is followed with a slight variation. If yi;t 1 = s then at time t the utility of choosing state s again is set to 0 (zero). An element from sd then tells us, on average, how much utility is gained from changing state away from the current state s to state d in units of the current-state utility per unit of the corresponding element in xit : It is important to notice then, that s1 d is not directly comparable to s2 d for s1 6= s2 ; since the coe¢ cients work on scales normalized with di¤ering standards.

3.3

Related Model Interpretations

The presented model is equivalent to a series of mixed logit models, a standard model in the discrete choice literature. It can also be shown that the model is equivalent to a series of competing risks models observed at discrete intervals and with hazard function 0 hisd (t) = exp isd + sd xi[t] ; de…ning [x] = max fy 2 Zjy xg : The competing risks model are known in the hazard or duration model literature, such as Lancaster (1990) p. 106.

3.4 3.4.1

Likelihood and Posterior Simpli…ed Choice Probabilities

What does the likelihood equations that correspond to eq. 3 look like? To simply matters, we study a time-static case from a choice set of size three. Note that the notation in this section is simpli…ed with respect to variable indices. From and including section 3.4.2, the notation of the previous sections are reintroduced. The utilities for choice 0, 1 and 2 are: u0 u1 u2 y

=0 = v 1 + "1 = v 2 + "2 = arg max ui i

The Gumbel/Extreme Value Type I distribution is de…ned by cumulative distribution x x function F (x) = e e and corresponding density function f (x) = e x e e : Note that "1 and "2 are independent. The …rst case is that of y = 0: Pr (y = 0) = Pr (u0 > u1 [ u0 > u2 ) = Pr ( v1 > "1 [ v2 > "2 ) Z v1 Z v2 f ("1 ; "2 ) d"1 d"2 = 1

1

= F ( v1 ) F ( v2 ) =e

ev1 ev2

The second and third case is that of y = 1 or y = 2:

38

Pr (y = 1) = Pr (u1 > u0 [ u1 > u2 ) = Pr ("1 > v1 [ v1 v2 + "1 > "2 ) Z 1 Z v1 v2 +"1 = f ("1 ; "2 ) d"1 d"2 v1 1 Z 1 = f ("1 ) F (v1 v2 + "1 ) d"1 v1 Z 1 v1 +v2 e "1 ) d"1 = e "1 e (1+e v1

h 1 e (1+e 1 + e v1 +v2 ev1 v = v1 1 e e1 v 2 e +e =

v1 +v2

)e

"1

i1

v1

ev2

Interchange ev1 for ev2 and we have the solution for y = 2: The generalization to larger choice sets is easy but tedious. 3.4.2

Likelihood and Posterior

Let = f sd gs2 ;d2 ns denote the set of common parameters, and i = f isd gs2 ;d2 ns ; the set of individual level coe¢ cients for customer i. Let = ff i gi2N g represent the set of individual level model parameters for all customers. Let = f sd gs2 ;d2 ns specify the set of parameters controlling the individual level heterogeneity population distribution. Using eq. 2 and a generalization of the equations in the prior section, in particular exchanging vi for hisdt , we get eq. 6. Pr (yit jyi;t 1 ; xit ; i ; Pr (d = yit js = yi;t 1 ; xit ; i ;

)= )=

(

(6) 1 e

P

e

j6=s

where hisdt = exp (

isd

P

j6=s

hisjt

hisjt

+

0 sd xit ) ;

P hisdt j6=s hisjt

for d 6= s

for d = s

;

i 2 f1; :::; N g; t 2 f1; ::; Ti g

We can now state the model in full. Given a sample of customers i 2 f1; :::; N g , for all i, collect the corresponding observed behavior fyit gT0 i and characteristics fxit gT0 i in a set D. Collect all parameters in = f ; ; g A customer might never touch a speci…c state and untouched states does not a¤ect the model likelihood. Let Gi denote the set of states touched by customer i; so Gi = fs 2 j9t : yit = sg . The model’s joint posterior p is given in eq. 7. For the distribution parameters, and ; are …xed, as mentioned in section 3.1. ! Ti N Y Y p ( jD) = Pr (yit jyi;t 1 ; xit ; i ; ) (7) i=1

t=1

N Y Y

i=1 s2Gi d2 ns

(

isd j

39

sd )

Y

s2Gi d2 ns

2 sd j

;

In many classical applications we would now to some degree think of and as nuisance parameter vectors and strive to integrate them out. That approach would require the evaluation of complicated integrals and leave us with little knowledge of the individual . The simulation approach presented in the next section frees us from that task, i in while giving us the opportunity to learn about a :

3.5 3.5.1

Metropolis-Hastings Simulation A Prelude - The Accept-Reject Method

The popular Metropolis-Hastings algorithm (MH) is used in this paper for estimation of model parameters. The focus is concentrated on a particular version of the MH algorithm that is considered useful for the problem at hand. See Chib & Greenberg (1995), Albert & Chib (1996) or the textbooks Robert & Casella (1999) and Chen et al. (2000) for an introduction. A detailed exposition of the theory behind MCMC methods in general and the Metropolis-Hastings algorithm in particular is outside the scope of this paper, while enough information will be given for the practitioner to …nd more information and to understand some of the intuition and important issues involved in this type of simulation. Given a stochastic variable with density f (z) R we might wish to infer its mean, or other statistics, g (z) : The usual approach is to solve g (z) f (z) dz analytically. Sometimes this may prove intractable and we will want another approach. An Pnalternative is to simulate 1 realizations z1 ; :::; zn from f (z) and then simply calculate n i=1 g (zi ) for large enough n: The question is now how to generate random draws from the density f (z). Using a computer, it is easy to generate pseudo-random draws from the Uniform distribution. Draws from many standard distributions, such as the Normal distribution, are easy to generate as a transformation of one or more Uniform draws. If no transformation is known for f (z), it is possible to use the Accept-Reject method (AR) as described in (Robert & Casella 1999, p. 49). The idea is to …nd a density f2 (z) that is easier to simulate than f (z) ; but with identical support; then sample from f2 (z), while rejecting some draws to arrive at the original f (z) : Example: If f (z) where de…ned on R the Normal distribution could be used. Pick a value M such that 8z : f (z) M f2 (z) and follow the algorithm in (8). What is the intuition? If f (z) and f2 (z) where identical, M could be set to 1 and all proposals would be accepted. What about the case of f (z) 6= f2 (z)? For a …xed M and f2 (z) ; it is easy to see that values that are likely under f (z) are more likely to be accepted. Values that are likely under f (z) and unlikely under f2 (z) are even more likely to be accepted. It can be proven that the procedure generates values from f (z) : 1. 2. 3. 3.5.2

Generate Z2 f2 (z) ; U Uniform (0; 1) f (Z2 ) Accept Z = Z2 if U M f2 (Z2 ) Goto 1

(8)

The Metropolis-Hastings Algorithm

The Metropolis-Hastings algorithm builds on the AR method, with some useful twists. To study a joint density f (z1 ; z2 ) ; we can generate a sample from it using only tractable random number generators and two functions that are proportional to f (z1 ; z2 ) in z1 and z2 ; respectively.

40

(t)

Metropolis-Hastings Algorithm Example Let zi denote the t’th value in the generated sequence of draws. The parentheses are included to ensure that the time index is not mistaken for the operation of raising to a power. Let z^1 ; z^2 denote a starting values, preferably not too far from the range of f (z1 ; z2 ) : 1.

Set t = 1 and initialize

2.

Draw w1

3.

If u1
t

(14)

More on Estimation

20.000 iterations were spent calibrating the control parameters of the Metropolis-Hastings algorithm, another 20.000 on burn-in. One variable in each state in each submodel is eliminated using a general-to-speci…c principle. Then a new cycle of elimination started using 2000 iterations for tune-in and 2000 for parameter estimates. The process was repeated until no insigni…cant variables remained at the 10% level.

5

Results

How does one judge the predictive qualitiy of a model? One measure is that of model lift, where the model’s predictive ability is compared to that of a completely random prediction. Model lift is now explained. Given a sample of size n binary customer types yi 2 f0; 1g; i 2 f0; :::; ng, suppose the objective is to identify likely type-0 customers. We know related characteristics zi and our model scores likely type-0 customers using a function f (zi ) : If the sample contains n0 type-0 customers, we extract n0 customers from the full sample, according to a ranking generated by f (zi ). Let ncorrect designate the number of correct predictions. The lift is the ratio of the proportion of correct predictions in the model, relative to the expected correct predictions by a completely random model. lift =

nc o rre c t n0 n0 n

Lift measures for the model’s ability to predict churn and predict if a customer buys ADSL are reported in section 5.2 and 5.3. Section 5.2 gives a detailed example of lift calculation. Note lift is sometimes calculated using a base di¤erent from n0 ; for instance, lift could be calculated by picking out 10% of the sample and then measuring the ratio of correctness. That approach is not used here. Parameter estimates for a reduced model, reduced in the sense that insigni…cant variable coe¢ cients have been eliminated, are reported in appendix B.1, p. 64. Statistics on the individual level heterogeneity parameters are not reported, as they are too numerous, but their empirical distribution are shown in appendix B.2, p. 66. In line with the misspeci…cation methodology recommended by Allenby & Rossi (1999), the empirical distribution is graphed along with the prior distribution, eq. 4, with the 2 posterior average values for sd plugged in. A Shapiro and Wilk’s test (SW) for normality is carried out and its p-value displayed in each graph, p (SW). It turns out the unobserved 51

heterogeneity parameter for the continuation model is nearly rejected on a 1% level, questioning the choice of prior for this particular parameter. The empirical distribution seems left-skewed and this is con…rmed by a D’Agostino test for skewness, so a better prior should be introduced to accommodate this. A parametric alternative could be, as an example, a Skew-Normal distribution.

5.1

Prediction

It is challenging to interpret coe¢ cient estimates in this complex model. One-period ahead transition sensitivities to variables can be understood directly from coe¢ cient signs, but multiple periods complicate matters by allowing variables to have a direct and indirect e¤ect on the outcome of interest. A further complication is variables exhibiting some degree of multicolinearity, unobserved heterogeneity and states pushing customers into new states with varying degrees of inherent force. In the following sections the outcome of interest is the probability pi of a choice component variable equaling s at least once within n periods of time. This is investigated by simulating the state path n periods ahead for each agent, observing if s occurs. Model parameters are simulated simultaneously using the Metropolis-Hastings algorithm, so statistics based on the simulation is not conditional on average estimates, but on the parameters’full distribution. (k) Let sît denote the simulated state value in period t for customer i in simulation run numer k. Given K simulation runs, the estimated probability of customer i touching state s is given in eq. 15. o n (k) kj9j : sî;t+j = s (15) pî = K Eq. 15 is the basis for lift calculations and for sensitivity analysis. Sensitivity analysis, with respect to sociodemographic variables, is carried out by estimating the average pi conditional on one variable xj at a time. A local regression method, loc…t in the statistical system R, does that for us. To get a rough idea of the sensitivities, the conditional average is noted at the …rst, second and third quantile of each variable.

5.2

Exit Behavior

An out-of-sample prediction of customer termination is carried out in this section. Firstly, the predicted sample did not contribute to the likelihood of any aggregate parameter values. Secondly, individual heterogeneity parameter estimates are based on a truncated version of the individual event history. The agent event history is modi…ed, for an n period ahead prediction, by excluding the remaining n observations from estimation. Individuals with a transaction history length below size n are excluded from comparisons. The list results are shown in table 3 and sensitivity analysis conditional on sociodemographic variables are presented in appendix B.3, p. 67. Table 3 describe di¤erent horizons at which prediction is carried out. In the 1-step ahead predition, the …rst line of the table, n = 3602: Among those, it turns out that 92 churn. The model then picks out 92 it believes are most at risk of churning. It turns it, that among those 92; the model correctly identi…es 15: So the correctness ratio is 15=92 0:163: If we had picked out 92 from among the 3602 at random, on average, we would get a correctness ratio of 92=3602 0:026: This implies that the lift of the model is 0:163=0:026 6:4: 52

horizon 1 2 3 6 12

total actual predicted ratio correct ratio random 3602 92 15 0.163 0.026 3494 144 25 0.174 0.041 3402 197 35 0.178 0.058 2990 293 52 0.178 0.098 2233 424 143 0.337 0.190

lift 6.4 4.2 3.1 1.8 1.8

Table 3: Lift for out-of-sample predicted churn

Figure 6: Predicted 12 months ahead churn conditional on interaction variables. Point: Proportion estimate. Black line: 95% con…dence bands.

The lift values can be compared to the critical values in table 5, p. 5. From that it is clear that the lift values are signi…cant at all horizons. In general, lower risk of churn is associated with areas of higher income, higher proportions of self-employed, high-ranking job positions, educated people living in households as couples or as singles with two or more children. Higher risk of churn is associated with areas of lower income, higher proportions of people with no job market a¢ liation, little or no formal education, and living in single households. Among the sociodemographic groups, age composition seems to carry little signi…cance. Two e¤ects are highly non-linear. The proportion of single households is associated with higher risk for proportions below the median of 0.42. For values above the median, the association is reversed, though less steep. A similar relation holds for couple households with two or more children. Customers with low and high telephone consumption are more likely to churn than medium level consumers, low state consumers being much more likely to churn. Product choice does not seem to induce signi…cant di¤erences in churn, even though possession of ADSL or ISDN lowers risk directly: In the continuation component model parameter estimated in appendix B.1, p. 64 the internet10 coe¢ cient is signi…cant and negative. It is seemingly paradoxical that an e¤ect is directly signi…cant, but results in a prediction that says otherwise. Studying the consumption intensity component model, it turns out that ISDN and ADSL plays an important role. Leaving the low state is 53

horizon 1 2 3 6 12

total actual predicted ratio correct ratio random 2602 11 0 0.000 0.004 2541 26 0 0.000 0.010 2513 55 2 0.036 0.022 2273 86 10 0.116 0.038 1788 125 18 0.144 0.070

lift 0 0 1.7 3.1 2.1

Table 4: Lift for out-of-sample predicted PSTN to ADSL transitions signi…cantly less likely if an internet product is present. Entering state low from state medium is signi…cantly more likely in the presence of an internet product. Entering state low from state high is more likely for ADSL customers. Put di¤erently, assume we are in period t. A customer with consumption level in state medium and product choice in state ISDN/ADSL is less likely to churn than other customers in period t + 1. He is, however, more likely to move his consumption level to state low or high, thereby increasing the risk of churn in period t + 2. Considering that state low is much riskier than state medium and high, the indirect higher risk absorbs the direct e¤ect.

5.3

Who Buys ADSL?

An out-of-sample prediction of ADSL-buying behavior, that is, it is studied who among the non-ADSL customers eventually buy into ADSL. The lift results are shown in table 4 and a sensitivity analysis with respect to sociodemographics variables are shown in appendix B.4, B.4. The model does not pick up until above and beyond a three-month horizon. This has to do with the very small fraction of customers that buy into ADSL and noise surrounding the lift measure. Critical values of the lift measure are shown in table 5 in the next section. From that it is clear that the lift is signi…cant for the 6- and 12-months horizons only. It is also clear that measuring lift on the 1-month horizon will carry little information. Few sociodemographic characteristics have a direct signi…cant impact on the probability of transitions to the ADSL state. Average houshold income, inch, and the proportion of teenagers in a given area, teen, are the only two directly signi…cant sociodemographic variables as seen in appendix B.1, p. 64, for the product choice model. For ISDN customers, only income is directly signi…cant. However, due to the high correlation in sociodemographic variables, there is an observed signi…cance in several variables correlated with income and teenagers. Conditional on PSTN customers, it is clear from appendix B.4, that higher proportions of children, self-employed, couple households and unclassi…ed households are positively related to PSTN to ADSL transitions. Higher proportions of unemployed, people seeking education, uneducated and single housholds without children are negatively related. As seen in …gure 7, PSTN customers with high consumption are more likely to get ADSL, while medium and low consumption types display little di¤erence. ISDN customers are much more likely to get ADSL. No time-dummies survive variable elimination, so duration of stay seems to carry extra information.

5.4

Lift - Critical Values

The lift measure can be interpreted as a statistical test for randomness of a selection mechanism. Given the sample size n and the number of successes in the sample n0 ; let n 54

Figure 7: Predicted 12 months ahead probability of transition to ADSL conditional on interaction variables. Point: Proportion estimate. Black line: 95% con…dence bands. Churn - Critical Lift Values Lift Percentile Horizon 0.9 0.95 1 1.702 2.128 2 1.516 1.685 3 1.403 1.490 6 1.219 1.289 12 1.118 1.155

ADSL - Critical Lift Values Lift Percentile Horizon 0.9 0.95 1 0.000 0.000 2 3.759 3.759 3 2.492 2.492 6 1.844 1.844 12 1.373 1.488

0.975 2.553 1.853 1.578 1.323 1.180

0.975 21.504 7.518 3.323 2.151 1.602

Table 5: Critical value of lift measure at total observations and successes corresponding to table 3 and 4 denote the number of successes we get when sampling n0 items from among the n items. Under H0 the sampling mechanism is completely random, so n is Hypergemetrically distributed with probability function given in eq. 16. n0 z

f (z) =

n n0 n0 z N n0

(16)

The lift, given n is l=

n n0 n0 n

:

Given the critical values of n under H0 we can calculate the critical values of l: The calculation is carried out and shown in table 5. Example of use: The lift table for exit behavior prediction, table 3, shows at lift of 6:4 at a horizon of 1 month. To reject the H0 of random sampling, the lift should be higher than 1:702; 2:128 or 2:553, depending on the signi…cance level (one-sided 10%, 5% or 2.5%). The critical values for that particulat entry were calculated using n = 3602 anf n0 = 92:

55

Horizon Lift 1 Lift 2 1 6.4 4.9 2 4.2 3.4 3 3.1 2.8 6 1.8 1.9 12 1.8 1.8

Ratio 0.76 0.81 0.91 1.03 0.99

Table 6: Lift on 12 months ahead churn predictions. "Lift 1": Full model. "Lift 2": A model excluding individual unobserved heterogeneity $/month Product Choice PSTN 20 ISDN 30 ADSL 40 Consumption Intensity LO 0 MED 10 HIGH 60

Table 7: Revenue per month derived from choice component states

5.5

Importance of Individual Level Heterogeneity

A second run of the model is done excluding individual unobserved heterogeneity, to see if the complication adds anything to the predictive ability of the model. Table 6 reports lift measures, given the alternative model. It is clear that individual unobserved heterogeneity improves the lift of the model, but the relative gain decreases as the forecast horizon grows. The gain disappears when moving beyond a forecast horizon of three months. The method used to compare out of sample lift measures is bound to disfavor a model including individual unobserved heterogeneity. As the forecast horizon grows, the individual event history shrinks in line with the out of sample principle. The unobserved heterogeneity estimates being very dependent on individual event histories will thus gradually loose valuable information. The extent to which this distortion in‡uences the comparison is unknown, but the method makes it unlikely to overestimate the value of unobserved heterogeneity as a modelling element.

5.6

Customer Value

We will now calculate the customer lifetime value (CLV), the present value of customer revenue streams, …ve years into the future, applying a 10% per year discount rate. Each state is assigned a value, not exactly equal to the true value due to secrecy restrictions, but enough for relative values to stay roughly the same. Table 7 reports on the exact values. The individual values are not reported, but a sensitivity analysis is carried out conditioning on all sociodemographic variables, on at a time. The results are shown in appendix B.5, p. 70. When studying the …gure it can be concluded that it forms a pattern similar to that of churning customers. Well-o¤, better-educated customers in couple households are more valuable, as these customers also display lower churn levels. However, many variables associated with signi…cant changes in churn are not signi…cant in association with the 5-year value prediction. Areas with 56

above-average unemployment carry higher churn risk, but not a signi…cantly lower value. Similarly, groups with above-average rates of single households with one child usually go with higher churn rates, but not signi…cantly lower customer value. 5.6.1

Example: Marketing Decision

Imagine a simpli…ed budget decision scenario regarding a marketing campaign designed to win new customers. The revenue from a given campaign is a function of response-rate per person r, number of potential customer touched n, per-person cost of the campaign c and customer present value v. = rnv

cn

v is a stochastic variable, while c and n can be treated as constants. We might have no knowledge of r, but we can …nd a lower bound that must be met. E( )>0 rmin >

c E (v)

Suppose c = $400: Using numbers from the …gure in appendix B.5 we could derive di¤erent bounds for di¤erent sociodemographic groups. Using only household income inch we exchange the unconditional expectation E (v) for a conditional E (vjinch). The expected 5-year present value of a region earning $30,000/year, the …rst quantile, is $825, while an area earning around $42,000/year, the third quantile, has a present value of $1050. The rmin for the former region is then 48.5%, and 38.1% for the latter. A campaign expected to yield a response rate above 38.1%, but below 48.5%, should exclude areas where households on average earn less than $30.000/year. Compare the calculation above to one based on churn rates alone. The expected present value of the high-income group is almost one-third higher than that of the lowincome group. How does this compare to their termination rates? Turning to appendix 5.2, the …rst table, we see that the low-income group has yearly expected churn rates of approximately 17%, while the high-income group has an expected churn rate of 15%. This translates to a 5-year survival rate of 39.4% and 44.4%, respectively. The high-income group is only 1/8 more "valuable" using this measure. Clearly, not just termination rates, but also product mix and consumption intensity plays a role here. When studying the …gure in appendix B.4, it is seen that high-income households, high inch, are more likely to buy into ADSL. Studying the parameter estimates in appendix B.1, the third table, it it seen that the direct e¤ect of high inch is to increase the probability of moving from medium to high consumption intensity.

5.7

Limitations and Future Research

A key limitation in this study has no doubt been the nature of data. Prediction levels could probably go higher if data was made available on a …ner time-scale. A second issue is the variation in the product choice component; transitions are in general quite rare here. A third venue would be data on competitors o¤erings in terms of price and competitive pressure, measured by ad spending week by week. A fourth venue would the inclusion of marketing treatment variables on an individual level. Many companies are in the process of building up capabilities of that nature. 57

Moving on to model formulation, an alternative to discretizing the continous consumption variable would be to extend the statespace and introduce a parametric time-series model of variable consumption, letting relevant parameters depend on state. Consumption revenue is more naturally modeled by a continous distribution, that would enable the analyst to learn more about the potentially very valuable upper tail. The empirical distribution of minute usage revenue, as has been shown earlier, is highly skewed. In‡ow of new customers is currently ignored, but would be a useful addition for a company to get a more complete picture of e¤ects of marketing actions. This model accounts for existing customers only. The presence of large datasets and the lack of knowledge about functional form of the transition probability and observable heterogeneity-relation makes nonparametric approaches attractive. Few papers exist dealing with nonparametric models and state space representations.

5.8

Conclusion

In this paper we have successfully speci…ed and estimated a model accounting for individual level heterogeneity, even though the number of parameters in the model was very high. This should demonstrate to businesses and researchers alike, that it is possible to unify relatively complex models with a relatively large number of observations. Applied marketing researchers and business analysts should …nd the detailed exposition and application of the Metropolis-Hastings algorithm interesting in its own right and animate them to harness its power for their own use. The ability of the model to pick out and pro…le likely candidates that might terminate their relationship with the company, was signi…cantly above that of a random selection procedure. The model performance was not much higher than that of a model without coe¢ cients that vary on the individual level. So it seems that with respect to churn, in this speci…c data setup, much can be gained simply from modeling average e¤ects, ignoring individual level heterogeneity - a factor that would simplify the model setup considerably. This is a bit disappointing, since the main focus of the model-building e¤ort has been put into allowing for individual level heterogeneity. On the other hand, the model did perform between 10% to 30% better on a horizon on and below 3 months; secondly, it is not possible to know in advance if the individual level matters; thirdly, allowing the intercept to vary on the individual level solely might not be the only relevant focus. The performance of the model with respect to predicting future upsale of a speci…c high-value product, broadband internet, seemed to pick up on longer horizons; but few variables explained variation in preferences towards ADSL. The sociodemographic variables might simply not hold information that can help the analyst, or the model’s functional form were not su¢ ciently ‡exible. The multi-product, multi-intensity model facilitated the examination of rich questions in a uni…ed framework. A question asked in the previous section is if high-income customers are worth more. The answer was yes, beacuse their churn rates are lower, they are more likely to choose high-value products and their consumption intensity is likely to be higher. This is an example of an actionable insight that many companies currently does not have. A comprehensive model of customer value is a natural ingredient in the execution of a range of management challenges. A gold standard has not yet emerged, so the …eld of marketing will continue to search for better metrics of customer value and customer behavior. The …eld is hampered by the speci…city of each company’s situation and the 58

typical problems laid out in section 2.1. Any candidate will probably be tailored to the speci…c business segment, use very ‡exible and computationally tractable statistical methods and use data from multiple sources.

59

A

Descriptive Statistics

A.1

Sociodemographic Variables

Title Description Age Groups

Min

Q1

Q2

Q3

Max

child

Proportion aged =20

0.00 0.09

0.04 0.71

0.08 0.79

0.13 0.87

0.90 1.00

Proportion of households earning >$100k/yr Average household income, normed $100k

0.00 0.08

0.07 0.30

0.16 0.35

0.31 0.42

0.86 1.29

Income inchi inch

Jobmarket Status jself jhi jmed junempl

Proportion self-employed Proportion managers and workers on the highest level Proportion being workers on average pay and lower Proportion unemployed

0.00 0.00 0.00 0.00

0.02 0.05 0.35 0.00

0.05 0.10 0.42 0.01

0.10 0.16 0.49 0.02

0.46 0.62 0.73 0.46

jeduseek jpen

Proportion currently seeking an education Proportion outside the jobmarket

0.00 0.00

0.17 0.04

0.24 0.08

0.33 0.13

0.99 0.57

Proportion with no formal education Proportion with the equivalent of a college degree Proportion with a commercial background Proportion with a short to medium education Proportion with the equivalent of a university degree Proportion with an unknown educational background

0.01 0.00 0.02 0.01 0.00 0.01

0.14 0.01 0.26 0.12 0.01 0.07

0.23 0.04 0.33 0.17 0.04 0.10

0.31 0.09 0.40 0.23 0.08 0.14

0.57 0.64 0.65 0.68 0.60 0.76

Education enone ecoll ecomm eshort elong eunkn

Household Type hs0 hs1 hs2p hc0 hc1

Proportion of singles with no children Proportion of singles with one child Proportion of singles with two children or more Proportion of couples with no children Proportion of couples with one child

0.01 0.00 0.00 0.01 0.00

0.24 0.01 0.01 0.24 0.03

0.42 0.01 0.01 0.31 0.06

0.56 0.03 0.02 0.40 0.08

0.98 0.20 0.25 0.65 0.23

hc2p hother

Proportion of couples with two children or more Proportion of unclassified households

0.00 0.00

0.04 0.01

0.10 0.01

0.20 0.02

0.51 0.16

Overview of sociodemographic variables. Min: Minimum value. Q1: 25% percentile. Q2: Median. Q3: 75% percentile. Max: Maximum value. Names in italics indicate exclusion from estimation due to perfect multicollinearity.

60

0.11

0.29

0.2

0.23

0.1

0.1

inch

jself

Jhi

-0.2

61

0.17

hother

0.3

0.42

0.1

hc0

0.54

0.36

hs2p

hc1

0.19

hc2p

0.15

-0.49

hs0

hs1

-0.1

0.23

0.54

0.32

0.09

-0.48

-0.13

0.05

-0.12

eunkn

0.01

elong

0.1

eshort

0.3

-0.39

-0.3

0.18

ecoll

0.12

-0.12

-0.1

-0.07

0.27

0.16

0.28

-0.82

0

ecomm

enone

-0.01

jeduseek

jpen

-0.12

junempl

jmed

-0.02

-0.83

adult

inchi

1

0.36

0.36

teen

teen

1

child

child

-0.24

-0.66

-0.44

-0.15

-0.41

-0.17

0.59

0.15

0.03

-0.07

-0.29

0.41

-0.07

0.08

0.19

0.12

-0.13

-0.05

-0.3

-0.22

-0.35

1

-0.82

-0.83

adult

0.28

0.29

0.07

0.69

0.58

0.65

-0.23

-0.36

-0.74

-0.35

0.48

0.6

0.34

-0.37

-0.58

-0.55

-0.37

-0.3

0.07

0.66

0.53

0.87

1

-0.35

inchi

0.66

0.44

1

0.87

-0.22

0.16

0.2

0.06

0.46

0.43

0.46

-0.22

-0.28

-0.51

-0.28

0.58

0.54

0.17

-0.25

-0.57

-0.44

-0.32

-0.3

-0.02

inch

-0.41

-0.31

-0.24

-0.01

0.24

1

0.44

0.53

-0.3

0.27

0.23

0.12

0.58

0.4

0.56

-0.21

-0.34

-0.61

-0.24

0.16

0.3

0.3

-0.33

-0.22

jself

jhi

-0.04

0.31

0.4

0.43

-0.26

-0.27

-0.39

-0.31

0.8

0.7

-0.14

0.08

-0.74

-0.3

-0.5

-0.08

-0.1

1

0.24

0.66

0.66

-0.05

-0.02

0.1 0.11

0.1

0.03

0.19

0.23

0.26

0.01

0

-0.29

-0.51

-0.27

0.14

0.49

0.06

-0.08

-0.13

-0.62

-0.03

1

-0.1

-0.01

-0.02

0.07

-0.13

jmed

-0.02

-0.26

-0.21

-0.26

0.08

0.08

0.3

0.01

-0.01

-0.08

-0.35

0.57

0.01

0.4

-0.21

1

-0.03

-0.08

-0.24

-0.3

-0.3

0.12

-0.07

-0.12

junempl

-0.06

-0.32

-0.41

-0.39

0.04

0.08

0.44

0.58

-0.32

-0.5

-0.11

-0.29

0.47

-0.09

1

-0.21

-0.62

-0.5

-0.31

-0.32

-0.37

0.19

-0.1

-0.2

jeduseek

-0.3

-0.41

-0.44

-0.55

0.08

-0.12

-0.01

0.04

-0.41

-0.3

-0.56

0.38

0.43

0.46

0.19

-0.09

-0.34

-0.47

0.45

0.31

1

-0.09

0.4

-0.13

jpen

0

0.07

-0.23

-0.33

-0.44

0.33

0.33

0.32

0.27

-0.67

-0.8

-0.03

-0.23

1

0.31

0.47

0.01

-0.08

-0.74

-0.22

-0.57

-0.58

-0.07

0.12

enone

-0.3

-0.16

-0.52

-0.29

-0.28

-0.08

0.05

0.49

-0.09

0.16

0.06

-0.55

1

-0.23

0.45

-0.29

0.57

0.06

0.08

-0.33

-0.25

-0.37

0.41

-0.39

ecoll

0.08

0.46

0.32

0.47

-0.06

-0.15

-0.53

-0.46

-0.39

0.04

1

-0.55

-0.03

-0.47

-0.11

-0.35

0.49

-0.14

0.3

0.17

0.34

-0.29

0.3

0.18

ecomm

0.1

-0.04

0.34

0.39

0.5

-0.27

-0.28

-0.44

-0.49

0.53

1

0.04

0.06

-0.8

-0.34

-0.5

-0.08

0.14

0.7

0.3

0.54

0.6

-0.07

0.01

eshort

-0.02

0.13

0.23

0.2

-0.17

-0.16

-0.17

-0.16

1

0.53

-0.39

0.16

-0.67

-0.09

-0.32

-0.01

-0.27

0.8

0.16

0.58

0.48

0.03

-0.1

0.05

elong

-0.01

-0.27

-0.3

-0.46

0.13

0.11

0.42

1

-0.16

-0.49

-0.46

-0.09

0.27

0.19

0.58

0.01

-0.51

-0.31

-0.24

-0.28

-0.35

0.15

-0.13

-0.12

eunkn

hs0

-0.22

-0.87

-0.7

-0.77

-0.06

0.15

1

0.42

-0.17

-0.44

-0.53

0.49

0.32

0.46

0.44

0.3

-0.29

-0.39

-0.61

-0.51

-0.74

0.59

-0.48

-0.49

hs1

0.14

-0.21

-0.1

-0.45

0.55

1

0.15

0.11

-0.16

-0.28

-0.15

0.05

0.33

0.43

0.08

0.08

0

-0.27

-0.34

-0.28

-0.36

-0.17

0.09

0.19 0.32

0.36

0.2

0.07

0.02

-0.36

1

0.55

-0.06

0.13

-0.17

-0.27

-0.06

-0.08

0.33

0.38

0.04

0.08

0.01

-0.26

-0.21

-0.22

-0.23

-0.41

hs2p

hc0

-0.02

0.49

0.41

1

-0.36

-0.45

-0.77

-0.46

0.2

0.5

0.47

-0.28

-0.44

-0.56

-0.39

-0.26

0.26

0.43

0.56

0.46

0.65

-0.15

0.15

0.1

hc1

0.13

0.58

1

0.41

0.02

-0.1

-0.7

-0.3

0.23

0.39

0.32

-0.29

-0.33

-0.3

-0.41

-0.21

0.23

0.4

0.4

0.43

0.58

-0.44

0.3

0.42

0.54

0.54

0.16

1

0.58

0.49

0.07

-0.21

-0.87

-0.27

0.13

0.34

0.46

-0.52

-0.23

-0.41

-0.32

-0.26

0.19

0.31

0.58

0.46

0.69

-0.66

hc2p

1

0.16

0.13

-0.02

0.2

0.14

-0.22

-0.01

-0.02

-0.04

0.08

-0.16

0.07

0.04

-0.06

-0.02

0.03

-0.04

0.12

0.06

0.07

-0.24

0.23

0.17

hother

A.2 Correlation Table

A.3

Histograms

Histograms for sociodemographic variables. A vertical line marks the median.

62

A.4

Empirical Exit Probabilities

Empirical exit probability from a given state as a function of duration of stay. X-axis: Time in months. Grey lines: 95% con…dence bands.

63

B

Model Statistics

B.1

Parameter Estimates

Parameter estimates are presented below. Table headings should be interpreted as: Avg: average, SD: standard deviation of parameter; Sig: Signi…cance i.e. a test for zero being outside the con…dence interval (“**”: Yes, outside [2.5%, 97.5%], “*”: outside [5%, 95%], “-“: Inside, not signi…cant); Qx: x-percentile. The parameter names in the title column are to be interpreted in a special way. sd is represented by "sigma[s][d]", where s and d signify source and destination state, for example sigma10 in the …rst table represents 10 in the continuation component model. The …rst element of sd , a constant, is named "constant[s][d]". Coe¢ cients for timedummy variables, eq. 14, st ;t for transition s to d are named "s[t ]_[s][d]". Coe¢ cients for sociodemographic variables are named as in appendix A.1, with a post…x signifying the source state s and destionation state d: "[name][s][d]". Title sigma10

Avg

Continuation Component (0, churn),(1, stay) SD Sig Q0.5 Q0.025 Q0.05

Q0.95

Q0.975

0.4985

0.0780 **

0.4898

0.3854

0.3968

0.6542

0.6898

-3.8968

0.2577 **

-3.8625

-4.4438

-4.3596

-3.5131

-3.4308

s12_10

0.3345

0.0817 **

0.3338

0.1872

0.2091

0.4720

0.4931

vlo10

1.4436

0.0898 **

1.4417

1.2771

1.2956

1.5915

1.6226

vhi10

0.6948

0.0874 **

0.6954

0.5232

0.5449

0.8370

0.8584

-0.2849

0.0940 **

-0.2842

-0.4647

-0.4389

-0.1429

-0.1074

Teen10

1.2677

0.5676 **

1.2333

0.0934

0.357

2.2022

2.4470

jmed10

0.8048

0.3645 **

0.8014

0.1629

0.237

1.4075

1.5219

-1.5456

0.5063 **

-1.5595

-2.6212

-2.4313

-0.7822

-0.5926

constant10

internet10

ecomm10 eshort10

-2.0976

0.5704 **

-2.0853

-3.1628

-3.0467

-1.2135

-1.0323

eunkn10

-1.5766

0.6570 **

-1.6201

-2.7138

-2.5856

-0.5049

-0.2644

inch10

-0.9122

0.4459 **

-0.9275

-1.7056

-1.5824

-0.1318

-0.0389

Title sigma12 constant12

Product Choice Component (1,PSTN),(2,ISDN),(3,ADSL) Avg SD Sig Q0.5 Q0.025 Q0.05 Q0.95

Q0.975

3.8453

0.4748 **

3.7995

3.0861

3.1515

4.62

4.7297

-14.7535

1.5472 **

-14.8811

-17.4563

-17.2028

-12.0182

-11.8988

teen12

6.7159

3.7733 *

6.5287

-0.1194

0.8553

13.1094

13.729

inch12

6.0041

1.8473 **

6.1692

2.2538

2.8238

8.7652

9.0687

sigma13

1.9199

0.178 **

1.9016

1.5936

1.6592

2.2424

2.2962

-7.4215

0.37 **

-7.3776

-8.1827

-8.059

-6.926

-6.8228

constant13 vhi13

0.628

0.152 **

0.6204

0.3415

0.3828

0.8688

0.9131

inettch13

0.9399

0.3792 **

0.957

0.1868

0.3062

1.5663

1.6448

inch13

2.1206

0.7945 **

2.2071

0.4225

0.8043

3.2755

3.4872

sigma21 constant21 sigma23 constant23 child23 sigma31 constant31

1.1396

0.2467 **

1.1165

0.7664

0.7869

1.5852

1.6735

-5.6423

0.2764 **

-5.6335

-6.2283

-6.1138

-5.2292

-5.1775

1.915

0.2735 **

1.9186

1.4258

1.4725

2.3588

2.4193

-5.5955

0.397 **

-5.5947

-6.3822

-6.2781

-4.9223

-4.8316

6.3864

2.1342 **

6.3848

2.4604

3.1741

10.1244

10.8509

1.1212

0.2905 **

1.1414

0.516

0.6136

1.5706

1.7193

-3.5775

0.561 **

-3.5224

-4.7046

-4.5593

-2.71

-2.5835

vlo31

-0.931

0.4582 **

-0.9063

-1.904

-1.7511

-0.2687

-0.152

inch31

-4.7501

1.4793 **

-4.8405

-7.2523

-7.0198

-2.2055

-1.8201

64

Title sigma12

Consumption Intensity Component (1,low),(2,medium),(3,high) Avg SD Sig Q0.5 Q0.025 Q0.05 Q0.95

Q0.975

0.7328

0.084 **

0.7307

0.5718

0.5924

0.8744

-2.2348

0.1106 **

-2.2315

-2.4565

-2.4222

-2.0559

-2.026

s1_12

0.6454

0.0904 **

0.6421

0.4655

0.4931

0.7922

0.8119

s2_12

0.3223

0.13 **

0.3118

0.0964

0.1254

0.5666

0.5918

s3_12

0.5216

0.1126 **

0.5209

0.3062

0.3428

0.7153

0.749

adsl12

-0.4471

0.1001 **

-0.4442

-0.6561

-0.6233

-0.2758

-0.2555

teen12

-2.4502

0.6524 **

-2.4517

-3.7968

-3.4847

-1.3905

-1.207

1.5936

0.2266 **

1.589

1.1983

1.2336

2.0352

2.1623

constant12

sigma13 constant13

0.9075

-5.3318

0.4549 **

-5.3073

-6.4524

-6.1831

-4.6774

-4.6178

s1_13

1.2705

0.2373 **

1.278

0.7818

0.8538

1.649

1.6967

s2_13

0.9129

0.568 -

0.8612

-0.1874

-0.0046

1.9328

2.0486

s3_13

-0.9464

0.519 **

-0.91

-2.0462

-1.8466

-0.1491

-0.0484

s5_13

1.0918

0.4617 **

1.0596

0.3226

0.403

1.9252

2.1757

isdn13

-0.5515

0.3334 *

-0.5501

-1.2384

-1.1107

-0.0211

0.0843

adsl13

-1.6051

0.3482 **

-1.5997

-2.3487

-2.2089

-1.0631

-0.9724

sigma21

1.7985

0.0848 **

1.8014

1.6411

1.6658

1.939

1.9666

-3.9129

0.2094 **

-3.9045

-4.3519

-4.3077

-3.5695

-3.5299

s1_21

0.3508

0.0686 **

0.3484

0.2143

0.24

0.464

0.4866

s5_21

0.3033

0.0952 **

0.305

0.1094

0.1374

0.4553

0.4819

isdn21

0.4407

0.1576 **

0.4412

0.122

0.1744

0.7063

0.7627

adsl21

0.5046

0.121 **

0.5062

0.2704

0.3022

0.6997

0.7393

inettch21

-1.0653

0.5716 **

-1.038

-2.245

-2.0726

-0.2515

-0.0579

constant21

jmed21

-1.5004

0.4308 **

-1.5224

-2.2811

-2.1707

-0.7062

-0.5881

hs2p21

4.1931

1.8749 **

4.1284

0.6622

1.2982

7.3462

7.7055

sigma23

1.2674

0.0485 **

1.2694

1.1655

1.1826

1.3486

1.3584

-4.2892

0.1457 **

-4.297

-4.5216

-4.5009

-4.0504

-4.0022

constant23 s1_23

0.304

0.057 **

0.3073

0.189

0.2024

0.3932

0.4084

s2_23

0.1447

0.0724 **

0.1376

0.0148

0.0318

0.2653

0.3059

s3_23

0.1728

0.0706 **

0.1781

0.018

0.0488

0.2868

0.2962

s7_23

0.4411

0.0768 **

0.4486

0.3037

0.3147

0.5665

0.6011

isdn23

0.518

0.0992 **

0.5212

0.3301

0.3548

0.6719

0.7144

teen23

2.2014

0.5052 **

2.1994

1.1854

1.3822

3.0568

3.1808

pen23

2.5148

0.5216 **

2.5166

1.4741

1.6618

3.4371

3.5567

inch23

1.0736

0.3007 **

1.1047

0.4781

0.5509

1.5104

1.5548

sigma31

1.3713

0.3232 **

1.319

0.8367

0.885

2.0209

2.1107

-8.4085

0.542 **

-8.3677

-9.6074

-9.3699

-7.566

-7.4272

constant31 s1_31

0.1436

0.1659 -

0.1488

-0.2073

-0.1274

0.4047

0.4697

s5_31

0.4666

0.261 *

0.4595

-0.0668

0.0382

0.8836

0.9601

adsl31

0.4134

0.2208 *

0.4155

-0.0093

0.0608

0.7877

0.8566

jmed31

3.7239

0.8604 **

3.6425

2.1574

2.3982

5.3874

5.5916

edu_seek31

3.2287

0.7342 **

3.1677

1.889

2.1388

4.5172

4.7266

pen31

7.5359

1.2379 **

7.4957

5.1729

5.5743

9.6053

10.0598

sigma32 constant32

0.7255

0.0504 **

0.7224

0.6369

0.6516

0.8158

0.8349

-2.0982

0.1212 **

-2.0969

-2.3274

-2.3002

-1.9083

-1.8893

s1_32

0.3433

0.0484 **

0.345

0.2527

0.2652

0.42

0.4378

s3_32

0.3038

0.067 **

0.303

0.1627

0.1933

0.4133

0.4305

s7_32

0.6097

0.1069 **

0.6114

0.4051

0.4297

0.7916

0.8169

Isdn32

-0.469

0.0844 **

-0.4691

-0.6334

-0.6097

-0.3329

-0.3058 -0.5654

teen32

-1.4361

0.4694 **

-1.4558

-2.3157

-2.178

-0.674

i10032

-0.2773

0.1698 *

-0.2777

-0.602

-0.5644

-0.013

0.037

hc132

2.0332

2.0345

0.4256

0.6922

3.3167

3.7523

0.839 **

65

B.2

Empirical Distribution for Individual Level Coe¢ cients

parameter population distribution. Black line: Kernel estimate of empirical density. Grey line: Normal density. “p(SW)”: p-value of Shapiro and Wilk’s test for normality.

66

B.3

Churn Sensitivity

Percentage probability of churn in twelve months conditional on sociodemographic variables. Black line: Average risk. Dashed lines: 95% con…dence band. Grey vertical lines: First quantile, median and third quantile of sociodemographic variable.

67

Churn sensitivity. Predicted 12 months ahead churn percentage sensitivity to sociodemographic variables. “m-q1”: change in average churn from …rst quantile to median of variable. “q3-m”: change in average churn from median to third quantile of variable. Point: Average change. Grey line surrounding point: 95% con…dence band.

68

B.4

PSTN to ADSL Transition Sensitivity

Transitions from PSTN to ADSL. Sensitivity of 12 months-ahead probability of transition to sociodemographic variables. “m-q1”: change in average churn from …rst quantile to median of variable. “q3-m”: change in average churn from median to third quantile of variable. Point: Average change. Grey line surrounding point: 95% con…dence band.

69

B.5

5-year Conditional CLV Prediction

Estimated 5 years ahead average present value of customers, discounted at 10% per year - conditional on value of sociodemographic variable.

70

References Albert, J. & Chib, S. (1996), ‘Computation in bayesian econometrics: An introduction to markov chain monte carlo’, Advances in Econometrics 11, 3–24. Part A. Allenby, G. M. & Rossi, P. E. (1999), ‘Marketing models of consumer heterogeneity’, Journal of Econometrics 89(1-2), 57–78. Chen, M.-H., Shao, Q.-M. & Ibrahim, J. G. (2000), Monte Carlo Methods in Bayesian Computation (Springer Series in Statistics), Springer. URL: http://www.amazon.fr/exec/obidos/ASIN/0387989358/citeulike04-21 Chib, S. & Greenberg, E. (1995), ‘Understanding the metropolis-hastings algorithm’, The American Statistician 49(4), 327–335. Chib, S., Seetharaman, P. B. & Strijnev, A. (2002), Analysis of multi-category purchase incidence decisions using iri market basket data, in ‘Econometric Models in Marketing’, Vol. 16 of Advances in Econometrics : A Research Annual, pp. 57–92. Donkers, B., Verhoef, P. C. & Jong, M. d. (2003), ‘Predicting customer lifetime value in multi-service industries’. Heckman, J. J. (1981), Heterogeneity and state dependence, in S. Rosen, ed., ‘Studies in Labor Markets’, University of Chicago Press, pp. 91–139. Heckman, J. & Singer, B. (1984), ‘A method for minimizing the impact of distributional assumptions in econometric-models for duration data’, Econometrica 52(2), 271–320. Hsiao, C. (2003), Analysis of Panel Data, Econometric Society Monographs, Cambridge University Press. Jarner, S. F. & Tweedie, R. L. (2003), ‘Necessary conditions for geometric and polynomial ergodicity of random walk-type markov chains’, Bernoulli 9(4), 559–578. URL: http://www.projecteuclid.org/Dienst/UI/1.0/Summarize/euclid.bj/1066223269 Keane, M. P. (1997), ‘Modeling heterogeneity and state dependence in consumer choice behavior’, Journal of Business and Economic Statistics 15(3), 310–327. ¼S Kumar, V., Lemon, K. N. & Parasuramen, A. (2006), ‘Managing customers for value âA¸ an overview and research agenda’, Journal of Service Research 9(2), 87–94. Lancaster, T. (1990), The Econometric Analysis of Transition Data, Cambridge University Press. Manchanda, P., Ansari, A. & Gupta, S. (1999), ‘The "shopping basket": A model for multicategory purchase incidence decisions’, Marketing Science 18(2), 95–114. Miravete, E. J. & Palacios-Huerta, I. (2002), ‘Learning temporal preferences’, CEPR Discussion Papers . Reinartz, W., Thomas, J. S. & Kumar, V. (2003), ‘Allocating acquisition and retention resources to maximize customer pro…tability’. Robert, C. P. & Casella, G. (1999), Monte Carlo Statistical Methods, Springer Texts in Statistics, third edn, Springer. 71

Roberts, G. O., Gelman, A. & Gilks, W. R. (1997), ‘Weak convergence and optimal scaling of random walk metropolis algorithms’, Annals of Applied Probability 7(1), 110–120. Rossi, P. E. & Allenby, G. M. (2000), ‘Statistics and marketing’, Journal of the American Statistical Association 95(450), 635–638. Schmittlein, D. C. & Peterson, R. A. (1994), ‘Customer base analysis - an industrial purchase process application’, Marketing Science 13(1), 41–67. The 2002-2004 Research Priorities (2002), Technical report, Marketing Science Institute. The 2004-2006 Research Priorities (2004), Technical report, Marketing Science Institute. Train, K. E. (2003), Discrete Choice Models with Simulation, Cambridge University Press.

72