Chapter 2
Applications of Multivariate Latent Variable Models in Marketing

Wayne S. DeSarbo, Pennsylvania State University
Wagner A. Kamakura, Duke University
Michel Wedel, University of Groningen and University of Michigan

Paul Green, writing with Carmone and Wachpress (1976), was among the first scholars to introduce latent variable models to marketing by utilizing the CANDECOMP procedure (Carroll 1980) on contingency tables. A year later, Green, Carmone, and Wachpress (1977) introduced logit and log-linear multivariate models to the field of marketing. Green went on to write several key books in the area of multivariate analysis, including Analyzing Multivariate Data (Green 1978), which describes many of these techniques in depth. Latent variable models have since become important tools for the analysis of multivariate data in marketing. In this chapter, we consider some more recent marketing applications of latent variable models based on Green's pioneering work.

A multivariate statistical model specifies the joint distribution of a set of random variables, and it becomes a latent variable model when some of these variables - the latent variables - are unobservable. One can treat both manifest (observed) and latent variables as either continuous or discrete in this context. Bartholomew and Knott (1999) provide a framework for latent variable models, which we extend. The classification is based on the metrics of the manifest and latent variables. Both are considered to be either discrete or continuous or a combination of these, leading to the classification shown in Table 2-1. We will use this classification below to review multivariate latent variable models in marketing.

Von Eye and Clogg (1994) examine latent variable models in several formats. One format arises when latent variables are regarded as causal variables or, more generally, as predictors in some regression-type model, or as moderators of a regression among manifest variables. Another format can be described as an attempt at measurement, when latent variable models serve to define the measurement process. For example, one might be interested in examining how well a given set of manifest variables actually measures an underlying construct. Or, a pertinent research question might be to determine the relationship between manifest variables and latent variables in some model in order to infer essential properties of both sets, such as "dimensionality" (see also Loehlin 1998). These formats will be reflected in our review of models below.

Table 2-1. Classification of Latent Variable Models. Adapted from Bartholomew and Knott (1999)

| Latent Variables | Manifest Variables: Continuous | Manifest Variables: Discrete | Manifest Variables: Continuous/Discrete |
| Continuous | Factor Models, Factor Regression Models, MDS | Generalized Factor Models, Factor Regression Models, MDS, Latent Trait Analysis | Generalized (Mixed Outcome) Factor Models |
| Discrete | Finite Mixture Models, Mixture Regression Models, Latent Profile Analysis | Latent Class Models, Mixture Logit Regression Models | Mixed Outcome Finite Mixture Models |
| Continuous/Discrete | Mixture MDS Models | Mixture MDS Models | |

All of the types of latent models we will discuss are based on assumptions on the distributions of the measured variables. Most of the more commonly used distributions represent specific members of the exponential family of distributions. This family is a general set that encompasses both discrete and continuous distributions. The common properties of these distributions enable them to be studied simultaneously, rather than as a collection of unrelated cases. Table 2-2 presents several characteristics of a number of the most well-known distributions in the exponential family. The table lists a short notation for each distribution, the form of the distribution as a function of the parameters, and the canonical link function.

Table 2-2. Some Distributions in the Univariate Exponential Family

Adapted from Wedel and DeSarbo (1996)

| Distribution | Notation | $f(y)$ | Domain | Link function |
| Discrete: Binomial | $B(K, \pi)$ | $\binom{K}{y}\,\pi^{y}(1-\pi)^{K-y}$ | $[0, K]$ | $\theta = \ln\!\left(\frac{\pi}{1-\pi}\right)$ |
| Discrete: Poisson | $P(\mu)$ | $\frac{e^{-\mu}\mu^{y}}{y!}$ | $(0, \infty)$ | $\theta = \ln(\mu)$ |
| Continuous: Normal | $N(\mu, \sigma)$ | $\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]$ | $(-\infty, \infty)$ | $\theta = \mu$ |
| Continuous: Exponential | $G_1(\mu)$ | $\left(\frac{1}{\mu}\right)\exp\!\left[-\frac{y}{\mu}\right]$ | $(0, \infty)$ | $\theta = \mu^{-1}$ |
| Continuous: Erlang-2 | $G_2(\mu)$ | $\left(\frac{2}{\mu}\right)^{2} y\,\exp\!\left[-\frac{2y}{\mu}\right]$ | $(0, \infty)$ | $\theta = \mu^{-1}$ |
| Continuous: Gamma | $G(\mu, \nu)$ | $\frac{y^{\nu-1}}{\Gamma(\nu)}\left(\frac{\nu}{\mu}\right)^{\nu}\exp\!\left[-\frac{\nu y}{\mu}\right]$ | $(0, \infty)$ | $\theta = \mu^{-1}$ |
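For readers who want to connect Table 2-2 to computation, the short sketch below (ours, in Python; not part of the original chapter) evaluates the canonical link θ = g(μ) for the listed distributions. The function names and example values are illustrative assumptions.

```python
import math

# Canonical link functions theta = g(mu) for the distributions in Table 2-2.
# A small illustrative sketch; names and example values are ours, not the chapter's.

def binomial_link(pi):
    """Logit link for the binomial: theta = ln(pi / (1 - pi))."""
    return math.log(pi / (1.0 - pi))

def poisson_link(mu):
    """Log link for the Poisson: theta = ln(mu)."""
    return math.log(mu)

def normal_link(mu):
    """Identity link for the normal: theta = mu."""
    return mu

def inverse_link(mu):
    """Inverse link shared by the exponential, Erlang-2, and gamma: theta = 1 / mu."""
    return 1.0 / mu

if __name__ == "__main__":
    print(binomial_link(0.25))   # ln(0.25 / 0.75) = -1.0986...
    print(poisson_link(3.0))     # ln(3) = 1.0986...
    print(normal_link(3.0))      # 3.0
    print(inverse_link(2.0))     # 0.5
```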

We review three important types of multivariate latent variable models, according to the metric of the latent variable (Table 2-1). We first discuss Unrestricted Finite Mixture Models and Mixture Regression Models, in which the latent variable moderates the regression equation. Next, we describe Latent Class MDS Models that have both discrete and continuous latent variables. Note that, when applied in a marketing context, the discrete classes uncovered by such finite mixture models are often interpreted as derived market segments. We then examine Generalized Factor Models and Factor Regression Models before turning our attention to software for estimating many of the latent variable models discussed. We conclude by discussing some areas for future research and investigation.

UNRESTRICTED FINITE MIXTURE MODELS

To formulate the finite mixture model, assume that a sample of N subjects is collected. For each subject, K variables $y_n = (y_{nk};\ n = 1, \ldots, N;\ k = 1, \ldots, K)$ are measured/observed. These subjects are assumed to arise from a population that is a mixture of T unobserved classes, in (unknown) proportions $\pi_1, \ldots, \pi_T$. It is not known in advance from which class a particular subject arises. Given that $y_{nk}$ comes from class t, the distribution function of the vector of observed measurements $y_n$ is represented by the general form $f_t(y_n \mid \theta_t)$. Here $\theta_t$ denotes the vector of unknown parameters for class t. For example, in the case that the $y_{nk}$ within each class are independent normally distributed, $\theta_t$ contains the means, $\mu_{kt}$, and variances, $\sigma^2_{kt}$, of the normal distribution within each of the T classes. The unconditional distribution is obtained as:

$$f(y_n \mid \phi) = \sum_{t=1}^{T} \pi_t \, f_t(y_n \mid \theta_t), \qquad (1)$$

where $\phi = (\pi, \Theta)$ denotes all parameters of the model.

The conditional density function, $f_t(y_n \mid \theta_t)$, can take many forms including the normal, Poisson, and binomial distribution functions (as well as other distribution functions in the exponential family). Often, the K repeated measurements (or K variables) for each subject are assumed independent. This implies that the joint distribution function for the K observations factors into the product of the respective marginal distributions: $f_t(y_n \mid \theta_t) = \prod_{k=1}^{K} f_{tk}(y_{nk} \mid \theta_{tk})$. Note that the distribution functions are subscripted by k so that each observed outcome variable k may have its own distribution, i.e., some normal, others binomial, Poisson, etc. The posterior probability, $P_{nt}$, that subject n comes from class t can be obtained from Bayes' Theorem:

$$P_{nt} = \frac{\pi_t \, f_t(y_n \mid \theta_t)}{\sum_{t'=1}^{T} \pi_{t'} \, f_{t'}(y_n \mid \theta_{t'})}, \qquad (2)$$

given estimates of the parameters $\theta_t$.
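To make equations (1) and (2) concrete, the following sketch, our own Python/NumPy illustration rather than anything from the original chapter, evaluates the unconditional mixture density and the posterior probabilities P_nt for a single subject under T = 2 classes with K = 3 conditionally independent normal outcomes. All parameter values are hypothetical.

```python
import numpy as np
from scipy.stats import norm

# Illustrative sketch of equations (1) and (2) for a T-class mixture with K
# conditionally independent normal outcomes per subject. Values are hypothetical.

def class_density(y_n, mu_t, sigma_t):
    """f_t(y_n | theta_t) = product over k of N(y_nk; mu_kt, sigma_kt)."""
    return np.prod(norm.pdf(y_n, loc=mu_t, scale=sigma_t))

def mixture_density(y_n, pi, mu, sigma):
    """Equation (1): f(y_n | phi) = sum_t pi_t f_t(y_n | theta_t)."""
    return sum(pi[t] * class_density(y_n, mu[t], sigma[t]) for t in range(len(pi)))

def posteriors(y_n, pi, mu, sigma):
    """Equation (2): P_nt = pi_t f_t(y_n | theta_t) / f(y_n | phi)."""
    joint = np.array([pi[t] * class_density(y_n, mu[t], sigma[t]) for t in range(len(pi))])
    return joint / joint.sum()

# Hypothetical two-class example with K = 3 measured variables per subject.
pi = np.array([0.4, 0.6])                              # mixing proportions
mu = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])      # class means mu_kt
sigma = np.array([[1.0, 1.0, 1.0], [1.5, 1.5, 1.5]])   # class standard deviations

y_n = np.array([1.2, 2.5, 2.8])                        # one subject's K observations
print(mixture_density(y_n, pi, mu, sigma))
print(posteriors(y_n, pi, mu, sigma))                  # posterior probabilities sum to one
```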


Identification and Estimation

A potential problem associated with mixture models is that of underidentification. Titterington, Smith, and Makov (1985) show that, in general, mixtures involving members of the exponential family, including the univariate binomial, normal, Poisson, exponential, and gamma distributions, are identified. The parameters are usually estimated by maximizing the log-likelihood:

$$\ell(\phi \mid y_1, \ldots, y_N) = \sum_{n=1}^{N} \ln\!\left(\sum_{t=1}^{T} \pi_t \, f_t(y_n \mid \theta_t)\right). \qquad (3)$$

This can be done by direct numerical maximization, but the E-M algorithm is a convenient algorithm often applied for that purpose (cf. Dempster, Laird, and Rubin 1977). The value of T (the number of classes) is that which minimizes $CAIC_T = -2 \ln L + N_T (\ln N + 1)$, where $N_T$ is the effective number of parameters estimated and N is the number of independent data observations (Bozdogan 1987). Several other model selection heuristics also exist (see Wedel and Kamakura 2001).
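A minimal sketch of this estimation logic, assuming a univariate normal mixture, might look as follows. This is our own Python/NumPy illustration of the E-M iterations and the CAIC criterion, not the authors' software, and it omits practical refinements such as multiple random starts and convergence checks.

```python
import numpy as np
from scipy.stats import norm

def em_normal_mixture(y, T, n_iter=200, seed=0):
    """E-M for a T-class univariate normal mixture; returns (pi, mu, sigma, loglik)."""
    rng = np.random.default_rng(seed)
    N = len(y)
    pi = np.full(T, 1.0 / T)
    mu = rng.choice(y, size=T, replace=False)          # crude initialization from the data
    sigma = np.full(T, np.std(y))
    for _ in range(n_iter):
        # E-step: posterior probabilities P_nt as in equation (2).
        dens = np.array([pi[t] * norm.pdf(y, mu[t], sigma[t]) for t in range(T)]).T
        post = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update proportions, means, and standard deviations.
        n_t = post.sum(axis=0)
        pi = n_t / N
        mu = (post * y[:, None]).sum(axis=0) / n_t
        sigma = np.sqrt((post * (y[:, None] - mu) ** 2).sum(axis=0) / n_t)
    dens = np.array([pi[t] * norm.pdf(y, mu[t], sigma[t]) for t in range(T)]).T
    loglik = np.log(dens.sum(axis=1)).sum()
    return pi, mu, sigma, loglik

def caic(loglik, n_params, N):
    """CAIC_T = -2 ln L + N_T (ln N + 1); the preferred T minimizes this value."""
    return -2.0 * loglik + n_params * (np.log(N) + 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = np.concatenate([rng.normal(0, 1, 150), rng.normal(4, 1, 100)])  # synthetic data
    for T in (1, 2, 3):
        pi, mu, sigma, loglik = em_normal_mixture(y, T)
        n_params = (T - 1) + 2 * T   # mixing proportions plus a mean and variance per class
        print(T, round(caic(loglik, n_params, len(y)), 1))
```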

Application: Audio Equipment Buyers

Dillon and Kumar (1994) offer an example of the application of mixture models to examine shopping behavior, using data from Dash, Schiffman, and Berenson (1975) on the shopping behavior of 412 audio equipment buyers, categorized by five descriptors:

1. Store in which the merchandise was purchased: full-line department store (DEPT = 1) or specialty merchandiser (SPEC = 2).
2. Catalog experience: individual had sought information from manufacturers' catalogs (YES = 1); else (NO = 2).
3. Prior shopping experience: individual had shopped for audio equipment prior to making a final decision (YES = 1); individual had not shopped for audio equipment prior to making a final decision (NO = 2).
4. Information seeking: individual had sought information from friends and/or neighbors prior to purchase (YES = 1); individual had not sought information from friends and/or neighbors prior to purchase (NO = 2).
5. Information transmitting: individual had recently been asked for an opinion about buying any audio-related product (YES = 1); individual had not been asked for an opinion about buying any audio-related product (NO = 2).

These five categorical variables have two levels each, generating 32 unique response patterns. For illustrative purposes, Dillon and Kumar (1994) compute the parameter estimates and corresponding standard deviations for the latent two- and three-class models shown in Table 2-3.

Table 2-3. Store Choice Data: Parameter Estimates (standard deviations in parentheses). Parameters that assume a boundary value are treated as if fixed, and degrees of freedom are adjusted accordingly. Adapted from Dillon and Kumar (1994)

| | 2-class model: 1 | 2-class model: 2 | 3-class model: 1 | 3-class model: 2 | 3-class model: 3 |
| Class size $\theta_s$ | 0.4617 (0.055) | 0.5383 (fixed) | 0.3110 (0.053) | 0.1293 (0.045) | 0.5597 (fixed) |
| Conditional probabilities $\phi_{jrs}$ | | | | | |
| Store: Dept. | 0.0295 (0.037) | 0.6690 (0.053) | 0.0000 (bounded) | 1.0000 (bounded) | 0.4368 (0.076) |
| Store: Specialty | 0.9705 (fixed) | 0.3310 (fixed) | 1.0000 (bounded) | 0.0000 (bounded) | 0.5632 (fixed) |
| Catalog experience: Yes | 0.7969 (0.034) | 0.6466 (0.036) | 0.8282 (0.043) | 0.6414 (0.089) | 0.6709 (0.041) |
| Catalog experience: No | 0.2031 (fixed) | 0.3534 (fixed) | 0.1718 (fixed) | 0.3586 (fixed) | 0.3291 (fixed) |
| Prior shopping: Yes | 0.7826 (0.047) | 0.2667 (0.042) | 0.8678 (0.063) | 0.0000 (bounded) | 0.4198 (0.061) |
| Prior shopping: No | 0.2174 (fixed) | 0.7333 (fixed) | 0.1322 (fixed) | 1.0000 (bounded) | 0.5802 (fixed) |
| Information seeking: Yes | 0.7463 (0.049) | 0.2437 (0.039) | 0.8410 (0.071) | 0.0553 (0.101) | 0.3699 (0.047) |
| Information seeking: No | 0.2537 (fixed) | 0.7563 (fixed) | 0.1590 (fixed) | 0.9447 (fixed) | 0.6301 (fixed) |
| Information transmitting: Yes | 0.9517 (0.030) | 0.5455 (0.045) | 1.0000 (bounded) | 0.2528 (0.142) | 0.6956 (0.049) |
| Information transmitting: No | 0.0483 (fixed) | 0.4545 (fixed) | 0.0000 (bounded) | 0.7472 (fixed) | 0.3044 (fixed) |

In the latent two-class model, Class 1, representing a little over 46 percent of the population, contains almost exclusively specialty store shoppers. These shoppers, compared to Class 2, exhibit more catalogue experience, shop in stores prior to purchase, and both seek and transmit information prior to purchase. In the latent three-class model, the classes have an interesting structure with respect to department and specialty store shoppers. Notice that Class 1 is exclusively made up of specialty store shoppers, whereas Class 2 is exclusively made up of department store shoppers. Class 3 is made up of both types of shoppers, with the odds favouring specialty store shoppers slightly.

Note that whenever a latent class parameter assumes a value of 0.0 or 1.0, a boundary solution has been obtained. Here, four parameters in the latent three-class model hit boundary values. When boundary values are encountered, the sampling distributions of X² and G² are not known. The convention in such cases, according to these authors, is to act as if these parameters had been set a priori to 0.0 or 1.0, in which case the large-sample theory underlying X² and G² would apply. Thus, in the case of the latent three-class model, the degrees of freedom reported in the table are 18 instead of 14, reflecting the fact that four fewer parameters were estimated. The contrast between Class 1 and Class 2 is apparent: Class 1 specialty store shoppers initiate and engage in all of the pre-purchase activities with greater probability than their Class 2 department store shopper counterparts.
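For readers who want to see the latent class machinery behind Table 2-3 in code, the sketch below (our own Python illustration) computes the probability of a binary response pattern as a mixture of class-conditional Bernoulli probabilities, together with the posterior class memberships. The class sizes and response probabilities used here are hypothetical stand-ins, not the Dillon and Kumar estimates.

```python
import numpy as np

# Latent class model for J binary indicators: the probability of a 0/1 response
# pattern y is sum_t pi_t * prod_j rho_jt^y_j * (1 - rho_jt)^(1 - y_j).
# The parameter values below are hypothetical, chosen only to illustrate the computation.

pi = np.array([0.45, 0.55])                  # class sizes
rho = np.array([                             # P(coded 1) on each of 5 binary descriptors, by class
    [0.05, 0.80, 0.78, 0.75, 0.95],          # class 1 (hypothetical)
    [0.65, 0.65, 0.25, 0.25, 0.55],          # class 2 (hypothetical)
])

def pattern_probability(y, pi, rho):
    """Unconditional probability of the 0/1 response pattern y."""
    cond = np.prod(rho ** y * (1 - rho) ** (1 - y), axis=1)   # f_t(y | theta_t) for each class
    return np.dot(pi, cond)

def posterior_classes(y, pi, rho):
    """Posterior probability of each class given the pattern (Bayes' theorem)."""
    cond = np.prod(rho ** y * (1 - rho) ** (1 - y), axis=1)
    joint = pi * cond
    return joint / joint.sum()

y = np.array([0, 1, 1, 1, 1])   # e.g., specialty store buyer, "yes" on the four activity items
print(pattern_probability(y, pi, rho))
print(posterior_classes(y, pi, rho))
```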

MIXTURE REGRESSION MODELS

The mixture regression framework extends the unconditional mixture approach described in the previous section. As above, we assume that the vector of observations (on the dependent variable) of subject n, $y_n$, arises from a population which is a mixture of T unknown classes in unknown proportions. The distribution of $y_n$, given that $y_n$ comes from class t, $f_t(y_n \mid \theta_t)$, is assumed to be one of the distributions in the exponential family. In addition to the dependent variables, a set of P non-stochastic explanatory variables $X_1, \ldots, X_P$ ($X_p = (X_{np}),\ p = 1, \ldots, P$) is specified.

The development of the class of mixture regression models is very similar to that of the mixture models described above. However, the means of the observations in each class here are to be predicted from a set of explanatory variables. To this end, the mean of the distribution is written as $\eta_{nkt} = g(\mu_{nkt})$, where $g(\cdot)$ is the link function and $\eta_{nkt}$ is the linear predictor. Convenient link functions, called canonical links, are respectively the identity, log, logit, inverse, and squared inverse functions for the normal, Poisson, binomial, gamma, and inverse Gaussian distributions (see Table 2-2). The linear predictor in class t is a linear combination of the P explanatory variables:

$$\eta_{nkt} = \sum_{p=1}^{P} x_{nkp} \, \beta_{tp}, \qquad (4)$$

where $\beta_t = (\beta_{tp})$ is a set of regression parameters to be estimated for each class.
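As a small numerical illustration of equation (4) and the canonical links listed earlier, the sketch below (ours, in Python, with hypothetical predictor values and coefficients) forms the linear predictor for one subject in one class and maps it back to a mean through the inverse identity, log, and logit links.

```python
import numpy as np

# Equation (4): eta_nt = sum_p x_np * beta_tp. The class-specific mean is then
# recovered through the inverse of the canonical link. All values are hypothetical.

x_n = np.array([1.0, 2.5, 0.0])          # one subject's predictors (including an intercept term)
beta_t = np.array([0.4, 0.3, -1.2])      # regression coefficients for class t

eta = x_n @ beta_t                       # linear predictor eta_nt

mu_normal = eta                          # identity link (normal): mu = eta
mu_poisson = np.exp(eta)                 # log link (Poisson): mu = exp(eta)
p_binomial = 1.0 / (1.0 + np.exp(-eta))  # logit link (binomial): pi = exp(eta) / (1 + exp(eta))

print(eta, mu_normal, mu_poisson, p_binomial)
```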

Identification and Estimation

The same remarks on identification made above apply to mixture regression models. However, for the mixture regression model, an additional identification problem presents itself concerning the conditioning of the X-matrix and the size of P. Collinearity of the predictors within classes may lead to unstable estimates of the regression coefficients and large standard errors. In mixture regression models, this situation is compounded by the fact that there are fewer observations for estimating the regression model in each class than at the aggregate level. Therefore, the condition of the X-variables is an important issue in applications of mixture regression models. The parameters are again estimated by maximizing the log-likelihood:

$$\ell(\phi \mid y_1, \ldots, y_N) = \sum_{n=1}^{N} \ln\!\left(\sum_{t=1}^{T} \pi_t \, f_t(y_n \mid \beta_t)\right), \qquad (5)$$

through direct numerical maximization. The E-M algorithm can also be used (cf. Dempster, Laird, and Rubin 1977), where the M-step in this case involves numerical optimization. Again, the number of mixture components is determined as that value which minimizes CAIC.
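A compact sketch of this estimation strategy for the special case of a mixture of normal regressions with the identity link is given below. It is our own Python/NumPy illustration, not the DeSarbo and Cron implementation; for the normal case the M-step reduces to weighted least squares rather than general numerical optimization. On the synthetic data in the example, the recovered proportions and coefficients should roughly approximate the two regression lines used to simulate the responses.

```python
import numpy as np

def em_mixture_regression(X, y, T, n_iter=200, seed=0):
    """E-M for a T-class mixture of normal linear regressions (identity link).

    Returns class proportions pi, coefficients beta (T x P), residual standard
    deviations sigma, and the final log-likelihood.
    """
    rng = np.random.default_rng(seed)
    N, P = X.shape
    pi = np.full(T, 1.0 / T)
    beta = rng.normal(scale=1.0, size=(T, P))            # random starting coefficients
    sigma = np.full(T, y.std())
    for _ in range(n_iter):
        # E-step: posterior class memberships from the normal density of the
        # residuals around each class-specific regression line.
        resid = y[:, None] - X @ beta.T                   # N x T residual matrix
        dens = pi * np.exp(-0.5 * (resid / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
        post = dens / dens.sum(axis=1, keepdims=True)
        # M-step: class proportions, weighted least squares, and residual variances.
        pi = post.mean(axis=0)
        for t in range(T):
            w = post[:, t]
            Xw = X * w[:, None]
            beta[t] = np.linalg.solve(Xw.T @ X, Xw.T @ y)
            sigma[t] = np.sqrt(np.sum(w * (y - X @ beta[t]) ** 2) / w.sum())
    resid = y[:, None] - X @ beta.T
    dens = pi * np.exp(-0.5 * (resid / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    loglik = np.log(dens.sum(axis=1)).sum()
    return pi, beta, sigma, loglik

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    N = 300
    X = np.column_stack([np.ones(N), rng.normal(size=N)])   # intercept plus one predictor
    z = rng.random(N) < 0.5                                  # hidden class labels (synthetic)
    y = np.where(z, 1.0 + 2.0 * X[:, 1], 4.0 - 1.5 * X[:, 1]) + rng.normal(0.0, 0.5, N)
    pi, beta, sigma, loglik = em_mixture_regression(X, y, T=2)
    n_params = (2 - 1) + 2 * (X.shape[1] + 1)                # proportions + (coefficients + variance) per class
    caic = -2.0 * loglik + n_params * (np.log(N) + 1.0)
    print(np.round(pi, 2), np.round(beta, 2), round(caic, 1))
```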

Application: Trade Show Performance

DeSarbo and Cron (1988) designed a mixture regression model that enables the estimation of separate regression functions (and corresponding object memberships) in a number of classes using maximum likelihood. They used the model to analyze the factors that influence perceptions of trade show performance, and to investigate the presence of classes that differ in the importance attributed to these factors in evaluating trade show performance. The model is a finite mixture of univariate normal densities. The expectations of these densities are specified as linear functions of a pre-specified set of explanatory variables. In their study, DeSarbo and Cron asked 129 marketing executives to rate their firm's trade show performance on eight performance factors, as well as on overall trade show performance. The performance factors included: 1) Identifying new prospects, 2) Servicing current customers, 3) Introducing new products, 4) Selling at the trade show, 5) Enhancing corporate image, 6) Testing of new products, 7) Enhancing corporate morale, and 8) Gathering competitive information.


An aggregate-level regression analysis of overall performance on the eight performance factors, the results of which are depicted in Table 2-4, revealed that identifying new prospects and new product testing were significantly related to trade show performance. These results were derived by a standard OLS regression of the overall performance ratings on the ratings of the eight factors. However, a mixture regression model revealed two classes (on the basis of the AIC criterion), composed of 59 and 70 marketing executives respectively. The effects of the performance factors in the two classes are markedly different from those at the aggregate level, as shown in Table 2-4.

Table 2-4. Aggregate and Segment-level Results of the Trade Show Performance Study. Adapted from DeSarbo and Cron (1988)

| | Aggregate | Class 1 | Class 2 |
| Intercept | 3.03* | 4.093* | 2.218* |
| 1. New prospects | 0.15* | 0.126 | 0.242* |
| 2. Current customers | -0.02 | 0.287 | -0.164 |
| 3. Product introduction | 0.09 | -0.157 | 0.204 |
| 4. Selling | -0.04 | -0.133 | 0.074 |
| 5. Enhancing image | 0.09 | 0.128 | 0.072 |
| 6. New product testing | 0.18* | 0.107 | 0.282 |
| 7. Enhancing morale | 0.07 | 0.155 | -0.026 |
| 8. Competitive information | 0.04 | -0.124 | 0.023 |
| Size (%) | | 0.489 | 0.511 |

*p