USING SAS IML® TO PERFORM HIERARCHICAL BAYES ESTIMATION FOR DISCRETE CHOICE MODELING

David Steenhard, LexisNexis, Louisville, KY
Nan-Ting Chou, University of Louisville, Louisville, KY

ABSTRACT

Most methods for analyzing choice-based conjoint data do so by combining the data for all individuals. Analyzing data this way can be a serious weakness, for it can obscure important individual aspects of the data. Hierarchical Bayes Estimation is one method of estimating individual part-worths, and it can produce reasonable individual estimates even with relatively little data from each respondent. In this paper we provide an introduction to Hierarchical Bayes Estimation and use an implementation of the algorithm written in SAS IML® to perform a choice-based conjoint analysis. Sufficient detail is provided to allow readers to easily use Hierarchical Bayes Estimation as a tool for discrete choice modeling.

INTRODUCTION

Disaggregate, or individual, discrete choice modeling is fast becoming a favorite topic among market research professionals due to the technique's ability to answer a wide range of marketing questions. Discrete choice analysis consists of a series of questions that ask respondents to choose between two or more hypothetical products. The products in each question are described by a list of attributes, which enables the respondent to easily compare the alternative products.

Discrete choice modeling is typically done using aggregate models (models that assume all respondents have the same preferences). The resulting analysis provides a model of the choice behavior of a representative, or average, respondent. However, respondents tend to differ across socio-demographic and/or attitudinal characteristics. The difficulty with an aggregate model is that by assuming all respondents have the same preferences, we assume away the important differences among respondents, which can lead to problems with the model's accuracy and a loss of valuable marketing information.

Important recent advances in statistics, however, have provided a way around this deficiency of aggregate models. Hierarchical Bayes Estimation is one method of estimating individual part-worths. Landmark articles by Allenby and Ginter (1995) and Lenk, DeSarbo, Green, and Young (1996) describe the estimation of individual part-worths using Hierarchical Bayes (HB) models. This approach is extremely promising, since it can produce reasonable individual part-worths even with relatively little data from each respondent. However, the method is computationally intensive, and usually requires many thousands of iterations before convergence is reached.

This paper illustrates how Hierarchical Bayes Estimation in SAS IML® can be used to solve a discrete choice modeling problem, and outlines the mechanics of Hierarchical Bayes Estimation in detail.

HIERARCHICAL BAYES MODEL

The HB model used here is called "hierarchical" because it has two levels.

• At the upper level, we assume that individuals' part-worths are described by a multivariate normal distribution with the following notation:

   β_i ~ N(α, D)

where:
   β_i = a vector of part-worths for the ith individual.
   α = a vector of means of the distribution of individuals' part-worths.
   D = a matrix of variances and covariances of the distribution of part-worths across individuals.

• At the lower level, we assume that, given an individual's part-worths, his or her probabilities of choosing particular alternatives are governed by a multinomial logit model. The probability of the ith individual choosing the kth alternative in a particular task is:

   P_k = exp(x_k'β_i) / Σ_j exp(x_j'β_i)     (1)

where:
   P_k = the probability of an individual choosing the kth concept in a particular choice task.
   x_j = a vector of values describing the jth alternative in that choice task.

The parameters to be estimated are the vectors β_i of part-worths for each individual, the vector α of means of the distribution of part-worths, and the matrix D of variances and covariances of that distribution.

ESTIMATION OF THE PARAMETERS

The parameters β_i, α, and D are estimated by an iterative process. This process is quite robust, and its results do not depend on the starting values. However, to make the process converge as quickly as possible, one should start with estimates of the parameters that are reasonably close to their final values:

• Initial estimates of the β_i were set equal to the parameters of the aggregate multinomial logit model.
• Initial estimates of α are the averages of the β_i.
• Our initial estimates of D consist of the variances and covariances from the aggregate multinomial logit model.
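To make the multinomial logit model in (1) concrete, here is a small Python sketch (illustrative only, not the authors' SAS IML code) that computes the choice probabilities for one choice task; the attribute rows and part-worth values are made up for the example:

```python
import math

def logit_probs(x_rows, beta):
    """Multinomial logit probabilities for one choice task:
    P_k = exp(x_k' beta) / sum_j exp(x_j' beta)."""
    utils = [sum(xv * bv for xv, bv in zip(row, beta)) for row in x_rows]
    m = max(utils)                       # subtract the max for numerical stability
    exps = [math.exp(u - m) for u in utils]
    total = sum(exps)
    return [e / total for e in exps]

# Three hypothetical alternatives described by two attributes,
# and an illustrative part-worth vector.
x = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
beta = [0.5, 1.0]
probs = logit_probs(x, beta)
```

The probabilities sum to one across the alternatives in the task, and the alternative with the highest utility x_k'β receives the largest probability.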

Given these initial values, each iteration consists of the following three steps:

• Using the estimates of the β_i and D, generate a new estimate of α, assuming that α is distributed normally with mean equal to the average of the β_i.
• Using the estimates of α and the β_i, generate a new estimate of D, drawn from the inverse Wishart distribution.
• Using the estimates of α and D, generate a new estimate of the β_i, using a procedure known as the "Metropolis-Hastings algorithm," which will be discussed in detail in the next section.

In each of these three steps we re-estimate one set of parameters (α, the β_i, or D) based on the current values of the other two sets. This technique is known as "Gibbs sampling," and it converges to the correct distribution for each of the three sets of parameters. Another name for this procedure is "Markov chain Monte Carlo," because the estimates in each iteration are determined from those of the previous iteration by a constant set of transition rules. This process is carried out for a large number of iterations. The first few thousand iterations are used to achieve convergence, with successive iterations fitting the data better and better. These are called "burn-in" or "transitory" iterations. After the transitory iterations are completed, we begin to save the estimates of β_i, α, and D for each iteration. To get point estimates of the part-worths for each respondent, we take the average of the β_i over these saved iterations.

METROPOLIS HASTINGS ALGORITHM

The Metropolis-Hastings algorithm is used to draw each new set of betas for each individual. We use the symbol β_OLD to indicate the previous estimate of β_i. We then generate a trial value for the new estimate of β_i, which we call β_NEW, and test whether it represents an improvement. If so, we accept it as our next estimate; if not, we accept or reject it with a probability that depends on how much worse it is than the previous estimate.

To get β_NEW we draw a random vector d of "differences" from a distribution with a mean of zero and a covariance matrix proportional to D, and let

   β_NEW = β_OLD + d

We then calculate the probability of the data given each set of part-worths, β_OLD and β_NEW, using the formula for the multinomial logit model (1). That is done by calculating the probability of each choice that the individual made, using the multinomial logit formula for P_k, and then multiplying all these probabilities together. Call the resulting values p_OLD and p_NEW, respectively.

Next we calculate the relative density of the distribution of the betas at β_OLD and β_NEW, given the current estimates of the parameters α and D (these serve as priors in the Bayesian updating). Call these values d_OLD and d_NEW. The relative density of the distribution at the location of a point β is given by the following formula:

   Relative Density = exp[-(1/2)(β - α)' D^-1 (β - α)]

Finally, calculate the ratio:

   r = (p_NEW * d_NEW) / (p_OLD * d_OLD)

From Bayesian updating, the posterior probabilities are proportional to the product of the likelihood times the priors. The probabilities p_NEW and p_OLD are the likelihoods of the data given the parameter estimates β_NEW and β_OLD. The densities d_OLD and d_NEW are proportional to the probabilities of drawing those values of β_OLD and β_NEW, respectively, from the distribution of part-worths, and play the role of priors. Therefore, r is the ratio of the posterior probabilities of β_NEW and β_OLD.

If r is greater than unity, the new estimate has a higher posterior probability than the previous one, and we accept β_NEW. If r is less than unity, we accept β_NEW with probability equal to r. Two influences are at work in deciding whether to accept the new estimate of beta. First, if a respondent's choices fit well, his or her estimated β_i depends mostly on his or her own data and is influenced less by the population distribution (the relative density). But if the choices fit poorly, then the estimated β_i depends more on the population distribution and is influenced less by the respondent's own data. In this way HB makes use of every respondent's data in producing estimates for each individual. This sharing of information is what gives HB the ability to produce reasonable estimates for each respondent even when there may be inadequate information for each individual.

The following SAS IML® code performs the Metropolis-Hastings algorithm:

start beta(nind,subj,set,x,beta,alpha,d,jd,umean,arate);
  /* Help find the vector delb by finding the factor that is
     proportional to the covariance matrix */
  ucov=jd*d;
  accept=0; decline=0;
  invd=inv(d);
  seed=int(ranuni(0)*10000);
  * Break all the information out by individual respondent;
  do i=1 to nind;
    xi=x[loc(subj=i),];
    seti=set[loc(subj=i),];
    /* To get the new estimate for beta, draw a random vector delb
       from a multivariate normal distribution with mean zero and
       covariance matrix proportional to D, and let the new
       betan = betao + delb, where betao is the previous estimate */
    call vnormal(delb,umean,ucov,1,seed);
    delb=delb`;
    seed=seed+i;
    betao=beta[,i];
    betan=betao+delb;
    * Find the exponential of the utilities for the new and old estimates;
    eutilo=exp(xi*betao);
    eutiln=exp(xi*betan);
    /* Find the probability of each choice, for each choice task that
       the individual made, then multiply all the probabilities together */
    maxseti=max(seti);
    po=1; pn=1;
    do j=1 to maxseti;
      tutilo=eutilo[loc(seti=j),];
      tutiln=eutiln[loc(seti=j),];
      /* ... (the accumulation of po and pn, and the relative-density
         calculations for etmpo and etmpn, are cut off in the source
         listing) ... */
    end;
    ratio=(pn*etmpn)/(po*etmpo);
    minr=min(ratio,1);
    rand=uniform(0);
    /* Determine whether to save the new estimate of beta */
    if rand ... /* (the listing is cut off at this point in the source) */
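For readers outside SAS, the accept/reject step described above can be sketched in Python. This is a hedged illustration, not the authors' code: the likelihoods and prior densities passed in are placeholders, and the relative-density helper assumes a diagonal D^-1 for brevity.

```python
import math
import random

def mh_accept(p_old, d_old, p_new, d_new, rng=random.random):
    """Metropolis-Hastings decision: accept the new draw with
    probability min(r, 1), where r = (p_new*d_new)/(p_old*d_old)."""
    r = (p_new * d_new) / (p_old * d_old)
    return rng() < min(r, 1.0)

def rel_density(beta, alpha, d_inv_diag):
    """Relative density exp[-(1/2)(beta-alpha)' D^-1 (beta-alpha)],
    with D^-1 assumed diagonal (given as a list) for brevity."""
    diffs = [b - a for b, a in zip(beta, alpha)]
    quad = sum(v * di * di for v, di in zip(d_inv_diag, diffs))
    return math.exp(-0.5 * quad)

# An improvement (r = 2 here, so min(r, 1) = 1) is always accepted,
# whatever the uniform draw turns out to be:
accepted = mh_accept(p_old=0.2, d_old=0.5, p_new=0.4, d_new=0.5,
                     rng=lambda: 0.999)
```

Drawing the uniform variate and comparing it to min(r, 1) accepts worse candidates occasionally, which is what lets the chain explore the full posterior rather than climb to a single mode.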