Journal of Classification 1:147-186 (1984)
©1984 Springer-Verlag New York Inc.

GENFOLD2: A Set of Models and Algorithms for the GENeral UnFOLDing Analysis of Preference/Dominance Data

Wayne S. DeSarbo, University of Pennsylvania

Vithala R. Rao, Cornell University

Abstract: A general set of multidimensional unfolding models and algorithms is presented to analyze preference or dominance data. This class of models, termed GENFOLD2 (GENeral UnFOLDing Analysis-Version 2), allows one to perform internal or external analysis, constrained or unconstrained analysis, conditional or unconditional analysis, and metric or nonmetric analysis, while providing the flexibility of specifying and/or testing a variety of different types of unfolding-type preference models mentioned in the literature, including Carroll's (1972, 1980) simple, weighted, and general unfolding analysis. An alternating weighted least-squares algorithm is utilized and discussed in terms of preventing degenerate solutions in the estimation of the specified parameters. Finally, two applications of this new method are discussed concerning preference data for ten brands of pain relievers and twelve models of residential communication devices.

Keywords: Multidimensional scaling; Unfolding analysis; Preference models

Authors' Addresses: Wayne S. DeSarbo, Marketing Department, Wharton School, University of Pennsylvania, Philadelphia, PA 19104; and Vithala R. Rao, Graduate School of Management, Cornell University, Ithaca, NY 14853.

1. Introduction

Carroll and Arabie (1980) present an extensive taxonomy of measurement data and multidimensional measurement models, updating the work of Coombs (1964). Their schema attempts separately to classify data and models. In classifying data, these authors discuss seven distinguishing properties: number of modes, power of a given mode, number of ways, scale of data, conditionality of data, completeness of data, and number and nature of replications. For their taxonomy of models, they list four relevant characteristics: type of geometric model, number of sets of points in space, number of spaces or structures, and degree of external constraint on model parameters. The authors then attempt to classify existing models and methods on the basis of these criteria. We wish to present a general class of models: GENFOLD2, GENeral UnFOLDing Analysis-Version 2, which, according to the Carroll and Arabie (1980) classification, analyzes two-mode, polyadic, two-way, ratio or interval or ordinal scale, unconditional or conditional (assumptions concerning the comparability of the data), complete data. GENFOLD2, like traditional unfolding models, is a spatial distance model, positioning two sets of points in the same space and allowing for constrained or unconstrained solutions. GENFOLD2 is an improved, modified version of GENFOLD (DeSarbo and Rao 1983); it utilizes a more efficient algorithm and, as shall be discussed, provides joint space solutions which are "nondegenerate." We first describe the analytical problem of unfolding and review the relevant literature. GENFOLD2 is then presented in detail, and the algorithm employed in the estimation of the parameters is discussed. Two applications of GENFOLD2, to preference judgments for over-the-counter pain relievers and to intention-to-purchase statements for twelve types of new residential communication devices, are discussed in detail, as are the managerial implications from a marketing point of view. Finally, some directions for future research are discussed.

2. Literature Review

A number of different models, and algorithms for estimating them, have been proposed to account for individual differences in preferences. Here, the data are typically two-mode, polyadic, and two-way, where subjects render preference judgments on a specified set of stimuli. The psychometric literature on preference models has focused on two distinct types of models -- Tucker's (1960) vector model and Coombs' (1964) unfolding model. Both models assume that subjects arrive at their preference judgments by considering a multidimensional set of stimulus characteristics, but differ in assumptions about how subjects combine stimulus information to arrive at a judgment. Davison (1976) and Carroll (1972, 1980) compare these two types of models and discuss the assumptions and implications of each. Focusing upon the unfolding-type, distance, spatial models, Bennett and Hays (1960) first generalized Coombs' (1950) unidimensional unfolding model to the multidimensional case. Here, both subjects and stimuli are represented as points in the same multidimensional space. The points for individuals represent "ideal" stimuli, or optimal sets of stimulus values (e.g., objects, brands, products, etc.) for those individuals. In the Bennett and Hays model, the farther a given stimulus point is from an individual's ideal point, the less the individual likes that stimulus.


This notion of relative distance implies a metric on the space. Bennett and Hays assumed this to be Euclidean -- implying that the isopreference contours, in two dimensions, are a family of concentric circles centered at the individual's ideal point (hyperspheres in higher dimensions). Several authors have proposed algorithms for estimating stimulus scale values and ideal point coordinates from preference judgments assumed to follow the unfolding model (Lingoes 1972, 1973; Bennett and Hays 1960; Roskam 1973; Young and Torgerson 1967; Kruskal et al. 1973, 1977; Kruskal and Carroll 1969; Schönemann 1970; Carroll 1972, 1980; Takane, Young, and de Leeuw 1977; Heiser 1981; Spence 1979; Greenacre and Browne 1982). This approach of estimating both ideal points and stimulus coordinates is known as internal analysis (Carroll 1972), as opposed to external analysis methods, which estimate only ideal points given the stimulus coordinates (obtained, perhaps, from an MDS analysis of similarities). Carroll (1972, 1980) has introduced PREFMAP and PREFMAP2 as a series of models and algorithms to perform analyses of preference data. His methods allow the user to select between internal or external, metric or nonmetric, and unfolding or vector model analyses. Three different (nested) unfolding models can be estimated in PREFMAP and PREFMAP2. Let:

$i = 1, \ldots, I$ subjects;
$j = 1, \ldots, J$ stimuli;
$t = 1, \ldots, T$ dimensions;
$M_{ij}$ = the preference value subject $i$ renders for stimulus $j$;
$d_{ij}^2$ = the squared distance between subject $i$'s ideal point and stimulus $j$;
$x_{jt}$ = the $t$-th coordinate of stimulus $j$;
$y_{it}$ = the $t$-th coordinate of subject $i$'s ideal point;
$w_{it}$ = the importance or salience weight of dimension $t$ to subject $i$;
$a_i$ = the multiplicative constant for subject $i$;
$b_i$ = the additive constant for subject $i$.

Then, the unfolding model (Carroll 1972) can be written in general form as:


$$M_{ij} \approx F^{-1}(d_{ij}^2) = a_i d_{ij}^2 + b_i \,, \qquad (1)$$

with $a_i$ presumably negative; the specifications of the various unfolding models appearing in the literature can be treated as special cases of this general model. The simple unfolding model (Coombs 1964, pp. 140-180) defines $d_{ij}^2$ as:

$$d_{ij}^2 = \sum_{t=1}^{T} (x_{jt} - y_{it})^2 \,. \qquad (2)$$

This "simple unfolding model" assumes that a given distance on a dimension makes as much difference to one subject as to another, as well as assuming that all individuals use the same set of dimensions within the space. Schönemann (1970) presents an analytic internal solution for a strong case (treating the data as unconditional) of this unfolding model. Ross and Cliff (1964) provide methods for obtaining a stimulus configuration through a singular value decomposition of the double centered $\mathbf{M} = ((M_{ij}))$ matrix under suitable assumptions. Schönemann and Wang (1972) combine this metric unfolding model with the Bradley-Terry-Luce choice model (Luce 1959) to produce a stochastic unfolding approach that is applicable to paired comparisons data. Zinnes and Griggs (1974) present a probabilistic multidimensional analogue of this model. Davidson (1972, 1973) presents a geometrical analysis for this type of unfolding model. Carroll's weighted unfolding model defines $d_{ij}^2$ as:

$$d_{ij}^2 = \sum_{t=1}^{T} w_{it} (x_{jt} - y_{it})^2 \,. \qquad (3)$$

Here, subjects are allowed to weight the dimensions differently, where $w_{it}$ can be thought of as the "salience" or "importance" of the $t$-th dimension for subject $i$. Now the isopreference contours are ellipses, ellipsoids, or hyperellipsoids instead of circles, spheres, or hyperspheres, respectively, as in the simple unfolding case. Srinivasan and Shocker (1973) present a nonmetric external unfolding analysis with this model using linear programming methods, including nonnegativity constraints for the dimension weights. The same constraints are provided in a metric procedure using quadratic programming described by Davison (1976). Spence (1979) presents an interesting generalization of this model allowing for linear constraints on $\mathbf{X} = ||x_{jt}||$ and/or $\mathbf{Y} = ||y_{it}||$. Finally, Carroll's (1972, 1980) general unfolding model specifies $d_{ij}^2$ as:

$$d_{ij}^2 = \sum_{t=1}^{T} w_{it} (x_{jt}^* - y_{it}^*)^2 \,, \qquad (4)$$

GENFOLD2 for Analysis of Preference/Dominance Data

151

where $X_j^* = X_j T_i$ ($X_j$ = the $j$-th row of $\mathbf{X}$), $Y_i^* = Y_i T_i$ ($Y_i$ = the $i$-th row of $\mathbf{Y}$), and $T_i$ = an orthogonal transformation matrix for the $i$-th subject. This unfolding model allows subjects the freedom of implicitly choosing idiosyncratic sets of reference axes within the space. That is, each subject is allowed to rotate the reference frame of the perceptual space and then to weight differentially the dimensions defined by this rotated reference frame. Thus, $X_j^*$ depends on $i$. Bechtel (1976) also presents a related general powered distance model for unfolding. Ramsay (1982) and de Leeuw and Heiser (1979) also consider such metrics for dissimilarity data. There is controversy in the literature over the desirability of constraining the $w_{it}$'s in the weighted (and general) unfolding model to be positive. Carroll (1972) claims that in the weighted unfolding model, a negative $w_{it}$ has a clear interpretation -- if $w_{it}$ is negative with respect to dimension $t$, the ideal point for individual $i$ indicates the least preferred, rather than the most preferred, value, and the farther a stimulus is along that dimension from the ideal point, the more highly preferred is that stimulus. He thus argues for not constraining the $w_{it}$ to be positive. Other authors, such as Srinivasan and Shocker (1973) and Davison (1976), dispute the value of unconstrained analyses, claiming that the existence of such negative weights may lead to unrealistic predictions for subjects' most preferred stimuli.
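To make the distinctions among these three nested distance functions concrete, the sketch below evaluates the simple, weighted, and general unfolding distances of equations (2)-(4) for one subject-stimulus pair. It is purely illustrative: the numerical values, the numpy rendering, and all variable names are ours, not part of any of the programs discussed.

```python
import numpy as np

# One stimulus point x_j and one ideal point y_i in T = 2 dimensions.
x_j = np.array([1.0, 2.0])
y_i = np.array([0.5, -1.0])

# Simple unfolding (eq. 2): plain squared Euclidean distance.
d2_simple = np.sum((x_j - y_i) ** 2)

# Weighted unfolding (eq. 3): subject-specific dimension saliences w_it.
w_i = np.array([2.0, 0.5])
d2_weighted = np.sum(w_i * (x_j - y_i) ** 2)

# General unfolding (eq. 4): rotate to subject i's idiosyncratic reference
# frame via an orthogonal T_i, then weight the rotated dimensions.
theta = np.pi / 6.0
T_i = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
d2_general = np.sum(w_i * ((x_j - y_i) @ T_i) ** 2)

print(d2_simple, d2_weighted, d2_general)   # 9.25, 5.0, (rotation-dependent)
```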

3. GENFOLD2

3.1 Objectives

Our primary goal is to provide a general, flexible unfolding model which can accommodate: conditional or unconditional analyses; internal or external analyses; $\mathbf{W} = ||w_{it}||$ constrained or unconstrained analyses; ratio, interval, or ordinal scaled data; and linear constraints on $\mathbf{Y}$ or $\mathbf{X}$. This last feature proves to be of considerable utility in many types of applications, as we discuss later in the paper. In addition, we wish to create an unfolding procedure which will not render the degenerate solutions that have plagued many past efforts to fit such models. Thus, GENFOLD2 allows the user to fit a large class of different types of unfolding models, combining the features of many of the unfolding models encountered in the literature.

3.2 The Model

Let:

$\Delta_{ij}$ = the "dispreference data value" (inversely related to preference data values) the $i$-th subject has for the $j$-th stimulus;
$y_{it}$ = the $t$-th coordinate of subject $i$'s ideal point;
$x_{jt}$ = the $t$-th coordinate of stimulus $j$;
$\mathbf{W}^i$ = subject $i$'s linear (symmetric) transformation matrix;
$a_i$ = subject $i$'s multiplicative constant;
$b_i$ = subject $i$'s additive constant;
$c_i^2$ = subject $i$'s exponent;
$f_{ij}$ = the squared distance between subject $i$ and stimulus $j$;
$e_{ij}$ = error;
$A_{il}$ = the $l$-th descriptor variable for subject $i$;
$\alpha_{lt}$ = the importance or impact of the $l$-th descriptor variable on dimension $t$;
$B_{jk}$ = the $k$-th descriptor variable for stimulus $j$;
$\theta_{kt}$ = the importance or impact of the $k$-th descriptor variable on dimension $t$;
$i = 1, \ldots, I$ subjects;
$j = 1, \ldots, J$ stimuli;
$t = 1, \ldots, T$ dimensions;
$l = 1, \ldots, L$ subject descriptor variables;
$k = 1, \ldots, K$ stimulus descriptor variables.

Then, the full GENFOLD2 model can be written as:

$$\Delta_{ij} = \hat{\Delta}_{ij} + e_{ij} \,,$$

where:

$$\hat{\Delta}_{ij} = a_i f_{ij}^{c_i^2} + b_i \qquad (5)$$

and

$$f_{ij} = (X_j - Y_i)\,\mathbf{W}^i\,(X_j - Y_i)' \,,$$

with $X_j$ the $j$-th row of $\mathbf{X} = ||x_{jt}||$ and $Y_i$ the $i$-th row of $\mathbf{Y} = ||y_{it}||$.
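As a concrete illustration of the model, the sketch below evaluates $f_{ij}$ and $\hat{\Delta}_{ij}$ from equation (5) for full matrices of subjects and stimuli. This is our own minimal numpy re-expression, not the GENFOLD2 program itself; all names are illustrative, and each $\mathbf{W}^i$ is built as $\mathbf{U}^i\mathbf{U}^{i\prime}$ so that every quadratic form is nonnegative.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, T = 5, 8, 2                      # subjects, stimuli, dimensions

X = rng.normal(size=(J, T))            # stimulus coordinates x_jt
Y = rng.normal(size=(I, T))            # ideal-point coordinates y_it
U = rng.normal(size=(I, T, T))
W = U @ U.transpose(0, 2, 1)           # W^i = U^i U^i': symmetric, PSD
a = np.ones(I)                         # multiplicative constants a_i
b = np.zeros(I)                        # additive constants b_i
c2 = np.ones(I)                        # exponents c_i^2

def model_predictions(X, Y, W, a, b, c2):
    """Return the I x J matrix of model values from equation (5)."""
    diff = X[None, :, :] - Y[:, None, :]          # (I, J, T): X_j - Y_i
    # f_ij = (X_j - Y_i) W^i (X_j - Y_i)'  (one quadratic form per pair)
    f = np.einsum('ijt,its,ijs->ij', diff, W, diff)
    return a[:, None] * f ** c2[:, None] + b[:, None]

Delta_hat = model_predictions(X, Y, W, a, b, c2)
print(Delta_hat.shape)                 # (5, 8)
```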

3.2.1 Conditional vs. Unconditional Analyses

GENFOLD2 allows a variety of options for the estimation of the $a_i$, $b_i$, and $c_i$ parameters. These parameters prove to be very important sources of information concerning the structure of the data used in the analysis. For example, the $a_i$'s in the simple unfolding context, when negative, indicate the presence of an "anti-ideal" point, similar to the interpretation of negative weights in the weighted unfolding model. The exponents, $c_i^2$, which are purposely constrained to be nonnegative for interpretability, render information as to the rate at which preference decreases as one moves away in any direction from subject $i$'s ideal point. Finally, the $b_i$'s are additive constants, appropriate for interval scale data. In the fully conditional case, all three sets of parameters are unconstrained. In the fully unconditional case, the user has the options of setting $a_i = a$ or some fixed constant (e.g., $a = 1$), $b_i = b$ or some fixed constant (e.g., $b = 0$), and $c_i^2 = c$ or some fixed constant (e.g., $c = 1$). The user is allowed the flexibility of mixing both constrained and unconstrained sets of parameters with respect to individual subjects.

3.2.2 Internal vs. External Analysis

Assuming that a configuration $\mathbf{X}$ has been derived (call it $\mathbf{X}^*$) from a previous analysis (e.g., from an MDS analysis of similarities), one can constrain $\mathbf{X} = \mathbf{X}^*$ in the analysis (external analysis). Or, one can estimate $\mathbf{X}$ (internal analysis) from the data.

3.2.3 Form of $\mathbf{W}^i$

One set of constraints on $\mathbf{W}^i$ gives the user the flexibility of specifying a particular type of unfolding model. For example, with $c_i^2 = c = 1$, $\forall i = 1, \ldots, I$, constraining $\mathbf{W}^i = \mathbf{I}$ (a $T \times T$ identity matrix), $\forall i = 1, \ldots, I$, yields Carroll's simple unfolding model. By constraining $\mathbf{W}^i$ to be diagonal, $\forall i = 1, \ldots, I$, one obtains Carroll's weighted unfolding model. Similarly, with $c_i^2 = c = 1$, $\forall i = 1, \ldots, I$, and $\mathbf{W}^i = \mathbf{U}^i\mathbf{U}^{i\prime}$, one obtains Carroll's general unfolding model.

3.2.4 Ratio, Interval, or Ordinal Scale

GENFOLD2 is designed to handle all three of these types of data. For ratio scale data, $b_i = b = 0$, $\forall i = 1, \ldots, I$, while for interval scale data, the additive constant $b_i$ is estimated or constrained to some $b$ (not necessarily equal to zero). For ordinal scale data, as will be discussed, a monotone regression phase (Kruskal 1964b) is appended to the metric GENFOLD2 algorithm.


3.2.5 X and/or Y Constrained vs. Unconstrained

Finally, GENFOLD2 allows one to constrain either $\mathbf{Y}$ or $\mathbf{X}$ or both to be linear functions of some known set of preselected background descriptor variables, such as demographic or product feature variables ($\mathbf{A}$ for subjects, $\mathbf{B}$ for stimuli), i.e.,

$$\mathbf{Y} = \mathbf{A}\boldsymbol{\alpha} \,, \qquad \mathbf{X} = \mathbf{B}\boldsymbol{\theta} \,. \qquad (6)$$

As in CANDELINC (Carroll, Pruzansky, and Kruskal 1980) and in three-way multivariate conjoint analysis (DeSarbo, Carroll, Lehmann, and O'Shaughnessy 1981), these constraints can aid in the interpretation of the derived dimensions (cf. Bentler and Weeks 1978; Bloxom 1978; Noma and Johnson 1977; de Leeuw and Heiser 1978; Lingoes 1980) and can replace the property-fitting methods often used to attempt to interpret results after a solution is obtained. Also, as shall be discussed shortly, the imposition of these sets of constraints can provide an effective tool for product positioning and market segmentation analysis.
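The practical effect of the constraint $\mathbf{Y} = \mathbf{A}\boldsymbol{\alpha}$ in (6) is a sharp reduction in free parameters: the $I \times T$ ideal-point coordinates are replaced by only $L \times T$ impact coefficients, each directly interpretable as the contribution of a background variable to a dimension. A minimal sketch under assumed sizes (all names ours):

```python
import numpy as np

rng = np.random.default_rng(1)
I, L, T = 100, 3, 2                 # 100 subjects, 3 descriptors, 2 dimensions

A = rng.normal(size=(I, L))         # subject background variables (e.g., demographics)
alpha = rng.normal(size=(L, T))     # impact of each descriptor on each dimension

Y = A @ alpha                       # constrained ideal points, eq. (6)
print(Y.shape, alpha.size)          # (100, 2): 200 coordinates driven by 6 parameters
```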

3.3 The Algorithm

In essence, GENFOLD2 attempts to estimate the desired set of constrained and/or unconstrained parameters described above (i.e., some subset of: $\mathbf{W}^i$, $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$, $\boldsymbol{\alpha}$, $\boldsymbol{\theta}$), given $\boldsymbol{\Delta}$ and $T$ (the number of dimensions), in order to minimize the weighted sum-of-squares objective function:

$$\min Z = \sum_i \sum_j \gamma_{ij} \left[ \Delta_{ij} - \hat{\Delta}_{ij} \right]^2 \,. \qquad (7)$$

There has been considerable research attempting to cure unfolding of its tendency toward degenerate solutions. Degenerate solutions occur in multidimensional unfolding in a number of ways. One common degenerate pattern (in two dimensions) is where the two sets of points lie on concentric circles, one set of points encircling the other. Another typical degenerate solution is one where (in two dimensions) the two sets of points are separated from one another on one or more of the dimensions (e.g., the ideal points all in quadrant one and the stimuli all in quadrant three of a two-dimensional space). Such degeneracies have been investigated by Kruskal and Carroll (1969), who propose different "badness-of-fit" measures to solve this problem. Their method tends to work nicely with artificial data, but according to Carroll (1972), their technique "has so far performed less than fully satisfactory with real data." Carroll (1972, 1980) also acknowledges this degeneracy problem associated with the unfolding model.

Heiser (1981) imposes configuration restrictions in the two-way case on the $\mathbf{X} = ||x_{jt}||$ and $\mathbf{Y} = ||y_{it}||$ joint space. The restriction Heiser imposes is that each stimulus is constrained to be at the centroid of the locations of the subjects (ideal points) by whom it was most preferred (first chosen), or, more generally, among the first $M \leq J$ most preferred stimuli. While this strategy is clearly a major contribution, several limitations arise with such an approach. One limitation concerns the assumption regarding the reliability of the data. Basically, Heiser's (1981) approach tends to place most of the weight in estimating parameters on the first $M$ choices (most preferred stimuli), tending to down-weight the rest of the data. This strategy is roughly equivalent to assuming that a subject can only reliably render preference information about the brand most preferred, or the first $M$ preferred, while the other judgments may be too "noisy" to carry significant weight in estimating model parameters. While this "implicit weighting scheme" may be viable for some particular applications, it is certainly not realistic for many. Depending upon the nature of the study, questionnaire design, number of stimuli, discriminability between stimuli, type of subjects, knowledge of the stimuli by subjects, etc., different assumptions may be appropriate concerning how much error preference judgments contain. For example, in product testing for an automobile, a consumer may be able to render reliable judgments only on the first $M$ preferred automobiles. Or, it may be the case that the typical subject could provide reliable ratings for the $M_1$ most favored cars and for the $M_2$ least favored cars. That is, it is conceivable that a subject can reliably tell the interviewer what he/she likes and dislikes, but may have trouble rating the cars in between. Such data may require a quite different weighting scheme, if the aim is to weight more heavily the more reliable responses.

We propose a different approach to the degeneracy problem in unfolding, similar to that of DeSarbo and Carroll (1983). Our approach involves explicitly altering the loss function in expression (7) to incorporate data weights $\gamma_{ij}$, where the $\gamma_{ij}$ are defined by the user to weight the $\Delta_{ij}$ values differently. We share Heiser's (1981) implicit theory that a possible cause for degeneracy is the error or noise in the data, and we thus provide the user the flexibility of specifying the $\gamma_{ij}$ differently. For example, one may define the weights as:

$$\gamma_{ij} = \Delta_{ij}^{-p} \,, \qquad (8)$$


where $p$ is some exponent. Assuming the $\Delta_{ij}$ are dispreference data values, such a weighting scheme would weight smaller values (more preferred) higher than larger values (less preferred). In cases where specific preprocessing is involved (e.g., subtracting means) and negative values of $\Delta_{ij}$ appear, as with interval scale data, such a weighting function may not be meaningful. Here, for example, one could use:

$$\gamma_{ij} = \left[ r(\Delta_{ij}) \right]^{-p} \,, \qquad (9)$$

where $p$ is an exponent and $r(\Delta_{ij})$ represents the row ranks (from smallest $= 1$ to largest $= J$) of the $\Delta_{ij}$. Note, as $p \to \infty$, this (row) weighting scheme in (9) resembles Heiser's (1981) configuration restrictions ($M = 1$), since only first choices would be significantly weighted ($\gamma_{ij} = 1$ for first choices), while the rest of the $\gamma_{ij}$ would tend to zero. (It should be noted that this limiting case would not be precisely equivalent to Heiser's approach.) Other weighting options are also possible. For example, one could specify $\gamma_{ij} = 1$, $\forall i,j$, so that the "weighted" loss function reduces to the nonweighted one. Or, one could specify a bimodal or step weighting function where, say, the first three and last three choices would be highly weighted, and all others received low weights. The choice of the "appropriate" weighting function depends upon such factors as the preprocessing options and scale assumptions of the data, the assumptions about the conditionality of the data, the assumptions concerning the reliability of the different data values, and trial and error. As mentioned, if the data were treated as interval scale and preprocessed accordingly (e.g., by taking out particular row and/or column means), then specifying $\gamma_{ij}$ as in (8) may not make sense. Depending upon the conditionality assumption made for the data, it may or may not make sense to define $\gamma_{ij}$ via expression (9) over rows. Also, different $\gamma_{ij}$ could be specified depending upon the assumptions made concerning the reliability of the $\Delta_{ij}$ collected. As stated previously, if the researcher believes that only highly preferred judgments are reliably given, one could specify a unimodal $\gamma_{ij}$ function. On the other hand, if one believes that the $\Delta_{ij}$ can be reliably given for both highly preferred and highly non-preferred judgments, then a bimodal weighting function can be specified. While the first three criteria may provide insight into the general form of $\gamma_{ij}$ (e.g., unimodal vs. bimodal, or expression (8) vs. expression (9)), specific decisions, such as what value $p$ should take, can be made by trial and error, although, as will be demonstrated in the example to follow, $p = 2$ in expression (9) appears to work well.
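Reading (8) and (9) as inverse $p$-th power weights on the data values and on their within-row ranks, respectively, the following sketch (our own illustration; the names and the tie-free ranking shortcut are assumptions) shows both schemes and the limiting behavior just described:

```python
import numpy as np

rng = np.random.default_rng(2)
Delta = rng.uniform(1.0, 10.0, size=(4, 6))    # dispreference values (positive)
p = 2.0

# Scheme (8): weight the raw values -- smaller (more preferred) values
# receive larger weights.
gamma_raw = Delta ** (-p)

# Scheme (9): weight within-row ranks (1 = most preferred), which remains
# meaningful even when preprocessing has produced negative data values.
ranks = Delta.argsort(axis=1).argsort(axis=1) + 1.0   # row ranks, ties ignored
gamma_rank = ranks ** (-p)

# As p grows, only first choices keep non-negligible weight (gamma = 1 for
# rank 1), resembling Heiser's (1981) M = 1 configuration restriction.
print(np.round(np.arange(1.0, 7.0) ** (-10.0), 6))    # ranks 1..6 at p = 10
```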


We believe that the specification of $p$ is an empirical issue depending upon the data. One approach is to run the analyses with a sequence of different $p$'s ($p = 0, 1, 2, 3, 4$) and examine at what point degenerate solutions disappear. Another approach, worthy of further research, would be to estimate $p$ while fitting the model.

There are some twelve steps involved in the GENFOLD2 algorithm. Each step is described in turn.

3.3.1 Input

In addition to the two-way, two-mode data matrix of dispreference values $\boldsymbol{\Delta}$, the user must input various control parameter values designating: the number of dimensions ($T$) for the analysis, preprocessing options, method of generating starting values, type of unfolding model, type of data scale, type of analysis (external or internal), whether $\mathbf{Y}$ is constrained, whether $\mathbf{X}$ is constrained, the desired $c_i$ option, the desired $b_i$ option, the desired $a_i$ option, $\mathbf{W}^i$ constraints, maximum number of major iterations (MAJOR), maximum number of minor iterations (MINOR), convergence tolerance (TOL), maximum number of iterations for the nonmetric procedure (NMET), and type of weighting function. These input control values will be described in detail in the discussion of the main parts of the algorithm to follow.

3.3.2 Preprocessing

GENFOLD2 allows the user either to use the raw data or to utilize one of a variety of preprocessing options such as: row center $\boldsymbol{\Delta}$, row center and row standardize $\boldsymbol{\Delta}$, row and column center $\boldsymbol{\Delta}$, double center $\boldsymbol{\Delta}$ and row standardize, remove the geometric mean from rows, remove the geometric mean from columns, normalize columns to unit sums of squares, and normalize rows to unit sums of squares. Clearly, the choice of preprocessing option will depend on such factors as the assumption concerning the scale of the data, the units of measurement, conditionality assumptions, method of data collection, etc.

3.3.3 Generate Starting Values

GENFOLD2 has been programmed to allow for a variety of different starting values for $\mathbf{W}^i$, $\mathbf{X}$, $\mathbf{Y}$, etc. One can choose from the following starting procedures:

3.3.3.1 Random Start. Here, random and feasible values are generated for $\mathbf{W}^i$, $\mathbf{X}$ (or $\boldsymbol{\theta}$), and $\mathbf{Y}$ (or $\boldsymbol{\alpha}$) from a uniform distribution. The values, especially for $\mathbf{W}^i$, must be feasible (e.g., diagonal if $\mathbf{W}^i$ is constrained to be so) for the algorithms to function properly, as will be demonstrated. Also, $a_i = 1$, $b_i = 0$, and $c_i = 1$, $\forall i = 1, \ldots, I$, in this starting option.

3.3.3.2 External Analysis. Here, $\mathbf{X} = \mathbf{X}^*$ is given, having been derived through a scaling analysis of similarities or from some other method (e.g., from a hypothesis or theory). The $\mathbf{X}^*$ is fixed throughout the analysis. Values for the remaining parameters are generated randomly as above.

3.3.3.3 Values Given. GENFOLD2 also allows one to start the analysis with predetermined values for all of the parameters. This proves to be a valuable option when testing the goodness-of-fit and stability of some hypothesized solution or one obtained from some previous analysis. In this option, GENFOLD2 immediately proceeds to the estimation phases of the algorithm.

3.3.3.4 A "Close" Start on X. Using the procedure demonstrated by Ross and Cliff (1964), Schönemann (1970), and Carroll (1980), one can double center (or single center) $\boldsymbol{\Delta}$ and perform a singular value decomposition of this double (single) centered matrix:

$$\boldsymbol{\Delta} = \mathbf{P}\mathbf{D}\mathbf{Q}' \,, \qquad (10)$$

and let the starting value of $\mathbf{X}$ be $\mathbf{Q}\mathbf{D}$. Starting values for the remaining parameters are generated randomly as described above. Under certain restrictive assumptions demonstrated by Ross and Cliff (1964) and Schönemann (1970), this method of internally generating an $\mathbf{X}$ configuration will be "correct" up to an affine transformation. Note, one could also perform such a decomposition on the matrix $||\gamma_{ij}\Delta_{ij}||$ in order to take into account the specified weighting function.
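In numpy terms, this starting procedure amounts to a double centering followed by a truncated singular value decomposition. The sketch below is our own rendering of equation (10) with illustrative dimensions; under the Ross and Cliff (1964) assumptions, the rows of the result recover the stimulus configuration up to an affine transformation.

```python
import numpy as np

rng = np.random.default_rng(3)
Delta = rng.uniform(size=(30, 10))       # I x J dispreference matrix
T = 2

# Double center: remove row means and column means, add back the grand mean.
row = Delta.mean(axis=1, keepdims=True)
col = Delta.mean(axis=0, keepdims=True)
Dc = Delta - row - col + Delta.mean()

# Dc = P D Q' (eq. 10); take the first T right singular vectors scaled by
# their singular values as the starting stimulus configuration X = Q D.
P, d, Qt = np.linalg.svd(Dc, full_matrices=False)
X_start = Qt[:T].T * d[:T]               # J x T starting values for X
print(X_start.shape)                     # (10, 2)
```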

3.3.3.5 Close Values for Parameters. Assuming $c_i = 1$, $\forall i = 1, \ldots, I$, "close" values can be defined for the simple, weighted, and general unfolding models using regression techniques adapted from Carroll's (1980) PREFMAP2.

3.3.3.6 Some General Considerations. Each of these three "close starting" algorithms resembles the initial procedure utilized in PREFMAP and PREFMAP2 to derive $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{W}^i$ estimates. For the simple and weighted unfolding models, where a rotational problem exists, the PREFMAP and PREFMAP2 procedure then seeks an appropriate transformation (for orientation and/or weighting) of the coordinate system. In the weighted unfolding model, only the orientation of the axes needs to be determined, since the weighting is determined separately for each subject. For simple unfolding, however, both orientation and weighting are solved for.


(However, the transformation solved for in PREFMAP and PREFMAP2 is not optimal in a least-squares sense.) One interesting approach would be to use the final PREFMAP or PREFMAP2 solutions as starting values for GENFOLD2 -- work that is currently underway. Note that the methods of generating "close" starting values stated above ignore the weighting function $\gamma_{ij}$. Current efforts are progressing to have these weights built into the estimation procedure as an option; this would be done by simply multiplying the corresponding data in each case by $\gamma_{ij}^{1/2}$ and adjusting the estimates accordingly. Also, note that starting estimates for $\boldsymbol{\alpha}$ and/or $\boldsymbol{\theta}$ can be obtained after providing such starting values for $\mathbf{X}$ and $\mathbf{Y}$: one can obtain starting values for $\boldsymbol{\alpha}$ by regressing $\mathbf{Y}$ on $\mathbf{A}$, and similarly obtain starting values for $\boldsymbol{\theta}$ by regressing $\mathbf{X}$ on $\mathbf{B}$.

3.3.4 Estimate $\mathbf{W}^i$

There are a variety of options in GENFOLD2 for imposing particular constraints on $\mathbf{W}^i$. For example, if a version of the simple unfolding model were desired, $\mathbf{W}^i = \mathbf{I}$, an identity matrix, and the analysis would skip this phase of the algorithm for all subsequent iterations. Other major constraint options concern whether $\mathbf{W}^i = \mathbf{U}^i\mathbf{U}^{i\prime}$ is to be diagonal or generally symmetric. Note, by requiring that $\mathbf{W}^i = \mathbf{U}^i\mathbf{U}^{i\prime}$, we force $\mathbf{W}^i$ to be symmetric and at least positive semi-definite. Both properties make sense given the quadratic form of the squared distance $f_{ij}$ given in equation (5).
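A brief numerical check (ours, not from the program) of the property this parameterization guarantees: any $\mathbf{W}^i = \mathbf{U}^i\mathbf{U}^{i\prime}$ is symmetric and positive semi-definite, so the squared distances $f_{ij}$ can never go negative.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 3
U = rng.normal(size=(T, T))
W = U @ U.T                                      # W^i = U^i U^i'

print(np.allclose(W, W.T))                       # True: symmetric by construction
print(np.all(np.linalg.eigvalsh(W) >= -1e-12))   # True: positive semi-definite

d = rng.normal(size=T)                           # an arbitrary difference X_j - Y_i
print(d @ W @ d >= 0.0)                          # True: f_ij is nonnegative
```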

3.3.4.1 General Symmetric $\mathbf{W}^i$. For this selection, a general, unconstrained symmetric $I \times T \times T$ array ($\mathbf{W}$) is to be estimated. Since one can express $\mathbf{W}^i = \mathbf{U}^i\mathbf{U}^{i\prime}$, we calculate partial derivatives of the loss function in equation (7) with respect to $\mathbf{U}^i$:

$$\frac{\partial Z}{\partial \mathbf{U}^i} = -2 \sum_j \gamma_{ij}\,(\Delta_{ij} - \hat{\Delta}_{ij})\,\frac{\partial \hat{\Delta}_{ij}}{\partial \mathbf{U}^i} \qquad (11)$$

$$= -2\, a_i c_i^2 \sum_j \gamma_{ij}\,(\Delta_{ij} - \hat{\Delta}_{ij})\, f_{ij}^{c_i^2 - 1}\,(X_j - Y_i)'(X_j - Y_i)\,\mathbf{U}^i \,. \qquad (12)$$

These partial derivatives are used in a Quasi-Newton unconstrained algorithm, which builds an approximation to the Hessian matrix and thereby captures information about the curvature of the objective function. The algorithm used is a modification of the Davidon-Fletcher-Powell method (Davidon 1959; Fletcher and Powell 1963).


Let:

$r$ = $IT^2$;
$\mathbf{H}_k$ = an $r \times r$ positive definite symmetric matrix at the $k$-th iteration;
$h_k^*$ = the optimal step length at iteration $k$;
$\nabla Z_k$ = $\left[ \partial Z / \partial \mathbf{U} \right]$ = the gradient of the objective function at iteration $k$;
$\mathbf{S}_k$ = the search direction at iteration $k$.

The steps of the iterative algorithm used are as follows:

1. Start with given values $(\mathbf{U})_1$ and an $r \times r$ positive definite symmetric matrix $\mathbf{H}_1 = \mathbf{I}$ (identity matrix) initially. Set $k = 1$.

2. Compute $\nabla Z_k$ at the point $(\mathbf{U})_k$ and set:

$$\mathbf{S}_k = -\mathbf{H}_k \nabla Z_k \,. \qquad (13)$$

Note that for the first iteration, the search direction will be the same as the steepest descent direction, $-\nabla Z_1$, given $\mathbf{H}_1 = \mathbf{I}$.

3. Find the optimal step length $h_k^*$ in the direction $\mathbf{S}_k$. This is done through use of a quadratic interpolation search procedure (see Cooper and Steinberg 1972). Then we set:

$$(\mathbf{U})_{k+1} = (\mathbf{U})_k + h_k^* \mathbf{S}_k \,. \qquad (14)$$

4. This new solution $(\mathbf{U})_{k+1}$ is tested for optimality and against the maximum number of minor iterations. That is, we see if: (a) $(Z_k - Z_{k+1}) <$ TOL; or (b) the number of minor iterations exceeds MINOR. If either of these two conditions holds, this procedure is terminated. If neither holds, we proceed to step (5).

5. Update the $\mathbf{H}$ matrix as:

$$\mathbf{H}_{k+1} = \mathbf{H}_k + \mathbf{M}_k + \mathbf{N}_k \,, \qquad (15)$$

where:

$$\mathbf{M}_k = h_k^*\, \frac{\mathbf{S}_k \mathbf{S}_k'}{\mathbf{S}_k' \mathbf{Q}_k} \,, \qquad (16)$$

$$\mathbf{N}_k = - \frac{(\mathbf{H}_k \mathbf{Q}_k)(\mathbf{H}_k \mathbf{Q}_k)'}{\mathbf{Q}_k' \mathbf{H}_k \mathbf{Q}_k} \,, \qquad (17)$$

$$\mathbf{Q}_k = \nabla Z_{k+1} - \nabla Z_k \,. \qquad (18)$$

6. Set $k = k + 1$ and go to step (2).
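For readers who want to see steps 1-6 end to end, here is a compact, self-contained version of the update scheme (our own sketch, not GENFOLD2 code). A simple backtracking rule stands in for the quadratic interpolation line search of step 3, and a small quadratic test problem replaces the actual GENFOLD2 loss:

```python
import numpy as np

def dfp_minimize(f, grad, u0, tol=1e-8, max_iter=200):
    """Quasi-Newton minimization following steps 1-6 above (DFP update)."""
    u = u0.astype(float)
    H = np.eye(u.size)                   # step 1: H_1 = I, so the first
    g = grad(u)                          # direction is steepest descent
    for _ in range(max_iter):
        S = -H @ g                       # eq. (13): search direction
        h, fu = 1.0, f(u)                # backtracking stand-in for the
        while f(u + h * S) > fu + 1e-4 * h * (g @ S):   # quadratic search
            h *= 0.5
            if h < 1e-12:
                return u
        u_new = u + h * S                # eq. (14)
        if fu - f(u_new) < tol:          # step 4: convergence test (TOL)
            return u_new
        g_new = grad(u_new)
        Q = g_new - g                    # eq. (18)
        M = h * np.outer(S, S) / (S @ Q)                 # eq. (16)
        HQ = H @ Q
        N = -np.outer(HQ, HQ) / (Q @ HQ)                 # eq. (17)
        H = H + M + N                    # eq. (15): DFP update of H
        u, g = u_new, g_new
    return u

# Small quadratic test problem: minimize 0.5 u'Au - b'u, solution of A u = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda u: 0.5 * u @ A @ u - b @ u
grad = lambda u: A @ u - b
print(np.round(dfp_minimize(f, grad, np.array([5.0, 5.0])), 3))  # ~[ 0.6 -0.8]
```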

Gill, Murray, and Wright (1981) provide a derivation of this procedure as well as its convergence properties. Note, we could have easily performed this same algorithm on $\mathbf{W}^i$ instead of on $\mathbf{U}^i$. However, letting $\mathbf{W}^i = \mathbf{U}^i\mathbf{U}^{i\prime}$ allows us to convert a constrained problem (symmetry and positive (semi-) definiteness) into an easier unconstrained problem. The use of this Quasi-Newton method has been favorably compared with other gradient search procedures such as steepest descent and conjugate gradient methods (Himmelblau 1972). It was found empirically that the approximate second-derivative information can aid in speeding up convergence, especially near the optimal solution. In addition, since the first step of this algorithm is a steepest descent search, one can take advantage of steepest descent when initially far away from the optimal solution (empirical research demonstrates that steepest descent is best used in early iterations, when far from the optimal solution).

3.3.4.2 Diagonal $\mathbf{W}^i$. The minimization problem to be solved here is to

minimize:

$$Z = \sum_{i=1}^{I} \sum_{j=1}^{J} \gamma_{ij}\,(\Delta_{ij} - \hat{\Delta}_{ij})^2 \qquad (19)$$

subject to:

(i) $w_{ss}^i \geq 0$, $\forall i = 1, \ldots, I$; $s = 1, \ldots, T$; and

(ii) $w_{rs}^i = 0$, $\forall r \neq s$; $\forall i = 1, \ldots, I$.

While algorithms exist (e.g., Lawson and Hanson 1974) to solve such an equality- and inequality-constrained, nonlinear optimization problem using Kuhn-Tucker conditions, the computational cost of such procedures becomes enormous for large $IT$. One alternative procedure is to re-express $\mathbf{W}^i = \mathbf{U}^i\mathbf{U}^{i\prime}$ to enforce symmetry and nonnegative definiteness. This decomposition is defined (see Green 1978) for all nonnegative definite symmetric matrices via the singular value decomposition principle for symmetric matrices. Thus, we can convert the problem to one merely involving equality constraints, which is much easier to solve. For this problem, a projected Quasi-Newton method is used. Let:

$$\mathbf{A}^* = \begin{bmatrix} \dfrac{\partial g_1(\mathbf{h}^{(k)})}{\partial h_1} & \cdots & \dfrac{\partial g_1(\mathbf{h}^{(k)})}{\partial h_b} \\ \vdots & & \vdots \\ \dfrac{\partial g_a(\mathbf{h}^{(k)})}{\partial h_1} & \cdots & \dfrac{\partial g_a(\mathbf{h}^{(k)})}{\partial h_b} \end{bmatrix} \qquad (20)$$

be the $a \times b$ matrix of partial derivatives of the (active) equality constraints, where:

$g_s$ = the $s$-th equality constraint equation, $s = 1, \ldots, IT(T-1)$ constraints;
$\mathbf{h}^{(k)}$ = the vector of parameters to be estimated, whose elements are the $b = IT^2$ elements of $\mathbf{U}$;
$k$ = the $k$-th iteration.

Here, there are $IT(T-1)$ equality (active) constraints for the off-diagonal elements of $\mathbf{U}$. One of the advantages of converting the problem into a purely linearly constrained one is that the matrix $\mathbf{A}^*$ and the associated projection matrix:

$$\mathbf{B}^* = \mathbf{I} - \mathbf{A}^{*\prime}(\mathbf{A}^*\mathbf{A}^{*\prime})^{-1}\mathbf{A}^* \qquad (21)$$

need only be calculated once in the entire program, since all the constraints are linear equality constraints (no inequality constraints) and are therefore, by definition, "active" or binding at every iteration in this phase of estimation. With $\mathbf{B}^*$ computed, one can define the projected gradient as:

$$\nabla Z_k^* = \mathbf{B}^* \nabla Z_k \,, \qquad (22)$$

and the projected Hessian approximation as:

$$\mathbf{H}_k^* = \mathbf{B}^{*\prime} \mathbf{H}_k \mathbf{B}^* \,. \qquad (23)$$

These are then substituted into the algorithm above for $\nabla Z$ and $\mathbf{H}$ respectively, and the same general procedure is utilized. Gill, Murray, and Wright (1981) suggest alternative reduced-space procedures to obtain $\mathbf{B}^*$ which might prove more efficient; current work is being performed in this area.
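For a single subject with $T = 2$, the constraint set and projection of equations (20)-(22) can be written out explicitly. In the sketch below (ours; the gradient values are arbitrary), $\mathbf{h} = (u_{11}, u_{12}, u_{21}, u_{22})$ and the two active constraints zero the off-diagonal elements:

```python
import numpy as np

T = 2
b = T * T                                # parameters: elements of U for one subject

# Each row of A* is the (constant) gradient of one linear equality constraint
# g_s(h) = 0; here the constraints are h[1] = 0 and h[2] = 0 (off-diagonals).
A_star = np.array([[0.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])

# eq. (21): projection matrix onto the null space of the constraints,
# computed once since all constraints are linear and always active.
B_star = np.eye(b) - A_star.T @ np.linalg.inv(A_star @ A_star.T) @ A_star

grad = np.array([0.7, -1.3, 2.1, 0.4])   # an arbitrary unconstrained gradient
print(B_star @ grad)                     # eq. (22): [0.7, 0.0, 0.0, 0.4]
```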

3.3.5 Estimate $\boldsymbol{\alpha}$ and/or $\mathbf{Y}$

3.3.5.1 $\mathbf{Y}$ Unconstrained. The optimization problem at this phase involves attempting to minimize the loss function in (7) with respect to $\mathbf{Y}$. Taking partial derivatives:

$$\frac{\partial Z}{\partial Y_i} = -2 \sum_j \gamma_{ij}\,(\Delta_{ij} - \hat{\Delta}_{ij})\,\frac{\partial \hat{\Delta}_{ij}}{\partial Y_i} = -4\, a_i c_i^2 \sum_j \gamma_{ij}\,(\Delta_{ij} - \hat{\Delta}_{ij})\, f_{ij}^{c_i^2 - 1}\,(\mathbf{W}^i Y_i' - \mathbf{W}^i X_j') \,, \qquad (24)$$

the unconstrained Quasi-Newton algorithm described in 3.3.4.1 is utilized to estimate $\mathbf{Y}$.

3.3.5.2 $\mathbf{Y} = \mathbf{A}\boldsymbol{\alpha}$. Given an $I \times L$ ($L < I$) matrix of background descriptor information which presumably contains relevant data to fully describe the behavior of $y_{it}$, we attempt to minimize the loss function in (7) with respect to $\boldsymbol{\alpha}$. Taking partial derivatives:

$$\frac{\partial Z}{\partial \boldsymbol{\alpha}} = -2 \sum_i \sum_j \gamma_{ij}\,(\Delta_{ij} - \hat{\Delta}_{ij})\,\frac{\partial \hat{\Delta}_{ij}}{\partial \boldsymbol{\alpha}} = 4 \sum_i \sum_j \gamma_{ij}\,(\Delta_{ij} - \hat{\Delta}_{ij})\, a_i c_i^2 f_{ij}^{c_i^2 - 1}\,(A_i' X_j \mathbf{W}^i - A_i' A_i \boldsymbol{\alpha} \mathbf{W}^i) \,, \qquad (25)$$


where $A_i$ is the $i$-th row of $\mathbf{A}$, the unconstrained Quasi-Newton algorithm described in 3.3.4.1 is utilized to estimate $\boldsymbol{\alpha}$. Once $\boldsymbol{\alpha}$ is estimated, $\mathbf{Y}$ is updated via $\mathbf{Y} = \mathbf{A}\boldsymbol{\alpha}$.

3.3.6 Estimate $\boldsymbol{\theta}$ and/or $\mathbf{X}$

3.3.6.1 $\mathbf{X}$ Unconstrained. Now, the optimization problem reduces to one involving attempting to minimize (7) with respect to $\mathbf{X}$. Taking partial derivatives:

$$\frac{\partial Z}{\partial X_j} = -2 \sum_i \gamma_{ij}\,(\Delta_{ij} - \hat{\Delta}_{ij})\,\frac{\partial \hat{\Delta}_{ij}}{\partial X_j} = -4 \sum_i \gamma_{ij}\,(\Delta_{ij} - \hat{\Delta}_{ij})\, a_i c_i^2 f_{ij}^{c_i^2 - 1}\,(\mathbf{W}^i X_j' - \mathbf{W}^i Y_i') \,, \qquad (26)$$

the unconstrained Quasi-Newton algorithm described in 3.3.4.1 is utilized to obtain an estimate of $\mathbf{X}$.

3.3.6.2 $\mathbf{X} = \mathbf{B}\boldsymbol{\theta}$. Given a $J \times K$ matrix of descriptor variables describing the important aspects of the stimuli, we attempt to minimize the loss function in (7) with respect to $\boldsymbol{\theta}$. Taking partial derivatives:

$$\frac{\partial Z}{\partial \boldsymbol{\theta}} = -2 \sum_i \sum_j \gamma_{ij}\,(\Delta_{ij} - \hat{\Delta}_{ij})\,\frac{\partial \hat{\Delta}_{ij}}{\partial \boldsymbol{\theta}} = 4 \sum_i \sum_j \gamma_{ij}\,(\Delta_{ij} - \hat{\Delta}_{ij})\, a_i c_i^2 f_{ij}^{c_i^2 - 1}\,(B_j' Y_i \mathbf{W}^i - B_j' B_j \boldsymbol{\theta} \mathbf{W}^i) \,, \qquad (27)$$

where $B_j$ is the $j$-th row of $\mathbf{B}$, the unconstrained Quasi-Newton algorithm described in 3.3.4.1 is utilized to obtain an estimate of $\boldsymbol{\theta}$. Once an estimate of $\boldsymbol{\theta}$ is obtained, $\mathbf{X}$ is updated via $\mathbf{X} = \mathbf{B}\boldsymbol{\theta}$.

3.3.7 Estimate $c_i^2$

Assuming metric (ratio or interval scaled) data, several options exist, as discussed, concerning the estimation of $c_i^2$. The simplest option is to constrain $c_i^2 = c = 1$, $\forall i = 1, \ldots, I$, in which case this phase is ignored. Two other major options exist.

3.3.7.1 $c_i$ Unconstrained. The optimization subproblem at this stage entails the estimation of $c_i$, $\forall i = 1, \ldots, I$, so as to minimize the loss function in (7). Taking partial derivatives:

$$\frac{\partial Z}{\partial c_i} = -2 \sum_j \gamma_{ij}\,(\Delta_{ij} - \hat{\Delta}_{ij})\,\frac{\partial \hat{\Delta}_{ij}}{\partial c_i} = -4\, a_i c_i \sum_j \gamma_{ij}\,(\Delta_{ij} - \hat{\Delta}_{ij})\, f_{ij}^{c_i^2} \ln f_{ij} \,, \qquad (28)$$

the unconstrained Quasi-Newton method described in 3.3.4.1 is utilized iteratively to provide an estimate of $c_i$.

3.3.7.2 $c_i = c$, a constant, $\forall i = 1, \ldots, I$. Here we constrain $c_1^2 = c_2^2 = \cdots = c_I^2 = c^2$, a constant which we attempt to solve for in order to minimize (7). Taking partial derivatives:

$$\frac{\partial Z}{\partial c} = -2 \sum_i \sum_j \gamma_{ij}\,(\Delta_{ij} - \hat{\Delta}_{ij})\,\frac{\partial \hat{\Delta}_{ij}}{\partial c} = -4 \sum_i \sum_j \gamma_{ij}\,(\Delta_{ij} - \hat{\Delta}_{ij})\, a_i c\, f_{ij}^{c^2} \ln f_{ij} \,, \qquad (29)$$

the same unconstrained Quasi-Newton method is iteratively applied.

3.3.8 Estimate $b_i$ and/or $a_i$

Assuming metric data, there are nine possible options in GENFOLD2 concerning the estimation of combinations of additive and multiplicative constants. Note, only when $\mathbf{W}^i = \mathbf{I}$, the identity matrix, $\forall i = 1, \ldots, I$, does it make sense to estimate an $a_i$, since its effect would otherwise be absorbed in a general or diagonal $\mathbf{W}^i$.

3.3.8.1 $a_i = 1$, $b_i = 0$, $\forall i = 1, \ldots, I$. In this trivial case where there is no additive constant and all multiplicative constants equal one, there is obviously no need to estimate anything.

3.3.8.2 $a_i = 1$, $b_i = b$, $\forall i = 1, \ldots, I$. Let:

$m_{ij}$ = $[(X_j - Y_i)\,\mathbf{W}^i\,(X_j - Y_i)']^{c_i^2} = f_{ij}^{c_i^2}$;
$m_{ij}^*$ = $m_{ij} \times \gamma_{ij}^{1/2}$;
$\mathbf{M}^*$ = a vector (of length $IJ$) stringing out the $m_{ij}^*$;
$\boldsymbol{\gamma}^{1/2}$ = a vector (of length $IJ$) stringing out the $\gamma_{ij}^{1/2}$;
$\boldsymbol{\Delta}$ = a vector (of length $IJ$) stringing out the $\Delta_{ij}$;
$\boldsymbol{\Delta}^*$ = $\boldsymbol{\Delta} \times \boldsymbol{\gamma}^{1/2}$ (elementwise);
$\mathbf{N}$ = $[\boldsymbol{\gamma}^{1/2}, \mathbf{M}^*]$ = an $IJ \times 2$ matrix whose first column consists of $\boldsymbol{\gamma}^{1/2}$ and whose second column is $\mathbf{M}^*$.

Then, unconstrained estimates for $a$ and $b$ that minimize the loss function in equation (7) can be solved for via:

$$\mathbf{F} = \begin{bmatrix} b \\ a \end{bmatrix} = (\mathbf{N}'\mathbf{N})^{-1}\mathbf{N}'\boldsymbol{\Delta}^* \,. \qquad (30)$$

(30)

If a is not equal to one which in mostly every instance will be the case, then a constrained regression analysis is performed forcing a =-- 1. This is done first by defining:

[01 and estimating the constrained coefficients via (Johnston 1972):

[ab} -- r + t(N'N)-IR'tR(N'N)-IR']-I(r - RF)],

(31)

3.3.8.3 $a_i = 1$, $b_i$ unconstrained, $\forall i = 1, \ldots, I$. Let:

$\mathbf{M}_i^*$ = a vector of length $J$ for subject $i$ containing the elements $m_{ij}^*$, $j = 1, \ldots, J$;
$\boldsymbol{\Delta}_i^*$ = a vector of length $J$ containing the (weighted) preference values for the $J$ stimuli for the $i$-th person;
$\mathbf{N}_i$ = $[\boldsymbol{\gamma}_i^{1/2}, \mathbf{M}_i^*]$ = a $J \times 2$ matrix for the $i$-th subject whose first column consists of the $\gamma_{ij}^{1/2}$ and whose second column is $\mathbf{M}_i^*$.

Then, for each subject $i$, $i = 1, \ldots, I$, unconstrained estimates for $a_i$ and $b_i$ that minimize the loss function (7) are obtained via:

$$\mathbf{F}_i = \begin{bmatrix} b_i \\ a_i \end{bmatrix} = (\mathbf{N}_i'\mathbf{N}_i)^{-1}\mathbf{N}_i'\boldsymbol{\Delta}_i^* \,. \qquad (32)$$



If $a_i \neq 1$, then a constrained regression analysis is performed for each subject where $a_i \neq 1$, constraining $a_i = 1$. This is done in the same manner as in (31), where $\mathbf{N}_i$ and $\mathbf{F}_i$ replace $\mathbf{N}$ and $\mathbf{F}$ respectively.

3.3.8.4 $a_i = a$, $b_i = 0$, $\forall i = 1, \ldots, I$. Using the notation above, we can estimate a multiplicative constant for the entire sample of subjects, assuming no additive constant, that minimizes the loss function (7) via:

$$a = (\mathbf{M}^{*\prime}\mathbf{M}^*)^{-1}\mathbf{M}^{*\prime}\boldsymbol{\Delta}^* \,, \qquad (33)$$

without estimating an intercept term.

3.3.8.5 $a_i = a$, $b_i = b$, $\forall i = 1, \ldots, I$. Estimates of $a$ and $b$ that minimize (7) can be obtained via:

$$\begin{bmatrix} b \\ a \end{bmatrix} = (\mathbf{N}'\mathbf{N})^{-1}\mathbf{N}'\boldsymbol{\Delta}^* \,. \qquad (34)$$

3.3.8.6 $a_i = a$, $b_i$ unconstrained, $\forall i = 1, \ldots, I$. Estimation of $a$ and $b_i$ that minimize (7) is performed in two stages. In stage 1, $a$ is estimated for the entire sample using equation (33). Once estimated, it is embedded into the $\mathbf{W}^i$'s via simple multiplication and reset to 1. Then, a constrained regression analysis is performed to estimate $b_i$ with $a_i = 1$ via the technique discussed above. Once this is done, the $a$ is extracted from the $\mathbf{W}^i$'s via division and $a_i$ is set equal to $a$, redefining the $\mathbf{W}^i$'s.

3.3.8.7 $a_i$ unconstrained, $b_i = 0$, $\forall i = 1, \ldots, I$. Here, estimates of $a_i$ which minimize (7) can be obtained via:

$$a_i = (\mathbf{M}_i^{*\prime}\mathbf{M}_i^*)^{-1}\mathbf{M}_i^{*\prime}\boldsymbol{\Delta}_i^* \,, \qquad (35)$$

e.g., performing $I$ separate regressions without an intercept term.

3.3.8.8 $a_i$ unconstrained, $b_i = b$, $\forall i = 1, \ldots, I$. Estimation of $a_i$ and $b$ which minimize (7) is performed in two steps. In step 1, $a_i$ is estimated via equation (35) without an intercept term. In step 2, the $a_i$'s are absorbed into the $\mathbf{W}^i$'s, and a constrained regression analysis is performed for the entire sample to estimate $b$ via equation (31), constraining $a = 1$. The $a_i$'s are then extracted from the $\mathbf{W}^i$'s (redefining the $\mathbf{W}^i$'s).


3.3.8.9 $a_i$ unconstrained, $b_i$ unconstrained. Estimates of $a_i$ and $b_i$ which minimize (7) are obtained by performing $I$ separate regressions of the form:

$$\begin{bmatrix} b_i \\ a_i \end{bmatrix} = (\mathbf{N}_i'\mathbf{N}_i)^{-1}\mathbf{N}_i'\boldsymbol{\Delta}_i^* \,. \qquad (36)$$
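Option 3.3.8.9 is computationally the simplest of the nine: one two-parameter weighted regression per subject. A sketch of (36) on synthetic, noise-free inputs (all names ours):

```python
import numpy as np

rng = np.random.default_rng(5)
I, J = 6, 10

for i in range(I):                        # I separate regressions, one per subject
    g = rng.uniform(0.5, 1.5, size=J)     # gamma_ij^(1/2) for subject i
    m = rng.uniform(1.0, 5.0, size=J)     # m*_ij for subject i
    N_i = np.column_stack([g, m])
    Delta_i = N_i @ np.array([1.0 + 0.1 * i, 0.5 + 0.2 * i])   # synthetic data
    b_i, a_i = np.linalg.solve(N_i.T @ N_i, N_i.T @ Delta_i)   # eq. (36)
    print(i, round(b_i, 3), round(a_i, 3))                     # recovers b_i, a_i
```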

3.3.9 Termination Criteria

The algorithm terminates when either one of two conditions holds:

a. the number of major iterations exceeds MAJOR; or,

b. $|Z_k - Z_{k+1}| <$ TOL, i.e., the improvement in the loss function between successive major iterations falls below the convergence tolerance.
