Framing, Context and Value Averaging

Khaled Boughanmi∗, Kamel Jedidi†, and Rajeev Kohli‡
Graduate School of Business, Columbia University


§ We thank Ran Kivetz and Oded Netzer for kindly providing the data from their 2004 study, and Robert Rooderkerk for the data from his 2010 study.


Abstract

We introduce a probabilistic choice model that captures the effects of framing and context on choice, allowing violations of regularity, independence of irrelevant alternatives and order independence. The proposed model represents the value of an alternative as a generalized mean of its attribute values. A single parameter determines the type of mean. Its value can change across decision frames and choice sets. A positive/negative parameter value corresponds to a positive/negative evaluation frame, and a more negative parameter value reflects greater aversion to extreme outcomes. The multinomial logit model is obtained as a special case when the averaging parameter has a zero value. Limiting cases of the model correspond to lexicographic rules by which the value of an alternative is determined solely by its best or worst attribute. We use data from published studies to illustrate how the model captures the effects of context and framing on choice. We describe an application concerning digital cameras, and discuss implications for product design, product positioning and demand forecasting.

Keywords: Regularity violations, context effects, framing effects, generalized mean, discrete choice models.


INTRODUCTION

The failure to construct a canonical representation in decision problems contrasts with other cognitive tasks in which such representations are generated automatically and effortlessly. In particular, our visual experience consists largely of canonical representations: objects do not appear to change in size, shape, brightness or color, when we move around them or when illumination varies. When we look at a white circle from a sharp angle in dim light, we see it as circular and white, not as ellipsoid and grey. Canonical representations are also generated in the process of language comprehension, where listeners quickly recode much of what they hear into an abstract propositional form that no longer discriminates, for example, between the active and the passive voice and often does not distinguish what was actually said from what was implied or presupposed (Clark and Clark 1977). Unfortunately, the mental machinery that transforms percepts and sentences into standard forms does not automatically apply to the objects of choice. — Tversky and Kahneman (1986)

An important reason for the absence of a canonical representation for decision problems is that context and framing can substantially alter a person’s choices. Amartya Sen (1997) described several different ways in which context can influence choice: menu dependence (which includes similarity, attraction and compromise effects), social norms, others’ choices, the extent to which one’s actions can be observed by others, and the fiduciary responsibility


people assume for the well-being of others (see section 2). Context effects can result in the violation of regularity, a seemingly mild condition that requires the choice probability of an alternative to not increase upon the expansion of a choice set. Context effects can also cause violations of independence of irrelevant alternatives (IIA) and order independence (e.g., Tversky, 1972, Huber et al., 1982, Redelmeier and Shafir, 1995, Sedikides et al., 1999). Framing effects can be even more dramatic, because the same alternatives can be evaluated in different ways when a person focuses on the desirable or undesirable consequences, or the positive or negative outcomes, associated with their choices (e.g., survival or mortality rates associated with a treatment).

Models for several context effects have been described in the literature. One of these, proposed by Tversky and Simonson (1993), uses contingent weighting and binary comparisons to capture the effect of background context (previously considered alternatives) and local context (alternatives in a choice set) on choice. Kivetz et al. (2004a,b) described models representing compromise, asymmetric dominance, attraction and detraction effects. A model by Rooderkerk et al. (2011) uses context-free and context-dependent utility components to capture compromise, attraction, and similarity effects on choice. Tversky's (1972) elimination by aspects (EBA) allows violations of order independence; and multinomial probit and nested logit models allow violations of IIA, which is a special case of order independence. However, no random utility model can accommodate regularity violations or the effect of framing on choices.

We propose a multi-attribute model that captures the effects of framing and context on choice. It allows violations of regularity, independence of irrelevant alternatives and order independence. The proposed model considers the value of an alternative to be a

generalized mean of the values associated with its attributes. A single parameter determines the type of mean. The value of the generalized mean increases with the parameter value, which can depend on the framing and context of decisions. A positive/negative parameter value corresponds to a positive/negative evaluation frame. A more negative parameter value reflects greater aversion to extreme outcomes and thus a greater tendency towards choosing a compromise alternative from a choice set. The multinomial logit model is obtained as a special case when the averaging parameter has a zero value.

Other values of the averaging parameter yield other types of means. For example:

(1) The harmonic mean is obtained when the averaging parameter has a value of −1. In this case, the value of an alternative is skewed towards the smallest of its attribute values.

(2) The arithmetic mean is obtained when the averaging parameter has a value of +1. In this case, the value of an alternative is skewed towards the largest of its attribute values.

(3) As the averaging parameter becomes arbitrarily small (large), the value of an alternative becomes equal to the smallest (largest) of its attribute values. That is, the limiting cases of the model correspond to lexicographic evaluations of alternatives based solely on their best or worst attributes.

The proposed model allows evaluations of alternatives to depend on the deviations of their attribute values from unobserved reference levels. This captures the effect of local context (alternatives in a choice set) on choice (Tversky and Simonson, 1993).


Section 2 discusses previous research concerning violations of regularity and other related effects. Section 3 describes the proposed model. Section 4 uses examples from previous research to illustrate how the proposed model captures these effects. The averaging parameter has a negative value in these examples, suggesting a tendency by subjects to overweigh the negative features when evaluating an alternative. We use the proposed model to analyze choice data for digital cameras, allowing the averaging parameter to vary by choice-set size and individuals. The proposed model substantially outperforms a mixed logit model and captures a regularity violation in which the no-choice probability first decreases, and then increases, with the size of the choice set. We discuss the implications of the proposed model for new product design, product positioning and demand forecasting.

BACKGROUND

Framing effects

Tversky and Kahneman (1981, p. 453) describe framing as “the conceptions of acts, outcomes and contingencies associated with a particular choice.” It is possible to frame a decision in more than one way: its effectiveness or ineffectiveness in achieving a goal; or its desirable or undesirable consequences. Social norms, personal disposition, habits and mood can influence a decision frame. It is not necessary for a person to always frame the same decision in the same way. The literature documents numerous examples of framing: awarding or denying custody of a child to a parent (Shafir et al., 1993); assessing a public health initiative by the lives it saves, or fails to save (Tversky and Kahneman, 1981); focusing on survival or mortality rates when assessing treatments for lung cancer (McNeil et al., 1982, Tversky and Kahneman, 1986).

Framing can also happen in subtler ways. Sen (1997) noted that joining a herd makes the choice act less assertive and perspicuous, and thus changes the choice probability. You may prefer red meat to salmon, yet choose the latter because you don’t want to stand out at a party at which everyone chooses salmon over red meat. You may not join a protest on your own, but if you knew that enough people were demonstrating against a government, you may do so, too.

Framing effects cannot be directly captured by a random utility model, which specifies the same choice probability for an alternative in all situations. Framing changes choice probabilities; Tversky and Kahneman (1986) called this the failure of invariance. They noted that invariance fails when there is no standard canonical representation of a decision. Although canonical representations are often generated in visual perception and language comprehension, they do not appear automatically in choice.

Context effects

Sen (1997) observed that sometimes choice is affected by such characteristics as the number and types of available alternatives. Other times, it depends upon the choices others make, the visibility of our choices to others, social norms and our concern for and responsibility in placing another person’s well-being above our own. These considerations can result in violations of regularity, IIA and order independence.

IIA and order independence

Let $C_1$ and $C_2$ denote two choice sets, each with at least two alternatives. Let $P_{ij}$ denote the probability that alternative $j$ is selected from choice set $C_i$, $i = 1, 2$. Suppose $j$ and $k$ are two alternatives that appear in both $C_1$ and $C_2$. IIA requires that $P_{ij}/P_{ik}$ have the same value in both choice sets, $C_1$ and $C_2$. A less stringent condition, called order independence, requires that $P_{2j} \geq P_{2k}$ if $P_{1j} \geq P_{1k}$. Multinomial probit and nested logit models allow IIA violations. Tversky's (1972) elimination by aspects (and its special case, preference trees) further allows violations of order independence.

Tversky (1972) observed that these conditions can be violated when a new alternative is more similar to some already available alternatives than others. Experiments demonstrating violations of order independence were reported in the 1960s by Becker et al. (1963), Chipman (1960), Coombs (1958), Krantz (1967), and Tversky and Russo (1969). In each case, order independence is violated upon the introduction of alternatives that are closer substitutes for some existing alternatives but not others.
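The two conditions can be checked mechanically once choice probabilities for two choice sets are available. The following is a minimal sketch in Python; the function names and the probabilities are hypothetical and serve only to exercise the definitions, they are not data from any study cited here.

```python
from itertools import combinations, permutations


def satisfies_iia(p1, p2, tol=1e-9):
    """True if P1j/P1k equals P2j/P2k for every pair j, k present in both sets."""
    common = sorted(set(p1) & set(p2))
    # Cross-multiply instead of dividing so that zero probabilities cause no error.
    return all(abs(p1[j] * p2[k] - p1[k] * p2[j]) <= tol
               for j, k in combinations(common, 2))


def satisfies_order_independence(p1, p2, tol=1e-9):
    """True if P1j >= P1k implies P2j >= P2k for every ordered pair in both sets."""
    common = sorted(set(p1) & set(p2))
    return all(not (p1[j] >= p1[k] - tol and p2[j] < p2[k] - tol)
               for j, k in permutations(common, 2))


# Hypothetical probabilities (assumptions for illustration only).
p_small = {"a": 0.6, "b": 0.4}                 # choice set C1 = {a, b}
p_scaled = {"a": 0.3, "b": 0.2, "c": 0.5}      # ratio a:b preserved
p_shifted = {"a": 0.35, "b": 0.25, "c": 0.40}  # ratio changes, order preserved
p_reversed = {"a": 0.2, "b": 0.3, "c": 0.5}    # order of a and b reverses

print(satisfies_iia(p_small, p_scaled), satisfies_order_independence(p_small, p_scaled))
print(satisfies_iia(p_small, p_shifted), satisfies_order_independence(p_small, p_shifted))
print(satisfies_iia(p_small, p_reversed), satisfies_order_independence(p_small, p_reversed))
```

The middle case shows why order independence is the weaker condition: the ratio of the two probabilities changes, so IIA fails, but their ranking is preserved.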

Regularity

Order independence and IIA concern relations between the choice probabilities for pairs of alternatives. Regularity concerns the change in choice probability for a single alternative. It requires that the choice probability of an alternative should not increase when more alternatives become available to a person; that is, $P_{1j} \geq P_{2j}$ when $C_1 \subset C_2$. Regularity and order independence are not directly related, because the violation of one does not imply the violation of the other. However, the violation of either condition implies the violation of IIA.

Sen (1997) discussed how regularity can be violated for several reasons: menu dependence, herd behavior, chooser dependence and responsibility. Some, but not all, of these have been previously discussed in the marketing literature. We briefly discuss each.


Menu dependence. Regularity can be violated if choice probabilities change with the size of a choice set or the types of alternatives it contains. A larger choice set can reduce the probability of no-choice when a person values having more alternatives. For example, the act of choice can itself have value that is independent of the values of the alternatives: we may freely choose an alternative, yet refuse to accept it when an authoritarian government assigns it to us with no other choice. Conversely, too many options can make it harder to choose. As more alternatives become available, a person may defer or avoid making a choice (Iyengar and Lepper, 2000). The resulting increase in the no-choice probability is a type of regularity violation. For example, Redelmeier and Shafir (1995) showed that doctors avoided having to choose among an increased number of drugs by recommending a patient for treatment to another doctor.

Menu dependence also occurs when a person uses simplifying heuristics, or restricts attention to a subset of alternatives that are more easily compared. Redelmeier and Shafir (1995) showed that doctors avoided the difficult task of comparing two similar patients by recommending that a third dissimilar patient be selected for earlier surgery. On the other hand, Huber et al. (1982) found that the inclusion of similar alternatives in a choice set can make it appear as if these are more popular and thus more desirable. Huber et al. (1982) and Tentori et al. (2001) reported such regularity violations for beer, cars, television sets, films, restaurants and lottery tickets. Simonson (1989) conducted experiments in which the probability of choosing one of two apartments, and one of two calculator batteries, increased upon the introduction of a third extreme alternative.

Sen (1997) observed that menu dependence can occur when the addition of an alternative provides new information not about the alternatives but the choice context. In his example, suppose you have a choice between going home or having tea with a colleague. You might choose to have tea. But suppose the colleague offered a choice between having tea or some heroin and cocaine. You may then prefer to go home, because you have learnt something that could affect your decision even to have tea with your colleague.

Chooser dependence. Social convention or concern for others can influence choice. In one example, Sen (1997) considers a fruit basket with one mango and two apples. You prefer mangoes to apples, yet choose the apple because you want to leave the mango for the other person. But if the basket had two mangoes, you would have no trouble choosing a mango. There may be several reasons for this choice, none of which is related to the choice set. You may be following social convention, acting in a way that is consistent with your reputation of being a considerate person, or simply behaving in a way that seems right. Sen notes that Immanuel Kant and Adam Smith emphasized the importance of “moral sentiments” in rational choice. Adam Smith also discussed extensively how various moral values (including generosity and public spirit) can alter our choice behavior, even though self-interest may be adequate to explain the special case of mutually profitable exchange.

Responsibility. Sometimes we make a choice out of responsibility, even when it conflicts with our own well-being. We may enjoy exercising responsibility, or find it a burden, but in either case we act in a way counter to our personal interest. A mother leaves the best piece of fruit for her child, but has no trouble choosing it for herself after she goes shopping and buys more fruit; a husband often drives the old car, leaving the newer one for his wife. After they buy another car, he begins driving the previously newer car more frequently. It is true that the choice context has changed; but the reasons for the change in behavior have nothing to do with similarities and differences in the alternatives. Nor do they have to do with a utility function that incorporates the welfare of the other. Their own preferences are completely immaterial for our mother and husband, who assume the responsibility of placing the interest of a child and a wife above their own.

The following choice model does not explicitly consider each of these reasons. Instead, it reflects their effects on choice probabilities through a single parameter that determines the manner in which a person combines information across the attributes of an alternative. The multinomial logit model is a special case that is obtained when the parameter has a zero value. In other cases, the parameter determines the type of averaging process a person might use when making a choice. Different types of averages reflect different types of weighting of the attributes associated with different decision frames and contexts.

MODEL

We describe the proposed value function and choice model and illustrate how it captures violations of regularity, order independence and IIA that can occur because of framing and context effects.

The value function

Let each alternative be described using $m \geq 2$ continuous or discrete attributes. Let $x_{jk}$ denote the value of attribute $k$ for alternative $j$. Let $x_j = (x_{j1}, \ldots, x_{jm})$ denote the profile of alternative $j$. The proposed model assumes that the value of an alternative is a weighted average of the values of its attribute levels.


Let $\beta_k$ denote the importance of attribute $k$. Let

$$v_{jk} = e^{\beta_k x_{jk}} \tag{1}$$

denote the value of attribute $k$ for alternative $j$. We use the exponential form to constrain the values $v_{jk}$ to be positive. Although other functional forms can be explored, we retain the exponential form because it is tractable and allows us to relate our model to others. In addition, we require the $x_{jk}$ to be ratio scaled variables. Some attributes, like price, memory and disk size of a camera, are naturally ratio scaled. Interval attributes can be introduced by replacing $x_{jk}$ by $x_{jk} - x_k^*$, where $x_k^*$ is a reference value for attribute $k$. Ordinal and nominal attributes (e.g., brand name) can be accommodated by a dummy variable coding of their levels. We further discuss this coding in the application section.

The reference values ($x_k^*$) may be included in the value function not only to satisfy the measurement properties of the model but also to capture reference dependence and context effects. The behavioral literature provides evidence that consumers evaluate alternatives in terms of their deviations from reference values on attributes (e.g., Kahneman and Tversky, 1979). In the following development, we omit the straightforward extension of the model to include reference values but illustrate their use in the empirical application.

The key feature of the proposed model is that it considers the value of an alternative to be a (generalized) average, rather than the sum, of the values $v_{jk}$. As discussed below, this makes no difference when the averaging parameter has a zero value: the proposed model becomes equivalent to a multinomial logit model and the reference levels, $x_k^*$, become irrelevant for the evaluation of alternatives. But averaging and summing can yield different

models, and reflect different choice processes, depending on the type of average used. The generalized mean of the attribute values for alternative $j$ is

$$V_j^r = \left( \frac{1}{m} \sum_{k=1}^{m} (v_{jk})^r \right)^{1/r}. \tag{2}$$

The value of $r$ determines the type of average and the manner in which the $v_{jk}$ values are combined when evaluating an alternative. If $r = 1$, $V_j^r$ is the arithmetic mean; if $r = -1$, $V_j^r$ is the harmonic mean; and if $r$ has the limiting value zero, $V_j^r$ is the geometric mean. Table 1 lists these and other special cases of (2); see Hardy, Littlewood and Pólya (1952).
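The special cases of equation (2) are easy to verify numerically. The sketch below is an illustration only: the function name `generalized_mean` and the attribute values 1 and 4 are assumptions introduced here, and the limiting cases are handled by explicit branches instead of numerical limits.

```python
import math


def generalized_mean(values, r):
    """Generalized (power) mean of positive values, as in equation (2)."""
    m = len(values)
    if r == 0:                      # limiting case: geometric mean
        return math.exp(sum(math.log(v) for v in values) / m)
    if r == math.inf:               # limiting case: best attribute value
        return max(values)
    if r == -math.inf:              # limiting case: worst attribute value
        return min(values)
    return (sum(v ** r for v in values) / m) ** (1.0 / r)


v = [1.0, 4.0]  # hypothetical attribute values v_jk for one alternative

for r in (-math.inf, -5, -1, 0, 1, 5, math.inf):
    print(f"r = {r:>5}: V = {generalized_mean(v, r):.4f}")
```

For these two values the harmonic, geometric and arithmetic means are 1.6, 2 and 2.5, and the printed sequence rises from the minimum to the maximum as r increases.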

— Insert Table 1 about here —

$V_j^r$ is independent of $r$ when all $v_{jk}$ values are equal. Otherwise, its value increases with $r$, as shown by the example in Figure 1. The right-hand side of equation (2) is a Cobb-Douglas function, $V_j^0 = \prod_{k=1}^{m} (v_{jk})^{1/m}$, when $r \to 0$. It has the form of the constant elasticity of substitution (CES) function when $r < 0$. The value of $r$ determines the substitution patterns between pairs of attributes.

Figure 2 shows the indifference curves between a pair of attributes. When $r \to -\infty$, the value of an alternative tends to the minimum value across its levels. This is a lexicographic rule because there is no tradeoff with other attributes; only improvements in the least valued attribute level can increase the value of an alternative (i.e., move it to a higher indifference curve). Panels (b), (c) and (d) correspond to $r = -1$, $0$ and $1$, and have convex, linear and concave indifference curves, respectively. When $r = -1$, the tradeoff between the attributes depends on the values of $x_1$ and $x_2$. If $x_1 < x_2$, then a decrease in the value of $x_1$ requires a


disproportionately larger increase in $x_2$ to obtain the same $V_j^r$ value. Similarly, when $r = 1$, a decrease in the value of $x_1$ requires a disproportionately smaller increase in $x_2$ to obtain the same $V_j^r$ value. The indifference curves become linear as the value of $r$ approaches zero. Panel (e) shows that, as $r \to \infty$, the value of an alternative approaches the maximum value across its levels. The indifference curves again reflect lexicographic preferences. In this case, only improvements in the most valued attribute level can increase the value of an alternative.

— Insert Figure 1 about here —

The effects of framing and context on choice can be reflected by changes in the values of $r$ for a given set of alternatives. Larger positive values of $r$ result in greater weighting of attributes with higher $v_{jk}$ values. In the limit, $V_j^r = \max_k (v_{jk})$ when $r = \infty$; that is, an alternative is evaluated solely on the value of its best attribute. Similarly, larger negative values of $r$ correspond to greater weighting of attributes with lower $v_{jk}$ values. In the limit, $V_j^r = \min_k (v_{jk})$ when $r = -\infty$; that is, an alternative is evaluated solely on the value of its worst attribute. Thus, situations or decision frames emphasizing the achievement of favorable outcomes are represented by positive values of $r$; those emphasizing the avoidance of unfavorable outcomes are represented by negative values of $r$.

— Insert Figure 2 about here —

Choice model

Let

$$U_j^r = \ln(V_j^r) + e_j \tag{3}$$

denote the utility of alternative $j$, where $\ln(V_j^r)$ is the systematic utility component and $e_j$ is an error term. For convenience, we assume that $e_j$ has an extreme value distribution that is independent across alternatives. Let $C$ denote a choice set with $n$ alternatives. We assume that a consumer chooses an alternative with the maximum utility in $C$. Since $U_j^r$ has an extreme value distribution, the choice probability for alternative $j \in C$ is given by:

$$P_j^r = \frac{V_j^r}{\sum_{l \in C} V_l^r}, \tag{4}$$

where the value of $V_j^r$ is given by equation (2). The parameter $r$ may differ across choice sets, choice contexts and the framing of decisions. Equation (4) reduces to the multinomial logit probability of choosing alternative $j \in C$ when $r$ obtains a limiting value of zero. In this case, the value of $V_j^r$ is a geometric mean of the attribute values for alternative $j$ (it has the same form as the Cobb-Douglas function) and the probability of choosing alternative $j \in C$ is given by

$$\lim_{r \to 0} P_j^r = \frac{\left( \prod_{k=1}^{m} e^{\beta_k x_{jk}} \right)^{1/m}}{\sum_{l \in C} \left( \prod_{k=1}^{m} e^{\beta_k x_{lk}} \right)^{1/m}} = \frac{e^{U_j^0}}{\sum_{l \in C} e^{U_l^0}},$$

where $U_j^0 = \sum_{k=1}^{m} \alpha_k x_{jk}$, for all $j \in C$, and $\alpha_k = \beta_k/m$, for all $k = 1, \ldots, m$.

The following example illustrates how changes in the values of the averaging parameter across choice sets capture violations of IIA, order independence, and regularity.

— Insert Figure 3 about here —

Let $j = 1, 2, 3$ denote alternatives. Suppose each alternative is described using $m = 2$ attributes, say quality and price. Let $C_1 = \{1, 2\}$ and $C_2 = \{1, 2, 3\}$ denote two choice sets. Let $x_{jk}$ denote the value of attribute $k$ for alternative $j$, where $k = 1, 2$ and $j = 1, 2, 3$. Let $x_k^*$ denote a reference level for attribute $k$. Figure 3(a) plots the $x_{jk} - x_k^*$ values for the three alternatives in an attribute space. It shows that both price and quality are high for alternative 1, moderate for alternative 2 and low for alternative 3. Let $\beta_q = 1$ and $\beta_p = -1$ denote the importance weights of quality and price, respectively. Then the attribute values $v_{jk} = e^{\beta_k x_{jk}}$ are $v_{11} = e^{-4}$ and $v_{12} = e^{6}$ for alternative $j = 1$; $v_{21} = v_{22} = e^{0} = 1$ for alternative $j = 2$; and $v_{31} = e^{4}$ and $v_{32} = e^{-6}$ for alternative $j = 3$.

Let $P(j \in C_1)$ denote the choice probability of alternative $j$ in choice set $C_1$. Let $r = r_1 = 0$ for choice set $C_1$. Then, as discussed, the values of $P(1 \in C_1)$ and $P(2 \in C_1)$ are given by the multinomial logit model. Their values are $P(1 \in C_1) = 0.73$ and $P(2 \in C_1) = 0.27$. The horizontal lines in Figure 3(b) correspond to these two probabilities.

Let $P(j \in C_2)$ denote the choice probability of alternative $j$ in choice set $C_2$. Let $r = r_2$ for choice set $C_2$. Figure 3(b) plots $P(j \in C_2)$ as a function of $r_2$. It shows that $P(1 \in C_2)$ increases and $P(2 \in C_2)$ decreases with the value of $r_2$. Figure 3(b) shows that:

(1) IIA is violated when $r_2 \neq r_1 = 0$. In this case, $P(1 \in C_1)/P(2 \in C_1) \neq P(1 \in C_2)/P(2 \in C_2)$.

(2) Order independence is violated when $r_2 < -0.08$. In this case, $P(1 \in C_1) > P(2 \in C_1)$ but $P(2 \in C_2) \geq P(1 \in C_2)$ (preference reversal).

(3) Regularity is violated in two conditions: when $r_2 < -0.01$ and when $r_2 > 0.04$. If $r_2 < -0.01$, then $P(2 \in C_2) \geq P(2 \in C_1)$; and if $r_2 > 0.04$, then $P(1 \in C_2) \geq P(1 \in C_1)$. The first violation describes a compromise effect in which the availability of alternative $j = 3$ increases the choice probability of alternative $j = 2$. The second violation describes a polarization effect (Kivetz et al., 2004b, Tversky and Simonson, 1993), which is the tendency to seek an extreme outcome.
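The three-alternative example can be reproduced with a few lines of code. The sketch below is a minimal illustration in Python: it implements equations (2) and (4) directly with the attribute values stated in the example, and the function names and the particular values of $r_2$ tried (−0.10, −0.05, 0.05) are choices made here for illustration, not part of the original analysis.

```python
import math


def generalized_mean(values, r):
    """Equation (2); r = 0 is treated as the geometric-mean limit."""
    m = len(values)
    if r == 0:
        return math.exp(sum(math.log(v) for v in values) / m)
    return (sum(v ** r for v in values) / m) ** (1.0 / r)


def choice_probs(attribute_values, r):
    """Equation (4): P_j = V_j / sum of V_l over the choice set."""
    V = {j: generalized_mean(v, r) for j, v in attribute_values.items()}
    total = sum(V.values())
    return {j: V[j] / total for j in V}


# Attribute values v_jk from the example in the text.
v = {
    1: [math.exp(-4), math.exp(6)],
    2: [1.0, 1.0],
    3: [math.exp(4), math.exp(-6)],
}

# Choice set C1 = {1, 2} with r1 = 0 (multinomial logit case).
p_c1 = choice_probs({j: v[j] for j in (1, 2)}, 0.0)


def p_c2(r2):
    """Choice probabilities in C2 = {1, 2, 3} for a given r2."""
    return choice_probs(v, r2)


print("C1:", {j: round(p, 2) for j, p in p_c1.items()})
for r2 in (-0.10, -0.05, 0.0, 0.05):
    print(f"C2, r2 = {r2:+.2f}:", {j: round(p, 3) for j, p in p_c2(r2).items()})
```

With $r_2 = -0.10$ alternative 2 overtakes alternative 1 (an order reversal), with $r_2 = -0.05$ alternative 2 is chosen more often in the larger set (a compromise effect), and with $r_2 = 0.05$ alternative 1 gains share (polarization). At $r_2 = 0$ none of the three conditions is violated.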

ILLUSTRATIONS

We provide four examples illustrating how the proposed model captures different effects. The first example considers violations of order independence and IIA. The second considers the effect of framing on choice. The third and fourth examples consider attraction and compromise effects, both of which are context effects. Example 1 is a famous example by Debreu (1960) illustrating violations of IIA and order independence. The other three examples analyze data from previously published studies.

Violations of IIA and Order Independence

Let $P_1$ and $P_2$ denote the choice probabilities of alternatives 1 and 2. Independence of irrelevant alternatives (IIA) requires that $P_1/P_2 = \alpha$ is a constant for any choice set containing both alternatives. Order independence requires that if $P_1 \geq P_2$ in one choice set, then $P_1 \geq P_2$ in all choice sets. Observe that if order independence is violated, so is IIA. As noted, EBA allows violations of order independence, and both nested-logit models and multinomial probit models allow violations of IIA. We consider an example by Debreu (1960) to illustrate how the proposed model captures these violations (the example is sometimes recast as the blue bus-red bus problem).

Let $j = 1$ and $j = 2$ denote two different but equally good recordings of a Beethoven symphony. Let $j = 3$ denote a recording of a Debussy suite. Consider a person who is indifferent between choosing one or the other Beethoven recording, and also indifferent between choosing a Beethoven recording or the Debussy suite. Let $C_1 = \{1, 2\}$, $C_2 = \{1, 3\}$, $C_3 = \{2, 3\}$ and $C_4 = \{1, 2, 3\}$. Then the person chooses the albums with the following probabilities from each of the four choice sets:

$$
\begin{aligned}
&P_1 = P_2 = \tfrac{1}{2}, && \text{if } C_1 = \{1, 2\}, \\
&P_1 = P_3 = \tfrac{1}{2}, && \text{if } C_2 = \{1, 3\}, \\
&P_2 = P_3 = \tfrac{1}{2}, && \text{if } C_3 = \{2, 3\}, \\
&P_1 = P_2 = \tfrac{1}{4},\ P_3 = \tfrac{1}{2}, && \text{if } C_4 = \{1, 2, 3\}.
\end{aligned}
$$

Order independence is violated because $P_3/P_1 = (1/2)/(1/2) = 1$ when $C_2 = \{1, 3\}$ and $P_3/P_1 = (1/2)/(1/4) = 2$ when $C_4 = \{1, 2, 3\}$. The proposed model captures this violation of order independence.

Let $x_{j1} = 1$ if alternative $j$ is the Beethoven symphony and $x_{j1} = 0$ if it is the Debussy suite. Let $x_{j2}$ denote an “irrelevant” attribute that is introduced to distinguish between the two Beethoven recordings. For example, it may represent two different but equally good music labels or orchestras. Let $x_{j21} = 1$ and $x_{j22} = 0$ for $j = 1$, the first Beethoven recording; $x_{j21} = 0$ and $x_{j22} = 1$ for $j = 2$, the second Beethoven recording; and $x_{j21} = x_{j22} = 0$ for $j = 3$, the Debussy suite. Then the value of the attribute “symphony” for alternative $j$ is given by $v_{j1} = e^{\beta_1 x_{j1}}$, $j = 1, 2, 3$, where $\beta_1$ is the importance weight of the attribute. Similarly, the value for the orchestra attribute for alternative $j$ is given by $v_{j2} = e^{\beta_2 x_{j21} + \beta_3 x_{j22}}$, $j = 1, 2, 3$, where $\beta_2$ and $\beta_3$ are the importance weights associated with the first and second recording labels or orchestras for the Beethoven symphony.

Let $\beta_1 = 0$ and let $\beta_2 = \beta_3 = \beta$ be a large negative number so that $e^{\beta} = \epsilon > 0$ is an arbitrarily small number. Then $v_{11} = v_{21} = v_{31} = v_{32} = e^{0} = 1$ and $v_{12} = v_{22} = e^{\beta} = \epsilon$. Thus

$$
\begin{aligned}
V_1^r &= \left( \frac{v_{11}^r + v_{12}^r}{2} \right)^{1/r} = \left( \frac{1 + \epsilon^r}{2} \right)^{1/r}, \\
V_2^r &= \left( \frac{v_{21}^r + v_{22}^r}{2} \right)^{1/r} = \left( \frac{1 + \epsilon^r}{2} \right)^{1/r}, \\
V_3^r &= \left( \frac{v_{31}^r + v_{32}^r}{2} \right)^{1/r} = \left( \frac{1 + 1}{2} \right)^{1/r} = 1.
\end{aligned}
$$

(1) Let $r = +\infty$ for each of the binary choice sets $C_1 = \{1, 2\}$, $C_2 = \{1, 3\}$ and $C_3 = \{2, 3\}$. Then $V_1^r = V_2^r = \max\{1, \epsilon\} = 1$, $V_3^r = 1$ and $P_j = 1/2$, for all $j \in C_1$, $C_2$ and $C_3$.

(2) Let $r = 1$ (arithmetic mean) for the choice set $C_4 = \{1, 2, 3\}$. Then

$$
\begin{aligned}
V_1^r &= V_2^r = V = \frac{1 + \epsilon}{2}, \qquad V_3^r = 1, \\
P_1 &= P_2 = \frac{V}{2V + 1} = \frac{1 + \epsilon}{2(2 + \epsilon)}, \qquad P_3 = \frac{1}{2V + 1} = \frac{1}{2 + \epsilon}, \\
\lim_{\epsilon \to 0} P_1 &= \lim_{\epsilon \to 0} P_2 = \frac{1}{4}, \qquad \lim_{\epsilon \to 0} P_3 = \frac{1}{2}.
\end{aligned}
$$

Thus, as $\epsilon$ approaches zero, $P_3/P_1$ approaches the value 1 in the choice set $C_2$ and 2 in the choice set $C_4$ (similarly, $P_3/P_2$ approaches the value 1 in the choice set $C_3$ and the value 2 in the choice set $C_4$). The inclusion of a second Beethoven recording has no effect on the choice probability of the Debussy suite, but splits the choice probability between the two Beethoven recordings.
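The Beethoven and Debussy example can also be checked numerically. The sketch below is illustrative only: the value `eps = 1e-8` stands in for an "arbitrarily small" $\epsilon$ and is an assumption made here, as are the function names.

```python
import math


def generalized_mean(values, r):
    """Equation (2), with the limit r = +infinity returning the maximum."""
    if r == math.inf:
        return max(values)
    m = len(values)
    return (sum(v ** r for v in values) / m) ** (1.0 / r)


def choice_probs(attribute_values, r):
    """Equation (4) for the alternatives passed in."""
    V = {j: generalized_mean(v, r) for j, v in attribute_values.items()}
    total = sum(V.values())
    return {j: V[j] / total for j in V}


eps = 1e-8  # assumed stand-in for an arbitrarily small epsilon
v = {
    1: [1.0, eps],   # first Beethoven recording
    2: [1.0, eps],   # second Beethoven recording
    3: [1.0, 1.0],   # Debussy suite
}

# Binary choice sets use r = +infinity; the full set uses r = 1.
p_13 = choice_probs({j: v[j] for j in (1, 3)}, math.inf)
p_123 = choice_probs(v, 1.0)

print("C2 = {1, 3}:   ", p_13)
print("C4 = {1, 2, 3}:", {j: round(p, 4) for j, p in p_123.items()})
```

The Debussy suite keeps a probability of about one half in both sets, while the two Beethoven recordings share the other half in the three-alternative set.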


Framing effect

We illustrate how the proposed model captures the effect of framing by considering a study by Shafir et al. (1993). Subjects in one group were asked to identify which of two divorcing parents should be awarded custody of their child; those in another group were asked to identify the parent who should be denied custody. A majority of subjects in each group identified the same parent. Shafir et al. (1993) explained that framing the decision as an award of custody led subjects to focus more on a parent’s positive features; and framing it as a denial of custody led them to focus more on a parent’s negative features. The same parent had both more positive features and more negative features than the other, and thus was selected by a majority of the subjects in each framing condition.

The proposed model captures this framing effect when the averaging parameter has a positive value in the award condition and a negative value in the deny condition. To illustrate, suppose each parent is described using $m = 2$ attributes, income and health. Consider the following values $v_{jk} = e^{\beta_k x_{jk}}$ for parent $j$ on attribute $k$: $v_{11} = v_{12} = 3$, $v_{21} = 9$, $v_{22} = 1$. That is, parent 1 is average on both income and health whereas parent 2 has higher income but poorer health. Then

$$V_1^r = \left( \frac{3^r + 3^r}{2} \right)^{1/r}, \qquad V_2^r = \left( \frac{9^r + 1^r}{2} \right)^{1/r}.$$

In the “award custody” condition, Shafir et al. (1993) found that parent 2 is awarded custody with probability

$$P_2^{r_a} = \frac{V_2^{r_a}}{V_1^{r_a} + V_2^{r_a}} = 0.64,$$

where $r = r_a$ is the associated value of the averaging parameter. In the “deny custody” condition, Shafir et al. (1993) found that the same parent 2 is denied custody with probability

$$1 - P_2^{r_d} = \frac{V_1^{r_d}}{V_1^{r_d} + V_2^{r_d}} = 0.55,$$

where $r = r_d$ is the associated value of the averaging parameter. Figure 4 shows that $P_2^r$ increases with $r$. The probabilities $P_2^{r_a} = 0.64$ and $P_2^{r_d} = 0.45$ are obtained when $r = r_a = 1.2$ and $r = r_d = -0.3$, respectively. In the “award custody” condition, $r_a > 0$ weights the positive attribute of parent 2 more heavily; in the “deny custody” condition, $r_d < 0$ weights the negative attribute of parent 2 more heavily, lowering its choice probability.
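The custody illustration can be recomputed from the stated attribute values. The following sketch is a hedged illustration in Python: it uses only the values $v_{11} = v_{12} = 3$, $v_{21} = 9$, $v_{22} = 1$ and the two values of $r$ given above, and the function names are introduced here for convenience.

```python
import math


def generalized_mean(values, r):
    """Equation (2); r = 0 is treated as the geometric-mean limit."""
    m = len(values)
    if r == 0:
        return math.exp(sum(math.log(v) for v in values) / m)
    return (sum(v ** r for v in values) / m) ** (1.0 / r)


# Attribute values from the text: parent 1 is average on both attributes,
# parent 2 has higher income but poorer health.
v_parent1 = [3.0, 3.0]
v_parent2 = [9.0, 1.0]


def p_parent2(r):
    """Probability that parent 2 is selected, from equation (4)."""
    V1 = generalized_mean(v_parent1, r)
    V2 = generalized_mean(v_parent2, r)
    return V2 / (V1 + V2)


p_award = p_parent2(1.2)    # "award custody" frame, r_a = 1.2
p_deny = p_parent2(-0.3)    # "deny custody" frame, r_d = -0.3

print(f"P2 under award frame: {p_award:.3f}")
print(f"P2 under deny frame:  {p_deny:.3f}")
print(f"P(deny parent 2) = 1 - P2: {1 - p_deny:.3f}")
```

The computed values are close to the reported 0.64 and 0.45; small differences reflect the rounding of $r_a$ and $r_d$ to one decimal place. At $r = 0$ both parents have the same geometric mean, so the choice probability is one half, which is why a sign change in $r$ is enough to flip the majority.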

— Insert Figure 4 about here —

Attraction (context) effect

Sedikides et al. (1999) conducted an experiment demonstrating the violation of regularity when a “decoy” was added to a choice set (this is sometimes called the asymmetric dominance effect). Each of 125 subjects was shown ten sets of hypothetical dating partners and asked to choose one person they would like to date from a given set. Each candidate was described using three out of five possible attributes: attractiveness, honesty, sense of humor, dependability, and intelligence. The attribute values were percentiles (e.g., more attractive than 80% of students of the same gender, age, and race). Subjects were told to assume that all the prospective dating partners attended their university and were of their own age and ethnicity/race. Subjects were randomly assigned to three conditions. Subjects in the first condition were presented a choice of dating one of two candidates, A and B; those in the second condition had a choice among A, B and $C_A$, where $C_A$ was a decoy for A; and those in the third condition had a choice among A, B and $C_B$, where $C_B$ was a decoy for B. Candidate A dominated decoy $C_A$, and candidate B dominated decoy $C_B$, on one attribute. However, A did not dominate $C_B$, and B did not dominate $C_A$. Table 2 describes the profiles of A, B and the decoys $C_A$ and $C_B$ for one of the ten choice sets. For these profiles, the percentage of subjects choosing A was 40% from the set (A, B), 47.5% from the set (A, B, $C_A$) and 20% from the set (A, B, $C_B$). Thus, in violation of regularity, the probability of choosing A increased when $C_A$ was added to a choice set with A and B (it decreased when $C_B$ was added). Sedikides et al. (1999) found similar evidence for regularity violations in eight out of the ten experimental trials.

— Insert Table 2 about here —

We used the data in Tables 2 and 3 of Sedikides et al. (1999, pp. 127–128) to illustrate how our model can capture these regularity violations.[2] We mean-centered the attribute values within each choice set. The purpose of using the centroid as a reference point is to reflect the “local context” (Tversky and Simonson, 1993). For example, suppose all candidates in a choice set had high percentile scores on attractiveness. Mean centering implies that the choice probabilities are affected by variations in, rather than the levels of, these scores. We estimated the proposed model using the Sedikides et al. (1999) data. For comparison, we also estimated a multinomial logit model (MNL). We used Hamiltonian Monte Carlo (HMC) to estimate both models. For each model, we ran sampling chains for 4,000 iterations. We discarded the first 2,000 draws as burn-in iterations and report results based on the last 2,000 draws. The Gelman-Rubin $\hat{R}$ statistic (Gelman and Rubin, 1992) was close to one in all cases, suggesting satisfactory convergence of the solutions.

[2] Table 3 of Sedikides et al. (1999, p. 128) reports only the choice probabilities for alternative A. Thus, we do not observe the choice probabilities of the other alternatives in the choice sets.

— Insert Table 3 about here —

Table 3 reports WAIC and leave-one-out cross-validation (LOO) statistics for the two models. WAIC penalizes for model complexity (Vehtari et al., 2017). For Bayesian models, it offers an improvement over the DIC criterion, which is based on point estimates of the effective number of parameters. LOO is a measure of predictive validity. It uses the draws from the posterior distribution of the parameters to compute the log predictive density of the left-out data. A preferred model has lower WAIC and LOO; see Gelman et al. (2014). The results reveal estimated differences of 243 in WAIC (with a standard error of 37.2) and 244 in LOO (with a standard error of 36.5) in favor of the proposed model over the multinomial logit model.

— Insert Table 4 about here —

Table 4 reports the actual and predicted probabilities of individual A being chosen as a date. As noted, these results imply an attraction effect. Across the ten experimental conditions, the choice probability for A increased from 0.50 to 0.615 when $C_A$ was used as a decoy, and decreased to 0.333 when $C_B$ was used as a decoy, in the choice set. The multinomial logit model did not predict this attraction effect: its predicted choice probability for A decreased from 0.514 to 0.409 when $C_A$ was used as a decoy. In contrast, the proposed model did well in capturing this effect: its predicted choice probability for A increased from 0.5097 to 0.5845 when $C_A$ was included in the choice set. When $C_B$ was introduced, the multinomial logit model overestimated the choice probability for A (0.4263, against an actual value of 0.333), whereas the proposed model predicted 0.3090.

— Insert Table 5 about here —

Table 5 reports the mean absolute deviations (MAD) between the observed and predicted choice probabilities, overall and by choice set. The multinomial logit model had the higher overall MAD, 0.1169. It did well in predicting the probability of choosing A in choice set (A, B) but poorly when a decoy was added to the choice set: its MAD was 0.2056 when $C_A$ was added and 0.1034 when $C_B$ was added. This result is expected since the multinomial logit model does not allow for regularity violations. In contrast, the overall MAD is 0.0594 for the proposed model; it displays little variability across choice sets and captures the regularity violation resulting from the addition of a decoy to a choice set. The parameter estimates suggest that subjects used the following attribute sequence when selecting a date (posterior intervals are given in square brackets): honesty ($\hat{\beta}$ = 0.45 [0.39, 0.52]), humor ($\hat{\beta}$ = 0.35 [0.29, 0.42]), attractiveness ($\hat{\beta}$ = 0.30 [0.25, 0.37]), dependability ($\hat{\beta}$ = 0.27 [0.22, 0.33]) and intelligence ($\hat{\beta}$ = 0.22 [0.18, 0.26]). The estimate of the averaging parameter was $\hat{r}$ = −0.19 [−0.28, −0.12]. Its negative value implies that the subjects attached more importance to the negative attributes of a potential date.

Compromise (context) effect

The compromise effect is a well-established finding in the marketing and psychology literatures (Simonson, 1989). It reflects the notion that the same product has a higher market share when it offers a compromise between two extreme options than when it is an extreme option itself. This is a violation of regularity because the addition of a (more extreme) option increases the market share of an alternative.

We illustrate how the proposed model captures the compromise effect by analyzing data collected by Kivetz et al. (2004b). Four hundred and fifty-four subjects were randomly assigned to three experimental conditions. Subjects in each condition were shown concept descriptions of three personal computers (PCs) and asked to choose one. Each PC concept was described using two attributes: computer speed (in MHz) and memory (in MB). The attribute levels used in the study were based on the profiles of actual PCs available in the market at the time of the study. Each choice set contained three of five (Pareto-optimal) profiles, denoted A, B, C, D, and E. Subjects were asked to assume that all computers in a choice set had the same price. After completing the choice task, subjects provided self-explicated part worths for the attribute levels.

— Insert Table 6 about here —

Table 6 shows the market shares for the PC concepts in different choice sets. It shows that the same product profile had a higher market share when it was a compromise between two other alternatives. For example, alternative B was chosen by 50% of the subjects when it was the compromise alternative in choice set 1 and by 18% of the subjects when it was an extreme alternative in choice set 2. Similarly, alternative C was chosen by 51% of subjects when it was an intermediate alternative in choice set 2 and by 24% of the subjects when it was an extreme alternative in choice set 3. Kivetz et al. (2004b) reported that these compromise effects were statistically significant (p < 0.05). We used these data to estimate the proposed model. For comparison, we also estimated a multinomial logit (MNL) model and Kivetz et al.’s (2004b) loss-aversion model (LAM) for the compromise effect.[3] Like the proposed model, LAM uses the attribute centroids in a choice set as reference points. It also introduces a loss aversion parameter for each attribute. Let $x_{ijk}$ denote individual $i$’s part worth utility for the level of attribute $k$ that appears in alternative $j$.[4] As noted, these part worths were estimated by Kivetz et al. using the self-explicated conjoint task. Let $x^*_{ik}$ be individual $i$’s average part worth utility for attribute $k$ across the alternatives in a choice set. Let $\alpha$ be a logit scaling parameter. The proposed model associates the following value with alternative $j$ for consumer $i$:

\[
V_{ij}^{r} = \left( \frac{1}{2} \sum_{k=1}^{2} e^{[x_{ijk} - x^*_{ik}]\, r} \right)^{\alpha / r},
\tag{5}
\]

where $r$ is the averaging parameter. As noted, $r = 0$ yields the multinomial logit model (the reference values “cancel out”). Since the part worth utilities were estimated using self-explicated conjoint analysis, one scaling parameter ($\alpha$) was estimated for the multinomial logit model, and two parameters ($\alpha$ and $r$) were estimated for the proposed model. For the loss-aversion model (LAM), the utility of alternative $j$ for consumer $i$ is given by:

\[
V_{ij}^{lam} = \exp\left( \alpha \sum_{k=1}^{2} \Big[ (x_{ijk} - x^*_{ik})\, z_{ijk} + \lambda_k (x_{ijk} - x^*_{ik})(1 - z_{ijk}) \Big] \right),
\tag{6}
\]

where $\lambda_k$ is a loss aversion parameter for attribute $k$, and $z_{ijk} = 1$ if $x_{ijk} - x^*_{ik} \geq 0$; otherwise, $z_{ijk} = 0$; see Kivetz et al. (2004b) for details.

[3] Kivetz et al. (2004) proposed four context-dependent choice models, each capturing a different mechanism for the compromise effect. We report the results of the loss-aversion model (LAM) because it had the best likelihood value and minimum BIC (see Table 4, p. 248 and Table 5, p. 250 of their paper). We replicated Kivetz et al.’s results for all four models. These results are available from the authors.

[4] The part worth values are effectively $x_{ijk} = \beta_{ik} x_{jkl}$, where $\beta_{ik}$ is the self-explicated importance weight of attribute $k$ for consumer $i$ and $x_{jkl}$ is a dummy (indicator) variable for the level of attribute $k$ that appears in alternative $j$.
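The two value functions in equations (5) and (6) can be compared on a toy choice set. This is only a sketch under stated assumptions: the part worths below are hypothetical (they are not the Kivetz et al. data), the function names are illustrative, equation (6) is implemented with $\alpha$ multiplying both the gain and the loss terms, and the parameter values are the posterior means listed in Table 8.

```python
import math

def averaging_value(x, x_ref, alpha, r):
    """Eq. (5): ((1/m) * sum_k exp(r * (x_k - x_ref_k))) ** (alpha / r)."""
    d = [xk - rk for xk, rk in zip(x, x_ref)]
    if r == 0:
        # Limit r -> 0: exp(alpha * mean deviation), i.e. a logit-type value
        return math.exp(alpha * sum(d) / len(d))
    return (sum(math.exp(r * dk) for dk in d) / len(d)) ** (alpha / r)

def lam_value(x, x_ref, alpha, lam):
    """Eq. (6): gains enter as-is, losses are scaled by lam_k."""
    total = 0.0
    for xk, rk, lk in zip(x, x_ref, lam):
        dk = xk - rk
        total += dk if dk >= 0 else lk * dk
    return math.exp(alpha * total)

def choice_probs(values):
    """Choice probabilities proportional to the values."""
    s = sum(values)
    return [v / s for v in values]

# Hypothetical part worths (speed, memory): two extremes and one compromise
alternatives = [(10.0, 0.0), (5.0, 5.0), (0.0, 10.0)]
x_ref = tuple(sum(col) / len(col) for col in zip(*alternatives))  # centroid

probs_avg = choice_probs([averaging_value(x, x_ref, 0.10, -3.19) for x in alternatives])
probs_mnl = choice_probs([averaging_value(x, x_ref, 0.10, 0.0) for x in alternatives])
probs_lam = choice_probs([lam_value(x, x_ref, 0.02, (6.22, 2.90)) for x in alternatives])

print([round(p, 3) for p in probs_avg])
print([round(p, 3) for p in probs_mnl])
print([round(p, 3) for p in probs_lam])
```

With these illustrative inputs, the negative averaging parameter and the loss-aversion parameters both shift share toward the middle alternative, whereas the $r = 0$ case treats the three alternatives as equally attractive because their mean deviations from the centroid are all zero.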


We used Hamiltonian Monte Carlo (HMC) to estimate all three models. For each model, we ran sampling chains for 4,000 iterations. We discarded the first 2,000 draws as burn-in iterations and report results based on the following 2,000 iterations. The Gelman-Rubin $\hat{R}$ statistic (Gelman and Rubin, 1992) was close to one in all cases, suggesting satisfactory convergence of the solutions.

— Insert Table 7 about here —

Table 7 reports WAIC statistics for the multinomial logit model, Kivetz et al.’s loss aversion model (LAM) and the proposed model.[5] It shows that the proposed model, which has one fewer parameter than Kivetz et al.’s loss aversion model, obtained a comparable WAIC value (an estimated difference of 1 with a standard error of 4.5); it also obtained a better WAIC value than the multinomial logit model (an estimated difference of 45 with a standard error of 14.8). Table 7 also reports the mean absolute deviations (MAD) between the observed and predicted choice probabilities. Both the proposed model and LAM obtained MAD values of about 0.065, which is substantially lower than the MAD value of 0.1274 for the multinomial logit model. These results suggest that the proposed model performed as well as the best model for the compromise effect reported by Kivetz et al.

[5] The WAIC statistic is asymptotically equivalent to the leave-one-out cross-validation (LOO) statistic (Vehtari et al., 2017). Since the sample size (n = 454) is large in this application, the WAIC statistic can be interpreted as the value of the LOO statistic.

— Insert Table 8 about here —

Table 8 reports the posterior means of the parameter estimates for each model. The present Bayesian parameter estimates for LAM are very close to the maximum likelihood estimates reported by Kivetz et al. (2004b) (see Table 4 on p. 248 of their paper). The loss aversion parameters of the LAM are significantly larger than one, which implies that subjects were loss averse (especially for processor speed). For the proposed model, $\hat{r} = -3.19$ is significantly different from zero and implies that the subjects gave more importance to the negative attributes in their evaluations of personal computers. A compromise alternative has the highest minimum among the alternatives in a choice set and is therefore the least affected by the tendency of subjects to evaluate alternatives by their least attractive attributes. Figure 5 depicts the value functions, estimated using LAM and the proposed model, for processor speed and memory. The value function for each attribute was constructed by using the posterior means of its parameters, keeping the other attribute fixed at its reference level. For each attribute, the horizontal axis plots the deviations of the part worths from reference values. The figure shows that the value functions are steeper for losses than for gains, a result that is consistent with prospect theory (Kahneman and Tversky, 1979).

— Insert Figure 5 about here —

APPLICATION: DIGITAL CAMERA CHOICES

We describe an application to demonstrate how the proposed model can be estimated with heterogeneous parameters, including the averaging parameter. Substantively, it shows how the proposed model can capture the effect of increasing the choice set size on the no-choice probability (Iyengar and Lepper, 2000). We discuss the implications of the results for new product introduction, product positioning and demand forecasting. We used data from a choice experiment conducted by Rooderkerk et al. (2011). Subjects were shown digital-camera concepts described using two attributes, picture quality and optical zoom. The following values of the attributes were used in the study:

(1) Picture quality: 2 MP, 4 MP, 6 MP, 8 MP and 10 MP
(2) Optical zoom: 2X, 4X, 6X, 8X and 10X

These levels reflect the attribute values found in cameras sold at the time of the study. Subjects were shown choice sets with three, four or five camera concepts differing in picture quality and optical zoom. They were told that all cameras in a choice set had the same price. No alternative dominated another. Each of one hundred and fifty-four subjects saw twenty-one choice sets. Their task was to choose one, or none, of the alternatives in a choice set. Figure 6 shows a sample choice set with three alternatives. Each subject was shown seventeen choice sets with three alternatives each, two choice sets with four alternatives each, and two choice sets with five alternatives each. The choice sets varied across subjects and were selected randomly from a collection of 700 choice sets. A total of 3,234 sample observations were obtained.

— Insert Figure 6 about here —

A descriptive analysis of the data showed that the percentage of no choices decreased from 17.84% to 11.35% when the choice-set size increased from three to four alternatives, but then increased to 14.61% when the choice-set size further increased to five alternatives. The present analysis confirms that this pattern is consistent with a violation of regularity, which may have occurred because increasing the number of alternatives from four to five increased decision complexity and resulted in subjects abandoning choice (Iyengar and Lepper, 2000).
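The regularity check on the no-choice shares can be expressed in a few lines. This is a sketch only: the helper name is hypothetical, and it assumes that the choice sets of different sizes can be compared as if each larger set extended a smaller one, so that regularity would require the no-choice share not to rise as alternatives are added.

```python
def violates_regularity(no_choice_share_by_size):
    """Return True if the no-choice share rises anywhere as the choice set grows."""
    sizes = sorted(no_choice_share_by_size)
    shares = [no_choice_share_by_size[s] for s in sizes]
    return any(later > earlier for earlier, later in zip(shares, shares[1:]))

# Observed no-choice percentages by choice-set size, as reported in the text
observed = {3: 17.84, 4: 11.35, 5: 14.61}
print(violates_regularity(observed))
```

The rise from 11.35% to 14.61% between four and five alternatives is what triggers the violation in this check.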

Model specification and estimation

Let camera $j$ have optical zoom $OZ_j$ and picture quality $PQ_j$. We assume that consumer $i$ evaluates a camera by comparing each attribute relative to an (individual-specific) reference value, denoted $OZ^*_i$ for optical zoom and $PQ^*_i$ for picture quality. We allow the value of $r$ to vary by individual and the number of alternatives in a choice set. Let $r_{si}$ be the averaging parameter for individual $i$ in a choice set of size $s$, where $s$ = 3, 4 or 5. Then the value of alternative $j$ in a choice set with $s$ alternatives is

\[
V_{ij}^{r_s} = \left( \frac{1}{2} \left[ e^{r_{si} \beta_{1i} [OZ_j - OZ^*_i]} + e^{r_{si} \beta_{2i} [PQ_j - PQ^*_i]} \right] \right)^{1 / r_{si}},
\tag{7}
\]

where $\beta_{1i}$ and $\beta_{2i}$ are subject $i$’s importance weights associated with optical zoom and picture quality, respectively. The probability that subject $i$ chooses alternative $j \in C$, when $C$ has $s$ alternatives, is

\[
P_{ij}^{r_s} = \frac{V_{ij}^{r_s}}{1 + \sum_{\ell \in C} V_{i\ell}^{r_s}},
\]

and the no-choice probability is

\[
P_{i0}^{r_s} = \frac{1}{1 + \sum_{\ell \in C} V_{i\ell}^{r_s}}.
\]
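Equation (7) and the two probability expressions can be sketched for a single hypothetical consumer as follows. The importance weights (0.3 and 0.5) and the three camera profiles are illustrative assumptions, not estimates from the paper; the reference levels (3.5X and 3.75 MP) and the two averaging parameters (−0.51 and −3.62) are the approximate values reported later in this section. In the paper the averaging parameter changes with choice-set size; here it is varied on a fixed set only to isolate its effect.

```python
import math

def camera_value(oz, pq, oz_ref, pq_ref, beta_oz, beta_pq, r):
    """Eq. (7): power mean (order r) of exp(beta * deviation from the reference)."""
    u_oz = beta_oz * (oz - oz_ref)
    u_pq = beta_pq * (pq - pq_ref)
    if r == 0:
        # Limit r -> 0: geometric mean of the two exponentials
        return math.exp((u_oz + u_pq) / 2.0)
    return ((math.exp(r * u_oz) + math.exp(r * u_pq)) / 2.0) ** (1.0 / r)

def choice_probabilities(values):
    """Return [P(no choice), P(alt 1), ..., P(alt s)] with denominator 1 + sum of values."""
    denom = 1.0 + sum(values)
    return [1.0 / denom] + [v / denom for v in values]

# Hypothetical profiles as (optical zoom in X, picture quality in MP)
cameras = [(2.0, 10.0), (6.0, 6.0), (10.0, 2.0)]
OZ_REF, PQ_REF = 3.5, 3.75     # approximate reference levels reported in the text
BETA_OZ, BETA_PQ = 0.3, 0.5    # assumed importance weights (illustrative only)

def probs_for(r):
    values = [camera_value(oz, pq, OZ_REF, PQ_REF, BETA_OZ, BETA_PQ, r)
              for oz, pq in cameras]
    return choice_probabilities(values)

p_small = probs_for(-0.51)   # averaging parameter reported for three alternatives
p_large = probs_for(-3.62)   # averaging parameter reported for five alternatives

print([round(p, 3) for p in p_small])
print([round(p, 3) for p in p_large])
```

Because a generalized mean never increases when $r$ decreases, every alternative's value weakly falls as the averaging parameter becomes more negative, so the no-choice probability rises. In this illustration the balanced middle camera also gains share relative to the two extreme cameras.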

We used the data to estimate the proposed model (which we call the general averaging model). We also estimated a mixed logit model (which corresponds to setting $r_{si} = 0$) and a constant averaging model in which the value of $r$ is independent of the number of alternatives in a choice set (that is, $r_{si} = r_i$, for $s$ = 3, 4, 5). We used Hamiltonian Monte Carlo (HMC) to estimate all three models. For each model, we ran sampling chains for 4,000 iterations. We discarded the first 2,000 draws as burn-in iterations and report results based on the last 2,000 iterations. The Gelman-Rubin $\hat{R}$ statistic (Gelman and Rubin, 1992) was lower than one in all cases, suggesting satisfactory convergence of the solutions. We use model fit and prediction to assess violations of IIA, order independence and regularity.

Model Comparisons

Table 9 reports WAIC and leave-one-out cross-validation (LOO) statistics for the three models. The constant averaging model is superior to the mixed logit model; the estimated difference in the WAIC values between these two models is 2671 (with a standard error of 89.90). The proposed general averaging model is superior to the constant averaging model; the estimated difference in the WAIC values between these two models is 22 (with a standard error of 10.94).

— Insert Table 9 about here —

Table 10 reports the expected in-sample and out-of-sample (leave-one-out) hit rates for choice sets with 3, 4 and 5 alternatives. The mixed logit model performs worse than the averaging models. Its relative performance becomes worse as the number of alternatives in a choice set increases. Both averaging models have similar hit rates when the choice sets have three or four alternatives. For choice sets with five alternatives, the general averaging model performs better than the constant averaging model.

— Insert Table 10 about here —


Violation of Regularity

Figure 7 plots the actual and predicted probabilities of no-choice across choice sets with 3, 4 and 5 alternatives. The mixed logit model and constant averaging model are consistent with regularity, because they predict that the probability of no choice decreases with increasing choice-set size. The general averaging model predicts regularity violation, because the probability of no choice first decreases and then increases with choice set size. It is also the only model consistent with the data.

— Insert Figure 7 about here —

Estimation Results

— Insert Table 11 about here —

Table 11 shows the posterior means and 95% posterior intervals for the distributions of the model parameters. In all three models, picture quality and optical zoom have significant impact on utility (p ≤ 0.05). The standardized coefficients suggest that picture quality is more important than optical zoom (relative importance of 57% for the mixed logit model and 47% for the averaging models).[6] The attribute weights of the averaging models are larger in magnitude than those from the mixed logit model. Since random utility models confound estimates of model parameters with error variability (Louviere and Eagle, 2006), models with better fit (lower error variance) have larger attribute weights. This is true in our case since the averaging models reported in Table 9 have better model fit.

[6] The standardized coefficients have the following values for picture quality ($\hat{\beta}^*_1$) and optical zoom ($\hat{\beta}^*_2$): (1) $\hat{\beta}^*_1 = 1.82$, $\hat{\beta}^*_2 = 0.67$, for the mixed logit model; (2) $\hat{\beta}^*_1 = 4.56$, $\hat{\beta}^*_2 = 2.51$, for the constant averaging model; and (3) $\hat{\beta}^*_1 = 4.42$, $\hat{\beta}^*_2 = 2.43$, for the general averaging model.


Reference values

The reference levels of the two attributes have similar values in the two averaging models: about 3.75 megapixels for picture quality (close to the mid-point of the 2-6 megapixels scale); and 3.5X for optical zoom (in the lower third of the 2X-10X scale).

Averaging parameters

For both averaging models, the estimated averaging parameters are negative and significantly different from zero. Table 11 shows that the averaging parameters of the proposed model become more negative as the size of the choice set increases. The posterior means are −0.51 for three alternatives, −0.69 for four alternatives and −3.62 for five alternatives. Thus, as the choice set grows, consumers increasingly use the worst features to evaluate the cameras. There is a statistically significant difference in the values of $r_4$ and $r_5$ (p ≤ 0.05), but not between $r_3$ and $r_4$ (their 95% posterior intervals overlap). Figure 8 plots the value functions for picture quality and optical zoom that are implied by each of the estimated averaging parameters in the proposed model. Each function was obtained using the posterior mean of its averaging parameter, keeping the other attribute fixed at its reference level. For both attributes, the value functions are S-shaped and not symmetric around the reference level. For a choice set with three alternatives, the value function is steeper for gains than for losses relative to the reference level. For a choice set with five alternatives, the value function is steeper for losses than for gains, which is consistent with prospect theory (Kahneman and Tversky, 1979).

— Insert Figure 8 about here —

Figure 10 shows the overall value functions estimated using the mixed logit and the proposed models. The function has a smooth surface for the mixed logit model, and the value increases monotonically with the attribute values. The indifference curves are linear with a constant rate of attribute substitution. The curvature of the value function increases with choice set size for the proposed model. This is consistent with the observation made by Drolet et al. (2000, p. 200) that the “indifference curves travel with the choice set.” With three alternatives in the choice set, the indifference curves are almost linear. However, increasing the choice set size accentuates the curvature. Consistent with previous findings, increasing choice complexity leads to the use of non-compensatory decision processes (Johnson and Meyer, 1984; Swait et al., 2002).

— Insert Figure 10 about here —

Implications

The choice processes assumed by the proposed model and the mixed logit model have different implications for the structure of competition in the market. The mixed logit model implies that all the alternatives compete with each other. The proposed model implies that the degree of competition depends on the magnitude and sign of the averaging parameter. In the present application, the averaging parameter is negative and its magnitude increases with the size of the choice set. This suggests that as the choice set gets larger, consumers prefer alternatives with higher minimum attribute values. Thus, non-extreme alternatives (those with higher minimum attribute values) are likely to have higher market shares than extreme alternatives. This result is consistent with the compromise effect that is well documented in the marketing and psychology literatures (Simonson, 1989; Kivetz et al., 2004b). Similarly, if a new product is introduced into the market, the mixed logit model predicts that its share will be proportionally drawn from the existing alternatives (including the no-choice option). Thus, the mixed logit model will always predict a market expansion as a result of the new product introduction. However, in the proposed model, the sources of share gain and the impact of the new product on the market size depend on the value of the averaging parameter. We find that the averaging parameter associated with a choice set of five alternatives is negative and significantly lower than the one associated with a choice set of four alternatives. Thus, the market size is likely to shrink as a result of the new product introduction. In addition, the new product will not necessarily gain its share proportionally from existing alternatives given the loss aversion of the consumers in this study.

— Insert Table 12 about here —

To illustrate the differences in the implications of the two models, we use the parameter estimates to simulate the effect of introducing a new digital camera (N) to a market with four hypothetical options A, B, C and D, depicted in Figure 9 (a) and (b). Table 12 reports the simulated market shares of the four existing alternatives before and after the new product introduction. With four alternatives in the market, the proposed model predicted a much larger share for alternative C than did the mixed logit model (34% vs. 13%), capturing a compromise effect. On the other hand, the mixed logit model predicted higher market shares for alternative D (an extreme alternative) and the no-choice option.

— Insert Figure 9 about here —


With the introduction of the new product, the mixed logit model predicted a much lower market share for the new alternative, N, than did the averaging model (7% vs. 21.4%). This is expected since the new product is positioned as a compromise alternative. For the mixed logit model, the 7% share gain is proportionally drawn from the four existing alternatives as expected. In contrast, the 21.4% share gain predicted by the averaging model is predominantly gained from the extreme alternatives because of consumers’ loss aversion. More importantly, notice that the averaging model predicted a share increase of the no-choice option from 47% to 49%, suggesting that the market is likely to shrink by 2% as a result of introducing a fifth alternative because of increased choice complexity. On the other hand, the mixed logit model predicts a market expansion of 4% (=56% - 52%) as a result of the new product introduction, which is consistent with the regularity condition.

CONCLUSION

We proposed a choice model in which the value of an alternative is a generalized mean of its attribute values. It generalizes the multinomial logit model. As the averaging parameter increases (decreases), the value of each alternative in a choice set becomes increasingly skewed towards the highest (lowest) of its attribute values. The limiting cases correspond to lexicographic rules by which the value of an alternative is determined by its best or worst attribute. The proposed model allows violations of IIA, order independence and regularity, and captures different framing and context effects. We described examples, and provided an empirical application illustrating the estimation of the model with heterogeneous parameters. It provides better fit and prediction than a multinomial logit model, and captures a regularity violation that occurs when increasing the size of a choice set increases the no-choice probability. We discussed the implications of the model for the nature of competition in a market. Introducing a new brand makes the extreme alternatives less attractive and the compromise alternatives more attractive. This effect is associated with a change in the averaging parameter that places more emphasis on the worst feature of each alternative.


References

Becker, Gordon M., Morris H. DeGroot, and Jacob Marschak (1963), “Stochastic models of choice behavior,” Systems Research and Behavioral Science, 8 (1), 41–55.
Chipman, John S. (1960), “The foundations of utility,” Econometrica, 28 (2), 193–224.
Clark, Herbert H., and Eve V. Clark (1977), Psychology and Language: An Introduction to Psycholinguistics, New York: Harcourt Brace Jovanovich.
Coombs, Clyde H. (1958), “On the use of inconsistency of preferences in psychological measurement,” Journal of Experimental Psychology, 55 (1), 1–7.
Debreu, Gerard (1960), “Review of R. D. Luce, individual choice behavior: A theoretical analysis,” American Economic Review, 50, 186–188.
Drolet, Aimee, Itamar Simonson, and Amos Tversky (2000), “Indifference curves that travel with the choice set,” Marketing Letters, 11 (3), 199–209.
Gelman, Andrew, and Donald B. Rubin (1992), “Inference from iterative simulation using multiple sequences,” Statistical Science, 7 (4), 457–472.
———–, Jessica Hwang, and Aki Vehtari (2014), “Understanding predictive information criteria for Bayesian models,” Statistics and Computing, 24 (6), 997–1016.
Hardy, Godfrey Harold, John Edensor Littlewood, and George Pólya (1952), Inequalities, Second Edition, Cambridge University Press.
Huber, Joel, John W. Payne, and Christopher Puto (1982), “Adding asymmetrically dominated alternatives: Violations of regularity and the similarity hypothesis,” Journal of Consumer Research, 9 (1), 90–98.
Iyengar, Sheena S., and Mark R. Lepper (2000), “When choice is demotivating: Can one desire too much of a good thing?,” Journal of Personality and Social Psychology, 79 (6), 995–1006.
Johnson, Eric J., and Robert J. Meyer (1984), “Compensatory choice models of noncompensatory processes: The effect of varying context,” Journal of Consumer Research, 11 (1), 528–541.
Kahneman, Daniel, and Amos Tversky (1979), “Prospect theory: An analysis of decision under risk,” Econometrica, 47 (2), 263–291.
Kivetz, Ran, Oded Netzer, and V. Srinivasan (2004b), “Alternative models for capturing the compromise effect,” Journal of Marketing Research, 41 (3), 237–257.
———– (2004b), “Extending compromise effect models to complex buying situations and other context effects,” Journal of Marketing Research, 41 (3), 262–268.
Krantz, David H. (1967), “Rational distance functions for multidimensional scaling,” Journal of Mathematical Psychology, 4 (2), 226–245.
McNeil, Barbara J., Stephen G. Pauker, Harold C. Sox Jr., and Amos Tversky (1982), “On the elicitation of preferences for alternative therapies,” New England Journal of Medicine, 306 (21), 1259–1262.
Redelmeier, Donald A., and Eldar Shafir (1995), “Medical decision making in situations that offer multiple alternatives,” JAMA, 273 (4), 302–305.
Rooderkerk, Robert P., Harald J. Van Heerde, and Tammo H. A. Bijmolt (2011), “Incorporating context effects into a choice model,” Journal of Marketing Research, 48 (4), 767–780.
Sedikides, Constantine, Dan Ariely, and Nils Olsen (1999), “Contextual and procedural determinants of partner selection: Of asymmetric dominance and prominence,” Social Cognition, 17 (2), 118–139.
Sen, Amartya (1997), “Maximization and the act of choice,” Econometrica, 65 (4), 745–779.
Shafir, Eldar, Itamar Simonson, and Amos Tversky (1993), “Reason-based choice,” Cognition, 49 (1), 11–36.
Simonson, Itamar (1989), “Choice based on reasons: The case of attraction and compromise effects,” Journal of Consumer Research, 16 (2), 158–174.
Swait, Joffre, Wiktor Adamowicz, Michael Hanemann, Adele Diederich, Jon Krosnick, David Layton, William Provencher, David Schkade, and Roger Tourangeau (2002), “Context dependence and aggregation in disaggregate choice analysis,” Marketing Letters, 13 (3), 195–205.
Tentori, Katya, Daniel Osherson, Lynn Hasher, and Cynthia May (2001), “Wisdom and aging: Irrational preferences in college students but not older adults,” Cognition, 81 (3), 87–96.
Tversky, Amos (1972), “Elimination by aspects: A theory of choice,” Psychological Review, 79 (4), 281.
———–, and Daniel Kahneman (1981), “The framing of decisions and the psychology of choice,” Science, 211 (4481), 453–458.
———–, and Daniel Kahneman (1986), “Rational choice and the framing of decisions,” The Journal of Business, 59 (4), 251–278.
———–, and J. Edward Russo (1969), “Substitutability and similarity in binary choices,” Journal of Mathematical Psychology, 6 (1), 1–12.
———–, and Itamar Simonson (1993), “Context-dependent preferences,” Management Science, 39 (10), 1179–1189.
Vehtari, Aki, Andrew Gelman, and Jonah Gabry (2017), “Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC,” Statistics and Computing, 27 (5), 1413–1432.

TABLES

Table 1: Special cases of the value function

r → −∞      V_j^r → min_k v_jk
r = −1      V_j^r is the harmonic mean
r → 0       V_j^r is the geometric mean (Cobb-Douglas function)
r = 1       V_j^r is the arithmetic mean
r → ∞       V_j^r → max_k v_jk
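As a rough illustration of the special cases in Table 1, the following Python sketch evaluates a weighted power mean for several values of r. The function name `generalized_mean`, the equal default weights and the example attribute values are assumptions made for illustration; this is not the authors' estimation code.

```python
import math


def generalized_mean(values, r, weights=None):
    """Weighted power mean of positive values with averaging parameter r.

    Hypothetical helper: equal weights are assumed when none are given.
    """
    n = len(values)
    if weights is None:
        weights = [1.0 / n] * n
    if any(v <= 0 for v in values):
        raise ValueError("attribute values must be positive")
    if r == float("-inf"):
        return min(values)  # limiting case: worst attribute
    if r == float("inf"):
        return max(values)  # limiting case: best attribute
    if r == 0:
        # limiting case r -> 0: weighted geometric mean
        return math.exp(sum(w * math.log(v) for w, v in zip(weights, values)))
    return sum(w * v ** r for w, v in zip(weights, values)) ** (1.0 / r)


# Illustrative attribute values (assumed, not taken from the paper).
example = [2.0, 8.0]
for r in (float("-inf"), -1, 0, 1, float("inf")):
    print(r, generalized_mean(example, r))
```

For the two example values, the result moves from the smaller value through the harmonic, geometric and arithmetic means to the larger value as r increases, which mirrors the ordering of the rows in Table 1.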

Table 2: Dating partner profiles on three attributes

                   Dating partner
Attribute          A       B       C_A     C_B
Attractiveness     80^a    60      80      60
Honesty            55      60      55      60
Dependability      65      87      54      76

a. Partner A is more attractive than 80% of other prospective partners.

Table 3: Model fit results for Sedikides et al. (1999) study

Model       # of Pars    WAIC    LOO
MNL         5            4932    4939
Proposed    6            4689    4695

Table 4: Actual and predicted choice probabilities for dating partner A

Model       (A,B)     (A,B,C_A)    (A,B,C_B)
Actual      0.5000    0.6150       0.3330
MNL         0.5140    0.4094       0.4263
Proposed    0.5097    0.5845       0.3090


Table 5: Mean absolute deviations between actual and predicted choice probabilities for Sedikides et al. (1999) study

Model       (A,B)     (A,B,C_A)    (A,B,C_B)    Overall
MNL         0.0416    0.2056       0.1034       0.1169
Proposed    0.0472    0.0742       0.0567       0.0594

Table 6: Choice options and their percent market shares for Kivetz et al. (2004b) study

Alternative    Processor speed (MHz)    Memory (MB)
A              250                      192
B              300                      160
C              350                      128
D              400                      96
E              450                      64

Percent market shares of the three alternatives in each choice set:
Choice set 1 (n = 151): 6, 50, 44
Choice set 2 (n = 148): 18, 51, 31
Choice set 3 (n = 164): 24, 47, 29

Table 7: Model fit and prediction results for Kivetz et al. (2004b) study

Model             # Par.    WAIC    MAD
MNL               1         959     0.1274
LAM               3         912     0.0679
Proposed model    2         913     0.0633

Table 8: Posterior means and 95% posterior intervals for parameter estimates

Parameter    MNL                  LAM                   Proposed model
α            0.04 (0.03,0.05)     0.02 (0.01,0.03)      0.10 (0.08,0.12)
λ_speed      –                    6.22 (3.30,11.18)     –
λ_memory     –                    2.90 (1.52,5.42)      –
r            –                    –                     -3.19 (-10.69,-0.57)


Table 9: Model performance statistics

Model                 WAIC    LOO
Mixed logit           6553    6558
Constant averaging    3904    3926
General averaging     3882    3912

Table 10: Expected in-sample and out-of-sample hit rates by choice-set size

No. of alternatives    3               4               5
                       In      Out     In      Out     In      Out
Mixed logit            0.46    0.45    0.39    0.38    0.38    0.38
Constant averaging     0.71    0.67    0.67    0.62    0.65    0.62
General averaging      0.71    0.67    0.68    0.63    0.69    0.63

Table 11: Posterior means and 95% posterior intervals for parameter estimates

Parameter                            Mixed logit           Constant averaging     General averaging
No-choice                    β0      10.71 (9.91,11.52)    –                      –
Reference points
  Picture quality            PQ*     –                     3.77 (3.61,3.90)       3.73 (3.58,3.89)
  Optical zoom               OZ*     –                     3.45 (3.24,3.84)       3.45 (3.14,3.76)
Sensitivities
  Megapixels                 β1      1.82 (1.70,4.56)      4.56 (4.02,5.16)       4.42 (4.00,4.92)
  Optical zoom               β2      0.67 (0.62,0.73)      2.51 (2.18,2.89)       2.43 (2.15,2.72)
Averaging parameter
  Choice-set independent     r       –                     -0.43 (-0.59,-0.31)    –
  3 alternatives             r3      –                     –                      -0.51 (-0.71,-0.39)
  4 alternatives             r4      –                     –                      -0.69 (-1.48,-0.37)
  5 alternatives             r5      –                     –                      -3.62 (-6.66,-0.88)

Notes. Significant coefficients at the 95% level are highlighted in boldface. The 95% posterior intervals for the parameters are shown in parentheses.


Table 12: Simulated market shares

               Mixed logit                                  General averaging
Alternative    4 options    5 options    Relative change    4 options    5 options    Relative change
A              2%           2%           -7%                1%           0.3%         -54%
B              4%           4%           -7%                5%           2.4%         -51%
N              –            7%           –                  –            21.4%        –
C              13%          13%          -7%                34%          20.9%        -39%
D              24%          22%          -7%                13%          6.2%         -53%
No choice      56%          52%          -7%                47%          49.0%        4%
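The relative changes in Table 12 can be checked approximately from the reported shares. The Python sketch below recomputes them for the general averaging model and, separately, shows that proportional substitution toward a new option with a 7% share produces a uniform 7% decline like the one in the mixed logit columns. The helper names `relative_change` and `proportional_shares` are hypothetical, and treating the mixed logit pattern as proportional substitution is an assumption of this sketch, not a statement from the paper. Because the published shares are rounded, recomputed values can differ from the table by about a percentage point.

```python
def relative_change(before, after):
    """Percentage change in a share when moving from `before` to `after`."""
    return 100.0 * (after - before) / before


def proportional_shares(shares, new_share):
    """Shares after a new option takes `new_share` (a fraction) proportionally
    from every existing option (assumed substitution pattern)."""
    return {name: s * (1.0 - new_share) for name, s in shares.items()}


# General averaging shares (percent) from Table 12: (4 options, 5 options).
general_averaging = {
    "C": (34.0, 20.9),
    "D": (13.0, 6.2),
    "No choice": (47.0, 49.0),
}
changes = {
    name: relative_change(before, after)
    for name, (before, after) in general_averaging.items()
}

# Mixed logit 4-option shares (percent) from Table 12; N enters with a 7% share.
mixed_logit_4 = {"A": 2.0, "B": 4.0, "C": 13.0, "D": 24.0, "No choice": 56.0}
after_entry = proportional_shares(mixed_logit_4, 0.07)
uniform_changes = {
    name: relative_change(mixed_logit_4[name], after_entry[name])
    for name in mixed_logit_4
}

print(changes)
print(uniform_changes)
```

The contrast between the two dictionaries is the point of the table: proportional substitution lowers every existing share by the same percentage, whereas the general averaging shares fall by very different percentages and the no-choice share rises.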

FIGURES

[Plot: horizontal axis r, running from −∞ to +∞ with ticks from −12 to 12; vertical axis V_j^r, running from min_k v_jk to max_k v_jk, with the harmonic, geometric, arithmetic and quadratic means marked in increasing order.]

Figure 1: V_j^r as a function of r.

[Panels: (a) Minimum (r = −∞); (b) Harmonic mean (r = −1); (c) Geometric mean (r = 0); (d) Arithmetic mean (r = 1); (e) Maximum (r = ∞). Axes in each panel: x1, x2 and Overall Value.]

Figure 2: Value functions (V^r) and indifference curves for different values of the averaging parameter (r)

[Panel (a): alternatives 1, 2 and 3 plotted on Price and Quality axes. Panel (b): Probability (0 to 0.9) against r2 (−0.35 to 0.35), showing the curves P(1 ∈ C2 | r2) and P(2 ∈ C2 | r2) together with the values P(1 ∈ C1 | r1 = 0), P(1 ∈ C2 | r2 = 0), P(2 ∈ C1 | r1 = 0) and P(2 ∈ C2 | r2 = 0).]

Figure 3: (a) Price-quality profiles of three alternatives (b) Choice probabilities of alternatives 1 and 2 as a function of r2

[Plot: probability (0.3 to 0.7) against r (−2 to 2), with curves labeled P2ra, P2r and P2rd.]

Figure 4: P2r as a function of r.

[Panels (a) Speed and (b) Memory: V_j^r against the attribute level (−2 to 2), with one curve for the LAM and one for the proposed model.]

Figure 5: Estimated value functions of the LAM and the proposed model for (a) Speed and (b) Memory

Figure 6: An example of a choice set with three alternatives

[Plot: % no choice option (11 to 18) for choice sets with 3, 4 and 5 alternatives; series: Actual, Mixed logit, Constant averaging, General averaging.]

Figure 7: Actual and predicted shares of no choice

[Panels (a) and (b): Vjrs (0 to 3) against Megapixels (2.6 to 4.8) in (a) and against Optical zoom (1.5 to 5.5) in (b); legend: 3 alternatives, 4 alternatives, 5 alternatives, Reference level.]

Figure 8: (a) The fitted value function of picture quality (optical zoom fixed at its reference level) (b) The fitted value function of optical zoom (picture quality fixed at its reference level)

[Panels (a) and (b): alternatives plotted on Megapixels and Optical zoom axes (both 2.5 to 4.5); panel (a) shows alternatives A, B, C and D; panel (b) adds alternative N.]

Figure 9: Digital cameras attribute values (a) Four available alternatives (b) Introduction of a fifth alternative N

[Panels: (a) Mixed Logit; (b) General averaging: 3 options; (c) General averaging: 4 options; (d) General averaging: 5 options. Axes in each panel: Megapixels, Optical zoom and Overall Value.]

Figure 10: The implied value surfaces of the mixed logit and the proposed model.