Optimal Conflict in Preference Assessment

Philippe Delquié

INSEAD, Boulevard de Constance, Fontainebleau 77300, France [email protected]

Conflict arises in decision making when the choice alternatives present strong advantages and disadvantages over one another, that is, when the trade-offs involved are large. Conflict affects human response to choice; in particular, it increases decision difficulty and response unreliability. On the other hand, larger trade-offs, i.e., higher conflict, reveal more information about an individual's preferences and mitigate the influence of measurement unreliability on preference model estimation. This suggests, somewhat counterintuitively, that there may exist some optimal level of conflict for efficient measurement of preferences. How to determine this level? This issue is examined from behavioral and analytical angles. We outline a general analysis of the interaction between trade-off size and modeling accuracy, and demonstrate its application on a simple example. The kind of analysis developed here can be conveniently implemented in a computer spreadsheet, and would be especially valuable when large amounts of preference data are to be collected, as in consumer preference studies, experimental research, and contingent valuation surveys.

(Preference Modeling; Utility Assessment; Decision Analysis; Value Trade-off; Response Bias and Error)

1. Conflict in Choice

Conflict arises when the alternatives among which one must choose involve large give and take. This would be the case when alternatives are dissimilar on different attributes, as, for example, when each alternative offers significant advantages and disadvantages over the other. For example, choosing between a job offer in New York and a similar job offer in Paris presents more conflict than choosing between two similar job offers in New York. As another example, the purchase choice between an expensive digital camera and an inexpensive standard camera involves more conflict than the choice between two low-priced standard cameras. The former choice involves a major trade-off between technology and price, whereas the latter does not. In decisions under risk, conflict will be high when one alternative can lead to either much better or much worse outcomes than the other. More generally, if we imagine choice alternatives as points in a multidimensional space of attributes, conflict will be higher when the choice alternatives are more distant in that space, that is, more dissimilar.

Figure 1 illustrates the idea for two-attribute alternatives. In Figure 1, a choice or trade-off between A and D involves a large degree of conflict, whereas B vs. C presents less conflict. Note that the notion of conflict is distinct from that of indifference or undecidedness. For example, all options in Figure 1 might be close to the same indifference curve; they may be valued nearly equally by the individual. Yet, the psychological tension from having to choose between A and D would be greater than that from having to choose between B and C. High conflict is associated with large potential for regret. Behavioral studies of choice have found that conflict affects decision making in several ways. The main finding is that the presence of conflict renders decision making difficult (e.g., Chatterjee and Heath 1996). Fischer et al. (2000b) have focused on conflict within an alternative, that is, an alternative that presents both good and bad aspects. They found that greater conflict increases response difficulty and is associated with higher levels of response error.


Figure 1 Choice Options in a Two-Attribute Space (four options A, B, C, D plotted on Attribute X vs. Attribute Y)

Several studies have found that individuals tended to shun conflict. People seem inclined to avoid conflict-laden choice if they can, by deferring choice, selecting a default option, or maintaining the status quo (Tversky and Shafir 1992, Luce 1998, Dhar and Nowlis 1999). Another important effect of the presence of conflict in choice was investigated by Bettman et al. (1993). These authors studied how individuals adapt their decision strategies within an effort/accuracy framework. In their study, conflict was related to the amount of negative correlation between payoffs of gambles with identical probabilities, that is to say, the extent to which two gambles led to similar (low conflict) or disparate (high conflict) outcomes. They found that higher conflict led to more intensive information processing and the use of more compensatory choice rules. To cope with conflict, subjects resorted to decision strategies closer to normative rules and attempted to make trade-offs to a larger extent. Thus, an interesting finding of Bettman et al.'s (1993) study is that individuals may also react to conflict by getting more involved in the decision task. Although subjects were working harder when faced with conflict, however, their decisions were not more accurate. In contrast to the behavioral literature, standard decision theory ignores the notion of conflict. Whether alternatives present more or less conflict is irrelevant in the normative theory, because issues of cognitive tractability or psychological involvement have no place in that theory.

If decision makers have difficulty expressing reliable preferences when the options they face involve conflict, it would seem that one should avoid large trade-offs in building models of their preferences. However, resolving conflict is the essence of decision making: it is hard, yet unavoidable at one level or another. Some conflict needs to be present for choice to be meaningful and reveal valuable information about an individual's preferences. Another consideration is that the presence of conflict may get decision makers more engaged in resolving a choice dilemma, relying less on suboptimal cognitive shortcuts. There is evidence of this in Bettman et al.'s (1993) findings: their subjects used more sophisticated decision strategies when conflict was higher. Therefore, a prescriptive approach to decision making should push, and insofar as possible help, individuals to face up to conflict to the largest possible extent. The considerations in the preceding two paragraphs suggest that there may exist an optimal amount of conflict that one should submit to individuals' appraisal to obtain rich, reliable representations of their preferences. In this paper, we show how to take this into account to judiciously pick trade-off questions for efficient, reliable preference modeling. In the next section, we examine more closely the relationship between conflict and modeling accuracy. In §3, we develop the analysis on an example to determine the right amount of conflict for assessing a simple preference model. In §4, we summarize the approach and discuss extensions and applications.

2. Conflict, Error, and Preference Modeling Accuracy

Models of value trade-offs and preferences play a central role in decision and policy analysis. In some cases, values have to be elicited from stakeholders to build a model of their preferences. A model structure is specified, and the model's parameters are estimated from individuals' preference responses assessed over a sample of points selected within the domain of interest. The model so calibrated is then used for prescriptive or predictive ends over its domain of application. The preference measures from which models are derived are typically affected by bias and error. Bias refers to systematic effects resulting, for example, from elicitation method, context, cognitive strategies employed to respond to the task, and the interaction thereof.


Bias in preference elicitation has been and continues to be the subject of extensive research (e.g., Hershey et al. 1982, Delquié 1997). If bias is known and well understood, then it can, in principle, be dealt with either by designing appropriate elicitation methods to mitigate it or by accounting for it in the model. For example, the nonlinear treatment of probabilities in decision under risk constitutes a source of bias in calibrating expected utility (EU) models (de Neufville and Delquié 1988). However, nonlinear probability weighting is allowed for, i.e., modeled, in rank-dependent expected utility (RDEU). Therefore, biases due to this effect should be strongly reduced in RDEU models, provided a descriptively correct functional form is selected, of course. The point is that a good understanding of bias is important to (1) the design of sound assessment procedures and proper interpretation of responses, and (2) the proper selection of models. The present work is concerned with error; it is complementary to studies of bias. This is not to say that we ignore the impact of bias altogether in the present work, but rather we will assume that known biases are appropriately factored into the choice of models being assessed. We will return to this issue as part of the analysis presented in §3. Dealing with bias in preference elicitation usually requires focusing on how assessment questions should be asked; by contrast, we focus here on which questions should be asked for efficient modeling. Error refers to inherent random variability in responses. Response error directly translates into model error, and eventually error in all conclusions derived from using the model. To address this concern, for example, Eliashberg and Hauser (1985) examined the implications of measurement error on the estimation of consumers' utility functions. They showed how measurement error translates into probability distributions over the consumer's utility parameters. From these distributions, they could then derive estimators of the probability that an alternative is chosen, which is ultimately the quantity of interest. Response error can have several sources, some of which are briefly reviewed here.

Unreliability in elicitation protocol. Unintended variations in the administration of the assessment procedure can create unreliability in the data. This can be controlled by standardizing elicitation protocols as much as possible, for example, by implementing them on a computer.

Lack of attentiveness. Subjects' fatigue or boredom can lead to unreliable responses. Keeping task complexity, duration, and information load at reasonable levels to reduce this kind of unreliability is part of good experimental practice.

Lack of motivation. Response error can also result from carelessness and low involvement on the part of respondents. Providing incentives designed to increase respondents' motivation is a way to mitigate this type of error.

Inherent lack of precision in preferences. The "fuzziness" of individual preferences is an important source of unexplained response variability. It may result from several factors. First, individuals have limited cognitive and perceptual abilities. Limitations in information processing can be partly overcome through enhanced elicitation strategies such as, for example, decomposition of key assessments into more elementary assessments (Kleinmuntz 1990), or response facilitation mechanisms such as aiding judgment through bracketing choices (Casey and Delquié 1995) or ranking (Laskey and Fischer 1987). In addition, response unreliability may result from individuals' personal uncertainty about their own preferences, lack of confidence in their responses, or mere ambivalence of tastes. Averaging independent, repeated measures is perhaps the most straightforward means of reducing this kind of uncontrolled response variability. Individual uncertainty in preferences can be exacerbated by the presence of conflict, as reviewed in the previous section. Fischer et al. (2000a) developed a model that explains how preference uncertainty can be influenced by both the level of conflict between the attributes of an alternative and how extreme the attribute values are.

Individual differences. Finally, response variability may just reflect individual differences, which are treated as random error whenever data are aggregated across individuals to obtain a descriptive model of the typical or average individual.


Figure 2 Spacing of Assessment Points and Indifference Curve Estimation (two cases, a small trade-off and a large trade-off: the fixed stimulus [X1, Y1], a fixed Y2, and the elicited response x with error εx)

Indeed, the so-called "representative agent" is a standard assumption of economic models (see, e.g., McFadden 1981, Hartley 1996). Differences between individuals are a source of "legitimate" variance, not error in the sense of unreliability. Often, however, we are not interested in the preferences of particular individuals per se, but rather in features or patterns of preference that may exist across individuals in a population. We then assume that all individuals can be characterized by the same parameter, and individuals' deviations from this value constitute error in the statistical treatment of the data pooled across the sample of individuals. The downside of enhanced elicitation schemes and repeated measures as error control mechanisms is that they multiply the amount and complexity of data collection. The approach to error mitigation that we develop here addresses response variance due to the individuals' uncertainty about their own preferences as well as individual differences. The idea is to judiciously select the position of points to be assessed within the domain of interest, based on the recognition that (1) all assessment questions are not equally affected by response error, e.g., questions involving more conflict are more difficult, and (2) parameter estimation is more or less sensitive to response error depending on the relative locations of the assessment points, i.e., the trade-off size.

Preference assessments are often obtained in the form of indifferences among pairs of alternatives. A very general method to elicit indifferences is that of trade-offs, in which the individual has to specify the difference required on one attribute, X, to compensate a given difference on the other attribute, Y. This can take the following form: [X1, Y1] ∼ [x?, Y2], where the values of X1, Y1, Y2 are set by the experimenter, and the value of x is supplied by the respondent so as to equate the two options in preference.1 The response will then define two points of an indifference curve in the (X, Y) space. As the distance between these two points, i.e., the trade-off span, increases, two concurrent forces are liable to influence the error on the estimated indifference curve parameter: (1) response error grows, owing to higher conflict as reviewed above, but (2) the leverage of response error on the model parameter vanishes. Figure 2 provides one possible graphical interpretation for this.

1. The values fixed by the experimenter will be denoted with capital letters, and the responses elicited with small letters.



To obtain more direct evidence on the effect of trade-off size on individual response error, we collected trade-off statements involving different spans from 20 MBA participants. One set of three questions concerned trade-offs between a financial gain and its probability, specifically: a 90% chance of earning $1,000 next month (denoted as [$1,000, 0.90]) vs. a lower probability (resp. 80%, 50%, 10%) of earning an amount larger than $1,000 next month, the complementary probability being always to earn $0. The other set of questions asked the subjects to trade off travel delay incurred in commuting (in minutes) and the probability of such delay. The respondents lived in a large metropolitan area, and they were familiar with the experience of commuting and its associated risks of delay. The six questions, listed in Table 1, were presented on a paper questionnaire with written instructions. Note that the setup of these questions is parallel to that represented in Figure 2. The questionnaire was administered twice, 1 week apart. We were interested in the reliability of individual response as a function of trade-off size. The reliability of each individual's response to a question was measured as the standard deviation of his or her two responses to that question. Table 1 reports the mean of the individuals' responses and the mean of the individual errors for each question, in columns three and four, respectively. The results in Table 1 are based on 18 of the 20 individuals for whom we did obtain complete repeated measures. The fourth column in the table shows that the individual response error is larger when the trade-off is larger. Using medians instead of means shows the same pattern, although less strongly.

Table 1 Trade-off Size, Individual Response Reliability, and Modeling Error

Trade-off size | Indifference elicited [X1, P1] ∼ [x?, P2] | Mean response x̄ | Individual response error (mean of σx) | Parameter estimate ᾱ | Parameter error (mean of σα)

Financial gain vs. probability:
Small  | [$1,000, 0.90] ∼ [x?, 0.80] | 1,830  | 618    | 0.481 | 0.238
Medium | [$1,000, 0.90] ∼ [x?, 0.50] | 6,042  | 3,634  | 0.477 | 0.180
Large  | [$1,000, 0.90] ∼ [x?, 0.10] | 27,972 | 23,217 | 0.537 | 0.099

Travel delay vs. probability:
Small  | [10 min., 0.90] ∼ [x?, 0.80] | 15.1 | 2.4  | 0.638 | 0.187
Medium | [10 min., 0.90] ∼ [x?, 0.20] | 38.6 | 12.5 | 0.732 | 0.177
Large  | [10 min., 0.90] ∼ [x?, 0.05] | 83.5 | 48.1 | 0.584 | 0.132

If response error increases with trade-off size, but its leverage on model error decreases with trade-off size, then the net result of these competing influences could be anything a priori: model parameter error could increase, decrease, or follow a nonmonotonic pattern as trade-off magnitude increases. Thus, the accuracy of a preference model estimation may depend in nonobvious ways on the spacing of elicitation points. This point is essentially illustrated in the rightmost column of Table 1. This column shows the mean individual error in the estimates of a power utility function parameter at each trade-off size. Let us briefly explain how these were obtained. Consider the risky financial gain questions: for each individual, we have six indifference points, two responses to each of three questions. Assume that these six preference measures for each subject can be adequately described by the following RDEU model:2

U(X, P) = [P^γ / (P^γ + (1 − P)^γ)^(1/γ)] · X^α.

Here, we are interested in finding best estimates of α and γ for each subject individually and, more particularly, the reliability with which we can estimate α. If we know the value of γ for an individual, we can easily calculate six estimates of α from the six responses.

2. More details and references will be provided on this in §3. An EU model does not provide an adequate fit over the whole probability × outcome domain covered by the six responses here, due to nonlinear weighting of probabilities. This is consistent with this author's own previous work showing that EU functions derived from outcome responses to lottery equivalent questions depend systematically on the probabilities used in the lotteries (see Experiment 6 in Contingent Weighting of the Response Dimension in Preference Matching (1989), PhD dissertation, MIT, Cambridge, MA, for a direct test of this).



To fit the above model, we adjusted the value of γ for each subject individually so as to minimize the standard deviation of the six α estimates for this individual. Thus, for each subject, we ended up with a unique γ estimate and a tightest set of α estimates, of which we can take the mean to describe the subject's six responses. The mean value of the γ's so calculated for the 18 subjects is 0.61 for the financial lottery data. The mean values of the α estimates appear in column five of Table 1; they seem rather consistent across the three trade-off sizes. Now, consider the error in pairs of α's estimated from the two responses to the small, medium, and large trade-offs, respectively; the means of these individual errors are listed in the last column of Table 1. As can be seen there, the average individual error decreases with trade-off size. The risky travel delay data were processed in the same way, with the slight difference that the utility function is now −X^α, because it must be a decreasing function of travel delay. The mean of the γ estimates from the 18 individuals is now 0.36, suggesting, incidentally, that the probability weighting function may be context-dependent. Again, the mean estimates and errors in α are given in columns five and six of Table 1, showing the same effect as for the financial risk preferences. To conclude, the data of Table 1 show an example where response error increases with trade-offs, yet parameter estimation error decreases, at the individual level.
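To make this fitting procedure concrete, here is a minimal sketch in Python. It illustrates the grid-search idea described above and is not the author's original code; the six responses shown for a single hypothetical subject are invented for the example.

```python
import numpy as np

def w(p, gamma):
    """RDEU probability weighting: p^g / (p^g + (1 - p)^g)^(1/g)."""
    return p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma)

def alpha_estimates(responses, gamma, X1=1000.0, P1=0.90):
    """One alpha per response, from the indifference w(P1)*X1^a = w(P2)*x^a."""
    return np.array([np.log(w(P2, gamma) / w(P1, gamma)) / np.log(X1 / x)
                     for P2, x in responses])

# Two repetitions of each question [$1,000, 0.90] ~ [x?, P2] for one hypothetical subject.
responses = [(0.80, 1700), (0.80, 1950), (0.50, 5200),
             (0.50, 6800), (0.10, 24000), (0.10, 32000)]

# Pick gamma so that the six alpha estimates are as tight as possible (smallest
# standard deviation), then describe the subject by the mean of that tightest set.
grid = np.linspace(0.30, 1.00, 141)
gamma_hat = min(grid, key=lambda g: alpha_estimates(responses, g).std())
alphas = alpha_estimates(responses, gamma_hat)
print(gamma_hat, alphas.mean(), alphas.std())
```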

The data in Table 2 provide another case in point, with a different setup and response mode. These data were previously collected as part of a study of utility assessment biases (de Neufville and Delquié 1988). Again, [X, P] denotes a binary lottery offering a P chance of receiving $X. In the four indifferences shown in Table 2, the trade-off span varies systematically: X1 is set increasingly further from X2, while P1 = 0.90 and X2 = 10,000 remain constant. This configuration of trade-offs of varying sizes is different from that of Figure 2 and Table 1, and the response is now P instead of X. Table 2 shows the means and standard deviations of the responses obtained from 25 subjects. It also shows the standard deviation of the exponent α of a power EU function, U(X, P) = P · X^α, fitted to each individual response.3

3. EU, which assumes linear weighting of probability, provides an adequate descriptive model here. This is supported by the findings of de Neufville and Delquié (1988) that lottery equivalent questions with a probability response mode, such as those of Table 2, led to response data overall consistent with EU. This was indeed one of the key results of their study. The issue of model adequacy is further addressed in §3 by way of sensitivity analysis.


Table 2 Response Error vs. Model Parameter Error

Indifference elicited [X1, P1] ∼ [X2, p?] | Mean response p̄ | Response error σp | Parameter error σα
[8,000, 0.90] ∼ [10,000, p?] | 0.81 | 0.11 | 0.94
[6,000, 0.90] ∼ [10,000, p?] | 0.75 | 0.13 | 0.56
[4,000, 0.90] ∼ [10,000, p?] | 0.69 | 0.17 | 0.40
[2,000, 0.90] ∼ [10,000, p?] | 0.51 | 0.24 | 0.44

As can be seen in Table 2, the dispersion of responses, σp, increases with wider trade-off span. This may be attributed to higher individual response error as well as individual differences, which will appear more clearly with larger trade-offs. As a matter of fact, if two individuals have different utility functions, that is, distinct indifference curves going through [X1, Y1], their respective indifference curves will diverge by moving away from that point and, accordingly, the difference between these individuals' responses will become larger. While σp increases, however, the dispersion of the estimated α values, σα, actually decreases and, perhaps, goes through a minimum for a setting of X1 around 4,000. This suggests that if the estimation of α for this sample of subjects was to be based on a single assessment question, the value of X1 should be set around 4,000 to minimize calibration error. We now turn to a more formal analysis of how to determine an appropriate level of conflict in assessing trade-offs.

3. Finding an Optimal Level of Conflict

The analysis here concerns the elicitation of preferences in the form of simple trade-offs between two attributes, as reviewed above. Trade-off methods can be used to assess multiattribute utility and value models with any number of attributes, as well as expected or nonexpected utility functions.



In the case of decision under risk, the two dimensions to be traded off are a risky outcome X and its associated probability P (the complementary probability 1 − P being associated with a fixed reference outcome, such as the status quo). Moreover, one of the probabilities could be equal to 1, thereby producing as special cases the well-known certainty equivalence and probability equivalence methods (cf. Hershey et al. 1982). Thus, the analysis that follows applies to a wide range of commonly used preference models and assessment methods.

General Analysis

The selection of the fixed values X1, Y1, Y2 is often based on pragmatic considerations such as the plausibility of these values to the respondent, but it remains fairly arbitrary within the relevant ranges of attributes. For given values of the stimulus point [X1, Y1], a whole range of possible settings for Y2 can be considered: Y2 could be set close to Y1 or farther away from it, producing responses correspondingly near or far from X1, as shown in Figure 2. To examine the relationship between trade-off size, response error, and model parameter error, we need to specify the mathematical form of the preference model and assume the existence of "true" values for the model parameters. This is consistent with any practical application of preference modeling: a model is selected that is deemed a priori to be a good descriptor of reality, and the parameters characterizing the model are assumed to have true values that can be estimated, albeit imperfectly, that is, with error (see, e.g., Eliashberg and Hauser 1985, p. 3). Here, we consider the determination of one parameter of a preference model. This could be, for instance, the parameter characterizing an EU function such as the commonly used power utility, u(X) = X^α, or exponential utility, u(X) = 1 − e^(−cX); or the relative weight of attributes, k, in an additive multiattribute model U(X, Y) = k·vx(X) + (1 − k)·vy(Y). The analysis below is not restricted to one-parameter models, however. It can be applied to models with multiple parameters as long as these parameters can be estimated individually, e.g., as in a multiattribute model with n ≥ 2 attributes. This will be further discussed in §4.

For any given setting of fixed assessment values, a value of the model parameter α can be uniquely estimated from a response x; in other words, we can write:4

α = f(x).    (1)

Now, consider a relatively small error εx from the "true," error-free response x0, i.e., x = x0 + εx. Neglecting second-order terms in the Taylor expansion of (1) above, the resulting error in α will be

εα = εx × f′(x0) = εx × dα/dx (x0).    (2)

Imagine a whole distribution of εx's around x0 with standard deviation σx. According to (2), the resulting distribution of εα will have standard deviation σα given by

σα = σx · |dα/dx (x0)|.    (3)

Equation (3) clearly shows that parameter error σα is the product of two factors: σx, the response error, and dα/dx (taken at x0), which measures the influence or leverage of response error on the estimation of α for the particular setting of fixed values adopted. Larger values of dα/dx mean greater sensitivity of α to error in x.5 Thus, Equation (3) suggests that reductions in σα could be achieved by reducing either or both of these two factors. However, it may not be possible to reduce one of the factors without increasing the other by adjusting trade-off range. As a matter of fact, response error σx will increase with trade-off size, while the leverage dα/dx decreases with trade-off size. This is illustrated in Figure 2, showing that the same error εx in response will cause a wider shift in the indifference curve, hence the parameter characterizing it, when the trade-off is assessed over a narrower range.

4. α could be an implicit function of x, i.e., f(α, x) = 0.

5. Equation (3) describes the relationship between response error and parameter error, but it makes no claim as to the actual locus of error. It could be that response error reflects, in part, error in a more fundamental parameter that characterizes preferences. However, since only responses (x) and response error are observable in building a model, we can take error as being entirely inherent in responses. See Laskey and Fischer (1987) for further discussion on the psychological locus of error, whether in the observable responses or the unobservable underlying preferences.



Note that this results purely from the geometric structure of the problem; it is independent of any particular model of the indifference curve. Therefore, it will be necessary to examine the resulting pattern of model error σα as a function of trade-off size to identify a setting that will yield a low or minimum (if one exists) σα. Although the pattern of σα could have any shape a priori, depending on the relative strengths of σx and dα/dx, a rudimentary analysis may be sufficient to identify an approximate setting of the fixed values that would produce lower σα. The analysis entails (1) determining the pattern of σx as a function of trade-off size, and (2) determining the pattern of dα/dx as a function of trade-off size, and then multiplying the two. Next, we carry out this analysis on a simple example.
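Before turning to the example, note that the two factors in Equation (3) can also be combined numerically rather than by hand. The following minimal sketch (Python; the stimulus values, the trial response x0, and the response error fed to it are purely illustrative assumptions, not values from the paper) propagates a response error through any response-to-parameter mapping using a finite-difference estimate of the leverage dα/dx, which avoids deriving the closed-form derivative by hand:

```python
import math

def parameter_error(alpha_of_x, x0, sigma_x, h=1e-4):
    """Eq. (3): sigma_alpha ~ sigma_x * |d(alpha)/dx| at the error-free response x0,
    with the leverage taken by a central finite difference."""
    leverage = abs(alpha_of_x(x0 + h) - alpha_of_x(x0 - h)) / (2 * h)
    return sigma_x * leverage

# Illustrative use with the power EU model of the worked example that follows:
# alpha = log(P2/P1) / log(X1/x) for an indifference [X1, P1] ~ [x, P2].
X1, P1, P2 = 1000.0, 0.90, 0.50
alpha_of_x = lambda x: math.log(P2 / P1) / math.log(X1 / x)
print(parameter_error(alpha_of_x, x0=3000.0, sigma_x=800.0))
```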

A Worked-out Example

As part of an experiment reported elsewhere (Study 1 in Delquié 1993), two indifferences involving the same stimulus were obtained from 15 subjects, as shown in Table 3. Table 3 shows that the dispersion of responses, σx, is larger as P2 is set farther from 0.90, at 0.80 and 0.67, respectively.

Table 3 Two Trade-offs with Different Spans

Indifference elicited [X1, P1] ∼ [x?, P2] | Mean response x̄ | Standard deviation σx
[1,000, 0.90] ∼ [x?, 0.80] | 1,419 | 347
[1,000, 0.90] ∼ [x?, 0.67] | 2,518 | 1,242

Now, suppose we wish to assess a utility function for monetary outcomes by eliciting a single indifference between two simple lotteries, as follows: [$1,000, 0.90] ∼ [x?, P2]. Furthermore, we would like to represent this utility function with a power EU model, U(X, P) = P · X^α. Later, in a sensitivity analysis, we will relax this assumption and consider the more general RDEU form. The question we seek to answer here is: How should we set P2 (lower than 0.90) so as to obtain a reliable estimate of α? For example, consider these two extreme possibilities:
(1) For what dollar amount x would you be indifferent between [$x, 0.89] and [$1,000, 0.90]?


(2) For what dollar amount x would you be indifferent between [$x, 0.01] and [$1,000, 0.90]?
These two trade-off questions correspond to extreme settings of P2, and their configuration is parallel to that depicted in Figure 2, with X on the horizontal axis and P on the vertical axis. In the first case, with P2 set close to 0.90 at 0.89, the response x will be correspondingly and predictably close to 1,000, in fact, slightly more than 1,000 because 0.89 is slightly less than 0.90. The typical error in x may be of the order of, say, 100. At the limit, if we set P2 = P1 = 0.90, the response would have to be exactly x = X1 = 1,000, therefore having no error at all (barring error due to such things as the subject misreading stimulus values, or not understanding the question). However, the response in that case would be totally uninformative, indeed, useless for estimating a model. By contrast, in the second case, when P2 is set at 0.01, a value quite distant from 0.90, the response x is clearly expected to be much larger (in fact, greater than $90,000 for a risk-averse respondent) and accordingly more widely distributed. The typical error in x may now be of the order of tens of thousands. As the gap between P2 and P1 increases, the trade-off spans a wider range and response variance grows. With the model we have chosen, an indifference response yields the following equation:

P1 X1^α = P2 x^α.    (4)

Therefore, α is readily derived from a response x as follows:

α = log(P2/P1) / log(X1/x).    (5)

If x contains random error, then to a first-order approximation, the resulting error in α will be as in Equation (3) again:

σα = σx · |dα/dx (x0)|,

where x0 denotes the error-free response. Now, to find how σα varies as a function of the choice of P2, we need to determine how both σx and dα/dx (x0) vary as a function of P2. We examine them in turn.


Size of Response Error. How does response error σx vary as a function of the setting of P2? In Table 3, we have only two values for σx from which to estimate such a function; however, the form of this function (other than the fact that it increases as P2 decreases from 0.90) is unknown, so it will have to be hypothesized. We can always start with the behavioral assumption that response error σx may be roughly proportional to the magnitude of the response itself. This is consistent with "Weber's Law," which states that the psychological discrimination of a variation in stimulus is proportional to the stimulus magnitude (see, e.g., Shepard 1981). In the absence of error, the response would be, from Equation (4), x0 = X1 (P1/P2)^(1/α0), where α0 is the error-free value of the parameter we seek to estimate. The magnitude of the response here can be taken as x0 − X1, because it is the distance (between x0 and X1) to be bridged by the trade-off response that is relevant here, not the absolute size of x0 per se. Therefore, we may take response error σx as proportional to x0 − X1 = X1 [(P1/P2)^(1/α0) − 1], that is,

σx(P2) = K · 1000 [(0.9/P2)^(1/α0) − 1],    (6)

where K is a positive constant and P2 ≤ 0.9. Note that this implies the logical limit condition σx = 0 for P2 = 0.9, that is to say that responses to the question [1000, 0.90] ∼ [x?, 0.90] would have no variance. This is reasonable because the only possible response value is x = 1000 in this case. Also note that Equation (6) applies for P2 ≤ 0.9 only, that is, it was derived by considering how far to set P2 below P1 = 0.90. Should we be interested in setting P2 above P1, then the magnitude of response error would be taken as X1 − x0 = X1 [1 − (P1/P2)^(1/α0)], and the resulting error function would be σx(P2) ∝ 1000 [1 − (0.9/P2)^(1/α0)], which increases as P2 departs above 0.9. The basic phenomenon captured in either the preceding equation or Equation (6) is that indifference curves fan out away from the fixed point [X1, P1] (see Figure 2) whether we depart from this point above or below. The value 1/α0 = 2.5 accounts well for the limited data we have in Table 3 on σx(P2): the function σx(P2) = 1000 [(0.9/P2)^2.5 − 1] of Equation (6) is plotted in Figure 3a. This is meant to represent but one example of a plausible σx(P2) function. As part of a sensitivity analysis (see below), we considered other values of 1/α0 as well as other functional forms for σx(P2).

Figure 3a Size of Response Error as a Function of Trade-off Size (σx(P2) plotted against the setting of P2, which controls trade-off size)

We now turn to the pattern of dα/dx (x0), the influence of response error on parameter error as a function of P2.

Leverage of Response Error. To assess the impact of response error, we need to examine the derivative of (5) at x0, which is

dα/dx (x0) = log(P2/P1) / [x0 (log(X1/x0))^2].    (7)

The true response, x0, is, of course, itself a function of the choice of P2 as we have seen above: x0 = X1 (P1/P2)^(1/α0). Substituting this in Equation (7) and taking the absolute value (again keeping in mind that P2 < P1 in the present case, i.e., log(P2/P1) < 0), we end up with

L0(P2) = |dα/dx (x0)| = α0^2 (P2/P1)^(1/α0) / [X1 log(P1/P2)].    (8)

In Equation (8), L0(P2) describes the leverage of response error on the estimate of α, regarded as a function of P2. It is plotted in Figure 3b, showing how this leverage decreases as a function of trade-off span, and eventually attenuates as P2 is set farther away from P1 = 0.90. In the graph, α0 is taken to be 0.55, which was the mean value obtained in that experiment; other choices of α0 yield the same decreasing pattern, more or less accentuated.


Figure 3b Leverage of Response Error as a Function of Trade-off Size (L0(P2) plotted against the setting of P2, which controls trade-off size)

Resulting Model Error. It is now easy to obtain the pattern of σα resulting from the product of the two functions of Figures 3a and 3b, that is, σα(P2) = σx(P2) × L0(P2), which is plotted in Figure 3c. According to that graph, a minimum for σα occurs around P2 = 0.25, suggesting this value as an optimal setting of P2 for minimizing model error. The robustness of this conclusion, however, should be assessed by investigating its sensitivity to the various assumptions made.
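The pattern of Figure 3c is easy to reproduce numerically. The sketch below (Python; a companion to the spreadsheet implementation suggested in §4, not the author's original computation) evaluates Equations (6) and (8) on a grid of P2 values, using the exponent 2.5 from Figure 3a for the error growth and α0 = 0.55 in the leverage; the constant K is set to 1, since it does not affect where the minimum falls.

```python
import numpy as np

X1, P1 = 1000.0, 0.90
alpha0 = 0.55          # assumed "true" parameter, as in Figure 3b
growth_exp = 2.5       # error-growth exponent fitted to Table 3, as in Figure 3a

def sigma_x(P2, K=1.0):
    """Equation (6): response error grows with the span of the trade-off."""
    return K * X1 * ((P1 / P2) ** growth_exp - 1)

def leverage(P2):
    """Equation (8): |d(alpha)/dx| evaluated at the error-free response x0."""
    return alpha0**2 * (P2 / P1) ** (1 / alpha0) / (X1 * np.log(P1 / P2))

P2 = np.linspace(0.05, 0.85, 161)
sigma_alpha = sigma_x(P2) * leverage(P2)          # Equation (3)
print("lowest model error near P2 =", P2[np.argmin(sigma_alpha)])
# The curve is shallow: settings of P2 roughly between 0.2 and 0.3 do almost
# equally well, consistent with the minimum of about 0.25 read off Figure 3c.
# Replacing sigma_x with a linear growth rule, or P with a weighting function
# w(P), reproduces the sensitivity cases summarized in Table 4 below.
```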

Figure 3c Model Error as a Function of Trade-off Size (σα(P2) plotted against the setting of P2, which controls trade-off size)

Figure 4 Model Error as a Function of Trade-off Size When Response Error Grows Linearly with Trade-off Size (σα(P2) plotted against the setting of P2)

Sensitivity Analysis. We examined the sensitivity of the above result with respect to the shape of the σx(P2) function and the choice of α0. We used power (as above) and linear functions to represent how response error σx grows as a function of trade-off size, and values ranging from 0.35 to 0.7 for α0. In many cases, the pattern of σα does not present a minimum but, instead, continuously decreases with a more or less pronounced "elbow." Figure 4 shows a typical example of this pattern, resulting from response error that grows linearly with trade-off size: σx(P2) = 4500 (0.9 − P2) for P2 ≤ 0.9. In those cases, it seems that P2 should be set as far away from 0.90 as possible to reduce σα. However, widening the trade-off range, i.e., 0.90 − P2, beyond a certain point appears to provide little benefit in terms of further reduction in σα, while it may substantially increase trade-off difficulty for the respondents. For instance, relatively little reduction in σα is gained beyond, say, P2 = 0.5 or 0.4 in Figure 4. Thus, it would seem appropriate to select P2 close to the "elbow" value whenever σα shows a pattern similar to that in Figure 4. Another form of sensitivity analysis concerns the choice of the model, in particular, the possible presence of probability distortion, which would create bias in the EU model used here. To address this, we repeated the above analysis using a RDEU model with probability weighting w(p) = p^γ / [p^γ + (1 − p)^γ]^(1/γ). Tversky and Kahneman (1992) and several other researchers, including Wu and Gonzalez



(1996), Bleichrodt and Pinto (2000), and Abdellaoui (2000), all fitted this probability weighting function to their experimental data, and found values for γ ranging from about 0.6 to 0.7 (see Table 1 in Bleichrodt and Pinto (2000) for a summary). Note that for γ = 1, the model reduces to EU, i.e., the case examined above. We considered values for γ ranging from 0.85 to 0.55. Using a proportional response error growth as in Equation (6), the resulting pattern of σα(P2) is similar to that of Figure 3c, but the minimum occurs at lower values of P2. For example, when γ = 0.85, minimum model error occurs for P2 = 0.22, and the location of this minimum gradually shifts to occur at P2 = 0.10 as γ decreases to 0.55. For the case of response error growing linearly with trade-off size, the results are qualitatively similar to Figure 4, again with the desirable setting of P2 occurring at lower values when γ is lower. Table 4 provides a summary of the recommended setting of P2 for these different cases.

In summary, our sensitivity analysis indicates overall that σα decreases rapidly in the beginning as P2 departs from 0.90, and then may go through a minimum as in Figure 3c, or continue decreasing as in Figure 4. Looking across the different cases we examined, we found that P2 should not be set above 0.40, and there is little advantage, or possibly a penalty, in setting it below 0.15 or 0.10. Considering the approximations of our analysis, our conclusion is that any choice of a P2 value in the 0.15–0.40 range would be justifiable to achieve reduced error in α. Let us now summarize, and then underline practical implications and extensions of the idea developed here, as regards the assessment of preference models.

Table 4 Sensitivity Analysis Summary: Recommended Setting of P2 Under Various Assumptions

Probability weighting parameter γ | Response error growth: Proportional | Response error growth: Linear
1.00 (EU) | 0.25 | ≤ 0.50
0.85      | 0.22 | ≤ 0.35
0.75      | 0.20 | ≤ 0.25
0.65      | 0.15 | ≤ 0.10
0.55      | 0.10 | ∗

Note. All results in Table 4 correspond to α0 = 0.55 for calculating L0(P2). ∗ A steadily decreasing pattern of σα suggests setting P2 close to 0.


4. Applications of the Concept of Optimal Conflict

The kind of analysis presented above may have useful applications in the design of individual preference surveys such as those used in environmental policy, urban and transportation planning, or marketing, as well as utility assessment questionnaires. It can also be helpful to behavioral researchers in planning preference elicitation experiments. The whole procedure can be summarized as follows. Before designing the full-scale data collection on a particular assessment question:
(1) Determine the anticipated pattern of response error, σx, as a function of the trade-off span. This can be done on the basis of preliminary tests, an educated assumption, or a combination of the two. In the absence of an educated guess, assuming that response error will be roughly proportional to response magnitude, as predicted by Weber's Law, may provide an appropriate starting point. The role of conflict, as above, and boundary effects, as discussed below, should also be considered. The key idea is to anticipate the plausible pattern of response error as a function of trade-off span.
(2) Derive the form of response error leverage, dα/dx, and regard it as a function of trade-off span. This operation is a purely mathematical derivation performed on the preference model selected. No data or assumptions are required for this other than a guess of the true value of the parameter, which may also be based on preliminary tests if any are available.
(3) Multiply the two above to derive the pattern of model parameter error as a function of assessment setting(s), which is straightforward.
(4) In the pattern obtained in (3) above, identify assessment settings that yield low or minimum model error. If the pattern has an "elbow," then a good choice may be the elbow value.
(5) Conduct sensitivity analysis with respect to the assumptions made in (1) and (2) about response error pattern, true parameter value, and possibly model form to ascertain the choice of assessment settings.
The calculations in the procedure outlined above can be conveniently implemented in a computer spreadsheet, enabling quick and easy sensitivity analysis. The essential ideas can also be implemented in a less analytical, more empirical fashion by Monte Carlo simulation, as sketched below. This involves formulating plausible response error distributions for different settings of trade-off size and obtaining the resulting model error distributions through simulation. This should be fairly easy to implement on a spreadsheet with add-in simulation capabilities, and avoids the technical difficulties of deriving Equations (6) and (8).
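As one concrete illustration of this Monte Carlo route (written in Python rather than as a spreadsheet add-in; the normal error shape, the error-growth rule patterned on Figure 3a, and its scale K are assumptions made for the example, not values from the paper), the sketch below simulates noisy responses at several settings of P2 and reports the resulting spread of the α estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
X1, P1, alpha_true = 1000.0, 0.90, 0.55        # assumed "true" preferences

def sigma_x(P2, K=0.1):
    """Assumed response-error rule, patterned on Figure 3a; K only sets the scale."""
    return K * X1 * ((P1 / P2) ** 2.5 - 1)

def simulated_sigma_alpha(P2, n=20000):
    """Spread of the alpha estimates implied by noisy responses (power EU, Eq. (5))."""
    x0 = X1 * (P1 / P2) ** (1 / alpha_true)     # error-free response
    x = rng.normal(x0, sigma_x(P2), size=n)
    x = x[x > X1]                                # keep admissible responses only
    alpha = np.log(P2 / P1) / np.log(X1 / x)
    return alpha.std()

for P2 in (0.80, 0.60, 0.40, 0.25, 0.10):
    print(P2, round(simulated_sigma_alpha(P2), 3))
```

Under these particular assumptions, the simulated error is lowest for intermediate settings of P2, echoing the analytical pattern of Figure 3c; changing the error rule or its scale changes the picture, which is precisely the kind of sensitivity check the procedure calls for.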


The cost of the procedure consists essentially of preliminary analytical work, relying on prior data or judgment and sensitivity analysis. The benefits of the approach will be a reduced amount of data collection for any desired level of model accuracy, or conversely increased model accuracy for any given data quantity. In a similar vein, Huber and Zwerina (1996) show that the efficiency of choice designs in marketing can be substantially improved by using similarly valued alternatives in choice sets. Our analysis will be especially valuable whenever large amounts of data are needed, as in consumer preference studies, experimental decision research (in particular, selection of stimuli values for efficient hypothesis testing), or contingent valuation surveys in which preference assessment questionnaires may be administered to hundreds of individuals (Gregory et al. 1993). In a sense, the idea developed here is much like the analysis of statistical power in experiment design: a little preplanning that may avoid data waste later.

Extensions

Case of Multiple Parameters. An analysis similar to the above can be used for multiparameter models in which the parameters can be sequentially assessed. Consider, for example, the so-called multiplicative MAU model, K·U(X1, ..., Xn) + 1 = ∏(i=1..n) [K·ki·ui(Xi) + 1], which holds under certain conditions of preference independence among attributes (Keeney and Raiffa 1976).6 This fairly general, widely used model can be calibrated with 2n independent parameters: each of the attributes' scaling coefficients, ki, and one parameter for each of the single-attribute utility functions, ui. Clearly, the analysis developed in §3 can be applied to optimize the estimation of each of the n parameters

6. The additive model, U(X1, ..., Xn) = Σi ui(Xi), is structurally a special case of the multiplicative model.

characterizing the single utility functions. In addition, the analysis can be used in measuring all but two of the scaling coefficients, ki. As a matter of fact, the scaling coefficients can be assessed by using successive trade-offs among pairs of attributes. This begins with assessing two independent trade-offs between, say, X1 and X2 (all other attributes being held at fixed levels), which allows one to solve for k1 and k2 simultaneously. For these first two parameters, the analysis presented in §3 is not applicable as such, because k1 and k2 are both simultaneous unknowns. However, once k1 (or k2) is known, ki, i ≥ 3, can be measured through a simple pairwise trade-off between Xi and X1 (or X2). Now, ki and k1 characterize the slope of the indifference curve between Xi and X1, and the size of the trade-off between these two attributes can be controlled to optimize the measurement of ki. See Keeney and Raiffa (1976) for an in-depth coverage of multiattribute model assessment, and Delquié and Luo (1997) for special issues in assessing the weights of MAU models with trade-off questions. In sum, the analysis presented here is applicable to the estimation of 2n − 2 of the 2n model parameters.
For models in which two or more parameters have to be jointly estimated from a set of preference data (e.g., by solving simultaneous equations or regression analysis), expressing overall model error as a function of response errors and trade-off sizes requires a much more complex mathematical analysis. In some cases, it may even be impossible to express this relationship analytically as we did in Equation (3). Simulation analysis might then be a more promising way of exploring the implications of stimuli choices on modeling accuracy. The central idea developed in this paper, i.e., striking a balance between response reliability and its impact on modeling accuracy, still provides a guiding principle in deciding how to set assessment questions.

Case of Multiple Assessments for One Parameter. In some cases, several indifference points will be measured to construct one indifference curve. The question that would naturally arise in view of the previous analysis is how these points should be spaced or, equivalently, how many points should be assessed within the range covered by the indifference curve for error-efficient model calibration. Assessing fewer,


more spaced-apart points will mitigate the influence of response error on model error. Conversely, assessing more points will provide advantages in terms of more reliable responses and more data. Achieving a best compromise of these two competing effects can be mathematically formulated as an optimization problem: that of minimizing model parameter error with the number of (equally spaced) assessments as decision variable. Such an analysis is beyond the purpose of the present paper, but is conceptually parallel to the simpler analysis demonstrated here. The results from the present analysis suggest that spacing out the assessment points may generally provide advantages in terms of model error reduction. Another qualitative consideration is the more intangible benefit of mobilizing decision makers to confront difficult trade-offs and, therefore, go deeper in their introspection of preferences, as we remarked in §1. Therefore, and this is a qualitative conjecture, we might expect some benefits to spreading out the assessment points.

Case of Bounded Response Domain. Sometimes the response domain has natural bounds, such as [0, 1] for probabilities or proportions, or [0, 365] for days per year. In such cases, response error is unlikely to increase monotonically with trade-off size, due to boundary effects. As a first remark, let us point out that it is better to avoid bounded response domains whenever bounds are liable to constrain subjects' responses. The difficulty can be illustrated by the following example of a trade-off between simple gambles: [$5,000, 0.50] ∼ [$800, p?]. For those individuals preferring a 50% chance to win $5,000 to the certainty of $800, the above trade-off question will be impossible to resolve because p is bounded by 1. It is often possible to avoid bounded response domains by using an unbounded variable as response, transforming the bounded variable, or adjusting the fixed values of alternatives so that the responses will be distributed far enough from the boundary not to be affected by it. Now, if boundary effects are present, the analysis developed in §3 can still be carried out, but this necessitates, of course, determining the appropriate pattern of response error as a function of question setting. We may expect the response error distribution to become narrower and skewed as responses get distributed closer to the boundary. For instance,

looking back at Table 2, if X1 was set at low values, we would expect the mean response p to get smaller, i.e., closer to 0, and, correspondingly, the distribution of response error to get tighter. Therefore, response error will initially increase with trade-off size as in Figure 3a, but then begin to decrease at some point as the response distribution approaches the boundary. How can we specify this pattern? Clearly, the best approach would be to conduct pretests to estimate this pattern empirically. Another possible approach is to judge the expected range of responses, that is, minimum and maximum responses corresponding to different question settings. The analyst will often be able to judge a likely reasonable range of responses for a particular question setting by introspection, for example. If the response range is well correlated with the standard deviation, which is usually the case for "well-behaved" distributions, then it can be used as an acceptable proxy for σx in the analysis proposed in §3. Indeed, recall from Equation (3) that the pattern of response error σx needs to be determined only up to a multiplicative constant, because we are only interested in identifying a minimum in σα. Thus, the range or, say, a 90% confidence range of responses could be used instead of the standard deviation in Equation (3), as a first approximation,7 to carry on with the analysis. This, of course, should be subjected to careful sensitivity analysis as emphasized before.

7. This substitution would be exact if the confidence range remained a constant multiple of the standard deviation as the distribution widens or tightens, as is the case for the normal distribution, for example.
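For instance, assuming an approximately normal response distribution (an assumption made for this illustration, not a claim from the paper), a judged central 90% range converts to a standard deviation proxy as follows:

```python
def sd_from_90pct_range(low, high):
    """Approximate response sd from a judged central 90% range, assuming a roughly
    normal response distribution (its central 90% range spans about 3.29 sd)."""
    return (high - low) / 3.29

# e.g., a probability response judged "very likely between 0.55 and 0.85"
print(sd_from_90pct_range(0.55, 0.85))   # about 0.09, usable as a proxy for sigma_x
```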

5. Conclusion

We considered the interaction between trade-off size (conflict) and the precision of model estimates (sensitivity to response error). Our purpose here was to sketch a general approach to this issue and demonstrate it on one simple example. A similar analysis could be conducted on a great variety of other cases to help select assessment settings efficiently rather than ad hoc. Not all cases or assessment configurations are readily amenable to the type of analysis we demonstrated. However, the analyst usually has flexibility in designing preference measurement questions, and assessment problems can often be formulated so as to parallel the kind of configuration we examined here.



For computer-supported assessment procedures, it is even possible to implement this error control mechanism "on line": based on a few early responses, the computer would dynamically determine assessment settings expected to result in low model error and formulate subsequent questions accordingly. The method proposed here rests on two main ideas: (1) a behavioral observation (that response error may vary with the setting of elicitation questions owing, e.g., to conflict), and (2) a geometric property of value trade-off curves (that they are more informative and less sensitive to error when decision alternatives are less similar). Both ideas appear to be general, that is, largely independent of the decision-theoretical framework retained. Therefore, the concept demonstrated here should be widely applicable in preference modeling. Even if a complete analysis as conducted here is not practical or desirable, the central idea can serve as a guiding principle in elaborating preference elicitation questions. As a closing remark, the present work provides a clear illustration of how a good understanding of behavioral and cognitive issues in decision making can be exploited to improve upon the methods and practice of prescriptive decision analysis.

Acknowledgments

The author thanks the associate editor and three anonymous referees for careful, constructive reviews. He also acknowledges Parmanand Dharwadkar, Marie-Edith Bissey, George Wu, Miguel Brendl, Pierre Chandon, and Tim Van Zandt for some comments on various drafts of this paper.

References

Abdellaoui, M. 2000. Parameter-free elicitation of utility and probability weighting functions. Management Sci. 46 1497–1512.
Bettman, J., E. J. Johnson, M. F. Luce, J. W. Payne. 1993. Correlation, conflict, and choice. J. Experiment. Psych.: Learn. Memory, Cognition 19 931–951.
Bleichrodt, H., J. L. Pinto. 2000. A parameter-free elicitation of the probability weighting function in medical decision analysis. Management Sci. 46 1485–1496.
Casey, J. T., P. Delquié. 1995. Stated vs. implicit willingness-to-pay under risk. Organ. Behavior Human Decision Processes 61 123–137.
Chatterjee, S., T. B. Heath. 1996. Conflict and loss aversion in multiattribute choice: The effects of trade-off size and reference dependence on decision difficulty. Organ. Behavior Human Decision Processes 67 144–155.
Delquié, P. 1993. Inconsistent trade-offs between attributes: New evidence in preference assessment biases. Management Sci. 39 1382–1395.
Delquié, P. 1997. Bimatching: A new preference assessment method to reduce compatibility effects. Management Sci. 43 640–658.
Delquié, P., M. Luo. 1997. A simple trade-off condition for additive multiattribute utility. J. Multi-criteria Decision Anal. 6 248–252.
Dhar, R., S. Nowlis. 1999. The effect of time pressure on consumer choice deferral. J. Consumer Res. 25 369–384.
Eliashberg, J., J. Hauser. 1985. A measurement error approach for modeling consumer risk preference. Management Sci. 31(1) 1–25.
Fischer, G., J. Jia, M. F. Luce. 2000a. Attribute conflict and preference uncertainty: The RandMAU model. Management Sci. 46 669–684.
Fischer, G., M. F. Luce, J. Jia. 2000b. Attribute conflict and preference uncertainty: Effects on judgment time and error. Management Sci. 46 88–103.
Gregory, R., S. Lichtenstein, P. Slovic. 1993. Valuing environmental resources: A constructive approach. J. Risk Uncertainty 7 147–175.
Hartley, J. E. 1996. Retrospectives: The origins of the representative agent. J. Econom. Perspectives 10 169–177.
Hershey, J., H. Kunreuther, P. Schoemaker. 1982. Sources of bias in assessment procedures for utility functions. Management Sci. 28 936–954.
Huber, J., K. Zwerina. 1996. The importance of utility balance in efficient choice designs. J. Marketing Res. 33 307–317.
Keeney, R., H. Raiffa. 1976. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. John Wiley and Sons, New York.
Kleinmuntz, D. 1990. Decomposition and the control of error in decision analytic models. R. M. Hogarth, ed. Insights in Decision Making: A Tribute to Hillel J. Einhorn. University of Chicago Press, Chicago, IL, 107–126.
Laskey, K. B., G. W. Fischer. 1987. Estimating utility functions in the presence of response error. Management Sci. 33 965–980.
Luce, M. F. 1998. Choosing to avoid: Coping with negatively emotion-laden consumer decisions. J. Consumer Res. 24 409–433.
McFadden, D. 1981. Econometric models of probabilistic choice. C. F. Manski, D. McFadden, eds. Structural Analysis of Discrete Data with Econometric Applications. MIT Press, Cambridge, MA, 198–272.
Neufville, R. de, P. Delquié. 1988. A model of the influence of certainty and probability effects on the measurement of utility. B. Munier, ed. Risk, Decision and Rationality. D. Reidel, Dordrecht, The Netherlands, 189–205.
Shepard, R. 1981. Psychological relations and psychophysical scales: On the status of "direct" psychophysical measurement. J. Math. Psych. 24 21–57.
Tversky, A., D. Kahneman. 1992. Advances in prospect theory: Cumulative representation of uncertainty. J. Risk Uncertainty 5 297–323.
Tversky, A., E. Shafir. 1992. Choice under conflict: The dynamics of deferred decision. Psych. Sci. 3(6) 358–361.
Wu, G., R. Gonzalez. 1996. Curvature of the probability weighting function. Management Sci. 42 1677–1690.

Accepted by Martin Weber; received October 11, 2002. This paper was with the author 14 months for 3 revisions.
