Copyright 1988 by the American Psychological Association, Inc. 0096-1523/88/500.75

Journal of Experimental Psychology: Human Perception and Performance 1988, Vol. 14, No. 2, 281-294

Decisions Based on Numerically and Verbally Expressed Uncertainties

David V. Budescu and Shalva Weinberg
University of Haifa, Haifa, Israel

Thomas S. Wallsten
University of North Carolina

A two-stage within-subjects design was used to compare decisions based on numerically and verbally expressed probabilities. In Stage 1, subjects determined approximate equivalences between vague probability expressions, numerical probabilities, and graphical displays. Subsequently, in Stage 2 they bid for (Experiment 1) or rated (Experiment 2) gambles based on the previously equated verbal, numerical, and graphical descriptors. In Stage 1, numerical and verbal judgments were reliable, internally consistent, and monotonically related to the displayed probabilities. However, the numerical judgments were significantly superior in all respects because they were much less variable within and between subjects. In Stage 2, response times, bids, and ratings were inconsistent with both of two opposing sets of predictions, one assuming that imprecise gambles will be avoided and the other that verbal probabilities will be preferred. The entire pattern of results is explained by means of a general model of decision making with vague probabilities which assumes that in the present task, when presented with a vague probability word, people focus on an implied probability interval and sample values within it to resolve the vagueness prior to forming a bid or a rating.

Subjective probability (SP) is a basic concept in all models of individual decision making under uncertainty. In one class of models, functions of SP are used to weight the utilities, or values, of the basic outcomes to yield a global assessment of goodness for each alternative. This class includes the traditional Subjectively Expected Utility model (Savage, 1954) and a large variety of more recent generalizations and refinements such as the Subjectively Weighted Utility (Karmarkar, 1978), Certainty Equivalence (Handa, 1977), Prospect Theory (Kahneman & Tversky, 1979), and Anticipated Utility (Quiggin, 1982) models. Obviously, such models require the SPs to take real numerical values bounded by 0 and 1 that satisfy certain consistency or coherence conditions. Thus, SP is considered to represent a mapping of an individual's subjective beliefs into the real numbers. In another class of models, SP and outcome utilities are treated as separate dimensions, and alternatives are compared on a dimensional rather than a global basis (Payne, 1976; Russo & Dosher, 1983; Tversky, 1969). In these models, too, SP is treated as a mapping of subjective uncertainty onto the real numbers.

The numerous decision models that assume numerical representation of uncertainty are in sharp contrast with the fact that people generally prefer to express their beliefs by means of natural language. Several reasons have been cited for the distinct preference for words over numbers (see also Beyth-Marom, 1982; Budescu & Wallsten, 1985; Wallsten, Budescu, Rapoport, Zwick, & Forsyth, 1986). It is claimed that most people understand words better than numbers and typically handle uncertainty by means of verbal expressions and associated rules of conversation rather than by numbers (Zimmer, 1984). Historically, probability theory is much younger (Hacking, 1975) than natural language, and at the individual level language and its subtleties are mastered long before one can deal with the numerical system and its intricacies, let alone with probabilities. Finally, numbers are perceived as conveying a level of precision and authority that people do not associate with their opinions. Words are perceived as more flexible and less precise in meaning and, therefore, seem better suited to describe vague and imprecise opinions and beliefs. This property of probability phrases was recently demonstrated by Wallsten et al. (1986) and by Rapoport, Wallsten, and Cox (1987), who have proposed and evaluated various ways of modeling the vagueness of these expressions.

In light of this reality, it is surprising that there is virtually no literature dealing with decisions based on verbally expressed beliefs. The only published article on this issue is by Shanteau (1974), but its goals were different from those of the present research, and its implicit assumption was that probability phrases have precise and well-defined meanings.

Most of the empirical literature on probability expressions has focused on the translation of verbal expressions to point numerical equivalents. The overwhelming result is great variability in the values assigned to words and large overlap among the ranges assigned to the various expressions (e.g., Beyth-Marom, 1982; Budescu & Wallsten, 1985; Foley, 1959; Hakel, 1968; Johnson, 1973; Kenney, 1981; Lichtenstein & Newman, 1967; Nakao & Axelrod, 1983; Simpson, 1944, 1963). Some of these studies (Beyth-Marom, 1982; Budescu & Wallsten, 1985; Johnson, 1973) have also shown that the between-subjects variability in assigning numbers to expressions far exceeds the within-subjects variability, which itself is not minor. Thus, it may be concluded that verbal expressions have imprecise meanings to individuals, and further, that there are substantial individual differences in the meanings of the expressions.

This work was supported by a grant from the United States-Israel Binational Science Foundation (BSF), Jerusalem, Israel, and a Hugo Bergman Award to David V. Budescu. We wish to thank Hana Shtruminger for writing the programs for the experiments. Correspondence concerning this article should be addressed to David V. Budescu, Department of Psychology, University of Haifa, Haifa 31999, Israel.


D. BUDESCU, S. WEINBERG, AND T. WALLSTEN

By considering verbal expressions of uncertainty as vague probabilities, a link can be made to the literature on the role of vagueness in choice and decisions.¹ This link assumes that when a person is told, for example, that an event is likely, he or she will act similarly to being told that the probability falls within a certain interval or that various probabilities are more or less possible. Of course, each individual may interpret the phrase differently with respect to the uncertainty it conveys, but we assume that each deals with the imprecision in the same way that he or she would if the imprecision were stated in a corresponding numerical fashion.

With this assumption in mind, we developed a paradigm in which in Stage 1 subjects assigned numerical probabilities and probability expressions to graphical probability displays (i.e., spinners) in a manner designed to achieve whatever equivalence is possible between the three modes (graphical, numerical, and verbal) for each person. In Stage 2, the same subjects bid to buy or sell gambles for gains or losses contingent on probabilities expressed in each of the three approximately equivalent ways.

Contrasting predictions can be derived for Stage 2 based on two different approaches to decision making, given vague probabilities. From one perspective, Ellsberg (1961) was the first to demonstrate decision behavior inconsistent with the axioms of rationality (e.g., Savage, 1954) when probabilities are not described precisely, and similar results have been reported by others as well (Becker & Brownson, 1964; Curley & Yates, 1985; Larson, 1980; Yates & Zukowsky, 1976). In most of these experiments, subjects chose between gambles with precise probabilities and ones with probabilities defined only in terms of intervals or second-order distributions (i.e., probabilities over probabilities).
Most people avoided the imprecise gambles, while a minority seemed to prefer them, in both cases generally at the sacrifice of expected gain. To explain his results, Ellsberg (1961) hypothesized that when faced with a situation in which many probability distributions are plausible, people consider two distributions: the most likely one and either (for a majority of the people) the distribution associated with the lowest expected gain or (for the minority who are optimists) the distribution associated with the highest expected gain. Each individual, then, behaves in a manner consistent with a SP equal to a weighted average of the two distributions, with the relative weights depending on the degree of imprecision. Whether or not we accept Ellsberg's explanation, we can generalize from his and related results to make predictions for our paradigm. First, because most people avoid vague probabilities, it can be predicted that they will pay more to be relieved of negative gambles and concurrently will pay less to obtain positive gambles, when the gambles are expressed verbally than when they are expressed numerically or graphically. Furthermore, it can be predicted that they will do so even at the loss of long-run expected gain. To the extent that this occurs, it can be said that behavior in response to verbal probabilities is suboptimal relative to that in response to numerical or graphical probabilities. Finally, in view of the negative correlation between preference and decision time (Jamieson & Petrusic, 1977), it can be predicted that it will take longer to bid to verbal than to numerical or graphical gambles.
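The weighted-average account attributed to Ellsberg above can be written compactly. The symbols below are introduced here for illustration only and are not Ellsberg's notation: p_0 is the probability of the event under the most likely distribution, p_ext is its probability under the extreme distribution (lowest expected gain for the majority, highest for the optimistic minority), and ρ is the weight given to the most likely distribution, assumed to shrink as imprecision grows.

```latex
\[
  SP \;=\; \rho\, p_{0} \;+\; (1-\rho)\, p_{\mathrm{ext}},
  \qquad 0 \le \rho \le 1 .
\]
% Illustrative values (not from the study):
% p_0 = .50, p_ext = .30, rho = .75  gives  SP = .375 + .075 = .45
```

Under this sketch a precisely stated probability corresponds to ρ = 1, so SP equals p_0, and greater vagueness pulls SP toward the extreme distribution.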

A contrasting perspective comes from Zimmer (1983, 1984), who has suggested that the verbal mode of communication is more natural to people than is the numerical. Specifically, he claims that "people handle uncertainty by customary verbal expressions and the implicit and explicit rules of conversation connected with them" (Zimmer, 1983, p. 163). "Therefore, if one forces people to give numerical estimates, one forces them to operate in a mode which requires more mental effort and is therefore more prone to interference (and bias)" (Zimmer, 1984, p. 123). Zimmer's claims lead to just the opposite predictions from those made above, namely, that people will generally prefer gambles based on verbal rather than on numerical or graphical probabilities, that they will perform more optimally (in the sense of making more money) with such gambles, and that decisions about them will be faster.

The first goal of this study, therefore, is to test the contrasting predictions drawn from Ellsberg's and Zimmer's work. This is to be done in the present paradigm by comparing subjects' bids and the decision times in Stage 2 for gambles based on verbally, numerically, and graphically expressed probabilities established as approximately equivalent in Stage 1.

Of course, there does not exist an exact one-to-one correspondence between a certain probability and a given phrase, as demonstrated by the empirical evidence cited earlier. Approximate equivalences can be established only by a methodology that will (a) allow for considerable individual differences in the understanding and use of probability phrases; (b) eliminate the asymmetry induced by the "quantify the word" paradigm; (c) link both words and numbers to certain well-defined events; and (d) allow comparison of the two types of judgments. The second goal of the study is to develop such a methodology, which will be used in Stage 1.
The development of such a methodology gives rise to a third research goal, namely, a comparison of the quality of probabilistic judgments by verbal and numerical means. In an extensive review of the literature on encoding of subjective probabilities, Wallsten and Budescu (1983) have described five criteria for evaluating the quality of judgment of uncertainty. Three of them—reliability, internal consistency, and

1 It is necessary to clarify terminology here. Unfortunately, as pointed out by Oden (1986), many workers in behavioral decision theory (e.g., Einhorn & Hogarth, 1985; Ellsberg, 1961; Yates & Zukowsky, 1975; as well as others) have used the terms ambiguous or ambiguity where they actually meant vague or vagueness. According to both the dictionary (e.g., Merriam-Webster, 1984) and careful analysis (Black, 1937), a statement, phrase, or event is ambiguous if it is capable of being understood in two or more different, but precise, ways. It is vague if it is not clearly defined or capable of being understood precisely. Thus, for example, the probability of drawing a ball from an urn of unspecified composition (Ellsberg, 1961) is vague, but it is not ambiguous, and a probability based on very little information (e.g., Einhorn & Hogarth, 1985) is vague, but it is not ambiguous, and so forth. A probability would be ambiguous, for example, if one had an urn containing black cork-filled balls and white lead-filled balls, and were told "the probability of drawing a light ball is 0.80." We are concerned here with vagueness, not ambiguity. We will use only the former and related terms, while nevertheless referring to articles that (incorrectly) use the latter and related terms.

DECISIONS GIVEN NUMERICAL AND VERBAL UNCERTAINTIES

construct validity—are relevant for comparing verbal and numerical responses. Considering reliability first, the usual measures of correlation are not applicable to the verbal responses and therefore cannot be used for comparing the two modes. However, reliability can be assessed and compared in terms of (a) the number of distinct numerical and verbal responses given to a particular spinner probability, (b) the number of probabilities to which a unique response of each type is given, and (c) the within-subject variability for each of the two types of responses. A consistent and reliable judge will repeatedly use the same response when presented with a given spinner, will use responses in a differential and discriminatory fashion, and will display relatively little variation in the way events and responses are associated. (An important and complicated issue that we will not treat here concerns the use of verbal synonyms, but see Zwick, 1987, for one approach to this problem.) Internal consistency of the judgments can be tested by examining the way in which words and numbers are used over a large number of distinct probabilities. In particular, we will test to what degree the ordering of these probabilities is reflected in the subjects' judgments. Construct validity can be established by taking advantage of the fact that in this paradigm subjects perform several tasks associated with the same events—judgments, bids, and attractiveness ratings. The levels of correspondence among the responses in these tasks can serve as measures of construct validity.

Experiment 1

Method

Subjects

Twenty native English speakers, all students at the University of Haifa, agreed to participate in a judgment/decision-making experiment. In return for their participation, they received a fixed amount of $2.50 in Israeli currency and an additional bonus, depending on the quality of their decisions. Six of the subjects were male, 14 were female. Their average age was 25 years.

Procedure

The experiment consisted of two sessions, approximately 1 week apart. During the first session, subjects provided numerical and verbal judgments of graphic displays. These judgments were used in the second session, which consisted of a bidding task. Both sessions were controlled by a PDP 11/73 computer with a graphic terminal (Visual 550).

Stage 1: The judgment session. On each trial, subjects were shown a circle radially divided into a shaded and an unshaded sector on the terminal's screen. They were instructed to imagine that a dart was pointed at the circle's center and were asked to judge the probability that it would land on the shaded section. Eleven different displays were used, with the shaded sections equal to 0.05, 0.10, 0.20, ..., 0.80, 0.90, and 0.95 of the total area. Each subject judged each display in three different ways: (a) by open-ended numerical judgment, (b) by open-ended verbal judgment, and (c) by selecting a phrase from a fixed list. Each judgment was repeated three times for a total of nine judgments per display.


The subjects first provided the open-ended numerical and verbal judgments. These were obtained in blocks of 11 trials, covering the 11 displays. The order of presentation was randomized in each block. Half the subjects started with a numerical (N) followed by a verbal (V) block and continued through the alternating sequence of NVNVNV. The other half completed the inverse sequence VNVNVN. Subjects were instructed to respond by providing their best numerical or verbal estimate of the required probability. In the verbal case they were instructed to use only probability phrases and to avoid frequency or quantity descriptions. Numbers were restricted to integers from 0 to 100. Following completion of these 66 judgments, subjects performed three blocks of selections of phrases. A list of 18 phrases was presented on the screen next to the partially shaded circle. The list included only probability terms and was based on results obtained from previous studies of words-to-numbers conversions, so as to cover the entire range of values. The 18 words were ordered randomly on each trial, and the subjects were instructed to select the phrase that best captured the probability of the event. The selection task was always the last in order to avoid contamination of the subjects' subjective lexicon, and the words were reordered on each trial in order to force the subjects to carefully consider the list for each judgment and to discourage the development of alternative simplifying procedures. After the various judgments, subjects established an approximate equivalence between displays, phrases, and numbers. First, each of the 11 graphic displays was presented together with the list of the various words (up to six) that had been elicited from that subject. At the bottom of the screen the various numerical judgments (up to three) were shown, and the subject was asked to select the one that best described the graphic display and the various phrases. 
Finally, each of the graphic displays was presented together with its numerical estimates (up to three), and the verbal judgments (up to six) were shown at the bottom of the screen. The subject selected the one phrase that best captured the graphic display and the various numerical estimates. In each of the two sequences, the order of presentation was randomized. The number and phrase selected for each display were considered to be approximately equivalent with respect to probability from the subject's point of view and were used as such in the second session. Occasionally, a word or a number was selected by a given subject as "best" for two different probabilities. This problem occurred 7 times for the numerical selection (3.2%) and 26 times in the verbal selection task (11.8%). In these cases the modal numerical/verbal response of the subject to that particular probability in the judgment stage was selected for presentation in the decision stage. The session lasted from 30 to 45 min.

Stage 2: The bidding session. Subjects performed a slight variation of the Marschak bidding procedure (Becker, DeGroot, & Marschak, 1964). On each trial they were told that they were in possession of a lottery ticket that had a certain chance to win/lose a given sum of money. They were instructed to decide on the minimal cash equivalent they would be willing to accept instead of a potentially winning ticket or the maximal sum they would be willing to pay in order to give away a potentially losing ticket. They were further informed that not all the offers would be accepted, but rather each would be compared with a counteroffer generated by the computer. The optimal strategy of responding according to one's "true" subjective worth of a gamble was described and illustrated for both the winning and losing cases. Half the trials involved winning lotteries, and the other half were their negative reflections.
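The comparison of a bid with a computer-generated counteroffer can be sketched as follows. This is a minimal illustration, not the program used in the experiment: the function names are hypothetical, the counteroffer is assumed to be drawn uniformly from EV ± ½ SD of a two-outcome lottery, and the acceptance rule (trade whenever the counteroffer is at least as good as the signed bid) is the standard reading of the Becker, DeGroot, and Marschak procedure rather than a detail stated in the text.

```python
import random


def lottery_moments(p, stake):
    """Expected value and SD of a lottery paying `stake` with probability p, else 0.

    `stake` is signed: positive for a winning ticket, negative for a losing one.
    """
    ev = p * stake
    sd = abs(stake) * (p * (1 - p)) ** 0.5
    return ev, sd


def draw_counteroffer(p, stake, rng):
    """Draw a counteroffer from EV +/- SD/2 (the uniform draw is an assumption)."""
    ev, sd = lottery_moments(p, stake)
    return rng.uniform(ev - sd / 2, ev + sd / 2)


def trade_accepted(signed_bid, counteroffer):
    """Assumed rule: the ticket changes hands if the counteroffer is at least
    as favorable to the subject as the stated cash equivalent.

    Gains: signed_bid is the minimal selling price (positive).
    Losses: signed_bid is minus the maximal payment to be rid of the ticket.
    """
    return counteroffer >= signed_bid


rng = random.Random(0)
offer = draw_counteroffer(p=0.5, stake=1.25, rng=rng)
print(trade_accepted(signed_bid=0.60, counteroffer=offer))
```

Because the price actually paid is the counteroffer and not the bid, stating one's true subjective worth is the best response under this rule, which is the strategy the subjects were told about.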
Eleven probabilities were presented in three forms: graphically (a display identical to the one used in Stage 1), verbally (the phrase selected in Stage 1), or numerically (the number selected in Stage 1). Each presentation was replicated three times with the stakes of $0.80, $1.05, and $1.25. After six training trials in which all modes of presentation were demonstrated and the computer's counteroffers were presented and explained, the subjects provided the experimental bids. Each person


performed 198 trials (11 probabilities × 3 modes of presentation × 3 sums × 2 domains) in a random order. A counteroffer was determined for each gamble by selecting a random number in the range EV ± ½ SD (i.e., one standard deviation around its expected value; Tversky, 1967). These values were used to determine whether an offer would be accepted or rejected. In order to allow comparison of the bids across subjects and modes of presentation, the same counteroffers were employed for all replications of a given Probability × Stake combination. However, neither the counteroffer nor the outcome of the comparison was presented to the subjects. The time that elapsed from the presentation of the gamble until the subject's bid (total decision time) was also recorded on each trial.

At the conclusion of the bidding task, subjects actually played six lotteries. First, it was determined whether or not an offer was accepted; then, if necessary, a random number was selected to simulate a lottery and determine its outcome. Subjects received the total amount earned in these six lotteries as well as their base pay. At the conclusion of this session, subjects were asked to answer several questions regarding their use of the various modes of representation of probability in the study and in everyday life. The average duration of this session was 90 min.

Results

Stage 1: Judgment

This is the first study in which numerical and verbal judgments of a fixed set of events were elicited from the same subjects. Therefore, it is informative to compare carefully the two parallel sets of judgments. Direct comparisons of numerical and verbal responses are difficult, however, because the two modes have different metric characteristics. Without entering a debate on what transformation would render the numerical mode a linear or a ratio scale measure of subjective uncertainty, it can be agreed that the numerical mode has at least rough interval properties. In contrast, however, one cannot with certainty even rank order all the probability expressions used by an individual (Budescu & Wallsten, 1985). We used two methods of comparison to overcome this difficulty. First, we treated the responses categorically, simply counting the number of occurrences of distinct elements. Second, we calculated statistics regarding the graphic displays to which responses of each type were assigned.

One of the most surprising results was the richness of the probability vocabulary of the subjects. The 20 subjects generated 111 different phrases (compared with 73 numbers of 101 possible integers). The distribution of the total number of responses over subjects for the three response methods is summarized in the top panel of Table 1. Note that the average subject used about 13 phrases (both self-produced and from the list) and about 18 numbers. The next two panels in Table 1 present the distribution of the number of responses per display and of the number of displays for which a certain response was used. A perfectly consistent subject should establish a 1:1 mapping between responses and displays. Although the numerical judgments do not demonstrate perfect consistency and reliability, they are more reliable and discriminatory. The mean number of responses per display under the numerical condition is significantly lower than in the two verbal conditions combined, t(19) = 2.68, p < .05, and so is the mean number of displays associated with a given response, t(19) = 5.90, p < .05.

Table 1
Analyses of Verbal and Numerical Subjective Judgments of Probability

Statistic                                         Phrases    List     Numbers
No. of responses per subject
  Minimum                                            7        10        12
  Mean                                              13.45     13.25     17.95
  Maximum                                           19        16        29
  SD                                                 1.65      1.45      4.66
No. of responses per display (%)
  1                                                 19.09     26.82     35.45
  2                                                 49.55     47.33     43.18
  3                                                 31.36     25.45     21.36
  Mean                                               2.12      1.99      1.86
No. of displays associated with a response (%)
  1                                                 40.11     30.81     84.87
  2                                                 19.99     25.18     13.53
  ≥3                                                39.90     44.01      1.60
  Mean                                               2.67      2.52      1.94
Goodman's λ (response | display)
  Minimum                                             .35       .33       .39
  Median                                              .55       .60       .39
  Maximum                                             .73       .79       .83
Kendall's τb
  Minimum                                             .75       .79       .86
  Median                                              .90       .92       .99
  Maximum                                             .97       .98       .99

Note. Phrases refers to the open-ended elicitation, and list to the selection from the fixed list of phrases.

Figure 1 displays the mean number of responses generated by each method as a function of the displayed probability. Note the M-shaped curves for all three methods, with the number of responses lowest at .05, .50, and .95. For all displays there are more verbal responses than there are numerical ones, with the largest difference being observed at the lower end of the continuum (.05). The M-shape of the curve was confirmed by a trend analysis that showed a significant quartic effect, F(1, 190) = 31.68, p < .05, accounting for over 44% of the interprobabilities variance.

Figure 1. Mean number of responses generated as a function of the probability displayed and the response mode.

The last panels of Table 1 summarize the degree of monotonicity in the relation between the displayed probabilities and the responses provided by the subjects. For each subject, under each of the response modes we obtained a two-way table of the frequency of usage of a given phrase, or number, in describing the various displays. Then, the rows of the table were permuted to achieve the best possible monotonic fit. This was done by ordering them according to the weighted mean of the probabilities for which the response was used. (For example, if the phrase a was used once to describe a .10 display and twice to describe a .20 display, its weighted average of .167 placed it after word b, which was used twice to describe a .10 display and once to describe a .20 display and had an average of .133.) This procedure yielded a highly consistent monotonic pattern, with a large majority of the responses concentrated near the diagonal. In fact, for all three modes of judgment, over 85% of the judgments were located in this main block diagonal with a median value of 97% for the numerical judgments and 94% for the verbal ones.

To further quantify this relation, we report two statistics focusing on slightly different aspects of the data. Goodman's λ (response | display) is an asymmetric measure of nominal PRE (proportional reduction in error). This statistic describes the degree to which knowledge of the display presented can improve prediction of the subject's response; Kendall's τb is a symmetric ordinal coefficient based on calculation of the proportion of concordant and discordant pairs of rows and columns. Both measures suggest that subjects tend to use the verbal as well as the numerical response modes in a consistent monotonic fashion, with a slight advantage to the numerical mode, but no significance tests are available to compare the various conditions.

Figures 2-4 illustrate this analysis of monotonicity across all subjects. To facilitate presentation, the figures are limited to responses repeated at least 10 times (across subjects and replications). The graphic portions of the figures show the frequencies with which responses were used for each display. Note first that, as expected, the numerical responses (Figure 2) demonstrate a consistent monotonic relation to the displayed spinners. A similar pattern is obtained for the phrases, either self-generated (Figure 3) or selected from the list (Figure 4). The two modes do differ in a very important respect, however, as is evident by inspecting the figures. Namely, each phrase is given in response to a broader range of displays than is each number. The question of whether this unsurprising result is due primarily to within- or between-subjects variability is answered in the last two columns prior to the charts in the figures. The first column shows the weighted mean of within-subjects standard deviations of the displays associated with each response. The final column shows the standard deviation of the mean display value for each subject per response. It is particularly interesting to note that between-subjects variability exceeds within-subjects variability for 18

Number    N     Mean     S.D.    S.D. (means)
5         19     5.00    0        0
6         10     6.50    0        3.24
10        40    11.31    1.16     5.69
15        19    10.00    1.44     3.48
20        26    17.78    2.09     2.26
25        14    22.14    5.40     5.98
30        25    26.80    4.85     4.67
35        23    29.13    5.62     6.25
33        12    30.00    0        0
45        18    33.33    8.82     9.05
40        33    40.30    5.27     6.23
50        54    49.44    2.77     3.19
60        32    60.31    3.22     4.71
65        25    64.80    5.48    10.03
66        11    70.91    0        4.77
70        27    72.59    4.17     8.80
75        23    74.35    3.97     7.41
80        42    80.73    5.31    10.67
85        21    87.62    0        8.07
90        33    92.26    2.55     7.34
95        27    93.52    2.41     4.16

Legend: - 1-5; + 6-10; x 11-20; * 21-30; >30

Figure 2. Distribution of numerical judgments as a function of the probability displayed.
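The row-ordering step used in the monotonicity analysis (ordering responses by the weighted mean of the displays they were used for) can be sketched in a few lines. This is an illustrative reconstruction and not the authors' program; the function names and the dictionary layout are assumptions. The data are the worked example given in the text for phrases a and b.

```python
def weighted_mean_display(counts):
    """Weighted mean of the displayed probabilities a response was used for.

    `counts` maps a displayed probability to the number of times the
    response was given to that display.
    """
    total = sum(counts.values())
    return sum(p * n for p, n in counts.items()) / total


def order_responses(table):
    """Return the responses sorted by their weighted mean display value."""
    return sorted(table, key=lambda r: weighted_mean_display(table[r]))


# Worked example from the text: phrase a used once for .10 and twice for .20;
# phrase b used twice for .10 and once for .20.
table = {
    "a": {0.10: 1, 0.20: 2},
    "b": {0.10: 2, 0.20: 1},
}

print(round(weighted_mean_display(table["a"]), 3))  # 0.167
print(round(weighted_mean_display(table["b"]), 3))  # 0.133
print(order_responses(table))                       # ['b', 'a']
```

Sorting the rows this way places each response next to the displays it was mostly used for, which is why a consistent judge produces a table concentrated near the diagonal.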

Phrase              N     Mean     S.D.    S.D. (means)
Improbable          17     9.41    4.68     2.53
Very unlikely       10    10.50    8.23     5.04
Unlikely            40    12.13    5.56     6.2E
Fair chance         13    27.B9    5.77     7.98
Some chance         32    30.63   10.06    10.01
Possible            88    42.67   13.43    10.59
Likely              20    58.00    9.45
Quite likely        11    58.18   14.08     7.54
Good chance         40    62.75   13.80    11.75
Quite possible      43    67.33   12.12    13.32
Very good chance    15    76.00    6.20    12.33
Probable            26    78.65    9.95     6.28
Very possible       25    82.BO    8.40     7.98
Quite probable      15    83.67    6.16    16.84
Very likely         25    85.40   12.28     5.07
Almost certain      11    93.12    4.40     2.09

Legend: - 1-5; + 6-10; x 11-20; * 21-30; ® >30

Figure 3. Distribution of verbal judgments (phrases) as a function of the probability displayed.
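The two variability columns reported in Figures 2-4 can be illustrated with a short sketch. The text defines the first as the weighted mean of within-subjects standard deviations of the displays associated with a response, and the second as the standard deviation of the subjects' mean display values. Two details below are assumptions, not statements from the article: weights are taken to be each subject's number of uses, and the population SD is used so that a subject with a single use contributes zero. The data are hypothetical.

```python
from statistics import mean, pstdev


def within_between_sd(uses_by_subject):
    """Return (weighted mean within-subject SD, SD of subject means).

    `uses_by_subject` maps a subject id to the list of display values
    (in percent) for which that subject gave one particular response.
    """
    per_subject = [d for d in uses_by_subject.values() if d]
    total_uses = sum(len(d) for d in per_subject)
    # Within: each subject's SD, weighted by how often the response was used.
    within = sum(len(d) * pstdev(d) for d in per_subject) / total_uses
    # Between: spread of the subjects' mean display values.
    between = pstdev([mean(d) for d in per_subject])
    return within, between


# Hypothetical uses of one phrase by three subjects.
uses = {"s1": [10, 10, 20], "s2": [20, 20], "s3": [30]}
within, between = within_between_sd(uses)
print(between > within)
```

A response whose between-subjects value exceeds its within-subjects value is one that individuals use consistently but that different individuals attach to different displays.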

of the 21 numerical responses (86%), whereas the reverse is true in only 1 case (4.5%). In contrast, when looking at the verbal responses, between-subjects variability exceeds within-subjects variability in 14 of 32 cases (44%), combining over the free and list verbal conditions, and the reverse is true in 12 of 32 instances (43%). Thus, individuals are far more variable in their assignment of phrases than of numbers to displays, relative to between-subjects variability. Furthermore, comparison of the within- and between-subjects standard deviations indicates substantially greater individual differences in the use of phrases than of numbers. To substantiate this impression, we compared the values obtained under the three conditions by a nonparametric procedure. Mann-Whitney U tests showed significant differences between the numeric condition and the fixed list (Z = -2.19, p < .05, for the within-subjects SDs, and Z = -2.89, p < .05, for the between-subjects SDs) and between the numeric condition and the self-generated phrases (Z = -4.50, p < .05, for the within-subjects variation, and Z = -2.34, p < .05, for the between-subjects variation), but no difference between the two verbal conditions (Z = 1.1, p > .05, for the within-subjects values, and Z = 0.05, p > .05, for the between-subjects values).

Phrase               N     Mean    S.D.    S.D. (means)
Almost impossible    31     5.32    1.44    0.72
Very poor chance     37     6.11    3.02    2.29
Improbable           14    11.43    2.04    6.84
Unlikely             24    15.83    8.00    8.54
Poor chance          25    16.00    3.95    7.08
Doubtful             21    19.29    ?       ?
Somewhat unlikely    66    28.03    ?       ?
?                    52    42.12   11.50   13.97
Somewhat likely      47    48.09   13.30   12.81
Toss-up/even odds    54    50.19    ?       ?
Probable             41    64.15   12.01   20.35
Likely               32    64.38   10.27   14.36
Good chance          68    68.82   11.08   10.79
Very good chance     71    81.20    9.26    9.13
Almost certain       74    95.36    5.80    4.81

[Entries lost in extraction are marked "?". The values 7.15, 10.34, 11.16, and 9.16 belong to the Doubtful and Somewhat unlikely rows, but their assignment to cells is not recoverable. The accompanying plot over the displayed probabilities is not reproduced.]

.05, or domain of lottery, F(1, 5) = 1.3, p > .05, were detected in the ANOVA of the ratings. The subjects' ratings were monotonically related to the probabilities, F(10, 50) = 73.97, p < .05, but as in Experiment 1, the form of the relation was different in the two domains, as witnessed by the significant Probability × Domain interaction, F(10, 50) = 2.23, p < .05. Finally, Table 8 presents the harmonic mean rating time as a function of the domain and mode of presentation. An

Table 7
Mean Attractiveness or Unattractiveness Rating (1-9 Scale) as a Function of Mode of Presentation and Domain (Experiment 2)

                  Mode of presentation
Domain     Graphic   Numeric   Verbal    M
Gains      5.07      4.92      5.30      5.09
Losses     5.70      5.90      5.97      5.86
M          5.38      5.41      5.63      5.48

Figure 7. Mean rating (1-9 scale) as a function of lottery's probability and mode of presentation. [Plot not reproduced; legend: Graphic, Numeric, Verbal; horizontal axis: Probability.]

Figure 8. Harmonic mean rating time (in seconds) as a function of probability displayed and mode of presentation. [Plot not reproduced; horizontal axis: Probability.]
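The rating times in Figure 8 and Table 8 are summarized by harmonic means. As a minimal sketch of what that statistic does, the following compares it with the arithmetic mean; the response times are hypothetical and the helper name `harmonic_mean_time` is introduced only for illustration.

```python
from statistics import harmonic_mean, mean

def harmonic_mean_time(times):
    """Harmonic mean of positive response times: n / sum(1 / t)."""
    if any(t <= 0 for t in times):
        raise ValueError("response times must be positive")
    return len(times) / sum(1.0 / t for t in times)

# Hypothetical rating times in seconds; the last trial is a slow outlier.
rating_times = [2.1, 2.8, 3.0, 3.3, 12.0]

print(f"arithmetic mean: {mean(rating_times):.2f} s")
print(f"harmonic mean:   {harmonic_mean_time(rating_times):.2f} s")
print(f"stdlib check:    {harmonic_mean(rating_times):.2f} s")
```

Because the harmonic mean averages reciprocals (speeds rather than times), a single very slow trial pulls it up far less than it pulls up the arithmetic mean.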

This superiority of the numerical responses cannot be attributed to either the open-ended nature of the free method or the artificiality of the list method, because when the numerical responses were rounded to multiples of 5 (yielding 21 categories, comparable to the size of the phrase list), they still showed the greatest within-subjects reliability. Evidence for internal consistency among the responses was obtained from the monotonicity analyses and indicated that subjects were consistent in both modes of response. Unfortunately, significance tests are not available to compare these statistics, but they were always higher for the numerical judgments.

Finally, construct validity was examined by regular correlational analysis. The adjusted bids in response to verbally and numerically presented uncertainties correlated highly with the display probabilities that initially elicited the verbal and numerical descriptors. Also, attractiveness ratings of the gambles correlated well with both the adjusted bids and the original probabilities. However, correlations in the numerical case were always significantly higher than in the verbal one (cf. Tables 2 and 6). In examining the construct validity of the verbal and numerical presentations, it is instructive to compare the correlations with their upper limit, namely, those obtained under the graphic presentation mode. If the judgment (numerical or verbal) introduces any noise and causes confusion, the former correlations will be lower than the latter. In fact, the numerical and graphic correlations are very similar and, in the domain of losses, are not significantly different. Correlations in the verbal condition were in all cases significantly lower than their graphic and numerical counterparts, indicating that this mode of judgment is associated with higher levels of noise and confusion.

Table 8
Harmonic Mean Judgment Times as a Function of Mode of Presentation and Domain (Experiment 2)

                  Mode of presentation
Domain     Graphic   Verbal    Numeric   M
Gains      3.03      3.31      3.48      3.26
Losses     3.37      3.89      3.46      3.56
M          3.19      3.58      3.47      3.41

To summarize, both modes of judging uncertainty yield reliable, internally consistent scales that demonstrated construct validity at the level of individual subjects. Although not surprising in the numerical mode, these results extend those of Budescu and Wallsten (1985) in the verbal mode, who showed only that individuals consistently rank order nonnumerical descriptors of certainty. However, all the comparisons favored the numerical over the verbal response mode, because of considerably and significantly smaller within- and between-subjects variability under the former mode.

Thus, the experimental procedure yielded verbal and numerical descriptions of the graphic displays that were as similar as possible in central probability meaning for each subject but dissimilar in at least two other regards. Specifically, the phrases were more vague than the numbers for each subject; this is consistent with much previous research (Rapoport et al., 1987; Wallsten et al., 1986). Also, when considering those phrases and numbers used by multiple subjects, we found that between-subjects variability and therefore individual differences were much greater in the verbal mode.

Finally, it is noteworthy that, on average, subjects responded with over 13 phrases within each of the two verbal methods. Altogether, the 20 subjects generated 111 distinct expressions in response to 11 probability displays.
Even after the "best" phrases were ultimately determined for each subject for each display, disagreement in choice of expression was extreme. Thus, it appears that our subjects had substantial working vocabularies of uncertainty, with nevertheless relatively little overlap among individuals. These results stand in marked contrast to those of Zimmer (1983), whose subjects' active lexicons for uncertainty seemed to contain five or six expressions each, with many phrases in common. Specifically, the 150 subjects in his experiment used a total of 12 verbal labels of probability. Our experiment and his differ in numerous ways, any of which might be responsible for the conflicting results.

Decision Behavior

Before discussing the various predictions, it is worth pointing out that when considering each of the presentation modes separately, essential features of the present data replicate results previously obtained with numerical lotteries. Specifically, within the graphic, numerical, and verbal presentation modes, subjects were more sensitive to losses than to gains, in the sense of offering more money to avoid a negative lottery than requiring to replace an equivalent positive one (Budescu & Weiss, 1987; Kahneman & Tversky, 1979), and decision times were longer for negative than for positive lotteries (Ben Zur & Breznitz, 1981).

However, when comparing behavior across modes, and considering Experiments 1 and 2 together, the data demonstrated neither of the expected patterns of results. There was no evidence of systematic avoidance of the verbal lotteries in conjunction with longer decision times and considerable sacrifice of financial gain, as predicted from the results of Ellsberg (1961) and others. Nor were there systematic preferences for such lotteries in conjunction with shorter decision times and enhancement of financial gain, as predicted from the idea (Zimmer, 1983, 1984) that verbally expressed uncertainties are preferred and more optimally processed than are their numerical counterparts.
Two features of the rating times in Experiment 2 demonstrate that the longer bidding times for the numerical than the other gambles in the first study were artifactual, probably because of calculation time. First, rating times for the three presentation modes were equal, suggesting no intrinsic difference in time to process the three kinds of lotteries. Second, the rating times were considerably shorter than the bidding times, probably reflecting the absence of calculations. Finally, it must be emphasized that subjects were not informed that decision times were to be collected, nor were they urged to respond quickly. Thus, before final conclusions are reached regarding the relative difficulty of processing uncertainty presented in the various modes, it is necessary to collect data under conditions that encourage quick responding.

Although neither set of predictions was supported, the relations among responses in the three modes are systematic, with three important conclusions emerging. First, the general pattern of bids and ratings was similar in all three cases (Figures 5 and 7). This result is especially surprising, given the vagueness of the phrases relative to the numbers and displays themselves, as documented by the judgments in Stage 1. Second, bids under verbal presentation were relatively less optimal than under the other two modes, as measured by expected earnings. Overall, the subjects would have earned about 24% less in the verbal than in either of the other two conditions (Table 4). Third, on the basis of the bids in the first experiment, the verbal lotteries were slightly but significantly and consistently preferred in the domain of gains, whereas the opposite was true in the domain of losses. This conclusion is derived not only from the group analyses, but it also holds for 16 of the 20 individual subjects as well. Ratings in Experiment 2 showed the same pattern, although the effect of presentation mode was not significant in this case.

This demonstration of preference for vagueness under a very general condition stands in marked contrast to the results of all other studies of which we are aware on the effects of vagueness on preference. Ellsberg (1961), Einhorn and Hogarth (1985), and Hogarth and Kunruther (1984) all reported some individuals who preferred vague to precise lotteries. In addition, experimental conditions were manipulated in the latter two articles that encouraged vagueness preferences. Otherwise, the universal result has been vagueness avoidance. Indeed, except for the studies of Einhorn and Hogarth (1985) and Hogarth and Kunruther (1984), all experiments used only positive lotteries, which is precisely where we found vagueness preference.

In trying to understand the pattern of bids obtained in the domain of gains, it is important to realize that the general vagueness avoidance hypothesis implies a context or a domain-specific effect, namely, a vague probability is given more weight or responded to as a higher chance in the context of a loss than of a gain. Thus, a lottery based on a verbal probability is valued less in the context of gains but more (negatively) in the context of losses than is a gamble based on a precise probability to which the verbal one was judged equivalent in Stage 1. The alternative hypothesis, that verbal probabilities are preferred to numerical or graphic ones, implies an opposite effect in the two domains.
That is, gambles based on verbal probabilities are valued more in the domain of gains and less (negatively) in the domain of losses. In contrast, our data indicate that the subjects attached more extreme values to the verbal gambles than to the precise ones in both domains.

The most obvious difference between this study and all the others is that in the present case the vague uncertainties were represented linguistically. Also, we used a bidding task, as did Becker and Brownson (1964); all other studies used a pairwise choice procedure. Because Becker and Brownson's results are similar to everyone else's, we are inclined to think that the important feature is the manner in which the vagueness is expressed. However, the relative contributions of the two differences must be explored carefully, especially in light of the well-documented effects of response mode and framing on preferences (Goldstein & Einhorn, 1987; Slovic & Lichtenstein, 1983; Tversky & Kahneman, 1981).

Integrating the Stage 1 and 2 Results: The ν-μ Model

The primary puzzle in the present data is that the Stage 1 data clearly showed that the phrases are more vague than the numbers or the displays, but nevertheless the general patterns of bids, ratings, and times were similar in all cases (except for the numerical bid times, presumably because of calculations). Despite the overall similarity, the verbal gambles were responded to less optimally, in that earnings from them were 24% less than from the other two types, because verbal gambles were valued more positively in the case of gains and more negatively in the case of losses than were the others.

A theory of judgment and choice on the basis of linguistic uncertainties (represented by the ν-μ model) has been developed to explain these and numerous other results (Wallsten, Budescu, & Erev, in press) and will be sketched here insofar as it pertains to the present data. The first assumption is that probability phrases are vague in the sense that they describe some probabilities very well, some not at all, and some to an intermediate degree. The exact probabilities represented by a phrase, and the degree of vagueness, vary over individuals. Thus, probability phrases are "linguistic variables" (Zadeh, 1974) that can be represented by membership functions over the (0, 1) interval of probabilities, as illustrated in Figure 9. The ordinate of such functions, μ, denotes the degree of membership of a given probability in a particular vague phrase for a given person. Alternatively, μ for a phrase W can be thought of as the truth value of the statement, "The probability p is described by the phrase W," bounded by 0 (absolutely false) and 1 (absolutely true). Wallsten et al. (1986) and Rapoport et al. (1987) discuss properties of these functions in detail and have also developed methods of empirically establishing them in reliable and valid ways in the context of the representation of vagueness.
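To make the idea of a membership function concrete, here is a minimal sketch. The triangular shape, the helper name `make_membership`, and the three parameter values for the phrase "likely" are all assumptions chosen for illustration; empirically established functions need not be triangular.

```python
def make_membership(left, peak, right):
    """Return a triangular membership function over probabilities in [0, 1].

    Membership is 0 at or outside (left, right), rises linearly to 1 at
    `peak`, and falls linearly back to 0 at `right`.
    """
    def membership(p):
        if p <= left or p >= right:
            return 0.0
        if p <= peak:
            return (p - left) / (peak - left)
        return (right - p) / (right - peak)
    return membership

# Hypothetical membership function for one person's use of "likely".
likely = make_membership(left=0.55, peak=0.75, right=0.95)

for p in (0.30, 0.65, 0.75, 0.90):
    # Read each value as the truth value of
    # "The probability p is described by the phrase 'likely'".
    print(f"p = {p:.2f}  membership = {likely(p):.2f}")
```

Probabilities near the peak are described very well by the phrase, those outside the support not at all, and the rest to an intermediate degree, which is the three-way distinction drawn in the text.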

Because phrase meanings are vague and overlap, as illustrated by Wallsten et al. (1986) and in Figure 9, they were applied in a relatively inconsistent way to the probabilities presented in Stage 1. However, in Stage 2 it was absolutely necessary to resolve the vagueness in order to provide a bid or a rating. We theorized that this was accomplished by focusing on a restricted interval of the most representative probabilities and by sampling from it in some way to achieve a single valued representation of the phrase, from which a bid (rating) was formed. Thus, subjects considered only probabilities with memberships above a certain task-specific threshold, ν. Figure 9 presents membership functions of four hypothetical words, W1-W4, and three possible thresholds, ν1-ν3. Note that the threshold employed determines the degree of effective overlap among the four words and the width of the four effective probability intervals. For example, if one uses ν1, he or she is faced with four relatively wide and highly overlapping intervals. On the other hand, ν3 yields four narrow and nonoverlapping intervals. This amounts to the assumption that in the context of a given task, the subject can unambiguously rank order the four words. Thus, the general similarity of the bids and ratings in the three presentation modes can be explained by assuming a relatively high threshold, yielding a stable ordering of the words.

Figure 9. Hypothetical membership functions for four words (W1-W4) and three ν-cut levels (ν1-ν3). [Plot not reproduced; horizontal axis: Probability.]

Once a threshold is determined, the subject must select a point value in the interval to represent the phrase. Many sampling rules can be invoked; data from a choice experiment (Wallsten et al., in press) support a model in which probabilities are assigned a sampling weight equal to their relative membership value above the threshold. Given this model, the overweighting of high probabilities implies a strong prediction regarding the shape of the membership functions, namely, that they be generally positively skewed above the threshold. This model serves as a general conceptual framework that provides one possible explanation for all aspects of the present results. Obviously, additional experiments using verbal probabilities with known and established membership functions are necessary to fully test this theory.
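The two steps described above (restricting attention to probabilities above a threshold, then sampling a point value) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the triangular membership function and its parameters are hypothetical, probabilities are discretized on a grid, and the sampling weight of each retained probability is taken to be proportional to its membership value, which is one reading of "relative membership value above the threshold".

```python
import random

def triangular(left, peak, right):
    """Hypothetical triangular membership function over [0, 1]."""
    def membership(p):
        if p <= left or p >= right:
            return 0.0
        if p <= peak:
            return (p - left) / (peak - left)
        return (right - p) / (right - peak)
    return membership

def threshold_cut(membership, threshold, steps=1000):
    """Grid probabilities whose membership is at least the threshold."""
    grid = [i / steps for i in range(steps + 1)]
    return [p for p in grid if membership(p) >= threshold]

def resolve_phrase(membership, threshold, rng=None, steps=1000):
    """Return (effective interval, expected sampled value, one sample).

    Assumption: sampling weights are proportional to membership among the
    probabilities that survive the threshold cut.
    """
    rng = rng or random
    candidates = threshold_cut(membership, threshold, steps)
    weights = [membership(p) for p in candidates]
    expected = sum(p * w for p, w in zip(candidates, weights)) / sum(weights)
    sampled = rng.choices(candidates, weights=weights, k=1)[0]
    return (candidates[0], candidates[-1]), expected, sampled

# A positively skewed word: steep rise to the peak, long tail to the right.
word = triangular(left=0.55, peak=0.65, right=0.95)

for threshold in (0.5, 0.9):
    interval, expected, sampled = resolve_phrase(word, threshold, random.Random(0))
    print(f"threshold {threshold}: interval {interval}, "
          f"expected value {expected:.3f}, one sample {sampled:.3f}")
```

Raising the threshold narrows the effective interval, and with a positively skewed function the membership-weighted expected value lies above the peak, which is the direction of the overweighting of high probabilities discussed in the text.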

References

Becker, S. W., & Brownson, F. O. (1964). What price ambiguity? Or the role of ambiguity in decision making. Journal of Political Economy, 72, 62-73.
Becker, G. M., De Groot, M. H., & Marshak, J. (1964). Measuring utility by a single response sequential method. Behavioral Science, 9, 226-232.
Ben Zur, H., & Breznitz, S. J. (1981). The effect of time pressure on risky choice behavior. Acta Psychologica, 47, 89-104.
Beyth-Marom, R. (1982). How probable is probable? Numerical translation of verbal probability expressions. Journal of Forecasting, 1, 257-269.
Black, M. (1937). Vagueness. Philosophy of Science, 4, 427-455.
Budescu, D. V., & Wallsten, T. S. (1985). Consistency in interpretation of probabilistic phrases. Organizational Behavior and Human Decision Processes, 36, 391-485.
Budescu, D. V., & Weiss, W. (1987). Reflection of transitive and intransitive preferences: A test of prospect theory. Organizational Behavior and Human Decision Processes, 39, 184-202.
Curley, S. P., & Yates, J. F. (1985). The center and range of the probability interval as factors affecting ambiguity preferences. Organizational Behavior and Human Decision Processes, 36, 273-287.
Einhorn, H. J., & Hogarth, R. M. (1985). Ambiguity and uncertainty in probabilistic inference. Psychological Review, 92, 433-461.
Ellsberg, D. (1961). Risk, ambiguity and the Savage axioms. Quarterly Journal of Economics, 75, 643-669.
Foley, B. J. (1959). The expression of certainty. American Journal of Psychology, 72, 614-615.
Goldstein, W. M., & Einhorn, H. (1987). Expression theory and the preference reversal phenomenon. Psychological Review, 94, 236-254.
Guilford, J. P., & Fruchter, B. (1978). Fundamental statistics in psychology and education (6th ed.). New York: McGraw-Hill.
Hacking, I. (1975). The emergence of probability. Cambridge, MA: Cambridge University Press.
Hakel, M. (1968). How often is often? American Psychologist, 23, 533-534.
Handa, J. (1977). Risk, probabilities, and a new theory of cardinal utility. Journal of Political Economy, 85, 97-122.
Hogarth, R. M., & Kunruther, H. C. (1984). Risk ambiguity and insurance. Unpublished manuscript. University of Chicago, Graduate School of Business, Center for Decision Research.
Jamieson, D. G., & Petrusic, W. M. (1977). Preference and the time to choose. Organizational Behavior and Human Performance, 19, 56-67.
Johnson, E. M. (1973). Encoding of qualitative expressions of uncertainty (Tech. Paper 250). Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263-291.
Karmarkar, U. S. (1978). Subjectively weighted utility: A descriptive extension of the expected utility model. Organizational Behavior and Human Performance, 21, 61-72.
Kenney, R. M. (1981). Between never and always. New England Journal of Medicine, 305, 1097-1098.
Kirk, E. R. (1982). Experimental design (2nd ed.). Belmont, CA: Brooks/Cole.
Larson, J. R. (1980). Exploring the external validity of a subjectively weighted utility model of decision making. Organizational Behavior and Human Performance, 26, 293-304.
Lichtenstein, S., & Newman, J. R. (1967). Empirical scaling of common verbal phrases associated with numerical probabilities. Psychonomic Science, 9, 563-564.
Merriam-Webster. (1984). Webster's Ninth New Collegiate Dictionary. Springfield, MA: Author.
Nakao, M. A., & Axelrod, S. (1983). Numbers are better than words: Verbal specifications of frequency have no place in medicine. The American Journal of Medicine, 74, 1061-1065.
Oden, G. (1986, November). Discussion at the symposium on "The Representation and Role of Ambiguity in Judgment and Individual Decision Making" at the annual meeting of the Judgment/Decision Making Society, New Orleans.
Payne, J. W. (1976). Task complexity and contingent processing in decision making: An information search and protocol analysis. Organizational Behavior and Human Performance, 16, 366-387.
Quiggin, J. (1982). A theory of anticipated utility. Journal of Economic Behavior and Organization, 3, 323-343.
Rapoport, A., Wallsten, T. S., & Cox, J. A. (1987). Direct and indirect scaling of membership functions of probability phrases. Mathematical Modelling, 9, 397-417.
Russo, J. E., & Dosher, B. A. (1983). Strategies of multiattribute binary choice. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 676-696.
Savage, L. J. (1954). The foundations of statistics. New York: Wiley.
Shanteau, J. (1974). Component processes in risk decision making. Journal of Experimental Psychology, 103, 680-691.
Simpson, R. H. (1944). The specific meanings of certain terms indicating differing degrees of frequency. Quarterly Journal of Speech, 30, 328-330.
Simpson, R. H. (1963). Stability in meanings for quantitative terms: A comparison over 20 years. Quarterly Journal of Speech, 49, 146-151.
Slovic, P., & Lichtenstein, S. (1983). Preference reversals: A broader perspective. American Economic Review, 73, 596-605.
Smith, J. E. K. (1976). Data transformations in analysis of variance. Journal of Verbal Learning and Verbal Behavior, 15, 339-346.
Tversky, A. (1967). Additivity utility and subjective probability. Journal of Mathematical Psychology, 4, 175-201.
Tversky, A. (1969). Intransitivity of preferences. Psychological Review, 76, 31-48.
Tversky, A., & Kahneman, D. (1981). The framing of decisions and the psychology of choice. Science, 211, 453-458.
Tversky, A., Sattath, S., & Slovic, P. (1987). Contingent weighting in judgment and choice. Unpublished paper, Stanford University, Stanford, CA.
Wainer, H. (1977). Speed vs. reaction time as a measure of cognitive performance. Memory & Cognition, 5, 278-280.
Wallsten, T. S., & Budescu, D. V. (1983). Encoding subjective probabilities: A psychological and psychometric review. Management Science, 29, 151-173.
Wallsten, T. S., Budescu, D. V., & Erev, I. (in press). Understanding and using linguistic uncertainties. Acta Psychologica.
Wallsten, T. S., Budescu, D. V., Rapoport, A., Zwick, R., & Forsyth, B. (1986). Measuring the vague meanings of probability terms. Journal of Experimental Psychology: General, 115, 348-365.
Yates, J. F., & Zukowski, L. G. (1976). Characterization of ambiguity in decision making. Behavioral Science, 21, 19-25.
Zadeh, L. A. (1974). The concept of a linguistic variable and its application to approximate reasoning. In K. S. Fu & J. T. Tow (Eds.), Learning systems and intelligent robots (pp. 1-10). New York: Plenum Press.
Zimmer, A. C. (1983). Verbal vs. numerical processing of subjective probabilities. In R. W. Scholtz (Ed.), Decision making under uncertainty (pp. 159-182). Amsterdam: North-Holland.
Zimmer, A. C. (1984). A model for the interpretation of verbal predictions. International Journal of Man-Machine Studies, 20, 121-134.
Zwick, R. (1987). Combining stochastic uncertainty and linguistic inexactness: Theory and experimental evaluation. Unpublished doctoral dissertation, University of North Carolina at Chapel Hill.

Received September 15, 1986
Revision received August 19, 1987
Accepted September 1, 1987
