NIKI PFEIFER and GERNOT D. KLEITER

COHERENCE AND NONMONOTONICITY IN HUMAN REASONING

ABSTRACT. Nonmonotonic reasoning is often claimed to mimic human common sense reasoning. Only a few studies, though, have investigated this claim empirically. We report four experiments which investigate three rules of SYSTEM P, namely the AND, the LEFT LOGICAL EQUIVALENCE, and the OR rule. The actual inferences of the subjects are compared with the coherent normative upper and lower probability bounds derived from a non-infinitesimal probability semantics of SYSTEM P. We found a relatively good agreement of human reasoning and principles of nonmonotonic reasoning. Contrary to the results reported in the ‘heuristics and biases’ tradition, the subjects committed relatively few upper bound violations (conjunction fallacies).

1. INTRODUCTION

Consider two events, A and B, and their conjunction, A ∧ B, and assume that the probabilities of the individual events, P(A) and P(B), are given, but that the joint probability of their conjunction, P(A ∧ B), is not given. Of course, if A and B are independent then the probability of the conjunction is the product P(A) × P(B). If we are ignorant of the independence or dependence of A and B, the probability of the conjunction must fall into an interval given by a lower and an upper probability, max[0, P(A) + P(B) − 1] ≤ P(A ∧ B) ≤ min[P(A), P(B)]. Tversky and Kahneman showed that human probability judgments may be systematically greater than the upper bound of this interval (Tversky and Kahneman 1983; Hertwig and Gigerenzer 1999; Mellers et al. 2001). Take, as an example, the well-known LINDA task:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

The subjects are asked to rank the following alternatives by their probability: Linda is active in the feminist movement (A), Linda is a bank teller (B), Linda is a bank teller and is active in the feminist movement (A ∧ B). In the original experiments 85% of the subjects rated the conjunction (A ∧ B) as more probable than the individual event that Linda is a bank teller (B). Because of such ‘upper bound’ violations the human subjects are said to commit a conjunction error.

Synthese (2005) 146: 93–109. © Springer 2005

Is the inequality that is violated by human judgments in the conjunction fallacy a special case only or is it part of a more general tendency? Are there other inequalities that may be employed to investigate whether human reasoning respects elementary probabilistic relations? Probability theory is of course governed by linear constraints, and the underlying principle of coherence. The principle of coherence is a bridge between probability on one hand and nonmonotonic reasoning on the other hand (Gilio 2002; Coletti and Scozzafava 2002; Schurz 1997).

While there are many systems of nonmonotonic reasoning, SYSTEM P, proposed by Kraus et al. (1990), is broadly accepted, and for just this system Gilio gave a probability semantics (Gilio 2002). The rules of SYSTEM P serve as a minimal set of basic rationality postulates for nonmonotonic reasoning. Since it provides some unification within the jungle of the manifold approaches to nonmonotonic reasoning, it is psychologically attractive (cf. Da Silva et al. 2002, p. 107).

We denote a nonmonotonic conditional by ‘|∼’, which is a genuine nonmonotonic operator. ‘α |∼ β’ is read as ‘if α, then normally β’. Probability semantics interprets α |∼ β within a probability model, such that the conditional probability P(β|α) is ‘high’, that is, α |∼ β is interpreted as P(β|α) > θ. Non-infinitesimal probability semantics requires practically high probabilities, that is, θ > 0.5. More specifically, the nonmonotonic conditional is then written as ‘α |∼_x β’, where x is an element of the interval [x_*, x^*], that is, α |∼_x β

is interpreted as

P(β|α) ∈ [x_*, x^*],

where ‘x_*’ refers to the lower and ‘x^*’ to the upper bound of the probability interval [x_*, x^*], and 0 ≤ x_* ≤ x^* ≤ 1. Point probabilities are treated as special cases of intervals, such that x_* = x^*. Gilio (2002) proposed to propagate probability bounds in SYSTEM P, such that the coherent probability bounds of the conclusion (z ∈ [z_*, z^*]) can be inferred from the coherent upper and lower probability bounds of the premises (x ∈ [x_*, x^*], y ∈ [y_*, y^*]). Table I states the rules of SYSTEM P (Kraus et al. 1990, p. 189f), and how the lower and upper probability bounds of each rule are propagated (Gilio 2002, p. 6f). SYSTEM P is nonmonotonic, since (i) it contains a genuine nonmonotonic conditional, |∼, and (ii) monotony cannot be deduced, i.e., from α |∼ β we cannot deduce α ∧ γ |∼ β, or in probabilistic terms, from P(β|α) = θ only P(β|α ∧ γ) ∈ [0, 1] can be deduced, a completely non-informative conclusion.

The goal of the present study is to investigate empirically a probabilistic interpretation of a selection of rules of SYSTEM P. An interesting byproduct is that the experimental material involved allows throwing some light on some previously neglected aspects of the conjunction fallacy, namely how subjects treat the lower probability bounds. Hitherto, only upper bound violations were considered.

Theoretical papers on nonmonotonic reasoning often motivate their work by claiming to mimic human common sense. What is the empirical status of this claim? Lifschitz (1989) presented a list of nonmonotonic benchmark problems, which were investigated by Elio and Pelletier (in press), and by Ford and Billington (2000). Da Silva Neves, Bonnefon, and Raufaste report an experiment on SYSTEM P (Da Silva et al. 2002). Vogel worked on human nonmonotonic inheritance reasoning (Vogel 1996), and Schurz conducted an empirical study on basic questions of nonmonotonic reasoning (Schurz 1997). Most of the studies used unspecified phrases like normally (Pelletier and Elio 2003; Vogel 1995) or usually (Ford and Billington 2000). This may result in investigating the understanding of verbal phrases, but not nonmonotonic reasoning.
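The failure of monotony described in the introduction can be illustrated numerically. The following is a minimal sketch, not part of the original study: the value θ = 0.9 and the two toy probability models are assumptions chosen for illustration. Both models agree on P(β|α) = θ, yet P(β|α ∧ γ) takes the extreme values 0 and 1.

```python
from fractions import Fraction

def conditional(joint, event, given):
    """P(event | given) for a joint distribution over (alpha, beta, gamma) worlds."""
    p_given = sum(p for world, p in joint.items() if given(world))
    p_both = sum(p for world, p in joint.items() if given(world) and event(world))
    return p_both / p_given

theta = Fraction(9, 10)  # illustrative value for P(beta | alpha), an assumption

# Worlds are (alpha, beta, gamma) truth triples; all mass lies on alpha-worlds.
model_low = {   # every alpha-and-gamma world is a not-beta world
    (True, True, False): theta,
    (True, False, True): 1 - theta,
}
model_high = {  # every alpha-and-gamma world is a beta world
    (True, True, True): theta,
    (True, False, False): 1 - theta,
}

def alpha(world):
    return world[0]

def beta(world):
    return world[1]

def alpha_and_gamma(world):
    return world[0] and world[2]

for name, model in [("low", model_low), ("high", model_high)]:
    print(name,
          conditional(model, beta, alpha),            # theta in both models
          conditional(model, beta, alpha_and_gamma))  # 0 in one model, 1 in the other
```

Since both extremes are reachable with the same premise, no tighter interval than [0, 1] can be inferred for the strengthened antecedent.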

2. EXPERIMENTS

Four experiments were conducted to investigate empirically the AND, the OR and the LLE rule of SYSTEM P. In all studies, subjects were tested individually, using paper and pencil material. The rules were packed into cover stories. Probabilities of the nonmonotonic conditionals were transformed into percentages. The AND rule, for example, was represented by the following structure: Consider a group A. About A we know that x% of A are B (premise 1: A |∼_x B), and we know that y% of A are C (premise 2: A |∼_y C). What is the percentage of B and C in group A (conclusion: A |∼_z B ∧ C)?

The percentages in the premises were presented in terms of intervals (from at least x% to at most 100%) or in terms of point percentages. The subjects were free to respond either by interval or by point percentages. The upper probability bounds of the premises were held constant at 100%. Two introductory examples were presented and explained. We varied the cover stories, the numerical values of the percentages in the premises, and the format in which the premises were presented, namely in the form of point and in the form of interval percentages. A more detailed description of the experiments can be found in (Pfeifer 2002).

TABLE I
SYSTEM P and propagation rules for lower (z_*) and upper (z^*) probability bounds. The probabilities associated with the conclusions can be inferred from the probabilities associated with the premises, as indicated. For every propagation rule, z ∈ [z_*, z^*] and 0 ≤ z_* ≤ z^* ≤ 1.

REFLEXIVITY (Axiom)
  Conclusion: α |∼_z α
  Lower bound: z_* = 1
  Upper bound: z^* = 1

LEFT LOGICAL EQUIVALENCE
  Premises: ⊨ α ↔ β, α |∼_x γ
  Conclusion: β |∼_z γ
  Lower bound: z_* = x_*
  Upper bound: z^* = x^*

RIGHT WEAKENING
  Premises: ⊨ α → β, γ |∼_x α
  Conclusion: γ |∼_z β
  Lower bound: z_* = x_*
  Upper bound: z^* = 1

OR
  Premises: α |∼_x γ, β |∼_y γ
  Conclusion: α ∨ β |∼_z γ
  Lower bound: z_* = x_* y_* / (x_* + y_* − x_* y_*)
  Upper bound: z^* = (x^* + y^* − 2 x^* y^*) / (1 − x^* y^*)

CUT
  Premises: α ∧ β |∼_x γ, α |∼_y β
  Conclusion: α |∼_z γ
  Lower bound: z_* = x_* y_*
  Upper bound: z^* = x^* y_* + 1 − y_*

CAUTIOUS MONOTONICITY
  Premises: α |∼_x β, α |∼_y γ
  Conclusion: α ∧ β |∼_z γ
  Lower bound: z_* = (x_* + y_* − 1) / x_* if x_* + y_* > 1; z_* = 0 if x_* + y_* ≤ 1
  Upper bound: z^* = y^* / x_* if y^* < x_*; z^* = 1 if y^* ≥ x_*

AND (Derived rule)
  Premises: α |∼_x β, α |∼_y γ
  Conclusion: α |∼_z β ∧ γ
  Lower bound: z_* = max{0, x_* + y_* − 1}
  Upper bound: z^* = min{x^*, y^*}

‘|∼’ denotes the nonmonotonic conditional, ‘⊨’ the classical logical validity, ‘↔’ the material equivalence, ‘→’ the material implication, ‘∨’ the disjunction and ‘∧’ the conjunction. The ‘,’ separates the elements of the premise set. Premises and conclusion are listed separately for each rule.

2.1. Study 1: The AND Rule

The AND rule is of special interest, since it is a bridge between nonmonotonic reasoning and the research on the conjunction fallacy.

2.1.1. Method

Eighty students of the University of Salzburg participated in Study 1. No students of mathematics were included. Subjects were paid four Euros each. A prize was promised to the ten best participants. Each subject received a booklet containing a general introduction, one example with point, and one with interval percentages. Three target tasks were presented on separate pages. Eleven additional target tasks were presented in tabular form. The first three target tasks were of the following kind:

Please imagine the following situation: In a train station a tourist party from Alsace is waiting for their train connection. About this tourist party we know the following:

at least 89%, and at most 100%, speak German.
at least 91%, and at most 100%, speak French.

Please try to determine what is the percentage of this tourist party that speaks both, German and French. The solution is either a point percentage or a percentage between two boundaries (from at least . . . to at most . . .):

a.) If you think that the correct answer is a point percentage, please fill in your answer here:
Point percentage: Exactly . . .% of the tourist party speak German and French.
[Response scale from 0 to 100%, labeled at 0, 25, 50, 75, and 100]

b.) If you think that the correct answer lies within two boundaries (from at least . . . to at most . . .), please mark the two values here:
Within the bounds of: At least . . .%, and at most . . .%, of the tourist party speak German and French.
[Response scale from 0 to 100%, labeled at 0, 25, 50, 75, and 100]

There were two experimental conditions: (i) the POINT versus INTERVAL condition and (ii) the STRUCTURE versus NO STRUCTURE condition. In the POINT condition each premise specified just one percentage number, whereas in the INTERVAL condition lower and upper percentages were specified.


The point percentages of the two premises in the first task were 89 and 91%, in the second task 99 and 63%, and in the third task the percentages were 64 and 98%, respectively. The lower percentages in the INTERVAL condition were the same as the point percentages just given. The upper percentages were always 100%. In the STRUCTURE condition a 2 × 2 table was presented containing the column labels ‘German’ and ‘not-German’, the row labels ‘French’ and ‘not-French’, and the associated percentages. It is hypothesized that the structure should make the task easier. Twenty subjects were assigned to each of the four conditions. The next two tasks were formulated accordingly. Task one and task two were plausible with respect to the percentages of the spoken languages, since bilingualism is common in the corresponding geographical areas. In the third task, however, this plausibility does not hold. In addition, the subjects were asked to solve eleven analogous tasks presented in tabular form, as follows:

The following table lists eleven African villages (a–k). In each of them, two languages (A, B) are spoken. It is known how many percent are speaking language A, and it is known how many percent are speaking language B. Please determine now how many percent of people are speaking both languages (language A and language B):

| Village | Speaking language A | Speaking language B | Speaking A and B |
|---|---|---|---|
| a | 60% to 100% | 70% to 100% | |
| ... | ... | ... | |
| k | 79% to 100% | 88% to 100% | |
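The normative intervals for these AND tasks follow from the bounds max{0, x_* + y_* − 1} and min{x^*, y^*}. A minimal sketch in percent units, assuming the premise values quoted above; the function name `and_rule_bounds` and the dictionary keys are illustrative, not taken from the original material:

```python
def and_rule_bounds(x_low, y_low, x_up=100.0, y_up=100.0):
    """Coherent bounds (in percent) for 'B and C' given the bounds of 'B' and 'C'.

    Lower bound: max(0, x_low + y_low - 100); upper bound: min(x_up, y_up).
    """
    lower = max(0.0, x_low + y_low - 100.0)
    upper = min(x_up, y_up)
    return lower, upper

# Lower premise percentages of the three page tasks and two of the village tasks.
tasks = {
    "A1": (89, 91),
    "A2": (99, 63),
    "A3": (64, 98),
    "village a": (60, 70),
    "village k": (79, 88),
}

for name, (x, y) in tasks.items():
    point = and_rule_bounds(x, y, x_up=x, y_up=y)  # POINT: upper equals lower premise value
    interval = and_rule_bounds(x, y)               # INTERVAL: premise upper bounds are 100%
    print(name, "POINT:", point, "INTERVAL:", interval)
```

With point premises the upper bound is the smaller of the two percentages; with interval premises whose upper bounds are 100% the conclusion's upper bound is 100% as well.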

The subjects were tested individually in a quiet room in the psychology department. Subjects were told to take as much time as they wanted. In case of questions, the subjects were asked to reread the instructions carefully.

2.1.2. Results and Discussion

At the end of each session the subjects rated the overall comprehensibility and the overall difficulty of the tasks on a rating scale from one (very comprehensible and very easy, respectively) to five (very incomprehensible and very difficult, respectively). The mean comprehensibility was 1.65 (SD = 0.82) and the mean difficulty was 3.45 (SD = 0.93). This indicates that the task comprehension was good, but the tasks were considered to be difficult.

We denote the first three tasks by A1, A2, and A3, and the eleven tasks presented in tabular form by B1, . . . , B11. Task B7 (100% of group G speak A, 100% of G speak B, how many of G speak A and B?) was excluded from the data analysis. The 2 × 2 univariate Analyses of Variance performed on lower bound responses produced statistically significant differences for the condition INTERVAL versus POINT; see Table II. Mean lower bound responses in all tasks were higher in the INTERVAL condition than in the POINT condition. No main effect was found in the STRUCTURE condition (present / not present). No interaction effect of both conditions was found. Hence, we pooled the data such that two groups remain: the POINT (n = 40) versus the INTERVAL condition (n = 39).

TABLE II
Mean lower bound responses. ‘**’ denotes a significance level of p ≤ .01, ‘*’ of p ≤ .05, and ‘†’ of p ≤ .1. Main effects of the POINT / INTERVAL factor in 2 × 2 Analyses of Variance (n = 79, df = 1/75)

| Task | POINT | INTERVAL |
|---|---|---|
| A1** | 52.70 | 73.87 |
| A2† | 46.15 | 55.46 |
| A3* | 45.83 | 56.99 |
| B1* | 30.48 | 41.93 |
| B2* | 33.15 | 44.51 |
| B3† | 37.74 | 45.87 |
| B4† | 42.11 | 50.71 |
| B5* | 33.03 | 44.35 |
| B6** | 24.85 | 40.18 |
| B8** | 11.65 | 29.62 |
| B9† | 49.66 | 62.18 |
| B10* | 30.48 | 41.72 |
| B11† | 50.81 | 62.13 |

In the INTERVAL condition, 88.46% of the subjects responded correctly that the solution is an interval and not a point percentage (n = 39), but only 56.92% of the POINT condition gave interval responses (n = 40). Since the first three tasks (A1, A2, A3) are each presented on a separate page, while the ten remaining tasks (B1, . . . , B11) are presented in a tabular form, Table III summarizes numbers and percentages of correct interval responses separately. About 25% of the subjects in both conditions responded with the normative upper and lower bounds.

Table IV lists categories of interval responses, crossing whether the subjects’ lower bounds are below (LB), within (LW), or above (LA) the normative intervals with whether the subjects’ upper bounds are above (UA), within (UW) or below (UB) the normative solution. The corresponding frequencies of coherent responses in the INTERVAL condition are: 31, 28, 31, 32, 32, 32, 30, 32, 33, 34, 32, 32, 31, and 31 (n = 39; listed in the same order as in Table IV). In both conditions the most frequent responses were coherent (cell e). Conjunction fallacies are clearly less frequent than lower bound responses that fall below the normative lower bound.


TABLE III
Numbers and percentages of correct interval responses (i.e., both bounds correct)

| Condition | Tasks A1–A3: Correct | Tasks A1–A3: Within ±2% | Tasks B1–B6 and B8–B11: Correct | Tasks B1–B6 and B8–B11: Within ±2% |
|---|---|---|---|---|
| POINT | 30 (25.00%) | 54 (45.00%) | 98 (24.50%) | 119 (29.75%) |
| INTERVAL | 26 (22.22%) | 66 (56.41%) | 104 (26.67%) | 124 (31.79%) |

In the tasks presented in tabular form, two pairs of tasks had the same premises, but with interchanged positions. These were tasks B2 and B5, and B9 and B11. The correlations between the lower and upper bound responses of these task-pairs are in both conditions very high: POINT condition (n = 40): 0.985 and 0.970 (lower bounds), and 0.993 and 0.959 (upper bounds), respectively. For the INTERVAL condition only the lower bound correlations were calculated, namely 0.997 and 0.998 (n = 39), since nearly all subjects responded correctly ‘100%’ as upper bound. Therefore, our subjects treated the premises as commutative. In addition, this speaks for the robustness of the data. Most responses of our subjects fell into the normatively correct probability intervals. This means that clearly more probabilistic inferences were coherent than incoherent. The heuristics and biases approach typically reports upper bound violations. We observed more lower than upper bound violations. Mean lower bound responses of the INTERVAL condition were higher than the mean lower bounds of the POINT condition. This effect may be produced by the fact that four percentages were presented in each task of the INTERVAL condition, but only two percentages were presented in the POINT condition: the upper bounds (100%) may, thus, induce an anchor in the INTERVAL condition. This possibility will be investigated in Study 4.

2.2. Study 2: The OR Rule and the LLE Rule

The OR rule is of special interest, since its presence is characteristic for SYSTEM P, and distinguishes SYSTEM P from the weaker SYSTEM C, which lacks the OR rule (Kraus et al. 1990, p. 190). We used a qualitative response mode for this rule, because of its complex semantic structure. LEFT LOGICAL EQUIVALENCE (LLE) is a very simple rule, and should be solved correctly.

TABLE IV
Interval responses of the POINT condition, normative intervals in brackets (n = 40). UA: the subjects’ upper bound is above the normative upper bound, UW: upper bound within normative interval, UB: upper bound below normative lower bound; LA, LW, and LB: same for subjects’ lower bounds. Cells: a (UA, LB): too wide intervals; b (UA, LW): lower bound correct; c (UA, LA): both bounds above; d (UW, LB): upper bound correct; e (UW, LW): both bounds correct; f (UB, LB): both bounds below. Each row lists the six possible cells of one task’s 3 × 3 table.

| Task | a (UA, LB) | b (UA, LW) | c (UA, LA) | d (UW, LB) | e (UW, LW) | f (UB, LB) |
|---|---|---|---|---|---|---|
| A1 (80–89) | 0 | 4 | 0 | 12 | 19 | 5 |
| A2 (62–63) | 0 | 4 | 0 | 9 | 23 | 4 |
| A3 (62–64) | 0 | 3 | 1 | 11 | 21 | 4 |
| B1 (30–60) | 0 | 1 | 1 | 8 | 30 | 0 |
| B2 (35–63) | 1 | 1 | 2 | 9 | 25 | 2 |
| B3 (45–55) | 1 | 1 | 0 | 10 | 26 | 2 |
| B4 (55–56) | 1 | 1 | 0 | 9 | 25 | 4 |
| B5 (35–63) | 1 | 1 | 2 | 10 | 24 | 2 |
| B6 (20–60) | 1 | 0 | 1 | 6 | 31 | 1 |
| B7 (100–100) | 6 | 0 | 0 | 0 | 30 | 4 |
| B8 (2–51) | 1 | 0 | 1 | 6 | 31 | 1 |
| B9 (67–79) | 1 | 1 | 0 | 8 | 23 | 7 |
| B10 (33–56) | 1 | 1 | 1 | 8 | 26 | 3 |
| B11 (67–79) | 1 | 1 | 0 | 8 | 24 | 6 |
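The response categories of Table IV can be expressed as a small classification routine. This is a sketch under stated assumptions rather than the authors' scoring procedure: the function name is hypothetical, and treating a bound that exactly equals a normative bound as ‘within’ is an assumption about boundary handling that the text does not specify.

```python
def classify_interval_response(resp_low, resp_up, norm_low, norm_up):
    """Return (upper category, lower category, cell letter) for one interval response.

    Categories follow the Table IV legend: B = below, W = within, A = above the
    normative interval. Bounds equal to a normative bound count as 'within'
    (an assumption made for this sketch).
    """
    if resp_low > resp_up:
        raise ValueError("lower bound must not exceed upper bound")

    def position(value):
        if value < norm_low:
            return "B"
        if value > norm_up:
            return "A"
        return "W"

    upper_cat = "U" + position(resp_up)
    lower_cat = "L" + position(resp_low)
    cells = {
        ("UA", "LB"): "a",  # too wide interval
        ("UA", "LW"): "b",  # lower bound correct
        ("UA", "LA"): "c",  # both bounds above
        ("UW", "LB"): "d",  # upper bound correct
        ("UW", "LW"): "e",  # both bounds correct (coherent)
        ("UB", "LB"): "f",  # both bounds below
    }
    return upper_cat, lower_cat, cells[(upper_cat, lower_cat)]

# Task A1 of the POINT condition has the normative interval 80-89.
for response in [(80, 89), (50, 95), (50, 85), (85, 95), (90, 95), (40, 60)]:
    print(response, classify_interval_response(*response, 80, 89))
```

Because a lower bound never exceeds its upper bound, only the six combinations in the dictionary can occur; the remaining three cells of the 3 × 3 table stay empty.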


TABLE V
Mean percentages of correct lower (LB) and upper bound (UB) responses and correct within ±2% of the POINT condition (first row), the INTERVAL condition (second row), and of both conditions (last row, n = 79). The B tasks are the ten tasks presented in tabular form

| Condition | A1–A3: LB | A1–A3: UB | A1–A3 (±2%): LB | A1–A3 (±2%): UB | Ten B tasks: LB | Ten B tasks: UB | B tasks (±2%): LB | B tasks (±2%): UB |
|---|---|---|---|---|---|---|---|---|
| POINT | 45.83% | 52.50% | 58.33% | 72.50% | 46.25% | 52.50% | 51.50% | 57.25% |
| INTERVAL | 24.79% | 77.78% | 58.97% | 86.32% | 26.92% | 74.62% | 33.85% | 75.13% |
| Both | 35.44% | 64.98% | 58.65% | 79.32% | 36.71% | 63.42% | 42.78% | 66.08% |
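The last row of Table V appears to be the sample-size-weighted mean of the two condition rows. A brief check, assuming group sizes of 40 (POINT) and 39 (INTERVAL) as reported for Study 1; the helper name is illustrative, and small deviations are expected because the tabulated inputs are already rounded:

```python
def pooled_percentage(p_point, p_interval, n_point=40, n_interval=39):
    """Weighted mean of two condition percentages, weighted by group size."""
    return (n_point * p_point + n_interval * p_interval) / (n_point + n_interval)

# Lower and upper bound columns for tasks A1-A3 (exactly correct responses).
print(round(pooled_percentage(45.83, 24.79), 2))  # compare with 35.44 in the table
print(round(pooled_percentage(52.50, 77.78), 2))  # compare with 64.98 in the table
```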

2.2.1. Method

Twenty paid university students participated in Study 2. Students of mathematics were not included. As in Study 1, each subject received a booklet containing an introduction with examples. Each subject solved three tasks involving the OR and one task involving the LLE rule. Each task was presented on a separate page. The tasks were presented in a pseudo-random order. In the OR tasks the subjects were asked to imagine:

At a university, there are two leisure activity courses: a skiing course and a fencing course. About these courses we know the following:

at least 76%, and at most 100%, of the students who are in the skiing course are female.
at least 85%, and at most 100%, of the students who are in the fencing course are female.

On Wednesday evening, all students in both courses meet each other in the gym (nobody else is there). Which one of the following assertions is true? (Please tick just one alternative)

□ One can say for certain that overall there are more female than male students in the gym (that is, there are more than 50% females).
□ One cannot say for certain that overall there are more female than male students in the gym (that is, there are more than 50% females).

□ Given the information above, one can say that the percentage of female students in the gym is a point percentage.
□ Given the information above, one can say that the percentage of female students in the gym lies within a certain interval (from at least . . . %, to at most . . . %).

The other two OR tasks differ in the values of the lower bounds, 75 and 97%, and 57 and 62%, respectively. The LLE task was presented as follows:


A scientist wants to know how a certain disease is caused. She assumes that the shape and the color of the blood cells matter. Hence, she investigated the shape of 100 blood cells. Her assistant investigated the color of the same 100 blood cells. The following was found:

all oval blood cells are grey
all grey blood cells are oval
at least 70%, and at most 100%, of the grey blood cells lead to the disease with certainty

The scientist asks herself: How often is the disease present, if the blood cells are oval?

As in Study 1, the subjects could give either point or interval percentages as responses.

2.2.2. Results and Discussion

At the end of each session the subjects rated the overall comprehensibility of the tasks, and the overall confidence that their solutions are correct, on a rating scale from one (very comprehensible and very certain, respectively) to five (very incomprehensible and very uncertain, respectively). The mean comprehensibility was 2.10 (SD = 0.91) and the mean certainty was 2.75 (SD = 1.16). This indicates that the task comprehension was good, and subjects had intermediate confidence in their inferences.

In the OR task with 57 and 62% in the premises, thirty-five percent of the subjects responded correctly that it is not certain that there are more female than male students in the gym. In the remaining two OR tasks ninety-five percent of the subjects responded correctly by saying that it is certain that there are more female than male students in the gym. In all three tasks all twenty subjects responded correctly that, based upon the information given, the percentage of female students in the gym lies within an interval. In the LLE task nineteen out of twenty subjects responded correctly that if the blood cells are oval, then the disease is present with at least 70%, and at most 100%. One subject responded with “exactly 100%”.

It is interesting to note that in the task with 57 and 62% as lower bounds in the premises, 35% of the subjects seem to understand that the lower bound of the conclusion can be below 50%. In fact, the normative lower bound is 42.24%. It is far easier to see that the conclusion must be above 50% in the other two OR tasks, and it is therefore not surprising that 95% of the subjects gave correct responses. The LLE task was, as expected, solved correctly by nearly all subjects. We think that this speaks for the validity of our experimental device. Da Silva Neves et al. used quite different methods, but could not corroborate the LLE rule (Da Silva et al. 2002).
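The normative lower bounds of the three OR tasks can be recomputed from the OR propagation rule z_* = x_* y_* / (x_* + y_* − x_* y_*). The sketch below is illustrative: the function names are hypothetical, and the ‘witness’ group composition (everyone enrolled in both courses is female, everyone enrolled in only one course is male) is an assumed construction showing that the bound can actually be reached.

```python
from fractions import Fraction

def or_lower_bound(x_low, y_low):
    """Lower bound of P(gamma | alpha or beta) under the OR rule (requires x_low, y_low > 0)."""
    product = x_low * y_low
    return product / (x_low + y_low - product)

def witness_share(x_low, y_low):
    """Share of females in the gym for an extreme, assumed group composition.

    One unit of students attends both courses and is entirely female; the
    students attending only one course are entirely male, sized so that each
    course has exactly its lower-bound share of females.
    """
    both = Fraction(1)
    ski_only = both * (1 - x_low) / x_low
    fence_only = both * (1 - y_low) / y_low
    return both / (both + ski_only + fence_only)

tasks = [(57, 62), (76, 85), (75, 97)]  # lower premise percentages of the OR tasks
for x, y in tasks:
    x_low, y_low = Fraction(x, 100), Fraction(y, 100)
    bound = or_lower_bound(x_low, y_low)
    print(x, y, round(float(bound) * 100, 2), bound == witness_share(x_low, y_low))
```

Only the 57/62 task yields a lower bound below 50%, which matches the task for which ‘not certain’ was the correct response.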


2.3. Study 3: AND, LLE, LINDA

2.3.1. Method

Study 1 contains dispositional properties, namely languages that are spoken. In Study 3 we tried to formulate more concrete properties that might be easier for the subjects to represent. Forty unpaid university students were assigned to two different groups, with twenty persons each. The experimental condition was PROPERTY versus ASSIGNMENT. In the PROPERTY condition, the objects in the tasks were properties of individuals (e.g., being light-skinned and having black hair). In the ASSIGNMENT condition, the objects in the tasks were objects that are assigned to individuals (e.g., rice bags given to families). Each subject got a booklet containing introductory examples, four AND tasks, two LLE tasks, and the original LINDA task (cf. Section 1). The positions of the AND and LLE tasks were pseudo-randomized. The LINDA task was always the last problem. Each subject solved two AND tasks and one LLE task with interval premises, 80 and 65%, 70 and 55%, and 80% as lower bounds, respectively. The upper bounds were always 100%. In addition, all subjects solved the same tasks again, with the difference that these tasks were formulated in terms of point percentages, which were identical to the lower bounds just given. A prototypical AND task of the PROPERTY condition was presented as follows:

We know the following about students of a high school:

At least 55%, and at most 100%, are light-skinned.
At least 70%, and at most 100%, have black hair.

Please try to determine what is the percentage of the students that are both, light-skinned and have black hair.

A prototypical LLE task of the ASSIGNMENT condition was as follows:

Above a village in Afghanistan an aeroplane drops aid in form of rice sacks and potato sacks. About this aid, we know the following:

All potato sacks are blue.
All blue sacks contain potatoes.
Exactly 80% of the potato sacks are found by the villagers.

Please try to determine what is the percentage of the blue sacks that are found by the villagers.

As in Study 1, subjects could respond either by point or by interval percentages.


2.3.2. Results and Discussion

The mean comprehensibility of the tasks was 1.97 (SD = 1.09) and the mean certainty of correct solutions was 3.05 (SD = 1.07). This indicates that the task comprehension was good, and the subjects had intermediate confidence in their inferences. No significant differences were found when comparing the mean responses in the PROPERTY and the ASSIGNMENT conditions by t-tests. Therefore the data from both groups were pooled for further analysis.

In the four AND tasks 11.25% of the subjects responded with normatively correct upper and lower bounds. In the LLE task 92.50% of the upper and lower bound responses were correct. Table VI categorizes the frequencies of subjects’ responses. The tasks with 100% as normative upper bounds were the INTERVAL tasks, and the remaining tasks contained point percentages in the premises. As in Study 1, the most frequent responses are coherent (cell e). Within the non-coherent responses, no systematic tendency of upper or lower bound violations could be observed.

In the AND tasks 14.38% of the lower bound responses are identical to the normative lower bounds; within ±5%, 20.00% of the subjects gave correct lower bound responses. In the AND tasks with intervals in the premises twenty-six and twenty-nine out of forty subjects gave correct upper bound responses, and twenty-one and nineteen did so in the respective AND tasks that contained point percentages in the premises. In the LLE tasks 95.00% of the subjects gave correct lower bound responses, 92.50% of the subjects gave correct upper bounds to the task that contained intervals and 97.50% responded correctly to the task that contained point percentages in the premises.

Paired t-tests between the two AND tasks with 45% as normative lower bound displayed a significant difference (t = 2.510, df = 39, p = 0.016). However, paired t-tests comparing lower bounds of the two other AND tasks, with 25% as normative lower bounds, showed no significant differences. Mean lower bound responses of the interval tasks were higher than in the corresponding point percentage tasks. This is similar to the findings of Study 1.

With the LINDA task we replicate the findings of Kahneman and Tversky. The original study reports that 85% of the subjects ranked the three sentences as follows: ‘Linda is active in the feminist movement’ (A) as most probable, ‘Linda is a bank teller’ (B) as least probable and the conjunction of both sentences as intermediately probable: P(A) > P(A ∧ B) > P(B) (Tversky and Kahneman 1983, p. 297). We observed that 82.50% of the subjects rated the sentences according to this pattern. Solving AND and LLE tasks before the LINDA task did not make the subjects more cautious with the LINDA task.


TABLE VI
Study 3 (n = 40). The normative intervals are given in the brackets. For explanation see Table IV. Each row lists the six possible cells of one task’s 3 × 3 table; ‘–’ marks cells that cannot occur.

| Task | a (UA, LB) | b (UA, LW) | c (UA, LA) | d (UW, LB) | e (UW, LW) | f (UB, LB) |
|---|---|---|---|---|---|---|
| AND (25–100) | – | – | – | 5 | 33 | 2 |
| AND (45–65) | 0 | 11 | 0 | 6 | 16 | 7 |
| AND (45–100) | – | – | – | 3 | 32 | 5 |
| LLE (80–100) | – | – | – | 1 | 39 | 0 |
| AND (25–55) | 1 | 10 | 0 | 5 | 23 | 1 |
| LLE (80–80) | 0 | 0 | 0 | 1 | 39 | 0 |

2.4. Study 4: AND

One finding of Study 1 was that subjects in the INTERVAL condition gave significantly higher lower bounds than subjects of the POINT condition. Study 4 investigates the question whether this effect is produced by presenting four percentages in the INTERVAL condition, that is, two lower and two upper percentages, while only two percentages were presented in the POINT condition. The method and the procedure were essentially the same as in the INTERVAL and NO STRUCTURE condition of Study 1. The difference was that “to at most 100%” was dropped in the premises.

Correlations of lower bound responses of those task pairs which have commutative premises are again as high as in Study 1, namely 0.973 and 0.984. Performing t-tests on the data of the INTERVAL condition of Study 1 and data of Study 4, comparing lower and upper percentages, respectively, does not show significant differences. This shows that stating explicitly the upper bound does not produce the higher lower bound responses in the INTERVAL condition compared with the lower bound responses in the POINT condition.


3. GENERAL DISCUSSION

When studying human nonmonotonic reasoning it seems to be a good strategy to start at a point where previous psychological research can be related to central principles of nonmonotonic reasoning. Probability is such a point. We thus compared human reasoning under incomplete information with one of the probability semantics of nonmonotonic reasoning. Practically all judgmental violations of probability theory seem to violate principles of probabilistic non-infinitesimal interpretations of nonmonotonic reasoning. There are, however, a number of differences between the more traditional judgment under uncertainty approach and the coherence oriented interpretation of nonmonotonic reasoning. An obvious difference is that between point and interval probabilities. While traditional probability judgment is concerned with point probabilities, coherent nonmonotonic reasoning is concerned with interval probabilities. Intervals arise in situations with incomplete information. Furthermore, there is a subtle but important difference between psychological experiments on ‘judgment’ and experiments on ‘reasoning’. In experiments on human judgment the subjects respond more spontaneously, they take less time and invest less processing effort than in experiments on human reasoning. In the latter, subjects are instructed to take more time and to ‘solve a problem’. In our experiments we explained the tasks very carefully, we first presented introductory examples, we told the subjects to take as much time as they need, we paid the subjects, and we announced a bonus for the best solutions. Moreover, all subjects were tested individually. Perhaps as a consequence of this care, we observed very high reliabilities. In the tasks investigated, the majority of the responses of our subjects were coherent and in agreement with the probability semantics of SYSTEM P . This is especially true for the LEFT LOGICAL EQUIVALENCE ( LLE ), but also for the conjunction (AND) and the disjunction (OR) rule. 
The conjunction fallacy that is frequently reported in the literature was not prominent in our data. However, presenting the original LINDA task did reproduce the findings reported by Tversky and Kahneman (1983). The LINDA task was presented after the AND tasks; this did not reduce the conjunction fallacy in the LINDA task, so there was no ‘transfer’ from the AND tasks to the LINDA task. In Study 1 we observed more ‘lower bound’ than ‘upper bound’ violations. This might be caused by a suspected negative correlation in our cover stories: speaking one language is normally negatively correlated with speaking a second language.
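The distinction between ‘lower bound’ and ‘upper bound’ violations can be made concrete with a minimal sketch. It assumes the unconditional conjunction interval stated in the Introduction, max[0, P(A) + P(B) − 1] ≤ P(A ∧ B) ≤ min[P(A), P(B)]. The function names, the tolerance, and the numerical premises are hypothetical and are not taken from the experiments.

```python
def conjunction_interval(p_a, p_b):
    """Coherent bounds for P(A and B) when only P(A) and P(B) are known."""
    lower = max(0.0, p_a + p_b - 1.0)
    upper = min(p_a, p_b)
    return lower, upper


def classify_response(p_a, p_b, response, tol=1e-9):
    """Label a point response as coherent or as a bound violation."""
    lower, upper = conjunction_interval(p_a, p_b)
    if response < lower - tol:
        return "lower bound violation"
    if response > upper + tol:
        return "upper bound violation"  # the classical conjunction fallacy
    return "coherent"


if __name__ == "__main__":
    # Hypothetical premises P(A) = 0.8 and P(B) = 0.7 (illustration only).
    print(conjunction_interval(0.8, 0.7))
    for response in (0.3, 0.6, 0.9):
        print(response, classify_response(0.8, 0.7, response))
```

With these illustrative premises, a response of 0.3 falls below the interval, 0.6 lies inside it, and 0.9 exceeds the smaller premise probability.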

The lower bound responses in the INTERVAL condition are higher than in the point percentage condition. Study 4 shows that this effect is not produced by a matching with respect to those numbers already contained in the explanation of the task. Relatedly, the POINT condition produces more lower bound violations.

Practically all subjects solved the LLE tasks correctly, which makes the LLE rule attractive for mental rule theories. Mental rule theories postulate that the human inference engine is driven by basic formal rules like MODUS PONENS (Rips 1994; Braine and O’Brien 1998). Furthermore, LLE is a candidate for a rule of a mental probability logic, which is interested in the subjects’ propagation of probabilities from the premises to the conclusions.

We found relatively good agreement between human reasoning and a selection of SYSTEM P rules. This indicates that, after careful explanation of the tasks and under thorough experimental conditions, subjects make coherent inferences and respect the linear constraints of coherence. It would be misleading, though, to speculate that our subjects have a ‘nonmonotonic inference engine’ in their minds that processes incomplete uncertain information. Even if human subjects were perfect in handling the axioms and some elementary theorems of SYSTEM P, they would not necessarily be able to handle more complex tasks. There are ‘teaser tasks’ in which human intuition, even after careful thought, conflicts with coherence. One of them is Simpson’s Paradox, which has in its simplest version the same form as the OR rule of SYSTEM P.
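Why premises of the OR form can mislead intuition may be illustrated by a small counting sketch. The counts below are hypothetical and chosen only for illustration; the helper `conditional` is likewise an assumption of this sketch, not part of the studies. The example shows that C can be fairly probable given A and fairly probable given B, while being less probable given A ∨ B than given either premise alone.

```python
from fractions import Fraction

# Each case: (A holds, B holds, C holds, hypothetical count).
CASES = [
    (True, True, True, 6),    # A and B and C
    (True, False, False, 4),  # A only, C fails
    (False, True, False, 4),  # B only, C fails
]


def conditional(cases, condition):
    """Return P(C | condition) as an exact ratio of counts."""
    in_condition = sum(n for a, b, c, n in cases if condition(a, b))
    with_c = sum(n for a, b, c, n in cases if condition(a, b) and c)
    return Fraction(with_c, in_condition)


p_c_given_a = conditional(CASES, lambda a, b: a)             # 6 / 10
p_c_given_b = conditional(CASES, lambda a, b: b)             # 6 / 10
p_c_given_a_or_b = conditional(CASES, lambda a, b: a or b)   # 6 / 14

if __name__ == "__main__":
    print(p_c_given_a, p_c_given_b, p_c_given_a_or_b)
```

In this constructed case both premise probabilities equal 3/5, whereas the probability of C given the disjunction is only 3/7, so the disjunctive conclusion is weaker than either premise.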

ACKNOWLEDGEMENTS

This study was financially supported by the Austrian Research Fonds FWF (SFB-F012), and by the senate of the University of Salzburg (Experiments on uncertain reasoning).

REFERENCES

Braine, M. D. S. and D. P. O’Brien (eds.): 1998, Mental Logic, Erlbaum, Mahwah, NJ.
Coletti, G. and R. Scozzafava: 2002, Probabilistic Logic in a Coherent Setting, Kluwer, Dordrecht.
Da Silva Neves, R., J.-F. Bonnefon, and E. Raufaste: 2002, ‘An Empirical Test of Patterns for Nonmonotonic Inference’, Annals of Mathematics and Artificial Intelligence 34, 107–130.
Ford, M. and D. Billington: 2000, ‘Strategies in Human Nonmonotonic Reasoning’, Computational Intelligence 16(3), 446–468.
Gilio, A.: 2002, ‘Probabilistic Reasoning under Coherence in System P’, Annals of Mathematics and Artificial Intelligence 34, 5–34.

Hertwig, R. and G. Gigerenzer: 1999, ‘The “Conjunction Fallacy” Revisited: How Intelligent Inferences Look Like Reasoning Errors’, Journal of Behavioral Decision Making 12, 275–305.
Kraus, S., D. Lehmann, and M. Magidor: 1990, ‘Nonmonotonic Reasoning, Preferential Models and Cumulative Logics’, Artificial Intelligence 44, 167–207.
Lifschitz, V.: 1989, ‘Benchmark Problems for Formal Nonmonotonic Reasoning, Version 2.00’, in M. Reinfrank, J. de Kleer, and M. Ginsberg (eds.), Nonmonotonic Reasoning, Springer, Berlin, pp. 202–219.
Mellers, B., R. Hertwig, and D. Kahneman: 2001, ‘Do Frequency Representations Eliminate Conjunction Effects? An Exercise in Adversarial Collaboration’, Psychological Science 12(7), 269–275.
Pelletier, F. J. and R. Elio: 2003, ‘Logic and Computation: Human Performance in Default Reasoning’, in P. Gärdenfors, J. Wolenski, and K. Kijania-Placet (eds.), In the Scope of Logic, Methodology and Philosophy of Science, Vol. I, Kluwer, Dordrecht, pp. 137–154.
Pfeifer, N.: 2002, ‘Psychological Investigations on Human Nonmonotonic Reasoning with a Focus on System P and the Conjunction Fallacy’, Master’s thesis, Institut für Psychologie, Universität Salzburg. [email protected].
Rips, L. J.: 1994, The Psychology of Proof: Deductive Reasoning in Human Thinking, MIT Press, Cambridge, MA.
Schurz, G.: 1997, ‘Probabilistic Default Reasoning Based on Relevance and Irrelevance Assumptions’, in D. Gabbay et al. (eds.), Qualitative and Quantitative Practical Reasoning, No. 1244 in LNAI, Springer, Berlin, pp. 536–553.
Schurz, G.: 2001, ‘Nichtmonotones Schließen: Ergebnisse einer empirischen Untersuchung’, Technical report, Institut für Philosophie, SFB F012 Forschungsmitteilungen, p. 17.
Tversky, A. and D. Kahneman: 1983, ‘Extensional versus Intuitive Reasoning: The Conjunction Fallacy in Probability Judgment’, Psychological Review 90, 293–315.
Vogel, C.: 1996, ‘Human Reasoning with Negative Defaults’, in D. Gabbay and H. J. Ohlbach (eds.), Practical Reasoning, Lecture Notes in Artificial Intelligence, 1085, Springer, Berlin, pp. 606–621.

N. Pfeifer and G.D. Kleiter
Department of Psychology
University of Salzburg
Hellbrunnerstraße 34
5020 Salzburg, Austria
E-mails: [email protected]; [email protected]