THE QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY, 2000, 53A (4), 1202–1223

Working memory, metacognitive uncertainty, and belief bias in syllogistic reasoning

Jeremy D. Quayle and Linden J. Ball
University of Derby, Derby, U.K.

Studies of syllogistic reasoning have shown that the size of the belief bias effect varies with manipulations of logical validity and problem form. This paper presents a mental models-based account, which explains these findings in terms of variations in the working-memory demands of different problem types. We propose that belief bias may reflect the use of a heuristic that is applied when a threshold of uncertainty in one’s processing (attributable to working-memory overload) is exceeded during reasoning. Three experiments are reported, which tested predictions deriving from this account. In Experiment 1, conclusions of neutral believability were presented for evaluation, and a predicted dissociation was observed in confidence ratings for responses to valid and invalid arguments, with participants being more confident in the former. In Experiment 2, an attempt to manipulate working-memory loads indirectly by varying syllogistic figure failed to produce predicted effects upon the size of the belief bias effect. It is argued that the employment of a conclusion evaluation methodology minimized the effect of the figural manipulation in this experiment. In Experiment 3, participants’ articulatory and spatial recall capacities were calibrated as a direct test of working-memory involvement in belief bias. Predicted differences in the pattern of belief bias observed between high and low spatial recall groups supported the view that limited working memory plays a key role in belief bias.

Requests for reprints should be sent to J.D. Quayle, Cognitive and Behavioural Sciences Research Group, Institute of Behavioural Sciences, University of Derby, Mickleover, Derby, DE3 5GX, U.K. Email: [email protected]
The reported research was supported by a 3-year grant from the University of Derby. We would like to thank Ruth Byrne, Stephen Payne, Miles Richardson, and an anonymous reviewer for their helpful comments on earlier versions of this paper.
© 2000 The Experimental Psychology Society http://www.tandf.co.uk/journals/pp/02724987.html

Categorical syllogisms are one of a number of problem types used in studies of deductive reasoning. These problems have two premises and a conclusion. For example:

Some politicians are drinkers
No drinkers are judges
Therefore, some politicians are not judges

The premises feature two end terms (politicians and judges) and a middle term (drinkers). A logically valid conclusion to a syllogism describes the relationship between the classes of items or individuals referred to by the end terms in a way that is necessarily true given that the premises are true. Participants in syllogistic reasoning experiments are required either to produce their own valid conclusions or to evaluate the validity of presented conclusions.

Studies using syllogisms with abstract or arbitrarily realistic contents have shown that errors are not random, but vary systematically according to two main factors termed figure and mood (see Evans, Newstead, & Byrne, 1993). Figure refers to the four different ways in which the three terms can be arranged within the premises; these are A–B, B–C; A–B, C–B; B–A, B–C; and B–A, C–B (where B refers to the middle term, A refers to the end term in the first premise, and C refers to the end term in the second premise). Mood refers to the different combinations of logical quantifiers featured within the syllogism. Conventionally, these quantifiers are referred to by letters of the alphabet: A = all; E = no (or none); I = some; and O = some . . . are not. The syllogism quoted earlier, therefore, can be said to be in the A–B, B–C figure and in the IEO mood.

In addition to the effects of figure and mood, people’s beliefs and prior knowledge have been found to affect responding, leading to a phenomenon termed belief bias. This occurs whenever responses are given on the basis of a conclusion’s believability, despite instructions stressing that responses should be made on the basis of logical validity. In studies where belief and logic have been manipulated (e.g., Evans, Barston, & Pollard, 1983), three basic effects have emerged. First, believable conclusions are more readily accepted than unbelievable ones. Second, logically valid conclusions are more readily accepted than invalid ones. Third, there is an interaction between logical validity and believability such that the effects of believability are stronger on syllogisms leading to invalid conclusions than on those leading to valid conclusions.
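The figure and mood notation introduced above can be made concrete with a short sketch. The code below is only an illustration: the function names, the quantifier keys, and the use of a plain hyphen in place of the dash in figure labels are assumptions made here, not notation from the paper. Each premise is given as a (subject, predicate) pair of terms.

```python
# Letter codes for the four quantifiers, following the convention in the text.
QUANTIFIER_LETTER = {"all": "A", "no": "E", "some": "I", "some_not": "O"}


def figure(premise1, premise2, middle):
    """Return the figure label (e.g. 'A-B, B-C') for two (subject, predicate) premises."""
    def label(term, end_label):
        # The middle term is always B; end terms are A (premise 1) or C (premise 2).
        return "B" if term == middle else end_label

    first = "-".join(label(term, "A") for term in premise1)
    second = "-".join(label(term, "C") for term in premise2)
    return f"{first}, {second}"


def mood(*quantifiers):
    """Return the mood as a string of letters: premise 1, premise 2, conclusion."""
    return "".join(QUANTIFIER_LETTER[q] for q in quantifiers)


# The politicians/drinkers/judges syllogism quoted earlier.
example_figure = figure(("politicians", "drinkers"), ("drinkers", "judges"), middle="drinkers")
example_mood = mood("some", "no", "some_not")
print(example_figure, example_mood)  # A-B, B-C IEO
```

Run on the example syllogism, the sketch reproduces the classification stated in the text (A–B, B–C figure, IEO mood).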
Three main theories have been proposed to account for belief bias effects: (1) the selective scrutiny model (e.g., Evans, 1989); (2) the misinterpreted necessity model (also see Evans, 1989); and (3) the mental models account (e.g., Oakhill, Johnson-Laird, & Garnham, 1989). The selective scrutiny model suggests that a conclusion’s prior believability determines whether participants engage in tests of logical necessity. If the conclusion is believable, then it is accepted without any attempt at logical evaluation. It is only when the conclusion conflicts with prior beliefs that participants consider whether the conclusion necessarily follows from the premises. Hence, logic has a greater effect on the acceptance of unbelievable conclusions than believable ones.

The misinterpreted necessity model was proposed as an alternative to the selective scrutiny model by Evans et al. (1983; see also Barston, 1986; Evans, 1989; Evans et al., 1993). It is based on Dickstein’s (1980) suggestion that people frequently endorse conclusions that are simply consistent with premises (i.e., indeterminately invalid conclusions), because they fail to understand that for a conclusion to be valid it must be necessitated by the premises. The misinterpreted necessity model of belief bias claims that participants initially decide whether a conclusion is falsified by the premises. Any conclusion that is falsified is rejected. If a conclusion is not falsified, then participants judge whether it is determined by the premises. Conclusions that are determined by the premises are correctly accepted. If a conclusion is not determined by the premises, then a response is made based upon the believability of the conclusion.
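The decision sequences that these two models describe can be written out as simple rules. The sketch below is a deterministic idealisation offered only for illustration: it assumes error-free logical evaluation, whereas the models themselves are accounts of tendencies in acceptance rates, and the function names and the three status labels are assumptions introduced here.

```python
def selective_scrutiny(believable, valid):
    """Idealised selective scrutiny rule: returns True if the conclusion is accepted."""
    if believable:
        # Believable conclusions are accepted without any logical evaluation.
        return True
    # Only unbelievable conclusions are tested for logical necessity.
    return valid


def misinterpreted_necessity(status, believable):
    """Idealised misinterpreted necessity rule.

    status is 'falsified', 'determined', or 'indeterminate', describing how the
    conclusion stands relative to the premises.
    """
    if status == "falsified":
        return False  # falsified conclusions are rejected
    if status == "determined":
        return True  # necessitated conclusions are accepted
    if status == "indeterminate":
        return believable  # otherwise the response follows belief
    raise ValueError(f"unknown status: {status}")


# Selective scrutiny: logic matters only for unbelievable conclusions.
print(selective_scrutiny(believable=True, valid=False))
print(selective_scrutiny(believable=False, valid=False))
# Misinterpreted necessity: belief matters only for indeterminate conclusions.
print(misinterpreted_necessity("indeterminate", believable=False))
```

Written this way, the contrast is easy to see: in the first rule belief gates whether logic is consulted at all, whereas in the second logic is consulted first and belief is a fallback for indeterminate conclusions.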


Both the selective scrutiny and misinterpreted necessity models have been criticized for failing to account for findings in the literature. For example, the selective scrutiny model cannot explain the effects of logic found with problems leading to believable conclusions (Evans et al., 1983), and the misinterpreted necessity model cannot explain the effects of belief found with syllogisms other than those leading to indeterminately invalid conclusions (Newstead, Pollard, Evans, & Allen, 1992). Other failings of these models seem attributable to the non-specification of the mechanisms by which deductive inferences are made (cf. Oakhill & Garnham, 1993).

The mental models account of belief bias overcomes a number of these failings (cf. Evans et al., 1993; Oakhill & Garnham, 1993), and is set within the framework of the mental models theory of syllogistic reasoning (Johnson-Laird & Byrne, 1991). This theory proposes that people begin syllogistic reasoning by constructing a mental model that makes explicit the minimum amount of information concerning the logical relationship between premise terms. Some terms will be exhaustively represented in an initial mental model (i.e., they cannot occur elsewhere in the model) whereas others will not. As such, initial models may support a number of different putative conclusions. A reasoner may test putative conclusions that are true in an initial model against “fleshed-out” mental models that make explicit more information concerning the term(s) that are not exhaustively represented. If a model cannot be constructed that falsifies a putative conclusion, then the conclusion remains valid; otherwise the conclusion is invalid. Syllogisms that require the consideration of more than one mental model to test a conclusion (i.e., multiple-model syllogisms) are claimed to place considerable demands on limited working-memory resources (Johnson-Laird & Bara, 1984), and should be more difficult than one-model problems.
Many empirical studies support this prediction (e.g., Bara, Bucciarelli, & Johnson-Laird, 1995; Johnson-Laird & Bara, 1984).

The mental models account of belief bias assumes that believability determines how far reasoners proceed in constructing alternative models of the premises that might render a putative conclusion invalid. First, reasoners construct an initial model of the premises, and any conclusion that is not true in this model is rejected. Next, the believability of the conclusion is considered, with believable conclusions being accepted without further consideration of logical validity. Only when a conclusion is unbelievable are reasoners assumed to attempt to construct alternative models of the premises that might falsify the conclusion (Oakhill & Johnson-Laird, 1985).

As the mental models account claims that conclusion believability has an influence only when alternative models can be constructed, it predicts no believability effects with one-model problems. Oakhill et al. (1989) tested this idea using a conclusion-generation task and found strong effects of belief with both one-model and multiple-model syllogisms. To explain this, Oakhill et al. suggested the notion of conclusion filtering, arguing that when putative conclusions are unbelievable, reasoners are inclined either to respond that there is no valid conclusion, or to change the conclusion’s wording so that the conclusion produced is believable. Being an ad hoc mechanism, the idea of conclusion filtering has been heavily criticized (Evans et al., 1993; Newstead & Evans, 1993).

There are, however, other important findings that this modified mental models account seems unable to explain. For example, as unbelievable conclusions are assumed to be subjected to greater logical analysis than are believable ones, participants should be more likely to generate erroneous believable conclusions than erroneous unbelievable ones. This prediction was confirmed by Oakhill et al. (1989), though only with invalid syllogisms; no such effect was found with valid ones. The theory does not provide a ready explanation for this (cf. Evans et al., 1993).

The mental models account also predicts that the logic-by-belief interaction will disappear when one-model syllogisms are presented in a conclusion-evaluation task. Conclusions to one-model syllogisms that are false in the model that is constructed should be rejected, whereas conclusions that are true in this model (whether believable or unbelievable) should be accepted. Thus, belief will have no greater effect on invalid one-model syllogisms than on valid ones. Although Newstead et al.’s (1992, Experiment 3) findings support this idea, in order to make this prediction it must be assumed that conclusion filtering does not occur in the conclusion-evaluation paradigm. More clear-cut evidence for the mental models account, however, was reported by Newstead et al. (1992, Experiment 5) in a study using augmented problem instructions stressing the importance of logical analysis. As predicted, it was found that the size of the logic-by-belief interaction decreased, presumably because the augmented instructions increased participants’ motivation to search for counter-examples when dealing with believable conclusions. Note, however, that when Evans, Newstead, Allen, and Pollard (1994) attempted to replicate this finding they found that similar instructional manipulations were less effective.

Although the mental models account of belief bias does seem to explain more of the data than do its predecessors (cf. Evans et al., 1993), in its present form it appears unable to explain the full range of findings.
Nevertheless, the support that is claimed for this account (e.g., Newstead et al.’s, 1992, investigations using one-model syllogisms and augmented instructions) suggests that there may be value in developing an alternative mental models account of belief bias which has greater explanatory breadth.

One way in which such an account might be developed is with reference to the role of working-memory limitations in reasoning behaviour. The notion that limited working-memory capacity impacts upon our ability to construct adequate problem representations that show a putative conclusion to be true or untrue is central to the mental models theory of syllogistic reasoning (e.g., see Johnson-Laird & Byrne, 1991). Curiously, however, this notion appears to be overlooked in the existing mental models account of belief bias.

Our central proposal in developing a revised account of belief bias effects is that a clear distinction exists between the processing demands of valid and invalid multiple-model syllogisms, and that such differing demands on mental resources act as a major determinant of the belief-by-logic interaction evident in syllogistic reasoning. Our argument hinges upon the observation that with valid multiple-model syllogisms correct responses can be given after the construction of a single mental model in a conclusion-evaluation task (cf. Garnham, 1993; Hardman & Payne, 1995). This is because the second term in a valid conclusion to a multiple-model syllogism (all of which take the some . . . are not form) is represented exhaustively in all accurate mental models. Consequently, the relationship between the end terms in the model that shows the conclusion to be true remains unchanged when the model is “fleshed out” by adding further representations of the first term in the valid conclusion.
For example, if the putative conclusion “some politicians are not judges” is true in an initial mental model of the syllogism presented earlier, then because the politicians represented in the model that are not judges cannot become judges when the model is fleshed out, the conclusion cannot be falsified by constructing alternative mental models. Hence, participants can be aware that the consideration of more than one model is unnecessary in order to assess the self-evident validity of the conclusion. In addition, participants may experience a relatively high degree of certainty as to the correctness of their responses with such arguments, given their ease of processing.

With indeterminately invalid multiple-model syllogisms, however, because the second term in the indeterminate conclusion is not represented exhaustively in any accurate mental model, it will typically be necessary to flesh out initial models in order to pursue a full evaluation of the conclusion’s validity. For example, if the putative conclusion “some judges are not politicians” is true in an initial mental model of the syllogism presented earlier, and fleshing out this model involves adding further representations of politicians, then because the judges that are not politicians could become politicians when the model is fleshed out, the conclusion could be falsified by constructing alternative mental models. Owing to working-memory constraints, this process of fleshing out the representation of an invalid multiple-model problem will often be unsuccessful, and participants may subsequently feel uncertain as to whether to accept or to reject a given conclusion. Confronted with such uncertainty as to the efficacy of their reasoning process, participants may be inclined either: (1) to accept the result of their incomplete evaluation and accept an invalid conclusion (this should be especially likely with syllogisms having believable or belief-neutral conclusions); or (2) to resort to a belief heuristic and reject unbelievable conclusions.

From our proposed account of belief bias effects a number of testable predictions can be derived regarding the association between working memory, processing complexity, confidence, and belief bias.
First, as the evaluation of indeterminately invalid conclusions is said to place greater demands upon limited working-memory resources than does the evaluation of valid conclusions, it follows that participants will be more successful and confident in the efficacy of their reasoning processes with evaluation of valid conclusions than with evaluation of invalid conclusions. Second, just as evaluations of valid and indeterminately invalid conclusions are associated with differential processing demands, variations in processing demands should also be associated with syllogisms of differing figure. This is because different numbers of processing operations are associated with different syllogistic figures (see Johnson-Laird & Byrne, 1991). Consequently, figure should affect reasoning success and confidence, and thereby produce systematic variations in the extent of belief-biased responding. Third, given the emphasis of our account on the central role of working memory as the ultimate cause of belief-bias effects, direct measures of facets of working memory should be systematically associated with individual susceptibilities to belief bias. These interrelated predictions form the focus of the series of experiments reported in the present study.

EXPERIMENT 1

Experiment 1 set out to test our prediction that there would be an effect of logic upon confidence ratings such that participants would be more confident in the correctness of their responses to valid problems than to invalid problems. Also, based upon the suggested manner in which invalid problems are processed, outlined earlier, it was predicted that there would be a tendency for participants to judge invalid arguments as being valid ones, producing high acceptance rates for invalid conclusions.

Method

Participants

Thirty-two psychology students from Mackworth College, Derby took part in the experiment and were tested in one group. None of the participants had taken formal instruction in logic.

Materials

Four forms of multiple-model syllogism were used (as defined by Johnson-Laird & Byrne, 1991). All of the syllogisms were presented with “Some A are not C” conclusions. For two of the syllogisms (in the A–B, B–C figure and EIO mood, and in the B–A, B–C figure and OAO mood) this conclusion is indeterminately invalid (i.e., it is consistent with the premises, but not necessitated by them). For the other two syllogisms (in the A–B, B–C figure and IEO mood, and in the B–A, B–C figure and AOO mood) this conclusion is valid. As can be seen, in this experiment and in the subsequent two experiments, logical validity was manipulated by varying the mood between the valid and invalid problems (the invalid problems were the same as the valid problems, except that the premise quantifiers were switched). Although this gives complete control over figure and conclusion form (and thus controls for potential figural biases), validity was inevitably confounded with the mood of the premises. There is no empirical evidence or theory, however, that suggests that this kind of mood manipulation will affect participants’ responding independently from the effect it has upon the validity of the presented conclusions. It should be noted that the overall atmosphere of the premises (a factor closely related to mood, which is known to affect responses; e.g., Dickstein, 1978) is not affected by this kind of manipulation. In addition to the four multiple-model syllogisms there were three one-model filler syllogisms in the AAA, AEE, and IAI moods and in the A–B, B–C figure.
These fillers were used to distract the participants from the form of the syllogisms of interest. All of the syllogisms featured arbitrary thematic content designed to produce conclusions of neutral believability. The A terms were all non-gender-specific occupations (e.g., teachers), and the B and C terms were all hobbies and pastimes (e.g., golfers).

Design

A repeated measures design was used. The syllogisms were presented one to a page in test booklets, together with their potential conclusions. The order of the problem types was varied using a four-by-four balanced Latin square design, with the restriction that the filler items appeared in the same position in each booklet (in 2nd, 4th, and 6th places). The thematic contents of the syllogisms were rotated over the different problem types, producing four different sets of materials, which were distributed evenly and randomly amongst the participants. Three multiple-model practice syllogisms were given in each booklet prior to presentation of the experimental syllogisms in order to familiarize the participants with the task. The participants were unaware that these were practice syllogisms.
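For readers unfamiliar with the term, a balanced Latin square orders n conditions so that each appears once in every serial position and each condition immediately follows every other condition exactly once. The sketch below shows one standard construction for an even number of conditions; the paper does not say which particular square was used, so this is an assumed example, and the function names are illustrative.

```python
def balanced_latin_square(n):
    """Return an n x n balanced Latin square (n even) as a list of rows.

    Conditions are numbered 0..n-1; each row is one presentation order.
    """
    if n % 2 != 0:
        raise ValueError("this construction needs an even number of conditions")
    # First row alternates low and high values: 0, 1, n-1, 2, n-2, ...
    first_row = [0]
    low, high = 1, n - 1
    for position in range(1, n):
        if position % 2 == 1:
            first_row.append(low)
            low += 1
        else:
            first_row.append(high)
            high -= 1
    # Each later row shifts every entry of the first row by one (mod n).
    return [[(item + shift) % n for item in first_row] for shift in range(n)]


def adjacent_pairs(square):
    """List every (earlier, later) pair of neighbouring conditions in each row."""
    return [(row[i], row[i + 1]) for row in square for i in range(len(row) - 1)]


square = balanced_latin_square(4)
for row in square:
    print(row)
```

With four conditions there are 12 possible ordered pairs of distinct conditions, and the four rows produced here contain each of them exactly once as neighbours, which is what makes the square balanced for first-order carry-over effects.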


Procedure

The following instructions were presented on the second page of the test booklets and were also read aloud to each subject:

This is an experiment to test people’s reasoning ability. You will be given ten problems. On each page, you will be shown two statements and you are asked if certain conclusions (given below the statements) may be logically deduced from them. You should answer this question on the assumption that the two statements are, in fact, true. If, and only if, you judge that the conclusion necessarily follows from the statements, you should tick the yes box, otherwise tick the no box. Beneath the yes/no response boxes you will find the following six-point scale.

Not at all confident   1   2   3   4   5   6   Very confident

Please indicate how confident you feel about the correctness of each answer you give by circling a number on the scale. Please take your time and be sure that you have the right answer before moving on to the next problem. You should not make notes or draw diagrams to help you in this task. Thank you very much for participating.

The participants were allowed as much time as they required to complete the booklet of syllogisms.

Results

The mean percentages of frequencies of participants accepting conclusions (i.e., deciding that a conclusion was logically valid) are presented in Table 1 together with the mean confidence ratings given for valid and invalid syllogisms. A Wilcoxon signed-ranks test showed that there was a significant effect of logic, z = 2.40, p < .01, one-tailed, such that more valid conclusions were accepted than invalid conclusions. The high acceptance rate for invalid arguments supports the idea that participants are often unable to flesh out initial mental models in which invalid conclusions are true, and so tend to accept the result of their incomplete evaluations, judging invalid conclusions to be valid ones. In addition, the acceptance rates for both valid and invalid conclusions resemble those typically observed with believable conclusions in studies where believability has been manipulated. There is no evidence for the very low acceptance rates found with invalid unbelievable problems in such studies, supporting the view (cf. Evans & Pollard, 1990) that belief bias principally reflects a tendency to reject invalid unbelievable arguments.

As can be seen from Table 1, participants were generally confident in their responses in these conclusion-evaluation tasks. There was a significant effect of logic upon confidence ratings, t(31) = 3.40, p < .005, one-tailed, such that participants were more confident in their responses to valid problems than to invalid ones.

TABLE 1
Mean percentages of frequencies of participants accepting conclusions and mean confidence ratings^a given in Experiment 1

            % Accepted    Confidence rating
Valid           91              4.44
Invalid         73              3.82
M               82              4.13

^a Minimum possible rating = 1; maximum possible rating = 6.

Discussion

These results support the view that people are less confident in the correctness of their responses to invalid syllogisms than to valid syllogisms. We would also suggest that such feelings of metacognitive uncertainty (i.e., uncertainty in the outcome of one’s reasoning process) may determine the extent to which participants will respond in accordance with belief. Metacognition refers to the knowledge that we possess about our own perceptions, memories, and mental processes (Metcalfe & Shimamura, 1996). Metacognition plays a major role in reasoning and problem solving, because it enables us: (1) to identify, define, and mentally represent problems; (2) to plan how to proceed at different stages in solving a problem; and (3) to evaluate what we know about our performance in solving a problem (Davidson, Deuser, & Sternberg, 1996). It is all three of these abilities with which our metacognitive uncertainty account of belief bias is concerned. When tackling a syllogism, as with any other problem, the reasoner needs to be aware of what has already been done, what is currently being done, and what still needs to be done. Thus, successful solution evaluation requires an individual to have control over the mental representations that have already been constructed, as well as those that still need to be constructed (see Davidson et al., 1996, p. 218).

The metacognitive uncertainty account of belief bias claims that in some instances (e.g., when evaluating an indeterminately invalid conclusion to a multiple-model syllogism) reasoners have an awareness that the current mental representation that they have constructed will not enable them to determine the logical status of a conclusion, and that this mental representation, therefore, requires elaboration (as Johnson-Laird & Byrne, 1991, put it, the mental model needs to be fleshed out).
When working memory constrains this fleshing-out process, reasoners experience metacognitive uncertainty concerning the efficacy of their reasoning. In other words, because reasoners are aware that something more needs to be done but are unable to do it, they cannot be certain of how to respond or proceed. At this point, when faced with the opportunity of responding in accordance with prior knowledge, an internal conflict arises. We argue that in many instances, this internal conflict is resolved in favour of beliefs.

In support of this account, the levels of confidence reported for the syllogisms in Experiment 1 were consistent with the levels of belief bias that have typically been observed (e.g., Evans et al., 1983; Newstead et al., 1992); that is, lower levels of confidence were reported with the invalid syllogisms where higher levels of belief bias are observed, and higher levels of confidence were reported with the valid syllogisms where lower levels of belief bias are observed. The observation that mean confidence ratings with the invalid syllogisms were in fact around the middle of the 6-point confidence scale (M = 3.82) and not at the extreme lower end of the scale suggests that participants have some confidence in their responses to these problems. This presumably reflects the fact that the participants have engaged in some logical analysis and have not simply guessed their responses. For example, participants may have eliminated the possibility that a presented conclusion is determinately invalid (i.e., not at all consistent with the premises) by constructing an initial mental model of the premises. This observation suggests that confidence levels may not need to be at an absolute floor level for participants to fall back on a belief heuristic. Instead, confidence levels simply need to be lower than some threshold level.

EXPERIMENT 2

The figure of a syllogism is known to have a strong effect on the form of the conclusions that participants generate as well as on the speed and accuracy of responses. As was shown by Johnson-Laird and Bara (1984), the highest percentages of correct responses and the lowest response latencies are found with syllogisms in the asymmetrical A–B, B–C and B–A, C–B figures, whereas the highest percentages of erroneous “no valid conclusion” responses are found with syllogisms in the symmetrical A–B, C–B and B–A, B–C figures. Johnson-Laird and Byrne (1991) attribute these response variations to the varying number of operations that need to be carried out on mental models of the premises of syllogisms with different figures in order to bring the middle terms into contiguity. For example, the middle terms of syllogisms in the A–B, B–C figure are already contiguous, so no operations need to be carried out on the premises. With syllogisms in the A–B, C–B figure, however, it is necessary to switch round the mental representation of the second premise to bring the middle terms into contiguity. As these syllogisms place greater demands on working memory than do those in the A–B, B–C figure, fewer correct responses and greater response times are observed.

In Experiment 2 the effect of figure upon the extent to which participants respond in accordance with belief was investigated by presenting syllogisms that varied not only in validity and believability, but also in figure (see the following Syllogisms 1 to 4).

(1) Valid form:
    Some A are B
    No B are C
    Therefore, some A are not C

(2) Invalid form:
    No A are B
    Some B are C
    Therefore, some A are not C

(3) Valid form:
    Some A are B
    No C are B
    Therefore, some A are not C

(4) Invalid form:
    No A are B
    Some C are B
    Therefore, some A are not C

As can be seen, valid and invalid syllogisms were presented in the A–B, B–C and A–B, C–B figures. According to the mental models theory, the same mental models would be constructed for these problems irrespective of figure (Johnson-Laird & Byrne, 1991, pp. 107, 109). In this respect, therefore, the valid and invalid problems in the different figures may be considered logically equivalent. Being set within the framework of the mental models theory, the metacognitive uncertainty account predicts that participants should experience greater feelings of uncertainty and resort to a belief heuristic more readily with syllogisms in the A–B, C–B figure (where premise switching places demands on working memory) than with syllogisms in the A–B, B–C figure (which do not require premise switching). Thus, in addition to the standard interaction between logic and belief, an interaction between figure and belief was predicted in Experiment 2.

Method

Participants

Seventy participants took part in the experiment. The group comprised undergraduate psychology students at the University of Derby. The participants were tested either individually or in small groups.

Materials

Four forms of multiple-model syllogism were used. Two of these were valid, and two were indeterminately invalid. As in Experiment 1, logical validity was manipulated by varying the mood between the valid and invalid problems. The problems in the A–B, B–C figure were the same forms of syllogism as those used previously by Newstead et al. (1992, Experiments 2 and 4), and so similar results to those obtained in this earlier study were expected here. A set of potential conclusions that were false by definition (e.g., “Some sparrows are not birds”) were chosen, together with a set of believable conclusions (e.g., “Some animals are not cats”). The conclusions were devised so as to appear believable when the terms were presented in one order, but unbelievable when the terms were reversed. In order to assess believability, the conclusions were prerated by a group of 60 participants on a seven-point scale ranging from −3 (totally unbelievable) to +3 (totally believable).
Those conclusions that received the most extreme and consistent ratings were used in this study (see Table 2). In summary, there were eight different types of syllogism deriving from three variables: (1) the figure of the syllogisms, (2) the logical validity of the presented conclusions, and (3) the believability of the presented conclusions. In addition to these eight types of experimental syllogism there were three filler syllogisms. These were the same one-model problems as those used in Experiment 1.

Design

A repeated measures design was used. The syllogisms were presented with their potential conclusions in booklets, one to a page. The order of the problem types was varied using an eight-by-eight balanced Latin square design, with the restriction that the filler items appeared in the same position in each booklet: in 2nd, 5th, 8th, and 11th places. To counterbalance the effects of thematic materials, the contents of the syllogisms were rotated over the different problem types, producing eight different sets of materials. The eight sets of materials were distributed randomly amongst participants.


TABLE 2
Believability ratings of conclusions used in Experiment 2ᵃ

     Conclusion                          M        SD
B    Some men are not kings              2.98     0.13
U    Some kings are not men             −2.23     1.83
B    Some animals are not cats           2.95     0.22
U    Some cats are not animals          −2.50     1.52
B    Some birds are not sparrows         2.88     0.69
U    Some sparrows are not birds        −2.80     1.09
B    Some flowers are not tulips         2.87     0.35
U    Some tulips are not flowers        −2.23     1.74
B    Some vehicles are not cars          2.78     0.92
U    Some cars are not vehicles         −2.15     1.90
B    Some metal is not steel             2.77     0.57
U    Some steel is not metal            −2.47     1.16
B    Some women are not actresses        2.57     1.44
U    Some actresses are not women       −1.78     2.20
B    Some tall people are not giants     2.25     1.46
U    Some giants are not tall people    −1.98     1.92

ᵃ Minimum possible rating = −3 “totally unbelievable”; maximum possible rating = +3 “totally believable”.

Procedure

Instructions were presented on the second page of the test booklets. These were the same instructions as those given in Experiment 1 (minus the confidence scale details). Participants were allowed as much time as they required.

Results

The mean percentages of frequencies of participants accepting conclusions are presented in Table 3 for each type of syllogism. A Wilcoxon signed-ranks test revealed that there was a significant effect of logic, z = 5.43, p < .001, one-tailed, with participants accepting more valid conclusions than invalid conclusions. There was also a significant effect of belief, z = 5.224, p < .001, one-tailed Wilcoxon signed-ranks test, such that more believable conclusions were accepted than unbelievable conclusions. In order to test for the interaction between logic and belief, scores for the unbelievable problems were subtracted from scores for the believable problems across participants to give an index of the size of the effect of belief for the valid and invalid syllogisms. A Wilcoxon signed-ranks test was used to see if the size of the effect of belief differed significantly between


TABLE 3
Mean percentages of frequencies of participants accepting conclusions in Experiment 2 as a function of figure, logic, and belief

                 A–B, B–C figure         A–B, C–B figure
                 Bel    Unb    M         Bel    Unb    M
Val              86     61     74        79     50     64
Inv              57     23     40        59     21     40
M (val/inv)      71     42               69     36
M (bel/unb)                    57                      52

Note: Bel = believable; unb = unbelievable; val = valid; inv = invalid.

valid and invalid syllogisms. Despite a tendency for the effect of belief to be greater with invalid syllogisms than with valid syllogisms, the interaction between logic and belief fell short of significance, z = 1.233, p = 0.109, one-tailed. Overall, the effect of figure upon conclusion acceptances was not significant. In order to test for the interaction between figure and belief, scores for the unbelievable problems were subtracted from scores for the believable problems across participants to give an index of the size of the effect of belief for the problems in the A–B, B–C and A–B, C–B figures. A Wilcoxon signed-ranks test was used to see if the size of the effect of belief differed significantly between syllogisms in the two different figures. The interaction between figure and belief was not significant. No other interactions were significant.

Discussion

These results provide a good illustration of the effects of belief and logic observed in studies of belief bias, as well as some evidence for a logic-by-belief interaction. Indeed, the findings are much the same as those reported by Newstead et al. (1992, Experiments 2 and 4), in which the same forms of syllogism were used. There was, however, no evidence of an interaction between figure and belief as predicted by the metacognitive uncertainty account of belief bias. It is possible that varying the forms of the syllogisms between the A–B, B–C and A–B, C–B figures in Experiment 2 was not a powerful enough manipulation of problem difficulty to cause working-memory loads to differ greatly between the two figures. If working-memory loads were much the same for problems in both figures, then the tendency to fall back on a belief heuristic would also be much the same with both figures, and so no interaction between figure and belief would be observed.

The ineffectiveness of the figural manipulation in Experiment 2 may be due to the employment of a conclusion-evaluation methodology. Evidence for the effect of figure upon problem difficulty comes from studies of syllogistic reasoning that have employed conclusion production (e.g., Johnson-Laird & Bara, 1984) or multiple-choice tasks (e.g., Dickstein, 1978), where no single conclusion is presented for evaluation. Hardman and Payne (1995) have reported a dissociation between the conclusion production, multiple-choice, and evaluation tasks, such that the evaluation methodology yielded the highest number of logically valid responses, followed by the multiple-choice and the production methodologies, respectively. They explain the superior performance observed using the conclusion evaluation methodology by suggesting that where a conclusion is presented for evaluation, participants have a goal for the construction and manipulation of mental models. We would suggest that the presence of a goal towards which the reasoning process is geared may minimize any effect of figure upon the nature of responses in this paradigm.

EXPERIMENT 3

Experiment 2 aimed to demonstrate indirectly a role for working-memory limitations in determining belief bias using a manipulation of processing complexity. The failure to find predicted effects (possibly arising from the ineffectiveness of our manipulation) suggested a potentially more fruitful approach to examine this issue would be to calibrate directly aspects of working memory and assess their association with belief-biased responding. The metacognitive uncertainty account assumes that participants fall back upon a belief heuristic when the processing demands of syllogisms exceed working-memory capabilities. From this assumption it follows that individual differences in working-memory resources will cause variations in the extent to which participants will respond in accordance with prior beliefs. Individuals with low working-memory capabilities, when compared with individuals with high working-memory capabilities, should more readily experience a level of uncertainty that exceeds the threshold level beyond which the belief heuristic is triggered. As a consequence, they should be more inclined to fall back on a belief heuristic when faced with the high processing demands of invalid problems. As valid syllogisms are claimed to place relatively manageable loads on working memory, most participants, irrespective of their individual working-memory capabilities, should experience levels of uncertainty that do not exceed this threshold level. The capacity of working memory, therefore, should have little effect on the tendency to fall back on a belief heuristic with valid problems. Consequently, the size of the logic-by-belief interaction should be greatest among participants with low working memory (i.e., there should be a three-way interaction between logic, belief, and working-memory capacity).

The earlier mental models account would seem to predict quite a different pattern of responding. This account claims that following construction of an initial mental model of the premises, believable conclusions are simply accepted, whereas unbelievable conclusions are tested against alternative, potentially falsifying models. As a result of this selective scrutiny, greater processing demands would be placed on working memory when a problem has an unbelievable conclusion than when it has a believable conclusion. Participants’ working-memory capacities, therefore, should have little effect on responses to problems with believable conclusions, as these problems place a manageable load on working memory. However, problems with unbelievable conclusions place a greater and less manageable load on working memory. The effect of logic observed with unbelievable problems, therefore, would be greatest among participants with high working-memory capabilities. Consequently, the size of the logic-by-belief interaction should also be more pronounced among such participants. This is because they should be able correctly to reject invalid conclusions, whereas participants with low working-memory capabilities should be inclined erroneously to accept such conclusions (this response is consistent with an incomplete conclusion evaluation). Thus, although a three-way interaction between logic, belief, and working-memory capacity is again predicted, this interaction is different in nature to that outlined earlier in relation to the metacognitive uncertainty account.

So far, we have been discussing working-memory capacity in a generalized manner, and indeed the concept of working memory used in the mental models theory (e.g., Johnson-Laird & Byrne, 1991, p. 39) is itself a generalized one, with no stated allegiance to any particular contemporary account. In discussing the representational substrate of mental models, however, Johnson-Laird does allude to the possibility that they may be constructed within a spatial medium when he notes that “the ability to construct alternative models . . . should correlate with spatial ability” (Johnson-Laird, 1985, p. 190). Although there is as yet little theoretical consensus over the best way of conceptualizing working memory (cf. Richardson et al., 1996), researchers interested in the role of working memory in deduction have tended to adopt Baddeley’s (e.g., 1990) tripartite model. This model assumes that working memory has three main components: (1) the phonological loop (which comprises a phonological store and an articulatory rehearsal loop), where a limited amount of speech-based or phonological information is held; (2) the visuospatial sketch pad, where a limited amount of spatially or visually represented information is maintained and manipulated; and (3) the central executive, which allocates resources and controls processing within the two slave systems.

Recent studies that have investigated the role of working memory sub-systems in deduction (e.g., Gilhooly, Logie, Wetherick, & Wynn, 1993; Klauer, Stegmaier, & Meiser, 1997; Toms, Morris, & Ward, 1993; Vandierendonck & De Vooght, 1997) have, however, produced inconsistent results. For example, Gilhooly et al.’s (1993) investigation of syllogistic reasoning produced evidence for central executive involvement, limited phonological loop involvement, and no visuospatial involvement. Klauer et al. (1997), on the other hand, although they presented similar evidence for the involvement of the central executive and phonological loop in propositional and spatial reasoning, reported additional evidence for spatial involvement. As Baddeley’s theory would suggest that the central executive, in its supervisory capacity, would have a role to play in all processing activities involving the allocation of resources and decision making, we do not find it surprising that loading this component of working memory affected reasoning performance in the two studies cited. However, the conflicting findings of these studies concerning the role of the visuospatial sketch pad in deduction suggest that the involvement of this sub-component needs further investigation. With this in mind, we set out to test the predictions deriving from the two theories of belief bias described previously by presenting participants with a syllogistic conclusion-evaluation task in which validity and believability were manipulated. Participants were also presented with separate tests designed to calibrate spatial and articulatory working-memory recall spans.


Method

Participants

Thirty-two undergraduate psychology students at the University of Derby took part in the experiment. None of the participants had taken formal instruction in logic, and all were familiar with personal computers (with standard English keyboard and mouse). Participants were tested in groups of four.

Materials

Opus Pentium75 multi-media personal computers (each with keyboard, mouse, and 14-inch SVGA colour monitor) were used to present the memory tests and to record responses. Authorware Professional for Windows 2.0 was used to create and present all screen displays and to record all responses.

The spatial span recall test was based upon the Corsi blocks test used by De Renzi and Nichelli (1975; see also Smyth & Scholey, 1996). A non-symmetrical array of nine 2 × 2 cm squares was used to present spatial sequences. The squares were presented in white, outlined in black upon a white background. The maximum distance between the centres of any two squares was 18 cm, and the minimum distance was 5 cm. The trial items were chosen by the computer at random. Prior to the test participants were given two practice trials in which they had to recall sequences of three squares. The start of the test was initiated by the subject pressing a key. The test began with the presentation of four trials on four squares and proceeded, increasing by one square every four trials, until four trials on ten squares had been presented. One by one the squares turned black (for 500 ms) and then back to white (at intervals of 750 ms) when selected by the computer in the span sequence. At the end of each span sequence a tone sounded, and the white squares turned into grey screen buttons. During the recall phase, whenever the subject selected any of the screen buttons with the mouse cursor the sound of a small bell was activated.

The articulatory span recall test was based upon the memory span task described by Halford, Bain, and Mayberry (1984; see also Smyth & Scholey, 1996). The test consisted of four blocks of eight trials. The first trial in the first and third blocks consisted of strings of 3 letters, the second strings of 4 letters, and so on up to 10 letters. The first trial in the second and fourth blocks consisted of strings of 10 letters, the second strings of 9 letters, and so on down to 3 letters. The letters were consonants selected so that no two alphabetically adjacent letters were adjacent in the trials. The start of the test was initiated by the subject pressing a key. One second later an asterisk appeared in the centre of the screen for one second followed by a letter string. The length of time for which each letter string was presented varied according to the length of the string, such that strings of three letters were presented for 3 s, strings of four letters for 4 s, and so on. When the letter string presentation time had elapsed a bell sounded and a prompt appeared so that participants could key in their recall responses.

The reasoning task featured two forms of multiple-model syllogism (e.g., as classified by Johnson-Laird & Byrne, 1991). Both syllogisms were in the B–A, C–B figure and were presented with conclusions of the form “some C are not A”. For one syllogism in the EIO mood this conclusion is valid, but for the other syllogism in the IEO mood this conclusion is indeterminately invalid. As in Experiment 2, logical validity was manipulated by varying the mood between valid and invalid syllogisms. Although this gives control over figure and conclusion form (and thus controls for potential figural biases), validity was confounded with mood. In Experiment 3, however, the valid problems had the same mood as the invalid problems in Experiment 2, and vice versa. Using such materials makes it possible to test whether the effect of logic upon acceptances observed in Experiment 2 is attributable to the mood manipulation and not the validity manipulation.
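The trial structure of the articulatory span test described in the Materials can be sketched as follows. The code lays out four blocks of eight trials (string lengths ascending from 3 to 10 in blocks 1 and 3, descending in blocks 2 and 4), sets the display time in seconds equal to the string length, and draws consonant strings in which no two neighbouring letters are next to each other in the alphabet. Several details are assumptions for illustration only: the consonant set (including Y), drawing letters without repetition, reading "alphabetically adjacent" as adjacent in the full alphabet, and all function names. The original test was built in Authorware, not Python.

```python
import random

# Assumption: Y counts as a consonant; the paper does not list the letters used.
CONSONANTS = list("BCDFGHJKLMNPQRSTVWXYZ")


def block_lengths():
    """String lengths per block: 3..10, then 10..3, then both again."""
    ascending = list(range(3, 11))
    descending = ascending[::-1]
    return [ascending, descending, ascending, descending]


def has_alphabet_neighbours(letters):
    """True if two letters side by side in the string are also
    side by side in the alphabet (e.g. B followed by C)."""
    return any(abs(ord(a) - ord(b)) == 1 for a, b in zip(letters, letters[1:]))


def make_letter_string(length, rng):
    """Redraw consonants (no repeats) until the adjacency rule is satisfied."""
    while True:
        candidate = rng.sample(CONSONANTS, length)
        if not has_alphabet_neighbours(candidate):
            return "".join(candidate)


def build_schedule(seed=0):
    """Return the 32 trials as dictionaries, in presentation order."""
    rng = random.Random(seed)
    schedule = []
    for block_number, lengths in enumerate(block_lengths(), start=1):
        for trial_number, length in enumerate(lengths, start=1):
            schedule.append({
                "block": block_number,
                "trial": trial_number,
                "letters": make_letter_string(length, rng),
                "display_seconds": length,  # 3 letters -> 3 s, 4 letters -> 4 s, ...
            })
    return schedule


if __name__ == "__main__":
    for trial in build_schedule():
        print(trial)
```

Fixing the seed makes a schedule reproducible across participants; whether the original test used the same strings for everyone is not stated.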


Half of the valid and invalid syllogisms were presented with conclusions that were believable, and half were presented with conclusions that were unbelievable by definition. The believable and unbelievable conclusions were the same as those used in Experiment 2. In addition to the four experimental syllogisms there were three filler syllogisms. These were the same one-model problems as those used in Experiments 1 and 2.

Design and procedure

A repeated measures design was used with all of the participants receiving both the pen and paper reasoning task and two computer-based memory tests. The order of presentation of the three phases of the experiment was varied using a three-by-three Latin square design. In the reasoning task all of the participants received the two multiple-model syllogisms together with the three one-model filler syllogisms. These were preceded by 3 practice syllogisms (10 syllogisms in total). The syllogisms together with their potential conclusions were presented in booklets one to a page. The order of the problem types was varied using a four-by-four balanced Latin square design, with the restriction that the filler items appeared in the same position in each booklet: in 2nd, 4th, and 6th places. The thematic contents of the syllogisms were rotated over the different problem types, producing four different sets of materials. The four sets of materials were distributed evenly and randomly amongst the participants. At the beginning of each booklet three multiple-model practice syllogisms were given in order to familiarize the participants with the task. The participants were unaware that these were practice syllogisms.

The participants were seated in front of the computer screen throughout the experiment. The instructions for the two memory tests were presented on the computer screen. The participants were given as much time as they needed to carry out the memory tests. The instructions for the reasoning task were presented on the second page of the test books, and were the same as those given in Experiments 1 and 2. Again, the participants were allowed as much time as they required to complete the books of syllogisms.

Results

Span scores in the spatial recall test were calculated using the same method as that employed by Smyth and Scholey (1996): For each subject the number of trials correct out of four was calculated for each set size and totalled. This number was then divided by four and added to the number that was one below the smallest set size in the test trials. This gave a score for each subject, which was based on the number of trials correct throughout the testing sequence. Spans in the articulatory recall test were calculated by adding together the highest number of letters recalled in each of the four sets of trials and dividing this figure by four (cf. Halford, Bain, & Mayberry, 1984). The results for the two memory tests are presented in Table 4. Participants were divided into high and low recall span groups according to a median split. The overall mean percentages of frequencies of participants accepting conclusions (i.e., judging a conclusion to be valid) are presented in Table 5 (as a function of logic, belief, and spatial recall group) and in Table 6 (as a function of logic, belief, and articulatory recall group). There was a main effect of logic, z = 2.80, p < .01, one-tailed Wilcoxon signed-ranks test, with participants accepting significantly more valid conclusions than invalid conclusions. There was a main effect of belief, z = 3.73, p = .001, one-tailed Wilcoxon signed-ranks test, with participants accepting significantly more

TABLE 4
Means, medians, ranges, standard deviations, skewness, and kurtosis for span scores in the two memory tests in Experiment 3

            Spatial recall    Articulatory recall
M           5.10              6.23
Median      5.25              6.00
Range       3.00–6.75         4.50–8.75
SD          0.93              1.09
Skewness    −0.488            0.685
Kurtosis    −0.209            0.037
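The span scoring rules given in the Results can be written out as a short worked sketch. The spatial score adds the total number of correct trials, divided by four, to the value one below the smallest set size (3 when the test starts at four squares); the articulatory score averages the longest string recalled in each of the four blocks; and the median split places spans above the median in the "high" group and all others in the "low" group, following the table notes. The function names and the participant data below are hypothetical.

```python
from statistics import median


def spatial_span(correct_per_set_size, trials_per_set_size=4):
    """Score a spatial recall test.

    correct_per_set_size maps each set size (4..10 in Experiment 3)
    to the number of trials recalled correctly at that size.
    """
    smallest_set_size = min(correct_per_set_size)
    total_correct = sum(correct_per_set_size.values())
    return (smallest_set_size - 1) + total_correct / trials_per_set_size


def articulatory_span(longest_recalled_per_block):
    """Mean of the longest letter string recalled in each block."""
    return sum(longest_recalled_per_block) / len(longest_recalled_per_block)


def median_split(spans):
    """Label each participant 'high' (span above the median) or 'low'."""
    cut = median(spans.values())
    return {pid: ("high" if span > cut else "low") for pid, span in spans.items()}


if __name__ == "__main__":
    # Hypothetical participant: all four trials correct at size 4,
    # three at size 5, one at size 6, none beyond that.
    example = {4: 4, 5: 3, 6: 1, 7: 0, 8: 0, 9: 0, 10: 0}
    print(spatial_span(example))             # 3 + 8/4
    print(articulatory_span([6, 7, 6, 5]))   # 24/4
    print(median_split({"p1": 4.5, "p2": 5.25, "p3": 6.0, "p4": 5.25}))
```

Because participants at the median fall in the "low" group under this rule, the two groups need not be equal in size, which is consistent with the unequal group sizes reported for Experiment 3.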

TABLE 5
Overall mean percentages of frequencies of conclusion acceptances in Experiment 3 as a function of logic, belief, and spatial working memory spanᵃ

                          Spatial recall span
            High (n = 13)            Low (n = 19)
            Bel    Unb    M          Bel    Unb    M
Valid       92     46     69         79     68     74
Invalid     54     15     34         89     16     53
M           73     31     52         84     42     63

Note: Bel = believable; unb = unbelievable.
ᵃ Span > 5.25 = “high”; span ≤ 5.25 = “low”.
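As a descriptive illustration only, the belief index used in the analyses (acceptances of unbelievable conclusions subtracted from acceptances of believable ones) can be computed from the group percentages in Table 5. The sketch below also takes the difference between the index for invalid and valid problems as a rough gauge of the logic-by-belief interaction within each spatial recall group. It works on group means, not the participant-level scores entered into the Wilcoxon tests, so it says nothing about statistical significance; the variable and function names are hypothetical.

```python
# Percentages of conclusion acceptances, taken from Table 5.
ACCEPTANCE = {
    "high": {"valid": {"bel": 92, "unb": 46}, "invalid": {"bel": 54, "unb": 15}},
    "low": {"valid": {"bel": 79, "unb": 68}, "invalid": {"bel": 89, "unb": 16}},
}


def belief_effect(cells):
    """Believable minus unbelievable acceptance rate (percentage points)."""
    return cells["bel"] - cells["unb"]


def logic_by_belief_index(group):
    """Belief effect on invalid problems minus belief effect on valid problems."""
    return belief_effect(group["invalid"]) - belief_effect(group["valid"])


if __name__ == "__main__":
    for name, group in ACCEPTANCE.items():
        print(
            name,
            "valid:", belief_effect(group["valid"]),
            "invalid:", belief_effect(group["invalid"]),
            "interaction index:", logic_by_belief_index(group),
        )
```

On these group means the index is 62 percentage points for the low spatial span group and -7 for the high span group.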

TABLE 6
Overall mean percentages of frequencies of conclusion acceptances in Experiment 3 as a function of logic, belief, and articulatory working memory spanᵃ

                        Articulatory recall span
            High (n = 17)            Low (n = 15)
            Bel    Unb    M          Bel    Unb    M
Valid       76     65     71         93     53     73
Invalid     65     12     38         87     20     53
M           71     38     54         90     37     63

ᵃ Span > 6.00 = “high”; span