Deductive rationality in validating and testing conditional inferences.

Walter Schroyens ([email protected]), University of Gent, Department of Psychology, Henry Dunantlaan 2, B-9000 Gent, Belgium

and

Walter Schaeken ([email protected]), University of Leuven, Department of Psychology, Tiensestraat 102, B-3000 Leuven, Belgium

Acknowledgments We gratefully acknowledge the support of the Flanders (Belgium) Fund for Scientific Research (G.0320.05) and the Canadian Natural Sciences and Engineering Research Council (NSERC 297517). Experiment 1 was presented at a Cognitive Science Society meeting (Schroyens, 2005a). We thank those reviewers who engaged seriously with the theoretical narrative and provided constructive criticisms to improve upon it.

Abstract

We asked people to validate conditional inferences (e.g., "A, therefore C", given 'if A then C'). First, given a forced choice, people are more likely to look for falsifications ('A and not-C') than for confirmations ('A and C'). Second, falsification rates are lower for logically valid than for logically invalid inferences. Logically valid inferences are inferences that follow necessarily. Experiment 1 (N = 96) shows that emphasising this logicality constraint increases falsification rates in the validation task, and corroborates that validation-by-falsification increases logically correct inference evaluations. Experiment 2 (N = 41) corroborates, conversely, that people who are more likely to make logically correct evaluations show higher falsification performance in the validation task. The results support mental-models theory and suggest that alternative theories similarly need to specify how people would go about looking for counterexamples. We proffer such a specification for two alternatives to the model theory.


Introduction

Our beliefs are hypotheses about the world we live in, the creatures in it and our interactions with them and each other. Expectations are grounded by deriving inferences from our beliefs. Forming and testing hypotheses is thus of capital interest in marrying our beliefs with the changing environment we inhabit. Many such beliefs reflect conditional relationships. Reasoning about conditionals has accordingly attracted the interest of cognitive scientists of all backgrounds. There exists abundant evidence that people take counterexamples into account (e.g., Cummins, Lubart, Alksnis, & Rist, 1991; De Neys, Schaeken, & d'Ydewalle, 2003). These studies, however, have been conducted with knowledge-rich materials. These materials, mostly causal conditionals, are rich in associated background knowledge and have been pre-tested and classified as yielding few or many counterexamples. The results typically show that when there are more counterexamples, people are generally less certain about the conclusion countered by the counterexamples. Consider, for instance, the following two formally equivalent arguments:

- If a woman has sex, then she will become pregnant. Hence, given that this woman is pregnant, it follows that she has had sex.

- If a dog has fleas, then it will scratch itself. Hence, given that this dog scratches itself, it follows that it has fleas.

Most readers will consider the latter conclusion less plausible, as there are many causes for scratching, while there are only a few ways of getting pregnant. It is generally assumed that there is an automatic activation of background knowledge. Theories that address commonsense reasoning about knowledge-rich conditionals proffer a largely passive, data-driven "search" for counterexamples (see, e.g., Oaksford & Chater, 1998).


This stands in contrast with theories of deductive reasoning that posit an active goal-directed search. In these theories counterexamples are often not activated by semantic association; they are actively looked for to test inferences and constructed on the basis of the negation of the inference one aims to test (Schroyens & Schaeken, 2003; Schroyens et al., 2001). Theories that posit a goal-directed search for counterexamples have mostly focused on explaining performance on knowledge-lean deductive inference problems. For instance: (1)

If the letter is an A, then the number is a 2. The number is a 2; hence, the letter is an A.

Performance on such problems is consistent with the idea that people engage in an active counterexample search to test putative inferences drawn from an initial problem treatment (see, e.g., Johnson-Laird & Byrne, 1991; Schroyens et al., 2001). Being able to explain performance on problems such as (1), however, provides only indirect and inconclusive evidence for an active search for counterexamples. We thus set out to provide more direct evidence. But why would people engage in a critical thinking exercise in the first place? A counterexample search can establish logical validity. For an argument to be logically valid, its conclusion must necessarily be true, given that the premises are true. When there is an acceptable alternative to the conclusion (which means the conclusion is possibly false), we know the conclusion is not necessarily true. When there is no such alternative, the conclusion necessarily follows. Consider the following logically valid Modus Ponens (MP) argument. (2)

If A, then C. A; therefore C.

Few people will have difficulty realizing that the conclusion is falsified by 'A and not-C'. That is, the counterexample to MP is formed by the 'A and not-C' contingency. At the same time it is a well-established phenomenon that this contingency is considered impossible (see, e.g., Barrouillet & Lecas, 1998; Marcus & Rips, 1979). That is, the counterexample to MP is not an acceptable alternative. This means that (though the conclusion might be false) the conclusion cannot be false given that the premises are true. The conclusion follows necessarily: one cannot deny the conclusion without contradicting at least one of the premises. For instance, you cannot deny that water will boil when heated to 100°C, unless you deny that it is strictly true that water heated to 100°C boils. The rule is "probabilified" (Morris & Sloutsky, 1998) by considering that it only holds in the context of normal atmospheric pressure. The ability to establish logical validity is an important property (Stanovich, 2004); one of the most robust findings in the deductive-reasoning literature is that people are more likely to endorse valid versus invalid arguments.

Searching for alternatives seems the obvious way to test one's inferences. What else could one do? Instead of looking for falsification, one could look for confirmations of one's beliefs. The choice of testing one's (potentially inferred) beliefs by verification vs. falsification likely depends on the goal set in testing inferences. Inferences are tested because of uncertainty; the question, however, is: uncertain in what way, about what? One can be uncertain about whether the conclusion is possibly correct, or one can question whether it is necessarily correct. Often people look for a plausible conclusion as opposed to one that is logically valid. A plausible conclusion or belief can provide sufficient grounds to take action in daily life; it would be adaptively irrational (Anderson, 1990) if one were to act only upon necessary beliefs. When we are not yet certain that something is possibly true, it seems precipitous to ask whether this state of affairs is necessarily true. Uncertainty concerning possibilities seems to accompany situations in which we are reasoning about unfamiliar contents and cannot rely on semantic memory to establish whether a hypothetical possibility constitutes a real, factual possibility. Instead of searching for alternatives, one might opt to first check whether this initial possibility is a factual possibility. Consider the following argument: (3)

If the figure is a triangle, then it is blue. The figure is not a triangle; therefore, the figure is not blue.

One can imagine applying the rule to a toy box filled with figures, the contents of which you have not been allowed to inspect. Before checking whether there might be non-triangular figures that are blue (as a counterexample to the conclusion that the figure is not blue), one might opt to check/verify whether there are in fact non-triangular figures that are not blue. In summary, to test the possible truth of inferences one can attempt to confirm them, establishing hypothetical possibility as factual possibility. To test inferences one might also aim to establish hypothetical possibilities as impossibly false. When the putative inference cannot be false, it must necessarily be true. When the putative inference can be false, it is not necessarily true. In this paper we explore the prevalence of confirmation/falsification strategies. By using abstract materials and a novel inference-validation task we query whether people can, and sometimes do, engage in an active goal-directed search for counterexamples.

We stated that "searching for alternatives seems to be the obvious way to test one's inferences". This is a direct implication of the so-called mental-models theory (Johnson-Laird & Byrne, 2002; Schroyens & Schaeken, 2003). Mental-models theory postulates that individuals understand sentences by representing possibilities. In accordance with the Gricean principles of conversation (Grice, 1975; Levinson, 2000), people assume information communicated to them is true: they initially do not consider incompatible/false possibilities. This is the so-called truth principle. Moreover, given bounded rationality, people cannot and hence will not consider all true possibilities from the outset. This is the implicit-model principle. The theory makes a distinction between generating and testing inferences. People first generate a putative inference on the basis of the initial possibilities. At least some people, at least sometimes, attempt to validate this initial conclusion by looking for counterexamples. It follows that the model theory must predict that falsification will be more prevalent than verification. Falsification (i.e., a search for counterexamples) is at the heart of mental-models theory and its account of critical thinking and reasoning. Alternative theories do not make the distinction between generating and testing inferences and do not posit a validation stage. So-called rule-based theories (e.g., Rips, 1994) do not suppose people test their inferences. That is, ceteris paribus, these theories at best predict that when people are instructed to test their inferences, there is no difference in the prevalence of verifying or falsifying tests. The hypothetical thinking model (Evans & Over, 2004; Evans et al., 2003) and the conditional probability model (Oaksford, Chater & Larkin, 2002) do not posit a test phase either.
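To make the generate/test distinction concrete, the sketch below shows a toy model-based reasoner in Python (our illustration; the paper contains no code, and the representation of possibilities as (antecedent, consequent) truth-value pairs, like all function names, is our own assumption rather than any published implementation):

```python
# A toy generate-and-test reasoner in the spirit of mental-models theory.
# Representations and names are illustrative assumptions, not the
# published model. A contingency is a pair (A, C) of truth values.
from itertools import product

def fleshed_out_models():
    """All contingencies compatible with a true 'if A then C': TT, FT
    and FF. TF is excluded by the truth principle."""
    return {(a, c) for a, c in product([True, False], repeat=2)
            if not (a and not c)}

def initial_models():
    """Implicit-model principle: only the explicit TT possibility is
    represented at the outset."""
    return {(True, True)}

def putative_conclusion(models, premise_value):
    """Generate: read a conclusion about C off the models that match
    the categorical premise about A."""
    matching = [c for a, c in models if a == premise_value]
    return matching[0] if matching else None

def validate(conclusion, premise_value, models):
    """Test: search the models for a counterexample, i.e. a contingency
    in which the premise holds but the conclusion is false."""
    counterexamples = [(a, c) for a, c in models
                       if a == premise_value and c != conclusion]
    return len(counterexamples) == 0, counterexamples

# MP ('A, therefore C'): generated from the initial model; no
# counterexample survives among the true possibilities, so it is valid.
concl = putative_conclusion(initial_models(), premise_value=True)
print(validate(concl, True, fleshed_out_models()))   # (True, [])

# DA ('not-A, therefore not-C'): FT is an acceptable counterexample.
print(validate(False, False, fleshed_out_models()))  # (False, [(False, True)])
```

The sketch only handles premises about the antecedent (MP/DA); AC and MT follow by swapping the roles of A and C.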

Experiment 1

To study procedures for testing beliefs and inferences, we confronted reasoners with a standard inference-evaluation task as well as a novel reasoning task (the 'inference-validation task'). The four basic inference problems are formed by an affirmation or denial of the antecedent (A) or consequent (C):

MP: if A then C, A; therefore C
DA: if A then C, not-A; therefore not-C
AC: if A then C, C; therefore A
MT: if A then C, not-C; therefore not-A

The arguments are classically referred to as Modus Ponens (MP), Affirmation of the Consequent (AC), Denial of the Antecedent (DA) and Modus Tollens (MT). In the inference-evaluation task people are asked to evaluate the conclusion (does it follow from the premises?). The inference-validation task asks people how they would test the inferences. They can do this by selecting which contingency they would need to check. There are only four possible contingencies, wherein the antecedent and consequent each either holds (is true, 'T') or does not hold (is false, 'F'). In relation to conditionals these are:

TT: A and C
TF: A and not-C
FT: not-A and C
FF: not-A and not-C

The status of a contingency as one that falsifies or verifies an inference depends on the argument. The verifying contingency confirms both the categorical premise and the conclusion: one checks whether the conclusion is possibly true. The falsifying contingency also confirms the categorical premise but falsifies the conclusion: one checks whether the conclusion is possibly false. It is then a straightforward matter to determine the confirming or falsifying contingency for the four conditional arguments (MP, MT, AC, and DA). Consider, e.g., 'If it rains, then the streets get wet.' TT (wet streets when it rains) confirms both MP ("A, therefore C") and AC ("C, therefore A"). The confirmatory selection for DA ("not-A, therefore not-C") and MT ("not-C, therefore not-A") is captured by FF (e.g., dry streets on a sunny day). What about the falsifying contingency? FT (e.g., a wet street on a sunny day because you watered your lawn) is the falsifying contingency for the invalid arguments (AC/DA). TF cases (e.g., a dry street in a tunnel on a rainy day) falsify the logically valid inferences (MP/MT).
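This mapping can be stated compactly. The sketch below (again our own illustration, using the TT/TF/FT/FF labels introduced above) derives the verifying and falsifying contingency for each argument from the rule just described: both confirm the categorical premise, and the falsifier additionally negates the conclusion.

```python
# Derive the verifying/falsifying contingency for the four arguments.
# An argument is (premise_slot, premise_value, conclusion_slot,
# conclusion_value); contingencies are labelled TT/TF/FT/FF as in the text.
ARGUMENTS = {
    "MP": ("A", True,  "C", True),   # A, therefore C
    "DA": ("A", False, "C", False),  # not-A, therefore not-C
    "AC": ("C", True,  "A", True),   # C, therefore A
    "MT": ("C", False, "A", False),  # not-C, therefore not-A
}

def label(values):
    # Map a {"A": bool, "C": bool} assignment to its TT/TF/FT/FF label.
    return ("T" if values["A"] else "F") + ("T" if values["C"] else "F")

for name, (p_slot, p_val, c_slot, c_val) in ARGUMENTS.items():
    verifier = label({p_slot: p_val, c_slot: c_val})       # premise and conclusion true
    falsifier = label({p_slot: p_val, c_slot: not c_val})  # premise true, conclusion false
    print(f"{name}: verify={verifier}, falsify={falsifier}")

# Output matches the text:
# MP: verify=TT, falsify=TF    DA: verify=FF, falsify=FT
# AC: verify=TT, falsify=FT    MT: verify=FF, falsify=TF
```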


It is important to note that the validation task 'forces', i.e., instructs, people to test their inferences. It does not investigate whether people spontaneously test their inferences, but probes for the hypothetical test case that would need to be considered to test the arguments. Constructing a hypothetical test case is distinct from the task of evaluating such cases as possible or impossible. Evaluating different truth contingencies in relation to conditionals (vs. arguments) is investigated with truth-table tasks (e.g., Barrouillet & Lecas, 1998). The validation task does not require subjects to judge whether the case that would falsify the conclusion is a real possibility. Participants are accordingly not given a 'none of the above' option indicating that no contingency would need to be selected. Such an option would defeat the purpose, which is to see how people test, not whether they test spontaneously. Only the relation between inference-validation and inference-evaluation task performance allows us to test whether some people spontaneously test their arguments. On the basis of the validation task we can classify reasoners according to their tendency to falsify or verify. Presuming people are to some extent consistent in how they go about testing inferences, it follows that 'falsifiers' should be more likely to reject arguments. This obviously only follows if these validation-task falsifiers also spontaneously test their inferences (i.e., without being asked or forced to do so). If we were to observe that falsifiers are not more likely to reject arguments (when they are not explicitly asked to test), we would have evidence against the idea that at least some people -- i.e., some of those identifiable as falsifiers by means of the validation task -- also spontaneously look for counterexamples.

The predicted relation between the two tasks only applies to falsifiable arguments. Only logically invalid arguments are falsifiable when the premises are true. The validation task requires people to consider hypothetical test cases, without the need to perform the test. In the inference-evaluation task, at least some people would actually test the arguments. That is, validation-task falsifiers would not only consider that TF would, hypothetically, falsify MP ("A, therefore C"); in the inference-evaluation task they would, ex hypothesi, also evaluate this counterexample. Truth-table tasks show people consider TF impossible. Hence, even if people consider TF as a potential counterexample, they would not accept it as a true possibility. The valid argument is thus not to be rejected. In short, validation-task falsifiers (vs. those who do not tend to falsify arguments in the validation task) should be more likely to reject invalid but not valid arguments.

We suggested that the goal of falsifying or confirming arguments depends on whether one aims to establish that the conclusion follows possibly or necessarily. We therefore manipulated task instructions. Some reasoners were asked to select the contingency they would need to check to test whether "the conclusion follows": a weak-necessity context. Others tested whether "the conclusion follows necessarily" (henceforth the standard-necessity condition). In a more stringent context they considered whether "the conclusion follows necessarily and not just possibly" (strong-necessity). The strong-necessity condition stresses the need to check the logical validity of the arguments. This should increase the tendency to falsify. Schroyens et al. (2003) manipulated inference-evaluation task instructions and observed higher rejection rates under stressed-necessity conditions, especially on logically invalid arguments. Truth-table tasks show people do not accept TF as a true possibility. Given that people assume the conditional is true, and are instructed to do so, the hypothetical falsifications of valid arguments are not accepted as real possibilities. The stressed-necessity effect should therefore be stronger on invalid arguments, for which the falsifying case is actually a real possibility.


Applying the necessity-instruction manipulation to the validation task provides a direct test of Schroyens et al.'s (2003) argumentation. In summary: First, since falsification is explicitly assumed to be the dominant strategy in the model theory and in all implementations of this theory's general processing principles, we expected to observe exactly this dominance of falsification over confirmation. Second, falsification in the inference-validation task is expected to increase as a function of the emphasis placed on the rules of the language game of deduction (logically valid inferences are inferences that follow necessarily and not just possibly). Third, some of those who falsify when forced to test in the inference-validation task should also falsify spontaneously. It follows that those who look to falsify arguments in the inference-validation task should also be more likely to reject the invalid (but not the valid) arguments in the inference-evaluation task.

Method

Design. Participants solved both an inference-validation task and an inference-evaluation task for each of the four logically valid (MP/MT) or invalid (AC/DA), affirmation (MP/AC) or denial (MT/DA) problems. Participants were assigned to one of three groups: the Weak-, Standard- or Strong-Necessity group (see Materials and Procedure).

Participants. One hundred and sixteen first-year psychology undergraduates at the University of Leuven participated. They were randomly allocated to one of three groups: 39, 37 and 40 in the 'Weak-Necessity', 'Standard-Necessity' and 'Strong-Necessity' groups, respectively.

Materials and procedure. The arguments were presented in the following format:

Rule: If the figure is a CIRCLE, then it is coloured RED.
Fact: The figure is NOT a CIRCLE.
Conclusion: Hence, the figure is NOT coloured RED.

All arguments concerned the same rule, and participants were told to assume the rule and the given fact were true. Argument presentation format was identical in the inference-validation and inference-evaluation tasks. 'Strong-Necessity' instructions introduced the validation task as follows: "Your task is to determine whether the conclusion follows necessarily (and not just possibly) from the given information (the rule and the given fact). We ask you to indicate which states of affairs (figures with a particular shape and colour) would allow you to decide the conclusion follows not just possibly but necessarily from the given information. A conclusion that follows possibly is a conclusion that 'can be true', which obviously is not the same as a conclusion that is necessarily correct ('has to be true')." The two other groups received 'Standard-Necessity' or 'Weak-Necessity' instructions: "Your task is to determine whether the conclusion follows [necessarily] from the given information ... to decide whether the conclusion follows [necessarily] from the given information". Reference to "necessity" was left out in the weak-necessity instructions. Strength of necessity was mirrored in the validation-task presentation format:

One would need to know whether it is possible or not that there are

O CIRCULAR figures that are RED.
O NON-CIRCULAR figures that are RED.
O CIRCULAR figures that are NOT RED.
O NON-CIRCULAR figures that are NOT RED.

to be able to decide the conclusion follows not just possibly, but necessarily.


In the standard-necessity context the final sentence read "... to be able to decide that the conclusion follows necessarily". The inference-evaluation task also reflected the different levels of necessity, both in the question asked and in the response options given. For instance, the strong-necessity version asked "does the conclusion follow necessarily or is it merely possible?". The two response options were: "Yes; the conclusion follows necessarily and not just possibly" and "No; the conclusion does not follow necessarily". The experiment was run in groups of 20 to 40. Upon entering the auditorium participants received a booklet with the instructions and one argument per page (in random order). For each argument participants first completed the validation task.

Results and Discussion

We first present the validation-task results. Next we consider the evaluation task and the relation between the two tasks. Some participants selected more than one response alternative (i.e., test case) on one or more of the four arguments. These participants were excluded from all analyses; only participants who selected at most one contingency per argument were retained. This left 37, 29 and 30 participants in the Weak-Necessity, Standard-Necessity and Strong-Necessity groups. Appendix A presents the full set of data.
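As a data-handling aside, the exclusion rule is simple to state programmatically; the sketch below is our illustration only, with a hypothetical data structure (one set of selected contingencies per participant per argument):

```python
# Exclusion rule: drop any participant who selected more than one
# contingency on any argument. The data structure is a hypothetical
# stand-in, not the study's actual data format.
def eligible(participant_data):
    """Keep participants who picked at most one test case per argument."""
    return all(len(selections) <= 1
               for selections in participant_data.values())

data = {
    "p1": {"MP": {"TF"}, "DA": {"FT"}, "AC": {"FT"}, "MT": set()},
    "p2": {"MP": {"TT", "TF"}, "DA": {"FT"}, "AC": {"FT"}, "MT": {"TF"}},
}
kept = [p for p, d in data.items() if eligible(d)]
print(kept)  # ['p1'] -- p2 selected two contingencies on MP
```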

*********** TABLE 1 ***********

Inference Validation. Table 1 presents selection rates as a function of the level of necessity. The data show that falsification is the dominant strategy. Chance-level performance is 1/4, whereas falsification rates averaged .643 (χ² = 122.6, p < .001). This falsification rate was also reliably larger than the verification rate (.138; χ² = 125.4, p < .001). Logical validity affected the falsification rates: people were more likely to falsify the invalid AC/DA inferences than the valid MP/MT inferences (.807 vs. .479; Wilcoxon T = 112.0, N = 54, Z = 5.428, p < .001). Overall, problem type (affirmation vs. denial) did not affect falsification rates (.656 vs. .630). When taking the proportion of confirmatory selections as the dependent measure, there were more confirmations for denial than for affirmation inferences (.167 vs. .109; T = 72.0, N = 23, Z = 2.007, p < .05).

The main effect of logical validity suggests that the counterexample search is not fully disentangled from the plausibility of these alternatives. Some people seem to select a confirmatory test because the falsifying test fails. TF falsifies the logically valid inferences; however, this contingency is impossible when the conditional is true. Were it always true that "if I cut my finger, then it bleeds", it would be impossible that my finger does not bleed when cut. People recognize this and reject TF (see, e.g., Barrouillet & Lecas, 1998; Evans et al., 1996). Schroyens et al. (2001) already concluded that it is "inappropriate to adhere to a strict serial processing sequence in which the process of constructing a hypothetically falsifying model is primordial and independent of its plausibility ..." (p. 156).

The lower falsification rates for valid arguments are not simply mirrored by higher verification rates. Table 1 indicates that some participants did not select any contingency. Since these participants did not fail to select a test case for invalid arguments, these "none" responses cannot be interpreted as a failure to engage in the task. Moreover, 'none' responses are also explained by a thesis we already explicated: they are consistent with the idea that subjects not only conceptualised hypothetical tests (as required by the task), but actually also performed the test. They would not have selected a test case because the test comes out negative.


TF would, hypothetically, falsify the valid arguments, but it is not acceptable. Instructions told participants to assume the premises are true. This might have increased the conflict between the hypothetical test case and the theoretical impossibility of TF. There were 15 subjects who gave the "none" response for both MP and MT. Our interpretation of these responses characterises them as falsifiers who actually performed the test and did not just consider the case hypothetically. It follows, assuming people are to some extent consistent, that these presumed falsifiers would also tend to falsify invalid arguments. They indeed opted for a falsifying test in 29 of the 15 × 2 = 30 forced tests of AC and DA. This converges on the idea that 'none' responses come from falsifiers who actually consider the null result of the falsifying test for valid arguments.

Stressing the constraint that valid inferences are inferences that follow necessarily (and not just possibly) produced the expected increase in falsification rates (Median Test χ² = 7.616, df = 2, p = .024). The specific effects were all in the expected direction, though only the two extreme points of the linear relation differed significantly. The Strong-Necessity group showed a minor tendency to select the falsifying contingency more frequently than the Standard-Necessity group (.717 vs. .655; Mann-Whitney U30,29 = 377.0, Z = .920, ns), which in turn did not exhibit statistically higher falsification rates than the Weak-Necessity group (.655 vs. .574; U29,37 = 461, Z = 1.011, ns). Only the comparison between the two extreme necessity groups reached statistical significance (strong .717 vs. weak .574; U30,37 = 400.5, Z = 2.022, p < .05). The main instruction effect was reliable on the valid arguments (.405 < .465 < .50). The pairwise comparison between the weak- and strong-necessity instructions on the invalid arguments only approached the conventional significance level (U37,30 = 460.5, Z = 1.493, p = .074, one-tailed).
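For readers who want the shape of the chance-level comparison reported above, here is a hedged sketch using scipy; the counts are invented placeholders (not the study's raw data), and we make no attempt to reproduce the exact reported χ² values, whose aggregation unit is not specified here.

```python
# Goodness-of-fit test of falsification selections against the 1/4
# chance level. The counts below are invented placeholders.
from scipy.stats import chisquare

n_selections = 384    # e.g., 96 participants x 4 arguments (illustrative)
falsifications = 247  # hypothetical observed count (rate ~ .643)
observed = [falsifications, n_selections - falsifications]
expected = [n_selections * 0.25, n_selections * 0.75]  # chance = 1/4

chi2, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {chi2:.1f}, p = {p:.3g}")
```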


Inference Evaluation. Table 2 presents the inference-acceptance rates as a function of whether participants previously made a falsifying, confirmatory or other selection. But let us first consider the overall acceptance rates. They mirror the general pattern of results in reasoning about knowledge-lean conditionals (see Schroyens & Schaeken, 2003; Schroyens et al., 2001, for meta-analyses). That is, there was a main effect of inference type (affirmation vs. denial; .562 vs. .463: T = 180.0, N = 37, Z = 2.580, p < .05), a main effect of logical validity (.807 vs. .219; T = 54.0, N = 77, Z = 7.349, p < .05) and an interaction between these two factors (T = 218.5, N = 41, Z = 2.747, p < .05). Given that participants evaluated the inferences after the validation task, it is to be expected that the acceptance rates of the logical fallacies (AC/DA) are lower than generally observed. Not a single theory assumes that all people spontaneously engage in a validating search for counterexamples. Obviously, by instructing people to test their inferences, we ensure that they do so. Our results confirm that falsification is the dominant testing strategy. We can therefore expect that more people search for alternatives than in non-primed conditions.

Schroyens et al. (2003) also used stressed-necessity instructions and observed increased logical-validity effects in the inference task: reasoners are less likely to endorse the logical fallacies (AC and DA) when the necessity constraint is stressed. They argued that people are more likely to falsify when necessity is stressed. First, the higher falsification rate under stressed-necessity instructions provides direct support for this. Second, the higher rejection rates of AC and DA for validation-task falsifiers provide direct evidence for a relation between falsification and logical validity. The present inference-validation task also replicated Schroyens et al. (2003). That is, people endorsed fewer AC and DA arguments under strong- than under weak-necessity conditions (.15 vs. .297; Mann-Whitney U37,30 = 426.0, Z = 1.891, p < .05, one-tailed). As observed by Schroyens et al. (2003), and argued above, valid arguments were not reliably affected by the necessity instructions (.89 vs. .85; Z = 0.498, ns).
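For completeness, the nonparametric tests reported in this section have the following shape in scipy, where wilcoxon and mannwhitneyu correspond to the T and U statistics in the text; the arrays are invented placeholders, not the study's data:

```python
# Within-subject (Wilcoxon signed-rank, T) and between-group
# (Mann-Whitney, U) comparisons on placeholder data.
import numpy as np
from scipy.stats import wilcoxon, mannwhitneyu

rng = np.random.default_rng(0)
# Hypothetical per-participant acceptance rates, valid vs. invalid:
valid = rng.uniform(0.6, 1.0, size=96)
invalid = rng.uniform(0.0, 0.5, size=96)
print(wilcoxon(valid, invalid))    # paired comparison within subjects

# Hypothetical between-group comparison (weak vs. strong necessity):
weak = rng.uniform(0.1, 0.5, size=37)
strong = rng.uniform(0.0, 0.4, size=30)
print(mannwhitneyu(weak, strong))  # independent-samples comparison
```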

Inference validation and evaluation. Table 2 shows that, as predicted, people who falsified in the validation task were more likely to reject the falsifiable arguments in the evaluation task. Validation-task falsifiers were more likely to reject both the invalid AC inferences (acceptance: 15.0 vs. 50.0; U80,16 = 416.0, Z = 3.130, p < .05) and the invalid DA inferences (13.3 vs. 40.0; U80,16 = 442.5, Z = 4.199, p < .05). The higher rejection rates of invalid arguments by validation-task falsifiers are consistent with the idea that some people spontaneously engage in a search for counterexamples. If people test their inferences spontaneously, at least some inference-validation-task falsifiers would spontaneously test, and consequently reject, falsifiable inferences in the inference-evaluation task. The results concur and therefore corroborate the model theory's assumption that at least some people spontaneously test their inferences.

When evaluating arguments, at least some people would thus spontaneously consider counterexamples. We know people generally evaluate TF cases as impossible (see, e.g., Barrouillet & Lecas, 1998). This means that the hypothetical counterexamples to the valid arguments are unacceptable, which implies that these counterexamples would be of little consequence in evaluating the logical status of valid inferences. We previously argued this is also why some people did not select any test case for valid arguments in the forced-validation task. These people would not only have considered the hypothetical counterexample, but would also have evaluated the acceptability of TF.


Since the alternatives to valid inferences are inconsistent with the conditional, it is of little consequence to consider them. Table 2 shows that falsifiers were indeed neither more nor less likely to reject MP and MT when asked to evaluate these arguments in the inference-evaluation task.

Discussion

The results suggest that looking for potential falsifications is the dominant strategy when testing inferences. This concurs with mental-models theory, which holds that people aim to validate their initial inferences by looking for counterexamples. Initial inferences are derived from an incomplete representation of the possibilities (models) consistent with conditional utterances. People would test them by looking for potential refutations. Should an acceptable counterexample be found, the conclusion is merely possible. This search for counterexamples allows the theory to account for the pervasive logical-validity effect: people are generally more likely to endorse logically valid (vs. invalid) inferences. This does not mean that people are more likely to generate logically valid inferences. It means people are more likely to reject logically invalid inferences once they have been generated. Considering and accepting a falsification is sufficient reason to reject the arguments. This is corroborated by the performance of people who falsified in the validation task (when forced to do so). These falsifiers seemingly also falsified spontaneously (i.e., without being forced), as shown by their higher rejection rates compared with those who had verified in the forced validation task.

Our findings also suggest that validation-by-falsification is not the only test strategy. Some people tend to first check their inferences by looking for factual confirmations, implying that mental-models theory needs to be elaborated and can no longer identify test procedures with a search for counterexamples. The validation stage in reasoning by model has always been specified as a validation-by-falsification stage. The results suggest that when people test their inferences, only some will look for counterexamples; others will look for examples that confirm their uncertain inferences. The tendency of some to verify was, however, only observed in the validation task, which forces people to test their inferences. Those who verify when forced to test might not spontaneously test their inferences, even though we reported evidence that those who falsify when forced to test will sometimes also do so spontaneously. Parsimony, however, speaks against such additional assumptions: ceteris paribus, we should assume that of both those who verify and those who falsify when forced to do so, at least some will also do so spontaneously. Mental-models theory's computational-level goals accordingly need to be extended to encompass uncertainty reduction by means of looking for confirmatory evidence of one's hypotheses or putative conclusions (cf. Schroyens et al., 2001; Schroyens & Sevenants, 2006, for a computational model and specification of mental-models theory that does so).

Experiment 2

Experiment 1 showed that people who falsified in the inference-validation task were more likely to reject inferences in the inference-evaluation task. This is consistent with the idea that at least some people test conclusions spontaneously. However, our participants validated the given inferences before they evaluated them, implying that the 'mind-set' or context invoked by the validation task may have carried through to the inference-evaluation task. When people first indicate that they would test inferences by considering a potential falsification, it seems trivial enough that they subsequently do exactly that. This means the evidence for a spontaneous counterexample search is circumstantial.


Stronger evidence for spontaneous testing could be obtained by eliminating potential carry-over effects from the validation to the evaluation task. We therefore had our participants solve the inference-evaluation task first. A spontaneous tendency to falsify would, in this case, not be affected by the context of explicitly being asked to test one's inferences. However, if the model theory is on the right track, then there should still be a close correspondence between inference validation and inference evaluation: the former is supposed to be part and parcel of the latter. Consider again the logically invalid inferences. Reasoners who do not reject (i.e., who accept) these inferences are less likely to have gone through the inference-validation process. When people do reject these inferences, they would ex hypothesi have engaged in a validation-by-falsification procedure. If it can be shown that people who reject inferences in the evaluation task (before explicitly being prompted to do so in the validation task) are more likely to select falsifications, then it is corroborated that people resist logical fallacies on the basis of a validation-by-falsification mechanism. Participants therefore first completed the inference-evaluation task. Reasoners who reject the logical fallacies (before being asked how they would test them) should show a more prominent tendency to look for falsifying evidence. This finding would provide additional converging evidence that falsification is a spontaneously used test strategy.

Method

Design. Participants first evaluated each of the four conditional arguments (MP/AC/MT/DA) and then completed the validation task for each of these four arguments.

Participants. Participants were 41 volunteer second-year sociology undergraduates at Carleton University (Ottawa, Canada).


Materials and procedure. The arguments were presented as in Experiment 1. In the inference-evaluation task, participants had to "evaluate whether the conclusion follows logically". They were informed that "a conclusion follows logically when it is necessarily true given that the premises are true". Participants evaluated the conclusion as one that does, or does not, follow from the given information by selecting one of two corresponding response options. The inference-validation task followed the presentation format of the strong-necessity group of Experiment 1. The experiment was run in a single collective session. Participants first solved the inference-evaluation task and received the inference-validation task afterwards. They were instructed to work on a page-by-page basis, without looking forward or turning back to previously solved problems.

Results and Discussion

Inference validation. Table 3 presents the proportions of selected test contingencies. First, though falsification rates were somewhat lower than in Experiment 1, comparing falsification and confirmation rates again reveals the former's dominance. Falsifications (61/(4 × 35) = 43.6%) were more frequent than expected by chance (χ² = 48.62, p < .001) and were selected more frequently than confirmations (χ² = 3.11, p < .05, one-tailed). Second, falsification rates were again affected by logical validity. People were more likely to select falsifications for logically invalid (vs. valid) inferences (51.4 vs. 35.7; N = 18, T = 44.0, Z = 1.802, p < .05, one-tailed). Confirmation rates were again higher for the denial arguments (affirmation 28.6 vs. denial 32.9), though this numerical effect was not significant (Z = .764, ns).

Inference evaluation. Table 4 presents the inference-acceptance rates as a function of the type of validation. As before, people were more likely to endorse valid vs. invalid inferences (.914 vs. .343; N = 27, T = 0.0, Z = 4.541, p < .001), and more likely to endorse affirmation vs. denial arguments (.700 vs. .557; N = 14, T = 15.0, Z = 2.354, p < .05). In the present study, the interaction between these two factors was not statistically significant (Z = .425, ns). The results corroborate the expected relation between inference evaluation and inference validation. There was a strong negative correlation between the falsification rates of invalid arguments and the acceptance rates of these arguments in the inference-evaluation task (r = -.59, p < .00001). On the valid arguments, this relation was considerably smaller (r = -.265, p = .124). Comparisons of inference rates depending on whether people falsified in the validation task confirmed the lower AC, DA and MT inference rates for validation-task falsifiers (respectively, Z = 1.749, p
