Journal of Experimental Psychology: Learning, Memory, and Cognition
© 2013 American Psychological Association. 0278-7393/13/$12.00
Online First Publication, November 11, 2013. doi: 10.1037/a0034887

Base Rates: Both Neglected and Intuitive

Gordon Pennycook, University of Waterloo
Dries Trippas and Simon J. Handley, Plymouth University
Valerie A. Thompson, University of Saskatchewan

This document is copyrighted by the American Psychological Association or one of its allied publishers. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

Base-rate neglect refers to the tendency for people to underweight base-rate probabilities in favor of diagnostic information. It is commonly held that base-rate neglect occurs because effortful (Type 2) reasoning is required to process base-rate information, whereas diagnostic information is accessible to fast, intuitive (Type 1) processing (e.g., Kahneman & Frederick, 2002). To test this account, we instructed participants to respond to base-rate problems on the basis of “beliefs” or “statistics,” both in free time (Experiments 1 and 3) and under a time limit (Experiment 2). Participants were given problems with salient stereotypes (e.g., “Jake lives in a beautiful home in a posh suburb”) that either conflicted or coincided with base-rate probabilities (e.g., “Jake was randomly selected from a sample of 5 doctors and 995 nurses” for conflict; 995 doctors and 5 nurses for nonconflict). If utilizing base-rates requires Type 2 processing, they should not interfere with the processing of the presumably faster belief-based judgments, whereas belief-based judgments should always interfere with statistics judgments. However, base-rates interfered with belief judgments to the same extent as the stereotypes interfered with statistical judgments, as indexed by increased response time and decreased confidence for conflict problems relative to nonconflict problems. These data suggest that base-rates, while typically underweighted or neglected, do not require Type 2 processing and may, in fact, be accessible to Type 1 processing.

Keywords: dual-process theories, base-rate neglect, conflict detection, intuition

Supplemental materials: http://dx.doi.org/10.1037/a0034887.supp
Much research has demonstrated that preconceived notions about representativeness can influence probability judgments (Barbey & Sloman, 2007; Kahneman & Tversky, 1973; Tversky & Kahneman, 1974). For example, given a description of a randomly sampled male U.S. citizen described as being shy and introverted, one might judge that he is more likely to be a librarian than a farmer, even though there are roughly 20 times more farmers than librarians in the United States (Kahneman, 2011). These types of “biased” judgments are thought to occur because representativeness cues an intuitive response that is difficult to override (Kahneman & Frederick, 2002; Kahneman & Tversky, 1973; Tversky & Kahneman, 1974). It is also frequently assumed by dual-process theorists that some degree of slow and deliberative “Type 2” processing is required for prior probabilities to enter into judgment, implying that a disparity in the ease with which base-rates and stereotypes are processed is the source of base-rate neglect (e.g., Bonner & Newell, 2010; De Neys & Glumicic, 2008; Ferreira, Garcia-Marques, Sherman, & Sherman, 2006; Kahneman & Frederick, 2002). This perspective is theoretically grounded in dual-process theory, which postulates a fundamental cognitive difference between autonomous “Type 1” processes and working-memory-dependent “Type 2” processes (Evans, 2008; Evans & Stanovich, 2013; Kahneman, 2003; Sloman, 1996; Stanovich, 2004; Thompson, 2013). In this work, we adapt an instruction manipulation from a deductive reasoning paradigm (Handley, Newstead, & Trippas, 2011) to test this “analytic base-rate” account of base-rate neglect.
Gordon Pennycook, Department of Psychology, University of Waterloo, Waterloo, Ontario, Canada; Dries Trippas and Simon J. Handley, Department of Psychology, Plymouth University, Plymouth, England; Valerie A. Thompson, Department of Psychology, University of Saskatchewan, Saskatoon, Saskatchewan, Canada. This research was funded, in part, by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada to Valerie Thompson. Gordon Pennycook and Dries Trippas contributed equally to this work. Correspondence concerning this article should be addressed to Gordon Pennycook, Department of Psychology, University of Waterloo, 200 University Avenue West, Waterloo, ON, Canada, N2L 3G1. E-mail: [email protected]
Base-Rate Neglect

Consider the following example (De Neys & Glumicic, 2008, adapted from Kahneman & Tversky, 1973):

In a study 1,000 people were tested. Among the participants there were 995 nurses and five doctors. Jake is a randomly chosen participant of this study.

Jake is 34 years old. He lives in a beautiful home in a posh suburb. He is well spoken and very interested in politics. He invests a lot of time in his career.
What is more likely?

(a) Jake is a nurse.

(b) Jake is a doctor.
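For reference, the normative answer to this problem follows from Bayes' rule. The sketch below assumes, purely for illustration, that the description is 20 times more likely to be written about a doctor than about a nurse; the study itself does not quantify diagnosticity.

```python
# Normative answer to the Jake problem via Bayes' rule.
# ASSUMPTION: the description is 20 times more likely for a doctor
# than for a nurse (an illustrative likelihood ratio, not a value
# taken from the study).
def posterior_doctor(n_doctors, n_nurses, likelihood_ratio=20.0):
    """Posterior probability that Jake is a doctor given the description."""
    prior_doctor = n_doctors / (n_doctors + n_nurses)
    prior_nurse = n_nurses / (n_doctors + n_nurses)
    evidence_doctor = likelihood_ratio * prior_doctor
    return evidence_doctor / (evidence_doctor + prior_nurse)

# With 5 doctors and 995 nurses, the extreme base-rate keeps the
# posterior low (about .09), so "nurse" remains the normative answer
# despite the strongly diagnostic description.
print(round(posterior_doctor(5, 995), 3))
```

Even a very diagnostic description cannot overcome a 199:1 prior against "doctor" under this assumption, which is why favoring the stereotype counts as neglect of the base-rate.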
This very basic type of base-rate problem contains two conflicting pieces of information: (a) a base-rate probability that suggests Jake is likely to be a nurse, and (b) a stereotypical description that suggests Jake is likely to be a doctor. Decades of research have demonstrated that people have a strong tendency to favor the diagnostic information over the base-rates, leading to the term base-rate neglect (Kahneman & Tversky, 1973; Tversky & Kahneman, 1974). Further, participants higher in cognitive ability (e.g., intelligence, working memory capacity) and/or more disposed to analytic thought are more likely to use base-rate information during reasoning (Pennycook, Cheyne, Barr, Koehler, & Fugelsang, 2013; Pennycook, Cheyne, Seli, Koehler, & Fugelsang, 2012; but see Stanovich & West, 2008).

It is frequently claimed and broadly accepted that diagnostic information, such as stereotypes, is preferred over base-rates because the two types of information are processed very differently (see Barbey & Sloman, 2007, for a review). Proponents of fuzzy-trace theory, for example, contend that base-rate neglect occurs because diagnostic information cues intuitive gist-based associations in memory, whereas base-rates are processed via more analytical verbatim-based reasoning (Reyna, 2004; Reyna & Brainerd, 2008; Reyna & Mills, 2007; Wolfe & Fisher, 2013). Such dual-process theories, including more general versions that nonetheless presume that base-rates, unlike diagnostic information, require analytic reasoning (e.g., Barbey & Sloman, 2007; De Neys & Glumicic, 2008; Ferreira et al., 2006; Kahneman & Frederick, 2002), are supported by the finding that problem-structure factors that allow base-rates to be more easily processed make base-rate neglect much less prominent (Barbey & Sloman, 2007; Brainerd, 2007; Reyna & Mills, 2007).
For example, Barbey and Sloman (2007) argued that frequency formats make the nested-set structure of the problem more evident, which induces people to substitute associative (Type 1) processing with rule-based (Type 2) processing, thereby leading to decreased levels of base-rate neglect. More generally, on this view participants tend to neglect or underweight the base-rates because humans tend to forego Type 2 (T2) processing; those who are more willing and able to think analytically are more likely to override the intuitive stereotypical response and employ the base-rate information. However, as pointed out by De Neys (2007), there is limited evidence that utilizing base-rates actually requires T2 processing. Moreover, recent research by Pennycook and Thompson (2012) has called this account into question, suggesting instead that base-rates may be evaluated using Type 1 (T1) processing. Participants evaluated problems similar to the above example; however, the base-rates were excluded for half of the participants. The base-rate problems did not elicit longer response times relative to cases where base-rates were excluded, despite evidence that participants incorporated them into their judgments. This finding is inconsistent with the claim that base-rates require slow, T2 processing. Also, contrary to the assumption that stereotypes are always processed in a primarily intuitive manner, participants were just as likely to change their answer in the direction of the stereotype as they were to change it in the direction of the base-rate when given
the opportunity to rethink their answer. This evidence suggests T1 processes are used to evaluate both base-rates and stereotypes and, in cases where they conflict, T2 processing is required to resolve the conflict in one direction over the other. Thus, the evidence indicates that both base-rates and stereotypes can be processed quickly and that both are also processed analytically, contrary to the assumption that belief information is only processed quickly via T1 processes and that utilizing base-rates requires slower, T2 processing.
Conflict Detection and Base-Rate Neglect

Researchers have manipulated base-rate problems so that the base-rates either conflict or coincide with the diagnostic information (see Appendix) to determine whether base-rates are entirely neglected or, rather, whether they have subtle effects on behavior when they conflict with diagnostic information. In a series of experiments, it has been shown that conflict problems elicit longer response times (De Neys & Glumicic, 2008; Thompson, Prowse Turner, & Pennycook, 2011) and lower confidence (De Neys, Cromheeke, & Osman, 2011; Thompson et al., 2011) relative to nonconflict problems. This interference as a result of conflict arises even when participants give the stereotypical response, which De Neys and colleagues have taken to suggest that humans are able to at least implicitly recognize when they are being “biased” (see De Neys, 2012, for a review). While the research by De Neys and colleagues did not set out to investigate base-rate neglect per se, their findings were originally interpreted under the “analytic base-rate” view (De Neys, Cromheeke, & Osman, 2011; De Neys & Franssens, 2009; De Neys & Glumicic, 2008; De Neys, Vartanian, & Goel, 2008; Franssens & De Neys, 2009). As discussed recently by De Neys (2012), the issue of whether base-rates are accessible to T1 processing (and therefore do not require T2 processing) has implications for the nature of conflict detection, particularly as it applies specifically to base-rate neglect. If comprehending base-rates requires T2 processing, then some level of analytic reasoning is necessary for conflicts between base-rates and stereotypes to be detected. However, participants were able to detect the conflict under a working memory load, suggesting that conflict detection is highly efficient and relatively effortless (Franssens & De Neys, 2009), at least when the base-rates are salient (Pennycook, Fugelsang, & Koehler, 2012).
On the basis of this (among other studies), De Neys (2012) has rejected the earlier stance that the detected conflict is between a T1 (stereotype) and a T2 (base-rate) response, and now, consistent with Pennycook and Thompson (2012) and Thompson et al. (2011), claims that the conflict is between the outputs of two sets of T1 processes (see also De Neys & Bonnefon, 2013). Again, given that conflict detection is a general phenomenon that has been established using a number of reasoning tasks (such as syllogisms, De Neys & Franssens, 2009; De Neys, Moyens, & Vansteenwegen, 2010; the conjunction fallacy, De Neys et al., 2011; Villejoubert, 2009; ratio bias, Bonner & Newell, 2010; and the Cognitive Reflection Test; De Neys, Rossi, & Houdé, 2013), De Neys’s (2012) T1-T1 conflict stance was not intended as a specific theory of base-rate neglect, though it is consistent with the perspective of Pennycook and Thompson (2012; see also Pennycook, Fugelsang, & Koehler, 2012; Thompson et al., 2011) and inconsistent with the “analytic base-rate” view that is prominent in the field (e.g.,
Barbey & Sloman, 2007; Ferreira et al., 2006; Kahneman & Frederick, 2002).
Research Paradigm

As with base-rate problems, participants are able to reliably detect the conflict between logical validity and conclusion believability inherent in relatively simple syllogisms (De Neys & Franssens, 2009; De Neys et al., 2010). Perhaps unsurprisingly, then, a parallel debate has arisen in the deductive reasoning literature, where some authors have claimed that people have an intuitive sense of logical validity (e.g., Morsanyi & Handley, 2012; but see Klauer & Singmann, 2012). Following those who have used instructional manipulations to investigate the relationship between deductive and inductive reasoning (Heit & Rotello, 2010; Rotello & Heit, 2009), Handley, Newstead, and Trippas (2011) instructed participants to judge both the logical validity and the believability of a series of simple reasoning problems. The instances of interest were problems in which logic and belief conflicted. The common view of belief bias is very similar to that of base-rate neglect, namely, that T1 processes produce belief-based judgments that are difficult to overturn via analytic T2 processing (e.g., Newstead, Pollard, Evans, & Allen, 1992; Sá, West, & Stanovich, 1999). Contrary to this account, participants were quicker and more accurate at making logic judgments than belief judgments. More important, belief-logic conflict actually interfered more with belief judgments during reasoning than with logic judgments. That is, inhibiting logical considerations appears more difficult than inhibiting belief-based considerations for simple propositional reasoning problems. It was concluded that (a) making logically valid judgments does not necessarily require T2 processing, and (b) making belief-based judgments does not exclusively rely on T1 processing. In the current work, we test the hypothesis that utilizing base-rates requires T2 processing by adapting this instruction manipulation for base-rate problems.
Whereas logic and belief conflict for syllogisms, base-rate probabilities and stereotypes (beliefs) conflict for base-rate problems. According to the analytic base-rate view, information about beliefs enters into judgment autonomously via T1 processes and should therefore interfere with the slower, more deliberate T2 processing required to evaluate the base-rates. In addition, because the base-rates should be processed more slowly, they should not interfere with judgments based on the stereotype. Thus, if instructed to respond according to “belief,” participants should be able to make probability estimates that accord strongly with the stereotypes, respond quickly, and be highly confident regardless of whether base-rates and stereotypes conflict. In contrast, when instructed to respond according to “statistics,” participants should have difficulty inhibiting the intuitive diagnostic information, leading to slower response times, decreased confidence, and less accurate probability estimates for conflict problems. However, if base-rates can be comprehended by T1 processes, responding under statistics instructions should be no more difficult than responding under belief instructions, and the conflict between the two pieces of information should make these problems difficult regardless of instructions. This should lead to longer response times, lower confidence, and less accurate probability estimates for conflict relative to nonconflict problems for both belief and statistics judgments.
Experiment 1
By analogy to Handley et al. (2011), we presented people with a series of base-rate neglect problems under either “belief” or “statistics” instructions. Belief instructions required participants to respond according to their “knowledge of what they think to be true in the world”; these instructions were designed to cue responding on the basis of the stereotypical description while ignoring the base-rate. Conversely, under statistics instructions, participants were told to assume that their prior beliefs about the world are not necessarily relevant and that they should concentrate on the actual probability that something will happen. Participants were therefore led to respond on the basis of the base-rate while ignoring the stereotypical description for the statistics instructions. The goal of the instruction manipulation was to lead participants to focus on one of the two categories of information in the base-rate problems (i.e., base-rates or stereotypes). The analytic base-rate view outlined above predicts that participants should be less inclined to incorporate base-rates into their judgments when given the opportunity to respond according to beliefs because they should already have a representation of the problem derived from a highly accessible representativeness heuristic (Kahneman, 2003).
Method

Participants. A total of 48 undergraduate students volunteered to participate in exchange for course credit (23 from the University of Saskatchewan, Canada, and 25 from Plymouth University, England). Of the participants, 25 were female and 23 were male. The mean age was 21 years (SD = 4).

Materials and procedure. The participants received a total of 24 base-rate problems similar to those used by De Neys and Glumicic (2008). Twelve of the problems were taken from Thompson et al. (2011; previously adapted from De Neys & Glumicic, 2008) and 12 were novel. The stereotypes for the 12 novel problems were developed from adjective typicality ratings for a series of professions taken from an earlier study (Neilens, Handley, & Newstead, 2009). Stereotypical descriptions of comparable length to those used by De Neys and Glumicic were developed by using the adjective that was rated to be the most diagnostic for each profession (see online supplemental materials for the full set of problems). A pilot test using nine graduate student volunteers from Plymouth University ensured that all descriptions for both old and new items were stereotypical for the intended group and nondiagnostic for the complementary group (see results below).

Participants were presented with two types of problems as defined by the relationship between the stereotypical description and the base-rates (see Appendix): (a) base-rates and stereotype suggested the same response (nonconflict), and (b) base-rates and stereotype suggested different responses (conflict). In addition, we created four neutral items in which the description was not diagnostic of either group; these were used as practice items. Two versions of the stimulus set were created wherein each stereotypical description matched the larger group (nonconflict) or the smaller group (conflict) an equal number of times. Three base-rate probability ratios were presented equally often: 995/5, 996/4, and 997/3.
Extreme ratios were used to remain consistent with the problem set that was used in earlier base-rate research (e.g., De Neys & Glumicic, 2008; De Neys et al., 2008; Pennycook, Fugelsang, & Koehler, 2012, 2013; Pennycook & Thompson, 2012; Thompson et al., 2011). Participants were asked to estimate the probability that the individual belonged to one of the two groups. The groups that were asked about were counterbalanced such that, for each question, the large group was asked about half of the time and the small group was asked about half of the time, allowing us to use the same description in both the nonconflict and conflict conditions (e.g., in the doctor problem presented earlier, changing the base-rate to 995 doctors and 5 nurses turns it into a nonconflict problem). Each participant received half the items in their “nonconflict” form and half in their “conflict” form; the items were counterbalanced across participants such that each item appeared equally often in each form. Problem order was randomized for each participant. After each judgment, participants were asked to rate their confidence on a scale from 1 to 7, ranging from not confident at all to very confident.

The instructions were manipulated within participants, such that they responded on the basis of belief half the time and on the basis of statistics half the time:

For each problem, you will be cued to answer either according to beliefs or according to statistics. When instructed to answer according to your beliefs, this means you must answer according to your knowledge of what you think to be true in the world. For example, if you met someone on the street who is dressed in very ragged clothing and asking for money, it is a good bet that such a person is homeless. If you were to be asked the probability that such a person is homeless, you would want to give a high probability because, based upon our knowledge of the world, people who dress in ragged clothing and ask for money on the street are usually homeless. In contrast, when instructed to answer according to statistics, this means you must assume that your prior beliefs about the world aren’t necessarily relevant.
Instead, you should concentrate on the actual probability that something will happen. For example, if you knew that only a small percentage of people in a city were homeless, then you would want to give a low probability because, based on statistics, only a small percentage of people are homeless.
The instruction manipulation was counterbalanced across participants, such that the problems presented under belief instructions to half the participants were presented under statistics instructions to the other half, and vice versa. Participants were prompted with “BELIEF” or “STATISTICS” at the bottom of the screen for each problem, in a randomized order.
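The counterbalancing constraints described above can be sketched as follows. The exact assignment scheme here is an assumption for illustration; it simply preserves the stated constraints (24 items, half conflict and half nonconflict, half under BELIEF and half under STATISTICS cues, flipped across counterbalancing versions, presented in randomized order).

```python
import random

# Sketch of the stimulus counterbalancing; the assignment scheme is an
# illustrative assumption that satisfies the constraints in the text.
def build_trial_list(n_items=24, version=0, seed=None):
    trials = []
    for item in range(n_items):
        # Half the items are conflict items; which half flips with version.
        conflict = (item + version) % 2 == 0
        # Half the items are cued BELIEF, half STATISTICS, balanced
        # within the conflict and nonconflict halves.
        instruction = "BELIEF" if (item + version) % 4 < 2 else "STATISTICS"
        trials.append((item, conflict, instruction))
    random.Random(seed).shuffle(trials)  # random problem order per participant
    return trials

trials = build_trial_list(version=0, seed=1)
```

Rotating `version` across participants ensures that each item appears equally often in each form and under each instruction cue.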
Results

Pilot test. Nine volunteers who did not participate in the actual experiment were presented with our 24 descriptions. For each description, participants were given a 7-point Likert-type scale and asked to rate whether the information was more stereotypical of the intended target group (= 7) or of the group for which the description was not intended (= 1). A score of 4 indicated that the information was not particularly diagnostic of either group. We tested whether the personality descriptions were rated as more diagnostic of the intended group than of the complementary group using a one-sample t test comparing the average rating per participant with the scale midpoint of 4. The descriptions were significantly more diagnostic of the target groups than of the nontargets (M = 5.79, SD = 0.50), t(8) = 22.2, p < .001. The internal consistency of the raters was on the high end of acceptable (Cronbach’s α = .79).

Scoring. The data in the conflict condition were recoded so that a high score always indicated a correct response given the instructions. This facilitated easy interpretation of the data. For instance, if there were 3 lawyers and 997 farmers in the sample and the participant was questioned about the probability of being a lawyer while under statistics instructions, a low probability estimate would indicate a response in line with the instructions. On the other hand, if the participant was questioned about the farmers, a high probability estimate would indicate a correct response. Under belief instructions, if the personality description was stereotypical of the lawyers and the participant was asked about the farmers, a low probability judgment would indicate compliance with the instructions. Finally, still under belief instructions, if the participant was questioned about the lawyers, a high probability judgment would be correct. To put all the trials on the same scale for comparison, we recoded the probability estimates for the relevant cases by subtracting them from 100 so that high values always indicated compliance with the instructions (hereafter referred to as “correct” probability estimates). Examples for each case can be found in the Appendix.

Power analysis. We performed a power analysis using MorePower 6.0 (Campbell & Thompson, 2012) to ensure power was adequately high to detect the crucial Conflict × Instructions interaction. Handley et al. (2011) conducted five experiments with a very similar methodology and reported an average effect size for the interaction of ηp² = .23 (range: .14–.33). We calculated the power to detect an effect with a conservatively assumed effect size of ηp² = .15 at an alpha level of .05, given our sample. In the current experiment, power was acceptable (.81) given the sample of 48 participants.
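The scoring rule described above (subtracting estimates from 100 whenever the queried group is not the group the instructions favor) can be sketched as a single function; the group names are illustrative.

```python
def score_estimate(estimate, queried_group, cued_group):
    """Recode a 0-100 probability estimate so that higher values always
    indicate compliance with the instructions. `cued_group` is the group
    the active instructions favor: the larger group under STATISTICS
    instructions, the stereotyped group under BELIEF instructions."""
    return estimate if queried_group == cued_group else 100 - estimate

# 3 lawyers / 997 farmers, STATISTICS instructions (cued group: farmer):
score_estimate(5, "lawyer", "farmer")    # low estimate about lawyers -> 95
score_estimate(95, "farmer", "farmer")   # high estimate about farmers -> 95
```

Applied to every trial, this puts conflict and nonconflict items on the same "correct" scale regardless of which group was queried.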
Probability estimates. We analyzed the probability estimates as scored to reflect compliance with the instructions (hereafter referred to as accuracy; see above) using a 2 (Congruency: nonconflict vs. conflict) × 2 (Instructions: belief vs. statistics) repeated-measures analysis of variance (ANOVA). Probability estimates were less accurate for conflict (M = 65.8) than nonconflict (M = 81.3) trials, F(1, 47) = 43.4, p < .001, ηp² = .48. No other effects approached significance (all Fs < 1.5, all ps > .22). In other words, conflict problems produced interference regardless of whether people were instructed to judge on the basis of belief or statistics.

Latency. We transformed the response times (RTs) using the natural logarithm; RTs in Table 1 are in original units. The resulting values were submitted to a 2 (Congruency: nonconflict vs. conflict) × 2 (Instructions: belief vs. statistics) repeated-measures ANOVA. A main effect of instructions was found, indicating that participants made faster judgments under statistics instructions (M = 12.73 s) than under belief instructions (M = 14.05 s), F(1, 47) = 6.24, p = .016, ηp² = .12. There was no main effect of congruency, F(1, 47) = 2.55, p = .12, ηp² = .05, although the trend was in the expected direction (i.e., longer RTs for conflict relative to nonconflict problems). The interaction did not approach significance (F < 1, p > .41).

Confidence ratings. We performed a 2 (Congruency) × 2 (Instructions) repeated-measures ANOVA on the confidence ratings. There was a main effect of congruency, wherein people were less confident for judgments where stereotype and base-rate
Table 1
Mean Probability Estimates, Response Times, and Confidence Ratings as a Function of Congruency and Instruction for Experiment 1

                                        Nonconflict                              Conflict
                           Belief      Statistics   Total       Belief      Statistics   Total
Variable                   M     SEM   M     SEM    M     SEM   M     SEM   M     SEM    M     SEM
Probability estimates      79.3  1.9   83.3  2.3    81.3  1.6   63.9  3.0   67.7  4.0    65.8  2.3
Response time (seconds)    13.7  0.9   12.2  0.9    12.9  0.8   14.4  1.1   13.2  0.9    13.8  0.9
Confidence (out of 7)      5.6   0.1   5.8   0.1    5.7   0.1   5.3   0.1   5.3   0.1    5.3   0.1
information were in conflict (M = 5.28) than when both cues suggested the same response (M = 5.68), F(1, 47) = 19.2, p < .001, ηp² = .29. A marginally significant interaction between congruency and instructions was also found, F(1, 47) = 2.9, p = .098, ηp² = .06. Follow-up comparisons confirmed that participants were more confident in their judgments for nonconflict than for conflict trials under both belief, t(47) = 3.04, p = .004, and statistics instructions, t(47) = 3.71, p = .001 (see Table 1).
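For a two-level within-participant factor such as congruency, the main effect in these repeated-measures ANOVAs is equivalent to a paired t test on the per-participant cell means (F = t²). A minimal sketch with hypothetical RTs, log-transformed as in the latency analysis:

```python
import math
from statistics import mean, stdev

# Hypothetical per-participant mean RTs in seconds (illustrative values,
# not data from the study); the study log-transformed RTs before analysis.
nonconflict = [11.2, 13.5, 12.8, 14.0, 10.9, 13.1]
conflict    = [12.0, 14.1, 13.5, 14.6, 11.8, 13.0]

# Paired t test on the log-transformed difference scores; the congruency
# main effect in the 2 x 2 repeated-measures ANOVA equals t squared.
diffs = [math.log(c) - math.log(n) for c, n in zip(conflict, nonconflict)]
t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
```

A positive t here reflects longer latencies for conflict than nonconflict trials, the direction of the effect reported above.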
Discussion

We found a strong effect of congruency on accuracy but no main effect of the instruction manipulation. This finding suggests that processing the stereotypes interfered with making judgments on the basis of the base-rates, but also that the base-rates interfered with making judgments on the basis of the stereotypes. The results from the confidence ratings converged on the same conclusion. At a minimum, this suggests that inhibiting the stereotypical information is no more difficult than inhibiting the base-rate probability. Somewhat surprisingly, statistics judgments were made more quickly than belief judgments, without any loss of confidence and without any reduction in the ability to comply with the instructions when making probability judgments. These results are clearly inconsistent with the view that analytic processing is required for base-rates to affect judgment. Rather, both base-rates and stereotypes appear to be accessible to intuitive processing.
Experiment 2

In Experiment 1, the probability and the confidence data supported an account of base-rate neglect that allows for intuitive processing of base-rate probabilities. Although the latency data were less straightforward in terms of the congruency effect, overall they were clearly not consistent with the view that utilizing base-rates requires analytic processing. In Experiment 2, we used a response deadline paradigm to perform a stronger test of both accounts of base-rate neglect by limiting T2 processing. Limiting the time available to respond has been shown to interfere with analytic processing in deductive reasoning (Evans & Curtis-Holmes, 2005; Schroyens, Schaeken, & Handley, 2003). If both base-rates and stereotypes are accessible to T1 processing, base-rates will interfere with stereotypes (and vice versa), as in Experiment 1, regardless of whether participants are asked to respond according to belief or statistics and regardless of the time deadline. In contrast, the analytic base-rate view suggests that participants should have difficulty responding according to the “more analytic” base-rate information, particularly when under time pressure.
Method

Participants. Eighty undergraduates from the University of Saskatchewan participated in exchange for course credit. There were 33 male and 47 female participants (mean age = 25 years, SD = 8).

Materials and procedure. The materials and procedure were identical to those used in Experiment 1, with the exception of the time limit. Participants were instructed to respond within 5 s. After the time limit had passed, the problem text turned bold and red, indicating that the participant had to respond immediately.
Results

Missing data. Participants responded within the time limit for 82% of the trials on average. We removed those participants who did not respond within the time limit for at least half of the trials (n = 10). The remaining 70 participants responded within the imposed time limit for 89% of the trials. We excluded all responses that were not given within the time limit. Missing values were replaced by the cell mean (Handley et al., 2011).

Power analysis. We calculated the power to detect a large effect (ηp² = .15) at an alpha level of .05, given the current sample size (n = 80 for the original sample but 70 for the actual analysis). Power was high (>.93) in both cases.

Probability estimates. The probability estimates were rescored in the manner described for Experiment 1. We submitted the estimates to a 2 (Congruency: nonconflict vs. conflict) × 2 (Instructions: statistics vs. belief) repeated-measures ANOVA. The analysis resulted in a main effect of congruency, with more accurate probability estimates for nonconflict (M = 80.2) than conflict (M = 60.3) items, F(1, 69) = 104.4, p < .001, ηp² = .60 (see Table 2). No other effects reached significance (all Fs < 2, all ps > .17).

Latency. All RTs were transformed using the natural logarithm; RTs in Table 2 are in original units. We submitted the resulting values to a 2 (Congruency: nonconflict vs. conflict) × 2 (Instructions: statistics vs. belief) repeated-measures ANOVA. The analysis resulted in a main effect of congruency, showing that participants responded faster to nonconflict (M = 3.69 s) than to conflict (M = 3.80 s) trials, F(1, 69) = 6.10, p = .016, ηp² = .08. No other effects approached significance (all Fs < 1, all ps > .50).

Confidence ratings. The confidence ratings were submitted to a 2 (Congruency: nonconflict vs. conflict) × 2 (Instructions: statistics vs. belief) repeated-measures ANOVA.
The analysis resulted in a main effect of congruency, showing that people were more confident responding to nonconflict (M = 5.53) than to conflict (M = 5.31) trials, F(1, 69) = 12.4, p = .001, ηp² = .15. No other effects approached significance (Fs < 1, ps > .70). See Table 2 for the means and standard errors.

Table 2
Mean Probability Estimates, Response Times, and Confidence Ratings as a Function of Congruency and Instruction for Experiment 2

                               Nonconflict                             Conflict
                       Belief       Statistics   Total         Belief       Statistics   Total
Variable               M     SEM    M     SEM    M     SEM     M     SEM    M     SEM    M     SEM
Probability estimates  79.9  1.7    80.4  2.0    80.1  1.5     63.9  2.7    56.6  3.4    60.3  1.7
Response time (s)      3.68  0.07   3.70  0.07   3.69  0.07    3.81  0.08   3.79  0.08   3.80  0.07
Confidence (out of 7)  5.5   0.1    5.5   0.1    5.5   0.1     5.3   0.1    5.3   0.1    5.3   0.1
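The congruency effects just reported can be summarized as simple difference scores over the condition means; a minimal sketch in Python (the values are the means reported above, rounded as published):

```python
# Condition means for Experiment 2, taken from the Results text above.
rt = {"nonconflict": 3.69, "conflict": 3.80}          # mean response time in seconds
confidence = {"nonconflict": 5.53, "conflict": 5.31}  # mean confidence out of 7

# Conflict problems cost both time and confidence relative to nonconflict ones,
# the signature of conflict detection used throughout the paper.
rt_cost = rt["conflict"] - rt["nonconflict"]
confidence_drop = confidence["nonconflict"] - confidence["conflict"]

print(f"RT cost: {rt_cost:.2f} s; confidence drop: {confidence_drop:.2f}")
```

Positive values on both indices for a condition pair are what the conflict-detection account predicts.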
Discussion

As in Experiment 1, we found a strong effect of congruency on the accuracy and confidence data, despite the time deadline, but no main effect of the instruction manipulation. Participants also took longer to respond to conflict relative to nonconflict problems, regardless of instruction. As above, base-rates interfered with belief judgments, a finding that is not consistent with the claim that comprehending base-rates requires slower, T2 processes, whereas beliefs are processed using faster, T1 processes. Moreover, base-rates had robust effects on behavior even when participants were instructed to respond according to beliefs under a time deadline intended to minimize T2 processing.
Experiment 3

In the previous two experiments we manipulated instruction as a within-participant factor. Although the constant reminder about which type of information to focus on may have kept participants on task, it is also possible that switching between belief-based and statistics-based responding may have increased interference as a function of response conflict. In other words, requiring participants to alternate between using base-rates and stereotypes within the same experiment may have increased the probability of conflict detection regardless of instruction. The act of switching in and of itself could also create interference. To rule out these possible confounds, we manipulated instruction as a between-participants factor in an online study.
Method

Participants. We recruited 120 participants online via Amazon Mechanical Turk. There were 67 male and 53 female participants (Mage = 36, SD = 13.8).

Materials and procedure. The items from Experiment 1 were presented in a single random order. Participants received either statistics or belief instructions depending on whether their birthday fell in the first or second half of the year, counterbalanced across participants. This was done because the online format required each condition to be run in turn; participants therefore filtered into the two crucial between-participants conditions at roughly equivalent times. Congruency was also counterbalanced across participants, such that each problem was a conflict problem for half of the participants and a nonconflict problem for the other half. This yielded four between-participants conditions that were cycled through in turn. Participants entered probability estimates in a text box and gave confidence ratings on a 7-point scale (as above). Items were presented on separate pages, each alongside its confidence rating.
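The in-turn cycling through the four between-participants cells can be sketched as follows; this is a hypothetical reconstruction, and the function and set names are ours rather than details from the study:

```python
def assign_condition(participant_index: int) -> tuple[str, str]:
    """Cycle successive participants through the four between-participants
    cells: 2 (instructions) x 2 (congruency counterbalancing set).
    Hypothetical reconstruction of the in-turn assignment described above."""
    instructions = ("statistics", "belief")
    counterbalance_sets = ("set_A", "set_B")  # which half of the items is conflict
    return (instructions[participant_index % 2],
            counterbalance_sets[(participant_index // 2) % 2])

# Every block of four consecutive participants covers all four cells once.
cells = {assign_condition(i) for i in range(4)}
```

Cycling (rather than fully random assignment) keeps the cells filled at roughly equal rates over the course of data collection, which is the property the authors note.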
Results

Missing data. Because participants were not continually kept on task by a trial-by-trial instruction prompt, as in Experiments 1 and 2, we included a number of measures to ensure that only participants who followed the belief/statistics instructions were included in the analysis. Participants were excluded for several reasons. Specifically, two participants were excluded because they had missing data, three indicated that they had seen similar problems in previous studies, one indicated that he or she was not fluent in English, and six failed a “check” question included to ensure that participants were paying attention.1 We also included three follow-up questions to ensure that participants followed the belief/statistics instructions. Of the remaining participants, eight responded “yes” when asked, “Did you forget at any point during the task to respond according to statistics?” (or “beliefs,” depending on condition), and six responded “no” when asked, “Did you read all of the information for each problem carefully?” Finally, and most important, participants were given an open-ended text box and asked the following question directly after completing the 24 base-rate problems (i.e., prior to the above check questions): “At the beginning of this study, you were asked to respond to the problems using probability estimates based on what?” In total, 27 of the remaining participants (28.7%) failed to answer this question correctly and were excluded on that basis.2 Thus, 67 participants (31 in the statistics condition, 36 in the belief condition) remained in the final sample. None of the remaining participants had identical IP addresses.

Power analysis. We calculated the power to detect a large effect (ηp² = .15) at an alpha level of .05, given the retained sample size of 67 participants.
As in the previous two experiments, power to detect the interaction between congruency and instruction was sufficiently high (.92).

1 Participants were given a list of hobbies following a demographics questionnaire and asked: “Below is a list of hobbies. If you are reading these instructions please write ‘I read the instructions’ in the ‘other’ box.”
2 We permitted some minimal leniency here. For example, “probability” or “percentages” were common answers in the statistics condition, and “knowledge of the world” and “experience” came up in the belief condition.
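The reported power values can be approximated with a short Monte Carlo sketch under a noncentral-F framework. This is a sketch only: the critical F value (roughly 3.98 for df = 1, 69) and the noncentrality convention λ = n · f², with f² = ηp²/(1 − ηp²), are our assumptions, not details taken from the paper.

```python
import math
import random

def rm_contrast_power(eta_p2: float, n: int, f_crit: float,
                      sims: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo power for a 1-df within-participant effect.

    Assumes the test statistic follows a noncentral F(1, n - 1) with
    noncentrality lam = n * f2, where f2 = eta_p2 / (1 - eta_p2).
    """
    rng = random.Random(seed)
    lam = n * eta_p2 / (1.0 - eta_p2)
    df2 = n - 1
    hits = 0
    for _ in range(sims):
        # Noncentral chi-square with 1 df: (Z + sqrt(lam)) ** 2
        num = (rng.gauss(0.0, 1.0) + math.sqrt(lam)) ** 2
        # Central chi-square with df2 df, drawn via the gamma distribution
        den = rng.gammavariate(df2 / 2.0, 2.0) / df2
        hits += num / den > f_crit
    return hits / sims

# n = 70 retained participants in Experiment 2; F(.95; 1, 69) ~ 3.98 (tabled value)
power = rm_contrast_power(eta_p2=0.15, n=70, f_crit=3.98)
```

With the Experiment 2 sample (n = 70), the estimate lands near the >.93 power reported above.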
Probability estimates. We submitted the probability ratings, recoded to signify correct responding, to a 2 (Congruency: nonconflict vs. conflict) × 2 (Instructions: statistics vs. belief) mixed ANOVA. The analysis resulted in a main effect of congruency, with higher probability estimates for nonconflict (M = 88.6) than conflict (M = 64) items, F(1, 65) = 59.47, p < .001, ηp² = .48 (see Table 3). No other effects reached significance (all Fs < 1.5, all ps > .23).3

Confidence ratings. The confidence ratings were submitted to a 2 (Congruency: nonconflict vs. conflict) × 2 (Instructions: statistics vs. belief) mixed ANOVA. The analysis resulted in a main effect of congruency, showing that people were more confident responding to nonconflict (M = 5.83) than to conflict (M = 5.26) trials, F(1, 65) = 42.24, p < .001, ηp² = .39. No other effects approached significance (Fs < 1, ps > .35).4 See Table 3 for the means and standard errors.
Discussion

The results from Experiment 1 were replicated in a between-participants design, despite the fact that we were very conservative in our analysis, including only the participants who passed a number of “check” questions (55.8% of the sample). These data rule out the possibility that switching from belief-based to statistics-based judgments (and vice versa) explains the results from Experiments 1 and 2. We also note that the probability estimates and confidence ratings for the between-participants online sample (Table 3) closely parallel the data from the within-participant university samples (Tables 1 and 2).
General Discussion

The finding that base-rates are underweighted or neglected is commonly explained by noting the compelling intuitive nature of stereotypical or diagnostic information and asserting a failure to adjust the response by integrating the base-rates via analytic reasoning processes (e.g., De Neys & Glumicic, 2008; Ferreira et al., 2006; Kahneman & Frederick, 2002). This assumption is consistent with the finding that reasoners who are more able and/or willing to engage in T2 processing are more likely to utilize base-rate information when making judgments (e.g., Pennycook et al., 2013). However, the data presented here indicate that, at the very least, extreme base-rates are accessible to rapid, T1 processing. Specifically, both belief and statistical judgments were made more difficult by conflict, even when the possibility for T2 processing was reduced by imposing a time deadline. Finally, participants were just as fast and confident when responding according to statistics as when making judgments based on belief. These results all converge on the same conclusion: base-rate probabilities (at least the extreme ones used in our study) are accessible to T1 processing and can influence belief judgments just as beliefs can influence statistical judgments. Thus, the conflict observed between base-rates and stereotypes may result from a conflict of T1 outputs, rather than a conflict between a T1 and a T2 output (De Neys, 2012; Pennycook, Fugelsang, & Koehler, 2012; Pennycook & Thompson, 2012; Thompson et al., 2011).
Why Are Base-Rates Neglected?

Using a binary choice format, only roughly 18%–24% of respondents typically choose the base-rate option over the stereotypical response for problems similar to the ones used here (De Neys & Glumicic, 2008; Pennycook, Fugelsang, & Koehler, 2012). If base-rates are accessible to intuitive processing, why are they routinely underweighted or, in some cases, neglected completely? One view is that base-rates are neglected because they require effortful analytic reasoning, and that responses based on explicit cues of this kind are preempted by intuitive judgments that draw on readily available stereotypes (e.g., De Neys & Glumicic, 2008; Ferreira et al., 2006; Kahneman & Frederick, 2002). We suggest instead that participants typically respond according to stereotypes rather than base-rates not because the former is primarily intuitive and the latter primarily analytic, but because the former is simply more salient in the base-rate problems that are typically constructed. Whereas the base-rates represent a single source of information, the descriptions that we presented contained three to five pieces of detailed information that reinforced the targeted stereotype (see supplemental materials for items). If we assume that the likelihood and strength of a particular intuitive output is affected by contextual factors such as this, it is no surprise that diagnostic information is typically favored over base-rate probabilities. This context-dependent view of base-rate use is consistent with recent research showing that the percentage of base-rate responses for problems with rich sources of diagnostic information (as used here) tends to be very low (24%; Pennycook, Fugelsang, & Koehler, 2012).
However, an alternative set of problems, in which the descriptions merely consisted of a set of five personality traits (e.g., orderly, organized, precise, practical, and realistic) along with extreme base-rates, resulted in a majority (59%) of base-rate responses.

How does this relative salience account explain existing data that seem to support the claim that utilizing base-rates requires T2 processing? Specifically, high-capacity participants respond according to the base-rates more often than lower capacity individuals (Pennycook et al., 2013; but see Stanovich & West, 2008), and the mere willingness to engage analytic reasoning processes also predicts base-rate responding (Pennycook, Cheyne, et al., 2012). Consistent with the conclusions presented above, we speculate that the observed differences may be attributed to strategy choices. Under this explanation, the overall preference for diagnostic information likely has more to do with cognitive miserliness (i.e., a nonanalytic cognitive style) and less to do with competence (i.e., absent or ineffective cognitive abilities) because, as our data indicate, it does not appear to be a processing deficit that leads to base-rate neglect. In other words, we suggest that the base-rates are typically recognized by all, but that individuals who are willing to engage T2 processing consider them more explicitly. More miserly participants may focus on the more detailed stereotypical information because it leads to a stronger intuitive response that comes to mind more fluently, producing an increased feeling of rightness (Thompson et al., 2011, 2013). Participants who are more willing to engage analytic reasoning are simply more likely to weigh the base-rates against the stereotypes; given that the more salient diagnostic information remains the default, this leads to decreased levels of stereotypical responding. This process may also be facilitated by the possibility that more analytic individuals are better able to detect conflicts between cognitive outputs, leading to increased deliberative weighing of the conflicting responses against each other (Pennycook et al., 2013). This account is consistent with the finding that analytic cognitive style is more predictive of base-rate responding than cognitive ability (Pennycook, Cheyne, et al., 2012, 2013; Stanovich & West, 2008), though this finding needs to be replicated using full IQ scales. It is also consistent with the finding that “neutral” base-rate problems (i.e., items that lack stereotypical content) are associated with lower “feelings of rightness” than problems with stereotypical content, regardless of conflict status (Thompson et al., 2011). We note, in addition, that interpretation of the individual difference data is also complicated by the fact that the normatively correct response to typical base-rate problems according to Bayes’ theorem is to combine the diagnostic and base-rate probabilities, a strategy that does not appear to arise very often for conflict problems (Pennycook & Thompson, 2012).

Another finding that appears to support the claim that utilizing base-rates requires T2 processing is that base-rate neglect has been shown to increase for conflict problems when participants were given a working memory load, suggesting a decrease in base-rate processing concurrent with a disruption of T2 processing (Franssens & De Neys, 2009).

3 As would be expected, the excluded participants gave probability estimates that were (moderately) less in line with the instruction relative to the included participants, t(106) = 1.72, SE = 2.95, p = .089.
4 The excluded participants also had lower confidence for conflict (M = 5.40) relative to nonconflict (M = 5.85) problems, F(1, 40) = 20.35, p < .001, ηp² = .34. Without the instruction manipulation, the task for the excluded participants would be akin to a standard base-rate neglect study.

Table 3
Mean Probability Estimates and Confidence Ratings as a Function of Congruency and Instruction for Experiment 3

                               Nonconflict                             Conflict
                       Belief       Statistics   Total         Belief       Statistics   Total
Variable               M     SEM    M     SEM    M     SEM     M     SEM    M     SEM    M     SEM
Probability estimates  87.7  1.7    89.4  2.3    88.5  1.4     60.6  3.9    67.5  4.8    63.8  3.1
Confidence (out of 7)  6.0   0.1    5.7   0.2    5.8   0.1     5.4   0.2    5.2   0.2    5.3   0.1
However, Franssens and De Neys (2009) also demonstrated that recall of base-rates was not affected by the working memory load, indicating that the base-rates were processed despite the decrease in base-rate responding. Moreover, as previously discussed, giving participants an opportunity to rethink an initial response leads to a roughly equal proportion of answer changes toward the base-rates as away from them (Pennycook & Thompson, 2012). These findings indicate that while T2 processing is not likely required for base-rates to enter into judgment, T2 processing may nonetheless be necessary to decide between the two sources of conflicting information.
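The normative Bayesian combination mentioned above can be written out for a conflict item like the doctors/nurses example from the introduction (5 doctors, 995 nurses, with a description d favoring the doctor stereotype); the 20:1 likelihood ratio for the description is an illustrative assumption, not a value from the study:

```latex
\frac{P(\mathrm{doctor} \mid d)}{P(\mathrm{nurse} \mid d)}
  = \frac{P(d \mid \mathrm{doctor})}{P(d \mid \mathrm{nurse})}
    \times \frac{P(\mathrm{doctor})}{P(\mathrm{nurse})}
  = 20 \times \frac{5}{995} \approx 0.10,
\qquad
P(\mathrm{doctor} \mid d) \approx \frac{0.10}{1.10} \approx .09
```

Even strongly diagnostic information cannot normatively overcome base-rates this extreme, which is why combining the two sources, rather than choosing between them, is the Bayesian benchmark.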
Dual-Process Theory and Intuitive Base-Rates

Given that our data are inconsistent with the standard dual-process explanation of base-rate neglect, one might wonder why we have nonetheless framed our results in terms of dual-process theory. One could argue, for example, that the claim that base-rates
are accessible to T1 processing is a violation of the definition of T1 processing as evolutionarily old, contextualized, and isolated to implicit knowledge (for overviews of dual-process theories, see Evans, 2008, 2009; Evans & Frankish, 2009; Evans & Stanovich, 2013; Sloman, 1996; Stanovich, 2009; Stanovich & West, 2000). However, as recently discussed by Evans and Stanovich (2013), factors such as contextualization are typical correlates of T1 or T2 processes, but not defining features. Evans and Stanovich have isolated the use of working memory and the processes of cognitive decoupling and mental simulation as defining features of T2 processing. In contrast, T1 processes are considered autonomous and operate independently of working memory. Thus, although base-rates may not be intuitive in the sense that they cue a cognitive output via gist-based associations in memory (Reyna, 2004, 2012; Reyna & Brainerd, 2008), it is possible that the use of base-rates in judgment has been practiced enough by adulthood that they are part of an “encapsulated knowledge base” that is accessible to T1 processing (Stanovich, West, & Toplak, 2011). This potentially explains how a simple but nonetheless abstract probabilistic concept can be processed via T1 processes. The finding that base-rates interfered with purportedly belief-based judgments suggests that they entered into judgment autonomously; that is, the execution of the T1 process was mandatory once the stimulus was encountered (Evans & Stanovich, 2013; Thompson, 2013). However, it is not necessarily true that the autonomous instantiation of the rule “if stimulus then response” is qualitatively different from more deliberative T2 processes (Kruglanski, 2013; Kruglanski & Gigerenzer, 2011). This forms part of a larger debate that certainly will not be resolved here (see Evans & Stanovich, 2013; Keren, 2013; Kruglanski, 2013; Osman, 2013; Thompson, 2013). However, our data do speak to this debate.
Specifically, the finding that conflict selectively increases response time is parsimoniously explained in a dual-process framework as an increase in T2 processing that is required to resolve the conflict between two cognitive outputs. The difficulty that participants apparently have deciding between the base-rates and stereotypes (Pennycook & Thompson, 2012) is not at all surprising given the claim that cognitive decoupling is a primary function of (typically) more effortful T2 processing (Stanovich, 2009, 2011). Our data indicate that participants must actively inhibit the conflicting source of information to solve the problem, a process that seems to be fundamentally different from the one that engendered the output in the first place.
Additional Implications

These data have implications for research outside of base-rate neglect. In Experiment 2, participants had lower confidence and
slower response time for conflict relative to nonconflict problems despite the short deadline. This indicates that the participants in our sample were able to rapidly detect the conflict between base-rates and stereotypes (De Neys & Glumicic, 2008; De Neys et al., 2008, 2011). This, along with evidence that conflict detection for base-rate problems is not disrupted by working memory load (Franssens & De Neys, 2009), suggests that the detection of conflict during reasoning is an effortless and potentially automatic process (De Neys, 2012). Future research is required to further investigate this claim. That base-rates and stereotypes appear to have been encoded and activated in parallel, a likely prerequisite for rapid conflict detection, is also consistent with the parallelism assumption of fuzzy-trace theory (e.g., Reyna, 2012). We note, however, that although fuzzy-trace theory holds that verbatim and gist-based representations are encoded and processed in parallel, base-rate neglect is nonetheless explained by positing that base-rates are processed differently than stereotypes (i.e., diagnostic information cues gist-based associations in memory, whereas base-rates are processed via verbatim-based reasoning; see Wolfe & Fisher, 2013). Our data do not support the claim that base-rates and stereotypes are processed in fundamentally different ways. However, it may be the case that extreme base-rate probabilities presented in frequency format fall somewhere in between the opposite poles of verbatim and gist representations, thereby leading to very subtle processing differences that require alternative methods and measures to discern. Here, too, further research is required.
Conclusion

The tendency for researchers to make inferences about the type of reasoning process that has occurred based solely on the type of response that was given is widespread (Evans, 2012). Above all else, the data presented here demonstrate the difficulty with this approach. Specifically, the claim that stereotypes are “intuitive” and base-rates are “analytic” does not hold up to serious scrutiny. This finding, paired with similar results in the domain of deductive reasoning (Handley et al., 2011), provides empirical evidence reinforcing the point that attempts to infer a reasoning process solely from a response should be treated with caution.
References

Barbey, A. K., & Sloman, S. A. (2007). Base-rate respect: From ecological rationality to dual processes. Behavioral and Brain Sciences, 30, 241. doi:10.1017/S0140525X07001653

Bonner, C., & Newell, B. R. (2010). In conflict with ourselves? An investigation of heuristic and analytic processes in decision making. Memory & Cognition, 38, 186–196. doi:10.3758/MC.38.2.186

Brainerd, C. J. (2007). Kissing cousins but not identical twins: The denominator neglect and base-rate respect models. Behavioral and Brain Sciences, 30, 257–258. doi:10.1017/S0140525X07001689

Campbell, J. I. D., & Thompson, V. (2012). MorePower 6.0 for ANOVA with relational confidence intervals and Bayesian analysis. Behavior Research Methods, 44, 1255–1265. doi:10.3758/s13428-012-0186-0

De Neys, W. (2007). Nested sets and base-rate neglect: Two types of reasoning? Behavioral and Brain Sciences, 30, 260–261. doi:10.1017/S0140525X07001719

De Neys, W. (2012). Bias and conflict: A case for logical intuitions. Perspectives on Psychological Science, 7, 28–38. doi:10.1177/1745691611429354

De Neys, W., & Bonnefon, J.-F. (2013). The ‘whys’ and ‘whens’ of individual differences in thinking biases. Trends in Cognitive Sciences. Advance online publication. doi:10.1016/j.tics.2013.02.001

De Neys, W., Comheeke, S., & Osman, M. (2011). Biased but in doubt: Conflict and decision confidence. PLoS ONE, 6, e15954. doi:10.1371/journal.pone.0015954

De Neys, W., & Franssens, S. (2009). Belief inhibition during thinking: Not always winning but at least taking part. Cognition, 113, 45–61. doi:10.1016/j.cognition.2009.07.009

De Neys, W., & Glumicic, T. (2008). Conflict monitoring in dual process theories of thinking. Cognition, 106, 1248–1299. doi:10.1016/j.cognition.2007.06.002

De Neys, W., Moyens, E., & Vansteenwegen, D. (2010). Feeling we’re biased: Autonomic arousal and reasoning conflict. Cognitive, Affective, & Behavioral Neuroscience, 10, 208–216. doi:10.3758/CABN.10.2.208

De Neys, W., Rossi, S., & Houdé, O. (2013). Bats, balls, and substitution sensitivity: Cognitive misers are no happy fools. Psychonomic Bulletin & Review. Advance online publication.

De Neys, W., Vartanian, O., & Goel, V. (2008). Smarter than we think: When our brains detect that we are biased. Psychological Science, 19, 483–489. doi:10.1111/j.1467-9280.2008.02113.x

Evans, J. S. T. (2008). Dual-processing accounts of reasoning, judgment, and social cognition. Annual Review of Psychology, 59, 255–278. doi:10.1146/annurev.psych.59.103006.093629

Evans, J. S. T. (2009). How many dual process theories do we need: One, two, or many? In J. Evans & K. Frankish (Eds.), In two minds: Dual processes and beyond (pp. 33–54). Oxford, England: Oxford University Press. doi:10.1093/acprof:oso/9780199230167.003.0002

Evans, J. S. T. (2012). Dual-process theories of deductive reasoning: Facts and fallacies. In K. Holyoak & R. G. Morrison (Eds.), The Oxford handbook of thinking and reasoning (pp. 115–133). Oxford, England: Oxford University Press. doi:10.1093/oxfordhb/9780199734689.013.0008

Evans, J. S. T., & Curtis-Holmes, J. (2005). Rapid responding increases belief bias: Evidence for the dual-process theory of reasoning. Thinking & Reasoning, 11, 382–389. doi:10.1080/13546780542000005

Evans, J. S. T., & Frankish, K. (Eds.). (2009). In two minds: Dual processes and beyond. Oxford, England: Oxford University Press. doi:10.1093/acprof:oso/9780199230167.001.0001

Evans, J. S. T., & Stanovich, K. E. (2013). Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science, 8, 223–241. doi:10.1177/1745691612460685

Ferreira, M. B., Garcia-Marques, L., Sherman, S. J., & Sherman, J. W. (2006). Automatic and controlled components of judgment and decision making. Journal of Personality and Social Psychology, 91, 797–813. doi:10.1037/0022-3514.91.5.797

Franssens, S., & De Neys, W. (2009). The effortless nature of conflict detection during thinking. Thinking & Reasoning, 15, 105–128. doi:10.1080/13546780802711185

Handley, S. J., Newstead, S. E., & Trippas, D. (2011). Logic, beliefs, and instruction: A test of the default interventionist account of belief bias. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37, 28–43. doi:10.1037/a0021098

Heit, E., & Rotello, C. M. (2010). Relations between inductive reasoning and deductive reasoning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 805–812. doi:10.1037/a0018784

Kahneman, D. (2003). A perspective on judgment and choice: Mapping bounded rationality. American Psychologist, 58, 697–720. doi:10.1037/0003-066X.58.9.697

Kahneman, D. (2011). Thinking, fast and slow. New York, NY: Farrar, Straus and Giroux.

Kahneman, D., & Frederick, S. (2002). Representativeness revisited: Attribute substitution in intuitive judgment. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.), Heuristics and biases (pp. 49–81). New York, NY: Cambridge University Press. doi:10.1017/CBO9780511808098.004

Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80, 237–251. doi:10.1037/h0034747

Keren, G. (2013). A tale of two systems: A scientific advance or a theoretical stone soup? Commentary on Evans & Stanovich (2013). Perspectives on Psychological Science, 8, 257–262. doi:10.1177/1745691613483474

Klauer, K. C., & Singmann, H. (2012). Does logic feel good? Testing for intuitive detection of logicality in syllogistic reasoning. Journal of Experimental Psychology: Learning, Memory, and Cognition. Advance online publication. doi:10.1037/a0030530

Kruglanski, A. W. (2013). Only one? The default interventionist perspective as a unimodel: Commentary on Evans & Stanovich (2013). Perspectives on Psychological Science, 8, 242–247. doi:10.1177/1745691613483477

Kruglanski, A. W., & Gigerenzer, G. (2011). Intuitive and deliberative judgments are based on common principles. Psychological Review, 118, 97–109. doi:10.1037/a0020762

Morsanyi, K., & Handley, S. J. (2012). Logic feels so good—I like it! Evidence for intuitive detection of logicality in syllogistic reasoning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38, 596–616. doi:10.1037/a0026099

Neilens, H., Handley, S., & Newstead, S. E. (2009). Effects of training and instruction on analytic and belief-based reasoning. Thinking & Reasoning, 15, 37–68. doi:10.1080/13546780802535865

Newstead, S., Pollard, P., Evans, J., & Allen, J. (1992). The source of belief bias effects in syllogistic reasoning. Cognition, 45, 257–284. doi:10.1016/0010-0277(92)90019-E

Osman, M. (2013). A case study: Dual-process theories of higher cognition: Commentary on Evans & Stanovich (2013). Perspectives on Psychological Science, 8, 248–252. doi:10.1177/1745691613483475

Pennycook, G., Cheyne, J. A., Barr, N., Koehler, D. J., & Fugelsang, J. A. (2013). Cognitive style and religiosity: The role of conflict detection. Memory & Cognition. Advance online publication. doi:10.3758/s13421-013-0340-7

Pennycook, G., Cheyne, J. A., Seli, P., Koehler, D. J., & Fugelsang, J. A. (2012). Analytic cognitive style predicts religious and paranormal belief. Cognition, 123, 335–346. doi:10.1016/j.cognition.2012.03.003

Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2012). Are we good at detecting conflict during reasoning? Cognition, 124, 101–106. doi:10.1016/j.cognition.2012.04.004

Pennycook, G., & Thompson, V. A. (2012). Reasoning with base-rates is routine, relatively effortless, and context dependent. Psychonomic Bulletin & Review, 19, 528–534. doi:10.3758/s13423-012-0249-3

Reyna, V. F. (2004). How people make decisions that involve risk: A dual-processes approach. Current Directions in Psychological Science, 13, 60–66. doi:10.1111/j.0963-7214.2004.00275.x

Reyna, V. F. (2012). A new intuitionism: Meaning, memory, and development in fuzzy-trace theory. Judgment and Decision Making, 7, 332–359.

Reyna, V. F., & Brainerd, C. J. (2008). Numeracy, ratio bias, and denominator neglect in judgments of risk and probability. Learning and Individual Differences, 18, 89–107. doi:10.1016/j.lindif.2007.03.011

Reyna, V. F., & Mills, B. (2007). Converging evidence supports fuzzy-trace theory’s nested sets hypothesis, but not the frequency hypothesis. Behavioral and Brain Sciences, 30, 278–280. doi:10.1017/S0140525X07001872

Rotello, C. M., & Heit, E. (2009). Modeling the effects of argument length and validity on inductive and deductive reasoning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 1317–1330. doi:10.1037/a0016648

Sá, W., West, R., & Stanovich, K. (1999). The domain specificity and generality of belief bias: Searching for a generalizable critical thinking skill. Journal of Educational Psychology, 91, 497–510. doi:10.1037/0022-0663.91.3.497

Schroyens, W., Schaeken, W., & Handley, S. (2003). In search of counterexamples: Deductive rationality in human reasoning. The Quarterly Journal of Experimental Psychology A: Human Experimental Psychology, 56, 1129–1145. doi:10.1080/02724980245000043

Sloman, S. (1996). The empirical case for two systems of reasoning. Psychological Bulletin, 119, 3–22. doi:10.1037/0033-2909.119.1.3

Stanovich, K. E. (2004). The robot’s rebellion: Finding meaning in the age of Darwin. Chicago, IL: University of Chicago Press. doi:10.7208/chicago/9780226771199.001.0001

Stanovich, K. E. (2009). Is it time for a tri-process theory? Distinguishing the reflective and algorithmic mind. In J. S. T. Evans & K. Frankish (Eds.), In two minds: Dual processes and beyond (pp. 55–88). Oxford, England: Oxford University Press. doi:10.1093/acprof:oso/9780199230167.003.0003

Stanovich, K. E. (2011). Rationality and the reflective mind. New York, NY: Oxford University Press.

Stanovich, K. E., & West, R. F. (2000). Individual differences in reasoning: Implications for the rationality debate. Behavioral and Brain Sciences, 23, 645–665. doi:10.1017/S0140525X00003435

Stanovich, K. E., & West, R. F. (2008). On the relative independence of thinking biases and cognitive ability. Journal of Personality and Social Psychology, 94, 672–695. doi:10.1037/0022-3514.94.4.672

Stanovich, K. E., West, R. F., & Toplak, M. E. (2011). The complexity of developmental predictions from dual process models. Developmental Review, 31, 103–118. doi:10.1016/j.dr.2011.07.003

Thompson, V. A. (2013). Why it matters: The implications of autonomous processes for dual-process theories: Commentary on Evans & Stanovich (2013). Perspectives on Psychological Science, 8, 253–256. doi:10.1177/1745691613483476

Thompson, V. A., Prowse Turner, J. A., & Pennycook, G. (2011). Intuition, reason, and metacognition. Cognitive Psychology, 63, 107–140. doi:10.1016/j.cogpsych.2011.06.001

Thompson, V. A., Prowse Turner, J., Pennycook, G., Ball, L., Brack, H., Ophir, Y., & Ackerman, R. (2013). The role of answer fluency and perceptual fluency as metacognitive cues for initiating analytic thinking. Cognition, 128, 237–251. doi:10.1016/j.cognition.2012.09.012

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131. doi:10.1126/science.185.4157.1124

Villejoubert, G. (2009). Are representativeness judgments automatic and rapid? The effect of time pressure on the conjunction fallacy. Proceedings of the Annual Meeting of the Cognitive Science Society, 30, 2980–2985.

Wolfe, C. R., & Fisher, C. R. (2013). Individual differences in base rate neglect: A fuzzy processing preference index. Learning and Individual Differences, 25, 1–11. doi:10.1016/j.lindif.2013.03.003
Appendix
Example Problem
Table A1
Example Problem Variants (2 × 2 design: diagnosticity × base rate)

High diagnosticity, high base rate [CONGRUENT, CELL #1]
In a study 1,000 people were tested. Brannon is a randomly chosen participant of this study. Among the participants there were 995 accountants and 5 street artists. Brannon is 29 years old. He is very good with numbers but is shy around people. He spends much of his time working. What is the probability that Brannon is an accountant?
Correct according to statistics: ~100. Correct according to beliefs: ~100. Correct according to base rate: ~100.

High diagnosticity, low base rate [INCONGRUENT, CELL #2]
In a study 1,000 people were tested. Brannon is a randomly chosen participant of this study. Among the participants there were 5 accountants and 995 street artists. Brannon is 29 years old. He is very good with numbers but is shy around people. He spends much of his time working. What is the probability that Brannon is an accountant?
Correct according to statistics: ~0. Correct according to beliefs: ~100. Correct according to base rate: ~0.

Low diagnosticity, high base rate [INCONGRUENT, CELL #3]
In a study 1,000 people were tested. Brannon is a randomly chosen participant of this study. Among the participants there were 5 accountants and 995 street artists. Brannon is 29 years old. He is very good with numbers but is shy around people. He spends much of his time working. What is the probability that Brannon is a street artist?
Correct according to statistics: ~100. Correct according to beliefs: ~0. Correct according to base rate: ~100.

Low diagnosticity, low base rate [CONGRUENT, CELL #4]
In a study 1,000 people were tested. Brannon is a randomly chosen participant of this study. Among the participants there were 995 accountants and 5 street artists. Brannon is 29 years old. He is very good with numbers but is shy around people. He spends much of his time working. What is the probability that Brannon is a street artist?
Correct according to statistics: ~0. Correct according to beliefs: ~0. Correct according to base rate: ~0.
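The normative answers for the four cells follow mechanically from the design: the "statistics" answer is just the base-rate probability of the queried group (the person is randomly sampled), the "beliefs" answer depends only on whether the stereotype fits the queried group, and a cell is congruent when the two point the same way. The following Python sketch illustrates this logic; it is a hypothetical reconstruction for clarity, not the authors' materials-generation code, and the function and variable names are our own.

```python
def cell_answers(queried_group, counts, description_fits_queried):
    """Derive the normative answers for one cell of the 2 x 2 design.

    counts: mapping from group name to its sample size (e.g., 995 vs. 5).
    description_fits_queried: True when the stereotype description matches
    the queried group (i.e., the diagnostic information favors it).
    """
    total = sum(counts.values())
    # "Correct according to statistics" / "base rate": the base-rate
    # probability (in %) of the queried group under random sampling.
    stats = 100 * counts[queried_group] / total
    # "Correct according to beliefs": driven entirely by the stereotype.
    beliefs = 100 if description_fits_queried else 0
    # Congruent when beliefs and statistics favor the same response.
    congruent = (stats > 50) == (beliefs > 50)
    return stats, beliefs, congruent

# The four cells of Table A1 (the description fits "accountant" throughout).
cells = {
    1: cell_answers("accountant", {"accountant": 995, "street artist": 5}, True),
    2: cell_answers("accountant", {"accountant": 5, "street artist": 995}, True),
    3: cell_answers("street artist", {"accountant": 5, "street artist": 995}, False),
    4: cell_answers("street artist", {"accountant": 995, "street artist": 5}, False),
}

for n, (stats, beliefs, congruent) in cells.items():
    label = "congruent" if congruent else "incongruent"
    print(f"Cell {n}: statistics ~{stats:.0f}, beliefs {beliefs}, {label}")
```

Running the sketch recovers the pattern in the table: cells 1 and 4 are congruent, cells 2 and 3 are incongruent, with the extreme 995:5 split making the statistical answer effectively ~100 or ~0.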
Received March 4, 2013
Revision received September 6, 2013
Accepted September 9, 2013