Running Head: Causal Status Effect: When and Why
The When and Why of the Causal Status Effect
Bob Rehder, Department of Psychology, New York University, New York, NY 10003
Word count: 3939
Send all correspondence to: Bob Rehder, Dept. of Psychology, 6 Washington Place, New York, New York 10003. Phone: (212) 992-9586. Email: [email protected]
Abstract
An outstanding puzzle in the causal-based categorization literature is why an important phenomenon, the causal status effect, can vary in size by an order of magnitude across studies. Two hypotheses were tested. Studies showing a large causal status effect provided participants with probabilistic interfeature causal links and presented a small number of similar items on the subsequent classification test (e.g., Ahn et al., 2000). Studies with more deterministic causal links and a longer classification test showed a small causal status effect (e.g., Rehder, 2003b; Rehder & Kim, 2006). An experiment manipulating these two factors confirmed that the causal status effect was larger with probabilistic causal links and few test items and smaller (indeed, absent) with deterministic links or more test items. Evidence was also found for coherence effects, extending the range of materials and procedures in which such effects have been found. These results are consistent with a generative model of classification and inconsistent with an alternative dependency model.
The When and Why of the Causal Status Effect

One ongoing research question concerns how people classify objects on the basis of the various features, attributes, and properties they display. Features provide strong evidence of category membership, of course, to the extent that they have been observed in the past to co-occur frequently with the category's label (Rosch & Mervis, 1975). More salient features will have more influence on classification than less salient ones (Sloman, Love, & Ahn, 1998). But it is now known that the causal and explanatory relations in which features are involved also affect their diagnosticity. For example, one important empirical regularity is the causal status effect, the phenomenon in which, all else being equal, features that appear earlier in a category's causal network of features carry greater weight in categorization decisions (e.g., Ahn, 1998; Ahn, Kim, Lassaline, & Dennis, 2000; Sloman, Love, & Ahn, 1998; Kim & Ahn, 2002; Luhmann, Ahn, & Palmeri, 2006; Rehder & Hastie, 2001; Rehder, 2003ab; Rehder & Kim, 2006). To illustrate, imagine three category features X, Y, and Z that are arranged in a causal chain such that X causes Y, which causes Z (Figure 1). According to the causal status effect, because it is "more causal," X is more diagnostic of category membership than Y (and Y is more diagnostic than Z for the same reason). Of course, diagnosticity is not determined by causal status alone, being also influenced by co-occurrence and salience information as just mentioned. But when these factors are controlled, causal status should dominate.

Figure 2 presents two empirical studies that provide at least partial support for the causal status effect. In Ahn et al. (2000, Experiment 1), participants were instructed on artificial categories such as Roobans (a type of bird) that were described as possessing three features: eats fruit (X), has sticky feet (Y), and builds nests on trees (Z).
In addition, participants were told about a causal relationship between X and Y ("Eating fruit tends to cause roobans to have sticky feet because sugar in fruits is secreted through pores under their feet.") and Y and Z ("Sticky feet tends to allow roobans to build nests on trees because they can climb up the trees easily with sticky feet."). Participants were then presented with items missing exactly one feature and asked to rate on a 0 to 100 scale how likely that item was a category member. For example, a missing-X item had sticky feet (Y) and builds nests
on trees (Z) but eats worms instead of fruit (not X). The ratings of the missing-X, missing-Y, and missing-Z items are shown in Figure 2A. In fact, an exemplar missing X was rated lower than one missing Y, which in turn was rated lower than one missing Z, suggesting that X is more important to category membership than Y, which is more important than Z. Similar results were found in Sloman et al. (1998, Study 3).

A second study that provides partial support for the causal status effect was conducted by Rehder and Kim (2006). Once again participants were instructed on artificial categories. For example, participants were told about Romanian Rogos (a type of automobile) that had five features, three of which were causally related in a causal chain. Participants were then shown a series of automobiles and asked to rate the category membership of each. Figure 2B presents the ratings for missing-X, missing-Y, and missing-Z items from that study. The figure indicates that the results differed both quantitatively and qualitatively from Ahn et al. First, whereas the size of the causal status effect (measured as the difference between the missing-X and missing-Z items) was 35 points on a 100-point scale in Ahn et al., that difference was only 4 points in Rehder and Kim (and reached significance at the .05 level only by pooling 192 participants from three experiments). Second, whereas the missing-X item was rated lower than the missing-Y item in Ahn et al., those items received almost identical ratings in Rehder and Kim. Rehder (2003b, Experiment 1) also found a smaller and partial causal status effect with a four-feature causal chain (a larger weight on the chain's first feature and smaller but equal weights on the remaining features).

Why should an effect as theoretically important as the causal status effect vary so dramatically across studies?
To answer this question, this article considers two hypotheses: one concerning the strength of the causal links relating features and one concerning the number of items presented on the classification test.

Hypothesis 1: Probabilistic Causal Links

One possibility is that the large difference across studies has something to do with the materials tested. At first glance, this explanation would seem to be unlikely given that the studies actually tested some of the same categories and causal links. For example, Ahn et al. also tested
Romanian Rogos (a category first tested in Rehder & Hastie, 1997), but, just as with the other categories tested, Rogos yielded a large causal status effect in Ahn et al. and a small one in Rehder and Kim, suggesting that the difference between studies is not due to the categories themselves. However, one potentially important difference concerns the exact wording of the causal links. For example, two causally related features of Romanian Rogos were "butane-laden fuel" and "hot engine temperature," but whereas Ahn et al. described the causal relationship between them as "Butane-laden fuel tends to cause hot engine temperatures," Rehder and Kim retained the original wording from Rehder and Hastie (1997) and omitted the phrase "tends to." This alternative wording might have led Ahn et al.'s participants to interpret their causal links as more probabilistic.

This difference in the perceived strength of the causal links would affect the causal status effect if one imagines that classifiers run "mental simulations" in which they estimate the probability of each feature in a causal chain. When links are probabilistic, each successive feature will be present with less certainty. This decreasing category validity (the probability of the feature given the category) means each subsequent feature will be less diagnostic of category membership (Rosch & Mervis, 1975); thus, a causal status effect appears. In contrast, deterministic causal links entail that each effect feature will be present whenever its cause is, implying more equal category validities and thus the absence of a causal status effect.

The following experiment tests this hypothesis by replicating Ahn et al. (2000, Experiment 1), manipulating the strength of the causal links by either retaining the "tends to" phrase or replacing it with "always."
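
The mental-simulation argument can be made concrete with a short sketch. It assumes, purely for illustration, that the first feature of the chain is present in a fixed proportion of category members and that each effect arises only from its immediate cause with a fixed link strength. The function name and the parameter values (0.9 for the first feature, 0.75 for a probabilistic link) are hypothetical and are not estimates from any study.

```python
def chain_category_validities(root_prob, link_strength, n_features=3):
    """Return P(feature present | category) for each feature in a causal chain.

    Illustrative assumption: an effect can only be produced by its immediate
    cause, and it is produced with probability `link_strength` when that
    cause is present.
    """
    validities = [root_prob]
    for _ in range(n_features - 1):
        # Each effect is present only if its cause is present AND the link operates.
        validities.append(validities[-1] * link_strength)
    return validities


# Hypothetical parameter values, chosen only to show the qualitative pattern.
probabilistic = chain_category_validities(root_prob=0.9, link_strength=0.75)
deterministic = chain_category_validities(root_prob=0.9, link_strength=1.0)

print("probabilistic links (X, Y, Z):", probabilistic)
print("deterministic links (X, Y, Z):", deterministic)
```

Under these assumptions the category validities fall from X to Y to Z when links are probabilistic, whereas with a link strength of 1.0 all three features are equally prevalent, which is the pattern Hypothesis 1 relies on.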
Hypothesis 2: Number of Test Items

A second factor that may influence the size of the causal status effect is the number of items presented during the classification test. On the one hand, studies showing a small causal status effect asked participants to produce a classification rating for every test item formed by systematically varying the values on each binary dimension (Rehder, 2003b; Rehder & Kim, 2006). On the other hand, a large causal status effect was found in studies that presented only items missing one of the characteristic features, that is, a missing-X item, a missing-Y item, and a missing-Z item (e.g., Ahn et
al., 2000; Sloman et al., 1998). Conceivably, with only three test items to rate, participants may have felt free to use a greater portion of the response scale to express any difference between those items. In addition, rating items that differed only on which feature was missing may have triggered a comparison of the relative importance of those features that would not have occurred otherwise. In other words, a large causal status effect may have arisen partly because of the demands of the task. To test this hypothesis, the following experiment manipulates the number of items presented during the classification test.

Overview of Experiment

The following experiment replicates the design and procedure of Ahn et al. (2000, Experiment 1) but also manipulates (a) the strength of the causal links by testing probabilistic ("tends to") and deterministic ("always") conditions and (b) the number of test items by presenting either 3 test items or all 8 that can be formed on three binary dimensions. According to the hypotheses, a stronger causal status effect should appear in the probabilistic versus deterministic conditions and in the condition that tests 3 versus 8 items.

This experimental design allows one other controversy in this literature to be addressed. In addition to testing how causal knowledge changes the weights of features considered individually, Rehder and Kim (2006) also asked how important certain combinations of features are for forming good category members. To assess this, they ran multiple regressions that included not only predictors that coded the presence or absence of the features but also predictors for the two-way interactions between features that code, for example, whether features X and Y are both present or both absent versus one present and the other absent. The regression weight on the interaction term represents the importance to participants' categorization ratings of dimensions X and Y having the same value (present or absent) or not.
Rehder and Kim found a large coherence effect, that is, categorization ratings were much larger when an expected correlation between two causally-related features was maintained in a test item and much lower when it was broken (also see Rehder & Hastie, 2001; Rehder 2003ab). However, Marsh and Ahn (2006) have questioned the robustness of this effect, arguing that it
is an artifact of the particular materials tested. The present experiment thus provides an opportunity to test for the presence of coherence effects using an alternative set of materials and experimental procedure.

Method

Materials

The materials were the same as in Ahn et al. (2000, Experiment 1), except that in the deterministic condition the phrase "tends to" in the causal relationships was replaced with "always." For example, for Roobans the deterministic group was told that "Eating fruit always causes roobans to have sticky feet because sugar in fruits is secreted through pores under their feet." The other three categories were Covition (a disease), the Hino (a tribe), and Romanian Rogos (an automobile).

Design

Following Ahn et al. (2000, Experiment 1), each subject learned and answered questions for all four categories. Two of the categories were accompanied by interfeature causal relationships forming a causal chain, whereas the other two (control) categories had no causal links. The strength of the causal links and the number of test items were manipulated between subjects. There were two additional between-subject counterbalancing factors. First, the four categories were presented in four different orders. Second, which two of the four categories were accompanied by causal links was varied according to a four-level factor. As a result, each category appeared in every position and in its causal and non-causal form an equal number of times.

Participants

One hundred and twenty-eight New York University undergraduates received course credit for participating in this experiment. They were randomly assigned in equal numbers to the probabilistic and deterministic conditions, to the 3 or 8 test item condition, and to the levels of the two counterbalancing factors.
Procedure

The procedure replicated that of Ahn et al. (2000, Experiment 1). Each test question presented category information at the top of the screen with the test item itself underneath. The first sentence of category information specified the category's three features. For causal categories, three additional sentences specified the two causal links and a summary of those links. The causal links included the phrase "tends to" or "always" in the probabilistic and deterministic conditions, respectively. The question specified a test item with values on three dimensions that indicated whether features X, Y, and Z were each present or absent. Participants were asked to rate the item's membership in the category on a 0–100 scale, with 0 defined as "definitely unlikely" and 100 as "definitely likely." The questions for each category were blocked and the presentation order within each block was randomized for each participant. The 3-test-item condition presented items missing only feature X (011), only Y (101), and only Z (110). The 8-test-item condition presented all items that can be formed on the three binary dimensions.

Results

Test Item Ratings

The test item ratings are presented in Table 1 and Figure 3 as a function of causal strength and the number of test items. Each panel of Figure 3 includes the results of the statistical test of whether the linear effect of features (i.e., the difference between the missing-X and missing-Z test items) differs in the causal versus control categories, that is, whether a causal status effect is present. The results in Figure 3A indicate that the Probabilistic-3Q condition, which tested probabilistic causal links and presented only three test items, replicated the essential results from Ahn et al. (2000, Experiment 1). Specifically, the missing-X test item 011 received a rating (28.5) that was 24.2 points lower than the missing-Z item 110 (52.7).
The significant linear interaction with the control categories (p < .001) establishes a causal status effect in that condition. Figure 3 also reveals that the size of the causal status effect was affected by the experimental
manipulations. First, when causal links were probabilistic but subjects were presented with 8 rather than 3 test items (Figure 3B), the difference between the missing-X and missing-Z items was reduced to 10.9. This resulted in a nonsignificant causal status effect, p = .17. Second, when subjects were presented with 3 test items but were instructed on deterministic causal links by use of the word "always" (Figure 3C), the difference was reduced to 5.5 and the causal status effect was again nonsignificant, F < 1. Finally, when links were deterministic and 8 test items were presented (Figure 3D), the causal status effect vanished entirely, as the missing-X item was rated slightly higher (35.5) than the missing-Z item (32.0). The apparent linear effect of features in Figure 3D (suggesting a reverse causal status effect) was only marginally significant, however, p = .08.

The effect of these manipulations is summarized in Figure 4, which presents the size of the causal status effect (the difference between the missing-Z and missing-X items) in the four conditions. The figure makes apparent that the causal status effect is larger for probabilistic versus deterministic links and for 3 versus 8 test items. A 2 x 2 ANOVA conducted with causal strength and number of test items as factors confirmed a main effect of strength, F(1, 124) = 9.46, MSE = 921, p < .01, a main effect of the number of test items, F(1, 124) = 4.27, MSE = 921, p < .05, and no interaction, F < 1.

Regression Analyses

The results in the 8-test-item conditions were supplemented with the same type of regression analyses conducted by Rehder and Kim (2006). Three predictor variables (fX, fY, fZ) were coded as +1 if the characteristic feature on that dimension was present and -1 if the uncharacteristic feature was present. Recall that the regression weight associated with each predictor fi (i = X, Y, or Z) represents the influence that feature i had on category membership ratings.
A positive weight indicates that the presence of the typical feature increased categorization ratings and the presence of the atypical feature decreased them. Three additional predictor variables were constructed by computing the multiplicative interactions between each possible dimension pair: fXY, fXZ, and fYZ. These variables are coded -1 if one of the characteristic features is present and the other absent, and +1 if both are present or both absent. Recall that for each interaction term a positive weight indicates that ratings are sensitive to whether the expected correlation between
that pair of dimensions is preserved (cause and effect both present or both absent) or broken (one present and the other absent).

The feature weights presented in the left panels of Figure 5 for the Probabilistic-8Q and Deterministic-8Q conditions mirror the test item ratings in Figures 3B and 3D. In the Probabilistic-8Q condition there was a small causal status effect, as the weight on feature X (17.2) was greater than that on feature Z (14.0). The linear effect of features did not differ significantly in the causal and control conditions, however, F(1, 31) = 1.00, MSE = 97.7, p > .20. In the Deterministic-8Q condition, there was no sign of a causal status effect at all, as the weight of 12.6 on feature X was actually lower than the weight of 13.8 on Z. The linear effect of features in the left panel of Figure 5B was marginally significant, F(1, 31) = 3.75, MSE = 47.4, p = .06.

The weights on the two-way interaction terms presented in the right panels of Figure 5 reveal the presence of coherence effects in this experiment. In both the Probabilistic-8Q and the Deterministic-8Q conditions the two-way interaction terms were significantly larger when categories were accompanied by causal knowledge as compared to the control categories, F(1, 31) = 5.45, MSE = 12.4, p < .05 and F(1, 31) = 17.36, MSE = 24.3, p < .001, respectively. This finding indicates that, all else being equal, subjects generated higher classification ratings when a cause and effect feature were either both present or both absent, and lower ratings when one was present and the other absent. That is, items make for better category members when they manifest the pattern of interfeature correlations one expects to be generated by a category's causal laws.

Further support for this interpretation is provided by directly comparing the interaction weights in the two causal conditions in Figure 6. First, the interaction weights were larger in the deterministic as compared to the probabilistic condition.
This result reflects the intuition that interfeature correlations should be stronger for stronger causal links. A 2 x 2 ANOVA of the interaction weights in the causal conditions with causal strength and term as factors confirmed a main effect of strength, F(1, 62) = 6.51, MSE = 23.9, p < .01. Second, in Figure 6 the direct terms are larger than the indirect ones, F(1, 62) = 12.55, MSE = 10.5, p < .001, reflecting the fact that the indirectly connected features (X and Z) should be more weakly correlated than the directly connected ones (X
and Y, and Y and Z). The pattern of interaction weights in Figure 6 was not in perfect accord with this interpretation, as directly and indirectly related variables should be equally correlated (i.e., there should be no difference between the direct and indirect terms) when causal links are deterministic. But overall the weights in Figure 6 provide strong support for the view that good category members are those likely to be generated by causal relationships.

Discussion

The purpose of this research was to uncover why an important effect of causal knowledge on classification, the causal status effect, can vary by an order of magnitude across studies. Support was found for two hypotheses. First, the causal status effect was significantly larger when causal relations were probabilistic (as indicated by the phrase "tends to") versus deterministic ("always"). Indeed, the effect was completely absent in the deterministic conditions. These results replicate those of Rehder and Kim (submitted), who also found a causal status effect when causal links were described as having a strength of 75% (and thus were probabilistic) but not 100% (deterministic).

These findings are important because they provide evidence in favor of one model of the causal status effect over another. According to the generative model (Rehder, 2003ab; Rehder & Kim, 2006), people estimate how likely a test item is to have been generated by a category's causal model. This view predicts that each successive feature in a causal chain will be produced or "generated" less frequently when causal links are probabilistic, yielding a causal status effect. In contrast, when causal links are deterministic, each effect feature will be present whenever its cause is, and thus will be at least as prevalent as its cause, yielding no causal status effect.
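
The generative account of both item likelihoods and interfeature correlations can be illustrated with a minimal sketch. It is not the fitted model from any of the cited papers: it assumes a simple parameterization in which feature X is present with probability c, each causal link produces its effect with probability m when the cause is present, and an optional background cause produces the effect with probability b. All names and parameter values below are hypothetical.

```python
from itertools import product


def effect_prob(cause_present, m, b):
    """P(effect present): noisy-OR of the causal link (m) and a background cause (b)."""
    return 1 - (1 - m) * (1 - b) if cause_present else b


def chain_joint(c, m, b=0.0):
    """Joint distribution over items (X, Y, Z) generated by the chain X -> Y -> Z."""
    joint = {}
    for x, y, z in product((0, 1), repeat=3):
        p_x = c if x else 1 - c
        p_y_present = effect_prob(x, m, b)
        p_z_present = effect_prob(y, m, b)
        p_y = p_y_present if y else 1 - p_y_present
        p_z = p_z_present if z else 1 - p_z_present
        joint[(x, y, z)] = p_x * p_y * p_z
    return joint


def correlation(joint, i, j):
    """Phi (Pearson) correlation between binary features i and j."""
    p_i = sum(p for item, p in joint.items() if item[i])
    p_j = sum(p for item, p in joint.items() if item[j])
    p_ij = sum(p for item, p in joint.items() if item[i] and item[j])
    return (p_ij - p_i * p_j) / (p_i * (1 - p_i) * p_j * (1 - p_j)) ** 0.5


X, Y, Z = 0, 1, 2
probabilistic = chain_joint(c=0.75, m=0.75)  # hypothetical parameter values
deterministic = chain_joint(c=0.75, m=1.0)   # links always operate, no background cause

for label, joint in (("probabilistic", probabilistic), ("deterministic", deterministic)):
    print(label,
          "direct X-Y:", round(correlation(joint, X, Y), 3),
          "indirect X-Z:", round(correlation(joint, X, Z), 3))
```

Under these assumptions the directly linked pair is more strongly correlated than the indirectly linked pair when links are probabilistic, whereas with deterministic links and no background causes all three features are perfectly correlated, so direct and indirect pairs do not differ.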
In contrast, Sloman, Love, and Ahn's (1998) dependency model predicts that the size of the causal status effect should increase monotonically with causal strength. The present results in which a causal status effect was present for the weaker (probabilistic) causal links but not the stronger (deterministic) ones confirm the predictions of the generative model and disconfirm those of the dependency model. Second, the causal status effect was significantly larger when participants were presented with only three test items versus eight. In our lab we have shown a similar increase in the causal status
effect for a three-feature chain using the materials and experimental procedure from Rehder and Kim (2006): A difference between the missing-X and missing-Z items of 5.3 points when 8 test items were presented increased to 25.0 points with only 3 test items. Because participants' beliefs about the diagnosticity of features are unlikely to be affected by the number of test items, they apparently felt free to use a larger portion of the response scale when rating a small number of similar items. In addition, many participants may have thought that the missing-X, missing-Y, and missing-Z items were equally good category members, but the perceived need to rate the items differently led participants to search for some basis for distinguishing them. That is, the small number of items may have introduced task demands that contributed to the large causal status effect.

There are other factors known to influence the causal status effect. First, Rehder and Kim (submitted) found that the causal status effect is stronger when a category is essentialized, that is, when the causal chain of observable features is itself caused by an unobserved feature that is defining of category membership (also see Rehder, 2003b). For example, if a disease causes symptom X, which causes Y, which causes Z, then classifiers are likely to reason backwards from the symptoms to the disease, and, of course, symptoms closer (in a causal sense) to the disease (e.g., X) are taken to be more diagnostic of that disease than more remote symptoms (Z). Second, Rehder and Kim (submitted) also found that the causal status effect is weaker when subjects are told that there are alternative causes of the features. This occurs because the differences in features' category validity induced by probabilistic causal links become smaller when there are additional causes of the features.
Finally, there are undoubtedly other aspects of the materials and experimental procedures that can magnify or shrink the causal status effect (see Marsh & Ahn, 2006, and Rehder & Kim, 2008, for examples). But the present research showing a large causal status effect for probabilistic links and three test items but not otherwise indicates that those two factors are sufficient to explain the dramatic difference between Ahn et al. (2000) on the one hand and Rehder (2003b) and Rehder and Kim (2006) on the other. A secondary purpose of this article was to confirm the presence of the coherence effect using an alternative set of materials and experimental procedure. In fact, both conditions that tested 8 test
items yielded strong coherence effects. Moreover, the detailed pattern of the interaction weights largely conformed to the predictions of the generative model: Interfeature correlations were stronger for deterministic versus probabilistic links and for directly versus indirectly related features. These results rule out the possibility that the coherence effects observed in previous studies were due to an artifact of the materials tested (Marsh & Ahn, 2006). They also provide additional evidence against the dependency model which fails to predict coherence effects. In summary, this research has explained why the causal status effect varies by an order of magnitude across studies. It has also established the robustness of the coherence effect with an alternative set of materials and experimental procedure. These results are consistent with a generative model of classification and inconsistent with a dependency model.
References

Ahn, W. (1998). Why are different features central for natural kinds and artifacts? The role of causal status in determining feature centrality. Cognition, 69, 135-178.
Ahn, W., Kim, N. S., Lassaline, M. E., & Dennis, M. J. (2000). Causal status as a determinant of feature centrality. Cognitive Psychology, 41, 361-416.
Luhmann, C. C., Ahn, W., & Palmeri, T. J. (2006). Theory-based categorization under speeded conditions. Memory & Cognition, 34, 1102-1111.
Marsh, J., & Ahn, W. (2006). The role of causal status versus inter-feature links in feature weighting. In R. Sun & N. Miyake (Eds.), Proceedings of the 28th Annual Conference of the Cognitive Science Society (pp. 561-566). Mahwah, NJ: Erlbaum.
Murphy, G. L. (2002). The big book of concepts. MIT Press.
Rehder, B. (2003a). Categorization as causal reasoning. Cognitive Science, 27, 709-748.
Rehder, B. (2003b). A causal-model theory of conceptual representation and categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1141-1159.
Rehder, B., & Hastie, R. (1997). The roles of causes and effects in categorization. In Proceedings of the 19th Annual Conference of the Cognitive Science Society (pp. 650-655). Stanford, CA.
Rehder, B., & Hastie, R. (2001). Causal knowledge and categories: The effects of causal beliefs on categorization, induction, and similarity. Journal of Experimental Psychology: General, 130, 323-360.
Rehder, B., & Hastie, R. (2004). Category coherence and category-based property induction. Cognition, 91, 113-153.
Rehder, B., & Kim, S. (2006). How causal knowledge affects classification: A generative theory of categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 659-683.
Rehder, B., & Kim, S. (2008). The role of coherence in causal-based categorization. In V. Sloutsky, B. Love, & K. McRae (Eds.), Proceedings of the 30th Annual Conference of the Cognitive Science Society (pp. 285-290). Mahwah, NJ: Erlbaum.
Rosch, E. H., & Mervis, C. B. (1975). Family resemblance: Studies in the internal structure of categories. Cognitive Psychology, 7, 573-605.
Sloman, S. A., Love, B. C., & Ahn, W. (1998). Feature centrality and conceptual coherence. Cognitive Science, 22, 189-228.
Author Note
Bob Rehder, Department of Psychology, New York University. Correspondence concerning this article should be addressed to Bob Rehder, Department of Psychology, 6 Washington Place, New York, NY 10003 (email: [email protected]).
Table 1
Classification ratings in each condition. Standard errors are shown in parentheses.

                          Probabilistic                 Deterministic
Test item              Chain         Control         Chain         Control

3-Q Condition
  011 (Missing X)      28.5 (3.4)    37.1 (4.0)      33.4 (3.5)    49.6 (3.8)
  101 (Missing Y)      37.1 (3.4)    44.5 (4.1)      28.9 (4.5)    54.9 (3.3)
  110 (Missing Z)      52.7 (4.1)    40.8 (3.5)      38.9 (5.0)    52.5 (3.7)

8-Q Condition
  000                   4.3 (1.3)     3.2 (1.2)       4.4 (1.3)     4.3 (1.2)
  001                  17.2 (2.9)    21.2 (3.3)      16.2 (2.5)    21.2 (2.4)
  010                  11.9 (2.0)    21.5 (2.5)      11.2 (2.1)    21.2 (2.2)
  100                  19.0 (2.3)    23.5 (3.0)      14.6 (2.7)    23 (2.3)
  011 (Missing X)      42.5 (4.0)    56.2 (2.9)      35.5 (3.8)    50.3 (3.4)
  101 (Missing Y)      48.2 (3.1)    57.6 (3.6)      27.7 (4.3)    56.1 (3.0)
  110 (Missing Z)      53.4 (4.0)    56.6 (3.6)      32.0 (4.7)    56.8 (2.7)
  111                  93.0 (2.4)    95.1 (1.3)      93.5 (1.8)    91.1 (2.2)
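
The effect sizes and regression weights discussed in the Results can be recomputed from the Chain columns of Table 1. The sketch below is illustrative only: it works from the cell means rather than from individual participants' ratings (an assumption of this sketch, not a description of the original analysis), and all variable and function names are hypothetical. Because the eight test items form a full factorial design, the +1/-1 predictors are mutually orthogonal, so each least-squares weight reduces to the mean of predictor times rating.

```python
# Mean ratings for the causal (Chain) categories, copied from Table 1.
# Items are coded (X, Y, Z), with 1 = characteristic feature present.
CHAIN_3Q = {
    "probabilistic": {(0, 1, 1): 28.5, (1, 0, 1): 37.1, (1, 1, 0): 52.7},
    "deterministic": {(0, 1, 1): 33.4, (1, 0, 1): 28.9, (1, 1, 0): 38.9},
}
CHAIN_8Q = {
    "probabilistic": {(0, 0, 0): 4.3, (0, 0, 1): 17.2, (0, 1, 0): 11.9, (1, 0, 0): 19.0,
                      (0, 1, 1): 42.5, (1, 0, 1): 48.2, (1, 1, 0): 53.4, (1, 1, 1): 93.0},
    "deterministic": {(0, 0, 0): 4.4, (0, 0, 1): 16.2, (0, 1, 0): 11.2, (1, 0, 0): 14.6,
                      (0, 1, 1): 35.5, (1, 0, 1): 27.7, (1, 1, 0): 32.0, (1, 1, 1): 93.5},
}
NAMES = "XYZ"


def causal_status_effect(ratings):
    """Missing-Z rating minus missing-X rating (positive = causal status effect)."""
    return ratings[(1, 1, 0)] - ratings[(0, 1, 1)]


def code(value):
    """Recode a 0/1 feature value as -1/+1."""
    return 1 if value else -1


def regression_weights(ratings):
    """Weights for fX, fY, fZ and the interactions fXY, fXZ, fYZ (full 8-item design)."""
    n = len(ratings)
    weights = {}
    for i in range(3):
        weights["f" + NAMES[i]] = sum(code(item[i]) * r for item, r in ratings.items()) / n
    for i, j in ((0, 1), (0, 2), (1, 2)):
        weights["f" + NAMES[i] + NAMES[j]] = sum(
            code(item[i]) * code(item[j]) * r for item, r in ratings.items()) / n
    return weights


for strength in ("probabilistic", "deterministic"):
    print(strength, "3Q effect:", round(causal_status_effect(CHAIN_3Q[strength]), 1))
    print(strength, "8Q effect:", round(causal_status_effect(CHAIN_8Q[strength]), 1))
    print(strength, "8Q weights:",
          {k: round(v, 1) for k, v in regression_weights(CHAIN_8Q[strength]).items()})
```

Run on these means, the differences match the values given in the text (24.2, 10.9, 5.5, and a reversed difference of 3.5), and the weights on fX and fZ agree with the reported feature weights to one decimal place.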
Figure 1. An interfeature causal network.
Figure 2. The size of the causal status effect from two previous studies.
Figure 3. Ratings for missing-X, missing-Y, and missing-Z test items in each condition.
Figure 4. The size of the causal status effect in four conditions. Error bars are standard errors.
Figure 5. Regression weights in the Probabilistic-8Q and Deterministic-8Q conditions.
Figure 6. Causal interaction weights from the Probabilistic-8Q and Deterministic-8Q conditions.