
Learning Complex Causal Structures

David Danks Institute for Human & Machine Cognition University of West Florida Craig R.M. McKenzie Department of Psychology University of California, San Diego

Currently under review at Cognitive Science Please do not quote or cite without authors’ permission

Address correspondence to: David Danks Institute for Human & Machine Cognition University of West Florida 40 S. Alcaniz St. Pensacola, FL 32501 [email protected] (850) 202-4462 (phone) (850) 202-4440 (fax) Draft of November, 2002

Abstract

Most current theories of human causal learning are essentially parameter estimators: they assume a fixed causal structure and estimate causal strengths within that structure. In these theories, absence of causation is represented as zero causal strength, rather than a distinct causal structure. In this paper, we first present the theoretical framework of Bayesian networks, which can represent both structure (presence/absence of causation) and parameters (strength of causation). We then present a series of experiments involving a particularly complex causal structure and a novel methodology that focuses on structural discriminations, rather than parameter estimation. These experiments suggest that people are capable of doing more than just parameter estimation. A significant group of participants seems to be learning (something isomorphic to) a Bayesian network.

Learning Complex Causal Structures

The psychological literature on causal learning has focused on cases in which participants are provided with information about binary potential causes and an effect in a series (or summary) of cases (with some exceptions – see, e.g., Ahn & Bailenson, 1996; Ahn, et al., 1995). For example, we might receive information about the co-occurrence of plants’ flowering and treatment with some chemical, and then have to determine whether the chemical causes the plants to flower. In recent years, the three most prominent theories of human causal induction in this experimental domain have been the Rescorla-Wagner theory (e.g., Rescorla & Wagner, 1972; Shanks, 1995), conditional ∆P (e.g., Spellman, 1996; Lober & Shanks, 2000), and Cheng’s Power PC theory (e.g., Cheng, 1997; Buehner, et al., 2001). Since the specifics of each theory are amply explained in these sources, and we will express all three models within a common framework, we provide here only a brief overview of them. We will assume throughout this article (along with all three causal learning theories discussed below) that all variables are labeled (typically through the experiment’s cover story) as either a potential cause or the effect. For simplicity, we will denote the potential causes as Ci’s, and the effect as E. Although initially intended as only a model of animal associative learning, the Rescorla-Wagner model (henceforth, the R-W model) provides an algorithmic account of how to update ratings of causal strength upon observation of novel cases. That is, it predicts that one’s strength ratings for individual cues are changed upon presentation of values (present or absent) for a set of cues and a value (present or absent) for the effect. The R-W model does not explicitly make predictions about people’s long-run behavior. However, the long-run behavior of this model has been characterized for special cases in Cheng (1997), and for arbitrary experimental designs in

Danks (2002). It turns out that the R-W model makes the same predictions as the conditional ∆P whenever the latter theory makes an unambiguous prediction. The conditional ∆P theory only predicts individuals’ judgments after sufficient data have been provided that no further change in (perceived) associative strength is expected. According to conditional ∆P, the strength rating will be proportional to the conditional contrast, which is simply P(E | C) – P(E | ¬C), when there is only one potential cause. If there are n potential causes, then for each potential cause Ci, there are 2^(n–1) different conditional contrasts, corresponding to conditioning on the presence or absence of each of the other potential causes. We use the following notation for the conditional contrasts:

∆Pi.{X} = P(E | Ci & X) – P(E | ¬Ci & X)

where X is some combination of presence and absence of each of the other potential causes. Note that the conditional ∆P theory will in many cases be ambiguous, since all of the conditional contrasts for a particular potential cause might be undefined (since one or both of the conditional probabilities may be undefined), and that the conditional contrasts for a potential cause need not be equal even when they are defined. In this article, we will only be considering situations in which every potential cause has at least one defined conditional contrast, and every defined conditional contrast for a potential cause is equal. Hence, the conditional ∆P theory will make unambiguous predictions, and so the R-W model will make the same prediction. Cheng’s Power PC theory makes three changes in the conditional ∆P theory. First, her theory predicts that the particular ∆P used is conditioned on a focal set of events for which the participant judges that other causes of the effect are independent in probability of the occurrence

Learning Complex Causal Structures -- 5 of the particular potential cause whose strength or “causal efficacy” is to be assessed – not just the presence or absence of other cues. Second, the causal strength is normalized to take into account the amount the cue could possibly raise (or lower) the probability of the effect. Lastly, different estimation equations are used depending on whether the potential cause is generative (increases the likelihood of the effect) or preventive (decreases the likelihood of the effect). While each of these theories proposes a different computational mechanism for determining causal strength (based in some way on the conditional contrast), they are all parameter estimators over an as-yet-unexplained fixed graphical structure (assuming absence of causation is simply modeled as a zero parameter value). We will later outline the fixed graphical structure, and use this feature to design causal learning scenarios in which all three theories predict that people will behave non-normatively. The experimental results we then provide argue that all three theories seriously underestimate people’s ability to learn complex causal structures. Bayes Nets as Causal Models Before exploring the theories’ performance on complex causal structures, we need a framework in which to model (objectively) causal relationships. Numerous studies and applications have shown Bayes nets to be excellent models of causal relationships. In the next sections, we provide an overview of the Bayes net formalism and algorithms for learning the structure of a Bayes net from data; more details about the formalism can be found in one of several introductions to the subject (e.g., Pearl, 2000; Spirtes, et al., 1993). We then show how the causal learning theories discussed are simply estimating parameters in a (structurally fixed) Bayes net.
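Before turning to the Bayes net framework, it may help to see these quantities concretely. The following is a minimal sketch (ours, not code from any of the cited papers) of the conditional contrast and of Cheng’s generative power computed over an explicitly supplied conditioning context rather than a pragmatically chosen focal set; the case encoding and all names are hypothetical.

```python
# Minimal sketch: conditional Delta-P and (generative) causal power computed
# from a list of cases, where each case is a dict of binary values such as
# {"C1": 1, "C2": 0, "E": 1}. All names here are illustrative assumptions.

def cond_prob(cases, event, given):
    """Empirical P(event | given); event and given are dicts of required values."""
    matching = [c for c in cases if all(c[k] == v for k, v in given.items())]
    if not matching:
        return None  # conditional probability undefined in this sample
    hits = sum(1 for c in matching if all(c[k] == v for k, v in event.items()))
    return hits / len(matching)

def conditional_delta_p(cases, cause, context):
    """Delta-P for `cause`, conditional on a fixed assignment `context` to the
    other potential causes (the X in the notation above)."""
    p_present = cond_prob(cases, {"E": 1}, {**context, cause: 1})
    p_absent = cond_prob(cases, {"E": 1}, {**context, cause: 0})
    if p_present is None or p_absent is None:
        return None  # this conditional contrast is undefined
    return p_present - p_absent

def generative_power(cases, cause, context):
    """Cheng's generative power over the supplied context (a simplification:
    the pragmatics of focal-set selection are not modeled here)."""
    dp = conditional_delta_p(cases, cause, context)
    base = cond_prob(cases, {"E": 1}, {**context, cause: 0})
    if dp is None or base is None or base == 1:
        return None
    return dp / (1 - base)

# With a single potential cause, conditional_delta_p(cases, "C", {}) is simply
# P(E | C) - P(E | ¬C).
```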

The General Formalism

Suppose we have a set of variables V = {V1, …, Vn}. A Bayes net for V is composed of two related pieces: (i) a directed acyclic graph; and (ii) a joint probability distribution over V. For the graph, there is a node for each variable in V, and nodes are (sometimes) connected by an edge with exactly one arrowhead. The directed edge points from the parent variable and to the child variable. We use similar relational terminology (e.g., “ancestor”) to describe other variables. So, for example, in the graph A → B ← C, A is B’s parent, and B is A and C’s child. Furthermore, we say that B is a collider in this graph, since the edges “collide” at it. A directed path is a sequence of the form V1 → … → Vn with all edges oriented in the same direction. We say a graph is acyclic when there are no directed paths (of non-zero length) from a node back to itself.

The acyclic graph and probability distribution are related through two assumptions. The Markov assumption states: Any variable X is (probabilistically) independent of its (graphical) non-parental non-descendants conditional on its (graphical) parents. The Markov assumption enables us to decompose the joint probability distribution into the product of n simpler distributions. Specifically, if we denote the parents of Vi by pa(Vi), then the Markov assumption constrains the probability distribution to factor as:

P(V1, …, Vn) = ∏_{i=1}^{n} P(Vi | pa(Vi))

For example, any probability distribution Markov for the graph X → Y → Z must factor as P(X, Y, Z) = P(X) P(Y | X) P(Z | Y).
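As a concrete illustration of this factorization, here is a toy sketch (with made-up conditional probability tables, not a model from the paper) for the chain X → Y → Z; it also checks the independence that the Markov assumption guarantees for this graph.

```python
# Toy Bayes net for the chain X -> Y -> Z, with invented parameters.
from itertools import product

p_x = {1: 0.3, 0: 0.7}                                        # P(X)
p_y_given_x = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}      # P(Y | X), indexed [x][y]
p_z_given_y = {1: {1: 0.6, 0: 0.4}, 0: {1: 0.05, 0: 0.95}}    # P(Z | Y), indexed [y][z]

def joint(x, y, z):
    # The Markov factorization: P(X, Y, Z) = P(X) P(Y | X) P(Z | Y)
    return p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]

# The factored joint is a proper probability distribution ...
assert abs(sum(joint(*v) for v in product((0, 1), repeat=3)) - 1.0) < 1e-12

# ... and X (a non-parental non-descendant of Z) is independent of Z given Y.
def p_z_given(x, y, z):
    return joint(x, y, z) / sum(joint(x, y, zz) for zz in (0, 1))

for y in (0, 1):
    assert abs(p_z_given(1, y, 1) - p_z_given(0, y, 1)) < 1e-12
```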

Learning Complex Causal Structures -- 7 The Faithfulness assumption (also called Stability by Pearl, 2000) states: If variables X and Y are (probabilistically) independent conditional on some set S ⊆ V \ {X, Y}, then there is no (graphical) edge between them. This assumption simply rules out cases in which two edges exactly cancel each other’s effects. For a real-life example, suppose that running increases your metabolism, which ordinarily causes you to lose weight. However, suppose that running also increases your appetite, so you eat more and usually gain weight. Faithfulness (or rather, a causal version of the assumption) simply says that these two chains do not exactly counter-balance each other so that your running makes no difference on your weight. This assumption is necessary to learn the most parsimonious graph for a particular dataset. Note that these two assumptions are each other’s converse. The Markov assumption says “no edge implies conditional independence,” and the Faithfulness assumption says “conditional independence implies no edge.” Bayes Net Learning Algorithms One often hears that “association does not imply causation.” That is, if we are only told that two variables A and B are associated, we cannot determine whether A causes B, B causes A, there is a common cause of the two, or some combination of these possibilities. This underdetermination is often taken to preclude the possibility of learning algorithms that take data as input and output causal graphs. While this underdetermination is an insurmountable problem for two variables, we might hope to learn the “true” Bayes net given more complex data, and given some rather general (i.e., not domain-specific) assumptions about the way causal and probabilistic relationships are intertwined (such as the above Markov and faithfulness assumptions).
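To make the cancellation worry concrete, here is a small simulation sketch (ours, with an invented linear Gaussian parameterization of the running example) in which the two paths from Running to Weight are tuned to cancel exactly; the resulting distribution is Markov to the graph but violates Faithfulness, since Running and Weight end up uncorrelated despite the graphical connections.

```python
# Faithfulness violation by exact cancellation (illustrative, invented coefficients).
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
running = rng.normal(size=n)
metabolism = running + rng.normal(size=n)          # Running -> Metabolism
appetite = running + rng.normal(size=n)            # Running -> Appetite
# Metabolism lowers Weight, Appetite raises it; the path products cancel:
# (1.0)(-0.5) + (1.0)(+0.5) = 0.
weight = -0.5 * metabolism + 0.5 * appetite + rng.normal(size=n)

print(np.corrcoef(running, weight)[0, 1])      # ~0: Running and Weight look independent
print(np.corrcoef(metabolism, weight)[0, 1])   # clearly nonzero
```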

Learning Complex Causal Structures -- 8 Two different types of learning algorithms have been proposed for learning Bayes nets from data: Bayesian updating and constraint-based search. These strategy types make (essentially) the same asymptotic predictions, but they differ in both process and performance profiles. In a Bayesian updating algorithm (e.g., Cooper, 2000; Geiger, et al., 1996; Heckerman, et al., 1994), we begin by assigning a probability distribution (typically uniform, or close to it) to the space of possible graphs over the variables in V. This distribution encodes any prior beliefs we might have about the likelihood of each graph (e.g., that a particular causal connection is particularly likely or unlikely). For each graph, we also assign a probability distribution over the parameters of the graph, which define the possible families of probability distributions encoded by the graph. So if, for example, we know that variables are only ever connected in noisy-OR gates, we can encode that information in each graph’s parameter distribution. As we receive data, we use standard Bayesian updating to revise both the probability distribution over the search space, and also the probability distributions for the parameters in each of the possible graphs. After processing all of the data, a Bayesian updating procedure then outputs an updated (and asymptotically correct) probability distribution over the space of possible graphs. Bayesian search procedures are quite flexible, and often give more useful information than their constraint-based counterparts. However, if we have n different variables in our system, then the number of possible graphs is at least exponential in n. Therefore, Bayesian search procedures are only practical if we use a series of heuristics, even though we can prove that the use of these heuristics is sometimes incorrect. Constraint-based procedures (e.g., Spirtes, et al., 1993) use patterns of conditional and unconditional independencies and associations to determine the equivalence class of graphs that could have produced that pattern. An equivalence class (in this context) is a set of graphs that all

Learning Complex Causal Structures -- 9 imply (by the Markov and faithfulness conditions) the same conditional and unconditional associations and independencies. The equivalence class output is usually represented as a partially directed graph with associated rules for transforming the output into the full equivalence class. For example, consider the simple common cause structure presented in Figure 1. Recall that we are assuming a provided partition of the variables into potential causes and the effect. If we take data from that graph and pass it as input to a suitable constraint-based algorithm, then (assuming there are no latent variables) the algorithm outputs the equivalence class given in Figure 2. In other words, the constraint-based algorithm cannot tell which of the three graphs is the true one, but it can tell us that no other graphs could have produced the data (if the Markov and faithfulness assumptions hold). These three graphs are the only ones that produce the observed pattern of (unconditional and conditional) associations. Notice, though, that the third graph is not a legitimate possibility for a typical causal learning situation. If E is a cause of C1, then E must come before C1. But if that were the case, then we never would have thought that C1 could possibly be a cause of E. Therefore, we know that the true graph must be one of the top two. And notice that, in each of these graphs, C1 is a cause of E and C2 is not a (direct) cause of E. Constraint-based methods also have limitations. As the number of connections within the graph increases, the number of conditional independence checks required grows exponentially. Hence, these methods are not practical for dense graphs. Also, constraint-based methods sometimes require determining independence conditional on a large number of other variables. In these cases, the statistical test for independence often has little power, and so does not provide a reliable answer, potentially leading to incorrect algorithm output.
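The sketch below illustrates the constraint-based idea on data simulated from the common cause structure of Figure 1 (C1 → C2 and C1 → E). It is only a caricature of algorithms such as PC: independence is judged by crudely thresholding empirical contrasts (defensible only for large samples), and all parameter values are invented for the example.

```python
# Caricature of constraint-based structure learning (edge removal only).
from itertools import combinations, product
import random

random.seed(0)

def sample_common_cause(n):
    """Data from the Figure 1 structure, C1 -> C2 and C1 -> E (invented parameters)."""
    data = []
    for _ in range(n):
        c1 = random.random() < 0.5
        c2 = random.random() < (0.8 if c1 else 0.1)
        e = random.random() < (0.7 if c1 else 0.1)
        data.append({"C1": c1, "C2": c2, "E": e})
    return data

def prob(data, target, cond):
    rows = [d for d in data if all(d[k] == v for k, v in cond.items())]
    return sum(d[target] for d in rows) / len(rows) if rows else None

def independent(data, x, y, given, eps=0.03):
    """Crude test: x and y are treated as independent given `given` if the
    contrast is near zero for every assignment to the conditioning variables."""
    for values in product([True, False], repeat=len(given)):
        cond = dict(zip(given, values))
        a = prob(data, y, {**cond, x: True})
        b = prob(data, y, {**cond, x: False})
        if a is not None and b is not None and abs(a - b) > eps:
            return False
    return True

data = sample_common_cause(50_000)
variables = ["C1", "C2", "E"]
adjacent = {frozenset(pair) for pair in combinations(variables, 2)}
for x, y in combinations(variables, 2):
    others = [v for v in variables if v not in (x, y)]
    for size in range(len(others) + 1):
        if any(independent(data, x, y, list(s)) for s in combinations(others, size)):
            adjacent.discard(frozenset((x, y)))
            break

# Expected output: the C2-E edge is removed (since C2 is independent of E given C1),
# while C1-C2 and C1-E remain, matching the equivalence class in Figure 2.
print(adjacent)
```

Orienting the edges that remain (and thereby recovering the full equivalence class of Figure 2) requires the additional collider and background-knowledge rules that we omit from this sketch.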

Learning Complex Causal Structures -- 10 For the remainder of this article, we will use the output of a constraint-based algorithm as the normative standard for the information contained in the observed sample. That is, we will assume that the normative response for a particular causal learning situation is whatever information is true in every member of the equivalence class of graphs for the observed data. Our experimental design will allow for easy computation of the normative response, since all participants see the same data (though in differing orders). The choice of a constraint-based algorithm rather than Bayesian updating is purely one of convenience, since the two types of algorithms make essentially the same asymptotic predictions. We do not here claim that constraint-based methods provide a psychologically realistic theory; we only use them to place an upper bound on the amount of information that could possibly be extracted from a particular data set. Causal Learning Theories as Bayes Net Learning Given this general framework for causal modeling, we can now examine what causal models are learned by the earlier-introduced psychological theories. Before examining the underlying similarities of the theories, we should note that none of the theories are expressed in Bayes net terms. It turns out that, for one potential cause, the causal strength ratings for the various theories are maximum likelihood estimates of the wC parameter in the graph in Figure 3, where B represents the always-present background “cause” (Glymour, 1998; Tenenbaum & Griffiths, 2000). Differences among the theories arise from different parameterizations of the Bayes net. If we define δ(X) = 1 if X is present and 0 otherwise, then the parameterizations are:

R-W & Conditional ∆P:  P(E) = wC × δ(C) + wB

Power PC (generative C):  P(E) = wC × δ(C) + wB – wC × wB × δ(C)
Power PC (preventive C):  P(E) = wB – wB × wC × δ(C)

We omit the δ(B) terms from the above equations, since we assume that B is always present and so δ(B) = 1 always. Also, not all pairs of wC and wB are possible for the R-W and conditional ∆P models, since some pairs imply P(E) > 1, which is of course impossible. These pairs correspond to situations in which the conditional ∆P theory does not make a prediction (because the conditional contrasts are not well-behaved). Interestingly, the power PC theory is only a maximum likelihood estimator for its parameterization if we give up Cheng’s notion of a focal set, and instead consider all cases as relevant (Tenenbaum & Griffiths, 2000). The parameterizations generalize to multiple variables as shown in Danks, et al. (2002). The R-W model and conditional ∆P theories are known to continue to be maximum likelihood estimators for multiple potential causes given their particular parameterization, and the power PC theory is conjectured to remain a maximum likelihood estimator for its parameterization (assuming we continue to do without the notion of focal sets). Moreover, if we are principally interested in qualitative predictions (e.g., “C causes E” vs. “C does not cause E”), then the similarity among these theories grows stronger: A factor is predicted to be judged causal if and only if it is associated with the effect, conditional on either (i) the presence or absence of the other (observed) potential causes (in the case of R-W and conditional ∆P); or (ii) the focal set of events (for power PC). The three theories have the same, relatively simple qualitative picture, though they disagree about the exact quantitative details. The power of this characterization of the three theories is that it enables us to rapidly derive qualitative predictions simply from the graphical causal structure. We only need to know (or

Learning Complex Causal Structures -- 12 assume) that the probability distribution is Markov and faithful to the graph to determine the qualitative, binary predictions of the three theories. For example, again consider the simple causal structure given in Figure 1. In this causal structure, C1 is a common cause of both C2 and E. Assuming the probability distribution allows the conditional ∆P theory to make a prediction, then both the R-W model and the conditional ∆P theory predict that humans will judge C1 to be a cause of E, since the two variables are conditionally associated (in any probability distribution Markov and faithful to this graph). However, since C2 and E are not conditionally associated given C1, these two theories predict that people will respond that C2 is not a cause. Determining the power PC prediction is a bit more difficult, since the focal set of events is, according to Cheng, determined on pragmatic grounds (that are left relatively unspecified). However, if both C1 and C2 are taken to be legitimate potential causes of E (and so, following a suggestion in Cheng (1997), the focal set for each variable consists of the events in which the other variable did not occur), then the power PC theory makes the same prediction as the other two theories: people will respond that C1 is a cause and C2 is not a cause. If, on the other hand, the potential causes are not taken seriously (at least when determining the strength of the other cause), then we should expect the focal sets for each variable to be simply every event, and so the theory’s prediction is that both C1 and C2 will be perceived as causal, since they are both unconditionally associated with E. We will refer to the former prediction as “Power PC with restricted focal set,” and the latter prediction as “Power PC with complete focal set.” We earlier considered the equivalence class of graphs for (data drawn from) this simple causal structure, and found that both of the possible graphs had two features in common: C1 was a direct cause of E, and C2 was not a direct cause of E. Causation is, for all three theories under

Learning Complex Causal Structures -- 13 consideration, understood to be direct causation; people are predicted to respond “C is a cause” if and only if C is perceived as a direct cause (relative to the other variables). Therefore, the normatively correct response for data from this common cause structure would be to say that C1 is a cause, and C2 is not a cause. Hence, the R-W model, conditional ∆P theory, and power PC theory with restricted focal sets predict that people will behave normatively; the power PC theory with complete focal sets predicts that people will not behave normatively. Alternately, consider the causal structure in Figure 4, in which both C1 and C2 actually are causes of E. For data Markov and faithful to this graph, both C1 and C2 are both unconditionally and conditionally associated with E. Therefore, all three theories (including both possibilities for the power PC theory) predict that people will respond that both C1 and C2 are causes of E. Furthermore, the equivalence class of graphs outputted by a constraint-based algorithm is the singleton set consisting of the graph in Figure 4, and so this predicted response is the normatively correct one. All three theories assume that the various potential causes are all (potentially) direct parents of the effect. In the simple causal structures described above, the theories all (with the exception of Power PC with complete focal sets) predicted that people will behave normatively. Moreover, the above causal structures exhaust (to our knowledge) the causal structures that have been tested in the psychological literature (when absence of causation is modeled as a zero parameter value). However, there are graphs that cannot be mapped onto the assumed fixed graph without loss of structural information. In the next section, we describe such a causal structure, show that all of these causal theories predict that people will behave non-normatively, and then outline experiments to test these predictions.

Experimental Design

The framework of Bayes nets enables us to construct quite complex causal structures, including ones that cannot be expressed in the “one-layer” form described in the previous section. For some causal structures, all three theories predict that people will not learn the correct causal structure when presented with data representative of (a distribution Markov and faithful to) that graph. The simplest such graph is provided in Figure 5, where U is an unobserved variable (and so is not considered a potential cause). The R-W model and the conditional ∆P theory both predict that a potential cause will be perceived as an actual cause if and only if it is associated with E conditional on the other potential cause. Conditioning on C1 does not change the association between C2 and E, so these two theories predict that C2 will incorrectly be perceived as a cause of E. Somewhat more surprisingly, C1 is also associated with E conditional on C2, since conditioning on a collider induces an association between the (previously independent) causal parents. To see why this surprising conclusion holds, consider a car that will start if and only if both the battery is charged and the gas tank is full (represented graphically as Battery → Car Start ← Gas Tank). The states of the battery and gas tank are independent of each other. However, conditional on the car not starting, the two states are now associated: learning that the battery is charged gives us information about the gas tank (namely, that it must be empty). Hence, as a result of this induced association, these theories predict that C1 will also be perceived as a cause of E.

Now consider the power PC theory. If both C1 and C2 are perceived a priori as legitimate causes, then again each of the potential causes will be predicted to be perceived as an actual cause, since we will condition on sets in which the other potential cause is absent (the restricted focal set prediction). If they are not perceived as potential causes (with respect to determination

Learning Complex Causal Structures -- 15 of the other’s causal status), then C1 will be perceived as not a cause, since it is unconditionally independent of E, and C2 will still be perceived as a cause, since it is unconditionally associated with E (the complete focal set prediction). We might naturally now ask about the normative responses for data drawn from this complex causal graph. If the constraint-based algorithm assumes that there are no unobserved variables, then the equivalence class is given in Figure 6. Excluding the possibility of unobserved variables, there is only one graph that could have produced this particular pattern of associations. However, this graph is not a possibility, since we know that E cannot be a cause of C2 because the effect is never a cause of the potential causes, and so this algorithm makes no prediction. If we use a different algorithm that allows for the possibility of unobserved variables, then the equivalence class is given by the partial ancestral graph (PAG) in Figure 7. This graph is unlike others we have previously seen, and so requires a bit more explanation. This graph says that there is a common cause of C2 and E, and that either (a) C1 is a cause of C2, (b) there is a common cause of C1 and C2, or (c) both (a) and (b) are true. However, regardless of the exact nature of the relationship between C1 and C2, the constraint-based algorithm can determine that neither C1 nor C2 is a cause of E. The only causal connection involving E is the common cause of C2 and E. Hence, the normative response for data from this complex causal structure is that neither C1 nor C2 is a cause of E. We now have predictions for the three different theories, as well as the normative responses. These predictions, as well as the true causal relationships, are summarized in Table 1. We should note that the “Normative response” row does not directly correspond to any currently proposed psychological theory. There have been attempts to develop a psychologically plausible theory that matches the long-run predictions of the Bayes net learning methods

Learning Complex Causal Structures -- 16 (Tenenbaum & Griffiths, 2000; Danks, et al., 2002). We do not here commit ourselves to any particular account for learning Bayes nets; we are asking only whether people behave “as if” they are Bayes net learners, and thus can behave normatively using the information available to them. There are two important features of Table 1 for our purposes. First, none of the theories predicts the normative response for the complex causal structure (and the power PC theory with complete focal sets also predicts a non-normative response for the common cause structure). The complex causal structure thus provides a critical test of the three parameter estimation theories, since correct predictions of non-normative behavior are typically taken to provide stronger confirmation of a theory. Second, the normative response actually gives the correct causal relationships. That is, the truth (or at least, the relevant parts of it) is learnable in all three situations. Hence, if there is value to learning true causal relations in the world, we might reasonably expect that people will produce the normative response (since it also corresponds to partial truth). Before describing the experimental results of this article, we briefly outline the experimental methodology we use. The most common methodology in human causal learning experiments involves showing participants a (not always randomly ordered) series of cases, or occasionally just statistical information about the whole series, and then asking the participants to provide a rating of the “causal strength” (or some rough equivalent) of a particular factor (possibly given some linguistically specified baseline). The ratings are then analyzed in two different ways and tested against the theoretical rating predictions (averaged orderings in, e.g., Buehner and Cheng, 1997; Lober and Shanks, 2000; averaged ratings in, e.g., Baker, et al., 1993; Wasserman, et al., 1993). Both analysis methods are potentially seriously flawed, however, since

Learning Complex Causal Structures -- 17 each makes substantial (untested) assumptions about the stability of rating scales both within and between participants. Moreover, the differences in the various theories’ predicted ratings are often quite slight, and so one ought to use as robust an analysis method as possible. Keeping with the spirit of the theoretical predictions provided in this section, we use a different methodology. Rather than asking for ratings, we query only for binary decisions about whether a particular factor is or is not a cause of the putative effect. That is, participants are shown a series of cases and, after some number of cases, are asked whether a factor is a cause of the effect. They provide a simple “yes/no” classification, which we conjecture will be a more reliable judgment than numerical ratings. Some validation for this experimental methodology can be found in an experiment in Danks (2001), which used this setup and found strong agreement with some results from Lober and Shanks (2000). Experiment 1 Method Participants were 85 University of California, San Diego (UCSD) students who received partial credit for psychology courses. Participants were told in advance that they would be asked several questions at the end of each series of cases, and that they would receive partial course credit for completing the experiment. The experiments were conducted on a computer. Each participant saw both conditions, and the order of the two conditions was randomly determined for each participant. Participants were presented with the cover story that they were plant biologists examining a novel species of plant. They were told that the local people sprinkled a liquid on the plants, but that no one knew why. Regardless of the reason, however, their job was to try to figure out what caused the plants to bloom: the liquid, the height of the plant, some other factor, or some

Learning Complex Causal Structures -- 18 combination. They were also told that they would be able to view as many cases as they wanted. After every 12 cases1, the participants were given the option to stop examining cases and answer whether the liquid was a cause of the blooming, and, separately, whether the height was a cause (order of the questions was randomly assigned). If the participant said “Yes” to either question, then the participant was asked whether that cause made the plant more likely or less likely to bloom. Participants only had to provide the direction (if any) of the causal strength. Presentation of the control and experimental conditions (described below) were randomized. After seeing the series of cases for one condition, participants were immediately shown the second series; no feedback was provided about their responses. The cover stories were the same, but the colors of the liquid and of the plants (including their flowers) were different for the two series of cases. In the control condition, participants were shown cases drawn from the graph (and associated probability distribution) in Figure 8. This particular probability distribution was chosen in order to have relatively large associations, while keeping the probability of the liquid and the probability of the height constant across the conditions. In the experimental condition, participants were shown cases drawn from the graph (and probability distribution) in Figure 9. This probability distribution was chosen to maximize the conditional association between Liquid and Blooming, while keeping the unconditional probabilities of Liquid = Yes and Height = Tall constant across the two conditions. Predictions for the various theories for these two conditions are provided in the first and third columns of Table 1. Results Despite the fact that participants were also asked whether the cause was generative or preventive, we were only interested for this experiment in the binary choice: was the variable a

Learning Complex Causal Structures -- 19 cause or not? Table 2 provides the joint data for the two conditions. All reported p-values are for χ2 tests. As expected, the responses across the two conditions were significantly different from the 6.25% per response pattern expected by chance (p < .001), and the responses in the two conditions were significantly different from each other (p < .001). That is, participants were not just guessing, and they clearly responded differently in the two different situations. The order in which the two conditions were seen (p > .50), as well as the order of questions (p > .50), had no effect on the participants’ responses in either condition. Also, the number of cases viewed had no effect in either condition (p > .35). We also indicate in Table 2 the cells corresponding to participants whose response profiles fit the predictions of each of the theories (where we have collapsed together the theories that make identical predictions). So, for example, only 2 out of the 85 participants responded to all four questions as predicted by the three parameter estimation theories. 17 of the 85 participants responded “as if” they were normative Bayes net learners. We also pause to highlight the modal response, in which people simply say “Not a cause” for everything; we believe there are many reasons why participants would respond with “Not a cause” for each variable, and we explore those possibilities in subsequent experiments. Discussion We first note that the participants (or at least a significant group of them) were not just confused by the experimental design. If confusion were the best explanation of the data, then we would have expected the response patterns for the two conditions to be similar, but they were significantly different. Hence, (at least a large group of) the participants are not just expressing confusion throughout the experiment by defaulting to saying “Not a cause”.
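To see the statistical signature that the 17 “as if” Bayes net learners were apparently tracking, the following toy simulation (invented parameters, not the actual stimulus distributions of Figures 8 and 9) draws cases from a structure of the form described for Figure 5: C1 → C2 ← U → E, with U unobserved.

```python
# Toy simulation of the complex (experimental-condition) structure:
# C1 -> C2 <- U -> E, with U unobserved. Parameters are invented.
import random

random.seed(1)
cases = []
for _ in range(200_000):
    u = random.random() < 0.5                   # unobserved variable
    c1 = random.random() < 0.5
    c2 = random.random() < (0.9 if (c1 and u) else 0.3 if (c1 or u) else 0.05)
    e = random.random() < (0.8 if u else 0.1)   # E depends only on U
    cases.append((c1, c2, e))

def p_e(pred):
    rows = [e for (c1, c2, e) in cases if pred(c1, c2)]
    return sum(rows) / len(rows)

# C1 and E are unconditionally independent (contrast near 0) ...
print(p_e(lambda c1, c2: c1) - p_e(lambda c1, c2: not c1))
# ... but become associated once we condition on the collider C2 ...
print(p_e(lambda c1, c2: c1 and c2) - p_e(lambda c1, c2: (not c1) and c2))
# ... while C2 is associated with E both unconditionally and conditional on C1.
print(p_e(lambda c1, c2: c2) - p_e(lambda c1, c2: not c2))
print(p_e(lambda c1, c2: c2 and not c1) - p_e(lambda c1, c2: (not c2) and (not c1)))
```

The parameter estimation theories read these nonzero conditional contrasts as evidence that both factors are causes; the normative analysis instead attributes the C2–E association to an unobserved common cause.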

Second, we seem to be justified in assuming that the participants’ causal beliefs had stabilized, which is a necessary assumption for the various theoretical predictions, since (between-participant) the number of cases seen had no effect on participant responses. If only some of the participants looked long enough to reach asymptote, then we might have expected to find some effect of the number of cases seen. However, the lack of effect implies that the majority of the participants either all went to asymptote, or all stopped at the same point prior to asymptote. Therefore, given the experimental design, we have reason to believe that the beliefs of (at least a significant number of) the participants had stabilized.

Most importantly, this experiment presents a serious challenge to the parameter estimation theories outlined in previous sections. Had participants’ responses universally (or nearly so) fit the common prediction of the parameter estimation theories, we could have drawn a relatively simple lesson from the experiment: people are sometimes incapable of producing the normatively correct response. This would be an interesting result, though it would not have helped decide the case for any of the three current theories. However, the fact that the theory that predicts the behavior of the most participants (besides doing nothing at all) is the claim that people are behaving “as if” they are Bayes net learners presents a serious challenge to the parameter estimation theories.

Despite this challenge, there are at least two potential problems with this experiment, though both can be addressed with minor adjustments. First, the three current theories produce the normatively correct prediction if participants do not think Height could possibly be a cause. If Height is not viewed as a legitimate potential cause, then participants should look at the liquid’s unconditional association with the blooming in each condition. The liquid is unconditionally associated with blooming in the control condition (and so would be judged a

cause), and unconditionally independent of the blooming in the experimental condition (and so would be judged to be not a cause). And if Height is a priori ruled out, then the participants should just say “Not a cause” for it in both conditions. Hence, the three current theories all can recover the normatively correct predictions. We can respond to this possibility by using a different cover story in which both potential causes have equal a priori plausibility.

A second open challenge is whether participants are confused by the probability distribution rather than the experimental setup. We earlier argued that the difference in response patterns in the two conditions showed that participants were not simply confused. However, the probability distribution in the control condition was arguably simpler than the distribution in the experimental setup since the experimental setup involves an unobserved cause, and confusion regarding the more complicated probability distribution might have led participants to respond “Not a cause” to both factors in the experimental condition. If this confusion actually occurred, then people might respond normatively even though they were actually following one of the parameter estimation theories. We can answer this challenge by replacing the control graph with the graph in Figure 4 (the common effect structure). The probability distribution for this graph will be as complicated as the distribution for the experimental condition. If participants can determine, for this graph, that both C1 and C2 are causes, then they are not being confused by the probability distribution. And for a graph of this type, all three theories (as well as the normative standard) predict that both C1 and C2 are causes of E. Hence, if participants respond normatively in the control condition and non-normatively in the experimental condition, proponents of the parameter estimation theories cannot explain the non-normative responses by appealing to confusion caused by the probability distribution.
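A quick simulation shows why the Figure 4 common effect structure serves as the new control: both potential causes are associated with the effect unconditionally and conditional on the other, so every theory under consideration (and the normative standard) predicts a “Yes” for both. As before, the parameter values are invented for illustration.

```python
# Toy simulation of the common effect control structure: C1 -> E <- C2.
import random

random.seed(2)
cases = []
for _ in range(100_000):
    c1 = random.random() < 0.5
    c2 = random.random() < 0.5
    e = random.random() < (0.85 if (c1 and c2) else 0.5 if (c1 or c2) else 0.15)
    cases.append((c1, c2, e))

def p_e(pred):
    rows = [e for (c1, c2, e) in cases if pred(c1, c2)]
    return sum(rows) / len(rows)

# Unconditional contrasts: both clearly positive.
print(p_e(lambda c1, c2: c1) - p_e(lambda c1, c2: not c1))
print(p_e(lambda c1, c2: c2) - p_e(lambda c1, c2: not c2))
# Contrasts conditional on the other cause being absent: also clearly positive.
print(p_e(lambda c1, c2: c1 and not c2) - p_e(lambda c1, c2: (not c1) and (not c2)))
print(p_e(lambda c1, c2: c2 and not c1) - p_e(lambda c1, c2: (not c1) and (not c2)))
```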

Learning Complex Causal Structures -- 22 Experiment 2 Method Participants were 97 UCSD students who received partial credit for psychology courses. Participants were told in advance that they would be asked several questions at the end of each series of cases, and that they would receive partial course credit for completing the experiment. The experiments were conducted on a computer. Each participant saw both conditions, and the order of the two conditions was randomly determined for each participant. In each condition, participants were presented with the cover story that they were doctors attempting to determine the cause(s) of a rare disease. In the control condition, the disease was “pneumothoria,” and the potential causes were the levels of “vidarons” and “mesomins.” In the experimental condition, the disease was “hemeosis,” and the potential causes were “endolyte” and “acrocyst” levels. Participants were also warned about the possibility of an unknown cause. They were then told that they would be able to view as many patients as they wanted. Before the participants saw any patient data, they were asked to provide a judgment of the likelihood that each potential cause was an actual cause. The rating was based on a 100-point scale, with three benchmarks provided: 0 = definitely not a cause; 50 = equal chance of being a cause and not being a cause; 100 = definitely a cause. Participants had to enter a number for each of the potential causes. They were then shown a series of patient data, with a reminder above each case that they were supposed to determine the cause or causes of the disease: potential cause #1, potential cause #2, or something else. In each condition, the actual names of the potential causes were used instead of, e.g., “potential cause #1.” The patient data consisted of two bars representing the level of the potential causes (high or low), as well as the word “High” or “Low” next to each bar. The

Learning Complex Causal Structures -- 23 bars were in different colors, and the two series of cases used different colors for the bars. Below the bars, the disease was listed as either “Present” or “Absent.” The patient number was also presented. After every 12 cases, the participants were given the option to stop examining cases and answer whether each potential cause was a cause of the disease (with question order randomly assigned). After seeing one series of cases, participants were shown the cover story for the other series, entered ratings, saw patient data, and answered the questions. In the control condition, participants were shown cases from the graph (and associated probability distribution) in Figure 10. In the experimental condition, participants were shown cases drawn from the graph (and probability distribution) in Figure 11. These probability distributions were chosen so that they were equally complicated (when the experimental condition is marginalized over the observed variables), and to keep the unconditional probabilities of the factors and the disease the same across conditions. Results & Discussion Recall that all of the theories under consideration will predict that participants will give the normative response in the control condition (namely, both potential causes are actual causes). The predictions for the experimental condition remain the same. Table 3 provides the number of participants who provided each possible response pattern. As in Experiment 1, the response tables for the two conditions were significantly different from chance (p < .001) and from each other (p < .001), indicating (at least most of) the participants were not simply falling back on default responses. The initial ratings also indicated that most participants treated the potential causes as legitimate, as 86% of them rated both potential causes between 40 and 60 in each condition. Furthermore, even participants who gave ratings outside of this range seemed to take

Learning Complex Causal Structures -- 24 the causes seriously, as some participants rated a potential cause with a “0,” but then subsequently answered that it was a cause. Unlike in Experiment 1, however, there was an order effect: participants who saw the control condition first were more likely to answer “No; No” in both conditions (p < .01 in control condition; p < .005 in experimental condition). We have no explanation for this effect. The response patterns in the control also differed based on the number of cases observed, as participants who saw more cases were more likely to respond (correctly) “Yes” to both variables (p < .01). Most importantly, we once again observe a small number of participants (in this case, zero) responding as predicted by the parameter estimation theories. Moreover, by changing both the cover story and the control condition, the two most obvious challenges to Experiment 1 have been answered. Moreover, the case for people behaving “as if” they were learning Bayes nets remains relatively strong, since a significant group of participants responded that way. Moreover, we note that, in the control condition, C2 is a weaker cause than C1, and so people should respond C2 = “No” if they either have a “strength threshold” before they attribute causation, or if they will only attribute causation to one variable in each situation. Hence, one could argue (on behalf of any of these theories) that the theory’s predictions potentially extend to the second column in Table 3 as well. However, note that this move does not substantially help the parameter estimation theories (they now predict two people’s behavior rather than none), but makes a large difference for the normative view (31 participants vs. 13). A new potential challenge arises, however, since the modal participant response remained simply to say “Not a cause” for everything. The presence of a large number of participants who responded “Not a cause” to everything requires further exploration and explanation. In particular, we might naturally wonder whether there are individuals that would never respond “Is a cause”

for any variable (whether because the causal connections are indeterministic – the cause doesn’t always produce the effect – or because of a lack of confidence, or some other feature). If there are such individuals, then the parameter estimation theories could perhaps be excused for failing to predict their behavior; the unusual behavior points towards a possible “way out” for the proponents of the parameter estimation theories. In Experiment 3, we make a minor change to the cover story (emphasizing indeterminism) and institute a pretest to try to reduce the number of participants who respond “Not a cause” to every potential cause.

Experiment 3

Method

Participants were 151 UCSD students who received partial credit for psychology courses. Participants were told in advance that they would be asked several questions at the end of each series of cases, and that they would receive partial course credit for completing the experiment. The experiments were conducted on a computer. Each participant saw both conditions, and the order of the two conditions was randomly determined for each participant. For this experiment, we added a pretest “condition” before the control and experimental conditions to test whether individual participants would ever attribute causation in an indeterministic setting, even if there is only one cause and it is quite strong. In every condition, participants were presented with the cover story that they were doctors attempting to determine the cause(s) of a rare disease. In the pretest, the disease was “hemeosis,” and the potential causes were “endolyte” and “acrocyst” levels. In the control condition, the disease was “chromocystis,” and the potential causes were “denolate” and “moliton” levels. In the experimental condition, the disease was “pneumothoria,” and the potential causes were the levels of “vidarons” and “mesomins.” Participants were warned about the possibility of an unknown cause, and reminded

Learning Complex Causal Structures -- 26 that causation need not require determinism. They were then told that they would be able to view as many patients as they wanted. Before the participants saw the patient data for each series, they were asked to provide a judgment of the likelihood that each potential cause was an actual cause. The rating was based on a 100-point scale, with three benchmarks provided: 0 = definitely not a cause; 50 = equal chance of being a cause and not being a cause; 100 = definitely a cause. Participants had to enter a number for each of the potential causes. They were then shown a series of patient data, with a reminder above each case that they were supposed to determine what causes the disease: potential cause #1, potential cause #2, or something else. The patient data consisted of two bars representing the level of the potential causes (high or low), as well as the word “High” or “Low” next to each bar. The bars were in different colors, and the two series of cases used different colors for the bars. Below the bars, the disease was listed as either “Present” or “Absent.” The patient number was also presented. After every 12 cases, the participants were given the option to stop examining cases and answer whether each potential cause was a cause of the disease (with question order randomly assigned). Participants always saw the filter condition as the first series. After then seeing one of the other two series of cases, participants were shown the cover story for the other series, entered ratings, saw patient data, and answered the questions. The patient data for the pretest were drawn from the graph (and associated probability distribution) in Figure 12. Note that there is only one potential cause in the pretest, and it is a relatively strong cause. The cases for the control condition were drawn from Figure 13. As a side effect of equating the probability distributions in the control and experimental conditions, “moliton level” is a preventive cause in this condition. The cases for the experimental condition

were drawn from Figure 14. Note that the three current theories all predict that “mesomin level” will be viewed as a preventive cause.

Results & Discussion

Table 4 provides the complete data for all 151 participants, and Table 5 provides the data for the 56 participants who passed the pretest. These tables are significantly different (p < .002; tables normalized to the same total number of individuals). Since we are principally interested in those individuals who “passed” the pretest, we will focus on Table 5 for the remainder of the analysis. These results are significantly different from chance (p < .001) and the results for each condition were significantly different from each other (p < .001). The predicted responses for the two conditions in this experiment are the same as those predicted for Experiment 2. Unfortunately, the vast majority of the participants did not fit any of the previously discussed theories. Nonetheless, we can explain the observed responses with a rather simple adjustment. The second potential cause in the control condition (moliton level) is only weakly associated with the effect (∆P = 0.2, both conditional and unconditional). Hence, if there is a threshold (of varying type, depending on the theory) under which people will not attribute causation, then the theories might well predict that C1 (denolate level) will be judged a cause, but C2 (moliton level) will not. This assumption corresponds to shifting the theoretical predictions in the first column over to the second column. As in Experiment 2, broadening the possible theoretical predictions enables both the parameter estimation and “as if Bayesian” theories to predict more participants’ behavior. Once again, however, the “as if” theory fares much better, as it now explains the behavior of almost three-quarters of the participant population, while the parameter estimation theories only explain 7.1% of the participants.

Learning Complex Causal Structures -- 28 The pretest seems to have successfully removed those individuals who say “Not a cause” to every variable, as only one of the 56 participants who passed the pretest proceeded to say “Not a cause” to everything in both the control and experimental conditions. Interestingly, though, there were many fewer “Not a cause”-everywhere responses even among participants who failed the pretest (ten out of 151 in the full participant population). This result suggests that other adjustments for this experiment were also successful. Besides the pretest, the substantive differences were (i) introduction of a preventive cause, and (ii) explicit reminders in the cover stories that causation could be non-deterministic. Introducing a preventive cause should, if anything, have confused participants even more, thereby increasing the number of “Not a cause”everywhere participants. Hence, we conclude that cover stories for experiments involving indeterministic causation should explicitly remind participants that causation need not be deterministic. Finally, we highlight the fact that nearly 63% of the participants failed the pretest; we do not here offer an explanation of this relatively large pattern of failure in a seemingly simple causal learning situation. General Discussion and Conclusions The central motivation for the experiments described in this paper was to determine whether people behaved non-normatively, as predicted by the three parameter estimation theories, when placed in a complex causal learning situation. These experiments give us strong reasons to think that people do not behave as predicted by the parameter estimation theories. The underlying assumption that people are simply estimating parameters in a one-layer graph does not seem to hold up in the face of this data. In addition to providing evidence that the parameter estimation theories are incorrect, these experiments give support to the notion that (at least some) people can learn something

Learning Complex Causal Structures -- 29 relevantly isomorphic to a Bayesian network. These experiments focused only on responses, and did not attempt to directly determine the structure of people’s representations of causal structures. However, participants’ abilities to respond “as if” they were Bayes net learners supports the theory that individuals’ internal representations are similar (in relevant ways) to Bayes nets. Of course, these experiments do not tell us anything about the specific algorithms used (e.g., Bayesian learning vs. constraint-based learning); they do, however, suggest that any psychological theory of Bayes net learning must provide a mechanism for the postulation of unobserved causes. More generally, participants seem to be using a range of strategies. If we classify participants by the theoretical predictions they satisfy, then the distribution of (apparent) learning strategies in Experiments 1 and 2 are not significantly different. In fact, the idea that participants are using a range of strategies has recently been supported in other experiments. Both Lober and Shanks (2000) and Buehner, et al. (2001) have conducted experiments with one potential cause and found evidence of two distinct participant subgroups: one using causal power, and one using ∆P. Hence, the experiments reported in this paper can also be viewed as providing evidence for broadening the scope of the “distinct strategies” claim: when presented with a sufficiently difficult causal learning situation, different individuals may use several different strategies, not just two. The structure of these experiments also prevents a possible Power PC reply that we have not yet considered: perhaps there is some (as yet unspecified) focal set in which the theory predicts the normative (and majority in Experiment 3) response. The Power PC theory has the flexibility to consider multiple possible focal sets; perhaps we have simply not yet found the one that yields the correct prediction. For the most complex causal structure (e.g., Figure 5),

Learning Complex Causal Structures -- 30 however, there is no non-trivial focal set such that C2 is independent of E. That is, the only sets in which C2 is conditionally independent of E (and so predicted to be judged not a cause) are those defined precisely to result in conditional independence. No independent considerations (e.g., the first 10 cases, those in which C1 is absent, etc.) lead to a focal set in which C2 is independent of E. Hence, these experiments present a serious challenge to all three current theories; the flexibility of the Power PC theory does not exempt it from these results. This work is complementary to that of a growing number of researchers. Tenenbaum & Griffiths (2000) argue that ratings data is better explained by a Bayesian learner of Bayesian networks than by the three current theories discussed here. Their theory is not directly applicable to these experiments, however, because standard Bayesian learning has no non-arbitrary method for introducing latent variables. Gopnik, et al. (2002) argue that very young children have internal representations of causal structure that are similar in important ways to Bayes nets. Their work uses a different experimental design that is centered on interventions, and does not provide any testable learning theory. In this latter regard, the reasoning justifying their conclusions is quite similar to that provided here: correct prediction of responses suggests an underlying representation similar to that posited by the theory. Finally, Waldmann and his colleagues have recently published several papers arguing that causal learning is Bayes net learning (Waldmann & Martignon, 1998; Hagmayer & Waldmann, ????). This research gives further support for the claim here that people are learning Bayes nets when they learn causal structure, but does not provide any further theoretical work to better specify the theory. Their theoretical work focuses on estimating parameters of a Bayes net, and not on learning the structure of the network. Hence, there seems to be a growing convergence of evidence from a variety of sources that, for many people at any rate, individual causal learning is (relevantly similar to) learning the

structure and parameters of a Bayesian network. Despite this convergence, substantial open problems remain, which we simply note here: How do individuals learn causal structure in an on-line fashion (i.e., without storing all of the observed data)? Do individuals actually posit unobserved causes in the above experiments? If so, under what conditions do people introduce latent variables to explain their observations? Answers to these questions would substantially advance the psychological theory that causal learners are Bayes net learners.
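To make the focal-set argument above concrete, the following is a minimal illustrative sketch, our own rather than part of the experimental materials or of any theory's official specification; the variable names, helper functions, and particular focal sets checked are our choices, and only the probabilities come from Figure 11. The sketch enumerates the joint distribution implied by that parameterization and computes ∆P for C2 (Acrocyst level) with respect to E (Hemeosis) within several candidate focal sets.

    # Minimal illustrative sketch (ours; not part of the original materials):
    # enumerate the joint distribution implied by the Figure 11 parameterization
    # and compute DeltaP for C2 (Acrocysts) with respect to E (Hemeosis)
    # within several candidate focal sets.
    from itertools import product

    P_C1 = 0.5                                  # P(Endolytes = High)
    P_U = 0.5                                   # P(Unobserved factor = Yes)
    P_E_GIVEN_U = {True: 1.0, False: 0.0}       # P(Hemeosis | Unobserved)
    P_C2_GIVEN = {(True, True): 1.0, (True, False): 2 / 3,    # P(Acrocysts = High | C1, U)
                  (False, True): 1 / 3, (False, False): 0.0}

    # Joint probability of every (C1, U, C2, E) configuration.
    joint = {}
    for c1, u, c2, e in product([True, False], repeat=4):
        p = (P_C1 if c1 else 1 - P_C1) * (P_U if u else 1 - P_U)
        p *= P_C2_GIVEN[(c1, u)] if c2 else 1 - P_C2_GIVEN[(c1, u)]
        p *= P_E_GIVEN_U[u] if e else 1 - P_E_GIVEN_U[u]
        joint[(c1, u, c2, e)] = p

    def delta_p_for_c2(in_focal_set):
        """DeltaP = P(E | C2 present) - P(E | C2 absent), within a focal set."""
        def p_e_given_c2(c2_val):
            num = sum(p for (c1, u, c2, e), p in joint.items()
                      if in_focal_set(c1, u) and c2 == c2_val and e)
            den = sum(p for (c1, u, c2, e), p in joint.items()
                      if in_focal_set(c1, u) and c2 == c2_val)
            return num / den
        return p_e_given_c2(True) - p_e_given_c2(False)

    focal_sets = [("all cases", lambda c1, u: True),
                  ("C1 present", lambda c1, u: c1),
                  ("C1 absent", lambda c1, u: not c1),
                  ("unobserved factor present", lambda c1, u: u)]
    for name, in_focal_set in focal_sets:
        print(f"{name:28s} DeltaP for C2 = {delta_p_for_c2(in_focal_set):+.3f}")

Under the listed parameters, ∆P for C2 is positive in each focal set definable from the observed variables that we check here (approximately +0.33 over all cases and +0.60 within each level of C1), and it is zero only in the focal set defined by the unobserved factor itself, which is exactly the kind of set that cannot be specified on independent grounds.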

References

Ahn, W., & Bailenson, J. (1996). Causal Attribution as a Search for Underlying Mechanisms: An Explanation of the Conjunction Fallacy and the Discounting Principle. Cognitive Psychology, 31, 82-123.

Ahn, W., Kalish, C. W., Medin, D. L., & Gelman, S. A. (1995). The Role of Covariation Versus Mechanism Information in Causal Attribution. Cognition, 54, 299-352.

Baker, A. G., Mercier, P., Vallée-Tourangeau, F., Frank, R., & Pan, M. (1993). Selective Associations and Causality Judgments: Presence of a Strong Causal Factor May Reduce Judgments of a Weaker One. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 414-432.

Buehner, M. J., & Cheng, P. W. (1997). Causal Induction: The Power PC Theory versus the Rescorla-Wagner Model. In M. G. Shafto & P. Langley (Eds.), Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society (pp. 55-60). Mahwah, NJ: LEA Publishers.

Buehner, M. J., Cheng, P. W., & Clifford, D. (2001). From Covariation to Causation: A Test of the Assumption of Causal Power. Manuscript submitted for publication.

Cheng, P. W. (1997). From Covariation to Causation: A Causal Power Theory. Psychological Review, 104, 367-405.

Cooper, G. F. (2000). A Bayesian Method for Causal Modeling and Discovery Under Selection. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI2000).

Danks, D. (2001). The Epistemology of Causal Judgment. Ph.D. dissertation: Philosophy Department, University of California, San Diego.

Danks, D. (2002). Equilibria of the Rescorla-Wagner Model. Forthcoming in Journal of Mathematical Psychology.

Danks, D., Griffiths, T. L., & Tenenbaum, J. B. (2002). Dynamical Causal Learning. Forthcoming in Advances in Neural Information Processing Systems 15 (NIPS-2002).

Geiger, D., Heckerman, D., & Meek, C. (1996). Asymptotic Model Selection for Directed Networks with Hidden Variables. Microsoft Research Technical Report: MSR-TR-96-07.

Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., Kushnir, T., & Danks, D. (2002). A Theory of Causal Learning in Children: Causal Maps and Bayes Nets. Forthcoming in Psychological Review.

Hagmayer, Y., & Waldmann, M. R. (????). Simulating Causal Models: The Way to Structural Sensitivity.

Heckerman, D., Geiger, D., & Chickering, D. M. (1994). Learning Bayesian Networks: The Combination of Knowledge and Statistical Data. Microsoft Research Technical Report: MSR-TR-94-09.

Lober, K., & Shanks, D. R. (2000). Is Causal Induction Based on Causal Power? Critique of Cheng (1997). Psychological Review, 107, 195-212.

Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge: Cambridge University Press.

Rescorla, R. A., & Wagner, A. R. (1972). A Theory of Pavlovian Conditioning: Variations in the Effectiveness of Reinforcement and Nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical Conditioning II: Current Research and Theory (pp. 64-99). New York: Appleton-Century-Crofts.

Shanks, D. R. (1995). Is Human Learning Rational? The Quarterly Journal of Experimental Psychology, 48A, 257-279.

Spellman, B. A. (1996). Conditionalizing Causality. In D. R. Shanks, K. J. Holyoak, & D. L. Medin (Eds.), Causal Learning: The Psychology of Learning and Motivation, Vol. 34 (pp. 167-206). San Diego: Academic Press.

Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, Prediction, and Search (2nd edition, 2001). Cambridge, MA: AAAI Press and The MIT Press.

Tenenbaum, J. B., & Griffiths, T. L. (2000). Structure Learning in Human Causal Induction. In Advances in Neural Information Processing Systems 13.

Waldmann, M. R., & Martignon, L. (1998). A Bayesian Network Model of Causal Learning. In M. A. Gernsbacher & S. J. Derry (Eds.), Proceedings of the Twentieth Annual Conference of the Cognitive Science Society. Mahwah, NJ: Erlbaum.

Wasserman, E. A., Elek, S. M., Chatlosh, D. L., & Baker, A. G. (1993). Rating Causal Relations: Role of Probability in Judgments of Response-Outcome Contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 174-188.

Footnotes

1. The restriction to stopping only after some multiple of 12 cases ensured that every participant saw exactly the probability distribution described, and not some random sample approximating the distribution.

2. The language of “drawn from” is perhaps misleading. The experimental design enabled us to ensure that participants in fact saw this frequency distribution, regardless of when they stopped.

Author Note

Experiments 1 and 2 were performed as part of the first author’s dissertation for the Department of Philosophy, University of California, San Diego. Thanks to Clark Glymour for discussions about the design, implementation, and analysis of these experiments, as well as for comments on earlier drafts. Patricia Cheng provided valuable comments about the experimental design, as well as about ways to clarify exactly what mechanisms might underlie the responses. Alison Gopnik provided detailed comments on an early version of this article. The results of Experiments 1 and 2 were presented at the University of Colorado, Boulder, where the audience posed a number of difficult questions about the experiments. A pilot study of Experiment 1 was supported by funds from the Valtz Chair at UC, San Diego.

Table 1: Theoretical Predictions for Three Different Causal Structures

                                      Figure 1 structure    Figure 4 structure    Figure 5 structure
                                      C1       C2           C1       C2           C1       C2
Conditional ∆P & Rescorla-Wagner      Yes      No           Yes      Yes          Yes      Yes
Power PC (restricted focal sets)      Yes      No           Yes      Yes          Yes      Yes
Power PC (complete focal sets)        Yes      Yes          Yes      Yes          No       Yes
Normative response                    Yes      No           Yes      Yes          No       No
True causal relations                 Yes      No           Yes      Yes          No       No

Table 2: Participant Responses for Experiment 1 (N = 85)

                             C1 = Yes; C2 = Yes (Exp)   C1 = Yes; C2 = No (Exp)   C1 = No; C2 = Yes (Exp)   C1 = No; C2 = No (Exp)
C1 = Yes; C2 = Yes (Cont)    0                          1 (1.2%)                  5 && (5.9%)               4 (4.7%)
C1 = Yes; C2 = No (Cont)     2 ** (2.4%)                4 (4.7%)                  11 (12.9%)                17 ^^ (20%)
C1 = No; C2 = Yes (Cont)     0                          1 (1.2%)                  0                         1 (1.2%)
C1 = No; C2 = No (Cont)      1 (1.2%)                   1 (1.2%)                  3 (3.5%)                  34 (40%)

‘**’ = Conditional ∆P/Rescorla-Wagner/Power PC (restricted focal sets)
‘&&’ = Power PC (complete focal sets)
‘^^’ = Normative response

Table 3: Participant Responses for Experiment 2 (N = 97)

                             C1 = Yes; C2 = Yes (Exp)   C1 = Yes; C2 = No (Exp)   C1 = No; C2 = Yes (Exp)   C1 = No; C2 = No (Exp)
C1 = Yes; C2 = Yes (Cont)    0 **                       1 (1.0%)                  6 && (6.2%)               13 ^^ (13.4%)
C1 = Yes; C2 = No (Cont)     2 (2.1%)                   0                         9 (9.3%)                  18 (18.6%)
C1 = No; C2 = Yes (Cont)     0                          0                         2 (2.1%)                  1 (1.0%)
C1 = No; C2 = No (Cont)      0                          2 (2.1%)                  9 (9.3%)                  34 (35.1%)

‘**’ = Conditional ∆P/Rescorla-Wagner/Power PC (restricted focal sets)
‘&&’ = Power PC (complete focal sets)
‘^^’ = Normative response

Table 4: Response Table for Control and Experimental Conditions in Experiment 3 (All Participants, N = 151)

                             C1 = Yes; C2 = Yes (Exp)   C1 = Yes; C2 = No (Exp)   C1 = No; C2 = Yes (Exp)   C1 = No; C2 = No (Exp)
C1 = Yes; C2 = Yes (Cont)    3 ** (2.0%)                5 (3.3%)                  0 &&                      7 ^^ (4.6%)
C1 = Yes; C2 = No (Cont)     18 (11.9%)                 17 (11.3%)                6 (4.0%)                  79 (52.3%)
C1 = No; C2 = Yes (Cont)     0                          1 (0.7%)                  0                         2 (1.3%)
C1 = No; C2 = No (Cont)      0                          3 (2.0%)                  0                         10 (6.6%)

‘**’ = Conditional ∆P/Rescorla-Wagner/Power PC (restricted focal sets)
‘&&’ = Power PC (complete focal sets)
‘^^’ = Normative response

Table 5: Response Table for Control and Experimental Conditions of Experiment 3 (Passed Pretest Only, N = 56)

                             C1 = Yes; C2 = Yes (Exp)   C1 = Yes; C2 = No (Exp)   C1 = No; C2 = Yes (Exp)   C1 = No; C2 = No (Exp)
C1 = Yes; C2 = Yes (Cont)    0                          1 (1.8%)                  0                         3 (5.4%)
C1 = Yes; C2 = No (Cont)     4 (7.1%)                   6 (10.7%)                 3 (5.4%)                  38 (67.9%)
C1 = No; C2 = Yes (Cont)     0                          0                         0                         0
C1 = No; C2 = No (Cont)      0                          0                         0                         1 (1.8%)

‘**’ = Conditional ∆P/Rescorla-Wagner/Power PC (restricted focal sets)
‘&&’ = Power PC (complete focal sets)
‘^^’ = Normative response

Figure Captions

Figure 1. A simple causal structure
Figure 2. Equivalence class for common cause graph
Figure 3. Fixed graph for parameter estimation
Figure 4. A second simple causal structure
Figure 5. Complex causal structure
Figure 6. Equivalence class for complex graph (no latent variables)
Figure 7. Equivalence class for complex graph (latent variables allowed)
Figure 8. Control condition for Experiment 1
Figure 9. Experimental condition for Experiment 1
Figure 10. Control condition for Experiment 2
Figure 11. Experimental condition for Experiment 2
Figure 12. Pretest for Experiment 3
Figure 13. Control condition for Experiment 3
Figure 14. Experimental condition for Experiment 3

Figure 1
[Diagram: a causal graph over C1, C2, and E]

Figure 2
[Diagram: the graphs in the equivalence class of the common cause graph, each over C1, C2, and E]

Figure 3
[Diagram: a fixed graph over potential causes B and C and effect E, with associated causal strengths wB and wC]

Figure 4
[Diagram: a causal graph over C1, C2, and E]

Figure 5
[Diagram: a causal graph over C1, C2, E, and an unobserved variable U]

Figure 6
[Diagram: the equivalence class for the complex graph over C1, C2, and E, with no latent variables allowed]

Figure 7
[Diagram: the equivalence class for the complex graph over C1, C2, and E, with latent variables allowed]

Figure 8
[Diagram: causal graph over Liquid, Height, and Blooming]
P(Liquid = Yes) = 0.5
P(Bloom = Yes | Liquid = Yes) = 1.0
P(Bloom = Yes | Liquid = No) = 0.5
P(Height = Tall | Liquid = Yes) = 1/3
P(Height = Tall | Liquid = No) = 2/3
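As a concrete illustration of footnote 1's "multiple of 12" restriction, the following sketch (our own, not part of the original materials; it assumes, as the listed conditional probabilities suggest, that Height and Blooming are generated independently given Liquid) enumerates the exact case frequencies implied by the Figure 8 parameterization for a block of 12 cases.

    # Illustrative sketch (ours, not from the original materials): the exact frequencies,
    # per block of 12 cases, implied by the Figure 8 parameterization, assuming Height
    # and Blooming are conditionally independent given Liquid.
    from fractions import Fraction as F
    from itertools import product

    P_LIQUID = F(1, 2)
    P_BLOOM_GIVEN_LIQUID = {True: F(1), False: F(1, 2)}
    P_TALL_GIVEN_LIQUID = {True: F(1, 3), False: F(2, 3)}

    for liquid, tall, bloom in product([True, False], repeat=3):
        p = P_LIQUID if liquid else 1 - P_LIQUID
        p *= P_TALL_GIVEN_LIQUID[liquid] if tall else 1 - P_TALL_GIVEN_LIQUID[liquid]
        p *= P_BLOOM_GIVEN_LIQUID[liquid] if bloom else 1 - P_BLOOM_GIVEN_LIQUID[liquid]
        liq = "Yes" if liquid else "No"
        ht = "Tall" if tall else "Short"
        bl = "Yes" if bloom else "No"
        print(f"Liquid={liq:3s}  Height={ht:5s}  Bloom={bl:3s}  cases per 12: {p * 12}")

Every joint outcome occurs a whole number of times in such a block (2, 0, 4, 0, 2, 2, 1, and 1 cases, respectively), and 12 is the smallest block size with this property for these parameters, which is presumably why stopping was restricted to multiples of 12.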

Figure 9
[Diagram: causal graph over Liquid, Height, Blooming, and an unobserved factor]
P(Liquid = Yes) = 0.5
P(Unobserved = Yes) = 0.5
P(Bloom = Yes | Unobserved = Yes) = 1.0
P(Bloom = Yes | Unobserved = No) = 0.0
P(Height = Tall | Liquid = Yes; Unobserved = Yes) = 1.0
P(Height = Tall | Liquid = Yes; Unobserved = No) = 1/3
P(Height = Tall | Liquid = No; Unobserved = Yes) = 2/3
P(Height = Tall | Liquid = No; Unobserved = No) = 0.0
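A similar check for the experimental condition (again our own sketch; it assumes that Blooming depends only on the unobserved factor and that Height depends only on Liquid and the unobserved factor, as the listed conditional probabilities indicate) sums out the unobserved factor and shows the distribution participants could actually observe.

    # Illustrative sketch (ours): the observable frequencies, per block of 12 cases,
    # implied by the Figure 9 parameterization once the unobserved factor is summed out.
    from collections import Counter
    from fractions import Fraction as F
    from itertools import product

    P_LIQUID, P_U = F(1, 2), F(1, 2)
    P_BLOOM_GIVEN_U = {True: F(1), False: F(0)}
    P_TALL_GIVEN = {(True, True): F(1), (True, False): F(1, 3),   # keyed by (Liquid, U)
                    (False, True): F(2, 3), (False, False): F(0)}

    observable = Counter()
    for liquid, u, tall, bloom in product([True, False], repeat=4):
        p = (P_LIQUID if liquid else 1 - P_LIQUID) * (P_U if u else 1 - P_U)
        p *= P_TALL_GIVEN[(liquid, u)] if tall else 1 - P_TALL_GIVEN[(liquid, u)]
        p *= P_BLOOM_GIVEN_U[u] if bloom else 1 - P_BLOOM_GIVEN_U[u]
        observable[(liquid, tall, bloom)] += p      # sum out the unobserved factor

    for (liquid, tall, bloom), p in sorted(observable.items(), reverse=True):
        liq = "Yes" if liquid else "No"
        ht = "Tall" if tall else "Short"
        bl = "Yes" if bloom else "No"
        print(f"Liquid={liq:3s}  Height={ht:5s}  Bloom={bl:3s}  cases per 12: {p * 12}")

Here, too, every observable combination of Liquid, Height, and Blooming occurs a whole number of times per block of 12 cases, and Height and Blooming co-vary both overall and within each level of Liquid, even though in the Figure 9 structure neither is a cause of the other.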

Figure 10
[Diagram: causal graph over Vidaron level, Mesomin level, and Pneumothoria]
P(Vidarons = High) = 0.5
P(Mesomins = High) = 0.5
P(Pneumothoria = Present | Vidarons = High; Mesomins = High) = 1.0
P(Pneumothoria = Present | Vidarons = High; Mesomins = Low) = 2/3
P(Pneumothoria = Present | Vidarons = Low; Mesomins = High) = 1/3
P(Pneumothoria = Present | Vidarons = Low; Mesomins = Low) = 0.0

Figure 11
[Diagram: causal graph over Endolyte level, Acrocyst level, Hemeosis, and an unobserved factor]
P(Endolytes = High) = 0.5
P(Unobserved = Yes) = 0.5
P(Hemeosis = Present | Unobserved = Yes) = 1.0
P(Hemeosis = Present | Unobserved = No) = 0.0
P(Acrocysts = High | Endolytes = High; Unobserved = Yes) = 1.0
P(Acrocysts = High | Endolytes = High; Unobserved = No) = 2/3
P(Acrocysts = High | Endolytes = Low; Unobserved = Yes) = 1/3
P(Acrocysts = High | Endolytes = Low; Unobserved = No) = 0.0

Figure 12
[Diagram: causal graph over Endolyte level, Acrocyst level, and Hemeosis]
P(Endolytes = High) = 0.5
P(Acrocysts = High) = 0.5
P(Hemeosis = Present | Endolytes = High) = 0.8
P(Hemeosis = Present | Endolytes = Low) = 0.2

Figure 13
[Diagram: causal graph over Denolate level, Moliton level, and Chromocystis]
P(Denolates = High) = 0.5
P(Molitons = High) = 0.5
P(Chromocystis = Present | Denolates = High; Molitons = High) = 0.8
P(Chromocystis = Present | Denolates = High; Molitons = Low) = 1.0
P(Chromocystis = Present | Denolates = Low; Molitons = High) = 0.0
P(Chromocystis = Present | Denolates = Low; Molitons = Low) = 0.2

Figure 14
[Diagram: causal graph over Vidaron level, Mesomin level, Pneumothoria, and an unobserved factor]
P(Vidarons = High) = 0.5
P(Unobserved = Yes) = 0.5
P(Pneumothoria = Present | Unobserved = Yes) = 1.0
P(Pneumothoria = Present | Unobserved = No) = 0.0
P(Mesomins = High | Vidarons = High; Unobserved = Yes) = 0.8
P(Mesomins = High | Vidarons = High; Unobserved = No) = 1.0
P(Mesomins = High | Vidarons = Low; Unobserved = Yes) = 0.0
P(Mesomins = High | Vidarons = Low; Unobserved = No) = 0.2
