Extending the Role of Causality in Probabilistic Modeling (Extended Abstract)

Joost Vennekens, Marc Denecker, and Maurice Bruynooghe
{joost, marcd, maurice}@cs.kuleuven.be
Dept. of Computer Science, K.U.Leuven, Celestijnenlaan 200A, B-3001 Leuven, Belgium

Abstract

The remarkable success of Bayesian networks in probabilistic modeling seems to be at least in part due to the causal interpretation that can be given to such networks, i.e., the fact that the parents of a node can be seen as causally determining this node itself. For the most part, however, this causal interpretation remains an informal guideline. Indeed, it is not reflected in the formal semantics of Bayesian networks, which is expressed in terms of probabilistic independencies and conditional probabilities, rather than causal relations. In this paper, we propose a probabilistic modeling language that has causality at the heart of its fundamental constructs. We first show how this language can be used to express independencies similar to those expressed by a Bayesian network. We then examine how the fundamental causal principles of this language lead to different ways of modeling certain probabilistic relations and compare the two representations.

Keywords: Causal Reasoning, KR languages, Logic Programming, Probability, Bayesian networks

1 Introduction

There is a lot of research into the relation between Bayesian networks and causality [8]. However, while causality seems to play an important part in explaining the success of Bayesian networks as a probabilistic modeling language, there is at heart nothing necessarily causal about this formalism. Indeed, formally, a Bayesian network just expresses probabilistic independencies and conditional probabilities; the fact that most Bayesian networks tend to be written or interpreted with causal relations in mind is wholly immaterial to their semantics. Moreover, the causal interpretation that can be given to acyclic Bayesian networks is no longer possible for cyclic networks. As such, causality seems to be a somewhat coincidental feature of Bayesian networks, rather than a fundamental property. In this paper, we investigate a language with causality at its very core. This language allows causal relations between propositions to be expressed, but also incorporates a probabilistic component. Concretely, we introduce the following construct, which we call a conditional probabilistic event, CP-event for short: "If propositions b1, . . . , bn hold, then a probabilistic event will happen that causes at most one of propositions h1, h2, . . . , hm, where the probability of h1 being caused is α1, the probability of h2 is α2, . . . , and the probability of hm is αm." We use the following syntax to represent a CP-event of the above form: (h1 : α1) ∨ · · · ∨ (hm : αm) ← b1, . . . , bn. We will consider the language of Conditional Probabilistic Event Logic, or CP-logic for short, consisting of sets of such CP-events. Such a set is called a CP-theory. The meaning of a CP-theory is based on two fundamental principles that seem to be crucial in constructing a good causality-based representation of probabilistic knowledge. The first principle is that of independent causation.
It states that every CP-event represents an independent causal process; in other words, learning the outcome of one event might give information about whether or not some other event will happen, but not about what the outcome of this event will be, should it in fact happen. This principle is crucial to arriving at a modular representation. Moreover, as we will see, it enables us to represent the relation between an effect and a number of independent possible causes for this effect in a compact and natural way. Finally, it also allows us, to a certain extent, to abstract from the order in which CP-events happen. The second principle, which we call the "no deus ex machina"-principle, is that nothing happens without a cause, i.e., all propositions should remain false unless there is a cause for them to become true. This is a fundamental principle of causal reasoning and will turn out to be vital to our interpretation of CP-events as causal processes, especially in the presence of cyclic causal relations. It also allows us to write compact representations, because cases where nothing happens, i.e., there is no cause for a proposition, can just be ignored. Under these two principles, a CP-theory can be seen as constructively defining a unique probability distribution over interpretations of the propositions. At each step, this constructive process simulates a single CP-event. Such a simulation derives proposition hi with probability αi, but can only be performed if all the propositions b1, . . . , bn have already been derived. Moreover, each event can occur at most once. This process will start from the empty set, i.e., initially nothing has been derived yet, and will end once there are no more CP-events left to simulate. The probability of an interpretation, then, is the sum of the probabilities of all possible derivations of this interpretation. It can be shown that the precise order in which CP-events are simulated does not matter, i.e., all sequences will construct the same distribution. This follows from the principle of independent causation, together with the monotonicity of such sequences of simulations, i.e., the fact that if, at a certain time, all preconditions to a CP-event are satisfied, they will remain satisfied. Moreover, the "no deus ex machina"-principle is clearly incorporated in this semantics, because a proposition is only derived if it is caused by a CP-event with satisfied preconditions. To sketch some of the interesting properties of this language, we consider two ways in which a person might get infected by the HIV virus: sexual intercourse with an infected partner and blood transfusion.
For concreteness, assume that the probability of contracting HIV from an infected partner is 0.6 and that the probability of contracting it through a blood transfusion is 0.01. For the case of two partners a and b, of which only a has received a blood transfusion, we can model this example by the following CP-theory:

(hiv(a) : 0.6) ← hiv(b).
(hiv(b) : 0.6) ← hiv(a).
(hiv(a) : 0.01).

As this example shows, the principle of independent causation makes it easy to represent the relation between an effect and a number of independent causes for this effect in a compact, clear and modular way, with each possible cause corresponding to a single CP-event. Moreover, this principle also makes the representation elaboration tolerant, in the sense that adding (or removing) an additional cause simply corresponds to adding (removing) a single rule. For instance, if b undergoes a blood transfusion as well, we only need to add a rule (hiv(b) : 0.01). Because of the "no deus ex machina"-principle, the cyclic causal relation between hiv(a) and hiv(b) can be represented in precisely the same way as an acyclic one. Indeed, the first two rules will act as one would expect from such a causal loop: if neither a nor b has been infected by an external cause, then neither is infected, i.e., by itself such a loop does not cause anything; if precisely one of a and b has been infected by an external cause, then the probability of the other also being infected is 0.6. Another useful consequence of the "no deus ex machina"-principle is that domains can be represented in a compact way, because cases in which a proposition is not caused can simply be ignored. Indeed, we do not need to mention that without either intercourse with an infected partner or blood transfusion, an HIV infection is impossible. In the full definition of CP-logic, negated propositions will also be allowed to act as preconditions for CP-events. Together with the "no deus ex machina"-principle, this will lead to even more compact representations, because, in this way, the absence of a cause for one proposition can act as a cause for another proposition. For instance, when reasoning about actions, we can express frame axioms, stating that the absence of a cause for the termination of a fluent is a cause for its persistence. Another example, which we will discuss later, is that of a game in which the absence of a cause for winning or losing is a cause for continuing it. By introducing negation, however, the previously mentioned monotonicity property will be lost, i.e., it will no longer be the case that if the preconditions to a conditional experiment are satisfied at a certain point in time, they are guaranteed to remain satisfied. Consequently, a set of conditional experiments no longer necessarily corresponds to a meaningful description of a probability distribution.
Indeed, for instance, it can now be the case that by executing some CP-event, we would actually derive a cause for not executing it. Such problems can be avoided by introducing an extra constraint that basically states that the falsity of an atom must not act in any way as a cause for this atom itself. In the next section, we formally define CP-logic and its semantics. We then discuss some interesting links between this language and Logic Programming. The rest of this paper is devoted to an analysis of the role of causality in probabilistic modeling, by means of a comparison between CP-logic and Bayesian networks. We first show that CP-logic also offers a way of stating the kind of probabilistic independence assumptions expressed by a Bayesian network. We then investigate how and when the causal principles behind CP-logic lead to different ways of modeling probabilistic relations.
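As an aside for readers who want to experiment, the constructive process sketched above is easy to simulate. The following Python sketch (our own illustration, not part of the original system; the rule encoding and names are ours) runs the three-rule HIV theory many times and estimates the resulting marginals.

```python
import random

# Each rule is (head, probability, positive body); every CP-event happens
# at most once, and only when its body already holds.
rules = [
    ("hiv(a)", 0.6,  {"hiv(b)"}),
    ("hiv(b)", 0.6,  {"hiv(a)"}),
    ("hiv(a)", 0.01, set()),
]

def run_once(rng):
    derived, done = set(), set()
    while True:
        ready = [i for i, (h, p, body) in enumerate(rules)
                 if i not in done and body <= derived]
        if not ready:
            return derived
        i = rng.choice(ready)      # by independent causation, order is irrelevant
        done.add(i)
        head, p, _ = rules[i]
        if rng.random() < p:       # the event causes its head
            derived.add(head)

rng = random.Random(0)
n = 100_000
runs = [run_once(rng) for _ in range(n)]
print(sum("hiv(a)" in r for r in runs) / n)   # close to 0.01
print(sum("hiv(b)" in r for r in runs) / n)   # close to 0.01 * 0.6 = 0.006
```

Since only a has an external cause, hiv(a) can only come from the transfusion rule, and hiv(b) only via the causal loop, which matches the estimates.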

2 Conditional Probabilistic Event Logic

Syntactically, a CP-theory is a set of rules of this form:

(h1 : α1) ∨ · · · ∨ (hn : αn) ← b1, . . . , bm.    (1)

The hi are atoms, the bi are literals, and the αi are numbers between 0 and 1, s.t. α1 + · · · + αn ≤ 1. Formally, we only consider ground CP-theories, i.e., atoms are simply propositional symbols. In examples, however, CP-theories containing variables will also be used; such theories are simply viewed as abbreviations for their grounding w.r.t. the Herbrand universe. For a rule r of the above form, we use body(r) to refer to the set {b1, . . . , bm} of literals. We also use body+(r) to denote the set of all atoms that occur only positively in body(r) and body−(r) to denote all atoms that occur negatively. The set of pairs {(h1, α1), . . . , (hn, αn)} will be denoted by head(r). By headAt(r) we mean the set {h1, . . . , hn} of all atoms appearing in the head of r. Rules of the form (h : 1) ← body(r) are written as h ← body(r). In this way, every normal logic program is also a CP-theory. For now, we will require CP-theories to be stratified, meaning that there has to exist a way of assigning to each atom p a level lvl(p) ∈ N, s.t. for each rule r and h ∈ headAt(r), for all b ∈ body+(r), lvl(h) ≥ lvl(b), while for all b ∈ body−(r), lvl(h) > lvl(b). The level of a rule r is defined as the minimum of the levels of all atoms appearing in r. The existence of a stratification is a well-known condition in logic programming [1]. From our perspective, it is useful because any stratified CP-theory corresponds to a meaningful description of a probability distribution in terms of conditional experiments. Indeed, by executing CP-events with a lower level first, we can make sure that, by the time we need to decide whether a precondition ¬p of some event holds, all events that might cause p have already been executed. As such, if p has not yet been derived by that time, then it will never be derived and, therefore, the event can safely be executed. We will now formally define the semantics of a CP-theory in the way we outlined in the introduction.
We use the mathematical structure of a probabilistic transition system. This is a tree structure T, in which every edge is labeled with a probability. To each node c, we associate an interpretation I(c). A node c executes a rule r of form (1) if c has as its children precisely nodes c0, c1, . . . , cn, where I(c0) = I(c) and, for all i > 0, I(ci) = I(c) ∪ {hi}; the probability of the edge (c, c0) is 1 − (α1 + · · · + αn) and, for i > 0, the probability of (c, ci) is αi. A rule r is executable in a node c if body(r) holds in I(c), i.e., body+(r) ⊆ I(c) and body−(r) ∩ I(c) = {}, and no ancestor of c already executes r. A probabilistic transition system T runs a CP-theory C iff:

• For the root r of T, I(r) is {};
• For every node c of T, either c executes an executable rule r, s.t. no executable rule r′ has a lower level than that of r, or no rules are executable in c and c is a leaf.

Such a system T defines a probability distribution over its leaves: the probability of a leaf is the product of the probabilities of all edges in the path from the root to this leaf. From this, a probability distribution πT over interpretations can be derived, by defining the probability πT(I) of an interpretation I as the sum of the probabilities of all leaves c for which I(c) = I. In the full version of this paper, we show that every system T that runs a CP-theory C defines the same probability distribution πT. This is a consequence of the fact that, due to the existence of a stratification for each CP-theory, such runs are monotonic, in the sense that whenever a conditional experiment is executed, its preconditions are guaranteed to hold in all subsequent states. The formal semantics of a CP-theory C is now defined as precisely this unique distribution, which we denote as πC.
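The tree semantics above can be prototyped directly. The following Python sketch (our own illustration, not the authors' implementation; a ground rule is encoded, by our own convention, as a tuple of head-probability pairs, positive body, negative body, and level) builds the full transition tree for the small stratified theory {(p : 0.5). (q : 0.3) ← ¬p.} and reads off πC.

```python
from collections import defaultdict

def executable(r, I, done):
    heads, pos, neg, lvl = r
    return id(r) not in done and pos <= I and not (neg & I)

def distribution(rules):
    dist = defaultdict(float)
    def expand(I, done, prob):
        ready = [r for r in rules if executable(r, I, done)]
        if not ready:                        # c is a leaf
            dist[I] += prob
            return
        r = min(ready, key=lambda r: r[3])   # execute a lowest-level rule first
        heads = r[0]
        done = done | {id(r)}
        rest = 1.0 - sum(p for _, p in heads)
        if rest > 1e-12:                     # edge (c, c0): no head is caused
            expand(I, done, prob * rest)
        for h, p in heads:                   # edges (c, ci): hi is caused
            expand(I | {h}, done, prob * p)
    expand(frozenset(), frozenset(), 1.0)
    return dict(dist)

# (p : 0.5).    (q : 0.3) <- ¬p.      with lvl(p) = 0, lvl(q) = 1
rules = [([("p", 0.5)], frozenset(), frozenset(), 0),
         ([("q", 0.3)], frozenset(), frozenset({"p"}), 1)]
d = distribution(rules)
for I, pr in sorted(d.items(), key=lambda kv: tuple(sorted(kv[0]))):
    print(sorted(I), round(pr, 3))
# [] 0.35
# ['p'] 0.5
# ['q'] 0.15
```

Executing the level-0 rule for p first guarantees that, by the time ¬p is tested, p can no longer be derived, exactly as the stratification argument requires.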

3 The link to logic programming

The distribution πC can also be characterized in a different way, using so-called instances of the CP-theory. Such an instance is a normal logic program that is constructed by making a number of independent probabilistic choices: every rule r of form (1) is either replaced by the rule hi ← body(r) (with probability αi) or removed altogether (with probability 1 − (α1 + · · · + αn)). These instances are interpreted using the well-founded semantics [11] for normal logic programs. In this way, the probability of an interpretation I can be defined as the sum of the probabilities of all instances that have I as their well-founded model. In the full version of this paper, we show that this probability distribution coincides with the semantics πC defined above.[1] The characterization of the semantics of a CP-theory in terms of the well-founded models of its instances is interesting for a number of reasons. Firstly, it allows us to relax the condition that CP-theories have to be stratified. Indeed, this new semantics can easily be defined for every CP-theory whose instances all have a two-valued well-founded model. This is a strictly weaker condition that, intuitively, not only allows CP-theories that admit an up-front, syntactic stratification, but also CP-theories that can only be dynamically stratified, i.e., where it may take some initial derivations to reveal that negation is, in fact, used in a sensible way. Secondly, it allows us to relate CP-logic to logic programming. The equivalence between the two semantics points towards a connection between causality and the well-founded semantics. Such a link could explain, for instance, the usefulness of this semantics in dealing with recursive ramifications in situation calculus [3, 4]. There is also an interesting link between CP-logic and disjunctive logic programs. From a syntactical point of view, a CP-theory can be transformed into a disjunctive logic program by dropping all probabilistic annotations, i.e., ignoring all quantitative information and only retaining the qualitative knowledge. The set of all interpretations that would be assigned a non-zero probability in CP-logic can be regarded as giving a possible world semantics to such a program. It turns out that, for stratified CP-theories, this semantics coincides with the possible model semantics for disjunctive logic programs [10] and, as such, offers an additional causal motivation for this non-standard semantics.
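The instance-based characterization is easy to test on the cyclic HIV theory in which both a and b received a transfusion. In the sketch below (our own illustration; since all bodies are positive, the well-founded model of an instance is simply its least model, computed by forward chaining), we enumerate all 2^4 instances and sum the probabilities of those whose model contains hiv(a).

```python
from itertools import product

# (head, probability, positive body) for each of the four ground rules
rules = [
    ("hiv(a)", 0.6,  {"hiv(b)"}),
    ("hiv(b)", 0.6,  {"hiv(a)"}),
    ("hiv(a)", 0.01, set()),
    ("hiv(b)", 0.01, set()),
]

def least_model(kept):
    m, changed = set(), True
    while changed:
        changed = False
        for h, _, body in kept:
            if body <= m and h not in m:
                m.add(h); changed = True
    return m

p_a = 0.0
for keep in product([True, False], repeat=len(rules)):
    prob, kept = 1.0, []
    for (h, p, body), k in zip(rules, keep):
        prob *= p if k else (1 - p)   # rules are kept or dropped independently
        if k:
            kept.append((h, p, body))
    if "hiv(a)" in least_model(kept):
        p_a += prob

print(round(p_a, 6))   # 1 - 0.99 * (1 - 0.01 * 0.6) = 0.01594
```

hiv(a) holds exactly when a's own transfusion rule is kept, or when b's transfusion rule and the b-to-a infection rule are both kept, which gives the closed form in the comment.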

4 Bayesian networks in CP-logic

CP-logic can express the same kind of knowledge as expressed in a Bayesian network. The semantics of Bayesian networks [7] states that a probability distribution is a model of a network with graph ⟨N, E⟩ iff (1) the conditional probabilities it determines are the same as the appropriate entries in the various tables and (2) the value of a node n is probabilistically independent

[1] Historically, the semantics in terms of instances was defined first [12]. In that paper, CP-logic was called Logic Programs with Annotated Disjunctions and was motivated as a sensible way of combining logic programming and probabilities. Investigating the relation with probabilistic transition systems has led to the interpretation of a rule as a CP-event, which, in turn, has been the basis for the link between CP-logic and causality.


Figure 1: A Bayesian network with nodes Burglary, Earthquake, and Alarm, where Alarm has parents Burglary and Earthquake; P(B) = 0.1, P(E) = 0.2, and P(A | B, E) = 0.9, P(A | B, ¬E) = 0.8, P(A | ¬B, E) = 0.8, P(A | ¬B, ¬E) = 0.1.

of the value of all nodes m, s.t. there is no path from n to m in E, given a value for the parents of n in E. If a distribution π satisfies condition (2) w.r.t. some binary relation E, we say that π is Bayesian w.r.t. E. We now present such a relation for the semantics πC of a CP-theory C. The basic idea is that an atom p has a direct causal influence on an atom q if p appears in the body of a rule that has q in its head. However, by themselves, these direct causal influences are not enough. Indeed, there is another case in which learning the truth of p gives direct (i.e., not mediated by another atom) information about q, namely when p and q are alternative outcomes of the same probabilistic event, i.e., appear in the head of the same clause. To save space, we only consider here CP-theories where all heads contain precisely one disjunct. In the full version, we consider the general case.

Theorem 1. If all rules of a CP-theory C have one atom in their head, then πC is Bayesian w.r.t. the binary relation containing all pairs (p, q) s.t. p has a direct causal influence on q.

From this theorem, we can derive a way of representing a Bayesian network containing only boolean nodes in CP-logic. As we show in the full version of this paper, dealing with nodes with more than two possible values will require the use of rules with more than one disjunct in the head. Rather than formally define this representation, we show how to represent the Bayesian network depicted in Figure 1. The general principle should be clear from this example.

(burg : 0.1).
(earthq : 0.2).
(alarm : 0.9) ← burg, earthq.
(alarm : 0.8) ← ¬burg, earthq.
(alarm : 0.8) ← burg, ¬earthq.
(alarm : 0.1) ← ¬burg, ¬earthq.

The structure of the Bayesian network is indeed mirrored by the structure of the rules: the bodies of the rules for earthquake and burglary are empty, while the rules for alarm have both burglary and earthquake in their body.
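As a quick sanity check (our own, with hypothetical variable names), both readings of this example give the same marginal for alarm: since the four alarm rules have mutually exclusive bodies, exactly one of them is applicable in each (burg, earthq) world, and the causal probabilities coincide with the conditional ones.

```python
from itertools import product

# Parameters of Figure 1, shared by the network and the CP-theory above.
p_burg, p_earthq = 0.1, 0.2
p_alarm = {(True, True): 0.9, (True, False): 0.8,
           (False, True): 0.8, (False, False): 0.1}

# Sum over the four (burg, earthq) worlds; in each world the single
# applicable alarm rule fires with the corresponding probability.
marginal = sum(
    (p_burg if b else 1 - p_burg)
    * (p_earthq if e else 1 - p_earthq)
    * p_alarm[(b, e)]
    for b, e in product([True, False], repeat=2)
)
print(round(marginal, 3))   # 0.298
```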

5 The role of causality

The results of Section 4 give us a way of representing Bayesian networks in CP-logic. Often, however, the most natural way of modeling certain probabilistic relations in CP-logic is different from how one would represent them in a Bayesian network. In this section, we examine those differences and trace them back to the two fundamental principles of CP-logic. In the full version of this paper, we also use the material presented in this section to derive a transformation from CP-logic to Bayesian networks.

5.1 The principle of independent causation

As discussed in Section 4, CP-logic incorporates the same kind of probabilistic independencies as Bayesian networks do. However, the principle of independent causation also allows a different kind of independence to be expressed, namely that between different causes for the same effect. In the introduction, we illustrated this by considering a number of different causes for HIV infection. We return to that example in the next section, where it will be used to illustrate the fact that our methodology for representing independent causation also applies when cyclic causal relations are involved. In this section, we focus on an example containing only acyclic causality:

Example 1. Consider a game of Russian roulette with two guns, one in the player's right hand and one in his left. Each of the guns is loaded with a single bullet. What is the probability of the player dying if he fires both guns?

Firing a gun causes death with probability 1/6. In CP-logic, this can be written as: (death : 1/6) ← fire(Gun). This rule is all that is needed to model the operation of the guns. Indeed, if we also include facts fire(left_gun) and fire(right_gun), then, after grounding, we get the following rules: {(death : 1/6) ← fire(left_gun). (death : 1/6) ← fire(right_gun).}. In words, "firing the left gun" and "firing the right gun" are two independent causes for death and each has a probability of 1/6 of actually causing death. In a Bayesian network, this relation would typically be expressed as follows:

         left, right   ¬left, right   left, ¬right   ¬left, ¬right
death       11/36          1/6            1/6              0

Perhaps the most striking difference between the two representations is that, in CP-logic, the independence between the two causes for death is a qualitative property rather than a quantitative one. Indeed, in the CP-theory, this independence is evident from the fact that fire(left_gun) and fire(right_gun) do not appear in the body of the same rule. In the Bayesian network, on the other hand, it is expressed by the fact that P(death | left, right) = P(death | left, ¬right) + P(death | ¬left, right) − P(death | left, ¬right) · P(death | ¬left, right). In many contexts, the separation between quantitative and qualitative knowledge is an important issue, because these might have different origins or have to be treated differently. For instance, qualitative knowledge is typically more robust to small changes in the specification of a problem. If we were to find out that one of the guns has a mechanical defect, making the probability of the bullet being in front of the hammer not 1/6 precisely, but 11/60 instead, then this would not affect the independence between the two possible causes for death. In the CP-theory, this would be witnessed by the fact that the structure of this theory would not change. For another example, in a machine learning setting, one is typically most interested in parameter learning, i.e., the task of learning the quantitative knowledge, given the qualitative knowledge. It seems that in many cases, it might be useful to have the independence between causes available as part of this structural knowledge. A second difference is that the probabilities in the Bayesian network are conditional probabilities, whereas the probabilities in CP-logic are causal probabilities. Causal probabilities are more informative, in the sense that, together with the principles of independent causation and no deus ex machina effects, they imply the conditional ones. Indeed, it is straightforward to construct the above table from the CP-theory. However, this might lead to an exponential increase in the number of probabilistic parameters.
Indeed, if there are n independent causes for a certain atom, e.g., n different guns in our game of Russian roulette, then this can be expressed by n ground CP-logic rules. The most obvious Bayesian network, however, would require a table with 2^n entries. To avoid this blowup, one can take special care to introduce new nodes in between the causes and the effect. For instance, one can construct an inverse binary tree with the effect death as its leaf and the causes fire(gun_i) as roots. Each one of the new nodes would then, informally speaking, represent the proposition "the effect is caused by at least one of the two parents of this node". To do this, n new nodes need to be introduced, but each conditional probability table now only needs 4 entries. A third difference concerns the elaboration tolerance of the representation. Because of the principle of independent causation, adding an additional cause to an existing CP-theory can be done by simply adding another rule. For instance, if it is also possible that the excitement of the Russian roulette game causes the player to have a heart attack, we can simply include the CP-event "(death : 0.2).", which might lead to death without firing any guns. In the naive, exponential representation outlined above, one would construct a new conditional probability table of twice the size of the original one, half of which would contain new values. In a more complex representation, such as the binary tree one mentioned above, the introduction of an additional cause would only affect those nodes between the new cause and the effect.
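The combination rule at work in these comparisons is worth making explicit: under independent causation, the effect fails to occur only if every one of its causes independently fails, so n causes with probabilities p1, . . . , pn yield an overall probability of 1 − (1 − p1) · · · (1 − pn) (the familiar noisy-or). A small sketch of this calculation (our own, using exact rational arithmetic):

```python
from functools import reduce
from fractions import Fraction

def combined(probs):
    """Probability of the effect, given independent causes `probs`."""
    return 1 - reduce(lambda acc, p: acc * (1 - p), probs, Fraction(1))

guns = [Fraction(1, 6), Fraction(1, 6)]
assert combined(guns) == Fraction(11, 36)   # the table entry for left, right

# Elaboration tolerance: an extra cause (e.g. a 0.2 = 1/5 heart-attack
# risk) is just one more number, not a rewritten table.
print(combined(guns + [Fraction(1, 5)]))    # → 4/9
```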

5.2 The "no deus ex machina" principle

We now return to the following example from the introduction:

Example 2. There are two ways of getting infected by the HIV virus. Firstly, there is a probability of 0.01 of contracting the virus by a blood transfusion. Secondly, one might get infected by an already infected sexual partner. The probability of an infected person infecting his/her partner is 0.6.

(hiv(X) : 0.6) ← hiv(Y), partners(X, Y).
(hiv(X) : 0.01) ← blood_transfusion(X).

Because the partners-relation is symmetric, this example leads naturally to cyclic causal relations. Because of the "no deus ex machina"-principle of CP-logic, cyclic relations can be modeled in the same way as acyclic ones. To make the discussion more concrete, suppose that a and b are the only persons we wish to consider and that they are partners. For simplicity, we ignore the partners and blood_transfusion predicates and simply assume that our grounding process knows who are partners and who have received blood transfusions. The first rule of the CP-theory will lead to the grounding {(hiv(a) : 0.6) ← hiv(b). (hiv(b) : 0.6) ← hiv(a).}. Let us examine how these rules act when part of a larger theory. Firstly, if these two rules were to constitute the entire theory, then, because of the "no deus ex machina"-principle, neither partner would be infected. Now, if partner a undergoes a blood transfusion, the rule "(hiv(a) : 0.01)." would also appear, adding this as an additional cause for hiv(a). In this case, a has a non-zero probability (namely 0.01) of being infected by an external cause and, therefore, b also has a non-zero probability (0.01 × 0.6) of infection. If both a and b undergo a blood transfusion, then the probability of, for instance, hiv(a) will be higher still, because there are now two independent causes for hiv(a): a could have gotten infected by a transfusion, but also because b was first infected by a transfusion and infected a in turn. The best way of representing this causal loop in a Bayesian network seems to be to introduce, for every proposition hiv(x) in this loop, a new node ext_hiv(x), representing the possibility that x has gotten infected by some external (i.e., not in the loop) cause. Every ext_hiv(x) will have as its parents all possible external causes for hiv(x). Because each parent is an independent cause for ext_hiv(x), the probabilities in the conditional probability table for this node can be calculated as in Section 5.1. The children of ext_hiv(x) will be all the hiv(y) propositions in the loop. The entries in the conditional probability tables can be derived from arguments such as those given above, e.g., P(hiv(b) | ext_hiv(a), ¬ext_hiv(b)) = 0.6. In the full version of this paper, we show that the phenomenon described here may also occur when there are no real cyclic causal relations, but only an "apparent" cyclicity, caused by the fact that, initially, the direction of some causal relation might not yet be known. Another consequence of the "no deus ex machina"-principle is that CP-logic does not require cases in which "nothing happens", i.e., an atom is not caused by anything, to be mentioned. Obviously, this can make representations more compact. This feature is made more powerful by the fact that CP-theories can contain negation, which allows the falsity of a certain atom (in other words, the absence of a cause for this atom) to act as a cause for other atoms. To illustrate, we consider the well-known dice game of craps.

Example 3. In craps, one keeps on rolling a pair of dice until one either wins or loses. In the first round, one immediately wins by rolling 7 or 11 and immediately loses by rolling 2, 3, or 12.
If any other number is rolled, this becomes the player's so-called "box point". The game then continues until either the player wins by rolling the box point again or loses by rolling a 7.

(roll(T+1, 2) : 1/11) ∨ · · · ∨ (roll(T+1, 12) : 1/11) ← ¬win(T), ¬lose(T).
win(1) ← roll(1, 7).
win(1) ← roll(1, 11).
lose(1) ← roll(1, 2).
lose(1) ← roll(1, 3).
lose(1) ← roll(1, 12).
boxpoint(X) ← roll(1, X), ¬win(1), ¬lose(1).
win(T) ← boxpoint(X), roll(T, X), T > 1.
lose(T) ← roll(T, 7), T > 1.

Here, we only specify when the game is won or lost and use negation to express that, as long as neither happens, the game carries on. In Bayesian networks, there is no real way of ignoring irrelevant cases. Instead, there will be a probability of zero in the conditional probability table. For this game, we could use variables roll_t, representing the outcome of a certain roll (with domain 2 through 12) and bp representing the box point (with possible values 4, 5, 6, 8, 9, or 10), that influence the state of the game at time t as follows:

(bp, roll_t)   (4,2)  (4,3)  (4,4)  (4,5)  (4,6)  (4,7)  (4,8)  · · ·
Win              0      0      1      0      0      0      0    · · ·
Lose             0      0      0      0      0      1      0    · · ·
Neither          1      1      0      1      1      0      1    · · ·

CP-logic inherits this feature from its logic programming roots. Indeed, for instance, one of the strengths of logic programming in reasoning about actions has always been the fact that it is easy to incorporate frame axioms that state that some property persists unless there is a cause for it to be terminated (see, e.g., [3]). The same phenomenon will of course occur when CP-logic is used to reason about non-deterministic actions.
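For the craps theory above, the quantities of interest can even be computed in closed form (our own illustration, using the standard fact that in a repeated race between two outcomes with per-round probabilities p and q, the first occurs before the second with probability p/(p + q)):

```python
from fractions import Fraction

# Exact win/lose probabilities of the craps CP-theory, under its
# simplifying assumption that every sum 2..12 has probability 1/11.
p = {s: Fraction(1, 11) for s in range(2, 13)}

p_win  = p[7] + p[11]            # immediate win
p_lose = p[2] + p[3] + p[12]     # immediate loss
for box in (4, 5, 6, 8, 9, 10):  # any other roll fixes the box point
    # once the box point is fixed, every later round is a race between
    # rolling `box` (win) and rolling 7 (lose); other rolls just continue
    p_win  += p[box] * p[box] / (p[box] + p[7])
    p_lose += p[box] * p[7]   / (p[box] + p[7])

print(p_win, p_lose)   # 5/11 6/11 — so the game terminates with probability 1
```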

6 Conclusions

We have investigated the role of causality in probabilistic modeling. To this end, we introduced CP-logic, a language based on two fundamental principles. The first is the principle of independent causation, which allows a natural, compact, and elaboration-tolerant representation for the often-occurring pattern of a proposition having a number of independent possible causes. The second principle, that of no deus ex machina effects, allows this methodology to be applied to cyclic causal relations as well. Moreover, it also makes representations more compact, due to the fact that only the cases in which a proposition will be caused need to be mentioned. In the full version of this paper, we show that our semantics is well-defined and equivalent to the instance-based semantics, and that our analysis of the similarities and differences between CP-logic and Bayesian networks suffices to derive transformations between these two formalisms. We also compare our approach to related work, such as [5, 2, 6, 9].


References

[1] K.R. Apt, H.A. Blair, and A. Walker. Towards a theory of Declarative Knowledge. In J. Minker, editor, Foundations of Deductive Databases and Logic Programming, pages 89–148. Morgan Kaufmann, 1988.

[2] C. Baral, M. Gelfond, and N. Rushton. Probabilistic reasoning with answer sets. In Proc. Logic Programming and Non Monotonic Reasoning, LPNMR'04, pages 21–33. Springer-Verlag, 2004.

[3] M. Denecker and E. Ternovska. Inductive situation calculus. In D. Dubois and C. Welty, editors, Principles of Knowledge Representation and Reasoning: Proceedings of the Ninth International Conference (KR2004), pages 545–553, 2004.

[4] M. Denecker, D. Theseider-Dupré, and K. Van Belleghem. An inductive definition approach to ramifications. Linköping Electronic Articles in Computer and Information Science, 3(7):1–43, January 1998.

[5] J.Y. Halpern. An analysis of first-order logics of probability. Artificial Intelligence, 46:311–350, 1989.

[6] K. Kersting and L. De Raedt. Towards combining inductive logic programming and Bayesian networks. In Proceedings of the 11th International Conference on Inductive Logic Programming, 2001.

[7] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.

[8] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.

[9] D. Poole. Abducing through negation as failure: stable models within the independent choice logic. Journal of Logic Programming, 44:5–35, 2000.

[10] C. Sakama and K. Inoue. An alternative approach to the semantics of disjunctive logic programs and deductive databases. Journal of Automated Reasoning, 13(1):145–172, 1994.


[11] A. Van Gelder, K.A. Ross, and J.S. Schlipf. The Well-Founded Semantics for General Logic Programs. Journal of the ACM, 38(3):620–650, 1991.

[12] J. Vennekens, S. Verbaeten, and M. Bruynooghe. Logic programs with annotated disjunctions. In Logic Programming, 20th International Conference, ICLP 2004, Proceedings, volume 3132 of Lecture Notes in Computer Science, pages 431–445. Springer, 2004.
