Learning with Abduction

A.C. Kakas
Department of Computer Science, University of Cyprus
75 Kallipoleos str., CY-1678 Nicosia, Cyprus
[email protected]

F. Riguzzi
DEIS, Università di Bologna
Viale Risorgimento 2, 40136 Bologna, Italy
[email protected]

Abstract

We investigate how abduction and induction can be integrated into a common learning framework through the notion of Abductive Concept Learning (ACL). ACL is an extension of Inductive Logic Programming (ILP) to the case in which both the background and the target theory are abductive logic programs and where an abductive notion of entailment is used as the coverage relation. In this framework it is then possible to learn with incomplete information about the examples by exploiting the hypothetical reasoning of abduction. The paper presents the basic framework of ACL with its main characteristics and illustrates its potential in addressing several problems in ILP, such as learning with incomplete information and multiple predicate learning. An algorithm for ACL is developed by suitably extending the top-down ILP method for concept learning and integrating this with an abductive proof procedure for Abductive Logic Programming (ALP). A prototype system has been developed and applied to learning problems with incomplete information. The particular role of integrity constraints in ACL is investigated, showing ACL to be a hybrid learning framework that integrates the explanatory (discriminant) and descriptive (characteristic) settings of ILP.

1 Introduction

The problem of integrating abduction and induction in Machine Learning systems has recently received renewed attention with several works on this topic [DK96, AD95, AD94, Coh95, ?]. In [DK96] the notion of Abductive Concept Learning (ACL) was proposed as a learning framework based on an integration of Inductive Logic Programming (ILP) and Abductive Logic Programming (ALP) [KKT93]. Abductive Concept Learning is an extension of ILP that allows us to learn the abductive logic programs of ALP, with abduction playing a central role in the coverage relation of the learning problem. The abductive logic programs learned in ACL contain both rules for the concept(s) to be learned and general clausal theories called integrity constraints. These two parts are synthesized together in a non-trivial way via the abductive reasoning of ALP, which is then used as the basic coverage relation for learning. In this way, learning in ACL synthesizes discriminant and characteristic induction to learn abductive logic programs.

This paper presents the basic framework of ACL with its main characteristics and demonstrates its potential for addressing several problems in ILP. From a practical point of view, ACL allows us to learn from incomplete information and to later classify new cases that again could be incompletely specified. Particular cases of interest are learning from incomplete background data and multiple predicate learning, where abductive reasoning is used to suitably connect and integrate the learning of the different predicates.

An algorithm for ACL is presented. This algorithm adapts the basic top-down method of ILP to deal with the incompleteness of information and to take into account the use and learning of integrity constraints. The algorithm also incorporates an abductive proof procedure and other abductive reasoning mechanisms from ALP that are suitably adapted for the context of learning. The algorithm is implemented in a new system, also called ACL. Suitably adapted heuristics have been used that take into account the incompleteness of information. Several initial experiments are presented that demonstrate the ability of ACL to learn with incomplete information and its usefulness in multiple predicate learning.

The rest of this paper is organized as follows. Section 2 presents a short review of ALP needed for the formulation and description of the main properties of ACL, which are presented in Section 3. Section 4 presents the basic algorithm for ACL and its implementation in a first system. Section 5 presents our initial experiments with ACL, and Section 6 discusses related work and concludes with a discussion of future work.

2 Abductive Logic Programming

In this section we briefly review some of the elements of Abductive Logic Programming (ALP) needed for the formulation of the learning framework of Abductive Concept Learning (ACL). For a more detailed presentation of ALP the reader is referred to the survey [KKT93] (and its recent update [KKT97]) and references therein.

Abductive Logic Programming is an extension of Logic Programming to support abductive reasoning with theories (logic programs) that incompletely describe their problem domain. In ALP this incomplete knowledge is captured (represented) by an abductive theory T.

Definition 2.1 An abductive theory T in ALP is a triple ⟨P, A, I⟩, where P is a logic program¹, A is the set of predicates called abducible predicates, and I is a set of first-order closed formulae called integrity constraints².

As a knowledge representation framework, when we represent a problem in ALP via an abductive theory T we generally assume that the abducible predicates in A carry all the incompleteness of the program P in modeling the external problem domain, in the sense that if we (could) complete the information on the abducible predicates in P then P would completely describe the problem domain. Partial information that may be known about these abducible predicates can be represented either explicitly in the program P or implicitly as integrity constraints in I.

An abductive theory can support abductive (or hypothetical) reasoning for several purposes such as diagnosis, planning or default reasoning. The central notion used for this is that of an abductive explanation of an observation or a given goal. Informally, an abductive explanation consists of a set of (ground) abducible facts (also called abductive hypotheses) on some of the abducible predicates in A which, when added to the program P, render the observation or goal true. The integrity constraints in I must always be satisfied by any extension of the program P with such abductive hypotheses for this to be an allowed abductive explanation. To formalize this we first need the notion of a generalized model of an abductive theory.

Definition 2.2 Let T = ⟨P, A, I⟩ be an abductive theory and Δ a set of ground abducible facts from A. M(Δ) is a generalized stable model of T iff
- M(Δ) is the minimal Herbrand model of P ∪ Δ, and
- M(Δ) ⊨ I.
The program P ∪ Δ is called an abductive extension of P or T.

Here the semantics of integrity constraints is defined by the second condition in the definition. Their satisfaction requires that they are true statements in the extension of the program with Δ for this extension to be allowed. An abductive theory is thus viewed as representing a collection of different allowed states given by the set of its generalized models.

¹ For simplicity, we will assume that all logic programs are definite Horn programs.
² In practice, integrity constraints are restricted to be range-restricted clauses.


Definition 2.3 Let T = ⟨P, A, I⟩ be an abductive theory and O any formula³ called an observation (or query). An abductive explanation for O in T is any set Δ of abducible facts from A such that
- M(Δ) is a generalized model of T, and
- M(Δ) ⊨ O.

Definition 2.4 Let T = ⟨P, A, I⟩ be an abductive theory and O any formula. Then O is abductively entailed by T, denoted by T ⊨_A O, iff there exists an abductive explanation of O in T. We say that O is a credulous abductive consequence of T. A formula O is called a skeptical abductive consequence of T iff O is true in all abductive extensions of T.

Note that although the integrity constraints reduce the number of possible explanations for a set of observations, it is still possible for several explanations that satisfy (do not violate) the integrity constraints to exist. This problem is known as the multiple explanations problem. In fact, abduction is often described as inference to the best explanation. To this end, additional criteria can help to discriminate between different explanations, by rendering some of them more appropriate, plausible or preferred over others. A typical criterion is the requirement of minimality (wrt set inclusion) of the explanations. Various (domain specific) criteria of preference can also be specified. Moreover, we may be able to acquire new additional information that refutes or corroborates an explanation. The following example illustrates the above ideas.

Example 2.5 Consider the following abductive framework ⟨P, A, I⟩ with P the logic program on family relations:

father(X, Y) ← parent(X, Y), male(X)
mother(X, Y) ← parent(X, Y), female(X)
son(X, Y) ← parent(Y, X), male(X)
daughter(X, Y) ← parent(Y, X), female(X)
child(X, Y) ← son(X, Y)
child(X, Y) ← daughter(X, Y)
loves(X, Y) ← parent(X, Y)

the integrity constraints I = {← male(X), female(X)}, and abducible predicates A = {parent, male, female}.

Consider now the observation O1 = father(Bob, Jane). An abductive explanation for O1 is the set Δ1 = {parent(Bob, Jane), male(Bob)}. This is the unique minimal explanation. Let now O2 = child(John, Mary) be another observation.

³ In general O can be any formula. In practice, in many cases it suffices for O to be the conjunction of a set of ground facts.


This has two possible explanations, Δ2 = {parent(John, Mary), male(John)} and Δ2′ = {parent(John, Mary), female(John)}. If we also knew that male(John) holds, then Δ2′ would be rejected due to the violation of the integrity constraint. In fact, these two explanations are incompatible with each other. But note that since parent(John, Mary) belongs to both explanations, parent(John, Mary) and loves(John, Mary) are skeptical consequences of this abductive framework and the observation O2.
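To make these notions concrete, the following small Python sketch (an illustration added here, not part of the system described in this paper; all names are illustrative) enumerates the subset-minimal abductive explanations of a ground observation for a ground definite abductive theory, computing the minimal Herbrand model of P ∪ Δ as a fixpoint and checking the integrity constraints, given as denials.

    from itertools import combinations

    def minimal_model(rules, facts):
        # Least Herbrand model of a ground definite program: rules are (head, body) pairs.
        model, changed = set(facts), True
        while changed:
            changed = False
            for head, body in rules:
                if head not in model and all(b in model for b in body):
                    model.add(head)
                    changed = True
        return model

    def explanations(rules, facts, denials, abducibles, observation):
        # All subset-minimal Delta of candidate abducible facts whose generalized model
        # satisfies the denials (integrity constraints) and entails the observation.
        found = []
        for k in range(len(abducibles) + 1):
            for delta in combinations(abducibles, k):
                if any(set(f) <= set(delta) for f in found):
                    continue  # skip supersets of an explanation already found (not minimal)
                m = minimal_model(rules, facts | set(delta))
                violated = any(all(b in m for b in body) for body in denials)
                if not violated and observation in m:
                    found.append(delta)
        return [set(d) for d in found]

    # Ground fragment of Example 2.5 restricted to Bob and Jane.
    rules = [("father(bob,jane)", ["parent(bob,jane)", "male(bob)"])]
    denials = [["male(bob)", "female(bob)"]]  # integrity constraint <- male(X), female(X)
    abducibles = ["parent(bob,jane)", "male(bob)", "female(bob)"]
    print(explanations(rules, set(), denials, abducibles, "father(bob,jane)"))
    # expected output: [{'parent(bob,jane)', 'male(bob)'}]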

In some applications of abduction, e.g. when we apply abductive reasoning in learning (see next section), a case of particular interest is that of explaining negative observations, in other words explaining the falsity (or absence) of an atomic observation. In this case it is useful to extend the basic framework of ALP described above so that the set of abductive explanations contains explicit assumptions of the absence as well as the presence of abductive hypotheses.

According to the above definitions 2.3 and 2.4, a negative fact not A⁴ is explained by the theory T when there exists a generalized model M(Δ) such that M(Δ) ⊭ A (also denoted by M(Δ) ⊨ not A). For any given such Δ, all abducible assumptions that do not belong to M(Δ) are assumed to be false. It is then useful to extend our abductive explanations for not A to record explicitly these negative abducible assumptions. More specifically, it is useful to record in Δ a set of negative abducible assumptions that is sufficient to ensure that the positive observation can not be abductively derived.

To illustrate this, consider again the above example and the negative observation not father(Jane, John). An extended abductive explanation for this is Δ1 = {not parent(Jane, John)} or Δ2 = {not male(Jane)}, where, following notation from ALP, we use not ab to denote that the abducible assumption ab must be false or absent. This then means that, for example in the case of Δ1, any set Δ of abducible assumptions that satisfies the conditions of Δ1, namely that it does not contain the assumption parent(Jane, John), constitutes an abductive explanation for the negative observation not father(Jane, John). Furthermore, if the assumptions of absence of abducible hypotheses can be met, then the negative observation will always hold and so the corresponding positive observation can not be abductively derived.

Definition 2.6 Let T = ⟨P, A, I⟩ be an abductive theory and Q = not A a negative observation (or query). An extended abductive explanation for Q in T is any set Δ of positive and negative abducible facts from A such that
- M(Δ⁺), where Δ⁺ is the subset of Δ containing only the positive assumptions, is a generalized model of T,
- for any negative assumption not ab ∈ Δ, ab ∉ M(Δ⁺), and
- M(Δ⁺) ⊨ Q.

⁴ We will denote negative facts using the Negation as Failure symbol not instead of ¬, as in many cases what is needed is the failure of the atom A rather than the explicit negation of the atom A.


Hence extended abductive explanations can be seen as a specification of a set of sufficient conditions which, when met, ensure that the corresponding positive observation can not be abductively entailed in the credulous sense, i.e. that there is no abductive explanation for the positive observation. The satisfaction of these requirements is then usually achieved by generating or learning suitable integrity constraints that render the positive assumption ab of a negative assumption not ab in Δ inconsistent.
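As an illustration of this non-derivability reading, the following self-contained Python sketch (again added for illustration, with hypothetical names; it is not the procedure used later in the paper) checks that, once an abducible is assumed absent, no allowed extension of a ground definite theory derives the corresponding positive observation.

    from itertools import combinations

    def minimal_model(rules, facts):
        # Least Herbrand model of a ground definite program (rules are (head, body) pairs).
        model, changed = set(facts), True
        while changed:
            changed = False
            for head, body in rules:
                if head not in model and all(b in model for b in body):
                    model.add(head)
                    changed = True
        return model

    def not_derivable(rules, facts, denials, abducibles, absent, atom):
        # True iff no abductive extension that avoids the facts in `absent` and satisfies
        # the denials derives `atom`, i.e. the negative observation "not atom" always holds.
        allowed = [a for a in abducibles if a not in absent]
        for k in range(len(allowed) + 1):
            for delta in combinations(allowed, k):
                m = minimal_model(rules, facts | set(delta))
                if not any(all(b in m for b in body) for body in denials) and atom in m:
                    return False  # found an abductive explanation for the positive atom
        return True

    # Negative observation not father(jane, john) with the extended explanation
    # {not male(jane)}: assuming male(jane) absent blocks every explanation.
    rules = [("father(jane,john)", ["parent(jane,john)", "male(jane)"])]
    denials = [["male(jane)", "female(jane)"]]
    abducibles = ["parent(jane,john)", "male(jane)", "female(jane)"]
    print(not_derivable(rules, set(), denials, abducibles, {"male(jane)"}, "father(jane,john)"))
    # expected output: True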

3 Learning with Abduction

Abductive concept learning can be defined as a case of discriminant concept learning which, however, also contains within it a characteristic learning problem. It thus combines discriminant and characteristic learning in a non-trivial way, using the abductive entailment relation as the coverage relation for the training examples. The language of the hypotheses, like that of the background knowledge, is that of Abductive Logic Programming. The language of the examples is simply that of atomic ground facts.

Definition 3.1 (Abductive Concept Learning, ACL)
Given
- a set of positive examples E⁺ of a concept C,
- a set of negative examples E⁻ of a concept C,
- an abductive theory T = ⟨P, A, I⟩ as background theory,
Find
a new abductive theory T′ = ⟨P′, A, I′⟩ such that
- T′ ⊨_A E⁺,
- ∀e⁻ ∈ E⁻, T′ ⊭_A e⁻.

Therefore, ACL differs from ILP in that both the background knowledge and the learned program are abductive logic programs. As a consequence, the notion of deductive entailment of ILP is replaced by the notion of abductive entailment (⊨_A) in ALP. The learned hypothesis T′ = ⟨P′, A, I′⟩ will contain in P′, together with the background knowledge P, rules for the concept C and (possibly) new integrity constraints in I′ that restrict the abducibles in A and thus implicitly also the definition of the concept in P′. The concept rules in P′ are learned using a form of discriminant induction based on abductive entailment. This discriminant induction is integrated with a form of characteristic induction which from the same data learns integrity constraints that complement the rules in their solution of the discriminant learning problem. Abductive concept learning can be illustrated as follows.

Example 3.2 Suppose we want to learn the concept father. Let the background theory be T = ⟨P, A, ∅⟩ where:

P = {parent(john, mary), male(john), parent(david, steve), parent(katy, ellen), female(katy)}

the incomplete or abducible predicates are A = {male, female}, and let the training data be:

E⁺ = {father(john, mary), father(david, steve)}
E⁻ = {father(katy, ellen)}

In this case, a possible hypothesis T′ = ⟨P′, A, I′⟩ learned by ACL would add to P′ the rule

father(X, Y) ← parent(X, Y), male(X).

under the (extended) abductive assumptions Δ = {male(david), not male(katy)}. The positive assumption comes from the requirement to cover the positive examples, and the negative assumption from the requirement to fail in covering the negative example. In addition, by considering the background knowledge together with the positive assumption male(david) from Δ, we could learn (using characteristic induction) the integrity constraint in I′:

← male(X), female(X).

Note that this integrity constraint provides independent support for the negative assumption not male(katy) in Δ, thus ensuring that the negative example can not be abductively covered by the learned hypothesis.

It is important to note that the treatment of positive and negative examples by ACL is asymmetric. Whereas for the positive examples it is sufficient for the learned theory to provide an explanation of these, for the negative examples there should not exist a single abductive extension of the learned theory that covers a negative example, i.e. all abductive extensions of the theory should imply the corresponding negative fact. In other words, the corresponding negative facts should be skeptical consequences of the learned theory, whereas for the positive facts it is sufficient for these to be credulous consequences.

It is often useful for practical reasons to initially relax this strong requirement on the negative examples and consider an intermediate problem, denoted by I-ACL, where the negative examples are also only required to be credulously uncovered, i.e. there exists in the learned theory at least one extension where the negative examples can not be covered (and of course in this extension it is also possible to cover the positive examples). To this end we define the intermediate problem where the two conditions in the above definition are replaced by:

- T′ ⊨_A E⁺, not E⁻, where not E⁻ = {not e⁻ | e⁻ ∈ E⁻}.

We can then first solve this simpler problem using the notion of extended abductive explanations (as is done in the example above) and then use these explanations as input for further learning in order to satisfy the full ACL problem. We will also see that in some cases solving this intermediate problem is sufficient to solve the full problem of ACL, particularly when the background theory already contains relevant integrity constraints.

3.1 Properties of Abductive Concept Learning

In ACL we can compare two hypotheses by comparing the abductive explanations for the coverage of the positive examples or more generally by comparing their generalized models.

Definition 3.3 Let T = ⟨P, A, I⟩ and E⁺, E⁻ be given. Let T1 and T2 be two hypotheses for this learning problem. T1 is a more general solution than T2 iff for any e⁺ ∈ E⁺ and any abductive explanation Δ2 of e⁺ in T2 there exists an abductive explanation Δ1 of e⁺ in T1 such that Δ1 ⊆ Δ2.

Definition 3.4 Let T1 and T2 be two abductive theories such that, for any generalized model M2(Δ) of T2, Δ is an allowed abductive extension of T1 and its generalized model M1(Δ) contains M2(Δ). Then T1 is a more general theory than T2.

It is easy to see that if T1 is a more general theory than T2 then T1 is a more general solution than T2. Requiring that the hypothesis is maximally general then has the effect that the learned concept rules in the logic program P′ of the hypothesis T′ are minimally specific, avoiding the addition of extra consistent but unnecessary abducible conditions in the body of the rules.

In general, the monotonicity property does not hold for ACL. Given two solutions T1 = ⟨P1, A, I1⟩ and T2 = ⟨P2, A, I2⟩ that cover a positive example, their union T = ⟨P1 ∪ P2, A, I1 ∪ I2⟩ does not necessarily cover this example. This can happen because an abductive explanation which satisfies the integrity constraints I1 (or the constraints I2) may violate the constraints I1 ∪ I2. ACL has a restricted form of monotonicity, as stated by the following lemma.

Lemma 3.5
- If T1 = ⟨P1, A, I⟩ covers e⁺ and T2 = ⟨P2, A, I⟩ covers e⁺, then T = ⟨P1 ∪ P2, A, I⟩ covers e⁺.
- Let T1 = ⟨P1, A, I1⟩ cover e⁺ with abductive explanation Δ and let T2 = ⟨P1 ∪ Δ, A, I2⟩ cover e⁺. Then T = ⟨P1, A, I1 ∪ I2⟩ covers e⁺.

As a consequence of this property, if we remember the abductive assumptions on which the coverage of the positive examples rests, then the generation of the integrity constraints in the learned theory (for uncovering the negative examples) can be monotonic. In the above Example 3.2, in generating the integrity constraint we include in the background knowledge the abducible hypothesis male(david) needed for the coverage of the positive examples, thus ensuring that the generated constraint would not uncover any positive examples.

4 Algorithms

In this section we present an algorithm for ACL. For simplicity, we will concentrate on the case of single predicate learning and discuss briefly at the end of the section how this can be extended to multiple predicates. Also, we will not fully consider here the problem of integrating the learning of integrity constraints in this algorithm, but rather we will assume that these exist (or have been learned) in the background theory. Thus the formal specification of the problem for this algorithm is that of the intermediate problem I-ACL (see Section 3). For more details on these issues the reader is referred to the technical report [KR96] on ACL.

The algorithm is based on the generic top-down algorithm (see e.g. [LD94]), suitably adapted to deal with the incompleteness of the abducible predicates and to take into account the integrity constraints in the background theory. It incorporates (and adapts) algorithms for abductive reasoning from ALP [KM90], extending the algorithm in [ELM+96]. The top level covering and specialization loops of the algorithm are shown in Figure 1 and Figure 2 respectively. The covering loop starts with an empty hypothesis and, at each iteration, a new clause is added to the hypothesis. The new clause to add is generated in the specialization loop. This clause is selected amongst other possible refinements of previous clauses in the current set of clauses (agenda) using a specially defined heuristic evaluation function (Figure 3). The agenda is initialized to a clause with an empty body for each target predicate and is gradually enlarged at each iteration of the specialization loop.

The evaluation of a clause is done by considering each positive example e⁺ and negative example e⁻, and by starting an abductive derivation for e⁺ and not e⁻ respectively, using the procedure defined in [KM90]. This procedure takes as input the goal to be derived and the abductive theory and, if it succeeds, produces as output the set of assumptions abduced during the derivation. The sets of assumptions abduced for earlier examples are also considered as input, to ensure that the assumptions made during the derivation of the current example are consistent with the ones made before. Thus, we can test each positive and negative example separately and be sure that the clause will derive e⁺1 ∧ … ∧ e⁺n ∧ not e⁻1 ∧ … ∧ not e⁻m.

procedure I-ACL(
    inputs:  E⁺, E⁻ : training sets,
             T = ⟨P, A, I⟩ : background abductive theory;
    outputs: H : learned theory, Δ : abduced literals)

  H := ∅
  Δ := ∅
  Agenda := { ⟨p(X) ← true., Value⟩ | p target predicate,
              Value is the value of the heuristic function for the rule }
  while E⁺ ≠ ∅ do                                    (covering loop)
    Specialize(input: Agenda, P, H, E⁺, E⁻, Δ; output: Rule, NewAgenda)
    Evaluate Rule obtaining:
      E⁺_Rule  the set of positive examples covered by Rule
      Δ_Rule   the set of literals abduced during the derivation of e⁺ and not e⁻
    E⁺ := E⁺ − E⁺_Rule
    H := H ∪ {Rule}
    Δ := Δ ∪ Δ_Rule
    Evaluate all the rules in NewAgenda according to the new theory and Δ
    Agenda := NewAgenda
  endwhile
  output H, Δ

Figure 1: ACL, the covering loop


procedure Specialize(
    inputs:  Agenda : current agenda, T : background theory,
             H : current hypothesis, E⁺, E⁻ : training sets,
             Δ : current set of abduced literals;
    outputs: Rule : rule, Agenda)

  Select and remove the Best rule from Agenda
  while Best covers some e⁻ do
    BestRefinements := set of refinements of Best allowed by the language bias
    for all Rule ∈ BestRefinements do
      Value := Evaluate(Rule, T, H, E⁺, E⁻, Δ)
      if Rule covers at least one pos. ex. then add ⟨Rule, Value⟩ to Agenda
    endfor
    Select and remove the Best rule from Agenda
  endwhile
  let Best be ⟨Rule, Value⟩
  output Rule, Agenda

Figure 2: ACL, the specialization loop


function Evaluate(
    inputs: Rule : rule, T = ⟨P, A, I⟩ : background theory,
            H : current hypothesis, E⁺, E⁻ : training sets,
            Δ : current set of abduced literals)
  returns the value of the heuristic function for Rule

  n⁺   := 0    number of pos. ex. covered by Rule without abduction
  n⁺_A := 0    number of pos. ex. covered by Rule with abduction
  n⁻   := 0    number of neg. ex. covered by Rule (not e⁻ has failed)
  n⁻_A := 0    number of neg. ex. uncovered with abduction
  Δin := Δ
  for each e⁺ ∈ E⁺ do
    if AbductiveDerivation(e⁺, ⟨P ∪ H ∪ {Rule}, A, I⟩, Δin, Δout) succeeds then
      if no new assumptions have been made then increment n⁺
      else increment n⁺_A
      endif
    endif
    Δin := Δout
  endfor
  for each e⁻ ∈ E⁻ do
    if AbductiveDerivation(not e⁻, ⟨P ∪ H ∪ {Rule}, A, I⟩, Δin, Δout) succeeds then
      if new assumptions have been made then increment n⁻_A
    else
      increment n⁻
    endif
    Δin := Δout
  endfor
  return Heuristic(n⁺, n⁺_A, n⁻, n⁻_A)

Figure 3: ACL, evaluation of the heuristic function


The positive and negative examples covered are counted, distinguishing between examples covered (uncovered) with or without abduction, and these numbers are used to calculate the heuristic function. The heuristic function of a clause is an expected classification accuracy [LD94]: A(c) = p(⊕|c), where p(⊕|c) is the probability that an example covered by a clause c is positive. In defining this we need to take into account the relative strength of covering an example with abduction (i.e. some assumptions are needed, together with the strength of those assumptions) or without abduction (i.e. no assumption is needed), and similarly for the failure to cover negative examples with or without assumptions. The heuristic function used is

    A = (n⁺ + k⁺·n⁺_A) / (n⁺ + n⁻ + k⁺·n⁺_A + k⁻·n⁻_A)

where n⁺, n⁺_A, n⁻, n⁻_A are as defined in Figure 3. The coefficients k⁺ and k⁻ are introduced in order to take into account the uncertainty in the coverage of the positive examples and in the failure to cover the negative examples when abduction is necessary for this to occur. Further details on these coefficients and the heuristic function can be found in [KR96].

We also mention that the integrity constraints given (or learned first) in the background theory are used in several ways during the computation of the final hypothesis. They can affect significantly the coverage numbers of the rules while these are constructed, by rendering assumptions impossible. One interesting case of their use is their confirmation that negative assumptions of the form not ab(e⁻), generated in the computed extended abductive explanations, are indeed valid, by rendering the assumption ab(e⁻) inconsistent. Then, if not ab(e⁻) is the only assumption that is needed to ensure that the negative example e⁻ is not covered, this example will now not be counted in n⁻_A. Again, details can be found in [KR96].
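As a quick illustration (our own sketch, not the authors' implementation), the heuristic can be computed directly from the four counts of Figure 3; the default values chosen here for the coefficients k⁺ and k⁻ are only placeholders, since the paper defers their actual setting to [KR96].

    def heuristic(n_pos, n_pos_abd, n_neg, n_neg_abd, k_pos=0.5, k_neg=0.5):
        # Expected classification accuracy A = p(+|c): positives covered only via abduction
        # are discounted by k_pos, negatives uncovered only via abduction are penalized by k_neg.
        # The 0.5 defaults are illustrative assumptions, not values from the paper.
        num = n_pos + k_pos * n_pos_abd
        den = n_pos + n_neg + k_pos * n_pos_abd + k_neg * n_neg_abd
        return num / den if den > 0 else 0.0

    # A clause covering 3 positives outright, 2 more only with abduction, covering 1 negative,
    # and leaving 1 negative uncovered only thanks to an abductive assumption:
    print(heuristic(3, 2, 1, 1))  # (3 + 1) / (3 + 1 + 1 + 0.5) ~= 0.727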

4.1 Properties of the Algorithm

The algorithm described above is sound but not complete for the intermediate problem I-ACL. Its soundness is ensured by the soundness of the abductive proof procedure and of its extensions developed for the learning algorithm. As the assumptions abduced for the positive and negative examples are globally recorded, the algorithm can not generate new clauses that make previous clauses inconsistent (i.e. cover negative examples) or that reduce their coverage of positive examples.

The search space is not completely explored. In particular, there are two choice points which are not considered, in order to reduce the computational complexity of the algorithm. The first choice point is related to the greedy search in the space of possible programs.

When no new clause can be added by the specialization loop, backtracking can be performed on the previous clause added: the clause is retracted and the previous agenda is considered. The procedure Specialize is called again and the second best rule is chosen for specialization. The second choice point concerns the abductions that are made in order to cover the examples. In fact, an example can be abductively derived with different sets of assumptions. Depending on which set of assumptions we choose, this can change the outcome of the algorithm later on.

We will now briefly discuss how the above algorithm can be easily extended to learn multiple predicates correctly, and how this can tackle some of the problems of multiple predicate learning. The changes required essentially amount to two simple extensions. First, the candidate clauses for the different predicates are kept together in the agenda, and the choice of which predicate (in fact, which rule of the predicate) to learn further is left to the heuristic value of the rules. This approach is similar to the one adopted in the system MPL [RLD93] in order to tackle the problem of the order of learning different predicates (and different rules, in the case of mutually recursive predicates). Secondly, for each predicate, the other predicates in the set of multiple predicates to be learned are considered as additional abducible predicates of the background theory for this predicate. This then means that information from one rule can be passed, via the abductive explanations, to the construction of other rules for other predicates. In this way we can tackle the second major problem of most intensional empirical ILP systems for multiple predicate learning, namely the fact that the addition of a new consistent clause to a theory can make previous clauses inconsistent, i.e. make them cover negative examples. In other words, we can be sure that local consistency of the currently constructed clause also implies global consistency of the whole theory. The next example shows how ACL can be useful for performing multiple predicate learning, exploiting assumptions made while learning a rule for one predicate as additional training data for learning another predicate.

Example 4.1 Suppose we want to learn the predicates grandfather and father. Let the background theory be:

P = {parent(john, mary), male(john), parent(mary, ellen), female(mary), parent(ellen, sue), parent(david, steve), parent(steve, jim)}
A = {grandfather, father, male, female}

and let the training data for both concepts be:

E⁺ = {grandfather(john, ellen), grandfather(david, jim), father(john, mary)}
E⁻ = {grandfather(mary, sue)}

Suppose that the system first generates the following rule for the grandfather predicate:


grandfather(X, Y) ← father(X, Z), parent(Z, Y).

This will be accompanied by the set of assumptions Δ1 = {father(david, steve), not father(mary, ellen)}, which become additional training data for the father predicate, enabling us to learn the rule:

father(X, Y) ← parent(X, Y), male(X)

abducing now additionally Δ2 = {male(david), not male(mary)}. Note that without the new negative example father(mary, ellen) this rule could not be learned. Conversely, if we first generate in the agenda the rule:

father(X, Y) ← parent(X, Y)

for the father predicate, and then the same rule as above for the grandfather predicate, the extra information on father in Δ1 above will force the further specialization of the rule for father before the algorithm terminates.

Finally, we mention again that the algorithm presented in this section solves the intermediate problem I-ACL, which is a restricted version of ACL, although in some cases this is also sufficient for the full ACL problem. In order to perform full ACL we must incorporate the learning of integrity constraints, to ensure that negative examples are not abductively derivable in the learned theory. One way to address this issue is to integrate into the algorithm a system for characteristic induction, such as Claudien [RB93] (or ICL [RL95]), which takes as input, together with the background theory and the positive (or all) training examples, the abductive assumptions Δ⁺ (or Δ), and generates clauses to be used as integrity constraints.

5 Experiments

The main purpose of the experiments reported in this section is to demonstrate the ability of I-ACL and ACL to learn from incomplete background data and to perform multiple predicate learning. We will consider the well-known multiplexer example and problems of learning family relations from an incomplete database.

5.1 Multiplexer

The multiplexer example is a well-known benchmark for inductive systems [dV89]. It has recently been used in [RD96] to show the performance of the system ICL-Sat on incomplete data. We perform experiments on the same data as [RD96] and compare the results of I-ACL with those of ICL-Sat. The problem consists in learning the definition of a 6-bit multiplexer, starting from a training set where each example consists of 6 bits, the first two of which are interpreted as the address of one of the other four bits. If the bit at the specified address is 1 (regardless of the values of the other three bits), the example is considered positive; otherwise it is considered negative.

                                           I-ACL    ICL-Sat
Complete background                        100 %    100 %
Incomplete background                      98.4 %   82.8 %
Incomplete background plus constraints     96.9 %   92.2 %

Table 1: Performance on the multiplexer data

For example, consider the example 10 0110. The first two bits specify that the third of the remaining bits should be at 1, so this example is positive. We represent this example by adding the positive example mul(e1) to the training set and by including the following facts in the background knowledge:

bit1at1(e1). bit2at0(e1). bit3at0(e1). bit4at1(e1). bit5at1(e1). bit6at0(e1).

For the 6-bit multiplexer problem we have 2⁶ = 64 examples, 32 positive and 32 negative. We perform three experiments, using for each one the same datasets as in [RD96]: the first on the complete dataset, the second on an incomplete dataset and the third on the incomplete dataset plus some integrity constraints. The incomplete dataset was obtained by considering 12 examples out of 64 and by specifying for them only three bits, where both the examples and the bits were selected at random. E.g. the above example 10 0110 could have been replaced by 1? 0?1?. The incomplete example is still in the set of positive examples and its description in the background knowledge would be:

bit1at1(e1). bit3at0(e1). bit5at1(e1).

Now all the predicates bitNatB are abducible, and integrity constraints of the form below are added to the background theory:

← bitNat0(X), bitNat1(X).

The dataset of the third experiment is obtained by adding further integrity constraints to the incomplete dataset. I-ACL was run on all three datasets. The measure of performance adopted is accuracy, defined as the number of examples correctly classified over the total number of examples in the testing set, i.e. the number of positive examples covered plus the number of negative examples not covered, over 64. The testing set is the same as the training set of complete examples. The results are reported in Table 1.

It is interesting to note that the loss of accuracy of I-ACL is not due to negative examples covered by the rules but to positive examples not covered (1 in the experiment on the incomplete dataset and 2 in the experiment with the additional constraints): the learned theories are consistent but not complete.

The accuracy of I-ACL in the third experiment is lower than in the second: this unexpected result is due to the fact that the testing is done with a background theory different from the one used for learning. We also note that the results shown in Table 1 refer to the accuracy of the learned theory with respect to the full ACL problem, showing that the I-ACL system has in this case successfully solved the full ACL problem.

We can also test the theory with incomplete data, thus showing the ability of the generated theories to classify incomplete examples. We tested the theory on the same incomplete data set used for learning in experiments 2 and 3. The results of this different testing are reported in Table 2, where n⁺ is the number of covered positive examples, while n⁻_ACL is defined as the number of negative examples that are (incorrectly) covered according to full ACL, i.e. for which e⁻ abductively succeeds instead of failing. Since the test set contains 32 positive and 32 negative examples, the accuracy of Experiment 2, for instance, is (32 + 32 − 5)/64 ≈ 92.2 %. Similar results are obtained with other randomly generated incomplete subsets of the complete training examples.

               n⁺    n⁻_ACL    Accuracy
Experiment 2   32    5         92.2 %
Experiment 3   32    2         96.9 %

Table 2: Testing with incomplete data

Finally, we performed an experiment where more information was taken out. In this case we completely removed from the training data some of the negative examples. The I-ACL system now learned rules that are incorrect (i.e. cover negative examples), as they are too general, e.g. the rule:

mul(X) ← bit1at1(X), bit2at1(X).

Extending the I-ACL algorithm, we ran Claudien to learn clauses, integrity constraints, that hold on the positive examples. This generated several constraints, amongst which is the constraint

← mul(X), bit1at1(X), bit2at1(X), bit6at0(X)

which effectively corrects the rule above by preventing the negative examples from being abductively classified as positive even when the rule succeeds. This experiment illustrates the potential of ACL to integrate characteristic with discriminant induction.
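To make the encoding concrete, here is a small Python sketch (an illustration added for this presentation, not the code used in the experiments) of the 6-bit multiplexer target concept and of the bitNatB representation described above; the addressing convention is the one consistent with the worked example 10 0110.

    def is_positive(bits):
        # bits: a string of six '0'/'1' characters; the first two bits address one of the
        # remaining four bits (00 -> bit3, 01 -> bit4, 10 -> bit5, 11 -> bit6).
        address = int(bits[0]) * 2 + int(bits[1])
        return bits[2 + address] == "1"

    def background_facts(example_id, bits):
        # The bitNatB facts used in the background knowledge, e.g. bit1at1(e1).
        return ["bit%dat%s(%s)" % (i + 1, b, example_id) for i, b in enumerate(bits)]

    print(is_positive("100110"))             # True: address 10 selects bit5, which is 1
    print(background_facts("e1", "100110"))  # ['bit1at1(e1)', 'bit2at0(e1)', ..., 'bit6at0(e1)']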

5.2 Learning Family Relations

We have considered the problem of learning family predicates, e.g. that of father, from a database of family relations [BDR96] containing facts about parent, male and female. We performed several experiments with different degrees of incompleteness of the background knowledge in order to show the performance of I-ACL as a function of the incompleteness of the data.

                    Learns the correct theory
Completeness   without constraint   with constraint
100 %          yes                  yes
90 %           yes                  yes
80 %           yes                  yes
70 %           no                   yes
60 %           no                   yes
50 %           no                   no

Table 3: Performance on the family data

The complete background knowledge contains, amongst its 740 facts, 72 facts about parent, 31 facts about male and 24 facts about female. The training set contains 36 positive examples of father taken from the family database and 34 negative examples of father that were generated randomly. The concept father was learned from a background knowledge containing 100 %, 90 %, 80 %, 70 %, 60 % and 50 % of the facts. The facts were removed from the background randomly, while the training set was the same for all experiments. Then the same experiments were repeated with a background augmented with the constraint ← male(X), female(X). The results are shown in Table 3: without the integrity constraint, I-ACL was able to learn the correct rule

father(X, Y) ← parent(X, Y), male(X)

with a background containing as little as 80 % of the facts, while with the integrity constraint the correct rule is still learned at 60 %. These experiments demonstrate the usefulness of integrity constraints when learning from an incomplete background knowledge.

5.3 Multiple Predicate Learning

We considered the problem of learning the predicates brother and sibling from a background knowledge containing facts about parent, male and female. The bias allowed the body of the rules for brother to be any subset of

{parent(X, Y), parent(Y, X), sibling(X, Y), sibling(Y, X), male(X), male(Y), female(X), female(Y)}

while the body of the rules for sibling could be any subset of

{parent(X, Y), parent(Y, X), parent(X, Z), parent(Z, X), parent(Z, Y), parent(Y, Z), male(X), male(Y), male(Z), female(X), female(Y), female(Z)}

Therefore, the rules we are looking for are

brother(X, Y) ← sibling(X, Y), male(X).
sibling(X, Y) ← parent(Z, X), parent(Z, Y).

The family database considered, taken from [RLD93], contains 16 facts about brother, 38 about sibling, 22 about parent, 9 about male and 10 about female. The background knowledge was obtained from this database by considering all the facts about male and female and only 50 % of the facts about parent (selected randomly). The training set contains all the facts about brother and 50 % of the facts about sibling (also selected randomly). The negative examples were generated by making the Closed World Assumption and taking a random sample of the false atoms: 36 negative examples for sibling and 37 for brother. Since sibling and parent are incomplete, they are both considered as abducible (brother could also be considered as an abducible but this, as expected, has no effect).

From this data, the system first constructed the rule

brother(X, Y) ← sibling(Y, X), male(X).

It exploited both the positive examples of sibling to cover positive examples of brother and the negative examples of sibling to avoid covering negative examples of brother. This rule was constructed first because the heuristics preferred it to the rules for sibling, as the information available for brother was more complete than the information for sibling. When learning this rule, I-ACL made a number of assumptions on sibling: it abduced 3 positive facts (that will become positive examples for sibling) and 33 negative facts (that will become negative examples for sibling). Then I-ACL constructs the rule for sibling

sibling(X, Y) ← parent(Z, X), parent(Z, Y)

using this new training set and making assumptions on parent. This experiment shows how I-ACL is able to learn multiple predicates, exploiting as much as possible the information available and generating new data for one predicate while learning another.

6 Conclusions and Related Work

We have studied the new learning framework of Abductive Concept Learning, setting up its theoretical formulation and properties and developing a first system for it. Initial experiments with this system have demonstrated ACL as a suitable framework for learning with incomplete information. We have also illustrated the potential of ACL as a hybrid framework that can integrate discriminant and characteristic learning and the corresponding ILP systems.

Abduction has been incorporated in many learning systems (see e.g. [DK96, AD95, AD94, Coh95, ?]), but in most cases it is seen as a useful mechanism that can support some of the activities of the learning system. For example, in multistrategy learning (or theory revision) abduction is identified as one of the basic computational mechanisms (revision operators) for the overall learning process. A notable exception is that of [TM94], where a simple form of abduction is used in the learning coverage relation in the context of a particular application of learning theories for diagnosis.

Also recently, the deeper relationship between abduction and induction has been the topic of study of an ECAI96 workshop [MRFK96]. Our work builds on earlier work in [DK96] and [ELM+96, LMMR97] on learning simpler forms of abductive theories. Finally, the issue of integrating discriminant (or explanatory) and characterizing ILP systems has also been put forward in [DDK96].

Recently, there have been several other proposals for learning with incomplete information. The FOIL-I system [IKI+96] learns from incomplete information in the training examples but not in the background knowledge, with a particular emphasis on learning recursive predicates. Similarly to our approach, it performs a beam search on the space of clause refinements, keeping a set of clauses of different length. In [WD95] the authors propose several frameworks for learning from partial interpretations. A framework that can learn from incomplete information and is closely related to ACL is that of learning from satisfiability [RD96]. This framework is more general than ACL, as both the examples and the hypotheses can be clausal theories (although it is possible to extend ACL to allow examples which are clausal theories). On the other hand, when the training examples for learning from satisfiability are in a suitable form for ACL, then learning from satisfiability can be simulated by ACL. In this case the learned abductive theory will contain a single trivial (default) rule for the concept with an empty body, and the set of integrity constraints will be the clauses found by learning from satisfiability.

Further development of the ACL system is needed to fully integrate the characteristic induction process of learning the integrity constraints. An interesting approach for this is to use existing ILP systems for characteristic learning such as Claudien and Claudien-Sat, but also discriminant systems like ICL and ICL-Sat, for learning integrity constraints that discriminate between the positive and negative abductive assumptions generated by ACL. We have also begun to investigate the application of ACL to real-life problems in the area of analyzing market research questionnaires, where incompleteness of information occurs naturally through unanswered questions or "do not know" answers.

References

[AD94] H. Ade and M. Denecker. RUTH: An ILP Theory Revision System. In Proceedings of ISMIS94, 1994.
[AD95] H. Ade and M. Denecker. AILP: Abductive Inductive Logic Programming. In Proceedings of IJCAI95, 1995.
[BDR96] H. Blockeel and L. De Raedt. Inductive database design. In Proceedings of the 10th International Symposium on Methodologies for Intelligent Systems, volume 1079 of Lecture Notes in Artificial Intelligence, pages 376-385. Springer-Verlag, 1996.
[Coh95] W.W. Cohen. Abductive Explanation-based Learning: A Solution to the Multiple Inconsistent Explanation Problem. Machine Learning, 8:167-219, 1995.
[DDK96] Y. Dimopoulos, S. Dzeroski, and A.C. Kakas. Integrating explanatory and descriptive learning in ILP. Technical Report TR-96-16, University of Cyprus, Computer Science Department, 1996.
[DK96] Y. Dimopoulos and A. Kakas. Abduction and Learning. In Advances in Inductive Logic Programming. IOS Press, 1996.
[dV89] W. Van de Velde. IDL, or taming the multiplexer problem. In K. Morik, editor, Proceedings of the 4th European Working Session on Learning. Pitman, 1989.
[ELM+96] F. Esposito, E. Lamma, D. Malerba, P. Mello, M. Milano, F. Riguzzi, and G. Semeraro. Learning Abductive Logic Programs. In Proceedings of the ECAI96 Workshop on Abductive and Inductive Reasoning, 1996.
[IKI+96] N. Inuzuka, M. Kamo, N. Ishii, H. Seki, and H. Itoh. Top-down induction of logic programs from incomplete samples. In S. Muggleton, editor, Proceedings of the 6th International Workshop on Inductive Logic Programming, pages 119-136. Stockholm University, Royal Institute of Technology, 1996.
[KKT93] A.C. Kakas, R.A. Kowalski, and F. Toni. Abductive Logic Programming. Journal of Logic and Computation, 2:719-770, 1993.
[KKT97] A.C. Kakas, R.A. Kowalski, and F. Toni. The role of abduction in logic programming. In D. Gabbay et al., editors, Handbook of Logic in AI and Logic Programming. 1997. To appear.
[KM90] A.C. Kakas and P. Mancarella. On the relation between Truth Maintenance and Abduction. In Proceedings of PRICAI90, 1990.
[KR96] A.C. Kakas and F. Riguzzi. Abductive Concept Learning. Technical Report TR-96-15, University of Cyprus, Computer Science Department, 1996.
[LD94] N. Lavrac and S. Dzeroski. Inductive Logic Programming: Techniques and Applications. Ellis Horwood, 1994.
[LMMR97] E. Lamma, P. Mello, M. Milano, and F. Riguzzi. Integrating induction and abduction in logic programming. In P. P. Wang, editor, Proceedings of the Third Joint Conference on Information Sciences, volume 2, pages 203-206. Duke University, 1997.
[MRFK96] M. Denecker, L. De Raedt, P. Flach, and A. Kakas, editors. Proceedings of the ECAI96 Workshop on Abductive and Inductive Reasoning. Catholic University of Leuven, 1996.
[RB93] L. De Raedt and M. Bruynooghe. A Theory of Clausal Discovery. In Proceedings of IJCAI93, 1993.
[RD96] L. De Raedt and L. Dehaspe. Learning from satisfiability. Technical report, Katholieke Universiteit Leuven, 1996.
[RL95] L. De Raedt and W. Van Laer. Inductive Constraint Logic. In Proceedings of the 5th International Workshop on Algorithmic Learning Theory, 1995.
[RLD93] L. De Raedt, N. Lavrac, and S. Dzeroski. Multiple Predicate Learning. In Proceedings of the 3rd International Workshop on Inductive Logic Programming, 1993.
[TM94] C. Thompson and R. Mooney. Inductive learning for abductive diagnosis. In Proceedings of AAAI-94, 1994.
[WD95] S. Wrobel and S. Dzeroski. The ILP description learning problem: Towards a general model-level definition of data mining in ILP. In Proceedings of the Fachgruppentreffen Maschinelles Lernen, 1995.