Learning from satisfiability

Luc De Raedt and Luc Dehaspe
Department of Computer Science, Katholieke Universiteit Leuven
Celestijnenlaan 200A, B-3001 Heverlee, Belgium
email: {Luc.DeRaedt, [email protected]}

March 19, 1997

Abstract
A formalisation of concept-learning in logic is presented under the name learning from satisfiability. It generalizes learning from entailment as employed in inductive logic programming and learning from interpretations as used in attribute-value learning. When learning from satisfiability, examples and hypotheses are both clausal theories, and a hypothesis covers an example if the conjunction of the hypothesis and the example is satisfiable. Learning from satisfiability is especially useful to represent incomplete information about the examples.

Keywords: inductive logic programming.
1 Introduction

A formalization of concept-learning in logic is presented under the name learning from satisfiability. In this setting, hypotheses and examples are both clausal theories. Furthermore, a hypothesis H covers an example e if the conjunction of hypothesis and example is satisfiable, i.e. if H ∧ e ⊭ □. This setting seems the most general one possible within clausal logic, as both examples and hypotheses are in their most expressive form. Furthermore, as shown in [De Raedt, 1996b], learning from satisfiability generalizes learning from interpretations and learning from entailment, as well as the classical inductive logic programming settings [Muggleton and De Raedt, 1994]. From a practical point of view, learning from satisfiability allows one to represent incomplete examples in a very expressive representation language.

Two algorithms for learning from satisfiability are presented. They are adapted from the earlier Claudien [De Raedt and Dehaspe, 1997] and ICL [De Raedt and Van Laer, 1995] systems, which learn from interpretations. The first algorithm, called Claudien-Sat, performs characteristic concept-learning, where the aim is to find a most specific hypothesis (within the concept-description language) that covers a given set of positive examples. The second algorithm, called ICL-Sat, performs discriminant concept-learning, where the aim is to find a hypothesis that discriminates the positive from the negative examples. Both algorithms are integrated and implemented in the existing systems, and experiments with these systems are presented. The experiments demonstrate the power and use of learning from satisfiability in the context of incompletely specified examples. The experiments also show that this ability is not only useful in inductive logic programming, but also in the context of propositional or attribute-value learning.

The paper is organised as follows: in Section 2, we briefly review some logical concepts; in Section 3, we introduce learning from satisfiability; in Section 4, we present some of its properties; in Section 5, Claudien-Sat and ICL-Sat are presented; in Section 6, we report on some experiments; and finally, in Section 7, we conclude and briefly touch upon related work.
2 Logic
We first review some standard concepts from the predicate calculus (see e.g. [Genesereth and Nilsson, 1987] for more details). A term t is either a constant, a variable, or a compound term f(t1, ..., tn) composed of a function symbol f and n terms t1, ..., tn. An atom is a logical formula of the form p(t1, ..., tn), where p is a predicate symbol and the ti are terms. A literal is an atom A or the negation ¬A of an atom A. Atoms are positive literals, negated atoms are negative literals. In propositional logic, no terms are considered; thus in propositional logic, only predicate symbols are used in atoms. In computational learning, one often employs CNF formulae, which are of the following form:

(l1,1 ∨ ... ∨ l1,n1) ∧ ... ∧ (lk,1 ∨ ... ∨ lk,nk)

where all li,j are propositional literals. CNF expressions also have a natural upgrade in first order logic, namely clausal theories (or CT, for short), which are of the following form:

(∀V1,1, ..., V1,v1 : l1,1 ∨ ... ∨ l1,n1) ∧ ... ∧ (∀Vk,1, ..., Vk,vk : lk,1 ∨ ... ∨ lk,nk)

where all li,j are literals and Vi,1, ..., Vi,vi are all variables occurring in li,1 ∨ ... ∨ li,ni. The symbol ∀ reads "for all" and stands for universal quantification. Each disjunction is called a clause. A clause h1 ∨ ... ∨ hn ∨ ¬b1 ∨ ... ∨ ¬bm is often written as an implication h1 ∨ ... ∨ hn ← b1 ∧ ... ∧ bm. Interesting subsets of clausal logic are Horn (resp. definite) clause logic. They consist of CT expressions that have at most one (resp. exactly one) positive literal in each clause. These subsets are the basis of the programming language Prolog and of the machine learning technique known under the name of inductive logic programming.

Interpretations are used to reason about the truth and falsity of specific formulae. In this paper, as we use only clausal logic, we will focus on so-called Herbrand interpretations, which can, for our purposes, be defined as sets of variable-free (i.e. ground) atoms. The meaning of a Herbrand interpretation is that all atoms in the interpretation are true, and all other atoms are false. In the propositional or boolean case, a Herbrand interpretation corresponds to a variable assignment, i.e. a truth-assignment to the propositional atoms which are the `variables' of the formula. A substitution θ = {V1/t1, ..., Vn/tn} is an assignment of terms t1, ..., tn to variables V1, ..., Vn. The formula Fθ, where F is a term, atom, literal or expression and θ = {V1/t1, ..., Vn/tn} is a substitution, is the formula obtained by simultaneously replacing all variables V1, ..., Vn in F by the terms t1, ..., tn.

We can now define truth and falsity of a CT expression in a Herbrand interpretation. A ground literal l is true in an interpretation I if and only if l is a positive literal and l ∈ I, or l is a negative literal ¬a and a ∉ I. A CT

(∀V1,1, ..., V1,v1 : l1,1 ∨ ... ∨ l1,n1) ∧ ... ∧ (∀Vk,1, ..., Vk,vk : lk,1 ∨ ... ∨ lk,nk)

is true in an interpretation I if and only if for all j and for all substitutions θ such that (lj,1 ∨ ... ∨ lj,nj)θ is ground, at least one of the lj,iθ is true in I. E.g. the clause flies ∨ ¬bird ∨ ¬abnormal is true in the interpretations {flies} and {abnormal}, but false in {bird, abnormal}. If a theory is true in an interpretation, we also say that the interpretation is a model for the theory. Logical entailment and satisfiability are typically defined using interpretations. We will write F ⊨ G (read: F logically entails G) when all models of F are also a model for G, and F ⊨ □ (read: F is not satisfiable) if there exists no interpretation that is a model of F.
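To make the propositional case of this truth definition concrete, here is a minimal sketch (our own illustration in Python, not code from any of the systems discussed later): a clause is a list of (atom, sign) pairs, a theory is a list of clauses, and a Herbrand interpretation is the set of atoms that are true.

```python
def literal_true(literal, interpretation):
    atom, positive = literal
    # a positive literal is true if its atom is in the interpretation,
    # a negative literal if its atom is not
    return (atom in interpretation) == positive

def clause_true(clause, interpretation):
    # a clause (disjunction) is true if at least one of its literals is true
    return any(literal_true(lit, interpretation) for lit in clause)

def theory_true(theory, interpretation):
    # a clausal theory (conjunction of clauses) is true if every clause is true
    return all(clause_true(c, interpretation) for c in theory)

# the clause flies ∨ ¬bird ∨ ¬abnormal from the text
clause = [("flies", True), ("bird", False), ("abnormal", False)]
print(theory_true([clause], {"flies"}))             # True
print(theory_true([clause], {"abnormal"}))          # True
print(theory_true([clause], {"bird", "abnormal"}))  # False
```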
3 Learning from satisfiability

In concept-learning, one is given a language of concepts LC, a language of examples Le, the covers relation that specifies how LC relates to Le, and two sets of examples P ⊆ Le (the positives) and N ⊆ Le (the negatives). The aim in discriminant concept-learning is then to find a hypothesis H ∈ LC that covers all positive examples (i.e. H is complete) and none of the negative examples (i.e. H is consistent). In characteristic concept-learning, no negative examples are given, and the aim is to find a maximally specific hypothesis that covers all the positive examples. Logical approaches to concept-learning instantiate this definition by defining the representation languages for concepts LC and examples Le, as well as the covers relation between them. Two main settings can be distinguished, cf. [Angluin et al., 1992; Frazier and Pitt, 1993; De Raedt, 1996b; De Raedt, 1996a]:
- when learning from entailment, LC is a set of clausal theories, Le is a set of clauses, and a hypothesis H covers an example e if and only if H ⊨ e;

- when learning from interpretations, LC is a set of clausal theories, Le is a set of Herbrand interpretations, and a hypothesis H covers an example e if and only if e is a model for H.

Learning from entailment is a variant of the main setting employed in inductive logic programming (there one typically also employs a background theory B and considers an example e covered by a hypothesis H if and only if B ∧ H ⊨ e). Learning from interpretations is a generalization of the typical attribute-value learning situation. We propose learning from satisfiability, which is a generalization of the other two settings (cf. [De Raedt, 1996b] for more details). Furthermore, learning from satisfiability is the most
general setting possible within clausal logic, as examples and hypotheses are now both clausal theories. A hypothesis then covers an example when the conjunction of the hypothesis and the example is satisfiable.

Definition 1 (learning from satisfiability) A hypothesis H (a clausal theory) covers an example e (a clausal theory) if and only if H ∧ e is satisfiable (i.e. H ∧ e ⊭ □). So, covers(H, e) = (H ∧ e ⊭ □). (When background knowledge is specified in the form of a clausal theory B, covers(H, e) = (B ∧ H ∧ e ⊭ □).)

The coverage notion employed within learning from satisfiability was proposed earlier by Wrobel and Dzeroski [Wrobel and Dzeroski, 1995]. However, Wrobel and Dzeroski did not specify the form of hypotheses and examples, cf. also Section 7. Notice that it is easy to reformulate the covers relation in terms of entailment. This is because H ⊨ e if and only if H ∧ ¬e ⊨ □. As shown in Section 6.2, this property allows one to transform learning from entailment (where both hypotheses and examples are full clausal theories) into learning from satisfiability. So, the main novelty of the proposed framework is the use of examples that are full clausal theories. Notice that learning from satisfiability is only meaningful when all examples are satisfiable (i.e. for all e ∈ E : e ⊭ □). When a positive example is not satisfiable, there is no solution to the learning task. On the other hand, when a negative example is not satisfiable, the example is redundant. In the remainder of this paper, we will therefore assume that all examples are satisfiable. Learning from satisfiability can be illustrated as follows.
Example 1 Consider the following examples:

p1 = {male ←; human ←}
p2 = {female ∨ male ←; human ←}
p3 = {male ←; ← female}
n1 = {human ←; ← male; ← female}

The following hypotheses discriminate the positives from the negatives:

H1 = {female ∨ male ← human}
H2 = {← male ∧ female}

Furthermore, the following hypothesis characterizes all the positives:

H3 = {human ←; female ∨ male ←; ← male ∧ female}
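The coverage test of Definition 1 can be made concrete for the ground propositional case with a brute-force satisfiability check. The sketch below is our own illustration (the encoding and helper names are assumptions, not the data structures of the Claudien/ICL systems); it rechecks H1 and H3 against the reconstruction of Example 1 given above.

```python
from itertools import product

def satisfiable(theory):
    # brute force: try every truth assignment over the atoms of the theory
    atoms = sorted({atom for clause in theory for atom, _ in clause})
    for bits in product([False, True], repeat=len(atoms)):
        truth = dict(zip(atoms, bits))
        if all(any(truth[a] == s for a, s in clause) for clause in theory):
            return True
    return False

def covers(hypothesis, example):
    # covers(H, e) iff H ∧ e is satisfiable
    return satisfiable(hypothesis + example)

# Example 1: facts "x <-" become unit clauses [(x, True)], denials "<- x" become [(x, False)]
p1 = [[("male", True)], [("human", True)]]
p2 = [[("female", True), ("male", True)], [("human", True)]]
p3 = [[("male", True)], [("female", False)]]
n1 = [[("human", True)], [("male", False)], [("female", False)]]

H1 = [[("female", True), ("male", True), ("human", False)]]   # female ∨ male <- human
H3 = [[("human", True)],                                       # human <-
      [("female", True), ("male", True)],                      # female ∨ male <-
      [("male", False), ("female", False)]]                    # <- male ∧ female

for name, e in [("p1", p1), ("p2", p2), ("p3", p3), ("n1", n1)]:
    print(name, covers(H1, e), covers(H3, e))
# under this reconstruction, p1, p2 and p3 are covered by both H1 and H3, and n1 by neither
```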
The question why learning from satisfiability is useful still needs to be answered. The first answer, which is extensively discussed in [De Raedt, 1996b], is that learning from satisfiability is more general than the other two settings. It is shown that learning from entailment and learning from interpretations can be reduced to learning from satisfiability. This means that if we are provided with an algorithm that learns from satisfiability, we can emulate the other two settings. The notion of reduction employed is stronger than that
traditionally employed within computational learning theory, in that one example in one setting maps to one example in the other setting. The second answer is that learning from satisfiability allows us to represent incomplete examples. A first type of incompleteness concerns missing values for propositions. This is illustrated already in Example 1, where e.g. the value of female is missing in p1. Other types of incompletely specified examples can also be represented using clausal theories. This is again illustrated in Example 1. Consider e.g. p2, in which there are two predicates female and male whose values are not specified. However, the clause female ∨ male ← states that certain combinations of truth values are impossible; in this case it is impossible that both female and male are false. One could e.g. also express that two predicates p and q must have the same truth value by including p ← q and q ← p in the example. This clearly demonstrates that learning from satisfiability allows us to represent many types of incomplete examples in a very expressive and natural way.
4 Theory of learning from satisfiability
4.1 Generality and entailment
The relation "is more general than" plays a crucial role in concept-learning techniques. A hypothesis G is more general than a hypothesis S if and only if all examples that are covered by S are also covered by G. It is well known that when learning from interpretations and when learning from entailment, the generality relation corresponds to logical entailment (cf. e.g. [De Raedt, 1996a]). This property also holds when learning from satisfiability.
Lemma 1 Let S and G be two clausal theories such that S ⊨ G. When learning from satisfiability, G will be more general than S.
Proof: Let e be covered by S, i.e. e ∧ S ⊭ □. Because S ⊨ G, it must be that S ∧ e ⊨ G ∧ e. Because of this and the fact that e ∧ S is satisfiable, we have that e ∧ G is satisfiable, i.e. that G covers e (w.r.t. satisfiability). □
4.2 Searching
When learning clausal theories from interpretations, two properties facilitate the search for a solution:

- Monotonicity: if H1 covers e and H2 covers e, then H1 ∧ H2 covers e.

- Unique MSG: there is a unique maximally specific hypothesis (up to logical equivalence) that covers a set of positive examples; or, formulated differently, all maximally specific hypotheses are logically equivalent.

Monotonicity allows us to search the set of all clauses instead of the set of all clausal theories, which is much more efficient. Using monotonicity it is easy to prove the existence of a unique MSG, cf. [De Raedt and Dehaspe, 1997]. The existence of a unique MSG makes
the computation of an MSG easy. Indeed, to compute the MSG of a set of positives, one initializes the hypothesis with the MSG covering the first positive, and repeatedly generalizes this MSG to accommodate new positives. Unfortunately, when learning from satisfiability, monotonicity does not hold and the MSG need not be unique:

Lemma 2 Monotonicity does not hold when learning from satisfiability.

Proof: We provide a counter-example. Let H1 = {A ←}, H2 = {B ←}, and e = {A ∨ B ←; ← A ∧ B}. Then H1 ∧ e ⊭ □ and H2 ∧ e ⊭ □, but H1 ∧ H2 ∧ e ⊨ □. □

Lemma 3 There may not exist a unique MSG when learning from satisfiability.

Proof: We again provide a counter-example. Let e = {A ∨ B ←; ← A ∧ B}. There exist two MSGs of e, namely H1 = {A ←; ← B} and H2 = {B ←; ← A}. It is easily verified that H1 and H2 are MSGs by enumerating all possible hypotheses using the predicates A and B, and eliminating those that are not MSGs. □

These properties already indicate that learning from satisfiability is more complicated than learning from interpretations. In this respect, learning from satisfiability is closer to learning from entailment, where monotonicity does not hold either (when learning multiple or recursive predicate definitions, cf. [De Raedt et al., 1993]). This is no surprise, as learning from satisfiability can be considered a generalization of learning from entailment.
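The counter-example of Lemma 2 is small enough to be checked mechanically; the brute-force helper from the sketch in Section 3 is repeated here so that the snippet runs on its own.

```python
from itertools import product

def satisfiable(theory):
    atoms = sorted({a for clause in theory for a, _ in clause})
    for bits in product([False, True], repeat=len(atoms)):
        truth = dict(zip(atoms, bits))
        if all(any(truth[a] == s for a, s in clause) for clause in theory):
            return True
    return False

H1 = [[("A", True)]]                          # A <-
H2 = [[("B", True)]]                          # B <-
e  = [[("A", True), ("B", True)],             # A ∨ B <-
      [("A", False), ("B", False)]]           # <- A ∧ B

print(satisfiable(H1 + e))        # True:  H1 covers e
print(satisfiable(H2 + e))        # True:  H2 covers e
print(satisfiable(H1 + H2 + e))   # False: H1 ∧ H2 does not cover e
```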
5 Algorithms

In this section, two algorithms are proposed for learning from satisfiability. The first algorithm performs characteristic induction and is integrated in the clausal discovery engine Claudien [De Raedt and Dehaspe, 1997]; the second one performs discriminant induction and is integrated in the ICL system [De Raedt and Van Laer, 1995].
5.1 Assumptions
We will assume that the set L of clauses allowed in hypotheses is finite. This is to avoid problems with non-termination or infinite solutions; see [De Raedt and Dehaspe, 1997] for more details. We will also assume that a refinement operator (under θ-subsumption) exists on L.

Definition 2 Let c1 and c2 be clauses. c1 θ-subsumes c2 if and only if there is a substitution θ such that c1θ ⊆ c2.

θ-subsumption is often used as a generality relation among clauses.

Definition 3 Let L be a set of clauses. ρ is a refinement operator for L if for all c ∈ L: ρ(c) = {c' | c' is a maximally general specialization of c in L under θ-subsumption}.

For convenience, we will assume that there is a unique maximally general clause ⊤ within the clausal language L considered.
H := ∅; Q := {⊤};
while Q ≠ ∅ do
    delete c from Q
    if H ⊭ c then
        if for all e ∈ E : e is a model for c
        then H := H ∪ {c}
        else Q := Q ∪ ρ(c)

Figure 1: Characteristic induction from interpretations
H := ∅; Q := {⊤};
while Q ≠ ∅ do
    delete c from Q
    if H ⊭ c then
        if for all e ∈ E : e ∧ H ∧ c is satisfiable
        then H := H ∪ {c}
        else Q := Q ∪ ρ(c)

Figure 2: Characteristic induction from satisfiability
5.2 Characteristic induction
Let us first look at the clausal discovery algorithm Claudien, which learns from interpretations. It is summarized in Figure 1. The algorithm keeps track of a queue of candidate clauses, which is initialized to the most general clause under θ-subsumption (using the generality order derived from ⊨, this clause would be the most specific one), i.e. the empty clause, denoted by ⊤. It repeatedly deletes a clause from the queue. If the clause is not already entailed by the current hypothesis (in which case it would be redundant), the algorithm tests whether the clause covers all positive examples. If it does, it is added to the hypothesis; otherwise, all its direct refinements are added to the queue. [De Raedt and Dehaspe, 1997] show that this algorithm indeed computes the most specific generalization of the positive examples when working with interpretations. This generalization is unique up to logical equivalence. One of the reasons why the algorithm is correct is the monotonicity property outlined above.

However, when learning from satisfiability we can no longer test each clause independently of the already constructed hypothesis. This results in the Claudien-Sat algorithm in Figure 2. Claudien-Sat discovers one MSG of the positive examples. Whereas Claudien could consider each clause independently of (and hence in parallel with) the other clauses, Claudien-Sat must consider the candidate clause and the current hypothesis simultaneously. This is because monotonicity does not hold when learning from satisfiability.
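As an illustration only, the sketch below renders the Claudien-Sat loop of Figure 2 for the ground propositional case. The clause language (clauses without repeated atoms over a fixed alphabet), the refinement operator (add one literal), and the brute-force satisfiability and entailment tests are simplifications of our own; the actual system searches a DLAB-defined first-order language and uses the Satchmo prover.

```python
from itertools import product

def satisfiable(theory):
    atoms = sorted({a for clause in theory for a, _ in clause})
    for bits in product([False, True], repeat=len(atoms)):
        truth = dict(zip(atoms, bits))
        if all(any(truth[a] == s for a, s in clause) for clause in theory):
            return True
    return False

def entails(theory, clause):
    # theory |= clause  iff  theory ∧ ¬clause is unsatisfiable
    return not satisfiable(theory + [[(a, not s)] for a, s in clause])

def refinements(clause, alphabet):
    # maximally general specializations: add one literal on an atom not yet used
    used = {a for a, _ in clause}
    return [clause + [(a, s)] for a in alphabet if a not in used for s in (True, False)]

def claudien_sat(examples, alphabet):
    hypothesis, queue = [], [[]]              # the empty clause plays the role of ⊤
    while queue:
        clause = queue.pop()
        if entails(hypothesis, clause):
            continue                          # clause is redundant
        if all(satisfiable(e + hypothesis + [clause]) for e in examples):
            hypothesis.append(clause)         # every example stays satisfiable: keep it
        else:
            queue.extend(refinements(clause, alphabet))
    return hypothesis

# the positives of Example 1; the result is one maximally specific hypothesis
# within this toy clause language (MSGs need not be unique here, cf. Lemma 3)
p1 = [[("male", True)], [("human", True)]]
p2 = [[("female", True), ("male", True)], [("human", True)]]
p3 = [[("male", True)], [("female", False)]]
print(claudien_sat([p1, p2, p3], ["human", "male", "female"]))
```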
H := ∅; Q := {⊤}; found := false;
while not found do
    delete c from Q
    if for all p ∈ P : p is a model for c
       and for some n ∈ N : n is not a model for c
    then
        H := H ∪ {c}
        N := {n ∈ N | n is a model for c}
        found := (N = ∅)
    else Q := Q ∪ ρ(c)
    filter(Q)

Figure 3: Discriminant induction from interpretations
Theorem 1 If Claudien-Sat terminates, H will be a most specific generalization in L of the example set.
Proof (sketch): Assume that Claudien-Sat terminates. A most specific generalization of the example set within a given language of clauses L is a maximal subset of clauses H ⊆ L that covers all of the examples. Claudien-Sat only terminates when the queue becomes empty, and this can only happen when every clause is either implied by the current hypothesis or would, when added to the hypothesis, make certain positives uncovered. □

To compute all MSGs instead of a single MSG, Claudien-Sat should be modified with some backtracking mechanism.

The ICL algorithm of [De Raedt and Van Laer, 1995] is inspired by the CN2 algorithm of [Clark and Niblett, 1989], except that it learns a conjunctive normal form instead of the classical disjunctive normal form. This is extensively discussed by [Mooney, 1995; De Raedt and Van Laer, 1995]. Intuitively, when learning a conjunctive normal form instead of a disjunctive one, it is necessary to apply the covering algorithm over the negatives instead of over the positives. This is because each clause in a conjunctive normal form will cover all positives but exclude some negatives. Reversing the roles of positives and negatives, together with the ideas underlying CN2, resulted in the ICL algorithm, summarized in Figure 3. This algorithm repeatedly constructs a clause that covers all positives and excludes some of the negatives. To find one such clause, the algorithm starts at the ⊤ clause and repeatedly refines it until these two conditions are satisfied. When such a clause is found, it is added to the hypothesis, and the negatives excluded by the clause are removed from further consideration. If all negatives are excluded, the search stops, as a solution has been found. One feature inherited from CN2 is that a beam search is performed. This is realized in the algorithm using the function filter, which removes all but the best k clauses from further consideration.

Figure 4 presents a modification of the idealized ICL algorithm for discriminant concept-learning under satisfiability. As for characteristic induction, the main difference between ICL and ICL-Sat is that the latter must consider clause and hypothesis simultaneously.
H := ∅; Q := {(H, N, ⊤)}; found := false;
while not found do
    delete (H, N, c) from Q
    if for all p ∈ P : H ∧ c ∧ p is satisfiable
       and for some n ∈ N : H ∧ c ∧ n is not satisfiable
    then
        H' := H ∪ {c}
        N' := {n ∈ N | H ∧ c ∧ n ⊭ □}
        found := (N' = ∅)
        if not found then add (H', N', ⊤) to Q
    else add (H, N, c') to Q for all c' ∈ ρ(c)
    filter(Q)

Figure 4: Discriminant induction under satisfiability

This is realized in ICL-Sat by performing the search at the hypothesis level, instead of at the clause level. Therefore, the beam does not only contain clauses, but tuples (H, N, c), where H corresponds to a current hypothesis, N to the set of negatives still to be excluded by H, and c to a candidate clause that may be added to H. If a candidate clause c together with its corresponding H excludes some negatives (from N) while still covering all positives, c is added to H, and the negatives excluded are (locally) removed from N. If the resulting N is empty, a solution has been found and the algorithm terminates. Otherwise, a new clause will be searched for, starting from ⊤. On the other hand, if the selected tuple contains a clause that cannot directly be added to the hypothesis, tuples corresponding to all its refinements are added to the queue. Again, as for CN2, a beam search (at the level of hypotheses) may be performed by setting filter appropriately.
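For illustration, the following is a strongly simplified, greedy rendering of discriminant induction under satisfiability: instead of the beam over (H, N, c) tuples of Figure 4, it exhaustively scans a small propositional clause language for a clause that keeps every positive satisfiable and excludes at least one remaining negative. The names and the encoding are our own assumptions; this is not the ICL-Sat implementation.

```python
from itertools import combinations, product

def satisfiable(theory):
    atoms = sorted({a for clause in theory for a, _ in clause})
    for bits in product([False, True], repeat=len(atoms)):
        truth = dict(zip(atoms, bits))
        if all(any(truth[a] == s for a, s in clause) for clause in theory):
            return True
    return False

def clause_language(alphabet, max_length):
    # every clause with at most max_length literals over distinct atoms
    for n in range(1, max_length + 1):
        for atoms in combinations(alphabet, n):
            for signs in product([True, False], repeat=n):
                yield list(zip(atoms, signs))

def greedy_discriminate(positives, negatives, alphabet, max_length=3):
    hypothesis, remaining = [], list(negatives)
    while remaining:
        for clause in clause_language(alphabet, max_length):
            keeps_positives = all(satisfiable(p + hypothesis + [clause]) for p in positives)
            excluded = [n for n in remaining if not satisfiable(n + hypothesis + [clause])]
            if keeps_positives and excluded:
                hypothesis.append(clause)
                remaining = [n for n in remaining if n not in excluded]
                break
        else:
            return None                       # no clause helps any more: give up
    return hypothesis

p1 = [[("male", True)], [("human", True)]]
p2 = [[("female", True), ("male", True)], [("human", True)]]
p3 = [[("male", True)], [("female", False)]]
n1 = [[("human", True)], [("male", False)], [("female", False)]]
print(greedy_discriminate([p1, p2, p3], [n1], ["human", "male", "female"]))
# finds the single clause male <-, which already separates these four examples
```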
5.3 Implementation
The above algorithms for learning from satisfiability are fully integrated in the Claudien-Sat and ICL-Sat systems. Claudien-Sat and ICL-Sat inherit all other major options of the earlier systems. This includes the DLAB formalism for specifying the syntax of clauses in hypotheses [Dehaspe and De Raedt, 1996; De Raedt and Dehaspe, 1997], the heuristics for focussing on the most interesting hypotheses [De Raedt and Van Laer, 1995], etc. The theorem prover used to check whether a hypothesis covers an example is the Satchmo [Manthey and Bry, 1988] model generation procedure. One simplification made in the ICL-Sat implementation w.r.t. the algorithm in Figure 4 is that the implementation performs hill-climbing at the level of hypotheses, and beam search only at the level of clause construction. This means that at any time in the ICL-Sat implementation, H and N will be identical for all elements on Q.
6 Experiments

The main purpose of the experiments reported in this section is to demonstrate the expressive power of the framework; we especially focus on types of learning problems that cannot be handled by learning from entailment or from interpretations. Three experiments will be discussed: the well-known multiplexer benchmark, the simulation of learning from entailment, and the induction of a simple definite clause grammar.
6.1 Multiplexer
A well-known hard induction problem for divide and conquer algorithms such as TDIDT [Quinlan, 1986] is the multiplexer problem (see e.g. [Seshu, 1989; Van de Velde, 1989]). In the 6-bit multiplexer problem, each example consists of 6 bits, and the first two bits are interpreted as the address of one of the other four bits. If the bit at the specified address is 1, the example is considered positive; otherwise it is considered negative. E.g. consider the 6 bits 10 0001. The first two bits (i.e. 10) address the third of the remaining four bits counting from the right, which is 0, so this example is negative. The ICL-Sat representation for this example would be:

{a1 ←; ← a2; ← a3; ← a4; ← a5; a6 ←}

when numbering the bits a1 to a6 from left to right. In general, the bit string a1 a2 a3 a4 a5 a6 is encoded as an interpretation, and the class of the example is determined by checking the bit at position a1 a2 in a3 a4 a5 a6, where positions are numbered from right to left: a6 corresponds to address 00, a5 to 01, etc. For the 6-multiplexer problem there are 2^6 = 64 examples, 32 positives and 32 negatives.

We ran three different experiments with ICL-Sat. First, ICL-Sat was run on the complete example set, with all bits of the examples specified. (So, ICL-Sat actually learned from interpretations in this case.) On this complete example set, ICL-Sat learned a completely correct hypothesis, achieving 100 per cent accuracy when the positive concept was learned, and 100 per cent when the negative concept was learned.

Second, ICL-Sat was run on the complete example set, but 12 out of the 64 examples were incompletely specified. For each of these 12 examples, only 3 bits were specified and the other 3 bits were unknown (both the examples and the bits were selected at random). In this case, ICL-Sat learned from interpretations with missing values, i.e. from partial interpretations. E.g. the above example 10 0001 could have been replaced by 1? 00?? (negative), or:

{a1 ←; ← a3; ← a4}

On this example set ICL-Sat's accuracy was 82.8% (when the positive concept was learned) and 70.3% (on the negative concept).

To test the ability to handle incomplete knowledge, a third experiment was performed. The example set of the second experiment was completed in the following manner. For each of the incompletely specified examples, an attempt was made to specify additional clauses within the example in such a way that 1) the value of attributes that had an unknown value remained unknown (could still be 1 as well as 0), and 2) some combinations of values that are incompatible with the known class of the example were made impossible. E.g. the example 1? ??00 (negative) would be represented by:

{a1 ←; ← a2 ∧ a3; a2 ← a4; ← a5; ← a6}

The clause ← a2 ∧ a3 states that at least one of the second and third bits should be 0 (for the class to be negative), and the clause a2 ← a4 states that if a4 is 1, a2 should be 1 as well. Though these clauses do not determine the values of the attributes, they specify that certain combinations of values are impossible given that the class of the example is negative. All models of this example will act as negative examples for the concept to be learned. Of the 12 examples that had missing values in the second example set, 7 could be transformed so that all models of the corresponding examples belong to the right class (without actually determining the values of the unknown attributes). The resulting example set served as the basis for the third experiment. The accuracy when learning the positive concept was 93.8%, and 92.2% when learning the negative one.

The experiment thus confirms 1) that knowledge about incomplete examples can be specified elegantly within our framework, 2) that such knowledge is likely to improve the accuracy of the induced hypotheses, and 3) that the framework is useful even in the propositional case.

    examples                     accuracy (positive concept)   accuracy (negative concept)
    complete interpretations     100%                          100%
    partial interpretations      82.8%                         70.3%
    clausal theories             93.8%                         92.2%
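To illustrate the encoding used in this experiment, the sketch below generates the 64 multiplexer examples with their classes and builds clausal examples in the style reconstructed above; the helper names and the exact encoding are our own.

```python
from itertools import product

def multiplexer_class(bits):
    # bits = (a1, ..., a6); a1 a2 is the address; a6 is address 00, a5 is 01, ...
    address = 2 * bits[0] + bits[1]
    return bits[5 - address] == 1

def complete_example(bits):
    # every bit known: one unit clause per attribute (a_i <- if 1, <- a_i if 0)
    return [[(f"a{i + 1}", b == 1)] for i, b in enumerate(bits)]

examples = [(bits, multiplexer_class(bits)) for bits in product([0, 1], repeat=6)]
print(sum(cls for _, cls in examples))       # 32 positives out of 64

bits = (1, 0, 0, 0, 0, 1)                    # the string 10 0001 from the text
print(multiplexer_class(bits))               # False: address 10 selects a4, which is 0
print(complete_example(bits))

# the incomplete third-experiment example 1? ??00 with its two constraint clauses
partial = [[("a1", True)], [("a5", False)], [("a6", False)],
           [("a2", False), ("a3", False)],   # <- a2 ∧ a3
           [("a2", True), ("a4", False)]]    # a2 <- a4
```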
6.2 Emulating learning from entailment with full clausal theories
Learning from entailment aims at inducing a hypothesis H from a set of positive and negative examples such that H entails all of the positive examples and none of the negatives. Given the current state of the art, learning from entailment and its variants within inductive logic programming have considered the special case where H is a set of Horn clauses and each example is a single Horn clause (though the induction of normal program clauses using SLDNF-resolution has already been considered, cf. e.g. [Bergadano et al., 1996]). Using our setting, one can not only emulate learning from entailment, but one can also extend its representation, such that both hypotheses and examples are full clausal theories. From this point of view, the main novelty of our framework is the use of examples that are full clausal theories. The formulation of learning from satisfiability as H ∧ e ⊭ □ was chosen instead of H ⊨ e because the former is closer to the learning from interpretations setting employed within Claudien and ICL.

A first extension of learning from entailment is illustrated on a propositional Horn problem. The aim is to induce the Horn theory

{p ← q; q ← s}

from the following positive examples:

{p ← s ∧ t}
{p ← q ∧ t}

and negatives:

{p ← q; q ← t}
{q ← s; q ← p}
{p ←}

In this illustration, each positive (resp. negative) example is a clausal theory that is (resp. is not) logically entailed by the target concept. As previous approaches to learning from entailment only consider examples that are single clauses, they cannot cope with this induction problem. Though a positive example c1 ∧ ... ∧ cn (where the ci are clauses) in our framework can easily be transformed into a set of positive examples {c1, ..., cn} (one clause becomes one example) to be handled by (classical) learning from entailment, this transformation does not work for negative examples. The reason is that if a negative example c1 ∧ ... ∧ cn is not entailed by the target concept, this may be due to the fact that only one of its clauses is not entailed. E.g. in the first negative example, one of the clauses is entailed but the other is not. This is in fact another form of incomplete knowledge that can be expressed within our framework.

Learning from entailment where the examples are full clausal theories can easily be emulated by learning from satisfiability. This is because H ⊨ e if and only if H ∧ ¬e ⊨ □. Therefore, to emulate this setting, we only need to negate all the examples and to swap the positives with the negatives. Applying this procedure to the first negative example above results in the following positive example:

{← p ∧ q; t ← p; q ∨ t ←}

On the transformed example set, ICL-Sat indeed induced the target concept.

A second extension w.r.t. classical approaches to learning from entailment is that it becomes possible to induce clausal theories that are not Horn. E.g. given the following background theory (which is used to initialize the hypothesis) and examples, ICL-Sat induced the target theory B ∧ (flies ∨ abnormal ← bird).
B = {← abnormal ∧ flies; ← eagle ∧ abnormal; bird ← ostrich; bird ← eagle; bird ← sparrow; eagle ∨ sparrow ∨ ostrich ← bird}

Negatives:

{sparrow ←; ← flies; ← abnormal}
{eagle ←; ← flies}

Positives:

{ostrich ←; ← flies}
{bird ←; abnormal ←}
{← flies; ← abnormal}
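The transformation used above for the Horn problem (negate every example and swap positives with negatives) can be sketched as follows for ground propositional examples: the negation of a conjunction of clauses is put back into clausal form by taking, in every possible way, one complemented literal from each original clause and dropping tautologies. The function name and the encoding are our own.

```python
from itertools import product

def negate_example(clauses):
    negated = []
    for picks in product(*clauses):                  # one literal from every clause
        new_clause = {(atom, not sign) for atom, sign in picks}
        atoms = [a for a, _ in new_clause]
        if len(set(atoms)) == len(atoms):            # drop tautologies such as q ∨ ¬q
            negated.append(sorted(new_clause))
    return negated

# the first negative example of the Horn problem above: {p <- q; q <- t}
example = [[("p", True), ("q", False)], [("q", True), ("t", False)]]
for clause in negate_example(example):
    print(clause)
# yields ¬p ∨ ¬q, ¬p ∨ t and q ∨ t, i.e. {<- p ∧ q; t <- p; q ∨ t <-} as in the text
```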
6.3 Learning a DCG
In a third and larger experiment, the aim was to induce a definite clause grammar for parsing ultra-simple sentences in English. More specifically, the aim was to induce the following DCG:

sent(A, B) ← np(A, C) ∧ vp(C, B)
np(A, B) ← det(A, C) ∧ noun(C, B)
vp(A, B) ← verb(A, B)
vp(A, B) ← verb(A, C) ∧ np(C, B)

Notice that this is a multiple predicate learning problem (cf. [De Raedt et al., 1993]). For the purposes of this experiment Claudien-Sat, i.e. the characteristic inducer, was employed. Claudien-Sat's DLAB mechanism was used to ensure that 1) only clauses in DCG form would be derived, 2) each clause in the hypothesis would define a non-terminal (i.e. sent, np or vp), and 3) at most 3 literals were allowed in each clause. The examples used to induce the DCG could be regarded as the trace of a parser that explores a sentence and discovers that some subsentences belong to the language (true facts) or not (false facts). The examples used are listed below.
- a first interpretation, including all true and false facts mentioning the following lists: [the,dog,eats,the,cat], [dog,eats,the,cat], [eats,the,cat], [the,cat], [cat], and []. This example thus corresponds to a complete syntactic analysis of the sentence "the dog eats the cat";

- a second interpretation, including all true and false facts mentioning the following lists: [the,cat,the,cat], [cat,the,cat], [cat,cat], [the,cat], [cat], [cat,the], and []. This example shows some ungrammatical sentences, and corresponds to several attempts to analyse "the cat the cat";

- similarly, a third interpretation including all true and false facts mentioning: [the,cat,eats], [cat,eats], [cat,sings], [the,cat,sings], [dog,cat], [sings], [eats], [the], and [];

- a fourth partial interpretation showing ungrammatical sentences:
{det([the,man,man],[man,man]); det([the,the,man,man],[the,man,man]); np([the,man,man],[man]); ¬vp([the,the,man,man],[man]); ¬vp([the,man,man],[]); ¬sent([the,man,man],[]); ¬sent([the,the,man,man],[man]); ¬np([the,the,man,man],[man]); noun([man],[]); ¬np([the,man,man],[]); ¬np([eats,man],[]); verb([eats,man],[man]); vp([eats,man],[man]); ¬vp([eats,man],[]); ¬sent([eats,man],[])};

- a fifth partial interpretation showing some ungrammatical sentences:
{np([the,man,the,dog],[the,dog]); np([the,dog],[]); vp([eats,the,man],[the,man]); vp([eats,the,man,the,dog],[the,man,the,dog]); ¬vp([eats,the,man,the,dog],[]); det([the],[]); det([the,the],[the]); ¬np([the,the],[the]); ¬vp([the,the],[the]); ¬np([the,the],[]); ¬vp([the,the],[]); ¬sent([the,the],[]); ¬sent([the,the],[the])}.

The experiment shows 1) characteristic induction from satisfiability at work in a first order setting, 2) that learning from satisfiability correctly deals with the multiple predicate learning problem, and 3) that different kinds of examples can be mixed in this setting: e.g. the first three examples are models for the target theory, whereas the latter two examples are merely partial interpretations. The first three examples could thus also have been supplied within learning from interpretations, but not the latter two.
7 Conclusions and related work

A framework for learning from incompletely specified examples has been presented. The main novelty of the framework is the use of full clausal theories as examples. Given this representation of examples, learning from satisfiability and learning from entailment are logically equivalent. However, learning from entailment has so far been restricted to examples that are single clauses. Learning from satisfiability also extends the learning from interpretations setting (as extensively discussed in [De Raedt, 1996b]). Algorithms for performing characteristic and discriminant learning from satisfiability have been designed and implemented in the Claudien-Sat and ICL-Sat systems. The experiments clearly demonstrate that 1) learning from satisfiability is the most general setting (within clausal logic) considered so far, 2) the setting can emulate the other settings considered within inductive logic programming (cf. [De Raedt, 1996b]), 3) learning from satisfiability is especially useful to represent incomplete knowledge about examples, and 4) this framework is also useful for propositional approaches to learning. The price to pay for this added expressive power is twofold. First, learning from satisfiability requires the use of a general clausal theorem prover, which is computationally much more expensive. Second, the user must be able to cope with this added power, which is not always easy when working with full clausal logic.

As far as related work is concerned, we already mentioned [Wrobel and Dzeroski, 1995; Fensel et al., 1995], who proposed the coverage notion employed in learning from satisfiability. However, Wrobel and Dzeroski did not represent examples and hypotheses by full clausal theories. Yet this choice is essential to our framework, because it allows one 1) to clearly relate the different settings to each other (i.e. learning from interpretations, from entailment, and from satisfiability, cf. [De Raedt, 1996b]), and 2) to develop algorithms for learning from satisfiability. On the other hand, Wrobel and Dzeroski mainly studied the influence of testing coverage at the local level (i.e. at the level of each single example) or at the global level (i.e. representing all examples in a single database and testing coverage at that level), and the differences between predictive and descriptive inductive logic programming (which is related to characteristic versus discriminant induction). One open question for further research is how learning from satisfiability (which employs a monotonic logic) could be used for inducing nonmonotonic logic programs.
Acknowledgements

Luc De Raedt is supported by the Belgian National Fund for Scientific Research. This work is part of the ESPRIT project no. 20237 on Inductive Logic Programming II. The authors would like to thank Saso Dzeroski and Stefan Wrobel for inspiring part of this work, and Johannes Fürnkranz for commenting on a draft.
References
[Angluin et al., 1992] D. Angluin, M. Frazier, and L. Pitt. Learning conjunctions of Horn clauses. Machine Learning, 9:147-162, 1992.

[Bergadano et al., 1996] F. Bergadano, D. Gunetti, M. Nicosia, and G. Ruffo. Learning logic programs with negation as failure. In L. De Raedt, editor, Advances in Inductive Logic Programming. IOS Press/Ohmsha, 1996.

[Clark and Niblett, 1989] P. Clark and T. Niblett. The CN2 algorithm. Machine Learning, 3(4):261-284, 1989.

[De Raedt and Dehaspe, 1997] L. De Raedt and L. Dehaspe. Clausal discovery. Machine Learning, 26:99-146, 1997.

[De Raedt and Van Laer, 1995] L. De Raedt and W. Van Laer. Inductive constraint logic. In Proceedings of the 5th Workshop on Algorithmic Learning Theory, volume 997 of Lecture Notes in Artificial Intelligence. Springer-Verlag, 1995.

[De Raedt et al., 1993] L. De Raedt, N. Lavrac, and S. Dzeroski. Multiple predicate learning. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, pages 1037-1042. Morgan Kaufmann, 1993.

[De Raedt, 1996a] L. De Raedt. Induction in logic. In R.S. Michalski and J. Wnek, editors, Proceedings of the 3rd International Workshop on Multistrategy Learning, pages 29-38, 1996.

[De Raedt, 1996b] L. De Raedt. Logical settings for concept-learning. Technical Report KUL-CW, Department of Computer Science, Katholieke Universiteit Leuven, 1996. Submitted.

[Dehaspe and De Raedt, 1996] L. Dehaspe and L. De Raedt. DLAB: A declarative language bias formalism. In Proceedings of the International Symposium on Methodologies for Intelligent Systems (ISMIS96), volume 1079 of Lecture Notes in Artificial Intelligence, pages 613-622. Springer-Verlag, 1996.

[Fensel et al., 1995] D. Fensel, M. Zickwolff, and M. Wiese. Are substitutions the better examples? In L. De Raedt, editor, Proceedings of the 5th International Workshop on Inductive Logic Programming, 1995.

[Frazier and Pitt, 1993] M. Frazier and L. Pitt. Learning from entailment. In Proceedings of the 9th International Conference on Machine Learning, 1993.

[Genesereth and Nilsson, 1987] M. Genesereth and N. Nilsson. Logical Foundations of Artificial Intelligence. Morgan Kaufmann, 1987.

[Manthey and Bry, 1988] R. Manthey and F. Bry. SATCHMO: a theorem prover implemented in Prolog. In Proceedings of the 9th International Conference on Automated Deduction (CADE88), pages 415-434. Springer-Verlag, 1988.

[Mooney, 1995] R.J. Mooney. Encouraging experimental results on learning CNF. Machine Learning, 19:79-92, 1995.

[Muggleton and De Raedt, 1994] S. Muggleton and L. De Raedt. Inductive logic programming: Theory and methods. Journal of Logic Programming, 19/20:629-679, 1994.

[Quinlan, 1986] J.R. Quinlan. Induction of decision trees. Machine Learning, 1:81-106, 1986.

[Seshu, 1989] R. Seshu. Solving the parity problem. In K. Morik, editor, Proceedings of the 4th European Working Session on Learning. Pitman, 1989.

[Van de Velde, 1989] W. Van de Velde. IDL, or taming the multiplexer. In K. Morik, editor, Proceedings of the 4th European Working Session on Learning. Pitman, 1989.

[Wrobel and Dzeroski, 1995] S. Wrobel and S. Dzeroski. The ILP description learning problem: Towards a general model-level definition of data mining in ILP. In Proceedings of the Fachgruppentreffen Maschinelles Lernen, 1995.