Top-down Induction of Logic Programs from Incomplete Samples


Nobuhiro Inuzuka¹, Masakage Kamo², Naohiro Ishii¹, Hirohisa Seki¹ and Hidenori Itoh¹

¹ Department of Intelligence and Computer Science, Nagoya Institute of Technology, Gokiso-cho, Showa-ku, Nagoya 466, Japan. E-mail: finuzuka,ishii,seki,[email protected]
² Aishin Seiki Co., Ltd., Asahi-cho, Kariya-shi, Aichi 448, Japan. E-mail: [email protected]

Abstract. We propose an ILP system, FOIL-I, which induces logic programs by a top-down method from incomplete samples. An incomplete sample consists of some of the positive examples and negative examples on a finite domain. FOIL-I evaluates candidate definitions with a function composed of an information-based function and an encoding complexity measure. FOIL-I performs a best-first search guided by this evaluation function so that it can make use of suspicious but necessary candidates. Other distinctive points include a treatment of recursive definitions and the removal of redundant clauses. Randomly selected incomplete samples are tested with FOIL-I, Quinlan's FOIL and Muggleton's Progol. Compared with the others, FOIL-I can induce target relations in many cases from small incomplete samples.

1 Introduction

Many ILP (Inductive Logic Programming) systems have been proposed recently. They are expected to give a revolutionary function to Software Engineering and Artificial Intelligence. Some researchers aim at practical uses of ILP, and many studies in application areas have been reported [Coh95, DBJ94, DM91, HS92]. For ILP systems to be used in practical domains, the development of noise-tolerant ILP systems is one of the essential topics, and some work has been done in this area [DB92, BP91, AP93, Fur93, Dze95]. Ease of use, or tractability, of a system is another practical aspect. The size of the sample of a target logic program that a system requires affects how easy the system is to use. Reducing the size of samples is therefore also one of the main topics in ILP [MN95, ALLM94a, ALLM94b].

Presented at The Sixth International Inductive Logic Programming Workshop (ILP96), Stockholm, Sweden, 28-30 August, 1996, and Published in LNAI Vol.1314, Springer, pp.265-282. This research is partially supported by the Grant-in-Aid for Encouragement of Young Scientists No.08780346 from the Japanese Ministry of Education, Science, Sports and Culture.

Our system FOIL-I (First Order Inductive Learner from Incomplete samples) basically uses the framework of FOIL [QCJ93, Qui90]. FOIL-I takes an extensional definition of a target relation and background relations described in Horn clauses. An extensional definition of a target relation consists of positive examples, i.e. tuples that satisfy the target relation, and negative examples, i.e. tuples that do not satisfy it. The system is expected to output an intensional definition of the target relation, that is, a set of Horn clauses built from background relations and possibly the target relation itself, used recursively. More precisely, we use a restricted version of the Prolog language, which is also used in FOIL. The language does not include cuts, fail, disjunctive goals, or functions other than constants, but it includes negated literals.

FOIL requires only finite domains. To construct a definition of the membership relation member, FOIL may need a domain that consists of three different atoms a, b and c and the lists of length 0, 1, 2 or 3 made from these atoms. Because the domain is finite, FOIL needs only a bounded number of positive and negative examples to induce the target relation. The number is, however, sometimes still large. In this case the numbers of positive and negative examples are 75 and 85, respectively. For another relation, append, in the same setting the numbers of positive and negative examples are 142 and 63,858, respectively. The aim of FOIL-I is to reduce the number of examples required.

We call the set of all positive (negative) examples a positive (negative) sample. A sample of a target relation is a pair of a positive and a negative sample; this pair gives an extensional definition of the target relation. We call a pair of a subset of the positive sample and a subset of the negative sample an incomplete sample. An ILP system is much easier to use if it requires only small incomplete samples, that is, incomplete samples consisting of small numbers of positive and negative examples.

2 An example of a FOIL-I session

Before giving the details of FOIL-I, we show an example of a FOIL-I session. FOIL-I starts by reading an input file that describes a finite domain, a target relation, positive and negative examples, and background relations. Figure 1 shows an input file. The first three lines describe a finite domain, which consists of atom and list. The following line describes the target relation memb, the membership relation (renamed memb to avoid conflicting with the Prolog built-in predicate member). The line includes its name, arity, mode and type. In a mode, `+' (`-') means that the argument at that place should be given bound (unbound, respectively). The type specifies the possible values of each argument. The following two lines give an incomplete sample, i.e. positive and negative examples. Finally, background relations are given with their names, arities, modes, types and intensional definitions. FOIL-I uses extensional definitions internally; intensional definitions are expanded to extensional definitions after FOIL-I reads them. Giving intensional definitions is easier than giving lengthy extensional ones.

    domain(atom,[[],a,b,c]).
    domain(list,[[],[a],[b],[c],[a,b],[a,c],[b,a],[b,c],[c,a],[c,b],[a,b,c],
                 [a,c,b],[b,a,c],[b,c,a],[c,a,b],[c,b,a]]).
    target(memb,2,[+,+],[atom,list]).
    positive([[a,[b,a,c]], [b,[c,b,a]], [c,[c,a]], [b,[a,b]], [a,[a,c,b]], [b,[b]]]).
    negative([[c,[]], [a,[c,b]], [[],[b,c]], [b,[c,a]], [[],[c,a]], [c,[b,a]]]).
    bg(component,3,[+,-,-],[list,atom,list],[component([A|B],A,B)]).
    bg(null,1,[+],[list],[null([])]).

Fig. 1. An example of an input file for FOIL-I.
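The expansion of intensional background definitions into extensional ones can be illustrated with a small Python sketch. This is not FOIL-I's own code (FOIL-I is written in Prolog); the names `ATOM`, `LIST`, `component` and `null_rel`, and the use of Python tuples to stand for Prolog lists, are assumptions made for illustration. The sketch enumerates, over the finite domain of Fig. 1, every tuple that satisfies component([A|B],A,B) and null([]).

```python
from itertools import permutations

# Finite domains of Fig. 1. A Prolog list is modelled as a Python tuple,
# so () stands for [].
ATOM = [(), "a", "b", "c"]
LIST = [p for k in range(4) for p in permutations("abc", k)]

# component([A|B], A, B): one tuple per non-empty list whose head and tail
# lie in the declared types [list, atom, list].
component = {
    (lst, lst[0], lst[1:])
    for lst in LIST
    if lst and lst[0] in ATOM and lst[1:] in LIST
}

# null([]): only the empty list satisfies it.
null_rel = {(lst,) for lst in LIST if lst == ()}
```

Under these assumptions the list domain has 16 elements, so component expands to 15 tuples (one per non-empty list) and null to a single tuple.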

Figure 2 shows the output of FOIL-I for the file above. The first line is the user's input, which starts the session with the name of the input file. After the title, the third line reports the input file. The following lines give information about the evaluation of candidate literals: the numbers of positive and negative examples covered by a literal and its evaluation value. Only literals that cover at least one positive example are shown. In this session a literal (d,component,[2,3,4]) is added as a determinate literal, which is described in Sec. 4.2. The lines that start with Candidate(s): list the partial clauses saved at that step of the induction loop. A literal is shown as a tuple of a sign (p = positive, n = negative and d = determinate), a predicate name and a list of arguments. (p,component,[2,1,3]) denotes the literal $component(x_2, x_1, x_3)$, where the arguments of the head literal are assumed to be variables with consecutive subscripts starting from 1, e.g. $memb(x_1, x_2)$ in this case. If a partial clause covers no negative examples, it is chosen as an induced clause after a redundancy check. When every positive example is covered by an induced clause, FOIL-I stops and shows the results and some statistics.

3 Top-down induction and incomplete samples

FOIL-I uses top-down induction like Quinlan's FOIL. FOIL tries to construct clauses that cover some positive examples and no negative examples by generating literals systematically and evaluating their coverage with an information-based function. In this section we outline FOIL and point out the problems that arise when an incomplete sample is used.

    ?-foili('mem.ex').
    F O I L - I, v e r. 0.8(July,1996)
    Processing a sample file: mem.ex
    A determinate literal (d,component,[2,3,4]) is added.
    (p,null,[4]) covers 1 pos.and 0 neg.example(s). Evaluation is -3.032
    (p,eqatom,[1,3]) covers 3 pos.and 0 neg.example(s). Evaluation is -1.283
    Candidate(s): Eval.:(Pos.,Neg.) clause
       -1.283:(3,0) [(p,eqatom,[1,3]),(d,component,[2,3,4])]
       -3.032:(1,0) [(p,null,[4]),(d,component,[2,3,4])]
    Clause(s):
       memb(X1,X2):-component(X2,X3,X4), eqatom(X1,X3).
    is/are induced.
    3 positive example(s) remain(s) uncovered
    A determinate literal (d,component,[2,3,4]) is added.
    (p,memb,[3,2]) covers 3 pos.and 5 neg.example(s). Evaluation is -3.907
    (p,memb,[1,4]) covers 3 pos.and 0 neg.example(s). Evaluation is 0.338
    Candidate(s): Eval.:(Pos.,Neg.) clause
       0.338:(3,0) [(p,memb,[1,4]),(d,component,[2,3,4])]
       -3.907:(3,5) [(p,memb,[3,2]),(d,component,[2,3,4])]
    Clause(s):
       memb(X1,X2):-component(X2,X3,X4), memb(X1,X4).
    is/are induced.
    0 positive example(s) remain(s) uncovered
    result((memb(A,B):-component(B,C,_),eqatom(A,C))).
    result((memb(A,B):-component(B,_,C),memb(A,C))).
    11 literals visited
    Runtime: 0.559 sec.
    yes

Fig. 2. An example of output of the FOIL-I system

3.1 FOIL's algorithm

      Initialization
    1   theory := null program
    2   remaining := all tuples belonging to target relation R
    3 While remaining is not empty
    4   clause := "R(A, B, ...) :-"
    5   While clause covers tuples known not to belong to R
    6     Find appropriate literal(s) L
    7     add L to body of clause
    8   Remove from remaining tuples in R covered by clause
    9   Add clause to theory

Fig. 3. Outline of FOIL algorithm
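The outline of Fig. 3 can be made concrete with a deliberately small Python sketch. It is an illustration under simplifying assumptions, not Quinlan's implementation: examples are single integers, every candidate literal is a unary test that introduces no new variables (so the number of positive tuples kept by a literal plays the role of $T^{++}$), logarithms are taken to base 2, and the names `foil`, `info` and `LITERALS` are hypothetical.

```python
import math

def info(pos, neg):
    """I(T) = -log2(T+ / (T+ + T-)); requires pos > 0."""
    return -math.log2(pos / (pos + neg))

def foil(positives, negatives, literals):
    theory = []                                   # line 1
    remaining = set(positives)                    # line 2
    while remaining:                              # line 3
        body = []                                 # line 4: body-less clause
        pos, neg = set(remaining), set(negatives)
        while neg:                                # line 5
            best, best_gain = None, 0.0
            for name, test in literals.items():   # line 6
                p = {x for x in pos if test(x)}
                n = {x for x in neg if test(x)}
                if not p:                         # covers no positive tuple
                    continue
                gain = len(p) * (info(len(pos), len(neg)) - info(len(p), len(n)))
                if gain > best_gain:
                    best, best_gain = name, gain
            if best is None:
                raise ValueError("no literal with positive gain")
            body.append(best)                     # line 7
            pos = {x for x in pos if literals[best](x)}
            neg = {x for x in neg if literals[best](x)}
        remaining -= pos                          # line 8
        theory.append(body)                       # line 9
    return theory

# Toy target: "x is even or x > 7" on the integers 0..9.
LITERALS = {
    "even": lambda x: x % 2 == 0,
    "gt7": lambda x: x > 7,
    "lt3": lambda x: x < 3,
}
THEORY = foil({0, 2, 4, 6, 8, 9}, {1, 3, 5, 7}, LITERALS)
```

On this toy sample the greedy loop first picks the literal even, which covers five positive examples and no negative ones, and then needs a second clause with gt7 for the remaining example 9.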

Figure 3 shows an outline of FOIL's algorithm. FOIL starts with a null program (line 1) and repeats a loop in which it searches for a clause that covers only positive examples, until every positive example is covered by the set of clauses found (lines 3-9). In each iteration of the loop FOIL starts the search from the most general clause, a body-less clause (line 4), and specializes it until it covers no tuples known not to belong to the target relation (lines 5-7). Specialization is achieved by adding a literal to the body of the clause (line 7). Literals to be added are evaluated from an information-theoretic viewpoint: a literal scores highly if adding it makes the clause cover a larger proportion of positive tuples than before. When a clause covers no negative tuples, it is chosen as a part of the definition of the target relation (line 9). Positive examples covered by the found clause are removed from the sample (line 8), and FOIL returns to the loop with the remaining sample.

3.2 `Unknown' examples

FOIL-I does not make the Closed World Assumption. A negative sample should be given explicitly. Any example that is not given is known to be neither positive nor negative. We call such examples unknown examples. The existence of unknown examples affects the following points.

1. The evaluation function of FOIL should be reconsidered. In FOIL, literals are evaluated by an information-based function Gain, which is calculated from the numbers of positive and negative examples covered by a candidate literal. This function is reconsidered taking the number of unknown examples into account. Another evaluation measure is also introduced.
2. Some small samples may let a system induce incorrect definitions, because small samples can be covered by unexpected clauses. Plausible literals hence do not always give the correct answer. We should reconsider the search strategy.
3. FOIL allows a target relation to be used recursively in an induced clause, but its treatment of recursive clauses is not sufficient for incomplete samples. The treatment depends on the given sample, and the size of the sample affects its effectiveness. We examine the treatment of recursive clauses in FOIL and give another procedure.

These points are discussed and addressed in the following sections.

4 Induction from incomplete samples

4.1 Evaluation function

The algorithm needs an evaluation function for literals to control the search. As we will see below, FOIL uses an evaluation function Gain to evaluate literals, while FOIL-I uses an evaluation function to evaluate partial clauses, because FOIL-I searches partial clauses instead of literals (see Sec. 4.2). FOIL-I's evaluation function is composed of two parts. The first is the function Gain of FOIL, which gives information-based estimates of literals; its effectiveness for incomplete samples is examined below. The second part is an encoding complexity measure.

Information-based evaluation. We consider a target relation $R(x_1, \ldots, x_m)$. First the most general clause, i.e. the body-less clause `$R(x_1, \ldots, x_m)$ :-', is considered. Let $T$ denote a given sample of $R$, and let $T^+$ and $T^-$ denote the numbers of positive and negative examples, respectively. When a literal $L(x_{i_1}, \ldots, x_{i_j})$ is added to the clause, an example $(a_1, \ldots, a_m)$ in the sample $T$ is expanded to examples $(a^1_1, \ldots, a^1_m, a^1_{m+1}, \ldots, a^1_{m+k}), \ldots, (a^n_1, \ldots, a^n_m, a^n_{m+1}, \ldots, a^n_{m+k})$ if

1. $(a^t_{i_1}, \ldots, a^t_{i_j})$ is a ground instance of the literal $L(x_{i_1}, \ldots, x_{i_j})$ for $t = 1, \ldots, n$, and
2. $a^t_1 = a_1, \ldots, a^t_m = a_m$ for $t = 1, \ldots, n$,

where $\{i_1, \ldots, i_j\} \cup \{1, \ldots, m\} = \{1, \ldots, m+k\}$. Examples expanded from positive (negative) examples constitute the new positive (negative) examples. $T'$ denotes the new sample expanded by the literal, and $T^{++}$ denotes the number of positive examples in $T$ that are expanded to at least one new example in $T'$. In Quinlan's FOIL the literal $L$ is evaluated by an evaluation function Gain, defined as

$$\mathrm{Gain}(L) = T^{++} \cdot (I(T) - I(T')), \qquad (1)$$

where the function $I(\cdot)$ is defined as

$$I(T) = -\log \frac{T^+}{T^+ + T^-}. \qquad (2)$$
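As a hedged numerical illustration of Eqs. (1) and (2), the following Python sketch computes Gain from example counts. The function names are hypothetical, and two assumptions are made: the logarithm is taken to base 2, and the counts are read off the first step of the session in Fig. 2, where 6 positive and 5 negative tuples are assumed to remain once the determinate literal component is added (the negative example (c,[]) has no binding for component), and the candidate literal eqatom keeps 3 positive and 0 negative tuples.

```python
import math

def info(t_pos, t_neg):
    """I(T) of Eq. (2), with a base-2 logarithm (an assumption)."""
    return -math.log2(t_pos / (t_pos + t_neg))

def gain(t_pos, t_neg, new_pos, new_neg, t_pp):
    """Gain(L) of Eq. (1).

    t_pos, t_neg    : T+ and T- before adding the literal
    new_pos, new_neg: positive and negative counts of the expanded sample T'
    t_pp            : T++, positives of T that survive into T'
    """
    return t_pp * (info(t_pos, t_neg) - info(new_pos, new_neg))

# Counts assumed from the first step of Fig. 2 (see the lead-in).
GAIN_EQATOM = gain(6, 5, 3, 0, 3)
```

With these counts the gain is about 2.623: each of the three surviving positive examples contributes the $-\log_2(6/11) \approx 0.874$ bits of information that the literal removes.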

To apply the function Gain to incomplete samples we consider unknown examples. Some unknown examples are potentially positive. Let $T_I^+$ ($T_I^-$) denote the number of all positive (negative) examples, including both the given positive (negative) examples and the potentially positive (negative) ones. $T_I^+$ is estimated as

$$T_I^+ \approx T^+ + W \cdot T^?, \qquad (3)$$

where $T^?$ denotes the number of unknown examples and $W$ is the rate of positive examples among all examples. $W$ is approximated by

$$W = \frac{T^+}{T^+ + T^-}. \qquad (4)$$

Using $T_I^+$ and $T_I^-$, and noting that $T_I^+ + T_I^- = T^+ + T^- + T^?$, $I(T)$ is estimated as follows.

$$
\begin{aligned}
I(T) &= -\log \frac{T_I^+}{T_I^+ + T_I^-} \\
     &\approx -\log \frac{T^+ + W \cdot T^?}{T^+ + T^- + T^?} \\
     &= -\log \frac{T^+ + (T^+ / (T^+ + T^-)) \cdot T^?}{T^+ + T^- + T^?} \\
     &= -\log \frac{(T^+ T^+ + T^+ T^- + T^+ T^?) / (T^+ + T^-)}{T^+ + T^- + T^?} \\
     &= -\log \frac{T^+}{T^+ + T^-}
\end{aligned}
$$

Hence Gain for incomplete samples can be approximated by the Gain of FOIL. As a side effect, unknown examples need not be kept, which is an advantage in calculation time.
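The cancellation in this derivation can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, a base-2 logarithm is assumed, and the counts are taken from the input file of Fig. 1, whose domain contains 4 x 16 = 64 tuples, of which 6 positive and 6 negative examples are given, leaving 52 unknown.

```python
import math

def info(t_pos, t_neg):
    """I(T) of Eq. (2) computed from the given examples only."""
    return -math.log2(t_pos / (t_pos + t_neg))

def info_with_unknowns(t_pos, t_neg, t_unknown):
    """I(T) estimated through Eqs. (3) and (4)."""
    w = t_pos / (t_pos + t_neg)            # Eq. (4)
    ti_pos = t_pos + w * t_unknown         # Eq. (3)
    total = t_pos + t_neg + t_unknown      # T_I+ + T_I-
    return -math.log2(ti_pos / total)

WITH_UNKNOWNS = info_with_unknowns(6, 6, 52)
WITHOUT_UNKNOWNS = info(6, 6)
```

Both values equal 1 bit here, and they agree for any number of unknown examples, which is why the unknown examples never have to be stored.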

Encoding complexity-based evaluation. FOIL uses a kind of encoding length heuristic [Qui90] to deal with noisy domains. In some practical situations it is very difficult to give a precise definition for a given sample. The purpose of the heuristic is to give a criterion for stopping the lengthening of clauses to fit the sample, by ignoring small exceptions in noisy domains. FOIL-I also uses a kind of encoding length heuristic, but it differs in two points. The first is its purpose. In FOIL-I the encoding length heuristic is not explicitly intended for the treatment of noise; its main purpose is to control the search. FOIL-I searches partial clauses instead of literals, and clauses with different lengths are kept in the same list. Short clauses should be visited before lengthy ones. Second, the encoding length heuristic is implemented as an evaluation function together with another heuristic, a variable-count heuristic, which measures the number of variables in a clause. An evaluation function for this purpose is defined by the following function Complex.

$$\mathrm{Complex}(C) = \log\bigl(\{\text{the number of body literals in } C\} + 1\bigr) + \log\bigl(\{\text{the number of variables in } C\} + 1\bigr) \qquad (5)$$

In FOIL-I the following function Eval, a composition of Gain and Complex, is used:

$$\mathrm{Eval}(C) = \mathrm{Gain}(L) - \mathrm{Complex}(C), \qquad (6)$$

where $L$ is the literal in $C$ that is newly chosen.

4.2 Search strategy

FOIL's algorithm is basically a greedy method, which means that a literal once chosen by the system is hardly ever retracted. In the case of incomplete samples, however, it is difficult to make a confident choice, so the choice should be postponed. FOIL-I instead uses best-first search with restricted expansion at each search step, which is closer to the search algorithm of Progol [Mug95]. Figure 4 shows the algorithm of FOIL-I.

       Initialization
     1   theory := null program
     2   remaining := all tuples belonging to target relation R
     3 While remaining is not empty
     4   clauses := {"R(A, B, ...) :-"}
     5   While there is no clause in clauses that does not cover any tuples
           known not to belong to R (if there exist, let them be found-clauses)
     6     Choose the best clause clause from clauses if clauses is not empty,
             and algorithm stops in fail if it is empty
     7     Find appropriate literals {L1, ..., Ln}.
     8     Add each literal of {L1, ..., Ln} to body of clause to make
             clauses {clause1, ..., clausen}
     9     Evaluate {clause1, ..., clausen}
    10     Choose the best m clauses from {clause1, ..., clausen} ∪ clauses
             and update clauses to them
    11   Add found-clauses to theory
    12   Remove from remaining tuples in R covered by found-clauses

Fig. 4. Outline of FOIL-I algorithm

In the same way as FOIL, FOIL-I starts with a null program (line 1) and repeats a loop to find clauses that cover some positive examples but no negative examples (lines 3-12). Learning stops when every positive example is covered by a found clause (line 3). In each iteration of the loop, the algorithm keeps partial clauses in clauses. At first the most general clause "R(A, B, ...) :-" is kept in clauses as a partial clause (line 4), and the algorithm enters a loop to construct clauses (lines 5-10). The partial clause with the best evaluation (i.e. the most general clause at the beginning) is picked out from clauses (line 6). FOIL-I generates possible literals to add to the body of the best partial clause (lines 7 and 8). The new partial clauses made from the generated literals are evaluated by the evaluation function (line 9). If some of them cover at least one positive example and no negative examples, FOIL-I adds them to theory (line 11), removes the positive examples covered by the found clauses from the sample (line 12), and moves to the next iteration of the loop (returning to line 3). Otherwise FOIL-I sorts the evaluated clauses and the clauses in clauses by their evaluation values and chooses the m best partial clauses³ from them (line 10), whereas FOIL chooses only the best literal. None of the m clauses contains a literal that covers no positive examples. The loop is repeated with the best clause from clauses. FOIL-I does not distinguish the new clauses from the others: the new clauses and the partial clauses already kept have the same chance of being chosen.

³ By default, m is ten.

The major changes in the algorithm of FOIL-I from FOIL include the following points.

1. FOIL-I searches clauses of different lengths simultaneously. All partial clauses are kept in clauses regardless of their lengths.
2. More than one clause can be found at the same time.

FOIL in principle prohibits the introduction of a zero-gain literal, but some kinds of literals are useful even if they are zero-gain. [QCJ93] characterizes such literals using the idea of determinate literals, originally proposed in [MF92]. A determinate literal is a literal that introduces new variables which have exactly one binding for every positive example and for most negative examples. For example, component(X2, X3, X4) gives zero gain to the following clause, because it only expands X2 to X3 and X4, but X3 and X4 may be useful for constructing a definition of the target relation member.

    member(X1, X2) :- .    (7)
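The search loop of Fig. 4, driven by Eval of Eq. (6), can be sketched in Python as follows. This is a simplified illustration, not FOIL-I itself: examples are single integers, literals are unary tests that introduce no new variables (so $T^{++}$ is simply the number of positive examples kept), the number of variables `n_vars` is a fixed parameter, determinate literals and the redundancy check are left out, and all names are hypothetical. Base-2 logarithms are assumed; with that choice the helper `evaluate` gives about -1.283 for the partial clause containing eqatom in Fig. 2 (two body literals, four variables), assuming that 6 positive and 5 negative tuples remain once component is added.

```python
import math

def info(pos, neg):
    """I(T) of Eq. (2); base-2 logarithm is an assumption."""
    return -math.log2(pos / (pos + neg))

def complexity(n_body, n_vars):
    """Complex(C) of Eq. (5)."""
    return math.log2(n_body + 1) + math.log2(n_vars + 1)

def evaluate(par_pos, par_neg, pos, neg, n_body, n_vars):
    """Eval(C) of Eq. (6); T++ equals pos because no new variables appear."""
    gain = pos * (info(par_pos, par_neg) - info(pos, neg))
    return gain - complexity(n_body, n_vars)

def foil_i(positives, negatives, literals, m=10, n_vars=1):
    theory, remaining = [], set(positives)
    while remaining:                                   # line 3
        # Each entry: (evaluation, body, covered positives, covered negatives).
        clauses = [(0.0, [], set(remaining), set(negatives))]   # line 4
        found = []
        while not found:                               # line 5
            if not clauses:
                raise ValueError("search failed: no partial clause left")
            clauses.sort(key=lambda c: c[0], reverse=True)
            _, body, pos, neg = clauses.pop(0)         # line 6: best clause
            for name, test in literals.items():        # lines 7-9
                if name in body:
                    continue
                p = {x for x in pos if test(x)}
                n = {x for x in neg if test(x)}
                if not p:                              # covers no positive example
                    continue
                score = evaluate(len(pos), len(neg), len(p), len(n),
                                 len(body) + 1, n_vars)
                new = (score, body + [name], p, n)
                (clauses if n else found).append(new)
            clauses.sort(key=lambda c: c[0], reverse=True)
            del clauses[m:]                            # line 10: keep the m best
        for _, body, p, _ in found:                    # lines 11-12
            theory.append(body)
            remaining -= p
    return theory

LITERALS = {
    "even": lambda x: x % 2 == 0,
    "odd": lambda x: x % 2 == 1,
    "gt5": lambda x: x > 5,
    "lt3": lambda x: x < 3,
}
THEORY = foil_i({1, 6, 8, 10}, {0, 2, 3, 4, 7, 9}, LITERALS)
```

On this toy sample no single literal excludes all negative examples, so the search keeps several one-literal partial clauses, expands the best one (gt5) and finds the clause gt5, even; a second iteration finds lt3, odd for the remaining example 1.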

A determinate literal sometimes reveals attributes of another variable, such as the head and the tail of a given list. FOIL-I introduces all possible determinate literals before each learning loop, i.e. before Step 7. When a clause is completed, any determinate literal that is not referred to by other literals is removed.

4.3 Induction of recursive clauses

Using recursive definitions makes ILP systems more powerful. FOIL-I induces recursive definitions by choosing the target predicate in the same manner as other predicates. However, in the case of incomplete samples the expansion of examples cannot be treated in the same way as for other literals. When a target relation is chosen, Quinlan's FOIL expands examples using the given sample itself, because it is an extensional definition of the target relation. This expansion is the same as the expansion for other relations. For example, suppose that FOIL finds the clause

    member(A, B) :- component(B, C, D), member(A, D).    (8)

while learning the target relation member. Then an example (b, [a, b, c]) may be expanded and checked to be covered by this clause using the example (b, [b, c]) in the given positive sample. But this is not effective for incomplete samples, because the given sample is sometimes too small to expand, e.g. the positive example (b, [b, c]) may not be included in the given positive sample. Another method of expanding a sample with a recursive clause is to calculate the relation for each example in the sample by using the recursive clause. For example, (b, [a, b, c]) can be checked by calculation using clause (8) together with the following clause, induced earlier:

    member(A, B) :- component(B, A, C).    (9)
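The difference between the two ways of checking a recursive literal can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than FOIL-I's code: Prolog lists are modelled as tuples, the positive sample is the one of Fig. 1, and the function names are hypothetical. `covered_by_lookup` checks clause (8) by looking the recursive call up in the given positive sample, while `member_by_clauses` computes the relation from clauses (9) and (8).

```python
# Positive sample of Fig. 1; Prolog lists are modelled as tuples.
POSITIVE = {
    ("a", ("b", "a", "c")), ("b", ("c", "b", "a")), ("c", ("c", "a")),
    ("b", ("a", "b")), ("a", ("a", "c", "b")), ("b", ("b",)),
}

def covered_by_lookup(a, lst, positive_sample):
    """Clause (8) with the recursive literal looked up in the sample."""
    if not lst:                      # component(B, C, D) fails on []
        return False
    return (a, lst[1:]) in positive_sample

def member_by_clauses(a, lst):
    """Compute member(a, lst) from clauses (9) and (8)."""
    if not lst:                      # component fails on [] in both clauses
        return False
    head, tail = lst[0], lst[1:]
    # Clause (9): a is the head; clause (8): recurse on the tail.
    return a == head or member_by_clauses(a, tail)
```

With this sample the lookup succeeds for (b, [a, b]) because (b, [b]) is a given positive example, but it fails for (b, [a, b, c]) because (b, [b, c]) is not given; calculation with the two clauses covers both.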

This method, i.e. calculation using recursive clauses, is effective for small samples, but not while a recursive clause is still unfinished. This happens when a recursive definition is not tail recursive or when more than one recursive clause is necessary to define the target relation. FOIL-I takes the following procedure to expand an example by a recursive literal.

1. To check a recursive call, FOIL-I checks whether the tuple of values bound by the recursive call is in the given sample.
2. If this fails, FOIL-I performs SLD-resolution using the recursive clause together with the clauses induced before.

This method is very effective for small incomplete samples and does not reduce efficiency for complete samples either. There are other points concerning the implementation of recursive definitions; they are described in Sec. 5.3.

5 Implementation

The current version of FOIL-I is implemented in SICStus Prolog version 2. Compared with Quinlan's FOIL6, FOIL-I takes longer to perform induction; one reason is that it is implemented in Prolog. The source code will be available on request soon. In the following sections we discuss implementation issues. The first section concerns redundant clauses and gives a method to detect them correctly. The second concerns ways to cut down calculation time, for which several topics are discussed. The final section discusses the treatment of recursive clauses.

Fig. 5. A situation where a clause c1 is mistaken to be redundant against a clause c2 (the left) and a situation where c1 is really redundant against c2 (the right). +, - and ? denote a positive example, a negative one and an unknown one, respectively.

5.1 Removing redundant clauses

In a theory found by the system, a clause is redundant if every instance of the target relation covered by the clause is also covered by other clauses of the theory. In the FOIL-I algorithm there are two cases in which redundant clauses can be induced. In the first case, more than one clause is induced in a single iteration of the learning loop, and some of the induced clauses may be redundant. The second case arises across iterations of the loop: a clause is induced in one iteration and another is induced in a later iteration. The first clause may then be unnecessary, that is, every positive example covered by the first clause may be covered by the second one. Note that the second clause is always necessary; otherwise it would cover none of the remaining examples and so would not have been induced. FOIL-I checks for redundant clauses and removes them in the first case.

To find redundant clauses it is not sufficient to check the subset relation between the positive examples covered by the clauses, because there are many unknown examples. Figure 5 illustrates this for two clauses. The left part of Fig. 5 shows a situation in which a clause c1 is mistaken to be redundant against a clause c2. In the right part, c1 is really redundant against c2. For each induced clause, FOIL-I calculates the set of all ground instances defined by the clause within the given finite domain. Instead of the subset relation between the positive examples covered by clauses, FOIL-I checks the subset relation between these sets of ground instances. In the left case of Fig. 5, FOIL-I therefore does not take c1 to be a redundant clause. Redundant clauses in the second case can be removed in the same way. This is, however, an expensive task, because the redundancy check has to be done for all combinations of clauses. In this version of FOIL-I the redundancy check for the second case is omitted.
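The two redundancy checks contrasted in Fig. 5 can be sketched in Python. The sketch is illustrative only: clauses are modelled as Boolean functions over (atom, list) pairs, lists are tuples over the atoms a, b and c with distinct elements, and the clauses `c1` ("the element is the head of the list") and `c2` ("the element is the last element of the list") as well as all function names are hypothetical.

```python
from itertools import permutations, product

ATOMS = ["a", "b", "c"]
LISTS = [p for k in range(4) for p in permutations(ATOMS, k)]

def ground_instances(clause):
    """All tuples of the finite domain that the clause defines."""
    return {(x, lst) for x, lst in product(ATOMS, LISTS) if clause(x, lst)}

def redundant_by_positives(clause, others, positives):
    """Naive check: only the given positive examples are compared."""
    covered = [e for e in positives if clause(*e)]
    return all(any(o(*e) for o in others) for e in covered)

def redundant_by_ground_instances(clause, others):
    """Check on ground instances, so unknown examples are taken into account."""
    rest = set().union(*(ground_instances(o) for o in others))
    return ground_instances(clause) <= rest

c1 = lambda x, lst: len(lst) > 0 and lst[0] == x     # x is the head
c2 = lambda x, lst: len(lst) > 0 and lst[-1] == x    # x is the last element
GIVEN_POSITIVES = [("b", ("b",)), ("c", ("c",))]
```

On the two given positive examples both clauses behave identically, so the naive check reports c1 as redundant against c2; the check on ground instances does not, because c1 also defines tuples such as (a, [a, b]) that c2 does not.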

5.2 Cutting down calculation

Mode. FOIL-I uses modes to specify whether each argument is used as input or output. A mode is specified as a list of `+' or `-' for the target relation and the background relations in the sample file given to the system. `+' (`-') specifies that the corresponding argument is an input (output) argument. Using modes helps to restrict the possible literals to be added to the body of a clause. A literal $(\neg)L(x_{i_1}, \ldots, x_{i_n})$ is a candidate to be added to a partial clause $c$ only if

1. the predicate $L$ has a mode $mode = (m_1, \ldots, m_n)$, where $m_j$ = `+' or `-' ($j = 1, \ldots, n$),
2. the partial clause $c$ has variables $\{x_1, \ldots, x_k\}$, and
3. for every variable $x_{i_l}$ in $\{x_{i_1}, \ldots, x_{i_n}\} - \{x_1, \ldots, x_k\}$, $m_l$ = `-'.

Using modes saves the computation time that would otherwise be spent searching useless literals.

Type. Specifying the type of a predicate also helps to reduce computation time and memory. A type is also specified for the target relation and the background relations in a sample file, as a list of domain names. The name at each place specifies the domain from which the argument at that place can take a value. The type of a predicate can be extended to the type of a partial clause as follows.

1. If a partial clause has no body literal and has only a target predicate as its head literal $P(x_{i^1_1}, \ldots, x_{i^1_{n_p}})$, and the predicate $P$ has type $(d^p_1, \ldots, d^p_{n_p})$, then the type $type_1$ of the partial clause is

$$type_1 = (d^p_{i^1_1}, \ldots, d^p_{i^1_{n_p}}). \qquad (10)$$

2. If a partial clause $c$ has variables $x_1, \ldots, x_{m_k}$ and a type $type_k = (d^k_1, \ldots, d^k_{n_k})$, a literal $l = (\neg)L(x_{i^k_1}, \ldots, x_{i^k_{n_k}})$ is chosen to be added to $c$, and $L$ has a type $(d^L_1, \ldots, d^L_{n_L})$, then the type of the new partial clause made from $c$ by adding $l$ to the body is

$$type_{k+1} = (d^k_1, \ldots, d^k_{n_k}, d^L_{j_1}, \ldots, d^L_{j_{m_k}}), \qquad (11)$$

where $\{x_{j_1}, \ldots, x_{j_{m_k}}\} = \{x_{i^k_1}, \ldots, x_{i^k_{n_k}}\} - \{x_1, \ldots, x_{n_k}\}$ and $x_{j_1}, \ldots, x_{j_{m_k}}$ occur in $l$ in this order.

The definition above is well-defined only if the literals added to a clause satisfy the candidate condition given below. The specification of types has two effects that reduce computation. The first is to reduce the number of candidate literals to be added to a partial clause, as with modes. The second is to reduce the number of tuples generated from positive or negative examples by expansion with new literals. The first effect is obtained as follows. A literal $(\neg)L(x_{i_1}, \ldots, x_{i_n})$ is a candidate to be added to a partial clause $c$ only if

1. the predicate $L$ has a type $type' = (d'_1, \ldots, d'_n)$, where $d'_j$ is a domain name ($j = 1, \ldots, n$),
2. the partial clause $c$ has variables $\{x_1, \ldots, x_k\}$ and a type $type = (d_1, \ldots, d_k)$, and
3. for every variable $x_{i_l}$ in $\{x_{i_1}, \ldots, x_{i_n}\} \cap \{x_1, \ldots, x_k\}$, $d_{i_l} = d'_l$.

Using types, FOIL-I avoids searching useless literals. As the second effect, types reduce the number of tuples: the set of possible tuples for a partial clause is the product of the domains in the type of the clause.

5.3 Induction of recursive clauses

Three points on the implementation of recursive definitions are described here. First, to induce recursive definitions correctly, FOIL-I does not induce a recursive clause before it has induced a non-recursive clause, because a recursive definition always requires at least one base clause. Second, when FOIL-I calculates and finds a tuple to be an instance of the target relation during the expansion of an example, the tuple is saved in the system. The saved tuples are used in the same way as given positive examples, which saves processing time in the calculation of recursive clauses. Finally, we have to note that a recursive clause may not yet be complete. Another literal may be needed at the tail of the clause, or even another clause may be required, and it and the recursive clause may call each other. In these cases, calculating the coverage of examples with the recursive clause alone is insufficient: after a clause is completed or after another clause is induced, the coverage may change. FOIL-I therefore checks the coverage again using all induced clauses after a clause is completed or after another clause is induced.

6 Experimental results

In this section we show experimental results comparing FOIL-I with two ILP systems, FOIL and Progol [Mug95]. We used FOIL6.3 and CProgol Version 4.1, both written in C. In general FOIL6.3 and CProgol are much faster than our system FOIL-I; we compare the systems' ability to cope with incomplete samples. Experiments were done for the target relations member, which checks whether an element is a member of a list; last, which gives the last element of a list; length, which gives the length of a list; and nth, which gives the n-th element of a list. Table 1 shows the background knowledge used to induce these relations and the numbers of all positive and negative examples for them. A 10% sample of member, for example, includes three positive examples and three negative examples. Fifty samples for each percentage and each target relation are generated randomly and automatically, and are given to FOIL-I, FOIL6 and Progol.
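The way such incomplete samples might be drawn can be sketched in Python. The paper does not describe the sampling procedure in detail, so the following is an assumption: the same percentage is applied separately to the positive and the negative sample, the resulting size is truncated to an integer, and examples are drawn without replacement. The function name and the placeholder examples are hypothetical; the sizes 33 and 31 are those given for member in Table 1.

```python
import random

def incomplete_sample(positives, negatives, percent, rng):
    """Draw a percent% incomplete sample without replacement."""
    n_pos = int(len(positives) * percent / 100)   # truncation is an assumption
    n_neg = int(len(negatives) * percent / 100)
    return rng.sample(positives, n_pos), rng.sample(negatives, n_neg)

# Placeholder examples standing in for the 33 positive and 31 negative
# examples of member.
ALL_POS = [("pos", i) for i in range(33)]
ALL_NEG = [("neg", i) for i in range(31)]
POS, NEG = incomplete_sample(ALL_POS, ALL_NEG, 10, random.Random(0))
```

With these sizes a 10% sample contains three positive and three negative examples, which matches the figures quoted for member.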

Table 1. Target relations and background knowledge tested

    target     background knowledge                                   positive   negative
    relation                                                          examples   examples
    member     component([A|B], A, B)                                     33         31
    last       component([A|B], A, B), null([ ])                          39        121
    length     component([A|B], A, B), pred(X-1, X), zero(0), null([ ])   15         45
    nth        component([A|B], A, B), pred(X-1, X), zero(0)              33        223

Table 2 shows results the of induction of their target relations by the three systems with the incomplete samples. The numbers in the table are the numbers of incomplete samples with which the systems induce correct answers. Some answers include redundant de nitions even if they work expectedly. The numbers of samples with which the systems gives exactly correct de nitions without redundant clauses are shown as well. You can see that FOIL-I is better than FOIL6 and Progol in the point that FOIL-I can induce correct de nitions in more cases than others. In particular, FOIL-I is superior to others in lower percentage's samples. Another superiority of FOIL-I is to induce redundant clauses in few cases. Indeed Progol induced very few redundant clauses, but it did not mark high points in lower percentages. Table 2 shows the average runtime of induction processes as well. FOIL-I is ten times slower than others. A reason of this large runtime is that the system is written in Prolog language. The runtime includes processing time to expand intensional de nitions of background relations to extension de nitions. In the experiments, all de nitions that can correctly induces all instances of a target relation are classi ed as correct de nitions. FOIL6, however, induces correct but unexpected de nitions, for example: : 0 component(B; A; C ): memb(A; B ) : 0 component(B; C; D ); component(D; A; E ): (12) memb(A; B ) : 0 component(B; C; D ); component(D; E; F ); memb(A; F ):; memb(A; B )

which is not induced by FOIL-I. As can be seen in Table 2, the systems fail almost completely to induce the relation length at lower percentages. The individual outputs of the systems indicate that many of these failures are caused by the lack of positive examples that suggest the base case of a recursive definition. The relation length has 15 positive examples and 45 negative ones. Only one example, ([ ], 0), of the 15 positive examples gives the base clause of length, while in the case of member 15 of the 33 examples give the base clause. Even in the case of last, 3 of the 39 examples give a base definition. To compare the ability of FOIL-I with the others more adequately, we ran an experiment with samples of length that were made from the same samples

Table 2. Fifty randomly selected (7%,) 10%, 20%, 30%, 50%, and 80% incomplete samples are given to FOIL-I, FOIL6, and Progol. The numbers shown are those of samples with which FOIL-I, FOIL6 and Progol induced correct definitions that may include redundant clauses. The numbers of cases which include no redundancy are shown as well. Runtimes are the average time of induction processes that induced correct definitions.

                   FOIL-I                         FOIL6                          Progol
target    samples  correct  no          runtime   correct  no          runtime   correct  no          runtime
relation           answer   redundancy  (msec)    answer   redundancy  (msec)    answer   redundancy  (msec)
member    80%      50       50          961       41       6           107       26       26          359
          50%      50       50          764       36       5           100       20       20          314
          30%      50       50          620       16       1           100       5        5           148
          20%      49       49          566       8        5           100       2        2           260
          10%      38       38          493       3        3           100       0        0           -
          7%       22       22          462       0        0           100       0        0           -
last      80%      50       50          4400      45       2           198       21       17          285
          50%      44       44          3306      24       3           267       6        3           250
          30%      33       33          2580      25       7           232       0        0           -
          20%      23       23          2138      13       5           146       0        0           -
          10%      2        1           2050      2        2           100       0        0           -
length    80%      38       31          1810      38       12          113       39       39          87
          50%      18       18          1068      18       11          138       13       13          65
          30%      6        6           762       4        4           100       0        0           -
          20%      4        4           613       2        2           100       0        0           -
          10%      0        0           -         0        0           -         0        0           -
nth       80%      50       50          3544      0        0           -         43       43          1310
          50%      49       49          2191      6        0           350       43       43          809
          30%      46       46          1559      5        2           160       19       19          510
          20%      27       27          1313      0        0           -         0        0           -
          10%      1        1           1080      0        0           -         0        0           -
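Table 2 counts a definition as correct whenever it derives all instances of the target relation, even an unexpected one such as the FOIL6 definition (12) of memb. The following Python sketch checks (12) against ordinary list membership. It rests on our own assumptions: lists are modelled as Python tuples, component(L, H, T) is read as L = [H|T], and the three-element domain and the maximum list length of 4 are chosen for illustration, not taken from the paper.

```python
from itertools import product

def component(lst):
    """Yield (head, tail) with lst = [head|tail]; yields nothing for []."""
    if lst:
        yield lst[0], lst[1:]

def memb12(a, b):
    """Membership test following the three clauses of definition (12)."""
    # Clause 1: memb(A,B) :- component(B,A,C).
    for head, _c in component(b):
        if head == a:
            return True
    # Clause 2: memb(A,B) :- component(B,C,D), component(D,A,E).
    for _c, d in component(b):
        for head, _e in component(d):
            if head == a:
                return True
    # Clause 3: memb(A,B) :- component(B,C,D), component(D,E,F), memb(A,F).
    for _c, d in component(b):
        for _e, f in component(d):
            if memb12(a, f):
                return True
    return False

# Hypothetical finite domain: all lists of length 0..4 over {0, 1, 2}.
domain = (0, 1, 2)
lists = [lst for n in range(5) for lst in product(domain, repeat=n)]
all_agree = all(memb12(a, b) == (a in b) for a in domain for b in lists)
print(all_agree)
```

On this domain the definition agrees with list membership: it inspects the first two elements and then recurses on the list with two elements removed, which is why it is correct although it is not the usual two-clause form.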

Table 3. The results of the target relation length with an example ([ ], 0).

                        FOIL-I                         FOIL6                          Progol
target    samples       correct  no          runtime   correct  no          runtime   correct  no          runtime
relation                answer   redundancy  (msec)    answer   redundancy  (msec)    answer   redundancy  (msec)
length    80%+([ ], 0)  49       40          1901      49       19          108       50       50          86
          50%+([ ], 0)  41       41          1100      30       21          133       37       37          72
          30%+([ ], 0)  30       29          769       26       14          108       3        3           50
          20%+([ ], 0)  18       17          674       9        5           100       2        2           30
          10%+([ ], 0)  5        5           572       0        0           -         0        0           -
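The role of base-case examples can be made concrete with a small calculation. The sketch below estimates how likely a random sample is to contain at least one positive example that gives the base clause, using the counts stated in the text (15 of 33 for member, 3 of 39 for last, 1 of 15 for length). It assumes uniform sampling without replacement, which the paper does not state explicitly, and the function name is ours.

```python
from math import comb

def prob_base_example(n_pos, n_base, k):
    """Probability that k positives drawn without replacement from n_pos
    include at least one of the n_base base-case examples."""
    # Complement of drawing all k examples from the non-base positives.
    return 1 - comb(n_pos - n_base, k) / comb(n_pos, k)

# (all positive examples, positive examples that give the base clause)
relations = {"member": (33, 15), "last": (39, 3), "length": (15, 1)}

# Probability for samples that contain k = 1, 2, 3 positive examples.
for name, (n_pos, n_base) in relations.items():
    probs = [round(prob_base_example(n_pos, n_base, k), 3) for k in (1, 2, 3)]
    print(name, probs)
```

For length the probability is simply k/15, so a sample with two positive examples contains ([ ], 0) only about 13% of the time, whereas a member sample with three positive examples contains a base-case example about 85% of the time.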

used before by adding the positive example ([ ], 0). Table 3 shows the results of this experiment. FOIL-I, FOIL6 and Progol all gave better results than with the samples before adding ([ ], 0). FOIL-I is better than the others in this experiment as well, particularly on samples of low percentages.

7 Conclusions

To reduce the number of examples that must be given to an ILP system, we have developed a FOIL-like system, FOIL-I. FOIL-I can deal with a large number of unknown examples. To deal with unknown examples, FOIL-I uses a best-first search with an evaluation function that estimates the information-based utility of literals and their encoding complexity. FOIL-I can also approximately check the redundancy of clauses, based on subset relations between the sets of tuples covered by the clauses. We implemented this check only partially because it is time-expensive. Even with the restricted implementation of the redundancy check, FOIL-I avoids redundant definitions in many cases. Another special point is the careful treatment of recursive clauses in FOIL-I, which is necessary to deal with small incomplete samples.

Experiments show that FOIL-I induces correct definitions of the target relations in more cases than FOIL and Progol at lower percentages. The results also show its ability to remove redundant clauses.

While FOIL's search is basically a greedy method, FOIL-I uses a best-first search, and so it has to keep many nodes, i.e., partial clauses to be visited. This makes the memory used by the algorithm very large, because every node, or partial clause, is kept together with the positive and negative examples expanded by the partial clause. The number of negative examples in particular increases rapidly, because negative examples generally share many more tuples than positive examples do.
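The idea behind a redundancy check based on subset relations between covered tuple sets can be sketched as follows. This is a simplified pairwise check under our own assumptions, not FOIL-I's actual procedure, which the text describes only as approximate and partially implemented; the clause names and tuple sets are hypothetical.

```python
def remove_redundant(clauses):
    """clauses: list of (name, set of positive tuples covered by the clause).
    Returns the names of the clauses kept."""
    kept = []
    # Visit clauses with larger coverage first, so that a clause whose
    # coverage is contained in a bigger one is the one that gets dropped.
    for name, covered in sorted(clauses, key=lambda c: len(c[1]), reverse=True):
        if any(covered <= kept_cov for _, kept_cov in kept):
            continue  # covered set is a subset of a kept clause's set
        kept.append((name, covered))
    return [name for name, _ in kept]

# Hypothetical clauses with the positive tuples each one covers.
clauses = [
    ("c1", {1, 2, 3, 4}),
    ("c2", {2, 3}),       # subset of c1's coverage, so redundant
    ("c3", {4, 5}),       # covers tuple 5, which no other clause covers
]
kept_names = remove_redundant(clauses)
print(kept_names)
```

Comparing coverage sets pairwise is cheap relative to testing logical implication between clauses, which is one plausible reading of why the check is described as approximate.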

The experiment shown in Table 3 demonstrates the importance of examples that give a base definition. Like FOIL and Progol, FOIL-I cannot give correct definitions without such examples. Inducing a correct base clause without such examples is a subject for future work.

References

[ALLM94a] D.W. Aha, S. Lapointe, C.X. Ling, and S. Matwin. Inverting implication with small training sets. In F. Bergadano and L. De Raedt, editors, Proceedings of the 7th European Conference on Machine Learning, volume 784 of Lecture Notes in Artificial Intelligence, pages 31-48. Springer-Verlag, 1994.

[ALLM94b] D.W. Aha, S. Lapointe, C.X. Ling, and S. Matwin. Learning recursive relations with randomly selected small training sets. In W.W. Cohen and H. Hirsh, editors, Proceedings of the 11th International Conference on Machine Learning, pages 12-18. Morgan Kaufmann, 1994.

[AP93] K.M. Ali and M.J. Pazzani. Hydra: A noise-tolerant relational concept learning algorithm. In R. Bajcsy, editor, Proceedings of the 13th International Joint Conference on Artificial Intelligence, pages 1064-1071. Morgan Kaufmann, 1993.

[BP91] C.A. Brunk and M.J. Pazzani. An investigation of noise-tolerant relational concept learning algorithms. In L. Birnbaum and G. Collins, editors, Proceedings of the 8th International Workshop on Machine Learning, pages 389-393. Morgan Kaufmann, 1991.

[Coh95] W. Cohen. Learning to classify English text with ILP method. In L. De Raedt, editor, Proceedings of the 5th International Workshop on Inductive Logic Programming, Scientific report, pages 3-24. Department of Computer Science, Katholieke Universiteit Leuven, 1995.

[DB92] S. Džeroski and I. Bratko. Handling noise in inductive logic programming. In S. Muggleton, editor, Proceedings of the 2nd International Workshop on Inductive Logic Programming, Report ICOT TM-1182, 1992.

[DBJ94] B. Dolšak, I. Bratko, and A. Jezernik. Finite element mesh design: An engineering domain for ILP application. In S. Wrobel, editor, Proceedings of the 4th International Workshop on Inductive Logic Programming, volume 237 of GMD-Studien, pages 305-320. Gesellschaft für Mathematik und Datenverarbeitung MBH, 1994.

[DM91] B. Dolšak and S. Muggleton. The application of ILP to finite element mesh design. In S. Muggleton, editor, Proceedings of the 1st International Workshop on Inductive Logic Programming, pages 225-242, 1991.

[Dze95] S. Džeroski. Learning first-order clausal theories in the presence of noise. In A. Aamodt and J. Komorowski, editors, Proceedings of the 5th Scandinavian Conference on Artificial Intelligence, pages 51-60. IOS, Amsterdam, 1995.

[Fur93] J. Fürnkranz. Avoiding noise fitting in a FOIL-like learning algorithm. In F. Bergadano, L. De Raedt, S. Matwin, and S. Muggleton, editors, Proceedings of the IJCAI-93 Workshop on Inductive Logic Programming, pages 14-23. Morgan Kaufmann, 1993.

[HS92] D. Hume and C. Sammut. Applying inductive logic programming in reactive environments. In S. Muggleton, editor, Inductive Logic Programming, pages 539-549. Academic Press, 1992.

[MF92] S. Muggleton and C. Feng. Efficient induction in logic programs. In S. Muggleton, editor, Inductive Logic Programming, pages 281-298. Academic Press, 1992.

[MN95] C. R. Mofizur and M. Numao. Top-down induction of recursive programs from small number of sparse examples. In L. De Raedt, editor, Proceedings of the 5th International Workshop on Inductive Logic Programming, Scientific report, pages 161-180. Department of Computer Science, Katholieke Universiteit Leuven, 1995.

[Mug95] Stephen Muggleton. Inverse entailment and Progol. New Generation Computing, 3+4:245-286, 1995.

[QCJ93] J.R. Quinlan and R.M. Cameron-Jones. FOIL: A midterm report. In P. Brazdil, editor, Proceedings of the 6th European Conference on Machine Learning, volume 667 of Lecture Notes in Artificial Intelligence, pages 3-20. Springer-Verlag, 1993.

[Qui90] J.R. Quinlan. Learning logical definitions from relations. Machine Learning, 5:239-266, 1990.
