Inductive Inference of Prolog Programs with Linear Data Dependency from Positive Data

Hiroki Arimura

Takeshi Shinohara

Department of Artificial Intelligence, Kyushu Institute of Technology, Kawazu 680-4, Iizuka 820, JAPAN. [email protected], [email protected]

Abstract

We study inductive inference of Prolog programs from positive examples of a target predicate. Shinohara showed in 1991 that the class of linear Prologs with at most m clauses is identifiable in the limit from positive data only about a target predicate. However, linear Prologs are not allowed to have local variables, which are variables occurring in the body but not in the head. In this paper, we introduce another subclass of Prolog programs, called linearly covering programs, which may have local variables in a clause under some constraints on data dependencies. In linearly covering programs, any data passing among subgoals preserves the total size of data contents. We prove that for every fixed k, m > 0, the class of linearly covering programs consisting of at most m definite clauses with at most k subgoals in the body is inferable from positive examples of a target predicate. Furthermore, we show that the restriction on the length of bodies is necessary for the inferability.

1 Introduction

Since the study on the Model Inference System (MIS, for short) by Shapiro [10], inductive inference of Prolog programs has been regarded as one of the most promising frameworks to unify concept learning and automated program synthesis. In Shapiro's system, the unknown Prolog program is identified by receiving both positive and negative examples from an oracle and sometimes asking the oracle for the truth value of a ground atom. However, when we apply such an inductive learning mechanism to a practical domain, it often faces two major difficulties. One is the difficulty of inferring only from positive examples. In practical applications, negative examples are often hard to obtain. Therefore, it is a natural requirement that an inference mechanism should work only from positive data. A learning mechanism that learns from positive data has to avoid overgeneralization. The other is the theoretical term generation problem [4, 6, 8]. To define a target predicate, Prolog programs in general use several auxiliary predicates. For example, to represent the predicate rev(·, ·) (reversal of a list), the following program uses an auxiliary predicate append(·, ·, ·) (concatenation of two lists).

    rev([], []).
    rev([A|B], D) ← rev(B, C), append(C, [A], D).


Suppose that an inference algorithm receives examples only of the target predicate rev(·, ·),

    rev([], []), rev([a, b], [b, a]), rev([a, b, c], [c, b, a]), . . . ,

and cannot know anything about the auxiliary predicates. Then, to infer the whole program, it must discover the existence of an auxiliary predicate and must reconstruct the meaning of such a predicate only from the given examples of rev(·, ·). Hence, a learning mechanism that infers correct Prolog programs only from positive examples of a target predicate is desirable from the viewpoint of practical application. The inference problem we study in this paper is identification in the limit from positive data [5], where examples are given as an infinite enumeration of all the ground atoms that are true in the least Herbrand model of the unknown program. Receiving an atom as an example and modifying a hypothesis, an inference machine produces an infinite sequence of hypotheses. If the sequence of hypotheses converges to a hypothesis correctly explaining the unknown Prolog program, we say that the inference machine identifies the unknown Herbrand model in the limit. This is an infinite process that continues forever. A basic question for the problem is whether there is a natural class of Prolog programs that is identifiable only from positive examples of a target predicate. In the early stage of studies on inductive inference, Gold [5] showed that even the class of regular languages is not inferable from positive data. Since regular languages can be defined by very simple Prolog programs, we know from Gold's result that the whole class of Prolog programs is not inferable from positive data. In 1991, Shinohara [12] showed that the class of linear Prologs, which were introduced by Shapiro in his study on MIS [10], is inferable from positive data when the number of clauses in a program is bounded. The result further says that programs in the class are inferable from positive examples only of a target predicate. However, linear Prolog programs are too restricted for practical applications such as automated program synthesis.
We call a variable in a clause local if it occurs in the body but not in the head of the clause. In practical Prolog programming, the use of local variables is necessary to represent data passing among subgoals in the body. For example, the local variable C in the second clause of the list-reversal program,

    rev([A|B], D) ← rev(B, C), append(C, [A], D),

conveys the value of C computed in the subgoal rev(B, C) to the subgoal append(C, [A], D). On the other hand, linear Prolog programs have a syntactic restriction on clauses: the subgoals in the body of any instance of a clause must be smaller in size than the head. This restriction implies a property called variable-boundedness, namely that every variable must occur in the head of a clause. Therefore, no linear program is allowed to use local variables. In Section 3, we introduce a subclass of Prolog programs, called linearly covering programs, which are allowed to use local variables. The second clause of the program rev(·, ·), shown in the previous paragraph, is a linearly covering clause, although the whole program is not a linearly covering program. Each predicate symbol in a linearly covering program is associated with an input-output mode declaration like append(+, +, −), where "+" and "−" indicate that the corresponding argument is of input mode and output mode, respectively. The class of linearly covering programs is defined through a data-dependency constraint on moded atoms. The constraint ensures that any data passing between subgoals does not increase the total size of data contents. Further, we show that a linearly covering program has the ground input-output property and that the membership of atoms in the least Herbrand model is effectively decidable. We also show that the problem of deciding whether a program P with mode declarations is linearly covering is polynomial-time decidable. These properties are shown in Section 3. In Section 4, we prove that for every fixed k, m > 0, if the number of subgoals in a clause is bounded by k, the class of linearly covering programs consisting of at most m clauses is identifiable in the limit from positive examples of a target predicate. Furthermore, in Section 5, we show that the restriction on the length of bodies is necessary for the inferability.

2 Preliminaries

First, we introduce a language for logic programs and their semantics. Definitions and terminology not given here are found in Lloyd [7]. The alphabet of a first-order language is a triple L = (Π, Σ, X), where Π, Σ and X are mutually disjoint sets, and Π, Σ are finite. We call the elements of Π, Σ, X predicate symbols, function symbols, and variables, respectively. Each symbol in Π and Σ is associated with a nonnegative integer called its arity. A term is a variable or an expression of the form f(t1, . . . , tn), where f is a function symbol of arity n ≥ 0 and t1, . . . , tn are terms. We denote sequences of terms by boldface letters s, t, u, t1, t2, . . .. In practical applications of logic programs, predicate symbols are often augmented with a mode declaration. For example, a mode declaration append(+, +, −) means that the first and the second arguments of the predicate append/3 are used as input and the third argument is used as output. For simplicity, we assume that all input arguments precede output arguments, and we write an atom with a mode declaration using a semicolon ";" as punctuation between input and output arguments.

Definition 1 An atom is an expression of the form p(s1, . . . , sm; t1, . . . , tn) (m, n ≥ 0). An atom is also written as p(s; t) using the notation for sequences of terms s = s1, . . . , sm and t = t1, . . . , tn, which are called the input and the output arguments, denoted by A/I and A/O, respectively. A goal is a clause of the form ← A1, . . . , An and a definite clause is a clause of the form A ← A1, . . . , An, where n ≥ 0 is called the body-length. A program is a finite set P of definite clauses. G1 + G2 denotes the concatenation of goals G1 and G2. We do not distinguish two goals that differ only in the order of atoms. Let L be an alphabet of a first-order language and P be a program. Then, we denote by B(L) the Herbrand base, the set of all ground atoms on L, and by M(P) the least Herbrand model of P. For a predicate symbol p with arity n, the meaning of p defined by P is Mp(P) = { p(t1, . . . , tn) | p(t1, . . . , tn) ∈ M(P) }.

The SLD-resolution gives the procedural semantics of logic programs [7]. We give an equivalent semantics by proof trees. Let ground(P) be the set of all ground instances of clauses in P. Let a ∈ B(L) be a ground atom and P be a program. A proof tree for a by P is a labeled finite tree defined recursively as follows:

• If c = (a ←) ∈ ground(P), then a tree of a single node labeled with a ∈ B(L) is a proof tree for a by P.

• If T1, . . . , Tn are proof trees by P for atoms a1, . . . , an, whose root nodes are v1, . . . , vn, respectively, and there exists a clause c = (a ← a1, . . . , an) ∈ ground(P), then a tree whose root node is labeled with a and has children v1, . . . , vn is a proof tree for a by P.

In either case, we say that the clause c is used to prove a in the proof tree T by P.
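To make the proof-tree semantics concrete, the following sketch decides whether a ground atom has a proof tree when the (in general infinite) set ground(P) is approximated by a finite list of ground clauses. The atom encoding as strings and the function name are ours, not the paper's; the search is sound because a minimal proof tree never repeats an atom on a root-to-leaf path.

```python
def provable(atom, ground_clauses, path=frozenset()):
    """A ground atom has a proof tree iff some ground clause with that head
    has a fully provable body. Tracking the current root-to-leaf path keeps
    the search finite, since a minimal proof tree has no repeats on a path."""
    if atom in path:
        return False
    return any(
        head == atom and all(provable(b, ground_clauses, path | {atom}) for b in body)
        for head, body in ground_clauses
    )

# A finite fragment of ground(P) for the list-reversal program:
clauses = [
    ("rev([],[])", []),
    ("append([],[a],[a])", []),
    ("rev([a],[a])", ["rev([],[])", "append([],[a],[a])"]),
]
```

By Theorem 1 below, the atoms found this way are exactly those of M(P) that have proof trees within the given fragment of ground(P).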

Theorem 1 ([7]) For a ground atom a, the following 1–3 are equivalent.

1. a ∈ M(P).
2. There is an SLD-refutation for {← a} ∪ P.
3. There is a proof tree for a by P.

Multisets are collections of objects in which an object can occur more than once. We denote multisets by lowercase boldface letters x, y, x1, x2, . . .. For an object o and a multiset x, we denote by Oc(o, x) the number of occurrences of o in x. For example, if x = {o1, o1, o1, o2}, then Oc(o1, x) = 3, Oc(o2, x) = 1 and Oc(o3, x) = 0. We define the inclusion ⊆ by x ⊆ y if Oc(o, x) ≤ Oc(o, y) for every object o, and the sum x + y as the multiset for which Oc(o, x + y) = Oc(o, x) + Oc(o, y). The size |x| of a multiset x is the total number of occurrences of objects in x. For ordinary sets, the relations ∈, ⊆ and the operations ∪, ∩ are defined in the usual manner. The size |t| of a term t is the total number of occurrences of variables and function symbols in t. For a sequence t = t1, . . . , tn of terms, |t| = |t1| + · · · + |tn|. The size |a| of an atom a = p(t) is defined as |t|. We assume a special symbol ∗ that is not contained in X. A parametric size over X is a finite multiset consisting of elements of X ∪ {∗}. Let X̂ be the class of all parametric sizes over X.
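These multiset operations can be mirrored directly with Python's collections.Counter; this encoding is only a convenience (Counter returns 0 for absent keys, which matches Oc(o, x) = 0 for objects not in x).

```python
from collections import Counter

def occ(o, x):
    """Oc(o, x): the number of occurrences of o in the multiset x."""
    return x[o]

def included(x, y):
    """Multiset inclusion: x ⊆ y iff Oc(o, x) <= Oc(o, y) for every o."""
    return all(x[o] <= y[o] for o in x)

def size(x):
    """|x|: the total number of occurrences of objects in x."""
    return sum(x.values())

# The worked example from the text: x = {o1, o1, o1, o2}.
x = Counter({"o1": 3, "o2": 1})
y = Counter({"o1": 3, "o2": 2, "o3": 1})
```

The multiset sum x + y is Counter addition, so |x + y| = |x| + |y| holds automatically.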

Definition 2 For a term t, the parametric size [t] of t is defined recursively as follows:

1. if t is a variable x, then [t] = {x}, and
2. if t = f(t1, . . . , tn) for f ∈ Σ, then [t] = {∗} + [t1] + · · · + [tn].

For a sequence t = t1, . . . , tn of terms, [t] = [t1] + · · · + [tn].

Clearly, [s] ⊆ [t] implies |s| ≤ |t|. For x = {o1, . . . , on} (n ≥ 0) and a substitution θ, we define xθ = [o1θ] + · · · + [onθ], where ∗θ = ∗. By the definition, [s]θ = [sθ]. The inclusion ⊆ on X̂ is stable under substitutions, that is, if x ⊆ y then xθ ⊆ yθ for any substitution θ. For the terms s = f(x, y), t = f(a, b) and u = [a, x] = .(a, .(x, [])), where .(·, ·) is the list constructor, the corresponding parametric sizes are [s] = {∗, x, y}, [t] = {∗, ∗, ∗} and [u] = {∗, ∗, ∗, ∗, x}, respectively. For θ = {x := t}, [sθ] = {∗, ∗, ∗, ∗, y}.

The inductive learning of Prolog programs we consider here is formalized as identification in the limit from positive data. Here we give a brief review of basic definitions and results on inductive inference according to [1, 12]. Let U and E be two recursively enumerable sets, whose elements are called objects and expressions, respectively. A concept is a subset R ⊆ U. A formal system is a finite subset Γ ⊆ E. A semantic mapping is a mapping Φ from formal systems to concepts. When Φ(Γ) = R, we say that the formal system Γ defines the concept R.
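Definition 2 is easy to execute on a concrete term representation. In the hypothetical encoding below, a variable is a string and a compound term f(t1, . . . , tn) is the tuple (f, t1, . . . , tn); the character "*" plays the role of the special symbol ∗, counting function-symbol occurrences.

```python
from collections import Counter

def psize(t):
    """Parametric size [t]: a multiset over the variables of t plus '*'
    for every function-symbol occurrence (Definition 2)."""
    if isinstance(t, str):                         # a variable
        return Counter([t])
    f, *args = t                                   # a compound term f(t1, ..., tn)
    return sum((psize(a) for a in args), Counter(["*"]))

def apply_subst(t, theta):
    """Apply a substitution {variable: term} to a term."""
    if isinstance(t, str):
        return theta.get(t, t)
    f, *args = t
    return (f,) + tuple(apply_subst(a, theta) for a in args)

# The worked example from the text: s = f(x, y), t = f(a, b),
# u = [a, x] = .(a, .(x, [])).  Constants are 0-ary tuples like ("a",).
s = ("f", "x", "y")
t = ("f", ("a",), ("b",))
u = (".", ("a",), (".", "x", ("[]",)))
```

Checking psize(apply_subst(s, θ)) against the text's value of [sθ] also exercises the identity [s]θ = [sθ].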

Definition 3 A concept defining framework is a triple C = (U, E, Φ(·)) of a universe U of objects, a universe E of expressions, and a semantic mapping Φ. For an alphabet L of the first-order language, the triple (B(L), E, M(·)) is a concept defining framework, where E is a class of definite clauses and M(P) is the least Herbrand model of P ⊆ E.

Definition 4 Let Γ1, Γ2, . . . be a recursive enumeration of all the formal systems. Then, a class C = {Φ(Γ1), Φ(Γ2), . . .} is said to be an indexed family of recursive concepts if there is an algorithm that, given w ∈ U and Γ ⊆ E, decides whether w ∈ Φ(Γ).

A positive presentation of a concept R ⊆ U is an infinite sequence σ = w1, w2, . . . of objects such that { wn ∈ U | n ≥ 1 } = R. An inference machine M is an effective procedure that requests an object as an example and produces a formal system as a conjecture. Given a positive presentation σ = w1, w2, . . ., M generates an infinite sequence g1, g2, . . . of conjectures as follows: it starts with the empty sample S0. When M makes the n-th request, the object wn is added to the sample. Then, M reads the current sample Sn = {w1, . . . , wn} and appends a conjecture gn to the sequence of conjectures.
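The protocol above can be sketched as a simple loop. The identification-by-enumeration strategy below (always conjecture the first hypothesis whose concept covers the current sample) is only an illustration on a toy indexed family of nested finite concepts, not the paper's inference algorithm; all names here are ours.

```python
def infer(presentation, hypotheses, member):
    """Run an inference machine on a finite prefix of a positive
    presentation, returning the sequence of conjectures g1, g2, ..."""
    sample, guesses = set(), []
    for w in presentation:
        sample.add(w)                      # the n-th requested example
        # conjecture the first hypothesis consistent with the sample so far
        g = next(g for g in hypotheses if all(member(x, g) for x in sample))
        guesses.append(g)
    return guesses

# Toy indexed family: concept(g) = {0, 1, ..., g}, a strictly growing chain.
hypotheses = range(100)
member = lambda x, g: 0 <= x <= g
```

On a presentation of R = {0, 1, 2} the conjectures stabilize on the least index defining R, which is exactly the convergence behavior required by Definition 5 below; picking a minimal consistent hypothesis is what keeps this strategy from overgeneralizing on such a chain.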

Definition 5 A class C of concepts is said to be identifiable in the limit from positive data if there is an inference machine M such that, for any R ∈ C and any positive presentation σ of R, there is a formal system g with Φ(g) = R such that the n-th conjecture gn produced by M is identical to g for all sufficiently large n. Hereafter, we write "inferable from positive data" instead of "identifiable in the limit from positive data". The first result on inductive inference from positive data is a negative one shown by Gold [5] in 1967. The following is a modified version of [5].

Theorem 2 If a class C of concepts is superfinite, that is, C contains infinitely many finite concepts L0, L1, L2, . . . satisfying L0 ⊊ L1 ⊊ L2 ⊊ · · · and an infinite concept L∞ = ⋃_{n=0}^{∞} Ln, then C is not identifiable from positive data.

From Gold's theorem, we can see that even the class of regular languages is not inferable from positive data. Angluin [1] characterized the classes that are inferable from positive data and presented a number of such classes. Shinohara [12] showed the following theorem by exploiting Angluin's characterization. A semantic mapping Φ is monotonic if Γ ⊆ Γ′ implies Φ(Γ) ⊆ Φ(Γ′). A formal system Γ is reduced with respect to S ⊆ U if S ⊆ Φ(Γ) but S ⊄ Φ(Γ′) for any proper subset Γ′ ⊊ Γ.

Definition 6 A concept defining framework C = (U, E, Φ(·)) has bounded finite thickness if Φ is monotonic and, for any finite set S ⊆ U and any m > 0, the set { Φ(Γ) | ♯Γ ≤ m, Γ is reduced with respect to S } is finite.

Theorem 3 ([12]) Let a concept defining framework C = (U, E, Φ(·)) have bounded finite thickness. Then, the class C^m = { Φ(Γ) | Γ ⊆ E, ♯Γ ≤ m } is inferable from positive data for every m ≥ 1. Hereafter, we simply write C^m when no confusion occurs.

3 Linearly Covering Programs

In this section, we introduce a class of logic programs, called linearly covering programs, as the target programs of inference. An augmented goal is a triple (G, x, y) of a goal G and parametric sizes x and y over X.

Definition 7 A block is an augmented goal B = (G, x, y) defined recursively as follows.

• For the empty goal ∅ and x ⊇ y, B = (∅, x, y) is a block, called an empty block.

• For an atom a = p(s; t), ({a}, [s], [t]) is a block, called an atomic block.

    B1: (∅, {∗, Xs}, {Xs})                                        Empty
    B2: ({sum(X, M; L)}, {X, M}, {L})                             Atomic
    B3: ({elemsum(Xs, L; N)}, {Xs, L}, {N})                       Atomic
    B4: ({sum(X, M; L)}, {∗, X, Xs, M}, {Xs, L})                  Parallel (B1 and B2)
    B5: ({sum(X, M; L), elemsum(Xs, L; N)}, {∗, X, Xs, M}, {N})   Series (B4 and B3)

Figure 1: Derivation steps verifying that the augmented goal B5 is a block. This shows that the clause elemsum([X|Xs], M; N) ← sum(X, M; L), elemsum(Xs, L; N) is linearly covering.

• If B1 = (G1, x1, y1) and B2 = (G2, x2, y2) are blocks, then B = (G1 + G2, x1 + x2, y1 + y2) is a block, called a parallel block.

• If B1 = (G1, x, y) and B2 = (G2, y, z) are blocks, then B = (G1 + G2, x, z) is a block, called a series block.

Intuitively, we can think of a block (G, x, y) as a network of pipes along which fluid can flow, with x as the source and y as the sink of the network. We can construct a larger network by combining two network modules in parallel or in series.

Definition 8 A definite clause C = p(s; t) ← G is linearly covering if B = (G, [s], [t]) is a block. A program P is linearly covering if P consists of linearly covering definite clauses.

A variable x in a clause is called local if x occurs in the body but not in the head of the clause. Consider the following program defining a predicate elemsum such that elemsum(L, 0; N) holds iff L is a list of nonnegative integers and N is the sum of the elements of L, where [X|Xs] denotes the term .(X, Xs). The variable L in the second clause of elemsum/3 is local.

    elemsum([], N; N).
    elemsum([X|Xs], M; N) ← sum(X, M; L), elemsum(Xs, L; N).
    sum(0, Y; Y).
    sum(s(X), Y; s(Z)) ← sum(X, Y; Z).

A clause a ← a1, . . . , an (n ≥ 0) is called linear [11, 12] if |aθ| ≥ |aiθ| for any substitution θ and any i = 1, . . . , n. While the program shown above is not linear because of the local variable L, it is linearly covering. For instance, the derivation steps in Figure 1 show that the second clause of elemsum/3 is linearly covering. We show another example of a linearly covering program in Figure 2. The next theorem shows that we can efficiently decide whether a program is linearly covering.
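The derivation in Figure 1 can be replayed mechanically. In the sketch below, parametric sizes are Counters, a block is a triple (goal, x, y), and each constructor checks the side condition of Definition 7 for its case; the representation and helper names are ours, not the paper's.

```python
from collections import Counter

def empty(x, y):
    """Empty block (∅, x, y): requires x ⊇ y as multisets."""
    assert all(y[o] <= x[o] for o in y), "empty block needs x to cover y"
    return ([], x, y)

def atomic(atom, s, t):
    """Atomic block ({a}, [s], [t]) for an atom a = p(s; t)."""
    return ([atom], s, t)

def parallel(b1, b2):
    """Parallel composition: goals, sources and sinks are summed."""
    g1, x1, y1 = b1
    g2, x2, y2 = b2
    return (g1 + g2, x1 + x2, y1 + y2)

def series(b1, b2):
    """Series composition: the sink of b1 must equal the source of b2."""
    g1, x, y = b1
    g2, y2, z = b2
    assert y == y2, "series composition needs matching sink and source"
    return (g1 + g2, x, z)

# Figure 1, for the clause elemsum([X|Xs], M; N) <- sum(X,M;L), elemsum(Xs,L;N):
B1 = empty(Counter({"*": 1, "Xs": 1}), Counter({"Xs": 1}))
B2 = atomic("sum(X,M;L)", Counter({"X": 1, "M": 1}), Counter({"L": 1}))
B3 = atomic("elemsum(Xs,L;N)", Counter({"Xs": 1, "L": 1}), Counter({"N": 1}))
B4 = parallel(B1, B2)
B5 = series(B4, B3)
```

B5's source {∗, X, Xs, M} and sink {N} are exactly the parametric sizes of the head's input arguments [X|Xs], M and output argument N, so the clause is linearly covering by Definition 8.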

Theorem 4 ([2]) For any fixed alphabet L, there is a polynomial-time algorithm that, given a program P and a mode declaration for its predicates, decides whether P is linearly covering.

Linearly covering programs have the following properties. In particular, Lemma 6 is essential to prove our main theorem in Section 4.

Lemma 5 Let B = (G, x, y) be a block and G′ be a resolvent of G and a linearly covering clause C by the mgu θ. Then, B′ = (G′, xθ, yθ) is a block.

    C1: simp(X; X).
    C2: simp(X; Z) ← simp(X; Y), simp(Y; Z).
    C3: simp(X + X′; Y + Y′) ← simp(X; Y), simp(X′; Y′).
    C4: simp(0 + X; X).
    C5: simp(I(X) + X; 0).
    C6: simp(X + (Y + Z); (X + Y) + Z).
    C7: simp(X + Y; Y + X).

Figure 2: A linearly covering program P for simplification of arithmetic formulas, where Π = {simp(+, −)} and Σ = {0, 1, 2, I(·), · + ·}. The symbol · + · is written in infix notation.

Proof. For any empty block (∅, x, y) and any θ, if x ⊇ y then xθ ⊇ yθ. Thus, (∅, xθ, yθ) is a block. For any atomic block ({p(s; t)}, [s], [t]) and any θ, ({p(sθ; tθ)}, [s]θ, [t]θ) is an atomic block because [s]θ = [sθ] and [t]θ = [tθ]. Based on these facts, we can easily see the following claim by induction on the construction of a block.

Claim. For any θ, if (G, x, y) is a block then (Gθ, xθ, yθ) is a block.

Assume that B = (G, x, y) is a block and C = p(s; t) ← D. If B is an atomic block, then the only resolvent G′ is Dθ. Since C is linearly covering, B′ = (Dθ, xθ, yθ) is a block by the definition and the Claim. Next, assume that the result holds for any block that is a component of B. Suppose that B = (G1 + G2, x1 + x2, y1 + y2) is obtained by combining B1 = (G1, x1, y1) and B2 = (G2, x2, y2) in parallel, and assume that G′ is a resolvent of G = G1 + G2 and C. Without loss of generality, we may assume that an atom in G1 is selected. Then, there is a resolvent G′1 of G1 and C by θ such that G′ = G′1 + G2θ. By the induction hypothesis and the Claim, both B′1 = (G′1, x1θ, y1θ) and B′2 = (G2θ, x2θ, y2θ) are blocks. Thus, we obtain the block B′ = (G′1 + G2θ, (x1 + x2)θ, (y1 + y2)θ) by combining B′1 and B′2. In the case that B is a series block, we can show the result in a similar way. □

Lemma 6 Let P be a linearly covering program and B = (G, x, y) be a block. If there is an SLD-refutation from ← G with the answer θ, then xθ ⊇ yθ and xθ ⊇ [s]θ ⊇ [t]θ for each atom p(s; t) ∈ G.

Proof. Assume that there is a refutation D from ← G with the answer θ. Then, repeated applications of Lemma 5 show that (∅, xθ, yθ) is an empty block. Thus, xθ ⊇ yθ.

Claim. If ← G has a refutation D with the answer θ, then xθ ⊇ [s]θ for a block B = (G, x, y) and p(s; t) ∈ G.

We show the claim by induction on the construction of a block. If B is atomic, then the Claim immediately holds because x = [s]. Assume that B = (G1 + G2, x1 + x2, y1 + y2) is a parallel block for B1 = (G1, x1, y1) and B2 = (G2, x2, y2), and assume that the Claim holds for B1 and B2. Let p(s; t) ∈ G1. The Switching Lemma [7] says that we can exchange any two selections of atoms in constructing D without changing the answer θ. Using this lemma, we can construct a refutation from ← G with θ in which some k-th goal is G2 and atoms are selected only from G1 in every step i with 0 ≤ i ≤ k − 1. From this refutation, we get a refutation for ← G1 with an answer σ such that θ = σδ for some δ. Thus, x1σ ⊇ [s]σ by the induction hypothesis. By x ⊇ x1 and the stability of ⊇ under substitutions, xθ ⊇ [s]θ. If B is a series block, a similar construction shows the Claim.

Using the Switching Lemma, we can also construct from D a refutation from ← p(s; t) with an answer θ′ such that θ = θ′δ′ for some δ′. Since ({p(s; t)}, [s], [t]) is an atomic block, we have [s]θ′ ⊇ [t]θ′ by the same argument used to prove xθ ⊇ yθ. Since ⊇ is stable under substitutions, we get [s]θ ⊇ [t]θ. □

A program P has the ground input and output (ground I/O, for short) property if, for any goal ← p(s; t) with ground input arguments s and any answer substitution θ of an SLD-refutation from the goal, tθ is ground.

Theorem 7 A linearly covering program has the ground I/O property.

Proof. Let G = ← p(s; t) be a goal such that s is ground. Then, [s] is of the form {∗, . . . , ∗}. If there is an SLD-refutation from G with the answer θ, then [t]θ is also of the form {∗, . . . , ∗} by Lemma 6. Hence, tθ is ground. □

4 Positive result

In this section, we show that for every k, m > 0, the class of linearly covering programs consisting of at most m definite clauses of body-length at most k is identifiable in the limit from positive examples of a target predicate.

Theorem 8 ([2]) For the class of linearly covering programs, there is a recursive function f from ground goals and programs to positive integers such that, for any ground atom a and any program P in the class, a ∈ M(P) iff there is an SLD-refutation from ← a of length at most f(a, P).

Lemma 9 Let Γ1, Γ2, . . . be any recursive enumeration of the linearly covering programs consisting of clauses of body-length at most k. Then, the class of their least Herbrand models is an indexed family of recursive concepts.

Proof. By Theorem 8, M(P) is recursive for any linearly covering program P. On the other hand, given an alphabet Π of predicate symbols, there are at most finitely many mode declarations for Π. Thus, by Theorem 4, we can effectively enumerate all the linearly covering programs on a given first-order language. □

Definition 9 Let P be a program. A supporting clause for P is a ground instance C = a ← a1, . . . , am of a clause in P for which {a1, . . . , am} ⊆ M(P).

Lemma 10 Let P be a linearly covering program. Then, for any supporting clause C = a0 ← a1, . . . , am for P, |a0/I| ≥ |ai/I| and |ai/I| ≥ |ai/O| for any 0 ≤ i ≤ m.

Proof. By Theorem 1 and Lemma 6. □

Now, we show the inferability of the class of Herbrand models restricted to positive examples of a target predicate. We consider a concept defining framework LC_k = (B(L), LCk, Mp(·)), where k is a fixed positive number, LCk is the set of all linearly covering clauses in which the number of subgoals is bounded by k, and p is a special predicate symbol in Π.

Theorem 11 For every k ≥ 1, the concept defining framework LC_k has bounded finite thickness.

Proof. Let m ≥ 1, let S ⊆ U be a finite set, and let P ⊆ LCk with ♯P ≤ m. Assume that P is reduced with respect to S. Let C = A ← A1, . . . , An ∈ P be a clause.

Claim 1. For a ∈ M(P), let T be a proof tree for a by P. Then, for any node v in T with label b, |a/I| ≥ |b/I|. (See Figure 3.)


    simp(I(1) + (1 + 2); 2)  (C2)
    ├── simp(I(1) + (1 + 2); 0 + 2)  (C2)
    │   ├── simp(I(1) + (1 + 2); (I(1) + 1) + 2)  (C6)
    │   └── simp((I(1) + 1) + 2; 0 + 2)  (C3)
    │       ├── simp(I(1) + 1; 0)  (C5)
    │       └── simp(2; 2)  (C1)
    └── simp(0 + 2; 2)  (C4)

Figure 3: A proof tree T for the atom simp(I(1) + (1 + 2); 2) by the linearly covering program P in Figure 2, where Ci beside each node labeled by an atom A denotes that the clause Ci is used to prove A in T. We can see that every atom in T satisfies Claim 1 in the proof of Theorem 11.

(Proof of Claim 1) We prove the claim by induction on the height of T. If the height is 0, the claim immediately holds. Assume that the claim holds for any proof tree of height less than h. Let T be a proof tree for a by P of height h. Then, there is some a ← a1, . . . , an ∈ ground(P) such that, for the children v1, . . . , vn of the root v of T, the subtrees T1, . . . , Tn of T rooted at v1, . . . , vn are proof trees for a1, . . . , an, respectively. Since each Ti is of height at most h − 1, |ai/I| ≥ |b/I| for the label b of any node occurring in Ti by the induction hypothesis. On the other hand, |a/I| ≥ |ai/I| by Lemma 10. Therefore, the claim is proved. (End of Proof of Claim 1)

Since P is reduced with respect to S, for any C ∈ P, there is a pair of a proof tree T of an atom a ∈ S and a ground instance c = b ← b1, . . . , bn of C that is used to prove the label b of some node in T. Combining Claim 1 and Lemma 10 again, we can prove that |a/I| ≥ |bi/I| ≥ |bi/O| for all i = 0, 1, . . . , n. Thus, the following claim holds.

Claim 2. For all C = A0 ← A1, . . . , An ∈ P and all i = 0, 1, . . . , n, |Ai| ≤ 2l, where l = max{ |a| | a ∈ S }.

Thus, the size |C| of C ∈ P is bounded by 2l(k + 1) for the maximum body-length k of C. There are only finitely many atoms smaller than a fixed size, up to renaming of variables. Moreover, any program P with ♯P ≤ m defines at most m distinct predicates. Thus, we can prove that

    { Mp(P) | P ⊆ LCk, ♯P ≤ m, P is reduced with respect to S }

is finite. Since the semantic mapping Mp(·) is monotonic, the result follows. □

Hence, we can show the inferability of the concept class LC^m_k from positive data of a target predicate by Theorem 3.

Corollary 12 Let p be a special predicate symbol in Π. For every fixed k, m > 0, the concept class LC^m_k of linearly covering programs consisting of at most m definite clauses in LCk is identifiable in the limit from positive examples of the target predicate p.

5 Negative result

Finally, we show that the restriction on the body-length of clauses for the target class in Theorem 11 is necessary, even in the inference of least Herbrand models. Let LC be the set of all linearly covering clauses.

Theorem 13 Let L = (Π, Σ, X) be an alphabet such that B(L) is infinite. Suppose that Π contains at least two predicates of nonzero arity, one of which is of arity ≥ 2. Then, for every m ≥ 4, the class of linearly covering programs consisting of at most m definite clauses in LC is not identifiable in the limit even from positive examples of all the predicates in Π.

Proof. Since B(L) is infinite, Σ contains at least one function symbol of arity more than zero and one constant symbol. Thus, we can assume that Π = {p(+), q(+, −)} and Σ = {a, f(·)} without loss of generality. Then, we define an infinite sequence P0, P1, . . . , P∞ of linearly covering programs consisting of at most four clauses as

    P0 = { p(a) },
    P1 = { p(x1) ← q(x1, a) } ∪ Q,
    · · ·
    Pn = { p(xn) ← q(xn, xn−1), . . . , q(x1, a) } ∪ Q  (for n ≥ 1),
    P∞ = { p(x) ← q(x, y), p(y);  p(a) } ∪ Q,

where Q is defined as

    Q = { q(x, x);  q(f(x), x) }.

Let w0 be the term a, and let wn be the term f(f(· · · f(a) · · ·)) with n occurrences of f for n ≥ 1. Then, for each n ≥ 0, Pn defines the model Mn = { p(w0), p(w1), . . . , p(wn) }, and the program P∞ defines M∞ = { p(wn) | n ≥ 0 }. Hence, the result immediately follows from Theorem 2. □
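The strictly growing chain of models behind this proof can be checked directly. In the sketch below (the encoding is ours), the term f^i(a) is encoded by the integer i; under Q, q(s, t) then holds exactly when s = t or s = t + 1, so walking the chain of n q-subgoals in Pn's clause from a reaches exactly w0, . . . , wn.

```python
def p_part(n, bound):
    """Indices i (up to `bound`) with p(w_i) in the least model of P_n,
    where the term f^i(a) is encoded by the integer i."""
    if n == 0:
        return {0}                       # P_0 = { p(a) }
    reachable = {0}                      # values derivable for x_1, ..., x_n from a
    for _ in range(n):                   # one q-subgoal per step:
        # q(x, x) keeps the value, q(f(x), x) adds one f
        reachable = {v + d for v in reachable for d in (0, 1)}
    return {i for i in reachable if i <= bound}
```

Each Mn is finite and M0 ⊊ M1 ⊊ · · · with union M∞, so the family is superfinite and Theorem 2 applies.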

6 Conclusion

In this paper, we proposed a subclass of Prolog programs, called linearly covering programs, and showed that if the maximum number of subgoals in a clause is bounded by a fixed constant k > 0, the class is inferable from positive examples only of a target predicate. Further, we showed that the restriction on the length of bodies is necessary. In recent years, a number of inference systems concerning the theoretical term generation problem have been proposed in the area of inductive logic programming [4, 8]. From our main result, we know the existence of an effective inference algorithm that infers a linearly covering program defining an unknown target relation by receiving positive examples of that relation. Such an inference algorithm is ideal for concept learning and program synthesis in inductive logic programming. Therefore, studies on the implementation of inference algorithms based on this work may be interesting. For practical application, efficiency is important. In [3], the authors studied an efficient inference algorithm that infers a small subclass of linear Prologs from positive data. The study of efficient algorithms for subclasses of linearly covering programs is a future problem.

Plümer [9] studies the procedural semantics of a slightly wider class of Prolog programs than linearly covering programs, based on linear predicate inequalities.

Acknowledgments. The first author is grateful to Prof. Setsuo Arikawa for his constant encouragement. He also would like to thank Prof. Setsuko Otsuki and Prof. Akira Takeuchi for their constant support, and Hiroki Ishizaka for fruitful discussions on inductive inference of Prolog programs.

References

[1] D. Angluin. Inductive inference of formal languages from positive data. Information and Control, 45:117–135, 1980.
[2] H. Arimura. Depth-bounded inference for nonterminating Prologs. Bulletin of Informatics and Cybernetics, 25(3–4):125–136, 1993.
[3] H. Arimura, H. Ishizaka, and T. Shinohara. Polynomial time inference of a subclass of context-free transformations. In Proc. 5th Annual ACM Workshop on Computational Learning Theory, pages 136–143, 1992.
[4] R. B. Banerji. Learning theories in a subset of a polyadic logic. In Proc. 1st Annual ACM Workshop on Computational Learning Theory, pages 281–295, 1988.
[5] E. M. Gold. Language identification in the limit. Information and Control, 10:447–474, 1967.
[6] H. Ishizaka. Polynomial time learnability of simple deterministic languages. Machine Learning, 5(2):151–164, 1990.
[7] J. W. Lloyd. Foundations of Logic Programming. Springer-Verlag, second edition, 1987.
[8] S. Muggleton. Inductive logic programming. In Proc. First International Workshop on Algorithmic Learning Theory, pages 42–62, 1990.
[9] L. Plümer. Termination proofs for logic programs based on predicate inequalities. In Proc. Seventh International Conference on Logic Programming, pages 634–648. MIT Press, 1990.
[10] E. Y. Shapiro. Algorithmic Program Debugging. MIT Press, 1982.
[11] E. Y. Shapiro. Alternation and the computational complexity of logic programs. J. Logic Programming, 1(1):19–33, 1984.
[12] T. Shinohara. Inductive inference of monotonic formal systems from positive data. New Generation Computing, pages 371–384, 1991.
