Inductive Program Synthesis as Induction of Context-Free Tree Grammars*

UTE SCHMID, MARTIN MÜHLPFORDT AND FRITZ WYSOTZKI
[email protected],
[email protected],
[email protected] Department of Computer Science, Technical University Berlin
Abstract. We present an application of grammar induction in the domain of inductive program synthesis. Synthesis of recursive programs from input/output examples involves the solution of two subproblems: transforming examples into straightforward programs and folding straightforward programs into (a set of) recursive equations. In this paper we focus on the second part of the synthesis problem, which corresponds to program synthesis from multiple traces or programming by demonstration. Instead of the original framework of synthesis of LISP functions and the currently prominent framework of inductive logic programming, we take a more general view covering both research areas: the synthesis of recursive program schemes. We show that this problem corresponds to the problem of inferring a context-free tree grammar from a single noise-free positive example and provide a synthesis method. While our method does (of course) not solve the synthesis problem for the unrestricted set of recursive program schemes, we can show that some limitations of known synthesis algorithms can be overcome. We argue that reformulating the program synthesis problem in the framework of grammatical inference provides for more transparency in what classes of problems really are covered by synthesis algorithms and can give rise to new algorithms with better performance. For the grammar induction community we hope to awaken interest in open problems in the inference of context-free tree grammars.
Keywords: grammar inference, context-free tree grammars, inductive program synthesis, recursive program schemes
* Draft, do not cite.
1. Introduction
The synthesis of recursive programs from input/output examples is a research area of machine learning which was extensively worked on in the seventies in the context of functional (LISP) programming (Summers, 1977; Kodratoff & Fargues, 1978; Jouannaud & Kodratoff, 1979; Le Blanc, 1994) and is now being readdressed in inductive logic programming (ILP; see Flener & Yilmaz, to appear; Muggleton & De Raedt, 1994 for a survey). All algorithms for functional program synthesis and a growing number of ILP algorithms (Aha, Ling, Matwin, & Lapointe, 1993; Idestam-Almquist, 1995) rely on small (ordered) sets of only positive examples as input. Prototypically, the automatic construction of common list-processing (sometimes also arithmetic) functions is investigated. Unfortunately, there is only limited interchange between research in program synthesis and algorithmic learning theory. The usual approach in inductive program synthesis is to develop an algorithm together with some heuristics and to evaluate it empirically by studying its behavior on a set of (benchmark) problems. While there has been some progress in the scope and efficiency of such algorithms over the last decades (Flener & Yilmaz, to appear), critical analyses of the synthesis mechanisms employed and of the classes of programs which are inferable by these mechanisms are seldom provided (exceptions are Boström, 1996; Yamamoto, 1997, and some recent work in algorithmic learning theory cited below). Algorithmic learning theory, on the other hand, provides us with the theory of grammatical inference, which gives us a formal framework for the precise formulation of learning problems and thereby for analyzing the conditions, restrictions and hardness of learnability for clearly characterized classes of problems (Gold, 1967; Angluin & Smith, 1983; Osherson, Stob, & Weinstein, 1986; Zeugmann & Lange, 1995; Zeugmann, 1997; Richter, Smith, Wiehagen, & Zeugmann, 1998; Slutzki & Honavar, 1998). Most work in algorithmic learning theory is concerned with principal questions of learnability. Only in recent years has there been a growing interest in the development and analysis of learning algorithms. In the domain of learning pattern languages (defined by Angluin, 1980) it has been shown that such analyses can lead to the formulation of efficient inference algorithms (Reischuk & Zeugmann, 1998). But up to now, existing functional and ILP synthesis algorithms have seldom been considered. Exceptions are some analyses of the learnability of logic programs (Rao, 1996, 1997; Yamamoto, 1993; Arimura, 1997; Sharma, 1998) and the work on PAC
(probably approximately correct) learning of recursive logic programs from equivalence queries by Cohen (1995). Furthermore, most algorithms are restricted to learning regular grammars and grammars for subclasses of pattern languages, and the few papers on learning context-free grammars (see Sakakibara (1997) for an overview) do not address program synthesis as a domain for applications¹. In this paper we will formulate a class of problems for inductive program synthesis within the framework of grammatical inference. For the program synthesis community we hope to show that the use of this theory can help to clarify the synthesis task for which an algorithm is developed, and as a result make the algorithms more transparent and provide a basis for comparing different algorithms not only empirically but also on an analytical level. For the grammatical inference community we hope to provide an example of a class of problems of practical relevance for which it would be useful to gain further theoretical insights. Our approach to inductive program synthesis (see Wysotzki, 1983; Schmid & Wysotzki, 1998, to appear; Mühlpfordt & Schmid, 1998) is in the context of functional programming. But in contrast to the classical work of Summers (1977) and Jouannaud and Kodratoff (1979) we take a more general view: Our algorithm is designed independently of a given programming language, relying instead on the notion of recursive program schemes, which are elements of some arbitrary term algebra (i.e., representing classes of interpretations; Courcelle & Nivat, 1978). We thereby believe that we also contribute to current research in the induction of recursive logic programs – as Le Blanc (1994) has shown by generalizing Jouannaud's and Kodratoff's BMWk methodology in a term rewriting framework. Other work taking a language-independent approach to inductive program synthesis was conducted by Lange (1986, 1987). We adopt the two-step approach to solving the synthesis problem proposed by Summers: In the first step, we generate straightforward programs for transforming each example input into its corresponding output; in the second step, we generalize over the ordered sequence of these transformations. The first part of the task depends on background knowledge about the program domain (such as available primitive operators and the complete partial orders over the data structures involved) and is basically a search-intensive rewrite problem. The second part is basically a pattern matching task and can be performed for a large range of problems purely syntactically.
If, for example, we have the following set of input/output examples

E = {([ ], 0), ([x], 1), ([x, y], 2)}

we can rewrite it into the following transformations:

G1 = if empty(l) then 0
G2 = if empty(tail(l)) then plus(1, 0)
G3 = if empty(tail(tail(l))) then plus(1, plus(1, 0)).

This is clearly a non-trivial task which is highly dependent on the background knowledge which is provided. In our own approach, we combine these transformations into one nested conditional expression called "initial program" (see section 2). The straightforward transformations provide the information which is employed in programming by demonstration (Cohen, 1998) or program synthesis from traces (Wysotzki, 1983), which corresponds to the second step described above. To fold the transformations into a recursive program, we have to detect structural dependencies (recurrence relations) between the $G_i$'s. Formally, this corresponds to the inverse of the procedure for determining the denotational semantics of a recursive program² (that is, determining the least fixpoint of the Kleene sequence, see Wysotzki, 1983; Schmid & Wysotzki, 1998). We will describe the technique in more detail in section 2. For the given example we obtain

G(l) = if empty(l) then 0 else plus(1, G(tail(l))).

Of course, the synthesis algorithms in the ILP framework also cover both steps – but they are seldom analytically discriminated. We believe that this discrimination is helpful in several ways: The complex problem of program synthesis is divided into two simpler parts which can be studied (and hopefully conquered to some extent) separately; the problems addressed in both steps demand not only quite different algorithmic approaches, but are also distinct on a cognitive level when regarding human programmers. The first step corresponds to exploring a new programming problem by "hand-simulations", the second step corresponds to an unsupervised learning task – namely generalizing over the structure of the hand-simulated examples (Schmid & Wysotzki, to appear).
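To make the two steps concrete, here is a minimal executable rendering of this example in Python (our own encoding – the framework itself is language-independent; the function names simply mirror the terms above):

```python
# The three straightforward transformations G1-G3 (each defined only on
# inputs of "its" length; all other cases are left undefined, mirroring
# the bottom element) and the folded recursive equation G.

def G1(l):
    if not l: return 0                      # if empty(l) then 0

def G2(l):
    if not l[1:]: return 1 + 0              # if empty(tail(l)) then plus(1, 0)

def G3(l):
    if not l[1:][1:]: return 1 + (1 + 0)    # if empty(tail(tail(l))) then ...

def G(l):
    # folded program: G(l) = if empty(l) then 0 else plus(1, G(tail(l)))
    return 0 if not l else 1 + G(l[1:])

# G reproduces every input/output example covered by G1-G3:
assert G([]) == G1([]) == 0
assert G(["x"]) == G2(["x"]) == 1
assert G(["x", "y"]) == G3(["x", "y"]) == 2
```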
In our own approach, we employ a generic planning algorithm for constructing the straightforward transformations (Schmid & Wysotzki, 1998). While our work here is clearly at an early stage, we can already show that a wider class of input/output examples can be handled by our planning approach than in the classical work of Summers (1977) and Jouannaud and Kodratoff (1979), which was restricted to single input parameters represented as lists. In the following, we will focus on the second step of the synthesis task: We regard an initial program as a term (word) produced by some unknown recursive program scheme (tree grammar). That is, we will start with a single, noise-free, structured positive example. We will allow linear and tree recursive program schemes with interdependent parameter substitutions. While simple linear recursions can be described by regular grammars (Boström, 1996; Rao, 1996; Cohen, 1995), tree recursion has to be modelled by more complex rules, namely by context-free tree grammars (Lu & Fu, 1984; Sakakibara, 1990, 1992; Mäkinen, 1992; Sakakibara, 1997). The paper is organized as follows: In the next section we will present our approach to inductive synthesis of recursive program schemes in an informal way, introduce our basic terminology and formulate the synthesis problem. In section 3 we will present our methodology for inferring a certain class of context-free tree grammars corresponding to recursive program schemes. We will show that this methodology directly gives rise to an induction algorithm which we will present together with an illustrative example in section 4. In section 5 we will discuss the scope and efficiency of the algorithm. We conclude with an evaluative summary of our approach.
2. Inductive Synthesis of Recursive Program Schemes

Our approach to inductive program synthesis is based on a proposal by Wysotzki (1983). Wysotzki provided a framework for inferring recursive program schemes from multiple traces, so-called "initial programs". We extended this framework to a broader class of inferable structures – namely tail, linear and tree recursive structures, and combinations thereof – and implemented the corresponding induction algorithm. Furthermore, we provide methods for generating initial programs by planning and for using analogical reasoning and learning as an alternative approach to synthesis from scratch. The system is reported in Schmid and Wysotzki (1998) and briefly described in the appendix.

2.1. Inferring Recursive Program Schemes from Initial Programs

First we want to give the general idea of inductive synthesis of recursive program schemes from initial programs by means of an informal example. Consider the following initial program:
t = g(empty(l), 0, plus(1, g(empty(tail(l)), 0, plus(1, g(empty(tail(tail(l))), 0, plus(1, Ω)))))).

This program corresponds to the straightforward transformations for calculating the number of elements of a list as given in section 1. To represent terms completely in prefix notation we use the conditional g(x, y, z) =def if x then y else z. The transformations for lists with zero, one and two elements are combined into a single nested conditional. We introduce the symbol Ω denoting "undefined" for cases where no information about the desired transformation is available. The initial program corresponds to the third unfolding of the recursive equation given in section 1. In Schmid and Wysotzki (1998) we showed that we can infer the unknown recursive program if we can decompose $t$ into a sequence of terms $G_i$ for which

$G_i = tr(G_{i-1}[t_1/v]/m_1, \ldots, G_{i-1}[t_n/v]/m_n)$

holds for all $i$, where $tr$ is a constant term, $[t_i/v]$ are substitutions of variables $v$, and $m_i$ are the positions in $tr$ at which the substituted previous term is inserted. This corresponds to a sequence of unfoldings (expansions) of a recursive equation, starting with the 0th expansion ($\Omega$) and ending with the expansion which exactly reproduces the initial program $t$. For our example we obtain the segmentation

$G_0 = \Omega$
$G_1 = g(empty(l), 0, plus(1, \Omega))$
$G_2 = g(empty(l), 0, plus(1, g(empty(tail(l)), 0, plus(1, \Omega))))$
$G_3 = t$

where $G_i = g(empty(l), 0, plus(1, G_{i-1}[tail(l)/l]))$ holds. That is, we can extrapolate the recursive equation for calculating the length of arbitrary lists.
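The decomposition can be checked mechanically. The following Python sketch (our own tuple encoding of terms, with variables as plain strings – not the authors' implementation) builds the Kleene sequence $G_0, G_1, G_2, G_3$ by repeated substitution and wrapping:

```python
OMEGA = ("Omega",)

def substitute(t, var, repl):
    """Replace every occurrence of the variable var (a string) in the
    tuple-encoded term t by the term repl."""
    if t == var:
        return repl
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(substitute(c, var, repl) for c in t[1:])

def next_unfolding(g_prev):
    """G_i = g(empty(l), 0, plus(1, G_{i-1}[tail(l)/l]))."""
    shifted = substitute(g_prev, "l", ("tail", "l"))
    return ("g", ("empty", "l"), ("0",), ("plus", ("1",), shifted))

G0 = OMEGA                       # the 0th expansion
G1 = next_unfolding(G0)
G2 = next_unfolding(G1)
G3 = next_unfolding(G2)          # reproduces the initial program t
assert G2 == ("g", ("empty", "l"), ("0",),
              ("plus", ("1",),
               ("g", ("empty", ("tail", "l")), ("0",),
                ("plus", ("1",), OMEGA))))
```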
For our simple example (and in Wysotzki, 1983 and Schmid & Wysotzki, 1998) we have considered an initial program as a syntactic term. If initial programs are generated by hand-simulations of a human programmer or by a planning algorithm, the semantics of the symbols is usually known or predefined and the program construction is done over constants and not over variables. To take this into account, in the following we will consider interpreted and valuated programs: We can interpret the term $t$ given above with regard to some algebra where the function symbols are interpreted as follows: empty(l) is true if $l$ is empty
and false otherwise; tail(l) returns a non-empty list $l$ without its first element; plus(x, y) returns the sum of two numbers; 0, 1 are the constants zero and one, respectively. To keep notation simple, we do not introduce new symbols for interpreted terms. Let us assume the programmer or planner worked with a list containing two elements – for example, (a, b). That is, we have a valuation $\beta(l) = cons(a, cons(b, nil))$. The resulting interpreted and valuated initial program is

$t_A$ = g(empty(cons(a, cons(b, nil))), 0, plus(1, g(empty(tail(cons(a, cons(b, nil)))), 0, plus(1, g(empty(tail(tail(cons(a, cons(b, nil))))), 0, plus(1, Ω)))))).
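Operationally, interpretation and valuation amount to evaluating the term in a concrete algebra. A small sketch (our own Python encoding: Python lists stand in for cons-lists, and the conditional g is evaluated lazily so that unreached Ω branches stay harmless):

```python
def interpret(t, val):
    """Evaluate a tuple-encoded term; variables are strings looked up
    in the valuation val."""
    if isinstance(t, str):
        return val[t]                        # variable lookup
    f = t[0]
    if f == "g":                             # lazy conditional
        return interpret(t[2] if interpret(t[1], val) else t[3], val)
    if f == "Omega":
        raise RuntimeError("undefined")
    args = [interpret(c, val) for c in t[1:]]
    if f == "empty": return len(args[0]) == 0
    if f == "tail":  return args[0][1:]
    if f == "plus":  return args[0] + args[1]
    if f == "0":     return 0
    if f == "1":     return 1
    raise ValueError(f"unknown symbol {f}")

def layer(x, rest):                          # g(empty(x), 0, plus(1, rest))
    return ("g", ("empty", x), ("0",), ("plus", ("1",), rest))

# beta(l) = cons(a, cons(b, nil)), rendered as the Python list ["a", "b"]
t = layer("l", layer(("tail", "l"), layer(("tail", ("tail", "l")), ("Omega",))))
assert interpret(t, {"l": ["a", "b"]}) == 2  # the length of (a, b)
```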
In the following formalization of the synthesis task we will always refer to interpreted and valuated terms (cf. Guessarian, 1992, p. 301) when we speak of initial programs.

2.2. Basic Terminology

Now we will introduce some concepts and notations which are used during the rest of the paper. We mostly use the standard notation in the area of tree grammars (cf. Guessarian, 1992; Hofbauer, Huber, & Kucherov, 1994).
Terms and trees. A signature $\Sigma$ is a finite set of function symbols $f$ of fixed arity $r(f)$, $r: \Sigma \to \mathbb{N}$. With $X$ we denote the set of variables, with $T_\Sigma(X)$ the set of terms over $\Sigma$ and $X$, and with $T_\Sigma$ the set of all ground terms (terms without variables) over $\Sigma$. We use tree and term as synonyms. A tree in $T_\Sigma(X)$ will be denoted by $t(x_1, \ldots, x_p)$ in order to point out the variables; if unambiguous, we use the shorthand vector notation $t(\vec{x})$. By $t(t_1, \ldots, t_p)$ we denote the tree obtained by simultaneously substituting the terms $t_i$ for each occurrence of the variables $x_i$ in $t(x_1, \ldots, x_p)$, $i = 1, \ldots, p$. The shorthand vector notation $t(\vec{t})$ will be used. To point out which trees are being substituted for which variables, we write $t(t_1/x_1, \ldots, t_p/x_p)$ resp. $t(\vec{t}/\vec{x})$. We call the terms $t_i$ instantiations of the variables $x_i$. The set of all variables in term $t$ is $var(t)$. We use the standard concepts for substitution and unification.

Subterms. A position in $t$ is defined in the usual way as a sequence of natural numbers: (a) $\epsilon$ is a position in $t$ – namely the root, i.e. the 0th position; (b) if $t = f(t_1, \ldots, t_p)$, $f \in \Sigma$, and $u$ is a position in $t_i$, then $i.u$ is a position in $t$. A subterm of $t$ at the position $u$ (denoted by $t/u$) is defined as: (a) $t/\epsilon = t$; (b) if $t = f(t_1, \ldots, t_p)$, $f \in \Sigma$, and
$u$ is a position in $t_i$ ($1 \leq i \leq p$), then $t/i.u := t_i/u$. For a term $t$ and a position $u$ in $t$, the function $node(t, u)$ results in a pair $(f, r)$ with $t/u = f(t_1, \ldots, t_r)$, where the $t_i$ are some terms from $T_\Sigma(X)$ and $r = r(f)$. A prefix of a tree $t \in T_\Sigma(X)$ is a tree $p \in T_\Sigma(\{y_1, \ldots, y_n\})$, with $\{y_1, \ldots, y_n\} \cap X = \emptyset$, such that there exist subtrees $t_1, \ldots, t_n$ of $t$ with $t = p(t_1/y_1, \ldots, t_n/y_n)$. We write $p \leq t$ if $p$ is a prefix of $t$. We write $p < t$ if additionally $p$ and $t$ cannot be unified by renaming of variables only.
To refer to a (specific) "unfolding" of a recurrent term, we introduce the notion of a "segment": A segment $s \in T_\Sigma(Y)$ of a term $p \in T_\Sigma(X \cup Y)$ with $X \cap Y = \emptyset$ along (the "recursion points") $Y$ in a term $t \in T_\Sigma$ at occurrence $w$ is defined as: (a) $s$ is a segment of $p$ in $t$ at occurrence $\epsilon$ iff $t = p(t_{\vec{x}}/\vec{x}, t_{y_1}/y_1, \ldots, t_{y_n}/y_n)$ (i.e. $p \leq t$) and $s = p(t_{\vec{x}}/\vec{x})$; (b) if $t = p(t_{\vec{x}}/\vec{x}, t_{y_1}/y_1, \ldots, t_{y_n}/y_n)$ and $s$ is a segment of $p$ in $t_{y_i}$ along $Y$ at occurrence $w$, then $s$ is a segment of $p$ in $t$ along $Y$ at occurrence $i.w$. By $S(p, Y, t)$ we denote the set of all segments of $p$ in $t$ along $Y$, by $W(p, Y, t)$ the set of all occurrences of $p$ in $t$ along $Y$. The mapping between an occurrence $w$ and the corresponding segment is $s(w): W(p, Y, t) \to S(p, Y, t)$.
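These notions translate directly into code. A hedged sketch (our own tuple/string encoding with 1-based positions; prefix variables are assumed to occur at one position only, as in the definitions above):

```python
def subterm(t, u):
    """t/u: follow position u into t; children of (f, c1, ..., cn) sit
    at indices 1..n, and u == () is the root position."""
    for i in u:
        t = t[i]
    return t

def match_prefix(p, t, binding):
    """Try t = p(.../vars): every variable of p (a string) is bound to
    the subtree below it. Returns False if the symbols disagree."""
    if isinstance(p, str):
        binding[p] = t
        return True
    if not isinstance(t, tuple) or t[0] != p[0] or len(t) != len(p):
        return False
    return all(match_prefix(pc, tc, binding) for pc, tc in zip(p[1:], t[1:]))

def occurrences(p, Y, t):
    """Walk t along the recursion points Y; for each occurrence of p,
    return the bindings of the non-Y variables (the instantiations).
    The segments S(p, Y, t) are read off these bindings."""
    out, queue = [], [t]
    while queue:
        cur, b = queue.pop(0), {}
        if match_prefix(p, cur, b):
            out.append({v: s for v, s in b.items() if v not in Y})
            queue.extend(b[y] for y in Y if y in b)
    return out

p = ("g", ("empty", "z1"), "z2", ("plus", "z3", "y1"))
t = ("g", ("empty", ("lst",)), ("0",),
     ("plus", ("1",),
      ("g", ("empty", ("tail", ("lst",))), ("0",),
       ("plus", ("1",), ("Omega",)))))
print(occurrences(p, ["y1"], t))   # one instantiation dict per unfolding
```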
Tree grammars. A regular tree grammar (RTG) $R = (N, \Sigma, P, s)$ consists of disjoint signatures $N$ (nonterminals, all symbols of $N$ with rank 0) and $\Sigma$ (terminals), a finite rewrite system $P$ over $N \cup \Sigma$, and a distinct constant symbol $s \in N$ (initial symbol). All rules in $P$ are of the form $A \to t$, where $A \in N$ and $t \in T_{N \cup \Sigma}$. A context-free tree grammar (CFTG) $C = (N, \Sigma, P, s)$ consists of disjoint signatures $N$ (nonterminals with arity $r(A) \in \mathbb{N}$) and $\Sigma$ (terminals), a finite rewrite system $P$ over $N \cup \Sigma$, and a distinct constant symbol $s \in N$, $r(s) = 0$ (initial symbol). All rules in $P$ are of the form $A(x_1, \ldots, x_n) \to t$, where $A \in N$, $r(A) = n$, $x_1, \ldots, x_n$ are pairwise different variables, and $t \in T_{N \cup \Sigma}(\{x_1, \ldots, x_n\})$. For rules $A \to t_1$ and $A \to t_2$ we use the abbreviation $A \to t_1 \mid t_2$. Starting with the initial symbol $s$, the nonterminals are replaced by the right-hand side of the appropriate rule, whereby all variables are substituted by the corresponding terms. With $\xrightarrow{*}_P$ denoting the reflexive-transitive closure of the rewrite relation $\to_P$ generated by $P$, the language generated by a grammar $G = (N, \Sigma, P, s)$ is $L(G) = \{t \in T_\Sigma(X) \mid s \xrightarrow{*}_P t\}$.
2.3. Recursive Program Schemes

A recursive program scheme (RPS) on a signature $\Sigma$, a set of variables $X$, and a set $\Phi$ of function variables is a pair $\langle S, t_0 \rangle$, where $t_0 \in T_{\Sigma \cup \Phi}(X)$ (corresponding to the "main program" or "axiom") and $S$ is a system of $n$ equations (recursive subprograms): $S = \langle G_i(x_{i1}, \ldots, x_{in_i}) = t_i \rangle$, $i = 1, \ldots, n$, with $G_i \in \Phi$, $x_{ij} \in X$ for each $j = 1, \ldots, n_i$, and $t_i \in T_{\Sigma \cup \Phi}(\{x_{i1}, \ldots, x_{in_i}\})$ for all $i$. Each RPS is associated with a CFTG $C_{\langle S,t_0\rangle} = (\Phi \cup \{s\}, \Sigma \cup \{\Omega\}, P, s)$, $s \notin \Phi$, $\Omega \notin \Sigma$ ($\Omega$ as the bottom element); $P$ is defined by $P = \{s \to t_0 \mid \Omega\} \cup \{G_i(x_{i1}, \ldots, x_{in_i}) \to t_i \mid \Omega\}$ with $G_i \in \Phi$ (cf. Guessarian, 1979). With the CFTG $C_{\langle S,t_0\rangle}$ we can now unfold a recursive scheme to terms in $T_{\Sigma \cup \{\Omega\}}(X)$. An interpretation $i$ of the unfolded trees of an RPS $\langle S, t_0 \rangle$ is a homomorphism in a $\Sigma$-algebra $\mathcal{A}$, $i: T_\Sigma \to \mathcal{A}$, defined over operation symbols in $\Sigma$ and extended to terms. A valuated interpretation is an interpretation together with a valuation $\beta: X \to \mathcal{A}$; $\bar\beta$ denotes the extension of $\beta$ to terms. In the following we will consider untyped structures only.

The initial programs, for which we want to infer an RPS, are interpreted and valuated terms in an unknown algebra. Because the algebra $\mathcal{A}$ is unknown, we regard an initial program not as an element of $\mathcal{A}$ but as a valuated term of the term algebra over $\Sigma$ with the valuation $\beta: X \to T_\Sigma$ (that is, the variables are valuated by ground terms). By $L(C, \beta)$ we denote the set of the valuated terms of a grammar $C$ to a given valuation $\beta$: $L(C, \beta) = \{\bar\beta(t) \mid t \in L(C)\}$. For the "list-length" example given above we have the following RPS: $\Sigma = \{0^0, empty^1, tail^1, plus^2, g^3\}$, $X = \{l\}$, $\Phi = \{G^1\}$, $t_0 = G(l)$, $S: G(l) = g(empty(l), 0, plus(1, G(tail(l))))$. We can unfold it with the associated CFTG $C_{\langle S,t_0\rangle} = (\Phi \cup \{s\}, \Sigma \cup \{\Omega\}, P, s)$, $P = \{s \to G(l) \mid \Omega,\ G(l) \to g(empty(l), 0, plus(1, G(tail(l)))) \mid \Omega\}$, and get for instance the term $t$ given in section 2.1. This term can be interpreted in an algebra $\mathcal{A}$ and the occurring variable $l$ can be valuated (see term $t_A$ in section 2.1).
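A minimal sketch of this unfolding in code (our own encoding, same conventions as in the earlier sketches): each step either expands a call of $G$ by the instantiated subprogram body or cuts it off with Ω, so $n$ steps produce the $n$-th unfolding:

```python
BODY = ("g", ("empty", "l"), ("0",),
        ("plus", ("1",), ("G", ("tail", "l"))))   # G(l) -> body | Omega

def substitute(t, var, repl):
    if t == var:
        return repl
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(substitute(c, var, repl) for c in t[1:])

def unfold(t, depth):
    """Expand every call G(arg) depth more times, then cut with Omega."""
    if isinstance(t, str):
        return t
    if t[0] == "G":
        if depth == 0:
            return ("Omega",)
        return unfold(substitute(BODY, "l", t[1]), depth - 1)
    return (t[0],) + tuple(unfold(c, depth) for c in t[1:])

t0 = ("G", "l")            # main program: s -> G(l) | Omega
t = unfold(t0, 3)          # the third unfolding, i.e. the term t of section 2.1
```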
2.4. The Synthesis Problem

With the definition of recursive program schemes and (interpreted and valuated) initial programs we can now formulate the synthesis problem. Before we do this, we will introduce some limitations on the set of RPSs we will consider in our synthesis algorithm and on the algorithm itself.
Our methodology is restricted to the following subclass of recursive program schemes:

1. There is only one recursive subprogram $G$, i.e. $\|S\| = 1$. That is, we do not allow a recursive program to call another program and, especially, we do not allow indirect recursions (programs mutually calling themselves).

2. All variables appear in the body $t_1$ of the recursive subprogram $G$ (i.e. there is no variable just handed down) and each variable is used in at least one substitution per recursive call.

3. There is no recursive call in a substitution (as in the Ackermann function).

4. The term $t_0$ (the main program) consists only of the recursive subprogram call.

Additionally, there are some limitations for inference:

5. Variables which are used as constants (like $n$ in the exponential function f(n,m) = g(eq0(m), 1, mult(n, f(n, p(m))))) cannot be identified as variables. They occur in the resulting RPS as constant parts in the subprogram body.

6. If the main program calls the subprogram with substitutions of variables (like $t_0 = G(succ(x))$), then they cannot be separated from the valuation of the variables.

Limitation 4 can be easily resolved by extending our algorithm accordingly (cf. Schmid & Wysotzki, 1998). Limitation 2 can be overcome by introducing a more complex search algorithm for finding substitutions (see below). Limitations 5 and 6 cannot in principle be solved without additional information about the variables which occur in the RPS to be inferred (which is common practice in ILP algorithms; see Flener & Yilmaz, to appear) and/or their instantiations. Limitations 1 and 3 are challenging problems which will not be solved easily and may require a different inference method. We will discuss the limitations of our method in more detail in section 5.

Now we are ready to specify the synthesis problem (see definition 1): Given an initial program we want to infer an appropriate CFTG, such that the initial program is an element of the language defined by it. That is, we regard program synthesis as a special kind of grammatical inference.

Definition 1 (Synthesis problem). Let $t_{init}$ be an initial program. Then
a signature $\Sigma$, a set of variables $X = \{x_1, \ldots, x_{n_V}\}$, a CFTG $C_{\langle S,t_0\rangle} = (\{s, G\}, \Sigma \cup \{\Omega\}, P, s)$ corresponding to an RPS $\langle S, t_0 \rangle$ with $P = \{s \to G(x_1, \ldots, x_{n_V}) \mid \Omega,\ G(x_1, \ldots, x_{n_V}) \to t_1 \mid \Omega\}$, $t_1 \in T_{\Sigma \cup \{G\}}(X)$, and

$t_1 = p(\vec{x},\ G(ts_{11}(\vec{x}), \ldots, ts_{1 n_V}(\vec{x}))/y_1,\ \ldots,\ G(ts_{n_R 1}(\vec{x}), \ldots, ts_{n_R n_V}(\vec{x}))/y_{n_R})$

($p \in T_\Sigma(X \cup Y)$) with $Y = \{y_1, \ldots, y_{n_R}\}$ indicating the $n_R$ recursion points and where $\forall i = 1, \ldots, n_R$ and $\forall j = 1, \ldots, n_V$: $ts_{ij} \in T_\Sigma(X)$ and $\bigcup_j var(ts_{ij}) = X$, and a valuation $\beta: X \to T_\Sigma$ are to be inferred, such that $t_{init} \in L(C_{\langle S,t_0\rangle}, \beta)$ and the following constraints hold:

Recurrence: For each $r = 1, \ldots, n_R$: $\|\{w \mid w \in W(p, Y, t_{init}),\ w = v.r\}\| \geq 2$. That is, for each recursion point the initial program has to consist of the root segment and at least two unfoldings (segments) of the hypothetical RPS. To be more specific, the following information is needed for inference: (i) a root segment as hypothesis for the recursive structure to be induced, (ii) at least one additional segment for each recursion point for validating the hypothesis and for building a hypothesis for the substitutions, and (iii) at least one further segment for each recursion point for validating the substitution hypothesis.

Simplicity: There exists no RPS $\langle S', t_0' \rangle$, $\|S'\| = 1$, such that $L(C_{\langle S,t_0\rangle}) \subset L(C_{\langle S',t_0'\rangle})$.
3. Inferring Context-Free Tree Grammars from Initial Programs

In this section we will present our formal framework for inferring CFTGs. Our hypothesis language is restricted to RPSs as defined above³. Input to the learning algorithm is an initial program (subsequently also called "initial tree"). The initial program is assumed to be generated by the $n$-th unfolding of an unknown RPS, i.e. by the corresponding CFTG. The inference process is driven by the syntactic structure of the initial program. For a given initial program we have to identify (1) the subprogram body (recursive term) of the
hypothetical RPS and (2) the substitutions over the variables occurring in the recursion. The first step (see section 3.1) corresponds to the inference of a regular tree grammar which generates a prefix of the initial program. In the second step (see section 3.2) the RTG is extended to a context-free tree grammar.
3.1. Building a prefix-generating RTG

Definition 2 (Prefix-generating RTG). Let $t_{init}$ be an initial tree, $\Sigma$ the set of all symbols in $t_{init}$ without $\Omega$, and $Z = \{z_1, \ldots, z_{n_V}\}$, $Y = \{y_1, \ldots, y_{n_R}\}$ two disjoint sets of variables ($Y$ indicating the recursion points). An RTG $R = (\{G, s\}, \Sigma, P, s)$ with $P = \{s \to G \mid \Omega,\ G \to t_1 \mid \Omega\}$, $t_1 = p(z_1, \ldots, z_{n_V}, G/y_1, \ldots, G/y_{n_R})$, $p \in T_\Sigma(Z \cup Y)$, and $s \xrightarrow{*}_P t'_{init}$, $t'_{init} \leq t_{init}$, is a prefix-generating RTG for $t_{init}$. During derivation the variables $z_i$ of $t_1$ are renamed, such that all variables in the resulting term occur at one position only.

Note that all generated $\Omega$'s in the derived prefix $t'_{init}$ must match an $\Omega$ in $t_{init}$. This restricts the choice of the recursion points.
Definition 2 entails that we have to construct a term $p$, with $Y$ indicating the recursion points, which leads to a prefix-generating RTG for $t_{init}$ and therewith to a segmentation of $t_{init}$. Afterwards, we have to enlarge $p$ to the subprogram body, presupposing lemma 1.
Lemma 1 (Separability of program body and instantiations). Let $\beta: X \to T_\Sigma$ be a valuation and $C_{\langle S,t_0\rangle}$ be a CFTG to an RPS $\langle S, t_0 \rangle$ with $P = \{s \to G(ts_{01}(\vec{x}), \ldots, ts_{0 n_V}(\vec{x})),\ G(x_1, \ldots, x_n) \to t_1\}$,

$t_1 = p(\vec{x},\ G(ts_{11}(\vec{x}), \ldots, ts_{1 n_V}(\vec{x}))/y_1,\ \ldots,\ G(ts_{n_R 1}(\vec{x}), \ldots, ts_{n_R n_V}(\vec{x}))/y_{n_R})$,

$ts_{ij} \in T_\Sigma(X)$ for all $i = 0, \ldots, n_R$ and all $j = 1, \ldots, n_V$. Let $I \neq \emptyset$ be an index set such that for each $i \in I$ there exists a prefix $p_i \in T_\Sigma(Y) \setminus Y$ with $p_i \leq \bar\beta(ts_{0i}(\vec{x}))$ and, for each $j = 1, \ldots, n_R$, $p_i \leq ts_{ji}$. Then there exists an RPS $\langle S', t_0' \rangle$ over $X'$ with the associated CFTG $C_{\langle S',t_0'\rangle}$ and a valuation $\beta': X' \to T_\Sigma$, such that $L(C_{\langle S,t_0\rangle}, \beta) = L(C_{\langle S',t_0'\rangle}, \beta')$ and for each $i = 1, \ldots, n_V'$ there exists no prefix $p_i \in T_\Sigma(Y) \setminus Y$ with $p_i \leq \bar\beta'(ts'_{0i}(\vec{x}'))$ and, for each $j = 1, \ldots, n_R$, $p_i \leq ts'_{ji}$.

(Proof: by constructing the RPS $\langle S', t_0' \rangle$ and structural induction.)⁴
If the potential recursion points and thereby a segmentation of the initial tree have been found, then the following propositions hold: (1) The maximum prefix over all segments has to be the body of the subprogram. (2) The remaining subtrees should be instantiations of the variables and must be explained by a valuation and some substitutions. This means that the predefinition of the recursion points determines the subprogram body and the variables. Hence the overall strategy is to backtrack over possible segmentations of the initial tree.
3.1.1. Building a segmentation

The search for the appropriate recursion points has to allow backtracking. Initially, we are looking for a first potential recursion point $u_{r_1}$ (the lower index indicates the number of the recursion point) in the initial tree, i.e. a position $u_{r_1}$ in $t_{init}$ with $u_{r_1} \neq \epsilon$, $node(t_{init}, u_{r_1}) = node(t_{init}, \epsilon)$. Then we build a term $p$ consisting of all nodes between the root and the recursion point by applying a function skeleton⁵ (defined below) and test it by building the prefix-generating RTG (i.e. the RTG generates a term $t'_{init} \leq t_{init}$) for which the constraints given in definition 2 must hold. Finally, we stepwise enlarge $p$ by further recursion points $u_{r_i}$ (marked in $p$ by pairwise different variables $y_i \in Y$) until the associated prefix-generating RTG generates all $\Omega$'s in $t_{init}$ and satisfies the constraints. The skeleton $p \in T_\Sigma(Z \cup Y)$ is a minimal prefix of $t_{init}$ which contains the recursion points.

Definition 3 (Building the skeleton). Let $t_{init}$ be the initial tree, $p$ be a skeleton built over $n_R$ recursion points $u_{r_i}$, $i = 1, \ldots, n_R$, which leads to a prefix-generating RTG of $t_{init}$, $Y = \{y_1, \ldots, y_{n_R}\}$ the set of variables in $p$ indicating the recursion points, $Z$ the set of all other variables in $p$ (initial values: $p = \bot$, $Y = \emptyset$, $Z = \emptyset$). Let $u_r$ be a further potential recursion point. Then the new skeleton is built by applying the function $skeleton(t_{init}, u_r, p, Z, Y)$, which is defined (in declarative style) in table 1.
Table 1. Algorithm 1 (Building the skeleton)

skeleton(t, ε, ⊥, Z, Y) = (y, Z, Y ∪ {y}),  y ∉ Y
skeleton(t, ε, z, Z, Y) = (y, Z \ {z}, Y ∪ {y}),  y ∉ Y
skeleton(t, i.u, ⊥, Z, Y) = (f(z_1, ..., z_{i-1}, t', z_i, ..., z_{r-1}), Z', Y')
    with (f, r) = node(t, ε),
    z_j pairwise different variables, z_j ∉ Z ∪ Y, j = 1, ..., r-1,
    Z' = Z ∪ {z_1, ..., z_{r-1}} ∪ Z'',
    (t', Z'', Y') = skeleton(t/i, u, ⊥, Z, Y)
skeleton(t, i.u, ts, Z, Y) = (f(t_1, ..., t'_i, ..., t_r), Z', Y')
    with ts = f(t_1, ..., t_i, ..., t_r),
    if t_i ∈ Z then (t'_i, Z', Y') = skeleton(t/i, u, ⊥, Z, Y),
    if t_i ∉ Z then (t'_i, Z', Y') = skeleton(t/i, u, t_i, Z, Y)
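The declarative definition translates almost directly into code. A Python rendering (our own, under the tuple/string term encoding of the earlier sketches; ts = None plays the role of ⊥):

```python
import itertools

def make_fresh(prefix):
    c = itertools.count(1)
    return lambda: f"{prefix}{next(c)}"

def skeleton(t, u, ts, Z, Y, fz, fy):
    """Extend the current skeleton ts of tree t by the potential
    recursion point at position u; returns (skeleton', Z', Y')."""
    if u == ():                                  # reached the recursion point
        y = fy()
        if ts is None:
            return y, Z, Y | {y}
        return y, Z - {ts}, Y | {y}              # drop the superseded variable
    i, rest = u[0], u[1:]
    if ts is None:                               # fresh node: r-1 new variables
        f, r = t[0], len(t) - 1                  # plus the expanded child i
        zs = [fz() for _ in range(r - 1)]
        sub, Z2, Y2 = skeleton(t[i], rest, None, Z, Y, fz, fy)
        kids = zs[:i - 1] + [sub] + zs[i - 1:]
        return (f,) + tuple(kids), Z2 | set(zs), Y2
    ti = ts[i]                                   # descend into an existing node,
    rec_ts = None if ti in Z else ti             # restarting from bottom if the
    sub, Z2, Y2 = skeleton(t[i], rest, rec_ts, Z, Y, fz, fy)   # child is a variable
    return ts[:i] + (sub,) + ts[i + 1:], Z2, Y2

# For a tree with root label g and recursion point u = (3,), this yields
# the minimal prefix g(z1, z2, y1), as in section 4.2, step (1):
fz, fy = make_fresh("z"), make_fresh("y")
t = ("g", ("eq0", ("a",)), ("b",),
     ("g", ("eq0", ("p", ("a",))), ("c",), ("Omega",)))
p, Z, Y = skeleton(t, (3,), None, set(), set(), fz, fy)
assert p == ("g", "z1", "z2", "y1") and Y == {"y1"}
```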
3.1.2. Building the subprogram body

We have now found a term $p$ over $\Sigma$ and $Z \cup Y$, such that the associated prefix-generating RTG generates a term $t'_{init}$ with $t'_{init} \leq t_{init}$, $t_{init} = t'_{init}(t_1/z_1, \ldots, t_n/z_n)$, $t_i \in T_\Sigma$, and for each $r = 1, \ldots, n_R$ ($n_R = \|Y\|$): $\|\{w \mid w \in W(p, Y, t_{init}),\ w = v.r\}\| \geq 2$.

As a result of lemma 1, the body of the subprogram is the maximum prefix $p_{max}$ over all segments in $S(p, Y, t_{init})$: $p_{max} \in T_\Sigma(X \cup Y)$ with $p_{max} \leq s$ for all $s \in S(p, Y, t_{init})$ and $\neg\exists\, p'_{max} > p_{max}$ with $p'_{max} \leq s$ for all $s \in S(p, Y, t_{init})$. During the construction of the prefix $p_{max}$ we preserve the variables in $Y$. Because of our definition of the segmentation, $p_{max}$ leads to a prefix-generating RTG for $t_{init}$.
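Computing $p_{max}$ is a restricted anti-unification of the segments. A sketch (our own encoding): positions where all segments agree are kept, any disagreement becomes a fresh candidate variable; duplicate variables with identical instantiations are only merged later, by lemma 3:

```python
import itertools

def max_prefix(segments, fresh):
    """Maximal common prefix of a list of tuple/string terms; the
    recursion-point variables y_i survive because they are identical
    in every segment. fresh() supplies new variable names."""
    first = segments[0]
    if all(s == first for s in segments):
        return first                         # identical everywhere (incl. y's)
    if all(isinstance(s, tuple) and s[0] == first[0]
           and len(s) == len(first) for s in segments):
        return (first[0],) + tuple(
            max_prefix([s[i] for s in segments], fresh)
            for i in range(1, len(first)))
    return fresh()                           # disagreement -> new variable

# Segments of the list-length example (recursion point kept as "y1"):
segs = [("g", ("empty", ("lst",)), ("0",), ("plus", ("1",), "y1")),
        ("g", ("empty", ("tail", ("lst",))), ("0",), ("plus", ("1",), "y1"))]
fresh = (f"x{n}" for n in itertools.count(1)).__next__
print(max_prefix(segs, fresh))
# ('g', ('empty', 'x1'), ('0',), ('plus', ('1',), 'y1'))
```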
3.2. Building the CFTG

The term $p_{max}$ with $n_R$ recursion points marked by variables $y_1, \ldots, y_{n_R}$ is the hypothesis for the subprogram body. All the variables in $X = var(p_{max}) \setminus Y$ must be (not necessarily different) variables of the CFTG. This means in particular that all the terms $t_i$ with $s = p_{max}(t_1/x_1, \ldots, t_{n_V}/x_{n_V})$ for each $s \in S(p, Y, t_{init})$ have to be instantiations of the variables. By $\vec{v}^w$ we denote the instantiations of the variables $\vec{x}$ in the segment at occurrence $w$: $\vec{v}^w = (t_1, \ldots, t_{n_V})$ with $s(w) = p_{max}(t_1/x_1, \ldots, t_{n_V}/x_{n_V})$ for each $w \in W(p_{max}, Y, t_{init})$.
We must ensure that all instantiations of an occurrence are used in a recurrent way in all subordinated occurrences:

Lemma 2 (Necessary conditions for variables). Let $t_{init}$ be an initial tree, $p_{max}$ be the inferred subprogram body with $Y$ indicating the recursion points, $var(p_{max}) = X \cup Y$, $X \cap Y = \emptyset$, $n_V = \|X\|$, $n_R = \|Y\|$, and $\vec{v}^w$ the instantiation of the variables $X$ at occurrence $w \in W(p_{max}, Y, t_{init})$.

Usage of all instantiations: It must hold $\forall i \in \{1, \ldots, n_V\}$, $\forall r \in \{1, \ldots, n_R\}$: $\exists k \in \{1, \ldots, n_V\}$ and $\exists t \in T_\Sigma(Z \cup \{x_i\})$, $t$ a minimal prefix, such that $\forall w.r \in W(p_{max}, Y, t_{init})$: $t(v_i^w/x_i) \leq v_k^{w.r}$.

Partial generating of all instantiations: The sets of partial substitutions $PSubst_{kr}$ of variable $x_k$ in the $r$-th recursive call, constructed in the following way, are not empty:

1. Set for each $k \in \{1, \ldots, n_V\}$ and for each $r \in \{1, \ldots, n_R\}$: $PSubst_{kr} = \emptyset$, $Z_{kr} = \emptyset$.

2. Search for each $i \in \{1, \ldots, n_V\}$ and for each $r \in \{1, \ldots, n_R\}$ all $k$ for which a minimal prefix $ps \in T_\Sigma(Z \cup \{x_i\})$ exists ($Z$ disjoint from $X$ and from $Z_{kr}$, and all variables occurring at only one position), with $\forall w.r \in W(p_{max}, Y, t_{init})$: $ps(v_i^w/x_i) \leq v_k^{w.r}$. Set $PSubst_{kr} := PSubst_{kr} \cup \{ps\}$, $Z_{kr} := Z_{kr} \cup Z$.

If one of these conditions does not hold, then no adequate CFTG can be built on the basis of the term $p_{max}$, i.e. there must be backtracking over the segmentation construction.

(Proof: These are consequences of the restrictions that each variable has to occur in the subprogram body and has to be used in at least one substitution in a recursive call.)
We can now construct the final substitutions by means of the sets of partial substitutions. Therefore, a general unifier $\sigma_{kr}$ for each set of partial substitutions $PSubst_{kr}$ is built (these unifiers exist, because all terms in $PSubst_{kr}$ are prefixes of the same term $v_k^r$). Variables are renamed only into variables from $X$. Formally, for all $k \in \{1, \ldots, n_V\}$ and for all $r \in \{1, \ldots, n_R\}$: $ps_i\sigma_{kr} = ps_j\sigma_{kr}$ for all $ps_i, ps_j \in PSubst_{kr}$.

Lemma 3 (Reduction of variables). If some $\sigma_{k1}$ contains a renaming $\{x_i \mapsto x_j\}$, $x_i, x_j \in X$, then the variable $x_i$ in $p_{max}$ can be replaced by $x_j$. The set of variables can be reduced to $X := X \setminus \{x_i\}$.
(Proof: $\forall w \in W(p_{max}, Y, t_{init})$ holds $v_i^w = v_j^w$ (because of the construction of $PSubst_{kr}$), i.e. the instantiations of $x_i$ and $x_j$ are identical in all segments.)

Now we know the number of necessary variables $n_V = \|X\|$ and thereby the rank of $G$. The substitution $\theta_{kr}$ of the variable $x_k$ in the $r$-th recursive call can be built by $\theta_{kr} = \{x_k \mapsto ts_{kr}\}$ with $ts_{kr} = ps_{kr}\sigma_{kr}$ and $ps_{kr}$ some term of $PSubst_{kr}$. The constructed substitutions $\theta_{kr}$ replace a variable by a term over some variables.
Finally, we have to check whether a substitution contains a subterm which is independent of the variables.

Lemma 4 (Incomplete substitutions). Let the right-hand side $ts_{kr}$ of a substitution $\theta_{kr} = \{x_k \mapsto ts_{kr}\}$ contain some variables $z_i \in Z$. Let $I$ be the index set for these variables (i.e. $var(ts_{kr}) = (var(ts_{kr}) \cap X) \cup \{z_i \mid i \in I\}$). If there exists for every $z_i$ a ground term $t_i \in T_\Sigma$, such that $\forall w.r \in W(p_{max}, Y, t_{init})$ holds $v_k^{w.r} = ts_{kr}\{x_1 \mapsto v_1^w, \ldots, x_{n_V} \mapsto v_{n_V}^w\}\{z_i \mapsto t_i \mid i \in I\}$, then the substitution $\theta_{kr}$ can be extended to $\theta_{kr} := \theta_{kr}\{z_i \mapsto t_i \mid i \in I\}$. Otherwise no adequate CFTG can be built on the basis of the term $p_{max}$, i.e. there must be backtracking over the segmentation building.
Lemma 5 (Valuation). The valuation $\beta: X \to T_\Sigma$ is determined by the instantiations at the root occurrence, $\vec{v}^\epsilon = (v_1^\epsilon, \ldots, v_{n_V}^\epsilon)$: $\beta(x_i) = v_i^\epsilon$, $i = 1, \ldots, n_V$.

(Proof: This is a consequence of the limitation that there is no distinction between valuation and substitution in the subprogram call in the main program.)
As a result we now have the CFTG $C_{\langle S,t_0\rangle} = (\{s, G\}, \Sigma \cup \{\Omega\}, P, s)$ with $\vec{x} = (x_1, \ldots, x_{n_V})$ and

$P = \{\, s \to G(\vec{x}) \mid \Omega,\ \ G(\vec{x}) \to p(\vec{x},\ G(ts_{11}(\vec{x}), \ldots, ts_{1 n_V}(\vec{x}))/y_1,\ \ldots,\ G(ts_{n_R 1}(\vec{x}), \ldots, ts_{n_R n_V}(\vec{x}))/y_{n_R}) \mid \Omega \,\}$

and the valuation $\beta$, such that $t_{init} \in L(C_{\langle S,t_0\rangle}, \beta)$. The described method guarantees that this is the simplest possible recurrent hypothesis for $t_{init}$.
Table 2. Algorithm 2 (Inferring a CFTG)

To identify the subprogram body (i.e. the right-hand side of a recursive equation without parameter substitutions) induced by a given initial tree, we have to perform the following steps (section 3.1):

1. Identification of recursion points and thereby of a minimal recursive term (section 3.1.1);

2. Extension of the minimal recursive term by constant parts of the hypothetical subprogram body (section 3.1.2).

The parts of the initial tree not considered yet have to be the instantiated variables together with their substitutions for each unfolding of the hypothetical recursive equation. To construct substitutions we have to find regularities between the instantiations (section 3.2):

3. Check that each instantiation is used in a regular way in each recursive call (on failure: backtrack to step 1 and start with a new hypothesis for the structure of the subprogram body);

4. Check that each instantiation is partially representable by at least one of the previous instantiations in a regular way (lemma 2);

5. Search for all regular relations between an instantiation and the previous instantiations and unify them to the final substitution rule;

6. Reduction of the set of variables to those with different instantiations (lemma 3).
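Read as a pipeline, the steps of table 2 suggest a driver of roughly the following shape. This is illustrative glue only: all six helper callables are hypothetical stand-ins for the constructions of section 3 (definition 3, lemmas 2–5), not part of the published system:

```python
def infer_cftg(t_init, enumerate_skeletons, segments, max_prefix,
               find_substitutions, reduce_variables, valuation):
    """Steps 1-6 of table 2 as a backtracking loop; each helper returns
    None on failure, which sends control back to step 1."""
    for p, Y in enumerate_skeletons(t_init):          # steps 1-2: body hypothesis
        segs = segments(p, Y, t_init)
        p_max = max_prefix(segs, Y)                   # subprogram body p_max
        theta = find_substitutions(p_max, Y, t_init)  # steps 3-5: substitutions
        if theta is None:
            continue                                  # backtrack to step 1
        p_max, theta = reduce_variables(p_max, theta) # step 6 (lemma 3)
        return p_max, theta, valuation(p_max, t_init) # valuation via lemma 5
    return None                                       # no recurrent hypothesis
```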
4. The Induction Algorithm

The method described above implies an induction algorithm which we will describe and illustrate in this section.

4.1. An Algorithm for Inferring CFTGs

If we summarize our method, we immediately obtain the outlines of an induction algorithm which we will present in an informal and abstract way (see table 2). For a realization of the algorithm we have to specify an order for enumerating the hypotheses for the subprogram body. While in Schmid and Wysotzki (1998) we presented
an ordering over structurally different classes of recursion, the enumeration presented here is quite straightforward: With each backtrack we enlarge the number of conditional expressions contained in the hypothetical subprogram body. We terminate with a failure if the subprogram body contains more than one third of the presented initial term (see definition 1, recurrence).
4.2. An Illustrative Example
In the following we demonstrate the application of our method as characterized by algorithm 2 (numbers correspond to the steps of the algorithm). We will consider the initial tree tinit
given in figure 1 as input. For this tree we obtain the signature $\Sigma = \{0, s, p, eq0, mult, g\}$.

Figure 1. Example for an initial tree
(1) We find the first and only potential recursion point $u_{r_1} = 3$ and thereby the minimal prefix $p = g(z_1, z_2, y_1)$. By means of the appropriate prefix-generating RTG the following prefix of the initial tree can be derived: $t'_{init} = g(z_1, z_2, g(z_3, z_4, g(\ldots) \ldots)) \leq t_{init}$.
(2) This results in the program body $p_{max} = g(eq0(x_1), x_2, y_1)$ with the following instantiations:

w             v1            v2
ε             s^3(0)        s(0)
1.            p(s^3(0))     mult(s(0), s^2(0))
1.1.          p(s^3(0))     mult(s^2(0), s(0))
1.1.1.        p^2(s^3(0))   mult(mult(s^2(0), s(0)), s^2(0))
1.1.1.1.      p^2(s^3(0))   mult(s^2(0), mult(s^2(0), s(0)))
1.1.1.1.1.    p^3(s^3(0))   mult(mult(s^2(0), mult(s^2(0), s(0))), s^2(0))
(3) No regularity can be detected for the first instantiation (the relation found between $v_1^\epsilon$ and $v_1^{1.}$ is $v_1^{1.} = p(v_1^\epsilon)$, but $v_1^{1.1.} = p(s^3(0)) \neq p^2(s^3(0))$). So another potential recursion point has to be found.
We backtrack to step one of the algorithm and (1') find the new recursion point $u_{r_1} = 3.3$ with the minimal prefix $p = g(z_1, z_2, g(z_3, z_4, y_1))$.

(2') This results in the program body $p_{max} = g(eq0(x_1), x_2, g(eq0(p(x_3)), mult(x_4, x_5), y_1))$ with the following instantiations:

w      v1           v2                                v3           v4                                v5
ε      s^3(0)       s(0)                              s^3(0)       s(0)                              s^2(0)
1.     p(s^3(0))    mult(s^2(0), s(0))                p(s^3(0))    mult(s^2(0), s(0))                s^2(0)
1.1.   p^2(s^3(0))  mult(s^2(0), mult(s^2(0), s(0)))  p^2(s^3(0))  mult(s^2(0), mult(s^2(0), s(0)))  s^2(0)
(3') We detect the following regularities between the instantiations $v_i^w$ and $v_k^{w.1}$:

i    k    t
1    1    p(x_1)
2    2    mult(z_1, x_2)
3    1    p(x_3)
4    2    mult(z_1, x_4)
5    5    x_5

(4, 5) We obtain the non-empty sets of partial substitutions (representations of $v_k^{w.1}$ by the $v_i^w$) with the unifiers $\sigma_{k1}$:

k    PSubst_{k1}                                       σ_{k1}
1    {p(x_1), p(x_3)}                                  {x_1 ↦ x_3}
2    {mult(x_5, z_1), mult(z_2, x_2), mult(z_3, x_4)}  {z_1 ↦ x_2, z_2 ↦ x_5, z_3 ↦ x_5, x_4 ↦ x_2}
3    {p(x_1), p(x_3)}                                  {x_1 ↦ x_3}
4    {mult(x_5, z_1), mult(z_2, x_2), mult(z_3, x_4)}  {z_1 ↦ x_2, z_2 ↦ x_5, z_3 ↦ x_5, x_4 ↦ x_2}
5    {x_5}                                             id

(6) Because there are renamings $\{x_1 \mapsto x_3\}$ and $\{x_4 \mapsto x_2\}$, the set of variables can be reduced to $X = \{x_2, x_3, x_5\}$: $p_{max} = g(eq0(x_3), x_2, g(eq0(p(x_3)), mult(x_2, x_5), y_1))$ with the following substitutions: $\theta_{21} = \{x_2 \mapsto mult(x_5, x_2)\}$, $\theta_{31} = \{x_3 \mapsto p(x_3)\}$, $\theta_{51} = \{x_5 \mapsto x_5\}$. The valuation is thereby $\beta: \{x_2 \mapsto s(0),\ x_3 \mapsto s^3(0),\ x_5 \mapsto s^2(0)\}$, and finally we get the CFTG $C_{\langle S,t_0\rangle} = (\{s, G\}, \Sigma \cup \{\Omega\}, P, s)$ with
$P = \{\, s \to G(x_2, x_3, x_5) \mid \Omega,\ \ G(x_2, x_3, x_5) \to g(eq0(x_3),\ x_2,\ g(eq0(p(x_3)),\ mult(x_2, x_5),\ G(mult(x_5, x_2),\ p(x_3),\ x_5))) \mid \Omega \,\}$.

5. Scope and Efficiency of the Method

5.1. Scope

With the method described and illustrated above, the synthesis problem given in definition 1 can be solved within the limitations introduced in section 2.4. That is, we can deal with all structures which can be generated by a single recursive function of arbitrary complexity.
Our approach demonstrates that tail, linear, tree recursive and combined structures can be inferred from a given initial tree by means of a purely syntactic method. We believe that the main advantage of our method is the technique for determining variables and substitutions. All parts of the initial program which are not constant over the different hypothetical recursion points are regarded as variables. In a second stage of inference, regularities and interdependences are identified and thereby a hypothesis about the number of variables, their valuation and the substitution rule for the recursive call is obtained. In the following, we will discuss the limitations of our method. The numbering of the restrictions refers to the corresponding items in section 2.4. Our method can easily be extended to structures starting with a nonrecursive term (i.e. a more complex main program, see restriction 4) as was shown for our previous synthesis algorithm, presented in Schmid and Wysotzki (1998): When enumerating hypotheses for the subprogram body we have to allow for considering not the root but a subterm of the initial program as the starting point. The main program then consists of the constant initial part containing the call of the inferred recursive subprogram. The restriction that all variables appear in the body of the recursive subprogram and are substituted in the recursive call (restriction 2) guarantees that each variable occurs in every unfolding of the hypothetical RPS. Thus, when inferring the substitutions, we only have to check immediately succeeding unfoldings for regularities. This restriction can be relaxed by introducing a more complex search algorithm. Restriction 2 is the less problematic of the two reasons why we cannot infer functions such as ackermann (see table 3). Restrictions 5 and 6 are inherent to our methodology: We want to fold RPSs from initial trees using their structural characteristics only. Without additional information about the head of the desired function, namely number, sequence and names of parameters (typically provided for ILP algorithms), it is not possible to distinguish variables which are simply "handed through" the recursive calls from constants (restriction 5). Restriction 5 is the reason why we cannot infer functions such as hanoi (see table 3). Separation of valuations of variables from substitutions (restriction 6) can also only be achieved with additional information. As an alternative to presenting the head of the desired function to the synthesis algorithm, we could provide information about the signature, so that all constant symbols occurring in the initial tree could be clearly identified.
Table 3. Examples for RPSs which are beyond the scope of our method

hanoi(n,a,b,c) = g(eq1(n), move(a,b),
                   append(hanoi(pred(n),a,c,b), move(a,b), hanoi(pred(n),c,b,a)))
ack(x,y) = g(eq0(x), succ(y),
             g(eq0(y), ack(pred(x),1), ack(pred(x), ack(x,pred(y)))))
max(l) = max1(l,0)
max1(l,m) = g(empty(l), m,
              g(gt(head(l),m), max1(tail(l), head(l)), max1(tail(l), m)))
butmax(l) = g(empty(l), nil,
              g(eq(head(l), max(l)), butmax(tail(l)),
                cons(head(l), butmax(tail(l)))))
The most challenging restrictions to be overcome are that we allow for only one recursive subprogram (restriction 1) and that we do not allow recursive calls in a substitution (restriction 3). We currently see no (simple) solution for restriction 3. This is the main reason why we cannot infer functions such as ackermann (see table 3). Currently we are working on an extension of our method to detect sets of recursive functions (as for example butmax and max in table 3). A problem which is probably beyond the scope of our method is the detection of indirect recursions (i.e. recursive functions which mutually call themselves). The restrictions discussed are related to the problem of detecting a recurrence relationship in a term. This problem corresponds to program synthesis from multiple traces or programming by demonstration. It addresses the second step of program synthesis from input/output examples as discussed in section 1. Of course, the success of our method depends on the quality of the initial program which is provided as input to the synthesis algorithm. This first step of program synthesis from examples – construction of initial programs – is not the topic of this paper. But, of course, it is crucial for the design of synthesis algorithms. As already discussed in section 1, we believe that for transforming examples
into straightforward programs, and thereby into initial programs, a different methodology is needed. We presented preliminary results for constructing initial programs in Schmid and Wysotzki (1998) (see also the appendix), but clearly there still remain many open problems.

5.2. Efficiency

Our induction algorithm is a special kind of pattern matcher for detecting regularities in labeled trees. The number of cycles for constructing a hypothesis is bounded by the size of the initial tree. Because we allow not only linear⁶ but also tree recursive structures, the worst-case effort is exponential.

The possible number of recursive calls ($n_R$) in the subprogram is bounded by the number of occurrences of $\Omega$ ($n_\Omega$) in the initial tree (see section 3.1.1): $n_R = \frac{n_\Omega - 1}{i} + 1$, where $i$ is the number of segments. That is, $i$ has to be a whole-numbered divisor of $n_\Omega - 1$. Because in the worst case $n_\Omega - 1$ is equivalent to some $2^x$, the number of possible values of $n_R$ is restricted by $\log_2(n_\Omega - 1)$. The recurrence condition (see definition 1) requires for each recursion point at least three segments, i.e. $i \geq 3 n_R$, and therefore $n_R \leq \sqrt{n_\Omega}$.

The number of valid hypotheses for the skeleton ($n_s$; see definition 3) is bounded by the maximum number $d$ of occurrences of the root label along a path in the initial tree. Each path from the root to an $\Omega$ can be partitioned in a maximum of $\log_2 d$ ways; therefore, $n_s \leq (\log_2 d)^{\sqrt{n_\Omega}}$. This is also an upper bound on searching substitutions. Because we allow for interdependent substitutions, the effort of checking all possible hypotheses is quadratic (see section 3.2). Of course, our goal is to obtain as input initial trees of the minimum size for which the construction of a plausible folding hypothesis is possible. To detect the correct CFTG for an (already known) RPS, it is always enough to analyze its third unfolding.
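As a small worked instance of these bounds (our own numbers, assuming the formulas as reconstructed above): for an initial tree with $n_\Omega = 7$ occurrences of $\Omega$ we get

$n_\Omega - 1 = 6, \qquad i \in \{1, 2, 3, 6\}, \qquad n_R = \frac{6}{i} + 1 \in \{7, 4, 3, 2\},$

and the recurrence requirement $i \geq 3 n_R$ leaves only the hypothesis $i = 6$, $n_R = 2$, i.e. six segments of a subprogram with two recursion points. (The purely linear case $n_R = 1$ corresponds to $n_\Omega = 1$; a single $\Omega$ puts no such constraint on the number of segments.)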
6. Conclusion

In this paper we have presented an application of grammatical inference in the domain of inductive program synthesis. The class of languages to be inferred by our learning algorithm is the (somewhat restricted) set of recursive program schemes. Each recursive program scheme defines a specific context-free grammar. Input to the learning algorithm is a single noise-free positive example – namely an initial tree which is generated by unfolding some unknown recursive program scheme or, in our framework, by applying the rules of some unknown CFTG. We believe that the main benefit of reformulating program synthesis as a grammar inference problem is that the problem becomes more transparent. The synthesis problem can be defined precisely as the class of languages which should be inferrable by a learning algorithm. The same is true for the inference method/algorithm itself: because the formal framework does not allow for vague formulations, the benefits and restrictions of proposed algorithms can be clearly seen by the authors themselves and by the scientific community. Thus, grammatical inference also provides the means for a critical comparison of algorithms. The transparency of the problem formulations also aids in the development of new ideas for the design of algorithms. We had this experience when reformulating our synthesis method as presented in Schmid and Wysotzki (1998). The original algorithm presented there was not able to deal with interdependent substitutions (the same is true for all program synthesis algorithms, as far as we know). The strategy introduced for detecting a wider range of regularities over instantiations, and thereby for inferring quite complex substitutions, followed naturally when we framed our synthesis problem as a grammatical inference problem. We hope to have shown that researchers interested in developing algorithms for program synthesis can profit from the methods and results produced in grammar inference research. However, while we are able to apply grammar inference methodology in our domain, we have to rely on grammar inference researchers to address the problems and produce the theoretical results relevant for our work. Looking at current research in grammar inference (as, for example, reported at ALT-98 (Richter et al., 1998) and ICGI-98 (Slutzki & Honavar, 1998)), there is a lack of theoretical work addressing learnability of context-free tree grammars. Applications addressing program synthesis problems are mostly restricted to a narrow ILP context dealing with non-recursive or a restricted set of only linear recursive clauses which can be represented by regular grammars. There are some results for subclasses of context-free grammars (Sakakibara, 1997) but these subclasses do not cover recursive program schemes. Thus, we hope also to have awakened the interest of grammar
inference researchers in problems related to the inference of context-free grammars, and to have provided an interesting area for application.

Acknowledgments

A short version of this paper was presented at the Annual Meeting of the German Machine Learning Group (FGML-98) and at the ALT-98 Satellite-Workshop on Applied Learning Theory. The paper was written while the first author was on leave at the School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, supported by a DFG scholarship. We thank Steffen Lange for helpful discussions and for comments on an earlier draft of this paper, Steffen Lange and Thomas Zeugmann for a critical review of this manuscript, and Dayne Freitag, Marsal Gavaldà, and Chris Hogan for proofreading.

Appendix

An overview of the IPAL system

Our system IPAL (Inductive Program Synthesis, Problem Solving and Analogical Learning) is implemented as an initial prototype in LISP (see Schmid & Wysotzki, 1998). IPAL is composed of three main modules: (1) a planning system for calculating straightforward operator sequences which transform arbitrary inputs ("problem states") into a desired output ("goal state"); (2) a synthesis system for folding straightforward transformations into recursive programs (as described in this paper); and (3) an analogy system which can be used to sidetrack program synthesis by re-instantiation or adaptation of already synthesized programs (see Schmid, Mercy, & Wysotzki, 1998). We are developing IPAL to address two goals: Firstly, we want to contribute to research in inductive program synthesis – providing new synthesis algorithms and investigating how the use of AI methods (planning and analogical reasoning) can improve performance. Secondly, we believe that IPAL provides a model of human learning by problem solving (in program construction and other domains; see Schmid & Wysotzki, to appear) and we argue that program synthesis is a suitable technique for modeling the acquisition of macro-operators (Cheng & Carbonell, 1986; Shell & Carbonell, 1989). In this appendix we will give a short sketch of our planning module: We implemented a non-linear backward planner which constructs a minimal spanning tree representing the
shortest operator sequences to transform each possible state of a given problem into a desired goal. Problem states (and the goal) are represented as conjunctions of literals; operators are defined as usual by preconditions and effects (add and delete lists). Our planner works in a "model-based" way; that is, we do not use axioms to check the admissibility of intermediate states but perform a look-up in the exhaustive set of problem states. Furthermore, we can apply rewrite rules for reformulating constants in a constructive way. The resulting planning tree represents an initial program as described above.
We have applied the planner in typical problem solving domains (such as blocksworld or puzzles) and can generalize recursive macro operators by means of the technique represented in this paper. The planner can also be applied to more usual list or arithmetic problems: If we want, for example, to infer an algorithm for sorting lists, we would present our planner with all possible sequences of three constant numbers (represented as conjunctions of literals), as for example ( (directly-before pos1 pos2) (directly-before pos2 pos3) (is-content pos1 3) (is-content pos2 1) (is-content pos3 2)) and with a rule for swapping list positions.
Starting with the goal state (((is-content pos1 1) (is-content pos2 2) (is-content pos3 3))) we would search for all instantiated operators which can transform a state of the problem domain directly into the goal. Then the planner recursively proceeds with all predecessors. States already covered by the planner are not considered at deeper levels of the planning tree. In this way we guarantee that for each state the shortest operator sequence is constructed. Afterwards, we apply pre-specified rewrite rules to gain a constructive representation of data types, for example: equal(head(l),max(l)) instead of (is-content pos1 3).
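To make the search concrete, here is a minimal sketch of such a backward breadth-first planner (our own Python encoding; the state representation and the names backward_plan and inverse_ops are hypothetical, not IPAL's actual interfaces). Breadth-first expansion from the goal yields, for every state, the first operator of a shortest transformation sequence; the "model-based" admissibility check is the membership test against the exhaustive state set:

```python
from collections import deque

def backward_plan(goal, all_states, inverse_ops):
    """goal: a frozenset of literal tuples; all_states: the exhaustive
    set of problem states; inverse_ops(state): yields (operator,
    predecessor_state) pairs, i.e. applies operators backwards.
    Returns a map state -> (operator, next_state) encoding the first
    step of a shortest transformation into the goal."""
    plan = {goal: None}
    queue = deque([goal])
    while queue:
        state = queue.popleft()
        for op, pred in inverse_ops(state):
            # model-based check: only listed states are admissible; states
            # already covered keep their (shorter or equal) plan entry
            if pred in all_states and pred not in plan:
                plan[pred] = (op, state)
                queue.append(pred)
    return plan
```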
Our work on the planner is still at an early stage: the methodology for problem solving in list and arithmetic domains is not fully developed, and we cannot yet guarantee that the constructed plan trees meet all criteria needed for the “perfect” initial programs which are input to our synthesis system. We hope that we can report progress soon, extending our method from program synthesis from multiple traces to program synthesis from input/output examples.
Notes

1. To our knowledge, the only work addressing inductive program synthesis in the framework of grammar inference was conducted by a group of Baltic computer scientists; see Brāzma (1991) and Bārzdins and Bjørner (1991) for an overview.

2. Cf. inverse resolution in ILP.

3. The whole class of context-free grammars cannot be inferred efficiently (Angluin, 1990, 1991). One strategy to make inference more feasible is to provide additional information on the structure of the unknown grammar (cf. Sakakibara, 1997).

4. For all lemmas we only present the general idea of the proofs.

5. Cf. the definition of "skeleton" by Sakakibara (1992).

6. Takada (1988) has shown that the problem of identifying the class of "even linear grammars" can be reduced to the problem of identifying the class of finite automata and thereby is inferrable in polynomial time. This subclass of context-free grammars allows only rules of the form $A \to uBv \mid w$, with $A$, $B$ as nonterminals and $u$, $v$, $w$ as terminal strings with $\|u\| = \|v\|$ (cf. Sakakibara, 1997). But with the restriction $\|u\| = \|v\|$ this subclass is less expressive than the class of linear recursive program schemes.
References

Aha, D., Ling, C., Matwin, S., & Lapointe, S. (1993). Learning singly-recursive relations from small datasets. In F. Bergadano, L. De Raedt, S. Matwin, & S. Muggleton (Eds.), Proceedings of the IJCAI-93 Workshop on Inductive Logic Programming (pp. 47–58). Morgan Kaufmann.
Angluin, D. (1980). Finding patterns common to a set of strings. Journal of Computer and System Sciences, 21, 46–62.
Angluin, D. (1990). Negative results for equivalence queries. Machine Learning, 5, 121–150.
Angluin, D. (1991). When won't membership queries help? In Proceedings of the 23rd Annual ACM Symposium on Theory of Computing (pp. 444–454). ACM Press.
Angluin, D., & Smith, C. H. (1983). Inductive inference: Theory and methods. Computing Surveys, 15(3), 237–269.
Arimura, H. (1997). Learning acyclic first-order Horn sentences from entailment. In M. Li & A. Maruoka (Eds.), Proceedings of the 8th International Workshop on Algorithmic Learning Theory (ALT-97) (Vol. 1316, pp. 432–445). Berlin: Springer.
Bārzdins, J., & Bjørner, D. (Eds.). (1991). Baltic computer science – selected papers. Springer.
Boström, H. (1996). Theory-guided induction of logic programs by inference of regular languages. In L. Saitta (Ed.), Proceedings of the 13th International Conference on Machine Learning (pp. 46–53). Morgan Kaufmann.
Brāzma, A. (1991). Inductive synthesis of dot expressions. In Baltic computer science (Vol. 502, pp. 156–212). Springer.
Cheng, P., & Carbonell, J. (1986). The FERMI system: Inducing iterative macro-operators from experience. In T. Kehler & S. Rosenschein (Eds.), Proceedings of the 5th National Conference on Artificial Intelligence (Vol. 1, pp. 490–495). Los Altos, CA: Morgan Kaufmann.
Cohen, W. (1998). Hardness results for learning first-order representations and programming by demonstration. Machine Learning, 30, 57–97.
Cohen, W. W. (1995). PAC-learning recursive logic programs: Efficient algorithms. Journal of Artificial Intelligence Research, 2, 501–539.
Courcelle, B., & Nivat, M. (1978). The algebraic semantics of recursive program schemes. In J. Winkowski (Ed.), Mathematical Foundations of Computer Science (Vol. 64, pp. 16–30). Springer.
Flener, P., & Yilmaz, S. (to appear). Inductive synthesis of recursive logic programs: Achievements and prospects.
Gold, E. (1967). Language identification in the limit. Information and Control, 10, 447–474.
Guessarian, I. (1979). Program transformation and algebraic semantics. Theoretical Computer Science, 9, 39–65.
Guessarian, I. (1992). Trees and algebraic semantics. In M. Nivat & A. Podelski (Eds.), Tree automata and languages (Vol. 10, pp. 291–310). Elsevier.
Hofbauer, D., Huber, M., & Kucherov, G. (1994). Some results on top-context-free tree languages. In Proc. of the 19th CAAP (Vol. LNCS 787, pp. 157–171). Springer.
Idestam-Almquist, P. (1995). Efficient induction of recursive definitions by structural analysis of saturations. In L. De Raedt (Ed.), Proceedings of the 5th International Workshop on Inductive Logic Programming (ILP-95) (pp. 77–94). Dept. of Computer Science, Leuven.
Jouannaud, J. P., & Kodratoff, Y. (1979). Characterization of a class of functions synthesized from examples by a Summers-like method using a 'B.M.W.' matching technique. In Proceedings of the 6th IJCAI (pp. 440–447). Morgan Kaufmann.
Kodratoff, Y., & Fargues, J. (1978). A sane algorithm for the synthesis of LISP functions from example problems: The Boyer and Moore algorithm. In Proc. of the AISB/GI Conference on Artificial Intelligence (pp. 169–175). Hamburg.
Lange, S. (1986). A program synthesis algorithm exemplified. In Proc. MMSSSS'85 (Vol. 215, pp. 185–194). Springer.
Lange, S. (1987). A decidability problem of Church-Rosser specifications for program synthesis. In K. Jantke (Ed.), 1st International Workshop on Analogical and Inductive Inference (AII'86) (pp. 105–124). Springer.
Le Blanc, G. (1994). BMWk revisited: Generalization and formalization of an algorithm for detecting recursive relations in term sequences. In F. Bergadano & L. De Raedt (Eds.), Machine Learning: Proceedings of ECML-94 (pp. 183–197). Springer.
Lu, H. R., & Fu, K. S. (1984). A general approach to inference of context-free programmed grammars. IEEE Transactions on Systems, Man, and Cybernetics, 14, 191–202.
Mäkinen, E. (1992). On the structural grammatical inference problem for some classes of context-free grammars. Information Processing Letters, 42, 193–199.
Muggleton, S., & De Raedt, L. (1994). Inductive logic programming: Theory and methods. Journal of Logic Programming, Special Issue on 10 Years of Logic Programming, 19-20, 629–679.
Mühlpfordt, M., & Schmid, U. (1998). Synthesis of recursive functions with interdependent parameters. In Proc. of the Annual Meeting of the GI Machine Learning Group (FGML-98) (pp. 132–139). TU Berlin.
Osherson, D. N., Stob, M., & Weinstein, S. (1986). Systems that learn. The MIT Press.
Rao, M. R. K. K. (1996). A class of Prolog programs inferable from positive data. In S. Arikawa & A. K. Sharma (Eds.), Algorithmic Learning Theory: Proc. of the 7th International Workshop (ALT-96), Sydney, Australia (pp. 273–284). Berlin: Springer.
Rao, M. R. K. K. (1997). A framework for incremental learning of logic programs. Theoretical Computer Science, 185(1), 193–213.
Reischuk, R., & Zeugmann, T. (1998). Learning one-variable pattern languages in linear average time. In Proc. of the 11th Annual Conference on Computational Learning Theory (COLT-98), July 24–26, Madison (pp. 198–208). ACM Press.
Richter, M., Smith, C., Wiehagen, R., & Zeugmann, T. (Eds.). (1998). Proceedings of the 9th International Conference on Algorithmic Learning Theory (ALT-98). Springer.
Sakakibara, Y. (1990). Learning context-free grammars from structural data in polynomial time. Theoretical Computer Science, 76, 223–242.
Sakakibara, Y. (1992). Efficient learning of context-free grammars from positive structural examples. Information and Computation, 97, 23–60.
Sakakibara, Y. (1997). Recent advances of grammatical inference. Theoretical Computer Science, 185, 15–45.
Schmid, U., Mercy, R., & Wysotzki, F. (1998). Programming by analogy: Retrieval, mapping, adaptation and generalization of recursive program schemes. In Proc. of the Annual Meeting of the GI Machine Learning Group (FGML-98) (pp. 140–147). TU Berlin.
Schmid, U., & Wysotzki, F. (1998). Induction of recursive program schemes. In Proceedings of the 10th European Conference on Machine Learning (ECML-98) (pp. 228–240). Springer.
Schmid, U., & Wysotzki, F. (to appear). Skill acquisition can be regarded as program synthesis: An integrative approach to learning by doing and learning by analogy. In Mind modelling – a cognitive science approach to reasoning, learning and discovery. Lengerich: Pabst Science Publishers.
Sharma, A. (1998). What can inductive inference do for ILP? In M. Richter, C. Smith, R. Wiehagen, & T. Zeugmann (Eds.), Proceedings of the 9th International Conference on Algorithmic Learning Theory (ALT-98) (Vol. 1501, pp. 336–374). Springer.
Shell, P., & Carbonell, J. (1989). Towards a general framework for composing disjunctive and iterative macro-operators. In Proceedings of the 11th IJCAI. Detroit, MI.
Slutzki, G., & Honavar, V. (Eds.). (1998). Fourth International Colloquium on Grammatical Inference (ICGI-98). Springer.
Summers, P. D. (1977). A methodology for LISP program construction from examples. Journal of the ACM, 24(1), 162–175.
Takada, Y. (1988). Grammatical inference for even linear languages based on control sets. Information Processing Letters, 28, 193–199.
Wysotzki, F. (1983). Representation and induction of infinite concepts and recursive action sequences. In Proceedings of the 8th IJCAI. Karlsruhe.
Yamamoto, A. (1993). Generalized unification as background knowledge in learning logic programs. In K. P. Jantke, S. Kobayashi, E. Tomita, & T. Yokomori (Eds.), Proceedings of the 4th International Workshop on Algorithmic Learning Theory (ALT-93) (Vol. 744, pp. 111–122). Tokyo, Japan: Springer.
Yamamoto, A. (1997). Which hypotheses can be found with inverse entailment? In N. Lavrač & S. Džeroski (Eds.), Proceedings of the 7th International Workshop on Inductive Logic Programming (Vol. 1297, pp. 296–308). Berlin: Springer.
Zeugmann, T. (Ed.). (1997). Theoretical Computer Science – Special Issue on Algorithmic Learning Theory for ALT'95, 185(1).
Zeugmann, T., & Lange, S. (1995). A guided tour across the boundaries of learning recursive languages. Lecture Notes in Computer Science, 961, 193–262.