INFORMATION AND COMPUTATION 97, 23-60 (1992)
Efficient Learning of Context-Free Grammars from Positive Structural Examples*

YASUBUMI SAKAKIBARA

International Institute for Advanced Study of Social Information Science (IIAS-SIS), Fujitsu Limited, 140, Miyamoto, Numazu, Shizuoka, 410-03 Japan
In this paper, we introduce a new normal form for context-free grammars, called reversible context-free grammars, for the problem of learning context-free grammars from positive-only examples. A context-free grammar G = (N, Σ, P, S) is said to be reversible if (1) A → α and B → α in P implies A = B and (2) A → αBβ and A → αCβ in P implies B = C. We show that the class of reversible context-free grammars can be identified in the limit from positive samples of structural descriptions and that there exists an efficient algorithm to identify them from positive samples of structural descriptions, where a structural description of a context-free grammar is an unlabelled derivation tree of the grammar. This implies that if positive structural examples of a reversible context-free grammar for the target language are available to the learning algorithm, the full class of context-free languages can be learned efficiently from positive samples. © 1992 Academic Press, Inc.
1. INTRODUCTION

We consider the problem of learning context-free languages from positive-only examples. The problem of learning a "correct" grammar for the unknown language from finite examples of the language is known as the grammatical inference problem. An important aspect of grammatical inference is its computational cost. Recently many researchers, including Angluin (1987a, 1987b), Berman and Roos (1987), Haussler et al. (1988), Ibarra and Jiang (1988), Sakakibara (1988), and Valiant (1984), have turned their attention to the computational analysis of learning algorithms. One criterion of the efficiency of a learning algorithm is whether its running time can be bounded by a polynomial in the relevant parameters. In the search for polynomial-time learning algorithms for learning context-free grammars, Sakakibara (1988) has considered the problem of learning context-free grammars from their structural descriptions. A structural description of a context-free grammar is an unlabelled derivation tree of the grammar, that is, a derivation tree whose internal nodes have no labels.

* A preliminary version of the paper was presented at FGCS'88, ICOT, Tokyo, Japan.
Thus this problem setting assumes that information on the structure of the unknown grammar is available to the learning algorithm; such information is also necessary to identify a grammar having the intended structure, that is, one structurally equivalent to the unknown grammar. We gave an efficient algorithm to learn the full class of context-free grammars using two types of queries, structural membership and structural equivalence queries, in the teacher-and-learner paradigm introduced by Angluin (1988b) to model a learning situation in which a teacher is available to answer some queries about the material to be learned. In his criterion of identification in the limit for successful learning of a formal language, Gold (1967) showed that there is a fundamental, important difference in what can be learned from positive versus complete samples. A positive sample presents all and only strings of the unknown language to the learning algorithm, while a complete sample presents all strings, each classified as to whether it belongs to the unknown language. Learning from positive samples is strictly weaker than learning from complete samples. Intuitively, the inherent difficulty in trying to learn from positive rather than complete samples stems from the problem of "overgeneralization." Gold showed that any class of languages containing all the finite languages and at least one infinite language cannot be identified in the limit from positive samples. By this theoretical result, the class of context-free languages (even the class of regular sets) cannot be learned from positive samples. These facts seem to show that learning from positive samples is too weak to find practical and interesting applications.
However, it may be true that learning from positive samples is very useful and important for practical use of grammatical inference, because it is very hard for the user to present and understand complete samples, which force him to have complete knowledge of the unknown (target) grammar. In this paper, to overcome this essential difficulty of learning from positive samples, we again consider learning from structural descriptions,
FIG. 1. A structural description for "the big dog chases a young girl."
that is, we assume example presentations in the form of structural descriptions. The problem is to learn context-free grammars from positive samples of their structural descriptions, that is, all and only structural descriptions of the unknown grammar. We show that there is a class of context-free grammars, called reversible context-free grammars, which can be identified from positive samples of their structural descriptions. We also show that the reversible context-free grammar is a normal form for context-free grammars, that is, reversible context-free grammars can generate all of the context-free languages. We present a polynomial-time algorithm which identifies them in the limit from positive samples of their structural descriptions by extending the efficient algorithm of Angluin (1982) which identifies finite automata from positive samples to obtain one for tree automata. This implies that if positive structural examples of a reversible context-free grammar for the target language are available to the learning algorithm, the full class of context-free languages can be learned efficiently from positive samples. We also demonstrate several examples to show the learning process of our learning algorithm and to emphasize how successfully and efficiently our learning algorithm identifies primary examples of grammars given in previous papers for the grammatical inference problem.
2. BASIC DEFINITIONS
Let N be the set of positive integers and N* be the free monoid generated by N. For y, x ∈ N*, we write y ≤ x if and only if there is a z ∈ N* such that x = y·z, and y < x if and only if y ≤ x and y ≠ x. ... One of the necessary and sufficient conditions for correct identification from positive samples is the following.
Condition 1. An indexed family of nonempty recursive languages L1, L2, L3, ..., satisfies Condition 1 if and only if there exists an effective procedure which on any input i ≥ 1 enumerates a set of strings Ti such that

1. Ti is finite,
2. Ti ⊆ Li, and
3. for all j ≥ 1, if Ti ⊆ Lj then Lj is not a proper subset of Li.

This condition requires that for every language Li, there exists a "telltale" finite subset Ti of Li such that no language of the family that also contains Ti is a proper subset of Li. Angluin proved that an indexed family of nonempty recursive languages is learnable from positive samples if and only if it satisfies Condition 1. These characterizations and results can be easily applied to the problem of learning tree automata, and hence to the problem of structural identification of context-free grammars, because Angluin's results assume only the enumerability and recursiveness of a class of languages.

5. REVERSIBLE CONTEXT-FREE GRAMMARS
DEFINITION. A skeletal tree automaton A = (Q, Sk ∪ Σ, δ, F) is reset-free if and only if for no two distinct states q1 and q2 in Q do there exist a symbol σ ∈ Skk, a state q3 ∈ Q, an integer i ∈ N (1 ≤ i ≤ k), and a (k−1)-tuple u1, ..., uk−1 ∈ Q ∪ Σ such that δk(σ, u1, ..., ui−1, q1, ui, ..., uk−1) = q3 = δk(σ, u1, ..., ui−1, q2, ui, ..., uk−1). The skeletal tree automaton is said to be reversible if and only if it is deterministic, has at most one final state, and is reset-free.
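The reset-free and reversibility conditions can be checked directly on a finite transition table. The following is a minimal sketch under an assumed encoding (not the paper's): transitions are stored as a dict from (symbol, tuple of children) to a state, where each child is either a state or a terminal symbol, so determinism is implicit in the representation.

```python
def is_reset_free(delta, states):
    """Reset-free: no two distinct states are interchangeable in some
    one-hole context (same symbol, same siblings, same resulting state)."""
    for (sym, kids), q3 in delta.items():
        for i, qi in enumerate(kids):
            if qi not in states:          # terminal symbol in that slot
                continue
            for q2 in states:
                if q2 == qi:
                    continue
                swapped = kids[:i] + (q2,) + kids[i + 1:]
                if delta.get((sym, swapped)) == q3:
                    return False
    return True

def is_reversible(delta, states, finals):
    """Reversible: deterministic (implicit in the dict encoding),
    at most one final state, and reset-free."""
    return len(finals) <= 1 and is_reset_free(delta, states)

# sigma(a, q0) -> q0 and sigma(a, q1) -> q0 make q0 and q1
# interchangeable in one context, so the automaton is not reset-free.
delta = {("sig", ("a", "q0")): "q0", ("sig", ("a", "q1")): "q0"}
print(is_reversible(delta, {"q0", "q1"}, {"q0"}))  # prints: False
```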
The idea of the reversible skeletal tree automaton comes from the "reversible automaton" and the "reversible languages" of Angluin (1982). Basically, the reversible skeletal tree automaton is the extension of the "zero-reversible automaton."

Remark 3. If A is a reversible skeletal tree automaton and A' is any tree subautomaton of A, then A' is a reversible skeletal tree automaton.

LEMMA 4. Let A = (Q, Sk ∪ Σ, δ, {qf}) be a reversible skeletal tree automaton. For t ∈ (Sk ∪ Σ)T$ and u1, u2 ∈ (Sk ∪ Σ)T, if A accepts both t#u1 and t#u2, then δ(u1) = δ(u2).
Proof. We prove it by induction on the depth of the node labelled $ in t. Suppose first that t = $. Since A has only one final state qf, δ(u1) = δ(t#u1) = qf = δ(t#u2) = δ(u2). Next suppose that the result holds for all t ∈ (Sk ∪ Σ)T$ in which the depth of the node labelled $ is at most h.
Let t be an element of (Sk ∪ Σ)T$ in which the depth of the node labelled $ is h + 1, so that t = t'#σ(s1, ..., si−1, $, si, ..., sk−1) for some s1, ..., sk−1 ∈ (Sk ∪ Σ)T, i ∈ N, and t' ∈ (Sk ∪ Σ)T$ in which the depth of the node labelled $ is h. If A accepts both t#u1 = t'#σ(s1, ..., si−1, u1, si, ..., sk−1) and t#u2 = t'#σ(s1, ..., si−1, u2, si, ..., sk−1), then δ(σ(s1, ..., si−1, u1, si, ..., sk−1)) = δ(σ(s1, ..., si−1, u2, si, ..., sk−1)) by the induction hypothesis. So

δk(σ, δ(s1), ..., δ(si−1), δ(u1), δ(si), ..., δ(sk−1)) = δk(σ, δ(s1), ..., δ(si−1), δ(u2), δ(si), ..., δ(sk−1)).

Since A is reset-free, δ(u1) = δ(u2), which completes the induction and the proof of Lemma 4. Q.E.D.
DEFINITION. A context-free grammar G = (N, Σ, P, S) is said to be invertible if and only if A → α and B → α in P implies A = B.
The motivation for studying invertible grammars comes from the theory of bottom-up parsing. Bottom-up parsing consists of (1) successively finding phrases and (2) reducing them to their parents. In a certain sense, each half of this process can be made simple, but only at the expense of the other. Invertible grammars allow reduction decisions to be made simply: invertible grammars have unique right-hand sides of productions, so the reduction phase of parsing becomes a matter of table lookup. The invertible grammar is a normal form for context-free grammars. Gray and Harrison (1972) proved that for any context-free language L, there is an invertible grammar G such that L(G) = L.

PROPOSITION 5 (Gray and Harrison, 1972). For each context-free grammar G there is an invertible context-free grammar G' such that L(G') = L(G). Moreover, if G is ε-free then so is G'.
Note that this result is essentially the same as the determinization of a frontier-to-root tree automaton, and suffers the same exponential blowup in the number of nonterminals in the grammar. It does, however, preserve structural equivalence. (This needs a slight modification of the definition for context-free grammars. See also McNaughton (1967).)

DEFINITION. A context-free grammar G = (N, Σ, P, S) is reset-free if and only if for any two nonterminals B, C and α, β ∈ (N ∪ Σ)*, A → αBβ and A → αCβ in P implies B = C.

DEFINITION. A context-free grammar G is said to be reversible if and only if G is invertible and reset-free. A context-free language L is defined
to be reversible if and only if there exists a reversible context-free grammar G such that L = L(G).

EXAMPLE. The following is a reversible context-free grammar for a subset of the syntax of the programming language Pascal.

Statement → Ident := Expression
Statement → while Condition do Statement
Statement → if Condition then Statement
Statement → begin Statementlist end
Statementlist → Statement ; Statementlist
Statementlist → Statement
Condition → Expression > Expression
Expression → Term + Expression
Expression → Term
Term → Factor
Term → Factor × Term
Factor → Ident
Factor → ( Expression )

Even if the above grammar contains the production "Expression → Term − Expression" or "Term → Factor / Term," it is still reversible. However, if it contains the production "Factor → Number" or "Factor → Function," it is no longer reversible.

DEFINITION. Let A = (Q, Sk ∪ Σ, δ, {qf}) be a reversible skeletal tree automaton for a skeletal set. The corresponding context-free grammar G'(A) = (N, Σ, P, S) is defined as follows:

N = Q,
S = qf,
P = {δk(σ, x1, ..., xk) → x1 ··· xk | σ ∈ Skk, x1, ..., xk ∈ Q ∪ Σ, and δk(σ, x1, ..., xk) is defined}.

By the definitions of A(G) and G'(A), we can conclude the following.
PROPOSITION 6. If G is a reversible context-free grammar, then A(G) is a reversible skeletal tree automaton such that T(A(G)) = K(D(G)). Conversely, if A is a reversible skeletal tree automaton, then G'(A) is a reversible context-free grammar such that K(D(G'(A))) = T(A).
Therefore the problem of structural identification of reversible context-free grammars is reduced to the problem of identification of reversible skeletal tree automata.
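On the grammar side, the two defining conditions can also be tested directly on a production list. Here is a small sketch under my own encoding (not from the paper), with illustrative symbol names:

```python
def is_invertible(productions):
    """productions: list of (lhs, rhs) pairs with rhs a tuple of symbols.
    Invertible: no two productions with distinct left-hand sides share
    a right-hand side."""
    seen = {}
    for lhs, rhs in productions:
        if rhs in seen and seen[rhs] != lhs:
            return False
        seen[rhs] = lhs
    return True

def is_reset_free(productions, nonterminals):
    """Reset-free: A -> alpha B beta and A -> alpha C beta force B = C
    for nonterminals B, C."""
    for lhs1, rhs1 in productions:
        for lhs2, rhs2 in productions:
            if lhs1 != lhs2 or len(rhs1) != len(rhs2):
                continue
            diff = [i for i in range(len(rhs1)) if rhs1[i] != rhs2[i]]
            # same left-hand side and same context, differing in exactly
            # one position where both symbols are nonterminals
            if len(diff) == 1:
                i = diff[0]
                if rhs1[i] in nonterminals and rhs2[i] in nonterminals:
                    return False
    return True

def is_reversible(productions, nonterminals):
    return (is_invertible(productions)
            and is_reset_free(productions, nonterminals))

# The two Expression alternatives below differ in length, so they do
# not violate reset-freeness, and their right-hand sides are unique.
prods = [("Expr", ("Term", "+", "Expr")), ("Expr", ("Term",))]
print(is_reversible(prods, {"Expr", "Term"}))  # prints: True
```

A pair like Factor → Number and Factor → Function, with Number and Function nonterminals, fails the reset-free test, mirroring the Pascal example above.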
Next we show some important theorems about the normal form property of reversible context-free grammars. We give two transformations of a context-free grammar into an equivalent reversible context-free grammar. The first transformation adds a number of copies of a nonterminal that derives only ε to the right-hand side of each production to make each production unique.

THEOREM 7. For any context-free language L, there is a reversible context-free grammar G such that L(G) = L.
Proof. First we assume that L does not contain the empty string. Let G' = (N', Σ, P', S') be an ε-free context-free grammar in Chomsky normal form (see Hopcroft and Ullman (1979) for the definition of Chomsky normal form) such that L(G') = L. Index the productions in P' by the integers 1, 2, ..., |P'|. Let the index of A → α ∈ P' be denoted I(A → α). Let R be a new nonterminal symbol not in N' and construct G = (N, Σ, P, S') as follows:

N = N' ∪ {R},

and

P = {A → αR^i | A → α ∈ P' and i = I(A → α)} ∪ {R → ε}.

Clearly G is reversible and L(G) = L. If ε ∈ L, let L' = L − {ε} and G' = (N, Σ, P, S') be the reversible context-free grammar constructed in the above way for L'. Then G = (N ∪ {S}, Σ, P ∪ {S → S', S → RR}, S) is reversible and L(G) = L. Q.E.D.

The trivialization occurs in the previous proof because ε-productions are used to encode the indices of the productions. We prefer to allow ε-productions only if absolutely necessary, and prefer ε-free reversible context-free grammars when possible, because ε-free grammars are important in practical applications such as efficient parsing. Unfortunately, there are context-free languages for which there do not exist any ε-free reversible context-free grammars. An example of such a language is

{a^i | i ≥ 1} ∪ {b^j | j ≥ 1} ∪ {c}.
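Assuming a Chomsky-normal-form grammar encoded as (lhs, rhs-tuple) pairs, the transformation in the proof of Theorem 7 can be sketched as follows; the function name and encoding are mine, not the paper's:

```python
def to_reversible(productions, fresh="R"):
    """Append i copies of a fresh nonterminal R (with R -> epsilon) to
    the right-hand side of the i-th production. Every right-hand side
    becomes unique, so the result is invertible, and the distinct
    suffix lengths rule out reset-freeness violations."""
    new_prods = []
    for i, (lhs, rhs) in enumerate(productions, start=1):
        new_prods.append((lhs, rhs + (fresh,) * i))  # encode index i
    new_prods.append((fresh, ()))                    # R -> epsilon
    return new_prods

cnf = [("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))]
for lhs, rhs in to_reversible(cnf):
    print(lhs, "->", " ".join(rhs) or "eps")
```

Running this on the toy grammar yields S → A B R, A → a R R, B → b R R R, and R → ε, whose right-hand sides are pairwise distinct.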
However, if a context-free language does not contain the empty string nor any terminal string of length one, then there is an ε-free reversible context-free grammar which generates the language. The second transformation achieves this result by means of chain rules with new nonterminals.

THEOREM 8. Let L be any context-free language in which all strings are of length at least two. Then there is an ε-free reversible context-free grammar G such that L(G) = L.
Proof. We construct the reversible context-free grammar G = (N, Σ, P, S) in the following steps. First, by the proof of Proposition 5 of Gray and Harrison (1972), there is an invertible context-free grammar G' = (N', Σ, P', S') such that L(G') = L and each production in P' is of the form

1. A → BC with A, B, C ∈ N' − {S'}, or
2. A → a with A ∈ N' − {S'} and a ∈ Σ, or
3. S' → A with A ∈ N' − {S'}.

Since all strings in L are of length at least two, P' has no production of the form A → a for A ∈ N' − {S'} and a ∈ Σ such that S' → A ∈ P'. Next, for all productions in P', we make them reset-free while preserving invertibility. P is defined as follows:

1. For each A ∈ N' − {S'}, let {A → α1, A → α2, ..., A → αn} be the set of all productions in P' whose left-hand side is A. P contains the set of productions

{A → α1, A → XA1, XA1 → α2, XA1 → XA2, ..., XAn−1 → αn},

where XA1, XA2, ..., XAn−1 are new distinct nonterminal symbols.

2. For each production A → BC ∈ P' such that S' → A ∈ P', let I contain the production of the form S → BYC, where YC is a new nonterminal symbol. Let us denote the set I by

I = {S → β1, S → β2, ..., S → βm}.

P contains the set of productions

{S → β1, S → XS1, XS1 → β2, XS1 → XS2, ..., XSm−1 → βm},

where XS1, XS2, ..., XSm−1 are new distinct nonterminal symbols.

3. P contains the set of productions {YC → C | C ∈ N' − {S'}}.

Let G = (N, Σ, P, S), where N = (N' − {S'}) ∪ {XA1, XA2, ..., XAn−1 | A ∈ N' − {S'}} ∪ {YC | C ∈ N' − {S'}} ∪ {XS1, XS2, ..., XSm−1} ∪ {S}. Now we begin the proof that G is reversible, ε-free, and L(G) = L(G').

CLAIM 1. G is reversible.
Proof. Since G’ is invertible, each production of the form A + BC, A-+a, X.++BBC or Xa,-+a for A,B, CEN’ and aEC in P has a unique right-hand side by the construction 1 of P, and each production of the form
S → BYC or XSi → BYC in P also has a unique right-hand side by the construction 2 of P. By the constructions 1, 2, and 3 of P, each production of the form A → B for A, B ∈ N in P has a unique right-hand side. Hence G is invertible. For each A ∈ N, by the constructions 1, 2, and 3 of P, there are at most two productions whose left-hand side is A in P. Furthermore, they have different forms, that is, A → BC or A → a, and A → B, where A, B, C ∈ N and a ∈ Σ. Hence G is reset-free. Therefore G is reversible.

CLAIM 2. L(G') ⊆ L(G).

Proof. By the construction 1 of P, for each A ∈ N' − {S'}, A → α in G' implies A ⇒* α in G. By the constructions 2 and 3 of P, S' ⇒ A ⇒ BC in G' implies S ⇒ BYC ⇒ BC in G. Hence for each w ∈ Σ*, S' ⇒* w in G' implies S ⇒* w in G.

CLAIM 3. L(G') ⊇ L(G).
Proof. First we prove, by induction on the length of a derivation in G, that for each A ∈ N' and each w ∈ Σ*, A ⇒* w or XAi ⇒* w in G implies A ⇒* w in G'. Suppose first that A ⇒ w or XAi ⇒ w in G. Then A → w or XAi → w is in P. By the construction 1 of P, A → w is in P'. Hence A ⇒ w in G'. Next suppose that the result holds for all derivations of length at most m. Let A ⇒ BC ⇒* w or XAi ⇒ BC ⇒* w (B, C ∈ N') be a derivation of length m + 1 in G. This implies that A → BC or XAi → BC is in P and BC ⇒* w is a derivation of length m in G. By the construction 1 of P and the induction hypothesis, A → BC is in P' and BC ⇒* w in G'. Hence A ⇒* w in G'. Let A ⇒ X ⇒* w or XAi ⇒ X ⇒* w (X ∈ N) be a derivation of length m + 1 in G. By the construction 1 of P, this implies that X = XAj, that A → XAj or XAi → XAj is in P, and that XAj ⇒* w is a derivation of length m in G. By the induction hypothesis, A ⇒* w in G'. This completes the induction. Suppose that S ⇒* w in G. By the constructions 2 and 3, this implies that S ⇒ BYC ⇒ BC in G, S' ⇒ A ⇒ BC in G', and BC ⇒* w in G for some B, C ∈ N' − {S'}. By the above result, BC ⇒* w in G'. Hence S' ⇒* w in G'. This completes the proof of Claim 3.

By Claims 2 and 3, L(G') = L(G). To finish the proof, note that G is ε-free. Q.E.D.

We analyze how much the transformation used in Theorem 8 blows up the size of the grammar. Let G' = (N', Σ, P', S') be any invertible context-free grammar such that each production in P' has the form given in Theorem 8, and let G = (N, Σ, P, S) be the resulting equivalent reversible context-free grammar by the transformation. Then |N| ≤ 2|N'| + 2|P'| − 3 and |P| ≤ 4|P'| + |N'| − 3. Thus this transformation polynomially blows up the size of the grammar. However, the transformation of any context-free grammar into an equivalent one that is invertible suffers an exponential blowup in the number of nonterminals in the grammar. Note that while the standard transformation to make a context-free grammar invertible preserves structural equivalence (see McNaughton, 1967, for example), the transformations, including the ones used in Theorems 7 and 8, to achieve reset-freeness in general do not, and cannot always, preserve structural equivalence, although they preserve language equivalence. This is because some sets of skeletons accepted by skeletal tree automata are not accepted by any reversible skeletal tree automaton, which is the correct analog of the theory in the case of finite automata, where not all regular languages are reversible.

DEFINITION. A context-free grammar G = (N, Σ, P, S) is said to be extended reversible if and only if for P' = P − {S → a | a ∈ Σ}, G' = (N, Σ, P', S) is reversible.
By the above theorem, reversible context-free grammars can be easily extended so that for any context-free language not containing ε, we can find an extended reversible context-free grammar which is ε-free and generates the language.

THEOREM 9. Let L be any context-free language not containing ε. Then there is an ε-free extended reversible context-free grammar G such that L(G) = L.
Proof. It is obvious from the definition of the extended reversible context-free grammars and Theorem 8. Q.E.D.
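The chain-rule device in step 1 of the proof of Theorem 8 can be sketched as code. This is an illustrative fragment under my own encoding, covering only step 1; the S-productions of steps 2 and 3 are handled analogously.

```python
from collections import defaultdict

def chainify(productions):
    """Rewrite the alternatives A -> alpha_1 | ... | alpha_n as the chain
    A -> alpha_1, A -> X_A1, X_A1 -> alpha_2, X_A1 -> X_A2, ...,
    so each nonterminal keeps at most two productions of different
    shapes, preserving invertibility and adding reset-freeness."""
    by_lhs = defaultdict(list)
    for lhs, rhs in productions:
        by_lhs[lhs].append(rhs)
    new_prods = []
    for lhs, alts in by_lhs.items():
        cur = lhs
        for i, alt in enumerate(alts):
            new_prods.append((cur, alt))
            if i < len(alts) - 1:
                nxt = f"X_{lhs}_{i + 1}"   # fresh chain nonterminal
                new_prods.append((cur, (nxt,)))
                cur = nxt
    return new_prods

prods = [("A", ("B", "C")), ("A", ("D", "E")), ("A", ("a",))]
for lhs, rhs in chainify(prods):
    print(lhs, "->", " ".join(rhs))
```

For the three alternatives of A this produces A → B C, A → X_A_1, X_A_1 → D E, X_A_1 → X_A_2, and X_A_2 → a: each left-hand side now has at most one non-chain and one chain production.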
6. LEARNING ALGORITHMS

In this section we first describe and analyze the algorithm RT to learn reversible skeletal tree automata from positive samples. Next we apply this algorithm to learning context-free grammars from positive samples of their structural descriptions. Essentially, the algorithm RT is an extension of Angluin's (1982) learning algorithm for zero-reversible automata. Without loss of generality, we restrict our consideration to ε-free context-free grammars.

DEFINITION. A positive sample of a tree automaton A is a finite subset of T(A). A positive sample CS of a reversible skeletal tree automaton A is
a characteristic sample for A if and only if for any reversible skeletal tree automaton A', T(A') ⊇ CS implies T(A) ⊆ T(A').

6.1. The Learning Algorithm RT for Tree Automata
The input to RT is a finite nonempty set of skeletons Sa. The output is a particular reversible skeletal tree automaton A = RT(Sa). The learning algorithm RT begins with the base tree automaton for Sa and generalizes it by merging states. RT finds a reversible skeletal tree automaton whose characteristic sample is precisely the input sample. On input Sa, RT first constructs A = Bs(Sa), the base tree automaton for Sa. It then constructs the finest partition πf of the set Q of states of A with the property that A/πf is reversible, and outputs A/πf. To construct πf, RT begins with the trivial partition of Q and repeatedly merges any two distinct blocks B1 and B2 if any of the following conditions is satisfied:

1. B1 and B2 both contain final states of A.
2. There exist two states q ∈ B1 and q' ∈ B2 of the forms q = σ(u1, ..., uk) and q' = σ(u'1, ..., u'k) such that for 1
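The merging loop can be given a partial sketch, under an assumed encoding (transitions as a (symbol, child-tuple) → state dict, as earlier, with a plain union-find for blocks). Only condition 1 and a determinism-style rule, merging the results of transitions whose child tuples have become equivalent, are shown; the remaining reset-free merging condition can be added in the same manner.

```python
def rt_merge(states, finals, delta):
    """Start from the trivial partition and merge blocks forced
    together by the reversibility conditions; return the blocks."""
    parent = {q: q for q in states}

    def find(q):                       # union-find with path halving
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    def union(a, b):
        parent[find(a)] = find(b)

    # Condition 1: all final states fall into one block.
    finals = list(finals)
    for q in finals[1:]:
        union(finals[0], q)

    changed = True
    while changed:                     # iterate to a fixed point
        changed = False
        items = list(delta.items())
        for (sym1, kids1), q1 in items:
            for (sym2, kids2), q2 in items:
                if sym1 != sym2 or len(kids1) != len(kids2):
                    continue
                same = all(
                    a == b or (a in parent and b in parent
                               and find(a) == find(b))
                    for a, b in zip(kids1, kids2))
                # determinism: equivalent child tuples force the two
                # resulting states into one block
                if same and find(q1) != find(q2):
                    union(q1, q2)
                    changed = True

    blocks = {}
    for q in states:
        blocks.setdefault(find(q), set()).add(q)
    return list(blocks.values())
```

For example, with final states q1 and q2 and transitions σ(q1) → q3 and σ(q2) → q4, condition 1 merges q1 with q2, after which the determinism rule merges q3 with q4, leaving two blocks.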