Tree Languages Generated by Context-Free Graph Grammars*
Joost Engelfriet and Sebastian Maneth
Leiden University, Department of Computer Science, PO Box 9512, 2300 RA Leiden, The Netherlands, email: fengelfri,
[email protected]
Abstract. A characterization is given of the class of tree languages which can be generated by context-free hyperedge replacement (HR) graph grammars, in terms of macro tree transducers (MTTs). A similar characterization is given of the MSO definable tree transductions.
1 Introduction
A tree $t$ (over a ranked alphabet) can conveniently be represented by a hypergraph $g$ in the following way. Each node $u$ of $t$ is represented in $g$ by the same node $u$ and a hyperedge $e_u$; if $u$ has children $u_1, \dots, u_k$ in $t$, then $e_u$ is incident with $u_1, \dots, u_k, u$ in $g$ (in that order). The right-hand side of the leftmost production in Fig. 2(b) shows the hypergraph representation of the monadic tree $A(e)$. We call such a hypergraph a tree graph, as opposed to the well-known term graphs, which are tree graphs with sharing of subtrees (see, e.g., [Plu98]).
We want to characterize the class TR(HR) of tree languages which can be generated by HR grammars in this way, by means of MTTs, which are a well-known model for syntax-directed semantics that combines features of the top-down tree transducer and the context-free tree grammar [EV85, FV98]. The class of tree languages obtained by unfolding the term graph languages which can be generated by HR grammars is characterized in [EH92] as the class of output tree languages of attribute grammars. More recently it was proved in [Dre97] that TR(HR) can be obtained by evaluating the output tree languages of finite-copying top-down tree transducers [ERS80] in an algebra of hypergraphs in which each operation is a substitution into a tree graph. We show that the evaluation in such an algebra corresponds to a very restricted kind of MTT, viz. one which is linear and nondeleting in both the input variables and the parameters (Theorem 3). Composing this class with the class of finite-copying top-down tree transducers, we obtain our characterization of TR(HR) as the class of output tree languages of finite-copying MTTs which are linear and nondeleting in the parameters (Theorem 4). When regular look-ahead is added to these MTTs, they compute precisely the MSO definable tree transductions (Theorem 7).
Acknowledgment. We thank one of the referees for constructive comments.
* This work was supported by the EC TMR Network GETGRATS.
2 Trees and HR Grammars
A set $\Sigma$ together with a mapping $\mathrm{rank}\colon \Sigma \to \mathbb{N}$ is called a ranked set. For $k \ge 0$, $\Sigma^{(k)}$ is the set $\{\sigma \in \Sigma \mid \mathrm{rank}(\sigma) = k\}$ (we also write $\sigma^{(k)}$ to denote that $\mathrm{rank}(\sigma) = k$). For a set $A$, $\langle \Sigma, A \rangle$ is the ranked set $\{\langle \sigma, a \rangle \mid \sigma \in \Sigma, a \in A\}$ with $\mathrm{rank}(\langle \sigma, a \rangle) = \mathrm{rank}(\sigma)$. By $\mathrm{inc}(\Sigma)$ ($\mathrm{dec}(\Sigma)$) we denote the ranked set obtained from $\Sigma$ by increasing (decreasing) the rank of each symbol by one, and $\mathrm{zero}(\Sigma)$ is obtained by changing each rank into zero. The set of all trees over $\Sigma$ is denoted $T_\Sigma$. For a set $A$, $T_\Sigma(A)$ is the set of all trees over $\Sigma \cup A$, where all elements in $A$ have rank zero. We fix the set $X$ of input variables $x_1, x_2, \dots$ and the set $Y$ of parameters $y_1, y_2, \dots$. For $k \ge 0$, $X_k = \{x_1, \dots, x_k\}$ and $Y_k = \{y_1, \dots, y_k\}$.
We assume the reader to be familiar with hypergraphs and HR grammars (see, e.g., [Hab92, DHK97]). For a ranked set $\Delta$, the set of all hypergraphs over $\Delta$ is denoted $HGR(\Delta)$. The label of a hyperedge of type $k$ (i.e., with $k$ incident nodes) is in $\Delta^{(k)}$. The type of a hypergraph is the number of its external nodes. We represent a tree $t \in T_\Sigma$ by a hypergraph $g = gr(t) \in HGR(\mathrm{inc}(\Sigma))$ of type 1, as discussed in the Introduction, with the root of $t$ being the external node of $g$. We also represent a simple tree $t \in T_\Sigma(Y_k)$ by a hypergraph $g \in HGR(\mathrm{inc}(\Sigma))$ of type $k + 1$ (where "simple" means that each parameter in $Y_k$ occurs exactly once in $t$). Then the node $u$ of $t$ with label $y_i$ is the $i$-th external node of $g$, and there is no hyperedge $e_u$ in $g$; the root of $t$ is the last external node of $g$. Consider, e.g., the tree $t = a(A(b(y_1)))$; the tree graph $g = gr(t)$ is depicted in the production $A \to g$ of Fig. 2(b). For a tree graph $g$ over $\Delta$ we denote by $tr(g)$ the (simple) tree it represents, which is over $\mathrm{dec}(\Delta) \cup Y_{k-1}$ if $k$ is the type of $g$. For a class $\mathcal{L}$ of hypergraph languages we denote by $TR(\mathcal{L})$ the class of all tree languages obtained by applying $tr$ to the tree graph languages (of type 1) in $\mathcal{L}$. Hence if HR denotes the class of HR languages, then TR(HR) denotes the class of all tree languages which can be generated by HR grammars.
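The representation $gr(t)$ described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: trees are encoded as `(label, children)` pairs, nodes are identified by their Dewey addresses (tuples of child indices), and parameters are recognized by the spelling `y1`, `y2`, ...; none of these encodings come from the paper.

```python
def node(label, *children):
    """Build a tree as a (label, children) pair (encoding assumed for this sketch)."""
    return (label, tuple(children))

def is_parameter(label):
    # Assumption: parameters are spelled "y1", "y2", ...
    return label[:1] == "y" and label[1:].isdigit()

def gr(t):
    """Return (nodes, hyperedges, external nodes) of the tree graph of a simple tree t.

    A node u with children u1..uk contributes the hyperedge (label, (u1, ..., uk, u)).
    A node labeled y_i carries no hyperedge and becomes the i-th external node;
    the root is the last external node.
    """
    nodes, edges, params = [], [], {}

    def visit(sub, u):
        label, children = sub
        nodes.append(u)
        if is_parameter(label) and not children:
            index = int(label[1:])
            if index in params:
                raise ValueError("not simple: parameter occurs twice")
            params[index] = u
            return
        kids = [u + (i,) for i in range(1, len(children) + 1)]
        for child, v in zip(children, kids):
            visit(child, v)
        edges.append((label, tuple(kids) + (u,)))

    visit(t, ())
    k = len(params)
    if sorted(params) != list(range(1, k + 1)):
        raise ValueError("not simple: parameters must be exactly y1..yk")
    external = [params[i] for i in range(1, k + 1)] + [()]
    return nodes, edges, external

# The example tree a(A(b(y1))) from the text; its tree graph has type 2.
nodes, edges, external = gr(node("a", node("A", node("b", node("y1")))))
```

Each hyperedge has one more incident node than its label has children, which matches the use of $\mathrm{inc}(\Sigma)$ in the text.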
3 Macro Tree Transducers
A macro tree transducer is a syntax-directed translation device in which the translation of an input tree may depend on its context. The context information is handled by parameters. We will consider total deterministic MTTs only.
A macro tree transducer (for short, MTT) $M$ is a tuple $(Q, \Sigma, \Delta, q_0, R)$, where $Q$ is a ranked alphabet of states, $\Sigma$ and $\Delta$ are ranked alphabets of input and output symbols, respectively, $q_0 \in Q^{(0)}$ is the initial state, and $R$ is a finite set of rules; for every $q \in Q^{(m)}$ and $\sigma \in \Sigma^{(k)}$ with $m, k \ge 0$ there is exactly one rule of the form $\langle q, \sigma(x_1, \dots, x_k) \rangle (y_1, \dots, y_m) \to \zeta$ in $R$, where $\zeta \in T_{\langle Q, X_k \rangle \cup \Delta}(Y_m)$. The right-hand side $\zeta$ will be denoted by $rhs_M(q, \sigma)$. If every state has rank zero, then $M$ is a top-down tree transducer (for short, T). The rules of $M$ are used as term rewriting rules in the usual way and the derivation relation of $M$ (on $T_{\langle Q, T_\Sigma \rangle \cup \Delta}$) is denoted by $\Rightarrow_M$. The transduction realized by $M$, denoted $\tau_M$, is the total function $\{(s, t) \in T_\Sigma \times T_\Delta \mid \langle q_0, s \rangle \Rightarrow_M^* t\}$. The class of all transductions which can be realized by MTTs (Ts) is denoted by MTT (T). For $q \in Q^{(m)}$ and $s \in T_\Sigma$ we denote by $M_q(s)$ the tree $t \in T_\Delta(Y_m)$ such that $\langle q, s \rangle (y_1, \dots, y_m) \Rightarrow_M^* t$.
If, for every $q \in Q^{(m)}$ and $\sigma \in \Sigma^{(k)}$, (i) each $x \in X_k$ occurs exactly once in $rhs_M(q, \sigma)$, then $M$ is simple in the input (for short, si), or (ii) each $y \in Y_m$ occurs exactly once in $rhs_M(q, \sigma)$, then $M$ is simple in the parameters (for short, sp). Note that we use "simple" to abbreviate the more usual "linear and nondeleting". For a class $\Phi$ of tree transductions and a class $\mathcal{L}$ of tree languages, $\Phi(\mathcal{L}) = \{\varphi(L) \mid \varphi \in \Phi, L \in \mathcal{L}\}$, and $OUT(\Phi)$ is the class of ranges of transductions in $\Phi$.
If we disregard the input of an MTT, then we obtain a context-free tree (CFT) grammar. A CFT grammar $G$ is a tuple $(N, \Delta, S, P)$, where $N$ and $\Delta$ are ranked alphabets of nonterminals and terminals, $S \in N^{(0)}$ is the initial nonterminal, and $P$ is a finite set of productions of the form $A(y_1, \dots, y_m) \to \zeta$ with $A \in N^{(m)}$ and $\zeta \in T_{N \cup \Delta}(Y_m)$. If each $y \in Y_m$ occurs exactly once in each $\zeta$, then $G$ is simple. The class of tree languages generated by simple CFT grammars is denoted $CFT_{sp}$. If $N = N^{(0)}$, then $G$ is a regular tree grammar. The class of regular tree languages is denoted by REGT.
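To make the rule format of a total deterministic MTT concrete, the following Python sketch evaluates $M_q(s)$ by structural recursion on the input tree. The encoding of right-hand sides (`sym`, `call`, `par`) and the example transducer, which reverses a monadic tree over `a`, `b`, `e` by accumulating the result in a parameter, are assumptions made for this illustration and do not appear in the paper.

```python
def sym(name, *kids):
    """Output symbol applied to right-hand-side subtrees."""
    return ("sym", name, kids)

def call(q, i, *actuals):
    """The state call <q, x_i>(actuals)."""
    return ("call", q, i, actuals)

def par(j):
    """The parameter y_j."""
    return ("par", j)

def run(rules, q, s, args=()):
    """Compute M_q(s) with the trees in args substituted for y_1..y_m."""
    label, kids = s
    def ev(node):
        kind = node[0]
        if kind == "par":
            return args[node[1] - 1]
        if kind == "sym":
            return (node[1], tuple(ev(k) for k in node[2]))
        _, q2, i, actuals = node
        return run(rules, q2, kids[i - 1], tuple(ev(a) for a in actuals))
    return ev(rules[(q, label)])

# Hypothetical example: q0 has rank 0, q has rank 1; exactly one rule per (state, symbol).
RULES = {
    ("q0", "a"): call("q", 1, sym("a", sym("e"))),
    ("q0", "b"): call("q", 1, sym("b", sym("e"))),
    ("q0", "e"): sym("e"),
    ("q", "a"): call("q", 1, sym("a", par(1))),
    ("q", "b"): call("q", 1, sym("b", par(1))),
    ("q", "e"): par(1),
}

s = ("a", (("b", (("e", ()),)),))   # the input tree a(b(e))
result = run(RULES, "q0", s)        # the output tree b(a(e))
```

In this example every right-hand side uses $x_1$ and $y_1$ exactly once where they are available, so the transducer is both si and sp in the sense defined above.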
Finite Copying. The notion of finite-copying can be defined for arbitrary MTTs, but for convenience we restrict ourselves to sp MTTs. Consider a derivation of an MTT $M$ which is sp. A subtree of an input tree $s$ may be processed by $M$ arbitrarily many times, depending on $s$ and the rules of $M$. The state sequence of $s$ at node $u$, denoted by $sts(s, u)$, contains all states which process the subtree $s/u$ of $s$ rooted at $u$. Formally, the state sequence of $s$ at its root is $q_0$, and if $sts(s, u) = q_1 \cdots q_n$, $u$ has label $\sigma$, and $u \cdot i$ is the $i$-th child of $u$, then $sts(s, u \cdot i) = \pi_i(rhs_M(q_1, \sigma) \cdots rhs_M(q_n, \sigma))$, where $\pi_i$ changes every $\langle q, x_i \rangle$ into $q$ and deletes everything else. If for every $s \in T_\Sigma$ and every node $u$ of $s$, $|sts(s, u)| \le k$ for a fixed $k \ge 0$, then $M$ is called $k$-copying or finite-copying (for short, fc). The class of transductions realized by MTTs (Ts) which are $w \in \{si, sp, fc\}$ is denoted by $MTT_w$ ($T_w$).
As an example, a 2-copying $MTT_{sp}$ that translates the monadic tree $\gamma^n(\alpha)$ into $\sigma(a^n(b^n(e)), a^n(b^n(e)))$ has rules $\langle q_0, \gamma(x_1) \rangle \to \sigma(\langle q, x_1 \rangle(e), \langle q, x_1 \rangle(e))$, $\langle q_0, \alpha \rangle \to \sigma(e, e)$, $\langle q, \gamma(x_1) \rangle(y_1) \to a(\langle q, x_1 \rangle(b(y_1)))$, and $\langle q, \alpha \rangle(y_1) \to a(b(y_1))$.
A finite-copying MTT can be decomposed into a finite-copying top-down tree transducer followed by an MTT that is simple in the input: the top-down tree transducer simply generates $|sts(s, u)|$ copies of the tree $s/u$.
Lemma 1. $MTT_{fc,sp} = MTT_{si,sp} \circ T_{fc}$.
Proof. ($\subseteq$) Let $M = (Q, \Sigma, \Delta, q_0, R)$ be an $MTT_{fc,sp}$. Define the $T_{fc}$ $M' = (\mathrm{zero}(Q), \Sigma, \Gamma, q_0, R')$ and the $MTT_{si,sp}$ $M'' = (Q, \Gamma, \Delta, q_0, R'')$ as follows. For every $q \in Q^{(m)}$ and $\sigma \in \Sigma^{(k)}$ let $\langle q, \sigma(x_1, \dots, x_k) \rangle \to \sigma_n(\langle q_1, x_{i_1} \rangle, \dots, \langle q_n, x_{i_n} \rangle)$ be in $R'$ and $\sigma_n \in \Gamma^{(n)}$, where $\langle q_1, x_{i_1} \rangle, \dots, \langle q_n, x_{i_n} \rangle$ are all elements of $\langle Q, X \rangle$ in $rhs_M(q, \sigma)$. Furthermore, let $\langle q, \sigma_n(x_1, \dots, x_n) \rangle(y_1, \dots, y_m) \to \zeta$ be in $R''$, where $\zeta$ is obtained from $rhs_M(q, \sigma)$ by changing $\langle q_j, x_{i_j} \rangle$ into $\langle q_j, x_j \rangle$ (with appropriate dummy rules added to make $M''$ total). It can be shown that the state sequences of $M'$ are precisely those of $M$, and that for $q \in Q$ and $s \in T_\Sigma$, $M''_q(M'_q(s)) = M_q(s)$. Hence $\tau_{M'} \in T_{fc}$ and $\tau_M = \tau_{M''} \circ \tau_{M'}$.
($\supseteq$) By a straightforward product construction. □
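The formal definition of state sequences can be checked on the 2-copying example with a short Python sketch. The encoding of right-hand sides and the symbol names `gamma`, `alpha`, `sigma` (standing for the unary input symbol, the nullary input symbol, and the binary output symbol of the example) are assumptions of this sketch.

```python
def sym(name, *kids):
    return ("sym", name, kids)

def call(q, i, *actuals):
    """The state call <q, x_i>(actuals)."""
    return ("call", q, i, actuals)

def par(j):
    return ("par", j)

def calls(node):
    """All pairs (q, i) for occurrences <q, x_i> in a right-hand side, left to right."""
    kind = node[0]
    if kind == "par":
        return []
    if kind == "sym":
        return [c for k in node[2] for c in calls(k)]
    _, q, i, actuals = node
    return [(q, i)] + [c for a in actuals for c in calls(a)]

def state_sequences(rules, q0, s):
    """Map every node address u of s to the state sequence sts(s, u)."""
    result = {}
    def walk(sub, u, seq):
        result[u] = seq
        label, kids = sub
        found = [c for q in seq for c in calls(rules[(q, label)])]
        for i, kid in enumerate(kids, start=1):
            # pi_i keeps the states of the calls on x_i and drops everything else
            walk(kid, u + (i,), tuple(q for q, j in found if j == i))
    walk(s, (), (q0,))
    return result

# The 2-copying example transducer from the text.
RULES = {
    ("q0", "gamma"): sym("sigma", call("q", 1, sym("e")), call("q", 1, sym("e"))),
    ("q0", "alpha"): sym("sigma", sym("e"), sym("e")),
    ("q", "gamma"): sym("a", call("q", 1, sym("b", par(1)))),
    ("q", "alpha"): sym("a", sym("b", par(1))),
}

s = ("gamma", (("gamma", (("alpha", ()),)),))   # the input tree gamma(gamma(alpha))
sts = state_sequences(RULES, "q0", s)
longest = max(len(seq) for seq in sts.values())
```

For this input every node below the root is processed by the two copies of `q` created at the root, so no state sequence is longer than 2. The sketch inspects a single input tree only; it does not decide finite-copying for all inputs.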
4 Tree Graph Operations
To state the characterization of TR(HR) by Drewes in [Dre97], we have to recall the characterization of HR in terms of operations on hypergraphs (cf. [BC87, Eng94, Dre96]). A hypergraph that contains variables $z_1, \dots, z_k$ can be seen as a $k$-ary operation on hypergraphs. Let $Z_k = \{z_{ij} \mid 1 \le i \le k, j \ge 0\}$ with $\mathrm{rank}(z_{ij}) = j$. If $g \in HGR(\Delta \cup Z_k)$ and for every $1 \le i \le k$ there is exactly one $j_i$ such that $z_{i j_i}$ appears in $g$, then $g$ is a $k$-ary hypergraph operation over $\Delta$ (on hypergraphs of type $j_1, \dots, j_k$). For a ranked alphabet $\Sigma$ and a mapping $f$, $(\Sigma, f)$ is an alphabet of hypergraph operations (over $\Delta$) if $f(\sigma)$ is a $k$-ary hypergraph operation over $\Delta$ for every $\sigma \in \Sigma^{(k)}$. With $(\Sigma, f)$ we associate the (partial) valuation function $val_{\Sigma,f}\colon T_\Sigma \to HGR(\Delta)$. For $\sigma \in \Sigma^{(k)}$ and $s_1, \dots, s_k \in T_\Sigma$, $val_{\Sigma,f}(\sigma(s_1, \dots, s_k)) = f(\sigma)[z_{ij} / val_{\Sigma,f}(s_i) \mid 1 \le i \le k, j \ge 0]$, where $g[z/h]$ is the hypergraph obtained from $g$ by replacing its unique $z$-labeled hyperedge $e$ by the hypergraph $h$ (provided $\mathrm{type}(h) = \mathrm{type}(e)$). With these definitions, HR is the class of all hypergraph languages $val_{\Sigma,f}(L)$ for some alphabet of hypergraph operations $(\Sigma, f)$ and a regular tree language $L$ over $\Sigma$ (assuming that all hypergraphs in a hypergraph language have the same type).
An easy way to guarantee that an HR grammar generates a tree language is to require all its right-hand sides to be tree graphs (because tree graphs are closed under hypergraph substitution). But it is well known that the corresponding class of tree languages is a proper subclass of TR(HR) (cf. [Dre97]). By the (proof of the) above characterization it is the class of all $val_{\Sigma,f}(L)$ where $f(\sigma)$ is a tree graph for every $\sigma \in \Sigma$ and $L$ is in REGT. In [Dre97] Drewes shows that, to obtain TR(HR), REGT should be replaced by $T_{fc}(REGT)$. Let $VAL_{tr}$ be the set of all $val_{\Sigma,f}$ such that $(\Sigma, f)$ is an alphabet of tree graph operations, which means that $f(\sigma)$ is a tree graph for every $\sigma \in \Sigma$.
The following theorem is shown by Drewes for "hypertrees", which are a slight generalization of tree graphs; it is easy to see that it also holds for tree graphs.
Proposition 2 ([Dre97]). $TR(HR) = TR(VAL_{tr}(T_{fc}(REGT)))$.
We now show that the class of tree transductions $TR \circ VAL_{tr}$ is closely related to $MTT_{si,sp}$: they produce the same output tree languages when applied to a class of input tree languages with weak closure properties.
Theorem 3. Let $\mathcal{L}$ be a class of tree languages which is closed under intersection with regular tree languages and under $T_{si}$. Then $TR(VAL_{tr}(\mathcal{L})) = MTT_{si,sp}(\mathcal{L})$.
Proof. ($\subseteq$) If a tree graph operation $h$ is applied to tree graphs $g_1, \dots, g_k$, then the hyperedge of $h$ labeled by variable $z_{i j_i}$ is replaced by $g_i$. For the corresponding trees this means that in $tr(h)$ the symbol $z_{i j_i}$ of rank $j_i - 1$ is replaced by $tr(g_i)$, which contains parameters $y_1, \dots, y_{j_i - 1}$. This is a simple case of term rewriting which, as it turns out, can be carried out by an $MTT_{si,sp}$. Since MTTs are total, whereas a function val in $VAL_{tr}$ is in general partial, the input for the corresponding MTT should be restricted to the domain of val, which is a regular tree language (because well-typedness is a regular property).
Let $(\Sigma, f)$ be an alphabet of tree graph operations over $\Delta$, and $L \subseteq T_\Sigma$ a tree language in $\mathcal{L}$ (with $val_{\Sigma,f}(L)$ of type 1). Let $L' = L \cap \mathrm{dom}(val_{\Sigma,f})$. Since $\mathcal{L}$ is closed under intersection with REGT languages, $L' \in \mathcal{L}$. Let $N = \max\{\mathrm{type}(f(\sigma)) - 1 \mid \sigma \in \Sigma\}$. Define $M = (Q, \Sigma, \mathrm{dec}(\Delta), 0, R)$ with $Q = \{m^{(m)} \mid 0 \le m \le N\}$, where $R$ consists of all rules $\langle m, \sigma(x_1, \dots, x_k) \rangle(y_1, \dots, y_m) \to \zeta$, where $\sigma \in \Sigma^{(k)}$, $m = \mathrm{type}(f(\sigma)) - 1$, and $\zeta$ is obtained from $tr(f(\sigma))$ by changing every $z_{ij}$ into $\langle j - 1, x_i \rangle$. Since for every $i$ there is exactly one $z_{ij}$ in $f(\sigma)$, $M$ is si. Since $tr(f(\sigma))$ is a simple tree in $T_{\mathrm{dec}(\Delta)}(Y_m)$, $M$ is sp. For every $s \in L'$, $M_m(s) = tr(val_{\Sigma,f}(s))$, where $m = \mathrm{type}(val_{\Sigma,f}(s)) - 1$. Hence $\tau_M(L') = tr(val_{\Sigma,f}(L))$.
($\supseteq$) The MTT in the above proof needs different states merely to provide the correct number of parameters, i.e., $val_{\Sigma,f}$ has no state information. Thus, to realize $\tau_M \in MTT_{si,sp}$ by some $val_{\Sigma,f}$ we must first add to the input tree the information by which states of $M$ its subtrees are processed. This can be done by a simple top-down tree transducer which changes the label $\sigma$ of a node $u$ in $s$ into $\langle \sigma, q \rangle$, where $q$ is the state of $M$ that processes the subtree $s/u$.
Let $M = (Q, \Sigma, \Delta, q_0, R)$ be an $MTT_{si,sp}$ and let $L \subseteq T_\Sigma$ be a tree language in $\mathcal{L}$. Define the $T_{si}$ $M' = (\mathrm{zero}(Q), \Sigma, \Gamma, q_0, R')$ with $\Gamma = \langle \Sigma, Q \rangle$, where $R'$ consists of all rules $\langle q, \sigma(x_1, \dots, x_k) \rangle \to \langle \sigma, q \rangle(\langle q_1, x_1 \rangle, \dots, \langle q_k, x_k \rangle)$ where $q \in Q$, $\sigma \in \Sigma^{(k)}$, and $\langle q_1, x_1 \rangle, \dots, \langle q_k, x_k \rangle$ are the elements of $\langle Q, X \rangle$ in $rhs_M(q, \sigma)$. Define $f(\langle \sigma, q \rangle) = gr(\zeta)$, where $\zeta$ is obtained from $rhs_M(q, \sigma)$ by changing every $\langle q_i, x_i \rangle \in \langle Q, X \rangle^{(j)}$ into $z_{i,j+1}$. Since $M$ is sp, $gr(\zeta)$ is defined, and since $M$ is si, it is a $k$-ary tree graph operation over $\mathrm{inc}(\Delta)$. For $q \in Q$ and $s \in T_\Sigma$, $M_q(s) = tr(val_{\Gamma,f}(M'_q(s)))$. Hence $\tau_M(L) = tr(val_{\Gamma,f}(\tau_{M'}(L)))$ with $\tau_{M'}(L) \in \mathcal{L}$ because $\mathcal{L}$ is closed under $T_{si}$. □
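As an illustration of the construction in the first half of the proof, consider three tree graph operations whose trees are the right-hand sides used for the grammar $G_b$ later in the paper. The operation names $c_S$, $c_1$, $c_0$ are hypothetical labels introduced only for this sketch; the variable $z_{12}$ has rank 2 as a hyperedge label and hence one argument in tree form.

```latex
\begin{align*}
tr(f(c_S)) &= z_{12}(e), & tr(f(c_1)) &= a(z_{12}(b(y_1))), & tr(f(c_0)) &= a(b(y_1)).
\end{align*}
% Here N = 1 and Q = \{0^{(0)}, 1^{(1)}\}; replacing z_{12} by <1, x_1> gives the rules
\begin{align*}
\langle 0, c_S(x_1) \rangle &\to \langle 1, x_1 \rangle (e), \\
\langle 1, c_1(x_1) \rangle (y_1) &\to a(\langle 1, x_1 \rangle (b(y_1))), \\
\langle 1, c_0 \rangle (y_1) &\to a(b(y_1)).
\end{align*}
% A derivation on the well-typed input c_S(c_1(c_0)):
\begin{align*}
\langle 0, c_S(c_1(c_0)) \rangle
  \Rightarrow_M \langle 1, c_1(c_0) \rangle (e)
  \Rightarrow_M a(\langle 1, c_0 \rangle (b(e)))
  \Rightarrow_M a(a(b(b(e)))).
\end{align*}
```

Under these assumptions the input $c_S(c_1^{\,n-1}(c_0))$ is translated into $a^n(b^n(e))$ for every $n \ge 1$, and each rule uses its input variable and its parameter exactly once, in line with the si and sp claims of the proof.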
5 Tree Languages Generated by HR Grammars
In the previous section we have shown how the valuation functions induced by tree graph operations are related to MTTs. By Proposition 2 the class TR(HR) can be expressed in terms of tree graph operations. Thus we obtain the following characterization of TR(HR).
Theorem 4. $TR(HR) = MTT_{fc,sp}(REGT) = OUT(MTT_{fc,sp})$.
Proof. By Proposition 2, TR(HR) is equal to $TR(VAL_{tr}(T_{fc}(REGT)))$, which equals $MTT_{fc,sp}(REGT)$ by Lemma 1 and Theorem 3 because $T_{fc}(REGT)$ is closed under both (i) intersection with REGT and (ii) $T_{si}$. If $\tau \in T_{fc}$ and $L, L' \in REGT$, then $\tau(L) \cap L'$ equals $\tau(L \cap \tau^{-1}(L'))$, which is in $T_{fc}(REGT)$ because REGT is closed under inverse top-down tree transductions (cf. Corollary IV.3.17 of [GS84]); hence (i) holds. It is straightforward to show that $T_{fc}$ is closed under composition and hence (ii) holds (every $T_{si}$ is 1-copying). Since every REGT language is the range of a 1-copying top-down tree transducer [Man96], and $T_{fc}$ is closed under composition, the second equality follows from Lemma 1. □
Special Cases. There are two known ways to generate trees by HR grammars in a structured way, viz. by restricting the right-hand sides of the productions to be (a) "leaf-linked forests", or, as already mentioned in Section 4, (b) tree graphs. One example of each is given in Fig. 2. We denote the generated classes of HR languages by $HR_{fo}$ and $HR_{tr}$, respectively. It is shown in [Rao97] that $TR(HR_{fo}) = T_{fc}(REGT)$. Since, as observed in Section 4, $TR(HR_{tr}) = TR(VAL_{tr}(REGT))$, and since by Theorem 3 (and the known closure properties of REGT) this class equals $MTT_{si,sp}(REGT)$, the classes $HR_{tr}$ and $HR_{fo}$ correspond naturally to the "decomposition" result of Drewes (plus the one obtained from Theorem 3): $TR(HR) = TR(VAL_{tr}(T_{fc}(REGT))) = MTT_{si,sp}(T_{fc}(REGT))$. Another way of viewing these classes is to say that $HR_{fo}$ corresponds to the top-down tree transducer aspect of MTTs (because $TR(HR_{fo}) = T_{fc}(REGT)$), whereas $HR_{tr}$ corresponds to the context-free tree grammar aspect of MTTs.
Theorem 5. $TR(HR_{tr}) = CFT_{sp}$.
Proof. Clearly, by the argument in the beginning of the proof of Theorem 3, an $HR_{tr}$ grammar with productions $A \to g$, where $g$ is a tree graph, generates the same tree language as the corresponding $CFT_{sp}$ grammar that has the same nonterminals and the productions $A(y_1, \dots, y_m) \to tr(g)$, where $m = \mathrm{type}(g) - 1$. As an example, the CFT grammar corresponding to the HR grammar $G_b$ of Fig. 2 has productions $S \to A(e)$, $A(y_1) \to a(A(b(y_1)))$, and $A(y_1) \to a(b(y_1))$. □
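The simple CFT grammar given in the proof of Theorem 5 can be run directly. The following Python sketch enumerates the terminal trees derivable within a bounded number of rewriting steps; the tree encoding, the spelling `y1` for the parameter, and the function names are assumptions of this sketch, while the three productions are the ones stated in the text.

```python
def t(label, *kids):
    """Build a tree as a (label, children) pair (encoding assumed for this sketch)."""
    return (label, tuple(kids))

def is_parameter(label):
    return label[:1] == "y" and label[1:].isdigit()

def instantiate(body, actuals):
    """Replace each parameter y_j in body by actuals[j-1]."""
    label, kids = body
    if is_parameter(label) and not kids:
        return actuals[int(label[1:]) - 1]
    return (label, tuple(instantiate(k, actuals) for k in kids))

# S -> A(e),  A(y1) -> a(A(b(y1))),  A(y1) -> a(b(y1))
PRODUCTIONS = {
    "S": [t("A", t("e"))],
    "A": [t("a", t("A", t("b", t("y1")))), t("a", t("b", t("y1")))],
}

def step(tree):
    """Rewrite the outermost-leftmost nonterminal; return all successors ([] if none)."""
    label, kids = tree
    if label in PRODUCTIONS:
        return [instantiate(rhs, kids) for rhs in PRODUCTIONS[label]]
    for i, kid in enumerate(kids):
        successors = step(kid)
        if successors:
            return [(label, kids[:i] + (s,) + kids[i + 1:]) for s in successors]
    return []

def generate(max_steps):
    """Terminal trees derivable from S in at most max_steps rewriting steps."""
    frontier, terminal = [t("S")], []
    for _ in range(max_steps):
        successors = [s for tree in frontier for s in step(tree)]
        frontier = [s for s in successors if step(s)]
        terminal += [s for s in successors if not step(s)]
    return terminal

def show(tree):
    label, kids = tree
    return label + ("(" + ",".join(show(k) for k in kids) + ")" if kids else "")

derived = [show(x) for x in generate(3)]
```

Because the grammar is simple, the choice of which nonterminal to rewrite first does not affect the generated language; the outermost-leftmost strategy is used here only to keep the enumeration deterministic.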
[Figure 1: diagram whose nodes are the classes REGT, $TR(HR_{tr})$, $TR(HR_{fo})$, TR(HR), HOM(REGT), T(REGT), $CFT_{IO}$, and OUT(AG)]
Fig. 1. Classes of tree languages generated by HR grammars
Let us now compare the classes $TR(HR_{tr})$, $TR(HR_{fo})$, and TR(HR) with each other and with some well-known classes of tree languages: HOM(REGT), T(REGT), $CFT_{IO}$, and OUT(AG), where HOM denotes the class of transductions realized by one-state Ts, $CFT_{IO}$ is the class of tree languages generated by CFT grammars in IO (inside-out) derivation mode, and AG is the class of tree transductions realized by attribute grammars.
Theorem 6. Figure 1 is a Hasse diagram.
Proof. The inclusions are known from the literature. Thus, it suffices to prove the following three inequalities.
(1) $TR(HR_{tr}) - T(REGT) \neq \emptyset$. Every monadic tree language in T(REGT) is regular. However, the $HR_{tr}$ grammar $G_b$ of Fig. 2(b) generates the non-regular monadic tree language $\{a^n(b^n(e)) \mid n \ge 1\}$.
(2) $TR(HR_{fo}) - CFT_{IO} \neq \emptyset$. The $HR_{fo}$ grammar $G_a$ of Fig. 2(a) generates all trees $\sigma(t, \bar{t})$, where $t$ is a binary tree over $\{\sigma^{(2)}, e^{(0)}\}$ and $\bar{t}$ is the same tree with each $\sigma$ replaced by $\bar{\sigma}$. In Section 5 of [EF81] it is shown that this tree language is not in $CFT_{IO}$.
[Figure 2: (a) productions of the $HR_{fo}$ grammar $G_a$; (b) productions of the $HR_{tr}$ grammar $G_b$]
Fig. 2. Productions of $G_a$ and $G_b$
(3) $HOM(REGT) - TR(HR) \neq \emptyset$. Consider the one-state T with the two rules $\langle q, \gamma(x) \rangle \to \sigma(\langle q, x \rangle, \langle q, x \rangle)$ and $\langle q, \alpha \rangle \to a$. It generates full binary trees the yields of which are in $\{a^{2^n} \mid n \ge 0\}$. But the numbers of $a$'s in the hypergraphs of an HR language form a semi-linear set (Theorem IV.4.3 of [Hab92]). □
We note that the classes T(REGT), $CFT_{IO}$, and OUT(AG) are equal to the classes $TM(HR_{fo})$, $TM(HR_{tr})$, and $TM(HR)$, respectively, where $TM(HR_w)$ is the class of tree languages obtained by unfolding the term graph languages of these HR grammars (in which sharing of subtrees is now allowed). For the third class this is shown in [EH92].
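The counting argument in item (3) of the proof of Theorem 6 can be observed on small inputs with the following Python sketch. The symbol names `gamma`, `alpha`, and `sigma` stand for the unary input symbol, the nullary input symbol, and the binary output symbol of the one-state transducer; these names and the tree encoding are assumptions of the sketch.

```python
def monadic(n):
    """The input tree gamma^n(alpha)."""
    tree = ("alpha", ())
    for _ in range(n):
        tree = ("gamma", (tree,))
    return tree

def hom(s):
    """Apply the rules <q, gamma(x)> -> sigma(<q, x>, <q, x>) and <q, alpha> -> a."""
    label, kids = s
    if label == "alpha":
        return ("a", ())
    (kid,) = kids
    image = hom(kid)
    return ("sigma", (image, image))

def count_a(tree):
    """Number of leaves labeled a, i.e. the length of the yield."""
    label, kids = tree
    return 1 if label == "a" else sum(count_a(k) for k in kids)

counts = [count_a(hom(monadic(n))) for n in range(6)]
```

The counts double with each additional input symbol, so the gaps between consecutive values grow without bound. A semi-linear set of natural numbers is ultimately periodic, which an infinite set with unbounded gaps cannot be; this is the contradiction used in the proof.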
Classes of Tree Transductions. In [CE95] it is shown that the class HR of HR languages is equal to the class MSOT(REGT) of output languages of graph transductions definable in monadic second order logic (cf. [Cou94]) taking regular tree languages as input. This means that $TR(HR) = TR(MSOT(REGT))$. Hence the particular MSO transductions involved translate trees into trees; moreover, they can be restricted to total functions. Let us denote this class of tree-to-tree transductions by MSOTT. In [BE98] MSOTT is characterized in terms of attribute grammars. We now present a characterization of this class in terms of MTTs (which are more powerful than attribute grammars), proved in [EM98]. As it turns out, the feature of regular look-ahead (cf. [EV85]) must be added to the MTTs in order to obtain a natural characterization of the class MSOTT. We denote regular look-ahead by a superscript R.
Theorem 7. $MSOTT = MTT^R_{fc,sp}$.
Note that Theorem 4 is a consequence of Theorem 7 together with the result of [CE95] that $HR = MSOT(REGT)$ and the well-known fact that the regular look-ahead can be incorporated in the regular input language. Note also that, since $MTT^R = MTT$ [EV85], every MSO tree transduction can be realized by a macro tree transducer, i.e., $MSOTT \subseteq MTT$.
References
[BC87] M. Bauderon, B. Courcelle. Graph expressions and graph rewritings. Math. Systems Theory, 20:83–127, 1987.
[BE98] R. Bloem, J. Engelfriet. A comparison of tree transductions defined by monadic second order logic and by attribute grammars. Technical Report 98-02, Leiden University, 1998.
[CE95] B. Courcelle, J. Engelfriet. A logical characterization of the sets of hypergraphs defined by hyperedge replacement grammars. Math. Systems Theory, 28:515–552, 1995.
[Cou94] B. Courcelle. Monadic second-order definable graph transductions: a survey. Theoretical Computer Science, 126:53–75, 1994.
[DHK97] F. Drewes, A. Habel, H.-J. Kreowski. Hyperedge replacement graph grammars. In Handbook of Graph Grammars and Computing by Graph Transformation, Vol. 1: Foundations (G. Rozenberg, ed.), World Scientific, 1997.
[Dre96] F. Drewes. Computation by Tree Transductions. PhD thesis, University of Bremen, 1996.
[Dre97] F. Drewes. A characterization of the sets of hypertrees generated by hyperedge-replacement graph grammars. Technical Report Bericht Nr. 3/97, Universität Bremen, 1997. (to appear in Theory of Comput. Sys.).
[EF81] J. Engelfriet, G. Filè. Passes and paths of attribute grammars. Inform. and Control, 49:125–169, 1981.
[EH92] J. Engelfriet, L. Heyker. Context-free hypergraph grammars have the same term-generating power as attribute grammars. Acta Informatica, 29:161–210, 1992.
[EM98] J. Engelfriet, S. Maneth. Macro tree transducers, attribute grammars, and MSO definable tree translations. Technical Report 98-09, Leiden University, 1998.
[Eng94] J. Engelfriet. Graph grammars and tree transducers. In S. Tison, editor, Proc. CAAP'94, volume 787 of LNCS, pages 15–36. Springer-Verlag, 1994.
[ERS80] J. Engelfriet, G. Rozenberg, G. Slutzki. Tree transducers, L systems, and two-way machines. J. of Comp. Syst. Sci., 20:150–202, 1980.
[EV85] J. Engelfriet, H. Vogler. Macro tree transducers. J. of Comp. Syst. Sci., 31:71–146, 1985.
[FV98] Z. Fülöp, H. Vogler. Syntax-Directed Semantics. Formal Models based on Tree Transducers. EATCS Monographs on Theoretical Computer Science, Springer-Verlag, to appear 1998.
[GS84] F. Gécseg, M. Steinby. Tree Automata. Akadémiai Kiadó, Budapest, 1984.
[Hab92] A. Habel. Hyperedge Replacement: Grammars and Languages, volume 643 of LNCS. Springer-Verlag, 1992.
[Man96] S. Maneth. On the generating power of deterministic tree transducers. Technical Report TUD/FI 96/19, Technical University of Dresden, 1996. (to appear in Inform. and Comput.).
[Plu98] D. Plump. Term graph rewriting. (to appear in Handbook of Graph Grammars and Computing by Graph Transformation, Volume 2).
[Rao97] J.-C. Raoult. Rational tree relations. Bull. Belg. Math. Soc., 4:149–176, 1997.