Bidirectional Cooperating Distributed Grammar Systems Henning Fernau and Markus Holzer
WSI-96-01
Henning Fernau und Markus Holzer Wilhelm-Schickard-Institut für Informatik Universität Tübingen Sand 13 D-72076 Tübingen Germany E-Mail: ffernau,
[email protected] Telefon: (07071) 29-7569 (07071) 29-7568 Telefax: (07071) 68142
© Wilhelm-Schickard-Institut für Informatik, 1996 ISSN 0946-3852
Abstract
We consider bidirectional cooperating distributed (BCD) grammar systems in the sense of Asveld. Independently of their mode ($*$, $\le k$, $= k$, $\ge k$, or $t$), BCD grammar systems with two components characterize the recursively enumerable languages, even with right-linear rules and without $\lambda$-productions. This sharply contrasts with earlier results on solely generating or solely accepting grammars, where right-linear rules lead only to regular languages. Further restrictions lead to language families closely related to Lindenmayer language families. Moreover, we can solve some open problems in the theory of bidirectional grammars in this way.
1 Introduction

Formal language theory mainly investigates the descriptional power of various language defining devices. In this paper, bidirectional cooperating distributed grammar systems are considered. In this way, the present work continues and brings together at least five different research directions in formal language theory:

Cooperating distributed (CD) grammar systems are defined, e.g., in the textbook of Csuhaj-Varjú et al. [8]. CD grammar systems were introduced in order to grammatically model the multi-agent systems well-known in the theory of artificial intelligence. Especially, various "interfaces" between the interacting agents (which are grammar forms in the grammatical model)
Supported by Deutsche Forschungsgemeinschaft grant DFG La 618/3-1.
are considered. We will introduce these work modes in detail below, since the present paper is basically a contribution to the theory of CD grammar systems.

Accepting grammars were studied by Bordihn, Fernau and Holzer [6, 7, 14, 15]. Different grammars are introduced for various motivations, e.g., linguistic, biological, or stemming from other parts of computer science, as explicitly explained in the case of CD grammar systems above. It is quite clear that such grammatical models can be viewed both as language generators and as language acceptors. A grammar seen as a language acceptor contains accepting rules, e.g., accepting context-free rules are of the form $w \to A$, where $A$ is some nonterminal symbol, and $w$ is a string consisting of a possibly empty sequence of nonterminal and terminal symbols. Correspondingly, accepting right-linear rules are of the form $vB \to A$ or $v \to A$, where $A, B$ are nonterminals and $v$ is a possibly empty sequence of terminal symbols.

Bidirectional grammars were introduced by Appelt, Hogendorp and Asveld [1, 2, 20, 21]. They can be viewed as a mixture of generating and accepting grammars. Such grammars contain both generating and accepting rules. Moreover, we can solve some open problems in the theory of bidirectional grammars.

Generative systems were defined by Rovan [19, 27]; similar systems have also been previously studied by Wood [30]. In such systems, a derivation step is given by the application of a given nondeterministic general sequential machine with accepting state.

Characterizations of the class of recursively enumerable sets can be obtained using surprisingly weak means, a topic which recently gained new interest in the community as exemplified by the works of, e.g., Boasson [3], Book [4], Book and Brandenburg [5], Csuhaj-Varjú et al. [9], Culik [10, 11, 12], Geffert [16, 17, 18], Ilie and Salomaa [22], Latteux and Turakainen [23], Păun [25], and Turakainen [29]. The generative systems just described also fall into this category.
The paper is organized as follows: first, we introduce the necessary definitions regarding (bidirectional) CD grammar systems. Secondly, we state our main characterization result of recursively enumerable languages, to which end we formally introduce generative systems. In Section 4 we show that further restrictions lead to language families closely related to Lindenmayer language families. Then, we discuss the regularly controlled bidirectional grammars as introduced by Asveld and Hogendorp and their relations to our main results. Thereby we solve some open problems in the theory of regularly controlled bidirectional grammars. Finally, we compare the power of bidirectional CD grammar systems to that of "unidirectional" CD grammar systems, i.e., purely generating or accepting CD grammar systems, respectively.
2 Definitions

We assume the reader to be familiar with some basic notions of formal language theory, as contained in Dassow and Păun [13]. In addition, we use $\subseteq$ to denote inclusion, while $\subset$ denotes strict inclusion. The set of positive integers is denoted by $\mathbb{N}$, while $\mathbb{N}_0$ denotes the set of non-negative integers. The empty word is denoted by $\lambda$. We consider two languages $L_1, L_2$ to be equal iff $L_1 \setminus \{\lambda\} = L_2 \setminus \{\lambda\}$. The reversal (mirror image) of a language $L$ is denoted by $L^R$. The end of a proof or of a proved statement is marked by $\Box$.

The families of languages generated by regular, linear context-free, context-free, context-sensitive, type-0 Chomsky grammars, E0L, ET0L systems, context-free matrix grammars, and context-free matrix grammars with appearance checking are denoted by $\mathcal{L}^{gen}(\mathrm{REG})$, $\mathcal{L}^{gen}(\mathrm{LIN})$, $\mathcal{L}^{gen}(\mathrm{CF})$, $\mathcal{L}^{gen}(\mathrm{CS})$, $\mathcal{L}^{gen}(\mathrm{RE})$, $\mathcal{L}^{gen}(\mathrm{E0L})$, $\mathcal{L}^{gen}(\mathrm{ET0L})$, $\mathcal{L}^{gen}(\mathrm{M}, \mathrm{CF})$, and $\mathcal{L}^{gen}(\mathrm{M}, \mathrm{CF}, \mathrm{ac})$, respectively. A superscript acc instead of gen is used to denote the family of languages accepted by the appropriate device. If we want to exclude erasing productions, we add $-\lambda$ in our notations. We use bracket notations like $\mathcal{L}^{gen}(\mathrm{M}, \mathrm{CF}[-\lambda]) \subseteq \mathcal{L}^{gen}(\mathrm{M}, \mathrm{CF}[-\lambda], \mathrm{ac})$ in order to say that the statement holds both in the case of forbidding erasing productions and in the case of admitting erasing productions (neglecting the bracket contents).

For the convenience of the reader, we repeat the basic definitions of CD grammar systems adapted from Păun [26], in a way suitable for the interpretation both as generating and accepting bidirectional systems. Throughout this paper, we further distinguish between so-called generating rules, which have the form $A \to w$, with $A \in N$ and $w \in (N \cup T)^*$, and accepting rules, which are of the form $w \to A$, with $A \in N$ and $w \in (N \cup T)^*$.
A bidirectional CD (BCD) grammar system of degree $(m, n)$, with $m, n \ge 0$, is an $(m + n + 3)$-tuple

$$G = (N, T, S, P_1, \ldots, P_m, P_{m+1}, \ldots, P_{m+n}),$$

where $N$, $T$ are disjoint alphabets of nonterminal and terminal symbols, respectively, $S \in N$ is the axiom, $P_1, \ldots, P_m$ are finite sets of generating rewriting rules over $N \cup T$, and $P_{m+1}, \ldots, P_{m+n}$ are finite sets of accepting rewriting rules over $N \cup T$.

Let $G$ be a BCD grammar system of degree $(m, n)$. For the components with only generating rules we define the derivation relation as follows: for $x, y \in (N \cup T)^*$ and $1 \le i \le m$, we write $x \Rightarrow_i y$ iff $x = x_1 A x_2$, $y = x_1 z x_2$ for some $A \to z \in P_i$. Hence, subscript $i$ refers to the production set to be used. In addition, we denote by $\Rightarrow_i^{=k}$ ($\Rightarrow_i^{\le k}$, $\Rightarrow_i^{\ge k}$, $\Rightarrow_i^{*}$, respectively) a derivation consisting of exactly $k$ steps (at most $k$ steps, at least $k$ steps, an arbitrary number of steps, respectively) as above. Moreover, we write
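$x \Rightarrow_i^{t} y$ iff $x \Rightarrow_i^{*} y$ and there is no $z$ such that $y \Rightarrow_i z$. For the components of the BCD grammar system with only accepting rules, we define the above relations $x \Rightarrow_i y$, $\Rightarrow_i^{=k}$, $\Rightarrow_i^{\le k}$, $\Rightarrow_i^{\ge k}$, $\Rightarrow_i^{*}$, and $\Rightarrow_i^{t}$ for $m + 1 \le i \le m + n$ appropriately.

Let $D := \{ *, t \} \cup \{ \le k, = k, \ge k \mid k \in \mathbb{N} \}$. The language generated in the $f$-mode, $f \in D$, by a BCD grammar system $G$ of degree $(m, n)$ is defined as:

$$L_f^{gen}(G) := \{\, w \in T^* \mid S \Rightarrow_{i_1}^{f} \alpha_1 \Rightarrow_{i_2}^{f} \cdots \Rightarrow_{i_{\ell-1}}^{f} \alpha_{\ell-1} \Rightarrow_{i_\ell}^{f} \alpha_\ell = w \text{ with } \ell \ge 1,\ 1 \le i_j \le m + n, \text{ and } 1 \le j \le \ell \,\}.$$

Similarly, one can define the language accepted in $f$-mode, $f \in D$, by a BCD grammar system $G$:

$$L_f^{acc}(G) := \{\, w \in T^* \mid w \Rightarrow_{i_1}^{f} \alpha_1 \Rightarrow_{i_2}^{f} \cdots \Rightarrow_{i_{\ell-1}}^{f} \alpha_{\ell-1} \Rightarrow_{i_\ell}^{f} \alpha_\ell = S \text{ with } \ell \ge 1,\ 1 \le i_j \le m + n, \text{ and } 1 \le j \le \ell \,\}.$$

If $f \in D$ and $X \in \{\mathrm{REG}, \mathrm{LIN}, \mathrm{CF}\}$, then the families of languages generated (accepted, respectively) in $f$-mode by [$\lambda$-free] BCD grammar systems with at most $m \in \mathbb{N}_0$ generating and at most $n \in \mathbb{N}_0$ accepting components using rules of type $X$ are denoted by $\mathcal{L}^{gen}(\mathrm{BCD}_{(m,n)}, X[-\lambda], f)$ ($\mathcal{L}^{acc}(\mathrm{BCD}_{(m,n)}, X[-\lambda], f)$, respectively). If the number of generating (accepting, respectively) components is not restricted, we write $\mathcal{L}^{gen}(\mathrm{BCD}_{(\infty,n)}, X[-\lambda], f)$ ($\mathcal{L}^{acc}(\mathrm{BCD}_{(m,\infty)}, X[-\lambda], f)$, respectively).

The reader familiar with the theory of CD grammar systems might notice that (using conventional notation) trivially for each $N \in \mathbb{N}_0 \cup \{\infty\}$,

$$\mathcal{L}^{gen}(\mathrm{CD}_N, X[-\lambda], f) := \mathcal{L}^{gen}(\mathrm{BCD}_{(N,0)}, X[-\lambda], f) \quad\text{and}\quad \mathcal{L}^{acc}(\mathrm{CD}_N, X[-\lambda], f) := \mathcal{L}^{acc}(\mathrm{BCD}_{(0,N)}, X[-\lambda], f).$$

Obviously, we have the following relations.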
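Lemma 2.1 If $f \in D$ and $X \in \{\mathrm{REG}, \mathrm{LIN}, \mathrm{CF}\}$, then we have:

$\mathcal{L}^{gen}(\mathrm{BCD}_{(0,0)}, X[-\lambda], f) = \{\emptyset\}$;

for $N \in \mathbb{N}_0 \cup \{\infty\}$, $\mathcal{L}^{gen}(\mathrm{BCD}_{(0,N)}, X[-\lambda], f) = \{\emptyset\}$;

for $N \in \mathbb{N}_0 \cup \{\infty\}$, $\mathcal{L}^{acc}(\mathrm{BCD}_{(N,0)}, X[-\lambda], f) = \{\emptyset\}$;

$\mathcal{L}^{gen}(\mathrm{BCD}_{(1,0)}, X[-\lambda], f) = \mathcal{L}^{gen}(X)$;

$\mathcal{L}^{acc}(\mathrm{BCD}_{(0,1)}, X[-\lambda], f) = \mathcal{L}^{acc}(X) = \mathcal{L}^{gen}(X)$;

for $N, M, N', M' \in \mathbb{N}_0 \cup \{\infty\}$ with $N \le N'$, $M \le M'$, $\mathcal{L}^{gen}(\mathrm{BCD}_{(M,N)}, X[-\lambda], f) \subseteq \mathcal{L}^{gen}(\mathrm{BCD}_{(M',N')}, X[-\lambda], f)$. $\Box$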
Observe that, given an accepting component with erasing rules, i.e., rules of the form $\lambda \to A$, working in $t$-mode, we can delete the whole component, because a derivation using such a component does not terminate. Thus, e.g., $\mathcal{L}^{acc}(\mathrm{BCD}_{(0,n)}, \mathrm{CF}, t)$ and $\mathcal{L}^{acc}(\mathrm{BCD}_{(0,n)}, \mathrm{CF}-\lambda, t)$ denote the same family of languages. Since the situation within generating and accepting CD grammar systems has been thoroughly investigated before, see Fernau, Holzer and Bordihn [15], we restrict our attention to BCD grammar systems of degree $(m, n)$ with $m, n \ge 1$.
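The derivation relations of this section can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: every symbol is a single character, a rule is a pair `(lhs, rhs)` of non-empty strings (so generating and accepting rules are handled uniformly as string rewriting), and the names `one_step`, `t_mode` and the bound `max_len` are hypothetical helpers that do not occur in the paper. The bound merely keeps the search finite.

```python
from collections import deque

def one_step(form, rules):
    """All y with form => y: rewrite one occurrence of some left-hand side."""
    out = set()
    for lhs, rhs in rules:
        start = form.find(lhs)
        while start != -1:
            out.add(form[:start] + rhs + form[start + len(lhs):])
            start = form.find(lhs, start + 1)
    return out

def t_mode(form, rules, max_len):
    """All y with form =>* y such that no rule of the component applies to y.

    Sentential forms longer than max_len are discarded; this bound is an
    assumption of the sketch, not part of the definition."""
    seen, queue, results = {form}, deque([form]), set()
    while queue:
        cur = queue.popleft()
        successors = one_step(cur, rules)
        if not successors:          # t-mode stop condition: nothing applies
            results.add(cur)
        for y in successors:
            if len(y) <= max_len and y not in seen:
                seen.add(y)
                queue.append(y)
    return results

# A toy generating component (hypothetical): S -> aS | b
toy = [("S", "aS"), ("S", "b")]
print(sorted(t_mode("S", toy, max_len=4)))
```

The other modes differ only in the stop condition: the $= k$-mode iterates `one_step` exactly $k$ times, while the $*$-mode accepts every form reached by the search.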
3 Characterizing the Recursively Enumerable Languages

In the following, we are going to prove the main theorem of this paper, showing that any recursively enumerable language can be generated by some two-component bidirectional CD grammar system in arbitrary mode, working only with right-linear rules. Of course, a similar statement is true for BCD grammar systems working with left-linear rules. Moreover, a corresponding statement is also true for accepting BCD grammar systems.
Theorem 3.1 For $f \in D$, $\mathcal{L}^{gen}(\mathrm{BCD}_{(1,1)}, \mathrm{REG}-\lambda, f) = \mathcal{L}^{gen}(\mathrm{RE})$.

To prove our main theorem, we need some notions and results on generative systems. As already observed by Geffert [16], generative systems are very appealing when dealing with characterizations of recursively enumerable sets. As regards our definition of generative systems, we follow Rovan [27] with his slightly modified version contained in [19]. A nondeterministic generalized sequential machine with accepting state, or a-NGSM (see footnote 1) for short, is a 6-tuple $M = (Q, X, Y, H, q_0, q_f)$, where $Q$ is a finite set of states, $X$ and $Y$ are finite (input and output) alphabets, $q_0 \in Q$ is the initial state, $q_f \in Q$ is an accepting or final state, and $H$ is a finite subset of $Q \times X \times Y^* \times Q$. By a computation of such an a-NGSM a word $h = h_1 \cdots h_n \in H^+$ is understood such that
1 Rovan used this definition in order to define what he called a one-input finite state transducer with accepting state; unfortunately, in general a transducer (in contrast to a sequential machine) is allowed to make $\lambda$-moves. Therefore, we took the notions as introduced, e.g., by Wood [30]. It is easy to see that our restriction to just one final state does not decrease the set of representable functions.
1. $\pi_1(h_1) = q_0$, $\pi_4(h_n) = q_f$;
2. $\forall\, 1 \le i \le n - 1\ (\pi_1(h_{i+1}) = \pi_4(h_i))$;

where the $\pi_i$ are homomorphisms on $H^*$ defined as projections by $\pi_i((x_1, x_2, x_3, x_4)) = x_i$ for $i = 1, 2, 3, 4$. The set of all computations of $M$ is denoted by $C(M)$. An a-NGSM mapping is defined for each language $L \subseteq X^*$ by

$$M(L) = \pi_3(\pi_2^{-1}(L) \cap C(M)).$$

A generative system is a 4-tuple $G = (N, T, S, M)$, where $N$, $T$ are finite disjoint alphabets of nonterminal and terminal symbols, respectively, $S \in N$ is the initial nonterminal symbol, and $M$ is an a-NGSM with input and output alphabet equal to $N \cup T$. We can define the rewrite relation $\Rightarrow$ on $(N \cup T)^*$ by $u \Rightarrow v$ iff $v \in M(\{u\})$. As usual, $\Rightarrow^*$ denotes the reflexive transitive closure of $\Rightarrow$. The language generated by $G$ is $L^{gen}(G) = \{\, w \in T^* \mid S \Rightarrow^* w \,\}$. We need the following theorem of Rovan and Wood [27, 30].
Theorem 3.2 Every recursively enumerable language can be generated by a generative system.

It is easy to see that an equivalent statement is true for iterated a-NGSMs working from right to left on the strings instead of working from left to right. Now we are ready to prove the main theorem:

Proof of Theorem 3.1. Below, we restrict ourselves to the $*$-mode of derivation. By simple modifications of our argument, we can prove the claim for the other modes, too. By a standard argument, it can be shown that

(*) $\mathcal{L}^{gen}(\mathrm{BCD}_{(1,1)}, \mathrm{REG}-\lambda, *)$ is closed under union and contains the regular languages.

Let $L \in \mathcal{L}^{gen}(\mathrm{RE})$, $L \subseteq T^*$. Then,

$$L = \bigcup_{a,b \in T} (\{a\}T^+\{b\} \cap L) \cup (L \cap T^2) \cup (L \cap T) \cup (L \cap \{\lambda\}).$$
Since $L \in \mathcal{L}^{gen}(\mathrm{RE})$, $L_{ab} = \{\, w \in T^+ \mid awb \in L \,\}$ is recursively enumerable due to the closure of $\mathcal{L}^{gen}(\mathrm{RE})$ under derivatives and intersection with regular languages. By (*), it is sufficient for the proof of the present assertion to show that $L' = \{a\}L_{ab}\{b\} \in \mathcal{L}^{gen}(\mathrm{BCD}_{(1,1)}, \mathrm{REG}-\lambda, *)$ provided that $L_{ab} \subseteq T^+$ is recursively enumerable. Let $G = (N, T, S, M)$ be a generative system with $L^{gen}(G) = L_{ab}^R$, where $M = (Q, N \cup T, N \cup T, H, q_0, q_f)$ is an a-NGSM. We define a BCD grammar system $G' = (N', T', S', P_{gen}, P_{acc})$ of degree $(1, 1)$ with right-linear rules generating $L'$.
Let $N' = (Q \times (N \cup T)) \cup \{S', R, R', L, L'\} \cup T$ and $T' = N \cup T \cup Q \cup \{\tilde{a}, q'\}$. The production sets are given as follows:

$$P_{gen} = \{ S' \to \tilde{a}SR,\ R \to q_0R,\ L \to \tilde{a},\ L' \to aq',\ R' \to b \} \cup \{\, (q, b) \to pw^R \mid (q, b, w, p) \in H \,\} \cup \{\, a \to aq' \mid a \in T \,\}$$

$$P_{acc} = \{\, bq \to (q, b) \mid q \in Q,\ b \in T \cup N \,\} \cup \{ \tilde{a}q_f \to L,\ \tilde{a}q_f \to L' \} \cup \{\, q'a \to a \mid a \in T \,\} \cup \{ q'R \to R' \}.$$

Starting with $S' \Rightarrow \tilde{a}SR$, grammar $G$ is simulated as follows: each application of the a-NGSM $M$ onto the sentential form $wR$ is simulated by a right-to-left sweep of an additional state marker $q$ injected into $w$ by $R \to q_0R$. The work of the transducer is simulated by an obvious interplay between the accepting and generating component. When successfully reaching the left marker $\tilde{a}$, the accepting component introduces the nonterminal symbol $L$, which is turned again into the terminal marking symbol $\tilde{a}$ by the generating component. At one point of the simulation, the accepting component guesses that the generative system $G$ would finish its work now. This guess is done at the left marking symbol $\tilde{a}$ by applying the production $\tilde{a}q_f \to L'$, followed by $L' \to aq'$. In a final sweep the symbol $q'$ moves to the right, subsequently using one of the productions of the form $q'a \to a$ and $a \to aq'$. Finally, the production sequence $q'R \to R'$ and $R' \to b$ produces the required terminal string. $\Box$

Remark. Observe that most of the terminal symbols are pseudo-terminals in the sense that only the original terminals from $T$ may occur in a sentential form consisting only of terminal symbols (see also the comment of Hogendorp [20, page 177]). The usage of pseudo-terminals is essential for the construction of a grammar system with right-linear rules.
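The a-NGSM mapping and the rewrite relation of a generative system, as defined earlier in this section, can be sketched as follows. This is a minimal illustration, not the construction used in the proof: transitions are tuples `(q, x, w, p)` from $H$, symbols are single characters, and the function names `angsm_image` and `derive` as well as the sample machine are assumptions introduced only for this example.

```python
def angsm_image(u, H, q0, qf):
    """M({u}): outputs of all computations reading u from q0 to qf.

    A computation is a word in H^+, so the empty input has no computation."""
    if not u:
        return set()
    configs = {(q0, "")}                      # (current state, output so far)
    for x in u:
        configs = {(p, out + w)
                   for (state, out) in configs
                   for (q, sym, w, p) in H
                   if q == state and sym == x}
    return {out for (state, out) in configs if state == qf}

def derive(start, H, q0, qf, steps):
    """All forms reachable from `start` in at most `steps` rewrite steps,
    where u => v iff v is in M({u})."""
    forms, frontier = {start}, {start}
    for _ in range(steps):
        frontier = {v for u in frontier for v in angsm_image(u, H, q0, qf)}
        forms |= frontier
    return forms

# Hypothetical one-state machine: S is rewritten to a, every a is doubled.
H = {("q", "S", "a", "q"), ("q", "a", "aa", "q")}
print(sorted(derive("S", H, "q", "q", steps=3), key=len))
```

With $N = \{S\}$ and $T = \{a\}$, the terminal forms reached by this sample machine are the words $a^{2^n}$, which illustrates how a single sequential sweep per step already yields a non-context-free language.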
4 Restricting the Power of BCD Grammar Systems

Since even regular BCD grammar systems have undesirably strong generative power, the question arises whether it is possible to restrict their generative power in some way. One possible restriction of right-linear rules is to allow only accepting rules of the form $B \to A$ with $B \in N \cup T$, $A \in N$. Counting components in the usual way, we denote the corresponding language classes by $\mathcal{L}^{gen}(\mathrm{BCD}_{(m,n)}, \mathrm{rREG}[-\lambda], f)$. Observe that our proof showing the equality of $\mathcal{L}^{gen}(\mathrm{BCD}_{(m,n)}, \mathrm{REG}-\lambda, f)$ and $\mathcal{L}^{gen}(\mathrm{RE})$ fails in this case. In the following, we mainly restrict ourselves to $f = t$, since it is again the most interesting case.
In order to state our results, we need the notion of a Lindenmayer system. An extended context-free Lindenmayer system with tables, or ET0L system for short, is given by a quadruple $G = (\Sigma, \Delta, \omega, H)$, where $\Sigma$ is the total alphabet, $\Delta \subseteq \Sigma$ is the terminal alphabet, $\omega \in \Sigma^*$ is the axiom, and $H$ is a finite set of finite substitutions $H = \{ h_1, \ldots, h_m \}$. Each $h_i : \Sigma \to 2^{\Sigma^*}$ is usually given by a set of context-free productions. The language generated by $G$ is defined as

$$L^{gen}(G) = \{\, w \in \Delta^* \mid w \in h_{i_1} h_{i_2} \cdots h_{i_n}(\omega) \,\}.$$
An E0L system contains just one table. A propagating ET0L system (denoted by P in the abbreviations) does not contain erasing productions. In this way, we get the language families Lgen (ET0L), Lgen (EPT0L), Lgen (E0L), and Lgen (EP0L) (see Rozenberg and Salomaa [28]).
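A small sketch may help to recall how tables act in parallel on every symbol of a word. It assumes single-character symbols and represents a table as a dictionary mapping each symbol to a set of replacement strings; the names `apply_table` and `et0l_language`, and the bound `steps`, are hypothetical and only serve the illustration.

```python
from itertools import product

def apply_table(word, table):
    """Parallel rewriting: every symbol of `word` is replaced by one of its
    images under `table` (a dict: symbol -> set of strings)."""
    choices = [sorted(table[s]) for s in word]   # KeyError if table is incomplete
    return {"".join(c) for c in product(*choices)}

def et0l_language(axiom, tables, terminals, steps):
    """Terminal words derivable from `axiom` in at most `steps` table
    applications (the bound is an assumption to keep the set finite)."""
    seen, forms = {axiom}, {axiom}
    for _ in range(steps):
        forms = {v for w in forms for t in tables for v in apply_table(w, t)}
        seen |= forms
    return {w for w in seen if set(w) <= set(terminals)}

# One-table (E0L) sample with the single production a -> aa
print(sorted(et0l_language("a", [{"a": {"aa"}}], "a", steps=3), key=len))
```

The contrast with the sequential step relation of CD components is that `apply_table` rewrites all symbols at once; the constructions of this section bridge that gap by colouring symbols so that a $t$-mode component is forced to touch every position.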
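Lemma 4.1 Let $m, n \in \mathbb{N} \cup \{\infty\}$. Then, $\mathcal{L}^{gen}(\mathrm{E0L}) \subseteq \mathcal{L}^{gen}(\mathrm{BCD}_{(m,n)}, \mathrm{rREG}-\lambda, t)$.

Proof. By the trivial inclusion $\mathcal{L}^{gen}(\mathrm{BCD}_{(1,1)}, \mathrm{rREG}-\lambda, t) \subseteq \mathcal{L}^{gen}(\mathrm{BCD}_{(m,n)}, \mathrm{rREG}-\lambda, t)$ and by the well-known relation $\mathcal{L}^{gen}(\mathrm{EP0L}) = \mathcal{L}^{gen}(\mathrm{E0L})$ [28, Theorem II.2.1] it is sufficient to prove $\mathcal{L}^{gen}(\mathrm{EP0L}) \subseteq \mathcal{L}^{gen}(\mathrm{BCD}_{(1,1)}, \mathrm{rREG}-\lambda, t)$.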
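Let $G = (\Sigma, \Delta, \omega, h)$ be a propagating E0L system. Let $\Sigma', \Sigma'', \Sigma'''$ be alphabets of "coloured" symbols of $\Sigma$, let $F, S$ be two further additional symbols, and let $h, g', g'', g'''$ be morphisms defined by $h : (\Sigma')^* \to (\Delta \cup \{F\})^*$, $a' \mapsto a$ if $a \in \Delta$, $A' \mapsto F$ otherwise; $g' : \Sigma^* \to (\Sigma')^*$, $A \mapsto A'$ for $A \in \Sigma$; $g'' : \Sigma^* \to (\Sigma'')^*$, $A \mapsto A''$ for $A \in \Sigma$; $g''' : \Sigma^* \to (\Sigma''')^*$, $A \mapsto A'''$ for $A \in \Sigma$. We construct a BCD grammar system $G = (N, T, S, P_{gen}, P_{acc})$ of degree $(1, 1)$ generating $L^{gen}(G)$, where $N = \Sigma' \cup \Sigma'' \cup \{S, F\}$, $T = \Delta \cup \Sigma'''$. The generating component equals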
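$$P_{gen} = \{\, A' \to g'''(\alpha)g''(B) \mid A \to \alpha B \in h,\ \alpha \in \Sigma^*,\ B \in \Sigma \,\} \cup \{\, A' \to g''(B) \mid A \to B \in h,\ B \in \Sigma \,\} \cup \{\, A' \to h(A') \mid A \in \Sigma \,\} \cup \{ S \to g'(\omega) \},$$

and the accepting component equals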
$$P_{acc} = \{\, A'' \to A',\ A''' \to A' \mid A \in \Sigma \,\} \cup \{\, a \to F \mid a \in \Delta \,\}.$$

The correctness of our construction is seen easily. $\Box$

We illustrate the above construction by an example:
Example 4.2 The EP0L system $(\{a\}, \{a\}, a, \{a \to aa\})$ generates the language $L = \{\, a^{2^n} \mid n \in \mathbb{N}_0 \,\}$.
Consider now the degree $(1, 1)$ bidirectional CD grammar system (working in $t$-mode) $G = (\{S, F, a', a''\}, \{a, a'''\}, S, P_1, P_2)$ with $P_1 = \{ S \to a',\ a' \to a,\ a' \to a'''a'' \}$ and $P_2 = \{ a \to F,\ a'' \to a',\ a''' \to a' \}$. By two applications of the generating component, we obtain either the terminal word $a$ or the string $a'''a''$. By the accepting component, this latter string is converted into $a'a'$. Applying now the generating component in $t$-mode, we can obtain the sentential forms (1) $aa$ or (2) $aa'''a''$ or (3) $a'''a''a$ or (4) $a'''a''a'''a''$. (1) is a correct terminal string. (2) and (3) contain both non-primed and double-primed nonterminal symbols, a mixture which cannot be successfully handled by our production sets, and finally (4) can be converted by the accepting component into $a'a'a'a'$. By a simple induction argument following the above reasoning, one sees that $L = L^{gen}(G)$.

In the following, we prove a sort of converse to Lemma 4.1.

Lemma 4.3 For all $m, n \in \mathbb{N} \cup \{\infty\}$, $\mathcal{L}^{gen}(\mathrm{BCD}_{(m,n)}, \mathrm{rREG}, t) \subseteq \mathcal{L}^{gen}(\mathrm{ET0L})$.

Proof. Let $G = (N, T, S, P_1, \ldots, P_{m+n})$ be a BCD grammar system of degree $(m, n)$. Let $\#_1, \ldots, \#_{m+n}$ be special symbols referring to the production sets of $G$. We construct an ET0L system $G' = (\Sigma, \Delta, S', H)$ simulating $G$. Set $\Delta = T$. Let $\Sigma = T \cup N \cup \{\#_1, \ldots, \#_{m+n}\} \cup \{S', F\}$. Let $L_i$ denote the set of symbols occurring as left-hand sides of productions in $P_i$, $1 \le i \le m + n$. $H$ contains the following tables:

$$h_{init} = \{\, S' \to S\#_i \mid 1 \le i \le m + n \,\} \cup \{\, X \to F \mid X \in \Sigma \setminus \{S'\} \,\}$$

$$h_{fin} = \{\, \#_i \to \lambda \mid 1 \le i \le m + n \,\} \cup \{\, a \to a \mid a \in T \,\} \cup \{\, X \to F \mid X \in \{S', F\} \cup N \,\}$$

$$h_{j,1} = \{ \#_j \to \#_j \} \cup \{\, \#_i \to F \mid i \ne j \,\} \cup \{\, u \to v \mid u \to v \in P_j \,\} \cup \{\, X \to X \mid X \in N \setminus L_j \,\} \cup \{ S' \to F,\ F \to F \}$$

$$h_{j,2} = \{\, \#_j \to \#_i \mid i \ne j \,\} \cup \{\, \#_i \to F \mid i \ne j \,\} \cup \{\, X \to X \mid X \in N \setminus L_j \,\} \cup \{\, X \to F \mid X \in L_j \cup \{S', F\} \,\}$$

for every $1 \le j \le m + n$. It is obvious how the derivation via a $t$-mode application of $P_j$ is simulated in a breadth-first manner by a number of applications of $h_{j,1}$. Application of $h_{j,2}$ tests whether the $t$-mode stop condition is satisfied. $\Box$

The same idea carries over when simulating ET0L systems.
Lemma 4.4 $\mathcal{L}^{gen}(\mathrm{ET0L}) \subseteq \mathcal{L}^{gen}(\mathrm{BCD}_{(1,2)}, \mathrm{rREG}-\lambda, t)$.

Proof. Without loss of generality, we can assume that the ET0L language $L$ is given via a propagating ET0L system $G = (\Sigma, \Delta, \omega, H)$, where $H$ contains only two tables $h_1$ and $h_2$ (see Rozenberg and Salomaa [28]), and $\omega \in \Sigma \setminus \Delta$. We construct an equivalent BCD system $G' = (N, T, S, P_{gen}, P_{acc,1}, P_{acc,2})$ of degree $(1, 2)$ with $N = (\Sigma \times \{1, 2, 3, 4\}) \cup \{S, F\}$, $T = (\Sigma \times \{5, 6\}) \cup \Delta$. The production sets are given as follows:

$$P_{gen} = \{\, (A, i) \to (w_1, j + 4) \cdots (w_{n-1}, j + 4)(w_n, j + 2) \mid i, j \in \{1, 2\} \wedge A \to w_1 \cdots w_n \in h_i \wedge w_1, \ldots, w_n \in \Sigma \wedge n > 1 \,\} \cup \{\, (A, i) \to (w, j + 2) \mid i, j \in \{1, 2\} \wedge A \to w \in h_i \wedge w \in \Sigma \,\} \cup \{\, (a, i) \to a \mid i \in \{1, 2\} \wedge a \in \Delta \,\} \cup \{ S \to (\omega, 3),\ S \to (\omega, 4) \}$$

$$P_{acc,i} = \{\, (A, i + 2) \to (A, i),\ (A, i + 4) \to (A, i) \mid A \in \Sigma \,\} \cup \{\, (A, 5 - i) \to F,\ (A, 7 - i) \to F \mid A \in \Sigma \,\} \cup \{\, a \to F \mid a \in \Delta \,\}$$

for $i \in \{1, 2\}$. Since in every table the alphabets of symbols of left- and right-hand sides of productions are disjoint, a $t$-mode application of one of these tables changes every symbol into some word (if possible); furthermore, sentential forms $w$ with $S \Rightarrow^+ w$ obtained after an application of $P_{gen}$ are elements of $((\Sigma \times \{3, 4, 5, 6\}) \cup \Delta)^*$. To such a sentential form, $P_{acc,i}$ is only successfully applicable, i.e., without introducing a failure symbol $F$, if $w \in (\Sigma \times \{i + 2, i + 4\})^*$. In applying $P_{gen}$, it has to be guessed whether to simulate table $h_j$ via the following application of $P_{gen}$ (after recolouring the symbols by $P_{acc,j}$) or whether to terminate. $\Box$

A similar construction is valid for $\mathcal{L}^{gen}(\mathrm{BCD}_{(2,1)}, \mathrm{rREG}-\lambda, t)$.
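Lemma 4.5 $\mathcal{L}^{gen}(\mathrm{ET0L}) \subseteq \mathcal{L}^{gen}(\mathrm{BCD}_{(2,1)}, \mathrm{rREG}-\lambda, t)$.

Proof. Again, let the ET0L language $L$ be given via a propagating ET0L system $G = (\Sigma, \Delta, \omega, H)$, where $H$ contains only two tables $h_1$ and $h_2$ and $\omega \in \Sigma \setminus \Delta$. We construct an equivalent BCD system $G' = (N, T, S, P_{gen,1}, P_{gen,2}, P_{acc})$ of degree $(2, 1)$ with $N = (\Sigma \times \{1, 2, 3\}) \cup \{S, F\}$, $T = (\Sigma \times \{4\}) \cup \Delta$. For $i \in \{1, 2\}$ the production sets are given as follows:

$$P_{gen,i} = \{\, (A, i) \to (w_1, 4) \cdots (w_{n-1}, 4)(w_n, 3) \mid A \to w_1 \cdots w_n \in h_i \wedge w_1, \ldots, w_n \in \Sigma \wedge n > 1 \,\} \cup \{\, (A, i) \to (w, 3) \mid A \to w \in h_i \wedge w \in \Sigma \,\} \cup \{\, (a, i) \to a \mid a \in \Delta \,\} \cup \{ S \to (\omega, 3) \} \cup \{\, (A, 3 - i) \to F \mid A \in \Sigma \,\}$$

$$P_{acc} = \{\, (A, i + 2) \to (A, j) \mid i, j \in \{1, 2\} \wedge A \in \Sigma \,\} \cup \{\, a \to F \mid a \in \Delta \,\}.$$

$\Box$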
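The interplay of a generating and a restricted accepting component in $t$-mode can be checked mechanically on the system of Example 4.2. The sketch below is only an illustration: sentential forms are tuples of symbol names, every rule has a single symbol on its left-hand side (which holds for generating rules and for the restricted accepting rules $B \to A$), and the helper names and the length bound `max_len` are assumptions of this example.

```python
from collections import deque

def one_step(form, rules):
    """Rewrite one occurrence of a rule's left-hand symbol in the tuple `form`."""
    return {form[:i] + rhs + form[i + 1:]
            for lhs, rhs in rules
            for i, sym in enumerate(form) if sym == lhs}

def t_mode(form, rules, max_len):
    """Forms reachable with `rules` to which no rule applies any more."""
    seen, queue, results = {form}, deque([form]), set()
    while queue:
        cur = queue.popleft()
        successors = one_step(cur, rules)
        if not successors:
            results.add(cur)
        for y in successors:
            if len(y) <= max_len and y not in seen:
                seen.add(y)
                queue.append(y)
    return results

def generated_language(axiom, components, terminals, max_len):
    """Terminal words obtained by alternating t-mode applications of components."""
    seen, queue, words = {axiom}, deque([axiom]), set()
    while queue:
        cur = queue.popleft()
        for rules in components:
            for y in t_mode(cur, rules, max_len):
                if set(y) <= terminals:
                    words.add("".join(y))
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
    return words

# Components of Example 4.2
P1 = [("S", ("a'",)), ("a'", ("a",)), ("a'", ("a'''", "a''"))]
P2 = [("a", ("F",)), ("a''", ("a'",)), ("a'''", ("a'",))]
print(sorted(generated_language(("S",), [P1, P2], {"a", "a'''"}, max_len=4), key=len))
```

Because no rule of the example shortens a sentential form, discarding forms longer than `max_len` loses no word of length at most `max_len`.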
To complete our picture, we show how to generate a well-known non-E0L language [28, Corollary II.4.7] via a BCD grammar system of degree (1; 1).
Example 4.6 The BCD system $G = (\{ A, A', A'', B, B', B'', S, F \}, \{ a', a, b, b' \}, S, P_1, P_2)$ of degree $(1, 1)$ with $P_1 = \{ A \to a'A',\ B \to b'B',\ A \to a,\ B \to bB,\ B \to b,\ A'' \to a,\ B'' \to b,\ A'' \to a',\ B'' \to b',\ S \to ABA \}$ and $P_2 = \{ a \to F,\ b \to F,\ a' \to A'',\ b' \to B'',\ A' \to A,\ B' \to B \}$ generates $\{\, a^k b^l a^k \mid l \ge k \ge 1 \,\}$ in $t$-mode.

Theorem 4.7 Let $m, n \in \mathbb{N} \cup \{\infty\}$ with $m > 1$ or $n > 1$. Then,

$$\mathcal{L}^{gen}(\mathrm{REG}) = \mathcal{L}^{gen}(\mathrm{BCD}_{(1,0)}, \mathrm{rREG}[-\lambda], t) \subset \mathcal{L}^{gen}(\mathrm{E0L}) \subset \mathcal{L}^{gen}(\mathrm{BCD}_{(1,1)}, \mathrm{rREG}[-\lambda], t) \subseteq \mathcal{L}^{gen}(\mathrm{ET0L}) = \mathcal{L}^{gen}(\mathrm{BCD}_{(m,n)}, \mathrm{rREG}[-\lambda], t).$$

$\Box$

Unfortunately, the question whether the inclusion $\mathcal{L}^{gen}(\mathrm{BCD}_{(1,1)}, \mathrm{rREG}[-\lambda], t) \subseteq \mathcal{L}^{gen}(\mathrm{ET0L})$ is strict or not remains open.

Concluding this section, we turn our attention to the $*$-mode of derivation. Similar results hold for the other modes in $D \setminus \{t\}$ as well, but are (besides one example) not stated explicitly below.
Example 4.8 The BCD system $G = (\{S\}, \{a, b, c\}, S, \{S \to acb\}, \{c \to S\})$ of degree $(1, 1)$ generates the non-regular language $\{\, a^n c b^n \mid n \in \mathbb{N} \,\}$ in $*$-mode.
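Example 4.8 can be replayed with a few lines of code. Since a component working in $*$-mode may perform an arbitrary number of steps, alternating the components amounts to free rewriting with the union of all rules; the sketch relies on this observation. Symbols are single characters, and the function names and the bound `max_len` are assumptions of the illustration.

```python
from collections import deque

def one_step(form, rules):
    """All strings obtained by rewriting one occurrence of a left-hand side."""
    out = set()
    for lhs, rhs in rules:
        start = form.find(lhs)
        while start != -1:
            out.add(form[:start] + rhs + form[start + len(lhs):])
            start = form.find(lhs, start + 1)
    return out

def star_mode_language(axiom, components, terminals, max_len):
    """Terminal words of length <= max_len generated in *-mode."""
    rules = [rule for component in components for rule in component]
    seen, queue = {axiom}, deque([axiom])
    while queue:
        cur = queue.popleft()
        for y in one_step(cur, rules):
            if len(y) <= max_len and y not in seen:
                seen.add(y)
                queue.append(y)
    return {w for w in seen if set(w) <= set(terminals)}

P_gen = [("S", "acb")]   # generating component of Example 4.8
P_acc = [("c", "S")]     # accepting component of Example 4.8
print(sorted(star_mode_language("S", [P_gen, P_acc], "abc", max_len=7), key=len))
```

Neither rule shortens a sentential form, so the length bound does not cut off any derivation of a word within the bound.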
Theorem 4.9 Let $m, n \in \mathbb{N} \cup \{\infty\}$. Then,

$$\mathcal{L}^{gen}(\mathrm{REG}) \subset \mathcal{L}^{gen}(\mathrm{BCD}_{(1,1)}, \mathrm{rREG}[-\lambda], *) = \mathcal{L}^{gen}(\mathrm{BCD}_{(m,n)}, \mathrm{rREG}[-\lambda], *) \subseteq \mathcal{L}^{gen}(\mathrm{CF}).$$

Proof. The first inclusion is trivial; its strictness is seen by the example above. The last inclusion can be seen as follows. If $G = (N, T, S, P_1, \ldots, P_m, P_{m+1}, \ldots, P_{m+n})$ is a BCD grammar system of degree $(m, n)$, we define an equivalent context-free grammar $G' = (N', T, S', P)$, where $N'$ contains primed versions of both nonterminal and terminal symbols from $N$ and $T$, respectively. Let $g : (N \cup T) \to N'$ be defined by $A \mapsto A'$. Then,

$$P = \{\, g(A) \to g(w) \mid A \to w \in P_j \text{ for } j \in \{1, \ldots, m + n\} \,\} \cup \{\, a' \to a \mid a \in T \,\}.$$

The equality $\mathcal{L}^{gen}(\mathrm{BCD}_{(1,1)}, \mathrm{rREG}[-\lambda], *) = \mathcal{L}^{gen}(\mathrm{BCD}_{(m,n)}, \mathrm{rREG}[-\lambda], *)$ can be seen by similar arguments: it is possible to take all generating productions into one big generating component and all accepting productions into one big accepting component. $\Box$

Again, the question whether the inclusion $\mathcal{L}^{gen}(\mathrm{BCD}_{(1,1)}, \mathrm{rREG}[-\lambda], *) \subseteq \mathcal{L}^{gen}(\mathrm{CF})$ is strict or not remains open. Finally, we give an example showing that restricted regular BCD grammars working in the $= 2$-mode or $\ge 2$-mode can even generate non-context-free languages.

Example 4.10 The BCD system $G = (\{ S, A, C \}, \{ a, b, c, \$, \# \}, S, P_1, P_2)$ of degree $(1, 1)$ with $P_1 = \{ S \to a\#bC,\ C \to \$c,\ A \to a\#b \}$ and $P_2 = \{ \# \to A,\ \$ \to C \}$ working in the $= 2$-mode or $\ge 2$-mode or $t$-mode of derivation generates the language $\{\, a^n \# b^n \$ c^n \mid n \in \mathbb{N} \,\}$.

Up to now, we only wrote about generating restricted BCD grammar systems. Similarly, it would be possible to define accepting restricted BCD grammar systems: there are only generating rules of the form $A \to B$ with $B \in N \cup T$, $A \in N$. Counting components in the usual way, we denote the corresponding language classes by $\mathcal{L}^{acc}(\mathrm{BCD}_{(m,n)}, \mathrm{rREG}[-\lambda], f)$. Without detailed proof, we mention the following result.
Theorem 4.11 $\mathcal{L}^{acc}(\mathrm{BCD}_{(1,1)}, \mathrm{rREG}[-\lambda], t) = \mathcal{L}^{gen}(\mathrm{CS})$.

Sketch of proof. "$\subseteq$": Observing that $\lambda$-productions in accepting components working in the $t$-mode do not make sense, a simple simulation by a linear bounded automaton can be given. "$\supseteq$": The basic idea is the same as in [15, Theorem 4.2]. The generating components mainly serve for colouring nonterminals into pseudo-terminals, hence preparing the actual simulation of a derivation step of a context-sensitive grammar in Kuroda normal form via a sequence of accepting right-linear components. Especially, it can be tested whether two special symbols are adjacent or not, by simply sending wrong pairs of neighbours to some failure symbol. $\Box$
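Returning to Example 4.10 of this section, the effect of the $= 2$-mode can be observed with a short sketch in which every component application performs exactly two rewriting steps. Symbols are single characters; the helper names, the parameter `k`, and the bound `max_len` are assumptions introduced for the illustration.

```python
from collections import deque

def one_step(form, rules):
    """All strings obtained by rewriting one occurrence of a left-hand side."""
    out = set()
    for lhs, rhs in rules:
        start = form.find(lhs)
        while start != -1:
            out.add(form[:start] + rhs + form[start + len(lhs):])
            start = form.find(lhs, start + 1)
    return out

def exactly_k(form, rules, k):
    """All strings derivable from `form` in exactly k steps of one component."""
    forms = {form}
    for _ in range(k):
        forms = {y for x in forms for y in one_step(x, rules)}
    return forms

def language_eq_k(axiom, components, terminals, k, max_len):
    """Terminal words of length <= max_len generated in the (= k)-mode."""
    seen, queue, words = {axiom}, deque([axiom]), set()
    while queue:
        cur = queue.popleft()
        for rules in components:
            for y in exactly_k(cur, rules, k):
                if len(y) > max_len:
                    continue
                if set(y) <= terminals:
                    words.add(y)
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
    return words

P1 = [("S", "a#bC"), ("C", "$c"), ("A", "a#b")]   # generating component
P2 = [("#", "A"), ("$", "C")]                      # accepting component
print(sorted(language_eq_k("S", [P1, P2], set("abc$#"), k=2, max_len=11), key=len))
```

Forcing two steps per application is what keeps the three blocks synchronized: the accepting component must reopen both markers, and the generating component must then extend both of them.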
5 Regularly Controlled Bidirectional Grammars

In this section we show close connections between the bidirectional grammar systems considered so far and regularly controlled bidirectional (RCB) grammars as considered by Asveld and Hogendorp [2, 20, 21]. Using this connection we can solve some open problems in the theory of RCB grammars and language families. Without going into formal details, we sketch the ideas on RCB grammars in the following: Basically, a context-free grammar $G = (N, T, S, P)$ and a regular control language $C \subseteq (P \cup \bar{P})^*$ are given. Productions of $P$ may be applied in a generating (producing) or accepting (reducing) fashion. This is reflected within the control language by introducing $p$ or its barred counterpart, $\bar{p}$, respectively. Productions are applied in a rightmost manner. Of course, similar results can be obtained for the leftmost derivation. Asveld and Hogendorp (contrary to the notion we introduced in our previous works) term an accepting production $\bar{p}$ applicable to a string $\beta$ when there is a rightmost generating derivation step using $p$ from the string $\alpha$ yielded from $\beta$ via $\bar{p}$.

In the so-called nonterminal mode (RN-mode), rightmost application means deriving the rightmost nonterminal. In the so-called occurrence mode (RO-mode), rightmost application means deriving the rightmost occurrence of the nonterminal prescribed by the left-hand side of the production $p$ which should be applied (in a producing or reducing manner) according to the control language. Asveld and Hogendorp considered derivations of $G$ according to $C$ either without appearance checking (B-mode) or with unconditional transfer (S-mode). Furthermore, they distinguish between the cases when allowing or disallowing reductions of the form $w \to A$, $w \in T^*$, which they call g-mode and f-mode, respectively. The family of languages generated by RCB grammars working in, e.g., RN-mode with appearance checking and allowed reductions of the form $w \to A$, is denoted by RN/B/g.
It is easily seen that the proof of our main theorem also works when considering left derivations, both in the nonterminal and in the occurrence mode, where the control language may permit arbitrary combinations of generating and accepting rules, and both the B-mode and the S-mode interpretation are possible. Furthermore, by interpreting pseudo-nonterminals as nonterminals, this proof is also valid for the f-mode of RCB grammars, if we allow arbitrary context-free productions. A similar construction is also possible for right-most derivations. Hence, we have obtained the following result, thereby solving some open problems from Asveld and Hogendorp [2, Table 2]. We only give the strongest form of the results known to us. Moreover, we think that our proof using g-systems is essentially shorter than the proof of similar results of Asveld and Hogendorp using Turing machine simulations.
Corollary 5.1 The class of recursively enumerable sets is characterized by the following classes of RCB grammars: RN/B/g, RN/S/f, RO/B/f, and RO/S/f. $\Box$

Furthermore, Hogendorp [20] considered RCB grammars (with rightmost derivation) containing only linear and left-linear rules, called LRCB and LLRCB, respectively. Again, our construction solves the problem of the generative power of such grammars with rules obeying the g-mode.
Corollary 5.2 The class of recursively enumerable sets is characterized by the following classes of LLRCB grammars: RN/B/g, RN/S/g, RO/B/g, and RO/S/g. $\Box$

Especially, our corollaries allow us to solve the questions concerning closure properties of [LL]RCB grammars of various types as listed by Hogendorp [20, Table 1]. Questions that remain open concern LLRCB grammars with the f-mode restriction.
6 Conclusions

We investigated bidirectional CD grammar systems as a straightforward generalization of generating and accepting CD grammar systems as investigated by Csuhaj-Varjú et al. [8] and Fernau, Holzer, Bordihn [15]. We have shown that independently of their mode ($*$, $\le k$, $= k$, $\ge k$, or $t$), BCD grammar systems with two components characterize the recursively enumerable languages, even with right-linear rules and without $\lambda$-productions. This sharply contrasts with earlier results on solely generating or solely accepting grammars, where right-linear rules lead only to regular languages.

For the sake of completeness we summarize the known results on generating and accepting CD grammar systems with regular and context-free components working in $t$-mode in Figure 1. For other modes we refer to Csuhaj-Varjú et al. [8] and Fernau, Holzer, Bordihn [15].

[Figure 1. The classes shown in the diagram are, from top to bottom:]

$\mathcal{L}^{gen}(\mathrm{RE}) = \mathcal{L}^{gen}(\mathrm{BCD}_{(\infty,\infty)}, \mathrm{REG}[-\lambda], t) = \mathcal{L}^{gen}(\mathrm{BCD}_{(1,1)}, \mathrm{REG}[-\lambda], t)$

$\mathcal{L}^{gen}(\mathrm{CS}) = \mathcal{L}^{acc}(\mathrm{BCD}_{(0,\infty)}, \mathrm{CF}[-\lambda], t) = \mathcal{L}^{acc}(\mathrm{BCD}_{(0,2)}, \mathrm{CF}[-\lambda], t) = \mathcal{L}^{acc}(\mathrm{BCD}_{(1,1)}, \mathrm{rREG}[-\lambda], t)$

$\mathcal{L}^{gen}(\mathrm{ET0L}) = \mathcal{L}^{gen}(\mathrm{BCD}_{(\infty,0)}, \mathrm{CF}[-\lambda], t) = \mathcal{L}^{gen}(\mathrm{BCD}_{(3,0)}, \mathrm{CF}[-\lambda], t) = \mathcal{L}^{gen}(\mathrm{BCD}_{(1,2)}, \mathrm{rREG}[-\lambda], t) = \mathcal{L}^{gen}(\mathrm{BCD}_{(2,1)}, \mathrm{rREG}[-\lambda], t)$

$\mathcal{L}^{gen}(\mathrm{BCD}_{(1,1)}, \mathrm{rREG}[-\lambda], t)$

$\mathcal{L}^{gen}(\mathrm{E0L})$ and $\mathcal{L}^{gen}(\mathrm{CF}) = \mathcal{L}^{gen}(\mathrm{BCD}_{(2,0)}, \mathrm{CF}[-\lambda], t) = \mathcal{L}^{gen}(\mathrm{BCD}_{(1,0)}, \mathrm{CF}[-\lambda], t) = \mathcal{L}^{acc}(\mathrm{BCD}_{(0,1)}, \mathrm{CF}[-\lambda], t)$

$\mathcal{L}^{gen}(\mathrm{REG}) = \mathcal{L}^{gen}(\mathrm{BCD}_{(\infty,0)}, \mathrm{REG}[-\lambda], t) = \mathcal{L}^{acc}(\mathrm{BCD}_{(0,1)}, \mathrm{REG}[-\lambda], t)$

$\mathcal{L}^{gen}(\mathrm{BCD}_{(0,0)}, \mathrm{REG}[-\lambda], t) = \mathcal{L}^{gen}(\mathrm{BCD}_{(0,0)}, \mathrm{CF}[-\lambda], t) = \{\emptyset\}$

Figure 1: Inclusion diagram. Lines with arrows stand for proper inclusions of the "lower" class within the "upper" one. In case of lines without arrows, the strictness of the inclusion is open.

Moreover, due to the close connection of BCD grammar systems and regularly controlled bidirectional grammars, we solved some open problems in the theory of regularly controlled bidirectional grammars in this way.

Let us finally mention that our main theorem also implies the result of Boasson [3] stating that every recursively enumerable language can be generated by a context-free grammar whose rules can be applied both in generating (derivation) and accepting (reduction) manner. Furthermore, it would be interesting to find characterizations of other known language classes by suitable restrictions of bidirectional grammars.
Acknowledgments

Thanks to Gheorghe Păun who drew our attention to accepting CD and HCD grammar systems, respectively, and to Peter R. J. Asveld who informed us about his research on bidirectional grammars.
References

[1] D. E. Appelt. Bidirectional grammars and the design of natural language generation systems. In Proceedings of Third Conference on Theoretical Issues in Natural Language Processing (TINLAP-3), pages 185–191, New Mexico State University, Las Cruces, New Mexico, January 7–9, 1987.
[2] P. R. J. Asveld and J. A. Hogendorp. On the generating power of regularly controlled bidirectional grammars. International Journal of Computer Mathematics, 40, 1991.
[3] L. Boasson. Dérivation et réductions dans les grammaires algébriques. In Automata, Languages and Programming, volume 85 of LNCS, pages 109–118. Berlin: Springer-Verlag, July 1980.
[4] R. V. Book. Simple representations of certain classes of languages. Journal of the Association for Computing Machinery, 25(1):23–31, January 1978.
[5] R. V. Book and F.-J. Brandenburg. Representing complexity classes by equality sets. In Maurer [24], pages 49–57.
[6] H. Bordihn and H. Fernau. Accepting grammars with regulation. International Journal of Computer Mathematics, 53:1–18, 1994.
[7] H. Bordihn and H. Fernau. Accepting grammars and systems: an overview. In Developments in Language Theory '95, 1995. To appear.
[8] E. Csuhaj-Varjú et al. Grammar Systems: A Grammatical Approach to Distribution and Cooperation. London: Gordon and Breach, 1994.
[9] E. Csuhaj-Varjú et al. DNA computing based on splicing. To appear, 1995.
[10] K. Čulik, II. On the homomorphic characterization of families of languages. In Maurer [24], pages 161–170.
[11] K. Čulik, II. A purely homomorphic characterization of recursively enumerable sets. Journal of the Association for Computing Machinery, 26(2):345–350, April 1979.
[12] K. Čulik, II. Homomorphisms: Decidability, equality and test sets. In R. V. Book, editor, Formal Language Theory, pages 167–194, Santa Barbara, CA, 1980. Univ. of CA at Santa Barbara, New York: Academic Press.
[13] J. Dassow and Gh. Păun. Regulated Rewriting in Formal Language Theory, volume 18 of EATCS Monographs in Theoretical Computer Science. Berlin: Springer, 1989.
[14] H. Fernau and H. Bordihn. Remarks on accepting parallel systems. International Journal of Computer Mathematics, 56:51–67, 1995.
[15] H. Fernau, M. Holzer, and H. Bordihn. Accepting multi-agent systems. Computers and Artificial Intelligence, 1996. Submitted for publication.
[16] V. Geffert. Context-free like forms for the phrase-structure grammars. In M. P. Chytil et al., editors, Mathematical Foundations of Computer Science MFCS'88, volume 324 of LNCS, pages 309–317, 1988.
[17] V. Geffert. A representation of recursively enumerable languages by two homomorphisms and a quotient. Theoretical Computer Science, 62:235–249, 1988.
[18] V. Geffert. How to generate languages using only two pairs of parentheses. J. Inf. Process. Cybern. EIK (formerly Elektron. Inf.verarb. Kybern.), 27(5/6):303–315, 1991.
[19] P. Gvozdjak and B. Rovan. Time-bounded parallel rewriting and fast generated languages. Received in July, 1995.
[20] J. A. Hogendorp. Controlled bidirectional grammars. International Journal of Computer Mathematics, 27:159–180, 1989.
[21] J. A. Hogendorp. Time-bounded controlled bidirectional grammars. International Journal of Computer Mathematics, 35:93–115, 1990.
[22] L. Ilie and A. Salomaa. 2-testability and relabelings produce everything. Communicated by Gh. Păun, November 1995.
[23] M. Latteux and P. Turakainen. On characterizations of recursively enumerable languages. Acta Informatica, 28:179–186, 1990.
[24] H. A. Maurer, editor. Automata, Languages and Programming, volume 71 of LNCS. Berlin: Springer, July 1979.
[25] Gh. Păun. A characterization of recursively enumerable languages. EATCS Bulletin, 45:218–222, 1991.
[26] Gh. Păun. On the generative capacity of hybrid CD grammar systems. J. Inf. Process. Cybern. EIK (formerly Elektron. Inf.verarb. Kybern.), 30(4):231–244, 1994.
[27] B. Rovan. A framework for studying grammars. In Mathematical Foundations of Computer Science 1981; Proceedings, 10th Symposium Štrbské Pleso, Czechoslovakia, volume 118 of LNCS, pages 473–482, 1981.
[28] G. Rozenberg and A. K. Salomaa. The Mathematical Theory of L Systems. Academic Press, 1980.
[29] P. Turakainen. A unified approach to characterizations of recursively enumerable languages. EATCS Bulletin, 45:223–228, 1991.
[30] D. Wood. Iterated a-NGSM maps and Γ systems. Information and Control (now Information and Computation), 32:1–26, 1976.