External Versus Internal Hybridization for Cooperating Distributed Grammar Systems

Henning Fernau¹*, Markus Holzer¹, and Rudolf Freund²

¹ Wilhelm-Schickard-Institut für Informatik, Universität Tübingen, Sand 13, D-72076 Tübingen, Germany; email: {fernau,holzer}@informatik.uni-tuebingen.de
² Institut für Computersprachen, Technische Universität Wien, Resselgasse 3, A-1040 Wien, Austria; email: [email protected]

Abstract. Mitrana and Păun commenced a study on cooperating distributed (CD) grammar systems with components working in different derivation modes, called hybrid CD grammar systems. In this paper, we introduce several hybrid derivation modes of CD grammar systems and characterize via this internal hybridization (i.e., all grammars of a CD grammar system work in the same mode) several external hybridizations (i.e., grammars of one CD grammar system may work in different modes) investigated by Mitrana and Păun. When studying internal versus external hybridizations involving the so-called t-mode of derivation, we obtain cases in which internally hybrid systems are strictly more, respectively less, powerful than their externally hybrid counterparts.

Keywords: Formal languages, hybrid CD grammar systems, regulated rewriting, finite index.

C.R. Categories: F.4.2, F.4.3

* Supported by Deutsche Forschungsgemeinschaft grant DFG La 618/3-1.

Contents

1 Introduction
2 Definitions
3 General observations
4 Combining ≤ k₂ and ≥ k₁
5 Combining t and ≥ k
6 The (t ∧ ≤ k)- and (t ∧ = k)-mode
  6.1 Characterizing ordered and programmed languages with appearance checking
  6.2 Characterizing programmed languages of finite index
7 Inside finite index
  7.1 Counting components
  7.2 CD grammar systems of finite index
8 Summary and Open Problems

1 Introduction

Cooperating/distributed (CD) grammar systems, introduced in [3] (with forerunner papers [10, 1]) and extensively studied in the book [4], have become a vivid area of research. Their motivation, stemming both from multi-agent systems and from the blackboard model for problem solving introduced in Artificial Intelligence (cf., e.g., Nilsson [12]), can be summarized as follows: several grammars (agents or experts from the viewpoint of AI), mainly consisting of production sets (corresponding to scripts along whose lines the agents act), cooperate in order to work on a sentential form (representing their common work), finally generating terminal words in this way. The picture one has in mind is that of several grammars (mostly, these are simply classical context-free grammars, called "components" in the theory of CD grammar systems) "sitting" around a table on which the common workpiece, a sentential form, is lying. Some component takes this sentential form, works on it, i.e., it performs some derivation steps, and returns it onto the table, such that another component may continue the work. Of course, there are several ways to formalize this collaboration. In particular, "how long" is a component allowed to work on a sentential form until maybe another component can contribute to this work? In other words: how is the agent reading its script? The following modes have been thoroughly investigated in the literature:

- ⇒=k: Each component which takes the sentential form in order to work on it has to perform exactly k derivation steps.
- ⇒≤k: Each component … has to perform at most k derivation steps.
- ⇒≥k: Each component … has to perform at least k derivation steps.
- ⇒*: Each component … can perform as many derivation steps as it likes.
- ⇒t: Each component … has to perform as many derivation steps as possible.

Recently, Mitrana and Păun considered CD grammar systems where each component may work in a different, specific mode [11, 13]. Mitrana obtained nice canonical forms for these hybrid systems. In this paper, we are going to introduce other hybrid derivation modes which partly nicely characterize the external hybridizations introduced by Mitrana. In another paper, we recently considered internally hybrid CD array grammar systems for character recognition purposes [6].
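To make the five classical modes concrete, here is a small, self-contained Python sketch (our own toy illustration, not from the paper): a component is a list of context-free rules over single-letter symbols, and a mode restricts which m-step rewritings count as one legal session. The search bound `limit` and all identifiers are assumptions of this sketch.

```python
def one_step(form, rules):
    """All sentential forms reachable from `form` by one rule application of the component."""
    out = set()
    for lhs, rhs in rules:
        i = form.find(lhs)
        while i != -1:
            out.add(form[:i] + rhs + form[i + 1:])
            i = form.find(lhs, i + 1)
    return out

def steps(form, rules, m):
    """All forms reachable in exactly m derivation steps."""
    frontier = {form}
    for _ in range(m):
        frontier = {g for f in frontier for g in one_step(f, rules)}
    return frontier

def mode_session(form, rules, mode, k=1, limit=8):
    """Forms reachable in one session of a component under a classical mode."""
    if mode == "=k":
        return steps(form, rules, k)
    if mode == "<=k":
        return {g for m in range(k + 1) for g in steps(form, rules, m)}
    if mode == ">=k":   # bounded by `limit` to keep the search finite
        return {g for m in range(k, limit) for g in steps(form, rules, m)}
    if mode == "*":
        return {g for m in range(limit) for g in steps(form, rules, m)}
    if mode == "t":     # maximal sessions: no rule is applicable afterwards
        result, frontier = set(), {form}
        for _ in range(limit):
            nxt = set()
            for f in frontier:
                succ = one_step(f, rules)
                if succ:
                    nxt |= succ
                else:
                    result.add(f)
            frontier = nxt
        return result
    raise ValueError(mode)

# hypothetical component S -> aSb | ab: in t-mode it must rewrite until no S remains
comp = [("S", "aSb"), ("S", "ab")]
```

In t-mode, `mode_session("S", comp, "t")` yields only S-free words of the shape aⁿbⁿ (up to the depth bound), while `"=k"` with k = 1 yields both one-step successors.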

How to combine modes

More specifically, we shall consider the following modes:

- ⇒(≥k₁ ∧ ≤k₂): Each component which takes the sentential form in order to work on it has to perform at least k₁ and at most k₂ derivation steps.
- ⇒(t ∧ ≥k): Each component … has to perform as many derivation steps as possible, and at least k steps.
- ⇒(t ∧ ≤k): Each component … has to perform as many derivation steps as possible, and at most k steps.
- ⇒(t ∧ =k): Each component … has to perform as many derivation steps as possible, and exactly k steps.

This means that we consider possible logical AND-combinations of the classical basic mode conditions. (We will introduce an appropriate formalization below.) We do not consider the combination (* ∧ f) for f ∈ { ≤ k, ≥ k, = k, t | k ∈ ℕ }, since this mode is evidently equivalent to the original mode f. Similarly, we do not consider (f ∧ f), since this mode is equivalent to f. Of course, it would also be possible to use more complex Boolean combinations of the basic modes. For example, a component working in the (f₁ ∨ f₂)-mode would correspond to two identical components, one of them working in the f₁-mode and the other one working in the f₂-mode. As an aside, let us mention that a component working in the = k-mode with appearance checking (as introduced in [4]) corresponds to the same component working in the ((t ∧ ≤ k) ∨ = k)-mode. We shall not go into the details of other Boolean combinations here; of course, they leave ample room for further research. Especially the modes (t ∧ = k) lead to astonishing results, characterizing (purely syntactically, not requiring the finite index restriction on the CD grammar systems) the family of programmed languages of finite index. If we combine (t ∧ = k) via external hybridization with t, we characterize programmed languages with appearance checking, which had not been fully identified as families of languages obtained via CD grammar systems before. In the following section, we introduce the notions sketched in this introduction more formally. Then we present our results, ordered by the different mode combinations introduced above.
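The AND-combinations just introduced can be read as conjunctions of step-count conditions. A minimal sketch, under the assumption that a session is abstracted to its step count m plus a flag telling whether the resulting form is blocked (all function names are our own):

```python
def basic_predicate(mode, k=None):
    """Predicate of a basic mode, as a function of the step count m and of
    `blocked`, which tells whether the resulting form admits no further step."""
    return {
        "=k":  lambda m, blocked: m == k,
        "<=k": lambda m, blocked: m <= k,
        ">=k": lambda m, blocked: m >= k,
        "*":   lambda m, blocked: m >= 0,
        "t":   lambda m, blocked: blocked,
    }[mode]

def conj(p1, p2):
    """Internal hybridization by logical AND of two mode predicates."""
    return lambda m, blocked: p1(m, blocked) and p2(m, blocked)

# (t AND = 3): a maximal derivation of exactly three steps
t_and_eq3 = conj(basic_predicate("t"), basic_predicate("=k", 3))
assert t_and_eq3(3, True)       # three steps, nothing applicable any more
assert not t_and_eq3(3, False)  # three steps, but the component could continue
assert not t_and_eq3(2, True)   # maximal, but too short
```

Note that conjoining with `"*"` changes nothing, mirroring the remark above that * acts as a "don't care".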

2 Definitions

We assume the reader to be familiar with some basic notions of formal language theory, as contained in Dassow and Păun [5]. In general, we have the following conventions: ⊆ denotes inclusion, while ⊂ denotes strict inclusion, and the set of positive integers is denoted by ℕ. The empty word is denoted by λ. Further, let |α|_A denote the number of occurrences of A in α. We consider two languages L₁ and L₂ to be equal iff L₁ \ {λ} = L₂ \ {λ}, and we simply write L₁ = L₂ in this case. For the convenience of the reader, we repeat the basic definitions of CD grammar systems and of hybrid CD grammar systems, adapted from Păun [13].

The families of languages generated by regular, linear context-free, context-free, context-sensitive, and general type-0 Chomsky grammars, by ET0L systems, and by context-free matrix grammars are denoted by L(REG), L(LIN), L(CF), L(CS), L(RE), L(ET0L), and L(M, CF), respectively. The class of finite languages is denoted by L(FIN). We attach −λ to our notations if erasing rules are not permitted.

A programmed grammar is a septuple G = (N, T, P, S, Λ, σ, φ), where N, T, and S ∈ N are the set of nonterminals, the set of terminals, and the start symbol, respectively. In the following, we use V_G to denote the set N ∪ T. P is the finite set of productions α → β, and Λ is a finite set of labels (for the productions in P), such that Λ can also be interpreted as a function which outputs a production when given a label; σ and φ are functions from Λ into the set of subsets of Λ. For (x, r₁), (y, r₂) in V_G* × Λ and Λ(r₁) = (α → β), we write (x, r₁) ⇒ (y, r₂) iff either

1. x = x₁αx₂, y = x₁βx₂ and r₂ ∈ σ(r₁), or

2. x = y, the rule α → β is not applicable to x, and r₂ ∈ φ(r₁).

In the latter case, the derivation step is done in appearance checking mode. The set σ(r₁) is called the success field and the set φ(r₁) the failure field of r₁. As usual, the reflexive transitive closure of ⇒ is denoted by ⇒*. The language generated by G is defined as

L(G) = { w ∈ T* | (S, r₁) ⇒* (w, r₂) for some r₁, r₂ ∈ Λ }.

The family of languages generated by programmed grammars containing only context-free core rules is denoted by L(P, CF, ac). When no appearance checking features are involved, i.e., φ(r) = ∅ for each label r ∈ Λ, we are led to the family L(P, CF). The special variant of a programmed grammar where σ = φ is said to be a programmed grammar with unconditional transfer. Observe that, due to our definition of derivation, a production with an empty go-to field is never applicable. Hence, we assume without saying that grammars with unconditional transfer contain only productions with non-empty go-to fields. We denote the class of languages generated by programmed grammars with context-free productions and with unconditional transfer by L(P, CF, ut).

An ordered grammar is a quintuple G = (N, T, P, S, ≺), where N, T, P, and S ∈ N are the nonterminal alphabet, the terminal alphabet, the set of productions, and the axiom, respectively, and ≺ is a partial order on P. A production α → β is applicable to a string x ∈ V_G* iff x = x₁αx₂ for some x₁, x₂ ∈ V_G* and x contains no subword α′ such that α′ → β′ ∈ P for some β′ with α → β ≺ α′ → β′; the application of α → β to x yields y = x₁βx₂. As usual, the yield relation is denoted by ⇒, and its reflexive and transitive closure by ⇒*. The family of languages generated by ordered grammars with context-free productions is denoted by L(O, CF).

A CD grammar system of degree n, with n ≥ 1, is an (n + 3)-tuple G = (N, T, S, P₁, …, P_n), where N and T are disjoint alphabets of nonterminal and terminal symbols, respectively, S ∈ N is the axiom, and P₁, …, P_n are finite sets of rewriting rules over N ∪ T.

If there is no confusion, we sometimes write, for a derivation of a programmed grammar G,

D: S = w₁ ⇒_{r₁} w₂ ⇒_{r₂} ⋯ ⇒_{r_{n−1}} w_n = w ∈ V_G*

instead of

D: (S, r₁) = (w₁, r₁) ⇒ (w₂, r₂) ⇒ ⋯ ⇒ (w_n, r_n) = (w, r_n).

Throughout this paper, we consider only regular, linear context-free, and context-free rewriting rules. Let G be a CD grammar system. For x, y ∈ (N ∪ T)* and 1 ≤ i ≤ n, we write x ⇒_i y iff x = x₁Ax₂ and y = x₁zx₂ for some A → z ∈ P_i. Hence, the subscript i refers to the component to be used. Accordingly, x ⇒_i^m y denotes an m-step derivation of component number i, where x ⇒_i^0 y iff x = y. Now, let f ∈ D := B ∪ { (≥ k ∧ ≤ l), (t ∧ ≥ k), (t ∧ ≤ k), (t ∧ = k) | k, l ∈ ℕ, k ≤ l }, where B := { *, t } ∪ { ≤ k, = k, ≥ k | k ∈ ℕ } are the classical basic modes. We define the relation ⇒_i^f by

x ⇒_i^f y :⟺ ∃ m ≥ 0 (x ⇒_i^m y ∧ P(f, m, x, y)),

where P is a predicate defined as follows (let k ∈ ℕ, f₁, f₂ ∈ B):

predicate               definition
P(= k, m, u, v)         m = k
P(≤ k, m, u, v)         m ≤ k
P(≥ k, m, u, v)         m ≥ k
P(*, m, u, v)           m ≥ 0
P(t, m, u, v)           ¬∃ w (v ⇒_i w)
P((f₁ ∧ f₂), m, u, v)   P(f₁, m, u, v) ∧ P(f₂, m, u, v)

Since P((* ∧ f), m, u, v) holds iff P(f, m, u, v) holds, * may be used as a "don't care" in our subsequent notations. The language generated in the f-mode, f ∈ D, by a CD grammar system G with only generating rules is defined as

L_f(G) := { w ∈ T* | S ⇒_{i₁}^f w₁ ⇒_{i₂}^f ⋯ ⇒_{i_{m−1}}^f w_{m−1} ⇒_{i_m}^f w_m = w with m ≥ 1 and 1 ≤ i_j ≤ n for 1 ≤ j ≤ m }.

If f ∈ D and X ∈ {REG, LIN, CF}, then the families of languages generated in f-mode by [λ-free] CD grammar systems with at most n components using rules of type X are denoted by L(CD_n, X[−λ], f). When the number of components is not restricted, we write L(CD_∞, X[−λ], f).

If each component of a CD grammar system may work in a different mode, then we get the notion of (externally) hybrid CD grammar systems of degree n, with n ≥ 1: such a system is an (n + 3)-tuple G = (N, T, S, (P₁, f₁), …, (P_n, f_n)),

where N, T, S, P₁, …, P_n are as in CD grammar systems and f_i ∈ D for 1 ≤ i ≤ n. Thus, we can define the language generated by an HCD grammar system with only generating rules as

L(G) := { w ∈ T* | S ⇒_{i₁}^{f_{i₁}} w₁ ⇒_{i₂}^{f_{i₂}} ⋯ ⇒_{i_{m−1}}^{f_{i_{m−1}}} w_{m−1} ⇒_{i_m}^{f_{i_m}} w_m = w with m ≥ 1 and 1 ≤ i_j ≤ n for 1 ≤ j ≤ m }.

If F ⊆ D and X ∈ {REG, LIN, CF}, then the family of languages generated by [λ-free] HCD grammar systems with at most n components using rules of type X, each component working in one of the modes contained in F, is denoted by L(HCD_n, X[−λ], F). Similarly, L(HCD_∞, X[−λ], F) is written when the number of components is not restricted. Observe that not every combination of modes as introduced above yields something really new. For example, the (≥ k ∧ ≤ k)-mode is just another notation for the = k-mode. Let us finally mention that regular and linear components working in one of the modes introduced above can only generate regular or linear languages, respectively, since all the necessary information can be put into the only nonterminal symbol. This observation also remains true if we required normal forms of regular grammars, e.g., productions of the forms A → aB and A → a, where A and B are nonterminal symbols and a is a terminal symbol, as is commonly done in the Russian literature. The only difference would be that some of the following lemmas (formulated for context-free grammars but also valid for general regular grammars) would not be true in the case not admitting unit productions.⁴
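Under the definitions above, the t-mode language of a small CD grammar system can be prototyped by iterating maximal component sessions. The following sketch is our own toy illustration (a crude depth bound replaces a full search; all identifiers are assumptions):

```python
def one_step(form, rules):
    """All sentential forms reachable from `form` by one rule application."""
    out = set()
    for lhs, rhs in rules:
        i = form.find(lhs)
        while i != -1:
            out.add(form[:i] + rhs + form[i + 1:])
            i = form.find(lhs, i + 1)
    return out

def t_sessions(form, rules, depth=10):
    """All results of maximal (t-mode) sessions of one component, up to `depth` steps."""
    done, frontier = set(), {form}
    for _ in range(depth):
        frontier = {g for f in frontier for g in one_step(f, rules)}
        done |= {f for f in frontier if not one_step(f, rules)}
    return done

def language_t(start, components, nonterminals, rounds=3):
    """Approximate L_t(G): iterate component sessions and collect terminal words."""
    forms = {start}
    for _ in range(rounds):
        forms |= {g for f in forms for rules in components for g in t_sessions(f, rules)}
    return {w for w in forms if not any(a in w for a in nonterminals)}

# toy CD grammar system with one component generating a^n b^n in t-mode
G = [[("S", "aSb"), ("S", "ab")]]
lang = language_t("S", G, nonterminals={"S"})
assert "aabb" in lang and "aab" not in lang
```

The depth and round bounds only truncate the infinite language; they do not change which short words appear.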

3 General observations

In this section, we will collect two general but not very deep techniques which are very useful when dealing with hybrid systems, namely the prolongation technique and the lcm (least common multiple) technique, both of which are also used by Mitrana [11] in his framework.

Lemma 3.1 (Prolongation Technique) Let ◇ ∈ {=, ≤, ≥}, □ ∈ {*, t}, and k, ℓ ∈ ℕ such that k divides ℓ. Every context-free component working in the (□ ∧ ◇k)-mode can be replaced by a context-free component working in the (□ ∧ ◇ℓ)-mode. The simulating component has λ-productions only if the simulated component has λ-productions.⁴

⁴ An example for such a construction is worked out in [11, Theorem 2].

Proof. Of course, we can assume ℓ > k. Let N′ = {A₁, …, A_m} be the set of nonterminals occurring as left-hand sides of productions of the component under consideration. We introduce a number of new nonterminals, namely {(A_i, j) | 1 ≤ i ≤ m, 1 ≤ j < ℓ/k}. Instead of a production A_i → w in our original component, we introduce the productions A_i → (A_i, 1), (A_i, j) → (A_i, j + 1) for 1 ≤ j ≤ ℓ/k − 2, and (A_i, ℓ/k − 1) → w. □

Observe that, in the construction above, it is possible that there remain coloured symbols (A_i, j) in the sentential form when leaving the simulating component. When considering grammar systems, this is not bad as long as we additionally colour each component individually. When considering externally hybrid systems, components working in the t-mode (and variants thereof) should block in the presence of such symbols (simply by unit productions of the form (A_i, j) → (A_i, j)).

Theorem 3.2 Let ◇ ∈ {=, ≤, ≥}, □ ∈ {*, t}, and n, k, ℓ ∈ ℕ (or n = ∞) such that k divides ℓ. We find

L(CD_n, CF[−λ], (□ ∧ ◇k)) ⊆ L(CD_n, CF[−λ], (□ ∧ ◇ℓ)). □

Observe that this theorem implies a prime number lattice structure on the families of languages L(CD_n, CF[−λ], (□ ∧ ◇k)), where n ∈ ℕ ∪ {∞} is fixed.

Lemma 3.3 (lcm Technique) Let ◇ ∈ {=, ≤, ≥} and □ ∈ {*, t}. Components P₁, …, P_m working in the modes (□ ∧ ◇k₁), …, (□ ∧ ◇k_m), respectively, can be replaced by m components, each working in the (□ ∧ ◇ℓ)-mode, where ℓ = lcm{ k_i | 1 ≤ i ≤ m }. The simulating components have λ-productions only if the simulated components have λ-productions. The least common multiple technique has been used in [11, Lemma 3].

Proof. By the prolongation technique, the component P_i working in the (□ ∧ ◇k_i)-mode can be replaced by one component working in the (□ ∧ ◇ℓ)-mode. □

Thus, we directly obtain the next theorem, which shows that in many cases there is no difference between using internal and external hybridization.


Theorem 3.4 Let ◇ ∈ {=, ≤, ≥} and □ ∈ {*, t}. Then we have

⋃_{k ∈ ℕ} L(CD_∞, CF[−λ], (□ ∧ ◇k)) = L(HCD_∞, CF[−λ], { (□ ∧ ◇k) | k ∈ ℕ }). □
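The prolongation construction of Lemma 3.1, together with the lcm step of Lemma 3.3, can be sketched in a few lines of Python; the string encoding of coloured symbols such as `(A,1)` is our own choice, not the paper's notation.

```python
from math import lcm

def prolong(rules, k, ell):
    """Stretch a component meant for (mode AND <>k) into one for (mode AND <>ell),
    where k divides ell: every original rule A -> w becomes a chain of ell // k steps."""
    assert ell % k == 0 and ell > k
    stretch = ell // k
    new_rules = []
    lefts = {lhs for lhs, _ in rules}
    for a in lefts:                       # entry into the colouring chain
        new_rules.append((a, f"({a},1)"))
        for j in range(1, stretch - 1):   # intermediate colour steps
            new_rules.append((f"({a},{j})", f"({a},{j + 1})"))
    for lhs, rhs in rules:                # leave the chain with the original rewrite
        new_rules.append((f"({lhs},{stretch - 1})", rhs))
    return new_rules

# one original step becomes ell // k = 3 steps: A -> (A,1) -> (A,2) -> w
chain = prolong([("A", "aA"), ("A", "a")], k=2, ell=6)
assert ("A", "(A,1)") in chain and ("(A,2)", "aA") in chain

# lcm technique (Lemma 3.3): components tuned to k = 2 and k = 3 can share ell = 6
assert lcm(2, 3) == 6
```

As in the lemma, the stretched component introduces no λ-productions of its own.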

4 Combining ≤ k₂ and ≥ k₁

We start our investigations with the interval mode. In this way, we obtain the first result on external versus internal hybridization.

Lemma 4.1 Every context-free component working in the ≥ k₁-, ≤ k₂-, = k₁-, or *-mode, respectively, can be simulated by a component working in the (≥ k₁ ∧ ≤ k₂)-mode. The simulating component has λ-productions only if the simulated component has λ-productions.

Proof. One component working in the ≥ k₁-, ≤ k₂-, = k₁-, or *-mode, respectively, corresponds to the same component working in the (≥ k₁ ∧ ≤ 2k₁ − 1)-, (≥ 1 ∧ ≤ k₂)-, (≥ k₁ ∧ ≤ k₁)-, or (≥ 1 ∧ ≤ 1)-mode, respectively. For the first correspondence, note that every session of m ≥ k₁ steps can be split into sessions whose lengths all lie between k₁ and 2k₁ − 1. □
It does not seem possible to simulate the internally hybrid mode without external hybridization using the classical modes.

Lemma 4.2 Every context-free component working in the ( k ^  k )mode can be simulated by k ? k + 1 components, each one working in the = k-mode for some k with k  k  k . The simulating component has 1

2

2

1

1

2

-productions only if the simulated component has -productions.

Proof. k ? k + 1 copies of the original component are run in = k-mode, for a speci c k with k  k  k . 2 2

1

1

2

Is it necessary to have di erent k and k within externally hybrid systems with components working in the ( k ^  k )-mode? The answer is no, as shown in the following lemma. 1

2

1

2

Lemma 4.3 Two context-free components working in the ( k ^  k )- and ( k ^  k )-mode can be replaced by (k ? k + k ? k + 2) components working in the ( `^  `)-mode, ` = lcmf k j (k  k  k )_(k  k  k ) g. 1

3

4

2

1

1

4

2

3

2

3

4

The simulating component has -productions only if the simulated component has -productions.

10

Proof. By the previous lemma, we know how to replace the given components by (k₂ − k₁ + k₄ − k₃ + 2) components working in the = k-mode with k₁ ≤ k ≤ k₂ or k₃ ≤ k ≤ k₄. By the lcm technique, we obtain (k₂ − k₁ + k₄ − k₃ + 2) components working in the = ℓ-mode (which trivially coincides with the (≥ ℓ ∧ ≤ ℓ)-mode), where ℓ = lcm{ k | (k₁ ≤ k ≤ k₂) ∨ (k₃ ≤ k ≤ k₄) }. □

Theorem 4.4

⋃_{k₁,k₂ ∈ ℕ, k₁ ≤ k₂} L(CD_∞, CF[−λ], (≥ k₁ ∧ ≤ k₂))
  = L(HCD_∞, CF[−λ], { (≥ k₁ ∧ ≤ k₂) | k₁, k₂ ∈ ℕ, k₁ ≤ k₂ })
  = L(HCD_∞, CF[−λ], { ≥ k, ≤ k, = k | k ∈ ℕ })
  = ⋃_{k ∈ ℕ} L(CD_∞, CF[−λ], = k). □

We can trade off external versus internal hybridization, but this is not surprising in this case, since already the = k-mode is sufficient. Finally, let us mention that there is a number of open problems (also in the classical theory of CD grammar systems) concerning the question whether the hierarchies formed by, say, L(CD_n, CF[−λ], ◇k) for different n and some fixed relation ◇ ∈ {≤, ≥, =} and k ∈ ℕ collapse or not.

5 Combining t and ≥ k

This section is structured as follows: first, we show that external and internal hybridization also coincide when combining the t-mode and the ≥ k-mode, employing more involved simulations. Then, we show that the corresponding language family coincides with the class of languages definable via ET0L systems with permitting random context or, equivalently, by recurrent programmed grammars (with appearance checking), see [5, 19]. It is an old question whether this language family coincides with the one generable by programmed grammars with appearance checking or not. Since the family L of languages generable by HCD grammar systems (using only the classical modes) is therefore a superset of the ET0L languages with permitting context and at the same time a subfamily of L(P,CF[−λ],ac), the question posed by Păun in [13] whether L coincides with L(P,CF[−λ],ac) or not is a tough one,

indeed. Finally, we consider the number of components in some detail, also comparing the new internally hybrid mode (t ∧ ≥ k) with the father modes t and ≥ k. As a simple application of the prolongation technique, and since the (t ∧ ≥ 1)- and the t-mode are trivially equivalent, we obtain:

Lemma 5.1 For each k ∈ ℕ, every context-free component working in the t-mode can be simulated by a component working in the (t ∧ ≥ k)-mode. The simulating component has λ-productions only if the simulated component has λ-productions. □

Lemma 5.2 For each k ∈ ℕ, every context-free component working in the ≥ k-mode can be simulated by four components working in the (t ∧ ≥ k)-mode. The simulating components have λ-productions only if the simulated component has λ-productions.

Proof. Let P be a context-free component working in the ≥ k-mode with terminal alphabet T and nonterminal alphabet N. We construct four context-free components P⁽¹⁾, …, P⁽⁴⁾ with nonterminal alphabet N′, working in the (t ∧ ≥ k)-mode, which serve for the same job. Let V = N ∪ T and

N′ := N ∪ (V ∪ {L}) × {0, 1, …, 2k} ∪ {F};

P⁽¹⁾ := {A → (A, 2) | A ∈ N} ∪ {(A, j) → (A, j + 1) | A ∈ N, 2 ≤ j ≤ k}
      ∪ {X → F | X ∈ N′ \ (N × {k + 1})};

P⁽²⁾ := {(A, k + 1) → w′ | A ⇒ⁱ w in P, 1 ≤ i ≤ k, h(w′) = w,
                            w′ ∈ (V × {k + 1})*(V × {i})(V × {k + 1})*}
      ∪ {(A, k + 1) → (L, i) | A ⇒ⁱ λ in P, 1 ≤ i ≤ k}
      ∪ {(A, k + 1) → (A, 0) | A ∈ N},

where h: V × {0, 1, …, k + 1} → V is the morphism given by (A, j) ↦ A;

P⁽³⁾ := {(A, i) → (A, i − 1) | A ∈ V ∪ {L}, 1 ≤ i ≤ k};

P⁽⁴⁾ := {(A, 0) → (A, k + 2) | A ∈ V ∪ {L}}
      ∪ {(A, k + j) → (A, k + j + 1) | 1 < j < k, A ∈ V ∪ {L}}
      ∪ {(A, 2k) → A | A ∈ V} ∪ {(L, 2k) → λ}
      ∪ {(A, i) → F | A ∈ V ∪ {L}, 1 ≤ i ≤ k}.

The first and the last components perform only colouring operations; their seemingly unnecessary complexity is due to the possibility of deriving a sentential form with fewer than k nonterminals, so a prolongation technique as in Lemma 5.1 is used. After applying P⁽¹⁾, all nonterminals have the form (A, k + 1). The second component hopefully simulates at least k derivation steps of P, which is tested by the third component. The number of simulated derivation steps is stored in the second coordinate i of a nonterminal of the form (B, i) with B ∈ V ∪ {L}. The last group of productions in P⁽²⁾ simply colours those nonterminals which should not take part in a simulated derivation. Further observe that only derivation paths of a length of at most k need to be considered, since other, lengthier derivations could be split into multiple applications of P⁽²⁾. P⁽³⁾ can perform (at least) k derivation steps only if a sufficiently large number of steps has been simulated via P⁽²⁾. In case of an incorrect guess of simulation steps, or if P⁽²⁾ has not been applicable at all, P⁽³⁾ cannot perform k derivation steps, and the simulation stops at this stage. The colouring component P⁽⁴⁾ has the additional task of simulating the erasing productions of P. Note that in case P has no erasing productions, none of the simulating components will have erasing productions. □

Lemma 5.3 For each k ∈ ℕ, every context-free component working in the (t ∧ ≥ k)-mode can be simulated by three components working in the t-mode and one component working in the ≥ k-mode. The simulating components have λ-productions only if the simulated component has λ-productions.

Proof. The proof is quite similar to the one of the preceding lemma. Let P be a context-free component working in the (t ∧ ≥ k)-mode with terminal alphabet T and nonterminal alphabet N. We construct four context-free components P⁽¹⁾, …, P⁽⁴⁾ with nonterminal alphabet N′, working in the t- and ≥ k-mode, respectively, which do the same job. Let V = N ∪ T and

N′ := N ∪ (V ∪ {L}) × {0, 1, …, k} ∪ {F};

P⁽¹⁾ := {A → (A, 0) | A ∈ N} ∪ {X → F | X ∈ N′ \ (N × {0})};

P⁽²⁾ := {(A, 0) → w′ | A ⇒ⁱ w in P, h(w′) = w, 1 ≤ i ≤ k,
                        w′ ∈ (V × {k})*(V × {i})(V × {k})*}
      ∪ {(A, k) → w′ | A → w ∈ P, h(w′) = w, w′ ∈ (V × {k})⁺}
      ∪ {(A, 0) → (L, i) | A ⇒ⁱ λ in P, 1 ≤ i ≤ k}
      ∪ {(A, k) → (L, k) | A → λ ∈ P}
      ∪ {X → F | X ∈ N},

where h: V × {1, …, k} → V is the morphism given by (A, j) ↦ A;

P⁽³⁾ := {(A, i) → (A, i − 1) | A ∈ V, 1 ≤ i ≤ k} ∪ {(L, i) → λ | 1 ≤ i ≤ k};

P⁽⁴⁾ := {(A, 0) → A | A ∈ V} ∪ {X → F | X ∈ N′ \ N}.

(4)

(3)

(3)

Moreover,

L(HCD1 ; CF[?]; ftg [ f  k j k 2 N g) = L[(HCD1; CF[?]; f (t^  k) j k 2 N g) = L(CD1; CF[?]; (t^  k)):

N

k2

2

Are there other characterizations of the family of languages encountered in the preceding theorem? Indeed, as we show in the following, that class 14

coincides with the class of languages de nable via ET0L systems with permitting random context or, equivalently, by recurrent programmed grammars. Von Solms [19] considered recurrent context-free programmed grammars. A context-free programmed grammar G is a recurrent context-free programmed grammar if for every p 2  of G, if (p) = ;, then p 2 (p), and if (p) 6= ;, then p 2 (p) = (p). The corresponding language families are denoted by L(RP; CF[?]; ac)

Theorem 5.5 If k > 1, then L(CD1; CF[?]; (t^  k)) = L(RP; CF[?]; ac) : Proof. First, we show the simulation of a CD grammar system G = (N; T; S; P ; : : :; Pn ) with Pi = fpi; ; : : : ; pi;ni g and pi;j = Ai;j ! wi;j . For 1  i  n and 1  j  ni , pi;j is simulated by productions (ri;j;`) = Ai;j ! wi;j . For 1  ` < k, let (ri;j;`) = f ri;j0;` j 1  j 0  ni g [ fri;j;` g and (ri;j;`) = ;: De ne (ri;j;k) = f ri;j0;k j 1  j 0  ni g [ fti; g and (ri;j;k ) = ;: 1

1

+1

1

After (at least) k such production applications (the ri;j;`-productions test the  k-property of derivation), it has to be checked whether the t-mode condition is met. This is done by rules labelled ti;j with (ti;j ) = Ai;j ! F with (ti;j ) = (ti;j ) = fti;j ; ti;j g, if 1  i  n and 1  j < ni, and with (ti;ni ) = (ti;ni ) = fti;ni g [ f ri0; ; j 1  i0  n g) for 1  i  n. The initialization is done by (init) = S 0 ! S with (init = finitg [ f ri0; ; j 1  i0  n g and (init) = ;: In total, we have the simulating grammar G0 = (N [fS 0; F g; T; P; S 0; ; ; ), where the rule set P is implicitly de ned above. Secondly, we give the simulation of a recurrent programmed grammar. We give the proof only for the simulation in case k = 2. The other modes can be obtained using prolongation techniques at the \external label symbol" which we admit within the sentential form, since recurrent programmed languages are trios (the proof of Theorem 3.4 in [8] can be adapted to the case admitting appearance tests), and the involved CD grammar system language families are closed under union. We only sketch the construction of the simulation of a production (p) = A ! w with (p) = (p) 3 p. First we guess whether the success or failure case is entered. For each of these cases, a separate simulating component is introduced whose rules are described next. +1

11

11

15

1. Failure case: There are productions p ! p0 and p0 ! q00 with q 2 (p). Furthermore, we have r ! F for all labels r, and A ! F , A ! F , A0 ! F . 2. Success case: We have productions p ! q00 for q 2 (p) and A0 ! w. Furthermore, we have r ! F for all labels r, and A ! F . Observe that the (t^  2)-mode forces at least one production A0 ! w to be applied. In case of empty failure eld, the rst case is deleted. We have two colouring tables: 1. c contains p00 ! p0 and p0 ! p for every label p and A ! A, A0 ! A for every nonterminal A. 2. c contains A ! A and A ! A0 for every nonterminal A. The exit point is given by the table exit containing p ! p0 and p0 !  for every label p and A ! F , A ! F , A0 ! F for every nonterminal symbol A. The entry point is given by the table init containing S 0 ! S 0, S 0 ! pS for every label p. 2 As an aside, let us mention that the same construction shows that languages described by CD grammar systems working in  k-mode can be simulated by recurrent programmed grammars without appearance checks. We have shown in [8, Theorem 3.2] that L(RC; CF[?])  L(RP; CF[?])  (P; CF[?]); where L(RC; CF[?]) is the family of languages generable by permitting random context grammars, see [5]. Hence, we have a link to the old open problem whether permitting random context grammars are as powerful as programmed grammars without appearance checks or not. As regards the hierarchical structure of L(CD1; CF[?]; (t^  k)) for di erent k, we have the prime number lattice structure implied by the prolongation technique, but we do not know whether there are strict inclusions besides the one contained in the following theorem. Corollary 5.6 For k > 1, we have L(ET0L) = L(CD1; CF[?]; (t^  1)) ( L(CD1; CF[?]; (t^  k)): 1

2

16

Proof. Trivially, L(CD1; CF[?]; (t^  1)) = L(CD1; CF[?]; t), and the latter class equals L(ET0L), see [4, Theorem 3.10]. The family of languages de ned via ET0L systems with random context (which coincides with L(CD1; CF[?]; (t^  k)) by Theorem 5.5) is a strict superset of L(ET0L)

by [5, Theorem 8.3]. □

Observe that the reasoning of the last theorem also gives an alternative proof of [13, Theorem 3.5]. The number of components working together in a grammar system is a natural measure of descriptional complexity. It is known (see, e.g., [11, Lemma 2]) that n components working in the t-mode can be simulated by (at most) three components working in the t-mode. The proof of the quoted lemma is basically also valid for (t ∧ ≥ k)-mode components.⁵ As the (t ∧ ≥ 1)-mode and the simple t-mode coincide, the following result is obvious by the characterizations of L(CF) and L(ET0L) given in [4, Theorem 3.10].

Corollary 5.7 Let n ∈ ℕ ∪ {∞}, n ≥ 3. Then we find:

L(CF) = L(CD₁, CF[−λ], (t ∧ ≥ 1)) = L(CD₂, CF[−λ], (t ∧ ≥ 1)) ⊂ L(CD_n, CF[−λ], (t ∧ ≥ 1)) = L(ET0L). □

For general k, the situation is a little bit different from the previous one.

Theorem 5.8 Let n ∈ ℕ ∪ {∞}, n ≥ 3. For each k ≥ 2,

L(CF) = L(CD₁, CF[−λ], (t ∧ ≥ k)) ⊂ L(CD₂, CF[−λ], (t ∧ ≥ k)) ⊆ L(CD_n, CF[−λ], (t ∧ ≥ k)) = L(RP, CF[−λ], ac) ⊆ L(P, CF[−λ], ac).

Proof. The inclusions themselves are trivial or follow by [13, Theorem 3.6] together with the well-known equivalence of matrix and programmed grammars. Using individually coloured nonterminals for each component, any

(

5

1

2

1

2

Only the colouring components have to be prolongated in order to turn them into )-mode components, see Lemma 5.1 above.

t^  k

17

CD grammar system with an arbitrary number of (t^  k)-mode components can be simulated by a CD grammar system with three components, where two components serve as switches between the nonterminal colours, and one component does the actual simulation. In order to prove the strictness of the rst inclusion, we show how the non-context-free language L = fanan    ank j n  1g can be generated by a CD grammar system with two (t^  k)-mode components. Namely, consider the grammar system G = (N; T; S ; P ; P ), where P ; P work in the (t^  k)-mode. For both components, we take as nonterminal alphabet N = fS ; : : :; Sk ; A ; : : : ; Ak ; A0 ; : : :; A0k g and as terminal alphabet T = fa ; : : : ; ak g. The components Pi are the following ones: P = fA ! a A0 a g [ fAi ! A0iai j 2  i  kg [ fA ! a a g [ fAi ! ai j 2  i  kg P = fSi ! Si j 1  i < kg [ fSk ! A    Ak g [ fA0i ! Ai j 1  i  kg : Then we have L(G) = L; namely, consider a derivation of G of the form S ) k A    Ak , and either or A    Ak ) k a A0 a    A0k ak A    Ak ) k a    ak (or a non-vanishing number of occurrences of A0i less than k is obtained such that neither P nor P can perform k derivation steps). The above reasoning can be applied inductively, leading to the above-listed, non-context-free language generated by G. 2 As we showed in Theorem 5.4, we can trade o internal versus external hybridization in the case of mixing t-mode and  k-mode. It is interesting to compare the language classes obtained by CD grammars systems with n collaborating components in both of these modes with the corresponding class obtained by CD grammars systems with n components working in (t^  k)mode. Corollary 5.9 In general, for each n; k 2 N (or n = 1), we nd L(CDn ; CF[?]; t) = L(CDn ; CF[?]; (t^  1))  L(CDn ; CF[?]; (t^  k)): 1

2

+1

1

1

1

1

1

2

1

2

1

1

1

1

+1

+1

1 2

1

1 2

+1

2

1

+1

= 2

1

1

= 1

1

1

1

+1

1

2

18

= 1

1

1 2

+1
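As a sanity check on the construction above, the two-component system can be simulated mechanically for k = 2. The sketch below is ours, not part of the paper: it uses the single characters a, b, c for a_1, a_2, a_3 and S, T, A, B, X, Y for S_1, S_2, A_1, A_2, A′_1, A′_2, and it enumerates all (t∧≥2)-mode derivations up to a length bound.

```python
def one_step(rules, form):
    """All sentential forms obtained by applying one rule at one position."""
    out = []
    for lhs, rhs in rules:
        pos = form.find(lhs)
        while pos >= 0:
            out.append(form[:pos] + rhs + form[pos + 1:])
            pos = form.find(lhs, pos + 1)
    return out

def t_ge_k(rules, form, k, cap=14):
    """Successful (t and >= k)-mode applications of one component:
    at least k steps, after which no rule of the component applies."""
    results, stack, seen = set(), [(form, 0)], set()
    while stack:
        f, n = stack.pop()
        if (f, n) in seen:
            continue
        seen.add((f, n))
        nxt = one_step(rules, f)
        if not nxt and n >= k:
            results.add(f)
        stack.extend((g, n + 1) for g in nxt if len(g) <= cap)
    return results

# P_1 and P_2 for k = 2 (X, Y play the roles of the primed symbols):
P1 = [('A', 'aXb'), ('B', 'Yc'), ('A', 'ab'), ('B', 'c')]
P2 = [('S', 'T'), ('T', 'AB'), ('X', 'A'), ('Y', 'B')]

def generated_words(components, start, k, maxlen):
    seen, agenda = set(), [start]
    while agenda:
        f = agenda.pop()
        if f in seen:
            continue
        seen.add(f)
        for rules in components:
            agenda.extend(g for g in t_ge_k(rules, f, k) if len(g) <= maxlen)
    return {w for w in seen if w.islower()}   # keep the purely terminal words

words = generated_words([P1, P2], 'S', 2, 11)
assert words == {'abc', 'aabbcc', 'aaabbbccc'}
```

Every mixed application (some A_i primed, some already terminal) leaves a sentential form on which neither component can perform two steps, so only the words a^n b^n c^n survive, matching the argument in the proof.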

More specifically, for k ∈ ℕ and k > 1:

1. L(CF) = L(CD_1; CF[-λ]; t) = L(CD_1; CF[-λ]; (t∧≥k));
2. L(CF) = L(CD_2; CF[-λ]; t) ⊊ L(CD_2; CF[-λ]; (t∧≥k));
3. for all n ≥ 3 or n = ∞,

  L(ET0L) = L(CD_n; CF[-λ]; t) ⊊ L(CD_n; CF[-λ]; (t∧≥k)).

Proof. The strictness of the trivial inclusions follows by Corollary 5.6 and Theorem 5.8. □

Theorem 5.10 For k, n ∈ ℕ, and n > 2 or n = ∞, we have

  L(CD_n; CF[-λ]; ≥k) ⊊ L(CD_n; CF[-λ]; (t∧≥k)).

Proof. As according to the preceding corollary we have L(ET0L) ⊆ L(CD_3; CF[-λ]; (t∧≥k)), and since

  L = { a^(2^m) | m ≥ 0 } ∈ L(CD_3; CF[-λ]; (t∧≥k)),

by [9] we obtain L ∉ L(M,CF), which is a superset of L(CD_n; CF[-λ]; ≥k) by [13, Diagram 1]. The inclusion itself follows by Lemma 5.2 combined with Theorem 5.8. □

Furthermore, we trivially know that, for each k ∈ ℕ,

  L(CD_1; CF[-λ]; ≥k) = L(CD_1; CF[-λ]; (t∧≥k)),

since this language family equals L(CF). We do not know anything about the relation when we restrict our attention to at most two components. Possibly, L(CD_2; CF[-λ]; ≥k) and L(CD_2; CF[-λ]; (t∧≥k)) are incomparable, if k ≥ 2.

6 The (t∧≤k)- and (t∧=k)-mode

The combination of the t-mode and the ≤k- respectively =k-mode is especially interesting, since here we do not get the same power if we regard external or internal hybridization (at least, only weak inclusion relations are known). More excitingly, we even get purely syntactical characterizations of programmed languages with finite index restriction. If we further consider internal hybridization of the t-mode and the ≤k-mode (or =k-mode) combined with external hybridization of some other mode, we even obtain characterizations of the family of programmed languages with appearance checking, which means that we have a characterization of the recursively enumerable languages when admitting λ-productions. Such forms of characterizations were unknown before.

The next lemma is useful for proving that, in general, languages generated by CD grammar systems with (t∧≤k)-components are already generated by CD grammar systems with (t∧=ℓ)-components. Obviously, one component working in the (t∧≤k)-mode can be simulated by k components with the same productions as the original component, but working in the modes (t∧=1), …, (t∧=k). These k components can be replaced by Lemma 3.3 by k components all of them working in the (t∧=ℓ)-mode, where ℓ = lcm{1, …, k}. Hence we obtain:

Lemma 6.1 Every context-free component working in the (t∧≤k)-mode can be replaced by k components working in the (t∧=ℓ)-mode, ℓ = lcm{1, 2, …, k}. The simulating components have λ-productions only if the simulated component has λ-productions. □

Corollary 6.2

  ⋃_{k∈ℕ} L(CD_∞; CF[-λ]; (t∧≤k)) ⊆ ⋃_{k∈ℕ} L(CD_∞; CF[-λ]; (t∧=k)). □

We show later (see Theorem 6.18) that the inverse inclusion holds, too. In this way we get a characterization of the programmed context-free languages of finite index (with appearance checking); for a definition of the finite index restriction we refer to the next subsection. At this point we should give an example which demonstrates the power of CD grammar systems working in the (t∧=1)-mode.

Example: Let G be a CD grammar system with the following production sets:

  P_1  = {S → AB},
  P_2  = {A → aA_1b} ∪ {B_1 → F, B_2 → F, B_3 → F},
  P_3  = {B → B_1c} ∪ {A → F, A_2 → F, A_3 → F},
  P_4  = {A_1 → A_2} ∪ {B → F, B_2 → F, B_3 → F},
  P_5  = {B_1 → B_2} ∪ {A → F, A_1 → F, A_3 → F},
  P_6  = {A_2 → A} ∪ {B → F, B_1 → F, B_3 → F},
  P_7  = {B_2 → B} ∪ {A_1 → F, A_2 → F, A_3 → F},
  P_8  = {A → A_3} ∪ {B_1 → F, B_2 → F, B_3 → F},
  P_9  = {B → B_3} ∪ {A → F, A_1 → F, A_2 → F},
  P_10 = {A_3 → ab} ∪ {B → F, B_1 → F, B_2 → F},
  P_11 = {B_3 → c},

where {a, b, c} is the terminal and {A, B, A_1, B_1, A_2, B_2, A_3, B_3, F} is the nonterminal alphabet. The language generated by that system, if all components work in the (t∧=1)-mode, is the non-context-free language { a^n b^n c^n | n ≥ 1 }. Observe that we could have circumvented the use of a failure symbol by using a production X → X instead of X → F, due to the peculiarities of the (t∧=1)-mode.

For reasons of presentation of the proofs, we start with the proof of the announced fact on the characterization of programmed languages with appearance checking.
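The eleven-component example above can also be checked mechanically. The following sketch is ours, not part of the paper; it assumes the failure sets exactly as listed above and abbreviates A_1, B_1, A_2, B_2, A_3, B_3 by the single characters D, E, G, H, I, J.

```python
def one_step(rules, form):
    """All sentential forms obtained by applying one rule at one position."""
    out = []
    for lhs, rhs in rules:
        pos = form.find(lhs)
        while pos >= 0:
            out.append(form[:pos] + rhs + form[pos + 1:])
            pos = form.find(lhs, pos + 1)
    return out

def t_eq_1(rules, form):
    """(t and = 1)-mode: exactly one rule application, after which
    no rule of the component is applicable any more."""
    return {g for g in one_step(rules, form) if not one_step(rules, g)}

COMPONENTS = [
    [('S', 'AB')],                                        # P1
    [('A', 'aDb'), ('E', 'F'), ('H', 'F'), ('J', 'F')],   # P2
    [('B', 'Ec'), ('A', 'F'), ('G', 'F'), ('I', 'F')],    # P3
    [('D', 'G'), ('B', 'F'), ('H', 'F'), ('J', 'F')],     # P4
    [('E', 'H'), ('A', 'F'), ('D', 'F'), ('I', 'F')],     # P5
    [('G', 'A'), ('B', 'F'), ('E', 'F'), ('J', 'F')],     # P6
    [('H', 'B'), ('D', 'F'), ('G', 'F'), ('I', 'F')],     # P7
    [('A', 'I'), ('E', 'F'), ('H', 'F'), ('J', 'F')],     # P8
    [('B', 'J'), ('A', 'F'), ('D', 'F'), ('G', 'F')],     # P9
    [('I', 'ab'), ('B', 'F'), ('E', 'F'), ('H', 'F')],    # P10
    [('J', 'c')],                                         # P11
]

def generated_words(components, start, maxlen):
    seen, agenda = set(), [start]
    while agenda:
        f = agenda.pop()
        if f in seen or len(f) > maxlen:  # length bound keeps the search finite
            continue
        seen.add(f)
        for rules in components:
            agenda.extend(t_eq_1(rules, f))
    return {w for w in seen if w.islower()}

words = generated_words(COMPONENTS, 'S', 11)
assert words == {'abc', 'aabbcc', 'aaabbbccc'}
```

Any out-of-phase component application either fails the (t∧=1)-condition or introduces the failure symbol F, which no component can ever remove, so the system is forced through the cycle P_2, P_3, …, P_7 and the final phase P_8, …, P_11.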

6.1 Characterizing ordered and programmed languages with appearance checking

In this section we study HCD grammar systems with (t∧=k)-components in more detail. We give characterizations of the ordered context-free and the programmed context-free languages with appearance checking. First, we consider the case of HCD systems with components working in the (t∧=1)-mode.

Lemma 6.3 If ∅ ≠ F ⊆ {*, t} ∪ { ≤k, =k, ≥k, (≥k ∧ ≤ℓ), (t∧≥k) | k, ℓ ∈ ℕ, k ≤ ℓ } ∪ {=1, ≥1}, then L(O; CF[-λ]) ⊆ L(HCD_∞; CF[-λ]; F ∪ {(t∧=1)}).

Proof. Let G = (N, T, P, S, ≻) be an ordered grammar generating a language L. We assume that every production of P has a unique label p from a label set Λ; thus we can speak of the production p: A → w. For each of the cases of the *-, t-, and ≤k-mode we construct an equivalent HCD grammar system G′ with nonterminals

  N′ = {F} ∪ { A, (A, 1), (A, 2), (A_p, 3) | A ∈ N, p ∈ Λ }.

Each of these systems contains a production set P_init = {S_new → S} to start the derivation from a new start symbol S_new; the set P_init may work in any mode. Additionally, G′ contains production sets to perform the necessary colouring operations. The construction of these sets differs, because they depend on F. In the following, we describe the construction of these sets, which work in the considered mode.

*-mode: Set P_col = { A → (A, 1), (A, 1) → A | A ∈ N }.

t-mode: Here, we use two colouring components P_col,1 and P_col,2, which are defined as follows:

  P_col,1 = { A → (A, 1), A → (A, 2) | A ∈ N }   and   P_col,2 = { (A, 2) → A | A ∈ N }.

Finally, for every production p: A → w, with p ∈ Λ, instead of the original set P of the ordered grammar G we introduce two production sets P_p,1 and P_p,2 of the HCD grammar system as follows:

  P_p,1 = {(A, 1) → (A_p, 3)} ∪ { B → F | ∃ B → v ∈ P (A → w ≺ B → v) }
          ∪ { (B, 1) → F, (B, 2) → F | B ∈ N } ∪ { (B_q, 3) → F | B ∈ N, q ∈ Λ, B_q ≠ A_p },

  P_p,2 = {(A_p, 3) → w} ∪ { (B, 1) → F, (B, 2) → F, (B_q, 3) → F | B ∈ N, q ∈ Λ }.

Each of these components works in the (t∧=1)-mode.

First, consider the *-mode. We show by induction that we have a derivation in G′ of the form

  S_new ⇒_{P_init} S ⇒* uAv ⇒_{P_col} u(A, 1)v ⇒_{P_p,1} u(A_p, 3)v ⇒_{P_p,2} uwv,

where the last two steps are performed in the (t∧=1)-mode, for u, v ∈ (N ∪ T)* and A ∈ N, if and only if in the ordered grammar we find a derivation

  S ⇒* uAv ⇒_p uwv.

The only way to start the derivation process in G′ is to use the rule S_new → S. Assume now that we have already derived a word uAv in G′ and that we want to simulate an application of the rule p: A → w of the ordered grammar under consideration of the order relation ≻. An application of any P_q,1 or P_q,2 for some label q either does not change the sentential form or introduces a failure symbol F. So we have to apply the colouring component P_col, obtaining a word u′(A, 1)v′ where some nonterminals B are replaced by (B, 1). Since we want to apply rule p of the ordered grammar, only components P_q,1 for rules q with left-hand side A can rewrite the previously introduced symbol (A, 1), and P_q,2 would introduce a failure symbol. Thus, choose the set P_p,1. Since it works in the (t∧=1)-mode, this component checks (1) that u′ and v′ contain no symbol it would have to rewrite (the words remain unchanged) and (2) applies (A, 1) → (A_p, 3) under consideration of ≻. The only way to successfully rewrite (A_p, 3) is to use P_p,2. It would be possible to first introduce other occurrences of (A_p, 3), via a deviating derivation with P_col and P_p,1, but if this is done, we have no chance to terminate anymore, because P_p,2 works in the (t∧=1)-mode. So we have only one occurrence of (A_p, 3) in the current sentential form, and a final application of P_p,2 gives the word uwv. This proves our claim.

In the case of the t-mode one can make a similar analysis of the derivation process. The main difference lies in the fact that one now has two colouring components. The more complex modes (t∧≥k), ≥k, =k, and (≥k ∧ ≤ℓ) follow by simple prolongation techniques. □

Theorem 6.4 If ∅ ≠ F ⊆ {*, t} ∪ { ≤k | k ∈ ℕ } ∪ {=1, ≥1}, then

  L(HCD_∞; CF[-λ]; F ∪ {(t∧=1)}) = L(O; CF[-λ]).

Proof. Since the *-mode trivially coincides with the ≤k-mode, the =1-mode, and the ≥1-mode, it suffices to consider only the *-mode. The inclusion from right to left has been proved in the lemma above.

Conversely, let G = (N, T, S, P_1, …, P_n) be an HCD grammar system. For each P_i, 1 ≤ i ≤ n, we construct a bunch of productions of an ordered grammar. The construction depends on the mode of the component P_i. The ordered grammar G′ = (N′, T, P, S, ≻) has nonterminals

  N′ = N ∪ { [A, i], [A, i]′ | A ∈ N, 1 ≤ i ≤ n } ∪ {F}.

The production set P contains for every nonterminal A ∈ N and every 1 ≤ i ≤ n the colouring rules

  A → [A, i] ≺ { [B, j] → F | B ∈ N, i ≠ j } ∪ { [B, j]′ → F | B ∈ N, 1 ≤ j ≤ n }.

For 1 ≤ i ≤ n define the homomorphisms h_i: (N ∪ T)* → (N′ ∪ T)* with h_i(A) = [A, i] if A ∈ N and h_i(a) = a if a ∈ T. Then for every component P_i, 1 ≤ i ≤ n, we introduce a bunch of rewriting rules in P according to the mode in which the component works:

*-mode: For every production A → w in P_i set

  [A, i] → h_i(w) ≺ { B → F | B ∈ N } ∪ { [B, j] → F | B ∈ N, i ≠ j } ∪ { [B, j]′ → F | B ∈ N, 1 ≤ j ≤ n }.

For every B ∈ N we introduce the re-colouring rules

  [B, i] → B ≺ { [C, j] → F | C ∈ N, i ≠ j } ∪ { [C, j]′ → F | C ∈ N, 1 ≤ j ≤ n }.

t-mode: For every production A → w in P_i we set

  [A, i] → h_i(w) ≺ { B → F | B ∈ N } ∪ { [B, j] → F | B ∈ N, i ≠ j } ∪ { [B, j]′ → F | B ∈ N, 1 ≤ j ≤ n }.

In comparison to the *-mode, the re-colouring rules are a little more complicated, because we additionally have to test whether the applications of rules from P_i were done in a t-mode-like way. Therefore we set for every B ∈ N:

  [B, i] → B ≺ { [C, j] → F | C ∈ N, i ≠ j } ∪ { [C, j]′ → F | C ∈ N, 1 ≤ j ≤ n } ∪ { [C, i] → F | C occurs as a left-hand side in P_i }.

(t∧=1)-mode: This is the most complicated mode with respect to the simulation. Without loss of generality we assume that the component is reduced, i.e., there is no production A → w such that the word w contains a nonterminal that occurs as the left-hand side of a production of the component. Having this, we introduce for every rule A → w the following productions:

  [A, i] → [A, i]′ ≺ { B → F | B ∈ N } ∪ { [B, j] → F | B ∈ N, i ≠ j } ∪ { [B, j]′ → F | B ∈ N, 1 ≤ j ≤ n }.

Moreover, we need rules for the application of the rule itself:

  [A, i]′ → w ≺ { [B, j]′ → F | B ∈ N, 1 ≤ j ≤ n } ∪ { [B, j] → F | B ∈ N, i ≠ j } ∪ { B → F | B occurs as a left-hand side in P_i }.

Obviously, the (t∧=1)-mode simulation does not need a re-colouring mechanism.

By induction, the reader may verify that for each component the derivation is simulated correctly. We give an example: let the considered component P_i work in the *-mode. The reader may verify that

  u ⇒_i v   if and only if   u ⇒* h_i(u) ⇒* h_i(v) ⇒* v in G′,

where the first and the last of these derivations make use of the colouring and re-colouring components, respectively. The actual application of the rules that corresponds to the derivation made by P_i is h_i(u) ⇒* h_i(v). The correctness of the other modes can be seen similarly. □

It is very interesting to see that the power of HCD systems is enlarged if (t∧=1)-components work together with components working in, e.g., the =2-mode. In this way we obtain our first characterization of the programmed context-free languages with appearance checking.

Theorem 6.5 If ∅ ≠ F ⊆ { =k, ≥k, (≥k ∧ ≤ℓ), (t∧≥k) | 2 ≤ k ≤ ℓ }, then

  L(HCD_∞; CF[-λ]; F ∪ {(t∧=1)}) = L(P; CF[-λ]; ac).

Proof. It is sufficient to show how to simulate a programmed production (p: A → w, σ(p), φ(p)). We restrict ourselves to the case k = 2; the other cases can be seen using prolongation techniques. The failure case is simulated by a table working in the (t∧=1)-mode containing p → q′ for q ∈ φ(p), r → F for every label r, and A → F. There are colouring tables (working not in the (t∧=1)-mode) for de-priming labels and for priming, double-priming, and de-priming nonterminals. The success case is simulated by:

1. one table t_= working in the =2-, ≥2-, (≥2 ∧ ≤ℓ)-, or (t∧≥2)-mode: it contains p → p′ and A′′ → A; now, there is at least one occurrence of A in the string;
2. one table working in the (t∧=1)-mode: it contains q → F for every label q and r′ → F for every label r with r ≠ p, and A → w;
3. a second table working in the (t∧=1)-mode: it contains q → F for every label q and r′ → F for every label r with r ≠ p, and A → F (this prevents a premature short-cut application), and p′ → q′ for q ∈ σ(p).

If the first table t_= is used to introduce more than one occurrence of A, it is not possible to consume this symbol anymore. □

Now we turn to the case of (t∧=2)-components. This will give us another characterization of the programmed languages with appearance checking.

Theorem 6.6 If ∅ ≠ F ⊆ {*, t} ∪ { ≤k, ≥k, =k, (≥k ∧ ≤k′), (t∧≥k) | k, k′ ∈ ℕ, k ≤ k′ }, then

  L(HCD_∞; CF[-λ]; F ∪ {(t∧=2)}) = L(P; CF[-λ]; ac).

Proof. The inclusion from left to right is obvious for any boolean combination of elementary modes. Thus, it remains to prove the other direction. We first restrict ourselves to the case F = {*}; by standard techniques and arguments the construction given below generalizes to the other cases as well.

Let L ∈ L(P; CF[-λ]; ac) with L ⊆ T*; then

  L = ⋃_{a∈T} (aT⁺ ∩ L) ∪ (L ∩ T) ∪ (L ∩ {λ}).

Since L ∈ L(P; CF[-λ]; ac), each language L_a = { w ∈ T⁺ | aw ∈ L } is a programmed context-free language with appearance checking, due to the closure of that language family under derivatives and intersection with regular languages. Since L(HCD_∞; CF[-λ]; {*, (t∧=2)}) is obviously closed under union and contains the regular languages, it is sufficient to prove the present assertion for L′ = {a}L_a ∈ L(P; CF[-λ]; ac), provided that L_a ⊆ T⁺ belongs to L(P; CF[-λ]; ac).

We briefly sketch the construction of an (HCD_∞; CF[-λ]; {*} ∪ {(t∧=2)}) system of grammars that generates L′. Let G = (N, T, P, S, Λ, σ, φ) be a programmed grammar with appearance checking for the language L_a. We use a standard trick to code the actual label into the first nonterminal. Thus, we have additional nonterminals for colouring operations of the form p_i, p̄_i, and [A, p_i] for each production A → w ∈ P labelled by p_i. Then for every production in P we build components according to the following construction: let

  p_i: A → w, σ(p_i), φ(p_i)

be a production in P. Without loss of generality we can assume that p_i does not belong to both σ(p_i) and φ(p_i). All components below work in the *-mode, except the components P_init, P_{p_i,1}, P_{p_i,3}, and P_term, which work in the (t∧=2)-mode. To start the derivation, we introduce two new symbols S_init and S′, where the former is the new start symbol, and a production set

  P_init = { S_init → S′, S′ → pS | p ∈ Λ }.

The components given below are applicable successfully if and only if the actual sentential form contains neither the nonterminal A nor some marked nonterminal [A, p_k], and therefore the production with label p_i is not applicable. In addition, the label coded into the first letter of the sentential form is changed appropriately. Note that the component works in the (t∧=2)-mode:

  P_{p_i,1} = { A → F, [A, p_k] → F | p_k ∈ Λ } ∪ { p_i → p̄_i, p̄_i → p_k | p_k ∈ φ(p_i) }.

The next components simulate an application of the production A → w. In case A is rewritten, we apply the necessary colouring operation and choose one nonterminal A for rewriting, which is marked by [A, p_i]:

  P_{p_i,2} = { A → [A, p_i] }.

The check whether exactly one nonterminal was chosen by the previous production set for rewriting is done with the production set P_{p_i,3}, which works in the (t∧=2)-mode:

  P_{p_i,3} = { p_k → F, [B, p_k] → F | p_k ∈ Λ, p_i ≠ p_k, B ∈ N } ∪ { [B, p_i] → F | B ∈ N, A ≠ B }
              ∪ { p_i → p_k | p_k ∈ σ(p_i) } ∪ { [A, p_i] → w }.

In order to terminate the derivation process, one applies the following production set, which works in the (t∧=2)-mode:

  P_term = { A → F, [A, p_i] → F | A ∈ N, p_i ∈ Λ } ∪ { p → p̄, p̄ → a | p ∈ Λ }.

Observe that P_term can only be applied successfully to a sentential form which looks like pw with p ∈ Λ and w ∈ T*.

Assume that on a sentential form we want to apply the production

  p_i: A → w, σ(p_i), φ(p_i).

If the nonterminal A is not contained in the sentential form, the only way to simulate the non-application of A → w is to use the production set P_{p_i,1}. Thus, in the following the sentential form looks like p_i αAβ, with α, β ∈ (N ∪ T)* and A ∈ N. To start the simulation of the production successfully, we must use P_{p_i,2}. Note that P_term leads to a word containing a failure symbol, and an application of P_{p_i,3} is not possible since both α and β are words over N ∪ T. After the usage of P_{p_i,2} the derived word must look like p_i α[A, p_i]β, where both α and β do not contain any symbol of the form [B, p_k], as we will see below. A further successful simulation can only be guaranteed if one uses productions from P_{p_i,3}. Note that at this point an application of P_{p_i,1} is not possible. If the words α or β contained nonterminals of the form [B, p_k], the production set P_{p_i,3} would not be applicable, since it works in the (t∧=2)-mode. Thus, the word must have the form described above. Now applying P_{p_i,3} yields a word p_k αwβ. It is seen that the constructed grammar system simulates the original programmed grammar G and that the generated language equals L′. Observe that the grammar system (production set P_{p_i,3}) has λ-productions only if the programmed grammar has. This completes the construction for the case F = {*}. The only components that work in a mode other than (t∧=2) are the production sets P_{p_i,2} for p_i ∈ Λ.

Of course, the ≥k- and ≤k- as well as the interval mode (≥k, ≤k′) can be obtained using prolongation techniques. In the case of F = {t}, we use the standard trick to simulate the necessary colouring operations by two components working in the t-mode; the new nonterminals introduced for this purpose derive the failure symbol in all other components. The details of this construction and of the remaining modes are left to the interested reader. □

Remark: In the construction of the HCD system we really need the (t∧=2)-mode only for the components P_{p_i,3}, p_i ∈ Λ, to simultaneously rewrite the label and the coloured nonterminal [A, p_i]. With a slight modification, the remaining (t∧=2)-components can even be replaced by components working in the (t∧=1)-mode.

Using a simple prolongation technique (for the label colouring production only), one can also use (t∧=k)-components in the above construction. Thus, we obtain:
Corollary 6.7 Let ; 6= F  f; tg [ f  k;  k; = k; ( k^  k0); (t^  k) j k; k0 2 N; k  k0 g. For every `  2, then L(HCD1 ; CF[?]; F [ f(t^ = `)g) = L(P; CF[?]; ac): 2 In addition we can show that components in (t^ = `) are not necessary as long one can use components working in (t^  `)- and (t^  `)-mode. Corollary 6.8 1. L(HCD1 ; CF[?]; f(t^  1); (t^  1)g) = L(O; CF[?]). 2. For every k  2, L(HCD1; CF[?]; f(t^  k); (t^  k)g) = L(P; CF[?]; ac): Proof. Case (1) follows from Theorem 6.4, and case (2) is an immediate consequence of Theorem 6.5 and the prolongation technique. Together with Theorem 5.8, we get:

2

Corollary 6.9 For every k 2 N, 1. L(CD1 ; CF[?]; (t^  k)) ( L(HCD1 ; CF[?]; f(t^  k); (t^  k)g) and

2. L(CD1 ; CF[?]; (t^  k))  L(HCD1 ; CF[?]; f(t^  k); (t^  k)g).

Proof. The inclusion are obvious. It remains to prove the strictness of the former inclusion. In the next section we prove that the family of languages L(CD1; CF[?]; (t^  k)) coincides with the family of languages generated by ordered context-free grammars under the nite index restriction. Since the nite index restriction really restricts the power of ordered languages, the inclusion is strict for every k 2 N. 2 The reader may have noticed, that up to now we have nothing said about the (t^  k)-case. For k = 1, the situation was investigated in Theorem 6.4. In general, HCD language families with components working in (t^  k)mode are located between the ordered context-free languages and the programmed context-free languages with unconditional transfer. This is shown in the next theorem. 30

Theorem 6.10 Let ; 6= F  f; tg [ f  k j k 2 N g. For every ` 2 N, L(O; CF[?])  L(HCD1; CF[?]; F [ f(t^  `)g)  L(P; CF[?]; ut): Proof. The inclusion of the family of ordered languages within the family L(HCD1 ; CF[?]; F [ f(t^  `)g) already follows from Theorem 6.4.

Thus, it remains to show the containment of the latter language family in L(P; CF[?]; ut). Let G = (N; T; S; P ; : : :; Pn ) be a HCD grammar system where K is the maximal value k of the  k and (t^  k) components. For each Pi, for 1  i  n, we construct a bunch of productions in a programmed grammar with unconditional transfer. The constructions depends on the mode of the component Pi. We assume that each production of the grammar system has a unique label, and that the labels of productions in Pi are fpi; ; : : :; pi;r i g and that the productions look like pi;j : Ai;j ! wi;j . The programmed grammar G0 = (N [ fF g; T; P; S; ; ) has label set  = f pi ; p00i j 1  i  ng [ f p0i;j j 1  i  n; 1  j  r(i)g [ f (pi;j ; `) j 1  i  n; 1  j  r(i); 0  `  K g: Production set P contains for every 1  i  n the dummy productions pi : F ! F; f (pi;j ; 0) j 1  j  r(i) g and p00i : F ! F; f pj j 1  j  n g: Then for every component Pi , 1  i  n, we introduce the productions according the mode the component works: -mode: The simulation of the productions of Pi is done with the following rules: For every 1  j  r(i) we introduce (pi;j ; 0) : Ai;j ! wi;j ; f (pi;k ; 0) j 1  k  r(i) g [ fp00i g: 1

1

( )

t-mode: The construction for the t-mode is similar to the above given one. For every every 1  j  r(i) we introduce (pi;j ; 0) : Ai;j ! wi;j ; f (pi;k ; 0) j 1  k  r(i) g [ fp0i; g: Finally for every 1  j  r(i) ? 1 de ne p0i;j : Ai;j ! F; fp0i;j g and p0i;r i : Ai;j ! F; fp00i g: 1

+1

( )

31

 k-mode For every 1  j  r(i) and every 0  m  k ? 1 we introduce (pi;j ; `) : Ai;j ! wi;j ; f (pi;m; ` + 1) j 1  m  r(i) g [ fp00i g and we additional set (pi;j ; k) : F ! F; ;. (t^  k)-mode: The construction for this mode is nothing else than a com-

bination of the ideas used above. We left the details to the interested reader. By induction the reader may verify that G0 simulates G correctly. This proves our claim. 2 Since ordered languages are strictly included in programmed languages with unconditional transfer, one of the inclusions stated in the theorem above has to be strict. Finally, we want to mention one preliminary result on the number of components.

Theorem 6.11 If F = f; tg [ f  k;  k; = k; (t^  k); (t^  k); (t^ = k) j k 2 N g, then L(HCD1 ; CF[?]; F ) = L(HCD ; CF[?]; ftg [ f(t^  2); (t^  2)g): Proof. By our preceding results the former class coincides with the language family L(P; CF[?]; ac). Using Corollary 6.8 we construct a HCD system with say, n components working in (t^  2)-mode and m components working in (t^  2)-mode. 4

Finally we use the technique introduced by Mitrana [11, Theorem 4] to reduce the number of components in HCD systems. This lead us to two tcomponents which perform the necessary colouring operations and two components, one working in (t^  2) and the other one in (t^  2)-mode. The latter two components are constructed by putting the n (m, respectively) components together using n + m colours. The details of the construction is left to the reader. 2 We do not know whether three components are sucient or not. As regards two components, we have the following nice characterization. 32

Lemma 6.12 Let ; 6= F  f; tg [ f  k j k 2 N g. For  2 f; =g and n 2 f1; 2g, we have L(HCDn ; CF[?]; F [ f(t ^ 1)g) = L(CF): Proof. Let  2 f; =g and n 2 f1; 2g. By [4, Theorem 3.10], we know that L(CDn ; CF[?]; t) = L(CF). This especially implies L(HCDn ; CF[?]; f; tg [ f  k j k 2 N g [ f(t ^ 1)g)  L(CF). As shown in Theorem 7.3, L(CD ; CF[?]; (t ^ 1)) = L(LIN). 2

So, we have to show that a HCD grammar system G = (N; T; S; P ; Pt) containing one component Pt working in t-mode and one component P working in (t ^ 1)-mode only generates context-free languages. First, observe that P can only be successfully applied to a string containing exactly one nonterminal. Why? This situation may be true in the very beginning. Otherwise, due to the (t ^ 1)-mode P is working in, Pt must have been applied before. Assume that there are at least two nonterminals in the sentential form left over after applying Pt. P can only be applied once, changing one nonterminal into a string containing terminals plus nonterminals Pt can deal with. Therefore, at least one nonterminal remains with which neither Pt nor P can cope with, a contradiction. Secondly, we give a context-free grammar G0 = (N [ N 0; T; S 0; P ) simulating G. Let N 0 be the set of primed nonterminal symbols from N . Let f : (N [T ) ! 2 N [N 0[T  be de ned by f (w) = s(w)\((N [T )N 0(N [T )[T ), where the nite substitution s is de ned via s(A) = fA; A0g if A 2 N , and s(a) = fag, if a 2 T . Now, de ne 1

1

1

1

1

1

(

)

P = Pt [ f A0 ! v j v 2 f (w); A ! w 2 P [ Pt g : 1

The primed symbol is the one which forms the interface between the two original components. 2 Unfortunately, no such characterization exists for the case involving the (t ^ k)-mode with k > 1. We can state the following fact instead.

Lemma 6.13 For  2 f; =g, n 2 f1; 2g, k 2 N, k > 1, we have L(HCDn ; CF[?]; ftg [ f(t ^ 1)g) ( L(HCD ; CF[?]; ftg [ f(t ^ k)g): 2

33

Proof. The inclusion itself basically follows by prolongation technique. As

regards the strictness of the inclusion, in Theorem 5.8 we have shown that the non-context-free language

f anan : : :ank j n 2 N g 2 L(CD ; CF[?]; (t ^ k)); which is a trivial subfamily of L(HCD ; CF[?]; ftg [ f(t ^ k)g). 1

2

2

+1

2

2

Comparing the generative power of HCD systems using one or two components, respectively, with that using 3 components we obtain:

Lemma 6.14 Let  2 f; =g. For every k 2 N we have L(HCD ; CF[?]; ftg [ ft ^ k)g)  L(HCD ; CF[?]; ftg [ f(t ^ k)g): Especially for the (t ^ 1)-mode we have for n 2 f1; 2g, L(HCDn; CF[?]; ftg [ f(t ^ 1)g) ( L(HCD ; CF[?]; ftg [ f(t ^ 1)g): Proof. We have only to show the strictness of the latter inclusion inclusion. By [4, Theorem 3.10], L(ET0L) = L(CD ; CF[?]; t), and the latter is a trivial subfamily of L(HCD ; CF[?]; ft; (t^  1); (t^ = 1)g). Since L(CF) ( L(ET0L), the claim follows. 2 2

3

3

3

3

Nevertheless, we conjecture that the strict inclusion stated in the next Lemma is also valid for the case k > 1, since we presume that

L(HCD ; CF[?]; ftg [ f (t ^ k) j k 2 N g) 2

contains only semilinear sets.

6.2 Characterizing programmed languages of nite index

In the remainder of this section we show that CD grammar systems with components working in (t; = k)-mode only, characterize exactly the programmed context-free languages of nite index (with appearance checking). The nite index restriction is de ned as follows: 34

Let G be an arbitrary grammar type (from those discussed in Section 2) and let N , T , and S 2 N be its nonterminal alphabet, terminal alphabet, and axiom, respectively. For a derivation D : S = w ) w )    ) wn = w 2 T  according to G we set ind (D; G) = maxf jwijN j 1  i  n g; and, for w 2 T , we de ne ind (w; G) = minf ind (D; G) j D is a derivation for w in G g: The index of grammar G is de ned as ind (G) = supfind (w; G) j w 2 L(G) g: For a language L in the family L(X) of languages generated by grammars of type X we de ne ind (L) = inf f ind(G) j L(G) = L and G is of type X g: For a family L(X), we set Ln (X) = f[L j L 2 L(X) and ind (L)  n g for n 2 N, and L n (X) = Ln (X): 1

2

X

X

n1

We review some results on languages of nite index. The next theorem was proven in [15, Theorem 3] and also in [7]. In case of ET0L systems, the de nition of nite index refers to so-called active symbols instead of nonterminals.

Theorem 6.15 For every n 2 N, the following families of languages coincide: Ln (O; CF), Ln (O; CF ? ), Ln(P; CF), Ln (P; CF ? ), Ln (P; CF; ac), Ln (P; CF ? ; ac), and Ln (ET0L). How restrictive is the nite index restriction? We summarize some of the known results in the following theorem. Proofs are contained in [5, Chapter 5], [7], and [14, 15, 16]. 35

Theorem 6.16 For every n  2 we have: L(LIN) = L (CF) ( Ln (O; CF) ( Ln (O; CF) ( L n (O; CF) = L n (ET0L): 1

+1

More precisely,

Sn = f b(aib)  n +1

2 ( +1)

j i  1 g 2 Ln (O; CF) n Ln (O; CF) : +1

Furthermore,

L n (ET0L) ( L(ET0L) ( L(O; CF[?]) ( L(P; CF[?]; ut): A famous example of a context-free language which is not generable by an ordered grammar of nite index is the Dyck set (even over one pair of parentheses), as shown by Rozoy in [17]. Before we can state the main theorem of this section we need the following Corollary 4.4 from [7] which we quote here for the convenience of the reader. 6

Corollary 6.17 Every (O; CF) language of index n is generated by an (O; CF) grammar G = (N; T; P; S; ) of index n such that: 1. N contains a special failure symbol F not appearing on any left-hand side of a production in P . 2. P contains failure productions B ! F for every B 2 N , B 6= F . 3. For each production A ! w 2 P , either w 2 T  [ fF g or there exists a nonterminal B 2 N nfF g such that w = w1 Bw2 and A ! w  B ! F . 4. All nonterminals occurring in some sentential form v (not containing the failure symbol), which are derivable in a certain number of steps from the start symbol S , are pairwise distinct. Therefore, no production with non-failure symbols on its right-hand side can be applied twice in immediate sequence.

Now we are ready to prove our main theorem. Before, Salomaa showed in [18] that the Dyck set is not a context-free language of nite index. 6

36

Theorem 6.18 Let ` 2 N and  2 f; =g. Then we have: [ L(HCD1 ; CF[?]; f (t ^ k) j k  1 g) = L(CD1; CF[?]; (t ^ k)) k2N = L(CD1; CF[?]; (t ^ `)) = L(CD1; CF[?]; (t ^ 1)) = L n (P; CF[?]; ac) : Proof. We rst prove L(HCD1; CF[?]; f (t ^ k) j k  1 g)  L n (P; CF[?]; ac) : By Theorem 3.4 and Corollary 6.2 it suces to show that the language generated by a (CD1; CF[?]; (t ^ k)) grammar is of nite index. Let G be a (CD1 ; CF[?]; (t ^ k)) grammar system with nonterminal alphabet N , terminal alphabet T , and axiom S 2 N . One veri es that, if S ) ) w 2 T  for 2 (N [ T ), then for every A 2 N , the number of A's in is bounded by k, i.e., j jA  k. Otherwise, to does not lead to a terminal string. Thus, in general, the whole number of terminals in a sentential form that derives to a terminal word is bounded by at most k  jN j. This shows that language L(G) is of nite index. Trivially, L(CD1; CF[?]; (t^ = 1)) = L(CD1 ; CF[?]; (t^  1)), and by prolongation technique, we have for ` 2 N and  2 f; =g:

L(CD1; CF[?]; (t ^ 1))  L(CD1 ; CF[?]; (t ^ `)) : So, in order to complete our proof we have to show

L(CD1 ; CF[?]; (t^ = 1))  L n (P; CF[?]; ac) : By Theorem 6.15 and Corollary 6.17, it suces to show how to simulate an ordered grammar G = (N; T; P; S; ) given in the normal form of Corollary 6.17. For each production A ! w 2 P , where w is not the failure symbol F , we introduce a component

PA!w = fA ! wg [ f B ! B j B 2 N n fF g; A ! w  B ! F g : 37

It is easy to see (keeping in mind the properties of the normal form grammar as listed in Corollary 6.17) how a CD grammar system consisting of all such components PA!w simulates G. 2 Comparing the characterizations of the classes L(P; CF[?]; ac) respectively Lfin (P; CF[?]; ac), one observes that the main di erence lies in the absence of non-(t; = k)-modes. Comparing internal versus external hybridization, we get:

Corollary 6.19 For every k 2 N,  2 f; =g, we nd: L(CD1 ; CF[?]; (t ^ k)) ( L(HCD1; CF[?]; ftg [ fkg) : Proof. By Theorem 6.18, we know that L(CD1; CF[?]; (t ^ k)) char-

acterizes the family of nite index ET0L languages, see Theorem 6.15. By Theorem 6.16, the family of ET0L languages is a strict superfamily of its nite index counterpart. By [4, Theorem 3.10], L(ET0L) = L(CD1 ; CF[?]; t), and the latter family is a trivial subfamily of L(HCD1 ; CF[?]; ft; kg). 2

7 Inside finite index

By the results of the preceding section, we know that CD grammar systems working in a (t ∧ ⊲k)-mode with ⊲ ∈ {≤, =} and k ∈ ℕ characterize the programmed languages of finite index. Unfortunately, our proofs do not bound the number of components of the CD grammar system. This is not just a coincidence, as we will see in this section.

7.1 Counting components

Our task will be the study of the language families L(CD_n, CF[−λ], (t ∧ ⊲k)) for different n, k ∈ ℕ and ⊲ ∈ {≤, =}. First we give characterizations of two well-known language families, namely the family of finite languages and the family of linear languages.

Lemma 7.1 For every k ∈ ℕ and ⊲ ∈ {≤, =}, we have L(FIN) = L(CD_1, CF[−λ], (t ∧ ⊲k)).

Proof. Since we have only one component, by definition of the (t ∧ ⊲k)-mode, every derivation has length at most k, so that we only get finite languages. Conversely, if L = {w_1, …, w_m} ⊆ T* is some finite language, then the grammar G = ({S} × {1, …, k}, T, (S, 1), P) with

P = { (S, i) → (S, i+1) | 1 ≤ i < k } ∪ { (S, k) → w_j | 1 ≤ j ≤ m }

generates L: every derivation consists of exactly k − 1 renaming steps followed by one terminating step. □

Now we turn our attention to CD grammar systems with two components working in (t ∧ ⊲1)-mode for ⊲ ∈ {≤, =}.
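Before moving on, the one-component construction of Lemma 7.1 can be checked mechanically. The following sketch is our own illustration (the helper name derive_all is not from the paper): the chain (S, 1), …, (S, k) forces exactly k derivation steps, after which one of the m terminating rules fires.

```python
# Sketch of the single component of Lemma 7.1 (illustrative only).

def derive_all(k, words):
    """All terminal words reachable from the axiom (S, 1)."""
    results = set()

    def step(form):
        if isinstance(form, str):      # a terminal word has been reached
            results.add(form)
        else:
            s, i = form
            if i < k:
                step((s, i + 1))       # renaming rule (S,i) -> (S,i+1)
            else:
                for w in words:        # terminating rules (S,k) -> w_j
                    step(w)

    step(("S", 1))
    return results

# the system generates exactly the given finite language
assert derive_all(3, {"ab", "ba"}) == {"ab", "ba"}
```

Every derivation has length exactly k, so the component indeed complies with the (t ∧ =k) restriction as well as with (t ∧ ≤k).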

Lemma 7.2 For ⊲ ∈ {≤, =} we have L(LIN) = L(CD_2, CF[−λ], (t ∧ ⊲1)).

Proof. Let L be generated by the linear grammar G = (N, T, S, P). Grammar G is simulated by the CD grammar system G′ = (N ∪ N′, T, S, P_1, P_2), where N′ contains primed versions of the nonterminals of G, P_1 contains the colouring unit productions B → B′ for every nonterminal B ∈ N, and P_2 contains, for every production A → w ∈ P, a production A′ → w. The simulation of G by G′ proceeds by applying P_1 and P_2 in sequence until the derivation stops.

On the other hand, it is easy to see that no sentential form generated by some (CD_2, CF[−λ], (t ∧ ⊲1)) system (eventually leading to a terminal string) can contain more than one nonterminal. Otherwise, we must have applied a production A → w of, say, the first component, where w contains at least two nonterminals. None of the nonterminals occurring in w can be processed further by the first component, since otherwise the (t ∧ ⊲1)-mode restriction would be violated. But nearly the same argument applies to the second component, too: it can process at most one of the nonterminals just introduced. Hence, no terminal string is derivable in this way. Therefore, one can omit all productions containing more than one nonterminal on their right-hand sides, so that only linear rules remain. Furthermore, one can discard, in each component, all productions whose right-hand side is a nonterminal that also occurs as a left-hand side within that component. Now one can put all remaining productions together, yielding the rule set of a simulating linear grammar. □

In the general case, i.e., two components working together in (t ∧ ⊲k)-mode for k ∈ ℕ and ⊲ ∈ {≤, =}, we first give some lower bounds.
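As an aside, the colouring mechanism in the proof of Lemma 7.2 is easy to trace. The sketch below is our own code (not from the paper); it derives a^n b^n from the linear grammar S → aSb | ab by alternating P_1 (colouring) and P_2 (rewriting the primed copy), with the rule choice hard-wired for determinism.

```python
def derive(n):
    """Derive a^n b^n by alternating P1 (B -> B') and P2 (B' -> w)."""
    w = "S"
    for i in range(n):
        w = w.replace("S", "S'", 1)           # P1: colour the nonterminal
        rhs = "aSb" if i < n - 1 else "ab"    # P2: a rule A' -> w of G
        w = w.replace("S'", rhs, 1)
    return w

assert derive(3) == "aaabbb"
```

Each component makes exactly one step per activation, as required by the (t ∧ ⊲1)-mode.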

Theorem 7.3 Let k ∈ ℕ and ⊲ ∈ {≤, =}. Then, we have:

1. L(LIN) = L_1(CF) ⊆ L(CD_2, CF[−λ], (t ∧ ⊲1)); in general,
2. L_k(CF) ⊆ L(CD_2, CF[−λ], (t ∧ ⊲k)); more precisely,
3. if k > 1, then L_k(CF) ⊊ L(CD_2, CF[−λ], (t ∧ =k)); and
4. L_fin(CF) ⊊ ⋃_{k∈ℕ} L(CD_2, CF[−λ], (t ∧ =k)).

Proof. 1. It is easy to see that L(LIN) = L_1(CF). So, this statement is equivalent to the assertion of the last lemma.

2. Let G = (N, T, S, P) be a context-free grammar of index k. We assume without loss of generality that every nonterminal occurs as the left-hand side of some production in P. Let N′ be the set of primed nonterminal symbols. Grammar G is simulated by the CD grammar system G′ = (N ∪ N′, T, S, P_1, P_2), where P_1 contains the colouring unit productions B → B′ and B → B for every nonterminal B ∈ N, and P_2 contains, for every production A → w ∈ P, the productions A′ → w and A′ → A′.

3. In Theorem 5.8, we gave the example of a (CD_2, CF, (t ∧ ≤k)) grammar system generating the non-context-free language L = { a_1^n a_2^n ⋯ a_{k+1}^n | n ≥ 1 }. The same grammar system, viewed as a (CD_2, CF, (t ∧ =k)) grammar system, generates L, too.

4. This follows from 3. □

Unfortunately, we do not know whether the inclusion L_k(CF) ⊆ L(CD_2, CF[−λ], (t ∧ ≤k)) in the previous theorem is strict or not. By the prolongation technique, we know that the classes L(CD_n, CF[−λ], (t ∧ ⊲k)), for ⊲ ∈ {≤, =}, form a prime number lattice, i.e., L(CD_n, CF[−λ], (t ∧ ⊲k)) ⊆ L(CD_n, CF[−λ], (t ∧ ⊲ℓ·k)) for ℓ ∈ ℕ,

with the least element L(CD_n, CF[−λ], (t ∧ ⊲1)). Obviously, we also have the trivial inclusions L(CD_n, CF[−λ], (t ∧ ⊲k)) ⊆ L(CD_{n+1}, CF[−λ], (t ∧ ⊲k)) for ⊲ ∈ {≤, =}. The question arises whether all these hierarchies are strict, beside some (trivial) gaps shown below. In order to prove that this hierarchy (and similar ones, obtained by fixing the mode and varying the number of components) is strict, we show some general theorems relating the number of components and the bound on the number of symbols rewritten by one component to the finite index of a simulating programmed grammar.

Theorem 7.4 Let n, k ∈ ℕ and ⊲ ∈ {≤, =}. Then, we have:

L(CD_n, CF[−λ], (t ∧ ⊲k)) ⊆ L_{(n−1)·k}(P, CF[−λ], ac).

Proof. Let G = (N, T, S, P_1, …, P_n) be a CD grammar system in (t ∧ =k)-mode. Let P_i = { A_{ij} → w_{ij} | 1 ≤ j ≤ N(i) }. Grammar G can be simulated by the programmed grammar G′ = (N ∪ {F}, T, S, P, Λ, σ, φ) with label set

Λ = { (i, j, ℓ) | 1 ≤ i ≤ n, 1 ≤ j ≤ N(i), 1 ≤ ℓ ≤ k } ∪ { (i, j) | 1 ≤ i ≤ n, 1 ≤ j ≤ N(i) }.

Now, let 1 ≤ i ≤ n, 1 ≤ j ≤ N(i), and 1 ≤ ℓ ≤ k. Then, label (i, j, ℓ) carries the production A_{ij} → w_{ij}, the success field σ((i, j, ℓ)) = { (i, j′, ℓ+1) | 1 ≤ j′ ≤ N(i) } if ℓ < k, σ((i, j, k)) = {(i, 1)}, and the failure field φ((i, j, ℓ)) = ∅. Moreover, label (i, j) carries the production A_{ij} → F, the failure field φ((i, j)) = {(i, j+1)} if j < N(i), φ((i, N(i))) = { (i′, j′, 1) | 1 ≤ i′ ≤ n, 1 ≤ j′ ≤ N(i′) }, and the success field σ((i, j)) = ∅.

An application of P_i is simulated by a sequence of productions labelled with

(i, j_1, 1), …, (i, j_k, k), (i, 1), …, (i, N(i)).

In each such sequence, at most k symbols can be processed. Since there are n such sequences, only sentential forms containing at most n·k nonterminals can hope for termination. Therefore, the simulating programmed grammar has index at most n·k. In fact, we can do a little bit better: since one component must start the derivation process, only n − 1 components remain for further processing. Thus, the bound on the number of nonterminals is in fact (n−1)·k, which can be seen by induction.

The (t ∧ ≤k)-mode case can be treated similarly: we just define φ((i, j, ℓ)) to be equal to σ((i, j, ℓ)) instead of φ((i, j, ℓ)) = ∅. □

Before we can separate the L(CD_n, CF[−λ], (t ∧ ⊲k)) hierarchy for varying n's and k's, we need the following theorem. Recall that

S_n = { b(a^i b)^{2n} | i ≥ 1 } ∈ L_{n+1}(P, CF) ∖ L_n(P, CF).

Theorem 7.5 Let n, k ∈ ℕ. The language S_{nk} can be generated by a CD grammar system with n + 1 context-free components, without erasing productions, working in (t ∧ =k+1)-mode. In case of the (t ∧ ≤k+1)-mode, S_{nk} can be generated by a CD grammar system with 2n + 1 context-free components, without erasing productions.

Proof. The construction of the CD system is a little involved, but rather simple. We construct a CD grammar system G = (N, {a, b}, (S, 0), P_0, P_{1,1} ∪ P_{1,2}, …, P_{n,1} ∪ P_{n,2}) working in (t ∧ =k+1)-mode generating the language S_{nk}. Let

N = { (S, i) | 0 ≤ i ≤ n } ∪ { (q_i, 0), (q_i, 1) | 1 ≤ i ≤ n }
  ∪ { (t_i, j), (t′_i, j) | 1 ≤ i ≤ n, 0 ≤ j ≤ k } ∪ { (A_i, 0), (A_i, 1) | 1 ≤ i ≤ k·n },

and the production set looks as follows:

P_0 = { (S, 0) → (S, 1), (S, n) → (q_1, 0)(A_1, 0)b(A_2, 0)b⋯b(A_{nk}, 0)b }
    ∪ { (S, i) → (S, i+1) | 1 ≤ i < n }
    ∪ { (A_i, 1) → (A_i, 0) | 1 ≤ i ≤ n·k }
    ∪ { (q_i, 1) → (q_{i+1}, 0) | 1 ≤ i < n }
    ∪ { (q_n, 1) → (q_1, 0), (q_n, 1) → (t_1, 0) }
    ∪ { (t′_i, j) → (t′_i, j+1) | 1 ≤ i ≤ n, 0 ≤ j < k }
    ∪ { (t′_i, k) → (t_{i+1}, 0) | 1 ≤ i < n }
    ∪ { (t′_n, k) → b }.

For every 1 ≤ i ≤ n let P_i = P_{i,1} ∪ P_{i,2}, where

P_{i,1} = { (q_i, 0) → (q_i, 1) } ∪ { (A_j, 0) → a(A_j, 1)a | (i−1)·k < j ≤ i·k }

and

P_{i,2} = { (t_i, 0) → (t′_i, 0) } ∪ { (A_j, 0) → aba | (i−1)·k < j ≤ i·k }.

The only way to start the derivation is to use P_0, obtaining the word (q_1, 0)(A_1, 0)b(A_2, 0)b⋯b(A_{nk}, 0)b. To continue the derivation, one always has to apply P_i and P_0 in sequence, and P_i (P_0, respectively) is only successfully applicable to a sentential form beginning with the letter (q_i, 0) or (t_i, 0) ((q_i, 1) or (t′_i, 0), respectively). Assume we have a sentential form starting with the letter (q_i, 0); then rules from P_i replace exactly k occurrences of nonterminals (A_j, 0), for (i−1)·k < j ≤ i·k, by a(A_j, 1)a or aba, respectively, and the leading letter is changed to (q_i, 1). Applying now the corresponding rules from P_0 (the only way to continue), the derivation would block if at least one of the (A_j, 0)'s from the previous step had been replaced by aba; this ensures that all previously used productions are non-terminating ones. In case the sentential form starts with a letter (t_i, 0), an application of P_i followed by P_0 checks whether the terminating rules from P_i have all been used. Thus, one cycle, i.e., an application of P_0, P_1, P_0, P_2, …, P_n, and P_0, to the word (q_1, 0)(A_1, 0)b(A_2, 0)b⋯b(A_{nk}, 0)b leads to

(q_1, 0) a(A_1, 0)ab a(A_2, 0)ab ⋯ b a(A_{nk}, 0)ab,

and in general, running the cycle ℓ times, to the word

(q_1, 0) a^ℓ(A_1, 0)a^ℓ b a^ℓ(A_2, 0)a^ℓ b ⋯ b a^ℓ(A_{nk}, 0)a^ℓ b.

Note that in the last application of P_0 we could have taken the rule (q_n, 1) → (t_1, 0) in order to terminate the derivation process after finishing the next cycle. After that cycle, the grammar G has generated the word b(a^{ℓ+1} b)^{2nk}, since each (A_j, 0) contributes two blocks. In case of the (t ∧ ≤k+1)-mode, we use the same construction as above, but now treating each P_{i,j}, for 1 ≤ i ≤ n and 1 ≤ j ≤ 2, as an independent component of the grammar. This gives the bound 2n + 1 on the number of components. □

Now we are ready to investigate the classes L(CD_n, CF[−λ], (t ∧ ⊲k)) in more detail. First, let the number of components be fixed; thus, we study the prime number lattice of the L(CD_n, CF[−λ], (t ∧ ⊲k)) classes.
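The block structure produced by the cycles of Theorem 7.5 can be checked with a short computation. The following sketch is our own illustration (the function names are not from the paper): after ℓ growth cycles every (A_j, 0) is flanked by a^ℓ, and the terminating rules (A_j, 0) → aba then turn each of the nk symbols into two blocks a^{ℓ+1} b.

```python
import re

def sentential_form(n, k, l):
    """Form after l cycles: (q_1,0) a^l (A_1,0) a^l b ... a^l (A_nk,0) a^l b."""
    return "(q1,0)" + "".join(
        "a" * l + f"(A{j},0)" + "a" * l + "b" for j in range(1, n * k + 1))

def terminal_word(n, k, l):
    """Terminating cycle: every (A_j,0) -> aba; the q/t chain ends in b."""
    w = sentential_form(n, k, l).replace("(q1,0)", "b")
    return re.sub(r"\(A\d+,0\)", "aba", w)

# each A_j contributes a^{l+1} b a^{l+1} b, so we obtain b (a^{l+1} b)^{2nk}
assert terminal_word(2, 3, 4) == "b" + ("a" * 5 + "b") * (2 * 2 * 3)
```

This confirms that the generated word lies in S_{nk} = { b(a^i b)^{2nk} | i ≥ 1 }.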
Corollary 7.6 Let  2 f; =g and n 2 N be xed. The hierarchy of the families of languages Lk := L(CDn; CF[?]; (t ^ k)) with respect to k 2 N is strict, i.e., for every k 2 N there exists a k 0 2 N, k 0 > k such that Lk ( Lk0 . 43

Proof.

Consider the (t^ = k)-mode rst. By the preceding theorem we have that Snk belongs to L(CDn ; CF[?]; (t^ = k + 1)). On the other hand, Snk cannot belong to L(CDn ; CF[?]; (t^ = ` + 1)) if ` < k?1, since otherwise it would also belong to Ln k? (P; CF[?]) which is a contradiction because Snk is the separator language. This shows that there is a language in L(CDn ; CF[?]; (t^ = k + 1)) that does not belong to L(CDn ; CF[?]; (t^ = k ? 1)). Therefore, the whole hierarchy is strict for xed n. For the (t^  k)-mode, we can do not as good as above, since the bounds on on number of components to generate Snk is a little bit worse. Nevertheless, we conclude that the hierarchy is strict for large enough gaps (factor 3) for xed n. 2 Finally, lets consider the other hierarchy, i.e., we x k and vary the number of components: Corollary 7.7 Let  2 f; =g and k 2 N be xed. The hierarchy of the families of languages Ln := L(CDn ; CF[?]; (t^k)) with respect to n 2 N is strict, i.e., for every n 2 N there exists a n0 2 N, n0 > n such that Ln ( Ln0 . Proof. We argue similar to the preceding corollary. First consider the (t^ = k)-mode of derivation. We already know that language Snk 2 L(CDn ; CF[?]; (t^ = k + 1)) ; but it cannot belong to L(CD` ; CF[?]; (t^ = k + 1)) if ` < (1 ? k )  n. Otherwise, we would get a contradiction, since we would especially get Snk 2 L(CD ? k+11 n? ; CF[?]; (t^ = k + 1))  L ? k+11 n?  k (P; CF)  Lkn? (P; CF) : In case of the (t^  k)-mode, we obtain Snk 2 L(CD n ; CF[?]; (t^  k + 1)) n L(CD ` ; CF[?]; (t^  k + 1)) if ` < (1 ? k )  n by a similar reasoning. In both cases, it is seen that for large enough gaps in the number of components the hierarchies are strict. 2 Finally, let us consider the hierarchy if we do not keep the number of components n xed anymore, for the \small cases" of n. +1

+1 (

1)

+1

+1

+1

1 +1

+1

)

((1

)

((1

1)+1

1) ( +1)

1

2

1 +1

+1

2 +1

2

44

Lemma 7.8 Let k 2 N and  2 f; =g. L(CD ; CF[?]; (t ^ k)) ( L(CD ; CF[?]; (t ^ 1)) ( L(CD ; CF[?]; (t ^ k )): Proof. By our previous considerations, we know that the former two classes coincide with L(FIN) and L(LIN), respectively. The non-linear language f anbnambm j n; m 2 N g is generated by a CD 1

2

3

grammar system with k = 1 with the following three components:

P = fS ! AB; A0 ! A; B 0 ! B g P = fA ! aA0b; A ! ab; B 0 ! B 0g P = fB ! aB 0b; B ! ab; A ! A; A0 ! A0g: 1

2

3

First, P and P have to be applied in sequence, say n times, until P uses the rule A ! ab. Now, P can be applied. Then, P and P must be applied in sequence, say m ? 1 times, until P terminates the whole derivation using B ! ab. In this way, a word anbn ambm is derived. By the prolongation technique, the claimed assertion follows. 2 1

2

2

3

1

3

3

7.2 CD grammar systems of nite index

It is well-known that the class of programmed languages of index m can be characterized in various ways, compare, e.g., [5, 7, 16]. Especially, normal forms are available. For the reader's convenience, we quote [7, Theorem 3.2] in the following, since we will use it to give a sharpened and broadened version of [4, Theorem 3.26], which leads us to new characterizations of the classes Lm (P; CF) and L n (P; CF).

Theorem 7.9 For every (P; CF; ac) grammar G = (N; T; P; S; ; ; ) whose generated language is of index n 2 N, there exists an equivalent (P; CF; ac) grammar G0 = (N 0 ; T; P 0; S 0; 0 ;  0; 0) whose generated language is also of index n satisfying the following three properties: 1. There exists a special start production with a unique label p0 , which is the only production where the start symbol S 0 appears.

45

 v ) w is a 2. There exists a function f : 0 ! NN0 0 such that, if S 0 ) p derivation in G0 , then (f (p))(A) = jvjA for every nonterminal A.

v = w is a derivation in G0, 3. If D : S 0 = v0 =r) v =) v =)    =r) m m 1 1 r2 2 r3 then, for every vi, 0  i  m, and every nonterminal A, jvijA  1. In other words, every nonterminal occurs at most once in any derivable sentential form. Moreover, we may assume that either G0 is a (P; CF) grammar, i.e., we have 0 = ;, or that G0 is a (P; CF; ut) grammar, i.e., we have 0 = 0.

In the following, we will refer to a grammar satisfying the three conditions listed above as nonterminal separation form (NSF).

Theorem 7.10 Let m 2 N, FI = ftg[f = m00;  m00; ( m00;  m0); (t^ = k), (t^  k); (t^  k) j m0; m00; k 2 N; m0  m00  m g, and F contain all modes considered in this paper. Let f 2 FI . Then Lm(P; CF[?]) = Lm (CD1; CF[?]; f ) = Lm (HCD1; CF[?]; F ): Proof. Looking through all the proofs showing the containment of HCD languages within programmed languages with appearance checking, it is easily seen that all these constructions preserve indices. So, we have to show

Lm (P; CF[?])  Lm(CD1; CF[?]; f ) for every f 2 FI . Let L 2 Lm(P; CF) be generated by a programmed

grammar G = (N; T; P; S; ; ) in NSF. Especially, there exists a function f :  ! NN such that, if S ) v )p w is a derivation in G, then (f (p))(A) = jvjA  1 for every nonterminal A. We construct a simulating CD grammar system G0 given by 0

((N  ) [ (f i 2 N j i  m g N ); T; (1; S ); fPI g[f Pp;q j p 2  ^ q 2 (p) g) of index m all of whose components are working in one of the modes = m,

 m, t, ( m;  m0) (with m0  m), (t^ = m), (t^  m), or (t^  m). 46

Consider a production (p) = A ! w of G. We can assume (check) that (f (p))(A) = 1. Furthermore, de ne X n := (f (p))(B )  m B 2N

is the number of nonterminals in the current string (within a possible derivation leading to an application of p). Let the homomorphism hp;q : (N fpg[ T ) ! (N fqg[ T ) be de ned by (A; p) 7! (A; q) for A 2 N , a 7! a for a 2 T . For every q 2 , we introduce a component Pp;q within the CD grammar system containing the following productions: If n = m, then (A; p) ! hp;q (w) simulates the (successful) application of rule p. If n < m, then we prolong the derivation arti cially: (A; p) ! (1; A); (1; A) ! (2; A); : : : ; (m ? n; A) ! hp;q (w) : (B; p) ! hp;q(B ) for B 2 N n fAg keeps track of the information of the current state. As initialization component, we take PI containing (1; S ) ! (2; S ); : : :; (m; S ) ! (S; p) for every p 2  such that (f (p))(S ) = 1. Observe that, by induction, to every sentential form derivable from the initial symbol (1; S ), any component applied to it can make either 0 steps (so, we selected the wrong one) or exactly m steps. By a simple prolongation trick, we can also take components working in one of the modes = m00,  m00, ( m00;  m0) (with m0  m00), for some m00  m. Since the construction given in Theorem 6.18 is index-preserving, we can also take arbitrary (t^ = k) or (t^  k) components instead of requiring k  m. Since the t-mode and the (t^  1)-mode are identical, the prolongation technique delivers the result for the (t^  k)-mode for k 2 N in general. 2 Our theorem readily implies a characterization of programmed languages of general nite index. We summarize this fact together with results obtained via a di erent simulation in [4, Theorem 3.26] in the following corollary. Corollary 7.11 Let m; k; k0 2 N, k  2,  2 f; =; g. Then following families of languages coincide with L n (P; CF[?]): 47

1. L n (CD1 ; CF[?]; t), 2. L n (CD1 ; CF[?]; = k), 3. L n (CD1 ; CF[?];  k), 4. L(CD1 ; CF[?]; (t^ = k)),

5. L(CD1 ; CF[?]; (t^  k)), and

6. L n (CD1 ; CF[?]; (t ^ k 0)).

2

As regards the number of components, we can state the following:

Theorem 7.12 Let m 2 N,  2 f; =; g. Then, we have: 1. Lm (CD1 ; CF[?]; m) = Lm (CD ; CF[?]; m) and 2. Lm (CD1 ; CF[?]; (t ^ m)) = Lm (CD ; CF[?]; (t ^ m)). Proof. Since we restrict our attention to languages of index m, we can 3

3

simply carry over the proofs of the type \three is enough" for t-mode components, see [4, Theorem 3.10] and [11, Lemma 2]. In case of the (t^ = m)-mode, we have to go back to the simulation of the programmed grammar given in the previous theorem. It is clear that, due to NSF, we can prolong the simulation of a single nonterminal symbol.

2

It is quite natural to compare the such-de ned classes obtained via the index-m-restriction with their unrestricted counterparts. In our case, interestingly, also these unrestricted counterparts deliver nite index languages. However, as we have seen in Theorem 7.5,

Skn 2 L(CDk ; CF[?]; (t^ = n + 1)): +1

Especially, we have

S Since S

m?1)

2(

m?1) 2 L(CD3 ; CF[?]; (t^ = m)):

2(

is not of (programmed) index 2m ? 3, we can state: 48

Corollary 7.13 For m 2 N, we have Lm (CD ; CF[?]; (t^ = m)) ( L(CD ; CF[?]; (t^ = m)): Proof. Our previous considerations deliver the case m > 2, since S m? is not a (programmed) language of index 2m ? 3, and therefore not a (pro3

3

2(

1)

grammed) language of index m; therefore, S m? 2= Lm(CD ; CF[?]; (t^ = m)) : In case m = 2, we know that S 2= L (CD ; CF[?]; (t^ = 2)) : On the other hand, the (CD ; CF[?]; (t^ = 2)) grammar system with the following three components generates S , starting with S : P = fA ! aA0a; A ! aba; B ! aB 0a; B ! B 00g P = fB 0 ! B; C ! aC 0a; A ! F; B 00 ! aba; C ! abag P = fS ! bAbBbCb; C 0 ! C; A0 ! A; B 0 ! F g: We have L (CD ; CF[?]; (t^ = 1)) = L(LIN) : The example f anbnambm j n; m 2 N g 2= L(LIN) was shown in Lemma 7.8 to be in L(CD ; CF[?]; (t^ = 1)). 2 If we admit four components (or arbitrary many), by a similar reasoning we can separate all corresponding classes, since 3(m ? 1) = 3m ? 3 > m for m  2. Corollary 7.14 For m; n 2 N, n  4 (or n = 1), we know: Lm (CDn ; CF[?]; (t^ = m)) ( L(CDn; CF[?]; (t^ = m)): 2 Observe that the last two corollaries are quite astonishing if one keeps in mind that [ L n (P; CF) = Lm(CDn; CF[?]; (t^ = m)) m2N [ = L(CDn; CF[?]; (t^ = m)): 2(

3

1)

3

2

3

3

3

1

2

3

1

3

3

for all n 2 N; n > 2.

N

m2

49

8 Summary and Open Problems

In this paper, we investigated an internal mode hybridization and contrasted it with the previously examined external hybridization. In most cases, internally hybrid modes can be used to characterize their externally hybrid counterparts, with the exception of the combination of t and =k, which is indeed the most interesting one. In that case, the internal hybridization is weaker than the external one, since we get stuck at semilinear (finite index) languages. On the other hand, if we combine an internally hybrid mode of the form (t ∧ =k) (where k ≥ 2) with one of the classical modes via external hybridization, it is even possible to characterize the recursively enumerable sets, which does not seem to be possible using classical modes in HCD grammar systems with context-free core rules (or is at least unknown). In our opinion, the greatest open problems in this area are:

1. What is the exact relation between the following language families for varying k ∈ ℕ:
   (a) L(CD_∞, CF[−λ], ≥k),
   (b) L(CD_∞, CF[−λ], =k), and
   (c) L(CD_∞, CF[−λ], (t ∧ ≥k))?

2. Are the following inclusions strict?
   (a) L(HCD_∞, CF[−λ], { =k | k ∈ ℕ }) ⊆ L(P, CF[−λ]), or
   (b) L(HCD_∞, CF[−λ], {t} ∪ { =k | k ∈ ℕ }) ⊆ L(P, CF[−λ], ac)?

Let us finally mention that internally hybrid modes are also very interesting from a quite different viewpoint: when using (H)CD grammar systems as language acceptors, we were able to obtain the first examples where generating grammars are more powerful than accepting grammars, which solves an open problem in the theory of accepting grammars, see [2]. We will discuss this topic in detail in a forthcoming paper.

References

[1] A. Atanasiu and V. Mitrana. The modular grammars. International Journal of Computer Mathematics, 30:101–122, 1989.

[2] H. Bordihn and H. Fernau. Accepting grammars and systems: an overview. In J. Dassow, G. Rozenberg, and A. Salomaa, editors, Developments in Language Theory II; at the crossroads of mathematics, computer science and biology, pages 199–208, 1996.

[3] E. Csuhaj-Varjú and J. Dassow. On cooperating/distributed grammar systems. J. Inf. Process. Cybern. EIK (formerly Elektron. Inf.verarb. Kybern.), 26(1/2):49–63, 1990.

[4] E. Csuhaj-Varjú et al. Grammar Systems: A Grammatical Approach to Distribution and Cooperation. London: Gordon and Breach, 1994.

[5] J. Dassow and Gh. Păun. Regulated Rewriting in Formal Language Theory, volume 18 of EATCS Monographs in Theoretical Computer Science. Berlin: Springer, 1989.

[6] H. Fernau and R. Freund. Bounded parallelism in array grammars used for character recognition. In P. Perner, P. Wang, and A. Rosenfeld, editors, Advances in Structural and Syntactical Pattern Recognition (Proceedings of the SSPR'96), volume 1121 of LNCS, pages 40–49, 1996.

[7] H. Fernau and M. Holzer. Regulated finite index language families collapse. Technical Report WSI–96–16, Universität Tübingen (Germany), Wilhelm-Schickard-Institut für Informatik, 1996.

[8] H. Fernau and D. Wätjen. Remarks on regulated limited ET0L systems and regulated context-free grammars. Technical Report WSI–96–19, Universität Tübingen (Germany), Wilhelm-Schickard-Institut für Informatik, 1996. Accepted for publication in Theoretical Computer Science.

[9] D. Hauschildt and M. Jantzen. Petri net algorithms in the theory of matrix grammars. Acta Informatica, 31:719–728, 1994.

[10] R. Meersman and G. Rozenberg. Cooperating grammar systems. In Proceedings of Mathematical Foundations of Computer Science MFCS'78, volume 64 of LNCS, pages 364–374. Berlin: Springer, 1978.

[11] V. Mitrana. Hybrid cooperating/distributed grammar systems. Computers and Artificial Intelligence, 12(1):83–88, 1993.

[12] N. J. Nilsson. Principles of Artificial Intelligence. Springer, 1982.

[13] Gh. Păun. On the generative capacity of hybrid CD grammar systems. J. Inf. Process. Cybern. EIK (formerly Elektron. Inf.verarb. Kybern.), 30(4):231–244, 1994.

[14] G. Rozenberg and D. Vermeir. On ET0L systems of finite index. Information and Control (now Information and Computation), 38:103–133, 1978.

[15] G. Rozenberg and D. Vermeir. On the effect of the finite index restriction on several families of grammars. Information and Control (now Information and Computation), 39:284–302, 1978.

[16] G. Rozenberg and D. Vermeir. On the effect of the finite index restriction on several families of grammars; Part 2: context dependent systems and grammars. Foundations of Control Engineering, 3(3):126–142, 1978.

[17] B. Rozoy. The Dyck language D′₁* is not generated by any matrix grammar of finite index. Information and Computation (formerly Information and Control), 74:64–89, 1987.

[18] A. K. Salomaa. On the index of a context-free grammar and language. Information and Control (now Information and Computation), 14:474–477, 1969.

[19] S. H. von Solms. Some notes on ET0L-languages. International Journal of Computer Mathematics, 5(A):285–296, 1976.
