Pattern Languages versus Parallel Communicating Grammar Systems

Sorina Dumitrescu
University of Bucharest, Faculty of Mathematics Str. Academiei 14, 70109 Bucuresti, Romania
Gheorghe Paun
Institute of Mathematics of the Romanian Academy PO Box 1-764, 70700 Bucuresti, Romania
Arto Salomaa
Academy of Finland and Turku University Department of Mathematics 20014 Turku, Finland
Turku Centre for Computer Science TUCS Technical Report No 42 September 1996 ISBN 951-650-835-9 ISSN 1239-1891
Abstract

We compare the power of two (fairly different) recently investigated language identifying devices: patterns and parallel communicating (PC) grammar systems. The simulation of multi-patterns by context-free PC grammar systems is rather obvious, but, unexpectedly, it can also be realized by (non-centralized) PC grammar systems with right-linear components. Moreover, infinite multi-patterns forming a regular set can also be simulated by PC grammar systems with right-linear components, whereas PC grammar systems with context-free components can simulate context-free multi-patterns with context-free domains for variables.
TUCS Research Group
Mathematical Structures of Computer Science
1 Introduction

The aim of this paper is to compare (the power of) two language identifying devices which were recently investigated in rather different contexts: patterns and parallel communicating (PC) grammar systems.

We consider here a pattern in the sense of [2], as a string of terminal symbols and variables; interpreting the variables uniformly (replacing them by terminal strings, different occurrences of the same variable being replaced by the same string), we obtain a language, the set of all strings of the same "shape" as defined by the pattern. The main concern of [2] is to learn a pattern starting from a set of strings associated with it. However, the notion of a pattern is related to many fundamental issues in formal language theory and combinatorics on words, starting with avoidable and unavoidable patterns in strings, [3], [20], [25], and ending with natural decidability problems (equivalence and inclusion, [12], [13], [22], ambiguity, [15], etc.), and with possibilities to use patterns for building generative mechanisms, [7], [19], [23]; close connections with two-level grammars are also discussed in [1].

Another promising recent branch of formal language theory is grammar system theory. The basic idea is to consider several usual grammars and to make them cooperate in order to generate a single, common language. In this way, grammatical models for such important notions as multi-agent systems, distributed computing, synchronization, communication, etc. are obtained. The field was initiated in [16] (with motivations related to two-level grammars) and developed mainly after being related in [5] to distributed architectures in Artificial Intelligence (the blackboard model, [21]). Details can be found in [6].

We consider here the so-called parallel communicating (PC) grammar systems introduced in [24]: the grammars work synchronously, on their own sentential forms, and communicate by request.
Specifically, there are special nonterminal symbols, $Q_j$, called query symbols, such that, when a component $i$ introduces a query symbol $Q_j$, the current sentential form of the component $j$ is sent to the component $i$, where it replaces the occurrence of $Q_j$ in the sentential form of the component $i$. If several occurrences of $Q_j$ are introduced at the same step, then each of them is replaced by the current sentential form of the component $j$. This is very close to the way of uniformly replacing the variables in patterns, hence it is quite natural to investigate the relationship between the generative power of patterns and of PC grammar systems.

Such a comparison is also important for the study of PC grammar systems: if patterns can be simulated by PC grammar systems (we shall see below that this is the case), then information about the power of PC grammar systems can be obtained. Moreover, results already known for patterns (undecidability, complexity) can be carried over to PC grammar systems.

As expected, PC grammar systems prove to be able to simulate patterns, multi-patterns (finite sets of patterns; the generated language is the union of the languages generated by each pattern in the set), and even infinite (but regular or context-free) multi-patterns, even when the variables are replaced by strings in given regular (or context-free) languages associated with them. Surprisingly, when variables have associated regular languages, PC grammar systems with right-linear rules are still able to simulate multi-patterns and infinite regular multi-patterns. This is mainly due to the intricate behavior of PC grammar systems without restrictions on the communication graph, a feature already known from [6], [8], [17], etc.
2 Patterns and multi-patterns

For an alphabet $A$, we denote by $A^*$ the free monoid generated by $A$; $\lambda$ is the empty string, $|x|$ is the length of $x \in A^*$, and $|x|_B$ is the number of occurrences of symbols from $B \subseteq A$ in $x \in A^*$. $REG, CF, CS, RE$ denote the families in the Chomsky hierarchy.

Given an alphabet $A$ and a set $V = \{X_1, \dots, X_n\}$ disjoint from $A$ (the elements of $V$ are called variables, those of $A$ are called terminals), a pattern over $V \cup A$ is a string $\pi \in (V \cup A)^*$. (Note that, in contrast with [2], [12], we also accept terminal strings $\pi \in A^*$ as patterns, not only strings with $|\pi|_V > 0$.)

Let us suppose that with each variable $X_i \in V$ we have associated a language $D_i \subseteq A^*$ in a given family $F$. ($D_i$ is called the domain of $X_i$.) Denote by $D$ the sequence of languages $D_1, \dots, D_n$ and by $H_D$ the set of morphisms $h : (V \cup A)^* \to A^*$ such that

\[ h(a) = a, \text{ for } a \in A, \qquad h(X_i) \in D_i, \text{ for } 1 \le i \le n. \]

Then, the language identified (we also say generated) by a pattern $\pi \in (V \cup A)^*$ is defined by

\[ L_D(\pi) = \{ h(\pi) \mid h \in H_D \}. \]

Note that the terminals of $\pi$ remain unchanged, whereas the variables are uniformly replaced by strings in the associated languages ("uniformly" means here that multiple occurrences of the same variable are replaced by the same string).
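The definition can be made concrete with a small sketch that enumerates the language of a pattern when every domain is replaced by a finite sample, so that the set of admissible morphisms becomes finite. The encoding (a pattern as a list of symbols, variables as keys of a dictionary of domains) and the name `pattern_language` are assumptions made only for this illustration.

```python
from itertools import product

def pattern_language(pattern, domains):
    """Return the set of all h(pattern), where h fixes the terminals and maps
    each variable X to some string of domains[X] (finite samples only)."""
    variables = sorted({s for s in pattern if s in domains})
    words = set()
    for choice in product(*(sorted(domains[x]) for x in variables)):
        h = dict(zip(variables, choice))
        # every occurrence of a variable receives the same string
        words.add("".join(h.get(s, s) for s in pattern))
    return words

# the pattern X1 a X1 with a three-element sample of the domain of X1
sample = pattern_language(["X1", "a", "X1"], {"X1": {"a", "b", "ab"}})
print(sorted(sample))
```

Both occurrences of `X1` are replaced by the same string in each word, which is exactly the uniformity condition of the definition.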
A multi-pattern is a finite set of patterns, $\Pi = \{\pi_1, \dots, \pi_m\}$. The associated language, with respect to $D$ as above, is

\[ L_D(\Pi) = \bigcup_{i=1}^{m} L_D(\pi_i). \]
The family of all languages of the form $L_D(\Pi)$, for $D_1, \dots, D_n$ in a given family $F$, is denoted by $MPL_F$. Here we will consider only $F \in \{REG, CF\}$. Since we are dealing with multi-patterns in this paper, the distinction between erasing and nonerasing patterns, [12], is not important in our considerations, see Lemma 3 below.

We present here some examples and counterexamples, some of them useful in the subsequent sections.

Example 1. For $V = \{X_1\}$, $A = \{a, b\}$, $\pi_1 = X_1 X_1$, $D_1 = A^*$, we get

\[ L_D(\pi_1) = \{ ww \mid w \in \{a, b\}^* \}. \]

Note that this language, a model of the replication phenomenon in natural languages, is not context-free, but $L_D(\pi_1) \in MPL_{REG}$.

Example 2. For $V = \{X_1, X_2\}$, $A = \{a, b, c\}$, $\pi_2 = X_1 X_2 c X_2 X_1$, $D_1 = a^+$, $D_2 = b^+$, we have

\[ L_D(\pi_2) = \{ a^n b^m c b^m a^n \mid n, m \ge 1 \}, \]

hence $L_D(\pi_2) \in MPL_{REG}$, too.

Example 3. If $L \subseteq A^*$, $L \in F$, then $L \in MPL_F$: for $V = \{X_1\}$, $\pi = X_1$, $D_1 = L$, we have $L_D(\pi) = L$.

Proofs of the following three lemmas can be found in [9].

Lemma 1. $\{a^n b^n \mid n \ge 1\} \notin MPL_{REG}$.

Lemma 2. $\{a^n b^n c^n \mid n \ge 1\} \notin MPL_{CF}$.

Lemma 3. For every multi-pattern $\Pi$ (over $A$ and $V = \{X_1, \dots, X_n\}$) and $D = (D_1, \dots, D_n)$, domains in a family $F$, there is $\Pi'$ (also over $A$ and $V$) such that $L_D(\Pi) = L_{D'}(\Pi')$, for $D' = (D_1 - \{\lambda\}, \dots, D_n - \{\lambda\})$.
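One way to see why Lemma 3 is plausible (the proof itself is in [9]; the construction below is an assumption of this sketch, not a quotation of that proof) is the following: whenever a variable is mapped to the empty string, the same word is obtained from the pattern in which that variable has been deleted. Collecting all such deletions gives a multi-pattern that works with domains not containing the empty string. The helper names `interpret`, `multi_language` and `eliminate_erasing` are hypothetical, and the check runs on finite samples of the domains only.

```python
from itertools import combinations, product

def interpret(pattern, domains):
    """Language of one pattern over finite sample domains."""
    variables = sorted({s for s in pattern if s in domains})
    words = set()
    for choice in product(*(sorted(domains[v]) for v in variables)):
        h = dict(zip(variables, choice))
        words.add("".join(h.get(s, s) for s in pattern))
    return words

def multi_language(patterns, domains):
    """Union of the languages of the patterns of a multi-pattern."""
    return set().union(*(interpret(p, domains) for p in patterns))

def eliminate_erasing(patterns, domains):
    """Delete, in all possible ways, the variables whose domain contains the
    empty string, and remove the empty string from every domain."""
    erasable = {v for v, d in domains.items() if "" in d}
    new_patterns = set()
    for p in patterns:
        present = sorted(erasable & set(p))
        for r in range(len(present) + 1):
            for dropped in combinations(present, r):
                new_patterns.add(tuple(s for s in p if s not in dropped))
    new_domains = {v: d - {""} for v, d in domains.items()}
    return new_patterns, new_domains

patterns = {("X1", "a", "X2", "X1")}
domains = {"X1": {"", "b"}, "X2": {"", "a", "ab"}}
new_patterns, new_domains = eliminate_erasing(patterns, domains)
print(multi_language(patterns, domains) == multi_language(new_patterns, new_domains))
```

The number of patterns can grow exponentially in the number of erasable variables, which is harmless here because a multi-pattern is only required to be finite.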
3 Parallel communicating grammar systems
A PC grammar system of degree $n$, $n \ge 1$ ([24], [6]), is a construct

\[ \Gamma = (N, T, K, (P_1, S_1), \dots, (P_n, S_n)), \]

where $N, T, K$ are pairwise disjoint alphabets, with $K = \{Q_1, \dots, Q_n\}$, $S_i \in N$, and $P_i$ are finite sets of rewriting rules over $N \cup T \cup K$, $1 \le i \le n$; the elements of $N$ are nonterminal symbols, those of $T$ are terminals; the elements of $K$ are called query symbols; the pairs $(P_i, S_i)$ are the components of the system. Note that, by their indices, the query symbols are associated with the components. When discussing the type of the components in the Chomsky hierarchy, the query symbols are interpreted as nonterminals.

For $(x_1, \dots, x_n), (y_1, \dots, y_n)$, with $x_i, y_i \in (N \cup T \cup K)^*$, $1 \le i \le n$ (we call such an $n$-tuple a configuration), we write $(x_1, \dots, x_n) \Rightarrow_r (y_1, \dots, y_n)$ if one of the following two cases holds:

(i) $|x_i|_K = 0$ for all $1 \le i \le n$; then $x_i \Rightarrow_{P_i} y_i$ or $x_i = y_i \in T^*$, $1 \le i \le n$;

(ii) there is $i$, $1 \le i \le n$, such that $|x_i|_K > 0$; we write such a string $x_i$ as $x_i = z_1 Q_{i_1} z_2 Q_{i_2} \dots z_t Q_{i_t} z_{t+1}$, for $t \ge 1$, $z_j \in (N \cup T)^*$, $1 \le j \le t+1$; if $|x_{i_j}|_K = 0$ for all $1 \le j \le t$, then $y_i = z_1 x_{i_1} z_2 x_{i_2} \dots z_t x_{i_t} z_{t+1}$ [and $y_{i_j} = S_{i_j}$, $1 \le j \le t$]; otherwise $y_i = x_i$. For all unspecified $i$ we have $y_i = x_i$.

Point (i) defines a rewriting step (componentwise, synchronously, using one rule in all components whose current strings are not terminal); (ii) defines a communication step: the query symbols $Q_{i_j}$ introduced in some $x_i$ are replaced by the associated strings $x_{i_j}$, provided that these strings do not contain further query symbols. Communication has priority over rewriting (a rewriting step is allowed only when no query symbol appears in the current configuration). The work of the system is blocked when circular queries appear, as well as when no query symbol is present but point (i) cannot be applied because a component cannot rewrite its sentential form, although it is a nonterminal string.

The relation $\Rightarrow_r$ considered above is said to be performed in the returning mode: after communicating, a component resumes working from its axiom. If the bracketed part, [and $y_{i_j} = S_{i_j}$, $1 \le j \le t$], is removed, then we obtain the non-returning mode of derivation: after communicating, a component continues the processing of its current string. We denote by $\Rightarrow_{nr}$ the relation obtained in this way.

The language generated by $\Gamma$ is the language generated by its first component (the component $(P_1, S_1)$ above), when starting from $(S_1, \dots, S_n)$, that is

\[ L_f(\Gamma) = \{ w \in T^* \mid (S_1, \dots, S_n) \Rightarrow_f^* (w, \alpha_2, \dots, \alpha_n), \text{ for } \alpha_i \in (N \cup T \cup K)^*, \ 2 \le i \le n \}, \quad f \in \{r, nr\}. \]
(No attention is paid to strings in the components $2, \dots, n$ in the last configuration of a derivation; moreover, it is supposed that the work of $\Gamma$ stops when a terminal string is obtained by the first component.)

Two basic classes of PC grammar systems can be distinguished: centralized (only the first component, the master of the system, is allowed to introduce query symbols), and non-centralized (no restriction is imposed on the introduction of query symbols). Therefore, we get four basic families of languages: denote by $PC(X)$ the family of languages generated in the returning mode by non-centralized PC grammar systems with rules of type $X$ (and of arbitrary degree); when centralized systems are used, we add the symbol $C$, when the non-returning mode of derivation is used, we add the symbol $N$, thus obtaining the families $CPC(X), NPC(X), NCPC(X)$. As for $X$, we consider here right-linear ($RL$) and context-free ($CF$) rules. In both cases, we allow only $\lambda$-free rules. If the language we consider contains the empty string, then a rule $S \to \lambda$ is allowed in the master grammar. (Note that, because the derivation stops at that moment, $\lambda$ cannot be communicated to another component.)

The diagram in Figure 1 indicates the relations between the eight basic families of languages discussed in this paper, as well as their relationships with families in the Chomsky hierarchy ($MAT$ denotes the family of languages generated by matrix grammars with $\lambda$-free context-free rules and without appearance checking, and $LIN$ is the family of linear languages). The arrows indicate inclusions, not necessarily proper; the families not connected by a path are not necessarily incomparable. Proofs of these relations can be found in [6], [8], [17], [18].

Let us consider two examples. For the system

\[ \Gamma_1 = (\{S_1, S_1', S_2, S_3\}, \{a, b, c\}, K, (P_1, S_1), (P_2, S_2), (P_3, S_3)), \]
\[ P_1 = \{S_1 \to abc, \ S_1 \to a^2b^2c^2, \ S_1 \to aS_1', \ S_1 \to a^3Q_2, \ S_1' \to aS_1', \ S_1' \to a^3Q_2, \ S_2 \to b^2Q_3, \ S_3 \to c\}, \]
\[ P_2 = \{S_2 \to bS_2\}, \qquad P_3 = \{S_3 \to cS_3\}, \]

we obtain

\[ L_r(\Gamma_1) = L_{nr}(\Gamma_1) = \{a^n b^n c^n \mid n \ge 1\}, \]

hence this language belongs to both $CPC(RL)$ and $NCPC(RL)$. (The primed nonterminal $S_1'$ ensures that the rules $S_1 \to abc$ and $S_1 \to a^2b^2c^2$ can be used only in the first step; with a loop $S_1 \to aS_1$ on the axiom itself, strings such as $a^2bc$ would also be generated.) Here is a derivation in $\Gamma_1$:

\[ (S_1, S_2, S_3) \Rightarrow_f (aS_1', bS_2, cS_3) \Rightarrow_f \dots \Rightarrow_f (a^nS_1', b^nS_2, c^nS_3) \Rightarrow_f (a^{n+3}Q_2, b^{n+1}S_2, c^{n+1}S_3) \]
\[ \Rightarrow_f (a^{n+3}b^{n+1}S_2, y_2, c^{n+1}S_3) \Rightarrow_f (a^{n+3}b^{n+3}Q_3, y_2', c^{n+2}S_3) \]
\[ \Rightarrow_f (a^{n+3}b^{n+3}c^{n+2}S_3, y_2', y_3) \Rightarrow_f (a^{n+3}b^{n+3}c^{n+3}, y_2'', y_3'), \]

for $n \ge 1$ and $f \in \{r, nr\}$ (for $n = 0$ the derivation starts directly with $S_1 \to a^3Q_2$); in the returning case we have $y_2 = S_2$, $y_2' = bS_2$, $y_2'' = b^2S_2$, $y_3 = S_3$, $y_3' = cS_3$, in the non-returning case $y_2 = b^{n+1}S_2$, $y_2' = b^{n+2}S_2$, $y_2'' = b^{n+3}S_2$, $y_3 = c^{n+2}S_3$, $y_3' = c^{n+3}S_3$. Because the second and the third components communicate only once with the first component, there is no difference between the language generated in the returning mode and the language generated in the non-returning mode.

This is not the case for the following system:

\[ \Gamma_2 = (\{S_1, S_2\}, \{a\}, K, (P_1, S_1), (P_2, S_2)), \]
\[ P_1 = \{S_1 \to aQ_2, \ S_2 \to aQ_2, \ S_2 \to a\}, \qquad P_2 = \{S_2 \to aS_2\}. \]

The reader might check that we obtain

\[ L_r(\Gamma_2) = \{a^{2n+1} \mid n \ge 1\}, \qquad L_{nr}(\Gamma_2) = \{a^{(m+1)(m+2)/2} \mid m \ge 1\}. \]

Fig. 1. Inclusion diagram relating $REG$, $LIN$, $CF$, $MAT$, $CS$, $RE$ and the families $CPC(RL)$, $NCPC(RL)$, $CPC(CF)$, $NCPC(CF)$, $PC(RL)$, $NPC(RL)$, $PC(CF)$, $NPC(CF)$ (the arrows of the diagram are not reproduced here).
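The two derivation modes can be checked mechanically on the two example systems. The following is a minimal sketch of a bounded exhaustive search, under several assumptions that are ours: sentential forms are tuples of symbols, terminals are lower-case letters, query symbols are written `Q1`, `Q2`, ..., and configurations in which some component string is longer than `max_len` are discarded (a heuristic cut-off that is adequate for these two systems but may lose words for others). The first system is entered with an auxiliary nonterminal `S1p`, a primed copy of the master axiom that carries the loop producing the prefix of a's, so that the rules producing abc and aabbcc are applicable only in the first step; with that loop placed on `S1` itself, the search would also find words such as aabc.

```python
from itertools import product

def is_query(sym):
    return sym[0] == "Q"                       # query symbols are "Q1", "Q2", ...

def is_terminal_string(x):
    return all(s.islower() for s in x)         # terminals are lower-case letters

def successors(config, rules, axioms, returning):
    """Configurations reachable from `config` in one step (empty list: blocked)."""
    if any(is_query(s) for x in config for s in x):
        # communication step; it has priority over rewriting
        new, served = list(config), set()
        for i, x in enumerate(config):
            asked = {int(s[1:]) - 1 for s in x if is_query(s)}
            if not asked or any(is_query(s) for j in asked for s in config[j]):
                continue                       # no request, or a requested string has queries
            new[i] = tuple(t for s in x for t in
                           (config[int(s[1:]) - 1] if is_query(s) else (s,)))
            served |= asked
        if not served:
            return []                          # circular queries
        if returning:
            for j in served:
                new[j] = (axioms[j],)          # queried components restart from the axiom
        return [tuple(new)]
    # rewriting step: one rule in every component whose string is not terminal
    choices = []
    for i, x in enumerate(config):
        if is_terminal_string(x):
            choices.append([x])
            continue
        options = [x[:k] + right + x[k + 1:]
                   for k, s in enumerate(x) for right in rules[i].get(s, ())]
        if not options:
            return []                          # a nonterminal string cannot be rewritten
        choices.append(options)
    return list(product(*choices))

def language(rules, axioms, returning, max_len):
    """Words found by exhaustive search over configurations whose component
    strings all have length <= max_len (heuristic bound, see the text above)."""
    start = tuple((a,) for a in axioms)
    seen, todo, words = {start}, [start], set()
    while todo:
        for c in successors(todo.pop(), rules, axioms, returning):
            if c in seen or any(len(x) > max_len for x in c):
                continue
            seen.add(c)
            if is_terminal_string(c[0]):
                words.add("".join(c[0]))       # the master is terminal: stop here
            else:
                todo.append(c)
    return words

def seq(terminals, *rest):
    """Right-hand side: a run of terminal letters followed by other symbols."""
    return tuple(terminals) + rest

GAMMA1 = [{"S1": [seq("abc"), seq("aabbcc"), seq("a", "S1p"), seq("aaa", "Q2")],
           "S1p": [seq("a", "S1p"), seq("aaa", "Q2")],
           "S2": [seq("bb", "Q3")], "S3": [seq("c")]},
          {"S2": [seq("b", "S2")]},
          {"S3": [seq("c", "S3")]}]
GAMMA2 = [{"S1": [seq("a", "Q2")], "S2": [seq("a", "Q2"), seq("a")]},
          {"S2": [seq("a", "S2")]}]

print(sorted(len(w) for w in language(GAMMA2, ["S1", "S2"], True, 12)))
print(sorted(len(w) for w in language(GAMMA2, ["S1", "S2"], False, 15)))
```

The two printed lists are the word lengths obtained for the second system in the returning and in the non-returning mode, which makes the difference between the two modes visible on short words.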
4 Simulating multi-patterns by PC grammar systems

The first result is a direct consequence of the way in which the languages generated by the two types of devices are defined:

Theorem 1. $MPL_{CF} \subset (CPC(CF) \cap NCPC(CF))$.

Proof. Let $\Pi = \{\pi_1, \dots, \pi_m\}$ be a multi-pattern over $V = \{X_1, \dots, X_n\}$ and $A$, and consider the context-free languages $D_1, \dots, D_n \subseteq A^*$ generated by the grammars $G_i = (N_i, A, P_i, S_i)$, $1 \le i \le n$. According to Lemma 3, we may assume that all $G_i$ are $\lambda$-free, $1 \le i \le n$ ($\lambda \notin D_i$, $1 \le i \le n$). We construct the PC grammar system (without $\lambda$-rules)

\[ \Gamma = (N, A, K, (P_0, S_0), (P_1, S_1), \dots, (P_n, S_n)), \]

where

\[ \begin{aligned}
N &= \{S_0\} \cup \bigcup_{i=1}^{n} N_i, \\
P_0 &= \{S_0 \to S_0\} \cup \{S_0 \to u \mid u \in \Pi, \ u \in A^*\} \\
&\quad \cup \{S_0 \to u_1 Q_{i_1} u_2 Q_{i_2} \dots u_t Q_{i_t} u_{t+1} \mid t \ge 1, \ u_1 X_{i_1} u_2 X_{i_2} \dots u_t X_{i_t} u_{t+1} \in \Pi, \\
&\qquad u_j \in A^*, \ 1 \le j \le t+1, \ 1 \le i_j \le n, \ 1 \le j \le t\}.
\end{aligned} \]

The rules $S_0 \to u$ in $P_0$ simulate the patterns in $\Pi$ containing no variable, the rules $S_0 \to u_1 Q_{i_1} u_2 Q_{i_2} \dots u_t Q_{i_t} u_{t+1}$ simulate the patterns in $\Pi$ containing variables. When such a rule is used in $P_0$, all the strings generated by the components $P_{i_j}$ (corresponding to the grammars $G_{i_j}$) have to be terminal, otherwise the work of $\Gamma$ is blocked, because $P_0$ cannot rewrite symbols in $N_{i_j}$. Due to the presence of the rule $S_0 \to S_0$ in $P_0$ and because a component may do nothing when its sentential form is terminal, the derivations of the components can be finished independently of each other. Because in every derivation there is at most one communication step, there is no difference between the returning and the non-returning modes of derivation. Consequently, $L_r(\Gamma) = L_{nr}(\Gamma) = L_D(\Pi)$, for $D = (D_1, \dots, D_n)$, that is, we have the inclusion $MPL_{CF} \subseteq (CPC(CF) \cap NCPC(CF))$. In view of Lemma 2 and the first example at the end of the previous section, the inclusion is proper. □

The result in the previous theorem cannot be improved to $MPL_{CF} \subset (CPC(RL) \cap NCPC(RL))$, even when replacing $MPL_{CF}$ by $MPL_{REG}$. More precisely, we have
Theorem 2. Each of the families $MPL_{REG}$, $MPL_{CF}$ is incomparable with each of the families $CPC(RL)$, $NCPC(RL)$.

Proof. In Theorem 7.7 in [6] it is proved that the linear language $L = \{a^n b^m c b^m a^n \mid n, m \ge 1\}$ is not in $CPC(RL) \cup NCPC(RL)$. In Section 2, Example 2, we have seen that $L \in MPL_{REG}$. On the other hand, from Lemma 2 we know that $\{a^n b^n c^n \mid n \ge 1\} \notin MPL_{CF}$; as shown at the end of Section 3, this language is in $CPC(RL) \cap NCPC(RL)$. Using the obvious inclusion $MPL_{REG} \subset MPL_{CF}$ (proper, because $\{a^n b^n \mid n \ge 1\} \in CF \subseteq MPL_{CF}$ and $\{a^n b^n \mid n \ge 1\} \notin MPL_{REG}$, by Lemma 1), we have the incomparabilities in the theorem. □

From the proof of Theorem 1 we do not obtain the inclusion $MPL_{REG} \subset (CPC(RL) \cap NCPC(RL))$, because the rules $S_0 \to u_1 Q_{i_1} u_2 Q_{i_2} \dots u_t Q_{i_t} u_{t+1}$ in $P_0$ do not depend on the variable domains, but on the form of the patterns. However, a similar inclusion is true for non-centralized systems:

Theorem 3. $MPL_{REG} \subset NPC(RL)$.

Proof. Let $\Pi = \{\pi_1, \dots, \pi_m\}$ be a multi-pattern over $V = \{X_1, \dots, X_n\}$ and $A$, and let $D_1, \dots, D_n \subseteq A^*$ be regular languages (associated with $X_1, \dots, X_n$). According to Lemma 3, we may assume that $\lambda \notin D_i$, $1 \le i \le n$. For each $i$ take a ($\lambda$-free) right-linear grammar $G_i = (N_i, A, P_i, S_i)$ such that $D_i = L(G_i)$, $1 \le i \le n$. The family $NPC(RL)$ is closed under union (Theorem 7.56 in [6] and the remark after it), therefore it is enough to prove that $L_D(\pi_i) \in NPC(RL)$ for each $i$, $1 \le i \le m$.

Take a pattern $\pi = u_1 X_{i_1} u_2 X_{i_2} \dots u_k X_{i_k} u_{k+1}$, $u_j \in A^*$, $k \ge 1$, $1 \le j \le k+1$, $1 \le i_j \le n$, $1 \le j \le k$. We construct the PC grammar system

\[ \Gamma = (N, A, K, (P_0, S_0), (P'_1, S_1), \dots, (P'_n, S_n), (P_{n+1}, S), (P_{n+2}, S), \dots, (P_{n+k-1}, S), (P_{n+k}, S)), \]

where

\[ \begin{aligned}
N &= \{S_0, S\} \cup \bigcup_{i=1}^{n} N_i \cup \{Y_j \mid 1 \le j \le k\} \cup \{[a] \mid a \in A\}, \\
P_0 &= \{S_0 \to S_0, \ S_0 \to u_1 Q_{n+1}\} \cup \{Y_j \to u_{j+1} Q_{n+j+1} \mid 1 \le j \le k-1\}, \\
P'_i &= (P_i - \{B \to x \in P_i \mid x \in A^*\}) \cup \{B \to x'[a] \mid B \to x'a \in P_i, \ x' \in A^*, \ a \in A\} \\
&\quad \cup \{[a] \to [a] \mid a \in A\}, \quad \text{for } 1 \le i \le n, \\
P_{n+j} &= \{S \to S, \ S \to Q_{i_j}, \ Y_j \to Y_j\} \cup \{[a] \to aY_j \mid a \in A\}, \quad \text{for } 1 \le j \le k-1, \\
P_{n+k} &= \{S \to S, \ S \to Q_{i_k}\} \cup \{[a] \to a u_{k+1} \mid a \in A\}.
\end{aligned} \]
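The passage from the rule set of a right-linear grammar for a domain to the primed rule set used in the construction can be sketched as follows. The encoding is an assumption of the sketch: a rule is a pair (left-hand side, right-hand side), right-hand sides are tuples of symbols, the bracketed nonterminal for a terminal a is the string `[a]`, and the function name `bracket_last_terminal` is hypothetical.

```python
def bracket_last_terminal(rules, alphabet):
    """Build the primed rule set from a lambda-free right-linear rule set:
    terminating rules B -> x'a become B -> x'[a], all other rules are kept,
    and the idle rules [a] -> [a] are added."""
    primed = set()
    for lhs, right in rules:
        if all(s in alphabet for s in right):
            # terminating rule: mark its last terminal instead of finishing
            primed.add((lhs, right[:-1] + ("[" + right[-1] + "]",)))
        else:
            primed.add((lhs, right))           # B -> x C is kept unchanged
    # idle rules let the component wait until its string is requested
    primed |= {("[" + a + "]", ("[" + a + "]",)) for a in alphabet}
    return primed

# right-linear rules S -> aS, S -> a generating the domain a+
domain_rules = {("S", ("a", "S")), ("S", ("a",))}
for rule in sorted(bracket_last_terminal(domain_rules, {"a"})):
    print(rule)
```

The marked last symbol is what allows the requesting component to continue the string with one right-linear rule after the communication, which a purely terminal string would not permit.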
The query symbols $Q_{i_j}$ are associated with the components $P'_{i_j}$, $1 \le i_j \le n$, $1 \le j \le k$, whereas $Q_{n+j}$ are associated with $P_{n+j}$, $1 \le j \le k$.

Let us examine the work of $\Gamma$ in the non-returning mode. The components $P'_i$ generate the languages $\{z[a] \mid za \in D_i, \ a \in A\}$, for $D_i$ associated with the variables $X_i$, $1 \le i \le n$. The lengths of the derivations in the various components $P'_i$ are not related, due to the rules $[a] \to [a]$, $a \in A$, present in each of these components (and to the rules $S_0 \to S_0$, $S \to S$ present in the other components).

The strings generated by the components $P'_i$, $1 \le i \le n$, are requested by the components $P_{n+j}$, associated with the variable occurrences $X_{i_j}$, $1 \le j \le k$, in the pattern $\pi$. When some component $P_{n+j}$ introduces the query symbol $Q_{i_j}$, the string of $P'_{i_j}$ must be of the form $z[a]$, for some $za \in D_{i_j}$ and $a \in A$, otherwise the work of $\Gamma$ is blocked (the nonterminals in $N_{i_j}$ cannot be rewritten by $P_{n+j}$). If $X_{i_j} = X_{i_r}$, $j \ne r$, and the two components $P_{n+j}$, $P_{n+r}$ request the string of $P'_{i_j}$ at different moments, then they receive the same string, $z[a]$, because the simulation of $G_{i_j}$ by $P'_{i_j}$ is finished and no modification of the string is possible any more.

After receiving a string $z[a]$, for $za \in D_{i_j}$, $a \in A$, the component $P_{n+j}$ will replace $[a]$ with $aY_j$, using from now on only the rule $Y_j \to Y_j$, if $j \le k-1$. If $j = k$, then $[a]$ is replaced by $a u_{k+1}$ and the string is terminal. Note that the rule $[a] \to aY_j$ is specific to the component $P_{n+j}$, $1 \le j \le k-1$.

Finally, $P_0$, the master component, after an arbitrary number of steps of using the rule $S_0 \to S_0$, starts building the interpretation of $\pi$. After introducing $u_1 Q_{n+1}$, it will receive some string $x_1 Y_1$ from $P_{n+1}$, hence we obtain $u_1 x_1 Y_1$ on the first component. In general, when we have $wY_j$ on the first component, $j \le k-2$, the only possible derivation step leads to $w u_{j+1} Q_{n+j+1}$. The string $x_{j+1} Y_{j+1}$ of the component $P_{n+j+1}$ is communicated to the first component. When $w' Y_{k-1}$ is obtained, we derive $w' u_k Q_{n+k}$; after the terminal string of $P_{n+k}$ (of the form $z a u_{k+1}$) has been communicated, the derivation is completed and we obtain a string in $L_D(\pi)$.

Using these explanations, it is easy to see that all strings of $L_D(\pi)$ can be produced by $\Gamma$ (in the non-returning mode) and, conversely, all strings of $L_{nr}(\Gamma)$ are in $L_D(\pi)$. This proves that $MPL_{REG} \subseteq NPC(RL)$. The inclusion is proper, because the language $L = \{a^n b^n \mid n \ge 1\}$ is not in $MPL_{REG}$ (Lemma 1), but, as is easy to see, $L \in CPC(RL) \cap NCPC(RL)$. □

Corollary 1. $MPL_{REG} \subset PC(RL)$.
5 Infinite multi-patterns

A natural generalization of the notion of a multi-pattern is to consider sets $\Pi$ consisting of infinitely many patterns over given $V = \{X_1, \dots, X_n\}$ and $A$. For given languages $D_1, \dots, D_n$ associated with the variables in $V$, we define

\[ L_D(\Pi) = \bigcup_{\pi \in \Pi} L_D(\pi). \]
Of course, restrictions must be imposed on $\Pi$, otherwise arbitrary languages can be obtained, starting from arbitrary sets of patterns. We consider here the case when $\Pi$ is a regular or a context-free language, and we denote by $RMPL_F$ (by $CMPL_F$, respectively) the family of languages of the form $L_D(\Pi)$, with regular (context-free) $\Pi$ and $D_1, \dots, D_n \in F$, for $F$ a given family of languages.

The assertion in Lemma 3 is true for multi-patterns in any family of languages closed under (erasing) morphisms and union. Hence, also for regular and for context-free infinite multi-patterns we may assume, without loss of generality, that the variable domains are $\lambda$-free. This will be implicitly assumed in the following sections, without specifying it again in each specific case.

The generalization of multi-patterns to (regular or context-free) infinite multi-patterns is effective. Specifically, the following result holds (a proof can be found in [9]).

Lemma 4. All inclusions $MPL_F \subset RMPL_F \subset CMPL_F$, $F \in \{REG, CF\}$, are proper.

For instance, we have

Lemma 5. $\{a^n b^n c^n \mid n \ge 1\} \notin CMPL_{CF}$, $\{a^n b^n \mid n \ge 1\} \notin RMPL_{REG}$.

Proof. Assume that $L = \{a^n b^n c^n \mid n \ge 1\}$ is in $CMPL_{CF}$. Take $\Pi$, a context-free set of patterns over $V = \{X_1, \dots, X_n\}$ and $A$, and $D_1, \dots, D_n \subseteq A^*$, context-free languages associated with $X_1, \dots, X_n$, such that $L = L_D(\Pi)$. As we have pointed out at the beginning of this section, we may assume that $\lambda \notin D_i$, $1 \le i \le n$. Assume that $\Pi$ is infinite. Being context-free, it has pumping properties, hence there is a pattern $\pi \in \Pi$ which can be written in the form $\pi = uvwxy$, for $vx \ne \lambda$, such that $uv^rwx^ry \in \Pi$ for all $r \ge 1$. Interpreting $uv^rwx^ry$, we get some strings $u'v'^rw'x'^ry'$, which are in $L_D(\Pi)$ for all $r \ge 1$. Because $vx \ne \lambda$ and $\lambda \notin D_i$ for all $i$, it follows that $v'x' \ne \lambda$. Strings of the form $u'v'^rw'x'^ry'$ cannot be in $L$ for all $r \ge 1$, a contradiction. It follows that $\Pi$ must be finite. This means that $L \in MPL_{CF}$, which contradicts Lemma 2. Consequently, $L \notin CMPL_{CF}$.

A similar argument proves that $\{a^n b^n \mid n \ge 1\} \notin RMPL_{REG}$. The problem is reduced to the fact that $\{a^n b^n \mid n \ge 1\} \notin MPL_{REG}$ (Lemma 1). □

In view of Lemma 4, it makes sense to ask whether or not the result of Theorem 3 (and of its Corollary) can be strengthened to infinite multi-patterns. This is, indeed, the case:

Theorem 4. $RMPL_{REG} \subset NPC(RL)$.

Proof. Let $\Pi$ be a regular language over $V \cup A$, $V = \{X_1, \dots, X_n\}$, recognized by a deterministic finite automaton $M = (Q, V \cup A, s_0, F, \delta)$. Assume that the states of $M$ are $Q = \{s_0, s_1, \dots, s_r\}$, for some $r \ge 0$. Take a ($\lambda$-free) regular grammar $G_i = (N_i, A, P_i, S_i)$ for each $D_i$, $1 \le i \le n$, where $D_1, \dots, D_n$ are the domains of $X_1, \dots, X_n$. We construct the PC grammar system

\[ \begin{aligned}
\Gamma = (N, A, K, \ & (P_0, S_0), (P'_1, S_1), \dots, (P'_n, S_n), \\
& (P_{1,0}, S), (P_{1,1}, S), \dots, (P_{1,r}, S), (P_{1,f}, S), \\
& (P_{2,0}, S), (P_{2,1}, S), \dots, (P_{2,r}, S), (P_{2,f}, S), \\
& \dots \\
& (P_{n,0}, S), (P_{n,1}, S), \dots, (P_{n,r}, S), (P_{n,f}, S)),
\end{aligned} \]

where

\[ \begin{aligned}
N &= \{S_0, S\} \cup \bigcup_{i=1}^{n} N_i \cup \{Y_j \mid 0 \le j \le r\} \cup \{[s_j] \mid 0 \le j \le r\} \cup \{[a] \mid a \in A\}, \\
P_0 &= \{S_0 \to S_0, \ S_0 \to [s_0]\} \cup \{S_0 \to u \mid u \in \Pi, \ u \in A^*\} \\
&\quad \cup \{[s_j] \to a[s_g] \mid a \in A, \ s_g = \delta(s_j, a), \ 0 \le j, g \le r\} \\
&\quad \cup \{Y_j \to a[s_g] \mid a \in A, \ s_g = \delta(s_j, a), \ 0 \le j, g \le r\} \\
&\quad \cup \{[s_j] \to Q_{i,g} \mid s_g = \delta(s_j, X_i), \ 1 \le i \le n, \ 0 \le j, g \le r\} \\
&\quad \cup \{Y_j \to Q_{i,g} \mid s_g = \delta(s_j, X_i), \ 1 \le i \le n, \ 0 \le j, g \le r\} \\
&\quad \cup \{[s_j] \to a \mid a \in A, \ \delta(s_j, a) \in F, \ 0 \le j \le r\} \\
&\quad \cup \{Y_j \to a \mid a \in A, \ \delta(s_j, a) \in F, \ 0 \le j \le r\} \\
&\quad \cup \{[s_j] \to Q_{i,f} \mid \delta(s_j, X_i) \in F, \ 1 \le i \le n, \ 0 \le j \le r\} \\
&\quad \cup \{Y_j \to Q_{i,f} \mid \delta(s_j, X_i) \in F, \ 1 \le i \le n, \ 0 \le j \le r\}, \\
P'_i &= (P_i - \{B \to x \in P_i \mid x \in A^*\}) \cup \{B \to x'[a] \mid B \to x'a \in P_i, \ x' \in A^*, \ a \in A\} \\
&\quad \cup \{[a] \to [a] \mid a \in A\}, \quad \text{for } 1 \le i \le n, \\
P_{i,j} &= \{S \to S, \ S \to Q_i, \ Y_j \to Y_j\} \cup \{[a] \to aY_j \mid a \in A\}, \quad \text{for } 1 \le i \le n, \ 0 \le j \le r, \\
P_{i,f} &= \{S \to S, \ S \to Q_i\} \cup \{[a] \to a \mid a \in A\}, \quad \text{for } 1 \le i \le n.
\end{aligned} \]

The query symbols $Q_i$ are associated with the components $P'_i$, $1 \le i \le n$; $Q_{i,j}$ are associated with $P_{i,j}$, $1 \le i \le n$, $0 \le j \le r$; and $Q_{i,f}$ with $P_{i,f}$, $1 \le i \le n$.

As in the case of the system in the proof of Theorem 3, the components $P'_i$, $1 \le i \le n$, produce (independently of each other, due to the rules $[a] \to [a]$, $a \in A$, present in each of them) strings in $\{z[a] \mid za \in D_i, \ a \in A\}$. These strings (and not other nonterminal strings generated by $P_i$) are communicated to the components $P_{i,j}$, $P_{i,f}$, for all $j$, $0 \le j \le r$. In $P_{i,j}$ the symbol $[a]$ is replaced by $aY_j$, associated with the state $s_j$ of $M$, whereas in $P_{i,f}$, $[a]$ is replaced by $a$.

After using the rule $S_0 \to S_0$ for a while, $P_0$ starts to simulate the work of $M$, introducing query symbols instead of variables. The query symbols $Q_{i,j}$ introduced in this way by $P_0$ identify both the variable $X_i$ and the state $s_j$ of $M$ corresponding to the current step of parsing a string in $\Pi$ by $M$. The information about the state is passed from $Q_{i,j}$ to $Y_j$, communicated from $P_{i,j}$, hence the parsing is correctly continued.

The components $P_{i,f}$, $1 \le i \le n$ ($f$ from "final"), are used in order to avoid the introduction of $\lambda$-rules in $P_0$: when a pattern in $\Pi$ ends with a variable, say $X_i$, that is $\delta(s_j, X_i) \in F$, then we use $Y_j \to Q_{i,f}$ or $[s_j] \to Q_{i,f}$ in $P_0$. The string communicated by $P_{i,f}$ is terminal, hence the derivation ends.

Using also the explanations in the proof of Theorem 3, one can see that $L_D(\Pi) = L_{nr}(\Gamma)$, hence we have the inclusion $RMPL_{REG} \subseteq NPC(RL)$. In view of Lemma 5, this inclusion is proper. □

Corollary 2. $RMPL_{REG} \subset PC(RL)$.

The corresponding result for infinite regular multi-patterns with context-free domains for variables is also true. In fact, a more general result holds, partly strengthening the assertion in Theorem 1.

Theorem 5.
$CMPL_{CF} \subset NCPC(CF)$.

Proof. Take a context-free grammar $G' = (N', V \cup A, P', S')$, in the Chomsky normal form, generating a multi-pattern $\Pi$ (over $A$ and $V = \{X_1, \dots, X_n\}$). Consider also the ($\lambda$-free) context-free grammars $G_i = (N_i, A, P_i, S_i)$ for the domains $D_i$, $1 \le i \le n$, of the variables. Without loss of generality, we may assume that $N_i \cap N' = \emptyset$ for all $i$, $1 \le i \le n$. We construct the system

\[ \Gamma = (N, A, K, (P_0, S_0), (P_1, S_1), \dots, (P_n, S_n)), \]

where

\[ \begin{aligned}
N &= \{S_0, S'\} \cup \bigcup_{i=1}^{n} N_i \cup N', \\
P_0 &= \{S_0 \to S_0, \ S_0 \to S'\} \cup \{S_0 \to \lambda \mid \text{if } \lambda \in \Pi\} \\
&\quad \cup (P' - \{B \to X_i \mid 1 \le i \le n\}) \cup \{B \to Q_i \mid B \to X_i \in P', \ 1 \le i \le n\}.
\end{aligned} \]

Because $\Gamma$ works in the non-returning mode and only terminal strings generated by the components $P_i$ (hence strings in $D_i$) can be communicated to $P_0$ without blocking the derivation, we obtain $L_{nr}(\Gamma) = L_D(\Pi)$. From Lemma 5, we find that this inclusion is proper. □

Corollary 3. $CMPL_{CF} \subset NPC(CF)$ $(\subseteq PC(CF))$.

The above construction does not work in the returning and centralized case, hence the relation between $CMPL_{CF}$ and $CPC(CF)$ remains open.
6 Final remarks

A variant of the notion of a pattern, already considered in [2], is based on the idea of using "mirrored" variables: for each variable $X$ consider also $X'$; when interpreting a pattern, the primed variables are replaced by mirror images of the strings which replace the non-mirrored variables. Let us denote by $RMP'L_F$ the family of languages generated by regular multi-patterns with reversal (that is, regular sets of patterns with reversal), using domains in a given family $F$; by $CMP'L_F$ we denote the corresponding family for the case of context-free multi-patterns.

We conjecture that $RMP'L_{REG} - NPC(RL) \ne \emptyset$ (a possible language proving this assertion is $\{w c \, mi(w) \mid w \in \{a, b\}^*\}$, which can be obtained from the pattern $\pi = X_1 c X_1'$, with $D_1 = \{a, b\}^*$). However, using techniques similar to those in the previous sections, one can prove the strict inclusion $CMP'L_{REG} \subset NPC(CF)$.

The problem whether or not $CMP'L_{CF} \subset NPC(CF)$ remains open; the difficulty here is to simulate by a PC grammar system the generation of both $x$ and $mi(x)$, for $x$ in the domain of a variable, when the domain is a context-free non-regular language.

As a conclusion of this paper, we can emphasize the fact that PC grammar systems prove to be powerful enough to simulate diverse classes of multi-patterns. This can lead to some interesting consequences. For instance, in [2] (Theorem 3.6) it is proved that the uniform membership problem for pattern languages (generated by single patterns with $A^*$ as domain of each variable) is NP-complete. (Uniform membership means that both the string and the pattern are inputs of the problem: is $x$ an element of $L(\pi)$ for arbitrary $x$ and $\pi$? Non-uniform membership means to ask whether or not $x \in L(\pi)$ for given $\pi$ and arbitrary $x$.) Therefore, the uniform membership for all families $Y(X)$, $Y \in \{PC, CPC, NPC, NCPC\}$, $X \in \{RL, CF\}$, with the exception of $CPC(RL)$ and $NCPC(RL)$, is NP-complete. Contrast this with the result in [4]: the non-uniform membership for $Y(RL)$, $Y$ as above, is decidable in polynomial time.
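The uniform membership problem mentioned above can be made concrete by a naive backtracking test that takes both the pattern and the string as inputs. This is only a sketch under our own conventions (variables are symbols starting with an upper-case letter, every variable ranges over all strings, and the flag `erasing` decides whether the empty string is allowed); the name `matches` is hypothetical. Its worst-case running time is exponential in the number of variables, which is consistent with, but of course no proof of, the hardness result quoted from [2].

```python
def matches(pattern, word, erasing=True, binding=None):
    """Decide whether `word` is obtained from `pattern` by replacing every
    variable uniformly by some string (nonempty if erasing is False)."""
    binding = {} if binding is None else binding
    if not pattern:
        return word == ""
    head, rest = pattern[0], pattern[1:]
    if not head[0].isupper():                      # terminal symbol
        return (word.startswith(head)
                and matches(rest, word[len(head):], erasing, binding))
    if head in binding:                            # variable already bound
        value = binding[head]
        return (word.startswith(value)
                and matches(rest, word[len(value):], erasing, binding))
    for cut in range(0 if erasing else 1, len(word) + 1):
        binding[head] = word[:cut]                 # try every prefix as the value
        if matches(rest, word[cut:], erasing, binding):
            return True
    binding.pop(head, None)                        # undo the tentative binding
    return False

print(matches(["X1", "X1"], "abab"))
print(matches(["X1", "c", "X2", "X1"], "abcbab"))
print(matches(["X1", "X1"], "aba"))
```

For a fixed pattern the number of variables is a constant, so the same search runs in polynomial time in the length of the string; the difficulty of the uniform problem comes from the pattern being part of the input.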
Acknowledgement: Research supported by the Academy of Finland, Project 11281.
References

[1] J. Albert, L. Wegner, "Languages with homomorphic replacements", Proc. ICALP 80, LNCS 85, Springer, Berlin, 1980, 19-29.
[2] D. Angluin, "Finding patterns common to a set of strings", J. Comput. System Sci., 21 (1980), 46-62.
[3] D. R. Bean, A. Ehrenfeucht, G. F. McNulty, "Avoidable patterns in strings of symbols", Pacific J. Math., 85 (1979), 261-294.
[4] L. Cai, "The computational complexity of linear PCGS", Computers and AI, 15 (1996), 199-210.
[5] E. Csuhaj-Varju, J. Dassow, "On cooperating distributed grammar systems", J. Inform. Process. Cybern., EIK, 26 (1990), 49-63.
[6] E. Csuhaj-Varju, J. Dassow, J. Kelemen, Gh. Paun, Grammar Systems. A Grammatical Approach to Distribution and Cooperation (Gordon and Breach, London, 1994).
[7] J. Dassow, Gh. Paun, A. Salomaa, "Grammars based on patterns", Intern. J. Found. Computer Sci., 4 (1993), 1-14.
[8] S. Dumitrescu, "Non-returning PC grammar systems can be simulated by returning systems", Theoretical Computer Sci., 161 (1996).
[9] S. Dumitrescu, Gh. Paun, A. Salomaa, "Languages associated to finite and infinite sets of patterns", Rev. Roum. Math. Pures Appl., 49 (1996), 613-631.
[10] S. Ginsburg, The Mathematical Theory of Context-free Languages (McGraw Hill Book Comp., New York, 1966).
[11] D. Hauschild, M. Jantzen, "Petri nets algorithms in the theory of matrix grammars", Acta Informatica, 31 (1994), 719-728.
[12] T. Jiang, E. Kinber, A. Salomaa, K. Salomaa, S. Yu, "Pattern languages with and without erasing", Intern. J. Computer Math., 50 (1994), 147-163.
[13] T. Jiang, A. Salomaa, K. Salomaa, S. Yu, "Decision problems for patterns", J. Computer System Sci., 50 (1995), 53-63.
[14] L. Kari, A. Mateescu, Gh. Paun, A. Salomaa, "Multi-pattern languages", Theoretical Computer Sci., 141 (1995), 253-268.
[15] A. Mateescu, A. Salomaa, "Finite degrees of ambiguity in pattern languages", RAIRO. Th. Inform. and Appl., 28 (1994), 233-253.
[16] R. Meersman, G. Rozenberg, "Cooperating grammar systems", Proc. MFCS 78, LNCS 64, Springer-Verlag, Berlin, 1978, 364-374.
[17] V. Mihalache, "Matrix grammars versus parallel communicating grammar systems", in vol. Mathematical Aspects of Natural and Formal Languages, ed. Gh. Paun (World Sci. Publ., Singapore, 1994), pp. 293-318.
[18] V. Mihalache, "On the generative capacity of parallel communicating grammar systems with regular components", Computers and AI, 15 (1996), 155-172.
[19] V. Mitrana, Gh. Paun, G. Rozenberg, A. Salomaa, "Pattern systems", Theoretical Computer Sci., 154 (1996), 183-201.
[20] M. Morse, G. Hedlund, "Unending chess, symbolic dynamics and a problem in semigroups", Duke Math. J., 11 (1944), 1-7.
[21] P. H. Nii, "Blackboard systems", in The Handbook of AI, vol. 4 (A. Barr, P. R. Cohen, E. A. Feigenbaum, eds.) (Addison-Wesley, Reading, Mass., 1989).
[22] E. Ohlebusch, E. Ukkonen, "On the equivalence problem for E-pattern languages", Theoretical Computer Sci., to appear.
[23] Gh. Paun, G. Rozenberg, A. Salomaa, "Pattern grammars", J. Automata, Languages, Combinatorics, to appear.
[24] Gh. Paun, L. Sântean, "Parallel communicating grammar systems: the regular case", Ann. Univ. Buc., Series Matem.-Inform., 38 (1989), 55-63.
[25] A. Thue, "Über unendliche Zeichenreihen", Norske Vid. Selsk. Skr., I. Mat. Nat. Kl., Kristiania, 7 (1906), 1-22.
Turku Centre for Computer Science Lemminkaisenkatu 14 FIN-20520 Turku Finland http://www.tucs.abo.
University of Turku Department of Mathematical Sciences
Abo Akademi University Department of Computer Science Institute for Advanced Management Systems Research
Turku School of Economics and Business Administration Institute of Information Systems Science