Characterizations of Recursively Enumerable Languages by Using Copy Languages

Gheorghe Păun, Arto Salomaa
Turku Centre for Computer Science TUCS Technical Report No 130 October 1997 ISBN 952-12-0078-2 ISSN 1239-1891
Abstract

We give characterizations of recursively enumerable languages starting from copy languages, that is, languages of the form $\{x\bar{x} \mid x \in L\}$, where $L$ is a regular language and $\bar{x}$ is the barred version of $x$. One characterization uses an intersection of morphic images of two copy languages, the other one uses a quotient of morphic images of two copy languages. As a consequence, we find similar characterizations of recursively enumerable languages starting from languages generated by (non-returning non-centralized) parallel communicating grammar systems with right-linear rules.
Keywords: recursively enumerable languages, copy languages, parallel communicating grammar systems
TUCS Research Group
Mathematical Structures of Computer Science
1 Introduction

The characterization of recursively enumerable (RE) languages (hence of the power of Turing machines or type-0 Chomsky grammars) is one of the most investigated topics in formal language theory. Basically, two main approaches can be identified in this area: (1) start from languages in a proper subfamily of RE and represent each language in RE by using certain operations with languages (see [1], [8], [10], etc.), and (2) start from generative mechanisms known (or not) to generate strict subfamilies of RE and add some modifications to the work of these mechanisms, such that the full power of Turing machines is reached. Of this type are the results in [6], [7], [9], [13], [16], [18], [20], and in many other papers, as well as several results in the regulated rewriting area (see [3]).

The basic implicit ingredients of all these characterizations are the capability of observing contexts and of erasing. While erasing is obtained mainly by using erasing morphisms (codings), quotients, or simply by rules, the ways of introducing context observing abilities in mechanisms which look "context-free-like" are much more numerous: intersections, inverse morphisms, quotients, sequential transducers, erasing rules of the form $AB \to \lambda$, and so on. One of the most fruitful, and powerful, ideas is that of a leftmost derivation.

We follow here the first direction, and we prove that characterizations of RE languages, such as the ones involving linear languages ([1], [11]), can be obtained also by using copy languages. Specifically, for a regular language $L$ over some alphabet $V$, consider the alphabet $\bar{V} = \{\bar{a} \mid a \in V\}$; for a string $x \in V^*$ ($V^*$ is the set of all strings over the alphabet $V$, including the empty string, denoted by $\lambda$) we denote by $\bar{x}$ the string obtained by replacing each symbol $a \in V$ appearing in $x$ by its barred version $\bar{a}$.
Then we denote
$$\mathrm{copy}(L) = \{x\bar{x} \mid x \in L\}.$$
Each RE language can be written (1) as the morphic image of the intersection of two languages of the form $h(\mathrm{copy}(L))$, where $h$ is a morphism and $L$ is a regular language, or (2) as the left quotient of two languages of the form $h(\mathrm{copy}(L))$ as above.

The similarity with the results in [1] and [11] is apparent. This similarity is not accidental: note the structural similarity of linear languages with copy languages, disregarding the fact that they occupy different places in the Chomsky hierarchy (the copy languages are not necessarily context-free). Denote by $\mathrm{mi}(x)$ the mirror image of a string $x$. Then every linear language $L$ can be written in the form $L = \{h(x\,\mathrm{mi}(\bar{x})) \mid x \in L_0\}$, for a regular language $L_0$ and a morphism $h$. Removing the mirror image,
we get a morphic image of a copy language, exactly as used in the above-mentioned characterizations of RE languages.

A language of the form $h(\mathrm{copy}(L))$ can be generated in a direct way by a parallel communicating (PC) grammar system, a grammatical model of parallel computing introduced in [15] (see also [2] and [4]). Such a system consists of several usual grammars working synchronously on their sentential forms (one starts from separate axioms of these grammars) and communicating on request by introducing query symbols. The language of a designated component grammar is the language of the system. PC grammar systems with right-linear rules are able to generate languages of the form $h(\mathrm{copy}(L))$, with a regular $L$. The resulting characterizations of RE languages should be compared with those in [14], where a characterization of RE languages is obtained by using PC grammar systems with non-erasing context-free components working in the leftmost manner (when a rule is used, it replaces the leftmost occurrence of its left-hand nonterminal in the current sentential form).

Knowledge of basic formal language theory is assumed on the part of the reader. Whenever the need arises, [19] can be consulted.
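As an informal illustration of the definitions in this Introduction, the following Python sketch computes copy(L) and two morphic images on finite samples of strings. The encoding is an assumption made only for this illustration: lowercase letters stand for symbols of V, the corresponding uppercase letters stand for their barred versions, and a morphism is a dictionary from symbols to strings. The function names and the sample languages are hypothetical, and the linear form is taken with the mirror image of the barred string.

```python
def bar(x):
    """Barred version of x (assumed encoding: uppercase letter = barred symbol)."""
    return x.upper()

def copy_language(sample):
    """copy(L) restricted to a finite sample of L: each x followed by bar(x)."""
    return {x + bar(x) for x in sample}

def mirror(x):
    """Mirror image mi(x)."""
    return x[::-1]

def apply_morphism(h, w):
    """Apply a morphism given as a dict from symbols to strings."""
    return "".join(h[s] for s in w)

# Morphic image of a copy language: h removes the bars, so h(x bar(x)) = xx.
sample = {"ab", "aab"}
unbar = {"a": "a", "b": "b", "A": "a", "B": "b"}
copies = {apply_morphism(unbar, w) for w in copy_language(sample)}

# Linear form h(x mi(bar(x))): with h(a) = a and h(bar(a)) = b,
# the strings a^n are mapped to a^n b^n.
h_lin = {"a": "a", "A": "b"}
linear = {apply_morphism(h_lin, x + mirror(bar(x))) for x in {"a", "aa", "aaa"}}

print(sorted(copies))
print(sorted(linear))
```

The only difference between the two images is whether the barred half is reversed before the morphism is applied.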
2 Characterizing RE by using copy languages

The basic result of this paper is the following one.

Theorem 1. For each language $L \in RE$, two regular languages $L_1, L_2$ and three morphisms $h_1, h_2, h_3$ such that $L = h_3(h_1(\mathrm{copy}(L_1)) \cap h_2(\mathrm{copy}(L_2)))$ can be effectively constructed.

Proof. Take a type-0 grammar $G = (N, T, S, P)$, denote $V = N \cup T$ and assume that $P = \{r_i : u_i \to v_i \mid 1 \le i \le n\}$. Consider the alphabets
$$V_0 = \{c, d, e, f, g\} \cup \{b_i, b'_i \mid 1 \le i \le n\}, \quad V_1 = \{a' \mid a \in V\}, \quad V_2 = \{a'' \mid a \in V\},$$
and define the morphism $h_0 : V^* \to V_1^*$ by $h_0(a) = a'$, $a \in V$, and the regular languages $L_1, L_2$ by
$$L_1 = \{b'_{i_1} c x_2 b'_{i_2} y_2 c \cdots c x_{k-1} b'_{i_{k-1}} y_{k-1} e x_k b_{i_k} y_k f \mid k \ge 1,\ x_j, y_j \in V_1^*,\ 1 \le j \le k-1,\ x_k, y_k \in T^*,\ 1 \le i_j \le n,\ 1 \le j \le k,\ r_{i_1} \text{ is a rule with } S \text{ on the left side and } r_{i_k} \text{ is a terminal rule}\},$$
$$L_2 = \{d e z_2 c \cdots c z_{k-1} c z_k g z_{k+1} f \mid k \ge 2,\ z_i \in V_1^*,\ 2 \le i \le k,\ z_{k+1} \in V_2^*\}.$$
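Consider also the morphisms
$$h_1 : (V_0 \cup V_1 \cup \bar{V}_0 \cup \bar{V}_1)^* \to (V \cup V_1 \cup \{c, e\})^*,$$
$$h_2 : (V_1 \cup V_2 \cup \bar{V}_1 \cup \bar{V}_2 \cup \{c, d, e, f, g, \bar{c}, \bar{d}, \bar{e}, \bar{f}, \bar{g}\})^* \to (V \cup V_1 \cup \{c, e\})^*,$$
defined by
$$h_1(b'_i) = h_0(u_i), \quad h_1(\bar{b}'_i) = h_0(v_i), \quad h_1(b_i) = h_0(u_i), \quad h_1(\bar{b}_i) = v_i, \quad 1 \le i \le n,$$
$$h_1(a) = h_1(a') = h_1(\bar{a}') = a', \quad h_1(\bar{a}) = a, \quad a \in V,$$
$$h_1(c) = h_1(\bar{c}) = h_1(e) = h_1(f) = c, \quad h_1(\bar{e}) = e, \quad h_1(\bar{f}) = \lambda,$$
and
$$h_2(a') = h_2(\bar{a}') = a', \quad h_2(a'') = \lambda, \quad h_2(\bar{a}'') = a, \quad a \in V,$$
$$h_2(c) = h_2(e) = h_2(\bar{c}) = h_2(g) = c, \quad h_2(d) = S', \quad h_2(\bar{g}) = e,$$
$$h_2(f) = h_2(\bar{f}) = \lambda, \quad h_2(\bar{d}) = h_2(\bar{e}) = \lambda.$$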
From these constructions, one can see that $h_1(\mathrm{copy}(L_1))$ contains strings of the form
$$h_0(w_1) c h_0(w_2) c \cdots c h_0(w_{k-1}) c h_0(w_k) c h_0(z_1) c h_0(z_2) c \cdots c h_0(z_{k-1}) e z_k,$$
where $w_i \Rightarrow z_i$ in $G$, $1 \le i \le k$, and $w_1 = S$, $z_k \in T^*$, whereas $h_2(\mathrm{copy}(L_2))$ contains strings of the form
$$S' c t_2 c \cdots c t_{k-1} c t_k c t_2 c \cdots c t_{k-1} c t_k e t_{k+1},$$
where $t_i \in V_1^*$, $2 \le i \le k$, and $t_{k+1} \in V^*$. Therefore, intersecting these languages we get strings of the form
$$h_0(w_1) c h_0(w_2) c \cdots c h_0(w_{k-1}) c h_0(w_k) c h_0(w_2) c h_0(w_3) c \cdots c h_0(w_k) e z_k,$$
where $w_i \Rightarrow w_{i+1}$ in $G$, $1 \le i \le k-1$, $w_1 = S$, $w_k \Rightarrow z_k$ in $G$, and $z_k \in T^*$. Consequently, we have a correct terminal derivation in $G$, $S = w_1 \Rightarrow w_2 \Rightarrow \cdots \Rightarrow w_k \Rightarrow z_k \in T^*$.

With the morphism (in fact, a projection) $h_3 : (T \cup V_1 \cup \{c, e\})^* \to T^*$, defined by
$$h_3(a') = \lambda,\ a' \in V_1, \quad h_3(c) = h_3(e) = \lambda, \quad h_3(a) = a,\ a \in T,$$
we clearly obtain $L(G) = h_3(h_1(\mathrm{copy}(L_1)) \cap h_2(\mathrm{copy}(L_2)))$.
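The idea behind the intersection can be sketched in Python at the level of derivation steps rather than of encoded strings. This is a simplified illustration under stated assumptions, not the construction itself: the primed alphabets, the markers c and e, and the morphisms are left out; symbols are single characters; and the function names and the sample grammar S → aSb, S → ab are hypothetical. The first condition mimics what membership in h1(copy(L1)) guarantees (each w_i yields z_i in one step, starting from S and ending in a terminal string); the second mimics what membership in h2(copy(L2)) guarantees (the second half repeats the first half shifted by one block).

```python
def one_step(rules, w, z):
    """True if z is obtained from w by applying one rule u -> v at some position."""
    for u, v in rules:
        start = w.find(u)
        while start != -1:
            if w[:start] + v + w[start + len(u):] == z:
                return True
            start = w.find(u, start + 1)
    return False

def h1_condition(rules, terminals, ws, zs):
    """Local correctness: w_1 = S, every w_i => z_i, and z_k is terminal."""
    return (len(ws) == len(zs) >= 1
            and ws[0] == "S"
            and all(one_step(rules, w, z) for w, z in zip(ws, zs))
            and set(zs[-1]) <= terminals)

def h2_condition(ws, zs):
    """Chaining: the blocks z_1 ... z_{k-1} coincide with w_2 ... w_k."""
    return zs[:-1] == ws[1:]

rules = [("S", "aSb"), ("S", "ab")]   # hypothetical sample grammar
terminals = {"a", "b"}

# A genuine derivation S => aSb => aabb satisfies both conditions.
good_ws, good_zs = ["S", "aSb"], ["aSb", "aabb"]
# Two unrelated correct steps satisfy only the first condition.
bad_ws, bad_zs = ["S", "S"], ["aSb", "ab"]

print(h1_condition(rules, terminals, good_ws, good_zs), h2_condition(good_ws, good_zs))
print(h1_condition(rules, terminals, bad_ws, bad_zs), h2_condition(bad_ws, bad_zs))
```

Neither condition alone describes derivations of the grammar; only their conjunction does, which is the role played by the intersection in the theorem.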
Let us denote by $L \backslash L'$ the left quotient of the language $L'$ with respect to the language $L$, defined as follows:
$$L \backslash L' = \{x \mid zx \in L',\ z \in L\}.$$
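For finite languages the left quotient just defined can be computed directly. The following Python sketch is only an illustration on finite sets of strings; the function name `left_quotient` and the sample languages are hypothetical.

```python
def left_quotient(L, L_prime):
    """Left quotient of L_prime with respect to L: all x with zx in L_prime, z in L."""
    return {w[len(z):] for z in L for w in L_prime if w.startswith(z)}

prefixes = {"ab", "a"}
words = {"abc", "abd", "xbc"}
result = left_quotient(prefixes, words)
print(sorted(result))
```

Here "abc" contributes "c" (prefix "ab") and "bc" (prefix "a"), while "xbc" contributes nothing because no string of the first language is a prefix of it.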
Theorem 2. For each language $L \in RE$ there are two regular languages $L_1, L_2$ and two morphisms $h_1, h_2$ such that $L = h_1(\mathrm{copy}(L_1)) \backslash h_2(\mathrm{copy}(L_2))$.

Proof. We construct $L_1, L_2$ and $h_1$ as in the previous proof and modify the morphism $h_2$ by considering $h_2(\bar{a}'') = \lambda$, $a \in V$ (for the other symbols $h_2$ remains the same). Then, the strings in $h_2(\mathrm{copy}(L_2))$ are of the form
$$S' c t_2 c \cdots c t_{k-1} c t_k c t_2 c \cdots c t_{k-1} c t_k e,$$
where $t_i \in V_1^*$, $2 \le i \le k$. Therefore, $h_2(\mathrm{copy}(L_2)) \backslash h_1(\mathrm{copy}(L_1))$ contains exactly the strings $z_k$ for which $u e z_k \in h_1(\mathrm{copy}(L_1))$ and $u e \in h_2(\mathrm{copy}(L_2))$. From the form of $u$, as in the proof above, we get $z_k \in L(G)$. Thus the statement holds with the two language-morphism pairs interchanged.
3 Characterizing RE by means of PC grammar systems

We introduce here only the PC grammar systems with right-linear rules, working in the so-called non-returning mode (after communicating, a component of the system continues to rewrite the current string; it does not return to the axiom, as in the case of the returning mode; the reader is referred to [2], [4] for details).

A (right-linear) PC grammar system of degree $n$, $n \ge 1$, is a construct
$$\Gamma = (N, T, K, (P_1, S_1), \ldots, (P_n, S_n)),$$
where $N, T, K$ are pairwise disjoint alphabets, with $K = \{Q_1, \ldots, Q_n\}$, $S_i \in N$, and $P_i$ are finite sets of right-linear rules over $N \cup T \cup K$, $1 \le i \le n$, with the symbols in $K$ not allowed to appear in the left-hand member of rules; the elements of $N$ are nonterminal symbols, those of $T$ are terminals; the elements of $K$ are called query symbols; the pairs $(P_i, S_i)$ are the components of the system (often, the sets $P_i$ are called components). Note that the query symbols are associated in a one-to-one manner with the components. When discussing the type of the components in the Chomsky hierarchy, the query symbols are interpreted as nonterminals. (Therefore, the rules in the sets $P_i$, $1 \le i \le n$, are of the forms $A \to x$, $A \to xB$, with $A \in N$, $B \in N \cup K$, and $x \in T^*$.)

For $(x_1\alpha_1, \ldots, x_n\alpha_n)$, $(y_1\beta_1, \ldots, y_n\beta_n)$, with $x_i, y_i \in T^*$ and $\alpha_i, \beta_i \in N \cup K \cup \{\lambda\}$, $1 \le i \le n$ (we call such an $n$-tuple a configuration), and $\alpha_1 \ne \lambda$, we write $(x_1\alpha_1, \ldots, x_n\alpha_n) \Rightarrow (y_1\beta_1, \ldots, y_n\beta_n)$ if one of the following two cases holds:

(i) $\alpha_i \notin K$ for all $1 \le i \le n$; then, for $1 \le i \le n$, either $\alpha_i \in N$ and $y_i = x_i z$ for some rule $\alpha_i \to z\beta_i$ in $P_i$, or $\alpha_i = \lambda$, $\beta_i = \lambda$ and $x_i = y_i$;

(ii) there is $i$, $1 \le i \le n$, such that $\alpha_i = Q_{j_i}$, $1 \le j_i \le n$; for each such index $i$, if $\alpha_{j_i} \notin K$, then $y_i\beta_i = x_i x_{j_i} \alpha_{j_i}$; otherwise $y_i\beta_i = x_i\alpha_i$. For all $i$ such that $\alpha_i \notin K$ we have $y_i\beta_i = x_i\alpha_i$.

Point (i) defines a rewriting step (componentwise, synchronously, using one rule in all components whose current strings are not terminal), whereas (ii) defines a communication step: the query symbols $Q_{j_i}$ introduced by some components $P_i$ of the system are replaced by the associated strings $x_{j_i}\alpha_{j_i}$, provided that these strings do not contain further query symbols. The communication has priority over rewriting (a rewriting step is allowed only when no query symbol appears in the current configuration). The work of the system is blocked when circular queries appear, as well as when no query symbol is present but point (i) is not realized because a component cannot rewrite its sentential form, although it is a nonterminal string.

The language generated by the system $\Gamma$ is the language generated by its first component ($(P_1, S_1)$ above), when starting from $(S_1, \ldots, S_n)$, that is
$$L(\Gamma) = \{w \in T^* \mid (S_1, \ldots, S_n) \Rightarrow^* (w, \alpha_2, \ldots, \alpha_n),\ \text{for } \alpha_i \in (N \cup T \cup K)^*,\ 2 \le i \le n\}.$$
(No attention is paid to strings in the components $2, \ldots, n$ in the last configuration of a derivation; moreover, it is supposed that the work of $\Gamma$ stops when a terminal string is obtained by the first component.)

We denote by $NPC_n(RL)$ the family of languages generated by PC grammar systems as above (hence with arbitrary right-linear rules) of degree at most $n$, $n \ge 1$.

Lemma 1. For each regular language $L$ and morphism $h$ we have $h(\mathrm{copy}(L)) \in NPC_4(RL)$.

Proof. Let $L \subseteq V^*$ be generated by the regular grammar $G = (N, V, S, P)$, and $h : (V \cup \bar{V})^* \to U^*$. Assume the rules in $P$ labelled in a one-to-one manner with elements in a set $Lab$. We construct the PC grammar system
$$\Gamma = (N', U, K, (P_1, S_1), (P_2, S_2), (P_3, S_3), (P_4, S_4)),$$
$$N' = \{S_1, S_2, S_3, S_4\} \cup \{[a] \mid a \in V\} \cup \{[r, X] \mid r \in Lab,\ X \in N\},$$
$$P_1 = \{S_1 \to S_1,\ S_1 \to Q_2\} \cup \{[a] \to h(a)Q_3 \mid a \in V\},$$
$$P_2 = \{S_2 \to Q_4\} \cup \{[r, Y] \to h(a)Q_4 \mid r : X \to aY \in P\} \cup \{[a] \to [a] \mid a \in V\},$$
$$P_3 = \{S_3 \to Q_4\} \cup \{[r, Y] \to h(\bar{a})Q_4 \mid r : X \to aY \in P\} \cup \{[a] \to h(\bar{a}) \mid a \in V\},$$
$$P_4 = \{S_4 \to [r, X] \mid r : S \to aX \in P\} \cup \{[r, X] \to [r', Y] \mid r' : X \to aY \in P\} \cup \{[r, X] \to [a] \mid r \in Lab,\ X \to a \in P\}.$$
We obtain the equality $L(\Gamma) = h(\mathrm{copy}(L))$. At each step, $P_4$ introduces a label of a rule in $P$, following a correct derivation in $G$, and, also at each step, $P_2$ and $P_3$ ask for the symbol produced by $P_4$ and "translate" it as indicated by the current label and the morphism $h$. Finally, $P_1$, the master, collects the strings produced by $P_2, P_3$, concatenating them.

Combining this lemma with Theorems 1 and 2, we get the following result:

Theorem 3. For each language $L \in RE$ and each $n \ge 4$, we can find $L_1, L_2, L_3, L_4 \in NPC_n(RL)$ and a morphism $h$ such that
$$L = h(L_1 \cap L_2) = L_3 \backslash L_4.$$

The families $NPC_n(RL)$, $n \ge 4$, are closed under arbitrary morphisms. Therefore, if they are also closed under intersection, then they are themselves equal to $RE$. Although this is an open problem at the moment, it is not likely to be true.
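The rewriting and communication steps of a non-returning right-linear PC grammar system, as defined in this section, can be sketched in Python as follows. This is a minimal simulator under stated assumptions, not the construction of Lemma 1: sentential forms are tuples of symbols, the names `step` and `language` are hypothetical, and the two-component toy system at the end (generating strings of the form a^n b^n c) is an example chosen here for illustration and does not appear in the text.

```python
from itertools import product

def step(config, rules, queries, nonterminals):
    """All successors of a configuration; communication has priority."""
    last = [form[-1] if form else None for form in config]
    if any(s in queries for s in last):
        # Communication step: replace each satisfiable query by the queried string.
        new, progressed = [], False
        for form, s in zip(config, last):
            if s in queries and last[queries[s]] not in queries:
                new.append(form[:-1] + config[queries[s]])
                progressed = True
            else:
                new.append(form)
        return {tuple(new)} if progressed else set()  # circular queries block
    # Rewriting step: every non-terminal component must apply one rule.
    choices = []
    for i, (form, s) in enumerate(zip(config, last)):
        if s in nonterminals:
            options = [form[:-1] + rhs for lhs, rhs in rules[i] if lhs == s]
            if not options:
                return set()  # a component cannot rewrite: the system is blocked
            choices.append(options)
        else:
            choices.append([form])
    return set(product(*choices))

def language(start, rules, queries, nonterminals, max_steps):
    """Terminal strings reached by the first component within max_steps steps."""
    found, frontier = set(), {start}
    for _ in range(max_steps):
        successors = set()
        for config in frontier:
            successors |= step(config, rules, queries, nonterminals)
        frontier = set()
        for config in successors:
            first = config[0]
            if all(s not in nonterminals and s not in queries for s in first):
                found.add("".join(first))  # the work stops here
            else:
                frontier.add(config)
    return found

# Hypothetical toy system of degree 2 (component indices are 0-based here).
nonterminals = {"S1", "S2"}
queries = {"Q2": 1}  # Q2 asks for the string of the second component
rules = [
    [("S1", ("a", "S1")), ("S1", ("a", "Q2")), ("S2", ("c",))],
    [("S2", ("b", "S2"))],
]
start = (("S1",), ("S2",))
generated = language(start, rules, queries, nonterminals, max_steps=6)
print(sorted(generated, key=len))
```

In the toy system the first component counts a's while the second counts b's in lockstep; since the system is non-returning, the communicated string keeps its nonterminal S2, which the first component then rewrites to c.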
4 Final remarks

A characterization of RE languages similar to the previous ones and to those based on linear languages in [1] and [11] is given in [5]. It is based on the so-called twin-shuffle languages: for an alphabet $V$, consider the alphabet $\bar{V}$ of barred versions of symbols in $V$, and define the language
$$TS_V = \bigcup_{x \in V^*} (x \sqcup\!\sqcup\, \bar{x}),$$
where $\sqcup\!\sqcup$ is the shuffle operation ($u \sqcup\!\sqcup\, v = \{u_1 v_1 \cdots u_n v_n \mid n \ge 1,\ u = u_1 \cdots u_n,\ v = v_1 \cdots v_n,\ u_i, v_i \in V^*,\ 1 \le i \le n\}$). This is the twin-shuffle language over $V$. In [5] it is proved that each language $L \in RE$ can be written in the form $L = h(TS_V \cap R)$, where $h$ is a morphism, $TS_V$ is the twin-shuffle language over an alphabet $V$, and $R$ is a regular language.
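Membership in a twin-shuffle language reduces to comparing two projections: a string belongs to the shuffle of x and its barred version exactly when erasing the barred symbols leaves x and erasing the unbarred symbols leaves the barred version of x. The following Python sketch illustrates this test; the encoding (lowercase letters for V, uppercase letters for barred symbols) and the function name are assumptions made for the illustration.

```python
def in_twin_shuffle(w):
    """True if w is a shuffle of some x with its barred version (uppercase = barred)."""
    unbarred = "".join(s for s in w if s.islower())
    barred = "".join(s.lower() for s in w if s.isupper())
    return unbarred == barred

for w in ["aAbB", "abAB", "AabB", "aBbA", "aab"]:
    print(w, in_twin_shuffle(w))
```

The string "aBbA" is rejected because its unbarred projection is "ab" while its barred projection spells "ba".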
Observe the similarity of the three types of languages used in the characterizations of RE mentioned above, linear, copy and twin-shuffle: a copy of a string $x$ is concatenated with $\mathrm{mi}(\bar{x})$ in the case of linear languages and with $\bar{x}$ in the case of copy languages, whereas in the case of twin-shuffle languages $x$ is shuffled with $\bar{x}$. The shuffle operation is rather strong: guided by an intersection with a regular language, it replaces the general intersection used in the characterizations based on linear or copy languages.

Denote by $REG$, $LIN$, $CF$ the families of regular, linear, and context-free languages, respectively. We have noted in the Introduction that the following representation holds:
$$LIN = \{\{h(x\,\mathrm{mi}(\bar{x})) \mid x \in L\} \mid L \in REG,\ h \text{ a morphism}\}.$$
This suggests the family
$$COPY = \{\{h(x\bar{x}) \mid x \in L\} \mid L \in REG,\ h \text{ a morphism}\}.$$
In view of the results in this note and of the discussion above, the study of the family $COPY$ is of definite interest (closure properties, precise place in the Chomsky hierarchy, complexity, etc.). Remember also the fact that duplication is one of the most important non-context-free constructions in natural languages, much more frequent than the center-embedded constructions modeled by linear languages; see, e.g., [12], [17].

We only remark here that $REG \subset COPY$ and that $COPY$ is incomparable with both $LIN$ and $CF$: $\{a^n b^m a^n b^m \mid n, m \ge 1\} \in COPY - CF$ and $\{a^n b^m a b^m a^n \mid n, m \ge 1\} \in LIN - COPY$. We hope to return to the study of the family $COPY$ in another paper.
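The membership of the first witness language in COPY can be checked on small cases with a short Python sketch. The regular language a^+ b^+ and the bar-erasing morphism used below are our own choice of representation (the text does not state one), uppercase letters encode barred symbols, and the function name is hypothetical.

```python
def copy_image(n, m):
    """h(x bar(x)) for x = a^n b^m, where h erases the bars (uppercase = barred)."""
    x = "a" * n + "b" * m          # x belongs to the regular language a^+ b^+
    copied = x + x.upper()         # x followed by its barred version
    h = {"a": "a", "b": "b", "A": "a", "B": "b"}
    return "".join(h[s] for s in copied)

# Compare with the witness strings a^n b^m a^n b^m for small n, m.
ok = all(copy_image(n, m) == "a" * n + "b" * m + "a" * n + "b" * m
         for n in range(1, 5) for m in range(1, 5))
print(copy_image(2, 1), ok)
```

This only confirms the representation on finitely many strings; it says nothing about the non-context-freeness of the language, which is the other half of the claim.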
References

[1] B. Baker, R. Book, Reversal-bounded multipushdown machines, J. Computer System Sci., 8 (1974), 315-332.
[2] E. Csuhaj-Varjú, J. Dassow, J. Kelemen, Gh. Păun, Grammar Systems. A Grammatical Approach to Distribution and Cooperation, Gordon and Breach, London, 1994.
[3] J. Dassow, Gh. Păun, Regulated Rewriting in Formal Language Theory, Springer-Verlag, Berlin, Heidelberg, 1989.
[4] J. Dassow, Gh. Păun, G. Rozenberg, Grammar systems, in Handbook of Formal Languages (G. Rozenberg, A. Salomaa, eds.), Springer-Verlag, Heidelberg, 1997.
[5] J. Engelfriet, G. Rozenberg, Fixed point languages, equality languages, and representations of recursively enumerable languages, Journal of the ACM, 27 (1980), 499-518.
[6] H. Fernau, M. Holzer, Bidirectional cooperating distributed grammar systems, Techn. Report WSI-96-01, Univ. Tübingen, 1996.
[7] R. Freund, Gh. Păun, C. M. Procopiuc, O. Procopiuc, Parallel communicating grammar systems with context-sensitive components, in Artificial Life. Grammatical Models (Gh. Păun, ed.), Black Sea Univ. Press, Bucharest, 1995, 166-174.
[8] V. Geffert, A representation of recursively enumerable languages by two homomorphisms and a quotient, Theor. Computer Sci., 62 (1988), 235-249.
[9] L. Ilie, A. Salomaa, On regular characterizations of languages by grammar systems, Acta Cybern., 12, 4 (1996), 411-425.
[10] M. Latteux, P. Turakainen, On characterizations of recursively enumerable languages, Acta Informatica, 28 (1990), 179-186.
[11] M. Latteux, B. Leguy, B. Ratoandromanana, The family of one-counter languages is closed under quotient, Acta Informatica, 22 (1985), 579-588.
[12] A. Manaster Ramer, Uses and misuses of mathematics in linguistics, X Congreso de Lenguajes Naturales y Lenguajes Formales, Sevilla, 1994.
[13] V. Mihalache, Parallel communicating grammar systems with query words, Ann. Univ. Buc., Matem.-Inform. Series, 45, 1 (1996), 81-92.
[14] Gh. Păun, Characterizing recursively enumerable languages by grammar systems with leftmost derivation, submitted, 1997.
[15] Gh. Păun, L. Sântean, Parallel communicating grammar systems: the regular case, Ann. Univ. Buc., Series Matem.-Inform., 38 (1989), 55-63.
[16] O. Procopiuc, C. M. Ionescu, F. L. Tiplea, Parallel communicating grammar systems: the context-sensitive case, Intern. J. Computer Math., 49 (1993), 145-156.
[17] W. C. Rounds, A. Manaster Ramer, J. Friedman, Finding natural languages a home in formal language theory, in Mathematics of Language (A. Manaster Ramer, ed.), John Benjamins, Amsterdam, 1987, 349-360.
[18] B. Rovan, A framework for studying grammars, Proc. 10th MFCS Conf., LNCS 118, Springer-Verlag, Berlin, 1981, 473-482.
[19] A. Salomaa, Formal Languages, Academic Press, New York, 1973.
[20] D. Wood, Iterated a-NGSM maps and Γ systems, Inform. Control, 32 (1976), 1-26.
Turku Centre for Computer Science Lemminkaisenkatu 14 FIN-20520 Turku Finland http://www.tucs.abo.
University of Turku Department of Mathematical Sciences
Abo Akademi University Department of Computer Science Institute for Advanced Management Systems Research
Turku School of Economics and Business Administration Institute of Information Systems Science