DNA Computing, Sticker Systems, and Universality¹

Lila KARI

Department of Computer Science University of Western Ontario London, Ontario, Canada N6A 5B7

Gheorghe PĂUN

Institute of Mathematics of the Romanian Academy PO Box 1-764, 70700 București, Romania

Grzegorz ROZENBERG

Department of Computer Science, Leiden University PO Box 9512, 2300 RA Leiden, The Netherlands

Arto SALOMAA

Academy of Finland and Turku University Department of Mathematics 20500 Turku, Finland

Sheng YU

Department of Computer Science University of Western Ontario London, Ontario, Canada N6A 5B7

Abstract. We introduce sticker systems, a computability model that abstracts the computations based on Watson-Crick complementarity used in Adleman's DNA computing experiment [1]. Several types of sticker systems are shown to characterize (modulo a weak coding) the regular languages, hence the power of finite automata. One variant is proven to be equivalent to Turing machines. Another one is found to have a strictly intermediate power.

¹ Research supported by the Academy of Finland, project 11281, the Spanish Secretaria de Estado

1. Introduction

The sticker systems introduced here are language generating devices based on the sticker operation, which, in turn, is a model of the techniques used by L. Adleman in his successful experiment of computing a Hamiltonian path in a graph by using DNA [1]. We recall some details of the experiment in order to see the roots of our models.

DNA sequences are in fact double stranded (helicoidal) structures composed of four nucleotides, A (adenine), C (cytosine), G (guanine), and T (thymine), paired A-T, C-G according to the so-called Watson-Crick complementarity. If we have a single stranded sequence of A, C, G, T nucleotides, together with a single stranded sequence composed of the complementary nucleotides, the two sequences will be "glued" together (by hydrogen bonds), forming a double stranded DNA sequence. Figure 1 illustrates this operation.

    5'-AAACTGGAG-3'  +  3'-TTTGACCTC-5'

    yields

    AAACTGGAG
    TTTGACCTC

Figure 1

Using this biochemical reaction, Adleman proceeded as follows in searching for Hamiltonian paths in a graph:

- codify the nodes by single stranded DNA sequences of length 20 and put all these strings in a test tube;
- if the node $i$ is codified by the string $x_i$ and the node $j$ is codified by the string $x_j$, and there is an arrow from node $i$ to node $j$ in the graph, then add to the test tube a single stranded DNA sequence $y_{ij}$ such that if $x_i = x_i' x_i''$, $x_j = x_j' x_j''$, each of $x_i', x_i'', x_j', x_j''$ being strings of length 10, then $y_{ij} = y_{ij}' y_{ij}''$, where $y_{ij}'$ is the Watson-Crick complement of $x_i''$ and $y_{ij}''$ is the Watson-Crick complement of $x_j'$.

Due to the complementarity, the string $y_{ij}$ will match the corresponding parts of $x_i$ and $x_j$, linking them and producing in this way a sequence of length 40, as illustrated by Figure 2.

    x_i'    x_i''    x_j'    x_j''
            y_ij'    y_ij''

Figure 2
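The linking step described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: strands are plain strings, complementation is taken position by position as in Figure 1 (strand orientation is ignored), and the names `wc_complement`, `linker`, and `join_nodes` are hypothetical helpers, not part of Adleman's protocol.

```python
# Position-wise Watson-Crick pairing (orientation of the strands is ignored here).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def wc_complement(strand):
    """Return the position-wise Watson-Crick complement of a strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def linker(x_i, x_j, half=10):
    """Build y_ij: complement of the second half of x_i, then of the first half of x_j."""
    if len(x_i) != 2 * half or len(x_j) != 2 * half:
        raise ValueError("node codes must have length 2 * half")
    return wc_complement(x_i[half:]) + wc_complement(x_j[:half])


def join_nodes(x_i, x_j, y_ij, half=10):
    """Return the joined upper strand x_i x_j if y_ij matches the junction, else None."""
    junction = x_i[half:] + x_j[:half]
    if wc_complement(junction) != y_ij:
        return None
    return x_i + x_j


# Made-up node codes of length 20, used only for illustration.
x_i = "AAACTGGAGCTTACGGATCA"
x_j = "GGCATTACGATCGATTACGC"
y_ij = linker(x_i, x_j)
path = join_nodes(x_i, x_j, y_ij)
```

With node codes of length 20, the linker has length 20 and the joined strand has length 40, matching the count given in the text.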

This "domino game" can continue, identifying longer and longer paths in the considered graph. By a filtering procedure which is not of interest here, one can then check whether or not paths with specified properties exist (for instance, Hamiltonian paths).

We extract from this experiment only the basic ingredient: the operation of prolonging to the right a sequence of (single or double) symbols by using given single stranded strings, matching them with portions of the current sequence according to a complementarity relation. The formal model of this operation is the sticker operation defined in the following section.

This operation can be used in building a generative/computing device: start from a given set of incomplete double stranded sequences (axioms), plus two sets of single stranded complementary sequences. Iterating the right prolongation using elements of these latter sets, we get "computations" of possibly arbitrary length. Stop when a complete double stranded sequence is obtained, that is, when no "sticky end" still exists. We obtain in this way a language.

The generative power of several variants of such mechanisms is investigated here. The unrestricted case corresponds to the Adleman experiment and it is proved to characterize, modulo a weak coding, the regular languages. When an additional restriction is imposed, namely to use the same sequence of complementary strings from the two initial sets, then, rather surprisingly, we get a characterization of recursively enumerable languages. Whether or not such a restriction can be implemented in the DNA framework is a practical problem which we cannot answer, but provided that it can be done, computationally universal DNA "computers" could be designed just using the Watson-Crick complementarity, plus the techniques required by the mentioned restriction.

This recalls the results obtained in a series of papers (see references in [10], [13], [16]) about the possibility of designing universal (and programmable) DNA "computers" based on the operation of splicing, introduced in [9] as a model of the recombinant behavior of DNA under the influence of restriction enzymes and ligases.

3. The sticker operation

Let $V$ be an alphabet (a finite set of abstract symbols) endowed with a symmetric relation $\rho$ (of complementarity), $\rho \subseteq V \times V$. Let $\#$ be a special symbol not in $V$, denoting an empty space (the blank symbol). Using the elements of $V \cup \{\#\}$ we construct the composite symbols of the following sets:

$$\binom{V}{V}_\rho = \left\{ \binom{a}{b} \;\middle|\; a, b \in V,\ (a,b) \in \rho \right\}, \qquad \binom{\#}{V} = \left\{ \binom{\#}{a} \;\middle|\; a \in V \right\}, \qquad \binom{V}{\#} = \left\{ \binom{a}{\#} \;\middle|\; a \in V \right\}.$$

We denote

$$W_\rho(V) = \binom{V}{V}_\rho^{*}\, S(V),$$

where

$$S(V) = \binom{\#}{V}^{*} \cup \binom{V}{\#}^{*},$$

and we call the elements of $W_\rho(V)$ well-started sequences (in general, $X^*$ is the set of all strings, including the empty one denoted by $\lambda$, composed of elements of $X$, and $X^+$ is the set $X^* - \{\lambda\}$). Stated otherwise, the elements of $W_\rho(V)$ start with pairs of symbols in $V$, as selected by the complementarity relation, and end either with a suffix consisting of pairs $\binom{a}{\#}$ or with a suffix consisting of pairs $\binom{\#}{b}$, for $a, b \in V$ (the symbols $\binom{a}{\#}$, $\binom{\#}{b}$ are not mixed).

The sticker operation, denoted by $\mu$, is a partially defined mapping from $W_\rho(V) \times S(V)$ to $W_\rho(V)$, defined as follows. For $x \in W_\rho(V)$, $y \in S(V)$, $z \in W_\rho(V)$, we write

$$\mu(x, y) = z$$

if and only if one of the following cases holds:

1:

ak+r+1 : : : ak+r+p ; k+r x = ab 1 : : : ab k ak#+1 : : : a# # # 1 k # ; : : : y= # cr c1 z = ab 1 : : : ab k ack+1 : : : ack+r ak+#r+1 : : : ak+#r+p ; 1 k 1 r for k  0; r  1; p  1; ai 2 V; 1  i  k + r + p; bi 2 V; 1  i  k; ci 2 V; 1  i  r; and (ak+i ; ci ) 2 ; 1  i  r; 





2:



!

!

!

!

!

!













!

!



k+r x = ab 1 : : : ab k ak#+1 : : : a# ; 1 k # ::: # ; # : : : y= # cr+p cr cr+1 c1 z = ab 1 : : : ab k ack+1 : : : ack+r c# : : : c# ; 1 k 1 r r+1 r+p for k  0; r  0; p  0; r + p  1; ai 2 V; 1  i  k + r; bi 2 V; 1  i  k; ci 2 V; 1  i  r + p; and (ak+i ; ci ) 2 ; 1  i  r; 





!









!

!

!

!





!





!

!

3:

# ::: # ; x = ab 1 : : : ab k b # : : : b# bk+r+1 bk+r+p 1 k k+1 k+r y = c#1 : : : c#r ; z = ab 1 : : : ab k b c1 : : : cbk+r b # : : : b # ; k+r+p k+r+1 k+r k+1 k 1 for k  0; r  1; p  1; ai 2 V; 1  i  k; bi 2 V; 1  i  k + r + p; ci 2 V; 1  i  r; and (ci ; bk+i ) 2 ; 1  i  r; 





4:



!

!

!

!

!

!

!

!

!

!









x = ab 1 : : : ab k b # : : : b# ; k+r k+1 1 k y = c#1 : : : c#r cr#+1 : : : cr#+p ; cr+1 : : : cr+p ; z = ab 1 : : : ab k b c1 : : : b cr # # 1 k k+1 k+r for k  0; r  0; p  0; r + p  1; ai 2 V; 1  i  k; bi 2 V; 1  i  k + r; ci 2 V; 1  i  r + p; and (ci ; bk+i ) 2 ; 1  i  r: 





!









!

!

!

!

!



!

!

!

!

In case 1 we add complementary symbols on the lower level without completing all the blank spaces. In case 2 we complete the blank spaces on the lower level of $x$ and possibly add more composite symbols of the form $\binom{\#}{c}$. Cases 3 and 4 are symmetric to cases 1 and 2, respectively, completing blank spaces on the upper level of the string. Figure 3 pictorially illustrates these cases.

Figure 3 (schematic of Cases 1 to 4: in each case $y$ attaches at the right end of $x$, on the lower strand in Cases 1 and 2 and on the upper strand in Cases 3 and 4)

Note that in all cases the string $y$ must contain at least one composite symbol and that cases 2 and 4 allow the prolongation of "blunt" strings in $W_\rho(V)$: when $r = 0$, there is no blank position in $x$. Of course, for strings $x, y$ which do not satisfy any of the previous conditions, $\mu(x, y)$ is not defined.
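The four cases can be read as a single procedure: glue $y$ onto the sticky end of $x$ position by position, then keep whatever overhang is left on either side. The following Python sketch illustrates this reading. The representation is an assumption made for illustration only: a composite symbol is a pair `(upper, lower)`, the blank is the string `"#"`, a sequence is a list of such pairs, and the relation is a set of `(upper, lower)` pairs; the function names are hypothetical.

```python
BLANK = "#"


def split_well_started(x):
    """Split x into (double stranded prefix, sticky suffix, side of the overhang)."""
    k = 0
    while k < len(x) and BLANK not in x[k]:
        k += 1
    prefix, suffix = x[:k], x[k:]
    if not suffix:
        return prefix, [], None  # blunt sequence
    if all(up != BLANK and lo == BLANK for up, lo in suffix):
        return prefix, suffix, "upper"  # upper strand overhangs, lower row is blank
    if all(up == BLANK and lo != BLANK for up, lo in suffix):
        return prefix, suffix, "lower"  # lower strand overhangs, upper row is blank
    raise ValueError("x is not a well-started sequence")


def sticker(x, y, rho):
    """Return the result of the sticker operation on (x, y), or None if undefined."""
    if not y:
        return None  # y must contain at least one composite symbol
    y_is_lower = all(up == BLANK and lo != BLANK for up, lo in y)
    y_is_upper = all(up != BLANK and lo == BLANK for up, lo in y)
    if not (y_is_lower or y_is_upper):
        return None
    prefix, sticky, side = split_well_started(x)
    # y has to fill the blank row of the sticky end; a blunt x accepts either kind.
    if side == "upper" and not y_is_lower:
        return None
    if side == "lower" and not y_is_upper:
        return None
    r = min(len(sticky), len(y))
    glued = []
    for (x_up, x_lo), (y_up, y_lo) in zip(sticky, y):
        pair = (x_up, y_lo) if y_is_lower else (y_up, x_lo)
        if pair not in rho:
            return None  # complementarity violated
        glued.append(pair)
    # Cases 1 and 3 leave part of the old overhang; cases 2 and 4 may add a new one.
    return prefix + glued + sticky[r:] + y[r:]


RHO = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}
x = [("A", "T"), ("C", "#"), ("G", "#")]
z_case1 = sticker(x, [("#", "G")], RHO)
z_case2 = sticker(x, [("#", "G"), ("#", "C"), ("#", "A")], RHO)
z_bad = sticker(x, [("#", "A")], RHO)
```

Taking `r` as the shorter of the two lengths covers all cases at once: a longer sticky end gives cases 1 and 3, a longer (or equal) `y` gives cases 2 and 4, and a blunt `x` corresponds to $r = 0$.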

4. Sticker systems

Using the sticker operation we can define a generating/computing mechanism as follows. A sticker system is a construct

$$\gamma = (V, \rho, A, B_d, B_u),$$

where $V$ is an alphabet, $\rho \subseteq V \times V$ is a symmetric relation on $V$, $A$ is a finite subset of $W_\rho(V)$ (of axioms), and $B_d$ and $B_u$ are finite subsets of $\binom{\#}{V}^{+}$ and $\binom{V}{\#}^{+}$, respectively.

The idea behind such a machinery is the following. We start with the sequences in $A$ and we prolong them to the right with the strings in $B_d$, $B_u$ according to the sticker operation (the elements of $B_d$ are used on the lower row, down, and those of $B_u$ are used on the upper row). When no blank symbol is present, we obtain a string over the alphabet $\binom{V}{V}_\rho$. The language of all such strings is the language generated by $\gamma$.

Formally, we define this language as follows. For two strings $x, z \in W_\rho(V)$ we write

$$x \Longrightarrow z \quad \text{iff} \quad z = \mu(x, y) \text{ for some } y \in B_d \cup B_u.$$

We denote by $\Longrightarrow^*$ the reflexive and transitive closure of the relation $\Longrightarrow$. A sequence $x_1 \Longrightarrow x_2 \Longrightarrow \cdots \Longrightarrow x_k$, $x_1 \in A$, is called a computation in $\gamma$ (of length $k - 1$). A computation as above is complete if $x_k \in \binom{V}{V}_\rho^{*}$ (no blank symbol is present in the last string of composite symbols). The language generated by $\gamma$, denoted by $L(\gamma)$, is defined by

$$L(\gamma) = \left\{ w \in \binom{V}{V}_\rho^{*} \;\middle|\; x \Longrightarrow^* w,\ x \in A \right\}.$$

Therefore, only the complete computations are taken into account when defining $L(\gamma)$. Note that a complete computation can be continued, since we allow prolongations starting from blunt sequences.

One sees the close resemblance with the operations used in the Adleman experiment: $B_d$ corresponds to the codes of graph nodes, $B_u$ corresponds to the complementary strings identifying the arrows in the graph (or conversely). The fact that we also use here a given set of axioms (and, in several results, a weak coding is applied to the language of words of composite symbols generated by our devices) adds flexibility to the model and makes it more similar to usual generating mechanisms investigated in formal language theory.

A complete computation $x_1 \Longrightarrow x_2 \Longrightarrow \cdots \Longrightarrow x_k$, $x_1 \in A$, $x_k \in \binom{V}{V}_\rho^{*}$, with respect to $\gamma$, is said to be:

- primitive if no proper initial part of it is complete;
- balanced if in each step $x_i \Longrightarrow x_{i+1}$ one uses a sticker operation corresponding to cases 2 or 4 in Section 3 and, moreover, cases 2 and 4 alternate from one step to the next.

Thus, in a primitive computation we do not use sticker operations as in cases 1 to 4 with $p = 0$, except in the last step. In a balanced computation we allow $p = 0$, but from one step to the next we have to change the set $B_d$, $B_u$ from which we take the string to be used.

Let us denote by $L_p(\gamma)$, $L_b(\gamma)$, $L_{pb}(\gamma)$ the languages of the strings $w \in \binom{V}{V}_\rho^{*}$ obtained by a complete computation of $\gamma$ that is primitive, balanced, both primitive and balanced, respectively.
Assume now that the strings in the sets $B_d$, $B_u$ are labelled in a one-to-one manner by natural numbers from 1 to $card(B_\alpha)$, $\alpha \in \{d, u\}$; denote by $e_\alpha : B_\alpha \longrightarrow \{1, \ldots, card(B_\alpha)\}$, $\alpha \in \{d, u\}$, the labellings. For a computation

$$D : x_1 \Longrightarrow x_2 \Longrightarrow \cdots \Longrightarrow x_k, \quad x_1 \in A,\ x_k \in \binom{V}{V}_\rho^{*},$$

and for $1 \le j \le k - 1$, we denote

$$e_\alpha(x_j \Longrightarrow x_{j+1}) = \begin{cases} e_\alpha(y), & \text{if } x_{j+1} = \mu(x_j, y),\ y \in B_\alpha, \\ \lambda, & \text{otherwise,} \end{cases}$$

and we define

$$e_\alpha(D) = e_\alpha(x_1 \Longrightarrow x_2)\, e_\alpha(x_2 \Longrightarrow x_3) \cdots e_\alpha(x_{k-1} \Longrightarrow x_k), \quad \text{for } \alpha \in \{d, u\}.$$

We say that $e_d(D)$ is the d-control word and $e_u(D)$ is the u-control word associated with $D$. A computation $D$ such that $e_d(D) = e_u(D)$ is called coherent. When $|e_d(D)| = |e_u(D)|$ (where $|x|$ is the length of the string $x$) we say that $D$ is a fair computation.

We denote by $L_c(\gamma)$ and $L_f(\gamma)$ the languages of the strings in $\binom{V}{V}_\rho^{*}$ that are obtained by a coherent complete computation and, respectively, by a fair complete computation in $\gamma$. Clearly, each coherent computation is also fair.

By definition, $L_\alpha(\gamma) \subseteq L(\gamma)$, for all $\alpha \in \{p, b, pb, c, f\}$.

We denote by SL, PSL, BSL, PBSL, CSL, FSL the families of languages of the form $L(\gamma)$, $L_p(\gamma)$, $L_b(\gamma)$, $L_{pb}(\gamma)$, $L_c(\gamma)$, $L_f(\gamma)$, respectively, defined as above. (By REG and RE we denote the families of regular and recursively enumerable languages, respectively.)

In the following sections we investigate these six families of languages generated by sticker systems. It would also be of interest to investigate coherent primitive, coherent balanced, coherent primitive and balanced languages, as well as fair primitive, fair balanced languages, etc. (Note that a balanced computation is not necessarily a fair one, because it can start and stop in the same set $B_d$, $B_u$.) We will not consider them in this paper.
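The two control words, and the coherent and fair conditions built on them, can be illustrated with a small Python sketch. It assumes a computation is summarized as the ordered list of strings it uses, each recorded as a pair `(side, label)` with `side` equal to `"d"` or `"u"`; this encoding and the function names are hypothetical and serve only as an illustration.

```python
def control_words(steps):
    """Return the d-control word and the u-control word of a computation.

    Each step contributes its label to the control word of its own side
    and the empty word to the other one.
    """
    e_d = [label for side, label in steps if side == "d"]
    e_u = [label for side, label in steps if side == "u"]
    return e_d, e_u


def is_coherent(steps):
    """Coherent: both control words are the same sequence of labels."""
    e_d, e_u = control_words(steps)
    return e_d == e_u


def is_fair(steps):
    """Fair: both control words have the same length."""
    e_d, e_u = control_words(steps)
    return len(e_d) == len(e_u)


coherent_run = [("d", 1), ("u", 1), ("u", 2), ("d", 2)]
fair_only_run = [("d", 1), ("u", 2)]
unfair_run = [("d", 1), ("d", 2), ("u", 1)]
```

The first run is coherent (and hence fair), the second is fair but not coherent, and the third is neither, which mirrors the remark that every coherent computation is fair but not conversely.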



5. Characterizing the regular languages

We now begin our investigations concerning the generative capacity of sticker systems. We first show in this section that many of the basic variants yield only regular languages. Then we show that each regular language can be represented as a weak coding of a language generated by a sticker system of one of these basic types. (A weak coding is a morphism $h : V_1^* \longrightarrow V_2^*$ such that $h(a) \in V_2 \cup \{\lambda\}$ for all $a \in V_1$. If $h(a) \in V_2$ for all $a \in V_1$, then $h$ is called a coding.)

Lemma 1. $SL \subseteq REG$.

Proof. Let $\gamma = (V, \rho, A, B_d, B_u)$ be a sticker system. We denote

$$d = \max\{|x| \mid x \in A \cup B_d \cup B_u\}.$$

We construct the right-linear grammar $G = (N, T, S, P)$ with

$$N = \left\{ \left[\binom{a_1}{\#} \cdots \binom{a_k}{\#}\right], \left[\binom{\#}{a_1} \cdots \binom{\#}{a_k}\right] \;\middle|\; a_i \in V,\ 1 \le i \le k,\ 1 \le k \le d \right\} \cup \{S, X\}, \qquad T = \binom{V}{V}_\rho,$$

and with $P$ containing the following rules:

1.1) $S \to \binom{a_1}{b_1} \cdots \binom{a_n}{b_n} \left[\binom{a_{n+1}}{\#} \cdots \binom{a_{n+k}}{\#}\right]$, for $\binom{a_1}{b_1} \cdots \binom{a_n}{b_n} \binom{a_{n+1}}{\#} \cdots \binom{a_{n+k}}{\#} \in A$;

1.2) $S \to \binom{a_1}{b_1} \cdots \binom{a_n}{b_n} \left[\binom{\#}{b_{n+1}} \cdots \binom{\#}{b_{n+k}}\right]$, for $\binom{a_1}{b_1} \cdots \binom{a_n}{b_n} \binom{\#}{b_{n+1}} \cdots \binom{\#}{b_{n+k}} \in A$;

1.3) $S \to \binom{a_1}{b_1} \cdots \binom{a_n}{b_n} X$, for $\binom{a_1}{b_1} \cdots \binom{a_n}{b_n} \in A$.

In all cases, $a_i, b_i \in V$, $1 \le i \le n + k$, and $k \ge 1$, $n \ge 0$.

2.1) $\left[\binom{a_1}{\#} \cdots \binom{a_n}{\#}\right] \to \binom{a_1}{b_1} \cdots \binom{a_m}{b_m} \left[\binom{a_{m+1}}{\#} \cdots \binom{a_n}{\#}\right]$, for $\binom{\#}{b_1} \cdots \binom{\#}{b_m} \in B_d$ with $m < n$;

2.2) $\left[\binom{a_1}{\#} \cdots \binom{a_n}{\#}\right] \to \binom{a_1}{b_1} \cdots \binom{a_n}{b_n} X$, for $\binom{\#}{b_1} \cdots \binom{\#}{b_n} \in B_d$;

2.3) $\left[\binom{a_1}{\#} \cdots \binom{a_n}{\#}\right] \to \binom{a_1}{b_1} \cdots \binom{a_n}{b_n} \left[\binom{\#}{b_{n+1}} \cdots \binom{\#}{b_m}\right]$, for $\binom{\#}{b_1} \cdots \binom{\#}{b_m} \in B_d$ with $m > n$.

(We prolong the current terminal string of symbols $\binom{a}{b} \in \binom{V}{V}_\rho$ to the right, using an element of $B_d$.)

3.1) $\left[\binom{\#}{b_1} \cdots \binom{\#}{b_n}\right] \to \binom{a_1}{b_1} \cdots \binom{a_m}{b_m} \left[\binom{\#}{b_{m+1}} \cdots \binom{\#}{b_n}\right]$, for $\binom{a_1}{\#} \cdots \binom{a_m}{\#} \in B_u$ with $m < n$;

3.2) $\left[\binom{\#}{b_1} \cdots \binom{\#}{b_n}\right] \to \binom{a_1}{b_1} \cdots \binom{a_n}{b_n} X$, for $\binom{a_1}{\#} \cdots \binom{a_n}{\#} \in B_u$;

3.3) $\left[\binom{\#}{b_1} \cdots \binom{\#}{b_n}\right] \to \binom{a_1}{b_1} \cdots \binom{a_n}{b_n} \left[\binom{a_{n+1}}{\#} \cdots \binom{a_m}{\#}\right]$, for $\binom{a_1}{\#} \cdots \binom{a_m}{\#} \in B_u$ with $m > n$.

(We prolong the current terminal string of symbols $\binom{a}{b} \in \binom{V}{V}_\rho$ to the right, using an element of $B_u$.)

In all rules of types 2.i), 3.i), $i = 1, 2, 3$, the symbols $\left[\binom{a_1}{\#} \cdots \binom{a_n}{\#}\right]$ and $\left[\binom{\#}{b_1} \cdots \binom{\#}{b_n}\right]$, respectively, are arbitrary symbols in $N$.

4.1) $X \to \left[\binom{a_1}{\#} \cdots \binom{a_n}{\#}\right]$, for $\binom{a_1}{\#} \cdots \binom{a_n}{\#} \in B_u$, $n \ge 1$;

4.2) $X \to \left[\binom{\#}{b_1} \cdots \binom{\#}{b_n}\right]$, for $\binom{\#}{b_1} \cdots \binom{\#}{b_n} \in B_d$, $n \ge 1$.

(A complete computation can be continued using these rules.)

5) $X \to \lambda$.

It is easy to see that $L(G) = L(\gamma)$: at every step we can use an element of $B_d$ or an element of $B_u$, and the current sticky end is never longer than $d$. Therefore, the nonterminals in $N$ can control the process in the same way as the sticky ends. We conclude that $L(\gamma)$ is a regular language. □

Lemma 2. $PSL \subseteq REG$.

Proof. Starting from a sticker system $\gamma$, we construct a right-linear grammar $G'$ as in the previous proof, but without using the nonterminal symbol $X$ (this means that the rules of types 1.3), 2.2), and 3.2) become terminal rules, and the rules of types 4.1), 4.2), and 5) are no longer used). In this way, no complete computation in $\gamma$ can be continued by the corresponding derivation in $G'$, that is, $L(G') = L_p(\gamma)$. Consequently, $L_p(\gamma) \in REG$. □

Lemma 3. $BSL \subseteq REG$.

Proof. We proceed as in the proof of Lemma 1, but instead of using one symbol $X$ we consider two nonterminals $X_u$, $X_d$. Then we introduce rules of type 1.3) with both $X_u$ and $X_d$ instead of $X$, in rules of type 2.2) we replace $X$ with $X_d$, in rules of type 3.2) we replace $X$ with $X_u$, in rules of type 4.1) we replace $X$ with $X_u$, and in rules of type 4.2) we replace $X$ with $X_d$; moreover, the rules of types 2.1) and 3.1) are removed; finally, instead of $X \to \lambda$ we introduce both rules $X_d \to \lambda$ and $X_u \to \lambda$.

In this way, only balanced computations in $\gamma$ are simulated in the obtained grammar. Denoting this grammar by $G''$, we obtain $L(G'') = L_b(\gamma)$. Therefore, $L_b(\gamma) \in REG$. □

Combining the ideas of the proofs of Lemmas 2 and 3 we get:

Lemma 4. $PBSL \subseteq REG$.

Modulo a weak coding, the opposite inclusions are also true.

Lemma 5. Every regular language can be represented as a weak coding of a language in $SL \cap PSL \cap BSL \cap PBSL$.

Proof. Consider a regular grammar $G = (N, T, S, P)$, assume it $\lambda$-free (at most the $\lambda$-rule $S \to \lambda$ is present, and then $S$ does not appear in the right hand side of the rules), and construct the sticker system

$$\gamma = (V, \rho, A, B_d, B_u),$$

with

$$V = \{[X, a]_i \mid X \in N,\ a \in T,\ i = 1, 2\} \cup \{(X, a)_i \mid X \in N,\ a \in T,\ i = 1, 2\} \cup \{[Z, \lambda], (Z, \lambda)\},$$

where $Z$ is a new symbol;

$$\rho = \{([X, a]_i, (X, a)_i) \mid X \in N,\ a \in T,\ i = 1, 2\} \cup \{([Z, \lambda], (Z, \lambda))\},$$

$$A = \left\{ \binom{[S, a]_1}{\#} \;\middle|\; S \to aX \in P,\ a \in T,\ X \in N \right\} \cup \left\{ \binom{[S, a]_1}{(S, a)_1} \;\middle|\; S \to a \in P,\ a \in T \right\} \cup \{\lambda \mid S \to \lambda \in P\},$$

$$B_d = \left\{ \binom{\#}{(X, a)_1}\binom{\#}{(Y, b)_2} \;\middle|\; X \to aY \in P, \text{ and } Y \to bY' \in P \text{ or } Y \to b \in P,\ X, Y, Y' \in N,\ a, b \in T \right\}$$
$$\cup \left\{ \binom{\#}{(X, a)_1}\binom{\#}{(Z, \lambda)} \;\middle|\; X \to a \in P,\ a \in T \right\} \cup \left\{ \binom{\#}{(Z, \lambda)}\binom{\#}{(Z, \lambda)} \right\},$$

$$B_u = \left\{ \binom{[X, a]_2}{\#}\binom{[Y, b]_1}{\#} \;\middle|\; X \to aY \in P, \text{ and } Y \to bY' \in P \text{ or } Y \to b \in P,\ X, Y, Y' \in N,\ a, b \in T \right\}$$
$$\cup \left\{ \binom{[X, a]_2}{\#}\binom{[Z, \lambda]}{\#} \;\middle|\; X \to a \in P,\ a \in T \right\} \cup \left\{ \binom{[Z, \lambda]}{\#} \right\}.$$

Every computation has to start by using a string in $B_d$, it continues by alternately using elements of $B_d$ and $B_u$, and can be completed only by using the string $\binom{[Z, \lambda]}{\#}$ in $B_u$. A complete computation cannot continue further by using strings that contain symbols other than $[Z, \lambda]$ or $(Z, \lambda)$, because the relation $\rho$ allows only the matching of symbols $[X, a]_i$, $(X, a)_j$ with $i = j$.

Consider now the weak coding $g : \binom{V}{V}_\rho^{*} \longrightarrow T^*$ defined by

$$g\!\left(\binom{[X, a]_i}{(X, a)_i}\right) = a, \text{ for } X \in N,\ a \in T,\ i = 1, 2; \qquad g\!\left(\binom{[Z, \lambda]}{(Z, \lambda)}\right) = \lambda.$$

From the construction of $\gamma$ and the definition of $g$ one can easily see that $L(G) = g(L(\gamma)) = g(L_\alpha(\gamma))$, for all $\alpha \in \{p, b, pb, f\}$. □

For the case of primitive computations, the proof above can be modified in such a way as to have $L(G)$ equal to a coding of $L_p(\gamma)$: if we remove all occurrences of the symbols $\binom{[Z, \lambda]}{\#}$, $\binom{\#}{(Z, \lambda)}$, then the computation stops when completing an element of $\binom{V}{V}_\rho^{*}$. A computation simulating a derivation in $G$ is also balanced, hence a coding also suffices for the case of primitive and balanced computations.

In the non-primitive case we cannot avoid using the symbols $\binom{\#}{(Z, \lambda)}$, $\binom{[Z, \lambda]}{\#}$ (hence we cannot avoid using a weak coding in the statement of Lemma 5), because otherwise we can continue a computation corresponding to a derivation in $G$ by adding further symbols $\binom{[X, a]_i}{(X, a)_i}$, $i = 1, 2$; such symbols cannot be removed even if we then use a weak coding.

For a family $F$ of languages, we denote by $wcode(F)$ the family of languages of the form $g(L)$, for $L \in F$ and $g$ a weak coding.

Theorem 1. $REG = wcode(SL) = wcode(PSL) = wcode(BSL) = wcode(PBSL)$.

Proof. The family REG is closed under arbitrary morphisms, hence from Lemmas 1, 2, 3, 4 we obtain $wcode(F) \subseteq REG$, for $F \in \{SL, PSL, BSL, PBSL\}$. Lemma 5 proves the opposite inclusions. □

Thus, the Adleman way of computing cannot transgress the power of finite automata.

6. Characterizing the recursively enumerable languages

a modification of the classical proof of the characterization of recursively enumerable languages by means of equality sets and, the other, a specific construction with sticker systems. The essence of our proof can be described as follows. The notion of coherence comes very close to the idea of the twin-shuffle languages [19]. Hence, the generative capacity of the latter can be carried over to sticker systems. We now begin the details.

From the definitions, it is clear that all languages generated by sticker systems are context-sensitive. Moreover, from the Turing-Church thesis we have the following lemma:

Lemma 6. $wcode(CSL) \subseteq RE$.

In view of the fact that, at first sight, the operation of prolongation to the right based on matching symbols related by a complementarity relation does not look very powerful, the following result is rather surprising.

Lemma 7. Every recursively enumerable language can be represented as a weak coding of a language in the family CSL.

Our proof of this lemma is based on the following representation of an arbitrary recursively enumerable language $L \subseteq T^*$:

$$L = h_T(h_1(E(h_1, h_2)) \cap R), \qquad (1)$$

where $h_1$ and $h_2$ are two morphisms, $R$ is a regular language, $E(h_1, h_2)$ is the equality set of $h_1$ and $h_2$, and $h_T$ is a special projective morphism defined by

$$h_T(a) = \begin{cases} a, & \text{if } a \in T, \\ \lambda, & \text{if } a \notin T. \end{cases}$$

A representation which is very similar to the above has been shown in [18], [19]:

$$L = h_T(E(h_1, h_2) \cap R). \qquad (2)$$

The difference between (1) and (2) is that (1) uses the $h_1$ image of the equality set of $h_1$ and $h_2$, i.e., $h_1(E(h_1, h_2))$ $(= h_2(E(h_1, h_2)))$, but (2) uses the equality set itself, i.e., $E(h_1, h_2)$. Unfortunately, we have found neither a proof for (1) in the literature nor a way to derive (1) from (2) directly. We give a proof for (1) in the following, which is a modification of the proof for (2) in [19] (Theorem 6.9).

Lemma 8. For each recursively enumerable language $L \subseteq T^*$, there exist two $\lambda$-free morphisms $h_1, h_2 : \Sigma_2^* \to \Sigma_1^*$, a regular language $R \subseteq \Sigma_1^*$, and a projection $h_T : \Sigma_1^* \to T^*$ such that

$$L = h_T(h_1(E(h_1, h_2)) \cap R). \qquad (3)$$

Proof. Let $L$ be an arbitrary recursively enumerable language generated by a phrase-structure grammar $G = (N, T, P, S)$, where $N$ and $T$ are the finite sets of nonterminals and terminals, respectively, $P$ is the finite set of productions $p_i : \alpha_i \to \beta_i$,

and $S \in N$ is the starting nonterminal. Without loss of generality, we assume that for each production $p_i : \alpha_i \to \beta_i$, $\beta_i \ne \lambda$, except for the production $S \to \lambda$ if $\lambda \in L$.

Define $T' = \{a' \mid a \in T\}$, $T'' = \{a'' \mid a \in T\}$, and $P' = \{p' \mid p \in P\}$. Denote by $V$ and $V_1$ the sets $N \cup T$ and $N \cup T'$, respectively. For notational purposes, we also define a morphism $d : V^* \to V_1^*$ by $d(A) = A$ for $A \in N$ and $d(a) = a'$ for $a \in T$. Note that $d$ is a bijection; thus, the inverse of $d$, $d^{-1}$, is well defined. Let

$$\Sigma_1 = V \cup T' \cup \{B, F, \$\}, \qquad (4)$$
$$\Sigma_2 = \Sigma_1 \cup T'' \cup P \cup P', \qquad (5)$$

where $B$, $F$, and $\$$ are not in $V$, $V_1$, or $V_2$. The morphisms $h_1, h_2 : \Sigma_2^* \to \Sigma_1^*$, depending on $G$, are defined by the following:

(i) $h_1(B) = BS\$$, $h_2(B) = B$;
(ii) $h_1(\$) = \$$, $h_2(\$) = \$$;
(iii) $h_1(p_i) = d(\beta_i)$, $h_2(p_i) = d(\alpha_i)$, for $p_i : \alpha_i \to \beta_i \in P$;
(iv) $h_1(p_i') = \beta_i$, $h_2(p_i') = d(\alpha_i)$, for $p_i : \alpha_i \to \beta_i \in P$;
(v) $h_1(A) = A$, $h_2(A) = A$, for $A \in N$;
(vi) $h_1(a') = a'$, $h_2(a') = a'$, for $a' \in T'$;
(vii) $h_1(a'') = a$, $h_2(a'') = a'$, for $a'' \in T''$;
(viii) $h_1(a) = F$, $h_2(a) = a$, for $a \in T$;
(ix) $h_1(\$') = F$, $h_2(\$') = \$$;
(x) $h_1(F) = F$, $h_2(F) = FF$.

The regular language $R$ is defined by the regular expression

$$BS(\$V_1^*)^*\, \$\, T^* F^+.$$

Note that $\alpha_i, \beta_i \ne \lambda$ for all $i$ above. So, both $h_1$ and $h_2$ are $\lambda$-free morphisms. If $\lambda \in L$, then we introduce an additional symbol $@$ to $\Sigma_2$ and define

$$h_1(@) = h_2(@) = BS\$F.$$

It is easy to see that by defining $h_1(@)$ and $h_2(@)$, we do not introduce any other new words to $h_1(E(h_1, h_2)) \cap R$. Therefore, we assume that $\lambda \notin L$ in the following arguments. We define $h_T : \Sigma_1^* \to T^*$ by

$$h_T(a) = \begin{cases} a, & \text{if } a \in T, \\ \lambda, & \text{if } a \notin T. \end{cases}$$

Now we show that the equation (3) holds.

Our proof that $x \in L$ implies $x \in h_T(h_1(E(h_1, h_2)) \cap R)$ is similar to the one for Theorem 6.9 of [19]. So, we omit the formal proof. Instead, we use an example to illustrate the construction.

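Let $L$ be generated by the following phrase-structure grammar $G$:

$$p_1 : S \to ACCC, \quad p_2 : CC \to CD, \quad p_3 : AC \to a, \quad p_4 : DC \to ACC, \quad p_5 : ACC \to C, \quad p_6 : C \to b.$$

A derivation sequence for the word $ab$ is

$$S \Longrightarrow ACCC \Longrightarrow ACDC \Longrightarrow aDC \Longrightarrow aACC \Longrightarrow aC \Longrightarrow ab. \qquad (6)$$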

According to this derivation sequence, we define

$$x = B\, p_1\, \$\, A\, p_2\, C\, \$\, p_3\, D\, C\, \$\, a'\, p_4\, \$\, a'\, p_5\, \$\, a''\, p_6'\, \$'\, a\, b\, F F F.$$

By the above definitions of $h_1$ and $h_2$, we have

$$h_1(x) = h_2(x) = BS\$ACCC\$ACDC\$a'DC\$a'ACC\$a'C\$abFFFFFF.$$

Then, clearly, $x \in E(h_1, h_2)$, $h_1(x) \in R$, and $h_T(h_1(x)) = ab$. Thus, $ab \in h_T(h_1(E(h_1, h_2)) \cap R)$.

Conversely, let $w \in h_T(h_1(E(h_1, h_2)) \cap R)$, i.e., $w = h_T(y)$ for some $y \in h_1(E(h_1, h_2)) \cap R$. Then by the definition of $R$, $y$ is of the form

$$BS\$y_1\$y_2\$ \cdots \$y_t F^l,$$

where $y_1, \ldots, y_{t-1} \in V_1^*$, $y_t \in T^*$, and $l > 0$. Let $y = h_1(x)$ for some $x \in E(h_1, h_2)$. Then

$$x = B x_1 \$ x_2 \$ \cdots \$ x_t \$' x_{t+1} F^m$$

such that $h_2(x_1) = S$, $h_1(x_i) = h_2(x_{i+1}) = y_i$, for $1 \le i \le t$, and $l = 2m$ and $h_1(x_{t+1}) = F^{m-1}$. Note that if $x_j = x_{j+1}$ for some $j$, $1 \le j < t$, then we can construct a new word $x'$ by deleting $x_j\$$ from $x$ so that $h_T(h_1(x') \cap R) = h_T(h_1(x) \cap R) = w$. So, without loss of generality, we assume that $x_j \ne x_{j+1}$ for all $j$, $1 \le j < t$. (It is clear that $x_t \ne x_{t+1}$.) Then the following are clear:

(1) $x_1 = p$ (or $x_1 = p'$ if $t = 1$) for some $p : S \to \beta$ in $P$,
(2) $x_i \in (V_1 \cup P)^* P (V_1 \cup P)^*$, for $2 \le i < t$,
(3) $x_t \in (V_2 \cup P')^*$,
(4) $x_{t+1} \in T^*$,
(5) $h_1(x_i) = y_i$ and $h_2(x_i) = y_{i-1}$, for $1 \le i \le t$ (letting $y_0 = S$).

By (2) and (5) above and (iii) of the definition of $h_1$ and $h_2$, it follows that $d^{-1}(y_{i-1}) \Longrightarrow_G^{+} d^{-1}(y_i)$;
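The worked example for the grammar $p_1, \ldots, p_6$ and the word $ab$ can be checked mechanically. The Python sketch below is an illustration only: symbols are modelled as string tokens, the two dictionaries spell out rules (i) to (x) for this particular grammar, and the block of $x$ that simulates $p_5$ is taken to be $a' p_5$, which is what $h_1(x) = h_2(x)$ requires. The names `H1`, `H2`, `apply_morphism`, and `project` are hypothetical.

```python
# Images under h1 and h2 for the example grammar (rules (i)-(x) instantiated).
H1 = {
    "B": ["B", "S", "$"], "$": ["$"], "$'": ["F"], "F": ["F"],
    "p1": ["A", "C", "C", "C"], "p2": ["C", "D"], "p3": ["a'"],
    "p4": ["A", "C", "C"], "p5": ["C"], "p6'": ["b"],
    "A": ["A"], "C": ["C"], "D": ["D"],
    "a'": ["a'"], "a''": ["a"], "a": ["F"], "b": ["F"],
}
H2 = {
    "B": ["B"], "$": ["$"], "$'": ["$"], "F": ["F", "F"],
    "p1": ["S"], "p2": ["C", "C"], "p3": ["A", "C"],
    "p4": ["D", "C"], "p5": ["A", "C", "C"], "p6'": ["C"],
    "A": ["A"], "C": ["C"], "D": ["D"],
    "a'": ["a'"], "a''": ["a'"], "a": ["a"], "b": ["b"],
}


def apply_morphism(table, word):
    """Apply a morphism given by a symbol table to a list of tokens."""
    image = []
    for symbol in word:
        image.extend(table[symbol])
    return image


def project(word, terminals):
    """The projection h_T: keep terminal symbols, erase everything else."""
    return [symbol for symbol in word if symbol in terminals]


x = ["B", "p1", "$", "A", "p2", "C", "$", "p3", "D", "C", "$",
     "a'", "p4", "$", "a'", "p5", "$", "a''", "p6'", "$'",
     "a", "b", "F", "F", "F"]

image1 = apply_morphism(H1, x)
image2 = apply_morphism(H2, x)
word = project(image1, {"a", "b"})
```

Both images spell out the successive sentential forms of derivation (6) separated by `$`, followed by six copies of `F`, and the projection recovers the derived word.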

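Proof of Lemma 7. Let $L \subseteq T^*$ be an arbitrary recursively enumerable language. By Lemma 8, $L = h_T(h_1(E(h_1, h_2)) \cap R)$ for some $\lambda$-free morphisms $h_1, h_2 : \Sigma_2^* \to \Sigma_1^*$, regular language $R \subseteq \Sigma_1^*$, and projection $h_T : \Sigma_1^* \to T^*$ defined by

$$h_T(X) = \begin{cases} X, & \text{if } X \in T, \\ \lambda, & \text{otherwise.} \end{cases}$$

Let $\Sigma_2 = \{b_0, b_1, \ldots, b_{n-1}\}$, for some integer $n > 0$, and let $R$ be accepted by a deterministic finite automaton $M = (Q, \Sigma_1, \delta, q_0, F)$, where $Q = \{q_0, q_1, \ldots, q_{m-1}\}$, for some $m > 0$. We construct the sticker system

$$\gamma = (V, \rho, A, B_d, B_u),$$

where

$$V = \Sigma_1 \cup Q \cup \{[q, j] \mid q \in Q,\ 0 \le j \le m - 1\},$$

$$\rho = \{(X, X) \mid X \in \Sigma_1\} \cup \{(q, q), ([q, j], q), (q, [q, k]), ([q, j], [q, k]) \mid q \in Q,\ 0 \le j, k \le m - 1\},$$

$$A = \left\{ \binom{q_0}{\#} \right\},$$

$$B_d = \left\{ \binom{\#}{[q_{i_0}, j]}\binom{\#}{a_1}\binom{\#}{q_{i_1}}\binom{\#}{q_{i_1}}\binom{\#}{a_2}\binom{\#}{q_{i_2}}\binom{\#}{q_{i_2}} \cdots \binom{\#}{q_{i_{t_i - 1}}}\binom{\#}{q_{i_{t_i - 1}}}\binom{\#}{a_{t_i}}\binom{\#}{q_{i_{t_i}}} \;\middle|\; a_1 a_2 \cdots a_{t_i} = h_2(b_i),\ b_i \in \Sigma_2,\ 0 \le j \le m - 1,\ \delta(q_{i_k}, a_{k+1}) = q_{i_{k+1}},\ 0 \le k < t_i \right\}$$
$$\cup \left\{ \binom{\#}{q_i}\binom{\#}{q_i} \;\middle|\; q_i \in F \right\},$$

$$B_u = \left\{ \binom{a_1}{\#}\binom{[q_{i_1}, j]}{\#}\binom{q_{i_1}}{\#}\binom{a_2}{\#}\binom{q_{i_2}}{\#}\binom{q_{i_2}}{\#} \cdots \binom{a_{t_i}}{\#}\binom{q_{i_{t_i}}}{\#}\binom{q_{i_{t_i}}}{\#} \;\middle|\; a_1 a_2 \cdots a_{t_i} = h_1(b_i),\ b_i \in \Sigma_2,\ 0 \le j \le m - 1,\ \delta(q_{i_k}, a_{k+1}) = q_{i_{k+1}},\ 1 \le k < t_i \right\}$$
$$\cup \left\{ \binom{q_i}{\#} \;\middle|\; q_i \in F \right\}.$$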

Note that each string of $B_d$ or $B_u$ contains an integer $j$, $0 \le j < m$, which is paired with the state that appears first (from the left) in the string. The function of this integer will become clear later. Denote by $r_d(i, j, k)$, $0 \le i < n$ and $0 \le j, k < m$, a string in $B_d$ which is constructed with the word $h_2(b_i)$ and the state $q_j$ as the first state that is paired with the integer $k$. Similarly, denote by $r_u(i, j, k)$ a string in $B_u$. Assume that $F$ contains $l > 0$ states and

$$F = \{f_0, f_1, \ldots, f_{l-1}\}, \quad l \le m.$$

Then we denote by $r_d(n, 0, j)$ the string $\binom{\#}{f_j}\binom{\#}{f_j}$ and by $r_u(n, j, 0)$ the string $\binom{f_j}{\#}$. It is clear that $card(B_d) = card(B_u) = nm^2 + l$. Define the labelling mappings $e_d : B_d \to \{1, \ldots, card(B_d)\}$ by

$$e_d(r_d(i, j, k)) = i \cdot m^2 + j \cdot m + k + 1$$

and $e_u : B_u \to \{1, \ldots, card(B_u)\}$ by

$$e_u(r_u(i, j, k)) = i \cdot m^2 + k \cdot m + j + 1.$$

Let $u$ denote a string in $A$. By the above construction of $\gamma$, it is clear that $u \Longrightarrow^* z$ if and only if there is a sequence $q_{i_0} a_1 q_{i_1} a_2 \cdots q_{i_{t-1}} a_t q_{i_t}$ such that (1) $q_{i_0} = q_0$, $q_{i_t} \in F$, and $\delta(q_{i_{k-1}}, a_k) = q_{i_k}$; and (2) there is $x \in \Sigma_2^*$ such that $h_1(x) = h_2(x) = a_1 a_2 \cdots a_t$.

Consider also the weak coding $g : \binom{V}{V}_\rho^{*} \longrightarrow T^*$ defined by

$$g\!\left(\binom{\alpha}{\beta}\right) = \begin{cases} a, & \text{if } \alpha = \beta = a,\ a \in T, \\ \lambda, & \text{otherwise.} \end{cases}$$

We show that $g(L_c(\gamma)) = h_T(h_1(E(h_1, h_2)) \cap R)$ in the following.

Let $w \in h_T(h_1(E(h_1, h_2)) \cap R)$. Then there exist $x = b_{i_1} b_{i_2} \cdots b_{i_s} \in E(h_1, h_2)$ and $y = h_1(x) = h_2(x)$ such that $y \in R$ and $w = h_T(y)$. Let $y = a_1 a_2 \cdots a_t$. Then we have a state sequence $q_{j_1}, q_{j_2}, \ldots, q_{j_{t+1}}$ of $M$ such that $q_{j_1} = q_0$, $q_{j_{t+1}} = f_r \in F$ for some $r$, $0 \le r < l$, and $\delta(q_{j_k}, a_k) = q_{j_{k+1}}$ for $1 \le k \le t$. Note that $h_1(x) = h_1(b_{i_1}) \cdots h_1(b_{i_s}) = a_1 \cdots a_t$. Let $h_1(b_{i_k}) = a_{\alpha_k} \cdots a_{\alpha_{k+1} - 1}$, $1 \le k < s$, and $h_1(b_{i_s}) = a_{\alpha_s} \cdots a_t$. Similarly, let $h_2(b_{i_k}) = a_{\beta_k} \cdots a_{\beta_{k+1} - 1}$, $1 \le k < s$, and $h_2(b_{i_s}) = a_{\beta_s} \cdots a_t$. Then, there exists a computation $D$ such that $D$ uses the following strings from $B_d$:

$$r_d(i_1, j_{\beta_1}, j_{\alpha_1 + 1}),\ \ldots,\ r_d(i_s, j_{\beta_s}, j_{\alpha_s + 1}),\ r_d(n, 0, r)$$

and the following from $B_u$:

$$r_u(i_1, j_{\alpha_1 + 1}, j_{\beta_1}),\ \ldots,\ r_u(i_s, j_{\alpha_s + 1}, j_{\beta_s}),\ r_u(n, r, 0).$$

Let the result of the computation $D$ be $z$. Clearly, by the definition of $g$, $g(z) = w$. It is also easy to see that $e_d(D) = e_u(D)$ by the definitions of $e_d$ and $e_u$.

We now show that if $w \in g(L_c(\gamma))$, then $w \in h_T(h_1(E(h_1, h_2)) \cap R)$. We have $w = g(z)$ for some $z$ that is the result of a computation $D$ of $\gamma$ with $e_u(D) = e_d(D)$. By the construction of the sticker system $\gamma$, one can observe that $z$ corresponds to a sequence

$$q_{i_0}, a_1, q_{i_1}, a_2, \ldots, q_{i_{t-1}}, a_t, q_{i_t},$$

where $q_{i_0} = q_0$, $q_{i_t} \in F$, and $\delta(q_{i_{k-1}}, a_k) = q_{i_k}$, for $1 \le k \le t$. Then it is clear that $a_1 a_2 \cdots a_t \in R$. By the definition of $B_d$ and $B_u$ and the fact that $e_u(D) = e_d(D)$, it follows that $a_1 a_2 \cdots a_t = h_1(x) = h_2(x)$ for some $x \in \Sigma_2^*$. Denote $a_1 a_2 \cdots a_t$ by $y$. Then, $y \in h_1(E(h_1, h_2)) \cap R$. It is easy to show that $w = g(z) = h_T(y)$. Therefore, $w \in h_T(h_1(E(h_1, h_2)) \cap R)$. □

From Lemmas 6 and 7 we get

Theorem 2. $RE = wcode(CSL)$.

7. An intermediate case


For the fair computations we have

Theorem 3. $REG \subset wcode(FSL) \subset RE$.

Proof. The inclusion $REG \subseteq wcode(FSL)$ follows from the proof of Lemma 5. For the strictness, let us consider the sticker system

$$\gamma = (V, \rho, A, B_d, B_u),$$
$$V = \{a, a', b, b'\}, \qquad \rho = \{(a, a'), (b, b')\}, \qquad A = \left\{ \binom{a}{\#} \right\},$$
$$B_d = \left\{ \binom{\#}{a'},\ \binom{\#}{b'}\binom{\#}{b'} \right\}, \qquad B_u = \left\{ \binom{a}{\#}\binom{a}{\#},\ \binom{b}{\#} \right\}.$$

Starting with the unique axiom in $A$, we have to use $\binom{\#}{a'}$ from $B_d$ and we obtain a blunt sequence. We can continue with any string in $B_d$ and $B_u$. However, due to the complementarity restrictions, if a symbol $b$ or $b'$ is introduced, then we have to continue by using the composite symbols $\binom{\#}{b'}$ and $\binom{b}{\#}$ until obtaining again a blunt sequence. Thus, let us intersect the language $L_f(\gamma)$ with the regular language $\binom{a}{a'}^{+} \binom{b}{b'}^{+}$. We obtain a language consisting of strings of the form $\binom{a}{a'}^{2n+1} \binom{b}{b'}^{2m}$, $n, m \ge 1$, for which:

- the first string in $B_d$ is used $2n + 1$ times,
- the second string in $B_d$ is used $m$ times,
- the first string in $B_u$ is used $n$ times (one occurrence of $a$ is introduced by the axiom),
- the second string in $B_u$ is used $2m$ times.

Due to the fairness, we must have

$$2n + 1 + m = n + 2m,$$

which means that

$$n = m - 1.$$
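The counting argument can be replayed numerically. The Python sketch below is an illustration under the usage counts stated above; the function name `fair_exponents` and the search bound are hypothetical. It enumerates small values of $n$ and $m$, keeps the pairs for which the number of strings taken from $B_d$ equals the number taken from $B_u$, and reports the resulting exponents of $\binom{a}{a'}$ and $\binom{b}{b'}$.

```python
def fair_exponents(limit):
    """List the exponent pairs (2n+1, 2m) allowed by fairness, for 1 <= n, m <= limit."""
    pairs = []
    for n in range(1, limit + 1):
        for m in range(1, limit + 1):
            d_uses = (2 * n + 1) + m  # strings taken from B_d
            u_uses = n + 2 * m        # strings taken from B_u
            if d_uses == u_uses:
                pairs.append((2 * n + 1, 2 * m))
    return pairs


exponents = fair_exponents(4)
```

For every surviving pair the first exponent is exactly one less than the second, which is the dependency between the two blocks that rules out regularity.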

2m?1 b 2m a The language f 0 j m  1g is not a regular one, hence Lf ( ) is not a b0 regular. >From the Turing-Church thesis we have wcode(FSL)  RE . For the strictness, we shall prove that wcode(FSL)  MAT , where MAT  is the family of languages generated by context-free matrix grammars with arbitrary rules. Because MAT   RE ([6], [8]), we obtain wcode(FSL)  RE . Consider a sticker system = (V; ; A; Bd ; Bu). De ne V 0 = fa0 j a 2 V g; L(A) = f[a1 ; b01 ] : : : [ak ; b0k ]ak+1 : : : ak+r j k; r  0; k + r  1; a1 : : : ak ak+1 : : : ak+r 2 Ag b1 # bk # 0 0 0 0 [ f[a1 ; b1 ] : : : [ak ; bk ]bk+1 : : : bk+r j k; r  0; k + r  1; a1 : : : ak # : : : # 2 Ag b1 bk+r bk bk+1 [ f j  2 Ag; # 2 B g; : : : L(Bd) = fb01 : : : b0k j k  1; # d b b 

!











!









!

!

!

!

!

k 1! ! a1 : : : ak 2 B

L(Bu) = fa1 : : : ak j k  1; # u g: # Consider the new symbols s; d; d0 and construct the languages L1 = fxd0 j x 2 L(Bd )g+; L2 = fxd j x 2 L(Bu)g+; L01 = L1 ? t c+ ; 0

+

(? t is the shue operation: x ?t y = fx1 y1 : : : xn yn j n  1; x = x1 : : : xn; y = y1 : : : yn; xi ; yi 2 V ; 1  i  ng.) Clearly, L1 ; L2 are regular languages, hence also L3 is regular: the family REG is closed under the shue operation and under intersection. Consider the gsm Q which: { leaves unchanged the symbols [a; b0 ]; a; b 2 V , { replaces each pair ab0 by [a; b0 ]; a; b 2 V; { replaces each pair cd0 by [c; d0 ] and each pair dc by [d; c]. The language Q(L3 ) is also regular, over the alphabet

\[ U = \{[a, b'] \mid a, b \in V\} \cup \{[c, d'], [d, c]\}. \]

Let $G = (N, U, S, P)$ be a regular grammar for $Q(L_3)$ and construct the matrix grammar

\[ G' = \left(N', \binom{V}{V}, S', M\right), \]

where

\[
\begin{aligned}
N' &= N \cup U \cup \{S'\},\\
M &= \{(S' \to S)\} \cup \{(r) \mid r \in P\} \cup \left\{ \left([a, b'] \to \binom{a}{b}\right) \;\middle|\; a, b \in V \right\} \cup \{([c, d'] \to \lambda,\ [d, c] \to \lambda)\}.
\end{aligned}
\]

It is easy to see that $L(G')$ contains all the strings $w \in \binom{V}{V}^{*}$ such that $x \Longrightarrow^{*} w$ in $\gamma$, $x \in A$, and this is a fair derivation: the matrix $([c, d'] \to \lambda,\ [d, c] \to \lambda)$ checks whether or not the number of symbols $d$ and $d'$ is the same. The family $MAT^{\lambda}$ is closed under arbitrary morphisms ([5]), hence the image by a weak coding of $L_f(\gamma)$ is in the family $MAT^{\lambda}$. $\Box$

Open problem. Is the family FSL included in the family of context-free languages (or even in the family of linear languages)?
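The role of the matrix $([c, d'] \to \lambda,\ [d, c] \to \lambda)$ in the proof above can be illustrated by a small sketch. It is only an analogy under stated assumptions: the marker symbols are modelled as plain strings, and the helper names `erase_marker_pairs` and `markers_balanced` are hypothetical. Since $[c, d']$ and $[d, c]$ are nonterminals of $G'$, a derivation ends in a terminal string only if every marker has been erased, and the matrix can erase them only in pairs.

```python
LOWER_MARK = "[c,d']"   # produced once per string taken from B_d
UPPER_MARK = "[d,c]"    # produced once per string taken from B_u

def erase_marker_pairs(symbols):
    """Apply the erasing matrix as long as both markers are present."""
    remaining = list(symbols)
    while LOWER_MARK in remaining and UPPER_MARK in remaining:
        # One application of the matrix removes one marker of each kind.
        remaining.remove(LOWER_MARK)
        remaining.remove(UPPER_MARK)
    return remaining

def markers_balanced(symbols):
    """True iff no marker survives, i.e. both kinds occurred equally often."""
    left = erase_marker_pairs(symbols)
    return LOWER_MARK not in left and UPPER_MARK not in left

fair = ["[a,a']", LOWER_MARK, UPPER_MARK, "[b,b']"]
unfair = ["[a,a']", LOWER_MARK, LOWER_MARK, UPPER_MARK]
print(markers_balanced(fair), markers_balanced(unfair))
```

The paired erasure is what ties the number of strings used from $B_d$ to the number used from $B_u$, which is exactly the fairness condition.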











7. Final remarks

We have found characterizations of families REG and RE using variants of sticker systems. Several further research directions are of interest:

• Define variants of sticker systems which characterize families of languages different from REG and RE.

• Look for universal sticker systems, in the sense of universal Turing machines.

• Look for other variants which characterize RE, but not using the coherence restriction. This is again important to finding "realistic" models of DNA computers based on the sticker operation.

• Investigate the descriptional complexity (the size) of sticker systems and languages. Several parameters are very natural: the number of axioms, the length of the longest axiom, the number of elements in the sets $B_d$, $B_u$, the length of the longest string in $B_d$, $B_u$, etc. Do these measures give infinite hierarchies of sticker languages?
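The size parameters listed in the last item are easy to compute once a representation is fixed. The sketch below is one possible reading, not a definition from the paper: a sequence is modelled as a list of (upper, lower) symbol pairs with "#" for the empty position, its length is taken to be the number of such pairs, and the names `StickerSystem` and `size_measures` are hypothetical. The sample data is the fair sticker system from the example in the previous section, as read from the text.

```python
from dataclasses import dataclass

@dataclass
class StickerSystem:
    axioms: list   # each axiom: list of (upper, lower) symbol pairs
    b_d: list      # lower-strand stickers, same representation
    b_u: list      # upper-strand stickers, same representation

def size_measures(system):
    """Return the natural size parameters named in the text."""
    stickers = system.b_d + system.b_u
    return {
        "axioms": len(system.axioms),
        "max_axiom_len": max((len(x) for x in system.axioms), default=0),
        "b_d": len(system.b_d),
        "b_u": len(system.b_u),
        "max_sticker_len": max((len(x) for x in stickers), default=0),
    }

example = StickerSystem(
    axioms=[[("a", "#")]],
    b_d=[[("#", "a'")], [("#", "b'"), ("#", "b'")]],
    b_u=[[("a", "#"), ("a", "#")], [("b", "#")]],
)
print(size_measures(example))
```

Whether bounding any of these values yields a strict hierarchy is exactly the question left open above; the sketch only fixes what is being measured.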

References

[1] L. M. Adleman, Molecular computation of solutions to combinatorial problems, Science, 266 (Nov. 1994), 1021–1024.
[2] L. M. Adleman, On constructing a molecular computer, in DNA Based Computers (R. J. Lipton, E. B. Baum, Eds.), DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 27, American Math. Soc., 1996, 1–22.
[3] E. Csuhaj-Varjú, R. Freund, L. Kari, Gh. Păun, DNA computing based on splicing: universality results, First Annual Pacific Symp. on Biocomputing, Hawaii, Jan. 1996.
[4] K. Culik II, A purely homomorphic characterization of recursively enumerable sets, Journal of the ACM, 26 (1979), 345–350.
[5] J. Dassow, Gh. Păun, Regulated Rewriting in Formal Language Theory, Springer-Verlag, Berlin, Heidelberg, 1989.
[6] J. Dassow, Gh. Păun, A. Salomaa, Grammars with controlled derivations, in Handbook of Formal Languages (G. Rozenberg, A. Salomaa, Eds.), Springer-Verlag, 1997.
[7] R. Freund, L. Kari, Gh. Păun, DNA computing based on splicing: The existence of universal computers, Technical Report 185-2/FR-2/95, TU Wien, 1995.
[8] D. Hauschild, M. Jantzen, Petri nets algorithms in the theory of matrix grammars, Acta Informatica, 31 (1994), 719–728.
[9] T. Head, Formal language theory and DNA: an analysis of the generative capacity of specific recombinant behaviors, Bull. Math. Biology, 49 (1987), 737–759.
[10] T. Head, Gh. Păun, D. Pixton, Language theory and molecular genetics. Generative mechanisms suggested by DNA recombination, in Handbook of Formal Languages (G. Rozenberg, A. Salomaa, Eds.), Springer-Verlag, 1997.
[12] R. J. Lipton, Speeding up computations via molecular biology, in DNA Based Computers (R. J. Lipton, E. B. Baum, Eds.), DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 27, American Math. Soc., 1996, 67–74.
[13] Gh. Păun, Splicing. A challenge to formal language theorists, Bulletin EATCS, 57 (1995), 183–194.
[14] Gh. Păun, Regular extended H systems are computationally universal, J. Automata, Languages and Combinatorics, 1, 1 (1996), 27–36.
[15] Gh. Păun, G. Rozenberg, A. Salomaa, Computing by splicing, Theor. Computer Sci., 168 (1996), 321–336.
[16] Gh. Păun, A. Salomaa, DNA computing based on the splicing operation, Mathematica Japonica, 43, 3 (1996), 607–632.
[17] A. Salomaa, Formal Languages, Academic Press, New York, London, 1973.
[18] A. Salomaa, Equality sets for homomorphisms of free monoids, Acta Cybernetica, 4 (1978), 127–139.
[19] A. Salomaa, Jewels of Formal Language Theory, Computer Science Press, Rockville, Maryland, 1981.