DNA Computing, Matching Systems, and Universality

DNA Computing, Matching Systems, and Universality Lila Kari

Department of Computer Science University of Western Ontario London, Ontario, Canada N6A 5B7

Gheorghe Paun

Institute of Mathematics of the Romanian Academy PO Box 1 { 764, 70700 Bucuresti, Romania

Grzegorz Rozenberg

Department of Computer Science, Leiden University PO Box 9512, 2300 RA Leiden, The Netherlands

Arto Salomaa

Academy of Finland and Turku University Department of Mathematics 20014 Turku, Finland

Sheng Yu

Department of Computer Science University of Western Ontario London, Ontario, Canada N6A 5B7

Turku Centre for Computer Science TUCS Technical Report No 49 October 1996 ISBN 951-650-849-9 ISSN 1239-1891

Abstract We introduce the matching systems, a computability model which is an abstraction of the way of using the Watson-Crick complementarity when computing with DNA in the Adleman style, [1]. Several types of matching systems are shown to have the same power as nite automata, one variant is proven to be equivalent to Turing machines, and another one is found to have a strictly intermediate power.

TUCS Research Group

Mathematical Structures of Computer Science

1. Introduction The matching systems introduced here are language generating devices based on the matching operation, which, in turn, is a model of the techniques used by L. Adleman in his successful experiment of computing a Hamiltonian path in a graph by using DNA, [1]. We recall some details of the experiment in order to see the roots of our models. One knows that DNA sequences are in fact double stranded (helicoidal) structures composed of four nucleotides, A (adenine), C (cytosine), G (guanine), and T (thymine), paired A{T, C{G according to the so-called WatsonCrick complementarity. If we have a single stranded sequence of A, C, G, T nucleotides, together with a single stranded sequence composed of the complementary nucleotides, the two sequences will be \glued" together (by hydrogen bonds), forming a double stranded DNA sequence. Figure 1 illustrates this operation. 30{AAACTGGAG{50 + 50{TTTGACCTC{30

@ AAACTGGAG ? TTTGACCTC Figure 1 Using this biochemical reaction, Adleman has proceeded as follows, in searching Hamiltonian paths in a graph: { codify the nodes by single stranded DNA sequences of length 20 and put all these strings in a test tube, { if the node i is codi ed by the string xi and the node j is codi ed by the string xj , and there is an arrow from node i to node j in our graph, then add to the test tube a single stranded DNA sequence yij such that if xi = x0ix00i , xj = x0j x00j , each of x0i; x00i ; x0j ; x00j being strings of length 10, then yij = yij0 yij00 , where yij0 is the Watson-Crick complement of x00i and yij00 is the Watson-Crick complement of x0j . Due to the complementarity, the string yij will match the corresponding parts of xi and xj , linking them and producing in this way a sequence of length 40, as illustrated by Figure 2. 1

x0i

x00i

x0j

yij0

yij00

x00j

Figure 2 This \domino game" can continue, identifying longer and longer paths in the considered graph. By a ltering procedure which is not of interest here, one then can check whether or not paths with speci ed properties exist (for instance, Hamiltonian paths). We extract from this experiment only the basic ingredient: the operation of prolonging to the right a sequence of (single or double) symbols by using given single stranded strings, matching them with portions of the current sequence according to a complementarity relation. The formal model of this operation is the matching operation de ned in the following section. This operation can be used in building a generative/computing device: start from a given set of incomplete double stranded sequences (axioms), plus two sets of single stranded complementary sequences. Iterating the right prolongation using elements of these latter sets, we get \computations" of possibly arbitrary length. Associate to them strings by, for instance, a codi cation procedure which gives \meaning" to blocks of symbols. We obtain in this way a language. The generative power of several variants of such mechanisms is investigated here. The unrestricted case corresponds to the Adleman experiment and it is proved to characterize the regular languages. When an additional restriction is imposed, namely to use the same sequence of complementary strings from the two initial sets, then, rather surprisingly, we get a characterization of recursively enumerable languages. Whether or not such a restriction can be implemented in the DNA framework is a practical problem which we cannot answer, but providing that it can be done, computationally universal DNA \computers" could be imagined just using the Watson-Crick complementarity, plus the techniques required in the mentioned restriction. This reminds the results obtained in a series of papers (see references in [10], [13], [16]) about the possibility of designing universal (and programmable) DNA \computers" based on the operation of splicing, introduced in [9] as a model of the recombinant behavior of DNA under the in uence of 2

restriction enzymes and ligases.

3. The matching operation

Take an alphabet V (a nite set of abstract symbols) endowed with a symmetric relation (of complementarity), V V . Consider also a special symbol, #, not in V , denoting an empty space (the blank symbol). Using the elements of V [f#g we construct the composite symbols of the following sets: V = a j a; b 2 V; (a; b) 2 ; b V # = # ja2V ; a V

!

(

V = #

(

!

We denote

!

)

a ja2V : # !

)

W (V ) = VV S (V );

where

# V S (V ) = V [ # ; and we call the elements of this set well-started sequences (in general, X is the set of all strings, including the empty one denoted by , composed of elements of X , and X + is the set X ? fg). Stated otherwise, the elements of W (V ) start with pairs of symbols in V , as selected by the complementarity relation, and end either by a sux consisting of pairs #a or with a sux consisting of pairs #b , for a; b 2 V (the symbols #a ; #b are not mixed). The matching operation, denoted by , is a partially de ned mapping from W (V ) S (V ) to W (V ), de ned as follows. For x 2 W (V ); y 2 S (V ); z 2 W (V ), we write (x; y) = z if and only if one of the following cases holds: !

!

!

!

1:

x = ab1 : : : ab k 1 k

!

ak+1 : : : ak+r # # !

3

!

!

ak+r+1 : : : ak+r+p ; # # !

!

!

!

# ; : : : y= # cr c1 z = ab 1 : : : abk akc+1 : : : ack+r ak+#r+1 : : : ak+#r+p ; r 1 k 1 for k 0; r 1; p 1; ai 2 V; 1 i k + r + p; bi 2 V; 1 i k; ci 2 V; 1 i r; and (ak+i; ci) 2 ; 1 i r;

2:

!

!

!

!

!

!

!

!

!

x = ab 1 : : : ab k b# : : : b# b # : : : b # ; 1 k k+r+p k+r+1 k+r k+1 c1 : : : cr ; y= # # z = ab 1 : : : abk bc1 : : : cbk+r b # : : : b # ; k+r+p k+r+1 k+r k+1 k 1 for k 0; r 1; p 1; ai 2 V; 1 i k; bi 2 V; 1 i k + r + p; ci 2 V; 1 i r; and (ci; bk+i) 2 ; 1 i r;

!

!

!

!

!

!

!

!

!

!

4:

k+r k+1 ; : : : a# x = ab 1 : : : ab k a# k 1 # ::: # ; # : : : y= # cr+p cr cr+1 c1 z = ab 1 : : : abk akc+1 : : : ack+r c# : : : c# ; r+p r+1 r 1 k 1 for k 0; r 0; p 0; r + p 1; ai 2 V; 1 i k + r; bi 2 V; 1 i k; ci 2 V; 1 i r + p; and (ak+i; ci) 2 ; 1 i r;

!

3:

x = ab 1 : : : ab k b# : : : b# ; k+r k+1 k 1 y = c#1 : : : c#r cr#+1 : : : cr#+p ; cr+1 : : : cr+p ; z = ab 1 : : : abk bc1 : : : b cr # # k+r k k+1 1 for k 0; r 0; p 0; r + p 1;

!

!

!

!

!

!

!

!

4

!

!

ai 2 V; 1 i k; bi 2 V; 1 i k + r; ci 2 V; 1 i r + p; and (ci; bk+i ) 2 ; 1 i r: In case 1 we add complementary symbols on the lower level without completing all the blank spaces. In case 2 we complete the blank spaces on the lower level of x and we possibly still add some composite symbols of the form # . Cases 3 and 4 are symmetric to cases 1 and 2, respectively, completing c blank spaces on the upper level of the string. Figure 3 picturally illustrates these cases. !

Case 1: Case 2: Case 3: Case 4:

x y

x x

y y

x

y Figure 3

Note that in all cases the string y must contain at least one composite symbol and that cases 2 and 4 allow the prolongation of \blunt" strings in W (V ): when r = 0, there is no blank position in x. Of course, for strings x; y which do not satisfy any of the previous conditions, (x; y) is not de ned.

4. Matching systems

Using the matching operation we can de ne a generating/computing mechanism as follows: A matching system is a construct

= (V; ; T; f; g; A; Bd; Bu ); where V is an alphabet, V V is a symmetric relation on V , T is an alphabet, f : T ?! T [ fg is a weak coding, g : T ?! VV is a

5

morphism, A is a nite subset of W (V ) (of axioms), and Bd and Bu are + + V # nite sets such that Bd V and Bu # , repectively. !

!

The idea behind considering such a machinery is the following. We start with the sequences in A and we prolong them to the right according to the strings in Bd; Bu , using the matching operation (the elements of Bd are used on the lower row, down, and those of Bu are used on the upper row). When no blank symbol is present, we associate a string in T to the obtained sequence, by means of the mappings f; g: by g?1 we map blocks of the sequence into symbols of T , then certain symbols are erased by f or renamed. The language of all such strings is the language generated by . Formally, we de ne this language as follows. For two strings x; z 2 W (V ) we write

x =) z i z = (x; y) for some y 2 Bd [ Bu : We denote by =) the re exive and transitive closure of the relation =). A sequence x1 =) x2 =) : : : =) xk ; x1 2 A, is called a computation in

(of length k ? 1). A computation as above is complete when xk 2 VV (no blank symbol is present in the last string of composite symbols). The language generated by , denoted by L( ), is de ned by

x 2 A; w 2 VV g: Therefore, only the complete computations are taken into account when de ning L( ). Note that a complete computation can be continued, because we allow prolongations starting from blunt sequences. One sees the close resemblance with the operations used in the Adleman experiment: Bd corresponds to the codes of graph nodes, Bu corresponds to the complementary strings identifying the arrows in the graph (or conversely), the morphism g, by its associated inverse morphism g?1, assigns a \meaning" to the blocks corresponding to the graph nodes in the double stranded sequences. The use of such an inverse morphism cannot be avoided if we want to remain close to DNA manipulation by Watson-Crick complementarity, because we have to codify the information using blocks of nucleotides (not by single nucleotides). The fact that we use here also a given set of axioms and a weak coding f adds exibility to the model and makes it more similar to usual generating mechanisms investigated in the formal language theory.

L( ) = ff (g?1(w)) j x =) w;

6

A complete computation x1 =) x2 =) : : : =) xk ; x1 2 A; xk 2 VV , with respect to , is said to be: { primitive if no properly initial part of it is complete; { balanced if in each step xi =) xi+1 one uses a matching operation corresponding to cases 2 or 4 in Section 3. Moreover, cases 2 and 4 alternate from a step to the next one. Thus, in a primitive computation we do not use matching operations as in cases 1 { 4 with p = 0, except in the last step. In a balanced computation we allow p = 0, but from a step to the next one we have to change the set Bd; Bu from which we take the string to be used. Let us denote by Lp( ); Lb( ); Lpb( ) the language of the strings f (g?1(w)), for w obtained by a complete computation of which is primitive, balanced, both primitive and balanced, respectively. Assume now that the strings in the sets Bd; Bu are labelled in a one-toone manner by natural numbers from 1 to card(B), 2 fd; ug; denote by e : B ?! f1; : : :; card(B)g; 2 fd; ug, the labellings . For a computation D : x1 =) x2 =) : : : =) xk ; x1 2 A; xk 2 VV ; and for 1 j k ? 1, we denote e(y); if xj+1 = (xj ; y); y 2 B; e(xj =) xj+1) = ; otherwise, and we de ne e(D) = e(x1 =) x2)e(x2 =) x3) : : : e(xk?1 =) xk ); for 2 fd; ug: We say that ed(D) is the d-control word and eu(D) is the u-control word associated with D. A computation D such that ed(D) = eu(D) is called coherent. When jed(D)j = jeu(D)j (where jxj is the length of the string x) we say that D is a fair computation. We denote by Lc ( ); Lf ( ) the languages of the strings f (g?1(w)), for w obtained by a coherent complete computation, respectively by a fair complete computation in . Clearly, each coherent computation is also fair. By the de nition, L( ) L( ), for all 2 fp; b; pb; c; f g. We denote by ML, PML, BML, PBML, CML, FML the families of languages of the form L( ); Lp( ); Lb( ); Lpb( ); Lc ( ); Lf ( ), respectively, de ned as above.

7

In the following sections we shall investigate these six families of languages. We would also like to investigate coherent primitive, coherent balanced, and coherent primitive and balanced languages (and similarly for fair computations; note that a balanced computation is not necessarily a fair one, because it can start and stop in the same set Bd; Bu), but we do not consider them in this paper.

5. Characterizing the regular languages

We now begin our investigations concerning the generative capacity of matching systems. We will rst show in this section that many of the basic variants yield, in fact, only regular languages. Lemma 1. ML REG: Proof. Let = (V; ; T; f; g; A; Bd; Bu ) be a matching system and consider the languages Md = b1a01 : : : bk a0k bk+1 : : : bk+r j ab1 : : : ab k b # : : : b# 2 A; k+r k+1 k 1 k 0; r 0; ai 2 V; 1 i k; bi 2 V; 1 i k + rg; k+r 2 A; Mu = b1a01 : : :bk a0k a0k+1 : : :a0k+r j ab1 : : : ab k ak#+1 : : : a# 1 k k 0; r 0; ai 2 V; 1 i k + r; bi 2 V; 1 i kg; Ld = a1a2 : : : ak j a# a# : : : a# 2 Bd ; (

(

(

!

(

1! a1

!

2! a2 : : :

)

k ! ak

)

!

!

!

!

!

Lu = a1a2 : : :ak j # # # 2 Bu : Take the coding h : V ?! V 0 de ned by h(a) = a0; a 2 V (hence V 0 = fa0 j a 2 V g), and the regular language R = fab0 j a; b 2 V; (a; b) 2 g: Then we obtain L( ) = f (g1?1 (((MdLd ? t h(Lu )) [ (Ld ?t Muh(Lu ))) \ R)); where g1 : T ?! (V V 0) is the morphism de ned by g1(a) = b1c01b2c02 : : : brc0r if g(a) = cb1 cb2 : : : cbr ; 1 2 r and ? t is the shue operation (for x; y 2 V , x ?t y = fx1y1 : : : xnyn j n 1; x = x1 : : : xn; y = y1 : : :yn ; xi; yi 2 V ; 1 i ng).

8

The equality follows from the de nitions of Md; Mu ; Ld; Lu ; R and g1: Directly in Md; Mu and by the morphism h, the symbols appearing on the upper row of the composite symbols of A and Bu are primed; the intersection with R selects from all strings obtained by shuing the strings of Ld with those of h(Lu) (concatenated to the left with a string in Md or in Mu , which already contains intercalated primed and non-primed symbols), those strings consisting of alternating non-primed and primed symbols paired in the sense of the relation . Therefore, the strings in (Md Ld ? t h(Lu )) [ (Ld ? t Mu h(Lu )) represent correct computations in . Also, g1 corresponds to g, hence the equality is proved. The family REG is closed under all the operations used above, hence L( ) 2 REG. 2

Lemma 2. PML REG:

Proof. Starting from a matching system , construct Md ; Mu and Ld ; Lu as above. We introduce two new symbols c and d and extend h by h(d) = d. Instead of R, we de ne the regular language

R0 = fab0; acb0; ab0d j a; b 2 V; (a; b) 2 gfacb0d j a; b 2 V; (a; b) 2 g: Then, the strings in ((Mdfcg(Ldfcg) ? t h(Lu fdg)) [ ((Ldfcg ? t Mufdg(h(Lu )fdg))) \ R0 will describe primitive complete computations in (the symbols c and d are never \on the same position", excepting the end of the string). Consider the coding h0 de ned by h0(a) = a; h0(a0) = a0; a 2 V , h0(c) = h0(d) = . With g1 de ned as in the previous proof, we get

Lp( ) = f (g1?1(h0(((Md(Ld fcg) ? t h(Lu fdg)) [ (Ldfcg ? t Mu(h(Lu )fdg))) \ R0))); therefore Lp( ) 2 REG.

2

Lemma 3. PBML REG:

Proof. We repeat the previous argument, instead of R0 taking the regular language R0 \ R00, where

R00 = [(V V 0)(V fcgV 0)(V V 0)(V V 0fdg)]((V V 0)(V fcgV ) [ fg) [ [(V V 0)(V V 0fdg)(V V 0)(V fcgV 0)]((V V 0)(V V 0fdg) [ fg): The intersection of (Md fcg(Ldfcg) ? t h(Lufdg)) [ (Ldfcg ? t Mufdg(h(Lu )fdg)) with R0 \ R00 selects those complete 9

computations which are also balanced, hence

Lpb( ) = f (g1?1(h0(((Mdfcg(Ldfcg) ? t h(Lu fdg)) [ ((Ldfcg ? t Mufdg(h(Lu )fdg))) \ (R0 \ R00)))): Therefore Lpb( ) 2 REG.

2

As a particular case of the previous proof we get

Lemma 4. BML REG: Lemma 5. REG (ML \ PML \ BML \ PBML):

Proof. Consider a regular grammar G = (N; T; S; P ), assume it -free (at most the -rule S ! is present and then S does not appear on the right hand side of the rules), and construct the matching system

= (V; ; T 0; f; g; A; Bd; Bu); with

V = f[X; a]i j X 2 N; a 2 T; i = 1; 2g [ f(X; a)i j X 2 N; a 2 T; i = 1; 2g [ f[Z; ]; (Z; )g; where Z is a new symbol; = f([X; a]i; (X; a)i) j X 2 N; a 2 T; i = 1; 2g [ f([Z; ]; (Z; ))g; 0 T = T [ fcg; where c is a new symbol; f (a) = a; a 2 T; f (c) = ; X; a]i ; X 2 N; a 2 T; i = 1; 2; g(a) = ([X; a)i Z; ] ; g(c) = ([Z; ) [S; a]1 j S ! aX 2 P; a 2 T; X 2 N A = # a]1 j S ! a 2 P; a 2 T [ ([S; S; a)1 [ f j S ! 2 P g; # j X ! aY 2 P; and Y ! bY 0 2 P # Bd = (X; a)1 (Y; b)2 or Y ! b 2 P; X; Y; Y 0 2 N; a; b 2 T g !

!

(

(

(

!

)

!

!

)

!

10

[ [ Bu =

[ [

(

(

(

(

(

!

!

)

# j X ! a 2 P; a 2 T # (X; a)1 (Z; ) # # (Z; ) (Z; ) ; [X; a]2 [Y; b]1 j X ! aY 2 P; and Y ! bY 0 2 P # # or Y ! b 2 P; X; Y; Y 0 2 N; a; b 2 T g [X; a]2 [Z; ] j X ! a 2 P; a 2 T # # !

!)

!

!

!

!

[Z; ] #

!)

)

:

Every computation has to start by using a string in Bd, it continues by alternately using elements of Bd and Bu, and can be completed only by using the string [Z;#] in Bu . A complete computation cannot continue further by using strings that contain symbols other than [Z; ] or (Z; ), because the relation allows only the matching of symbols [X; a]i; (X; a)j with i = j . Continuing a computation with symbols [Z; ]; (Z; ) does not change the produced string. Therefore, L( ) = Lp( ) = Lb ( ) = Lpb( ) = Lf ( ). 2 !

Theorem 1. REG = ML = PML = BML = PBML: Thus, the Adleman way of computing cannot transgress the power of nite automata.

6. Characterizing the recursively enumerable languages We are now ready to give our main result: the family CML is universal, i.e., CML = RE . Our proof consists of two steps, one being a modi cation of the classical argument with equality sets and, the other, a speci c construction with matching systems. The essence of our proof can be described as follows. The notion of coherence comes very close to the idea of the twinshue languages [19]. Hence, the generative capacity of the latter can be carried over to matching systems. We now begin the details. From the Turing-Church thesis we have the following lemma:

Lemma 6. CML RE: 11

In view of the fact that, at the rst sight, the operation of prolongation to the right based on matching does not look very powerful, the following result is rather surprising.

Lemma 7. RE CML:

Our proof of this lemma is based on the following representation of an arbitrary recursively enumerable language L T :

L = hT (h1(E (h1; h2)) \ R): (1) where h1 and h2 are two morphisms, R is a regular language, E (h1; h2) is the equality set of h1 and h2, and hT is a special projective morphism de ned by if a 2 T ; hT (a) = a; ; if a 62 T: (

A representation which is very similar to the above has been shown in [18], [19]: L = hT (E (h1; h2) \ R): (2) The dierence between (1) and (2) is that (1) uses the h1 image of the equality set of h1 and h2, i.e., h1(E (h1; h2)) (= h2(E (h1; h2))), but (2) uses the equality set itself, i.e., E (h1; h2). Unfortunately, we have found neither a proof for (1) in the literature nor a way to derive (1) from (2) directly. We give a proof for (1) in the following, which is a modi cation of the proof for (2) in [19] (Theorem 6.9).

Lemma 8. For each recursively enumerable language L T , there exist two -free morphisms h1; h2 : 2 ! 1 , a regular language R 1 , and a projection hT : 1 ! T such that L = hT (h1(E (h1; h2)) \ R): (3) Proof. Let L be an arbitrary recursively enumerable language generated by a phrase-structure grammar G = (N; T; P; S ), where N and T are the nite sets of nonterminals and terminals, respectively, P is the nite set of productions: pi : i ! i; i = 1; : : : ; n; and S 2 N is the starting nonterminal. Without loss of generality, we assume that for each production pi : i ! i, i 6= , except for the production S ! if 2 L.

12

De ne T 0 = fa0 j a 2 T g, T 00 = fa00 j a 2 T g, and P 0 = fp0 j p 2 P g. Denote by V and V1 the sets N [ T and N [ T 0, respectively. For notational purpose, we also de ne a morphism d : V ! V1 by d(A) = A for A 2 N and d(a) = a0 for a 2 T . Note that d is a bijection; thus, the inverse of d, d?1, is well de ned. Let 1 = V [ T 0 [ fB; F; $g; (4) 00 0 2 = 1 [ T [ P [ P ; (5) where B; F , and $ are not in V , V1, or V2. The morphisms h1; h2 : 2 ! 1, depending on G, are de ned by the following: (i) h1(B ) = BS $; h2(B ) = B; (ii) h1($) = $; h2($) = $; (iii) h1(pi) = d( i); h2(pi ) = d(i ); for pi : i ! i 2 P; (iv) h1(p0i) = i; h2(p0i ) = d(i ); for pi : i ! i 2 P; (v) h1(A) = A; h2(A) = A; for A 2 N; (vi) h1(a0) = a0; h2(a0) = a0; for a0 2 T 0; (vii) h1(a00) = a; h2(a00) = a0; for a00 2 T 00; (viii) h1(a) = F; h2(a) = a; for a 2 T; (ix) h1($0) = F; h2($0) = $; (x) h1(F ) = F; h2(F ) = FF: The regular language R is de ned by the regular expression BS ($V1)$T F +: Note that i; i 6= for all i above. So, both h1 and h2 are -free morphisms. If 2 L, then we introduce an additional symbol @ to 2 and de ne h1(@) = h2(@) = BS $F: It is easy to see that by de ning h1(@) and h2(@), we will not introduce any other new words to h1(E (h1; h2)) \ R. Therefore, we assume that 62 L in the following arguments. We de ne hT : 1 ! T by a; if a 2 T ; hT (a) = ; if a 62 T: Now we show that the equation (3) holds. Our proof for that x 2 L implies x 2 hT (h1(E (h1; h2)) \ R) is similar to the one for Theorem 6.9 of [19]. So, we omit the formal proof. Instead, we (

13

use an example to explain our idea informally. The example is a modi ed version of the one from [19] (page 112). Let L be generated by the following phrase-structure grammar G:

p1 : S ! ACCC; p2 : CC ! CD; p3 : AC ! a; p4 : DC ! ACC; p5 : ACC ! C; p6 : C ! b: A derivation sequence for the word ab is

S =) ACCC =) ACDC =) aDC =) aACC =) aC =) ab: (6) According to this derivation sequence, we de ne

x = Bp1$Ap2C $p3DC $a0p4$a0p5C $a00p06$0abFFF: By the above de nitions of h1 and h2, we have

h1(x) = h2(x) = BS $ACCC $ACDC $a0DC $a0ACC $a0C $abFFFFFF: Then, clearly, x 2 E (h1; h2), h1(x) 2 R, and hT (h1(x)) = ab. Thus, ab 2 hT (h1(E (h1; h2)) \ R). Conversely, let w 2 hT (h1(E (h1; h2)) \ R), i.e., w = hT (y) for some y 2 h1(E (h1; h2)) \ R. Then by the de nition of R, y is in the form

BS $y1$y2$ : : : $ytF l; where y1; : : :; yt?1 2 V1, yt 2 T , and l > 0. Let y = h1(x) for some x 2 E (h1; h2). Then

x = Bx1$x2$ : : : $xt$0xt+1F m such that h2(x1) = S , h1(xi) = h2(xi+1) = yi, for 1 i t, and l = 2m and h1(xt+1) = F m?1. Note that if xj = xj+1 for some j , 1 j < t, then we can construct a new word x0 by deleting xj $ from x so that hT (h1(x0) \ R) = hT (h1(x) \ R) = w. So, without loss of generality, we assume that xj 6= xj+1 for all j , 1 j < t. (It is clear that xt 6= xt+1). Then the following are clear: (1) x1 = p (or x1 = p0 if t = 1) for some p : S ! in P , (2) xi 2 (V1 [ P )P (V1 [ P ), for 2 i < t, (3) xt 2 (V2 [ P 0), (4) xt+1 2 T , 14

(5) h1(xi) = yi and h2(xi) = yi?1, for 1 i t (letting y0 = S ). By (2) and (5) above and (iii) of the de nition of h1 and h2, it follows that

d?1(yi?1) )+G d?1 (yi); 2 i t ? 1. Note also that S )G d?1 (y1) and d?1(yt?1) )+G yt. Therefore, we have S )+G yt, i.e., yt 2 L. Since w = hT (y) = yt, we have proved that w 2 L. 2 Proof of Lemma 7. Let L T be an arbitrary recursively enumerable language. By Lemma 8 L = hT (h1(E (h1; h2)) \ R) for some -free morphisms h1; h2 : 2 ! 1, regular language R 1, and projection hT : 1 ! T de ned by ( X; if X 2 T; hT (X ) = ; otherwise,

Let 2 = fb0; b1; : : :; bn?1g, for some integer n > 0, and R be accepted by a deterministic nite automaton M = (Q; 1; ; q1; F ), where Q = fq0; q1; : : : ; qm?1g, for some m > 0. We construct the matching system

= (V; ; T 0; f; g; A; Bd; Bu ) where

V = 1 [ Q [ f[q; j ] j q 2 Q; 0 j m ? 1g, = f(X; X ) j X 2 1g [ f(q; q); ([q; j ];q); (q; [q;k]); ([q; j ]; [q; k]) j q 2 Q; 0 j; k m ? 1g, T 0 = 1 [ Q [ f[q; j; k]; [q; k]; [j; q] j q 2 Q; 0 j; k m ? 1g, f : T 0 ! T [ fg, de ned by X; if X 2 T; f (X ) = ; if X 2 T 0 ? T; (

g : T 0 !

V , de ned by V g(X ) = X X , for X 2 1 [ Q, q; j ] , g([q; j; k]) = [[q; k]

!

15

g([q; k]) = [q;qk] , and g([j; q]) = [q;qj ] , for q 2 Q, 0 j; k < m. !

!

A=

(

Bd =

(

q0 , # !)

# # # # # ::: qi1 qi1 a2 qi2 qi2 # # # qi ?1 qi ?1 at qi a1a2 : : :at = h2(bi); bi 2 2; 0 j m ? 1;

# [qi0 ; j ]

!

# a1 #

!

!

!

i

ti

ti

!

!

!

!

!

ti

!

!

i

(qi ; ak+1) = qi +1 ; 0 k < ti # # q 2F , qi qi i k

Bu =

(

a1 #

!

S

k

!

!

(

o

)

[qi1 ; j ] qi1 a2 qi2 qi2 : : : # # # # # at qi qi # # # a1a2 : : :at = h1(bi); bi 2 2; 0 j m ? 1; !

!

!

!

!

ti

i

ti

!

!

!

i

(qi ; ak+1) = qi +1 ; 1 k < ti qi q 2 F . # i

S

k

k

(

o

!

)

Note that each string of Bd or Bu contains an integer j , 0 j < m, which is paired with the state that appears rst (from the left) in the string. The function of this integer will become clear later. Denote by rd(i; j; k), 0 i < n and 0 j; k < m, a string in Bd which is constructed with the word h2(bi) and the state qj as the rst state that is paired with the integer k. Similarly, denote by ru (i; j; k) a string in Bu. Assume that F contains l > 0 states and

F = ff0; f1; : : : ; fl?1g; l m: 16

!

!

# and by r (n; j; 0) the Then we denote by rd (n; 0; j ) the string # u fj fj string f#j . It is clear that card(Bd) = card(Bu ) = nm2 + l. De ne the labelling mappings ed : Bd ! f1; : : : ; card(Bd)g by ed(rd(i; j; k)) = i m2 + j m + k + 1 and eu : Bu ! f1; : : :; card(Bu )g by eu(ru(i; j; k)) = i m2 + k m + j + 1: Let u denote a string in A. By the above construction of , it is clear that u =) z if and only if there is a sequence qi0 a1qi1 a2 : : : qi ?1 atqi such that (1) qi0 = q0, qi 2 F , and (qi ?1 ; ak) = qi ; and (2) there is x 2 2 such that h1(x) = h2(x) = a1a2 : : : at. We show that Lc ( ) = hT (h1(E (h1; h2)) \ R) in the following. Let w 2 hT (h1(E (h1; h2)) \ R). Then there exist x = bi1 bi2 : : : bi 2 E (h1; h2) and y = h1(x) = h2(x) such that y 2 R and w = hT (y). Let y = a1a2 : : : at. Then we have a state sequence qj1 ; qj2 ; : : : ; qj +1 of M such that qj1 = q0, qj +1 = fr 2 F for some r, 0 r < l, and (qj ; ak) = qj +1 for 1 k t. Note that h1(x) = h1(bi1 ) : : : h1(bi ) = a1 : : :at. Let h1(bi ) = a : : : a +1?1 , 1 k < s, and h1(bi ) = a : : : at. Similarly, let h2(bi ) = a : : : a +1?1 , 1 k < s, and h2(bi ) = a : : :at. Then, there exists a computation D such that D uses the following strings from Bd: rd(i1; j 1 ; j1+1); : : : ; rd(is; j ; j +1); rd(n; 0; r) and the following from Bu : ru(i1; j1+1; j 1 ); : : :; ru (is; j +1; j ); ru(n; r; 0): Let the result of the computation D be z. Clearly, by the de nitions of f and g, f (g?1 (z)) = w. It is also easy to see that ed(D) = eu(D) by the de nitions of ed and eu. We now show that if w 2 Lc( ), then w 2 hT (h1(E (h1; h2)) \ R). We have w 2 f (g?1 (z)) for some z that is the result of a computation D of and eu (D) = ed(D). By the construction of the matching system , one can observe that z corresponds to a sequence qi0 ; a1; qi1 ; a2; : : : ; qi ?1 ; at; qi !

t

t

t

k

k

s

t

t

k

s

k

k

k

k

k

k

k

s

s

s

s

s

t

17

s

s

s

t

where qi0 = q0, qi 2 F , and (qi ?1 ; ak) = qi , for 1 k t. Then it is clear that a1a2 : : :at 2 R. By the de nition of Bd and Bu and the fact that eu(D) = ed(D), it follows that a1a2 : : :at = h1(x) = h2(x) for some x 2 2. Denote a1a2 : : : at by y. Then, y 2 h1(E (h1; h2)) \ R. It is easy to show that w = f (g?1(z)) = f (y) = hT (y). Therefore, w 2 hT (h1(E (h1; h2)) \ R). 2 t

k

k

From Lemmas 6 and 7 we get

Theorem 2. RE = CML:

7. An intermediate case For the fair computations we have

Theorem 3. REG FML RE: Proof. The inclusion REG FML follows from the proof of Lemma 5. For the strictness, let us consider the matching system

= (V; ; T; f; g; A; Bd; Bu); V = fa; a0; b; b0g; = f(a; a0); (b; b0)g; T = fa; bg; f (a) = a; f (b) = b; g(a) = aa0 ; g(b) = bb0 ;

!

A = #a ; # # Bd = # ; a0 b0 b0 Bu = #a #a ; #b (

!)

(

!

(

!

!

!

!)

!)

; :

Starting with the unique axiom in A, we have to use # a0 from Bd and we obtain a blunt sequence. We can continue with any string in Bd and Bu. However, due to the complementarity restrictions, if a symbol b or b0 is introduced, then we have to continue by using composite symbols # b0 and b until obtaining again a blunt sequence. # !

!

!

18

Thus, let us intersect the language Lf ( ) with the regular language a+b+. We obtain a language consisting of strings of the form a2n+1b2m; n; m 1, produced by computations where { the rst string in Bd is used 2n + 1 times, { the second string in Bd is used m times, { the rst string in Bu is used n times (one occurrence of a is introduced by the axiom), { the second string in Bu is used 2m times. Due to the fairness, we must have 2n + 1 + m = n + 2m; which means that

n = m ? 1: The language fa2m?1b2m j m 1g is not a regular one, hence Lf ( ) is not regular. From the Turing-Church thesis we have FML RE . For the strictness, we shall prove that FML MAT , where MAT is the family of languages generated by context-free matrix grammars with arbitrary rules. Because MAT RE ([6], [8]), we obtain FML RE . Consider a matching system = (V; ; T; f; g; A; Bd; Bu). De ne V 0 = fa0 j a 2 V g; L(A) = f[a1; b01] : : : [ak ; b0k ]ak+1 : : :ak+r j k; r 0; k + r 1; a1 : : : ak ak+1 : : : ak+r 2 Ag # # bk b1 0 0 0 0 [ f[a1; b1] : : : [ak ; bk ]bk+1 : : :bk+r j k; r 0; k + r 1; a1 : : : ak # : : : # 2 Ag bk+r bk bk+1 b1 [ f j 2 Ag; # 2 B g; : : : L(Bd ) = fb01 : : : b0k j k 1; # d bk b1 L(Bu ) = fa1 : : : ak j k 1; a#1 : : : a#k 2 Bu g:

!

!

!

!

19

!

!

!

!

Consider the new symbols s; d; d0 and construct the languages

L1 = fxd0 j x 2 L(Bd)g+ ; L2 = fxd j x 2 L(Bu )g+; L01 = L1 ? t c+ ; L02 = L2 ? t c+ ; L3 = (L(A)L01 ? t L02) \ f[a; b0] j a; b 2 V g(V V 0 [ fcd0; dcg): Clearly, L1; L2 are regular languages, hence also L3 is regular: the family REG is closed under the shue operation and under intersection. Consider the gsm Q which: { leaves unchanged the symbols [a; b0]; a; b 2 V , { replaces each pair ab0 by [a; b0]; a; b 2 V; { replaces each pair cd0 by [c; d0] and each pair dc by [d; c]. The language Q(L3) is also regular, over the alphabet

U = f[a; b0] j a; b 2 V g [ f[c; d0]; [d; c]g: Let G = (N; U; S; P ) be a regular grammar for Q(L3) and construct the matrix grammar G0 = (N 0; VV ; S 0; M ); where

N 0 = N [ U [ fS 0g; M = f(S 0 ! S )g [ f(r) j r 2 P g [ f([a; b0] ! ab ) j a; b 2 V g [ f([c; d0] ! ; [d; c] ! )g:

It is easy to see that contains all the strings w 2 VV such that x =) w in , x 2 A, and this is a fair derivation: the matrix ([c; d0] ! ; [d; c] ! ) checks whether or not the number of symbols d and d0 is the same. Now, clearly, Lf ( ) = f (g?1 (L(G0))). The family MAT is closed under arbitrary morphisms and under inverse morphisms ([5]), hence Lf ( ) 2 MAT . 2

L(G0)

20

Open problem. Is the family FML included in CF (or in LIN ) ?

7. Final remarks

We have found characterizations of families REG and RE using variants of matching systems. Several further research directions are of interest:

De ne variants of matching systems able to characterize families which are dierent from REG and RE . (What about CF and CS ?)

Look for universal matching systems, in the sense usual for univer-

sal Turing machines. This is important for having universal and programmable DNA matching systems.

Look for other variants which characterize RE , but not using the coher-

ence restriction. This is again important, for nding \realistic" models of DNA computers based on matching.

Investigate the descriptional complexity (the size) of matching systems and languages. Several parameters are very natural: the number of axioms, the length of the longest axiom, the number of elements in the sets Bd; Bu , the length of the longest string in Bd; Bu , etc. Do these measures give in nite hierarchies of matching languages or we can bound them ?

References [1] L. M. Adleman, Molecular computation of solutions to combinatorial problems, Science, 226 (Nov. 1994), 1021 { 1024. [2] L. M. Adleman, On constructing a molecular computer, manuscript in circulation, 1995. [3] E. Csuhaj-Varju, L. Freund, L. Kari, Gh. Paun, DNA computing based on splicing: universality results, First Annual Paci c Symp. on Biocomputing, Hawaii, Jan. 1996. [4] K. Culik II, A purely homomorphic characterization of recursively enumerable sets, Journal of the ACM 26 (1979) 345-350. [5] J. Dassow, Gh. Paun, Regulated Rewriting in Formal Language Theory, Springer-Verlag, Berlin, Heidelberg, 1989. 21

[6] J. Dassow, Gh. Paun, A. Salomaa, Grammars with controlled derivations, in Handbook of Formal Languages (G. Rozenberg, A. Salomaa, eds.), in press. [7] R. Freund, L. Kari, Gh. Paun, DNA computing based on splicing: The existence of universal computers, Technical Report 185-2/FR-2/95, TU Wien, 1995. [8] D. Hauschild, M. Jantzen, Petri nets algorithms in the theory of matrix grammars, Acta Informatica, 31 (1994), 719 { 728. [9] T. Head, Formal language theory and DNA: an analysis of the generative capacity of speci c recombinant behaviors, Bull. Math. Biology, 49 (1987), 737 { 759. [10] T. Head, Gh. Paun, D. Pixton, Language theory and molecular genetics. Generative mechanisms suggested by DNA recombination, in Handbook of Formal Languages (G. Rozenberg, A. Salomaa, eds.), in press. [11] R. J. Lipton, Using DNA to solve NP-complete problems, Science, 268 (Apr. 1995), 542 { 545. [12] R. J. Lipton, Speeding up computations via molecular biology, manuscript in circulation, 1994. [13] Gh. Paun, Splicing. A challenge to formal language theorists, Bulletin EATCS, 57 (1995), 183 { 194. [14] Gh. Paun, Regular extended H systems are computationally universal, J. Automata, Languages and Combinatorics, 1, 1 (1996), 27 { 36. [15] Gh. Paun, G. Rozenberg, A. Salomaa, Computing by splicing, Theor. Computer Sci., to appear. [16] Gh. Paun, A. Salomaa, DNA computing based on the splicing operation, Mathematica Japonica, 43, 3 (1996), 607 { 632. [17] A. Salomaa, Formal Languages, Academic Press, New York, London, 1973. [18] A. Salomaa, Equality sets for homomorphisms of free monoids, Acta Cybernetica 4 (1978) 127-139. [19] A. Salomaa, Jewels of Formal Language Theory, Computer Science Press, Rockville, Maryland, 1981. 22

Turku Centre for Computer Science Lemminkaisenkatu 14 FIN-20520 Turku Finland http://www.tucs.abo.

University of Turku Department of Mathematical Sciences

Abo Akademi University Department of Computer Science Institute for Advanced Management Systems Research

Turku School of Economics and Business Administration Institute of Information Systems Science

DNA Computing, Matching Systems, and Universality

DNA Computing, Matching Systems, and Universality

Suggest Documents

DNA computing, sticker systems, and universality - Springer Link

Universality Issues in Reversible Computing Systems and Cellular

DNA Computing, Sticker Systems, and Universality1 - CiteSeerX

DNA Computing Based on Splicing: Universality Results - CiteSeerX

Universality and Deviations in Disordered Systems

Decidability and Universality in Symbolic Dynamical Systems

Universality Classes and Fluctuations in Disordered Systems

Online Perfect Matching and Mobile Computing - CiteSeerX

Task Matching and Scheduling in Heterogeneous Computing ...

Computing with DNA

DNA computing on surfaces

Experimental DNA computing - CiteSeerX

Reducing graph matching to tree matching for XML ... - NUS Computing

Universality in Elementary Cellular Automata - Complex Systems

Some universality results for dynamical systems

Non-universality of artificial frustrated spin systems

Universality classes in nonequilibrium lattice systems

Text Entry Systems : Mobility, Accessibility, Universality

Scalable Load Balancing in Networked Systems: Universality ...

Quasi-universality in mixed counterions systems

Universality and scaling in human and social systems

DNA computing: applications and challenges - Core

DNA computing: applications and challenges - Core

DNA COMPUTING AND ITS APPLICATIONS