Splicing systems using merge and separate operations

C. Zandron, G. Mauri, C. Ferretti, P. Bonizzoni
DISCo - Università di Milano-Bicocca - Italy


Abstract

We study in detail how to control molecules in a set of splicing tubes in such a way that any recursively enumerable language can be generated. Molecules are allowed to split and join (splice); in addition, the molecules of a tube can be separated according to the presence of a pattern in them, and tubes can be merged when needed. We show that this new model of distributed splicing system is computationally universal.

1 Introduction

In 1987 ([3]) Tom Head suggested studying in a formal way the interactions between DNA molecules, considering the well-known biochemical reaction of splicing: molecules are first cut and later joined in a crossed way, according to the presence of specific patterns along the molecules themselves. Recently, with the long-term goal of producing computing devices based on biological molecules, a long series of studies has considered this model again. Other models are also being considered, and a good early instance of the results that can be obtained in practice by implementing these ideas can be found in [1]. More specifically, real splicing systems have been studied in [5].

The original theoretical model of splicing systems had a computational power smaller than that of a finite state automaton. This has motivated several new models based on these original ideas, aiming at obtaining more powerful molecular computing devices ([4]). One direction has been that of putting many test tubes to work together, instead of only one. In each tube molecules are free to perform splicing, but some specified molecules can be moved from one tube to another. Several papers have already studied this model, looking for computationally universal distributed splicing systems of this type that use fewer and fewer test tubes. Recent results are for instance in [2a] and [2b].

In this paper we describe a different distributed model based on splicing, where molecules can move according to these two ideas:
- when the computation, as we will define it, requires it, the molecules of a test tube can be separated into two new test tubes, according to the presence or absence of some pattern in the molecules themselves;
- at different times during the computation, two test tubes can be merged into one single new test tube.

We show, in a constructive way, that even this model is computationally universal. We observe that similar ideas were also described in [1].

2 Notation

Consider a finite alphabet $V$. We denote by $V^*$ the set of all (finite) strings over $V$, while $V^+ = V^* - \{\lambda\}$ is the set of all strings in $V^*$ except the empty string, $\lambda$.

The families of recursively enumerable languages and of finite languages are denoted by RE and FIN, respectively.

A Head splicing system (or H system) is a triple $H = (V, A, R)$, where $V$ is the alphabet of $H$, $A \subseteq V^*$ is the set of axioms, and $R$ is the set of splicing rules, with $R \subseteq V^* \# V^* \$ V^* \# V^*$.

For $x, y, z, w \in V^*$ and $r = u_1 \# u_2 \$ u_3 \# u_4$ in $R$, we define $(x, y) \vdash_r (z, w)$ if and only if

$$x = x_1 u_1 u_2 x_2, \quad y = y_1 u_3 u_4 y_2, \quad z = x_1 u_1 u_4 y_2, \quad w = y_1 u_3 u_2 x_2,$$

for some $x_1, x_2, y_1, y_2 \in V^*$.

For an H system $H = (V, A, R)$ and a language $L \subseteq V^*$, we write

$$\sigma(L) = \{ z \in V^* \mid (x, y) \vdash_r (z, w) \text{ or } (x, y) \vdash_r (w, z), \text{ for some } x, y \in L,\ r \in R \},$$

and define

$$\sigma^*(L) = \bigcup_{i \ge 0} \sigma^i(L),$$

where

$$\sigma^0(L) = L, \qquad \sigma^{i+1}(L) = \sigma^i(L) \cup \sigma(\sigma^i(L)) \quad \text{for } i \ge 0.$$
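The splicing step and its iteration can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the construction: it assumes every symbol of $V$ is a single character, represents a rule $u_1 \# u_2 \$ u_3 \# u_4$ as a 4-tuple of strings, and the names `find_all`, `splice`, `sigma` and `sigma_star` as well as the `max_rounds` cut-off are hypothetical choices made here (the closure may be infinite, so the loop has to be bounded).

```python
from itertools import product

def find_all(s, pat):
    """Yield every index at which pat occurs in s (the empty pattern occurs everywhere)."""
    for i in range(len(s) - len(pat) + 1):
        if s.startswith(pat, i):
            yield i

def splice(x, y, rule):
    """Return all pairs (z, w) such that (x, y) |-_r (z, w) for r = (u1, u2, u3, u4)."""
    u1, u2, u3, u4 = rule
    results = set()
    for i in find_all(x, u1 + u2):
        x1, x2 = x[:i], x[i + len(u1) + len(u2):]
        for j in find_all(y, u3 + u4):
            y1, y2 = y[:j], y[j + len(u3) + len(u4):]
            results.add((x1 + u1 + u4 + y2, y1 + u3 + u2 + x2))
    return results

def sigma(language, rules):
    """One splicing step: all strings produced from ordered pairs of strings in the language."""
    out = set()
    for x, y in product(language, repeat=2):
        for r in rules:
            for z, w in splice(x, y, r):
                out.update((z, w))
    return out

def sigma_star(axioms, rules, max_rounds=50):
    """Iterate sigma until nothing new appears or max_rounds is reached."""
    current = set(axioms)
    for _ in range(max_rounds):
        nxt = current | sigma(current, rules)
        if nxt == current:
            break
        current = nxt
    return current

rule = ("a", "b", "c", "d")            # the rule a#b$c#d
pairs = splice("xaby", "ucdv", rule)   # cut after "a" and after "c", then recombine
closure = sigma_star({"xaby", "ucdv"}, [rule])
```

With the rule a#b$c#d, the strings `xaby` and `ucdv` recombine into `xadv` and `ucby`, and no further splicing is possible among these four strings.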

An H system is meant to operate starting from the set of strings $A$, generating new strings by iterating the splicing step $\vdash_r$ on them and on the strings generated during this process. The language generated in this way is $\sigma^*(A)$.

We may think of our model as a system in which we have a set of test tubes, each one containing a set of DNA strands and a set of restriction enzymes $R$. The restriction enzymes are the same for each tube. This system can perform the following operations:
- Create new strands using the restriction enzymes and the strands initially present in the tube
- Merge two or more tubes, creating a new test tube with all the strands of the initial tubes
- Separate a test tube to create two distinct test tubes, with different strands

A Merge and Separate System (MS for short) is a construct $M = (V, R, A_1, A_2, A_3, A_4)$, where $V$ is an alphabet, $R$ is a set of splicing rules and every $A_i \subseteq V^*$ ($1 \le i \le 4$) is a set of axioms. Initially, the four sets of axioms are placed in four distinct tubes. In each tube we create all the strings we are able to produce by applying the set of splicing rules $R$ (as is usually done in H systems). Moreover, we can perform two other operations:

MERGE: $M(T_i, T_j) = T_i \cup T_j$. Starting from two tubes, we produce a single tube containing all the strings of the two starting tubes. This operation can easily be extended to a finite number of tubes: by $M(T_1, T_2, \ldots, T_n)$ we indicate the Merge operation executed on $n$ tubes.

SEPARATE: $+(T_i, s)$ and $-(T_i, s)$, $s \in V^+$. Starting from a single tube, we produce two tubes. In the first one, $+(T_i, s)$, we put all the strings that contain the substring $s$, and in the second one, $-(T_i, s)$, we put all the strings not containing $s$ as a substring.

As the computation proceeds, the number of test tubes varies, depending on the operations we decide to undertake: a Separate operation increases the number of test tubes by one, while merging two tubes decreases it by one. The language generated by an MS system is the set of all the strings from $T^*$ (strings of terminal symbols only) present in any test tube of the system. The class of languages MS is the set of all the languages which can be generated by an MS system.
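The two additional operations are plain set operations, which the following Python sketch makes explicit. It assumes that a tube is modelled as a set of strings over single-character symbols; the function names `merge` and `separate` are hypothetical names chosen for this illustration.

```python
def merge(*tubes):
    """MERGE: one tube containing all the strings of the given tubes."""
    return set().union(*tubes)

def separate(tube, s):
    """SEPARATE: return (+(tube, s), -(tube, s)) for a non-empty pattern s."""
    if not s:
        raise ValueError("the pattern s must be non-empty (s in V+)")
    plus = {x for x in tube if s in x}
    return plus, tube - plus

tube = {"XaY", "ZbY", "XbW"}
with_x, without_x = separate(tube, "X")   # one tube becomes two
rebuilt = merge(with_x, without_x)        # two tubes become one
```

Separating and then merging the two resulting tubes gives back the original tube, which is the mechanism later used to rebuild tubes of axioms without an amplification step.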

3 Informal description

We give here an informal description of how the computation proceeds.

STEP 1 (simulate productions of G) Consider a type-0 grammar $G$ and a string of the form $XwuY$ ($X$ and $Y$ are special symbols not in $G$, used to control the process, while $wu$ is a sentential form of the grammar $G$). Using a splicing operation we can simulate a production $u \to v$ of $G$ on the right end of the string, obtaining $XwvY$. We can proceed in this way until no left-hand side of a production of $G$ appears at the right end of the string. Let us denote by $Xw_1 r w_2 Y$ the string obtained, and suppose we now have to simulate a production $r \to t$. We are not able to directly simulate a production of $G$ in the middle of the string using a finite number of splicing rules, so we proceed by rotating the right-end symbols of $w_1 r w_2$ to the left end, one symbol at a time. In this way we get $Xw_2 w_1 r Y$, on which we can simulate the production to obtain the string $Xw_2 w_1 t Y$. By rotating the symbols of $w_2 w_1 t$ we finally get $Xw_1 t w_2 Y$.

STEP 2 (rotate characters) To rotate a symbol from the right end to the left end, we fix an order on the symbols of $G$ and then proceed as follows.

Step 2.1 The $i$-th symbol in this order is replaced with a corresponding special symbol $\alpha_i$. For example, if we have a string $XwqY$ and $q$ is the third symbol, we get $Xw\alpha_3Y$. We use these special symbols because we need to distinguish the strings in the rotating phase from the other strings.

Step 2.2 Now we have to put the same symbol on the left end of the string, but we are not able to do this directly using a finite number of splicing rules. So we put an arbitrary special symbol on the left, and then we use a series of Separate operations to isolate the strings in which the special symbols on the left end and on the right end are the same.

Step 2.3 Let us explain with an example how we isolate the strings in which the special rotation symbols are the same on both the left and the right end. Consider a grammar with an alphabet of three symbols and a test tube $A_0$ containing a set of strings of the form $X\alpha_j w \alpha_i Y$ ($1 \le i \le 3$, $1 \le j \le 3$). We have to extract only the strings of the form $X\alpha_k w \alpha_k Y$ ($1 \le k \le 3$).

We first separate in $A_0$ the strings containing $\alpha_1$ and we get two tubes ($A_1$, $A_2$): in $A_1$ we have the strings with at least one symbol $\alpha_1$, while in $A_2$ we have the strings without $\alpha_1$. If we separate in $A_2$ the strings containing $\alpha_2$ we get two other tubes ($A_3$, $A_4$): in $A_3$ there are the strings containing $\alpha_2$ (but not $\alpha_1$), while in $A_4$ there are the strings not containing $\alpha_2$ (and not containing $\alpha_1$, because the Separate operation started from $A_2$). The strings in $A_4$ are of the form $X\alpha_3 w \alpha_3 Y$: the special rotation character is the same on both the right and the left end.

Consider now $A_3$. In this tube, the strings have one of the following forms: $X\alpha_2 w \alpha_3 Y$, $X\alpha_3 w \alpha_2 Y$ or $X\alpha_2 w \alpha_2 Y$. If we separate from $A_3$ the strings containing $\alpha_3$, we get two tubes. In one of these tubes we have strings of the forms $X\alpha_2 w \alpha_3 Y$ and $X\alpha_3 w \alpha_2 Y$, while in the other one we have strings of the form $X\alpha_2 w \alpha_2 Y$.

In $A_1$ we have strings of the forms $X\alpha_1 w \alpha_2 Y$, $X\alpha_1 w \alpha_3 Y$, $X\alpha_2 w \alpha_1 Y$, $X\alpha_3 w \alpha_1 Y$ and $X\alpha_1 w \alpha_1 Y$. By separating the strings containing $\alpha_2$ and then the strings containing $\alpha_3$, we get a tube in which the strings are only of the form $X\alpha_1 w \alpha_1 Y$.

Of course, if the number of symbols of $G$ is greater we need a larger number of Separate operations, but proceeding in this way we are able to isolate the strings of the form $X\alpha_k w \alpha_k Y$. The other strings are put together in a garbage tube, because they are "wrong".

Step 2.4 To terminate the rotation step, we delete the rotation symbol from the right end and decode the rotation symbol on the left end into the corresponding symbol of $G$.

STEP 3 (select terminal strings) The other operation we have to perform is to isolate the terminal strings. First of all we have to isolate the strings that have been completely rotated (this is done with a special symbol $B$, not in $G$, that marks the start of the string). We separate the strings in which this symbol is on the left end. Then we have to separate from these strings those containing non-terminal symbols of $G$. This is easy to do with a series of Separate operations (one for each non-terminal symbol of $G$). We get strings containing only terminal symbols of $G$ and the special symbols $X$, $Y$ and $B$. These symbols are placed on the right end and on the left end, so they are easy to remove using two splicing operations.
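The three-symbol example of Step 2.3 can be replayed mechanically. The sketch below is only an illustration under stated assumptions: the three rotation symbols are written as the characters `1`, `2`, `3`, the word between them is a fixed hypothetical string `ab` that contains none of these characters, and the tube names follow the example.

```python
def separate(tube, s):
    """Return (strings containing s, strings not containing s)."""
    plus = {x for x in tube if s in x}
    return plus, tube - plus

w = "ab"  # hypothetical inner word, free of rotation symbols
A0 = {f"X{j}{w}{i}Y" for i in "123" for j in "123"}   # all 9 left/right combinations

A1, A2 = separate(A0, "1")             # A1: contains symbol 1, A2: does not
A3, A4 = separate(A2, "2")             # A4 holds only X3ab3Y
wrong_23, good_2 = separate(A3, "3")   # mixed 2/3 strings versus X2ab2Y
wrong_12, rest_1 = separate(A1, "2")
wrong_13, good_1 = separate(rest_1, "3")   # what remains is X1ab1Y

good = good_1 | good_2 | A4
garbage = wrong_12 | wrong_13 | wrong_23
```

Five Separate operations are enough here: the three strings with matching symbols end up in `good`, and the six mismatched ones in `garbage`.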

4 Main result

We now show that such a model, using only a finite number of rules and a finite number of axioms, is able to generate the class of RE languages.

Theorem: MS = RE

Proof: From the Turing-Church thesis we have MS $\subseteq$ RE. We have to show that RE $\subseteq$ MS. Let us take a type-0 grammar $G = (T, N, S, P)$, where $T = \{t_1, \ldots, t_n\}$ is the set of terminal symbols, $N = \{n_1, \ldots, n_m\}$ is the set of non-terminal symbols, $S$ is the starting symbol and $P$ is the set of productions. The MS system is built as follows:

$V = T \cup N \cup \{X, Y, B, Z_1, Z_2, Z_3, Z_H, Z_T, X_H, Y_T\} \cup \{\alpha_i \mid 1 \le i \le k\}$.

The alphabet of the system contains the terminal and non-terminal symbols of the grammar $G$ and two other sets of special symbols, not in $G$. The second set ($\{\alpha_i \mid 1 \le i \le k\}$) is used to rotate the characters of the strings as explained below, while the first set ($\{X, Y, B, Z_1, Z_2, Z_3, Z_H, Z_T, X_H, Y_T\}$) contains symbols which are used as brackets for strings and to recognize "working" strings, and the symbol $B$, which marks the start of the rotated string. We denote by $U = \{U_1, \ldots, U_k\}$ the set of symbols $T \cup N \cup \{B\}$. We have $k = n + m + 1$.

The splicing rules in $R$ are:

1. $\{\#uY\$Z_1\#vY \mid u \to v \in P\}\ \cup$ (simulate productions of $G$)
2. $\{\#U_iY\$Z_1\#\alpha_iY_T \mid 1 \le i \le k\}\ \cup$ (prepare the string to rotate the last symbol)
3. $\{X\#\$X_H\alpha_i\#Z_2 \mid 1 \le i \le k\}\ \cup$ (put a special rotation symbol on the left end of the string)
4. $\{X_H\alpha_i\#\$XU_i\#Z_3 \mid 1 \le i \le k\}\ \cup$ (decode the special rotation symbol into a symbol of $G$)
5. $\{\#\alpha_iY_T\$Z_3\#Y \mid 1 \le i \le k\}\ \cup$ (delete the rotated character from the right end of the string)
6. $\{XB\#\$\#Z_H\}\ \cup$ (delete the bracket from the left end of the string)
7. $\{\#Y\$Z_T\#\}$ (delete the bracket from the right end of the string)

Rule 1 is used to perform STEP 1 described in Section 3. Rules 2, 3, 4 and 5 are used to perform STEP 2. Rules 6 and 7 are used to perform STEP 3.
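To make the shape of the rule set concrete, the sketch below builds the seven rule types for a toy grammar and applies one rule of type 1. Everything specific in it is an assumption made for illustration: strings are tuples of symbol names, the rotation symbols are written `r1`, `r2`, ..., the subscripted symbols are written `Z1`, `XH`, `YT` and so on, and the toy grammar (productions S → aS and S → a) as well as the helper names `tok`, `occurrences`, `splice` and `build_rules` are hypothetical.

```python
def tok(text):
    """Turn 'X B S Y' into the tuple of symbols ('X', 'B', 'S', 'Y')."""
    return tuple(text.split())

def occurrences(seq, pat):
    """Indices at which the tuple pat occurs contiguously in the tuple seq."""
    n = len(pat)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == pat]

def splice(x, y, rule):
    """All pairs (z, w) with (x, y) |-_r (z, w), for r = (u1, u2, u3, u4) given as tuples."""
    u1, u2, u3, u4 = rule
    out = set()
    for i in occurrences(x, u1 + u2):
        x1, x2 = x[:i], x[i + len(u1) + len(u2):]
        for j in occurrences(y, u3 + u4):
            y1, y2 = y[:j], y[j + len(u3) + len(u4):]
            out.add((x1 + u1 + u4 + y2, y1 + u3 + u2 + x2))
    return out

def build_rules(U, P):
    """Rules of types 1-7 for the symbol list U and the productions P (pairs of tuples)."""
    rules = [((), u + ("Y",), ("Z1",), v + ("Y",)) for u, v in P]   # type 1
    for i, Ui in enumerate(U, start=1):
        r = f"r{i}"                                                  # rotation symbol of U_i
        rules.append(((), (Ui, "Y"), ("Z1",), (r, "YT")))           # type 2
        rules.append((("X",), (), ("XH", r), ("Z2",)))              # type 3
        rules.append((("XH", r), (), ("X", Ui), ("Z3",)))           # type 4
        rules.append(((), (r, "YT"), ("Z3",), ("Y",)))              # type 5
    rules.append((("X", "B"), (), (), ("ZH",)))                     # type 6
    rules.append(((), ("Y",), ("ZT",), ()))                         # type 7
    return rules

U = ["a", "S", "B"]                                   # k = n + m + 1 = 3
P = [(("S",), ("a", "S")), (("S",), ("a",))]
rules = build_rules(U, P)
step = splice(tok("X B S Y"), tok("Z1 a S Y"), rules[0])   # simulate S -> aS
```

The rule set contains |P| + 4k + 2 rules, so it is finite for any grammar; in the toy case the type-1 splicing turns `X B S Y` into `X B a S Y` and leaves `Z1 S Y` as a by-product.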

The sets of starting axioms are:
- $A_1 = \{XBSY\} \cup \{Z_1vY \mid u \to v \in P\} \cup \{Z_1\alpha_iY_T \mid 1 \le i \le k\}$
- $A_2 = \{X_H\alpha_iZ_2 \mid 1 \le i \le k\}$
- $A_3 = \{XU_iZ_3 \mid 1 \le i \le k\} \cup \{Z_3Y\}$
- $A_4 = \{Z_H, Z_T\}$

We denote by $T_1$, $T_2$, $T_3$ and $T_4$ the tubes containing $A_1$, $A_2$, $A_3$ and $A_4$ respectively. Initially, the only tube able to create something new is $T_1$. To describe the process, we consider the general case of a string $XmY$, where $m \in U^*$. For $XBSY$ we have $m = BS$. In $T_1$ we can apply splicing rules of types 1 and 2:

i) $(XwuY, Z_1vY) \vdash_1 (XwvY, Z_1uY)$, where $u \to v \in P$ and $m = wu$, to simulate a production of the grammar $G$ at the right end of the string. (This is STEP 1 in the informal description of Section 3.)

ii) $(XwU_iY, Z_1\alpha_iY_T) \vdash_2 (Xw\alpha_iY_T, Z_1U_iY)$, where $m = wU_i$, to prepare the string to rotate the last character ($U_i$, not $Y$, which is used only as a bracket). (This is Step 2.1 in the informal description of Section 3.)

Moreover, we can apply rules like:

iii) $(Z_1v_1U_iY, Z_1\alpha_iY_T) \vdash_2 (Z_1v_1\alpha_iY_T, Z_1U_iY)$ (where $v = v_1U_i$)

After these operations, in tube $T_1$ we get strings of the forms: $XmY$ (where $m \in U^*$), $Z_1vY$ ($v$ is the right term of a production in $G$), $Z_1\alpha_iY_T$ ($1 \le i \le k$), $Z_1uY$ ($u$ is the left term of a production in $G$), $Xw\alpha_iY_T$, $Z_1U_iY$, $Z_1m_1\alpha_iY_T$ (where $m_1 \in U^*$).

The strings of the forms $XmY$, $Z_1vY$ ($v$ is the right term of a production in $G$) and $Z_1\alpha_iY_T$ can enter new splicings of types 1 and 2, creating new strings in which we simulate a production of $G$ at the right end of the string and in which we prepare the right-end character for rotation. After a series of these operations, nothing new can be created.

The strings of the form $Z_1uY$ can enter splicings of types 1 and 2. We have the following possibilities:
- $(Z_1uY, Z_1vY) \vdash_1 (Z_1uY, Z_1vY)$, hence creating nothing new;
- $(Z_1u_1U_iY, Z_1\alpha_iY_T) \vdash_2 (Z_1u_1\alpha_iY_T, Z_1U_iY)$, where $u = u_1U_i$. The string $Z_1U_iY$ is already in $T_1$. The string $Z_1u_1\alpha_iY_T$ cannot enter new splicings.

The strings of the form $Xw\alpha_iY_T$ cannot enter new splicings in $T_1$.

The strings $Z_1U_iY$ can enter splicings of types 1 and 2, but they create nothing new. The strings of the form $Z_1m_1\alpha_iY_T$ ($m_1 \in U^*$) cannot enter new splicings in $T_1$. Thus, after applying rules of types 1 and 2 a number of times, we are not able to create new strings, and we start using the additional operations of Merge and Separate.

First of all, we execute two Separate operations starting from $T_1$, to select the strings ready to rotate the character on the right end (i.e. the strings of the form $Xw\alpha_iY_T$). We recognize these strings by their right bracket symbol, which is $Y_T$ instead of $Y$, and by their left bracket, which is $X$. We get:
- $T_{1,1} = +(T_1, Y_T)$ = strings containing $Y_T$ ($Xw\alpha_iY_T$, $Z_1\alpha_iY_T$ and $Z_1m_1\alpha_iY_T$).
- $T^C_{1,1} = -(T_1, Y_T)$ = strings not containing $Y_T$.
- $T_5 = T_{1,2} = +(T_{1,1}, X)$ = strings containing $X$ and $Y_T$ (i.e. strings of the form $Xw\alpha_iY_T$, ready to rotate the character on the right end).
- $T^C_{1,2} = -(T_{1,1}, X)$ = strings not containing $X$ and containing $Y_T$ (strings of the forms $Z_1\alpha_iY_T$ and $Z_1m_1\alpha_iY_T$).

Because we need to reuse the strings of the form $Z_1\alpha_iY_T$ together with the strings left in $T^C_{1,1}$, we execute a Merge operation: $T_1^C = M(T^C_{1,1}, T^C_{1,2})$.

In $T_5$ we can create nothing new by applying the splicing rules $R$. In $T_1^C$ the only strings we can create by applying splicing rules are strings of the form $Xw\alpha_iY_T$ (the strings we have just separated and put in $T_5$), so no new strings are created. Let us now consider the tube $T_5$; we will deal with $T_1^C$ later in the proof.

(Next we have Step 2.2 described in Section 3.)
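The two Separate operations and the Merge just described can be traced on a small tube. The sketch below is illustrative only: strings are tuples of symbol names, the rotation symbol is written `r2`, and the sample contents of `T1` are a hypothetical handful of strings of the forms listed above, not the full tube.

```python
def separate(tube, symbol):
    """Split a tube into (strings containing symbol, strings not containing it)."""
    plus = {s for s in tube if symbol in s}
    return plus, tube - plus

def tok(text):
    """Turn 'X B r2 YT' into the tuple of symbols ('X', 'B', 'r2', 'YT')."""
    return tuple(text.split())

# A hypothetical sample of T1 after the splicing phase.
T1 = {tok(s) for s in ("X B S Y", "X B r2 YT", "Z1 a S Y",
                       "Z1 S Y", "Z1 r2 YT", "Z1 a r2 YT")}

T1_1, T1C_1 = separate(T1, "YT")   # right bracket YT: candidates for rotation
T5, T1C_2 = separate(T1_1, "X")    # left bracket X: strings ready to rotate
T1C = T1C_1 | T1C_2                # Merge of the two complement tubes
```

Because symbols are compared as whole tuple elements, `X` and `XH` are distinct, which mirrors the fact that they are different letters of the alphabet.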

We now execute a Merge operation between $T_5$ and $T_2$: $T_6 = M(T_5, T_2)$. In $T_6$ we get the strings:

$\{Xw\alpha_iY_T \mid 1 \le i \le k\} \cup \{X_H\alpha_iZ_2 \mid 1 \le i \le k\}$

In $T_6$ we can apply only rules of type 3. We get:

$(Xw\alpha_iY_T, X_H\alpha_jZ_2) \vdash_3 (X_H\alpha_jw\alpha_iY_T, XZ_2)$

After a series of such operations, no new strings appear. We now rebuild tube $T_2$, so that we can use it again later. To rebuild this tube we avoid the "Amplify" operation because, as said in [1], in practical applications this operation is quite complex and error-prone. We use instead a series of Separate operations, followed by a Merge operation.

- $T_{7,1} = +(T_6, X_H\alpha_1Z_2)$ = the string $X_H\alpha_1Z_2$. In $T_6$ there are no other strings with this substring.
- $T^C_{7,1} = -(T_6, X_H\alpha_1Z_2)$ = remaining strings (all but $X_H\alpha_1Z_2$).
- $T_{7,2} = +(T^C_{7,1}, X_H\alpha_2Z_2)$ = the string $X_H\alpha_2Z_2$.
- $T^C_{7,2} = -(T^C_{7,1}, X_H\alpha_2Z_2)$ = remaining strings (all but $X_H\alpha_1Z_2$, $X_H\alpha_2Z_2$).
- ...
- $T_{7,i} = +(T^C_{7,i-1}, X_H\alpha_iZ_2)$ = the string $X_H\alpha_iZ_2$.
- $T^C_{7,i} = -(T^C_{7,i-1}, X_H\alpha_iZ_2)$ = remaining strings (all but $X_H\alpha_1Z_2, X_H\alpha_2Z_2, \ldots, X_H\alpha_iZ_2$).
- ...
- $T_{7,k} = +(T^C_{7,k-1}, X_H\alpha_kZ_2)$ = the string $X_H\alpha_kZ_2$.
- $T^C_{7,k} = -(T^C_{7,k-1}, X_H\alpha_kZ_2)$ = remaining strings (all but $X_H\alpha_1Z_2, X_H\alpha_2Z_2, \ldots, X_H\alpha_kZ_2$).

With $k$ Separate operations (recall that $k$ = number of terminal characters + number of non-terminal characters + 1), one operation for every character that can be rotated, we get $k + 1$ tubes. Each of the first $k$ tubes contains one axiom of the tube $T_2$ (for instance, the tube $T_{7,i}$ contains the axiom $X_H\alpha_iZ_2$, i.e. the axiom used to rotate the $i$-th character), while the last tube, $T^C_{7,k}$, contains the remaining strings. By looking at the strings in each of these tubes, we can see that no splicing rule can be applied. By merging all the tubes $T_{7,i}$ we get exactly $T_2$: $T_2 = M(T_{7,1}, T_{7,2}, \ldots, T_{7,k})$. As said before, no splicing rule can be applied in $T_2$.

(Next we describe in detail Step 2.3 of Section 3.)
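The rebuilding of $T_2$ by $k$ Separate operations followed by a Merge can be sketched as follows. This is an illustrative sketch only: strings are tuples of symbol names, rotation symbols are written `r1`, `r2`, the helper names `contains`, `separate`, `rebuild` and `tok` are hypothetical, and the sample tube `T6` (for k = 2) is made up for the example.

```python
def contains(seq, pat):
    """True if the tuple pat occurs as a contiguous block inside the tuple seq."""
    n = len(pat)
    return any(seq[i:i + n] == pat for i in range(len(seq) - n + 1))

def separate(tube, pat):
    """Return (strings containing pat as a substring, strings not containing it)."""
    plus = {s for s in tube if contains(s, pat)}
    return plus, tube - plus

def rebuild(tube, axioms):
    """One Separate per axiom, then merge the extracted tubes; return (axiom tube, rest)."""
    recovered, rest = set(), set(tube)
    for axiom in axioms:
        plus, rest = separate(rest, axiom)
        recovered |= plus
    return recovered, rest

def tok(text):
    return tuple(text.split())

k = 2
T2 = [tok(f"XH r{i} Z2") for i in range(1, k + 1)]
T6 = set(T2) | {tok("X B r1 YT"), tok("XH r1 B r1 YT"),
                tok("XH r2 B r1 YT"), tok("X Z2")}
new_T2, T8 = rebuild(T6, T2)
```

The whole axiom is used as the separation pattern, so a string such as `XH r1 B r1 YT`, which starts like an axiom but does not contain it as a substring, stays in the remaining tube.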

Let us denote by $T_8$ the tube $T^C_{7,k}$. In $T_8$ we have the strings:

$\{Xw\alpha_iY_T \mid 1 \le i \le k\} \cup \{X_H\alpha_jw\alpha_iY_T \mid 1 \le i, j \le k\} \cup \{XZ_2\}$

We have to deal only with the strings in which we have put a rotation character on the left end (i.e. the strings not containing the bracket character $X$). We can isolate these strings with a Separate operation:
- $T_9 = -(T_8, X)$ = strings not containing $X$
- $T^C_9 = T_G = +(T_8, X)$ = strings containing $X$

In $T^C_9$ we have useless strings, so we denote this tube by $T_G$ to indicate that it contains the "Garbage" strings. In $T_9$ we get $\{X_H\alpha_jw\alpha_iY_T \mid 1 \le i \le k,\ 1 \le j \le k\}$, that is, the strings in which we have put a rotation character on the left end (not necessarily the same rotation character that is on the right end).

Now we have to select from this set of strings the subset containing only the strings in which the rotation character on the left side is the same as that on the right side. To perform this operation we execute a series of Separate operations in two distinct phases. First we make a rough separation of the strings depending on the symbols $\alpha_i$, and then we isolate the strings of the form $X_H\alpha_iw\alpha_iY_T$ (i.e. with the same symbol $\alpha_i$ on the left end and on the right end). We denote by $T_{9,j}$ the tube obtained from $T_9$ by the Separate operation of the first phase on the $j$-th symbol.

Phase 1:
- $T_{9,1} = +(T_9, \alpha_1)$ = strings containing $\alpha_1$
- $T^C_{9,1} = -(T_9, \alpha_1)$ = strings not containing $\alpha_1$
- $T_{9,2} = +(T^C_{9,1}, \alpha_2)$ = strings containing $\alpha_2$ (but not containing $\alpha_1$: this Separate operation is done on $T^C_{9,1}$)
- $T^C_{9,2} = -(T^C_{9,1}, \alpha_2)$ = strings containing neither $\alpha_1$ nor $\alpha_2$
- $T_{9,3} = +(T^C_{9,2}, \alpha_3)$ = strings containing $\alpha_3$ (but containing neither $\alpha_1$ nor $\alpha_2$)
- $T^C_{9,3} = -(T^C_{9,2}, \alpha_3)$ = strings not containing $\alpha_1$, $\alpha_2$ and $\alpha_3$
- ...
- $T_{9,k-1} = +(T^C_{9,k-2}, \alpha_{k-1})$ = strings containing $\alpha_{k-1}$ (but not containing $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_{k-2}$)
- $T^C_{9,k-1} = -(T^C_{9,k-2}, \alpha_{k-1})$ = strings not containing $\alpha_{k-1}$ nor $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_{k-2}$
- $T_{9,k} = T^C_{9,k-1}$ = strings containing $\alpha_k$ (but not containing $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_{k-1}$)

After this series of Separate operations, we can start the second phase. The strings in $T_{9,k}$ contain the symbol $\alpha_k$ but cannot contain any other symbol $\alpha_i$ with $1 \le i \le k-1$. So in $T_{9,k}$ we get only strings of the form $X_H\alpha_kw\alpha_kY_T$, or $T_{9,k}$ is empty. Let us call this tube $T_{10,k}$.

Consider now the tube $T_{9,k-1}$: the strings in this tube are of the form $X_H\alpha_iw\alpha_jY_T$, where $i$ and $j$ can only be equal to $k-1$ or $k$, and at least one of $i$ and $j$ is equal to $k-1$. We are only interested in the strings of the form $X_H\alpha_{k-1}w\alpha_{k-1}Y_T$. To extract these strings, we perform the following Separate operation (we denote by $T_{9,i,j}$ the tube obtained from $T_{9,i}$ by the Separate operation on the character $\alpha_j$):
- $T_{9,k-1,k} = +(T_{9,k-1}, \alpha_k)$ = strings in $T_{9,k-1}$ that contain $\alpha_k$. These strings are of the form $X_H\alpha_{k-1}w\alpha_kY_T$ or $X_H\alpha_kw\alpha_{k-1}Y_T$. The character on the left end is different from the character on the right end. These strings are "wrong", so we put them in the garbage tube $T_G$ with a Merge operation.
- $T^C_{9,k-1,k} = T_{10,k-1} = -(T_{9,k-1}, \alpha_k)$ = strings in $T_{9,k-1}$ without the character $\alpha_k$. These strings are of the form $X_H\alpha_{k-1}w\alpha_{k-1}Y_T$.

If we look at tube $T_{9,k-2}$ we see that the strings inside it are of the form $X_H\alpha_iw\alpha_jY_T$, where $i$ and $j$ can only be equal to $k$, $k-1$ or $k-2$, and at least one of $i$ and $j$ is equal to $k-2$. We are interested only in the strings of the form $X_H\alpha_{k-2}w\alpha_{k-2}Y_T$. We can isolate these strings with two Separate operations:
- $T_{9,k-2,k} = +(T_{9,k-2}, \alpha_k)$ = strings in $T_{9,k-2}$ that contain $\alpha_k$. These strings are of the form $X_H\alpha_{k-2}w\alpha_kY_T$ or $X_H\alpha_kw\alpha_{k-2}Y_T$. The character on the left end is different from the character on the right end. These strings are "wrong", so we put them in the garbage tube $T_G$ with a Merge operation.


- $T^C_{9,k-2,k} = -(T_{9,k-2}, \alpha_k)$ = strings in $T_{9,k-2}$ without $\alpha_k$. These strings can only contain the characters $\alpha_{k-1}$ and $\alpha_{k-2}$. Obviously, at least one of the two rotation characters in these strings is $\alpha_{k-2}$. We eliminate the strings containing $\alpha_{k-1}$ with a Separate operation.
- $T_{9,k-2,k-1} = +(T^C_{9,k-2,k}, \alpha_{k-1})$ = strings in $T^C_{9,k-2,k}$ that contain the symbol $\alpha_{k-1}$. These strings are of the form $X_H\alpha_{k-2}w\alpha_{k-1}Y_T$ or $X_H\alpha_{k-1}w\alpha_{k-2}Y_T$. The character on the left end is different from the character on the right end. These strings are "wrong", so we put them in the garbage tube $T_G$ with a Merge operation.
- $T^C_{9,k-2,k-1} = T_{10,k-2} = -(T^C_{9,k-2,k}, \alpha_{k-1})$ = strings in $T^C_{9,k-2,k}$ without the symbol $\alpha_{k-1}$. These strings are of the form $X_H\alpha_{k-2}w\alpha_{k-2}Y_T$.

By repeating these operations starting from the tubes $T_{9,k-3}, T_{9,k-4}, \ldots, T_{9,1}$ (and executing a number of Separate operations equal to $i$ for the tube $T_{9,k-i}$) we are able to create tubes $T_{10,i}$ ($1 \le i \le k$) in which we have only strings of the form $X_H\alpha_iw\alpha_iY_T$. In none of the tubes described above (obtained after each Separate operation in the first and in the second phase) can splicing rules be applied. Thus, after every Separate operation, no new string can be produced.

Now we merge all the tubes $T_{10,i}$ into a single tube: $T_{10} = M(T_{10,1}, T_{10,2}, \ldots, T_{10,k-1}, T_{10,k})$. In $T_{10}$ we have only strings in which the special rotation characters ($\alpha_i$) are the same on the left end and on the right end, i.e. $T_{10}$ contains only the strings $\{X_H\alpha_iw\alpha_iY_T \mid 1 \le i \le k\}$. In $T_{10}$ we can create nothing new by applying splicing rules, so we merge $T_{10}$ with $T_3$: $T_{11} = M(T_3, T_{10})$.

(Next we describe Step 2.4 of Section 3.)
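The two phases can be written as two nested loops for an arbitrary number k of rotation symbols. The sketch below is an illustration under assumptions: strings are tuples of symbol names, the rotation symbols are written `r1`, ..., `rk`, the function names are hypothetical, and the sample tube contains one string for every left/right combination over a fixed inner word.

```python
def separate(tube, symbol):
    """Return (strings containing symbol, strings not containing it)."""
    plus = {s for s in tube if symbol in s}
    return plus, tube - plus

def select_matching(T9, k):
    """Return (strings with equal rotation symbols, garbage, number of Separate operations)."""
    alpha = [f"r{i}" for i in range(1, k + 1)]
    count = 0
    # Phase 1: rough separation on r1, ..., r(k-1); what is left can only contain rk.
    phase1, rest = [], set(T9)
    for i in range(k - 1):
        plus, rest = separate(rest, alpha[i])
        phase1.append(plus)
        count += 1
    phase1.append(rest)
    # Phase 2: from the i-th tube discard every string that also contains a later symbol.
    good, garbage = set(), set()
    for i, tube in enumerate(phase1):
        for j in range(k - 1, i, -1):
            wrong, tube = separate(tube, alpha[j])
            garbage |= wrong
            count += 1
        good |= tube
    return good, garbage, count

k = 3
T9 = {("XH", f"r{j}", "B", "a", f"r{i}", "YT")
      for i in range(1, k + 1) for j in range(1, k + 1)}
good, garbage, count = select_matching(T9, k)
```

In this sketch phase 1 uses k - 1 Separate operations and phase 2 uses k(k - 1)/2, so the total grows quadratically in k; for k = 3 this gives the five separations of the informal example.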

In $T_{11}$ we get the strings

$\{X_H\alpha_iw\alpha_iY_T \mid 1 \le i \le k\} \cup \{XU_iZ_3 \mid 1 \le i \le k\} \cup \{Z_3Y\}$

By applying splicing rules of types 4 and 5, the only two types of rules we can apply to these strings, we get:

$(X_H\alpha_iw\alpha_iY_T, XU_iZ_3) \vdash_4 (XU_iw\alpha_iY_T, X_H\alpha_iZ_3)$

$(XU_iw\alpha_iY_T, Z_3Y) \vdash_5 (XU_iwY, Z_3\alpha_iY_T)$

If we apply the rules of type 5 before the rules of type 4, we can also create strings of the form $X_H\alpha_iwY$. After a series of operations of these types, we get in $T_{11}$ the strings

$\{XU_iw\alpha_iY_T \mid 1 \le i \le k\} \cup \{X_H\alpha_iwY \mid 1 \le i \le k\} \cup \{XU_iwY \mid 1 \le i \le k\} \cup \{Z_3\alpha_iY_T \mid 1 \le i \le k\} \cup \{X_H\alpha_iZ_3\}$,

in addition to the strings that were already present:

$\{X_H\alpha_iw\alpha_iY_T \mid 1 \le i \le k\} \cup \{XU_iZ_3 \mid 1 \le i \le k\} \cup \{Z_3Y\}$
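The two splicings that complete a rotation can be checked on a concrete string. The sketch below is illustrative only: strings and rule components are tuples of symbol names, the rotation symbol of the first symbol `a` is written `r1`, and the helper names `tok`, `occurrences` and `splice` are hypothetical. The pairs are returned in the order fixed by the definition of the splicing relation, which may list the two resulting strings in the opposite order to the text.

```python
def tok(text):
    return tuple(text.split())

def occurrences(seq, pat):
    """Indices at which the tuple pat occurs contiguously in the tuple seq."""
    n = len(pat)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == pat]

def splice(x, y, rule):
    """All pairs (z, w) with (x, y) |-_r (z, w), for r = (u1, u2, u3, u4) given as tuples."""
    u1, u2, u3, u4 = rule
    out = set()
    for i in occurrences(x, u1 + u2):
        x1, x2 = x[:i], x[i + len(u1) + len(u2):]
        for j in occurrences(y, u3 + u4):
            y1, y2 = y[:j], y[j + len(u3) + len(u4):]
            out.add((x1 + u1 + u4 + y2, y1 + u3 + u2 + x2))
    return out

rule4 = (tok("XH r1"), (), tok("X a"), tok("Z3"))   # XH r1 # $ X a # Z3
rule5 = ((), tok("r1 YT"), tok("Z3"), tok("Y"))     # # r1 YT $ Z3 # Y

# Decode the left rotation symbol r1 into the grammar symbol a.
decoded = splice(tok("XH r1 B S r1 YT"), tok("X a Z3"), rule4)
# Delete the rotation symbol on the right end and restore the bracket Y.
finished = splice(tok("X a B S r1 YT"), tok("Z3 Y"), rule5)
```

Starting from a string that originally ended with `a`, the result `X a B S Y` has that symbol moved from the right end to the left end.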

By applying splicing rules we can create nothing new, so we start another series of Merge and Separate operations. First of all, we have to rebuild the tube $T_3$. This can be done with a series of Separate operations followed by a Merge operation, as we have done for $T_2$. Let us denote by $T_{12}$ the tube we get after rebuilding $T_3$. In $T_{12}$ we have the strings:

$\{XU_iw\alpha_iY_T \mid 1 \le i \le k\} \cup \{X_H\alpha_iwY \mid 1 \le i \le k\} \cup \{XU_iwY \mid 1 \le i \le k\} \cup \{Z_3\alpha_iY_T \mid 1 \le i \le k\} \cup \{X_H\alpha_iZ_3\} \cup \{X_H\alpha_iw\alpha_iY_T \mid 1 \le i \le k\}$

We are interested in the strings in which we have completed the rotation of the character that was initially present on the right end. These are the strings $\{XU_iwY \mid 1 \le i \le k\}$, and they are the only strings in $T_{12}$ that contain both the symbols $X$ and $Y$. To divide these strings from the others, we execute two Separate operations:
- $T_{12,1} = +(T_{12}, X)$ = strings containing $X$
- $T^C_{12,1} = -(T_{12}, X)$ = strings not containing $X$
- $T_{13} = T_{12,2} = +(T_{12,1}, Y)$ = strings containing $Y$ (and containing $X$, because we started from $T_{12,1}$)
- $T^C_{12,2} = -(T_{12,1}, Y)$ = strings not containing $Y$ (and containing $X$)

In the tubes obtained ($T^C_{12,1}$, $T_{12,1}$, $T_{13}$ and $T^C_{12,2}$) we can create nothing new by applying splicing rules. In $T_{13}$ we get exactly the strings $\{XU_iwY \mid 1 \le i \le k\}$, that is, the strings in which we have rotated one character from the right end to the left end of the string. Using a Merge operation we put the contents of $T^C_{12,1}$ and of $T^C_{12,2}$ into the garbage tube $T_G$.

Let us now consider the tube $T_1^C$ we obtained after the first Separate operations on $T_1$. By merging $T_1^C$ with $T_{13}$ we get a tube in which we have the following strings:

$\{XwU_iY \mid 1 \le i \le k,\ w \in U^*\} \cup \{XU_iwY \mid 1 \le i \le k,\ w \in U^*\} \cup \{Z_1mY \mid m \in U^*,\ m$ is a left or right term of a production in $G\} \cup \{Z_1\alpha_iY_T \mid 1 \le i \le k\} \cup \{Z_1U_iY \mid 1 \le i \le k\} \cup \{Z_1w\alpha_iY_T \mid 1 \le i \le k,\ w \in U^*\}$.

This means that, starting from a set in which we have strings of the form $XwU_iY$ plus other strings, we get a tube with the same strings and additional strings of the form $XU_iwY$, in which the character on the right end has been rotated to the left end of the string. If we denote this tube by $T_1$ and reapply the procedure just described (new strings can be created by using the strings with the rotated symbol), we have that this system is able to simulate the productions of the grammar $G$ on the right end of the strings, and to rotate the characters of the strings so as to bring a production site that is in the middle of the string to the right end. The symbol $B$ indicates the place where the string actually starts, so we can recognize a fully rotated string.

(Next we describe in detail STEP 3 of Section 3.)


We now show how to extract the strings containing only terminal symbols. This operation has to be executed on the strings of the form $XBwY$ (i.e. completely rotated, because $B$ is on the left end of the string) where $w \in T^*$. We can execute this operation on $T_1$ or on $T_1^C$ with only small differences. We operate here on $T_1$, after the phase of application of the splicing rules has been completed.

We start by isolating the strings that have been completely rotated. We can recognize these strings from the fact that they are the only strings in which the symbol that denotes the start of the string (the character $B$) is immediately to the right of the bracket symbol $X$. We perform a Separate operation based on the substring $XB$:
- $T_{T,1} = +(T_1, XB)$ = strings containing $XB$ as a substring
- $T^C_{T,1} = -(T_1, XB)$ = strings not containing $XB$ as a substring

In $T_{T,1}$ and in $T^C_{T,1}$ we cannot apply splicing rules. Using a series of Separate operations on $T_{T,1}$, we can extract the strings containing at least one non-terminal symbol or special symbol (except the symbols $X$, $Y$ and $B$). We obtain a tube in which the strings are of the form $XBtY$, where $t$ is a terminal string. We denote this tube by $T_{T,2}$. In this tube we cannot apply splicing rules, so we cannot create new strings. The same is true for every tube obtained after each of the Separate operations just executed.

Note: we merge the other tubes just created and then merge the resulting tube with $T^C_{T,1}$. The tube obtained is the one to which we have to apply the procedure explained before (this tube contains all the strings in $T_1$ except the completely rotated strings of terminal symbols).

We now execute a Merge between $T_{T,2}$ and $T_4$: $T_{T,3} = M(T_{T,2}, T_4)$. In $T_{T,3}$ we get the strings $\{XBtY \mid t \in T^*\} \cup \{Z_H, Z_T\}$. By applying splicing rules 6 and 7 we get:

$(XBtY, Z_H) \vdash_6 (tY, XBZ_H)$

$(tY, Z_T) \vdash_7 (t, Z_TY)$

If we apply splicing rule 7 to strings of the form $XBtY$ we also get $XBt$. After a series of operations of these types we can create nothing new by applying splicing rules.
In $T_{T,3}$ we have the strings:

$\{XBtY \mid t \in T^*\} \cup \{Z_H, Z_T\} \cup \{XBZ_H, Z_TY\} \cup \{tY \mid t \in T^*\} \cup \{XBt \mid t \in T^*\} \cup \{t \mid t \in T^*\}$

We want to separate the strings $\{t \mid t \in T^*\}$ from the others, so we execute a series of Separate operations:
- $T_{T,4} = +(T_{T,3}, X)$ = strings containing $X$ ($XBtY$, $XBZ_H$, $XBt$)
- $T^C_{T,4} = -(T_{T,3}, X)$ = strings not containing $X$ ($Z_H$, $Z_T$, $Z_TY$, $tY$, $t$)
- $T_{T,5} = +(T^C_{T,4}, Y)$ = strings containing $Y$ (but not $X$, because we start from $T^C_{T,4}$)
- $T^C_{T,5} = -(T^C_{T,4}, Y)$ = strings not containing $Y$ (and not containing $X$, because we start from $T^C_{T,4}$) = $\{Z_H\} \cup \{Z_T\} \cup \{t \mid t \in T^*\}$


- $T_{T,6} = +(T^C_{T,5}, Z_H)$ = strings containing $Z_H$ (but containing neither $X$ nor $Y$, because we start from $T^C_{T,5}$) = $\{Z_H\}$
- $T^C_{T,6} = -(T^C_{T,5}, Z_H)$ = strings not containing $Z_H$ (and containing neither $X$ nor $Y$, because we start from $T^C_{T,5}$) = $\{Z_T\} \cup \{t \mid t \in T^*\}$
- $T_{T,7} = +(T^C_{T,6}, Z_T)$ = strings containing $Z_T$ (but not containing $X$, $Y$ and $Z_H$, because we start from $T^C_{T,6}$) = $\{Z_T\}$
- $T^C_{T,7} = -(T^C_{T,6}, Z_T)$ = strings not containing $Z_T$ (and not containing $X$, $Y$ and $Z_H$, because we start from $T^C_{T,6}$) = $\{t \mid t \in T^*\}$

In none of these tubes can new strings be created by applying splicing rules. We denote $T^C_{T,7}$ by $T_T$. $T_T$ contains all the terminal strings that can be generated by the grammar $G$ and no other string. The other tubes created in this phase can be merged into the garbage tube $T_G$.
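The final chain of Separate operations keeps, at every step, the tube of strings without the tested symbol. A minimal sketch, assuming strings are tuples of symbol names and using a hypothetical terminal string `a a` together with the by-products of rules 6 and 7; the names `keep_without` and `tok` are illustrative.

```python
def tok(text):
    return tuple(text.split())

def keep_without(tube, symbols):
    """Apply one Separate per symbol, keeping the complement tube; return (kept, garbage)."""
    kept, garbage = set(tube), set()
    for sym in symbols:
        plus = {s for s in kept if sym in s}   # +(kept, sym)
        garbage |= plus                        # later merged into the garbage tube
        kept -= plus                           # -(kept, sym)
    return kept, garbage

# A sample of the tube after rules 6 and 7, for the single terminal string t = a a.
TT3 = {tok(s) for s in ("X B a a Y", "ZH", "ZT", "X B ZH",
                        "ZT Y", "a a Y", "X B a a", "a a")}
TT, TG_part = keep_without(TT3, ["X", "Y", "ZH", "ZT"])
```

Only the bare terminal string survives the four separations; every other string contains at least one of the bracket or auxiliary symbols.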

5 Observations

We defined this model as starting with exactly four test tubes, with the number of tubes varying during the computation. It has been suggested that, using well-known splicing techniques, we could start with a single test tube. The molecules of each of the four families of starting axioms could be marked in four different ways and put together in the single starting tube. Later, when each family has to be used, the respective strands could be selected and have their marker removed, so as to rebuild one of our four starting test tubes.

Other observations could be made with respect to the largest number of test tubes needed at any time during a computation. In our simulation of a grammar having $O(k)$ symbols, a quick analysis shows that we need to perform $O(k^2)$ Merge or Separate operations, and that we use at most $O(2k)$ test tubes. This can be achieved if we order in a smart way all the operations in each of the groups of operations described above. But since these bounds depend on $k$, it would be interesting to look for a system operating with bounds that are constant with respect to the number of symbols.

References

[1] L.M. Adleman, On constructing a molecular computer, in DNA Based Computers, Proc. of a DIMACS Workshop, Princeton, 1995, Amer. Math. Soc., 1996 (R.J. Lipton, E.B. Baum, eds.), 1-22.
[2a] G. Paun, DNA computing: distributed splicing systems, in Structures in Logic and Computer Science. A Selection of Essays in honor of A. Ehrenfeucht, Lect. Notes in Computer Sci. 1261, Springer-Verlag, 1997.
[2b] L. Priese, Y. Rogojin, M. Margenstern, Finite H-systems with 3 test tubes are not predictable, Pacific Symp. on Biocomputing, Hawaii, 1998, World Scientific.
[3] T. Head, Formal language theory and DNA: an analysis of the generative capacity of specific recombinant behaviors, Bull. Math. Biology, 49 (1987), 737-759.
[4] Gh. Paun, G. Rozenberg, A. Salomaa, DNA Computing: New Computing Paradigms, Springer-Verlag, 1998.
[5] E. Laun, K.J. Reddy, Wet Splicing Systems, Proceedings of the 3rd DIMACS workshop on DNA based computers, 1997, 115-126.