In vitro Implementation of Finite-State Machines

M. Garzon*, Y. Gao, J.A. Rose, R.C. Murphy, R. Deaton, D.R. Franceschetti, and S.E. Stevens, Jr.
The Molecular Computing Group, The University of Memphis, Memphis, TN 38152, U.S.A.

Abstract. We explore the information processing capabilities and efficiency of DNA computations by giving two different types of implementations of finite-state machines. A ligation-based approach allows input of arbitrary length and can be readily implemented with current biotechnology, but requires sequential input feed and different molecules for different machines. In a second implementation, not based on ligation, transitions are represented by reusable molecules, and the input, coded as a molecule, can be introduced at once. We extend the technique to a programmable, fault-tolerant implementation of nondeterministic finite-state machines by enforcing the basic conditions of the subset construction that permit efficient computation. All implementations allow optical extraction of the status of the machine.

1 Introduction

Biological paradigms are now well known to provide fundamentally new insights into computing. Examples include genetic algorithms, genetic programming, and evolutionary programming. They are inspired by biological processes, yet remain only analogies that need to be implemented in electronics. More recently, Adleman [1] pushed this insight beyond mere analogy by showing that real-life processes underlying DNA, such as recombination and separation, can carry out computations meaningful to human endeavors. Specifically, he proposed a way to solve instances of the Hamiltonian Path problem that could bring within reach instances inaccessible to conventional computers. Several algorithms, such as binary addition [10], real-valued multiplication [15], breadth-first search and dynamic programming [2], have been implemented using DNA. An important, though implicit, fact in Adleman's approach is the specific role that environmental conditions play in the success or failure of computational approaches. Perhaps as important from a practical point of view is to understand the range of tasks that DNA computing can perform efficiently and reliably under realistic assumptions about the chemical environments in which DNA computations take place. Much research [1,16,5,6] is now being done to understand the reliability and feasibility of these techniques for pushing the limits of feasible computation, a necessary step in their development.

* Corresponding author: [email protected]

In this paper, we continue to explore the power and feasibility of DNA computing by examining its relationship to low-level complexity classes. In particular, we explore the recognition of regular languages, a well-known and well-understood complexity class with a wide variety of very practical applications (see, e.g., [7]). We show DNA implementations of deterministic and nondeterministic automata that are realistic, fault-tolerant, and can be efficiently implemented in vitro. The designs in this paper are intended to serve as a generic algorithm for implementing a finite-state machine with DNA processes. We regard this as a first step in an investigation of the realistic information processing capabilities of DNA. Preliminary results of this paper have been presented in [18,?].

As with any other form of computation, a DNA computation consists of three basic steps: encoding, which maps the problem onto DNA strands; hybridization/ligation, which performs the basic core processing; and extraction, which makes the results visible to the naked eye. In DNA-based computation, the instances of a problem are encoded in oligonucleotides, or strands, of DNA. The encoding alphabet is the set of nucleic acid bases A, T, G, C, which bind according to the Watson-Crick (WC) complement condition, A = T and C ≡ G, and vice versa. Oligonucleotides bind in an antiparallel way with respect to the chemically distinct ends, 5' and 3', of the DNA molecule. Unlike the usual convention, oligonucleotides are written in this paper from the 3' to the 5' end throughout, unless we are describing double strands; in that case the lower oligonucleotide is always directed from 5' to 3', as a Watson-Crick (WC) complement. Hybridization is a chemical process that joins two complementary single strands into a double strand. Ligation is a chemical process whereby two double strands are joined into one double strand.
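The complement condition is easy to state as code. The Python sketch below is purely illustrative; the helper names (`complement`, `reverse_complement`, `can_pair`) are our own and do not come from any library. Following the drawing convention just described, the lower strand of a duplex, written aligned under the upper one, is simply the base-by-base complement; reversing it gives the partner re-read in the same direction as the upper strand.

```python
# Watson-Crick complement; lower-case letters (the paper's discriminating
# bases) are kept lower-case.
WC = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(strand: str) -> str:
    """Base-by-base complement: the partner strand as drawn aligned
    (antiparallel) underneath `strand`."""
    return strand.translate(WC)


def reverse_complement(strand: str) -> str:
    """The partner strand re-read in the same direction as `strand`."""
    return complement(strand)[::-1]


def can_pair(upper: str, lower: str) -> bool:
    """True if `lower`, aligned under `upper`, is its exact WC complement."""
    return len(upper) == len(lower) and complement(upper) == lower


if __name__ == "__main__":
    print(complement("ttat"))            # aata
    print(can_pair("gctg", "cgac"))      # True
    print(reverse_complement("GAATTC"))  # GAATTC: the EcoRI site is palindromic
```

For instance, the state 4-mer ttat used later pairs with an overhang reading aata.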
A restriction enzyme (such as SmaI or EcoRI) is a protein characterized by the double-stranded DNA sequence it recognizes, called a site (written upper strand/lower strand, with ' marking the cut: CCC'GGG/GGG'CCC for SmaI and G'AATTC/CTTAA'G for EcoRI), which it cuts into two segments (ending in CCC/GGG and GGG/CCC for SmaI, and in G/CTTAA and AATTC/G for EcoRI). See [20,21] for further background in molecular biology. We now proceed to present two implementations of deterministic fsm's and the extension to nondeterministic devices. The computational steps of the deterministic implementations have been used in [18], but we present some details in order to make the paper fairly self-contained.
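How a site determines the ends left after digestion can be sketched in a few lines. The sketch below is illustrative only: the `Enzyme` record and the `digest` and `fragments` helpers are hypothetical names, sequences are written 5' to 3' in the usual convention (not the 3'-to-5' convention adopted in this paper), and each site is assumed to be palindromic, as for SmaI and EcoRI, so that the lower strand is cut at the mirror-image position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enzyme:
    name: str
    site: str  # recognition site, upper strand written 5' -> 3'
    cut: int   # cut position inside the site on the upper strand


SMAI = Enzyme("SmaI", "CCCGGG", 3)    # CCC'GGG: blunt ends
ECORI = Enzyme("EcoRI", "GAATTC", 1)  # G'AATTC: 5' overhang AATT


def digest(upper: str, enzyme: Enzyme) -> list:
    """For each site in `upper`, return (upper_cut, lower_cut, overhang) in
    upper-strand coordinates. The site is assumed palindromic, so the lower
    strand is cut at the mirror-image position."""
    cuts = []
    start = upper.find(enzyme.site)
    while start != -1:
        top = start + enzyme.cut
        bottom = start + len(enzyme.site) - enzyme.cut
        lo, hi = sorted((top, bottom))
        cuts.append((top, bottom, upper[lo:hi]))
        start = upper.find(enzyme.site, start + 1)
    return cuts


def fragments(upper: str, enzyme: Enzyme) -> list:
    """Upper-strand pieces left after complete digestion."""
    pieces, prev = [], 0
    for top, _, _ in digest(upper, enzyme):
        pieces.append(upper[prev:top])
        prev = top
    pieces.append(upper[prev:])
    return pieces


if __name__ == "__main__":
    print(digest("AAGAATTCTT", ECORI))     # [(3, 7, 'AATT')]: sticky ends
    print(digest("TTCCCGGGAA", SMAI))      # [(5, 5, '')]: blunt ends
    print(fragments("AAGAATTCTT", ECORI))  # ['AAG', 'AATTCTT']
```

An empty overhang marks a blunt end, as with SmaI; EcoRI leaves the single-stranded four-base overhang AATT, which is what makes its ends "sticky".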

2 Implementation with Ligation

In a finite-state machine (fsm), the input is given by words over an alphabet written on a one-way, read-only tape. The device has a finite control that reads problem instances on the input tape (in the form of strings of symbols) and computes according to a prewired program that drives the machine from one memory (source) state to the next (target) state, depending on the symbol just read on the tape. We assume that the reader is familiar with finite-state machines (see [11] for background). An example that checks divisibility by 3 of a binary string is shown in Fig. 1; it will be used throughout this section. The implementations are based on DNA hybridization, that is, under proper conditions, complementary single-stranded DNA will hybridize to form a double-stranded helix. Error-preventing encoding strategies, which are critical to the prevention of unwanted hybridizations, have been discussed elsewhere [3,9] and will be assumed but not addressed here.
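For readers who want to check the transitions mechanically, the following Python sketch simulates the machine electronically. It assumes that Fig. 1 is the usual remainder automaton, in which the state is the value of the binary prefix read so far (most significant bit first) modulo 3, and state 0 is both initial and accepting; this agrees with the transition from state 0 to state 1 on input 1 described later for the start molecule. The names `DELTA`, `run`, and `accepts` are our own.

```python
# Divisibility by 3 of a binary string read most significant bit first.
# Assumption: state s means "value read so far is congruent to s mod 3",
# so reading bit b moves s to (2*s + b) mod 3.
DELTA = {(s, b): (2 * s + b) % 3 for s in range(3) for b in (0, 1)}
START = 0
ACCEPTING = {0}


def run(bits: str) -> int:
    """Return the state reached after reading `bits` from START."""
    state = START
    for ch in bits:
        state = DELTA[(state, int(ch))]
    return state


def accepts(bits: str) -> bool:
    """True if the machine ends in an accepting state."""
    return run(bits) in ACCEPTING


if __name__ == "__main__":
    for word in ("110", "111", "1001"):
        print(word, accepts(word))   # 6 and 9 are accepted, 7 is not
```

Replacing `DELTA`, `START`, and `ACCEPTING` is all that is needed to simulate another deterministic fsm in the same way.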

Fig. 1. A finite-state machine for divisibility by 3.

Since the input to the fsm is represented as binary strings, we need to encode them into DNA strands. We use adapters, i.e., double-stranded segments of DNA, usually with a single-stranded overhang, and name them after the corresponding input symbols. The double-stranded part codes for the previous input symbol (0 or 1; there is a special start symbol at the beginning), and the overhang codes for current state information. The dynamic molecule representing the finite control is a double strand with a segment encoding the current state and another segment encoding the last symbol read, which led to the current state. Specifically, the states of the finite-state machine are encoded by nucleotide sequences containing discriminating bases (lower-case 4-mers), given by

state 0: ttat
state 1: gctg
state 2: ctca

The start molecule is:

GGGGAGATCttatCTTAA
CCCCTCTAG

For proper simulation of the transitions, each of the states needs to be embedded in three double strands with a ring, as follows. (A ring is a double strand of DNA with an unhybridized single-stranded oligo in the form of a loop. The loop size and the length of the nucleotide sequence at the end will vary, depending on the size of the fsm being simulated.) In the example, let us call the states 0, 1, 2 and the input symbols 0 and 1, so that we have six combinations: 00, 10, 20, 01, 11, 21. Each input 0 (or 1) is represented by the three molecules 00, 10, 20 (01, 11, 21, respectively), given next:

00:
CCCGGGGAG aataGAATTGGGCCCCTC ; atCTTAAGAG C TAGAATTCTC

01:
CCCGGGGCA aataGAATTGGGCCCCGT ; tgCTTAAGAG C ACGAATTCTC

Note that this strand is obtained by replacing GAGATCttat on the ring in 00 by GCATGCgctg. Likewise:

10:
CCCGGGGAG cgacGAATTGGGCCCCTC ; caCTTAAGAG C GTGAATTCTC

11:
CCCGGGGCA cgacGAATTGGGCCCCGT ; atCTTAAGAG C TAGAATTCTC

Note that this strand is obtained by replacing GAGCACctca on the ring in 10 by GCAATCttat. And likewise:

20:
CCCGGGGAG gagtGAATTGGGCCCCTC ; tgCTTAAGAG C ACGAATTCTC

21:
CCCGGGGCA gagtGAATTGGGCCCCGT ; caCTTAAGAG C GTGAATTCTC

Note that this strand is obtained by replacing GAGTGCgctg on the ring in 20 by GCACACctca.

Under these encodings, the machine is implemented as follows. Initially, the molecule representing the initial state is coupled to beads and packed into a conventional chromatography column (uncoupled adapters are washed away).

1. Step 1: Input 0 is a heterogeneous aliquot, a mixture of the three distinct adapters (00, 10 and 20), loaded onto the column with ligase. Which adapter participates in the reaction (transition) depends on the current state. When the current state is 0, only adapters of type 00 (beginning with aata) match the ttat in the overhang of the current state representation. After ligation, the remaining unreacted input adapters are washed away and collected. Exonuclease is applied to chop any residual input adapters and the unreacted previous state, which has a single-stranded overhang. The reacting 00's are protected because of the ring at the end. This further reduces the amount of contamination, keeping the transitions fault-tolerant.
2. Step 2: Add SmaI, a restriction enzyme that recognizes the palindrome GGGCCC and cuts in the middle, between G and C, giving rise to the blunt CCCGGG end needed for the coupling to the beads. The resulting adapter is coupled to the beads.
3. Step 3: Add EcoRI to cut away the ring at the end, exposing the new state information in the remaining 00's (here ttat, for state 0) for the next round (next input symbol). Note that the ATC segment in the start molecule is the same as the ATC at the end of the ring, to be complemented by the TAG of the complementary strand when the ring is opened by EcoRI digestion. Thus the state representation in the surviving 00's is restored as the new current state.

The function of the ring segment is twofold. First, it prevents false blunt-end reactions during a ligation step; second, it allows coupling of the adapter to beads only at the desired end. GAG is the nucleotide encoding for binary input 0, and GCA encodes input 1. The EcoRI restriction site is used to remove the end ring sequence, exposing the state for the subsequent reaction. The adapter itself is an encoding for binary 0. Steps 1-3 are repeated for as many symbols as necessary.

Under these conditions, the biochemical reactions faithfully reflect the transitions of the automaton. For example, if adapter 1 plus the required chemicals is added to the start molecule representing the initial state 0, the resulting molecule represents state 1 (compare with Fig. 1). One can check the other five transitions to verify that they do indeed reflect the dynamics of the given fsm (see Table 1). The techniques are well tested and generally used as standard protocols in a molecular biology laboratory. This method can in principle be used for arbitrary deterministic fsm's by changing the segments representing the states (n-mers longer than 4 for more states).
Of course, in practice there are restrictions on how many states can actually be used (although the limitation is only on n, which still yields an exponential number, $4^n$, of state representations). For the extraction process, we can couple a fluorescent dye to the DNA segments representing the states and use optical extraction to detect final states at the end of the computation. Alternatively, the state of the finite-control molecule can be extracted by running a DNA agarose gel; in that case, different states will have to be represented by segments of different lengths so as to distinguish them by the length of the final molecule. (Although it is still possible to detect a difference between segments of the same length with a mass spectrometer, the process is more delicate and time-consuming.) We defer discussion of the advantages of this implementation until Section 4, where we extend it to nondeterministic devices.
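The cycle of Steps 1-3 can be mimicked symbolically to check the adapter design against the automaton. The Python sketch below is an idealization under our own assumptions: every reaction goes to completion, only the 4-mer state segments are modeled (the other flanking bases are ignored), and the 3-mer preceding the state on each ring (ATC, TGC, CAC for states 0, 1, 2) is simply read off the adapter listing above. The names `adapter`, `cycle`, and `run` are hypothetical.

```python
# Idealized model of the ligation-based cycle for the divisibility-by-3 fsm.
WC = str.maketrans("ACGTacgt", "TGCAtgca")        # Watson-Crick complement

STATE_CODE = {0: "ttat", 1: "gctg", 2: "ctca"}    # discriminating 4-mers
INPUT_CODE = {0: "GAG", 1: "GCA"}                 # binary input 0 / 1
SPACER = {0: "ATC", 1: "TGC", 2: "CAC"}           # 3-mer before the state on a ring
DELTA = {(s, b): (2 * s + b) % 3 for s in range(3) for b in (0, 1)}


def adapter(p: int, d: int) -> dict:
    """Adapter 'pd': its overhang pairs with state p; its ring carries delta(p, d)."""
    q = DELTA[(p, d)]
    return {
        "name": f"{p}{d}",
        "overhang": STATE_CODE[p].translate(WC),
        "ring": INPUT_CODE[d] + SPACER[q] + STATE_CODE[q],
    }


# Input symbol d is fed as the mixture of all adapters named *d (e.g. 00, 10, 20).
LIBRARY = {d: [adapter(p, d) for p in range(3)] for d in (0, 1)}


def cycle(exposed: str, d: int) -> str:
    """One pass of Steps 1-3 on the exposed state segment; returns the new one."""
    # Step 1: only the adapter whose overhang is complementary gets ligated.
    ligated = [a for a in LIBRARY[d] if a["overhang"] == exposed.translate(WC)]
    if len(ligated) != 1:
        raise ValueError("expected exactly one matching adapter")
    # Steps 2-3: once EcoRI opens the ring, its state 4-mer is the new overhang.
    return ligated[0]["ring"][-4:]


def run(bits: str) -> str:
    exposed = STATE_CODE[0]          # the start molecule exposes state 0 (ttat)
    for ch in bits:
        exposed = cycle(exposed, int(ch))
    return exposed


if __name__ == "__main__":
    for d in (0, 1):
        for a in LIBRARY[d]:
            print(a["name"], a["overhang"], a["ring"])
    print(run("110"))                # 6 is divisible by 3 -> ttat (state 0)
```

The printed rings reproduce the six ring contents named in the notes of the listing, and `run` agrees with the automaton of Fig. 1 on every input, which is the sense in which the reactions "faithfully reflect" the transitions.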

3 Implementation without Ligation

In this section we present a different implementation of an arbitrary deterministic finite-state machine. It employs molecular representations that are independent of the specific states and inputs required by the machine, so that they become reusable components. The transition diagram of an fsm (see, e.g., Fig. 1) is a subset of the fully connected digraph on the set of given states. This implementation consists of a library of molecules that represent all possible input-directed transitions on this fully connected digraph. A transition consists of a source (current) state and a target (next) state. Additional molecules provide distinguished starting and accepting states. The states and transitions of an arbitrary fsm up to a given number of states can then be obtained by defining the input-directed transitions as an appropriate subset of the molecular library. Molecular programming can then be accomplished by selecting and assembling molecular instructions from this library. The key idea is to reduce the fsm computation to path formation, in a manner similar to Adleman's solution to the HPP, although here the hybridizations are input-directed. Before a set of transition molecules can be manufactured, one must first establish an encoding that defines each element of the input and state sets of the digraph described above in terms of oligonucleotides. However, it is easier to describe the transition molecules first.
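The programming idea, selecting a subset of a fixed library, can be illustrated abstractly. In the Python sketch below (an illustration, not the authors' code), a library molecule is reduced to a triple (source, symbol, target), and the library contains every such triple on the fully connected digraph; we include self-loops, since machines such as the one in Fig. 1 use them. The function names are hypothetical.

```python
from itertools import product


def full_library(num_states: int, alphabet: str) -> set:
    """Every input-directed transition (source, symbol, target) on the fully
    connected digraph over states 0..num_states-1, self-loops included."""
    return set(product(range(num_states), alphabet, range(num_states)))


def program(delta: dict, num_states: int, alphabet: str) -> set:
    """Molecular programming: pick the library molecules realizing delta."""
    library = full_library(num_states, alphabet)
    chosen = {(p, a, q) for (p, a), q in delta.items()}
    if not chosen <= library:
        raise ValueError("delta uses states or symbols outside the library")
    return chosen


# Example: the divisibility-by-3 machine of Fig. 1 (state = value mod 3).
DIV3 = {(s, a): (2 * s + int(a)) % 3 for s in range(3) for a in "01"}

if __name__ == "__main__":
    lib = full_library(3, "01")
    prog = program(DIV3, 3, "01")
    print(len(lib), len(prog))   # 18 6
```

With m states and an alphabet of b symbols the library holds b·m² transition molecules, of which a deterministic machine uses exactly b·m.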

3.1 The Transition Molecules

A transition molecule for a given input-directed transition in the digraph is constructed as shown conceptually in Fig. 3(a). It is a circular single-stranded oligonucleotide that consists of a concatenation of four distinct segments of equal length l: (i) a noncoding strand, which contains no meaningful information but is required to be distinct from all codewords for inputs and states, and hence will be inert with respect to hybridization with them; the noncoding strands in all transition molecules can therefore be chosen to be identical polymers of deoxyribothymidine of length l; (ii) a source co-state strand, identical to the WC complement of the strand representing the source state of the transition, so that the two can stick together by hybridization under appropriate reaction conditions; (iii) a target co-state strand encoding the next state of the transition, which will be able to hybridize with any strand containing the WC complement of the next state; (iv) an input co-symbol, identical to the WC complement of the relevant input symbol, so that the two can stick together. We now describe the construction of several other types of molecules that are necessary to initialize and halt the computation.

3.2 The Starter Molecule

In order to initiate path formation under appropriate reaction conditions, we need a starter molecule for each allowed transition from the initial state of the machine. It is shown conceptually in Fig. 3(b); its structure is similar to that of the analogous transition molecule, but it differs in several segments. First, it has a noncoding strand of length l in place of an initial-state co-coding strand. Second, this molecule also has a polydeoxyribothymidine strand of length l attached to the 3' end of its input co-coding strand. Finally, its noncoding strand has a length of 2l, giving each such starter molecule a total length of 6l nitrogenous bases. We discuss below why these modifications are needed for the molecule to achieve the desired computation.

3.3 The Acceptor Molecule

We also need an acceptor molecule for each transition into an accepting state of an fsm. This molecule, shown in Fig. 3(c), is identical to the analogous transition molecule, except that it has a polydeoxyribofluorouracil strand in place of the next-state encoding strand and a polydeoxyribothymidine strand in place of the input co-coding strand. This allows its presence to be detected optically, in a solution with a certain minimal concentration of such molecules, by means of fluorescence microscopy [20].
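The segment bookkeeping of Sections 3.1-3.3 can be checked with simple arithmetic. The Python sketch below lists segments and their lengths in bases; it is only one reading of the text (the actual segment order is given in Fig. 3, which is not reproduced here, and the segment labels and function names are ours), chosen because it reproduces the stated total of 6l for the starter molecule.

```python
def transition_segments(l: int) -> list:
    """Segments (i)-(iv) of a transition molecule, each of length l."""
    return [("noncoding poly-T", l), ("source co-state", l),
            ("target co-state", l), ("input co-symbol", l)]


def starter_segments(l: int) -> list:
    """Starter: noncoding strand lengthened to 2l, a further noncoding strand
    instead of the source co-state, and a poly-T tail of length l at the
    3' end of the input co-symbol."""
    return [("noncoding", 2 * l), ("noncoding (replaces source co-state)", l),
            ("target co-state", l), ("input co-symbol", l),
            ("poly-T tail", l)]


def acceptor_segments(l: int) -> list:
    """Acceptor: poly-fluorouracil replaces the target strand and poly-T
    replaces the input co-symbol."""
    return [("noncoding poly-T", l), ("source co-state", l),
            ("poly-fluorouracil", l), ("poly-T", l)]


def total_length(segments: list) -> int:
    """Total number of bases in a molecule described by its segments."""
    return sum(length for _, length in segments)


if __name__ == "__main__":
    l = 20  # hypothetical codeword length
    for build in (transition_segments, starter_segments, acceptor_segments):
        print(build.__name__, total_length(build(l)))
```

Under this reading, transition and acceptor molecules both have 4l bases and the starter has 6l.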

3.4 The Input Molecule

Now that we have described the set of transition molecules defining the library, we consider what structures may be built with them alone. Given an fsm of interest, we create a massive number of copies of its transition molecules through PCR [20]. Combining them under reaction conditions favorable for hybridizations of length l results in concatenations of transition molecules having WC-complementary target-state encoding and source-state co-coding strands. However, although this set of molecules represents a set of computation paths on the fsm of interest, none of these paths stems from an input string, nor would the majority begin at the fsm's starting state. We solve the problem of coupling transition-molecule-mediated parallel path formation to sequential input by constructing an additional molecule to represent the input string to be tested. This input molecule, shown in Fig. 3(d), is a 5'-to-3' ordered concatenation of the codewords for the symbols comprising the input string. At the 5' end of the input string, we attach an additional poly-deoxyriboadenosine sequence of length l. Its purpose is to raise the melting temperature to a level sufficient to prevent formation of double-stranded DNA of length l, thus rendering energetically unfavorable the formation of all two-molecule hybrids except the starter-molecule/input-molecule pairs. This step is necessary to enforce the sequentiality of path formation. Moreover, each input symbol is separated from the next in the input molecule by a short sequence of two deoxyriboadenosine nucleotides, whose purpose is to provide proper spacing of transition molecules as they sequentially hybridize to the input molecule. Again, we stress that the reaction temperature is such that all hybridization events consist of the cooperative binding of a total of 2l nucleotide pairs. Finally, a poly-deoxyriboadenosine string of length l is attached to the 3' end of each input molecule, which is the end representing the terminus of the input string. From the perspective of an acceptor molecule in solution, this has the effect of presenting the 2l nucleotides required for a favorable hybridization event if and only if the molecular assembly in question has completed a computation arriving at an accepting final state.

An important component of the molecular library is a supply of molecules representing input strings of interest. A library of DNA strands representing the input symbols used in the construction of our molecular library can be maintained and used to form input strands by sequential phosphorylation and ligation of the populations of DNA representing each input symbol comprising the input string, along with the associated initiator, spacer, and acceptor strands. This eliminates the need for continuing solid-phase synthesis once this input symbol library has been established. Alternatively, once a set of completed input strings has been established, new input strings to be tested might be chosen from among complete molecules already synthesized, with an appropriate redefinition of the fsm and its symbol set.
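The layout of the input strand can be expressed as a string template. The sketch below is illustrative only: the codewords are placeholders (the paper does not fix them here), and the helper name `input_molecule` is ours. It assembles, 5' to 3', the leading poly-A of length l, the codewords separated by the two-base A spacer, and the trailing poly-A of length l.

```python
def input_molecule(word: str, codeword: dict, l: int, spacer: str = "AA") -> str:
    """Input strand, 5' -> 3': poly-A(l), codewords joined by `spacer`, poly-A(l)."""
    for symbol in word:
        if len(codeword[symbol]) != l:
            raise ValueError(f"codeword for {symbol!r} must have length {l}")
    return "A" * l + spacer.join(codeword[s] for s in word) + "A" * l


# Placeholder codewords of length l = 6 (hypothetical, not from the paper).
CODEWORD = {"0": "GCTCGT", "1": "CGTGCC"}

if __name__ == "__main__":
    strand = input_molecule("101", CODEWORD, 6)
    print(strand)
    # Length check: 2l flanks + m codewords of length l + 2(m - 1) spacer bases.
    m, l = 3, 6
    print(len(strand) == 2 * l + m * l + 2 * (m - 1))
```

An input of m symbols therefore costs (m + 2)l + 2(m - 1) bases, which makes concrete the remark in Section 3.6 that the input molecule may be very long.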

3.5 Simulation

The first step of the simulation is molecular programming, i.e., selecting the appropriate transitions from the molecular library. The second step is the actual reaction, which proceeds according to the design of the various molecules. The final phase is separation and detection. Running the reaction solution through a series of liquid chromatography columns [20] will separate smaller molecules, most importantly detection molecules that failed to hybridize, from the ensemble of larger molecular complexes that have formed, the set of which represents the computation. The remaining solution can then be assayed for fluorescence, the presence of which signals acceptance of the input string by the fsm. Thus, detection of computational results is optical. Solutions used in the reaction can then be heated for denaturation, separated by standard means, and stored for reuse.
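The 2l binding rule that drives the reaction can be modeled symbolically. In the Python sketch below (an idealized model under our own assumptions; all names are hypothetical), each molecule is reduced to the two things it can pair with: what its left-hand segment recognizes (the exposed target state of the previous molecule, or the 5' poly-A for a starter) and what it offers to the current slot of the input strand (a symbol's codeword, or the 3' poly-A for an acceptor). A molecule binds only if both halves pair, i.e., only if l + l = 2l base pairs form. For simplicity we use one acceptor per accepting state rather than one per transition into it, which makes no difference to this check.

```python
POLY_A = "polyA"   # stands for a poly-deoxyriboadenosine flank of length l
FLUOR = "fluor"    # the acceptor's fluorescent, non-coding "target"


def build_library(delta: dict, start, accepting: set) -> list:
    """Starters, transition molecules, and acceptors for a deterministic fsm."""
    lib = []
    for (p, a), q in delta.items():
        if p == start:
            # A starter's poly-T tail pairs with the 5' poly-A, not a state.
            lib.append({"left": POLY_A, "slot": a, "target": q})
        lib.append({"left": p, "slot": a, "target": q})
    for f in accepting:
        # An acceptor's poly-T pairs with the 3' poly-A of the input strand.
        lib.append({"left": f, "slot": POLY_A, "target": FLUOR})
    return lib


def units_paired(mol: dict, left, slot) -> int:
    """Number of length-l segments that pair (each one contributes l bp)."""
    return (mol["left"] == left) + (mol["slot"] == slot)


def fluoresces(word: str, lib: list) -> bool:
    left = POLY_A                        # the 5' flank is exposed first
    for slot in list(word) + [POLY_A]:   # codewords, then the 3' flank
        bound = [m for m in lib if units_paired(m, left, slot) == 2]
        if not bound:
            return False                 # path formation stalls
        left = bound[0]["target"]
        if left == FLUOR:
            return True
    return False


DIV3 = {(s, a): (2 * s + int(a)) % 3 for s in range(3) for a in "01"}

if __name__ == "__main__":
    lib = build_library(DIV3, 0, {0})
    for n in range(1, 10):
        print(format(n, "b"), fluoresces(format(n, "b"), lib))
```

Molecules that could pair only l bases (a transition molecule at the 5' end, or an acceptor in mid-string) never qualify, which is how the flanks enforce sequential path formation in this model.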

3.6 Advantages and Disadvantages

In addition to the reusability of molecular components and the programmable nature of the system, our scheme has several other theoretical advantages. First of all, unlike other DNA-based programs, this technique allows optical detection of computational results and does not require sequencing or gel electrophoresis, and is therefore much faster. Secondly, all the computational reactions proceed through hybridization alone, with no need for ligation. Finally, we point out that reusability and programmability imply the possibility of eventually automating both the programming and reaction phases of the technique. From a practical perspective, there are several issues concerning the technique that must be addressed prior to experimental work. First of all, the tendency of computational molecules to acquire undesirable secondary, tertiary, or quaternary structure must be evaluated. Although formation of the first two types of three-dimensional structure should be inhibited energetically at the stated reaction temperature, the effect of coiling on the behavior of hybridized aggregates has not been addressed in DNA computing. An additional concern is the structural robustness of the molecular aggregates that form to represent a computational path. Clearly, all bonds must be at least stable enough to resist denaturation by the stresses experienced during laboratory handling. A less serious disadvantage is that an entire input molecule, which may be very long, must be assembled prior to the reaction.

4 Implementation of Nondeterminism

This section explores ways to implement nondeterministic machines. The intent is not to have DNA implement a deterministic equivalent (which was shown in the previous section) but rather to understand the ability of DNA to implement nondeterminism as a native mode of computation. Nondeterminism is at the core of the difficulties that Adleman's original experiment was designed to overcome. The class NP is defined as the class of algorithmic problems whose solutions can be verified, although not necessarily found, efficiently (say, in time polynomial in the size of the input instance). From the computational point of view, given the importance of problems in the class NP that can be efficiently solved in nondeterministic mode, implementing nondeterminism by a deterministic process (at least timewise) is perhaps the most remarkable contribution of Adleman's original insight. On the other hand, nondeterminism is considered well understood in the context of finite-state machines, since the well-known subset construction [11, Theorem 2.1] produces a deterministic equivalent of a given nondeterministic fsm. It is conceivable that a better idea of the key difficulties may be gained by looking for ways to implement nondeterminism directly in DNA. An example that checks divisibility by 2 or 3 of a binary string is shown in Fig. 2; it will be used throughout this section. A nondeterministic fsm performs computations in runs. A run is a sequence of valid state transitions $q_0, q_1, \ldots, q_m$ according to the transition table $\delta$, i.e., such that $q_t \in \delta(q_{t-1}, x_t)$ for $t = 1, \ldots, m$, where $x := x_1 x_2 \cdots x_m$ is the input string and $q_0$ is the start state (incoming arrow without source). In general, there is more than one legal run for a nondeterministic machine. The nfsm accepts its input $x$ if and only if some run successfully arrives at a final state (double-circled in Fig. 2; the final states are part of the machine's specification).
Alternatively, one can view a nondeterministic fsm as an ordinary deterministic fsm with the capability of making copies of itself when making a nondeterministic move (more than one next state possible in the transition diagram), with each of the copies covering one of the choices in the move. This latter view is particularly suited for DNA implementation due to the massive parallelism of DNA computations. The method used below to implement nondeterminism is an extension of the method used in Section 2. Again, error-preventing strategies, which are critical to the prevention of unwanted hybridizations, have been discussed elsewhere [3,4,6] and will be assumed but not addressed here.
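The copying view can be simulated electronically by tracking the set of states occupied by the copies, merging identical copies as the subset construction does. The Python sketch below rests on our own reading of the state labels of Fig. 2, consistent with the transitions stated in Section 4.1 (from [0], input 0 leads to [0,3] or [0,2]): state [r,k] means "the value read so far is congruent to r modulo k", and [0,3] and [0,2] are the final states. These readings, and all names in the code, are assumptions.

```python
def nfa_delta(state: str, bit: int) -> set:
    """Transitions of the divisibility-by-2-or-3 machine (our reading of Fig. 2)."""
    if state == "[0]":                       # guess which divisor to track
        return {f"[{bit},3]", f"[{bit},2]"}
    r, k = (int(x) for x in state.strip("[]").split(","))
    if k == 3:
        return {f"[{(2 * r + bit) % 3},3]"}  # remainder mod 3, MSB first
    return {f"[{bit},2]"}                    # mod 2 only the last bit matters


FINAL = {"[0,3]", "[0,2]"}


def copies_after(bits: str) -> set:
    """States occupied by the copies of the machine; identical copies merge."""
    current = {"[0]"}
    for ch in bits:
        current = set().union(*(nfa_delta(s, int(ch)) for s in current))
    return current


def accepts(bits: str) -> bool:
    """Accept if some copy (some run) is in a final state."""
    return bool(copies_after(bits) & FINAL)
```

For example, `copies_after("100")` yields {"[1,3]", "[0,2]"}, so 4 is accepted via the modulo-2 copy even though the modulo-3 copy is not in a final state.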

Fig. 2. A nondeterministic finite-state machine for divisibility by 2 or 3.

As before, the adapters are named after the corresponding input symbols. The double-stranded part codes for the previous input symbol (GAG is the nucleotide encoding for binary input 0 and GCA encodes input 1; there is a special start molecule), and the overhang codes for current state information. The dynamic molecule representing the finite control is a double strand with a segment encoding the current state and another segment encoding the last symbol read, which led to the current state. Specifically, the states of the finite-state machine are encoded by nucleotide sequences containing discriminating bases (lower-case 4-mers), given by

state [0]: atat
state [0,3]: ttat
state [1,3]: gctg
state [2,3]: ctca
state [0,2]: tatt
state [1,2]: gtcg

The start molecule is:

GGGGAGATCatatCTTAA
CCCCTCTAG

For proper simulation of the transitions, each of the states needs to be embedded within a ring in as many molecules as there are transitions emanating from it, as follows. The encoding is similar to that of the deterministic case in Section 2, so we sketch it briefly, emphasizing the additional requirements of the new example. The main differences lie in the protocol, and they are discussed next in Section 4.1. Let us call the states as they are labelled in Fig. 2, i.e., [0], [0,3], [1,3], [2,3], [0,2], and [1,2], and the input symbols 0 and 1, so that we have 12 combinations [0]0, [0]1, [0,3]0, [0,3]1, [1,3]0, [1,3]1, [2,3]1, etc. Each input d (0 or 1) is represented by the seven molecules representing the transitions (p, d) labelled by d, where p is a state, as shown next:

[0]0:
CCCGGGGAG tataGAATTGGGCCCCTC ; atCTTAAGAG C TAGAATTCTC, and additionally
CCCGGGGAG tataGAATTGGGCCCCTC ; ttCTTAAGAG C AAGAATTCTC

Note that the second molecule is obtained by replacing GAGATCttat on the first ring by GAGTTCtatt.

[0]1:
CCCGGGGCA tataGAATTGGGCCCCGT ; cgCTTAAGAG C GCGAATTCTC, and additionally
CCCGGGGCA tataGAATTGGGCCCCGT ; tgCTTAAGAG C ACGAATTCTC

Note that these molecules are obtained by replacing GAGATCttat and GAGTTCtatt on the first two rings by GCACGCgtcg and GCATGCgctg, respectively. Likewise:

[0,3]0:
CCCGGGGAG aataGAATTGGGCCCCTC ; atCTTAAGAG C TAGAATTCTC

[0,3]1:
CCCGGGGCA aataGAATTGGGCCCCGT ; tgCTTAAGAG C ACGAATTCTC

Note that this strand is obtained by replacing GAGATCttat on the ring in [0,3]0 by GCATGCgctg. Likewise:

[1,3]0:
CCCGGGGAG cgacGAATTGGGCCCCTC ; caCTTAAGAG C GTGAATTCTC

[1,3]1:
CCCGGGGCA cgacGAATTGGGCCCCGT ; atCTTAAGAG C TAGAATTCTC

Note that this strand is obtained by replacing GAGCACctca on the ring in [1,3]0 by GCAATCttat. Similarly for the other states:

[2,3]0:
CCCGGGGAG gagtGAATTGGGCCCCTC ; tgCTTAAGAG C ACGAATTCTC

[2,3]1:
CCCGGGGCA gagtGAATTGGGCCCCGT ; caCTTAAGAG C GTGAATTCTC

[0,2]0:
CCCGGGGAG ataaGAATTGGGCCCCTC ; ttCTTAAGAG C AAGAATTCTC

[0,2]1:
CCCGGGGCA ataaGAATTGGGCCCCGT ; cgCTTAAGAG C GCGAATTCTC

[1,2]0:
CCCGGGGAG cagcGAATTGGGCCCCTC ; ttCTTAAGAG C AAGAATTCTC

[1,2]1:
CCCGGGGCA cagcGAATTGGGCCCCGT ; cgCTTAAGAG C GCGAATTCTC

4.1 Computational Protocol

Under these encodings, the machine could be implemented as described in Section 2 for the deterministic case. For example, at the beginning of the reaction, either of the two adapters of type [0]0 (leading to [0,3] or [0,2]) has a tata overhang that matches the atat in the overhang of the current state representation (the start molecule). Under similar conditions, the biochemical reactions faithfully reflect the nondeterministic transitions of the automaton. For example, if adapter 0 plus the required chemicals is added to the start molecule representing the initial state [0], molecules representing states [0,3] and [0,2] will be produced (compare with Fig. 2); likewise, on input adapter 1, molecules representing states [1,3] and [1,2] will be produced. One can check the other transitions to verify that they do indeed reflect the dynamics of the given nfsm (some are shown in Fig. 2).
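The adapter library for the nondeterministic machine can be generated and checked much as in the deterministic case. The Python sketch below is illustrative, not the authors' procedure: it assumes that state [r,k] of Fig. 2 means "value read so far congruent to r modulo k", models only the 4-mer state segments, and takes the 3-mer preceding each state on a ring (ATC, TGC, CAC, TTC, CGC) from the listing above. It confirms that each input symbol needs seven adapters and that, on input 0, the start molecule yields copies exposing the codes of [0,3] and [0,2]. All names are hypothetical.

```python
WC = str.maketrans("ACGTacgt", "TGCAtgca")   # Watson-Crick complement

CODE = {"[0]": "atat", "[0,3]": "ttat", "[1,3]": "gctg",
        "[2,3]": "ctca", "[0,2]": "tatt", "[1,2]": "gtcg"}
INPUT = {0: "GAG", 1: "GCA"}
SPACER = {"[0,3]": "ATC", "[1,3]": "TGC", "[2,3]": "CAC",
          "[0,2]": "TTC", "[1,2]": "CGC"}


def targets(state: str, bit: int) -> list:
    """Next states; [r,k] is read as 'value read so far = r mod k'."""
    if state == "[0]":
        return [f"[{bit},3]", f"[{bit},2]"]
    r, k = (int(x) for x in state.strip("[]").split(","))
    return [f"[{(2 * r + bit) % 3},3]"] if k == 3 else [f"[{bit},2]"]


# One adapter per transition (p, d) -> q: overhang pairs with p, ring carries q.
ADAPTERS = [
    {"source": p, "bit": d, "target": q,
     "overhang": CODE[p].translate(WC), "ring": INPUT[d] + SPACER[q] + CODE[q]}
    for p in CODE for d in (0, 1) for q in targets(p, d)
]


def react(exposed: set, d: int) -> set:
    """Exposed state 4-mers after one round on input d: every copy ligates with
    every adapter whose overhang is complementary (reactions run to completion)."""
    return {a["ring"][-4:] for a in ADAPTERS for e in exposed
            if a["bit"] == d and a["overhang"] == e.translate(WC)}


if __name__ == "__main__":
    print(react({"atat"}, 0))   # codes of [0,3] and [0,2]
    print(react({"atat"}, 1))   # codes of [1,3] and [1,2]
```

Because the set comprehension merges equal 4-mers, this symbolic model already discards duplicate copies; in the tube no such merging happens by itself, which is the issue taken up in Section 4.2.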

4.2 Extraction

For the extraction process, we can attempt to couple a fluorescent dye to the DNA segments representing the states and use optical extraction to detect final states at the end of the computation, as was done in [9] and above. The problem in the case of a nondeterministic computation, however, is that there may be many copies of the finite control in different states, and we want to detect whether any one of them is in a final state. It would be desirable to maintain the number of molecules representing any particular state within a certain range, so that all computation paths are taken into account in the extraction process (avoiding false negatives), and yet the molecules for certain states do not become so numerous that other current states become undetectable. The key to the success of the subset construction that determinizes a nondeterministic fsm is that whenever two nondeterministic copies of the machine find themselves in the same state, one of them can safely be discarded, since all its runs will thereafter be identical to the other's. Since the intent is to understand the ability of DNA to implement nondeterministic computation, it is desirable to have a biochemical procedure that renders the implementation efficient in the tube, in the sense that it self-regulates to produce approximately equal concentrations of the molecules representing the various states present. To solve this problem, we propose a slight modification of the state representation and the use of methylation, a process occurring in living cells and implementable in the lab, described as follows. Any living cell that produces a restriction enzyme must identify and protect its own DNA from restriction. To prevent restriction of its own DNA, the cell also produces a methylase enzyme (methyltransferase), which methylates, i.e., adds a -CH3 chemical group to, certain bases within or near the restriction sites. Nearly all restriction enzymes are thus inhibited by certain methylations of cytosine or adenine bases within their restriction sites. Common methylations include N4-methylcytosine (the nitrogen at position 4 of cytosine is methylated), C5-methylcytosine (the carbon at position 5 of cytosine is methylated), hydroxymethylcytosine, and N6-methyladenine. In our case, EcoRI recognizes G'AATTC (i.e., GAATTC/CTTAAG, but we will refer to the top single strand for simplicity) and cuts it as described above; however, it does not cut if the upper C is methylated to C5-methylcytosine. Likewise for HindIII, which recognizes and cuts A'AGCTT.
KpnI recognizes and cuts GGTAC'C, but only if the A is not methylated to N6-methyladenine. More can be found in [14], which the reader can consult as a general reference on known methylation effects for restriction enzymes. We can thus accomplish the self-regulating effect described earlier as follows. First, methylate a fraction of the input molecules being put into the reaction (exactly what fraction will depend on the reaction conditions of the implementation); second, represent states by palindromic restriction sites, so that state molecules will hybridize with other like molecules representing the same state if not properly methylated; third, before entering the next input symbol molecules, also add an appropriate amount of the appropriate restriction enzyme. These enzymes will then cut, and stop further expansion of, such copies of the fsm in the reaction. However, the methylated bases found within the fraction of the restriction sites formed will not be cut by the restriction enzyme. Thus, the constant combined presence of restriction enzymes and methylated input bases will guarantee that the number of molecules in the tube can be maintained within a bounded range for each state, so that all states will be fairly represented as the implementation proceeds.
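The effect of such regulation on population sizes can be illustrated with a toy calculation. The Python sketch below is a deliberately crude abstraction, not a chemical model: every copy is duplicated into each possible next state (ideal ligation, adapters in excess), and the restriction-plus-methylation step is stood in for by clipping each state's population at a hypothetical cap. The two-state machine `TOY`, in which any symbol may lead from either state to either state, is our own example, chosen because its number of runs doubles with every symbol.

```python
from collections import Counter

# Hypothetical two-state machine: any symbol may lead from either state to either state.
TOY = {(p, b): ("a", "b") for p in "ab" for b in "01"}


def step(counts: Counter, bit: str, delta: dict, cap=None) -> Counter:
    """Copy every molecule into each successor state; optionally clip each
    state's population at `cap` (stand-in for restriction plus methylation)."""
    nxt = Counter()
    for p, n in counts.items():
        for q in delta.get((p, bit), ()):
            nxt[q] += n
    if cap is not None:
        nxt = Counter({q: min(n, cap) for q, n in nxt.items()})
    return nxt


def populations(word: str, delta: dict, start: str, cap=None) -> Counter:
    """Number of copies of the finite control in each state after `word`."""
    counts = Counter({start: 1})
    for bit in word:
        counts = step(counts, bit, delta, cap)
    return counts


if __name__ == "__main__":
    word = "0110100101"
    print(populations(word, TOY, "a"))          # unregulated: doubles every symbol
    print(populations(word, TOY, "a", cap=4))   # regulated: at most 4 per state
```

Without regulation the total population is 2 to the power of the input length; with the cap, every reachable state stays represented while no state exceeds the bound, which is the qualitative behavior the methylation scheme aims for.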

4.3 Advantages and Disadvantages

Important advantages of this implementation and of the one in Section 2 are: (a) as the computation proceeds, the size of the molecules representing the finite-state control remains unchanged, independent of the input size; (b) the reactions proceed in a cycle that can easily be automated; (c) the deciding state is easily extracted by standard molecular biotechnology once a minimum concentration of each state is present; (d) the implementation can be made fault-tolerant: in addition to fault-tolerant encodings, appropriate concentrations of current-state molecules are ensured by built-in self-control in the reaction; (e) strands of arbitrary length can be processed with finite resources by feeding them (symbol by symbol or in segmented chunks) over long periods of time; (f) the reaction is more efficient because all reactions occur in the same tube, which eliminates the ordinary loss of material during tube transfers and extractions; (g) the molecules representing states and transitions can be standardized so as to make them independent of the specific states and inputs required by the machine. For example, the transition diagram of an fsm with binary inputs and 6 states (e.g., Fig. 2) is a subset of the fully connected digraph on the 6 states. Thus a library of molecules representing all possible input-directed transitions on this fully connected digraph could be created once and for all for a given number of inputs b and states m, analogously to the library for the ligation-free implementation. The states and transitions of an arbitrary fsm with up to m states can then be obtained by defining its input-directed transitions as an appropriate subset of the molecular library. Molecular programming can then be accomplished by selecting and assembling molecular instructions from this library, and these molecules can, in fact, be reusable components.
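The library idea in (g) can be pictured as a set of b·m·m (source, symbol, target) triples, from which a particular machine is "programmed" by choosing a subset. The Python sketch below is a purely combinatorial model under that assumption: function names are hypothetical, and the 6-state example machine is invented for illustration (it is not the machine of Fig. 2).

```python
from itertools import product


def transition_library(m: int, b: int) -> set:
    """Every input-directed transition (source, symbol, target) of the fully
    connected digraph on m states over b input symbols: b * m * m entries."""
    return set(product(range(m), range(b), range(m)))


def program_fsm(delta: dict, library: set) -> set:
    """Pick the library entries realizing delta: (state, symbol) -> target."""
    selected = {(s, a, t) for (s, a), t in delta.items()}
    missing = selected - library
    if missing:
        raise ValueError(f"transitions not in library: {sorted(missing)}")
    return selected


library = transition_library(m=6, b=2)
print(len(library))  # 72 = 2 * 6 * 6

# Hypothetical 6-state machine on binary input, used only as an example.
delta = {(s, a): (2 * s + a) % 6 for s in range(6) for a in range(2)}
program = program_fsm(delta, library)
print(len(program))  # 12 of the 72 library molecules are selected
```

Any deterministic machine with at most m states over b symbols selects exactly m·b entries, so the same library serves every such machine.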
On the other hand, for very long input strings this implementation would be very time-consuming, since the input has to be fed manually and time must be allowed for the biochemical reactions to occur (orders of magnitude slower than in electronic implementations). This implementation is therefore better suited to large state sets, or to applications in which speed is not an issue. In medical applications, for example, an organism may potentially interact much more easily with a biomolecule than with a larger electronic device inside the body. One might simply introduce the molecule representing the initial state of the finite control, appropriately isolated, into a host, and let the host supply its own inputs and support and regulate the environmental conditions for the implementation.

5 Conclusion

We give three types of implementations of the simplest nontrivial information-processing model, the finite-state machine. Each implementation has its own advantages and disadvantages. The ligation-based approach allows input of arbitrary length, but requires sequential input feed and different molecules for different machines (although the states can be standardized). The ligation-free model requires that a molecule whose size is the length of the input be fed all at once, and it is limited to strings and transitions of a certain size; on the other hand, it is programmable, reusable, and much faster. The nondeterministic implementation has the added advantage of being self-regulating, hence more error-tolerant and efficient. Self-regulation is accomplished by constantly exposing methylated and unmethylated input strands to the restriction enzymes in the tube. All three allow optical extraction. The final test of our approaches ultimately rests on their success in the lab, which has yet to be verified experimentally. Even if this success is limited, it is well known that, in the worst case, the number of states required by a nondeterministic machine may be exponentially smaller than that of its deterministic equivalents. Nondeterministic fsm's can be much more compact and semantically preferable for applications (see, for example, [11,17]). Finally, fsm implementations should shed light on the information-processing capabilities of DNA computations. For example, in practice the subset construction does not seem to blow up the number of states exponentially on average [12], although the growth is probably superpolynomial. Experimental implementations of the sort proposed here may afford an estimate of the exact growth rate of this trade-off, since large fsm's can be implemented.
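The worst-case exponential gap mentioned above can be reproduced with a standard textbook example, chosen by us rather than taken from the paper: binary strings whose n-th symbol from the end is 1 are accepted by an NFA with n+1 states, yet the subset construction reaches 2^n subsets. The Python sketch below performs the subset construction over reachable subsets only; the dictionary encoding and function names are our own assumptions.

```python
from collections import deque


def subset_construction(nfa, start, alphabet):
    """Return the reachable DFA states (frozensets of NFA states).
    nfa maps (state, symbol) -> set of successors; missing keys mean no move."""
    start_set = frozenset([start])
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            nxt = frozenset(t for s in current for t in nfa.get((s, symbol), ()))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def nth_from_last_is_one(n):
    """Example NFA with states 0..n accepting strings whose n-th symbol
    from the end is 1 (state n is accepting)."""
    nfa = {(0, "0"): {0}, (0, "1"): {0, 1}}
    for i in range(1, n):
        nfa[(i, "0")] = {i + 1}
        nfa[(i, "1")] = {i + 1}
    return nfa


for n in range(1, 6):
    dfa_states = subset_construction(nth_from_last_is_one(n), 0, "01")
    print(n + 1, len(dfa_states))  # NFA states vs. reachable DFA states
```

The printed counts grow as 2, 4, 8, 16, 32 while the NFA grows by one state per step; the average-case behavior cited from [12] is a separate, empirical question that this sketch does not address.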

References

1. L.M. Adleman (1994). Molecular Computation of Solutions to Combinatorial Problems. Science 266, 1021-1024.
2. E. Baum (1996). Running Dynamic Programming Algorithms on a DNA Computer. In [16], 141-147.
3. R. Deaton, R.C. Murphy, M. Garzon, D.R. Franceschetti, S.E. Stevens, Jr. (1995). Good Encodings for DNA-based Solutions to Combinatorial Problems. In [16], 159-171.
4. R. Deaton, M. Garzon, R.C. Murphy, J.A. Rose, D.R. Franceschetti, S.E. Stevens, Jr. Reliability and Efficiency of a DNA Computation. Physical Review Letters, in press.
5. R. Deaton, M. Garzon, R.C. Murphy, J.A. Rose, D.R. Franceschetti, S.E. Stevens, Jr. Genetic Search of Reliable Encodings for DNA-based Computation. In Late-Breaking Papers at the Genetic Programming Conference, Stanford University, July 1996, pp. 9-15.
6. R. Deaton, D.R. Franceschetti, M. Garzon, J.A. Rose, R.C. Murphy, S.E. Stevens, Jr. Information Transfer through Hybridization Reactions in DNA-based Computing. In [13].
7. M. Garzon, E. Eberbach. Dynamical Implementation of Nondeterministic Automata and Concurrent Systems. In [17].
8. M.R. Garey, D.S. Johnson (1979). Computers and Intractability. Freeman, New York.
9. M. Garzon, P. Neathery, R. Deaton, R.C. Murphy, D.R. Franceschetti, S.E. Stevens, Jr. A New Metric for DNA Computing. In [13].
10. F. Guarnieri, M. Fliss, C. Bancroft (1996). Making DNA Add. Science 273, 220-223.
11. J.E. Hopcroft, J.D. Ullman (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Reading, MA.
12. J.H. Johnson, D. Wood (1997). Instruction Computation in Subset Construction. In [17], 1-9.
13. J.R. Koza, K. Deb, M. Dorigo, D.B. Fogel, M. Garzon, H. Iba, R.L. Riolo (eds.) (1997). Proceedings of the Second Annual Genetic Programming Conference, Stanford University. Morgan Kaufmann, San Francisco, CA.
14. M. Nelson, E. Raschke, M. McClelland (1993). Effect of Site-Specific Methylation on Restriction Endonucleases and DNA Modification Methyltransferases. Nucleic Acids Research 21(13), 3139.
15. J.S. Oliver (1996). Computation with DNA-Matrix Multiplication. In [16], 236-248.
16. E. Baum, D. Boneh, P. Kaplan, R. Lipton, J. Reif, N. Seeman (eds.) (1997). Second Annual Meeting on DNA Based Computers, DIMACS Workshop, Princeton University, 1996. To be published in the DIMACS series of the American Mathematical Society.
17. D. Raymond, D. Wood (eds.) (1997). Automata Implementation. Proc. First Int. Workshop on Implementing Automata, WIA'96, London, Ontario, 1996. Lecture Notes in Computer Science 1260, Springer-Verlag.
18. J.A. Rose, Y. Gao, M. Garzon, R.C. Murphy. DNA Implementation of Finite-State Machines. In [13].
19. J.A. Rose, Y. Gao, M. Garzon, R.C. Murphy. DNA Implementation of Nondeterminism. In Proc. [22].
20. L. Stryer (1995). Biochemistry. Freeman & Co.
21. J.D. Watson, N.H. Hopkins, J.W. Roberts, J.A. Steitz, A.M. Weiner (1987). Molecular Biology of the Gene, fourth edition. The Benjamin/Cummings Publishing Co., Inc., Menlo Park, CA.
22. D. Wood, R. Lipton, J. Reif, N. Seeman (eds.) (1997). Third Annual Meeting on DNA Based Computers, DIMACS Workshop, University of Pennsylvania, June 1997.

[Figure 3: (a) starter molecule; (b) transition molecule; (c) acceptor molecule; (d) input molecule. Hybridization order: starter + transition + ... + acceptor. Labeled components include noncoding regions, source and target co-states, input co-symbols, the final state, poly-deoxyribothymidine, poly-deoxyribofluoracil and poly-deoxyriboadenine segments, and input symbols #1, #2, ... separated by spacers.]

Fig. 3. Molecules involved in the ligation-free implementation of a finite-state machine. (For illustration, circular strands are drawn as squares; they actually assume a similar shape upon partial hybridization.) These molecules hybridize to the template input molecule in the order shown by the + signs (the transition molecule occurs multiple times).

From state 0: On input 0, add adapter 0, ligate with T4 DNA ligase, cut with SmaI, couple with beads, and cut with EcoRI. The reaction amounts to staying in state 0 (state molecules are given as top strand / bottom strand, before → after):
GGGGAGATCttatCTTAA / CCCCTCTAG → GGGGAGATCttatCTTAA / CCCCTCTAG

Likewise, on input 1, go to state 1:
GGGGAGATCttatCTTAA / CCCCTCTAG → GGGGCATGCgctgCTTAA / CCCCGTACG

From state 1: On input 0, go to state 2:
GGGGCATGCgctgCTTAA / CCCCGTACG → GGGGAGCACctcaCTTAA / CCCCTCGTG

On input 1, go to state 0:
GGGGCATGCgctgCTTAA / CCCCGTACG → GGGGCAATCttatCTTAA / CCCCGTTAG

From state 2: On input 0, go to state 1:
GGGGAGCACctcaCTTAA / CCCCTCGTG → GGGGAGTGCgctgCTTAA / CCCCTCACG

On input 1, the reaction amounts to staying in state 2:
GGGGAGCACctcaCTTAA / CCCCTCGTG → GGGGCACACctcaCTTAA / CCCCGTGTG

Table 1. The reactions reflect the transition table of the dfsm.
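The six reactions in Table 1 define a 3-state transition function on binary input. As a side observation of ours (not stated in the paper), every entry agrees with r → (2r + b) mod 3, so, assuming state 0 is the start state and bits are fed most significant first, the machine tracks the residue modulo 3 of the binary number read so far. The Python sketch below checks this.

```python
# Transitions read off Table 1: (state, input bit) -> next state.
TABLE_1 = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 0, (2, 0): 1, (2, 1): 2}

# Every entry agrees with r -> (2r + b) mod 3.
assert all(TABLE_1[(r, b)] == (2 * r + b) % 3 for r in range(3) for b in range(2))


def run(word: str, start: int = 0) -> int:
    """Feed a binary string, most significant bit first (start state assumed 0)."""
    state = start
    for ch in word:
        state = TABLE_1[(state, int(ch))]
    return state


for word in ["101", "110", "1001", "11111"]:
    # word, its value mod 3, and the state the machine ends in
    print(word, int(word, 2) % 3, run(word))
```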

From state [0]: The reaction takes the automaton nondeterministically to states [0,3] and [0,2] (state molecules are given as top strand / bottom strand, before → after):
GGGGAGATCatatCTTAA / CCCCTCTAG → GGGGAGATCttatCTTAA / CCCCTCTAG   (state [0,3])
GGGGAGATCatatCTTAA / CCCCTCTAG → GGGGAGTTCtattCTTAA / CCCCTCAAG   (state [0,2])

Likewise, on input 1, go to states [1,3] and [1,2]:
GGGGAGATCatatCTTAA / CCCCTCTAG → GGGGCATGCgctgCTTAA / CCCCGTACG   (state [1,3])
GGGGAGATCatatCTTAA / CCCCTCTAG → GGGGCACGCgtcgCTTAA / CCCCGTGCG   (state [1,2])

From state [0,3]: On input 0, stay in state [0,3]:
GGGGAGATCttatCTTAA / CCCCTCTAG → GGGGAGATCttatCTTAA / CCCCTCTAG

Likewise, on input 1, go to state [1,3]:
GGGGAGATCttatCTTAA / CCCCTCTAG → GGGGCATGCgctgCTTAA / CCCCGTACG

From state [1,3]: On input 0, go to state [2,3]:
GGGGCATGCgctgCTTAA / CCCCGTACG → GGGGAGCACctcaCTTAA / CCCCTCGTG

Table 2. All the reactions reflect the transition table of the nfsm (only a sample is shown). The protocol is, on input d, to add the seven adapters p_d (some methylated), add T4 DNA ligase, cut with SmaI, couple with beads, and cut with EcoRI.
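Advantage (b) in Section 4.3 notes that the reactions form a cycle that can be automated. As a minimal sketch of that idea, the Python function below expands an input string into the per-symbol step list given in the Table 2 caption; the function name and the plain-text step strings are hypothetical, and details such as incubation times and washes are deliberately left out because the caption does not specify them.

```python
def protocol(word: str) -> list:
    """Expand an input string into the per-symbol reaction cycle of Table 2."""
    steps = []
    for d in word:
        steps += [
            f"add the seven adapters p_{d} (some methylated)",
            "add T4 DNA ligase",
            "cut with SmaI",
            "couple with beads",
            "cut with EcoRI",
        ]
    return steps


for i, step in enumerate(protocol("01"), start=1):
    print(i, step)
```

Each input symbol contributes the same five steps, which is why the schedule for an input of any length can be generated mechanically.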