The Minimum DNA Computation Model and Its Computational Power
Animesh Ray†
Mitsunori Ogihara
Technical Report, TR-672 Department of Computer Science University of Rochester Rochester, NY 14627 December, 1997
Abstract
This paper studies what is seemingly the smallest DNA computational model. The model takes as its computational basis merge, detect, synthesize, anneal, and length-specific separation, but does not assume sequence-specific separation, unlike many other DNA computational models. Uncertainty occurring in some of the operations is taken into consideration, and decision by computation under the model is defined in terms of robustness. This paper shows tight upper bounds on the power of this computational model in terms of circuits. For every $k \ge 1$, the class of languages robustly accepted by programs under this model in $O(\log^k n)$ steps using polynomially many DNA molecules lies between $\mathrm{NC}^k$ and $\mathrm{SAC}^{k+1}$.
1 Introduction
DNA computing is an emerging field bridging the gap between computer science and biochemistry, where computation is performed by accessing information stored in DNA strands with biochemical operations. The potential of DNA as a computing device lies in the fact that DNA has gigantic memory capacity (under a reasonable concentration, a liter of DNA solution can store approximately $10^{22}$ bits of information), and in the fact that biochemical operations are massively parallel. This new paradigm of computation was proposed by Adleman in 1994 [1]. He presented an algorithm based on well-established recombinant DNA techniques for the Hamilton Path Problem, a well-known NP-complete problem. Adleman also ran a laboratory experiment and successfully solved a small test instance. The basic idea in Adleman's approach is to trade the volume of DNA for nondeterminism. With his algorithm, one first synthesizes all paths, each encoded as a DNA strand, next separates the DNA strands encoding Hamilton paths, and finally answers the existence problem of Hamilton paths by testing whether any DNA strands have been separated. This approach can be applied to other NP problems. Lipton generalized Adleman's idea and showed that SAT, the standard NP-complete problem which logically characterizes NP, can be solved using recombinant DNA techniques [9].

This work is supported in part by NSF CAREER Award CCR-9701911 and by DARPA/NSF Grant 9725021.
† Department of Biology, University of Rochester, Rochester, NY 14627. Supported in part by NSF Grant MCB9630402.
The papers by Adleman and Lipton might raise the hope that one could easily solve NP-complete problems with methods in biochemistry. However, there are two drawbacks in this approach. First, biochemical operations are slow and prone to errors. Second, as pointed out by Hartmanis [8], the total volume of DNA necessary for computation increases exponentially, and thus very rapidly reaches the practical limit, the Avogadro number $6.02 \times 10^{23}$. The total volume of DNA necessary for computation with Adleman's method reaches the limit at an instance size of around 60 (and the number is around 70 in Lipton's algorithm), and such small instances can be rapidly solved by conventional computers. This analysis leaves us with the question of whether DNA computing will take us to a place that we have not been before, and this is the theme of the infant area of DNA computing. One approach to the question is to study lower and upper bounds on the power of DNA computational models in terms of traditional resource-bounded complexity. Much work has been done in this regard. In particular, variations of the models by Adleman and Lipton, in which computations take place by applying operations to an exponential-size set of strands created beforehand, are well studied in the literature [2, 4, 5, 6, 7, 12, 13, 18]. These papers show that the power of polynomial-time computation in such models will become, depending on the choice of permissible biochemical operations (realistic or unrealistic), (a) PBP (the polynomial-size branching programs) [18], (b) p2 [6], (c) p3 [13], (d) PH [5], or (e) PSPACE [4, 12]. Of all the variations that have been considered, the smallest is the model studied in (a), namely, the Restricted Model proposed by Adleman [2], a library-based model assuming only Merge, Detect, and Separate. One may ask whether there are smaller DNA computational models, i.e., whether some of these three operations can be eliminated or replaced by weaker operations.
Among those three, Detect is the weakest possible method for reading out information from DNA strands, and Merge is the only possible method for combining information. Thus, Separate is the only possible candidate for elimination. So, we ask: How much weaker would the power of computation become were it not for sequence-specific separation? To understand the power of sequence-specific separation, this paper presents the Minimum Model, which assumes seemingly the smallest set of feasible biochemical operations. The model assumes Merge and Detect, and replaces Separate by Synthesize (synthesis of DNA strands), Anneal (annealing of DNA strands), and Length (separation of DNA strands based only on their lengths but not their sequence). The last three are operations that are assumed implicitly in many of the previously studied models. We split a program under the Minimum Model into three phases. The first phase deals with preparation. This phase consists of Synthesize and Merge steps, in which all necessary strands are synthesized and put together into test tubes. The second phase deals with reaction. This phase consists of Merge, Anneal, and Length steps. The last phase deals with output and consists of a single Detect step. The complexity of programs is measured by time, defined as the number of steps in the second phase. We show for every $k \ge 1$ that the class of languages decided by polynomial-size programs under the Minimum Model in time $O(\log^k n)$ lies between $\mathrm{NC}^k$ and $\mathrm{SAC}^k$. This improves the result shown in [11], that DNA computation can simulate $\mathrm{SAC}^{k-1}$ in time $O(\log^k n)$, in four respects. First, our model assumes fewer basic operations. Second, we show that not only $\mathrm{SAC}^{k-1}$, but also $\mathrm{AC}^{k-1}$, or even $\mathrm{NC}^k$, can be simulated in $O(\log^k n)$ steps. Third, we treat more carefully the uncertainty involved in some biochemical operations. Finally, we prove a tight upper bound. Such a result is not shown in [11].
Figure 1: $\langle u, v \rangle$ (labels: strands $u$ and $v$ running 5'- to -3', paired with the linker $\lambda(u,v)$ running 3'- to -5')
2 The Minimum Model
2.1 The liquid-phase DNA chemistry
DNA is a charged copolymer of a repeating sugar-phosphate chain of indefinite length to which are attached one of four organic bases (A, T, G, and C) [17]. Each sugar-phosphate-base unit is a nucleotide. The chain is chemically and geometrically polar, with the so-called 5'-end and the 3'-end. A single polymer chain can have any sequence of bases from the 5'-end to the 3'-end. Each base has one of four bonding surfaces, and the bonding surface of A is complementary to that of T, and that of G to that of C. Because of this complementarity rule, a DNA chain can pair with another chain only when their sequences of bases are mutually complementary. Also, the chains must pair head-to-tail, i.e., antiparallel to each other. When collections of complementary single strands are mixed, heated, and then slowly cooled, the complementary strands find one another and double-stranded DNA is formed (the process of annealing) following second-order reaction kinetics. This is DNA hybridization [16].
2.2 The encoding scheme
As in many other papers, programs under the Minimum Model employ an encoding scheme that depends only on the input size. The single strands that appear during the computation are taken from $S_0$, called the base strands, and from $L$, called the linker strands. The set $S_0$ consists of strands of base length $\ell$ for some $\ell$, and is split into two groups, the left strands $S_L$ and the right strands $S_R$. The set $L$ consists of all linker strands $\lambda(u, v)$ for $u \in S_L$ and $v \in S_R$, where $\lambda(u, v)$ is the strand that is complementary and antiparallel to $uv$. Thus, all the strands in $L$ are of base length $2\ell$. When the linker $\lambda(u, v)$ hybridizes with $u$ and $v$, it forms a complete double strand with the $uv$-part disconnected between the 3'-end of $u$ and the 5'-end of $v$ (see Figure 1). We use $\langle u, v \rangle$ to denote this double-stranded triplet composed of $u$, $v$, and $\lambda(u, v)$. The patterns in $S_0$ are designed so that each base strand $u$ anneals only to a linker strand built from $u$ and some other strand $v$. In order to satisfy this condition, we demand that the patterns in $S_0$ be chosen so that no two distinct patterns share a common sub-pattern of length $\ell/K$ for some constant $K > 1$. There is a systematic way to construct such a set of patterns, where $\ell$ can be made $O(\log \|S_0\|)$ [3].
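The encoding rules above can be illustrated with a small sketch. It assumes strands are written as 5'-to-3' strings over A, T, G, C; the function names `linker`, `share_subpattern`, and `is_valid_base_set` are hypothetical and not from the paper, and the check uses the integer part of $\ell/K$ as the forbidden sub-pattern length. It does not reproduce the systematic construction of [3]; it only tests the stated condition on a given set.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def linker(u: str, v: str) -> str:
    """Return lambda(u, v), written 5' to 3': the reverse complement of u followed by v."""
    return "".join(COMPLEMENT[base] for base in reversed(u + v))


def share_subpattern(p: str, q: str, k: int) -> bool:
    """Return True if p and q contain a common substring of length k."""
    windows = {p[i:i + k] for i in range(len(p) - k + 1)}
    return any(q[i:i + k] in windows for i in range(len(q) - k + 1))


def is_valid_base_set(patterns: list[str], K: int) -> bool:
    """Check that no two patterns share a substring of length ell // K (assumed rounding)."""
    ell = len(patterns[0])
    if any(len(p) != ell for p in patterns):
        return False
    k = max(1, ell // K)
    return not any(
        share_subpattern(patterns[i], patterns[j], k)
        for i in range(len(patterns))
        for j in range(i + 1, len(patterns))
    )
```

For instance, `linker("AAAC", "GGTA")` gives a strand of length $2\ell = 8$, and `is_valid_base_set(["AACC", "CCAA"], 2)` is rejected because both patterns contain `AA`.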
2.3 The basic operations
Merge. In this operation, the contents of two test tubes are mixed into one test tube. The merge operation is done by pouring the contents of two input test tubes into an output test tube. In order to minimize the loss of DNA molecules by pouring, the input test tubes may be repeatedly "washed." Here "washing" means adding solvent to dissolve the remaining strands and then pouring the solution into the output test tube. A realistic assumption here is that, even with "wash-out," a very small fraction of the strands will be lost. We will use $\epsilon_{mer}$, $0 < \epsilon_{mer} < 1$, to denote the upper bound on the fraction of DNA strands that may be lost in a Merge step.
Figure 2: Action of mung bean nuclease (labels: "degraded", "$2\ell$")
Detect. The detect operation can be implemented by radioactive labeling (usually with the isotope $^{32}$P) or fluorescent labeling (by conjugation to a fluorescent probe) of DNA strands, and by detecting the radioactivity or the fluorescent light (see [11]). This operation can be assumed to be always accurate, provided that there are at least $10^{-15}$ moles of DNA strands per liter.
Synthesize. Synthesis of strands in $S_L \cup S_R \cup L$ is performed with an automated DNA synthesizer. In order to synthesize a certain number, say $M$, of copies of a single-stranded pattern $p$, one sequentially lines up $M$ molecules of the bases of $p$ and turns on the synthesizer; the synthesizer then sequentially connects the lined-up molecules. Since the quantities of DNA molecules supplied at the bases are not perfectly controllable, the actual volume of the synthesized DNA strands may differ from the desired amount $M$. A realistic assumption here is that volume control is possible only within a small factor: there is a small constant $\epsilon_{syn}$, $0 < \epsilon_{syn} < 1$, such that when wishing to synthesize $M$ copies of a pattern $p$, one may generate $M'$ copies of $p$ in the range

$(1 + \epsilon_{syn})^{-1} M \le M' \le (1 + \epsilon_{syn}) M.$

Also, we assume that there is a lower bound $\mathrm{MIN}_{syn}$ such that the synthesis operation cannot be conducted for the purpose of synthesizing fewer than $\mathrm{MIN}_{syn}$ molecules of a strand. Furthermore, due to the inaccuracy of the quantities of the molecules at the bases, strands shorter than $p$ may be synthesized as residuals. Such residuals can be eliminated by the use of Length, which will be described below.
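The synthesis assumption amounts to a two-sided multiplicative tolerance together with a minimum batch size. A minimal sketch follows, assuming the tolerance constant and the minimum are passed as plain numbers; the function name `synthesis_range` and its parameter names are illustrative only.

```python
def synthesis_range(M: float, eps_syn: float, min_syn: float) -> tuple[float, float]:
    """Return (lowest, highest) possible yield M' when M copies are requested.

    Implements (1 + eps_syn)^-1 * M <= M' <= (1 + eps_syn) * M and rejects
    requests below the assumed minimum synthesis amount min_syn.
    """
    if not 0 < eps_syn < 1:
        raise ValueError("eps_syn must lie strictly between 0 and 1")
    if M < min_syn:
        raise ValueError("cannot synthesize fewer than min_syn molecules")
    return M / (1 + eps_syn), M * (1 + eps_syn)
```

With `eps_syn = 0.25`, a request for 1000 copies may yield anywhere between 800 and 1250 copies, which is why later amplitude arguments carry explicit factors of $(1 + \epsilon_{syn})$.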
Anneal. DNA-DNA hybridization is not exhaustive; rather, it reaches an equilibrium, where the ratio of free molecules to hybridized ones depends on the base sequence, the temperature, and the concentration of some salts [16]. In addition, short stretches of sequence complementarity between two overall noncomplementary molecules may lead to "imperfect hybrids." We need to eliminate such imperfect hybrids from the test tube. To accomplish that, right after annealing we let mung bean nuclease [10] act to degrade single-stranded DNA molecules into mononucleotides. Due to the error-correcting design of the base strands $S_0$, every double strand with incorrect hybridization has single-stranded parts. Mung bean nuclease breaks strands with incorrect hybridization into completely double-stranded pieces, thereby making them shorter than $2\ell$ bases (see Figure 2). On the other hand, any completely double-stranded triplet is not affected by the use of mung bean nuclease, so its length $2\ell$ will be maintained. The use of mung bean nuclease necessitates elimination of double strands of length less than $2\ell$. This will be accomplished by Length. For $S_0$ of size $10^{22}$, without error-correcting coding $\ell$ becomes approximately 70. It is possible that many of these $10^{22}$ sequences will have 5 or more consecutive bases complementary to a portion of a wrong $\lambda(u, v)$. In these cases, the imperfect hybrids will be cleaved by the mung bean nuclease into products of length less than $\ell$.
Length. This operation is performed by running the contents of a given test tube either on a denaturing gel or on a non-denaturing gel. DNA strands move faster in the gel the shorter they are, so we can classify the strands based on their lengths. The operation is highly accurate, and even a one-base difference can be detected [15]. Recent technical advances have made this technique robust and rapid [14]. In order to accomplish the separation of length-$2\ell$ double strands mentioned above, we run the contents of the test tube on a non-denaturing gel. Other than that, Length will be performed to separate either all length-$\ell$ strands or all length-$2\ell$ strands on a denaturing gel. In the simulation method, we need only separation of length-$\ell$ strands. So, we let Anneal consist of (i) annealing with elimination of imperfect double strands, (ii) separation of length-$2\ell$ complete double strands on a non-denaturing gel, and (iii) separation of the length-$\ell$ components of the length-$2\ell$ strands from (ii). If one of the two strands is affinity labeled, then separation can be achieved by denaturing and affinity purification. Here we assume that there exists a fixed constant $\epsilon_{ann}$, $0 < \epsilon_{ann} < 1$, such that for each legitimate triplet combination up to an $\epsilon_{ann}$ fraction of the molecules will be lost during the annealing step.
2.4 Programs under the Minimum Model
Programs under the Minimum Model are specified by the base strands $S_0$ and the instructions $I$, where $I$ is a sequence of permissible operations on indexed test tubes. The instructions are divided into three phases. The first phase consists of Synthesize and Merge, where the strands necessary for computation are prepared in test tubes. Here each Synthesize may depend on a single input bit. The total number of steps in this phase should be bounded by some polynomial in the input size. The second phase consists of Anneal, Merge, and Length. The third phase is just a single Detect step, and the output of that operation becomes the output of the program. Now we state what we mean by "computing" with programs under the Minimum Model. As mentioned earlier, there is uncertainty involved in the operations. We demand that programs under the Minimum Model be robust, in the sense that they output the same value on the same input no matter how each operation is actually done within the predetermined degrees of uncertainty.
Definition 0.1 Let $L$ be a language over $\{0, 1\}$ and $P$ be a Minimum Model program that takes as input $n$-bit strings. We say that $P$ decides $L^{(n)}$, the length-$n$ portion of language $L$, if for all $x$ of length $n$,
if $x \in L$, then $P$ robustly accepts $x$, and
if $x \notin L$, then $P$ robustly rejects $x$.
Definition 0.2 A language $L$ over $\{0, 1\}$ is decided by Minimum Model programs if there exists a family $F = \{P_n\}_{n \ge 1}$ of programs under the Minimum Model such that for every $n \ge 1$, $P_n$ decides $L^{(n)}$.
2.5 Complexity measures
Let $P = (S_0, I)$ be a program under the Minimum Model. We introduce a number of quantities that can be used to measure the complexity of $P$:
1. the size, Size($P$), defined to be the total length of $I$;
2. the coding-length, Code-Length($P$), defined to be $\ell$;
3. the computation time, Time($P$), defined to be the number of computation steps in the second phase of $P$; and
4. the volume of DNA, Volume($P$), defined to be the total number of DNA molecules used in the program.
In this paper, we will be concerned only with polynomial-size, $O(\log n)$-coding-length, polynomial-volume programs. We will use $\mathrm{MM}(T(n))$ to denote the class of all languages decided by a family of such Minimum Model programs with computation time bounded by $T(n)$.
2.6 Boolean Circuits
We compare the computational power of Boolean circuits with that of the Minimum Model. A Boolean circuit of $n$ inputs is a directed acyclic graph with labeled nodes. There are exactly $2n$ nodes with indegree 0. These nodes are called input gates and are labeled $x_1, \dots, x_n, \overline{x_1}, \dots, \overline{x_n}$. Other nodes are labeled by one of AND and OR. A node labeled AND computes the conjunction of its input signals, while one labeled OR computes the disjunction of its input signals. These gates are stratified and each level consists of the same kind of gates; that is, there are levels of OR-gates and levels of AND-gates. There is a unique node with outdegree 0. This gate is called the output gate. The models of Boolean circuits are defined in terms of the gate fan-ins (the maximum number of inputs that AND-gates and OR-gates can take). In bounded-fan-in circuits, both AND-gates and OR-gates may have at most two inputs. In unbounded-fan-in circuits, both AND-gates and OR-gates may have an arbitrary number of inputs. In semi-unbounded-fan-in circuits, OR-gates can have an arbitrary number of inputs while AND-gates have at most two inputs. There are two complexity measures for Boolean circuits. The size of a circuit $C$, denoted by Size($C$), is the number of gates in it. The depth of $C$, denoted by Depth($C$), is the length of the paths from input gates to the output gate. Also, PSize denotes the collection of all languages decided by a family of polynomial-size circuits. For a natural number $k$, $\mathrm{NC}^k$ (respectively, $\mathrm{AC}^k$ and $\mathrm{SAC}^k$) is the collection of all languages decided by a family $\{C_n\}_{n \ge 1}$ of polynomial-size, $O(\log^k n)$-depth bounded-fan-in circuits (respectively, unbounded-fan-in circuits and semi-unbounded-fan-in circuits).
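The circuit definitions above can be made concrete with a short sketch. It assumes a circuit is given as a dictionary from gate names to `Gate` records, with input literals named `x1`, `~x1`, and so on; these names, and the functions `evaluate`, `depth`, and `fan_in_class`, are illustrative choices rather than notation from the paper. The sketch does not enforce stratification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    kind: str                 # "AND" or "OR"
    inputs: tuple[str, ...]   # names of input literals or of other gates


def evaluate(gates: dict[str, Gate], output: str, x: list[bool]) -> bool:
    """Evaluate the circuit on input bits x (x[0] is x1)."""
    values: dict[str, bool] = {}
    for i, bit in enumerate(x, start=1):
        values[f"x{i}"] = bit        # positive literal
        values[f"~x{i}"] = not bit   # negated literal (the second set of n input gates)

    def value(name: str) -> bool:
        if name not in values:
            gate = gates[name]
            ins = [value(i) for i in gate.inputs]
            values[name] = all(ins) if gate.kind == "AND" else any(ins)
        return values[name]

    return value(output)


def depth(gates: dict[str, Gate], name: str) -> int:
    """Length of the longest path from an input gate to the named gate."""
    if name not in gates:
        return 0
    return 1 + max(depth(gates, i) for i in gates[name].inputs)


def fan_in_class(gates: dict[str, Gate]) -> str:
    """Most restrictive fan-in model that this single circuit fits."""
    max_and = max((len(g.inputs) for g in gates.values() if g.kind == "AND"), default=0)
    max_or = max((len(g.inputs) for g in gates.values() if g.kind == "OR"), default=0)
    if max_and <= 2 and max_or <= 2:
        return "bounded"
    if max_and <= 2:
        return "semi-unbounded"
    return "unbounded"
```

A circuit whose AND-gates take two inputs but whose output OR-gate takes three is classified as semi-unbounded, which is the kind of circuit counted by $\mathrm{SAC}^k$ when the depth is $O(\log^k n)$.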
3 The Power of the Minimum Model
Theorem 1 For every $d \ge 1$, $\mathrm{NC}^d \subseteq \mathrm{MM}(O(\log^d n))$.
Proof. Let $d \ge 1$, and let a language $A$ be accepted by $\{C_n\}_{n \ge 1}$, a family of polynomial-size, $O(\log^d n)$-depth bounded-fan-in circuits. For simplicity, let $n$ be fixed. We will construct a Minimum Model program $P_n$ that decides $A^{(n)}$. Let $D = \mathrm{Depth}(C_n)$. Here we may assume that $D$ is a multiple of $\log n$. So, let $D' = D / \log n$. Convert $C_n$ to an equivalent circuit $C'_n$ as follows: Group consecutive $\log n$ levels of $C_n$ together. Then convert the $\log n$-depth sub-circuits into trees. More precisely, $C'_n$ is composed of sub-circuits $G[n, 0], G[n, 1], \dots, G[n, D']$ such that
$G[n, 0]$ is the input gates of $C_n$,
for each $i$, $1 \le i \le D'$, $G[n, i]$ is a collection of full binary trees of depth precisely $\log n$ with alternating levels of AND-gates and OR-gates, and
for each $i$, $1 \le i \le D'$, $G[n, i-1]$ and $G[n, i]$ are connected by distributing the output gates of $G[n, i-1]$ among the input gates of $G[n, i]$.
With this modification, the circuit $C'_n$ has depth $D + D'$ and has polynomially many gates. Let $P$ be the size of $C'_n$. We will describe below how to simulate each $G[n, l]$ in $O(\log n)$ steps. In our simulation, a gate $g$ will be represented by two strands, $L[g] \in S_L$ and $R[g] \in S_R$. We will use two parameters $M_{low}$ and $M_{high}$. The simulation is done so that for each $l$, $0 \le l \le D'$, there is a test tube $T_l$ such that for every $n$-bit input $x$ to $C_n$, when the simulation of $G[n, l]$ has been done, for every output gate $g$ of $G[n, l]$, the following (~) holds:
(~) If $g(x) =$ false, then both $L[g]$ and $R[g]$ are absent in $T_l$; and if $g(x) =$ true, then the amplitudes of $L[g]$ and $R[g]$ in $T_l$ are between $M_{low}$ and $M_{high}$.
Note that (~) can be easily satisfied for $G[n, 0]$. For each gate $g$ in $G[n, 0]$ that outputs true, we synthesize $L[g]$ and $R[g]$ with $M_{low} (1 - \epsilon_{mer})^{-1} (1 + \epsilon_{syn})$ as the amplitude argument for the synthesize operation. We merge the synthesized strands into $T_0$. Then for each $g$ in $G[n, 0]$, if $g$ outputs false, then the amplitudes of $L[g]$ and $R[g]$ in $T_0$ are both 0, and if it outputs true, then both amplitudes are between $M_{low}$ and $M_{low} (1 - \epsilon_{mer})^{-1} (1 + \epsilon_{syn})^2$. Here the upper bound has to be at most $M_{high}$. So,
$M_{high} \ge M_{low} (1 - \epsilon_{mer})^{-1} (1 + \epsilon_{syn})^2.$ (1)

Also, we have

$M_{low} (1 - \epsilon_{mer})^{-1} (1 + \epsilon_{syn})^2 \ge \mathrm{MIN}_{syn}.$ (2)

On the other hand, the simulation of $G[n, l]$, $1 \le l \le D'$, proceeds in three stages:
Stage 1: Generate strands corresponding to the inputs of $G[n, l]$. This is done by "distributing" the output strands of $G[n, l-1]$ contained in $T_{l-1}$ among the input gates of $G[n, l]$.
Stage 2: Simulate $G[n, l]$ in a level-by-level fashion.
Stage 3: Concurrently for each output gate $g$ of $G[n, l]$, amplify the amplitudes of $L[g]$ and $R[g]$ to $M_{low}$.
Next we describe how each stage is done.
Stage 1: Distributing output signals
Let $g$ be an input gate of $G[n, l]$ with its unique input from an output gate $h$ of $G[n, l-1]$. We describe how to simulate this specific gate $g$, but the method can be applied concurrently to all the input gates of $G[n, l]$. In the first phase of the program, we prepare a left strand $L[g]$, a right strand $R[g]$, and linker strands $\lambda(L[g], R[h])$ and $\lambda(L[h], R[g])$. The synthesis amplitudes are set to

$M_{low} P^{-1} (1 + \epsilon_{syn})^{-1}$

for all four synthesis steps. These strands are merged into a test tube $U$. The merge may decrease the amplitudes of these strands to

$M_{low} P^{-1} (1 + \epsilon_{syn})^{-2} (1 - \epsilon_{mer}).$

Now in the actual simulation, we first merge $U$ into $T_{l-1}$. The amplitudes of the above four strands may be decreased to as small as

$M_{low} P^{-1} (1 - \epsilon_{mer})^{2} (1 + \epsilon_{syn})^{-2}.$

Next, we run anneal on $T_{l-1}$ to separate into $Y(l, 0)$ the $\ell$-base components of the $2\ell$-base complete double strands in $T_{l-1}$. Suppose $h$ outputs true. Then both $L[h]$ and $R[h]$ are present in $T_{l-1}$, so $\langle L[g], R[h] \rangle$ and $\langle L[h], R[g] \rangle$ are formed in the annealing step. The content is then treated with nuclease to destroy all molecules that are not of the form $\langle L[g], R[h] \rangle$ or $\langle L[h], R[g] \rangle$ for some $g, h$. These triple compounds are first separated by the separation of $2\ell$-base strands, and then their $\ell$-base components, i.e., $L[g]$, $L[h]$, $R[g]$, $R[h]$, are retrieved into $Y(l, 0)$. The amplitudes of these strands will have a lower bound of

$LB = M_{low} P^{-1} (1 - \epsilon_{mer})^{2} (1 + \epsilon_{syn})^{-2} (1 - \epsilon_{ann})^{2}$

and an upper bound of

$UB = M_{low} P^{-1}.$

This stage contributes three to the total number of simulation steps.
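The chain of worst-case losses in Stage 1 can be traced numerically. The sketch below is only an illustration of the bounds stated above; the function name `stage1_bounds` and the parameter names `eps_mer`, `eps_syn`, `eps_ann` (standing for the merge, synthesis, and annealing constants) are assumptions, and the final factor $(1 - \epsilon_{ann})^2$ is taken from the stated formula for $LB$ rather than derived.

```python
def stage1_bounds(m_low: float, P: float, eps_mer: float,
                  eps_syn: float, eps_ann: float) -> tuple[float, float]:
    """Return (LB, UB) for the amplitudes retrieved into Y(l, 0)."""
    argument = m_low / P / (1 + eps_syn)          # amplitude argument given to Synthesize
    ub = argument * (1 + eps_syn)                 # largest possible yield: M_low / P
    worst = argument / (1 + eps_syn)              # smallest possible yield
    worst *= (1 - eps_mer)                        # merge of the strands into U
    worst *= (1 - eps_mer)                        # merge of U into T_{l-1}
    lb = worst * (1 - eps_ann) ** 2               # annealing loss as in the stated LB
    return lb, ub
```

For example, with `m_low = 10000`, `P = 100`, `eps_mer = 0.5`, `eps_syn = 0.25`, and `eps_ann = 0.5` (deliberately large constants chosen only to make the arithmetic easy), the upper bound is 100 and the lower bound is 4.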
Stage 2: Simulating G[n; l]
The simulation in this stage proceeds level by level. Let $k$, $1 \le k \le \log n$, be an integer, and suppose that we are to simulate the gates at the $k$-th level of $G[n, l]$. We demand for every $k$, $0 \le k \le \log n$, and every gate $g$ at the $k$-th level of $G[n, l]$, that the condition (~) is satisfied with respect to a test tube $Y(l, k)$ with $LB (1 - \epsilon_{ann})^{4k}$ as the lower bound and $UB$ as the upper bound on the amplitudes. The method we provide below is for simulating a single gate, but it can run concurrently on all the gates at the $k$-th level. The simulation method differs depending on whether the level-$k$ gates are OR-gates or AND-gates.

Suppose that the level-$k$ gates are OR-gates. Let $g$ be an OR-gate taking inputs from level-$(k-1)$ gates $h$ and $h'$. In the first phase of the program, we prepare a left strand $L[g]$, a right strand $R[g]$, and linker strands $\lambda(L[g], R[h])$, $\lambda(L[h], R[g])$, $\lambda(L[g], R[h'])$, and $\lambda(L[h'], R[g])$. These strands are synthesized with

$LB (1 - \epsilon_{ann})^{4(k-1)} (1 - \epsilon_{mer})^{-2} (1 + \epsilon_{syn})$

as the amplitude argument, and merged into a test tube $U$. In the simulation in the second phase, as in the method for Stage 1, we merge $U$ into $Y(l, k-1)$, anneal $Y(l, k-1)$, and separate the $\ell$-base components of the $2\ell$-base complete double strands into $Y(l, k)$. In the case when $g$ outputs true, either $h$ or $h'$ outputs true, so $L[g]$ and $R[g]$ will be present in $Y(l, k)$. The amplitudes of these strands will have a lower bound of

$LB (1 - \epsilon_{ann})^{4(k-1)} (1 - \epsilon_{ann})^{2} \ge LB (1 - \epsilon_{ann})^{4k}$

and an upper bound of $UB$. On the other hand, in the case when $g$ outputs false, $L[h]$, $R[h]$, $L[h']$, and $R[h']$ are all absent in $Y(l, k-1)$. So, $L[g]$ and $R[g]$ will be absent in $Y(l, k)$.

Next suppose that the level-$k$ gates are AND-gates. Let $g$ be an AND-gate taking inputs from $h$ and $h'$, both at the $(k-1)$-st level of $G[n, l]$. Let $u$ be a right strand and $v$ be a left strand, not used in any other step. In the first phase of the program, we prepare $L[g]$, $R[g]$, and linker strands $\lambda(L[g], R[h])$ and $\lambda(L[h'], R[g])$ in a test tube $U$. Also, we prepare $\lambda(L[g], R[g])$ in a test tube $U'$. These strands are synthesized with

$LB (1 - \epsilon_{ann})^{4(k-1)} (1 - \epsilon_{mer})^{-2} (1 + \epsilon_{syn})$

as the amplitude argument for the synthesize operation. In the simulation in the second phase, as in the previous methods, we merge $U$ into $Y(l, k-1)$, and anneal to separate the $\ell$-base components of the $2\ell$-base complete double strands into a test tube $W$. Then, we merge $U'$ into $W$, and anneal to separate the $\ell$-base components of the $2\ell$-base complete double strands into $Y(l, k)$. By an analysis similar to the case when $g$ is an OR-gate, $L[g]$ is present in the test tube $W$ if and only if $h$ outputs true, and $R[g]$ is present in the test tube $W$ if and only if $h'$ outputs true. So, $L[g]$ and $R[g]$ are simultaneously present or absent in $Y(l, k)$ depending on whether $g$ outputs true or false, respectively. A lower bound on the amplitudes of $L[g]$ and $R[g]$ is given by

$LB (1 - \epsilon_{ann})^{4(k-1)} (1 - \epsilon_{ann})^{4} = LB (1 - \epsilon_{ann})^{4k}.$

In summary, since $G[n, l]$ consists of $\log n$ levels, simulation of $G[n, l]$ contributes at most $6 \log n + 2$ to the running time, and the amplitudes of the strands corresponding to the output gates of $G[n, l]$ have an upper bound of

$M_{low} P^{-1}$

and a lower bound of

$M_{low} P^{-1} (1 - \epsilon_{mer})^{2} (1 + \epsilon_{syn})^{-2} (1 - \epsilon_{ann})^{4 \log n + 2}.$

There is a polynomial $Q$ such that this lower bound is at least $M_{low} Q^{-1}$.
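The logical content of the two gate simulations can be checked with a presence-only sketch, ignoring amplitudes. It assumes a test tube is modeled as a set of strand labels and that Anneal keeps exactly the base strands sitting in a complete triplet whose linker is in the tube; the helper names `L`, `R`, `link`, `anneal`, `simulate_or`, `simulate_and`, and `present` are hypothetical.

```python
def L(g):
    """Label of the left strand L[g]."""
    return ("L", g)


def R(g):
    """Label of the right strand R[g]."""
    return ("R", g)


def link(left, right):
    """Label of the linker lambda(left, right)."""
    return ("link", left, right)


def anneal(tube: set) -> set:
    """Keep the base strands u, v for which u, v and lambda(u, v) are all in the tube."""
    kept = set()
    for s in tube:
        if s[0] == "link" and s[1] in tube and s[2] in tube:
            kept.update((s[1], s[2]))
    return kept


def simulate_or(prev: set, g, h, h2) -> set:
    """One merge and one anneal, with four linkers tying g to h and h2."""
    U = {L(g), R(g),
         link(L(g), R(h)), link(L(h), R(g)),
         link(L(g), R(h2)), link(L(h2), R(g))}
    return anneal(prev | U)


def simulate_and(prev: set, g, h, h2) -> set:
    """Two rounds: L[g] survives iff h is true, R[g] iff h2 is true, then pair them."""
    U = {L(g), R(g), link(L(g), R(h)), link(L(h2), R(g))}
    W = anneal(prev | U)
    U2 = {link(L(g), R(g))}
    return anneal(W | U2)


def present(tube: set, g) -> bool:
    """True when both strands of gate g are in the tube."""
    return L(g) in tube and R(g) in tube
```

Running both functions over the four truth assignments of `h` and `h2` reproduces the OR and AND truth tables, which matches the case analysis above. The second anneal in `simulate_and` is what prevents a lone `L[g]` or `R[g]` from counting as a true signal.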
Stage 3: Amplification
Since the lower bound on the amplitudes is $M_{low} Q^{-1}$ at the end of the simulation of a subcircuit, in order to satisfy the condition (~), we need to amplify the amplitudes of each strand by a multiplicative factor of $Q$. This can be achieved by repeated "pseudo-doubling." Suppose we wish to amplify a pair of strands $(u, v)$ in a test tube $U$, where $u$ and $v$ are a left strand and a right strand, respectively. These strands are either both present or both absent, and if they are present, then their amplitudes satisfy $m \le \mu(u), \mu(v) \le m'$, where $m' > m (1 - \epsilon_{mer})^{-2} (1 + \epsilon_{syn})$. Let $u'$ be some other left strand and $v'$ some other right strand. Suppose we synthesize $u'$, $v'$, $\lambda(u', v)$, and $\lambda(u, v')$ with $m'$ as the argument of synthesize and, as in the previous methods, merge those strands into $U$, and anneal to separate all $\ell$-base components of $2\ell$-base complete double strands into $U'$. Then we obtain in $U'$ at least $m (1 - \epsilon_{ann})^{2}$ copies of $u$, $v$, $u'$, $v'$ if $u$ and $v$ were present in $U$, and none otherwise. Introduce a new left strand $u''$ and a new right strand $v''$ and do a similar process for the combinations $\lambda(u'', v)$, $\lambda(u'', v')$, $\lambda(u, v'')$, $\lambda(u', v'')$, and separate into $U''$ all $\ell$-base components of $2\ell$-base complete double strands. Then we will obtain in $U''$ at least $2m (1 - \epsilon_{ann})^{4}$ copies of $u''$ and $v''$ if $u$ and $v$ were present in $U$, and none otherwise. Since $\epsilon_{ann}$ is small, we can assume that $2 (1 - \epsilon_{ann})^{4} > 1$. If we rename $u''$ as $u$ and $v''$ as $v$, and ignore the old $u$, $v$, $u'$, and $v'$, then we have multiplied the amplitudes of $u$ and $v$ by a factor of $2 (1 - \epsilon_{ann})^{4}$. Here the upper bound $m'$ will be replaced by $2m'$. This "pseudo-doubling" process involves only a constant number of steps in the second phase of the program. If we repeat the "pseudo-doubling" $O(\log n)$ times, then the lower bound will become at least $M_{low}$. At that moment, the upper bound will become at most $M_{low} R(n)$ for some polynomial $R$. Therefore, by (1) and (2) there exist some polynomials $R_1$ and $R_2$ such that

$R_1(n) \, \mathrm{MIN}_{syn} \le M_{low} \le M_{high} / R_2(n).$

Overall, the number of operation steps in the second phase is

$D' \cdot O(\log n) = O(\log^d n).$

There are polynomially many strands involved in the computation, and the synthesis amplitudes of the strands are bounded by a constant factor of $M_{high}$. Since $M_{high}$ is bounded by some polynomial in $n$, the total number of DNA molecules necessary for computation is bounded by some polynomial in $n$. This proves the theorem.
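The claim that $O(\log n)$ rounds of pseudo-doubling suffice follows because each round multiplies the guaranteed amplitude by the constant $2(1 - \epsilon_{ann})^4 > 1$ while $Q$ is polynomial in $n$. A minimal sketch of the round count, assuming $Q$ and the annealing constant are given as numbers (the function name `pseudo_doubling_rounds` is illustrative):

```python
def pseudo_doubling_rounds(Q: float, eps_ann: float) -> tuple[int, float]:
    """Return (rounds, upper_growth) needed to raise the lower bound by a factor Q.

    Each round multiplies the lower bound by 2 * (1 - eps_ann)**4 and the
    upper bound by 2, as in the pseudo-doubling argument.
    """
    factor = 2 * (1 - eps_ann) ** 4
    if factor <= 1:
        raise ValueError("eps_ann too large: 2 * (1 - eps_ann)**4 must exceed 1")
    rounds, gain = 0, 1.0
    while gain < Q:
        gain *= factor
        rounds += 1
    return rounds, 2.0 ** rounds
```

The number of rounds is roughly $\log Q / \log(2(1 - \epsilon_{ann})^4)$, so a polynomial $Q$ gives $O(\log n)$ rounds, and the upper bound grows by $2^{\text{rounds}}$, which is again polynomial in $n$.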
Theorem 2 For every $d \ge 1$, $\mathrm{MM}(O(\log^d n), \mathrm{poly}) \subseteq \mathrm{SAC}^d$.
Proof. Let $d \ge 1$, and let $A$ be a language decided by a family of Minimum Model programs $\{P_n\}_{n \ge 1}$ such that $\mathrm{Time}(P_n) \le c \log^d n$ for all $n$, where $c$ is a fixed constant. Let $n$ be fixed. Since $P_n$ is written so that it robustly decides $A^{(n)}$, we can assume for any pair of strands that hybridize that, as long as there is at least one molecule of each, they will form the double strand (in our encoding scheme the double strand is uniquely determined) if they are annealed. This means that the amplitude does not matter at all when logically determining the outcome of $P_n$. Let $w_1, \dots, w_{B_{patt}}$ be an enumeration of all strands in $S_0 \cup L$. Note that $B_{patt}$ is bounded by a polynomial in $n$. Let $B_{time} = \mathrm{Time}(P_n)$. Let $B_{tube}$ be the number of test tubes involved in the computation. Since the first phase of the algorithm has polynomially many steps and there are at most $B_{time}$ steps in the second phase, $B_{tube}$ is also bounded by some polynomial in $n$. For each $0 \le s \le B_{time}$, $1 \le t \le B_{tube}$, and $1 \le p \le B_{patt}$, define $\pi(s, t, p) =$ true if $w_p$ is present in the $t$-th test tube when the $s$-th step has been done, and false otherwise. Let $D$ be the test tube on which the final Detect step is conducted. Then the output of $P_n$ is equal to

$\bigvee_{1 \le p \le B_{patt}} \pi(B_{time}, D, p).$ (3)

Once all the values of $\pi$ at time $B_{time}$ have been computed, the formula in (3) can be evaluated with one unbounded-fan-in OR-gate. Thus, we have only to show that the $\pi$-values at time $B_{time}$ can be computed by a semi-unbounded-fan-in circuit of depth $O(B_{time})$. As to the values of $\pi$ at time 0, since the first phase consists of synthesize and merge steps, the value $\pi(0, t, p)$ can be computed by one level of unbounded-fan-in OR-gates, where we assume that the circuit is given the constants false and true at the input level. As to the values of $\pi$ at time $s > 0$, we show that they can be computed from the values of $\pi$ at time $s - 1$ with at most three levels of fan-in-two AND-gates and unbounded-fan-in OR-gates. We will examine three cases: Merge, Anneal, and Length.
Merge. Without loss of generality, we may assume that the contents of some $i$-th test tube are mixed into some $j$-th test tube. For every $t \ne i, j$, the $t$-th test tube is untouched, so for every $p$, $1 \le p \le B_{patt}$, $\pi(s, t, p) = \pi(s - 1, t, p)$. Since the $i$-th test tube is emptied, for every $p$, $1 \le p \le B_{patt}$, $\pi(s, i, p) =$ false. As to the $j$-th test tube, for every $p$, $1 \le p \le B_{patt}$, $\pi(s, j, p) = \pi(s - 1, i, p) \lor \pi(s - 1, j, p)$. Thus, a Merge step can be simulated by a single level of fan-in-two OR-gates.
Anneal. Suppose that an annealing is done on the $t$-th test tube at time $s$. As in the previous analysis, the values for the other test tubes are preserved. As to the $t$-th test tube, the $\pi$-value of a base strand $w_p$ is set to true if and only if there exist some $q$ and $r$ such that $w_r = \lambda(w_p, w_q)$ and the $\pi$-values of $w_p$, $w_q$, and $w_r$ are all true at time $s - 1$ in the $t$-th test tube. As to a linker strand, its $\pi$-value is set to false. So, this computational step can be simulated by an unbounded-fan-in OR-gate over two levels of fan-in-2 AND-gates.
Length. The possible length for separation is either $\ell$ or $2\ell$. We have only to reset the $\pi$-values of the strands to be eliminated by taking their conjunctions with false.
Hence, each step requires at most 3 levels, so the depth of the circuit is bounded by $c' \log^d n$ for some fixed constant $c' > 0$. This implies $A \in \mathrm{SAC}^d$. This proves the theorem.
The following corollary immediately follows from Theorems 1 and 2.
Corollary 2.1 For every $d \ge 1$, $\mathrm{NC}^d \subseteq \mathrm{MM}(O(\log^d n), \mathrm{poly}) \subseteq \mathrm{SAC}^d$.
It is clear from the proof of Theorem 2 that the total number of DNA molecules does not affect the robust computational power of the Minimum Model. So, we have the following corollary.
Corollary 2.2 For any function class $F$ including poly, $\mathrm{MM}(\mathrm{poly}, F) = \mathrm{PSize}$.
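The case analysis in the proof of Theorem 2 treats a second-phase program as a sequence of updates on presence values only. The sketch below mirrors that view under some labeled assumptions: tubes are a dictionary from tube index to a set of strand names, `linkers` maps each linker name to the pair of base strands it joins, and operations are encoded as tuples `("merge", i, j)`, `("anneal", t)`, and `("length", t, keep)` with `keep` equal to `"base"` or `"linker"`. This encoding and the function name `run` are illustrative, not the paper's notation.

```python
def run(tubes: dict, program: list, linkers: dict, detect_tube) -> bool:
    """Track which strands are present after each step and return the Detect result."""
    tubes = {t: set(content) for t, content in tubes.items()}  # work on a copy
    for op in program:
        if op[0] == "merge":
            _, i, j = op
            tubes[j] |= tubes[i]      # presence in tube j is an OR of the two tubes
            tubes[i] = set()          # tube i is emptied
        elif op[0] == "anneal":
            _, t = op
            content = tubes[t]
            kept = set()
            for w, (u, v) in linkers.items():
                # AND of three presence values, then an OR over all linkers
                if w in content and u in content and v in content:
                    kept |= {u, v}
            tubes[t] = kept           # linkers themselves are discarded
        elif op[0] == "length":
            _, t, keep = op
            if keep == "linker":      # keep strands of length 2 * ell
                tubes[t] = {w for w in tubes[t] if w in linkers}
            else:                     # keep strands of length ell
                tubes[t] = {w for w in tubes[t] if w not in linkers}
        else:
            raise ValueError(f"unknown operation {op[0]!r}")
    return bool(tubes[detect_tube])   # Detect: OR over all strands in the tube
```

Each branch uses only a bounded number of levels of two-input ANDs and wide ORs per step, which is the reason the resulting circuit is semi-unbounded with depth proportional to the number of steps.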
Acknowledgment The author would like to thank Gabriel Istrate and Dick Lipton for useful discussions.
References
[1] L. Adleman. Molecular computation of solutions to combinatorial problems. Science, 266:1021–1024, 1994.
[2] L. Adleman. On constructing a molecular computer. In R. Lipton and E. Baum, editors, Proceedings of 1st DIMACS Workshop on DNA Based Computers, pages 1–21. The American Mathematical Society, 1996.
[3] E. Baum. DNA sequences useful for computation. In Proceedings of 2nd DIMACS Workshop on DNA Based Computers. The American Mathematical Society. To appear.
[4] D. Beaver. Computing with DNA. Journal of Computational Biology, 2(1):1–8, 1995.
[5] D. Boneh, C. Dunworth, R. Lipton, and J. Sgall. On the computational power of DNA. Technical Report CS-TR-499-95, Department of Computer Science, Princeton University, Princeton, NJ, October 1995.
[6] D. Boneh, C. Dunworth, R. Lipton, and J. Sgall. On the computational power of DNA. Discrete Applied Mathematics, 71:79–94, 1996.
[7] B. Fu and R. Beigel. A comparison of resource-bounded molecular computation models. In Proceedings of the 5th Israel Symposium on Theory of Computing and Systems, pages 6–11, 1997.
[8] J. Hartmanis. The structural complexity column: on the weight of computation. Bulletin of the European Association for Theoretical Computer Science, Number 55:136–138, 1995.
[9] R. Lipton. DNA solutions of hard computational problems. Science, 268:542–545, 1995.
[10] T. F. McCutchan, J. L. Hansen, J. B. Dame, and J. A. Mullins. Mung bean nuclease cleaves Plasmodium genomic DNA at sites before and after genes. Science, 225:626–628, 1984.
[11] M. Ogihara and A. Ray. Simulating boolean circuits on DNA computers. In Proceedings of 1st International Conference on Computational Molecular Biology, pages 326–331. ACM Press, 1997.
[12] J. Reif. Parallel molecular computation. In Proceedings of 7th ACM Symposium on Parallel Algorithms and Architecture, pages 213–223. ACM Press, 1995.
[13] D. Rooß and K. Wagner. On the power of DNA computing. Information and Computation, 131:95–109, 1996.
[14] B. B. Rosenbaum, F. Oaks, S. Menchen, and B. Johnson. Improved single-stranded DNA sizing accuracy in capillary electrophoresis. Nucleic Acids Research, 25:3925–3929, 1997.
[15] J. Sambrook, E. F. Fritsch, and T. Maniatis. Molecular Cloning: a Laboratory Manual. Cold Spring Harbor Press, NY, 2nd edition, 1989.
[16] J. Watson, M. Gilman, J. Witkowski, and M. Zoller. Recombinant DNA. Scientific American Books, New York, NY, 2nd edition, 1992.
[17] J. Watson, N. Hopkins, J. Roberts, J. Steitz, and A. Weiner. Molecular Biology of the Gene. Benjamin-Cummings, Menlo Park, CA, 4th edition, 1987.
[18] E. Winfree. Complexity of restricted and unrestricted models of molecular computation. In R. Lipton and E. Baum, editors, Proceedings of 1st DIMACS Workshop on DNA Based Computers, pages 187–198. The American Mathematical Society, 1996.