Biotechnol. J. 2007, 2, 91–101

DOI 10.1002/biot.200600134

www.biotechnology-journal.com

Review

Biomolecular computing: Is it ready to take off?

Pengcheng Fu

Department of Molecular Biosciences and Bioengineering, University of Hawaii at Manoa, Honolulu, HI, USA

Biomolecular computing is an emerging field at the interface of computer science, biological science and engineering. It uses DNA and other biological materials as building blocks for constructing living computational machines that solve difficult combinatorial problems. In this article, notable advances in biomolecular computing are reviewed and the challenges associated with this multidisciplinary research are addressed. Finally, several perspectives are offered based on this review.

Received 18 July 2006 Revised 18 October 2006 Accepted 5 November 2006

Keywords: Biocomputing · DNA strand · Massive parallelism · Self-assembling · Steganography

1 Introduction

Biomolecular computing is an emerging field at the interface of computer science, biological science and engineering. The field is also known variously as biocomputing, molecular computation and DNA computation. The notion that single molecules or atoms could be used to construct computer components was first conceived by Richard Feynman in his visionary talk in 1959 [1]. The concept that DNA molecules and enzymatic DNA processing could be used to store information and perform computation was then discussed theoretically by T. Head in 1987 [2] and 1992 [3]. The possibility that DNA computation could be applied to solving complex mathematical problems was demonstrated in a landmark paper published in Science in 1994 [4]. In that paper, Adleman showed that a biological system could be used to solve a seven-node instance of the directed Hamiltonian path (DHP) problem [5], which is known to be a computationally hard problem. This computational breakthrough stimulated imaginative studies by computer scientists and molecular biologists worldwide on topics such

Correspondence: Professor Pengcheng Fu, Department of Molecular Biosciences and Bioengineering, University of Hawaii at Manoa, 1955 East-West Road, Honolulu, HI 96822, USA
E-mail: [email protected]
Fax: +1-808-956-3542
Abbreviations: DHP, directed Hamiltonian path; NP, non-deterministic polynomial time; PNA, peptide nucleic acid

© 2007 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim

as combinatorial optimization, operations research and numerical computation with biological paradigms. Subsequent research spawned a plethora of publications, journals, conferences and symposia, funding sponsors, research consortia, and even entire institutes. Over the ensuing decade, biomolecular computing evolved through stages characteristic of many new paradigms: great initial excitement and optimism in the “honeymoon” period, followed by frustration and skepticism as problems and limitations were discovered, and, more recently, revitalization by new technologies and applications, with the hope that substantial progress in DNA computation will be made in the near future.

Biological systems in nature are not formed for computation. However, many properties that biological organisms possess are highly desirable for computer systems and computational tasks, such as a high degree of autonomy, parallelism, self-assembly and even self-repair. As building blocks, DNA and RNA molecules, various enzymes and other biological components can be used to construct cellular machines for computational purposes. A number of distinct methods have been developed to provide existence proofs that biological organisms can serve as new tools for developing better computer systems and improving the performance of computational tasks. These include notable advances in DNA computation for hard mathematical problems, modalities of DNA computation, biomolecular computation and self-assembling systems, DNA strand and word design, biocomputing by restriction-ligation in plasmids, and DNA computing and steganography. This review is not an exhaustive survey of the literature on biomolecular computing. Instead, it discusses how insights from living systems may have a positive impact on biocomputing research.

2 Biocomputing for NP-complete problems

Early biocomputing efforts focused on constructing a massively parallel computational platform to attack NP (non-deterministic polynomial time)-complete problems, which have resisted conventional computing approaches. An algorithm that solves one NP-complete problem can be adapted to solve any of the others [6]. The DHP problem [4], the maximal clique problem [7], and the satisfiability problem [8] are all known to be NP-complete. As a milestone, Adleman’s work in 1994 used a biological protocol to solve the classical Hamiltonian path problem, which is a special case of the “Traveling Salesman problem”. In the Hamiltonian path problem, one seeks a route by which the salesman visits each of seven cities exactly once, starting at a given location and ending at a different, defined location. This is known to mathematicians as the DHP problem of graph theory: Given a “directed graph,” i.e., a set of seven points (vertices) with “edges,” i.e., one-way paths that connect the points, find a path (if one exists) from the entry point (0) to the exit point (6) that passes through each point exactly once. No efficient algorithm for this kind of problem is known on conventional computing architectures [6]. In his groundbreaking work, Adleman used DNA molecules to implement the computation strategy, and demonstrated how to solve the DHP problem by DNA encoding and subsequently creating and extracting the solution using


very tiny quantities of his reagents. The insight was to encode the vertices and edges with unique 20-base DNA sequences, each edge having unique short ends complementary to the ends of the vertex oligonucleotides it connected. The path oligonucleotides (oligos) were asymmetric, and thus preserved the directionality. A series of steps, including DNA strand hybridization, PCR, ligation, affinity purification, and gel electrophoresis, was used to create and resolve the DNA constructs representing the problem’s solution. The biological experiment aimed to orchestrate individual molecules to perform computational tasks that are beyond the reach of conventional computers based on solid-state electronics [9]. Details of the procedures are given in the original paper, and a diagram taken from a review [9] is shown in Fig. 1.

The advantages of Adleman’s method were massive parallelism and very high information content. The reactions contained approximately 3 × 10^13 copies of each oligo, resulting in about 10^14 strand hybridization encounters in the first step alone. In these terms, the DNA computation was a thousand-fold faster than the instruction cycle of a supercomputer. It was also noted that the information storage density of DNA was very high: about one bit per cubic nanometer, or 10^21 bits per gram. This is billions of times denser than media such as videotapes, which require 10^12 nm^3 to store one bit [4]. In other terms, one micromole of nucleotides as DNA polymer can store about two gigabytes of information [7]. This has led to coding, processing, and protective packaging schemes that make use of DNA as a database [10–12]. Lastly, Adleman noted that the energy requirement for enzyme-based DNA computing is low: one ATP pyrophosphate cleavage per ligation, which gives an efficiency of approximately 2 × 10^19 operations per joule. By comparison, supercomputers of that time performed roughly 10^9 operations per joule.

Figure 1. Encoding problem in Adleman’s molecular computation. Reprinted from Garzon and Deaton [9].

A different set of DNA procedures was used by a different group to solve another NP-complete graph theory problem, the “maximal clique problem” [7]. By definition, a clique is a set of vertices in which every pair is connected by an edge. The maximal clique problem asks for the size of the largest clique in a given network containing N vertices and M edges [7]. Ouyang et al. [7] designed their data structure in the form of double-stranded DNA so that each bit’s value (Vi) and its position (Pi) in a binary number were represented by two DNA sections. The data pool was digested with restriction enzymes. As a result, the mathematical determination of the largest clique was transformed into a search for the shortest DNA [7].

Adleman’s original work also revealed several shortcomings. Despite the theoretical information capacity of DNA, there are practical limits to the size of the problems that can be programmed and to the efficiency of data encoding and readout. For example, Ouyang et al. noted that maximal clique problems with up to 27 vertices could be solved with picomolar amounts of DNA nucleotides, and problems up to 36 vertices could be solved with nanomolar amounts. However, “Further scale-up rapidly becomes impractical” [7]. When designing experiments of this type, it is important to be able to estimate the complexity of DNA (the number of different strands) in a tube. A simple procedure for this, based on the use of restriction enzymes, has been reported [13]. The occurrence of mismatches in hybridization raised the concern that erroneous solutions could be created. Several error detection and fault tolerance methods have been established to increase the accuracy of computation [14–18]. Among them, Deaton et al.
[19] studied the errors associated with imperfect hybridization (i.e., false positive and false negative ligations). The authors pointed out that these errors depended largely on the reaction conditions of the hybridization, in particular on temperature. They estimated the size of the largest problem instances that could be encoded with oligonucleotides of a given length. Condon et al. [20–22] and Wang et al. [23] have developed coding methods for DNA computing, and have made significant progress in surface-based DNA computing. Their work has achieved a much lower error rate than the solution-based approach. Although the use of 20-base oligos allows 4^20 unique sequences, a significant number of these would develop secondary structures unsuitable for computing. Thus, methods were needed for generating oligos with suitable sequence and length to ensure high-fidelity pairing during hybridization and PCR. However, it was obvious that these factors would become more detrimental as the size of the problems to be solved became larger. Also, Adleman noted that regular computers offered a far more versatile set of operations than DNA and DNA-processing enzymes allowed. For instance, multiplication of two 100-digit numbers using DNA computing would be very unwieldy [4].

In 2002, Adleman and co-workers [8] reported solving a 20-variable, NP-complete three-satisfiability (3-SAT) problem by DNA computing procedures similar to those in their 1994 paper. This computation, which found one unique answer among 2^20 possibilities, was the largest solved by non-electronic means at that time [8]. As a follow-up, Roweis et al. [24, 25] developed the “sticker model” in which long single DNA strands were used as memory. In their work, DNA strands were made into Boolean variables by hybridization with oligonucleotide “stickers”, each the same length as, and complementary to, an individual “bit” on the memory strand. This created a massively parallel, random-access memory that required no enzymes. Single-stranded bits were off (0) and sticker-hybridized, double-stranded bits were on (1). This system permitted four basic operations on each bit or string of bits: combining, separating, setting on, and clearing (turning off). In general, the biomolecular computing application described by the authors suggested that DNA computers could eventually be configured as robotic benchtop workstations [24].
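The four sticker-model operations named above can be illustrated with a small in-silico sketch. This is only a toy abstraction, not the procedure of Roweis et al.: a memory strand is assumed to be a tuple of bits, a tube a list of such tuples, and the function names (combine, separate, set_bit, clear_bit) are chosen here for illustration.

```python
from itertools import product
from typing import List, Tuple

Strand = Tuple[int, ...]  # assumed model: bit i is 1 when a sticker is annealed at region i
Tube = List[Strand]       # a tube is a collection of memory strands


def combine(tube_a: Tube, tube_b: Tube) -> Tube:
    """Pour two tubes together."""
    return tube_a + tube_b


def separate(tube: Tube, i: int) -> Tuple[Tube, Tube]:
    """Split a tube into strands with bit i on and strands with bit i off."""
    on = [s for s in tube if s[i] == 1]
    off = [s for s in tube if s[i] == 0]
    return on, off


def set_bit(tube: Tube, i: int) -> Tube:
    """Anneal a sticker at position i on every strand (turn bit i on)."""
    return [s[:i] + (1,) + s[i + 1:] for s in tube]


def clear_bit(tube: Tube, i: int) -> Tube:
    """Remove the sticker at position i from every strand (turn bit i off)."""
    return [s[:i] + (0,) + s[i + 1:] for s in tube]


# Start from a library holding every 3-bit memory strand, then keep only
# the strands with bit 0 on and bit 2 off.
library = [tuple(bits) for bits in product((0, 1), repeat=3)]
bit0_on, _ = separate(library, 0)
_, selected = separate(bit0_on, 2)
```

Each operation acts on every strand in the tube at once, which is the sense in which the model is massively parallel; in this sketch the parallelism is only simulated by a loop.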

3 Modalities of DNA computation

DNA algorithms for simple Boolean logic and arithmetic computing can be found in the literature ([26], also at http://www.csc.liv.ac.uk/~ctag/archive/t/CTAG-97009.ps; [27–32]). Initially, the DNA computing applications explored by Adleman and others were symbolic, rather than numerical. By 2004, numerous classical Boolean search-type satisfiability problems had been solved, including “dominating-set”, “vertex cover”, “independent set”, “three-dimensional matching”, “set-packing”, “3-coloring”, and others, as cited in Chang et al. [33]. A wide variety of physical and chemical phenomena are combinatorial and require specified boundary conditions for modeling. These are known in mathematics as “forbidding/enforcing systems”. DNA-processing reactions are an example, and computing with DNA may be used to model such systems [34]. In addition, it has been shown that DNA computation can be used to solve problems in what is known to mathematicians as P-space: problems that can be solved with an amount of memory bounded by a polynomial function of the input size, with no bound on the number of computational steps [10]. DNA computations may also be done on NC (Nick’s Class, after Nick Pippenger, who did extensive research on complexity theory) problems, which can be solved efficiently on parallel processors. Optimization problems that use the very broadly useful algorithms of integer linear programming are also solvable by DNA computing [35].


Figure 2. Illustration of a simple DNA-based algorithm for adding two binary digits. (A) Input DNA strands and expected reactions for the operations 0 + 0, 0 + 1, and 1 + 1, and input DNA strands and expected reaction in the 1 + 1 reaction. (B) Input place holder DNA strand and expected second reaction in the 1 + 1 operation. Vertical dotted lines represent hybridization between complementary DNA elements, and reiterated arrows represent primer extension. Reprinted from Guarnieri et al. [36].

Despite the versatility of DNA for the types of operations described above, arithmetic operations with DNA require different experimental approaches than search problems. The development of a DNA processing protocol for commutative arithmetic addition was another milestone in biomolecular computation. This system was first used to add any two non-negative rational binary numbers, and it was then extended to binary numbers of multiple digits [36]. In this work, the DNA-based addition algorithm was designed and executed biochemically. All DNA sequences used for computation were single-stranded, unique and non-complementary, except where an overline indicates a complementary DNA sequence [36]. Figure 2 illustrates a simple DNA-based algorithm for adding two binary digits [36]. Correspondingly, Fig. 3a shows that the addition 0 + 0 resulted in a 40-bp strand, and both 0 + 1 and 1 + 0 yielded a 70-bp strand. Figure 3b shows a similar result for 0 + 1 and 1 + 0, while a 110-bp strand was produced for the 1 + 1 operation.

More recently, Chang et al. [33] developed an algorithm and procedures for a DNA-based n-bit parallel adder and an n-bit parallel comparator. These methods were implemented using the memory strand and sticker techniques formalized by Roweis et al. [24] and Lipton [37]. Since commutative multiplication is equivalent to repeated addition, their procedure also works for multiplication. A follow-up paper by Ho described how to solve the “subset-product” type of problem [38]. Similarly, Chang et al. [33] described a formula for calculating the number of tubes, the maximum length of the DNA library strands, and the number of biological steps required to solve any such problem. This formula indicated that an automated lab bench-scale fluid-handling device would be practical for solving a broad range of such problems.
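The readout logic of the one-bit addition experiment, and the remark that multiplication reduces to repeated addition, can be sketched as follows. The product lengths (40, 70 and 110) are those quoted above; the lookup-table representation and the function names are assumptions made here for illustration and do not model the underlying primer-extension chemistry.

```python
# Product lengths (in bases) quoted in the text for each one-bit addition.
PRODUCT_LENGTH = {(0, 0): 40, (0, 1): 70, (1, 0): 70, (1, 1): 110}

# Reading the gel: each product length is interpreted as a sum.
LENGTH_TO_SUM = {40: 0, 70: 1, 110: 2}


def add_bits(a: int, b: int) -> int:
    """Simulate one addition: look up the product length, then decode it."""
    length = PRODUCT_LENGTH[(a, b)]
    return LENGTH_TO_SUM[length]


def multiply(x: int, times: int) -> int:
    """Multiplication expressed as repeated addition."""
    total = 0
    for _ in range(times):
        total += x
    return total


one_plus_one = add_bits(1, 1)
product_value = multiply(6, 7)
```

The same product length for 0 + 1 and 1 + 0 reflects the commutativity of the addition protocol.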

Figure 3. Biochemical execution of a simple DNA-based algorithm for adding two binary digits. M, Single-stranded molecular size markers. Molecular sizes are indicated on the left (in bases). (A) Products of each of the first three addition reactions shown in Fig. 2a. (B) Products of the (0 + 1) and the (1 + 0) reaction and the sequential (1 + 1) reaction. Reprinted from Guarnieri et al. [36].

4 Biomolecular computation and self-assembling systems

High parallelism is one of the major advantages of DNA computation over conventional silicon-based computers. Parallel computation can be enhanced by a self-assembly process in which information is encoded in DNA tiles. Using sticky-end associations, a large number of DNA tiles can be self-assembled [39]. The paradigm is the “Wang tiles” (or “Wang dominoes”), first proposed by Hao Wang in 1961 [40]. Wang tiles are a set of squares with each edge so coded (for example, by color) that they can assemble into a larger unit, but only with identically coded edges together. It was shown mathematically that, when rotation and reflection are barred, certain sets of such tiles can cover a plane only in a pattern that is aperiodic, i.e., a pattern that never repeats. An aperiodic Wang tiling, for which one mathematical counterpart is known as “Penrose tiling”, is in contrast to a crystal structure, which has a periodically repeated pattern. It was further shown mathematically that the assembly of a set of Wang tiles into a unique lattice was analogous to the solving of a particular problem by the archetypal computer, known as a Turing machine [41]. In other words, self-assembly of DNA materials with the architecture of Wang tiles may be used for computation, based on the logical equivalence between DNA sticky ends and Wang tile edges [42, 43].

To test various postulates of tiling self-assembly, Rothemund [44] developed a simple model system using acrylic tiles with various edges treated to be hydrophobic or hydrophilic. To simulate various algorithms, combinations of the colored, fluorescent millimeter-sized tiles were suspended at the interface of an aqueous layer and a non-aqueous layer of n-hexadecane, and the meniscus shapes that formed were recorded photographically. This system allowed fast, simple experiments that validated several assumptions, and provided information about the accuracy, error frequency, free energy requirements, and robustness of computing by self-assembly [45].

Double-stranded molecules of DNA, RNA, and analogs such as peptide nucleic acid (PNA) are capable of self-assembly. The relation of DNA computation to self-assembling structures was developed in the mid-1990s, largely through the theoretical and experimental work of Erik Winfree [42, 43], Nadrian Seeman [46], John Reif [10], and Grzegorz Rozenberg [47]. Given appropriate conditions, short, chemically stable segments or loops of DNA with strand termini that differ in sequence are a structural counterpart of Wang tiles. For linear assembly, “sticky ends” resulting from cleavage with restriction endonucleases suffice.
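The matching rule that makes Wang tiles (and, by analogy, sticky-ended DNA tiles) assemble only in permitted ways can be sketched in a few lines. The Tile type, the edge labels and the two example tiles below are hypothetical and serve only to show the rule that touching edges must carry the same code, with no rotation or reflection.

```python
from typing import List, NamedTuple


class Tile(NamedTuple):
    """A Wang tile: one code (e.g. a color or a sticky-end sequence) per edge."""
    north: str
    east: str
    south: str
    west: str


def is_valid_tiling(grid: List[List[Tile]]) -> bool:
    """Return True if every pair of touching edges in the grid carries the same code."""
    for r, row in enumerate(grid):
        for c, tile in enumerate(row):
            # Right-hand neighbor: this tile's east edge meets its west edge.
            if c + 1 < len(row) and tile.east != row[c + 1].west:
                return False
            # Neighbor below: this tile's south edge meets its north edge.
            if r + 1 < len(grid) and tile.south != grid[r + 1][c].north:
                return False
    return True


# Two hypothetical tiles with edge codes "r", "g" and "b".
A = Tile(north="r", east="g", south="b", west="g")
B = Tile(north="b", east="g", south="r", west="g")

good = is_valid_tiling([[A, A], [B, B]])  # every shared edge matches
bad = is_valid_tiling([[A], [A]])         # south "b" meets north "r"
```

In a DNA implementation the equality test on edge codes would correspond to complementarity between sticky ends rather than literal string equality.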
Computational systems based on self-assembly have been demonstrated in both 1-D arrangements called “string tiles” [42, 48] and 2-D lattices of DNA [43]. Other stable forms of nucleic acids include Z-DNA (which has 12 bases per turn instead of 10 in the B form), non-migrating Holliday junctions, and duplexes with triple crossovers or “pseudoknots” [46, 49, 50]. The design of sequences for DNA computation based on self-assembly is more complex than for the strand-pairing mode, because it depends strongly on secondary structure during assembly in ambient conditions [45, 51]. Sequence relationships and constraints for string-tile formation were devised by Winfree et al. [42]. Patterns for DNA to form triple-crossover molecules that linearly assemble to create an exclusive OR (XOR) gate have been described [50]. Feldkamp et al. [52] developed a DNA sequence compiler for designing oligonucleotides. This compiler software creates DNA sequences that encode data in a 32-bit format using a specific grammar, and adjusts the design to accommodate different constraints at various stages of the self-assembly process. A major contribution to the prediction of nucleic acid self-assembly and the maintenance of particular structures was made by McCaskill, who developed a thermodynamic partition function that quantifies the propensity of RNA sequences to form or eliminate secondary structures [53].
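How a linear assembly of tiles can act as an XOR gate may be easier to see in a toy model. The sketch below assumes a cumulative XOR, in which each tile binds where its left edge matches the output exposed by the previous tile and its bottom edge matches the next input bit; the tuple representation and the names XOR_TILES and assemble are illustrative assumptions, not the triple-crossover design of [50].

```python
from typing import List, Tuple

# Hypothetical tile set: (left_edge, bottom_edge, right_edge).
# The right edge exposes left XOR bottom for the next tile to read.
XOR_TILES: List[Tuple[int, int, int]] = [
    (previous, bit, previous ^ bit) for previous in (0, 1) for bit in (0, 1)
]


def assemble(inputs: List[int], seed: int = 0) -> List[int]:
    """Grow a linear assembly over the input bits and return the exposed outputs."""
    outputs = []
    exposed = seed
    for bit in inputs:
        # Exactly one tile fits each (exposed edge, input bit) site.
        matches = [t for t in XOR_TILES if t[0] == exposed and t[1] == bit]
        assert len(matches) == 1
        exposed = matches[0][2]
        outputs.append(exposed)
    return outputs


running_xor = assemble([1, 0, 1, 1])
```

Because only one tile fits each site, the assembly itself carries out the computation; no external control selects which tile is added next.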

5 DNA strands and word design

DNA molecules are attractive instruments for computational tasks since they can be used for digital abstraction and for string construction. In particular, DNA strands can be used to encode information, and the joining and other recombination of these strands may be used to represent putative solutions to certain computational problems [54]. For conventional solution-phase DNA computing, the theoretical bases and some algorithms were described by several groups [55–57]. The scope of the constraint problem was illustrated by Hartemink et al. [57]. In their attempt to design DNA sequences for a unary counter, the first version of their program, called SCAN, required 24 h of computing time to search a space of over 7.5 billion unary counter designs, and found just nine designs that satisfied all of the constraints. However, one of the designs proved to be more efficient than manually designed words in an actual “wet” experiment [57].

Despite these developments in theory and practice, efficient design of sequences remains one of the greatest obstacles in solution-phase DNA computing. The combination of biological interactions (volume, base-pairing and melting temperatures, concentration of strands and words, requirement for tags of fixed sequence, action of restriction enzymes and ligases, amplification by PCR, readout by transcription, translation, etc.) and the problem to be encoded imposes unique constraints. A general discussion of the variables, and separate treatments of the design issues for conventional and self-assembly computations, were given by Brenneman and Condon [45]. They also cited programs that assist with sequence word design for specific problems and procedures.

In addition to the liquid-phase methods pioneered by Adleman [4], Reif [10], and others, techniques involving DNA tethered to a solid support, RNA-DNA hybridization, recognition of secondary structures, and self-assembling 2- and 3-D structures have been described.
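The flavor of the word design problem discussed above can be conveyed by a minimal greedy search. This sketch is not SCAN or any published tool; the three constraints used here (fixed GC count, a minimum Hamming distance between words, and a minimum Hamming distance to reverse complements) and all parameter values are simplifying assumptions, and real designs also account for melting temperature and secondary structure.

```python
from itertools import product
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Return the strand that would hybridize perfectly with `word`."""
    return word.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def design_words(length: int = 6, gc_count: int = 3, min_distance: int = 3) -> List[str]:
    """Greedily collect words that satisfy the assumed constraints."""
    chosen: List[str] = []
    for letters in product("ACGT", repeat=length):
        word = "".join(letters)
        if word.count("G") + word.count("C") != gc_count:
            continue  # uniform GC content, a crude stand-in for similar melting temperatures
        if hamming(word, reverse_complement(word)) < min_distance:
            continue  # avoid words that are nearly self-complementary
        if all(
            hamming(word, other) >= min_distance
            and hamming(word, reverse_complement(other)) >= min_distance
            for other in chosen
        ):
            chosen.append(word)
    return chosen


words = design_words()
```

Even this toy version scans all 4^6 candidates for six-base words; the search space grows exponentially with word length, which is one reason the design step is reported to be so costly.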
Other research groups explored the possibilities of computing with RNA-DNA hybrids, mainly because they are more temperature stable, and they allow the use of a thermostable RNase H, which destroys only the RNA strand in RNA-DNA hybrid sequences [58]. While Adleman’s techniques were based on selecting one molecule representing a solution from the midst of many that represented non-solutions, Landweber and co-workers [59] devised methods to create all possible solutions of a problem in the form of RNA-DNA hybrids, and then exhaustively eliminate the sequences that did not satisfy the problem’s constraints. This approach was used to solve a combinatorial chess problem using a 10-bit DNA memory strand [60, 61]. A similar approach, based on inactivating rather than digesting the non-candidate solutions, was developed at the University of Leiden [47]. Chemists at the University of Wisconsin developed a computation format and biological steps based on surface-immobilized oligodeoxynucleotides, high-affinity non-covalent hybridization with PNAs, and fluorescence readout [61]. Considerable efforts have been made to define information content and operations in DNA and other biomolecules according to concepts of mathematics, computer science, and linguistics [34, 62].

6 Biocomputing by restriction-ligation in plasmids

A simple procedure for computing NP-complete problems, such as the maximum independent set (MIS) problem, was devised by Head et al. [63] with the introduction of the concept of plasmid computing. To implement plasmid computing, Head et al. used the classical “cut and paste” methods of genetic engineering to modify DNA molecules, forming computational plasmids with bit 0 and bit 1. The MIS problem asks for the largest cardinality of any independent subset of the vertices of an undirected graph G = (V, E), where V is the vertex set and E is the edge set [63]. To find the largest independent subset of vertices in G, Head et al. designed a duplex DNA molecule containing short unique sequences (~30–50 bp) flanked by restriction sites for different enzymes that produce sticky ends. This linear array of sequences representing the graph vertices (a, b, c, d, e, f) was inserted into the cloning site of a convenient plasmid. In this state, each short sequence represents a vertex, with its state set to 1. An independent set of vertices can include either, but not both, of two directly adjacent vertices. To solve the problem, two aliquots of the plasmids (initially a single population) were treated in separate tubes, each with one of two restriction enzymes, to excise the segments corresponding to ‘a’ and ‘b’, respectively, and the plasmids were then ligated and pooled. This pool, which contained a population of plasmids lacking vertex ‘a’ and a separate population lacking vertex ‘b’, was again split into two tubes, and restriction enzymes were used to excise the ‘b’ and ‘c’ vertices. These two plasmid solutions were ligated, pooled, and divided into two tubes, and the process was repeated until, for every pair of directly adjacent vertices, at least one had been excised. The remaining inserts were excised from the cloning sites and resolved by electrophoresis. The longest remaining insert corresponds to the largest independent subset of vertices. Its size gives the number of vertices in the subset, and sequencing reveals which vertices these are [63].
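The split, excise and pool cycle described above maps directly onto a short simulation. In this sketch each plasmid is represented as the set of vertex segments it still carries; the function name plasmid_mis and the example edge list (a simple path a-b-c-d-e-f) are assumptions for illustration and are not taken from Head et al.

```python
from typing import FrozenSet, Iterable, Set, Tuple


def plasmid_mis(vertices: Iterable[str], edges: Iterable[Tuple[str, str]]) -> FrozenSet[str]:
    """Simulate the split/excise/pool cycle and return one largest surviving insert."""
    # Initially a single population: every plasmid carries all vertex segments.
    pool: Set[FrozenSet[str]] = {frozenset(vertices)}
    for u, v in edges:
        # Split the pool into two tubes; excise u in one tube and v in the other.
        tube_1 = {plasmid - {u} for plasmid in pool}
        tube_2 = {plasmid - {v} for plasmid in pool}
        # Ligate and pool the two tubes again.
        pool = tube_1 | tube_2
    # Readout: the longest remaining insert is a maximum independent set.
    return max(pool, key=len)


# Hypothetical example graph: a path through the six vertices a..f.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
best = plasmid_mis("abcdef", edges)
```

After the final cycle every surviving plasmid lacks at least one endpoint of each edge, so every insert encodes an independent set, and the gel simply picks out the largest. Note that the number of distinct plasmid types in the pool can still grow quickly with the number of edges.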


This technique differs from those used by Adleman, but it may be applied to several other types of computing problems, and it gives the same answers that Adleman and others found for various problems using different methods. Head’s method requires fewer steps than the others, involves no denaturation-renaturation steps, can use virtually any convenient plasmid, and does not need sequences representing all possible solutions of the problem to be created. Furthermore, with appropriate modifications, the size of the problems is not limited by the available number of restriction enzymes [63].

7 DNA computing and cryptography/steganography

Since DNA is a primordial coding device, its potential uses in cryptography (coding and decoding of information) and steganography (concealment of information) have been examined from many perspectives, as presented in an intriguing short survey of the possibilities by Amos et al. [34]. With an unparalleled information density, DNA molecules are an ideal encryption pad, as a tiny DNA sample could provide petabytes of data. The simplest steganography of all, and one of the most robust, was demonstrated by Clelland et al. [64], who created a message in a three-nucleotide code, synthesized an oligonucleotide containing the message flanked by known sequences, mixed it with a three-million-fold excess of genomic DNA, and spotted it in a 10-nL microdot. The message could be decoded only by PCR with the correct primers, followed by identification of the correct-sized product by electrophoresis. The DNA-based, doubly steganographic technique is shown in Fig. 4. A DNA strand containing an encoded message flanked by PCR primer sequences (Fig. 4a) was camouflaged within human genomic DNA. An encryption key was created for the concealment of the “secret message” (Fig. 4b). The PCR products of the DNA microdots were analyzed by gel electrophoresis (Fig. 4c). The resulting DNA sequence, with the encoded text “June 6 invasion: Normandy”, was decoded with the aid of the encryption key (Fig. 4b, d). This method is one of the most efficient and cost-effective ways to security-tag almost anything of value. A somewhat more complex, but more secure, pair of methods was described by Leier et al. [65]. Perhaps of greatest interest is the harnessing of the massive parallelism of DNA information storage to break codes such as the U.S. Data Encryption Standard (DES) that are intractable even by current state-of-the-art computers. One method of decrypting DES requires an exhaustive search through 2^56 possible keys, using the memory strand and sticker approach described by Roweis et al. [24, 25, 34].
An in-depth survey of this and other possible strategies was published by Gehani et al. [66]. In general, a challenge still exists for the transformation of information from digital representation into DNA molecules and back again in the field of cryptography.

Figure 4. Genomic steganography. (a) Structure of a prototypical secret-message DNA strand. F, forward; R, reverse. (b) Key used to encode a message in DNA. (c) Gel analysis of products obtained by PCR amplification with specific primers of microdots containing secret-message DNA strands hidden in a background of sonicated, denatured human genomic DNA. Message input in copies per human haploid genome is indicated, where 1.0 corresponds to 0.41 fg secret-message DNA in 11 ng human DNA. Lane 2 contains a message input of 100 (20-fold more total DNA than the microdots) and was not PCR amplified. M, 100-base-pair size markers. The gel was stained with ethidium bromide. The arrow indicates the PCR product seen in some lanes, below which primer-dimer bands can be seen. (d) Sequence of the cloned product of PCR amplification, and the result of using the encryption key to decode the message. The DNA sequence determined for the encoded message is shown; the flanking primer sequences are in lower case. Reprinted from Clelland et al. [64].
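The round trip from text to a DNA strand and back, in the spirit of the three-nucleotide message code described above, can be sketched as follows. The character-to-triplet key and the two flanking sequences below are invented for illustration; they are not the key or primers used by Clelland et al., and the genomic background and the PCR step are left out.

```python
from itertools import product

# Hypothetical key: each character is assigned one of the 64 nucleotide triplets.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789:.,"
TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]
ENCODE = dict(zip(ALPHABET, TRIPLETS))
DECODE = {triplet: char for char, triplet in ENCODE.items()}

# Hypothetical flanking sequences standing in for the primer-binding sites.
FORWARD_SITE = "TCCCTCTTCGTCGAGTAGCA"
REVERSE_SITE = "TCTCATGTACGGCCGTGAAT"


def hide(message: str) -> str:
    """Encode a message as triplets and flank it with the two known sites."""
    payload = "".join(ENCODE[char] for char in message.upper())
    return FORWARD_SITE + payload + REVERSE_SITE


def recover(strand: str) -> str:
    """Locate the flanking sites, then decode the payload three bases at a time."""
    start = strand.index(FORWARD_SITE) + len(FORWARD_SITE)
    end = strand.rindex(REVERSE_SITE)
    payload = strand[start:end]
    return "".join(DECODE[payload[i:i + 3]] for i in range(0, len(payload), 3))


secret = hide("June 6 invasion: Normandy")
plain = recover(secret)
```

Without knowledge of both flanking sites a reader cannot locate the payload, and without the key cannot interpret it; these are the two layers of concealment the text refers to as doubly steganographic.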

8 Concluding remarks

Advantages provided by biomolecular computing have made it extremely attractive as a new paradigm for mathematical computation. These include enhanced information density; high energy efficiency; fast, parallel data processing; robustness to noisy data; self-organization and self-learning ability; and adaptation to environmental changes. The history of biomolecular computing is relatively short, yet a number of promising results have already been reported in the literature [54, 67]. They span many significantly different disciplines. For instance, biomolecular computing has been applied successfully to seeking solutions of NP-complete problems, which require that all the candidate solutions be exhaustively tested [68]. There are examples of DNA and DNA products being used to construct autonomous systems, such as programmable finite-state automata, to solve computational problems [69]. In this application, the programming was implemented by the combined use of restriction nucleases, hybridizations and ligations to generate detectable output molecules that encode computational results. Several models for biomolecular computing have been developed [43, 70]. These methods represent a fundamental reconceptualization of computer science since they all take advantage of the massive parallelism and miniaturization offered by biological systems.

© 2007 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim

One of the potential roles that biological organisms can play in biomolecular computing is as memory. Such a bio-based memory may be many orders of magnitude superior to conventional magnetic types in terms of density [71]. There is an increasing trend of combining different computing elements for complex mathematical problems. For example, DNA-RNA hybridized double strands have been used to present valid solutions for a search problem in which RNA was used for the solution sequence and DNA was used to represent an element of an invalid solution [60]. With further advances in biocomputing research, more diverse applications in this field can be expected. Since the work by Adleman in 1994, a large number of studies using biomolecules as instruments for computation have been published [54]. However, biomolecular computing is still in its nascent stage. So far it is more an exploratory research area than a solid body of results. Both the contents and perspectives of biomolecular computing need to be further developed for molecular computers to compete with their silicon-based counterparts. Many challenges exist for this newly formed field, which crosses the disciplinary boundaries between mathematics, computer science and biology. Firstly, the design and execution of DNA biocomputing systems in the laboratory are not supported by standardized modeling and simulation software prior to the “wet lab” experiments. Therefore, the design of DNA computation is mainly based on a “trial and error” process. The integration of quantitative and predictive information, which has been widely applied in engineering design and implementation, can thus hardly be used to form a model-based molecular computation


Biotechnology Journal

platform. Secondly, the biological materials, such as DNA, RNA or proteins, cannot be reused [4]. At the same time, the DNA sequences synthesized for a particular biomolecular computation may be consumed or even destroyed during the implementation of an algorithm. Thirdly, DNA computational experiments are highly prone to errors [72]. Fourthly, although biomolecular computing has been shown to find solutions for complex mathematical problems, such as NP-complete problems, the volume of DNA required may grow exponentially when the size of the NP-complete problem grows linearly. As a result, the computation soon becomes impossible as the number of nodes in an NP-complete problem increases. Finally, biocomputing takes much longer for each computation than silicon-based computers. Typically, implementing an algorithm to solve a computational problem may itself take several days or even weeks. When a new initial condition needs to be tested, the same period of time is required for another run. Therefore, biocomputing experiments that require repeated refinement are inconvenient and expensive to carry out. The above-mentioned drawbacks, which stem from the immaturity of biomolecular computing, must be overcome by the introduction of new concepts and tools from systems biology and synthetic biology. Based on this review, I suggest several research areas that may improve computational applications inspired by biology: – Constraints-based modeling: Modeling efforts have been deployed for biomolecular computing without much success due to the complexity of biomolecular computation. 
To overcome this problem, a shift in our way of thinking may be needed: rather than attempting to simulate exactly what a biological network may be designed to compute, we should narrow the range of possible phenotypes that a biomolecular system can display by successively imposing governing physicochemical constraints. Constraints-based modeling was originally developed for the reconstruction of genome-scale metabolic networks using genomic information, stoichiometry and thermodynamics [73]. Genome-scale metabolic networks with constraints have already been built for several organisms, including Escherichia coli [74, 75], Haemophilus influenzae [76], Helicobacter pylori [77], Saccharomyces cerevisiae [78] and Staphylococcus aureus N315 [79]. These kinds of biological models have proven effective in the analysis of phenomic data [80–82], qualitative transcriptomic data [83], and gene knockout data [84, 85]. The result thus far has been a surprising degree of correlation between the predictions of genome-scale models and independently obtained experimental data [80, 84]. In a similar way, constraints may be used to define the parameter space where


computation under each method would be possible. For instance, the Whiplash PCR method for DNA computing recently proposed by Hagiya et al. [86] may be constrained to single-stranded molecules of a certain optimal length, at a maximum concentration, and with the different variables encoded so that each pair of variables has a minimum base-pairing disagreement. This systems biology approach provides a basis for understanding the structure and function of biological organisms through an incremental process that takes into consideration the expected lack of complete kinetic information. – Construction of logic gates within biological organisms. No DNA computer has yet outperformed a silicon-based computer because of several severe disadvantages of biomolecular computing, including a time-consuming and expensive construction process, error-prone calculation, and exponential growth of the volume needed for difficult search problems. Among the many approaches for biomolecular computing, modification of biological organisms to construct cellular circuits that are analogous to electronic logic gates appears to be an intriguing option for developing programmable DNA computers. Construction of such logic gates with functional units for AND, NOT, OR/XOR and so forth can be done using synthetic biology methodology. By definition, synthetic biology is (i) the design and construction of new biological parts, devices, and systems, and (ii) the re-design of existing, natural biological systems for useful purposes (http://www.syntheticbiology.org/). More specifically, synthetic biology aims to design and build engineered biological systems that process information, manipulate chemicals, fabricate materials, produce energy, provide food, and maintain and enhance human health and our environment (see Wikipedia: http://en.wikipedia.org/wiki/Synthetic_biology). 
For biomolecular computing, this approach applies engineering formalisms to design and integrate biological/non-biological components into cellular metabolic/genetic networks to enable logic or digital computation. Systems theory and electronic circuit design, which have been applied successfully in electronic engineering, can be used for biological systems in which signals are represented by protein concentrations instead of electrical voltages. – Combination of systems and synthetic biology. Ideally, research on biomolecular computing should be an interplay between modeling/simulation (systems biology) and design/experimentation (synthetic biology). The implementation process for future biomolecular computing may involve an iterative relationship between these two complementary approaches: on the one hand, the “in silico” and “wet lab” experiments discover and catalog the phenomena, which empowers scientists to increase their understanding of complex living systems; on the other hand, new cellular circuits can be created based on the existing body of knowledge from both engineering systems and biological systems. Development of biomolecular computing using systems biology and synthetic biology is thus cyclical; understanding (by systems biology) and creation (by synthetic biology) will continuously enhance and evolve into each other, yielding better biocomputational systems.
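The idea of logic gates whose signals are protein concentrations rather than voltages can be illustrated with a small numerical sketch. The Python code below is an assumption-laden toy model, not a method taken from this review: it uses Hill functions, a common textbook idealization of gene regulation, to build NOT, AND and OR gates, and every name and parameter value in it (`LOW`, `HIGH`, the half-saturation constant `k`, the Hill coefficient `n`, the 0.5 read-out threshold) is hypothetical.

```python
# Hypothetical input concentrations (arbitrary units) standing for logic 0 and 1.
LOW, HIGH = 0.1, 10.0


def activation(x, k=1.0, n=2.0):
    """Hill activation: normalized output rises from 0 to 1 as x passes k."""
    return x**n / (k**n + x**n)


def repression(x, k=1.0, n=2.0):
    """Hill repression: normalized output falls from 1 to 0 as x passes k."""
    return k**n / (k**n + x**n)


def not_gate(a):
    """Output is expressed only when repressor a is scarce."""
    return repression(a)


def and_gate(a, b):
    """Output requires both activators to be abundant."""
    return activation(a) * activation(b)


def or_gate(a, b):
    """Output is expressed when at least one activator is abundant."""
    return 1.0 - (1.0 - activation(a)) * (1.0 - activation(b))


def as_bit(output, threshold=0.5):
    """Read a normalized expression level as a digital 0 or 1."""
    return int(output > threshold)


def truth_table(gate):
    """Evaluate a two-input gate on all LOW/HIGH input combinations."""
    levels = {0: LOW, 1: HIGH}
    return {(a, b): as_bit(gate(levels[a], levels[b]))
            for a in (0, 1) for b in (0, 1)}


and_table = truth_table(and_gate)
or_table = truth_table(or_gate)
```

In the iterative cycle described above, a model of this kind would be the "in silico" half: its parameters would be fitted to measurements from a constructed circuit and then used to guide the next design.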

The author is grateful to the two anonymous reviewers for their critical feedback, which has contributed significantly to the improvement of this manuscript.

Pengcheng (Patrick) Fu is currently an assistant professor in the Department of Molecular Biosciences and Bioengineering, University of Hawaii, Manoa (UHM) in Honolulu, Hawaii, USA. He received his Ph.D. in Biochemical Engineering from the University of Sydney, Australia in 1996. He then performed postdoctoral research in Japan and the USA from 1996 to 2000. He was employed at Diversa Corp. (San Diego, CA) in 2001–2002, and became Assistant Professor at the UHM in 2002. His areas of interest include Metabolic Engineering, Bioprocess Control, Metabolomics, and Systems and Synthetic Biology.

9 References [1] Feynman, R. P., There’s plenty of room at the bottom. In: Gilbert, D. H. (Ed.), Miniaturization, Reinhold Publishing Corporation, New York 1961, pp. 282–296. [2] Head, T., Formal language theory and DNA: an analysis of the generative capacity of special recombinant behaviors. Bull. Math. Biol. 1987, 49, 737–759. [3] Head, T., Splicing systems and DNA, in: Handbook of Formal Languages. Springer-Verlag, Berlin 1992, pp. 371–383. [4] Adleman, L., Molecular computation of solutions to combinatorial problems. Science 1994, 266, 1021–1024. [5] Rubin, F., A Search Procedure for Hamilton Paths and Circuits. J. ACM 1974, 21, 5404–5411. [6] Garey, M. R., Johnson, D. S., Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman, New York 1979. [7] Ouyang, Q., Kaplan, P. D., Liu, S., Libchaber, A., DNA Solution of the maximal clique problem. Science 1997, 278, 446–449. [8] Braich, R. S., Chelyapof, N., Johnson, C., Rothemund, P. W. K., Adleman, L. M., Solution of a 20-variable 3-SAT problem on a DNA computer. Science 2002, 296, 499–502. [9] Garzon, M. H., Deaton, R. J., Biomolecular computing and programming. IEEE Trans. Evolut. Comput. 1999, 3, 236–250. [10] Reif, J. H., Computing: Successes and Challenges. Science 2002, 296, 478–479. [11] Kashiwamura, S., Yamamoto, M., Kameda, A., Shiba, T., Ohuchi, A., Potential for enlarging DNA memory: the validity of experimental operations of scaled-up nested primer molecular memory. Biosystems 2005, 80, 99–112. [12] Cox, J. P. L., Long-term data storage in DNA. Trends Biotechnol. 2001, 19, 247–250. [13] Faulhammer, D., Lipton, R. J., Landweber, L. F., Counting DNA: estimating the complexity of a test tube of DNA. Biosystems 1999, 52, 193–196. [14] Chen, J., Wood, D., A new DNA separation technique with low error rate, in: Third DIMACS Workshop on DNA Based Computers, June 1997, Philadelphia, pp. 43–58. 
[15] Williams, R., Wood, D., Exascale computer algebra problems interconnect with molecular reactions and complexity theory, in: Second Annual Meeting on DNA Based Computers, June 1996, Princeton, pp. 260–268. [16] Wood, D., Applying error correcting codes to DNA computing, in: Fourth International Meeting on DNA Based Computers, June 1998, Philadelphia, pp. 109–110.


[17] Hartemink, A., Gifford, D., Khodor, J., Automated constraint-based nucleotide sequence selection for DNA computation, in: Fourth International Meeting on DNA Based Computers, June 1998, Philadelphia, pp. 287–297. [18] Khodor, J., Gifford, D., Design and implementation of computational systems based on programmed mutagenesis, in: Fourth International Meeting on DNA Based Computers, June 1998, Philadelphia, pp. 101–108. [19] Deaton, R., Garzon, M., Murphy, R.C., Rose, J.A. et al., Reliability and Efficiency of a DNA-Based Computation. Phys. Rev. Lett. 1998, 80, 417–420. [20] Cai, W., Condon, A. E., Corn, R. M., Glaser, E. et al., The power of surface-based DNA computation, in RECOMB’97. Proceedings of the First Annual International Conference on Computational Molecular Biology, New York, 1997, pp. 67–74. [21] Frutos, A. G., Thiel, A. J., Condon, A. E., Smith, L. M., Corn, R. M., DNA computing at surfaces: 4 base mismatch word design, in: Third DIMACS Workshop on DNA Based Computers, June 1997, Philadelphia, pp. 238–239. [22] Garzon, M., Deaton, R., Neathery, P., Murphy, R. C. et al., On the encoding problem for DNA computing, in: Third DIMACS Workshop on DNA Based Computers, June 1997, Philadelphia, pp. 230–237. [23] Wang, L., Liu, Q., Frutos, A., Gillmor, S. et al., Surface-based DNA computing operations: Destroy and readout, in: Fourth International Meeting on DNA Based Computers, June 1998, Philadelphia, pp. 247–248. [24] Roweis, S., Winfree, E., Burgoyne, R., Nickolas, V. et al., A sticker based model for DNA computation, in: Proceedings of 2nd Annual DIMACS Workshop on DNA Based Computers (Discrete Mathematics and Theoretical Computer Science), June 10–12 1996, Princeton. [25] Roweis, S., Winfree, E., Burgoyne, R., Nickolas, V. et al., A sticker based model for DNA. J. Comput. Biol. 1998, 5, 615–629. [26] Amos, M., Dunne, P., DNA simulation of Boolean circuit, Department of Computer Science, University of Warwick, UK, 1998. 
[27] Hagiya, M., Arita, M., Towards parallel evaluation and learning of Boolean µ-formulas with molecules, in: Third DIMACS Workshop on DNA Based Computers, June 1997, Philadelphia, pp. 105–114. [28] Ogihara, M., Ray, A., Simulating Boolean circuits on a DNA computer, in: First Annual Conference on Computational Molecular Biology, New York 1997, pp. 326–331. [29] Ogihara, M., Ray, A., DNA-based self-propagating algorithm for solving bounded-fan-in Boolean circuits, in 3rd Conference on Genetic Programming, Philadelphia, 1998, 725–730.


[30] Ogihara, M., Ray, A., Circuit evaluation: Thoughts on a killer application in DNA computing, in Computing with Bio-Molecules. Theory and Experiments, Springer-Verlag, Heidelberg 1998. [31] Ogihara, M., Relating the minimum model for DNA computation and Boolean circuits, in: Proceedings of the Genetic and Evolutionary Computation Conference, 1999, San Francisco, pp. 1817–1821. [32] Ogihara, M., Ray, A., Executing parallel logical operations with DNA, in: Proceedings of the IEEE Congress on Evolutionary Computation, 1999, Washington, pp. 972–979. [33] Chang, W.-L., Ho, M. S. H., Guo, M., Molecular solutions for the subset-sum problem on DNA-based supercomputing. Biosystems 2004, 73, 117–130. [34] Amos, M., Paun, G., Rozenberg, G., Salomaa, A., Topics in the theory of DNA computing. Theor. Comput. Sci. 2002, 287, 3–38. [35] Wang, S., Yang, A., DNA solution of integer linear programming. Appl. Math. Comput. 2005, 170, 626–632. [36] Guarnieri, F., Fliss, M., Bancroft, C., Making DNA add. Science 1996, 273, 220–223. [37] Lipton, R. J., DNA solution of hard computational problems. Science 1995, 268, 542–545. [38] Ho, M. S.-H., Fast Parallel Molecular Algorithms for DNA-Based Computation: Factoring Integers. IEEE Trans. Nanobiosci. 2005, 4, 149–163. [39] Yan, H., Park, S. H., Finkelstein, G., Reif, J. H. et al., DNA-templated self-assembly of protein arrays and highly conductive nanowires. Science 2003, 301, 1882–1884. [40] Wang, H., Proving theorems by pattern recognition. Bell System Tech. J. 1961, 40, 1–42. [41] Turing, A., On computable numbers, with an application to the Entscheidungsproblem, in: Proceedings of the London Mathematical Society, Series 2, Vol. 42 (1936). Reprinted, in: Davis, M. (Ed.), The Undecidable, Raven Press, Hewlett 1965. [42] Winfree, E., Eng, T., Rozenberg, G., String Tile Models for DNA Computing by Self-Assembly. Lect. Notes Comput. Sci. 2001, 2054, 63–88. 
[43] Winfree, E., Algorithmic Self-Assembly of DNA, Ph.D. Dissertation, California Institute of Technology. Pasadena CA, 1998. [44] Rothemund, P. W. K., Using lateral capillary forces to compute by self-assembly. Proc. Natl. Acad. Sci. USA 2000, 97, 984–989. [45] Brenneman, A., Condon, A., Strand design for biomolecular computation. Theor. Comput. Sci. 2002, 287, 39–58. [46] Seeman, N. C., DNA Nanotechnology: Novel DNA Constructions. Annu. Rev. Biophys. Biomol. Struct. 1998, 27, 225–248. [47] Rozenberg, G., Spaink, H., DNA computing by blocking. Theor. Comput. Sci. 2003, 292, 653–665. [48] Winfree, E., Liu, F., Wenzler, L. A., Seeman, N. C., Design and self-assembly of 2D DNA crystals. Nature 1998, 394, 539–544. [49] Mao, C., Sun, W., Seeman, N. C., Designed two-dimensional DNA Holliday junction arrays visualized by atomic force microscopy. J. Am. Chem. Soc. 1999, 121, 5437–5443. [50] Mao, C., LaBean, T. H., Reif, J. H., Seeman, N. C., Logical computation using algorithmic self-assembly of DNA triple-crossover molecules. Nature 2000, 407, 493–496. [51] Seeman, N. C., De novo design of sequences for nucleic acid structural engineering. J. Biomol. Struct. Dyn. 1990, 8, 573–581. [52] Feldkamp, U., Banzhaf, W., Rauhe, H., A DNA Sequence Compiler, in: Condon, A., Rozenberg, G. (Eds.), Preliminary Proceedings of the Sixth DIMACS Workshop on DNA-Computing, June 13th–17th 2000, Leiden, p. 253. [53] McCaskill, J. S., The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 1990, 29, 1105–1109. [54] Committee on Frontiers at the Interface of Computing and Biology, National Research Council, Biological inspiration for computing, in: Wooley, J. C., Lin, H. S. (Eds.), Catalyzing Inquiry at the Interface of Computing and Biology, Chapter 8. The National Academies Press, Washington DC, 2005.

[55] Deaton, R. J., Chen, J., Bi, H., Rose, J. A., A software tool for generating non-crosshybridizing libraries of DNA oligonucleotides. Lect. Notes Comput. Sci. 2003, 2568, 252–261. [56] Feldkamp, U., Rauhe, H., Banzhaf, W., Software tools for DNA sequence design. Genet. Program. Evolvable Machines 2003, 4, 153–171. [57] Hartemink, A. J., Gifford, D. K., Khodor, J., Automated constraint-based nucleotide sequence selection for DNA computation, in: Proceedings of the Fourth Annual DIMACS Workshop on DNA Based Computers, University of Pennsylvania, June 15–19, 1998, Philadelphia. [58] Ruben, A. J., Landweber, L. F., The past, present, and future of molecular computing. Nat. Rev. Mol. Cell Biol. 2000, 1, 69–72. [59] Faulhammer, D., Cukras, A. R., Lipton, R. J., Landweber, L. F., Molecular computation: RNA solutions to chess problems. Proc. Natl. Acad. Sci. USA 2000, 97, 1385–1389. [60] Cukras, A. R., Faulhammer, D., Lipton, R. J., Landweber, L. F., Chess games: a model for RNA based computation. Biosystems 1999, 52, 35–45. [61] Wang, L., Liu, Q., Corn, R. M., Condon, A. E., Smith, L. M., Multiple word DNA computing on surfaces. J. Am. Chem. Soc. 2000, 122, 7435–7440. [62] Manca, V., Martin-Vide, C., Paun, G., New computing paradigms suggested by DNA computing: computing by carving. Biosystems 1999, 52, 47–54. [63] Head, T., Rozenberg, G., Bladergroen, R. S., Breek, C. K. D. et al., Computing with DNA by operating on plasmids. Biosystems 2000, 57, 87–93. [64] Clelland, C. T., Risca, V., Bancroft, C., Hiding messages in DNA microdots. Nature 1999, 399, 533–534. [65] Leier, A., Richter, C., Banzhaf, W., Rauhe, H., Cryptography with DNA binary strands. Biosystems 2000, 57, 13–22. [66] Gehani, A., LaBean, T. H., Reif, J. H., DNA-based cryptography, in: Proceedings of 5th Annual DIMACS Meeting on DNA Based Computers (DNA 5), MIT, Cambridge June, 1999.

[67] Beigel, R., Fu, B., Molecular approximation algorithm for NP optimization problems, in: Third DIMACS Workshop on DNA Based Computers, June 1997, Philadelphia, pp. 93–104. [68] Bach, E., Condon, A., Glaser, E., Tanguay, C., DNA Models and Algorithms for NP-complete problems. IEEE Computer Society Press, 1996, pp. 290–299. [69] Shi, X., Li, X., Zhang, Z., Xu, J. et al., Improve capability of DNA automation: DNA automation with three internal states and tape head move in two directions. Lect. Notes Comput. Sci. 2005, 3645, 1101–1013. [70] Reif, J., Local parallel biomolecular computation, in: Third DIMACS Workshop on DNA Based Computers, June 1997, Philadelphia, pp. 243–264. [71] Takinoue, M., Suyama, A., Molecular reactions for a molecular memory based on hairpin DNA. Chem-Bio. Informatics J. 2004, 4, 93–100. [72] Edwards, J. S., Ramakrishna, R., Schilling, C. H., Palsson, B. O., Metabolic flux balance analysis, in: Lee, S. Y., Papoutsakis, E. T. (Eds.), Metabolic Engineering. Marcel Dekker, New York 1999. [73] Edwards, J. S., Palsson, B. O., The Escherichia coli MG1655 in silico metabolic genotype: Its definition, characteristics, and capabilities. Proc. Natl. Acad. Sci. USA 2000, 97, 5528–5533. [74] Reed, J. L., Vo, T. D., Schilling, C. H., Palsson, B. O., An expanded genome scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biol. 2003, 4, R54. [75] Edwards, J. S., Palsson, B. O., Systems properties of the Haemophilus influenzae Rd metabolic genotype. J. Biol. Chem. 1999, 274, 17410–17416.


[76] Schilling, C. H., Covert, M. W., Famili, I., Church, G. M., Edwards, J. S., Palsson, B. O., Genome-scale metabolic model of Helicobacter pylori 26695. J. Bacteriol. 2000, 184, 4582–4593. [77] Forster, J., Famili, I., Fu, P. C., Palsson, B. O., Nielsen, J., Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network. Genome Res. 2003, 13, 244–253. [78] Becker, S. A., Palsson, B. O., Genome-scale reconstruction of the metabolic network in Staphylococcus aureus N315: an initial draft to the two-dimensional annotation. BMC Microbiol. 2005, 5, 8. [79] Edwards, J. S., Covert, M., Palsson, B. O., Metabolic modelling of microbes: the flux-balance approach. Environ. Microbiol. 2002, 4, 133–140. [80] Varma, A., Palsson, B. O., Metabolic flux balancing: basic concepts, scientific and practical use. Bio/Technology 1994, 12, 994–998. [81] Edwards, J. S., Ibarra, R. U., Palsson, B. O., In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. Nat. Biotechnol. 2001, 19, 125–130. [82] Forster, J., Famili, I., Palsson, B. O., Nielsen, J., Large-scale evaluation of in silico gene knockouts in Saccharomyces cerevisiae. Omics 2003, 7, 193–202. [83] Covert, M. W., Palsson, B. O., Transcriptional regulation in constraints-based metabolic models of Escherichia coli. J. Biol. Chem. 2002, 277, 28058–28064. [84] Reed, J. L., Palsson, B. O., Thirteen years of building constraint-based in silico models of Escherichia coli. J. Bacteriol. 2003, 185, 2692–2699. [85] Ibarra, R. U., Edwards, J. S., Palsson, B. O., Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature 2002, 420, 186–189. [86] Sakamoto, K., Gouzu, H., Komiya, K., Kiga, D. et al., Molecular Computation by DNA Hairpin Formation. Science 2000, 288, 1223–1226.
