Improving Initial Pool Generation of Direct-Proportional Length-Based DNA Computing by Parallel Overlap Assembly

Zuwairie Ibrahim¹, Yusei Tsuboi¹, Osamu Ono¹, and Marzuki Khalid²

¹ Institute of Applied DNA Computing, Meiji University, 1-1-1 Higashi-mita, Tama-ku, Kawasaki-shi, Kanagawa-ken, 214-8571 Japan
{zuwairie, tsuboi, ono}@isc.meiji.ac.jp
http://www.isc.meiji.ac.jp/~i3erabc/IADC.htm
² Center for Artificial Intelligence and Robotics (CAIRO), Universiti Teknologi Malaysia, Jalan Semarak, 54100 Kuala Lumpur, Malaysia
[email protected]
Abstract. A direct-proportional length-based DNA computing approach for weighted graph problems has been proposed in which the cost of each path is encoded by the length of oligonucleotides in a proportional way. During initial pool generation, a hybridization/ligation phase is carried out in which all the combinations are generated in the solution. However, this encoding suffers from the biological behavior of hybridization, since longer oligonucleotides are more likely to hybridize than shorter ones. Recently, parallel overlap assembly (POA) has been recognized as an efficient initial pool generation method for DNA computing on weighted graph problems. If POA is employed during the initial pool generation of direct-proportional length-based DNA computing, we expect that the biological influence contributed by the various lengths of the oligonucleotides can be minimized. Thus, in this paper we argue that the hybridization/ligation method should be replaced with parallel overlap assembly for better and more efficient initial pool generation in direct-proportional length-based DNA computing.
1 Introduction

Currently, there are three kinds of initial pool generation methods for DNA computing: hybridization/ligation, parallel overlap assembly, and mix/split. The hybridization/ligation method was first introduced by Adleman [1] to solve a Hamiltonian path problem (HPP). During the operation, the link oligonucleotides, or oligos for short, hybridize through hydrogen bonds and are then joined by an enzymatic ligation reaction. The hybridization/ligation reaction is shown in Fig. 1 [2]. Parallel overlap assembly (POA) has been used to construct computational DNA libraries [3] and broadly applied in gene construction [4-6], gene reconstruction [7], and DNA shuffling [8]. POA involves thermal cycling; during each thermal cycle, the position strings in one oligo anneal to the complementary strings of the next oligo. The 3’ end of the oligo is then extended in the presence of polymerase enzyme to form a longer double-stranded DNA (dsDNA), as depicted in Fig. 2 [2]. After a number of thermal cycles, a data pool with all combinations can be built. The mix and split method, on the other hand, was introduced by Faulhammer et al. [9] to generate an initial RNA pool for the knight problem. Later, Braich et al. [10] applied this method to generate an initial pool for a 20-variable 3-SAT problem.

Recently, Lee et al. [2] studied these three methods for initial pool generation in DNA computing and compared the hybridization/ligation method with the POA method. One of their findings is that the pool generation efficiency of the hybridization/ligation method is very low. Thus, this method cannot guarantee a complete pool as the problem size increases, which limits the solvable problem size.
Fig. 1. Hybridization/ligation method for initial pool generation. The arrowhead indicates the 3’ end
Previously, we proposed a direct-proportional length-based DNA computing approach for the shortest path problem [11]. In this approach, the cost of an edge is encoded as an oligo whose length is directly proportional to that cost, and the hybridization/ligation method was proposed for initial pool generation during the computation. Since this results in a large number of combinations, a series of standard bio-molecular laboratory operations is then applied to extract the optimal combination, which represents a solution to the problem. In this paper, however, we argue that the previously proposed hybridization/ligation method for initial pool generation of direct-proportional length-based DNA computing does not efficiently produce all the combinations of the solution. Thus, it is better to use parallel overlap assembly for efficient initial pool generation in direct-proportional length-based DNA computing.
2 Advantages of Parallel Overlap Assembly

According to Lee et al. [2], the advantages of the parallel overlap assembly method over the hybridization/ligation method for initial pool generation are as follows:

• The initial pool size generated from the same amount of initial oligos is about twice as large as that of the hybridization/ligation method. Even so, if a larger problem is considered, this initial pool size is too small to contain the complete pool. With more cycles and a larger experimental scale, however, POA could include the practical pools.
• Initially, two single-stranded DNA molecules partially hybridize in the annealing step and are then extended by polymerase. The elongated DNA molecules are denatured into two single-stranded DNA molecules in the next denaturation step, and they are subjected to the annealing reaction in the next cycle. Therefore, POA maintains the population size, and the population size can be decided by varying the initial number of oligos.
Fig. 2. Parallel overlap assembly for initial pool generation. The thick arrows represent the synthesized oligos which are the input to the computation. The thin arrows represent the elongated part during polymerization. The arrowhead indicates the 3’ end.

• In the hybridization/ligation method, the population size decreases as the reaction progresses, by a factor equal to the number of components composing each strand. As the problem size increases, the required initial pool size increases dramatically. Moreover, initial pool generation by POA requires fewer strands than the hybridization/ligation method to obtain a similar amount of initial pool DNA molecules, because complementary strands are automatically extended by polymerase.
• POA does not require phosphorylation of oligos, which is a prerequisite for the ligation of oligos.
• POA demands less time than the hybridization/ligation method. Hybridization requires one and a half hours, while ligation requires more than 12 hours. In contrast, POA for 34 cycles requires only two hours. Therefore, POA is a much more efficient and economical method for initial pool generation.
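The scaling argument in the bullets above can be made concrete with a few lines of arithmetic. The following is only a toy sketch: the function names and the sample oligo count are hypothetical, and the model assumes, as a simplification, that ligation consumes one oligo per component of each assembled strand while POA extends strands without consuming them. It illustrates the stated scaling only and does not reproduce the measured pool sizes reported in [2].

```python
def ligation_pool_size(n_oligos: int, components_per_strand: int) -> int:
    """Upper bound on assembled strands when each strand consumes
    `components_per_strand` oligos (toy model of hybridization/ligation)."""
    if components_per_strand < 1:
        raise ValueError("a strand needs at least one component")
    return n_oligos // components_per_strand


def poa_pool_size(n_oligos: int) -> int:
    """In this toy model POA extends existing strands, so the count is unchanged."""
    return n_oligos


def minutes_per_poa_cycle(total_hours: float = 2.0, cycles: int = 34) -> float:
    """Average cycle time implied by the quoted figures (2 h for 34 cycles)."""
    return total_hours * 60.0 / cycles


if __name__ == "__main__":
    n = 1200  # hypothetical number of input oligos
    print("ligation:", ligation_pool_size(n, 6), "POA:", poa_pool_size(n))
    print("minutes per POA cycle:", round(minutes_per_poa_cycle(), 1))
```

Under these assumptions the ligation pool shrinks as the number of components per strand grows, whereas the POA pool size is set directly by the number of input oligos.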
3 Direct-Proportional Length-Based DNA Computing

The shortest path problem is chosen as a benchmark. The input to this problem is a weighted, directed graph G = (V, E, ω), a start node u, and an end node v. The output of the shortest path problem is a (u, v) path with the smallest cost. In the case given in Fig. 3, if u is V1 and v is V5, the cost of the shortest path is 150 and the optimal path is V1 – V3 – V4 – V5.
Fig. 3. An example of an input graph for shortest path problem
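As a conventional reference for the benchmark, the shortest path can be checked with Dijkstra's algorithm. The sketch below is not part of the DNA procedure. The edge set is taken from the edges listed later in Table 3, while the individual weights are assumptions inferred from the lengths of the cost sequences under the synthesis rules with α = 1 and β = 20; only the total cost of 150 for V1 – V3 – V4 – V5 is stated in the text.

```python
import heapq

# Hypothetical weights: inferred from cost-sequence lengths, not quoted from Fig. 3.
GRAPH = {
    "V1": {"V2": 80, "V3": 50},
    "V2": {"V3": 70, "V4": 60, "V5": 80},
    "V3": {"V4": 50},
    "V4": {"V5": 50},
    "V5": {},
}


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (cost, path) or None if unreachable."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


if __name__ == "__main__":
    print(shortest_path(GRAPH, "V1", "V5"))
```

With the assumed weights, the result agrees with the cost and path stated above.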
The directed graph shown in Fig. 3 is considered. Let n be the total number of nodes in the graph. The DNA sequences corresponding to all nodes and their complements are designed. Let Vi (i = 1, 2, … , n) and V̄i (i = 1, 2, … , n) be the β-mer DNA sequences corresponding to the ith node in the graph and its complement, respectively. If β = 20, the DNA sequences Vi are designed using the available software for DNA sequence design, DNASequenceGenerator [12], and are listed in Table 1. In Table 1, Vi is separated into Via and Vib, where Via is defined as the 5’-end half and Vib as the 3’-end half of Vi.

The output solution of the PCR operation, which is described later in this section, is subjected to gel electrophoresis. During this operation, the DNA molecules are separated in terms of their length, and hence the DNA molecules V1 – V3 – V4 – V5, representing the shortest path starting from V1 and ending at V5, can be extracted for sequencing.

Table 1. DNA sequences for nodes
Node, Vi    20-mer sequences (5’-3’)
            Via           Vib
V1          TATGCTCATT    TGCATTTTGA
V2          CAGGAGCGTC    TGAGAGCGAG
V3          TACCGGATCG    ACCGGCTAAG
V4          TCGTCCAAGG    GAGGCTTCTC
V5          GCTATATTGC    GCTGGATGTG
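The complement strands mentioned above follow from Watson-Crick base pairing (A with T, G with C). A minimal sketch, using the node sequences from Table 1; the helper name `reverse_complement` and the dictionary `NODES` are hypothetical, and all strands are written in the 5’-3’ direction.

```python
# Watson-Crick pairing table.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Node sequences from Table 1 (Via + Vib).
NODES = {
    "V1": "TATGCTCATT" + "TGCATTTTGA",
    "V2": "CAGGAGCGTC" + "TGAGAGCGAG",
    "V3": "TACCGGATCG" + "ACCGGCTAAG",
    "V4": "TCGTCCAAGG" + "GAGGCTTCTC",
    "V5": "GCTATATTGC" + "GCTGGATGTG",
}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'-3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, seq in NODES.items():
        print(name, seq, reverse_complement(seq))
```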
We introduce three rules to synthesize oligos for each edge in the graph as follows:

• If there is a connection from V1 to Vj, synthesize the oligo for the edge as V1 (β) + W1j (ω1jα − 3β/2) + Vja (β/2)
• If there is a connection from Vi to Vj, where i ≠ 1 and j ≠ n, synthesize the oligo for the edge as Vib (β/2) + Wij (ωijα − β) + Vja (β/2)
• If there is a connection from Vi to Vn, synthesize the oligo for the edge as Vib (β/2) + Win (ωinα − 3β/2) + Vn (β)

The synthesized oligos consist of three segments. The number of DNA bases (nucleotides) in each segment is shown in parentheses, α is the direct-proportional factor, β is the length of the node sequences, and ‘+’ represents the join. Wij denotes the DNA sequence representing the cost between nodes Vi and Vj, and ωij is the corresponding weight. With α = 1, the DNA sequences for the costs Wij of the edges from Vi to Vj are designed according to the synthesis rules using DNASequenceGenerator [12], and the results are listed in Table 2. Table 3, on the other hand, lists all the synthesized oligos based on the proposed synthesis rules. The complement oligos of each node and cost are synthesized as well.

Then, all the synthesized oligos are poured into a test tube for initial pool generation. The generation of an initial pool solution is based on the hybridization/ligation method. If the shortest path V1 - V3 - V4 - V5 is emphasized, Fig. 4 shows which kinds of oligos are important for the generation of this path. However, the unwanted combinations are also generated in the same manner. At this stage, an initial pool of solutions has been produced, and the next step is to filter out the optimal combinations among the vast number of alternative combinations of the problem.
Unlike conventional filtering, this process does not merely throw away the unwanted DNA duplexes but rather copies the target DNA duplexes exponentially by using the highly sensitive PCR process. This is done by amplifying the DNA molecules that contain the start node V1 and end node V5 using primers. After the PCR operation is accomplished, there should be numerous DNA strands that start at node V1, end at node V5, and travel through a possible number of nodes.
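The selection logic of the PCR and gel electrophoresis steps can be mimicked in software. The sketch below is an abstraction under stated assumptions: strands are represented as lists of node names rather than sequences, the pool is idealized as every directed path in the graph, the edge weights are hypothetical values inferred from the cost-sequence lengths with α = 1, and the function names are our own. Strand length is taken as α times the sum of edge weights, which is what the three synthesis rules yield for a complete V1-to-Vn path.

```python
# Hypothetical weights: inferred from cost-sequence lengths, not quoted from Fig. 3.
EDGES = {
    ("V1", "V2"): 80, ("V1", "V3"): 50, ("V2", "V3"): 70, ("V2", "V4"): 60,
    ("V2", "V5"): 80, ("V3", "V4"): 50, ("V4", "V5"): 50,
}


def all_paths(edges):
    """Idealized initial pool: every simple directed path with at least one edge."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    pool = []

    def walk(path):
        for nxt in succ.get(path[-1], []):
            if nxt not in path:
                longer = path + [nxt]
                pool.append(longer)
                walk(longer)

    for node in sorted({u for u, _ in edges} | {v for _, v in edges}):
        walk([node])
    return pool


def pcr_select(pool, start, end):
    """Keep only strands that carry both primer sites (start and end node)."""
    return [p for p in pool if p[0] == start and p[-1] == end]


def strand_length(path, edges, alpha=1):
    """Length in bases of a complete start-to-end strand."""
    return alpha * sum(edges[(a, b)] for a, b in zip(path, path[1:]))


def gel_shortest(strands, edges):
    """Gel electrophoresis separates by length; take the shortest band."""
    return min(strands, key=lambda p: strand_length(p, edges))


if __name__ == "__main__":
    amplified = pcr_select(all_paths(EDGES), "V1", "V5")
    best = gel_shortest(amplified, EDGES)
    print(best, strand_length(best, EDGES))
```

Note that real PCR amplifies rather than filters, so the unwanted strands remain in the tube at a much lower concentration; the list filter here only models which strands dominate afterwards.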
Fig. 4. DNA duplex for shortest path represented as V1 - V3 - V4 - V5 based on hybridization/ligation method. The arrowhead indicates the 3’ end
Table 2. DNA sequences for costs
Cost, Wij   DNA sequences (5’-3’)
W13         GAAGTGTACGTTAGGCTGCT
W45         AAAGGTCGTCTTTGAACGAG
W34         AAAGGCCCTCTTTTAACGAAGTCCTGTACT
W24         AAAGCCCGTCGGTTAAGCAAGTAGTTTACGCTGCGTCATT
W25         GCGTTGTTGCGACATGTGGAGAATTGATCGCTTTCGTGCATAACTGGG
W12         CAGCATCGTAGTAGAGCTAGTATCGAACTGATAAGTAACGGAGGGGGCTC
W23         AAAGCTCGTCGTTTAAGGAAGTACGGTACTATGCGTGATTTGGAGGTGGA
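The three synthesis rules introduced earlier in this section can be sketched in code using the sequences of Tables 1 and 2. Each rule gives an oligo of total length ωα, for example β + (ωα − 3β/2) + β/2 = ωα for the first rule. In the sketch below the function names are hypothetical, and the weight of 50 used in the usage lines is an assumption inferred from the 20-mer and 30-mer cost sequences with α = 1 and β = 20.

```python
BETA = 20   # node sequence length, as in the text
ALPHA = 1   # direct-proportional factor, as in the text

# (Via, Vib) halves from Table 1, keyed by node index.
NODE_HALVES = {
    1: ("TATGCTCATT", "TGCATTTTGA"),
    2: ("CAGGAGCGTC", "TGAGAGCGAG"),
    3: ("TACCGGATCG", "ACCGGCTAAG"),
    4: ("TCGTCCAAGG", "GAGGCTTCTC"),
    5: ("GCTATATTGC", "GCTGGATGTG"),
}

# Three cost sequences from Table 2, keyed by (i, j).
COSTS = {
    (1, 3): "GAAGTGTACGTTAGGCTGCT",
    (3, 4): "AAAGGCCCTCTTTTAACGAAGTCCTGTACT",
    (4, 5): "AAAGGTCGTCTTTGAACGAG",
}


def cost_length(i, j, n, weight, alpha=ALPHA, beta=BETA):
    """Number of bases of Wij required by the synthesis rules."""
    if i == 1 and j == n:
        raise ValueError("a direct V1-Vn edge is not covered by the three rules")
    if i == 1 or j == n:
        return weight * alpha - 3 * beta // 2
    return weight * alpha - beta


def edge_oligo(i, j, n, w_seq):
    """Join the node segments and the cost segment for edge Vi -> Vj."""
    if i == 1 and j == n:
        raise ValueError("a direct V1-Vn edge is not covered by the three rules")
    a_i, b_i = NODE_HALVES[i]
    a_j, b_j = NODE_HALVES[j]
    head = a_i + b_i if i == 1 else b_i   # full V1, otherwise Vib
    tail = a_j + b_j if j == n else a_j   # full Vn, otherwise Vja
    return head + w_seq + tail


if __name__ == "__main__":
    print(cost_length(3, 4, 5, weight=50))      # bases needed for W34
    print(edge_oligo(3, 4, 5, COSTS[(3, 4)]))   # compare with Table 3
```

The oligo produced for V3b – W34 – V4a can be compared with the corresponding entry of Table 3.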
4 Application of Parallel Overlap Assembly for Direct-Proportional Length-Based DNA Computing

Until now, there have been three kinds of DNA computing approaches for weighted problems: length-based [11,13], concentration-controlled [14], and melting temperature-based DNA computing [15]. The last two approaches have been successfully implemented in laboratory experiments. However, the first one has so far only been presented and discussed theoretically. One of the reasons is that, due to the thermodynamics of hybridization, it is expected to be very difficult to generate all the combinations of the solution by using the hybridization/ligation method.
As stated by Lee et al. [15], “In addition, the fact that larger weights are encoded as longer sequences is contrary to the biological fact that; the longer the sequences are, the more likely they hybridize with other DNA strands, though we have to find the shortest DNA strands”. From the biological point of view, this argument is valid. In order to overcome this limitation of general length-based DNA computing, we propose utilizing the parallel overlap assembly method for initial pool generation, the phase in which numerous combinations of random routes of the graph are generated in the solution. With POA, the biological influence contributed by the length of the oligos is expected to be minimized.

Table 3. DNA sequences for edges
Edges              DNA sequences (5’-3’)
V1 – W13 – V3a     5’-TATGCTCATTTGCATTTTGGAAGTGTACGTTAGGCTGCTTACCGGATCG-3’
V4b – W45 – V5     5’-GAGGCTTCTCAAAGGTCGTCTTTGAACGAGGCTATATTGCGCTGGATGTG-3’
V3b – W34 – V4a    5’-ACCGGCTAAGAAAGGCCCTCTTTTAACGAAGTCCTGTACTTCGTCCAAGG-3’
V2b – W24 – V4a    5’-TGAGAGCGAGAAAGCCCGTCGGTTAAGCAAGTAGTTTACGCTGCGTCATTTCGTCCAAGG-3’
V2b – W23 – V3a    5’-TGAGAGCGAGAAAGCTCGTCGTTTAAGGAAGTACGGTACTATGCGTGATTTGGAGGTGGATACCGGATCG-3’
V2b – W25 – V5     5’-TGAGAGCGAGGCGTTGTTGCGAGGCATGTGGAGAATTGATCGCTTTCGTGCATAACTGGGGCTATATTGCGCTGGATGTG-3’
V1 – W12 – V2a     5’-TATGCTCATTTGCATTTTGACAGCATCGTAGTAGAGCTAGTATCGAACTGATAAGTAACGGAGGGGGCTCCAGGAGCGTC-3’
In order to generate the initial pool of the direct-proportional length-based DNA computing for the example problem by using the POA method, the input consists of all the synthesized oligos listed in Table 3 together with the complement sequences of the nodes and costs. These inputs are poured into a test tube and the cycles begin. The operation of POA is similar to the polymerase chain reaction (PCR), the difference being that POA operates without primers. As in PCR, one cycle consists of three steps: hybridization (annealing), extension, and denaturation. During the annealing step, the temperature is decreased slowly so that partial hybridization can occur at the respective locations. The extension step is carried out in the presence of polymerase enzyme, and polymerization proceeds in the 5’ to 3’ direction. Then, the generated double-stranded DNA molecules are separated in the denaturation step by increasing the temperature until they become single-stranded DNA molecules. An example of one cycle of POA is depicted in Fig. 5. Furthermore, the DNA molecules generated during the first stage, the second stage, and the final product are shown in Fig. 6, Fig. 7, and Fig. 8, respectively.
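The cycle described above can be illustrated with a toy string model. This is a sketch under simplifying assumptions, not a thermodynamic simulation: every species is stored by its sense-strand sequence and is assumed to be available in both senses, annealing is reduced to an exact suffix/prefix match of at least `min_len` bases, and extension always runs to the end of the partner strand. The function names are hypothetical; the sequences come from Tables 1 and 2 and cover only the path V1 – V3 – V4 – V5.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`
    (at least `min_len`), or 0 if there is none."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def poa_cycle(pool: set, min_len: int = 10) -> set:
    """One anneal/extend/denature cycle on sense-strand representations."""
    products = set(pool)  # denaturation returns the existing strands
    for a in pool:
        for b in pool:
            if a == b:
                continue
            k = overlap(a, b, min_len)
            if k:
                products.add(a + b[k:])  # 3' extension along the partner
    return products


# Sequences from Tables 1 and 2.
V1 = "TATGCTCATT" + "TGCATTTTGA"
V3A, V3B = "TACCGGATCG", "ACCGGCTAAG"
V4A, V4B = "TCGTCCAAGG", "GAGGCTTCTC"
V5 = "GCTATATTGC" + "GCTGGATGTG"
W13 = "GAAGTGTACGTTAGGCTGCT"
W34 = "AAAGGCCCTCTTTTAACGAAGTCCTGTACT"
W45 = "AAAGGTCGTCTTTGAACGAG"

E13 = V1 + W13 + V3A          # edge oligo V1 - W13 - V3a
E34 = V3B + W34 + V4A         # edge oligo V3b - W34 - V4a
E45 = V4B + W45 + V5          # edge oligo V4b - W45 - V5
INPUT_POOL = {E13, E34, E45, V3A + V3B, V4A + V4B}
FULL_PATH = V1 + W13 + V3A + V3B + W34 + V4A + V4B + W45 + V5

if __name__ == "__main__":
    pool = set(INPUT_POOL)
    for cycle in range(1, 4):
        pool = poa_cycle(pool)
        print(cycle, len(pool), FULL_PATH in pool)
```

In this model the strand count never decreases from one cycle to the next, which mirrors the claim that POA maintains the population size, and the full-length strand for the path appears after a few cycles of overlap and extension.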
Fig. 5. An example of one cycle operation of parallel overlap assembly (POA). A cycle consists of hybridization, extension, and denaturation step. The thin arrow lines indicate the extension parts. The arrowhead indicates the 3’ end
Fig. 6. First stage of parallel overlap assembly (POA). The thin arrow lines indicate the extension parts. The arrowhead indicates the 3’ end
Fig. 7. Second stage of parallel overlap assembly (POA). The thin arrow lines indicate the extension parts. The arrowhead indicates the 3’ end
The final product of POA is then subjected to subsequent biochemical operations as previously proposed for the direct-proportional length-based DNA computing for weighted graph problems.
Fig. 8. Final product of parallel overlap assembly (POA). The thin arrow lines indicate the extension parts. The arrowhead indicates the 3’ end
5 Conclusions

Direct-proportional length-based DNA computing for weighted graph problems has been proposed in order to overcome the shortcomings of constant-proportional length-based DNA computing. In this paper, some advantages of parallel overlap assembly over the hybridization/ligation method for initial pool generation, as reported in the literature, are highlighted. It has been found that the previously proposed direct-proportional length-based DNA computing, which made use of the hybridization/ligation method for initial pool generation, has low efficiency. This is because, during hybridization, the longer oligonucleotides, which represent higher costs, are more likely to hybridize even though the hybridization of shorter oligonucleotides is required. Hence, parallel overlap assembly should be the better choice for initial pool generation. We suggest that the parallel overlap assembly method be applied during initial pool generation to increase the correctness of direct-proportional length-based DNA computing for weighted graph problems.
Acknowledgements

This research was supported partly by the IEEE Computational Intelligence Society (CIS) Walter J Karplus Student Summer Research Grant 2004 for a research visit in September 2004 at the DNA Computing Laboratory, Hokkaido University, Japan. The first author would like to thank Associate Professor Masahito Yamamoto and the members of the DNA Computing Laboratory for discussions that led to improvements in this work during the research visit. The first author is also very thankful to Universiti Teknologi Malaysia for a study leave. Finally, the last author wishes to thank the Hitachi Scholarship Foundation for its support in granting a research travel grant to Meiji University.
References

1. Adleman, L.: Molecular Computation of Solutions to Combinatorial Problems. Science, Vol. 266 (1994) 1021-1024
2. Lee, J.Y., Lim, H.W., Yoo, S.I., Zhang, B.T., Park, T.H.: Efficient Initial Pool Generation for Weighted Graph Problems Using Parallel Overlap Assembly. Preliminary Proceeding of the 10th International Meeting on DNA Computing (2004) 357-364
3. Kaplan, P.D., Ouyang, Q., Thaler, D.S., Libchaber, A.: Parallel Overlap Assembly for the Construction of Computational DNA Libraries. Journal of Theoretical Biology, Vol. 188, Issue 3 (1997) 333-341
4. Ho, S.N., Hunt, H.D., Horton, R.M., Pullen, J.K., Pease, L.R.: Site-Directed Mutagenesis by Overlap Extension Using the Polymerase Chain Reaction. Gene, Vol. 77 (1989) 51-59
5. Jayaraman, K., Fingar, S.A., Fyles, J.: Polymerase Chain Reaction-Mediated Gene Synthesis: Synthesis of a Gene Coding for Isozyme c of Horseradish Peroxidase. Proc. Natl. Acad. Sci. U.S.A., Vol. 88 (1991) 4084-4088
6. Stemmer, W.P., Crameri, A., Ha, K.D., Brennan, T.M., Heyneker, H.L.: Single-Step Assembly of a Gene and Entire Plasmid from Large Numbers of Oligodeoxyribonucleotides. Gene, Vol. 164 (1995) 49-53
7. DeSalle, R., Barcia, M., Wray, C.: PCR Jumping in Clones of 30-million-year-old DNA Fragments from Amber Preserved Termites. Experientia, Vol. 49 (1993) 906-909
8. Stemmer, W.P.: DNA Shuffling by Random Fragmentation and Reassembly: In Vitro Recombination for Molecular Evolution. Proc. Natl. Acad. Sci. U.S.A., Vol. 91 (1994) 10747
9. Faulhammer, D., Cukras, A.R., Lipton, R.J., Landweber, L.F.: Molecular Computation: RNA Solutions to Chess Problems. Proc. Natl. Acad. Sci. U.S.A., Vol. 98 (2000) 1385-1389
10. Braich, R.S., Chelyapov, N., Johnson, C., Rothermund, P.W.K., Adleman, L.: Solution of a 20-variable 3-SAT Problem on a DNA Computer. Science, Vol. 296 (2002) 499-502
11. Ibrahim, Z., Tsuboi, Y., Ono, O., Khalid, M.: Direct-Proportional Length-Based DNA Computing for Shortest Path Problem. International Journal of Computer Science and Applications, Vol. 1, Issue 1 (2004) 46-60
12. Udo, F., Sam, S., Wolfgang, B., Hilmar, R.: DNASequenceGenerator: A Program for the Construction of DNA Sequences. Proceedings of the Seventh International Workshop on DNA Based Computers (2001) 23-32
13. Narayanan, A., Zorbalas, S.: DNA Algorithms for Computing Shortest Paths. Proceedings of Genetic Programming (1998) 718-723
14. Yamamoto, Y., Kameda, A., Matsuura, N., Shiba, T., Kawazoe, Y., Ahochi, A.: Local Search by Concentration-Controlled DNA Computing. International Journal of Computational Intelligence and Applications, Vol. 2 (2002) 447-455
15. Lee, J.Y., Shin, S.Y., Augh, S.J., Park, T.H., Zhang, B.T.: Temperature Gradient-Based DNA Computing for Graph Problems with Weighted Edges. Lecture Notes in Computer Science, Vol. 2568 (2003) 73-84