Parameter-Specific FPGA Implementation of Edit-Distance Calculation

Kenneth B. Kent, Ryan B. Proudfoot, Yong Zhao
Faculty of Computer Science
University of New Brunswick
Fredericton, New Brunswick, Canada
{ken, k0a62, b15v3}@unb.ca

Abstract

Biologists require ways to rapidly sequence vast amounts of DNA information. One approach to satisfying this demand is to provide hardware support and leverage parallel computation. For hardware acceleration, a custom, application-specific circuit is known to provide a high-performance solution. Striking a balance between delivering an application-specific circuit and achieving optimal utilization of a Field Programmable Gate Array (FPGA) is a difficult task. This paper presents a technique that generates a custom circuit for a given parameter set for the edit-distance problem of comparing two sequences for similarity.

1 Introduction

As genome projects attempt to unlock the mysteries of DNA, their databases are becoming very large. Scientists try to find the similarity between a number of “amino-acid producing” DNA sequences and a genuine DNA sequence of an individual in the hope of identifying the protein encoded by the individual's DNA sequence [1]. Scientists also use this approach to study evolutionary trends between species, as well as disease and inheritance. With this knowledge, better treatments for a disease may be developed and people susceptible to certain diseases can be identified. These are just a few of the many applications in which rapidly sequencing DNA information is becoming more important.

To determine which amino acids are produced by each part of a DNA sequence we need to find the similarity between two sequences. This is done through sequence alignment, also known as the minimum string edit-distance. Different algorithms are currently available for calculating the sequence alignment, and much research has been performed on implementing these algorithms efficiently [2,8,11,15]. The most common of these are FASTA [9] and BLAST [10]. The problem with several methods is the use of heuristics to shorten the search; while this greatly increases performance, it comes at the price of accuracy. Other approaches, such as Paracel's GeneMatcher2, use an ASIC to accelerate the sequence search [11].

The implementation proposed in this paper uses a fixed-weight version of the edit-distance algorithm for homology searches and sequence alignment in genetic databases [12]. This algorithm makes all pairwise comparisons between the two strings and is therefore complete but computationally expensive. For this reason, hardware-based solutions such as Field Programmable Gate Arrays (FPGAs) are appealing. FPGAs allow us both to exploit hardware parallelism and to rapidly prototype our implementation. They also let us customize the circuit for each different set of parameters. FPGAs were first used on the Splash 2 system [13], and on others such as TimeLogic's DeCypher [14] and Hokiegene [15].

The purpose of the implementation proposed in this paper is to allow for rapid prototyping of the FPGA circuits. The method described below lets the user specify a parameter set and takes care of all the work required to implement the circuit. It generates the source files necessary to synthesize, place and route the circuit without any extra user input or knowledge of how to connect hundreds or thousands of processing elements. A user can thus simply specify parameters and rapidly prototype the circuit to get quick answers to questions such as: is the device I want to use large enough for my parameter set? Or, perhaps a better question: can I extract any more parallelism from the device using unused resources? The parameter-specific generator proposed in this implementation does just that, allowing the user to rapidly prototype different parameter sets and see which set or sets best suit their needs.
The rest of the paper is organized as follows. Section 2 describes the edit-distance algorithm and the optimizations to it used in this design. Section 3 gives the FPGA implementation of the algorithm. Section 4 describes the software component of the design, which allows for rapid generation of a customized circuit given a set of parameters. Section 5 outlines the results obtained through testing the implementation.

Proceedings of the Seventeenth IEEE International Workshop on Rapid System Prototyping (RSP'06) 0-7695-2580-6/06 $20.00 © 2006 IEEE

2 Edit-Distance Algorithm

The edit-distance algorithm is a dynamic programming technique that uses a two-dimensional table to compute the similarity of two DNA or protein sequences. The resulting score can be seen as the number of mutations necessary to change the query sequence (source, S) into the database sequence (target, T). The concept of edit-distance is not unique to DNA sequencing, but DNA sequencing allows for many simplifications that cannot necessarily be realized under other circumstances.

$$
d(i,j) =
\begin{cases}
i & \text{when } j = 0 \\
j & \text{when } i = 0 \\
\min\bigl\{\, d(i-1,j-1) + C(i,j),\; d(i-1,j) + w_i,\; d(i,j-1) + w_d \,\bigr\} & \text{otherwise}
\end{cases}
$$

Equation 1: The Processing Element

Equation 1 shows the formula used to calculate each cell in the two-dimensional array, where i and j are the row and column indices respectively. The value w_d is the cost of deleting a single element from the string, w_i is the cost of inserting a single element into the string, and C(i,j) is zero if the i-th symbol of S is the same as the j-th symbol of T; otherwise C(i,j) equals w_m, the cost of mutating (substituting) an element of one string into the element of the other string. The most commonly used values for insertion, deletion and substitution are w_d = w_i = 1 and w_m = 2, and we use these values in our implementation [12,16]. All of the d(i,j) values can be stored in a two-dimensional table in which the bottom right-hand cell, d(m,n) with m = |S| and n = |T|, contains the global edit-distance between S and T. The complexity of a naïve implementation of this algorithm (where one cell is updated per time step) is O(mn), where m and n are the lengths of the two sequences. Figure 1 depicts a small example of a completed table of the edit-distance calculation.
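As an illustration of Equation 1, the following is a minimal software sketch of the naïve one-cell-at-a-time evaluation. The function name and argument names are our own and are not taken from the paper; the weights default to the values w_i = w_d = 1 and w_m = 2 used in this work, and the boundary values assume those unit insertion and deletion costs. With the example strings ACG (source) and TGG (target) it reproduces the values of Figure 1.

```python
def edit_distance_table(source, target, w_ins=1, w_del=1, w_mut=2):
    """Fill the (m+1) x (n+1) table of Equation 1, one cell per step."""
    m, n = len(source), len(target)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # boundary case: d(i,0) = i
    for j in range(n + 1):
        d[0][j] = j  # boundary case: d(0,j) = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # C(i,j) is 0 on a match and w_mut on a mismatch.
            cost = 0 if source[i - 1] == target[j - 1] else w_mut
            d[i][j] = min(d[i - 1][j - 1] + cost,  # substitute or match
                          d[i - 1][j] + w_ins,     # vertical neighbour
                          d[i][j - 1] + w_del)     # horizontal neighbour
    return d


table = edit_distance_table("ACG", "TGG")
for row in table:
    print(row)
# The bottom right-hand cell table[3][3] holds the global edit-distance.
```

The two nested loops make the O(mn) serial cost explicit; the parallel schemes discussed next remove that cost by updating many cells per time step.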

         T   G   G
     0   1   2   3
 A   1   2   3   4
 C   2   3   4   5
 G   3   4   3   4

Figure 1: Edit-Distance Algorithm Example

Each cell (i,j) in the matrix relies on the cells immediately to the left (i,j-1), above (i-1,j), and diagonally above and to the left (i-1,j-1). This localization of data dependences to neighboring cells allows the use of a systolic array in the algorithm implementation. The order in which the cells of the table are calculated can produce an increase in performance [17]. On close examination of the problem it becomes apparent that all the cells along a diagonal can be calculated at once. This is the basis of many parallel implementations, and it permits up to min(m,n) cells to be calculated in parallel.

A naïve approach is to construct the entire two-dimensional table of (m+1)*(n+1) cells and propagate the values through the matrix, starting at cell (1,1) and ending at cell (m,n). The table contains m+n-1 such diagonals, so this approach runs in linear parallel time rather than quadratic serial time. This is a huge improvement in the time required to calculate the edit-distance. The problem with this approach is that it requires a quadratic number of processors, which becomes very impractical when S and T are large.

A further optimization proposed by Lipton and Lopresti consists of a two-channel, two-way systolic array in which each processor can communicate with the processors immediately to its left and right in the array [17]. This solution utilizes a systolic array of 2*max(|S|,|T|) - 1 processing elements, resulting in a time complexity of O(m+n-1). In this implementation the source and target strings are shifted in simultaneously from the left and right respectively. At each clock cycle, each processor containing two characters performs the edit-distance calculation, d(i,j), using the same rules as the dynamic programming algorithm. When the characters are shifted out they carry with them the values of the last row and column of the matrix, and consequently the result.
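The diagonal ordering described above can be sketched in software as follows. This is only an illustration of the evaluation order, not of the systolic hardware: the function name is hypothetical, the inner loop runs serially here, and the sketch assumes both strings are non-empty. Each pass of the outer loop corresponds to one parallel time step, because every cell on a diagonal depends only on the two previous diagonals.

```python
def edit_distance_wavefront(source, target, w_ins=1, w_del=1, w_mut=2):
    """Evaluate Equation 1 diagonal by diagonal; return (distance, steps)."""
    m, n = len(source), len(target)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    steps = 0
    # Interior cells with i + j == k lie on one diagonal; k runs 2 .. m + n.
    for k in range(2, m + n + 1):
        lo = max(1, k - n)   # first row index on this diagonal
        hi = min(m, k - 1)   # last row index on this diagonal
        for i in range(lo, hi + 1):  # independent cells: parallel in hardware
            j = k - i
            cost = 0 if source[i - 1] == target[j - 1] else w_mut
            d[i][j] = min(d[i - 1][j - 1] + cost,
                          d[i - 1][j] + w_ins,
                          d[i][j - 1] + w_del)
        steps += 1
    return d[m][n], steps


distance, steps = edit_distance_wavefront("ACG", "TGG")
print(distance, steps)  # steps equals m + n - 1 diagonals
```

At most min(m,n) cells are active on any one diagonal, which is the degree of parallelism quoted in the text.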

2.1 Minimizing Processor Size

DNA sequences can be several thousand bases long, so it is necessary to fit a large number of processors onto a single chip. Even with the gains from minimizing the number of required processors, the data path is still a limitation on the system's feasibility. The largest factor in the size of each processing element is that it must compare and add relatively large data values, on the order of log(n) bits [17]. Lipton and Lopresti observed that in the special case where w_i = w_d = 1 and w_m = 2, a constant 2 bits is sufficient for string comparisons of any length [17]. This is because the state values of adjacent processors do not differ greatly in magnitude, so only the low-order bits need to be passed between processors. Using Equation 1, determining the remainder of d modulo 4 only requires knowing the characters S_i and T_j and the remainders modulo 4 of a, b, and c, where a, b, and c are d(i-1,j-1), d(i-1,j), and d(i,j-1) respectively. It has been proven that the values of horizontal and vertical neighbors in the matrix differ by ±1. Hence, the only possible results of Equation 1 are d = a or d = a + 2, and it can be rewritten as Equation 2:

$$
d(i,j) =
\begin{cases}
a & \text{if } b \text{ or } c \text{ equal } a - 1 \text{ or } S_i = T_j \\
a + 2 & \text{if } b \text{ and } c \text{ equal } a + 1 \text{ and } S_i \neq T_j
\end{cases}
$$

Equation 2: Reduced Processing Element

Due to the simplified calculation, the output of the circuit is the result modulo 4. To produce the full result, a simple analysis of an exiting string is required. The exiting strings carry the entire last row and column of the original matrix. The values accompanying the source string as it exits the array are read using an up/down counter. The counter starts equal to the length of the target string modulo 4. As the values are pumped out of the array, the counter is incremented if the value is one greater than the previous output (mod 4) and decremented if it is one less than the previous output (mod 4).

3 FPGA Implementation

Using the modifications from the previous section, the complexity of the algorithm becomes O(m+n-1). The implementation consists of an array of identical processing elements, an up/down counter, a simple controller to pump the strings through the array, and a software component that generates the VHDL specification for a specific circuit customized to the problem parameters.

[Figure 2: Systolic Array of Processing Elements. Each PE has inputs Sin, Tin, b and c, outputs Sout, Tout and d, and holds the value a.]

The processing element, which performs Equation 2 and passes the result to its immediate neighbours, utilizes eight Virtex slices. The processing element is initialized to an initial value ‘a’ according to its location i in the array chain using Equation 3:

$$
a =
\begin{cases}
[\max(|S|,|T|) - i] \bmod 4 & \text{if } i \le \max(|S|,|T|) \\
[i - \max(|S|,|T|)] \bmod 4 & \text{if } \max(|S|,|T|) < i < 2\max(|S|,|T|)
\end{cases}
$$

Equation 3: Initialization of Processing Elements

These values correspond to the first row and column of the two-dimensional matrix: the centre processing element is set to zero, the processors to its immediate left and right get the value one, the ones next to those get the value two, and so on. This initial value is set using the value ‘aIN’ during the VHDL source code generation step described in Section 4. Figure 3 shows all the inputs and outputs of the processing element designed for this implementation.

[Figure 3: Circuit I/O Controls]
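The two-bit reduction of Equation 2 and the initial values of Equation 3 can be checked in software. The sketch below is not the VHDL processing element; it is a table-based model with hypothetical function names that stores only d mod 4 in each cell, applies the rule of Equation 2, and compares the result against the full table of Equation 1. The closed form |i - max| mod 4 used for the initial values is our reading of Equation 3.

```python
def full_table(s, t):
    """Reference table of Equation 1 with w_i = w_d = 1 and w_m = 2."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 2
            d[i][j] = min(d[i - 1][j - 1] + cost, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d


def reduced_table(s, t):
    """Same table, but every cell keeps only two bits (d mod 4), per Equation 2."""
    m, n = len(s), len(t)
    r = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        r[i][0] = i % 4
    for j in range(n + 1):
        r[0][j] = j % 4
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a, b, c = r[i - 1][j - 1], r[i - 1][j], r[i][j - 1]
            a_minus_1 = (a - 1) % 4
            if b == a_minus_1 or c == a_minus_1 or s[i - 1] == t[j - 1]:
                r[i][j] = a            # d = a
            else:
                r[i][j] = (a + 2) % 4  # d = a + 2
    return r


def initial_values(max_len):
    """Initial 'a' for PEs 1 .. 2*max_len - 1 (our reading of Equation 3)."""
    return [abs(i - max_len) % 4 for i in range(1, 2 * max_len)]


full = full_table("ACG", "TGG")
reduced = reduced_table("ACG", "TGG")
print(all(reduced[i][j] == full[i][j] % 4 for i in range(4) for j in range(4)))
print(initial_values(3))  # centre PE is 0, its neighbours 1, then 2
```

The comparison relies on the ±1 neighbour property cited from [17]: because b and c can only be a - 1 or a + 1, their residues modulo 4 are enough to tell the two cases apart.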

When a processing element contains both a source and a target character, a comparison takes place using the values of Sin, Tin, b, c and the value stored in the PE from the previous calculation, ‘a’. The processing element is initialized to an initial value, ‘a’, according to its location in the array chain using Equation 3. The initial value corresponds to the first row and column of the two-dimensional matrix: the centre PE is set to zero, the PEs to its immediate left and right to one, the subsequent neighbors to two, and so on.

The processing elements are connected to form a two-way, two-channel systolic array as shown in Figure 2. The implementation requires 2*max(|S|,|T|)-1 processing elements. At each time step every PE that contains two characters performs a comparison as described earlier. The result of the comparison is stored as a two-bit value ‘a’ in that PE, and the most significant bit is passed as the output ‘d’ to the PEs to its immediate left and right, as shown in Figure 2. The ‘d’ output of any PE becomes the ‘c’ input of the processor to its left and the ‘b’ input of the processor to its right. Figure 4 shows the calculated values as the two strings pass through the array. Remember that the most significant bit of ‘a’ is passed to the processing elements to its immediate right and left. The values shown in bold indicate that the calculation at the last time step changed the value of ‘a’. The value in brackets holds the last calculated value for that letter as it exits the array.

[Figure 4: Example showing how the values propagate through the array.]

The typical length of strings ranges from 100 to 1000 characters, thus requiring approximately 200 to 2000 PEs. The software component of the implementation is responsible for generating the systolic array based on various parameters provided by the user.

The controller consists of logic and two caches: one cache for the source string and another for the target string(s) to be analyzed. The controller logic is primarily responsible for pumping the two strings through the systolic array(s) and collecting the result(s). It pumps the source string into the first processing element and the target string into the last processing element. When both input strings have been pumped into the array, a counter runs from 1 to 2*|S| - 1. During this interval, results from the array are collected. After completion the array is reset to the initial values and a new calculation can begin.

An up/down counter is used to transform the mod 4 results into an edit-distance. The up/down counter is connected to the source string side output of the array. An internal state signal of two bits is used to keep track of the expected output from the array, as described in Section 2.1. However, the array only outputs the second to least significant bit. Figure 5 shows how to calculate the result based on the output bit and the expected value, and Figure 6 shows an example of the edit-distance being calculated. [S]tate is a two-bit vector and d is the one-bit output from the array.

    if {[state(0) XOR state(1) = '0'] AND d = '0'} OR
       {[state(1) XOR state(0) = '1'] AND d = '1'} then
        EditDistance = EditDistance + 1
    else
        EditDistance = EditDistance - 1
    end if

Figure 5: Expected State Calculation

         T   G   G    Up/Down Counter (state)    Edit Distance
     0   1   2   3    3 or “11”                  3
 A   1   2   3   0    0 or “00”                  +1 = 4
 C   2   3   0   1    1 or “01”                  +1 = 5
 G   3   0   3   0    0 or “00”                  -1 = 4

Figure 6: Example Using Up/Down Counter
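The rule of Figure 5 can be exercised with a short software model. The sketch below is not the VHDL counter; the function names are hypothetical, and the way the two-bit state is stepped together with the count (plus or minus one, modulo 4) is our assumption, chosen so that the state follows the column shown in Figure 6. The input is the stream of single-bit outputs d, that is, bit 1 of each mod 4 value leaving the array.

```python
def second_lsb_stream(values):
    """Bit 1 (the second least significant bit) of each value, as output by the array."""
    return [(v % 4) >> 1 for v in values]


def decode_edit_distance(start, d_bits):
    """Rebuild the full count from one-bit outputs using the rule of Figure 5."""
    edit_distance = start
    state = start % 4  # two-bit expected value (assumed to track the count mod 4)
    for d in d_bits:
        s0 = state & 1
        s1 = (state >> 1) & 1
        if ((s0 ^ s1) == 0 and d == 0) or ((s1 ^ s0) == 1 and d == 1):
            edit_distance += 1
            state = (state + 1) % 4
        else:
            edit_distance -= 1
            state = (state - 1) % 4
    return edit_distance


# Last column of the example table (source ACG, target TGG): 3, then 4, 5, 4.
bits = second_lsb_stream([4, 5, 4])
print(bits)
print(decode_edit_distance(3, bits))
```

Starting from 3, the length of the target string, the three steps are +1, +1 and -1, matching the right-hand column of Figure 6. A single bit suffices because consecutive values differ by exactly one, so bit 1 of the new value distinguishes the increment from the decrement.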

4 Customized Circuit Generation

The edit-distance problem is often used in the context of various parameters, namely the number of strings to be compared and the string length. With configurable hardware technology it is possible to generate a circuit that delivers optimal performance by customizing the circuit. This can be achieved by generating a circuit that: i) uses a systolic array length that precisely matches the needs of the input strings; and ii) maximizes the number of strings that are compared in parallel.

Circuit generation utilizes a template circuit for the design that is specified in VHDL. This template circuit is preprocessed to describe a specific circuit for a given set of parameters. The effect on the design process is an additional stage prior to synthesis. Figure 7 depicts the new FPGA design flow. The VHDL source generator takes four arguments as input: the length of the source string, the length of the target strings, the number of targets per parallel chain, and the number of targets to process in parallel. From these parameters, specified through a small graphical user interface, the software component generates a VHDL circuit specification. The purpose of this is to allow the circuit to utilize the maximum amount of FPGA resources.

[Figure 7: New FPGA Design Flow. The Parameter Specific Generator takes the Circuit Template (VHDL) and four parameters (length of source, length of target, targets in parallel, targets per chain) and produces the Parameter Specific Circuit (VHDL), which enters the Xilinx FPGA Design Flow.]

The software modifies the controller and systolic component by adding one array for every target to be calculated in parallel, as well as an up/down counter for each of these arrays. The controller requires the addition of one target output line for each target to be calculated in parallel. The memory width used for storing the target strings is also increased by two bits for every target to be calculated in parallel. The source string is used for each array and is stored in a separate memory bank. Hence, only one memory read is necessary to get all the source and target characters into the arrays: the wider target word is simply split among the different arrays, with each receiving its own two bits.

The VHDL source generator is a significant portion of the implementation, as it allows us to rapidly generate different optimal circuits given any parameter set. This gives us the advantage of not wasting any resources on the device, as well as decreasing the calculation time to each parameter set's optimal value. When comparing shorter strings the array takes up only a rather small portion of the resources, so we can use the leftover resources to add more arrays and calculate several targets in parallel. We can keep adding arrays until we run out of resources on the device. This method allows us to exploit not only the fine-grained parallelism of the calculation but also the coarse-grained parallelism arising from the independence of different string comparisons. With other implementations the sizes of the arrays are fixed, and if the string length does not exactly match the array size, time is wasted pumping the strings out of the array. Our implementation allows the arrays to be customized to fit the desired parameters, as long as there are enough resources on the device to fit the parameter set.
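To make the role of the four parameters concrete, here is a minimal sketch of the quantities a generator of this kind would need to derive before expanding its template. It is not the authors' tool: the class, function and key names are hypothetical, the emitted text is an invented illustration rather than the real template syntax (which the paper does not give), and the targets-per-chain parameter is only carried along because its effect is not modelled here. The array length, the per-target counter, the two bits of target memory width per parallel target, and the ‘aIN’ initial value are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CircuitParams:
    source_len: int           # length of the source string
    target_len: int           # length of the target strings
    targets_per_chain: int    # carried through; not modelled in this sketch
    targets_in_parallel: int  # one systolic array per parallel target


def derive_sizes(p):
    """Quantities stated in the text, computed from the parameter set."""
    pes_per_array = 2 * max(p.source_len, p.target_len) - 1
    return {
        "pes_per_array": pes_per_array,
        "arrays": p.targets_in_parallel,
        "up_down_counters": p.targets_in_parallel,
        "total_pes": pes_per_array * p.targets_in_parallel,
        "target_mem_width_bits": 2 * p.targets_in_parallel,
    }


def a_init(p):
    """Initial 'a' per PE position (our reading of Equation 3)."""
    longest = max(p.source_len, p.target_len)
    return [abs(i - longest) % 4 for i in range(1, 2 * longest)]


def render_generics(p):
    """One line per PE in an invented text format (hypothetical, not real VHDL)."""
    lines = []
    for arr in range(p.targets_in_parallel):
        for i, a in enumerate(a_init(p), start=1):
            lines.append(f'array{arr}_pe{i}: aIN => "{a:02b}"')
    return lines


params = CircuitParams(source_len=3, target_len=3,
                       targets_per_chain=1, targets_in_parallel=2)
print(derive_sizes(params))
print("\n".join(render_generics(params)))
```

Keeping the derived sizes separate from the text rendering means the same numbers could be compared against a device's capacity before any VHDL is written.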

5 Results

The system has been implemented using the Xilinx ISE 7.0 development suite, targeting a Xilinx Virtex-E 802e [18]. This chip was used simply because of its availability; the design is fairly generic, and using a different chip should require little or no change to the core code. Newer or larger chips should give even better results and longer maximum string lengths. On the Virtex-E 802e the circuit operates at 79-102 MHz, depending on the size of the systolic array and the number of targets to be compared in parallel. It takes 4*max(|S|,|T|) - 2 clock cycles to complete the calculation of the edit-distance between the source string and the target strings compared in parallel. With the FPGA used, comparing target and source strings of 830 characters is possible. In this configuration, the system can calculate the edit-distance in 41.6 microseconds (at 79 MHz). These results were obtained, and verified, through both simulation and actual testing in hardware. It should be noted that the time needed to synthesize and configure the FPGA device is not included in these times.

For a performance comparison, the same strings were compared using a software solution on an Intel Pentium 3 866 MHz machine. In this environment, the software solution required 48831 microseconds, significantly slower than the hardware solution. This performance gap expands with the availability of additional logic resources on the FPGA to support higher levels of parallel sequence processing.

For a better comparison we looked at Hokiegene [15]. Hokiegene was implemented on a Xilinx Virtex II XC2V6000 FPGA with speed grade -4, so we synthesized and simulated our design for the same device. Hokiegene was found to run at 180 MHz with four arrays of 1750 processing elements each. Our implementation runs at around 179 MHz with one array of processing elements, with enough space left over that one or two more arrays should fit onto the device, although this has not yet been tested. Each implementation is capable of comparing strings of 1750 characters, although Hokiegene is capable of calculating more target strings in parallel because of the different approaches taken. The difference is that Hokiegene hardcodes the source string into the processing elements and therefore does not require as many PEs as our approach. However, Hokiegene requires the circuit to be reconfigured whenever the source string changes. Our system is more generic: a new source string is simply pumped into the arrays, as long as that string fits the parameter set.

Table 1 presents the FPGA resource usage (in CLBs) and clock frequency (in MHz) for various levels of parallelism and string length. Cells marked n/a (shaded in the original table) indicate circuit configurations that are invalid for the FPGA in use due to a lack of available resources.

                 1 String                    2 Strings                   3 Strings
String Length    Size (CLBs)  Freq. (MHz)    Size (CLBs)  Freq. (MHz)    Size (CLBs)  Freq. (MHz)
100              1189         92.9           2257         89.8           3199         88.2
200              2289         88.2           4394         85.4           6275         83.3
300              3415         86.7           6570         83.1           n/a          n/a
400              4539         84.7           8694         85.6           n/a          n/a
500              5663         82.8           n/a          n/a            n/a          n/a
600              6793         82.1           n/a          n/a            n/a          n/a
700              7728         83.9           n/a          n/a            n/a          n/a
800              9041         80.1           n/a          n/a            n/a          n/a

Table 1: Usage and Timing for Various Circuits

In general, it can be seen that the performance of the system scales rather linearly. The length of the strings that can be handled decreases as the number of strings compared in parallel increases. This results in a near-constant number of processing elements implemented on the FPGA. The operating frequency of the circuit declines slightly as the length of the strings or the level of parallelism increases. This is attributed to the increased usage of the FPGA resources and the increased difficulty of placing and routing the components optimally.
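The cycle-count formula and the timings quoted in this section can be related with a few lines of arithmetic. The sketch below uses hypothetical function names and only numbers stated in the text. Taking the clock as exactly 79 MHz, the formula gives about 42.0 microseconds for 830-character strings, close to the 41.6 microseconds quoted; the small gap may simply reflect rounding of the clock frequency, which is our assumption rather than something the paper states.

```python
def cycles(source_len, target_len):
    """Clock cycles for one comparison: 4 * max(|S|, |T|) - 2."""
    return 4 * max(source_len, target_len) - 2


def time_us(n_cycles, clock_mhz):
    """Elapsed time in microseconds; cycles divided by MHz gives microseconds."""
    return n_cycles / clock_mhz


n = cycles(830, 830)
print(n)                          # cycles for 830-character strings
print(time_us(n, 79.0))           # time at an assumed exact 79 MHz clock
print(round(48831 / 41.6))        # ratio of the quoted software and hardware times
```

The last line uses the two measured times exactly as quoted, so it does not depend on the assumed clock value.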

6 Future Work

A useful function to be added is the ability to read from common database formats such as FASTA [9] and NCBI GenBank [10]. This would allow users to draw their target sequences from established, commonly used databases. Likewise, providing a facility to store previously generated circuits for re-use can remove regeneration and resynthesis in a high-usage environment.

A hindrance to usability is the need to specify the number of strings to compare in parallel. Knowing the length of the strings and the characteristics of the FPGA (i.e. CLBs available), the system would ideally determine the optimal level of parallelism. Having the system use a heuristic function to determine the optimal parallelism will save the user valuable time [19].

Implementing a staged design flow would make parameter sets easier to implement. Instead of first generating all the VHDL source files and then synthesizing, placing and routing, the processes would overlap. First the size of the PE is determined, and from that information an estimate is made of how many PEs fit on the device. The generation program then tries to find the optimal number of processor arrays that fit on the device to run in parallel, after which synthesis, place and route take place as normal. This could be further extended by first synthesizing one complete array, estimating from its size how many such arrays fit onto the device, and then continuing with the rest of the design flow.

Another direction is to optimize the current processing element so that it approaches the smaller size of a PE like the one described in [15], while keeping our general approach of not having the source string hardwired onto the device. If possible, this will have a two-fold effect on performance. Firstly, it will allow us to fit more PEs on the device, and secondly, the PE should run at a higher clock rate than the one currently used in the implementation. Using a different processing element should require only minimal changes to the software generator program, as the controller and up/down counter will be virtually the same and only slight changes should be needed for creating the arrays (adding or removing input and output lines). Adapting the run-time reconfiguration used in their implementation may also allow us to further improve performance and to change the memory contents of the source and target ROMs.

7 Acknowledgements

Funding for this work was provided by an NSERC Discovery Grant and equipment from the Canadian Microelectronics Corporation to the first author. The authors would also like to acknowledge Gord Brown, Dept. of Computer Science at the University of Victoria, for providing the comparable software solution.

8 References

[1] Yu, C.W., Kwong, K.H., Lee, K.H., and Leong, P.H.W. (2003) “A Smith-Waterman Systolic Cell.” In FPL 2003. Lecture Notes in Computer Science no. 2778. Springer-Verlag, pp. 375-384.
[2] Gotoh, Osamu. (1982) “An Improved Algorithm for Matching Biological Sequences.” In Journal of Molecular Biology. 162(3), pp. 705-708.
[3] Hirschberg, J.D., Hughey, R., and Karplus, K. (1996) “Kestrel: A programmable Array for Sequence Analysis.” In ASAP’96. IEEE Computer Society, pp. 25-34.
[4] Puttegowda, K., Worek, W., Pappas, N., Dandapani, A., Athanas, P., and Dickerman, A. (2003) “A Run-Time Reconfigurable System for Gene-Sequence Searching.” VLSI Design 2003. pp. 561-566.
[5] Quinton, P. and Robert, Y. (1991) Systolic Algorithms & Architectures. Prentice Hall.
[6] Shasha, D. and Zhang, K. (1989) “Simple Fast Algorithms for the Editing Distance Between Trees and Related Problems.” SIAM Journal on Computing, 18(6), pp. 1245-1262.
[7] Smith, T.F. and Waterman, M.S., “Identification of common molecular subsequences”, in Journal of Molecular Biology, 1981, 147(1), pp. 195-197.
[8] Yamaguchi, Y. and Maruyama, T. (2002) “High Speed Homology Search with FPGAs.” In PSB 2002. World Scientific Press, pp. 271-282.
[9] European Bioinformatics Institute Home Page. FASTA searching program, 2005. http://www.ebi.ac.uk/fasta.
[10] National Center for Biotechnology Information. NCBI BLAST home page, 2005. http://www.ncbi.nlm.nih.gov/blast.
[11] Paracel Inc., “The genematcher2 system datasheet,” 2002. http://www.paracel.com/products/pdfs/gm2_datasheet.pdf.
[12] Wagner, R., and Fischer, M., “The String-to-String Correction Problem”, Journal of the ACM, January 1974, pp. 168-173.
[13] Arnold, J.M., Buell, D.A., and Davis, E.G. (June 1992) “Splash 2,” in Proceedings of the 4th Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 316-324.
[14] TimeLogic Corp., “Decypher bioinformatics acceleration solution,” 2002. http://www.timelogic.com/decypher intro.html.
[15] Puttegowda, K., Worek, W., Pappas, N., Dandapani, A., Athanas, P., and Dickerman, A. “A Run-Time Reconfigurable System for Gene-Sequence Searching,” vlsid, p. 561, 16th International Conference on VLSI Design, 2003.
[16] Churchill, D., Gillard, P., Hamilton, M., Wareham, T. “Prototyping Parallel Sequence Edit-Distance Algorithms in FPGA Hardware.”, 14th Annual Newfoundland Electrical and Computer Engineering Conference, http://necec.engr.mun.ca/ocs/viewpaper.php?id=28&cf=1, Oct 2004.
[17] Lipton, R.J. and Lopresti, D. (1985) “Systolic Array for Rapid String Comparison.” In Proceedings of the 1985 Chapel Hill Conference on VLSI. Computer Science Press, pp. 363-376.
[18] Virtex-E 1.8V FPGA Complete Data Sheet (All four Modules). Xilinx Inc. http://direct.xilinx.com/bvdocs/publications/ds022.pdf, July 17, 2002.
[19] Kent, K. B., Rice, J. E., Ronda, T., and Zhao, Y., “Instance-specific versus Parameter-specific Circuit Generation”, Engineering of Reconfigurable Systems and Applications (ERSA) Conference 2005, pp. 243-246.
