A Sub-quadratic Sequence Alignment Algorithm ... - Semantic Scholar
Recommend Documents
A SUBQUADRATIC SEQUENCE ALIGNMENT ALGORITHM. 1655 of substitutions, insertions, and deletions (indels). The costs of these transformations.
May 5, 2006 - Multiple sequence alignment. Robert C Edgar. 1 and Serafim Batzoglou. 2. Multiple sequence alignments are an essential tool for.
clustering algorithm called Antipole tree and an ap- proximate linear 1-median computation are used. Our algorithm compared with Clustal W, a widely used tool.
Jan 27, 2010 - with the maximum score based on the above scoring scheme. .... By applying Hirschberg's algorithm [7], it is clear that the required space in our.
International Journal of Information Technology Vol. 11, No. 8, 2005. A Novel Method for Multiple Sequence Alignment Based on Wavelet Package Transform.
databank Swissprot Rel. 35. When both sequences share several patterns the algorithm makes it possible to bind two similarity zones which are very far from.
Abstract. We consider a branch-and-cut approach for solving the multiple sequence align- ment problem, which is a central problem in computational biology.
... found sequence subset. A data-parallel algorithm for local sequence alignment .... every element above (north of) it, to the left (west) of it, and on the diagonal ..... remote running at http://fasta.bioch.virginia.edu/fasta_www/home.html or for
HSA runs in four steps: Step 1: (Initialization) We start by building a directed graph from the input proteins as follows. Each residue maps to a vertex in the graph.
and Cancellation in Wireless Networks. Li Erran Li ... Interference in traditional wireless networks has been con- ...... Cambridge University Press, May 2005.
Email: [email protected] ... algorithm was chosen as it is one of the best state of the art ... WU-BLAST implementation, which works only with 4(DNA).
A Sub-quadratic Sequence Alignment Algorithm ... - Semantic Scholar
A Sub-quadratic Sequence Alignment Algorithm for Unrestricted Cost. Matrices. Maxime Crochemore *. Institut Gaspard-Monge. Universit de Maxne-la~Vall~e.
679
A Sub-quadratic Sequence Alignment Algorithm for Unrestricted Cost Matrices Maxime Crochemore * Institut Gaspard-Monge Universit de Maxne-la~Vall~e
G a d M. L a n d a u t Haifa University and Polytechnic University
Abstract
The classical algorithm for computing the similarity between two sequences [36, 39] uses a dynamic programming matrix, and compares two strings of size n in O(n2) time. We address the challenge of computing the similarity of two strings in sub-quadratic time, for metrics which use a scoring matrix of unrestricted weights. Our algorithm applies to both localand globalalignment computations. The speed-up is achieved by dividing the dynamic programming matrix into variable sized blocks, as induced by Lempel-Ziv parsing of both strings, and utilizing the inherent periodic nature of both strings. This leads to an O(n2/logn) algorithm for an input of constant alphabet size. For most texts, the time complexity is actually O(hn2/logn) where h _< 1 is the entropy of the text. 1
Introduction
Michal Ziv-Ukelson Haifa University
and IBM T.J.W Research Center
ing, organizing and analyzing the wealth of biological information. One of the most interesting new fields that the availabilityof the complete genomes has created is that of genome comparison (the genome is all of the D N A sequence passed from one generation to the next). Comparing complete genomes can give deep insights about the relationshipbetween organisms, as well as shedding light on the function of specific genes in each single genome. The challenge of comparing complete genomes necessitates the creation of additional, more efficientcomputational tools. One of the most common problems in biological comparative analysis is that of aligning two long biosequences in order to measure their similarity. In the global alignment problem [19], [29], the similarity between two strings A and B is measured. In the local alignment problem [39], the objective is to find substrings of A which are similar to substrings of B. Both alignment problems can be solved in O(n 2) time by dynamic programming [19],[39].
The rapid progress in large-scale DNA sequencing opens In this paper data compression techniques are employed a new level of computational challenges involved in stor- to speed up the alignment of two strings. The compression mechanism enables the algorithm to adapt to the data and to utilizeits repetitions. The periodic nature ~ t u t Gaspard-Monge, Universit de Marne-la-Vallde,Cit of the sequence is quantified via its entropy, denoted Descartes, Champs-sur-Marne, 77454 Marne-la-Vall~eCedex 2~ 0 < h