Faster Parallel Computation of Edit Distances

Joan Boyar and Kim S. Larsen
Department of Mathematics and Computer Science
Odense University, Denmark

(Some of this work was done while the second author was at the Computer Science Department of Aarhus University.)

Abstract

Fast parallel algorithms are given for the longest common subsequence problem and the string editing problem with bounded weights. In the COMMON PRAM model, the algorithm for the longest common subsequence problem takes time $O(\log m)$, where $m$ is the length of the shorter string, while the algorithm for the string editing problem with bounded weights takes time $O(\max(\log m, \log n / \log\log n))$, where $m$ is the length of the shorter string and $n$ is the length of the longer string.
1 Introduction

The problem of computing the length of the longest common subsequences of two strings, s and t, is quite well examined. In the sequential case, the solutions in the literature are variations over the basic dynamic programming approach of filling out a table with values for prefixes of the argument strings. This leads to a time complexity of O(mn). Faster algorithms can be obtained for some special cases such as: short second argument, nearly identical strings, etc. A summary of these results can be found in [2, 7].

In this paper, however, the focus is on the parallel time complexity. Assume that $m \le n$. In [3], it is demonstrated how the length of the longest common subsequences can be computed in time $O(\log m \log n)$ with $O(mn / \log m)$ processors on a CREW-PRAM, and, alternatively, in time $O(\log n (\log\log m)^2)$ with $O(mn / \log\log m)$ processors on a CRCW-PRAM. In both cases, space is O(mn). A faster algorithm is presented in [1]; it uses $O(\log n \log\log m)$ time and $O(mn / \log\log m)$ processors on a CRCW-PRAM.

We show how an even better time complexity can be obtained for a CRCW-PRAM (the COMMON model is sufficient) if one is only concerned about keeping the number of processors polynomial, rather than down to some realistic number. The algorithm we present runs in time $O(\log m)$ with $O(mn^3)$ processors using $O(mn^2)$ space.

The longest common subsequence problem can be viewed as a special case of the more general string editing problem. The results in [3], which were mentioned above, are actually for the string editing problem, though they automatically apply to the longest common subsequence problem as well. The result in [1] for longest common subsequences was in fact obtained for the string editing problem with bounded weights, and the result for longest common subsequences simply follows from that. We also consider the string editing problem with bounded weights. By using an unrealistically large number of processors (though still a polynomial number), we obtain a time complexity of $O(\max(\log m, \log n / \log\log n))$ on a CRCW-PRAM, where $m$ is the length of the shorter string and $n$ is the length of the longer string.

Since the number of processors involved in both of our algorithms is ridiculously large, these results are not of practical significance. They are theoretical results, giving faster, and perhaps best possible, time bounds on the parallel complexity of these two problems.
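As a point of reference, the following is a minimal sketch of the basic sequential O(mn) dynamic program mentioned above (Python; the function name lcs_length_dp and the driver line are ours, not from the literature).

```python
# A minimal sketch of the basic O(mn) dynamic programming approach:
# L[i][j] is the length of the longest common subsequences of the
# prefixes s_1...s_i and t_1...t_j.
def lcs_length_dp(s, t):
    m, n = len(s), len(t)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[m][n]

print(lcs_length_dp("abcbdab", "bdcaba"))   # expected: 4
```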
2 Longest common subsequences

A subsequence of a string $s$ is any string which can be created from $s$ by deleting some of the elements. More precisely, if $s$ is the string $s_1 s_2 \cdots s_m$, then $s_{i_1} s_{i_2} \cdots s_{i_p}$ is a subsequence of $s$ if $\forall j \in \{1,\ldots,p\} : i_j \in \{1,\ldots,m\}$ and $\forall j \in \{1,\ldots,p-1\} : i_j < i_{j+1}$. Consider two fixed strings $s = s_1 s_2 \cdots s_m$ and $t = t_1 t_2 \cdots t_n$. Among the strings which are subsequences of both $s$ and $t$, there will be some of maximal length. Such subsequences are called longest common subsequences.
Let count(s, t) denote the length of the longest common subsequences of $s$ and $t$. The algorithm we give for computing count(s, t) is based on the following observation: a longest common subsequence $c$ of $s = s_1 s_2 \cdots s_m$ and $t = t_1 t_2 \cdots t_n$ can be divided up into two parts $c_l c_r = c$, such that $c_l$ is a subsequence of $s_1 \cdots s_{m/2}$ and $c_r$ is a subsequence of $s_{m/2+1} \cdots s_m$. In addition, there must exist an $h \in \{0,\ldots,n\}$ such that $c_l$ is a subsequence of $t_1 \cdots t_h$ and $c_r$ is a subsequence of $t_{h+1} \cdots t_n$. This observation leads to a recursive algorithm. Notice that the base case is when $m = 1$. In that case count(s, t) = 1 if $s \in \{t_1, t_2, \ldots, t_n\}$, and count(s, t) = 0 otherwise.

Clearly, in order to compute count like this, it is necessary to store count values for substrings of $s$ and $t$. In the following, we store more information. The processors contain a loop, which will be executed at most $\log m + 1$ times. We assume that $m$ is a power of two (if not, then we can always extend $s$ using a special symbol not appearing elsewhere in the strings; this cannot affect the value of count). A global variable $k$ is used to count the number of iterations of these loops. At a certain point, given the value of $k$, we are only interested in the following substrings of $s$:
$$s_1 \cdots s_{2^k},\quad s_{2^k+1} \cdots s_{2\cdot 2^k},\quad s_{2\cdot 2^k+1} \cdots s_{3\cdot 2^k},\quad \ldots,\quad s_{(m/2^k-1)2^k+1} \cdots s_m.$$

In that round, only the entries table[i, j, v], where $i \in \{2^k, 2\cdot 2^k, 3\cdot 2^k, \ldots, m\}$, are maintained. The basic idea is that $p$ = table[i, j, v] should be the maximal index (giving rise to the shortest substring of $t$) such that

$$count(s_{i-2^k+1} \cdots s_i,\; t_{p+1} \cdots t_j) = v.$$

Notice that the substring of $t$ starts with $t_{p+1}$ rather than with $t_p$. This results in a more elegant algorithm, as we need not worry about $t_p$ being matched with two elements from $s$ when combining the values for two adjacent substrings of $s$ into a value for the concatenation.

In the algorithm, we assign a processor to each table[i, j, v] entry which needs to be calculated. Each of these processors controls the necessary processors to compute the maxima they need; this number of processors will vary as $k$ varies. We return to the complexity issues later. We assume that the processors are synchronized and that in each step reading occurs before writing. Apart from computing maxima, the computation strategy of which we have not specified, we are within the CREW model of computation, as only one processor writes in table[i, j, v]. Note that $1 \le i \le m$, $0 \le j \le n$, $0 \le v \le m$, and $h = 2^{k-1}$. The algorithm is listed in Figure 1.

    process (i, j, v)
    {initialization: k = 0}
    if v = 0 then
        table[i, j, v] := j
    else if v = 1 then
        table[i, j, v] := max({-1} ∪ {u ∈ {0, ..., j-1} | s_i = t_{u+1}})
    else
        table[i, j, v] := -1
    {main loop}
    h := 1; k := 1
    while k ≤ log m do
        if i mod 2h = 0 then
            table[i, j, v] := max({-1} ∪ {q | u ∈ {0, ..., v} ∧ p = table[i, j, u] ∧ p ≠ -1 ∧
                                              q = table[i-h, p, v-u] ∧ q ≠ -1})
        h := 2h; k := k + 1
    {end while}

    Figure 1: Computing the length of the longest common subsequences.
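To make the table semantics concrete, here is a minimal sequential sketch that simulates the rounds of Figure 1 one after another (Python; the helper name lcs_table and the padding symbol are ours, not from the paper). The paper's algorithm instead runs one process per table entry in parallel, with all reads in a step occurring before the writes.

```python
# Sequential sketch of the process from Figure 1 (hypothetical helper lcs_table).
from math import log2

def lcs_table(s, t):
    m, n = len(s), len(t)
    # Pad s with a fresh symbol so that m becomes a power of two, as assumed in the text.
    while m & (m - 1):
        s, m = s + "\0", m + 1
    table = {}
    # Initialization (k = 0): blocks of s of length 1.
    for i in range(1, m + 1):
        for j in range(n + 1):
            for v in range(m + 1):
                if v == 0:
                    table[i, j, v] = j
                elif v == 1:
                    table[i, j, v] = max([-1] + [u for u in range(j) if s[i - 1] == t[u]])
                else:
                    table[i, j, v] = -1
    # Main loop: in round k, blocks of length 2^k ending at positions divisible by 2^k.
    h, k = 1, 1
    while k <= log2(m):
        new = {}
        for i in range(2 * h, m + 1, 2 * h):
            for j in range(n + 1):
                for v in range(m + 1):
                    cand = [-1]
                    for u in range(v + 1):          # split the v matches between the halves
                        p = table[i, j, u]          # right half s_{i-h+1..i} against t_{p+1..j}
                        if p != -1:
                            q = table[i - h, p, v - u]   # left half against t_{q+1..p}
                            if q != -1:
                                cand.append(q)
                    new[i, j, v] = max(cand)
        table.update(new)                           # "writes" happen after all "reads"
        h, k = 2 * h, k + 1
    return table, m, n
```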
For the purpose of giving a correctness proof, we now formulate and prove the invariant, which was outlined intuitively before presenting the algorithm.

Lemma 1 The following loop invariant holds for all values of $k$ encountered in the algorithm:
for all $i \in \{c \cdot 2^k \mid c \in \mathbb{N} \setminus \{0\} \wedge c \cdot 2^k \le m\}$, $j \in \{0,\ldots,n\}$, and $v \in \{0,\ldots,m\}$, we have that table[i, j, v] equals
$$\max(\{-1\} \cup \{r \mid 0 \le r \le j \wedge count(s_{i-2^k+1} \cdots s_i,\; t_{r+1} \cdots t_j) = v\}).$$
Proof The proof will be by induction on $k$. First, we prove that the invariant holds after the initialization. As $k = 0$, we consider all $i$'s in $\{1,\ldots,m\}$. In the following, assume that $i$ and $j$ are fixed. Notice that $s_{i-2^k+1} \cdots s_i$ is simply $s_i$. For each $v$, we determine the maximal $r$ such that $r \le j$ and $count(s_i, t_{r+1} \cdots t_j) = v$.

Assume that $v = 0$. As $count(s_i, \varepsilon) = 0$, we can choose $r = j$ ($\varepsilon$ denotes the empty string).

Assume that $v = 1$. In order for $count(s_i, t_{r+1} \cdots t_j)$ to equal one, $s_i$ must appear in $t_1 \cdots t_j$. If it does not, then the maximum is $-1$ and table[i, j, 1] is also assigned $-1$. Now, assume that $s_i$ is present in $t_1 \cdots t_j$. Let $t_u$ be the right-most occurrence of $s_i$ in $t_1 \cdots t_j$. Then $count(s_i, t_u \cdots t_j) = 1$ and $u$ is the maximal integer for which this is the case. So, we can choose $r = u - 1$.

Assume that $v > 1$. As $s_i$ has length one, $count(s_i, t_{r+1} \cdots t_j)$ can be at most one no matter what the value of $r$ is. So, in this case, table[i, j, v] should be assigned $-1$.

Now, we assume that the invariant holds for some $k \ge 0$. Let $k' = k + 1$.

Assume that the maximum is $-1$, i.e., no $r$ exists such that $count(s_{i-2^{k'}+1} \cdots s_i, t_{r+1} \cdots t_j) = v$. This means that
$$count(s_{i-2^{k'}+1} \cdots s_i,\; t_1 \cdots t_j) < v.$$
Now, assume to the contrary that a $u$ exists such that $p \ne -1$ and $q \ne -1$. By the invariant, this means that $count(s_{i-2^k+1} \cdots s_i, t_{p+1} \cdots t_j) = u$ and that $count(s_{i-2^{k'}+1} \cdots s_{i-2^k}, t_{q+1} \cdots t_p) = v - u$. But the two substrings of $s$ are disjoint, as are the two substrings of $t$, so by definition of count, we must have that $count(s_{i-2^{k'}+1} \cdots s_i, t_{q+1} \cdots t_j) \ge v$, which is a contradiction.

Now, assume that the maximum is $r \ge 0$. By dividing a longest common subsequence of $s_{i-2^{k'}+1} \cdots s_i$ and $t_{r+1} \cdots t_j$ up into the part which is in $s_{i-2^{k'}+1} \cdots s_{i-2^k}$ and the part in $s_{i-2^k+1} \cdots s_i$, it is clear that a $u$ and an $l$ must exist such that
$$count(s_{i-2^k+1} \cdots s_i,\; t_{l+1} \cdots t_j) = u$$
and
$$count(s_{i-2^{k'}+1} \cdots s_{i-2^k},\; t_{r+1} \cdots t_l) = v - u.$$
By the invariant, $p$ (in the algorithm) is the maximal integer such that
$$count(s_{i-2^k+1} \cdots s_i,\; t_{p+1} \cdots t_j) = u,$$
so obviously $l \le p$. But then $count(s_{i-2^{k'}+1} \cdots s_{i-2^k}, t_{r+1} \cdots t_p)$ is at least $v - u$. By the invariant, $q$ (in the algorithm) is now the maximal integer such that
$$count(s_{i-2^{k'}+1} \cdots s_{i-2^k},\; t_{q+1} \cdots t_p) = v - u.$$
So, given that we insist on matching exactly $u$ elements from $s_{i-2^k+1} \cdots s_i$ with elements from $t$, $q$ must also be the maximal integer such that $count(s_{i-2^{k'}+1} \cdots s_i, t_{q+1} \cdots t_j) = v$. This follows since in order to use a larger $q$, we would have to use a larger $p$; since $p$ was chosen to be maximal, $count(s_{i-2^k+1} \cdots s_i, t_{p+1} \cdots t_j)$ would then become strictly smaller than $u$. As all $u$'s are considered in the algorithm, and as $s_{i-2^k+1} \cdots s_i$ must match $u$ elements from $t$ for some $u$ ($0 \le u \le v$), we are bound to find the right $u$ and, thus, the correct maximum value. □
Corollary 2 The algorithm above correctly computes count(s, t) when the final result is computed from the table by
$$count(s, t) = \max_{0 \le v \le m} \{v \mid table[m, n, v] \ne -1\}.$$

Proof The last processors terminate when $k = \log m$, and at that point,
$$table[m, n, v] = \max(\{-1\} \cup \{r \mid 0 \le r \le n \wedge count(s, t_{r+1} \cdots t_n) = v\}).$$
Clearly, table[m, n, v] = $-1$ if and only if count(s, t) < v. □
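Continuing the hypothetical lcs_table sketch from above, the extraction in Corollary 2 amounts to:

```python
# count(s, t) is the largest v with table[m, n, v] != -1 (Corollary 2).
table, m, n = lcs_table("abcbdab", "bdcaba")
print(max(v for v in range(m + 1) if table[m, n, v] != -1))   # expected: 4
```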
Finally, we consider the complexity issues. Recall that in the CRCW PRAM model, there are a polynomial number of processors and a shared memory. Any number of processors can read from the same memory cell at the same time, and concurrent writing is also allowed. We assume the COMMON model of CRCW PRAMs. In the COMMON model, if more than one processor writes to the same memory cell at the same time, then they must all write the same value. The COMMON model has also been called a WRAM in the literature.
The algorithm has been designed such that maxima computations and the remaining computation can be analyzed separately. We use the following well-known result on computing maxima in the COMMON model.
Lemma 3 The maximum of $n$ elements can be computed in time O(1) in the COMMON CRCW model.

Proof In the COMMON model, $n^2$ processors are used to perform every possible comparison in one step. The details can be found in [6]. □
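A sequential sketch of this constant-time maximum (one conceptual processor per ordered pair of elements; the function name common_max is ours, and the details differ from the algorithm in [6]):

```python
# Each "processor" (i, j) marks x[i] as not maximal when x[i] < x[j];
# in the COMMON model all concurrent writers write the same value.
def common_max(x):
    n = len(x)
    is_max = [True] * n               # shared memory cells, initially 1
    for i in range(n):                # these two loops model the n^2 parallel processors
        for j in range(n):
            if x[i] < x[j]:
                is_max[i] = False
    return next(x[i] for i in range(n) if is_max[i])

print(common_max([3, 7, 2, 7, 5]))    # expected: 7
```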
Theorem 4 In the COMMON model, the length of the longest common subsequences can be computed in time $O(\log m)$, using $O(mn^3)$ processors, where $m$ and $n$ are the lengths of the shorter and longer strings, respectively.

Proof Rename, if necessary, so that $m = |s| \le n = |t|$. The algorithm runs in $\log m + 1$ steps and, except for the calculation of maxima, only a constant amount of work is done by each processor in each step. In the initialization, in order to handle the cases with $v \ne 1$, only $O(m^2 n)$ processors are necessary. If $v = 1$, it is necessary to compute the maxima of O(n) elements for each of O(mn) entries of table. Thus $O(mn^3)$ processors are sufficient. Within the main loop, it is necessary to compute table values for $m/2^k$ values of $i$ and $n$ values of $j$. For each of the table values, if $v > 2^k$, the result will be $-1$, so we can store those values using $O(m^2 n)$ processors. Otherwise, we must compute the maximum of $O(2^k)$ elements, so $O(2^{2k} m n)$ processors are sufficient. This is $O(m^3 n)$, which is $O(mn^3)$. Furthermore, the predicates used in computing these maxima can be computed in constant time. □

An obvious question to ask here is if the time bound of Theorem 4 is optimal. Note that if $n$ is polynomial in $m$, Theorem 4 gives that the complexity of calculating the length of the longest common subsequences is $O(\log n)$ in the COMMON model, where $n$ is the length of the input. An obvious lower bound for this problem is $\Omega(\log n / \log\log n)$, the lower bound for parity in the COMMON model [4]: if $s$ is all ones and $t \in \{0,1\}^n$, then the problem is simply to count the number of ones in $t$, which is at least as hard as computing parity. Almost all Boolean functions of $n$ inputs require time $\Omega(\log n)$ on a CRCW PRAM [4], so it seems reasonably likely that the upper bound is tight. Proving a better lower bound than $\Omega(\log n / \log\log n)$, however, is likely to be difficult. Suppose we could prove that calculating the length of the longest common subsequence required more than $O(\log n / \log\log n)$ time on a COMMON PRAM. It is not hard to see that this would imply that computing at least one of the bits of the result requires more than $O(\log n / \log\log n)$ time. Otherwise, we could solve the problem by computing each of the bits individually and then accumulating the result. The accumulation of the result from the individual bits can trivially be done in $O(\log\log n)$ time, so the entire computation would only require $O(\log n / \log\log n)$ time. The work of Chandra, Stockmeyer, and Vishkin [5] and of Stockmeyer and Vishkin [8] shows that everything in $NC^1$ can be computed on a COMMON PRAM in time $O(\log n / \log\log n)$. Thus, improving this lower bound for the longest common subsequence problem would give us a problem which was in NC, but not in $NC^1$, and would thus be an extremely interesting result.
3 The bounded weights editing problem

The longest common subsequence problem can be viewed as a special case of the more general string editing problem. This problem is to find the minimum possible cost, $E(s, t)$, of transforming string $s$ to string $t$, using the operations insert, delete, and substitute. The strings $s$ and $t$ are over a finite alphabet $\Sigma$. The cost of inserting a character $a$ is denoted $I(a)$, the cost of deleting $a$ is $D(a)$, and the cost of substituting an occurrence of $a$ with an occurrence of $b$ is $S(a, b)$. These costs are assumed to be nonnegative real numbers, with $S(a, b) \le D(a) + I(b)$ for all $a$ and $b$. Here we assume that all of these costs are integers bounded by some constant.

The same basic idea used in the algorithm in the previous section also works for the editing problem with bounded weights. One can divide the first string $s$ into two equal halves $s_l$ and $s_r$ and then find a partition of $t$ into $t_l$ and $t_r$ so that the sum of the edit distances $E(s_l, t_l)$ and $E(s_r, t_r)$ is minimized. There are some significant differences, however. It is no longer sufficient to find the right-most location for dividing $t$ which minimizes $E(s_r, t_r)$; we must keep track of the best cost we can find for all possible divisions. In addition, the initialization step is significantly more complicated.

In order to keep track of the minimum cost of various subproblems, we
let our array table have four dimensions. When the loop index has value $k$, the value table[i, j1, j2, v] is 1 if we have found an edit sequence for $s_{i-2^k+1} \cdots s_i$ and $t_{j_1} \cdots t_{j_2}$ of cost $v$, and is 0 otherwise. The indices have the following ranges: $1 \le i \le m$, $1 \le j_1 \le j_2 + 1$, and $v \ge 0$. If $j_1 = j_2 + 1$, then the substring of $t$ is assumed to be empty.

We define $\Phi(p, q)$ to be $p \cdot D(a) + q \cdot I(b)$, where $a$ is the symbol which maximizes $D(a)$ and $b$ is the symbol which maximizes $I(b)$. Clearly $\Phi(p, q)$ is an upper bound on the cost of transforming a string of length $p$ into one of length $q$. In order to keep the number of processors we use polynomial in the length of the input, we need $\Phi(p, q)$ to be polynomial in $p$ and $q$. Assuming that the edit operations have bounded weights clearly does this.

As mentioned earlier, the initialization becomes significantly more complicated when we are considering the string editing problem, rather than the longest common subsequence problem. One process will be assigned to each element of the array table, and it will control a polynomial number of other processors. Since $k = 0$, the substrings of $s$ we consider contain exactly one symbol, $s_i$. The first case to consider is where $j_1 = j_2 + 1$, so there is nothing in the substring of $t$ being considered. Then the transformation must simply delete the symbol $s_i$, and this has cost $D(s_i)$. If $v$ has this value, then the table entry gets the value 1; otherwise it gets the value 0.

Now, we suppose that $j_1 \le j_2$. For each symbol $a \in \Sigma$, we calculate $num(a)$, the number of $a$'s in the substring $t_{j_1} \cdots t_{j_2}$, and $z = \sum_{a \in \Sigma} num(a) \cdot I(a)$. In the COMMON PRAM model, the first can be done in time $O(\log n / \log\log n)$, and the second in constant time, since the costs and $|\Sigma|$ are constants. Now we consider two cases: either $s_i$ appears in the substring $t_{j_1} \cdots t_{j_2}$, or it does not. If $s_i$ does appear, then the cost of transforming $s_i$ to $t_{j_1} \cdots t_{j_2}$ is $z - I(s_i)$. Otherwise, the cheapest way of doing this transformation is to substitute some character $c$ for $s_i$ and then insert all the other characters. To minimize this cost, for all symbols $c$ with $num(c) > 0$, we calculate $S(s_i, c) - I(c)$ and find $\hat{c}$, the character with the minimum value. Then, the cost of transforming $s_i$ to $t_{j_1} \cdots t_{j_2}$ is $z + S(s_i, \hat{c}) - I(\hat{c})$. Determining if $s_i$ appears in $t_{j_1} \cdots t_{j_2}$ can be done in time $O(\log n / \log\log n)$, and the remainder of these calculations can be done in constant time. If the cost of this transformation is equal to $v$, then the table entry gets the value 1; otherwise it gets the value 0.
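The initialization just described can be sketched as follows for a single character s_i of s and the substring t_{j1} ... t_{j2} (Python; the name init_cost and the dictionary-based cost tables ins_cost, del_cost, sub_cost are illustrative, not from the paper):

```python
# Cost of the best single-character transformation found in the initialization,
# for 1-based indices j1, j2 into t (j1 = j2 + 1 means the empty substring).
from collections import Counter

def init_cost(si, t, j1, j2, ins_cost, del_cost, sub_cost):
    if j1 == j2 + 1:                       # empty substring of t: delete s_i
        return del_cost[si]
    sub = t[j1 - 1 : j2]
    num = Counter(sub)                     # num(a) for each symbol a occurring in the substring
    z = sum(n * ins_cost[a] for a, n in num.items())
    if si in num:                          # match one occurrence of s_i, insert the rest
        return z - ins_cost[si]
    # otherwise substitute s_i for the symbol c minimizing S(s_i, c) - I(c)
    best = min(sub_cost[si, c] - ins_cost[c] for c in num)
    return z + best

# table[i, j1, j2, v] is then set to 1 exactly when v equals this cost.
```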
Then, to compute table[i, j1, j2, v] in the $k$th execution of the main loop, it is necessary to check all appropriate pairs $u$ and $p$ to see if we have previously found transformations: one of $s_{i-2^k+1} \cdots s_{i-2^{k-1}}$ to $t_{j_1} \cdots t_{p-1}$ of cost $u$, and one of $s_{i-2^{k-1}+1} \cdots s_i$ to $t_p \cdots t_{j_2}$ of cost $v - u$. The values of $u$ which need to be checked are from zero to $v$, and the values of $p$ are from $j_1$ through $j_2 + 1$. This can be done in constant time with a polynomial number of processors.

The algorithm can be written as follows. Note that $1 \le i \le m$, $1 \le j_1 \le j_2 + 1 \le n + 1$, $0 \le v \le \Phi(m, n)$, and $h = 2^{k-1}$. The algorithm is listed in Figure 2. The proof that this algorithm works is similar to that for the longest common subsequence algorithm. The main loop is only executed $\log m$ times. After this algorithm has computed the values in table, the edit distance can be computed by finding the minimum value of $v$ for which table[m, 1, n, v] = 1. This can be done in constant time. Thus, we have the following:
Theorem 5 In the COMMON model, the string editing problem with bounded weights can be solved in time $O(\max(\log m, \log n / \log\log n))$, where $m$ and $n$ are the lengths of the shorter and longer strings, respectively.

The discussion after Theorem 4, about the possibility of the time bound being optimal, applies to this problem as well.
    process (i, j1, j2, v)
    {initialization: k = 0}
    if j1 = j2 + 1 then
        if v = D(s_i) then table[i, j1, j2, v] := 1
        else table[i, j1, j2, v] := 0
    else
        For each a ∈ Σ, calculate num(a),
            the number of occurrences of a in t_{j1} ... t_{j2}
        Calculate z = Σ_{a ∈ Σ} num(a) · I(a)
        Determine if s_i appears in t_{j1} ... t_{j2}
        if s_i is in t_{j1} ... t_{j2} then
            if v = z - I(s_i) then table[i, j1, j2, v] := 1
            else table[i, j1, j2, v] := 0
        else
            Find a symbol ĉ such that num(ĉ) > 0, and
                S(s_i, ĉ) - I(ĉ) ≤ S(s_i, c) - I(c) for all c where num(c) > 0
            if v = z + S(s_i, ĉ) - I(ĉ) then table[i, j1, j2, v] := 1
            else table[i, j1, j2, v] := 0
    {main loop}
    h := 1; k := 1
    while k ≤ log m do
        if i mod 2h = 0 then
            for all p ∈ {j1, ..., j2 + 1}
                for all u ∈ {0, ..., v}
                    Determine if table[i-h, j1, p-1, u] = 1 ∧ table[i, p, j2, v-u] = 1
            if there exists a pair u, p for which this holds
                then table[i, j1, j2, v] := 1
                else table[i, j1, j2, v] := 0
        h := 2h; k := k + 1
    {end while}

    Figure 2: Solving the string editing problem with bounded weights.
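As with Figure 1, the rounds of Figure 2 can be simulated sequentially. The sketch below reuses the hypothetical init_cost helper from the initialization sketch above and is only meant to illustrate the combination step and the final extraction; for simplicity it assumes the length of s is already a power of two.

```python
# Sequential sketch of Figure 2 (hypothetical function edit_distance).
from math import log2

def edit_distance(s, t, ins_cost, del_cost, sub_cost):
    m, n = len(s), len(t)
    assert m >= 1 and m & (m - 1) == 0, "sketch assumes |s| is a power of two"
    vmax = m * max(del_cost.values()) + n * max(ins_cost.values())   # the bound Phi(m, n)
    table = {}
    # Initialization (k = 0): single characters of s against every substring of t.
    for i in range(1, m + 1):
        for j1 in range(1, n + 2):
            for j2 in range(j1 - 1, n + 1):
                cost = init_cost(s[i - 1], t, j1, j2, ins_cost, del_cost, sub_cost)
                for v in range(vmax + 1):
                    table[i, j1, j2, v] = 1 if v == cost else 0
    # Main loop: combine adjacent blocks of s, doubling the block length each round.
    h, k = 1, 1
    while k <= log2(m):
        new = {}
        for i in range(2 * h, m + 1, 2 * h):
            for j1 in range(1, n + 2):
                for j2 in range(j1 - 1, n + 1):
                    for v in range(vmax + 1):
                        new[i, j1, j2, v] = 1 if any(
                            table[i - h, j1, p - 1, u] == 1 and table[i, p, j2, v - u] == 1
                            for p in range(j1, j2 + 2) for u in range(v + 1)
                        ) else 0
        table.update(new)                     # "writes" happen after all "reads"
        h, k = 2 * h, k + 1
    # The edit distance is the minimum v for which table[m, 1, n, v] = 1.
    return min(v for v in range(vmax + 1) if table[m, 1, n, v] == 1)

# Example with unit costs (Levenshtein distance); the alphabet here is illustrative.
I = {c: 1 for c in "abcd"}
D = {c: 1 for c in "abcd"}
S = {(a, b): 0 if a == b else 1 for a in "abcd" for b in "abcd"}
print(edit_distance("abcd", "badc", I, D, S))   # expected: 3
```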
References

[1] A. Aggarwal and J. Park. Notes on searching in multidimensional monotone arrays. In 29th Annual IEEE Symposium on Foundations of Computer Science, pages 497–512. IEEE Computer Society Press, 1988.

[2] A. V. Aho. Algorithms for finding patterns in strings. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, pages 255–300. Elsevier Science Publishers, 1990.

[3] A. Apostolico, M. J. Atallah, L. L. Larmore, and S. McFaddin. Efficient parallel algorithms for string editing and related problems. SIAM J. Comput., 19(5):968–988, 1990.

[4] P. Beame and J. Håstad. Optimal bounds for decision problems on the CRCW PRAM. JACM, 36(3):643–670, 1989.

[5] A. K. Chandra, L. Stockmeyer, and U. Vishkin. Constant depth reducibility. SIAM J. Comput., 13(2):423–439, 1984.

[6] Y. Shiloach and U. Vishkin. Finding the maximum, merging, and sorting in a parallel computation model. J. Algorithms, 2:88–102, 1981.

[7] G. A. Stephen. String search. Technical Report TR-92-gas-01, School of Electronic Engineering Science, University College of North Wales, 1992.

[8] L. Stockmeyer and U. Vishkin. Simulation of parallel random access machines by circuits. SIAM J. Comput., 13(2):409–422, 1984.