An Improved Algorithm for Tree Edit Distance Incorporating Structural Linearity

Shihyen Chen and Kaizhong Zhang

Department of Computer Science, The University of Western Ontario, London, Ontario, Canada, N6A 5B7
{schen,kzhang}@csd.uwo.ca
Abstract. An ordered labeled tree is a tree in which the nodes are labeled and the left-to-right order among siblings is significant. The edit distance between two ordered labeled trees is the minimum cost of transforming one tree into the other by a sequence of edit operations. Among the best known tree edit distance algorithms, the majority can be categorized in terms of a framework named cover strategy. In this paper, we investigate how certain locally linear features may be utilized to improve the time complexity for computing the tree edit distance. We define structural linearity and present a method incorporating linearity which can work with existing cover-strategy based tree algorithms. We show that by this method the time complexity for an input of size n becomes O(n² + φ(A, ñ)), where φ(A, ñ) is the time complexity of any cover-strategy algorithm A applied to an input of size ñ, with ñ ≤ n, and the magnitude of ñ is inversely related to the degree of linearity. This result is an improvement over previous results when ñ < n and would be useful for situations in which ñ is in general substantially smaller than n, such as RNA secondary structure comparisons in computational biology.

Keywords: Tree edit distance, dynamic programming, RNA secondary structure comparison.
1 Introduction
An ordered labeled tree is a tree in which the nodes are labeled and the left-to-right order among siblings is significant. Trees can represent many phenomena, such as grammar parses, image descriptions and structured texts, to name a few. In many applications where trees are useful representations of objects, the need for comparing trees frequently arises. The tree edit distance metric was introduced by Tai [7] as a generalization of the string edit distance problem [9]. Given two trees T1 and T2, the tree edit distance between T1 and T2 is the minimum cost of transforming one tree into the other, with the sibling and ancestor orders preserved, by a sequence of edit operations on the nodes (relabeling, insertion and deletion) as shown in Figure 1.
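For concreteness, here is a minimal sketch (not from the paper) of an ordered labeled tree node together with a unit-cost function for the three edit operations; the names Node and gamma are our own.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """A node of an ordered labeled tree; children are kept in left-to-right order."""
    label: str
    children: List["Node"] = field(default_factory=list)

def gamma(a: Optional[str], b: Optional[str]) -> int:
    """Unit cost of a single edit operation.
    (a, b): relabel a -> b; (a, None): delete a; (None, b): insert b."""
    if a is None and b is None:
        return 0
    if a is None or b is None:
        return 1                    # insertion or deletion
    return 0 if a == b else 1       # relabeling is free when the labels agree

if __name__ == "__main__":
    t = Node("a", [Node("c"), Node("b")])
    # cost of relabeling 'a' to 'b' plus deleting 'c'
    print(gamma("a", "b") + gamma("c", None))   # 2
```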
Research supported partially by the Natural Sciences and Engineering Research Council of Canada under Grant No. OGP0046373 and a grant from MITACS, a Network of Centres of Excellence for the Mathematical Sciences.
G. Lin (Ed.): COCOON 2007, LNCS 4598, pp. 482–492, 2007. © Springer-Verlag Berlin Heidelberg 2007
Fig. 1. Tree edit operations. From top to bottom: relabeling, deletion and insertion.
Among the known algorithms with comparable performance, such as those in [1,2,4,10], the majority [2,4,10] can be categorized in terms of a generalized framework named cover strategy [3], which prescribes the direction in which a dynamic program builds up the solution. Briefly, a tree is decomposed into a set of disjoint paths which, in this paper, we refer to as special paths. Each special path is associated with a subtree such that the special path coincides with a path of the subtree running from the root to a leaf. The dynamic program proceeds in a bottom-up order with respect to the special paths, such that for any node i on a special path, the subtrees hanging off the portion of the special path no higher than i have been processed before the node i is reached. When there are subtrees hanging off both sides of a special path, the decision as to which side takes precedence is referred to as the strategy. In the Zhang-Shasha algorithm [10], the special paths are chosen to be the leftmost paths. In Klein's algorithm [4] as well as that of Demaine et al. [2], the special paths are chosen such that every node on a special path is the root of a largest subtree among its sibling subtrees. These special paths are referred to as heavy paths [6]. Examples of leftmost paths and heavy paths are shown in Figure 2. In these algorithms, no consideration is given to any structural characteristics which may exist in the tree. In this paper, we investigate the possibility of utilizing certain linear features within the trees to speed up the computation of the tree edit distance.
Fig. 2. Left: Decomposition of a tree into leftmost paths (in bold). Right: Decomposition of a tree into heavy paths (in bold).
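To illustrate the two decompositions of Figure 2, here is a small sketch (our own, not the authors' code) that decomposes a tree into special paths under either the leftmost or the heavy strategy; the helper names are hypothetical.

```python
from typing import List, Tuple

# A tree is (label, children); children are kept in left-to-right order.
Tree = Tuple[str, list]

def subtree_size(t: Tree) -> int:
    return 1 + sum(subtree_size(c) for c in t[1])

def special_child_index(t: Tree, strategy: str) -> int:
    """Leftmost child for the Zhang-Shasha strategy, a largest child for the
    heavy-path strategies of Klein and Demaine et al."""
    if strategy == "leftmost":
        return 0
    sizes = [subtree_size(c) for c in t[1]]
    return max(range(len(sizes)), key=sizes.__getitem__)

def special_paths(t: Tree, strategy: str = "leftmost") -> List[List[str]]:
    """Decompose t into disjoint special paths; each path is returned as the
    list of labels from the top of the path down to a leaf."""
    paths: List[List[str]] = []

    def walk(node: Tree, current: List[str]) -> None:
        current.append(node[0])
        if not node[1]:                       # a leaf ends the current path
            paths.append(current)
            return
        s = special_child_index(node, strategy)
        for k, child in enumerate(node[1]):
            if k == s:
                walk(child, current)          # the special child continues the path
            else:
                walk(child, [])               # every other child starts a new path

    walk(t, [])
    return paths

if __name__ == "__main__":
    t = ("a", [("b", []), ("c", [("d", []), ("e", []), ("f", [])])])
    print(special_paths(t, "leftmost"))  # [['a', 'b'], ['c', 'd'], ['e'], ['f']]
    print(special_paths(t, "heavy"))     # [['b'], ['a', 'c', 'd'], ['e'], ['f']]
```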
We show that by embedding a procedure in any cover-strategy algorithm A, the resulting time complexity is O(n² + φ(A, ñ)), where n is the original input size, φ(A, ñ) is the time complexity of algorithm A applied to an input of size ñ, with ñ ≤ n, and the magnitude of ñ is inversely related to the degree of linearity. This result would be useful for applications in which ñ is in general substantially smaller than n. The rest of the paper is organized as follows. In Section 2, we define structural linearity and give a new representation of trees based on reduction of the tree size due to the linearity. In Section 3, we present the algorithmic consequences of incorporating the linearity and the implications for the time complexity. In Section 4, we describe one suitable application of our result. We give concluding remarks in Section 5.
2 Preliminaries

2.1 Notations
Given a tree T, we denote by t[i] the ith node in the left-to-right post-order numbering. The index of the leftmost leaf of the subtree rooted at t[i] is denoted by l(i). We denote by F[i, j] the ordered sub-forest of T induced by the nodes indexed i to j inclusive. The subtree rooted at t[i] in T is denoted by T[i], i.e., T[i] = F[l(i), i]. The sub-forest induced by removing t[i] from T[i] is denoted by F[i], i.e., F[i] = F[l(i), i − 1]. When referring to the children of a specific node, we adopt a subscript notation in accordance with the left-to-right sibling order. For example, the children of t[i], from left to right, may be denoted by (t[i1], t[i2], ..., t[ik]).
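A minimal sketch (ours, not the paper's) of the post-order numbering and the leftmost-leaf function l(·) used throughout:

```python
from typing import List, Tuple

Tree = Tuple[str, list]   # (label, children), children in left-to-right order

def postorder(t: Tree):
    """Return (labels, l) where labels[i] is the label of node i in
    left-to-right post-order (1-indexed, as in the paper) and l[i] is the
    index of the leftmost leaf of the subtree rooted at node i."""
    labels: List[str] = [""]   # dummy entry so that indices start at 1
    l: List[int] = [0]

    def visit(node: Tree) -> int:
        first_leaf = None
        for child in node[1]:
            leaf = visit(child)
            if first_leaf is None:
                first_leaf = leaf          # leftmost leaf comes from the first child
        labels.append(node[0])
        idx = len(labels) - 1
        l.append(first_leaf if first_leaf is not None else idx)
        return l[idx]

    visit(t)
    return labels, l

if __name__ == "__main__":
    t = ("f", [("d", [("a", []), ("c", [("b", [])])]), ("e", [])])
    labels, l = postorder(t)
    print(list(enumerate(labels))[1:])  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (6, 'f')]
    print(l[1:])                        # [1, 2, 2, 1, 5, 1]
```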
2.2 Linearity
Definition 1 (V-Component). Given a tree T with left-to-right post-order and pre-order numberings, a path π of T is a v-component (i.e., vertically linear component) if all of the following conditions hold.
– Both the post-order and the pre-order numberings along this path form a sequence of continuous indices.
– No other path containing π satisfies the above condition.

Definition 2 (V-Reduction). The v-reduction on a tree is to replace every v-component in the tree by a single node.

Definition 3 (H-Component). Given a tree T and another tree T̃ obtained by a v-reduction on T, any set of connected components of T corresponding to a set of leaves L̃ in T̃ forms an h-component (i.e., horizontally linear component) if all of the following conditions hold.
– |L̃| ≥ 2.
– All the leaves in L̃ share the same parent.
– A left-to-right post-order or pre-order numbering on T̃ produces a sequence of continuous indices for L̃.
– No other set of leaves containing L̃ satisfies the above conditions.
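As a sketch of Definition 1 (our own code, not the authors'), the following identifies the maximal downward paths whose pre-order and post-order indices are both contiguous; under this reading these are exactly the maximal chains in which every node except the last has a single child, and paths of length one are returned for nodes outside any v-component.

```python
from typing import Dict, List, Tuple

Tree = Tuple[str, list]   # (label, children)

def orders(t: Tree):
    """Assign 1-based pre-order and post-order indices to every node id."""
    pre: Dict[int, int] = {}
    post: Dict[int, int] = {}
    counter = {"pre": 0, "post": 0}

    def visit(node: Tree) -> None:
        counter["pre"] += 1
        pre[id(node)] = counter["pre"]
        for child in node[1]:
            visit(child)
        counter["post"] += 1
        post[id(node)] = counter["post"]

    visit(t)
    return pre, post

def v_components(t: Tree) -> List[List[str]]:
    """Maximal downward paths whose pre-order and post-order indices are both
    contiguous (Definition 1); singleton results mark nodes that belong to no
    nontrivial v-component."""
    pre, post = orders(t)
    comps: List[List[str]] = []

    def extend(node: Tree, path: List[Tree]) -> None:
        path = path + [node]
        if len(node[1]) == 1:
            child = node[1][0]
            # contiguity in both numberings, as Definition 1 requires
            if pre[id(child)] == pre[id(node)] + 1 and post[id(child)] == post[id(node)] - 1:
                extend(child, path)
                return
        comps.append([n[0] for n in path])
        for child in node[1]:
            extend(child, [])

    extend(t, [])
    return comps

if __name__ == "__main__":
    # a-b-c is a chain; c has two leaf children
    t = ("a", [("b", [("c", [("d", []), ("e", [])])])])
    print(v_components(t))   # [['a', 'b', 'c'], ['d'], ['e']]
```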
Definition 4 (H-Reduction). The h-reduction on a tree is to replace every h-component in the tree by a single node.

A tree possesses vertical (horizontal) linearity if it contains any v-component (h-component). A tree is v-reduced if it is obtained by a v-reduction only. A tree is vh-reduced if it is obtained by a v-reduction followed by an h-reduction. Note that a reduced tree is just a compact representation of the original tree. The edit distance of two reduced trees is the same as the edit distance of the original trees. In the case when a tree does not possess any linearity as defined above, the reduced tree is the same as the original tree. In Figure 3, we give an example showing the v-components and h-components of a tree and the corresponding reduced trees. Note that an h-component can also contain v-components. Each node in a v-reduced tree corresponds to either a v-component or a single node in the corresponding full tree.
Fig. 3. The v-components (in dashed enclosures) and the h-components (in dotted enclosures). Also shown are the reduced trees as a result of reduction. The parts of the original tree affected by reduction are represented by black nodes in the reduced tree.
Fig. 4. A partial view of the mapping of nodes between a tree (left) and its v-reduced tree (right)
Given a v-reduced tree T̃, we define two functions α(·) and β(·) which respectively map a node t̃[i] to the highest indexed node t[α(i)] and the lowest indexed node t[β(i)] of the corresponding v-component in the full tree T. In the special case when t̃[i] corresponds to a single node in T, t[α(i)] = t[β(i)]. An example of this mapping is given in Figure 4. When T̃ is further h-reduced, α(·) and β(·) apply in the same way to the mapping from the h-reduced tree to T̃.
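The following sketch (our own, under the reading of a v-component as a maximal chain of single-child nodes) performs a v-reduction and records α and β for every reduced node as post-order indices in the full tree.

```python
from typing import Tuple

Tree = Tuple[str, list]   # (label, children)

def postorder_index(t: Tree):
    """Map id(node) -> 1-based post-order index in the full tree."""
    idx = {}
    n = 0

    def visit(node: Tree) -> None:
        nonlocal n
        for c in node[1]:
            visit(c)
        n += 1
        idx[id(node)] = n

    visit(t)
    return idx

def v_reduce(t: Tree):
    """Collapse every v-component of t into one reduced node.  Each reduced
    node is returned as (labels_of_component, alpha, beta, children), where
    alpha/beta are the full-tree post-order indices of the highest- and
    lowest-indexed nodes of the component (alpha = beta outside v-components)."""
    idx = postorder_index(t)

    def reduce_node(node: Tree):
        top = node
        labels = [node[0]]
        while len(node[1]) == 1:          # follow the chain of only-children
            node = node[1][0]
            labels.append(node[0])
        alpha, beta = idx[id(top)], idx[id(node)]
        return (labels, alpha, beta, [reduce_node(c) for c in node[1]])

    return reduce_node(t)

if __name__ == "__main__":
    t = ("a", [("b", [("c", [("d", []), ("e", [])])])])
    print(v_reduce(t))
    # (['a', 'b', 'c'], 5, 3, [(['d'], 1, 1, []), (['e'], 2, 2, [])])
```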
3 Algorithm
In this section, we show how to incorporate vertical linearity in the Zhang-Shasha algorithm. We will also show that the method can be incorporated into all cover-strategy algorithms. Due to space limitations, we shall not discuss the incorporation of horizontal linearity in this paper. That method involves adapting techniques from matrix searching and will be presented elsewhere.
3.1 Incorporating Vertical Linearity
We denote by d(·, ·) the edit distance. The following lemmas incorporate vertical linearity in the Zhang-Shasha algorithm.

Lemma 1
1. d(∅, ∅) = 0.
2. ∀i ∈ T̃1, ∀i′ ∈ [l(i), i]: d(F̃1[l(i), i′], ∅) = d(F̃1[l(i), i′ − 1], ∅) + d(t̃1[i′], ∅).
3. ∀j ∈ T̃2, ∀j′ ∈ [l(j), j]: d(∅, F̃2[l(j), j′]) = d(∅, F̃2[l(j), j′ − 1]) + d(∅, t̃2[j′]).

Proof. Case 1 requires no edit operation. In case 2 and case 3, the distances correspond to the costs of deleting and inserting the nodes in F̃1[l(i), i′] and F̃2[l(j), j′], respectively.

Lemma 2. ∀(i, j) ∈ (T̃1, T̃2), ∀i′ ∈ [l(i), i] and ∀j′ ∈ [l(j), j]: if l(i′) = l(i) and l(j′) = l(j),
d(F̃1[l(i), i′], F̃2[l(j), j′]) = d(T̃1[i′], T̃2[j′]) ;
otherwise,
d(F̃1[l(i), i′], F̃2[l(j), j′]) = min {
  d(F̃1[l(i), i′ − 1], F̃2[l(j), j′]) + d(t̃1[i′], ∅),
  d(F̃1[l(i), i′], F̃2[l(j), j′ − 1]) + d(∅, t̃2[j′]),
  d(F̃1[l(i), l(i′) − 1], F̃2[l(j), l(j′) − 1]) + d(T̃1[i′], T̃2[j′])
} .

Proof. The condition “l(i′) = l(i) and l(j′) = l(j)” implies that the two forests are simply two trees and the equality clearly holds. We now consider the other condition in which “l(i′) ≠ l(i) or l(j′) ≠ l(j)”. If t1[α(i′)] = t1[β(i′)] and
t2[α(j′)] = t2[β(j′)], the formula holds as a known result. Otherwise, at least one of t̃1[i′] and t̃2[j′] corresponds to a v-component in (T1[α(i)], T2[α(j)]). Consider the connected components in (T1[α(i)], T2[α(j)]) corresponding to (t̃1[i′], t̃2[j′]). There are two cases to consider: either (1) there is no occurrence of a node-to-node match between the connected components; or (2) there is at least one occurrence of a node-to-node match between the connected components. In case 1, one of the components must be entirely deleted, which implies that either t̃1[i′] must be deleted or t̃2[j′] must be inserted. In case 2, in order to preserve the ancestor-descendant relationship, T̃1[i′] and T̃2[j′] must be matched.

Note. In Lemma 2, for the condition “l(i′) ≠ l(i) or l(j′) ≠ l(j)”, the value of d(T̃1[i′], T̃2[j′]) would already be available if implemented in a bottom-up order, since it involves a subproblem of d(F̃1[l(i), i′], F̃2[l(j), j′]) and would have been computed. For the condition “l(i′) = l(i) and l(j′) = l(j)”, however, we encounter the problem involving (T̃1[i′], T̃2[j′]) for the first time and must compute its value. We show how to compute d(T̃1[i′], T̃2[j′]) in the following lemmas.

Lemma 3. ∀u ∈ [β(i′), α(i′)]:
d(T1[u], F2[β(j′)]) = min {
  d(F1[u], F2[β(j′)]) + d(t1[u], ∅),
  min_{j′1 ≤ q ≤ j′l} { d(T1[u], T2[α(q)]) − d(∅, T2[α(q)]) } + d(∅, F2[β(j′)])
} .

Proof. This is the edit distance between the tree T1[u] and the forest F2[β(j′)]. There are two cases. In the first case, t1[u] is constrained to be deleted and the remaining substructure F1[u] is matched to F2[β(j′)]. In the second case, t1[u] is constrained to be matched to a node somewhere in F2[β(j′)]. This is equivalent to stating that T1[u] is constrained to be matched to a subtree in F2[β(j′)]. The question thus becomes finding a subtree in F2[β(j′)] to be matched to T1[u] so as to minimize the distance between T1[u] and F2[β(j′)] under such a constraint. This can be done by considering the set of all combinations in which exactly one tree in F2[β(j′)] is matched to T1[u] while the remainder of F2[β(j′)] is deleted. The minimum in this set is the edit distance for the second case.

Lemma 4. ∀v ∈ [β(j′), α(j′)]:
d(F1[β(i′)], T2[v]) = min {
  d(F1[β(i′)], F2[v]) + d(∅, t2[v]),
  min_{i′1 ≤ p ≤ i′k} { d(T1[α(p)], T2[v]) − d(T1[α(p)], ∅) } + d(F1[β(i′)], ∅)
} .

Proof. This is symmetric to that of Lemma 3.
Lemma 5. ∀u ∈ [β(i′), α(i′)] and ∀v ∈ [β(j′), α(j′)]:
d(T1[u], T2[v]) = min {
  d(F1[u], T2[v]) + d(t1[u], ∅),
  d(T1[u], F2[v]) + d(∅, t2[v]),
  d(F1[u], F2[v]) + d(t1[u], t2[v])
} .

Proof. This is a known result for the tree-to-tree edit distance.
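To make the structure of these minimizations concrete, here is a tiny sketch (our own; the argument names are invented for illustration) of how the Lemma 5 and Lemma 3 combinations look once the required sub-distances are available; the numbers in the usage lines are toy values only.

```python
def lemma5_tree_dist(d_f1_t2: int, d_t1_f2: int, d_f1_f2: int,
                     del_u: int, ins_v: int, relabel_uv: int) -> int:
    """Lemma 5: d(T1[u], T2[v]) from the three sub-distances already computed."""
    return min(d_f1_t2 + del_u,        # delete t1[u], match F1[u] to T2[v]
               d_t1_f2 + ins_v,        # insert t2[v], match T1[u] to F2[v]
               d_f1_f2 + relabel_uv)   # match t1[u] with t2[v]

def lemma3_tree_forest_dist(d_forest_forest: int, del_u: int,
                            candidates, ins_whole_forest: int) -> int:
    """Lemma 3: d(T1[u], F2[beta(j')]).  `candidates` holds pairs
    (d(T1[u], T2[alpha(q)]), d(empty, T2[alpha(q)])) for every tree of the
    forest; the second case matches T1[u] to exactly one of them and inserts
    the rest of the forest."""
    best_candidate = min(dt - d_ins for dt, d_ins in candidates)
    return min(d_forest_forest + del_u,
               best_candidate + ins_whole_forest)

if __name__ == "__main__":
    print(lemma5_tree_dist(2, 3, 1, 1, 1, 0))                  # -> 1
    print(lemma3_tree_forest_dist(4, 1, [(3, 2), (5, 1)], 6))  # -> 5
```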
Note. In the computation for every d(T̃1[i′], T̃2[j′]), we save the values of d(T1[u], T2[α(j′)]) for all u ∈ [β(i′), α(i′)] and of d(T1[α(i′)], T2[v]) for all v ∈ [β(j′), α(j′)]. This ensures that when d(T1[u], F2[β(j′)]) in Lemma 3 and d(F1[β(i′)], T2[v]) in Lemma 4 are evaluated in a bottom-up order, the values of the terms involving d(T1[u], T2[α(q)]) and d(T1[α(p)], T2[v]) are already available.

Lemma 6. d(T̃1[i′], T̃2[j′]) = d(T1[α(i′)], T2[α(j′)]).

Proof. The result follows from the tree definitions.
3.2 The New Algorithm
For every node i of a tree T, we designate a child of i, if any, to be its special child, denoted by sc(i). Note that in the Zhang-Shasha algorithm sc(i) is the leftmost child of i, whereas under a different cover strategy the choice of sc(i) may be different. Denote by p(i) the parent of i. We define a set of nodes, called key roots, for a tree T as follows:

keyroots(T) = {k | k = root(T) or k ≠ sc(p(k))} .

This is a generalized version of the LR keyroots used in [10] and is suitable for any known decomposition strategy as in [2,4,10]. Referring to Figure 2, in every special path the highest numbered node in a left-to-right post-order is a key root. We now give the new algorithm in Algorithms 1 and 2. Algorithm 1 contains the main loop which repeatedly calls Algorithm 2 to compute d(T̃1[i], T̃2[j]), where (i, j) are key roots in (T̃1, T̃2).

Theorem 1. The new algorithm correctly computes d(T1, T2).

Proof. The correctness of all the computed values in Algorithm 2 follows from the lemmas. By Lemma 6, d(T̃1, T̃2) = d(T1, T2) when (i′, j′) are set to be the roots of (T̃1, T̃2). Since these roots are key roots, d(T1, T2) is always computed by Algorithm 1.

Algorithm 1. Computing d(T1, T2)
Input: (T̃1, T̃2)
Output: d(T̃1[i], T̃2[j]), where 1 ≤ i ≤ |T̃1| and 1 ≤ j ≤ |T̃2|
1: Sort keyroots(T̃1) and keyroots(T̃2) in increasing order into arrays K1 and K2, respectively
2: for i′ ← 1, ..., |keyroots(T̃1)| do
3:   for j′ ← 1, ..., |keyroots(T̃2)| do
4:     i ← K1[i′]
5:     j ← K2[j′]
6:     Compute d(T̃1[i], T̃2[j]) by Algorithm 2
7:   end for
8: end for
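A small sketch (ours) of the generalized key roots, namely the root plus every node that is not the special child of its parent, for either choice of special child:

```python
from typing import List, Tuple

Tree = Tuple[str, list]   # (label, children)

def subtree_size(t: Tree) -> int:
    return 1 + sum(subtree_size(c) for c in t[1])

def keyroots(t: Tree, strategy: str = "leftmost") -> List[str]:
    """Labels of the key roots: the root plus every node that is not the
    special child of its parent (sc(i) = leftmost child for Zhang-Shasha,
    a largest child for the heavy-path strategies)."""
    roots = [t[0]]                     # the root is always a key root

    def visit(node: Tree) -> None:
        if not node[1]:
            return
        if strategy == "leftmost":
            sc = 0
        else:
            sizes = [subtree_size(c) for c in node[1]]
            sc = max(range(len(sizes)), key=sizes.__getitem__)
        for k, child in enumerate(node[1]):
            if k != sc:
                roots.append(child[0])
            visit(child)

    visit(t)
    return roots

if __name__ == "__main__":
    t = ("a", [("b", []), ("c", [("d", []), ("e", []), ("f", [])])])
    print(keyroots(t, "leftmost"))   # ['a', 'c', 'e', 'f']
```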
Algorithm 2. Computing d(T̃1[i], T̃2[j])
1: d(∅, ∅) ← 0
2: for i′ ← l(i), ..., i do
3:   d(F̃1[l(i), i′], ∅) ← d(F̃1[l(i), i′ − 1], ∅) + d(t̃1[i′], ∅)
4: end for
5: for j′ ← l(j), ..., j do
6:   d(∅, F̃2[l(j), j′]) ← d(∅, F̃2[l(j), j′ − 1]) + d(∅, t̃2[j′])
7: end for
8: for i′ ← l(i), ..., i do
9:   for j′ ← l(j), ..., j do
10:     if l(i′) = l(i) and l(j′) = l(j) then
11:       for u ← β(i′), ..., α(i′) do
12:         d(T1[u], F2[β(j′)]) ← min{ d(F1[u], F2[β(j′)]) + d(t1[u], ∅),  min_{j′1 ≤ q ≤ j′l}{ d(T1[u], T2[α(q)]) − d(∅, T2[α(q)]) } + d(∅, F2[β(j′)]) }
13:       end for
14:       for v ← β(j′), ..., α(j′) do
15:         d(F1[β(i′)], T2[v]) ← min{ d(F1[β(i′)], F2[v]) + d(∅, t2[v]),  min_{i′1 ≤ p ≤ i′k}{ d(T1[α(p)], T2[v]) − d(T1[α(p)], ∅) } + d(F1[β(i′)], ∅) }
16:       end for
17:       for u ← β(i′), ..., α(i′) do
18:         for v ← β(j′), ..., α(j′) do
19:           d(T1[u], T2[v]) ← min{ d(F1[u], T2[v]) + d(t1[u], ∅),  d(T1[u], F2[v]) + d(∅, t2[v]),  d(F1[u], F2[v]) + d(t1[u], t2[v]) }
20:         end for
21:       end for
22:       d(T̃1[i′], T̃2[j′]) ← d(T1[α(i′)], T2[α(j′)])
23:     else
24:       d(F̃1[l(i), i′], F̃2[l(j), j′]) ← min{ d(F̃1[l(i), i′ − 1], F̃2[l(j), j′]) + d(t̃1[i′], ∅),  d(F̃1[l(i), i′], F̃2[l(j), j′ − 1]) + d(∅, t̃2[j′]),  d(F̃1[l(i), l(i′) − 1], F̃2[l(j), l(j′) − 1]) + d(T̃1[i′], T̃2[j′]) }
25:     end if
26:   end for
27: end for
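For reference, here is a compact, runnable sketch (ours, with unit costs) of the unmodified Zhang-Shasha computation that the else branch of Algorithm 2 (lines 23-24) mirrors; it does not implement the new v-component handling, and all function names are our own.

```python
from typing import List, Tuple

Tree = Tuple[str, list]   # (label, children)

def flatten(t: Tree):
    """Post-order labels (1-based) and l[i] = leftmost leaf of the subtree at node i."""
    labels: List[str] = [""]      # dummy entry so indices start at 1
    l: List[int] = [0]

    def visit(node: Tree) -> int:
        first = None
        for child in node[1]:
            leaf = visit(child)
            if first is None:
                first = leaf
        labels.append(node[0])
        l.append(first if first is not None else len(labels) - 1)
        return l[-1]

    visit(t)
    return labels, l

def zhang_shasha(t1: Tree, t2: Tree) -> int:
    """Classic dynamic program; relabeling costs 0 when the labels already agree."""
    lab1, l1 = flatten(t1)
    lab2, l2 = flatten(t2)
    n, m = len(lab1) - 1, len(lab2) - 1
    # key roots: nodes with no higher-numbered node sharing the same leftmost leaf
    kr1 = [i for i in range(1, n + 1) if all(l1[i] != l1[k] for k in range(i + 1, n + 1))]
    kr2 = [j for j in range(1, m + 1) if all(l2[j] != l2[k] for k in range(j + 1, m + 1))]
    td = [[0] * (m + 1) for _ in range(n + 1)]          # tree-to-tree distances

    def treedist(i: int, j: int) -> None:
        # fd[x][y] = distance between forests F1[l1[i], l1[i]+x-1] and F2[l2[j], l2[j]+y-1]
        fd = [[0] * (j - l2[j] + 2) for _ in range(i - l1[i] + 2)]
        for x in range(1, i - l1[i] + 2):
            fd[x][0] = fd[x - 1][0] + 1                  # deletions
        for y in range(1, j - l2[j] + 2):
            fd[0][y] = fd[0][y - 1] + 1                  # insertions
        for x in range(1, i - l1[i] + 2):
            for y in range(1, j - l2[j] + 2):
                u, v = l1[i] + x - 1, l2[j] + y - 1
                if l1[u] == l1[i] and l2[v] == l2[j]:    # both prefixes are whole subtrees
                    fd[x][y] = min(fd[x - 1][y] + 1,
                                   fd[x][y - 1] + 1,
                                   fd[x - 1][y - 1] + (lab1[u] != lab2[v]))
                    td[u][v] = fd[x][y]
                else:                                    # fall back on a stored subtree distance
                    fd[x][y] = min(fd[x - 1][y] + 1,
                                   fd[x][y - 1] + 1,
                                   fd[l1[u] - l1[i]][l2[v] - l2[j]] + td[u][v])

    for i in kr1:                                        # key roots in increasing order
        for j in kr2:
            treedist(i, j)
    return td[n][m]

if __name__ == "__main__":
    t1 = ("f", [("a", []), ("b", [])])
    t2 = ("f", [("a", []), ("c", [])])
    print(zhang_shasha(t1, t2))   # 1: relabel 'b' to 'c'
```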
Theorem 2. The new algorithm runs in O(n² + ñ⁴) time and O(n²) space, where n is the original input size and ñ ≤ n.

Proof. We first consider the time complexity. The v-reduced trees can be built in linear time in a preprocessing step. Identifying and sorting the key roots can be done in linear time. As well, all the values associated with insertion or deletion of a subtree or a sub-forest, as appearing in Lemma 3 and Lemma 4,
can be obtained beforehand in linear time during a tree traversal. The Zhang-Shasha algorithm has a worst-case running time of O(ñ⁴) for an input of size ñ. Referring to Algorithm 2, we consider the block from line 11 to line 21, which concerns the computation involving the (i′, j′) pairs on leftmost paths, based on Lemmas 3, 4 and 5. This is the part of the algorithm that structurally differs from its counterpart in the Zhang-Shasha algorithm. For each such (i′, j′) pair, this part takes O((α(i′) − β(i′) + 1) × (j′l − j′1 + 1) + (α(j′) − β(j′) + 1) × (i′k − i′1 + 1) + (α(i′) − β(i′) + 1) × (α(j′) − β(j′) + 1)) time. All the subproblems of (T1, T2) associated with these (i′, j′) pairs are disjoint. Summing over all these pairs, we can bound this part of the work by O(n²). Hence, the overall time complexity is O(n² + ñ⁴).

We now consider the space complexity. We use three different arrays: the full-tree array, the reduced-tree array and the reduced-forest array. The reduced-forest array is a temporary array and its values can be rewritten during the computation of the reduced-forest distances. The other two arrays are permanent arrays for storing tree distances. The space for the reduced-tree array and the reduced-forest array is bounded by O(ñ²). The space for the full-tree array is bounded by O(n²). Hence, the space complexity is O(n²).

Theorem 3. Given (T1, T2) of maximum size n, the edit distance d(T1, T2) can be computed in O(n² + φ(A, ñ)) time, where φ(A, ñ) is the time complexity of any cover-strategy algorithm A applied to an input of size ñ, with ñ ≤ n.

Proof. Since a v-component consists of a simple path, there is only one way a dynamic program can recurse along this path regardless of which strategy is used. Hence, Lemmas 3 to 6 are valid for all cover strategies. Lemmas 1 and 2, after a proper adjustment of the subtree orderings in each forest to adapt to the given strategy, are also valid. The theorem follows from Theorems 1 and 2 when the lemmas are properly embedded in any cover-strategy algorithm.
4 Application
We describe one application which would benefit from our result, namely RNA secondary structure comparison. RNA is a molecule consisting of a single strand of nucleotides (abbreviated as A, C, G and U) which folds back onto itself by means of hydrogen bonding between distant complementary nucleotides, giving rise to a so-called secondary structure. The secondary structure of an RNA molecule can be topologically represented by a tree. An example is depicted in Figure 5. In this representation, an internal node represents a pair of complementary nucleotides interacting via hydrogen bonding. When a number of such pairs are stacked up, they form a local structure called a stem, which corresponds to a v-component in the tree representation. The secondary structure plays an important role in the functions of RNA [5]. Therefore, comparing the secondary structures of RNA molecules can help in understanding their comparative functions. One way to compare two trees is to compute the edit distance between them.
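As an illustration (not taken from the paper), the following sketch builds such a tree from a sequence and its dot-bracket secondary structure; stacked base pairs (stems) come out as chains of single-child nodes, i.e., v-components. The node representation and function name are our own, and real structures would come from a prediction or database file rather than a hand-written string.

```python
def rna_tree(seq: str, structure: str):
    """Build a tree from an RNA sequence and a dot-bracket secondary structure.
    Each base pair '(...)' becomes an internal node labeled with the two paired
    bases; each unpaired base '.' becomes a leaf.  Nodes are [label, children]."""
    root = ["root", []]
    stack = [root]
    opens = []                          # positions of '(' not yet matched
    for k, ch in enumerate(structure):
        if ch == "(":
            node = ["", []]             # label filled in when the pair closes
            stack[-1][1].append(node)
            stack.append(node)
            opens.append(k)
        elif ch == ")":
            i = opens.pop()
            node = stack.pop()
            node[0] = seq[i] + seq[k]   # label = the two paired bases
        else:                           # '.' : unpaired base
            stack[-1][1].append([seq[k], []])
    return root

if __name__ == "__main__":
    seq = "GGGAAACCC"
    dotb = "(((...)))"
    print(rna_tree(seq, dotb))
    # ['root', [['GC', [['GC', [['GC', [['A', []], ['A', []], ['A', []]]]]]]]]]
```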
Fig. 5. Left: Secondary folding of RNA. Dotted lines represent hydrogen bonds. Right: The corresponding tree representation.

Table 1. Reduction of tree sizes for selected tRNA molecules [8]

Name                            |T|   |T̃|   Reduction (%)
Athal-chr4.trna25                52    35    33
cb25.fpc2454.trna15              52    40    23
CHROMOSOME I.trna38              51    38    25
chr1.trna1190                    51    34    33
Acinetobacter sp ADP1.trna45     55    38    31
Aquifex aeolicus.trna21          55    38    31
Azoarcus sp EbN1.trna58          55    38    31
Bacillus subtilis.trna63         52    35    33
Aeropyrum pernix K1.trna3        56    39    30
Sulfolobus tokodaii.trna25       53    36    32
To gain an impression of the effect, Table 1 lists the size reductions for a set of selected tRNA molecules, where |T| is the size of the original tree and |T̃| is the size of the v-reduced tree; the last column shows the size reduction in percent. On average, we observe a size reduction of nearly one third of the original size, which roughly translates into a one-half decrease of running time for the known cover-strategy algorithms.
5 Conclusions
We presented a new method for computing the tree edit distance by incorporating structural linearity. This method can work with any existing cover-strategy based algorithm to yield a time complexity of O(n² + φ(A, ñ)), where n is the original input size and φ(A, ñ) has the same form as the time complexity of the given algorithm A when it is applied to an input of size ñ, with ñ ≤ n. The magnitude of ñ is inversely related to the degree of linearity. This result would be useful when ñ is in general substantially smaller than n. Therefore, incorporating our technique into any existing cover-strategy algorithm may yield improved performance in such situations. One application which can readily benefit from this improvement is RNA secondary structure comparison in computational biology.
References

1. Chen, W.: New algorithm for ordered tree-to-tree correction problem. Journal of Algorithms 40(2), 135–158 (2001)
2. Demaine, E.D., Mozes, S., Rossman, B., Weimann, O.: An optimal decomposition algorithm for tree edit distance. In: Proceedings of the 34th International Colloquium on Automata, Languages and Programming (to appear)
3. Dulucq, S., Touzet, H.: Decomposition algorithms for the tree edit distance problem. Journal of Discrete Algorithms 3, 448–471 (2005)
4. Klein, P.N.: Computing the edit-distance between unrooted ordered trees. In: Proceedings of the 6th European Symposium on Algorithms (ESA), pp. 91–102 (1998)
5. Moore, P.B.: Structural motifs in RNA. Annual Review of Biochemistry 68, 287–300 (1999)
6. Sleator, D.D., Tarjan, R.E.: A data structure for dynamic trees. Journal of Computer and System Sciences 26, 362–391 (1983)
7. Tai, K.: The tree-to-tree correction problem. Journal of the ACM 26(3), 422–433 (1979)
8. Genomic tRNA Database. http://lowelab.ucsc.edu/gtrnadb/
9. Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. Journal of the ACM 21(1), 168–173 (1974)
10. Zhang, K., Shasha, D.: Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing 18(6), 1245–1262 (1989)