Fast Cyclic Edit Distance Computation with Weighted Edit Costs in Classification

Guillermo Peris, Andrés Marzal
Departament de Llenguatges i Sistemes Informàtics
Universitat Jaume I, Castelló, Spain
{peris, amarzal}@lsi.uji.es

Abstract

Cyclic edit distances are a good measure of contour shape dissimilarity. A Branch and Bound algorithm that speeds up the computation of cyclic edit distances with arbitrary weights for the edit operations is presented. The algorithm is modified to work with an external bound that further accelerates the computation when applied to classification problems.

1. Introduction

The contour of an object can be described by means of a cyclic string, that is, a chain code without a beginning or ending point. In order to compare cyclic strings, a (dis)similarity criterion, such as the cyclic edit distance, must be defined. Formally, given an alphabet $\Sigma$ and a string $x \in \Sigma^*$, a cyclic string $[x]$ is the set $\{\sigma^k(x) : 0 \le k < |x|\}$, where $\sigma^k(x) = x_{k+1} \ldots x_{|x|} x_1 \ldots x_k$. The cyclic edit distance is defined as $D([x],[y]) = \min_{x' \in [x]} \left( \min_{y' \in [y]} d(x', y') \right) = \min_{x' \in [x]} d(x', y)$, where $d(x, y)$ is the (non-cyclic) edit distance between the strings $x, y \in \Sigma^*$.

The edit distance $d(x, y)$ is the minimum (weighted) number of edit operations (insertion, deletion, and substitution) required to transform $x$ into $y$. The weights of the insertion, deletion, and substitution operations are denoted by $\gamma_I$, $\gamma_D$, and $\gamma_S$, respectively. The so-called Levenshtein distance is the edit distance with unit weights ($\gamma_I = \gamma_D = \gamma_S = 1$). Wagner and Fischer's algorithm [6] calculates $d(x, y)$ in $O(|x||y|)$ time by implicitly computing the minimum weighted edit path, $P(x, y)$, in the so-called edit graph.

A trivial algorithm yields the cyclic edit distance in $O(|x|^2 |y|)$ time by computing $d(\sigma^i(x), y)$ for all $0 \le i < |x|$. Maes introduced in [2] a Divide and Conquer algorithm [1] that runs in $O(|x||y| \log |x|)$ time. It is based on a property of optimal edit paths: $P(\sigma^j(x), y)$ never crosses $P(\sigma^i(x), y)$ nor $P(\sigma^k(x), y)$ for $0 \le i < j < k \le |x|$ on a special edit graph underlying the comparison of $x \cdot x$ and $y$. In [3], Marzal and Barrachina presented two Branch and Bound (BB) procedures [1] to speed up the computation of the cyclic Levenshtein distance. The BB methods have the same time complexity as Maes' algorithm, but were shown to significantly reduce the computational cost in practice.

When the computation of cyclic edit distances is applied to classification, the computational cost of the BB algorithms can be reduced further by using an external bound that stops the computation of an edit distance as soon as it is known to be worse than the best cyclic edit distance computed so far. In this paper we extend one of the lower bounds defined in [3] to deal with arbitrary weights $\gamma_I$, $\gamma_D$, and $\gamma_S$, and introduce the external bound. The efficiency of the new BB algorithms is compared to that of Maes' algorithm for different weighted cyclic edit distances.

In Section 2, the BB algorithm is generalized to deal with cyclic, weighted edit distances and with an external bound. In Section 3, the family of cyclic edit distances is partitioned into equivalence classes characterized by a single parameter. In Section 4, an experimental comparison of the non-cyclic edit distance (NCED), the new BB algorithm, and Maes' algorithm is presented for two different pattern recognition tasks. Finally, conclusions are presented in Section 5.
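To make the definitions in the introduction concrete, the following Python sketch implements the weighted edit distance with the Wagner and Fischer recurrence and the trivial cyclic algorithm that tries every rotation of x. It is only the baseline, not the BB algorithm of this paper nor Maes' algorithm. The names `edit_distance`, `cyclic_edit_distance`, `g_ins`, `g_del`, `g_sub`, and `bound` are illustrative assumptions. The optional `bound` argument is a simple row-minimum early exit in the spirit of the external bound; it is an assumption of this sketch rather than the bound defined later, and it is valid only for non-negative weights.

```python
from math import inf


def edit_distance(x, y, g_ins=1.0, g_del=1.0, g_sub=1.0, bound=inf):
    """Weighted edit distance d(x, y) via the Wagner-Fischer recurrence.

    Assumes non-negative weights. If every entry of a row exceeds
    `bound`, the final distance must exceed it too, so inf is returned.
    """
    # Row 0: transform the empty prefix of x into y[:j] with j insertions.
    prev = [j * g_ins for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        # Column 0: delete the first i symbols of x.
        curr = [i * g_del]
        for j in range(1, len(y) + 1):
            sub = prev[j - 1] + (0.0 if x[i - 1] == y[j - 1] else g_sub)
            curr.append(min(sub, prev[j] + g_del, curr[j - 1] + g_ins))
        if min(curr) > bound:
            return inf  # every edit path through this row is already too costly
        prev = curr
    return prev[-1]


def cyclic_edit_distance(x, y, **weights):
    """Trivial D([x], [y]): minimum of d(sigma^k(x), y) over all rotations.

    Runs in O(|x|^2 |y|) time; `weights` may carry g_ins, g_del, g_sub.
    """
    if not x:
        return edit_distance(x, y, **weights)
    best = inf
    for k in range(len(x)):
        rotated = x[k:] + x[:k]  # sigma^k(x)
        # The best value so far acts as the bound for the next rotation.
        best = min(best, edit_distance(rotated, y, bound=best, **weights))
    return best
```

For example, `cyclic_edit_distance("abcd", "cdab")` is 0 because "cdab" is a rotation of "abcd", while the non-cyclic `edit_distance("abcd", "cdab")` is strictly positive.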

2. The Branch & Bound method for generalized edit costs

The BB algorithm (Figure 1) is completely specified when the function g(i, k), a lower bound of mini