Contour-based Shape Retrieval Using Dynamic Time Warping
Andrés Marzal, Vicente Palazón, and Guillermo Peris*
{amarzal,palazon,peris}@lsi.uji.es
Dept. Llenguatges i Sistemes Informàtics. Universitat Jaume I de Castelló. Spain.
Abstract. A dissimilarity measure for shapes described by their contour, the Cyclic Dynamic Time Warping (CDTW) dissimilarity, is introduced. The dissimilarity measure is based on Dynamic Time Warping of cyclic strings, i.e., strings with no definite starting/ending points. The Cyclic Edit Distance algorithm by Maes cannot be directly extended to compute the CDTW dissimilarity, as we show in the paper. We present an algorithm that computes the CDTW dissimilarity in O(mn log n) time, where m and n are the lengths of the cyclic strings. Shape retrieval with the new dissimilarity measure is experimentally compared with the WARP system on a standard corpus.
1 Introduction
Contour matching is an important problem in shape classification and retrieval. Contours can be represented with cyclic strings: strings for which no starting point can be uniquely defined. A cyclic string can be viewed as the set of strings obtained by cyclically shifting a representative string. Let A = a_0 a_1 ... a_{m−1} be a string over a (possibly infinite) alphabet Σ and let Σ* be the closure of Σ under concatenation. A cyclic shift σ is a mapping σ : Σ* → Σ* defined as σ(a_0 a_1 ... a_{m−1}) = a_1 a_2 ... a_{m−1} a_0. Let σ^k denote the composition of k cyclic shifts and let σ^0 denote the identity. Two strings A and A′ are cyclically equivalent if A = σ^k(A′) for some k. A cyclic string is an equivalence class [A] = {σ^k(A) : 0 ≤ k < m}. Any of its members is a representative (non-cyclic) string. For instance, let Σ = {x, y} be an alphabet; the set {yxxx, xxxy, xxyx, xyxx} is a cyclic string and yxxx (or any other string in the set) can be taken as its representative.

The contour of a shape can be described in terms of cyclic strings using different sets Σ: Freeman chain codes, 2D points, curvature and/or distance-to-centroid values, etc.

Levenshtein defined the Edit Distance (ED) between two (non-cyclic) strings A and B as the minimum (weighted) number of edit operations (insertions, deletions, and substitutions) transforming A into B. Wagner and Fischer proposed in [11] an O(mn) time algorithm to compute the ED, where m and n are the lengths of A and B. The Cyclic Edit Distance (CED) between the cyclic strings [A] and [B] is the minimum (weighted) number of edit operations needed to transform [A] into [B] and is defined in terms of the minimum ED between any pair of representatives. A trivial O(m²n²) time algorithm to compute the CED consists of computing the ED for all mn pairs of representatives and choosing the minimum ED value. Maes proposed in [4] a Divide-and-Conquer algorithm to compute the CED in O(mn log n) time. In [6, 8], Marzal et al. improved the running time of this algorithm by proposing a Branch-and-Bound exploration of its search space. In [2], Bunke and Bühler obtained an approximate value for the CED in O(mn) time. Mollineda et al. proposed in [7] other heuristics to approximate the value of the CED.

* This work has been supported by the Spanish Ministerio de Ciencia y Tecnología and FEDER under grant TIC2002-02684.

Edit operations naturally arise in the comparison of some contour parameterizations such as cyclic Freeman chain codes, but are difficult to define properly when contours are represented by sequences of points or real values. Maes applied the (exact) cyclic edit distance computation to the recognition of shapes described with polygons [5]. He pointed out that the CED has some drawbacks when applied to this problem: it is sensitive to segmentation inconsistencies in the polygons. Each primitive of one polygon is either aligned with one and only one primitive of the other polygon (a substitution) or deleted/inserted, so similar regions of polygons represented by a different number of edges cannot be aligned.

A different dissimilarity measure for (non-cyclic) strings can be defined in terms of Dynamic Time Warping (DTW), which is based on the weight of an optimal alignment of two (non-cyclic) strings [9]. DTW is similar to ED, but the only “edit operations” allowed are (possibly one-to-many) substitutions. DTW has been successfully applied to speech recognition, on-line handwritten text recognition, time series alignment, etc. Some approaches to shape matching represent contours with global features such as Fourier descriptors or invariant moments [12].
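As an illustration of the definitions of cyclic string, ED and CED given above, the following Python sketch builds the class [A], computes the ED with the Wagner and Fischer dynamic program, and obtains the CED with the trivial algorithm that tries all pairs of representatives. Unit costs for all edit operations and non-empty input strings are assumptions made here for simplicity, and the function names are hypothetical. This is not the Divide-and-Conquer algorithm of Maes [4].

```python
from itertools import product


def cyclic_shift(s, k=1):
    """Return sigma^k(s): s rotated left by k positions."""
    if not s:
        return s
    k %= len(s)
    return s[k:] + s[:k]


def cyclic_string(s):
    """Return the equivalence class [s] as a set of representatives.

    Assumes s is non-empty.
    """
    return {cyclic_shift(s, k) for k in range(len(s))}


def edit_distance(a, b):
    """Wagner-Fischer dynamic program, O(mn) time.

    Unit costs for insertion, deletion and substitution are an
    assumption of this sketch; the paper allows weighted operations.
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def cyclic_edit_distance_bruteforce(a, b):
    """Minimum ED over all pairs of representatives of [a] and [b]."""
    return min(edit_distance(u, v)
               for u, v in product(cyclic_string(a), cyclic_string(b)))
```

For the example cyclic string above, `edit_distance("yxxx", "xxyx")` is 2, whereas `cyclic_edit_distance_bruteforce("yxxx", "xxyx")` is 0, because both strings are representatives of the same cyclic string.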
Recently, Bartolini et al. [1] have designed a shape retrieval system (WARP) that uses DTW to compare cyclic sequences of 2D points that have been “normalized” by manipulating their Fourier descriptors. The WARP system has been tested on a standard shape retrieval database, and its results outperform those of other DTW-based methods. In this paper, we propose a different Cyclic Dynamic Time Warping (CDTW) algorithm, which is inspired by the CED algorithm of Maes. We show that Maes’ algorithm cannot be directly extended to compute the CDTW and provide an O(mn log n) time algorithm for the CDTW dissimilarity computation. Experiments run on the same shape retrieval task presented in [1] show that our method provides significantly better results.
2 Dynamic Time Warping
Fig. 1. (a) Warping graph for xyyxx and xxyy. The optimal alignment is shown with thicker arrows and is (0, 0), (1, 0), (2, 1), (2, 2), (2, 3), (3, 4). (b) Aligned symbols.

Let A = a_0 a_1 ... a_{m−1} and B = b_0 b_1 ... b_{n−1} be two strings in Σ*. An alignment between A and B is a sequence of pairs (i_0, j_0), (i_1, j_1), ..., (i_{k−1}, j_{k−1}) such that (a) 0 ≤ i_ℓ < m and 0 ≤ j_ℓ < n; (b) 0 ≤ i_{ℓ+1} − i_ℓ ≤ 1 and 0 ≤ j_{ℓ+1} − j_ℓ ≤ 1; and (c) (i_ℓ, j_ℓ) ≠ (i_{ℓ+1}, j_{ℓ+1}). The pair (i_ℓ, j_ℓ) is said to align a_{i_ℓ} with b_{j_ℓ}. Each pair is weighted by means of a local dissimilarity function γ : Σ × Σ → R≥0. The γ function can be defined as the Euclidean distance between 2D points, the difference of curvatures, etc. The weight of an alignment is defined as Σ_{0≤ℓ<k} γ(a_{i_ℓ}, b_{j_ℓ}). The DTW dissimilarity D(A, B) is the weight of a minimum-weight alignment from (0, 0) to (m − 1, n − 1), that is, D(A, B) = d(m − 1, n − 1), where

\[
d(i, j) =
\begin{cases}
\gamma(a_0, b_0), & \text{if } i = 0 \text{ and } j = 0;\\
d(i-1, 0) + \gamma(a_i, b_0), & \text{if } i > 0 \text{ and } j = 0;\\
d(0, j-1) + \gamma(a_0, b_j), & \text{if } i = 0 \text{ and } j > 0;\\
\min\{d(i-1, j-1),\ d(i-1, j),\ d(i, j-1)\} + \gamma(a_i, b_j), & \text{if } i > 0 \text{ and } j > 0.
\end{cases}
\tag{1}
\]
This equation formulates the D(A, B) computation problem as a shortest path problem in the so-called warping graph, an array of nodes (i, j), where 0 ≤ i < m and 0 ≤ j < n, connected by horizontal, vertical and diagonal arcs (see Fig. 1). All arcs incident on the same node (i, j) are weighted with γ(a_i, b_j). Each path from (0, 0) to (m − 1, n − 1) defines an alignment between A and B containing a pair (i, j) for each traversed node (i, j). The weight of a path (and of its corresponding alignment) is the sum of the weights of its arcs. Since the warping graph is acyclic, the optimal path can be computed by Dynamic Programming in O(mn) time.

Aligned pairs of symbols in DTW can be likened to substitutions in ED, but DTW allows for one-to-many correspondences. This makes DTW appropriate to model “elastic distortions” of strings describing shapes or time series. On the other hand, DTW alignments have no insertions or deletions and seem preferable to ED when these operations do not naturally arise. There are alternative definitions of the DTW (different arcs in the warping graph, or weighting functions that treat diagonal arcs differently). For the sake of clarity, we will consider only the DTW dissimilarity as defined by recurrence (1).

There is a fundamental flaw when trying to compare cyclic strings with DTW: DTW is sensitive to the choice of the starting point. Fig. 2 illustrates this problem for cyclic strings of curvature values representing contours.

Fig. 2. (a) Two fish shapes. The black dots indicate starting points in their (counterclockwise) contour coding as strings of curvature values. (b) Optimal alignment of the curvatures starting at black dots. The absolute value of the distance between aligned points is shown under the alignment. The DTW dissimilarity is the sum of these values. (c) A more significant alignment is possible if the second shape string starts at 2′.
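A minimal Python sketch of recurrence (1) may help to make the computation and the starting-point problem concrete. The default γ (absolute difference of real values, as one might use for curvatures) and the function names are assumptions of this sketch. The helper that evaluates DTW against every cyclic shift of the second string is only an illustration of the sensitivity to the starting point; it is not the CDTW algorithm proposed in this paper.

```python
def dtw(a, b, gamma=lambda x, y: abs(x - y)):
    """DTW dissimilarity D(a, b) = d(m-1, n-1), following recurrence (1).

    gamma is the local dissimilarity function; the default (absolute
    difference of numbers) is an assumption made for this sketch.
    """
    m, n = len(a), len(b)
    if m == 0 or n == 0:
        raise ValueError("both sequences must be non-empty")
    d = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            cost = gamma(a[i], b[j])
            if i == 0 and j == 0:
                d[i][j] = cost
            elif j == 0:                      # i > 0 and j = 0
                d[i][j] = d[i - 1][0] + cost
            elif i == 0:                      # i = 0 and j > 0
                d[i][j] = d[0][j - 1] + cost
            else:                             # i > 0 and j > 0
                d[i][j] = min(d[i - 1][j - 1],
                              d[i - 1][j],
                              d[i][j - 1]) + cost
    return d[m - 1][n - 1]


def dtw_over_starting_points(a, b, gamma=lambda x, y: abs(x - y)):
    """DTW between a and each cyclic shift of b (illustration only)."""
    return [dtw(a, b[k:] + b[:k], gamma) for k in range(len(b))]


# Strings of Fig. 1 with a 0/1 local dissimilarity (an assumed gamma).
mismatch = lambda x, y: int(x != y)
fig1_value = dtw("xxyy", "xyyxx", gamma=mismatch)

# The same contour coded from different starting points.
contour = [0, 1, 2, 3]
by_start = dtw_over_starting_points(contour, contour)
```

With the 0/1 dissimilarity, `fig1_value` is 2, which is the weight of the alignment listed in the caption of Fig. 1. In `by_start`, only the shift by 0 yields a dissimilarity of 0; every other starting point yields a positive value even though the cyclic string is the same.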
3 The WARP System
Bartolini et al. have presented in [1] a shape retrieval system that compares (cyclic) strings of 2D points describing contours with DTW: the so-called WARP system. DTW cannot be directly applied to sequences of 2D points, since it is sensitive to changes in position, scale, angle, and starting point (i.e., the chosen representative for the cyclic string). The WARP system proposes a normalization of the sequences of 2D points which is invariant to all these factors. This invariant representation is computed by properly manipulating the Fourier descriptors of the sequence (considered as a function of complex numbers), which is reconstructed from its new Fourier descriptors by an inverse Fourier transform.

Let A = a_0 a_1 ... a_{m−1} be a sequence of complex numbers where each element denotes a point in the complex plane (thus defining a 2D point). The Discrete Fourier Transform of A is a sequence of complex values

\[
a'_i = \sum_{0 \le k < m} a_k \, e^{-j 2 \pi k i / m}, \qquad i \in \{-m/2, \ldots, -1, 0, 1, \ldots, m/2 - 1\}
\]