A Comparative Study of Three Moment-Based Shape Descriptors

M. Emre Celebi and Y. Alp Aslandogan
Department of Computer Science and Engineering
University of Texas at Arlington
Arlington, TX 76019-0015 U.S.A.
{celebi,alp}@cse.uta.edu

Abstract

Shape is one of the fundamental visual features in the Content-Based Image Retrieval (CBIR) paradigm. Numerous shape descriptors have been proposed in the literature. These can be broadly categorized as region-based and contour-based descriptors. Contour-based shape descriptors make use of only the boundary information, ignoring the shape interior content. Therefore, these descriptors cannot represent shapes for which the complete boundary information is not available. On the other hand, region-based descriptors exploit both boundary and internal pixels, and are therefore applicable to generic shapes. Among the region-based descriptors, moments have been very popular since they were first introduced in the 1960s. In this paper we study and compare three moment-based descriptors: invariant moments, Zernike moments, and radial Chebyshev moments. Experiments on the MPEG-7 shape databases show that radial Chebyshev moments achieve the highest retrieval performance.
1. Introduction

Shape is one of the fundamental visual features in the Content-Based Image Retrieval (CBIR) paradigm. Numerous shape descriptors have been proposed in the literature. These can be broadly categorized as region-based and contour-based descriptors. Contour-based shape descriptors make use of only the boundary information, ignoring the shape interior content. Examples of contour-based shape descriptors include Fourier descriptors [1,2,3,4,5], wavelet descriptors [6,7], and the curvature scale space descriptor [8]. Since they are computed using only boundary pixels, their computational complexity is generally low. However, these descriptors cannot represent shapes for which the complete boundary information is not available, such as objects with holes, partially occluded objects, and complex objects consisting of multiple disconnected regions. On the other hand, region-based shape descriptors exploit both boundary and interior pixels of the shape. For this reason they are applicable to generic shapes and are more robust to noise and shape distortions. Among the region-based descriptors, moments have been very popular since they were first introduced in the 1960s [9,10]. These include geometric moments [9], invariant moments [9], Legendre moments [11], Zernike moments [11,12], and Chebyshev moments [13,14].

In this paper we study and compare three moment-based descriptors: invariant moments, Zernike moments, and radial Chebyshev moments. The retrieval performance of the descriptors is tested on two MPEG-7 shape databases and quantified by the classical precision and recall measures.
2. Invariant Moments (IM)

The general form of a moment function $\Phi_{pq}$ of order $(p+q)$ of an image intensity function $f(x,y)$ can be given as:

$$\Phi_{pq} = \iint_{x\,y} \Psi_{pq}(x,y)\, f(x,y)\, dx\, dy, \qquad p, q = 0, 1, 2, \ldots \tag{1}$$

where $\Psi_{pq}$ is known as the moment weighting kernel or the basis set. For a digital image $f(x,y)$, equation (1) can be rewritten in discrete form as:

$$m_{pq} = \sum_{x}\sum_{y} \Psi_{pq}(x,y)\, f(x,y), \qquad p, q = 0, 1, 2, \ldots \tag{2}$$

Geometric moments are the simplest of the moment functions, with basis $\Psi_{pq} = x^p y^q$. The basis set $\{x^p y^q\}$, while complete, is not orthogonal [11]. The central moments, which are invariant to translation, are defined as:

$$\mu_{pq} = \sum_{x}\sum_{y} (x - \bar{x})^p (y - \bar{y})^q f(x,y), \qquad p, q = 0, 1, 2, \ldots \tag{3}$$
where $\bar{x} = m_{10}/m_{00}$ and $\bar{y} = m_{01}/m_{00}$. A set of seven invariant moments (IM), which are invariant to rotation, scaling, and translation, is given by Hu [9]:

$$\begin{aligned}
\phi_1 &= \eta_{20} + \eta_{02} \\
\phi_2 &= (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \\
\phi_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \\
\phi_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2 \\
\phi_5 &= (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\
       &\quad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \\
\phi_6 &= (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \\
\phi_7 &= (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\
       &\quad + (3\eta_{12} - \eta_{30})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]
\end{aligned} \tag{4}$$

where

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \qquad \gamma = 1 + \frac{p+q}{2}, \qquad p + q = 2, 3, \ldots \tag{5}$$
IM are computationally simple. Moreover, they are invariant to rotation, scaling, and translation. However, they have several drawbacks [16]:

- Information redundancy: since the basis is not orthogonal, these moments suffer from a high degree of information redundancy.
- Noise sensitivity: higher-order moments are very sensitive to noise.
- Large variation in dynamic range: since the basis involves powers of x and y, the computed moments vary widely in dynamic range across different orders. This may cause numerical instability when the image size is large.

Since IM features have large variation, we normalize them to the [0,1] range by z-score normalization [16].
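To make equations (3)-(5) concrete, the following minimal sketch evaluates Hu's invariants directly on a binary image. It assumes the shape is stored as a 2-D NumPy array `f` with object pixels set to 1; all function names here are our own, not from the paper.

```python
import numpy as np

def central_moment(f, p, q):
    """Central moment mu_pq of equation (3)."""
    y, x = np.mgrid[:f.shape[0], :f.shape[1]]
    m00 = f.sum()
    xc = (x * f).sum() / m00   # x-bar = m10 / m00
    yc = (y * f).sum() / m00   # y-bar = m01 / m00
    return ((x - xc) ** p * (y - yc) ** q * f).sum()

def hu_moments(f):
    """Hu's seven rotation/scale/translation invariants, equation (4)."""
    mu00 = central_moment(f, 0, 0)
    def eta(p, q):
        # Normalized central moments, equation (5)
        return central_moment(f, p, q) / mu00 ** (1 + (p + q) / 2)
    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    phi3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    phi4 = (n30 + n12) ** 2 + (n21 + n03) ** 2
    phi5 = ((n30 - 3 * n12) * (n30 + n12)
            * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
            + (3 * n21 - n03) * (n21 + n03)
            * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2))
    phi6 = ((n20 - n02) * ((n30 + n12) ** 2 - (n21 + n03) ** 2)
            + 4 * n11 * (n30 + n12) * (n21 + n03))
    phi7 = ((3 * n21 - n03) * (n30 + n12)
            * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
            + (3 * n12 - n30) * (n21 + n03)
            * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2))
    return np.array([phi1, phi2, phi3, phi4, phi5, phi6, phi7])
```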
3. Zernike Moments (ZM)

Teague [11] suggested the use of continuous orthogonal moments to overcome the problems associated with the geometric and invariant moments. He introduced two different continuous orthogonal moments, Zernike and Legendre moments, based on the orthogonal Zernike and Legendre polynomials, respectively. Several studies have shown the superiority of Zernike moments over Legendre moments due to their better feature representation capability and low noise sensitivity [16]. Therefore, we choose Zernike moments as our second shape descriptor.

The complex Zernike moments are derived from the orthogonal Zernike polynomials:

$$V_{nm}(x,y) = V_{nm}(r\cos\theta, r\sin\theta) = R_{nm}(r)\cdot\exp(jm\theta)$$

where $R_{nm}(r)$ is the orthogonal radial polynomial:

$$R_{nm}(r) = \sum_{s=0}^{(n-|m|)/2} (-1)^s \frac{(n-s)!}{s!\left(\frac{n-2s+|m|}{2}\right)!\left(\frac{n-2s-|m|}{2}\right)!}\, r^{n-2s} \tag{6}$$

with $n = 0, 1, 2, \ldots$; $0 \le |m| \le n$; and $n - |m|$ even.

Zernike polynomials are a complete set of complex-valued functions orthogonal over the unit disk, i.e., $x^2 + y^2 \le 1$. The Zernike moment of order $n$ with repetition $m$ of a continuous function $f(x,y)$ is given by:

$$Z_{nm} = \frac{n+1}{\pi} \iint_{x^2+y^2 \le 1} f(x,y)\cdot V^{*}_{nm}(x,y)\, dx\, dy \tag{7}$$

For a digital image $f(x,y)$, equation (7) can be approximated as:

$$Z_{nm} = \frac{n+1}{\pi} \sum_{x}\sum_{y} f(x,y)\cdot V^{*}_{nm}(x,y) = \frac{n+1}{\pi} \sum_{r}\sum_{\theta} f(r\cos\theta, r\sin\theta)\cdot R_{nm}(r)\cdot\exp(-jm\theta), \qquad x^2 + y^2 \le 1 \tag{8}$$

If the image is rotated by an angle $\alpha$, the transformed Zernike moments $Z'_{nm}$ are given by:

$$Z'_{nm} = Z_{nm}\cdot e^{-jm\alpha} \tag{9}$$
This means that the magnitude of the moments stays the same after rotation. Hence, the magnitudes of the Zernike moments of the image, $|Z_{nm}|$, can be taken as rotation-invariant features [12].

Zernike moments (ZM) have the following advantages:

- Rotation invariance: as shown above, the magnitudes of Zernike moments are invariant to rotation.
- Robustness: they are robust to noise and minor variations in shape [12,15].
- Expressiveness: since the basis is orthogonal, they have minimum information redundancy [11].

However, the computation of ZM (and of continuous orthogonal moments in general) poses several problems [13]:

- Coordinate space normalization: the image coordinate space must be transformed to the domain where the orthogonal polynomial is defined (the unit disk for the Zernike polynomials).
- Numerical approximation of continuous integrals: the continuous integrals in equation (1) must be approximated by discrete summations. This approximation not only leads to numerical errors in the computed moments, but also severely affects analytical properties such as rotation invariance and orthogonality.
- Computational complexity: the computational complexity of the radial Zernike polynomial increases as the order becomes large.

ZM features are normalized to the [0,1] range by z-score normalization [16].
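A direct, unoptimized rendering of equations (6)-(8) follows. The mapping of the pixel grid onto the unit disk and the handling of pixels outside it are our own implementation choices, not prescribed by the equations.

```python
import numpy as np
from math import factorial

def radial_poly(r, n, m):
    """Radial polynomial R_nm(r) of equation (6); n - |m| must be even."""
    m = abs(m)
    R = np.zeros_like(r)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * factorial(n - s)
             / (factorial(s)
                * factorial((n - 2 * s + m) // 2)
                * factorial((n - 2 * s - m) // 2)))
        R += c * r ** (n - 2 * s)
    return R

def zernike_moment(f, n, m):
    """Discrete approximation of equation (8) for a square image f."""
    N = f.shape[0]
    # Map pixel coordinates into the unit disk of equation (7)
    y, x = (np.mgrid[:N, :N] - (N - 1) / 2.0) / ((N - 1) / 2.0)
    r = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    mask = r <= 1.0                    # pixels outside the disk are ignored
    V_conj = radial_poly(r, n, m) * np.exp(-1j * m * theta)   # V*_nm
    return (n + 1) / np.pi * (f * V_conj * mask).sum()

# Per equation (9), the rotation-invariant feature is the magnitude:
# z = abs(zernike_moment(f, 4, 2))
```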
4. Radial Chebyshev Moments (RCM)

Mukundan et al. [13] suggested the use of discrete orthogonal moments to eliminate the problems associated with continuous orthogonal moments. They introduced Chebyshev (a.k.a. Tchebichef) moments based on the discrete orthogonal Chebyshev polynomials, and showed that Chebyshev moments are superior to geometric, Zernike, and Legendre moments in terms of image reconstruction capability. However, this first formulation of Chebyshev moments was not rotation invariant. Recently, Mukundan [14] introduced radial Chebyshev moments, which do possess the rotational invariance property. In this section we present a brief overview of the radial Chebyshev moments (RCM).

The scaled orthogonal Chebyshev polynomials for an image of size $N \times N$ are defined by the following recurrence:

$$\begin{aligned}
t_0(x) &= 1 \\
t_1(x) &= (2x - N + 1)/N \\
t_p(x) &= \frac{(2p-1)\,t_1(x)\,t_{p-1}(x) - (p-1)\left\{1 - \frac{(p-1)^2}{N^2}\right\} t_{p-2}(x)}{p}, \qquad p > 1
\end{aligned} \tag{10}$$

and the squared norm $\rho(p, N)$ is given by:

$$\rho(p, N) = \frac{N\left(1 - \frac{1}{N^2}\right)\left(1 - \frac{2^2}{N^2}\right)\cdots\left(1 - \frac{p^2}{N^2}\right)}{2p + 1}, \qquad p = 0, 1, \ldots, N - 1 \tag{11}$$

The radial Chebyshev moment of order $p$ and repetition $q$ is defined as:

$$S_{pq} = \frac{1}{2\pi\,\rho(p, m)} \sum_{r=0}^{m-1} \sum_{\theta=0}^{2\pi} t_p(r)\, e^{-jq\theta} f(r, \theta) \tag{12}$$

where $m = (N/2) + 1$. In equation (12), both $r$ and $\theta$ take discrete values. The mapping between $(r, \theta)$ and the image coordinates $(x, y)$ is given by:

$$x = \frac{rN\cos\theta}{2(m-1)} + \frac{N}{2}, \qquad y = \frac{rN\sin\theta}{2(m-1)} + \frac{N}{2} \tag{13}$$

Similar to ZM, it can be shown that the magnitudes of RCM, $|S_{pq}|$, are invariant to rotation [14]. RCM features are normalized to the [0,1] range by z-score normalization [16].
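The sketch below follows equations (10)-(13) literally: the radial polynomials come from the recurrence in (10), and the image is sampled along concentric circles through the mapping (13). The angular sampling density `n_angles` is an assumption we introduce; equation (12) does not fix it.

```python
import numpy as np

def chebyshev_poly(p, x, N):
    """Scaled discrete Chebyshev polynomial t_p(x) of equation (10)."""
    t_prev = np.ones_like(x, dtype=float)          # t_0
    if p == 0:
        return t_prev
    t1 = (2 * x - N + 1) / N                       # t_1
    t_curr = t1
    for k in range(2, p + 1):                      # recurrence for p > 1
        t_next = ((2 * k - 1) * t1 * t_curr
                  - (k - 1) * (1 - (k - 1) ** 2 / N ** 2) * t_prev) / k
        t_prev, t_curr = t_curr, t_next
    return t_curr

def squared_norm(p, N):
    """Squared norm rho(p, N) of equation (11)."""
    rho = N / (2 * p + 1)
    for k in range(1, p + 1):
        rho *= 1 - k ** 2 / N ** 2
    return rho

def radial_chebyshev_moment(f, p, q, n_angles=64):
    """Radial Chebyshev moment S_pq of equation (12) for a square image f."""
    N = f.shape[0]
    m = N // 2 + 1                                  # m = (N / 2) + 1
    S = 0j
    for r in range(m):
        t_pr = chebyshev_poly(p, np.array(float(r)), m)
        for k in range(n_angles):
            theta = 2 * np.pi * k / n_angles
            # Map (r, theta) back to pixel coordinates, equation (13)
            x = int(round(r * N * np.cos(theta) / (2 * (m - 1)) + N / 2))
            y = int(round(r * N * np.sin(theta) / (2 * (m - 1)) + N / 2))
            if 0 <= x < N and 0 <= y < N:
                S += t_pr * np.exp(-1j * q * theta) * f[y, x]
    return S / (2 * np.pi * squared_norm(p, m))     # |S_pq| is the feature
```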
5. Shape Normalization
In general, there are four basic forms of planar shape distortion caused by changes in the viewer's location: rotation, scaling, translation, and skewing [18]. According to the MPEG-7 standard [17], a good shape descriptor should be invariant to these distortions. Among the three shape descriptors described in the previous sections, invariant moments are invariant to rotation, scaling, and translation, while Zernike moments and radial Chebyshev moments are invariant only to rotation. We can use a normalization algorithm called shape compacting [18] to normalize the shapes before feature computation. This algorithm normalizes a shape and its distorted versions (scaled, translated, skewed) so that they all become similar to each other. Therefore, after shape compacting, the three moment-based shape descriptors are effectively invariant to scaling, translation, and skewing. The only remaining distortion type, rotation, is not a problem, since all of these shape descriptors are inherently invariant to rotation.

There are three major steps in the shape compacting algorithm [18]: (1) computing the shape dispersion matrix M, (2) aligning the coordinate axes with the eigenvectors of M, and (3) rescaling the axes using the eigenvalues of M. In the following discussion, f(x,y) denotes the digital image function, for which f(x,y) = 1 indicates that (x,y) is an object pixel and f(x,y) = 0 indicates that (x,y) is a background pixel.
5.1. Computing the shape dispersion matrix

For a given shape, we first compute its dispersion matrix M. This matrix is a key element in the normalization process: the two later steps, rotating and rescaling the coordinate axes, which actually carry out the shape compacting, are both based on it. After normalization, the shape has a dispersion matrix equal to an identity matrix multiplied by a constant, an indication that the shape is in its most compact form.

To compute the dispersion matrix, we first calculate the shape centroid:

$$\bar{x} = \frac{1}{A}\sum_{x}\sum_{y} x\cdot f(x,y) \tag{14}$$

$$\bar{y} = \frac{1}{A}\sum_{x}\sum_{y} y\cdot f(x,y) \tag{15}$$

where A is the total number of object pixels: $A = \sum_{x}\sum_{y} f(x,y)$.

The shape dispersion matrix M is a 2-by-2 matrix:

$$M = \begin{bmatrix} m_{1,1} & m_{1,2} \\ m_{2,1} & m_{2,2} \end{bmatrix} \tag{16}$$

with the elements defined as follows:

$$\begin{aligned}
m_{1,1} &= \left(\frac{1}{A}\sum_{x}\sum_{y} x^2\cdot f(x,y)\right) - \bar{x}^2 \\
m_{1,2} &= m_{2,1} = \left(\frac{1}{A}\sum_{x}\sum_{y} x\cdot y\cdot f(x,y)\right) - \bar{x}\cdot\bar{y} \\
m_{2,2} &= \left(\frac{1}{A}\sum_{x}\sum_{y} y^2\cdot f(x,y)\right) - \bar{y}^2
\end{aligned} \tag{17}$$

If we consider each object pixel as a data point, the shape can be viewed as a cluster of pixels, and the dispersion matrix M computed above is exactly the covariance matrix of this cluster. In pattern recognition, the covariance matrix is used to decouple correlated features and to scale the features so that clusters become compact. Similarly, here the shape dispersion matrix is used to normalize a shape by making it compact.

5.2. Shifting and rotating the coordinate axes

Next we shift the origin of the coordinate system to the center of the shape and rotate the coordinate system according to the eigenvectors of the dispersion matrix M. The orthogonal rotation matrix consists of the two normalized eigenvectors, E1 and E2, of M. In order to find the eigenvectors, we first need the two eigenvalues, λ1 and λ2, of M:

$$\lambda_{1,2} = \frac{m_{1,1} + m_{2,2} \pm \sqrt{(m_{1,1} - m_{2,2})^2 + 4m_{1,2}^2}}{2} \tag{18}$$

It can be shown that the normalized eigenvectors E1 and E2 are given by:

$$E_1 = \begin{bmatrix} e_{1x} \\ e_{1y} \end{bmatrix} = \begin{bmatrix} \dfrac{m_{1,2}}{\sqrt{(\lambda_1 - m_{1,1})^2 + m_{1,2}^2}} \\ \dfrac{\lambda_1 - m_{1,1}}{\sqrt{(\lambda_1 - m_{1,1})^2 + m_{1,2}^2}} \end{bmatrix}, \qquad E_2 = \begin{bmatrix} e_{2x} \\ e_{2y} \end{bmatrix} = \begin{bmatrix} \dfrac{m_{1,2}}{\sqrt{(\lambda_2 - m_{1,1})^2 + m_{1,2}^2}} \\ \dfrac{\lambda_2 - m_{1,1}}{\sqrt{(\lambda_2 - m_{1,1})^2 + m_{1,2}^2}} \end{bmatrix} \tag{19}$$

Now we can construct a matrix R from E1 and E2:

$$R = \begin{bmatrix} E_1^T \\ E_2^T \end{bmatrix} = \begin{bmatrix} e_{1x} & e_{1y} \\ e_{2x} & e_{2y} \end{bmatrix} \tag{20}$$

Since M is real and symmetric, E1 and E2 are orthogonal to each other; furthermore, they are normalized to unit length. Thus, R is an orthonormal matrix. We now transform the coordinate system by first translating the origin to the shape center and then multiplying the coordinates by R. Each object pixel location (x, y) is mapped to a new location (x′, y′) given by:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = R\cdot\begin{bmatrix} x - \bar{x} \\ y - \bar{y} \end{bmatrix} \tag{21}$$

Since R is orthonormal, the geometric interpretation of the transformation by R is a pure coordinate rotation; the new coordinate axes point in the same directions as E1 and E2. It can be shown that the dispersion matrix M′ of the translated and rotated shape is given by:

$$M' = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \tag{22}$$

5.3. Changing the scales of the bases

In the previous step, we rotated the coordinate system so that the new x-axis points in the direction in which the shape is most dispersed; the effect of the rotation on the dispersion matrix is that it is now diagonal. Since our objective is a shape whose dispersion matrix is a scaled identity matrix, in this last step we change the scales of the two axes according to the eigenvalues λ1 and λ2. That is, each object pixel location (x′, y′) is mapped to a new location (x″, y″) through a transformation defined by W:

$$\begin{bmatrix} x'' \\ y'' \end{bmatrix} = W\cdot\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} c/\sqrt{\lambda_1} & 0 \\ 0 & c/\sqrt{\lambda_2} \end{bmatrix}\cdot\begin{bmatrix} x' \\ y' \end{bmatrix} \tag{23}$$

where c is a system-wide constant. Since W is diagonal, the effect of this step is to change the scales of the two coordinate basis vectors so that the shape is in its most compact form, with a normalized size. This concludes the shape normalization process. Figure 1 shows two different views of the letter 'K' and their normalized versions. Note that the normalization does not preserve the orientation of the shape, since we rotate the coordinate system in step 2. This is not a problem, however, since our shape descriptors are all invariant to rotation.
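Under these conventions, the whole compacting pipeline reduces to a few lines of linear algebra. The sketch below uses NumPy's covariance and eigendecomposition routines in place of the closed-form equations (17)-(19), which compute the same quantities; the value of `c` is an arbitrary choice of ours.

```python
import numpy as np

def compact_shape(f, c=16.0):
    """Shape compacting [18] on a binary image f (object pixels = 1).
    Returns normalized object-pixel coordinates; resampling the result
    back onto a pixel grid is omitted. Assumes a non-degenerate shape
    (both eigenvalues positive)."""
    ys, xs = np.nonzero(f)                       # object pixel coordinates
    pts = np.stack([xs, ys]).astype(float)       # 2 x A array of (x, y)
    centroid = pts.mean(axis=1, keepdims=True)   # equations (14)-(15)
    M = np.cov(pts, bias=True)                   # dispersion matrix, (16)-(17)
    lams, E = np.linalg.eigh(M)                  # eigenvalues (18), eigenvectors (19)
    R = E.T                                      # rotation matrix, equation (20)
    rotated = R @ (pts - centroid)               # translate and rotate, (21)
    W = np.diag(c / np.sqrt(lams))               # scaling matrix, equation (23)
    return W @ rotated                           # dispersion is now c^2 * I
```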
Figure 1. Top row: two distorted shapes. Bottom row: their normalized versions.
6. Retrieval Experiments

The retrieval performance of the three moment-based descriptors is tested on both the contour shape database and the region shape database used in the MPEG-7 Core Experiments [17]. The contour shape database used here is Set B of the MPEG-7 contour shape database; it contains 1400 shapes classified into 70 classes, all of which are used as queries. The region shape database consists of 3621 shapes from over 500 classes; 31 classes of shapes, each having 21 members, are selected as queries.

Precision and recall are used as the evaluation criteria. Precision is defined as the ratio of the number of relevant retrieved shapes to the total number of retrieved shapes; it is a measure of retrieval accuracy. Recall is defined as the ratio of the number of relevant retrieved shapes to the total number of relevant shapes in the database; it is a measure of retrieval robustness. These measures are computed per query, as sketched after the figure captions below.

The average precision and recall over the 70 classes of shapes from the contour shape database are given in Figure 2, and the average precision and recall over the 31 classes of shapes from the region shape database are given in Figure 3. These figures clearly show that the RCM descriptor has the highest retrieval performance, while the IM descriptor has the lowest. Examples of the retrieval results of the three shape descriptors are shown in Figure 4.

Figure 2. Average retrieval performance (precision vs. recall) of the three shape descriptors on the contour shape database.

Figure 3. Average retrieval performance (precision vs. recall) of the three shape descriptors on the region shape database.
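For reference, the per-query computation behind these definitions amounts to the following; the names here are illustrative.

```python
def precision_recall(retrieved, query_class, n_relevant):
    """retrieved: class labels of the top-k shapes returned for a query;
    n_relevant: total number of shapes of query_class in the database."""
    hits = sum(1 for c in retrieved if c == query_class)
    precision = hits / len(retrieved)   # retrieval accuracy
    recall = hits / n_relevant          # retrieval robustness
    return precision, recall

# Example: 10 shapes retrieved, 7 from the query's 21-member class:
# precision = 7/10 = 0.70, recall = 7/21 = 0.33
```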
7. Conclusions

In this paper we have studied and compared three moment-based shape descriptors. The retrieval performance of the descriptors was tested on two MPEG-7 shape databases and quantified by the classical precision and recall measures. The retrieval experiments show that the RCM descriptor has the highest retrieval performance and the IM descriptor the lowest. These results are in line with the theoretical properties of the descriptors.
Figure 4. Examples of retrieval results using (a) the RCM descriptor, (b) the ZM descriptor, and (c) the IM descriptor. The first column is the region shape database, the second the contour shape database; in each screenshot, the top-left shape is the query shape.

8. Acknowledgements

This research is supported in part by NSF grant 0216500-EIA and by Texas Workforce Commission grant #3204600182.

9. References

[1] Granlund G.H. (1972) "Fourier Preprocessing for Hand Print Character Recognition" IEEE Trans. on Computers, C-21(2): 195-201
[2] Kauppinen H., Seppanen T., and Pietikainen M. (1995) "An Experimental Comparison of Autoregressive and Fourier-Based Descriptors in 2D Shape Classification" IEEE Trans. on Pattern Analysis and Machine Intelligence, 17(2): 201-207
[3] Persoon E. and Fu K.-S. (1977) "Shape Discrimination Using Fourier Descriptors" IEEE Trans. on Systems, Man and Cybernetics, SMC-7(3): 170-179
[4] Zhang D.S. and Lu G.J. (2001) "Shape Retrieval Using Fourier Descriptors" Proc. of the International Conference on Multimedia and Distance Education, pp. 1-9
[5] Zahn C.T. and Roskies R.Z. (1972) "Fourier Descriptors for Plane Closed Curves" IEEE Trans. on Computers, C-21(3): 269-281
[6] Müller K. and Ohm J.-R. (1999) "Contour Description Using Wavelets" Proc. of WIAMIS'99, pp. 77-80
[7] Chuang C.-H. and Kuo C.-C.J. (1996) "Wavelet Descriptor of Planar Curves: Theory and Applications" IEEE Trans. on Image Processing, 5(1): 56-70
[8] Mokhtarian F., Abbasi S., and Kittler J. (1996) "Robust and Efficient Shape Indexing through Curvature Scale Space" Proc. of the British Machine Vision Conference, pp. 53-62
[9] Hu M.-K. (1962) "Visual Pattern Recognition by Moment Invariants" IRE Trans. on Information Theory, IT-8: 179-187
[10] Alt F.L. (1962) "Digital Pattern Recognition by Moments" Journal of the ACM, 9(2): 240-258
[11] Teague M.R. (1980) "Image Analysis via the General Theory of Moments" Journal of the Optical Society of America, 70(8): 920-930
[12] Khotanzad A. (1988) "Rotation Invariant Pattern Recognition Using Zernike Moments" Proc. of the International Conference on Pattern Recognition, pp. 326-328
[13] Mukundan R., Ong S.H., and Lee P.A. (2001) "Image Analysis by Tchebichef Moments" IEEE Trans. on Image Processing, 10(9): 1357-1364
[14] Mukundan R. (2004) "A New Class of Rotational Invariants Using Discrete Orthogonal Moments" Proc. of the 6th IASTED Conference on Signal and Image Processing, pp. 80-84
[15] Kim H. and Kim J. (2000) "Region-Based Shape Descriptor Invariant to Rotation, Scale and Translation" Signal Processing: Image Communication, 16: 87-93
[16] Mukundan R. and Ramakrishnan K.R. (1998) "Moment Functions in Image Analysis: Theory and Applications" World Scientific Publishing Co., Singapore
[17] Jeannin S., Cieplinski L., Ohm J.-R., and Kim M. (2000) "MPEG-7 Visual Part of eXperimentation Model, Version 4.1" ISO/IEC JTC1/SC29/WG11 MPEG00/M5897
[18] Leu J.-G. (1989) "Shape Normalization Through Compacting" Pattern Recognition Letters, 10: 243-250