
2003 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS 2003) Awaji Island, Japan, December 7-10, 2003

D2-2

A Fast and Robust Image Matching Method Using Sign of Laplacian of Gaussian and Vector Quantization Net

Yasuhiro Fuchikawa, Shuichi Kurogi and Takeshi Nishida
Faculty of Engineering, Kyushu Institute of Technology, Kitakyushu 804-8550, Japan
Tel: +81-093-884-3188, Fax: +81-093-861-1159, E-mail: [email protected]

Abstract: A fast and robust image matching method using SLOG (sign of LOG (Laplacian of Gaussian)) and VQN (vector quantization net) is presented for finding occurrences of a retrieval image in a larger source image. A measure of match between two images called NCSLOG (normalized cross-correlation of SLOG images) is introduced for fast computation as well as for reducing high- and low-frequency noise and completely removing both constant and linearly varying intensity. A learning method of the VQN is introduced for making a small number of efficient template images, LOG filters and retrieval window images to search a source image transformed by coordinate transformations such as rotation, magnification, and projection.

1. Introduction

In order to search for occurrences of a retrieval image in a larger source image, a fast and robust image matching method using SLOG (sign of LOG (Laplacian of Gaussian)) [1] and a VQN (vector quantization net) [2], [3] is presented. Retrieval tasks of this kind have been investigated in many fields as one of the two common scenarios of template matching; the other scenario is to classify a given image as one of a number of template prototypes [4]. For template matching, the measure of match is important, and various measures have been studied. The correlation coefficient [5] and the sum of squared differences [6] are widely used since they are simple and explicit. However, these measures sometimes do not work well for images affected by illumination changes, occlusion, and so on [7]. To overcome these problems, several methods have been proposed [8]–[11]. On the other hand, there is another problem: the images to be matched sometimes involve coordinate transformations such as magnification, rotation, and projection, and the methods in [10], [11] are reported to be less robust to magnifications and rotations than the conventional correlation coefficient. In this paper, we present a method robust mainly to coordinate transformations, where we utilize the LOG filter [1]; it is well known that the LOG filter is an isotropic band-pass filter capable of reducing low- and high-frequency noise and completely removing both constant and linearly varying intensity, and the zero-crossings of the resulting LOG images and the binarized SLOG images are supposed to play significant roles in image processing in the brain. In addition, we take advantage of the SLOG images for fast computation of matching by means of introducing a measure of match called NCSLOG (normalized cross-correlation of the SLOG images). Further, we will show that the SLOG image changes less than the original image under a small coordinate transformation, which contributes to making a small number of efficient template images, LOG filters and retrieval window images for the present method to search a source image transformed by coordinate transformations such as rotation, magnification, and projection. Although image processing methods using LOG filters [12], [13] have been proposed, they have not analyzed the role of the LOG filters under coordinate transformations.

2. Fast Matching Using SLOG

2.1 FFT operations for cross-correlation


A two-dimensional image p is denoted by p ≜ p(x, y) or p ≜ p(x), where p(x) represents the intensity at the position x = (x, y)^T in the image. The cross-correlation of two images p(x) (x ∈ I_p ≜ {0, 1, ..., N−1}²) and q(x) (x ∈ I_q ≜ {0, 1, ..., M−1}²) is given by

$$R(x) \triangleq p * q \triangleq \sum_{i \in I_q} p(x+i)\, q(i), \qquad (1)$$

and is also obtained as

$$p * q = F^{-1}\big(F^{*}(p)\, F(q)\big), \qquad (2)$$

where F(·) is the Fourier transform, F*(·) is the conjugate of F(·), and F^{-1}(·) indicates the inverse Fourier transform. Thus, the computational complexity of Eq. (2) using three FFT (fast Fourier transform) operations is O_FFT = O(6N² log(N) + N²) for N ≥ M and N = 2^n for an integer n, while the complexity of the direct calculation of Eq. (1) is O_direct = O(N²M²). For example, O_FFT is about 75 times smaller than O_direct for N = 512 and M = 64. When the given images are p(x), consisting of N₁ × N₂ pixels whose positions x are in W_p ≜ {0, 1, ..., N₁−1} × {0, 1, ..., N₂−1}, and q(x), consisting of N₃ × N₄ pixels whose positions are in W_q ≜ {0, 1, ..., N₃−1} × {0, 1, ..., N₄−1}, the FFT operations are applied to the enlarged images p(x) w_p(x) and q(x) w_q(x) consisting of N² pixels, where N is the least integer fulfilling N ≥ N_i for i = 1, 2, 3, 4 and N = 2^n for an integer n, and the window functions are given by

$$w_p(x) \triangleq \begin{cases} 1 & \text{if } x \in W_p, \\ 0 & \text{otherwise}, \end{cases} \qquad (3)$$

$$w_q(x) \triangleq \begin{cases} 1 & \text{if } x \in W_q, \\ 0 & \text{otherwise}. \end{cases} \qquad (4)$$

Further, in the following, without loss of generality, we suppose

$$p(x) > 0 \ \text{ for } x \in W_p, \qquad (5)$$

$$q(x) > 0 \ \text{ for } x \in W_q. \qquad (6)$$
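The FFT route of Eq. (2) is easy to sketch in code. The following is a minimal NumPy illustration, not the authors' implementation; note that with NumPy's transform convention, Eq. (1) is realized as F^{-1}(F(p) F*(q)), which differs from the conjugation order written in Eq. (2) only by an index reversal, and that zero-padding to a power-of-two square makes the circular correlation agree with Eq. (1) away from the wrap-around border.

```python
import numpy as np

def cross_correlation_fft(p, q, size=None):
    """R(x) = sum_i p(x + i) q(i) (Eq. (1)) via three FFTs.

    p and q are real 2-D arrays; both are zero-padded to size x size,
    where size is the least power of two covering every side length.
    Values wrap circularly for x within q's extent of the padded border.
    """
    if size is None:
        n = max(p.shape + q.shape)
        size = 1 << (n - 1).bit_length()  # least power of two >= n
    P = np.fft.fft2(p, s=(size, size))
    Q = np.fft.fft2(q, s=(size, size))
    # F^{-1}(F(p) F*(q)) yields sum_i p(x + i) q(i) for real images
    return np.real(np.fft.ifft2(P * np.conj(Q)))
```

As a numeric check of the complexity claim above: for N = 512 and M = 64, the operation-count ratio N²M² / (6N² log₂ N + N²) = 64²/55 ≈ 75.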

2.2 NCSLOG

The present method searches for the position of an arbitrarily shaped retrieval image r(x) in a larger source image p(x), where r(x) is supposed to be given by

$$r(x) \triangleq q(x)\, w_r(x), \qquad (7)$$

$$w_r(x) \triangleq \begin{cases} 1 & x \in W_r, \\ 0 & \text{otherwise}, \end{cases} \qquad (8)$$

for a rectangular template q(x) and the window W_r representing the arbitrarily shaped area to be retrieved. In order to match p(x) and r(x), we introduce a correlation-based measure as follows. First, the LOG images are produced:

$$p_{LOG}(x) \triangleq p * (\nabla^2 G), \qquad (9)$$

$$q_{LOG}(x) \triangleq q * (\nabla^2 G), \qquad (10)$$

where

$$\nabla^2 G(x) \triangleq \frac{-1}{\pi \sigma^4}\left(1 - \frac{\|x\|^2}{2\sigma^2}\right) e^{-\frac{\|x\|^2}{2\sigma^2}} \qquad (11)$$

is the LOG band-pass filter, which is able to remove low- and high-frequency noise. Next, we produce the SLOG images

$$p_{SLOG}(x) \triangleq \mathrm{sgn}_{w_p(x)}(p_{LOG}(x)), \qquad (12)$$

$$r_{SLOG}(x) \triangleq \mathrm{sgn}_{w_r(x)}(q_{LOG}(x)), \qquad (13)$$

where

$$\mathrm{sgn}_{w(x)}(f(x)) \triangleq \begin{cases} 1 & f(x) \ge 0 \text{ and } w(x) = 1, \\ -1 & f(x) < 0 \text{ and } w(x) = 1, \\ 0 & w(x) = 0, \end{cases} \qquad (14)$$

is the sign image of f(x) for the window w(x). Finally, we introduce NCSLOG (normalized cross-correlation of SLOG images), the cross-correlation of p_SLOG and r_SLOG normalized by the number of pixels in W_r, i.e.,

$$R_{NCSLOG}(x) \triangleq \frac{1}{|W_r|}\,(p_{SLOG} * r_{SLOG}) = \frac{1}{|W_r|} \sum_{i \in W_r} p_{SLOG}(x+i)\, r_{SLOG}(i), \qquad (15)$$

Figure 1. Effect of the order of w_r: (a) template image q; (b) retrieval area W_r; (c) sgn_{w_r}(∇²G ∗ q); (d) sgn(∇²G ∗ (w_r q)).

where the position x = x_d with the maximum value of R_NCSLOG(x) over all x is taken as the detected position. Note that Eq. (15) is obtained via a single cross-correlation operation, or three FFT operations as shown before, while the usual normalized cross-correlation given by

$$R_{nCSLOG}(x) \triangleq \frac{\displaystyle\sum_{i \in W_r} p_{SLOG}(x+i)\, r_{SLOG}(i)}{\sqrt{\displaystyle\sum_{i \in W_r} p_{SLOG}(x+i)^2}\ \sqrt{\displaystyle\sum_{i \in W_r} r_{SLOG}(i)^2}} \qquad (16)$$

requires much more computation, although both have the same values for almost all x: for all x such that x + i ∈ W_p for all i ∈ W_r, we have R_NCSLOG(x) = R_nCSLOG(x), since p_SLOG(x+i)² = 1 and r_SLOG(i)² = 1. R_NCSLOG(x) differs from R_nCSLOG(x) for x such that x + i ∉ W_p for some i ∈ W_r, i.e., positions near the edge of W_p. The conventional R_nCSLOG(x) usually neglects such positions because both the denominator and the numerator become small and a large value in such an edge area is not reliable, while R_NCSLOG(x) takes the values in such edge areas into account because only the denominator becomes small. Here we should note that the rectangular window W_q of the template image q should be sufficiently larger than the arbitrarily shaped area of the retrieval image r, while W_r should not be too large but should contain the area to be retrieved (see Fig. 1(a) and (b)). Further, the window function w_r(x) should be applied after the LOG filter (see Fig. 1(c)); otherwise the border around W_r affects the retrieval SLOG image via the LOG filter (see Fig. 1(d)).
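Equations (9)–(15) translate directly into a short NumPy sketch. This is an illustrative reconstruction rather than the authors' code: the 3σ kernel truncation radius and the zero-padding size are our own practical assumptions.

```python
import numpy as np

def log_kernel(sigma, radius=None):
    """Discretized LOG filter of Eq. (11); the 3-sigma truncation
    radius is a practical choice, not specified in the paper."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    r2 = x**2 + y**2
    return (-1.0 / (np.pi * sigma**4)) * (1 - r2 / (2 * sigma**2)) \
        * np.exp(-r2 / (2 * sigma**2))

def slog(f_log, w):
    """Sign image of Eq. (14): +1/-1 inside the window w, 0 outside."""
    return np.where(f_log >= 0, 1, -1) * (w != 0)

def ncslog(p_slog, r_slog, n_r, size):
    """R_NCSLOG of Eq. (15): one FFT-based cross-correlation of the
    two SLOG images, normalized by n_r = |W_r| pixels."""
    P = np.fft.fft2(p_slog, s=(size, size))
    R = np.fft.fft2(r_slog, s=(size, size))
    return np.real(np.fft.ifft2(P * np.conj(R))) / n_r
```

With both windows all ones, matching a small patch of a SLOG image against its source yields R_NCSLOG = 1 exactly at the true offset, since every product p_SLOG(x+i) r_SLOG(i) there equals 1.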

3. VQN to Search Transformed Images

The source image p and the retrieval image r captured by cameras are usually not identical but related by transformations such as rotations, magnifications, and projections. We have therefore developed an efficient method using the VQN to match such transformed images.

3.1 Coordinate transformation



Let h(x) = (x_h(x), y_h(x))^T be a coordinate transformation which transforms p(x) to p(h(x)), where x_h(x) and y_h(x) are functions of x. The image transformation is also represented by the operator t_h given by

$$p(h(x)) = t_h(p(x)) \triangleq \iint \delta(x_h(x,y) - u)\, \delta(y_h(x,y) - v)\, p(u,v)\, du\, dv, \qquad (17)$$

where δ(·) is the delta function. In the following we examine projective transformations given by

$$h(x) = \frac{Ax}{b^T x + 1}, \qquad (18)$$

where

$$A \triangleq \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad b \triangleq \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}, \qquad (19)$$

and a_{11}, a_{12}, a_{21}, a_{22}, b_1, b_2 ∈ R. When b = 0, h(x) can represent a combination of linear transformations such as rotations and magnifications; otherwise it is nonlinear.

Figure 2. Relation of images for a linear transformation: t_A maps p(x) to p(Ax); filtering with ∇²G(x) and ∇²G(Ax) gives p_LOG(x | ∇²G(x)) and p_LOG(Ax | ∇²G(Ax)), and taking sgn_{w_p(x)} and sgn_{w_p(Ax)} gives p_SLOG(x | ∇²G(x)) → p_SLOG(Ax | ∇²G(Ax)) under t_A. (Here p_LOG(x | ∇²G(x)) denotes the LOG image of p filtered by ∇²G(x).)

3.2 Transformation of NCSLOG

Let R_NCSLOG(x) be the NCSLOG of p(x) and r(x), and R'_NCSLOG(x) be that of p'(x) = t_h(p(x)) and r'(x) = t_h(r(x)). If R_NCSLOG(x) is not distorted by t_h, i.e., if the relation

$$R'_{NCSLOG}(x) = t_h(R_{NCSLOG}(x)) \qquad (20)$$

is fulfilled, the position detected is the desired one:

$$x'_d \triangleq \arg\max_x R'_{NCSLOG}(x) = h\left(\arg\max_x R_{NCSLOG}(x)\right) = h(x_d). \qquad (21)$$
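For concreteness, the projective transformation h(x) = Ax/(b^T x + 1) of Eq. (18) can be sketched as a small coordinate function; the rotation example is hypothetical, and b = 0 recovers the linear case.

```python
import numpy as np

def projective_h(x, A, b):
    """h(x) = A x / (b^T x + 1), Eq. (18); with b = 0 this is the
    linear map h(x) = A x (rotation, magnification, ...)."""
    x = np.asarray(x, dtype=float)
    return (A @ x) / (b @ x + 1.0)

# hypothetical example: a 30-degree rotation (b = 0 keeps it linear)
theta = np.deg2rad(30.0)
A_rot = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
```

With b = 0 the denominator is 1 and lengths are preserved by the rotation; a nonzero b rescales each point by 1/(b^T x + 1), which is the source of the σ' = σ/(b^T x + 1) filter-width change derived below.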

Now, let h(x) = Ax. Then the LOG image of p(Ax) filtered by ∇²G(Ax) is given by

$$p_{LOG}(Ax \mid \nabla^2 G(Ax)) = \int p(Ax + Au)\, \nabla^2 G(Au)\, du = |A|^{-1}\, t_A\big(p_{LOG}(x \mid \nabla^2 G(x))\big), \qquad (22)$$

where |A| is the determinant of A and |A| > 0. Thus, the SLOG image is derived as

$$p_{SLOG}(Ax \mid \nabla^2 G(Ax)) = t_A\big(p_{SLOG}(x \mid \nabla^2 G(x))\big), \qquad (23)$$

and the relation is depicted in Fig. 2. Since the same relation holds for the retrieval image, the relation given by Eq. (20) holds, and we have Eq. (21).

Next, let h(x) = Ax/(b^T x + 1). Then the LOG image of p(h(x)) filtered by ∇²G(h(x)) is given by

$$p_{LOG}(h(x) \mid \nabla^2 G(h(x))) = \int p\!\left(\frac{Ax + Au}{b^T x + b^T u + 1}\right) \nabla^2 G\!\left(\frac{Au}{b^T u + 1}\right) du. \qquad (24)$$

Here, let b^T u ≪ 1, x' = Ax/(b^T x + 1) and u' = Au/(b^T x + 1); then du' = |A| du/(b^T x + 1) and

$$p_{LOG}(h(x) \mid \nabla^2 G(h(x))) \approx (b^T x + 1)\,|A|^{-1}\, t_h\big(p_{LOG}(x \mid \nabla^2 G_{\sigma'})\big), \qquad (25)$$

where ∇²G_{σ'} is the LOG filter with the Gaussian width σ' = σ/(b^T x + 1) instead of σ. Since we usually suppose (b^T x + 1)|A|^{-1} > 0, we have

$$p_{SLOG}(h(x) \mid \nabla^2 G(h(x))) \approx t_h\big(p_{SLOG}(x \mid \nabla^2 G_{\sigma'})\big), \qquad (26)$$

and the relation is shown in Fig. 3.

Figure 3. Relation of images for projection: t_h maps p(x) to p(h(x)); filtering with ∇²G_{σ'} and ∇²G(h(x)) and taking sgn gives p_SLOG(x | ∇²G_{σ'}) → p_SLOG(h(x) | ∇²G(h(x))) under t_h.

Since the same relation holds for the retrieval image, Eq. (20) is approximately fulfilled when ∇²G_{σ'} is used. However, when the area of W_r is small enough that b^T x ≪ 1, the projected image r(h(x)) can be approximated by the linearly transformed image r(Ax). Therefore, we can use ∇²G(h(x)) instead of ∇²G_{σ'}, noting that Eq. (20) then holds not for all x but for x = x_d. The above discussion indicates that, in order to search for r(h(x)) in a given source image p'(x) = p(h(x)), we should use ∇²G(h(x)) instead of ∇²G(x).

3.3 Matching using VQN

Suppose that the source image p(x) involves a subimage r(h(x)), where h(x) is unknown but the range of the parameter values (e.g., A and b in Eq. (19)) is known, so that h(x) is supposed to be in the infinite set H ≜ {h_j(x) | j = 0, 1, 2, ...}. So, we use the VQN

for obtaining a small number of efficient retrieval images r(h_i(x)) for i = 1, 2, ..., N_c, where N_c is the number of units of the VQN, and we introduce the CRL (competitive and reinitialization learning) [2], [3] based learning shown in Table 1 for producing efficient template images q_i = q(h_i(x)), retrieval window images w_i = w_r(h_i(x)), and LOG filters v_i = ∇²G(h_i(x)). Note that the CRL has been introduced to overcome the local-minima problem of conventional VQ algorithms, and its good performance in image processing has been verified in [3]. The present learning method is modified to store w_i and v_i together with the main quantization vector q_i. After the learning, the memories q_i, w_i and v_i are used for generating p_SLOG(x), r_SLOG(x) and R_NCSLOG(x) in Eq. (15). The retrieved position x = x_d is the position where R_NCSLOG(x) takes the largest value over all i = 0, 1, ..., N_c and x ∈ W_p.

Table 1. CRL-based learning of transformed images.

The VQN incrementally updates the stored images (vectors) q_i, w_i, v_i and a scalar d_i as follows. At each discrete time k = 1, 2, ..., K, a coordinate transformation h ≜ h_{j(k)}(x) ∈ H is generated randomly, where j(k) denotes the suffix of the coordinate transformation at time k. With h, the template image q ≜ q(h), the window function w ≜ w_r(h), and the LOG image v ≜ ∇²G(h) are produced, and the CRL-based learning is done as follows:

(1) The following condition, which we call the RIC (reinitialization condition), given by

$$d_c > r_d \langle d_i \rangle \quad \text{and} \quad H < r_H \ln(N_c),$$

is examined, where the c-th unit is supposed to have q_c closest to the input image q, ⟨d_i⟩ is the mean value of all d_i, r_d (> 1) and r_H (< 1) are constants, and

$$H \triangleq -\sum_j \frac{d_j}{\sum_k d_k} \ln \frac{d_j}{\sum_k d_k}$$

is the entropy estimating the equality of all d_i, which takes the maximum H_max = ln(N_c) when all d_i are equal.

(2) (Reinitialization) If the RIC holds, the following modification is applied:

$$q_i := \begin{cases} q & \text{if } i = l, \\ q_i & \text{otherwise}, \end{cases} \quad w_i := \begin{cases} w & \text{if } i = l, \\ w_i & \text{otherwise}, \end{cases} \quad v_i := \begin{cases} v & \text{if } i = l, \\ v_i & \text{otherwise}, \end{cases} \quad d_i := \begin{cases} \eta \langle d_i \rangle & \text{if } i = l \text{ or } i = c, \\ \eta d_i & \text{otherwise}, \end{cases}$$

where := indicates substitution, η (< 1) is a constant, and the l-th unit is supposed to have the minimum d_i.

(3) (Competitive learning) If the RIC does not hold, the following modification is applied:

$$q_i := \begin{cases} q_i + \alpha (q - q_i) & \text{if } i = c, \\ q_i & \text{otherwise}, \end{cases} \quad w_i := \begin{cases} w_i + \alpha (w - w_i) & \text{if } i = c, \\ w_i & \text{otherwise}, \end{cases} \quad v_i := \begin{cases} v_i + \alpha (v - v_i) & \text{if } i = c, \\ v_i & \text{otherwise}, \end{cases} \quad d_i := \begin{cases} \eta d_i + \|q - q_i\|^2 & \text{if } i = c, \\ \eta d_i & \text{otherwise}, \end{cases}$$

where α = 1 − H/ln(N_c) is the forgetting rate depending on the entropy H. The initial values of q_i, w_i and v_i are zero vectors, and the initial value of d_i is zero.

4. Analysis and Experimental Results

4.1 Computation time

Figure 4. Comparison of calculation time: time [s] (0.1 to 1000, log scale) versus size of the retrieval image [pixels] (20² to 100²) for NCSLOG(FFT), NCSLOG(direct), CC and SCC.
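The complexity figures of Sec. 2.1 can be checked numerically. A quick sketch, assuming operation counts O_FFT = 6N² log₂ N + N² and O_direct = N²M² (the base-2 logarithm is our assumption; the paper writes log(N)):

```python
def o_fft(n):
    # three FFTs plus the pointwise product: 6 N^2 log2(N) + N^2 operations
    return 6 * n * n * (n.bit_length() - 1) + n * n

def o_direct(n, m):
    # direct evaluation of Eq. (1): N^2 positions, M^2 products each
    return n * n * m * m

ratio = o_direct(512, 64) / o_fft(512)
```

The ratio is 4096/55 ≈ 74.5, consistent with the "about 75 times" figure quoted in Sec. 2.1.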


The present NCSLOG method requires only a single cross-correlation operation, as explained before; some computation is needed to obtain the SLOG images, but the most time-consuming part is the cross-correlation, whose highest-order complexity is O(N² log N) with FFT operations. In order to examine how fast the present method is, the computation time has been measured for a source image with N² = 512² pixels and a retrieval image with |W_r| = M² = 20², 40², 60², 80², and 100² pixels. The result is shown in Fig. 4, where four methods are compared: NCSLOG(FFT) denotes the present method using FFT operations, NCSLOG(direct) the present method using direct computation, CC the correlation coefficient, and SCC the selective correlation coefficient. The computation time of NCSLOG(FFT) is smaller than that of the other methods and depends only on N, while the other methods depend on both |W_r| and N. Concretely, when |W_r| = 20², NCSLOG(FFT) is about 3 times faster than NCSLOG(direct) and about 26 times faster than CC and SCC. When |W_r| = 100², NCSLOG(FFT) is about 50 times faster than NCSLOG(direct) and over 470 times faster than CC and SCC.

4.2 LOG and VQN for efficient matching

First, we show that the LOG image is robust to coordinate transformations. When the coordinate transformation h = h(x) changes slightly by δh, we have

$$p_{LOG}(h + \delta h) - p_{LOG}(h) = \int p(h + \delta h + u)\, \nabla^2 G(u)\, du - p_{LOG}(h) = \frac{1}{\pi \sigma^8} \int p(h + v)\, (v^T \delta h)^2\, e^{-\frac{\|v\|^2}{2\sigma^2}}\, dv + \sum_{i=3}^{\infty} O_i(\delta h), \qquad (27)$$

where O_i(δh) represents the i-th order term of δh; note that there is no first-order term. On the other hand, the change of the original image has a first-order term, i.e., p(h + δh) − p(h) = (∂p/∂h) δh + O_2(δh) + ···. Thus, the change of the LOG image can be smaller than that

of the original one for a small δh. When the LOG image does not change much, neither does the SLOG image nor R_NCSLOG, which indicates that the VQN requires only a small number of units.

To verify the efficiency of the present method, the following experiment was done. The source image (see Fig. 6) was captured from a certain angle, and another image was captured from a different angle to obtain the template image, where the digit "6" in the template image is to be retrieved in the source image, so we made the retrieval window W_r for the digit "6" in the template. As the relation between the template and the source images, we suppose the following projective transformation with parameters r, θ_R, θ_P and θ_Y, which represent a magnification, a roll angle, a pitch angle and a yaw angle, respectively:

$$h(x) = \frac{r \begin{pmatrix} c_R c_Y - s_P s_R s_Y & c_Y s_R + c_R s_P s_Y \\ -c_P s_R & c_P c_R \end{pmatrix} x}{(c_Y s_P s_R,\; c_R s_Y)\, x + 1}, \qquad (28)$$

where c_R ≜ cos θ_R, c_P ≜ cos θ_P, c_Y ≜ cos θ_Y, s_R ≜ sin θ_R, s_P ≜ sin θ_P, s_Y ≜ sin θ_Y. Using random parameter values r ∈ [0.8, 1.2], θ_R ∈ [−15°, 15°], θ_P ∈ [−10°, 10°] and θ_Y ∈ [−10°, 10°], we generated the images for training the VQN with three units, and obtained the weight vectors q_i, w_i and v_i (i = 1, 2, 3) shown in Fig. 5. In order to compare with CC and SCC, a rectangular window involving "6" and its transformed images were trained, and the w_i in Fig. 5 were obtained. The matching result in Fig. 6(a) shows that only NCSLOG found the correct position. We therefore tried an additional experiment using a VQN with five units; the result shown in Fig. 6(b) indicates that all methods found the correct position. These results indicate that the present method is able to produce a small number of efficient templates owing to the robustness of the LOG image to coordinate transformations shown above.

Figure 5. Weight vectors q_i, w_i and v_i after learning.

Figure 6. Matching result of each method (NCSLOG, SCC, CC): (a) N_c = 3; (b) N_c = 5.
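As a sanity check on Eq. (28) as reconstructed from the garbled source (the exact form of the denominator vector b is our assumption), the matrix A and vector b can be built from the pose parameters; with zero pitch and yaw the map reduces to a pure scaled rotation:

```python
import numpy as np

def pose_to_Ab(r, th_R, th_P, th_Y):
    """A and b of Eq. (28) (as reconstructed) from magnification r
    and roll/pitch/yaw angles in radians."""
    cR, sR = np.cos(th_R), np.sin(th_R)
    cP, sP = np.cos(th_P), np.sin(th_P)
    cY, sY = np.cos(th_Y), np.sin(th_Y)
    A = r * np.array([[cR * cY - sP * sR * sY, cY * sR + cR * sP * sY],
                      [-cP * sR,               cP * cR]])
    b = np.array([cY * sP * sR, cR * sY])
    return A, b
```

Setting θ_P = θ_Y = 0 gives b = 0 and A = r[[c_R, s_R], [−s_R, c_R]], i.e., the linear case of Sec. 3.1, which is consistent with the experiment's use of small pitch and yaw ranges.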

5. Conclusion

We have presented a fast image matching method robust to coordinate transformations using the SLOG images and the VQN. Comparative experiments have shown that the present method achieves faster matching than the other methods and generates more efficient memories for matching than the other methods.

References

[1] D. Marr, "Vision: A Computational Investigation into the Human Representation and Processing of Visual Information," W. H. Freeman and Company, 1982.
[2] T. Nishida, S. Kurogi, and T. Saeki, "Adaptive Vector Quantization using Re-Initialization Method," IEICE Trans., vol. J84-D-II, no. 7, pp. 1503–1511, 2001.
[3] T. Nishida, S. Kurogi, and T. Saeki, "An analysis of competitive and reinitialization learning for adaptive vector quantization," Proc. IJCNN'01, pp. 978–983, 2001.
[4] G. S. Cox, "Template matching and measures of match in image processing," Review, University of Cape Town, 1995.
[5] D. I. Barnea and H. F. Silverman, "A Class of Algorithms for Fast Digital Image Registration," IEEE Trans. on Computers, vol. C-21, pp. 179–186, 1972.
[6] J. K. Aggarwal, L. S. Davis, and W. N. Martin, "Correspondence processes in dynamic scene analysis," Proc. of the IEEE, vol. 69, no. 5, pp. 562–572, 1981.
[7] S. Kaneko, "Robust Image Registration for Real World Machine Vision," IEEJ Trans. (C), vol. 121-C, no. 5, pp. 830–834, 2001.
[8] I. Murase, S. Kaneko, and S. Igarashi, "Robust Matching by Increment Sign Correlation," IEICE Trans. (D-II), vol. J83-D-II, no. 5, pp. 1323–1331, 2000.
[9] Y. Satoh, S. Kaneko, and S. Igarashi, "Robust Image Registration Using Selective Correlation Coefficient," IEEJ Trans. (C), vol. 121-C, no. 4, pp. 800–807, 2001.
[10] F. Saitoh, "Robust Image Matching for Occlusion Using Vote by Block Matching," IEICE Trans. (D-II), vol. J84-D-II, no. 10, pp. 2270–2279, 2001.
[11] T. Ryugo, A. Miyamoto, S. Kaneko, and S. Igarashi, "Robust Image Registration by Sampled Rank Correlation," IIEEJ, vol. 31, no. 3, pp. 363–369, 2002.
[12] K. Sumi, M. Hashimoto, and H. Okuda, "Three-level Broad-edge Matching based Real-time Robot Vision," Proc. IEEE Int. Conf. on Robotics and Automation, pp. 1416–1422, 1995.
[13] T. Noguchi and S. Shigeru, "Data Compression of LoG Filter Output for Pre-Processing in Stereo Matching," IEICE Trans. (D-II), vol. J83-D-II, no. 9, pp. 1952–1956, 2000.

