Detecting Rotational Symmetries using Normalized Convolution

Björn Johansson, Hans Knutsson and Gösta Granlund
Computer Vision Laboratory, Department of Electrical Engineering
Linköping University, SE-581 83 Linköping, Sweden
{bjorn, knutte, gosta}@isy.liu.se

Abstract

Perceptual experiments indicate that corners and curvature are very important features in the process of recognition. This paper presents a new method to detect rotational symmetries, which describe complex curvature such as corners, circles, star, and spiral patterns. It works in two steps: first, extract local orientation from a gray-scale or color image; second, apply normalized convolution on the orientation image with rotational symmetry filters as basis functions. These symmetries can serve as feature points at a high abstraction level for use in hierarchical matching structures for 3D estimation, object recognition, image database retrieval, etc.

1. Introduction

Human vision seems to work in a hierarchical way: we first extract low level features such as local orientation and color, and then higher level features [3]. There also seem to exist lateral interactions between cells, perhaps to make them more selective. No one knows for sure what these high level features are, but there are some indications that curvature, circles, spiral, and star patterns are among them [9]. Indeed, perceptual experiments indicate that corners and curvature are very important features in the process of recognition, and one can often recognize an object from its curvature alone [2], [4]. They have a high degree of specificity and sparsity, and as they are point features they do not suffer from the aperture problem usually encountered for line and edge structures [11]. This paper describes a procedure to detect the features mentioned above. First, a local orientation transform image

(This work was supported by the Swedish Foundation for Strategic Research (SSF), project VISIT - VIsual Information Technology.)

Figure 1. Local orientations (left) and corresponding double angle descriptors as vectors (middle) and as complex numbers (right)

is calculated. This image is invariant to intensity which simplifies curvature detection. Second, this image is correlated with a set of filters to detect complex curvature features. The filter responses are made more selective using normalized convolution theory, which is an image processing tool that accounts for uncertainty in the data and spatial location of the basis functions. The rotational symmetry detection described in this paper is a nice illustration of the normalized convolution theory.

2. Local orientation

A standard representation of local orientation is simply a 2D gradient vector pointing in the dominant direction. A better representation is the double angle representation [10] [12], where we have a vector pointing in the double angle direction, i.e. if the orientation has the direction θ we represent it with a vector pointing in the 2θ-direction. Figure 1 illustrates the idea. The magnitude represents our confidence in the dominant orientation and is sometimes referred to as a certainty measure (often denoted c). This representation has at least two advantages:

• We avoid ambiguities in the representation: it does not matter if we choose to say that the orientation has the direction θ or, equivalently, θ + π. In the double angle representation both choices get the same descriptor angle 2θ (modulo 2π).


Figure 2. Examples of a simple signal (left) and non-simple signals (middle and right)

• Averaging the double angle orientation description field makes sense. One can argue that two orthogonal orientations should have maximally different representations, e.g. vectors that point in opposite directions. This is for instance useful in color images and vector fields when we want to fuse descriptors from several channels into one description.

This descriptor will in this paper be denoted by a complex number z, where ẑ = z/|z| points out the dominant orientation and c = |z| indicates the certainty or confidence in the orientation identified.

A definition before we continue:

Definition: A simple signal I : R^n → R is defined as I(x) = f(x · m̂), where f : R → R and m̂ ∈ R^n is a fixed vector.

Figure 2 shows some examples of simple and non-simple signals. The magnitude of the double angle descriptor is usually chosen to be proportional to the difference between the signal energy in the dominant direction and the energy in the orthogonal direction. In this case the double angle descriptor cannot distinguish between a weak (low energy) simple signal and a strong non-simple signal (e.g. noise with a slightly dominant orientation). We will get a low magnitude c = |z| (indicating no dominant orientation) in both cases, which sometimes may be undesirable.

Another, more comprehensive, representation of local orientation is the concept of tensors [13] [11]. A 2D orientation tensor is a 2×2 tensor (matrix) T = λ1 ê1 ê1^T + λ2 ê2 ê2^T, where λ1 denotes the energy in the dominant orientation direction ê1 and λ2 the energy in the orthogonal direction ê2 (by this definition we get λ1 ≥ λ2, and for a simple signal λ2 = 0). From this tensor we can calculate a double angle descriptor, e.g. by

z = (λ1 − λ2)/λ1 · (1 − e^(−‖T‖/T0)) · e^(i2∠ê1)    (1)

In this way we can control the selectivity for simple/non-simple signals and the energy sensitivity separately (for instance, a weak simple signal can now get a higher magnitude |z| than a strong non-simple signal). Note that the magnitude lies in the interval [0, 1], with |z| = 1 indicating high confidence and |z| = 0 low confidence in the dominant orientation.

There are different ways to compute the tensor in practice. One way is to look at the local energy in different directions using a set of quadrature filters and weigh them together in a proper way [13] [11]. The filtering can be efficiently computed by use of sequential filter trees [1]. Another way to compute a tensor, which is used in this paper, is to approximate (by weighted least squares) the local neighborhood I(x) with a polynomial of e.g. second degree

I(x) ∼ x^T A x + b^T x + c    (2)

where x, b ∈ R^2, c ∈ R, and A ∈ R^{2×2}. From the coefficients we can create a tensor description

T = A A^T + γ b b^T    (3)

where γ is a non-negative weight factor between the linear and quadratic terms. This local polynomial approximation can be done by means of convolutions, which in turn can be made separable, leading to a very efficient algorithm; see [7]. Computing all local orientations in 9 × 9 neighborhoods of a 256 × 256 gray-level image takes a couple of seconds in MATLAB on a 299 MHz SUN Ultra 10. Figure 4(a) shows a test image and Fig. 4(b) its local orientation transform z. This image will be used to illustrate the rotational symmetries.
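As an illustration, the tensor estimation of Eqs. (2)-(3) and the descriptor of Eq. (1) can be sketched as follows. This is a minimal, unoptimized version operating on one neighborhood at a time rather than using the separable convolutions of [7]; the Gaussian weight and the parameter values γ = 0.25, T0 = 1 are our own choices for illustration, not the paper's.

```python
import numpy as np

def poly_tensor(patch, gamma=0.25):
    """Fit I(x) ~ x^T A x + b^T x + c to one square patch by weighted
    least squares, then form the tensor T = A A^T + gamma b b^T (Eq. 3)."""
    r = patch.shape[0] // 2
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    # polynomial basis: 1, x, y, x^2, xy, y^2
    G = np.stack([np.ones_like(xs), xs, ys, xs**2, xs * ys, ys**2],
                 axis=-1).reshape(-1, 6).astype(float)
    # square-root weights so that lstsq minimizes the weighted residual
    sw = np.sqrt(np.exp(-(xs**2 + ys**2) / (2.0 * (r / 2.0)**2))).ravel()
    coef, *_ = np.linalg.lstsq(G * sw[:, None], patch.ravel() * sw, rcond=None)
    c0, bx, by, axx, axy, ayy = coef
    A = np.array([[axx, axy / 2], [axy / 2, ayy]])
    b = np.array([bx, by])
    return A @ A.T + gamma * np.outer(b, b)

def double_angle(T, T0=1.0):
    """Eq. (1): z = (l1 - l2)/l1 * (1 - exp(-||T||/T0)) * exp(i*2*angle(e1))."""
    lam, E = np.linalg.eigh(T)          # eigenvalues in ascending order
    l2, l1 = lam
    if l1 <= 0:
        return 0j
    mag = (l1 - l2) / l1 * (1 - np.exp(-np.linalg.norm(T) / T0))
    return mag * np.exp(2j * np.arctan2(E[1, 1], E[0, 1]))
```

For a linear ramp along x (a simple signal), A vanishes, T ≈ γ b b^T, and the descriptor points in the doubled x-direction; transposing the patch rotates the orientation by 90° and the descriptor by 180°.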

3. Normalized convolution

This section contains a short summary of the normalized convolution theory. The theory is thoroughly described in [8], [16], [7]. Normalized convolution approximates a signal with a linear combination of a set of basis functions. It takes into account the uncertainty in signal values and also permits spatial localization of the basis functions, which may have infinite support.

Let {bn}, n = 1, ..., N, be a set of N linearly independent vectors in C^M. They could for instance be the circular harmonic functions bn(ϕ) = e^(inϕ) described in Sec. 4.1, reshaped as vectors. Assume we want to represent a vector f ∈ C^M as a linear combination of {bn}, i.e.

f ∼ Σ_{n=1}^{N} sn bn = Bs    (4)

where B = [b1, ..., bN] and s = (s1, ..., sN)^T. With each signal vector f ∈ C^M we attach a certainty vector c ∈ R^M indicating our confidence in the values of f (cf. Sec. 2). Similarly we attach an applicability function a ∈ R^M to the basis functions {bn} to use as a window for spatial localization of the basis functions. We then use weighted least squares to approximate the signal:

arg min_{s∈C^N} ‖f − Bs‖²_W = arg min_{s∈C^N} (f − Bs)* W (f − Bs)    (5)

where W = Wa Wc, Wa = diag(a), Wc = diag(c), and * indicates conjugate transpose. The solution becomes

s = B̃* f    (6)

where

B̃ = WB(B* W B)^{-1}

The columns of B̃ are called the dual basis of {bn}. Doing this approximation in each local area of a (complex valued) image can be efficiently implemented by means of convolutions, hence the name normalized convolution.

If the total orientation information is too low we cannot rely on the result s. In the orientation image we had a certainty measure c indicating how well we can rely on the orientation information ẑ. We can use the same formalism for rotational symmetry images; call it symmetry certainty cs, indicating how well we can rely on s. There exist several suggestions for cs; see [7]. The one used in this paper is from [15]:

cs = ( det(B* Wa Wc B) / det(B* Wa B) )^{1/N}    (7)

which measures how much less distinguishable the basis functions become when we include uncertainty, compared to full certainty. The 1/N exponent makes cs proportional to c.
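To make Eqs. (5)-(7) concrete, here is a minimal sketch of one normalized-convolution solve for a single signal vector f. The function and variable names are our own, not from [8] or [7].

```python
import numpy as np

def normalized_convolution(f, c, B, a):
    """Weighted least-squares solution of f ~ Bs (Eqs. 5-6):
    s = (B* W B)^-1 B* W f with W = diag(a) diag(c),
    plus the symmetry certainty cs of Eq. (7)."""
    w = a * c                        # diagonal of W as a vector
    BW = B.conj().T * w              # rows of B* scaled by the weights
    G = BW @ B                       # B* W B
    s = np.linalg.solve(G, BW @ f)
    Ga = (B.conj().T * a) @ B        # B* Wa B (full certainty)
    N = B.shape[1]
    cs = (np.linalg.det(G).real / np.linalg.det(Ga).real) ** (1.0 / N)
    return s, cs
```

With full certainty (c ≡ 1) the two determinants coincide and cs = 1; zeroing out samples shrinks cs toward 0 while the coefficients s remain the weighted best fit.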

4. Rotational symmetries

4.1. Basic theory

The reader can verify that a circle can be described in the double angle representation as ce^(i2ϕ) and a star pattern as ce^(i2ϕ)e^(iπ). Therefore we can use the same filter b = e^(i2ϕ) and correlate it with the orientation transform image,

(z ⋆ b)(x) = Σ_{y∈Z²} z(y − x) b*(y)    (8)

to detect both patterns. We will get a high response for both patterns but with different phase (0 resp. π). The filter described above is only one of many in a family of filters called rotational symmetry filters:

n:th order rotational symmetry filter:  wn(r, ϕ) = a(r) bn(ϕ),  where  bn(ϕ) = e^(inϕ)

a(r) is called the applicability function and can be thought of as a window function (cf. Sec. 3). The basis function bn is called a circular harmonic function. The filter detects the class of patterns that can be described by ce^(i(nϕ+α)) (they are said to have the angular modulation speed n). Each class member is represented by a certain phase α. One can also think of the filter responses as components in a polar Fourier series of local areas in the orientation image.

Figure 3. Gray-level patterns corresponding to some rotational symmetries e^(i(nϕ+α)), for n = −3, ..., 4 and α = 0, π/2, π, 3π/2

What image patterns correspond to the orientation transform ce^(i(nϕ+α))? To answer this one has to start with the orientation image z = ce^(i(nϕ+α)) and go backwards to the original image, I, from which the orientation image was calculated. This 'inverse' is of course not unique, but we get a hint by making the assumption that the image gradient is parallel to the dominant orientation, ∇I = ±|∇I|e^(i(nϕ+α)/2). (We have to divide the phase by two to get rid of the double angle representation. The price is the direction ambiguity.) This gives a differential equation that can be solved assuming polar separability (the solution is not derived here). Figure 3 contains a sample of these psychedelic patterns.

The most useful rotational symmetries (i.e. the most common visual patterns in our daily lives) are the

• zeroth order, n = 0: detects lines (simply an averaging of the orientation image).

• first order, n = 1 (also called parabolic symmetries, referring to the gray-level patterns the filter is matched to): detects curvature, corners and line-endings.

• second order, n = 2 (also called circular symmetries): detects circles, stars and spiral patterns.

The rotational symmetry filters were invented around 1981 by Granlund and Knutsson, but first mentioned in a patent from 1986 [14] and later in [11], [5], [6].

The result of correlating the orientation image in Fig. 4(b) with the symmetry filters w1 resp. w2 is shown in Fig. 4(c,d). (The choice of applicability is not critical in this case. It is chosen as a Gaussian with a size about the same as the circle.) The intensities represent the response magnitude and the vectors represent the complex values at the maxima points. Note that the responses are not as selective as one might wish.
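A sketch of how such a filter can be built and applied (Eq. 8) to a synthetic circle-pattern orientation image. The Gaussian applicability and all sizes are arbitrary choices for illustration.

```python
import numpy as np

def symmetry_filter(n, radius, sigma):
    """n:th order rotational symmetry filter w_n(r, phi) = a(r) exp(i n phi)
    with a Gaussian applicability a(r), sampled on a square grid."""
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    a = np.exp(-(xs**2 + ys**2) / (2.0 * sigma**2))
    return a * np.exp(1j * n * np.arctan2(ys, xs))

def correlate(z, w):
    """Valid-region correlation: out(x) = sum_y z(x + y) w(y)^* (cf. Eq. 8)."""
    k = w.shape[0]
    out = np.zeros((z.shape[0] - k + 1, z.shape[1] - k + 1), complex)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(z[i:i + k, j:j + k] * np.conj(w))
    return out

# Orientation transform of an ideal circle pattern: z = exp(i*2*phi)
ys, xs = np.mgrid[-16:17, -16:17]
z_circle = np.exp(2j * np.arctan2(ys, xs))
resp = correlate(z_circle, symmetry_filter(2, 8, 4.0))
```

At the image center the response magnitude peaks with phase 0; a star pattern (−z_circle) gives the same magnitude with phase π, illustrating how the phase identifies the class member.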


4.2. Consistency operation


A previous solution to achieve better selectivity is described in [14] and is used in a commercial image processing program by ContextVision (http://www.contextvision.se/). The patent describes a method termed the consistency operation. The name refers to the idea that the operator should only respond to signals which are consistent with the signal model. The method is based on a combination of four correlations:

h1 = wn ⋆ z,  h2 = wn ⋆ c,  h3 = a ⋆ z,  h4 = a ⋆ c    (9)

The filter results are combined into

h(x) = (h4(x) h1(x) − h2(x) h3(x)) / h4(x)^γ    (10)

where the denominator is an energy normalization controlling the model versus energy dependence of the algorithm. With γ = 1 the orientation magnitude varies linearly with the magnitude of the input signal. Decreasing the value increases the selectivity. The result from this operation is called divcons when n = 1 and rotcons when n = 2. The result of applying this method (with γ = 1) to the orientation image in Fig. 4(b) is shown in Fig. 4(e,f).
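A direct transcription of Eqs. (9)-(10) using plain valid-region correlations; the small epsilon guarding against a zero denominator is our addition.

```python
import numpy as np

def corr(img, kern):
    """Valid-region correlation of a (possibly complex) image with a kernel."""
    k = kern.shape[0]
    out = np.zeros((img.shape[0] - k + 1, img.shape[1] - k + 1), complex)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + k, j:j + k] * np.conj(kern))
    return out

def consistency(z, c, wn, a, gamma=1.0):
    """Consistency operation: h = (h4 h1 - h2 h3) / h4^gamma, with
    h1 = wn*z, h2 = wn*c, h3 = a*z, h4 = a*c (Eqs. 9-10)."""
    h1, h2 = corr(z, wn), corr(c, wn)
    h3, h4 = corr(z, a), corr(c, a)
    return (h4 * h1 - h2 * h3) / (h4**gamma + 1e-12)  # eps avoids 0/0
```

On a constant orientation field the subtraction cancels the response exactly, while a field matching the filter model (e.g. z = e^(iϕ) around the center for n = 1) keeps its full response, which is the intended model consistency.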


The normalized convolution technique not only improves selectivity. It can also be used to describe other patterns than the ones in Fig. 3. Figure 5 shows some patterns that can be approximated by a small number of rotational symmetry coefficients; the same technique as above can be used to find these patterns, using for instance {bn}, n = −2, ..., 2, as basis functions.


4.3. Improving selectivity using normalized convolution

Figure 4(g,h) shows the result after applying normalized convolution with {b0, b1, b2} as basis functions (note that the basis functions are orthogonal at full certainty but not necessarily so in uncertain regions). The result cs s1 is fairly comparable to divcons in this case, but cs s2 has become more selective compared to rotcons, which will ease further processing. The explanation for the better selectivity is that divcons and rotcons are fairly comparable to normalized convolution with {b0, b1} resp. {b0, b2} as basis functions. This means for instance that the corners detected by b1 are not 'inhibited' from the second order response, but are instead detected as 'half circles'. In general, using normalized convolution instead of the consistency operation produces more selective and sparse responses.
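A per-pixel reference implementation of this step can be sketched as follows. This is a slow loop rather than the convolution formulation; the Gaussian applicability, the sizes, and the names are our own choices.

```python
import numpy as np

def symmetry_coefficients(z, c, radius=5, sigma=3.0, orders=(0, 1, 2)):
    """At each pixel (valid region), solve the normalized-convolution
    system s = (B* W B)^-1 B* W f with basis b_n = exp(i n phi) and
    W = diag(a) diag(c), where a is a Gaussian applicability."""
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    phi = np.arctan2(ys, xs)
    a = np.exp(-(xs**2 + ys**2) / (2.0 * sigma**2)).ravel()
    B = np.stack([np.exp(1j * n * phi).ravel() for n in orders], axis=1)
    k = 2 * radius + 1
    out = np.zeros((z.shape[0] - k + 1, z.shape[1] - k + 1, len(orders)),
                   complex)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            f = z[i:i + k, j:j + k].ravel()
            w = a * c[i:i + k, j:j + k].ravel()
            BW = B.conj().T * w
            out[i, j] = np.linalg.solve(BW @ B, BW @ f)
    return out
```

At the center of an ideal circle-pattern orientation image z = e^(i2ϕ), the solve returns s ≈ (0, 0, 1): the energy lands entirely in the second order coefficient instead of leaking into the lower orders.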


Figure 4. (a) Gray-scale test image: a rectangle and a circle with additive Gaussian noise. (b) Local orientation transform in double angle representation. (c, d) Filter responses (z ⋆ w1) resp. (z ⋆ w2). (e, f) Consistency operation divcons resp. rotcons. (g, h) Normalized convolution coefficients cs s1 resp. cs s2





Figure 5. Examples of gray-scale patterns and their rotational symmetry description. (a) Gray-scale pattern. (b) Local orientation in double angle representation. (c) Rotational symmetry components sn (light-gray = ℜ{sn}, gray = ℑ{sn}, black = |sn|). (d) Reconstruction Σ_{n=−2}^{2} sn e^(inϕ)

5. Future work

Nothing has been said about computational complexity. The filter kernels can, for some choices of applicability, be separated into a small number of 1D filters (using singular value decomposition). For multi-scale detection of rotational symmetries there are plans to use sequential filter trees [1]. Another possibility is to use a region of interest map and only calculate the symmetry coefficients where we e.g. have fairly high input certainty.
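The SVD separation mentioned above can be sketched as a generic low-rank approximation of an arbitrary 2D kernel; the specific kernels and ranks are not taken from the paper.

```python
import numpy as np

def separable_approx(kernel, rank=1):
    """Approximate a 2D (possibly complex) filter kernel as a sum of `rank`
    outer products of 1D filters, using the singular value decomposition."""
    U, svals, Vh = np.linalg.svd(kernel)
    rows = [np.sqrt(svals[i]) * U[:, i] for i in range(rank)]
    cols = [np.sqrt(svals[i]) * Vh[i, :] for i in range(rank)]
    approx = sum(np.outer(r, c) for r, c in zip(rows, cols))
    return rows, cols, approx
```

Correlating with the 2D kernel is then replaced by `rank` pairs of 1D correlations along rows and columns, reducing the cost per pixel from O(k²) to O(rank · k); the truncated SVD gives the best such approximation in the Frobenius norm.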

References

[1] M. Andersson, J. Wiklund, and H. Knutsson. Sequential Filter Trees for Efficient 2D 3D and 4D Orientation Estimation. Report LiTH-ISY-R-2070, ISY, SE-581 83 Linköping, Sweden, November 1998.
[2] F. Attneave. Some informational aspects of visual perception. Psychological Review, 61, 1954.
[3] M. F. Bear, B. W. Connors, and M. A. Paradiso. Neuroscience: Exploring the Brain. Williams & Wilkins, 1996. ISBN 0-683-00488-3.
[4] I. Biederman. Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2):115-147, 1987.
[5] J. Bigün. Local Symmetry Features in Image Processing. PhD thesis, Linköping University, Sweden, 1988. Dissertation No 179, ISBN 91-7870-334-4.
[6] J. Bigün. Pattern recognition in images by symmetries and coordinate transformations. Computer Vision and Image Understanding, 68(3):290-307, 1997.
[7] G. Farnebäck. Spatial Domain Methods for Orientation and Velocity Estimation. Lic. Thesis LiU-Tek-Lic-1999:13, Dept. EE, Linköping University, SE-581 83 Linköping, Sweden, 1999. Thesis No. 755, ISBN 91-7219-441-3.
[8] G. Farnebäck. A unified framework for bases, frames, subspace bases, and subspace frames. In Proceedings of the 11th Scandinavian Conference on Image Analysis, pages 341-349, Kangerlussuaq, Greenland, June 1999. SCIA.
[9] J. L. Gallant, J. Braun, and D. C. V. Essen. Selectivity for polar, hyperbolic, and cartesian gratings in macaque visual cortex. Science, 259:100-103, January 1993.
[10] G. H. Granlund. In search of a general picture processing operator. Computer Graphics and Image Processing, 8(2):155-178, 1978.
[11] G. H. Granlund and H. Knutsson. Signal Processing for Computer Vision. Kluwer Academic Publishers, 1995. ISBN 0-7923-9530-1.
[12] H. Knutsson. Filtering and Reconstruction in Image Processing. PhD thesis, Linköping University, Sweden, 1982. Diss. No. 88.
[13] H. Knutsson. Representing local structure using tensors. In The 6th Scandinavian Conference on Image Analysis, pages 244-251, Oulu, Finland, June 1989. Report LiTH-ISY-I-1019, Computer Vision Laboratory, Linköping University, Sweden, 1989.
[14] H. Knutsson and G. H. Granlund. Apparatus for determining the degree of variation of a feature in a region of an image that is divided into discrete picture elements. US Patent 4,747,151, 1988 (Swedish patent 1986).
[15] C.-J. Westelius. Focus of Attention and Gaze Control for Robot Vision. PhD thesis, Linköping University, SE-581 83 Linköping, Sweden, 1995. Dissertation No 379, ISBN 91-7871-530-X.
[16] C.-F. Westin. A Tensor Framework for Multidimensional Signal Processing. PhD thesis, Linköping University, SE-581 83 Linköping, Sweden, 1994. Dissertation No 348, ISBN 91-7871-421-4.