A non-parametric scale-based corner detector

F. Bellavia, D. Tegolo, C. Valenti
Dipartimento di Matematica e Applicazioni, Università degli Studi di Palermo, Italy
Consorzio COMETA, Italy
{fbellavia,tegolo,cvalenti}@math.unipa.it
Abstract

This paper introduces a new Harris-affine corner detector algorithm that does not need parameters to locate corners in images, given an observation scale. Standard detectors require fine tuning of parameter values that strictly depend on the particular input image. A quantitative comparison between our implementation and a standard Harris-affine implementation yields good results, showing that the proposed methodology is robust and accurate. The benchmark consists of public images used in the literature for feature detection.
1. Introduction

Feature detection is a very interesting topic in computer vision, because many widely used image analysis applications rely on it as a primary stage. The performance of robust and efficient feature detectors closely depends on the input images and on the particular selection of the parameters, thus no method outperforms all the others [11, 3]. One of the first detectors, which makes use of the autocorrelation matrix to locate corners as points of interest, was defined in [10]. Subsequently, this method was improved through the first-order image derivatives [4, 12]. The Harris detector, which is only rotation invariant in its original definition, was enhanced through the Harris-Laplace and Harris-affine implementations [8], which are respectively scale and affine covariant Harris-based detectors. In a similar way, the blob feature detector, based on the Hessian matrix, was extended into the Hessian-Laplace and Hessian-affine detectors [8]. Recently, an approximated but faster method, namely Speeded Up Robust Features, was described in [2]. Another robust scale-invariant feature detector, inspired by the behavior of the natural neural system, is the so-called Scale-Invariant Feature Transform [6]. Good detectors based on segmentation, such as the Maximally Stable Extremal Regions and the Intensity Based Region detectors, as well as on edges, like the Edge Based Region detector, and on the definition of entropy, as the Salient Region detector, were developed [9]. A new automatic method, namely Harris-Z, suitable for any Harris-based detector, will be presented: given an observation scale, the proposed algorithm avoids the tuning of parameters. Sections 2 and 3 sketch the standard and non-parametric Harris detectors. Experimental results are provided with final remarks in Section 4.

978-1-4244-2175-6/08/$25.00 ©2008 IEEE
2. Corner detector

A Harris corner is a region of the image where the directional derivatives along any orthogonal basis are locally maximized, not only in one direction as for an edge. Mathematically, this corresponds to high eigenvalues of the autocorrelation matrix μ of the derivatives at a point p ≡ (x, y) of the input graylevel image I:

μ(p) = \begin{pmatrix} I_x^2 & I_{xy} \\ I_{xy} & I_y^2 \end{pmatrix}

that is, both eigenvalues λ1 and λ2 of μ must be large for p to be a corner. This local image property can be measured by the so-called Harris function, which returns higher values in the case of corners:

H(p) = det μ(p) − α trace² μ(p)

where α is a parameter, manually set in [0.04, 0.4], which makes the determinant and the trace comparable. Besides α, a threshold value t, which strongly depends on the input image, must be determined manually: the points of H which are maxima within a 3×3 window and greater than t can be considered corners. This methodology has been extended by the Harris-affine detector through the affine scale-space theory [5], and provides features invariant to affine transformations. The leading observation of the scale-space theory is that the meaning of an object depends on the scale at which it is examined. Under reasonable conditions, a scale change can be simulated by smoothing with a Gaussian kernel g(p, σ) [5]. If the resolution decreases, no further new details can appear, as one naturally expects. In the scale-space theory, the minimum resolution at which a feature is partially observable is defined as the differential scale σD, while the minimum scale at which the feature is completely visible is usually known as the integration scale σI. Usually, these variables are given by:

σI = 1.4^i    and    σD = 0.7 σI

where i = 0, …, 11 identifies the scale resolution [5]. The autocorrelation matrix μ becomes:

μ(p, σD, σI) = σD² g(·, σI) ⊗ \begin{pmatrix} L_x(p, σD) L_x(p, σD) & L_x(p, σD) L_y(p, σD) \\ L_x(p, σD) L_y(p, σD) & L_y(p, σD) L_y(p, σD) \end{pmatrix}    (1)

where ⊗ represents the convolution operator and L_k(p, σD) is the derivative at p along the direction k (i.e. x or y) after the Gaussian convolution with standard deviation σD (figures 1a,b).
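As an illustration, the multi-scale Harris function of equation (1) can be sketched in NumPy as follows. This is only a minimal sketch: the separable Gaussian smoothing, the finite-difference derivatives via np.gradient and the value α = 0.06 are our own illustrative choices, not details taken from the paper.

```python
import numpy as np

def gaussian_kernel(sigma):
    # 1-D Gaussian kernel, truncated at 3 sigma and normalized to sum 1
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def smooth(img, sigma):
    # separable Gaussian convolution: one 1-D pass per axis
    k = gaussian_kernel(sigma)
    tmp = np.apply_along_axis(lambda v: np.convolve(v, k, mode='same'), 1, img)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode='same'), 0, tmp)

def harris_response(img, sigma_d=1.0, sigma_i=1.4, alpha=0.06):
    L = smooth(img.astype(float), sigma_d)
    Ly, Lx = np.gradient(L)                    # derivatives at scale sigma_d
    # entries of mu(p, sigma_d, sigma_i), integrated at scale sigma_i
    Sxx = sigma_d**2 * smooth(Lx * Lx, sigma_i)
    Sxy = sigma_d**2 * smooth(Lx * Ly, sigma_i)
    Syy = sigma_d**2 * smooth(Ly * Ly, sigma_i)
    det = Sxx * Syy - Sxy**2
    tr = Sxx + Syy
    return det - alpha * tr**2                 # Harris function H(p)
```

On a synthetic white square, the response is positive at the corners, roughly zero in flat regions and negative along the edges, which is the behavior the thresholds α and t are meant to separate.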
3. Harris-Z: a non-parametric solution

The original Harris detector requires both the parameter α and the threshold t, while our approach needs neither. A preprocessing phase, namely the weighted z-score, is computed for every pixel p in the luminance image, by weighting L_x(p, σD) on a Gaussian window W(L_x(p, σD), σI), centered in p and with radius σI:

Z_x(p, σD, σI) = \frac{L_x(p, σD) − \bar{W}(L_x(p, σD), σI)}{std(W(L_x(p, σD), σI))}

where \bar{W}(L_x(p, σD), σI) and std(W(L_x(p, σD), σI)) are the mean and the standard deviation of the weighted intensities of the gradient in W(L_x(p, σD), σI). In practice, the z-score function measures, in terms of standard deviations, the brightness distance between p and its neighbors, used to take into account a variety of observation scales. Figure 1c shows Z_x(p, σD, σI) only, but an analogous definition holds for Z_y(p, σD, σI) along the orthogonal direction. Let us compute the magnitude of the gradient:

G(p, σD) = \sqrt{L_x^2(p, σD) + L_y^2(p, σD)}

and of the z-scored gradient:

G_z(p, σD, σI) = \sqrt{Z_x^2(p, σD, σI) + Z_y^2(p, σD, σI)}

Figures 1d,e depict G′(p) and G′_z(p), the versions of G(p, σD) and G_z(p, σD, σI) normalized to the range [0, 1], respectively. These images are multiplied pixel by pixel to weight each other:

P(p) = G′(p) G′_z(p)

P removes the effect of noise and enhances the contrast in the gradient image. P is further processed by a median filter with radius 3σD, thus obtaining a new image P′, to rebuild small local structures which may have been destroyed. Corners usually lie near the edges of the input image, so they can also be selected on the basis of their weight in P′. A binary mask B is computed by thresholding P′ with its global mean \bar{P′}:

B(p) = \begin{cases} 0 & P′(p) ≤ \bar{P′} \\ 1 & P′(p) > \bar{P′} \end{cases}

Actually, most of the candidate corners are too weak to be chosen, and a median threshold on P′ would retain too many of them. B does not need to be very accurate, and the mean threshold \bar{P′}, which is greater than the median value of P′, is sufficient to discard almost half of the corner candidates. The underlying idea is the observation that the average luminosity is associated with the background of the image, since the z-score function emphasizes the edges of the objects. To avoid very crisp edges, we obtain a gray-level mask M by applying to B a Gaussian filter with standard deviation σD (figure 1f). The gradient of the final image, which appears through μ in the Harris function H, is obtained by replacing L_x and L_y in equation (1) with (figure 1g):

M_x(p, σD) = M(p, σD) L_x(p, σD)
M_y(p, σD) = M(p, σD) L_y(p, σD)

The Harris function can be modified to avoid the parameter α: in this case, instead of a z-score locally defined on a Gaussian window W, we compute the z-score function on the whole matrix in a uniform way (figure 1h):

H_z(p) = Z(det μ(p)) − Z(trace² μ(p))

where

Z(K(p)) = \frac{K(p) − \bar{K}}{std(K)}
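A rough NumPy sketch of this preprocessing chain (weighted z-score, z-scored gradient magnitude, mutual weighting, mean-thresholded mask) could look as follows. It is a sketch under our own assumptions: the finite-difference derivatives and function names are illustrative, the Gaussian-weighted local mean and standard deviation are obtained by smoothing D and D², and the median-filter step on P is omitted for brevity.

```python
import numpy as np

def gaussian_kernel(sigma):
    # 1-D Gaussian kernel, truncated at 3 sigma and normalized to sum 1
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gsmooth(img, sigma):
    k = gaussian_kernel(sigma)
    t = np.apply_along_axis(lambda v: np.convolve(v, k, mode='same'), 1, img)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode='same'), 0, t)

def weighted_zscore(D, sigma_i, eps=1e-8):
    # Gaussian-weighted local mean and std of a derivative image D
    mean = gsmooth(D, sigma_i)
    var = np.maximum(gsmooth(D * D, sigma_i) - mean**2, 0.0)
    return (D - mean) / (np.sqrt(var) + eps)

def enhanced_gradient_mask(img, sigma_d=1.0, sigma_i=1.4):
    L = gsmooth(img.astype(float), sigma_d)
    Ly, Lx = np.gradient(L)
    G = np.hypot(Lx, Ly)                             # gradient magnitude
    Gz = np.hypot(weighted_zscore(Lx, sigma_i),
                  weighted_zscore(Ly, sigma_i))      # z-scored magnitude
    norm = lambda a: (a - a.min()) / (np.ptp(a) + 1e-8)
    P = norm(G) * norm(Gz)                           # mutual weighting
    B = (P > P.mean()).astype(float)                 # threshold at global mean
    return gsmooth(B, sigma_d)                       # soft mask M
```

On a test image, the resulting mask M stays in [0, 1], is high along object edges and close to zero in flat background regions, which is what the M(p) > 0.31 criterion below relies on.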
has been defined for a generic image K(p). In our implementation, a point p is selected as a corner if:

• M(p) > 0.31, because a corner point belongs to the edges represented by M, which locally indicates features with different dimensions due to σD. M is a binary mask convolved with a Gaussian function; therefore, assuming that the resolution of a single pixel is equal to σD, the threshold value 0.31 (which is appropriate for any input image) separates zones that were originally black or white;

• H_z(p) > 0 (i.e. H_z(p) is greater than its mean value) and H_z(p) is a local maximum of H_z within a circular window with radius 3σD.

The regions of interest are described by ellipses (figure 1i), centered on the extracted points, with directions defined by the eigenvectors of the inverse matrix μ⁻¹ and with axes ratio given by the square root of the ratio between its eigenvalues λ1 and λ2 [7].
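The selection rules above can be sketched as follows. This is a naive NumPy version under our own assumptions: the input names det_mu, tr_mu and M are illustrative, the local maximum is found by a brute-force scan over a square window of radius 3σD, and border pixels are simply skipped.

```python
import numpy as np

def global_zscore(K, eps=1e-8):
    # z-score over the whole image (uniform, not windowed)
    return (K - K.mean()) / (K.std() + eps)

def select_corners(det_mu, tr_mu, M, sigma_d):
    # Hz(p) = Z(det mu(p)) - Z(trace^2 mu(p))
    Hz = global_zscore(det_mu) - global_zscore(tr_mu**2)
    r = max(int(round(3 * sigma_d)), 1)
    h, w = Hz.shape
    corners = []
    for y in range(r, h - r):
        for x in range(r, w - r):
            if M[y, x] <= 0.31 or Hz[y, x] <= 0:
                continue                       # fails the two criteria above
            patch = Hz[y - r:y + r + 1, x - r:x + r + 1]
            if Hz[y, x] >= patch.max():        # local maximum in the window
                corners.append((x, y))
    return corners
```

For example, a single isolated peak in det_mu with a constant trace is selected as the only corner, since its global z-score is the sole positive value of Hz.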
4. Experimental results and conclusions

The Harris-affine detector is widely used, together with Hessian-affine, MSER, SIFT and SURF, for classification, mosaicing and multiple-view reconstruction. A comparison between the algorithm described in [8], with the default parameters suggested by the authors, and our version was carried out through the repeatability index [3] on a set of standard images [1], thus implicitly comparing against all these detectors. The repeatability index increases when the features are repeated under different transformations, as for the images in the benchmark dataset. Only the results on viewpoint, scale and rotation changes (i.e. the most relevant ones) are shown, due to space limitations. Generally, the number of extracted features halves when the scale index i is incremented, due to the increasing radius 3σD (with σI = 1.4^i) of the circular window used to locate the maxima of the Harris function. The regions of interest are local structures, therefore they require a local enhancement of the contrast and a local reduction of the noise, which depend on the observation scale. Point-like details are more evident at the smallest scales. For all transformations, figure 2 shows that good results are obtained with an initial scale i > 1, while the results obtained with i ranging in [2, 8] and in [2, 11] are comparable. Bigger scales produce similar output with respect to the repeatability index. The choice of the scales is important and influences the final output, thus it must be made according to the particular task to be addressed. Preliminary results show that the proposed algorithm provides a higher absolute number of correspondences with respect to the original implementation. Moreover, the repeatability index gives better values, even though our method is automatic.
The proposed system can be thought of as a segmentation algorithm followed by a feature detector: the combination with a different feature descriptor will be studied to compute the matching score. We noticed that for a very low scale index i too many features are detected, due to fine details, and this involves an information overload during the matching step. Also, a high scale index implies wider regions of interest and more computational time.
5. Acknowledgements This work makes use of results produced by the PI2S2 Project managed by the Consorzio COMETA, a project co-funded by the Italian Ministry of University and Research (MIUR) within the Programme Operativo Nazionale “Ricerca Scientifica, Sviluppo Tecnologico, Alta Formazione” (PON 2000-2006). More information is available at http://www.pi2s2.it and http://www.consorzio-cometa.it.
References

[1] Affine Covariant Features. http://www.robots.ox.ac.uk/∼vgg/research/affine.
[2] H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded Up Robust Features. In 9th European Conference on Computer Vision, pages 404–417, 2006.
[3] F. Fraundorfer and H. Bischof. A Novel Performance Evaluation Method of Local Detectors on Non-planar Scenes. In Empirical Evaluation Methods in Computer Vision, Conference on Computer Vision and Pattern Recognition, 2005.
[4] C. Harris and M. Stephens. A Combined Corner and Edge Detector. In 4th Alvey Vision Conference, pages 147–151, 1988.
[5] T. Lindeberg. Scale-Space Theory in Computer Vision. Kluwer Academic Publishers, 1994.
[6] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60:91–110, 2004.
[7] K. Mikolajczyk. Detection of Local Features Invariant to Affine Transformations. PhD thesis, Institut National Polytechnique de Grenoble, 2002.
[8] K. Mikolajczyk and C. Schmid. Scale and affine invariant interest point detectors. International Journal of Computer Vision, 60:63–86, 2004.
[9] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool. Comparison of affine region detectors. International Journal of Computer Vision, 65:43–72, 2005.
[10] H. Moravec. Towards automatic visual obstacle avoidance. In 5th International Joint Conference on Artificial Intelligence, 1977.
[11] P. Moreels and P. Perona. Evaluation of Features Detectors and Descriptors Based on 3D Objects. In International Conference on Computer Vision, 2005.
[12] C. Tomasi and T. Kanade. Detection and tracking of point features. Technical report, Carnegie Mellon University, 1991.
Figure 1. a: input image; b: L_x; c: Z_x; d: G′; e: G′_z; f: M; g: M_x; h: H_z; i: luminance image with corner regions. This result has been obtained with i = 4 (i.e. σI = 1.4^i = 3.8416 and σD = 0.7σI = 2.68912). The y components have been omitted for the sake of simplicity.
[Figure 2 plots, for the wall, graffiti, bark and boat sequences, the repeatability (%) and the number of correspondences as functions of the viewpoint angle (wall, graffiti) or of the scale change (bark, boat), comparing Harris-Z with scale ranges i = 1,…,11; i = 2,…,8; i = 2,…,11; i = 3,…,8; i = 3,…,11 against the standard Harris-affine detector.]

Figure 2. Repeatability index and number of correspondences under transformations.