Kona: A Multi-Junction Detector using Minimum Description Length Principle

Laxmi Parida, Davi Geiger, Robert Hummel
[email protected], [email protected], [email protected]

September 28, 1996

Abstract
Corners and T-, Y-, and X-junctions give vital depth cues, which are a critical aspect of image "understanding": junctions form an important class of features, invaluable in most vision systems. The three main issues in a junction (or any feature) detector are scale, location, and the junction (feature) parameters. The junction parameters are (1) the radius, or size, of the junction, (2) the kind of junction: lines, corners, 3-junctions such as T or Y, 4-junctions such as X, etc., (3) the angles of the wedges, and (4) the intensity in each of the wedges. Our main contribution in this paper is a model of the junction, based on the minimum description length principle, that is complex enough to handle all three issues and simple enough to admit an effective dynamic programming solution. Kona is an implementation of this model. A similar approach can be used to model other features such as thick edges, blobs, and end-points.
Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, U.S.A.
1 Introduction
A critical component of most recognition systems is stable, representative feature extraction from images. One of the key features used in recognition is junctions: T-junctions, Y-junctions, X-junctions, and so on. These junctions are also critical for stereo vision modules or motion modules, since these are places where occlusions can be identified. Such points, for example, coincide with the images of trihedral vertices of an object. These are critical features for recognition, as suggested by [2], [3], [5].

We use the template deformation framework to develop a "junction detector" that finds corners (two-junctions), tri-corners (tri-junctions), quad-corners (quad-junctions), etc., defined as points where two or more homogeneous surface patches are located within an arbitrarily small neighborhood of the point.

There have been basically two different paradigms for detecting junctions: edge detection followed by grouping of edges to form junctions [13, 12, 1, 16], and treating junction detection as a template matching problem [4]. In the former, it is assumed that the presence (or absence) of a junction is determined by "grouping" the intensity gradient near a hypothesized junction. Usually one is interested in examining large gradients in the direction perpendicular to the hypothesized radial line. Experiments in this framework are limited, and even the richest ones, shown in [13], are interesting but not exhaustive. In the latter, it is assumed that a (suitably small) local neighborhood is sufficient to detect a junction. The basic idea is to fit a junction model to the input signal in a neighborhood. This involves minimizing an energy function which gives a measure of the "distance" of the junction model from the input signal. The idea of performing local feature detection by projecting image data onto a subspace is fundamental in [4, 7]. Basically, the input is orthogonally projected onto a finite-dimensional subspace of the Hilbert space of functions.
An energy function (the $L^2$ norm of the difference between the input and the fitted function) is minimized in this finite-dimensional space. The two main issues are finding an orthonormal basis that spans a good finite-dimensional subspace and minimizing the energy function. This approach can give closed-form solutions for edges [4, 7] and lines [6], but the generalization to junctions is complex and the solution is not apparent. In [9], corners and junctions (which are modeled as two adjacent corners) are represented by functions (models) that are blurred with a Gaussian (or an exponential filter), for which the authors use a closed-form solution. In general, numerical methods are used to obtain the parameters that minimize the distance to the input data under an $L^2$ norm. This is also the case in [10].

Our approach is to use a combination of the two paradigms: grouping of edges and fitting templates. We use a template deformation framework, based on the minimum description length (MDL) principle, that includes the gradient information in order to detect the radial partitions of the template as a grouping mechanism. In other words, the task is to find the minimum number of wedges that best describes the junction.
Figure 1: Piecewise constant features. A bar detector and a junction detector. [Panel labels: image axes X and Y; template intensities T1, T2, T3; wedge boundary angles θ1, θ2, θ3; profiles T(Y) and T(θ).] Note that as we increase the number of wedges, the junction description gets more accurate; hence the task is to use the MDL principle to obtain an optimal number of such wedges.
2 The Junction Model
We model a junction as a region of an image where the values are piecewise constant in wedge-shaped regions emanating radially from a central point, covering a small disk centered at the point and omitting a (much) smaller disk centered at the same point (see Figure 1). The parameters of a junction are (i) the radius of the junction disk, (ii) the center location, (iii) the number of radial line boundaries, (iv) the angular direction of each such boundary, and (v) the intensity within each wedge. The radius of the disk addresses the "scale" issue, and the location of the center is a kind of "interest operator" [8] that determines the position where the feature is located in a region, possibly pre-defined. We can formulate the junction detection problem as one of finding the parameter values that yield a junction that best approximates the local data, and declaring local minima of the error as junctions. The best-fit parameter values provide attributes of the detected junction.

Let T denote the piecewise constant function (template). It has N angles and N intensities, where N is the number of constant pieces. Further, let I denote the input signal. Define the energy function at a point (i, j) on the image as follows:

$$E = D + \lambda G, \quad \text{where } \lambda \ge 0.$$
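The piecewise constant template T described above can be sketched in a few lines. This is only an illustrative sketch: the function name `wedge_template`, the convention that each wedge is the half-open interval between consecutive boundary angles, and the wrap-around of the last wedge through 2π are assumptions made here, not details taken from Kona.

```python
import numpy as np

def wedge_template(theta, boundaries, intensities):
    """Evaluate a piecewise-constant template T(theta).

    boundaries:  sorted wedge boundary angles in [0, 2*pi), length N.
    intensities: intensities[p] is the value on [boundaries[p], boundaries[p+1]);
                 the last wedge wraps around through 2*pi (assumed convention).
    """
    theta = np.mod(np.asarray(theta, dtype=float), 2.0 * np.pi)
    boundaries = np.asarray(boundaries, dtype=float)
    intensities = np.asarray(intensities, dtype=float)
    # Index of the last boundary <= theta. A result of -1 means theta lies
    # before the first boundary, i.e. in the wrapped part of the last wedge.
    idx = np.searchsorted(boundaries, theta, side="right") - 1
    return intensities[idx]  # index -1 selects the last wedge

# A hypothetical 3-junction with boundaries at 0.5, 2.0 and 4.0 radians.
print(wedge_template([0.1, 1.0, 3.0, 5.0], [0.5, 2.0, 4.0], [10, 20, 30]))
```

With N boundaries the template has exactly N angles and N intensities, matching the parameter count given in the text.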
Figure 2: (a) Point on the original image. (b) Original data. (c) Projection of (b) to a 1D signal. (d) Gradient is not used. (e) Gradient is used. The use of gradient information in (e) gives a more accurate corner than in (d), where it is not used.

The first term, D, is a measure of the distance of the fitted function from the data using the $L^2$ norm:
$$D = \int_0^{2\pi}\!\!\int_0^{\infty} \left[ I(r,\theta) - T(\theta) \right]^2 g(r)\, r\, dr\, d\theta, \tag{1}$$
where g(r) is an appropriate modulating function that goes to zero for large r, thus defining the template size. The second term, G, is a measure of the distance of the gradient using the $L^2$ norm. Figure 2 shows an example where the use of the second term improves the solution.

$$G = \int_0^{2\pi}\!\!\int_0^{\infty} \left| \nabla I(r,\theta) - \nabla T(\theta) \right|^2 \tilde{g}(r)\, r\, dr\, d\theta, \tag{2}$$

where $\tilde{g}(r)$ is an appropriate modulating function, not necessarily the same as g(r). Note that

$$\nabla I = \frac{\partial I(r,\theta)}{\partial r}\, e_r + \frac{1}{r} \frac{\partial I(r,\theta)}{\partial \theta}\, e_\theta, \qquad \nabla T(\theta) = \frac{1}{r} \frac{\partial T(\theta)}{\partial \theta}\, e_\theta,$$

where $e_r$ and $e_\theta$ are the orthonormal vectors in the r and θ directions respectively, evaluated at (r, θ).
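As a short worked step (writing $\tilde{g}$ for the second modulating function, a notation assumed here), the orthonormality of $e_r$ and $e_\theta$ means the squared norm in (2) has no cross term:

$$\nabla I - \nabla T = \frac{\partial I}{\partial r}\, e_r + \frac{1}{r}\left( \frac{\partial I}{\partial \theta} - \frac{\partial T}{\partial \theta} \right) e_\theta
\quad\Longrightarrow\quad
\left| \nabla I - \nabla T \right|^2 = \left( \frac{\partial I}{\partial r} \right)^2 + \frac{1}{r^2} \left( \frac{\partial I}{\partial \theta} - \frac{\partial T}{\partial \theta} \right)^2 .$$

Integrating each of the two summands against $\tilde{g}(r)\, r\, dr\, d\theta$ gives a purely radial contribution, which does not involve T, and a purely angular one, which does.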
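Separating the angular and the radial terms, G can be written as

$$G = A + R,$$

where

$$A = \int_0^{2\pi}\!\!\int_0^{\infty} \left( \frac{1}{r} \left( \frac{\partial I}{\partial \theta} - \frac{\partial T}{\partial \theta} \right) \right)^2 \tilde{g}(r)\, r\, dr\, d\theta, \tag{3}$$

$$R = \int_0^{2\pi}\!\!\int_0^{\infty} \left( \frac{\partial I}{\partial r} \right)^2 \tilde{g}(r)\, r\, dr\, d\theta. \tag{4}$$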
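The three integrals D, A and R can be approximated by Riemann sums once the image has been resampled on a polar grid around the candidate center. The sketch below is one possible discretization, not the one used in Kona: the function name `energy_terms`, the uniform polar grid, and the choice of finite-difference schemes are all assumptions.

```python
import numpy as np

def energy_terms(polar_image, template, radii, g, g_tilde):
    """Riemann-sum approximations of D, A and R (Eqs. 1, 3 and 4).

    polar_image: shape (n_r, n_theta), samples I(r_a, theta_b) on a uniform grid.
    template:    shape (n_theta,), samples T(theta_b).
    radii:       shape (n_r,), uniformly spaced radii r_a > 0 (n_r >= 2).
    g, g_tilde:  shape (n_r,), samples of the two modulating functions.
    """
    n_r, n_theta = polar_image.shape
    dr = radii[1] - radii[0]
    dtheta = 2.0 * np.pi / n_theta
    r = radii[:, None]
    # Periodic forward differences in theta; numpy.gradient in r.
    dI_dtheta = (np.roll(polar_image, -1, axis=1) - polar_image) / dtheta
    dT_dtheta = (np.roll(template, -1) - template) / dtheta
    dI_dr = np.gradient(polar_image, dr, axis=0)
    area = r * dr * dtheta  # the measure r dr dtheta
    D = np.sum((polar_image - template[None, :]) ** 2 * g[:, None] * area)
    A = np.sum(((dI_dtheta - dT_dtheta[None, :]) / r) ** 2 * g_tilde[:, None] * area)
    R = np.sum(dI_dr ** 2 * g_tilde[:, None] * area)
    return float(D), float(A), float(R)
```

For an ideal junction centered exactly on the grid origin, the image equals the template on every ring, so all three terms vanish; a purely radial ramp gives A = 0 and R > 0, which is the behaviour the location step relies on.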
3 Energy Minimization
Recall the energy equation:

$$E = \lambda R + E_1 = \lambda R + (D + \lambda A).$$

The first term is independent of the junction feature and handles the scale and location issues. The second term, $E_1$, is minimized to obtain the most appropriate junction template.
3.1 Scale & Location
Consider a convenient and meaningful definition of g(r) as follows:

$$g(r) = \begin{cases} 0, & r < R_0, \\ \dfrac{1}{r}, & R_0 \le r \le R_1, \\ 0, & r > R_1. \end{cases}$$

A user-defined threshold bounds R: this defines $R_1$. $R_0$ is often a user-defined fraction of $R_1$: this allows a small hole in the center. A significant observation from the series of experiments has been the benefit of using a non-zero $R_0$. The following is done to obtain the location in a window: R (with not necessarily the same $R_1$) is evaluated for each of the points, and the one with the minimum value defines the location. See Figure 3.
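The location step amounts to an arg-min of R over the candidate points in a window. The sketch below assumes g(r) = 1/r on the annulus between R_0 and R_1 (a form consistent with C = R_1 - R_0 used later in the paper) and uses the R values reported in Figure 3; the function names are hypothetical.

```python
import numpy as np

def modulating_g(r, r0, r1):
    """g(r) = 1/r on the annulus r0 <= r <= r1 and 0 elsewhere (assumed form)."""
    r = np.asarray(r, dtype=float)
    inside = (r >= r0) & (r <= r1)
    # np.where evaluates both branches, so guard the division at r == 0.
    safe_r = np.where(r > 0, r, 1.0)
    return np.where(inside, 1.0 / safe_r, 0.0)

def locate_center(radial_energy):
    """Return the (x, y) key with the smallest radial energy R."""
    return min(radial_energy, key=radial_energy.get)

# R values reported in Figure 3 for the nine locations around L_{39,31}.
figure3 = {
    (38, 30): 0.61, (39, 30): 0.57, (40, 30): 0.49,
    (38, 31): 0.57, (39, 31): 0.36, (40, 31): 0.47,
    (38, 32): 0.47, (39, 32): 0.38, (40, 32): 0.44,
}
print(locate_center(figure3))  # (39, 31)
```

The non-zero inner radius simply zeroes the weight of samples closest to the center, which is how the "small hole" enters the integrals.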
Figure 3: The use of R to locate the center of the junction. $L_{x,y}$ indicates the x and y coordinates on the image. Panels: the input image with $L_{39,31}$ marked; the region around $L_{39,31}$; the junction at location $L_{39,31}$; and the value of R at the nine neighboring locations:

  $L_{38,30}$: R = 0.61    $L_{39,30}$: R = 0.57    $L_{40,30}$: R = 0.49
  $L_{38,31}$: R = 0.57    $L_{39,31}$: R = 0.36    $L_{40,31}$: R = 0.47
  $L_{38,32}$: R = 0.47    $L_{39,32}$: R = 0.38    $L_{40,32}$: R = 0.44

Note that the location $L_{39,31}$ has the minimum R in the neighborhood. Incidentally, $R_1$, the size of the template, is the same for all nine locations.

Figure 4: An example showing the dynamic computation of the windows at different locations on the image. (a), (b) Positions on the image. (c) Adaptive window at (a). (d) Adaptive window at (b). (e) The best fit to (c). (f) The best fit to (d).

3.2 Junction Parameters
We can carry out an appropriate numerical integration to obtain the value of the first term, R. The second term, $E_1$, is used to estimate the junction template. The unknowns are N, the number of intersecting lines (or wedges) at the junction (see footnote 1), and $\{\theta_p\}$, $\{T_p\}$, $p = 1, \ldots, N$, where the $\theta_p$ are the angles where the partitions occur and the $T_p$ are the intensity values. To simplify the minimization, we fix the number of wedges n and obtain $E_1^n$. Finally, N is computed by thresholding the relative error r,

$$r = \frac{E_1^{n}}{E_1^{n+1}}.$$
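Selecting N from the sequence of fixed-n energies can be sketched as follows. The text only says that the relative error r is thresholded, so the direction of the test (stop at the first n for which one more wedge no longer reduces the energy by at least a factor `threshold`), the handling of zero energies, and the name `select_num_wedges` are assumptions of this sketch.

```python
def select_num_wedges(energies, threshold):
    """Pick N from energies[n-1] = E_1^n, the best-fit energy with n wedges.

    Returns the smallest n whose relative error r = E_1^n / E_1^(n+1) falls
    below `threshold` (assumed stopping rule), or the largest n tried.
    """
    for n in range(1, len(energies)):
        e_n, e_next = energies[n - 1], energies[n]
        if e_n == 0.0:
            return n  # already a perfect fit with n wedges
        if e_next > 0.0 and e_n / e_next < threshold:
            return n  # an extra wedge gives too small a drop in energy
    return len(energies)

# Hypothetical energies: large drops up to 3 wedges, then a plateau.
print(select_num_wedges([900.0, 400.0, 20.0, 18.0, 17.0], threshold=2.0))  # 3
```

In this made-up example the ratios are 2.25, 20 and about 1.11, so the search stops at three wedges, i.e. a T- or Y-junction.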
Footnote 1: N is 1 for a homogeneous region, 2 for a line or corner, 3 for junctions such as T- and Y-junctions, 4 for an X-junction, and so on.

Note that as the number of parameters increases, $E_1^n$ decreases. There are at least two ways of deciding on the optimal number of parameters: one is to penalize the increase in the number, and the other is to measure the amount of decrease in the energy. We have observed that the former does not work well. On the other hand, measuring the rate of decrease of the energy, by thresholding the ratio of successive energy measurements, $E_1^n / E_1^{n+1}$, works rather well.

We can write $E_1$ as

$$E_1 = F + V,$$

where F is fixed, that is, it does not depend on the unknown parameters, whereas V does. Setting $\tilde{g}(r) = r^2 g(r)$, and with some simple manipulations, we obtain

$$F = \int_0^{2\pi}\!\!\int_0^{\infty} \left[ I^2(r,\theta) + \lambda \left( \frac{\partial I(r,\theta)}{\partial \theta} \right)^2 \right] g(r)\, r\, dr\, d\theta.$$
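F can be approximated numerically. Also, V can be approximated as

$$V = \sum_{p=1}^{N} \left[ (\theta_{p+1} - \theta_p) \left( -2 T_p I_{p,p+1} + T_p^2 C \right) + \lambda \left( -2 T'_p\, (\partial_\theta \tilde{I})_{\theta_p} + C\, {T'_p}^2 \right) \right],$$

where $T'_p$ denotes the angular derivative of the template at the boundary $\theta_p$, and

$$C = \int_0^{\infty} g(r)\, r\, dr = R_1 - R_0,$$

$$\tilde{I}(\theta) = \int_0^{\infty} I(r,\theta)\, g(r)\, r\, dr,$$

$$(\partial_\theta \tilde{I})_{\theta_p} = \left. \int_0^{\infty} \frac{\partial I(r,\theta)}{\partial \theta}\, g(r)\, r\, dr \right|_{\theta = \theta_p} = \left. \frac{\partial \tilde{I}(\theta)}{\partial \theta} \right|_{\theta_p},$$

$$I_{p,p+1} = \sum_{j = j_p}^{j_{p+1}} \tilde{I}(\theta_j).$$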
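The projection of the image onto the one-dimensional angular signal can be approximated by sampling along rays. The sketch below assumes g(r) = 1/r on the annulus from R_0 to R_1, so that g(r) r dr reduces to dr and C = R_1 - R_0; the nearest-neighbor sampling, the midpoint rule, and the name `angular_projection` are choices made for this illustration only.

```python
import numpy as np

def angular_projection(image, cx, cy, r0, r1, n_theta=64, n_r=16):
    """Approximate I~(theta_b), the integral of I(r, theta_b) g(r) r dr.

    Assumes g(r) = 1/r on [r0, r1], so the weight g(r) r dr is simply dr.
    Returns (thetas, projection), both of shape (n_theta,).
    """
    dr = (r1 - r0) / n_r
    radii = r0 + (np.arange(n_r) + 0.5) * dr                   # midpoint rule
    thetas = (np.arange(n_theta) + 0.5) * 2.0 * np.pi / n_theta
    # Nearest-neighbor sampling of the image on the polar grid.
    xs = np.rint(cx + radii[:, None] * np.cos(thetas)[None, :]).astype(int)
    ys = np.rint(cy + radii[:, None] * np.sin(thetas)[None, :]).astype(int)
    xs = np.clip(xs, 0, image.shape[1] - 1)
    ys = np.clip(ys, 0, image.shape[0] - 1)
    samples = image[ys, xs]                                    # (n_r, n_theta)
    return thetas, samples.sum(axis=0) * dr
```

For a constant image of value c the projection is c (R_1 - R_0) at every angle, i.e. c times C, which gives a quick sanity check on the discretization.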
Further, $\tilde{I}(\theta)$ and $(\partial_\theta \tilde{I})$ can be approximated numerically. For the sake of brevity, we have omitted the details of the derivations here. We now give a dynamic programming formulation, although in practice we use a simpler version which is reasonably effective (see Section 4). Let $A_1, A_2, \ldots, A_K$ denote the range of admissible intensities and $\theta_1, \theta_2, \ldots, \theta_k$ the discretized angles. Then

$$E^{ij} = \begin{cases} -2 A_j \tilde{I}(\theta_i) + A_j^2 C, & i = 1, \\[4pt] \min\limits_{1 \le s \le K} \left[ E^{(i-1)s} - 2 A_j \tilde{I}(\theta_i) + A_j^2 C + \lambda \left( C A_{js}^2 - 2 A_{js}\, \partial_\theta \tilde{I}(\theta_i) \right) + \mu A_{js} \right], & i > 1, \end{cases}$$

where $A_{jk} = A_j - A_k$ and $\mu$ is a penalizing factor for the increase in wedges. The energy is

$$E_1 = \min_{s = 1, \ldots, K} E^{ks}.$$

The number of wedges can be extracted from the minimum solution. Also, for the dynamic program, we rearrange the θ's so that, in the rearrangement, $\theta_1$ is such that $\tilde{I}(\theta_1)$ is closest to $\tilde{I}(\theta_k)$. This ensures that $\theta_1$ and $\theta_k$ are part of the same wedge.
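A dynamic program over angles and admissible intensity levels of the kind described above can be sketched as a Viterbi-style recursion. This is a sketch under stated assumptions rather than the authors' implementation: the per-angle cost is taken as -2 A_j I~(θ_i) + A_j² C at every angle, the wedge penalty is charged as a constant `mu` whenever the level changes, the circular rearrangement of the angles is omitted, and the name `wedge_dp` is hypothetical.

```python
import numpy as np

def wedge_dp(I_tilde, dI_tilde, levels, C=1.0, lam=0.0, mu=0.0):
    """Minimise the wedge energy over one intensity level per angle.

    I_tilde[i]:  projected intensity at angle theta_i.
    dI_tilde[i]: projected angular derivative at theta_i.
    levels[j]:   admissible intensity A_j.
    Returns (minimum energy, chosen intensity at each angle).
    """
    I_tilde = np.asarray(I_tilde, dtype=float)
    dI_tilde = np.asarray(dI_tilde, dtype=float)
    levels = np.asarray(levels, dtype=float)
    k, K = len(I_tilde), len(levels)
    jump = levels[:, None] - levels[None, :]      # jump[j, s] = A_j - A_s
    switch = mu * (jump != 0.0)                   # assumed penalty per new wedge
    E = np.empty((k, K))
    back = np.zeros((k, K), dtype=int)
    E[0] = -2.0 * levels * I_tilde[0] + C * levels ** 2
    for i in range(1, k):
        node = -2.0 * levels * I_tilde[i] + C * levels ** 2
        trans = lam * (C * jump ** 2 - 2.0 * jump * dI_tilde[i]) + switch
        total = E[i - 1][None, :] + trans         # total[j, s]
        back[i] = np.argmin(total, axis=1)        # best predecessor s for each j
        E[i] = node + total[np.arange(K), back[i]]
    j = int(np.argmin(E[-1]))
    energy = float(E[-1, j])
    path = [j]
    for i in range(k - 1, 0, -1):                 # backtrack the optimal levels
        j = int(back[i, j])
        path.append(j)
    return energy, levels[path[::-1]]
```

The number of wedges can then be read off the returned sequence by counting the positions where the level changes. The cost is O(k K²) for k angles and K levels.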
4 Implementation
Although the energy equation E of the last section looks fairly complex, it has a remarkably simple and natural interpretation. Once the role of each term of the energy equation is ascertained, some computations (such as evaluating F) can be omitted without changing the solution. To recapitulate, the terms are used as follows:

1. R is used for scale, i.e., to determine the size of the window, $R_1$. It is also used to obtain the exact location of the junction in a neighborhood.
2. $E_1$ is used to obtain the junction parameters.
$\tilde{I}(\theta)$ can be viewed as integrating the intensity along a radial line. Thus the two-dimensional image is projected onto a one-dimensional signal $D(\theta)$, where θ is appropriately discretized. Let $D(\theta_i)$ be defined for $\theta_1, \theta_2, \ldots, \theta_d$, with $(\theta_{i+1} - \theta_i) = 2\pi/d$ for all i. For a p-junction, $T_1, T_2, \ldots, T_p$ are the template intensities and the wedge boundaries are at $\theta_{k_1}, \theta_{k_2}, \ldots, \theta_{k_p}$, with $k_1 \le k_2 \le \ldots \le k_p$. Assuming we know the $\theta_p$'s, we can obtain the $T_p$'s by setting $\partial V / \partial T_p = 0$ for all p. When $\lambda = 0$, fixing the $k_i$'s, the $T_l$'s can be shown to be the following:

$$T_l = \frac{\sum_{j = k_l}^{k_{l+1}} D(\theta_j)}{k_{l+1} - k_l}.$$
In other words, $T_l$ is a piecewise constant fit equal to the average value of the data in that region. The energy for the fit is

$$E^p = \sum_{l=1}^{p} \sum_{j = k_l}^{k_{l+1}} \left( D(\theta_j) - T_l \right)^2.$$
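For λ = 0 the fit above reduces to segment means on a circular signal, which is easy to sketch. In the code below each wedge is taken as the half-open index range from one boundary up to (but not including) the next, so that a wedge contains exactly k_{l+1} - k_l samples; that convention, the brute-force search over all boundary sets, and the names `fit_wedges` and `best_wedges` are assumptions of this illustration.

```python
import numpy as np
from itertools import combinations

def fit_wedges(D, starts):
    """Mean fit of a circular signal D for the wedge start indices `starts`.

    Wedge l covers the half-open index range [starts[l], starts[l+1]); the
    last wedge wraps around to starts[0]. Returns (T, E_p).
    """
    D = np.asarray(D, dtype=float)
    d, p = len(D), len(starts)
    T, energy = [], 0.0
    for l in range(p):
        a, b = starts[l], starts[(l + 1) % p]
        length = (b - a) % d or d             # a single wedge covers all d samples
        idx = (a + np.arange(length)) % d     # wrap around the circle
        mean = D[idx].mean()
        T.append(float(mean))
        energy += float(np.sum((D[idx] - mean) ** 2))
    return T, energy

def best_wedges(D, p):
    """Try every set of p boundaries and keep the one with the lowest energy."""
    best = min(combinations(range(len(D)), p), key=lambda s: fit_wedges(D, s)[1])
    return best, fit_wedges(D, best)

# A hypothetical projected signal with two wedges, one of which wraps around.
print(best_wedges([2, 2, 2, 8, 8, 8, 8, 2], 2))
```

The exhaustive search costs on the order of d choose p evaluations, which is why a dynamic program over precomputed segment costs is attractive for larger d.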
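When $\lambda \ne 0$, $E^p$ has some extra terms. Since the template intensities are close to the image intensities, we make the following approximation:

$$T'_p \approx (\partial_\theta \tilde{I})_{\theta_p}.$$

Now it can be easily verified that $T_l$ and $E^p$ are the same as before. We compute the $\theta_p$'s by exploring all possible sets of $\theta_p$'s. We summarize the dynamic program as follows:

$$C_{jl} = \begin{cases} \text{cost of fitting } T_p \text{ to } \theta_j, \theta_{j+1}, \ldots, \theta_l, & l \ge j, \\ \text{cost of fitting } T_p \text{ to the cyclic range from } \theta_j \text{ to } \theta_l \text{ (wrapping past the last angle to } \theta_1\text{)}, & l < j, \end{cases}$$

$$E^{p}_{sj} = \begin{cases} C_{sj}, & p = 1, \\ \min\limits_{i} \left[ \ldots \right], & p > 1. \end{cases}$$

[...] 20). The tri-corners are filtered to show the ones with not-too-small contrast (> 40).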
Figure 8: Another image showing the results of Kona. (a) Marked input image. (b) 2-corners. (c) Junction templates of (b). (d) 3-corners. (e) Junction templates of (d).
6 Conclusions
To conclude, we summarize the three main observations and features of Kona. First, we recognize the importance of combining template matching with the gradient term in the (template) energy formulation, which results in a combination of a best-template-fit with an edge grouping process, with the size (scale) and the location of the template as a by-product. Second, an improvement is obtained by removing a small disk in the center of the junction. Third, there is a simple dynamic programming solution to the problem of junction detection. This paper demonstrates the successful use of piecewise constant functions to detect features such as junctions and corners. Similar piecewise constant functions may be used for features such as bars, blobs, and end-points.
References
[1] W. Freeman, E. Adelson, Junction Detection and Classification, Proc. ARVO, 1991.
[2] A. Guzman, Decomposition of a Visual Scene into Three-Dimensional Bodies, Proc. AFIPS 1968 Fall Joint Computer Conference, 1968.
[3] D. A. Huffman, A Duality Concept for the Analysis of Polyhedral Scenes, Machine Intelligence, Vol 6, Edinburgh Univ. Press, Edinburgh, U.K., 1971.
[4] M. F. Hueckel, An operator which locates edges in digitized pictures, J. Assoc. Comput. Mach., Vol 18, 1971.
[5] D. L. Waltz, Understanding Line Drawings of Scenes with Shadows, The Psychology of Computer Vision, McGraw-Hill, New York, 1972.
[6] M. F. Hueckel, A local operator which recognizes edges and lines, J. Assoc. Comput. Mach., Vol 20, 1973.
[7] R. Hummel, Feature Detection Using Basis Functions, Computer Graphics and Image Processing, Vol 9, 1979.
[8] W. Förstner, E. Gülch, A Fast Operator for Detection and Precise Location of Distinct Points, Corners, and Centres of Circular Features, Proc. of Intercommission Conference on Fast Processing of Photogrammetric Data, Interlaken, Switzerland, 1987, pp 281-305.
[9] R. Deriche, T. Blaszka, Recovering and Characterizing Image Features Using An Efficient Model Based Approach, Proceedings of Computer Vision and Pattern Recognition, New York, 1993.
[10] G. Giraudon, R. Deriche, On corner and vertex detection, Proceedings of Computer Vision and Pattern Recognition, Hawaii, 1991.
[11] D. Mumford, T. Shah, Optimal Approximation by Piecewise Smooth Functions and Associated Variational Problems, Comm. on Pure and Applied Mathematics, Vol XLII, No 5, July 1989.
[12] M. Nitzberg, D. Mumford, T. Shiota, Filtering, Segmentation, and Depth, Springer Verlag, Berlin, 1993.
[13] D. Beymer, Junctions: Their detection and use for grouping images, Master's thesis, Massachusetts Institute of Technology, 1989.
[14] L. Parida, D. Geiger, B. Hummel, Junction Detection Using Piecewise Constant Functions, 1996.
[15] J. Rissanen, A universal prior for integers and estimation by minimum description length, Annals of Statistics, vol. 11, pp 416-431, 1983.
[16] V. Caselles, B. Coll, J. M. Morel, A Kanizsa Programme, Technical Report, Universitat de les Illes Balears, Spain, 1996.