Contact is a property that is invariant over projective transformations. ... 2 Projective Invariance of Contact Point Ellipses ..... -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7. -7.
Projectively Invariant Decomposition and Recognition of Planar Shapes Stefan Carlsson Computational Vision and Active Perception Laboratory Dept. of Numerical Analysis and Computing Science Royal Institute of Technology S - 100 44 Stockholm, Sweden abstract
An algorithm is presented for computing a decomposition of planar shapes into convex subparts represented by ellipses. The method is invariant to projective transformations of the shape, and thus the conic primitives can be used for matching and de nition of invariants in the same way as points and lines. The method works for arbitrary planar shapes admitting at least four distinct tangents and it is based on nding ellipses with four points of contact to the given shape. The cross ratio computed from the four points on the ellipse can then be used as a projectively invariant index. It is demonstrated that a given shape has a unique parameter-free decomposition into a nite set of ellipses with unit cross ratio. For a given shape, each pair of ellipses can be used to compute two independent projective invariants. The set of invariants computed for each ellipse pair can be used as indexes to a hash table from which model hypothesis can be generated Examples of shape decomposition and recognition are given for synthetic shapes and shapes extracted from grey level images of real objects using edge detection.
1 Introduction Ecient and reliable recognition of objects in images requires a representation of the object with low complexity for computational eciency but suciently rich in order to separate it from other objects. The geometry of an object, represented by its edges and contours is often ideal for this purpose. This level of description is in general broken down into the levels of feature points, lines and sometimes higher order curve 1
approximations. Recognition is then based on the relations between these features. The complexity of recognizing 3-D objects from 2-D images is increased by the fact that an object in 3-D will project dierent images depending on the viewpoint of the observer. The desire to achieve ecient viewpoint independent object recognition has therefore recently led to an increased interest in projective invariance for object representation (Barret 1991, Forsyth 1990, Forsyth 1991, Lamdan 1988, Weiss 1988). Projectively invariant descriptors of objects can be computed from relations between points, lines and conics that are coplanar on object surfaces in 3-D. These projectively invariant descriptors are in general de ned as cross ratios, which can be computed from combinations of 5 points or lines in general position on a planar surface. Cross ratios can also be computed from combinations of points and lines with conics (Forsyth 1991, Rothwell 1992-1) This requires the ecient an reliable extraction of these features from the image. and is of course easiest when the image of the object actually contains these features in sucient quantities in order to compute cross ratios. For arbitrary curved objects, projectively invariant point and line descriptions is more complicated. In this case invariant points and lines can be extracted from in exions or bitangents (e.g. (Rothwell 1992-1) (Lamdan 1988)). Related to this is the use of combined algebraic and dierential invariant descriptors (van Gool 1992). These methods of point and line descriptions have limitations for arbitrary curved shapes. For example, they cannot be used for convex shapes. For structured shapes, the number of invariants grows very rapidly with the number of points and lines used in the representation which leads to problems when the invariants are used for indexing. It is therefore desirable to look for more expressive and global primitives for projectively invariant representation of planar shape. A most natural extension of the use of lines is to use homogeneous polynomials. A homogeneous polynomial is transformed projectively into a homogeneous polynomial with the same degree. The parameters of the homogeneous polynomial can therefore be used in the same way as the coordinates of points and lines as projectively invariant shape descriptors. The problem lies in the association of the homogeneous polynomial with the given shape. This association must commute with the transformation. The homogeneous polynomial 2
asociated with a shape must be the projective transform of the polynomial that was associated with the original shape (Forsyth 1991). For a restricted class of ane transformations, methods of invariant associations of homogeneous polynomials to point sets were developed in (Bookstein 1979). This was extended to the general ane case in (Forsyth 1991). The restriction to point sets and ane transformations is a limitation of this method. It requires the ane matching of the point sets used for association and, since shape data are in general given as continuous curves, this means that the point sets have to be extracted in an ane invariant manner from the curves. In the projective case and with continuous curves instead of points the association has to be based on projectively invariant properties. Two curves are said to be in contact of order n if they coincide at a certain point and their derivatives up to order n ? 1 are the same. Contact is a property that is invariant over projective transformations. For a given shape we can consider the class of homogeneous polynomials and a certain number of contact points with a speci ed order of contact. A member of this class will then project to a member of the corresponding class given by the projective transform of the shape. The method that will be presented, outlined in (Carlsson 1992), is based on using ellipses, which form a subset of second-order homogeneous polynomials. For a given shape we will study the class of ellipses with four contact points with the shape. The order of contact is two, that is, we consider ellipses where the tangents of the ellipses at the four contact points coincide with the tangents of the given shape. The fact that we are considering just a subset of second order polynomials means that we have to restrict the projective transformations to those transforming an ellipse into an ellipse and not a parabola or hyperbola. This restricted class of transformations correspond to the physically very natural constraint of having the object in front of the camera in an imaging situation. As will be explained in the next section, the choice of four contact points and secondorder contact provides a method of identifying single members of this family of ellipses over projective transforms using the cross ratio of four points on a conic. It is thus possible to extract a nite set of ellipses from a shape in two projectively corresponding 3
Projectively invariant decomposition
Projective transformation
ellipse extraction
ellipse extraction
Projective transformation
Figure 1: Ellipses with 4 contact points as basic primitives for projectively invariant shape representation images in such a way that the ellipses in the two frames are in projective correspondence gure 1 The proposed method of shape representation has similarities with the medial axis transform (Blum 1973) (Brady 1983) (Lee 1982) and can in certain respects be seen as a generalization. Using ellipses as primitives, which are convex, recalls shape decomposition methods (Homan 1984) (Richards 1985) . It will be seen that the ellipses extracted with this method, will in general correspond to a perceptual decomposition of the object into its convex subparts.
4
2 Projective Invariance of Contact Point Ellipses 2.1 Cross Ratio of Points and Lines on a Conic To discuss the invariance properties of ellipses with various contact points we start with the cross ratio, the fundamental invariant for points and lines in the plane. The cross ratio can be expressed as a ratio involving determinants. Given three column vectors x1; x2; x3 in R3 we will use the bracket notation for the determinant of the 3 3 matrix formed by these vectors: [x1 x2 x3 ] =
det(x1 ; x2; x3)
(1)
For 5 points in the plane, with homogeneous coordinates xa; xb; xc; xd; xe; the cross ratio is de ned as: [xa xb xe ] [xc xd xe ] = [xa xc xe ] [xb xd xe ]
(2)
The invariance of the cross ratio over projective transformations xi = T xi , where T is a nonsingular 3 3 matrix, follows easily from the rule 0
[xi xj xk ] = [T xi T xj 0
0
0
T xk ]
= [T ] [xi xj xk ]
(3)
Inserting this expression on the left side in (2) we see that all determinants [T ] will cancel. If we consider the fth point as a variable x and denote the cross ratio = 1=2 we have 1[xa
xc x] [xb xd x]
?
2[xa
xb x] [xc xd x] =
0
(4)
This is a second-order polynomial in x representing a conic through the points xa; xb; xc; xd. The cross ratio = 1=2 is then the cross ratio of the four points on the conic. Varying the cross ratio we get a pencil of conics through the four points. This pencil of conics can be expressed using the homogeneous symmetric matrices P ( ); P1; P2 as: 5
xT P ( )x
= xT (1P1 (xa : : : xd ) +
2P2 (xa : : : xd ))x
= 0
(5)
Due to the duality between points and lines, a conic can be expressed as a quadratic form in line coordinates u. The pencil of conics tangential to four lines ua ; ub; uc ; ud expressed in line coordinates can be written: 1[ua
uc u] [ub ud u]
?
2[ua
ub u] [uc ud u] =
0
(6)
Just as in the point case this can be expressed using homogeneous symmetric matrices Q( ); Q1; Q2 as: uT Q( )u
= uT (1Q1 (ua : : : ud ) +
2Q2 (ua : : : ud ))u
= 0
(7)
A projective transformation T that maps points xi into xi = T xi will map the matrices P ( ) and Q( ) to matrices P ( ) and Q ( ). If we apply this transformation to the coordinates in the pencils (4) and (6) the conic matrices can be shown to be related as: 0
0
P ( )
0
= T T P ( )T
Q ( )
0
That is, the conic with cross ratio = ratio.
0
1=2
=
T Q( )T T
(8)
maps into a conic with the same cross
For a given shape we will consider the class of ellipses having four contact points with the contour making up the shape outline. A contact point is de ned as a point of secondorder contact of the ellipse and the contour; that is, at that point both point and line coordinates of the contour and ellipse will coincide. Point and line coordinates of points on a conic are related as:
ua
=
P xa
ub
=
P xb
uc
6
=
P xc
ud
=
P xd
(9)
For four points on a conic the cross ratio will equal that computed from the tangents of the points. This follows easily from the fact that [ua ub u] = [P xa P xb P x] = [P ] [xa xb x]
(10)
which is applied to all the brackets in (6). An ellipse with four contact points can therefore be expressed in point and line coordinates as: xT P ( )x
uT Q( )u
= 0
= 0
(11)
where P and Q are related to the point and line coordinates of the contact points according to (5) and (7). Since line and point coordinates of a conic are related by u = P x the relation uT Qu = 0 can be written as xT P T QP x = 0 from which we see that P = P T QP ; that is, we have the important relation: P ( )
=
Q?1 ( )
(12)
This equation relates point and line coordinates of contact points and will play an important part in the design of an algorithm for actually locating contact points.
2.2 Four and Five Contact Point Ellipses The cross ratio computed from the contact points is invariant and can be used to identify corresponding ellipses in projective transform pairs. In the general case, for a given shape the class of ellipses with four contact points will be in nite. It is, however, a one-parameter in nite family as can be seen from the following crude equation-counting argument. Suppose that the curve can be parameterized with a parameter s. The contact points are then represented by parameters sa ; sb; sc ; and sd . Together with the ve parameters for representing the ellipse, this gives us a total of nine parameters to be determined. These parameters are subject to the constraint that the ellipse should be 7
in contact with the shape at the four points. This gives four constraints for coincidence of points and four constraints for coincidence of tangents, making up a total of eight constraints. Nine parameters and eight constraints implies that the class of four contact point ellipses will be a one-parameter family. The important consequence of this is that in the generic case there will only be a nite number of ellipses with a speci c cross ratio; that is, we will have at most a nite ambiguity when identifying ellipses in projective transform pairs. An analogous argument can be applied to the case of ve contact points. In this case we will have ten unknown parameters and ten constraints, which in general will give a nite number of ellipses. In general there will be a multiplicity of cross ratios for four points or lines depending on the order in which the points are chosen. Permutations of the order of the points will give new values of the cross ratio. However, if we adopt the convention of choosing the points on the ellipse in a certain order, e.g. clockwise, and restrict the projective transformations to those leaving the order of the points invariant, it can be shown that only two dierent cross ratios can be computed for a set of four points on an ellipse. These two values of the cross ratio will be each other's inverses. By choosing ellipses with cross ratio = 1, this multiplicity is also taken care of. It can be shown that the projective transformations leaving the order of the points invariant are those corresponding to the case of relative camera motions with the object in front of the camera, which is a most natural constraint in real imaging situations. The parameters and cross ratio of the ellipses with four contact points to a smooth curve will in general vary smoothly with the contact points. When such an ellipse makes a fth contact with the curve, ve dierent cross ratios can be computed, each corresponding to a selection of four contact points. A ve contact point ellipse is therefore a common member of ve dierent families of four contact point ellipses with smoothly varying parameters and cross ratios. Five contact point ellipses can be seen as the analogues of the circles of the branching points in the skeleton of the medial axis transform. Figure 2 shows ve families of four contact point ellipses with varying cross ratio. The shape is a polygon and these ellipses were computed by an exhaustive search of all 8
combinations of four lines in the polygon. Figure 3 shows the ve contact point ellipse common to these families and ellipses with four contact points and cross ratio equal to one. As can be seen they correspond quite well to the intuitive decomposition of the shape in convex subparts. The exact relation between a certain shape and its associated families of four contact point ellipses is probably quite complex and a matter of further study.
3 An Iterative Algorithm for Fitting Four or Five Contact Point Ellipses 3.1 Structure of the Algorithm The ellipses of the polygon in Figs. 2 and 3 were computed by an exhaustive search among all combinations of lines in the polygon. It is however desirable to work with more general shapes extracted from grey-level images. The initial step of shape extraction from grey-level images is edge detection. The contours are then represented by the sampled image coordinates (xi ; yi) of the edge points. These coordinate lists are of nite precision, determined by the sampling grid, and long contours are often fragmented into smaller parts. An algorithm for computing contact point ellipses has to take all these facts into account. The algorithm works iteratively, starting with an initial ellipse and updating it in a manner similar to that of active contour nding (Kaas 1988). It is designed to have a xed point for ellipses with four contact points and unit cross ratio, and to break whenever ve contact points are encountered.
3.2 Main Iteration Loop The relation between the point and line coordinate matrices of an ellipse with four contact points was derived in the previous section (12). This relation is the basis for deciding whether four points on a curve can be the contact points of an ellipse. Given four points on a curve xa ; xb; xc; xd with tangents ua ; ub; uc ; ud; the quantity:
9
Figure 2: Five families of four contact point ellipses. Left: one family. Center: two families. Right: two families. The families in the center and right gure are symmetrically related by a re exion in the horizontal symmetry axis of the shape. All these families have the ve contact point of gure 3 in common.
Figure 3: Left: Five contact point ellipse. Center and right: four contact point ellipses with cross ratio equal to one .
10
2 min jjP ( ) Q( ) ? I jj
(13)
will be zero i the four points are contact points of the ellipse P (^ ) where ^ is the value of the cross ratio that minimizes the quantity. P and Q here are properly normalized. For arbitrarily chosen points the quantity will not be zero, but the ellipses P (^ ) and Q(^ ) will represent a certain best compromise. Minimization of the criterion in (13) is one step in the iterative algorithm, mapping points on the curve to an updated ellipse. We actually have two choices for the updated ellipse, P (^ ) and Q?1 (^). By choosing Pupdate = Q?1 (^) we will get a four contact point ellipse in one iteration in the case the shape is a four sided polygon, since all ellipses Q?1 ( ) are four contact point ellipses in this case. In order to have a complete iteration a method of updating the four points on the curve is required. For a given ellipse P we would like to nd the four new candidates for contact points. Referring to Fig. 4 start with computing the algebraic distance to each point xi ; yi on the curve. Using a normalized matrix P and homogeneous coordinates x = (xi ; yi ; 1)T we have (xi ; yi)
= xT P x
(14)
A contact point on the curve will have its tangent coinciding with the tangent of the ellipse P x. It will also be a local extremum of the algebraic distance (xi; yi ) along the curve. Since the level curves of constant algebraic distance corresponds to scaled versions of the ellipse, any local extremum of algebraic distance along the curve will correspond to a contact point of a scaled version of the ellipse, as shown in Fig. 4. The points of local extrema of algebraic distance therefore form good candidates to be used as tentative contact points in the iterative algorithm. There are however in general more than four local extrema of algebraic distance along the curve. In order to select four speci c points proceed as follows. 1. For each local extremum xi, the tangent ui is computed using the matrix P : u i = P xi . 11
Figure 4: Left: Ellipse in iterations, level curves of algebraic distance (broken) tangent lines of points of local extrema of algebraic distance (thin) and minimal polygon (wide). Right: four-line polygon and contact points for updating of ellipse. 2. From all the tangents ui , the polygon enclosing the center point of the ellipse and not containing any other tangents is computed. This is referred as the minimal polygon. 3. From all the tangents in the minimal polygon select the four that enclose the center point of the ellipse and have minimal area. This procedure gives four tangent lines to the curve (ua : : : ud ) and their corresponding tangent points (xa : : : xd ), for a given ellipse P . This closes the iteration loop. Note that four contact points will be xed points of the algorithm. The iterative algorithm is started by selecting an initial ellipse. This initial selection is a critical step for the convergence properties of the algorithm. This problem is treated in the experimental section The iterative algorithm for nding four contact point ellipses to a curve represented by its edge coordinates can be summarized in the following steps. (See gure 5) 12
1. Select an initial ellipse. 2. Compute points on the curve that are local extrema of the algebraic distance function generated by the ellipse. Compute tangents at these points. 3. Compute the minimal polygon from tangents enclosing the center of the ellipse. 4. Select the four lines in the minimal polygon that encloses the minimum area. Together with step 2 this gives four points and four lines. 5. Use these four points and lines to compute matrices P ( ) and Q( ) parameterized by the cross ratio . 6. Find cross ratio ^ that minimizes jjP ( ) Q( ) ? I jj2. 7. Update ellipse Pupdate = Q?1 (^). 8. Check for convergence to four or ve contact point ellipse. 9. If no convergence, go to step 2. Convergence is in two steps: 1. Convergence to an ellipse with four contact points but arbitrary cross ratio. 2. Convergence to an ellipse with four contact points and unit cross ratio or an ellipse with ve contact points. The criterion for judging whether a point is a contact point is simply its distance from the ellipse. If this is below a threshold the point is declared to be a contact point. Since all points have the property of being local extrema of algebraic distance along the curve, this will automatically mean that a point coinciding with the ellipse will have its tangent parallel to that of the ellipse that is, it will be a contact point. When convergence to four contact points has been declared, the cross ratio ^ computed by the minimization in (13) is checked to lie within a threshold of unity. If not, points of local extrema of algebraic distance along the curve whose tangents are in the minimal polygon, are checked to lie within threshold distance of the ellipse. If such a point is found, convergence to ve contact points is declared. If no convergence is declared, the ellipse is updated in order to nd unit cross ratio 13
Invariant ellipse fitting algorithm
Find local min and max of alg. distance along edges
Select initial ellipse P
Convergence
Update ellipse to unit cross ratio −1 P = Q(1)
Compute tangent lines at edge points of local min and max
no Find minimal polygon enclosing center of ellipse
yes
Check 5 point convergence 5 points within 1 pixel from ellipse
yes
Select 4 lines from minmal polygon , enclosing minimal area
no |cr_opt − 1| < thr ?
yes Compute pencils:
no
4 points −−> P(cr) 4 lines −−> Q(cr)
Check 4 point convergence 4 points within 1 pixel from ellipse ?
Compute optimal cross ratio cr:
Update ellipse
min { || P(cr) Q(cr) − I || } cr
−1 P = Q(cr_opt)
Figure 5: Block diagram of algorithm for invariant ellipse tting 14
ellipses. This updating is simply Pupdate
= Q?1 (1)
(15)
That is, P is updated as the ellipse with unit cross ratio in the pencil of ellipses inscribed in the four-sided polygon made up of the tangents of the four points. This can be seen as a linear extrapolation since if the curve to which we try to t ellipses is actually a four-sided polygon, we will get convergence in one step. The iterations are stopped if no convergence is found within 40 iterations. Iterations are also stopped if less than four lines are found in the minimal polygon.
3.3 Experimental Results on Synthetic Shapes Figure 6 shows an example of the evolution of the initial steps of the iterative algorithm on a synthetic shape. The tangent lines corresponding to local extrema of the algebraic distance of the ellipse are shown in white and the resulting minimal polygon in black on the left side of the gure. The right side shows the polygon of minimal area selected from the minimal polygon and the updated ellipse. In this case convergence to a four contact point ellipse is in 3 iterations. Typically the convergence to a four contact point ellipse is in 2 to 5 steps, while the convergence to a unit cross ratio ellipse is 5 - 15 iterations. A critical step in the algorithm is of course the selection of the initial ellipse. The algorithm that has been described will converge to a single four or ve contact point ellipse. For a given shape there will in general be several four contact point ellipses with unit cross ratio and ve contact point ellipses. Selection of the initial ellipse will of course determine to which of these ellipses the algorithm will converge. This means that the parameter space of ellipses can be partitioned into regions, with each region corresponding to the convergence to a certain unit cross ratio four contact point ellipse or a ve contact point ellipse. At least one initial ellipse must be selected from every such region in order that all four contact point ellipses are found. Since four contact point ellipses with unit cross ratio seem to correspond rather well with a decomposition of the object into convex subparts, a natural way to nd initial ellipses is to use a very simple initial decomposition into convex subparts, and use a simple ellipse approximation of these subparts. This initial decomposition need not be perfect in any way. The important 15
Tangent lines and minimal polygon: iteration 1
Minimum area 4 side polygon: iteration 1
Tangent lines and minimal polygon: iteration 2
Minimum area 4 side polygon: iteration 2
Tangent lines and minimal polygon: iteration 3
Minimum area 4 side polygon: iteration 3
Figure 6: Steps in the iterative ellipse tting algorithm. The shape contour is in full black. Left: tangents of points of local extrema of algebraic distance (white,) lines of minimal polygon (black). Right: updated ellipse and minimum area four-side polygon. 16
thing is that it should yield an ellipse within the convergence region for the four contact point ellipse corresponding to that subpart. In the case of synthetic images the algorithm of (Homan 1984), where convex subparts are found by computing points of local minima of negative curvature along the shape contour, will be used. For the contour points pn = (xn ; yn )T between two such local minima we nd the center of gravity pcg and positive de nite 2 2 scatter matrix S : pcg
=
XN n
n=1
p =N
S
=
XN ( n ? cg)( n ? cg )T
n=1
p
p
p
p
=N
(16)
With p = (x; y )T , the initial ellipse is chosen as (p ? pcg )T S ?1(p ? pcg ) = 1
(17)
This ellipse will give a reasonably good approximation to the shape of the points p1 : : : pN . In this process the end points of the contour segment between the local curvature minima are given extra weight, which will bias the initial ellipse towards the junction of the subpart with the main shape. The results of choosing initial ellipses in this way are shown in Fig. 7 on the left side. The right side shows the resulting unit cross ratio four contact point ellipses and the ve contact point ellipses. The result con rms the initial assumption that unit cross ratio four contact point ellipses correspond to the decomposition into convex subparts. Several alternative initializations with randomly selected starting ellipses gave the same results and no alternative four contact point unit cross ratio ellipses. Convergence in those cases needed more iterations however.
4 Model Library Indexing Using Conic Pair Invariants 4.1 Edge Detection and Ellipse Fitting In order to test the performance of viewpoint invariant recognition, images of 3 planar objects were acquired in two dierent viewpoint. These are shown in gures 8 9 and 10 The edges of the objects were found with a standard edge detector and the linked edge coordinates were used to compute ellipses with 4 contact points and unit cross ratio or 5 contact points. 17
Figure 7: Initial ellipses from subpart approximation using negative curvature maxima (left) and resulting ellipses with unit cross ratio after applying the iterative algorithm (right) 18
Since the edge detection algorithm applied to real world grey value images, in general gives rise to only fragmented lists of edge data, the method of nding initial ellipses based on approximative subparts could not be used . The initial ellipses were in this case chosen as circles, regularly spaced over the object with distances 20 pixels apart. The result of edge detection and ellipse shape decomposition for each object is also shown in the gures 8 9 and 10. Although edge detection on real world images will seldom result in perfect contours, this is not critical for the ellipse tting algorithm to work. The linked edge sequences only have to be long enough in order that the tangent can be computed using the point of local extrema of algebraic distance and the matrix of the ellipse. Important to note is however that if breaks occur in the edge sequences at contact points of unit cross ratio ellipses, this will of course aect the outcome. Another serious problem is the presence of spurious edges, leading to unwanted ttings. An example of this can be seen in the second view of object A in the upper right part of the edge picture. Since there is a threshold in determining whether an ellipse has converged to unit cross ratio and since dierent initial ellipses will have dierent trajectories of convergence, there is a slight spread of converged ellipses in some cases. The threshold for deciding unit cross ratio was in these examples 0:01. Contours on the object that are ellipses will in general result in ve contact point ellipses.
4.2 Invariants From Pairs of Ellipses The output of the tting algorithm are ellipses with four or ve contact points with the shape contour. If the shape is transformed projectively, the ellipses and contact points computed in the transformed frame will be in projective correspondence with the ellipses and contact points in the original frame. They can therefore be used for computation of transformational invariants. Four points and four lines through these points is a con guration with 12 degrees of freedom. If the four points are contact points of an ellipse with a speci c cross ratio, the lines are completely determined by the points via the four contact point ellipse. Every set of four contact point and lines therefore only has 8 degrees of freedom. This is the same number as the d.o.f. of the projective transform, which implies that in general no projective invariant can be computed from a single set of four contact points and lines. In order to compute projective invariants 19
Figure 8: Grey level images (upper) and edges with resulting ellipses (lower) Object A. 20
Figure 9: Grey level images (upper) and edges with resulting ellipses (lower) Object B. 21
Figure 10: Grey level images (upper) and edges with resulting ellipses (lower) Object C. 22
we have to use at least two sets of contact points and lines. We can also use the pairs of ellipses from two sets of contact points. Since ellipses have 5 d.o.f. a pair of ellipses will have 10 d.o.f. and therefore we can expect 10 ? 8 = 2 independent invariants for each ellipse pair. Given two coplanar conics described by the symmetric matrices P1 ad P2 normalized so that: det(P1 )
=
det(P2 )
= 1
(18)
it is possible to de ne two projective invariants (Forsyth 1991): I1
=
T race(P1?1 P2 )
I2
=
T race(P2?1 P1 )
(19)
Since these invariants can be computed from both four and ve contact point ellipses they have been chosen for simplicity. It should be noted that a very important feature of the invariant ellipse tting algorithm is that it performs a grouping of points and lines that is invariant over projective transformations. Four contact points and lines associated with a certain ellipse will be in projective correspondence with the contact points and lines of the corresponding ellipse in the transformed frame. Given n points in an image, the number of invariants that can be computed from these are of the order n5 , while the number of invariants computed from pairs of ellipses will be of the order n2 . The grouping performed by the algorithm therefore controls the combinatorial explosion of the number of invariants associated with the use of points and lines. It is important to note that the ellipses and contact points provide even more invariant information that could potentially be used. Examples of such information are given by:
Colour , texture or grey scale measures in the areas enclosed by the ellipses
Information whether the ellipse pair is a 4-4, 4-5, or 5-5 contact point pair
Number of intersections of the ellipses in the pair Convexity/concavity of the contact points relative to the tangent of the contact point
23
Object A
I2
Object B
7
7
6
6
5
5
4
4
3
3
2
2
1
I2
0
1 0
-1
-1
-2
-2
-3
-3
-4
-4
-5
-5
-6
-6
-7 -7 -6 -5 -4 -3 -2 -1 0 I1
1
2
3
4
5
6
7
2
3
4
5
6
7
-7 -7 -6 -5 -4 -3 -2 -1 0 I1
1
2
3
4
5
6
Object C 7 6 5 4 3 2 I2
1 0
-1 -2 -3 -4 -5 -6 -7 -7 -6 -5 -4 -3 -2 -1 0 I1
1
Figure 11: Invariants I1, I2 computed for pairs of views for objects A, B and C. (Only ellipses that were judged to have a counterpart in the opposite view are used) 24
7
In order to test the stability and robustness of the conic pair invariants, they were computed from ellipses in the gures 8 9 and 10, and the result is shown in g. 11 for each object . The axis here represent the signed logarithms of the absolute values of the invariants I1 ; I2 In order to reduce the eects of the spread in ellipses that converge to the same four contact points, a simple clustering of ellipses was performed so that all ellipses that dier in center with less than 5 pixels were grouped together. Only ellipses that were judged to have a counterpart in the opposite viewpoint image were selected in this gure. Ideally ,therefore for a certain object the values of the invariants from the two viewpoints should coincide. As can be seen from g. 11 this is most often the case for all objects with a few notable exceptions. Reasons for deviation of invariants in dierent viewpoints are imperfect edge detection, discrete contour point representation and termination criteria of the ellipse extraction algorithm. The most important reason for deviation can be seen if we study the edge images of objects A and B. Note that since the objects have a height of 2-3 mm, edges will be obtained both from the top and bottom part o the objects. Due to occlusion and imperfect edge detections, ellipses will in some cases have contact points with contours that do not correspond to a planar contour of the object but are combinations of projections from the top and bottom part of the object. These ellipses will not correspond to ellipses that are perfectly coplanar with the object surface which of course will aect the invariants.
4.3 Model Library Indexing Given an image of an object ellipses can be extracted and invariants can be computed. The invariants can then be used as indexes into a library of models. The degree of correspondence between the invariants computed from the image and those with a certain model in the library will be a measure of the probability that the model is present in the image. Due to the nite tolerance of the ellipse tting and other factors such as non planarity of tted ellipses, the correspondence between image and model invariants will never be perfect. In general therefore, an image of a speci c library object will give rise to multiple hypothesis of library objects. This number is of course dependent on the threshold set in order to guarantee that an hypothesis is generated for the correct object. 25
The representation of an image and an object in the library will be in the form of a table, associating invariant values (I1 ; I2) to each pair of ellipses. The degree of correspondence between an image an a library model is based on the number of corresponding ellipses in the image and model. Correspondence between an ellipse in the image and an ellipse in the model is measured by how well the invariants (I1; I2), computed for the image ellipse with all other image ellipses, agree with those computed for the model ellipse with all other model ellipses. Due to the presence of clutter edges in the image and possible lack of convergence of certain ellipses image pairs cannot be expected to have a corresponding pair in the model. The total number of established ellipse pair correspondences for an image ellipse and a model ellipse will of course depend on the total number of ellipses in the image and model We will therefor declare an image ellipse and a model ellipse to be in correspondence if the number of corresponding ellipse pairs derived from them exceeds a threshold controlled by the total number of ellipses in the image and the model. For each model model we then get a speci c number of ellipse correspondences that will measure the probability of the presence of the model in the image. This algorithm can be implemented eciently using a hash table in a way very similar to that used for ane point matching in (Lamdan 1988). Using a hash table means that we avoid searching through every model in the library. The values of the invariants I1 and I2 are quantized into bins. Two ellipse pairs are said to be in correspondence if their corresponding invariants fall in the same bin. In each bin we store models and ellipse pairs with invariants falling in this bin. For each image ellipse eim i we then compute the invariants generated by every other image ellipse. These invariants are used as indexes into the hash table, and for every mod model and ellipse pair emod m ; en which appears there we vote for the association between mod mod the image ellipse eim i and the ellipses em and en of that model. Model representation and recognition can be summarized as: Model representation. 1. For each model and pair of ellipses, compute the invariants (I1 ; I2) and quantize them. 2. Enter the model and the ellipse pair into the bin in the hash table 26
determined by the quantized values of (I1; I2) Recognition 1. Given an image, extract all ellipses and for each ellipse pair compute the invariants (I1; I2) 2. For each ellipse eim i in the image, Use the values of (I1; I2) to enter into mod the hash table. For every model and model ellipse pair emod m ; en which appears there, vote for the association with the image ellipse eim i . 3. For each ellipse eim i in the image, threshold the number of votes for every model ellipse. If above threshold declare a correspondence between the image and model ellipse. 4. For each model, sum the number of ellipse correspondences with the image. This number measures the strength of the hypothesis that the model is present in the image In order to evaluate the performance of the model representation and recognition scheme, a large model data base of 100 models was created. Each model consisted of 6 ellipses. These ellipses were constructed by randomly choosing the parameters of 3 ellipses an re ecting these in a symmetry axis. Each model can then be said to represent a symmetric object of the same degree of complexity as that of the objects in the gures 8 9 and 10. The reason to make all model objects symmetrical is that due to the projective correspondence between ellipses, a certain image ellipse will correspond both to the correct model ellipse and its symmetric counterpart. We can therefore expect twice as many image model ellipse correspondences for symmetric object compared to un symmetric objects. All the ellipses in the perspective view images of the objects A, B and C in the right part of the gures 8 9 and 10 were tested against the database of the 100 random models and models created from the ellipses extracted from the images of the near vertical camera direction of the objects. The result in terms of a histogram of the number of ellipse correspondences for each object with the random models and the correct model is shown in gure 12 27
Number of models
Number of models
80 70
Number of models
80
Object A
70
80
Object B
70
60
60
60
50
50
50
40
40
40
30
30
30
20
Model A: 9
10
20
Model B: 10
10
20
Object C
Model C: 10
10
0 2 4 6 8 10 12 14 16 18
0 2 4 6 8 10 12 14 16 18
Ellipse correspondences
0 2 4 6 8 10 12 14 16 18
Ellipse correspondences
Ellipse correspondences
Figure 12: Histogram of number of ellipse correspondences between images of objects A, B and C and library of 100 random models The histograms show that that very few models in the library give rise to the same number of ellipse correspondences or more, as that of the true object. For object A there are 5 library models with 10 or more ellipse correspondences with image A, which is the number of correspondences between image A and model A. For objects B and C this number is only 1. The relatively higher number o for object A can be explained by the fact that the perspective image of object A contains a number of clutter ellipses that are not present in the model of A. This is not the case for objects B and C. The histograms of gure 12 indicate how many hypothesis that have to be tested for veri cation for each image. This number will of course depend on what number of ellipse correspondences that we choose as a threshold. This number will be a trade o between the certainty of generating the true hypothesis vs. the number of false hypothesis generated. The complete recognition process will have to include some kind of hypothesis veri cation 28
process. This can be implemented using reprojection to a canonical frame of both model and image and matching of complete shape contours in this frame (Rothwell 1992-1). Using the contact points and lines, every four contact point ellipse with unit cross ratio can be used to compute a canonical frame. The tangent lines of the contact points can be transformed to the unit square. The unit cross ratio ellipse will then be transformed into the unique circle inscribed in the unit square. The contact points are therefore transformed to the midpoints of the segments of the unit square.
5 Conclusions An algorithm has been presented for computing ellipses from edge coordinates of an object. For planar objects these ellipses will be related to the shape in a projectively invariant way and can therefore be used for viewpoint independent recognition. For synthetic shapes with perfect edge data the algorithm will converge very fast if the initial ellipses are chosen as approximations to the convex subparts of the shape. For real world edge data the initial problem of ecient initial selection of ellipses is still unsolved. This problem aects however primarily the computational complexity of the algorithm, since we are forced to use many initial ellipses in order to assure convergence to all unit cross ratio ellipses. The robustness of the algorithm with respect to imperfect edge detection also has to be improved. Especially the problem of disappearing of edges, due to illumination has to be considered. These problems have to be evaluated relative to the applications however. It is probably not necessary to extract all 4 contact point ellipses of an object in order to achieve recognition. In the case that truly corresponding ellipses have been extracted, the pairwise invariants computed show very good agreement in the two viewpoints. Due to the tting of ellipses to slightly nonplanar contour con gurations, the invariants will not be in perfect agreement over projective transformations. This seems to be the major cause of instability of the invariants. By matching invariants extracted from an image with those stored in a model library consisting of 100 random models, very few (1 - 5) false hypothesis were generated if 29
the threshold of the number of ellipse correspondences was set just in order for the correct model hypothesis to be generated. Although more exact comparisons with other invariant indexing systems based on lower level geometric features (Rothwell 1992-2), (Stein 1992) remains to be established, it is believed that using invariant ellipses would lead to a more robust recognition system due to its more global properties. The negative side compared to using simpler features is of course the increase in complexity of the feature extraction process.
Acknowledgements I would like to thank Lars Svensson of the Royal Institute of Technology for interesting discussions on invariants. This work was part of Esprit Basic Research Action 6448, VIVA, with support from Swedish NUTEK.
References [1] Barrett, E.B, Payton, P.M, Haag, N.N. and Brill, M.H. (1991) General methods for determining projective invariants in imagery. CVGIP-IU 53, pp. 46-65. [2] Blum, H. (1973) Biological shape and visual science. Journal of Theoretical Biology, 38, pp. 205-287. [3] Bookstein, F. (1979) Fitting Conic Sections to Scattered Data. Computer Graphics and Image Processing 8, pp. 56 - 71. [4] Brady, M. (1983) Criteria for representation of shapes. In: Rosenfeld and Beck (eds), Human and Machine Vision, Erbaum, Hillsdale NJ, pp. 39 - 84. [5] Carlsson, S. (1992) Projectively invariant decomposition of planar shapes. In: Mundy, and Zisserman (eds), Geometric Invariance in Computer Vision, MIT-Press , pp. 267 - 273. [6] Forsyth, D.A., Mundy, J.L., Zisserman, A.P. and Brown, C. (1990), Invariance-a new framework for vision. In: Proc. of 3rd ICCV, pp. 598-605. 30
[7] Forsyth, D.A., Mundy, J.L., Zisserman, A.P., Coelho, C., Heller, A. and Rothwell, C.A. (1991) Invariant descriptors for 3-D object recognition and pose. IEEE PAMI Vol. 13, No 10 , pp. 971 - 991. [8] Homan, D.D. and Richards, W. (1984) Parts of recognition. Cognition, 18, pp. 65-96. [9] Kass, M., Witkin, A. and Terzopoulos, D. (1988) Snakes: Active contour models. International Journal of Computer Vision 1, pp. 321-331. [10] Lamdan