
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 8, NO. 7, JULY 1999

Image Registration and Object Recognition Using Affine Invariants and Convex Hulls

Zhengwei Yang and Fernand S. Cohen, Senior Member, IEEE

Abstract—This paper is concerned with the problem of feature point registration and scene recognition from images under weak perspective transformations, which are well approximated by affine transformations, and under possible occlusion and/or appearance of new objects. It presents a set of local absolute affine invariants derived from the convex hull of scattered feature points (e.g., fiducial or marking points, corner points, inflection points, etc.) extracted from the image. The affine invariants are constructed from the areas of the triangles formed by connecting three vertices among a set of four consecutive vertices (quadruplets) of the convex hull, and hence make direct use of the area invariance property associated with the affine transformation. Because they are locally constructed, they are very well suited to handle the occlusion and/or appearance of new objects. These invariants are used to establish the correspondences between the convex hull vertices of a test image and those of a reference image, in order to undo the affine transformation between them. A point matching approach for recognition follows this. The time complexity for registering L feature points of the test image with N feature points of the reference image is of order $O(N^2 L)$. The method has been tested on real indoor and outdoor images and performs well.

Index Terms—Affine invariants, affine transformations, alignment, convex hull, occlusion, registration, weak perspective.

I. INTRODUCTION

IMAGE registration and object recognition/matching is concerned with the problem of identifying correspondences between parts of an image and a particular view of a known image scene (object). The transformation that describes the projective relationship between an object and its image is the nonlinear perspective transformation. Since, under weak perspective conditions, the perspective transformation can be well approximated by an affine transformation, we limit our discussion here to the simpler affine transformation. Moreover, the correspondence could be with respect to the whole image or part of it, and an image may contain part of an object, a whole object, or more than one object. Hence, the problem of interest is image registration in the presence of possible occlusion/appearance of new objects (i.e., disappearance of feature points and the appearance of new ones), and under an unknown viewing position (i.e., an unknown affine transformation).

Manuscript received August 5, 1996; revised September 16, 1998. This work was supported by the National Institutes of Health under Grant P41 RR01638. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. John Goutsias. Z. Yang is with the KLA-Tencor Corporation, San Jose, CA 95134 USA. F. S. Cohen is with the Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA 19104 USA (e-mail: [email protected]). Publisher Item Identifier S 1057-7149(99)05112-X.

A number of methods have been proposed for matching and object recognition, including alignment [11], linear combination [26], Fourier descriptors [1], [21], [27], correlation registration [22], shape description by time series [12], syntactic methods [8], photometric invariance [19], moment invariant methods [6], [10], [24], B-spline invariant matching [29], geometric invariants [16], and others. Basically, each method exploits different image features of the object, such as boundary, photometric properties, color, texture, moments, and discrete feature points like vertices and fiducial marks, which are extracted from the image, and uses them toward matching with a reference object. In this paper, the emphasis is on the discrete feature point matching problem, i.e., the registration or matching process is based on a set of scattered feature points (e.g., fiducial or marking points, corner points, inflection points, etc.) which are extracted from the image. The emphasis is on the matching process rather than on the process of extracting the feature points. In this work, the images are enhanced, filtered, and thresholded; the feature points here are the corners of the different parts, which can be detected with the techniques in [2] and [23].

Object alignment or registration can be achieved in many ways. By and large, there exist two approaches to alignment. The first relies on establishing correspondences between the feature points, whereas the other bypasses this by using invariant measures that remain unchanged under the transformation. Huttenlocher and Ullman [11] recognize flat objects by alignment of point triplets among the feature points of the reference scene image and the test scene image. The complexity of their algorithm in the worst-case scenario is of order $O(N^3 L^3 V)$, where $N$ and $L$ are the numbers of reference and test feature points and $V$ is the complexity associated with verifying the reference image against the test image. Lamdan et al. [13] proposed a point matching method that forms an affine invariant basis from a noncollinear triplet of model points and uses it to represent all the other model points; the complexity of this preprocessing step is of order $O(m^4)$ per model of $m$ points. Costa, Haralick, and Shapiro [5] improved the robustness of the affine invariant point matching algorithm by including an explicit noise model and an optimal voting approach. Thompson and Mundy [15] used a clustering approach to undo the affine transformation between groups of model vertices and edges and the appropriate groups in the image, under an affine approximation to the perspective transformation. To bypass the necessity of establishing correspondences, geometric invariant measures can be used


to identify the best matches. Moment invariants are such measures, and they have been used for object matching [6], [10], [24], as well as for finding the affine transformation that relates the prototype object to the transformed image [5], [7]. Huang and Cohen [28] recently reported the use of B-spline affine invariant weighted moments to analytically solve for the affine transformation parameters. Yang and Cohen [32] used a new set of cross-weighted moments derived for scattered data to analytically solve for the affine transformation parameters. In general, moment invariants are hard to use in the presence of occlusion, as they are global measures. Ibrahim and Cohen [31] used intrinsic properties of a curve modeled by a B-spline in deriving landmarks that are preserved under the affine transformation.

This paper uses a hybrid approach that utilizes affine invariants to establish the correspondence between the convex hull vertices of a test image and those of a reference image, in order to undo the affine transformation between them. A point matching approach for recognition follows this. The paper makes use of the well-known area invariance property of the affine transformation to match and recognize locally occluded or changed objects or scenes effectively and efficiently: it uses this property to develop a class of ordered local affine invariant features based on the convex hull. The algorithm can handle the case where the scene or object is uniquely represented by a polygonal outline, as well as when it is represented by a set of scattered feature points that carry none of the ordering information a polygon has. From a set of scattered feature points there is a unique convex hull, but many polygons can be extracted. Thus, we use the convex hull as a representation of the scene or the collection of objects. It is true that objects which are very different in shape may have identical convex hulls, and that would be a problem if we were to exclusively use the convex hull affine invariants to do the matching. In this paper, the convex hull affine invariant features are used for establishing the correspondence of the convex hulls' vertices. This allows the recovery of the affine transformation and the mapping of the reference object into the test object domain (or vice versa). The matching is then done using all feature points, not only those on the convex hull. This rules out identifying different classes of objects as the same even though their convex hulls are identical.

This paper is organized as follows. The preliminary background covering the concepts of the affine transformation and invariants is introduced in Section II. Section III answers the questions of why the convex hull is used and how a convex hull can be found from a set of scattered data points. In Section IV, the problem of how to construct the convex hull affine invariants is addressed. Section V discusses the process for establishing correspondences between the vertices of the convex hulls for recovering the affine transformation. Object identification is discussed in Section VI. The computational complexity of the algorithm is given in Section VII. Experimental examples are given in Section VIII. Finally, conclusions are presented in Section IX.


II. AFFINE TRANSFORMATION AND INVARIANTS

Before we embark on discussing the convex hull affine invariants, we briefly introduce the affine transformation that we deal with in this paper, discuss its properties, and introduce the concept of invariance.

A. Affine Transformation and Its Properties

An affine transformation between a point pair $\mathbf{x} = (x, y)^T$ and $\mathbf{x}' = (x', y')^T$ in 2-D space is given by

$$\mathbf{x}' = \mathbf{A}\mathbf{x} + \mathbf{b}. \tag{2.1}$$

It is a combination of several simple special mappings, such as the identity, a translation, a scaling, a rotation, a reflection, and a shear. It is a linear transformation if $\mathbf{b} = \mathbf{0}$. The Euclidean transformation (rigid body motion) is a special case of the affine transformation; it preserves lengths and angles, while the affine mapping, in general, does not. In an affine mapping, if the rank of $\mathbf{A}$ is full, i.e., $\operatorname{rank}(\mathbf{A}) = 2$, then the affine transform maps two-dimensional (2-D) objects onto 2-D objects. If $\operatorname{rank}(\mathbf{A}) = 1$, it maps 2-D objects onto a straight line. So, from here on, we assume that the transformation matrix $\mathbf{A}$ has full rank.

Affine mappings have many properties [18], of which the following are the most important in deriving affine invariants.
1) The area $S'$ of the image of an object is equal to the product of its original object area $S$ times the determinant of the transformation matrix $\mathbf{A}$, i.e., $S' = |\det \mathbf{A}|\, S$.
2) Parallel lines map onto parallel lines, and intersecting lines map onto intersecting lines, with the point of intersection of two lines mapping into the point of intersection of their images.
3) Any three noncollinear points have noncollinear images.
4) Any convex set has a convex image.

B. Affine Invariants

There exist two kinds of invariants: relative invariants and absolute invariants. A relative invariant $I(\mathbf{x}; \mathbf{a})$ is a function of the coordinates $\mathbf{x}$ and parameters $\mathbf{a}$ whose form remains unchanged under a given group of transformations, except for a proportionality factor associated with the transformation parameters. If under the transformation the invariant does not change at all, it is called an absolute invariant (i.e., the proportionality factor is equal to one). Mathematically, a relative invariant is represented by

$$I(\mathbf{x}') = \sigma(\mathbf{a})\, I(\mathbf{x}) \tag{2.2}$$

where $\sigma(\mathbf{a})$ is a proportionality factor that depends only on the transformation parameters, and $\mathbf{x}'$ is the transformed point. Consequently, for an absolute invariant, $\sigma(\mathbf{a}) = 1$. Furthermore, if $I$ is independent of the object point coordinates $\mathbf{x}$, the invariant is a constant for a given object and a given transformation, and it is called a global invariant. Otherwise, it is called a local invariant. The dependence of the local invariants on the coordinates can be eliminated, for example, by integrating or summing the local invariants over the object.
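To make property 1 and the absolute-invariant construction above concrete, here is a minimal NumPy sketch (the transformation matrix, offset, and triangles are arbitrary illustrative values, not taken from the paper). It checks numerically that triangle areas scale by $|\det \mathbf{A}|$ and that ratios of areas are therefore absolute affine invariants.

```python
import numpy as np

def triangle_area(p1, p2, p3):
    # Unsigned area of a 2-D triangle from its vertices.
    return 0.5 * abs((p2[0] - p1[0]) * (p3[1] - p1[1])
                     - (p2[1] - p1[1]) * (p3[0] - p1[0]))

# Arbitrary full-rank affine map x' = A x + b (illustrative values only).
A = np.array([[1.2, 0.4],
              [-0.3, 0.9]])
b = np.array([5.0, -2.0])
map_pts = lambda pts: pts @ A.T + b

tri1 = np.array([[0.0, 0.0], [4.0, 1.0], [1.0, 3.0]])
tri2 = np.array([[0.0, 0.0], [2.0, 2.0], [-1.0, 2.0]])

for tri in (tri1, tri2):
    S, S_mapped = triangle_area(*tri), triangle_area(*map_pts(tri))
    assert np.isclose(S_mapped, abs(np.linalg.det(A)) * S)  # property 1

# Absolute invariant as in (2.5): an area ratio is unchanged by the map.
r_before = triangle_area(*tri1) / triangle_area(*tri2)
r_after = triangle_area(*map_pts(tri1)) / triangle_area(*map_pts(tri2))
assert np.isclose(r_before, r_after)
```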


Fig. 1. Convex hull of a set of scattered feature data.


In contrast to global invariants, local invariants lend themselves to dealing with the occlusion problem. In general, an absolute invariant can be obtained by eliminating $\sigma(\mathbf{a})$ between two relative invariants. For example, given two relative invariants

$$I_1(\mathbf{x}') = \sigma(\mathbf{a})\, I_1(\mathbf{x}) \tag{2.3}$$

$$I_2(\mathbf{x}') = \sigma(\mathbf{a})\, I_2(\mathbf{x}) \tag{2.4}$$

an absolute invariant may be derived as

$$\frac{I_1(\mathbf{x}')}{I_2(\mathbf{x}')} = \frac{I_1(\mathbf{x})}{I_2(\mathbf{x})}. \tag{2.5}$$

The invariant is named after the transformation undergone; e.g., an affine invariant is an invariant up to affine transformations.

III. CONVEX HULL

In this section, we introduce the convex hull and its properties [20]. For a set of points in the plane, the convex hull is the smallest convex object containing all the points. It bounds the set of points from the outside, as illustrated in Fig. 1. The convex hull possesses very attractive properties that make it suitable for shape representation and analysis.
1) Uniqueness.
2) Computational efficiency: the upper bound of the computational complexity associated with finding the convex hull of $n$ data points is of order $O(n \log n)$, and the expected complexity is $O(n)$ if the feature points are uniformly distributed [17]. Moreover, the ordering of the vertices of the convex hull is readily available, a property that is very useful for deriving local absolute affine invariants for matching in the presence of occlusion and/or appearance of new objects.
3) Affine invariance: the convex hull of a data set subjected to an affine transformation is simply the affine-transformed convex hull of the data before the transformation.
4) Local controllability: when feature points are either added to or subtracted from the original data set, the convex hull is only locally affected, i.e., vertices in the vicinity of these changes either appear on or disappear from the original convex hull.

There are many algorithms for computing convex hulls of sets of points [17]. In this paper, we have used the efficient, recursive, and simple algorithm developed by Bykat [3]; for details, the reader is referred to [3]. The algorithm terminates with a convex hull with an ordered set (in a clockwise direction) of vertices.
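As a self-contained illustration of obtaining an ordered hull: the paper uses Bykat's recursive algorithm [3]; the sketch below instead uses Andrew's monotone chain, which is also $O(n \log n)$ and likewise returns the hull vertices in order (counterclockwise here, rather than the clockwise order quoted above).

```python
import numpy as np

def convex_hull(points):
    """Ordered convex hull (counterclockwise) via Andrew's monotone chain."""
    pts = sorted(set(map(tuple, np.asarray(points, float).tolist())))
    if len(pts) <= 2:
        return np.array(pts)

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                       # build lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):             # build upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Concatenate the chains, dropping the duplicated endpoints.
    return np.array(lower[:-1] + upper[:-1])
```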

Fig. 2. Constructing convex hull affine invariants.

IV. CONVEX HULL AFFINE INVARIANTS

In a test scene image, which has undergone an affine transformation and/or occlusion, part of the convex hull may change, and the number of vertices of the convex hull associated with the data set may increase, decrease, or remain unchanged. Because the convex hull is affine invariant and locally controllable, as indicated in Section III, local affine invariants can be constructed from the convex hull to deal with the affine transformation and/or the addition or subtraction of feature points from the original data set.

One possible set of local invariants that makes direct use of the area invariance property associated with the affine transformation may be constructed from the areas of the polygons formed by connecting neighboring vertices on the convex hull. Since the simplest polygon is a triangle, which uses only three points, the affine invariants are derived from triangle areas. To arrive at an absolute invariant, at least two relative invariants must be considered. Hence, by considering four noncollinear consecutive vertices on the convex hull, four different triangles can be formed, and from these four triangles local absolute affine invariants can be derived as follows.

For a convex hull of a reference scene image or object with ordered vertices $P_1, P_2, \ldots, P_N$ (the vertices form a polygon), arbitrarily pick the $i$th quadruplet, formed from the four consecutive vertices $(P_i, P_{i+1}, P_{i+2}, P_{i+3})$, as shown in Fig. 2(a). Let $Q_1, Q_2, \ldots, Q_L$ be the vertices of the convex hull of the data set after the affine transformation and/or possible occlusion or addition, of which some vertices are transformed from the reference convex hull vertices. Assume the $j$th quadruplet $(Q_j, Q_{j+1}, Q_{j+2}, Q_{j+3})$ is an affine mapping image of the $i$th quadruplet $(P_i, P_{i+1}, P_{i+2}, P_{i+3})$, as shown in Fig. 2(b). There are four possible triangles formed by considering three vertices at a time from the $i$th quadruplet: $(P_i, P_{i+1}, P_{i+2})$, $(P_i, P_{i+1}, P_{i+3})$, $(P_i, P_{i+2}, P_{i+3})$, and $(P_{i+1}, P_{i+2}, P_{i+3})$, with areas $S_{i1}$, $S_{i2}$, $S_{i3}$, and $S_{i4}$, respectively. Their four counterpart triangles, formed by considering the corresponding $j$th quadruplet, are $(Q_j, Q_{j+1}, Q_{j+2})$, $(Q_j, Q_{j+1}, Q_{j+3})$, $(Q_j, Q_{j+2}, Q_{j+3})$, and $(Q_{j+1}, Q_{j+2}, Q_{j+3})$, with areas $S'_{j1}$, $S'_{j2}$, $S'_{j3}$, and $S'_{j4}$, respectively. Let $S_i$ and $S'_j$ be the quadrangle areas of the $i$th quadruplet and the $j$th quadruplet, respectively.


These areas are relative affine invariants. From the affine transformation's area invariance, we have

$$S'_{jk} = |\det \mathbf{A}|\, S_{ik}, \qquad k = 1, 2, 3, 4 \tag{4.1}$$

$$S'_j = |\det \mathbf{A}|\, S_i \tag{4.2}$$

where

$$|\det \mathbf{A}| = \frac{S'_{jk}}{S_{ik}} = \frac{S'_j}{S_i} \tag{4.3}$$

for all $i$, $j$, and $k$.

Fig. 3. Convex hull simulation data. (a) Original data. (b) After the affine transformation in (8.1). (c), (d) One noise realization at each of the two noise levels of Section VIII-A.

Note that $S_{ik}/S'_{jk}$ and $S_i/S'_j$ are also invariants, but they are just the reciprocals of those in (4.3) and do not give any new information; therefore, they are not used. By dividing (4.1) by (4.2), we obtain the following four local absolute affine invariant pairs:

$$\frac{S'_{jk}}{S'_j} = \frac{S_{ik}}{S_i}, \qquad k = 1, 2, 3, 4. \tag{4.4}$$

Local absolute affine invariants are thus obtained by eliminating the factor $|\det \mathbf{A}|$ with a combination of relative affine invariants (at least two relative affine invariants, i.e., a pair of triangle areas, or a triangle area and a quadrangle area) from (4.1) and (4.2). For every given quadruplet, there are two different kinds of basic absolute affine invariants, and each kind forms an invariant feature vector for the given quadruplet.

One possible set of absolute invariants (affine invariants of the first kind) is arrived at by considering the ratio of any pair of triangle areas from (4.1). There are six local absolute invariants in total, obtained by considering a pair of areas at a time. These are $S_{i1}/S_{i2}$, $S_{i1}/S_{i3}$, $S_{i1}/S_{i4}$, $S_{i2}/S_{i3}$, $S_{i2}/S_{i4}$, and $S_{i3}/S_{i4}$, with their counterparts being $S'_{j1}/S'_{j2}$, $S'_{j1}/S'_{j3}$, $S'_{j1}/S'_{j4}$, $S'_{j2}/S'_{j3}$, $S'_{j2}/S'_{j4}$, and $S'_{j3}/S'_{j4}$, respectively. Ideally,

$$\frac{S_{ik}}{S_{il}} = \frac{S'_{jk}}{S'_{jl}}, \qquad 1 \le k < l \le 4. \tag{4.5}$$

This set of invariants is denoted as the first invariant feature vector (FIFV); the two corresponding affine invariant sets form two feature vectors, $\mathbf{F}_i$ for the reference quadruplet and $\mathbf{F}'_j$ for the test quadruplet.

Another set of absolute invariants is obtained by taking the ratio of the area of any one of the four triangles to the area of the quadrangle, as in (4.4):

$$\frac{S_{ik}}{S_i}, \qquad k = 1, 2, 3, 4. \tag{4.6}$$

This set of invariants is denoted as the second invariant feature vector (SIFV); correspondingly, the two feature vectors are $\mathbf{G}_i$ and $\mathbf{G}'_j$, respectively. The set of invariants in (4.6) is not totally independent of the set in (4.5), and is shown (see Section VIII-A) to be outperformed by the set in (4.5) in the low-level noise case. This is due to the normalization of the areas by the total area, which weakens their discrimination ability. On the other hand, as the noise increases, the normalization factor reduces the noise contribution, and hence a slightly better performance is expected. As a consequence, in all the experiments, we elected to use the set in (4.5) for registration. Obviously, more consecutive vertices can be used to derive affine invariants; we can, for example, use five consecutive points to form pentagons, or six consecutive points to form hexagons.
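The construction above can be summarized in a short sketch (the function and variable names are ours, not the paper's). For each cyclic quadruplet of ordered hull vertices it computes the four triangle areas $S_{i1}, \ldots, S_{i4}$ and forms the six FIFV ratios of (4.5); the SIFV components of (4.6) are noted in a comment.

```python
import numpy as np

def tri_area(p, q, r):
    # Unsigned triangle area.
    return 0.5 * abs((q[0] - p[0]) * (r[1] - p[1])
                     - (q[1] - p[1]) * (r[0] - p[0]))

def quadruplet_invariants(hull):
    """One FIFV 6-vector per quadruplet of consecutive hull vertices
    (indices taken cyclically), following (4.5)."""
    n = len(hull)
    feats = []
    for i in range(n):
        P = [hull[(i + k) % n] for k in range(4)]   # P_i, ..., P_{i+3}
        S = [tri_area(P[0], P[1], P[2]),            # S_{i1}
             tri_area(P[0], P[1], P[3]),            # S_{i2}
             tri_area(P[0], P[2], P[3]),            # S_{i3}
             tri_area(P[1], P[2], P[3])]            # S_{i4}
        # FIFV: the six pairwise ratios S_{ik}/S_{il}, k < l, as in (4.5).
        feats.append([S[k] / S[l] for k in range(4) for l in range(k + 1, 4)])
        # The SIFV of (4.6) would instead be [s / (S[0] + S[2]) for s in S],
        # since the quadrangle splits into the triangles S_{i1} and S_{i3}.
    return np.array(feats)
```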


TABLE I MATCHING STATISTICS USING THE AFFINE INVARIANT FEATURES IN (4.5) VERSUS THE ONES IN (4.6) FOR DIFFERENT NOISE LEVELS AND DIFFERENT w

V. ESTABLISHING CORRESPONDING VERTICES ON THE CONVEX HULLS

Objects (represented here as a collection of feature points) that are very different in shape may have identical convex hulls. That poses a problem if we were to exclusively use the convex hull affine invariants for object classification (identification). This, however, does not present a problem if the convex hull affine invariant features are used for establishing the correspondence of vertices for the purpose of recovering the affine transformation and the mapping of the reference object into the test object domain (or vice versa). Classification is then done using all the feature points, not only those on the convex hull, which rules out identifying different classes of objects as the same even though their convex hulls are identical. The residual-error matching process based on the convex hull affine invariants, introduced shortly, is chosen with that in mind.

For each quadruplet of the convex hull, there is an affine invariant feature vector associated with it, and because the vertices of the convex hull are ordered, this induces an ordering of the feature vectors along the convex hull. In the absence of occlusion, overlap, or scene change, the number of invariant feature vectors associated with the convex hull of the affine-transformed scene will be the same as that of the original convex hull, and each invariant feature vector will have a counterpart. However, if the object is occluded (or overlapped), or the scene is partially changed, the number of feature points may change, and so might the convex hull. If any one or more of the vertices of a convex hull is (are) changed, the associated invariant feature vectors change accordingly. The correspondence problem under this scenario is to find the vertices (on the convex hull of the original image) that appear as vertices on the convex hull of the affine-transformed and possibly occluded image. To establish this correspondence, the following matching error measure is used:

$$\varepsilon(i, j) = \sum_{l=0}^{w-1} \left\| \mathbf{F}_{i+l} - \mathbf{F}'_{j+l} \right\|^2 \tag{5.1}$$

where $w$ is the number of consecutive feature vectors (i.e., consecutive vertices). The number $w$ balances relying on local shape, through the use of local invariants, to establish correspondence, versus relying on more regional shape. The smaller the $w$, the better the localization of the corresponding vertex pairs, but the worse the detection of such pairs. This is

a result of the fact that a single affine invariant feature vector is sometimes not reliable, as a consequence of its construction, which is based on an area measure with somewhat limited local shape discrimination ability, and/or because of the presence of noise. We have adopted the following simple matching decision rule:

$$(i^*, j^*) = \arg\min_{i, j} \varepsilon(i, j). \tag{5.2}$$

It establishes that the quadruplet $i^*$ from the reference image and the quadruplet $j^*$ from the test image are corresponding pairs. The subscript $i$ is taken modulo $N_c$ and the subscript $j$ modulo $L_c$, where $N_c$ is the number of vertices of the reference convex hull and $L_c$ is the number of vertices of the test convex hull. Since the convex hull vertices are ordered, the matching is basically a linear circular-shift search; this is a big advantage of having the invariants ordered. Based on the value of $w$ used, a declared match gives a set of four consecutive corresponding vertices in the case of $w = 1$, five consecutive corresponding vertices in the case of $w = 2$, and six consecutive corresponding vertices in the case of $w = 3$.

Remark: A more complicated but more noise-resilient approach to establishing correspondences is to consider all possible sets of three vertices on the convex hull (not necessarily consecutive) and pick the ones that result in error values below an allowable threshold. Such an approach would necessitate the construction of a hash table [9], [14], [25] to speed up the process of finding the corresponding vertices.
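A direct transcription of this circular-shift search might look as follows (a sketch with our own naming; it returns the minimizing quadruplet pair and its residual under (5.1) and (5.2)). A declared match at $(i^*, j^*)$ then yields $w + 3$ consecutive corresponding vertices, and the search costs $O(N_c L_c w)$ operations.

```python
import numpy as np

def match_quadruplets(F_ref, F_test, w=3):
    """Minimize the residual (5.1), summed over w consecutive invariant
    feature vectors with cyclic indexing, over all quadruplet pairs (5.2)."""
    N, L = len(F_ref), len(F_test)
    best_err, best_i, best_j = np.inf, -1, -1
    for i in range(N):       # reference quadruplet index (modulo N)
        for j in range(L):   # test quadruplet index (modulo L)
            err = sum(float(np.sum((F_ref[(i + l) % N]
                                    - F_test[(j + l) % L]) ** 2))
                      for l in range(w))
            if err < best_err:
                best_err, best_i, best_j = err, i, j
    return best_err, best_i, best_j
```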


Fig. 4. Image scenes with objects (parts) added or disappearing.


Fig. 5. Convex hulls of image scenes with objects (parts) added or disappearing.

TABLE II TRUE VERTEX CORRESPONDENCE BETWEEN THE CONVEX HULL OF THE REFERENCE IMAGE AND THAT OF THE TEST IMAGE (HIGHLIGHTED VERTICES INDICATE THE CORRESPONDING AFFINE INVARIANTS)

A. Affine Transformation Recovery

The affine transformation can be recovered from a pair of matched vertex triplets, or estimated from more matched vertices by using least-squares error (LSE) estimation. If there are $K > 3$ corresponding vertex pairs, say $P_k$ and $Q_k$, $k = 1, \ldots, K$, the affine transformation can be obtained as the solution of the LSE estimation, which minimizes the measure

$$\varepsilon(\mathbf{A}, \mathbf{b}) = \sum_{k=1}^{K} \left\| Q_k - (\mathbf{A} P_k + \mathbf{b}) \right\|^2 \tag{5.3}$$

with respect to the unknown parameters $(\mathbf{A}, \mathbf{b})$. The LSE solution is closed form.
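The minimization of (5.3) is an ordinary linear least-squares problem. A NumPy sketch of the closed-form solution (our formulation, not the paper's notation) stacks each matched vertex as a row $[x_k, y_k, 1]$ and solves for the six unknowns at once:

```python
import numpy as np

def recover_affine(P, Q):
    """Minimize (5.3) over A (2x2) and b (2,), given K >= 3 matched vertex
    pairs: P (K x 2) in one domain and Q (K x 2) in the other, with
    Q_k ~ A P_k + b."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    M = np.hstack([P, np.ones((len(P), 1))])      # rows [x_k, y_k, 1]
    X, *_ = np.linalg.lstsq(M, Q, rcond=None)     # solves M @ X ~= Q
    A, b = X[:2].T, X[2]                          # X stacks A^T over b^T
    return A, b
```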

VI. OBJECT CLASSIFICATION

Fig. 6. Convex hulls superimposed with the recovered affine transformation.

We are given a test scene that has originated from one of several different prototype objects (scenes) after a possible affine transformation and possible addition and/or subtraction of some of its parts. The goal here is to classify the test scene to one of the prototypes and to identify the corresponding parts. As stated in Section IV, objects that are very different in shape may have identical convex hulls. To overcome that problem, we adopt a classification scheme that is based on the feature point coordinates. This is accomplished by performing a "best" match between the sample data and each of the prototype objects, using the convex hull affine invariants to establish the corresponding vertices between the sample object and the presumed prototype object. From this correspondence, the "best" affine transformation is recovered as explained in Section V-A. From the recovered transformation, the convex hull vertices and all the feature points of the test data are mapped into the prototype image domain, and correspondences are declared based on a predetermined allowable distance deviation (a threshold) around each point of the prototype data set. This threshold is determined according to an allowable percentage (say 5%) in distance or length relative to the total object length (average distance between projected points on either the major or minor axis of the scattered data on the prototype scene).

Remark: Features other than point coordinates (e.g., length and angle [9], [25], [30]) can be added to the point matching process after undoing the affine transformation. However, this is problematic in the presence of an affine transformation,


Fig. 7. The images of the second experimental example, taken in a supermarket. (a) Reference image.

since neither length nor angle is directly preserved under affine transformations.
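The point-matching classifier of this section can be sketched as follows (a simplified version under our own assumptions; in particular, it presumes the recovered $(\mathbf{A}, \mathbf{b})$ maps the test domain into the prototype domain). It maps all test feature points and counts those landing within the distance threshold of some prototype point.

```python
import numpy as np

def count_matched_points(A, b, test_pts, proto_pts, tol):
    """Map test feature points into the prototype image domain and count
    those within distance tol of the nearest prototype feature point."""
    mapped = np.asarray(test_pts, float) @ A.T + b
    diff = mapped[:, None, :] - np.asarray(proto_pts, float)[None, :, :]
    dists = np.linalg.norm(diff, axis=2)       # all pairwise distances
    return int(np.sum(dists.min(axis=1) <= tol))
```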

VII. COMPLEXITY OF THE ALGORITHM

Consider the worst case, where all the feature points are vertices of the convex hulls. The total numbers of operations (comparisons, additions, and multiplications) needed are then $O(N \log N) + O(L \log L)$ for finding the convex hulls of both the reference and test images; $O(N) + O(L)$ for computing the affine invariants; $O(NLw)$ for finding the "best" match between the reference and test affine invariants; a constant number of operations for recovering the affine transformation from the $w + 3$ matched vertices; and $O(N^2 L)$ for registering all $L$ test feature points with the $N$ reference feature points. Hence, the total number of operations is of order $O(N^2 L)$.

VIII. EXPERIMENTS

We first experimentally test the robustness to noise of the sets of invariants in (4.5) and (4.6), and then present experimental results on real scenes.

A. Performance and Sensitivity of FIFV and SIFV Invariants

The experimental object data is the convex hull shown in Fig. 3. Fig. 3(a) is the original data, whereas Fig. 3(b) shows



Fig. 8. The segmented image regions of interest. (a) Reference image.

TABLE III TRUE VERTEX CORRESPONDENCE BETWEEN THE CONVEX HULL OF THE REFERENCE IMAGE AND THAT OF THE TEST IMAGE (HIGHLIGHTED VERTICES INDICATE THE CORRESPONDING AFFINE INVARIANTS)

the convex hull after applying the affine transformation (8.1). Using a random number generator with a uniform probability distribution, noise has been added to the affine-transformed convex hull. The noise level was given by two measures: the first is the average percentage change in the length size of the convex hull, whereas the second is the average percentage change of the areas of the quadruplets as a result of the noise addition. Let $\tilde{S}_{ik}$ and $S_{ik}$ be the areas of the $k$th triangles of the $i$th quadruplets with noise and without noise, respectively. Then, the noise level measure $\nu_s$ is defined by

$$\nu_s = \frac{1}{4N} \sum_{i=1}^{N} \sum_{k=1}^{4} \frac{\left| \tilde{S}_{ik} - S_{ik} \right|}{S_{ik}} \times 100\%. \tag{8.2}$$

We have considered two cases. In the first, noise was added such that there was on average a 3% change in the length size of the convex hull; this corresponded to an average of about 5% in the noise level $\nu_s$. The other case had an average of a 5% change in the length size of the convex hull, corresponding to an average of about 9% in the noise level $\nu_s$. In both cases we used fifteen different noise realizations. One realization for each noise level is shown in Fig. 3(c) and (d). Using the error measure in (5.1), Table I shows the matching statistics obtained using the set of invariants in (4.5) compared to the set in (4.6) for the fifteen different realizations, different $w$, and different noise levels. The entries in Table I are the average numbers, over the 15 realizations, of incorrectly matched quadruplets (NIM) with at least one wrongly matched vertex. From these statistics, the following observations are made. 1) The matching capability of either the affine invariants in (4.5) or those in (4.6) increases as $w$ increases. This is to be expected, since as $w$ increases we have more shape information and fewer localities. 2) The performance based on the invariants in (4.5) is superior to that based on (4.6) for the low-level noise case. This is due to the normalization of the areas by the total area, which weakens their discrimination ability. On the other hand, as the noise increases, the normalization factor reduces the noise contribution, and hence a slightly better performance is expected.

B. Experiments on Real Scenes

This algorithm has been tested on real scenes with objects removed and added, and performs well. The feature points


TABLE V FOUND VERTEX CORRESPONDENCE BETWEEN THE CONVEX HULL OF THE REFERENCE IMAGE AND THAT OF THE TEST IMAGES FOR w = 3


Fig. 9. The convex hulls of the Pepsi label images. (a) Reference image.

TABLE IV THE MINIMUM RESIDUAL ERROR OF THE FIRST KIND AFFINE INVARIANT FEATURE MATCH WITH w = 3

were the corner points. In the first experiment, we are given several image frames of parts taken in our laboratory, as shown in Fig. 4. Fig. 4(a) is the reference image, whereas Fig. 4(b) and (c) are the test image scenes for which the viewing positions are a priori unknown to the algorithm. In Fig. 5, the convex hulls are drawn with dashed lines. The task here is to identify the corresponding quadruplets and vertices on the convex hulls of the reference and the test images using the affine invariants. The true correspondences between the reference image and the test images are given in Table II. The second and the third rows list the vertices associated with test images (b) and (c) that correspond to the vertices associated with the reference image, respectively. In this table, a special symbol labels the vertices that have no counterpart in the reference image. Using the matching error function in (5.1) and the decision rule in (5.2), the following quadruplets were found to be corresponding pairs: 2 and 9 between the reference image and image (b); and 8 and 5, and 6 and 3, between the reference image and image (c), for the values of $w$ considered.

Fig. 10. The convex hulls superimposed with recovered affine transformations. (a) Reference image.

Based on the corresponding vertices, the affine transformations between the reference scene image and the two test images (b) and (c) were recovered.

Fig. 6(b) and (c) show the convex hulls that result from applying the recovered transformations to the convex hull of the reference image and superimposing them on their corresponding test image scene convex hulls, respectively. As shown in Fig. 6, we get a good fit of the convex hulls. In the second experiment, the pictures, as shown in Fig. 7, are taken in a supermarket, and the positions and viewing angles of the images are unknown. The perspective effects on some of these images are relatively pronounced. Regions of interest from the images in Fig. 7 are shown in Fig. 8.


The feature points are still the corners or cross points of the edges. The convex hulls of these images are shown in Fig. 9. The true vertex correspondence is shown in Table III. With $w = 3$, the minimum residual error and the correspondence of the first vertex are shown in Table IV. In Table V, all correctly matched vertices are listed for each test image. Based on the vertex correspondence obtained from the matched convex hull vertices (with $w = 3$), the affine transformations between the reference scene image and the five test images of Fig. 9(b)–(f) were recovered.

Fig. 10(b)–(f) show the convex hulls that result from applying the recovered transformations to the reference image convex hull while superimposing them on their corresponding test image scene convex hulls, respectively. The convex hulls are well recovered for test images 10(b)–(d). It is interesting to observe that the recovery was possible in spite of a strong speckle reflection on the upper right-hand side of the scene in Fig. 7(b). However, as shown in Fig. 10, parts of the convex hulls of Fig. 9(e) and (f) are not well recovered. This occurs at the parts of the convex hulls which are distant from the matched vertices and which are the closest to the viewer, where perspective effects are dominant. To reduce the recovery error caused by the strong perspective effect in the test convex hulls of Fig. 10(e) and (f), we followed an iterative procedure, where new correspondences, declared based on a predetermined allowable distance deviation (a threshold), are used to update the estimate of the affine transformation and to find new matching points that fall below the allowable threshold. This procedure is repeated until no more matching points that fall below the threshold are found. This iterative procedure led to refined estimates of the affine transformations.
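This refinement can be sketched as the following loop (our reading of the procedure, reusing recover_affine from the earlier sketch): alternately declare point correspondences within the distance threshold and re-estimate the transformation from them, stopping once the match set stops growing.

```python
import numpy as np

def refine_affine(A, b, test_pts, ref_pts, tol, max_iter=20):
    """Iteratively re-estimate (A, b) from the correspondences it induces,
    declared within distance tol, until no new matches are found."""
    test_pts = np.asarray(test_pts, float)
    ref_pts = np.asarray(ref_pts, float)
    n_prev = -1
    for _ in range(max_iter):
        mapped = test_pts @ A.T + b
        d = np.linalg.norm(mapped[:, None, :] - ref_pts[None, :, :], axis=2)
        nearest = d.argmin(axis=1)
        ok = d[np.arange(len(test_pts)), nearest] <= tol
        n_ok = int(ok.sum())
        if n_ok < 3 or n_ok <= n_prev:   # need >= 3 pairs; stop if no growth
            break
        n_prev = n_ok
        A, b = recover_affine(test_pts[ok], ref_pts[nearest[ok]])
    return A, b
```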

This updating is shown in Fig. 11. In the third example, the pictures, as shown in Fig. 12, are outdoor scenes of a house viewed from different and unknown positions and viewing angles. Unlike the previous two examples, where the objects were either flat or resided on a 2-D plane in three-dimensional (3-D) space, the feature points (the corners) on the house are nonplanar. This example is considered so as to examine the effect of 3-D structure on the 2-D affine invariant matching. The convex


Fig. 11. Superimposed convex hulls (e), right, and (f), left, using the refined affine transformations.

hulls are shown in Fig. 13. The task here is also to identify the correspondences of vertices of the convex hulls between the reference and the test images, which can be visually observed in Fig. 13, and to examine the effect of the 3-D structure on the 2-D affine invariant matching. Using $w = 3$ and the matching decision rule in (5.2), the matched vertices between the reference convex hull and that associated with each of the images of Fig. 13(b)–(d) are listed in Table VI, and the values of the minimum residual error are shown in Table VII. From Table VII, we observe that the minimum residual error between the image of Fig. 13(b) and the reference image is much larger than the other two. This is a result of the strong perspective effects encountered for the image of Fig. 13(b). Fig. 14(b)–(d) shows the convex hulls (the dashed lines) that result from applying the recovered transformations to the reference image convex hull while superimposing them on their corresponding test image scene convex hulls, respectively. The convex hulls are well recovered for the test images of Fig. 14(c) and (d), and less so for case (b).

In addition to these convex hull registration experiments, we also performed the following image scene classification experiment. We are given three classes of image scenes, and ten sample image scenes to be classified as originating from one of these three class scenes. The three image class scenes were taken as the parts image scene shown in Fig. 4(b), the Pepsi image scene shown in Fig. 8(b), and the house image shown in Fig. 12(b), whereas the sample image scenes are the ones shown in Fig. 4(b) and (c), Fig. 8(b)–(f), and Fig. 12(b)–(d). Two types of classifiers are considered for the experiment. For both classifiers, we first use the decision rule in (5.2) to establish the best matching vertices (six vertices when $w = 3$) between the assumed reference scene and the test scene, and compute the affine transformation based on that match. For the first classifier, we then compute the residual error in (5.1) between the affine invariant features at the matching convex hull vertices. These residual errors are displayed in Table VIII. We observe that the minimum residual errors for all of the test images point to the correct class reference images, and that these are an order of magnitude smaller than the ones associated with the wrong class. It is interesting to note that for the Pepsi image scene (d) and for the house image (b), the residual error under the correct class association is much higher than for the other image scenes under the correct class association. For the Pepsi image scene (d), this may be due to the strong scale effect that made the vertices of the convex hull very close to one another, hence affecting the sensitivity



Fig. 12. House scene images. (a) is the reference image.

TABLE VI MATCHED VERTICES BETWEEN THE CONVEX HULL OF THE REFERENCE IMAGE AND THAT OF THE TEST IMAGE


Fig. 13. Convex hulls of the house scene images of the third experimental example. (a) Reference image.

of the invariants in comparison to the other scenes. In spite of this, however, we observe that the error is ten times bigger under the wrong class association. For the house image (b), the minimum residual error, although pointing to the right class, is large due to the strong perspective effects encountered. The second type of classifier used in this experiment is based on the number of matched feature points. Based on the

TABLE VII THE MINIMUM RESIDUAL ERROR OF THE FIRST KIND AFFINE INVARIANT FEATURE MATCH (w = 3)

estimated affine transformations computed from the matched vertices, the feature points of the test images are mapped back into the reference domain. If, for a test image, $n_c$ is the total number of matched feature points under the assumption that the reference class is $c$, then we declare the test image as originating from class $c$ provided that $n_c \ge T_c$, where $T_c$ is the minimum number of matched feature points for class $c$. For the three different classes, the thresholds $T_c$ were set to 15, 30, and 8, respectively. Note that the predetermined allowable distance deviation for determining the point correspondence (see Section VI) was set to be ten pixels. The classification results are summarized in Table IX. We observe that the parts and Pepsi images are correctly classified, but two of the three house images [image (b) and


TABLE VIII CLASSIFICATION OF THREE CLASSES OF IMAGES USING THE AFFINE INVARIANT FEATURE CLASSIFIER IN (5.2) (w = 3)

TABLE IX CLASSIFICATION BASED ON THE CLASSIFIER (6.1)


Fig. 14. Superimposed convex hulls after undoing the affine transformations. (a) is the reference image.

image (c)] have no class association. As shown in Fig. 14(b) and (c), the recovered feature points are far removed from their counterparts on the reference image, due to the nonplanar nature of the object and the strong perspective effects.

IX. CONCLUSIONS

This paper uses a hybrid approach that utilizes affine invariants to establish the correspondences between the convex hull vertices of a test image and those of a reference image, in order to undo the affine transformation between them. This is followed by a point matching approach for recognition. The affine invariants are constructed from the areas of the triangles formed by connecting three of each set of four consecutive vertices (quadruplets) of the convex hull, and hence make direct use of the area invariance property associated with the affine transformation. The method exploits the attractive properties of the convex hull, namely, the affine invariance property, the local controllability property, and data reduction: the vertices of the convex hull form an ordered feature point subset that significantly reduces the data size. The exhaustive search that would otherwise be necessary for establishing correspondences reduces

to a simple linear circular-shift search when using the convex hull, as a result of the ordering of the affine invariants along the convex hull. In addition, because of the local nature of these invariants, the appearance and/or disappearance of parts of the scene can be very effectively dealt with. Objects (represented here as a collection of feature points) that are very different in shape may have identical convex hulls, which poses a problem if we were to exclusively use the convex hull affine invariants for object classification (identification). This, however, does not present a problem if the convex hull affine invariant features are used for establishing the correspondence of vertices for the purpose of recovering the affine transformation and the mapping of the reference object into the test object domain (or vice versa). Classification is then done using all the feature points, not only those on the convex hull, which rules out identifying different classes of objects as the same even though their convex hulls are identical. The experiments on simulations show that the method is robust to noise and fast. The method has also been tested on real indoor and outdoor images and performs well.

ACKNOWLEDGMENT

The authors wish to thank the reviewers for their valuable comments and suggestions, which improved the manuscript.

REFERENCES

[1] K. Arbter, W. E. Snyder, H. Burkhardt, and G. Hirzinger, "Application of affine-invariant Fourier descriptors to recognition of 3-D objects," IEEE Trans. Pattern Anal. Machine Intell., vol. 12, pp. 640–647, July 1990.
[2] H. L. Beus and S. S. H. Tiu, "An improved corner detection algorithm based on chain-coded plane curves," Pattern Recognit., vol. 20, pp. 291–296, 1987.
[3] A. Bykat, "Convex hull of a finite set of points in two dimensions," Inform. Process. Lett., vol. 7, pp. 296–298, 1978.
[4] D. Cyganski and J. A. Orr, "Applications of tensor theory to object recognition and orientation determination," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-7, pp. 662–673, 1985.
[5] M. S. Costa, R. M. Haralick, and L. G. Shapiro, "Optimal affine invariant pattern matching," in Proc. 10th Int. Conf. Pattern Recognition, 1990, vol. 1, pp. 233–236.
[6] H. Dirilten and T. G. Newman, "Pattern matching under affine transformations," IEEE Trans. Comput., vol. C-26, pp. 314–317, 1977.
[7] T. L. Faber and E. M. Stokely, "Orientation of 3-D structures in medical images," IEEE Trans. Pattern Anal. Machine Intell., vol. 10, pp. 626–633, May 1988.
[8] K.-S. Fu, Syntactic Methods in Pattern Recognition. New York: Academic, 1974.


[9] Hecker and R. M. Bolle, "Invariant feature matching in parameter space with application to line features," Res. Rep. RC 16447, IBM Res. Div., T. J. Watson Res. Ctr., Yorktown Heights, NY, 1991.
[10] M. K. Hu, "Visual pattern recognition by moment invariants," IRE Trans. Inform. Theory, vol. IT-8, pp. 179–187, Feb. 1962.
[11] D. P. Huttenlocher and S. Ullman, "Recognizing solid objects by alignment with an image," Int. J. Comput. Vis., vol. 5, pp. 195–212, 1990.
[12] B. Kartikeyan and A. Sarkar, "Shape description by time series," IEEE Trans. Pattern Anal. Machine Intell., vol. 11, pp. 977–984, 1989.
[13] Y. Lamdan, J. T. Schwartz, and H. J. Wolfson, "Affine invariant model-based object recognition," IEEE Trans. Robot. Automat., vol. 6, pp. 578–589, Oct. 1990.
[14] Y. Lamdan and H. J. Wolfson, "Geometric hashing: A general and efficient model-based recognition scheme," in Proc. 2nd Int. Conf. Comput. Vis., 1988, pp. 238–249.
[15] D. W. Thompson and J. L. Mundy, "Three-dimensional model matching from an unconstrained viewpoint," in Proc. IEEE Int. Conf. Robotics and Automation, Raleigh, NC, 1987, pp. 208–220.
[16] J. Mundy and A. Zisserman, Eds., Geometric Invariance in Machine Vision. Cambridge, MA: MIT Press, 1992.
[17] M. Overmars, "Computational geometry and its application to computer graphics," in Advances in Computer Graphics V, W. Purgathofer and J. Schönhut, Eds. Berlin, Germany: Springer-Verlag, 1989.
[18] P. S. Modenov and A. S. Parkhomenko, Geometric Transformations. New York: Academic, 1965, vol. 1.
[19] S. K. Nayar and R. M. Bolle, "Reflectance ratio: A photometric invariant for object recognition," in Proc. Int. Conf. Computer Vision, Berlin, Germany, May 1993.
[20] F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction. Berlin, Germany: Springer-Verlag, 1985.
[21] E. Person and K. S. Fu, "Shape discrimination using Fourier descriptors," IEEE Trans. Syst., Man, Cybern., vol. SMC-7, Mar. 1977.
[22] W. K. Pratt, "Correlation techniques of image registration," IEEE Trans. Aerosp. Electron. Syst., vol. AES-10, no. 3, May 1974.
[23] J. Lee, Y. Sun, and C. Chen, "Multiscale corner detection using wavelet transform," IEEE Trans. Image Processing, vol. 4, pp. 100–104, Jan. 1995.
[24] G. Taubin and D. B. Cooper, "Object recognition based on moment (or algebraic) invariants," in Geometric Invariance in Machine Vision, J. Mundy and A. Zisserman, Eds. Cambridge, MA: MIT Press, 1992, pp. 375–397.
[25] F. Tsai, "A probabilistic approach to geometric hashing using line features," Comput. Vis. Graph. Image Process., pp. 182–195, 1996.
[26] S. Ullman and R. Basri, "Recognition by linear combination of models," IEEE Trans. Pattern Anal. Machine Intell., vol. 13, pp. 992–1006, 1991.
[27] C. Zahn and R. Roskies, "Fourier descriptors for planar closed curves," IEEE Trans. Comput., vol. C-21, no. 3, Mar. 1972.
[28] Z. Huang and F. S. Cohen, "Affine-invariant B-spline moments for curve matching," IEEE Trans. Image Processing, vol. 5, pp. 1473–1480, Oct. 1996.
[29] F. S. Cohen, Z. Huang, and Z. Yang, "Invariant matching and identification of curves using B-splines curve representation," IEEE Trans. Image Processing, vol. 4, pp. 1–10, Jan. 1995.


[30] J. Ben Arie, “The probabilistic peaking effect of viewed angles and distances with application to 3-D object recognition,” Model Based Vision, H. Naser, Ed. Bellingham, WA: Optical Engineering, 1993, pp. 128–142.

Zhengwei Yang received the B.S. degree in electrical engineering from the Shanghai University of Science and Technology, Shanghai, China, in 1982, the M.S. degree in systems engineering from Shanghai Jiao Tong University, Shanghai, in 1985, and the Ph.D. degree in electrical engineering from Drexel University, Philadelphia, PA, in 1997. From 1985 to 1987, he was an Assistant Professor, and from 1987 to 1991, a Lecturer, at Tongji University, Shanghai. He was a Research Assistant at the Imaging and Computer Vision Center, Drexel University, Philadelphia, PA, from 1991 to 1997. In 1997, he joined KLA-Tencor Corporation, San Jose, CA. His research interests include computer vision, image processing and understanding, pattern recognition, medical imaging, and multimedia. Dr. Yang was the recipient of the Shanghai Science and Technology Advance Award in 1987 and 1988.

Fernand S. Cohen (SM’95) received the B.Sc. degree in physics from the American University, Cairo, Egypt, in 1978, and the M.Sc. and Ph.D. degrees in electrical engineering from Brown University, Providence, RI, in 1980 and 1983, respectively. He joined the Department of Electrical Engineering, University of Rhode Island (URI), Kingston, in 1983 as an Assistant Professor. In 1984, he joined the Robotics Research Center, URI, and was responsible for the vision research in the center from 1986 to 1987. In 1987, he joined the Department of Electrical and Computer Engineering at Drexel University as the George Beggs Associate Professor. In 1990, he became the Associate Director of the Imaging and Computer Vision Center. He is currently Full Professor in the University’s Department of Electrical and Computer Engineering, and is affiliated with the University’s School of Biomedical Engineering, Science, and Health Systems. In the summer of 1994, he was invited as a Visiting Professor by the National Institute of Research in Information and Automation (INRIA), Sophia Antipolis, France. His research interests include pattern recognition, computer vision, medical image processing, and applied stochastic processes. Dr. Cohen was the recipient of a Research Excellence Award from the College of Engineering, URI, in 1995. In 1986, the French government invited him to tour research laboratories and universities.