Recognition of Free-form Objects in Dense Range Data Using Local Features

Richard J. Campbell
Sharp Laboratories of America
Camas, WA 98607 USA
[email protected]

Patrick J. Flynn
CS&E Dept., Univ. of Notre Dame
Notre Dame, IN 46556 USA
[email protected]

Abstract

This article describes a system for recognizing free-form 3D objects in dense range data employing local features and object-centered geometric models. Local features are extracted from range images and object models using curvature analysis, and variability in feature size is accommodated by decomposition of features into sub-features. Shape indices and other attributes provide a basis for correspondence between compatible image and model features and sub-features, as well as pruning of invalid correspondences. A verification step provides a final ranking of object identity and pose hypotheses. The evaluation database contained 10 free-form objects, and the system was tested using 10 range images, each containing two objects from the database. Comments address strengths of the proposed technique as well as areas for future improvement.

1 Introduction

3D object recognition has been a topic of interest to researchers for years [2], and the use of free-form objects and dense range data in this context has been studied for about a decade [1, 10, 6, 5]. In this paper, we address the free-form object recognition problem using a system that employs dense range data, object-centered models (which, in our case, are constructed by merging multiple range views), local features, and feature correspondence employing geometric constraints. It is well recognized that correspondence-based recognition may lead to large search spaces in a feature-rich environment [8]. Hence, our correspondence-based system depends on constraints, which in turn require reasonable geometric feature estimates. As an alternative to segmentation (problematic with free-form objects), we employ networks of small region features and their attributes to obtain constraint predicates. Weak correspondence is used to limit the search for recognition hypotheses. The spatial distribution of highly curved object regions has been shown to be a reasonable discriminant, especially in multiple-feature combinations.

2 Feature Extraction

Recently, Srikantiah [12] used a homogeneity criterion based on curvature consistency to perform segmentation. A striking property of this work is that the segmentation is not exhaustive; there are surface regions that are effectively unlabeled and unused in recognition. Our work, while inspired by that of Srikantiah, employs specific criteria (large curvature magnitude) for the extraction of salient regions, with an understanding that these extraction results are likely to be imperfect. Salient regions are further divided into subregions to accommodate occlusion. These regions and subregions are augmented with a number of attributes and are used to prune the search for correspondences between image and model features.

2.1 Curvature

To find salient surface regions, we use surface curvature estimates. Flynn and Jain [7] studied the curvature estimation problem and concluded that analytical techniques (which fit a surface model to the sensed data and obtain curvatures from analytical derivatives) provide the most stable curvature estimates. Surface fitting provides estimates of the principal surface curvatures κ1(p) and κ2(p) in both the range images and in the polyhedral mesh CAD models that form our object database. A regression-based method is applied at each 3D point of interest p in the image or model, except those pixels near the silhouette edges. The shape index S and curvedness C defined by Koenderink [11] are calculated at p:

  S(p) = (2/π) arctan[(κ1(p) + κ2(p)) / (κ1(p) − κ2(p))]

  C(p) = sqrt[(κ1(p)^2 + κ2(p)^2) / 2]

Table 1 depicts surface type classifications based on a quantized shape index. Figure 1 shows the output of this curvature estimation technique on the cow polygonal mesh model.
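As an illustration, the shape index, the curvedness, and the Table 1 quantization can be sketched in Python (function and variable names are ours, not the paper's; the shape index is undefined for umbilic points where κ1 = κ2):

```python
import math

def shape_index(k1, k2):
    """Koenderink shape index S in [-1, 1]; assumes k1 != k2."""
    return (2.0 / math.pi) * math.atan((k1 + k2) / (k1 - k2))

def curvedness(k1, k2):
    """Koenderink curvedness C, a measure of curvature magnitude."""
    return math.sqrt((k1 * k1 + k2 * k2) / 2.0)

# Quantization of S into the seven type tags of Table 1.
BIN_EDGES = [-5/8, -3/8, -1/8, 1/8, 3/8, 5/8]
TYPE_NAMES = ["Convex Ellipsoid", "Convex Cylinder", "Weak Convex Saddle",
              "Saddle", "Weak Concave Saddle", "Concave Cylinder",
              "Concave Ellipsoid"]

def type_tag(s):
    """Map a shape index value to a surface type tag 0..6."""
    for tag, edge in enumerate(BIN_EDGES):
        if s < edge:
            return tag
    return 6
```

A planar patch (κ1 = κ2 = 0) has curvedness 0 and is never selected as salient, which matches the thresholding on C described in Section 2.2.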

1051-4651/02 $17.00 (c) 2002 IEEE

Figure 1. Cow object model processing. (a): Model. (b): Curvedness (brighter points indicate larger curvature magnitude). (c): Shape index (colors identify surface classes). (d): Classified regions. (e): Subregions.

2.2 Regions, Sub-regions, and Attributes

Vertices with curvedness C higher than a predetermined threshold Th are identified and their shape class is obtained (Table 1). Vertices in highly curved regions are sorted in decreasing order of C. Vertices whose shape class is convex or concave cylinder are removed due to their positional unreliability. Then, starting from the most highly curved vertex, connected regions of compatible surface class (convex or concave ellipsoid and saddle) are grown. To overcome some inaccuracy in shape index estimates, regions are allowed to overlap in class: convex ellipsoid regions are allowed to contain convex cylinder vertices, etc. Next, the regions are sorted by surface area in descending order, since the larger regions are more likely to appear in an image. Any region whose surface area does not exceed a predetermined area threshold is discarded as spurious. Figure 1(d) shows an example of this feature extraction algorithm applied to the toy cow CAD model.

The distribution and structure of the 'landmarks' extracted by the technique above provide a descriptive model for each object. Several statistics are calculated to summarize and prioritize the regions. Surface area is a key attribute, since large, highly curved regions tend to be highly salient. The area-normalized shape index and area-normalized curvedness are also calculated using sums weighted by the areas of polygons in a tessellation of the surface. These measures will be used in hypothesis generation to determine the similarity between regions in an effort to prune the search. To handle occlusion, we decompose extracted regions into smaller subregions using a simple region growing procedure with a size constraint. These subregions SE_j are ordered based on their surface area. Figure 1(e) shows the result of running the region decomposition algorithm on the regions in Figure 1(d). Additional attributes calculated for subregions allow correspondences between them to be pruned effectively. We calculate an area-normalized location L_j and orientation O_j for each subregion in a manner analogous to the normalized shape and curvedness measures discussed above.

3 Hypothesis Generation

The hypothesis generation procedure determines likely model pose estimates to explain the observed data in the range image. Our procedure for the generation of hypotheses uses the regions' shape indices SI and areas to determine region similarity. Region correspondences are filtered for compatibility. Sets of consistent sub-region pairs and triples are formed. The sub-region triples are used with a pose clustering technique to produce sets of compatible correspondences; these sets enumerate the hypotheses to be verified.

The hypothesis generation procedure contains the following steps. A set SC of candidate correspondences between range image and model regions is generated. 'Seed' sub-region pairs SP_i, one for each range image region, are generated. Sets PM of consistent matches to the seed pairs are obtained. Triples ST of image sub-regions are generated by extending the previously generated seed pairs. A set TM of matches between elements of ST and compatible triples of model sub-regions is obtained. The set TM is filtered for pose consistency, yielding a set PC, elements of which are to be verified using synthetic image rendering.

S, S(m), SE, and SE(m) denote the sets of range image regions, model regions (for model m), range sub-regions (for region i), and model sub-regions (for region j of model m), respectively. In the previous section we defined several properties for the regions and subregions. The region properties are S_i.C (the curvedness measure), S_i.SI (the shape index measure), and S_i.area (the region area). Comparable attributes for sub-regions are SE_a.C (the area-normalized curvedness), SE_a.SI (the area-normalized shape index), SE_a.area (the area of the sub-region), SE_a.L (the area-normalized location), and SE_a.O (the area-normalized orientation).

Initial correspondences are built between segments with compatible shape type tags (see Table 1).
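The type-compatibility pruning of initial correspondences can be sketched as follows (a minimal sketch; the dictionary encodes the "Compat. tags" column of Table 1, and the region representation is our assumption):

```python
# Compat. tags column of Table 1: for each image-region type tag,
# the model-region tags considered type-compatible.
COMPATIBLE = {
    0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2, 3}, 3: {2, 3, 4},
    4: {3, 4, 5}, 5: {4, 5, 6}, 6: {5, 6},
}

def candidate_match(image_region, model_region):
    """Keep a correspondence only if the type tags are compatible and the
    model region is no smaller than the (possibly occluded) scene region."""
    return (model_region["tag"] in COMPATIBLE[image_region["tag"]]
            and model_region["area"] >= image_region["area"])
```

Allowing adjacent tags to match absorbs the quantization noise near the bin boundaries of the shape index.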
In addition to filtering potential correspondences by type, we filter by area (ensuring that model regions are no smaller than corresponding scene regions). The set of initial matches for that model is given by SC_i(m), and the entire set of matches (involving all models) for image region S_i is SC.

  Tag | Type                | SI interval   | Compat. tags
  ----+---------------------+---------------+-------------
   0  | Convex Ellipsoid    | [-1, -5/8)    | 0,1
   1  | Convex Cylinder     | [-5/8, -3/8)  | 0,1,2
   2  | Weak Convex Saddle  | [-3/8, -1/8)  | 1,2,3
   3  | Saddle              | [-1/8, 1/8)   | 2,3,4
   4  | Weak Concave Saddle | [1/8, 3/8)    | 3,4,5
   5  | Concave Cylinder    | [3/8, 5/8)    | 4,5,6
   6  | Concave Ellipsoid   | [5/8, 1]      | 5,6

Table 1. Region surface type classification based on the shape index.

Once SC has been found, the pair SP_i of approximately matching subregions with largest 3D separation is obtained to define a baseline for pose estimation. This is done for every region to produce a set of seed pairs SP. For each such seed pair, we compute SP_i.D = ||SE_a(i).L − SE_b(i).L|| (the distance between the elements of the pair) and SP_i.θ = arccos(SE_a(i).O · SE_b(i).O), the angle between the orientations of the sub-regions in the pair.

The next step is to obtain candidate matches between image seed pairs SP_i and compatible seed pairs in model regions S_j(m) ∈ SC_i(m). Compatibility is assumed if the seed pairs under consideration have similar relative orientations and distances with respect to the underlying regions. This step yields a list of possible seed-to-model pair matches PM = {PM_i : i = 1, ..., Ns}, where PM_i is the set of model sub-region pairs that are compatible with the image seed pair SP_i.

Each sub-region pair defines a local coordinate system with basis vectors {SP_i.x, SP_i.y, SP_i.z}, where SP_i.x is the vector connecting the centroids of the two sub-regions in the pair, normalized to unit length; SP_i.z is the component of an area-weighted sum of the orientations of the seed pair's constituent sub-regions that is orthogonal to SP_i.x; and SP_i.y = SP_i.z × SP_i.x. The origin of this coordinate system is defined as the midpoint of the line connecting the sub-regions. This procedure is duplicated for each of the seed pairs SP_i ∈ SP and model pairs in PM.

After obtaining a candidate match between sub-region pairs in a single region in image and model, a third sub-region match involving a different region in image and model is obtained (this is often not possible for incorrect matches and is hence a powerful pruning constraint). This approach was inspired by the technique proposed by Chua and Jarvis [4]. The resulting set of all possible sub-region triple matches is denoted TM.
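The seed-pair local coordinate frame defined above can be sketched with plain vector arithmetic (a sketch under our assumptions; function names and the tuple-based representation are ours):

```python
import math

# Small 3-vector helpers over plain tuples.
def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])
def unit(a): return scale(a, 1.0 / math.sqrt(dot(a, a)))

def seed_pair_frame(loc_a, ori_a, area_a, loc_b, ori_b, area_b):
    """Local frame for a seed pair: x along the centroid baseline, z the part
    of the area-weighted orientation sum orthogonal to x, y completing the
    right-handed frame; origin at the midpoint of the baseline."""
    x = unit(sub(loc_b, loc_a))
    o = add(scale(ori_a, area_a), scale(ori_b, area_b))
    z = unit(sub(o, scale(x, dot(o, x))))   # remove the component along x
    y = cross(z, x)
    origin = scale(add(loc_a, loc_b), 0.5)
    return origin, x, y, z
```

The frame degenerates when the summed orientation is parallel to the baseline; in practice such seed pairs would be rejected before pose estimation.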
Each element in TM contains a possible pose hypothesis for an object in the range image. Pose clustering is used to determine larger sets of corresponding sub-regions. This yields sets PC_g(m) of consistent hypotheses that refer to the same model and that admit consistent pose estimates.

These hypotheses are ordered by the total surface area associated with the range image regions involved, and verification begins with the hypotheses that explain the most image area.

4 Verification

The hypothesis PC_g(m) may contain multiple sets of corresponding triples (ST_i,k, TM_j,l(m)), each of which contains a pose estimate. A combined pose estimate is found using Horn's closed-form quaternion solution [9] to the corresponding point set registration problem. A synthetic depth image of the model in the hypothesized pose is generated using a polygon rendering algorithm. Pixel-by-pixel comparison of the two images yields a match quality measure based on proximity; at present, we use a hard distance threshold to identify pixels that correspond between the two images. A novel aspect of our verification technique is its ability to trigger pose estimate updates. Sets of corresponding points in the original and synthetic images are used with the ICP registration algorithm [3] to refine the pose estimate. Then a final synthetic range image of the model is generated and compared with the range image of the scene, yielding a final matching score. If the score exceeds an acceptance threshold T_accept, the hypothesis is accepted. The matching process then repeats with those regions explained by previously accepted hypotheses removed, until no regions remain or no hypotheses survive the verification step.
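The pixel-by-pixel proximity test can be sketched as follows (a minimal sketch; depth images are represented as nested lists with None for unrendered or missing pixels, and the score definition is our assumption, not the paper's exact measure):

```python
def match_score(scene_depth, synth_depth, dist_thresh=2.0):
    """Fraction of rendered model pixels whose scene depth lies within a hard
    distance threshold -- the proximity test used during verification."""
    rendered = corresponding = 0
    for scene_row, model_row in zip(scene_depth, synth_depth):
        for s, m in zip(scene_row, model_row):
            if m is None:          # model not rendered at this pixel
                continue
            rendered += 1
            if s is not None and abs(s - m) <= dist_thresh:
                corresponding += 1
    return corresponding / rendered if rendered else 0.0
```

A hypothesis whose score exceeds the acceptance threshold would be accepted; the surviving corresponding pixel pairs are exactly the point correspondences that could feed the ICP refinement step.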

5 Experiments

We tested the proposed technique on a database of polygonal mesh models of toys acquired by a Minolta Vivid 700 range scanner. Figure 2 shows three models and lists the names of all ten objects. The Minolta sensor was also used to acquire the test scenes for the recognition system outlined in this paper.

Figure 2. Apple, cow, and duck-rattle models. Other models in the database are lamb, orange dino, Po, red dino, rubber duck, whale, and croc.

Figure 3 shows both the range image surfaces and the recognized models rendered in the same coordinate frame for three test range images. The range image surfaces are shaded light gray, while the recognized models are shaded dark gray.

Figure 3. Recognition and pose estimation results.

Our experiments showed that the proposed technique handles self-occlusion as well as occlusion from adjacent objects. Summary results for the ten two-object scenes (Scenes 1-10) appear in Table 2. These tests were run on a Micron Pentium II 450 MHz PC under the Windows NT 4.0 operating system. The speed of the system is competitive with current techniques for free-form object recognition: in the ten experiments described above, hypothesis generation time was between 2 and 22 seconds and verification time was between 2 and 39 seconds. Free-form features such as bumps, ridges, and valleys yielded the features that were most effective in matching. Errors committed by the system include false matches as well as poor pose estimates.

  Correct detections (pose error): Rubber Duck (0.96 mm), Whale (0.59 mm),
  Crocodile (2.85 mm), Cow (1.37 mm), Orange Dino (0.98 mm), Lamb (1.25 mm),
  Po (2.17 mm), Po (1.3 mm), Crocodile (2.04 mm), Cow (1.88 mm),
  Duck Rattle (1.62 mm), Red Dino (0.92 mm), Orange Dino (2.95 mm)
  False alarms: Duck Rattle, Apple, Whale, Rubber Duck
  Missed: Apple, Lamb, Red Dino, Whale, Lamb

Table 2. Recognition results for the ten two-object test scenes.

6 Conclusion

In this paper, we have proposed and demonstrated a new local feature based recognition technique for free-form objects. This method utilizes the surface structure in the highly curved regions of the objects to determine whether an object is present in an image and, if so, where it is located. The algorithm is built upon an initial feature extraction step to select and classify the highly curved regions of the images and object models. These large uniform surface type regions are divided into smaller sub-regions. The sub-regions then define the basic unit of surface used in the hypothesis generation stages of the method.

The experiments suggest improvements that can be made to the recognition system. The verification process is brittle and needs to be made more robust. A multi-scale approach could be used to improve the recognition of less highly curved objects.

Acknowledgments

This work has been supported by the National Science Foundation under grants IRI-9209212, IRI-9506414, IIS-9996004, CDA-9422044, and EIA-9818212, by the Department of Electrical Engineering at The Ohio State University, and by the School of Electrical Engineering and Computer Science at Washington State University.

References

[1] P. J. Besl. The Free-Form Surface Matching Problem. In H. Freeman, editor, Machine Vision for Three-Dimensional Scenes, pages 25-71. Academic Press, 1990.
[2] P. J. Besl and R. C. Jain. Three-Dimensional Object Recognition. ACM Computing Surveys, 17(1):75-145, March 1985.
[3] P. J. Besl and N. D. McKay. A Method for Registration of 3-D Shapes. IEEE Trans. Pattern Analysis and Machine Intelligence, 14(2):239-256, February 1992.
[4] C. S. Chua and R. Jarvis. 3D Free-Form Surface Registration and Object Recognition. International Journal of Computer Vision, 17:77-99, 1996.
[5] C. S. Chua and R. Jarvis. Point Signatures: A New Representation for 3D Object Recognition. International Journal of Computer Vision, 25(1):63-85, 1997.
[6] C. Dorai and A. K. Jain. COSMOS - A Representation Scheme for 3D Free-Form Objects. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(10):1115-1130, October 1997.
[7] P. J. Flynn and A. K. Jain. On Reliable Curvature Estimation. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 110-116, June 1989.
[8] W. E. L. Grimson. Object Recognition by Computer. MIT Press, 1990.
[9] B. K. P. Horn. Closed-Form Solution of Absolute Orientation Using Unit Quaternions. Journal of the Optical Society of America A, 4:629-642, April 1987.
[10] A. E. Johnson and M. Hebert. Surface Matching for Object Recognition in Complex Three-Dimensional Scenes. Image and Vision Computing, 16:635-651, 1998.
[11] J. Koenderink and A. van Doorn. Surface Shape and Curvature Scales. Image and Vision Computing, 10(8):557-565, 1992.
[12] R. Srikantiah. Multi-Scale Surface Segmentation and Description for Free Form Object Recognition. MS Thesis, The Ohio State University, 2000.
