3-D OBJECT REPRESENTATION USING PARAMETRIC GEONS
Kenong Wu
Martin D. Levine
TR-CIM-93-13
September 1993
Centre for Intelligent Machines McGill University Montreal, Quebec, Canada
Postal Address: 3480 University Street, Montreal, Quebec, Canada H3A 2A7 Telephone: (514) 398-8093 FAX: (514) 398-7348 Electronic Mail:
[email protected] [email protected]
Abstract

We propose parametric geons as a coarse description of object components for qualitative object recognition. Parametric geons are seven qualitative shape types defined by parameterized equations which control the size and degree of tapering and bending. Model recovery is performed by a procedure of model fitting and selection by minimizing an objective function measuring the similarities in both size and shape between models and objects. Multiple view data, parametric model constraints and global optimization are employed to obtain unique models and to compensate for noise and minor changes in object shape. This approach has been studied in experiments with both synthetic 3D data and actual range finder data of perfect and imperfect geon-like objects.
Keywords: computer vision, 3D object representation, object description, multi-view integration, global optimization, volumetric primitives, parametric models, geons.
Contents

1 Introduction
2 Parametric Geons
    2.1 Definition
    2.2 Formulation
    2.3 Characteristics of Parametric Geons
3 The Objective Function
    3.1 The Distance Measure
    3.2 The Normal Measure
    3.3 Biasing the Objective Function with Different Norms
4 Minimizing the Objective Function
5 Acquisition of Multiple View Data
6 Experiments
    6.1 Experiments with Synthetic 3D Data
    6.2 Experiments with Range Data of Geon-like Objects
    6.3 Experiments with Range Data of Imperfect Geon-like Objects
7 Discussion
8 Conclusion
References
List of Figures

1 Qualitative attributes of geons.
2 Parametric geons.
3 Tapering deformation.
4 Bending deformation.
5 Local surface geometry.
6 Composition of the objective function.
7 Metrics for tapered and curved primitives.
8 Possible models of an object with incomplete surface.
9 The L1 and L2 norms.
10 Shape of an objective function in terms of two rotation parameters.
11 Removing redundant data.
12 Computing parametric geons.
13 Rectangular region for estimating object size.
14 Multi-view integration.
15 Banana objects.
16 Values of distance measures (d1) and normal measures (d2) vs number of decrements in the objective function.
17 Objects for which deformation is not performed along the longest side.
1. Introduction

A major problem of interest in computer vision is the derivation of shape descriptions of three-dimensional (3D) objects from images obtained with laser range finders. This shape description usually consists of constrained geometric entities related to relevant knowledge about objects. For the task of object recognition, a computer vision system must match such a description against a set of models in a database. The shapes of these objects vary in many ways and image data are often contaminated by noise. However, the number of object models which a computer can consider is restricted from a practical point of view. It is often the case that object models are only idealized instances of objects and cannot completely depict all possible variations in object shape. Thus the role of object shape representation is to produce object shape approximations as descriptions which are consistent with a prescribed set of object models. In order to produce an appropriate description of 3D objects, the following questions arise:

• What information about the objects is characterized by the description? For recognition, is this description unique within a class of objects and distinctive among classes?
• Can a reliable technique be used to derive such a description from sensed data?

In this report, these questions are addressed by studying the characteristics of a new part model which approximates the shape of an object part using volumetric primitives. The problem of object part decomposition is a subject of further research.

Several kinds of volumetric primitives for modeling object parts have been used in computer vision. Examples are generalized cylinders [9], ellipsoid and cylinder models [19], geons [7], superquadrics [39], deformable superquadrics [49], hyperquadrics [24] and higher order polynomials [31]. These part models can be classified into two groups: qualitative models and parametric (quantitative) models. Qualitative models, such as geons, are interesting since they provide distinct shape characteristics which are useful for symbolic object recognition [5]. According to Biederman's "Recognition by Components" (RBC) theory [7], geons (short for "geometric ions") are 36 volumetric component shapes described in terms of four qualitative attributes of generalized cylinders (Figure 1). These properties can be readily detected by an analysis of relatively perfect 2D line drawings [3, 4, 5]. The putative component shapes are hypothesized to be simple, typically symmetrical, and lacking sharp concavities. The object components can be differentiated on the basis of perceptual properties manifested in a 2D image that are sometimes difficult to detect and are relatively independent of viewing position and degradation. Psychological experimentation has provided support for the descriptive power of geon-based descriptions [8]. The geon model has therefore been proposed as a basis for object recognition of 3D objects observed in single 2D views.

Figure 1: The original geons are defined by variations in three attributes of the cross-section ((i) curved vs. straight edges; (ii) constant vs. expanded vs. expanded and contracted size; (iii) mirror and rotational symmetry vs. mirror symmetry vs. asymmetrical) and one of the axis shape (curved vs. straight axis) [7].
This model has been previously used by several researchers to describe 3D objects [5, 36, 15, 26, 16, 42, 30]. A recent survey on geon-based object recognition is given in [35]. Most of the work on this subject has focused on the recovery of geon models from complete line drawings which depicted perfect geon-like objects. The problem of imperfect objects was not considered in these cases.

In many situations, however, "clean" or complete line drawings of objects cannot be obtained due to the color and texture of object surfaces or because of poor lighting conditions. Because of this, and also for practical considerations, some research has focused on data obtained from laser range finders. For example, Nguyen and Levine [36] have derived a simple edge-based method for computing the 3D geon description for range data. Raja and Jain [42] have obtained geons using surface features, i.e. principal curvatures, as well as the distribution of the angles between a part's principal axis and the normals on the side surfaces of that part. Both of these approaches have built their part descriptions in a bottom-up fashion. However, objects can appear in a variety of shapes but geons are simple and regular volumes. In some cases, the object features derived from these approaches may not satisfy the definition of any of the geon types. In addition, purely qualitative models which lack quantitative information may have limited usefulness. For example, consider the situation where the qualitative shape information is the same for two objects and discrimination must rely on quantitative information, such as the relative size of the object parts or the specific curvature of the part axis. Thus a mechanism of object shape approximation, global volumetric information, and parametrization of some attributes are highly desirable properties for geon recovery in many applications. These characteristics can be found in parametric models, such as superquadrics.
Superquadrics are a parameterized family of closed surfaces [22, 1]. The advantage of their use is that superquadrics require only two more parameters than simple ellipsoids to describe a much larger variety of geometric shapes. Superquadrics and their normals are defined parametrically as follows [1]:

\[
\vec{x}(\eta,\omega) =
\begin{bmatrix} x(\eta,\omega) \\ y(\eta,\omega) \\ z(\eta,\omega) \end{bmatrix} =
\begin{bmatrix}
a_1 \cos^{\epsilon_1}\eta \, \cos^{\epsilon_2}\omega \\
a_2 \cos^{\epsilon_1}\eta \, \sin^{\epsilon_2}\omega \\
a_3 \sin^{\epsilon_1}\eta
\end{bmatrix}
\tag{1}
\]

\[
\vec{n}(\eta,\omega) =
\begin{bmatrix} n_x(\eta,\omega) \\ n_y(\eta,\omega) \\ n_z(\eta,\omega) \end{bmatrix} =
\begin{bmatrix}
\frac{1}{a_1} \cos^{2-\epsilon_1}\eta \, \cos^{2-\epsilon_2}\omega \\
\frac{1}{a_2} \cos^{2-\epsilon_1}\eta \, \sin^{2-\epsilon_2}\omega \\
\frac{1}{a_3} \sin^{2-\epsilon_1}\eta
\end{bmatrix},
\qquad
-\frac{\pi}{2} \le \eta \le \frac{\pi}{2}, \quad -\pi \le \omega < \pi
\tag{2}
\]

Here $\eta$ is a north-south parameter, like latitude, and $\omega$ is an east-west parameter, like longitude. $\epsilon_1$ is the "squareness" parameter in the north-south direction; $\epsilon_2$ is the "squareness" parameter in the east-west direction. $a_1, a_2, a_3$ are scale parameters along the $x, y, z$ axes, respectively. Superquadrics can also be expressed in the form of an implicit equation as follows [1]:

\[
\left( \left(\frac{x}{a_1}\right)^{2/\epsilon_2} + \left(\frac{y}{a_2}\right)^{2/\epsilon_2} \right)^{\epsilon_2/\epsilon_1} + \left(\frac{z}{a_3}\right)^{2/\epsilon_1} = 1
\tag{3}
\]

Footnote: According to Barr [1], the correct term for "superquadrics" as used in the computer vision literature should be superellipsoids. Superellipsoids are a subset of superquadrics. In this report, the terms superellipsoid and superquadric are used interchangeably.
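As an illustrative check of the parametric and implicit forms above, the following Python sketch generates surface points from Equation (1) and evaluates the left-hand side of Equation (3) at those points. It is a minimal sketch and not the implementation used in this report: the function names, the argument names `eps1` and `eps2` for the two squareness parameters, and the signed-power convention for negative sines and cosines are assumptions made for illustration.

```python
import math


def spow(base, exponent):
    # Signed power sign(base) * |base|**exponent, so that fractional exponents
    # stay real for negative sines and cosines (an assumed convention).
    return math.copysign(abs(base) ** exponent, base)


def surface_point(eta, omega, a, eps1, eps2):
    # Parametric form, Equation (1); a = (a1, a2, a3) are the scale parameters.
    a1, a2, a3 = a
    ce, se = math.cos(eta), math.sin(eta)
    co, so = math.cos(omega), math.sin(omega)
    return (a1 * spow(ce, eps1) * spow(co, eps2),
            a2 * spow(ce, eps1) * spow(so, eps2),
            a3 * spow(se, eps1))


def implicit_value(p, a, eps1, eps2):
    # Left-hand side of Equation (3); it equals 1 for points on the surface.
    x, y, z = p
    a1, a2, a3 = a
    xy = abs(x / a1) ** (2.0 / eps2) + abs(y / a2) ** (2.0 / eps2)
    return xy ** (eps2 / eps1) + abs(z / a3) ** (2.0 / eps1)
```

Any point returned by `surface_point` should give a value of 1 (up to rounding) when passed to `implicit_value` with the same parameters, which is a convenient consistency test for both forms.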
Equations (1), (2) and (3) represent superquadrics at the center of a coordinate system without rotation about the coordinate axes. If a general transformation of superquadrics is considered, then three translation and three rotation parameters must be included in the defining equations. The superquadric model permits us to relax the overly rigid constraint on objects and deal with families of objects characterized by parameters. Since the number of data available is usually much larger than the number of parameters, by using superquadric models, an overconstrained estimate of parameters can be determined. During the model recovery procedure, such a model primitive can adapt to the object shape by "tuning" the model parameters. The shape constraints provided by the model through superquadric equations can be used to reduce the influence of image noise and minor variations in object shape. In this way, a volumetric description of objects can be obtained efficiently, thereby bypassing some common error-prone processing steps such as building point-by-point descriptions of lines and surfaces. Some work has already been done using superquadrics as volumetric fitting primitives [39, 40, 11, 45, 18, 53].

With regard to recognition, Raja and Jain have explored the recovery of geons from single-view range images by classifying the actual parameters of globally deformed superellipsoids [38]. It was found that the estimated parameters were extremely sensitive to viewpoint, noise and objects with coarse surfaces. The classification error was unacceptably large, up to 70% for certain textured objects. This result illustrates that the superquadric models, as employed by the authors, have very weak discriminative power. There are basically two reasons for their poor results. First, partially viewed objects cause uncertainties in the estimated parameters of the superquadric model [51]. Second, parameters in globally deformed superellipsoids tend to interact with each other in ways that make the model difficult to control. The extreme nonlinearity of the globally deformed superquadric equations makes the problem even more complicated. To obtain more unique models for the purpose of recognition, we suggest the use of multiple view data to provide more global object shape information. In addition, the interaction between parameters should also be reduced. By appropriately constraining the superellipsoid parameters, we can obtain models which approximate object parts and can be used to discriminate between them.

Of course, the general problem of volumetric shape approximation has been addressed by computer vision researchers for many years. For example, at INRIA, Faugeras and his colleagues have developed a method for computing a polyhedral approximation of a set of 3D points [17]. At USC, Rao and Nevatia have derived a linear straight homogeneous generalized cone description from sparse and imperfect 3D data [43]. Terzopoulos, Witkin and Kass obtained volumetric primitives by fitting a symmetry-seeking model [48]. Keren, Cooper and Subrahmonia at Brown University used fourth-degree implicit polynomials to represent volumetric objects [31]. It appears that the use of superquadrics to model volumetric objects has received more attention.

In this report we propose a new set of volumetric primitives, parametric geons, which combine the merits of both qualitative and parametric (quantitative) models. Parametric geons are described by seven distinct shape types and a few parameters which control size and global shape deformation. For the task of object recognition, both qualitative shape features and quantitative size and deformation parameters are important. The primitive recovery procedure presented in this report includes multiple view data acquisition, parametric geon fitting and model selection.
Multi-view data provide more shape information about objects and reduce the ambiguities caused by object self-occlusion. To fit models to the acquired data and select the best-fit model, we define an objective function characterized by the spatial distance between model and object surfaces and the difference between the normal vectors of the models and objects. This function is minimized using a stochastic global optimization approach. The selection of the model which best fits the data is based on the fitting residuals, rather than the model parameters. We demonstrate the discriminative power of parametric geons and show their robustness in model recovery using experiments with synthetic data without noise, range data of geon-like objects and range data of imperfect geon-like objects.
This report is organized as follows. In Section 2, we give the definition of parametric geons, elaborate their formulation and discuss their characteristics. In Section 3, we derive an objective function to recover the parametric geon models. In Section 4, we illustrate the model recovery procedure. Section 5 briefly discusses the approach of multiple view data acquisition. In Section 6, we present experimental results using both synthetic and real range data. In Section 7, we mention a few issues related to this work and conclusions are drawn in the last section.
2. Parametric Geons

In this section, we begin with the definition of parametric geons and then derive their mathematical formulation. Lastly, we illustrate certain properties of parametric geons and their relationship to Biederman's original geons [7].
2.1 Definition

Parametric geons are seven volumetric shapes which are derived from the superquadric equations (1), (2) and (3) by (i) specifying the shape parameters $\epsilon_1$ and $\epsilon_2$, and (ii) applying tapering and bending deformations. The two shape parameters control the degree of "roundness" or "squareness" of superquadrics in the north-south and east-west directions, respectively. Three of the parametric geons are derived from superquadrics as follows:

• $\epsilon_1 = 1$, $\epsilon_2 = 1$: Ellipsoid.
• $\epsilon_1 = 0.1$, $\epsilon_2 = 1$: Cylinder shape with elliptical cross section†.
• $\epsilon_1 = 0.1$, $\epsilon_2 = 0.1$: Cuboid.

These three regular forms are the basic shapes in our family of seven parametric geons. Next the global deformations of tapering and bending are applied to cylinders and cuboids, resulting in the additional four shapes listed below:

• Tapered cylinder,
• Tapered cuboid,
• Curved cylinder,
• Curved cuboid.

The seven shapes are illustrated in Figure 2.

Figure 2: The seven parametric geons.

Footnote: Superquadric shape changes smoothly with $\epsilon_1$ and $\epsilon_2$. We choose $\epsilon_1 = 0.1$ for a cylinder, based on computational robustness and the perceptual acceptance of its shape. The same reasoning applies to the cuboid.
† For simplicity, we refer to this primitive as a cylinder.
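2.2 Formulation

(i) Implicit Equations of the Three Basic Shapes

Given $\epsilon_1 = \epsilon_2 = 1$ in (3), the equation of an ellipsoid is

\[
\left(\frac{x}{a_1}\right)^{2} + \left(\frac{y}{a_2}\right)^{2} + \left(\frac{z}{a_3}\right)^{2} = 1
\tag{4}
\]

Given $\epsilon_1 = 0.1$, $\epsilon_2 = 1$, the equation of a cylinder is given by

\[
\left( \left(\frac{x}{a_1}\right)^{2} + \left(\frac{y}{a_2}\right)^{2} \right)^{10} + \left(\frac{z}{a_3}\right)^{20} = 1
\tag{5}
\]

Given $\epsilon_1 = \epsilon_2 = 0.1$, the equation of a cuboid is

\[
\left(\frac{x}{a_1}\right)^{20} + \left(\frac{y}{a_2}\right)^{20} + \left(\frac{z}{a_3}\right)^{20} = 1
\tag{6}
\]

Figure 3: Tapering deformation. (a) Downward tapering along the z axis; (b) invalid tapering deformation with $K_x, K_y > 1$.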
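To make the difference between the three basic shapes concrete, the sketch below evaluates the left-hand sides of the ellipsoid, cylinder and cuboid equations at a test point and reports whether the point lies inside, on, or outside each model. The function names, the tolerance and the test point are illustrative assumptions and are not taken from the implementation used in this report.

```python
def ellipsoid_value(p, a):
    # Left-hand side of the ellipsoid equation (4).
    return sum((c / s) ** 2 for c, s in zip(p, a))


def cylinder_value(p, a):
    # Left-hand side of the cylinder equation (5).
    (x, y, z), (a1, a2, a3) = p, a
    return ((x / a1) ** 2 + (y / a2) ** 2) ** 10 + (z / a3) ** 20


def cuboid_value(p, a):
    # Left-hand side of the cuboid equation (6).
    return sum((c / s) ** 20 for c, s in zip(p, a))


def classify(value, tol=1e-9):
    # A value below 1 means inside the model, above 1 outside, 1 on the surface.
    if value < 1.0 - tol:
        return "inside"
    if value > 1.0 + tol:
        return "outside"
    return "on surface"


if __name__ == "__main__":
    a = (2.0, 3.0, 5.0)
    p = (1.8, 2.7, 4.5)  # 90% of the way towards the corner (a1, a2, a3)
    for name, f in [("ellipsoid", ellipsoid_value),
                    ("cylinder", cylinder_value),
                    ("cuboid", cuboid_value)]:
        print(name, classify(f(p, a)))
```

For the same scale parameters, a point near a corner of the bounding box is inside the cuboid but outside the ellipsoid and the cylinder, which is the kind of shape difference the fitting residuals are later expected to reflect.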
(ii) Implicit Equations of Tapered Shapes

Two assumptions are made regarding the tapering deformation: (1) tapering deformation is performed along the z axis; (2) the tapering rate is linear with respect to z. Although this linearity assumption is sometimes violated for real objects, our model is only designed to approximate tapered object parts. Based on these assumptions, tapering deformation is given by

\[
\begin{cases}
X = \left( \dfrac{K_x}{a_3} z + 1 \right) x \\[2mm]
Y = \left( \dfrac{K_y}{a_3} z + 1 \right) y
\end{cases}
\tag{7}
\]

where $X$ and $Y$ are the transformed coordinates of the primitives after tapering is applied to the coordinates $x$ and $y$; the $z$ coordinate is unchanged ($Z = z$). $K_x$, $K_y$ are tapering parameters in the x and y coordinates. To permit downward tapering only and avoid invalid tapering (see Figure 3), we impose the constraints $0 \le K_x \le 1$ and $0 \le K_y \le 1$. Upward tapering can be accomplished by a rotation operation. By solving Equation (7) for $x$ and $y$ and substituting the result into Equations (5) and (6) respectively, we obtain implicit equations for a tapered cylinder and cuboid as follows:

\[
\left( \left( \frac{X}{a_1 \left( \frac{K_x}{a_3} Z + 1 \right)} \right)^{2} + \left( \frac{Y}{a_2 \left( \frac{K_y}{a_3} Z + 1 \right)} \right)^{2} \right)^{10} + \left( \frac{Z}{a_3} \right)^{20} = 1
\tag{8}
\]

\[
\left( \frac{X}{a_1 \left( \frac{K_x}{a_3} Z + 1 \right)} \right)^{20} + \left( \frac{Y}{a_2 \left( \frac{K_y}{a_3} Z + 1 \right)} \right)^{20} + \left( \frac{Z}{a_3} \right)^{20} = 1
\tag{9}
\]

Figure 4: Bending deformation in the xz plane. Axis y is perpendicular to this plane, projecting into the paper. The shaded area delimits the original primitive. The thick line depicts the curved primitive. $O$ is the center of bending curvature and $\theta$ is the bending angle. Point $(x_0, z_0)$ is transformed into coordinate $(X_0, Z_0)$ by the bending operation.
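The tapering deformation and the resulting implicit equation for a tapered cuboid can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, `kx` and `ky` stand for the two tapering parameters, and the inverse simply divides by the tapering factors, which is only valid while those factors are non-zero (they vanish at z = -a3 when a tapering parameter equals 1).

```python
def taper(p, a3, kx, ky):
    # Forward tapering: scale x and y linearly with z; z itself is unchanged.
    x, y, z = p
    return ((kx * z / a3 + 1.0) * x, (ky * z / a3 + 1.0) * y, z)


def untaper(P, a3, kx, ky):
    # Inverse tapering: divide by the same factors (assumed non-zero).
    X, Y, Z = P
    return (X / (kx * Z / a3 + 1.0), Y / (ky * Z / a3 + 1.0), Z)


def tapered_cuboid_value(P, a, kx, ky):
    # Left-hand side of the tapered cuboid equation: undo the tapering,
    # then evaluate the regular cuboid equation.
    a1, a2, a3 = a
    x, y, z = untaper(P, a3, kx, ky)
    return (x / a1) ** 20 + (y / a2) ** 20 + (z / a3) ** 20
```

A point on the surface of the regular cuboid, once passed through `taper`, should give a value of 1 in `tapered_cuboid_value`; the same inverse-then-evaluate pattern applies to the tapered cylinder.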
(iii) Implicit Equations of Curved Shapes

We use a simple bending operation which corresponds to a circular section, as shown in Figure 4. The reason for choosing such a simple bending deformation is that only one parameter, the curvature $\kappa$ of the circular section, is needed to describe the bending feature. Although many curved object parts do not have constant curvature, we can still amply approximate curved object parts using this qualitative shape model. The bending operation is applied along the z axis in the positive x direction. There is no torsional deformation. The bending operation transforms vectors $(x, y, z)$ into vectors $(X, Y, Z)$. The equations describing the bending deformation are given by (see Figure 4):

\[
\begin{cases}
X = \kappa^{-1} - \cos\theta \, (\kappa^{-1} - x) \\
Y = y \\
Z = (\kappa^{-1} - x) \sin\theta
\end{cases}
\tag{10}
\]

Here $\theta = z\kappa$ is the bending angle. The inverse transformation is given by

\[
\begin{cases}
x = \kappa^{-1} - \sqrt{Z^2 + (\kappa^{-1} - X)^2} \\
y = Y \\
z = \kappa^{-1}\theta = \kappa^{-1} \arctan \dfrac{Z}{\kappa^{-1} - X}
\end{cases}
\tag{11}
\]

The equations for curved cylinders and cuboids, as given in (12) and (13), can be obtained by substituting Equation (11) into Equations (5) and (6):

\[
\left( \left( \frac{\kappa^{-1} - \sqrt{Z^2 + (\kappa^{-1} - X)^2}}{a_1} \right)^{2} + \left( \frac{Y}{a_2} \right)^{2} \right)^{10} + \left( \frac{\kappa^{-1} \arctan \frac{Z}{\kappa^{-1} - X}}{a_3} \right)^{20} = 1
\tag{12}
\]

\[
\left( \frac{\kappa^{-1} - \sqrt{Z^2 + (\kappa^{-1} - X)^2}}{a_1} \right)^{20} + \left( \frac{Y}{a_2} \right)^{20} + \left( \frac{\kappa^{-1} \arctan \frac{Z}{\kappa^{-1} - X}}{a_3} \right)^{20} = 1
\tag{13}
\]
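The bending transformation and its inverse can be checked numerically with a short sketch such as the one below. The function names are hypothetical, the curvature argument `kappa` is assumed to be strictly positive, and `math.atan2` is used in place of the arctangent of the ratio, with which it agrees whenever the denominator (the inverse curvature minus X) is positive.

```python
import math


def bend(p, kappa):
    # Forward bending about a circular section of curvature kappa > 0.
    # theta = z * kappa is the bending angle.
    x, y, z = p
    theta = z * kappa
    r = 1.0 / kappa - x  # distance of the point from the centre of curvature
    return (1.0 / kappa - math.cos(theta) * r, y, r * math.sin(theta))


def unbend(P, kappa):
    # Inverse bending: recover the radius and the angle, then map back.
    X, Y, Z = P
    dx = 1.0 / kappa - X
    theta = math.atan2(Z, dx)
    return (1.0 / kappa - math.hypot(Z, dx), Y, theta / kappa)
```

Applying `unbend` to the output of `bend` should return the original point, and points on the undeformed z axis should land on a circle of radius 1/kappa, which is a quick way to confirm the geometry of the circular section.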
(iv) Computation of Normals for Parametric Geons

Normal vectors at a point on the surface of the parametric geons can be computed either from (i) the implicit equation (3) or (ii) the parametric equation (2) for the three regular shapes. Let an implicit equation of a parametric geon be defined as a mapping $M$ such that

\[
M : g(\vec{x}, \vec{a}) = 0
\]

where $\vec{x} = \{x, y, z\}^T$ defines the surface points and $\vec{a}$ is a parameter vector. Then the gradient vector

\[
\vec{n}_m = \left\{ \frac{\partial g(\vec{x}, \vec{a})}{\partial x}, \; \frac{\partial g(\vec{x}, \vec{a})}{\partial y}, \; \frac{\partial g(\vec{x}, \vec{a})}{\partial z} \right\}
\tag{14}
\]

defines the normal vector to a parametric geon. An alternative approach to computing normals for deformed primitives is to apply a transformation to the normal vectors of the three regular shapes. If tapering or bending is expressed by the equation

\[
\vec{X} = \vec{F}(\vec{x})
\tag{15}
\]

where $\vec{X}$ is the transformed point of $\vec{x}$, computation of the normals of deformed primitives requires using the inverse transpose of the Jacobian matrix of the deformation function as follows [2]:

\[
\vec{n}_m^{\vec{X}} = B \, \vec{n}_m^{\vec{x}}
\tag{16}
\]

where $B = (\det J) J^{-1T}$ and $J$ denotes the Jacobian matrix whose $i$th column is the partial derivative of $\vec{F}(\vec{x})$ with respect to the $i$th component of $\vec{x}$:

\[
J(\vec{x}) = \left\{ \frac{\partial \vec{F}(\vec{x})}{\partial x}, \; \frac{\partial \vec{F}(\vec{x})}{\partial y}, \; \frac{\partial \vec{F}(\vec{x})}{\partial z} \right\}
\tag{17}
\]

The determinant of $J$ is a scalar and can be ignored because only the direction of the normals is important. The normal transformation matrix for tapered primitives can be obtained by applying Equation (17) to (7); written with the scalar factor $\det J$ retained, it is

\[
(\det J) J^{-1T} =
\begin{pmatrix}
\frac{K_y}{a_3} z + 1 & 0 & 0 \\
0 & \frac{K_x}{a_3} z + 1 & 0 \\
-\left( \frac{K_y}{a_3} z + 1 \right) \frac{K_x}{a_3} x & -\left( \frac{K_x}{a_3} z + 1 \right) \frac{K_y}{a_3} y & \left( \frac{K_x}{a_3} z + 1 \right) \left( \frac{K_y}{a_3} z + 1 \right)
\end{pmatrix}
\tag{18}
\]

The normal transformation matrix for curved primitives can be obtained in the same form by applying Equation (17) to (10):

\[
(\det J) J^{-1T} =
\begin{pmatrix}
\kappa (\kappa^{-1} - x) \cos\theta & 0 & \sin\theta \\
0 & \kappa (\kappa^{-1} - x) & 0 \\
-\kappa (\kappa^{-1} - x) \sin\theta & 0 & \cos\theta
\end{pmatrix}
\tag{19}
\]

By knowing the normal vectors for the regular primitives, one can multiply them by either (18) or (19) to obtain the normal vectors for tapered and curved primitives, respectively. A more detailed discussion of the global deformation of solid shapes can be found in [2, 45].
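As a sketch of how this normal transformation might be used for a tapered cuboid, the Python code below computes the gradient of the regular cuboid equation and multiplies it by the tapering matrix, with the determinant factor kept so that no divisions are needed. The function names and the plain-tuple matrix representation are assumptions for illustration; the code is not the implementation used in this report.

```python
import math


def cuboid_gradient(p, a):
    # Gradient of the regular cuboid equation at p = (x, y, z): an
    # (unnormalized) normal of the level surface through p.
    return tuple(20.0 * (c / s) ** 19 / s for c, s in zip(p, a))


def taper_normal_matrix(p, a3, kx, ky):
    # det(J) times the inverse transpose of the tapering Jacobian,
    # evaluated at the undeformed point p.
    x, y, z = p
    fx = kx * z / a3 + 1.0
    fy = ky * z / a3 + 1.0
    return ((fy, 0.0, 0.0),
            (0.0, fx, 0.0),
            (-fy * kx / a3 * x, -fx * ky / a3 * y, fx * fy))


def mat_vec(m, v):
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def tapered_normal(p, a, kx, ky):
    # Unit normal of the tapered cuboid at the image of the undeformed point p.
    b = taper_normal_matrix(p, a[2], kx, ky)
    return unit(mat_vec(b, cuboid_gradient(p, a)))
```

One way to gain confidence in such a routine is to compare its output with a finite-difference gradient of the tapered implicit equation at the deformed point; the two directions should agree up to numerical error.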
2.3 Characteristics of Parametric Geons

Parametric geons as volumetric primitives carry information about the spatial distribution and surface orientation of 3D solid shapes. This volumetric and surface information, as defined by the parametric geon equations, effectively restricts the solutions of the model recovery procedure to shapes which are consistent with the shapes of the parametric geons. In other words, the global shape models are used as constraints in the recovery procedure to reduce the effect of noise and minor shape texture. The actual technique used to recover the object models will be presented in detail later.

The seven specific parametric geon shape types were chosen because they are regular, simple and symmetrical volumes. Most of them are commonly described, without ambiguity, by a simple geometric term such as ellipsoid, cuboid, cylinder, cone and pyramid. These shapes are also consistent with the basic forms used by the more traditional methods of 3D object representation such as sculpture [54].

The differences between the properties of parametric geons and those of Biederman's geons are given in Table 1.

ATTRIBUTES                  PARAMETRIC GEONS              GEONS
cross sectional shape       symmetrical                   symmetrical, asymmetrical
cross sectional size        constant, expanding           constant, expanding, expanding & contracting
combination of properties   either tapering or bending    both tapering and bending

Table 1: Difference of qualitative properties between parametric geons and Biederman's original geons.

Certain qualitative properties of parametric geons are simplified in comparison to Biederman's original geons. For example, asymmetrical cross section is not used in defining parametric geons since the description of geometric asymmetry requires more information than symmetry. Formulating asymmetrical primitives also requires more parameters, which may lead to model nonuniqueness. The assumption that all parametric geons are symmetrical with respect to their major axes is adopted in accordance with the known tendency in human perception toward phenomenal simplicity and regularity [25]. Symmetrical primitives have also been employed in the variations of the original geons discussed by other researchers [42, 14]. We do not allow the cross-sectional size to expand and contract along the axis since the resulting shape could also be represented by two tapered shapes with the same cross-section. Furthermore, we do not permit tapering and bending to occur simultaneously. This greatly simplifies the attributes of the parametric primitives and, in turn, avoids interaction between tapering and bending parameters. This restriction, applied to either tapering or bending of primitives, was also invoked by Dickinson et al. [14]. Some surface properties of parametric geons are illustrated below.
This information characterizes the local geometrical properties of objects. It is the surface that one actually sees and from which one derives clues that may be related to an object's global structure. For example, a cube is only composed of planar surfaces; a sphere is only composed of curved surfaces; a cylinder has both planar and curved surfaces. The determination of simple volumetric shapes is often based on surface features [19, 42, 34]. The surface properties can be characterized in terms of surface curvatures, as illustrated in Figure 5. In this illustration, $T_P(M)$ is the tangent plane of a surface $M$ in $E^3$ at a point $P$. $U(P)$ is the normal vector of $M$ at $P$ and $u$ is a particular tangent vector to $M$ at $P$. A normal plane $S$, containing $U(P)$ and $u$, intersects the surface $M$, resulting in a curve (or normal section) $b$ which is a function of $u$. The curvature of $b$ is referred to as the normal curvature $k(u)$ associated with the direction $u$. If $k(u) > 0$, the normal section $b$ bends toward $U(P)$. The maximum and minimum values of the normal curvature $k(u)$ of $M$ at $P$ are called the principal curvatures of $M$ at $P$ and are denoted by $k_1$ and $k_2$, respectively [37]. Information about the principal curvatures can also be expressed in terms of the Gaussian curvature $K$ and mean curvature $H$: $K = k_1 k_2$ and $H = (k_1 + k_2)/2$. $K$ and $H$ are known to be invariant to changes in translation and rotation of object surfaces [6].

Figure 5: Local surface geometry.

Table 2 shows the curvature signs associated with each of the defined parametric geons. For example, a cylinder has $K = 0$ and $k_1 = 0$ for all surface points. Also, $H < 0$, $k_2 < 0$ for the side face of the cylinder, and $H = 0$, $k_2 = 0$ for the top and bottom surfaces. Note that there is no difference in the curvature signs between the regular and tapered primitives. This demonstrates that curvature sign information has a restricted potential for primitive discrimination for a subset of the parametric geons. Hence, in order to distinguish the regular and tapered primitives, specific additional volumetric properties must be considered.

type               K          H          k1         k2
ellipsoid          +          -          -          -
cylinder           0          -, 0       0          -, 0
cuboid             0          0          0          0
tapered cylinder   0          -, 0       0          -, 0
tapered cuboid     0          0          0          0
curved cylinder    -, 0, +    -, 0, +    -, 0, +    -, 0
curved cuboid      0          -, 0, +    0, +       -, 0

Table 2: Parametric geons and the signs of their Gaussian, mean and principal curvatures. We assume that the normals are pointing toward the outside of the primitives.
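The sign patterns in such a table follow directly from the definitions of the Gaussian and mean curvature. The short sketch below, with a hypothetical function name and an assumed tolerance for deciding when a value counts as zero, maps a pair of principal curvatures to the corresponding sign labels under the outward-normal convention.

```python
def curvature_signs(k1, k2, tol=1e-9):
    # k1 and k2 are the maximum and minimum normal curvatures at a point.
    gaussian = k1 * k2          # K = k1 * k2
    mean = (k1 + k2) / 2.0      # H = (k1 + k2) / 2

    def sign(v):
        if v > tol:
            return "+"
        if v < -tol:
            return "-"
        return "0"

    return {"K": sign(gaussian), "H": sign(mean), "k1": sign(k1), "k2": sign(k2)}
```

For instance, with outward normals a sphere of radius r has both principal curvatures equal to -1/r, the side of a circular cylinder of radius r has curvatures 0 and -1/r, and a planar face has both equal to zero; passing these pairs to `curvature_signs` gives the ellipsoid, cylinder-side and planar sign patterns respectively.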
3. The Objective Function

In our study, the objective function was chosen to serve two purposes. First, all the parametric geons are fitted to the object range data by minimizing the objective function. Second, the best model is selected according to the minimum residual. The objective functions studied previously by several researchers were strictly concerned with the first purpose [12, 45, 51, 53], using a maximum likelihood estimator ($L_2$ norm). To accomplish the second purpose, the values of the objective function must correctly reflect the difference in size and shape between the object data and the parametric models. When a model and an object are close to being the same shape, the objective function should produce a small residual value. When a model is fitted to another class of objects, this same objective function should give large residual values. Since the contribution made to fitting residuals by shape variations between different types of objects is much more than that by sensor noise, we can perform shape discrimination using the final residuals. Our objective function consists of two terms expressed as follows:

\[
E = d_1 + \lambda d_2
\tag{20}
\]

The first term, $d_1$, measures the distance between the object surface and the model surface; the second term, $d_2$, measures the difference between the object and model normals. When the model and object pose are the same, the intuitive interpretation of these two terms corresponds to size and shape similarity, respectively.

Figure 6: Defining the objective function. $\vec{n}_m$ and $\vec{n}_d$ are the model and data surface normals, respectively. $O$ is the origin of the model. $A$ is the distance between a particular data point and the center of the model. $\vec{x}_s$ is a point on the model surface. $\theta_i$ is the angle between the model and object surface normals.
3.1 The Distance Measure

The first term of the objective function is given by

\[
d_1 = \frac{1}{N} \sum_{i=1}^{N} \left| e(D_i, \vec{a}) \right|
\tag{21}
\]

Here $N$ is the number of data points, $\{D_i \in R^3, \; i = 1, \ldots, N\}$ is the set of data points in a model frame and $\vec{a}$ is the vector of model parameters. For the three regular primitives (ellipsoid, cylinder and cuboid), $e(D_i, \vec{a})$ is defined as the Euclidean distance from a data point to the model surface along a line passing through the origin $O$ of the model and the data point [51] (see Figure 6). Let $\vec{x}_s = l D_i$ where $l$ is a scalar and $\vec{x}_s$ is the model surface point on the line joining $D_i$ and $O$. $A$ is the distance from $D_i$ to $O$. Substituting $\vec{x}_s$ into Equations (4), (5) and (6), we obtain

\[
e(D_i, \vec{a}) = A \left( 1 - \frac{1}{[f(D_i, \vec{a})]^{1/p}} \right)
\tag{22}
\]

where

\[
p = \begin{cases} 2 & \text{for the ellipsoid;} \\ 20 & \text{for the cylinder and cuboid.} \end{cases}
\]

This closed form follows because the left-hand side of each of Equations (4), (5) and (6) is homogeneous of degree $p$ in $(x, y, z)$: $f(l D_i, \vec{a}) = l^p f(D_i, \vec{a}) = 1$ gives $l = [f(D_i, \vec{a})]^{-1/p}$, and the distance along the line is $A(1 - l)$. Here $f(D_i, \vec{a})$ is the implicit function for a parametric geon evaluated at the data point. For instance, the implicit function for an ellipsoid is (see Equation (4))

\[
f(D_i, \vec{a}) = \left(\frac{x}{a_1}\right)^{2} + \left(\frac{y}{a_2}\right)^{2} + \left(\frac{z}{a_3}\right)^{2}, \qquad i = 1, \ldots, N
\tag{23}
\]

where $(x, y, z)$ are the coordinates of $D_i$.

For tapered and curved primitives, the computation of $e(D_i, \vec{a})$ can be formulated as follows. Let $e(D_i, \vec{a}) = \delta$ and let

\[
g(\vec{x}_s, \vec{a}) = 0
\tag{24}
\]

be the implicit equation of the model. We can also write (see Figure 6)

\[
\vec{x}_s = \frac{A - \delta}{A} D_i .
\tag{25}
\]

Substituting Equation (25) into Equation (24) we get

\[
g\!\left( \frac{A - \delta}{A} D_i, \vec{a} \right) = 0
\tag{26}
\]

The problem is as follows: find the minimum value of $\delta \ge 0$ which satisfies Equation (26). Since tapering or bending significantly complicates the implicit primitive equation (26), we cannot obtain a closed-form solution for $\delta$ or $e(D_i, \vec{a})$, as was done in Equation (22). Thus an iterative method would be required to obtain a value of $\delta$, the Euclidean distance from a data point $D_i$ to the model surface along a line passing through $D_i$ and center $O$.

Since objective function evaluation is the largest computational component of the model recovery procedure, for the sake of simplicity, we compute an approximate distance measure for the tapered and curved models. No iteration is required. First, we apply an inverse tapering transformation (see Equation (7)) or bending transformation (see Equation (11)) to both the data and the model in order to obtain the transformed data $D_i'$; this gives either a regular cuboid or regular cylinder. Second, we use Equation (22) to compute the distance from the transformed data point $D_i'$ to the transformed model surface along a line passing through $D_i'$ and the model origin $O$. We interpret $e(D_i', \vec{a})$ as the approximation of the distance along a line from $D_i$ to the model surface. Figures 7 (a) and (b) show $e(D_i', \vec{a})$ for tapered and curved models, respectively. Although this approximation creates a small error in the distance measure, it tremendously speeds up computation of the distance and normal measures.
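For the three regular primitives, the closed-form radial distance and the averaged distance measure can be sketched as follows. This is a minimal illustration rather than the implementation used in this report: the function names and the string shape labels are assumptions, data points are assumed to be already expressed in the model frame, and a data point exactly at the model origin is not handled.

```python
import math


def implicit_f(p, a, shape):
    # Left-hand side of the implicit equation of a regular primitive.
    x, y, z = p
    a1, a2, a3 = a
    if shape == "ellipsoid":
        return (x / a1) ** 2 + (y / a2) ** 2 + (z / a3) ** 2
    if shape == "cylinder":
        return ((x / a1) ** 2 + (y / a2) ** 2) ** 10 + (z / a3) ** 20
    if shape == "cuboid":
        return (x / a1) ** 20 + (y / a2) ** 20 + (z / a3) ** 20
    raise ValueError("unknown shape: " + shape)


def radial_distance(p, a, shape):
    # Signed distance from p to the model surface along the line through the
    # model origin: positive outside the model, negative inside.
    big_a = math.sqrt(sum(c * c for c in p))       # distance from p to the origin
    exponent = 2.0 if shape == "ellipsoid" else 20.0
    return big_a * (1.0 - implicit_f(p, a, shape) ** (-1.0 / exponent))


def distance_measure(points, a, shape):
    # Mean absolute radial distance over all data points.
    return sum(abs(radial_distance(p, a, shape)) for p in points) / len(points)
```

For tapered and curved models, the approximation described above would amount to applying the inverse deformation to each data point first and then calling `radial_distance` with the regular cylinder or cuboid.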
[Figure 7, panels (a) and (b). Labels: data point $D_i$, surface point $\vec{x}_s$, origin $O$, actual distance, computed distance $e(D_i, \vec{a})$.]

Figure 7: The right cylinders in (a) and (b) are obtained by applying inverse tapering and bending transformations to the left tapered cylinder and curved cylinder, respectively. $e(D_i', \vec{a})$ is the Euclidean distance along a line $O D_i'$ in the inverse transformed case.
3.2 The Normal Measure

We define the second term ($d_2$) of the objective function by measuring the squared difference between the surface normal vectors $\vec{n}_d$ of objects and the surface normal vectors $\vec{n}_m$ of models at each corresponding position, defined in the same way as in the first term (see Figure 6):

$$ d_2 = \frac{1}{N} \sum_{i=1}^{N} e_n(i) \qquad (27) $$

Here $N$ is the number of data points and

$$ e_n(i) = \| \vec{n}_d(i) - \vec{n}_m(i) \|^2. \qquad (28) $$
The $\vec{n}_d$ are unit normal vectors on object surfaces, computed from range image data. The $\vec{n}_m$ are the unit normal vectors on the model surfaces, computed based on the methods described in Section 2.2. In (20), $\mu$ is a factor which makes the second term adapt to the size of parametric geons. It is defined as $\mu = (a_x + a_y + a_z)/3$, where $a_x$, $a_y$ and $a_z$ are the model size parameters. The first term of the objective function is expressed in millimeters, whereas $d_2$ is an average of differences of unit normals; when multiplied by $\mu$, $d_2$ has the same units as the first term. This factor also forces the selection of the model with the smaller size if the object data are fit equally well by models with different parameter sets. This case happens when the data on the bottom surface of an object cannot be obtained. Figure 8 demonstrates that the overestimated model and the best model can produce the same residual value of $d_2$. When $d_2$ is multiplied by $\mu$, the model having a smaller $\mu$ gives a smaller residual than the overestimated model. However, the size of the model is prevented from being arbitrarily small, since the value of the objective function increases if the size of the model is smaller than the object size.

[Figure 8. Labels: object, supporting plane, invisible surface, underestimated model, best model, overestimated model.]

Figure 8: Since data on the bottom surface are not available due to occlusion, two models - the best model and the overestimated model - could be fit to the data equally well without employing $\mu$. If $\mu$ is used, the underestimated model causes both terms in the objective function to be large, while the overestimated model causes the second term to be large. The best model results in the smallest residual value of the objective function.

In (20), $\lambda$ is a weighting constant, controlling the contribution of the second term to the objective function. It is selected by the user based on assumptions about the shape similarity between objects and their models and on the smoothness of the object surfaces. When the shape types of an object and a model being fit are different, the second term should have a relatively large value. On the other hand, if the object surfaces are very coarse, the measurement of surface orientation by local computation of surface normals is not reliable. Therefore, the contribution of the second term should be greatly reduced by making $\lambda$ small. Here we give some "hints" for selecting $\lambda$. If $\vec{n}_d$ and $\vec{n}_m$ are unit normal vectors of objects and models, and $\theta_i$ is the angle between them (see Figure 6), then expanding Equation (28) gives $\|\vec{n}_d\|^2 + \|\vec{n}_m\|^2 - 2\,\vec{n}_d \cdot \vec{n}_m$, so the relationship between $e_n(i)$ and $\theta_i$ is

$$ e_n(i) = 2 - 2\cos\theta_i \qquad (29) $$
$d_2$ is then expressed by

$$ d_2 = \frac{1}{N} \sum_{i=1}^{N} e_n(i) = 2 - 2\eta \qquad (30) $$

Here $\theta_i$ is the angle between $\vec{n}_d(i)$ and $\vec{n}_m(i)$, and $\eta = (1/N) \sum_{i=1}^{N} \cos\theta_i$ is the average value of $\cos\theta_i$. If the shapes of an object and its model are perfectly matched, $\theta_i = 0$, $\eta = 1$ and $d_2 = 0$. If their shapes are not exactly the same, $\theta_i \neq 0$ for some $i$ and then $\eta < 1$, $d_2 \neq 0$. Therefore, we can approximately relate the shape similarity between objects and models to $\eta$. The greater the value of $\eta$, the more similar their shapes. The second term in the objective function is weighted by the weighting constant $\lambda$ as follows:

$$ \lambda d_2 = \lambda (2 - 2\eta) \qquad (31) $$
In Equation (31), the left side is the contribution of the average of normal differences to the second term of the objective function. The right side is the average of normal differences weighted by $\lambda$. The problem is how to select $\lambda$. Instead of choosing $\lambda$ arbitrarily, we need to know, based on some heuristics and for the case where the shapes of objects and models are significantly different, (i) how much contribution the second term should make and (ii) the value of $\eta$.

For the first problem, when two shapes are different we should make the second term significantly large. However, the contribution made by the second term cannot be too large, because the first term would then be ignored. We determine the contribution based on the fact that if the minimum value of the first term is equal to the average "radius" of the model, these two particular shapes are significantly different. Applying this to the second term we get $\mu \lambda d_2 = (a_x + a_y + a_z)/3$ and therefore, since $\mu = (a_x + a_y + a_z)/3$, $\lambda d_2 = 1$. According to Equation (31), $\lambda$ can be determined by

$$ \lambda = \frac{1}{2 - 2\eta} \qquad (32) $$

Based on this equation, one can select $\lambda$ for a given $\eta$, under the contribution specified above.

The second problem is generally very difficult to deal with. Since we use seven distinctive shapes, we may think that the shapes of different parametric geons are very different. However, this constraint is still not sufficient to determine $\eta$, because $\eta$ varies for different pairs of parametric geons and even varies for two instances of the same type of parametric geon with different parameter settings. For some very special cases, we can numerically compute the value of $\eta$, resulting in $\eta = 0.9$ for a sphere and a cylinder and $\eta = 0.79$ for a sphere and a cube, where the diameter of the sphere and the cylinder, the height of the cylinder, and the length of the cube are set equal to one. This means that, for example, if we select $\eta = 0.9$, we get $\lambda = 5$ from Equation (32). Thus, the second term equals $(a_x + a_y + a_z)/3$ when the object is a sphere and the model is a cylinder.
Equation (29) can also be used to examine the sensitivity of $e_n(i)$ to minor changes in the angle $\theta_i$ between the object and model normals. The derivative of $e_n$ with respect to $\theta_i$ is

$$ e_n'(i) = 2\sin\theta_i \qquad (33) $$

When $\theta_i$ is small, $e_n'(i)$ is small. Equation (33) implies that the normal measure is not sensitive to minor changes of normal angles.

In summary, we use the average of normal differences as a similarity measure between objects and models. In order to make an appropriate contribution to the objective function, the average of normal differences is weighted by two parameters: $\mu$, which gives the second term the same units as the first term, and $\lambda$, which adjusts the contribution to the objective function. Although there are no general rules for selecting $\lambda$, we use a heuristic on the difference between two particular shapes of parametric geons to guide our choice. Finally, we show that the sensitivity of the squared difference of normals with respect to the angle between two normals is very small when the angle is small. If noise or minor changes in object shape cause small changes in normals, the objective function is not sensitive to these changes.
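The relations in Equations (27) to (33) can be checked numerically with a short sketch. The function names and the sample normals below are hypothetical; the symbols follow the text, with `eta` standing for the average cosine and `lam` for the weighting constant.

```python
import numpy as np

def normal_measure(n_data, n_model):
    """Equations (27)-(28): mean squared difference between unit normals."""
    diff = n_data - n_model
    return np.mean(np.sum(diff ** 2, axis=1))

def weight_from_average_cosine(eta):
    """Equation (32): weighting constant for a given average cosine eta < 1."""
    return 1.0 / (2.0 - 2.0 * eta)

# Hypothetical normals: the model normal is +z everywhere, and the data normals
# are tilted away from it by 0, 10 and 20 degrees.
theta = np.deg2rad([0.0, 10.0, 20.0])
n_model = np.tile([0.0, 0.0, 1.0], (3, 1))
n_data = np.stack([np.sin(theta), np.zeros(3), np.cos(theta)], axis=1)

d2 = normal_measure(n_data, n_model)
eta = np.mean(np.cos(theta))             # average cosine, as in Equation (30)
lam = weight_from_average_cosine(0.9)    # sphere/cylinder case quoted in the text
sensitivity = 2.0 * np.sin(theta)        # Equation (33), small for small angles
```

With these inputs `d2` agrees with `2 - 2 * eta`, and `lam` reproduces the value of 5 quoted for an average cosine of 0.9.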
3.3 Biasing the Objective Function with Different Norms

We use an L1 norm (see Equation (21)) and an L2 norm (see Equation (28)) to measure the difference in distance and in orientation, respectively. Both the L1 norm and the L2 norm can be viewed as special cases of the norms of M-estimators found in robust statistics [23]. An L1 norm is denoted by $\rho(x) = |x|$ and an L2 norm by $\rho(x) = x^2$. Figure 9 (a) and (b) show an L2 and an L1 norm, respectively. Figure 9 (c) and (d) show the first derivatives of the L2 and L1 norm, respectively. As illustrated in Figure 9 (c), the sensitivity of an L2 norm increases gradually with the size of the residual. In other words, the method is insensitive to small values in the objective function and becomes sensitive to outliers (samples with values far from the local trend). In contrast, the sensitivity of an L1 norm is equal for all scales of residual values, as shown in Figure 9 (d). When the shape types of objects and models are not the same, or the object pose is very different from that of the model, the data far from the model surface can be viewed as outliers. Thus, using an L1 norm makes a much smaller contribution to the first term in the objective function than using an L2 norm. In this case, the value of the second term - in the form of an L2 norm - is very large and dominates the objective function. With the objective function defined in such a way, we form an efficient model recovery procedure, which will be discussed in Section 7.

[Figure 9, panels (a) to (d). Panel labels: (a) $\rho_1(x) = x^2$; (b) $\rho_2(x) = |x|$; (c) derivative of $\rho_1(x)$; (d) derivative of $\rho_2(x)$. Horizontal axis: $x$.]

Figure 9: (a) The L2 norm of x. (b) The L1 norm of x. (c) The first derivative of the L2 norm. (d) The first derivative of the L1 norm.
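The different behavior of the two norms toward outliers can be illustrated with a few hypothetical residual values. This is a minimal sketch; the residuals are invented for illustration and are not taken from the experiments.

```python
import numpy as np

def rho_l1(x):
    return np.abs(x)

def rho_l2(x):
    return x ** 2

def drho_l1(x):
    return np.sign(x)      # constant magnitude for every nonzero residual

def drho_l2(x):
    return 2.0 * x         # grows linearly with the residual

# Hypothetical residuals: three small values and one outlier.
residuals = np.array([0.1, -0.2, 0.1, 10.0])

l1_total = rho_l1(residuals).sum()   # the outlier contributes 10
l2_total = rho_l2(residuals).sum()   # the outlier contributes 100
```

The single outlier adds 10 to the L1 total but 100 to the L2 total, which is the sense in which the L1 distance term stays comparatively small when the model is badly aligned.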
4. Minimizing the Objective Function

The procedure of parametric geon recovery is a search in the space of parameters $\vec{a}$ for the values which minimize the objective function $E$ in (20). This function has a few deep and many shallow local minima, as shown in Figure 10. The surface represents logarithm values of an objective function in terms of two rotation parameters. The actual parameter space has between nine and eleven dimensions. The deep local minima are caused by an inappropriate orientation of the model. The shallow minima are caused by noise and minor changes in object shapes.

Figure 10: The objective function surface as a function of two rotation parameters.

In order to obtain the best fit of a model to an object, we need to find the model parameters corresponding to the global minimum of the objective function. To accomplish this goal, we employ a stochastic optimization technique, Very Fast Simulated Re-annealing (VFSR) [27]; the advanced version of this program is called Adaptive Simulated Annealing (ASA) [28]. Motivated by an analogy to the statistical mechanics of annealing in solids, the simulated annealing (SA) technique uses a "temperature cooling" operation for non-physical optimization problems, thereby transforming a poor solution into a highly optimized, desirable solution [32]. To solve a global optimization problem, SA applies a random perturbation to an arbitrarily selected solution and then evaluates the resulting change $\Delta E$ in the objective function. If the value of the objective function decreases ($\Delta E < 0$), the new solution is accepted as the starting point for the next move. However, if $\Delta E > 0$, the move is accepted with probability $P(\mathrm{accept}) = e^{-\Delta E / T}$. $T$ is a parameter, commonly called "temperature", which controls the state generation and state acceptance. Thus an improbable move towards a state with a higher value of the objective function can be accepted occasionally. This uphill move allows the optimization procedure to escape from local minima. By successively lowering the temperature while performing the above operations, fewer uphill moves are allowed, and the likelihood that the solution approaches a global minimum increases. Finally, when the temperature reaches zero, only downhill moves are permitted. The salient feature of this approach is that it statistically delivers a globally optimal solution.

As an improved version of SA, VFSR permits an annealing schedule for $T$ which decreases exponentially in annealing-time $k$, much faster than traditional (Boltzmann) annealing, where the annealing schedule decreases logarithmically. The re-annealing property permits adaptation to changing sensitivities in the multi-dimensional parameter space. Recent research [29] has also shown that VFSR is orders of magnitude more efficient than a standard genetic algorithm [44], another popular contender for global optimization. Using VFSR, we hope to reliably obtain parameters which describe the best fit between models and data based on our objective function.
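The acceptance rule and cooling loop described above can be sketched as follows. This is a generic one-dimensional simulated annealing loop, not VFSR itself: the geometric schedule, the uniform perturbation, the parameter defaults and the test function `bumpy` are all assumptions made for illustration, and re-annealing is not modeled.

```python
import math
import random

def accept_probability(delta_e, temperature):
    """Metropolis rule: downhill moves always pass, uphill moves pass with exp(-dE/T)."""
    if delta_e < 0:
        return 1.0
    if temperature <= 0:
        return 0.0          # at zero temperature only downhill moves are allowed
    return math.exp(-delta_e / temperature)

def simulated_annealing(objective, x0, step=0.5, t0=1.0, decay=0.99,
                        iterations=2000, seed=0):
    """Minimize a 1-D objective with an assumed geometric schedule T_k = t0 * decay**k."""
    rng = random.Random(seed)
    x, e = x0, objective(x0)
    best_x, best_e = x, e
    for k in range(iterations):
        temperature = t0 * decay ** k
        candidate = x + rng.uniform(-step, step)    # random perturbation
        candidate_e = objective(candidate)
        if rng.random() < accept_probability(candidate_e - e, temperature):
            x, e = candidate, candidate_e
            if e < best_e:                          # remember the best state seen
                best_x, best_e = x, e
    return best_x, best_e

def bumpy(x):
    """Hypothetical test function with several local minima."""
    return x * x + 2.0 * math.sin(5.0 * x) + 2.0

best_x, best_e = simulated_annealing(bumpy, x0=3.0)
```

Because the loop keeps the best state seen so far, the returned value can never be worse than the starting point, whatever the random sequence.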
5. Acquisition of Multiple View Data

Multi-view data are important for stably obtaining a unique parametric model of an object for recognition. Raja and Jain demonstrated that geon recovery from superquadrics using single view data is highly dependent on the viewpoint [38]. Whaite and Ferrie studied a quantitative measure of uncertainty in estimating superquadric parameters from single view data [51]. This measure is proportional to the thickness of the shell of uncertainty, defined to be a region of 3D space that encloses the surface of the fit model and in which there is a given probability that the true surface of the model lies. Some researchers have shown that acquisition of additional data from other views can significantly improve superquadric model estimation [11, 52].

Multi-view data are produced by a three-step procedure, called multiple view integration [20, 46, 41, 50, 13, 10]. In the first step - data acquisition - range data from different views are collected as viewer-centered data descriptions specified in each camera coordinate system. In the second step - view registration - a transformation between a camera coordinate system and the world coordinate system is calculated. In the last step - view integration - range data in each camera coordinate system are transformed into the world coordinate system and, usually, the redundant data seen in more than one view are removed. In this section, we present a simple and straightforward method for view integration. (Current research in our laboratory has resulted in a method which permits us to "patch" together any number of randomly taken, partially overlapped views [10].)

Acquisition of multi-view range images is accomplished with a laser range finder which scans objects supported by a turntable. The registration among images taken from different views is obtained by a calibration which relates the turntable coordinates to the world coordinates. View integration is performed by using the view transformation, surface normals and residuals of the normal computation as follows:

1. Scan images from view m = 1, 2, ..., M.
2. For each view, compute the normals and the angles between normals and visual lines.
3. FOR each view m,
     FOR each data point D(i,j) in the image,
       FOR each successive view n = m+1, m+2, ..., M,
         1. Transform D(i,j) onto this view n.
         2. IF the data D(i,j) is redundant with data in this view,
            THEN mark this D(i,j) and the corresponding data with RD (ReDundant).
            ELSE mark this data NR (Not Redundant).
       END for each successive view.
       IF data are redundant,
         THEN select the best data in a specific view according to its normal and visual angle. Then mark this best data with NR.
     END for each data.
   END for each view.
4. Convert data marked with NR in all views onto a common world coordinate system.

The principle of selecting the best data is shown in Figure 11. Point A, which generates a data point D1 on the image plane in view1, can be mathematically projected onto the image planes in both view2 and view3, giving data points D2 and D3. If there exist data points at D2 and D3 on the original image planes, we can compute the world coordinates for these two points based on camera calibration. If the position of a point in the world coordinate system spatially overlaps point A, we mark this data point in its image plane as a redundant one. (Due to errors in the estimated transformation parameters, multi-view data for the same point in 3D do not exactly overlap, so a threshold in distance is employed.) In Figure 11, D2 is redundant with D1. However, D3 is not redundant with D1 and D2 because point A cannot be optically projected onto the image plane in view3 due to object self-occlusion. The world coordinates for D3 are at point B, which does not overlap point A in 3D space.

[Figure 11. Labels: view1, view2, view3, object, data points D1, D2, D3, surface points A and B, surface normal N, angles $\alpha$ and $\beta$.]

Figure 11: Point A on the shaded object surface can be projected onto view1, view2 and view3, resulting in data points D1, D2 and D3, respectively. Only D1 and D2 depict the same point (A) and are treated as redundant data. D3 is not redundant with D1 and D2 since point A cannot appear in view3 due to object self-occlusion.

Given redundant data D1 and D2, the angles $\alpha$ and $\beta$ between the surface normal N at point A and the scan lines are examined. In general, if a surface point faces the range finder, and the angle between its normal and the scan line is small, the range finder gets a good reflection of the laser beam from the surface and the quality of the image data is good. Thus, the data D1, which gives a larger angle $\alpha$ - or a smaller cosine value of $\alpha$ - is removed. If the cosine values of the angles associated with several data are very close, we keep the data point with the smallest residual resulting from the normal computation. In a few cases, both cosine values and residuals are very close, and we then choose the data in the view collected at the earlier stage. The integrated data are expressed as a sequence of points in $R^3$.
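The redundancy test and best-data selection can be sketched in simplified form. The sketch below assumes that all views have already been transformed into world coordinates and detects redundancy by a distance threshold between 3D points, rather than by projecting each point onto the other image planes as the procedure above does. It keeps the point with the larger cosine and resolves ties in favor of the earlier view; the residual-based tie-break is omitted. The function name and the sample data are hypothetical.

```python
import numpy as np

def integrate_views(points, cosines, threshold):
    """Merge per-view point sets, dropping redundant points.

    points[m]  : (n_m, 3) array of world coordinates for view m
    cosines[m] : (n_m,) cosine of the angle between surface normal and scan line
    threshold  : distance below which two points count as the same surface point
    """
    keep = [np.ones(len(p), dtype=bool) for p in points]
    for m in range(len(points)):
        for i in range(len(points[m])):
            for n in range(m + 1, len(points)):
                if not keep[m][i]:
                    break                       # already replaced by a better point
                dist = np.linalg.norm(points[n] - points[m][i], axis=1)
                for j in np.flatnonzero((dist < threshold) & keep[n]):
                    if cosines[n][j] > cosines[m][i]:
                        keep[m][i] = False      # the later view sees this point better
                        break
                    keep[n][j] = False          # ties go to the earlier view
    return np.vstack([p[k] for p, k in zip(points, keep)])

# Two hypothetical views with two overlapping surface points.
view0 = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
view1 = np.array([[0.001, 0.0, 0.0], [1.001, 0.0, 0.0], [5.0, 0.0, 0.0]])
cos0 = np.array([0.9, 0.2])
cos1 = np.array([0.5, 0.8, 0.7])

merged = integrate_views([view0, view1], [cos0, cos1], threshold=0.01)
```

In this example the first surface point is kept from the first view, the second from the second view, and the non-overlapping point is passed through unchanged.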
6. Experiments

The following experiments are conducted to investigate the efficiency of the objective function and the discriminative properties of parametric geons. We are interested in examining the residual differences among all fit models, especially when the object data contain noise and the object shapes are not the exact shapes of parametric geons. The procedure for recovering parametric geons involves two steps, model fitting and model selection, as illustrated in Figure 12. Input range data are distributed to seven fitting systems. The best fit model is selected by comparing the fitting residual values and taking the model with the minimum residual. All objects used in the experiments are single-part objects which lack sharp concavities. We show three sets of experiments, using synthetic data, range data of geon-like objects and range data of imperfect geon-like objects.

[Figure 12. Diagram: IMAGE DATA is passed to seven Model Fitting modules (ellipsoid, cylinder, cuboid, tapered cylinder, tapered cuboid, curved cylinder, curved cuboid), followed by Model Selection, which outputs the MODEL.]

Figure 12: Computing parametric geons. Range data are input to seven independent systems. The fitting residuals are determined and the model associated with the minimum residual is selected as the object model.

During the experiments, a constraint for each parameter to be searched was specified beforehand. We calculated a rectangular region in 3D space bounded by the maximum and minimum x, y, z coordinates $\{X_{max}, X_{min}, Y_{max}, Y_{min}, Z_{max}, Z_{min}\}$ of the range data, as shown in Figure 13. The maximum dimension in this space is

$$ L = \sqrt{(X_{max} - X_{min})^2 + (Y_{max} - Y_{min})^2 + (Z_{max} - Z_{min})^2} \qquad (34) $$

Constraints for all parameters are given in Table 3. $l > 0$ is the minimum possible length of objects. $(c_x, c_y, c_z)$ is the centroid of the data set. $d$ is the deviation from the centroid. $l$ and $d$ are free parameters set according to a priori knowledge. Since the upper bound of the bending curvature can be set to the inverse of the minimum possible radius, we select $h = \min(X_{max} - X_{min}, Y_{max} - Y_{min}, Z_{max} - Z_{min})$ as the minimum diameter of the bent sector. Thus $h/2$ is the minimum possible radius. $t_x$, $t_y$ and $t_z$ are translation parameters and $r_x$, $r_y$ and $r_z$ are rotation parameters.

[Figure 13. Labels: box corners $(X_{min}, Y_{min}, Z_{max})$, $(X_{min}, Y_{max}, Z_{max})$, $(X_{max}, Y_{min}, Z_{max})$, $(X_{max}, Y_{max}, Z_{max})$, $(X_{min}, Y_{max}, Z_{min})$, $(X_{max}, Y_{max}, Z_{min})$, $(X_{max}, Y_{min}, Z_{min})$; axes x, y, z; diagonal $L$.]

Figure 13: A cylindrical object enclosed by a rectangular region in 3D space which is used for estimating the range of the size parameters.

The parameter search procedure done with VFSR stops when any of the following conditions is reached:

1. Smallest temperature value.
2. Minimum value of the objective function.
3. Maximum number of times sampling the same point.
4. Maximum number of times of state acceptance.
5. Maximum number of evaluations of the objective function.
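The search ranges derived from the bounding region can be sketched as below. The function name and dictionary keys are hypothetical; the size, translation, tapering and curvature ranges follow the description above and Table 3, and the rotation ranges are left out. The sketch assumes the data are not flat along any axis, so that $h > 0$.

```python
import numpy as np

def parameter_bounds(points, l_min, d):
    """Derive search ranges from the axis-aligned bounding region of the range data.

    points : (N, 3) array of range data
    l_min  : minimum possible object length l (free parameter)
    d      : allowed deviation of the model origin from the data centroid
    """
    lo, hi = points.min(axis=0), points.max(axis=0)
    extent = hi - lo
    L = float(np.linalg.norm(extent))   # Equation (34): diagonal of the region
    h = float(extent.min())             # minimum diameter of the bent sector
    centroid = points.mean(axis=0)

    bounds = {name: (l_min / 2.0, L / 2.0) for name in ("ax", "ay", "az")}
    for name, c in zip(("tx", "ty", "tz"), centroid):
        bounds[name] = (float(c) - d, float(c) + d)
    bounds["kx"] = bounds["ky"] = (0.0, 1.0)    # tapering range
    bounds["kappa"] = (0.0, 2.0 / h)            # curvature up to 1 / (h / 2)
    return L, h, bounds

# Hypothetical data spanning a 3 x 4 x 12 region.
points = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 12.0],
                   [3.0, 0.0, 0.0], [0.0, 4.0, 12.0]])
L, h, bounds = parameter_bounds(points, l_min=1.0, d=2.0)
```

For this region the diagonal is 13 and the smallest extent is 3, so the size parameters are searched in [0.5, 6.5] and the curvature in [0, 2/3].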
6.1 Experiments with Synthetic 3D Data

First, we use synthetic noise-free data, which allow us to examine the discriminative properties of parametric geons. This is motivated by the fact that if sensor noise does not appear in the data, the differences between fitting residuals are due only to size and shape differences. We will show the residual differences among the seven parametric geons. The seven objects are generated using the implicit equations suggested in [21]. Note that for the cases of tapering and bending, the similarity between deformed objects and non-deformed objects depends on the degree of deformation. For a cylinder and a tapered cylinder, the greater the values of the tapering parameters, the larger the difference in residuals. The range is from 0 to 1.0 for the tapering parameters and from 0 to $\pi/H$ for the bending parameter. Here $H$ is the length of the primitive along which bending is performed. (In this experiment, we assume that the maximum bending curvature is $\pi/H$, corresponding to half of the circular section. The radius of the circular section is $R = H/\pi$.) The parameters used for creating synthetic data of parametric geons are given in Table 4.

Table 3: Parameter constraints.

parameter    lower bound    upper bound
$a_x$        $l/2$          $L/2$
$a_y$        $l/2$          $L/2$
$a_z$        $l/2$          $L/2$
$t_x$        $c_x - d$      $c_x + d$
$t_y$        $c_y - d$      $c_y + d$
$t_z$        $c_z - d$      $c_z + d$
$r_x$        $-\pi$         $\pi$
$r_y$        $-\pi$         $\pi$
$r_z$        $-\pi$         $\pi$
$k_x$        0              1
$k_y$        0              1
$\kappa$     0              $2/h$

Table 4: Model parameters used for generating synthetic data.

parameter    $a_x$    $a_y$    $a_z$    $k_x$    $k_y$    $\kappa$
value        20       20       60       0.5      0.5      $\pi/240$

Table 5 shows the fitting residuals resulting from fitting parametric geons to synthetic data without noise. The types of objects being fit are listed in the first column. Each row shows the residuals for fitting different models to the particular object given in the first column. The figures on the diagonal are the residuals given by fitting a model to its own type of data. A second group of figures, marked in the table, are the residuals produced by fitting tapered and curved models to a cylinder or a cuboid. When tapered or curved models are fit to a regular cylinder or cuboid, $k_x$, $k_y$, or $\kappa$ take values which are very close to 0. The deformed models then appear to have the same shapes as a regular cylinder or cuboid. In model selection, if two residuals are very close and much smaller than all others, the algorithm selects the simpler of the two shapes.

                                      MODELS
OBJECTS   elli      cyld      cubd      tcyld     tcubd     ccyld     ccubd
elli      [0.001]   4.737     7.512     4.630     8.308     4.614     8.579
cyld      14.025    [0.047]   9.877     _0.024_   11.535    _0.544_   8.372
cubd      32.335    9.753     [0.010]   9.776     _0.066_   9.765     _0.078_
tcyld     18.912    14.266    20.081    [0.143]   10.068    14.271    20.006
tcubd     37.056    25.201    14.791    10.182    [0.193]   25.605    15.073
ccyld     23.683    25.671    33.408    23.191    10.068    [0.072]   21.876
ccubd     36.171    36.115    18.780    34.408    24.850    12.590    [0.104]

Table 5: Fitting models to synthetic data without noise. Items in each row are the residuals from fitting different models to the data of a particular object, as listed in the first column. The symbols elli, cyld, cubd, tcyld, tcubd, ccyld and ccubd denote ellipsoid, cylinder, cuboid, tapered cylinder, tapered cuboid, curved cylinder and curved cuboid, respectively. The figures in square brackets (bold in the original) denote the residuals from fitting a model to its own object type. The figures between underscores (underlined in the original) are the residuals from fitting tapered or curved models to a regular cylinder or cuboid.

Experimental results show that the residuals from fitting a model to its own object type are significantly smaller than those from fitting a model to another type of object. These results demonstrate that the seven parametric geons are highly distinctive under the defined objective function.
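The model selection step, including the preference for the simpler shape when residuals are very close, can be sketched as follows. The tolerance value and the complexity ordering (regular primitives first, then tapered, then curved) are assumptions introduced for illustration, and the sketch does not check that the close residuals are much smaller than all others. The residuals are the cylinder row of Table 5.

```python
# Assumed ordering from simplest to most complex shape (hypothetical).
COMPLEXITY_ORDER = ["elli", "cyld", "cubd", "tcyld", "tcubd", "ccyld", "ccubd"]

def select_model(residuals, tolerance):
    """Return the simplest model whose residual is within `tolerance` of the minimum."""
    best = min(residuals.values())
    close = [name for name in COMPLEXITY_ORDER
             if name in residuals and residuals[name] - best <= tolerance]
    return close[0]

# Residuals for the cylinder object (row "cyld" of Table 5).
cylinder_row = {"elli": 14.025, "cyld": 0.047, "cubd": 9.877, "tcyld": 0.024,
                "tcubd": 11.535, "ccyld": 0.544, "ccubd": 8.372}

selected = select_model(cylinder_row, tolerance=0.1)
```

Here the tapered cylinder has the smallest residual (0.024), but the regular cylinder (0.047) lies within the tolerance and is selected because it is the simpler shape.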
6.2 Experiments with Range Data of Geon-like Objects

In this experiment, range data of machine-made wooden objects are used. The shapes of these objects are similar to the shapes of parametric geons. Four views are used to collect range images. A simple thresholding is performed to remove the supporting plane and other background data. Surface normals are computed by a least squares fitting method. Multi-view integration followed by parametric geon recovery is then performed. Figure 14 shows the range images obtained from four viewpoints, the range data after removing most redundant data and the range data integrated into the world coordinate system. The major differences from the first experiment are that the object data contain sensor noise and that some data are missing due to object self-occlusion. Table 6 shows that, although noise and object self-occlusion affect the fitting residuals, the residual values obtained by fitting models to their own object type are still much smaller than those obtained by fitting models to other object types.
Figure 14: Multi-view integration. (a) Four images taken from different viewpoints. (b) Range data in each view after removing redundant data. (c) Range data viewed in the four directions after merging the four sets of data in (b).
                                      MODELS
OBJECTS   elli      cyld      cubd      tcyld     tcubd     ccyld     ccubd
elli      [1.206]   12.075    19.368    16.511    26.449    14.512    24.162
cyld      22.535    [0.7968]  12.976    _0.864_   12.834    _0.871_   14.432
cubd      36.156    17.819    [1.313]   28.707    _1.327_   17.807    _1.338_
tcyld     17.993    20.740    27.190    [2.339]   15.203    17.625    25.740
tcubd     34.276    28.241    14.256    20.156    [1.667]   28.228    14.242
ccyld     24.153    21.986    22.148    21.197    47.987    [3.300]   22.197
ccubd     34.209    29.887    23.213    25.130    16.291    14.341    [2.949]

Table 6: Fitting models to range data of geon-like objects. Figures in square brackets are the residuals from fitting a model to its own object type; figures between underscores are the residuals from fitting tapered or curved models to a regular cylinder or cuboid.

6.3 Experiments with Range Data of Imperfect Geon-like Objects

Our purpose is to examine the uniqueness of shape approximation using parametric geons when given a set of objects whose shapes vary slightly. In this experiment, eleven real bananas are used as objects. Their shapes cannot simply be depicted with any shape of parametric geons. Some bananas have stems at their ends and black marks on their bodies. Some bananas have relatively sharp surface variations. In some bananas, the curvature of the main axis changes slightly at the top and significantly at the bottom. No banana's cross section is perfectly symmetrical. Figure 15 shows four of the bananas which we used in the experiments. The noisy surfaces of the banana objects are due to the banana size and the view angle of the range finder. Since the banana is relatively large, it has to be far from the range finder in order to fit inside the range finder's scanning field. The sampling error of the range finder increases as the scanning distance increases. Thus, some noise occurs on the banana surfaces.

Figure 15: Four bananas used in the experiments.

Table 7 gives the average fitting residuals, standard deviations, and maximum and minimum fitting residuals for all banana objects. Since all bananas have different sizes, we cannot directly compare the fitting residuals among different bananas. Thus, all residuals are normalized by the minimum residuals as follows:

$$ E^n_{ij} = E_{ij} / E_{min,j}, \quad i = 1, \ldots, 7, \qquad E_{min,j} = \min_i \{E_{ij}\} \qquad (35) $$

$E^n$ is the relative (normalized) residual value. $i$ and $j$ are the indices of models and objects, respectively. Results show that the best model for all bananas is the curved cylinder, which gives the smallest average residual value among all models. The parametric geon models and recovery procedures demonstrate robust behavior and uniquely represent the banana shapes even though there are minor variations among these shapes.

                                      MODELS
                     elli      cyld      cubd      tcyld     tcubd     ccyld    ccubd
Mean                 3.255     2.889     3.851     3.324     3.611     1        2.987
Standard deviation   0.118     0.092     0.152     0.232     0.139     0.000    0.149
Maximum residual     4.00111   3.48881   4.71736   5.0178    4.32779   1        3.80155
Minimum residual     2.65569   2.45822   3.10204   2.46402   3.07318   1        2.38523

Table 7: Fitting models to range data of eleven bananas.
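The normalization of Equation (35) and the per-model statistics of the kind reported in Table 7 can be sketched as follows. The residual values below are invented for illustration and are not the banana data; rows index models and columns index objects.

```python
import numpy as np

def normalize_residuals(E):
    """Equation (35): divide each object's residuals by that object's minimum residual.

    E[i, j] is the residual of model i fit to object j.
    """
    E = np.asarray(E, dtype=float)
    return E / E.min(axis=0, keepdims=True)

# Hypothetical residuals for three models (rows) and two objects (columns).
E = [[6.0, 9.0],
     [2.0, 3.0],
     [4.0, 12.0]]

E_n = normalize_residuals(E)
mean_per_model = E_n.mean(axis=1)   # average over objects, one value per model
```

A model that is the best fit for every object has a normalized residual of exactly 1 for all objects, and therefore a mean of 1 and a standard deviation of 0, which matches the pattern of the curved cylinder column in Table 7.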
7. Discussion We have introduced parametric geons as volumetric primitives to derive object descriptions for qualitative recognition. An objective function, involving (i) distances between data and model surfaces and (ii) dierences between data and model normals, is minimized when parametric geons are t to the object data. The best object model corresponds to the one having the minimum residual value of the objective function. Experiments with synthetic data, range data of geon-like objects and range data of imperfect geon-like objects were performed to examine the behavior of the objective function and the shape discrimination capabilities of such parametric models. Recently, Metaxas and Dickinson [33] presented a method for recovering volumetric primitives by integrating qualitative and quantitative techniques. Using intensity images containing perfect geon-like objects as input, they rst recovered a qualitative geon model and then tted a deformable superquadric to this qualitative model. Their geon models and system output are very similar to ours. However, the major dierence between their work and ours is that they combine both a qualitative and quantitative method to recover the object shapes, while we focus on an approximation of the qualitative object shapes. Comparing the two approaches, (i) we use multiview range nder data, (ii) we can deal with imperfect geon-like objects as input, and (iii) we obtain both qualitative and quantitative object shape information simultaneously. In Section 3.2, we de ned the objective function as a sum of a distance measure (the rst term) and a normal measure (the second term) in terms of L1 and L2 norms, respectively. In
7. DISCUSSION
35
Section 3.3, we indicated that the L1 norm is less sensitive to outliers than the L2 norm. It is also known that the absolute size of an object is independent of the measurement of the dierences between normals. These properties can be used to construct an ecient parameter search during model tting. Eectively, the procedure automatically endeavors to compute the correct result in what amounts to two successive \stages". In the rst \stage", when the tting procedure begins, the models and objects are not well aligned; hence, most of the data can be viewed as outliers. Thus the second term is much larger than the rst one, thereby dominating the search. Since the second term is a measure of the dierence between normals, the exact size of the object has little eect on it. At this point, the actual search is eectively being performed in a space which mainly involves transformation, deformation, and size ratio parameters. This search space is clearly smaller than the entire parameter space. As the tting procedure progresses, the shape of the model approaches that of the object. Now the contribution of the second term gradually decreases and the rst term becomes progressively `larger'. When the value of the rst term is similar to that of the second, the search enters the second \stage" in which both terms play important roles in the objective function, and the search space becomes the full parameter space. The major objective of this second \stage" is to nd the correct parameters for the parametric geons. The transition from one \stage" to the next is achieved smoothly and automatically as shown in Figure 16. We note that the second \stage" occurs only when the model shape is relatively similar to the object shape. Otherwise, the second term is always much larger than the rst , even though the search procedure may terminate. The optimization algorithm used in these experiments is Very Fast Simulated Re-annealing (VFSR), a stochastic optimization approach [27]. 
Figure 16: The curves show the convergence of the distance measure d1 and the normal measure d2 as they change during the search (horizontal axis: number of decrements in the objective function; vertical axis: values of the objective function). Their values gradually decrease. The solid line indicates values of the distance measure (the first term). The dotted line gives values for the normal measure (the second term). Finally, the dashed line presents the values of the complete objective function. These curves were obtained when a curved cylinder model was fit to data from the same type of object.

Since the objective function possesses many local minima, nondeterministic optimization approaches must be employed. Some researchers have used a nonlinear least squares minimization (Levenberg-Marquardt) method, adding random walks to escape local minima [40, 12, 45, 47]. This is similar to simulated annealing but with an extremely fast annealing schedule. In some cases, where the properties of the objective function are known or a good initial estimate of the parameters can be obtained, this type of technique will usually take much less time than general global optimization methods. However, with an inappropriate initial guess, an extremely fast annealing schedule may trap the algorithm in a local minimum. This is because the initial guess could be positioned near one of the deep local minima, and the chosen annealing schedule would then not ensure global convergence. Although VFSR uses a very rough initial estimate for the parameter range, it still exhibited stable behavior in searching for the global minimum in our experiments. In addition, those methods which require a good initial estimate must assume that bending and tapering deformation takes place along only the longest side [45]. By using global optimization methods, we do not necessarily need to impose this constraint. Therefore more objects can be modeled, as shown in Figure 17.
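The role of the annealing schedule can be made concrete with a small example. The following is a simplified sketch loosely modeled on VFSR, not the VFSR (ASA) code used in the experiments: re-annealing is omitted, a single temperature is shared by the generating and acceptance steps, and the schedule T(k) = t0 * exp(-c * k^(1/D)) and the sampling rule are stated here as assumptions rather than quoted from [27]. The function names anneal and bumpy are hypothetical.

```python
import math
import random


def anneal(f, lower, upper, x0, n_iter=3000, t0=1.0, c=1.0, seed=0):
    """Minimize f over a box with a simplified, VFSR-style annealing loop."""
    rng = random.Random(seed)
    dim = len(x0)
    x, fx = list(x0), f(x0)
    best_x, best_f = list(x), fx
    for k in range(1, n_iter + 1):
        # Assumed schedule: T(k) = t0 * exp(-c * k**(1/dim)), floored to avoid
        # division by zero once the temperature underflows.
        temp = max(t0 * math.exp(-c * k ** (1.0 / dim)), 1e-12)
        cand = []
        for xi, lo, hi in zip(x, lower, upper):
            while True:  # resample until the coordinate stays inside the box
                u = rng.random()
                # Assumed heavy-tailed step in [-1, 1], scaled by the box width.
                step = math.copysign(1.0, u - 0.5) * temp * (
                    (1.0 + 1.0 / temp) ** abs(2.0 * u - 1.0) - 1.0)
                xn = xi + step * (hi - lo)
                if lo <= xn <= hi:
                    break
            cand.append(xn)
        fc = f(cand)
        # Metropolis acceptance; downhill moves are always accepted.
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = list(x), fx
    return best_x, best_f


def bumpy(p):
    """Hypothetical test function with many local minima."""
    return sum(v * v + 3.0 * (1.0 - math.cos(4.0 * v)) for v in p)


start = [2.5, -2.0]
best_x, best_f = anneal(bumpy, [-3.0, -3.0], [3.0, 3.0], start)
print(best_x, best_f)
```

Increasing c makes the temperature fall faster, which moves this loop toward the "extremely fast annealing schedule" regime described above, where the outcome depends more strongly on the starting point. No particular result is claimed for this toy function.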
Figure 17: (a) and (d) show two regular cuboids. (b) and (e) show curved and tapered cuboids, which are consistent with the assumption that deformation occurs along the longest side of objects. In (c) and (f), bending and tapering, respectively, are not performed along the longest side of the cuboids. By using a global optimization method, we can model the objects illustrated not only in (b) and (e), but also in (c) and (f).

8. Conclusion

The contributions of this report are as follows:

1. We derive a new set of volumetric primitives, parametric geons, from geons and globally deformed superquadrics. Parametric geons provide the distinctive shape information as well as the quantitative size and deformation information required for object recognition. Their model constraints significantly protect primitive recovery from the effects of noise and minor variations in object shape.

2. We propose an approach to recover parametric geon models from range data. Multi-view data, an objective function involving measures of distance and normal differences, and a global optimization method (VFSR) are used to obtain the best-fit object model. We discuss how to determine the contribution of the normal measure to the objective function. Combining the L1 and L2 norms in building the objective function allows a stable and hierarchical search during the model recovery procedure.

3. We demonstrate the strong capability of parametric geons for approximating the shape of single-part objects in more than two hundred experiments with synthetic data, range data of geon-like objects, and range data of imperfect geon-like objects.
Acknowledgements
We thank Dr. Lester Ingber for kindly providing the VFSR (ASA) computer code and engaging in valuable discussions. K. Wu thanks Gerard Blais and Gilbert Soucy for technical help. M. D. Levine would like to thank the Canadian Institute for Advanced Research and PRECARN for their support. This work was partially supported by a Natural Sciences and Engineering Research Council of Canada Strategic Grant and an FCAR Grant from the Province of Quebec.
References

[1] A. H. Barr. Superquadrics and angle-preserving transformations. IEEE Computer Graphics and Applications, 1:11-23, 1981.
[2] A. H. Barr. Global and local deformations of solid primitives. Computer Graphics, 18(3):21-30, 1984.
[3] R. Bergevin and M. D. Levine. Extraction of line drawing features for object recognition. Pattern Recognition, 25:319-334, 1992.
[4] R. Bergevin and M. D. Levine. Part decomposition of objects from single view line drawings. CVGIP: Image Understanding, 55(1):73-83, January 1992.
[5] R. Bergevin and M. D. Levine. Generic object recognition: Building and matching coarse descriptions from line drawings. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(1):19-36, January 1993.
[6] P. J. Besl and R. C. Jain. Invariant surface characteristics for three dimensional object recognition in range images. Computer Vision, Graphics, and Image Processing, 33(1):33-88, 1986.
[7] I. Biederman. Human image understanding: Recent research and a theory. Computer Vision, Graphics, and Image Processing, 32:29-73, 1985.
[8] I. Biederman and E. Cooper. Priming contour-deleted images: Evidence for intermediate representations in visual object recognition. Cognitive Psychology, 23:394-419, 1991.
[9] T. O. Binford. Visual perception by computer. In IEEE Conference on Systems and Control, Miami, FL, 1971.
[10] G. Blais. Creating 3D computer objects by integrating multiview range data. Master's thesis, Department of Electrical Engineering, McGill University, Montreal, Quebec, Canada, 1993.
[11] T. Boult and A. Gross. Recovery of superquadrics from depth information. In Proceedings of the AAAI Workshop on Spatial Reasoning and Multisensor Integration, pages 128-137. American Association for Artificial Intelligence, 1987.
[12] A. D. Gross and T. E. Boult. Error of fit measures for recovering parametric solids. In Proceedings, 2nd International Conference on Computer Vision, pages 690-694, Tampa, Florida, 1988. Computer Society of the IEEE, IEEE Computer Society Press.
[13] Y. Chen and G. Medioni. Object modeling by registration of multiple range images. In Proceedings of the 1991 IEEE International Conference on Robotics and Automation, pages 2724-2729, Sacramento, CA, 1991.
[14] S. J. Dickinson, A. P. Pentland, and A. Rosenfeld. A representation for qualitative 3D object recognition integrating object-centered and viewer-centered models. In K. N. Leibovic, editor, Vision: A Convergence of Disciplines. Springer-Verlag, New York, 1990.
[15] S. J. Dickinson, A. P. Pentland, and A. Rosenfeld. 3D shape recovery using distributed aspect matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):174-198, February 1992.
[16] R. Fairwood. Recognition of generic components using logic-program relations of image contours. Image and Vision Computing, 9:113-122, 1991.
[17] O. D. Faugeras, M. Hebert, P. Mussi, and J. D. Boissonnat. Polyhedral approximation of 3D objects without holes. Computer Vision, Graphics, and Image Processing, 25:169-183, 1984.
[18] F. Ferrie, J. Lagarde, and P. Whaite. Darboux frames, snakes, and super-quadrics: Geometry from bottom-up. In Proceedings IEEE Workshop on Interpretation of 3D Scenes, pages 170-176, Austin, TX, November 1989.
[19] F. Ferrie and M. D. Levine. Deriving coarse 3D models of objects. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 345-353, Ann Arbor, Michigan, June 1988.
[20] F. P. Ferrie and M. D. Levine. Integrating information from multiple views. In Proceedings of the IEEE Computer Society Workshop on Computer Vision, pages 117-122, Miami, Florida, Dec. 1987. IEEE Computer Society Press.
[21] W. R. Franklin and A. H. Barr. Faster calculation of superquadric shapes. Computer Graphics, 1(3):41-45, July 1981.
[22] M. Gardner. The superellipse: A curve that lies between the ellipse and the rectangle. Scientific American, 213:222-234, 1965.
[23] F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel. Robust Statistics: The Approach Based on Influence Functions. Wiley, New York, 1986.
[24] S. Han, D. B. Goldgof, and K. Bowyer. Using hyperquadrics for shape recovery from range data. In Proceedings of Fourth International Conference on Computer Vision, pages 492-496, Berlin, Germany, May 1993. IEEE Computer Society Press.
[25] G. Hatfield and W. Epstein. The status of the minimum principle in the theoretical analysis of visual perception. Psychological Bulletin, 97(2):155-186, 1985.
[26] J. Hummel and I. Biederman. Dynamic binding in a neural network for shape recognition. Psychological Review, to appear 1992.
[27] L. Ingber. Very fast simulated re-annealing. Mathematical and Computer Modelling, 12(8):967-973, 1989.
[28] L. Ingber. Adaptive simulated annealing (ASA). [ftp.caltech.edu:/pub/ingber/asa.Z], 1993. Lester Ingber Research, McLean, VA.
[29] L. Ingber and B. E. Rosen. Genetic algorithms and very fast simulated reannealing: A comparison. Mathematical and Computer Modelling, 16(11):87-100, 1992.
[30] A. Jacot-Descombes and T. Pun. A probabilistic approach to 3-D inference of geons from a 2-D view. In K. W. Bowyer, editor, Applications of Artificial Intelligence X: Machine Vision and Robotics, Proceedings of SPIE, volume 1708, pages 579-588, Orlando, Florida, April 1992.
[31] D. Keren, D. Cooper, and J. Subrahmonia. Describing complicated objects by implicit polynomials. Technical Report LEMS Technical Report #102, Division of Engineering, Brown University, Providence, RI, USA, 1992.
[32] S. Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671-680, May 1983.
[33] D. Metaxas and S. J. Dickinson. Integration of quantitative and qualitative techniques for deformable model fitting from orthographic, perspective, and stereo projections. In Proceedings of Fourth International Conference on Computer Vision, pages 641-649, Berlin, Germany, May 1993. IEEE Computer Society Press.
[34] P. G. Mulgaonkar. Understanding object configurations using range images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):303-307, February 1992.
[35] R. C. Munck-Fairwood and L. Du. Shape using volumetric primitives. Image and Vision Computing, 11(6):364-371, July 1993.
[36] Q. L. Nguyen and M. D. Levine. 3D object representation in range images using geons. In 11th International Conference on Pattern Recognition, The Hague, Netherlands, August 1992.
[37] B. O'Neill. Elementary Differential Geometry. Academic Press, New York and London, 1966.
[38] N. S. Raja and A. K. Jain. Recognizing geons from superquadrics fitted to range data. Image and Vision Computing, 10(3):179-190, April 1992.
[39] A. P. Pentland. Perceptual organization and the representation of natural form. Artificial Intelligence, 28:293-331, 1986.
[40] A. P. Pentland. Recognition by parts. In The First International Conference on Computer Vision, pages 8-11, London, June 1987.
[41] M. Potmesil. Generating models of solid objects by matching 3D surface segments. In Proceedings of the 8th International Joint Conference on Artificial Intelligence, pages 1089-1093, Karlsruhe, West Germany, August 1983.
[42] N. S. Raja and A. K. Jain. Obtaining generic parts from range data using a multi-view representation. In Proceedings of SPIE Conference on Application of Artificial Intelligence: Machine Vision and Robotics, Orlando, April 1992.
[43] K. Rao and R. Nevatia. Computing volume descriptions from sparse 3-D data. International Journal of Computer Vision, 2:33-50, 1988.
[44] N. N. Schraudolph and J. J. Grefenstette. A user's guide to GAucsd 1.2. Technical report, University of California at San Diego, La Jolla, CA, 1991.
[45] F. Solina and R. Bajcsy. Recovery of parametric models from range images: The case for superquadrics with global deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(2):131-147, 1990.
[46] M. Soucy and D. Laurendeau. Generating non-redundant surface representations of 3D objects using multiple range views. In Proceedings of the 10th Conference on Pattern Recognition, pages 198-200, Atlantic City, New Jersey, June 1990.
[47] S. Sullivan, L. Sandford, and J. Ponce. On using geometric distance fits to estimate 3D object shape, pose, and deformation from range, CT, and video images. In Proceedings of 1993 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 110-115, New York City, NY, June 1993. IEEE Computer Society Press.
[48] D. Terzopoulos, A. Witkin, and M. Kass. Symmetry-seeking models for 3D object reconstruction. International Journal of Computer Vision, 1(3):211-221, 1987.
[49] D. Terzopoulos and D. Metaxas. Dynamic 3D models with local and global deformations: Deformable superquadrics. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(7):703-714, 1991.
[50] B. C. Vemuri and J. K. Aggarwal. 3D model construction from multiple views using range and intensity data. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 435-438, Miami, Florida, June 1986.
[51] P. Whaite and F. P. Ferrie. From uncertainty to visual exploration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(10):1038-1049, October 1991.
[52] P. Whaite and F. P. Ferrie. Uncertain views. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 15-18, Champaign, Illinois, June 1992.
[53] N. Yokoya, M. Kaneta, and K. Yamamoto. Recovery of superquadric primitives from a range image using simulated annealing. In Proceedings of International Joint Conference on Pattern Recognition, volume 1, pages 168-172, 1992.
[54] W. Zorach. Zorach Explains Sculpture: What It Means and How It Is Made. Tudor Publishing Company, New York, 1960.