An Integrated Linear Technique for Pose Estimation from Different Geometric Features

Qiang Ji†, Mauro S. Costa†, Robert M. Haralick†, and Linda G. Shapiro†,‡
†Department of Electrical Engineering, ‡Department of Computer Science and Engineering
University of Washington, Seattle WA 98195, USA
{qiangji, mauro, haralick, shapiro}[email protected]

Abstract

Existing linear solutions for the pose estimation (or exterior orientation) problem suffer from a lack of robustness and accuracy, partially due to the fact that the majority of the methods utilize only one type of geometric entity and their frameworks do not allow simultaneous use of different types of features. Furthermore, the orthonormality constraints are weakly enforced or not enforced at all. We have developed a new analytic linear least-squares framework for determining pose from multiple types of geometric features. The technique utilizes correspondences between points, between lines, and between ellipse-circle pairs. The redundancy provided by different geometric features improves the robustness and accuracy of the least-squares solution. A novel way of approximately imposing orthonormality constraints on the sought rotation matrix within the linear framework is presented. Results from experimental evaluation of the new technique using both synthetic data and real images reveal its improved robustness and accuracy over existing direct methods.

Keywords: Pose estimation, exterior camera parameter estimation, camera calibration.

1 Introduction

Pose estimation is an essential step in many machine vision and photogrammetric applications including robotics, 3D reconstruction, and mensuration. In computer vision, the problem is also known as exterior camera calibration. It addresses the issue of determining the position and orientation of a camera with respect to an object coordinate frame.

Solutions to the pose estimation problem can be classified into linear methods and non-linear methods. Linear methods have the advantage of computational efficiency, but they suffer from a lack of accuracy and robustness. Non-linear methods, on the other hand, offer a more accurate and more robust solution. They are, however, computationally intensive and require an initial estimate. The classical non-linear photogrammetric approach to camera calibration requires setting up a constrained non-linear least-squares system. Given initial estimates of the camera parameters, the system is then linearized and solved iteratively. While the classical technique guarantees all orthonormality constraints and delivers the best answer, it requires good initial estimates. It is a well-known fact that the initial estimates must be close or the system may not converge. Hence, the quality of an initial estimate is critical, since it determines the convergence speed and the correctness of the iterative procedure. Robust and accurate linear solutions are therefore important for computer vision and photogrammetric problems.

Numerous methods have been proposed to analytically obtain exterior camera parameters. Previous methods in camera parameter or pose estimation have primarily focused on using sets of 2D-3D point correspondences, including the three-point solution (Fischler and Bolles, 1981), the four-point solutions (Hung et al., 1985; Holt and Netravali, 1991), and the six-or-more-point solutions (Tsai, 1987; Sutherland, 1974; Faugeras, 1993a). Haralick et al. (Haralick et al., 1994) reviewed major analytic solutions from three point correspondences and characterized their performance under varying noise conditions. Sutherland (Sutherland, 1974) provided a closed-form least-squares solution using six or more points. The solution, however, assumed the depth of the camera to be unity. Faugeras (Faugeras, 1993a) proposed a similar constrained least-squares method to solve for the camera parameters that requires at least six 2D-3D point correspondences. The solution includes an orthogonalization process

that ensures the orthonormality of the resulting rotation matrix. Tsai (Tsai, 1987) presented a direct solution by decoupling the camera parameters into two groups; each group is solved for separately. This method, however, assumes the existence of a physical constraint that depends on only a subset of the camera parameters. These methods are effective and simple to implement. However, they are not robust and are very susceptible to noise in the image coordinates (Wang and Xu, 1996), especially when the number of control points approaches the minimum required. For the three-point solutions, Haralick et al. (Haralick et al., 1994) showed that even the order of algebraic substitutions can render the output useless. Furthermore, the point configuration and noise in the point coordinates can also dramatically change the relative output errors. For least-squares based methods, a different study by Haralick et al. (Haralick et al., 1989) shows that when the noise level exceeds a knee level or the number of points falls below a knee level, these methods become extremely unstable and the errors skyrocket. The use of more points can help alleviate this problem. However, fabrication of more control points often proves to be difficult, expensive, and time-consuming. Another disadvantage of point-based methods is the difficulty of point matching, i.e., finding the correspondences between the 3D scene points and 2D image pixels.

In view of these issues, other researchers have investigated the use of higher-level geometric features such as lines or curves as observed geometric entities to improve the robustness and accuracy of linear methods for estimating camera parameters. Over the years, various algorithms (Echigo, 1990; Liu et al., 1990; Chen and Tsai, 1990; Rothwell et al., 1992) have been introduced. In their analytic method, Liu et al. (Liu et al., 1990) and Chen (Chen and Tsai, 1990) discussed direct solutions for determining external camera parameters based on a set of 2D-3D line correspondences. The key to this algorithm lies in the linear constraint they used. This constraint uses the fact that a 3D line and its image line lie on the same plane, determined by the center of perspectivity and the image line. Rothwell et al. (Rothwell et al., 1992) discussed a direct method that determines camera parameters using a pair of conic curves. The method works by extracting 4 or 8 points from conic intersections and tangencies. Camera parameters are then recovered from these points. Ma (Song, 1993) introduced a technique for pose estimation from the correspondence of 2D/3D conics. The

technique, however, is iterative and requires a pair of conics in both 2D and 3D. Analytic solutions based on high-level geometric features afford better stability and are more robust and accurate. Here the correspondence problem can be addressed more easily than with point-based methods. However, high-level geometric features may not always be present in some applications, while points are present in many applications. Therefore, completely ignoring points while solely employing high-level geometric entities can be a waste of readily available geometric information. This is one of the problems with existing solutions: they use either points or lines or conics, but not a combination of features.

In this paper, we describe an integrated least-squares method that solves for the camera transformation matrix analytically by fusing available observed geometric information from different levels of abstraction. Specifically, we analytically solve for the external camera parameters from simultaneous use of 2D-3D correspondences between points, between lines, and between 2D ellipses and 3D circles. The attractiveness of our approach is that the redundancy provided by features at different levels improves the robustness and accuracy of the least-squares solution, therefore improving the precision of the estimated parameters. To our knowledge, no previous research attempts have been made in this area. Work by Phong et al. (Phong et al., 1995) described a technique in which information from both points and lines is used to compute the pose. However, the method is iterative and involves only points and lines.

Another major factor that contributes to the lack of robustness of the existing linear methods is that orthonormality constraints on the rotation matrix are often weakly enforced or not enforced at all. In this paper, we introduce a simple, yet effective, scheme for approximately imposing the orthonormality constraints on the rotation matrix. While the scheme does not guarantee that the resultant rotation matrix completely satisfies the orthonormality constraints, it does yield a matrix that is closer to orthonormality than those obtained with competing methods.

This paper is organized as follows. Section 2 briefly summarizes the perspective projection geometry and equations. Least-squares frameworks for estimating the camera transformation matrix from 2D-3D point, line, and ellipse/circle correspondences are presented in sections 3, 4, and 5, respectively. Section 6 discusses our technique for approximately imposing orthonormality constraints and presents the integrated linear technique for estimating the transformation matrix simultaneously using point, line, and ellipse/circle correspondences.

Performance characterization and comparison of the developed integrated technique are covered in section 7.

2 Perspective Projection Geometry

To set the stage for the subsequent discussion of camera parameter estimation, this section briefly summarizes the pin-hole camera model and the perspective projection geometry.

Let P be a 3D point and (x y z)^t be the coordinates of P relative to the object coordinate frame C_o. Define the camera coordinate system C_c to have its z-axis parallel to the optical axis of the camera lens and its origin located at the perspectivity center. Let (x_c y_c z_c)^t be the coordinates of P in C_c. Define C_i to be the image coordinate system, with its u-axis and v-axis parallel to the x and y axes of the camera coordinate frame, respectively. The origin of C_i is located at the principal point. Let (u v)^t be the coordinates of P_i, the image projection of P in C_i. Based on perspective projection theory, the projection that relates (u, v) on the image plane to the corresponding 3D point (x_c, y_c, z_c) in the camera frame can be described by

\lambda \begin{pmatrix} u \\ v \\ f \end{pmatrix} = \begin{pmatrix} x_c \\ y_c \\ z_c \end{pmatrix}   (1)

where \lambda is a scalar and f is the camera focal length. Further, (x y z)^t relates to (x_c y_c z_c)^t by a rigid body coordinate transformation consisting of a rotation and a translation. Let a 3 \times 3 matrix R represent the rotation and a 3 \times 1 vector T describe the translation; then

\begin{pmatrix} x_c \\ y_c \\ z_c \end{pmatrix} = R \begin{pmatrix} x \\ y \\ z \end{pmatrix} + T   (2)

where T can be parameterized as T = (t_x t_y t_z)^t and R as an orthonormal matrix

R = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix}

R and T describe the orientation and location of the object frame relative to the camera frame, respectively. Together, they are referred to as the camera transformation matrix. Substituting the parameterized T and R into equation 2 yields

\begin{pmatrix} x_c \\ y_c \\ z_c \end{pmatrix} = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \\ t_z \end{pmatrix}   (3)

Combining the projection equation 1 with the rigid transformation of equation 3 and solving for \lambda yields the collinearity equations, which describe the ideal relationship between a point on the image plane and the corresponding point in the object frame:

u = f \frac{r_{11} x + r_{12} y + r_{13} z + t_x}{r_{31} x + r_{32} y + r_{33} z + t_z}, \qquad
v = f \frac{r_{21} x + r_{22} y + r_{23} z + t_y}{r_{31} x + r_{32} y + r_{33} z + t_z}   (4)

For a rigid body transformation, the rotation matrix R must be orthonormal, that is, R^t = R^{-1}. The constraint R^t = R^{-1} amounts to the six orthonormality constraint equations on the elements of R:

r_{11}^2 + r_{12}^2 + r_{13}^2 = 1 \qquad\qquad r_{11} r_{21} + r_{12} r_{22} + r_{13} r_{23} = 0
r_{21}^2 + r_{22}^2 + r_{23}^2 = 1 \qquad\qquad r_{11} r_{31} + r_{12} r_{32} + r_{13} r_{33} = 0
r_{31}^2 + r_{32}^2 + r_{33}^2 = 1 \qquad\qquad r_{21} r_{31} + r_{22} r_{32} + r_{23} r_{33} = 0   (5)

where the three constraints on the left are referred to as the normality constraints and the three on the right as the orthogonality constraints. The normality constraints ensure that the row vectors of R are unit vectors, while the orthogonality constraints guarantee orthogonality among row vectors.
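As a concrete illustration of the model above, the short sketch below (ours, not part of the original text; NumPy assumed) applies the rigid transform of equation 3 and the collinearity equations 4 to a set of points, and measures how far a candidate R is from satisfying the constraints of equation 5:

```python
import numpy as np

def project(R, T, f, X):
    """Project object-frame points X (N x 3) through the pin-hole model:
    rigid transform of equation 3 followed by the collinearity equations 4."""
    Xc = X @ R.T + T                      # camera-frame coordinates
    u = f * Xc[:, 0] / Xc[:, 2]
    v = f * Xc[:, 1] / Xc[:, 2]
    return np.column_stack([u, v])

def orthonormality_residual(R):
    """Deviation of R from the six constraints of equation 5;
    zero for a perfect rotation matrix."""
    return np.linalg.norm(R.T @ R - np.eye(3))
```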

3 Camera transformation matrix from point correspondences

Given the 3D object coordinates of a number of points and their corresponding 2D image coordinates, the coefficients of R and T can be solved for by a least-squares solution of an overdetermined system of linear equations. Specifically, the least-squares method based on point correspondences can be formulated as follows. Let X_n = (x_n, y_n, z_n), n = 1, \ldots, K, be the 3D coordinates of K points relative to the object frame and U_n = (u_n, v_n) be the observed image coordinates of these points. We can then relate X_n and U_n via the collinearity equations as follows:

u_n = f \frac{r_{11} x_n + r_{12} y_n + r_{13} z_n + t_x}{r_{31} x_n + r_{32} y_n + r_{33} z_n + t_z}, \qquad
v_n = f \frac{r_{21} x_n + r_{22} y_n + r_{23} z_n + t_y}{r_{31} x_n + r_{32} y_n + r_{33} z_n + t_z}

Rewriting the above equations yields

f r_{11} x_n + f r_{12} y_n + f r_{13} z_n - u_n r_{31} x_n - u_n r_{32} y_n - u_n r_{33} z_n + f t_x - u_n t_z = 0
f r_{21} x_n + f r_{22} y_n + f r_{23} z_n - v_n r_{31} x_n - v_n r_{32} y_n - v_n r_{33} z_n + f t_y - v_n t_z = 0   (6)

We can then set up a matrix M and a vector V as follows:

M_{2K \times 12} = \begin{pmatrix}
f x_1 & f y_1 & f z_1 & 0 & 0 & 0 & -u_1 x_1 & -u_1 y_1 & -u_1 z_1 & f & 0 & -u_1 \\
0 & 0 & 0 & f x_1 & f y_1 & f z_1 & -v_1 x_1 & -v_1 y_1 & -v_1 z_1 & 0 & f & -v_1 \\
\vdots \\
f x_K & f y_K & f z_K & 0 & 0 & 0 & -u_K x_K & -u_K y_K & -u_K z_K & f & 0 & -u_K \\
0 & 0 & 0 & f x_K & f y_K & f z_K & -v_K x_K & -v_K y_K & -v_K z_K & 0 & f & -v_K
\end{pmatrix}   (7)

V_{12 \times 1} = (r_{11}\ r_{12}\ r_{13}\ r_{21}\ r_{22}\ r_{23}\ r_{31}\ r_{32}\ r_{33}\ t_x\ t_y\ t_z)^t   (8)

where M is hereafter referred to as the collinearity matrix and V is the unknown vector of transformation parameters containing all sought rotational and translational coefficients. To determine V, we can set up a constrained least-squares problem that minimizes

\epsilon^2 = \|M V\|^2   (9)

where \epsilon^2 is the sum of residual errors of all points. Given an overdetermined system, a solution to the above least-squares minimization requires MV = 0. Its solution contains an arbitrary scale factor. To uniquely determine V, different methods have been proposed to solve for the scale factor. In the least-squares solution provided by Sutherland (Sutherland, 1974), the depth of the camera is assumed to be unity, i.e., t_z = 1. Not only is this assumption

unrealistic for most applications, but also the solution is constructed without regard to the orthonormal constraints that R must satisfy. Faugeras (Faugeras, 1993a) posed the problem as a constrained least-squares problem. The third normality constraint in equation 5 is imposed during the minimization to solve for the scale factor and to constrain the rotation matrix.
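To make the construction explicit, here is a minimal sketch of the point-based solver, assuming a known focal length f and noise-free image coordinates: it builds M row by row as in equation 7, takes the right singular vector of M associated with the smallest singular value as the minimizer of equation 9, and, following Faugeras' choice, fixes the arbitrary scale with the third normality constraint of equation 5:

```python
import numpy as np

def pose_from_points(X, U, f):
    """Build the collinearity matrix M of equation 7 from K point
    correspondences and solve MV = 0 (equation 9) up to scale."""
    K = len(X)
    M = np.zeros((2 * K, 12))
    for n, ((x, y, z), (u, v)) in enumerate(zip(X, U)):
        M[2 * n]     = [f * x, f * y, f * z, 0, 0, 0, -u * x, -u * y, -u * z, f, 0, -u]
        M[2 * n + 1] = [0, 0, 0, f * x, f * y, f * z, -v * x, -v * y, -v * z, 0, f, -v]
    # Null-space direction: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(M)
    V = Vt[-1]
    # Fix the scale with the third normality constraint of equation 5:
    # the last row of R must be a unit vector; sign chosen so that t_z > 0
    # (assuming the object lies in front of the camera).
    V = V / np.linalg.norm(V[6:9])
    if V[11] < 0:
        V = -V
    return V[:9].reshape(3, 3), V[9:]
```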

4 Camera transformation matrix from line correspondences

Given correspondences between a set of 3D lines and their observed 2D images, we can set up a system of linear equations that involves R, T, and the coefficients of the 3D and 2D lines as follows. Let a 3D line L in the object frame be parametrically represented as

L: X = \lambda N + P

where X = (x y z)^t is a generic point on the line, \lambda is a scalar, N = (A, B, C)^t is the direction cosine vector, and P = (P_x P_y P_z)^t is a point on the line. Let the corresponding 2D line l on the image plane be represented by

l: a u + b v + c = 0

Ideally, the 3D line must lie on the projection plane formed by the center of perspectivity and the 2D image line, as shown in Figure 1. Relative to the camera frame, the equation of the projection plane can be derived from the 2D line equation as

a f x_c + b f y_c + c z_c = 0

where f is the focal length. Since the 3D line lies on the projection plane, the plane normal must be perpendicular to the line. Denote the plane normal by n = (af, bf, c)^t; then, given an ideal projection, we have

n^t R N = 0   (10)

Similarly, since point P is also located on the projection plane, this leads to

n^t (R P + T) = 0   (11)

Figure 1: Projection plane formed by a 2D image line l and the corresponding 3D line L.

Equations 10 and 11 are hereafter referred to as the coplanarity equations. Equivalently, they can be rewritten as

A a r_{11} + B a r_{12} + C a r_{13} + A b r_{21} + B b r_{22} + C b r_{23} + A c r_{31} + B c r_{32} + C c r_{33} = 0   (12)

P_x a r_{11} + P_y a r_{12} + P_z a r_{13} + P_x b r_{21} + P_y b r_{22} + P_z b r_{23} + P_x c r_{31} + P_y c r_{32} + P_z c r_{33} + a t_x + b t_y + c t_z = 0   (13)

Given a set of J line correspondences, we can set up a system of linear equations, similar to those for points, that involves a matrix H and the vector V, where V is as defined before and H is defined as follows:

H_{2J \times 12} = \begin{pmatrix}
A_1 a_1 & B_1 a_1 & C_1 a_1 & A_1 b_1 & B_1 b_1 & C_1 b_1 & A_1 c_1 & B_1 c_1 & C_1 c_1 & 0 & 0 & 0 \\
P_{x_1} a_1 & P_{y_1} a_1 & P_{z_1} a_1 & P_{x_1} b_1 & P_{y_1} b_1 & P_{z_1} b_1 & P_{x_1} c_1 & P_{y_1} c_1 & P_{z_1} c_1 & a_1 & b_1 & c_1 \\
\vdots \\
A_J a_J & B_J a_J & C_J a_J & A_J b_J & B_J b_J & C_J b_J & A_J c_J & B_J c_J & C_J c_J & 0 & 0 & 0 \\
P_{x_J} a_J & P_{y_J} a_J & P_{z_J} a_J & P_{x_J} b_J & P_{y_J} b_J & P_{z_J} b_J & P_{x_J} c_J & P_{y_J} c_J & P_{z_J} c_J & a_J & b_J & c_J
\end{pmatrix}   (14)

and is called the coplanarity matrix. Again we can solve for V by minimizing the sum of residual errors \|HV\|^2. V can be solved for up to a scale factor. The scale factor can be determined by imposing one of the orthonormality constraints. The camera transformation matrix can be solved with at least 6 lines if the elements of R are assumed to be independent.
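The two rows contributed by each line correspondence can be assembled analogously to the point rows; a minimal sketch (ours), which folds the focal length into the projection-plane normal n = (af, bf, c)^t of equation 10 so that the rows take the form of equations 12 and 13:

```python
import numpy as np

def coplanarity_rows(N, P, line2d, f):
    """Two rows of the coplanarity matrix H (equation 14) contributed by
    one 3D line X = lambda*N + P observed as the image line a*u + b*v + c = 0."""
    A, B, C = N
    Px, Py, Pz = P
    a0, b0, c0 = line2d
    a, b, c = a0 * f, b0 * f, c0        # projection-plane normal n = (af, bf, c)
    row_direction = [A * a, B * a, C * a, A * b, B * b, C * b,
                     A * c, B * c, C * c, 0, 0, 0]            # equation 12
    row_point = [Px * a, Py * a, Pz * a, Px * b, Py * b, Pz * b,
                 Px * c, Py * c, Pz * c, a, b, c]             # equation 13
    return np.array([row_direction, row_point], dtype=float)
```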

5 Camera transformation matrix from ellipse-circle correspondences

5.1 Camera pose from ellipse-circle correspondence

Given the image of a circle in 3D space and the corresponding ellipse in the image, the circle's pose relative to the camera frame can be solved for analytically. Solutions to this problem may be found in Haralick (Haralick and Shapiro, 1993), Forsyth (Forsyth et al., 1991), and Dhome (Dhome et al., 1989). Here we propose an algebraic solution as detailed below.

Assume we observe an ellipse in the image, resulting from the perspective projection of a 3D circle. Relative to the image frame C_i, the ellipse may be represented by a standard conic function

A u^2 + B u v + C v^2 + D u + E v + F = 0   (15)

where the coefficients A, B, C, D, E, F can be estimated from a least-squares fit. A point (x_c, y_c, z_c) in the camera frame relates to its projection (u, v) in the image via

u = \frac{f x_c}{z_c}, \qquad v = \frac{f y_c}{z_c}

Substituting u and v into equation 15 yields

A x_c^2 + B x_c y_c + C y_c^2 + \frac{D}{f} x_c z_c + \frac{E}{f} y_c z_c + \frac{F}{f^2} z_c^2 = 0   (16)

The above equation defines a cone formed by the center of perspectivity and the ellipse in the image, as shown in Figure 2.

Figure 2: The 3D cone formed by a 2D ellipse and the corresponding 3D circle.

The sought 3D circle must be the cross-section of the cone when cut with a plane. Without loss of generality, assume the plane equation, relative to the camera frame, can be represented as z_c = \alpha x_c + \beta y_c + \gamma. Substituting the plane equation into the cone equation 16 and collecting like terms yields the equation for the cross-section of the cone and the plane:

\left(A + \frac{\alpha D}{f} + \frac{\alpha^2 F}{f^2}\right) x_c^2 + \left(C + \frac{\beta E}{f} + \frac{\beta^2 F}{f^2}\right) y_c^2 + \left(B + \frac{\beta D}{f} + \frac{\alpha E}{f} + \frac{2\alpha\beta F}{f^2}\right) x_c y_c + \left(\frac{\gamma D}{f} + \frac{2\alpha\gamma F}{f^2}\right) x_c + \left(\frac{\gamma E}{f} + \frac{2\beta\gamma F}{f^2}\right) y_c + \frac{\gamma^2 F}{f^2} = 0   (17)

Given the above equation, the pose of the circle plane is determined by exploiting the property, unique to a circle, that the space curve relative to the camera frame must have equal coefficients for the x_c^2 and y_c^2 terms and a zero coefficient for the x_c y_c term. This property yields the following two simultaneous equations:

A + \frac{\alpha D}{f} + \frac{\alpha^2 F}{f^2} = C + \frac{\beta E}{f} + \frac{\beta^2 F}{f^2}

B + \frac{\beta D}{f} + \frac{\alpha E}{f} + \frac{2\alpha\beta F}{f^2} = 0

Reorganizing the above two equations gives rise to

\left(\alpha + \frac{D f}{2 F}\right)^2 - \left(\beta + \frac{E f}{2 F}\right)^2 = \frac{(C - A) f^2}{F} + \frac{(D^2 - E^2) f^2}{4 F^2}

\left(\alpha + \frac{D f}{2 F}\right)\left(\beta + \frac{E f}{2 F}\right) = \frac{f^2 (D E - 2 F B)}{4 F^2}   (18)

Let v = \alpha + \frac{D f}{2 F} and w = \beta + \frac{E f}{2 F}; we can easily solve for v and w from the above equations. Given v and w, the plane parameters \alpha and \beta can be derived. There are two solutions for \alpha and \beta. With \alpha and \beta known, we can substitute them into equation 17, yielding

x_c^2 + y_c^2 + q x_c + l y_c + s = 0   (19)

where

q = \frac{\gamma (D f + 2 \alpha F)}{A f^2 + \alpha D f + \alpha^2 F}, \qquad
l = \frac{\gamma (E f + 2 \beta F)}{A f^2 + \alpha D f + \alpha^2 F}, \qquad
s = \frac{\gamma^2 F}{A f^2 + \alpha D f + \alpha^2 F}

Given the radius of the circle, we can use it to solve for the third plane parameter \gamma and subsequently for q and l. Substituting the solutions for q, l, and the radius r into equation 19 allows us to solve for the x and y coordinates of the circle center in the camera frame. The z coordinate of the circle center can then be solved for from the plane equation.
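A minimal sketch of the plane-orientation step (equation 18) is given below; it is our illustration, assumes F ≠ 0, and leaves the degenerate v = 0 case unhandled. It returns the two (α, β) solution pairs:

```python
import numpy as np

def plane_orientation(A, B, C, D, E, F, f):
    """Solve equation 18 for the circle-plane parameters alpha, beta.
    Returns the two solution pairs. Assumes F != 0 and v != 0."""
    p = (C - A) * f**2 / F + (D**2 - E**2) * f**2 / (4 * F**2)   # v^2 - w^2
    m = f**2 * (D * E - 2 * F * B) / (4 * F**2)                  # v * w
    # v^2 - w^2 = p and v*w = m  imply  v^4 - p*v^2 - m^2 = 0.
    v2 = (p + np.sqrt(p**2 + 4 * m**2)) / 2                      # non-negative root
    v = np.sqrt(v2)
    solutions = []
    for vi in (v, -v):
        wi = m / vi                                              # from v*w = m
        alpha = vi - D * f / (2 * F)
        beta = wi - E * f / (2 * F)
        solutions.append((alpha, beta))
    return solutions
```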

5.2 Camera transformation matrix from circles

The pose of a circle, as derived in the last section, is relative to the camera frame. If we are also given the pose of the circle in the object frame, then we can use the two poses to solve for R and T. Specifically, let N_c = (N_{c_x} N_{c_y} N_{c_z})^t and O_c = (O_{c_x} O_{c_y} O_{c_z})^t be the 3D circle normal and center in camera coordinates, respectively. Also, let N_o = (N_{o_x} N_{o_y} N_{o_z})^t and O_o = (O_{o_x} O_{o_y} O_{o_z})^t be the normal and center of the same circle, but in the object coordinate system. N_c and O_c can be obtained from the pose-from-circle technique described in section 5.1. N_o and O_o are assumed to be known. The problem is to determine R and T from the correspondence between N_c and N_o, and between O_c and O_o. The two normals and the two centers are related by the transformation R and T as shown below:

N_c = R N_o

\begin{pmatrix} N_{c_x} \\ N_{c_y} \\ N_{c_z} \end{pmatrix} = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix} \begin{pmatrix} N_{o_x} \\ N_{o_y} \\ N_{o_z} \end{pmatrix}   (20)

and

O_c = R O_o + T

\begin{pmatrix} O_{c_x} \\ O_{c_y} \\ O_{c_z} \end{pmatrix} = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix} \begin{pmatrix} O_{o_x} \\ O_{o_y} \\ O_{o_z} \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \\ t_z \end{pmatrix}   (21)

Equivalently, we can rewrite equations 20 and 21 as follows:

N_{o_x} r_{11} + N_{o_y} r_{12} + N_{o_z} r_{13} = N_{c_x}
N_{o_x} r_{21} + N_{o_y} r_{22} + N_{o_z} r_{23} = N_{c_y}
N_{o_x} r_{31} + N_{o_y} r_{32} + N_{o_z} r_{33} = N_{c_z}

and

O_{o_x} r_{11} + O_{o_y} r_{12} + O_{o_z} r_{13} + t_x = O_{c_x}
O_{o_x} r_{21} + O_{o_y} r_{22} + O_{o_z} r_{23} + t_y = O_{c_y}
O_{o_x} r_{31} + O_{o_y} r_{32} + O_{o_z} r_{33} + t_z = O_{c_z}

Given I observed ellipses and their corresponding space circles, we can set up a system of linear equations to solve for R and T by minimizing the sum of residual errors \|QV - k\|^2, where Q and k are defined as follows:

Q = \begin{pmatrix}
N_{1o_x} & N_{1o_y} & N_{1o_z} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & N_{1o_x} & N_{1o_y} & N_{1o_z} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & N_{1o_x} & N_{1o_y} & N_{1o_z} & 0 & 0 & 0 \\
O_{1o_x} & O_{1o_y} & O_{1o_z} & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & O_{1o_x} & O_{1o_y} & O_{1o_z} & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & O_{1o_x} & O_{1o_y} & O_{1o_z} & 0 & 0 & 1 \\
\vdots \\
N_{Io_x} & N_{Io_y} & N_{Io_z} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & N_{Io_x} & N_{Io_y} & N_{Io_z} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & N_{Io_x} & N_{Io_y} & N_{Io_z} & 0 & 0 & 0 \\
O_{Io_x} & O_{Io_y} & O_{Io_z} & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & O_{Io_x} & O_{Io_y} & O_{Io_z} & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & O_{Io_x} & O_{Io_y} & O_{Io_z} & 0 & 0 & 1
\end{pmatrix}   (22)

and

k = (N_{1c_x}\ N_{1c_y}\ N_{1c_z}\ O_{1c_x}\ O_{1c_y}\ O_{1c_z}\ \ldots\ N_{Ic_x}\ N_{Ic_y}\ N_{Ic_z}\ O_{Ic_x}\ O_{Ic_y}\ O_{Ic_z})^t   (23)

Since each circle provides 6 equations, a minimum of two circles is needed to solve for the 12 unknown coefficients.
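A minimal sketch (ours) of the six rows and right-hand-side entries that one circle contributes to Q and k in equations 22 and 23, using the same ordering of V as in equation 8:

```python
import numpy as np

def circle_rows(No, Oo, Nc, Oc):
    """Six rows of Q and six entries of k (equations 22 and 23) for one
    circle: normal/center (No, Oo) in the object frame, (Nc, Oc) in the
    camera frame as recovered by the section 5.1 procedure."""
    Q = np.zeros((6, 12))
    k = np.concatenate([np.asarray(Nc, float), np.asarray(Oc, float)])
    for i in range(3):
        Q[i, 3 * i: 3 * i + 3] = No        # normal equations involve R only
        Q[3 + i, 3 * i: 3 * i + 3] = Oo    # center equations involve R ...
        Q[3 + i, 9 + i] = 1.0              # ... and the translation t
    return Q, k
```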

6 The Integrated Technique

In the previous sections we have outlined the least-squares frameworks for computing the transformation matrix from different features. It is desirable to be able to compute camera parameters using more than one type of feature simultaneously. In other words, given observed geometric entities at different levels, we want to develop a mechanism that systematically and consistently fuses this information. The reason is quite obvious: using all available geometric information will provide a more accurate and robust solution, since it increases the relative redundancy of the least-squares estimation. It also reduces the dependency on points. We can be more selective when choosing points, without worrying about the minimum number of points needed for accurate results. Furthermore, we do not need to worry about whether the selected points are coplanar or not. To our knowledge, this problem has never been addressed in the literature. This section is devoted to formulating a direct solution for computing camera parameters from 2D-3D correspondences of points, lines, and ellipses/circles.

6.1 Fusing all observed information

The problem of integrating information from points, lines, and circles is actually straightforward, given the frameworks we have outlined individually for points, lines, and circles. The problem can be stated as follows. Given the 2D-3D correspondences of K points, J lines, and I ellipse/circle pairs, we want to set up a system of linear equations that involves all geometric entities. The problem can be formulated as a least-squares estimation in the form of minimizing \|WV - b\|, where V is the unknown vector of transformation parameters as defined before, and b is a known vector defined below. W is an augmented coefficient matrix, whose rows consist of linear equations derived from points, lines, and circles. Specifically, given the M, H, and Q matrices defined in equations 7, 14, and 22, the W matrix is

W = \begin{pmatrix} M \\ H \\ Q \end{pmatrix}   (24)

where the first 2K rows of W represent contributions from points, the subsequent 2J rows represent contributions from lines, and the last 6I rows represent contributions from circles. The vector b is defined as

b = (0\ 0\ \ldots\ 0\ N_{1c_x}\ N_{1c_y}\ N_{1c_z}\ O_{1c_x}\ O_{1c_y}\ O_{1c_z}\ \ldots\ N_{Ic_x}\ N_{Ic_y}\ N_{Ic_z}\ O_{Ic_x}\ O_{Ic_y}\ O_{Ic_z})^t   (25)

where the first 2K + 2J entries, corresponding to the point and line equations, are zero. Given W and b, the least-squares solution for V is

V = (W^t W)^{-1} W^t b   (26)

It can be seen that to have an overdetermined system of linear equations, we need 2K + 2J + 6I \geq 12 observed geometric entities. This may occur with any combination of points, lines, and circles. For example, one point, one line, and one circle, or two points and one circle, are sufficient to solve for the transformation matrix from equation 26. Any additional points, lines, or circles will improve the robustness and the precision of the estimated parameters.
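A minimal sketch (ours) of the fusion step of equations 24-26; the blocks M, H, and Q, with the circle right-hand side k, are assumed to have been assembled as in the previous sections:

```python
import numpy as np

def integrated_pose(M, H, Q, k):
    """Solve equation 26 for V = (r11 ... r33, tx, ty, tz) by stacking the
    point block M, the line block H (homogeneous, right-hand side 0), and
    the circle block Q with right-hand side k."""
    W = np.vstack([M, H, Q])                              # equation 24
    b = np.concatenate([np.zeros(len(M) + len(H)), k])    # equation 25
    V, *_ = np.linalg.lstsq(W, b, rcond=None)             # equation 26
    return V[:9].reshape(3, 3), V[9:]
```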

6.2 Approximately imposing orthonormal constraints

The least-squares solution for V described in the last section cannot guarantee the orthonormality of the resultant rotation matrix. One major reason why previous linear methods are very susceptible to noise is that the orthonormality constraints are not enforced, or enforced only weakly. To ensure orthonormality, the six orthonormality constraints must be imposed on V within the least-squares framework. The conventional way of imposing the constraints is through the use of Lagrange multipliers. However, simultaneously imposing any two of the six constraints using Lagrange multipliers requires a non-linear solution to the problem. Therefore, most linear methods choose to use a single orthonormality constraint. For example, Faugeras (Faugeras, 1993b) imposed the constraint that the norm of the last row vector of R be unity. This constraint, however, cannot ensure complete satisfaction of all orthonormality constraints. To impose more than one orthonormality constraint but still retain a linear solution, Liu et al. (Liu et al., 1990) suggested the constraint that the sum of the squared norms of the three row vectors be 3. This constraint, however, cannot guarantee that the norm of each individual row vector is one. Haralick et al. (Haralick et al., 1989) and Horn (Horn, 1987) proposed direct solutions where all orthonormality constraints are imposed, but for the 3D-to-3D absolute orientation problem. They are only applicable when using point correspondences and are not applicable to line and circle-ellipse correspondences. Most importantly, their techniques cannot be applied to the general linear framework we propose. Given the linear framework, even imposing one constraint can render the solution non-linear: it is well known that minimizing \|Ax - b\| subject to s(x) = 0 can only be achieved by a non-linear method. In statistics, this kind of problem is called a trust region subproblem, and existing solutions to trust region problems are all non-linear.

We now introduce a simple yet effective method for approximately imposing the orthonormality constraints in a way that offers a linear solution. We want to emphasize that the technique we are about to introduce cannot guarantee a perfect rotation matrix. However, our experimental study shows that it yields a matrix that is closer to a rotation matrix than those obtained using the competing methods. The advantage of our technique is that the constraint is imposed globally on each entry of the rotation matrix rather than locally, and that, asymptotically, the resulting matrix should converge to a rotation matrix.

The technique requires knowledge of the attributes of a geometric entity in the object frame and the camera frame. We are given a geometric attribute X_n of a 3D entity in a Euclidean 3-space, the corresponding geometric attribute Y_n in another Euclidean 3-space, and the knowledge that X_n and Y_n are related through a rigid body transformation, i.e., X_n is mapped to Y_n by a function of R:

Y_n = f(R, X_n)   (27)

where f represents a known function. The pose estimation problem is to infer R from a set of X_n and Y_n, where n = 1, \ldots, N. The problem can be formulated as a least-squares method, i.e., find R that minimizes the sum of residual errors E_1^2 = \sum_{n=1}^{N} \|Y_n - f(R, X_n)\|^2. This generates a set of linear equations that involve R as well as X_n and Y_n, just like the coefficient matrices M, H, and Q described in the previous sections. Since R must be orthonormal, that is, R^t = R^{-1}, we can also express the relationship between X_n and Y_n in the following way:

X_n = g(R^t, Y_n)   (28)

where g represents a known function. Similarly, to solve for R, we can set up a different system of linear equations that R^t must satisfy; the solution of this linear system minimizes E_2^2 = \sum_{n=1}^{N} \|X_n - g(R^t, Y_n)\|^2. The resultant matrix that minimizes both E_1^2 and E_2^2 is close to being orthonormal. How close the resultant rotation matrix is to orthonormality largely depends on N and on the noise present in the data, as demonstrated in our experiments.

Examples are now given to demonstrate how the technique works. Let X_n and Y_n represent the orientation of a vector (e.g., the surface normal or the direction cosine of a line) in the camera frame and the object frame respectively. They are related via

Y_n = R X_n   (29)

Given N orientations, we may set up a system of linear equations for which the solution minimizes e_1^2 = \sum_{n=1}^{N} (Y_n - R X_n)^2. Similarly, we can express the relation between X_n and Y_n by a different function that uses the fact that R^t = R^{-1}:

X_n = R^t Y_n   (30)

Another set of linear equations may be obtained whose solution minimizes e_2^2 = \sum_{n=1}^{N} (X_n - R^t Y_n)^2. The resultant matrix that minimizes both e_1^2 and e_2^2 is close to being orthonormal.

We now address the problem of how to impose the orthonormality constraints in the general framework of finding pose from multiple geometric features described in sections 3, 4, and 5. Given the pose of circles relative to the camera frame and the object frame, let N_c = (N_{c_x} N_{c_y} N_{c_z})^t and N_o = (N_{o_x} N_{o_y} N_{o_z})^t be the 3D circle normals in the camera and object frames respectively. Equation 20 depicts the relation between the two normals that involves R. The relation can also be expressed in an alternative way that involves R^t:

N_o = R^t N_c

\begin{pmatrix} N_{o_x} \\ N_{o_y} \\ N_{o_z} \end{pmatrix} = \begin{pmatrix} r_{11} & r_{21} & r_{31} \\ r_{12} & r_{22} & r_{32} \\ r_{13} & r_{23} & r_{33} \end{pmatrix} \begin{pmatrix} N_{c_x} \\ N_{c_y} \\ N_{c_z} \end{pmatrix}   (31)

Equivalently, we can rewrite equation 31 as follows:

N_{c_x} r_{11} + N_{c_y} r_{21} + N_{c_z} r_{31} = N_{o_x}
N_{c_x} r_{12} + N_{c_y} r_{22} + N_{c_z} r_{32} = N_{o_y}
N_{c_x} r_{13} + N_{c_y} r_{23} + N_{c_z} r_{33} = N_{o_z}

Given the same set of I observed ellipses and their corresponding space circles, we can set up another system of linear equations that uses the same set of circles as in Q. Let Q' be the coefficient matrix containing the coefficients of this set of linear equations; then Q' is

Q' = \begin{pmatrix}
N_{1c_x} & 0 & 0 & N_{1c_y} & 0 & 0 & N_{1c_z} & 0 & 0 & 0 & 0 & 0 \\
0 & N_{1c_x} & 0 & 0 & N_{1c_y} & 0 & 0 & N_{1c_z} & 0 & 0 & 0 & 0 \\
0 & 0 & N_{1c_x} & 0 & 0 & N_{1c_y} & 0 & 0 & N_{1c_z} & 0 & 0 & 0 \\
\vdots \\
N_{Ic_x} & 0 & 0 & N_{Ic_y} & 0 & 0 & N_{Ic_z} & 0 & 0 & 0 & 0 & 0 \\
0 & N_{Ic_x} & 0 & 0 & N_{Ic_y} & 0 & 0 & N_{Ic_z} & 0 & 0 & 0 & 0 \\
0 & 0 & N_{Ic_x} & 0 & 0 & N_{Ic_y} & 0 & 0 & N_{Ic_z} & 0 & 0 & 0
\end{pmatrix}   (32)

Correspondingly, we have k' defined as

k' = (N_{1o_x}\ N_{1o_y}\ N_{1o_z}\ \ldots\ N_{Io_x}\ N_{Io_y}\ N_{Io_z})^t   (33)

To implement the constraint in the least-squares framework, we can augment the matrix W in equation 24 with Q', yielding W', and augment the vector b in equation 25 with k', yielding b', where W' and b' are defined as follows:

W' = \begin{pmatrix} W \\ Q' \end{pmatrix}, \qquad b' = \begin{pmatrix} b \\ k' \end{pmatrix}

Putting it all together, the solution for V can be found by minimizing \|W'V - b'\|^2, and is given by

V = (W'^t W')^{-1} W'^t b'   (34)

The resultant transformation parameters R and T are more accurate and robust due to the fusion of information from different sources. The resultant rotation matrix R is also very close to being orthonormal, since the orthonormality constraints have been implicitly added to the system of linear equations used in the least-squares estimation. The obtained R and T can be used directly in certain applications or fed to an iterative procedure to refine the solution. Since the obtained transformation parameters are accurate, the subsequent iterative procedure converges very quickly, usually after a couple of iterations, as evidenced by our experiments. In the section to follow, we study the performance of the new linear pose estimation method against methods that use only one type of geometric entity at a time, using synthetic and real images.
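A minimal sketch (ours) of the augmentation of equations 31-34: each circle contributes inverse-relation rows N_o = R^t N_c in addition to the forward rows already in W, which pulls the estimated entries of R toward orthonormality:

```python
import numpy as np

def inverse_normal_rows(Nc, No):
    """Three rows of Q' and entries of k' (equations 32 and 33): the
    inverse relation No = R^t Nc for one circle, in the same V ordering."""
    Qp = np.zeros((3, 12))
    for i in range(3):
        Qp[i, [i, i + 3, i + 6]] = Nc    # coefficients of column i+1 of R
    return Qp, np.asarray(No, dtype=float)

def constrained_solve(W, b, Qp, kp):
    """Equation 34: append the inverse-relation rows and re-solve, which
    approximately imposes the orthonormality constraints."""
    Wp = np.vstack([W, Qp])
    bp = np.concatenate([b, kp])
    V, *_ = np.linalg.lstsq(Wp, bp, rcond=None)
    return V[:9].reshape(3, 3), V[9:]
```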

7 Experiments

In this section, we present and discuss the results of a series of experiments aimed at characterizing the performance of the integrated linear pose estimation technique. Using both synthetic data and real images of industrial parts, the experiments were conducted to study the effectiveness of the proposed technique for approximately imposing the orthonormality constraints and to quantitatively evaluate the performance of the integrated linear technique against existing linear techniques.

7.1 Performance characterization via covariance propagation

Development of a new technique requires understanding of the technique's sensitivity and robustness, i.e., how input data perturbation affects the output quantity. This may be accomplished via error propagation. In this section, we introduce the technique we use to characterize the uncertainty associated with the estimated vector V. The technique analytically studies the uncertainty of V due to uncertainties in the 2D observed geometric entities such as image points, lines, and ellipses. The uncertainty of V, as characterized by its covariance matrix, may be used as a performance measure of the integrated technique. Based on the theory of error propagation, the technique assumes the perturbation on the input data to be additive, small, random noise, so that the system can be approximated by a first-order Taylor series expansion. For a comprehensive treatment of this topic, readers may refer to Haralick's work (Haralick, 1994).

Let U be a 2K \times 1 vector obtained by concatenating the K 2D image points U_i = (u_i v_i)^t, where i = 1, \ldots, K. Let L = (l_1 \ldots l_J) be a 3J \times 1 vector representing all 2D lines used, where l_i = (a_i b_i c_i)^t. Finally, let E = (e_1 e_2 \ldots e_I)^t be a 6I \times 1 vector for all involved 2D ellipses, where e_i is a 6 \times 1 vector containing the parameters of the i-th ellipse. Assume the random perturbation on U to be iid Gaussian with covariance matrix \Sigma_U. The random perturbations on L and on the ellipses E may be obtained by assuming the underlying image points to be perturbed by iid Gaussian noise; denote the resulting covariance matrices by \Sigma_L and \Sigma_E respectively.

From equation 34, we have

V = (W'^t W')^{-1} W'^t b'

V is therefore a function of U, L, and E. Let V = f(U, L, E); then, to first-order approximation, we have

\Delta V = \frac{\partial f}{\partial U} \Delta U + \frac{\partial f}{\partial L} \Delta L + \frac{\partial f}{\partial E} \Delta E   (35)

where the derivatives \frac{\partial f}{\partial U}, \frac{\partial f}{\partial L}, and \frac{\partial f}{\partial E} may be evaluated as follows:

\frac{\partial f}{\partial U} = \frac{\partial (W'^t W')^{-1}}{\partial U} W'^t b' + (W'^t W')^{-1} \frac{\partial W'^t}{\partial U} b' + (W'^t W')^{-1} W'^t \frac{\partial b'}{\partial U}

\frac{\partial f}{\partial L} = \frac{\partial (W'^t W')^{-1}}{\partial L} W'^t b' + (W'^t W')^{-1} \frac{\partial W'^t}{\partial L} b' + (W'^t W')^{-1} W'^t \frac{\partial b'}{\partial L}

\frac{\partial f}{\partial E} = \frac{\partial (W'^t W')^{-1}}{\partial E} W'^t b' + (W'^t W')^{-1} \frac{\partial W'^t}{\partial E} b' + (W'^t W')^{-1} W'^t \frac{\partial b'}{\partial E}   (36)

where \frac{\partial (W'^t W')^{-1}}{\partial U} can be derived using (W'^t W')^{-1} (W'^t W') = I, leading to

\frac{\partial (W'^t W')^{-1}}{\partial U} = -(W'^t W')^{-1} \frac{\partial (W'^t W')}{\partial U} (W'^t W')^{-1}   (37)

and

\frac{\partial (W'^t W')}{\partial U} = 2 \frac{\partial W'^t}{\partial U} W'

where \frac{\partial W'}{\partial U} = \left(\frac{\partial W'^t}{\partial U}\right)^t. In a similar fashion, we can compute the derivatives with respect to L and E. Given \frac{\partial f}{\partial U}, \frac{\partial f}{\partial L}, and \frac{\partial f}{\partial E}, we may compute \Sigma_V, the covariance matrix of the estimated vector V, by

\Sigma_V = E(\Delta V (\Delta V)^t) = \frac{\partial f}{\partial U} \Sigma_U \left(\frac{\partial f}{\partial U}\right)^t + \frac{\partial f}{\partial L} \Sigma_L \left(\frac{\partial f}{\partial L}\right)^t + \frac{\partial f}{\partial E} \Sigma_E \left(\frac{\partial f}{\partial E}\right)^t   (38)

assuming \Delta U, \Delta L, and \Delta E are independent of each other.
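The Jacobians in equations 35-38 can also be approximated numerically; the sketch below (ours) does so by central finite differences, assuming a hypothetical helper pose_vector(U, L, E) that rebuilds W' and b' from the perturbed measurements and returns the solution V of equation 34:

```python
import numpy as np

def propagate_covariance(pose_vector, U, L, E, Sigma_U, Sigma_L, Sigma_E, h=1e-6):
    """First-order covariance of V (equation 38) with Jacobians of
    V = f(U, L, E) estimated by central finite differences."""
    def jacobian(g, x):
        y0 = g(x)
        J = np.zeros((len(y0), len(x)))
        for j in range(len(x)):
            d = np.zeros_like(x)
            d[j] = h
            J[:, j] = (g(x + d) - g(x - d)) / (2 * h)
        return J
    J_U = jacobian(lambda u: pose_vector(u, L, E), np.asarray(U, float))
    J_L = jacobian(lambda l: pose_vector(U, l, E), np.asarray(L, float))
    J_E = jacobian(lambda e: pose_vector(U, L, e), np.asarray(E, float))
    return (J_U @ Sigma_U @ J_U.T + J_L @ Sigma_L @ J_L.T
            + J_E @ Sigma_E @ J_E.T)
```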

7.2 Performance study with synthetic data

This section consists of two parts. First, we present results from a large number of controlled experiments aimed at analyzing the effectiveness of our technique for imposing orthonormality constraints. This was accomplished by comparing the errors of the estimated rotation and translation vectors obtained with and without the orthonormality constraints imposed, under different conditions. Second, we discuss the results of a comparative performance study of the integrated linear technique against an existing linear technique under different noise conditions.

In the experiments with simulated data, the 3D data (3D point coordinates, 3D surface normals, 3D line direction cosines) are generated randomly within specified ranges. For example, 3D coordinates are randomly generated within the cube [(-5, -5, -5) to (5, 5, 5)]. 2D data are generated by projecting the 3D data onto the image plane and then perturbing the projected image data with iid Gaussian noise of mean 0 and standard deviation \sigma. From the generated 2D-3D data, we estimate the rotation and translation vectors using the linear algorithm, from which we compute the estimation errors. The estimation error is defined as the Euclidean distance between the estimated rotation (translation) vector and the ideal rotation (translation) vector. We choose the rotation matrix (vector) rather than other specific representations like Euler angles and quaternion parameters for error analysis, because all other representations depend on the estimated rotation vector. For each experiment, 100 trials are performed and the average distance errors are computed. The noise level is quantified using the signal-to-noise ratio (SNR), defined as 20 \log_{10}(d/\sigma), where \sigma is the standard deviation of the Gaussian noise and d is the range of the quantity being perturbed.

Figures 3 and 4 plot the mean rotation and translation errors as a function of the signal-to-noise ratio, with and without orthonormality constraints imposed. It is clear from the two figures that imposing the orthonormality constraints reduces the estimation errors for both the rotation and translation vectors. The improvement is especially significant when the SNR is low.

To further study the effectiveness of the technique for imposing constraints, we studied

its performance under different numbers of ellipse/circle correspondence pairs. This experiment is intended to study the efficacy of imposing orthonormality constraints versus the amount of geometric data used for the least-squares estimation. The results are plotted in Figures 5 and 6, which give the average rotation and translation errors as a function of the number of ellipse/circle pairs used, with and without constraints imposed. The two figures again show that imposing orthonormality constraints leads to an improvement in estimation errors. This improvement, however, begins to taper off when the amount of data used exceeds a certain threshold. The technique is most effective when fewer ellipse/circle pairs are used. This echoes the conclusion drawn from the previous two figures.

To compare the integrated linear technique with an existing linear technique, we studied its performance against that of Faugeras (Faugeras, 1993b). The results are given in Figures 7 and 8, which plot the mean rotation and translation vector errors as a function of the SNR, respectively. The two figures clearly show the superiority of the new integrated technique over Faugeras' linear technique, especially for the translation errors. To further compare the sensitivity of the two techniques to the viewing parameters, we changed the position parameters of the camera by increasing z. Figures 9 and 10 plot the mean rotation and translation vector errors as a function of SNR under the new camera position. While increasing z causes an increase in the estimation errors for both techniques, its impact on Faugeras' technique is more serious. This leads to a much more noticeable performance difference between the two linear techniques. The fact that the integrated technique using only five geometric entities (1 point, 1 line, and 3 circles) still outperforms Faugeras' technique, which uses 10 points, shows that higher-level geometric features such as lines and circles can provide more robust solutions than those provided solely by points. This demonstrates the power of combining features at different levels of abstraction. Our study also shows that Faugeras' linear technique is very sensitive to noise when the number of points used is close to the required minimum. For example, when only six points are used, a small perturbation of the input data can cause significant errors in the estimated parameters, especially the translation vector. Figures 9 and 10 reveal that the technique using only points is numerically unstable with respect to the viewing parameters.
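For reference, a minimal sketch (ours) of the perturbation model used in the simulations of this section, with the noise level derived from the SNR definition above:

```python
import numpy as np

def perturb(data, snr_db, rng=None):
    """Add iid Gaussian noise at a prescribed SNR, where
    SNR = 20 * log10(d / sigma) and d is the range of the data."""
    rng = np.random.default_rng() if rng is None else rng
    d = float(data.max() - data.min())
    sigma = d / 10.0 ** (snr_db / 20.0)
    return data + rng.normal(0.0, sigma, size=data.shape)
```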

7.3 Performance characterization with real images

This section presents results obtained using real images of industrial parts. The images contain linear features (points and lines) and non-linear features (circles). This phase of the experiments consists of two parts. First, we visually assess the performance of the linear least-squares framework using different combinations of geometric entities, such as one circle and six points; one circle and two points; and one circle, one point, and two lines. The performance of the proposed technique is judged by visual inspection of the alignment between the image of a part and the reprojected outline of the part using the estimated transformation matrix. Second, the technique is compared against existing methods that use only one type of geometric entity, as well as against the Gauss-Newton iterative method. The closeness between the solutions from the linear method and from the iterative method, as represented by the residual errors, as well as the number of iterations required, are used as measures of the goodness of the solution obtained using the new method.

To test the integrated technique, we performed the following experiments. The pose transformation (R, T) was analytically computed as described in Section 6 using different combinations of points, lines, and circles. Figures 11, 12, and 13 illustrate the results obtained using the following combinations of features: two points plus one circle; one point, two lines, and one circle; and six points plus one circle, respectively. Visual inspection of the figures reveals that the results obtained from the three configurations are all good enough to serve as an initial guess for an iterative procedure. It is also evident from Figures 11, 12, and 13 that the result using six points and one circle is superior to the ones obtained using the other two configurations. The significance of these sample results is as follows. First, they demonstrate the feasibility of the proposed framework applied to real image data. Second, they show that using multiple geometric primitives simultaneously to compute the pose reduces the dependency on points. One can be more selective when choosing which point correspondences to use in pose estimation. This can potentially improve the robustness of the estimation procedure, since points are more susceptible to noise than lines and circles. Third, the use of more than the minimum required number of geometric features provides redundancy to the least-squares estimation, therefore improving the accuracy of the solution,

as evidenced by the progressively improved results as the number of linear equations increases. In order to compare the results with those of other existing techniques, we computed the pose of the same object using the same six points and the same circle, separately. The result of the pose computation using a linear technique (Ji and Costa, 1997) (similar to that of Faugeras (Faugeras, 1993a)) with six points is given in Figure 14. The algorithms of Dhome (Dhome et al., 1989) and Forsyth (Forsyth et al., 1991) for the pose-from-circle computation were augmented in (Costa, 1997) to handle non-rotationally symmetric objects. The result of this augmented algorithm using the single ellipse/circle correspondence is shown in Figure 15. Notice that due to the localized concentration of detectable feature points and the physical distance between the circle and these points, the computed poses align well only in the areas where the features used are located. Specifically, the result in Figure 15 shows good alignment in the upper portion of the object, where the circle is located, and poor alignment in the lower part (as indicated by the arrow). On the other hand, the result in Figure 14 shows good alignment only at the lower part of the object, where the concentration of detectable feature points is located, and poor alignment on the upper part of the object (as indicated by the arrow). Visual inspection of the results in Figures 13, 14, and 15 shows the superiority of the new technique over the existing methods. The model reprojection using the transformation matrix obtained with the new technique yields a better alignment than those using only points or only the ellipse/circle correspondence.

To compare the performance quantitatively, we compared the transformation matrices obtained using the three methods against the one obtained from the iterative procedure. Table 1 shows the numerical results for the transformations obtained using only points, only the circle, and a combination of points and circle. The results from each method were then used as the initial guess for the iterative Gauss-Newton method. The final transformation obtained after the convergence of the iterative method is shown in the last row of Table 1. These final results are the same regardless of which initial guess was used.


Table 1. Pose transformations from different methods

Method             R                              T
point only         [ 0.410  -0.129  -0.902 ]      [-43.125  -25.511  1232.036]
                   [ 0.606  -0.700   0.376 ]
                   [-0.681  -0.701  -0.208 ]
circle only        [ 0.302   0.302  -0.932 ]      [-35.161  -15.358  1195.293]
                   [ 0.692  -0.628   0.355 ]
                   [-0.655  -0.753  -0.054 ]
points and circle  [ 0.398  -0.142  -0.902 ]      [-43.077  -26.400  1217.855]
                   [ 0.554  -0.667   0.336 ]
                   [-0.700  -0.684  -0.201 ]
non-linear         [ 0.341  -0.156  -0.927 ]      [-43.23   -28.254  1273.07 ]
                   [ 0.631  -0.693   0.349 ]
                   [-0.697  -0.704  -0.137 ]

Table 2 summarizes the number of iterations required for the iterative procedure to converge using as initial guesses the results from the three linear methods mentioned in Table 1.

Table 2. Performance of different methods

Method              Number of iterations
Points only         4
Circle only         6
Points and circle   1

It is evident from Tables 1 and 2 that the new technique yields a transformation matrix that is closer to the one obtained from the iterative procedure and therefore requires fewer iterations (one here) for the iterative method to converge. By contrast, the initial guesses obtained using only points and only one circle require 4 and 6 iterations, respectively, for the iterative procedure to converge. The result of the quantitative study echoes the conclusion from visual inspection: the new technique offers better estimation accuracy.

To further validate our technique, we tested it on over fifty real images with similar results. Figures 17 and 18 illustrate the results obtained using six points and one circle for different images and different objects. Figure 19 gives results of applying the integrated technique to different industrial parts with different combinations of geometric features. Our experiments also reveal, as was theoretically expected for a system of linear equations of the form AX = b, a decay in robustness when the number of equations in the linear system approaches the minimum required for a solution to be found. The practical implication is that one should make use of as many available features as possible in order to improve accuracy and robustness. Our technique allows for that.

8 Conclusions

In this paper, we presented a linear solution to the exterior camera parameter (pose) estimation problem. The main contributions of this research are the linear framework for fusing information available from different geometric entities and a novel technique that approximately imposes the orthonormality constraints on the sought rotation matrix. Experimental evaluation using both synthetic data and real images shows the effectiveness of our technique for imposing orthonormality constraints in reducing estimation errors. The technique is especially effective when the SNR is low or when fewer geometric entities are used. The performance study also revealed the superiority of the integrated technique over a competing linear technique in terms of robustness and accuracy. Experiments are still underway to further evaluate the efficiency of the technique using a larger data set (see http://george.ee.washington.edu/~qiangji/Research for more experimental results). The new technique proposed in this paper is ideal for applications such as industrial automation, where robustness, accuracy, computational efficiency, and speed are needed. Its results can also be used as initial estimates in applications where more accurate camera parameters are needed.

References

Chen, S.-Y. and Tsai, W.-H. (1990). Systematic approach to analytic determination of camera parameters by line features. Pattern Recognition, 23(8):859-897.

Costa, M. S. (1997). Object recognition and pose estimation using appearance-based features and relational indexing. Ph.D. Dissertation, University of Washington, Seattle.

Dhome, M., et al. (1989). Spatial localization of modeled objects of revolution in monocular perspective vision. In First European Conference on Computer Vision.

Echigo, T. (1990). Camera calibration technique using three sets of parallel lines. Machine Vision and Applications, 3(3):159-167.

Faugeras, O. (1993a). Three-Dimensional Computer Vision. The MIT Press, Cambridge, Massachusetts.

Faugeras, O. D. (1993b). Three Dimensional Computer Vision: A Geometric Viewpoint. MIT Press.

Fischler, M. A. and Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381-395.

Forsyth, D., Mundy, J. L., Zisserman, A., et al. (1991). Invariant descriptors for 3d object recognition and pose. IEEE Trans. on Pattern Analysis and Machine Intelligence, 13(10):971-991.

Haralick, R. M. (1994). Propagating covariance in computer vision. Proc. of 12th IAPR, pages 493-498.

Haralick, R. M., Joo, H., Lee, C., Zhang, X., Vaidya, V., and Kim, M. (1989). Pose estimation from corresponding point data. IEEE Trans. on Systems, Man, and Cybernetics, 19(6):1426-1446.

Haralick, R. M., Lee, C., Ottenberg, K., and Nolle, M. (1994). Review and analysis of solutions of the three point perspective pose estimation problem. International Journal of Computer Vision, 13(3):331-356.

Haralick, R. M. and Shapiro, L. G. (1993). Computer and Robot Vision, volume 2. Addison-Wesley Publishing Company.

Holt, R. J. and Netravali, A. N. (1991). Camera calibration problem: some new results. Computer Vision, Graphics, and Image Processing, 54(3):368-383.

Horn, B. K. P. (1987). Closed-form solution of absolute orientation using quaternions. J. of Optical Society Am. A, 4:629-642.

Hung, Y., Yeh, P. S., and Harwood, D. (1985). Passive ranging to known planar point sets. Proceedings of IEEE Int. Conf. on Robotics and Automation, pages 80-85.

Ji, Q. and Costa, M. S. (1997). New linear techniques for pose estimation using point correspondences. Intelligent Systems Lab Technical Report #ISL-04-97.

Liu, Y., Huang, T., and Faugeras, O. D. (1990). Determination of camera locations from 2d to 3d line and point correspondences. IEEE Trans. on Pattern Analysis and Machine Intelligence, 12(1):28-37.

Phong, T. Q., Horaud, R., Yassine, A., and Tao, P. (1995). Object pose from 2-d to 3-d point and line correspondences. Int. Journal of Computer Vision, 15:225-243.

Rothwell, C. A., Zisserman, A., Marinos, C. I., Forsyth, D., and Mundy, J. L. (1992). Relative motion and pose from arbitrary plane curves. Image and Vision Computing, 10(4):251-262.

Song, D. M. (1993). Conics-based stereo, motion estimation, and pose determination. International Journal of Computer Vision, 10(1):7-25.

Sutherland, I. E. (1974). Three-dimensional data input by tablet. Proceedings of the IEEE, 62(4):453-461.

Tsai, R. (1987). A versatile camera calibration technique for high-accuracy 3d machine vision metrology using off-the-shelf tv cameras and lenses. IEEE Journal of Robotics and Automation, 3:223-244.

Wang, X. and Xu, G. (1996). Camera parameters estimation and evaluation in active vision system. Pattern Recognition, 29(3):439-447.


Figure 3: Average rotation vector error versus SNR. The plot was generated using the integrated linear technique with a combination of one point, one line and three circles. Each point represents an average of 100 trials.


Figure 4: Average translation vector error versus SNR. The plot was generated using the integrated linear technique with a combination of one point, one line and three circles. Each point represents an average of 100 trials.


Figure 5: Mean rotation vector error versus the number of ellipse/circle pairs (SNR=35)


Figure 6: Mean translation vector error versus the number of ellipse/circle pairs (SNR=35)


Figure 7: Mean rotation vector error versus SNR. The curve for Faugeras' was obtained using 10 points while the curve for the integrated technique was generated using a combination of one point, one line, and three circles.


Figure 8: Mean translation vector error versus SNR. The curve for Faugeras' was obtained using 10 points while the curve for the integrated technique was generated using a combination of one point, one line, and three circles.


Figure 9: Mean rotation vector error versus SNR with an increased camera position parameter z. The curve for Faugeras' was obtained using 10 points while the curve for the integrated technique was generated using a combination of one point, one line, and three circles.


Figure 10: Mean translation vector error versus SNR with an increased camera position parameter z. The curve for Faugeras' was generated using 10 points while the curve for the integrated technique was generated using a combination of one point, one line, and three circles.

Figure 11: The pose computed using two points and one circle with the integrated technique.

Figure 12: The pose computed using one point, two lines, and one circle with the integrated technique.

Figure 13: The pose computed using six points and one circle with the integrated technique.

Figure 14: The pose computed using six points alone. It shows good alignment only at the lower part of the object where the concentration of detectable feature points is located and a poor alignment on the upper part of the object (as indicated by the arrow).

Figure 15: The pose computed using a single circle. It shows a good alignment in the upper portion of the object where the circle is located and a poor alignment in the lower part (as indicated by the arrow).

Figure 16: The final pose obtained from the Gauss-Newton iterative method after only one iteration, using the solution in Figure 13 as the initial solution.

Figure 17: Results of the linear method using six points and one circle.

Figure 18: Results of the linear method using six points and one circle.


Figure 19: Results of the integrated method applied to different industrial parts.
