An Invariant, Closed-form Solution for Matching Sets of 3D Lines

Behzad Kamgar-Parsi
Office of Naval Research
800 N. Quincy St., Arlington, VA 22217
[email protected]
Behrooz Kamgar-Parsi
Navy Center for Applied Research in AI
Naval Research Laboratory
Washington, DC 20375
[email protected]

(This work was supported by the Office of Naval Research.)

Abstract
Existing algorithms for finding the best match between two sets of 3D lines are not completely satisfactory in the sense that they either yield approximate solutions, or are iterative, which means they may not converge to the globally optimal solution. An even more serious shortcoming of the existing algorithms is that they are all non-invariant with respect to the translation of the coordinate system. Thus, any best match found becomes rather meaningless. In this paper, we discuss the source of this non-invariance and present a new algorithm that is invariant to coordinate transforms. Moreover, the algorithm is closed-form, which implies that it always yields the best global match.
1 Introduction
Matching geometric features (points, lines, surfaces, etc.) is a basic tool in computer vision, with applications in scene registration, object localization and recognition, pose estimation, motion estimation, and others. In this paper we discuss matching two sets of N corresponding 3D lines (matching 2D line sets is a special case). The line sets may be extracted from a model and an image, or from two images. We refer to the two sets of lines as the model, A, and the image, X. We want to find the rigid transformation (translation and rotation) that gives the best match (fit, alignment, registration) of X with A. Lines in models and images are typically line segments of finite length. However, under certain conditions we may not be able to obtain the line segment length. In cases where we can reliably detect the line segment length, we have a finite-length line segment whose usable information is the line location, direction, and length. Otherwise we have an infinite line whose usable data are only the line location and direction. Hence, in terms of line lengths, we may encounter the following three basic cases in line matching:
(1) lines in both A and X are finite; (2) lines in A are finite, while lines in X are infinite; (3) lines in both A and X are infinite. Mixed cases may be handled by combining these basic cases. For cases 1 and 2, provably convergent (and almost always optimal) algorithms have appeared in the literature [9]. Existing algorithms for case 3 are not completely satisfactory. Their shortcomings are that they either yield approximate solutions, are iterative and may not converge to the best global match, or cannot be guaranteed to converge. An even more serious shortcoming of the existing algorithms is that they are all non-invariant with respect to translation of the coordinate system. This non-invariance obviously renders the best matches found by these algorithms rather meaningless. In this paper, we discuss the source of this non-invariance and present a new algorithm that is invariant to coordinate transforms. Moreover, the algorithm is closed-form, which means it always yields the best global match. Here we cite existing algorithms. Faugeras and Hebert [5] proposed a method that cannot be guaranteed to converge. Subsequent modifications find approximate solutions [13]. Daniilidis [4] and Walker et al. [11] use a dual quaternion approach, which leads to a closed-form approximate solution. The authors [9] propose a provably convergent algorithm that appears to find the best exact match. We will discuss these algorithms in more detail in Sec. 3. The algorithms by Besl and McKay [2] and Zhang [12] can also be tailored to solve case 3 by approximating lines in the image set with a number of points. Again, these are iterative algorithms that may not converge to the best match. Bartoli and Sturm [1] also propose algorithms with asymmetric cost functions defined with respect to matches in a 2D image, rather than matches in 3D as in this work and the others cited above. In all these algorithms (as well as the algorithm we present in this paper) it is assumed that lines are
already detected and that corresponding lines in the model and image are hypothesized. We should note that detection of 3D lines is a nontrivial problem in early vision, which we do not address here, and that often a challenging aspect of computer vision problems is determining correspondences. Certain pairings of features in the image and model can be eliminated by geometric constraints; all others have to be examined based on the goodness of match [6]. Indeed, the matching algorithm we present here may be used to reliably and rapidly verify hypothesized correspondences. In matching lines we have to simultaneously determine the coordinate transform as well as corresponding points on corresponding lines. That is, even after we hypothesize corresponding pairs of lines, we still have to find corresponding points on each line pair. Therefore, this is an (N + 6)-variable non-linear optimization (6 variables for rotation and translation and N for the corresponding points). The line matching algorithm we present here is built on a closed-form solution for matching sets of corresponding line segments with equal lengths [7][8]. We briefly present that solution, then discuss the previous solutions, and present the new method. The matching method we present in this paper, like most approaches in the literature, is based on regression. The evidence accumulation approach (e.g. the Hough transform method) has also been used for 3D line matching; see, e.g., [3]. A related, and even more challenging, problem is matching projections of lines in an image to a 3D model or to lines in another image; see, for example, [1][10]. Proposed solutions for this problem are also unsatisfactory, for the reasons mentioned above. However, we will not discuss them in this paper.
2 Equal-Length Line Segments
The best match between two sets of 3D lines in which corresponding pairs have equal lengths can be calculated in closed-form. Here we include a brief description of the solution and refer to [7] or [8] for details. Consider two sets of (directed) line segments: the model set $A = \{A_n\}$ and the image set $X = \{X_n\}$, where $n = 1, \ldots, N$ and $A_n$ and $X_n$ are a pair of corresponding line segments with the same length. We represent the line segment $A_n$ by its center point $a_n$, the unit vector along its direction $\hat{b}_n$, and its length $l_n$, i.e. $A_n = (a_n, \hat{b}_n, l_n)$. Similarly, $X_n = (x_n, \hat{y}_n, l_n)$. The transformation $T = (t, R)$ that operates on $X$ is given by
$$T x_n = t + R\, x_n, \qquad T \hat{y}_n = R\, \hat{y}_n, \quad (1)$$
where $t$ and $R$ are the translation vector and rotation matrix. The distance measure $M(A, TX)$ between the two sets is defined as the sum of squared Euclidean distances of corresponding pairs:
$$M(A, TX) = \sum_{n=1}^{N} \left[\, l_n \| a_n - t - R x_n \|^2 + l_n^3 (1 - \hat{b}_n^\top R \hat{y}_n)/6 \,\right]. \quad (2)$$
Vectors are column matrices, $\| a_n \|$ is the length of $a_n$, and $\top$ denotes matrix transpose. The minimum distance measure, $M^\ast = \min_T M(A, TX)$, is called the mismatch measure. The advantage of this distance measure is that the transformation minimizing it can be found in closed-form. First the rotation matrix $R$ is computed in closed-form, using the quaternion representation, from the $3 \times 3$ cross-covariance matrix $S$,
$$S = \sum_n \left[\, l_n (a_n - \tilde{a})(x_n - \tilde{x})^\top + l_n^3\, \hat{b}_n \hat{y}_n^\top / 12 \,\right], \quad (3)$$
and then the translation $t = \tilde{a} - R\, \tilde{x}$ is calculated. Here
$$\tilde{a} = \sum_n w_n a_n, \qquad \tilde{x} = \sum_n w_n x_n, \qquad w_n = l_n \Big/ \sum_n l_n .$$
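To make the closed-form recipe concrete, the following is a minimal numerical sketch (our illustration, not the authors' code) of Eqs. (2)-(3) and the translation formula, assuming numpy arrays; for brevity it recovers $R$ from $S$ with an SVD rather than the quaternion eigenvector construction used in [7][8], which yields the same optimal rotation.

```python
import numpy as np

def match_equal_length_segments(a, b_hat, x, y_hat, l):
    """Closed-form fit of image segments (x_n, y_hat_n, l_n) to model segments
    (a_n, b_hat_n, l_n): returns (R, t) minimizing the distance measure of Eq. (2).
    a, x are (N, 3) midpoints; b_hat, y_hat are (N, 3) unit directions; l is (N,)."""
    w = l / l.sum()                                   # weights w_n = l_n / sum_n l_n
    a_bar = w @ a                                     # weighted centroid a~
    x_bar = w @ x                                     # weighted centroid x~
    # Cross-covariance matrix S of Eq. (3)
    S = np.zeros((3, 3))
    for an, bn, xn, yn, ln in zip(a, b_hat, x, y_hat, l):
        S += ln * np.outer(an - a_bar, xn - x_bar) + (ln ** 3 / 12.0) * np.outer(bn, yn)
    # Rotation maximizing trace(R^T S); an SVD is used here in place of the
    # quaternion eigenvector construction of [7][8] -- both yield the same optimal R.
    U, _, Vt = np.linalg.svd(S)
    R = U @ np.diag([1.0, 1.0, np.linalg.det(U @ Vt)]) @ Vt   # guard against reflection
    t = a_bar - R @ x_bar                             # translation t = a~ - R x~
    return R, t
```

The sketches in the later sections reuse this helper as their inner closed-form step.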
3 Previous Solutions
In this section, we briefly discuss existing algorithms and show why they are not invariant with respect to coordinate transforms. The outline of the approach to matching two sets of $N$ corresponding lines is the following. Suppose $A = \{A_n\}$, where $A_n = (a_n, \hat{b}_n, l_n)$ is a line segment represented by one of its points $a_n$, its direction $\hat{b}_n$, and its length $l_n$. Similarly, $X = \{X_n\}$, where $X_n = (x_n, \hat{y}_n, L_n)$. Suppose that $a_n$ and $x_n + s_n \hat{y}_n$ are a pair of corresponding points on $A_n$ and $X_n$, where $s_n$ is a scalar parameter that we have to determine. The distance measure between these two lines, $D(A_n, X_n)$, is then the sum of the distances between all pairs of corresponding points, i.e.,
$$D(A_n, X_n) = \int_{\Omega_n} du\; \mathrm{dist}\big(a_n + u \hat{b}_n,\; x_n + s_n \hat{y}_n + u \hat{y}_n\big),$$
where $u$ is a scalar variable that parameterizes the lines, $\Omega_n$ is the overlap between $A_n$ and $X_n$, and $\mathrm{dist}(a, x)$ is the distance function between points $a$ and $x$. The distance measure between the model and the transformed image is
$$M(A, TX) = \sum_{n=1}^{N} D(A_n, T X_n).$$
This definition of $D(A_n, X_n)$ is meaningful for cases 1 and 2, because the overlap lengths $\Omega_n$ are finite. However, it needs to be modified for case 3, since it becomes infinite for infinite lines. Hence, we have to limit the overlap length; that is, we have to make a compromise between the importance of line location and line orientation. By choosing a longer overlap length we would be favoring orientation, and vice versa. The best match between the two sets is obtained by minimizing $M(A, TX)$ over all possible $T$ and $\{s_n\}$. It can be proved that this is a nonlinear problem even for the most tractable dist-function, namely the Euclidean $L_2$ norm.
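As a small illustration of this formulation (ours, with assumed inputs), the sketch below evaluates $D(A_n, X_n)$ by numerical quadrature for a chosen symmetric overlap interval and a given shift $s_n$, taking dist to be the squared Euclidean distance used elsewhere in the paper. Because the orientation contribution grows with the cube of the overlap length, lengthening the overlap increasingly favors orientation over location, which is the compromise described above.

```python
import numpy as np

def segment_distance(a_n, b_hat_n, x_n, y_hat_n, s_n, overlap, num=200):
    """Quadrature approximation of D(A_n, X_n) over a symmetric overlap interval,
    with dist taken as the squared Euclidean distance (an assumption of this sketch)."""
    u = np.linspace(-overlap / 2.0, overlap / 2.0, num)
    pts_a = a_n + u[:, None] * b_hat_n                      # points a_n + u b_hat_n
    pts_x = x_n + s_n * y_hat_n + u[:, None] * y_hat_n      # points x_n + s_n y_hat_n + u y_hat_n
    return np.trapz(np.sum((pts_a - pts_x) ** 2, axis=1), u)
```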
3.1 Convergent Method (CM)
The authors [9] treat this case as matching a set of infinitely long line segments and a set of finite line segments all with the same length $l$, because all lines must make the same contribution ($l$ is arbitrarily long). We represent the model lines by $A_n = (a_n, \hat{b}_n)$, where $\hat{b}_n$ is its direction and $a_n$ is its closest point to the coordinate system origin, i.e. $a_n^\top \hat{b}_n = 0$. The image line segments are represented by $X_n = (x_n, \hat{y}_n, l)$, where $l$ is its length, $\hat{y}_n$ its direction, and $x_n$ its closest point to the coordinate system origin, i.e. $x_n^\top \hat{y}_n = 0$. The point corresponding to $a_n$ is $x_n + s_n \hat{y}_n$, the midpoint of the virtual line segment of length $l$. The shift parameters $s_n$ are unknown and have to be determined. The distance measure between the two sets is
$$M(A, TX) = \sum_{n=1}^{N} \left[\, l \| a_n - t - R (x_n + s_n \hat{y}_n) \|^2 + l^3 (1 - \hat{b}_n^\top R \hat{y}_n)/6 \,\right]. \quad (4)$$
The solutions of $\partial M / \partial s_n = 0$ are
$$s_n = (a_n - t)^\top R\, \hat{y}_n. \quad (5)$$
If $\{s_n\}$ are known, then the corresponding portions of the line segments in set $A$ are specified, and the problem reduces to the simpler problem of matching sets of corresponding line segments with equal lengths, for which a closed-form solution exists. When $s_n$ is given, the point $x_n + s_n \hat{y}_n$ on line segment $X_n$ corresponds to the center of line segment $A_n$. Thus, if we replace $x_n$ with $x_n + s_n \hat{y}_n$ everywhere in Sec. 2, i.e. match line segments $(x_n + s_n \hat{y}_n, \hat{y}_n, l)$ and $A_n = (a_n, \hat{b}_n, l)$, then we may compute $t$ and $R$. Having calculated $t$ and $R$, the values of $\{s_n\}$ can be improved according to (5). The following convergent iterative algorithm finds the best match:
1. Initialize the set $\{s_n\}$.
2. Replace $x_n$ by $x_n + s_n \hat{y}_n$ and compute $S$ from (3); then compute $R$ and $t$.
3. Update the set $\{s_n\}$ according to (5). Go to step 2, and repeat until all $s_n$ converge.
The proof that the algorithm converges is straightforward. For given values of $\{s_n\}$, the rotation and translation reduce (in fact minimize) the value of the distance measure, and updating the values of $\{s_n\}$ for the given rotation and translation further reduces (in fact minimizes) it. Since the distance measure is bounded from below, $M(A, TX) \ge 0$, the algorithm must converge after a number of iterations. Extensive tests show that the algorithm almost always converges to the global minimum, no matter how it is initialized. Thus the algorithm appears to be able to find the optimal solution.
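A compact sketch of the CM iteration follows (ours, not the authors' code); it reuses match_equal_length_segments from the Sec. 2 sketch and assumes $a_n$, $x_n$ are the closest points to the origin.

```python
import numpy as np

def convergent_match(a, b_hat, x, y_hat, l_virtual=1.0, iters=100, tol=1e-10):
    """Iterative convergent method (CM): alternate the closed-form equal-length
    fit of Sec. 2 with the shift update of Eq. (5)."""
    N = len(a)
    s = np.zeros(N)                              # step 1: initialize the shifts s_n
    l = np.full(N, l_virtual)                    # arbitrary virtual segment length l
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        # step 2: treat x_n + s_n y_hat_n as segment centers and solve in closed form
        R, t = match_equal_length_segments(a, b_hat, x + s[:, None] * y_hat, y_hat, l)
        # step 3: update the shifts by Eq. (5), s_n = (a_n - t)^T R y_hat_n
        s_new = np.einsum('ni,ni->n', a - t, y_hat @ R.T)
        if np.max(np.abs(s_new - s)) < tol:
            break
        s = s_new
    return R, t, s
```

Each half-step can only decrease the bounded distance measure, which is the convergence argument given above.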
3.2 Faugeras-Hebert Method (FH)
The distance measure used in the seminal paper of Faugeras and Hebert [5] has the form $M(A, TX) = K_2 M_{\mathrm{location}} + K_1 M_{\mathrm{orientation}}$, where $K_1$ and $K_2$ are user-set positive coefficients. Examination of the FH distance measure reveals that it can be cast in terms of the distance measure used in (4); the virtual length $l$ in CM and the coefficients in FH are related by $l = (6 K_1 / K_2)^{1/2}$. To minimize the distance measure, FH present an iterative algorithm that cannot be guaranteed to converge. The FH solution requires that the transformed point $z_n = T(x_n + s_n \hat{y}_n)$ be the closest point to the origin, which implies that $z_n^\top R \hat{y}_n = 0$. This constraint yields
$$s_n = -t^\top R\, \hat{y}_n. \quad (6)$$
The constraint (5) and the FH constraint (6) become identical when $a_n^\top R \hat{y}_n = 0$. This is satisfied exactly when (i) the data have zero noise, so that $R \hat{y}_n = \hat{b}_n$, and (ii) the line correspondences are correct. For noisy data, as well as for cases with incorrect line correspondences where matching becomes meaningless, this condition is not satisfied. Therefore, in general, the FH constraint (6) does not lead to a convergent solution. In subsequent papers, Faugeras and co-workers present various algorithms that yield approximate solutions, e.g. [13].
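The difference between the two constraints is exactly the term $a_n^\top R \hat{y}_n$, which the following throwaway check (ours, with assumed random values) makes explicit:

```python
import numpy as np

rng = np.random.default_rng(0)
a_n = rng.normal(size=3)                       # an arbitrary model point
y_n = rng.normal(size=3); y_n /= np.linalg.norm(y_n)
t = rng.normal(size=3)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # a random orthogonal matrix
R = Q * np.sign(np.linalg.det(Q))              # force det = +1 (a proper rotation)

s_cm = (a_n - t) @ (R @ y_n)                   # CM shift, Eq. (5)
s_fh = -t @ (R @ y_n)                          # FH shift, Eq. (6)
print(np.isclose(s_cm - s_fh, a_n @ (R @ y_n)))   # True: they differ by a_n^T R y_hat_n
```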
3.3 Dual Quaternion Method (DQ)
Daniilidis [4] and Walker et al. [11] propose an interesting solution based on the dual quaternion representation of lines. This representation is implicitly equivalent to setting the overlap length to infinity, which results in line orientations completely dominating line locations. This can be seen from the arbitrary parameter $\epsilon$, which has the property $\epsilon > 0$ but $\epsilon^2 = 0$; the parameter $1/\epsilon$ plays the role of the line length. The algorithm yields an approximate solution to the full matching problem, in which the coupling between rotation and translation is ignored. Because rotation and translation are decoupled, the solution may be obtained in closed-form.

Figure 1: An example of translating the coordinate system origin, which changes the best match. Solid lines are the model, dotted lines the image, and the origin is the black dot.

Figure 2: Moving the coordinate system origin may change the best match.
3.4 Best Matches Are Non-invariant
All previous solutions have a drawback in common: they are not invariant to the choice of the coordinate system origin. This is illustrated in Fig. 1. In this 2D example, the model consists of two vertical lines and one horizontal line. The image is made by rotating the model's right vertical line by 30 degrees. As can be seen, changing the coordinate system origin changes the best match of the image to the model. Fig. 1 shows the result of the best match for CM; FH and DQ behave similarly. The FH formulation is not invariant because it matches the distances of corresponding lines from the origin, which are not intrinsic properties of the line sets. A change of the origin changes these distances and hence the best match. The convergent formulation is not invariant because the selected line segments in the image are arbitrary. Changes in the selected line segments change the best match.
Note that when the model and image are identical, these formulations become invariant to the choice of the origin. However, as the similarity of model and image degrades, the best match becomes more sensitive to the location of the origin. Figure 2 illustrates this problem. Suppose $A$ and $X$ are corresponding lines, and suppose the figure shows the best match of $X$ to $A$ when $o_1$ is the coordinate system origin. Now suppose that the origin is moved to $o_2$. Because the location mismatch $\| x_1 - a \|$ increases to $\| x_2 - a \|$, these methods try to compensate for this increase by better alignment of the lines. This effect is seen clearly in line 1 of Fig. 1.
4 Invariant Method
The underlying cause of the non-invariance of the best match in previous formulations is the representation of lines. It is a well-known problem that a unique, satisfactory representation for a single line does not exist. A single line may be represented by its direction vector and the coordinates of an arbitrary point on the line (often chosen as the closest point to the origin, for a minimal number of variables, i.e. 4 instead of 5). Alternative representations, e.g. Plücker coordinates, are based on two arbitrarily chosen points on the line subject to certain constraints. All of these representations are based on the coordinate system origin; obviously, if the coordinate system is translated, the line representation changes. In all formulations, line locations are given with respect to the coordinate system origin, and this representation does not remain invariant under a coordinate change. However, having a set of lines, rather than single lines, offers a solution to the representation dilemma: it makes it possible to represent the lines in the set relative to a point which is fixed with respect to the set. In this way line representations remain unchanged if the coordinate system undergoes transformations. Even though this point can, in principle, be arbitrary, the reference points $c$ for set $A$ and $z$ for set $X$ must be corresponding points. However, we do not know beforehand the corresponding points, which are indeed what we set out to find (simultaneously with the best transformation).
4.1 The Algorithm
The reference point that appears to be the best choice (and the correct choice when the image and model are not noisy) is the point that is overall closest to all the lines, that is, the point that minimizes the sum of the distances from all lines in the set:
$$c = \arg\min_p \sum_n \mathrm{dist}(p, A_n). \quad (7)$$
Figure 3: (a) The best match with the new algorithm; corresponding points are indicated on the lines. (b) The effect of changing the length parameter $l$ ($l = 0.01, 1, 100$) on the best match.

If for the distance we use the Euclidean $L_2$ norm, then finding $c$ becomes a linear problem with the solution
$$c = U^{-1} v, \qquad U = \sum_n (I - \hat{b}_n \hat{b}_n^\top), \qquad v = \sum_n (a_n - a_n^\top \hat{b}_n\, \hat{b}_n), \quad (8)$$
where $I$ is the unit matrix. A similar equation holds for $z$, with the obvious replacement of $(a_n, \hat{b}_n)$ by $(x_n, \hat{y}_n)$. The corresponding points on $A_n$ and $X_n$ are thus the projections of $c$ on $A_n$ and of $z$ on $X_n$, namely
$$a'_n = a_n + \big((c - a_n)^\top \hat{b}_n\big)\, \hat{b}_n, \qquad x'_n = x_n + \big((z - x_n)^\top \hat{y}_n\big)\, \hat{y}_n. \quad (9)$$
Having thus specified the corresponding points of $A_n$ and $X_n$, we use the equal-length line segment matching of Sec. 2 to find the best match in closed-form, replacing $\{a_n, x_n, l_n\} \to \{a'_n, x'_n, l\}$ in (2) to compute the optimal transformation. This solution is obviously (a) invariant to coordinate system transforms, (b) closed-form, and (c) identical to the previous solutions when line correspondences are correct and noise is absent. When all the lines in the set are parallel, i.e. all $\hat{b}_n = \hat{b}$, an obvious ambiguity exists because $c$ is not unique: $c^\perp = \sum_n a_n^\perp / N$, while $c^\parallel$ is undetermined. (Superscripts $\perp$ and $\parallel$ denote the components perpendicular and parallel to $\hat{b}$.) This is a trivially degenerate case and can be handled easily: we only need to match the orientations, since matching to any location of such a set would be optimal.
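The complete invariant procedure amounts to only a few lines of code. The sketch below (ours, not the authors' implementation) computes the reference points of Eq. (8), the projections of Eq. (9), and then calls the equal-length solver from the Sec. 2 sketch; the degenerate all-parallel case discussed above is not handled.

```python
import numpy as np

def closest_point_to_lines(p, d_hat):
    """Reference point of Eqs. (7)-(8): c = U^{-1} v, the point minimizing the
    sum of squared distances to the lines (p_n, d_hat_n)."""
    U = np.zeros((3, 3))
    v = np.zeros(3)
    for pn, dn in zip(p, d_hat):
        P = np.eye(3) - np.outer(dn, dn)      # projector onto the plane normal to d_hat_n
        U += P
        v += P @ pn                           # equals p_n - (p_n . d_hat_n) d_hat_n
    return np.linalg.solve(U, v)

def invariant_match(a, b_hat, x, y_hat, l=1.0):
    """Invariant, closed-form matching of two sets of infinite 3D lines (Sec. 4.1)."""
    c = closest_point_to_lines(a, b_hat)      # reference point of the model set
    z = closest_point_to_lines(x, y_hat)      # reference point of the image set
    # Corresponding points of Eq. (9): project c and z onto the respective lines
    a_prime = a + np.einsum('ni,ni->n', c - a, b_hat)[:, None] * b_hat
    x_prime = x + np.einsum('ni,ni->n', z - x, y_hat)[:, None] * y_hat
    lengths = np.full(len(a), l)              # virtual length l; Sec. 4.3 finds the result
                                              # insensitive to l over a wide range
    return match_equal_length_segments(a_prime, b_hat, x_prime, y_hat, lengths)
```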
Figure 4: A polyhedral object and its edges (top row). A 3D edge image (middle left), and its best match to the model with $l = 0.01, 1, 100$. The image is shown by thick lines.
4.2 Why the Closest Point
The question arises as to why we should represent the lines in set $A$ with respect to $c$. As a justification for this choice, we argue that $c$ is likely the most stable point of $A$.
Indeed, we can show this to be true under certain strong assumptions. Suppose that instead of $c$ we use the weighted closest point
$$c_w = \arg\min_p \sum_n w_n\, \mathrm{dist}(p, A_n),$$
where $w_n \ge 0$ subject to the constraint $\sum_n w_n = N$. If all line parameters have small random perturbations with independent, identical distributions, then it follows that the average perturbation of $c_w$ is
$$\langle\, |\delta c_w|^2 \,\rangle \propto \sum_n w_n^2 .$$
The sum on the right-hand side is minimized when all $w_n = 1$, which means that the point $c$ is (on average) the most stable point under perturbations of set $A$. We should also note that $c$ varies with Gaussian noise only as $|\delta c| \propto 1/\sqrt{N}$, and hence is fairly stable.
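A quick Monte Carlo experiment is an easy way to see this effect. The sketch below (ours; a simplified, assumed setting that perturbs only the line points, not the directions, and uses squared distances) compares the spread of the weighted closest point for uniform weights against a skewed choice of weights summing to $N$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma, trials = 12, 1e-2, 2000
p0 = rng.normal(size=(N, 3))                                  # a point on each line
d = rng.normal(size=(N, 3)); d /= np.linalg.norm(d, axis=1, keepdims=True)

def weighted_closest_point(p, d_hat, w):
    U, v = np.zeros((3, 3)), np.zeros(3)
    for pn, dn, wn in zip(p, d_hat, w):
        P = wn * (np.eye(3) - np.outer(dn, dn))
        U += P
        v += P @ pn
    return np.linalg.solve(U, v)

def spread(w):
    ref = weighted_closest_point(p0, d, w)                    # unperturbed c_w
    devs = [weighted_closest_point(p0 + sigma * rng.normal(size=(N, 3)), d, w) - ref
            for _ in range(trials)]
    return np.mean([np.sum(dc ** 2) for dc in devs])          # estimate of <|delta c_w|^2>

w_uniform = np.ones(N)
w_skewed = N * rng.dirichlet(np.ones(N))                      # non-uniform, sums to N
print(spread(w_uniform), spread(w_skewed))                    # uniform is typically smaller
```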
4.3 Experiments
Fig. 3 shows the best solution obtained with this technique. The solution remains fairly stable even as the value of the virtual length $l$ varies considerably (from 0.01 to 100), as seen in Fig. 3(b). The new matching algorithm is closed-form, hence its validation is straightforward and does not require the extensive testing for convergence and optimality that previous algorithms would require.
Figure 5: Photograph of the grid, and a view of its 3D underwater acoustic image.

We have applied this algorithm to several data sets and the results are quite acceptable to visual inspection. For example, Fig. 4 shows synthetic images with $N = 18$. The best match remains fairly stable as $l$ varies by four orders of magnitude. Fig. 5 shows the photograph of a $12 \times 12$ steel grid, and its underwater acoustic image. Only 13 lines can be detected in the image: 6 horizontal and 7 vertical. Fig. 6 shows the best matches of the image to the model. This example also shows that the mismatch measure increases significantly when there are errors in the hypothesized line correspondences. The matches shown are with $l = 1$; results with $l = 0.01$ to $l = 100$ are visually indistinguishable.
5 Discussion
We have presented a solution to the open problem of matching sets of corresponding lines. The solution is optimal, invariant, and closed-form. The algorithm can be used to verify hypothesized correspondences rapidly and reliably. We expect that a similar approach may be used to develop reliable algorithms for the more challenging problem of matching projections of 3D lines in an image to a 3D model or to another image. The distance measure $M$ used in this paper is a least-squares (LS) error based on the Gaussian noise assumption. Such cost functions are well known to be sensitive to outliers and non-Gaussian noise. However, even in those cases the LS solution presented here is needed by many robust methods, e.g. LMedS.
Figure 6: Results of matching the grid image (solid lines) to the model (dotted lines): with correct correspondences (top left, $M^\ast = 0.71$), and with correspondences off by 1 row ($M^\ast = 12.2$), 1 column ($M^\ast = 10.2$), and 1 row & 1 column ($M^\ast = 21.7$).
References
[1] A. Bartoli and P. Sturm, "The 3D line motion matrix and alignment of line reconstructions," Proc. CVPR, Hawaii, Dec. 2001, Vol. 1, pp. 287-292.
[2] P.J. Besl and N.D. McKay, "A method for registration of 3-D shapes," IEEE Trans. PAMI, 14:239-256, 1992.
[3] H.H. Chen and T.S. Huang, "Matching 3-D line segments," IEEE Trans. PAMI, 12:1002-1008, 1990.
[4] K. Daniilidis, "Hand-eye calibration using dual quaternions," Int. J. Robotics Res., 18(3):286-298, 1999.
[5] O.D. Faugeras and M. Hebert, "The representation, recognition, and locating of 3-D objects," Int. J. Robotics Res., 5(3):27-52, 1986.
[6] W.E.L. Grimson, Object Recognition by Computer: The Role of Geometric Constraints, MIT Press, Cambridge, MA, 1990.
[7] D.R. Heisterkamp and P. Bhattacharya, "Matching of 3D polygonal arcs," IEEE Trans. PAMI, 19:68-73, 1997.
[8] B. Kamgar-Parsi and B. Kamgar-Parsi, "Matching sets of 3D line segments with application to polygonal arc matching," IEEE Trans. PAMI, 19:1090-1099, 1997.
[9] B. Kamgar-Parsi and B. Kamgar-Parsi, "An open problem in matching sets of 3D lines," Proc. CVPR 2001, Hawaii, Dec. 2001, Vol. 1, pp. 651-656.
[10] C.J. Taylor and D.J. Kriegman, "Structure and motion from line segments in multiple images," IEEE Trans. PAMI, 17(11):1021-1032, 1995.
[11] M.W. Walker, L. Shao, and R.A. Volz, "Estimating 3-D location parameters using dual number quaternions," CVGIP: Image Understanding, 54:358-367, 1991.
[12] Z. Zhang, "Iterative point matching for registration of free-form curves and surfaces," Int. J. Computer Vision, 13:119-152, 1994.
[13] Z. Zhang and O. Faugeras, 3D Dynamic Scene Analysis, Springer-Verlag, 1992.