DIIS - I3A, Universidad de Zaragoza, C/ María de Luna num. 1, E-50018 Zaragoza, Spain
Internal Report: 2005-V15
Robust line matching in image pairs of scenes with dominant planes¹
C. Sagüés, J.J. Guerrero
If you want to cite this report, please use the following reference instead: Robust line matching in image pairs of scenes with dominant planes, C. Sagüés, J.J. Guerrero, Optical Engineering, Vol. 45, no. 6, pp. 06724 1-12, 2006.
¹ This work was supported by projects DPI2000-1272 and DPI2003-07986.
Robust line matching in image pairs of scenes with dominant planes

Sagüés C., Guerrero J.J.
DIIS - I3A, Universidad de Zaragoza, C/ María de Luna num. 1, E-50018 Zaragoza, Spain.
Phone 34-976-762349, Fax 34-976-761914. csagues, [email protected]

Abstract

We address the robust matching of lines between two views, when the camera motion is unknown and dominant planar structures are viewed. The use of viewpoint non-invariant measures yields many unmatched or wrongly matched features. The inclusion of projective transformations gives much better results with little computational overhead. We use line features, which can usually be extracted more accurately than points and which can be used in cases of partial occlusion. In the first stage, the lines are matched to the weighted nearest neighbor using brightness-based and geometric-based image parameters. From these matches, robust homographies can be computed, which allows wrong matches to be rejected and new good matches to be added. When two or more planes are observed, the corresponding homographies can be computed, and they can also be used to obtain the fundamental matrix, which gives constraints for the whole scene. The simultaneous computation of matches and projective transformations is extremely useful in many applications. The proposal is shown to work in different situations, requiring only simple and intuitive parameter tuning.

Keywords: Machine vision, matching, lines, homographies, uncalibrated vision, plane segmentation, multi-plane scene.
1 Introduction
We address the problem of robust matching of lines in two images when the camera motion is unknown. Using lines instead of points has been considered in other works [1]. Straight lines can be accurately extracted in noisy images, they capture more information than points, especially in man-made environments, and they may be used when occlusions occur. The invariant regions used to match images, such as the well-known SIFT [2], often fail due to the lack of texture in man-made environments with homogeneous surfaces [3]. Fortunately, in this kind of scene line segments are available for matching. However, line matching is more difficult than point matching because the end points of the extracted lines are not reliable. Besides that, there is no geometric constraint for lines in two images analogous to the epipolar constraint for points.
The putative matching of features based on image parameters has many drawbacks, yielding unmatched or wrongly matched features. Previously, the problem of wide-baseline matching has been addressed by establishing a viewpoint-invariant affinity measure [4]. We use the homography in the matching process to select and to grow previous matches, which have been obtained by combining geometric and brightness image parameters. Scenes with dominant planes are usual in man-made environments, and the model for working with multiple views of them is well known. Points or lines on a world plane projected in one image are mapped to the corresponding points or lines in the other image by a plane-to-plane homography [5]. As mentioned, there is no geometric constraint for infinite lines in two images of a general scene, but the homography is a good constraint for scenes with dominant planes or for small-baseline image pairs, even when they have large relative rotation. Robust estimation techniques are currently indispensable to obtain results in real situations where outliers and spurious data are present [6, 7]. In this work the least median of squares and the RANSAC methods have been used to estimate the homography. They provide not only the solution in a robust way, but also a list of previous matches that are in disagreement with it, which allows wrong matches to be rejected. To compute homographies, points and lines are dual geometric entities; however, line-based algorithms are generally less common than point-based ones [8]. Thus, some particular problems related to data representation and normalization must be considered in practice. We compute the homographies from corresponding lines in two images making use of the classical normalization of point data [9]. The simultaneous computation of matches and projective transformation between images is useful in many applications, and the search for a single homography is appropriate if the scene features lie on a plane or for small-baseline image pairs.
Although this transformation is not exact in general situations, the practical results are also good when the distance from the camera to the scene is large with respect to the baseline. For example, this assumption gives very good results in robot homing [10], where image disparity is mainly due to camera rotation, and therefore a single homography captures the robot orientation, which is the most useful information for a robot to correct its trajectory. Our proposal can also be applied in photogrammetry for the automatic relative orientation of convergent image pairs. In this context points are the most commonly used features [11], but lines are plentiful in urban scenes. We have put our proposal into practice with aerial images and images of buildings, obtaining satisfactory results. If the lines lie on several planes, the corresponding homographies can be obtained iteratively. From them, the fundamental matrix, which gives a general constraint for the whole scene, can be computed [12]. We have also made experiments to segment several scene planes when they exist, obtaining line matching in that situation. This segmentation of planes could be very useful to automatically build 3D models of urban scenes. After this introduction, we present in §2 the matching of lines using geometric and brightness parameters. The robust estimation of homographies from lines is explained in §3. After that, we present in §4 the process to obtain the final matches using the geometrical constraint given by the homography. Then we present the multi-plane method in §5, and the useful case of having only vertical features in §6. Experimental results with real images are presented in §7. Finally, §8 presents the conclusions. A previous version of this matcher was presented in [13]; we have now completed and extended the work, showing the multi-plane method and the special case of having only vertical lines.
We also include experiments made with the new software version, which implements different robust techniques.

Figure 1: Left: Segmentation of the image into regions which have similar orientation of brightness gradient (LSR). Right: Zoom of the LSRs extracted on a real image. From the LSR, we obtain the line with its geometric description and also attributes related to its brightness and contrast, used for the line matching.
2 Basic matching
As said, the initial matching of straight lines in the images is made to the weighted nearest neighbor. These lines are extracted using our implementation of the method proposed by Burns [14], which computes spatial brightness gradients to detect the lines in the image. Pixels whose gradient magnitude is larger than a threshold are grouped into regions of similar direction of brightness gradient. These groups are named Line Support Regions (LSR). A least-squares fit within the LSR is used to obtain the line, and the detector gives not only the geometrical parameters of the lines, but also several attributes related to their brightness and some quality attributes, which provide very useful information to select and identify the lines (Fig. 1). We use not only the geometric parameters, but also the brightness attributes supplied by the contour extractor. So agl and c (average grey level and contrast) in the LSR are combined with geometric parameters of the segments such as the midpoint coordinates (xm, ym), the line orientation θ (in a 2π range, with dark on the right and bright on the left) and its length l. In several works, the matching is made over close images; in this setting, correspondence determination by tracking geometric information along the image sequence has been proposed as a good solution [15]. We determine correspondences between lines in two images with large disparity in the uncalibrated case. Significant motion between views, or changes in the lighting conditions of the viewed scenes, means that few if any of the defined line parameters remain invariant between images. For example, motion approaching the objects yields longer lines, which may also leave the image. Besides that, measurement noise introduces further deviations from the desired invariance of the parameters. It must be stated that the line segment is considered when selecting the nearest neighbor, but assuming higher uncertainty along the line than across it.
However, to compute afterwards the projective transformations between images, only the infinite-line information is considered. This provides an accurate solution that is robust to partial occlusions, without assuming that the tips of the extracted lines coincide.
2.1 Similarity measures
In the matching process two similarity measures are used: a geometric measure and a brightness measure. We name $r_g$ the difference of the geometric parameters between both images (1, 2):

$$r_g = \left( x_{m1} - x_{m2},\; y_{m1} - y_{m2},\; \theta_1 - \theta_2,\; l_1 - l_2 \right)^T.$$
As previously [15], we define the matrix R to express the uncertainty due to measurement noise in the extraction of features in each image:

$$R = \begin{pmatrix} \sigma_\perp^2 S^2 + \sigma_\parallel^2 C^2 & (\sigma_\perp^2 - \sigma_\parallel^2)\, C S & 0 & 0 \\ (\sigma_\perp^2 - \sigma_\parallel^2)\, C S & \sigma_\perp^2 C^2 + \sigma_\parallel^2 S^2 & 0 & 0 \\ 0 & 0 & \dfrac{2\sigma_\perp^2}{l^2} & 0 \\ 0 & 0 & 0 & 2\sigma_\parallel^2 \end{pmatrix},$$
where C = cos θ and S = sin θ. The location uncertainties of the segment tips along the line direction and along the orthogonal direction are represented by σ∥ and σ⊥, respectively. Additionally, we define the matrix P to represent the uncertainty in the difference of the geometric parameters due to camera motion and unknown scene structure:
$$P = \begin{pmatrix} \sigma_{x_m}^2 & 0 & 0 & 0 \\ 0 & \sigma_{y_m}^2 & 0 & 0 \\ 0 & 0 & \sigma_\theta^2 & 0 \\ 0 & 0 & 0 & \sigma_l^2 \end{pmatrix},$$
where $\sigma_i$ ($i = x_m, y_m, \theta, l$) represents the uncertainty of the variation of the geometric parameters of the image segment. Thus, from those matrices we introduce S = R1 + R2 + P to weigh the variations in the geometric parameters of corresponding lines due to both line extraction noise (R1 in the first image and R2 in the second image) and unknown structure and motion. Note in R that σ∥ is larger than σ⊥; therefore the measurement noises of xm and ym are coupled, and the line orientation indicates the direction along which the measurement noise is larger (along the line). However, in P the orientation does not matter, because the evolution of the line between images is mainly due to camera motion, which does not depend on the line orientation in the image. The matching in this first stage is made to the nearest neighbor. The similarity between the parameters can be measured with a Mahalanobis distance as $d_g = r_g^T S^{-1} r_g$. The second similarity measure is defined for the brightness parameters. In this case we have
$$B = \begin{pmatrix} \sigma_{agl}^2 & 0 \\ 0 & \sigma_c^2 \end{pmatrix},$$
where $\sigma_{agl}$ and $\sigma_c$ represent the uncertainty of the variations of the average gray level and the contrast of the line. Both depend on measurement noise and on changes of illumination between the images. Naming $r_b$ the variation of the brightness parameters between both images,

$$r_b = \left( agl_1 - agl_2,\; c_1 - c_2 \right)^T,$$
the Mahalanobis distance for the similarity between the brightness parameters is $d_b = r_b^T B^{-1} r_b$.
2.2 Matching criteria
Two image lines are stated as compatible when both the geometric and the brightness variations are small. For a line in the second image to belong to the compatible set of a line in the first image, the following tests must be satisfied.

• Geometric compatibility. Under the assumption of Gaussian noise, the similarity distance for the geometric parameters is distributed as a χ² with 4 degrees of freedom. Establishing a significance level of 5%, compatible lines must fulfill $d_g \le \chi_4^2(95\%)$.

• Brightness compatibility. Similarly, for the brightness parameters, compatible lines must fulfill $d_b \le \chi_2^2(95\%)$.

A general Mahalanobis distance over the six parameters is not used because correctly weighting information as different as brightness-based and location-based in a single distance is difficult and could easily lead to wrong matches. Thus, compensation of high precision in some parameters by high error in others is avoided. A line in the first image can have more than one compatible line in the second image. From the compatible lines, the one with the smallest $d_g$ is selected as the putative match. The matching is carried out in both directions, from the first image to the second and from the second to the first. A match (n1, n2) is considered valid when line n2 is the putative match of n1 and, simultaneously, n1 is the putative match of n2. In practice the parameters σj (j = ⊥, ∥, xm, ym, θ, l, agl, c) introduced in R, P, B must be tuned according to the estimated image noise, the expected camera motion, and the illumination conditions, respectively.
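As an illustration of the two compatibility tests and the mutual nearest-neighbor rule, a minimal sketch follows. This is not the authors' implementation; the array layouts, the helper names, and the use of fixed diagonal S and B matrices are assumptions made for the example.

```python
import numpy as np
from itertools import product

# 95% chi-square quantiles with 4 and 2 degrees of freedom
CHI2_4_95, CHI2_2_95 = 9.49, 5.99

def mahalanobis2(r, S):
    """Squared Mahalanobis distance r^T S^-1 r."""
    return float(r @ np.linalg.solve(S, r))

def putative_matches(geo1, geo2, bri1, bri2, S, B):
    """geo*: (n, 4) arrays of (xm, ym, theta, l); bri*: (n, 2) arrays of
    (agl, c). Returns pairs (i, j) passing both compatibility tests and
    being mutual nearest neighbours in the geometric distance d_g."""
    n1, n2 = len(geo1), len(geo2)
    dg = np.full((n1, n2), np.inf)
    for i, j in product(range(n1), range(n2)):
        if mahalanobis2(bri1[i] - bri2[j], B) > CHI2_2_95:
            continue                      # fails brightness compatibility
        d = mahalanobis2(geo1[i] - geo2[j], S)
        if d <= CHI2_4_95:                # passes geometric compatibility
            dg[i, j] = d
    best12 = dg.argmin(axis=1)            # nearest neighbour image1 -> image2
    best21 = dg.argmin(axis=0)            # nearest neighbour image2 -> image1
    return [(i, int(best12[i])) for i in range(n1)
            if np.isfinite(dg[i, best12[i]]) and best21[best12[i]] == i]
```

Note that a brightness-incompatible pair never enters the nearest-neighbor search, so the two tests are not traded off against each other, mirroring the decision not to merge them into a single six-parameter distance.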
3 From lines to homographies
The representation of a line in the projective plane is obtained from the analytic representation of a plane through the origin: $n_1 x_1 + n_2 x_2 + n_3 x_3 = 0$. The coefficients of $n = (n_1, n_2, n_3)^T$ correspond to the homogeneous coordinates of the projective line. All the lines written as λn are the same as n. Similarly, an image point $p = (x, y, 1)^T$ is also an element of the projective plane, and the equation $p^T n = n^T p = 0$ represents that the point p is on the line n, which shows the duality of points and lines. A projective transformation between two projective planes (1 and 2) can be represented by a linear transformation $H_{21}$, in such a way that $p_2 = H_{21} p_1$. Considering the above equations for lines in both images, we have $n_2 = H_{21}^{-T} n_1$. This homography requires eight parameters to be completely defined, because it is defined up to an overall scale factor. A pair of corresponding points or lines gives two linear equations in terms of the elements of the homography. Thus, four pairs of corresponding lines assure a unique solution for $H_{21}$, if no three of them are parallel.
3.1 Computing homographies from corresponding lines
Here, we obtain the projective transformation of points ($p_2 = H_{21} p_1$), but using matched lines. To deduce it, we suppose the start (s) and end (e) tips of a matched line segment to be $p_{s1}, p_{e1}, p_{s2}, p_{e2}$, which usually will not be corresponding points. The line in the second image can be computed as the cross product of two of its points (in particular the observed tips) as

$$n_2 = p_{s2} \times p_{e2} = \tilde{p}_{s2}\, p_{e2}, \qquad (1)$$

where $\tilde{p}_{s2}$ is the skew-symmetric matrix obtained from vector $p_{s2}$. As the tips belong to the line, we have $p_{s2}^T n_2 = 0$ and $p_{e2}^T n_2 = 0$. As the tips of the line in the first image, once transformed, also belong to the corresponding line in the second image, we can write $p_{s1}^T H_{21}^T n_2 = 0$ and $p_{e1}^T H_{21}^T n_2 = 0$. Combining with equation (1), it turns out that

$$p_{s1}^T H_{21}^T \tilde{p}_{s2}\, p_{e2} = 0\;; \qquad p_{e1}^T H_{21}^T \tilde{p}_{s2}\, p_{e2} = 0. \qquad (2)$$
Therefore each pair of corresponding lines gives two homogeneous equations for computing the projective transformation, which can be determined up to a non-zero scale factor. Developing them as a function of the elements of the projective transformation, we have

$$\begin{pmatrix} A x_{s1} & A y_{s1} & A & B x_{s1} & B y_{s1} & B & C x_{s1} & C y_{s1} & C \\ A x_{e1} & A y_{e1} & A & B x_{e1} & B y_{e1} & B & C x_{e1} & C y_{e1} & C \end{pmatrix} \mathbf{h} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$

where $\mathbf{h} = (h_{11}\, h_{12}\, h_{13}\, h_{21}\, h_{22}\, h_{23}\, h_{31}\, h_{32}\, h_{33})^T$ is a vector with the elements of $H_{21}$, and $A = y_{s2} - y_{e2}$, $B = x_{e2} - x_{s2}$ and $C = x_{s2} y_{e2} - x_{e2} y_{s2}$. Using four line correspondences, we can construct an 8 × 9 matrix M. The solution for h is the eigenvector associated with the least eigenvalue (in this case the null eigenvalue) of the matrix $M^T M$. If we have more than the minimum number of good matches, an estimation method may be used to obtain a better solution. Thus n matches give a 2n × 9 matrix M, and the solution h can be obtained using the singular value decomposition of this matrix [5].
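The construction of the 2n × 9 matrix M and its SVD solution can be sketched as follows. This is a hypothetical illustration, not the authors' code; the function and variable names are invented for the example.

```python
import numpy as np

def homography_from_lines(segments):
    """segments: iterable of (ps1, pe1, ps2, pe2); each tip is an (x, y)
    pair, and the tips in the two images need not correspond point-to-point.
    Returns the 3x3 homography H21 (up to scale) mapping image 1 to image 2."""
    rows = []
    for ps1, pe1, ps2, pe2 in segments:
        xs1, ys1 = ps1
        xe1, ye1 = pe1
        xs2, ys2 = ps2
        xe2, ye2 = pe2
        # line in image 2: n2 = ps2 x pe2 = (A, B, C)
        A, B, C = ys2 - ye2, xe2 - xs2, xs2 * ye2 - xe2 * ys2
        # the two rows of equation (2), one per tip of the segment in image 1
        rows.append([A * xs1, A * ys1, A, B * xs1, B * ys1, B,
                     C * xs1, C * ys1, C])
        rows.append([A * xe1, A * ye1, A, B * xe1, B * ye1, B,
                     C * xe1, C * ye1, C])
    M = np.asarray(rows)                  # 2n x 9
    _, _, Vt = np.linalg.svd(M)
    return Vt[-1].reshape(3, 3)           # right singular vector of least s.v.
```

Because each row only constrains a transformed image-1 tip to lie on the image-2 line, the estimate is insensitive to where along the line the tips were observed.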
It is known that a previous normalization of the data avoids problems of numerical computation. As our formulation only uses the image coordinates of the observed tips, the data normalization proposed for points [9] has been used. In our case the normalization centers the x and y coordinates in an image of unitary width and height. Equations (2) can be used as a residue to designate inliers and outliers with respect to a candidate homography. In this case the residue is an algebraic distance, and the relevance of each line depends on its observed length, because the cross product of the segment tips is related to its length. Alternatively, a geometric distance in the image can be used. It is obtained from the sum of the squared distances between the segment tips and the corresponding transformed line:

$$d^2 = \frac{(p_{s1}^T H_{21}^T \tilde{p}_{s2} p_{e2})^2}{(H_{21}^T \tilde{p}_{s2} p_{e2})_x^2 + (H_{21}^T \tilde{p}_{s2} p_{e2})_y^2} + \frac{(p_{e1}^T H_{21}^T \tilde{p}_{s2} p_{e2})^2}{(H_{21}^T \tilde{p}_{s2} p_{e2})_x^2 + (H_{21}^T \tilde{p}_{s2} p_{e2})_y^2}, \qquad (3)$$
where the subindexes $()_x$ and $()_y$ indicate the first and second components of the vector, respectively. Doing the same for both images and with both observed segments, the squared distance with geometric meaning is

$$d^2 = \frac{(p_{s1}^T H_{21}^T \tilde{p}_{s2} p_{e2})^2 + (p_{e1}^T H_{21}^T \tilde{p}_{s2} p_{e2})^2}{(H_{21}^T \tilde{p}_{s2} p_{e2})_x^2 + (H_{21}^T \tilde{p}_{s2} p_{e2})_y^2} + \frac{(p_{s2}^T H_{21}^{-T} \tilde{p}_{s1} p_{e1})^2 + (p_{e2}^T H_{21}^{-T} \tilde{p}_{s1} p_{e1})^2}{(H_{21}^{-T} \tilde{p}_{s1} p_{e1})_x^2 + (H_{21}^{-T} \tilde{p}_{s1} p_{e1})_y^2}. \qquad (4)$$

3.2 Robust estimation
The least squares method assumes that all the measures can be interpreted with the same model, which makes it very sensitive to outliers. Robust estimation tries to avoid the outliers in the computation of the estimate. Among the existing robust estimation methods [6], we have initially chosen the least median of squares method. This method searches the space of solutions obtained from subsets with the minimum number of matches. If we need a minimum of 4 matches to compute the projective transformation, and there are a total of n matches, then the search space is given by the combinations of n elements taken 4 at a time. As that is computationally too expensive, several subsets of 4 matches are randomly chosen. The algorithm to obtain an estimate with this method can be summarized as follows:

1. A Monte-Carlo technique is used to randomly select m subsets of 4 features.
2. For each subset S, we compute a solution in closed form, HS.
3. For each solution HS, the median MS of the squared residues with respect to all the matches is computed.
4. We store the solution HS which gives the minimum median MS.

A selection of m subsets is good if at least one of them contains no outliers. Assuming a ratio ε of outliers, the probability of one subset being good can be obtained [16] as

$$P = 1 - \left[1 - (1 - \varepsilon)^4\right]^m.$$

Thus, assuming a probability (1 − P) of failing the computation, the number of subsets m to consider can be obtained. For example, if we want a probability P = 0.99 of one of them being good, with ε = 35% of outliers, the number of subsets m should be 24. With ε = 70% of outliers the number of subsets m should be 566, and a smaller quantile instead of the median must be selected. In any case, a tenfold reduction in the probability of failing the computation (P = 0.999) only increases the computational cost by about 50% (m = 36 with ε = 35% and m = 849 with ε = 70%).
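The subset-count formula can be transcribed directly. Depending on rounding (ceiling versus nearest integer), the result may differ by one from the figures quoted in the text:

```python
import math

def n_subsets(P, eps, sample_size=4):
    """Number m of random subsets so that, with probability P, at least one
    subset is outlier-free, given an outlier ratio eps:
    P = 1 - [1 - (1 - eps)^s]^m  =>  m = log(1 - P) / log(1 - (1 - eps)^s)."""
    good = (1.0 - eps) ** sample_size     # probability a subset is all inliers
    return math.ceil(math.log(1.0 - P) / math.log(1.0 - good))
```

For P = 0.99 and ε = 0.35 this yields the 24 subsets quoted above; for ε = 0.70 it yields 567 with the ceiling, versus the 566 quoted in the text.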
3.3 Rejecting wrong basic matches
Once the solution has been obtained, the outliers can be identified as those matches with higher residue. Good matches can be selected from those with residue smaller than a threshold. As in [6], the threshold is set proportional to the standard deviation of the error, which can be estimated [16] as

$$\hat{\sigma} = 1.4826\,\left[1 + 5/(n - 4)\right]\,\sqrt{M_S}.$$

Assuming that the measurement error for inliers is Gaussian with zero mean and standard deviation σ, the squared residue is a sum of squared Gaussian variables and follows a χ² distribution with 2 degrees of freedom. Taking, for example, a 5% probability of rejecting a correct line, the threshold is fixed at $5.99\,\hat{\sigma}^2$.
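A sketch of the resulting inlier test follows (a hypothetical helper; the median-taking convention for even n is an assumption):

```python
import math

def inlier_mask(sq_residues):
    """Robust scale from the median of squared residues and the resulting
    chi-square inlier test (threshold chi2_2(95%) * sigma^2 = 5.99 sigma^2)."""
    n = len(sq_residues)
    Ms = sorted(sq_residues)[n // 2]                  # median squared residue
    sigma = 1.4826 * (1.0 + 5.0 / (n - 4)) * math.sqrt(Ms)
    threshold = 5.99 * sigma * sigma
    return [r <= threshold for r in sq_residues], sigma
```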
4 Final matches
From here on, we introduce the geometrical constraint imposed by the estimated homography to get a larger set of matches. Our objective is to obtain more good matches at the end of the process, while also eliminating wrong matches produced by the basic matching. Thus the final matches are composed of two sets. The first one is obtained from the matches selected after the robust computation of the homography that additionally pass an overlapping test compatible with the transformation of the segment tips. The second set of matches is obtained by taking all the segments not matched initially, plus those rejected previously. With this set of lines, a matching process similar to the basic matching is carried out. However, now the matching is made to the nearest neighbor segment transformed with the homography. The transformation is applied to the end tips of the image segments using the homography H21 to find not only compatible lines but also compatible segments on the same line. In the first stage of the matching process there was no knowledge of the camera motion. However, in this second step the computed homography provides information about the expected disparity, and therefore the uncertainty of the geometric variations can be reduced. So a new tuning of σxm, σym, σθ and σl is considered. To automate the process, a global reduction of these parameters has been proposed and tested in several situations, obtaining good results with reductions of about 1/5. As the measurement noise (σ∥ and σ⊥) has not changed, the initial tuning of these parameters is maintained in this second step. Note that the brightness compatibility set is the one initially computed, and therefore that test is not repeated.
5 Different planes
When lines lie on different planes, the corresponding homographies can be computed iteratively. Previously, we have explained how to determine a homography assuming that some of the lines are outliers. The first homography can be computed in this way, assuming that a certain percentage of matched lines are good matches belonging to a plane, and the other matches are bad or cannot be explained by this homography. We have no a priori knowledge about which plane of the scene will be extracted first, but the probability of a plane being chosen increases with the number of lines on it. The most reasonable way to extract a second plane is to eliminate the lines used to determine the first homography. Here we are assuming that lines belonging to the first plane do not belong to the second, which is true except for the intersection line between both planes. This idea of eliminating lines already used applies if more planes are extracted. On the other hand, in the multi-plane case the percentage of outliers for each homography will be high even if all the matches are good. As the least median of squares method has a breakdown point at 50% of outliers, we have used in this case the RANSAC method [17], which needs a tuning threshold but works with a less demanding breakdown point. Besides that, all the homographies obtained from an image pair are related to one another. This relation can be expressed with the fundamental matrix ($F_{21}$) and the epipole ($e_2$), which is unique for an image pair, through the following equation:

$$F_{21} = [e_2]_\times\, H_{21}^{\pi}, \qquad (5)$$
$[e_2]_\times$ being the skew matrix corresponding to the epipole $e_2$ of the second image and $H_{21}^{\pi}$ the homography between the first and second images through plane π. If at least two homographies ($H_{21}^{\pi_1}$, $H_{21}^{\pi_2}$) can be computed between both images, corresponding to two planes ($\pi_1$, $\pi_2$), then a homology $H = H_{21}^{\pi_1} \cdot (H_{21}^{\pi_2})^{-1}$, that is, a mapping from one image into itself, exists. It turns out that the homology has two equal eigenvalues [5]. The third one is related to the motion and to the structure of the scene. These eigenvalues can be used to test whether two different planes have been computed, and then the epipole and the intersection of the planes can also be obtained. The epipole is the eigenvector corresponding to the non-unary eigenvalue of the homology, and the other two eigenvectors define the intersection line of the planes [18]. In the case of a small baseline, or if there exists only one plane in the scene, the epipolar geometry is not defined and only one homography can be computed, so a possible homology H will be close to the identity, up to scale. In practice, we propose a filter based on these ideas to check the goodness of the homology. Firstly, we normalize the homology by dividing by the median eigenvalue. If there are not two unary eigenvalues, up to a threshold, then the computation of the second homography is repeated. If the three eigenvalues are similar, we check whether the homology is close to the identity; in this case two similar homographies would explain the scene or the motion, and the fundamental matrix cannot be computed. Otherwise, two planes and two coherent homographies are given as the result of the process, and from them the fundamental matrix can be computed [12].
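The eigenvalue-based filter can be sketched as follows. This is an illustrative implementation under the stated assumptions, not the authors' code; the tolerance value and function name are invented.

```python
import numpy as np

def epipole_from_homographies(H1, H2, tol=0.05):
    """Given homographies H1, H2 of two candidate planes, test the homology
    H = H1 * inv(H2): if, after normalizing by the median eigenvalue, exactly
    two eigenvalues are unary, return the epipole (the eigenvector of the
    non-unary eigenvalue); otherwise return None."""
    H = H1 @ np.linalg.inv(H2)
    w, V = np.linalg.eig(H)
    w, V = w.real, V.real                 # real in the valid two-plane case
    order = np.argsort(w)
    w = w / w[order[1]]                   # normalize: median eigenvalue -> 1
    unary = [i for i in range(3) if abs(w[i] - 1.0) < tol]
    if len(unary) != 2:
        return None                       # H ~ identity, or not a homology
    (k,) = [i for i in range(3) if i not in unary]
    e = V[:, k]
    return e / e[2]
```

In the test below, two homographies differing by a rank-one term $e_2 v^T$ (the standard relation between homographies of the same camera pair) produce a homology with a double unary eigenvalue whose remaining eigenvector is $e_2$.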
6 Particular case: Vertical lines
A special case of interest in some applications can be considered by reducing the problem by one dimension. For example, when a robot moves in man-made environments, the motion is on a plane and vertical lines give enough information to carry out visual control. For vertical lines, only the x coordinate is relevant to compute the homography, so the problem is simplified. The homography now has three parameters and each vertical line gives one equation, so three vertical-line correspondences are enough. The matching is quite similar to that presented for lines in all directions. In this case the parameter referring to the orientation of the lines is discarded because it is meaningless here, although the lines are grouped into two incompatible subsets: those with 90 degrees of orientation and those with 270 degrees of orientation.
6.1 Estimation of the homography
In this case, as the dimension of the problem is reduced, only the $x_m$ coordinate is relevant for the computation of the projective transformation $H_{21}$. We can use the point-based formulation, which is simpler, because a vertical line can be projectively represented as a point with coordinates $(x_m, 1)$. Then we can write, up to a scale factor,

$$\begin{pmatrix} \lambda\, x_{m2} \\ \lambda \end{pmatrix} = \begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix} \begin{pmatrix} x_{m1} \\ 1 \end{pmatrix}.$$

Therefore each pair of corresponding vertical lines, having coordinates $x_{m1}$ and $x_{m2}$ respectively, provides one equation to solve for the matrix $H_{21}$:

$$\begin{pmatrix} x_{m1} & 1 & -x_{m1} x_{m2} & -x_{m2} \end{pmatrix} \begin{pmatrix} h_{11} \\ h_{12} \\ h_{21} \\ h_{22} \end{pmatrix} = 0.$$
With the coordinates of at least n = 3 vertical-line correspondences we can construct an n × 4 matrix M. The homography solution is the eigenvector associated with the least eigenvalue of the matrix $M^T M$. As before, we use a robust method to compute the projective transformation. With vertical lines the problem is computationally simpler: for example, with P = 0.99 and 70% of outliers, only 168 subsets of 3 matches must be considered (in the general case we need 566 subsets of 4 matches). As mentioned before, the estimated homography is also used to obtain a better set of matched lines.
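The reduced estimation can be sketched in a few lines (again an illustrative implementation; names are invented):

```python
import numpy as np

def homography_1d(x1, x2):
    """x1, x2: corresponding x-coordinates of vertical lines in images 1, 2.
    Each correspondence gives one row (xm1, 1, -xm1*xm2, -xm2); the solution
    (h11, h12, h21, h22) is the null vector of the n x 4 matrix M."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    M = np.column_stack([x1, np.ones_like(x1), -x1 * x2, -x2])
    _, _, Vt = np.linalg.svd(M)
    return Vt[-1].reshape(2, 2)           # H21 up to scale
```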
7 Experimental Results
A set of experiments with different kinds of images has been carried out to test the proposal. The images correspond to different applications: indoor robot homing, images of buildings, and aerial images. In the algorithms there are extraction parameters to obtain more or fewer line segments according to their minimum length and minimum gradient. There are also parameters to match the lines, whose tuning has turned out to be simple and quite intuitive. In the experiments we have used the following tuning parameters: σ⊥ = 1, σ∥ = 10, σagl = 8, σc = 4, σxm = 60, σym = 20, σθ = 2, σl = 10, or small variations with respect to them depending on the expected image disparity. We have applied the algorithm presented to robot homing. In this application the robot must go to previously learnt positions using a camera [10]. The robot corrects its heading from the computed projective transformation between target and current images.
Robot Rotation | σxm | Basic    | After H21 | Final
2°             | 60  | 92 (1W)  | 78 (0W)   | 90 (0W)
4°             | 60  | 73 (5W)  | 56 (1W)   | 76 (0W)
6°             | 60  | 63 (4W)  | 47 (1W)   | 63 (0W)
8°             | 60  | 53 (6W)  | 31 (0W)   | 52 (0W)
10°            | 60  | 47 (9W)  | 35 (1W)   | 50 (0W)
12°            | 100 | 41 (9W)  | 30 (2W)   | 33 (1W)
14°            | 100 | 36 (12W) | 24 (2W)   | 32 (1W)
16°            | 100 | 27 (9W)  | 17 (3W)   | 30 (1W)
18°            | 140 | 37 (14W) | 24 (3W)   | 28 (1W)
20°            | 140 | 28 (10W) | 17 (1W)   | 24 (0W)

Table 1: Number of matches for several robot rotations, also indicating the number of wrong matches (W). The first column shows the robot rotation; the second, the tuning of σxm used in the matching; the third, the basic matches; the fourth, the matches obtained after the constraint imposed by the homography; and the fifth, the final matches. Here, matches that are good as lines but wrong as segments (they do not overlap) are counted as wrong matches.

In this experiment a set of robot rotations (from 2 to 20 degrees) has been made. The camera center is about 30 cm off the axis of rotation of the robot, and therefore this camera motion has a short baseline. Table 1 shows the number of matches in the successive steps for this set of camera motions. The number of lines extracted in the reference image is 112. Here, the progressive advantage of the simultaneous computation of the homography and the matching can be seen. When the image disparity is small, the robust estimation of the homography does not improve the basic matching. However, with a disparity close to 70% of the image size, the basic matching produces a high ratio of wrong matches (> 30%), which are automatically corrected in the final matching. We observe that in this case the system also works between two images with large image disparity. To simplify, only the images corresponding to the 18 degrees of robot rotation are shown (Fig. 2). 38% of the matches given by the basic matching are wrong. At the final matching stage, all the matches are good when considered as lines, although one of them can be considered wrong as a segment. We have also used two aerial images with a large baseline (Fig. 3). In photogrammetry applications the putative matching usually has a high ratio of spurious results. This is confirmed in our case, where the basic matching has given a ratio of wrong matches higher than 50%, which is the theoretical limit of the least-median-of-squares method.
However, if we select a smaller quantile of the squared residues instead of the median, the robust method works properly. The results have been obtained with the least 30%-quantile of squares (Fig. 3). The robust computation of the homography provides 55 matches, 11 of them being wrong as segments but good as infinite lines. Among the 105 final matches, there are fewer than 3% wrong matches, which correspond to contiguous cars. This means that the final matches are roughly doubled with respect to the matches obtained with the homography. Note also that the selected final matches are mainly located on the ground. There are some lines on the roofs of the buildings, but they are nearly parallel to the flight direction of the camera, which is consistent with the homography model used.
Figure 2: Images with 18 degrees of robot rotation. The first row shows the matches after the basic matching (37 matches, 14 wrong). The second row shows the matches at the final stage of the process, with only one being a wrong match as a segment, although it is good as a line (number 10).
Figure 3: Two aerial images with a large baseline. In the first row, the extracted lines are superimposed on the images (approximately 300 lines/image). The second row shows the basic matches (121 matches, 64 wrong). The third row shows the matches at the final stage (105 matches, 3 wrong, corresponding to contiguous cars).
Figure 4: Two examples of image pairs with automatically matched vertical lines.

           Basic Matches   Good   Final Matches   Good
House      148             75%    114.6 (8.91)    99% (1%)
College    196             82%    156.7 (11.09)   96% (1%)
Table 2: Number of matches for the scenes in Figs. 5 and 6. Only the lines on the two main planes are considered. We have repeated the experiments fifty times using the same basic matches, reporting as final matches the mean values and, in brackets, the standard deviation.

We also show two examples of vertical lines matched with a homography of reduced dimension (Fig. 4). In both cases the percentage of good final matches is close to 100%. Using vertical lines, we have developed a real-time implementation for robotic applications in man-made environments [10]. Other experiments have been carried out to match lines lying on several planes, segmenting the planes simultaneously (Figs. 5, 6). Table 2 shows the number of matches and the ratio of good matches for 50 repetitions of the robust computation. In this case, once a homography has been computed, the robust homography computation and the match-growing process have been iteratively repeated twice. As can be seen, the number and quality of the final matches are very good. The homology filter described in Section 5 has been used to detect situations where a second plane is not obtained, and therefore only one homography can be computed, or where the homographies do not give a valid homology due to noise or bad extraction. It can be seen that some lines of the second plane have been selected by the first homography (Fig. 6). The reason is that these lines are nearly coincident with epipolar lines, so they can be selected by any homography compatible with the fundamental matrix. The method also works with rotated images; in this case σθ must be changed according to the image disparity. We show an example of a rotated short-baseline image pair where a single homography gives very good results (Fig. 7).
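The homology filter relies on a known property of two-plane scenes: the homographies H1 and H2 induced by two scene planes between the same image pair differ by a planar homology G = H2·H1⁻¹, which has two equal eigenvalues (up to the common projective scale). A minimal sketch of such a consistency test follows; it assumes a calibrated setting (K = I), where a plane with normal n at distance d induces H = R + t·nᵀ/d, and the function names are illustrative.

```python
import numpy as np

def plane_homography(R, t, n, d):
    """Homography induced by the plane n.X = d between two calibrated
    views related by rotation R and translation t: H = R + t n^T / d."""
    return R + np.outer(t, n) / d

def homology_consistent(H1, H2, tol=1e-6):
    """Accept the pair (H1, H2) only if G = H2 @ inv(H1) is a planar
    homology, i.e. two of its three eigenvalues coincide (compared
    relative to the largest magnitude, since scale is arbitrary)."""
    ev = np.linalg.eigvals(H2 @ np.linalg.inv(H1))
    scale = max(abs(e) for e in ev)
    diffs = [abs(ev[i] - ev[j]) / scale for i, j in ((0, 1), (0, 2), (1, 2))]
    return bool(min(diffs) < tol)

R = np.eye(3)                          # pure translation, for simplicity
t = np.array([0.2, 0.0, 1.0])
H1 = plane_homography(R, t, np.array([0.1, 0.0, 1.0]), 1.0)  # plane 1
H2 = plane_homography(R, t, np.array([0.0, 0.0, 1.0]), 1.0)  # plane 2
ok = homology_consistent(H1, H2)                  # consistent pair
bad = homology_consistent(np.eye(3), np.diag([1.0, 2.0, 3.0]))  # unrelated pair
```

A pair that fails this test signals either a spurious second plane or homographies corrupted by noise or bad line extraction, which is how the filter is used above.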
Figure 5: Images of the house. First row: basic matches. Second row: final matches corresponding to the first homography. Third row: final matches corresponding to the second homography (original images from KTH, Stockholm).
Figure 6: Images of the college. First row: basic matches. Second row: final matches corresponding to the first homography. Third row: final matches corresponding to the second homography (original images from VGG, Oxford).
Figure 7: Pair of rotated short baseline images and final line correspondences.
8 Conclusions
We have presented and tested a method that automatically obtains line matches simultaneously with the robust computation of homographies in image pairs. The robust computation works especially well at eliminating the outliers that appear when the matching is based on image properties and there is no information about scene structure or camera motion. The homographies are computed from extracted lines, and the use of lines has advantages over the use of points. The geometric mapping between uncalibrated images provided by the homography proves useful for growing matches and for eliminating wrong ones. The whole process is automatic, requiring only some prior tuning of parameters related to the expected global image disparity. As seen in the experiments, the proposed algorithm works with different types of scenes. With planar scenes, or in situations where disparity is mainly due to rotation, the method gives good results with a single homography. However, it is also possible to compute several homographies when several planes, which are iteratively segmented, appear in the scene. In this case, the fundamental matrix can be computed from lines through the homographies, providing a constraint for the whole scene. Currently we are working with other image invariants to improve the first stage of the process.
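The fundamental matrix can be obtained from two plane-induced homographies via a standard two-view relation: the vertex of the homology G = H2·H1⁻¹ (the eigenvector of its non-repeated eigenvalue) is the epipole e' in the second image, and F = [e']×·H1, defined up to scale. The following sketch illustrates this on a synthetic calibrated setup (K = I); the helper names and the numbers are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Two plane-induced homographies H_i = R + t n_i^T / d_i (calibrated, K = I).
R = np.eye(3)
t = np.array([0.2, 0.0, 1.0])
H1 = R + np.outer(t, np.array([0.1, 0.0, 1.0]))   # plane 1 (d1 = 1)
H2 = R + np.outer(t, np.array([0.0, 0.0, 1.0]))   # plane 2 (d2 = 1)

# The homology G has eigenvalues {1, 1, mu}; the eigenvector of the
# isolated eigenvalue mu is the epipole e' in the second image.
G = H2 @ np.linalg.inv(H1)
w, V = np.linalg.eig(G)
iso = min(range(3), key=lambda i: sum(abs(w[i] - w[j]) < 1e-8 for j in range(3)))
e2 = np.real(V[:, iso])

F = skew(e2) @ H1    # fundamental matrix induced by the two homographies

# Any correspondence satisfies the epipolar constraint x2^T F x1 = 0;
# here x2 is produced by mapping x1 through the *other* plane's homography.
x1 = np.array([0.3, 0.4, 1.0])
x2 = H2 @ x1
```

In this synthetic configuration the recovered epipole is parallel to the translation t, as expected for a calibrated pair, and the epipolar constraint holds for points transferred through either plane.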
Acknowledgments This work was supported by project DPI2003-07986.
References

[1] C. Schmid and A. Zisserman, "Automatic Line Matching across Views," in IEEE Conference on CVPR, pp. 666–671 (1997).
[2] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision 60(2), 91–110 (2004).
[3] H. Bay, V. Ferrari, and L. Van Gool, "Wide-Baseline Stereo Matching with Line Segments," in IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 329–336 (2005).
[4] P. Pritchett and A. Zisserman, "Wide Baseline Stereo Matching," in IEEE Conference on Computer Vision, pp. 754–760 (1998).
[5] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision (Cambridge University Press, Cambridge, 2000).
[6] Z. Zhang, "Parameter Estimation Techniques: A Tutorial with Application to Conic Fitting," Rapport de recherche RR-2676, I.N.R.I.A., Sophia-Antipolis, France (1995).
[7] P. Torr and D. Murray, "The Development and Comparison of Robust Methods for Estimating the Fundamental Matrix," International Journal of Computer Vision 24, 271–300 (1997).
[8] L. Quan and T. Kanade, "Affine Structure from Line Correspondences with Uncalibrated Affine Cameras," IEEE Trans. on Pattern Analysis and Machine Intelligence 19(8), 834–845 (1997).
[9] R. Hartley, "In Defense of the Eight-Point Algorithm," IEEE Trans. on Pattern Analysis and Machine Intelligence 19(6), 580–593 (1997).
[10] J. Guerrero, R. Martinez-Cantin, and C. Sagüés, "Visual map-less navigation based on homographies," Journal of Robotic Systems 22(10), 569–581 (2005).
[11] A. Habib and D. Kelley, "Automatic relative orientation of large scale imagery over urban areas using Modified Iterated Hough Transform," Journal of Photogrammetry and Remote Sensing 56, 29–41 (2001).
[12] C. Sagüés, A. Murillo, F. Escudero, and J. Guerrero, "From Lines to Epipoles Through Planes in Two Views," Pattern Recognition (2006), to appear.
[13] J. Guerrero and C. Sagüés, "Robust line matching and estimate of homographies simultaneously," in IbPRIA, Pattern Recognition and Image Analysis, LNCS 2652, pp. 297–307 (2003).
[14] J. Burns, A. Hanson, and E. Riseman, "Extracting Straight Lines," IEEE Trans. on Pattern Analysis and Machine Intelligence 8(4), 425–455 (1986).
[15] R. Deriche and O. Faugeras, "Tracking Line Segments," in First European Conference on Computer Vision, pp. 259–268 (Antibes, France, 1990).
[16] P. Rousseeuw and A. Leroy, Robust Regression and Outlier Detection (John Wiley, New York, 1987).
[17] M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography," Comm. of the ACM 24(2), 381–395 (1981).
[18] L. Zelnik-Manor and M. Irani, "Multiview Constraints on Homographies," IEEE Trans. on Pattern Analysis and Machine Intelligence 24(2), 214–223 (2002).