DIIS - I3A Universidad de Zaragoza C/ María de Luna num. 1 E-50018 Zaragoza Spain

Internal Report: 2005-V15

Robust line matching in image pairs of scenes with dominant planes¹

C. Sagüés, J.J. Guerrero

If you want to cite this report, please use the following reference instead: Robust line matching in image pairs of scenes with dominant planes, C. Sagüés, J.J. Guerrero, Optical Engineering, Vol. 45, no. 6, pp. 06724 1-12, 2006.

¹ This work was supported by projects DPI2000-1272 and DPI2003-07986.


Robust line matching in image pairs of scenes with dominant planes

Sagüés C., Guerrero J.J.
DIIS - I3A, Universidad de Zaragoza
C/ María de Luna num. 1, E-50018 Zaragoza, Spain.
Phone 34-976-762349, Fax 34-976-761914. csagues, [email protected]

Abstract

We address the robust matching of lines between two views, when camera motion is unknown and dominant planar structures are viewed. The use of viewpoint non-invariant measures gives many non-matched or wrongly matched features. The inclusion of projective transformations gives much better results with little computational overhead. We use line features, which can usually be extracted more accurately than points and can be used in cases where there are partial occlusions. In the first stage, the lines are matched to the weighted nearest neighbor using brightness and geometric-based image parameters. From these matches, robust homographies can be computed, which allows wrong matches to be rejected and new good matches to be added. When two or more planes are observed, the corresponding homographies can be computed, and they can also be used to obtain the fundamental matrix, which gives constraints for the whole scene. The simultaneous computation of matches and projective transformations is extremely useful in many applications. It can be seen that the proposal works in different situations, requiring only a simple and intuitive parameter tuning.

Keywords: Machine vision, matching, lines, homographies, uncalibrated vision, plane segmentation, multi-plane scene.

1 Introduction

We address the problem of robust matching of lines in two images when the camera motion is unknown. Using lines instead of points has been considered in other works [1]. Straight lines can be accurately extracted in noisy images, they capture more information than points, especially in man-made environments, and they may be used when occlusions occur. The invariant regions used to match images, such as the well-known SIFT [2], often fail due to the lack of texture in man-made environments with homogeneous surfaces [3]. Fortunately, in this kind of scene line segments are available for matching. However, line matching is more difficult than point matching because the end points of the extracted lines are not reliable. Besides that, there is no geometric constraint for lines in two images analogous to the epipolar constraint for points.

The putative matching of features based on image parameters has many drawbacks, giving non-matched or wrongly matched features. Previously, the problem of wide baseline matching has been addressed by establishing a viewpoint invariant affinity measure [4]. We use the homography in the matching process to select and to grow previous matches, which have been obtained combining geometric and brightness image parameters. Scenes with dominant planes are usual in man-made environments, and the model to work with multiple views of them is well known. Points or lines on a world plane projected in one image are mapped to the corresponding points or lines in the other image by a plane-to-plane homography [5]. As is known, there is no geometric constraint for infinite lines in two images of a general scene, but the homography is a good constraint for scenes with dominant planes or for small baseline image pairs, even when they have large relative rotation.

Robust estimation techniques are nowadays indispensable to obtain results in real situations where outliers and spurious data are present [6, 7]. In this work the least median of squares and the Ransac methods have been used to estimate the homography. They provide not only the solution in a robust way, but also a list of previous matches that are in disagreement with it, which allows wrong matches to be rejected. To compute the homographies, points and lines are dual geometric entities; however, line-based algorithms are generally less common than point-based ones [8]. Thus, some particular problems related to data representation and normalization must be considered in practice. We compute the homographies from corresponding lines in two images, making use of the classical normalization of point data [9].

The simultaneous computation of matches and the projective transformation between images is useful in many applications, and the computation of a single homography is exact if the scene features are on a plane or for small baseline image pairs. Although this transformation is not exact in general situations, the practical results are also good when the distance from the camera to the scene is large enough with respect to the baseline. For example, this assumption gives very good results in robot homing [10], where image disparity is mainly due to camera rotation, and therefore a sole homography captures the robot orientation, which is the most useful information for a robot to correct its trajectory. Our proposal can also be applied in photogrammetry for the automatic relative orientation of convergent image pairs. In this context, points are the feature mostly used [11], but lines are plentiful in urban scenes. We have put our proposal into practice with aerial images and images of buildings, obtaining satisfactory results.

If the lines lie on several planes, the corresponding homographies can be obtained iteratively. From them, the fundamental matrix, which gives a general constraint for the whole scene, can be computed [12]. We have also made experiments to segment several scene planes when they exist, obtaining line matching in that situation. This segmentation of planes could be very useful to automatically build 3D models of urban scenes.

After this introduction, we present in §2 the matching of lines using geometric and brightness parameters. The robust estimation of homographies from lines is explained in §3. After that, we present in §4 the process to obtain the final matches using the geometric constraint given by the homography.
Then we present the multi-plane method in §5, and the useful case of having only vertical features in §6. Experimental results with real images are presented in §7. Finally, §8 presents the conclusions. A previous version of this matcher was presented in [13], but we have now completed and extended the work, presenting the multi-plane method and the special case of having only vertical lines. We also include experiments made with the new software version, which implements different robust techniques.

Figure 1: Left: segmentation of the image into regions which have similar orientation of brightness gradient (LSR). Right: zoom of the LSRs extracted on a real image. From the LSR we obtain the line with its geometric description, and also attributes related to its brightness and contrast used for the line matching.

2 Basic matching

As said, the initial matching of straight lines in the images is made to the weighted nearest neighbor. These lines are extracted using our implementation of the method proposed by Burns [14], which computes spatial brightness gradients to detect the lines in the image. Pixels having a gradient magnitude larger than a threshold are grouped into regions of similar direction of brightness gradient. These groups are named line support regions (LSR). A least-squares fit within the LSR is used to obtain the line, and the detector gives not only the geometric parameters of the lines, but also several attributes related to their brightness and some quality attributes, which provide very useful information to select and identify the lines (Fig. 1).

We use not only the geometric parameters, but also the brightness attributes supplied by the contour extractor. So agl and c (average grey level and contrast) in the LSR are combined with geometric parameters of the segments such as the midpoint coordinates (x_m, y_m), the line orientation θ (in 2π range, with dark on the right and bright on the left) and its length l.

In several works the matching is made over close images; in that setting, correspondence determination by tracking geometric information along the image sequence has been proposed as a good solution [15]. We determine correspondences between lines in two images of large disparity in the uncalibrated case. Significant motion between views, or changes in the lighting conditions of the viewed scene, means that few or none of the defined line parameters remain invariant between images. For example, motion approaching the objects gives bigger lines, which also go out of the image. Besides that, measurement noise introduces more problems for the desired invariance of the parameters.

It must be stated that a line segment is considered to select the nearest neighbor, but assuming higher uncertainty along the line than across it. However, to compute the projective transformations between images afterwards, only the infinite line information is considered. This provides an accurate solution which is robust to partial occlusions, without assuming that the tips of the extracted line are the same.

2.1 Similarity measures

In the matching process two similarity measures are used: a geometric measure and a brightness measure. We name r_g the difference of the geometric parameters between both images (1, 2):

$$r_g = \begin{pmatrix} x_{m1} - x_{m2} \\ y_{m1} - y_{m2} \\ \theta_1 - \theta_2 \\ l_1 - l_2 \end{pmatrix}.$$

As previously [15], we define the matrix R to express the uncertainty due to measurement noise in the extraction of features in each image:

$$\mathsf{R} = \begin{pmatrix} \sigma_\perp^2 S^2 + \sigma_k^2 C^2 & (\sigma_\perp^2 - \sigma_k^2)\,CS & 0 & 0 \\ (\sigma_\perp^2 - \sigma_k^2)\,CS & \sigma_\perp^2 C^2 + \sigma_k^2 S^2 & 0 & 0 \\ 0 & 0 & \dfrac{2\sigma_\perp^2}{l^2} & 0 \\ 0 & 0 & 0 & 2\sigma_k^2 \end{pmatrix},$$

where C = cos θ and S = sin θ. Location uncertainties of the segment tips along the line direction and along the orthogonal direction are represented by σ_k and σ_⊥ respectively. Additionally, we define the matrix P to represent the uncertainty in the difference of the geometric parameters due to camera motion and unknown scene structure:

$$\mathsf{P} = \begin{pmatrix} \sigma_{x_m}^2 & 0 & 0 & 0 \\ 0 & \sigma_{y_m}^2 & 0 & 0 \\ 0 & 0 & \sigma_\theta^2 & 0 \\ 0 & 0 & 0 & \sigma_l^2 \end{pmatrix},$$

where σ_i (i = x_m, y_m, θ, l) represents the uncertainty of variation of the geometric parameters of the image segment. Thus, from those matrices we introduce S = R1 + R2 + P to weigh the variations of the geometric parameters of corresponding lines due to both line extraction noise (R1 in the first image and R2 in the second image) and unknown structure and motion. Note in R that σ_k is bigger than σ_⊥; therefore the measurement noises of x_m and y_m are coupled, and the line orientation shows the direction where the measurement noise is bigger (along the line). However, in P the orientation does not matter, because the evolution of the line between images is mainly due to camera motion, which does not depend on the line orientation in the image.

The matching in the first stage is made to the nearest neighbor. The similarity between the geometric parameters can be measured with a Mahalanobis distance, d_g = r_g^T S^{-1} r_g.

The second similarity measure is defined for the brightness parameters. In this case we have

$$\mathsf{B} = \begin{pmatrix} \sigma_{agl}^2 & 0 \\ 0 & \sigma_c^2 \end{pmatrix},$$

where σ_agl and σ_c represent the uncertainty of variations of the average gray level and the contrast of the line. Both depend on measurement noise and on changes of illumination between the images. Naming r_b the variation of the brightness parameters between both images,

$$r_b = \begin{pmatrix} agl_1 - agl_2 \\ c_1 - c_2 \end{pmatrix},$$

the Mahalanobis distance for the similarity between the brightness parameters is d_b = r_b^T B^{-1} r_b.

2.2 Matching criteria

Two image lines are stated as compatible when both geometric and brightness variations are small. For a line in the second image to belong to the compatible set of a line in the first image, the following tests must be satisfied:

• Geometric compatibility. Under the assumption of Gaussian noise, the similarity distance for the geometric parameters is distributed as a χ² with 4 degrees of freedom. Establishing a significance level of 5%, compatible lines must fulfill d_g ≤ χ²_4(95%).

• Brightness compatibility. Similarly, for the brightness parameters, compatible lines must fulfill d_b ≤ χ²_2(95%).

A general Mahalanobis distance for the six parameters is not used because correctly weighting information as different as brightness-based and location-based in a single distance is difficult and could easily lead to wrong matches. Thus, compensation of high precision in some parameters with high error in other parameters is avoided.

A line in the first image can have more than one compatible line in the second image. From the compatible lines, the line having the smallest d_g is selected as the putative match. The matching is carried out in both directions, from the first to the second image and from the second to the first. A match (n1, n2) is considered valid when line n2 is the putative match of n1 and, simultaneously, n1 is the putative match of n2. In practice, the parameters σ_j (j = ⊥, k, x_m, y_m, θ, l, agl, c) introduced in R, P, B must be tuned according to the estimated image noise, expected camera motion, and illumination conditions, respectively.
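The mutual test maps directly to code. The following is a minimal sketch of this basic matching stage, assuming numpy; the line representation, the helper names, and the default noise values (σ_⊥ = 1, σ_k = 10, taken from the tuning reported in §7) are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

CHI2_4_95 = 9.488   # chi-square threshold, 4 dof, 95% significance
CHI2_2_95 = 5.991   # chi-square threshold, 2 dof, 95% significance

def R_cov(theta, l, s_perp=1.0, s_par=10.0):
    """Measurement-noise covariance of (xm, ym, theta, l) for one segment."""
    C, S = np.cos(theta), np.sin(theta)
    return np.array([
        [s_perp**2*S**2 + s_par**2*C**2, (s_perp**2 - s_par**2)*C*S, 0.0, 0.0],
        [(s_perp**2 - s_par**2)*C*S, s_perp**2*C**2 + s_par**2*S**2, 0.0, 0.0],
        [0.0, 0.0, 2.0*s_perp**2/l**2, 0.0],
        [0.0, 0.0, 0.0, 2.0*s_par**2]])

def basic_match(lines1, lines2, P, B):
    """Mutual nearest-neighbor matching with chi-square gating.
    Each line is a tuple (xm, ym, theta, l, agl, c); P (4x4) and B (2x2)
    are the motion and brightness covariances of Sec. 2.1."""
    def best(la, lbs, Pm, Bm):
        cand = []
        for j, lb in enumerate(lbs):
            rg = np.array(la[:4]) - np.array(lb[:4])
            S = R_cov(la[2], la[3]) + R_cov(lb[2], lb[3]) + Pm
            dg = rg @ np.linalg.solve(S, rg)       # Mahalanobis, geometric
            rb = np.array(la[4:]) - np.array(lb[4:])
            db = rb @ np.linalg.solve(Bm, rb)      # Mahalanobis, brightness
            if dg <= CHI2_4_95 and db <= CHI2_2_95:
                cand.append((dg, j))
        return min(cand)[1] if cand else None

    fwd = [best(l1, lines2, P, B) for l1 in lines1]
    bwd = [best(l2, lines1, P, B) for l2 in lines2]
    # keep a pair only when the nearest-neighbor choice is mutual
    return [(i, j) for i, j in enumerate(fwd) if j is not None and bwd[j] == i]
```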


3 From lines to homographies

The representation of a line in the projective plane is obtained from the analytic representation of a plane through the origin: n1 x1 + n2 x2 + n3 x3 = 0. The coefficients of n = (n1, n2, n3)^T correspond to the homogeneous coordinates of the projective line. All the lines written as λn are the same line as n. Similarly, an image point p = (x, y, 1)^T is also an element of the projective plane, and the equation p^T · n = n^T · p = 0 expresses that the point p is on the line n, which shows the duality of points and lines.

A projective transformation between two projective planes (1 and 2) can be represented by a linear transformation H21, in such a way that p2 = H21 p1. Considering the above equations for lines in both images, we have n2 = (H21^{-1})^T n1 = H21^{-T} n1. This homography requires eight parameters to be completely defined, because it is defined up to an overall scale factor. A pair of corresponding points or lines gives two linear equations in terms of the elements of the homography. Thus, four pairs of corresponding lines assure a unique solution for H21, if no three of them are parallel.

3.1 Computing homographies from corresponding lines

Here we obtain the projective transformation of points (p2 = H21 p1), but using matched lines. To deduce it, we suppose the start (s) and end (e) tips of a matched line segment to be p_{s1}, p_{e1}, p_{s2}, p_{e2}, which usually will not be corresponding points. The line in the second image can be computed as the cross product of two of its points (in particular the observed tips):

$$\mathbf{n}_2 = \mathbf{p}_{s2} \times \mathbf{p}_{e2} = \tilde{\mathbf{p}}_{s2}\,\mathbf{p}_{e2}, \qquad (1)$$

where $\tilde{\mathbf{p}}_{s2}$ is the skew-symmetric matrix obtained from the vector $\mathbf{p}_{s2}$. As the tips belong to the line, we have $\mathbf{p}_{s2}^T \mathbf{n}_2 = 0$ and $\mathbf{p}_{e2}^T \mathbf{n}_2 = 0$. As the tips of the line in the first image, once transformed, also belong to the corresponding line in the second image, we can write $\mathbf{p}_{s1}^T \mathsf{H}_{21}^T \mathbf{n}_2 = 0$ and $\mathbf{p}_{e1}^T \mathsf{H}_{21}^T \mathbf{n}_2 = 0$. Combining with equation (1), it turns out that

$$\mathbf{p}_{s1}^T \mathsf{H}_{21}^T \tilde{\mathbf{p}}_{s2}\,\mathbf{p}_{e2} = 0\,; \qquad \mathbf{p}_{e1}^T \mathsf{H}_{21}^T \tilde{\mathbf{p}}_{s2}\,\mathbf{p}_{e2} = 0. \qquad (2)$$

Therefore each pair of corresponding lines gives two homogeneous equations to compute the projective transformation, which can be determined up to a non-zero scale factor. Developing them as a function of the elements of the projective transformation, we have

$$\begin{pmatrix} A x_{s1} & A y_{s1} & A & B x_{s1} & B y_{s1} & B & C x_{s1} & C y_{s1} & C \\ A x_{e1} & A y_{e1} & A & B x_{e1} & B y_{e1} & B & C x_{e1} & C y_{e1} & C \end{pmatrix} \mathbf{h} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$

where h = (h11 h12 h13 h21 h22 h23 h31 h32 h33)^T is a vector with the elements of H21, and A = y_{s2} − y_{e2}, B = x_{e2} − x_{s2}, C = x_{s2} y_{e2} − x_{e2} y_{s2}. Using four line correspondences, we can construct an 8 × 9 matrix M. The solution for h is the eigenvector associated with the least eigenvalue (in this case the null eigenvalue) of the matrix M^T M. If we have more than the minimum number of good matches, an estimation method may be considered to obtain a better solution. Thus n matches give a 2n × 9 matrix M, and the solution h can be obtained using the singular value decomposition of this matrix [5].
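The construction of M and its solution by SVD can be sketched as follows, assuming numpy and homogeneous 3-vectors for the observed tips; the function name is ours, not from the paper:

```python
import numpy as np

def homography_from_lines(tips1, tips2):
    """Estimate H21 (p2 ~ H21 @ p1) from n >= 4 line correspondences.
    tips1, tips2: lists of (ps, pe) homogeneous tip pairs in images 1 and 2."""
    rows = []
    for (ps1, pe1), (ps2, pe2) in zip(tips1, tips2):
        n2 = np.cross(ps2, pe2)          # line in image 2 (eq. 1)
        A, B, C = n2                     # A = ys2-ye2, B = xe2-xs2, C = ...
        for p in (ps1, pe1):             # two equations per correspondence (eq. 2)
            x, y, w = p
            rows.append([A*x, A*y, A*w, B*x, B*y, B*w, C*x, C*y, C*w])
    M = np.asarray(rows)                 # 2n x 9 design matrix
    _, _, Vt = np.linalg.svd(M)          # h = right singular vector associated
    return Vt[-1].reshape(3, 3)          # with the least singular value of M

# usage sketch: H21 = homography_from_lines(tips1, tips2)
# lines then transfer as n2 ~ np.linalg.inv(H21).T @ n1
```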

It is known that a previous normalization of the data avoids problems of numerical computation. As our formulation only uses image coordinates of the observed tips, the data normalization proposed for points [9] has been used. In our case the normalization makes the x and y coordinates centered in an image of unitary width and height.

Equations (2) can be used as a residue to designate inliers and outliers with respect to a candidate homography. In this case the residue is an algebraic distance, and the relevance of each line depends on its observed length, because the cross product of the segment tips is related to the length. Alternatively, a geometric distance in the image can be used. It is obtained from the sum of the squared distances between the segment tips and the corresponding transformed line:

$$d^2 = \frac{(\mathbf{p}_{s1}^T \mathsf{H}_{21}^T \tilde{\mathbf{p}}_{s2}\mathbf{p}_{e2})^2}{(\mathsf{H}_{21}^T \tilde{\mathbf{p}}_{s2}\mathbf{p}_{e2})_x^2 + (\mathsf{H}_{21}^T \tilde{\mathbf{p}}_{s2}\mathbf{p}_{e2})_y^2} + \frac{(\mathbf{p}_{e1}^T \mathsf{H}_{21}^T \tilde{\mathbf{p}}_{s2}\mathbf{p}_{e2})^2}{(\mathsf{H}_{21}^T \tilde{\mathbf{p}}_{s2}\mathbf{p}_{e2})_x^2 + (\mathsf{H}_{21}^T \tilde{\mathbf{p}}_{s2}\mathbf{p}_{e2})_y^2}, \qquad (3)$$

where the subindexes ()_x and ()_y indicate the first and second components of the vector, respectively. Doing the same for both images and with both observed segments, the squared distance with geometric meaning is

$$d^2 = \frac{(\mathbf{p}_{s1}^T \mathsf{H}_{21}^T \tilde{\mathbf{p}}_{s2}\mathbf{p}_{e2})^2 + (\mathbf{p}_{e1}^T \mathsf{H}_{21}^T \tilde{\mathbf{p}}_{s2}\mathbf{p}_{e2})^2}{(\mathsf{H}_{21}^T \tilde{\mathbf{p}}_{s2}\mathbf{p}_{e2})_x^2 + (\mathsf{H}_{21}^T \tilde{\mathbf{p}}_{s2}\mathbf{p}_{e2})_y^2} + \frac{(\mathbf{p}_{s2}^T \mathsf{H}_{21}^{-T} \tilde{\mathbf{p}}_{s1}\mathbf{p}_{e1})^2 + (\mathbf{p}_{e2}^T \mathsf{H}_{21}^{-T} \tilde{\mathbf{p}}_{s1}\mathbf{p}_{e1})^2}{(\mathsf{H}_{21}^{-T} \tilde{\mathbf{p}}_{s1}\mathbf{p}_{e1})_x^2 + (\mathsf{H}_{21}^{-T} \tilde{\mathbf{p}}_{s1}\mathbf{p}_{e1})_y^2}. \qquad (4)$$
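A sketch of the symmetric residue (4), under the same assumptions and naming conventions as the previous sketch:

```python
import numpy as np

def segment_line_dist2(ps, pe, n):
    """Sum of squared point-line distances from segment tips to line n."""
    denom = n[0]**2 + n[1]**2
    return ((ps @ n)**2 + (pe @ n)**2) / denom

def symmetric_residue(H21, ps1, pe1, ps2, pe2):
    """Squared symmetric transfer distance of a line match under H21 (eq. 4)."""
    n2 = np.cross(ps2, pe2)              # observed line in image 2
    n1 = np.cross(ps1, pe1)              # observed line in image 1
    n2_in_1 = H21.T @ n2                 # line 2 mapped into image 1
    n1_in_2 = np.linalg.inv(H21).T @ n1  # line 1 mapped into image 2
    return (segment_line_dist2(ps1, pe1, n2_in_1) +
            segment_line_dist2(ps2, pe2, n1_in_2))
```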

3.2 Robust estimation

The least squares method assumes that all the measures can be interpreted with the same model, which makes it very sensitive to outliers. Robust estimation tries to avoid the outliers in the computation of the estimate. From the existing robust estimation methods [6], we have initially chosen the least median of squares method. This method searches the space of solutions obtained from subsets with the minimum number of matches. If we need a minimum of 4 matches to compute the projective transformation, and there is a total of n matches, then the search space is given by the combinations of n elements taken 4 at a time. As that is computationally too expensive, several subsets of 4 matches are randomly chosen. The algorithm to obtain an estimate with this method can be summarized as follows:

1. A Monte-Carlo technique is used to randomly select m subsets of 4 features.
2. For each subset S, we compute a solution in closed form, H_S.
3. For each solution H_S, the median M_S of the squares of the residues with respect to all the matches is computed.
4. We store the solution H_S which gives the minimum median M_S.

A selection of m subsets is good if at least one of them has no outliers. Assuming a ratio ε of outliers, the probability of one subset being good can be obtained [16] as

$$P = 1 - \left[1 - (1 - \varepsilon)^4\right]^m.$$

Thus, admitting a probability (1 − P) of failure, the number m of subsets to consider can be obtained. For example, if we want a probability P = 0.99 of one of them being good, with ε = 35% of outliers, the number of subsets m should be 24. With ε = 70% of outliers the number of subsets m should be 566, and a lower quantile instead of the median must be selected. In any case, a reduction of ten times in the probability of failure (P = 0.999) only increases the computational cost by about 50% (m = 36 with ε = 35% and m = 849 with ε = 70%).
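A minimal LMedS loop following these steps, reusing the hypothetical helpers homography_from_lines and symmetric_residue sketched above:

```python
import math
import numpy as np

def num_subsets(P=0.99, eps=0.35, k=4):
    """Number m of random k-subsets so that one is outlier-free with prob. P."""
    return math.ceil(math.log(1 - P) / math.log(1 - (1 - eps)**k))

def lmeds_homography(tips1, tips2, P=0.99, eps=0.35, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n = len(tips1)
    best_H, best_med = None, np.inf
    for _ in range(num_subsets(P, eps)):
        idx = rng.choice(n, size=4, replace=False)   # Monte-Carlo subset
        H = homography_from_lines([tips1[i] for i in idx],
                                  [tips2[i] for i in idx])
        res = [symmetric_residue(H, *tips1[i], *tips2[i]) for i in range(n)]
        med = np.median(res)             # or a lower quantile when eps > 50%
        if med < best_med:
            best_H, best_med = H, med
    return best_H, best_med
```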

3.3 Rejecting wrong basic matches

Once the solution has been obtained, the outliers can be identified as those with the highest residues, and good matches can be selected among those whose residue is smaller than a threshold. As in [6], the threshold is set proportional to the standard deviation of the error, which can be estimated [16] as

$$\hat{\sigma} = 1.4826\,\left[1 + 5/(n - 4)\right]\sqrt{M_S}.$$

Assuming that the measurement error of the inliers is Gaussian with zero mean and standard deviation σ, the square of the residues is a sum of squared Gaussian variables and follows a χ² distribution with 2 degrees of freedom. Allowing, for example, a 5% probability of rejecting a correct line, the threshold is fixed to 5.99 σ̂².
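The corresponding inlier selection, as a short sketch under the same assumptions (5.99 is the 95% quantile of the χ² distribution with 2 degrees of freedom):

```python
import numpy as np

def select_inliers(residues, med, n):
    """Keep matches whose squared residue is below 5.99 * sigma_hat^2."""
    sigma_hat = 1.4826 * (1.0 + 5.0 / (n - 4)) * np.sqrt(med)
    residues = np.asarray(residues)
    return np.flatnonzero(residues < 5.99 * sigma_hat**2)
```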

4 Final matches

From here on, we introduce the geometric constraint imposed by the estimated homography to get a bigger set of matches. Our objective is to obtain more good matches at the end of the process, also eliminating wrong matches given by the basic matching. Thus the final matches are composed of two sets. The first one is obtained from the matches selected after the robust computation of the homography which additionally pass an overlapping test compatible with the transformation of the segment tips. The second set of matches is obtained by taking all the segments not matched initially and those rejected previously. With this set of lines, a matching process similar to the basic matching is carried out. However, the matching is now made to the nearest neighbor segment transformed with the homography. The transformation is applied to the end tips of the image segments using the homography H21 to find not only compatible lines but also compatible segments on the same line (a sketch of this tip transfer is given below).

In the first stage of the matching process there was no knowledge of the camera motion. In this second step, however, the computed homography provides information about the expected disparity, and therefore the uncertainty of the geometric variations can be reduced. So a new tuning of σ_xm, σ_ym, σ_θ and σ_l is considered. To automate the process, a global reduction of these parameters has been proposed and tested in several situations, obtaining good results with reductions of about 1/5. As the measurement noise (σ_k and σ_⊥) has not changed, the initial tuning of these parameters is maintained in this second step. Note that the brightness compatibility set is the one initially computed, and therefore that test is not repeated.
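A sketch of the tip transfer and of a simple 1-D overlap test; the names are ours, and the actual overlap criterion used in the paper may differ in detail:

```python
import numpy as np

def transfer_tips(H21, ps1, pe1):
    """Map segment tips from image 1 into image 2 and renormalize."""
    ps2 = H21 @ ps1
    pe2 = H21 @ pe1
    return ps2 / ps2[2], pe2 / pe2[2]

def segments_overlap(ps_a, pe_a, ps_b, pe_b):
    """1-D overlap test of two (nearly) collinear segments."""
    d = pe_b[:2] - ps_b[:2]
    d = d / np.linalg.norm(d)                   # unit direction of segment b
    ta = sorted((d @ ps_a[:2], d @ pe_a[:2]))   # projections of segment a
    tb = sorted((d @ ps_b[:2], d @ pe_b[:2]))   # projections of segment b
    return ta[0] <= tb[1] and tb[0] <= ta[1]
```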

5 Different planes

When lines lie on different planes, the corresponding homographies can be computed iteratively. We have explained above how to determine a homography assuming that some of the lines are outliers. The first homography can be computed in this way, assuming that a certain percentage of the matched lines are good matches belonging to one plane, and that the other matches are bad or cannot be explained by this homography. We have no a priori knowledge about which plane of the scene will be extracted first, but the probability of a plane being chosen increases with the number of lines on it.

The most reasonable way to extract a second plane is to eliminate the lines used to determine the first homography. Here we assume that lines belonging to the first plane do not belong to the second, which is true except for the intersection line between both planes. This idea of eliminating lines already used also applies if more planes are extracted. On the other hand, in the multi-plane case the percentage of outliers for each homography will be high even if all the matches are good. As the least median of squares method has a breakdown point at 50% of outliers, we have used in this case the Ransac method [17], which needs a tuning threshold but works with a less demanding breakdown point.

Besides that, all the homographies obtained from an image pair are related to each other. This relation can be expressed with the fundamental matrix (F21) and the epipole (e2), which is unique for an image pair, through the following equation:

$$\mathsf{F}_{21} = [\mathbf{e}_2]_\times \mathsf{H}_{21}^{\pi}, \qquad (5)$$

[e2]× being the skew matrix corresponding to the epipole e2 of the second image, and H^π_21 the homography between the first and second images through the plane π. If at least two homographies (H^{π1}_21, H^{π2}_21) can be computed between both images, corresponding to two planes (π1, π2), there exists a homology H = H^{π1}_21 · (H^{π2}_21)^{-1}, that is, a mapping from one image into itself. It turns out that the homology has two equal eigenvalues [5]; the third one is related to the motion and to the structure of the scene. These eigenvalues can be used to test whether two different planes have been computed, and then the epipole and the intersection of the planes can also be obtained. The epipole is the eigenvector corresponding to the non-unit eigenvalue of the homology, and the other two eigenvectors define the intersection line of the planes [18].

In case of small baseline, or if there exists only one plane in the scene, the epipolar geometry is not defined and only one homography can be computed, so the homology H will be close to the identity, up to scale. In practice, we propose a filter to check the consistency of the homology using these ideas. First, we normalize the homology by dividing by the median eigenvalue. If there are not two unit eigenvalues, up to a threshold, the computation of the second homography is repeated. If the three eigenvalues are similar, we check whether the homology is close to the identity; in that case two similar homographies would explain the scene or the motion, and the fundamental matrix cannot be computed. Otherwise, two planes and two coherent homographies are given as the result of the process, and from them the fundamental matrix can be computed [12]. A sketch of this homology filter is given below.
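A hedged sketch of the homology filter and of the computation of F21 through eq. (5); the tolerance value and the handling of complex eigenvalues are our assumptions:

```python
import numpy as np

def skew(e):
    """Skew-symmetric matrix [e]x of a 3-vector e."""
    return np.array([[0, -e[2], e[1]],
                     [e[2], 0, -e[0]],
                     [-e[1], e[0], 0]])

def homology_filter(H_pi1, H_pi2, tol=0.1):
    """Check two plane homographies; return (F21, epipole) or None."""
    H = H_pi1 @ np.linalg.inv(H_pi2)        # homology mapping an image to itself
    w, V = np.linalg.eig(H)
    wr = w.real / np.median(w.real)         # normalize by the median eigenvalue
    order = np.argsort(np.abs(wr - 1.0))    # two eigenvalues should be ~1
    if abs(wr[order[1]] - 1.0) > tol:
        return None                         # inconsistent: recompute 2nd plane
    if abs(wr[order[2]] - 1.0) <= tol:
        return None                         # H ~ identity: a single plane
    e2 = V[:, order[2]].real                # epipole: non-unit eigenvector
    return skew(e2) @ H_pi1, e2             # fundamental matrix via eq. (5)
```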

6 Particular case: Vertical lines

A special case of interest in some applications is obtained by reducing the problem by one dimension. For example, when a robot moves in man-made environments, the motion is on a plane and vertical lines give enough information to carry out visual control. For vertical lines, only the x coordinate is relevant to compute the homography, so the problem is simplified. The homography now has three parameters, and each vertical line gives one equation, so three vertical line correspondences are enough. The matching is quite similar to that presented for lines in all directions. In this case the parameter referring to the orientation of the lines is discarded because it is meaningless, although the lines are grouped in two incompatible subsets: those having 90 degrees of orientation and those with 270 degrees of orientation.

6.1 Estimation of the homography

In this case, as the dimension of the problem is reduced, only the x_m coordinate is relevant for the computation of the projective transformation H21. We can use the point-based formulation, which is simpler, because a vertical line can be projectively represented as a point with (x_m, 1) coordinates. Then we can write, up to a scale factor,

$$\lambda \begin{pmatrix} x_{m2} \\ 1 \end{pmatrix} = \begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix} \begin{pmatrix} x_{m1} \\ 1 \end{pmatrix}.$$

Therefore each pair of corresponding vertical lines, having coordinates x_{m1} and x_{m2} respectively, provides one equation to solve for the matrix H21:

$$\begin{pmatrix} x_{m1} & 1 & -x_{m1} x_{m2} & -x_{m2} \end{pmatrix} \begin{pmatrix} h_{11} \\ h_{12} \\ h_{21} \\ h_{22} \end{pmatrix} = 0.$$

With the coordinates of at least n = 3 vertical line correspondences, we can construct an n × 4 matrix M. The homography solution is the eigenvector associated with the least eigenvalue of the matrix M^T M. As before, we use a robust method to compute the projective transformation. With vertical lines the problem is computationally simpler: for example, with P = 0.99 and 70% of outliers, only 168 sets of 3 matches must be considered (in the general case we need 566 sets of 4 matches). As noted above, the estimated homography is also used to obtain a better set of matched lines.
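A sketch of this reduced estimation, assuming numpy; the function names are illustrative:

```python
import numpy as np

def homography_from_vertical_lines(xm1, xm2):
    """Estimate the 2x2 homography (up to scale) from n >= 3 vertical lines."""
    xm1, xm2 = np.asarray(xm1, float), np.asarray(xm2, float)
    M = np.stack([xm1, np.ones_like(xm1), -xm1 * xm2, -xm2], axis=1)  # n x 4
    _, _, Vt = np.linalg.svd(M)          # least-eigenvector of M^T M
    return Vt[-1].reshape(2, 2)          # (h11 h12; h21 h22)

def transfer_x(H, x):
    """Map a vertical line coordinate from image 1 to image 2."""
    return (H[0, 0] * x + H[0, 1]) / (H[1, 0] * x + H[1, 1])
```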

7 Experimental Results

A set of experiments with different kinds of images has been carried out to test the proposal. The images correspond to different applications: indoor robot homing, images of buildings, and aerial images. In the algorithms there are extraction parameters to obtain more or fewer line segments according to their minimum length and minimum gradient. There are also parameters to match the lines, whose tuning has turned out to be simple and quite intuitive. In the experiments we have used the following tuning parameters: σ⊥ = 1, σk = 10, σagl = 8, σc = 4, σxm = 60, σym = 20, σθ = 2, σl = 10, or small variations with respect to them depending on the expected image disparity.

We have applied the presented algorithm to robot homing. In this application the robot must go to previously learnt positions using a camera [10]. The robot corrects its heading from the computed projective transformation between target and current images.

Robot Rotation | σxm | Basic    | After H21 | Final
2°             | 60  | 92 (1W)  | 78 (0W)   | 90 (0W)
4°             | 60  | 73 (5W)  | 56 (1W)   | 76 (0W)
6°             | 60  | 63 (4W)  | 47 (1W)   | 63 (0W)
8°             | 60  | 53 (6W)  | 31 (0W)   | 52 (0W)
10°            | 60  | 47 (9W)  | 35 (1W)   | 50 (0W)
12°            | 100 | 41 (9W)  | 30 (2W)   | 33 (1W)
14°            | 100 | 36 (12W) | 24 (2W)   | 32 (1W)
16°            | 100 | 27 (9W)  | 17 (3W)   | 30 (1W)
18°            | 140 | 37 (14W) | 24 (3W)   | 28 (1W)
20°            | 140 | 28 (10W) | 17 (1W)   | 24 (0W)

Table 1: Number of matches for several robot rotations, also indicating the number of wrong matches (W). The first column shows the robot rotation; the second, the tuning of σxm used in the matching; the third, the basic matches; the fourth, the matches obtained after the constraint imposed by the homography; and the fifth, the final matches. Here, matches that are good as lines but wrong as segments (they do not overlap) are counted as wrong matches.

In this experiment a set of robot rotations (from 2 to 20 degrees) has been made. The camera center is about 30 cm away from the rotation axis of the robot, so this camera motion has a short baseline. Table 1 shows the number of matches in the successive steps for this set of camera motions. The number of lines extracted in the reference image is 112. The progressive advantage of the simultaneous computation of the homography and the matching can be seen here. When the image disparity is small, the robust estimation of the homography does not improve the basic matching. However, with a disparity close to 70% of the image size, the basic matching produces a high ratio of wrong matches (> 30%), which are automatically corrected in the final matching. We observe that in this case the system works even between two images with large disparity. To simplify, only the images corresponding to 18 degrees of robot rotation are shown (Fig. 2). The basic matching gives 38% of wrong matches; at the final matching stage, all the matches are good when considered as lines, although one of them can be considered wrong as a segment.

We have also used two aerial images with large baseline (Fig. 3). In photogrammetry applications the putative matching usually has a high ratio of spurious results. This is confirmed in our case, where the basic matching has given a ratio of wrong matches higher than 50%, which is the theoretical limit of the least median of squares method. However, if we select a smaller quantile of the squares of the residues instead of the median, the robust method works properly. The results have been obtained with the least 30%-quantile of squares (Fig. 3). The robust computation of the homography provides 55 matches, 11 of them being wrong as segments but good as infinite lines. Among the 105 final matches there are fewer than 3% wrong matches, which correspond to contiguous cars. This means that the final matches nearly double those obtained with the homography. Note also that the selected final matches are mainly located on the ground. There are some lines on the roofs of the buildings, but they are nearly parallel to the flight direction of the camera, which is coherent with the homography model used.



Figure 2: Images with 18 degrees of robot rotation. The first row shows the matches after the basic matching (37 matches, 14 wrong). The second row shows the matches at the final stage of the process, only one being a wrong match as a segment although it is good as a line (number 10).



Figure 3: Two aerial images with large baseline. In the first row, the extracted lines are superimposed on the images (approximately 300 lines per image). The second row shows the basic matches (121 matches, 64 wrong). The third row shows the matches at the final stage (105 matches, 3 wrong, corresponding to contiguous cars).



Figure 4: Two examples of image pairs with automatically matched vertical lines.

        | Basic Matches | Good | Final Matches | Good
House   | 148           | 75%  | 114.6 (8.91)  | 99% (1%)
College | 196           | 82%  | 156.7 (11.09) | 96% (1%)

Table 2: Number of matches of the scenes in Figs. 5 and 6. Only the lines on the two main planes are considered. We have repeated the experiments fifty times using the same basic matches, showing as final matches the mean values and, in brackets, the standard deviation.

We also show two examples of matched vertical lines with a homography of reduced dimension (Fig. 4). In both cases the percentage of good final matches is close to 100%. Using vertical lines, we have developed a real-time implementation for robotic applications in man-made environments [10].

Other experiments have been carried out to match lines lying on several planes, simultaneously segmenting the planes (Figs. 5, 6). In Table 2 we show the number of matches and the ratio of good matches for 50 repetitions of the robust computation. In this case, once a homography has been computed, the robust homography computation and the match-growing process have been iteratively repeated twice. As can be seen, the number and quality of the final matches are very good. The homology filter described in section 5 has been used to detect situations where a second plane is not obtained, and therefore a sole homography can be computed, or where the homographies do not give a right homology due to noise or bad extraction. It can be seen that some lines of the second plane have been selected by the first homography (Fig. 6). The reason is that these lines are nearly coincident with epipolar lines, so they can be selected by any homography compatible with the fundamental matrix.

The method also works with rotated images. In this case σθ must be changed according to the image disparity. We show an example of a rotated image pair with short baseline where a sole homography gives very good results (Fig. 7).


Figure 5: Images of the house. First row: basic matches. Second row: final matches corresponding to the first homography. Third row: final matches corresponding to the second homography. (Original images from KTH, Stockholm.)



Figure 6: Images of the college. First row: basic matches. Second row: final matches corresponding to the first homography. Third row: final matches corresponding to the second homography. (Original images from VGG, Oxford.)



Figure 7: Pair of rotated short baseline images and final line correspondences.


8 Conclusions

We have presented and tested a method to automatically obtain line matches simultaneously with the robust computation of homographies in image pairs. The robust computation works especially well to eliminate outliers, which may appear when the matching is based on image properties and there is no information about scene structure or camera motion. The homographies are computed from the extracted lines, and the use of lines has advantages with respect to the use of points. The geometric mapping between uncalibrated images provided by the homography turns out to be useful to grow matches and to eliminate wrong ones. All the work is automatic, with only some previous tuning of parameters related to the expected global image disparity.

As seen in the experiments, the proposed algorithm works with different types of scenes. With planar scenes, or in situations where the disparity is mainly due to rotation, the method gives good results with only one homography. However, it is also possible to compute several homographies when several planes, which are iteratively segmented, appear in the scene. In this case the fundamental matrix can be computed from lines through the homographies, obtaining a constraint for the whole scene. Currently we are working with other image invariants to improve the first stage of the process.

Acknowledgments

This work was supported by project DPI2003-07986.

References

[1] C. Schmid and A. Zisserman, "Automatic Line Matching across Views," in IEEE Conference on CVPR, pp. 666–671 (1997).

[2] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision 60(2), 91–110 (2004).

[3] H. Bay, V. Ferrari, and L. V. Gool, "Wide-Baseline Stereo Matching with Line Segments," in IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pp. 329–336 (2005).

[4] P. Pritchett and A. Zisserman, "Wide Baseline Stereo Matching," in IEEE Conference on Computer Vision, pp. 754–760 (1998).

[5] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision (Cambridge University Press, Cambridge, 2000).

[6] Z. Zhang, "Parameter Estimation Techniques: A Tutorial with Application to Conic Fitting," Rapport de recherche RR-2676, I.N.R.I.A., Sophia-Antipolis, France (1995).

[7] P. Torr and D. Murray, "The Development and Comparison of Robust Methods for Estimating the Fundamental Matrix," International Journal of Computer Vision 24, 271–300 (1997).


[8] L. Quan and T. Kanade, "Affine Structure from Line Correspondences with Uncalibrated Affine Cameras," IEEE Trans. on Pattern Analysis and Machine Intelligence 19(8), 834–845 (1997).

[9] R. Hartley, "In Defense of the Eight-Point Algorithm," IEEE Trans. on Pattern Analysis and Machine Intelligence 19(6), 580–593 (1997).

[10] J. Guerrero, R. Martinez-Cantin, and C. Sagüés, "Visual map-less navigation based on homographies," Journal of Robotic Systems 22(10), 569–581 (2005).

[11] A. Habib and D. Kelley, "Automatic relative orientation of large scale imagery over urban areas using Modified Iterated Hough Transform," Journal of Photogrammetry and Remote Sensing 56, 29–41 (2001).

[12] C. Sagüés, A. Murillo, F. Escudero, and J. Guerrero, "From Lines to Epipoles Through Planes in Two Views," Pattern Recognition (2006), to appear.

[13] J. Guerrero and C. Sagüés, "Robust line matching and estimate of homographies simultaneously," in IbPRIA, Pattern Recognition and Image Analysis, LNCS 2652, pp. 297–307 (2003).

[14] J. Burns, A. Hanson, and E. Riseman, "Extracting Straight Lines," IEEE Trans. on Pattern Analysis and Machine Intelligence 8(4), 425–455 (1986).

[15] R. Deriche and O. Faugeras, "Tracking Line Segments," in First European Conference on Computer Vision, pp. 259–268 (Antibes, France, 1990).

[16] P. Rousseeuw and A. Leroy, Robust Regression and Outlier Detection (John Wiley, New York, 1987).

[17] M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography," Comm. of the ACM 24(6), 381–395 (1981).

[18] L. Zelnik-Manor and M. Irani, "Multiview Constraints on Homographies," IEEE Transactions on Pattern Analysis and Machine Intelligence 24(2), 214–223 (2002).

