A New Global Alignment Method for Feature Based Image Mosaicing

A. Elibol, R. Garcia, O. Delaunoy, and N. Gracias
Computer Vision and Robotics Group, University of Girona, 17071, Spain
{aelibol,rafa,delaunoy,ngracias}@eia.udg.edu
Abstract. Over the past decade, image mosaicing has become an important tool in several areas such as panoramic photography, mapping, scene stabilization, video indexing and compression. Although recent advances in the detection of image correspondences have resulted in very good image registration, global alignment is still needed to obtain a globally coherent mosaic. Normally, global alignment requires the non-linear minimization of an error term, which is defined from image correspondences. In this paper, a new global alignment method is presented. It works on the mosaic frame and does not require any non-linear optimization. The proposed method has been tested with several image sequences and comparative results are presented to illustrate its performance.
G. Bebis et al. (Eds.): ISVC 2008, Part II, LNCS 5359, pp. 257–266, 2008. © Springer-Verlag Berlin Heidelberg 2008

1 Introduction

Mosaics are very useful for many applications such as panoramic photography [1], mapping [2], scene stabilization [3], video indexing [4] and compression [5]. Illumination effects, noise, lack of image contrast and blurring in some parts of the image make image registration difficult, and errors in image registration cause misalignment when images are mapped onto a mosaic (i.e. global) frame. The global projection of an image (its global or absolute homography) can be calculated by successively multiplying the relative homographies between time-consecutive images. Since the homographies contain errors that depend on the positions of the correspondences and on the estimation method, accumulating the time-consecutive ones amplifies the error, which appears as misalignment and distortion in the mosaic. To deal with this problem, global alignment methods are needed. In the context of this paper, we refer to global alignment as the problem of finding the image registration parameters that best comply with the constraints given by image matches. These matches can be time-consecutive or not.

Several methods to solve the global alignment problem have been proposed in the literature [6,7,8,9,10,11]. Capel [6] formulates the global alignment problem in such a way that not only the absolute homographies describing the global projection of every image, but also the locations of the features on the mosaic frame, are unknowns. However, as the data set gets bigger, the total number of unknowns increases dramatically. This may cause problems during minimization. In addition, this method has not been tested with low overlapping images. Kang et al. [7] and Marzotto et al. [8] showed that using a grid of points on the mosaic frame produces good results. Although the use of a grid of points has the advantage of a well distributed error, it has some disadvantages: (1) the point locations must be defined very carefully so that every image and every overlapping area has enough grid points to calculate the homography, and (2) since the points are distributed arbitrarily, they may fall in a textureless area, making it difficult to match them with the corresponding points in another image.

Sawhney et al. [9] defined an error function on the mosaic frame and minimized the distances between correspondences when they are mapped to the mosaic frame. Unfortunately, when the minimization is carried out on the mosaic frame, the solution tends to reduce the size of the mosaiced images, since reducing their size also decreases the error term. In order to avoid this problem, they introduced an additional term that penalizes changes in the image size when the images are mapped to the mosaic frame. However, constraining the image sizes causes some misalignments between images. Similarly, Can et al. [10] proposed a linear joint estimation of two combined error terms. The first term measures errors against feature locations in the image frame that is chosen as the global frame, and the second term measures the distance between correspondences when they are mapped onto the mosaic frame. However, minimizing the proposed error term is no longer linear for projective homographies. Therefore, the method cannot cope with projective homographies.

More recently, Cervantes and Kang [11] have presented a technique very similar to our proposed method. Their method uses image-to-mosaic registration, known as "online mosaicing". In this case, if an error occurs while mapping one image onto the mosaic, future image-to-mosaic registration is likely to fail. In this paper, we analyze Capel's method in depth and propose a novel global alignment method.
This new method works on the mosaic frame and makes use of features tracked along images. It does not require any kind of non-linear optimization. The method is based on two iterative phases. First, for each scene point that projects into the mosaic, it approximates its projection onto the mosaic frame. Then, using these approximated positions, the global projection of all images is recalculated. This procedure prevents the images from suffering the down-scaling effect. The rest of the paper is organized as follows: the following section summarizes some image mosaicing and global alignment aspects. Then, section 3 is dedicated to explaining the proposed method. Some results are illustrated in section 4 and, finally, we present our conclusions in the last section.
2 Image Mosaicing

2.1 Feature-Based Image Registration
Image registration is the process of overlaying two or more images of the same scene taken at different times and from different viewpoints. The registration process geometrically aligns the images, and several approaches exist to register images [12].
Some registration methods rely on the detection of salient features using Harris [13], Hessian [14] or Laplacian [15] detectors. These features are detected in the two images to be registered, and then a correlation or SSD (sum of squared differences) measure is computed around each feature for each assumed geometric transformation of the image. This had been the trend for several years, until the SIFT (Scale Invariant Feature Transform) algorithm was proposed by Lowe [16]. The satisfactory results of this method have greatly sped up the development of salient point detectors and descriptors, taking feature-based matching techniques to the forefront. Compared to all formerly proposed schemes, SIFT and subsequently developed methods such as SURF [17] demonstrate considerably greater invariance to image scaling, rotation, changes in illumination and changes in 3D camera viewpoint. These methods solve the correspondence problem through a pipeline that involves (1) feature detection, (2) feature description and (3) descriptor matching. In both methods, feature detection is based on Hessian or Laplacian detectors (the "Difference of Gaussians" of SIFT is an approximation to the Laplacian, and SURF uses an approximation to the Hessian). Feature description exploits gradient information at particular orientations and spatial frequencies (see [18] for a detailed survey on descriptors). Finally, the matching of features is normally based on the Euclidean distance between their descriptors. In this way, corresponding points are detected in each pair of overlapping images. The initial matching frequently produces some incorrect correspondences, which are called outliers. Outliers should be rejected with a robust estimation algorithm (e.g. RANSAC [19] or LMedS [20]). These algorithms are used to estimate the image motion that agrees with the largest number of points. Outliers are identified as the points that do not follow that motion.
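The robust estimation step described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the implementation used in the paper: the function names, the 2-pixel threshold, the fixed iteration count and the unnormalized direct linear transform (DLT) fit are all assumptions made for the example.

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares DLT estimate of H such that dst ~ H @ src (needs >= 4 points)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # H (up to scale) is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def transfer(h, pts):
    """Map an (N, 2) array of points through the homography h."""
    p = np.column_stack([pts, np.ones(len(pts))]) @ h.T
    with np.errstate(divide="ignore", invalid="ignore"):
        return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, n_iter=500, thresh=2.0, seed=0):
    """Return (H, inlier_mask); thresh is a transfer error in pixels (assumed value)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        # Hypothesize a motion from a minimal sample of four correspondences.
        sample = rng.choice(len(src), size=4, replace=False)
        h = fit_homography(src[sample], dst[sample])
        err = np.linalg.norm(transfer(h, src) - dst, axis=1)
        inliers = err < thresh  # NaN/inf errors from degenerate samples count as outliers
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 4:
        raise ValueError("not enough inliers to fit a homography")
    # Final estimate from all points that follow the dominant motion.
    return fit_homography(src[best], dst[best]), best
```

In practice the point coordinates are usually normalized before the DLT fit to improve conditioning; that step is omitted here to keep the sketch short.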
After outlier rejection, the homography can be computed from the inliers.

2.2 Global Alignment
The quality constraints on image mosaics, especially for mapping purposes, are very strict, as the mosaic will be used for global navigation, the localization of areas of interest and the detection of temporal changes. Hence, highly accurate image registration methods are needed, but they are not enough to ensure reliable maps over long image sequences. Mosaic creation requires the automatic inference of the path topology in order to detect non-consecutive overlapping images. In the absence of other sensor data, the motion estimated between time-consecutive images is the relevant information for estimating the topology of a complete sequence, that is, for finding the non-time-consecutive overlapping images that provide useful information for global alignment. Although the local alignment between images might be good, global alignment is still needed due to error propagation. In particular, the errors in the estimated motion have to be dealt with when the camera visits an area that has already been visited. The aim of global alignment is to overcome cumulative error and build a seamless and well aligned mosaic.

Let ${}^{k-1}H_{k}$ denote the relative homography between the $k$-th and $(k-1)$-th images in a sequence. If the first image of the sequence is chosen as the global frame, the global projection of image $k$ into the mosaic frame is denoted as ${}^{1}H_{k}$ and called the absolute homography. This homography can be calculated by composing the transformations

$${}^{1}H_{k} = {}^{1}H_{2} \cdot {}^{2}H_{3} \cdot \ldots \cdot {}^{k-1}H_{k}.$$

Unfortunately, estimated correspondences between image pairs are subject to localization errors due to noise, illumination effects, etc., and the accuracy of the resulting homography depends on the selected estimation method. Therefore, relative homographies contain errors, and computing absolute homographies from the relative ones in a cascade product results in cumulative error.
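The cascade product and the resulting drift can be illustrated with a small sketch. The sequence below is hypothetical (a pure translation of 10 pixels per frame whose pairwise estimate is biased by 0.1 pixels); the helper names are ours and are not taken from the paper.

```python
import numpy as np

def absolute_homographies(relative):
    """Chain relative homographies [1H2, 2H3, ...] into [1H1, 1H2, 1H3, ...]."""
    absolute = [np.eye(3)]  # the first image is the mosaic frame
    for h in relative:
        absolute.append(absolute[-1] @ h)
    return absolute

def translation(tx, ty):
    """Homography of a pure image translation (illustrative helper)."""
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])

# Hypothetical sequence: the true motion is 10 px per frame, but every
# pairwise estimate is off by 0.1 px.
true_abs = absolute_homographies([translation(10.0, 0.0)] * 100)
est_abs = absolute_homographies([translation(10.1, 0.0)] * 100)

# Position error of the last image in the mosaic frame.
drift = est_abs[-1][0, 2] - true_abs[-1][0, 2]
print(f"drift after 100 frames: {drift:.2f} px")
```

Even though each pairwise registration is accurate to a tenth of a pixel in this example, the last image ends up about ten pixels away from its true position, which is the kind of cumulative error that global alignment is meant to remove.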
3 Iterative Global Alignment
Our idea is inspired by a method proposed by Capel [6] that simultaneously estimates both the homographies and the positions of the features on the mosaic image. In this method, the same feature point correspondences need to be identified over all views. This requires feature tracking. The $i$-th interest point on image $k$, ${}^{k}x_{i}$, is the projection of a mosaic point ${}^{m}x_{j}$, which is called the pre-image point and is usually also projected into other views. All the image points that correspond to the projection of the same pre-image point are called N-view matches. The cost function to be minimized is defined as

$$E = \sum_{j=1}^{M} \sum_{{}^{k}x_{i} \in \eta_{j}} \left\| {}^{k}x_{i} - {}^{k}H_{m} \cdot {}^{m}x_{j} \right\|^{2} \qquad (1)$$
where $M$ is the total number of pre-image points, $\eta_{j}$ is the set of N-view matches of the $j$-th pre-image point and ${}^{k}H_{m}$ is a mosaic-to-image homography¹. In Eq. (1), both the homographies and the pre-image points are unknowns. The total number of unknowns is (DOF of homography) × (number of views) + 2 × (number of pre-image points). Eq. (1) can be minimized iteratively by applying non-linear least squares methods. To avoid the scaling effect on images, the error term in Eq. (1) is defined on the image frame, although the positions of the points on the mosaic frame are also estimated. The error term in Eq. (1) can be transferred to the mosaic frame as follows:

$$E = \sum_{j=1}^{M} \sum_{{}^{k}x_{i} \in \eta_{j}} \left\| {}^{m}x_{j} - {}^{m}H_{k} \cdot {}^{k}x_{i} \right\|^{2} \qquad (2)$$
where ${}^{m}H_{k}$ is equal to $({}^{k}H_{m})^{-1}$. Direct minimization of the error term in Eq. (2) scales down the images, which is a problem since the smaller the image size, the smaller the error. Analyzing the error term in Eq. (2), we see that an iterative minimization is possible and that non-linear optimization can be avoided.

¹ m stands for the mosaic frame. This frame can be one of the image frames or a different arbitrary coordinate frame. In this study, we have chosen the first image frame as the mosaic frame; therefore, m is equal to 1. In order to keep the generality, we have used m in the formulations.

The idea is to divide the minimization into two steps. The first step is to minimize the error by considering the homography values to be constant, so they are not taken into account as unknowns. The problem is then reduced to a special case (one free point) of the quadratic placement problem [21], which has an analytic solution: the average of the fixed points under the Euclidean norm. The coordinates of the pre-image points (${}^{m}x_{j}$) in the mosaic frame can therefore be found as the mean of the positions of the corresponding image points mapped by their absolute homographies. In this first step, as the homographies are constant, the term ${}^{m}H_{k} \cdot {}^{k}x_{i}$ in Eq. (2) is known and the equation can be rewritten as follows:

$$E = \sum_{j=1}^{M} \sum_{{}^{k}x_{i} \in \eta_{j}} \left\| {}^{m}x_{j} - {}^{m}x_{i}^{k} \right\|^{2} \qquad (3)$$
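where ${}^{m}x_{i}^{k} = {}^{m}H_{k} \cdot {}^{k}x_{i}$. In Eq. (3), only the real positions of the features in the mosaic frame are unknown. The derivative of Eq. (3) with respect to each feature point is calculated and set equal to zero, which gives

$${}^{m}\hat{x}_{j} = \frac{1}{N_{j}} \sum_{{}^{k}x_{i} \in \eta_{j}} {}^{m}x_{i}^{k} \qquad (4)$$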
where $N_{j}$ is the total number of images in which the feature point ${}^{m}\hat{x}_{j}$ appears. After estimating the real positions of the feature points, the second step is to recalculate new absolute homographies using the new point set $({}^{k}x_{i}, {}^{m}\hat{x}_{j})$. These two steps can be executed iteratively until a selected stopping criterion is fulfilled. A typical stopping criterion is to set a threshold on the decrease rate of the error term $E$. This approach has two main advantages over existing methods. First, it avoids non-linear optimization by alternating two linear steps. This is relevant in the case of large mosaics: as there is no non-linear optimization, the computational cost is very low and the method is therefore faster. Second, there is no limitation on the size of the data set, so our approach can easily be employed for large data sets.
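The two alternating steps can be sketched in Python with NumPy as follows. This is an illustrative sketch under stated assumptions, not the implementation evaluated in the paper: the observation layout (`img_idx`, `pt_idx`, `xy`), the function names, the tolerance and the use of an unnormalized DLT fit for the homography update are all choices made for the example. It assumes that every image has at least four observations and that every point index is observed at least once.

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares DLT estimate of H such that dst ~ H @ src (needs >= 4 points)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def map_points(H, img_idx, xy):
    """Map each observation xy[o] to the mosaic with its own homography H[img_idx[o]]."""
    p = np.column_stack([xy, np.ones(len(xy))])
    q = np.einsum("nij,nj->ni", H[img_idx], p)
    return q[:, :2] / q[:, 2:3]

def iterative_alignment(H0, img_idx, pt_idx, xy, n_iter=20, tol=1e-6):
    """Alternate Eq. (4) and a linear homography update.

    H0      : (K, 3, 3) initial image-to-mosaic homographies
    img_idx : (N,) image index k of each observation
    pt_idx  : (N,) pre-image point index j of each observation
    xy      : (N, 2) observed image coordinates
    """
    H = np.array(H0, dtype=float)
    n_points = pt_idx.max() + 1
    counts = np.bincount(pt_idx, minlength=n_points)[:, None]  # N_j in Eq. (4)
    history = []
    for _ in range(n_iter):
        # Step 1: homographies fixed, each mosaic point is the mean of its mapped observations.
        mapped = map_points(H, img_idx, xy)
        x_hat = np.zeros((n_points, 2))
        np.add.at(x_hat, pt_idx, mapped)
        x_hat /= counts
        # Error term of Eq. (3) for the current homographies.
        history.append(float(np.sum((x_hat[pt_idx] - mapped) ** 2)))
        # Stop when the relative decrease of E falls below the tolerance.
        if len(history) > 1 and history[-2] - history[-1] <= tol * history[-2]:
            break
        # Step 2: points fixed, re-estimate every absolute homography linearly.
        for k in range(len(H)):
            sel = img_idx == k
            H[k] = fit_homography(xy[sel], x_hat[pt_idx[sel]])
    return H, x_hat, history
```

Both steps reduce to averaging and to one small singular value decomposition per image, so no Jacobian of the full problem is ever formed; this is the design property that the text above credits for the low computational cost.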
4 Experimental Results
We have tested our method using three different image sequences. Capel's method was implemented in the MATLAB environment. The optimization algorithm requires the computation of the Jacobian matrix that contains the derivatives of all residuals with respect to all parameters. This Jacobian matrix is very sparse since each residual only depends on a very small number of parameters. Furthermore, it has a clearly defined block structure, and the sparsity pattern is constant [6].

The first image sequence² uses the same data set as [22] and is also similar to the one used in [6]. There is no translation; the camera just rotates around its optical axis. The total number of images is 145. SIFT [16] is used to detect and match features between images. Then, RANSAC [19] is used to reject outliers and estimate the homography between images. The total number of overlapping image pairs is 2997. Features are tracked along the images using the initial estimation of the topology. The total number of tracked features is 3956 and the total number of correspondences among overlapping image pairs is 128439. Figs. 1, 2(a) and 2(b) show the initial estimation and the resulting mosaics. The mean reprojection error calculated using all correspondences of the initial estimation is 13.29 pixels. For Capel's method, the error is 2.21 pixels and the running time is 2249.15 seconds. For the proposed method, the error is 2.19 pixels and the required time is 240.91 seconds.

² Data set is available at http://www.soe.ucsc.edu/~davis/panorama/

Fig. 1. Initial estimation of the church sequence

(a) Capel's method (b) Proposed method
Fig. 2. Resulting mosaics of the church sequence

The second image sequence³ is from Marzotto et al. [8]. This time the camera moves arbitrarily: it is not only rotating but also translating. The sequence is composed of 50 images of size 340 × 242. The total number of overlapping image pairs is 853, and the total numbers of correspondences and tracked features are 91628 and 1344, respectively. The initial estimation and the resulting mosaics are given in Figs. 3, 4(a) and 4(b). The average reprojection error of the initial estimation is 6.68 pixels. Capel's method required 176.93 seconds and the average reprojection error was 0.723 pixels. The proposed method required 13.59 seconds and the average error was 0.729 pixels.

³ Data set is available at http://profs.sci.univr.it/~fusiello/demo/hrm/

Fig. 3. Initial estimation of the document sequence

(a) Capel's method (b) Proposed method
Fig. 4. Resulting mosaics of the document sequence

The last image sequence was acquired by an underwater robot carrying a downward-looking camera. This set consists of 159 images of size 376 × 280 and covers approximately 18 m². Before applying our method, the images were compensated for radial distortion. Thirty-two key frames were selected by requiring at least 50 percent overlap. Then, non-time-consecutive overlapping image pairs were found. The total number of overlapping image pairs between key frames is 150. Fig. 5(a) shows the initial estimation calculated by accumulation. The average reprojection error is 71.44 pixels. The total number of tracked features is 1116 and their distribution over the images is given in Fig. 5(b). The resulting mosaics are depicted in Figs. 6(a) and 6(b). In both mosaics, some misalignments can be seen because the distribution of tracked features is far from uniform. Some images contain very few tracked features, e.g. the 19th image has only five features. The average reprojection error calculated over 16203 correspondences is 13.18 pixels for Capel's method and 13.46 pixels for the proposed method. The running time for 20 iterations is 152.33 seconds for Capel's method and 6.99 seconds for the proposed method.

(a) Mosaic with the accumulated homographies (b) Distribution of tracked features along images
Fig. 5. Initial estimation and number of tracked features of the underwater sequence

(a) Capel's method (b) Proposed method
Fig. 6. Resulting mosaics of the underwater image sequence

From the results, it can be seen that the proposed method is able to build mosaics without any non-linear minimization. This makes it faster and removes the data size limitations. It does not suffer from the scaling effect that appears when working on the mosaic frame; however, the resulting mosaic is highly dependent on the initial estimation, because the method works on the mosaic frame.
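The text does not spell out how the reprojection error is computed. One common choice, assumed here purely for illustration, is the mean Euclidean distance between each point in one image and its match transferred from the other image through the two absolute homographies. The function names and the `matches` layout below are hypothetical.

```python
import numpy as np

def transfer(h, pts):
    """Map an (N, 2) array of points through the homography h."""
    p = np.column_stack([pts, np.ones(len(pts))]) @ h.T
    return p[:, :2] / p[:, 2:3]

def mean_reprojection_error(H, matches):
    """Mean transfer error in pixels over all correspondences.

    H       : sequence of 3x3 image-to-mosaic (absolute) homographies
    matches : iterable of (a, b, pts_a, pts_b), where pts_a[i] in image a
              corresponds to pts_b[i] in image b (both (N, 2) arrays)
    """
    dists = []
    for a, b, pts_a, pts_b in matches:
        # Relative homography from image b to image a implied by the absolute ones.
        h_ab = np.linalg.inv(H[a]) @ H[b]
        dists.append(np.linalg.norm(transfer(h_ab, pts_b) - pts_a, axis=1))
    return float(np.concatenate(dists).mean())
```

With this definition, the same function can score the accumulated initial estimate and the globally aligned result on an identical set of correspondences, which is how the figures quoted for the three sequences are compared.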
5 Conclusions and Future Work
An iterative global alignment method has been proposed to overcome the limitations of the current state of the art in photo-mosaicing. Normally, global alignment requires the minimization of an error term, which is defined from image correspondences. This error term can be defined either in the image frame or in the mosaic coordinate system. In both cases, non-linear minimization is required most of the time. The new approach works on the mosaic frame and provides similar results without the need for non-linear optimization. The proposed method has been tested with several image sequences and comparative results are presented to illustrate its performance. There is no limitation on the problem size and, since its computational cost is very low, it is faster. The number of tracked features in each image and the initial estimation of the topology play important roles in terms of image alignment. Our future direction is to develop an adequate weighting system for approximating the real positions of the features on the mosaic frame, in order to obtain better image alignment and topology.

Acknowledgments. This work has been partially funded through the MOMARNET EU project MRTN-CT-2004-505026, in part by the FREESUBNET EU project MRTN-CT-2006-036186 and in part by the Spanish Ministry of Education and Science (MEC) under grant CTM2007-64751. NG has been supported by MEC grant CTM2007-64751 and the Ramon y Cajal program, and AE has been funded by the Generalitat de Catalunya under grant 2004FI-IQUC1/00130.
References

1. Szeliski, R., Shum, H.: Creating full view panoramic image mosaics and environment maps. In: SIGGRAPH International Conference on Computer Graphics and Interactive Techniques, Los Angeles, CA, USA, vol. I, pp. 251–258 (1997)
2. Gracias, N., Costeira, J., Victor, J.: Linear global mosaics for underwater surveying. In: 5th IFAC Symposium on Intelligent Autonomous Vehicles (2004)
3. Hu, R., Shi, R., Shen, I.F., Chen, W.: Video stabilization using scale-invariant features. In: IV 2007 11th International Conference on Information Visualization, pp. 871–877 (2007)
4. Irani, M., Anandan, P.: Video indexing based on mosaic representations. Proceedings of the IEEE 86 (1998)
5. Irani, M., Hsu, S., Anandan, P.: Video compression using mosaic representations. Signal Processing: Image Communication 7, 529–552 (1995)
6. Capel, D.: Image Mosaicing and Super-resolution. Springer, London (2004)
7. Kang, E., Cohen, I., Medioni, G.: A graph-based global registration for 2d mosaics. In: International Conference on Pattern Recognition (2000)
8. Marzotto, R., Fusiello, A., Murino, V.: High resolution video mosaicing with global alignment. In: IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, vol. I, pp. 692–698 (2004)
9. Sawhney, H., Hsu, S., Kumar, R.: Robust video mosaicing through topology inference and local to global alignment. In: European Conference on Computer Vision, Freiburg, Germany, vol. II, pp. 103–119 (1998)
10. Can, A., Stewart, C.V., Roysam, B., Tanenbaum, H.L.: A feature-based technique for joint linear estimation of high-order image-to-mosaic transformations: Mosaicing the curved human retina. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(3), 412–419 (2002)
11. Cervantes, A., Kang, E.Y.: Progressive multi-image registration based on feature tracking. In: International Conference on Image Processing, Computer Vision, & Pattern Recognition, Las Vegas, pp. 633–639 (2006)
12. Zitová, B., Flusser, J.: Image registration methods: A survey. Image and Vision Computing 21(11), 977–1000 (2003)
13. Harris, C.G., Stephens, M.J.: A combined corner and edge detector. In: Alvey Vision Conference, Manchester, U.K., pp. 147–151 (1988)
14. Beaudet, P.R.: Rotational invariant image operators. In: IAPR International Conference on Pattern Recognition, pp. 579–583 (1978)
15. Lindeberg, T.: Feature detection with automatic scale selection. International Journal of Computer Vision 30(2), 77–116 (1998)
16. Lowe, D.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2), 91–110 (2004)
17. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: Speeded up robust features. In: European Conference on Computer Vision (2006)
18. Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(10), 1615–1630 (2005)
19. Fischler, M., Bolles, R.: Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Comm. Assoc. Comp. Mach. 24(6), 381–395 (1981)
20. Meer, P., Mintz, D., Rosenfeld, A.: Analysis of the least median of squares estimator for computer vision applications. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 621–623 (1992)
21. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
22. Davis, J.: Mosaics of scenes with moving objects. In: IEEE Conference on Computer Vision and Pattern Recognition, Santa Barbara, CA, USA, vol. I, pp. 354–360 (1998)