3D Shape Recovery with Registration Assisted Stereo Matching

Huei-Yung Lin, Sung-Chung Liang, and Jing-Ren Wu
Department of Electrical Engineering, National Chung Cheng University, 168 University Rd., Min-Hsiung, Chia-Yi 621, Taiwan, R.O.C.
Abstract. A novel method for simultaneously acquiring and registering range data of a real object from different viewpoints is presented. Most existing 3D model reconstruction techniques do not exploit the previously acquired range data when recovering the shape from subsequent viewpoints. In this work, a stereo vision system is developed for 3D model acquisition. To reduce the computation and increase the accuracy of the stereo matching algorithms, the range data recovered from previous viewpoints are registered and then used to provide additional constraints for 3D acquisition from the next viewpoint. Experiments have shown that our approach gives better performance in both execution time and stereo matching results.
1 Introduction
3D model acquisition of real-world objects is an active research area with applications in reverse engineering, pattern recognition, industrial inspection, computer graphics, multimedia systems, etc. Most commonly used approaches obtain a 3D model by acquiring partial 3D shapes of an object from different viewpoints and then fusing the range data sets into a common coordinate system. Thus, the procedure for 3D model reconstruction usually consists of four stages: (i) data acquisition, (ii) data registration, (iii) surface integration, and (iv) texture mapping. The data acquisition stage captures the partial 3D shapes and the texture information of an object. The acquired range images are registered to a common reference frame based on their acquisition viewpoints. The registered range images are then integrated into a single surface representation. Finally, the texture information is mapped onto the surface to create a textured 3D model.

Most 3D model reconstruction methods use passive camera systems or laser range scanners to collect partial 3D shapes of an object. Data registration for multiple views either relies heavily on the accuracy of the 3D measurements or requires significant manual assistance. The separation of data acquisition and registration not only restricts the applicability of many systems, but also forgoes any data correction feedback from the registration stage. For example, Pulli et al. presented a complete system for scanning the range and color information of a 3D object from arbitrary viewpoints [1]. Albamont and Goshtasby designed a scanner

J. Martí et al. (Eds.): IbPRIA 2007, Part II, LNCS 4478, pp. 596–603, 2007. © Springer-Verlag Berlin Heidelberg 2007
[Figure 1: flowcharts. In (a), the conventional approach, each stereo image pair undergoes 3D recovery (stereo matching) independently, and data registration afterwards produces the rotation matrix and translation vector. In (b), the proposed method, the rotation matrix and translation vector from registering the first two recovered shapes yield a registration constraint that modifies the searching range for the 3D recovery of the next image pair.]

Fig. 1. Flowcharts of the conventional 3D model acquisition approach and the proposed registration assisted method
system using four synchronous camera heads equipped with laser line generators to obtain the 3D structure of the object [2]. Reed and Allen built a robotic system consisting of a laser range finder attached to a robot arm to acquire range images [3]. Lin and Subbarao developed a stereo vision system using a single camera for range data acquisition by rotating the object [4]. Although some of the above work used an incremental shape acquisition approach for view planning and range image registration, all of them separated the data acquisition stage from the data registration stage.

In this work, we present a novel method that integrates the partial shape acquisition and range data registration stages across different viewpoints. It is essentially an incremental method for 3D model acquisition, but the main focus is on assisting partial shape recovery rather than registering predetermined range images. Intensity images and the corresponding range data of a viewpoint are acquired by our stereo vision system. Unlike conventional stereo-based techniques for 3D shape recovery, stereo pairs recorded from different image frames are not processed independently in our approach. 3D shapes recovered from the previous image frames are first registered to find the rotation matrix and translation vector of the transformation and to generate a larger 3D surface. 3D reconstruction of the current viewpoint then exploits this information to reduce the computation and increase the accuracy and robustness of the stereo matching algorithms.

Figure 1(a) shows the conventional 3D model acquisition approach used in earlier research. Partial 3D shapes acquired by stereo-based approaches or laser range scanning techniques are registered and integrated to create a more complete 3D model. Range data acquisition for the present viewpoint does not utilize any information from the previous viewpoints.
Thus, error correction and processing speedup are not possible with this method. Figure 1(b) illustrates the proposed registration assisted stereo matching for range data acquisition. The major difference is that a registration constraint becomes available after the first two sets of range data are obtained and registered. This information is then used for
providing additional constraints for the next viewpoint range data acquisition. It is noted that the initial steps of the flowcharts are identical for both cases.
2 Range Data Acquisition
For a given object, the range data and intensity images from a fixed viewpoint are acquired by our stereo vision system. It consists of two video cameras placed side by side on a twin camera bar with an adjustable baseline, and a computer controlled turntable. Sequences of stereo image pairs are transferred to a computer at a frame rate of 15 fps. A background of uniform color is used to facilitate the segmentation of the object from the background regions. 3D acquisition of the object is then achieved by shape from stereo, i.e., recovering the depth information by triangulation. Multiple partial 3D shapes of the object from different viewpoints are collected by rotating the turntable gradually.

In our earlier work, a multiple base-angle rotational stereo concept was proposed to achieve multi-baseline stereo by rotating the object placed on the turntable [4]. This technique, however, requires a fairly accurate rotation angle to establish the epipolar geometry. In this research we are more interested in recovering range data with the assistance of previous registration information than in 3D shape reconstruction with precise rotation angle calibration. Thus, the two cameras are installed in the conventional stereo configuration, i.e., the optical axes are parallel and the image planes are coplanar, for range data acquisition. Furthermore, the rotation axis of the turntable is estimated by Tsai's calibration method [5] using a planar checkerboard pattern aligned with the axis. With a fixed angle of rotation, the matches of the control points on the calibration pattern are used to estimate the corresponding transformation.
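As a concrete illustration of the triangulation step, the sketch below back-projects a disparity map through a rectified parallel-axis rig. The focal length, baseline, and principal point in the example are illustrative values, not the actual calibration of our system.

```python
import numpy as np

def triangulate(disparity, focal_px, baseline_mm, cx, cy):
    """Recover 3D points (left-camera frame) from a dense disparity map of a
    rectified parallel-axis stereo pair: Z = f * B / d, then back-project."""
    h, w = disparity.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    valid = disparity > 0                      # zero disparity marks no match
    z = np.zeros_like(disparity, dtype=float)
    z[valid] = focal_px * baseline_mm / disparity[valid]
    x = (xs - cx) * z / focal_px
    y = (ys - cy) * z / focal_px
    return np.dstack([x, y, z]), valid

# e.g. a pixel with disparity 10 px, f = 600 px, B = 60 mm lies at Z = 3600 mm
pts, mask = triangulate(np.array([[10.0]]), 600.0, 60.0, 0.0, 0.0)
```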
3 Multi-view Range Data Registration
The goal of registration is to find the spatial transformation between range images of an object taken from different viewpoints, so that points in different range images that represent the same surface point are aligned. A popular method for refining a given registration is the iterative closest point (ICP) technique, first introduced by Besl and McKay [6]. It uses a nonlinear optimization procedure to further align the data sets after coarse registration. Most ICP based registration algorithms require a fairly good initial rotation matrix and translation vector between the data sets to prevent the registration from getting stuck in a local minimum. In our data acquisition system, since the difference between two consecutive viewpoints is very small, the initial rotation matrix and translation vector are given by the identity matrix and the zero vector, respectively.

To obtain a more complete 3D model of an object, multi-view registration of the partial 3D shapes is required. If only a pair of range data sets from two consecutive viewpoints is registered at a time, the registration errors will clearly accumulate. The general approach to this problem is to consider the network of views as a whole and minimize the registration errors of all views simultaneously,
such that the registration errors are equally distributed [7]. Since our vision system focuses on 3D model acquisition, multi-view registration up to the n-th viewpoint (n-th frame) only depends on the previous n − 1 viewpoints. To reduce the processing time and take advantage of the small changes between viewpoints, the registration of a newly added data set is implemented as follows. The registration is based on the modified ICP algorithm [6], but with m range data sets considered at a time. Suppose S_i, S_{i+1}, ..., S_{i+m−1} are the consecutive data sets, and T_{i−1} represents the initial transformation (rotation and translation) applied to the data sets. Then the algorithm for the block of m data sets is:

Algorithm 1. Block Registration
 1: S̄_i ← T_{i−1} ∘ S_i
 2: j = 1
 3: while j < m do
 4:    S ← S_{i+j}
 5:    for k = −1 to j − 2 do
 6:       S ← T_{i+k} ∘ S
 7:    end for
 8:    S̄_{i+j} ← S
 9:    T_{i+j−1} ← Apply ICP on {S̄_{i+j−1}, S̄_{i+j}}
10:    j ← j + 1
11: end while
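The block registration loop can be sketched in code as follows. The ICP routine here is a deliberately bare-bones stand-in (brute-force nearest neighbours plus an SVD-based rigid-motion update), not the modified ICP of [6]; a practical implementation would add a k-d tree and outlier rejection.

```python
import numpy as np

def best_rigid_transform(A, B):
    """Least-squares rotation/translation mapping points A onto B (Kabsch)."""
    ca, cb = A.mean(0), B.mean(0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                  # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cb - R @ ca

def icp(src, dst, iters=20):
    """Bare-bones ICP: brute-force nearest neighbours + Kabsch updates."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        d = np.linalg.norm(cur[:, None, :] - dst[None, :, :], axis=2)
        matched = dst[d.argmin(axis=1)]       # closest dst point per src point
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total

def block_registration(sets, T_init, icp_fn=icp):
    """Sketch of Algorithm 1 for one block of m consecutive range data sets:
    chain each new set through all previously estimated transforms, then run
    ICP against the preceding aligned set."""
    R0, t0 = T_init                           # T_{i-1}
    aligned = [sets[0] @ R0.T + t0]           # S-bar_i
    Ts = []                                   # T_i, ..., T_{i+m-2}
    for j in range(1, len(sets)):
        S = sets[j] @ R0.T + t0               # apply T_{i-1}
        for R, t in Ts:                       # then T_i ... T_{i+j-2}
            S = S @ R.T + t
        aligned.append(S)                     # S-bar_{i+j}
        Ts.append(icp_fn(S, aligned[j - 1]))  # T_{i+j-1}
    return Ts, aligned
```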
It is clear that m < n, and both the computation and the correctness of the registration increase with m. After the registration has been done for the 3D data sets from newly added viewpoints, there may exist some isolated points due to data acquisition errors. Such points are considered noise and are removed if both of the following criteria are met: (i) the distances between the point and all other points are larger than a threshold, and (ii) for a partitioned working volume, the point belongs to a cube with density lower than a threshold. Since the overlapping parts of the object surface accumulate as the number of acquisition viewpoints increases, the error points can be removed efficiently without affecting the points belonging to the true object surface.
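A minimal sketch of the two removal criteria is given below; the brute-force distance matrix and the threshold values are illustrative (a real implementation over dense range data would use a spatial index).

```python
import numpy as np

def remove_isolated_points(points, dist_thresh, cube_size, density_thresh):
    """Drop a point only if BOTH criteria hold: (i) its nearest neighbour is
    farther than dist_thresh, and (ii) its cube in the partitioned working
    volume contains fewer than density_thresh points."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    isolated = d.min(axis=1) > dist_thresh                # criterion (i)

    cells = np.floor(points / cube_size).astype(int)      # criterion (ii)
    _, inv, counts = np.unique(cells, axis=0,
                               return_inverse=True, return_counts=True)
    sparse = counts[inv.ravel()] < density_thresh

    return points[~(isolated & sparse)]
```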
4 Registration Assisted 3D Shape Recovery
3D reconstruction using correlation based stereo algorithms is usually time-consuming due to template matching over unknown searching areas. In addition to the commonly used constraints for stereo matching (such as the ordering constraint and the epipolar constraint), we propose a registration constraint to reduce the computation and increase the accuracy of 3D shape recovery. The basic idea
[Figure 2 shows an object point moving from P_{n−2} to P_{n−1} across frames, the search sphere of radius w · r, and the projections p_{n−2}, p_{n−1} through the optical centers O_l and O_r. Figure 3 shows the object surfaces at frames n and n + 1, with points P and Q projecting to the same image point.]

Fig. 2. Searching range

Fig. 3. Projected circles
is to use the recovered and registered 3D shapes from previous image frames (viewpoints) to restrict the searching areas of the current stereo image pair. Under the assumption of slow object motion between image frames, the corresponding motion vectors are bounded by constants predicted from the previous 3D registration. Suppose the 3D data are obtained and registered for image frames n − 2 and n − 1, and the registration is given by the rotation matrix R_{n−1} and translation vector t_{n−1}. If the 3D point corresponding to an image point p_{n−1} in image frame n − 1 is P_{n−1}, as illustrated in Fig. 2, then the same 3D point with respect to image frame n − 2 is given by

    P_{n−2} = R_{n−1}^{−1} (P_{n−1} − t_{n−1})    (1)
Thus, the displacement of the 3D point between the image frames is given by P_{n−1} − P_{n−2}. If we assume the object motion (both rotation and translation) is smooth, then the same 3D point with respect to image frame n should be bounded by a sphere of radius w · r centered at P_{n−1}, where r = ||P_{n−1} − P_{n−2}|| and w is a weighting factor. Therefore, the searching areas corresponding to the 3D point P_n are the projections of this sphere onto the left and right images. There is a sphere associated with each image point in frame n − 1, and the spheres can be constructed from the previous registration result. As shown in Fig. 2, for a given image point p_n of the left image from viewpoint n, the searching area in the right image is given by the projections of the spheres which intersect the ray passing through the optical center and the image point p_n. More precisely, suppose the spheres are given by S_j, for j = 1, 2, ..., n, where n is the total number of image pixels, and B_j and B̂_j are the projections of S_j onto the left and right images, respectively. Then for a given point x in the left image with x ∈ ∩_{j=1}^{k} B_j, the stereo correspondence x̂ in the right image satisfies x̂ ∈ ∪_{j=1}^{k} B̂_j. Consequently, the searching range is bounded by the union of the projected circles of the spheres.
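Equation (1) and the sphere construction can be sketched as follows. The intrinsic matrix K and the projection of the sphere to a circle of pixel radius f · (w · r)/Z are simplifying assumptions for illustration, not the exact perspective projection of a sphere.

```python
import numpy as np

def search_sphere(P_prev, R_prev, t_prev, w=0.8):
    """Predict the bounding sphere for a 3D point seen in frame n-1 from the
    registration (R_{n-1}, t_{n-1}) between frames n-2 and n-1, via Eq. (1)."""
    P_before = np.linalg.inv(R_prev) @ (P_prev - t_prev)   # P_{n-2}
    r = np.linalg.norm(P_prev - P_before)                  # motion magnitude
    return P_prev, w * r                                   # centre, radius

def project_sphere(center, radius, K):
    """Project the sphere into an image as a circle around the projected
    centre; pixel radius f * radius / Z is a small-sphere approximation."""
    uvw = K @ center
    u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
    return (u, v), K[0, 0] * radius / center[2]

# unit translation between frames, point 10 units in front of the camera
K = np.array([[600.0, 0, 320], [0, 600, 240], [0, 0, 1]])
c, rad = search_sphere(np.array([0.0, 0, 10]), np.eye(3), np.array([1.0, 0, 0]))
(u, v), r_px = project_sphere(c, rad, K)
```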
The stereo searching range and the required computation can be further reduced for the overlapping foreground (object) region of two consecutive image frames. As shown in Fig. 3, suppose the projection of an object point P onto the left image in the n-th image frame is p_l, and there is a foreground point at the same image location p_l in the (n + 1)-th image frame. Then there must exist a point Q, after the object motion, whose projection onto the left image in the (n + 1)-th frame is also p_l. Thus, the searching range for the object point Q in frame n + 1 is fully determined by a single sphere centered at the point P. Consequently, the searching range for the left image point p_l is bounded by a single circle centered at p_r, the stereo correspondence of p_l in the n-th image frame.

Since the motion of the object or the cameras is slow relative to the video frame rate during image acquisition, the foreground/background difference between two consecutive image frames is usually small and occurs near the object boundary. In the experiments, the searching range of more than 80% of the object region in the left image can be covered by a single circle centered at the matching point of the previous image frame. The radius of the circle is a weighting factor times the magnitude of the motion vector derived from range data registration of the previous two image frames. If the object motion (rotation and translation) is fairly uniform, a constant weighting factor can be used for the whole image sequence. In practice, a smaller weighting factor (less than one, for instance) is preferred since it reduces both the possibility of stereo mismatch and the computation time. For the remaining object regions, which do not exist in the previous image frame, the union of projected circles has to be used as the searching region.
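The foreground bookkeeping behind this shortcut might look like the sketch below; the mask names are illustrative, and the coverage value corresponds to the fraction (over 80% in our experiments) of the object region that can use a single circle.

```python
import numpy as np

def split_search_modes(fg_prev, fg_cur):
    """Partition the current foreground into pixels that were also foreground
    in the previous frame (single-circle search) and newly appearing pixels
    (union-of-circles search)."""
    single = fg_cur & fg_prev
    union_needed = fg_cur & ~fg_prev
    coverage = single.sum() / max(fg_cur.sum(), 1)  # single-circle fraction
    return single, union_needed, coverage

fg_prev = np.array([[True, True, False]])
fg_cur = np.array([[True, False, True]])
single, union_needed, coverage = split_search_modes(fg_prev, fg_cur)
```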
5 Experimental Results
The described algorithms have been implemented on a stereo vision system and tested on a number of real objects. The baseline of the stereo cameras is set to 60 mm, and the pose of the test object is changed slowly in front of a static background. A blue screen technique is used to segment out the background, and only the foreground regions are used for stereo matching. The foreground regions of two consecutive image frames are also used to identify their common image area for single-circle stereo searching regions. In the implementation, the epipolar constraint is applied, and the searching range is given by the intersection of the union of the circles and the epipolar line. One interesting observation is that the resulting searching range varies with image position.

Figure 4 shows the first set of experimental results. The test object was manufactured by rapid prototyping without additional texturing. The disparity maps of the 3D reconstruction without registration information are shown in the second row, followed by those obtained with registration assistance using weighting factors of 0.6, 0.8, 1, and 2, respectively. It can be seen that the disparity maps obtained from registration assistance with a weighting factor of 0.8 give the
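The searching-range computation described above can be sketched as follows, under the assumption of rectified images so that each epipolar line is a scanline: each predicted circle is intersected with the scanline, and the resulting intervals are merged into the final search ranges.

```python
def circle_on_scanline(cx, cy, r, row):
    """Intersect a predicted search circle (centre (cx, cy), radius r) with
    the rectified epipolar line y = row; returns an x-interval or None."""
    dy = row - cy
    if abs(dy) > r:
        return None                    # circle does not reach this scanline
    half = (r * r - dy * dy) ** 0.5
    return (cx - half, cx + half)

def merge_intervals(intervals):
    """Union of the per-circle x-intervals: the disjoint search ranges."""
    merged = []
    for lo, hi in sorted(i for i in intervals if i is not None):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged
```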
Fig. 4. From top to bottom: the acquired intensity images from the left sequence, the disparity maps obtained without registration information, the disparity maps obtained with registration assistance using weighting factors of 0.6, 0.8, 1, and 2, respectively
best results in terms of smoothness and correctness. Table 1 shows the execution times (in seconds) of the results in Figure 4. The processing time for stereo reconstruction is sped up by roughly a factor of five with the assistance of data registration.
Table 1. Execution times (in seconds) of the results shown in Figure 4

Frame No.                              1     2     3     4     5     6     7
without registration assistance      1.25  1.24  1.23  1.22  1.22  1.20  1.20
with registration (weighting = 0.6)  1.25  1.24  0.22  0.22  0.22  0.23  0.30
with registration (weighting = 0.8)  1.25  1.24  0.24  0.25  0.25  0.25  0.31
with registration (weighting = 1)    1.25  1.24  0.30  0.28  0.28  0.34  0.38
with registration (weighting = 2)    1.25  1.24  0.39  0.47  0.42  0.44  0.52
6 Conclusion and Future Work
Most existing 3D model acquisition techniques lack data correction functionality due to the separation of the range data registration and 3D data collection stages. In this work, we have presented a novel method to simultaneously acquire and register range data of an object from different viewpoints. A stereo vision system was developed to acquire the 3D shapes of an object. The range data obtained from previous viewpoints are first registered and then used to provide additional geometric constraints for the 3D acquisition of the next viewpoint. Experiments have shown that the proposed approach gives better performance in both execution time and 3D reconstruction results. In future work, a more sophisticated Kalman filter based prediction model will be adopted to further reduce the correspondence searching time. The possibility of extending the current research to deal with generically shaped objects will also be investigated.
References

1. Pulli, K., Shapiro, L.: Surface reconstruction and display from range and color data. Graphical Models 62(3), 165–201 (2000)
2. Albamont, J., Goshtasby, A.: A range scanner with a virtual laser. Image and Vision Computing 21(3), 271–284 (2003)
3. Reed, M.K., Allen, P.K.: 3-D modeling from range imagery: An incremental method with a planning component. Image and Vision Computing 17(2), 99–111 (1999)
4. Lin, H.Y., Subbarao, M.: A vision system for fast 3D model reconstruction. In: CVPR (2), pp. 663–668. IEEE Computer Society Press, Los Alamitos (2001)
5. Tsai, R.: A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE Trans. Robotics and Automation 3(4), 323–344 (1987)
6. Besl, P.J., McKay, N.D.: A method for registration of 3-D shapes. IEEE Trans. Pattern Analysis and Machine Intelligence 14(2), 239–258 (1992)
7. Bergevin, R., Soucy, M., Gagnon, H., Laurendeau, D.: Towards a general multi-view registration technique. IEEE Trans. Pattern Analysis and Machine Intelligence 18(5), 540–547 (1996)