
3D Object Reconstruction from Uncalibrated Images using a Single Off-the-Shelf Camera

Teresa C. S. Azevedo

INEGI – Instituto de Engenharia Mecânica e Gestão Industrial LOME – Laboratório de Óptica e Mecânica Experimental FEUP – Faculdade de Engenharia da Universidade do Porto - Portugal

João Manuel R. S. Tavares & Mário A. P. Vaz

INEGI, LOME DEMEGI – Departamento de Engenharia Mecânica e Gestão Industrial FEUP – Faculdade de Engenharia da Universidade do Porto - Portugal

ABSTRACT: The three-dimensional (3D) reconstruction of objects using only two-dimensional (2D) images has been a major research topic in Computer Vision. However, it is still a hard problem to solve when automation, speed and precision are required and/or the objects present complex shapes and visual properties. In this paper, we compare two Active Computer Vision methods commonly used for the 3D reconstruction of objects from image sequences acquired with a single off-the-shelf CCD camera: Structure From Motion (SFM) and Generalized Voxel Coloring (GVC). SFM recovers the 3D shape of an object using the movement of the camera(s) or of the object, while GVC is a volumetric method that uses photoconsistency measures to build a 3D model of the object. Neither method imposes any kind of restriction on the relative motion involved.

1 3D RECONSTRUCTION

1.1 Introduction

Computer Vision is continuously developing theories and methodologies to automatically extract useful information from images, in a way as similar as possible to what humans do with their visual system. Contactless methods used to recover the 3D geometry of objects are commonly divided into two categories: active techniques, which require some kind of energy projection or relative movement between the camera and the object, and passive techniques, which use only ambient illumination and therefore involve no projected energy. The main goal of this work was to compare two active methods commonly used in Computer Vision for the 3D reconstruction of objects: Structure From Motion (SFM) and Generalized Voxel Coloring (GVC).

1.2 State of the art

3D reconstruction of objects has become an intensive research topic in Computer Vision. Digital 3D models are required in many applications, such as industrial inspection, biomedicine, navigation and object identification. Usually, high-quality 3D models of static objects are obtained using scanner systems, which are generally expensive but easy to handle.
The explosive growth in computer processing, in terms of computational power and memory storage, and its continuously falling price, together with the development of increasingly sophisticated and affordable digital cameras, has allowed the practical use of photogrammetric methods for 3D reconstruction in the last decades. In fact, they now appear as a low-cost and portable alternative to the common range-based 3D reconstruction methods. However, image-based reconstruction is still a difficult task, in particular for large or complex objects, if uncalibrated images are used, or when there is a wide baseline between the acquired images, [Remondino, 2006]. The next two subsections focus on two commonly used image-based reconstruction methods: Structure From Motion (SFM), which belongs to the standard stereo-based methods, and Generalized Voxel Coloring (GVC), which belongs to the more recent volumetric reconstruction methods.

1.3 Structure From Motion

Proposed in [Ullman, 1979], SFM is a stereo-based method, Figure 1. It uses the relative movement between the camera(s) and the object to be reconstructed to make assumptions about the object’s 3D shape. Thus, knowing the trajectories of the object’s feature points in the image plane, this method determines the 3D shape and motion that best describe most of the point trajectories. This method has received several contributions and diverse approaches, e.g. [Chaumette, 1991], [Aans, 2002], [Chiuso, 2002] and [Hui, 2006]. In the present case, we do not intend to impose any kind of restriction on the movement involved.

Figure 1. Stereo vision’s principle: P’s 3D coordinates are determined through the intersection of the two lines defined by the optical centers O and O´ and the matched 2D image points p and p´.
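The stereo principle of Figure 1 can be illustrated with a minimal sketch. With noisy matches the two lines rarely intersect exactly, so this example returns the midpoint of the shortest segment joining them. The function name `triangulate_midpoint` and the ray representation (optical center plus direction) are assumptions made for illustration; this is not the implementation used in the paper.

```python
def triangulate_midpoint(o1, d1, o2, d2):
    """Estimate P as the point closest to the rays o1 + s*d1 and o2 + t*d2.

    o1, o2: optical centers O and O'; d1, d2: ray directions through the
    matched image points p and p'. All arguments are 3-element sequences.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    w = [a - b for a, b in zip(o1, o2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is undefined")

    # Parameters of the closest points on each ray (least-squares solution).
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [o + s * k for o, k in zip(o1, d1)]
    p2 = [o + t * k for o, k in zip(o2, d2)]
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]


# Two hypothetical cameras one unit apart, both observing P = (0.5, 0.2, 2.0).
P = triangulate_midpoint((0, 0, 0), (0.5, 0.2, 2.0), (1, 0, 0), (-0.5, 0.2, 2.0))
```

The midpoint rule is only one of several triangulation strategies; it is chosen here because it maps directly onto the two-line picture in Figure 1.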

1.4 Generalized Voxel Coloring

Stereo-based methods, like SFM, fail to capture shapes with complicated topology, due to occlusion or smooth surfaces. For smooth objects, 3D reconstruction using volumetric or voxel-based methods has been quite popular for some time, [Seitz, 1997]. These methods assume that there is a bounded volume in which the object of interest lies. The 3D space model is then represented or sampled by voxels (regular volumetric structures also known as 3D pixels). The first volumetric methods combined silhouette images of the object to be reconstructed with the camera’s calibration information to set the visual rays in 3D space for all silhouette points, which define a generalized cone within which the object lies. The intersection of these cones defines the visual hull, [Laurentini, 1994], a volumetric space in which the object to be reconstructed is guaranteed to be, Figure 2. The accuracy of the reconstruction obtained depends on the number of images used, on the position of each viewpoint considered, on the quality of the camera’s calibration and on the complexity of the object’s shape. Generalized Voxel Coloring (GVC) is a volumetric method that does not require a matching process between the object’s feature points along the image sequence, as SFM does, [Slabaugh, 1999]. Instead, starting with a sequence of calibrated images of the object to be reconstructed, GVC uses photoconsistency measures to determine whether a certain voxel belongs to the object being reconstructed. This technique simultaneously builds and colors a 3D model of the object.
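As an illustration of the photoconsistency idea, the sketch below keeps a voxel when the colors observed for it in the different views agree closely. The per-channel standard deviation test and the `threshold` value are assumptions chosen for the example; the text does not specify which consistency measure or threshold was used.

```python
from statistics import mean, pstdev


def is_photoconsistent(colors, threshold=15.0):
    """Return True if the (r, g, b) samples of one voxel agree across views.

    colors: list of (r, g, b) tuples, one per image in which the voxel is
    visible. threshold: hypothetical maximum per-channel standard deviation.
    """
    if len(colors) < 2:
        return True  # a single observation cannot contradict itself
    return all(pstdev([c[k] for c in colors]) <= threshold for k in range(3))


def voxel_color(colors):
    """Color assigned to a consistent voxel: the per-channel mean."""
    return tuple(mean(c[k] for c in colors) for k in range(3))


# A red surface point seen from three views versus an inconsistent voxel.
same_surface = [(200, 10, 10), (205, 12, 8), (198, 9, 11)]
empty_space = [(200, 10, 10), (10, 200, 10)]
```

In this sketch, voxels that fail the test would be carved away, and the surviving voxels are colored with `voxel_color`, which mirrors the way the method builds and colors the model at the same time.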

Figure 2. Left to right: from the original object to its visual hull ([Yang, 2003]).
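The visual hull construction can be sketched as silhouette carving over a voxel grid. To keep the example short, it uses two hypothetical orthographic views instead of calibrated perspective cameras, and represents each silhouette as a set of pixel coordinates; both choices are simplifying assumptions.

```python
from itertools import product


def carve_visual_hull(grid_size, views):
    """Keep the voxels whose projection lies inside every silhouette.

    views: list of (project, silhouette) pairs, where project maps a voxel
    (x, y, z) to a pixel (u, v) and silhouette is a set of object pixels.
    """
    hull = set()
    for voxel in product(range(grid_size), repeat=3):
        if all(project(voxel) in silhouette for project, silhouette in views):
            hull.add(voxel)
    return hull


# A 2x2 square silhouette seen in two orthographic views of a 4x4x4 grid.
square = {(u, v) for u in (1, 2) for v in (1, 2)}
front = (lambda p: (p[0], p[1]), square)  # viewing along the z axis
side = (lambda p: (p[2], p[1]), square)   # viewing along the x axis
hull = carve_visual_hull(4, [front, side])
```

With only the front view the hull is a 16-voxel column; adding the side view reduces it to 8 voxels, which reflects the dependence of accuracy on the number and position of viewpoints.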

2 METHODOLOGY FOLLOWED

In this work, both methods were tested on two objects with different topological properties: a simple parallelepiped and a human hand model. The parallelepiped has a straightforward topology, with flat orthogonal surfaces, whose vertices are easily detected in each image and simply matched along the image sequence. In contrast, the hand model has a smooth surface and a complicated topology. To test the SFM method, we followed the methodology proposed in [Pollefeys, 2004], Figure 3. Thus, the first step is to acquire two uncalibrated images of the object to be reconstructed, using a single off-the-shelf digital camera. Then, image feature points are extracted and matched, followed by the determination of the epipolar geometry, image rectification and, finally, dense matching.

Figure 3. SFM methodology followed to obtain the 3D reconstruction of objects from uncalibrated images.
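The role of the epipolar geometry in the SFM pipeline can be illustrated by testing candidate matches against a fundamental matrix F through the constraint q^T F p = 0. In this sketch F is not estimated: it is fixed to the canonical matrix of an already rectified pair, for which the constraint reduces to "matched points lie on the same image row". The function names and the tolerance are assumptions.

```python
def epipolar_residual(F, p, q):
    """Algebraic residual |q^T F p| for homogeneous points p and q."""
    Fp = [sum(F[i][j] * p[j] for j in range(3)) for i in range(3)]
    return abs(sum(q[i] * Fp[i] for i in range(3)))


def epipolar_inliers(F, matches, tol=1.0):
    """Keep the matches ((x, y), (x', y')) whose residual is within tol."""
    kept = []
    for (x, y), (xp, yp) in matches:
        if epipolar_residual(F, (x, y, 1.0), (xp, yp, 1.0)) <= tol:
            kept.append(((x, y), (xp, yp)))
    return kept


# Canonical fundamental matrix of a rectified pair (horizontal epipolar lines).
F_RECTIFIED = [[0.0, 0.0, 0.0],
               [0.0, 0.0, -1.0],
               [0.0, 1.0, 0.0]]

matches = [((10.0, 20.0), (4.0, 20.0)),   # same row: consistent
           ((30.0, 40.0), (25.0, 47.0))]  # 7 rows apart: rejected
good = epipolar_inliers(F_RECTIFIED, matches)
```

A robust estimator such as RANSAC repeats this kind of inlier count for many candidate matrices and keeps the one supported by the most matches, which is why a large share of wrong matches can lead to an incorrect epipolar geometry.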

On the other hand, to test the GVC method we followed the methodology proposed in [Azevedo, 2007], Figure 4. Thus, a single off-the-shelf CCD camera was used to acquire the objects’ image sequences, and Zhang’s calibration method was used to calibrate it, [Zhang, 2000]. Then, image segmentation was performed to obtain the object silhouettes from the input images. Combining the original image sequence and the associated silhouette images, and considering the camera’s calibration parameters, both objects’ models were built using the GVC volumetric method, and then polygonized and smoothed using the Marching Cubes algorithm, [Lorensen, 1987].

Figure 4. GVC methodology followed to obtain the 3D reconstruction of objects from uncalibrated images.

3 EXPERIMENTAL RESULTS

3.1 SFM method

Figure 5 shows the acquired stereo image pairs of both objects used in this work. For both objects, 200 image features were extracted using the Harris corner detector, [Harris, 1988], imposing a minimum distance of 10 pixels between detected features. Robust matching of features between the stereo images was performed using the RANSAC algorithm, [Fischler, 1981]. The results obtained can be observed in Figure 6 and Figure 7. Since the hand model presents a smooth surface, many wrong matches were detected and, consequently, the determined epipolar geometry was incorrect.

Figure 5. Stereo image pairs of the objects used to test the SFM reconstruction method.

Figure 6. Results of the (robust) feature point matching with the stereo image pair of the parallelepiped object (matched feature points of the first image are marked with green crosses and the red crosses represent the matched feature points of the second image).

Figure 7. Results of the (robust) feature point matching with the stereo image pair of the parallelepiped object (matched feature points of the first image are marked with green crosses and the red crosses represent the matched feature points of the second image).

Afterwards, both stereo pairs were rectified using the algorithm presented in [Isgrò, 1999]. Rectification transforms a stereo image pair in such a way that the epipolar lines become horizontal. This step allows an easier dense matching process. The results obtained with this step can be observed in Figure 8 and Figure 9.

Figure 8. Rectification results for the stereo images of the parallelepiped object.

Figure 9. Rectification results for the stereo images of the hand model object.

Dense matching was performed using Birchfield’s algorithm, [Birchfield, 1999]. This algorithm returns a disparity map, which gives some depth information about the objects present in a rectified pair of images: far objects have zero disparity (black regions in the disparity map), while the closest objects have maximum disparity (white regions in the disparity map). The results obtained for both objects considered in this work can be observed in Figure 10 and Figure 11. Given the incorrect results obtained in the previous steps, the dense matching for the hand model was, consequently, of poor quality when compared with the parallelepiped case.

Figure 10. Disparity map obtained for the parallelepiped object.

Figure 11. Disparity map obtained for the hand model object.
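The inverse relation between disparity and depth in a disparity map can be made concrete with the standard formula for a rectified pair, Z = f·B/d. The focal length `focal_px` and baseline `baseline` used below are hypothetical values: with uncalibrated images, as in this work, they are not known, so the disparity map conveys only relative depth.

```python
def depth_from_disparity(disparity, focal_px, baseline):
    """Depth Z = f * B / d of a pixel in a rectified stereo pair.

    disparity: horizontal shift in pixels; focal_px: focal length in pixels;
    baseline: distance between the optical centers (sets the unit of Z).
    Zero disparity corresponds to a point at infinity.
    """
    if disparity <= 0:
        return float("inf")
    return focal_px * baseline / disparity


# Hypothetical camera: f = 800 px, B = 0.1 m.
far = depth_from_disparity(0, 800.0, 0.1)    # black region: infinitely far
mid = depth_from_disparity(16, 800.0, 0.1)   # about 5 m
near = depth_from_disparity(32, 800.0, 0.1)  # brighter region: about 2.5 m
```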

3.2 GVC method

Figure 12 shows some examples of the images acquired to reconstruct the objects considered in this work using the GVC method. To give an idea of the viewpoints considered in the image acquisition process, Figure 13 presents a 3D graphical representation of the extrinsic parameters obtained for both objects.

Figure 12. Three images used for the 3D reconstruction of the parallelepiped (above) and the hand model (below).

Figure 13. 3D graphical representation of the extrinsic parameters obtained from the camera’s calibration process: above, parallelepiped object case; below, hand model case.

The results obtained from the camera calibration were very accurate for both cases. Some segmentation results can be observed in Figure 14. Those results were achieved by first removing the red and green channels from the original RGB images and then binarizing the remaining channel with a threshold value. Figure 15 and Figure 16 show the results of the 3D reconstruction obtained for both objects using the GVC implementation in [Loper, 2002]. Both reconstructed models are very similar to the real 3D objects, even in the case of the hand model. Comparing these results with those previously obtained with the SFM methodology, GVC has no problem reconstructing objects with smooth surfaces or with complicated morphology. On the other hand, the accuracy of the 3D models built by this methodology is highly dependent on the preceding calibration and segmentation steps. Thus, GVC imposes some restrictions, such as a background with low color variation and a suitable calibration apparatus, which make it unsuitable for real-world object reconstruction.

Figure 14. One example of image segmentation for the parallelepiped (above) and the hand model (below): on the left, the original image; on the right, the binary image obtained.
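The segmentation step described above (discard the red and green channels, then threshold) can be sketched as follows. The threshold of 128 and the `object_is_bright` flag are assumptions: the text does not state the threshold used, nor whether the object is brighter or darker than the background in the blue channel.

```python
def segment_blue_threshold(image, threshold=128, object_is_bright=True):
    """Binarize an RGB image using only its blue channel.

    image: rows of (r, g, b) tuples with values in 0..255.
    Returns rows of 0/1, where 1 marks pixels labeled as object (silhouette).
    """
    mask = []
    for row in image:
        mask.append([
            1 if (b >= threshold) == object_is_bright else 0
            for (_r, _g, b) in row
        ])
    return mask


# Tiny hypothetical 2x2 image: only the blue value decides the label.
image = [[(10, 10, 200), (10, 10, 20)],
         [(255, 255, 30), (0, 0, 250)]]
silhouette = segment_blue_threshold(image)
```

A single global threshold of this kind only works when the background has low color variation, which is consistent with the restriction noted for the method.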

Figure 15. Two different viewpoints of the 3D model obtained for the parallelepiped case: left, original image; middle, voxelized 3D model; right, polygonized and smoothed 3D model.

Figure 16. Two different viewpoints of the 3D model obtained for the hand model case: left, original image; middle, voxelized 3D model; right, polygonized and smoothed 3D model.

4 CONCLUSIONS

The main goal of this paper was to compare experimentally two commonly used image-based methods for 3D object reconstruction: Structure From Motion (SFM) and Generalized Voxel Coloring (GVC). To test and compare both methods, two objects with different topological properties were used: a parallelepiped and a human hand model. Our adopted SFM methodology gave good results when the objects present strong feature points that are easy to detect and match along the input images. However, we can conclude that even small errors in the matching and epipolar geometry estimation can seriously compromise the remaining steps. On the other hand, this is a flexible method for real-world scenes, because it does not require an independent camera calibration process, as the features matched along an image sequence can be used in an autocalibration process. The models built using the GVC method were quite similar to the real objects, in terms of both shape and color. Even so, the reconstruction accuracy was highly dependent on the quality of the results of the camera calibration procedure and of the image segmentation. These can be two major drawbacks in real-world scenes because they limit the application of the GVC method. Thus, when comparing the two methods, we can conclude that, on the one hand, GVC performs better in the 3D reconstruction of objects with complex topology and, on the other hand, SFM is better for the reconstruction of unconstrained real-world objects.

5 ACKNOWLEDGEMENTS

This work was partially done in the scope of project “Segmentation, Tracking and Motion Analysis of Deformable (2D/3D) Objects using Physical Principles”, with reference POSC/EEASRI/55386/2004, financially supported by FCT – Fundação para a Ciência e a Tecnologia from Portugal.

REFERENCES

Aans, H. & Kahl, F., Estimation of Deformable Structure and Motion, Vision and Modelling of Dynamic Scenes Workshop, Copenhagen, Denmark, 2002.
Azevedo, T. C. S., Tavares, J. M. R. S., et al., 3D Volumetric Reconstruction and Characterization of Objects from Uncalibrated Images, 7th IASTED International Conference on Visualization, Imaging, and Image Processing, Palma de Mallorca, Spain, 2007.
Birchfield, S., Depth Discontinuities by Pixel-to-Pixel Stereo, International Journal of Computer Vision, vol. 35, no. 3, pp. 269-293, http://vision.stanford.edu/~birch/p2p/, 1999.
Chaumette, F. & Boukir, S., Structure from motion using an active vision paradigm, Int. Conference on Pattern Recognition, The Hague, Netherlands, vol. 1, pp. 41-44, 1991.
Chiuso, A., Favaro, P., et al., Structure from motion causally integrated over time, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 4, pp. 523-535, 2002.
Fischler, M. A. & Bolles, R., RANdom SAmpling Consensus: a paradigm for model fitting with application to image analysis and automated cartography, Communications of the Association for Computing Machinery, vol. 24, no. 6, pp. 381-395, 1981.
Harris, C. G. & Stephens, M. J., A combined corner and edge detector, Fourth Alvey Vision Conference, University of Manchester, England, vol. 15, pp. 147-151, 1988.
Hui, J., A holistic approach to structure from motion, Computer Science Dissertation, University of Maryland, USA, 2006.
Isgrò, F. & Trucco, E., Projective rectification without epipolar geometry, IEEE International Conference on Computer Vision and Pattern Recognition, Fort Collins, Colorado, USA, vol. 1, pp. 94-99, 1999.
Laurentini, A., The visual hull concept for silhouette-based image understanding, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 2, pp. 150-162, 1994.
Loper, M., Archimedes: Shape Reconstruction from Pictures, A Generalized Voxel Coloring Implementation, http://matt.loper.org/Archimedes/Archimedes_docs/html/index.html, 2002.
Lorensen, W. E. & Cline, H. E., Marching cubes: A high resolution 3D surface construction algorithm, International Conference on Computer Graphics and Interactive Techniques, ACM Press, New York, USA, vol. 21, no. 4, pp. 163-169, 1987.
Pollefeys, M., Gool, L. V., et al., Visual Modeling with a Hand-Held Camera, International Journal of Computer Vision, vol. 59, no. 3, pp. 207-232, 2004.
Remondino, F. & El-Hakim, S., Image-based 3D Modelling: a review, The Photogrammetric Record, vol. 21, no. 115, pp. 269-291, 2006.
Seitz, S. M. & Dyer, C. R., Photorealistic Scene Reconstruction by Voxel Coloring, IEEE Conference on Computer Vision and Pattern Recognition, San Juan, Puerto Rico, pp. 1067-1073, 1997.
Slabaugh, G. G., Culbertson, W. B., et al., Generalized Voxel Coloring, Workshop on Vision Algorithms, Corfu, Greece, pp. 100-115, 1999.
Ullman, S., The Interpretation of Visual Motion, MIT Press, Cambridge, Massachusetts, USA, 1979.
Yang, L., 3D Surface Reconstruction from 2D Images, Center for Visual Computing, Instructional Computing, Stony Brook University, USA, 2003.
Zhang, Z., A Flexible New Technique for Camera Calibration, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330-1334, 2000.