Detailed image-based 3D geometric reconstruction of heritage objects

FABIO REMONDINO 1

Abstract: Attention to the digital documentation and preservation of heritage is constantly increasing, and fast but reliable, low-cost, portable and practical solutions are of growing interest to archaeologists, restorers and the whole heritage community. The goal of this work is to present the developed multi-photo image-based techniques, able to capture the fine 3D geometric details of such objects or sites. The area of interest can be large and of arbitrary shape. Although laser scanners can acquire a large number of 3D points at high speed, they can be impractical and slow to set up and move around in archaeological sites. Our approach recovers geometric details in an automatic and reliable manner and supports photo-realistic visualisation for heritage and archaeological applications. Our main contributions are the use of multiple image features (grid points, interest points and edges) and the simultaneous processing of multiple images. The reported results demonstrate the utility and flexibility of the technique and show that it creates highly detailed models in a reliable manner, without unpredictable behaviour, for many different types of surface detail. Accuracy tests against range data are also reported.

1 Introduction

3D modeling of an object can be seen as the complete process that starts from the data acquisition and ends with a virtual three-dimensional model that can be viewed interactively on a computer. Nowadays the generation of computer 3D models is mainly achieved with range sensors (e.g. laser scanners) or image data, while in some cases other information like CAD, surveying or GPS data is also integrated in the project [El-Hakim et al., 2007]. Different applications and fields require 3D models, from traditional industrial inspection and robotics to the recent interest in visualization, documentation and preservation of Cultural Heritage. Among the available optical measurement techniques (mainly passive and active methods), photogrammetry is a passive image-based documentation method able to provide precise 3D geometric and textural information of an imaged object. In the last decade, with the advent of automated procedures (mainly for satellite and aerial cases) and fully digital sources and products, it has become easier to use and cheaper, and a wide range of commercial software is available. Most of the current reliable and precise image-based solutions rely on semi-automated measurements; therefore the introduction of automated algorithms is a key goal in the photogrammetric community. Generally, 3D modeling methods can be classified according to the level of automation or the required input data, while their strength is reflected by (i) the variety of scenes that can be processed, (ii) the level of detail that can be reconstructed and (iii) the accuracy of the final model. According to the project requirements, automated, semi-automated or manual image-based approaches should be selected to produce digital models usable for inspection, visualization or documentation.

1 Institute of Geodesy and Photogrammetry – ETH Zurich, Switzerland. E-mail: [email protected]; Web: http://www.photogrammetry.ethz.ch

Automated methods focus mainly on full automation of the process but generally produce results that are suitable mainly for nice-looking real-time 3D recording or simple visualization. On the other hand, semi-automated methods try to reach a balance between accuracy and automation and are very useful for precise documentation and restoration planning. Therefore it is always very important to define the project requirements first and then select the most suitable recording and processing method. Even if active sensors are promising and very often used in 3D modeling projects, their cost, size and power requirements, the large number of scans needed to overcome occlusions and the intricate handling of their data are significant drawbacks. Alternatively, image-based techniques provide data from low-cost portable digital cameras or even mobile phones, and the recent developments in image matching and surface measurement are very promising. In this article we present our latest research in terrestrial image-based modeling for the documentation of heritage and archaeological objects, focusing mainly on the image matching algorithm developed to derive dense and detailed surface models from a generic set of convergent terrestrial images.

2 Terrestrial image-based modeling

Compared to other recording and modeling methods, images can be acquired without expensive systems and contain all the information needed for the generation of a textured 3D model. But deriving a complete, detailed, accurate and realistic 3D image-based model is still a difficult task, in particular for large or detailed objects, if the images are acquired by non-experts or if uncalibrated or widely separated images are used. Photogrammetry has been dealing with the precise 3D reconstruction of objects from images for many years. Even if it is often considered time consuming and complicated, the heritage community is starting to consider it for digital documentation, also at terrestrial scale, as a very promising alternative to range sensors, which are traditionally used as easy and efficient instruments even if not always portable and usable. Photogrammetry requires precise calibration and orientation procedures, but different commercial packages are nowadays available. In the terrestrial case, those packages are all based on manual or semi-automated measurements. After the (manual) tie point measurement and bundle adjustment phase, they provide sensor calibration and orientation data, 3D object point coordinates from a multi-image network, as well as wireframe or textured 3D models. Nevertheless, two major research topics are still largely investigated: automated image orientation and surface measurement. Indeed, at the moment, no commercial solution is able to perform automated markerless image orientation [Remondino & Ressl, 2006], while automated camera calibration based on coded targets has been solved for some years [Ganci & Hanley, 1998; Cronk et al., 2006]. Furthermore, there is no package able to automatically reconstruct a complex surface model employing more than two images: commercial photogrammetric software (e.g. LPS - Leica Geosystems, PI-3000 - Topcon) only offers a matching tool able to provide dense surface models from 'convergent' stereo-pairs [Kadobayashi et al., 2004; Chandler et al., 2007], while a multi-photo approach would be more reliable and efficient.
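To make the tie point and bundle adjustment phase mentioned above more concrete, the following is a minimal sketch, not the code of any of the packages discussed here: camera poses and 3D tie point coordinates are refined by minimizing the reprojection error with SciPy. A plain pinhole camera with known focal length and no distortion is assumed, whereas a full photogrammetric adjustment also estimates the interior orientation and additional parameters.

# Minimal bundle adjustment sketch (illustrative assumptions: pinhole camera,
# known focal length, no distortion).
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(points3d, rvec, tvec, focal):
    """Project 3D points into one image (pinhole, principal point at the origin)."""
    R = Rotation.from_rotvec(rvec).as_matrix()
    pc = points3d @ R.T + tvec                  # points in camera coordinates
    return focal * pc[:, :2] / pc[:, 2:3]       # image coordinates

def residuals(params, n_cams, n_pts, cam_idx, pt_idx, obs, focal):
    """Reprojection residuals for all tie point observations.
    params = [6 pose values per camera (axis-angle + translation), then 3 per point]."""
    cams = params[:n_cams * 6].reshape(n_cams, 6)
    pts = params[n_cams * 6:].reshape(n_pts, 3)
    res = []
    for c in range(n_cams):
        sel = cam_idx == c
        proj = project(pts[pt_idx[sel]], cams[c, :3], cams[c, 3:], focal)
        res.append((proj - obs[sel]).ravel())
    return np.concatenate(res)

# obs: (N, 2) measured tie point image coordinates; cam_idx / pt_idx: which camera and
# which 3D point each observation belongs to; x0: approximate poses and points stacked
# into a single vector (e.g. from resection / intersection).
# result = least_squares(residuals, x0, args=(n_cams, n_pts, cam_idx, pt_idx, obs, focal))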

Fully automated 3D modeling procedures have been widely reported in the vision research community [Fitzgibbon & Zisserman, 1998; Nister, 2001; Pollefeys et al., 2004]. These approaches start with a sequence of closely separated images taken with an uncalibrated camera. The system then extracts interest points, sequentially matches them across the views and computes the camera parameters as well as the 3D coordinates of the matched points. This is done in a projective geometry framework and is usually followed by a bundle adjustment. A self-calibration, to compute the interior camera parameters, is afterwards performed in order to obtain a metric reconstruction, up to scale, from the projective one. The 3D surface model is then automatically generated by means of dense stereo depth maps. It is therefore clear that the key to the success of these fully automated approaches is the very short interval between consecutive images, the absence of illumination or scale changes and good texture in the images. These are all constraints that cannot always be satisfied during the image acquisition [Voltolini et al., 2006], in particular the small baseline. Moreover, illumination changes can always appear in a sequence, as can image-scale differences. To face the wide-baseline and image-scale problems, different strategies have been proposed [Matas et al., 2002; Lowe, 2004; Mikolajczyk et al., 2005], although further research in this area is still needed. Indeed, their reliability and applicability for automated image-based modeling of complex objects is still not satisfactory, as they yield mainly a sparse set of matched feature points. Automated dense reconstruction approaches were instead presented in [Strecha et al., 2003; Megyesi & Chetverikov, 2004], but no accuracy tests were reported. In some applications, manual measurements are also performed, generally for complex architectural objects or in cultural heritage documentation where highly precise and detailed results are required [Gruen et al., 2004]. Manual measurements are time consuming and provide less dense 3D point clouds, but have higher reliability compared to automated procedures. Therefore, the modeling steps are generally kept separate, with automation where possible and interaction where reliability and precision are necessary. The entire photogrammetric workflow used to derive metric and reliable information of a scene from a set of images consists of (i) calibration and orientation, (ii) 3D measurement via image matching, (iii) structuring and modeling, and (iv) texture mapping and visualization. In the following section, the surface measurement approach developed at ETH to derive dense and detailed 3D models from terrestrial images is presented. For a review of the entire image-based modeling pipeline we refer to [Remondino & El-Hakim, 2006].
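As an illustration of the first steps of such fully automated pipelines (interest point extraction, matching between two close views and relative camera pose computation), the sketch below uses OpenCV; it is not the code of the cited systems. A calibrated camera matrix K is assumed for simplicity, whereas the cited approaches start in a projective framework and self-calibrate afterwards.

# Illustrative sketch only: SIFT matching and relative orientation of two close views,
# assuming a known calibration matrix K.
import cv2
import numpy as np

def relative_orientation(img1, img2, K):
    # Interest point extraction and description
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Nearest-neighbour matching with Lowe's ratio test to reject ambiguous matches
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.8 * n.distance]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # Robust estimation of the essential matrix and decomposition into rotation/translation
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t, pts1, pts2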

3 Dense and detailed surface measurement

3.1 Image matching overview

Image matching can be defined as the establishment of correspondences between primitives extracted from two or more images. Typical primitives are points or edges. In its oldest form, image matching involved 4 transformation parameters (cross-correlation) and could already provide successful results [Foerstner, 1982]. Further extensions considered 6- and 8-parameter transformations, leading to the well-known non-linear Least Squares Matching (LSM) estimation procedure [Gruen, 1985; Foerstner, 1986].
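A minimal, illustrative sketch of the simplest correlation-based case: a template around a point in one image is shifted over a search window in a second image and the position of maximum normalized cross-correlation is retained (the two shift parameters are estimated explicitly, while brightness and contrast are handled implicitly by the normalization). Least Squares Matching generalizes this by additionally estimating an affine geometric (and a radiometric) transformation of the patch. Pure NumPy, exhaustive search; not production code.

# Illustrative normalized cross-correlation (NCC) template matching.
import numpy as np

def ncc(patch, template):
    """Normalized cross-correlation between two equally sized grey-value patches."""
    a = patch - patch.mean()
    b = template - template.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def match_by_ncc(search_img, template, search_window):
    """Slide the template over a window ((r0, r1), (c0, c1)) of the search image and
    return the upper-left position with the highest correlation score."""
    h, w = template.shape
    (r0, r1), (c0, c1) = search_window
    best_score, best_pos = -1.0, None
    for r in range(r0, r1 - h + 1):
        for c in range(c0, c1 - w + 1):
            score = ncc(search_img[r:r + h, c:c + w], template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score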

Gruen [1985] and Gruen & Baltsavias [1986] introduced the Multi-Photo Geometrically Constrained (MPGC) matching concept and also integrated the surface reconstruction into the process. The matching procedure was then generalized from image space to object space, introducing the concept of 'groundel' or 'surfel' [Wrobel, 1987; Helava, 1988]. In the vision community, two-frame stereo-correspondence algorithms are predominantly used [Dhond & Aggarwal, 1989; Brown, 1992; Scharstein & Szeliski, 2002], producing a dense disparity map consisting of a parallax estimate at each pixel. Often the second image is resampled along the epipolar line so that the parallax appears in only one direction. A large number of algorithms have been developed and the dense output is generally used for view synthesis, image-based rendering or quick modeling of complete regions. Apart from simple points, the extraction of feature lines [Dhond & Aggarwal, 1989; Ziou & Tabbone, 1998] is also a crucial step in the surface generation procedure. Lines (edgels) provide more geometric information than single points and are also useful in the surface reconstruction (e.g. as breaklines) to avoid smoothing effects on the object edges. Edge matching [Vosselman, 1992; Gruen & Li, 1996; Schmid & Zisserman, 2000] establishes edge correspondences over images acquired from different standpoints. Similarity measures derived from the edge attributes (such as length, orientation and absolute gradient magnitude) are a key element of the matching procedure. Unfortunately, in close-range photogrammetry the viewpoints may change considerably; therefore such similarity measures are not always useful for edge matching. Even if more than three decades have been devoted to the image matching problem, some important limiting factors still remain. A fully automated, precise and reliable image matching method, adaptable to different image sets and scene contents, is not available, in particular for close-range images. The limits lie in the insufficient understanding and modeling of the underlying processes (human stereo vision) and the lack of appropriate theoretical measures for self-tuning and quality control. The design of an image matcher should take into account the topology of the object, the primitives used in the process, the constraints used to restrict the search space, a strategy to control the matching results and, finally, optimization procedures to combine the image processing with the used constraints.

3.2 Surface measurement from multiple images

The multi-image matching approach reported in this paper was originally developed for the processing of very high-resolution TLS linear array images [Gruen & Zhang, 2003] and was afterwards modified to accommodate any linear array sensor [Zhang & Gruen, 2004; Zhang, 2005]. It has then been extended to process other image data such as traditional aerial photos or convergent close-range images [Remondino & Zhang, 2006; Lambers et al., 2007]. It is based on the Multi-Photo Geometrically Constrained (MPGC) matching concept [Gruen & Baltsavias, 1986] and the Least Squares B-Spline Snakes (LSB-Snakes) method [Gruen & Li, 1996]. The matcher combines different matching algorithms according to the following steps:
1. Image pre-processing: the set of available images is processed with a combination of an adaptive smoothing filter and the Wallis filter [Wallis, 1976], in order to reduce the effects of radiometric problems such as strongly bright and dark regions and to optimize the images for the subsequent feature extraction and image matching (a small filtering sketch is given after this list). Furthermore, image pyramids are generated.

2. Multiple Primitive Multi-Image (MPM) matching: this part is the core of the whole strategy for accurate and robust surface reconstruction. Starting from the low-density features in the lowest resolution level of the image pyramid, the approach incorporates multiple image primitives (feature points, grid points and edges) extracted and matched in 3 integrated subsystems: the feature point extraction and matching, the edge extraction and matching (based on edge geometric and photometric attributes) and the relaxation-based relational matching procedure. Within the pyramid levels, the matching is performed with an extension of the standard cross-correlation technique (Geometrically Constrained Cross-Correlation), integrating the epipolar geometry constraint to restrict the search space (a simplified sketch of the idea follows this list). The matcher exploits the concept of multi-image matching guided from object space and allows the reconstruction of 3D objects by matching all available images simultaneously, without having to match all individual stereo-pairs and merge the results [Gruen & Baltsavias, 1986]. Moreover, at each pyramid level a TIN is produced from the matched features and is used in the subsequent pyramid level as an approximation of the analyzed surface.

3. Refined matching: the MPGC matching and LSB-Snakes methods are used to refine the previous correlation results, achieve potentially sub-pixel accuracy matches and identify inaccurate and possibly false matches. This is applied only at the original image resolution level. The surface derived from the previous step provides good enough approximations for the two matching methods and increases the convergence rate.

The main characteristics of the multi-image-based matching procedure are:

• Truly multiple image matching: the approach does not aim at pure image-to-image matching but directly seeks image-to-object correspondences. A point is matched simultaneously in all the images where it is visible and, exploiting the collinearity constraint, its 3D coordinates are directly computed, together with their accuracy values.

• Matching with multiple primitives: the method is designed as a robust hybrid image matching algorithm which takes advantage of both area-based and feature-based matching techniques and uses both local and global image information. In particular, it combines an edge matching method with a point matching method through a probability-relaxation-based relational matching process. Feature points are suitable for generating dense and accurate surface models but they suffer from problems caused by image noise, occlusions and discontinuities. Edges generate coarser but more stable models as they carry higher semantic information and are more tolerant to image noise.

• High matching redundancy: exploiting the multi-image concept, highly redundant matching results are obtained. The high redundancy also allows automatic blunder detection: mismatches can be detected and deleted through analysis and consistency checking within a small neighbourhood.

Apart from the known camera parameters, the matcher requires some seed points between the images to start the automated matching procedure. These points can be measured manually in mono- or stereo-view or imported from the orientation phase.
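The filtering sketch announced in step 1: a possible Wallis-type filter that locally drives the image towards a target mean and standard deviation, so that dark and bright regions obtain comparable contrast before feature extraction and matching. The window size and target values below are illustrative assumptions, not the parameters of the actual implementation, and the adaptive smoothing step is omitted.

# Illustrative Wallis-type filter (parameters are assumptions, not the paper's values).
import numpy as np
from scipy.ndimage import uniform_filter

def wallis_filter(img, win=31, target_mean=127.0, target_std=50.0, b=0.7, c=0.85):
    """b: brightness forcing factor in [0, 1]; c: contrast expansion factor in [0, 1]."""
    img = img.astype(np.float64)
    local_mean = uniform_filter(img, win)
    local_var = np.maximum(uniform_filter(img ** 2, win) - local_mean ** 2, 0.0)
    local_std = np.sqrt(local_var)
    # The gain pulls the local standard deviation towards the target value,
    # the offset pulls the local mean towards the target mean.
    gain = (c * target_std) / (c * local_std + (1.0 - c) * target_std)
    out = (img - local_mean) * gain + b * target_mean + (1.0 - b) * local_mean
    return np.clip(out, 0, 255).astype(np.uint8)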
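The simplified sketch announced in step 2, illustrating the idea of matching guided from object space: a candidate object point is moved along the viewing ray of a reference image, projected into all other images through a pinhole (collinearity) model, and the depth giving the best mean correlation is kept. The camera representation, the patch handling and the reuse of the ncc() helper from the earlier sketch are simplifying assumptions; the actual Geometrically Constrained Cross-Correlation works with the full sensor model, epipolar constraints and the pyramid-based surface approximations described above.

# Illustrative object-space guided multi-image correlation (simplified assumptions).
import numpy as np

def project_point(X, K, R, t):
    """Pinhole (collinearity) projection of one object point X (3-vector) into an image."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

def match_along_ray(template, imgs, cams, ray_origin, ray_dir, depths, half=7):
    """imgs/cams: the other images with their (K, R, t); template: (2*half+1)^2 patch
    around the feature in the reference image. Returns the best depth and its score."""
    best_depth, best_score = None, -1.0
    for d in depths:
        X = ray_origin + d * ray_dir                       # candidate 3D point
        scores = []
        for img, (K, R, t) in zip(imgs, cams):
            u, v = project_point(X, K, R, t)
            r, c = int(round(v)), int(round(u))
            if half <= r < img.shape[0] - half and half <= c < img.shape[1] - half:
                patch = img[r - half:r + half + 1, c - half:c + half + 1]
                scores.append(ncc(patch, template))        # ncc() from the sketch in 3.1
        if scores and np.mean(scores) > best_score:
            best_score, best_depth = float(np.mean(scores)), d
    return best_depth, best_score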

4 Examples

We have performed many tests on different close-range data sets, trying to evaluate the presented surface measurement approach under different image conditions: widely separated images, untextured surfaces, detailed heritage objects, illumination or scale changes, etc. Results are presented in Figure 1, Figure 2 and Figure 3.

Figure 1: Heritage objects and derived 3D results: detailed and complex ornament (left) and ancient ruins of a church (right).

Figure 2: Three wide-baseline images and the derived 3D model, displayed as a colour-shaded as well as a textured model.

For accuracy tests, we compared the image matching results with different ground truth data, acquired both in the lab [El-Hakim et al., 2007] and in the field [Rizzi et al., 2007]. The ground truth data were acquired with triangulation-based or ToF laser scanners. To compare the photogrammetric models with the ground truth data, we used commercial reverse engineering software, whose registration and '3D-compare' functions compare two surface models and provide a colour-coded map of the distances between them. A sample result is reported in Figure 4. In all our experiments, the average difference between the scanned model and the image-based model was between 1 mm (triangulation-based scanner) and 3 mm (ToF scanner, with the object ca 20 m away from the sensor), leading to the conclusion that the accuracy and level of detail reached by both approaches are very similar.
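For readers without access to commercial reverse engineering software, a minimal sketch of the underlying cloud-to-cloud comparison is given below; it assumes SciPy and two already co-registered point sets, whereas the '3D-compare' tool used here computes point-to-surface distances after registration and produces the colour-coded deviation map shown in Figure 4.

# Illustrative cloud-to-cloud comparison (assumes co-registered point sets).
import numpy as np
from scipy.spatial import cKDTree

def compare_models(reference_pts, test_pts):
    """reference_pts, test_pts: (N, 3) arrays of already co-registered 3D points."""
    tree = cKDTree(reference_pts)
    dist, _ = tree.query(test_pts)          # nearest-neighbour distance for each point
    return {"mean": float(dist.mean()), "std": float(dist.std()), "max": float(dist.max())}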

Figure 3: Sample models of detailed heritage objects: small pot, church frontal relief, frescoed arches, small wooden relief.

5 Conclusions

3D image-based modeling of heritage is a very interesting topic with many possible applications. We believe that site managers, archaeologists, restorers, conservators and the whole heritage community need simple and cost-effective methods to record and document heritage. Image-based modeling is a very suitable approach, and the presented results show that advanced surface measurement algorithms can achieve results similar to those of range sensors, but in a cheaper, faster, more portable and simpler way. In this contribution we reported our matching and surface measurement strategy. It is a multi-image approach, more reliable and precise than typical stereo-pair algorithms, and it is based on the least squares matching principle. Further accuracy tests are required, but we can already say that the accuracy and detail of the final 3D model are no longer the decisive factors in the choice of the 3D modeling technique, at least in most terrestrial applications.

Figure 4: 3D model of a relief, derived from 5 images acquired with a 13.5 Mpixel camera equipped with a 135 mm lens. The accuracy analysis (colour-coded map) against range sensor data (Leica HDS 3000, 6 mm positional accuracy at 50 m) gave a standard deviation of ca 3 mm.

6 References

BROWN, L.G., 1992: A survey of image registration techniques. ACM Computing Surveys, 24(4), pp. 325-376
CHANDLER, J., BRYAN, P., FRYER, J., 2007: The development and application of a simple methodology for recording rock art using consumer-grade digital cameras. Photogrammetric Record, 22(117), pp. 10-21
CRONK, S., FRASER, C., HANLEY, H., 2006: Automatic metric calibration of colour digital cameras. Photogrammetric Record, 21(116), pp. 355-372
DHOND, U.R., AGGARWAL, J.K., 1989: Structure from Stereo. IEEE Transactions on Systems, Man and Cybernetics, Vol. 19(6), pp. 1489-1510
EL-HAKIM, S., GONZO, L., VOLTOLINI, F., GIRARDI, S., RIZZI, A., REMONDINO, F., WHITING, E., 2007: Detailed 3D modeling of castles. Int. Journal of Architectural Computing (in press)
FITZGIBBON, A., ZISSERMAN, A., 1998: Automatic 3D model acquisition and generation of new images from video sequences. Proceedings of the European Signal Processing Conference, pp. 1261-1269
FOERSTNER, W., 1982: On the geometric precision of digital correlation. IAPRS, Vol. 24(3), pp. 176-189
FOERSTNER, W., 1986: A feature based correspondence algorithm for image matching. IAP, Vol. 26(3), Rovaniemi
GANCI, G., HANLEY, H., 1998: Automation in videogrammetry. IAPRS, 32(5), pp. 53-58

GEORGESCU, B., MEER, P., 2004: Point matching under large image deformations and illumination changes. PAMI, Vol. 26(6), pp. 674-688
GRUEN, A., 1985: Adaptive least squares correlation: a powerful image matching technique. South African Journal of Photogrammetry, Remote Sensing and Cartography, Vol. 14(3), pp. 175-187
GRUEN, A., BALTSAVIAS, E., 1986: Adaptive least squares correlation with geometrical constraints. Proc. of SPIE, Vol. 595, pp. 72-82
GRUEN, A., LI, H., 1996: Linear feature extraction with LSB-Snakes from multiple images. IAPRS, Vol. 31(B3), pp. 266-272
GRUEN, A., ZHANG, L., 2003: Automatic DTM generation from TLS data. In: Gruen/Kahmen (Eds.), Optical 3D Measurement Techniques VI, Vol. I, ISBN 3-906467-43-0, pp. 93-105
GRUEN, A., REMONDINO, F., ZHANG, L., 2004: Photogrammetric reconstruction of the Great Buddha of Bamiyan, Afghanistan. Photogrammetric Record, Vol. 19(107)
HELAVA, U.V., 1988: Object-space least-squares correlation. PE&RS, Vol. 54(6), pp. 711-714
KADOBAYASHI, R., KOCHI, N., OTANI, H., FURUKAWA, R., 2004: Comparison and evaluation of laser scanning and photogrammetry and their combined use for digital recording of cultural heritage. IAPRS&SIS, 35(5), pp. 401-406
LAMBERS, K., EISENBEISS, H., SAUERBIER, M., KUPFERSCHMIDT, D., GAISECKER, T., SOTOODEH, S., HANUSCH, T., 2007: Combining photogrammetry and laser scanning for the recording and modelling of the Late Intermediate Period site of Pinchango Alto, Palpa, Peru. Journal of Archaeological Science, 34 (in press)
LOWE, D., 2004: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, Vol. 60(2), pp. 91-110
MATAS, J., CHUM, O., URBAN, M., PAJDLA, T., 2002: Robust wide baseline stereo from maximally stable extremal regions. Proceedings of BMVC, pp. 384-393
MIKOLAJCZYK, K., TUYTELAARS, T., SCHMID, C., ZISSERMAN, A., MATAS, J., SCHAFFALITZKY, F., KADIR, T., VAN GOOL, L., 2005: A comparison of affine region detectors. Int. Journal of Computer Vision
MEGYESI, Z., CHETVERIKOV, D., 2004: Affine propagation for surface reconstruction in wide baseline stereo. Proc. ICPR, Cambridge, UK
NISTER, D., 2001: Automatic dense reconstruction from uncalibrated video sequences. PhD Thesis, Computational Vision and Active Perception Lab, NADA-KTH, Stockholm, 226 p.
POLLEFEYS, M., VAN GOOL, L., VERGAUWEN, M., VERBIEST, F., CORNELIS, K., TOPS, J., KOCH, R., 2004: Visual modeling with a hand-held camera. IJCV, Vol. 59(3), pp. 207-232
REMONDINO, F., EL-HAKIM, S., 2006: Image-based 3D modelling: a review. Photogrammetric Record, 21(115), pp. 269-291
REMONDINO, F., RESSL, C., 2006: Overview and experience in automated markerless image orientation. IAPRS&SIS, Vol. 36(3), pp. 248-254
REMONDINO, F., ZHANG, L., 2006: Surface reconstruction algorithms for detailed close-range object modeling. IAPRS&SIS, Vol. 36(3), pp. 117-123
RIZZI, A., VOLTOLINI, F., REMONDINO, F., GIRARDI, S., GONZO, L., 2007: Optical measurement techniques for the digital preservation, documentation and analysis of cultural heritages. In: Gruen/Kahmen (Eds.), Optical 3D Measurement Techniques VIII, Zurich, Switzerland (in press)
SCHARSTEIN, D., SZELISKI, R., 2002: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. IJCV, 47(1/2/3), pp. 7-42

SCHMID, C., ZISSERMAN, A., 2000: The geometry and matching of lines and curves over multiple views. IJCV, Vol. 40(3), pp. 199-233
STRECHA, C., TUYTELAARS, T., VAN GOOL, L., 2003: Dense matching of multiple wide-baseline views. IEEE Proceedings of ICCV'03, Vol. 2, pp. 1194-1201
TUYTELAARS, T., VAN GOOL, L., 2004: Matching widely separated views based on affine invariant regions. International Journal of Computer Vision, Vol. 59(1), pp. 61-85
VOLTOLINI, F., REMONDINO, F., PONTIN, M., GONZO, L., 2006: Experiences and considerations in image-based modeling of complex architectures. IAPRS&SIS, Vol. 36(5), pp. 309-314
VOSSELMAN, G., 1992: Relational matching. Lecture Notes in Computer Science, No. 628, Springer, Berlin, 190 pages
WALLIS, R., 1976: An approach to the space variant restoration and enhancement of images. Proc. of Symposium on Current Mathematical Problems in Image Science, Naval Postgraduate School, Monterey, CA
WROBEL, B., 1987: Facet Stereo Vision (FAST Vision) – a new approach to computer stereo vision and to digital photogrammetry. Proc. of ISPRS Intercommission Conference on 'Fast Processing of Photogrammetric Data', Interlaken, Switzerland, pp. 231-258
ZHANG, L., GRUEN, A., 2004: Automatic DSM generation from linear array imagery data. IAPRS&SIS, Vol. 35(B3), pp. 128-133
ZHANG, L., 2005: Automatic Digital Surface Model (DSM) generation from linear array images. PhD Thesis Nr. 16078, IGP, ETH Zurich, Switzerland, 199 pages
ZIOU, D., TABBONE, S., 1998: Edge detection techniques – an overview. Journal of Pattern Recognition and Image Analysis, Vol. 8, pp. 537-559