3D MODELLING OF A BUILDING FROM A SET OF UNCALIBRATED IMAGES WITH THE HELP OF CONSTRAINTS
S. Cornou 1, 2, M. Dhome 1, P. Sayd 2
1 LASMEA UMR 6602 du CNRS, Plateaux des Cézeaux, Université Blaise Pascal, 63000 Aubière Cedex
[email protected]
2 CEA Saclay, bât. 512, DRT/DTSI/SLA/LCEI, 91191 Gif/Yvette
[email protected] [email protected]
In this article, we present an interactive method for the 3D modelling of buildings from a set of uncalibrated images. Matched homologous points and constraints (geometric ones in our examples) are used to estimate the 3D model. First, we introduce a bundle adjustment algorithm that hides the pose parameters by running classical pose estimation algorithms inside the optimisation process. We demonstrate that our approach does not require approximate knowledge of the position and orientation of the images. Then, we show that our bundle adjustment method is an original solution to the free gauge problem: the frame is implicitly fixed while the normal equations are solved, and totally free if we consider the full estimation algorithm. Last, we demonstrate experimentally the efficiency of this approach in terms of quality and convergence, and we explain why this method is a good complement to the classical approach.
In a second part, we present how we proceed to the reconstruction. The user locates some specific points in the images and defines constraints such as orthogonality or planarity. Then, the previous bundle adjustment is used to estimate the scene. At the end, we obtain a calibration of the cameras (focal lengths, positions and orientations of the viewpoints) and a 3D model of the building that is compatible with the images and strictly respects the defined geometric constraints. Last, we show examples of 3D reconstruction of simple and complex buildings. In this way, we highlight that this approach is able to cope with a large variety of scenes, and is flexible enough to allow reconstruction from one or several images.
INTRODUCTION
In this work, our objective is to present a semi-automatic building reconstruction method. Our main contribution is an easy-to-use modelling method based on the definition of intuitive geometric constraints. The method yields a 3D model with the smallest description in terms of the number of parameters, the absolute certainty of respecting the geometric rules defined by the user, and the possibility of merging several models easily to obtain a full and multi-scale global 3D building model.
We first introduce the reconstruction problem through an overview of existing methods. Then we explain how we introduce constraints in our model. In the next part we introduce a bundle adjustment algorithm. Last, we present results obtained on real sequences.
1
OVERVIEW OF EXISTING APPROACHES
Many approaches have been proposed to recover the 3D model of a building from images. The aim of every method is to offer the most efficient solution with respect to three main criteria:
• Accuracy of the final model
• Computation times as short as possible, reduced memory usage
• An easy-to-use model
The simplest model describes the building as a cloud of free 3D points. In this case, each point has three degrees of freedom. Consequently, a large amount of information has to be given to allow the reconstruction. In fact, the only link between points is given by the underlying 3D structures associated with the cameras. Automatic systems can cope with this approach because the matching of homologous points is obtained automatically. However, those approaches are limited to sequences in which the point of view of each new image is close to the previous one. In our case, we wish to deal with a small number of images since, in urban reconstruction, the building of interest is often enclosed in a densely built area.
As we decided to use a small number of images, we cannot expect to obtain a large number of matched points. So, we have turned towards a supervised reconstruction process, and we have added to the manually matched points some information about the geometric structure of the scene (that is, constraints). Several methods are available to add this knowledge:
1. Consider a cloud of 3D points and add, to the equations given by the matching between images, penalties expressing the "respect" of the given constraints. We call such constraints "soft constraints", as one cannot be sure that they will be strictly respected.
2. Recover the 3D structure without constraints and then try to find the closest model that respects the desired constraints. Depending on the choice of the final model, this approach may lead to hard or soft constrained solutions.
3. Last, the 3D model may intrinsically contain the constraints. In that case, the degrees of freedom associated with the model are minimal and, whatever combination of parameters we try, the corresponding 3D solution always strictly satisfies those hard constraints.
The first two approaches do not reduce the dimensionality of the problem to the minimum. In contrast, the third one does, but it requires a specific process to obtain a usable constrained model. We have organised the overview of existing reconstruction methods in three parts. Of course, our description is not exhaustive, but we try to give an overview of the existing approaches.
1.1
From elementary features to complex models
Those methods recover models from simple features such as points or lines. Often used in automatic reconstruction processes, they can deal with complex models (curved surfaces...), but their performance depends on the richness of the images (textures, edges…): [Pollefeys 99], [Mahamud 00], [Lhuillier 03]. Once the locations of the 3D features (and the camera calibration) have been identified, the following step is to identify high-level elements to simplify the model. Those elements can be planes or more complex objects such as cylinders, cubes or CAD models. Information about structuring by planes can be found in [Bartoli 03], [Werner 02a] and [Werner 02b]; the problem of detecting complex models in a set of features has been studied in [Moron 96] and [Ertl 01].
1.2
A sum of CAD models
Another way is to describe the model as a set of complex elements such as CAD objects. The user selects 3D shapes from a library and defines relationships between these elements. In addition, the objects are localised in the images. The main advantage of these approaches is that constraints are defined implicitly inside the chosen 3D shapes. The parameterisation is directly usable in a bundle adjustment and strictly satisfies the constraints. Moreover, such a description is directly inspired by industrial CAD tools like CATIA, AUTOCAD, 3D Studio, MAYA… Consequently, the modelling process is easy for engineers to understand and can efficiently be included as a toolbox. This merging of CAD and photogrammetry has already led to successful industrial systems. For example, we quote the industrial measurement solutions: the Cylicon system from Siemens [Navab 02], AOMS from the CEA [Sayd 01], and the Phidias system [PHI]. Some solutions dedicated to realistic rendering also exist: Photomodeler [PHO], Image Modeler [IMA], CANOMA [CAN]. Nevertheless, those systems have to deal with the complexity of recovering 3D structure from images. A lot of research is being carried out in this area. Of course, the FAÇADE system proposed by Debevec [Debevec 96] offered a solution based on parametric blocks. More recently, [Chen 99] has suggested using parallelepipeds available in the scene to calibrate the intrinsic and extrinsic parameters of the camera. A more general approach has been proposed by Wilczkowiak et al. in [Wilczkowiak 01]. Last, a probabilistic approach has been suggested by Dick et al. in [Dick 01]. It consists in using a priori knowledge of the building style to define a statistical distribution of specific architectural elements. After a learning step and a calibration step, a RANSAC process is used to segment planes in the scene and then to identify architectural elements in the scene.
1.3
Mixed approach: elementary features and constraints
The last approach is a mixture of the two previous ones. The idea is to keep the flexibility of the elementary feature set while reducing the number of degrees of freedom through the definition of constraints. Of course, the types of constraints available will limit the class of models that can be reconstructed by the method. The most often used constraints are geometric ones, and the simplest to use are the linear constraints (collinearity…). This subject has been studied in [Szeliski 98] in the case of coplanar points and in [Sparr 98] for coplanarity and parallelism; [Bondyfallat 98] introduces orthogonality and the belonging of a point to a plane or a line. [Grossman 00] and [Wilczkowiak 03a] have introduced linear and bilinear constraints, and solved the resulting systems of equations as linear systems. More recently, the problem of dealing with redundant and multiple constraints has been studied in [Bazin 01] and [Wilczkowiak 03b].
1.4
Motivation of our choice
In this PhD work, we have chosen to study the problem of reconstructing buildings from a small set of images. Consequently, the automatic approaches were difficult to apply, and a mixed solution has been chosen. The user defines elementary features and then introduces simple geometric constraints in the model. This method is close to the CAD methods from the perspective of constraint definition, while offering a large flexibility. We will now present this method in detail. Fig. 1 presents the general scheme of our system, which has three main components. The first is the user interface, used to define constraints and to locate 3D points in the images. Associated with this interface and hidden from the user, a modelling algorithm builds a data representation that encloses all the available information in a form that is easy to use in the computation step. The second component is the computation task, which autocalibrates the cameras and evaluates the 3D position of each point with respect to the constraints. The final step is the conversion of this model to a standard 3D format. We have used VRML as it makes it easy to exchange files over the internet.
Fig. 1. Overview of the reconstruction system
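As an illustration of the final conversion step, the following minimal sketch writes a set of reconstructed points and polygonal faces as a VRML 2.0 `IndexedFaceSet`. The function name `to_vrml` and the data layout (a list of coordinate triples and a list of index lists) are assumptions made for this example; the exporter actually used by the system is not described in the text.

```python
def to_vrml(points, faces):
    """Return a VRML 2.0 (VRML97) string describing one IndexedFaceSet.

    points: list of (x, y, z) tuples; faces: list of lists of point indices.
    Both names and the layout are illustrative assumptions.
    """
    lines = [
        "#VRML V2.0 utf8",
        "Shape {",
        "  geometry IndexedFaceSet {",
        "    coord Coordinate {",
        "      point [",
    ]
    for x, y, z in points:
        lines.append("        %g %g %g," % (x, y, z))
    lines += ["      ]", "    }", "    coordIndex ["]
    for face in faces:
        # Each face is a list of vertex indices terminated by -1.
        lines.append("      " + ", ".join(str(i) for i in face) + ", -1,")
    # solid FALSE makes both sides of each face visible.
    lines += ["    ]", "    solid FALSE", "  }", "}"]
    return "\n".join(lines) + "\n"


# One square facade panel, given by four corners.
panel = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
vrml_text = to_vrml(panel, [[0, 1, 2, 3]])
```

The returned string can be written to a `.wrl` file and opened in any VRML97 viewer; textures would require an additional `Appearance` node, which is left out of this sketch.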
2
DEALING WITH CONSTRAINTS
In order to deal with a simple model and to reduce the dimensionality of the problem, we have searched for a way to integrate constraints into a low-level feature description. Two types of constraints are usually found in the literature: soft and hard ones. Hard constraints have to be respected strictly. For example, the coordinates of a particle of water flowing in a pipe cannot correspond to a point outside the pipe. In other words, hard constraints define a subspace in which the parameters representing the model can evolve, and a forbidden subspace, the complement of the previous one, which will never be reached. In computer vision, such a constraint is met in the case of the focal length, as we often wish to forbid negative values for this parameter. On the other hand, a constraint can be softly required. It is then often expressed as a penalty applied in proportion to the violation. Such a soft constraint is defined in bundle adjustment when we try to minimise the distance between a point detected in the image and the projection of the corresponding 3D point. In our case, we have introduced hard constraints to reduce the size of the minimisation problem (see below). The goal is to provide a mathematical description of the 3D model. The model equation will be defined by a tree and a vector of parameters. For any value of this parameter vector, the resulting model will strictly respect the given constraints. Consequently, we interactively build a function whose input is a set of parameters and whose output is a set of elementary features: points in our case.
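The difference between the two kinds of constraint can be made concrete with a small sketch for the constraint "a point lies on a given plane". The soft version adds a weighted penalty to the cost and can be violated; the hard version generates the point from two parameters, so the constraint holds for every parameter value. The function names and the choice of constraint are assumptions made for illustration, not the formulation used by the authors.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def soft_penalty(p, a, normal, weight):
    """Soft constraint: weighted squared distance from p to the plane
    passing through a with the given normal. It is zero only when the
    constraint is exactly satisfied."""
    offset = [pi - ai for pi, ai in zip(p, a)]
    distance = dot(offset, normal) / math.sqrt(dot(normal, normal))
    return weight * distance ** 2


def point_on_plane(a, e1, e2, u, v):
    """Hard constraint: the point is generated from 2 parameters (u, v)
    instead of 3 coordinates, and lies on the plane spanned by e1 and e2
    through a for any (u, v)."""
    return [ai + u * x + v * y for ai, x, y in zip(a, e1, e2)]


origin = [0.0, 0.0, 1.0]          # a point of the plane z = 1
normal = [0.0, 0.0, 1.0]
e1, e2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]

free_point = [0.0, 0.0, 3.0]      # 3 degrees of freedom, violates the constraint
hard_point = point_on_plane(origin, e1, e2, 2.5, -4.0)

violation = soft_penalty(free_point, origin, normal, weight=10.0)
residual = soft_penalty(hard_point, origin, normal, weight=10.0)
```

With the hard parameterisation the optimiser handles two unknowns instead of three, which is the reduction in problem size that motivates the choice made in this section.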
As we will use a Levenberg-Marquardt method to solve the reconstruction problem, our model function has to satisfy these properties:
• Continuity
• Differentiability
• Injectivity
All these properties are implicitly respected by our reconstruction process. Our process is recursive: the first two points are necessarily absolutely free points (that is, points with three degrees of freedom). Imagine that we have already defined a model of n-1 points and now add a new point, the nth one. As described in figure 2, the new point is located as a function of the positions of the already existing ones. The remaining degrees of freedom are filled by parameters that specify the location of the nth point. The constraint equation takes as input those parameters and the positions of the already existing points to compute the 3D position of the nth point. If the equations corresponding to the available constraints respect the previous properties, then so does the full model. Moreover, this model is the smallest in terms of the number of parameters with respect to the defined constraints.
Fig. 2. Definition of a new point.
Nevertheless, this method presents several drawbacks:
1. The tree structure implies computing point positions as a function of the root ones. Consequently, errors are propagated and may pollute the final result, especially the last 3D points (the most recently defined).
2. The definition of linked constraints may create a lot of local minima and drastically reduce the advantage of the small number of parameters.
3. This method cannot deal with redundant constraints.
Fortunately, we will demonstrate in the last section of this article that those drawbacks do not undermine our method. This method is founded on elementary features like points. Nevertheless, it is really simple to define more complex models, close to CAD ones, by combining points and constraints. For example, figure 3 presents the definition of a rectangular parallelepiped. This example opens the way to a future development of macro-object definition. In this case, the user will manipulate classic objects such as cubes or spheres while the algorithms use a tree description. This merging may offer us a high level of flexibility associated with a highly condensed description of the 3D model (specifically in terms of parameters).
Fig. 3. Definition of a rectangular parallelepiped. As can be observed, the whole model is built from the first two points.
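The recursive construction can be sketched for the rectangular parallelepiped. The parameterisation below is a hypothetical one chosen for illustration, since the exact constraint tree of figure 3 is not given in the text. Two free points carry 3 parameters each, the third corner is constrained to an edge orthogonal to the first one (an angle and a length), the fourth to an edge orthogonal to both (a length), and the four remaining corners need no parameter at all. This gives 9 parameters instead of 24 for eight free points.

```python
import math


def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def dot(a, b): return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def box_from_parameters(params):
    """Map 9 parameters to the 8 corners of a rectangular parallelepiped.

    params = [x0, y0, z0, x1, y1, z1, theta, width, height]
    The two root points must be distinct.
    """
    p0 = tuple(params[0:3])            # free root point
    p1 = tuple(params[3:6])            # second free point
    theta, width, height = params[6:9]
    d = unit(sub(p1, p0))
    # Reference direction orthogonal to d. Picking the least aligned axis
    # is an assumption of this sketch; it is only locally continuous.
    axis = min(((1, 0, 0), (0, 1, 0), (0, 0, 1)), key=lambda e: abs(dot(e, d)))
    e1 = unit(cross(d, axis))
    e2 = cross(d, e1)
    side = add(scale(e1, math.cos(theta)), scale(e2, math.sin(theta)))
    up = cross(d, side)
    a = sub(p1, p0)                    # first edge, given by the two roots
    b = scale(side, width)             # second edge, orthogonal to a
    c = scale(up, height)              # third edge, orthogonal to a and b
    corners = []
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                offset = add(scale(a, i), add(scale(b, j), scale(c, k)))
                corners.append(add(p0, offset))
    return corners


corners = box_from_parameters([0, 0, 0, 2, 1, 0.5, 0.7, 1.5, 3.0])
```

Whatever values an optimiser assigns to the nine parameters, the edges remain mutually orthogonal, which is the behaviour expected from a hard-constrained model.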
3
CACHE_POS: HIDING SOME PARAMETERS TO THE BUNDLE ADJUSTMENT
Once we have designed the model structure with the help of constraints, we need to find the parameter vector that best agrees with the positions of the points located in the images. We suppose here that no intrinsic or extrinsic camera data are available. As our model is a function, we minimise the distance between the points in the images and the projections of the 3D points.
3.1
Estimation algorithm
The estimation algorithm is a bundle adjustment. However, it is a variant that has been published in [Cornou 02a] and [Cornou 03a]. The difference with the usual approach is that the pose parameters are not included among the non-linear parameters to be minimised; they are computed by a pose estimation process inside the bundle adjustment. An overview of the process is presented in figure 4. At each iteration, before the normal equations are evaluated, the pose of each camera is computed with respect to the current parameters that define the positions of the 3D points. As the positions of the 3D points are temporarily fixed, we can apply classical pose estimation algorithms such as [DeMenthon 92].
Fig. 4. The minimisation process.
3.2
Some properties
In order to validate this method, we made some experiments on the success rate as a function of the initialisation and of the noise in the image data. The results are quite remarkable, as can be observed in figure 5.
Influence of the initial state
This first test has been made on synthetic data; it shows a comparison between the CACHE_POS method and the classic approach. The 3D model is a set of twenty 3D points observed by a single camera that took five images. The bars indicate different levels of noise in the 2D point positions, and the X axis corresponds to the initial mean distance between the perfect 3D model and the initial one. The perfect 3D data are contained in a sphere whose radius is one unit. In figure 5, we observe that the success rate of the usual method collapses when the initial 3D error reaches 100% (1 unit), while the proposed method maintains a good score with a success rate around 40%. We also notice that, whatever the method, the 2D noise added to the positions of the points detected in the images does not modify the success rate.
Fig. 5. Comparison of the usual and the CACHE_POS approaches with respect to the 2D noise added to the feature point positions in the images and to the quality of the 3D initialisation.
Profile of a convergence
To understand the convergence errors in CACHE_POS, we have plotted the root mean square (RMS) of the minimised function as a function of the focal length. Figure 6 presents the result of this study.
Fig. 6. On the top, the RMS has been drawn for focal lengths between 0 and 2000 pixels. On the bottom, the range is 500 to 5000 pixels.
For very short focal lengths, we observe an unstable section that presents plenty of local minima. This behaviour is due to the pose estimation algorithm used, which supposes that the object is "far" from the camera. If, during the estimation, the trajectory followed by the camera passes close to the object, we fall into this case and the convergence fails.
In contrast, for high focal lengths we observe an asymptotic behaviour. Once the optimisation trajectory is in this area, the gradient is very close to zero and the optimisation process progresses very slowly. Consequently, the optimisation fails to converge. Last, we notice a quite well identified minimum at f=200 pixels. Nevertheless, it seems to be a non-differentiable point. We have not encountered such a problem during our experiments. This bundle adjustment algorithm has been used in coordination with the modelling tools previously presented. The next section shows reconstructions obtained with a global interactive system founded on these two methods.
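The quantity studied in this section, the RMS distance between detected image points and projected 3D points, can be sketched as follows for a single image. The pinhole model with focal lengths fx and fy and a principal point at the image centre follows the assumptions stated in the text; the function names, the pose convention (camera coordinates = R X + t) and the data layout are assumptions made for this example.

```python
import math


def project(point, rotation, translation, fx, fy, cx, cy):
    """Pinhole projection of a 3D point for a camera pose (R, t).

    rotation is a 3x3 matrix given as nested lists, translation a 3-vector.
    """
    xc = [sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
          for r in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)


def reprojection_rms(observations, points, rotation, translation,
                     fx, fy, cx, cy):
    """RMS distance in pixels over the points observed in one image.

    observations maps a point index to its detected (u, v) position, so
    points that are not visible in the image are simply absent.
    """
    total = 0.0
    for index, (u, v) in observations.items():
        pu, pv = project(points[index], rotation, translation, fx, fy, cx, cy)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return math.sqrt(total / len(observations))


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points = [(1.0, 0.0, 0.0)]
# Principal point at the centre of a 1204 x 801 image, focal length 2000 pixels.
rms = reprojection_rms({0: (1005.0, 404.5)}, points, identity,
                       [0.0, 0.0, 5.0], 2000.0, 2000.0, 602.0, 400.5)
```

In the scheme of section 3.1, the pose passed to such a function would not be an optimised unknown: it would be recomputed by the pose estimation algorithm from the current 3D points before each evaluation.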
3.3
Experimental Evaluation
To test our method on real data, we have used a sequence of 6 images of a desk. These images (cf. figure 6) have been taken with the same camera and the same lens. The image size is 1204*801 pixels. The principal point is supposed to be at the centre of the images, and we have considered the two focal lengths fx and fy as unknown.
Fig. 6. The set of images used for the reconstruction.
As can be observed, all images have been taken from the same side of the desk. 22 3D points have been manually located in the images. Some of them were not seen in all images. To compare with reality, we have put a metric reference in the scene; it is indicated in figure 7 by the legend "1 meter". We have defined distances to evaluate in the scene. Most of the 3D points correspond to the extremities of these edges (except the dot on the wall). In each image, 63 percent of the 3D points are observed, and the initial focal length was fixed to 2000 pixels. Of course, we have measured these elements, and we know the real value of each distance indicated with a number in figure 7.
Fig. 7. Metric reference and length measurements.
To reconstruct the scene, we did not have any initial values for the positions of the 3D points. We have randomly generated clouds of 3D points and launched our optimisation method. We obtain a success rate of 44%. This is coherent with the results obtained on synthetic data (cf. 4.3, Table 1). Successful results were obtained in 1.5 seconds.
In figure 8, the crosses represent the back-projection of the 3D points of the initial 3D cloud in the image after a simple pose estimation with the DeMenthon algorithm. Of course, the projections are really far from the solution. The image in figure 9 shows the final state: the 3D points are well back-projected.
Figure 8: Initial back-projection of 3D points (just a call to the DeMenthon algorithm to locate the camera).
Figure 9: Final back-projection of 3D points.
The measurements are then compared with the real distances. The estimated focal lengths are fx = 1844.5 pixels and fy = 1847.2 pixels. The results for the 3D distance estimation are:
Number    Real length (cm)    Estimated length (cm)    Relative error (%)
1         20.4                20.43                    0.14
2         23.5                23.34                    0.68
3         23.5                23.40                    0.42
4         20.4                20.67                    1.32
5         14.8                14.74                    4.05
6         14.4                14.42                    1.38
7         20.6                20.59                    0.48
8         23.7                23.57                    0.54
9         20.6                20.47                    0.63
10        23.6                23.55                    0.21
11        120.0               119.79                   0.18
Average                                                0.64
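Since an uncalibrated reconstruction is recovered only up to a global scale, the estimated lengths have to be brought to metric units before they can be compared with measurements. The text does not say how the scale factor was chosen; the sketch below uses a least-squares fit over all measured lengths, which is an assumption, and the numbers in the usage lines are illustrative values, not data from the experiment.

```python
def least_squares_scale(model_lengths, real_lengths):
    """Scale s minimising sum((s * m - r) ** 2) over all length pairs."""
    numerator = sum(m * r for m, r in zip(model_lengths, real_lengths))
    denominator = sum(m * m for m in model_lengths)
    return numerator / denominator


def relative_error_percent(real, estimated):
    """Relative error of an estimated length, in percent of the real one."""
    return 100.0 * abs(estimated - real) / real


# Illustrative lengths in arbitrary model units and their measured values in cm.
model_lengths = [1.0, 2.0]
real_lengths = [2.0, 4.0]
scale = least_squares_scale(model_lengths, real_lengths)

# A further model length converted to centimetres and compared with a measurement.
estimated_cm = scale * 10.1
error = relative_error_percent(20.0, estimated_cm)
```

Fixing the scale on a single reference length is an equally valid choice; it makes that reference exact and shifts the residual error onto the other lengths.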
Our estimations are close to the real values, which confirms that our reconstruction is rather good. This reconstruction has been obtained without fixing the coordinate frame; we have just applied a scale factor to compare our results with reality. Of course, this gauge fixing conditions our length evaluation; nevertheless, this experiment is coherent with the synthetic study previously made (not presented here).
4
EXAMPLES
The reconstruction of buildings in an urban environment is a difficult problem. First, the variety of buildings imposes the choice of a very generic method; second, the urban environment limits the available points of view. Of course, the question of the choice of points of view has been widely studied by photogrammetrists [Atkinson 96], but few cases allow a convenient point of view to be chosen. Consequently, we have decided to use an uncalibrated approach. The intrinsic and the extrinsic camera calibrations are unknown. We present here three applications; the first one is the rendering of the Sceaux castle (near Paris). The second and the third ones are models obtained with a cheap digital camera.
4.1
The Sceaux sequence
The "Château de Sceaux" has been chosen as an example. It is a French 17th-century castle. Some specific elements have been selected. 19 photographs of these elements have been taken with a digital camera (Nikon D100) (figure 10). Two focal lengths, 28 mm and 135 mm, have been used. The image resolution is 2000x3008 pixels, the distortions have not been corrected and the focal lengths are unknown. The photographs have been taken from the ground and no information on the camera poses is available. If we look in detail at the castle, a wide variety of details appears (windows, gutters, bas-reliefs…). The user has to choose the details he wants to reconstruct (according to the needs), because it would be too costly to reconstruct each detail of the building. In any case, it is always possible to complete the model later by adding new parts of the building reconstructed from a new set of images. Here, we present the reconstruction of the castle body, the downstairs and upstairs windows, one sort of dormer window, and the central projecting section of the frontage. Each element has been independently reconstructed from adequate images. With only three images available for the downstairs windows, we define a constrained model using our graph description and we apply our bundle adjustment method to recover the 3D structure and the camera parameters. After a manual surface definition, textures have been extracted from the images. This process has been applied to each element. Then, all the elements have been merged together to obtain a full model of the castle. For example, to locate the downstairs windows, window corners have been defined in the frontage with geometric constraints and reconstructed with the help of images (those of the castle body). These windows are positioned at the resolution of the castle body images, but their proportions and their texture have been previously defined from close-range images. For repetitive elements, a unique model has been used. This merging process has been applied to each detail, and the results are presented in the following figures.
Fig. 10. The images used to model the Sceaux Castle.
4.2
A church
This example shows the result obtained with 6 images taken with a low-cost camera. No calibration was available.
Fig. 11. Reconstruction of a church from holiday snapshots.
4.3
A house in Brittany
This example shows the result obtained with 6 images taken with a low-cost camera. No calibration was available. Figure 12 illustrates the difficulty of obtaining correct points of view. Only the blue part was accessible…
Fig. 12. Limitations in the choice of points of view.
Fig. 13. Results, 5 images were available.
CONCLUSION
A new approach for building reconstruction has been presented here. We suggest a method using constraints on 3D points with simple and intuitive geometric rules. The result is an easy-to-use tool that offers the flexibility of low-level feature approaches and the modularity of primitive-based methods. Moreover, a new bundle adjustment approach (without initialisation of the camera poses) has been used to estimate these models. Finally, this method has been successfully applied to a real sequence, and a multi-scale model of the "Château de Sceaux" has been obtained. Two reconstructions have also been achieved from low-cost camera images. An electronic version of the corresponding thesis can be obtained by e-mailing:
[email protected]
Acknowledgments
This article is the result of S. Cornou's PhD thesis. This work has been carried out with the financial support of the "région Auvergne". Thanks to all the researchers of the LASMEA in Clermont-Ferrand and to the researchers of the "Commissariat à l'Energie Atomique" for their support.
References
[Bartoli 03] A. Bartoli. Reconstruction et alignement en vision 3D : points, droites, plans, caméras. PhD thesis, Institut National Polytechnique de Grenoble, September 2003.
[Bazin 01] P.L. Bazin. Modélisation et estimation du mouvement et de la structure dans une séquence d'images à partir d'indices géométriques épars, avec applications à la post-production audio-visuelle. PhD thesis, Université de Paris-Sud, Orsay, June 2001.
[Bondyfallat 98] D. Bondyfallat, S. Bougnoux. Imposing euclidean constraints during self-calibration process. SMILE workshop, 1998.
[Boufama 95] B. Boufama, R. Mohr. Epipole and fundamental matrix estimation using the virtual parallax property. 4th International Conference on Computer Vision (ICCV 1995), pp. 1030–1036, June 1995.
[CAN] http://www.canoma.com/.
[Chen 99] C. Chen, C. Yu, Y. Hung. New calibration-free approach for augmented reality based on parameterized cuboid structure. ICCV, pp. 1:30–37, 1999.
[Cornou 02a] S. Cornou, M. Dhome, P. Sayd. Bundle adjustment: a fast method with weak initialisation. British Machine Vision Conference (BMVC'02), pp. 223–232, 2002.
[Cornou 03a] S. Cornou, M. Dhome, P. Sayd. Architectural reconstruction with multiple views and geometric constraints. British Machine Vision Conference (BMVC'03), 2003.
[Debevec 96] Paul E. Debevec, Camillo J. Taylor, Jitendra Malik. Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. Computer Graphics, 30(Annual Conference Series):11–20, 1996.
[DeMenthon 92] D. DeMenthon, L.S. Davis. Exact and approximate solutions of the perspective-three-point problem. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 14, pp. 1100–1105, 1992.
[Dick 01] A. Dick, P. Torr, S. Ruffle, R. Cipolla. Combining single view recognition and multiple view stereo for architectural scenes. 8th International Conference on Computer Vision (ICCV), pp. 268–274, 2001.
[Ertl 01] Thomas Ertl, Bernd Girod, Heinrich Niemann, Hans-Peter Seidel (edited by). Extracting Cylinders in Full 3D Data Using a Random Sampling Method and the Gaussian Image. Aka GmbH, 2001, pp. 35–42.
[Grossman 00] E. Grossman, J. Santos-Victor. Dual representation for vision-based 3D reconstruction. BMVC, 2000.
[IMA] http://www.realviz.com/.
[Lhuillier 03] M. Lhuillier, L. Quan. Image-based rendering by joint view triangulation. IEEE Transactions on Circuits and Systems for Video Technology, 13(11):1051–1063, 2003.
[Mahamud 00] S. Mahamud, M. Hebert. Iterative projective reconstruction from multiple views. Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2000.
[Moron 96] V. Moron. Mise en correspondance de données 3D avec un modèle CAO. May 1996.
[Navab 02] N. Navab. Canonical representation and three view geometry of cylinders. ISPRS Commission III Photogrammetric Computer Vision, pp. 218–224, 2002.
[PHI] www.phocad.de/produkte/phidias/english/english.html.
[PHO] http://www.photomodeler.com/.
[Pollefeys 99] M. Pollefeys. Self-calibration and metric 3D reconstruction from uncalibrated image sequences. PhD thesis, Katholieke Universiteit Leuven, 1999.
[Sayd 01] P. Sayd, S. Naudet, M. Viala, L. Cohen, A. Dumont, F. Jallon. Application : AOMS un outil de relevé 3D d'environnements industriels. ORASIS 2001, 2001.
[Sparr 98] G. Sparr. Euclidean and affine structure/motion for uncalibrated cameras from affine shape and subsidiary information. SMILE workshop, 1998.
[Szeliski 98] R. Szeliski, P. H. S. Torr. Geometrically constrained structure from motion: points on planes. SMILE workshop, pp. 171–186, 1998.
[Werner 02a] T. Werner, A. Zisserman. Model selection for automated architectural reconstruction from multiple views. Proceedings of the British Machine Vision Conference, pp. 53–62, 2002.
[Werner 02b] T. Werner, A. Zisserman. New techniques for automated architecture reconstruction from photographs. Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, vol. 2, pp. 541–555. Springer-Verlag, 2002.
[Wilczkowiak 01] M. Wilczkowiak, E. Boyer, P. Sturm. Camera calibration and 3D reconstruction from single images using parallelepipeds. ICCV, pp. 1:142–148, 2001.
[Wilczkowiak 03a] M. Wilczkowiak, P. Sturm, E. Boyer. The analysis of ambiguous solutions in linear systems and its application to computer vision. BMVC, pp. 53–62, 2003.
[Wilczkowiak 03b] M. Wilczkowiak, G. Trombettoni, C. Jermann, P. Sturm, E. Boyer. Scene modeling based on constraint system decomposition techniques. ICCV, pp. 1004–1010, 2003.