Creating Photorealistic Models by Data Fusion with Genetic Algorithms
D. Chetverikov, Z. Jankó, E. Lomonosov, and A. Ekárt
Computer and Automation Research Institute, Budapest, Kende u.13-17, H-1111 Hungary
[email protected]

1 Introduction

Building photorealistic 3D models of real-world objects is a fundamental problem in computer vision and computer graphics. Such models require precise geometry as well as detailed texture on the surface. Textures allow one to obtain visual effects that are essential for high-quality rendering. Photorealism is further enhanced by adding surface roughness in the form of the so-called 3D texture represented by a bump map. Typical applications of precise photorealistic 3D models are:

• Digitalising cultural heritage objects (the Digital Michelangelo project [23], the Pieta project [8], the Great Buddha project [9]).
• Surgical simulations in medical imaging.
• E-commerce.
• Architecture.
• Entertainment (movies, computer games).

Different techniques exist to reconstruct the object surface and to build photorealistic 3D models. Although the geometry can be measured by various methods of computer vision, laser scanners are usually used for precise measurements. However, most laser scanners do not provide texture and colour information, or if they do, the data is not accurate enough. (See [30] for a detailed discussion.)

We are currently working on several projects related to the automatic fusion and high-level interpretation of 2D and 3D sensor data for building models of real-world objects and scenes. One of our major goals is to create rich, geometrically correct, scalable photorealistic 3D models based on multimodal data obtained by different sensors, such as a camera and a laser scanner. In this chapter, we present a sophisticated software system that processes and fuses geometric and image data using genetic algorithms and efficient methods of computer vision.


Our primary goal is to create a system that only uses a PC, an affordable laser scanner and a commercial (although high-quality) uncalibrated digital camera. The camera should be used freely and independently of the scanner. No other equipment (special illumination, calibrated setup, etc.) should be used, and no specially trained personnel should be needed to operate the system: after training, a computer user with minimal engineering skills should be able to use it.

The above-mentioned ambitious projects [8, 9, 23] have developed sophisticated technologies for digitising statues and even buildings, but these technologies are extremely expensive and time-consuming due to the size of the objects to be measured. They require specially designed equipment and trained personnel, and the creation of a model takes weeks [8] or even months. What we aim at is a much simpler system that would, for instance, allow an arts expert in a museum to create, in 1–2 hours, a scalable photorealistic model of a relatively small cultural object (say, up to 50 cm in size) and transmit this model to a colleague in another part of the world.

Our modelling system receives as input two datasets of diverse origin: a number of partial measurements (3D point sets) of the object surface made by a hand-held laser scanner, and a collection of high-quality images of the object acquired independently by a digital camera. The partial surface measurements overlap and cover the entire surface of the object; however, their relative orientations are unknown since they are obtained in different, unregistered coordinate systems. A specially designed genetic algorithm (GA) automatically pre-aligns the surfaces and estimates their overlap. Then a precise and robust iterative algorithm (Trimmed Iterative Closest Point, TrICP [4]) developed in our lab is applied to the roughly aligned surfaces to obtain a precise registration. Finally, a complete geometric model is created by triangulating the integrated point set.
The geometric model is precise, but it lacks texture and colour information. The latter is provided by the other dataset, the collection of digital images. The task of precise fusion of the geometric and the visual data is not trivial, since the pictures are taken freely from different viewpoints and with varying zoom. The data fusion problem is formulated as photo-consistency optimisation, which amounts to minimising a cost function whose numerous variables are the internal and the external parameters of the camera. Another dedicated genetic algorithm is used to minimise this cost function.

When the image-to-surface registration problem is solved, we still face the problem of seamless blending of multiple textures, that is, images of a surface patch appearing in different views. This problem is solved by a surface flattening algorithm that gives a 2D parametrisation of the model. Using a measure of visibility as weight, we blend the textures, providing a seamless and detail-preserving solution.

All major components of the described system are original, developed in our laboratory. Below, we present the main algorithms and give examples of photorealistic model building using GA-based registration and fusion of spatial and pictorial data. Since GA-based data fusion plays a key role in our project, we start with a discussion of the previous use of GAs for this purpose. The discussion is based on our paper [22].

1.1 Previous work on genetic algorithms in data registration

Genetic algorithms have already been used for registration of 3D data. In their recent survey on genetic algorithms in computer-aided design [25], Renner and Ekárt mention a few related studies. In particular, Jacq and Roux [15] applied GAs to the registration of medical 3D images. They considered elastic transformation of 3D volumetric bitmaps rather than rigid transformation of 3D point clouds. Cordón and co-authors [6] also addressed the problem of 3D image registration. They used a CHC evolutionary algorithm and compared binary coding to real-value coding.

Brunnström and Stoddart [2] used GAs to match 3D surfaces represented by measured points. Their approach differs significantly from ours, as they perform optimisation in the correspondence space rather than the transformation space: a genetic algorithm finds the best matches, then the optimal transformation is calculated from these matches. The cost function used in [2] is computationally demanding: the number of operations needed to evaluate it grows quadratically with the number of points. For this reason, the method is only applicable to a relatively small number of points (up to a few hundred).

Yamany et al. [29] used a genetic algorithm for registration of partially overlapping 2D and 3D data by minimising the mean square error cost function. The method is made suitable for registration of partially overlapping datasets by only considering the points p_i ∈ G_1 ∩ G_2, where G_1 and G_2 are space bounding sets of the two datasets. Unfortunately, the authors give very few details about their genetic algorithm, focusing instead on the Grid Closest Point transformation they use to find the nearest neighbour. Salomon et al. [27] apply a so-called differential evolution algorithm to medical 3D image registration.
Differential evolution uses a real-valued representation and operates directly on the parameter vector to be optimised; otherwise, only the reproduction step differs from a GA. On the other hand, this method requires much more computation than the simpler algorithm we use. In [27], differential evolution is used to register two roughly pre-aligned volumetric images of small size.

Robertson and Fisher [26] applied GAs to the registration of range data. They use the mean square error objective function and perform optimisation in the six-parameter space without considering the overlap. Their genetic algorithm uses four different mutation operators and two crossover operators, each with an associated probability of selection, which the algorithm adjusts adaptively. The study [26] includes experiments with partially overlapping measured data; however, no method to automatically deal with arbitrary overlap is presented.

Silva et al. [28] use GAs for alignment of multiview range images. A novel Surface Interpenetration Measure is introduced that allows one to evaluate registration results more precisely, especially in the case of small registration errors. A predefined residual threshold is used to handle partial overlap; however, no method is given to adjust the threshold automatically. Since the fitness function depends implicitly on the range scanner measurement error and the interpoint distance (quantisation error), it might be necessary to adjust the threshold manually for other datasets. Hill-climbing techniques are used to refine the GA result.

A recent study on the use of genetic algorithms for range data registration appeared in [5]. Euclidean parameter space optimisation is considered. To handle partially overlapping data, the median of the residuals is used as the error metric. This improves the robustness but renders the method inapplicable to data overlaps below 50%. An advanced operator of dynamic mutation is introduced, which reportedly reduces the registration error and helps avoid premature convergence. An attempt is made to improve the precision by using dynamic boundaries: after the GA has converged, the search space is reduced and the GA is applied again. However, using genetic algorithms for precise registration does not seem reasonable, since faster and more precise iterative methods exist for the registration of pre-aligned datasets.

2 Pre-registration of surfaces using a genetic algorithm

This section deals with genetic pre-alignment of two arbitrarily oriented datasets, which are partial surface measurements of the object whose model we wish to build. (See figure 1 for an illustration of such measurements.) The task is to quickly obtain a rough pre-alignment suitable for subsequent application of the robust Trimmed Iterative Closest Point algorithm [4] developed in our lab earlier. Our experience with TrICP shows that, depending on the data processed, the algorithm can cope with initial angular misalignments up to 20°; 5° is certainly sufficient. This means that the genetic pre-registration should provide an angular accuracy of 5° or better.

Fig. 1. The Frog dataset, GA alignment and final alignment.

Consider two partially overlapping 3D point sets, the data set P = {p_i}, i = 1, …, N_p, and the model set M = {m_i}, i = 1, …, N_m. Denote the overlap by ξ. Then the number of points in P that have a corresponding point in M is N_po = ⌊ξ·N_p⌋. The standard ICP [1] assumes that P is a subset of M. ICP iteratively moves P onto M while pairing each point of P with the closest point of M. The cost function of ICP is the mean square error (MSE), that is, the mean of all residuals (distances between paired points).

In contrast to ICP, our TrICP [4] only assumes a partial overlap of the two sets, which is more realistic. TrICP finds the Euclidean motion that brings an N_po-point subset of P into the best possible alignment with M. The algorithm uses a different cost function: at each iteration, the N_po points with the least residuals are selected, and the optimal motion is calculated for this subset so as to minimise the trimmed MSE

    e = (1/N_po) · Σ_{i=1}^{N_po} d²_{i:N_p},    (1)

where d²_{1:N_p} ≤ … ≤ d²_{N_p:N_p} are the sorted squared residuals. The subset of the N_po paired points is iteratively updated after each motion.

In practice, the overlap ξ is usually unknown. It can be set automatically [4] by running TrICP for different values of ξ and finding the minimum of the objective function

    Ψ(ξ, R, t) = e(ξ, R, t) / ξ²,    (2)

which minimises the trimmed MSE while trying to use as many points as possible.

When an object is scanned by a 3D scanner, P and M are often obtained in different coordinate systems. As a result, their orientations may be very different. TrICP provides an efficient and robust solution when the two sets are roughly pre-registered. This is typical for all iterative algorithms, for which the pre-alignment is usually done manually. Our genetic pre-registration procedure [22] complements TrICP, yielding a robust and completely automatic solution.

The genetic pre-registration algorithm minimises the same objective function Ψ(ξ, R, t) as TrICP, but this time as a function of all seven parameters, namely the overlap ξ, the three components of the translation vector t, and the three Euler angles of the rotation matrix R. The difference between the genetic solution and the overlap selection procedure [4] is essential: the former evaluates Ψ(ξ, R, t) for different values of ξ, R and t, while the latter runs TrICP for different values of ξ. Our genetic solution provides an elegant way to estimate the overlap and the optimal motion simultaneously, by treating all parameters in a uniform way. The solution [4] only works for pre-registered sets. If desired, it can be used to refine the overlap estimate obtained by the GA; we did not do this in our tests.
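The trimmed-MSE objective (1)–(2) can be sketched as follows. This is our own illustrative code, not the authors' implementation; in particular, the brute-force nearest-neighbour search below stands in for the accelerated search (e.g. a k-d tree) a real system would use:

```python
import numpy as np

def trimmed_objective(P, M, xi):
    """Evaluate the TrICP objective Psi = e / xi^2 for point sets P, M (N x 3)."""
    # Squared distance from each data point to its closest model point.
    # Brute force for clarity; a k-d tree would be used in practice.
    d2 = ((P[:, None, :] - M[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    n_po = int(xi * len(P))           # number of points assumed to overlap
    e = np.sort(d2)[:n_po].mean()     # trimmed MSE over the n_po best pairs
    return e / xi**2                  # favours large overlap with small error
```

Minimising this quantity over ξ (and, in the genetic version, over the motion parameters as well) balances the two competing goals stated above: a small trimmed error and a large usable overlap.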


To minimise the objective function Ψ(ξ, R, t), we applied a genetic algorithm tuned to the problem. The objective function was evaluated by mapping each integer parameter onto a real-valued range using normalisation. Simple one-point crossover was employed. Different population sizes were tested, and an optimal value was selected for the final experiments. Two mutation operators were introduced: shift mutation shifts one parameter randomly by a value not exceeding 10% of the parameter range, while replacement mutation replaces a parameter with a random value. The corresponding probabilities were also set after preliminary experimentation. Tournament selection was applied, as it is easy to implement and helps avoid premature convergence. An elitist genetic algorithm was employed, where one copy of the best individual was transferred without change from each generation to the next. The method is presented in detail in our paper [22]. The main steps of the genetic algorithm are as follows:

1. Generate the initial population.
2. Calculate the objective function values.
3. Apply genetic operators (crossover, mutation) to selected individuals. The next generation will contain the offspring and, additionally, the best-fit individual from the current generation.
4. Calculate the objective function values for the new population.
5. If the best fitness has not changed for Ng generations, stop and designate the best-fit individual as the solution; otherwise go to step 3.
6. If the solution obtained at step 5 is not of acceptable precision, restart from step 1.

We have tested the genetic pre-alignment and the combined method (GA followed by TrICP) on different data. Three examples, the Frog, the Angel and the Bird data, are shown in figures 1, 2 and 3, respectively. To test the method under arbitrary initial orientations, set P was randomly rotated prior to alignment in each of the 100 tests. The results of all tests were visually checked; no erroneous registration was observed.
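The elitist GA with shift and replacement mutation, one-point crossover and tournament selection can be sketched as follows. This is our own illustration, not the authors' code; the parameter bounds, population size and operator probabilities are made-up placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
# Seven parameters: overlap xi, translation (tx, ty, tz), Euler angles (a, b, c).
# These bounds are illustrative placeholders, not the values used in the chapter.
LO = np.array([0.3, -1.0, -1.0, -1.0, -np.pi, -np.pi, -np.pi])
HI = np.array([1.0,  1.0,  1.0,  1.0,  np.pi,  np.pi,  np.pi])

def mutate(ind, p_shift=0.2, p_repl=0.05):
    """Shift mutation (at most 10% of the range) and replacement mutation."""
    ind = ind.copy()
    for i in range(len(ind)):
        if rng.random() < p_shift:
            ind[i] += rng.uniform(-0.1, 0.1) * (HI[i] - LO[i])
        if rng.random() < p_repl:
            ind[i] = rng.uniform(LO[i], HI[i])
    return np.clip(ind, LO, HI)

def crossover(a, b):
    """Simple one-point crossover."""
    cut = rng.integers(1, len(a))
    return np.concatenate([a[:cut], b[cut:]])

def tournament(pop, fit, k=2):
    """Tournament selection: best of k randomly picked individuals."""
    idx = rng.integers(len(pop), size=k)
    return pop[min(idx, key=lambda i: fit[i])]

def evolve(objective, pop_size=50, n_stall=20):
    """Elitist GA: the best individual survives unchanged each generation."""
    pop = [rng.uniform(LO, HI) for _ in range(pop_size)]
    best, best_f, stall = None, np.inf, 0
    while stall < n_stall:
        fit = [objective(ind) for ind in pop]
        elite = pop[int(np.argmin(fit))]
        if min(fit) < best_f:
            best, best_f, stall = elite, min(fit), 0
        else:
            stall += 1
        pop = [mutate(crossover(tournament(pop, fit), tournament(pop, fit)))
               for _ in range(pop_size - 1)] + [elite]
    return best, best_f
```

In the registration setting, `objective` would be the Ψ(ξ, R, t) evaluation over the two point sets; here any seven-parameter cost function can be plugged in.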
Typical results of alignment are displayed in figures 1, 2 and 3. In each figure, the first two pictures show the two datasets to be registered. The datasets result from two separate measurements of the same object obtained from different angles. The third picture of each figure (GA) displays the result of our genetic pre-registration algorithm; here, the two datasets are shown in different colours. One can see that the datasets are roughly registered, but the registration quality is not high: the surfaces are displaced, and they occlude each other in large continuous areas instead of 'interweaving'. Finally, the rightmost picture is the result of the fine registration obtained by TrICP using the result of the genetic pre-registration. Here, the surfaces match much better, and they are interwoven, which is an indication of the good quality of the final registration.

We have analysed the precision and the computational performance of the genetic alignment algorithm. Some of the numerical experimental results obtained in the 100 tests are presented in tables 1 and 2. (For more numerical

Fig. 2. The Angel dataset, GA alignment and final alignment.

Fig. 3. The Bird dataset, GA alignment and final alignment.

results, see [22].) To assess the precision, reference values were obtained by the TrICP algorithm [4] applied to the result of genetic registration. (Since TrICP is much more precise than GA, its output can be used as ground truth.) Table 1 shows the angular error of genetic alignment. The error is defined as follows. Rotation by any matrix R can be viewed as rotation by an angle α around some axis. This angle can be calculated as

    α = arccos[(R_11 + R_22 + R_33 − 1) / 2],    (3)

where R_ii are the diagonal elements of R. To compute the angular error, we obtain the matrix that rotates the result of the GA to the result of TrICP. The angular registration error shown in table 1 is the angle α calculated by (3) for this rotation matrix.

Table 1. GA angular registration errors α (°)

Dataset   Mean   Max    Min    Median   Mean deviation
Frog      0.89   1.72   0.23   0.87     0.22
Angel     1.17   2.11   0.28   1.15     0.38
Bird      2.15   3.34   0.41   2.24     0.50
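Equation (3) and the GA-versus-TrICP angular error are straightforward to compute. A small sketch of our own (the clamp guards against rounding pushing the cosine slightly outside [−1, 1]):

```python
import numpy as np

def rotation_angle(R):
    """Angle (degrees) of the rotation encoded by a 3x3 matrix R, via eq. (3)."""
    c = (np.trace(R) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

def angular_error(R_ga, R_ref):
    """Angle of the relative rotation taking the GA result onto the reference."""
    return rotation_angle(R_ref @ R_ga.T)
```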


Finally, table 2 summarises the performance of genetic alignment in terms of the execution time, the number of elapsed generations, and the number of evaluations of the objective function. The reported experiments were performed on a 2 GHz P4 PC running Linux.

Table 2. GA computational performance

Dataset   Exec. time (s)    Generations      Evaluations
          Mean    Max       Mean    Max      Mean       Max
Frog      58      202       203     654      101,291    327,004
Angel     29      62        122     252      61,151     126,002
Bird      53      120       176     377      88,046     188,502

3 Fusion of surface and image data

In this section, we address the problem of combining geometric and textural information of the object. As already mentioned, the two sources are independent in our system: the 3D geometric model is obtained by a 3D scanner, then covered by high-quality optical images. After a brief survey of relevant previous work, we discuss our photo-consistency based registration method with genetic algorithm based optimisation. Then we deal with the task of blending multiple texture mappings and present a novel method which combines the techniques of surface flattening and texture merging. Finally, results for real data are shown.

3.1 Registering images to a surface model

All registration methods discussed in section 1.1 register unimodal data structured in the same way (measured point sets, range datasets). Fewer methods are available for the fusion of essentially different types of data, such as the registration of images to surfaces. (Note that this differs from multimodal image registration, where the data structure is the same.)

Several 2D-to-3D (image-to-surface) registration methods have been proposed in computer vision and its medical applications. Most of them are based on feature correspondence: feature points are extracted both on the 3D surface and in the images, and correspondences are searched for. (See, for example, [11, 12, 21].) However, the features are often difficult to localise precisely in 3D models. In addition, defining similarity between 2D and 3D features is not easy.

Intensity-based registration is another approach to the problem. The algorithm of Clarkson et al. [10] applies photo-consistency to find the precise registration of 2D optical images of a human face to a 3D surface model. They use calibrated images, thus the problem is reduced to estimating the pose of

Fig. 4. The Bear Dataset (images, 3D model, textured model) and result of registration of images to surface.

the cameras. None of the methods [10, 11, 12, 21] uses genetic algorithms, since the optimisation problems they consider are easier and are solved faster by conventional non-linear iterative strategies. We do not use a calibrated camera, so the number of parameters is much higher. The size of the parameter space and the behaviour of the cost function motivated the use of genetic algorithm based optimisation.

Based on our paper [16], let us discuss the registration of images to a 3D model. The input data consists of two colour images, I_1 and I_2, and a 3D surface model. They represent the same object. (See figure 4 for an example.) The images are acquired under fixed lighting conditions and with the same camera sensitivity. (The latter can easily be achieved if the images are taken by the same camera.) All other camera parameters may differ and are unknown. The raw data is acquired by a hand-held 3D scanner, then processed by the efficient and robust triangulator [18] developed in our lab. The 3D model obtained consists of a triangulated 3D point set (mesh) P with normal vectors assigned.

The finite projective camera model is used to project the object surface to the image plane: u ≃ P X, where u is an image point, P the 3 × 4 projection matrix and X a surface point. (≃ means that the projection is defined up to an unknown scale.) The task of registration is to determine the precise projection matrices, P_1 and P_2, for both images. Since the projection matrix is defined only up to a scale factor, it has 11 degrees of freedom in spite of having 12 elements. The collection of the 11 unknown parameters is denoted by p, which represents the projection matrix P as an 11-dimensional parameter vector.

Values of p_1 and p_2 are sought such that the images are consistent in the sense that corresponding points – different projections of the same 3D point – have the same colour value. Note that this precise mathematical definition is valid only when the surface is Lambertian, that is, the incoming light is reflected equally in every direction on the surface; this is usually true for diffuse surfaces. The formal definition is the following: we say that images I_1 and I_2 are consistent by P_1 and P_2 (or p_1 and p_2) if for each X ∈ P: u_1 = P_1 X, u_2 = P_2 X and I_1(u_1) = I_2(u_2). (Here I_i(u_i) is the colour value at point u_i of image I_i.) This type of consistency is called photo-consistency [10, 20]. Photo-consistency holds for accurate estimates of p_1 and p_2; inversely, misregistered projection matrices make the images much less photo-consistent. The cost function, measuring photo-inconsistency, is the following:

    C_φ(p_1, p_2) = (1/|P|) · Σ_{X∈P} ||I_1(P_1 X) − I_2(P_2 X)||²    (4)

Here φ stands for photo-inconsistency, while |P| is the number of points in P. The difference of the colour values ||I_1 − I_2|| can be defined in a number of different colour models. (For details see [16].) Finding the minimum of the cost function (4) over p_1 and p_2 yields estimates for the projection matrices. The cost function (4) is robustified against occlusion and wrong measurements: occluded points are eliminated using the surface normals, and outliers are removed by rejecting a certain amount of the smallest and largest squares (the α-trimmed mean technique).

In spite of the simplicity of the cost function C_φ(p_1, p_2), finding its minimum is a difficult task. Due to the 22-dimensional parameter space and the unpredictable shape of C_φ(p_1, p_2), standard local nonlinear minimisation techniques failed to provide reliable results. We tested a number of widely known optimisation methods: Newton-like methods, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) variable metric method and the Levenberg-Marquardt algorithm. Experiments showed that local search techniques terminate every time in local minima quite far from the expected global optimum. A global nonlinear optimisation technique was also tested; however, the stochastic optimisation method by Csendes [7] did not yield acceptable results either. The randomness of a stochastic method is excessive, and it does not retain nearly-good solutions. In contrast, genetic algorithms preserve the most promising results and try to improve them. (Running a GA without elitism yields unstable and imprecise results, similarly to the stochastic optimisation.)

The methods mentioned above and other popular techniques, such as simulated annealing and tabu search, process a single solution. In addition to performing a local search, simulated annealing and tabu search have specific built-in mechanisms to escape local optima. In contrast, genetic algorithms work on a population of potential solutions, which compete for survival. The competition is what makes GAs essentially different from single-solution processing methods [24]. Consequently, we decided to apply a genetic algorithm as a well-established global search strategy.

We pre-register the images and the 3D model manually. This yields a good initial state for the search, which narrows the search domain and accelerates the method. Manual pre-registration is reasonable since this operation is simple and fast compared to the 3D scanning, which is also done manually. The photo-consistency based registration then makes the result more accurate.

The genetic algorithm starts by creating the initial population. The individuals of the population are chosen from the neighbourhood of the parameter vector obtained by the manual pre-registration: the values of the genes are drawn from intervals defined by the pre-registered values plus a margin of ±ε. In our experiments ε was set to values between 1% and 3%, depending on the meaning and the importance of the corresponding parameter. The individual that encodes the pre-registered parameter vector is also inserted in the initial population to avoid losing it. To avoid premature convergence we decided to run the algorithm three times: the algorithm restarts from the beginning, preserving only the best individual from the previous run and re-initialising the whole population. An iteration is finished if a certain number of generations has been created, or if the best of the population has changed 10 times.

Our genetic algorithm for image-to-surface registration works as follows:

    BEST ← manual pre-registration
    For three times
        Generate initial population around BEST
        BEST ← best of the population
        Loop
            Calculate cost function values
            Apply genetic operators, create new population
            BEST ← best of the population
        Until (Ng generations created) or (BEST changed 10 times)
    End for

We have tested the method with a number of different genetic settings.
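The cost-evaluation step amounts to evaluating the photo-consistency cost (4). A minimal sketch of our own (not the authors' implementation): images are sampled by rounding to the nearest pixel, the α-trimming is reduced to dropping the k smallest and k largest squared differences, and the occlusion test via surface normals is omitted:

```python
import numpy as np

def project(P, X):
    """Project 3D point X with a 3x4 camera matrix P (finite projective model)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]                  # resolve the unknown scale

def photo_cost(P1, P2, points, image1, image2, trim=0.1):
    """Alpha-trimmed photo-inconsistency (eq. 4) of two images over a point set."""
    sq = []
    for X in points:
        u1 = np.round(project(P1, X)).astype(int)
        u2 = np.round(project(P2, X)).astype(int)
        diff = image1[u1[1], u1[0]] - image2[u2[1], u2[0]]
        sq.append(float(np.sum(np.asarray(diff) ** 2)))
    sq.sort()
    k = int(trim * len(sq))              # drop the k smallest and k largest squares
    keep = sq[k:len(sq) - k] or sq
    return sum(keep) / len(keep)
```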
With proper settings, the projection error of registration can be decreased from 18–20 pixels (the average error of the manual pre-registration) to 5–6 pixels. After preliminary testing with semi-synthetic data, the following genetic setting was selected: a steady-state algorithm with tournament selection, swap mutation and arithmetic crossover, with 250 individuals in the population, a mutation probability of 0.1 and a crossover probability of 0.7. The typical running time with a 3D model containing 1000 points was 5–6 minutes on a 2.40 GHz PC with 1 GB memory.

We applied the method to different real data. One of them, the Bear Dataset, is shown in figure 4. The precision of the registration can be best judged at the mouth, the eyes, the hand and the feet of the Bear. Figure 5 visualises the difference between the manual pre-registration and the photo-

Fig. 5. Difference between manual pre-registration and genetic registration.

consistency based registration. The areas of the mouth, the eyes and the ears show the improvement in quality.

To quantitatively assess the method, we created semi-synthetic ground-truthed data by covering the triangular mesh of a dataset with a synthetic texture. (In this case, the Shell mesh shown in figure 14 was used.) Two views of the object produced by a visualisation program provide the two input images, for which the projection matrices are completely known. The projection error is measured as follows: the 3D point set P is projected onto the image planes by both the ground-truth and the estimated projection matrices, then the average distance between the corresponding image points is calculated. Evaluating the result of the manual pre-registration by this metric, we obtained the above-mentioned average error of 18–20 pixels.

The results of the genetic method are shown in table 3. The algorithm was tested 10 times with 500 individuals in the population and 10 times with 1000. The average error of 5–6 pixels is acceptable for images of 512 × 512 pixels. The typical running time with a 3D model containing 1000 points was 15 minutes for 500 individuals and 40 minutes for 1000. (Note that a laser scanner usually provides many more points, but for image-to-surface registration 1000 points are sufficient.) The test was performed on a 2.40 GHz PC with 1 GB memory. More numerical results related to various parameters of the genetic algorithm are presented in [17].

Table 3. Projection error for 10 runs (pixels)

500 individuals    1000 individuals
3.36               3.44
8.77               6.42
5.42               7.79
7.35               4.02
5.97               5.61
12.2               5.38
5.15               6.64
7.52               3.12
5.16               5.35
4.51               8.55
average: 6.54      average: 5.63

3.2 Merging multiple textures

When the image-to-surface registration problem is solved, we still face the problem of seamless merging (blending) of multiple textures, that is, images of a surface patch appearing in different views. There are several ways to paste texture onto the surface of an object. Graphics systems usually require two pieces of information: a texture map and the texture coordinates. The texture map is the image we paste, while the texture coordinates specify where it is mapped to. Texture maps are usually two-dimensional, although in recent years the application of 3D textures has also become common. Currently, we only deal with 2D textures.

The geometric relation between the 3D surface and the 2D texture map can be described by a transformation: the texture mapping function M assigns a 2D texture point u to a 3D model point X as u = M(X). The relation between the surface and the texture map is illustrated in figure 6.


Fig. 6. Relation between 3D model and texture map.

It is straightforward to choose a photo of an object to be a texture map. An optical image of an object contains the colour information we need to paste onto the surface. Projection of a 3D surface point X can be described in matrix form: ṽ ≃ P X̃, where P is the 3 × 4 projection matrix and ˜ denotes homogeneous coordinates [14]. In this way the texture mapping function is a simple projective transformation.


The difficulty of image-based texturing originates from the problem of occlusion, which yields uncovered areas on the surface. An optical image shows the object from only a single view. Therefore, it contains textural information only about the visible parts of the surface; the occluded areas are hidden from that view. (See figure 7b.) This insufficiency can be reduced – in optimal cases eliminated – by combining several images.

Fig. 7. (a) Input images; (b) partially textured models. Textures cover only parts of the model.

Figure 7a shows two images of the globe. These images can be considered as texture maps. Visualising multiple textures on the same surface is not trivial. Visualisation toolkits usually apply interpolation techniques to colour the surface continuously. However, interpolation cannot be applied between different texture maps, hence handling multiple textures yields a gap appearing between the borders. The result is the same when textures are merged by appending the second image to the first one and modifying the second mapping function with a translation.

We designed a flattening-based method to create a texture map based on optical images. Flattening the surface mesh of an object provides an alternative two-dimensional parametrisation. The advantage is that this parametrisation preserves the topology of the three-dimensional mesh: a texture that entirely covers the flattened 2D surface also covers the original 3D surface. Figure 8 illustrates the relation between the 3D surface, the optical image and the flattened surface. Transforming optical images to flattened surfaces provides partial texture maps. (See figure 9.) But since flattening preserves the structure of the 3D mesh, these texture maps can be merged, in contrast to the original optical images.

Flattening algorithms published in the literature use different principles and require different techniques. (For a recent survey, see [13].) Many methods optimise various energy functions – based on physical models – to minimise the geometric distortion in some sense; the price is that the optimisation is highly non-linear. Other methods ignore some geometric attributes in order to have a simpler optimisation problem. The so-called convex combination method is frequently used by most of these techniques, including the algorithm of Kós and Várady [19] applied in our system.
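As an illustration of the convex combination idea, here is a generic Tutte-style sketch of our own (not the Kós–Várady algorithm itself): boundary vertices are pinned to a circle, and each interior vertex is placed at the average of its neighbours, which reduces flattening to a linear system:

```python
import numpy as np

def convex_combination_flatten(n_vertices, edges, boundary):
    """Tutte-style convex combination flattening (a simplified sketch).

    edges: list of (i, j) vertex pairs; boundary: ordered boundary vertices.
    Returns an (n_vertices, 2) array of 2D parametrisation coordinates.
    """
    nbrs = [[] for _ in range(n_vertices)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # Pin the boundary to the unit circle, in order.
    uv = np.zeros((n_vertices, 2))
    fixed = set(boundary)
    for k, b in enumerate(boundary):
        t = 2 * np.pi * k / len(boundary)
        uv[b] = (np.cos(t), np.sin(t))
    # Linear system: each interior vertex equals the mean of its neighbours.
    interior = [i for i in range(n_vertices) if i not in fixed]
    idx = {v: k for k, v in enumerate(interior)}
    A = np.zeros((len(interior), len(interior)))
    rhs = np.zeros((len(interior), 2))
    for v in interior:
        A[idx[v], idx[v]] = len(nbrs[v])
        for w in nbrs[v]:
            if w in fixed:
                rhs[idx[v]] += uv[w]
            else:
                A[idx[v], idx[w]] -= 1.0
    if interior:
        uv[interior] = np.linalg.solve(A, rhs)
    return uv
```

With uniform (averaging) weights the embedding is valid for a disk-like mesh with a convex boundary; production methods replace the uniform weights with geometry-aware ones to reduce distortion.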


Fig. 8. Relation between the 3D model, the optical image and the flattened surface. T is a triangle, F the flattening, P the projective mapping, A the affine mapping; Ti and Tf are the triangles in the image and on the flattened surface.

Fig. 9. Partial and merged texture maps.

Usually, complex meshes cannot be flattened at once; they need to be cut before flattening. We have chosen to cut by a plane, since a cutting plane is easy to specify manually: three points selected on the surface define it. The 3D mesh is cut in half by this plane, then the halves are flattened and textured separately. The problem of re-merging the textured halves is discussed later. Figure 10 shows an example of using the algorithm in our experiments.
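The cut-by-plane step can be sketched as follows. The helper below is hypothetical (in our system the three points are picked interactively); it derives the plane normal from the three selected points and assigns each triangle to a half by the signed distances of its vertices, keeping crossing triangles in both halves so that the overlap needed later for alpha blending exists.

```python
import numpy as np

def split_mesh_by_plane(vertices, triangles, p0, p1, p2):
    """Split a triangle mesh into two halves by the plane through the
    three user-selected surface points p0, p1, p2. Triangles crossing
    the plane are kept in both halves (the overlap region)."""
    p0 = np.asarray(p0, float)
    n = np.cross(np.asarray(p1, float) - p0, np.asarray(p2, float) - p0)
    n = n / np.linalg.norm(n)
    d = (np.asarray(vertices, float) - p0) @ n   # signed distance of each vertex
    half_a, half_b = [], []
    for tri in triangles:
        s = d[list(tri)]
        if (s >= 0).any():
            half_a.append(tri)
        if (s <= 0).any():
            half_b.append(tri)                   # crossing triangles land in both
    return half_a, half_b
```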

Fig. 10. Mesh of Frog (a) and its parametrisation (b).


After flattening the 3D surface, we need to convert the optical images to flattened texture maps. In contrast to the projection matrix, this mapping is complicated: neither the flattening transformation nor the relation between the optical image and the texture map can be represented by a matrix. We use the mesh representation of the 3D surface for the conversion. Given a triangle of the mesh, the vertices of the corresponding triangles are known both in the optical image and on the flattened surface. Let us denote these triangles by Ti in the optical image and by Tf on the flattened surface (figure 8). One can easily determine the affine transformation between Ti and Tf, which gives the correspondence between the points of the two triangles. Note that the affine transformation is unique for each triangle pair. The conversion algorithm is the following:

    For each triangle T of the 3D mesh
      If T is completely visible
        Projection: Ti ← P · T
        Flattening: Tf ← FLAT(T)
        Affine transformation: A ← AFFINE(Tf, Ti)
        For each point uf ∈ Tf:
          Colour_f(uf) ← Colour_i(A · uf)
        End for
      End if
    End for

Conversion of the optical images provides partial texture maps. To obtain the entire textured surface, one needs to merge these maps. Since flattening preserves the topology of the mesh, the texture coordinates of the partial texture maps are consistent. The only problem is that of the overlapping areas, where the texture maps must be smoothly interpolated.

We have tested three methods for handling the overlapping areas. The essence of the first method is to decide for each triangle which view it is best visible from. The visibility of a 3D point can be measured by the scalar product of the surface normal and the unit vector pointing towards the camera; the visibility of a triangle is the average visibility of its three vertices. For each triangle, the texture is taken from the texture map that provides the greatest visibility.
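The AFFINE step of the conversion above can be sketched directly: matching the three vertex pairs of Tf and Ti fixes a unique 2×3 affine matrix, which is then used to look up the colour of each flattened-surface point in the optical image. The function names and the nearest-neighbour lookup are illustrative choices, not the exact implementation.

```python
import numpy as np

def triangle_affine(tf, ti):
    """Affine map A taking the flattened triangle tf onto the image
    triangle ti: for each matched vertex pair, A @ [x, y, 1] equals the
    corresponding image point. Returns the 2x3 matrix A."""
    src = np.hstack([np.asarray(tf, float), np.ones((3, 1))])  # 3x3 [x y 1] rows
    dst = np.asarray(ti, float)                                # 3x2 image points
    return np.linalg.solve(src, dst).T                         # 2x3

def sample_texture(image, tf, ti, uf):
    """Colour of a flattened-surface point uf, fetched from the optical
    image via the triangle's affine map (nearest-neighbour sampling;
    image is indexed as image[row, column])."""
    A = triangle_affine(tf, ti)
    ui = A @ np.array([uf[0], uf[1], 1.0])
    return image[int(round(ui[1])), int(round(ui[0]))]
```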
The main problem of this method is the seams appearing at the borders of the texture maps. (See figure 11.) The second method eliminates this drawback by blending the views: it collects all the views a given triangle is entirely visible from, and all of them are used to set the colours of the triangle points. The visibility measure serves as a weight for each view – the better a point is visible from a view, the greater the weight of that view. Note that visibility is calculated for each point separately; the 3D surface point and its normal vector are estimated from the 3D coordinates and the normal vectors of the triangle vertices, respectively.
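The per-point visibility weighting of the second method can be sketched as follows; the clamping of back-facing points to zero weight is an assumption added for robustness.

```python
import numpy as np

def visibility(point, normal, camera):
    """Visibility of a surface point from a camera: the scalar product of
    the unit normal and the unit vector pointing towards the camera,
    clamped at zero for back-facing points."""
    v = np.asarray(camera, float) - np.asarray(point, float)
    v /= np.linalg.norm(v)
    n = np.asarray(normal, float)
    n = n / np.linalg.norm(n)
    return max(0.0, float(n @ v))

def blend_views(point, normal, colours, cameras):
    """Weighted average of the colours a point receives from the views:
    better-visible views get larger weights."""
    w = np.array([visibility(point, normal, c) for c in cameras])
    if w.sum() == 0.0:
        return None                          # point not visible from any view
    return (w[:, None] * np.asarray(colours, float)).sum(axis=0) / w.sum()
```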


The third method applies the multiresolution spline technique [3] for blending the images. It uses the image pyramid data structure to merge the images at different resolution levels, eliminating the seams between the borders of the texture maps. The same weighting function was applied as in the second method. With these blending methods the borders between neighbouring texture maps become smooth, as one can see in figure 11. The difference between the results of the second and third methods is insignificant.
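The multiresolution spline idea can be sketched in a few lines: the two images are decomposed into Laplacian bands, each band is mixed under a progressively blurred copy of the mask, and the result is reconstructed. Box filters and nearest-neighbour upsampling stand in here for the Gaussian kernels of Burt and Adelson [3], and image sides are assumed divisible by 2^levels; this is a didactic sketch, not the production blend.

```python
import numpy as np

def down(img):
    """Box-filter 2x2 blocks, then subsample (a crude Gaussian stand-in)."""
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def up(img):
    """Nearest-neighbour upsampling by a factor of two."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def blend_pyramid(a, b, mask, levels=3):
    """Multiresolution-spline blend of images a and b under mask
    (1 selects a): each Laplacian band is mixed with the mask at that
    level, so seams are smoothed at every scale."""
    bands = []
    for _ in range(levels):
        ga, gb, gm = down(a), down(b), down(mask)
        la, lb = a - up(ga), b - up(gb)          # Laplacian bands
        bands.append(mask * la + (1.0 - mask) * lb)
        a, b, mask = ga, gb, gm
    out = mask * a + (1.0 - mask) * b            # blend the coarsest level
    for band in reversed(bands):                 # reconstruct fine to coarse
        out = up(out) + band
    return out
```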

Fig. 11. Differences between the three merging methods (method 1, method 2 and method 3).

We mentioned that complex meshes need to be cut into pieces before flattening. The pieces are textured separately; however, re-merging them yields seams along the borders. These artefacts can be eliminated by the technique called alpha blending, as follows. Colours are usually represented by triples: RGB, CMY, etc. A fourth channel, termed the alpha channel, can be used to represent the transparency of a pixel. It takes a value between 0 and 1: 0 means that the pixel is fully transparent, while 1 means that it is fully opaque. Varying the alpha values of pixels close to the borders makes the discontinuities disappear. Owing to the blending methods discussed above, using only the values 0 and 1 is completely sufficient.

Cutting the mesh should be performed so as to leave overlapping areas, that is, points and triangles close to the borders should belong to both halves. These overlapping areas are covered by the same texture on both halves. The alpha values of the pixels in the overlapping areas are set to 0 in one half and to 1 in the other. This technique guarantees the continuity of the texture in the re-merged model, as illustrated in figure 12.
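In compositing terms, the re-merging step amounts to standard alpha blending of one half's RGBA texture over the other's; with alpha restricted to 0 and 1, each overlap pixel is simply taken from exactly one half. A minimal sketch (array layouts are illustrative assumptions):

```python
import numpy as np

def composite(front_rgba, back_rgb):
    """Alpha-composite one textured half (H x W x 4, RGBA) over the other
    (H x W x 3, RGB). With alpha in {0, 1}, each pixel comes from exactly
    one half, so no seam appears in the overlap."""
    alpha = front_rgba[..., 3:4]                  # keep a channel axis
    return alpha * front_rgba[..., :3] + (1.0 - alpha) * back_rgb
```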

4 Tests

Our photorealistic modelling system has been tested both on synthetic and real data. The synthetic data provides the ground truth necessary for assessing the performance of the system in terms of precision and computational efficiency. In this section, we give further examples of processing real measured data and creating high-quality models using the algorithms described above.

Fig. 12. Building photorealistic model of Bear: images, textureless 3D model, texture maps, views of the textured 3D model.

The already mentioned Bear dataset, as well as the Frog, Shell and Cat datasets, were acquired by a 3D laser scanner and a high-resolution digital camera. In each case, the complete textureless 3D model (a triangular mesh) was obtained by the surface registration algorithm presented in section 2 and the triangulator [18]. Then, some 5–6 images of each object were taken. The images were registered to the 3D model and blended by the methods presented in section 3. For the blending, the 3D models were interactively cut in half; the halves were handled separately and merged only at the end of the process. The results can be seen in figures 12, 13, 14 and 15. To better demonstrate the quality of the model, the Cat model is also shown enlarged in figures 16 and 17.

5 Conclusion

We have presented a software system for building photorealistic 3D models. It operates on an accurate 3D model measured by a laser scanner and high-quality images of the object acquired separately by a digital camera. The complete 3D model is obtained from partial surface measurements using a genetic pre-registration algorithm followed by a precise iterative registration procedure. The images are registered to the 3D model by minimising a photo-consistency based cost function using a genetic algorithm. Since the textures extracted from the images cover only parts of the 3D model, they must be merged into a complete texture map; a novel method based on surface flattening is used to combine the partial texture mappings. Test results with real data demonstrate the efficiency of the proposed methods: a high-quality model of a relatively small object can be obtained within two hours, including 3D scanning and photography. Our next step will be adding surface roughness by measuring bump maps with a photometric stereo technique.

Acknowledgement

This work is supported by the EU Network of Excellence MUSCLE (FP6-507752).

References

1. P.J. Besl and N.D. McKay. A method for registration of 3-D shapes. IEEE Trans. Pattern Analysis and Machine Intelligence, 14:239–256, 1992.
2. K. Brunnström and A.J. Stoddart. Genetic algorithms for free-form surface matching. In Proc. International Conference on Pattern Recognition, volume 4, pages 689–693. IEEE Comp. Soc., 1996.
3. P.J. Burt and E.H. Adelson. A multiresolution spline with application to image mosaics. ACM Trans. Graph., 2(4):217–236, 1983.
4. D. Chetverikov, D. Stepanov, and P. Krsek. Robust Euclidean alignment of 3D point sets: the Trimmed Iterative Closest Point algorithm. Image and Vision Computing, 23:299–309, 2005.
5. C.K. Chow, H.T. Tsui, and T. Lee. Surface registration using a dynamic genetic algorithm. Pattern Recognition, 37:105–117, 2004.
6. O. Cordon, S. Damas, and J. Santamaria. A CHC evolutionary algorithm for 3D image registration. In LNAI, volume 2715, pages 404–411. Springer, 2003.
7. T. Csendes. Nonlinear parameter estimation by global optimization – efficiency and reliability. Acta Cybernetica, 8:361–370, 1988.
8. F. Bernardini et al. Building a digital model of Michelangelo's Florentine Pietà. IEEE Comp. Graphics & Applications, 22(1):59–67, 2002.
9. K. Ikeuchi et al. The Great Buddha project: modeling cultural heritage for VR systems through observation. In Proc. IEEE ISMAR03, 2003.
10. M.J. Clarkson et al. Using photo-consistency to register 2D optical images of the human face to a 3D surface model. IEEE Tr. on PAMI, 23:1266–1280, 2001.
11. P. David et al. SoftPOSIT: simultaneous pose and correspondence determination. In Proc. 7th European Conf. on Computer Vision, pages 698–714, 2002.
12. R.B. Haralick et al. Pose estimation from corresponding point data. IEEE Tr. on SMC, 19:1426–1445, 1989.
13. M.S. Floater and K. Hormann. Surface parameterization: a tutorial and survey. In N.A. Dodgson, M.S. Floater, and M.A. Sabin, editors, Advances in Multiresolution for Geometric Modelling, Mathematics and Visualization, pages 157–186. Springer-Verlag, Berlin, Heidelberg, 2005.
14. R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge Univ. Press, 2000.
15. J.J. Jacq and C. Roux. Registration of 3-D images by genetic optimization. Pattern Recognition Letters, 16:823–841, 1995.
16. Z. Jankó and D. Chetverikov. Photo-consistency based registration of an uncalibrated image pair to a 3D surface model using genetic algorithm. In Proc. 2nd Int. Symp. on 3D Data Processing, Visualization & Transmission, pages 616–622, 2004.
17. Z. Jankó, D. Chetverikov, and A. Ekárt. Using a genetic algorithm to register an uncalibrated image pair to a 3D surface model. Int. Sci. Journal of Engineering Applications of Artificial Intelligence, 2005. Accepted for publication.
18. G. Kós. An algorithm to triangulate surfaces in 3D using unorganised point clouds. Computing Suppl., 14:219–232, 2001.
19. G. Kós and T. Várady. Parameterizing complex triangular meshes. In Proc. 5th International Conf. on Curves and Surfaces, pages 265–274, 2003.
20. K.N. Kutulakos and S.M. Seitz. A theory of shape by space carving. International Journal of Computer Vision, 38(3):199–218, 2000.
21. M.E. Leventon, W.M. Wells III, and W.E.L. Grimson. Multiple view 2D-3D mutual information registration. In Proc. Image Understanding Workshop, 1997.
22. E. Lomonosov, D. Chetverikov, and A. Ekárt. Pre-registration of arbitrarily oriented 3D surfaces using a genetic algorithm. Pattern Recognition Letters, Special Issue on Evolutionary Computer Vision and Image Understanding, 2005. Accepted for publication.
23. M. Levoy et al. The digital Michelangelo project. ACM Computer Graphics Proceedings, pages 131–144, 2000.
24. Z. Michalewicz and D.B. Fogel. How to Solve It: Modern Heuristics. Springer, 2000.
25. G. Renner and A. Ekárt. Genetic algorithms in computer aided design. Computer-Aided Design, pages 709–726, 2003.
26. C. Robertson and R. Fisher. Parallel evolutionary registration of range data. Computer Vision and Image Understanding, 87:39–55, 2002.
27. M. Salomon, G.R. Perrin, and F. Heitz. Differential evolution for medical image registration. In International Conference on Artificial Intelligence, pages 201–207, 2001.
28. L. Silva, O.R.P. Bellon, and K.L. Boyer. Enhanced, robust genetic algorithms for multiview range image registration. In Fourth International Conference on 3-D Digital Imaging and Modeling, pages 268–275, 2003.
29. S.M. Yamany, M.N. Ahmed, and A.A. Farag. A new genetic-based technique for matching 3D curves and surfaces. Pattern Recognition, 32:1817–1820, 1999.
30. Y. Yemez and F. Schmitt. 3D reconstruction of real objects with high resolution shape and texture. Image and Vision Computing, 22:1137–1153, 2004.

Fig. 13. Building photorealistic model of Frog: images, textureless 3D model, texture maps, views of the textured 3D model.

Fig. 14. Building photorealistic model of Shell: images, textureless 3D model, texture maps, views of the textured 3D model.

Fig. 15. Building photorealistic model of Cat: images, textureless 3D model, texture maps, views of the textured 3D model.

Fig. 16. Enlarged view of the Cat model.

Fig. 17. Another enlarged view of the Cat model.

