CVPR 2001

Multiview Texture Models

Alexey Zalesny1 and Luc Van Gool1,2
1 Swiss Federal Institute of Technology Zurich, {zalesny, vangool}@vision.ee.ethz.ch, http://www.vision.ee.ethz.ch/~zales
2 Catholic University of Leuven, Belgium, [email protected]

Abstract

Mapping textured images onto smoothly approximated surfaces is often used to conceal the loss of their real, fine-grained relief. A limitation of mapping a fixed texture in such cases is that it will only be correct for one viewing and one illumination direction. The presence of geometric surface details causes changes that simple foreshortening and global color scaling cannot model well. Hence, one would like to synthesize different textures for different viewing conditions. A texture model is presented that takes account of viewpoint dependent changes in texture appearance. It is highly compact and avoids copy-and-paste like repetitions. The model is learned from example images taken from different viewpoints. It supports texture synthesis for previously unseen conditions.

1. Introduction

In recent years important progress has been made in the analysis and subsequent synthesis of textures (e.g. [4], [5], [9], [10], [15], [17], [18]). It is now possible to synthesize textures learned from an example image, taken from a specific viewing angle and under specific illumination. Texture mapping can conceal the lack of fine surface geometry quite well. Nevertheless, using the same texture for different viewing conditions has its limitations. The simple foreshortening and smooth surface shading of traditional texture mapping cannot mimic such 3D effects as variable self-occlusions and self-shadowing. Rather than resorting to explicit geometry, which would be expensive, an alternative route is to create texture models that take such effects into account.

Important steps in that direction have already been taken. First of all, the kind of changes to be expected have been recorded systematically, and a range of good example images has been made available through the CUReT database [2]. These data are now beginning to be analyzed in detail. For instance, the dependence of correlation length on viewpoint has been studied [1], the bidirectional gray level histograms have been categorized and used for more precise intensity scaling under changing viewpoints [16], and the outputs of Gaussian derivative filters for different viewing conditions have been clustered and used for material recognition and texture reproduction [12].

We propose a texture model that also takes example images of different viewing conditions as input and that can be used to generate more of the same texture, including the 3D effects. In contrast to relief textures [14] and volumetric textures (e.g. [13]), no 3D geometry is required as input in order to generate the multiview texture model. Nor is any correspondence search for parallax computations required [3]. The point of departure is a texture model for a single view that has the advantage that it can be elegantly generalized to a multiview texture model. Once the model for a single view has been created, this extension is very fast and the size of the model remains very small. The example images do not have to be stored once the model has been created (cf. [4], [5]).

Section 2 describes the basic model for a single view, which is then generalized to a multiview texture model in Section 3. Section 4 concludes the paper.

2. Basic model for a single view

The multiview texture model is an extension of a single view texture modeling technique, which is discussed first in order to keep the paper sufficiently self-contained. It extracts some carefully chosen statistics from an example texture during an analysis step. Synthesis then consists of constructing textures with similar statistics. The method has the advantage that it successfully deals with both stochastic and structural textures. It does not copy any part of the example texture, thereby avoiding repetitions in the synthesized textures. It includes both short-range and long-range pixel interactions and therefore can pick up small- and large-scale effects. The texture model is also highly compact, only a couple of Kbytes. These advantages are preserved in the extended version that includes viewpoint dependency.


2.1. Extracted Statistical Properties


The method extracts first- and second-order statistics from an example image. The first-order statistics correspond to the intensity histograms. Color will be discussed later. The second-order statistics draw upon the co-occurrence principle: for pixel pairs at fixed relative positions the intensities are compared. The pixel pairs are called cliques, and pairs with the same relative position form a clique type (see Figure 1). A clique is an ordered pair, so a "tail" pixel and a "head" pixel can be distinguished. Instead of storing the complete joint probability distributions for the different clique types, our model only stores the distribution of the intensity differences between the head and tail pixels. The differences are requantized into 63 signed values. The relative frequencies of these difference values yield a difference histogram for every clique type.
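As an illustration of these statistics, the sketch below computes the difference histogram of one clique type for a grayscale example image. The function name, the offset convention, and the equal-width 63-bin requantization are our own reading of the description, not code from the paper.

```python
import numpy as np

def difference_histogram(image, offset, n_bins=63):
    """Difference histogram for one clique type, i.e. one fixed tail-to-head
    offset (dy, dx). Intensity differences head - tail are requantized into
    n_bins signed values and normalized to relative frequencies."""
    dy, dx = offset
    h, w = image.shape
    # Tail pixels and the corresponding head pixels at the given offset,
    # restricted to pairs that fall entirely inside the image.
    tail = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    head = image[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
    diff = head.astype(np.int32) - tail.astype(np.int32)   # range [-255, 255]
    hist, _ = np.histogram(diff, bins=n_bins, range=(-255.5, 255.5))
    return hist / hist.sum()

# Example: clique type with the head pixel 3 columns to the right of the tail.
# img = np.asarray(example_texture, dtype=np.uint8)
# hist = difference_histogram(img, offset=(0, 3))
```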

It is not practical to collect these second-order statistics for all possible clique types. One could limit their head-tail distance [6], but we rather opted for selecting the clique types that have the largest perceptual impact for the given texture. A small selection of both short and long distance types is included in the model, ensuring that important short- and long-range interactions between pixels are represented.

The resulting texture model consists of two parts. The first part specifies the clique types that have been selected to describe the texture. The set of these clique types is called the neighborhood structure. The second part is the statistical parameter set: the intensity histogram and the difference histograms for the selected clique types. After the model has been created, the example image is no longer needed.

Restricting the model to first- and second-order statistics puts a limitation on the textures that can be generated. Although Julesz's early experiments indicated that first- and second-order statistics govern our perception of textures to a large extent, he also demonstrated that higher-order statistics (cliques containing more than two pixels) cannot be neglected [11]. Later, Gagalowicz [7] concluded from extensive experiments that perception is governed by texture statistics up to the 6th order. Nevertheless, as we will demonstrate, quite a broad range of textures can be synthesized with our method. Moreover, higher-order statistics can be included in the model, albeit at the expense of computation time.

Figure 1. Cliques and clique types (cliques of the same type vs. cliques of different types).

2.2. Clique Type Selection

This section describes how the clique types with the largest perceptual impact are selected, one by one. The procedure consists of a loop in which a texture is synthesized from the model collected so far, and the statistical deviations from the example texture are used to upgrade the model by appending an additional clique type. Starting with an empty neighborhood structure and an empty parameter set:

1. Collect the complete 2nd-order statistics for the example texture, i.e., the difference histograms of all clique types.
2. Generate an image filled with independent noise, with values uniformly distributed in the range of the example texture. This noise image serves as the initial synthesized texture, to be refined in subsequent steps.
3. Collect the difference histograms of all clique types for the current synthesized image (initially noise).
4. Select the clique type with the largest Euclidean distance between its difference histogram for the synthesized texture and for the example texture. If this distance is less than a threshold, stop. Otherwise, add the clique type to the neighborhood structure and its difference histogram to the statistical parameter set.
5. Synthesize a new texture using the updated neighborhood structure and statistical parameter set. This texture should have the prescribed statistics of the parameter set for all clique types in the neighborhood structure [20].
6. Go to step 3.

(A schematic sketch of this selection loop is given below, after Eq. (1).) For this texture analysis algorithm, repeated texture synthesis is necessary (step 5). We use the same algorithm as for synthesis from the final texture model. This algorithm is based on the Gibbs random field image representation. It is described in [19] and the complementary paper [20], where other aspects of the above modeling procedure are also described in more detail. Although we cannot repeat the details here, it is useful to point out a number of features of the synthesis scheme. These also bring out the differences with a number of alternative approaches.

The algorithm generates a texture with the prescribed statistics (parameter set) through stochastic optimization. It is constructed so that it cannot be trapped in local minima (as opposed to the method of [6]) of the distance to the prescribed statistics (step 5):

dist(f, f^0) → min,   (1)

where f and f^0 are vectors concatenating the histograms of the clique types selected so far for the currently synthesized texture and for the example texture, respectively.
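Under simplifying assumptions (a fixed candidate set of offsets, the difference_histogram helper above, and a synthesize() routine standing in for the Gibbs-based synthesis of [19], [20]), the selection loop can be sketched as follows; the names, stopping threshold, and cap on the number of types are illustrative, not taken from the paper.

```python
import numpy as np

def select_clique_types(example, candidate_offsets, synthesize,
                        threshold=0.01, max_types=40):
    """Greedy clique-type selection: grow the neighborhood structure until the
    synthesized texture matches the example statistics closely enough."""
    # Step 1: difference histograms of all candidate clique types for the example.
    target = {off: difference_histogram(example, off) for off in candidate_offsets}

    # Step 2: initial "texture" is independent uniform noise in the example's range.
    rng = np.random.default_rng(0)
    synth = rng.integers(int(example.min()), int(example.max()) + 1,
                         size=example.shape, dtype=example.dtype)

    neighborhood, parameter_set = [], {}
    for _ in range(max_types):
        # Step 3: statistics of the current synthesized image.
        current = {off: difference_histogram(synth, off) for off in candidate_offsets}
        # Step 4: pick the clique type whose histogram deviates most (Euclidean distance).
        worst = max(candidate_offsets,
                    key=lambda off: np.linalg.norm(current[off] - target[off]))
        if np.linalg.norm(current[worst] - target[worst]) < threshold:
            break
        neighborhood.append(worst)
        parameter_set[worst] = target[worst]
        # Step 5: re-synthesize with the enlarged model (placeholder for [19], [20]).
        synth = synthesize(neighborhood, parameter_set, example.shape)
    return neighborhood, parameter_set
```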


It has been shown that the successive approximations constructed in the histogram space converge even if the minimum in (1) is nonzero, i.e., if the prescribed statistics f^0 cannot be reached exactly, e.g. because the size of the synthesized image differs from that of the example image [19]. As (1) converges, there is no need to assume ergodicity (as opposed to [18]). This is important, as the vast majority of regular textures are essentially nonergodic [11] and this feature is key to their appearance. With our method there is also no need to estimate the Gibbs potentials exactly (cf. [18]) in order to find the final random field corresponding to the example texture, which would cost too much computation time. Instead, a dynamic random field approach is used, with continuously changing potentials; the synthesis result is the image, from the generated sequence, that corresponds to the minimal distance (1).
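As a rough illustration of synthesizing a texture by minimizing (1), the sketch below performs a naive pixel-wise relaxation: each visited pixel keeps the trial intensity that lowers the distance between current and prescribed difference histograms. This is a deliberately simplified stand-in for the dynamic-potential Gibbs-random-field sampler of [19], [20]; the update rule, candidate count, and sweep count are our own choices, and the full histogram recomputation makes it far too slow for real use.

```python
import numpy as np

def histogram_distance(image, neighborhood, parameter_set):
    """Sum of Euclidean distances between current and prescribed difference histograms."""
    return sum(np.linalg.norm(difference_histogram(image, off) - parameter_set[off])
               for off in neighborhood)

def synthesize(neighborhood, parameter_set, shape, n_sweeps=3, n_candidates=8, seed=0):
    """Naive relaxation toward the prescribed statistics of Eq. (1); illustration only."""
    rng = np.random.default_rng(seed)
    img = rng.integers(0, 256, size=shape, dtype=np.uint8)
    dist = histogram_distance(img, neighborhood, parameter_set)
    for _ in range(n_sweeps):
        for y in range(shape[0]):
            for x in range(shape[1]):
                # Try a few random intensities at this pixel, keep the best one.
                for value in rng.integers(0, 256, size=n_candidates):
                    old = img[y, x]
                    img[y, x] = value
                    d = histogram_distance(img, neighborhood, parameter_set)
                    if d < dist:
                        dist = d
                    else:
                        img[y, x] = old
    return img
```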

2.3. Extension to Color Textures

In the case of color images, separate "intra-band" neighborhood structures and parameter sets are selected for each of the three color bands. Besides these intra-band interactions, "inter-band" interactions between the color bands are included, where head and tail positions belong to different color bands. An example of such an intra-band + inter-band neighborhood structure is shown in Figure 2. Some interactions are always included in the neighborhood structure: the interactions with the four nearest neighbors within the bands and the "vertical" connections between bands (i.e., between identical pixel positions). Experiments have shown that they had to be included almost without exception in the texture models, so their automatic inclusion speeds up the modeling. Experiments with a luminosity + chromaticity type of color space (YIQ) suggest that synthesis directly in RGB space gives better results.
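The clique-type bookkeeping for color can be sketched as follows. The tuple representation (offset plus tail and head band indices), the helper names, and the set of always-included types spelled out below are our own reading of the description and of the Figure 2 caption, not code from the paper.

```python
import numpy as np

def default_clique_types():
    """Clique types that are always included: interactions with the four nearest
    neighbors within each band (offsets (0,1) and (1,0) suffice, since a pixel can
    act as tail or head) and the "vertical" connections between bands."""
    types = []
    for b in range(3):                       # bands: 0 = red, 1 = green, 2 = blue
        types += [(0, 1, b, b), (1, 0, b, b)]
    types += [(0, 0, 0, 1), (0, 0, 1, 2), (0, 0, 0, 2)]   # inter-band, zero offset
    return types

def color_difference_histogram(image, clique_type, n_bins=63):
    """Difference histogram for a color clique type: the tail is read from one
    band and the head from another (possibly the same), at the given offset."""
    dy, dx, band_tail, band_head = clique_type
    h, w, _ = image.shape
    tail = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx), band_tail]
    head = image[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx), band_head]
    diff = head.astype(np.int32) - tail.astype(np.int32)
    hist, _ = np.histogram(diff, bins=n_bins, range=(-255.5, 255.5))
    return hist / hist.sum()
```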

Figure 2. Complete neighborhood structure for a color texture. Dots represent pixels that form a clique with the central pixel. Left column: intra-band neighborhood structures for the three color bands red, green, and blue (since the central pixel can act as tail or head, the patterns are point-symmetric around the center). Second and third columns: inter-band neighborhood structures for the pairwise interactions between the r-g, g-b, and r-b bands.


2.4. Single View Texture Examples

Figure 3 shows a collage with a few examples: original textures in the top image and synthesized textures in the bottom image. On the whole, the synthesized textures are perceptually similar to the examples, for both regular and stochastic textures. Nevertheless, some of the examples indicate that our approach finds it difficult to capture the precise shapes of texels. Such imperfections can be seen in the colored candy texture and in the cardboard regularly punched with holes.

Figure 3. Texture collage. Top: original example textures (including Brodatz and VisTex databases), bottom: synthesized textures (see color plates).


3. Multiview Texture

In this section the model is adapted to include the 3D effects observed with changing viewpoint.

3.1. Adaptation of the model

Figure 4 shows a straw texture, seen from two angles. Part (a) is a perpendicular view, (b) an oblique view at 68°. Both images were obtained from the CUReT texture database. Part (c) is the result of simply foreshortening (a) based on the cosine of the angle between the views. As this example shows, such simple foreshortening does not capture the 3D effects of a changing viewpoint for this kind of nonplanar texture. We propose a modification of the single view texture algorithm in order to cater for 3D effects, with minimal overhead in terms of model size and computation time.

Figure 4. Straw texture (CUReT database), seen from two angles. (a) Perpendicular view, (b) oblique view at 68°, (c) foreshortening of (a) (see color plates).

The single view texture model consists of two parts: the neighborhood structure and the statistical parameter set. It is the neighborhood structure that is difficult to get at. Once this part has been constructed, the extraction of the corresponding statistics is fast and easy. Hence, we avoid extracting a new neighborhood structure for every viewpoint. The texture is first modeled for one viewpoint, typically a fronto-parallel one. The neighborhood structure for that viewpoint is then simply deformed for other viewpoints, by contraction or stretching in the direction of the slant. So foreshortening is still applied, but to the neighborhood structure and not to the texture itself. As a matter of fact, this would still simply yield a foreshortened texture if the statistical parameter set for the initial texture were kept. Further refinements are obtained by extracting a new statistical parameter set for this deformed neighborhood structure, from the example image for the new view. In the case of color images, each of the intra-band and inter-band neighborhood structures is deformed in the same way, and new statistics are extracted for each.



Hence, this process does not extract a new neighborhood structure for new views; it simply deforms the one it already has for the first view. Only the statistics are extracted anew from the example images of the new viewpoints. Although the deformed neighborhood structure is not optimal for the other views, we observed good synthesis results for a broad class of textures nonetheless. This extension to other viewpoints is very fast: milliseconds, compared to the tens of minutes required for the extraction of a new neighborhood structure. As a consequence, building a texture model that includes 3D effects takes virtually no additional time compared to the extraction of a model for a single viewpoint.
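A minimal sketch of this deformation step, under the assumption that the deformation is a pure contraction of the tail-to-head offsets by cos(θ) along the slant direction, here taken as the image y axis, with rounding to the pixel grid; the exact deformation used by the authors is not spelled out beyond "contraction or stretching in the direction of the slant".

```python
import math

def deform_neighborhood(neighborhood, view_angle_deg):
    """Foreshorten the clique-type offsets of a fronto-parallel model for an
    oblique view: contract the offset component along the slant direction
    (assumed here to be the image y axis) by cos(view angle)."""
    scale = math.cos(math.radians(view_angle_deg))
    deformed = []
    for dy, dx in neighborhood:
        new_dy = int(round(dy * scale))
        # Keep a purely vertical clique type non-degenerate: a zero offset
        # carries no second-order information, so clamp to at least one pixel.
        if new_dy == 0 and dy != 0 and dx == 0:
            new_dy = 1 if dy > 0 else -1
        deformed.append((new_dy, dx))
    return deformed

# The statistics for the deformed structure are then re-measured from the
# example image of the new view, e.g.:
# params_68 = {off: difference_histogram(view_68_image, off)
#              for off in deform_neighborhood(neighborhood_0, 68)}
```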


The resulting multiview model can also be made very compact. Only a single neighborhood structure is stored, which is deformed on the basis of the angle between viewpoints; this costs no additional bits. The largest part of the extended model consists of the difference histograms. By applying Principal Component Analysis (PCA) this part can also be compressed: the histograms can be expressed as linear combinations of a small number of principal components with the highest eigenvalues, coined "eigenhistograms". One only has to store a few eigenhistograms plus, for each difference histogram, the weights of its linear combination. Such a PCA decomposition also helps to generate views for intermediate angles, for which no example images have been taken. This is achieved by interpolating between the weights of neighboring views for which such example images were available. As a result, fewer example images can suffice, leading to an additional compression of the model.
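A sketch of the eigenhistogram compression, assuming all difference histograms (one per clique type and per view) are stacked as rows of a matrix; the choice of 12 components and the SVD-based implementation are illustrative, not specifics from the paper.

```python
import numpy as np

def compress_histograms(histograms, n_components=12):
    """PCA-compress a (n_histograms x n_bins) matrix of difference histograms.
    Returns the mean histogram, the eigenhistograms, and per-histogram weights."""
    H = np.asarray(histograms, dtype=float)
    mean = H.mean(axis=0)
    # Principal components of the centered histograms via SVD.
    _, _, vt = np.linalg.svd(H - mean, full_matrices=False)
    eigenhistograms = vt[:n_components]              # (n_components x n_bins)
    weights = (H - mean) @ eigenhistograms.T         # (n_histograms x n_components)
    return mean, eigenhistograms, weights

def reconstruct(mean, eigenhistograms, weights):
    """Approximate the original difference histograms from the compressed form."""
    return mean + weights @ eigenhistograms
```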


3.2. Examples of multiview texture

Here we illustrate the results of such multiview texture modeling. First, consider Figure 5. Part (a) shows the fronto-parallel straw image already shown in Figure 4, and (b) is the neighborhood structure of its model (blue band as an example). Part (c) again is the example image for the straw looked at under 68°, and (d) shows the synthesized view based on the multiview straw texture model, with as neighborhood structure for this viewing angle the contracted neighborhood structure shown in (e).



Another example from the CUReT database is shown in Figure 6. The top row shows a frontal and an oblique view of fur; the bottom row shows synthetic results obtained with the multiview model for this texture.

As another example, consider the piece of protective rubber foam in Figure 7. It is highly nonplanar and therefore a critical test for a multiview texture algorithm. In particular, it is not obvious that the simple deformation of the neighborhood structure in our algorithm would suffice for such strong relief. The first column of Figure 8 shows some example views, i.e., the original images, for viewing angles of 0°, 30°, 45°, 60°, and 80°. In total, images for 22 different viewing angles were taken, between 0° and 80°. The illumination was kept fixed. Nevertheless, the views differ not only in geometric structure, but also in overall intensity. The second column shows in the top row the result of texture synthesis with the single view texture model extracted from the 0° view. The other four images in that column show the result of simple foreshortening, according to the viewing angles. Again, there is quite a big difference between the original images and these textures, which is what traditional texture mapping would produce.


The third column shows the result of texture synthesis with the multiview model. The neighborhood structure has been extracted only once, namely for the 0° view. It contains 40 clique types. The neighborhood structures for all other views are simply foreshortened versions of this one, but with their statistics newly measured from the different example views, as described earlier. These synthesized textures already look much more like the original ones.

Figure 6. Top row: fronto-parallel and oblique views of CUReT fur texture, bottom: synthesis with multiview model (see color plates).


Figure 7. Rubber foam texture.

The fourth column shows textures synthesized with a PCA-compressed multiview model. The original multiview model was compressed by a factor of 5, by decomposing the 40 difference histograms (one for every clique type) of each of the 22 views in terms of only 12 principal components. The reduction is not simply 880 (40 times 22) histograms to 12, as the weights needed to represent the original histograms in terms of the 12 components must also be stored. It is interesting to note that the results obtained with the compressed model are, if anything, better rather than worse. This is particularly true for the 45° view. Removing the components with the smaller eigenvalues seems to remove unwanted noise rather than useful model information. We have observed similar improvements for several other textures.

The original multiview model of the rubber foam texture has a size of about 100 Kbytes for all 22 views, which is a typical size for our models. After compression, the size typically is a few tens of Kbytes. In both cases the model is more compact than the space required for storing even a single example image.
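As a back-of-envelope check of the stated factor of 5 (our own arithmetic, assuming the 63-bin quantization of Section 2.1): the uncompressed model stores 880 × 63 ≈ 55,000 histogram values, whereas the compressed model stores 12 eigenhistograms of 63 values plus 880 × 12 weights, i.e., roughly 760 + 10,560 ≈ 11,300 values, about five times fewer.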


Figure 5. (a) Fronto-parallel view of straw texture (CUReT), (b) neighborhood structure for this view, (c) oblique view of straw, (d) synthetic texture for the same angle as (c), (e) neighborhood structure for (d) (see color plates).


PCA analysis of the difference histograms does not only offer the possibility to compress the multiview texture model. As mentioned before, it also supports viewpoint interpolation, by interpolating between the weights for the eigenhistograms of the neighboring learned views. Figure 9 shows such an interpolated result. Based on the views for 30° and 60°, linear interpolation leads to the synthesized texture for 45° shown as (a). Part (b) shows again the result for the multiview texture, but with a parameter set directly learned from a 45° view. Part (c) shows the synthesis result based on a neighborhood structure and statistical parameter set specifically extracted for 45°. As can be seen, the result of interpolation is still quite good, although of lower quality than what is obtained with a model dedicated to this viewing angle. Also, interpolating between viewing angles as far apart as 30° and 60° is at the limit of what we found to be feasible with strongly nonplanar textures. Such interpolation allows one to reduce the number of stored viewing angles, yielding a further compression.

It could happen that the interpolated histograms yield mutually incompatible statistics, i.e., statistics that cannot be satisfied simultaneously on the given raster. In this situation the synthesis algorithm of Section 2.2 yields an image with the nearest possible statistics.
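A sketch of this weight interpolation, reusing the compress_histograms/reconstruct helpers sketched above; the dictionary keyed by viewing angle and the assumption that the target angle lies between two learned views are our own.

```python
def interpolate_view(weights_by_angle, mean, eigenhistograms, target_angle):
    """Interpolate the eigenhistogram weights of the two nearest learned views
    and reconstruct the difference histograms for an unseen viewing angle."""
    angles = sorted(weights_by_angle)
    lo = max(a for a in angles if a <= target_angle)   # nearest learned view below
    hi = min(a for a in angles if a >= target_angle)   # nearest learned view above
    t = 0.0 if hi == lo else (target_angle - lo) / (hi - lo)
    w = (1.0 - t) * weights_by_angle[lo] + t * weights_by_angle[hi]
    return reconstruct(mean, eigenhistograms, w)

# e.g. histograms for a 45-degree view from models learned at 30 and 60 degrees:
# hists_45 = interpolate_view({30: w30, 60: w60}, mean, eigenhistograms, 45)
```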



Figure 8. Columns from left to right: rubber foam input images for different viewing angles (0°, 30°, 45°, 60°, and 80°), result of foreshortening the fronto-parallel view, synthetic results based on the multiview model, synthetic results based on the PCA-compressed multiview model.


Figure 9. (a) Result of interpolated, synthetic texture, (b) result for the multiview texture model with a parameter set learned from a 45° example view, (c) synthetic texture for the same view based on a single view model with a neighborhood structure optimized for this slant.

Figure 10 shows an example of multiview texture on a curved surface. The top orange is a real one. In this case the patches are both viewed and illuminated from different relative directions. A three-dimensional model of this orange was produced, so that the normal directions at the different points of its surface are known. The bottom orange is the result of covering the image projection of this shape with the corresponding viewpoint dependent textures. As a matter of fact, the projection was divided into different patches, which were then covered with the texture for the average normal of that patch. The multiview texture model was learned from similar patches on the input image (top orange). Synthesis in this case was based on the uncompressed model.

In order to avoid seams between the patches, a special texture knitting technique was used. This technique generates natural transitions between textures and is described in [20]. The main idea is the following. When synthesizing a new patch of texture, its model takes into account the surrounding patches that have already been synthesized. When dealing with pixels within the new patch, some of them, mainly those near the patch boundaries, will require pixels in the neighboring patches for some of the cliques. These surrounding pixels influence the conditional probabilities of the intensities within the patch. This influence is sufficiently strong, especially because of the long-range interactions of the model, that there are no visible seams between patches of similar textures, or that there is a convincing transition in the form of a mixture in the case of different textures. For more details, see [20].

Figure 11 shows a similar example. The left tangerine is an example image, the right one a synthetically generated version. The 3D shape of the banana was also extracted, and a banana with tangerine texture was produced.
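The sketch below illustrates only this general idea, not the actual knitting algorithm of [20]: pixels outside the new patch stay fixed, so cliques that straddle the patch boundary condition the statistics of the new patch on its already-synthesized surroundings. The mask-based interface reuses the toy relaxation and histogram_distance helper from the sketch in Section 2.2, and is our own construction.

```python
import numpy as np

def synthesize_patch(image, patch_mask, neighborhood, parameter_set,
                     n_sweeps=3, n_candidates=8, seed=0):
    """Resynthesize only the pixels where patch_mask is True; cliques that reach
    outside the patch keep using the fixed, already-synthesized pixels there."""
    rng = np.random.default_rng(seed)
    img = image.copy()
    ys, xs = np.nonzero(patch_mask)
    img[patch_mask] = rng.integers(0, 256, size=len(ys))   # initialize the new patch
    dist = histogram_distance(img, neighborhood, parameter_set)
    for _ in range(n_sweeps):
        for y, x in zip(ys, xs):                           # only patch pixels change
            for value in rng.integers(0, 256, size=n_candidates):
                old = img[y, x]
                img[y, x] = value
                d = histogram_distance(img, neighborhood, parameter_set)
                if d < dist:
                    dist = d
                else:
                    img[y, x] = old
    return img
```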

Figure 10. Top: real orange, bottom: synthetic orange (see color plates).

Figure 11. Multiview textures as an alternative to genetic engineering. Top-left: real tangerine, top-right: synthetic tangerine, bottom: banana covered with a tangerine skin, the "banerine" (see color plates).

4. Conclusion

We have proposed a multiview texture model that allows viewpoint dependencies in the appearance of textures to be modeled. The model is highly compact yet sufficiently rich to capture the perceptual essence of many textures.

It is clear that the multiview model is not complete yet. First, one would like to manipulate the viewpoint and illumination directions independently. This seems to require a rather straightforward extension, provided sufficient example images are available.

Secondly, the current model does not support animation. If the (virtual) camera changes its position, new textures can be generated that are consistent with the new viewing angles. This is not enough, however. The new textures should also be consistent with the old ones, i.e., the textures on the same physical patch of the surface should show dips, streaks, etc. at corresponding places. If a new texture is simply generated, no such correspondences will be realized. This extension calls for the inclusion of relations between different viewpoints and is the subject of our current investigations. Interesting links with the work of Leung and Malik [12] may emerge at that point.

In the near future we also intend to check the influence of weak perspective distortions. It seems that, e.g., the "jitter" seen at the top part of Figure 9 (a) was the result of such distortions not being accounted for by the current model.




References

[1] K. Dana and S. Nayar, "Correlation Model for 3D Texture", Proc. Int. Conf. Computer Vision (ICCV 99), 1999, pp. 1061-1066.
[2] K.J. Dana, B. Van Ginneken, S.K. Nayar, and J.J. Koenderink, "Reflectance and Texture of Real-World Surfaces", ACM Transactions on Graphics, Vol. 18, No. 1, 1999, pp. 1-34.
[3] P. Debevec, C. Taylor, and J. Malik, "Modeling and Rendering Architecture from Photographs: a Hybrid Geometry- and Image-Based Approach", SIGGRAPH 96, 1996, pp. 11-20.
[4] J.S. De Bonet, "Multiresolution Sampling Procedure for Analysis and Synthesis of Texture Images", SIGGRAPH 97, 1997, pp. 361-368.
[5] A. Efros and T. Leung, "Texture Synthesis by Non-Parametric Sampling", Proc. Int. Conf. Computer Vision (ICCV 99), Vol. 2, 1999, pp. 1033-1038.
[6] A. Gagalowicz and S.D. Ma, "Sequential Synthesis of Natural Textures", Computer Vision, Graphics, and Image Processing, Vol. 30, 1985, pp. 289-315.
[7] A. Gagalowicz, personal communication.
[8] G. Gimel'farb, Image Textures and Gibbs Random Fields, Kluwer Academic Publishers: Dordrecht, 1999, 250 p.
[9] D. Heeger and J. Bergen, "Pyramid-Based Texture Analysis/Synthesis", SIGGRAPH 95, 1995, pp. 229-234.
[10] T.I. Hsu and R. Wilson, "A Two-Component Model of Texture for Analysis and Synthesis", IEEE Trans. on Image Processing, Vol. 7, No. 10, Oct. 1998, pp. 1466-1476.
[11] B. Julesz and R.A. Schumer, "Early Visual Perception", Ann. Rev. Psychol., Vol. 32, 1981, pp. 575-627 (p. 594).
[12] T. Leung and J. Malik, "Recognizing Surfaces Using Three-Dimensional Textons", Proc. Int. Conf. Computer Vision (ICCV 99), 1999, pp. 1010-1017.
[13] F. Neyret, "Modeling, Animating, and Rendering Complex Scenes Using Volumetric Textures", IEEE Trans. Visualization and Computer Graphics, Vol. 4, No. 1, Jan.-March 1998, pp. 55-70.
[14] M. Oliveira, G. Bishop, and D. McAllister, "Relief Texture Mapping", SIGGRAPH 00, 2000, pp. 359-368.
[15] J. Portilla and E.P. Simoncelli, "Texture Modeling and Synthesis Using Joint Statistics of Complex Wavelet Coefficients", Int. J. of Computer Vision, Vol. 40, No. 1, Oct. 2000, pp. 49-72.
[16] B. Van Ginneken, J. Koenderink, and K. Dana, "Texture Histograms as a Function of Irradiation and Viewing Direction", Int. J. of Computer Vision, Vol. 31, No. 2/3, 1999, pp. 169-184.
[17] L.-Y. Wei and M. Levoy, "Fast Texture Synthesis Using Tree-Structured Vector Quantization", SIGGRAPH 00, 2000, pp. 479-488.
[18] S.C. Zhu, Y.N. Wu, and D. Mumford, "Filters, Random Fields and Maximum Entropy (FRAME)", Int. J. Computer Vision, Vol. 27, No. 2, March/April 1998, pp. 1-20.
[19] A. Zalesny, "Analysis and Synthesis of Textures With Pairwise Signal Interactions", Tech. Rep. KUL/ESAT/PSI/9902, Catholic University of Leuven, Belgium, 1999, 132 p., http://www.vision.ee.ethz.ch/~zales.
[20] A. Zalesny and L. Van Gool, "A Compact Model for Viewpoint Dependent Texture Synthesis", SMILE 2000, Workshop on 3D Structure from Images, Lecture Notes in Computer Science 2018, M. Pollefeys et al. (Eds.), 2001, pp. 124-143.