Computer Science Department of The University of Auckland CITR at Tamaki Campus (http://www.citr.auckland.ac.nz)
CITR-TR-97
September 2001
On Tradeoffs Between Deterministic Structure and Randomness in Texture Simulation Georgy Gimel'farb1
Abstract
The paper discusses current techniques for simulating realistic images of natural textures. These techniques account for deterministic spatial structures of signal relationships in a given training sample and allow for random deviations of signals in the simulated texture. When simulation is based on random permutations of image tiles, the tiles can be found for some periodic regular textures by using characteristic pixel neighbourhoods. The neighbourhoods are estimated by using Gibbs random field models with multiple pairwise pixel interactions, and the tiles determined from the neighbourhoods serve as basic structural texture elements, or texels. To exclude spurious borders between the permuted tiles and suppress visually undesirable repetition of specific singularities of individual tiles, simulated images are post-processed by maximising conditional probabilities of signals in accord with the Gibbs model.
1
Center for Image Technology and Robotics Tamaki Campus, The University of Auckland, Auckland, New Zealand.
[email protected]
You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the CITR Tamaki web site under terms that include this permission. All other rights are reserved by the author(s).
On Tradeoffs Between Deterministic Structure and Randomness in Texture Simulation Georgy Gimel'farb
CITR, Department of Computer Science Tamaki Campus, University of Auckland Private Bag 92019, Auckland 1
1 Introduction
Over the last five years, texture modelling has achieved much success in simulating realistic samples of natural image textures [2–5, 7, 9, 10]. The simulation accounts, explicitly or implicitly, for a deterministic spatial structure of signal relationships (interactions) in a given training sample but allows for random deviations of signals in the simulated texture while preserving the interaction structure of the training sample. If necessary, the structure is extrapolated to images of larger size.
The approaches in [2, 9] retain the interaction structure by approximating relative frequency distributions of top-down signal co-occurrences at different levels of a multiresolution description of the simulated image with the closest (with respect to some probabilistic distance) similar co-occurrence distributions for the training sample. In particular, the Laplacian pyramid and the steerable pyramid based on complex wavelets are used in [2] and [9], respectively.
The pyramidal synthesis of a desired texture is conducted in [2] by a sequential top-down sampling of the training pyramid. The goal image has the same root signal value, or several replicas of the training root signal if the goal image is of larger size. At each lower level of the synthesising pyramid, the signal in every location is sampled from the uniform distribution of signals in the corresponding locations of the training pyramid. The correspondences are determined by thresholding distances between all training top-to-preceding-level signal vectors and the like signal vector for the synthesising pyramid. The extent to which the signal vectors coincide specifies the structural similarity between the images, and the threshold determines the level of randomisation with respect to the training sample. In this case the randomisation with respect to the training sample is actually restricted only to similar signals in the training pyramid.
More diverse randomisation is obtained when the goal texture is formed from an arbitrary sample of the independent random field so as to make the wavelet coefficients for the goal image closely similar to the coefficients in the training pyramid [9]. The gradient-based search for the goal image is guided by an elaborate set of constraints that account for joint frequency distributions of the coefficients at adjacent spatial locations, orientations, and scales.
The interaction structure in both these approaches is retained by replicating the multiresolution representation of the training sample up to the point specified by a given threshold of similarity. Because signal co-occurrence frequency distributions are preserved at different resolutions, the goal texture consists of implicit spatial permutations of parts of the training sample.
These permutations become explicit in texture simulation by non-parametric sampling [3], which intends to preserve the interaction structure of the training sample by using a heuristically chosen square pixel neighbourhood as the structure-preserving element. Synthesis of a goal image begins from a small seed taken randomly from the training sample. Every new pixel is successively added to the already synthesised part by uniformly sampling from among the pixels of this part having closely similar neighbourhoods. The similarity is defined by thresholding a relative distance between all the signals in the neighbourhoods. Generally, such extrapolation accumulates local errors so that the generated textures can depart considerably from the given prototypes. The patch-based non-parametric sampling in [7] tries to diminish such errors by adding to the already synthesised part at every step a new rectangular patch chosen randomly from among the already synthesised patches that have closely similar neighbourhoods.
These techniques efficiently simulate a wide variety of textures, although most of them involve rather intensive computations, e.g. 10–20 minutes to generate a 256 × 256 image on a 500 MHz Pentium workstation [9]. The patch-based sampling scheme is much faster, but the quality of simulation depends notably on how adequate the heuristic choice of the patches is. This paper shows that adequate patches for some textures can be found using a Gibbs random field image model with multiple pairwise pixel interactions [4, 5, 10].
The model yields an analytical estimate of the neighbourhood that characterises the interaction structure. In the case of some approximately periodic regular textures such as mosaics, the estimated neighbourhoods specify tiles covering the training sample. Such tiles can be considered as basic structural elements of these textures, called texels in [6]. The desired texture is then simulated by randomly permuting and replicating the tiles. To exclude spurious borders between the permuted tiles and suppress visually undesirable repetition of specific singularities of individual tiles, the images can be post-processed by maximising conditional probabilities of signals in accord with the Gibbs model.
2 Gibbs random field texture model
Let $Q$ and $\mathbf{R}$ denote a finite set of image signals (e.g., grey values) and a finite arithmetic lattice supporting images $g : \mathbf{R} \to Q$, respectively. Let $C_a = [(i, i+a) : i, i+a \in \mathbf{R}]$ be a family of translation invariant pixel pairs separated by the relative shift $a$. Spatially homogeneous pairwise pixel interactions are described by a particular set $\mathbf{C} = [C_a : a \in A]$ of interacting pixel pairs, or cliques of the neighbourhood graph depicting such interactions.
Quantitative strengths of interactions for every clique family $C_a$ are given by the bounded potential function $V_a : Q \times Q \to \mathbb{R} = (-\infty, \infty)$. The Gibbs random field model with multiple pairwise pixel interactions is represented by the Gibbs probability distribution:
$$\Pr(g \mid A, \mathbf{V}) \propto \exp\left( \sum_{a \in A} \sum_{(i, i+a) \in C_a} V_a(g_i, g_{i+a}) \right) \qquad (1)$$
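As an illustration of Eq. (1), the following minimal sketch evaluates its exponent for a given image. It assumes that signals are integers in 0..|Q|-1, that a shift is stored as a (row, column) offset, and that each potential is a |Q| × |Q| array; the function name `gibbs_energy` and the dictionary layout are hypothetical choices made for this example, not part of the original report.

```python
import numpy as np

def gibbs_energy(image, potentials):
    """Exponent of Eq. (1): sum of V_a(g_i, g_{i+a}) over all cliques.

    image      -- 2D integer array with values in 0..Q-1 (assumed encoding)
    potentials -- dict mapping a shift a = (dy, dx) to a Q x Q array V_a
    """
    h, w = image.shape
    total = 0.0
    for (dy, dx), v in potentials.items():
        # Range of pixels i for which both i and i + a lie inside the lattice.
        y0, y1 = max(0, -dy), min(h, h - dy)
        x0, x1 = max(0, -dx), min(w, w - dx)
        if y1 <= y0 or x1 <= x0:
            continue  # the shift is larger than the image: no cliques
        first = image[y0:y1, x0:x1]                      # g_i
        second = image[y0 + dy:y1 + dy, x0 + dx:x1 + dx]  # g_{i+a}
        total += v[first, second].sum()
    return float(total)

# Usage: a 2 x 2 checkerboard with a potential that rewards equal signals.
checker = np.array([[0, 1], [1, 0]])
v_equal = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(gibbs_energy(checker, {(0, 1): v_equal, (1, 0): v_equal}))
```

The unnormalised probability of the image is the exponential of this value; the normalising constant of Eq. (1) is not computed here.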
The first approximation of the maximum likelihood estimate (MLE) of the potential vector $\mathbf{V}_a = [V_a(q, s) : (q, s) \in Q^2]$ is proportional to the centred grey level co-occurrence histogram (GLCH) $\mathbf{H}_a(g) = [H_a(q, s \mid g) : (q = g_i, s = g_{i+a}) \in Q \times Q;\ (i, i+a) \in C_a]$ collected for the clique family $C_a$ [4]. The GLCHs for a large-size search window $W$ of the intra-clique shifts $a$ allow for parallel or sequential analytical learning of the characteristic interaction structure $A$ of a given training sample [5].
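A minimal sketch of collecting a GLCH for one shift and turning it into a first-approximation potential is given below. Two points are assumptions of this example rather than statements from the report: centring is taken to mean subtracting the uniform value 1/|Q|^2 from the normalised histogram, and the proportionality factor is exposed as a free parameter `scale`. The function names are hypothetical.

```python
import numpy as np

def glch(image, shift, q_levels):
    """Normalised co-occurrence histogram of (g_i, g_{i+a}) for a = shift."""
    dy, dx = shift
    h, w = image.shape
    y0, y1 = max(0, -dy), min(h, h - dy)
    x0, x1 = max(0, -dx), min(w, w - dx)
    if y1 <= y0 or x1 <= x0:
        raise ValueError("shift leaves no pixel pairs inside the image")
    first = image[y0:y1, x0:x1].ravel()
    second = image[y0 + dy:y1 + dy, x0 + dx:x1 + dx].ravel()
    hist = np.zeros((q_levels, q_levels))
    np.add.at(hist, (first, second), 1.0)  # count every (q, s) pair
    return hist / hist.sum()

def first_approx_potential(image, shift, q_levels, scale=1.0):
    """Potential proportional to the centred GLCH (centring is an assumption)."""
    centred = glch(image, shift, q_levels) - 1.0 / q_levels**2
    return scale * centred

# Usage on a tiny binary image and the horizontal shift a = (0, 1).
sample = np.array([[0, 1], [1, 0]])
print(first_approx_potential(sample, (0, 1), q_levels=2, scale=4.0))
```

Under this centring, a pair of signals that co-occurs more often than in a uniform independent field receives a positive potential, and a rarer pair a negative one.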
Figures 1–3 demonstrate grey-coded model-based interaction maps (MBIM) and characteristic neighbourhoods $A$ that are learned for the regular textures D20, D34, and D53 from [1] using the MBIMs. Spatial positions in the MBIM represent 2D intra-clique shifts $a \in W$ and give values of the relative Gibbs energies, that is, components of the exponent in Eq. (1) for every shift $a$. The parallel learning assumes that all the interactions are independent and forms the characteristic interaction structure by simply thresholding the MBIM. The sequential learning assumes independent primary interactions and dependent secondary interactions produced by the primary ones, and takes account of this interdependence to select only the primary interactions for describing both the basic and the fine characteristic interaction structure. The primary interactions are found by sequential exclusion of the secondary ones [5].
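The parallel learning described above can be sketched as follows. This is only an illustration under stated assumptions: the relative energy of a shift is approximated per pixel pair as the dot product of the centred GLCH with the GLCH itself, and the threshold is set to the mean energy plus `k` standard deviations. Neither the exact energy normalisation nor the thresholding rule is given in this report, so both, together with all function names, are hypothetical. Since the shifts $a$ and $-a$ describe the same clique family, only a half-plane of the search window is scanned.

```python
import numpy as np

def cooccurrence(image, shift, q_levels):
    """Normalised co-occurrence histogram of (g_i, g_{i+a}) for a = shift."""
    dy, dx = shift
    h, w = image.shape
    y0, y1 = max(0, -dy), min(h, h - dy)
    x0, x1 = max(0, -dx), min(w, w - dx)
    first = image[y0:y1, x0:x1].ravel()
    second = image[y0 + dy:y1 + dy, x0 + dx:x1 + dx].ravel()
    hist = np.zeros((q_levels, q_levels))
    np.add.at(hist, (first, second), 1.0)
    return hist / hist.sum()

def interaction_map(image, max_shift, q_levels):
    """Relative energy for every shift in a half-plane search window.

    max_shift must be smaller than both image dimensions.
    """
    energies = {}
    for dy in range(0, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            if dy == 0 and dx <= 0:
                continue  # skip the zero shift and mirrored duplicates
            hist = cooccurrence(image, (dy, dx), q_levels)
            centred = hist - 1.0 / q_levels**2
            energies[(dy, dx)] = float((centred * hist).sum())
    return energies

def parallel_learning(energies, k=2.0):
    """Keep the shifts whose energy exceeds mean + k * std (assumed rule)."""
    values = np.array(list(energies.values()))
    threshold = values.mean() + k * values.std()
    return [a for a, e in energies.items() if e > threshold]

# Usage: vertical stripes of width 2 on a 16 x 16 binary image.
stripes = np.tile((np.arange(16) // 2) % 2, (16, 1))
mbim = interaction_map(stripes, max_shift=4, q_levels=2)
print(parallel_learning(mbim, k=0.0))
```

The sequential learning, which excludes secondary interactions induced by the primary ones, is not covered by this sketch.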
Figure 1: Texture D20: the training sample and results of parallel and sequential analytic learning of the characteristic interaction structure; $|A| = 71$.
Figure 2: Texture D34: the training sample and results of parallel and sequential analytic learning of the characteristic interaction structure; $|A| = 100$.
Figure 3: Texture D53: the training sample and results of parallel and sequential analytic learning of the characteristic interaction structure; $|A| = 60$.
The Gibbs random field model of Eq. (1) allows for simulating a desired texture from an arbitrary sample of the independent random field by stochastic relaxation. The relaxation changes signals in every pixel in accord with their conditional probabilities for the estimated characteristic neighbourhood. This process is embedded into a stochastic approximation of Gibbs potentials that pursues the goal of bringing the GLCHs for the training and generated samples close together. This simulation technique, called Controllable Simulated Annealing (CSA) in [4], results in textures similar to the training samples provided that the analytically estimated characteristic neighbourhoods adequately describe the interaction structure.
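For the model of Eq. (1), the conditional probability of a signal $q$ in pixel $i$ given all other signals is proportional to $\exp\big(\sum_{a \in A} [V_a(q, g_{i+a}) + V_a(g_{i-a}, q)]\big)$, where terms with a pixel outside the lattice are dropped. The sketch below performs one relaxation sweep with these probabilities. It covers only the relaxation with fixed potentials; the stochastic approximation of potentials that makes up the rest of the CSA in [4] is not shown. The function names, the dictionary of potentials, and the raster visiting order are assumptions of this example.

```python
import numpy as np

def conditional_probs(image, y, x, potentials, q_levels):
    """Pr(g_i = q | other signals) for pixel i = (y, x) under Eq. (1)."""
    h, w = image.shape
    energy = np.zeros(q_levels)
    for (dy, dx), v in potentials.items():
        yy, xx = y + dy, x + dx
        if 0 <= yy < h and 0 <= xx < w:
            energy += v[:, image[yy, xx]]  # i is the first pixel of (i, i+a)
        yy, xx = y - dy, x - dx
        if 0 <= yy < h and 0 <= xx < w:
            energy += v[image[yy, xx], :]  # i is the second pixel of (i-a, i)
    p = np.exp(energy - energy.max())      # subtract max for numerical safety
    return p / p.sum()

def relaxation_sweep(image, potentials, q_levels, rng):
    """Resample every pixel once, in raster order, from its conditional law."""
    out = image.copy()
    h, w = out.shape
    for y in range(h):
        for x in range(w):
            p = conditional_probs(out, y, x, potentials, q_levels)
            out[y, x] = rng.choice(q_levels, p=p)
    return out

# Usage: start from independent random noise and run a few sweeps.
rng = np.random.default_rng(0)
v_equal = np.array([[2.0, -2.0], [-2.0, 2.0]])
pots = {(0, 1): v_equal, (1, 0): v_equal}
noise = rng.integers(0, 2, size=(16, 16))
for _ in range(5):
    noise = relaxation_sweep(noise, pots, 2, rng)
```

Each sweep costs on the order of |R| · |A| · |Q| operations, which is consistent with the remark that large characteristic neighbourhoods make the simulation slow.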
Figure 4: Texture D29: the training sample 128 × 128 and steps 40, 80, 100, 160, and 200 of the CSA-simulation after parallel analytic learning of the characteristic interaction structure; $|A| = 11$.
Many stochastic textures in [1] and [8] depicting, for instance, sand, pressed cork, grass lawn, wood grain, and similar objects are accurately described with close-range neighbourhoods of size $|A| = 10 \ldots 20$, and their simulation by the CSA is relatively fast. For instance, 200 steps of the CSA-simulation of the 256 × 256 sample of the stochastic texture D29 from [1] in Figure 4 take about 60 seconds on a 366 MHz laptop PC (actually, the first 80–100 steps are already sufficient to produce the goal image).
But sometimes only a very generalised basic structure of a desired texture can be recovered by the parallel learning, which assumes mutually independent pairwise pixel interactions (Figure 5). In some cases, as shown in Figures 6 and 8, the desired characteristic structure (though of rather large size) can be recovered by the sequential analytical learning. Empirical sequential learning in [10] yields much smaller neighbourhoods but at the expense of much more intensive computations during the learning stage. Nonetheless, for some regular mosaics even quite large sequentially estimated structures cannot accurately describe visually important details, as, for instance, in Figure 7. Here, the simulated pattern differs considerably from the training sample although the conditional probabilities of signals in every pixel of this simulated texture for the chosen neighbourhood in Figure 2 are as high as for the training sample. Thus the Gibbs simulation of such textures necessitates more diverse types of pixel interactions than the purely pairwise ones in Eq. (1).
Figure 5: CSA-simulation of the texture D20 after parallel analytic learning of the characteristic interaction structure; $|A| = 71$. Panels show CSA steps 0 to 280 in increments of 20.
Figure 6: CSA-simulation of the texture D20 after sequential analytic learning of the characteristic interaction structure; $|A| = 71$. Panels show CSA steps 0 to 280 in increments of 20.
Figure 7: CSA-simulation of the texture D34 after sequential analytic learning of the characteristic interaction structure; $|A| = 100$. Panels show CSA steps 0 to 280 in increments of 20.
Figure 8: CSA-simulation of the texture D53 after sequential analytic learning of the characteristic interaction structure; $|A| = 60$. Panels show CSA steps 0 to 280 in increments of 20.
3 Towards a formally defined texel
The straightforward CSA-simulation is computationally intensive in the case of large characteristic pixel neighbourhoods. At the same time, the interaction structures analytically estimated for the model of Eq. (1) open up possibilities for defining basic structural elements of a texture, or texels [6]. At least for some regular periodic mosaics, the texel can be related to the polygonal hull of the learned pixel neighbourhood.
Figure 9: Randomised tilings of texture D20 with the tiles of size 76 × 74.
Figures 9–11 demonstrate regular textures simulated by random permutation and replication of a few small rectangular tiles. The tiles acting as the primitive texels were cut randomly from the training samples in Figures 1–3 after choosing the sizes of the rectangles in accord with the polygonal hulls of the learned characteristic neighbourhoods, namely, 76 × 74, 36 × 28, and 48 × 62 for the textures D20, D34, and D53, respectively. These experiments confirm that the texels and characteristic neighbourhoods of pairwise pixel interactions are closely interrelated, at least for some types of regular mosaics.
From the viewpoint of Gibbs modelling, the signal co-occurrence histograms for the randomised tiling are quite close to the training histograms because the differences are limited to the arbitrary long-range inter-tile interactions and to replications of only a particular part of the intra-tile interactions in each clique family.
The main drawback of such texture simulation is that the singularities of the chosen tiles are replicated verbatim in the generated images, although, due to the random positions of the replicated tiles, the repetitions might not be noticed at first glance. Also, random structural deviations in the training sample may result in visible false borders between the permuted tiles. In order to suppress the false borders, the obtained texture can be randomised by single-step post-processing. The post-processing replaces each signal in the image by the most probable signal with respect to its characteristic neighbourhood in the randomised tiling. Results of such post-processing shown in Figures 9–11 support this suggestion.
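The random permutation and replication of tiles described in this section can be sketched as follows. The sketch assumes that the tile positions are drawn uniformly at random from the training sample and that the tiles are laid on a regular grid; the report states only that the tiles were cut randomly and does not say how, or whether, their phases were aligned. The function name and its parameters are hypothetical.

```python
import numpy as np

def random_tiling(sample, tile_shape, out_shape, n_tiles, rng):
    """Fill an image of out_shape with randomly chosen tiles cut from sample."""
    th, tw = tile_shape
    sh, sw = sample.shape
    if th > sh or tw > sw:
        raise ValueError("tile does not fit into the training sample")
    # Cut a few tiles at uniformly random positions (assumed cutting rule).
    tiles = []
    for _ in range(n_tiles):
        y = rng.integers(0, sh - th + 1)
        x = rng.integers(0, sw - tw + 1)
        tiles.append(sample[y:y + th, x:x + tw])
    # Place a randomly selected tile in every cell of a regular grid,
    # cropping the tiles that overhang the bottom and right borders.
    oh, ow = out_shape
    out = np.empty(out_shape, dtype=sample.dtype)
    for y in range(0, oh, th):
        for x in range(0, ow, tw):
            tile = tiles[rng.integers(n_tiles)]
            hh, ww = min(th, oh - y), min(tw, ow - x)
            out[y:y + hh, x:x + ww] = tile[:hh, :ww]
    return out

# Usage with the D20 tile size quoted above and a synthetic stand-in sample.
rng = np.random.default_rng(1)
training = rng.integers(0, 16, size=(256, 256))
simulated = random_tiling(training, (76, 74), (512, 512), n_tiles=4, rng=rng)
```

Because the tiles are copied verbatim, the cost is linear in the number of output pixels, which is the source of the speed-up over pixel-wise relaxation.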
Figure 10: Randomised tilings of texture D34 with the tiles of size 36 × 28.
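The single-step post-processing described in Section 3 can be illustrated by the sketch below, which replaces every signal by the one that maximises its conditional probability under Eq. (1), with all neighbourhoods read from the unmodified tiling. The synchronous update, the handling of border pixels (out-of-lattice neighbours are ignored), and the function name are assumptions of this example.

```python
import numpy as np

def refine_once(tiling, potentials, q_levels):
    """Replace each signal by its most probable value given the tiling."""
    h, w = tiling.shape
    energy = np.zeros((q_levels, h, w))
    for (dy, dx), v in potentials.items():
        y0, y1 = max(0, -dy), min(h, h - dy)
        x0, x1 = max(0, -dx), min(w, w - dx)
        if y1 <= y0 or x1 <= x0:
            continue  # the shift is larger than the image: no cliques
        first = tiling[y0:y1, x0:x1]                      # g_i
        second = tiling[y0 + dy:y1 + dy, x0 + dx:x1 + dx]  # g_{i+a}
        # Candidate q at pixel i, paired with the observed g_{i+a}.
        energy[:, y0:y1, x0:x1] += v[:, second]
        # Candidate q at pixel i+a, paired with the observed g_i.
        energy[:, y0 + dy:y1 + dy, x0 + dx:x1 + dx] += np.moveaxis(v[first], -1, 0)
    # The conditional probability is monotone in the energy, so argmax suffices.
    return energy.argmax(axis=0).astype(tiling.dtype)

# Usage: an isolated deviating pixel is removed by smoothing potentials.
v_equal = np.array([[1.0, -1.0], [-1.0, 1.0]])
image = np.zeros((5, 5), dtype=int)
image[2, 2] = 1
print(refine_once(image, {(0, 1): v_equal, (1, 0): v_equal}, q_levels=2))
```

The update is deterministic, so it trades some of the randomness of the tiling for agreement with the learned interaction structure.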
4 Conclusions
These experiments show that the basic goal of probabilistic texture modelling is to adequately describe and exploit the deterministic structure of the training sample. If the structure can be specified by a few specific texels such that desired samples differ by a particular spatial arrangement of the texels while all the arrangements preserve the structure in general, then fast simulation can be based on random permutations of texels taken randomly from the training sample. But such a technique is not adequate for most stochastic textures (like the above-mentioned texture D29 [1]) that have a deterministic behaviour of the conditional probability distributions of signals but possess no explicit texels and rules of their arrangement. Today's most advanced texture simulation techniques use the training sample itself as the structure-preserving framework. As our experiments show, it is possible to formally define texels for some regular textures and find the underlying spatial structure of their interactions by using the characteristic pixel neighbourhoods estimated for the Gibbs image model with multiple pairwise pixel interactions. The use of texels can notably accelerate the simulation of such regular textures.
Figure 11: Randomised and refined randomised tilings of texture D53 with the tiles of size 48 × 62.
References
[1] Brodatz, P.: Textures: A Photographic Album for Artists and Designers, New York: Dover Publications, 1966.
[2] De Bonet, J. S.: Multiresolution sampling procedure for analysis and synthesis of texture images. In: Proc. ACM Conf. Computer Graphics SIGGRAPH'97, 1997, 361–368.
[3] Efros, A. A., Leung, T. K.: Texture synthesis by non-parametric sampling. In: Proc. IEEE Int. Conf. Computer Vision ICCV'99, Greece, Corfu, Sept. 1999, vol. 2, 1999, 1033–1038.
[4] Gimel'farb, G. L.: Image Textures and Gibbs Random Fields. Dordrecht: Kluwer Academic, 1999.
[5] Gimel'farb, G.: Characteristic interaction structures in Gibbs texture modeling. In: Blanc-Talon, J., Popescu, D. C. (Eds.): Imaging and Vision Systems: Theory, Assessment and Applications. Huntington, N. Y.: Nova Science, 2001.
[6] Haralick, R. M., Shapiro, L. G.: Computer and Robot Vision, Reading: Addison-Wesley, vol. 1 (1992), vol. 2 (1993).
[7] Liang, L., Liu, C., Xu, Y., Guo, B., Shum, H. Y.: Real-Time Texture Synthesis by Patch-Based Sampling. MSR-TR-2001-40. Microsoft Research, 2001, 21 p.
[8] Pickard, R., Graszyk, S., Mann, S., et al.: VisTex Database, Cambridge, Mass.: MIT Media Lab., 1995.
[9] Portilla, J., Simoncelli, E. P.: A parametric texture model based on joint statistics of complex wavelet coefficients. Int. Journal on Computer Vision, 40(1), 2000, 49–71.
[10] Zalesny, A., Van Gool, L.: A compact model for viewpoint dependent texture synthesis. In: Pollefeys, M., Van Gool, L., Zisserman, A., Fitzgibbon, A. (Eds.). 3D Structure from Images – SMILE 2000. Second European Workshop on 3D Structure from Multiple Images of Large-Scale Environments, Dublin, Ireland, July 12, 2000. Revised Papers. Lecture Notes in Computer Science 2018, Berlin: Springer, 2001, 124–143.