Unsupervised Texture Segmentation on the Basis of Scale Space Features

Jan Puzicha, Thomas Hofmann and Joachim M. Buhmann
Rheinische Friedrich-Wilhelms-Universität Bonn, Germany
http://www-dbv.cs.uni-bonn.de, email: {jan,th,jb}@cs.uni-bonn.de
Abstract
A novel approach to unsupervised texture segmentation is presented, which is formulated as a combinatorial optimization problem known as sparse pairwise data clustering. Pairwise dissimilarities between texture blocks are measured by scale space features, i.e., multi-resolution edges. These scale space features are computed by a Gabor filter bank tuned to spatial frequencies. To solve the data clustering problem a deterministic annealing technique is applied. This approach is examined from the viewpoint of scale space theory. Based on a mean field approximation an efficient algorithm is derived. We present an application of the proposed algorithm to Brodatz-like microtexture collages.
Introduction

Unsupervised segmentation of textured images is a difficult low level vision problem with important applications in vision-guided autonomous robotics, product quality inspection, medical diagnosis and the analysis of remotely sensed images. Recognition, optical flow and stereopsis algorithms often depend on a high quality image segmentation. A key difficulty in unsupervised texture segmentation is the detection of the characteristic scale at which a texture is defined. Since natural textures arise at a wide range of scales, scale space methods form a natural approach to texture segmentation and texture classification problems. In contrast to the classical scale space construction, which is basically a low-pass smoothing operation, textures can be characterized by particular spatial frequencies; thus a band-pass structure for feature extraction seems appropriate. Although the Gaussian filter family completely characterizes the local image structure [12, Chap. 1], a natural access to explicit frequency-band information is to tune the filter to a particular spatial frequency [4]. As will be briefly shown, this procedure yields the Gabor filter family [12, Chap. 2]. The good discrimination properties of Gabor features for textured images are well known [1, 9, 10] and are in good agreement with psychophysical experiments [5]. Using a discretized feature space representation, texture segmentation is formulated as a constrained combinatorial optimization problem by computing pairwise dissimilarities between blocks of feature vectors. Texture dissimilarity is measured as the Kolmogorov-Smirnov distance of the empirical distributions for each of the Gabor features. This strategy for texture discrimination was first proposed by Geman et al. [6] using features based on image intensities. We have significantly extended this approach by three major modifications. First, the intensity-based texture features used in [6] are replaced by a collection of scale space features as described. Second,
a normalized cost function is used, motivated by theoretical and empirical results. Third, a deterministic annealing algorithm for pairwise data clustering [7] is adapted to sparse dissimilarity data. The original stochastic annealing method based on Gibbs sampling is thus replaced by an efficient deterministic algorithm with similar global optimization properties. Deterministic annealing can be understood as a continuation method, e.g., as tracking the minimum of a one-parameter family of smoothed cost functions from low to high resolution. This family of functions naturally forms a scale space representation in an appropriate function space. High spatial resolution leads to noise in the characterization of texture classes by dissimilarities, as texture blocks are then compared using a small spatial support. To further improve a segmentation solution, relaxation techniques are employed as a postprocessing step to enforce spatial consistency. This heuristic processing step corresponds to simple a priori assumptions about region size and shape [6].
Scale Space Features for Texture Segmentation

The differential structure of an image F(x) is completely extracted by the convolution with the Gaussian filter family [12], defined by

$$ G_{i_1,\dots,i_n}(\sigma; x) = \left(\sqrt{2\pi\sigma^2}\right)^{-1} \partial_{i_1,\dots,i_n}\, e^{-\frac{x^t x}{2\sigma^2}}. \qquad (1) $$
But it is convenient to first tune a filter to the parameter of interest, e.g., a particular spatial frequency k. The tuning operation of a filter can be formalized [4] and yields the family of Gabor filters [12, Chap. 2] in the case of frequency tuning:

$$ B(x; \sigma, k) = \frac{1}{\sqrt{2\pi}\,\sigma^2}\, e^{-x^t x / 2\sigma^2}\, e^{i k^t x}. \qquad (2) $$

This is essentially a sine and a cosine function modulated by a Gaussian. Filtering an image with these kernels can be interpreted as a local Fourier analysis. The empirically known good discrimination properties of Gabor filters for a wide range of textures [1, 9, 10, 5] can thus be theoretically justified in a scale space framework. Gabor filters are optimally localized in the sense that they reach the lower bound for simultaneous localization in the spatial and Fourier domains [3]. By convolution with the Gabor kernels a 3-dimensional feature space is assigned to each image point, which can be used together with a vector-valued diffusion for texture segmentation [12, Chap. 4.7], but this approach suffers from the computational complexity of the continuous Gabor transform. In this work a discrete parameter subset of frequencies k is used instead to compute a collection of scale space feature channels $\{r_k\}$ which characterize the local texture. For a given frequency k only the modulus of the corresponding filter output is utilized. The frequency tuning of filters allows an axiomatic characterization of Gabor filters as the linear, shift invariant family of transformations which
- is rotationally invariant with respect to the tuning frequency k,
- is parameterized by a scale parameter σ with a semi-group structure,
- is scale invariant, i.e., the function that relates the observables is independent of the choice of dimensional units; scale changes are thus treated in a self-similar manner.
The scale invariance leads to a technique known as dimensional analysis, which states that scale invariant relations can always be expressed in terms of dimensionless variables. These variables can be obtained using the Pi-theorem [11]. In the frequency tuning case the fundamental units are luminance and length, and the derived quantities are the original image F, the filter output S, the scale σ, and the frequencies w and k for addressing the Fourier domain and the tuning parameter. For a scale invariant linear filter K(σ; k) the relation

$$ S(\sigma; k) = K(\sigma; k) * F \qquad (3) $$

thus can be stated in terms of the dimensionless variables $\hat{S}/\hat{F}$, $\sigma w$ and $\sigma k$. Rewriting (3) in the Fourier domain and requiring rotational symmetry with respect to k yields

$$ \frac{\hat{S}(\sigma; k)}{\hat{F}} = \hat{K}(\sigma\, |w - k|). \qquad (4) $$

In analogy to the proof in [12, Chap. 1.5.6] for k = 0 this leads to $\hat{K} = \exp(-\alpha (\sigma |w - k|)^p)$. It can be argued that p = 2 and α = 1/2 under the assumptions of differentiability with respect to the scale parameter and positivity of the kernel [12]. Thus $\hat{K}(\sigma; k) = \hat{G}(\sigma; w - k) = \hat{B}(\sigma; k; w)$ holds, and K(σ; k) = B(σ; k) follows immediately.
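To make the feature extraction concrete, the following is a minimal sketch (in Python, assuming NumPy and SciPy are available) of a Gabor filter bank as defined by Eq. (2). The kernel radius, the value of σ and the set of tuning frequencies are illustrative choices, not the parameters used in our experiments.

    import numpy as np
    from scipy.signal import fftconvolve

    def gabor_kernel(sigma, k, radius=None):
        # Complex Gabor kernel B(x; sigma, k) of Eq. (2); k is a 2-vector
        # giving the tuning frequency in radians per pixel.
        if radius is None:
            radius = int(np.ceil(3 * sigma))
        y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
        envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
        carrier = np.exp(1j * (k[0] * x + k[1] * y))
        return envelope * carrier / (np.sqrt(2.0 * np.pi) * sigma ** 2)

    def gabor_channels(image, sigma, frequencies):
        # One scale space feature channel r_k per tuning frequency k; only
        # the modulus of the filter output is kept, as described above.
        return [np.abs(fftconvolve(image, gabor_kernel(sigma, k), mode='same'))
                for k in frequencies]

    # Illustrative usage: four orientations at a single radial frequency.
    image = np.random.rand(128, 128)  # stand-in for a textured image
    freqs = [0.4 * np.array([np.cos(t), np.sin(t)])
             for t in np.linspace(0.0, np.pi, 4, endpoint=False)]
    channels = gabor_channels(image, sigma=4.0, frequencies=freqs)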
Dissimilarity Measure
Given any discrete set of (scale space) features $\{r_k\}$ we proceed as follows: to capture the characteristics of a texture, the image features are spatially grouped into blocks $\{B_i\}$. A disparity measure $d(B_i^r, B_j^r)$ evaluates the dissimilarity between the textures of two blocks $B_i^r, B_j^r$ based on feature r. As originally proposed in [6], the Kolmogorov-Smirnov distance applied to the empirical feature distribution functions is used. For image blocks $B_i^r = (b_s)_{1 \le s \le n}$ the empirical distribution functions are defined by

$$ F_i^r(t) = \frac{1}{n} \left| \{ b_s \le t \} \right|. \qquad (5) $$

The Kolmogorov-Smirnov distance is defined as the maximal distance of two distribution functions:

$$ d(B_i^r, B_j^r) = \max_t \left| F_i^r(t) - F_j^r(t) \right|. \qquad (6) $$

$d(B_i^r, B_j^r)$ exhibits the desirable property of invariance to monotone transformations of the feature data.
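The Kolmogorov-Smirnov distance of Eqs. (5) and (6) can be sketched in the same Python setting as above. Since both empirical distribution functions are step functions, the maximum is attained at sample values, so it suffices to evaluate them there.

    import numpy as np

    def ks_distance(block_a, block_b):
        # Kolmogorov-Smirnov distance (Eq. 6) between the empirical
        # distribution functions (Eq. 5) of two blocks of feature values.
        a = np.sort(np.ravel(block_a))
        b = np.sort(np.ravel(block_b))
        t = np.concatenate((a, b))  # candidate locations of the maximum
        F_a = np.searchsorted(a, t, side='right') / a.size  # F(t) = |{b_s <= t}| / n
        F_b = np.searchsorted(b, t, side='right') / b.size
        return float(np.max(np.abs(F_a - F_b)))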
Since energy and modulus are related by a monotone transformation, the often discussed question whether the energy [9] or the modulus [10] of the filter output has better discrimination properties thus does not arise. Different features are integrated using a simple maximization rule to avoid additional algorithmic parameters which would have to be chosen according to ad hoc heuristics. The result is a dissimilarity value $D_{ij} = \max_r d(B_i^r, B_j^r)$. The maximum rule reflects the fact that a significant difference in a single feature channel is sufficient to judge two blocks as having distinct textures. Other mechanisms for integrating different feature channels are currently under investigation. An obvious drawback of this approach is the large number of possible pairwise comparisons, which scales quadratically with the number of blocks N. To guarantee computational efficiency the evaluation of dissimilarity values for block $B_i$ is restricted to a significantly reduced neighborhood $N_i$, $|N_i| \ll N$, without losing the discriminating properties [8]. The neighborhood consists of
nearest neighbors given by the image topology and a larger number of random neighbors. In our experiments the four nearest and between 20 and 30 random neighbors were used. An image segmentation is defined by assigning a region label to each pixel block. Given N blocks and a fixed number of labels K, an image partition is represented by a Boolean assignment matrix $M \in \{0,1\}^{N \times K}$ satisfying the constraint $\sum_{\nu=1}^{K} M_{i\nu} = 1$ for all i, where $M_{i\nu} = 1$ indicates that block $B_i$ is assigned to the cluster of blocks with label ν. The constraint enforces that each block is attributed a unique region label. Let $\mathcal{M} = \left\{ M \in \{0,1\}^{N \times K} : \sum_{\nu=1}^{K} M_{i\nu} = 1 \right\}$ denote the set of valid configurations. The quality of a segmentation is measured in terms of dissimilarities between blocks within the same cluster:

$$ \mathcal{H}(M) = \sum_{i=1}^{N} \sum_{\nu=1}^{K} M_{i\nu}\, \frac{\sum_{j \in N_i} M_{j\nu} D_{ij}}{\sum_{k \in N_i} M_{k\nu}}. \qquad (7) $$
The additional normalization removes the sensitive dependency of the minimum of H on constant shifts of the dissimilarities and renders the cost function insensitive to differing cluster sizes. The importance of these considerations is validated by the experimental results comparing both approaches.
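The sparse clustering setup can be sketched as follows, reusing ks_distance from the sketch above. The neighborhood sizes and the dense matrix representation of D are illustrative simplifications; in practice only the entries $D_{ij}$ with $j \in N_i$ need to be computed and stored.

    import numpy as np

    def sparse_neighborhoods(grid_shape, n_random=25, rng=None):
        # N_i: the four topological neighbors on the block grid plus a
        # number of random neighbors, as described above.
        rng = np.random.default_rng() if rng is None else rng
        rows, cols = grid_shape
        n_blocks = rows * cols
        neigh = []
        for i in range(n_blocks):
            r, c = divmod(i, cols)
            local = [(r + dr) * cols + (c + dc)
                     for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                     if 0 <= r + dr < rows and 0 <= c + dc < cols]
            randoms = rng.choice(n_blocks, size=n_random, replace=False)
            neigh.append(sorted((set(local) | set(randoms.tolist())) - {i}))
        return neigh

    def max_rule_dissimilarity(channels_per_block, i, j):
        # Max rule: D_ij = max_r d(B_i^r, B_j^r) over the feature channels,
        # with d the Kolmogorov-Smirnov distance sketched earlier.
        return max(ks_distance(ch[i], ch[j]) for ch in channels_per_block)

    def normalized_cost(M, D, neigh):
        # Eq. (7): each block contributes the average dissimilarity to the
        # blocks of its own cluster within its sparse neighborhood N_i.
        H = 0.0
        for i, Ni in enumerate(neigh):
            nu = int(np.argmax(M[i]))  # the unique label with M_i,nu = 1
            mass = sum(M[k, nu] for k in Ni)
            if mass > 0:
                H += sum(M[j, nu] * D[i, j] for j in Ni) / mass
        return H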
Deterministic Annealing

Eq. (7) is a combinatorial optimization problem known as pairwise data clustering. To minimize the cost function H, a global optimization technique discussed in the literature as mean field annealing is derived. Mean field annealing has already been successfully applied to pairwise data clustering problems with completely known dissimilarity matrices [2]. The basic idea of annealing methods is to treat the unknown Boolean variables of the combinatorial optimization problem as random variables, to introduce a thermal noise parameter T, called the computational temperature, and to calculate the expectations, e.g., of the assignments $M_{i\nu}$. A natural choice for a probability distribution from the viewpoint of robust statistics is the Gibbs distribution

$$ P^G(M) = \frac{\exp(-\mathcal{H}(M)/T)}{\sum_{M' \in \mathcal{M}} \exp(-\mathcal{H}(M')/T)}, \qquad (8) $$

since it maximizes the entropy S(P) among all distributions with fixed expectation $\langle \mathcal{H} \rangle_P$ or, equivalently, minimizes the generalized free energy

$$ \mathcal{F}_T(P) = \langle \mathcal{H} \rangle_P - T\, S(P) = \sum_{M \in \mathcal{M}} P(M)\, \mathcal{H}(M) + T \sum_{M \in \mathcal{M}} P(M) \log P(M). \qquad (9) $$

The Gibbs distribution is maximally noncommittal with respect to missing information and maximally stable with respect to changes of the temperature T [2]. Note that for large T Eq. (9) is convex and that for T → 0 the minimum converges to the uniform distribution on the set of states minimizing the original cost function. Thus $\mathcal{F}_T$ is a smoothed version of the original optimization problem, where more and more details of the original cost function appear as T is lowered. Therefore $\mathcal{F}_T$ can be interpreted as a scale space representation in the function space parameterized by the set of probability distributions defined over $\mathcal{M}$. T assumes the role of a continuous scale parameter with semi-group properties. The original
optimization task is embedded into a one-parameter family of smoothed optimization problems. Let Π denote the space of all probability distributions defined over $\mathcal{M}$. The scale space is formally generated by a map $O_T$ operating on the space of real-valued functionals $F: \Pi \to \mathbb{R}$. It is defined by $O_T(F(P)) = F(P) - T\, S(P)$, with $O_0$ being the identity operation. It is easily seen that $(O_{T_1} \circ O_{T_2})(F(P)) = O_{T_1 + T_2}(F(P))$ and $\mathcal{F}_T(P) = O_T(\langle \mathcal{H} \rangle_P)$. It is by no means obvious how to embed discrete optimization problems in a scale space representation. Two (open) questions arise: (i) whether the choice of embedding the problem in a probabilistic framework is an optimal continuation approach in some sense, and (ii) whether the generation of a scale space representation by the maximum entropy method can be axiomatically founded as naturally arising from basic, easily acceptable postulates. Calculation of the expectations for Eq. (8) can be achieved either by Monte Carlo / Gibbs sampling [6] or (at least approximately) by analytical methods. A common way to derive deterministic annealing algorithms is to minimize Eq. (9) over a (tractable) subspace, e.g., the space of separable Gibbs distributions defined by cost functions of the form

$$ \mathcal{H}^0(M) = \sum_{i=1}^{N} \sum_{\nu=1}^{K} M_{i\nu}\, h_{i\nu}. \qquad (10) $$
The $h_{i\nu}$ are chosen such that $I(P(\mathcal{H}^0)\,\|\,P^G)$ is minimized, where I denotes the Kullback-Leibler divergence, commonly used as a distance measure between probability distributions. The $\langle h_{i\nu} \rangle$ are called mean fields. To derive the mean field equations an equivalent expression of the cost function in the form of Eq. (10) has to be found, where $h_{i\nu} = h_{i\nu}(\{M_{j\mu} \mid j \neq i\})$ does not depend on the assignments of block i. In the unnormalized case $h_{i\nu}$ is simply given by $h_{i\nu} = \sum_{j \in N_i} M_{j\nu} D_{ij}$, whereas the normalization leads to the more complex expression

$$ h_{i\nu} = \sum_{j \in N_i} \frac{M_{j\nu}}{\sum_{k \in N_j, k \neq i} M_{k\nu} + 1} \left( D_{ij} - \frac{\sum_{k \in N_j, k \neq i} M_{k\nu} D_{jk}}{\sum_{k \in N_j, k \neq i} M_{k\nu}} \right) + \frac{\sum_{j \in N_i} M_{j\nu} D_{ij}}{\sum_{k \in N_i} M_{k\nu}}. \qquad (11) $$

The assignment probabilities are approximated by $\langle M_{i\nu} \rangle = \exp\left(-\frac{1}{T} \langle h_{i\nu} \rangle\right) / \sum_{\mu=1}^{K} \exp\left(-\frac{1}{T} \langle h_{i\mu} \rangle\right)$, where the mean fields $\{\langle h_{i\nu} \rangle\}$ are derived from $\{h_{i\nu}\}$ using the identities $\langle M_{i\nu} M_{j\nu} \rangle = \delta_{ij} \langle M_{i\nu} \rangle + (1 - \delta_{ij}) \langle M_{i\nu} \rangle \langle M_{j\nu} \rangle$. This results in a system of N·K coupled transcendental equations, which is solved by a convergent fixed point iteration. More details on the application of mean field approximations to optimization problems and on extensions to an unknown number of labels K can be found in [7] and the references therein. The technical details of the presented deterministic annealing algorithm can be found in [8].
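For illustration, the following sketch implements the annealing loop for the simpler unnormalized mean fields $\langle h_{i\nu} \rangle = \sum_{j \in N_i} \langle M_{j\nu} \rangle D_{ij}$; the normalized fields of Eq. (11) are handled by the same fixed point scheme. The annealing schedule parameters and the fixed sweep count are illustrative assumptions, not the settings of our experiments.

    import numpy as np

    def mean_field_annealing(D, neigh, K, T_start=1.0, T_stop=0.01,
                             rate=0.9, sweeps=50, rng=None):
        # Deterministic annealing for the unnormalized cost function: the
        # expectations <M_iv> are iterated to a fixed point at each
        # temperature T, which is then lowered geometrically.
        rng = np.random.default_rng() if rng is None else rng
        N = len(neigh)
        E = rng.uniform(size=(N, K))       # expectations <M_iv>
        E /= E.sum(axis=1, keepdims=True)  # rows sum to one
        T = T_start
        while T > T_stop:
            for _ in range(sweeps):        # fixed point iteration at fixed T
                for i in range(N):
                    h = np.zeros(K)        # mean fields <h_iv>
                    for j in neigh[i]:
                        h += E[j] * D[i, j]
                    g = np.exp(-(h - h.min()) / T)  # stabilized Gibbs weights
                    E[i] = g / g.sum()
            T *= rate                      # lower the computational temperature
        return E.argmax(axis=1)            # hard region labels per block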
Results

We performed a large number of experiments on a representative set of mixtures of Brodatz-like microtextures. The test database consisted of 100 mixtures of five textures each (as depicted in Fig. 1a), which were constructed randomly out of a representative set of 40 Brodatz-like micropatterns. The images of 512 × 512 pixels were covered with 64 × 64 overlapping blocks of 16 × 16 pixels each. A typical segmentation example is shown in Fig. 1. The database used and additional examples are available via the World Wide Web (WWW).
Figure 1: A typical segmentation result with K = 5: (a) Randomly generated image. (b) Segmentation achieved using costs without normalization. (c) Segmentation on the basis of normalized costs. Second row: Shown in black are the wrongly classified blocks of the segmentation achieved using normalized costs before (d) and after (f) the postprocessing step. (e) Wrongly classified blocks of the segmentation without normalization after postprocessing.
Two questions are empirically investigated: 1. How adequately is texture segmentation modeled by the cost function (7)? 2. How good is the quality of the mean field optimization algorithm compared to stochastic procedures? An answer to the first question is given by comparing minimal cost configurations with the known ground truth. Fig. 2 shows histograms of the percentage of wrongly classified blocks between the ground truth configuration and segmentations obtained by deterministic annealing for the normalized and unnormalized cost functions. Significantly better results were obtained with the normalized version, yielding segmentations with a median error rate as low as 6%. In most cases the normalized cost function based on a pairwise data clustering formalization thus captures the true structure of the image. As can be seen in Fig. 1f, the wrongly classified blocks mainly correspond to errors at texture borders, which are not avoidable due to the spatial support of the Gabor filters [10]. The unnormalized cost function, in contrast, often misses entire texture classes, as illustrated in Fig. 1e. The second question concerns the quality of the mean field optimization algorithm as opposed to stochastic procedures. The quality of the proposed clustering algorithm was evaluated by comparing the costs of the achieved segmentation on one hand with the costs of a ground truth segmentation
Figure 2: Empirical density of the percentage of wrongly classified blocks: (a) before postprocessing, (b) after postprocessing; each panel compares the normalized and the unnormalized cost function. The peaks indicate errors caused by a failure to discriminate different textures. Note that 12.8% of the blocks contain more than one texture, so a certain level of error is to be expected.
and on the other hand with a stochastic Gibbs sampling method. Compared to the latter, no significant differences were observed. The ground truth almost always had slightly higher costs than the (presumably) minimal configurations. This indicates that a better minimization algorithm would not lead to an improved segmentation algorithm. In this sense the deterministic annealing algorithm, efficient in contrast to its stochastic variants, is very well suited for the texture segmentation problem. Note that for the normalized cost function the neighborhood used does not define the neighborhood in the sense of Markov random fields, which is much larger. A straightforward implementation of Gibbs sampling is thus very inefficient. Eq. (11) can be interpreted as a more efficient way to implement a Gibbs sampler.
Conclusion

We presented a novel approach to the segmentation of textured images based on three main ingredients, which are largely independent of each other. First, a scale space approach to data representation based on Gabor filters has been suggested. These features are shown to evolve naturally from theoretical concepts and exhibit excellent discrimination properties for a wide range of natural textures. Second, texture segmentation was formulated as a pairwise data clustering problem based on dissimilarities between texture blocks with a sparse neighborhood structure. For a given set of feature vectors associated with each pixel the pairwise dissimilarities are computed using the Kolmogorov-Smirnov distance of the empirical feature distributions. The approach of [6] was thereby significantly extended. A normalized cost function has been proposed, motivated by theoretical and empirical evidence. Third, deterministic annealing optimization techniques were examined from the viewpoint of scale space theory. The family of cost functions derived from the principles of maximum entropy inference can be interpreted as a scale space representation with the temperature as a resolution parameter for problem complexity. An efficient algorithm for solving pairwise data clustering problems with a given sparse dissimilarity matrix has been derived. The algorithms were applied with the normalized and the unnormalized clustering cost function to a large database of Brodatz-like microtexture mixtures. We conclude from the presented results that the normalized cost function, in contrast to the unnormalized one, captures the true structure of textured images very well. The unavoidable noise due to high spatial resolution is easily removed by a simple postprocessing step.
The scale hierarchies in feature space and in optimization space were treated independently. The interesting question remains how to couple the resolution in feature space with the algorithmic complexity in optimization space to design an optimal algorithmic strategy for image segmentation.
References

[1] A. Bovik, M. Clark, and W. Geisler. Multichannel texture analysis using localized spatial filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12, 1990.
[2] J. Buhmann and T. Hofmann. A maximum entropy approach to pairwise data clustering. In Proceedings of the IAPR International Conference on Pattern Recognition, volume II, pages 207-212, 1994.
[3] J. Daugman. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Journal of the Optical Society of America A, 2(7):1160-1169, 1985.
[4] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever. Families of tuned scale-space kernels. In G. Sandini, editor, Proceedings of the 2nd European Conference on Computer Vision, Lecture Notes in Computer Science 588, pages 19-23, Berlin, 1992. Springer-Verlag.
[5] I. Fogel and D. Sagi. Gabor filters as texture discriminators. Biological Cybernetics, 61:103-113, 1989.
[6] D. Geman, S. Geman, C. Graffigne, and P. Dong. Boundary detection by constrained optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7):609-628, July 1990.
[7] T. Hofmann and J. Buhmann. Pairwise data clustering by deterministic annealing. Technical Report IAI-TR-95-7 (revised March 1996), Institut für Informatik III, June 1995.
[8] T. Hofmann, J. Puzicha, and J. Buhmann. Unsupervised segmentation of textured images by pairwise data clustering. Technical Report IAI-TR-96-2, Institut für Informatik III, 1996.
[9] A. Jain and F. Farrokhnia. Unsupervised texture segmentation using Gabor filters. Pattern Recognition, 24(12):1167-1186, 1991.
[10] R. Navarro, O. Nestares, J. Portilla, and A. Tabernero. Several experiments on texture analysis, coding and synthesis by Gabor wavelets. Technical Report 52, Instituto de Óptica Daza de Valdés, Madrid, 1994.
[11] P. Olver. Applications of Lie Groups to Differential Equations. Springer-Verlag, 2nd edition, 1993.
[12] B. M. ter Haar Romeny, editor. Geometry-Driven Diffusion in Computer Vision. Kluwer Academic Publishers, 1994.