Texture Image Segmentation: An Interactive Framework Based on Adaptive Features and Transductive Learning

Shiming Xiang, Feiping Nie, and Changshui Zhang

State Key Laboratory of Intelligent Technology and Systems, Department of Automation, Tsinghua University, Beijing 100084, China
{xsm, nfp03, zcs}@mail.tsinghua.edu.cn

Abstract. Texture segmentation is a long-standing problem in computer vision. In this paper, we propose an interactive framework for texture segmentation. Our framework has two advantages. One is that the user can define the textures to be segmented by labelling a small number of points belonging to them. The other is that the user can further improve the segmentation quality through a few interactive manipulations if necessary. The filters used to extract the features are learned directly from the texture image to be segmented by topographic independent component analysis. Transductive learning based on spectral graph partitioning is then used to infer the labels of the unlabelled points. Experiments on many texture images demonstrate that our approach achieves good results.
1 Introduction
Automatic texture segmentation [1, 2] has proved a tough problem over the past decades. It is challenging because it is under-constrained, for the following reasons. The first is the intrinsic ambiguity in texture perception. We can distinguish a texture when we see it; however, it is difficult to give an accurate definition [1] that applies to the vast range of vision patterns in general images. Texture is a scale-related regional process, which may be understood in different ways by different people for different purposes. The second is scale selection. Texture features for segmentation are all computed over local windows, whose size should be selected properly to cover the full range of basic patterns. It is difficult to determine an adaptive size automatically without any prior information. The third is the uncertainty in quantity. In real applications, an image region may be explained as one texture or as a combination of several vision objects. It is important to know the number of textures in an image for later statistical pattern analysis.

Motivated by the above observations, this paper addresses the problem of texture segmentation in a user-controllable environment. The goal is to achieve good performance at the cost of modest human-computer interaction. The interactive framework requires the user to define his/her own textures by labelling some points belonging to them, and to supply the total number of textures
in the texture image and the size of the local window for feature extraction. The segmentation is thus controlled by the user. Through human-computer interaction, the class IDs (labels) of the labelled points are all known. The task is then to infer the labels of the unlabelled points from those labelled ones. This is a typical learning problem. We use transductive learning via spectral graph partitioning [3] to solve it. Unlike inductive learning and semi-supervised learning [4, 5], transductive learning aims only to infer the labels of the points in a given data set. Results from statistical learning theory suggest that better performance can be achieved this way [6]. In addition, a small number of labelled points is often enough to design an effective transductive learner (transducer). This means that the user is only required to label relatively few points. To extract adaptive texture features, we use topographic independent component analysis (TICA) [7] to learn the filters directly from the texture image to be segmented.
2 Related Work
Interactive image segmentation. Recent years have seen a surge of interest in interactive image segmentation [8, 9, 10]. By indicating certain pixels that definitely belong to parts of the objects, the background or the foreground, hard constraints are imposed on the segmentation system to alleviate the problems inherent to fully automatic segmentation. Another advantage is that the user can make the final decision on whether the current result is good or not. There is a large body of work on this topic (refer to [8, 9, 10] for further literature). Most of the approaches are based on color and gray-level information. To our knowledge, little work has so far been done on interactive texture segmentation.

Transductive learning. The setting of transductive learning was introduced in [6]. A transducer is constructed on a fixed set of data points, which contains two subsets [11]: a training set of labelled points and a working set, i.e. a test set, of unlabelled points. The general transduction task is to infer the labels of the points in the working set. In the traditional inductive setting, by contrast, a classification function is learned first and only later tested on a test set chosen after the learning has been completed [11]. Algorithms for designing a transducer can be found in [3, 11, 12, 13], among others.

Texture segmentation. Texture segmentation includes two main steps: feature extraction and pattern classification. The literature on feature extraction is rich [2, 14]. Among existing methods, filter-based approaches have won an emerging consensus [15]. Almost all existing filters are designed within some mathematical framework, for example the optimized Gabor filter bank [16]. Differing from the traditional approaches [15], Zeng et al. [17] apply classical independent component analysis (ICA) [18] to natural scene images to learn the filters, and have achieved good results. In applications, filters adaptive to the texture image to be segmented are desired.
Classification can be performed in an unsupervised or supervised way. Generally, most unsupervised clustering algorithms rely on some prior knowledge. Naturally, this prior knowledge can be supplied in an interactive way.
3 Interactive Texture Segmentation Framework
3.1 Basic Formulation
Suppose the texture image to be segmented is converted to an array of feature vectors, X = (x_1, x_2, ..., x_n). Each data point x_i ∈ R^N corresponds to a pixel P_i and has a desired texture-class label y_i ∈ {1, 2, ..., C}, where C is the total number of textures that the user supplies. Let X = X_T ∪ X_W, where X_T is the training set and X_W is the working set. The labels of the points in X_T are given by the user when defining his/her textures. The task is now to infer the labels of the points in X_W. This is a typical transductive learning problem.

We use Joachims's spectral graph transducer (SGT) [3] to infer the labels. The SGT is a transductive version of the k-NN classifier, which is very suitable for our task since texture can often be modelled as a Markov random field. The SGT was initially designed for two-class classification problems. For multi-class problems (C > 2) [6], the labelled data must be divided into two subsets to construct a basic SGT, which may leave the two subsets unbalanced in quantity. This factor, however, is taken into account in the global optimization when spectral methods are used to design an SGT. A key advantage of the algorithm is that it needs no additional heuristics to avoid unbalanced splits. However, the SGT must perform a singular value decomposition (SVD) of a Laplacian matrix whose size equals the number of data points in X, and performing SVD may demand a huge amount of computing resources for large matrices.

Alternatively, to avoid labelling each pixel, we can work on a representative subset of all pixels. The points in this subset can be chosen uniformly from the image by the user at a proper resolution. Indeed, it is reasonable to consider only a representative subset for texture image segmentation. This follows from the fact that texture is a region process: every pixel in a window patch¹ of a texture should be labelled as the same class, so we can use the center point as their representative. Thus, the total number of data points to be considered is reduced to a large degree. This trick is similar to the technique applied to image segmentation in [19]. However, this treatment will not produce pixel-level accuracy in the edge regions between textures; the user can choose to increase the resolution by reducing the size of the window patch to alleviate this problem.
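As a concrete illustration of this formulation, the following sketch (in Python with NumPy; the names are ours, not the paper's) shows how the data set and the training/working split might be represented:

```python
import numpy as np

# Minimal sketch of the transductive setting, assuming X is the n-by-N
# feature matrix and y holds the user-supplied labels in {1, ..., C},
# with 0 marking an unlabelled point. Names are illustrative only.
def split_transductive(y):
    train_idx = np.flatnonzero(y > 0)   # X_T: points labelled by the user
    work_idx = np.flatnonzero(y == 0)   # X_W: labels to be inferred
    return train_idx, work_idx
```

The transducer then predicts y only on the working set, rather than learning a general classification function.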
3.2 Overview of the Framework
The interactive texture segmentation framework consists of five main modules: initialization, feature extraction, transductive learning, filtering and user evaluation.
¹ Note that it is not the “local window” used to extract the features.
Initialization. First, the user is required to provide four integers, x_0, y_0, x_s and y_s, to construct a representative subset P = {(x_0, y_0), (x_0 + x_s, y_0), ..., (x_0, y_0 + y_s), (x_0 + x_s, y_0 + y_s), ...}. Here, x_s and y_s control the resolution of the representatives. Second, the user is required to supply the total number of textures C and the size of the local window, (w_l, h_l). The third step is to define the textures. For convenience, the user need only mark a rectangular region R. A subset is uniformly selected from R ∩ P with two controlling parameters, the row step s_r and the column step s_c; for example, s_r/s_c = 2 means that the points in every second row/column will be selected. The user can also label single important points to define a texture. Alternatively, the user can choose to provide a data file of labelling information, in which case the user only needs to supply an array with zeros for unlabelled pixels and positive integers for labelled ones. A code sketch of this module follows the module descriptions below.

Feature extraction. We use TICA to learn the filters used to extract the features (Subsection 4.1).

Transductive learning. Based on the features of the points, a single SGT is used to solve two-class classification problems, while a group of transducers is designed for multi-class problems (Subsection 4.2). The output of transductive learning is an array of point labels.

Filtering. To smooth the labels, a median filter is applied to the label array according to the spatial relationship of the data points.

User evaluation. The user can further improve the segmentation results until s/he is satisfied. This provides a mechanism for the user to correct errors. Usually, there are two kinds of errors. One arises when some basic vision patterns are missed in the labelling, so that parts of image regions are assigned to the wrong textures. The other often appears in the edge regions between different textures: when a patch in an edge region is separated from its image setting, the ambiguity in pattern classification increases. Supplying more labelled points in the edge regions is then desirable.
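The following is a minimal sketch of the initialization module, assuming an image of size (width, height); the function names and the rectangle convention (x_min, y_min, x_max, y_max) are our own illustration, not part of the paper:

```python
def representative_grid(width, height, x0, y0, xs, ys):
    """Representative subset P: a regular pixel grid whose resolution is
    controlled by the steps xs and ys."""
    return [(x, y) for y in range(y0, height, ys)
                   for x in range(x0, width, xs)]

def label_rectangle(P, labels, rect, sr, sc, class_id):
    """Label the representatives inside a user-marked rectangle, keeping
    every sr-th row and sc-th column of the grid (one hedged reading of
    the row/column steps described above)."""
    x_min, y_min, x_max, y_max = rect
    inside = [(i, x, y) for i, (x, y) in enumerate(P)
              if x_min <= x <= x_max and y_min <= y <= y_max]
    rows = sorted({y for _, x, y in inside})[::sr]
    cols = sorted({x for _, x, y in inside})[::sc]
    for i, x, y in inside:
        if y in rows and x in cols:
            labels[i] = class_id
    return labels
```

The filtering module can likewise be realised with a standard median filter (e.g. scipy.ndimage.median_filter) applied to the label array reshaped onto the representative grid.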
4 Algorithm

4.1 Texture Feature Extraction
Recent research suggests that applying ICA to natural scene images yields edge detectors [20]. Zeng et al. use classic ICA to learn filters from images of four natural scenes [17]. Differing from their work, we use TICA to learn the filters directly from the texture image to be segmented, because TICA is more suitable for image decomposition than classic ICA [7]. In image decomposition, each image patch x, treated here as a vector, is formulated as a linear combination of a set of image bases, i.e. x = A · s. Equivalently, we have s = W · x. Each column of A is a mixing basis
Fig. 1. Four groups of filters learned by TICA. Here, w_p = h_p = 16 and N_tica = 6000. The dimensionality after PCA is reduced to 64, and N_f = 40.
and each row of W is an unmixing basis. Note that the computation of W · x is similar to the convolution operation in signal processing, and each row of W can be viewed as a filter. The steps of feature extraction are as follows:

(1) Given the size of the image patch, (w_p, h_p), randomly choose N_tica patches and convert each one (gray data) to a vector to obtain the samples {x}. Principal component analysis (PCA) is then performed to reduce the redundancy and construct the eigen-space. After whitening the eigenspace-transformed data, TICA is used to learn the matrix W. By reconverting each row of W to patch form, we obtain a group of filters denoted by F = (f_1, f_2, ..., f_{N_f}), where N_f is the total number of filters learned by TICA. Figure 1 shows four groups of filters, corresponding in order to the source images in Figure 2. We can see that these filters are similar, to some degree, to the vision patterns that the textures contain. Thus, they are adaptive to the image data.

(2) Convolve the texture image with F. For each pixel p(x, y), a response vector is obtained:

    R_{p(x,y)} = (R^1_{p(x,y)}, R^2_{p(x,y)}, ..., R^{N_f}_{p(x,y)})^T.

(3) Construct filter channels by the pixel-to-filter mapping [24]. Each filter channel I_i corresponds to one filter and is the subset of pixels at which that filter gives the maximal response [17]. It can be calculated from R_{p(x,y)}:

    I_i = {(x, y) | i = arg max_j R^j_{p(x,y)}, (x, y) ∈ I}.

Obviously, {I_1, ..., I_{N_f}} is an equivalent partition of I, namely, I = ∪_i I_i and I_i ∩ I_j = ∅ for all i ≠ j.

(4) For each p(x, y), calculate a locally windowed filter histogram from all filter channels [17, 24]:

    H_{p(x,y)} = (|H^1_{p(x,y)}|, |H^2_{p(x,y)}|, ..., |H^{N_f}_{p(x,y)}|)^T,

where H^i_{p(x,y)} = {(s, t) | (s, t) ∈ I_i ∩ N_{p(x,y)}}, N_{p(x,y)} is the local window of p(x, y), and |·| denotes the cardinality of a set. The size of the local window, (w_l, h_l), is given by the user.

(5) Choose an N_d-dimensional sub-vector F_{p(x,y)} from H_{p(x,y)} by discarding its components with small values.

(6) Let M = max_{(x,y) ∈ I, 1 ≤ j ≤ N_d} {F^j_{p(x,y)}}. After normalizing F_{p(x,y)} by M, we obtain the texture feature.
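Steps (2)-(6) can be sketched in a few lines of Python/NumPy, assuming the filter bank learned by TICA is given; the per-pixel top-N_d truncation is one hedged reading of step (5):

```python
import numpy as np
from scipy.ndimage import convolve, uniform_filter

def texture_features(image, filters, wl, hl, Nd):
    """Filter responses -> filter channels -> locally windowed histograms
    -> truncated, normalised features (sketch of steps (2)-(6))."""
    # (2) response vector R_{p(x,y)} for every pixel
    R = np.stack([convolve(image.astype(float), f) for f in filters])
    # (3) pixel-to-filter map: index of the maximally responding filter
    channel = R.argmax(axis=0)
    # (4) |H^i|: count window pixels in channel i with a box filter
    H = np.stack([uniform_filter((channel == i).astype(float),
                                 size=(hl, wl)) * (wl * hl)
                  for i in range(len(filters))], axis=-1)
    # (5) keep the Nd largest components of each histogram
    F = np.sort(H, axis=-1)[..., ::-1][..., :Nd]
    # (6) normalise by the global maximum M
    return F / F.max()
```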
Fig. 2. Comparative experimental results obtained with k-means clustering. In each of the four panels, the first to fourth columns show the source image [21, 22], the result using the Gabor filter bank [16], the result using filters learned by TICA from the 13 natural images of [23], and the result using filters learned from the image to be segmented. The lower and upper bounds of the interesting frequencies of the Gabor filter bank are 0.05 and 0.4, with 4 scales and 6 orientations. For TICA, N_f = 40 and N_d = 20.
The Gabor filter bank [16] is often used to extract texture features. However, it may not be effective for textures with irregular and non-periodic vision patterns [17]. Some comparative results are reported in Figure 2.
4.2 Transductive Learning by Spectral Graph Partition
Joachims's approach to transductive learning is a transductive version of the kNN rule. Without a greedy search, the global optimization problem can be solved effectively by spectral methods [3].

Two-class problem. The main steps of constructing an SGT are as follows (a code sketch of steps (1)-(3) is given after this list):

(1) Construct the similarity-weighted kNN graph:

    A'_{ij} = sim(x_i, x_j) / Σ_{x_k ∈ knn(x_i)} sim(x_i, x_k)   if x_j ∈ knn(x_i),
    A'_{ij} = 0   otherwise,
where x_i, x_j and x_k are texture features, and sim(x_i, x_j) is the cosine similarity between x_i and x_j, calculated as sim(x_i, x_j) = (x_i · x_j)/(‖x_i‖ ‖x_j‖).

(2) Compute the symmetric weight matrix A = A' + A'^T, the diagonal degree matrix B with B_{ii} = Σ_j A_{ij}, and the normalized Laplacian L = B^{-1/2} (B − A) B^{-1/2}.

(3) Compute the 2nd to (d+1)-th smallest eigenvalues of L and the corresponding eigenvectors.

(4) Construct the SGT. According to the labels, we first divide the training set into two subsets, a positive training subset and a negative training subset. Then we compute the indicative vectors [3] for these two subsets and the working set, and estimate the ratio of positive to negative points. Given the selected eigenvectors and a parameter c that trades off training error against cut value, the SGT is finally constructed [3].
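Steps (1)-(3) can be sketched as follows in Python/NumPy (the constrained optimization of step (4) is omitted; it follows [3]):

```python
import numpy as np

def sgt_spectrum(X, k=10, d=200):
    """Similarity-weighted kNN graph, normalised Laplacian, and its 2nd to
    (d+1)-th smallest eigenpairs (a sketch of steps (1)-(3))."""
    # cosine similarity sim(x_i, x_j) = (x_i . x_j) / (||x_i|| ||x_j||)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)            # a point is not its own neighbour
    Ap = np.zeros_like(S)                   # asymmetric kNN weight matrix A'
    for i in range(len(X)):
        nn = np.argsort(-S[i])[:k]          # k nearest neighbours of x_i
        Ap[i, nn] = S[i, nn] / S[i, nn].sum()
    A = Ap + Ap.T                           # symmetrise: A = A' + A'^T
    deg = A.sum(axis=1)                     # degrees B_ii
    Bi = np.diag(1.0 / np.sqrt(deg))
    L = Bi @ (np.diag(deg) - A) @ Bi        # L = B^{-1/2}(B - A)B^{-1/2}
    w, V = np.linalg.eigh(L)                # eigenvalues in ascending order
    return w[1:d + 1], V[:, 1:d + 1]
```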
Multi-class problem. We use a method similar to the one-versus-rest strategy [6] to deal with the multi-class problem. Given a combination of the class labels, we first partition the training set into positive and negative training subsets, and an SGT is then constructed. According to the different combinations of class labels, we thus obtain a group of transducers. The majority-voting principle is applied to the outputs of these transducers to infer the final labels of the unlabelled points.
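A hedged sketch of this scheme follows; binary_sgt stands in for a two-class spectral graph transducer that returns ±1 predictions for all points, and the exact voting rule is our own reading, since the paper does not spell it out:

```python
import numpy as np

def one_vs_rest_transduce(X, y, C, binary_sgt):
    """Build one SGT per class (class c versus the rest) and combine the
    outputs by majority voting. y uses 0 for unlabelled points and 1..C
    for labelled ones; binary_sgt is an assumed black box."""
    votes = np.zeros((len(X), C))
    for c in range(1, C + 1):
        # positive subset: class c; negative subset: all other labelled points
        two_class = np.where(y == c, 1, np.where(y > 0, -1, 0))
        pred = binary_sgt(X, two_class)      # +1 / -1 for every point
        votes[pred > 0, c - 1] += 1
    return votes.argmax(axis=1) + 1          # final labels in {1, ..., C}
```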
5 Results

5.1 Textures in Benchmarks
To demonstrate the effectiveness of our approach, we apply the SGT and our texture features to many texture images constructed from two benchmarks widely used in texture research, the Brodatz and MIT VisTex texture libraries [21, 22]. Some results are shown in Figure 3: the upper row shows the source texture images and the lower row the segmentation results. To construct the representative subset, we input 16, 16 and (10, 10) for x_s, y_s and (x_0, y_0), respectively. The data points used to define each texture are then labelled with a window, as illustrated in the upper row of Figure 4. In Figures 3(a)-3(d), the numbers of labelled points for each texture are 8, 20, 8 and 12, respectively. The size of the local window, (w_l, h_l), is 41×41.
Fig. 3. Artificial texture images and the segmentation results. Only gray data are used.
When designing the SGT, we take k = 10, c = 5000.0 and d = 200. For feature extraction, we take N_d = 15. All other parameters related to PCA and TICA are the same as those used in the experiments shown in Figure 2. These parameters for transductive learning and feature extraction are fixed across all experiments. For the four experiments reported in Figure 3, we achieve accuracies of 100%, 98.05%, 97.27% and 94.14%, respectively.
5.2 Natural Texture Images
Figure 4 shows some results of applying our approach to real natural texture images. In Figure 4(a), we use two windows to label 12 data points, taking s_r = 2 and s_c = 1. We can see from the result in Figure 4(b) that only a few data points are incorrectly classified. To improve the performance, we add three labelled data points, shown in Figure 4(c) with yellow circles². The transducer is then reconstructed according to the new labelling information, and the new result is shown in Figure 4(d).
Fig. 4. Natural texture images and the segmentation results. Only gray data are used.
Figure 4(e) shows another texture image. Sixteen data points are labelled for each of the two textures, taking s_r = 1 and s_c = 2. The segmentation result is shown in Figure 4(f); the error rate is about 12%. One reason is that the blue-sky-and-white-cloud texture is very complex: besides the windowed vision patterns in Figure 4(e), the cloud patterns located in the second window shown in Figure 4(f) are also fundamental. Another reason is that the meadow patterns labelled in Figure 4(e) are not sufficient to represent the patterns located in the shadow region. To get a better result, we add two labelled subsets of data points as shown in Figure 4(g). The result is illustrated in Figure 4(h); almost all data points are now correctly labelled. The sizes of the local windows used to extract the texture features for Figure 4(a) and Figure 4(e) are 41×41 and 51×51, respectively. All other parameters are the same as those used in the experiments in Subsection 5.1.
² This is equivalent to performing heuristic post-processing.
6 Conclusion
In conclusion, a new framework for texture image segmentation has been proposed and demonstrated, which obtains good segmentation quality with a few interactive operations. This framework allows the user to define his/her own textures by supplying several labelled data points and to make the final decision by evaluating the results. In this interactive setting, texture image segmentation is formulated as a problem of designing transductive learners. In addition, TICA is used to learn the filters directly from the image to be segmented, so the filters are adaptive to the image data. Comparative experimental results show that the features extracted by these filters are more separable for classification. One limitation of our framework is that it is currently only suited to texture images, whereas most natural images include not only texture objects but also non-texture objects, such as objects with uniform color distribution, lines, shapes, etc. Our main future work is therefore to integrate different perceptual objects into one interactive framework. Another limitation is that we cannot obtain pixel-level accuracy; in the future, we would like to introduce hierarchical techniques into our segmentation framework.
Acknowledgements. This study was carried out as a part of “R&D promotion scheme funding international joint research” promoted by NICT (National Institute of Information and Communications Technology) of Japan. We would like to thank the reviewers for their valuable suggestions, and to thank Dr. Yangqiu Song and Shiliang Sun for valuable discussions.
References

1. Tuceryan, M., Jain, A.K.: Texture analysis. In: Chen, C.H., Pau, L.F., Wang, P.S.P. (eds.): The Handbook of Pattern Recognition and Computer Vision. 2nd edn. World Scientific, Singapore (1998) 207–248
2. Reed, T.R., du Buf, J.H.M.: A review of recent texture segmentation and feature extraction techniques. Computer Vision, Graphics, and Image Processing: Image Understanding 57 (1993) 359–372
3. Joachims, T.: Transductive learning via spectral graph partitioning. In: Proc. of Int. Conf. on Machine Learning (ICML), Washington DC, USA (2003) 87–93
4. Vapnik, V.N.: The Nature of Statistical Learning Theory. 2nd edn. Springer-Verlag, New York (2000)
5. Zhu, X.J.: Semi-Supervised Learning with Graphs. PhD thesis, Carnegie Mellon University (2005)
6. Vapnik, V.N.: Statistical Learning Theory. John Wiley, New York (1998)
7. Hyvarinen, A., Hoyer, P.O., Inki, M.: Topographic independent component analysis. Neural Computation 13 (2001) 1525–1558
8. Boykov, Y., Jolly, M.P.: Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In: Proc. of IEEE Int. Conf. on Computer Vision (ICCV), Vancouver, Canada (2001) 105–112
9. Rother, C., Kolmogorov, V., Blake, A.: GrabCut - interactive foreground extraction using iterated graph cuts. In: Proc. of ACM SIGGRAPH, Los Angeles, USA (2004) 309–314
10. Sun, J., Jia, J.Y., Tang, C.K., Shum, H.Y.: Poisson matting. In: Proc. of ACM SIGGRAPH, Los Angeles, USA (2004) 315–321
11. De Bie, T., Cristianini, N.: Convex methods for transduction. In: Advances in Neural Information Processing Systems, Vancouver, Canada (2003) 73–80
12. Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Proc. of Conf. on Computational Learning Theory, Amsterdam, Holland (1998) 92–100
13. Joachims, T.: Transductive inference for text classification using support vector machines. In: ICML, Bled, Slovenia (1999) 200–209
14. Zhang, J.G., Tan, T.N.: Brief review of invariant texture analysis methods. Pattern Recognition 35 (2002) 735–747
15. Randen, T., Husoy, J.H.: Filtering for texture classification: a comparative study. IEEE Trans. on Pattern Analysis and Machine Intelligence 21 (1999) 291–310
16. Manjunath, B.S., Ma, W.Y.: Texture features for browsing and retrieval of image data. IEEE Trans. on Pattern Analysis and Machine Intelligence 18 (1996) 837–842
17. Zeng, X.Y., Chen, Y.W., Nakao, Z., Lu, H.Q.: Texture segmentation based on pattern maps obtained by independent component analysis. In: Proc. of Int. Conf. on Neural Information Processing, Shanghai, China (2001) 1189–1193
18. Hyvarinen, A.: Survey on independent component analysis. Neural Computing Surveys 2 (1999) 94–128
19. Yu, S.X., Shi, J.B.: Segmentation given partial grouping constraints. IEEE Trans. on Pattern Analysis and Machine Intelligence 26 (2004) 173–183
20. Bell, A.J., Sejnowski, T.J.: The independent components of natural scenes are edge filters. Vision Research 37 (1997) 3327–3338
21. Brodatz, P.: Textures: A Photographic Album for Artists and Designers. Dover, New York (1966)
22. MIT Vision and Modeling Group: Texture image library. http://www.media.mit.edu/vismod/ (1998)
23. Natural image collection for ICA experiments. http://www.cis.hut.fi/projects/ica/data/images/ (2001)
24. Malik, J., Belongie, S., Shi, J.B., Leung, T.: Textons, contours and regions: cue integration in image segmentation. In: ICCV, Corfu, Greece (1999) 918–925