Unsupervised Image Segmentation based on Wavelet Textural Analysis and Neural Networks

A. de A. Araújo, L. M. B. Claudino, R. A. R. Oliveira, S. J. F. Guimarães, and E. Bastos♦

Universidade Federal de Minas Gerais, Departamento de Ciência da Computação, Núcleo de Processamento Digital de Imagens, Caixa Postal 702, 31270-010 Belo Horizonte, MG, Brazil
{arnaldo,claudino,rabelo,sjamil}@dcc.ufmg.br
♦ Laboratório de Microscopia e Palinologia, Divisão de Bromatologia e Toxicologia, Fundação Ezequiel Dias, Cx. Postal 26, Belo Horizonte, MG, Brazil

Abstract. One of the difficulties in pattern recognition is to develop a good evaluation of the classes present in a scene. To describe those classes suitably, it is necessary to find feature spaces that allow distinguishing between them. In this work, we propose an unsupervised segmentation/classification technique based on wavelet textural analysis and self-organizing map clustering.

1 Introduction

Nowadays, images are used as important sources of information, but in order to use this information it is necessary to develop methods to process and analyse them. We propose a method applied to sample images of milled and toasted coffee beans which present impurities (wood, silica, barley, etc.), the purpose being to provide a good qualitative/quantitative description of coffee purity. This contribution is part of a long-term work in the area of food quality control in Minas Gerais (MG), Brazil, a joint project carried out by the Computer Science Department of the Universidade Federal de Minas Gerais and the Ezequiel Dias Minas Gerais State Foundation (FUNED). The sample images were generated at FUNED using an Olympus BX50 optical microscope with a video camera and a frame grabber. Since the images were not acquired under controlled conditions, for instance in terms of brightness and luminance, the entire process of image description became more complicated. After evaluating some usual image descriptors, such as edge information, color and gray level, we found the most expressive results using texture. In this method, texture is characterized by a statistical analysis of the wavelet coefficients, providing a multiresolution analysis.

The segmentation scheme operates on a feature set that is extracted from the source image and stored as weights in a neural network. The image data are then classified, and the quality of the obtained description can be studied. This paper is organized as follows: in Section 2, we present the wavelet techniques used to perform the feature extraction. In Sections 3 and 4, we provide the basic theory about self-organizing maps and k-means clustering. Section 5 describes the segmentation/classification algorithm, and the results and conclusions are presented in Sections 6 and 7, respectively.

2 Feature Extraction

2.1 Texture Modeling

The modeling of texture is a difficult problem. The substantial body of work on textures has not yet produced an exhaustive solution to the problems of texture analysis, classification and segmentation. Furthermore, the analysis and extraction of texture in unconstrained imagery, which is the most difficult problem, has not yet been sufficiently addressed. Attempts at modeling textures include different approaches, such as spatial gray-level dependencies, co-occurrence matrices and spatial/frequency techniques [1]. Because of the results and studies reported for spatial/frequency techniques, we chose this approach.

A spatial/frequency expansion of an image captures its localized spatial and frequency content. In general, image spatial/frequency transforms attain a joint resolution in the spatial/frequency domain that is bounded by the uncertainty principle, in the sense that when a high spatial resolution is achieved, the frequency resolution is low, and vice-versa. Obtaining a high joint resolution in the spatial/frequency domain is critical in texture region extraction, where the localization of the texture pattern is important. This methodology is called multi-resolution analysis [2]. Multi-resolution analysis is a signal processing strategy in which a set of specialized filters is designed to extract information from the signal, at different resolutions, about the occurring frequencies and their temporal localization. For images, the color variation plays the role of the frequency information and the spatial localization of this variation plays the role of the temporal localization. This approach is useful in image description, according to studies of the human visual system (HVS): the HVS has been found to decompose the retinal image into narrow bands of frequency and orientation, corresponding to the outputs of the spatial/frequency channels where HVS texture segmentation occurs [3]. This process can be
approximated by the Gabor functions, but there are drawbacks to using them in practical applications. A good solution for this process, without using Gabor functions, is a quadrature mirror filter (QMF) wavelet bank [2]. The QMF wavelet filter produces octave-bandwidth segmentations of the spatial/frequency domain and is an orthogonal approximation to the decomposition produced by the Gabor filters. By spacing the filters at octave-band distances, the wavelet filter bank provides a dyadic trade-off in spatial/frequency resolution. In this work, we used two basis functions for the QMF wavelet filter: the Haar and the Daubechies functions [4].

2.2 The Wavelet Transform

The implementation of the QMF wavelet filter uses the same principle as the Discrete Wavelet Transform [1]. Let f(x) be the input signal of a bank of filters h_i; each output g_i is obtained by convolving f(x) with h_i. To analyze the signal, it must be decomposed into different parts by filtering, which is done in two steps, a low-pass filtering and a high-pass one. As examples of those filters we can take h0(iδt) and h1(iδt), where the first produces the signal g0(iδt) and the second produces g1(iδt), containing the low and the high frequencies of the signal, respectively. In the QMF filter bank, h1 and h0 form a quadrature mirror pair, so that the original signal can be reconstructed from g0 and g1. This result is relevant to texture extraction, since a perfect signal reconstruction can be achieved; in addition, the filters are localized and their outputs are decimated, which reduces the complexity of the process [2].

The Wavelet Transform algorithm is implemented in the following way: the signal f(x) is filtered by h0(n) and h1(n); the output of the high-pass filter, g1, is stored, while the output of the low-pass filter, g0, is filtered again. At this step, the filters generate three signals: one with half the size of the original signal and two others, each with one quarter of the original size. This process is applied again to the output of the low-pass filter, storing the high-pass signals and filtering the low-pass ones, and it is repeated until the desired resolution is reached. Considering an image as a two-dimensional function, the process is applied to each dimension (rows and columns) separately, and each high-pass signal, which describes a different orientation of the original input [1], is stored. In this way the orientation and frequency content of the image are computed at different resolutions, and each resulting signal is known as a subband. At this point, the image can be characterized by a first-order statistical analysis of the subbands, using measures such as mean, variance, entropy, contrast and energy. In this work we chose the energy, considered a very good texture descriptor [2], [5]; the energy of each subband is used as one component of the texture feature vector.
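To make the feature extraction concrete, the sketch below computes energy-per-subband features for a single image window. It assumes a Haar analysis pair, decimation by two, simple boundary handling, and energy defined as the mean squared coefficient; these are illustrative choices of ours, not necessarily the authors' exact implementation.

import numpy as np

H0 = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass (scaling) filter, Haar
H1 = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass (wavelet) filter, Haar

def analyze_1d(signal, h):
    # one QMF analysis branch: filter the signal and decimate by two
    return np.convolve(signal, h, mode="full")[1::2]

def dwt2_level(block):
    # one separable 2-D decomposition: rows first, then columns
    lo = np.apply_along_axis(analyze_1d, 1, block, H0)
    hi = np.apply_along_axis(analyze_1d, 1, block, H1)
    ll = np.apply_along_axis(analyze_1d, 0, lo, H0)
    lh = np.apply_along_axis(analyze_1d, 0, lo, H1)
    hl = np.apply_along_axis(analyze_1d, 0, hi, H0)
    hh = np.apply_along_axis(analyze_1d, 0, hi, H1)
    return ll, (lh, hl, hh)   # approximation and three orientation subbands

def energy(subband):
    # subband energy, here taken as the mean squared coefficient (assumed)
    return float(np.mean(subband ** 2))

def texture_features(window, levels=2):
    # energy-per-subband feature vector for one image window
    features, ll = [], np.asarray(window, dtype=float)
    for _ in range(levels):
        ll, details = dwt2_level(ll)
        features.extend(energy(d) for d in details)
    features.append(energy(ll))            # keep the final approximation band
    return np.array(features)

# example: an 8x8 window yields a 7-dimensional vector of subband energies
print(texture_features(np.random.rand(8, 8), levels=2))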
3 Self-Organizing Maps

3.1 The Self-Organizing Map Model

A self-organizing map (SOM) is a neural network model developed in the early 1980s by Teuvo Kohonen [6]. In contrast with other neural network models, it has a strong physiological inspiration, as it is based on the topological maps that exist in the brain cortex. The cortex is organized so that topologically close neurons tend to respond to the same kind of stimulus; this is one of the reasons why the model is largely employed in visual pattern recognition. Self-organizing maps are commonly used for the exploration of large data sets, extracting the most relevant features and the relations between them [7]. Like those data sets, images may also be considered data sources that sometimes hide useful information. The use of self-organizing maps allows the creation of shorter feature sets that keep the most relevant data contained in an image.

3.2 The Training Algorithm

The training algorithm of the self-organizing map is based on competitive learning [8], which is a particular case of unsupervised learning in neural networks. In this case, the presentation of a certain input starts some kind of competition among the neuron units, so that one of them is
considered the winner. The learning is unsupervised in the sense that the desired outputs are not fed to the network. In the training algorithm of the self-organizing map, the network is searched for the unit that is nearest to the presented input; the similarity criterion usually chosen is the Euclidean distance [1]. This winning unit and the N neighbors around it have their weights w updated, becoming better representatives of the input data x, as the following expression shows (where t denotes the current time):
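In the standard Kohonen formulation, which we assume to be the intended one, the update reads

$$w_i(t+1) = w_i(t) + n(t)\,\big[x(t) - w_i(t)\big], \qquad i \in N_c(t),$$

with $w_i(t+1) = w_i(t)$ for every unit outside the neighborhood $N_c(t)$ of the winning unit $c$.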
In the above expression, n(t) is the learning rate, which also works as a decreasing function of the distance of each unit from the winner [7]. The full training stage consists of presenting several data vectors to the network, commonly chosen at random, and of updating the neural network weights until the network reaches stability. Stability can be evaluated, for example, by previously fixing a number of inputs that should, at least, be presented to the network. Another option, the one employed in this work, is to select a set of entries from the entire data set and apply the following expression:
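A plausible form of this criterion, written to match the description below (the precise expression is our assumption), is

$$\sum_{i}\;\sum_{j=1}^{c} \big\lVert x_i - w_j^{(i)} \big\rVert \;\le\; \epsilon,$$

where $x_i$ ranges over the selected entries, $w_j^{(i)}$ is the $j$-th of the $c$ network units closest to $x_i$, and $\epsilon$ is a small threshold.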
The Euclidean distance between each of the i entries and its c nearest units in the network is accumulated. As soon as the accumulated value becomes smaller than or equal to a threshold ε, the network is considered stable and the training process is stopped.

4 Unsupervised k-Means Algorithm

The unsupervised k-means algorithm is a data clustering technique presented by Coleman & Andrews [9]. Its objective is to determine the most representative groups of a given data set. Those groups are quantitatively expressed as feature vectors with the same dimensions as the explored data: the centroids, or cluster centers. The algorithm starts with two centroids. It has been observed that the choice of the initial centroids has a deep influence on the final results [10]; as suggested by the same reference, we used the Karhunen-Loève transform (also called Principal Component Analysis, PCA) to compute them. At each iteration, the data are grouped according to the current number of centroids by applying some distance criterion; once more, the Euclidean distance was used. After that, the centroids are computed again. In sequence, the β parameter is calculated to evaluate the quality of the clustering: it is obtained as the product of the traces of the between-cluster and within-cluster scatter matrices Sb and Sw, β = tr(Sb) · tr(Sw), a value that measures the separability of the data. The β parameter is then compared to its previous value. If the new value is greater than or equal to the previous one, another centroid is added by picking the data vector that has the largest distance to its closest cluster center, and the process continues from the grouping stage; otherwise, the centroids already found are taken as the optimal ones. Figure 1 shows the relationship between the number of groups detected by the k-means algorithm and the β parameter. A local maximum of β can be reached rather than the global one, which is a real problem; this can be avoided by a suitable choice of the data set before the segmentation process is launched.
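The sketch below follows this procedure: plain k-means passes, the β = tr(Sw) · tr(Sb) test, and growth by the farthest data vector. The random choice of the two initial centroids and the fixed number of k-means iterations per step are simplifications of ours; the text suggests a PCA-based initialization instead.

import numpy as np

def kmeans_pass(data, centroids, iters=10):
    # plain k-means: assign each vector to its nearest centroid, then update
    for _ in range(iters):
        d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(len(centroids)):
            if np.any(labels == k):
                centroids[k] = data[labels == k].mean(axis=0)
    return centroids, labels

def beta(data, centroids, labels):
    # Coleman & Andrews separability: product of the scatter-matrix traces
    mean = data.mean(axis=0)
    sw = sum(np.cov(data[labels == k].T, bias=True) * np.sum(labels == k)
             for k in range(len(centroids)) if np.any(labels == k)) / len(data)
    diffs = centroids - mean
    counts = np.bincount(labels, minlength=len(centroids))
    sb = (diffs.T * counts) @ diffs / len(data)
    return np.trace(sw) * np.trace(sb)

def unsupervised_kmeans(data):
    # grow the number of centroids while beta keeps increasing
    centroids = data[np.random.choice(len(data), 2, replace=False)].copy()
    centroids, labels = kmeans_pass(data, centroids)
    best, best_beta = (centroids, labels), beta(data, centroids, labels)
    while True:
        # add the vector farthest from its closest cluster center
        d = np.linalg.norm(data - centroids[labels], axis=1)
        centroids = np.vstack([centroids, data[d.argmax()]])
        centroids, labels = kmeans_pass(data, centroids)
        b = beta(data, centroids, labels)
        if b >= best_beta:
            best, best_beta = (centroids, labels), b
        else:
            return best   # the previously found centroids are taken as optimal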
FIGURE 1 – Number of groups versus the β parameter.

5 The Segmentation and Classification Algorithm

Features are extracted by picking windows from the image at random, one at each iteration. The way the windows are selected from the image can deeply influence the final results; randomly chosen windows give all regions of the image the same probability of being selected during this stage, which is the behavior the sampling was designed to have. In Figure 2, we show the whole segmentation and classification scheme. The convolution of the chosen window with the wavelet filter bank results in an energy-per-subband feature vector. Each extracted feature vector is presented as an input to a self-organizing map, which is responsible for selecting the most representative features. The size of the
network, as well as the learning rate, also has a great influence on the obtained results. Features keep being sampled and presented to the neural network until it reaches stability.
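A compact sketch of this loop is given below: windows are drawn at random, converted into feature vectors (for instance with the texture_features function sketched in Section 2.2), and presented to a small map until the accumulated error of a fixed probe set satisfies the stability criterion of Section 3.2. The default parameter values follow the settings reported in Section 6; the probe-set size, the weight initialization and the safety cap on the number of rounds are our own assumptions.

import numpy as np

def random_window(image, size=8):
    # pick one size x size window uniformly at random from the image
    r = np.random.randint(0, image.shape[0] - size + 1)
    c = np.random.randint(0, image.shape[1] - size + 1)
    return image[r:r + size, c:c + size]

def train_som(image, feature_fn, grid=8, lr=0.006, radius=6.0,
              decay=0.0002, eps=1e-5, check_every=200, max_rounds=500):
    dim = feature_fn(random_window(image)).shape[0]
    weights = np.random.rand(grid, grid, dim)                  # node weights
    coords = np.dstack(np.meshgrid(np.arange(grid), np.arange(grid),
                                   indexing="ij")).astype(float)
    probe = [feature_fn(random_window(image)) for _ in range(50)]
    for _ in range(max_rounds):                                # safety cap (assumed)
        for _ in range(check_every):
            x = feature_fn(random_window(image))
            dist = np.linalg.norm(weights - x, axis=2)
            winner = np.unravel_index(dist.argmin(), dist.shape)
            near = np.linalg.norm(coords - np.array(winner), axis=2) <= radius
            weights[near] += lr * (x - weights[near])          # Kohonen update
            lr *= 1.0 - decay                                  # exponential decay
            radius *= 1.0 - decay
        # stability criterion: accumulated distance of the probe entries to the map
        total = sum(np.linalg.norm(weights - p, axis=2).min() for p in probe)
        if total <= eps:
            break
    return weights.reshape(-1, dim)   # node weights form the reduced feature set

# usage (hypothetical): nodes = train_som(gray_image, texture_features)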
[Figure 2 flowchart: Feature Extraction → Feature Vector → SOM training (repeat while still unstable) → when stable → Unsupervised K-Means → K Classes → Image Classification]
FIGURE 2 – Segmentation and classification scheme.

The use of a self-organizing map as a feature selector/reducer is an interesting strategy because it reduces computation, in the sense that only the data stored as the neural network node weights will be segmented. The segmentation process is based on the Coleman & Andrews [9] unsupervised k-means clustering algorithm. It detects an optimal number of classes, each described by a feature vector; those classes work as a codebook at the classification stage, and each region of the target image is classified into one of them. To perform this classification, new windows are sequentially obtained from the source image and labeled according to their closest detected class, using, for example, the Euclidean norm as the distance criterion.

6 Results

The self-organizing map was set to a two-dimensional array of only 8x8 nodes. A square neighborhood of nearly 80% of the number of nodes (in this case, about 6 nodes) was chosen, with a learning rate of 0.6%. Both the neighborhood and the learning rate decrease exponentially by 0.02% at each iteration. The network was considered stable according to the stability criterion of Section 3.2, with an error threshold of 10^-5. These parameters made it possible to reach good results, since the learning rate decreases by a very small factor, even with a network that has a small number of nodes. They also helped to achieve convergence faster than a 32x32-node map, a size that is usually employed.
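To make the classification stage concrete, the sketch below labels every 8x8 window of the source image with its nearest codebook centroid under the Euclidean norm. Here feature_fn and codebook stand for the feature extractor and the k-means output sketched earlier; the names and the non-overlapping window layout are our own placeholders.

import numpy as np

def classify_image(image, codebook, feature_fn, size=8):
    # label every non-overlapping size x size window with its closest class
    rows, cols = image.shape[0] // size, image.shape[1] // size
    labels = np.zeros((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            window = image[r * size:(r + 1) * size, c * size:(c + 1) * size]
            feats = feature_fn(window)
            labels[r, c] = np.linalg.norm(codebook - feats, axis=1).argmin()
    return labels   # one class index per window; map to gray levels for display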
Figure 3 shows the resulting self-organizing map for a coffee image sample.

FIGURE 3 – Coffee sample: (a) original image and (b) the respective trained self-organizing map.

For the feature extraction, the sample windows were 8x8 pixels in size. It was observed that the way the windows are picked can modify the results of the segmentation/classification process. Windows chosen sequentially often led to more stable results, but semantically worse than the ones obtained with windows chosen at random. We also tested another way of choosing the windows, biasing the sampling engine, that is, increasing the probability that some regions of the image have more windows picked than others; this led to unstable results that were sometimes good, but poor in most of the executions. We first used the Haar function as the wavelet function and, although it is considered by several authors the simplest one, it provided good results [11]. The use of the Daubechies function [4] improved the segmentation process considerably, as it provides a better texture description because of the smoothness that is almost absent in the Haar function [4]. Figures 4 and 5 show the first results obtained, when the Haar function was used in the segmentation process; the original images were processed in gray scale. Figures 6 and 7 show the original color images of the coffee grains and their respective segmentation and classification results using the Daubechies function. In the processed images, each gray level represents a label for one of the detected classes.

7 Conclusion

In this work, we proposed an unsupervised segmentation/classification technique based on wavelet analysis and self-organizing map clustering. After evaluating some usual image descriptors, such as edge information, color and gray level, we found the most expressive results using texture. In this method, texture is characterized by a statistical analysis of the wavelet coefficients, providing a multiresolution analysis. We first used the Haar function as the wavelet function and, although it is considered by several authors the simplest one, it provided good results [11]. The
use of the Daubechies function [4] improved the segmentation process considerably, as it provides a better texture description because of the smoothness that is almost absent in the Haar function [4].

8 Acknowledgements

The authors would like to thank CNPq, CAPES, FAPEMIG, and the DCC SIAM Project for the financial support of this work.
FIGURE 7 – Coffee and barley sample: (a) original image and (b) processed image using the Daubechies wavelet function.
FIGURE 4 – Pure coffee sample: (a) original image and (b) processed image using the Haar wavelet function.
FIGURE 5 – Coffee and barley sample: (a) original image and (b) processed image using the Haar wavelet function.
FIGURE 6 – Coffee and wood sample: (a) original image and (b) processed image using the Daubechies wavelet function.
References

[1] K. R. Castleman, Digital Image Processing, Prentice Hall, New Jersey, 1996.
[2] J. R. Smith, "Integrated Spatial and Feature Image Systems: Retrieval, Analysis and Compression", Ph.D. thesis, Graduate School of Arts and Sciences, Columbia University, USA, 1997.
[3] J. P. Frisby, Seeing, Oxford University Press, 1980.
[4] I. Daubechies, Ten Lectures on Wavelets, 3rd ed., Society for Industrial and Applied Mathematics, 1994.
[5] J. Z. Wang, G. Wiederhold, O. Firschein, and S. X. Wei, "Content-Based Image Indexing and Searching Using Daubechies' Wavelets", International Journal on Digital Libraries, 1997, pp. 311-328.
[6] T. Kohonen, Self-Organization and Associative Memory, Springer-Verlag, Berlin, 1989.
[7] S. Kaski, "Data Exploration Using Self-Organizing Maps", Ph.D. thesis, University of Technology, Espoo, Finland, 1997.
[8] A. P. Braga, A. P. Leon, and T. B. Ludermir, Fundamentos de redes neurais artificiais, 11ª Escola de Computação, Rio de Janeiro, Brazil, 1998.
[9] G. B. Coleman and H. C. Andrews, "Image Segmentation by Clustering", Proceedings of the IEEE, vol. 67, 1979, pp. 773-785.
[10] J. Moreira, "Uma proposta de estruturação e integração de processamento de cores em sistemas artificiais de visão", Ph.D. thesis, Instituto de Física de São Paulo, Universidade de São Paulo, Brazil, 1999.
[11] A. de A. Araújo, L. M. B. Claudino, R. A. R. Oliveira, and S. J. F. Guimarães, "Pattern Image Recognition and Image Description by Suitable Textural Information", Proceedings of the 13th SIBGRAPI, IEEE Computer Society Press, 2000.