Nonclassical Receptive Field Inhibition Applied to Image Segmentation Bruno J. T. Fernandes, George D. C. Cavalcanti and Tsang I. Ren∗
Abstract: This paper presents a new model to perform supervised image segmentation. The proposed model, called segmentation and classification with receptive fields (SCRF), is based on the concept of receptive fields: it analyzes pieces of an image considering not only a pixel, or a group of pixels, but also the relationship between them and their neighbors. To work with the SCRF model, we propose a new artificial neural network, called I-PyraNet, which is a hybrid implementation of the recently described PyraNet and of nonclassical receptive field inhibition. Furthermore, the model and the neural network are combined to accomplish a satellite image segmentation task. Key words: Pattern Recognition, Image Segmentation, Neural Network
1. Introduction
Image segmentation can be viewed as a clustering problem: partitioning a given data set (the image) into a number of different clusters (parts of the image). The objective is to separate the image into relevant regions. Image segmentation has been applied to many machine-vision problems, such as object detection [1, 2, 3], word recognition [4] and, the aim of this work, forest detection in satellite images [5, 6]. Most of the work developed in the field of image segmentation uses unsupervised segmentation; however, if the intention is to recognize objects inside the picture, this kind of segmentation requires a further analysis of the image to classify each cluster (segment of the image). Meurie et al. [7] showed that supervised pixel classification can lead to better results than unsupervised techniques. A supervised segmentation procedure was performed in [8], where different classifiers were applied to a skin detection task. Their results indicated that a Multilayer Perceptron (MLP) [9] yields the highest classification rate among the compared algorithms. It is important to note that the classification in [8] was performed pixel-by-pixel and was based on the assumption that human skin has a very consistent color, distinct from the colors of many other objects. This hypothesis, however, does not hold for arbitrary objects. To solve this problem, a simple and effective model, called segmentation and classification with receptive fields (SCRF), is proposed here. This model makes use ∗ Bruno J. T. Fernandes, George D. C. Cavalcanti and Tsang I. Ren, Informatics Center (CIn), Federal University of Pernambuco (UFPE), Recife, PE, Brazil, {bjtf,gdcc,tir}@cin.ufpe.br
© ICS AS CR 2008
Neural Network World 01/09
of the concepts of receptive fields [10], which recognize an image based not only on the values of a single pixel but also on the way the pixel interacts with its neighborhood. The SCRF model divides an image in such a way that the classification of each generated sub-image is used to obtain the classification of each pixel in the image. In related work, Phung and Bouzerdoum [11] proposed an artificial neural network, called PyraNet, that works as a supervised classifier taking a regular 2-D image as input. Its architecture allows feature extraction and classification of a pattern to occur in a single step, and it retains several other advantages of 2-D neural networks, such as preserving the spatial topology of the image patterns while extracting features. However, PyraNet does not contemplate the inhibitory behavior of the neurons that surround a given receptive field [12]. This kind of inhibitory stimulus is very common around receptive fields and was successfully applied by [13] to the problem of contour detection, achieving very good results. Therefore, in this paper we also propose a new hybrid implementation of the PyraNet, called I-PyraNet, based on all the concepts of the PyraNet together with the inhibitory behavior around the receptive fields. The I-PyraNet is used together with the SCRF technique to perform supervised segmentation tasks. Our contributions in this paper are threefold. First, we present the SCRF model, which makes use of the pixel neighborhood to obtain its classification. Next, we present the I-PyraNet classifier, which receives as direct inputs the sub-images generated by the SCRF model. Finally, our experimental results show the advantages of integrating the SCRF model and the I-PyraNet classifier. This paper is organized as follows. In Section 2 the concepts of receptive and inhibitory fields are presented. In Section 3 we introduce the SCRF model used to perform image segmentation. In Section 4 we present the I-PyraNet classifier. In Section 5 a comparison between different segmentation methods for the problem of forest detection in satellite images is presented. Finally, in Section 6 we discuss some concluding remarks.
2. Receptive and Inhibitory Fields
First of all, an overview of receptive fields is required for a better understanding of the proposed methods. A receptive field is defined in [14] as "an area in which stimulation leads to response of a particular sensory neuron". In a few words, the receptive field of a neuron is the region of influence that it covers; this region is currently referred to as the classical receptive field (CRF). This kind of neuron exists in many parts of the human brain and has already been identified in the auditory, somatosensory and visual systems [14]. However, research reported in [12, 15] showed that, besides the receptive field, stimuli originating outside the receptive field can also affect the neuron. In most cases this is an inhibitory stimulus, referred to as nonclassical receptive field (non-CRF) inhibition. This kind of stimulus has been observed in the visual cortex of macaque monkeys, and it has also been demonstrated to influence visual perception in human beings [16]. Non-CRF inhibition seems to be a common property of the biological structures responsible for edge detection, and it
Fig. 1 System architecture.

was successfully applied in contour detection [13]. Therefore, the results of these studies motivated our work, since we apply the same concepts to the artificial neural network PyraNet to improve its learning capacity and to obtain better results in the field of visual pattern classification.
3. Image Classification Model
The model proposed in this work is called segmentation and classification with receptive fields (SCRF). An image is divided into sub-images, and the classification of each of them is used to classify each pixel in the image. The model is based on the concept of a receptive field over an image; the basic idea is to generate sub-images that share some overlapped pixels, so that the classification of a pixel depends not only on the pixel itself but also on the classification of the sub-images that contain it. This means that the pixels in its neighborhood also affect its classification. Figure 1 shows the architecture of the proposed model. The SCRF model works as follows. First, an image is acquired (Image Acquisition, Figure 1). After that, the original image is divided into sub-images (Sub-Images Extraction, Figure 1) of a defined size r×r, called the receptive field, which share some overlapped pixels. Then, the probability of each sub-image belonging to each of the known classes (Sub-Images Classification, Figure 1; each probability is given by P_{ij}, i.e., the probability that sub-image j belongs to class i) is calculated through the use of a supervised classifier. In a final stage, after classifying each sub-image, the model defines the class of a pixel as the class with the highest sum of probabilities among the sub-images that contain the pixel (Pixels Classification, Figure 1, where Class_pixels is the classification result for all the pixels in that image segment). However, if the pixel is not in an overlapped area, which means that only one sub-image contains it, only one probability is generated for each class and the pixel is assigned to the class with the highest one. The next equation shows how the classification is calculated for a single pixel:

C_{x_{i,j}} = \arg\max_{c} \sum_{SI \mid x_{i,j} \in SI} P(c, SI),   (1)
where x_{i,j} is a single pixel at image position (i, j), C_{x_{i,j}} is the pixel classification, c denotes one of the possible classes, SI represents a sub-image and P(c, SI) represents the a posteriori probability that a given sub-image SI belongs to a given class c. Although in this work the summation of the probabilities is used to define the pixel class, any other metric over the calculated probabilities could be applied.
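As a concrete illustration of Eq. (1), the sketch below accumulates the posterior probabilities of every sub-image over the pixels it covers and then takes the per-pixel argmax. The function name, the dictionary layout for the sub-image posteriors and the toy numbers are illustrative assumptions, not part of the paper.

```python
import numpy as np

def classify_pixels(height, width, r, subimage_probs, n_classes):
    """Eq. (1): each pixel gets the class with the highest summed posterior
    over all r x r sub-images (receptive fields) that contain it.
    `subimage_probs` maps the top-left corner (row, col) of a sub-image
    to its posterior probability vector P(c, SI)."""
    votes = np.zeros((height, width, n_classes))
    for (row, col), probs in subimage_probs.items():
        # every pixel covered by this sub-image accumulates its posteriors
        votes[row:row + r, col:col + r, :] += probs
    return votes.argmax(axis=2)

# toy example: a 4x4 image, 3x3 sub-images with a 2-pixel overlap
probs = {
    (0, 0): np.array([0.9, 0.1]),  # strongly class 0
    (0, 1): np.array([0.1, 0.9]),  # strongly class 1
    (1, 0): np.array([0.6, 0.4]),
    (1, 1): np.array([0.3, 0.7]),
}
labels = classify_pixels(4, 4, 3, probs, 2)
```

A pixel covered by a single sub-image simply inherits that sub-image's best class, while pixels in overlapped regions are decided by the summed posteriors, exactly the two cases described above.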
4. I-PyraNet
The I-PyraNet is proposed here to work as the supervised classifier in the SCRF model. Its inputs are the generated sub-images and its outputs are the classifications of each sub-image. I-PyraNet is based on the PyraNet [11], an artificial neural network for the classification of visual patterns, which was itself motivated by the very good results obtained by convolutional neural networks (CNNs) [17]. PyraNet's main advantage over other models is that it performs feature extraction and classification in a single structure while keeping the topology information of the input image. However, the PyraNet model only considers the excitatory effects of neurons inside a receptive field. Therefore, we propose here the I-PyraNet, in which a neuron may send either excitatory or inhibitory stimuli to neurons in subsequent layers. The idea of using inhibitory stimuli is based on the concept of receptive and inhibitory fields. It is important to note that the magnitude of the impulse sent by a neuron is always the same; only the sign of the impulse is affected by whether the neuron lies in a receptive or an inhibitory field (the sign of the impulse sent by the neuron is inverted when it appears in an inhibitory field).
4.1 I-PyraNet Architecture
The architecture of the I-PyraNet, shown in Figure 2, is a multilayered network with two different kinds of processing layers: the 2-D layers, arranged in a 2-D array, which perform feature extraction and data reduction and are located at the base of the network; and the 1-D feedforward layers, which perform the classification step and are located at the top of the network.

Fig. 2 I-PyraNet architecture.

The last 2-D layer is connected to the first 1-D layer. The entire network is connected in cascade: the output of one layer works as the input to the next layer. Each neuron of a 2-D layer l is connected to a receptive field of size r×r in the previous 2-D layer l − 1, surrounded by an inhibitory field. The presence of a neuron in an inhibitory field affects the neuron output by inverting its sign. Then, with o the number of overlapped neurons between adjacent receptive fields and g the gap defined by g = r − o, the height H and the width W of the 2-D layer l are given by

H_l = \lfloor (H_{l-1} - o_l)/g_l \rfloor,   (2)

W_l = \lfloor (W_{l-1} - o_l)/g_l \rfloor,   (3)
where l denotes the index of a 2-D layer and l = 0 denotes the input image. Furthermore, h gives the vertical or horizontal number of inhibitory neurons surrounding the receptive field of a neuron, which contribute negatively to the output of this neuron. The output of a 2-D neuron is given by a nonlinear activation function f; in this work the hyperbolic tangent is used as the activation function for all neurons. This function is applied to the weighted sum of the outputs of the neurons contained in the receptive field minus the weighted sum of the outputs of the neurons contained in the inhibitory field. Supposing that (u, v) is the position of a neuron in the pyramidal layer l, (i, j) is the position of a neuron in the previous pyramidal layer l − 1 and b_{u,v} is the bias of the neuron (u, v), the output y_{u,v} of the neuron is given by

y_{u,v} = f\left( \underbrace{\sum_{(i,j) \in R_{u,v}} w_{i,j} \, y_{i,j}}_{\text{Receptive Field}} - \underbrace{\sum_{(i,j) \in I_{u,v}} w_{i,j} \, y_{i,j}}_{\text{Inhibitory Field}} + \underbrace{b_{u,v}}_{\text{Bias}} \right),   (4)
where w_{i,j} denotes the weight associated with input position (i, j) to the 2-D layer l, y_{i,j} the output of the neuron (i, j) in the previous layer l − 1, R_{u,v} the receptive field of the neuron (u, v), whose size is given by r, and I_{u,v} the inhibitory field of the neuron (u, v), whose size is given by h. The input to the first 2-D layer is the input image. The output of the last 2-D layer is rearranged into a column vector and works as the input to the first 1-D layer. The 1-D layers work as a simple Multilayer Perceptron (MLP), where the output of a neuron is given by a nonlinear activation function f applied to the weighted sum of the previous neurons, each weight associated with a neuron-to-neuron connection. The output y_n^l of the neuron at position n in layer l is calculated through the equation

y_n^l = f\left( \sum_{m=1}^{N_{l-1}} w_{m,n} \, y_m^{l-1} + b_n^l \right),   (5)
where N_{l−1} represents the number of neurons in the previous 1-D layer l − 1, w_{m,n} is the synaptic weight from neuron m in layer l − 1 to neuron n in layer l, y_m^{l−1} is the output of neuron m at layer l − 1 and b_n^l is the bias of neuron n at layer l. The output of the last 1-D layer works as the network output.
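The forward pass described by Eqs. (2)-(5) can be sketched as follows. The function names are illustrative, the weight map (one weight per input position, as stated above for w_{i,j}) is taken directly from the text, and the clipping of the inhibitory border at the map edges is an assumed boundary choice the paper does not specify.

```python
import math
import numpy as np

def layer_size(prev_size, r, o):
    """Height or width of a 2-D layer, Eqs. (2)-(3): floor((prev - o) / g),
    with g = r - o the gap between adjacent receptive fields."""
    g = r - o
    return math.floor((prev_size - o) / g)

def neuron_2d_output(prev, top, left, r, h, w, b):
    """Output of one 2-D neuron, Eq. (4): tanh of the weighted sum over its
    r x r receptive field minus the weighted sum over the surrounding
    inhibitory border of width h, plus a bias.  `prev` is the previous
    layer's output map and `w` a weight map of the same shape."""
    t0, l0 = max(top - h, 0), max(left - h, 0)
    t1 = min(top + r + h, prev.shape[0])
    l1 = min(left + r + h, prev.shape[1])
    excite = np.sum(w[top:top + r, left:left + r] * prev[top:top + r, left:left + r])
    total = np.sum(w[t0:t1, l0:l1] * prev[t0:t1, l0:l1])
    inhibit = total - excite  # contribution of the inhibitory border only
    return np.tanh(excite - inhibit + b)

def layer_1d(y_prev, W, b):
    """Output of a 1-D (MLP) layer, Eq. (5): f(W^T y + b), with f = tanh."""
    return np.tanh(W.T @ y_prev + b)

# toy example: 4x4 map of 0..15, unit weights, 2x2 field at (1,1), border h=1
prev = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones_like(prev)
out = neuron_2d_output(prev, 1, 1, 2, 1, w, 0.0)
```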
4.2 I-PyraNet Training
The I-PyraNet must first be trained before it can perform visual pattern recognition tasks. As a supervised neural network, its objective is to reduce the error between the desired output and the obtained output, which is done by adjusting the synaptic weights of the I-PyraNet. In this work, the approach used to perform this task is the cross-entropy (CE) function [18], where the network outputs estimate the a posteriori probability of each known class. With y_n^{L,k} the output of neuron n in the last network layer L for an input image k, the estimated a posteriori probability p_n^k is given by

p_n^k = \exp\left(y_n^{L,k}\right) \Big/ \sum_{i=1}^{N_L} \exp\left(y_i^{L,k}\right),   (6)

where N_L is the number of neurons in layer L. Therefore, in order to adjust the synaptic weights of the I-PyraNet, the error gradient of the weights must be calculated through the error sensitivity of each neuron. The error sensitivity δ of a neuron n at the 1-D output layer L_{1D}, for an input image k, is given by

\delta_n^{L_{1D},k} = e_n^k \, f'\left(s_n^{L_{1D},k}\right),   (7)
where e_n^k is the output y_n^k produced by neuron n at the last 1-D layer L_{1D} minus the desired output d_n^k, i.e., e_n^k = y_n^k − d_n^k; s_n^{L_{1D},k} is the weighted sum input to neuron n at layer L_{1D}; and f' is the derivative of the activation function f. Then, for the neurons in the other 1-D layers l_{1D} < L_{1D}, the error sensitivity is given by

\delta_n^{l_{1D},k} = f'\left(s_n^{l_{1D},k}\right) \times \sum_{m=1}^{N_{l_{1D}+1}} \delta_m^{l_{1D}+1,k} \times w_{n,m},   (8)

where N_{l_{1D}+1} represents the number of neurons in the next layer l_{1D} + 1, w_{n,m} is the synaptic weight from neuron n in layer l_{1D} to neuron m in layer l_{1D} + 1, and \delta_m^{l_{1D}+1,k} is the error sensitivity of neuron m at layer l_{1D} + 1. The error sensitivities for the last 2-D layer are calculated using the previous equation, but rearranged into a 2-D grid. In the other 2-D layers l_{2D}, the error sensitivity of the neuron at position (u, v) is given by

\delta_{u,v}^{l_{2D},k} = f'\left(s_{u,v}^{l_{2D},k}\right) \times w_{u,v} \times \sum_{i=i_l^{max}}^{i_h^{max}} \sum_{j=j_l^{max}}^{j_h^{max}} \gamma_{i,j}^{l_{2D}+1,k},   (9)

where s_{u,v}^{l_{2D},k} is the weighted sum input of the neuron (u, v), w_{u,v} is the weight associated to the neuron (u, v) at layer l_{2D} and \gamma_{i,j}^{l_{2D}+1,k} is given by

\gamma_{i,j}^{l_{2D}+1,k} = \begin{cases} \delta_{i,j}^{l_{2D}+1,k}, & i_l \le i \le i_h,\; j_l \le j \le j_h \\ -\delta_{i,j}^{l_{2D}+1,k}, & \text{otherwise} \end{cases},   (10)

with \delta_{i,j}^{l_{2D}+1,k} the error sensitivity of the neuron (i, j) at the next layer, and i_l^{max}, i_h^{max}, j_l^{max}, j_h^{max}, i_l, i_h, j_l and j_h calculated by

i_l^{max} = \left\lceil \frac{u - (r_{l+1} + h_{l+1})}{g_{l+1} - h_{l+1}} \right\rceil + 1, \quad i_h^{max} = \left\lfloor \frac{u - 1}{g_{l+1} - h_{l+1}} \right\rfloor + 1,   (11)

i_l = \left\lceil \frac{u - r_{l+1}}{g_{l+1}} \right\rceil + 1, \quad i_h = \left\lfloor \frac{u - 1}{g_{l+1}} \right\rfloor + 1,   (12)

j_l^{max} = \left\lceil \frac{v - (r_{l+1} + h_{l+1})}{g_{l+1} - h_{l+1}} \right\rceil + 1, \quad j_h^{max} = \left\lfloor \frac{v - 1}{g_{l+1} - h_{l+1}} \right\rfloor + 1,   (13)

j_l = \left\lceil \frac{v - r_{l+1}}{g_{l+1}} \right\rceil + 1, \quad j_h = \left\lfloor \frac{v - 1}{g_{l+1}} \right\rfloor + 1.   (14)
Therefore, the error gradients of the weights and biases can be derived through the following equations.

• 1-D weights: the error gradient of the 1-D synaptic weight w_{m,n}, from neuron m at layer l_{1D} − 1 to neuron n at layer l_{1D}, over all K input images, is given by

\frac{\partial E}{\partial w_{m,n}} = \sum_{k=1}^{K} \delta_n^k \, y_m^{l_{1D}-1,k}.   (15)

• 2-D weights: the error gradient of the 2-D synaptic weight w_{u,v} of neuron (u, v) at layer l_{2D} to layer l_{2D} + 1 is calculated by

\frac{\partial E}{\partial w_{u,v}} = \sum_{k=1}^{K} y_{u,v}^{l_{2D},k} \times \sum_{i=i_l^{max}}^{i_h^{max}} \sum_{j=j_l^{max}}^{j_h^{max}} \gamma_{i,j}^{l_{2D}+1,k}.   (16)

• Biases: the error gradients of the bias b_n of neuron n at 1-D layer l_{1D} and of the bias b_{u,v} of neuron (u, v) at 2-D layer l_{2D} are respectively given by

\frac{\partial E}{\partial b_n} = \sum_{k=1}^{K} \delta_n^k, \qquad \frac{\partial E}{\partial b_{u,v}} = \sum_{k=1}^{K} \delta_{u,v}^k.   (17)
The weights are then updated through the Gradient Descent method [19], which completes the training phase of the I-PyraNet.
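The final update step can be sketched as a plain gradient-descent rule applied to every weight and bias array; the learning rate value below is an arbitrary choice for illustration.

```python
import numpy as np

def gradient_step(params, grads, lr=0.1):
    """One gradient-descent update, w <- w - lr * dE/dw, applied to every
    weight and bias array of the network."""
    return [p - lr * g for p, g in zip(params, grads)]

# toy usage: one weight matrix and one bias vector
params = [np.array([[1.0, 2.0]]), np.array([0.5])]
grads = [np.array([[10.0, -10.0]]), np.array([5.0])]
new_params = gradient_step(params, grads, lr=0.1)
```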
5. Satellite Image Segmentation
Nowadays, there is an increasing concern with the amount of forest area existing on Earth. Projects like Deter [5], used by the Brazilian Federal Government, try to delimit the devastated vegetation areas in the Amazon forest by using unsupervised techniques to segment a satellite image; an expert then defines to which class each area belongs. This is the crucial difference between unsupervised and supervised methods: the latter does not necessarily need any further classification. Recent work on the recognition of satellite images needs different spectral bands of the same image to achieve a good classification [20, 21]. However, the task accomplished here intends to detect forested areas in single graylevel satellite images, which are easier and cheaper to acquire. The proposed model must be compared with other existing image segmentation algorithms. Several unsupervised segmentation methods used in many other situations were chosen to compare with the results of the SCRF model and the I-PyraNet classifier: k-Means [22], Otsu [23] and Fuzzy C-Means (FCM) [24]. Two supervised techniques are also used: the k-NN pixel-by-pixel [7], applied to graylevel pixels; and the Multilayer Perceptron [8], which is the only classifier applied to colored pixels. The SCRF model is also applied using a k-NN classifier with statistical measures of the pixels inside the sub-images (mean, standard deviation, kurtosis and skewness). All satellite images used in this work were taken from Google Maps™ [25], have 900×450 pixels, and can be accessed on the Web¹. Two images taken from Manaus, a Brazilian city, representing forested and not forested areas, were used as the training images for all the supervised classifiers; Figure 3 shows examples of these images.
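For the SCRF variant using the k-NN classifier, the four per-sub-image features can be computed as below. The paper does not spell out the moment formulas, so the standard population definitions are assumed here, and the function name is illustrative.

```python
import numpy as np

def subimage_statistics(patch):
    """Features for the SCRF + k-NN variant: mean, standard deviation,
    kurtosis and skewness of a sub-image's graylevel values, using the
    usual population moment definitions (an assumed implementation)."""
    x = np.asarray(patch, dtype=float).ravel()
    mu = x.mean()
    sigma = x.std()
    kurtosis = np.mean((x - mu) ** 4) / sigma ** 4
    skewness = np.mean((x - mu) ** 3) / sigma ** 3
    return np.array([mu, sigma, kurtosis, skewness])

feats = subimage_statistics(np.array([[1.0, 1.0], [3.0, 3.0]]))
```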
The test images were extracted from three cities of different Brazilian regions (Jundiaí, Manaus and Recife) and were manually segmented for comparison purposes. The classification rate is calculated by a pixel-by-pixel comparison, where the error rate of a given image is the number of misclassified pixels divided by the total number of pixels in the image.
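The evaluation metric just described is a one-liner; this sketch assumes the predicted and ground-truth segmentations are integer label maps of equal shape.

```python
import numpy as np

def pixel_error_rate(predicted, ground_truth):
    """Error rate used for evaluation: misclassified pixels divided by the
    total number of pixels, against a manually segmented reference."""
    predicted = np.asarray(predicted)
    ground_truth = np.asarray(ground_truth)
    return float(np.mean(predicted != ground_truth))

# one of four pixels differs from the reference
rate = pixel_error_rate([[0, 1], [1, 1]], [[0, 0], [1, 1]])
```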
Fig. 3 Images taken from the map service Google Maps™ [25] used to train the classifiers. (a) Not forested area. (b) Forested area. Both images are from Manaus, the capital of the Brazilian state of Amazonas.
Fig. 4 Error rate, vertical axis, for different values of k, horizontal axis, in the combination between SCRF and the k-NN classifier applied over an image taken from Manaus.
Fig. 5 Error rate using an overlap of size 0 for both pyramidal layers, for different receptive field configurations.

The SCRF model used in this work generates sub-images of size 18×18 with an overlap of 6 pixels, producing 1250 sub-images per image. First, the combination of the SCRF and the k-NN classifier is analyzed for different values of k; the result is presented in Figure 4. The best classification result is attained with k = 100, so this is the value of k used to classify the other satellite images. The I-PyraNet utilized in this work has two pyramidal layers and one output layer with two neurons, giving the a posteriori probabilities that the input image belongs to a forested or to a not-forested region. Many configurations of both pyramidal layers were tested for the forest detection task. Figures 5, 6 and 7 present the results for different receptive field configurations using an overlap factor of 0, 1 and 2, respectively, for both layers. The best configuration, with the lowest error rate of 7.67%, was achieved with receptive fields of size 3×3 and 2×2 for the first and second pyramidal layers respectively, with an overlap of 1 pixel in both layers. Figure 8 presents the error rate for different inhibitory field sizes using the best receptive field and overlap configuration. The inhibitory field of size 1 in the first layer and 2 in the second layer achieved the lowest error rate, 7.18%. An important observation is that a PyraNet is nothing more than an I-PyraNet with all inhibitory field sizes equal to 0. The following results in this work were calculated using the best configurations of receptive field, inhibitory field and overlap size.
Finally, Tables I and II show the average classification error rates for all the algorithms, with the best result for each image in bold, where SCRF-IPN, SCRF-PN and SCRF-NN represent the combination of the SCRF model with the I-PyraNet, the PyraNet and the k-NN classifier, respectively. PN is the application of the PyraNet classifier alone, which means it is applied like a sliding window over the entire image. 100-NN represents the results obtained with the k-NN pixel-by-pixel classifier with k = 100, and MLP represents the Multilayer Perceptron

¹ http://cin.ufpe.br/~bjtf/SCRF
Fig. 6 Error rate using an overlap of size 1 for both pyramidal layers, for different receptive field configurations.
Fig. 7 Error rate using an overlap of size 2 for both pyramidal layers, for different receptive field configurations.
Fig. 8 Error rate using the best receptive field and overlap configuration, for different inhibitory field sizes.
Tab. I Error Rate in % of Forest Detection for the Supervised Classifiers

Classifier  Jundiai-1  Jundiai-2  Jundiai-3  Manaus-1  Manaus-2  Manaus-3  Manaus-4  Recife-1  Recife-2  x̄
SCRF-IPN    12.83      11.13      8.18       5.51      5.36      7.37      9.03      2.49      2.73      7.18
SCRF-PN     14.27      11.91      8.46       5.73      5.57      7.63      9.51      2.72      3.31      7.68
SCRF-NN     28.17      19.80      13.61      5.67      5.49      6.48      14.79     3.65      4.91      11.40
PN          17.11      13.86      10.91      7.04      6.67      8.98      11.50     3.34      4.20      9.29
100-NN      37.81      35.77      35.77      55.73     53.63     16.03     31.75     48.26     47.18     40.21
MLP         16.87      28.96      30.15      23.75     27.30     8.98      34.52     23.98     23.18     24.19
Tab. II Error Rate in % of Forest Detection for the Unsupervised Classifiers

Classifier  Jundiai-1  Jundiai-2  Jundiai-3  Manaus-1  Manaus-2  Manaus-3  Manaus-4  Recife-1  Recife-2  x̄
K-Means     23.64      35.16      22.53      13.97     22.97     45.52     36.67     15.98     16.95     25.93
Otsu        23.41      36.01      21.99      13.79     23.27     45.69     36.67     16.14     17.38     26.04
FCM         23.41      33.07      21.14      13.17     21.48     46.90     34.98     15.35     16.66     25.13
Tab. III False Positive and Negative Rates in % of Forest Detection for the SCRF+I-PyraNet

Type of error         Jundiai-1  Jundiai-2  Jundiai-3  Manaus-1  Manaus-2  Manaus-3  Manaus-4  Recife-1  Recife-2  x̄
False positive (%)    11.97      6.05       2.57       1.08      1.13      1.25      6.61      1.28      1.42      3.71
False negative (%)    0.86       5.08       5.61       4.43      4.23      6.12      2.42      1.21      1.31      3.47
Amount of forest (%)  37.85      36.05      35.94      56.92     54.21     16.11     31.88     48.74     47.44     40.57
classifier with color inputs. The tables show the error rate for all the test images and the mean value for each classifier. First, the k-NN pixel-by-pixel and the unsupervised algorithms have the highest error rates among all the algorithms; the MLP classifier gives only a small improvement, reaching an error rate of 24.19%. On the other hand, the application of the SCRF model together with the PyraNet reduces the error rate by 1.61 percentage points compared to the classifier using the PyraNet alone. The application of the I-PyraNet reduces the error rate even further, reaching the lowest error rate of 7.18%. It is important to note that the lowest error rates for all the images occur when the SCRF model is applied. Table III presents the false positive and false negative rates for all the images using the SCRF model and the I-PyraNet classifier. The classification error is well distributed between the two types of errors, showing that the SCRF plus the I-PyraNet is a good approach to separate forest and not-forest regions in a satellite image. Figure 9 presents the segmentation of the Recife-2 image with the SCRF model and the I-PyraNet, the PyraNet and the k-NN classifiers. To produce a more reliable evaluation, a technique proposed in [26], the Probabilistic Rand Index, is used to measure the similarity between the images generated by the tested methods and the manually segmented images. This similarity function is very useful for assessing partition consistency and providing sensible aggregate scores over several images when rating different algorithms. Table IV presents the classification rate of all the classifiers in comparison with manually segmented images from the city of Jundiaí. Once more, the SCRF model with the I-PyraNet classifier reached the best classification rate.
In order to perform a more detailed comparison between the algorithms, Table V shows the computational time, in seconds, required to classify one image of 900×450 pixels. The time spent by the k-NN classifiers was the highest; in particular, the 100-NN, performing a pixel-by-pixel comparison, took seven hours to classify a single satellite image. The MLP classifier also presents a slow classification
Fig. 9 Test image segmentation example, (a) original image, (b) manually segmented image and (c) image segmented by the SCRF model with the I-PyraNet.
Tab. IV Probabilistic Rand Index over Jundiai-1

Algorithm  Classification Rate
SCRF-IPN   0.88
SCRF-PN    0.77
SCRF-NN    0.61
PN         0.73
100-NN     0.49
MLP        0.75
k-Means    0.64
Otsu       0.64
FCM        0.64
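For reference, the plain Rand index underlying the Probabilistic Rand Index [26] can be sketched as below: it counts the fraction of pixel pairs on which two segmentations agree about being in the same segment. The PRI of [26] extends this by averaging such agreements over a set of ground-truth segmentations; that extension is not implemented here, and the quadratic pair loop is only practical for small label maps.

```python
import numpy as np
from itertools import combinations

def rand_index(seg_a, seg_b):
    """Fraction of pixel pairs on which two segmentations agree about
    being in the same segment (the label names themselves do not matter).
    The Probabilistic Rand Index of [26] averages this kind of pairwise
    agreement over several ground-truth segmentations."""
    a, b = np.ravel(seg_a), np.ravel(seg_b)
    pairs = list(combinations(range(a.size), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)
```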
time, which might be explained by the fact that its classification is also done pixel-by-pixel. The unsupervised algorithms are faster than those, but compared with the other supervised algorithms they are much slower, with the exception of the Otsu method, which is the fastest of all. It is important to notice that the use of the SCRF model greatly decreases the classification time compared with the pixel-by-pixel approaches, while the use of inhibitory fields increased the classification time of the regular PyraNet by only 0.02 seconds. The SCRF model and the I-PyraNet classifier thus brought gains in classification rate without interfering with the time spent on the recognition task.

Tab. V Classification Speed in Seconds per Image of 900×450 Pixels

Algorithm  Time Spent
SCRF-IPN   0.58
SCRF-PN    0.56
SCRF-NN    77.25
PN         0.47
100-NN     25200.00
MLP        12.73
K-Means    2.05
Otsu       0.18
FCM        5.60
6. Conclusions
In this work, we proposed a new model to perform image segmentation and classification. The basic idea behind the model is to consider not only the value of a pixel but also its neighboring pixels when defining the pixel class. We also proposed a hybrid implementation of the PyraNet based on the idea of inhibitory fields, called I-PyraNet. The results obtained with the SCRF showed that it is a very good model when applied with any of the three classifiers, k-NN, PyraNet and I-PyraNet, the last one presenting the best results among all the classifiers. Although only these three classifiers have been used with the SCRF model, any other supervised classifier could be applied instead. We applied the SCRF model and the I-PyraNet classifier together to perform a forest detection task over satellite images. In some cases the proposed method achieves an error of only 2.49% in the amount of wrong pixels when compared to manually segmented images. In general, the errors of the SCRF plus the I-PyraNet were not higher than 10%. The results obtained with the methods proposed here are better than those obtained with the other classifiers, making this a good approach to the image segmentation task.
References

[1] X.-J. Cao, B.-C. Pan, S.-L. Zheng, and C.-Y. Zhang, "Motion object detection method based on piecemeal principal component analysis of dynamic background updating," International Conference on Machine Learning and Cybernetics, vol. 5, pp. 2932–2937, 2008.
[2] B.-Z. Shaick and L. Yaroslavsky, "Accelerating face detection by means of image segmentation," 4th EURASIP Conference focused on Video/Image Processing and Multimedia Communications, vol. 1, pp. 411–416, 2003.
[3] B. T. M. Cornet and L. J. M. Rothkrantz, "Recognition of car license plates using a neocognitron type of artificial neural network," Neural Network World, vol. 13, no. 2, pp. 115–132, 2003.
[4] M. Senouci, A. Liazid, H. A. Beghdadi, and D. Benhamamouch, "A segmentation method to handwritten word recognition," Neural Network World, vol. 17, no. 3, pp. 225–236, 2007.
[5] Y. Shimabukuro, V. Duarte, M. Moreira, E. Arai, B. Rudorff, L. Anderson, F. Santo, R. Freitas, L. Aulicino, L. Maurano, and J. Aragão, "Detecção de áreas desflorestadas em tempo real: conceitos básicos, desenvolvimento e aplicação do projeto DETER," Instituto Nacional de Pesquisas Espaciais, 2005.
[6] B. J. T. Fernandes, G. D. C. Cavalcanti, and T. I. Ren, "Classification and segmentation of visual patterns based on receptive and inhibitory fields," Eighth International Conference on Hybrid Intelligent Systems, pp. 126–131, 2008.
[7] C. Meurie, G. Lebrun, O. Lezoray, and A. Elmoataz, "A supervised segmentation scheme for cancerology color images," Signal Processing and Information Technology, vol. 14, pp. 664–667, 2003.
[8] S. L. Phung, A. Bouzerdoum, and D. Chai, "Skin segmentation using color pixel classification: analysis and comparison," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 1, pp. 148–154, 2005.
[9] S. Haykin, Neural Networks: A Comprehensive Foundation. IEEE Press, 1999.
[10] D. H. Hubel, "The visual cortex of the brain," Scientific American, no. 209, pp. 54–62, 1963.
[11] S. L. Phung and A. Bouzerdoum, "A pyramidal neural network for visual pattern recognition," IEEE Transactions on Neural Networks, vol. 18, no. 2, pp. 329–343, 2007.
[12] G. Rizzolatti and R. Camarda, "Inhibition of visual responses of single units in the cat visual area of the lateral suprasylvian gyrus (Clare-Bishop area) by the introduction of a second visual stimulus," Brain Research, vol. 88, no. 2, pp. 357–361, 1975.
[13] C. Grigorescu, N. Petkov, and M. A. Westenberg, "Contour detection based on nonclassical receptive field inhibition," IEEE Transactions on Image Processing, vol. 12, no. 7, pp. 729–739, 2003.
[14] M. Levine and J. Shefner, Fundamentals of Sensation and Perception. Oxford University Press, 2000.
[15] J. I. Nelson and B. J. Frost, "Orientation-selective inhibition from beyond the classic visual receptive field," Brain Research, vol. 139, pp. 359–365, 1978.
[16] H. C. Nothdurft, J. L. Gallant, and D. C. van Essen, "Response modulation by texture surround in primate area V1: correlates of 'popout' under anesthesia," Visual Neuroscience, vol. 16, no. 1, pp. 15–34, 1999.
[17] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, no. 4, pp. 541–551, 1989.
[18] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[19] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, pp. 318–362, 1986.
[20] Y. Venkatesh and S. K. Raja, "On the classification of multispectral satellite images using the multilayer perceptron," Pattern Recognition, vol. 36, no. 9, pp. 2161–2175, 2002.
[21] K. Venkatalakshmi, S. Sridhar, and S. Mercy Shalinie, "Neuro-statistical classification of multispectral images based on decision fusion," Neural Network World, vol. 16, no. 2, pp. 97–107, 2006.
[22] J. B. MacQueen, "Some methods for classification and analysis of multivariate observations," Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297, 1967.
[23] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Transactions on Systems, Man and Cybernetics, vol. 9, pp. 62–66, 1979.
[24] J. C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters," Journal of Cybernetics, vol. 3, no. 3, pp. 32–57, 1973.
[25] "Google Maps™." http://maps.google.com.
[26] R. Unnikrishnan and M. Hebert, "Measures of similarity," IEEE Workshop on Applications of Computer Vision, pp. 394–400, 2005.