LETTER

Communicated by Jason Kinser

Geometry-Invariant Texture Retrieval Using a Dual-Output Pulse-Coupled Neural Network

Xiaojun Li [email protected]

Yide Ma [email protected]

Zhaobin Wang [email protected]

Wenrui Yu [email protected]

School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu Province 730000, China

Neural Computation 24, 194–216 (2012)  © 2011 Massachusetts Institute of Technology

This letter proposes a novel dual-output pulse-coupled neural network model (DPCNN). The new model is applied to obtain a texture description that is more stable under geometric transformation. Time series, computed from the output binary images of the DPCNN, are employed as translation-, rotation-, scale-, and distortion-invariant texture features. In the experiments, the DPCNN is tested extensively on Brodatz's album and the VisTex database, and several existing models are compared with it. The experimental results, based on different testing data sets for images with different translations, orientations, scales, and affine transformations, show that our proposed model outperforms existing models in geometry-invariant texture retrieval. Furthermore, the robustness of the DPCNN to noisy data is examined in the experiments.

1 Introduction

Texture analysis is of great importance in the analysis and interpretation of images. Texture can reflect the overall visual patterns or subpatterns of local variations within the gray-scale content and shape of objects, and it can be found in almost all digital pictures. Many existing texture analysis methods assume that all texture images are identical with respect to rotation, scale, and translation. However, it is hard to ensure that all images have been captured at the same angle, focal length, and position. Thus, the extraction of invariant texture features has been investigated extensively (Fountain, Tan, & Baker, 1998; Zhang & Tan, 2002). The most popular methods for texture retrieval are statistics-based, model-based, or filtering approaches, such as the gray-level co-occurrence matrix (GLCM) (Soh & Tsatsoulis, 1999), the hidden Markov model (HMM) (Chen & Kundu, 1994), the gaussian Markov random field (GMRF) model (Cohen, Fan, & Patel, 1991; Deng & Clausi, 2004), Gabor filters (Manjunath & Ma, 1996; Han & Ma, 2007), and the wavelet transform (Manian & Vasquez, 1998; Muneeswaran, Ganesan, Arumugam, & Soundar, 2005).

The pulse-coupled neural network (PCNN), which is biologically inspired by pulse synchronization within the mammalian visual cortex, has been widely applied to image processing and pattern recognition (Ranganath, Kuntimad, & Johnson, 1995; Lindblad & Kinser, 2005; Wang, Ma, Cheng, & Yang, 2010; Ma, Zhan, & Wang, 2010). The PCNN is also an effective tool for texture retrieval: since the series of pulse images it produces can represent the edges, texture, and segments of the original image, the PCNN is capable of extracting effective features. Pioneering invariant feature extraction with the PCNN, Johnson (1994) proposed summing the 1's in each binary pulse image to obtain a time series invariant to translation, rotation, scale, distortion, and intensity. Zhang, Zhan, and Ma (2007) also used the PCNN for antinoise invariant texture retrieval. Kinser and colleagues proposed the intersecting cortical model (ICM) (Kinser, 1996; Ekblad, Kinser, Atmer, & Zetterlund, 2004), and Zhan et al. described the spiking cortical model (SCM) (Zhan, Zhang, & Ma, 2009); both are simplified versions of the PCNN, and both can improve accuracy in invariant texture retrieval (Wang & Kinser, 2004; Zhan et al., 2009).

Though the standard PCNN has commendable ability in feature extraction, it has some limitations: there is only one pulse generator in a neuron model; the influence of the neighborhood on the current neuron does not take the input signal into account; and the external stimulus remains unchanged. This letter proposes a dual-output PCNN model (DPCNN) that introduces some modifications to the standard PCNN to overcome these weaknesses. Experimental results show that the proposed method is not only feasible and efficient in geometry-invariant texture retrieval but also robust to noise.

The rest of this letter is organized as follows. In section 2, the basic theory of the standard PCNN model is briefly reviewed, and the proposed DPCNN model is introduced in detail. Section 3 presents a geometry-invariant texture retrieval algorithm based on the DPCNN. Experimental results and comparisons are provided in section 4. Section 5 summarizes this work.

2 PCNN and DPCNN Models

Since the proposed DPCNN model is a modified version of the standard PCNN model, we review the PCNN model briefly and then present the improvements made for practical application.

2.1 Standard PCNN Model. The PCNN is a laterally connected feedback network of pulse-coupled neurons.

Figure 1: Structure of the PCNN model.

As shown in Figure 1, each neuron has three parts: the receptive field, the modulation field, and the pulse generator.

The mathematical description of the PCNN is given by the following equations (Johnson, Ranganath, Kuntimad, & Caulfield, 1998; Johnson & Padgett, 1999):

F_{ij}[n] = e^{-\alpha_F} F_{ij}[n-1] + V_F \sum_{kl} M_{ijkl} Y_{kl}[n-1] + I_{ij},   (2.1)

L_{ij}[n] = e^{-\alpha_L} L_{ij}[n-1] + V_L \sum_{kl} W_{ijkl} Y_{kl}[n-1],   (2.2)

U_{ij}[n] = F_{ij}[n] \left( 1 + \beta L_{ij}[n] \right),   (2.3)

Y_{ij}[n] = \begin{cases} 1 & U_{ij}[n] > E_{ij}[n] \\ 0 & \text{otherwise} \end{cases},   (2.4)

E_{ij}[n+1] = e^{-\alpha_E} E_{ij}[n] + V_E Y_{ij}[n],   (2.5)

where the indexes (i, j) and (k, l) refer to the current neuron and its neighborhood, respectively, n denotes the iteration number, and I is the external stimulus. V_F, V_L, and V_E are normalizing constants; \alpha_F, \alpha_L, and \alpha_E are the time constants; and \beta is the linking strength of the PCNN. F and L are the two kinds of inputs in the receptive field. The linking input L receives a local stimulus from the neighbors via the linking synapse W, while the feeding input F receives both a local stimulus and an external stimulus. In the modulation field, the internal activity U is generated by the modulation of F and L. The dynamic threshold E receives feedback from the state of the output Y: if the neuron fires, the dynamic threshold is increased by V_E; otherwise, it decays gradually with each iteration. When the internal activity surpasses the dynamic threshold, the output pulse is produced.
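To make equations 2.1 to 2.5 concrete, the following is a minimal numpy/scipy sketch of one synchronous PCNN iteration over a whole image. It is an illustration only: the default parameter values and the use of scipy.ndimage.convolve for the neighborhood sums are our assumptions, not the settings or code of this letter.

```python
# A minimal sketch of one PCNN iteration (eqs. 2.1-2.5).
# Parameter defaults are illustrative assumptions only.
import numpy as np
from scipy.ndimage import convolve

def pcnn_step(I, F, L, E, Y, M, W, alpha_F=0.1, alpha_L=1.0, alpha_E=0.5,
              V_F=0.5, V_L=0.2, V_E=20.0, beta=0.1):
    """One synchronous update of all neurons; I is the normalized image."""
    F = np.exp(-alpha_F) * F + V_F * convolve(Y, M, mode='constant') + I  # eq. 2.1
    L = np.exp(-alpha_L) * L + V_L * convolve(Y, W, mode='constant')      # eq. 2.2
    U = F * (1.0 + beta * L)                                              # eq. 2.3
    Y = (U > E).astype(float)                                             # eq. 2.4
    E = np.exp(-alpha_E) * E + V_E * Y                                    # eq. 2.5
    return F, L, E, Y
```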


Figure 2: Structure of the DPCNN model.

2.2 DPCNN Model. To make the PCNN more suitable for feature extraction, we propose a new, improved model, the DPCNN. The neuron structure of the improved model is shown in Figure 2. Compared with the PCNN, there are two pulse generators in the DPCNN, the external stimulus S changes according to the current outputs Y^F and Y^U, and the local stimulus received from the neighbors is controlled by the external stimulus S_ij. The following expressions describe its mathematical model:

F_{ij}[n] = f F_{ij}[n-1] + S_{ij}[n] \left( \gamma + V_F \sum_{kl} M_{ijkl} Y^U_{kl}[n-1] \right),   (2.6)

Y^F_{ij}[n] = \begin{cases} 1 & F_{ij}[n] > E_{ij}[n] \\ 0 & \text{otherwise} \end{cases},   (2.7)

U_{ij}[n] = F_{ij}[n] + V_U S_{ij}[n] \sum_{kl} W_{ijkl} Y^F_{kl}[n],   (2.8)

Y^U_{ij}[n] = \begin{cases} 1 & U_{ij}[n] > E_{ij}[n] \\ 0 & \text{otherwise} \end{cases},   (2.9)

E_{ij}[n+1] = g E_{ij}[n] + V_E Y^U_{ij}[n],   (2.10)

S_{ij}[n+1] = \left( 1 - Y^U_{ij}[n] + Y^F_{ij}[n] \right) S_{ij}[n] + \left( Y^U_{ij}[n] - Y^F_{ij}[n] \right) A_{ij}.   (2.11)

In the DPCNN model, f and g are decay constants that are less than 1, S stands for the external stimulus, and M and W are the connection weights through which the neuron communicates with its neighbors. \gamma is a constant that determines the feeding strength of the external stimulus. The feeding output Y^F depends on the feeding input F and the dynamic threshold E, while the compensating output Y^U is determined by comparing the internal activity U with the dynamic threshold E. A is the adjustment signal.

Each neuron of the DPCNN is a dynamic neuron and may generate pulses when stimulated by its feeding input or internal activity. First, the feeding input F_ij receives an external stimulus and the compensating outputs of surrounding neurons. When F_ij is greater than E_ij, neuron ij produces the feeding output pulse. Next, the feeding outputs of surrounding neurons, the feeding input, and the external stimulus work together to change U_ij. Once the internal activity of neuron ij exceeds its dynamic threshold, the compensating output pulse is generated. Finally, the dynamic threshold E_ij and the external stimulus S_ij are updated.

Before describing the implementation of the DPCNN, we define some symbols: • multiplies each element of one matrix by the corresponding element of the other, × denotes the multiplication of a matrix by a constant, and ⊗ indicates a two-dimensional convolution. The DPCNN is implemented as follows:

1. Initialize parameters and matrices: F = Y^F = Y^U = 0, E = 1. The total number of iterations is N, and the iteration number is n = 1. I is the input signal, normalized between 0 and 1, and S = I. The parameters (M, W, f, g, \gamma, V_F, V_U, and V_E) are determined manually, among which f and g are greater than 0 and less than 1. Generally K = M = W and A_{ij} = \sum_{kl} K_{ijkl} I_{kl} / \sigma_{ij}, where \sigma_{ij} = \sum_{kl} K_{ijkl}.
2. F[n] = f × F[n-1] + S[n] • (\gamma + V_F × (Y^U[n-1] ⊗ M)). If F_ij[n] > E_ij[n], then Y^F_ij[n] = 1; else Y^F_ij[n] = 0.
3. U[n] = F[n] + V_U × S[n] • (Y^F[n] ⊗ W). If U_ij[n] > E_ij[n], then Y^U_ij[n] = 1; else Y^U_ij[n] = 0.
4. If Y^U_ij[n] = 0, then E_ij[n+1] = g × E_ij[n]; otherwise E_ij[n+1] = V_E + g × E_ij[n].
5. Update S[n+1] = (1 − Y^U[n] + Y^F[n]) • S[n] + (Y^U[n] − Y^F[n]) • A.
6. If n < N, then set n = n + 1 and go back to step 2; else end.
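The six steps above translate almost line for line into numpy. The sketch below is a hedged reference implementation: it uses the 3 × 3 kernel K and the Table 1 parameter values given later in section 4.1, assumes zero padding at the image border (the letter does not specify boundary handling), and already accumulates the per-iteration pulse count that section 3 uses as the feature vector.

```python
# A numpy transcription of implementation steps 1-6; border handling and
# dtype details are assumptions, not taken from the letter.
import numpy as np
from scipy.ndimage import convolve

K = np.array([[0.707, 1.0, 0.707],
              [1.0,   0.0, 1.0  ],
              [0.707, 1.0, 0.707]])          # M = W = K (see section 4.1)

def dpcnn_features(I, N=37, f=0.3, g=0.8, gamma=2.0,
                   V_F=0.14, V_U=0.14, V_E=7.0):
    """Run the DPCNN on a [0, 1]-normalized image I; return the series G."""
    S = I.astype(float).copy()                      # step 1: S = I
    F = np.zeros_like(S); YU = np.zeros_like(S)
    E = np.ones_like(S)                             # step 1: E = 1
    sigma = convolve(np.ones_like(S), K, mode='constant')
    A = convolve(S, K, mode='constant') / sigma     # A_ij = sum(K * I) / sigma
    G = np.zeros(N)
    for n in range(N):
        # step 2: feeding input and feeding output (eqs. 2.6 and 2.7)
        F = f * F + S * (gamma + V_F * convolve(YU, K, mode='constant'))
        YF = (F > E).astype(float)
        # step 3: internal activity and compensating output (eqs. 2.8 and 2.9)
        U = F + V_U * S * convolve(YF, K, mode='constant')
        YU = (U > E).astype(float)
        # step 4: threshold decay plus feedback (eq. 2.10)
        E = g * E + V_E * YU
        # step 5: external stimulus update (eq. 2.11)
        S = (1.0 - YU + YF) * S + (YU - YF) * A
        # pulse count per iteration, used as the feature G[n] in section 3
        G[n] = YU.sum()
    return G
```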

The DPCNN model inherits some good properties from the standard PCNN, such as the mechanism of synchronous pulse bursts and the exponential attenuation characteristic of the threshold. The exponential attenuation of the threshold can be seen in equation 2.10. Suppose the first compensating output pulse of neuron ij is generated at time n_0 and generated again after T iterations. The threshold at the first iteration is E_{ij}[1] = g E_{ij}[0] + V_E Y^U_{ij}[0] = g. When n \le n_0, we get

E_{ij}[n] = g^n = e^{n \ln(g)}.   (2.12)


Figure 3: Exponential attenuation curve of the dynamic threshold (g = 0.8).

When n_0 < n \le n_0 + T, the threshold at the nth iteration is

E_{ij}[n] = g^n + V_E g^{n - n_0 - 1} = \left( 1 + V_E g^{-(n_0 + 1)} \right) e^{n \ln(g)} = \varepsilon e^{n \ln(g)},   (2.13)

where \varepsilon is a constant. A similar conclusion, that the dynamic threshold decreases according to an exponential law, holds after the (n_0 + T)th iteration. The exponential attenuation of the dynamic threshold is shown in Figure 3.

Suppose the compensating output pulse of neuron ij is produced at time m. Then U_{ij}[m] \approx E_{ij}[m] = \varepsilon e^{m \ln(g)}, so

m \approx \frac{\ln(U_{ij}[m])}{\ln(g)} - \frac{\ln(\varepsilon)}{\ln(g)}.   (2.14)

And according to equations 2.6 and 2.8, we have

U_{ij}[m] = f F_{ij}[m-1] + S_{ij}[m] \left( \gamma + V_F \sum_{kl} M_{ijkl} Y^U_{kl}[m-1] + V_U \sum_{kl} W_{ijkl} Y^F_{kl}[m] \right).   (2.15)


Because the influence of the neighborhood on the current neuron is modulated by the external stimulus S_ij in the DPCNN, U_ij can be regarded as an accumulation of S_ij; in other words, the internal activity U is actually another expression of the external stimulus. Therefore, if we consider the time needed to produce the compensating output pulse as the perception of the external stimulus, equation 2.14 means that the DPCNN conforms to the Weber-Fechner law (i.e., the relationship between stimulus and perception is logarithmic). Thus, the DPCNN model agrees with the human visual system. Furthermore, it can be seen from Figure 3 that neurons with less internal activity have greater resolving power than neurons with more internal activity: a given contrast variation at high intensity is processed coarsely, while the same variation at low intensity is processed accurately.

As exhibited in Figure 2, the DPCNN neuron model has two pulse generators: one generates the feeding output Y^F, and the other produces the compensating output Y^U. For a noisy image, the neighbors that ought to fire synchronously with a contaminated neuron may be stimulated earlier or later in the operation of the PCNN, inevitably resulting in an unstable output pulse image. The DPCNN neurons, however, have an opportunity to capture the neighboring pulses generated at the current iteration (see equations 2.8 and 2.9). This compensating mechanism increases the probability that a contaminated neuron bursts synchronously with its neighbors. In addition, the external stimulus S may change according to equation 2.11: once a neuron is stimulated by its internal activity rather than its feeding input at the current iteration (i.e., Y^U_ij[n] = 1 while Y^F_ij[n] = 0), it updates its external stimulus with the adjustment value.

Compared with the standard PCNN model, the DPCNN model therefore produces more stable output pulse images and can automatically change the external stimulus of a neuron. Furthermore, the DPCNN model is believed to be consistent with human visual characteristics. These improved characteristics are advantageous for image processing.

3 Geometry-Invariant Texture Retrieval

This section introduces the geometry-invariant texture retrieval algorithm using the DPCNN.

3.1 Feature Extraction and Similarity Measurement. To identify a query texture in a texture database, two basic tasks need to be addressed in a typical content-based texture retrieval system. The first is to generate features that accurately represent the content of each texture, and the second is to find an effective similarity measurement that quantifies the difference between two textures.


The outputs of the PCNN or DPCNN are a series of binary images that describe the magnitude of the external inputs and the connections among neurons. Johnson (1994) first constructed a time series from the output binary images of the PCNN as a feature vector that is invariant to translation, rotation, scaling, and distortion of the input image. Similarly, a geometry-invariant feature vector is obtained from the DPCNN:

G[n] = \sum_{ij} Y^U_{ij}[n],   (3.1)

where n = 1, 2, \ldots, N, Y^U[n] is the compensating output of the DPCNN at the nth iteration, and N is the total number of iterations. (Only the compensating output Y^U is used for the texture retrieval application.)

The Brodatz album (Brodatz, 1966) is a well-known texture database for evaluating texture recognition algorithms; it contains 112 different texture classes. Some examples of feature vectors of texture images are shown in Figures 4 to 6. In Figure 4, the feature vectors of texture D1 in the Brodatz album at different scales and rotation angles are similar to the feature vectors of the original texture. Figure 5 indicates that feature vectors are affected little by translation. Figure 6, which gives the feature vectors of 12 textures, shows that each texture has its own distinctive feature vector G.

Similarity between two images is obtained by computing the distance between their corresponding feature vectors. Let G^{(q)} be the feature vector of a query image and G^{(p)} be one of the feature vectors in the database, with the distance between the two vectors defined as

d(q, p) = \sum_n d_n(q, p),   (3.2)

where

d_n(q, p) = \frac{\left| G^{(q)}[n] - G^{(p)}[n] \right|}{G^{(q)}[n] + G^{(p)}[n]}.   (3.3)

Note that if G^{(q)}[n] + G^{(p)}[n] = 0, we set d_n(q, p) = 0.
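Assuming the feature vectors are one-dimensional numpy arrays of pulse counts, the distance of equations 3.2 and 3.3 can be sketched as follows; the masking of zero denominators implements the convention just stated.

```python
# A minimal sketch of the similarity measurement (eqs. 3.2 and 3.3).
import numpy as np

def feature_distance(Gq, Gp):
    """d(q, p) = sum_n |Gq[n] - Gp[n]| / (Gq[n] + Gp[n]), with 0/0 := 0."""
    Gq = np.asarray(Gq, dtype=float)
    Gp = np.asarray(Gp, dtype=float)
    den = Gq + Gp
    safe = np.where(den > 0, den, 1.0)                 # avoid division by zero
    terms = np.where(den > 0, np.abs(Gq - Gp) / safe, 0.0)
    return float(terms.sum())
```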

3.2 Implementation of the Geometry-Invariant Texture Retrieval Algorithm. The DPCNN used in this letter is a single-layer, two-dimensional array of laterally linked neurons. There is a one-to-one correspondence between neurons and image pixels, so the number of neurons in the network equals the number of pixels in the input image, and the pixel value I_ij is regarded as the input signal of neuron ij. The architecture of the geometry-invariant texture retrieval system is shown in Figure 7.


Figure 4: Samples with different scales (s) and rotation angles (r in degrees) extracted from texture D1 in Brodatz’s album and their corresponding feature vectors. The horizontal axis represents the iteration number n, and the vertical axis shows G[n]/10000.


Figure 5: Samples with 40 pixel circular shifts (t) in the right direction for the scaled and rotated texture D1 and their corresponding feature vectors.


Figure 6: Twelve textures in Brodatz’s album and their corresponding feature vectors.


Figure 7: Architecture of the geometry-invariant texture retrieval system.

It consists of the following steps:

1. Run the DPCNN algorithm on the given query texture image to obtain N compensating output images.
2. Extract the feature vector of size 1 × N.
3. Compute the distances between the feature vector of the query texture image and the feature vectors of the texture images stored in the database.
4. Rank the texture images of the database according to the distances of their feature vectors to the query one. The ranking is the search result.
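Put together, the four steps amount to a few lines. This sketch assumes the dpcnn_features and feature_distance helpers from the earlier sketches and a database stored as (name, feature vector) pairs; both the helpers and the storage layout are our assumptions.

```python
# A sketch of the four retrieval steps, using the earlier helper sketches.
def retrieve(query_image, database, top_R=16):
    """Rank database entries by distance to the query's DPCNN features."""
    Gq = dpcnn_features(query_image)                          # steps 1 and 2
    ranked = sorted(database, key=lambda nv: feature_distance(Gq, nv[1]))
    return ranked[:top_R]                                     # steps 3 and 4
```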

4 Experimental Results

In this section, we present the experimental details and results. Since different parameters of the DPCNN produce different effects, the section begins with the parameter setting. The criteria for performance evaluation are then given in order to verify the capability of the proposed algorithm. In the experiments, the PCNN, ICM, SCM, and multichannel Gabor filtering methods serve as comparison algorithms. The parameter settings of the PCNN are the same as those in Zhang et al. (2007), the parameter settings of the ICM are the same as those in Wang and Kinser (2004), and the parameters of the SCM are set as in Zhan et al. (2009). Details about retrieval by the multichannel Gabor filtering method are presented in Manjunath and Ma (1996). Since equation 3.3 is applicable only to time series, the distance measurement used with the multichannel Gabor filtering method is the one used in Manjunath and Ma (1996).

4.1 Parameter Setting. The DPCNN has many options for the connection weight between two neurons. In this letter, the connection weight varies inversely with the Euclidean distance between two neurons. The connection weight matrix is defined as a λ × λ two-dimensional matrix, where λ is an odd integer. A large λ makes simulation costly in time, so the connection weight matrices M and W are given by [0.707, 1, 0.707; 1, 0, 1; 0.707, 1, 0.707]. The other parameters of the DPCNN are shown in Table 1. Note that these parameters are set manually, based on experience.


Table 1: Parameters Set for DPCNN.

Parameter   f     g     γ    V_F    V_U    V_E   N
Value       0.3   0.8   2    0.14   0.14   7     37
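The 3 × 3 matrix above is simply the inverse-Euclidean-distance rule written out. A small sketch of that rule for a general odd kernel size λ follows; the helper is hypothetical, not from the letter, and λ = 3 reproduces the matrix given in the text.

```python
# A sketch of the inverse-Euclidean-distance connection weights.
import numpy as np

def connection_weights(lam=3):
    """lam x lam kernel with weight 1 / Euclidean distance, 0 at the center."""
    c = lam // 2
    y, x = np.mgrid[-c:c + 1, -c:c + 1]
    d = np.hypot(x, y)
    Wt = np.zeros((lam, lam))
    Wt[d > 0] = 1.0 / d[d > 0]
    return np.round(Wt, 3)     # lam = 3 gives [[0.707, 1, 0.707], ...]
```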

4.2 Retrieval Performance Evaluation. In texture retrieval, if subsamples are extracted from each texture image and defined as one class, the success retrieval rate can be defined as the percentage of correctly classified images among the top R images (ranked by shortest distance to the query texture image). The mean success retrieval rate over all queries is used to evaluate the retrieval performance on a data set.

Another useful evaluation tool is the precision-recall graph. Precision is the ratio of the number of relevant images retrieved to the total number of images retrieved, and recall is the ratio of the number of relevant images retrieved to the total number of relevant images. Precision is a measure of exactness, whereas recall is a measure of completeness. It is easy to achieve 100% recall by returning all the images in the database as the answer set; recall alone is therefore not enough to evaluate retrieval performance, and precision must be considered as well. The precision-recall graph describes both the exactness and the completeness of the retrieval result and is used to evaluate retrieval efficiency.

4.3 Experiments for Texture Images with Translation, Rotation, and Scale Changes. The performance of the proposed DPCNN model has been tested extensively in several experiments using all 112 texture images in Brodatz (1966). To obtain the scaled and rotated data sets, we transform the Brodatz images by bicubic interpolation.

For data set 1 (translation changes), we crop each 512 × 512 Brodatz image to the standard dimensions of 128 × 128 and then extract 16 subsamples from each by 16 circular shifts (20 to 80 pixels at 20-pixel intervals in all four directions: left, right, top, and bottom), creating a data set of 1792 texture images. Data set 2 is obtained by expanding all the Brodatz texture images into 2016 different 128 × 128 images with 18 orientations (0 to 170 degrees at 10-degree intervals). For data set 3, with 1680 images, we extract 15 subsamples of size 128 × 128 at 15 scales (0.7 to 1.4 at 0.05 intervals) from each 512 × 512 Brodatz image. For data set 4, with 8960 (5 × 4 × 4 × 112) texture images, we extract 80 subsamples of size 128 × 128 with five scales (0.7, 0.85, 1, 1.15, and 1.3), four orientations (0, 45, 90, and 135 degrees), and four circular shifts (40-pixel intervals in all four directions: left, right, top, and bottom) from each Brodatz image.
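Before turning to the results, here is a hedged sketch of the section 4.2 measures. How class labels are attached to ranked results is our assumption; the letter does not specify an implementation.

```python
# Sketches of the success rate and precision/recall measures of section 4.2.
def success_rate(ranked_labels, query_label, R):
    """Fraction of the top-R ranked images that share the query's class."""
    return sum(lbl == query_label for lbl in ranked_labels[:R]) / R

def precision_recall(ranked_labels, query_label, cutoff, n_relevant):
    """Precision and recall after retrieving `cutoff` images."""
    hits = sum(lbl == query_label for lbl in ranked_labels[:cutoff])
    return hits / cutoff, hits / n_relevant
```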


Table 2: Comparison Results Between the Proposed DPCNN Method and Other Methods.

Mean Success Retrieval Rates (%)

Data Set                      DPCNN   PCNN    ICM     SCM     Gabor
Data set 1 (translation)      100     99.47   100     100     98.97
Data set 2 (rotation)         98.20   87.34   99.27   94.95   85.82
Data set 3 (scale)            91.86   71.41   79.18   87.59   78.40
Data set 4 (mixed changes)    88.47   64.54   77.20   82.87   61.58
Number of features            37      37      37      37      48

Table 2 compares the proposed DPCNN with the other methods. All of the methods handle shifted texture images well. The results for the rotated data set 2 show that the mean success rate of DPCNN is slightly lower than that of ICM and higher than those of the others; the impact of rotation on retrieval with DPCNN is thus small. The experiments with data set 3 show that DPCNN effectively improves on the scale invariance of PCNN. For the experiments with jointly scaled, rotated, and shifted texture images in data set 4, the mean success retrieval rates decrease in the following order: DPCNN, SCM, ICM, PCNN, and Gabor. When both the exactness and the completeness of the retrieval result on data set 4 are considered at the same time, Figure 8 indicates that DPCNN performs better than the other methods.

It is evident from Table 2 that the mean success retrieval rates of DPCNN, PCNN, ICM, and SCM are higher than those of the multichannel Gabor filtering method in most cases. The reason is that DPCNN, PCNN, ICM, and SCM are inspired by mammalian visual cortex neurons and are closer to biological neurons than multichannel Gabor filters are. In addition, DPCNN, ICM, and SCM are versions of PCNN modified for practical applications in image processing; hence, their retrieval results are consistently better than those of PCNN.

4.4 Noise Robustness. The experiments in this section examine the robustness of DPCNN to noisy data. Data set 5 is produced by adding gaussian noise with zero mean and 11 variances (0.01 to 0.02 at 0.001 intervals) to each texture image from Brodatz. (Before adding gaussian noise, the texture images are cropped to the standard dimensions of 128 × 128, and their pixel values are normalized between 0 and 1.) Thus, 1232 noisy 128 × 128 images are generated in data set 5. For data set 6, we crop each 512 × 512 Brodatz image to the standard dimensions of 128 × 128 and then extract 11 subsamples from each by adding impulse noise with 11 noise densities (5% to 20% at 1.5% intervals), again creating a data set of 1232 noisy images.
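A sketch of how data sets 5 and 6 could be generated follows. The use of skimage.util.random_noise, and the reading of "impulse noise" as salt-and-pepper noise, are our assumptions rather than the authors' code.

```python
# A hedged sketch of generating the noisy data sets 5 and 6.
import numpy as np
from skimage.util import random_noise

def make_noisy_sets(image):                 # image: 128x128, values in [0, 1]
    gaussian = [random_noise(image, mode='gaussian', mean=0.0, var=v)
                for v in np.arange(0.01, 0.0201, 0.001)]     # data set 5
    impulse = [random_noise(image, mode='s&p', amount=d)
               for d in np.arange(0.05, 0.2001, 0.015)]      # data set 6
    return gaussian, impulse
```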


Figure 8: Precision-recall graph for the jointly scaled, rotated, and shifted data set 4.

Table 3: Mean Success Retrieval Rates for Noisy Images.

Mean Success Retrieval Rates (%)

Data Set                      DPCNN   PCNN    ICM     SCM     Gabor
Data set 5 (gaussian noise)   92.82   73.84   86.24   84.03   99.61
Data set 6 (impulse noise)    93.67   49.24   67.76   98.45   64.37

The experimental results on the noisy data sets are summarized in Table 3. The multichannel Gabor filtering method is almost immune to gaussian noise but performs poorly on the impulse-noise data set. SCM tolerates impulse noise better than the other four methods, but its retrieval results are more easily influenced by gaussian noise than those of DPCNN and ICM. Table 3 also shows that the proposed DPCNN performs well on both the gaussian and the impulse-noise data sets.

Further experiments were carried out to examine the noise robustness of the five retrieval methods. First, we crop all texture images in Brodatz to the standard dimensions of 128 × 128 and compute the feature vector of each image as a feature database. For the experiment with gaussian noise, we obtain query images by adding to Brodatz's texture images gaussian noise with zero mean and a variance that depends on the required peak signal-to-noise ratio (PSNR).
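For pixel values normalized to [0, 1], the mean squared error introduced by zero-mean gaussian noise equals its variance, so a target PSNR fixes the variance directly: PSNR = 10 log10(1 / MSE), hence var = 10^(-PSNR/10). A minimal sketch, assuming this normalization:

```python
# A sketch of fixing the gaussian noise variance from a target PSNR,
# assuming pixel values in [0, 1] so that MSE equals the noise variance.
import numpy as np

def gaussian_for_psnr(image, psnr_db, rng=None):
    rng = rng or np.random.default_rng()
    var = 10.0 ** (-psnr_db / 10.0)
    noisy = image + rng.normal(0.0, np.sqrt(var), size=image.shape)
    return np.clip(noisy, 0.0, 1.0)
```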


Figure 9: Performance of five methods in the presence of different levels of gaussian noise.

For the experiment with impulse noise, we prepare query images by adding impulse noise with different noise densities to the texture images. The correct rate is defined as B/112, where B is the number of correctly retrieved texture images. The correct rates at different PSNRs are shown in Figure 9. The multichannel Gabor filtering method performs best with regard to gaussian noise. In addition, the correct rates of DPCNN are always higher than those of SCM and PCNN; DPCNN performs better than ICM when the PSNR is below 20 dB and otherwise performs similarly to ICM. The results for impulse noise are summarized in Figure 10: DPCNN outperforms all the other methods except SCM at the different levels of impulse noise.

4.5 Affine Robustness. In order to examine the affine robustness of the proposed method, the affine-transformed data set 7 is produced from Brodatz. The slopes of the shear transformations are drawn uniformly at random between −1 and 1, and the rotation angles are drawn uniformly at random from 0 to 180 degrees.


Figure 10: Performance of five methods in the presence of different levels of impulse noise.

The scales are generated uniformly at random between 0.6 and 1.4, while the aspect ratios of the affine warps are distributed between 1 and 1.4:0.6. In this way, the Brodatz images are cut into 1680 128 × 128 images by transforming each original texture image into 15 affine versions. Some examples are shown in Figure 11.

Figures 12 and 13 compare the DPCNN with the other methods and indicate that the proposed DPCNN method is superior on affine-transformed texture images. Another observation is that the mean success retrieval rates of DPCNN, PCNN, ICM, and SCM are higher than that of the multichannel Gabor filtering method on affine-transformed data set 7. In fact, if a texture is seriously distorted, the filtering result of a Gabor filter at a given scale and orientation also changes greatly; for a distorted texture, the features produced by Gabor filters are therefore unstable. The features produced by DPCNN, PCNN, ICM, and SCM are more stable by virtue of their mechanism of synchronous pulse bursts: neurons with similar characteristics (similar pixel values and similar neighborhoods) fire synchronously. Thus, the retrieval results of DPCNN, PCNN, ICM, and SCM on the affine-transformed data set are better than those of the multichannel Gabor filtering method.
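A sketch of sampling one random affine version per the parameters above follows. The composition into a single 2 × 2 matrix, the isotropic-scale simplification (the varying aspect ratios are omitted for brevity), and the use of scipy.ndimage.affine_transform are our assumptions about details the letter leaves open.

```python
# A hedged sketch of the random affine warps of section 4.5.
import numpy as np
from scipy.ndimage import affine_transform

def random_affine(image, rng=None):
    rng = rng or np.random.default_rng()
    k = rng.uniform(-1.0, 1.0)                    # shear slope in [-1, 1]
    theta = np.deg2rad(rng.uniform(0.0, 180.0))   # rotation angle
    s = rng.uniform(0.6, 1.4)                     # isotropic scale
    shear = np.array([[1.0, k], [0.0, 1.0]])
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    A = s * rot @ shear                           # combined forward warp
    # affine_transform maps output coordinates to input coordinates,
    # so pass the inverse of A and warp about the image centre.
    centre = (np.array(image.shape) - 1) / 2.0
    Ainv = np.linalg.inv(A)
    offset = centre - Ainv @ centre
    return affine_transform(image, Ainv, offset=offset, order=3)
```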


Figure 11: Four affine transformed textures used in the affine robustness experiment.

Figure 12: Comparison results of the DPCNN with other methods for affine transformed data set 7.

4.6 Experiments Based on the MIT VisTex Database. We also tested the effectiveness of the DPCNN method in several experiments using the complete set of 167 homogeneous texture images in MIT's VisTex database (MIT Vision and Modeling Group, 1998). (We converted the RGB images in the VisTex database to gray-scale images before the experiments.) Data set 8 is obtained by expanding all VisTex texture images into 3340 128 × 128 images with five scales (0.7, 0.85, 1, 1.15, and 1.3) and four orientations (0, 45, 90, and 135 degrees) by bicubic interpolation.


Figure 13: Precision recall graph for affine transformed data set 7.

Figure 14: Comparison results of the proposed DPCNN with other methods for joint scaled and rotated data set 8.

Figure 14 gives the mean success retrieval rates of all the methods for data set 8, and the performance in terms of precision and recall is shown in Figure 15. From Figures 14 and 15, it is clear that DPCNN performs better than the others for jointly scaled and rotated texture images, even though these texture images are derived from the VisTex database.


Figure 15: Precision recall graph for joint scaled and rotated data set 8.

Figure 16: Comparison results of the proposed DPCNN with other methods for affine transformed data set 9.

We also carried out affine robustness experiments on MIT's VisTex database. To obtain the affine-transformed data set 9, 15 affine versions of each texture image from the VisTex database are produced in the same way as in section 4.5. The experimental results are summarized in Figures 16 and 17, which demonstrate that the proposed DPCNN method outperforms the other methods in affine-invariant texture retrieval.


Figure 17: Precision recall graph for affine transformed data set 9.

5 Conclusion

This letter has presented a novel DPCNN model and applied it to geometry-invariant texture retrieval. The proposed model has some new characteristics: (1) each neuron has two opportunities to be stimulated; (2) the model can automatically change the external stimulus of a neuron; (3) the local stimulus a neuron receives from its neighbors is controlled by the external stimulus; and (4) since the internal activity of the DPCNN has been shown to be another expression of the external stimulus, the DPCNN model is believed to be consistent with human visual characteristics. Our experiments, carried out on both Brodatz's album and MIT's VisTex database, demonstrate that the proposed method outperforms PCNN, ICM, SCM, and the multichannel Gabor filtering method in geometry-invariant texture retrieval. Moreover, when the texture images are contaminated by noise, DPCNN also shows good performance. The inherent pulse-burst property of the DPCNN could also be used for other image processing applications; we will investigate these in future work.

Acknowledgments

We thank the editor and anonymous reviewers for their helpful and valuable suggestions. We also appreciate the support and help of Jie Lie. This letter is supported by the National Natural Science Foundation of China (no. 60872109).


References

Brodatz, P. (1966). Textures: A photographic album for artists and designers. New York: Dover.
Chen, J. L., & Kundu, A. (1994). Rotation and gray scale transform invariant texture identification using wavelet decomposition and hidden Markov model. IEEE Trans. Pattern Anal. Mach. Intell., 16(2), 208–214.
Cohen, F. S., Fan, Z., & Patel, M. A. (1991). Classification of rotated and scaled textured images using gaussian Markov random field models. IEEE Trans. Pattern Anal. Mach. Intell., 13(2), 192–202.
Deng, H. W., & Clausi, D. A. (2004). Gaussian MRF rotation-invariant features for image classification. IEEE Trans. Pattern Anal. Mach. Intell., 26(7), 951–955.
Ekblad, U., Kinser, J. M., Atmer, J., & Zetterlund, N. (2004). The intersecting cortical model in image processing. Nuclear Instruments and Methods in Physics Research, 525(1–2), 392–396.
Fountain, S. R., Tan, T. N., & Baker, K. D. (1998). A comparative study of rotation invariant classification and retrieval of texture images. In Proc. British Machine Vision Conf. (Vol. 1, pp. 266–275). Malvern: British Machine Vision Association.
Han, J., & Ma, K. K. (2007). Rotation-invariant and scale-invariant Gabor features for texture image retrieval. Image and Vision Computing, 25(9), 1474–1481.
Johnson, J. L. (1994). Pulse-coupled neural nets: Translation, rotation, scale, distortion, and intensity signal invariance for images. Applied Optics, 33(26), 6239–6253.
Johnson, J. L., & Padgett, M. L. (1999). PCNN models and applications. IEEE Transactions on Neural Networks, 10(3), 480–498.
Johnson, J. L., Ranganath, H., Kuntimad, G., & Caulfield, H. J. (1998). Pulse-coupled neural networks. In O. Omidvar & J. Dayhoff (Eds.), Neural networks and pattern recognition (pp. 1–56). San Diego, CA: Academic Press.
Kinser, J. M. (1996). A simplified pulse-coupled neural network. In Proceedings of SPIE (Vol. 2760, pp. 563–569). Bellingham, WA: SPIE.
Lindblad, T., & Kinser, J. M. (2005). Image processing using pulse-coupled neural networks (2nd ed.). New York: Springer.
Ma, Y. D., Zhan, K., & Wang, Z. B. (2010). Applications of pulse-coupled neural networks. Berlin: Springer-Verlag.
Manian, V., & Vasquez, R. (1998). Scaled and rotated texture classification using a class of basis functions. Pattern Recognition, 31(12), 1937–1948.
Manjunath, B. S., & Ma, W. Y. (1996). Texture features for browsing and retrieval of image data. IEEE Trans. Pattern Anal. Mach. Intell., 18(8), 837–842.
MIT Vision and Modeling Group. (1998). VisTex database. Available online at http://www.media.mit.edu/vismod/.
Muneeswaran, K., Ganesan, L., Arumugam, S., & Soundar, K. R. (2005). Texture classification with combined rotation and scale invariant wavelet features. Pattern Recognition, 38(10), 1495–1506.
Ranganath, H. S., Kuntimad, G., & Johnson, J. L. (1995). Pulse coupled neural networks for image processing. In Proceedings of the IEEE Southeast Conference on Visualize the Future (pp. 37–43). Piscataway, NJ: IEEE.
Soh, L., & Tsatsoulis, C. (1999). Texture analysis of SAR sea ice imagery using gray level co-occurrence matrices. IEEE Transactions on Geoscience and Remote Sensing, 37(2), 780–795.
Wang, G. S., & Kinser, J. M. (2004). Texture discrimination and classification using pulse images. In Proceedings of the 33rd Applied Imagery Pattern Recognition Workshop (pp. 55–60). Washington, DC: IEEE Computer Society.
Wang, Z. B., Ma, Y. D., Cheng, F. Y., & Yang, L. Z. (2010). Review of pulse-coupled neural networks. Image and Vision Computing, 28(1), 5–13.
Zhan, K., Zhang, H. J., & Ma, Y. D. (2009). New spiking cortical model for invariant texture retrieval and image processing. IEEE Transactions on Neural Networks, 20(12), 1980–1986.
Zhang, J. G., & Tan, T. N. (2002). Brief review of invariant texture analysis methods. Pattern Recognition, 35(3), 735–747.
Zhang, J. W., Zhan, K., & Ma, Y. D. (2007). Rotation and scale invariant antinoise PCNN features for content based image retrieval. Neural Network World, 2, 121–132.

Received March 24, 2011; accepted May 17, 2011.
