Scale and Rotation Invariant Gabor Features for Texture Retrieval

Md. Hafizur Rahman, Mark R. Pickering, and Michael R. Frater
School of Engineering and Information Technology
UNSW at the Australian Defence Force Academy
Canberra, Australia

Abstract—For image classification applications it is often useful to generate a compact representation of the texture of an image region. The conventional representation of image textures using extracted Gabor wavelet coefficients often yields poor performance when classifying scaled and rotated versions of image regions. In this paper we propose a scale and rotation invariant feature generation procedure for classification and retrieval of images using Gabor filter banks. Firstly, to obtain scale and rotation invariant features, each image is decomposed at different scales and orientations. Then, in order to create unique feature vectors, we apply a circular shift operation in both the scale and rotation dimensions to shift the maximum value of the Gabor filter outputs to the first orientation of the first scale, and the energies of these filtered images are calculated. To demonstrate the effectiveness of our proposed approach we compare its performance with the most recent texture feature generation methods in a classification task. Experimental results show that our proposed feature generation method is more accurate at classifying scaled and rotated textures than the existing methods.

Keywords- Gabor filters, scale and rotation invariance, DTCWT, Brodatz, texture classification
I.
INTRODUCTION
Although there is no strict definition of the texture of an image, it is easily perceived by humans and believed to be a rich source of visual information about the nature and three dimensional shape of physical objects. Generally speaking, textures are complex visual patterns composed of entities, or sub-patterns that have characteristic brightness, colour, shape and size [1]. Texture is a fundamental characteristic of many natural images and also plays an important role in computer vision and pattern recognition. Texture analysis is an essential step for many image processing applications such as industrial inspection, document segmentation, remote sensing of earth resources, and medical imaging. Hence a great number of approaches to texture analysis have been investigated over the past three decades [2]. Gabor filters have been found appropriate for textural processing for several reasons: they have tuneable orientation and radial frequency bandwidths, tuneable centre frequencies, and optimally achieve joint resolution in space and spatial frequency [3]. Therefore, the Gabor representation of textures is very effective and one of the most commonly used methods for feature extraction. In many applications, such as object recognition and target detection, objects of interest may appear at different scales and
orientations. Therefore, in content-based image retrieval systems, the conventional Gabor representation of textures and its extracted features often provide unacceptable performance in retrieving rotated and scaled versions of the query image. In this paper, a new scale and rotation invariant Gabor texture feature generation method is proposed, which involves a simple modification to the conventional Gabor filter family for achieving scale and rotation invariance. The paper is organized as follows. In Section II, previous approaches to scale and rotation invariance are described and their limitations are discussed. The conventional Gabor representation of texture features is described in Section III. In Section IV, the proposed scale and rotation invariant Gabor feature generation method and its extracted invariant features are presented. Section V describes the performance evaluation of our approach for image classification. Conclusions and future works are described in Section VI. II.
RELATED WORK
Numerous methods have been reported in the literature for texture retrieval/classification. Manjunath and Ma [4] showed the best retrieval rate on the Brodatz album [13] and USC texture databases using Gabor filter banks. Their method is generally accepted as the benchmark method for texture retrieval and classification. In this method, the image is filtered using a set of Gabor filters which pass spatial frequencies with different scales and orientations. Figure 1 shows the impulse responses for a typical set of Gabor filters. For each pixel, the outputs of the complete set of filters can be combined into a two-dimensional feature in the scale-rotation space as shown in Figure 1 (c). For an image region, the standard Gabor texture feature is taken as the mean and standard deviation over the image region of the filter outputs at each scale and rotation. Many early approaches that adopted this method assumed that all textures in the same class had identical scales and orientations [11,12,14]. However this assumption does not usually hold in practical applications, as it is difficult to ensure that the texture in a query image region has the same scale and orientation as the target texture. Researchers have begun to address the problem of developing features that are invariant to the scale and rotation of the texture. The major existing approaches include circular shifting [5-6], projecting the coefficient values onto the scale and rotation dimensions separately [7], optimal matching of
scale energies [8], and applying the discrete Fourier transform (DFT) operation along the scale and rotation dimensions [9]. Zhang et al. [5] showed that image rotation in the spatial domain is equivalent to a circular shift of the feature vectors along the rotation dimension. The approaches proposed in [5] and [6] generated rotation invariant texture features by circularly shifting the feature elements along the rotation dimension so that the largest feature element was placed first in the feature vector. However, these approaches failed if there was also a significant change in scale between the query and target images.

Ju Han et al. [7] addressed the problem of scale and rotation invariance by projecting the two-dimensional feature vector onto the scale dimension and the rotation dimension to produce two one-dimensional feature vectors. These two one-dimensional vectors were then circularly shifted to provide scale and rotation invariance. However, the process of projecting the features removes some information from the original two-dimensional feature vector and allows some ambiguity in the matching process.
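For context, the conventional two-dimensional Gabor feature discussed above (one mean and one standard deviation per scale/orientation band) can be sketched as follows; the array layout and the function name are our own assumptions, not the notation of any of the cited papers:

```python
import numpy as np

def conventional_gabor_feature(responses):
    """Standard Gabor texture feature: the mean and standard deviation of the
    filter-output magnitudes at each scale m and orientation n.
    `responses` is an (M, N, H, W) array of complex filter outputs."""
    mag = np.abs(responses)
    mu = mag.mean(axis=(2, 3))      # one mean per (scale, orientation) band
    sd = mag.std(axis=(2, 3))       # one std per (scale, orientation) band
    # interleave as (mu_00, sd_00, mu_01, sd_01, ...), length M*N*2
    return np.stack([mu, sd], axis=-1).reshape(-1)
```

Note that this vector is not rotation invariant: rotating the texture permutes the orientation axis of `responses`, which permutes the feature elements, and this is exactly the weakness the methods below try to remove.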
Zhi Li et al. [8] proposed a method in which scale and rotation invariance were achieved by calculating multiple scale energies of the Gabor filtered image; the optimal matching scales of the Gabor texture descriptor were chosen using a matched filter approach along the scale dimension. The main drawback of this method is the requirement to perform multiple matches in the scale dimension to find the scale at which the query descriptor best matches the target descriptor.

Lo et al. [9] proposed a scale and rotation invariant (SRI) feature generation method in terms of the Double Dyadic Dual-Tree Complex Wavelet Transform (D3T-CWT), which is an extension of Kingsbury's DT-CWT. However, their feature generation method can be implemented by simply replacing the D3T-CWT coefficients with Gabor filter outputs. In this approach the shift-invariant property of the magnitudes of the DFT was utilized: the feature was generated by applying a two-dimensional FFT to the feature in scale-rotation space and then using the magnitudes of the output of this FFT as the scale and rotation invariant feature. The main limitation of this approach is that only the magnitudes of the FFT outputs are used, and some information about the original feature is lost when the phase of the FFT outputs is discarded. Li et al. [8] also used this DFT approach to provide rotation invariance.

In this paper we propose a method which employs circular shifting of the Gabor filter outputs in both the scale and rotation dimensions. The circular shift is used to place the largest feature element at the first orientation and the first scale. In this way we aim to create a unique feature vector which matches well with similar textures at different scales and/or orientations but differentiates between non-similar textures. Unlike other methods, our proposed method retains all of the available information from the filter outputs.
Our method is also more efficient than that proposed by Li et al. [8] since multiple comparisons in the scale energy dimension are not required.
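The proposed canonicalization can be illustrated with the following sketch, assuming the filter-output magnitudes are available as an (M, N, H, W) array; the names and array layout are our own assumptions:

```python
import numpy as np

def sri_gabor_feature(mag):
    """Scale- and rotation-invariant Gabor feature: at each pixel, circularly
    shift the M x N matrix of filter-output magnitudes so that its maximum
    lands at the first scale and first orientation, then take the mean and
    standard deviation per (scale, orientation) band."""
    M, N, H, W = mag.shape
    shifted = np.empty_like(mag)
    for i in range(H):
        for j in range(W):
            E = mag[:, :, i, j]                             # scale-orientation matrix
            m, n = np.unravel_index(np.argmax(E), E.shape)  # position of maximum
            shifted[:, :, i, j] = np.roll(np.roll(E, -m, axis=0), -n, axis=1)
    mu = shifted.mean(axis=(2, 3))
    sd = shifted.std(axis=(2, 3))
    return np.stack([mu, sd], axis=-1).reshape(-1)          # length M*N*2
```

Because every circular shift of a pixel's scale-orientation matrix maps to the same canonical form (provided its maximum is unique), responses that differ only by such a shift yield identical features, which is the invariance property the method relies on.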
Figure 1. (a) An ensemble of Gabor filters with scale factor a = √2, scales M = 5 and orientations N = 6. (b) Magnitudes of the Gabor filters for the 5 scales. (c) Two-dimensional feature vectors are formed from the filter outputs at each pixel of the image region.
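A filter ensemble like the one in Figure 1 can be generated, for example, as in the sketch below. The scale factor a and the rotated, scaled coordinates follow the standard self-similar Gabor construction; the bandwidth constant `sigma` and the kernel size are illustrative assumptions rather than the paper's exact parameter choices:

```python
import numpy as np

def gabor_ensemble(M=5, N=6, Ul=0.05, Uh=0.4, size=31):
    """Build an M-scale, N-orientation set of Gabor kernels by rotating and
    scaling a mother wavelet with centre frequency W = Uh and scale factor
    a = (Uh/Ul)**(1/(M-1)). sigma is an illustrative bandwidth choice."""
    a = (Uh / Ul) ** (1.0 / (M - 1))
    W, sigma = Uh, 0.56 / Uh
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    kernels = np.empty((M, N, size, size), dtype=complex)
    for m in range(M):
        for n in range(N):
            theta = n * np.pi / N
            # rotated, scaled coordinates x', y'
            xr = a ** (-m) * (x * np.cos(theta) + y * np.sin(theta))
            yr = a ** (-m) * (-x * np.sin(theta) + y * np.cos(theta))
            envelope = np.exp(-(xr ** 2 + yr ** 2) / (2.0 * sigma ** 2))
            kernels[m, n] = a ** (-m) * envelope * np.exp(2j * np.pi * W * xr)
    return kernels
```

Convolving an image with each kernel and taking the magnitude of the outputs yields the per-pixel scale-orientation responses used by the descriptors in the following sections.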
In the following sections we provide a more detailed description of the standard Gabor texture descriptor and our proposed modifications to this approach that provide scale and rotation invariance. III.
THE CONVENTIONAL GABOR TEXTURE DESCRIPTOR
A two-dimensional Gabor function g(x, y) and its Fourier transform G(u, v) can be written as:

g(x, y) = (1 / (2π σ_x σ_y)) exp[ −(1/2)(x²/σ_x² + y²/σ_y²) + 2πjWx ]    (1)

G(u, v) = exp{ −(1/2)[ (u − W)²/σ_u² + v²/σ_v² ] }    (2)

respectively, where σ_u = 1/(2π σ_x) and σ_v = 1/(2π σ_y). Here σ_x and σ_y characterize the spatial extent and frequency bandwidth of the Gabor filter, and (W, 0) represents the center frequency of the filter in frequency domain coordinates (u, v). Let g(x, y) be the mother Gabor wavelet; then a self-similar set of Gabor functions g_mn(x, y) can be obtained by rotating and scaling g(x, y) through the generating function:

g_mn(x, y) = a^(−m) g(x′, y′),    a > 1, m and n are integers    (3)

x′ = a^(−m) (x cos θ + y sin θ)    (4)

y′ = a^(−m) (−x sin θ + y cos θ)    (5)

where θ = nπ/N, m and n specify the scale and orientation of the wavelet, and the scale factor is

a = (U_h / U_l)^(1/(M−1))    (6)

where U_l and U_h are the lower and higher frequencies of interest, respectively. Expanding a signal using these Gabor functions provides a localized frequency description that forms a complete but non-orthogonal basis set.

For a given image I(x, y), its Gabor wavelet transform is given by the convolution:

W_mn(x, y) = Σ_{x₁} Σ_{y₁} I(x₁, y₁) g*_mn(x − x₁, y − y₁)    (7)

where * represents the complex conjugate. The mean and standard deviation of the magnitude of the orientation bands, which are used to construct the texture feature vector, can be calculated as:

μ_mn = (1/(XY)) Σ_x Σ_y |W_mn(x, y)|    (8)

σ_mn = √( (1/(XY)) Σ_x Σ_y ( |W_mn(x, y)| − μ_mn )² )    (9)

respectively, where the sums are taken over the X × Y pixels of the image region. Figure 1 shows examples of typical Gabor filters and their magnitudes at different scales. A feature vector can be created using μ_mn and σ_mn as feature components. For M scales and N orientations, the feature vector can be written as:

f = ( μ_00, σ_00, μ_01, σ_01, …, μ_(M−1)(N−1), σ_(M−1)(N−1) )    (10)

IV.

THE INVARIANT GABOR DESCRIPTOR

To overcome the scale and rotation invariance limitations encountered in conventional Gabor representations, we propose a modification to the standard method that uses circular shifting in the scale and rotation dimensions to create a unique instance of the feature vector. This enables similar descriptors to be generated for images of the same texture at different scales and orientations. We apply a circular shift operation which shifts the maximum value of the filter outputs to the first orientation of the first scale. The details of the algorithm are described in pseudo-code as follows:

(a) Decompose an input image I(x, y) at M scales and N orientations by convolving it with the Gabor wavelets g_mn(x, y) as in eq. (7).

(b) For each pixel location (x, y), use the outputs from each scale m and each orientation n to construct an M × N matrix E(x, y), where m = 0, 1, …, M − 1 and n = 0, 1, …, N − 1.

(c) Find the position (m̂, n̂) of the maximum value in E(x, y).

(d) Circularly shift E(x, y) up by (m̂ − 1) row(s) and to the left by (n̂ − 1) column(s).

(e) For each pixel (x, y), restore the circularly shifted values from step (d) to the corresponding scale m and orientation n. Thus the transformation of image I(x, y) gives a four-dimensional matrix W′_mn(x, y), which represents the invariance properties of the M scale and N orientation bands.

Therefore the scale and rotation invariant (SRI) mean and standard deviation of the transformed coefficients can be calculated as:

μ′_mn = (1/(XY)) Σ_x Σ_y |W′_mn(x, y)|    (11)

σ′_mn = √( (1/(XY)) Σ_x Σ_y ( |W′_mn(x, y)| − μ′_mn )² )    (12)

respectively, with m = 0, 1, …, M − 1 and n = 0, 1, …, N − 1. Hence the scale and rotation invariant Gabor feature vector can be constructed as:

f_SRI = ( μ′_00, σ′_00, …, μ′_(M−1)(N−1), σ′_(M−1)(N−1) )    (13)

V.
EXPERIMENTAL PROCEDURE
A. Experimental Setup

In our experiments we use exactly the same datasets as used in [8]. The Brodatz (1966) database [13] is widely used as the benchmark for testing texture classification results. The dataset used in our experiments and in [8] contains 13 different primary texture images. Figure 3 shows one sample texture from each class. All texture images used in the experiments have a resolution of 512 × 512 pixels. After applying Gabor filters with different scales and orientations to each image in the database, the invariant texture features are computed by following the procedure described in Section IV. As in [8] we set the number of scales to 5 and the number of orientations to 6. Therefore, the scale and rotation invariant feature vector in eq. (13) can be written as

f_SRI = ( μ′_00, σ′_00, …, μ′_45, σ′_45 )    (14)

Figure 2. The system structure used in our experiments (parameter selection with M = 5 and N = 6, Gabor filtering at each scale and orientation, the circular shifting operation, mean and standard deviation calculation, the M × N × 2 feature vector, and image retrieval).

Figure 3. The 13 Brodatz textures (T01-T13) used in the experiments: T01 (bark), T02 (brick), T03 (bubbles), T04 (grass), T05 (leather), T06 (pigskin), T07 (raffia), T08 (sand), T09 (straw), T10 (water), T11 (weave), T12 (wood) and T13 (wool).
The overall system structure used in our experiments is depicted in Figure 2. It consists of the following basic steps: parameter selection, Gabor filtering, scale and rotation invariant feature generation, and texture retrieval. We compared the classification performance of five different feature extraction methods: the conventional Gabor representation (CGR), Ju Han's [7], Lo's [9] and Zhi Li's [8] scale and rotation invariant (SRI) methods, and our proposed circular shift based SRI (CSSRI) method. We tested these methods for rotation invariance only and also for joint scale and rotation invariance, and observed the effects of rotation and/or scale changes on classification results.

In Dataset 1, all textures were rotated in steps of 5° from 0° to 360°, giving 936 rotated texture images (13 × 72 = 936). Each texture has 72 samples and each sample has a resolution of 512 × 512 pixels. In order to evaluate the concurrent scale and rotation invariance of the proposed approach, joint rotation and scaling transforms were performed on the 13 texture images to create Dataset 2. Each image was rotated at seven different rotation angles (0°, 30°, 60°, 90°, 120°, 150° and 200°) and then scaled by factors s from 1.2 down to 0.3 in steps of 0.1. Thus, we have 910 texture images (13 × 7 × 10 = 910) with 7 orientations and 10 scales, and each texture class has 70 ground truths. All images are re-sized to 64 × 64 by first applying an anti-aliasing low-pass filter followed by downsampling.

B. Evaluation Strategy

We measured the performance of each descriptor using classification accuracy and precision-recall graphs. We follow the standard procedure described in [15] and [16] for calculating classification accuracy, and [10] for calculating precision and recall values. The classification accuracy Ac for a query image of a given class containing N samples is defined as Ac = n/N, where n is the number of correctly classified samples and N is the total number of samples in that class. For a query hitlist of length n, precision is defined as the number of relevant hits in the hitlist divided by n. Recall is the number of relevant hits in the hitlist divided by the total number of relevant images in the database. P-R graphs show the variation of precision for recall values between 0 and 1. Perfect retrieval is achieved when the precision is 100% for all values of recall. In our experiments, each texture in each class is used as a query, and the average classification accuracy and average precision-recall values are calculated over all query images for a given class. Finally, we calculate the mean of the average classification accuracy (the overall classification accuracy for all classes) and the mean of the average precision-recall values. As an alternative performance measure, we have also used the number of correctly classified textures.

VI.
EXPERIMENTAL RESULTS
A. Rotation invariance (Dataset 1)

The conventional Gabor representation yields a recognition rate of 62.5 percent for this dataset. It correctly classifies only 585 textures out of 936, so more than a third of the textures are misclassified. Due to its band directionality and band coverage, it is not scale and/or rotation invariant. It is therefore not surprising that its overall classification rate is only 62.5 percent while the scale and rotation invariant methods have higher classification accuracy. The classification results in Table I show that Ju Han's [7] method misclassifies 247 textures, Lo's [9] method misclassifies 29 textures and Zhi Li's [8] method misclassifies 9, whereas only 3 textures are misclassified by our proposed method. Even in this experiment with only rotated images in the database, our proposed approach achieves the highest accuracy (99.7%). The average classification accuracy for each texture and each feature extraction method is shown in Table III in the Appendix.

B. Scale and rotation invariance (Dataset 2)

The mean of the average classification rates for each method is shown in Table II, and details of the average classification accuracy for each texture class for the proposed approach are shown in Table IV in the Appendix. From the results in Table II we can see that, as expected, the CGR method performs worse than any other method. Among the four scale and rotation invariant methods, Ju Han's [7] method provides a classification accuracy of 86.9 percent, which is lower than the other three SRI methods. This is because this method discards much of the information present in the original two-dimensional feature vector. As Lo's [9] and Zhi Li's [8] methods remove some feature information when discarding the phase information of the DFT, these methods achieve 94.9 percent and 95.3 percent respectively. Our proposed method achieves the highest average classification accuracy of 97.3 percent.

As an alternative performance comparison, the precision-recall graphs for each method are shown in Figure 5. We can see from the graphs that the precision of the CGR method falls faster than that of the other methods due to its non-invariant nature. Zhi Li's [8] and Lo's [9] methods show similar performance because both use the DFT. Our proposed approach provides the best performance, with almost 100% precision for all values of recall.

Figure 5. Precision-Recall graphs for the five feature extraction methods.

VII. CONCLUSIONS AND FUTURE WORK

In this paper we have presented a new scale and rotation invariant feature generation method. The method is based on the circular shift operation and enables a unique feature vector
TABLE I. CLASSIFICATION RESULTS FOR DATASET 1

                      CGR    Han [7]   Lo [9]   Li [8]   CSSRI
Accuracy (%)          62.5   73.6      96.9     99.1     99.7
Correctly classified  585    689       907      927      933
TABLE II. CLASSIFICATION RESULTS FOR DATASET 2

                      CGR    Han [7]   Lo [9]   Li [8]   CSSRI
Accuracy (%)          76.5   86.9      94.9     95.3     97.3
Correctly classified  696    790       864      867      886
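The evaluation measures behind the tables above (defined in Section V-B) can be computed as in the following sketch; the function names are ours, and this is an illustrative implementation rather than the authors' code:

```python
def accuracy(n_correct, n_total):
    """Classification accuracy Ac = n / N for one texture class."""
    return n_correct / n_total

def precision_recall(hitlist, relevant):
    """Precision: relevant hits divided by the hitlist length n.
    Recall: relevant hits divided by the total number of relevant images."""
    hits = sum(1 for item in hitlist if item in relevant)
    return hits / len(hitlist), hits / len(relevant)
```

For example, the CSSRI entry of Table I corresponds to `accuracy(933, 936)`, which rounds to 99.7%.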
to be formed for the same textures regardless of their scale and orientation. Our experimental results demonstrated that the proposed method achieves scale and rotation invariance simultaneously and performs better than existing methods. Our next investigation will be to apply the proposed method to other wavelet transforms and evaluate their individual performance.

REFERENCES

[1] A. Materka and M. Strzelecki, "Texture analysis methods - a review", Technical University of Lodz, Institute of Electronics, COST B11 report, Brussels, 1998.
[2] S. C. Kim and T. J. Kang, "Texture classification and segmentation using wavelet packet frame and Gaussian mixture model", Pattern Recognition, vol. 40, no. 4, pp. 1207-1221, 2007.
[3] A. C. Bovik, M. Clark and W. S. Geisler, "Multichannel texture analysis using localized spatial filters", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 1, pp. 55-73, 1990.
[4] B. S. Manjunath and W. Y. Ma, "Texture features for browsing and retrieval of image data", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, Aug. 1996.
[5] D. Zhang and G. Lu, "Content-based image retrieval using Gabor texture features", in Proc. First IEEE Pacific-Rim Conference on Multimedia (PCM'00), Sydney, Australia, pp. 1139-1142, December 13-15, 2000.
[6] S. Arivazhagan, L. Ganesan and S. P. Priyal, "Texture classification using Gabor wavelets based rotation invariant features", Pattern Recognition Letters, vol. 27, no. 16, pp. 1976-1982, 2006.
[7] J. Han and K. K. Ma, "Rotation-invariant and scale-invariant Gabor features for texture image retrieval", Image and Vision Computing, vol. 25, no. 9, pp. 1474-1481, September 2007.
[8] Z. Li, G. Liu, X. Qian and C. Wang, "Scale and rotation invariant Gabor texture descriptor for texture classification", in Visual Communications and Image Processing, Proc. SPIE, vol. 7744, 77441T, 2010.
[9] E. H. S. Lo, M. Pickering, M. Frater and J. Arnold, "Scale and rotation invariant features from the Dual-Tree Complex Wavelet Transform", in Proc. IEEE Int'l Conf. Image Processing, Singapore, Oct. 2004.
[10] E. H. S. Lo, M. R. Pickering, M. R. Frater and J. F. Arnold, "Query by example using invariant features from the Double Dyadic Dual-Tree Complex Wavelet Transform", in Proc. ACM Int'l Conf. Image & Video Retrieval, Santorini, Greece, Jul. 2009.
[11] A. H. Kam, N. G. Kingsbury and W. J. Fitzgerald, "Content based image retrieval through object extraction and querying", in Proc. IEEE Workshop on Content-Based Access of Image & Video Libraries, Hilton Head Island, USA, Jun. 2000.
[12] B. S. Manjunath, J. R. Ohm, V. V. Vasudevan and A. Yamada, "Color and texture descriptors", IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 6, June 2001.
[13] P. Brodatz, "Textures: A Photographic Album for Artists and Designers", Dover, New York, 1966.
[14] M. Kokare, B. N. Chatterji and P. K. Biswas, "Comparison of similarity metrics for texture image retrieval", in Proc. IEEE TENCON 2003, vol. 2, p. 571.
[15] K. Xu, B. Georgescu, D. Comaniciu and P. Meer, "Performance analysis in content-based retrieval with textures", in Proc. Int'l Conf. Pattern Recognition, vol. 4, pp. 275-278, 2000.
[16] S. Lazebnik, C. Schmid and J. Ponce, "A sparse texture representation using local affine regions", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, August 2005.
APPENDIX

TABLE III. AVERAGE CLASSIFICATION ACCURACY FOR DATASET 1 (%)

                      CGR    Han [7]   Lo [9]   Li [8]   CSSRI
Mean                  62.5   73.6      96.9     99.1     99.7
Correctly classified  585    689       907      927      933

TABLE IV. AVERAGE CLASSIFICATION ACCURACY OF THE PROPOSED APPROACH FOR DATASET 2 (%)

Class            Average Accuracy
T12 (wood)       81.9
T09 (straw)      86.2
T13 (wool)       98.4
T01 (bark)       99.2
T08 (sand)       99.5
T02 (brick)      100.0
T03 (bubbles)    100.0
T04 (grass)      100.0
T05 (leather)    100.0
T06 (pigskin)    100.0
T07 (raffia)     100.0
T10 (water)      100.0
T11 (weave)      100.0
Mean             97.3

The texture classes are sorted in order of increasing classification rate.