Searching Images in a Textile Image Database Yin-Fu Huang and Sheng-Min Lin Department of Computer Science and Information Engineering National Yunlin University of Science and Technology
[email protected], [email protected]
Abstract. In this paper, a textile image search system is proposed to query similar textile images in an image database. Five feature descriptors about the color, texture, and shape defined in the MPEG-7 specification, which are relevant to textile image characteristics, are extracted from a dataset. First, we tune the feature weights using a genetic algorithm, based on a predefined training dataset. Then, for each extracted feature descriptor, we use K-means to partition it into four clusters and combine them together to obtain an MPEG-7 signature. Finally, when users input a query image, the system finds similar images by combining the results based on MPEG-7 signatures and the ones in the three nearest classes. The experimental results show that the similar images returned from an image database for a query textile image are acceptable to humans and of good quality.

Keywords: CBIR, genetic algorithm, K-means, MPEG-7 specification, weight tuning.
1 Introduction
At present, multimedia data play an important role in our daily life. However, querying a multimedia database by keywords is gradually becoming insufficient to meet users' needs. Thus, facing a huge number of images in an image database, content-based image retrieval has become a popular and much-needed capability. In the past years, many general-purpose image retrieval systems have been developed [5, 6, 10], and these systems rely mainly on visual features. King and Lau used MPEG-7 descriptors to retrieve fashion clothes [5]. In order to improve query results, Lai and Chen proposed a user-oriented image retrieval system that iteratively interacts with users about query results [6]. Smeulders et al. presented a review of 200 references in content-based image retrieval [10].

In this paper, we propose a textile image search system for querying similar textile images in an image database. This system consists of an offline phase and an online phase. In the offline phase, we tune the feature weights using a genetic algorithm [11], based on a pre-defined training dataset. Then, for each extracted feature descriptor, we use K-means to partition it into four clusters and combine them together to obtain an MPEG-7 signature [3]. In the online phase, when users input a query image, the system first extracts its MPEG-7 visual features, and then finds similar images by combining the results based on MPEG-7 signatures and those in the three nearest classes.

Y. Tan et al. (Eds.): ICSI 2014, Part II, LNCS 8795, pp. 267–274, 2014. © Springer International Publishing Switzerland 2014
The remainder of this paper is organized as follows. In Section 2, we present the system architecture and briefly describe the procedure of searching similar textile images. In Section 3, we introduce five feature descriptors relevant to textile image characteristics. In Section 4, the methods of tuning the feature weights and generating MPEG-7 signatures are proposed to facilitate searching similar images. In Section 5, we present the experimental results to evaluate the effectiveness of the three search modes provided in our system. Finally, we draw conclusions in Section 6.
2 System Architecture
In this paper, we propose a textile image search system consisting of an offline phase and an online phase, as shown in Fig. 1. In the offline phase, five feature descriptors about the color, texture, and shape defined in the MPEG-7 specification are extracted from training images, i.e., the ColorLayout, ColorStructure, EdgeHistogram, HomogeneousTexture, and RegionShape descriptors, comprising a total of 221 dimensions. Because each feature plays a different role in distinguishing one textile image from others, feature weights should be determined so that the discrimination among textile images is boosted. Here, we use a genetic algorithm to determine the feature weights. Then, we build MPEG-7 signatures using K-means clustering on all textile images, where the weighted Euclidean distance calculated in K-means clustering takes the feature weights determined by the genetic algorithm.

Fig. 1. System architecture
In the online phase, we also extract the same features, as mentioned, from a query image. Then, we find out 1) the images with the same MPEG-7 signature as the query image as the first candidates and 2) the images in three nearest classes to the query image as the second candidates. Finally, we can find out result images most similar to the query image, which appear in both groups of candidates.
3 Feature Extraction
In this paper, we adopt the MPEG-7 feature set [1, 2, 4, 7–9] defined by the MPEG organization, which consists of a color description (i.e., two color descriptors), a texture description (i.e., two texture descriptors), and a shape description (i.e., one shape descriptor), as shown in Table 1. Among them, the descriptors relevant to textile image characteristics are as follows. 1) The ColorLayout descriptor describes the layout and variation of colors, reflecting the color combinations in a textile image. 2) The ColorStructure descriptor counts the contents and structure of colors in a textile image by using a sliding window. 3) The EdgeHistogram descriptor counts the number of edge occurrences in five different directions in a textile image. 4) The HomogeneousTexture descriptor calculates the energies of a textile image in the frequency space, i.e., the level of gray-scale uniform distribution and texture thickness, reflecting the texture characteristics of a textile image. 5) The RegionShape descriptor relates to the spatial arrangement of points (pixels) belonging to an object or a region in a textile image.

Table 1. MPEG-7 visual descriptor features

Type  Feature description   Dim.  Overall statistics  Total number
1     ColorLayout           12    1                   12
2     ColorStructure        32    1                   32
3     EdgeHistogram         5     16                  80
4     HomogeneousTexture    2     31                  62
5     RegionShape           35    1                   35
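As a minimal sketch of how the five descriptors combine into one feature vector (the function name and the dictionary layout are our own illustration, not part of the paper), the 221 dimensions of Table 1 can be assembled by concatenation:

```python
import numpy as np

# Per-descriptor total dimensions from Table 1 (sum = 221).
DESCRIPTOR_DIMS = {
    "ColorLayout": 12,
    "ColorStructure": 32,
    "EdgeHistogram": 80,
    "HomogeneousTexture": 62,
    "RegionShape": 35,
}

def build_feature_vector(descriptors):
    """Concatenate the five MPEG-7 descriptor vectors, in a fixed
    descriptor order, into one 221-dimensional feature vector."""
    parts = []
    for name, dim in DESCRIPTOR_DIMS.items():
        vec = np.asarray(descriptors[name], dtype=float)
        assert vec.shape == (dim,), f"{name} must have {dim} dimensions"
        parts.append(vec)
    return np.concatenate(parts)  # shape (221,)
```
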
4 Method
K-means clustering is an effective approach to finding similar images for a query image, but it usually depends on how well the stored images are clustered. In practice, the Euclidean distance between two images, used in K-means clustering, plays a major role in determining good clustering. In calculating the Euclidean distance between two images, each kind of involved feature (or descriptor) mentioned in Section 3 has its own semantics in measuring similarity; thus, these features should be assigned weights when measuring the similarity between two images. In our system, the best weight set, representing the weight of each feature involved in an image, is tuned by using a genetic algorithm. Next, we generate MPEG-7 signatures based on K-means clustering with weighted features. Finally, in the online phase, we find the result images most similar to a query image after the similarity calculation.
4.1 Feature Weight Tuning
In this step, we use a genetic algorithm to tune the weights of the features. A genetic algorithm is a search heuristic used to generate useful solutions to optimization and search problems. In general, a typical genetic algorithm requires 1) a genetic representation of the solution domain and 2) a fitness function to evaluate the solution domain. Here, the weighted Euclidean distance is used to measure the similarity between two images as follows.

d(A, B) = \sqrt{ \sum_{i=1}^{221} w_i \cdot (a_i - b_i)^2 }    (1)
where A and B are two images with feature sets A = {a_1, a_2, ..., a_221} and B = {b_1, b_2, ..., b_221}, and w_i is the weight of the i-th feature. Before starting the genetic algorithm, the values of all the 221 features extracted from an image are normalized into the range [0, 1]. For the initial population of 100 individuals in the genetic algorithm, each individual is a randomly generated set of weight values. Only the best 20 individuals with the highest fitness values survive into the next generation. Then, the 20 survivors are randomly selected to generate 40 children by crossover; in crossover, each feature weight of a child is taken from the corresponding feature weight of parent A or parent B, each with 50% probability. Furthermore, in order to avoid being trapped in a local optimum, we also generate another 40 children by crossover plus mutation; in mutation, each feature weight of a child is replaced with a new random value with 10% probability. To measure the fitness of each individual, 24 centroids, calculated from the images in each class, represent the 24 pre-defined classes of the 679 training textile images. The weighted Euclidean distance can then be treated as a classifier: if a training image is nearest to the centroid of a class under the weights of individual x, and that class is indeed the class of the training image, 1 point is added to the score of individual x; this score is the fitness of individual x. By iterating, the best individual (i.e., the best feature weights) with the highest score is found. The genetic algorithm terminates when the best weight set becomes stable, i.e., when it remains the best for 1000 consecutive iterations. During the iterations, if a new individual has higher fitness than the old best one, the iteration counter is reset to 0 and the new individual is examined for the next 1000 iterations.
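The tuning procedure above can be sketched as follows. This is an illustrative sketch under the paper's stated parameters (population 100, 20 elites, 40 crossover children, 40 crossover-plus-mutation children, 10% mutation rate); the function names are our own, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 221          # total feature dimensions (Table 1)
POP_SIZE, N_ELITE = 100, 20

def weighted_distance(a, b, w):
    """Weighted Euclidean distance of Eq. (1)."""
    return float(np.sqrt(np.sum(w * (a - b) ** 2)))

def fitness(weights, images, labels, centroids):
    """+1 for every training image whose nearest class centroid
    (under the weighted distance) matches its true class."""
    score = 0
    for x, y in zip(images, labels):
        d = np.sqrt(((centroids - x) ** 2 * weights).sum(axis=1))
        if int(d.argmin()) == y:
            score += 1
    return score

def crossover(pa, pb):
    """Each child weight comes from parent A or B with 50% probability."""
    mask = rng.random(N_FEATURES) < 0.5
    return np.where(mask, pa, pb)

def mutate(child, rate=0.1):
    """Replace each weight with a new random value with 10% probability."""
    child = child.copy()
    mask = rng.random(N_FEATURES) < rate
    child[mask] = rng.random(int(mask.sum()))
    return child

def next_generation(population, fitnesses):
    """Keep the 20 fittest individuals, then add 40 crossover children
    and 40 crossover-plus-mutation children (population stays at 100)."""
    order = np.argsort(fitnesses)[::-1]
    elite = [population[i] for i in order[:N_ELITE]]
    children = []
    for _ in range(40):
        i, j = rng.choice(N_ELITE, size=2, replace=False)
        children.append(crossover(elite[i], elite[j]))
    for _ in range(40):
        i, j = rng.choice(N_ELITE, size=2, replace=False)
        children.append(mutate(crossover(elite[i], elite[j])))
    return elite + children
```

The outer loop (not shown) would evaluate `fitness` for each individual and stop once the best weight set has survived 1000 consecutive generations.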
4.2 MPEG-7 Signatures Based on K-means Clustering
For each extracted visual descriptor, we use K-means to partition the images into four clusters, numbered from 0 to 3. Then, we combine the cluster numbers from the five visual descriptors to obtain a 5-digit MPEG-7 signature. Thus, an MPEG-7 signature can represent the characteristics of an image. An MPEG-7 signature has 5 digits and each digit can take 4 different values, so that 4^5 = 1024 bins can be used to distinguish the characteristics of images. Since K-means compresses images into clusters, we can build an index structure more easily using these signatures. Besides, the centroids of K-means on the five visual descriptors are also stored for the similarity measures in the online phase.
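A minimal sketch of the signature construction follows; the `kmeans` helper below is a plain Lloyd's iteration standing in for whatever K-means implementation the system actually uses, and the function names are our assumptions.

```python
import numpy as np

def kmeans(X, k=4, iters=50, seed=0):
    """Minimal Lloyd's K-means; returns (centroids, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Distance of every image to every centroid, then assign.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

def build_signatures(descriptor_matrices):
    """descriptor_matrices: five (n_images, dim) arrays, one per MPEG-7
    descriptor. Each is clustered into 4 groups; the per-descriptor
    cluster numbers (0-3) concatenate into a 5-digit signature string."""
    all_labels, all_centroids = [], []
    for X in descriptor_matrices:
        c, lab = kmeans(X)
        all_centroids.append(c)
        all_labels.append(lab)
    signatures = ["".join(str(l[i]) for l in all_labels)
                  for i in range(len(all_labels[0]))]
    return signatures, all_centroids
```
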
4.3 Similarity Calculation
First, we extract the MPEG-7 visual features of a query image. Then, these features are compared with the recorded K-means centroids of the five visual descriptors to determine the cluster numbers, respectively. The cluster numbers of the most similar centroids are combined to form the query signature. We can then find the images with the same MPEG-7 signature as the query image as the first candidates; this can be treated as a similarity measure based on local views. Next, we find the images in the three classes nearest to the query image as the second candidates, using the 24 centroids of the pre-defined classes mentioned in Section 4.1; this can be treated as a similarity measure based on global views. Finally, the most similar images to the query image are those that appear in both groups of candidates.
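The two candidate sets and their intersection can be sketched as follows (a minimal illustration; the function names and argument layout are ours):

```python
import numpy as np

def query_signature(query_descriptors, stored_centroids):
    """Assign each descriptor vector of the query image to its nearest
    stored K-means centroid (0-3) and concatenate the digits."""
    digits = []
    for vec, centroids in zip(query_descriptors, stored_centroids):
        d = np.linalg.norm(centroids - np.asarray(vec, dtype=float), axis=1)
        digits.append(str(int(d.argmin())))
    return "".join(digits)

def find_similar(query_sig, query_class_dists, image_signatures, image_classes):
    """Intersect the local view (same MPEG-7 signature) with the global
    view (images belonging to the three nearest pre-defined classes)."""
    nearest3 = set(np.argsort(query_class_dists)[:3])
    first = {i for i, s in enumerate(image_signatures) if s == query_sig}
    second = {i for i, c in enumerate(image_classes) if c in nearest3}
    return sorted(first & second)
```
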
5 Implementation
We have implemented an "Image Search Engine" (ISE) system to search for images similar to a query textile image in an image database. In total, the textile image database contains 4069 images from Globle-Tex Co., Ltd. [13]; 679 of them are training images with pre-defined classes, and the others are subsequently assigned to their nearest classes according to the weighted Euclidean distance mentioned in Section 4.1.
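The assignment of the remaining images to their nearest pre-defined classes can be sketched as below (a minimal illustration; the function name and array shapes are our assumptions):

```python
import numpy as np

def assign_to_nearest_class(features, weights, class_centroids):
    """Assign an image to the pre-defined class whose centroid is
    nearest under the weighted Euclidean distance of Eq. (1).
    class_centroids: (24, 221) array of class centroids."""
    d = np.sqrt((((class_centroids - features) ** 2) * weights).sum(axis=1))
    return int(d.argmin())
```
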
5.1 User Interface
The user interface of the ISE system is shown in Fig. 2. The "Initialization" button is used to initialize the system (i.e., to do the data preprocessing of the offline phase). The "Input" button can be clicked to input a query image; after the query image is input, Area 1 records the path name of the query image. Furthermore, the "Execution" button is used to show the result images. The radio buttons shown in Area 2 are page switches used to display result images of different pages in Area 3. The number of radio buttons is dynamic and depends on the number of result images. The check boxes shown in Area 4 are used to select a search mode, i.e., full (or default) mode, texture-concern mode, and color-concern mode. For the texture-concern mode, we use only the EdgeHistogram, HomogeneousTexture, and RegionShape descriptors to search for similar images. On the contrary, for the color-concern mode, we use only the ColorLayout and ColorStructure descriptors. The result images displayed in Area 3 are ranked according to their similarity degree to the query image: the most similar image is put at the upper-left and the others are shown sequentially in left-to-right, top-to-bottom order. The similarity degree to the query image is also computed with the weighted Euclidean distance.
Fig. 2. User interface
5.2 Similarity Evaluation
Here, we invite ten evaluators to rate the similar images returned by the ISE system for their query images. In order to observe the effectiveness of the ISE system, we use the acceptable percentage measure defined by Zhang et al. [12], rating each result image on a 1-to-5 scale (1: not similar, 2: poorly similar, 3: fairly similar, 4: well similar, and 5: strongly similar). Besides, we also use the quality value measure to evaluate the quality of the result images as follows.

The acceptable percentage measure:

m_1 = (n_3 + n_4 + n_5) / \sum_{i=1}^{5} n_i    (2)

The quality value measure:

m_2 = \sum_{i=1}^{5} (n_i \cdot i) / \sum_{i=1}^{5} n_i    (3)

where n_1, n_2, n_3, n_4, and n_5 are the numbers of result images with a score of 1, 2, 3, 4, and 5, respectively.
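The two measures are straightforward to compute; as a sketch (function names are ours):

```python
def acceptable_percentage(counts):
    """m1 of Eq. (2): fraction of result images rated 3, 4, or 5.
    counts = (n1, n2, n3, n4, n5)."""
    n1, n2, n3, n4, n5 = counts
    return (n3 + n4 + n5) / sum(counts)

def quality_value(counts):
    """m2 of Eq. (3): mean score of the result images (range 1 to 5)."""
    return sum(n * i for i, n in enumerate(counts, start=1)) / sum(counts)
```
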
5.3 Experiments and Discussions
In Experiment 1, each evaluator tests the effectiveness of the three search modes using images in the database. Since the interpretations of the evaluators on textures in the same image are diverse (i.e., some focus on patterns while others focus on tiny formations), their measures on the full and texture-concern modes show major differences. However, on average, the acceptable percentages on the full and texture-concern modes are 81% and 83%, and the quality values on the full and texture-concern modes are 3.6 and 3.7, respectively. Furthermore, since the interpretations of the evaluators on colors are more consistent, the average acceptable percentage and quality value on the color-concern mode are 92% and 4.1, respectively. Thus, the system works well in all three modes when using images in the database.

In Experiment 2, each evaluator tests the effectiveness of the three search modes using images outside the database. We found that their measures on these three modes decrease slightly compared with using images in the database. The average acceptable percentages on the three modes are 73%, 80%, and 73%, and the average quality values are 3.1, 3.6, and 3.6, respectively. The reason could be that query images outside the database do not pertain to the pre-defined classes in the system, so the system can only return the similar images in the three classes nearest to the query images.
6 Conclusions
In this paper, we propose and implement a textile image search system to search for images similar to a query textile image in an image database. In the system, the best weight set for the extracted features of an image is tuned by using a genetic algorithm. Then, we generate MPEG-7 signatures based on K-means clustering with weighted features. In the online phase, users can find the result images most similar to a query image after the similarity calculation. The experimental results show that the similar images returned from the image database for a query textile image are acceptable to humans and of good quality in all three modes. Although our content-based ISE system works well for searching textile images, two issues remain to be overcome in the future. First, the descriptors used here are still not good enough to describe all the classes of images in our system, so that some images cannot be well classified. Second, a query image without a pre-defined class will lead the system to return unpredictable results; for example, when a user inputs a car image in the worst case, the system has no way to exclude this situation.

Acknowledgments. This work was supported by the National Science Council of R.O.C. under grant MOST 103-2221-E-224-049.
References

1. Bober, M.: MPEG-7 visual shape descriptors. IEEE Transactions on Circuits and Systems for Video Technology 11(6), 716–719 (2001)
2. Chang, S.F., Sikora, T., Puri, A.: Overview of the MPEG-7 standard. IEEE Transactions on Circuits and Systems for Video Technology 11(6), 688–695 (2001)
3. Huang, Y.F., Chen, H.W.: A multi-type indexing CBVR system constructed with MPEG-7 visual features. In: Zhong, N., Callaghan, V., Ghorbani, A.A., Hu, B. (eds.) AMT 2011. LNCS, vol. 6890, pp. 71–82. Springer, Heidelberg (2011)
4. ISO/IEC 15938-3, Information Technology – Multimedia Content Description Interface – Part 3: Visual (2002)
5. King, I., Lau, T.K.: A feature-based image retrieval database for the fashion, textile, and clothing industry in Hong Kong. In: Proc. International Symposium on Multi-technology Information Processing, pp. 233–240 (1996)
6. Lai, C.C., Chen, Y.C.: A user-oriented image retrieval system based on interactive genetic algorithm. IEEE Transactions on Instrumentation and Measurement 60(10), 3318–3325 (2011)
7. Manjunath, B.S., Ohm, J.R., Vasudevan, V.V., Yamada, A.: Color and texture descriptors. IEEE Transactions on Circuits and Systems for Video Technology 11(6), 703–715 (2001)
8. Martinez, J.M., Koenen, R., Pereira, F.: MPEG-7: the generic multimedia content description standard, part 1. IEEE Multimedia 9(2), 78–87 (2002)
9. Martinez, J.M.: MPEG-7 overview (version 10). ISO/IEC JTC1/SC29/WG11 N6828 (2004)
10. Smeulders, A.W.M., Worring, M., Santini, S., Gupta, A., Jain, R.: Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(12), 1349–1380 (2000)
11. Whitley, D.: A genetic algorithm tutorial. Statistics and Computing 4(2), 65–85 (1994)
12. Zhang, Y., Milios, E., Zincir-Heywood, N.: Narrative text classification for automatic key phrase extraction in web document corpora. In: Proc. the 7th Annual ACM International Workshop on Web Information and Data Management, pp. 51–58 (2005)
13. Globle-Tex Co., Ltd., http://www.globle-tex.com/