A Cartoon Image Classification System Using MPEG-7 Descriptors*

Junghyun Kim1, Sung Wook Baik1, Kangseok Kim2, Changduk Jung3, and Wonil Kim1**

1 College of Electronics and Information Engineering at Sejong University, Seoul, Korea
Tel.: +82-2-3408-3795
2 Department of Knowledge Information Security at Ajou University, Suwon, Korea
3 Department of Computer and Information Science at Korea University, Korea

Abstract. Cartoon images now make up a larger share of digital multimedia than ever, a trend that is especially visible in the entertainment business. With the explosive proliferation of cartoon image content on the Internet, a system for classifying these images is needed. This paper presents a new approach to cartoon image classification based on cartoonists. The proposed system employs effective MPEG-7 descriptors as image feature values, learns the features of particular cartoon images, and classifies the images into multiple classes according to cartoonist. In the performance simulation, we evaluate the proposed system on a large set of cartoon images, and it successfully classifies the images into multiple classes with an accuracy of over 90%.

Keywords: MPEG-7 visual descriptor, Image Classification, Neural Network, Cartoon.

1 Introduction

With the fast development of digital multimedia, we can access far more digital content, such as cartoon images, on the Internet and TV than ever before. As the volume of cartoon image content grows rapidly and becomes easier to access, another issue arises: how should this content be categorized so that it can be found easily? Millions of cartoon images now reside in people's blogs, and cartoon viewers may want to choose their favorite cartoons from the Internet. Thus, to manage a digital library of cartoon image content, we need an automatic cartoon image classification system.

* This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (No. 2010-0028046).
** Corresponding author.

H. Deng et al. (Eds.): AICI 2011, Part II, LNAI 7003, pp. 368–375, 2011.
© Springer-Verlag Berlin Heidelberg 2011


The main purpose of this paper is to apply the MPEG-7 standard to cartoon images in order to classify them according to cartoonist. By analyzing MPEG-7 descriptors, we create a prototype system for cartoon image classification and, through experiments, introduce an effective methodology for descriptor fusion. We use neural networks for the classification; the inputs to the network are visual feature values extracted with MPEG-7 descriptors. Section 2 discusses previous methods of image classification. Section 3 proposes our neural-network-based cartoon image classification system. Simulation results are presented in Section 4, and we conclude in Section 5.

2 Previous Research

2.1 MPEG-7 Descriptors and Image Classification Systems

MPEG-7 is an emerging standard for describing media content and is used in image classification systems. Although it does not deal with the actual encoding and decoding of video and audio, it addresses the lack of a standard for describing visual image content. It uses XML to store metadata. The aim, scope, and details of the MPEG-7 standard are well summarized by Sikora of Technical University Berlin [1]. Among the many studies that use MPEG-7 descriptors, Ro et al. [2] present a texture-based image description and retrieval method using an adapted version of the MPEG-7 homogeneous texture descriptor. Other image classification studies use descriptors such as a contour-based shape descriptor [3], the edge histogram descriptor [4], and a combination of the color structure and homogeneous texture descriptors [5]. As part of the EU aceMedia project, Spyrou et al. propose three image classification techniques based on fusing various low-level MPEG-7 visual descriptors [6]. Because including the descriptors directly would be inappropriate and incompatible, fusion is required to bridge the semantic gap between the target semantic classes and the low-level visual descriptors. One content-based image retrieval system (CBIRS) combines neural networks with the MPEG-7 standard: researchers at Helsinki University of Technology developed PicSOM (Picture + Self-Organizing Map, SOM), a neural, self-organizing system that retrieves images based on their content [7]. The technique relies on pictorial examples and relevance feedback (RF), and the system is implemented with tree-structured SOMs that are provided with MPEG-7 content descriptors. The authors compare PicSOM indexing with a reference system based on vector quantization (VQ), and their results show that MPEG-7 content descriptors can be used in PicSOM even though Euclidean distance is not the optimal metric for all of them.
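One simple form of descriptor fusion, often called early fusion, concatenates the normalized per-descriptor feature vectors into a single input vector for the classifier. The sketch below illustrates only that idea; it is not one of the techniques of Spyrou et al. [6], and the function name, descriptor keys, and values are illustrative assumptions.

```python
from typing import Dict, List, Sequence


def early_fusion(features: Dict[str, Sequence[float]],
                 order: Sequence[str]) -> List[float]:
    """Concatenate per-descriptor feature vectors in a fixed order.

    A fixed order keeps each input position meaning the same thing for
    every image, which a neural network classifier relies on.
    """
    fused: List[float] = []
    for name in order:
        if name not in features:
            raise KeyError(f"missing descriptor: {name}")
        fused.extend(float(v) for v in features[name])
    return fused


# Illustrative, made-up values for one image; real MPEG-7 descriptors
# have many more dimensions.
image_features = {
    "ColorLayout": [0.20, 0.55, 0.31],
    "EdgeHistogram": [0.10, 0.00, 0.75, 0.40],
}
print(early_fusion(image_features, ["ColorLayout", "EdgeHistogram"]))
# [0.2, 0.55, 0.31, 0.1, 0.0, 0.75, 0.4]
```

The fused vector's length is the sum of the descriptor dimensions, so the network's input layer grows with every descriptor added.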
Neural networks have long been used to achieve high-accuracy pattern recognition and image classification. Kanellopoulos and Wilkinson experiment with different neural networks and classifiers for image classification, including multilayer perceptrons and a maximum likelihood classifier [8]. Their paper examines best practice in areas such as network architecture selection, the use of optimization algorithms, scaling of input data, avoiding chaos effects, the use of enhanced feature sets, and the use of hybrid classifier methods, and it also offers recommendations and strategies for using neural networks effectively and efficiently. It is known that the neural networks in an ensemble classification system should make different errors to be effective. Giacinto and Roli therefore propose an approach to the automatic design of neural network ensembles [9]. The approach aims to select, from a given large set of neural networks, the subset whose members are most error-independent. It consists of an overproduction phase and a choice phase, which selects the subset of networks. The overproduction phase builds on the work of Partridge [10], and the choice phase is subdivided into an unsupervised learning step that identifies subsets and a final step that creates the ensemble by selecting from those subsets. Kim et al. proposed a neural-network-based classification module using MPEG-7 [11, 12]. In this model, the inputs to the neural network are the feature values of MPEG-7 descriptors extracted from images. Since different descriptors represent different features of a given image, a proper evaluation process is required to choose the best one for adult image classification.

2.2 Cartoon Image Classifications

Glasberg et al. propose an approach for classifying MPEG-2 video sequences as cartoon or non-cartoon by analyzing specific color, texture, and motion features of consecutive frames in real time [13]. This is an instance of the well-known video genre classification problem, in which popular TV broadcast genres such as cartoons, commercials, music, news, and sports are studied. In their system, the features extracted by the visual descriptors are nonlinearly weighted with a sigmoid function and then combined by a multilayer perceptron to produce a reliable recognition. Rama et al. proposed a flexible scheme based on a nonlinear classifier called the Fuzzy Integral; this operator is intended both to provide a relevance measure for every feature involved in the classification and to classify the images [14]. Berkels et al. specifically extract cartoon images from 2D aerial images of urban areas, which are mainly characterized by rectangular geometries of locally varying orientation. Their method is based on a joint classification of shape orientation and rectangular structure, which are preserved before the image shapes are restored [15]. Humphrey implemented a system that uses low-level descriptors to classify a query image as either cartoon or non-cartoon. His system employs a neural network trained on a database of ground-truth features. The system performs relatively well on training data compared to previous work, but fails to achieve the same classification accuracy on test data [16]. Bosch et al. classify images according to the object categories they contain, across a large number of categories. To do this, they combine three ingredients: (1) shape and appearance representations that support spatial pyramid matching over a region of interest, (2) automatic selection of the regions of interest during training, and (3) random forests as a multi-way classifier. The advantage of such classifiers is that they are easy to train and test [17].

3 The Proposed Cartoon Image Classification System

3.1 The Proposed Architecture

Sample images of single-cut and multi-cut cartoons are shown in Figure 1 and Figure 2, respectively. The proposed system classifies each image into one of three categories, which are predefined per cartoonist: s1, s2, and s3 for single-cut cartoons and m1, m2, and m3 for multi-cut cartoons.

Fig. 1. Example images of single-cut cartoons used in the simulation: (a) s1, (b) s2, (c) s3

Fig. 2. Example images of multi-cut cartoons used in the simulation: (a) m1, (b) m2, (c) m3

Figure 3 shows the overall architecture of the proposed cartoon image classification system. The architecture consists of two modules: a feature extraction module and a classification module. Features defined by MPEG-7 descriptors are extracted from the given query images and then used as inputs to the classification module.

3.2 Feature Extraction

The features of the training images are extracted in XML format by running the MPEG-7 XM (eXperimentation Model) program. In the next step, this XML feature information is parsed and normalized, separately for each descriptor, into values between 0 and 1. These normalized values are used as inputs to the neural network classifier. Each normalized feature vector is followed by its class information, which is attached as an orthogonal vector. For example, a category-one cartoon image is represented as (1 0 0), whereas category-two and category-three images are represented as (0 1 0) and (0 0 1), respectively.
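The normalization and labeling steps can be sketched as follows. The text says only that values are normalized into [0, 1] for each descriptor, so per-dimension min-max scaling with statistics taken from the training images, and clipping of test values that fall outside that range, are assumptions. The function names and raw values are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_min_max(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-dimension minimum and maximum over the training feature vectors."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def normalize(row: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> List[float]:
    """Scale each value into [0, 1]; values outside the training range are clipped."""
    out = []
    for v, a, b in zip(row, lo, hi):
        if b == a:  # constant dimension: nothing to scale by
            out.append(0.0)
        else:
            out.append(min(1.0, max(0.0, (v - a) / (b - a))))
    return out


def one_hot(class_index: int, num_classes: int = 3) -> List[int]:
    """Orthogonal class vector, e.g. class 0 -> [1, 0, 0]."""
    vec = [0] * num_classes
    vec[class_index] = 1
    return vec


# Hypothetical raw descriptor values for three training images.
train = [[12.0, 3.0], [20.0, 7.0], [16.0, 5.0]]
labels = [0, 1, 2]
lo, hi = fit_min_max(train)
samples = [normalize(r, lo, hi) + one_hot(c) for r, c in zip(train, labels)]
print(samples[2])  # [0.5, 0.5, 0, 0, 1]
```

Fitting the minimum and maximum on the training set only, and reusing them for test images, keeps test data from influencing the scaling.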

Fig. 3. The overall architecture of the cartoon image classification system (diagram blocks: Training DB, Testing DB, Feature Extraction, Normalization, Single cut Classification module, Multi cut Classification module)
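The data flow in Fig. 3 can be sketched as a small pipeline in which features are extracted, normalized, and passed to one of two independently trained classification modules, one for single-cut and one for multi-cut cartoons. This is an illustration only: `make_pipeline` and the stand-in extractor, scaler, and classifiers are hypothetical (in the paper, extraction is done by the MPEG-7 XM program and classification by trained neural networks), and since the text does not say how an image is routed to a module, the caller supplies the cut type.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def make_pipeline(
    extract: Callable[[str], Vector],
    normalize: Callable[[Vector], Vector],
    modules: Dict[str, Callable[[Vector], int]],
    labels: Dict[str, Sequence[str]],
) -> Callable[[str, str], str]:
    """Chain extraction, normalization, and the module for a given cut type."""

    def classify(image_path: str, cut_type: str) -> str:
        features = normalize(extract(image_path))
        class_index = modules[cut_type](features)  # "single" or "multi" module
        return labels[cut_type][class_index]

    return classify


# Hypothetical stand-ins, for demonstration only.
def fake_extract(path: str) -> Vector:
    # A real extractor would parse MPEG-7 XM output for the image at `path`.
    return [1.0, 8.0, 3.0]


def scale(v: Vector) -> Vector:
    # Assumes raw values lie in [0, 10]; clips anything outside.
    return [min(1.0, max(0.0, x / 10.0)) for x in v]


def always_first(v: Vector) -> int:
    return 0


def argmax(v: Vector) -> int:
    return max(range(len(v)), key=v.__getitem__)


classify = make_pipeline(
    fake_extract,
    scale,
    modules={"single": always_first, "multi": argmax},
    labels={"single": ("s1", "s2", "s3"), "multi": ("m1", "m2", "m3")},
)
print(classify("page.png", "single"), classify("page.png", "multi"))  # s1 m2
```

Keeping the two modules separate mirrors the figure: each is trained only on its own three cartoonist classes.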

3.3 Classification Module

The classification module employs a neural network. The classifier learns the relation between feature values and the corresponding class by adjusting the weights between nodes. We use the backpropagation algorithm to train the network, which consists of an input layer, an output layer, and multiple hidden layers. The number of input nodes depends on the dimension of each descriptor, whereas the number of output nodes is three. The class information for the three output nodes is represented orthogonally, e.g., (1, 0, 0), depending on the class, as described above. In the testing process, as in training, the system extracts features from query images using MPEG-7 descriptors and classifies them with the neural network produced by the training process.
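The classification module can be sketched as a small fully connected network trained with backpropagation against one-hot targets, predicting the class of the largest output. The sigmoid activations, squared-error loss, per-sample updates, learning rate, and weight initialization are assumptions; the text specifies only backpropagation, one input node per descriptor dimension, multiple hidden layers, and three orthogonally coded output nodes. The class name `MLP` and the toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class MLP:
    """Fully connected sigmoid network trained by per-sample backpropagation
    on the squared error between its outputs and one-hot class targets."""

    def __init__(self, sizes: Sequence[int], lr: float = 0.5, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.lr = lr
        # weights[k][j] holds the incoming weights of node j in layer k + 1;
        # the last entry of each row is that node's bias.
        self.weights = [
            [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])
        ]

    def _forward(self, x: Sequence[float]) -> List[List[float]]:
        """Return the activations of every layer, input layer first."""
        activations = [list(x)]
        for layer in self.weights:
            prev = activations[-1]
            # zip stops at len(prev), so the bias row[-1] is added separately.
            activations.append(
                [sigmoid(sum(w * a for w, a in zip(row, prev)) + row[-1]) for row in layer]
            )
        return activations

    def train_step(self, x: Sequence[float], target: Sequence[int]) -> None:
        acts = self._forward(x)
        # Output-layer deltas for squared error with sigmoid units.
        deltas = [(o - t) * o * (1.0 - o) for o, t in zip(acts[-1], target)]
        for k in range(len(self.weights) - 1, -1, -1):
            layer, prev = self.weights[k], acts[k]
            if k > 0:
                # Propagate deltas backwards using the weights before this update.
                prev_deltas = [
                    sum(layer[j][i] * deltas[j] for j in range(len(layer))) * a * (1.0 - a)
                    for i, a in enumerate(prev)
                ]
            for j, row in enumerate(layer):
                for i, a in enumerate(prev):
                    row[i] -= self.lr * deltas[j] * a
                row[-1] -= self.lr * deltas[j]
            if k > 0:
                deltas = prev_deltas

    def predict(self, x: Sequence[float]) -> int:
        """Index of the output node with the largest activation."""
        out = self._forward(x)[-1]
        return max(range(len(out)), key=out.__getitem__)


# Toy, made-up data: three separable clusters standing in for normalized
# descriptor vectors of three cartoonists, with one-hot targets.
data = [
    ([0.10, 0.10], [1, 0, 0]), ([0.15, 0.20], [1, 0, 0]),
    ([0.90, 0.10], [0, 1, 0]), ([0.85, 0.20], [0, 1, 0]),
    ([0.50, 0.90], [0, 0, 1]), ([0.45, 0.85], [0, 0, 1]),
]
net = MLP([2, 8, 3], lr=0.5, seed=1)
for epoch in range(3000):
    for x, t in data:
        net.train_step(x, t)
print([net.predict(x) for x, _ in data])
```

For the configuration reported in the simulation setup (two hidden layers of 50 nodes each), the network would be built as `MLP([d, 50, 50, 3])`, where `d` is the dimension of the descriptor in use; the toy run uses one small hidden layer so that it finishes quickly in pure Python.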

4 Simulation and Results

4.1 Environments

The simulation uses a total of 600 images for training (100 per cartoon class) and 300 images for testing (50 per cartoon class). Five descriptors are used for feature values: Color Layout, Color Structure, Region Shape, Homogeneous Texture, and Edge Histogram. The inputs consist of normalized MPEG-7 descriptor values. Two classification modules were evaluated, one for single-cut cartoon images and the other for multi-cut cartoon images. Both modules were equipped with two hidden layers of 50 nodes each and trained for 100,000 iterations. The output layer consists of three nodes, one for each cartoon class.

4.2 Result

The simulation results of the five descriptors for single-cut and multi-cut cartoon images are shown in Table 1 and Table 2, respectively. As Table 1 shows, for single-cut cartoon images the system achieves an excellent result with Color Structure, whose average accuracy is 99.34%. The system also performs well with the other four descriptors; for example, the average accuracies with Edge Histogram and Homogeneous Texture are 90.67% and 93.98%, respectively. Region Shape performs relatively poorly but still reasonably, with an average accuracy of over 81%.

Table 1. Classification Results for Single Cut Cartoon Images

Descriptor            Class   S1 (%)   S2 (%)   S3 (%)
Color Layout          S1       88.00    12.00     0.00
                      S2        6.00    84.00    10.00
                      S3        0.00    18.37    81.63
Region Shape          S1       82.00    16.00     2.00
                      S2       10.00    84.00     6.00
                      S3       14.29     8.16    77.55
Homogeneous Texture   S1       92.00     8.00     0.00
                      S2        0.00    94.00     6.00
                      S3        0.00     4.08    95.92
Color Structure       S1      100.00     0.00     0.00
                      S2        2.00    98.00     0.00
                      S3        0.00     0.00   100.00
Edge Histogram        S1       86.00    14.00     0.00
                      S2       14.00    86.00     0.00
                      S3        0.00     0.00   100.00
Average               S1       89.60    10.00     0.40
                      S2        6.40    89.20     4.40
                      S3        2.86     6.13    91.02
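Each descriptor block in Table 1 reads as a confusion matrix whose rows each sum to 100% within rounding (for example, 0.00 + 18.37 + 81.63 for S3 under Color Layout). Assuming each row gives the percentage of test images of that class assigned to each class, such a block can be computed from true and predicted labels as sketched below. The function name and the labels are made up for illustration.

```python
from typing import List, Sequence


def confusion_percent(y_true: Sequence[int], y_pred: Sequence[int],
                      num_classes: int = 3) -> List[List[float]]:
    """Row i: percentage of class-i test images assigned to each class."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    percent = []
    for row in counts:
        total = sum(row)
        # A class with no test images gets an all-zero row instead of a division by zero.
        percent.append([100.0 * c / total if total else 0.0 for c in row])
    return percent


# Made-up labels: four test images per class.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 1, 2]
for row in confusion_percent(y_true, y_pred):
    print(" ".join(f"{v:6.2f}" for v in row))
```

The diagonal of each block holds the correct-classification rates, which is why the per-descriptor accuracies in the text are averages of diagonal entries.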

Table 2. Classification Results for Multi Cut Cartoon Images

Descriptor            Class   M1 (%)   M2 (%)   M3 (%)
Color Layout          M1       96.00     4.00     0.00
                      M2        8.00    82.00     2.00
                      M3        1.89    13.21    84.91
Region Shape          M1       82.00     8.00    10.00
                      M2        0.00    80.00    20.00
                      M3       14.29     8.16    77.55
Homogeneous Texture   M1      100.00     0.00     0.00
                      M2        0.00    96.00     4.00
                      M3        0.00     4.08    95.92
Color Structure       M1       96.00     4.00     0.00
                      M2        0.00    98.00     2.00
                      M3        0.00     0.00   100.00
Edge Histogram        M1      100.00     0.00     0.00
                      M2        0.00    98.00     0.00
                      M3        0.00     0.00   100.00
Average               M1       94.80     3.20     2.00
                      M2        1.60    90.80     5.60
                      M3        3.24     5.09    91.68
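The Average block of each table equals the element-wise mean of the five descriptor blocks, and the per-descriptor averages quoted in the text are the means of each block's diagonal. The sketch below checks this on the values of Table 2; only the helper names are new.

```python
from typing import Dict, List

Matrix = List[List[float]]

# Values of Table 2 in percent; rows and columns are in the order M1, M2, M3.
TABLE2: Dict[str, Matrix] = {
    "Color Layout": [[96.00, 4.00, 0.00], [8.00, 82.00, 2.00], [1.89, 13.21, 84.91]],
    "Region Shape": [[82.00, 8.00, 10.00], [0.00, 80.00, 20.00], [14.29, 8.16, 77.55]],
    "Homogeneous Texture": [[100.00, 0.00, 0.00], [0.00, 96.00, 4.00], [0.00, 4.08, 95.92]],
    "Color Structure": [[96.00, 4.00, 0.00], [0.00, 98.00, 2.00], [0.00, 0.00, 100.00]],
    "Edge Histogram": [[100.00, 0.00, 0.00], [0.00, 98.00, 0.00], [0.00, 0.00, 100.00]],
}


def elementwise_mean(matrices: List[Matrix]) -> Matrix:
    """Average corresponding cells across equally sized matrices."""
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) / n for j in range(cols)] for i in range(rows)]


def mean_diagonal(matrix: Matrix) -> float:
    """Mean of the correct-classification percentages (the diagonal)."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)


average = elementwise_mean(list(TABLE2.values()))
for row in average:
    print(" ".join(f"{v:6.2f}" for v in row))
for name, matrix in TABLE2.items():
    print(f"{name}: {mean_diagonal(matrix):.2f}")
```

Running it reproduces the Average block of Table 2 and, up to rounding in the last digit, the averages quoted for Color Structure, Edge Histogram, Homogeneous Texture, and Region Shape.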


As Table 2 shows, for multi-cut cartoon images the system also achieves an excellent result with Color Structure, whose average accuracy is 98%. With Edge Histogram and Homogeneous Texture, the average accuracies are 99.34% and 97.31%, respectively. Region Shape, however, performs worse than the other descriptors, at 79.85%. Overall, the proposed system performs better with Color Structure, Homogeneous Texture, and Edge Histogram than with the other two descriptors. These results are promising, and the approach can be applied to other image processing domains. For example, it can be extended to medical image processing, where determining whether an image indicates a particular symptom is critical. It can also serve as the core of an image search engine or image collection engine, and as a useful tool for image retrieval in large image databases.

5 Conclusion

In this paper, we presented a cartoon image classification system that employs MPEG-7 descriptors as image feature values. The proposed system learns the features of particular cartoon images and classifies the images into multiple classes according to cartoonist. In the performance simulation, the proposed system successfully classified images into multiple classes with an accuracy of over 90%.

References

1. Sikora, T.: The MPEG-7 visual standard for content description – an overview. IEEE Transactions on Circuits and Systems for Video Technology 11(6), 696–702 (2001)
2. Ro, Y., Kim, M., Kang, H., Manjunath, B., Kim, J.: MPEG-7 homogeneous texture descriptor. ETRI Journal 23(2), 41–51 (2001)
3. Bober, M.: The MPEG-7 visual shape descriptors. IEEE Transactions on Circuits and Systems for Video Technology 11(6), 716–719 (2001)
4. Won, C., Park, D., Park, S.: Efficient use of MPEG-7 edge histogram descriptor. ETRI Journal 24(1), 23–30 (2002)
5. Pakkanen, J., Ilvesmäki, A., Iivarinen, J.: Defect image classification and retrieval with MPEG-7 descriptors. In: Bigun, J., Gustavsson, T. (eds.) SCIA 2003. LNCS, vol. 2749, pp. 349–355. Springer, Heidelberg (2003)
6. Spyrou, E., Borgne, H., Mailis, T., Cooke, E., Avrithis, Y., O'Connor, H.: Fusing MPEG-7 visual descriptors for image classification. In: Duch, W., Kacprzyk, J., Oja, E., Zadrożny, S. (eds.) ICANN 2005. LNCS, vol. 3697, pp. 847–852. Springer, Heidelberg (2005)
7. Laaksonen, J., Koskela, M., Oja, E.: PicSOM – Self-organizing image retrieval with MPEG-7 content descriptor. IEEE Transactions on Neural Networks: Special Issue on Intelligent Multimedia Processing 13(4), 841–853 (2002)
8. Kanellopoulos, I., Wilkinson, G.: Strategies and best practice for neural network image classification. International Journal of Remote Sensing 18(4), 711–725 (1997)
9. Giacinto, G., Roli, F.: Design of effective neural network ensembles for image classification purposes. Image and Vision Computing 19(9-10), 699–707 (2001)
10. Partridge, D.: Network generalization differences quantified. Neural Networks 9(2), 263–271 (1996)
11. Kim, W., Lee, H.-K., Yoo, S.-J., Baik, S.W.: Neural Network Based Adult Image Classification. In: Duch, W., Kacprzyk, J., Oja, E., Zadrożny, S. (eds.) ICANN 2005. LNCS, vol. 3696, pp. 481–486. Springer, Heidelberg (2005)
12. Kim, W., Lee, H.-K., Park, J., Yoon, K.: Multi Class Adult Image Classification Using Neural Networks. In: Kégl, B., Lee, H.-H. (eds.) Canadian AI 2005. LNCS (LNAI), vol. 3501, pp. 222–226. Springer, Heidelberg (2005)
13. Glasberg, R., Elazouzi, K., Sikora, T.: Cartoon-Recognition Using Visual-Descriptors and a Multilayer-Perceptron. In: WIAMIS, Montreux, April 13-15 (2005)
14. Rama, A., Tarres, F., Sanchez, L.: Cartoon Detection Using Fuzzy Integral. In: WIAMIS 2007: Proceedings of the Eighth International Workshop on Image Analysis for Multimedia Interactive Services (2007)
15. Berkels, B., Burger, M., Droske, M., Nemitz, O., Rumpf, M.: Cartoon Extraction Based on Anisotropic Image Classification. In: Modeling and Visualization. Springer, Heidelberg (2006)
16. Humphrey, E.: Cartoon Recognition and Classification. University of Miami (2009)
17. Bosch, A., Zisserman, A., Munoz, X.: Image Classification using Random Forests and Ferns. In: IEEE 11th International Conference on Computer Vision (2007)