The 11th International Conference on Information Sciences, Signal Processing and their Applications: Special Sessions

NUDITY DETECTION BASED ON IMAGE ZONING

Clayton Santos, Eulanda M. dos Santos, Eduardo Souto
Institute of Computing (IComp)
Federal University of Amazonas (UFAM), Manaus, Brazil

ABSTRACT

Automatic nudity detection strategies play an important role in solutions for controlling access to inappropriate content. These strategies usually apply skin filters or similar approaches in order to detect nudity in digital images. In this paper we propose a strategy for nudity detection based on image zoning. Moreover, we perform feature extraction using color and texture information, as well as feature selection, in order to point out the most relevant features. SVM is used for image classification. Experimental results demonstrate that an effective improvement in accuracy is attained by using the zoning strategy. In addition, the highest classification rates are obtained by combining two color-based and two texture-based features, which shows that it is possible to reduce the number of features while keeping high classification rates.

Index Terms – nudity detection, feature analysis, image zoning.

1. INTRODUCTION

The huge amount of multimedia data available on the Internet is one of the main reasons for the great attention the Internet has attracted. Nevertheless, it has also allowed an increase in the amount of pornographic content, since there is a lack of information control on the Web. As a consequence, interest in automatic pornography detection is rising, especially due to parental/institutional control of Internet-surfing children and employees, and to Internet security. This second issue refers to the fact that pornographic content may lead to pages with malicious threats. A recent report provided by Symantec [12] clearly shows that there is a direct relationship between pornography and the dissemination of malware. Nudity detection in images may be an initial step toward porn site identification. It is not surprising that this idea has already been investigated in the literature [7,11,13]. These previous works, as well as other approaches, have one characteristic in common: they employ filters to detect skin as the first level of the nudity classification process.

Arentz and Olstad [1] point out that the identification of skin areas is the main factor for successfully detecting nudity in images. Besides, after the skin detection level, most of these strategies work by extracting features from the images, such as color [18], texture [15], shape [17] and location of skin pixels [12]. Image feature extraction has also received a great deal of attention in the recent literature on content-based image retrieval. Kalva et al. [9] introduced the idea that by dividing an image into independent zones one may be able to extract local features and discard irrelevant information. According to the authors, the most relevant features in an image are provided by its central regions. It is important to mention that Kalva et al. [9] did not focus on nudity detection. They dealt with five classes: vehicle, people, domestic animal, motorcycle and CD/DVD. Their interesting results motivated us to investigate whether or not image zoning may improve nudity detection rates.

Thus, this paper proposes a strategy for nudity detection in images, which is divided into two modules: (1) skin filter detection; and (2) image zoning. The objective of the image zoning module is to divide images into independent parts, based on the assumption that a nude image presents a higher amount of skin pixels in its central region. Then, six features are extracted for each zone, namely two color-based and four texture-based features. It is noteworthy that these features are the most frequent information employed in works dealing with nudity detection in images. Finally, nudity detection is performed using SVM (Support Vector Machines) [3].

In addition, this paper presents a feature selection process. Our goal is to evaluate the relevance of each feature for the classification performance. At the same time, we try to increase the detection rate while reducing the dimensionality of the feature space. It is important to note that feature selection has not often been performed in previous work on nudity detection. In this paper, the term nudity is defined as the state in which a person is fully stripped of clothing or wearing very little clothing, exposing a private part. Therefore, in this context, the image of a child on a beach is considered a nude picture.

The remainder of this paper is organized as follows. Section 2 presents research work related to nudity detection in images. Our proposed zoning strategy and the investigated features are introduced in Section 3. Then, Section 4 presents the experiments and the obtained results. Finally, Section 5 presents the conclusions and future work.


2. RELATED WORK

In this section we discuss some existing nudity detection methods that use a skin filter based on color space combined with other features such as color histograms, texture analysis and shape measures. Most of these systems employ machine learning classifiers.

One of the pioneering works was done by Forsyth and Fleck [7]. They combine a skin filter using color and texture properties to detect bare parts in human body images. These skin regions are then fed to a clustering process, which attempts to group human figures using geometric constraints obtained from the human body structure. The system attained 60% precision on a test set composed of 138 nude images and 1,401 assorted control images, containing images of people but with no nudity.

Another work was conducted by Jiao et al. [6]. They first use the YUV and YIQ color spaces to detect skin areas in images. Then, the Sobel operator and the Gabor filter are applied to remove non-skin pixels. Finally, color features are extracted and provided to an SVM classifier. The proposed method obtains a precision of 89.3% on a test set composed of 1,200 nude images and 1,200 assorted non-nude images. Some of the non-nude samples are images of people with no nudity.

Zhu et al. [15] present a two-step adaptive framework for accurate skin-color detection. In the first step, they identify the skin-similar pixels using a generic skin model. Texture information is then used to train a Gaussian Mixture Model (GMM). An SVM classifier, using the spatial and shape features along with the Gaussian parameters, is trained to separate the correct skin Gaussian component from the trained GMM. In comparison with traditional methods, this combination is able to achieve 88.9% accuracy on a test set of 400 nude images and 400 assorted images.

Schettini et al. [5] propose a pornography detection model that combines color, edge, and texture features. They compare decision forests generated using CART (Classification And Regression Trees) and SVM. Their database consists of 1,500 pictures, of which 750 are pornography samples. The results show that SVM achieves the best performance, i.e. 90.4% accuracy.

Lee et al. [11] present a nudity detection algorithm based on a learning-based chromatic distribution matching scheme that consists of an online sampling mechanism and a one-class-one-net neural network. The authors argue that the object's chroma distribution can be determined online, so that the skin color deviation caused by lighting can be accommodated without sacrificing accuracy. The results show detection rates of 86.4% and 94.8% for nude and non-nude images, respectively.

As can be observed, none of these methods is based on partitioning a digital image into multiple zones. As previously mentioned, we suggest that by partitioning the physical area of an image into a number of squares, called zones, we may simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. The use of zones allows image regions to be dealt with independently, i.e., feature extraction may be conducted locally.

3. METHODOLOGY

In this section, we describe our image zoning strategy. In order to better illustrate the whole nudity detection process, we adopt the architecture most often applied in works that perform skin filtering [1, 5, 6, 11, 15], as shown in Figure 1. First, input images are normalized and then segmented through a filtering mechanism. Here, however, instead of performing feature extraction directly on the entire image, we divide the images into zones, which are then used for local feature extraction. The features obtained from each zone are combined into a feature vector and submitted to an SVM for classification into nude or neutral (non-nude) classes. Below, we present details about the zoning strategy, as well as the remaining modules.

Figure 1. Overview of the automatic nudity detection process.

3.1. Normalization

As mentioned in the introduction, image nudity detection may be assumed to be an initial step in identifying porn sites. Therefore, input images must be obtained from the Internet. This fact leads us to deal with images provided in different formats and resolutions, which increases the complexity of an automatic nudity detection process. In order to provide images with the same size, format, and other standard aspects, a normalization step is performed. Different configurations were evaluated for the image size (32x32, 64x64, 128x128, 256x256). The best results were achieved when the original images were re-sized to approximately 256 columns by 256 lines. Thus, at this step, all images are normalized to 256x256 pixels and converted to JPEG format.
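As a minimal sketch of this step, the snippet below resizes an input image to 256x256 pixels and re-encodes it as JPEG. It uses the Pillow library purely for illustration; the paper does not state which tool was used for normalization.

```python
from PIL import Image

def normalize_image(src_path: str, dst_path: str, size: int = 256) -> None:
    """Resize an input image to size x size pixels and save it as JPEG."""
    img = Image.open(src_path).convert("RGB")   # drop alpha/palette modes
    img = img.resize((size, size))              # 256x256, as in Section 3.1
    img.save(dst_path, format="JPEG")

normalize_image("input.png", "normalized.jpg")
```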


3.2. Skin Filter

The objective of this second module is image segmentation. Normalized images are partitioned into two regions: (1) skin region; and (2) non-skin region. According to Zhu et al. [16], the most relevant image features are based on color, since color provides the highest-level features for detecting and locating objects. Several color spaces have been applied in the literature for nudity detection. YCbCr is the color space used in the experiments presented throughout this paper. This choice is especially due to the fact that YCbCr is able to cope with luminosity, which is the key issue in images containing human skin. It is widely accepted that skin absorbs light owing to melanin, a substance that surrounds and protects human hair and skin and is more abundant in dark-skinned people [8]. Hence, it is important to use a color space that properly isolates and handles brightness in the image [11] in order to detect nudity successfully. Another reason for using YCbCr is the fact that human skin occupies a limited range of pixel values in this color space. Finally, according to Kelly et al. [10], by using YCbCr we may achieve high skin detection rates while reducing computational cost. The image segmentation process starts by converting the RGB color space into YCbCr, where Y is the component that provides luminance information, and Cb and Cr (Equations 1 and 2) are the components that provide chroma (chromaticity) information:

Cb = B / (R + G + B)    (1)

Cr = R / (R + G + B)    (2)

After this conversion, the images are pre-processed using a digital image processing technique known as histogram analysis to determine the variation of skin pixel values in the image. Figure 2 shows some samples of skin areas obtained from different regions of people's faces, covering different ethnic groups and lighting conditions. Each sample is composed of 30x30 pixels.

Figure 2. Examples of skin samples used to obtain chromaticity.

Finally, the mean image chromaticity (Cb and Cr components) is obtained by means of histogram analysis. In our experiments, this analysis pointed out that pixel values related to skin range from 77 to 127 in the Cb component and from 133 to 173 in the Cr component. The definition of these ranges of pixel values allowed us to divide images into skin and non-skin pixels. Figure 3 shows examples of images and their corresponding results after applying the skin filter, i.e. the segmentation output.

Figure 3. Images showing the result of the skin filter. (a) original nude image; (b) skin filter result on the nude image; (c) image showing people with clothes; (d) skin filter result on image (c); (e) image with no skin; and (f) result of the filter on the image with no skin.
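A minimal sketch of such a filter is shown below. It assumes OpenCV's standard ITU-R BT.601 RGB-to-YCrCb conversion as a stand-in for Equations 1 and 2, and applies the Cb (77-127) and Cr (133-173) ranges reported above; it is an approximation of the described filter, not the authors' implementation.

```python
import cv2
import numpy as np

def skin_mask(image_bgr: np.ndarray) -> np.ndarray:
    """Return a binary mask marking pixels whose Cb/Cr values fall in the skin range."""
    ycrcb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2YCrCb)  # channel order: Y, Cr, Cb
    cr = ycrcb[:, :, 1]
    cb = ycrcb[:, :, 2]
    # Skin ranges reported in Section 3.2: 77 <= Cb <= 127 and 133 <= Cr <= 173
    mask = (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)
    return mask.astype(np.uint8)

image = cv2.imread("normalized.jpg")   # 256x256 image produced by the normalization step
mask = skin_mask(image)
print("skin pixel ratio:", mask.mean())
```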

3.3. Zoning Images

In pictures, the central region is the most representative, while the outer regions represent scenery or other less important elements [9]. In this study, we adopt the hypothesis that nude images are composed of skin regions located predominantly in their central region. Indeed, by observing nude images, it is possible to verify that the naked body usually occupies a significant portion of the image and that its position is close to the image center, as also observed in [11]. From this assumption, local feature vectors may be generated. Accordingly, the concept of zoning means that regions are defined in the image and treated independently, i.e., feature extraction algorithms analyze each area (zone) independently [9].

Figure 4. Images showing zoning results. (a) nude image vertically oriented and its zoning partition (b); (c) nude image horizontally oriented and its zoning partition (d).

In this work, we divide the image into k zones, where each zone corresponds to a concentric rectangular region of proportional size, as shown in Figure 4. Image zoning can be modeled as a trisecting divisive process, i.e., starting from the whole image, the process recursively divides the image into k sub-regions (zones). Let I be an image, NZ(I) be the total number of zones of I, Z be a certain zone of I, FE(Z) be the local feature extraction process applied to Z, FE(I) be the feature vector of I, and CP be the classification process. This leads to the following definition of the zoning algorithm.


Algorithm 1. Zoning algorithm definition

NZ(I) = k;
For each image I do
    divide I into NZ(I) zones;
    For each zone Z do
        FE(Z);
        FE(I) = FE(I) + FE(Z);
    End
End
CP(FE(I));

The zoning algorithm is briefly explained as follows: initially, a whole image I is used as input and divided into NZ(I) zones. Then, the local feature extraction process FE(Z) is applied to each zone in order to obtain the feature vector of the image, FE(I), which is used as the parameter of the classification process CP(FE(I)). In our experiments, NZ(I) is directly related to the normalization of the images. Since the images were standardized to an approximate size of 256 columns by 256 lines, the zoning algorithm produces three zones for each image.
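The paper does not fully specify the geometry of the concentric rectangular zones, so the sketch below assumes nested rectangles whose sides grow in steps of 1/k of the image size (for k = 3: a central rectangle, a middle ring and an outer ring). It is one plausible reading of Figure 4 and Algorithm 1, not a reference implementation.

```python
import numpy as np

def split_zones(image: np.ndarray, k: int = 3) -> list:
    """Return k boolean masks of concentric rectangular zones (index 0 = central zone)."""
    h, w = image.shape[:2]
    masks = []
    covered = np.zeros((h, w), dtype=bool)
    for level in range(1, k + 1):                 # level 1 = central rectangle
        frac = level / k                          # proportional size: 1/3, 2/3, 1
        dh, dw = int(h * (1 - frac) / 2), int(w * (1 - frac) / 2)
        rect = np.zeros((h, w), dtype=bool)
        rect[dh:h - dh, dw:w - dw] = True
        masks.append(rect & ~covered)             # ring between this and the previous rectangle
        covered |= rect
    return masks

zones = split_zones(np.zeros((256, 256)), k=3)
print([int(m.sum()) for m in zones])              # pixel counts of zones 1..3
```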

3.4. Feature Extraction

The selection of relevant features to be employed in the classification process is a key point for building robust learning models. Therefore, we performed a detailed analysis of the features already employed in other works and carried out several experiments using real images. Six features are extracted for each zone, namely two color-based and four texture-based features. These features are the following:

1. Color-based features
- The number of connected skin pixels in the zone.
- The proportion of skin pixels (PS) to the total number of pixels in the image: PS = Ns/Nt, where Ns is the number of connected skin pixels and Nt is the total number of pixels in the image.

2. Texture-based features
The texture features are extracted from a Gray-Level Co-occurrence Matrix (GLCM), which represents the probability p(i, j) of pairs of gray levels i and j occurring within a defined spatial relationship in the image, where the indices i and j are the row and column of the matrix. This spatial relationship is defined in terms of a distance d and an angle θ. Then, given a GLCM, statistical measures can be extracted, such as homogeneity, correlation, contrast and energy. Considering these statistics, we found the best distance to be d = 1, computed over eight directions, θ = [0, 45, 90, 135, 180, 225, 270, 315]. The four measures are listed below, as in [5,15]:

- Contrast. Measures the contrast between the intensity of the analyzed pixel and that of its neighbor. The comparison is performed over all image pixels. For a constant image, the contrast is 0 (zero).

Contrast = Σ_{i,j} (i − j)² p(i, j)    (3)

- Correlation. Measures how correlated a pixel is with its neighbor. The comparison is performed over all image pixels. The correlation is 1 for a completely correlated pixel.

Correlation = Σ_{i,j} [(i − μ_i)(j − μ_j) p(i, j)] / (σ_i σ_j)    (4)

- Energy. Returns the sum of squared elements of the GLCM, i.e., it captures the repetition of pairs of gray levels in the image. When an image is constant, only similar pixel pairs are present and its value is close to 1. When the image is heterogeneous, there are differences among pixels and its value is close to 0.

Energy = Σ_{i,j} p(i, j)²    (5)

- Homogeneity. Returns a value that represents the closeness of the distribution of the GLCM elements to the GLCM diagonal.

Homogeneity = Σ_{i,j} p(i, j) / (1 + |i − j|)    (6)

As already mentioned, all features are computed for each zone.
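For illustration, the sketch below computes the six per-zone features with scikit-image's GLCM utilities, assuming the zone is given as an 8-bit grayscale patch plus its binary skin mask. Note that scikit-image's "energy" property is the square root of Equation 5 (the angular second moment), so this is an approximation of the measures above rather than the authors' code.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def zone_features(gray: np.ndarray, skin: np.ndarray) -> np.ndarray:
    """Compute the six per-zone features: 2 color-based + 4 GLCM texture measures."""
    # Color-based features (Section 3.4): skin pixel count and skin proportion
    ns = int(skin.sum())
    ps = ns / skin.size

    # GLCM with distance d = 1 over the eight directions (angles in radians)
    angles = np.deg2rad([0, 45, 90, 135, 180, 225, 270, 315])
    glcm = graycomatrix(gray, distances=[1], angles=angles,
                        levels=256, symmetric=False, normed=True)
    contrast = graycoprops(glcm, "contrast").mean()
    correlation = graycoprops(glcm, "correlation").mean()
    energy = graycoprops(glcm, "energy").mean()        # sqrt of ASM in scikit-image
    homogeneity = graycoprops(glcm, "homogeneity").mean()

    return np.array([ns, ps, contrast, correlation, energy, homogeneity])

# Hypothetical zone patch and skin mask, only to show the call signature
patch = (np.random.rand(86, 86) * 255).astype(np.uint8)
mask = patch > 128
print(zone_features(patch, mask))
```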

3.5. Classification

SVM is used in the classification module due to its inherent robustness and the encouraging results reported in the nudity detection literature [10]. We employed standard SVMs trained using the Weka package [14]. We first set the parameters on a small training set, and then we analysed the performance on a larger dataset.

4. EXPERIMENTS

A series of experiments has been carried out to determine whether or not including a zoning module in an automatic nudity detection architecture leads to increased detection rates. We also point out the best subset of features for this task. First, we present a description of the experimental parameter settings, such as the database, classifier parameters, and others.

4.1. Database

We used a database composed of 5,360 images. The neutral (non-nude) class was composed of 3,110 images (faces, different people, different environments, cars, motorcycles, airplanes, etc.), which were collected from the Caltech [4] data repository. The nudity class was provided by Belém et al. [2] and contains 2,250 images. This class includes nude scenes containing people of different ethnicities (African, Caucasian, Asian, Indian, European) under various lighting conditions. As mentioned previously, all images are in color and have been normalized and converted to JPEG format.


Table 1. Sensitivity (Sens.) and specificity (Spec.) attained by SVM using different combinations of features and zoning.

Feature          | No Zoning      | Zone 1         | Zone 2         | Zone 3         | All Zones
                 | Sens.  | Spec. | Sens.  | Spec. | Sens.  | Spec. | Sens.  | Spec. | Sens.  | Spec.
All features     | 84.9%  | 92%   | 96%    | 96.6% | 93.8%  | 96%   | 75.9%  | 91.1% | 97.4%  | 98.4%
Skin Count       | 76.9%  | 84.6% | 89.1%  | 92%   | 88.9%  | 92.3% | 68.1%  | 86.3% | 91.5%  | 93.7%
Skin Percent     | 81.9%  | 84.8% | 89.3%  | 92%   | 87.4%  | 92.8% | 69.6%  | 87.6% | 90%    | 94.3%
All Color        | 77.8%  | 82.2% | 89.2%  | 92.8% | 88.7%  | 93.2% | 68.1%  | 89.4% | 92%    | 95.3%
Contrast         | 0%     | 100%  | 72.1%  | 81.4% | 30%    | 95.6% | 19.6%  | 96.9% | 81.4%  | 91.6%
Correlation      | 0%     | 100%  | 72.1%  | 81.4% | 51.5%  | 88.2% | 12.1%  | 98.5% | 70%    | 89.8%
Homogeneity      | 50.3%  | 82.9% | 50.4%  | 89%   | 39.5%  | 88.5% | 13.3%  | 97.5% | 82.3%  | 91.8%
Energy           | 0%     | 100%  | 47%    | 85%   | 0%     | 100%  | 10.3%  | 99.8% | 61.5%  | 86.7%
Cont. + Corr.    | 44.7%  | 86.7% | 79.3%  | 87.7% | 52.4%  | 89%   | 23.1%  | 95.9% | 86.3%  | 93.4%
Cont. + Hmg.     | 56.3%  | 83.3% | 82%    | 88.3% | 66.6%  | 87.9% | 32.1%  | 93.1% | 87.4%  | 94%
Cont. + Energy   | 56%    | 82.8% | 80.7%  | 87.3% | 36.8%  | 94.5% | 16.7%  | 98.7% | 84.2%  | 92.3%
Enrg. + Hmg.     | 63.8%  | 81.4% | 74.8%  | 85.8% | 66.6%  | 86.4% | 37.7%  | 91%   | 85.2%  | 92.3%
Enrg. + Corr.    | 52.4%  | 86.4% | 66.2%  | 87.9% | 56.8%  | 86.3% | 13.2%  | 99.4% | 80.7%  | 92%
Hmg. + Corr.     | 70.2%  | 83%   | 77.1%  | 89.5% | 67.5%  | 87.6% | 50.4%  | 89%   | 88.4%  | 93.4%
All Texture      | 71.6%  | 84.6% | 82.3%  | 89.2% | 69.7%  | 86.9% | 52.4%  | 90%   | 91.4%  | 95.6%
Best Combination | 83.6%  | 91.1% | 95.1%  | 96%   | 93.1%  | 95.3% | 74.7%  | 90.6% | 96.8%  | 98.8%

4.2. Experimental Protocol

A classical holdout validation strategy was employed to evaluate the performance achieved by the SVM classifier, i.e. the database was partitioned into training and test sets. The training set was composed of 1,340 nudity images and 1,340 neutral pictures. We used the same number of samples for each class in order to avoid bias. The remaining samples were used to compose the test set, i.e. 1,170 neutral and 910 nudity images, respectively. Cross-validation was used to fine-tune the SVM. Our experiments indicated that the SVM with RBF kernel outperformed the polynomial kernel. In addition, the penalty parameter C and γ (owing to the use of the RBF kernel) were set to C = 100 and γ = 0.07.
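To illustrate the protocol, the sketch below trains an RBF-kernel SVM with C = 100 and γ = 0.07 and reports sensitivity and specificity on a held-out set. scikit-learn is used here as a stand-in for the Weka setup mentioned in Section 3.5, and the arrays X and y are placeholders for the extracted feature vectors (six features per zone, three zones) and the class labels (1 = nude, 0 = neutral).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

# Placeholder data: in the paper, X holds the per-zone feature vectors and
# y the class labels; random values are used here only to make the sketch runnable.
rng = np.random.default_rng(0)
X, y = rng.random((500, 18)), rng.integers(0, 2, 500)

# Holdout split (the paper used 1,340 nude + 1,340 neutral images for training).
X_train, y_train = X[:350], y[:350]
X_test, y_test = X[350:], y[350:]

clf = SVC(kernel="rbf", C=100, gamma=0.07).fit(X_train, y_train)
tn, fp, fn, tp = confusion_matrix(y_test, clf.predict(X_test), labels=[0, 1]).ravel()
sensitivity = tp / (tp + fn)   # nude images correctly classified
specificity = tn / (tn + fp)   # neutral images correctly classified
print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")
```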

4.3. Results

It is noteworthy that, as described in Section 3, we divided the input images into three zones. Moreover, we extracted six features: two color-based features (skin pixel count and skin pixel percentage) and four texture-based features (contrast, correlation, energy and homogeneity). First, all features were put together to compose five different feature spaces: (1) features with no zoning; (2) features from zone 1; (3) features from zone 2; (4) features from zone 3; and (5) features from all zones. SVM performance was measured when trained with each feature space. Further, combinations of features were tested, considering each feature space, to accomplish the feature selection module.

Table 1 reports the results attained by SVM in terms of sensitivity and specificity. Sensitivity denotes the ratio between the number of nude images correctly classified and the total number of nude images, while specificity indicates the ratio between the number of non-nude images correctly classified and the total number of images of this class.

The first question to answer is: can the zoning strategy increase the detection rate? The second row of Table 1 shows that SVM performance was superior when the classifier was trained with features extracted after the zoning process. Sensitivity increased from 84.9% to 97.4%, and specificity increased from 92% to 98.4%, when all zones and the whole set of extracted features were used. It is important to observe that Table 1 also shows results when the SVM was trained using features extracted from each zone separately. The best results were obtained using features from zone 1, which means that nudity pixels are more likely to belong to the central regions of the images. Therefore, these results confirm the hypothesis that there is a higher concentration of skin pixels in the center of the image than in the outer regions.

In terms of feature selection, we adopted an exhaustive procedure, since a small feature space is involved. Initially, each feature was used as the only information provided to train the SVM. Then, all possible combinations of features within each information source, i.e. color-based and texture-based features, were tested, allowing us to point out the best combination of features for each group. Further, the best subsets of features identified in each group were also combined. It is important to mention that the recognition rate was used as the search criterion.

It can be seen in Table 1 (last row) that the impact of the features was similar with and without zoning. Both color-based features were found to be relevant for nudity detection. In the context of texture-based features, the combination of homogeneity and correlation produced the best results. Hence, the results show that the combination of the two color-based features plus the two best texture features, i.e. homogeneity and correlation, led SVM to obtain sensitivities of 83.6% and 96.8%, and specificities of 91.1% and 98.8%, with no zoning and with zoning, respectively. Even though all combinations of features were compared, Table 1 shows only the best results.


Moreover, these experiments show that feature selection successfully reduced the number of features. The initial set of six features was reduced to four features, a 33% reduction in dimensionality, while performance was only slightly decreased. Hence, feature selection may reduce complexity whilst keeping highly accurate classifiers.

We also observed that misclassified neutral pictures present a high number of yellow and red pixels. Images illustrating sand and rock formations also had high misclassification rates. These results demonstrate that image zoning may be widely used in different applications and in several future works. In addition, we observed that the time required to process each image, less than 30 milliseconds, makes the approach applicable to online systems such as plugins, search engines and audit systems. It is important to note that, although some previous works provide important information related to their databases, such as ethnicity, number of samples and lighting conditions, a direct and fair comparison is difficult due to the lack of benchmark databases.

5. CONCLUSION

This work presented a strategy that uses image zoning along with the analysis of six features, two color-based and four texture-based, applied to nudity detection in images. SVM was used for classification. The main contribution of this paper is to confirm the hypothesis that nude images have most of their skin pixels concentrated in the central region of the image. Experiments conducted using the skin filtering algorithm and the zoning strategy demonstrated that features extracted from the central zone (first zone) provide the most relevant information for the SVM classifier. Besides, the zoning strategy effectively increased recognition rates.

Another contribution of this paper is a feature selection process intended to reduce the dimensionality of the feature space and to increase correct classification rates. Our results indicate that this objective has been accomplished, since the best subset of features is composed of two color-based and two texture-based features. In addition, the number of samples involved in this work is relatively high in comparison with most works in the nudity detection literature.

For future work we intend to add more features to improve the feature subset selection results. We also plan to investigate porn detection in the context of the Web, by combining textual analysis and nudity detection in images.

REFERENCES

[1] A. Arentz and B. Olstad, "Classifying Offensive Sites Based on Image Content", Computer Vision and Image Understanding, pp. 295-310, 2004.
[2] R. J. S. Belém and J. M. B. Cavalcanti, "SNIF: A Simple Nude Image Finder", in Proceedings of the Third IEEE Latin American Web Congress (LA-WEB'05), pp. 252-258, 2005.
[3] C. J. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition", Data Mining and Knowledge Discovery, Vol. 2, pp. 121-167, 1998.
[4] Caltech, Computational Vision Laboratory, California Institute of Technology. Available at: http://www.vision.caltech.edu/html-files/archive.html. Accessed April 2010.
[5] R. Schettini, C. Brambilla, C. Cusano, and G. Ciocca, "On the Detection of Pornographic Digital Images", VCIP 2003, pp. 2105-2113, 2003.
[6] F. Jiao, W. Gao, L. Duan, and G. Cui, "Detecting Adult Image using Multiple Features", in Proc. International Conference on Info-tech and Info-net (ICII 2001), pp. 378-383, 2001.
[7] M. Fleck, D. A. Forsyth, and C. Bregler, "Finding Naked People", in Proc. 4th European Conference on Computer Vision, Vol. 2, pp. 593-602, 1996.
[8] M. J. Jones and J. M. Rehg, "Statistical Color Models with Application to Skin Detection", Computer Vision, Vol. 46, pp. 81-86, 2002.
[9] P. R. Kalva, F. Enembreck, and A. L. Koerich, "Web Image Classification using Combination of Classifiers", IEEE Latin America Transactions, pp. 661-671, 2008.
[10] W. Kelly, A. Donnellan, and D. Molloy, "A Review of Skin Detection for Objectionable Images", in Proc. ITT 2007, pp. 310-319, 2007.
[11] J. S. Lee, Y. M. Kuo, P. C. Chung, and E. L. Chen, "Naked Image Detection Based on Adaptive and Extensible Skin Colour Model", Pattern Recognition, Vol. 40, pp. 2261-2270, 2006.
[12] Symantec, "Symantec Internet Security Threat Report – Trends for 2010". Available at: http://www.symantec.com/business/threatreport/. Accessed June 2011.
[13] X. Wang, C. Hu, and Yao, "An Adult Image Recognizing Algorithm on Naked Body Detection", IEEE International Colloquium on Computing, Communication, Control and Management, pp. 197-200, 2009.
[14] Weka, "Data Mining with Open Source Machine Learning Software in Java", University of Waikato. Available at: http://www.cs.waikato.ac.nz/~ml/weka. Accessed October 2010.
[15] H. Zhu, S. Zhou, J. Wang, and Z. Yin, "An Algorithm of Pornographic Image Detection", in Fourth International Conference on Image and Graphics, pp. 801-804, 2007.
[16] Q. Zhu, C.-T. Wu, K.-T. Cheng, and Y.-L. Wu, "An Adaptive Skin Model and its Application to Objectionable Image Filtering", ACM Multimedia, pp. 56-63, 2004.
[17] Q.-F. Zheng, W. Zeng, G. Wen, and W.-Q. Wang, "Shape-Based Adult Image Detection", in 3rd International Conference on Image and Graphics, pp. 150-153, 2004.
[18] P. Yogarajah, J. Condell, K. Curran, A. Cheddad, and P. McKevitt, "A Dynamic Threshold Approach for Skin Segmentation in Color Images", in Proc. IEEE 17th International Conference on Image Processing (ICIP 2010), Hong Kong, China, pp. 2225-2228, 2010.
