
A Contourlet-Based Face Detection Method in Color Images

Hedieh Sajedi, Mansour Jamzad (IEEE Member)
Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
[email protected], [email protected]

Abstract

The first step of any face processing system is detecting the locations in images where faces are present. In this paper we present an upright frontal face detection system based on multi-resolution analysis of the face. First, skin-color information is used to detect skin pixels in color images; then the skin-region blocks are decomposed into frequency sub-bands using the contourlet transform. Features extracted from the sub-bands are used to detect a face in each block. A multi-layer perceptron (MLP) neural network was trained to perform this classification. To decrease false positive detections we use eyes and lips template matching. These templates are obtained by averaging the corresponding parts in the LL sub-band of the contourlet decomposition. Experimental results show that the proposed algorithm is effective and efficient in detecting frontal faces in color images.

1. Introduction

Human face detection has been studied extensively in recent years due to the emergence of applications such as security access control, visual surveillance, content-based information retrieval, and advanced human-to-computer interaction like video conferencing. Face detection is also a required preliminary step for automated face recognition, and its performance greatly impacts recognition rates. Given an arbitrary image, the goal of face detection is to determine whether or not there are any faces in the image and, if present, to return the window location and extent of each face. However, face detection from a single image is a challenging task because of variability in scale, location, orientation (upright, rotated), and pose (frontal, profile). Facial expression, occlusion, and lighting conditions also change the overall appearance of faces. In this paper, we propose a face detection algorithm for color images in the presence of varying lighting conditions and complex backgrounds.

Copyright © SITIS

Skin color is considered to be a useful and discriminating image feature for face and people detection, localization, and tracking. As in almost any other computer vision research field, confounding imaging conditions (e.g., changes of illumination, shadows, shading and highlights) complicate the skin detection problem. In its first step, our proposed method detects skin regions over the entire image using the boosted skin detector algorithm proposed in [1] and then generates face candidates.

Compared with wavelets, the contourlet transform is better at detecting curves in more directions without being affected by discontinuities, so it extracts smooth curves from images very effectively. Starting from the idea that different contourlet sub-bands carry different information, in the second step we extract features from these sub-bands to discriminate faces from other patterns. The classification of patterns into face/non-face is done using an MLP neural network.

To improve the performance, we introduce two templates: mean eyes and mean lips. These templates are obtained by averaging the eyes and lips parts of the LL sub-band decomposition of the faces that were used to train the NN classifier. The motivation for using the LL sub-band is that many strong edges that would affect the mean templates are filtered out in this sub-band. Experimental results demonstrate successful face detection over a wide range of facial variations in color, position, scale and expression in images from several photo collections.

This paper is organized as follows. In Section 2 the face detection problem is described. Section 3 describes the contourlet transform. The proposed algorithm is covered in Section 4. The database used and the experimental results are presented in Section 5, and finally conclusions and future directions are given in Section 6.


2. Face Detection

In this section, we review the existing techniques to detect faces from a single intensity or color image. Yang et al. [2] classify single-image detection methods into four categories:

1. Knowledge-based methods. These rule-based methods encode human knowledge of what constitutes a typical face. Usually, the rules capture the relationships between facial features.
2. Feature invariant approaches. These algorithms aim to find structural features that exist even when the pose, viewpoint, or lighting conditions vary, and then use these to locate faces. These methods are designed mainly for face localization.
3. Template matching methods. Several standard patterns of a face are stored to describe the face as a whole or the facial features separately. The correlations between an input image and the stored patterns are computed for detection.
4. Appearance-based methods. In contrast to template matching, the models (or templates) are learned from a set of training images which should capture the representative variability of facial appearance. These learned models are then used for detection.

Numerous techniques for face detection have been proposed. Viola and Jones [3] presented a machine learning approach for face detection. The novelty of their approach comes from the integration of a new image representation (integral image), a learning algorithm (based on AdaBoost), and a method for combining classifiers (cascade). Several papers propose feature extraction based on Haar sub-bands. For example, [4,5] propose to use the intensity values, the horizontal/vertical projections, and the horizontal and vertical directions of the Haar transform. The method in [5] is a feature-based approach that detects faces by searching for local feature regions (eyes, nose, and mouth). Silhouettes have been used as templates for face localization in [6]. A set of basis face silhouettes is obtained using principal component analysis (PCA) on face examples in which the silhouette is represented by an array of bits. These eigen-silhouettes are then used with a generalized Hough transform for localization. The method in [7] scans an input image with a time-delay neural network to detect faces. To cope with size variation, the input image is decomposed using wavelet transforms. In [8], Osuna et al. developed an efficient method to train an SVM for large-scale problems, and applied it to face detection. SVMs have also been used to detect faces and pedestrians in the wavelet domain. A multi-expert approach for wavelet-based face detection is proposed in [9].

Typically, the sliding-window technique is used to search for faces in an image; many such existing techniques are time consuming [11]. In this paper, a new and effective contourlet-based approach which extracts the edge features in images is proposed. We use the contourlet transform because of its low redundancy and its good representation of edges in different directions.

3. Contourlet Transform

The contourlet transform was proposed by Do and Lu in 2003 as a simple directional extension for wavelets that fixes the sub-band mixing problem of the wavelet transform (described below) and improves directionality. They introduced a double filter bank structure that combines the Laplacian pyramid with a directional filter bank. Based on a nonredundant checkerboard filter bank, the proposed directional extension works with both the critically sampled wavelet transform and the undecimated wavelet transform [11]. Being critically sampled, the contourlet does not increase the redundancy of the overall transform. Although nonseparable in essence, it has an efficient implementation based on 1-D operations and can be easily generalized to higher dimensions. The structure of the critically sampled contourlet transform is shown in Figure 1. This step can be iterated on the lowpass sub-band for multiple levels of decomposition. The synthesis is given by the synthesis part of the checkerboard filter banks and the inverse wavelet transform. The non-subsampled contourlet transform is obtained by turning off the downsampler units of the subsampled contourlet, with some care taken over the aliasing problem. For more information refer to [12].

As shown in Figure 2(a), the 2-D wavelet transform produces one lowpass sub-band (LL) and three highpass sub-bands (HL, LH, HH), corresponding to the horizontal, vertical, and diagonal directions. Furthermore, the diagonal sub-band mixes directional information oriented at 45° and 135°. The main idea of the contourlet is to find a directional extension that further divides each highpass sub-band of the wavelet into two directions. In particular, the desired frequency partitioning of the contourlet transform is shown in Figure 2(b); it consists of six directional sub-bands roughly oriented at 15°, 45°, 75°, 105°, 135° and 165°.
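To make the sub-band terminology concrete, the following minimal Python sketch computes a one-level 2-D Haar wavelet decomposition of a small image into LL, HL, LH and HH. It illustrates only the wavelet stage: the directional filter bank that splits each highpass sub-band into two directions is not implemented. The function name `haar2d`, the choice of the Haar filter, the averaging normalization and the HL/LH naming convention are assumptions made for illustration, not details taken from the method.

```python
def haar2d(image):
    """One-level 2-D Haar decomposition of an image with even height and width.

    Returns (ll, hl, lh, hh), each half the size of the input.
    Naming assumption: HL responds to left-to-right change, LH to
    top-to-bottom change, HH to diagonal change.
    """
    rows, cols = len(image), len(image[0])
    ll, hl, lh, hh = [], [], [], []
    for r in range(0, rows, 2):
        ll_row, hl_row, lh_row, hh_row = [], [], [], []
        for c in range(0, cols, 2):
            # The 2x2 neighbourhood: a b on the top row, d e on the bottom row.
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 4.0)  # local average (lowpass)
            hl_row.append((a - b + d - e) / 4.0)  # left column minus right column
            lh_row.append((a + b - d - e) / 4.0)  # top row minus bottom row
            hh_row.append((a - b - d + e) / 4.0)  # diagonal difference
        ll.append(ll_row)
        hl.append(hl_row)
        lh.append(lh_row)
        hh.append(hh_row)
    return ll, hl, lh, hh
```

Applying `haar2d` again to the returned `ll` gives the next decomposition level, which mirrors the statement that the step can be iterated on the lowpass sub-band.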


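[Figure 1 diagram: a wavelet decomposition (WT) produces the LL sub-band (labeled 0) and the HL, LH and HH sub-bands; a directional filter bank splits the highpass sub-bands into directional sub-bands 1-6.]

Figure 1. 1-level contourlet decomposition [12]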

4. Proposed Method

Many existing face-detection approaches are time consuming even for the task of detecting a single face in an image. This is because the time-consuming sliding-window technique is widely used and, in order to detect faces with varying scales, input images have to be resized several times.

[Figure 2 diagram: two frequency planes with axes ω1 and ω2, each spanning -(π, π) to (π, π). Panel (a) is divided into the LL, HL, LH and HH regions; panel (b) is divided into a central region 0 and directional regions 1-6.]

Figure 2. Division of 2-D frequency spectrum. (a) Wavelet transform. (b) Contourlet transform [13]

As skin color is an important visible property for face detection, the above problem can be reduced if face candidates are determined based on the percentage of skin pixels in each block of the image. Our proposed face-detection algorithm can be summarized as follows:

• Perform skin-pixel classification,
• In each skin-region block, face detection can be performed to detect a possible face,
• Verify detected face by using template matching.
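The first step of this list can be sketched as a simple block filter. The sketch below assumes that a binary skin mask for the block is already available (the boosted skin detector of [1] is not implemented here) and applies the rule that blocks with less than 70% skin pixels are not passed on. The function names `skin_fraction` and `is_face_candidate` and the mask representation are hypothetical.

```python
def skin_fraction(mask):
    """Fraction of pixels flagged as skin in a binary mask (rows of 0/1 values)."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    skin = sum(1 for row in mask for pixel in row if pixel)
    return skin / total


def is_face_candidate(mask, min_fraction=0.70):
    """Keep a block only if at least `min_fraction` of its pixels are skin."""
    return skin_fraction(mask) >= min_fraction
```

A block that passes this check would then be resized and sent to the classification stage; a block that fails is discarded without any further processing, which is where the speed gain comes from.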

The face detector structure is shown in Figure 3. The preliminary step is the extraction of candidate windows. Only the windows that may contain a face, based on their percentage of skin pixels, are selected for processing; this step adopts the method proposed in [1], which is robust to changes of illumination. To improve the speed of face detection we use only the boosted skin classifier, which is a pixel-based method. Blocks with less than 70% skin pixels are not passed to the next stage. The windows resulting from this step are automatically resized to 32 rows and 32 columns, irrespective of their original size, and are normalized by histogram equalization in order to reduce noise and lighting effects. The skin filtering step both speeds up the detection and improves the performance.

The second step is the filtering of the candidate windows. Each window is transformed by a 2-D contourlet decomposition. Contourlet features are extracted, and the windows are classified as face/non-face by an MLP neural network trained on features extracted from the contourlet sub-bands. As in all other classification problems, feature selection, when properly done, can substantially enhance the face detection performance. A typical detection solution requires a face image descriptor, which is characterized by: (I) an extraction algorithm to encode face image features into feature vectors; and (II) a similarity measure to compare two face patterns.

The face descriptor contains edge features extracted from the contourlet sub-bands. We use the contourlet transform with 3 levels of decomposition. In this phase a face image is decomposed by the contourlet transform into 8 sub-images. In the vertical sub-bands the mean value of the contourlet coefficients along the horizontal axis is computed, and in the horizontal sub-bands the mean value of the coefficients along the vertical axis is computed. We use only the mean values as features to reduce dimensionality, which improves computational efficiency as well as the generalization capability of our classifier.
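The mean-value features can be sketched as follows. This is a minimal illustration, assuming each sub-band is given as a list of rows of coefficients and that the caller already knows whether a sub-band should be averaged along the horizontal or the vertical axis; the names `mean_profile` and `feature_vector` are hypothetical, and the contourlet decomposition itself is not implemented.

```python
def mean_profile(subband, along):
    """Average a 2-D sub-band along one axis.

    along="horizontal": average across each row, giving one value per row.
    along="vertical":   average down each column, giving one value per column.
    """
    if along == "horizontal":
        return [sum(row) / len(row) for row in subband]
    if along == "vertical":
        n_rows = len(subband)
        return [sum(column) / n_rows for column in zip(*subband)]
    raise ValueError("along must be 'horizontal' or 'vertical'")


def feature_vector(subbands):
    """Concatenate the mean profiles of several (coefficients, along) pairs."""
    features = []
    for coefficients, along in subbands:
        features.extend(mean_profile(coefficients, along))
    return features
```

With 8 sub-bands of 16 rows and 16 columns each (the sub-band size is an assumption here), `feature_vector` returns 8 × 16 = 128 values.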

[Figure 3 diagram, block labels: Image; Skin Detection; Blocks of Skin region; Contourlet Transform Decomposition; Detail Sub-band (LL); Other Sub-bands; MLP NN (outputs Face, NonFace); Eyes & Lips Template Matching (output Face/NonFace).]

Figure 3. Face detector structure

A sample of contourlet decomposition of a face image is shown in Figure 4.

Figure 6. Eyes and lips templates

Figure 4. Contourlet decomposition of a face image

The features extracted from the seventh sub-band of the above decomposition are shown in Figure 5. The feature vector includes these features extracted from all sub-bands. Since each face window is resized to 32×32 and each sub-band provides 16 features, the feature vector representing the 8 sub-bands has 128 dimensions.
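A 128-dimensional vector of this kind is the input of the MLP classifier. The sketch below shows a forward pass through a network with two hidden layers of 16 and 4 neurons, the sizes reported for the trained MLP. Everything else is an assumption made for illustration: the sigmoid activation, the single output neuron, the 0.5 decision threshold, and the random placeholder weights, which stand in for trained weights that are not available here.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def make_layer(n_in, n_out, rng):
    """Placeholder layer: small random weights and zero biases (untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(layers, x):
    """Propagate the input vector x through every fully connected layer."""
    for weights, biases in layers:
        x = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x


rng = random.Random(0)
# 128 inputs -> 16 hidden -> 4 hidden -> 1 output (output size is an assumption).
layers = [make_layer(128, 16, rng), make_layer(16, 4, rng), make_layer(4, 1, rng)]
score = forward(layers, [0.0] * 128)[0]
is_face = score >= 0.5  # hypothetical decision threshold
```

The small hidden layers keep the number of weights low (128×16 + 16×4 + 4×1 connections in this sketch), which is consistent with the short training time attributed to the low-dimensional feature vector.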

If the similarity of a face candidate window to the templates exceeds an experimentally determined threshold, the block is detected as a face image. We use a neural network-based approach, which involves training the system with both positive and negative patterns. The purely contourlet-based stage classifies face/non-face images only coarsely, according to the patterns learned by the MLP, so we use template matching to reduce false positive results.

Our system slides a window across the image horizontally and vertically to determine whether or not a face is present at a location in the image. To detect faces of different sizes we start from a large window size and decrease it steadily. In order to detect faces with varying orientations, input images have to be rotated several times. We consider window sizes from 70×70 down to 20×20. Block overlap is another factor that affects the accuracy and speed of the algorithm. In all experiments the windows slide with a step size of 2 pixels. Depending on the application, the speed of face detection can be improved by choosing less block overlap, at the cost of lower precision.
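The window scan described above can be sketched as a generator. The window sizes (70 down to 20) and the 2-pixel step come from the text; the amount by which the window shrinks between passes is not stated, so the `shrink` parameter and its default of 10 pixels are assumptions, as is the function name `sliding_windows`.

```python
def sliding_windows(width, height, max_size=70, min_size=20, shrink=10, step=2):
    """Yield (left, top, size) for square windows, from the largest size down.

    `shrink` (pixels removed from the window side after each pass) is a
    hypothetical value; the text only says the size decreases steadily.
    """
    size = max_size
    while size >= min_size:
        # range() is empty when the window is larger than the image.
        for top in range(0, height - size + 1, step):
            for left in range(0, width - size + 1, step):
                yield left, top, size
        size -= shrink
```

Increasing `step` reduces the overlap between neighbouring windows and therefore the number of windows to classify, which is the speed and precision trade-off mentioned in the text.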

5. Experimental Results

Figure 5. Feature extraction from contourlet decomposition of a face image

The MLP trained with the extracted features has 2 hidden layers. The first layer has 16 neurons and the second one has 4 neurons. Only the windows classified as faces in this step are processed by the following step. Finally, local features are used for verification. Predefined templates can be used to model facial features effectively. We designed two templates by averaging eyes and lips regions of face images in the LL sub-band of the contourlet decomposition. The templates are shown in Figure 6.
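The template construction and the verification step can be sketched as follows. Averaging equally sized patches follows the description above; the similarity measure is not specified in the text, so normalized cross-correlation is used here as an assumption, as are the threshold value, the rule that both templates must match, and the names `mean_template`, `ncc` and `passes_verification`. Locating the eyes and lips regions inside a window is left to the caller.

```python
import math


def mean_template(patches):
    """Pixel-wise average of equally sized 2-D patches."""
    n = len(patches)
    rows, cols = len(patches[0]), len(patches[0][0])
    return [[sum(p[r][c] for p in patches) / n for c in range(cols)]
            for r in range(rows)]


def ncc(patch, template):
    """Normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0:
        return 0.0  # a flat patch carries no structure to correlate
    return sum(x * y for x, y in zip(da, db)) / denom


def passes_verification(eye_patch, lip_patch, eye_template, lip_template,
                        threshold=0.5):
    """Accept a window only if both regions resemble their templates.

    The threshold is a hypothetical value; the text sets it experimentally.
    """
    return (ncc(eye_patch, eye_template) >= threshold
            and ncc(lip_patch, lip_template) >= threshold)
```

Because the correlation is normalized, a uniform change in brightness or contrast of a patch does not change its score, which suits a verification step that runs after histogram equalization.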

To examine the proposed face detection method, the algorithm was applied to a wide variety of images. Available data sets usually contain grayscale images only and are not suitable for evaluating our face detection approach. Face-detection experiments were done on images with frontal faces from the UCD colour database [14], on the Bao face database [15], and on color images from the Internet and family photo collections. These color images were taken under varying lighting conditions, with complex backgrounds and various visual qualities. The test set contains 40 images with 224 faces. For training the MLP classifier, face and non-face samples of size 32×32 were marked by hand. The dataset consists of 188 face blocks and 270 non-face blocks that were cropped from images in the UCD face database [14] and other collections. Some sample faces that were used to train the MLP are shown in Figure 7.


Figure 7. Face samples used to train MLP classifier

In the test phase a red square marks the areas classified as faces. Experimental results on some typical images of scenes with multiple faces are displayed in Figure 8. The proposed face-detection algorithm is shown to be effective in detecting frontal faces in color images. Detecting faces in family and group images is more challenging than in portraits with large faces, but our algorithm, as shown in Figure 8, performs quite well on family and group images. The experiments were carried out on a PIV processor with 2046 MB of memory, using MATLAB 7.1 and Image Processing Toolbox 5.0.2. It should be noted that MATLAB code is usually 9 or 10 times slower than its C/C++ equivalent [16]. The average execution times for some typical images in our experiments are shown in Table I.

Table I. Average execution time

Size of image   Skin pixel percentage   No. of enlargement iterations   Execution time (sec)
60×60           50                      3                               24
120×90          40                      4                               124
340×350         30                      5                               3100

The reported times are not very accurate because the time needed to scan an image depends heavily on the percentage of skin pixels and the variety of face sizes in the image. The training time of the NN is very low compared to other methods: it is about 3 minutes for the NN, while others take several hours. In the proposed method, the low training time results from the use of low-dimensional feature vectors. The detection rate is shown in Table II. Before applying the eyes/lips templates, the detection rate is about 93% and the FP rate is about 10%. After the second stage the detection rate does not change, but the FP rate decreases to 6%. The reason for this decrease in FP rate is the removal of those blocks whose edges are similar to those of a face.

Table II. Average face detection accuracy

Detection rate   False positives (FP) rate
93%              6%

Experiments show that the proposed algorithm is effective and efficient in detecting frontal faces of different races and under different lighting conditions. Because the true positive (TP) and false positive (FP) rates of a detector are directly related, the thresholds were chosen to balance a high TP rate against a low FP rate and so achieve the best results.

6. Conclusion

In this paper, a face detection algorithm for color images is proposed. The contribution of this paper is the combination of skin-color filtering, face/non-face classification based on features extracted from the contourlet decomposition of an image, and template matching. In its first stage, our method detects skin regions over the entire image. The skin detection method is robust to variations of skin color. This property improves both the speed and the accuracy of the face detection method. Skin blocks are passed to an MLP classifier that is built on features extracted from the multiresolution decomposition. This stage verifies the face candidate skin blocks. We then employ template matching to decrease the false positive rate. The results show that the proposed method detects almost all frontal faces in color images with a low false positive rate. In future work we plan to address multi-view face detection.

Figure 8. (a) Extracted skin regions, (b) Face detection result

References

[1] H. Sajedi, M. Najafi, S. Kasaei, "A Boosted Skin Detection Algorithm based on color and Texture Information", 5th International Symposium on Image and Signal Processing and Analysis, Turkey, 2007.
[2] M. Yang, D. Kriegman, N. Ahuja, "Detecting Faces in Images: A Survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 1, 2002.
[3] P. Viola, M. Jones, "Rapid object detection using a boosted cascade of simple features", IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, pp. 511-518, 2001.
[4] L.-L. Huang, A. Shimizu, "A multi-expert approach for robust face detection", Pattern Recognition 39 (9), pp. 1695-1703, 2006.
[5] P. Shih, C. Liu, "Face detection using discriminating feature analysis and support vector machine", Pattern Recognition 39 (2), pp. 260-276, 2006.
[6] A. Samal and P.A. Iyengar, "Human Face Detection Using Silhouettes", Int'l J. Pattern Recognition and Artificial Intelligence, vol. 9, no. 6, pp. 845-867, 1995.
[7] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. Lang, "Phoneme Recognition Using Time-Delay Neural Networks", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 328-339, May 1989.
[8] E. Osuna, R. Freund, and F. Girosi, "Training Support Vector Machines: An Application to Face Detection", Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 130-136, 1997.
[9] L. Nanni, A. Lumini, "A multi-expert approach for wavelet-based face detection", Elsevier, Pattern Recognition Letters, 2007.
[10] Z. Jin, Z. Lou, J. Yang, Q. Sun, "Face detection using template matching and skin-color information", Elsevier, Neurocomputing 70, pp. 794-800, 2007.
[11] Yue Lu and Minh N. Do, "A Directional Extension for Multidimensional Wavelet Transforms", IP EDICS: 2-WAVP (Wavelets and Multiresolution Processing), April 2005.
[12] Zhou, A. L. da Cunha, and M. N. Do, "Nonsubsampled contourlet transform: construction and application in enhancement", Proc. of IEEE International Conference on Image Processing, Genoa, Italy, Sep. 2005.
[13] R. G. Baraniuk, N. Kingsbury, and I. W. Selesnick, "The dual-tree complex wavelet transform - a coherent framework for multiscale signal and image processing", IEEE SP Mag., 2005.
[14] P. Sharma, R. Reilly, "A colour face image database for benchmarking of automatic face detection algorithms", 4th EURASIP Conference, pp. 423-428, Zagreb, 2003.
[15] Bao Face Database, Available: http://www.facedetection.com/facedetection/datasets.htm
[16] Kyung-Min Cho, Jeong-Hun Jang and Ki-Sang Hong, "Adaptive skin-color filter", Pattern Recognition, Vol. 34, Issue 5, pp. 1067-1073, 2001.
