Component-based robust face detection using AdaBoost and decision tree

Kiyoto Ichikawa Tokyo Institute of Technology 4259-G2-4, Nagatsuta, Midori-ku, Yokohama, Japan [email protected]

Takeshi Mita Toshiba Corporation 1, Komukai-Toshiba-cho Saiwai-ku Kawasaki, Japan [email protected]

Abstract We present a robust frontal face detection method that identifies face positions in images by combining the results of a low-resolution whole-face classifier and individual face-part classifiers. Our approach is to use face-part information and to change the identification strategy based on the results from the individual face-part classifiers. These classifiers were implemented based on AdaBoost. Moreover, we propose a novel method based on a decision tree to improve the performance of face detectors on occluded faces. The proposed decision tree method distinguishes partially occluded faces based on the results from the individual classifiers. Preliminary experiments on a test sample set containing non-occluded and occluded faces indicated that our method achieved better results than conventional methods. Experiments on general images also showed better results.

Osamu Hori Toshiba Corporation 1, Komukai-Toshiba-cho Saiwai-ku Kawasaki, Japan [email protected]

1. Introduction

Facial features are useful for systems that require face detection, including personal identification [12, 11], surveillance and human-machine interfaces [10]. Many frontal face detection methods based on learning algorithms, such as eigenfaces [9], neural networks [1] and the kernel support vector machine (kernel-SVM) [4], have been studied so far. Despite their good results, it is difficult for conventional methods to detect partially occluded faces. Most conventional methods use only whole-face information, so they cannot deal with occluded faces or different illumination conditions on faces. One approach to detecting occluded faces is to use information about face parts as local features. Our goal is to detect such faces by utilising face parts, such as the eyes, nose and lips, as well as the whole face. Our method consists of a low-resolution whole-face classifier and individual face-part classifiers using AdaBoost [8]. The classifiers scan over an input image, and the output values from the individual classifiers are then combined based on linear discriminant analysis (LDA) [7]. Moreover, we propose a novel method using a decision tree to detect occluded faces.

2. Face Detection using face parts

2.1. Related works

False negatives occur for partially occluded or slightly rotated faces, as shown in Figure 1 (a), (b) and (c). Moreover, false positives (detecting a non-face region as a face) occur when the background resembles a face image. However, face parts are less sensitive to different illumination conditions than the whole face, because they are local features. In addition, even when some parts of a face are occluded, the parts that are not hidden can still be used as a source of information for face detection. Therefore, we would like to take advantage of face parts for face detection.

Figure 1. Occluded and rotated face images.

Heisele et al. [5] have proposed using face parts in their face detection system. They showed that it achieved better performance when using face parts rather than only the whole face. In their system, they used a kernel-SVM as the basic classifier and LDA to combine the results from the individual face-part classifiers. However, it was not designed to deal directly with partially occluded faces. AdaBoost provides slightly better classification performance and a much faster classifier than the kernel-SVM for face detection (the experimental results are shown in 3.1). We therefore use AdaBoost, combining the output values from the whole-face and individual face-part classifiers, in our proposed method.

Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (FGR’06) 0-7695-2503-2/06 $20.00 © 2006 IEEE

Figure 2. Component-based face detection (classifiers for the whole face, left eye, right eye, nose, lip, and left and right mouth corners, whose combined output yields a face/non-face decision).

2.2. Component-based face detection framework

Figure 2 shows our face detection framework. Classifiers are trained in advance for the whole face and individual face-part images using AdaBoost, in the same manner as Viola and Jones [2]. Our system uses the left eye, right eye, nose, lips, left mouth corner and right mouth corner as individual face parts. First, each classifier scans over an input image and returns its result as a confidence measure. The system then determines whether each object is a face or a non-face by combining the output values from the individual classifiers.

Four types of Haar-like features [2], the so-called two-rectangle and three-rectangle features shown in Figure 3, were used in our face detector in the same manner as Viola and Jones. The value of a two-rectangle feature is the difference between the sums of the pixels within two rectangular regions. The regions have the same size and shape and are horizontally or vertically adjacent. A three-rectangle feature computes the sum within two outside rectangles subtracted from the sum in a centre rectangle. These features measure the difference in intensity between the rectangular regions. For example, the feature shown in Figure 3 (a) measures the difference in intensity between a region of the left eye and a region across the upper cheeks. The feature shown in Figure 3 (b) compares the pupil with a region of the white of the eye.
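As in Viola and Jones, each rectangle sum can be evaluated in constant time with an integral image, so any two- or three-rectangle feature costs a handful of lookups. The following is a minimal sketch of that mechanism; the function names and coordinate conventions are ours, not the paper's.

```python
# Sketch of Haar-like rectangle features over an integral image.
# Illustrative only: names and conventions are ours, not from the paper.

def integral_image(img):
    """Cumulative 2-D sums, so any rectangle sum costs four lookups."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w-by-h rectangle with top-left corner (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """Difference between two horizontally adjacent w-by-h rectangles."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)

def three_rect_feature(ii, x, y, w, h):
    """Centre rectangle minus the two outside rectangles."""
    return (rect_sum(ii, x + w, y, w, h)
            - rect_sum(ii, x, y, w, h)
            - rect_sum(ii, x + 2 * w, y, w, h))
```

The integral image is built once per scanned window scale, after which every candidate feature is evaluated without touching the raw pixels again.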

2.3. Combining results from individual classifiers

The individual classifiers return values as confidence measures. To combine the output values from the individual classifiers, the linear discriminant analysis (LDA) method is used. Equation (1) gives how to calculate the total score based on LDA [7]. This type of analysis is a classification method that projects high-dimensional data onto a line and performs classification in a one-dimensional space when dealing with two classes. The projection maximises the distance between the means of the two classes while minimising the variance within each class.

H(x) = \begin{cases} 1 & \text{if } \sum_{i=0}^{N} e_i h_i(x) \geq T \\ -1 & \text{otherwise} \end{cases} \quad (1)

where N is the number of classifiers, h_i(x) is the classifier for the whole face or a face part, e_i is the i-th element of the first eigenvector calculated from Fisher's criterion (Equation 2), and T is the threshold.

J_S(A) = \frac{A^t S_b A}{A^t S_w A} \quad (2)

where A is an m × n matrix with m ≤ n, S_w is the within-class scatter matrix and S_b is the between-class scatter matrix. The first row of A is the first eigenvector, which corresponds to the largest eigenvalue in Equation (2).
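For the two-class case, the direction maximising Fisher's criterion can be obtained in closed form as e = S_w^{-1}(m_face − m_nonface). The sketch below illustrates this combination step for N = 2 classifier outputs; the toy data and function names are ours, and a real implementation would use more classifiers and a general matrix solve.

```python
# Sketch of the LDA combination step (Equations 1 and 2): classifier
# outputs h_i(x) are projected onto Fisher's direction and thresholded.
# Toy 2-D example; names and data are ours, for illustration only.

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def within_class_scatter(class_a, class_b):
    """S_w accumulated over both classes (2-D case for brevity)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for samples in (class_a, class_b):
        m = mean(samples)
        for v in samples:
            d = [v[0] - m[0], v[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    """e = S_w^{-1} (m_a - m_b): the direction maximising Equation (2)."""
    sw = within_class_scatter(class_a, class_b)
    ma, mb = mean(class_a), mean(class_b)
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def combined_decision(e, scores, threshold):
    """H(x) from Equation (1): sign of the projected total score."""
    total = sum(ei * hi for ei, hi in zip(e, scores))
    return 1 if total >= threshold else -1
```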

3. Preliminary experiments on test sample sets

3.1. Results on a test set containing non-occluded faces

The performance of the proposed method described in 2.3 is evaluated in preliminary experiments on test sample sets. The frontal face image sample set consisted of 14,520 hand-labelled images that were clipped and aligned to 19 by 19 pixels to obtain whole faces and individual face parts. The face images were collected from captured TV program videos; some examples are shown in Figure 4. The non-face image sample set consisted of 700,000 images extracted from pictures containing no faces.

Before evaluating our proposed method, we compare the performance of AdaBoost (the method of Viola et al.) with the kernel-SVM that Heisele et al. used in [5]. The kernel-SVMs were trained with a 2nd-degree polynomial kernel using the same training samples as the Viola and Jones method. The number of weak learners was 500 for the Viola and Jones method, and the numbers of support vectors of the kernel-SVM classifiers for the whole face, left eye, right eye, nose, lip, left mouth corner and right mouth corner were 1098, 1365, 1244, 1144, 1105, 1718 and 1489 respectively.

Figure 3. Two-rectangle and three-rectangle features.

Figure 4. Non-occluded face sample set.

The performance is shown as receiver operating characteristic (ROC) curves. An ROC curve depicts the false positive rate on the x-axis and the correct detection rate on the y-axis; the closer an ROC curve approaches the top left of the graph, the higher the performance. To create the ROC curves, the classification threshold is varied over the range from +∞ to −∞. For face detection, the method based on AdaBoost (the method of Viola et al.) performs better than the SVM with a polynomial kernel, as shown in Figure 5. Furthermore, the method of Viola et al. is approximately 27 times faster than the kernel-SVM. It is therefore natural that the combined classifiers for individual face parts using AdaBoost perform better than those using the kernel-SVM.

Figure 5. ROC curves on a test set containing 14,520 non-occluded faces and 700,000 non-faces.

Figure 7. ROC curves on a test set containing 14,520 eyes-occluded face images and 700,000 non-face images, and on a test set containing 14,520 non-occluded faces and 700,000 non-face images.
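The threshold sweep behind these ROC curves can be sketched in a few lines: for each threshold setting, count how many face samples score at or above it (correct detection rate) and how many non-face samples do (false positive rate). The function name and toy scores below are ours, for illustration.

```python
# Sketch of ROC-curve construction: sweep the decision threshold from
# high to low and record (false positive rate, detection rate) pairs.
# Illustrative only; the real evaluation uses 14,520 faces and
# 700,000 non-faces.

def roc_points(face_scores, nonface_scores):
    """One (fpr, tpr) point per distinct threshold, high to low."""
    thresholds = sorted(set(face_scores + nonface_scores), reverse=True)
    points = []
    for t in thresholds:
        tpr = sum(s >= t for s in face_scores) / len(face_scores)
        fpr = sum(s >= t for s in nonface_scores) / len(nonface_scores)
        points.append((fpr, tpr))
    return points
```

Because the threshold only decreases, both rates grow monotonically, which is why each curve runs from the bottom-left to the top-right of the plot.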

3.2. Results on a test set containing partially occluded faces

Figure 8. ROC curves on a test set containing 14,520 mouth-occluded face images and 700,000 non-face images, and on a test set containing 14,520 non-occluded faces and 700,000 non-face images.

Figures 6 (a) and (b) show examples of partially occluded face images of the kind often observed by a surveillance camera: sunglasses and masks are frequently worn, and suspicious people sometimes try to conceal their faces, so surveillance systems require accurate identification even in these cases. Figures 7 and 8 show the ROC curves for the Viola and Jones method and for the LDA method. In Figure 7, the LDA method was significantly better than the Viola and Jones method on both sample sets; however, both methods performed poorly on the test sample set containing eyes-occluded faces. In Figure 8, although the deterioration was less than for the eyes-occluded faces, the same trend was observed. Since the projection weights of the eye classifiers in the LDA method are larger than those of the mouth (lip and corner) classifiers, the deterioration under mouth occlusion is smaller.

4. Decision tree to detect partially occluded faces

Figure 6. Partially occluded face samples: (a) eyes occluded; (b) mouth occluded.

This section presents a novel face detection method that uses a decision tree to detect partially occluded faces in images and significantly improves on the LDA method.
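The paper's exact tree structure is not reproduced here; the sketch below is our own illustration of the idea: branch first on the combined LDA score, and only when that is inconclusive fall back on subsets of the part classifiers, so that a face whose eyes (or mouth) are occluded can still be accepted via its visible parts. All thresholds, keys and branch choices are hypothetical.

```python
# Hypothetical sketch of a decision tree over classifier outputs.
# The branch structure and thresholds are ours, for illustration only;
# they are not the tree learned in the paper.

def detect(scores, lda_score, lda_thr=0.0, part_thr=0.5):
    """scores: dict of confidence values from the part classifiers."""
    # Root split: the combined LDA score handles non-occluded faces.
    if lda_score >= lda_thr:
        return "face"
    eyes_visible = min(scores["left_eye"], scores["right_eye"]) >= part_thr
    mouth_visible = min(scores["lip"], scores["left_corner"],
                        scores["right_corner"]) >= part_thr
    # Eyes occluded but mouth clearly present (e.g. sunglasses).
    if mouth_visible and scores["nose"] >= part_thr:
        return "face (eyes occluded)"
    # Mouth occluded but eyes clearly present (e.g. a mask).
    if eyes_visible and scores["nose"] >= part_thr:
        return "face (mouth occluded)"
    return "non-face"
```

The benefit of this shape over a single LDA projection is that an occluded part cannot drag the total score below threshold: once a branch decides which parts are visible, only those parts vote.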

