Robust Frontal Face Detection in Complex Environment

Quan YUAN[1], Wen GAO[2], Hongxun YAO[1]

[1] Department of Computer Science and Engineering, Harbin Institute of Science
[2] Institute of Computing Technology, China Academy of Science

Abstract

We have constructed a simple and fast system to detect frontal human faces in complex environments. Our work makes two main contributions: 1) We use a fast image segmentation method based on connected components labelling to select candidate face areas. 2) We propose a positive-negative attractor template to examine face areas. A valley detector is used to search for the valley-like points of eyes and mouths. We test the system on images taken in complex environments and containing confusing objects. The experiment shows robust detection with few falsely detected faces.

1. Introduction

Human face detection has important uses in many fields, such as face recognition, object-oriented image coding and intelligent human-computer interaction. A variety of face detection methods have been reported, but several difficulties have not been satisfactorily resolved: varying illumination, confusing objects and varied face orientations remain challenges for face detection applications.

Generally, two groups of methods are applied in the face detection field, namely bottom-up methods and top-down methods[1]. The bottom-up methods first detect features on the face such as eyes and mouths, and then examine the features according to their relative positions or some statistical models, for example by checking local maxima[6] or the triangle relationship of features[3]. The top-down methods take the whole face as a detection unit, such as the hierarchical knowledge-based method[7] and neural networks[2][4]. More and more recent research combines the two approaches to get robust results.

This paper presents a fast segmentation method that combines neighboring pixels with similar hue. It then detects valley features and examines them according to a positive-negative template. These methods are integrated into one system.

2. Image segmentation based on neighboring similarity

Color information is very effective in image segmentation, and many previous studies examine the probable hue of human skin in images. But illumination sometimes affects the hue of a face and makes it difficult to set a fixed range. We notice that neighboring pixels on the surface of an object tend to have similar hues: although the illumination makes them deviate from the original color, the deviation tends to be the same. So we propose a segmentation method for color images based on the similarity of neighboring pixels.

2.1. Grouping of similar neighboring pixels

We take the neighbors of a pixel to be its 4-neighbors: the pixels directly above, below, to the left and to the right of it.

Figure 1. 4-neighbor pixels

We convert the colors from RGB space to YUV space to examine hue similarity. The UV plane carries the hue information of a pixel, and the Y coordinate represents its luminance. So we use the UV coordinates to measure the hue similarity of two pixels. The similarity of two pixels is given by the normalized inner product of the two points $X_1(u_1, v_1)$ and $X_2(u_2, v_2)$ on the UV plane. S stands for the similarity function:

1051-4651/02 $17.00 (c) 2002 IEEE

$$S = \frac{u_1 u_2 + v_1 v_2}{\sqrt{u_1^2 + v_1^2}\,\sqrt{u_2^2 + v_2^2}}$$

This normalized inner product is the cosine of the angle $\theta$ between the two vectors $X_1$ and $X_2$. To make the segmentation agree better with human perception, we add the Y coordinate as a criterion: if the difference between $y_1$ and $y_2$ is larger than a threshold $T_y$, the two pixels are not regarded as similar, no matter how large S is.

To check the similarity of every pair of neighboring pixels, we design an algorithm based on connected component labelling that compares all the pixels in O(N) time, where N is the number of pixels in the image. It scans from top to bottom and from left to right. At each pixel it examines the similarity of this pixel to only two other pixels, the one above it and the one to its left, as shown in Figure 2.

Figure 2. The algorithm to examine the similarity between pixels

At each pixel P, the two pixels Q and R that it is compared with have already been processed. So the comparison has 4 possible results:

A) P is similar to neither Q nor R: we start a new group and add P to it.
B) P is similar to Q (R) but not to R (Q): we add P to the group of Q (R).
C) P is similar to both Q and R, and Q and R belong to the same group: we add P to the group of Q.
D) P is similar to both Q and R, but Q and R belong to different groups: we combine these two groups and add P to the combined group.

When all the pixels have been scanned, the image is divided into several groups. But we still have the problem of setting a proper threshold to decide whether two pixels are similar. If the threshold is too high, it generates a lot of small regions that look like noise. If the threshold is too low, we are likely to combine different regions by mistake, and they can hardly be separated in later steps. So we add a second step that combines similar neighboring groups based on the average hue of each group. The similarity threshold for pixels in the first scan is higher than the threshold for groups in the second scan. In this way we avoid many wrong groupings.

2.2. Candidate area selecting

Previous research[2] gave the range of face color on the UV plane: the vector should lie between the angles 105° and 150°. We use a wider range of 85° to 220° to make the system robust to more kinds of illumination.
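The hue-angle test on the UV plane can be sketched in a few lines. This is a minimal sketch rather than the authors' implementation: it assumes that U and V are centered on zero and that the angle is measured counter-clockwise from the positive U axis, since the text does not state the axis convention. The function names `hue_angle_deg` and `in_skin_hue_range` are hypothetical; the default bounds are the widened range of 85° to 220° quoted in the text.

```python
import math

def hue_angle_deg(u, v):
    """Angle of the chrominance vector (u, v) in degrees, in [0, 360).

    Assumption: measured counter-clockwise from the positive U axis.
    """
    return math.degrees(math.atan2(v, u)) % 360.0

def in_skin_hue_range(u, v, lo=85.0, hi=220.0):
    """True if the hue angle lies inside the widened skin range."""
    return lo <= hue_angle_deg(u, v) <= hi
```

In use, (u, v) would be the average chrominance of a whole group rather than of a single pixel, so that one test decides whether the group proceeds.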

Figure 3. The grouping result (different colors indicate different groups)

The average hue of each group is checked against this criterion, and only the groups that fall within the range enter the next step. Groups that occupy less than 1/80 of the image area are discarded. We set this criterion on the assumption that the face area is not very small in an image.

We check compactness and symmetry approximately to decide whether a segment can possibly be a face area. We take the smallest rectangle that covers all the pixels of the group. Then we examine the left and right borders of the rectangle to see whether the number of pixels of this group on a border is larger than 1/2 of the length of that border line. If not, we move the border inward. The top and bottom borders are handled in the same way. Finally we check the area of the trimmed segment: if it is less than 1/80 of the image, the segment is discarded.

We divide the rectangle into four parts with a vertical line and a horizontal line through the center of the segment. Then we check whether the number of pixels of this group is approximately the same in each part. If the numbers differ too much between any two parts, the segment is discarded.
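The shape checks of this section can be sketched on a binary mask of one group. The sketch makes several assumptions that the text leaves open: borders are trimmed repeatedly until all four pass the 1/2 test, the area test uses the area of the trimmed rectangle, the four parts are split at the center of the rectangle, and "differ too much" is modelled by a hypothetical ratio `max_ratio`. All function names are illustrative.

```python
def trim_box(mask):
    """Shrink the bounding box of a 0/1 mask (list of rows) until group
    pixels cover more than half of every border line.
    Returns inclusive (top, bottom, left, right), or None if nothing is left."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]

    def col_fill(c):
        return sum(mask[r][c] for r in range(top, bottom + 1))

    def row_fill(r):
        return sum(mask[r][c] for c in range(left, right + 1))

    changed = True
    while changed and top <= bottom and left <= right:
        changed = False
        height, width = bottom - top + 1, right - left + 1
        if 2 * col_fill(left) <= height:
            left, changed = left + 1, True
        elif 2 * col_fill(right) <= height:
            right, changed = right - 1, True
        elif 2 * row_fill(top) <= width:
            top, changed = top + 1, True
        elif 2 * row_fill(bottom) <= width:
            bottom, changed = bottom - 1, True
    if top > bottom or left > right:
        return None
    return top, bottom, left, right

def quadrants_balanced(mask, box, max_ratio=2.0):
    """True if the four parts of the box hold similar numbers of group pixels."""
    top, bottom, left, right = box
    mid_r, mid_c = (top + bottom + 1) // 2, (left + right + 1) // 2
    counts = [sum(mask[r][c] for r in range(r0, r1) for c in range(c0, c1))
              for r0, r1 in ((top, mid_r), (mid_r, bottom + 1))
              for c0, c1 in ((left, mid_c), (mid_c, right + 1))]
    return min(counts) > 0 and max(counts) <= max_ratio * min(counts)

def is_candidate(mask, min_fraction=1 / 80, max_ratio=2.0):
    """Apply the trimming, area and symmetry checks to one group mask."""
    box = trim_box(mask)
    if box is None:
        return False
    top, bottom, left, right = box
    box_area = (bottom - top + 1) * (right - left + 1)
    if box_area < min_fraction * len(mask) * len(mask[0]):
        return False
    return quadrants_balanced(mask, box, max_ratio)
```

A solid square with one stray pixel far away is trimmed back to the square and accepted, while a box whose fourth quadrant is empty fails the balance test.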


Figure 4. Checking shape and compactness of a segment

The criteria we set for a candidate area are not very strict, but most areas that cannot be faces are discarded by the selection process. This leaves only a light burden for the later steps that verify a face area.
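To recap the segmentation stage, the single raster scan of Section 2.1 with its four cases A) to D) can be sketched with a union-find structure. This is an illustrative sketch under stated assumptions, not the authors' code: pixels are `(y, u, v)` triples with U and V centered on zero, the thresholds `s_min` and `t_y` are hypothetical values (the paper gives none), two pixels with zero chrominance are treated as similar to each other only, and the second pass that merges neighboring groups by average hue is omitted. With path compression the scan runs in near-linear time in practice.

```python
import math

def similar(p, q, s_min=0.98, t_y=40):
    """p, q are (y, u, v) triples; hue cosine S plus the luminance veto."""
    (y1, u1, v1), (y2, u2, v2) = p, q
    if abs(y1 - y2) > t_y:                  # luminance veto, whatever S is
        return False
    n1, n2 = math.hypot(u1, v1), math.hypot(u2, v2)
    if n1 == 0 or n2 == 0:                  # assumption: gray matches only gray
        return n1 == n2
    return (u1 * u2 + v1 * v2) / (n1 * n2) >= s_min

def group_pixels(img, s_min=0.98, t_y=40):
    """Return a grid of labels; equal labels mean the same group."""
    h, w = len(img), len(img[0])
    parent = []                             # union-find forest over labels

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    labels = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            p = img[r][c]
            q = r > 0 and similar(p, img[r - 1][c], s_min, t_y)  # pixel above
            s = c > 0 and similar(p, img[r][c - 1], s_min, t_y)  # pixel to the left
            if not q and not s:             # case A: start a new group
                parent.append(len(parent))
                labels[r][c] = len(parent) - 1
            elif q and s:                   # cases C and D
                a, b = find(labels[r - 1][c]), find(labels[r][c - 1])
                parent[b] = a               # no-op in case C, merge in case D
                labels[r][c] = a
            else:                           # case B: join the one similar neighbor
                labels[r][c] = find(labels[r - 1][c] if q else labels[r][c - 1])
    return [[find(x) for x in row] for row in labels]
```

The merge in case D is what lets a U-shaped region, first seen as two separate groups, end up with a single label once the scan reaches the pixel that connects them.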

3. Feature detection and verification

The features on a face, such as the eyes and mouth, provide important information for detection. Many previous applications explored detectors for these features and the relationships among them. We propose a detector that searches for valley-like features in the image, and then use a positive-negative attractor template to check how likely they are to be genuine features of a face.

3.1. Valley-like features detection

We notice that in gray-level face images the eyes and mouth are areas with a much lower gray level than the area around them. So a valley-like feature detector is designed to check the difference between the gray level of the surrounding pixels and the gray level of the core pixels. We find that the performance improves if we use the minimum instead of the average gray level of the surrounding pixels. The time complexity of this detector is O(N log M), where M is the number of surrounding pixels in the detector and N is the total number of pixels in the image. We convert the original image into lower-resolution mosaic images and then apply the detector.

Figure 5. Valley-like detector

As the size of a face area in an image varies, we create two kinds of mosaic images with block sizes of 2 × 2 and 4 × 4. The performance is shown in Figure 6.

Figure 6. Valley-like feature detection (2 × 2 block mosaic image)

3.2. Face verification with positive-negative attractor template

In the feature detection image we can see that the features are distributed regularly in the face area. The eyes, the mouth and the bottom edge of the nose include most of the feature points, while the chin, forehead and nose have a very low density of valley-like feature points. We build a square template that has positive attractors and negative attractors inside.
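The valley-like detector of Section 3.1 can be sketched on a mosaic image as follows. The exact geometry of the detector in Figure 5 is not recoverable from the text, so this sketch assumes a single mosaic cell as the core and a square ring of cells at distance `radius` as the surrounding pixels. `radius`, `depth` and the function names are hypothetical. The sketch recomputes the ring minimum for every cell and does not attempt the O(N log M) bound stated in the text.

```python
def mosaic(gray, block):
    """Average gray levels over block x block cells.
    Assumption: partial cells at the right and bottom edges are dropped."""
    h, w = len(gray) // block, len(gray[0]) // block
    return [[sum(gray[r * block + i][c * block + j]
                 for i in range(block) for j in range(block)) / (block * block)
             for c in range(w)]
            for r in range(h)]

def valley_points(img, radius=2, depth=30):
    """Cells that are darker than the minimum of their surrounding ring
    by more than depth gray levels."""
    h, w = len(img), len(img[0])
    points = []
    for r in range(radius, h - radius):
        for c in range(radius, w - radius):
            ring = [img[r + dr][c + dc]
                    for dr in range(-radius, radius + 1)
                    for dc in range(-radius, radius + 1)
                    if max(abs(dr), abs(dc)) == radius]
            # Using the minimum means even the darkest surrounding cell
            # must be clearly brighter than the core.
            if min(ring) - img[r][c] > depth:
                points.append((r, c))
    return points
```

Running the detector once on `mosaic(gray, 2)` and once on `mosaic(gray, 4)` would correspond to the two block sizes used in the text.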

Figure 7. Positive-negative attractor template

A + indicates a positive attractor and a - indicates a negative attractor. When we lay the template over a candidate face area, each feature point is captured by the attractor nearest to it. The template is set to approximately the same area as the candidate face segment and is allowed to move within a small range to get the best matching result. We set two criteria to decide whether a segment is a face after the capturing process:

A) Each of the three groups of positive attractors (two eyes and the mouth) captures more than $T_1$ feature points. $T_1$ is set as a fraction of the number of pixels in the segment area.
B) The total number of feature points captured by positive attractors is more than $T_2$ times the number captured by negative attractors.

The candidate areas meeting these two requirements are marked as face areas. The template is built from nearly 50 frontal face pictures. As mentioned above, we use mosaic images of different block sizes; the template matching process verifies a face if either of them meets the requirements.
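The capturing step and criteria A) and B) can be sketched as a nearest-attractor assignment. The attractor layout below is entirely hypothetical: the paper's template was built from about 50 frontal faces and its coordinates are not given. The sketch also uses one attractor per positive group and omits the small-range shifting of the template. `t1` is passed as an absolute count and `t2` as a ratio.

```python
# Hypothetical layout in unit-square coordinates (x, y); not the paper's template.
POSITIVE = [(0.30, 0.35, "left_eye"), (0.70, 0.35, "right_eye"), (0.50, 0.78, "mouth")]
NEGATIVE = [(0.50, 0.12), (0.50, 0.50), (0.15, 0.70), (0.85, 0.70), (0.50, 0.95)]

def capture(points, box):
    """Count the feature points captured by each attractor group.

    points are (row, col) pairs; box is (top, left, height, width) of the
    candidate area. Negative attractors share the group key None.
    """
    top, left, height, width = box
    attractors = POSITIVE + [(x, y, None) for x, y in NEGATIVE]
    counts = {"left_eye": 0, "right_eye": 0, "mouth": 0, None: 0}
    for r, c in points:
        x, y = (c - left) / width, (r - top) / height
        nearest = min(attractors,
                      key=lambda a: (x - a[0]) ** 2 + (y - a[1]) ** 2)
        counts[nearest[2]] += 1
    return counts

def is_face(points, box, t1, t2):
    """Criterion A: every positive group captures more than t1 points.
    Criterion B: positives capture more than t2 times the negatives."""
    counts = capture(points, box)
    positives = [counts[g] for g in ("left_eye", "right_eye", "mouth")]
    criterion_a = all(n > t1 for n in positives)
    criterion_b = sum(positives) > t2 * counts[None]
    return criterion_a and criterion_b
```

Criterion B is what rejects skin-colored regions with valley points scattered everywhere: they fill the negative attractors as readily as the positive ones.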

4. Experiments

We tested our system on two kinds of images: static images and real-time video frames, taken under different illumination conditions. The size of each image is 320 × 240. The machine is a PIII 600 with 256 MB of RAM. In the real-time test the processing speed is about 350 ms per frame without any special hardware or MMX longword instructions. Most of the time is consumed in the segmentation process. But since that process has O(N) time complexity, it should speed up readily as machines become faster. Although the template we used is mainly for upright frontal faces, the experiments show that it also detects faces with slightly tilted orientation.

Figure 8. Some detection results

Compared with other applications of face detection, our system implements a simple and fast method. Most of the test images show multi-object environments. About 1/4 of the images have confusing objects that have skin-like color and oval or square shape. False detections mainly result from the occlusion of important features on the faces or from severely unbalanced illumination on a face, which causes false segmentation of the face area.

Table 1. Testing results

Number of faces   Correctly detected   Missed   False detected
137               120                  17       2

5. Conclusions

In this paper we propose a novel method to detect human faces in complex environments. To reduce the effect of varying illumination we group neighboring pixels based on hue similarity. In the candidate face areas we detect valley-like features that can possibly be eyes and mouths. A positive-negative attractor template proves efficient in checking face areas. The system shows good real-time performance in complex environments, at about 350 ms per frame in our tests.

References

[1] Ming-Hsuan Yang, D. J. Kriegman, N. Ahuja. Detecting faces in images: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 1, Jan. 2002, pp. 34-58.
[2] Hongming Zhang, Debin Zhao, Wen Gao, Xilin Chen. Combining Skin Model and Neural Network for Rotation Invariant Face Detection. Proceedings of the 3rd International Conference on Multimodal Interfaces, pp. 237-244, Oct. 14-16, 2000.
[3] Chiunhsiun Lin, Kuo-Chin Fan. Human face detection using geometric triangle relationship. Proceedings of the 15th International Conference on Pattern Recognition, Vol. 2, pp. 941-944, 2000.
[4] H. Rowley, S. Baluja, T. Kanade. Neural Network-Based Face Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1, Jan. 1998, pp. 23-38.
[5] Yu Hengyong, Wang Yong, Mou Xuanqin, Cai Yuanlong. Face detection model based on distance measure of regional feature. WCCC-ICSP 2000, 5th International Conference on Signal Processing Proceedings, Vol. 3, pp. 1479-1482, 2000.
[6] R. Hoogenboom, M. Lew. Face detection using local maxima. Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, pp. 334-339, 1996.
[7] G. Yang, T. S. Huang. Human Face Detection in Complex Background. Pattern Recognition, Vol. 27, No. 1, pp. 53-63, 1994.
