A Subspace Approach to Face Detection with Support Vector Machines

Haizhou Ai, Lihang Ying, Guangyou Xu
Dept. of Computer Science and Technology, Tsinghua University, Beijing 100084, PR China
Email: [email protected]

Abstract

We present a subspace approach to face detection with Support Vector Machines (SVMs). A linear SVM classifier is trained as a filter to produce a subspace in which a non-linear SVM classifier with a Gaussian kernel is trained for face detection. This makes training easier and results in a very efficient face detection algorithm. Experimental results demonstrate its promising performance compared with some well-known existing detectors.

1. Introduction

Face detection has been intensively researched in recent years due to its possible applications in security access control, visual surveillance, content-based information retrieval, and advanced human-computer interaction; for a survey see [1]. Many different approaches have been reported in the literature. For color pictures, skin tone can be an important cue for reducing the search space [2], but in general, features in gray-level pictures must be considered. There are mainly two kinds of approaches to face detection: heuristic approaches, which extract component candidates such as eyes, nose, and mouth as cues to deduce whether a face exists [2]; and statistical learning approaches, which build a whole-face classifier by training methods such as ANNs [3] or SVMs [4]. Statistical learning methods have recently proved very effective in many difficult practical pattern recognition problems, including face detection. However, their training usually involves a huge number of samples, is computation intensive, and makes it difficult to retrain a detector to adapt it to a particular application domain. Some approaches to this problem have recently been published, including the Constrained Generative Model (CGM) [5] for ANNs, and reduced set vectors (RSM) and the Sequential Reduced Set Machine (SRSM) for SVMs [6]. In this paper, we address the problem of cutting down the training space by a subspace method: instead of optimizing the decision function after training, we reduce the original space to a much smaller target space in which training and detection become much easier and more efficient, so that the detector can be easily adapted to particular applications by retraining on samples collected from a specific domain.

This paper is organized as follows. Section 2 introduces the face detection framework. Section 3 gives a brief description of skin-color segmentation. Section 4 reviews the general theory of SVMs and describes the training procedures. Experimental results are given in Section 5, followed by conclusions in Section 6.

2. Face Detection Framework

As shown in Figure 1, the detector is a two-stage SVM procedure: the first, linear SVM filters face candidates out of all observation windows in the input space, yielding a greatly reduced subspace, and the second, non-linear SVM then makes the final decision about whether each face candidate is really a face. Since most observation windows are rejected by the linear SVM, which has much lower computational complexity, the final detection procedure is sped up by a factor of about 20. For color images, skin-color segmentation can be applied before the two-stage SVM face detection. As commonly done in [3][4], an observation window of size 20×20 is used as the basic processing unit. To detect faces at multiple scales, an image pyramid is generated and observation windows at all possible positions and scales are checked. Each observation window, face sample, and non-face sample is first preprocessed to exclude impossible face candidates by a squared-difference (variance) check, then normalized by histogram equalization, subtraction of a lighting-adjustment plane, and finally a transformation to a distribution with a common mean and variance before being fed to the two-stage SVMs.
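A minimal sketch of the two-stage decision on a single preprocessed 20×20 window is given below. The function name, the threshold T, and the trained parameters (w, b, support vectors, sigma) are illustrative assumptions, not artifacts of the paper.

```python
import numpy as np

def cascade_classify(window_vec, w, b_lin, T, svs, alphas, labels, b_rbf, sigma):
    """Two-stage decision on one flattened, preprocessed observation window.

    Stage 1: a cheap linear SVM filter rejects most non-face windows.
    Stage 2: a Gaussian-kernel SVM decides on the surviving candidates.
    All parameters are assumed to come from training.
    """
    # Stage 1: linear filter f(x) = w.x + b, pass only if above threshold T
    if np.dot(w, window_vec) + b_lin <= T:
        return False                      # rejected cheaply, no kernel evaluations

    # Stage 2: non-linear SVM with Gaussian kernel over the support vectors
    sq_dists = np.sum((svs - window_vec) ** 2, axis=1)
    k = np.exp(-sq_dists / (2.0 * sigma ** 2))
    score = np.dot(alphas * labels, k) + b_rbf
    return score > 0.0
```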


Figure 1. Face detection framework. Training images provide face samples and initial non-face samples for linear SVM training; the resulting linear SVM classifier defines the subspace in which further non-face samples and false alarms are collected for non-linear SVM training. At detection time, input images (optionally passed through a skin-color model and skin-color segmentation) are filtered by the linear SVM classifier and then by the non-linear SVM classifier to produce the face detection result.

3. Skin Color Segmentation

For skin-color segmentation, a 256×256×16 lookup table of binary values (0 or 1) in HSV color space is built from training color face samples, where 1 corresponds to skin color and 0 to non-skin color. After skin-color classification, skin-color pixels are grouped into rectangular regions according to color uniformity and pixel connectivity, and these regions are then merged according to color and positional proximity, together with some heuristic rules on the permitted scale difference and on bounds on the area change after merging [9].
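A sketch of how such a binary lookup table might be built and applied follows. The quantization of V into 16 bins and the use of OpenCV's full-range HSV conversion are assumptions for illustration.

```python
import numpy as np
import cv2  # assumed available, used only for the BGR-to-HSV conversion

# Binary skin-color lookup table: 256 H levels x 256 S levels x 16 V levels
skin_lut = np.zeros((256, 256, 16), dtype=np.uint8)

def mark_skin_samples(skin_pixels_bgr):
    """Set LUT cells to 1 at the HSV bins observed in training skin pixels (uint8 BGR)."""
    hsv = cv2.cvtColor(skin_pixels_bgr.reshape(-1, 1, 3), cv2.COLOR_BGR2HSV_FULL)
    h, s, v = hsv[:, 0, 0], hsv[:, 0, 1], hsv[:, 0, 2]
    skin_lut[h, s, v // 16] = 1          # quantize V from 256 levels down to 16 bins

def skin_mask(image_bgr):
    """Classify every pixel of an image as skin (1) or non-skin (0) via the LUT."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV_FULL)
    return skin_lut[hsv[..., 0], hsv[..., 1], hsv[..., 2] // 16]
```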

4. Face Detection with Two-Stage SVMs

4.1 Support Vector Machines

SVM is a maximum-margin classification tool based on the Structural Risk Minimization (SRM) principle [7]. In theory it is superior to methods based on the Empirical Risk Minimization (ERM) principle, such as ANNs: SVM minimizes an upper bound on the VC dimension, whereas ERM minimizes the error on the training data. Given samples $(y_i, x_i)$, $x_i \in R^n$, $y_i \in \{-1, +1\}$, $i = 1, \ldots, l$ (where $l$ is the number of samples, $x_i$ is a sample and $y_i$ is its label), and a kernel function $K(x_i, x_j)$, training an SVM amounts to solving the quadratic programming problem

$$\alpha = \arg\min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j) - \sum_{i=1}^{l} \alpha_i \qquad (1)$$

subject to $0 \le \alpha_i \le C$, $i = 1, \ldots, l$, and $\sum_{i=1}^{l} \alpha_i y_i = 0$.

C is a parameter that trades off a wide margin against a small number of margin failures. All the $x_i$ corresponding to non-zero $\alpha_i$ are the Support Vectors (SVs). The classifier is

$$f(x) = \mathrm{sign}\Big( \sum_{x_i \in \mathrm{SVs}} \alpha_i y_i K(x_i, x) + b \Big) \qquad (2)$$

where $b = -\frac{1}{2} \sum_{x_i \in \mathrm{SVs}} \alpha_i y_i \left[ K(x_r, x_i) + K(x_s, x_i) \right]$ and $x_r$, $x_s$ are SVs of different types [7]. In the case of a linear SVM, the kernel function in equation (2) is simply

$$K(x_i, x_j) = x_i \cdot x_j \qquad (3)$$

For a non-linear SVM with a Gaussian kernel, the kernel function is

$$K(x_i, x_j) = e^{-\frac{\| x_i - x_j \|^2}{2\sigma^2}} \qquad (4)$$
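As a concrete illustration of equations (1)-(4), the sketch below trains a Gaussian-kernel SVM and evaluates the decision function of equation (2) explicitly from the resulting support vectors. The use of scikit-learn, the toy data, and the chosen sigma are assumptions, not part of the paper.

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-in for face (+1) / non-face (-1) sample vectors (not real image data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 1.0, (50, 374)), rng.normal(-1.0, 1.0, (50, 374))])
y = np.hstack([np.ones(50), -np.ones(50)])

sigma = 5.0
clf = SVC(kernel="rbf", gamma=1.0 / (2 * sigma ** 2), C=200.0).fit(X, y)

def decision(x):
    """Equation (2): f(x) = sign( sum_i alpha_i y_i K(x_i, x) + b ) over the SVs."""
    sq = np.sum((clf.support_vectors_ - x) ** 2, axis=1)
    k = np.exp(-sq / (2 * sigma ** 2))              # Gaussian kernel, equation (4)
    return np.sign(np.dot(clf.dual_coef_[0], k) + clf.intercept_[0])

# The explicit evaluation should agree with the library's own prediction
x_test = rng.normal(0.5, 1.0, 374)
assert decision(x_test) == clf.predict(x_test[None, :])[0]
```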

As can be seen from formulas (3) and (4), the speed of a non-linear SVM is directly related to the number of SVs.

4.2 Training Linear SVM for Subspace

For a linear SVM the optimal separating hyperplane has the form $f(x) = \mathrm{sign}(w \cdot x + b)$, where $w = \sum_{i=1}^{l} \alpha_i y_i x_i$ and $b = -\frac{1}{2} w \cdot [x_r + x_s]$. In this case $w$ can be interpreted as a special template, and the classifier as template matching that separates face from non-face. Instead of the above decision rule, we directly compute $f(x) = w \cdot x + b$ and choose a threshold $T$: when $f(x) > T$ the observation window passes the filter, otherwise it is rejected.
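A sketch of collapsing a trained linear SVM into the single template $w$ and using it as a threshold filter. The use of scikit-learn and the way T is picked from a target face pass rate are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

def linear_svm_template(X, y, C=200.0):
    """Train a linear SVM and collapse it to (w, b): w = sum_i alpha_i y_i x_i."""
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.dual_coef_[0] @ clf.support_vectors_   # alpha_i * y_i already folded in
    return w, clf.intercept_[0]

def pick_threshold(w, b, face_vectors, pass_rate=0.99):
    """Choose T so that the desired fraction of face windows passes the filter."""
    scores = face_vectors @ w + b
    return np.quantile(scores, 1.0 - pass_rate)

def passes_filter(window_vec, w, b, T):
    """Stage-1 test: keep the window only if w.x + b exceeds the threshold T."""
    return window_vec @ w + b > T
```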

We collected 815 faces from all 479 training images as the original face samples. Each of these faces is transformed into 24 faces via reflection, stretching by 1.1, enlarging by 1.1, and rotating 5° to the left and right. In total 19,560 face samples are produced, of which 12,434 are randomly selected as face samples for SVM training.
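The augmentation can be sketched as follows. The exact set of combinations (18 variants here rather than the 24 described above), the interpolation settings, and the use of OpenCV are assumptions, not the paper's exact recipe.

```python
import cv2  # assumed available, used only for the geometric warps

def augment_face(face_20x20):
    """Generate mirrored, slightly rotated, stretched and enlarged variants of
    one 20x20 face crop, in the spirit of the augmentation described above."""
    h, w = face_20x20.shape[:2]
    variants = []
    for base in (face_20x20, cv2.flip(face_20x20, 1)):          # original + mirror
        for angle in (-5, 0, 5):                                 # rotate 5 deg left/right
            m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
            rot = cv2.warpAffine(base, m, (w, h), flags=cv2.INTER_LINEAR)
            variants.append(rot)
            # stretch by 1.1 horizontally, and enlarge by 1.1, cropped back to 20x20
            stretched = cv2.resize(rot, (int(w * 1.1), h))[:, :w]
            enlarged = cv2.resize(rot, (int(w * 1.1), int(h * 1.1)))[:h, :w]
            variants.extend([stretched, enlarged])
    return variants  # 2 x 3 x 3 = 18 variants in this particular sketch
```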


The 12,512 non-face samples for SVM training are randomly collected from all windows of a set of 156 images that contain no faces. The linear SVM is trained by the SMO algorithm [8] with a linear kernel and C = 200. Input vectors are 374-dimensional, excluding some corner points of the 20×20 window. Figure 2 shows the image form of w, and the performance of the linear SVM is described in Figure 3 by the curve of face pass rate versus non-face filtered rate, evaluated on 5,290 face samples independent of the training samples and on all 12,282,348 non-face windows from a set of 13 images without any face.
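A sketch of how such a curve can be traced by sweeping the filter threshold T over the linear scores; the evaluation inputs are assumed to be flattened, preprocessed windows, and the sweep granularity is an assumption.

```python
import numpy as np

def pass_vs_filter_curve(w, b, face_vecs, nonface_vecs, num_points=200):
    """Sweep the filter threshold T and report, for each value, the fraction of
    faces that pass the filter and the fraction of non-faces filtered out."""
    face_scores = face_vecs @ w + b
    nonface_scores = nonface_vecs @ w + b
    thresholds = np.linspace(nonface_scores.min(), face_scores.max(), num_points)
    curve = []
    for T in thresholds:
        face_pass = np.mean(face_scores > T)             # faces surviving the filter
        nonface_filtered = np.mean(nonface_scores <= T)  # non-faces rejected
        curve.append((T, face_pass, nonface_filtered))
    return curve
```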

Figure 2. The image form of w

Figure 3. Face pass rate versus non-face filtered rate

4.3 Training Non-Linear SVM for Face Detection

We separate the whole training set into two parts: 406 basic training images and 73 extended training images. With the same method as in Section 4.2, a total of 16,944 face samples are produced from the basic training images, of which 5,125 are randomly selected as the original face samples for SVM training. Non-face samples are collected through a bootstrapping procedure in the linear-SVM-filtered subspace. After each round of bootstrapping, we also add face samples that are false negatives of that round's non-linear SVM on the extended training images. In this way, not only does training become much easier, but the number of the non-linear SVM's support vectors is also reduced. The bootstrapping procedure is as follows (a code sketch of the loop is given below):
(1) Initialize the non-face samples with false alarms that passed the linear SVM filter on a selected training set of images with or without faces.
(2) Train the SVM with the face samples and the non-face samples.
(3) Collect false alarms of this SVM in the linear-SVM-filtered subspace on the same training set, and add the newly collected false alarms to the non-face samples.
(4) Add face samples that are false negatives of the current SVM on the extended training images.
(5) Repeat (2)-(4) until enough non-face samples have been collected.
With a Gaussian kernel the non-linear SVM is trained by the SMO algorithm [8] with C = 200. In the final loop 10,649 non-face samples are used together with 7,748 face samples to train the SVM, which results in 3,712 support vectors; a few of them are shown in Figure 4.
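A high-level sketch of steps (1)-(5). The function signature, the stopping numbers, and the caller-supplied helpers (train_svm, collect_false_alarms, collect_false_negatives) are assumptions standing in for the procedure described above.

```python
def bootstrap_nonlinear_svm(face_samples, init_nonface, basic_images, extend_images,
                            train_svm, collect_false_alarms, collect_false_negatives,
                            max_nonface=12000, max_rounds=10):
    """Sketch of the bootstrapping loop: grow the non-face set with false alarms
    found in the linear-SVM-filtered subspace, and the face set with false negatives."""
    nonface_samples = list(init_nonface)                  # step (1): seed non-faces
    faces = list(face_samples)
    for _ in range(max_rounds):
        svm = train_svm(faces, nonface_samples)           # step (2): train on current sets
        # step (3): false alarms found in the linear-SVM-filtered subspace
        nonface_samples += collect_false_alarms(svm, basic_images)
        # step (4): false negatives of the current SVM on the extended images
        faces += collect_false_negatives(svm, extend_images)
        if len(nonface_samples) >= max_nonface:           # step (5): enough non-faces
            break
    return train_svm(faces, nonface_samples)
```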

Figure 4. Some support vectors (top: faces; bottom: non-faces)

In order to reduce false alarms, an arbitration scheme similar to that in [3], using two non-linear SVMs, is introduced. The non-linear SVM trained above is named G-SVM1, and another non-linear SVM (G-SVM2) is trained with the same method.

4.4 Face Positioning

In order to detect faces at different scales, each image is repeatedly subsampled with a ratio of 1.2, resulting in a pyramid of images. Each image in the pyramid is filtered first by the linear SVM and then by the non-linear SVM to produce a similarity map. Faces are then positioned by a local-maximum search as follows: initialize a face candidate list; scan each similarity map, and whenever a value exceeds a threshold, check whether the corresponding rectangle overlaps with existing ones in the candidate list; if not, put that rectangle in the list, otherwise the one with the larger similarity value replaces the smaller one. Finally, map all the candidate faces found at each scale back to the original input image and check those candidates in the same way as above to produce the final detection results.
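A sketch of the multi-scale scan and the overlap-based candidate arbitration just described. The resizing routine, the candidate tuple layout, and the crude overlap test are assumptions, not the paper's exact rules.

```python
import cv2  # assumed available, used only for resizing pyramid levels

def image_pyramid(image, ratio=1.2, min_size=20):
    """Repeatedly subsample the image by 1/ratio until a 20x20 window no longer fits."""
    levels, scale = [], 1.0
    while min(image.shape[:2]) >= min_size:
        levels.append((scale, image))
        scale *= ratio
        image = cv2.resize(image, (int(image.shape[1] / ratio),
                                   int(image.shape[0] / ratio)))
    return levels

def merge_candidates(candidates):
    """Keep, among overlapping detections, only the one with the larger similarity.
    candidates: list of (x, y, size, similarity) already mapped to the input image."""
    kept = []
    for cand in sorted(candidates, key=lambda c: -c[3]):   # strongest first
        if not any(_overlaps(cand, k) for k in kept):
            kept.append(cand)
    return kept

def _overlaps(a, b):
    """Crude overlap test between two square detections (an illustrative assumption)."""
    ax, ay, asz, _ = a
    bx, by, bsz, _ = b
    return abs(ax - bx) < (asz + bsz) / 2 and abs(ay - by) < (asz + bsz) / 2
```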

5. Experimental Results

Two test sets independent of the training set are used for performance evaluation: one is our own, consisting of 230 images of various types with 545 upright frontal faces; the other is the CMU test set [4], consisting of 130 images with 507 upright frontal faces. The search scales correspond to face sizes from 20×20 to 256×256 (at most 14 scales) for all images. Comparative results are shown in Table 1 and Table 2, which also give the results of CMU Rowley's System 5 (a single neural network) and System 11 (two neural networks with arbitration) on the same test sets. We use the average number of windows processed per second (WPS) on a PC (PIII 933 MHz CPU, 256 MB memory) as the speed criterion of detection. Some processed images are given in Figure 5.

Table 1. Detect rate and false alarms for the color image test set (230 images with 545 frontal faces)
Methods                          Detect rate   False alarms   WPS
Linear SVM + G-SVM1              96.5%         824            6,081
Linear SVM + G-SVM1 + G-SVM2     91.2%         69             5,853
CMU Rowley [3] System 5          97.8%         1,841          9,321
CMU Rowley [3] System 11         96.3%         87             5,681

Table 2. Detect rate and false alarms for the CMU test set (130 images with 507 frontal faces)
Methods                          Detect rate   False alarms
Linear SVM + G-SVM1              82.3%         322
Linear SVM + G-SVM1 + G-SVM2     75.4%         68
CMU Rowley [3] System 5          90.5%         570
CMU Rowley [3] System 11         86.2%         23

Figure 5. Some processed images from the test sets. A, B: from our test set (using skin-color segmentation + Linear SVM + G-SVM1); C, D: from the CMU test set (using Linear SVM + G-SVM1)

These results show that our algorithm achieves comparable performance with a rather small, compact training set. Many of the undetected faces in the CMU test set have heavy shadows or very low resolution and are not covered by our training set.

6. Conclusion

In this paper we propose a subspace method for face detection with two-stage SVMs, in which a linear SVM produces a subspace by filtering out most of the non-face observation windows, and a non-linear SVM trained in this subspace is then used for face detection. Owing to training in the subspace, training the non-linear SVM becomes easier. And since most observation windows are filtered out by the linear SVM, which has much lower computational complexity, the face detection procedure is greatly sped up. Comparative results on both our own test set and the well-known CMU test set demonstrate the effectiveness of this algorithm.

Acknowledgment

Dr. H. A. Rowley and Prof. T. Kanade of CMU kindly provided us with their binary code for our testing use. We express our sincere thanks to them.

References

1. M. H. Yang, N. Ahuja, D. Kriegman, "A survey on face detection methods", available at: http://vision.aiuiuc.edu/mhyang/papers/survey.ps.gz, 1999.
2. R.-L. Hsu, A.-M. Mohamed, A. K. Jain, "Face Detection in Color Images", ICIP 2001.
3. H. A. Rowley, S. Baluja, T. Kanade, "Neural network-based face detection", IEEE Trans. PAMI, 20(1):23-38, 1998.
4. E. Osuna, R. Freund, F. Girosi, "Training support vector machines: an application to face detection", in Proc. of CVPR, Puerto Rico, pp. 130-136, 1997.
5. R. Feraud, O. J. Bernier, et al., "A fast and accurate face detection based on neural network", IEEE Trans. PAMI, 23(1):42-53, Jan. 2001.
6. S. Romdhani, P. Torr, et al., "Computationally Efficient Face Detection", ICCV 2001.
7. S. R. Gunn, "Support Vector Machines for Classification and Regression", Technical Report, Image Speech and Intelligent Systems Research Group, University of Southampton, 1997.
8. J. C. Platt, "Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines", Technical Report MSR-TR-98-14, 1998.
9. H. Z. Ai, L. H. Liang, G. Y. Xu, "A General Framework for Face Detection", Lecture Notes in Computer Science, Vol. 1948, Springer-Verlag, Berlin Heidelberg New York, pp. 119-126, 2000.
