FACE DETECTION BASED ON TEMPLATE MATCHING AND SUPPORT VECTOR MACHINES

Haizhou Ai, Luhong Liang, Guangyou Xu
Dept. of Computer Science and Technology, Tsinghua University, Beijing 100084, PR China
(ahz, xgy-dcs) @mail.tsinghua.edu.cn, [email protected]

ABSTRACT

In this paper, a face detection algorithm integrating template matching and Support Vector Machines (SVM) is presented. Two types of templates, eyes-in-whole and the face itself, are used for coarse filtering, and the SVM classifier is used for classification. A bootstrap method is used to collect non-face samples for SVM training in the subspace constrained by template matching, which greatly reduces the complexity of training the SVM. Comparative experimental results demonstrate its effectiveness.

1. INTRODUCTION

Face detection has been intensively researched in recent years because of its wide applications in security access control, visual surveillance, content-based information retrieval, and advanced human-computer interaction. Many different approaches have been reported in the literature. For color pictures, skin tone can be an important cue for constraining the search space [1], but in general, algorithms for gray-level images should be considered, such as eigenface methods [2], view-based learning with clustering [3], and neural network based algorithms [4]. Recently, as a promising classification tool, SVM [5-6] has attracted much attention in the pattern recognition community. Osuna et al. [7] developed an SVM training algorithm and applied it to face detection, demonstrating its potential for face detection problems. In all these learning-based methods, one of the key problems is the training complexity, which remains a great challenge even with the bootstrap method. This is mainly due to the diversity of non-face samples compared with face samples. In this paper, we propose a subspace method that reduces the training space via template matching filtering. By training the SVM with the Sequential Minimal Optimization (SMO) algorithm proposed by Platt [8], we develop a face detection algorithm with very promising performance compared with some well-known existing detectors.

2. FACE DETECTION FRAMEWORK

As shown in Fig.1, template matching is used to select face candidates, which the SVM then classifies as face or non-face. Two templates, eyes-in-whole and the face itself, are applied one after the other in template matching. The SVM is trained via bootstrap procedures.

0-7803-6725-1/01/$10.00 ©2001 IEEE


2.1. Template Matching

An average face (Fig.2) is generated from a set of 50 mugshots. Each face is rectified by its feature points, including the pupils and the corners of the mouth, and is then transformed to a normalized scale (50x50) and a gray-level distribution with the same average (128) and standard deviation (64). The average face is subsampled to 20x20 as the fundamental face template, from which an eyes-in-whole template of size 20x8 is cut out [9].

Fig.2 Templates

Given a template T[M][N] with intensity average $\mu_T$ and standard deviation $\sigma_T$, and an image window R[M][N] with $\mu_R$ and $\sigma_R$, the correlation coefficient r(T, R) is as follows:

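$$ r(T, R) = \frac{\sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \left( T[i][j] - \mu_T \right) \left( R[i][j] - \mu_R \right)}{M \cdot N \cdot \sigma_T \cdot \sigma_R} \qquad (1) $$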

Only image windows whose correlation coefficients with both the eyes-in-whole template and the face template exceed a threshold (0.25) are sent to the SVM for further classification.
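The two-stage coarse filter can be illustrated with a minimal NumPy sketch. It assumes that a window is stored as a 20x20 array (rows by columns), that the eyes-in-whole template is an 8-row by 20-column array, and that it is compared against rows `eyes_top` to `eyes_top + 8` of the window. The function names and the `eyes_top` offset are hypothetical, since the text does not state where the eyes region sits inside the face template.

```python
import numpy as np


def correlation_coefficient(template, window):
    """Correlation coefficient r(T, R) of equation (1)."""
    t = np.asarray(template, dtype=np.float64)
    w = np.asarray(window, dtype=np.float64)
    if t.shape != w.shape:
        raise ValueError("template and window must have the same shape")
    sigma_t = t.std()  # population standard deviation, matching the M*N divisor
    sigma_w = w.std()
    if sigma_t == 0.0 or sigma_w == 0.0:
        return 0.0  # a flat patch carries no structure to correlate
    numerator = ((t - t.mean()) * (w - w.mean())).sum()
    return float(numerator / (t.size * sigma_t * sigma_w))


def passes_template_filter(window, face_template, eyes_template,
                           eyes_top=4, threshold=0.25):
    """Apply the eyes-in-whole template first, then the face template.

    eyes_top is an assumed row offset of the eyes region inside the window.
    """
    window = np.asarray(window, dtype=np.float64)
    eyes_template = np.asarray(eyes_template, dtype=np.float64)
    eyes_rows = eyes_template.shape[0]
    eyes_region = window[eyes_top:eyes_top + eyes_rows, :]
    if correlation_coefficient(eyes_template, eyes_region) <= threshold:
        return False  # rejected by the cheaper eyes-in-whole test
    return correlation_coefficient(face_template, window) > threshold
```

Testing the smaller eyes-in-whole template first means that most windows are rejected before the full 20x20 correlation is computed.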

2.2. SVM Classifier

2.2.1 Support Vector Machines

SVM is a powerful classification tool developed by Vapnik [5-6], based on the Structural Risk Minimization (SRM) principle. In theory it is superior to methods based on the Empirical Risk Minimization (ERM) principle, such as ANN: SRM minimizes an upper bound on the generalization error, expressed through the VC dimension, whereas ERM minimizes the error on the training data. Given samples $(y_i, x_i)$, $x_i \in R^n$, $y_i \in \{-1, +1\}$, $i = 1, \ldots, l$, and a kernel function $K(x_i, x_j)$, the SVM is obtained by solving the quadratic programming problem:

$$ \min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j) - \sum_{i=1}^{l} \alpha_i \quad \text{subject to} \quad \sum_{i=1}^{l} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \; i = 1, \ldots, l \qquad (2) $$

All the $x_i$ corresponding to non-zero $\alpha_i$ are the Support Vectors (SVs). The classifier is

$$ f(x) = \mathrm{sign}\Big( \sum_{x_i \in SVs} \alpha_i y_i K(x_i, x) + b \Big) \qquad (3) $$

where

$$ b = -\frac{1}{2} \sum_{x_i \in SVs} \alpha_i y_i \left[ K(x_r, x_i) + K(x_s, x_i) \right] $$

and $x_r$, $x_s$ are SVs of different types (one from each class) [7].

Recently, Platt [8] developed a fast algorithm for training SVM called SMO (Sequential Minimal Optimization), which makes it possible for PC users to work on complex applications. We implemented this algorithm for face detection.

2.2.2 Training SVM for Face Detection

We rectified and cut out 706 faces from 406 images of our own as the original face samples. Each face is transformed into 24 faces via reflection, stretching by 1.1, enlarging by 1.1, and rotation by 5° to the left and right. In total, 16944 face samples are normalized via histogram equalization, lighting rectification and gray-level normalization, of which 5125 are randomly selected as face samples for SVM training. We collect non-face samples via a bootstrapping method in the subspace constrained by template matching, as shown in Fig.1. In this way, not only does training become much easier, but the coarse filtering by template matching also yields a 20-fold speedup in the final detection procedure. The collection procedure is as follows:

(1) Initialize the non-face samples with false alarms detected by template matching alone on a selected training set of images with or without faces.
(2) Train the SVM with the face samples and the non-face samples. Collect the non-face support vectors.
(3) Collect the false alarms produced by the SVM in the template-matching-filtered subspace on the same training set. The non-face samples are then composed of three parts: (a) all the non-face support vectors from (2); (b) a random selection of the other non-face samples from (2); (c) a random selection of the newly collected false alarms.
(4) Repeat (2) and (3) until enough non-face samples have been collected.

We choose the Gaussian radial basis function as the kernel function. Input vectors have 374 dimensions, since some corner points of the 20x20 window are excluded. The SVM is trained with C=200. In the final loop, 5047 non-face samples are used together with 5125 face samples to train the SVM, which results in 2207 support vectors. A few support vectors are shown in Fig.3.

Fig.3 Some support vectors (top: faces; bottom: non-faces)
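The bootstrap collection of non-face samples described in this section can be sketched as a plain Python loop. This is only an outline under stated assumptions: `train` and `collect_false_alarms` are hypothetical callables standing in for SMO training and for scanning the template-filtered windows of the training images, and `target_size`, `keep_old`, `keep_new` and `max_rounds` are illustrative parameters, since the text does not say how many samples are drawn in parts (b) and (c) or when the set counts as large enough.

```python
import random


def bootstrap_nonfaces(faces, initial_nonfaces, train, collect_false_alarms,
                       target_size, keep_old, keep_new, max_rounds=20, seed=0):
    """Grow the non-face training set following steps (1) to (4).

    train(faces, nonfaces) must return (model, indices of the non-face
    support vectors within nonfaces). collect_false_alarms(model) must return
    the non-face windows that pass template matching and that the model
    still labels as faces.
    """
    rng = random.Random(seed)
    nonfaces = list(initial_nonfaces)                      # step (1)
    model = None
    for round_index in range(max_rounds):
        model, sv_indices = train(faces, nonfaces)         # step (2)
        if len(nonfaces) >= target_size or round_index == max_rounds - 1:
            break                                          # step (4): enough samples
        false_alarms = collect_false_alarms(model)         # step (3)
        if not false_alarms:
            break                                          # nothing new to learn from
        sv_set = set(sv_indices)
        support = [nonfaces[i] for i in sorted(sv_set)]                # part (a)
        others = [s for i, s in enumerate(nonfaces) if i not in sv_set]
        old_part = rng.sample(others, min(keep_old, len(others)))      # part (b)
        new_part = rng.sample(false_alarms,
                              min(keep_new, len(false_alarms)))        # part (c)
        nonfaces = support + old_part + new_part
    return model, nonfaces
```

The loop always ends right after a training call, so the returned model was trained on exactly the returned non-face set.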


To fuse information from multiple scales, a mapping is used to transform the SVM output into a face similarity measure in the range [0, 1].
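The SVM output referred to here is the quantity inside the sign function of the classifier (3). A minimal NumPy sketch of how it could be evaluated with a Gaussian radial basis function kernel is given below. The kernel width `gamma` and the function names are assumptions made for illustration, and the mapping of this output to [0, 1] is not reproduced in the sketch.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian RBF kernel exp(-gamma * ||a - b||^2), broadcast over rows of a."""
    diff = np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64)
    return np.exp(-gamma * np.sum(diff * diff, axis=-1))


def svm_bias(support_vectors, alpha, y, x_r, x_s, gamma):
    """b = -1/2 * sum_i alpha_i y_i [K(x_r, x_i) + K(x_s, x_i)].

    x_r and x_s are support vectors taken from the two different classes.
    """
    coef = np.asarray(alpha, dtype=np.float64) * np.asarray(y, dtype=np.float64)
    k_r = rbf_kernel(support_vectors, x_r, gamma)
    k_s = rbf_kernel(support_vectors, x_s, gamma)
    return -0.5 * float(np.sum(coef * (k_r + k_s)))


def svm_output(x, support_vectors, alpha, y, b, gamma):
    """Raw SVM output: sum_i alpha_i y_i K(x_i, x) + b."""
    coef = np.asarray(alpha, dtype=np.float64) * np.asarray(y, dtype=np.float64)
    return float(np.sum(coef * rbf_kernel(support_vectors, x, gamma)) + b)


def classify(x, support_vectors, alpha, y, b, gamma):
    """Equation (3): +1 for face, -1 for non-face."""
    return 1 if svm_output(x, support_vectors, alpha, y, b, gamma) >= 0 else -1
```

Only the support vectors enter the sum, so the cost of classifying one window grows with the number of support vectors.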

2.3. Face Positioning

To detect faces at different scales, each image is repeatedly subsampled by a ratio of 1.2, which yields a pyramid of images. Each image in the pyramid is filtered by template matching and SVM together. True faces usually give high responses (>0.5) in 2 or 3 consecutive scales, whereas non-faces do so less often. Based on this observation, we adopt a strategy similar to that proposed by Rowley [4] to position each face in the filtered pyramid as follows:

(1) Each group of three consecutive images is combined by a weighted sum. Given $I_{n-1}(x, y)$, $I_n(x, y)$ and $I_{n+1}(x, y)$, bilinear interpolation is used to bring $I_{n-1}(x, y)$ and $I_{n+1}(x, y)$ to the same dimensions as $I_n(x, y)$, which results in the interpolated images $\hat{I}_{n-1}(x, y)$ and $\hat{I}_{n+1}(x, y)$. Then the weighted sum is (Fig.4C):

And then the weighted sum is smoothed with a 3x3 weighted kernel to produce the similarity map of this scale for face positioning (Fig.4D):


(2) Faces are positioned via local maximum search as follows:
- Initialize a face candidate list.
- Scan each similarity map. Whenever a value exceeds a threshold, check whether the corresponding rectangle overlaps any rectangle already in the candidate list. If not, add the rectangle to the list; otherwise, the rectangle with the larger similarity value replaces the one with the smaller value (in Fig.4E, rectangle a overlaps b, and a is kept because its value is larger than that of b).
- Map all the candidate faces found at each scale back to the original input image and check them in the same way to produce the final detection results (Fig.4F).
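The candidate-list update in step (2) can be sketched as follows. Several details are assumptions, not taken from the text: rectangles are axis-aligned squares, two rectangles overlap when they share any area, and a new rectangle that overlaps several kept ones replaces them only if its similarity value beats all of them. The `Candidate` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    x: int        # left edge in image coordinates
    y: int        # top edge in image coordinates
    size: int     # side length of the square face rectangle
    score: float  # similarity value taken from the similarity map


def overlaps(a: Candidate, b: Candidate) -> bool:
    """True if the two squares share any area (assumed overlap criterion)."""
    return (a.x < b.x + b.size and b.x < a.x + a.size and
            a.y < b.y + b.size and b.y < a.y + a.size)


def merge_candidates(candidates: Iterable[Candidate],
                     threshold: float) -> List[Candidate]:
    """Keep, among overlapping rectangles, the one with the larger score."""
    kept: List[Candidate] = []
    for cand in candidates:
        if cand.score <= threshold:
            continue  # value does not exceed the threshold
        rivals = [k for k in kept if overlaps(k, cand)]
        if not rivals:
            kept.append(cand)
        elif all(cand.score > k.score for k in rivals):
            kept = [k for k in kept if k not in rivals]
            kept.append(cand)
    return kept
```

The same function can be applied twice: once per similarity map, and once more after the per-scale candidates have been mapped back to the original image.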


Fig.5. Detection rate against false alarm rate on our test set

Comparative results are shown in Table 1, which also gives the results of CMU Rowley's System 5 (only a single neural network is used) and System 11 (two neural networks are used for arbitration) on the same test set.

Table 1. Detection rate and false alarms for test set 1 (230 images with 545 frontal faces)

Methods                   | Faces Detected | Detect Rate | False Alarms
Template Matching SVM     | 516            | 94.7%       | 815
CMU Rowley [4] System 5   | 533            | 97.8%       | 1841
CMU Rowley [4] System 11  | 525            | 96.3%       | 87


Fig.4. Face localization (A. pyramid images; B. convolution images; C. average convolution image; D. likeness image; E. overlap elimination; F. final result)

3. EXPERIMENTAL RESULTS

Two test sets independent of the training set are used for performance evaluation. One is our own, consisting of 230 images of various types with 545 upright frontal faces; the other is the CMU test set [4], consisting of 130 images with 507 upright frontal faces. For all images, the search scales correspond to face sizes from 20x20 to 256x256 (at most 14 scales). For our test set, the performance is shown in Fig.5; in total 163,397,467 windows are searched. Some processed images are given in Fig.6. Processing a 356x281 image (Fig.6E), with 253,256 windows searched in total, takes 74.7 s on a PC (PIII-933 CPU, 256M memory).
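The scale count and the per-window cost quoted above can be checked with a short calculation. The sketch assumes that a 20x20 detection window at pyramid level k corresponds to a face of size 20 x 1.2^k in the original image; the function name is hypothetical.

```python
def pyramid_face_sizes(min_size=20.0, max_size=256.0, ratio=1.2):
    """Face sizes covered by a 20x20 window over a pyramid subsampled by `ratio`."""
    sizes = []
    size = float(min_size)
    while size <= max_size:
        sizes.append(size)
        size *= ratio
    return sizes


sizes = pyramid_face_sizes()
n_scales = len(sizes)  # number of pyramid levels between 20x20 and 256x256

# Average cost per window for the 356x281 image reported in the text.
seconds_per_window = 74.7 / 253_256
milliseconds_per_window = 1000.0 * seconds_per_window
```

With a ratio of 1.2 the loop yields 14 sizes, the largest about 214 pixels, which agrees with the stated maximum of 14 scales; the reported timing works out to roughly 0.3 ms per searched window.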

Fig.6. Some processed images from our test set (a, b: scanned photos; c, d: images from the WWW; e, f: camera-grabbed images)

For the CMU test set, experimental results are shown in Table 2. Some processed images are given in Fig.7.


Table 2. Detection rate and false alarms for the CMU test set (130 images with 507 frontal faces)

Methods | Faces Detected | Detect Rate | False Alarms

4. SUMMARY

In this paper we propose a face detection algorithm based on template matching and SVM. The SVM is trained in the subspace constrained by template matching, which greatly reduces the complexity of training the SVM and, through the template matching preprocessing, results in much faster detection. Comparative results on both our own test set and the well-known CMU test set demonstrate the effectiveness of this algorithm.

4-SVM 90.5%

Fig.7. Some processed images in the CMU test set

As a comparison with Osuna's SVM [7] results on CMU's test set B [4] (originally from MIT [3]), which contains 23 images, the performance is shown in Fig.8. The top detection rate reaches 80.7% (125 faces detected, 50 false alarms, false alarm rate 5.4×10^-…). In particular, when the detection rate is at the same level (74.2%) as in [7], the number of false alarms decreases to 17, compared with 20 in [7].

Fig.8. Detection rate against false alarm rate on CMU test set B

ACKNOWLEDGMENT

Dr. H.A. Rowley and Prof. T. Kanade of CMU kindly provided us with their binary codes for our testing. We express our sincere thanks to them.

REFERENCES

[1] C. Garcia, G. Tziritas, "Face Detection Using Quantized Skin Color Regions Merging and Wavelet Packet Analysis", IEEE Trans. Multimedia, 1(3):264-277, 1999.

[2] B. Moghaddam, A. Pentland, "Probabilistic Visual Learning for Object Representation", IEEE Trans. PAMI, 19(7):696-710, 1997.

[3] K. Sung, T. Poggio, "Example-Based Learning for View-Based Human Face Detection", IEEE Trans. PAMI, 20(1):39-51, 1998.

[4] H.A. Rowley, S. Baluja, T. Kanade, "Neural network-based face detection", IEEE Trans. PAMI, 20(1):23-38, 1998.

[5] B.E. Boser, I.M. Guyon, V.N. Vapnik, "A training algorithm for optimal margin classifiers", In Proc. 5th ACM Workshop on Computational Learning Theory, pp.144-152, Pittsburgh, PA, July 1992.

[6] S.R. Gunn, "Support Vector Machines for Classification and Regression", Technical Report, Image Speech and Intelligent Systems Research Group, University of Southampton, 1997.

[7] E. Osuna, R. Freund, F. Girosi, "Training support vector machines: an application to face detection", In Proc. of CVPR, Puerto Rico, pp.130-136, 1997.

[8] J.C. Platt, "Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines", Technical Report MSR-TR-98-14, 1998.

[9] H.Z. Ai, L.H. Liang, G.Y. Xu, "A General Framework for Face Detection", Lecture Notes in Computer Science, Vol.1948, Springer-Verlag Berlin Heidelberg New York, pp.119-126, 2000.
