Cascade AdaBoost Classifiers with Stage Features Optimization for Cellular Phone Embedded Face Detection System

Xusheng Tang, Zongying Ou, Tieming Su, and Pengfei Zhao

Key Laboratory for Precision and Non-traditional Machining Technology of Ministry of Education, Dalian University of Technology, Dalian 116024, P.R. China
[email protected], {ouzyg, tiemings}@dlut.edu.cn
Abstract. In this paper, we propose a novel feature-optimization method for building a cascade AdaBoost face detector for real-time applications on cellular phones, such as teleconferencing, user interfaces, and security access control. The AdaBoost algorithm selects a set of features and combines them into a final strong classifier. However, conventional AdaBoost is a sequential forward search procedure that uses a greedy selection strategy, so redundancy cannot be avoided. On the other hand, the design of embedded systems must find a good trade-off between performance and code size due to the limited resources available on a mobile phone. To address this issue, we propose a novel Genetic Algorithm post-optimization procedure for a given boosted classifier, which leads to shorter final classifiers and a speedup of classification. This GA-optimization algorithm is well suited to building applications for embedded, resource-limited devices. Experimental results show that our cellular phone embedded face detection system based on this technique can locate faces accurately and quickly with low computational and memory cost. It runs at 275 ms per image of size 384×286 pixels with high detection rates on a SANYO cellular phone with an ARM926EJ-S processor that lacks floating-point hardware.
1 Introduction

Many commercial applications embedded in cellular phones demand a fast face detector, for tasks such as teleconferencing, user interfaces, and security access control. Several face detection techniques have been developed in recent years [1], [2], [3], [4]. However, due to the hardware limitations of a mobile phone (only a few KB of memory and a low-frequency processor), fast face detection embedded on a mobile phone is a challenging task. It must find a good trade-off between high detection rates, runtime, and code size. Recently, Viola [2] introduced a boosted cascade of simple classifiers using Haar-like features capable of detecting faces in real time with both high detection rates and very low false positive rates, which is considered to be one of the fastest systems. Viola implemented this face detector on the Compaq iPaq handheld and achieved detection at two frames per second (this device has a 200 MIPS StrongARM processor) [2]. The central part of their method is a feature selection algorithm based on AdaBoost [5].
Much of the recent work on face detection following Viola-Jones has explored alternative boosting algorithms such as FloatBoost [6], GentleBoost [7], and Asymmetric AdaBoost [8]. However, AdaBoost is a sequential forward search procedure using a greedy selection strategy. Because of its greedy character, neither the found weak classifiers nor their coefficients are optimal. In this paper we propose a post-optimization procedure for each completed stage classifier based on Genetic Algorithms, which removes redundant features and leads to shorter final classifiers and a speedup of classification. This is very important for mobile phones because their memory is limited to a few KB. A face location system on a mobile phone using our proposed framework was built and tested on a test database. The experimental results demonstrate that our face location system can be implemented on a wide range of small, resource-limited devices, including handhelds and mobile phones. The remainder of the paper is organized as follows. In Section 2 the AdaBoost learning procedure proposed in [2] is introduced. The stage optimization procedure based on Genetic Algorithms is presented in Section 3. Section 4 provides the experimental results, and conclusions are drawn in Section 5.
2 Cascade of AdaBoost Classifiers

There are three elements in the Viola-Jones framework: the cascade architecture, a rich over-complete set of Haar-like features, and an algorithm based on AdaBoost for constructing ensembles of simple features in each classifier node. A cascade of face classifiers is a degenerate decision tree where at each stage a classifier is trained to detect almost all frontal faces while rejecting a certain fraction of non-face patterns. Those image windows that are not rejected by the initial classifier are processed by a sequence of classifiers, each slightly more complex than the last. If any classifier rejects an image window, no further processing is performed; see Fig. 1. The cascade architecture can dramatically increase the speed of the detector by focusing attention on promising regions of the image. Each stage classifier is trained using the AdaBoost algorithm [5]. AdaBoost constructs the strong classifier as a combination of weak classifiers with proper coefficients. This is an iterative supervised learning process. Given a training set $\{x_i, y_i\}$,
the weak classifier $h_j$ finds a threshold $\theta_j$ which best separates the values of the Haar-like feature $f_j$ on the positively labeled samples from those on the negatively labeled samples. Thus, the weak classifier can be defined as follows:

$$h_j(x) = \begin{cases} 1 & \text{if } p_j f_j(x) < p_j \theta_j \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
where $p_j$ is 1 if the positive samples are classified below the threshold, or $-1$ if the positive samples are classified above the threshold. After $T$ rounds of AdaBoost training, $T$ weak classifiers $h_j$ and ensemble weights $\alpha_j$ are obtained. A final strong classifier $H(x)$ is then defined as follows:
$$H(x) = \begin{cases} 1 & \text{if } \sum_{j=1}^{T} \alpha_j h_j(x) \ge \theta \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$
The threshold $\theta$ is adjusted to meet the detection rate goal.
Fig. 1. Cascade of classifiers with $N$ stages. At each stage a classifier is trained to achieve a hit rate of $h$ and a false alarm rate of $f$.
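To make this structure concrete, the following Python sketch shows how a cascade of boosted stage classifiers (Eqs. 1 and 2) might evaluate one image window. It is a minimal illustration, not the authors' implementation: the class and field names are ours, and the feature values, thresholds, and parities are assumed to come from training.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class WeakClassifier:
    feature: Callable[[object], float]  # Haar-like feature f_j evaluated on a window
    threshold: float                    # theta_j
    parity: int                         # p_j in {+1, -1}

    def predict(self, window) -> int:
        # Eq. (1): h_j(x) = 1 if p_j * f_j(x) < p_j * theta_j, else 0
        return 1 if self.parity * self.feature(window) < self.parity * self.threshold else 0

@dataclass
class StageClassifier:
    weaks: List[WeakClassifier]
    alphas: List[float]   # ensemble weights alpha_j
    theta: float          # stage threshold, tuned to meet the detection-rate goal

    def predict(self, window) -> int:
        # Eq. (2): H(x) = 1 if sum_j alpha_j * h_j(x) >= theta
        score = sum(a * w.predict(window) for a, w in zip(self.alphas, self.weaks))
        return 1 if score >= self.theta else 0

def cascade_predict(stages: List[StageClassifier], window) -> bool:
    # A window is accepted as a face only if every stage accepts it;
    # most non-face windows are rejected cheaply by the early stages.
    return all(stage.predict(window) == 1 for stage in stages)
```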
3 Genetic Algorithms for Stage Optimization

According to the model of the boosted classifier (Eq. 2), the stage classifier can be regarded as a weighted combination of weak classifiers $\{h_1, h_2, \ldots, h_T\}$. Each weak classifier $h_i$ is determined by the boosting training. Once fixed, the set of weak classifiers maps a sample $x_i$ from the original feature space $F$ to a point

$$x_i^* = h(x_i) = \{h_1(x_i), h_2(x_i), \ldots, h_T(x_i)\} \qquad (3)$$
in a new space $F^*$ with new dimensionality $T$. As AdaBoost is a sequential forward search procedure using a greedy selection strategy, neither the found weak classifiers nor their coefficients are optimal. At the same time, classifiers with more features require more time to evaluate and occupy more memory. However, the more features used, the higher the detection accuracy that may be achieved. So the performance of each stage classifier involves a trade-off between accuracy, speed, and hardware resources. This problem is even more important in an embedded system. To address this issue, we use Genetic Algorithms to remove the redundancy and optimize the parameters. The procedure is summarized in Algorithm 1.
- Given example images $(x_1, y_1), \ldots, (x_n, y_n)$ where $y_i = 0, 1$ for negative and positive examples respectively.
- Initialize weights $\omega_{1,i} = \frac{1}{2m}, \frac{1}{2l}$ for $y_i = 0, 1$ respectively, where $m$ and $l$ are the number of negatives and positives respectively.
- For $t = 1, \ldots, T$:
  1. Normalize the weights, $\omega_{t,i} \leftarrow \omega_{t,i} / \sum_{j=1}^{n} \omega_{t,j}$, so that $\omega_t$ is a probability distribution.
  2. For each feature $j$, train a classifier $h_j$ which is restricted to using a single Haar-like feature. The error is evaluated with respect to $\omega_t$: $\varepsilon_j = \sum_i \omega_i \, |h_j(x_i) - y_i|$.
  3. Choose the classifier $h_t$ with the lowest error $\varepsilon_t$.
  4. Update the weights: $\omega_{t+1,i} = \omega_{t,i} \beta_t^{1 - e_i}$, where $e_i = 0$ if example $x_i$ is classified correctly, $e_i = 1$ otherwise, and $\beta_t = \frac{\varepsilon_t}{1 - \varepsilon_t}$.
- The final stage classifier is: $h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \theta \\ 0 & \text{otherwise} \end{cases}$, where $\alpha_t = \log \frac{1}{\beta_t}$.
- Genetic Algorithms stage classifier post-optimization (see Algorithm 2).

Algorithm 1. Post-optimization procedure of a given boosted stage classifier based on Genetic Algorithms
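A minimal sketch of the boosting loop in Algorithm 1 follows, reusing the WeakClassifier interface from the earlier sketch. `train_single_feature` is a hypothetical helper, not defined in the paper, that fits the threshold and parity of one Haar-like feature under the current weights and returns a (weak_classifier, weighted_error) pair.

```python
import math
from typing import List, Sequence

def train_stage(features: Sequence, X: List, y: List[int], T: int):
    # Discrete AdaBoost over single-feature weak classifiers, as in Algorithm 1.
    n = len(X)
    m = sum(1 for label in y if label == 0)   # number of negatives
    l = n - m                                 # number of positives
    w = [1.0 / (2 * m) if label == 0 else 1.0 / (2 * l) for label in y]
    weaks, alphas = [], []
    for t in range(T):
        total = sum(w)
        w = [wi / total for wi in w]                        # step 1: normalize
        candidates = [train_single_feature(f, X, y, w) for f in features]
        h_t, eps_t = min(candidates, key=lambda c: c[1])    # steps 2-3: best weak
        eps_t = min(max(eps_t, 1e-10), 1 - 1e-10)           # guard the ratio/log
        beta_t = eps_t / (1.0 - eps_t)
        for i in range(n):                                  # step 4: reweight
            e_i = 0 if h_t.predict(X[i]) == y[i] else 1
            w[i] *= beta_t ** (1 - e_i)
        weaks.append(h_t)
        alphas.append(math.log(1.0 / beta_t))
    return weaks, alphas
```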
3.1 Evolutionary Search
Genetic algorithms [9] are nondeterministic methods that employ crossover and mutation operators for deriving offspring. The power of a GA lies in its ability to exploit, in a highly efficient manner, information about a large number of individuals.

3.1.1 Individual Representation

In order to apply genetic search, a mapping must be established between concept descriptions and individuals in the search population. The representation scheme that we use can be described as follows. Assume that the stage classifier contains $T$ weak classifiers $h_i$, with $T$ weight values $\alpha_i$ and a stage threshold value $b$. This information is encoded as a string as shown in Fig. 2.
Fig. 2. The representational structure of an individual
$K_1 \sim K_T$ denote the weak classifiers: $K_i = 0$ means the weak classifier is removed and $K_i = 1$ means it is present. $\alpha_1 \sim \alpha_T$ denote the weight values, which range over $[0, 10]$, and $b$ denotes the stage threshold value, which ranges over $[-6, 1]$.
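As a concrete illustration, this encoding might be represented as in the following minimal sketch; the class, field, and function names are ours, not from the paper.

```python
import random
from dataclasses import dataclass
from typing import List

@dataclass
class Chromosome:
    K: List[int]        # K_1..K_T: 1 keeps weak classifier i, 0 removes it
    alpha: List[float]  # alpha_1..alpha_T: re-tuned weights, each in [0, 10]
    b: float            # stage threshold, in [-6, 1]

def random_chromosome(T: int) -> Chromosome:
    # A randomly initialized individual over the allowed ranges.
    return Chromosome(K=[random.randint(0, 1) for _ in range(T)],
                      alpha=[random.uniform(0.0, 10.0) for _ in range(T)],
                      b=random.uniform(-6.0, 1.0))
```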
3.1.2 Fitness Function

The fitness function is composed of two components that capture the two desired characteristics: fewer weak classifiers and a lower error rate. The first fitness component concerns the efficiency of the stage classifier and the model size. In these terms, it is preferable that fewer weak classifiers, rather than more, give a correct prediction for a given object. The following component is designed to encourage this behavior:

$$F_1 = 1 - l / T \qquad (4)$$
where $l$ is the number of weak classifiers selected, that is, $l = \sum_{i=1}^{T} K_i$, and $T$ is the total number of weak classifiers. The next fitness component concerns the accuracy measures: a high hit rate $h$ and a low false alarm rate $f$. We define it as follows:
$$F_2 = \begin{cases} 1 - n^- / N^- & \text{if } m^+ / M^+ \ge h \\ 0 & \text{if } m^+ / M^+ < h \end{cases} \qquad (5)$$
where $m^+$ is the number of labeled positive samples correctly predicted, $M^+$ is the total number of labeled positive samples in the training set, $n^-$ is the number of labeled negative samples wrongly predicted, $N^-$ is the total number of labeled negative samples in the training set, and $h$ is the hit rate of the original stage classifier on the training set. Given samples $(x_1, y_1), \ldots, (x_n, y_n)$ where $y_i = 0, 1$ for negative and positive samples respectively, and a set of weak classifiers $\{h_1, h_2, \ldots, h_T\}$, the prediction function follows from the chromosome representation of Section 3.1.1:
$$H'(x_i) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} K_t \alpha_t h_t(x_i) + b \ge 0 \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$
Thus the total fitness function for the genetic algorithm is the weighted combination of the two components $F_1$ and $F_2$ defined above:

$$F = w_1 F_1 + w_2 F_2 \qquad (7)$$
where $w_1$ and $w_2$ are fitness weights that can be adjusted to balance the efficiency, model size, and accuracy of the classifier.

3.1.3 Genetic Operators

As the representational structure of an individual is composed of three parts, the crossover and mutation operators must work on each part respectively. This ensures that the representational structure of individuals is preserved through crossover and mutation. In this paper, we adopt single-point crossover and point mutation. The process of crossover and mutation is shown in Fig. 3 and Fig. 4 ($P$, $X$, and $\beta$ are random mutation values); a sketch of both operators follows the figures.
Fig. 3. Illustration of the crossover
Fig. 4. Illustration of the mutation
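One plausible implementation of these part-wise operators is sketched below, reusing the Chromosome class from Section 3.1.1. The exact operator details are not fully specified in the paper, so the per-part cut points for crossover and uniform resampling within each part's range for mutation are our assumptions.

```python
import random

def crossover(a: Chromosome, b: Chromosome) -> Chromosome:
    # Cut each of the three parts independently so that the K / alpha / b
    # structure of the individual is preserved (assumed per-part single point).
    T = len(a.K)
    cut_k = random.randrange(1, T)
    cut_a = random.randrange(1, T)
    child_K = a.K[:cut_k] + b.K[cut_k:]
    child_alpha = a.alpha[:cut_a] + b.alpha[cut_a:]
    child_b = random.choice([a.b, b.b])   # threshold taken from either parent
    return Chromosome(child_K, child_alpha, child_b)

def mutate(ch: Chromosome, pm: float = 0.1) -> None:
    # Point mutation: flip a K bit, resample an alpha in [0, 10],
    # or resample the threshold b in [-6, 1].
    for i in range(len(ch.K)):
        if random.random() < pm:
            ch.K[i] = 1 - ch.K[i]
    for i in range(len(ch.alpha)):
        if random.random() < pm:
            ch.alpha[i] = random.uniform(0.0, 10.0)
    if random.random() < pm:
        ch.b = random.uniform(-6.0, 1.0)
```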
3.1.4 Search

Search is performed as in standard genetic search. An initial population of individuals is generated. These individuals are then evaluated according to Eq. 7, and the fitter individuals are chosen to undergo reproduction, crossover, and mutation operations in order to produce a population of children for the next generation. This procedure is continued until either convergence is achieved or a sufficiently fit individual has been discovered. The evolutionary search algorithm is shown in Algorithm 2. Further information on genetic algorithms can be found in [9].

- Choose initial population
- Evaluate each individual's fitness function
- Repeat
  1. Select individuals to reproduce
  2. Mate pairs at random
  3. Apply crossover operator
  4. Apply mutation operator
  5. Evaluate each individual's fitness function
- Until terminating condition

Algorithm 2. Genetic Algorithms Stage Classifier Post-Optimization
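A compact sketch of this loop, tying together the encoding, fitness, and operators sketched above, is given below. The population size and binary tournament selection are our assumptions (the paper does not specify them), and for brevity every child is produced by crossover rather than with probability $p_c$; the termination rule mirrors Section 4.2.

```python
import random

def total_fitness(ch, H_pos, H_neg, h_goal, w1, w2):
    F1, F2 = fitness_components(ch, H_pos, H_neg, h_goal)
    return w1 * F1 + w2 * F2                     # Eq. (7)

def ga_post_optimize(T, H_pos, H_neg, h_goal, w1, w2,
                     pop_size=50, max_gens=10000, patience=5000):
    # Stop when no better individual appears within `patience` generations,
    # or after `max_gens` generations in total (Sec. 4.2).
    pop = [random_chromosome(T) for _ in range(pop_size)]
    score = lambda c: total_fitness(c, H_pos, H_neg, h_goal, w1, w2)
    best, stale = max(pop, key=score), 0
    for _ in range(max_gens):
        def select():  # binary tournament selection (our assumption)
            a, b = random.sample(pop, 2)
            return a if score(a) >= score(b) else b
        children = []
        while len(children) < pop_size:
            child = crossover(select(), select())
            mutate(child)
            children.append(child)
        pop = children
        gen_best = max(pop, key=score)
        if score(gen_best) > score(best):
            best, stale = gen_best, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best
```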
3.2 Cascade Face Classifiers Learning Framework
Training a classifier for the face detection task is challenging because of the difficulty in characterizing prototypical "non-face" images. It is easy to get a representative sample of images which contain faces, but much harder to get a representative sample of those which do not. We adopted the "bootstrap" method [10] to reduce the size of the training set needed. The negative images are collected during training, in the following manner, instead of being collected before training starts.
1. Create an initial set of non-face images by collecting $m$ random non-face images. Create an initial set of face images by selecting $l$ representative face images. Given are the number of stages $N$ and the target overall false alarm rate $f$ of the full cascade.
2. Train a stage face classifier on these $m + l$ samples using Algorithm 1.
3. Add this stage classifier to the cascade face classifier system. Run the system on images of scenery that contain no faces and collect $m$ negative windows that the system incorrectly identifies as faces. Increment the stage number.
4. If the stage number is less than $N$ and the overall false alarm rate is still greater than $f$, go to step 2; otherwise, exit.
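A sketch of this bootstrap loop is given below. The three helpers are hypothetical placeholders: `train_stage_classifier` stands for Algorithm 1 plus the GA post-optimization, `sample_false_positives` scans face-free scenery images and returns $m$ windows the current cascade still accepts (with an empty cascade it simply returns random windows), and `false_alarm_rate` estimates the cascade's overall false alarm rate on a validation set.

```python
def train_cascade(faces, scenery_images, m, N, f_target):
    # Step 1: seed the negative set from face-free scenery.
    negatives = sample_false_positives([], scenery_images, m)
    stages = []
    stage_number = 0
    while True:
        stage = train_stage_classifier(faces, negatives)   # step 2: Algorithm 1 + GA
        stages.append(stage)                               # step 3: extend the cascade
        stage_number += 1
        if stage_number >= N or false_alarm_rate(stages) <= f_target:
            return stages                                  # step 4: targets met
        # Bootstrap: collect fresh negatives that the current cascade misclassifies.
        negatives = sample_false_positives(stages, scenery_images, m)
```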
4 Experiment

In this section, a face detection system on a cellular phone using our proposed algorithm is implemented. The performance of our algorithm and of the standard AdaBoost algorithm is also compared. The performance evaluation concentrates on the speed, the complexity of the learned cascade classifiers, and the memory occupied in the cellular phone environment.

4.1 Training Dataset
We collected 10,000 face images and 20,000 non-face images from various sources, such as AR, FERET, and the Web, covering out-of-plane rotations in the range [-20°, +20°]. The dataset contains face images of variable quality, with different facial expressions, taken under a wide range of lighting conditions. The 10,000 face images were cropped and rescaled to the size of 20×20 pixels. 5,000 face images and 10,000 non-face images were used in training, and the remaining images were used in the GA post-optimization.

4.2 Training Process
The training was implemented on a PC with an AMD Athlon XP 2500+ CPU. Two face detection systems were trained: one with standard AdaBoost and one with our novel post-optimization procedure applied to each completed stage classifier. The parameters used for evolution were: 80% of all individuals undergo crossover ($p_c = 0.8$), 10% of all individuals are mutated ($p_m = 0.1$), and the population was initialized randomly. The GA terminated when the population had converged to a good solution, i.e. no better individual was found within the next 5000 generations. If convergence did not occur within 10000 generations, the GA was stopped as well.

4.3 Experiment Results
Our face detection system is implemented in the C language with an ARM compiler. The system was tested on a mobile phone that integrates a CCD camera, 2 MB of RAM, and an ARM926EJ-S processor that lacks floating-point hardware.
We tested our system on the BioID face test set [11]. This set consists of 1520 images with 1521 labeled frontal faces, with a large variety of illumination conditions and face sizes, and very complex backgrounds. This database is believed to be more difficult than some commonly used head-and-shoulder face databases without complex backgrounds. The ROC curve over the BioID face database test is shown in Fig. 5. The model size and the average detection time of the two face detection systems are listed in Table 1. The numbers of weak classifiers trained by standard AdaBoost and by our novel GA post-optimization procedure for each completed stage classifier are summarized in Table 2.

Table 1. Model size and detection time of the two face detectors on the cellular phone
Face Detector                   Model size   Average detection time
With GA post-optimization       80 KB        275 ms
Without GA post-optimization    110 KB       395 ms
Table 2. The number of weak classifiers with and without GA post-optimization
Stage No.   AdaBoost   GA post-optimization
1           13         10
2           17         11
3           22         15
4           29         20
5           39         29
6           48         34
7           58         41
8           82         60
9           113        86
10          128        94
11          157        107
12          160        122
13          138        97
Total       1004       726
Fig. 5. ROC curves for the face detectors based on AdaBoost with and without GA post-optimization on the BioID test set. The tests were run on a SANYO cellular phone with an ARM926EJ-S processor that lacks floating-point hardware.
Fig. 6. Sample results of our mobile phone embedded face detection system in real-time detection
From the experimental results shown in Table 1 and Table 2, we can see that the number of weak classifiers in each stage classifier is reduced by the Genetic Algorithm optimization. The model size of the face detector with GA optimization is also about 30% smaller. This is very important, as the memory resource of the mobile phone is only a few KB. Due to the feature reduction, the average detection time of the face detector with GA optimization is about 30% faster, while the detection performance drops only slightly, as can be seen from Fig. 5. Some real-time detection results using our mobile phone embedded face detection system are shown in Fig. 6. As can be seen in Fig. 6, our system is able to detect faces under variable lighting conditions and movements such as rotation, scaling, up-and-down motion, profile views, and occlusion.
5 Conclusion

In this paper, we presented a novel stage post-optimization procedure for boosted classifiers based on genetic algorithms. This method effectively improves the learning results. The classifier trained by the novel method was about 22% faster and consists of only 70% of the weak classifiers needed by a classifier trained with standard AdaBoost, while the detection rate decreases only slightly. The reduction of the model size is very important in the mobile phone context, where the weak classifiers are expensive to compute and store. Experimental results show that our object detection framework is very suitable for resource-limited devices, including mobile phones, smart cards, and other special-purpose hardware, due to the reduction of the number of weak classifiers. Our GA post-optimization algorithm can also be applied with other boosting algorithms such as FloatBoost [6] or GentleBoost [7].
References

1. Rowley, H., Baluja, S., Kanade, T.: Neural network-based face detection. IEEE Trans. PAMI, Vol. 20 (1998) 23-38
2. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Proc. IEEE CVPR (2001) 511-518
3. Romdhani, S., Torr, P., Schoelkopf, B., Blake, A.: Computationally efficient face detection. In: Proc. Intl. Conf. Computer Vision (2001) 695-700
4. Schneiderman, H., Kanade, T.: A statistical model for 3D object detection applied to faces and cars. In: Proc. IEEE CVPR (2000)
5. Freund, Y., Schapire, R.: A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, Vol. 55 (1997) 119-139
6. Li, S.Z., Zhang, Z.Q., Shum, H., Zhang, H.J.: FloatBoost learning for classification. In: Proc. NIPS (2002)
7. Lienhart, R., Kuranov, A., Pisarevsky, V.: Empirical analysis of detection cascades of boosted classifiers for rapid object detection. Technical report, MRL, Intel Labs (2002)
8. Viola, P., Jones, M.: Fast and robust classification using asymmetric AdaBoost and a detector cascade. In: NIPS 14 (2002)
9. Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, MA (1989)
10. Sung, K.K.: Learning and Example Selection for Object and Pattern Detection. PhD thesis, MIT AI Lab (1996)
11. BioID face database: http://www.bioid.com/downloads/facedb/facedatabase.html