GENDER CLASSIFICATION IN UNCONTROLLED SETTINGS USING ADDITIVE LOGISTIC MODELS

Simon J.D. Prince and Jania Aghajanian
Department of Computer Science, University College London

ABSTRACT

Many previous studies have investigated gender classification in well-lit frontal images. In this paper we consider images where the pose, expression and lighting are relatively unconstrained. We localize faces using a standard sliding-window detector. We preprocess the facial region by convolving with Gabor filters at four scales and four orientations. We sample these responses and concatenate them to form a feature vector. We develop a classifier based on an additive sum of non-linear functions of one-dimensional projections of the data. In particular we investigate arc tangent functions and weighted sums of Gaussians. We describe a training method based on increasing the binomial log likelihood. We demonstrate our system on two databases and show that it performs well relative to the state of the art.

Index Terms— Gender identification, Boosting

1. INTRODUCTION

A considerable body of work in computer vision has addressed facial analysis. Automatic gender classification is a particularly important subtask, since it can be used as a preprocessing step for face recognition and has other applications such as demographic and psychological analysis. Current gender classification methods can be divided into appearance-based and feature-based methods. Appearance-based approaches attempt to classify the whole image simultaneously and have included multi-layer neural networks [1, 2], radial basis function (RBF) networks [3, 4] and support vector machines [5, 6, 7]. Feature-based approaches emphasize that feature selection is an important issue for gender classification. Previous methods of this type have used independent components analysis [8] and Local Binary Patterns (LBPs) [9, 7]. These have been used in conjunction with adaboost-based classifiers such as threshold adaboost [10, 9] and Look Up Table (LUT) adaboost [11]. Adaboost-based methods are generally faster.

Many of the above algorithms achieve impressive performance on controlled databases such as FERET [12]. However, most of these algorithms use manual detection and alignment of the face images, which has been shown to improve performance [13]. Only a small number of studies have automatically extracted faces using a face detector [14, 5, 15].
Fig. 1. Our goal is to classify gender in faces that have been harvested from the web with a conventional face detector. Ticks and crosses indicate the performance of our classifier, which achieves 87.5% accuracy on this database.

Moreover, the images used to evaluate current methods have been taken in controlled conditions; they are mainly frontal and show little or no variation in pose, lighting and background. Recently, Mäkinen and Raisamo [13] compared state-of-the-art methods on images collected from the internet, in which faces were automatically detected using a sliding-window adaboost detector [16]. These images are still frontal, but they exhibit much more variation in lighting and background clutter than images from the FERET database. They tested six state-of-the-art methods (two appearance-based and four feature-based) on this database and showed that the performance of current methods drops sharply when evaluated on automatically detected images from the web.

In Section 2.1 we propose a novel model for gender classification in uncontrolled images based on additive logistic models (see Figure 1). In Section 2.2 we describe an algorithm to efficiently learn these models. In Section 3 we present results for the same database as that used by Mäkinen and Raisamo [13], as well as a database we have gathered ourselves from the internet. This database is more challenging since it also includes larger variations in pose. In Section 4 we summarize our results and draw conclusions.
2. METHODS

2.1. Additive Logistic Models

We aim to calculate the posterior probability that the gender t_i of the face is male (t_i = 1) rather than female (t_i = 0) based on an observed feature vector z_i, so that

  Pr(t_i = 1 | z_i) = y_i = \frac{1}{1 + \exp(-x_i)}    (1)

The activation x_i indicates the tendency for face i to be considered male and is defined as:

  x_i = a_0 + f_1(w_1^T z_i, \theta_1) + \ldots + f_N(w_N^T z_i, \theta_N)    (2)

The activation consists of an additive sum of a constant a_0 and a series of functions f_n, each of which acts on an associated linear projection of the data w_n^T z_i and has associated parameters θ_n. Possible function classes include:

• Heaviside Step Function, θ = {a, b}:

  f(w^T z | a, b) = a H[w^T z + b]    (3)

• Arc Tangent Function, θ = {a, b, c}:

  f(w^T z | a, b, c) = a \arctan[b w^T z + c]    (4)

• Radial Basis Functions, θ = {a_1 ... a_J}:

  f(w^T z | a_{1...J}) = \sum_{j=1}^{J} a_j \exp\left[-0.5 \left(\frac{w^T z - \mu_j}{\sigma_j}\right)^2\right]    (5)
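To make the model concrete, the following is a minimal sketch (in Python/NumPy, not the authors' code) of evaluating the additive logistic model of Equations (1), (2) and (4) with arc tangent functions; the number of units, parameter values and feature dimensionality are illustrative assumptions.

import numpy as np

def activation_arctan(z, a0, W, thetas):
    # Additive activation x = a0 + sum_n a_n * arctan(b_n * w_n^T z + c_n)  (Eqs. 2 and 4)
    x = a0
    for w, (a, b, c) in zip(W, thetas):
        x = x + a * np.arctan(b * np.dot(w, z) + c)
    return x

def posterior_male(z, a0, W, thetas):
    # Posterior Pr(t = 1 | z) via the logistic function (Eq. 1)
    return 1.0 / (1.0 + np.exp(-activation_arctan(z, a0, W, thetas)))

# Illustrative usage: three arc tan units acting on a 10-dimensional feature vector.
rng = np.random.default_rng(0)
z = rng.normal(size=10)                       # feature vector z_i
W = [rng.normal(size=10) for _ in range(3)]   # projection directions w_n
thetas = [(0.5, 1.0, 0.0)] * 3                # parameters (a, b, c) for each unit
print(posterior_male(z, 0.0, W, thetas))      # thresholded at 0.5 to decide gender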
2.2. Learning

We learn the additive logistic models by maximizing the log binomial posterior probability:

  \arg\max_{a_0, w_{1...N}, \theta_{1...N}} L = \arg\max_{a_0, w_{1...N}, \theta_{1...N}} \sum_{i=1}^{I} \log Pr(t_i | z_i)    (6)

where t_i is the gender of the i'th face. Rather than optimize this complicated function all at once, we take a sequential approach in which we add the functions f_{1...n} one at a time. In the first stage we optimize over the parameters a_0, w_1, θ_1 to fit a model with activation:

  x_i = a_0 + f_1(w_1^T z_i, \theta_1)    (7)

At the second stage, we fix w_1 and θ_1 and optimize over the parameters a_0, w_2 and θ_2 to fit a model with activation:

  x_i = a_0 + f_1(w_1^T z_i, \theta_1) + f_2(w_2^T z_i, \theta_2)    (8)

At the n'th stage, we fix w_{1...n-1} and θ_{1...n-1} and optimize over the parameters a_0, w_n and θ_n, and so on. This process is illustrated in Figure 2.

Fig. 2. Additive logistic regression. The activation x is incrementally modeled by adding arc tan functions. (a) Feature space showing class 1 (circles) and class 2 (squares). (b) The activation x and the posterior probability Pr(t = 1) along the dotted line are both initially flat. (c) After adding a single arc tan function. (d) Associated activation and posterior probability. (e) After adding a second arc tan function. (f) Associated activation and posterior.

At each stage we optimize the binomial log likelihood criterion L using a Newton-Raphson method. This requires the first and second derivatives of L with respect to the parameters φ_n = {a_0, w_n, θ_n} of x, which are easily calculated using the relations:

  \frac{\partial L}{\partial \phi_n} = \sum_{i=1}^{I} (t_i - y_i) \frac{\partial x_i}{\partial \phi_n}    (9)

  \frac{\partial^2 L}{\partial \phi_n^2} = \sum_{i=1}^{I} \left[ y_i (y_i - 1) \frac{\partial x_i}{\partial \phi_n} \frac{\partial x_i}{\partial \phi_n}^T + (t_i - y_i) \frac{\partial^2 x_i}{\partial \phi_n^2} \right]    (10)

This approach requires the derivatives of the function f with respect to its parameters. However, the Heaviside function is not smooth, so these derivatives are not defined. Although there are ways to circumvent this problem, we only consider the arc tan and RBF functions for the remainder of the paper. Note that this model has close connections with other learning methods. For the Heaviside function, this method resembles classical boosting (in fact, it is logit-boost [17]): the functions f_{1...n} can be interpreted as weak classifiers. For the arc tangent functions the final classifier resembles a multi-layer perceptron in which hidden units are sequentially added and optimized.
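The stagewise optimization can be sketched as follows (again a Python/NumPy illustration rather than the authors' implementation). For brevity each new arc tan unit is fitted by gradient ascent on the binomial log likelihood using the gradient relation of Equation (9), rather than the full Newton-Raphson update of Equations (9) and (10); the projection direction of each unit is held at a random value, and the learning rate, iteration counts and synthetic data are assumptions.

import numpy as np

def fit_stagewise(Z, t, n_units=10, n_steps=100, lr=0.1, seed=0):
    # Stagewise fitting: add one arc tan unit at a time, each maximizing the
    # binomial log likelihood by gradient ascent (cf. Eq. 9).
    rng = np.random.default_rng(seed)
    I, D = Z.shape
    a0 = 0.0
    units = []                                # fitted (w, a, b, c) tuples
    x_fixed = np.zeros(I)                     # activation of already-fitted units
    for _ in range(n_units):
        w = rng.normal(size=D) / np.sqrt(D)   # random projection direction (kept fixed here for simplicity)
        a, b, c = 0.1, 1.0, 0.0
        for _ in range(n_steps):
            u = b * (Z @ w) + c
            x = a0 + x_fixed + a * np.arctan(u)
            y = 1.0 / (1.0 + np.exp(-x))      # Eq. (1)
            r = t - y                         # dL/dx_i, as in Eq. (9)
            a0 += lr * r.mean()
            a  += lr * np.mean(r * np.arctan(u))
            b  += lr * np.mean(r * a * (Z @ w) / (1.0 + u**2))
            c  += lr * np.mean(r * a / (1.0 + u**2))
        units.append((w, a, b, c))
        x_fixed += a * np.arctan(b * (Z @ w) + c)
    return a0, units

def predict(Z, a0, units):
    x = a0 + sum(a * np.arctan(b * (Z @ w) + c) for (w, a, b, c) in units)
    return (1.0 / (1.0 + np.exp(-x)) > 0.5).astype(int)   # threshold posterior at 0.5

# Toy usage on synthetic two-class data (illustrative only).
rng = np.random.default_rng(1)
Z = np.vstack([rng.normal(-1, 1, size=(100, 5)), rng.normal(+1, 1, size=(100, 5))])
t = np.concatenate([np.zeros(100), np.ones(100)])
a0, units = fit_stagewise(Z, t)
print("training accuracy:", (predict(Z, a0, units) == t).mean())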
Fig. 3. (a) Log binomial likelihood monotonically increases as we add arc tan functions to model the www data. (b-c) Training and test error for the www database as a function of the number of arc tan and RBF functions respectively. (d) Training and test error for arc tan functions with the UCL database.

3. EXPERIMENTS

We test our system on (i) the www database of [13], which consists of grayscale images (3048 training and 760 test), each of size 32 × 40 pixels, and (ii) our own database (referred to as the UCL database), which consists of grayscale images (32000 training and 1000 test), each of size 60 × 60 pixels. These were captured from on-line dating websites and contain significant variation in pose, lighting and expression (see Figure 1). In each case faces were automatically detected by a commercial face detector. The final image was normalized relative to the scale and orientation of the detected region such that the hair region was included.
Each image was histogram equalized and filtered with a bank of Gabor filters at 4 scales, 4 orientations and 2 phases. The resulting filtered images were sampled at regular intervals, where the space between samples was proportional to the filter wavelength. The sampled values from all scales, orientations and phases were concatenated to form a feature vector z of size 1064 for the www database and 928 for the UCL database. Finally, each dimension was whitened so that the training data had mean zero and variance one. For the www database only, we flipped the training images in the horizontal direction and added them back to the original training set.

For the www database, we trained classifiers based on both arc tangent and radial basis functions (RBFs). In order to speed up the learning process we restrict the projection directions w to be sparse, so each uses only a randomly chosen subset of 500 of the 1064 possible dimensions. For the RBF model we used 15 basis functions with µ_{1...15} evenly distributed between ±4 and variances σ²_{1...15} all set to one. We trained 200 functions, which took 2 hours on a modern desktop PC. In testing we calculate the posterior probability that the face was male using Equation 1 and threshold this value at 0.5.

Figure 3a shows the binomial log likelihood against the number of arc tan functions added. As expected, this is monotonically increasing. The results for the training and test data are shown as a function of the number of functions added in Figure 3b. Similarly to boosting methods, the training error rapidly decreases to zero, but the test error continues to decrease for both models. With 3500 functions, the test performance for the arc tan model was 79.34%. Figure 3c shows the equivalent graph for the radial basis function model. Here, the learning is somewhat slower but produces a final test performance of 81.97% with 3500 functions. This compares favorably to the models tested in [13], as shown in Table 1.

METHOD                      Classification rate (%)
Neural Network              60.26
SVM                         75.13
Threshold Adaboost          75.26
LUT Adaboost                76.71
Mean Adaboost               71.84
LBP+SVM                     76.71
Our Approach (arc tan)      79.34
Our Approach (RBF)          81.97

Table 1: Comparison of results for the www database [13].
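The following is a rough sketch of the preprocessing pipeline described above (histogram equalization, a Gabor filter bank, regular subsampling, concatenation and whitening). It uses OpenCV's getGaborKernel and filter2D; the wavelengths, kernel size, sampling strides and phase handling are assumptions rather than the paper's exact settings.

import cv2
import numpy as np

def gabor_features(gray_u8, wavelengths=(4, 8, 16, 32), n_orient=4, phases=(0, np.pi / 2)):
    # Histogram-equalize, filter with a Gabor bank (4 scales x 4 orientations x 2 phases),
    # subsample each response proportionally to its wavelength, and concatenate.
    img = cv2.equalizeHist(gray_u8).astype(np.float32)
    samples = []
    for lam in wavelengths:                      # scales (wavelengths are assumptions)
        step = max(1, lam // 2)                  # sampling stride proportional to wavelength
        for k in range(n_orient):                # orientations
            theta = k * np.pi / n_orient
            for psi in phases:                   # phases
                kern = cv2.getGaborKernel((31, 31), sigma=0.5 * lam, theta=theta,
                                          lambd=lam, gamma=1.0, psi=psi)
                resp = cv2.filter2D(img, cv2.CV_32F, kern)
                samples.append(resp[::step, ::step].ravel())
    return np.concatenate(samples)

def whiten(Z):
    # Whiten each feature dimension to zero mean and unit variance over the training set.
    mu, sd = Z.mean(axis=0), Z.std(axis=0) + 1e-8
    return (Z - mu) / sd, mu, sd

# Usage on a few 60x60 crops (random pixels stand in for detected faces).
faces = [np.random.default_rng(i).integers(0, 256, size=(60, 60), dtype=np.uint8) for i in range(4)]
Z = np.stack([gabor_features(f) for f in faces])
Zw, mu, sd = whiten(Z)
print(Z.shape)   # feature length depends on the assumed strides, not necessarily 928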
Although our approach competes well with the state of the art, we note that the performance is still somewhat poor. We conjecture that part of the problem may be the relatively small amount of training data and the low original resolution of the images. To investigate this issue, we also tested our approach using the UCL database which contains a much larger set of training images each of which is 60 × 60. The results are plotted for the arc tan function in Figure 3d. As for the www database, test error continues to decline after the training error has reached zero. Examples of successful and failed
classifications are shown in Figure 1. With 300 functions, the performance was 87.5%.

4. DISCUSSION

In this paper, we have presented an approach to gender classification based on constructing additive sums of non-linear functions of the data that are then passed through the logistic function. We present a method for training this classifier by sequentially adding functions to maximize the binomial log-likelihood criterion. We test two varieties of our method on the www database of [13]. Our results compare favorably with previously published work. We also trained with a much larger database of uncontrolled images that were collected from the web. Under these conditions, performance increases by approximately 10% to 87%. This database is available to other researchers upon request from the authors.

However, we note that this is probably an overestimate of the true performance for images such as these: we only test on faces that were identified by the commercial face detector. This detector only identified ∼70% of the faces in the original images and notably failed under significant pose and lighting variations. These are exactly the situations where gender classification is hardest. However, we know that good performance is possible with the complete set: in informal experiments, we have estimated that human gender classification performance for manually extracted 60 × 60 pixel faces from this database is 96%. We conclude that there is still room for improvement in gender classification methods.

5. REFERENCES

[1] G.W. Cottrell and J. Metcalfe, "EMPATH: face, emotion, and gender recognition using holons," Proceedings of the 1990 Conference on Advances in Neural Information Processing Systems, pp. 564–571, 1990.

[2] B.A. Golomb, D.T. Lawrence, and T.J. Sejnowski, "Sexnet: A neural network identifies sex from human faces," Advances in Neural Information Processing Systems, vol. 3, pp. 572–577, 1991.

[3] H. Abdi, D. Valentin, B. Edelman, and A.J. O'Toole, "More about the difference between men and women: evidence from linear neural network and the principal-component approach," Perception, vol. 24, pp. 539–539, 1995.

[4] R. Brunelli and T. Poggio, "HyperBF networks for gender classification," Image Understanding Workshop, 1992.

[5] B. Moghaddam and M.H. Yang, "Learning gender with support faces," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 707–711, 2002.
[6] Y. Saatci and C. Town, "Cascaded classification of gender and facial expression using active appearance models," Proc. of the 7th Intl. Conf. on Automatic Face and Gesture Recognition, pp. 393–400, 2006.

[7] H. Lian and B. Lu, "Multi-view gender classification using local binary patterns and support vector machines," Lecture Notes in Computer Science, vol. 3972, p. 202, 2006.

[8] A. Jain and J. Huang, "Integrating independent components and linear discriminant analysis for gender classification," Int. Conf. on Automatic Face and Gesture Recognition, pp. 159–163, 2004.

[9] N. Sun, W. Zheng, C. Sun, C. Zou, and L. Zhao, "Gender classification based on boosting local binary pattern," Lecture Notes in Computer Science, vol. 3972, p. 194, 2006.

[10] S. Baluja and H.A. Rowley, "Boosting sex identification performance," International Journal of Computer Vision, vol. 71, no. 1, pp. 111–119, 2007.

[11] B. Wu, H. Ai, and C. Huang, "LUT-based Adaboost for gender classification," Lecture Notes in Computer Science, pp. 104–110, 2003.

[12] P.J. Phillips, H. Wechsler, J. Huang, and P.J. Rauss, "The FERET database and evaluation procedure for face-recognition algorithms," Image and Vision Computing, vol. 16, no. 5, pp. 295–306, 1998.

[13] E. Mäkinen and R. Raisamo, "An experimental comparison of gender classification methods," Pattern Recognition Letters, vol. 29, no. 10, pp. 1544–1556, 2008.

[14] S. Gutta and H. Wechsler, "Gender and ethnic classification of human faces using hybrid classifiers," Proc. International Joint Conference on Neural Networks, vol. 6, 1999.

[15] G. Shakhnarovich, P.A. Viola, and B. Moghaddam, "A unified learning framework for real time face detection and classification," in Proc. Fifth IEEE Intl. Conf. on Automatic Face and Gesture Recognition, 2002, pp. 14–21.

[16] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, 2001.

[17] J. Friedman, T. Hastie, and R. Tibshirani, "Additive logistic regression: a statistical view of boosting," Annals of Statistics, vol. 28, no. 2, pp. 337–407, 2000.