CIHSPS2004 - IEEE International Conference on Computational Intelligence for Homeland Security and Personal Safety Venice, Italy, 21-22 July 2004
Face detection in color images of generic scenes Paola Campadelli, Raffaella Lanzarotti, Giuseppe Lipori Dipartimento di Scienze dell'Informazione Università degli Studi di Milano Via Comelico, 39/41 - 20135 Milano, Italy {campadelli, lanzarotti, lipori}@dsi.unimi.it
Abstract - In this paper we describe a face detection algorithm whose key idea is to roughly determine the skin regions of a 2D color image and then to search for eyes within them. The technique is based on a Support Vector Machine trained to separate sub-images representing eyes from all others. The algorithm can be used in face image database management systems both as a first step of person identification and to discriminate images on the basis of the number of faces they contain. Keywords - Face detection, skin color model, SVM.
I. INTRODUCTION

In this paper we propose an automatic method for face detection in still color images. This task is the first fundamental step for many applications such as face recognition and 3D face reconstruction. Due to the high inter-personal variability (e.g. gender and race), the intra-personal changes (e.g. pose and expression), and the acquisition conditions (e.g. lighting and image resolution), face detection is not a trivial task and is still an open problem.

In the literature (for a complete survey see [24]), we can distinguish four main classes of approaches to the problem.

Knowledge-based methods are built on rules derived from human knowledge of what a face is [22]. The difficulty of this approach lies in defining good rules: if too strict, they may generate many false negatives, while if too general they may give many false positives.

Template matching methods require the definition of several patterns of either the whole face or the facial features separately [6]. This approach has proven inadequate for face detection, since it cannot effectively deal with variations in scale, pose, and shape.

Feature invariant approaches search for facial structural features which are invariant to changes in pose, illumination, and viewpoint. An example of a widely adopted feature is skin color: several works [11], [18], [21], [23] suggest modelling the skin color distribution with a Gaussian mixture model. Such an approach seems reasonable considering that human skin forms a relatively tight cluster in color spaces, even when different races are considered [10]. The main problem of this method is the determination of a probability threshold for deciding which pixels correspond to skin: both a fixed and an adaptive threshold fail, especially in complex scenes and under particularly inhomogeneous illumination.
Recently, a more robust solution was attempted in [9]: a light compensation algorithm is applied before segmenting the image. While this approach works well in most situations, it still fails in challenging ones, where it generally produces a high percentage of false positives. Such a wrong segmentation forces the subsequent steps to deal with more uncertainty and in some cases causes errors.

Appearance-based methods use classifiers trained on samples which should represent the high variability of human face appearance (due, for example, to expression or pose). In this context, numerous works based on Support Vector Machines (SVMs) have been presented in recent years. The paper [13] was the first to propose SVMs as an instrument for searching for faces within images, scanning them exhaustively and at different resolutions. However, such an approach is computationally expensive and not general enough, since SVMs (and face classifiers in general) are very sensitive to pose variations. Many attempts have been made to overcome this limit; in particular, we note the results obtained by Heisele et al. [8], who proved that the performance of component-based approaches clearly exceeds that of global approaches, being more robust to pose, illumination and expression variations.

These considerations have led us to propose a hybrid system (according to the classification presented above): first we search for skin regions within the image (feature invariant method); this step restricts the search area for the subsequent step, an SVM (appearance-based method), which is applied only in correspondence to the skin regions with the objective of discriminating between faces and non-faces. In particular, we trained the SVM to recognize eyes, so that finding an eye within a skin region confirms the presence of a face.

In section II we describe the skin detection method and present the results obtained testing this module.

Work partially supported by the project "Acquisizione e compressione di Range Data e tecniche di modellazione 3D di volti da immagini", COFIN 2003.
In section III we present the validation step based on an SVM classifier and report the experimental results. Finally, in section IV, conclusions and future work are discussed.
II. SKIN DETECTION
The first step of our method consists in localizing all the skin regions within the image. We have chosen to characterize them by their peculiar skin color, and in particular by its chrominance components. The skin color model [7] is a Gaussian mixture model made of two bidimensional Gaussians, each parameterized by (µi, σi²I), which is simple and statistically well justified. The parameters are estimated with the EM algorithm [12], [14]. The skin color samples were gathered under 10 different illuminations and from 8 different people, yielding a set of two million samples. The color space adopted is YCbCr. Once the skin color model M is built, we have to define a criterion that, given an image, decides automatically and on the basis of M which pixels correspond to skin. This allows us to construct the Skin-Map, a binary image where the pixels corresponding to skin are set to 1 and all others to 0.
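As a concrete illustration, evaluating a two-component isotropic Gaussian mixture over the (Cb, Cr) plane and thresholding the resulting probabilities can be sketched as follows. This is a minimal sketch: the means, variances and weights below are illustrative placeholders, not the values the authors estimated with EM, and the retained fraction mirrors the 5% threshold used later on the basin means.

```python
import numpy as np

# Placeholder parameters of the two-component model M in the (Cb, Cr)
# plane; NOT the values estimated by the authors via EM.
means = np.array([[110.0, 150.0], [120.0, 160.0]])   # mu_i
vars_ = np.array([80.0, 150.0])                      # sigma_i^2 (isotropic)
weights = np.array([0.6, 0.4])                       # mixing coefficients

def skin_probability(cbcr):
    """Evaluate the 2-component isotropic Gaussian mixture at each
    (Cb, Cr) pixel. cbcr has shape (..., 2)."""
    p = np.zeros(cbcr.shape[:-1])
    for mu, s2, w in zip(means, vars_, weights):
        d2 = np.sum((cbcr - mu) ** 2, axis=-1)       # squared distance to mu_i
        p += w * np.exp(-d2 / (2.0 * s2)) / (2.0 * np.pi * s2)
    return p

def skin_map(cbcr, keep_fraction=0.05):
    """Binary Skin-Map: set to 1 the pixels whose probability falls in
    the top `keep_fraction` of values."""
    p = skin_probability(cbcr)
    thr = np.quantile(p, 1.0 - keep_fraction)
    return (p >= thr).astype(np.uint8)
```

In practice the authors threshold the probabilities of the basin mean colors rather than of raw pixels, as described in section II-A.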
A. Skin Map determination

The method we propose to determine the Skin-Map in generic color images is based on the watershed technique, which segments the image into catchment basins to be considered as elementary units in the subsequent steps. We have adopted the algorithm proposed by Vincent and Soille [19] since it computes the watershed in linear time; it is based on immersion simulations and proceeds by labelling each basin with a progressive label, starting from the pixels with the darkest gray level. We apply the algorithm to the gradient of the low-pass filtered image, since we have experimentally observed that this extracts basins which describe the objects in the scene well (see footnote 1). To this end, we adopt a Gaussian filter with σ = 1 and the Sobel filter. Once the image is segmented, the algorithm associates to each basin its mean color in the YPbPr color space, and then builds the probability image P, such that for every pixel p, P(p) is the probability of p being a skin color pixel. Afterwards, P is thresholded by setting to 1 the 5% highest values. The obtained binary image is the Skin-Map approximation. As expected, besides face regions the Skin-Map may include other body parts or even other objects whose color is similar to skin.

B. Skin Map refinement

With the objective of further reducing the search area for the subsequent step, we examined the characteristics of the basins corresponding to the regions in the Skin-Map, looking for a discriminant between the basins corresponding to faces and the others. Analyzing 100 images representing generic scenes, we concluded that:
1) face basins generally have labels with high values; this is because faces usually exhibit a high concentration of borders, that is, high values in the edge image; since the watershed algorithm starts labelling from the darkest gray levels, the basins corresponding to faces receive high labels;
2) face basins are small, since the many facial features fragment the face region into many different basins.
We exploit this additional information to eliminate the basins that with high reliability do not correspond to a face. First, we eliminate the basins whose label values are lower than 1% of the maximum label used in the image. Second, we analyze the basin areas with the idea of eliminating the big ones; however, we can simply eliminate neither the biggest basins in the image nor those exceeding an absolute threshold. Thus, for each image we estimate the mean µ and the standard deviation σ of the basin areas, and we eliminate the basins whose areas are greater than (µ + 5σ), under the condition that σ is high with respect to both µ and the image dimension |Img| expressed in pixels, that is, if the following condition is verified:
[σ > (0.2 · µ)] ∧ [σ > (0.01 · |Img|)].
These simple rules significantly reduce the Skin-Map, simplifying the task of the SVM.

Footnote 1: We have also experimented with the method on the gray level image and on the gradient of the original image, but in both cases an over-segmentation is obtained.

C. Experimental results

The skin detection module has been tested on three databases with different characteristics:
• XM2VTS [5]: 1180 high quality images of single faces acquired frontally against a homogeneous background.
• BANCA [1] - Adverse: 2080 images, each representing one person placed frontally to the camera and looking down, against a non-uniform background. Image quality and illumination are not very good.
• DBLAIV: 845 color images collected in our LAIV laboratory [3], representing an arbitrary number of people (even none) against an arbitrary background. The images are completely uncontrolled both regarding the acquisition conditions (very different illuminations, 5 cameras, etc.) and the face characteristics (pose, scale, expression, glasses, partial occlusions, rotations, etc.).
To evaluate the Skin-Map, we state that an error (false negative) occurs when, in correspondence to a face, the algorithm does not catch a region covering at least the triangle including both eyes and the mouth. Conversely, false positives (regions included in the Skin-Map which do not correspond to a face) do not cause the failure of this step. Thus, if the Skin-Map includes all the faces plus possibly other portions of the image, we consider it correct. Of course it makes a big difference whether the Skin-Map corresponds exactly to the faces or also includes other regions; for this reason we qualitatively classify the correct results as follows:
Excellent results: the Skin-Map includes no false positives.
Good results: the false positives are either disjoint from the skin regions corresponding to the face, or joint but not exceeding the face area in dimension.
Poor results: the Skin-Map includes both a face and false positives, and the face area is smaller than that of the false positive part.
In table I the obtained results are reported.
TABLE I: Skin Detection results

Database        | False Negatives | Correct: Poor | Good   | Excellent | Total
XM2VTS          | 4%              | -             | 9.1%   | 86.9%     | 96%
BANCA Adverse   | 14.6%           | -             | 12%    | 73.4%     | 85.4%
DBLAIV          | 5.43%           | 10.7%         | 67.5%  | 16.4%     | 94.6%
The best results occur with simple images (XM2VTS database), although both BANCA-Adverse and DBLAIV also show high detection rates. The main causes of error are the skin color model (which does not match all possible skin colors) and the elimination of big regions from the Skin-Map (which in some cases leads to deleting face regions).
Finally, we are interested in comparing our algorithm with the one presented by Hsu et al. [9]. Unfortunately a direct comparison can hardly be made, since we do not know which images of the cited database [2] were used by the authors for their tests. To get an idea of the relative performance, we implemented their algorithm from the description given in their paper. Testing both algorithms on a set of 100 images of the DBLAIV database, we found that ours is more selective in skin identification [Fig. 1], making the subsequent validation step easier. Regarding computational cost, the solution proposed in [9] is more efficient than ours.
Fig. 1: Some results. First line: original images; second line: Skin-Maps obtained with our implementation of the algorithm described in [9]; third line: Skin-Maps obtained with our watershed algorithm.

III. FACE DETECTION WITH SVM

Given the Skin-Map of an image, we conceive a validation criterion to determine, for each connected component R of the Skin-Map, whether it constitutes a face or not. To do so, we search within it for (at least) one eye; in case of a positive response we say we have detected and localized a face, otherwise we discard the region R as non-face. We intend to design a face detector that behaves correctly for Skin-Maps that are at worst good (according to the definition in section II). We base the validation step mainly upon the output of a statistical classifier, without relying on any strong geometric knowledge of what a face is, because we mean to treat the wide set of critical situations that can arise:
• significant over- or underestimation of Skin-Map regions;
• presence of a single eye (face rotated so much that only one eye is visible, an occluding object in front of the other eye, partial Skin-Map);
• faces tilted up to |60°| (see footnote 2);
• head significantly bent down;
• closed eyes, subject looking away from the camera;
• subject wearing transparent glasses;
• subject manifesting a non-neutral expression (big smile, open mouth, etc.).
We present in section III-A the construction of the SVM classifier; in section III-B we give a brief description of the validation technique; finally, in section III-C we summarize the results of our face detector.

A. Training the classifier

In order to build an SVM able to recognize eyes in a wide range of situations, we considered 3 different image databases for the construction of the training and test sets:
1) FERET [4] (1330 images): due to its variety and good quality (see footnote 3), it is suitable to model the general eye pattern (eyes belonging to vertical faces, both frontal and rotated, possibly framed by glasses);
2) BANCA-Adverse (207 images): useful to model the class of closed eyes;
3) DBLAIV (478 images): it brings into our classifier some knowledge about real world pictures. For instance, it allows modelling eyes taken from tilted faces (see footnote 4) and it enriches the class of negative examples thanks to the high complexity and variety of its backgrounds.

It must be noticed that the FERET database contains only gray scale pictures. In fact, from now on we discard the color information (useful only in the skin detection step) and restrict our attention to the spatial information. For each face we automatically extracted (see footnote 5) the 2 eyes (labelled as positive examples), 6 non-eye components chosen randomly out of 11 [Fig. 2], plus 4 random examples from the background (or, generally speaking, from the complement of the eyes' bounding box). These latter 10 examples are labelled as negative. The dimensions of the window used for extraction are related to the inter-eye distance and to an estimate of the pose angle of the subject. The whole sample was then split into training and test sets with proportions 2/3 and 1/3 respectively. This procedure gave rise to sets of cardinality 17801 and 8899.

Footnote 2: As specified in section III-A, the SVM has some knowledge of examples extracted from faces tilted up to |60°|.
Footnote 3: Each image of the FERET represents one person on a uniform background, with the pose angle varying over the range −60° to +60° in nine discrete steps. Image specifications: 650 close portraits, 120 of reduced size, 200 dark pictures taken under poor illumination, 360 portraits of people wearing glasses.
Footnote 4: This subset of the DBLAIV contains faces whose tilt angle (defined as the angle between the vertical axis of the head and the horizontal plane) varies continuously from −60° to 60°, with mean around 0° and standard deviation 15°.
Footnote 5: For each face we dispose of the coordinates of the eyes' centers, the nose tip and the mouth center.
Fig. 2: Facial components: 2 positive and 11 negative examples
In order to make the examples suitable for training, they have been contrast stretched and pyramidally downsampled to the size of 20 × 15 pixels. This dimension was experimentally found as a trade-off between the need to keep the computational cost low and to retain sufficient detail to learn from. The corresponding 300-dimensional vectors (of values ranging from 0 to 255) are fed to a C-SVM with Gaussian kernel, with parameters γ = 10⁻³ and C = 10. These values too were chosen after several trials as a trade-off between error reduction and generalization ability. By doing so, we obtained an error of 2.2% on the test set (see footnote 6).
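The preprocessing chain (contrast stretching followed by pyramidal downsampling to 20 × 15, yielding a 300-dimensional vector) might be sketched as below. This is a minimal sketch assuming patches are extracted at a 4:3 aspect ratio whose sides halve exactly down to 20 × 15; it does not reproduce the authors' exact implementation.

```python
import numpy as np

def contrast_stretch(patch):
    """Linearly map the gray values of the patch onto the full [0, 255] range."""
    lo, hi = float(patch.min()), float(patch.max())
    if hi == lo:
        return np.zeros_like(patch, dtype=np.float64)
    return (patch.astype(np.float64) - lo) * 255.0 / (hi - lo)

def halve(patch):
    """One pyramid level: 2x2 block averaging (assumes even side lengths)."""
    return 0.25 * (patch[0::2, 0::2] + patch[1::2, 0::2]
                   + patch[0::2, 1::2] + patch[1::2, 1::2])

def to_feature_vector(patch, target=(15, 20)):
    """Contrast stretch, downsample pyramidally to 15 rows x 20 columns,
    and flatten into the 300-dimensional vector fed to the C-SVM."""
    p = contrast_stretch(patch)
    while p.shape[0] > target[0]:
        p = halve(p)
    assert p.shape == target  # assumes the extraction size reduces exactly
    return p.reshape(-1)      # values in [0, 255]

# The resulting vectors would then train a Gaussian-kernel C-SVM with
# gamma = 1e-3 and C = 10, e.g. sklearn.svm.SVC(kernel="rbf", gamma=1e-3, C=10).
```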
B. Validation technique

The validation of the Skin-Map poses two major problems. Firstly, it is necessary to reduce the number of points to consider for classification while not excluding eye centers. Secondly, the absence of any assumption on the scale of faces forces the search for eyes over a range of possible dimensions. Both questions have implications for the computational cost and for the accuracy of the technique. Concerning the first issue, in past work [7] we tried to exploit the fact that eye regions show a high blue component and a low red one; unfortunately that approach did not lead to significant results. Therefore we make very few assumptions about the searchable area: we restrict the scan to edges within the Skin-Map, following the observation that eyes usually lie on strong edges. In particular, we limit the maximum number of points to visit in each region R by adaptively thresholding the edge image E_R. The detail with which a region is scanned depends on its size, through the definition of the "radius"

r = √(A/π), where A is the area of R in pixels.

Footnote 6: We measured an error of 1.2% on the training set (the C-SVM allows errors also on the training set, according to the soft margin formulation).
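Under these definitions, selecting candidates as the edge points of a region that fall on a grid with spacing s = r/k can be sketched as follows (function and variable names are illustrative):

```python
import numpy as np

def candidate_points(region_mask, edge_mask, k=15):
    """Candidate eye positions for one connected Skin-Map region R.

    region_mask, edge_mask: boolean arrays of the same shape.
    Returns (candidates, s), where s = r/k is the scan step and
    r = sqrt(A / pi) is the "radius" of R, A being its area in pixels.
    """
    A = int(region_mask.sum())
    r = np.sqrt(A / np.pi)
    s = max(1, int(round(r / k)))
    # G: intersections of a grid of lines spaced s pixels apart.
    grid = np.zeros_like(region_mask, dtype=bool)
    grid[::s, ::s] = True
    # Candidates are edge pixels of R that fall on grid intersections.
    cand = region_mask & edge_mask & grid
    return np.argwhere(cand), s
```

Because s grows with the region radius, the number of candidates stays roughly constant across regions of different area, as the text argues.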
Fig. 3: Candidate extraction from the Skin-Map. Upper left corner: the original image; upper right corner: its Skin-Map; lower left corner: edge image; lower right corner: candidates.
In fact the candidates p are exactly the points [Fig. 3] in E_R ∩ G, where G is the set of intersections of a grid of lines spaced according to the scan step s = r/k (k being a constant that regulates the scan detail). In this way we expect to consider approximately the same number of candidates independently of the region area. As we will see, s works as the unit of distance between candidates of region R. Regarding the second issue, in order to evaluate a candidate point we need to extract an example at the correct scale. In other words, we expect the SVM response to be correct on eyes only if we feed it an example in which the eye dimension fits the model defined by the class of positive examples. We know by construction that the training examples were extracted with a size based on the inter-eye distance. On the other hand, during validation the only information available is the shape and size of R. Experimentally we see that if the Skin-Map of a face is excellent, then the ratio of the inter-eye distance to the radius r is about 0.7, so the natural choice for candidate extraction is the size (in pixels) d = 0.7 × r. However, complex pictures seldom have very well defined Skin-Maps. We must account for possible over- or underestimation of the skin regions, which means considering different possible dimensions for the eyes (hypothetically) present in the region. For simplicity we extract only two additional examples, of sizes 0.8 × d and 1.2 × d, besides the optimal size d. Let us call x_p, x_p⁻ and x_p⁺ the examples corresponding to the same candidate p at the 3 different scales; we evaluate the strength of p by summing the margins of each example according to the
classifier. We define the function

f(p) = SVM(x_p) + SVM(x_p⁻) + SVM(x_p⁺)
Fig. 4: BANCA controlled: FA rate vs. FR rate

where SVM(x) = 0 is the equation defining the optimal separating hyperplane. Since the three scales are not very different, we usually observe a good correlation among the margins on positive examples; the definition of f is useful only to prevent the exclusion of a good candidate due to a wrong Skin-Map estimate and, simultaneously, to weaken the strength of a pattern that looks similar to an eye only at a certain scale. Unfortunately the classification skills of the SVM are not sufficient to accept as eyes all candidates p with a positive f(p) and discard all the others. Moreover, we know a priori neither how many candidates lie near the eye centers, nor what the response of the SVM on them will be. Consequently it is necessary to determine a threshold separating positives from false positives, and a heuristic to group together all positives relative to the same eye. Our answer to the problem requires the introduction of two absolute threshold values of opposite sign. First, we make a quick scan of the candidates at the optimal scale d only, and discard all p̃ s.t. SVM(x_p̃) < θ1 < 0. Second, we operate another scan in which we select all p ≠ p̃ s.t. f(p) ≥ θ2 > 0, and aggregate them according to their mutual distance. The idea is to cluster all strong candidates closer to each other than a certain multiple m of s, and for each cluster C we calculate its centroid c. We then refine the position of c by adding the contribution of those candidates p s.t. θ2 > f(p) ≥ θ2/2 (candidates which are less strong but still significant). Finally, we say that each point c represents the center of an eye. The technique presented so far depends critically on several parameters: d, k, m, θ1 and θ2. We therefore used a dedicated database to make an optimal choice for them: we selected 175 images from BANCA-Controlled (good quality frontal portraits over a uniform background) such that the Skin-Map estimation for each image was at worst good. We then exploited this ideal condition to run different experiments, after which we set k = 15, m = 3, θ1 = −1.5 and d = 0.7 × r (as already mentioned). Most important is an appropriate choice of θ2, which regulates the discrimination between positives and false positives and consequently deserves a deeper analysis. If we measure the false acceptance (FA) and false rejection (FR) rates as the percentages of accepted negatives (non-faces) and rejected faces respectively, we can plot their variation over a range of thresholds. In figures 4 and 5 we show the trend of such a procedure (varying θ2 in [0.5, 5]) for two different sets of images: the already mentioned BANCA-Controlled and a subset (148 images) of the much more complex DBLAIV. In both cases we found the value θ2 = 2.5 to lie below the intercept between the FA and FR curves, and we chose it according to a conservative attitude. The graphs already reveal a certain difficulty of the SVM in separating the class of faces from that of non-faces.
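The two-stage evaluation of a candidate (quick rejection at the optimal scale, then the three-scale score f) can be sketched as follows. Here `svm_margin` and `extract` stand in for the trained classifier's decision value and for patch extraction at a given size; both names are assumptions of this sketch, not part of the authors' code.

```python
def score_candidate(svm_margin, extract, p, d, theta1=-1.5, theta2=2.5):
    """Two-stage evaluation of a candidate point p.

    svm_margin: function returning the signed distance of an example
    from the separating hyperplane (SVM(x) = 0 on the hyperplane).
    extract: function (p, size) -> example patch at that scale.
    """
    # Quick rejection at the optimal scale d alone.
    if svm_margin(extract(p, d)) < theta1:
        return None  # discarded: p~ such that SVM(x) < theta1 < 0
    # Full score: sum of margins at the three scales 0.8d, d, 1.2d.
    f = sum(svm_margin(extract(p, round(c * d))) for c in (0.8, 1.0, 1.2))
    if f >= theta2:
        return ("strong", f)       # kept as a strong candidate
    if f >= theta2 / 2.0:
        return ("weak", f)         # still used to refine cluster centroids
    return None
```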
We discuss this problem more completely in section IV.
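The grouping of strong candidates into eye-center estimates can be sketched as below. The paper does not specify the exact aggregation procedure, so this greedy single-linkage grouping (merge points closer than m × s to an existing cluster, then report centroids) is an assumption.

```python
import numpy as np

def cluster_candidates(points, s, m=3):
    """Group strong candidates whose mutual distance is at most m*s,
    returning each cluster's centroid as an eye-center estimate."""
    clusters = []
    for p in map(np.asarray, points):
        placed = False
        for c in clusters:
            # Single-linkage: join the first cluster with a close member.
            if any(np.linalg.norm(p - q) <= m * s for q in c):
                c.append(p)
                placed = True
                break
        if not placed:
            clusters.append([p])
    return [tuple(np.mean(c, axis=0)) for c in clusters]
```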
Fig. 5: DBLAIV: FA rate vs. FR rate

C. Experimental results

Since the design of the face detector avoids any ad hoc consideration with respect to any particular database, we need to evaluate its performance over different kinds of image collections. First of all, all the sets of images mentioned in the previous sections (for purposes of training and parameter tuning) are disjoint from the sets built for the face detection experiments. For the latter, we constructed 4 distinct sets:
1) XM2VTS/1 (748 images): approximately 187 subjects NOT wearing glasses, photographed (frontally) in 4 different sessions;
2) XM2VTS/2 (406 images): about 102 subjects wearing glasses, photographed (frontally) in 4 different sessions;
3) GLAIV (149 images): 3 galleries (frontal, left-rotated, right-rotated) of 45 subjects photographed on a uniform background;
4) DBLAIV (220 images).
Table II summarizes the findings on each data set. A detected point is said to be a positive if it is closer to an eye center than m × s pixels (as defined in section III-B). By the term erroneous positive we mean a true face detected erroneously because of a false positive eye. It is interesting to note that our validation step produces a great number of false positives regarding eyes, but they cause a relatively small number of erroneous positives regarding faces. This happens because false positive eyes occur most often in combination with correctly detected true eyes. However, this poses some problems in view of a possible use of the face detection results for authentication purposes. In the next section we further discuss the pros and cons
of our face detector.

TABLE II: Face detection results. [The full table layout was not recoverable from the source. Per test set it reports, for the eyes, the number of positives, the positives rate and the false positives, and, for the faces, the positives, the negatives, the erroneous positives rate, the false positives rate and the detection rate. Among the clearly attributable figures: XM2VTS/1 yields 1496 eye positives at an 82.1% positive rate with 372 false positives and an 89.6% face detection rate; XM2VTS/2 yields 812 eye positives at a 62.2% positive rate with 190 false positives.]
IV. CONCLUSIONS

In this paper we have presented a two-step method which automatically detects faces in still color images of generic scenes containing an arbitrary number of people (even none). First, a skin detection module discards the regions whose color has very low probability of corresponding to skin; this step avoids an exhaustive search with the classifier, reducing the computational cost and the number of possible false positives (on average the Skin-Map covers 1/4 of the image area). Afterwards, a validation step based on an SVM classifier discriminates between faces and non-faces among the regions maintained by the previous step; the classifier is trained to recognize eyes, so that detecting at least one eye within a skin region determines the presence of a face. We have chosen to search for eyes instead of whole faces since component-based methods have been shown to be more robust to pose, expression and illumination variations. We only assume that skin color is not strongly altered by artificial lights and that at least one eye per face is visible. Consequently we propose a very general method, robust to partial occlusions and to changes in pose, expression and scale. Unlike many scale-independent methods, which scan the image several times in a computationally expensive fashion, we limit our search to three different scales and to a small subset of points, exploiting the information given by the Skin-Map. The computation of a Skin-Map takes on average 40 seconds on 800 × 600 images. We visited 469 points per region, corresponding to approximately 0.91% of the Skin-Map area and 0.23% of the total image area. Each point has a computational cost of about 50 ms, which is fairly constant over the different databases and depends on the processing system (see footnote 7).
A direct comparison with other face detection systems [15], [17], [20] cannot be made, since their goal is to detect only frontal faces and they avoid the treatment of occlusions. Moreover, they report results obtained on databases of gray scale images, which we cannot process.
Footnote 7: We carried out our experiments on a Pentium 4 with a 3.2 GHz clock. Our Java code was interpreted by a Sun Java Virtual Machine and makes use of external calls to the SVM package mySVM [16], thus spending much of its time doing I/O.
Our method has been shown to detect most of the faces even in complex images (DBLAIV), while the remaining open problem is the high number of false positives; in order to improve the system performance we can intervene on the two main steps separately. Regarding the skin detector, we are gathering more skin color samples to improve our model and make it more reliable. Concerning the validator, we intend to pursue three strategies, each driven by a different explanation of the cause of errors. A first idea is to characterize the examples with a conveniently small number of highly significant features, in order to investigate whether in this different space the SVM can better discriminate 'eyes' from 'non-eyes'. A second possibility is to train several SVMs, each specialized in a well determined eye appearance (left and right, closed, with glasses, etc.), thus simplifying the task of each SVM. Finally, we believe an improvement could be achieved by combining several detectors, for instance one for the eyes and one for the mouth, merging the judgements of the different SVMs.
REFERENCES
[1] The BANCA database. http://www.ee.surrey.ac.uk/Research/VSSP/banca/.
[2] The Champion database. http://www.libfind.unl.edu/alumni/events/breackfast-for-champions.htm.
[3] Laboratory of Analysis of Images and Vision (LAIV). http://homes.dsi.unimi.it/~campadel/LAIV/.
[4] The FERET database. http://www.itl.nist.gov/iad/humanid/feret/, 2001.
[5] The XM2VTS database. http://www.ee.surrey.ac.uk/Research/VSSP/xm2vtsdb/, 2001.
[6] R. Brunelli and T. Poggio. Face recognition: features versus templates. IEEE Transactions on PAMI, 15(10):1042–1062, 1993.
[7] E. Casiraghi, R. Lanzarotti, and G. Lipori. A face detection system based on color and support vector machines. Workshop Italiano Reti Neurali (WIRN 2003), Vietri sul Mare (SA), in press, 2003.
[8] B. Heisele, P. Ho, and T. Poggio. Face recognition with support vector machines: global versus component-based approach. Proceedings of the IEEE International Conference on Computer Vision (ICCV 2001), pages 688–694, 2001.
[9] R. Hsu, M. Abdel-Mottaleb, and A.K. Jain. Face detection in color images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5):696–706, May 2002.
[10] M. Hunke and A. Waibel. Face locating and tracking for human-computer interaction. 28th Asilomar Conference on Signals, Systems and Computers, California, 1994.
[11] S. McKenna, S. Gong, and Y. Raja. Face recognition in dynamic scenes. British Machine Vision Conference (BMVC97), Essex, 1997.
[12] G.J. McLachlan and T. Krishnan. The EM Algorithm and Extensions. John Wiley & Sons, 1996.
[13] E. Osuna, R. Freund, and F. Girosi. Training support vector machines: an application to face detection. Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR'97), 1997.
[14] R.A. Redner and H.F. Walker. Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26:195–239, 1984.
[15] H. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):23–38, 1998.
[16] Stefan Rüping. mySVM package implementation. http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM/.
[17] K. Sung and T. Poggio. Example-based learning for view-based human face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):39–51, 1998.
[18] J.C. Terrillon, M.N. Shirazi, H. Fukamachi, and S. Akamatsu. Comparative performance of different skin chrominance models and chrominance spaces for the automatic detection of human faces in color images. Proceedings of the IEEE International Conference on Face and Gesture Recognition, pages 54–61, 2000.
[19] L. Vincent and P. Soille. Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(6):583–598, 1991.
[20] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001.
[21] H. Wu, Q. Chen, and M. Yachida. Face detection from color images using a fuzzy pattern matching method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21:557–563, 1999.
[22] G. Yang and T.S. Huang. Human face detection in complex background. Pattern Recognition, 27(1):53.
[23] M. Yang and N. Ahuja. Gaussian mixture model for human skin color and its applications in image and video databases. SPIE Proceedings, Storage and Retrieval for Image and Video Databases VII, San Jose, CA, USA, pages 458–466, 1999.
[24] M.H. Yang, D. Kriegman, and N. Ahuja. Detecting faces in images: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(1):34–58, 2002.