FACE DETECTION TECHNIQUES: THEORY AND APPLICATIONS

Karel Horak
Brno University of Technology, Department of Control and Instrumentation
Kolejni 4, Brno, 612 00, Czech Republic
[email protected]

Abstract: Detection of human faces plays a basic role in many applications of computer vision such as video surveillance, image retrieval, face recognition etc. This paper presents a general framework for face detection in both color and monochromatic images. Several basic methods of color transformation are introduced and discussed, as well as advanced techniques such as the boosted cascade of simple features and face-symmetry detection. The presented techniques are based on fundamentally different principles, so each method is suitable only for certain applications.

Keywords: skin detection, face detection, color transformation, Haar-like features, symmetry features

1 Introduction

The contemporary evolution of information and computer technologies allows the design of advanced computer vision systems beyond the well-known traditional inspection tasks. Humans very often play a major role in a wide variety of these advanced applications, such as video surveillance, face database management, human-computer interaction, face recognition, automotive monitoring systems etc. Face detection is most often the first processing step in such automated systems for facial image analysis [2]. Many researchers try to handle the major challenges in face analysis, such as different skin tones (ethnicity), head orientation, face distance, complex backgrounds and detection under changing lighting conditions [22]. Some authors pay attention only to the general skin-detection problem, i.e. detecting all objects whose color falls within a given gamut [5, 15, 16]. Other authors deal with face detection techniques that separate true face regions from the rest [2, 6, 20]. Several authors also tackle the face recognition problem, which aims to establish the identity of a person. Finally, considerable effort is spent on systems for recognizing facial expressions.

Image processing approaches for face detection are usually classified as either feature-based or image-based [2]. All low-level analysis methods, such as edge detection, gray-level or color analysis and others, belong to the class of feature-based approaches [4, 8, 12]. On the contrary, linear subspace methods, neural networks and statistical analysis methods form the latter image-based class.

2 Color-based Methods

In terms of image acquisition, the standard RGB color space is the one most often used by both end-users and industrial cameras. Besides the RGB color space, many other color spaces are available for image processing tasks (CMY, HSV, HLS, YUV, YCbCr, L*a*b, CIE-xyz, IRgBy etc.), but unfortunately only some of them are suitable for skin and face detection applications. The suitable color spaces separate the brightness component from the remaining two color components, as in the case of IRgBy, YCbCr and others. To obtain a new color space representation, it is necessary to perform a so-called color conversion, which is a linear or non-linear transformation of one triplet into another. The main advantage of these color transformations is greater independence from changes in illumination. It has been shown many times that strong changes in scene illumination significantly affect only the brightness component of the color space (I and Y), while the remaining color components stay relatively stable. It follows that the Rg/By and Cb/Cr pairs mentioned above can easily be used for color-based segmentation in the Rg-By and Cb-Cr planes respectively.

On the other hand, some authors claim that the RGB color space performs identically to the intuitive color spaces mentioned above in skin detection and face recognition tasks. For example, [16] introduces a novel and very simple method for skin color detection based on the mere difference between the R and G components of the RGB color space. Such simple arithmetic on pure RGB components is obviously rapid and easy to implement, but unfortunately also unreliable, particularly under illumination changes, shadows and other circumstances. For these reasons, only intuitive color spaces with separated brightness and color components are considered in the following sections.

2.1 Segmentation in IRgBy color space

Conversion from the native RGB representation to the IRgBy color space is given by the following three equations (1). Each of these equations represents a non-linear transformation of the original triplet components R, G and B into the new brightness component I and the two color components Rg and By.

I = (f(R) + f(G) + f(B)) / 3
Rg = f(R) - f(G)
By = f(B) - (f(R) + f(G)) / 2        (1)

where

f(x) = B_MAX · log(x + 1) / log(B_MAX)

The component Rg is the difference between the red and green channels, and the component By is the difference between the blue channel and the average of the red and green channels. The non-linearity of this transformation is caused by the logarithmic function f(x), where the symbol B_MAX denotes the maximal pixel value, i.e. 256 for the most common images with 8 bits of depth per channel. As can be seen in the following figure, skin regions are highlighted chiefly in the Rg component (see Figure 1c). Moreover, combining the Rg and By components achieves even better results than using the Rg component alone. A suitable combination is the square of the arctangent of Rg/By, usually called hue (by analogy with the intuitive color models); it is shown in Figure 1e.
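For illustration, equation (1) and the hue combination are straightforward to vectorize. The following Python sketch assumes a NumPy array holding an 8-bit RGB image; the function names are my own, and arctan2 is used instead of a plain division Rg/By to avoid a singularity where By is zero.

```python
import numpy as np

B_MAX = 256.0  # maximal pixel value for 8-bit-per-channel images, see equation (1)

def f(x):
    # Logarithmic mapping f(x) = B_MAX * log(x + 1) / log(B_MAX)
    return B_MAX * np.log(x + 1.0) / np.log(B_MAX)

def rgb_to_irgby(rgb):
    """Convert an H x W x 3 RGB image into the I, Rg and By components of equation (1)."""
    r = f(rgb[..., 0].astype(np.float64))
    g = f(rgb[..., 1].astype(np.float64))
    b = f(rgb[..., 2].astype(np.float64))
    i = (r + g + b) / 3.0
    rg = r - g
    by = b - (r + g) / 2.0
    return i, rg, by

def hue_map(rg, by):
    # Square of the arctangent of Rg/By; arctan2 handles By = 0 gracefully.
    return np.arctan2(rg, by) ** 2
```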

Fig. 1. Skin detection: (a) original color image, (b) I-component, (c) Rg-component, (d) By-component, (e) hue

The hue image clearly serves as a skin-map for subsequent image processing and image understanding. Unfortunately, the skin-map contains both face and non-face regions, so in the next step the true face regions must be unambiguously separated from the non-face regions. This can be carried out by a number of methods using knowledge of the size, shape, position, motion or unique features of a face region [2].

2.2 Segmentation in YCbCr color space

Properly speaking, the notation YCbCr stands for a way of encoding an RGB signal rather than an absolute color space; nevertheless, for our purposes of skin detection it can be treated as a standard color space that can be converted to and analyzed. Conversion between the standard RGB color space and the YCbCr representation is technically very similar to the previous IRgBy transformation. Each of the three resulting components Y, Cb and Cr is given by a linear function of the three input variables R, G and B with different coefficients. The golfer from Figure 1 represented in YCbCr space is shown in Figure 2, where the binary and probabilistic skin-maps are shown on the right.

Fig. 2. Skin detection: (a) Y-component, (b) Cb-component, (c) Cr-component, (d) binary and (e) probabilistic output

Rather than thresholding the color components Cb/Cr or a combination of them, a general skin color model should be established to detect skin color and subsequently face regions. Many nearly universal models in YCbCr have been suggested and tested in the past [2, 15]; however, a single-purpose color model tends to be the best solution for a particular task, because of skin tone dissimilarity, application purposes, lighting conditions etc. A color model of skin tones can simply be generated from any available true-skin sample images. A collage of skin images is depicted in Figure 3, together with the pixels' distribution in the YCbCr color space and the binary and probabilistic models. The binary and probabilistic models depicted in Figure 3 (c) and (d) are based on the projection of skin pixels onto the CbCr plane. Dichotomous classification of a new, unknown pixel into the skin class is then accomplished by evaluating the model response at the pixel's Cb and Cr coordinates. The results of such classification for both the binary (black-and-white skin-map) and probabilistic (multilevel skin-map) models are depicted in Figure 2 (d) and (e) respectively. These binary and probabilistic skin-maps can be used in the next processing step to compute an eye-map or mouth-map [5].
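As a rough illustration of this modeling step, the sketch below builds a probabilistic CbCr histogram from a set of skin sample images and projects a new image onto it. It is not the paper's exact model: OpenCV is used only for the color conversion (note that OpenCV stores the channels in Y, Cr, Cb order), and the number of histogram bins and the binarization threshold are illustrative assumptions.

```python
import numpy as np
import cv2

def build_skin_model(skin_bgr_samples, bins=32):
    """Probabilistic skin model: a 2-D histogram over the CbCr plane, normalized to [0, 1]."""
    step = 256 // bins
    hist = np.zeros((bins, bins), dtype=np.float64)
    for img in skin_bgr_samples:
        ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)  # OpenCV channel order: Y, Cr, Cb
        cb = ycrcb[..., 2].ravel() // step
        cr = ycrcb[..., 1].ravel() // step
        np.add.at(hist, (cb, cr), 1.0)                  # accumulate skin pixels per bin
    return hist / hist.max()

def skin_maps(img_bgr, model, bins=32, threshold=0.1):
    """Project every pixel onto the model; returns the probabilistic and binary skin-maps."""
    step = 256 // bins
    ycrcb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb)
    prob = model[ycrcb[..., 2] // step, ycrcb[..., 1] // step]
    return prob, (prob > threshold).astype(np.uint8) * 255
```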

Fig. 3. Skin model: (a) skin samples, (b) samples in YCbCr, (c) binary model, (d) probabilistic model

3 Feature-based Methods

Feature-based techniques represent a completely different approach to face detection in the realm of computer vision. These techniques are based on the evaluation and classification of significant features belonging to objects in the analyzed image. Such features can be extracted from both monochrome and color images; however, monochromatic images are used more often because color information is not always relevant and stable [2, 14].

3.1 Haar-like features

The theory of Haar-like features builds on the Haar basis functions, associated with the Haar wavelet basis, which were originally applied to general object detection by Papageorgiou et al. [13]. The Haar-like features are well known thanks to a framework for the detection of general rigid objects first introduced by Viola and Jones in [17], with a later real-time modification in [18]. This framework is capable of processing images extremely rapidly while preserving high detection rates, because the algorithm classifies only simple features. The framework (not only for face detection) employing the Haar-like features is composed of three basic components: feature evaluation, the learning algorithm, and the architecture of the classification cascade.

As the first step, a basic set of Haar-like features is used to extract relevant information from an image [17]. This basic set contains only four rectangle features, as shown in Figure 4. In addition to this basic set, an extended set of similar Haar-like features was introduced by Lienhart in [9]. In comparison with the four features in the basic set, the extended set contains fourteen features overall, divided into three groups of edge, line and center-surround features. This extended set of rotated features improves face detection with an approximately 10 % lower false alarm rate.

Fig. 4. Basic set of the Haar-like features

The value of any given feature is simply calculated as the sum of the pixels within the white rectangles subtracted from the sum of the pixels within the dark rectangles. The value of a selected feature is enumerated for all admissible positions in the input image (sub-windows) and for the chosen Haar-like features from the basic or extended set depicted in Figure 4. The so-called two-rectangle features in Figure 4 (a) and (b) detect vertically and horizontally adjacent regions of different brightness. The three-rectangle feature (c) clearly serves as a detector of ridge-shaped or valley-shaped regions. Finally, the four-rectangle feature (d) computes the difference between diagonal pairs of rectangles. Because a mechanical pixel-by-pixel calculation of the features is computationally expensive, an intermediate image representation called the integral image is commonly used [21]. The value of the integral image at a location (x, y) is defined as the sum of all pixel values above and to the left of the coordinates (x, y), inclusive. The integral image denoted Iint of a grayscale input image I is given by equation (2).

Iint(x, y) = Σ_{i ≤ x, j ≤ y} I(i, j)        (2)

A significant advantage of the integral image Iint is that it can be computed in a single pass through the original input image. The following two equations (3) define this quick one-pass computation of any integral image.

Iint(x, y) = Iint(x - 1, y) + s(x, y)
s(x, y) = s(x, y - 1) + I(x, y)        (3)

The symbol s(x, y) denotes the intermediate cumulative sum of all pixels in column x with a vertical coordinate less than or equal to y, with the boundary conditions s(x, -1) = 0 and Iint(-1, y) = 0. As mentioned above, the integral image Iint refers to the sum of all pixels above and to the left of the coordinates x and y. The value of the integral image at coordinates (x, y) can therefore be graphically illustrated as depicted in Figure 5 (a).

Fig. 5. Integral image: (a) value of Iint(x, y), (b) quick rectangle evaluation, (c) second Haar-like feature on an image

The Haar-like features of the basic set depicted in Figure 4 can be constructed very rapidly by means of the integral image; see the example in Figure 5 (b). The sum of pixels within the rectangle D can be computed from only four values: the sum within the gray rectangle equals (4)-(2)-(3)+(1), where e.g. (4) refers to the value of the integral image at the coordinates marked with the symbol 4. The integral image therefore allows computation of the Haar-like features at any scale and any location in constant time, because only three additions are required for a rectangle of arbitrary size. The basic and extended sets of Haar-like features described above can be used, for example, in a driver inattention monitoring system, as shown in Figure 5 (c).
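As a concrete illustration of equations (2) and (3) and the four-value rectangle trick, the following Python sketch (assuming NumPy; the function names are my own) computes the integral image and evaluates a horizontal two-rectangle feature.

```python
import numpy as np

def integral_image(img):
    """Integral image of equation (2): cumulative sums over rows and columns."""
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(iint, top, left, bottom, right):
    """Sum of pixels in the inclusive rectangle via four lookups, (4)-(2)-(3)+(1)."""
    total = iint[bottom, right]
    if top > 0:
        total -= iint[top - 1, right]
    if left > 0:
        total -= iint[bottom, left - 1]
    if top > 0 and left > 0:
        total += iint[top - 1, left - 1]
    return total

def two_rect_feature(iint, top, left, height, width):
    """Horizontal two-rectangle Haar-like feature: white (left) half minus dark (right) half."""
    mid = left + width // 2
    white = rect_sum(iint, top, left, top + height - 1, mid - 1)
    dark = rect_sum(iint, top, mid, top + height - 1, left + width - 1)
    return white - dark
```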

After the feature evaluation, the learning of the classifier cascade and the classification step have to be performed [17]. The cascade of nodes is arranged as shown in Figure 6. Note that each node represents one strong classifier, and each strong classifier in turn combines several weak classifiers. The design of the strong classifiers employs a variant of the AdaBoost learning algorithm for aggressive feature selection and classifier training. To reduce computational costs, the strong classifiers are arranged in a linear cascade in order of increasing complexity.

Fig. 6. The cascade of N nodes of boosted classifiers

When all required features of a sub-window are computed, the first strong classifier evaluates them and makes a decision. The first classifier is very rough, with an extremely high detection rate and almost zero rejection; its function is to discard only the absolutely non-face candidates. As the sub-window proceeds through the cascade, more and more complex classifiers evaluate its features. Only a sub-window containing a true face passes all nodes and is finally classified as a face region. The cascade produces an extremely small number of false positives, because each node has a very high detection rate (e.g. more than 99 %) and rejects a moderate fraction of the non-face candidates it sees (e.g. less than 40 %), and these rejections accumulate node by node until almost all non-face sub-windows are discarded. Thus a non-face region can still be classified as a face, but only very rarely. At the same time, the cascade architecture is designed to suppress false negatives so that virtually no true-face region is omitted.
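The paper describes the Viola-Jones framework in general rather than a particular implementation. For practical experiments, a detector with exactly this cascade architecture ships with OpenCV; the snippet below is a usage sketch, not the author's code, and the input file name and detectMultiScale parameters are illustrative.

```python
import cv2

# Load a pre-trained boosted cascade of Haar-like features bundled with OpenCV.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

img = cv2.imread("input.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Scan sub-windows at multiple scales; each candidate must pass every
# node of the cascade to be reported as a face.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces.jpg", img)
```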

3.2 Symmetric features

The theory of detection and localization of points of interest has been well known in image processing for many years; the best-known detectors are the simple Moravec operator and the Harris detector, used among others for face detection tasks. The symmetric-features approach employs the intuitive idea that a human face is vertically symmetric, so points of interest on a face are arranged in strictly symmetric constellations [10, 11]. It follows that there is no face-segmentation step in the algorithm; instead, face-symmetric features are detected via a set of points of interest. Each image of a human face contains a large number of interest points, but only some of them contribute to the value of a symmetric feature [1]. First, a set of feature points pi is constructed using any rotationally invariant method (e.g. SIFT or SURF). Rotational invariance is essential because rotated features are subsequently computed for each point of interest, whereas scale invariance is not strictly necessary. Let the symbol pi be the point vector of the i-th interest point defined by equation (4).

pi = (xi, yi, φi, si)        (4)

The pair xi and yi denotes the position of the point of interest, the symbol φi denotes its normalized orientation in degrees, and the symbol si its normalized scale. Further, a feature descriptor denoted ki is computed for each feature point pi just defined. The feature descriptor ki encodes the local appearance of the i-th feature point after its orientation and scale have been normalized (see Figure 7). After that, a mirrored feature descriptor labeled mi is generated for each feature descriptor ki; the mirrored descriptor mi is the descriptor that the mirrored image region would produce [10]. The basic feature descriptor ki and the mirrored feature descriptor mi form a descriptor pair.

Fig. 7. Illustration of a descriptor pair (ki, mi) and symmetric pair (kj, mj)

For all admissible combinations of a feature descriptor ki (interest point pi) and a mirrored feature descriptor mj (point of interest pj), a similarity measure Mij is evaluated. Matching the pair (kj, mi) is redundant, because the similarity measure Mji has the same value as Mij. The value of the similarity measure Mij is given by equation (5).

Mij = Φij · Sij · Dij    if Φij > 0
Mij = 0                  otherwise        (5)

In equation (5), Φij refers to an angular symmetry weight, Sij to a scale weight, and Dij to a distance weight [10]. Once the matrix of similarity measures Mij is known, a linear Hough transform can be used to determine the dominant symmetry axis. Only similarity measures Mij with a value greater than an a-priori given threshold are processed in the Hough-transform voting style. This means that each pair of points of interest (pi, pj) with a strong symmetry measure Mij votes for a couple (rij, θij) in the Hough accumulator. When the voting is finished, the Hough space is filtered with a Gaussian, the peak corresponding to the global maximum is found, and the relevant couple (rij, θij) is extracted. These two coordinates in Hough space determine the dominant symmetry axis in the input image. All points of interest detected as symmetric features, as well as the dominant face axis, are depicted in Figure 8.
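A rough sketch of this voting scheme is given below, under several assumptions of my own: the symmetry axis of a pair is taken as the perpendicular bisector of the segment joining the two points, each vote is weighted by the (already thresholded) measure Mij, and the accumulator resolution and Gaussian sigma are arbitrary illustrative values (SciPy is assumed for the filtering).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dominant_symmetry_axis(pairs, weights, img_diag, n_r=200, n_theta=180, sigma=2.0):
    """Vote the (r, theta) of the perpendicular bisector of every strongly
    symmetric pair of interest points, then take the global peak."""
    acc = np.zeros((n_r, n_theta))
    for ((xi, yi), (xj, yj)), w in zip(pairs, weights):
        # The axis normal points along the segment p_i -> p_j; fold theta into [0, pi).
        theta = np.arctan2(yj - yi, xj - xi) % np.pi
        # The axis passes through the midpoint of the pair.
        mx, my = (xi + xj) / 2.0, (yi + yj) / 2.0
        r = mx * np.cos(theta) + my * np.sin(theta)
        ti = min(int(theta / np.pi * n_theta), n_theta - 1)
        ri = int((r + img_diag) / (2.0 * img_diag) * n_r)
        if 0 <= ri < n_r:
            acc[ri, ti] += w           # weight the vote by the similarity M_ij
    acc = gaussian_filter(acc, sigma)  # smooth the Hough space before peak picking
    ri, ti = np.unravel_index(acc.argmax(), acc.shape)
    r_peak = ri / n_r * 2.0 * img_diag - img_diag
    theta_peak = ti / n_theta * np.pi
    return r_peak, theta_peak
```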

Fig. 8. Points of interest as symmetric features for face detection

In order to distinguish faces from other symmetric objects in an image, only those axes are considered that satisfy face-relevant requirements (axis orientation, eye distance etc.). Once the dominant symmetry axis is determined, the points of interest corresponding to the eye regions can easily be located and, if necessary, tracked over a whole image sequence. Finally, the computational complexity of the symmetric-features approach is mainly determined by the complexity of the interest-point detector and the feature-descriptor algorithm.

4 Applications

Face detection and recognition techniques, including facial expression analysis, are very often used in advanced HCI (human-computer interaction) systems and in artificial intelligence systems and robots. In automation, face detection techniques are used for interactive control of assembly lines and other home-automation processes (e.g. surveillance). Face recognition techniques are also used in teleconferencing for localizing and tracking the speaker. Besides automatic eye detection itself, its validation and image-sequence stabilization are very important steps in such perceptive applications [19]. Face detection and subsequent eye localization methods are often employed in traffic monitoring systems and in driver assistance systems [3, 4, 7]. Implementations of face detection techniques in these automotive assistance systems have to be exceptionally reliable and stable for reasons of traffic safety.

5 Conclusion

This paper presented the most frequently used approaches to skin detection, face detection and face recognition. First, methods based on the color information in images were introduced, i.e. methods working with the standard RGB components of an input image. These techniques are very simple to implement and very efficient in terms of computational complexity. The second part of the paper introduced the so-called feature-based approaches, specifically the Haar-like features method and the symmetric features method. These methods are more complex and are not trivial to implement in a practical application. However, their adaptability to a target application (owing to the learning step), together with their ability to process grayscale images, is considered a significant advantage.

Acknowledgement: This research has been supported by the Czech Science Foundation under the project GA102/09/1897 and by the Technology Agency of the Czech Republic under the Competence center TE01020197.

References:
[1] Bai, L., Shen, L., Wang, Y. A Novel Eye Location Algorithm Based on Radial Symmetry Transform, Proceedings of the 18th International Conference on Pattern Recognition, Vol. 3, pp. 511-514, 2006, IEEE Computer Society, Washington. ISBN 0-7695-2521-0.
[2] Hjelmas, E., Low, B.K. Face Detection: A Survey, Computer Vision and Image Understanding, Vol. 83, Issue 3, pp. 234-274, 2001, Elsevier. ISSN 1077-3142.
[3] Horak, K. Fatigue Features Based on Eye Tracking for Driver Inattention System, Proceedings of the 34th International Conference on Telecommunications and Signal Processing, pp. 593-597, 2011, Department of Telecommunications, Brno. ISBN 978-1-4577-1411-5.
[4] Horak, K., Honzik, P., Kucera, P. On Image Segmentation Techniques for Driver Inattention Systems, Proceedings of the 17th International Conference on Soft Computing, pp. 1-6, 2011, Institute of Automation and Computer Science, Brno. ISBN 978-80-214-4120-0.
[5] Hsu, R., Abdel-Mottaleb, M., Jain, A.K. Face Detection in Color Images, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 5, pp. 696-706, 2002, IEEE Computer Society. ISSN 0162-8828.
[6] Jeon, B.H., Lee, S.U., Lee, K.M. Rotation Invariant Face Detection Using a Model-based Clustering Algorithm, International Conference on Multimedia and Expo, Vol. 2, pp. 1149-1152, 2000, New York. ISBN 0-7803-6536-4.
[7] Kucera, P., Hyncica, O., Pavlata, K., Horak, K. On Vehicle Data Acquisition System, Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems, pp. 1446-1451, 2011, New York, USA. ISBN 978-1-4577-2197-7.
[8] Kothari, R., Mitchell, J.L. Detection of Eye Locations in Unconstrained Visual Images, Proceedings of the International Conference on Image Processing, Vol. 3, pp. 519-522, 1996, Lausanne. ISBN 0-7803-3259-8.
[9] Lienhart, R., Maydt, J. An Extended Set of Haar-like Features for Rapid Object Detection, Proceedings of the International Conference on Image Processing, Vol. 1, pp. 900-903, 2002, Rochester. ISBN 0-7803-7622-6.
[10] Loy, G., Eklundh, J. Detecting Symmetry and Symmetric Constellations of Features, Proceedings of the 9th European Conference on Computer Vision, Vol. 2, pp. 508-521, 2006, Springer-Verlag, Berlin. ISBN 3-540-33834-9.
[11] Loy, G., Zelinsky, A. Fast Radial Symmetry for Detecting Points of Interest, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, No. 8, pp. 959-973, 2003, IEEE Computer Society, Washington. ISSN 0162-8828.
[12] Morimoto, C.H., Koons, D., Amir, A., Flickner, M. Pupil Detection and Tracking Using Multiple Light Sources, Image and Vision Computing, Vol. 18, Issue 4, pp. 331-335, 2000, Elsevier. ISSN 0262-8856.
[13] Papageorgiou, C.P., Oren, M., Poggio, T. A General Framework for Object Detection, Proceedings of the 6th International Conference on Computer Vision, pp. 555-562, 1998, IEEE Computer Society, Washington. ISBN 81-7319-221-9.
[14] Peng, K., Chen, L., Ruan, S., Kukharev, G. A Robust Algorithm for Eye Detection on Gray Intensity Face without Spectacles, Journal of Computer Science & Technology, Vol. 5, No. 3, pp. 127-132, 2005. ISSN 1666-6046.
[15] Phung, S.L., Bouzerdoum, A., Chai, D. A Novel Skin Color Model in YCbCr Color Space, Proceedings of the International Conference on Image Processing, Vol. 1, pp. 289-292, 2002. ISBN 0-7803-7622-6.
[16] Saleh, A.A. A Simple and Novel Method for Skin Detection and Face Locating and Tracking, Lecture Notes in Computer Science, Vol. 3101, pp. 1-8, 2004, Springer-Verlag, Berlin. ISBN 978-3-540-22312-2.
[17] Viola, P., Jones, M. Rapid Object Detection Using a Boosted Cascade of Simple Features, Proceedings of the Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 511-518, 2001. ISBN 0-7695-1272-0.
[18] Viola, P., Jones, M. Robust Real-Time Face Detection, International Journal of Computer Vision, Vol. 57, Issue 2, pp. 137-154, 2004, Kluwer Academic Publishers, Hingham. ISSN 0920-5691.
[19] Wang, P., Green, M.B., Ji, Q., Wayman, J. Automatic Eye Detection and Its Validation, Conference on Computer Vision and Pattern Recognition, Vol. 3, pp. 164-171, 2005, IEEE Computer Society, San Diego. ISBN 0-7695-2372-2.
[20] Wang, P., Ji, Q. Multi-view Face and Eye Detection Using Discriminant Features, Computer Vision and Image Understanding, Vol. 105, Issue 2, pp. 99-111, 2006, Elsevier. ISSN 1077-3142.
[21] Wilson, P.I., Fernandez, J. Facial Feature Detection Using Haar Classifiers, Journal of Computing Sciences in Colleges, Vol. 21, Issue 4, pp. 127-133, 2006, Consortium for Computing Sciences in Colleges, USA. ISSN 1937-4771.
[22] Yang, G., Huang, T.S. Human Face Detection in a Complex Background, Pattern Recognition, Vol. 27, Issue 1, pp. 53-63, 1994, Elsevier. ISSN 0031-3203.