An Adept Segmentation Algorithm and Its Application to the Extraction of Local Regions Containing Fiducial Points

Erhan AliRiza İnce and Syed Amjad Ali

Eastern Mediterranean University, Electrical and Electronic Engineering, Famagusta, North Cyprus
[email protected],
[email protected]
Abstract. Locating human fiducial points such as the eyes and mouth in a frontal head-and-shoulders image is an active research area, with applications in model-based teleconferencing systems, model-based low bit rate video transmission, and computer-based identification and recognition systems. This paper proposes an adept and efficient rule-based skin color region extraction algorithm using the normalized r-g color space. The scheme extracts skin pixels using a simple quadratic polynomial model and applies additional color-based rules to extract possible eye and lip regions. The algorithm refines the search for fiducial points by eliminating falsely extracted feature components using spatial and geometrical representations of the facial components. The algorithm described herein has been implemented and tested with 311 images from the FERET database with varying lighting conditions, skin colors, orientations and tilts. Experimental results indicate that the proposed algorithm is quite robust and leads to good facial feature extraction.
1 Introduction
Extracting face regions that contain fiducial points and recovering pose are two challenging problems in computer vision and have been widely explored by researchers. Many vision applications such as video telephony, face recognition, hybrid access control, feature tracking, model-based low bit rate video transmission and MPEG-4 coding require feature extraction and pose recovery. Various methods exist for the detection of facial features, and a detailed literature survey of these techniques is available in [1-6]. One of the very first operations needed for facial feature detection is face localization. To achieve face localization, many approaches have been proposed, such as segmentation based on skin color [2,3], clustering [7], Principal Component Analysis [8], and neural nets [9]. Once the face region is located it can be made more evident by applying a region growing technique [10]. Facial features can then be extracted from the segmented face region by making use of image intensity, chromaticity values and the geometrical shape properties of the face. It has been clearly demonstrated in [11] that recognition based on local facial components outperforms global face-based approaches, since the global approaches are more sensitive to image variations caused by translations and facial rotations.
Fig. 1. (a) Feature detection archetype (b) Skin color cluster in (r − g) space
Hence this paper suggests an efficient segmentation algorithm for locating and cropping local regions containing fiducial points. In order to reduce the search area in the input images, skin pixels are extracted using a quadratic polynomial model. To alleviate the influence of light brightness when extracting skin pixels, the proposed algorithm adopts the r-g chromatic color coordinates for color representation. Over the r-g plane the skin pixels form a compact region, and the algorithm's computational cost is lower than that of probabilistic and neural network based models. Since the objective is to design a real-time system, it is essential that the complexity of the chosen algorithm is not high. Unlike many published works in the literature, this paper adopts an approach in which the eye and mouth features are extracted independently of each other. As depicted in Fig. 1(a), this provides the extra advantage of being able to run the two extraction routines in parallel for better time management. The paper is organized as follows: Section 2 provides details about the standard color FERET database and explains how the collection of 311 images belonging to 30 subjects was chosen. Section 3 introduces the rule-based skin segmentation, and Sections 4 and 6 detail how to generate the feature and lip maps respectively. Sections 5 and 7 present the rules for facial component verification. Simulation results and details about the efficiency in extracting each individual feature are given in Section 8. Finally, conclusions are drawn in Section 9.
2 The Standard Color FERET Database
The database used in the simulations is a subset of the color FERET database [12,13] which has been specifically created in order to develop, test and evaluate face recognition algorithms. We have selected 30 subjects randomly from the pool and accumulated 311 pictures in total using only the FA, FB, QL, QR, RB and RC poses for each subject. FA and FB are frontal images, QR and QL are poses with the head turned about 22.5 degrees left and right
respectively, and RB and RC are random images with the head turned about 15 degrees in either direction. The standard color FERET database contains face images of many subjects under various illumination conditions. The images come in two sizes: the large ones are 512 × 768 and the small ones are 256 × 384 pixels. The profile left/right (PL, PR) and half left/right (HL, HR) poses have been intentionally left out, because no authorized user of a hybrid access system based on facial feature identification is expected to pose in front of the camera at an angle of more than 22.5 degrees.
3 Skin Segmentation
For humans, skin-tone information is a useful means for segmenting skin regions. The RGB, normalized r-g, YCbCr and HSV color spaces or their variations are frequently used in the literature [1,14] for skin segmentation. In this work, we have used both the RGB and normalized r-g color spaces. The normalized red-green components are computed using the following relations:

r = R / (R + G + B)    (1)

g = G / (R + G + B)    (2)

Once the r-g components are obtained, a simple quadratic polynomial model [15] is used to determine the upper and lower thresholds for the skin region, as shown in Fig. 1(b).
f_upper(r) = -1.3067 r^2 + 1.0743 r + 0.1452    (3)

f_lower(r) = -0.7760 r^2 + 0.5601 r + 0.1766    (4)
Finally, skin segmentation is carried out by applying the following three rules together:

S1. f_lower(r) < g < f_upper(r)
S2. R > G > B
S3. R − B > 10

to obtain a raw binary mask (BM):

BM = 1 if segmentation rules S1, S2 and S3 are all true, and 0 otherwise.

The binary mask is refined by first selecting the largest connected binary region in the image and then filling the holes inside that region. Lastly, the gaps (holes) connected to the background in the upper part of the binary image are closed (in a left- or right-rotated head it is mostly the eye and eyebrow that create such regions). The outcome of each phase of the skin segmentation algorithm is shown in Fig. 2.
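As an illustration, a minimal sketch of the raw skin mask computation is given below, assuming the input is an RGB image stored as a NumPy array of shape (H, W, 3); the function and variable names are ours, not from the paper.

```python
import numpy as np

def skin_binary_mask(img_rgb):
    """Rule-based skin mask following Eqs. (1)-(4) and rules S1-S3.

    img_rgb: uint8 array of shape (H, W, 3). Returns a boolean mask BM.
    """
    rgb = img_rgb.astype(np.float64)
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    total = R + G + B + 1e-9               # avoid division by zero
    r, g = R / total, G / total            # normalized chromaticities, Eqs. (1)-(2)

    f_upper = -1.3067 * r**2 + 1.0743 * r + 0.1452   # Eq. (3)
    f_lower = -0.7760 * r**2 + 0.5601 * r + 0.1766   # Eq. (4)

    s1 = (g > f_lower) & (g < f_upper)     # S1: g between the two discriminants
    s2 = (R > G) & (G > B)                 # S2
    s3 = (R - B) > 10                      # S3
    return s1 & s2 & s3                    # raw binary mask BM
```

The subsequent refinement steps (keeping the largest connected region and filling its interior holes) could, for instance, be carried out with scipy.ndimage.label and scipy.ndimage.binary_fill_holes.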
Fig. 2. Left to right: (a) Original image (b) Binary mask (BM ) (c) Largest connected binary mask with holes filled (d) Binary mask after closing the gaps
Fig. 3. Binary face mask with marked boundaries
In order to close the holes connected to the background in the upper part of the image, we first define toprow, bottomrow, leftcolumn and rightcolumn as shown in Fig. 3. For each column of the binary map, the processing is applied from toprow down to 45% of the height (hindex). The mechanism for closing the gaps can be explained with a simple example. Suppose x = [1 1 1 1 0 0 1 1 1 0 0 0 1 1 0] contains the binary pixels of a selected column. Finding the starting and ending indices of the contiguous runs of 1's gives

y = [1 7 13; 4 9 14],

i.e., runs spanning indices 1-4, 7-9 and 13-14. Filling indices (5, 6) and (10, 11, 12) with 1's, the modified column becomes x = [1 1 1 1 1 1 1 1 1 1 1 1 1 1 0]. The number of columns processed from both the left and right side of the image is chosen to be 30% of the width (windex). Building a binary face map via skin segmentation has the advantage that the processing required to extract the fiducial points needs to be carried out only inside this map. Secondly, it is possible to locate the eye and lip regions independently of each other, so that the performance of one does not depend on the other. Independent searches for the eyes and the lip region also have the benefit that they can be carried out in parallel to speed up the processing.
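A brief sketch of this column-wise gap closing follows, for a binary NumPy mask; the 45% and 30% limits come from the text, while the helper names and the exact column selection are our interpretation.

```python
import numpy as np

def close_column_gaps(col):
    """Fill the zeros lying between the first and last 1 of a binary column,
    which closes every gap between runs of 1's (as in the worked example)."""
    col = col.copy()
    idx = np.flatnonzero(col)              # positions of the 1's
    if idx.size:
        col[idx[0]:idx[-1] + 1] = 1        # fill all gaps between first and last 1
    return col

def close_upper_gaps(mask, toprow, hindex, windex):
    """Apply the gap closing to the upper 45% of rows, over the outer 30% of
    columns on each side (an interpretation of the text; names illustrative)."""
    out = mask.copy()
    rmax = toprow + int(0.45 * hindex)
    ncols = int(0.30 * windex)
    cols = list(range(ncols)) + list(range(out.shape[1] - ncols, out.shape[1]))
    for c in cols:
        out[toprow:rmax, c] = close_column_gaps(out[toprow:rmax, c])
    return out
```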
4 Feature Map Generation
Most approaches for eye and mouth detection are template based. However, in this paper we directly locate eyes and mouth based on measurements derived
from the r-g, RGB and YCbCr color space components of the images. In order to locate the possible eye region we first create a feature map (FM) using the equations and conditions below.
f_upper(r) = -1.3067 r^2 + 1.0743 r + 0.1452    (5)

f_lower(r) = -0.7760 r^2 + 0.5601 r + 0.1766    (6)

S1. f_lower(r) < g < f_upper(r)
S2. R > G > B
S3. R − G > 10
S4. 65 < Cb < 125 and 135 < Cr < 165

FM = 1 if segmentation rules S1, S2, S3 and S4 are all true, and 0 otherwise.
Once the FM is obtained, it is complemented and the components touching the image borders are cleared to obtain the composite feature map (CFM). Afterwards the CFM is masked by the binary face mask (BM) obtained in the previous section, and finally the eye region is extracted using

eyereg = CFM(toprow + 0.19 · hindex : toprow + 0.52 · hindex)    (7)
where hindex represents the height of the binary skin-segmented image (see Fig. 3). The steps described above for the extraction of the region containing the eyes can be seen in Fig. 4.
Fig. 4. Obtaining the region containing the eyes
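The feature-map rules and the crop of Eq. (7) can be sketched as below. The YCbCr conversion shown uses the ITU-R BT.601 formulas as one plausible choice, since the paper does not state which conversion or value range it assumes, and the clearing of components that touch the image borders is omitted for brevity; names are illustrative.

```python
import numpy as np

def eye_region(img_rgb, bm, toprow, hindex):
    """Sketch of the feature map (rules S1-S4) and the eye-band crop of Eq. (7).
    `bm` is the refined binary face mask from the previous section."""
    rgb = img_rgb.astype(np.float64)
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    total = R + G + B + 1e-9
    r, g = R / total, G / total

    f_upper = -1.3067 * r**2 + 1.0743 * r + 0.1452   # Eq. (5)
    f_lower = -0.7760 * r**2 + 0.5601 * r + 0.1766   # Eq. (6)

    # BT.601 chrominance (one common YCbCr convention; assumed, not stated in the paper)
    Cb = 128 - 0.168736 * R - 0.331264 * G + 0.5 * B
    Cr = 128 + 0.5 * R - 0.418688 * G - 0.081312 * B

    fm = ((g > f_lower) & (g < f_upper) &            # S1
          (R > G) & (G > B) &                        # S2
          ((R - G) > 10) &                           # S3
          (Cb > 65) & (Cb < 125) &                   # S4
          (Cr > 135) & (Cr < 165))

    cfm = ~fm                                        # complement the feature map
    # (clearing components that touch the borders is omitted in this sketch)
    cfm &= bm.astype(bool)                           # mask with the binary face mask
    top = int(toprow + 0.19 * hindex)                # Eq. (7): crop the eye band
    bot = int(toprow + 0.52 * hindex)
    return cfm[top:bot, :]
```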
5 Rules for Eye Component Verifications
Among all the extracted possible eye candidates, only the largest five components are kept. If fewer than five components remain, all components are kept. Among the selected eye candidates there exist some false candidates that need to be eliminated. In order to remove these falsely extracted components, the proposed algorithm performs component verification based on the set of rules discussed below.

A. If the height of a candidate is larger than a threshold value, the component is eliminated (refer to Fig. 5).

B. If the right vertical border of a component lies to the left of a vertical line at a distance equal to one-eighth of the image width, or its left vertical border lies to the right of a vertical line at seven-eighths of the image width, the component is eliminated (refer to Fig. 6a).

C. Knowing that the two eyes are always roughly horizontally aligned (symmetric), we first detect the top, bottom, left and right boundaries of each possible eye component and scan the left and right side within its vertical boundaries to see if other components exist. If any component is found in the searched bands, the component under test is kept; if no other component is found, the candidate is eliminated (refer to Fig. 6b).
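A minimal sketch of the rule C check is given below, assuming the eye candidates are available as an integer label image (for example from scipy.ndimage.label); the function name is illustrative.

```python
import numpy as np

def has_horizontal_partner(labels, comp_id):
    """Rule C sketch: keep an eye candidate only if some other labeled
    component overlaps its vertical (row) extent."""
    rows, _ = np.nonzero(labels == comp_id)
    band = labels[rows.min():rows.max() + 1, :]   # horizontal band spanned by the candidate
    return bool(np.any((band != 0) & (band != comp_id)))
```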
Fig. 5. Eliminating components with large height
Fig. 6. (a) Out of bounds elimination (b) Isolated component elimination
Fig. 7. Possible four component cases
D. If four components remain and they satisfy the symmetry property, the decision is based on two criteria: first, the horizontal distances d1 and d2 between the two components of each symmetric pair, and second, the vertical distance h between the pairs. If d1 and d2 are comparable but h is small, we choose the lower two components as eyes, since the upper components are eyebrows. If d1 is considerably greater than d2 and h is relatively large, we choose the upper two components as eyes, as the lower components are due to the nostrils. If d1 and d2 are comparable but h is quite large, we again choose the upper two components as eyes, as the lower components are due to the lip corners (refer to Fig. 7).

E. If more than two components are aligned in a horizontal band, find the image center (half of windex) and choose the two components at the minimum horizontal distance from the center.
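The decision logic of rule D might look as follows; the tolerance used to call d1 and d2 "comparable" and the gap used to call h "small" are assumed placeholders, since the paper does not give numeric thresholds.

```python
def pick_eyes_from_four(upper_pair, lower_pair, tol=0.15, small_gap=20):
    """Rule D sketch for the four-component case. Each pair is
    ((x_left, y_left), (x_right, y_right)) for two horizontally symmetric
    components; `tol` and `small_gap` are illustrative thresholds."""
    d1 = abs(upper_pair[1][0] - upper_pair[0][0])   # horizontal distance, upper pair
    d2 = abs(lower_pair[1][0] - lower_pair[0][0])   # horizontal distance, lower pair
    h = abs(lower_pair[0][1] - upper_pair[0][1])    # vertical distance between the pairs
    comparable = abs(d1 - d2) <= tol * max(d1, d2, 1)
    if comparable and h < small_gap:
        return lower_pair    # upper pair are eyebrows, keep the lower pair as eyes
    return upper_pair        # otherwise the lower pair are nostrils or lip corners
```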
6 Lip Map Generation
As discussed in [15], the lip colors are distributed in the lower part of the crescent-shaped region defined by the skin colors on the r-g plane. Hence, as before, we can define another quadratic polynomial discriminant, l_r(r), for the extraction of the lip pixels. The three polynomial discriminants are combined with the RGB color space information to obtain a lip map (LM) as shown below.

f_upper(r) = -1.3067 r^2 + 1.0743 r + 0.1452    (8)

f_lower(r) = -0.7760 r^2 + 0.5601 r + 0.1966    (9)

l_r(r) = -0.7760 r^2 + 0.5601 r + 0.2563    (10)

LM = 1 if g > f_lower(r), g < l_r(r), R > 60, G > 30 and B > 30, and 0 otherwise,
Fig. 8. Extracting the mouth regions
where R, G and B are the intensity values of the red, green and blue channels of the RGB color space. The final step in the processing is to crop the lip map using Equation (11) to get the mouth region:

MouthRegion = LM(bottomrow − ceil(0.60 · hindex) : bottomrow − 25)    (11)

The processing steps described above are depicted in Fig. 8.
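A hedged sketch of the lip map and the mouth-region crop is shown below. Eq. (8) is listed in the text, but the LM test itself only involves f_lower and l_r, so only those two discriminants are computed here; the input assumptions and names are ours.

```python
import numpy as np

def mouth_region(img_rgb, bottomrow, hindex):
    """Sketch of the lip map (Eqs. (9)-(10)) and the crop of Eq. (11).
    Assumes a uint8 RGB image of shape (H, W, 3)."""
    rgb = img_rgb.astype(np.float64)
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    total = R + G + B + 1e-9
    r, g = R / total, G / total

    f_lower = -0.7760 * r**2 + 0.5601 * r + 0.1966   # Eq. (9)
    l_r     = -0.7760 * r**2 + 0.5601 * r + 0.2563   # Eq. (10)

    lm = ((g > f_lower) & (g < l_r) &                # lip chromaticity band
          (R > 60) & (G > 30) & (B > 30))            # intensity conditions

    top = int(bottomrow - np.ceil(0.60 * hindex))    # Eq. (11): crop the mouth band
    return lm[top:bottomrow - 25, :]
```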
7 Rules for Mouth Component Verifications
To remove falsely extracted mouth candidates, the following set of rules is applied.

A. If the right vertical border of a component lies to the left of a vertical line at a distance equal to 16% of the image width, or its left vertical border lies to the right of a vertical line at 80% of the image width, the component is eliminated.

B. Remove all components whose width-to-height ratio is below 1.8.

C. If the area (number of connected 1's) of a candidate is less than a fixed threshold value, remove the component.

D. If the area (number of connected 1's) of a candidate is greater than a fixed threshold value, also remove the component.

E. If more than two components remain, select the largest two.

F. Compute the ratio A · (w/h)^2 for each of the two remaining components, where A is the area, w is the width and h is the height of the component considered. Finally, select the one with the larger ratio.
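The rule chain above might be implemented as in the sketch below; the candidate components are assumed to be available as dictionaries of simple geometric measurements, and the area thresholds of rules C and D are placeholders, since the paper does not give their values.

```python
def verify_mouth_candidates(components, image_width,
                            min_area=50, max_area=5000):
    """Sketch of rules A-F for mouth candidates. Each component is a dict
    with keys 'left', 'right', 'width', 'height', 'area' (pixel units);
    min_area and max_area are illustrative placeholders."""
    kept = []
    for c in components:
        if c['right'] < 0.16 * image_width or c['left'] > 0.80 * image_width:
            continue                                  # rule A: out of bounds
        if c['width'] / c['height'] < 1.8:
            continue                                  # rule B: aspect ratio too small
        if not (min_area < c['area'] < max_area):
            continue                                  # rules C and D: area limits
        kept.append(c)

    kept.sort(key=lambda c: c['area'], reverse=True)
    kept = kept[:2]                                   # rule E: keep the largest two
    if not kept:
        return None
    # rule F: pick the component maximizing A * (w/h)^2
    return max(kept, key=lambda c: c['area'] * (c['width'] / c['height'])**2)
```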
8 Simulation Results
We tested the proposed algorithm on 311 test images (30 subjects) randomly selected from the color FERET database. Four sample faces with detected features obtained using the proposed algorithm are shown in Fig. 9 and Fig. 10.
Fig. 9. Marked eye features
Fig. 10. Marked mouth features
The experimental results show that the algorithm can robustly detect the local regions for people with varying skin tones. Table 1 summarizes the correct detection rates for the left eye, right eye and mouth regions. As can be seen from the results, all three rates are very promising. Moreover, since this method does not extract the mouth corners based on the lip-cut, the subject's mouth is not required to be closed.

Table 1. Performance in extracting component regions from the FERET database

Region                          Correct detection rate (%)
region containing left eye      92.53
region containing right eye     94.02
region containing mouth         91.00

9 Conclusion
An efficient rule-based local region extraction algorithm making use of quadratic polynomial discriminants derived from the r-g chromatic coordinates and RGB color space information has been proposed. The algorithm eliminates false feature candidates using spatial and geometrical representations of facial components. The paper adopts an approach in which the feature and lip maps are generated independently, which provides the flexibility of running the two extraction routines in parallel for better time management. The preliminary simulation results in Table 1 imply that the proposed algorithm is quite effective. The authors believe that if the affine transform parameters are estimated and the transformations reversed, the correct extraction rates can be even higher. Finally, since the mouth region extraction is not based on determining the lip-cut, there is no restriction on the mouth expression.
Acknowledgment The work presented herein is an outcome of research carried out under Seed Money Project EN-05-02-01 granted by the Research Advisory Board of Eastern Mediterranean University.
References

1. Rein-Lien, H., Abdel-Mottaleb, M.: Face detection in colour images, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 24, No. 5, pp. 696-706, May 2002.
2. Ming-Hsuan, Y., Kriegman, D. J., Ahuja, N.: Detecting faces in images: a survey, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 24, No. 1, pp. 34-58, January 2002.
3. Alattar, A. M., Rajala, S. A.: Facial Features Localization in Front View Head and Shoulders Images, IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 6, pp. 3557-3560, March 1999.
4. Rein-Lien, H., Abdel-Mottaleb, M., Jain, A. K.: Face detection in color images, Tech. Report MSU-CSE-01-7, Michigan State University, March 2001.
5. İnce, E. A., Kaymak, S., Çelik, T.: Yüzsel Öznitelik Sezimi İçin Karma Bir Teknik, 13. IEEE Sinyal İşleme ve İletişim Uygulamaları Kurultayı, pp. 396-399, May 2005.
6. Hu, M., Worrall, S., Sadka, A. H., Kondoz, A. M.: Face Feature Detection and Model Design for 2D Scalable Model-Based Video Coding, International Conference on Visual Information Engineering, pp. 125-128, July 2003.
7. Sung, K., Poggio, T.: Example Based Learning for View-based Human Face Detection, C.B.C.L. Paper No. 112, MIT, 1994.
8. Moghaddam, B., Pentland, A.: Face Recognition using View-Based and Modular Eigenspaces, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1, pp. 23-38, January 1998.
9. Rowley, H., Baluja, S., Kanade, T.: Neural Network Based Face Detection, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 24, No. 1, pp. 34-58, January 2002.
10. Adams, R., Bischof, L.: Seeded Region Growing, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 16, No. 6, pp. 641-647, June 1994.
11. Heisele, B., Ho, P., Wu, J., Poggio, T.: Face Recognition: Component-based versus Global Approaches, Computer Vision and Image Understanding, Vol. 91, pp. 6-21, February 2003.
12. Phillips, P. J., Wechsler, H., Huang, J., Rauss, P.: The FERET database and evaluation procedure for face recognition algorithms, Image and Vision Computing J., Vol. 16, No. 5, pp. 295-306, 1998.
13. Phillips, P. J., Moon, H., Rizvi, S. A., Rauss, P. J.: The FERET Evaluation Methodology for Face Recognition Algorithms, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 22, pp. 1090-1104, 2000.
14. Terrillon, J. C., Shirazi, M. N., Akamatsu, S.: Comparative performance of different skin chrominance models and chrominance spaces for the automatic detection of human faces in color images, Proc. IEEE Int. Conf. Automatic Face and Gesture Recognition, pp. 54-61, 2000.
15. Chiang, C.-C., Tai, W.-K., Yang, M.-T., Huang, Y.-T., Huang, C.-J.: A novel method for detecting lips, eyes and faces in real time, Real-Time Imaging, Vol. 9, pp. 277-287, 2003.