Eye-verifier Using Ternary Template for Reliable Eye Detection in Facial Color Images

Nguyen Van Huan, Nguyen Thi Hai Binh and Hakil Kim

Abstract— This paper introduces an eye-verifier for reliable detection of eyes in facial color images. First, eye region candidates are found by applying a circular filter to the binary face image. The candidates are then fed into an eye-verifier, which uses a ternary template generated from the eye area consisting of iris, sclera and skin; template matching is performed using the ternary Hamming distance. Experimental results on the authors' own face database and the FERET database show promising performance in terms of detection rate, and the proposed algorithm is robust to varying face poses and eye directions.

I. INTRODUCTION
Biometrics is undoubtedly contributing to human ID management in modern society. With several advantages over other biometrics, face recognition has found many applications since its birth about two decades ago. Robust and accurate detection of the eyes is a crucial step in applications related to face recognition, as the eyes are considered among the most salient and distinguishable features of the human face [1, 2]. Once the two eyes are located reliably, the other facial features (e.g., mouth, nose, ears) can be roughly estimated from the eye positions, since the features have reference values relative to each other [3]. Many researchers have dedicated their efforts to this problem. The existing methods can coarsely be classified into three categories: template-based, feature-based and appearance-based approaches. In the template-based approaches, an eye template is first generated using knowledge of the eye shape, and the most likely eye position is then located by matching the template to an extracted window in the image [4-6]. The appearance-based approaches aim at producing a classifier, learned with neural networks or SVMs from a large number of sample images of different subjects, under different orientations of the head and the eye itself, and under different illumination conditions [7-9].

This work was supported by the Korea Science and Engineering Foundation (KOSEF) through the Biometrics Engineering Research Center (BERC) at Yonsei University. Portions of the research in this paper use the FERET database of facial images collected under the FERET program, sponsored by the DOD Counterdrug Technology Development Program Office. The authors are currently with the Computer Vision Lab., INHA University, Incheon, Korea. The authors are also the members of Biometrics Engineering Research Center in Korea. Emails (huan, binh, [email protected]). 978-1-4244-5020-6/09/$25.00 ©2009 IEEE

The feature-based approaches make use of distinctive features of the eyes, such as the dark pupil, the white sclera, the circular shape of the iris and the ellipse-like shape of the eyelids, to locate the eyes on the human face [10, 11]. Besides, the symmetry of the eye pair is also utilized in a number of eye detection methods [12, 13]. The previous methods search for the two eyes on the face image, and verification is commonly done using the symmetry of the face or the relative positions of the two eyes and the mouth; in the latter case, detection of the mouth is additionally required. However, eye detection encounters difficulty when the face is not captured in a frontal view, when only one eye is visible in the face image, or when the two eyes are not horizontally level.

In this paper, a template-feature based eye-verifier is proposed for detection of eyes in facial color images. The proposed method is highly appropriate for the new generation of high-quality facial photos used in biometric passports, as recommended by the International Civil Aviation Organization report [23]. The method consists of two stages: eye region candidate detection and eye verification. For eye region candidate detection, the face region is first detected using skin color in the HSV and YCbCr color spaces, and only the face region is cropped for further processing. After binarization of the face region, the circular filter of [14] is applied to the binary face image, and the eye region candidates are selected from the peaks in the filtered image. The eye candidates are then fed into an eye-verifier. For each candidate, a ternary eye mask is generated by overlaying three sub-regions (iris, sclera and the skin around the eye) and compared with a ternary eye template generated on the basis of the geometric features of an eye. Verification is performed by measuring the ternary Hamming distance between the template and the ternary eye mask to decide whether the candidate is a true eye. Once an eye is verified, detailed information about the eye, such as the pupil center and eye corners, is also retrieved. The overall algorithm is shown in Fig. 1. The remainder of this paper is organized as follows. Section II presents the detection of eye candidates by circular filters. Section III describes the eye-verifier, the main contribution. Section IV presents the experimental results, and Section V concludes with future work.

Figure 1: Overall process of eye detection

II. EYE REGION CANDIDATE DETECTION

A. Face detection and binarization
The face region is detected using skin color information, which is known as one of the most important cues for face localization. Skin color, after the brightness is removed, is distributed in a small area of the chromatic color space [15]. Motivated by the work of Kim et al. [16], instead of detecting skin color directly in the RGB color model, the Hue component of the HSI color model and the Cb, Cr components of YCbCr are used. The decision rules originate from Sobottka et al. [17] (for the hue image) and Hsu et al. [18] (for the Cb, Cr images). Pixel (i, j) is assigned as a skin-color pixel if

0° < hue(i, j) < 50°          (1)
137 < Cr(i, j) < 177
77 < Cb(i, j) < 127
190 < Cb + 0.6·Cr < 215

Thereafter, small connected components in the skin map are removed by simple morphological operations to obtain the face mask, and only the face region is cropped for further processing.

The face binarization is a minor modification of Khosravi and Safabakhsh [22]. Observing that the eyes are dark regions on the face, a morphological closing operation is applied to the grayscale face image, and the difference between the closed image and the grayscale face image is computed. The size of the structuring element R is estimated by the same method. The difference image exhibits outstanding pixels in the low-intensity regions such as eyes, eyebrows, nostrils or beards. To generate the binary image, the threshold is selected by considering the ratio of the eye region to the human face: the highest 5 percent of intensities in the difference image are considered to belong to the low-intensity areas. After normalizing the difference image to [0, 255], the binarization can be presented as

th = max_{i=1,...,254} { i | cH(i) < 0.95 · cH(255) }

Γ_b(x, y) = { 1 if I_Diff(x, y) > th; 0 otherwise }          (2)

where cH is the cumulative histogram of the difference image I_Diff. Then, the morphological operations dilation and erosion are used to fill specular reflections in the pupil regions (if present) and to remove small connected components. To increase the discriminating power of the circular filter applied subsequently, the binary face image is re-assigned:

Γ'_b(x, y) = { 1 if Γ_b(x, y) = 1; −1 otherwise }          (3)

B. Eye candidate region detection
Given the circular shape of the eye in the binary face image, the eye candidates are located by convolving the binary face image with a series of circular filters. Filters at different radii (four filters in our experiments) are assessed, and the measure in Eq. (4) is used to find the best-fitting filter, i.e., the one with the maximum k:

k = Max(I_f) / s_f²          (4)

where Max(I_f) is the maximum value of the filtered image and s_f is the size of the filter. The radii of the candidate filters R_cf are selected in the range R − δ < R_cf < R + δ. An example of the filtered image is shown in Fig. 2 as the eye-peaked image. Up to four peaks corresponding to local maxima are extracted and fed into the eye-verifier module; if two eyes are verified before all four peaks have been processed, the process terminates. The eye window extracted around each peak is 4·R_cf high and 8·R_cf wide, where R_cf is the radius of the selected circular filter, which ensures that the eye components are contained in the window. The overall diagram for detection of eye region candidates is depicted in Fig. 2.
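As a concrete illustration, the radius-selection step of Eq. (4) can be sketched in NumPy as follows. The helper names and the plain disc-shaped kernel are our assumptions; the paper's exact filter design is given in [14].

```python
import numpy as np

def circular_kernel(radius: int) -> np.ndarray:
    """Binary disc kernel of the given radius (a hypothetical helper;
    the exact filter of [14] may differ)."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return (xx ** 2 + yy ** 2 <= radius ** 2).astype(np.float64)

def best_circular_response(face_pm1: np.ndarray, radii):
    """Correlate the +/-1 binary face image with circular filters of
    several radii and keep the one maximizing k = Max(I_f) / s_f^2 (Eq. 4)."""
    best = None
    for r in radii:
        kern = circular_kernel(r)
        s_f = kern.shape[0]                      # filter size
        # 'valid' correlation via sliding windows (no SciPy dependency)
        win = np.lib.stride_tricks.sliding_window_view(face_pm1, kern.shape)
        filtered = (win * kern).sum(axis=(-2, -1))
        k = filtered.max() / (s_f ** 2)          # Eq. (4)
        if best is None or k > best[0]:
            best = (k, r, filtered)
    return best  # (k, selected radius R_cf, filtered image)
```

The peaks of the returned filtered image would then be taken as the eye region candidates.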


Fig. 2 Diagram of the eye candidate region detection. a) grayscale face image; b) binary face image; c) circular filter; and d) eye-peaked image
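The skin-color rule of Eq. (1) and the cumulative-histogram thresholding of Eqs. (2)-(3) can be sketched as follows; the function names are illustrative, not from the paper.

```python
import numpy as np

def skin_mask(hue_deg, cb, cr):
    """Pixel-wise skin decision of Eq. (1); hue is in degrees."""
    return ((hue_deg > 0) & (hue_deg < 50) &
            (cr > 137) & (cr < 177) &
            (cb > 77) & (cb < 127) &
            (cb + 0.6 * cr > 190) & (cb + 0.6 * cr < 215))

def binarize_difference(i_diff):
    """Threshold the difference image so roughly the top 5% of
    intensities become 1 (Eq. 2), then re-assign 0 -> -1 (Eq. 3).
    i_diff is assumed already normalized to [0, 255]."""
    hist = np.bincount(i_diff.astype(np.uint8).ravel(), minlength=256)
    ch = np.cumsum(hist)                        # cumulative histogram cH
    candidates = [i for i in range(1, 255) if ch[i] < 0.95 * ch[255]]
    th = max(candidates) if candidates else 0
    gamma_b = (i_diff > th).astype(np.int8)     # Eq. (2)
    return np.where(gamma_b == 1, 1, -1)        # Eq. (3)
```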

III. EYE-VERIFIER
A. Ternary eye-mask generator
Under low illumination, the separability between the iris, sclera and skin classes is degraded. Thus, the light compensation method proposed in [19] is used to enhance the separability, since for the eye window the two assumptions of that method are satisfied. Because of the small size of the eye window, the window is up-sampled by a factor of three using pixel interpolation. The interpolation enlarges the window and also makes each class more homogeneous; both effects are beneficial for the eye-mask generation process. A small area at the bottom of the eye window is then taken as the skin sample area.

Fig. 3. Skin sample area with a height of 1/3 of pupil radius in the eye window

1) Iris mask detection: The iris mask in this paper includes the pixels that lie in the pupil or iris region of the eye. Compared to the other parts of the eye window, these pixels have the lowest intensities, even for subjects of African descent. From the skin sample area, the averages of the RGB components are computed, and a 3D distance map is built using the Euclidean distance:

I_3D(x, y) = sqrt( Σ_{κ∈{R,G,B}} (I^κ_EW(x, y) − m_κ)² )          (7)

where I^κ_EW refers to each color component of the eye window image and m_κ denotes the mean intensity of that color component over the skin sample area. The iris class is clearly more separable from the skin class in color space; as a result, the high intensities in I_3D concentrate in the iris region. I_3D is first normalized to [0, 1],

I^Norm_3D(x, y) = (I_3D(x, y) − min_{p∈I_3D} p) / (max_{p∈I_3D} p − min_{p∈I_3D} p)

and the following rule is then used to detect the iris mask M_Iris:

M_Iris(x, y) = { 1 if I^Norm_3D(x, y) > 0.6 and I^gr_EW(x, y) < m^gr − 20; 0 otherwise }          (8)

where I^gr_EW is the grayscale eye window and m^gr is the mean gray value of the skin sample area. Eq. (8) states that pixels in the iris region should have a long 3D distance to the skin and a low intensity; however, the eyelids are sometimes visible in the mask. Therefore, a morphological closing with a small structuring element is applied to remove them; the closing also fills holes in the iris region caused by specular reflections. Finally, the largest connected region is kept as the iris mask.
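A minimal NumPy sketch of the iris rule in Eqs. (7)-(8); the morphological clean-up and largest-component selection are omitted, and the names are ours, not the paper's.

```python
import numpy as np

def iris_mask(eye_rgb, skin_rgb_mean, gray, skin_gray_mean):
    """Iris rule of Eqs. (7)-(8): long color distance to skin AND
    low gray intensity."""
    # Euclidean distance of every pixel to the mean skin color (Eq. 7)
    diff = eye_rgb.astype(np.float64) - np.asarray(skin_rgb_mean, dtype=np.float64)
    i3d = np.sqrt((diff ** 2).sum(axis=-1))
    # normalize to [0, 1]
    i3d_norm = (i3d - i3d.min()) / (i3d.max() - i3d.min() + 1e-12)
    # Eq. (8): large normalized distance and dark gray value
    return ((i3d_norm > 0.6) & (gray < skin_gray_mean - 20)).astype(np.uint8)
```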


Fig.4. Illustration of iris mask detection. a) the eye window; b) the 3D distance map; and c) iris mask after morphology

For a fine estimation of the iris parameters (i.e., its center and radius), the iris mask is re-assigned as follows to suit a semi-circular filter:

M'_Iris(x, y) = { 1 if M_Iris(x, y) = 1; −1 otherwise }          (9)

Then, the center (x_Iris, y_Iris) and radius r_Iris of the iris region are determined by applying a semi-circular filter (depicted in Fig. 5) to the re-assigned iris mask. A semi-circular filter is used instead of a full circular filter because the upper half of the iris is very often occluded by the eyelid.

Fig.5. Semi-circular filter
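A sketch of one way to build such a filter: here the kernel is the lower half-disc, matching the usually visible lower iris boundary. The exact kernel of Fig. 5 may differ; this is our assumption.

```python
import numpy as np

def semi_circular_kernel(radius: int) -> np.ndarray:
    """Lower-half disc kernel: the upper half of the iris is often
    occluded by the eyelid, so only the lower semi-circle is matched."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    # image row index grows downward, so yy >= 0 keeps the lower half
    inside = (xx ** 2 + yy ** 2 <= radius ** 2) & (yy >= 0)
    return np.where(inside, 1.0, 0.0)
```

Correlating kernels of several radii with the ±1 mask M'_Iris and taking the maximum response would yield (x_Iris, y_Iris) and r_Iris.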

2) Sclera mask detection: To find the sclera, the saturation image in the HSV color space is utilized. Regardless of the color of the skin, pixels belonging to the sclera always have low intensity in the saturation image because of their white color. A morphological closing with a structuring element the size of the iris radius is applied to the saturation image, and the saturation image is then subtracted from the closed image to generate the difference image. Compared to the saturation image, the sclera class is more separable from the skin class in the difference image. The threshold for binarizing the difference image is defined by

th_s = m_sk + 4·σ_sk

where m_sk and σ_sk are the mean and standard deviation of the difference-image pixels in the skin sample area. Assuming the skin pixels in the difference image follow a normal distribution, the 4·σ_sk margin guarantees that nearly all skin pixels fall below the threshold. The raw sclera mask is then found by

M_Sc(x, y) = { 1 if S_Diff(x, y) > th_s; 0 otherwise }          (10)

where S_Diff is the difference image. The raw sclera mask generated by Eq. (10) may include pixels from the iris and from specular reflections in the iris region.
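The skin-statistics threshold and Eq. (10) can be sketched as follows; the names are illustrative.

```python
import numpy as np

def sclera_mask(s_diff, skin_area):
    """Raw sclera mask of Eq. (10): the threshold th_s = m_sk + 4*sigma_sk
    comes from the difference-image pixels inside the skin sample area
    (skin_area is a boolean mask of the same shape as s_diff)."""
    m_sk = s_diff[skin_area].mean()
    sigma_sk = s_diff[skin_area].std()
    th_s = m_sk + 4.0 * sigma_sk
    return (s_diff > th_s).astype(np.uint8)
```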


Fig.6. Illustration of sclera mask detection: a) the eye window; b) the saturation image; c) the difference image; d) the raw sclera mask; e) the Eye image; and f) the sclera mask

Therefore, an Eye image is constituted from the raw sclera mask and the iris mask by

Eye = M_Iris & M_Sc

where & here denotes the pixel-wise OR operation. The Eye image is smoothed by morphological dilation and erosion, and the largest connected region is kept. The sclera mask is then refined as M_Sc = Eye − M_Iris.

3) Eye-mask generation: The ternary eye-mask is the overlay of the three classes (iris, sclera and skin) with a different amplitude for each class: 0 for iris, 0.5 for skin and 1 for sclera. The ternary eye-mask, having the same size as the masks, is generated by

TEM(x, y) = { 0 if M_Iris(x, y) = 1; 1 if M_Sc(x, y) = 1; 0.5 otherwise }          (11)

From the binary Eye image, eye features (e.g., the eye corners and the eye height) can be determined. The leftmost and rightmost edge points in Eye are taken as the eye corners. Based on the two corners, the TEM image is rotated so that the corner points are horizontally level, and then cropped to a compact eye region ready for matching, as shown in Fig. 7.


Fig. 7. a) TEM image, and b) ternary eye-mask after cropping.
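The corner extraction and the leveling angle can be sketched as follows, assuming the Eye image has at least one pixel in each extreme column; the coordinate convention (rows grow downward) is ours.

```python
import numpy as np

def eye_corners_and_angle(eye):
    """Leftmost/rightmost points of the binary Eye image are taken as
    the eye corners; the returned angle would level them horizontally."""
    ys, xs = np.nonzero(eye)
    left = (ys[xs == xs.min()].mean(), xs.min())    # (row, col) of left corner
    right = (ys[xs == xs.max()].mean(), xs.max())
    angle = np.degrees(np.arctan2(right[0] - left[0], right[1] - left[1]))
    return left, right, angle
```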

B. Ternary eye-template generation
In modeling an eye template, an eye usually consists of a circle representing the boundary of the iris to its surroundings and parabolic curves representing the upper and lower eyelids [4, 20, 21]. From the binary Eye image, the parameters needed to represent an eye can be obtained. In this method, a ternary eye template having the same size as the eye-mask is generated, again consisting of three components: iris, sclera and skin. The upper eyelid is modeled by the parabola

y = h_U − (4·h_U / w²)·x²

and the lower eyelid by

y = (4·h_L / w²)·x² − h_L.

The region inside the two boundaries is assigned the value 1. The iris is modeled by a circle of radius r_Iris centered at the same position as in the eye-mask:

(x − x_Iris)² + (y − y_Iris)² = r_Iris².

The iris region bounded by the two eyelids and the circle is assigned the value 0, and the remaining pixels are assigned 0.5 to represent skin. The resulting ternary eye-template is depicted in Fig. 8.

Fig.8. Ternary eye-template

C. Hamming distance-based ternary eye matching
The Hamming distance measures the disagreement of symbols in corresponding positions between two strings of equal length. For verification of an eye candidate, the ternary Hamming distance is computed to assess the similarity between the ternary eye-mask and the ternary eye-template:

HD = (TEM ⊗_T TET) / |TEM|          (12)

where TET is the ternary eye-template, the numerator counts the disagreeing positions, |TEM| is the number of pixels in the mask, and ⊗_T is a ternary XOR-like operation defined as

p ⊗_T q = { 0 if p = q; 1 otherwise }          (13)

IV. EXPERIMENTAL RESULTS
The proposed algorithm is evaluated on a collection of 1,000 face images from the color FERET face database [24, 25] and 353 facial images collected by the authors. The FERET collection includes frontal and half-profile images of subjects of European, Asian and African descent. The experiment on this collection shows that the eye-verifier is sensitive to the quality of the face images, especially in the eye regions, since the proposed method requires a distinction between the three classes of sclera, pupil and skin. The detection rate is 87.2%, and the main cause of failure is unsuccessful skin detection. In all experiments, a Hamming threshold of 0.4 is used for verification.

Fig.9 Examples from FERET database
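To make the template generation of Section III-B and the matching of Section III-C concrete, the sketch below builds a ternary template from eyelid heights h_U and h_L, eye width w and iris radius r_Iris, and scores a mask against it with the ternary Hamming distance of Eqs. (12)-(13). The sizing and discretization choices here are our assumptions, not the paper's exact implementation.

```python
import numpy as np

def ternary_eye_template(w, h_u, h_l, r_iris):
    """Ternary eye-template: parabolic eyelids y = h_U - (4 h_U / w^2) x^2
    and y = (4 h_L / w^2) x^2 - h_L, with an iris disc of radius r_iris
    placed at the template center (the paper places it at the center
    found in the eye-mask): 0 iris, 1 sclera, 0.5 skin."""
    height, width = 2 * max(h_u, h_l) + 1, w + 1
    ys, xs = np.mgrid[0:height, 0:width]
    x = xs - w / 2.0                      # x measured from the eye center
    y = max(h_u, h_l) - ys                # y grows upward from the center line
    upper = h_u - (4.0 * h_u / w ** 2) * x ** 2
    lower = (4.0 * h_l / w ** 2) * x ** 2 - h_l
    between = (y <= upper) & (y >= lower)
    iris = x ** 2 + y ** 2 <= r_iris ** 2
    tet = np.full((height, width), 0.5)   # skin everywhere by default
    tet[between] = 1.0                    # inside the eyelids: sclera
    tet[between & iris] = 0.0             # iris bounded by eyelids and circle
    return tet

def ternary_hamming(tem, tet):
    """Eqs. (12)-(13): fraction of positions where the two maps differ."""
    return (tem != tet).mean()
```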

To illustrate the robustness of the proposed method to the variety of eye directions and face poses, a face database was created from 10 volunteers. For each volunteer, 35 to 40 images were captured to cover almost all face and eye directions, including face roll, yaw and pitch, facial expressions, eyes looking left, right, up and down, and different illumination conditions. The database also includes cases in which only one eye is visible because of an eye-patch, or one eye is closed, as depicted in Fig. 10. The summary results are reported in Table 1. The output of the proposed method is not only the eye positions but also accurate eye feature points, which are valuable for evaluating eye conditions.

Fig. 11 shows some examples of failed detection. The proposed method fails to verify the eyes when the face image is nearly in profile, since the eyes are then deformed and differ from the template. In addition, the eye-verifier cannot work when the eye is opened too narrowly for the sclera and pupil to be detected.

Table 1: Performance summary on our face pose database

| Image type                                                                 | No. of images | Detection rate (%) | Error rate (%) |
| Head poses (roll, yaw, pitch)                                              | 108           | 87.04              | 1.85           |
| Eye directions (look left, right, up, down)                                | 80            | 97.5               | 0              |
| Expressions (neutral, smile, surprise, sad, eye squint, angry, mouth open) | 70            | 92.86              | 4.28           |
| Wearing eyeglasses                                                         | 95            | 85.26              | 3.15           |


Fig.11 Failed cases of eye detection. a) narrow-opening eyes; b) nearly profile eye; c) shadows on eyes; and d) reflections on eyes

V. CONCLUSION
Geometric symmetry information is widely used in eye detection methods for verification of eye pairs. However, this information is sensitive to the pose of the face and requires both eyes to be visible. This paper proposes an eye verification method that verifies each eye separately, based on knowledge of the eye geometry and eye features. The proposed method showed robust performance across a variety of face poses and eye directions. However, it is still not robust enough to handle low-quality images. Further effort will be dedicated to solving this problem and to evaluating the method on more public databases for a thorough comparison with other methods.

Fig.10 Examples from our own database showing the robustness of the proposed method to different head poses and eye directions

REFERENCES
[1] A. Senior, R.L. Hsu, M.A. Mottaleb, and A.K. Jain, "Face Detection in Color Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(5): pp. 696-706.
[2] J. Wu and Z.H. Zhou, "Efficient face candidates selector for face detection," Pattern Recognition, 2003, 36: pp. 1175-1186.
[3] L.G. Farkas, T.A. Hreczko, and M.J. Katic, "Craniofacial norms in North American Caucasians from birth (one year) to young adulthood," in L.G. Farkas, ed., Anthropometry of the Head and Face, Raven Press: New York, 1994, pp. 241-335.
[4] A.L. Yuille, P.W. Hallinan, and D.S. Cohen, "Feature extraction from faces using deformable templates," Int. J. Comput. Vision, 1992, 8(2): pp. 99-111.
[5] K.M. Lam and H. Yan, "An Improved Method for Locating and Extracting the Eye in Human Face Images," in Proceedings, IEEE International Conference on Pattern Recognition (ICPR'96), Vienna, Austria, 1996, pp. 411-415.
[6] Y. Li, X. Qi, and Y. Wang, "Eye detection using fuzzy template matching and feature-parameter-based judgement," Pattern Recognition Letters, 2001, 22(10): pp. 1111-1124.
[7] D. Balya and T. Roska, "Face and eye detection by CNN (Cellular Neural Network) algorithms," Journal of VLSI Signal Processing, 1999.
[8] W.M. Huang and R. Mariani, "Face detection and precise eyes location," in Proc. Int. Conf. on Pattern Recognition (ICPR'00), 2000.
[9] S. Kawato and N. Tetsutani, "Real-time detection of between-the-eyes with a circle frequency filter," in Proc. of ACCV2002, 2002, pp. 442-447.
[10] S. Sirohey and A. Rosenfeld, "Eye detection in a face image using linear and nonlinear filters," Pattern Recognition Letters, 2001, 34: pp. 1367-1391.
[11] T. Kawaguchi, D. Hidaka, and M. Rizon, "Detection of eyes from human faces by Hough transform and separability filter," in Proceedings of Int. Conf. on Image Processing, 2000, pp. 49-52.
[12] K. Peng, L. Chen, S. Ruan, and G. Kukharev, "A Robust Algorithm for Eye Detection on Gray Intensity Face without Spectacles," Journal of Computer Science & Technology, 2005, 5(3): pp. 127-132.
[13] J. Song, Z. Chi, and J. Li, "A robust eye detection method using combined binary edge and intensity information," Pattern Recognition, 2006, 39: pp. 1110-1125.
[14] N.V. Huan and K. Hakil, "Robust iris segmentation via simple circular and linear filters," J. Electron. Imaging, 2008, 17(4): 043027.
[15] J. Yang and A. Waibel, "Tracking human faces in real time," Carnegie Mellon Univ., 1995.
[16] Y.G. Kim, J.P. Goh, and H.R. Byon, "Facial region extraction based on multi channel skin color model," Information and Science Society's Collection of Papers, 2001, 28(2): pp. 433-435.
[17] K. Sobottka and I. Pitas, "Face localization and facial feature extraction based on shape and color information," in Proceedings of IEEE International Conference on Image Processing, 1996, Vol. III, pp. 483-486.
[18] M. Hsu, …, "Face feature detection and model design for 2-D scalable model based video coding," in Proceedings of IEEE International Conference on Visual Information Engineering, 2003, pp. 125-128.
[19] C.C. Chiang and C.J. Huang, "A robust method for detecting arbitrarily tilted human faces in color images," Pattern Recognition Letters, 2005, 26: pp. 2518-2536.
[20] R.P. Wildes, "Iris Recognition: An Emerging Biometric Technology," in Proceedings of the IEEE, 1997, pp. 1348-1363.
[21] J. Daugman, "How iris recognition works," IEEE Trans. Circuits and Systems for Video Technology, 2004, 14(1): pp. 21-30.
[22] M.H. Khosravi and R. Safabakhsh, "Human Eye Sclera Detection and Tracking Using a Modified Time-Adaptive Self-Organizing Map," Pattern Recognition, 2008, 41: pp. 2571-2593.
[23] ICAO report, http://www2.icao.int/en/MRTD/Pages/Downloads.aspx
[24] P.J. Phillips, H. Wechsler, J. Huang, and P. Rauss, "The FERET database and evaluation procedure for face recognition algorithms," Image and Vision Computing J., 1998, 16(5): pp. 295-306.
[25] P.J. Phillips, H. Moon, S.A. Rizvi, and P.J. Rauss, "The FERET Evaluation Methodology for Face Recognition Algorithms," IEEE Trans. Pattern Analysis and Machine Intelligence, 2000, 22: pp. 1090-1104.