Article

Real-Time Detection and Measurement of Eye Features from Color Images

Diana Borza 1, Adrian Sergiu Darabant 2 and Radu Danescu 1,*

1 Computer Science Department, Technical University of Cluj Napoca, 28 Memorandumului Street, Cluj Napoca 400114, Romania; [email protected]
2 Computer Science Department, Babes Bolyai University, 58-60 Teodor Mihali, C333, Cluj Napoca 400591, Romania; [email protected]
* Correspondence: [email protected]; Tel.: +40-740-502-223

Academic Editors: Changzhi Li, Roberto Gómez-García and José-María Muñoz-Ferreras
Received: 28 April 2016; Accepted: 14 July 2016; Published: 16 July 2016
Abstract: The accurate extraction and measurement of eye features is crucial to a variety of domains, including human-computer interaction, biometry, and medical research. This paper presents a fast and accurate method for extracting multiple features around the eyes: the center of the pupil, the iris radius, and the external shape of the eye. These features are extracted using a multistage algorithm. In the first stage, the pupil center is localized using a fast circular symmetry detector and the iris radius is computed using radial gradient projections; in the second stage, the external shape of the eye (the eyelids) is determined through a Monte Carlo sampling framework based on both color and shape information. Extensive experiments performed on different datasets demonstrate the effectiveness of our approach. In addition, this work provides eye annotation data for a publicly-available database.

Keywords: iris segmentation; sclera segmentation; eye center localization; shape analysis; eyelid segmentation
1. Introduction

The eyes and their movements can be used as an unobtrusive means to gain a deeper understanding of one's cognitive and neurological processes: studies show that facial expressions are the major interaction modality in human communication, as they contribute 55% of the meaning of the message. Robust eye detection and tracking algorithms are crucial for a wide variety of disciplines, including human-computer interaction, medical research, optometry, biometry, marketing research, and automotive design (notably through driver attention monitoring systems).

In today's context of globalization, security is becoming more and more important. Biometry refers to a set of methods that rely on distinctive, measurable characteristics used to describe individuals for the purpose of identification and access control. As opposed to traditional methods of identification (token-based or knowledge-based identification), biometric data is unique and permanently associated with an individual. Iris recognition is growing in popularity and is currently used in a broad set of applications. The first step in any iris recognition process is the accurate segmentation of the region of interest: the localization of the inner and outer boundaries of the iris and the removal of eyelids, eyelashes, and any specular reflections that may occlude the region of interest. Secondly, the iris region is transformed into a rubber-sheet model and a unique bit pattern encoding is computed from these pixels.

Recently, a new ocular biometric technique has received particular interest from the scientific community: sclera recognition [1]. The sclera region (the white outer coat of the eye) comprises a unique and stable blood vessel structure which can be analyzed to identify humans. Moreover,
this method can complement iris biometry by increasing its performance in the case of off-angle or off-axis images.

The major challenge that ocular biometry poses is the accurate segmentation of the regions of interest (the iris or the sclera). An incorrect segmentation can reduce the available information, but it can also introduce other patterns (eyelashes, eyelids) that can influence the biometric applicability of the trait.

This work focuses on the accurate segmentation of the eye: the iris region and the external shape of the eye (the eyelids). Therefore, this paper's results are relevant in a variety of domains, ranging from iris and sclera biometrics to psychology, psychiatry, optometry, education, and driving assistance monitoring. This paper highlights the following contributions:
- A fast and accurate iris localization and segmentation algorithm;
- A new method for representing the external shape of the eyes based on six control points used to generate two parabolas for the upper and lower eyelids;
- An original sclera segmentation algorithm based on both color and shape constraints (as opposed to other methods from the specialized literature, which are based solely on color information [2–4]);
- A Monte Carlo segmentation algorithm based on the proposed model and the corresponding matching methodology; and
- The annotation [5] of a publicly-available database with the eye positions and eyelid boundaries.
The rest of this paper is organized as follows. Section 2 presents a general overview of the most relevant research literature on eye localization and iris and sclera segmentation. The proposed method is detailed in Section 3. Experimental results on eye localization and segmentation are presented in Section 4. The conclusions and future research directions are presented in Section 5.

2. Related Work

Over the past decades, eye localization and tracking has been one of the most active research areas in the field of computer vision, mainly due to its applicability to a wide variety of research fields. The organization of all the eye localization methods into a single and general taxonomy proves to be a challenging task. Based on the light source, eye detection methods can be classified as active light methods (which rely on infrared or near-infrared light sources) and passive light methods (which use visible light to illuminate the eyes). Two general types of active light method exploit a physical property of the pupil that modifies its appearance in the captured image depending on the position of the IR illuminator: bright pupil and dark pupil. When the IR illuminator is coaxial with the camera, the pupil appears as a bright zone (bright pupil), and when the light source is offset from the optical path the pupil appears dark (dark pupil). This work uses a passive light method.

In [6] a detailed survey on eye detection and tracking techniques is presented, with emphasis on the challenges they pose, as well as on their importance in a broad range of applications. The authors propose a three-class taxonomy based on the model used to represent the eyes: shape-based methods, appearance-based methods, and hybrid methods. Shape-based methods use a prior model of the eye geometry and its surrounding texture together with a similarity measure. Appearance-based methods detect and track eyes based on the color distribution or filter responses of the eye region. These methods require a large amount of training data representing the eyes under different illumination conditions and face poses. Finally, hybrid methods combine two or more approaches to exploit their benefits and to overcome their drawbacks. As an example, the constrained local models (CLM) [7] framework, a recently emerged promising method for facial feature detection, uses a shape model to constrain the locations where features might appear and a patch model to describe the appearance of each feature. In [8] a hybrid method is proposed to extract features of the human face. First, a Viola-Jones face detector
is applied to approximate the location of the face in the input image. Next, individual facial feature detectors are evaluated and combined based on shape constraints. Finally, the results are refined using active appearance models tuned for edge and corner cues. A three-stage facial feature detection method is presented in [9]. In the first stage, the face region is localized based on the Hausdorff distance between the edges of the input image and a template of facial edges. The second stage uses a similar, smaller model for the eyes. Finally, the pupil locations are refined using a multi-layer perceptron trained with pupil-centered images.

Other methods localize several features on the face using local landmark descriptors and a predefined face shape model. In [10] the authors propose a new approach to localize features in facial images. Their method uses local detectors for each feature and combines their output with a set of global models for the part locations computed from a labeled set of examples. To model the appearance of each feature, a sliding window detector based on a support vector machine (SVM) classifier with gray-scale scale invariant feature transform (SIFT) features is used. The work presented in [11] localizes nine facial landmarks in order to investigate the problem of automatic labeling of characters in movies. In the first stage, the position and the approximate scale of the face are detected in the input image. The appearance of each feature is determined using a variant of the AdaBoost classifier and Haar-like image features. Next, a generative model of the feature positions is combined with the appearance score to determine the exact position of each landmark.

The approach presented in this paper uses a fast shape-based approach to locate the centers of the irises. After the accurate iris segmentation, the external shape of the eyes (the eyelids) is segmented based on color information and shape constraints. The entire method takes, on average, 20 ms per processed image.

In general, shape-based eye localization techniques impose a circularity shape constraint to detect the iris and the pupils [12,13], making the algorithms suitable only for near-frontal images. In [12], the authors developed a fast eye center localization method based on image gradients: the center of the circular pattern of the iris is defined as the region where most of the image gradients intersect. An additional post-processing step, based on prior knowledge of the appearance of the eyes, is applied to increase the robustness of the method on images where the contrast between the sclera and the iris is low and occlusions (hair, eyelashes, glasses, etc.) are present. In [14] a multi-stage circular Hough transform is used to determine the iris center and its radius. Additionally, a similarity measure for selecting the iris center is computed based on the circularity measure of the Hough transform, the distance between the hypothetical iris centers, and the contrast between the presumed iris and its background. In [15] the eyes are localized using two distinctive features of the iris compared to the surrounding area: the eye region has an unpredictable local intensity and the iris is darker than the neighboring pixels. The eye centers are selected using a score function based on the entropy of the eye region and the darkness of the iris.
With the development of ocular biometrics, the accurate localization of the iris and pupil area has drawn the attention of computer vision scientists. Iris and pupil segmentation were pioneered by Daugman [13], who proposed a methodology that is still in use: an integro-differential operator which uses a circular integral to search for the circular path where the derivative of the integral is maximal. The method searches for the circular contour that generates the maximum intensity change in pixel values by varying the center (c_x, c_y) and the radius of the path. This operator has a high computational complexity (for each possible center, multiple radii scans are necessary) and it has problems detecting the iris boundary in cases of low intensity separability between the iris and the sclera region. Another influential approach [16] uses a two-stage iris segmentation method: first, a gradient-based binary edge map is created, followed by a circular Hough transform to find the parameters of the iris circle. The binary map is generated so that it favors certain ranges of orientation (for example, to delimit the iris-sclera boundary, image derivatives are weighted to be more sensitive to
vertical edges). The main disadvantage of this method is its dependence on the threshold values used in the edge map construction phase.

Other shape-based eye detection studies use a more complex model of the eye [17,18] by also modeling the eyelids, but they are computationally demanding and their performance is strongly linked to the initial position of the template. In [18] the eyes are extracted using a deformable template which consists of a circle for describing the iris and two parabolas for the upper and lower eyelids. The template is matched over the input image using energy minimization techniques. A similar approach is used in [17], but the authors use information about the location of the eye corners to initialize the template.

Eye corner detection has gained the attention of several research works, as it is relevant for multiple domains, such as biometrics, assisted driving systems, etc. In [19] the authors propose a new eye corner detection method for periocular images that simulate real-world data. First, the iris and the sclera region are segmented in the input image to determine the region of interest. Next, the eye corners are detected based on multiple features (the response of the Harris corner detector, the internal angle between the two corner candidates, and their relative position in the ROI). The method gives accurate results on degraded data, proving its applicability in real-world conditions.

Recently, sclera segmentation has been shown to have particular importance in the context of unconstrained, visible wavelength iris and sclera biometrics, and several works have started to address this problem. In addition, sclera segmentation benchmarking competitions [20] were organized in order to evaluate the recent advances in this field and to attract researchers' attention towards it. The sclera region is usually segmented based on color information. In [21] the sclera region is roughly estimated by thresholding the saturation channel in the HSI color space; based on this segmentation, the region of interest for the iris is determined and the iris is segmented using a modified version of the circular Hough transform. Finally, the eyelid region that overlaps the iris is segmented using a linear Hough transform. Other methods [2–4] use more complex machine vision algorithms to segment out the sclera. In [4] Bayesian classifiers are used to decide whether pixels belong to the sclera region or to the skin region, using the differences between the red and green, and blue and green channels of the RGB color space. In [2] three types of features are extracted from the training images: color features that illustrate the various relationships between pixels in different color spaces, Zernike moments, and histogram of oriented gradients (HOG) features (the sclera has significantly fewer edges than other regions around the eyes). A two-stage classifier is trained to segment out the sclera region. The classifiers from the first stage operate on pixels, and the second-stage classifier is a neural network that operates on the probabilities output by the first stage. In [3] two classifiers are used: one for the sclera region and one for the iris region. The features used for the sclera classifier are Zernike moments and distinctive color features from different color spaces. The iris is localized based on three control points from the iris on the summation of two edge maps. Eyelids are segmented only near the iris area based on Canny edge detection and parabolic curve fitting.
3. Solution Description

3.1. Iris Segmentation

The outline of the iris segmentation algorithm proposed in this paper is depicted in Figure 1. First, the face area is localized using the popular Viola-Jones face detector [22]. This method uses simple rectangular features (Haar-like features) and an efficient image representation, the integral image transform, which allows the response of these features to be computed very quickly. The face region is segmented using the AdaBoost algorithm and a cascade of increasingly complex classifiers, which rapidly discards the background pixels.

On the face area, the eye region is computed using facial proportions. Additionally, the eyebrow area is roughly estimated using simple image projections. The width of the detected face image is cropped to 70% (15% of the image width is removed on each side) in order to ensure that the background and the hair are removed from the image. The vertical image projection is computed on the gray-scale intensity image and the first local minimum (projection valley) is selected as the rough estimate of the eyebrow region (Figure 2).
Figure 1. Iris segmentation algorithm.
Figure 2. Eye region selection. (a) Cropped face image and vertical image projection (on the right); the local minima of the image projection are drawn with blue lines; and (b) the selected eye area (the green rectangle) and the estimated position of the eyebrow row (the red line).
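As a rough illustration of the eye-region selection step above, the following sketch crops the detected face laterally and searches the vertical intensity projection for its first valley; the 70%/15% crop follows the text, while the function name and the simple valley test are illustrative assumptions.

```python
import numpy as np

def eye_region_and_eyebrow(face_gray):
    """Crop the face laterally and estimate the eyebrow row from the
    vertical projection of the gray-scale intensities (see Figure 2)."""
    h, w = face_gray.shape
    # Keep the central 70% of the face width (15% removed on each side)
    # to discard background and hair.
    crop = face_gray[:, int(0.15 * w): int(0.85 * w)]

    # Vertical image projection: mean intensity of each row.
    proj = crop.mean(axis=1)

    # First local minimum (projection valley) from the top is taken as a
    # rough estimate of the eyebrow row.
    eyebrow_row = None
    for y in range(1, len(proj) - 1):
        if proj[y] < proj[y - 1] and proj[y] <= proj[y + 1]:
            eyebrow_row = y
            break
    return crop, eyebrow_row
```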
3.1.1. Iris Center Localization

Circular regions in the eye area are detected using a radial symmetry detector [23] (Fast Radial Symmetry Transform—FRST). The FRST is a gradient-based image operator that computes the role that each pixel p plays in the symmetry of the neighboring pixels at a radius r, by accumulating the orientation and magnitude contributions in the direction of the gradient.

For each pixel p, a positively-affected pixel p+ and a negatively-affected pixel p− are computed (Figure 3); the positively-affected pixel is defined as the pixel the gradient is pointing to, at a distance r from p, and the negatively-affected pixel is the pixel the gradient is pointing away from, at a distance r from p:

p_{\pm} = p \pm \mathrm{round}\left( \frac{g(p)}{\| g(p) \|} \, r \right)    (1)

where g(p) is the gradient vector at pixel p.

Figure 3. Positively- (p+) and negatively- (p−) affected pixels determined by the gradient element g(p) for a radius r (after [23]).
The orientation projection image O_r and the magnitude projection image M_r are updated at each step based on the positively- and negatively-affected pixels:

O_r(p_{\pm}) = O_r(p_{\pm}) \pm 1, \qquad M_r(p_{\pm}) = M_r(p_{\pm}) \pm \| g(p) \|    (2)

Initially, the orientation projection and magnitude projection images are set to zero.

For each radius r the symmetry transform is defined as the convolution (*):

S_r = F_r \ast A_r    (3)

where:

F_r(p) = \widetilde{M}_r(p) \, \big| \widetilde{O}_r(p) \big|^{\alpha}    (4)

A_r is a two-dimensional Gaussian filter used to spread the symmetry contribution, \widetilde{O}_r and \widetilde{M}_r are the normalized orientation and magnitude projection images, and α is the radial strictness parameter.

The full transform S is computed as the sum of the symmetry contributions over all the radii considered:

S = \sum_{r} S_r    (5)
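A compact sketch of the FRST computation of Equations (1)–(5), restricted (as described in the next paragraph) to dark regions of symmetry, assuming OpenCV/NumPy; the gradient threshold and the Gaussian width used for A_r are illustrative choices, not values from the paper.

```python
import cv2
import numpy as np

def frst_dark(gray, radii, alpha=2.0, grad_thresh=10.0):
    """Fast Radial Symmetry Transform (Equations (1)-(5)), accumulating only
    the negatively-affected pixels so that dark circular regions (the iris)
    stand out. The result is negated so that dark symmetric regions appear
    as minima, matching the candidate selection described in the text."""
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    mag = np.hypot(gx, gy)
    h, w = gray.shape
    S = np.zeros((h, w), dtype=np.float64)

    ys, xs = np.nonzero(mag > grad_thresh)          # ignore weak gradients
    ux = gx[ys, xs] / mag[ys, xs]
    uy = gy[ys, xs] / mag[ys, xs]

    for r in radii:
        O = np.zeros((h, w), dtype=np.float64)      # orientation projection
        M = np.zeros((h, w), dtype=np.float64)      # magnitude projection
        # Negatively-affected pixels: the gradient points away from them (Eq. 1).
        nx = np.clip(np.round(xs - r * ux).astype(int), 0, w - 1)
        ny = np.clip(np.round(ys - r * uy).astype(int), 0, h - 1)
        np.add.at(O, (ny, nx), 1.0)                 # Eq. 2 (orientation counts)
        np.add.at(M, (ny, nx), mag[ys, xs])         # Eq. 2 (gradient magnitudes)
        O_n = O / (O.max() + 1e-9)                  # normalized projections
        M_n = M / (M.max() + 1e-9)
        F = M_n * (O_n ** alpha)                    # Eq. 4
        S += cv2.GaussianBlur(F, (0, 0), sigmaX=0.25 * r)  # Eq. 3: F_r * A_r
    return -S                                       # Eq. 5, sign flipped
```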
The transform can be adapted to search only for dark or bright regions of symmetry: dark regions can be found by considering only the negatively-affected pixels, while bright regions are found by considering only the positively-affected pixels when computing O_r and M_r.

As the iris region is darker than the surrounding area (the skin and sclera), only negatively-affected pixels are used to compute O_r and M_r. The search radii are determined based on anthropomorphic constraints: the eye width is approximately 1/5 of the face width [24], and the ratio of iris width to eye width is about 0.42 in young adulthood and middle age and increases with age.

The centers of the irises are selected using geometrical and appearance constraints. The FRST image is scanned using a sliding window of size 2 × r_min, where r_min is the minimum radius for which the transform is computed, and the minimum intensity position from each window is retained. The minimum positions that are too close to each other (their distance is less than r_min) are merged together, and the filtered minima are considered as possible iris candidates.

The iris centers are selected based on the circularity score from the symmetry transform image and geometrical conditions (the left iris center and the right iris center must be positioned in the first and second half of the face width, respectively). Additionally, the confidence of candidates positioned above the eyebrow area is penalized.

Figure 4 shows the result of the iris center localization module.
Figure 4. Iris center localization. (a) Symmetry transform on the eye region; (b) possible iris center candidates in green; and (c) iris selection: left iris candidates in green, right iris candidates in cyan, and the selected iris centers in red.
Finally, after the initial approximation of the iris center, a small neighborhood of (r_min/2, r_min/2) is analyzed and the center of the iris is constrained to the dark region of the pupil, similar to [12].
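The candidate selection and refinement just described could look roughly like the sketch below, operating on the (dark-region) FRST image in which iris centers appear as minima; the window stride, the penalty applied above the eyebrow row, and the helper names are assumptions for illustration.

```python
import numpy as np

def iris_center_candidates(S, r_min, eyebrow_row, face_width):
    """Scan the FRST image with a 2*r_min sliding window, keep the
    per-window minima, merge minima closer than r_min, and pick one
    candidate per face half (a sketch of the selection rules in the text)."""
    h, w = S.shape
    step = 2 * r_min
    candidates = []
    for y0 in range(0, h - step + 1, step):
        for x0 in range(0, w - step + 1, step):
            win = S[y0:y0 + step, x0:x0 + step]
            dy, dx = np.unravel_index(np.argmin(win), win.shape)
            candidates.append((x0 + dx, y0 + dy, win[dy, dx]))

    # Merge candidates that are closer than r_min (keep the stronger one).
    merged = []
    for x, y, score in sorted(candidates, key=lambda c: c[2]):
        if all((x - mx) ** 2 + (y - my) ** 2 >= r_min ** 2 for mx, my, _ in merged):
            merged.append((x, y, score))

    # Crude penalty: cancel the (negative) circularity score of candidates
    # located above the estimated eyebrow row.
    scored = [(x, y, score + abs(score) if y < eyebrow_row else (x, y, score)[2])
              and (x, y, score + abs(score)) if y < eyebrow_row else (x, y, score)
              for x, y, score in merged]

    # Left iris in the left half of the face, right iris in the right half.
    left = min((c for c in scored if c[0] < face_width / 2),
               key=lambda c: c[2], default=None)
    right = min((c for c in scored if c[0] >= face_width / 2),
                key=lambda c: c[2], default=None)
    # A final refinement (not shown) snaps each center to the darkest pixel
    # in a (r_min/2, r_min/2) neighborhood, as described above.
    return left, right
```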
3.1.2. Iris Radius Detection

The initial approximation of the iris radius range is based on geometrical face constraints. The problem now is to refine this rough estimate so that it best fits the real radius of the iris. We use a vertical Sobel derivative on a blurred ROI around each eye in order to emphasize the strong transition between the iris and the sclera area. To eliminate false transitions caused by potential image noise, the lowest k% of the gradients are ignored.

For each candidate radius r in the interval [r_min, r_max], a radial projection proj(r) is computed by adding the gradient values that are r pixels away from the iris center and span under the angles [θ_min, θ_max]:

\mathrm{proj}(r) = \sum_{\theta = \theta_{\min}}^{\theta_{\max}} \mathrm{sobel}\big( c_x \pm r \cos\theta, \; c_y \pm r \sin\theta \big)    (6)

where sobel(x, y) is the value of the gradient from the vertical Sobel image at the (x, y) coordinates, (c_x, c_y) is the location of the iris center, and θ_min, θ_max are the search angles.

The above proj(r) reaches its maximum value on the border between the sclera and the iris. We limit the angular coordinate θ to an interval [θ_min, θ_max] because the border between the iris and the sclera yields the strongest transition, while the transition between the iris and the eyelids is less evident. Often the upper and lower parts of the iris are occluded by the eyelids. The value for [θ_min, θ_max] was set to [−45°, 45°] through trial and error experiments.

The radial projection is analyzed and the most prominent peak is selected as the iris radius:

\mathrm{radius} = \underset{r}{\mathrm{argmax}} \; \mathrm{proj}(r)    (7)

Figure 5 illustrates the steps of the iris radius computation algorithm.
Figure 5. Iris radius computation. (a) Blurred eye region; (b) vertical Sobel derivative (with k = 20% of the gradients ignored); (c) radial projection of the image (the selected radius is depicted in green); and (d) the iris radius.
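A sketch of the radial gradient projection of Equations (6) and (7), assuming the "vertical Sobel derivative" refers to the x-derivative that responds to the (vertical) iris–sclera edges; the blur kernel and the angular sampling density are illustrative.

```python
import cv2
import numpy as np

def iris_radius(eye_gray, cx, cy, r_min, r_max,
                theta_range=(-np.pi / 4, np.pi / 4), k=0.2):
    """Radius whose circular arcs accumulate the strongest Sobel response
    between theta_min and theta_max (Equations (6) and (7))."""
    blurred = cv2.GaussianBlur(eye_gray, (5, 5), 0)
    # x-derivative: responds to the (vertical) iris-sclera edges.
    sobel = np.abs(cv2.Sobel(blurred, cv2.CV_64F, 1, 0, ksize=3))
    # Discard the lowest k fraction of the non-zero gradients (noise).
    nz = sobel[sobel > 0]
    if nz.size:
        sobel[sobel < np.quantile(nz, k)] = 0.0

    thetas = np.linspace(theta_range[0], theta_range[1], 45)
    h, w = sobel.shape
    best_r, best_proj = r_min, -1.0
    for r in range(r_min, r_max + 1):
        proj = 0.0
        for s in (+1, -1):               # both sides of the iris (the ± in Eq. 6)
            xs = np.clip(np.round(cx + s * r * np.cos(thetas)).astype(int), 0, w - 1)
            ys = np.clip(np.round(cy + s * r * np.sin(thetas)).astype(int), 0, h - 1)
            proj += sobel[ys, xs].sum()
        if proj > best_proj:             # Eq. 7: argmax over r
            best_r, best_proj = r, proj
    return best_r
```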
3.2. Eye Shape Segmentation

The outline of the eye shape extraction procedure is depicted in Figure 6; the method is color-based and it exploits the high contrast between the sclera and its surrounding area.

An offline learning module is used to determine the probabilities of a color belonging to the sclera region. The output of this module is a look-up table (LUT) that stores the computed probabilities.

The eye shape extraction algorithm uses Monte Carlo sampling and a voting procedure to select the eyelid shape from a uniformly generated space of eye shape hypotheses. The b fittest individuals contribute to the selected eye shape.
Figure 6. Eye shape detection algorithm.
3.2.1. Sclera Probability LUT

A selected region around the eyes is transformed into a probability space using machine learning techniques to determine the probability of pixels belonging to the sclera region, as described below.

The proposed approach adopts a pixel-based strategy for the classification and uses only color features: the hue component from the HSV color space, and the components of the RGB opponent color space. The RGB opponent color space is a combination of three values based on the channels of the opponent color space [25]:

\begin{pmatrix} O_1 \\ O_2 \\ O_3 \end{pmatrix} = \begin{pmatrix} (R - G)/\sqrt{2} \\ (R + G - 2B)/\sqrt{6} \\ (R + G + B)/\sqrt{3} \end{pmatrix}    (8)

The color information of the pixel is stored by the O1 and O2 channels, while component O3 represents the intensity information.

The extracted chromatic features are used to train a linear Support Vector Machine (SVM) classifier, which will be used to determine the probability of each pixel belonging to the sclera region. The training of the SVM was performed on randomly-selected patches of approximately 10 × 10 pixels around the eye area from a publicly available face database [26]. Thirty patches were used for the non-sclera region and 25 patches for the sclera region. Figure 7 shows some examples of sclera and non-sclera patches used for training.
Figure 7. Sclera (a) and non-sclera (b) patches used to train the SVM classifier. The non-sclera patches are selected from the skin and eyelashes region.
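A minimal sketch of the chromatic features and the linear SVM training described above, assuming scikit-learn and OpenCV; it uses the hue, O1, and O2 components as the feature vector (O3 carries only intensity), which is our reading of the text.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def color_features(bgr_pixels):
    """Hue (HSV) plus the O1/O2 opponent components of Equation (8)
    for an (N, 3) array of BGR pixel values."""
    px = np.asarray(bgr_pixels, dtype=np.uint8).reshape(-1, 1, 3)
    hue = cv2.cvtColor(px, cv2.COLOR_BGR2HSV)[:, 0, 0].astype(np.float64)
    b, g, r = [px[:, 0, i].astype(np.float64) for i in range(3)]
    o1 = (r - g) / np.sqrt(2.0)
    o2 = (r + g - 2.0 * b) / np.sqrt(6.0)
    return np.column_stack([hue, o1, o2])

def train_sclera_svm(patch_pixels_bgr, labels):
    """Linear SVM on sclera / non-sclera patch pixels; probability=True
    enables the Platt scaling mentioned below."""
    clf = SVC(kernel="linear", probability=True)
    clf.fit(color_features(patch_pixels_bgr), labels)
    return clf
```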
Following the learning process, Platt scaling [27] is performed to transform the binary output of the SVM classifier into probabilities. Other methods [3,4] use more complex features, such as edge information or Zernike moments. Taking into account that the total number of colors representable in a 24 bits/pixel image is finite, a lookup table can be used to store the probability of each possible color. Therefore, our method has the advantage that it can pre-calculate the probability of each possible pixel, replacing the computation of the probability with a simple array indexing operation.

Figure 8 shows the output of the classifier for an input image.
Figure 8. Sclera probability image (bright zones indicate a high sclera probability, while darker regions indicate a low sclera probability).
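Since a 24 bits/pixel image has a finite number of representable colors, the probability of every color can be pre-computed into a look-up table, as explained above; the sketch below does this in chunks with the classifier from the previous snippet (the 0/1 label convention is an assumption).

```python
import numpy as np

def build_sclera_lut(clf, chunk=1 << 18):
    """Pre-compute the sclera probability of all 2^24 colors so that, at run
    time, classifying a pixel becomes a single array lookup."""
    lut = np.empty(1 << 24, dtype=np.float32)
    for start in range(0, 1 << 24, chunk):
        codes = np.arange(start, min(start + chunk, 1 << 24), dtype=np.uint32)
        bgr = np.column_stack([codes & 0xFF,            # blue
                               (codes >> 8) & 0xFF,     # green
                               (codes >> 16) & 0xFF])   # red
        # predict_proba column 1 = probability of the "sclera" class (label 1).
        lut[start:start + len(codes)] = clf.predict_proba(color_features(bgr))[:, 1]
    return lut

# Run-time lookup for a BGR pixel (b, g, r):
#   prob = lut[b | (g << 8) | (r << 16)]
```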
3.2.2. Eye Model

The exterior eye shape is represented using two parabolas, one for the upper eyelid and one for the bottom eyelid. The following shape definition vector, containing the coordinates of the control points, is used to generate the shape of the eye:

X = \begin{bmatrix} c_x & c_y & t_x & t_y & b_x & b_y & tl_x & tl_y & tr_x & tr_y & bl_y & br_y \end{bmatrix}^T    (9)

where (c_x, c_y) are the (x, y) coordinates of the iris center, (t_x, t_y) is the middle point on the top eyelid, and (b_x, b_y) is the middle point on the bottom eyelid. For the top eyelid we define two other control points, (tl_x, tl_y) and (tr_x, tr_y), to the left and right of the middle point. These landmarks have two corresponding points on the bottom eyelid that share the same x coordinates; their y coordinates are defined by the offset values bl_y and br_y (Figure 9).

The coordinates of the control points of the eye external shape are expressed relative to the eye center in order to make the model invariant to translation.
Figure 9. Eye model using seven control points (12 values) that generates two intersecting parabolas.
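A sketch of how the 12-value shape vector of Equation (9) can be turned into the two eyelid parabolas and the eye corners; the exact convention for the bl_y/br_y offsets (added here to the y of the top lateral points) is an assumption, and `try_` is used because `try` is a Python keyword.

```python
import numpy as np

def eyelid_parabolas(X):
    """Upper/lower eyelid parabolas from the 12-value shape vector of
    Equation (9). Coordinates are relative to the iris center."""
    cx, cy, tx, ty, bx, by, tlx, tly, trx, try_, bly, bry = X
    # Upper eyelid: parabola through the two lateral points and the top middle point.
    upper = np.polyfit([tlx, tx, trx], [tly, ty, try_], deg=2)
    # Lower eyelid: same lateral x coordinates, y shifted by the bl_y / br_y
    # offsets, plus the bottom middle point (offset convention assumed).
    lower = np.polyfit([tlx, bx, trx], [tly + bly, by, try_ + bry], deg=2)
    return upper, lower        # coefficients [a, b, c] of y = a*x^2 + b*x + c

def eye_corners(upper, lower):
    """Eye corners: x coordinates where the two parabolas intersect."""
    roots = np.roots(np.asarray(upper) - np.asarray(lower))
    return np.sort(roots.real)
```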
3.2.3. Shape Matching

Using the described model, multiple hypotheses are generated and the sclera probability image is used to determine how well a hypothesis conforms to the input image. Ideally, the eyelid parabola should be positioned exactly on the boundary between sclera pixels (brighter region) and non-sclera pixels (darker region).

In order to express the above assumption we use a similar approach as in [28]. Two sets of pixels at a distance Δ away from the current hypothesis are considered: the positive pixels (p+), which are the pixels that belong to the sclera region, and the negative pixels (p−), which are the pixels from the skin and eyelashes zone (Figure 10).

The score of a hypothesis is defined as:

\omega = \frac{\alpha \sum_{\Delta} p_{+} - \beta \sum_{\Delta} p_{-}}{\alpha + \beta}    (10)

where α and β (α + β = 1) are the weights that control the influence of the positive and negative pixels, respectively.
Figure 10. Eyelid shape matching (for the top eyelid only). The dotted line represents the current hypothesis; the positive pixels are depicted in yellow and the negative pixels in red.
We assign same contribution to as: both sets of pixels, and therefore both weights, α and β, are The score ofthe a hypothesis is defined set toFigure 0.5. The metric computes the average difference between the probability values of 10. proposed Eyelid shape matching (for the top eyelid only). ř ř The dotted line represents the current α p ´ β p the inner part ofthe the eyelidpixels curve, must fit`inside of∆ the region and, thus, expected to hypothesis; positive arewhich depicted in ∆yellow and the negative in red. ´ sclerapixels ω“ (10) α `of βthe outer area of the curve, which must fit on have a high sclera probability value, and the values We assign the same to both setsaof pixels, and thereforevalue. both weights, α and β, are the outside of the eye and,contribution thus, expected to have low sclera probability where α and β (α + β = 1)metric are thecomputes weights that control the influence of positive and negative pixels, set toSeveral 0.5. The proposed the average difference between probability of variations of this metric were tested: to consider only the pixelsthe from the set {0,values Δ}, that respectively. the inner part eyelidaway curve, which fit hypothesis, inside of theorsclera region all and, expected to is the pixels at of Δ the distance from the must current to examine thethus, pixels in the in assign the same contribution to both of pixels, therefore weights, α and β, are set haveWe a high sclera value, thesets values of theand outer area ofboth theover curve, must on interval [1, Δ]. Theprobability best results wereand obtained by computing the metric all which the pixels infitthe to The proposed computes theto average thevalue. probability values of the the0.5. outside of the eyemetric and, thus, expected have a difference low sclera between probability interval [1, Δ]. innerSeveral part of the eyelid curve, which must inside the sclera regionpixels and, thus, the expected to variations of this wereisfit tested: to of consider only Δ},have that The value of Δ is set to a metric value that proportional to the iristhe radius (infrom our caseset5 {0, pixels). A aishigh sclera probability value, and the values of the outer area of the curve, which must fit on the the pixels at Δ distance away from the current hypothesis, or to examine all the pixels in the in lower value of this parameter does not propagate enough the transition between the sclera and the outside of the eye and, thus, expected to have a low sclera probability value. interval [1,regions Δ]. Theand bestisresults were obtained by while computing the value metriccould over lead all the in the the non-sclera very sensitive to noise, a higher to pixels missing Several variations of this metric were tested: to consider only the pixels from the set {0, ∆}, that is interval Δ]. region of[1, interest. the pixels at ∆ distance the current hypothesis, ortotothe examine all the(in pixels the5inpixels). interval The value of Δ is away set tofrom a value that is proportional iris radius our in case A [1, ∆]. The best results were obtained by computing the metric over all the pixels in the interval [1, ∆]. lower value of this parameter does not propagate enough the transition between the sclera and the The value of ∆ is set is to very a value that is proportional to the iris radius (in could our case 5 pixels). 
A lower non-sclera regions and sensitive to noise, while a higher value lead to missing the value of this parameter does not propagate enough the transition between the sclera and the non-sclera region of interest. regions and is very sensitive to noise, while a higher value could lead to missing the region of interest.
3.2.4. Eye Shape Selection

To find the exterior shape of the eye, a simple Monte Carlo sampling is used. Monte Carlo algorithms form a wide class of methods that are based on repeated resampling of random variables over an input domain, which defines the phenomenon in question. These methods have been successfully used to solve optimization problems with complex objective functions.

N hypotheses are uniformly generated over the input domain defined by the iris center position and the iris radius. Each hypothesis is matched over the input image using the method described in Section 3.2.3.

The solution is selected by considering the best b% hypotheses: each one of these shapes contributes to the solution with a weight that is directly proportional to its fitness value. In other words, the position of each control point of the result is computed as:
s(x, y) = \frac{\sum_i \omega_i \cdot p_i(x, y)}{\sum_i \omega_i}    (11)

where s(x, y) represents the (x, y) coordinates of a control point of the solution, and p_i(x, y) and ω_i are the position and the score function of the i-th hypothesis that votes for the solution.

Figure 11 graphically illustrates the voting process of the proposed algorithm using color maps.
Figure 11. Eye shape voting procedure. (a) The distribution of the markers obtained through the voting of the fittest b = 30% hypotheses; cold colors represent smaller values and warm colors represent higher values; (b) the solution control points (marked in green) computed as the weighted average of the fittest hypotheses; the intersection between the lower and the upper parabola is computed so that the solution can be expressed through the eye corners and two points on the upper and lower eyelids; and (c) color scale.
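The Monte Carlo selection and the weighted vote of Equation (11) could be sketched as below; `gen_hypothesis` and `score_fn` are placeholders for the uniform hypothesis generator and the matching score of Section 3.2.3.

```python
import numpy as np

def monte_carlo_eye_shape(prob_img, gen_hypothesis, score_fn, n=200, best=0.3):
    """Draw n uniformly generated shape hypotheses, score each one against
    the sclera probability image, and combine the best fraction by the
    weighted vote of Equation (11)."""
    hyps = [gen_hypothesis() for _ in range(n)]            # 12-value shape vectors
    scores = np.array([score_fn(prob_img, h) for h in hyps])

    k = max(1, int(best * n))
    idx = np.argsort(scores)[::-1][:k]                     # fittest hypotheses
    w = scores[idx]
    w = w - w.min() + 1e-9                                 # keep weights positive
    shapes = np.array([hyps[i] for i in idx], dtype=np.float64)

    # Weighted average of every control point (Equation (11)).
    return (shapes * w[:, None]).sum(axis=0) / w.sum()
```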
The optimal value for N was heuristically determined to be 200 samples. From our experiments we determined that increasing the number of samples of the Monte Carlo sampling does not cause a relevant increase in the accuracy of the algorithm, while a value of N lower than 100 has a negative influence on the accuracy. The results reported in Section 4 were obtained by generating N = 200 hypotheses and considering the fittest 30% hypotheses in the voting process.

The result of the eye shape segmentation algorithm is depicted in Figure 12.
Figure 12. Eye shape segmentation results.
4. Results and Discussion 4.1. Iris Center Localization The metric used to validate the performance of the eye center localization is the relative error introduced in [9]: the error obtained by the worst of both eye estimators, normalized with the distance between the eye centers: ´ ¯ xr || max ||Cl ´ Cpl ||, ||Cr ´ C wec ď (12) ||Cl ´ Cr || xr are the positions of the where Cl , Cr are the positions of the left and right iris centers, and Cpl , C estimated left eye and right iris centers. This metric is independent of the image size. Based on the fact that the distance between the inner eye corners is approximately equal to the width of an eye, the relative error metric has the following properties: if wec ď 0.25 the error is less than or equal to distance between the eye center and the eye corners, if wec ď 0.10 the localization error is less than or equal to the diameter of the iris, and finally, if wec ď 0.05 the error is less than or equal to the diameter of the pupil. In addition two other metrics were implemented like suggested in [12]: bec and aec which define the lower and the averaged error, respectively:
$$ bec \leq \frac{\min\left(\lVert C_l - \hat{C}_l \rVert, \lVert C_r - \hat{C}_r \rVert\right)}{\lVert C_l - C_r \rVert} \qquad (13) $$

$$ aec \leq \frac{\mathrm{avg}\left(\lVert C_l - \hat{C}_l \rVert, \lVert C_r - \hat{C}_r \rVert\right)}{\lVert C_l - C_r \rVert} \qquad (14) $$
where min() and avg() are the minimum and the average operators. The proposed iris center localization algorithm does not use any color information; color is used only by the sclera segmentation stage.
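For clarity, Equations (12)–(14) can be evaluated as in the minimal sketch below. The function and variable names are our own illustration and are not taken from the original implementation.

```python
import numpy as np

def eye_center_errors(c_left, c_right, est_left, est_right):
    """Normalized eye-center localization errors of Equations (12)-(14).

    All arguments are 2D points (x, y): annotated and estimated iris centers."""
    c_left, c_right = np.asarray(c_left, float), np.asarray(c_right, float)
    est_left, est_right = np.asarray(est_left, float), np.asarray(est_right, float)

    d_left = np.linalg.norm(c_left - est_left)      # error of the left estimate
    d_right = np.linalg.norm(c_right - est_right)   # error of the right estimate
    inter_ocular = np.linalg.norm(c_left - c_right)

    wec = max(d_left, d_right) / inter_ocular       # worst-case error, Eq. (12)
    bec = min(d_left, d_right) / inter_ocular       # best-case error,  Eq. (13)
    aec = 0.5 * (d_left + d_right) / inter_ocular   # average error,    Eq. (14)
    return wec, aec, bec

# wec <= 0.05 means that even the worse of the two estimates lies inside the pupil.
print(eye_center_errors((100, 120), (160, 121), (101, 121), (159, 120)))
```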
For comparison purposes, our iris center localization method is evaluated on the BIO-ID face database [9], one of the most challenging eye databases, which has been used for the validation of numerous eye localization methods. The dataset reflects realistic image capturing conditions, featuring a large range of illumination conditions, backgrounds, and face sizes. The database contains 1521 grey-scale images of 23 persons, captured during different sessions in variable illumination conditions. Moreover, some of the subjects in the database wear glasses, and in some images the eyes are (half) closed or occluded by strong specular reflections on the glasses. The resolution of the images is low (384 × 286 pixels). This dataset has been widely used to evaluate eye center localization methods and, therefore, it allows us to benchmark the results of our algorithm against prior work. Results on the BIO-ID face database are depicted in Figure 13 and the ROC curve is depicted in Figure 14.
Figure 13. Iris center localization results on the BIO-ID face database. (a) Iris center localization results; and (b) iris center localization results—failure cases.
Figure 14. Iris center localization results on the BIO-ID face database.
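Accuracy curves such as the one in Figure 14 report, for each error threshold, the fraction of test images whose normalized error stays below that threshold. A minimal sketch of this computation is given below; the names and the example values are ours, for illustration only.

```python
import numpy as np

def accuracy_curve(wec_values, thresholds=np.linspace(0.0, 0.25, 26)):
    """Fraction of images whose normalized error wec is <= t, for each threshold t."""
    wec_values = np.asarray(wec_values, float)
    return np.array([(wec_values <= t).mean() for t in thresholds])

# Example: per-image wec values collected on a test database.
curve = accuracy_curve([0.03, 0.04, 0.12, 0.02, 0.26, 0.07])
print(curve[[5, 10, 25]])  # accuracies at the thresholds 0.05, 0.10, and 0.25
```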
The comparison of our method with other state of the art works is shown in Table 1. If the performance for the normalized error ∈ {0.05, 0.10, 0.25} was not mentioned explicitly by the authors, we extracted the values from the performance curves; these values are marked with * in the table.
Table 1. Iris center localization compared to the state-of-the-art methods.

Method | wec ≤ 0.05 | wec ≤ 0.10 | wec ≤ 0.25
Timm et al. [12] | 82.5% | 93.4% | 98.0%
Hassaballah et al. [15] | 37% * | 84% * | 94.3%
Cristinacce et al. [8] | 57% * | 96% | 97.1%
Jesorsky et al. [9] | 38.0% * | 78.8% * | 91.8%
Proposed method | 74.6594% | 79.1553% | 98.09%

* The performance was not explicitly stated by the authors and it was read from the ROC curve.

In the case of pupil localization (wec ≤ 0.05) the proposed method is outperformed only by [12]. In the case of eye localization (wec ≤ 0.25) our method outperforms the other works. However, in the case of wec ≤ 0.10 the proposed algorithm is outperformed by three other state of the art methods [8,12,15]. This is due to the fact that our eye center localization algorithm relies mostly on circularity constraints, and the BIO-ID face database contains multiple images where the eyes are almost closed, in which case the circularity of the iris cannot be observed. Therefore, the accuracy of the algorithm is impaired. The transition between the cases wec ≤ 0.05 and wec ≤ 0.25 is smoother because in multiple images of the database the circularity of the iris is not observable due to occlusions and closed eyes. To sum up, the proposed algorithm yields accurate results (wec ≤ 0.05 in 74.65% of the images) for the images where the iris is visible, and acceptable results otherwise.

However, our method was designed for use cases, such as biometry, optometry, or human emotion understanding, in which the face is the main component under analysis and the facial region has medium to good quality. The BIO-ID face database is not adequate for this purpose due to the low quality of the images; we tested our algorithm on this database so that we can compare with other methods.

The proposed iris localization method based on the accumulation of first order derivatives obtains accurate results if the iris is relatively visible in the input image. The search region for the eyes often contains other elements, such as eyeglasses, eyebrows, hair, etc., and, if the iris is occluded in some way (semi-closed eyes or strong specular reflections on the eyeglasses), these external elements could generate a higher circularity response than the actual iris (Figure 15).

We try to filter out these false candidates by imposing appearance constraints (the pupil center must be darker than the surrounding area, so the symmetry transform image is weighted by the inverted, blurred gray-scale image) and several geometrical constraints: the separation of the left and right eye candidates and the penalization of eye candidates that are too close to the eyebrow area. Considering that we are interested in obtaining a pair of eye centers, we have also included a metric that models the confidence of the candidates as a pair. The score of a pair is weighted by a Gaussian function of the inter-pupillary distance normalized by the face width, having as mean the average ratio between the inter-pupillary distance and the face width [24]. However, this method was designed mainly for application domains (such as optometry, human-computer interaction, etc.) in which the user is cooperative and the eye is visible in the input image, and therefore these exaggerated occlusions are less likely to occur.
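The pair confidence term described above can be illustrated with the following sketch, in which the score of a candidate pair is weighted by a Gaussian of the inter-pupillary distance normalized by the face width. The mean ratio and standard deviation below are placeholder values chosen for illustration; they are not the values used in the paper.

```python
import math

def pair_confidence(score_left, score_right, center_left, center_right,
                    face_width, mean_ratio=0.46, sigma=0.08):
    """Weight the combined candidate score by a Gaussian of the normalized
    inter-pupillary distance (mean_ratio and sigma are illustrative values)."""
    ipd = math.dist(center_left, center_right) / face_width
    gauss = math.exp(-0.5 * ((ipd - mean_ratio) / sigma) ** 2)
    return (score_left + score_right) * gauss

# A pair whose normalized inter-pupillary distance is close to the expected ratio
# keeps most of its score; anatomically implausible pairs are strongly penalized.
print(pair_confidence(0.9, 0.8, (100, 120), (160, 121), face_width=130))
```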
Figure 15. Failure cases of eye center localization. (a) Symmetry transform image and selected eye candidates; and (b) selected eye candidates (on the input image).
As we will further demonstrate, the performance of the algorithm increases with the quality of the image, while keeping the computational cost low (on average, the eye center localization algorithm takes six milliseconds on an Intel Core i7 processor).

To test the proposed method we annotated a publicly-available face database [26], created by the University of Michigan Psychology department. In the rest of the paper we will refer to this database as the University of Michigan Face Database (UMFD). The database comprises facial images of 575 individuals with ages ranging from 18 to 93 and is intended to capture the representative features of age groups across the lifespan. The dataset contains pictures of 218 adults aged 18–29, 76 adults aged 30–49, 123 adults aged 50–69, and 158 adults aged 70 and older.
Six points were marked on each eye: the center of the pupil, the eye corners, the top and the bottom eyelids, and a point on the boundary of the iris (Figure 16). The database annotation data can be accessed from [5]. The structure of the annotation data is detailed in Appendix A.
Figure 16. Eye landmarks used in the annotation of the University of Michigan Face Database.
The ROC curves for the iris center localization on the age groups from the University of Michigan database are illustrated in Figure 17 and Table 2 shows the eye center localization results on this database.
Figure 17. Performance of the iris center localization algorithm on the University of Michigan Face Database.
From Table 2 it can be noticed that on medium quality facial images (640 × 480) the performance of the proposed algorithm is highly increased: in 96.30% of the cases, the worst of the two eye center approximations falls into the pupil area.
Table 2. Iris center localization results on the University of Michigan face database. All values are percentages.

Dataset | wec / aec / bec (Error ≤ 0.05) | wec / aec / bec (Error ≤ 0.10) | wec / aec / bec (Error ≤ 0.25)
UMFD age 18–29 | 99.08 / 99.08 / 100 | 99.08 / 98.08 / 100 | 99.08 / 100 / 100
UMFD age 30–49 | 97.36 / 98.68 / 100 | 98.68 / 98.68 / 100 | 98.68 / 100 / 100
UMFD age 50–69 | 95.12 / 96.74 / 98.37 | 98.37 / 100 / 100 | 100 / 100 / 100
UMFD age 70–94 | 93.63 / 95.54 / 97.45 | 96.81 / 98.09 / 99.36 | 98.09 / 98.72 / 99.36
Average | 96.30 / 97.51 / 98.96 | 98.71 / 98.71 / 99.84 | 98.96 / 99.68 / 99.84

The accuracy of the algorithm is lower (93.63%) for older subjects, aged between 70 and 94 years, due to the fact that a smaller portion of the iris is visible and its circularity cannot be observed.

In conclusion, the proposed eye localization method proves to be efficient on all the use cases considered: pupil localization (wec ≤ 0.05), iris localization (wec ≤ 0.10), and eye localization (wec ≤ 0.25).
4.2. Iris Radius Computation

To evaluate the accuracy of the iris radius estimation algorithm we compute the following normalized errors:

$$ aer = \frac{|\hat{r}_r - r_r| + |\hat{r}_l - r_l|}{r_l + r_r} \qquad (15) $$

$$ wer = \frac{2 \cdot \max\left(|\hat{r}_r - r_r|, |\hat{r}_l - r_l|\right)}{r_l + r_r} \qquad (16) $$

$$ ber = \frac{2 \cdot \min\left(|\hat{r}_r - r_r|, |\hat{r}_l - r_l|\right)}{r_l + r_r} \qquad (17) $$

where $r_l$ and $r_r$ are the radii of the left and right iris, and $\hat{r}_l$ and $\hat{r}_r$ are the estimated radii of the left and right eye, respectively. The wer (worst error radius) metric represents the radius error for the worst radius estimator, the aer (average error radius) is the average radius error for the left and right iris, and ber (best error radius) is the radius error for the best radius estimator. The errors are normalized by the average of the correct iris radii. Table 3 shows the performance of the iris radius computation algorithm on the different age groups from the University of Michigan Face Database.

Table 3. Iris radius computation results on the University of Michigan Face Database.

Dataset | aer | wer | ber
University of Michigan Face Database (age 18–29) | 0.0903 | 0.1269 | 0.053
University of Michigan Face Database (age 30–49) | 0.0885 | 0.1276 | 0.0495
University of Michigan Face Database (age 50–69) | 0.1247 | 0.1721 | 0.0774
University of Michigan Face Database (age 70–94) | 0.0928 | 0.1361 | 0.0495
Average | 0.0991 | 0.1406 | 0.05735
On average, the normalized aer value is 0.0991; in other words, the average error of the iris radius is less than 10% of the actual radius. Taking into account the fact that the iris radius is approximately 12 pixels on the images from the database, the magnitude of the error is about 1–2 pixels.

4.3. Eye Shape Segmentation

To evaluate the eye shape segmentation algorithm, we used several statistical measures of the performance by analyzing the proportion of pixels that are assigned to the eye or non-eye region. We computed the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) by comparing the results of the algorithm with the ground truth from the test databases.
The terms true (T) and false (F) refer to the ground truth, and the terms positive (P) and negative (N) refer to the algorithm's decision. Based on these values, the following statistical measures were determined:
$$ \mathrm{sensitivity} = \frac{TP}{TP + FN} \qquad (18) $$

$$ \mathrm{specificity} = \frac{TN}{TN + FP} \qquad (19) $$

$$ \mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (20) $$
Sensitivity (or recall) is a measure of the proportion of eye pixels that are correctly identified and indicates the algorithm's ability to correctly detect the eye region; specificity (or true negative rate) measures the proportion of non-eye pixels that are correctly identified as such and relates to the algorithm's ability to rule out the pixels that do not belong to the eye region. In other words, sensitivity quantifies the algorithm's ability to avoid false negatives, while specificity quantifies its ability to avoid false positives.
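A minimal sketch of how Equations (18)–(20) can be evaluated on binary segmentation masks is given below; the function and variable names are our own illustration.

```python
import numpy as np

def segmentation_scores(predicted, ground_truth):
    """Pixel-wise sensitivity, specificity, and accuracy of a binary eye mask."""
    predicted = np.asarray(predicted, bool)
    ground_truth = np.asarray(ground_truth, bool)

    tp = np.logical_and(predicted, ground_truth).sum()
    tn = np.logical_and(~predicted, ~ground_truth).sum()
    fp = np.logical_and(predicted, ~ground_truth).sum()
    fn = np.logical_and(~predicted, ground_truth).sum()

    sensitivity = tp / (tp + fn)                 # Eq. (18)
    specificity = tn / (tn + fp)                 # Eq. (19)
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. (20)
    return sensitivity, specificity, accuracy

pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
print(segmentation_scores(pred, truth))  # (0.666..., 0.666..., 0.666...)
```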
Eye shape segmentation results are illustrated in Figure 18. The detected iris is marked with a green circle and the upper and lower eyelid parabolas are depicted in yellow.

Figure 18. Eye shape segmentation results on the University of Michigan Face Database. (a) Eye shape extraction results; and (b) failure cases.
From the results on the University of Michigan Face Database it can be noticed that the accuracy of the algorithm decreases with age. On older subjects the sclera portion of the eye becomes less visible due to eyelid ptosis and skin excess around the eyes. The shape of the eyelid is distorted and it can no longer be estimated with a parabola (Figure 19). In addition, sclera color quality degradation [29] is a well-known effect of aging on the eyes that influences the performance of the algorithm. The loss of performance is about 1%–2% on average.
Figure 19. The effect of aging over the eye region.
The results of the eye shape segmentation algorithm are strongly dependent on the accuracy of the eye center localization. For example, in the last image from Figure 18 it can be seen that the performance of the algorithm is impaired due to the wrong eye center estimation.

To the best of our knowledge, full eye segmentation methods that we can compare with have not been reported previously in the specialized literature. An older work [17] uses a deformable template to find the full shape of the eye, but only the run time of the algorithm is reported.

The algorithm was also tested on an image database that is independent from our training set, the IMM frontal face database [30], which contains 120 facial images of 12 different subjects. All of the images are annotated with 73 landmarks that define the facial features for the eyebrows, nose, and jaws. The contour of each eye is marked with eight points (Figure 20).
Figure 20. The landmarks from the IMM face database denoting the eye region.
Results on the IMM face database are depicted in Figure 21 and the numerical scores are shown in Table 4.
Figure 21. Eye shape segmentation on the IMM face database.

Table 4. Performance of the eye shape segmentation algorithm on the UMFD and IMM datasets.
Dataset | Sensitivity | Specificity | Accuracy
UMFD (age 18–29) | 87.61% | 90.02% | 88.77%
UMFD (age 30–49) | 90.75% | 83.29% | 86.57%
UMFD (age 50–69) | 84.35% | 83.06% | 83.58%
UMFD (age 70–94) | 84.76% | 82.41% | 83.41%
IMM Face Database | 86.3% | 90.13% | 87.93%
Labeled Face Parts in the Wild (LFPW) [10] is a large, real-world dataset of hand-labeled images, acquired from Internet search sites using simple text queries. The images from the dataset are captured
in unconstrained environments and contain several elements that can impair the performance of the algorithm: the eyes are occluded by heavy shadowing or (sun-)glasses, hats, hair, etc., some faces contain a lot of make-up and present various (theatrical) facial expressions. The only precondition of the images is that they are detectable by a face detector. Each image was annotated with 29 fiducial points. The images contain the annotations of three different workers and the average of these annotations was used as the ground truth. Due to copyright issues, the image files are not distributed, but rather a list of URLs is provided from which the images can be downloaded. Therefore, not all the original images are still available, as some of the image links have disappeared. We have downloaded all the images that were still accessible (576 images) from the original set of images and we evaluated the performance of our method on this dataset. The mean errors of our algorithm compared to other state of the art works and a commercial off the shelf (COTS) system [10] are shown in Table 5.

Table 5. Mean error normalized by the inter-pupillary distance.

Landmark | [10] | [11] | COTS [10] | Our Method
Left eye error | 0.024 | n/a | n/a | 0.0412
Right eye error | 0.027 | n/a | n/a | 0.0440
Left eye inner corner | 0.0285 | 0.072 | 0.048 | 0.0677
Left eye outer corner | 0.035 | 0.077 | 0.076 | 0.0701
Left eye upper eyelid | 0.027 | n/a | n/a | 0.0503
Left eye bottom eyelid | 0.03 | n/a | n/a | 0.0564
Right eye inner corner | 0.028 | 0.055 | 0.047 | 0.0672
Right eye outer corner | 0.033 | 0.084 | 0.07 | 0.0694
Right eye upper eyelid | 0.027 | n/a | n/a | 0.0486
Right eye bottom eyelid | 0.028 | n/a | n/a | 0.0562
From Table 5 it can be noticed that our method is comparable with the COTS system and [11], but [10] is more accurate. However, [10] detects 29 fiducial points on the face and the total processing time for an image is 29 s. Our method takes on average 20 ms to find all of the six landmarks around the eyes, being several orders of magnitude faster than [10].

For the eye center localization, the average normalized error is 0.0426. A normalized error less than 0.05 implies that the detected iris center is within the pupil area. Therefore, our method has a good performance even on images that are degraded due to capturing conditions. For the sclera landmarks, the algorithm yields larger localization errors: on average the normalized distance between the sclera landmarks and the annotated landmarks is 0.0607. First, we note a difference of semantics between the annotated landmarks and the result of our algorithm: the proposed method is intended to segment the sclera region as accurately as possible and not to detect the position of the eye corners. While in some cases the sclera landmarks can determine the exact eye corners, this cannot be generalized. For example, from Figure 22 it can be noticed that the sclera region is correctly segmented but the distance between the annotated left eye inner corner and the corresponding sclera landmark is large; the normalized error between these two points is 0.0708.
Figure 22. Difference between the annotated eye corners and the detected sclera landmarks. It can be noticed that even if the sclera is correctly segmented, there is a larger distance between the annotated eye corners and the sclera landmarks. (a) Annotated landmarks on the LFPW database (in green) and detected sclera landmarks (in red); and (b) segmented sclera region.
In addition, some of the images in the database contain sunglasses that totally obstruct the eyes and in some other images the sclera is not visible, due to the low image resolution, and cannot be accurately segmented even by a human operator. The problems targeted by our solution are iris and sclera segmentation; in order to solve these problems the features under consideration must be visible in the input image. Figure 23 shows the results of our method on some images from the LFPW database. Due to the smaller resolution of the images, we only draw the landmarks (eye centers and the landmarks used to generate the eyelid parabolas).
Figure 23. Results on the LFPW database. The eye centers are depicted with a green cross and the landmarks used to generate the shape of the eyelids are marked with red crosses. (a) Results; and (b) failure cases.
The main application domains targeted by our method are optometry and ophthalmology, augmented reality, and human-computer interaction, where the quality of the images is usually medium to good and the user is cooperative.

The method is integrated into a virtual contact lens simulator application and into a digital optometric application that measures the iris diameter and the segment height (the vertical distance in mm from the bottom of the lens to the beginning of the progressive addition on a progressive lens) based on facial images. Snapshots of the virtual contact lens simulator application are presented in Figure 24, where different contact lens colors are simulated: dark-green, hazel, natural green, and ocean blue, respectively.
Figure 24. Demonstrator application—virtual contact lens simulator. On the first row: detection result (iris circle and sclera segmentation). On the next two rows: different contact lens simulations: natural green, dark green, ocean blue, and violet.
The average execution time for the full eye segmentation algorithm (the iris and the eyelids) is on average 20 ms on an Intel Core i7 processor for 640 × 480 resolution images.

5. Conclusions

This paper presents a fast eye segmentation method that extracts multiple features of the eye region: the center of the pupil, the iris radius, and the external shape of the eyes. Our work has superior accuracy compared to the majority of the state of the art methods, which measure only a subset of these features. The eye features are extracted using a multistage algorithm: first, the iris center is accurately detected using circularity constraints and, on the second stage, the external eye shape is extracted based on color and shape information through a Monte Carlo sampling framework. Compared to other state of the art works our method extracts the full shape of the eye (iris and full eyelid boundaries), and we consider that it is sufficiently generic to have applications in a variety of domains: optometry, biometry, eye tracking, and so on.

Extensive experiments were performed to demonstrate the effectiveness of the algorithm. Our experiments show that the accuracy of the method is dependent on the image resolution: increasing the image quality leads to an increase in accuracy without excessively increasing the computation time.

Future work will include increasing the accuracy of the estimated eye shape using more measurement cues (such as corner detectors for the eye corners) and tracking of the iris centers. By tracking the iris centers the detection system will benefit from reduced detection failures in the case of illumination changes or when one of the irises is not fully visible. Additionally, we intend to use other curves for representing the eyelid shapes, such as third degree polynomials, splines, or Bezier curves, in order to increase the performance of the eye shape segmentation.

Supplementary Materials: The eye feature annotation of the UMFD dataset is available online at https://drive.google.com/folderview?id=0Bwn_NrD78q_Td3VjOXA2c183UHM&usp=sharing.

Acknowledgments: This work was supported by the MULTIFACE grant (Multifocal System for Real Time Tracking of Dynamic Facial and Body Features) of the Romanian National Authority for Scientific Research, CNDI–UEFISCDI, Project code: PN-II-RU-TE-2014-4-1746.

Author Contributions: Radu Danescu and Diana Borza designed and implemented the algorithms; Radu Danescu designed the experiments; Diana Borza performed the experiments; Radu Danescu and Adrian Sergiu Darabant analyzed the data; Adrian Sergiu Darabant contributed analysis tools; Diana Borza wrote the draft version of the paper; Radu Danescu and Adrian Sergiu Darabant performed the critical revision of the article; Radu Danescu, Adrian Sergiu Darabant, and Diana Borza contributed to the final approval of the paper version to be published.

Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations

The following abbreviations are used in this manuscript:

ROI: Region of interest
CLM: Constrained Local Models
HOG: Histogram of Oriented Gradients
FRST: Fast Radial Symmetry Transform
LUT: Look Up Table
SVM: Support Vector Machine
UMFD: University of Michigan Face Database
TP: True Positives
TN: True Negatives
FP: False Positives
FN: False Negatives
LFPW: Labeled Face Parts in the Wild database
COTS: commercial off the shelf
Appendix A
This section describes the annotation format of the University of Michigan Face Database [26]. This dataset contains a large number of faces of individuals throughout the adult lifespan (from 18 to 94 years) and it includes representative faces from both younger and older subjects.

A detailed description regarding the data acquisition procedure and the breakdown of participants by age, race, and gender can be found in [26]. Access to the database is restricted to academic researchers. This work provides only the database annotation files; to request access to the images please contact the authors of [26].

For each image from the dataset, the annotation data is stored into a .pts file with the same base-name as the image; for example, the annotation file for the image EMWmale19neutral.bmp is stored in the file EMWmale19neutral.bmp.pts.

All photos were annotated with 12 landmarks around the eyes. Table A1 specifies the correspondence between annotation landmarks and eye features and Figure A1 illustrates the exact localization of each one of these points.
Figure A1. The 12 annotation points used to define the eye features.
Table A1. The 12 landmarks defining the eye features.

Eye Type | Eye Feature | Landmark #
Left eye | Iris center | 1
Left eye | Iris radius | 2
Left eye | Eye external corner | 3
Left eye | Eye interior corner | 4
Left eye | Top eyelid | 5
Left eye | Bottom eyelid | 6
Right eye | Iris center | 7
Right eye | Iris radius | 8
Right eye | Eye external corner | 9
Right eye | Eye interior corner | 10
Right eye | Top eyelid | 11
Right eye | Bottom eyelid | 12
A .pts file is a simple text file structured as a set of lines separated by a new line character. The layout of the file is as follows:

• The first line contains the total number n of points annotated in the image (in this case n = 12);
• The following lines, ranging from k = 2 to n + 1, contain the coordinates of the (k − 1)th landmark (one point per line).

The annotation can be freely used for education and research purposes.
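Given the layout above, an annotation file can be read with a few lines of Python. The sketch below is our own illustration and assumes that the two coordinates of each landmark are whitespace-separated; adjust the parsing if the files use a different separator.

```python
def read_pts(path):
    """Read a .pts annotation file: the first line holds the point count, and each
    following line holds the x and y coordinates of one landmark."""
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    n = int(lines[0])
    points = [tuple(map(float, line.split())) for line in lines[1:n + 1]]
    return points  # list of (x, y) tuples; points[0] is the left iris center

# Example (assuming the annotation file is in the working directory):
# landmarks = read_pts("EMWmale19neutral.bmp.pts")
# left_iris_center = landmarks[0]
```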
References

1. Zhou, Z.; Du, E.Y.; Thomas, N.L.; Delp, E.J. A New Human Identification Method: Sclera Recognition. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2012, 42, 571–583. [CrossRef]
2. Radu, P.; Ferryman, J.; Wild, P. A robust sclera segmentation algorithm. In Proceedings of the 2015 IEEE 7th International Conference on Biometrics Theory, Applications and Systems (BTAS), Arlington, VA, USA, 8–11 September 2015; pp. 1–6.
3. Tan, C.; Kumar, A. Automated segmentation of iris images using visible wavelength face images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR Workshops 2011, Colorado Springs, CO, USA, 20–25 June 2011; pp. 9–14.
4. Betke, M.; Mullally, W.J.; Magee, J.J. Active Detection of Eye Scleras in Real Time. In Proceedings of the IEEE CVPR Workshop on Human Modeling, Analysis and Synthesis, Madeira, Portugal, 12 June 2000.
5. Borza, D.; Darabant, A.S.; Danescu, R. University of Michigan Face Database Annotations. Available online: https://drive.google.com/folderview?id=0Bwn_NrD78q_Td3VjOXA2c183UHM&usp=sharing (accessed on 26 April 2016).
6. Hansen, D.W.; Ji, Q. In the eye of the beholder: A survey of models for eyes and gaze. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 478–500. [CrossRef] [PubMed]
7. Cristinacce, D.; Cootes, T. Automatic feature localisation with constrained local models. Pattern Recognit. 2008, 41, 3054–3067. [CrossRef]
8. Cristinacce, D.; Cootes, T.F.; Scott, I.M. A Multi-Stage Approach to Facial Feature Detection. In Proceedings of the British Machine Vision Conference, Kingston, UK, 7–9 September 2004.
9. Jesorsky, O.; Kirchberg, K.J.; Frischholz, R.W. Audio- and Video-Based Biometric Person Authentication. In Proceedings of the Third International Conference, AVBPA 2001, Halmstad, Sweden, 6–8 June 2001; Springer: Berlin/Heidelberg, Germany, 2001; pp. 90–95.
10. Belhumeur, P.; Jacobs, D.; Kriegman, D.; Kumar, N. Localizing Parts of Faces Using a Consensus of Exemplars. In Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 20–25 June 2011.
11. Everingham, M.; Sivic, J.; Zisserman, A. Hello! My name is Buffy—Automatic naming of characters in TV video. In Proceedings of the 17th British Machine Vision Conference (BMVC 2006), Edinburgh, UK, 4–7 September 2006.
12. Timm, F.; Barth, E. Accurate Eye Centre Localisation by Means of Gradients. In Proceedings of the International Conference on Computer Theory and Applications (VISAPP), Algarve, Portugal, 5–7 March 2011; pp. 125–130.
13. Daugman, J. How iris recognition works. IEEE Trans. Circuits Syst. Video Technol. 2004, 14, 21–30. [CrossRef]
14. Toennies, K.; Behrens, F.; Aurnhammer, M. Feasibility of Hough-Transform-Based Iris Localisation for Real-Time-Application. In Proceedings of the 16th International Conference on Pattern Recognition, Quebec City, QC, Canada, 11–15 August 2002; Volume 2, pp. 1053–1056.
15. Hassaballah, M.; Murakami, K.; Ido, S. An Automatic Eye Detection Method for Gray Intensity Facial Images. IJCSI Int. J. Comput. Sci. Issues 2011, 8, 4.
16. Wildes, R.P. Iris recognition: An emerging biometric technology. IEEE Proc. 1997, 85, 1348–1363. [CrossRef]
17. Lam, K.M.; Yan, H. Locating and extracting the eye in human face images. Pattern Recognit. 1996, 29, 771–779.
18. Yuille, A.L.; Hallinan, P.W.; Cohen, D.S. Feature extraction from faces using deformable templates. Int. J. Comput. Vis. 1992, 8, 99–111. [CrossRef]
19. Santos, G.; Proença, H. A Robust Eye-Corner Detection Method for Real-World. In Proceedings of the IEEE International Joint Conference on Biometrics—IJCB 2011, Washington, DC, USA, 11–13 October 2011.
20. Sclera Segmentation Benchmarking Competition 2015. Available online: http://www.ict.griffith.edu.au/conferences/btas2015/index.html (accessed on 26 April 2016).
21. Chen, Y.; Adjouadi, M.; Han, C.; Wang, J.; Barreto, A.; Rishe, N.; Andrian, J. A Highly Accurate and Computationally Efficient Approach for Unconstrained Iris Segmentation. Image Vis. Comput. 2010, 28, 261–269. [CrossRef]
22. Viola, P.; Jones, M.J. Robust Real-Time Face Detection. Int. J. Comput. Vis. 2001, 57, 137–154. [CrossRef]
23. Loy, G.; Zelinsky, E. A Fast Radial Symmetry Transform for Detecting Points of Interest. In Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, 28–31 May 2002; p. 358.
24. Farkas, L.G.; Katic, M.J.; Forrest, C.R. International Anthropometric Study of Facial Morphology in Various Ethnic Groups/Races. J. Craniofacial Surg. 2005, 16, 615–646. [CrossRef]
25. Van de Sande, K.; Gevers, T.; Snoek, C. Evaluating Color Descriptors for Object and Scene Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1582–1596. [CrossRef] [PubMed]
26. Minear, M.; Park, D.C. A lifespan database of adult facial stimuli. Behav. Res. Methods Instrum. Comput. 2004, 36, 630–633. [CrossRef] [PubMed]
27. Platt, J.C. Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods. In Advances in Large Margin Classifiers; MIT Press: Cambridge, MA, USA, 1999; pp. 61–74.
28. Dănescu, R.; Nedevschi, S.; Meinecke, M.M.; To, T.B. A stereovision-based probabilistic lane tracker for difficult road scenarios. In Proceedings of the 2008 IEEE Intelligent Vehicles Symposium, Eindhoven, The Netherlands, 4–6 June 2008; pp. 536–541.
29. Garrity, J. Effects of Aging on the Eyes. Available online: http://www.msdmanuals.com/home/eyedisorders/biology-of-the-eyes/effects-of-aging-on-the-eyes (accessed on 26 April 2016).
30. Fagertun, J.; Stegmann, M.B. The IMM Frontal Face Database; Informatics and Mathematical Modelling, Technical University of Denmark: Kongens Lyngby, Denmark, 2005.

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
Sclera Segmentation Benmarching Competition 2015. Available online: http://www.ict.griffith.edu.au/ conferences/btas2015/index.html (accessed on 26 April 2016). Chen, Y.; Adjouadi, M.; Han, C.; Wang, J.; Barreto, A.; Rishe, N.; Andrian, J. A Highly Accurate and Computationally Efficient Approach for Unconstrained Iris Segmentation. Image Vis. Comput. 2010, 28, 261–269. [CrossRef] Viola, P.; Jones, M.J. Robust Real-Time Face Detection. Int. J. Comput. Vis. 2001, 57, 137–154. [CrossRef] Loy, G.; Zelinsky, E. A Fast Radial Symmetry Transform for Detecting Points of Interest. In Proceedings of the 7th Euproean Conference on Computer Vision, Copenhagen, Denmark, 28–31 May 2002; p. 358. Farkas, L.G.; Katic, M.J.; Forrest, C.R. International Anthropometric Study of Facial Morphology in Various Ethnic Groups/Races. J. Craniofacial Surg. 2005, 16, 615–646. [CrossRef] Van de Sande, K.; Gevers, T.; Snoek, C. Evaluating Color Descriptors for Object and Scene Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1582–1596. [CrossRef] [PubMed] Minear, M.; Park, D.C. A lifespan database of adult facial stimuli. Behav. Res. Methods Instrum. Comput. 2004, 36, 630–633. [CrossRef] [PubMed] Platt, J.C. Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods; Advances in Large Margin Classifiers; MIT Press: Cambridge, MA, USA, 1999; pp. 61–74. D˘anescu, R.; Nedevschi, S.; Meinecke, M.M.; To, T.B. A stereovision-based probabilistic lane tracker for difficult road scenarios. In Proceedings of the 2008 IEEE Intelligent Vehicles Symposium, Eindhoven, The Netherlands, 4–6 June 2008; pp. 536–541. Garrity, J. Effects of Aging on the Eyes. Available online: http://www.msdmanuals.com/home/eyedisorders/biology-of-the-eyes/effects-of-aging-on-the-eyes (accessed on 26 April 2016). Fagertun, J.; Stegmann, M.B. The IMM Frontal Face Database; Informatics and Mathematical Modelling, Technical University of Denmark: Kongens Lyngby, Denmark, 2005. © 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).