Pupil Detection Algorithm Based on RANSAC Procedure

Radu Gabriel Bozomitu1, Alexandru Păsărică1, Robert Gabriel Lupu2, Cristian Rotariu3, Eugen Coca4

1 Faculty of Electronics, Telecommunication and Information Technology, “Gheorghe Asachi” Technical University, Iaşi, Romania, [email protected], [email protected]
2 Faculty of Computer Engineering and Automatic Control, “Gheorghe Asachi” Technical University, Iaşi, Romania, [email protected]
3 “Grigore T. Popa” University of Medicine and Pharmacy, Iaşi, Romania, [email protected]
4 “Ştefan cel Mare” University, Suceava, Romania
Abstract—This paper presents a new pupil detection algorithm based on the RANSAC procedure. Compared with similar algorithms reported in the literature, the proposed algorithm provides higher accuracy and a low running time, and it operates properly under noisy conditions and variable illumination. The algorithm is used in assistive technology to communicate with people with neuromotor disabilities by detecting their gaze direction, and it can easily be adapted to other applications, such as PC control using eye-gaze detection.
I. INTRODUCTION
Pupil detection algorithms (PDAs) have been presented in many research articles that focus on eye tracking applications [1-6]. Some PDAs described in the literature provide lower accuracy because of the principle used to detect the pupil center, such as the Starburst algorithm [2] and the algorithm based on the circular Hough transform [3]. Other algorithms, although they have higher accuracy and operate quite well under noisy conditions, require a long running time and, as a consequence, are difficult to use in real-time gaze direction detection applications (e.g., the pupil detection algorithm based on the elliptical Hough transform). In real-time applications, the operation of these algorithms is affected by the following noise sources: noise from the video camera circuits, corneal reflection (due to infrared illumination of the eye), involuntary blinking, salt-and-pepper noise due to variable and nonuniform lighting conditions (which results after eye image binarization), physiological tremor of the eye, pupil position in the eye image, etc. Other PDAs that provide low noise immunity are the least-squares ellipse fitting technique [4] and the projection method algorithm.

II. RANSAC PROCEDURE
In this paper, a PDA for real-time applications based on the RANdom SAmple Consensus (RANSAC) paradigm is presented; it provides high accuracy, high noise immunity under difficult operating conditions, and a lower running time. The main disadvantage of ellipse fitting techniques based on the direct least-squares algorithm [4] or on the projection method is the possibility of detecting outlier characteristic points, which occur in noisy images and are not placed on the pupil contour (or in its immediate vicinity). These points can reduce the accuracy of the ellipse fitting, and errors in the detected pupil center coordinates can occur. An error of the pupil center position in the image provided by the video camera causes a significant error of the cursor position on the user screen, thereby reducing the accuracy of eye tracking applications. This problem can be solved by using the RANSAC procedure for model fitting [5, 6]. In our case, the model we are looking for is an ellipse, which better approximates the eye pupil. RANSAC is a technique for model fitting in the presence of a high and unknown number of outliers (pixels placed outside the contour approximated by an ellipse). The RANSAC algorithm is used to determine the optimal ellipse, which best approximates the initial set of input data points obtained from the eye image segmentation.

978-1-5386-0674-2/17/$31.00 ©2017 IEEE

A. Description of the new PDA based on RANSAC procedure

In this section, the steps of the proposed PDA based on the RANSAC procedure are presented. The eye image is acquired with an infrared (IR) video camera using the dark pupil technique. The proposed PDA begins by filtering the eye image in order to reduce the shot noise and line noise present in the image provided by the head-mounted video camera. In order to determine the pupil contour required by the RANSAC procedure, the eye image is binarized by using a quantitative segmentation technique [7]. The pupil image obtained by binarization is affected by noise coming from the different sources listed in Section I. In order to increase the detection accuracy and to stabilize the cursor position on the user screen, the proposed PDA uses several processing techniques to eliminate the noise from the image.

The first processing step is pupil image reconstruction, which removes or diminishes the corneal reflection that appears inside the pupil area or at its edge. The reconstruction is based on the morphological dilation of the dark pupil pixels situated between the two adjacent areas (the pupil area and the background image). The pixel dilation is done by applying a line-type border of dark pixels around the analysed pixel. The algorithm fills the white gaps within the pupil image. The next processing step removes the noise caused by artefacts from other dark areas of the image, such as eyelashes or eyebrows. This step identifies all dark areas of the segmented image and selects only the largest one, which corresponds to the pupil image.

The ellipse fitting of the input data points (obtained after the steps above and pupil contour detection) by the RANSAC algorithm is performed in a normalized space. After the normalization transformation, the centroid of the new points is the origin of the coordinate system and their average distance from the origin is equal to $\sqrt{2}$. The next step of the algorithm is to find the ellipse that best fits these points. Some algorithms use the least-squares fitting of an ellipse [4] with all the feature points, but large errors occurring in the feature-detection stage can significantly influence the accuracy of the eye-gaze detection results. The least-squares methods use all input data to fit a model, because they assume that all of the input samples are inliers (an inlier is a sample in the data attributable to the mechanism being modeled [2]); any error is then attributed to measurement error. RANSAC, on the other hand, admits the possibility of outlier samples and uses only a subset of the input data to fit the model [2, 5]. RANSAC is an iterative procedure that selects many small random subsets of the input data, uses each subset to fit a model, and finds the model that best fits the input set of data points [2, 5]. The number of outliers depends on the performance of the pupil detection algorithm and the video camera used.
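The normalization step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the target average distance from the origin is √2 (the usual choice for this kind of isotropic normalization), and the function names `normalization_params` and `normalize`, as well as the sample contour, are hypothetical.

```python
from math import hypot, sqrt


def normalization_params(points):
    """Return (s, tx, ty) so that x_hat = s*x + tx and y_hat = s*y + ty.

    Assumption: the scale is chosen so that the mean distance of the
    normalized points from the origin equals sqrt(2).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    mean_dist = sum(hypot(x - mean_x, y - mean_y) for x, y in points) / n
    if mean_dist == 0.0:
        raise ValueError("all points coincide; the scale is undefined")
    s = sqrt(2.0) / mean_dist
    return s, -s * mean_x, -s * mean_y


def normalize(points):
    """Apply the isotropic scaling and translation to every point."""
    s, tx, ty = normalization_params(points)
    return [(s * x + tx, s * y + ty) for x, y in points]


# Hypothetical contour points in image coordinates (pixels).
contour = [(310.0, 240.0), (330.0, 236.0), (352.0, 251.0),
           (341.0, 270.0), (318.0, 266.0)]
normalized = normalize(contour)
```

After the call, the centroid of `normalized` lies at the origin and the mean distance of its points from the origin is √2, which keeps the entries of the conic design matrix at comparable magnitudes.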
For a high-performance binarization algorithm, the number of outliers is small compared with the number of inliers. The RANSAC algorithm finds the ellipse that best fits the pupil contour by eliminating the outlier samples.

The following procedure is repeated for R iterations. After normalization, the algorithm randomly chooses five points from the detected set of feature points, since this is the minimum number of points required to determine all the ellipse parameters. Next, singular value decomposition (SVD) is applied to the conic design matrix (D), built from the normalized feature-point coordinates, to find the parameters of the ellipse that passes exactly through these five points. If the determined parameters do not describe a valid ellipse (the parameters are not real, the ellipse center is not inside the image, or the major axis is not less than twice the minor axis), the algorithm randomly chooses another five points and fits a new ellipse, until the constraints above are satisfied. Then, the algorithm determines the number of feature points (inliers) in the input data set that agree with the ellipse model. This set of input points is called the consensus set. After a certain number of iterations, an ellipse is fitted to the largest consensus set. The inliers are the input feature points whose algebraic distance to the ellipse model is less than a fixed threshold. An optimal value for this threshold is 1.98 pixels, derived by a method described in [2]. In [2] it is assumed that the average error variance of the feature detector is approximately one pixel and that this error is Gaussian distributed with zero mean. In order to obtain a 95% probability that a sample is correctly classified as an inlier, the threshold in [2] is derived from a χ² distribution with one degree of freedom.

From the computational point of view, it is infeasible to test all possible combinations of input feature points. Instead, the number of randomly tested subsets is chosen such that, with high probability, at least one of them contains only inliers. In [2] it is shown that this is achieved with probability p = 0.99 if the number of iterations of the RANSAC procedure is given by:

$$R = \frac{\log(1 - p)}{\log(1 - w^5)} \tag{1}$$

where w is the proportion of inliers among the input samples. At the start of the algorithm the number of iterations R is set to a very large value and then, during the iterations, it is lowered according to equation (1). The number of iterations can be reduced each time a new largest consensus set (represented by the proportion w of inliers) is detected, until the total number of inliers remains constant. The lower number of iterations decreases the running time of the algorithm, which is an advantage for real-time applications.

One problem of the RANSAC procedure is that it relies on randomly finding the largest consensus set, which means that the algorithm can provide different results (regarding the coordinates of the detected pupil center) when it is run repeatedly under identical conditions on the same eye images. This produces noise superimposed on the signals provided by the PDA, which causes cursor instability on the user screen.
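Equation (1) is easy to evaluate directly, which shows how quickly the iteration budget shrinks as the inlier proportion grows. The sketch below is illustrative only: the function name `ransac_iterations` and the rounding up to a whole number of iterations are assumptions, not details given in the paper.

```python
from math import ceil, log, log1p


def ransac_iterations(w, p=0.99, sample_size=5):
    """Iterations R from equation (1), rounded up to an integer.

    w: proportion of inliers among the input samples (0 < w <= 1)
    p: required probability that at least one sample is outlier-free
    sample_size: points per random sample (five for an ellipse)
    """
    if not 0.0 < w <= 1.0:
        raise ValueError("w must be in (0, 1]")
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    all_inliers = w ** sample_size  # chance that one sample has no outlier
    if all_inliers >= 1.0:
        return 1                    # no outliers: one sample is enough
    if all_inliers == 0.0:
        raise ValueError("w is too small to evaluate equation (1)")
    # log1p(-q) computes log(1 - q) accurately when q is small.
    return ceil(log(1.0 - p) / log1p(-all_inliers))


budget = {w: ransac_iterations(w) for w in (0.5, 0.7, 0.9)}
```

With p = 0.99, an inlier proportion of w = 0.5 requires R = 146 iterations, while w = 0.9 requires only R = 6. This is why re-evaluating R whenever a larger consensus set is found shortens the running time.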
In order to increase the accuracy of the RANSAC algorithm, the result of the ellipse fitting can be improved through a model-based optimization that does not rely on feature detection, using a Nelder-Mead simplex search as described in [2]. Other techniques reported in the literature for increasing cursor stability on the user screen are based on real-time filtering of the signals provided by the PDA (Fig. 2.a).
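The paper does not say which filter is applied to the PDA signals. As one possible illustration of real-time filtering, the sketch below uses a simple moving average over the last few pupil-center estimates. The class name `MovingAverage2D`, the window length and the sample values are all assumptions made for this example.

```python
from collections import deque


class MovingAverage2D:
    """Running mean of the most recent (x, y) samples.

    A stand-in for the unspecified real-time filter; a median or
    low-pass filter could be used in the same place.
    """

    def __init__(self, window=5):
        if window < 1:
            raise ValueError("window must be at least 1")
        self._samples = deque(maxlen=window)  # oldest sample drops out

    def update(self, x, y):
        """Add one raw sample and return the filtered coordinates."""
        self._samples.append((x, y))
        n = len(self._samples)
        mean_x = sum(sx for sx, _ in self._samples) / n
        mean_y = sum(sy for _, sy in self._samples) / n
        return mean_x, mean_y


# Hypothetical jittery pupil-center estimates (pixels).
raw = [(100.0, 50.0), (104.0, 48.0), (96.0, 52.0), (100.0, 50.0)]
smoother = MovingAverage2D(window=4)
filtered = [smoother.update(x, y) for x, y in raw]
```

A longer window gives a steadier cursor at the cost of a slower response to real eye movements, so the window length has to be matched to the frame rate and to the dwell time used for selection.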
B. Ellipse fitting in a normalized space with RANSAC method

The input data are the coordinates of the points on the pupil contour, which can be described in a homogeneous unnormalized space ($P$) and in a normalized space ($\hat{P}$) by the following matrices:

$$
\begin{aligned}
P &= [X \;\; Y \;\; 1]^T ; & \hat{P} &= [\hat{X} \;\; \hat{Y} \;\; 1]^T \\
X &= [x_1, x_2, \ldots, x_n]^T ; & \hat{X} &= [\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n]^T \\
Y &= [y_1, y_2, \ldots, y_n]^T ; & \hat{Y} &= [\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n]^T
\end{aligned}
\tag{2}
$$
For a better fitting of the ellipse, these points (described by the matrix $P$) are normalized by using the following transformation:

$$\hat{P} = H \cdot P \tag{3}$$

The transformation matrix, $H$, can be written:

$$H = \begin{bmatrix} s & 0 & t_x \\ 0 & s & t_y \\ 0 & 0 & 1 \end{bmatrix} \tag{4}$$

In the matrix $H$, $t_x$, $t_y$ and $s$ are given by the following equations:
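$$t_x = -s \cdot \bar{x} ; \quad t_y = -s \cdot \bar{y} ; \quad s = \frac{\sqrt{2}}{\dfrac{1}{n} \sum_{i=1}^{n} \sqrt{(x_i - \bar{x})^2 + (y_i - \bar{y})^2}} \tag{5}$$

where $\bar{x}$, $\bar{y}$ represent the mean values of the vectors which contain the input point coordinates in the unnormalized space.

After normalization, the RANSAC algorithm starts by randomly choosing five input points in the normalized space, $\hat{P}_i(\hat{x}_i, \hat{y}_i)$, $i = 1, 2, 3, 4, 5$, because this is the minimum number of points required to describe an ellipse. The design matrix $D$ can be written as:

$$
D = \begin{bmatrix}
\hat{x}_1^2 & \hat{x}_1 \hat{y}_1 & \hat{y}_1^2 & \hat{x}_1 & \hat{y}_1 & 1 \\
\hat{x}_2^2 & \hat{x}_2 \hat{y}_2 & \hat{y}_2^2 & \hat{x}_2 & \hat{y}_2 & 1 \\
\hat{x}_3^2 & \hat{x}_3 \hat{y}_3 & \hat{y}_3^2 & \hat{x}_3 & \hat{y}_3 & 1 \\
\hat{x}_4^2 & \hat{x}_4 \hat{y}_4 & \hat{y}_4^2 & \hat{x}_4 & \hat{y}_4 & 1 \\
\hat{x}_5^2 & \hat{x}_5 \hat{y}_5 & \hat{y}_5^2 & \hat{x}_5 & \hat{y}_5 & 1
\end{bmatrix}
\tag{6}
$$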
where $(\hat{x}_i, \hat{y}_i)$, $i = 1, 2, 3, 4, 5$ represent the coordinates of the five points randomly chosen in the normalized space. The vector of the algebraic ellipse parameters ($\hat{A}$), described as a conic in the normalized space, results from the singular value decomposition (SVD) of the matrix $D$ and is given by:

$$D_{mn} = U_{mm} S_{mn} V_{nn}^T ; \quad m = 5 ; \quad n = 6 \tag{7}$$

$$\hat{A} = [v_{16}, v_{26}, \ldots, v_{66}]^T = [\hat{A}(1), \hat{A}(2), \hat{A}(3), \hat{A}(4), \hat{A}(5), \hat{A}(6)]^T \tag{8}$$

The matrix description of the ellipse in the normalized space is:

$$\hat{P}^T \cdot \hat{C} \cdot \hat{P} = 0 \tag{9}$$

where $\hat{C}$ is the ellipse matrix in the normalized space. The algorithm determines the number of inliers. The inliers are the input points whose algebraic distance to the obtained ellipse in the normalized space is less than 1.98 pixels. From equation (9), by using (2), we obtain the conic description of the ellipse in the normalized space, given by:

$$\hat{A}(1)\hat{X}^2 + \hat{A}(2)\hat{X}\hat{Y} + \hat{A}(3)\hat{Y}^2 + \hat{A}(4)\hat{X} + \hat{A}(5)\hat{Y} + \hat{A}(6) = 0 \tag{10}$$

By applying the transformation $H$ in (4) to the matrix equation (9), we obtain the matrix equation in the homogeneous unnormalized space, given by:

$$P^T \cdot C \cdot P = 0 ; \quad C = H^T \cdot \hat{C} \cdot H \tag{11}$$

where $C$ and $\hat{C}$ represent the ellipse matrices in the homogeneous unnormalized and normalized spaces, respectively, given by:

$$
C = \begin{bmatrix}
B(1) & B(2)/2 & B(4)/2 \\
B(2)/2 & B(3) & B(5)/2 \\
B(4)/2 & B(5)/2 & B(6)
\end{bmatrix} , \quad
\hat{C} = \begin{bmatrix}
\hat{A}(1) & \hat{A}(2)/2 & \hat{A}(4)/2 \\
\hat{A}(2)/2 & \hat{A}(3) & \hat{A}(5)/2 \\
\hat{A}(4)/2 & \hat{A}(5)/2 & \hat{A}(6)
\end{bmatrix}
\tag{12}
$$
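The five-point fit of equations (6) to (8) amounts to finding a null vector of the 5×6 design matrix D. The sketch below is a hedged illustration rather than the authors' code. To stay within the Python standard library it replaces the SVD with Gauss-Jordan elimination, which yields the same unit-norm null vector up to sign when the five points determine a unique conic. The function names and the test ellipse are hypothetical.

```python
from math import cos, sin, sqrt


def conic_through_points(points):
    """Unit-norm vector [A1..A6] of the conic through five points:
    A1*x^2 + A2*x*y + A3*y^2 + A4*x + A5*y + A6 = 0.
    It spans the null space of the 5x6 design matrix D of equation (6).
    """
    if len(points) != 5:
        raise ValueError("exactly five points are required")
    rows = [[x * x, x * y, y * y, x, y, 1.0] for x, y in points]
    pivots = []  # pivot column of each reduced row, in row order
    r = 0
    for c in range(6):
        if r == 5:
            break
        # Partial pivoting: pick the largest entry in column c.
        p = max(range(r, 5), key=lambda i: abs(rows[i][c]))
        if abs(rows[p][c]) < 1e-12:
            continue  # no usable pivot in this column
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [v / pivot for v in rows[r]]
        for i in range(5):
            if i != r:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(6) if c not in pivots]
    if len(free) != 1:
        raise ValueError("degenerate sample: the conic is not unique")
    a = [0.0] * 6
    a[free[0]] = 1.0                 # set the free unknown to 1
    for row, c in zip(rows, pivots):
        a[c] = -row[free[0]]         # solve each pivot unknown
    norm = sqrt(sum(v * v for v in a))
    return [v / norm for v in a]


def is_ellipse(a):
    """A real conic is an ellipse only if A2^2 - 4*A1*A3 < 0."""
    return a[1] ** 2 - 4.0 * a[0] * a[2] < 0.0


def algebraic_distance(a, x, y):
    """Absolute value of the conic polynomial at (x, y)."""
    return abs(a[0] * x * x + a[1] * x * y + a[2] * y * y
               + a[3] * x + a[4] * y + a[5])


# Five points on the ellipse x^2/4 + y^2 = 1 (hypothetical test data).
sample = [(2.0 * cos(t), sin(t)) for t in (0.3, 1.1, 2.0, 3.5, 5.0)]
params = conic_through_points(sample)
```

In a RANSAC loop, `algebraic_distance` would be evaluated for every contour point and compared against the inlier threshold, and `is_ellipse` would be one of the checks that rejects an invalid sample before the consensus set is counted.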
From equation (11), by using (2), we obtain the conic description of the ellipse in the homogeneous unnormalized space:

$$B(1)X^2 + B(2)XY + B(3)Y^2 + B(4)X + B(5)Y + B(6) = 0 \tag{13}$$

By identifying the terms in the expressions of the matrix $C$ in equations (11) and (12), we obtain the algebraic parameters of the ellipse in the homogeneous unnormalized space, which are given by the following expressions:
$$
\begin{aligned}
B(1) &= s^2 \cdot \hat{A}(1) ; \quad B(2) = s^2 \cdot \hat{A}(2) ; \quad B(3) = s^2 \cdot \hat{A}(3) \\
B(4) &= s \cdot \left[ 2 \cdot \hat{A}(1) \cdot t_x + \hat{A}(2) \cdot t_y + \hat{A}(4) \right] \\
B(5) &= s \cdot \left[ 2 \cdot \hat{A}(3) \cdot t_y + \hat{A}(2) \cdot t_x + \hat{A}(5) \right] \\
B(6) &= \hat{A}(1) t_x^2 + \hat{A}(2) t_x t_y + \hat{A}(3) t_y^2 + \hat{A}(4) t_x + \hat{A}(5) t_y + \hat{A}(6)
\end{aligned}
\tag{14}
$$

Next, the geometric ellipse parameters ($X_C$, $Y_C$, $a$, $b$, $\tau$) in the homogeneous unnormalized space are determined from the algebraic ellipse parameters given by equations (14); $X_C$, $Y_C$ represent the coordinates of the ellipse center, $a$ and $b$ represent the major and minor axes of the ellipse and $\tau$ is the rotation angle of the ellipse. The RANSAC algorithm verifies the ellipse constraints in the homogeneous unnormalized space:

$$0.75 < r < 1.34 ; \quad r = a / b \tag{15}$$

Finally, the algorithm performs the model-based optimization of these parameters in the homogeneous unnormalized space. The number of iterations necessary for the convergence of the RANSAC algorithm (R) is reached when the total number of inlier points remains constant.

III. EXPERIMENTAL RESULTS
In order to test the performance of the proposed algorithm, we use an evaluation method of gaze direction detection for static eye images and for video eye images captured in real time with an IR video camera. The experimental results were obtained by analyzing two databases of eye images with a spatial resolution of 640×480 pixels. The first dataset (DB1) consists of 184 IR images acquired in laboratory conditions. The second dataset (DB2) consists of 400 IR eye images from the publicly available CASIA-IrisLamp database [8]. We use a head-mounted device with an IR video camera fixed on a pair of glasses (developed by our team) and a PC with an i7-4790 3.6 GHz CPU for eye image processing. The user controls the cursor on the screen, which is located 60 cm away, through the detected point of gaze. The tests were done with healthy subjects, with the cursor moved on the basis of eye-gaze detection, a procedure used in assistive technology. The main application of this eye tracking system is communication with neuromotor disabled people by using keywords technology, but it can easily be adapted for healthy users.

For static eye images, the accuracy of the algorithm is evaluated by computing the relative error of the detected pupil center with respect to its ideal center on both axes of the coordinate system (Fig. 1 - a, b). Since accurate determination of the pupil center is what matters for eye tracking applications in assistive technology, we also present the detection error of the pupil center in terms of the Euclidean distance between the detected and the ideal center of the pupil, for both databases analyzed (Fig. 1.c). The mean values of the relative errors on both axes of the coordinate system and the mean relative error of the pupil center position are given in Table 1. The experiments performed for real-time applications are illustrated in Fig. 2.
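The per-image errors plotted in Fig. 1 can be computed as in the sketch below, which returns the deviation on each axis and the Euclidean distance, all in pixels. The paper does not state how these values are normalized to the percentages reported in Table 1, so no conversion is attempted. The function names and the sample coordinates are assumptions made for illustration.

```python
from math import hypot
from statistics import mean


def center_errors(detected, ideal):
    """Per-image (|dx|, |dy|, Euclidean distance) in pixels."""
    if len(detected) != len(ideal):
        raise ValueError("both lists must have the same length")
    return [
        (abs(xd - xi), abs(yd - yi), hypot(xd - xi, yd - yi))
        for (xd, yd), (xi, yi) in zip(detected, ideal)
    ]


def mean_errors(detected, ideal):
    """Mean of each error column over the whole dataset."""
    columns = zip(*center_errors(detected, ideal))
    return tuple(mean(col) for col in columns)


# Hypothetical detected and ideal (hand-labelled) pupil centers.
detected = [(322.0, 241.0), (318.0, 236.0), (300.0, 250.0)]
ideal = [(319.0, 245.0), (318.0, 240.0), (306.0, 242.0)]
mean_dx, mean_dy, mean_dist = mean_errors(detected, ideal)
```

Reporting the Euclidean distance alongside the per-axis errors matters because two small axis errors can still combine into a noticeable offset of the estimated gaze point.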
For video eye images captured in real time, the system is evaluated by moving the cursor, controlled by gaze direction, on the user screen, which is divided into 9 identical quadrants.
[Figure 1 plots: a) relative error on Ox for DB1 and DB2, X (pixels) versus No. of images; b) relative error on Oy for DB1 and DB2, Y (pixels) versus No. of images; c) Euclidean distance for DB1 and DB2 versus No. of images]

Figure 1. Relative error of the detected pupil center on both coordinate axes and Euclidean distance between detected and ideal pupil centers (DB1, DB2)
The cursor is moved on the user screen along different imposed tracks, with equal pauses in the selection area (represented by the red circles in Fig. 2.b) of each quadrant. An ideogram is selected by keeping the cursor inside the selection area of a quadrant for a certain dwell time. In this way, it is possible to distinguish between ideogram/keyword viewing and gaze control.

TABLE I. ACCURACY AND RUNNING TIME OF THE RANSAC PDA

PDA accuracy                                           DB1      DB2
Relative error mean value on Ox axis [%]               0.5506   0.5885
Relative error mean value on Oy axis [%]               0.9835   0.8974
Relative error mean value between pupil centers [%]    3.5903   2.9224
Mean value of the PDA running time [s]                 0.0271   0.0286
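The dwell-time selection rule described before Table I can be sketched as a small state machine over timestamped cursor samples. This is an illustrative reading of the text, not the authors' implementation: the function names, the circular selection areas centered in each cell of a 3×3 grid, and the numeric values (screen size, radius, dwell time in milliseconds) are all assumptions.

```python
def grid_centers(width, height, rows=3, cols=3):
    """Centers of the cells of a rows x cols grid, in row-major order."""
    return [((c + 0.5) * width / cols, (r + 0.5) * height / rows)
            for r in range(rows) for c in range(cols)]


def dwell_selections(samples, centers, radius, dwell_time):
    """Return (time, index) for every completed dwell.

    samples: iterable of (t, x, y) cursor positions, t increasing
    centers: selection-area centers; radius: selection-area radius
    A selection fires once when the cursor has stayed inside the same
    selection area for at least dwell_time.
    """
    current, entered, fired = None, None, False
    selections = []
    for t, x, y in samples:
        hit = None
        for i, (cx, cy) in enumerate(centers):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                hit = i
                break
        if hit != current:
            # Entered a new area (or left all areas): restart the timer.
            current, entered, fired = hit, t, False
        elif hit is not None and not fired and t - entered >= dwell_time:
            selections.append((t, hit))
            fired = True  # do not fire again until the cursor leaves
    return selections


# Hypothetical 900x600 screen, samples every 100 ms.
centers = grid_centers(900, 600)
samples = [(t, 455.0, 297.0) for t in range(0, 1100, 100)]
samples += [(t, 150.0, 100.0) for t in range(1100, 1600, 100)]
selected = dwell_selections(samples, centers, radius=40.0, dwell_time=800)
```

In this example the cursor rests near the central area long enough to trigger one selection, while the shorter visit to the top-left area is treated as viewing only. This is the distinction between viewing and gaze control that the dwell time is meant to provide.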
The experimental results illustrated in Fig. 2.b show good gaze control of the cursor on the user screen in all nine quadrants. Most subjects succeeded in fixing the point of gaze in the selection area (represented by the red circles) for a certain dwell time in order to select the ideograms. The experimental results obtained for both databases (the PDA accuracy and running time, given in Table 1) validate the RANSAC algorithm for real-time eye tracking applications.

IV. CONCLUSION
The experimental results presented indicate that the system can be used for real-time gaze detection control applications and can operate under variable and nonuniform lighting conditions. Owing to the high accuracy of the algorithm and the increased stability of the cursor on the user screen, the proposed system is applicable in the field of assistive technology for communication with patients with severe neuromotor disabilities. The number of identical quadrants on the user screen, the size of the selection area of each quadrant and the duration of the dwell time depend on the user's experience with the system and on the algorithm accuracy. For experienced users, the system can be updated by increasing the number of identical quadrants (16 or more) in order to improve their communication capability.
Figure 2. a) Filtered and raw signals provided by the PDA (cursor coordinates on both axes); b) Corresponding cursor movement tracking on the user screen
ACKNOWLEDGMENT

The work has been carried out within the program Joint Applied Research Projects, funded by the Romanian National Authority for Scientific Research (MEN – UEFISCDI), contract PN-II-PT-PCCA-2013-4-0761, no. 21/2014 (SIACT).

REFERENCES

[1] D. W. Hansen and Q. Ji, “In the Eye of the Beholder: A Survey of Models for Eyes and Gaze”, in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 3, pp. 478-500, March 2010.
[2] Dongheng Li, Derrick J. Parkhurst, “Starburst: A robust algorithm for video-based eye tracking”, Human Computer Interaction Program, Iowa State University, Ames, Iowa, 50011.
[3] N. Cherabit, F. Z. Chelali, A. Djeradi, “Circular Hough Transform for Iris localization”, Science and Technology, 2(5), 2012, pp. 114-121.
[4] A. W. Fitzgibbon, M. Pilu, and R. B. Fisher, “Direct least squares fitting of ellipses”, in Proc. of the 13th International Conference on Pattern Recognition, pp. 253-257, Vienna, September 1996.
[5] M. Fischler, R. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography”, Communications of the ACM 24 (6), 1981, pp. 381-395.
[6] Lech Świrski, Andreas Bulling and Neil Dodgson, “Robust real-time pupil tracking in highly off-axis images”, in Proceedings of the Symposium on Eye Tracking Research and Applications, ETRA '12, pp. 173-176, Santa Barbara, California, March 28 - 30, 2012.
[7] Y. J. Zhang, J. J. Gerbrands, “Objective and quantitative segmentation evaluation and comparison”, Signal Processing, 39, 1, pp. 43-54, 1994.
[8] http://biometrics.idealtest.org/, Casia-Iris-Lamp.