Enhanced Eye Gaze Direction Classification Using a Combination of Face Detection, CHT and SVM

Amer Al-Rahayfeh
Department of Computer Science and Engineering, University of Bridgeport, Bridgeport, United States
[email protected]

Miad Faezipour
Department of Computer Science and Engineering, University of Bridgeport, Bridgeport, United States
[email protected]

Abstract - Automatic estimation of eye gaze direction is an interesting research area in the field of computer vision that is growing rapidly, with a wide range of potential applications. However, implementing a robust eye gaze classification system is still a very challenging task. This paper proposes a robust eye detection system that uses face detection to find the eye region. The Circular Hough Transform (CHT) is used to locate the center of the iris, and the parameters of the Circular Hough Transform are calculated dynamically from the detected face information. A new method for eye gaze direction classification using a Support Vector Machine (SVM) is introduced and combined with the Circular Hough Transform to complete the required task. The experiments were performed on a database containing 4000 images of 40 subjects of different ages and genders. The algorithm achieved a classification accuracy of up to 92.1%.

Keywords - Eye direction detection, face detection, Circular Hough Transform, SVM, Viola-Jones.

I. INTRODUCTION

The eye is the most direct way of interacting with the surrounding world, and therefore eye detection and gaze estimation have become a significantly active research area. Different approaches are used to estimate the direction of eye gaze. Being a convenient and natural way of interaction, the field of eye gaze estimation is growing, and it is used in various applications such as Human Computer Interaction (HCI), virtual reality, driver assistance systems and assistance for people with disabilities. Eye gaze classification systems aim to locate the eye in the image and then use the obtained information about the eye region and the head pose to estimate the eye gaze direction, to be used in applications such as cursor pointing on the screen. Eye gaze can reveal the attention and express the interest of a person, and eye movements also contribute to expressing a person's emotional states [1]. Eye gaze and movement are required to identify and describe the visual world. Hence, robust eye detection and gaze estimation methods are in high demand to improve human-computer interaction and take it to the next level.

Many applications based on eye gaze direction classification technology have been developed in different fields. Eye detection is the first step in many computer vision applications, including iris detection and recognition, facial recognition, facial feature tracking and expression analysis; the subsequent processing steps in these applications depend on the accuracy of the eye detection phase. Although less investigated, another application that employs eye gaze estimation is to gain insight into the perception of animations and synthesized images, for quality optimization and the development of efficient algorithms. Donders [2] stated that the position of the eye is restricted to only a subset of positions. According to Donders' law, the eye orientation can be uniquely determined from the eye gaze direction; the law also states that the eye orientation is independent of previous eye positions. This paper introduces a new method for eye direction classification, evaluates the performance of a real-time eye gaze direction classification system and suggests some performance enhancements, focusing on the required CPU processing time. The rest of the paper is outlined as follows: Section II introduces different eye detection and gaze direction classification methods available in the literature. Section III describes the methods used in the proposed system. Section IV explains the eye gaze direction classification procedure. In Section V, the performance of the proposed system is evaluated and the results of the experiments are discussed. Finally, Section VI draws the conclusion of this paper.

II. RELATED WORK

There has been much research on developing robust eye detection and gaze estimation methods. Eye trackers differ in the degrees of freedom they can track; simple eye trackers report only the direction of the gaze relative to the head. Many eye detection and gaze estimation algorithms have been presented in the literature. Lin et al. [3] presented a method based on polar coordinate mapping to detect the eye gaze. Morimoto et al. [4] proposed an eye gaze

estimation scheme that determines the eye gaze direction based on the glint-pupil vector, defined as the vector between the centers of the corneal reflection (CR) and the eye pupil; a second-order polynomial equation is used to compute the gaze point coordinates. Beymer and Flickner [5] introduced a 3D eye tracking approach using a 3D eye anatomy model; its drawback is that it needs four cameras and two lighting points. Yoo et al. [6] presented an algorithm for eye detection based on cross-ratio invariance, but it achieved very low accuracy. To improve the accuracy, a virtual plane tangent to the eye cornea was proposed. Even with this enhancement, the method's complexity issue remains, because two cameras are needed to detect the position of the eye pupil using the difference between the dark and bright pupil images captured by the two cameras [7].

Kothari and Mitchell [8] introduced a voting scheme that detects the location of the eyes using spatial and temporal information, based on the fact that the gradient along the boundary of the iris points outward from the center. False pupil candidates are filtered using heuristic rules and a large temporal support. Gao et al. [9] used the Hough Transform to find the coordinates of the pupil center and to determine the direction of the eye gaze. Devi et al. [10] proposed a system for driver fatigue state detection that finds the face using skin color and then applies the Circular Hough Transform to detect the eyes. Mehrubeoglu et al. [11] used a smart camera for eye detection, storing the locations of previous matches appearing in successive image frames; the system includes pattern matching, edge detection, and computation and recording of eye center coordinates. Alioua et al. [12] presented a robust eye detection method based on the Circular Hough Transform that needs neither training data nor a special camera.

III. SYSTEM OVERVIEW

This section discusses the details of the eye gaze direction classification system being investigated. The system detects the face in the image and applies the Circular Hough Transform to detect the iris of the eye. The gaze direction is then determined based on the coordinates of the iris center.

A. Viola-Jones Face Detection

The Viola-Jones face detector combines three main concepts which led to successful real-time face detection: the integral image, the AdaBoost classifier and cascading classifiers [13]. The Viola-Jones method does not use image intensities directly but computes features reminiscent of Haar basis functions. Using an intermediate image representation, the integral image, it is possible to evaluate these features quickly. The integral image is computed with a few operations per pixel on the processed image. At a location (x, y) in the integral image, the point has a value equal to the sum of the pixels above and to the left of that location [13]:

$ii(x, y) = \sum_{x' \le x,\ y' \le y} i(x', y')$    (1)

where $ii(x, y)$ is the integral image point and $i(x, y)$ is the pixel at location (x, y) in the original image. The integral image is thus the double cumulative sum of the image. Fig. 1 shows how the integral image points are calculated.

Fig. 1. Integral image calculation example.

Three types of features are used by the Viola-Jones face detector [13]. The two-rectangle feature is calculated by defining two rectangular regions of the same size and shape that are horizontally or vertically adjacent, and taking the difference between the sums of the pixel values in them. For the three-rectangle feature, three rectangular regions are defined, two outside rectangles and a center rectangle, and the sum over the outside rectangles is subtracted from the sum over the center rectangle. A four-rectangle feature computes the difference between diagonal pairs of rectangles. Fig. 2 illustrates these features.

Fig. 2. Rectangular features: (A) and (B) the two-rectangle feature, (C) the three-rectangle feature, (D) the four-rectangle feature.
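To make the feature computation concrete, the following is a minimal sketch (not the authors' code) of how an integral image can be built with NumPy and how a two-rectangle feature could then be evaluated with a handful of array lookups; the function names and the example window coordinates are illustrative only.

```python
import numpy as np

def integral_image(img):
    """ii(x, y) = sum of all pixels above and to the left (Equation 1),
    obtained via cumulative sums along both axes."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in a rectangle using at most 4 integral-image lookups.
    Coordinates are inclusive; the guards handle border rectangles."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

def two_rect_feature(ii, top, left, h, w):
    """Two-rectangle feature: difference between two horizontally
    adjacent regions of identical size (illustrative layout)."""
    left_sum = rect_sum(ii, top, left, top + h - 1, left + w - 1)
    right_sum = rect_sum(ii, top, left + w, top + h - 1, left + 2 * w - 1)
    return left_sum - right_sum

# Toy usage on a random 24x24 grayscale patch.
patch = np.random.randint(0, 256, (24, 24)).astype(np.int64)
ii = integral_image(patch)
print(two_rect_feature(ii, 4, 4, 8, 6))
```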

The number of computed features is very large; however, a small number of them can be combined into an effective classifier. The Viola-Jones algorithm uses a variant of AdaBoost to select these features and to train the classifier. The third concept of the Viola-Jones algorithm is the attentional cascade, a technique that cascades classifiers to reduce computation time and improve detection accuracy. The cascade stages are constructed by training classifiers with AdaBoost [13].
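The control flow of the attentional cascade can be pictured as a chain of increasingly strict boosted stages, where a window is rejected as soon as any stage votes negative. The sketch below only illustrates that flow; the stage scorers and thresholds are placeholders, not the trained detector.

```python
def cascade_classify(window, stages):
    """stages: list of (scorer, threshold) pairs, ordered cheap-to-strict.
    A window must pass every stage to be accepted; most windows are
    discarded by the first few stages, which keeps detection fast."""
    for scorer, threshold in stages:
        if scorer(window) < threshold:
            return False  # early rejection: no further stages evaluated
    return True
```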

B. Circular Hough Transform

To locate the eye in the captured images, the center of the iris is detected using the Circular Hough Transform (CHT), a modified version of the Hough Transform first introduced by Duda et al. [14]. The various modified Hough transforms are effective methods for curve detection in images. The CHT recognizes circular patterns in the processed image; these patterns may be complete circles or circular arcs. It transforms the set of feature points in the image into sets of accumulated votes in a parameter space: for each feature point, votes are accumulated in an array spanning all parameter combinations, and the highest number of votes indicates the presence of a shape. The circle pattern detected by the CHT is defined by Equation 2:

$(x - x_0)^2 + (y - y_0)^2 = r^2$    (2)

where $x_0$ and $y_0$ are the coordinates of the center and $r$ is the radius. An example demonstrating the CHT is illustrated in Fig. 3.

Fig. 3. The Circular Hough Transform: edge pixels in the input edge image accumulate points in the parameter space.

A group of edge points in an image is represented by the dark blue circles in Fig. 3. For each edge point, a circle of radius R is added to a 3-dimensional parameter-space accumulator (the light blue circles); the three parameters are the x and y coordinates of the circle center and its radius. The peak value in the accumulator occurs at the point where the circles overlap, which is the center of the original circle.
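As a sketch of this voting process (assuming a binary edge map as input; this is a didactic version, not an optimized implementation), each edge pixel votes for every candidate center at each radius in the allowed range:

```python
import numpy as np

def circular_hough(edges, r_min, r_max):
    """Accumulate votes in a 3-D (y0, x0, r) parameter space per Equation 2.
    `edges` is a binary edge map; returns the vote accumulator."""
    h, w = edges.shape
    radii = np.arange(r_min, r_max + 1)
    acc = np.zeros((h, w, len(radii)), dtype=np.int32)
    ys, xs = np.nonzero(edges)
    thetas = np.linspace(0, 2 * np.pi, 100, endpoint=False)
    for ri, r in enumerate(radii):
        # Each edge point (x, y) votes for all centers at distance r from it.
        x0 = (xs[:, None] - r * np.cos(thetas)[None, :]).round().astype(int)
        y0 = (ys[:, None] - r * np.sin(thetas)[None, :]).round().astype(int)
        valid = (x0 >= 0) & (x0 < w) & (y0 >= 0) & (y0 < h)
        np.add.at(acc[:, :, ri], (y0[valid], x0[valid]), 1)
    return acc  # the peak of `acc` gives the circle center and radius
```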

C. Support Vector Machine

The Support Vector Machine (SVM) has gained high popularity in the field of object tracking. SVMs are supervised learning models with associated learning algorithms that analyze data for classification. The input data is mapped into an N-dimensional space using a kernel function, such as the linear, quadratic, polynomial, Gaussian radial basis or multilayer perceptron kernel. The data points lying on the margin are called the support vectors. These support vectors are used by the SVM to define the optimal separating hyper-plane between the classes of input data, the one that leaves the maximum margin from both classes [15]. Fig. 4 shows a linear SVM hyper-plane in a 2-dimensional feature space.

Fig. 4. Linear SVM hyper-plane [15].

The figure illustrates two possible hyper-planes. The hyper-plane in Direction 2 leaves a larger margin for both classes and is therefore chosen as the separating hyper-plane.
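The maximum-margin idea can be illustrated in a few lines with scikit-learn on toy 2-D data (an illustration only, not the paper's feature set): only the support vectors determine the hyper-plane parameters.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable 2-D point clouds (toy data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="linear").fit(X, y)
# The separating hyper-plane is w.x + b = 0; only the support vectors
# (the points on the margin) determine w and b.
print("w =", clf.coef_[0], "b =", clf.intercept_[0])
print("support vectors per class:", clf.n_support_)
```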

D. HSI Color Space

HSI is a conic-coordinate representation of color points that rearranges the geometry of the RGB color representation. HSI stands for hue, saturation, and intensity. Hue describes the visual sensation that an area appears to be close to one of the colors red, yellow, green, or blue, or to a combination of two of them. Saturation is the colorfulness of the pixel relative to its brightness. Intensity is the total amount of light passing through. Fig. 5 shows the representation of the HSI color space.

Fig. 5. Representation of the HSI color space [16].

HSI separates luminance, which is the image intensity, from chrominance, which is the color information. This separation is very useful in eliminating the effect of changes in lighting conditions.
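For reference, one common closed-form RGB-to-HSI conversion is sketched below (OpenCV provides HSV and HLS but not HSI directly, so a direct formula is used; this is the standard geometric formulation, not necessarily the exact variant the authors used):

```python
import numpy as np

def rgb_to_hsi(rgb):
    """Convert an RGB image (floats in [0, 1]) to HSI.
    Hue is normalized to [0, 1]; eps avoids division by zero."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    eps = 1e-8
    intensity = (r + g + b) / 3.0
    saturation = 1.0 - np.minimum(np.minimum(r, g), b) / (intensity + eps)
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    theta = np.arccos(np.clip(num / den, -1.0, 1.0))
    hue = np.where(b <= g, theta, 2 * np.pi - theta) / (2 * np.pi)
    return np.stack([hue, saturation, intensity], axis=-1)
```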

E. Algorithm Implementation

The proposed eye gaze direction classification algorithm first detects the face in the image using the Viola-Jones face detector [13]. Defining the face region is beneficial for setting the minimum and maximum values of the eye radius, along with the other values used as CHT parameters. These values are calculated based on the detected face height, as sketched below.
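A minimal sketch of this step with OpenCV (assuming the opencv-python package and its stock frontal-face cascade; the ratios are the values reported in Section V, while the dictionary layout is our own):

```python
import cv2

def cht_params_from_face(gray):
    """Detect the face and derive CHT parameter ranges from its height:
    iris radius in [3%, 4.5%] of face height, delta = 20% of face height,
    threshold fixed at 0.15 (values as reported in Section V)."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    return {
        "r_min": max(1, int(0.03 * h)),
        "r_max": int(0.045 * h),
        "delta": 0.20 * h,
        "threshold": 0.15,
        "face_box": (x, y, w, h),
    }
```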

In addition, when the region scanned for circles is minimized, the number of detected circles, which are considered eye candidates, is reduced. This increases the accuracy of the Circular Hough Transform in eye detection.

After the face is detected, the algorithm employs the Circular Hough Transform for eye detection. The parameters of the Circular Hough Transform must be assigned appropriate values. These parameters include the radius, threshold, and delta. The Circular Hough Transform is flexible in determining the detected circle radius: the minimum and maximum radius values are set, and circles with radii within the given range are detected by the transform. The threshold is used to create a binary edge map, and the image is normalized before thresholding. Delta is defined as the maximum difference between circles detected by the transform at which they are still treated as the same circle; it accounts for differences in the center coordinates and the radii of the circles. Only one of such circles is returned and the others are dismissed. The Circular Hough Transform detects the circles in the provided eye region using parameters calculated from the detected face information: the radius range is set from 3% to 4.5% of the detected face height, delta is set to 20% of the face height, and the threshold parameter is set to 0.15. The selection of these values is discussed in detail in Section V. The detected circles are iris candidates, of which only two are true irises. To determine them, the two circles with the most circular-curve edge points satisfying the circle equation above (Equation 2) are taken as the true irises. To decide which of the two irises is clearer and more suitable for classification, the darker one is selected, owing to the black pupil in the middle of the iris. An example of eye detection using the presented algorithm is shown in Fig. 6.

Fig. 6. Eye detection using the implemented algorithm.

The Circular Hough Transform itself has no radius limitations: the minimum allowed radius is 1, and circles with any positive integer radius can be detected. Fig. 7 demonstrates the success of the Circular Hough Transform in detecting the eye in an image resized to 4 different resolutions (1280x720, 128x72, 38x42, and 14x16 pixels), with appropriate transform parameter values used for each image size.

Fig. 7. Detecting the eye at different resolutions of the eye image.

The processed image resolution is a factor that affects the required CPU processing time, since the method requires pixel-by-pixel processing. Fig. 8 shows the relation between the image resolution and the required CPU time when the same Circular Hough Transform parameters are used for all image sizes.

Fig. 8. CPU time for different image sizes (CPU time versus frame size in pixels).


As can be concluded, the required CPU time is larger for images with higher resolution.
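This trend is easy to reproduce with a simple timing loop (illustrative only; it uses OpenCV's built-in HoughCircles rather than the authors' MATLAB implementation, and assumes an 8-bit grayscale frame):

```python
import time
import cv2

def time_cht(gray, sizes=((1280, 720), (640, 360), (320, 180))):
    """Time circle detection at several frame sizes; larger frames
    require more pixel-by-pixel work and therefore more CPU time."""
    for w, h in sizes:
        frame = cv2.resize(gray, (w, h))
        t0 = time.perf_counter()
        cv2.HoughCircles(frame, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                         param1=100, param2=30,
                         minRadius=int(0.03 * h), maxRadius=int(0.045 * h))
        print(f"{w}x{h}: {(time.perf_counter() - t0) * 1000:.1f} ms")
```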

IV. EYE DIRECTION DETECTION

Fig. 9 illustrates the stages of the proposed eye gaze direction classification system. As explained earlier, the Circular Hough Transform is used to locate the eye iris. After the iris is located, the direction of the eye gaze needs to be determined and classified into one of two directions: right or left.

Fig. 9. The proposed eye gaze classification system stages: Circular Hough Transform, pre-processing (including RGB-to-HSI conversion), and the classification function.

The Circular Hough Transform defines each circle it detects by 3 values: the x and y center coordinates and the circle radius. These values are used in the preprocessing stage to extract a sub-image from the original video frame after the image is converted to the HSI scale. The extracted sub-image is made large enough to ensure that information about the eye region is obtained completely. Two regions are defined in the extracted sub-image, a right region and a left region, each with width and height equal to the iris diameter. The left and right regions are used to determine the eye gaze direction: depending on the gaze direction, one of the two regions will contain the white area of the eye while the other will contain skin. In the HSI color space, the white and black colors are given maximum hue values while the normal colors have lower hue values (less than the maximum). Thus, there is a noticeable difference between the hue of the white area of the eye and that of the skin area, as shown in Fig. 10, and hence the average of the H component of the HSI scale in each region, left and right, can be used as a good feature for eye direction classification.

Fig. 10. Components of the eye image in the HSI scale (original colors, hue, saturation, intensity).

The average is calculated according to Equation 3:

$\mathrm{avg} = \frac{1}{M} \sum_{k=1}^{M} X_k$    (3)

where M is the number of pixels in the region and $X_k$ is the hue value of pixel k. Fig. 11 shows the detailed block diagram that sums up the preprocessing stage for direction detection, and a sketch of this step follows below.

Fig. 11. The block diagram of the preprocessing stage for direction detection: convert RGB to HSI, select the hue channel, and average it over the defined regions.
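A sketch of this preprocessing step: given the iris circle returned by the CHT, extract the left and right flanking regions (each sized one iris diameter) and average their hue per Equation 3. The exact sub-image geometry is our assumption based on the description above, and the regions are assumed to lie inside the frame.

```python
import numpy as np

def gaze_features(hsi, cx, cy, r):
    """Return [avg_hue_left, avg_hue_right] for the regions flanking
    the iris; each region is one iris diameter (2r) wide and high."""
    d = 2 * r
    top, bottom = cy - r, cy + r
    left_region = hsi[top:bottom, cx - r - d:cx - r, 0]    # hue channel
    right_region = hsi[top:bottom, cx + r:cx + r + d, 0]
    # Equation 3: avg = (1/M) * sum of hue values X_k over the region.
    return np.array([left_region.mean(), right_region.mean()])
```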

To determine whether the gaze direction is left or right, an SVM classifier is used. The input to the SVM is the feature vector containing the average hue values of the left and right regions of the sub-image. The SVM was trained on a dataset of images captured from 40 subjects of different ages and genders. The dataset, which is accessible online [17], contains 4000 images covering left and right eye movements.

V. EXPERIMENTAL RESULTS

This section presents the experimental results for the proposed eye direction classification system. The experiments were carried out in MATLAB. In the proposed system, the eye is detected using the Circular Hough Transform. As mentioned earlier, the performance of the CHT in detecting the eyes can be enhanced by using face detection: after the face is detected, its size can be used to calculate the minimum and maximum radius values used by the CHT for eye detection. Fig. 12 shows the range of possible iris radius values relative to the face height. This range was calculated by detecting the face in 40 images and finding the iris radius in pixels in each image; the ratio of the iris radius to the detected face height ranges from 3% to 4.5%.

The relation between the face height and the delta parameter of the Circular Hough Transform was investigated as well. The minimum delta found was 30% of the face height; to provide an error margin, the delta value used in the CHT is set to 20% of the face height.

In addition, the eye region can be defined from the detected face area. This minimizes the number of detected eye candidates, because the original image is likely to contain circles other than the eyes, especially with noisy image backgrounds. Fig. 13 compares the number of circles detected in the image with and without using the detected face information to define the eye region.
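Putting the stages together, the classification experiment described in this section could be reproduced along these lines (a sketch: the 2-value hue features and the 3600/400 train/test split follow the paper, while the kernel list shown is only a subset of those compared and the feature/label arrays are assumed to be precomputed):

```python
import numpy as np
from sklearn.svm import SVC

def train_and_evaluate(features, labels):
    """features: (4000, 2) hue-average vectors; labels: 0=left, 1=right.
    3600 images for training, 400 held out for testing, as in the paper."""
    X_train, y_train = features[:3600], labels[:3600]
    X_test, y_test = features[3600:], labels[3600:]
    results = {}
    for kernel in ("linear", "poly", "rbf"):  # subset of compared kernels
        clf = SVC(kernel=kernel).fit(X_train, y_train)
        results[kernel] = clf.score(X_test, y_test)
    return results  # the paper reports the linear kernel performing best
```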

Fig. 12. Range of possible iris radius values compared to the face height.

Fig. 13. The number of detected circles with and without using face detection to define the eye region.

The eye direction is classified using the SVM. The accuracy of the proposed system was tested using the 4000-image database [17]: 3600 images were used for training and the remaining 400 images were used as test samples. Fig. 14 shows the accuracy of eye direction classification using SVM with different kernels. The highest accuracy, 92.1%, is achieved with the linear SVM kernel.

Fig. 14. Accuracy of the SVM with different kernel functions: linear, quadratic, multilayer perceptron (MLP), and Gaussian radial basis function (RBF).

VI. CONCLUSION

This paper presented a new method for eye direction classification. Face detection was used to detect the eye region and to dynamically calculate the Circular Hough Transform parameters; in addition, it reduces the number of detected circles in the image. The Circular Hough Transform is used to locate the centers of the eye irises. The proposed method uses color features of the detected eye region as inputs to an SVM classifier to determine the eye gaze direction. The algorithm was tested on a database of 4000 images of 40 people of different ages and genders, and the obtained eye direction classification accuracy is 92.1%. This result shows that it is possible to use a classification function to determine the eye gaze direction.

REFERENCES

[1] T. Ohno, "One-Point Calibration Gaze Tracking Method," Proceedings of the Symposium on Eye Tracking Research and Applications, pp. 34-34, 2006.
[2] D. Tweed and T. Vilis, "Geometric Relations of Eye Position and Velocity Vectors During Saccades," Vision Research, vol. 30, no. 1, pp. 111-127, 1990.
[3] C. S. Lin, H. I. Chen, C. H. Lin, M. S. Yeh, and S. L. Lin, "Polar Coordinate Mapping Method for an Improved Infrared Eye-Tracking System," Journal of Biomedical Engineering - Applications, Basis and Communications, vol. 17, no. 3, pp. 141-146, 2005.
[4] C. H. Morimoto, D. Koons, A. Amir, M. Flickner, and S. Zhai, "Keeping an Eye for HCI," Computer Graphics and Image Processing, pp. 171-176, 1999.
[5] D. Beymer and M. Flickner, "Eye Gaze Tracking Using an Active Stereo Head," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 451-458, 2003.
[6] D. H. Yoo, J. H. Kim, B. R. Lee, and M. J. Chung, "Non-Contact Eye Gaze Tracking System by Mapping of Corneal Reflections," Fifth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 94-99, 2002.
[7] D. H. Yoo and M. J. Chung, "A Novel Nonintrusive Eye Gaze Estimation Using Cross-Ratio Under Large Head Motion," Computer Vision and Image Understanding, Special Issue on Eye Detection and Tracking, vol. 98, no. 1, pp. 25-51, 2005.
[8] R. Kothari and J. Mitchell, "Detection of Eye Locations in Unconstrained Visual Images," Proceedings of the International Conference on Image Processing, vol. 3, pp. 519-522, 1996.
[9] J. Gao, S. Zhang and W. Lu, "Application of Hough Transform in Eye Tracking and Targeting," 9th International Conference on Electronic Measurement & Instruments (ICEMI'09), pp. 751-754, 2009.
[10] M. S. Devi, M. V. Choudhari and P. Bajaj, "Driver Drowsiness Detection Using Skin Color Algorithm and Circular Hough Transform," 4th International Conference on Emerging Trends in Engineering and Technology (ICETET), pp. 129-134, 2011.
[11] M. Mehrubeoglu, L. M. Pham, H. T. Le, R. Muddu, and D. Ryu, "Real-time Eye Tracking Using a Smart Camera," IEEE Applied Imagery Pattern Recognition Workshop (AIPR), pp. 1-7, 2011.
[12] N. Alioua, A. Aouatif, R. Mohammed and D. Aboutajdine, "Eye State Analysis Using Iris Detection Based on Circular Hough Transform," International Conference on Multimedia Computing and Systems (ICMCS), pp. 1-5, 2011.
[13] P. A. Viola and M. J. Jones, "Robust Real-Time Face Detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[14] R. O. Duda and P. E. Hart, "Use of the Hough Transformation to Detect Lines and Curves in Pictures," Communications of the ACM, vol. 15, no. 1, pp. 11-15, 1972.
[15] S. Theodoridis and K. Koutroumbas, "Pattern Recognition," Academic Press, p. 95, 2006.
[16] J. C. Russ, "The Image Processing Handbook," CRC Press, p. 45, 2011.
[17] A. Al-Rahayfeh, "Face Images Dataset," June 2013. [Online]. Available: http://al-rahayfeh.com [Accessed Sep. 19, 2013].