Research Proficiency Evaluation Project Report

Robust 3D Gaze Estimation

Debanga Raj Neog, #72289119
PhD Track MSc, Department of Computer Science
The University of British Columbia

Supervisor: Dinesh Pai
Co-supervisor: Robert J. Woodham

Abstract. Eye tracking and gaze estimation using video, also known as videooculography, is an important topic of investigation in neuroscience, computer vision, and human-machine interaction. Recently, computer graphics researchers have also shown interest in eye movement analysis for generating realistic eye movements in animation. To address the need for a robust gaze estimation technique, in this project we estimate gaze in world coordinates from video. We have developed a gaze estimation technique that uses binocular information to obtain high precision in the gaze point estimate, and that is robust against interferences such as eye blinks and the presence of eyelashes. In this report we discuss a framework that estimates the globe configuration in world coordinates to obtain 3D gaze, and we present the implementation details and an evaluation of the framework that demonstrate its robustness.

Keywords. Pupil tracking, gaze, globe, ocular torsion

1. INTRODUCTION

Analysis of eye movements can provide deep insight into behavior during tasks such as object interception. Beyond their role as a sensory organ, the eyes can, because we can control them voluntarily, also be used to generate desired outputs for various applications; this has led to the development of gaze-controlled human-machine interfaces. The eyes also play an important role in conveying our emotions during conversation, which is why they are considered one of the most integral parts of facial expression detection algorithms in computer vision. Furthermore, researchers in computer graphics are analyzing eye movement data to build generative models for more realistic eye movements in animated characters. Robust, non-intrusive eye detection and tracking is therefore crucial for the analysis of eye movement, and this has motivated me to work toward the development of a robust eye detection and tracking system. In the scientific community, the importance of eye movements is implicitly acknowledged as the method through which we gather the information necessary to identify the properties of the visual world [Hansen and Ji 2010]. The two key components of research on eye detection and tracking are eye localization in the image and gaze estimation.


Before discussing these components of our 3D gaze estimation framework further, we define a few terms. To provide readers with a clear understanding, Fig. 1 illustrates these terms in the context of our system setup, which consists of a motion capture system and an eye tracker. More details on the equipment can be found in Section 3.

Videooculography. Eye tracking performed using video is popularly known as videooculography. In our project, video is captured using a head-mounted eye tracker.

Gaze. Gaze is the direction in which the eyes are looking. Gaze direction is modeled either as the optical axis or as the visual axis. The optical axis (a.k.a. the line of gaze, LoG) is the line connecting the pupil center, the cornea center, and the globe center. The line connecting the fovea and the center of the cornea is known as the visual axis (a.k.a. the line of sight, LoS). Gaze can be defined either locally in the head-fixed coordinates or globally in the world coordinates; gaze in the world coordinates is also referred to as gaze in 3D.

Monocular gaze. For a single eye, gaze is defined in terms of a point of regard, which is the intersection of the visual axis and a scene plane.

Binocular gaze. If both eyes are considered, the point where the visual axes of the two eyes meet is the point of regard of binocular gaze. Binocular gaze is illustrated in Fig. 1, where the two visual axes V1 and V2 meet at an infrared marker M; M is thus the point of regard of binocular gaze in this example.

Ocular torsion. Ocular torsion is defined as the rotation of the globe around the visual axis, as shown in Fig. 2. A more detailed mathematical description of eye rotation can be found in Section 5.3.

Fig. 1: Illustrative diagram of our experimental setup.


Fig. 2: Rotation of globe in the head-fixed coordinates.

Although many gaze estimation techniques are available in the literature, some inherent problems, such as eye blinks and eyelash interference, are still not completely solved. In this project our goal is to develop a robust algorithm for 3D gaze estimation in the world coordinates. The rest of the report is organized as follows: Section 2 provides the problem statement of our project. In Section 3, the equipment used for eye and motion capture is described. A review of related work, including the major challenges in this area of research and recent attempts to solve them, is provided in Section 4. Section 5 describes our proposed framework, with the necessary mathematical background. Results of our evaluations are provided in Section 6, and finally Section 7 summarizes our conclusions and proposed future directions.

2. PROBLEM STATEMENT

The goal of our project is to develop a robust videooculography-based 3D gaze estimation technique to support the computer vision, neuroscience, and computer graphics communities in their research related to eye detection and tracking. Existing videooculography methods have not attempted to estimate binocular gaze under unrestrained head movement in a way that is robust against the interference produced by eye blinks. To solve this problem, our investigation is broadly divided into two phases: (1) estimating gaze in the head-fixed coordinates with high accuracy by using binocular information, where we detect and track suitable eye features for robust pupil position estimation and blink detection, and (2) estimating the position and orientation of the globe in the world coordinates.


Fig. 3: Block diagram of proposed gaze estimation framework

As shown in the block diagram of the proposed framework in Fig. 3, we combine eye tracking data and motion capture data to estimate gaze in the world coordinate space. In our 3D binocular gaze estimation framework (see Fig. 3), we collect video from a head-mounted eye tracker, track the head and the eye tracker itself using a motion capture system, use computer vision algorithms to track eye features in the video and remove eye blinks, and finally use a projective camera-to-globe model (see Section 5.3) to estimate the globe configuration in the world coordinates. By estimating the meeting point of the visual axes of the two eyes, the gaze in the world coordinates, which we also call gaze in 3D, can be estimated as shown in Fig. 1.

3. EQUIPMENT

The eye feature detection and 3D gaze estimation algorithms of our project are evaluated on a video database collected from a binocular eye tracking device (C-ETD, Chronos Vision, Berlin) with IR light sources and CMOS cameras operating in the near-IR range [Chronos 2004]. The C-ETD consists of a head-mounted unit and a system computer. Estimating the configuration of the globe in a situation such as an object interception task is complicated, so we use a motion capture system (Vicon Motion Capture System, Vicon, Los Angeles) to localize, in the world coordinates, IR markers placed on the eye tracker and on the head. In Fig. 4, the subject is wearing the C-ETD while body markers are simultaneously tracked using the Vicon motion capture system.


Fig. 4: Subject wearing Chronos eye tracking device in a Vicon motion capture system experimental setup (left), and a sample image from the video captured using Chronos eye tracking device (right).

4. RELATED WORK

Eye feature detection and tracking is a widely researched topic in computer vision, and in this section we provide a brief review of related work. The review emphasizes the importance of combining traditional image-based eye tracking with an eye model to estimate gaze in the world. Furthermore, the limited number of studies on ocular torsion and eye blink detection leaves scope for future investigation.

4.1 Motivating applications

In computer vision, the study of eye tracking as a tool for different applications has been investigated for many years. [Duchowski 2002] reports on eye tracking methodologies spanning three disciplines: neuroscience, psychology, and computer science, with brief remarks about industrial engineering and marketing. Jacob and Karn [2003] discussed the applications of eye movements mainly in the following major areas: analyzing interfaces, measuring usability, and gaining insight into human performance. Another promising area is the use of eye tracking techniques to support interface and product design [Jacob and Karn 2003]. Human-computer interaction has become an increasingly important part of our daily life, and the user's gaze can provide a convenient, natural, and high-bandwidth source of input. A non-intrusive eye tracking method based on iris center tracking was presented in [Kim and Ramakrishna 1999] for human-computer interaction, and in [Atienza and Zelinsky 2002] active gaze tracking is performed for human-robot interaction. Another practical application of eye tracking is driving safety: it is widely accepted that deficiencies in visual attention are responsible for a large proportion of road traffic accidents [Huang and Wechsler 1998]. Eye movement recording and analysis provide important techniques to understand the nature of the driving task and are important for developing driver training strategies and accident countermeasures. A further important application is iris recognition, an automated method of biometric identification that applies mathematical pattern-recognition techniques to video images of an individual's irises, whose complex random patterns are unique and can be seen from some distance [Daugman 2004].



4.2 Eye feature detection

Starting with recent advances in eye feature detection for videooculography, the taxonomy of eye detection consists of mainly two classes of methods: shape-based methods and appearance-based methods [Hansen and Ji 2010]. Shape-based methods consist of a geometric model of the eye and a similarity measure. In elliptical shape models, the geometric model parametrizes the pupil or iris of the eye as an ellipse. [Young et al. 1995] extracted the iris and pupil using a specialized Hough transform, but with a requirement of explicit feature extraction; a detailed description of the Hough transform is provided in Section 5.1.1. The circularity constraint limits the applicability of the circular Hough transform to near-frontal faces, while a modified Hough transform for detecting ellipses is expensive in terms of processing time. In [Masek et al. 2003] and [Min and Park 2009], the Hough transform for circles and its modified versions are used for detecting the iris. Among approaches other than the Hough transform, [Li et al. 2005] introduced a low-cost eye tracking algorithm called Starburst for detecting the pupil using an elliptical shape model. The algorithm locates the strongest gray-level differences along rays shot from an estimated pupil center and recursively casts new rays from previously found maxima to obtain a better estimate of the pupil contour; an ellipse is then fitted using a model-based minimization. The Starburst algorithm is more accurate than pure feature-based approaches, yet is significantly less time-consuming than pure model-based approaches. Other work, such as [Hansen and Pece 2005], has applied EM and RANSAC optimization schemes to locally fit an ellipse to the iris in the image. The dynamic nature of the pupil diameter under changing environmental lighting inspired [Pamplona et al. 2009] to propose a pupil reflex model, in which a delay differential equation adapts the model even to abrupt illumination changes. An important property of the shape-based methods is their general ability to handle shape, scale, and rotation changes. While the shape of the eye is an important descriptor, so is its appearance. The appearance-based methods, also known as image template or holistic methods, detect and track the eyes directly based on the photometric appearance, as characterized by the color distribution or the filter responses of the eye and its surroundings.

4.3 Eye tracking systems

As the eye scans the environment or fixates on particular objects in the scene, a gaze tracker simultaneously localizes the eye position in the image and tracks its movement over time to determine the direction of gaze. For image-based eye detection and tracking, we first localize the eye position in the image, and the detected eyes are then tracked frame by frame in the video.

Classical methods for eye tracking. There are mainly three general methods to track the motion of the eyes, briefly described below.

a) Scleral search coils: These coils can be integrated into contact lenses, and the orientation of the coil can be estimated in a magnetic field. When a coil of wire moves in a magnetic field, the field induces a voltage in the coil; if the coil is attached to the eye, a signal of eye position is produced.


The main advantage of this method is its accuracy and almost unlimited resolution. However, it is an invasive method, and the thin wire connecting the coil to the measuring device is uncomfortable for the user.

b) Electrooculography (EOG) based tracking: In this method, sensors (electrodes) are attached to the skin around the eyes to measure the electric field. The method uses the fact that the eye acts as a dipole in which the anterior pole is positive and the posterior pole is negative. Since there is a permanent potential difference of approximately 1 mV between the cornea and the fundus, small voltages can be recorded from the region around the eyes that vary as the eye position varies. By carefully placing electrodes, it is possible to record the horizontal and vertical movements separately, and the method can detect eye motion even when the eye is closed. However, the signal can change when there is no eye movement; it is prone to drift and spurious signals, and the state of the contact between the electrodes and the skin introduces additional variability. It is therefore not a reliable method for quantitative measurement, particularly of medium and large saccades, but it is a cheap, easy, and non-invasive way of recording large eye movements.

c) Video based eye tracking (videooculography): Depending on the light source, two types of video-based eye trackers are found: infrared-source based and visible-light-source based.

IR source based. If a fixed light source is directed at the eye, the amount of light reflected back to a fixed detector varies with the eye's position. This principle has been exploited in a number of commercially available eye trackers. Infrared light is used because it is "invisible" to the eye and therefore does not distract the subject, and since infrared detectors are not influenced to any great extent by other light sources, the ambient lighting level does not affect measurements. Eye blinks can be a problem: not only do the lids cover the surface of the eye, but the eye also retracts slightly, altering the amount of light reflected for a short time after the blink.

Visible light source based. With the development of video and image analysis technology, various methods of automatically extracting the eye position from images of the eye have been developed. In some systems a bright light source is used to produce "Purkinje" images, which are reflections of the light source from various surfaces in the eye (the front and back surfaces of the cornea and lens); tracking the relative movements of these images gives an eye position signal. More commonly, the video images are processed using computer software to calculate the position of the pupil and its center, which allows the vertical and horizontal eye movements to be measured. However, image-based methods tend to have lower temporal resolution than IR techniques, and spatial resolution can also be limited.

Eye tracking systems can also be divided into two types in terms of hardware arrangement: remote and head mounted. The main advantages of head-mounted eye trackers are that a detailed view of the eye region can be obtained, they can give more accurate results than remote eye trackers, and they are more flexible in terms of controlling the view of the eye region. However, head-mounted systems are more intrusive than remote eye trackers.



4.4 Eye feature tracking

Video-based eye tracking methods that rely purely on detecting the eyes in a frame-by-frame manner do not use information from previous frames. Such approaches can be useful for initialization and for validating hypotheses, but they are not very robust, and therefore a filtering approach may be more appropriate. In [Ji and Yang 2001], a real-time prototype computer vision system for monitoring driver vigilance was proposed, in which the pupil of the eye is tracked under IR illumination. [Hansen and Pece 2005] proposed an active contour tracker that combines particle filtering with the EM algorithm for estimating gaze by tracking the pupil state; this method is claimed to be robust against sudden changes in lighting conditions from IR to non-IR. Tracking and detection of eyes through template-based correlation maximization is simple and effective [Hallinan 1991], [Grauman et al. 2001]. Hallinan uses a model consisting of two regions with uniform intensity, one corresponding to the dark iris region and the other to the white area of the sclera [Hallinan 1991]; this approach constructs an idealized eye and uses statistical measures to account for intensity variations in the eye templates. The main disadvantages of such image-template-based methods are the lack of size invariance and the fact that we do not have direct access to the eye parameters. Kalman filter based approaches have been used in the literature to track the pupil center as a characteristic feature of the eye movement in the image. In the Kalman filter algorithm, a series of noisy measurements is used over time to produce estimates of unknown variables that tend to be more precise than a single measurement alone. The Kalman filter has been applied to tracking the pupil center in videooculography [Zhu et al. 2002], [Xie et al. 1995], but it only provides a first-order approximation for general systems. The extended Kalman filter [Welch and Bishop 1995] is one way to handle the nonlinearity. A more general framework is provided by particle filtering techniques: particle filtering is a Monte Carlo solution for dynamic systems of general form, and with sufficient samples the solutions approach the Bayesian estimate. In [Hansen and Pece 2005], the pupil center of the eye is tracked using an active contour tracker that combines particle filtering with the expectation maximization algorithm. The method exhibits robustness to light changes and camera defocusing; consequently, the model is well suited for systems using off-the-shelf hardware, but may equally well be used in controlled environments, such as IR-based settings, and it is even capable of handling sudden changes between IR and non-IR light conditions without changing the parameters. Morimoto et al. use a pupil tracking system with a single wide-field CCD camera and IR sources of the same wavelength placed in two concentric circles centered on the optical axis of the camera, producing dark and bright pupil images depending on the circle radii [Morimoto et al. 2000]. Pupils are detected by simple thresholding of the difference between the dark and bright pupil images from two consecutive frames, and the resulting blob is tracked using the Kalman filter.


4.5 Torsion estimation

Gaze direction is modeled either as the optical axis or as the visual axis, and ocular torsion is defined as the rotation of the globe around the visual axis. Ocular torsion estimation using scleral search coils is intrusive and uncomfortable for the subjects. Another alternative is to use electrooculography, but the data obtained are usually of low resolution. Videooculography-based ocular torsion estimation is, therefore, a non-intrusive and viable alternative. Video-based ocular torsion estimation can be divided into three types: visual inspection based, cross-correlation or template based [Moore et al. 1996], and feature tracking based. Methods based on visual inspection provide a reliable estimate of the torsion, but they are labor intensive and slow. Cross-correlation based approaches have problems dealing with imperfect pupil tracking, eccentric gaze positions, changes in pupil size, and nonuniform lighting [Ong and Haslwanter 2010]. Tracking local iris features in the image can give a good ocular torsion estimate, but it may fail when image contrast is low or in the presence of ambient lighting or shadowing. In [Ong and Haslwanter 2010], the authors proposed a feature-tracking-based torsion estimation approach using iris features identified as maximally stable volumes; the pupil is tracked to obtain the orientation, and the affine transformation from an ellipse to the detected pupil in the image plane is computed. [Moore et al. 1996] used a polar cross-correlation based approach to determine torsion; this method relies on the fact that most of the variation in the pixel intensity of a digitized image of the iris occurs in the angular direction of a polar coordinate system centered on the pupil. A template matching based approach is used in [Zhu et al. 2004] for estimating ocular torsion, which is not only faster but also more robust than polar cross-correlation based approaches. They also proposed an elastic iris-pupil model to account for the change in pupil size under variable illumination during eye movements, but this method measures torsion based only on image data, without considering any 3D eye model.

4.6 Eye blink detection

Detecting eye blinks during eye movement is another important direction of our investigation. In [Sirohey et al. 2002], the eye corners, eyelids, and irises were detected in every frame of an image sequence, and the movements of the irises and eyelids were analyzed to determine changes in gaze direction and eye blinks, respectively. A state machine based model is proposed in [Tian et al. 2000]: based on the detection of the iris in the input frame, two states, 'open' and 'closed', are defined, and a template for the eye is created with parameters such as the eyelid positions and iris diameter for feature detection. [Feng and Yuen 1998] used a variance projection function to locate landmarks of the human eye, which guide the detection of the eye position for human face recognition. This technique is used in our project to determine the eye state and localize the eyelids.

4.7 Gaze estimation

The main aim of gaze modeling in videooculography is to determine the relation between the image data and the gaze direction. From the image data we can only obtain information about the optical axis, assuming we know the projection model between the image space and the head-fixed coordinate space.


The angle between the visual axis and the optical axis is usually determined by a calibration process. When the point of gaze is at a finite but relatively large distance, the assumption that the fixation line and the visual axis are parallel is reasonable [Guestrin and Eizenman 2010]. Once calibration is done, feature-based gaze estimation can be performed in two ways: model-based approaches and regression-based approaches. In model-based approaches, the physical structure of the eye is modeled geometrically as a 3D model, and the gaze vector is defined on that geometric model; by transforming the gaze vector to the world coordinates and finding its intersection with the objects of the scene, the point of regard is obtained. Moore et al. described the standard central projection of the eye onto the image plane using a geometrical eye model [Moore et al. 1996]; the Fick coordinates corresponding to the pupil center in the image can be obtained under various projections using this model, and a detailed mathematical formulation is discussed in Section 5.3. In [Tsukada et al. 2011], the authors proposed an iris detection and gaze estimation technique that models the iris as an ellipse with only two parameters, by projecting the circular iris of fixed radius in the head-fixed coordinates onto the image plane. Regression-based methods assume that the mapping from the image space to the gaze coordinates has a particular parametric form, such as a polynomial, or a nonparametric form, such as a neural network. Usually the pupil-glint vector is used as the image feature vector, which is mapped to the point of regard by a polynomial approximation. Morimoto et al. used a single camera and a second degree polynomial for this mapping, but this mapping is found to be very sensitive to head movements [Morimoto et al. 2000]. By using additional cameras, improved performance can be obtained for 2D regression based gaze estimation methods [Zhu and Ji 2007].

5. TECHNIQUES

In this section, the major techniques used in our project are discussed with mathematical details. We describe the techniques in the sequence followed in our proposed videooculography framework.

5.1 Eye feature detection in image

Eye detection methods always start with a model of the eye, either explicit or implicit. It is essential to identify an eye model that is sufficiently expressive to account for a large variability in appearance and dynamics. Whether the model is rigid or deformable, the taxonomy of eye detection consists of mainly two classes of methods: shape-based methods and appearance-based methods [Hansen and Ji 2010].

5.1.1 Eye feature estimation. We first need to select good, relevant features to track. The features most commonly used for tracking are the pupil, the iris, and the eyelids; we also need to remove artifacts such as eyelashes. In the image of the eye, the pupil is the most prominent feature to track due to its characteristic intensity profile: it is a dark, roughly elliptical region surrounded by the brighter iris and the white sclera. Much of the literature focuses on detecting the pupil and tracking it to obtain information about the eye movement. In our method, the Hough transform is used to localize the pupil in the image obtained from the eye tracking device.


If observed carefully, one can notice that the pupil is actually neither a perfect circle nor a perfect ellipse, so we cannot directly use the Hough transform for circles for precise pupil detection; furthermore, the elliptical Hough transform is computationally expensive. We therefore use the Hough transform to provide an initial guess of the pupil position, or in the case of pupil occlusion. We discuss our technique further after briefly introducing the Hough transform.

Hough transform. The Hough transform has been used widely in the fields of image processing and computer vision for feature extraction. Classically, it was first proposed to detect lines in an image; various later works have extended its applicability to detect arbitrary shapes, of which circles and ellipses are the most common [Ballard 1981].

Fig. 5: Illustration: Hough transform for circles [Habib and Kelley 2001]

Line detection. The simplest shape detection that can be performed using the Hough transform is the detection of straight lines. A straight line can be expressed using the linear equation y = mx + c. Consider a point (x, y) in the image space. Given the point (x, y), and considering the parameter pair (m, c) to be unknown, the point (x, y) can be represented as a straight line m = (−1/x)·c + (y/x) in the parameter space. If we now consider several points in the image space that lie on a straight line, the lines in the parameter space corresponding to those points intersect at a single point, say (m0, c0). If the parameter space is divided into bins, we can accumulate the votes cast by each point in the image space; for perfectly collinear points, the bin for (m0, c0) receives the maximum number of votes. Unfortunately, one of the main problems with the Cartesian parametrization is that vertical lines lead to unbounded values in the parameter space. Therefore, for computational reasons, polar coordinates are used for the parameter space, with parameters r and θ: r represents the distance between the line and the origin, while θ is the angle of the vector from the origin to the closest point on the line (both in the image space). Eq. 1 is the equation of a line in the image space, and a point (x0, y0) in the image space is mapped to the parameter space as given in Eq. 2. Eq. 2 corresponds to a sinusoidal curve in the (r, θ) plane; therefore, points in the image space that lie on a straight line produce sinusoids that cross at the parameters of that line.

y = −(cos θ / sin θ) · x + r / sin θ    (1)

r(θ) = x0 · cos θ + y0 · sin θ    (2)
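To make the voting procedure concrete, the following Python sketch implements a minimal (r, θ) accumulator for line detection from a set of edge points, following Eq. 2. It is an illustrative toy, not the implementation used in our project (which is written in MATLAB); the bin counts and the edge_points input are assumptions.

```python
import numpy as np

def hough_lines(edge_points, img_diag, n_theta=180, n_r=200):
    """Accumulate votes in (r, theta) space for a set of (x, y) edge points (Eq. 2)."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    r_bins = np.linspace(-img_diag, img_diag, n_r)
    acc = np.zeros((n_r, n_theta), dtype=np.int32)
    for x, y in edge_points:
        # Each point votes along the sinusoid r(theta) = x cos(theta) + y sin(theta).
        r = x * np.cos(thetas) + y * np.sin(thetas)
        r_idx = np.digitize(r, r_bins) - 1
        valid = (r_idx >= 0) & (r_idx < n_r)
        acc[r_idx[valid], np.arange(n_theta)[valid]] += 1
    # The strongest line corresponds to the accumulator cell with the most votes.
    i, j = np.unravel_index(np.argmax(acc), acc.shape)
    return r_bins[i], thetas[j], acc

# Example: points on the line y = x vote together at r near 0, theta near 135 degrees.
pts = [(t, t) for t in range(50)]
r_hat, theta_hat, _ = hough_lines(pts, img_diag=100)
print(r_hat, np.degrees(theta_hat))
```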

Circle detection. The Hough transform for circle detection follows an approach similar to line detection. It can be used to determine the parameters of a circle when a number of points that fall on its perimeter are known. A circle with radius R and center (a, b) can be described by the parametric equations:

x = a + R·cos(θ)    (3)

y = b + R·sin(θ)    (4)

When the angle θ sweeps through the full 360 degree range, these points trace the perimeter of the circle. If the circles in an image are of known radius R, the search can be reduced to 2D: the objective is to find the (a, b) coordinates of the centers. The locus of (a, b) points voted for by an image point lies on a circle of radius R centered at (x, y); the true center point is common to all these parameter circles and can be found with a Hough accumulation array. Fig. 5 shows an illustrative diagram of circle detection using the Hough transform.

Pupil center detection technique. Before applying the Hough transform, we need to detect the pupil edges present in the image. To detect pupil edges at all orientations, we generate an edge image Iedge using the Canny edge detector. The Hough transform for circles is then applied to Iedge to detect an approximate location of the pupil; the range of pupil radii supplied to the Hough transform is estimated by a manual calibration from a reference image. Fig. 6 shows the method findPupilCenter used for pupil center estimation, which also calls the method findPupilOcclusion described next. In the pupil center finding algorithm of Fig. 6, the input arguments are the input image I from the eye tracker, the radius RHT of the detected Hough circle, and the center of the Hough circle C = (xHT, yHT); the function returns the position Pc of the pupil center. As mentioned above, we need to consider pupil occlusion: the pupil can be occluded by the eyelids mainly in two situations, namely at the onset of (or just after) a blink, and when the subject looks extremely up or down. The pupil can also be partially occluded during pupil dilation in a dark environment. Although we are not interested in pupil occlusion during eye blinks, a center-of-mass estimate of the pupil center can be very inaccurate when the pupil is occluded during extreme eye movements. Based on our observations, we can safely assume that occlusion is caused by the upper and lower eyelids. We define a pupil horizontal width profile d(r) of the estimated pupil region as a criterion for determining pupil occlusion. The pupil occlusion detection algorithm and the pupil horizontal width profile are described in Fig. 7. The symmetry metric κ mentioned in Fig. 7 is defined as follows:


Fig. 6: Pupil center finding algorithm.

κ = (1/100) · Σ_{i=1}^{1.5·RHT} d(µ − 1.5·RHT + i) · d(µ + i)    (5)

where

µ = ( Σ_{i=1}^{length(d(r))} d(i) · i ) / ( Σ_{i=1}^{length(d(r))} d(i) )    (6)

The parameter κo used in the pupil occlusion detection algorithm was experimentally determined to be approximately 600.
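As an illustration, the following Python sketch computes the horizontal width profile d(r) of a binary pupil mask, the centroid row µ of Eq. 6, and the symmetry metric κ of Eq. 5. It is a simplified rendering of the idea, not the project's MATLAB implementation; the binary-mask input and the helper name are assumptions, and the report's threshold κo ≈ 600 would need retuning for a different image resolution.

```python
import numpy as np

def pupil_occlusion_metric(pupil_mask, r_ht):
    """Compute (kappa, mu) from a binary pupil mask and the Hough-circle radius r_ht."""
    # d(r): number of pupil pixels in each image row (horizontal width profile).
    d = pupil_mask.sum(axis=1).astype(float)
    rows = np.arange(1, len(d) + 1)
    mu = (d * rows).sum() / max(d.sum(), 1e-9)     # Eq. 6: centroid row of the profile
    half = int(round(1.5 * r_ht))
    kappa = 0.0
    for i in range(1, half + 1):                   # Eq. 5: products of profile values around mu
        lo = int(round(mu - half + i)) - 1
        hi = int(round(mu + i)) - 1
        if 0 <= lo < len(d) and 0 <= hi < len(d):
            kappa += d[lo] * d[hi]
    return kappa / 100.0, mu

# Toy example: an unoccluded circular pupil of radius 40 pixels.
yy, xx = np.mgrid[0:160, 0:160]
mask = (xx - 80) ** 2 + (yy - 80) ** 2 <= 40 ** 2
kappa, mu = pupil_occlusion_metric(mask, r_ht=40)
print(kappa, mu)   # the report compares kappa against an experimentally tuned threshold (about 600)
```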

5.2 Pupil feature tracking

We discussed in Section 4 the benefits of a filtering based pupil center tracking approach. In our project we apply Kalman filter based tracking in the image domain; although it would be more principled to apply SE(3) filtering in the head-fixed coordinate space with origin at the globe center, we leave that for future work. Here we briefly describe the Kalman filtering technique used in our approach. The motion of the pupil at any time instant can be characterized by its position and velocity. Let (xt, yt) represent the pupil pixel position at time t, and (ut, vt) its velocity at time t along the x and y directions. The state vector at time t can therefore be represented as xt = (xt, yt, ut, vt)ᵀ.


Fig. 7: Pupil occlusion detection algorithm.

The dynamic model of the system can be represented as:

xt+1 = φ·xt + wt    (7)

where wt represents the system perturbation. Assuming a fast feature extractor provides the measurement zt = (x̂t, ŷt) of the pupil position at time t, the observation model can be represented as:

zt = H·xt + vt    (8)

Fig. 8: Pupil horizontal width profile, showing the center of mass based pupil center (in red) and the updated pupil center based on our algorithm (in green).


where vt represents the measurement uncertainty. Specifically, the pupil position in the frame at time t is estimated by a pupil localization technique in the neighborhood of the position predicted by the dynamic model. Given the dynamic model and the observation model, as well as some initial conditions, the state vector xt+1, along with its covariance matrix Σt+1, can be updated using the system model (for prediction) and the observation model (for updating). Once we obtain the updated location of the pupil center, this feature can be projected to the head-fixed coordinates to obtain the horizontal and vertical eye movements, which is discussed in detail in the next section.
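The following Python sketch illustrates a constant-velocity Kalman filter of the form of Eqs. 7-8 for smoothing noisy pupil center measurements. It is a minimal illustration under assumed noise covariances (Q, R) and time step; our actual implementation is in MATLAB, and the values shown are not those used in the project.

```python
import numpy as np

dt = 1.0 / 100.0                      # frame period (100 Hz capture assumed)
# State x = (x, y, u, v): pixel position and velocity (constant-velocity model, Eq. 7).
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],           # Eq. 8: only the pixel position is measured
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 1e-2                  # assumed process noise covariance
R = np.eye(2) * 4.0                   # assumed measurement noise covariance (pixels^2)

def kalman_step(x, P, z):
    """One predict/update cycle given state x, covariance P, and measurement z."""
    # Predict with the dynamic model.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update with the pupil measurement from the feature extractor.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new

# Usage: filter a short noisy sequence of pupil centers.
x, P = np.array([100.0, 120.0, 0.0, 0.0]), np.eye(4) * 10.0
for z in [np.array([101.0, 119.5]), np.array([102.2, 119.0]), np.array([103.1, 118.2])]:
    x, P = kalman_step(x, P, z)
print(x[:2])   # smoothed pupil center
```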

5.3 Imaging geometry

In eye detection, it is essential to identify a model for the eye that is sufficiently expressive to account for the large variability in appearance and dynamics, while also being sufficiently constrained to be computationally efficient [Hansen and Ji 2010]. In our model, the eye is assumed to be a perfect sphere exhibiting ideal ball-and-socket behavior, as in [Moore et al. 1996]. Under this model, eye movements are pure rotations around the center of the globe without any translational component. Furthermore, the iris is considered to be a planar section at a distance rp from the center of the globe. The central projection of the eye onto the image plane is shown in Fig. 9.

Fig. 9: The central projection of the eye onto the image plane [Moore et al. 1996]

To define the imaging geometry we use the mathematical notation of [Moore et al. 1996]. Following this paper, matrices are represented by uppercase characters (e.g. R), points in 3D space by underscored uppercase letters (e.g. P), and unsigned scalar quantities by italicized lowercase letters. To derive the mathematical formula for the projection of the eye onto the image plane, we define an orthogonal, right-handed, head-fixed coordinate system {h1, h2, h3} with origin at the center of the eye, as shown in Fig. 9. The h2-h3 plane is parallel to the coronal plane, with the h2 axis parallel to the interaural axis of the subject. We also define a camera-fixed coordinate system {c1, c2, c3}, where the c2 and c3 axes lie within the image plane and c1 corresponds to the "line of sight" of the camera.


The plane of the camera lens is located on the c1 axis, at a distance f from the image plane and a distance d from the center of the eye. When the camera is focused on distant objects, f is equal to the focal length of the lens. Eq. 9 gives the relation between a point in the head-fixed frame, P = (p1, p2, p3), and the corresponding coordinates with respect to the camera-fixed system, P′ = (p1′, p2′, p3′):

P′ = ^h_cR · P + T    (9)

where ^h_cR and T are the rotation and the translation, respectively, of the head-fixed coordinates with respect to the camera-fixed coordinates. We also define a globe-fixed coordinate system {e1, e2, e3} that rotates along with the globe. The reference globe position is defined as the orientation of the globe in which the visual axis of the eye (i.e. the direction of e1) coincides with h1. If the rotation from the head-fixed coordinate system to the globe-fixed coordinate system is given by ^e_hR, we can express the rotation by the following equation:

ei = ^e_hR · hi ,  i = 1, 2, 3    (10)

The 3D rotation of the globe from the reference configuration (as defined above) to the current globe configuration can be decomposed into three consecutive rotations about three well defined axes. The sequence of rotations is important, since multiplication of rotation matrices in more than two dimensions is not commutative. Fick coordinates are what we intuitively use to describe the 3D rotation of the eye [Fick 1854]. Here the sequence of rotations, first a horizontal rotation (R3(θ) about e3), then a vertical rotation (R2(φ) about e2), and finally a torsional rotation (R1(ψ) about e1), was first used by Fick, and the angles θ, φ, and ψ for this sequence are referred to as Fick angles. This kind of rotation about globe-fixed axes is known as passive rotation, or rotation of the coordinate axes. To simplify the mathematics, the same rotation can also be expressed as an active rotation, i.e. a rotation of the object about the head-fixed coordinate axes, but then the sequence of rotations must be the reverse of that used in the passive rotation. Therefore the rotation matrix for the rotation of the head-fixed coordinates from the reference orientation to the current globe configuration is given by:

^e_hR(θ,φ,ψ) = R_Fick = R3(θ) · R2(φ) · R1(ψ)    (11)

R_Fick =
  | cos θ cos φ   cos θ sin φ sin ψ − sin θ cos ψ   cos θ sin φ cos ψ + sin θ sin ψ |
  | sin θ cos φ   sin θ sin φ sin ψ + cos θ cos ψ   sin θ sin φ cos ψ − cos θ sin ψ |    (12)
  | −sin φ        cos φ sin ψ                       cos φ cos ψ                     |

If we do not consider ocular torsion, only θ and φ are taken into account, and ^e_hR(θ,φ,ψ) simplifies (by setting ψ = 0 in Eq. 12) to Eq. 13:

^e_hR(θ,φ,ψ=0) = Rθφ =
  | cos θ cos φ   −sin θ   cos θ sin φ |
  | sin θ cos φ    cos θ   sin θ sin φ |    (13)
  | −sin φ         0       cos φ       |
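As a quick numerical check of Eqs. 11-13, the sketch below builds R_Fick from elementary rotations and verifies that it reduces to Rθφ when ψ = 0. This is an illustrative Python snippet under the stated rotation sequence, not part of the project's MATLAB implementation.

```python
import numpy as np

def R1(psi):   # torsional rotation about e1
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def R2(phi):   # vertical rotation about e2
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def R3(theta): # horizontal rotation about e3
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def r_fick(theta, phi, psi):
    """Eq. 11: Fick sequence, horizontal then vertical then torsional."""
    return R3(theta) @ R2(phi) @ R1(psi)

theta, phi = np.radians(15.0), np.radians(-10.0)
# With psi = 0 the result should match the matrix Rthetaphi of Eq. 13.
print(np.round(r_fick(theta, phi, 0.0), 4))
```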


Using Eq. 13, we can update Eq. 9 to form Eq. 14:

P′ = ^h_cR · Rθφ · P + T    (14)

The perspective projection of P′ onto the image plane, P″, is given by Eq. 15:

P″ = (0, x, y)ᵀ = f · ( 0, p2′/(p1′ + f), p3′/(p1′ + f) )ᵀ    (15)

Eq. 15 can be replaced by the simpler orthographic projection if the distance d between the lens plane and the center of the eye is much larger than the radius of the eye rp, or for small eye movements (θ and φ up to about ±10°). Then Eq. 15 reduces to Eq. 16:

P″ = (0, x, y)ᵀ = rp · (0, R21, R31)ᵀ + (0, x0, y0)ᵀ    (16)

where (x0, y0) is the projection of the center of the eye onto the image plane, and Rij is the element in the ith row and jth column of the rotation matrix R, which is given by Eq. 17:

R = ^h_cR · Rθφ    (17)

An inverse of this imaging geometry can also be formulated to determine the eye position in the head-fixed coordinates given the pupil center location in the image. The details of this model can be found in [Moore et al. 1996].
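To illustrate how Eqs. 16-17 map an eye orientation to an image position, the sketch below projects the pupil center for given Fick angles under the orthographic approximation. The calibration values (camera rotation R_hc, eye radius rp expressed in pixels, image center (x0, y0)) are placeholder assumptions, not the parameters estimated in Appendix A.

```python
import numpy as np

def project_pupil_orthographic(theta, phi, R_hc, r_p, x0, y0):
    """Eqs. 16-17: image coordinates of the pupil center for eye orientation (theta, phi)."""
    ct, st, cp, sp = np.cos(theta), np.sin(theta), np.cos(phi), np.sin(phi)
    R_thetaphi = np.array([[ct * cp, -st, ct * sp],     # Eq. 13 (no torsion)
                           [st * cp,  ct, st * sp],
                           [-sp,     0.0, cp]])
    R = R_hc @ R_thetaphi                               # Eq. 17
    x = r_p * R[1, 0] + x0                              # Eq. 16 uses the R21 element
    y = r_p * R[2, 0] + y0                              # and the R31 element
    return x, y

# Placeholder calibration: camera axes aligned with the head, eye radius given in pixels.
R_hc = np.eye(3)
print(project_pupil_orthographic(np.radians(10.0), np.radians(5.0), R_hc,
                                 r_p=120.0, x0=160.0, y0=120.0))
```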

5.4 Ocular torsion estimation

In Section 5.3, we discussed how to obtain the horizontal and vertical eye movements, θ and φ, using the eye model proposed by [Moore et al. 1996]. Using this information we can also obtain the location of the iris boundary in the head-fixed coordinate space; the radius of the iris is obtained by a manual calibration. Using the head-to-camera transformation ^c_hE computed during calibration, we can project the iris boundary from the head-fixed coordinates to the image coordinates for any biologically plausible globe orientation. Here we assume that the point of gaze is at a finite but relatively large distance, so that the optical axis and the visual axis can be assumed to be parallel to each other [Guestrin and Eizenman 2010]. The measurement of ocular torsion can be reduced to a one-dimensional signal processing task by forming an iral signature τ(µ) from the pixel intensity values of the iris along a circular sampling path (an arc or annulus) centered at the pupil center, where µ is the polar angle of a point on this sampling path. An iral signature is computed for each video frame by sampling along an arc (which can have a finite width) projected from the head-fixed coordinates to the image coordinates. The ocular torsion estimate is always relative to a reference eye position: we take an image of the eye looking straight ahead as the reference image, and the iral signature computed along the circular arc in that image is taken as the iral reference signature τo(µ).


Each iral signature is then cross-correlated with the iral reference signature τo(µ), as in Eq. 18:

xcorr(τ(µ), τo(µ)) = F⁻¹( F(τ(µ)) · F(τo(µ))* )    (18)

where F(τ(µ)) denotes the Fourier transform of τ(µ) and * denotes the complex conjugate. The angular displacement corresponding to the peak of the cross-correlation function gives the ocular torsion relative to the reference globe orientation:

ψ = arg max_{µ∈Θ} [ xcorr(τ(µ), τo(µ)) ]    (19)

where Θ is the range of µ. As in [Zhu et al. 2004], we also use an elastic pupil-iris model to dynamically update the radial location of the sampling arc. Using this model, the radius of the sampling arc rs in the globe coordinates is determined as:

rs = rp + ( (ri − rp) / (ri − rp′) ) · (rs′ − rp′)    (20)

where rp and ri are the radii of the pupil and the iris in the reference signature, respectively, rs′ is the radius of the sampling arc, and rp′ is the effective radius of the pupil at the current time instant.
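A minimal sketch of the torsion estimate of Eqs. 18-19: two iral signatures sampled on the same polar grid are circularly cross-correlated via the FFT, and the peak lag is converted to an angle. The signature extraction itself (sampling the iris along the projected arc) is omitted, and the signal below is synthetic.

```python
import numpy as np

def torsion_from_signatures(tau, tau_ref, degrees_per_sample):
    """Eqs. 18-19: circular cross-correlation peak between an iral signature and the reference."""
    tau = tau - tau.mean()            # remove the DC component before correlating
    tau_ref = tau_ref - tau_ref.mean()
    xcorr = np.fft.ifft(np.fft.fft(tau) * np.conj(np.fft.fft(tau_ref))).real   # Eq. 18
    lag = int(np.argmax(xcorr))                                                # Eq. 19
    if lag > len(tau) // 2:           # interpret large lags as negative shifts
        lag -= len(tau)
    return lag * degrees_per_sample

# Synthetic example: a signature rotated by 12 samples should give 12 * 0.5 = 6 degrees.
n = 720                               # samples over 360 degrees -> 0.5 degree resolution
base = np.sin(np.linspace(0.0, 8.0 * np.pi, n)) + 0.3 * np.cos(np.linspace(0.0, 22.0 * np.pi, n))
shifted = np.roll(base, 12)
print(torsion_from_signatures(shifted, base, degrees_per_sample=0.5))
```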

5.5 Eye blink detection

In our project, the variance projection function (VPF) is used, as in [Zhou and Geng 2004], to estimate the location of the eyelids. The VPF was originally used in [Feng and Yuen 1998] to guide the detection of eye position and shape for applications in human face recognition. In a preprocessing step, a 2D Gaussian filter is applied to the grayscale image of the eye region to suppress eyelashes. The input image is then processed with a directional Sobel edge detection filter to detect horizontal edges, under the assumption that the eyelids are almost horizontal. The VPF is estimated on the filtered image to detect the location of the eyelids.

Fig. 10: Iris detection and annulus selection for ocular torsion estimation.


Fig. 11: Ocular torsion and eye velocity in eye movements showing optokinetic nystagmus.

The VPF can be computed in both the horizontal and vertical directions. In our case, we use the horizontal VPF on the image Ie, which is an edge-filtered version of the original input image from the eye tracker. The VPF of Ie at the image row corresponding to the vertical coordinate y is given by:

VPF(y) = (1/(x2 − x1)) · Σ_{xi=x1}^{x2} [ Ie(xi, y) − Hm(y) ]²    (21)

where

Hm(y) = (1/(x2 − x1)) · Σ_{xi=x1}^{x2} Ie(xi, y)    (22)

We attempt to solve the following problems in relation to eye blink estimation: (1) how to decide the state of the eye, i.e. whether it is open or blinking, and (2) how to detect and locate the eyelids. In the plot of the horizontal VPF for a single eye, two distinct peaks can be observed that correspond to the upper and lower eyelids. The distance between the two peaks gives the eyelid distance profile (EDP), or blink profile [Trutoiu et al. 2011]; if the EDP falls below a particular threshold, we identify the occurrence of an eye blink. To localize the eyelids, a cubic spline interpolation is used to interpolate the available eyelid feature points, namely the inner and outer eye corners and an edge point. The positions of the inner and outer corners in the eye image are estimated manually in a reference image, assuming that they do not move significantly during eye movements. The edge point is obtained as follows: its vertical coordinate is computed from the location of the peaks of the VPF of Ie. If the eye is not blinking, the horizontal location of the pupil gives the horizontal coordinate of the edge point; if the eye is blinking, the eyelid appears almost flat, and the position at half the image width can safely be taken as the horizontal coordinate.
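The sketch below computes the horizontal VPF of Eqs. 21-22 over an edge-filtered eye image, takes the two strongest peaks as the eyelid rows, and flags a blink when the eyelid distance falls below a threshold. The peak-separation distance and threshold value are illustrative assumptions rather than the settings used in the project, and the Sobel/Gaussian preprocessing is assumed to have been applied already.

```python
import numpy as np

def horizontal_vpf(Ie):
    """Eqs. 21-22: per-row variance of the edge image Ie over its columns."""
    Hm = Ie.mean(axis=1)                              # Eq. 22: row means
    return ((Ie - Hm[:, None]) ** 2).mean(axis=1)     # Eq. 21: row variances

def eyelid_rows_and_blink(Ie, min_separation=10, edp_threshold=15):
    vpf = horizontal_vpf(Ie)
    top = int(np.argmax(vpf))                         # strongest eyelid response
    masked = vpf.copy()
    lo, hi = max(0, top - min_separation), min(len(vpf), top + min_separation)
    masked[lo:hi] = -np.inf                           # suppress rows near the first peak
    bottom = int(np.argmax(masked))                   # second eyelid response
    edp = abs(bottom - top)                           # eyelid distance (pixels)
    return sorted((top, bottom)), edp, edp < edp_threshold

# Toy edge image: rows 20 and 60 carry high-variance edge responses standing in for the eyelids.
Ie = np.zeros((100, 160))
Ie[20, ::2] = 255.0
Ie[60, ::2] = 255.0
print(eyelid_rows_and_blink(Ie))   # ([20, 60], 40, False) -> eye open
```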


Fig. 12: Blink profile and eyelid velocity profile at an instant of eye blink.

5.6 Gaze estimation in world coordinates

In Section 5.3, we described the method to obtain θ and φ, i.e. the horizontal and vertical eye movements, with respect to the head-fixed coordinates with origin at the globe center. Section 5.4 then provided a method to compute the ocular torsion ψ (as defined in Section 5.3). Using this information, we can estimate the gaze relative to the head. In this section, we discuss how to extend this method to estimate gaze in the world, by transforming the head-fixed coordinates to the world (or ground) coordinates. Fig. 13 shows all the coordinate systems used in our discussion; some of them have already been defined, and the rest are defined below.

Fig. 13: The coordinate systems used in our project.


Defining the head-fixed coordinate system. As in [Ronsse et al. 2007], head pose is defined in terms of a ground-based (i.e., motionless with respect to the laboratory) coordinate system [g1, g2, g3], as shown in Fig. 13. To compute the head pose, one must measure the positions of three non-collinear points on the head. Let us denote these points by Ta, Tb, and Tc; they define a plane parallel to the frontal plane h2-h3. Since the head is assumed to be a rigid body, the positions of these points completely determine the head pose.

Mathematical formulation. It is important to estimate the position of the center of the globe, which is computed using the head's anthropomorphic characteristics and the locations of the markers Ta, Tb, and Tc. The globe center is the origin of the eye-in-head coordinate system [h1, h2, h3]. The head orientation is defined by the orientation of the vector h1 with respect to the world coordinate system [g1, g2, g3], computed as

h1 = (Tc − Tb) × [(Tc − Ta) × (Tb − Ta)] / |(Tc − Tb) × [(Tc − Ta) × (Tb − Ta)]|    (23)

Now, h2 and h3 are defined as follows:

h2 = (Tc − Tb) / |Tc − Tb| ,   h3 = h1 × h2    (24)

Therefore, [h1, h2, h3] and the location of the center of the globe give us the transformation ^h_gE between the world coordinates and the head-fixed coordinates centered at the globe center. If Ph is the position vector of the point of regard in the head-fixed coordinates centered at the globe center, then its position in the world coordinates, Pg, can be computed as

Fig. 14: Head-fixed coordinate system with origin at the globe center. Ta, Tb, and Tc are markers on the helmet of the eye tracker. M1, M2, and M3 are markers used to compensate for helmet slippage.


Fig. 15: Helmet slippage in an eye tracking experiment with extreme head movement.

Pg = ^h_gE · Ph    (25)
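The following Python sketch builds the head-fixed frame of Eqs. 23-24 from three motion-capture markers and applies Eq. 25 to map a point of regard from head-fixed to world coordinates. The marker positions and the globe-center location are made-up values for illustration; in practice the globe center comes from the head's anthropomorphic calibration described above.

```python
import numpy as np

def head_frame(Ta, Tb, Tc):
    """Eqs. 23-24: orthonormal head axes h1, h2, h3 from three non-collinear markers."""
    n = np.cross(Tc - Tb, np.cross(Tc - Ta, Tb - Ta))
    h1 = n / np.linalg.norm(n)                 # Eq. 23
    h2 = (Tc - Tb) / np.linalg.norm(Tc - Tb)   # Eq. 24
    h3 = np.cross(h1, h2)
    return h1, h2, h3

def head_to_world(Ph, Ta, Tb, Tc, globe_center_world):
    """Eq. 25: map a point from head-fixed (globe-centered) to world coordinates."""
    h1, h2, h3 = head_frame(Ta, Tb, Tc)
    R = np.column_stack((h1, h2, h3))          # columns are the head axes in world coordinates
    return R @ Ph + globe_center_world

# Illustrative markers (mm) and a hypothetical globe center in world coordinates.
Ta = np.array([0.0, 0.0, 0.0])
Tb = np.array([80.0, -60.0, 0.0])
Tc = np.array([80.0, 60.0, 0.0])
globe_center = np.array([40.0, 0.0, -90.0])
print(head_to_world(np.array([500.0, 20.0, -10.0]), Ta, Tb, Tc, globe_center))
```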

Gaze using binocular information. Using the method just explained, we can estimate the gaze in 3D for a single eye. Since our eye tracker captures video of both eyes simultaneously, by combining the gaze from both eyes we can find the point of regard as the intersection of the two visual axes, as illustrated below. This also allows us to analyze vergence, in addition to other eye movements such as saccades or smooth pursuit.
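Because two measured visual axes rarely intersect exactly, a common choice is to take the midpoint of their closest approach as the binocular point of regard. The report does not specify which intersection rule is used, so the sketch below should be read as one reasonable implementation of the "meeting point" of the two visual axes, with made-up eye centers and gaze directions.

```python
import numpy as np

def binocular_point_of_regard(o1, d1, o2, d2):
    """Midpoint of closest approach between rays o1 + t*d1 and o2 + s*d2 (world coordinates)."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    w0 = o1 - o2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b                    # zero only if the axes are parallel
    if abs(denom) < 1e-9:
        return 0.5 * (o1 + o2)               # degenerate case: fall back to the midpoint of the eyes
    t = (b * e - c * d) / denom
    s = (a * e - b * d) / denom
    p1 = o1 + t * d1                         # closest point on the left visual axis
    p2 = o2 + s * d2                         # closest point on the right visual axis
    return 0.5 * (p1 + p2)

# Hypothetical globe centers 64 mm apart, both looking at a target near (0, 500, 0).
left = np.array([-32.0, 0.0, 0.0])
right = np.array([32.0, 0.0, 0.0])
target = np.array([0.0, 500.0, 0.0])
print(binocular_point_of_regard(left, target - left, right, target - right))
```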

5.7 Helmet slippage compensation

The accuracy of videooculography-based techniques that use head-mounted eye trackers to determine gaze from video depends on how firmly the helmet of the eye tracker is attached to the head, since the helmet may slip on the head. To determine this slippage, we put additional markers at suitable positions on the head; characteristic features in the eye image can also be used as intrinsic markers. We attach a few motion capture markers securely to bony landmarks on the head; in Fig. 14 these markers are shown as M1, M2, and M3. Based on that information, we recalibrate the rotation ^h_cR and translation ^h_cT of the head-fixed coordinate system [h1, h2, h3] (with origin at the center of the globe) with respect to the camera-fixed coordinate system [c1, c2, c3]. In our experimental setup, we put markers (1) on bony landmarks on the head to get the transformation ^h_gE, and (2) on the eye tracker to obtain ^et_gE using a standard motion capture system, where [g1, g2, g3] and [et1, et2, et3] represent the world and eye tracker coordinate systems, respectively. From the calibration at time t = 0, we can estimate ^h_cE, which along with the previous transformations gives ^c_etE at time t = 0. Considering the eye tracker to be a rigid body, ^c_etE in fact remains constant for all time. Now, given ^g_hE, ^g_etE, and ^c_etE at any time t, we can compute ^c_hE. The rotation associated with the eye tracker helmet slippage can be obtained as follows:


Fig. 16: Distribution of point of regard (in green) while the subject fixates at marker shown in red.

^c_hR = ^c_etR · (^g_etR)ᵀ · ^g_hR    (26)
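A small sketch of the chain in Eq. 26: given the marker-derived rotations of the head and the eye tracker in the world frame and the fixed eye-tracker-to-camera rotation from calibration, the head-to-camera rotation is recomputed at each frame, so helmet slippage shows up as a change in this matrix over time. The rotations used below are arbitrary examples, not measured data.

```python
import numpy as np

def rot_z(angle_deg):
    a = np.radians(angle_deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def head_to_camera(R_et_c, R_et_g, R_h_g):
    """Eq. 26: compose eye-tracker-to-camera, world-to-eye-tracker, and head-to-world rotations."""
    return R_et_c @ R_et_g.T @ R_h_g

# Calibration at t = 0: camera rotated 5 degrees about the helmet; head and eye tracker aligned.
R_et_c = rot_z(5.0)               # constant (rigid eye tracker)
R_h_g_0, R_et_g_0 = rot_z(30.0), rot_z(30.0)

# Later frame: the helmet has slipped by 2 degrees relative to the head.
R_h_g_t, R_et_g_t = rot_z(45.0), rot_z(47.0)

R0 = head_to_camera(R_et_c, R_et_g_0, R_h_g_0)
Rt = head_to_camera(R_et_c, R_et_g_t, R_h_g_t)
slip = Rt @ R0.T                  # relative rotation attributable to slippage
print(np.degrees(np.arctan2(slip[1, 0], slip[0, 0])))   # about -2 degrees in this example
```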

6. EXPERIMENTS

We performed several experiments to validate our proposed 3D gaze estimation technique. Our experimental setup consists of an 8-camera Vicon MX motion capture system (Vicon, Los Angeles) to track IR markers in the world coordinates, and a head-mounted C-ETD eye tracker (Chronos Vision, Berlin) to capture head-unrestrained eye movements. Both systems record at 100 Hz, which appears sufficient for our implementation. We implemented the algorithm in MATLAB (version 7.10.0, R2010a, MathWorks Inc.), and the software ran on a 2.67 GHz Intel Core i5 computer. To validate the technique, I, acting as the subject, wore the C-ETD eye tracker and placed IR markers on the eye tracker and at different locations on my head, as described in Section 5.6. During calibration, as described in Appendix A, the subject visually followed the markers on the calibration bar while keeping the head fixed. By collecting video using the eye tracker and the world coordinates of the IR markers using the motion capture system, we perform the calibration to estimate the parameters listed in Appendix A. In the regular experiment sessions, the subject fixated at different markers placed randomly in the motion capture environment. Fig. 16 shows the distribution of the points of regard estimated in one of our experiments while the subject fixates at a stationary marker. The standard deviation of the error distribution of the points of regard along three orthogonal directions in the world space is shown in Fig. 17 for the subject fixating at three different marker locations.


Fig. 17: Standard deviation of the error in the point of regard for binocular gaze estimation (in mm) along the X, Y, and Z directions of the world coordinates.

The main sources of noise in our results, as we identified them, are (1) an inaccurate noise model in the image space, and (2) approximations in the geometric eye model. As shown in Fig. 15, we also estimated the helmet slippage in a few of our eye tracking experiments; the slippage is significant when the head moves to extreme orientations or moves fast.

7. CONCLUSION

The results of our experiments show that our videooculography-based technique can be successfully used for 3D gaze estimation. The major contribution of our proposed framework is combining eye blink detection with videooculography-based eye tracking. We also proposed a novel pupil occlusion detection method based on the pupil horizontal width profile, and update the pupil center accordingly (see Fig. 8). There are still many possibilities to explore. One is tracking eye features in the head-fixed coordinates instead of the image plane, to incorporate the dynamics of the eye for more accurate tracking; similarly, a particle filter based approach would be more practical than the Gaussian-noise Kalman filter tracking used here. Our 3D gaze estimation framework can easily be extended to non-IR video cameras and stereo camera systems. The main benefits of using the C-ETD in our experiments are (1) that it is a wearable device, which keeps the camera-to-head transformation easy to compute, and (2) that it captures detailed close-up eye images. In the future we could use a high-resolution, head-mounted video capture device to also record colored eye features such as the iris texture and skin textures; a more detailed iris texture can significantly improve the estimation of ocular torsion. Another issue that we would like to investigate further with our experimental setup is the pupil dynamics under different environmental lighting conditions and with the subject looking at different scenes on a screen.


Furthermore, we would like to extend our eyelid tracking algorithm to track the eyelid shape and its texture features.

REFERENCES

Atienza, R. and Zelinsky, A. 2002. Active gaze tracking for human-robot interaction. Proceedings of the 4th IEEE International Conference on Multimodal Interfaces, 261–.
Ballard, D. H. 1981. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognition 13, 2 (Jan.), 111–122.
Chronos. 2004. Chronos Eye Tracking Device Manual, Chronos Vision, Berlin.
Daugman, J. 2004. How iris recognition works. Circuits and Systems for Video Technology, IEEE Transactions on 14, 1, 21–30.
Duchowski, A. T. 2002. A breadth-first survey of eye-tracking applications. Behav Res Methods Instrum Comput 34, 4 (Nov.), 455–470.
Feng, G. and Yuen, P. 1998. Variance projection function and its application to eye detection for human face recognition. Pattern Recognition Letters 19, 9, 899–906.
Fick, A. 1854. Die bewegung des menschlichen augapfels. Z. Rationelle Med. 4, 101–128.
Grauman, K., Betke, M., Gips, J., and Bradski, G. 2001. Communication via eye blinks - detection and duration analysis in real time. Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on 1, I–1010.
Guestrin, E. and Eizenman, M. 2010. Listing's and Donders' laws and the estimation of the point-of-gaze. Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications, 199–202.
Habib, A. and Kelley, D. 2001. Automatic relative orientation of large scale imagery over urban areas using modified iterated Hough transform. ISPRS Journal of Photogrammetry and Remote Sensing 56, 1, 29–41.
Hallinan, P. 1991. Recognizing human eyes. Proceedings of SPIE 1570, 214.
Hansen, D. and Pece, A. 2005. Eye tracking in the wild. Computer Vision and Image Understanding 98, 1, 155–181.
Hansen, D. W. and Ji, Q. 2010. In the eye of the beholder: A survey of models for eyes and gaze. IEEE Transactions on Pattern Analysis and Machine Intelligence 32, 478–500.
Huang, J. and Wechsler, H. 1998. Visual search of dynamic scenes: Event types and the role of experience in viewing driving situations. G. Underwood (Ed.), Eye Guidance in Reading and Scene Perception, 369–394.
Jacob, R. and Karn, K. 2003. Eye tracking in human-computer interaction and usability research: Ready to deliver the promises. Mind 2, 3, 4.
Ji, Q. and Yang, X. 2001. Real time visual cues extraction for monitoring driver vigilance. Computer Vision Systems, 107–124.
Kim, K. and Ramakrishna, R. 1999. Vision-based eye-gaze tracking for human computer interface. Systems, Man, and Cybernetics, 1999. IEEE SMC'99 Conference Proceedings. 1999 IEEE International Conference on 2, 324–329.
Li, D., Winfield, D., and Parkhurst, D. 2005. Starburst: A hybrid algorithm for video-based eye tracking combining feature-based and model-based approaches. Computer Vision and Pattern Recognition - Workshops, 2005. CVPR Workshops. IEEE Computer Society Conference on, 79–79.
Masek, L. et al. 2003. Recognition of human iris patterns for biometric identification. M. Thesis, The University of Western Australia.
Min, T.-H. and Park, R.-H. 2009. Eyelid and eyelash detection method in the normalized iris image using the parabolic Hough model and Otsu's thresholding method. Pattern Recogn. Lett. 30, 1138–1143.


Moore, S., Haslwanter, T., Curthoys, I., and Smith, S. 1996. A geometric basis for measurement of three-dimensional eye position using image processing. Vision Research 36, 3, 445–459.
Morimoto, C., Koons, D., Amir, A., and Flickner, M. 2000. Pupil detection and tracking using multiple light sources. Image and Vision Computing 18, 4, 331–335.
Ong, J. and Haslwanter, T. 2010. Measuring torsional eye movements by tracking stable iris features. Journal of Neuroscience Methods 192, 2, 261–267.
Pamplona, V., Oliveira, M., and Baranoski, G. 2009. Photorealistic models for pupil light reflex and iridal pattern deformation. ACM Transactions on Graphics (TOG) 28, 4, 106.
Ronsse, R., White, O., and Lefevre, P. 2007. Computation of gaze orientation under unrestrained head movements. Journal of Neuroscience Methods 159, 1, 158–169.
Sirohey, S., Rosenfeld, A., and Duric, Z. 2002. A method of detecting and tracking irises and eyelids in video. Pattern Recognition 35, 6 (June), 1389–1401.
Tian, Y., Kanade, T., and Cohn, J. 2000. Dual-state parametric eye tracking. Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on, 110–115.
Trutoiu, L., Carter, E., Matthews, I., and Hodgins, J. 2011. Modeling and animating eye blinks. ACM Transactions on Applied Perception (TAP) 8, 3, 17.
Tsukada, A., Shino, M., Devyver, M., and Kanade, T. 2011. Illumination-free gaze estimation method for first-person vision wearable device. Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, 2084–2091.
Welch, G. and Bishop, G. 1995. An introduction to the Kalman filter. University of North Carolina at Chapel Hill, Chapel Hill, NC 7, 1.
Xie, X., Sudhakar, R., and Zhuang, H. 1995. Real-time eye feature tracking from a video image sequence using Kalman filter. Systems, Man and Cybernetics, IEEE Transactions on 25, 12, 1568–1577.
Young, D., Tunley, H., and Samuels, R. 1995. Specialised Hough transform and active contour methods for real-time eye tracking. University of Sussex, Cognitive & Computing Science.
Zhou, Z. and Geng, X. 2004. Projection functions for eye detection. Pattern Recognition 37, 5, 1049–1056.
Zhu, D., Moore, S., and Raphan, T. 2004. Robust and real-time torsional eye position calculation using a template-matching technique. Computer Methods and Programs in Biomedicine 74, 3, 201–209.
Zhu, Z. and Ji, Q. 2007. Novel eye gaze tracking techniques under natural head movement. Biomedical Engineering, IEEE Transactions on 54, 12, 2246–2260.
Zhu, Z., Ji, Q., Fujimura, K., and Lee, K. 2002. Combining Kalman filtering and mean shift for real time eye tracking under active IR illumination. Pattern Recognition, 2002. Proceedings. 16th International Conference on 4, 318–321.

Appendices

A. CALIBRATION PARAMETERS ESTIMATION

The eye model used in this project has several parameters that need to be computed using a calibration procedure, as in [Moore et al. 1996]. The calibration procedure described in [Moore et al. 1996] determines six parameters: the Fick angles θoff, φoff, and ψoff corresponding to the rotation matrix Roff (i.e. ^h_cR, the camera-to-head transformation), the radius of the eye at the pupil center rp, and the projection of the center of the eye onto the image plane, T″ = (0, x0, y0). To determine these parameters, five calibration points are used: at purely vertical locations ±φcal, at purely horizontal locations ±θcal, and at a location with the eye looking straight ahead in the reference position. For our experiments we put the markers on a calibration bar placed at an appropriate distance from the eyes of the subject. Let us assume the corresponding image coordinates are (x+φcal, y+φcal), (x−φcal, y−φcal), (x+θcal, y+θcal), (x−θcal, y−θcal), and (xr, yr), and we take θcal = φcal. The equations used to estimate the parameters are:

ψoff = −atan( (y+θcal − y−θcal) / (y+φcal − y−φcal) )    (27)

rp² = ( (y+φcal − y−φcal) / (2 · cos(ψoff) · sin(φcal)) )² + ( (y+φcal + y−φcal − 2·yr) / (2 · (1 − cos(φcal))) )²    (28)

φoff = asin( (y+φcal + y−φcal − 2·yr) / (2 · rp · (1 − cos(φcal))) )    (29)

θoff = asin( (x+φcal + x−φcal − 2·xr) / (2 · rp · cos(φoff) · (cos(φcal) − 1)) )    (30)

T″ = (0, x0, y0) = (0, xr, yr) − rp · (0, cos(φoff) · sin(θoff), −sin(φoff))    (31)