Proceedings of Eye Tracking South Africa, Cape Town, 29-31 August 2013

The Effect of Mapping Function on the Accuracy of a Video-Based Eye Tracker

Pieter Blignaut, Daniël Wium
Department of Computer Science and Informatics, University of the Free State, South Africa
[email protected]

ABSTRACT
In a video-based eye tracker the pupil-glint vector changes as the eyes move. Using an appropriate model, the pupil-glint vector can be mapped to coordinates of the point of regard (PoR). Using a simple hardware configuration with one camera and one infrared source, the accuracies that can be achieved with various mapping models are compared with one another. No single model proved to be the best for all participants. It was also found that the arrangement and number of calibration targets has a significant effect on the accuracy that can be achieved with the said hardware configuration. A mapping model is proposed that provides reasonably good results for all participants provided that a calibration set with at least 8 targets is used. It was shown that although a large number of calibration targets (18) provides slightly better accuracy than a smaller number of targets (8), the improvement might not be worth the extra effort during a calibration session.

Categories and Subject Descriptors
H.1.2 [Information Systems]: User/Machine systems

General Terms
Performance, Reliability, Standardization

Keywords
Eye tracking, calibration, accuracy, mapping functions

Copyright © 2013 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail [email protected]. ETSA '13, Cape Town, South Africa, August 29–31, 2013. Copyright 2013 ACM 978-1-4503-2110-5/00/0010…$15.00

1. INTRODUCTION
A person's gaze behaviour can reveal much about his cognitive processes and emotional state. Additionally, valuable information related to the stimuli or environment being viewed can be inferred by studying the gaze patterns of observers. With the improvement in accuracy, precision and robustness of gaze tracking systems in recent years, the applications of such systems have become numerous and are increasing rapidly.

A wide variety of gaze tracking systems are available commercially, each tailored for a specific set of applications. Video-based eye tracking is the most widely practised eye-tracking technique [12]. This technique is based on the principle that when near infrared (NIR) light is shone onto the eyes, it is reflected off the different structures in the eye to create four Purkinje reflections [6]. The vector difference between the pupil centre and the first Purkinje image (PI) (also known as the glint or corneal reflection (CR)) is tracked. The advantage of video-based systems is that they are largely unobtrusive and easy to operate – unlike systems that use other equipment, for instance scleral search coils.

The point of regard in a video-based system can either be determined by a model-based approach in which the geometry of the system and the eyes are used, or by using regression to map the pupil-glint vector to screen coordinates [9]. This study focuses on the latter approach and compares the accuracy of various mapping models for a simple one camera, one glint configuration.

Tracking a person's gaze with a video-based system involves a number of steps. These steps can be loosely grouped into two sets, namely those involved with the detection of the eyes and eye features on the video frames, and those which use the detected features to track a person's gaze. A simple video-based eye tracking system was developed with one camera and one infrared light source. For purposes of this paper, it is assumed that the location of features in the eye video is known and the focus is on the challenge of using these as input to determine a person's point of regard (PoR).

2. GAZE ESTIMATION
A person can change his visual focus, or point of regard, in two ways. He can move his head or change the direction of his eyes while keeping his head still. Naturally, a combination of these two movements is also possible. Therefore, in order to estimate a person's PoR, both the position of the head and the orientation of the eyes within the head must be taken into account.

Although other methods have been used successfully for gaze estimation, the most popular ones are feature-based. These methods use features in the eye video to gauge where a person is looking. Most video-based eye trackers include one or more infrared lights to illuminate the eyes and cause a reflection on the corneal surface. This reflection, called the first Purkinje image or glint, and the centre of the pupil are the most commonly used features.

2.1 Model-Based Gaze Estimation
The model-based gaze estimation approach determines the 3D line of sight (LoS) and calculates the PoR as the point where the LoS intersects some object in the field of view, usually a computer monitor. It follows that for this approach to be implemented, the positions and orientations of the cameras, infrared lights and monitor (or other object(s) that the user might view) have to be known to a high accuracy. Such setups are said to be fully calibrated [9].

Guestrin and Eizenman [8] describe model-based approaches for a number of fully calibrated setups which differ in the number of cameras and infrared lights used. They show that a system with a single light and single camera is sufficient to determine the LoS, but not without the position of the head being known. This can be achieved by keeping the head fixed, or by estimating the distance between the eyes and the camera in some way.

The addition of extra lights relaxes this constraint, since the positions of the eyes (the centres of the corneas) can be calculated geometrically when additional glints are available [8], [21]. The way in which the camera(s) are set up plays a significant role in the amount of head movement that the system can accommodate, but unfortunately there is a trade-off with the accuracy that can be achieved. A larger field of view allows for greater head movement, but high resolution is required to provide accurate gaze estimates. One of the drawbacks of using more lights is that it becomes harder to keep the setup fully calibrated. Also, the additional glints make it more likely that one or more glints are lost during tracking, which limits the volume within which the head can move without any glints disappearing [9].

Multiple-camera, multiple-light systems can address the head-movement/accuracy trade-off by using a camera with a large field of view to locate the eyes and another high-resolution camera to capture the eye images from which the gaze is calculated. A number of stereo camera setups have also been employed, each presenting its own advantage. Shih et al. [21] showed, for example, how a stereo setup can eliminate the need for calibration altogether. The main downside to multiple-camera, multiple-light systems is their complexity. In systems with a combination of wide-field and high-resolution cameras, some mechanism is required to move the high-resolution cameras when the eyes exit their fields of view.

2.2 Regression-Based Systems

In a video-based eye tracker the pupil-glint vector changes as the eyes move. Using an appropriate model, the difference in x-coordinates (x') and the difference in y-coordinates (y') between the pupil and glint can be mapped to screen coordinates (X, Y). Corrections for head movement can be done by normalising the pupil-glint vector in terms of the distance between the glints (if there is more than one IR source) or between the pupils (inter-pupil distance (IPD)), e.g. x = x' / IPD.

Regression-based systems use polynomial expressions or neural networks to determine the point of regard as a function of the features in the eye image. Polynomial models should include two independent variables (the x and y components of the pupil-glint vectors) which may or may not interact with each other (thus the possible need for xⁿyⁿ terms) for each one of the dependent variables (X and Y of the point of regard) separately. A simple linear polynomial model with four terms could look like this:

Xleft = a0 + a1x + a2y + a3xy
Yleft = b0 + b1x + b2y + b3xy

where x and y refer to the normalised x and y components of the pupil-glint vector at a specific point in time, and Xleft and Yleft refer to the coordinates of the PoR for the left eye on the two-dimensional plane of the screen. For simplicity and readability, models will be written as pairs of strings without coefficients. The above model will, for example, be written as (x, y, xy; x, y, xy).

The coefficients ai and bi are determined through a calibration process which requires the user to focus on a number of dots (also referred to as calibration targets) at known angular positions while samples of the measured quantity are stored [1], [17], [22]. A least squares regression is then done to determine the polynomials such that the differences between the reported PoRs and the actual PoRs (the positions of the dots on the monitor) are minimised. The regressions are done separately for the left and right eyes, and an interpolated PoR is calculated as the average of the (Xleft, Yleft) and (Xright, Yright) coordinates.
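As a concrete illustration of the calibration regression just described, the sketch below fits the (x, y, xy; x, y, xy) model with ordinary least squares. It is our own minimal rendition, not the authors' implementation; NumPy and the array shapes are assumptions.

```python
import numpy as np

def fit_mapping(x, y, X, Y):
    """Fit X = a0 + a1*x + a2*y + a3*x*y and Y = b0 + b1*x + b2*y + b3*x*y
    from normalised pupil-glint components (x, y) recorded at calibration
    targets with known screen coordinates (X, Y)."""
    A = np.column_stack([np.ones_like(x), x, y, x * y])  # design matrix
    a, *_ = np.linalg.lstsq(A, X, rcond=None)            # coefficients a0..a3
    b, *_ = np.linalg.lstsq(A, Y, rcond=None)            # coefficients b0..b3
    return a, b

def map_por(a, b, x, y):
    """Map a single normalised pupil-glint vector to a point of regard."""
    f = np.array([1.0, x, y, x * y])
    return f @ a, f @ b
```

In this scheme the same regression is run once for each eye, after which the two reported PoRs are averaged as described above.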

A set of n points can be fitted with a polynomial of n terms, in which case R² (an indication of goodness of fit) will be 1. Regression with fewer terms or through more points will result in a polynomial that does not necessarily pass through any of the points, and R² will be lower. This does not necessarily mean, however, that the approximations will be worse. In the simplified example of Figure 1 with a single independent variable, a 6th order polynomial (7 terms: a0 + a1x + a2x² + a3x³ + a4x⁴ + a5x⁵ + a6x⁶) fits the seven points exactly, but interpolation at x = 17 will obviously be bad. A 5th order approximation of the data has a lower R², but interpolations between the points will be much better. Because of end effects, even this polynomial will provide bad approximations at x values larger than 18.

Figure 1. 5th and 6th order regressions of seven data points
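The end effects discussed above are easy to reproduce. The following sketch uses made-up sample values (not the data of Figure 1): the 6th order polynomial passes through all seven points exactly, yet its value near the edge of the range swings far more than that of the 5th order fit.

```python
import numpy as np

xs = np.array([0.0, 3.0, 6.0, 9.0, 12.0, 15.0, 18.0])
ys = np.array([0.2, 0.8, 0.4, 0.9, 0.3, 0.7, 0.5])  # arbitrary demo values

p6 = np.polyfit(xs, ys, 6)  # 7 terms: interpolates all 7 points, R^2 = 1
p5 = np.polyfit(xs, ys, 5)  # 6 terms: least-squares approximation, R^2 < 1

for label, p in [("6th", p6), ("5th", p5)]:
    # near the edge of the range the exact-fit polynomial typically
    # oscillates far more than the lower-order approximation
    print(label, np.polyval(p, 17.0))
```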

The decision of how the regression model should look is therefore not obvious. Draper and Smith [7] indicate that, apart from the brute-force method (i.e. testing all possible equations), there is no systematic procedure that can provide the most suitable mapping equation.

Besides the polynomials to use for interpolation, a mapping model also entails the number and arrangement of calibration targets. To limit the end effects of polynomial interpolation, it is important that there are targets on the edges of the display area. More calibration targets allow polynomials with more terms, which might result in better accuracy, but take more time and might be strenuous on the participant's eyes. The ideal situation would be to find a set of polynomials that is very accurate while using only a small number of calibration targets.

Cerrolaza and Villanueva [4] generated a large number of mapping functions based on the way in which the image features are used as input to the regression expression as well as the degree and number of terms of the polynomial. They compared the various polynomials with regard to the accuracy that can be achieved while also varying the number of infrared light sources and the way in which the pupil centre is detected. They further distinguished between eight different ways of generating mapping variables from the image features. Each of these relates the pupil centre and glints in different ways, yielding different vectors which then serve as input to the regression algorithm. The PoR was modelled for the x- and y-directions separately.

Cerrolaza and Villanueva [4] found that for setups with one infrared light source, the basic pupil-glint vector is the best mapping feature. For setups with two infrared light sources, pupil-glint vectors that are normalised with respect to the distance between the glints proved to be reasonably accurate at different gaze distances. Apart from some of the simplest models, increasing the number of terms or the order of the polynomial had almost no effect on accuracy. No single best model was found, and a preferred model was chosen as one that showed good accuracy across all configurations in addition to having a small number of terms and being of low order, thus making it easier to implement. For a system with two IR sources, they proposed the following model:

Xleft = a0 + a1x + a2x² + a3y
Yleft = b0 + b1x² + b2y + b3xy + b4x²y

An additional finding of Cerrolaza and Villanueva [4] was that some points from a grid of calibration points can be removed without affecting the errors obtained from calibration (see Figure 5 below).


2.3 Comparison of Model-Based and Regression-Based Gaze Estimation
The complexity inherent in fully calibrating a system makes model-based gaze estimation methods more suited to commercial products where the different components are fixed in place inside a rigid structure. Ramanauskas [19] showed that the two techniques can yield similar accuracy for ideal head positions. When two or more infrared lights are used, model-based methods do seem to yield good accuracy for a wider range of head positions.

For one camera, one light source setups like the one used for this study, there seems to be little or no difference in the results that can be obtained with the two models at present. The complexity associated with fully calibrating the system and implementing the gaze estimation model seems to make regression-based estimation the better option for simple, non-commercial systems like these.

2.4 Accuracy

Accuracy is defined as the "closeness of agreement between a test result and the accepted reference value" [14]. With regard to eye tracking, accuracy refers to the difference between the measured gaze direction and the actual gaze direction for a person positioned at the centre of the head box [2], [22], [10]. It is measured as the distance, in degrees, between the position of a known target point and the average position of a set of raw data samples, collected from a participant looking at the point [11], [10]. This error may be averaged over a set of target points that are distributed across the display.
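Computed directly from this definition, the measure might be sketched as follows, assuming the target and the raw samples are given in screen millimetres and the viewing distance is known (a sketch, not a particular vendor's formula):

```python
import numpy as np

def accuracy_deg(target_mm, samples_mm, view_dist_mm=800.0):
    """Angular distance (degrees) between a known target position and the
    average of the raw gaze samples recorded while looking at it."""
    offset_mm = np.linalg.norm(np.mean(samples_mm, axis=0) - np.asarray(target_mm))
    return np.degrees(np.arctan(offset_mm / view_dist_mm))
```

The per-target values can then be averaged over a set of targets distributed across the display, as described above.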


Lack of accuracy, also known as systematic error, may not be a problem when the areas of interest are large (e.g. 8° of visual angle) and are separated by large distances [24], but in studies where the stimuli are closely spaced, as in reading [20], or irregularly distributed, as on a web page, an uncertainty of as little as 0.5° – 1° can be critical for the correct analysis of eye-tracking data. Accuracy is also of great importance for gaze-input systems [1].

Systematic error can occur for two reasons, namely (a) inaccuracy in the eye tracker and (b) inaccuracy in the eye-movement system. Machine-related error may result from bad calibrations, head movements, astigmatism, eye-lid closure or other sources that are strongly dependent on the particular characteristics of the individual participant [11]. The interest in this paper is in minimising the error that occurs due to the calibration process, specifically for a simple one light, one camera configuration.

While an accuracy of 0.3° has been reported for tower-mounted high-end systems operated by skilled operators [15], [10], remote systems are usually less accurate. Komogortsev and Khan [18], for example, used an eye tracker with an accuracy specification of 0.5° but found that, after removing all invalid recordings, the mean accuracy over participants was 1°. They regarded systematic errors of less than 1.7° as acceptable. Johnson et al. [16], using an alternative calibration procedure for another commercially available eye tracker with a stated accuracy of 0.5°, found an azimuth error of 0.93° and an elevation error of 1.65°. Zhang and Hornof [24] also found an accuracy of 1.1°, which is well above the 0.5° stated by the manufacturer. Using a simulator eyeball, Imai et al. [13] found the x-error and y-error of a video-based eye tracker to be 0.52° and 0.62° respectively. Van der Geest and Frens [23] reported an x-error of 0.63° and a y-error of 0.72° for the eye tracker they used. Chen et al. [5] proposed a "robust" video-based eye-tracking system that has an accuracy of 0.77° in the horizontal direction and 0.95° in the vertical direction, which they deem acceptable for many HCI applications that allow natural head movements. Brolly and Mulligan [3] also proposed a video-based eye tracker with accuracy in the order of 0.8°. Hansen and Ji [9] provided an overview of remote eye trackers and reported the accuracy of most model-based gaze estimation systems to be in the order of 1° – 2°.

3. METHODOLOGY
3.1 Experimental Configuration
An eye tracker was developed with a single CMOS camera with a USB 2.0 interface and a 10 mm lens, together with a single infrared light source. The UI-1550LE-C-DL camera from IDS Imaging (http://en.ids-imaging.com/) has a 1600 × 1200 sensor with a pixel size of 2.8 µm (0.0028 mm). The images were rendered on a screen with a pixel size of 0.365 mm. The calibration was done on a 1200×720 pixel (438 mm × 263 mm) section of the screen at a distance of 800 mm from the participants and a vertical gaze angle of 18.2°. The horizontal gaze angle was 15.3°. The camera was positioned 280 mm in front of the screen with an eye-camera distance of 600 mm. See Figure 2 for details.

Figure 2. Hardware configuration

3.2 Data Capturing
The calibration area was divided into a 15×9 grid to have the same width:height ratio as that of the display area (1200×720 pixels) (Figure 3). Nine participants were presented with a series of 135 targets that covered the entire grid. The pupil-glint vector at each target was saved to a database in real time and the saved data were used afterwards to simulate the calibration and validation procedure for various combinations of calibration target arrangements and mapping models.

Figure 3. Targets for calibration and validation were presented to participants in a 15×9 grid. The 104 validation targets are marked with Ÿ.

Despite the technique of normalising the pupil-glint vectors in terms of IPD, accuracy can still be affected by gaze distance. Therefore, in order to compare the accuracy of various polynomials with one another, we had to control the gaze distance. Two fixed concentric circles were displayed around each target, representing camera distances of 597 mm and 603 mm respectively. A third concentric circle was displayed with a radius that varied as the participant moved his head backwards or forwards (Figure 4). The participant had to move his head until the dynamic circle lay between the two fixed circles, i.e. in the range 598 mm – 602 mm. For this study, the focus was solely on finding the most appropriate mapping model at a specific gaze distance. Future studies will include gaze distance as a separate independent variable for the regression polynomial.
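A sketch of how the dynamic circle might be driven, assuming the eye-camera distance is estimated elsewhere (e.g. from the apparent inter-pupil distance in the camera image); the pixel radii of the fixed circles are hypothetical values:

```python
def feedback_radius(distance_mm, d_near=597.0, d_far=603.0,
                    r_near=60.0, r_far=30.0):
    """Radius (pixels) of the dynamic feedback circle. r_near and r_far are
    the radii of the two fixed circles, so the dynamic circle lies between
    them exactly when the estimated distance lies between d_near and d_far."""
    t = (distance_mm - d_near) / (d_far - d_near)  # 0 at 597 mm, 1 at 603 mm
    return r_near + t * (r_far - r_near)
```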

Figure 4. Real-time feedback on gaze distance with a circle of varying radius (green) around the target that should be kept between two fixed (grey) circles (shown: too near, too far, correct)

Data for a specific target was only accepted if the gaze distance was within the 5 mm tolerance and the standard deviation of the samples was below a critical threshold, i.e. the eyes were held stable for the duration of the sampling period. Participants were instructed to blink frequently and to look away and rest their eyes whenever they felt the need to do so. The concentric circles ensured that they could return to the same position.

6.5 7.5 Ÿ 8.5 XO

X

Ÿ

X

O Ÿ XO Ÿ 8 targets with 8 distinct X and 8 distinct Y coordinates X 8 targets with 4 distinct X and 4 distinct Y coordinates as proposed by Cerrolaza &Villanueva (2008) O 9 targets in a 3×3 grid

3.3 Calibration and Validation

3.3.1 Calibration


Data for the pupil-glint vectors at selected cells in the 15×9 grid (Figure 3) served as calibration targets. The accuracies for nine different calibration sets, which vary with regard to the number and arrangement of targets (Figure 5), were compared with one another. One of these arrangements is an 8-point set proposed by Cerrolaza and Villanueva [4] and referred to as C&V in the text.


Since the regression is done separately for the X and Y dimensions, it was thought that the arrangement of calibration targets should be such that they cover as many distinct X and Y values as possible. Having 9 points on a 3×3 grid effectively limits the regression to 3 distinct X and 3 distinct Y values. In Figure 5 below, 8 calibration targets are, for example, arranged to cover 8 distinct X and 8 distinct Y coordinates.

Figure 5. Nine different sets of calibration targets: 2 targets (2 distinct X and 2 distinct Y coordinates); 3 targets (3 distinct X and Y); 5 targets (5 distinct X and Y); 8 targets (8 distinct X and Y); 8 targets (4 distinct X and 4 distinct Y) as proposed by Cerrolaza and Villanueva [4]; 9 targets in a 3×3 grid; 11 targets (9 distinct X and Y); 14 targets (10 distinct X and Y); 18 targets (11 distinct X and 9 distinct Y)

For every participant/calibration set combination, the pupil-glint vectors in the calibration cells were used to determine the regression coefficients for each one of 625 polynomial models (except those for which the number of calibration targets was insufficient to do the regression). Table 1 lists the polynomials that were analysed for X and Y. Every polynomial for X was tested in combination with every polynomial for Y. A high-level algorithm of the process is given in Figure 6.

3.3.2 Validation
Normally, the accuracy of a system can be determined by requesting a participant to look at a second series of evenly spread targets (the first series being the calibration targets) for which the X and Y coordinates are known. The accuracy is then calculated as the average of the differences between the known and reported positions.


To ensure representative measurement of accuracy, validation targets (to be distinguished from calibration targets) should be spread regularly across the entire range at small intervals. For this study, the saved pupil-glint vectors for 104 cells in the 15×9 grid were used for this purpose (cells marked with Ÿ in Figure 3).


In order to prevent the large number of dots at the edges (with potentially worse accuracy than the dots in the centre of the screen) from affecting the accuracy negatively, the display was divided into four regions with an equal number of validation targets in each region (Figure 3).
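As a sketch of how this validation step might be computed (ours, not the authors' code), the angular error at each validation target can be collected from the fitted polynomials and then averaged; the four-term model from section 2.2 and millimetre screen coordinates are assumed:

```python
import numpy as np

def validation_errors(a, b, xy, targets_mm, view_dist_mm=800.0):
    """Angular error (degrees) at each validation target.
    a, b: fitted coefficients for X and Y (four-term model assumed);
    xy: normalised pupil-glint vectors, one row per target (columns x, y);
    targets_mm: known target positions in screen millimetres."""
    x, y = xy[:, 0], xy[:, 1]
    A = np.column_stack([np.ones_like(x), x, y, x * y])  # same terms as fit
    pred = np.column_stack([A @ a, A @ b])               # reported PoRs
    offset = np.linalg.norm(pred - targets_mm, axis=1)   # on-screen error
    return np.degrees(np.arctan(offset / view_dist_mm))
```

The mean of these per-target errors is the accuracy figure reported in the results below.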


Table 1. Polynomials for the X and Y coordinates that were analysed

Polynomials for X: x | x, y | x, y, xy | x, xy | x, y, xy, x²y² | x, x² | x, x², x³ | x, x², y | x, x², xy | x, x², y, y² | x, x², y, xy | x, x², xy, x²y² | x, x², y, y², xy | x, x², y, y², xy, x²y² | x, x³ | x, x³, y | x, x³, y² | x, x³, xy | x, x³, y, y² | x, x³, y, xy | x, x³, y², xy | x, x³, xy, x²y² | x, x³, y, y², xy | x, x³, y, y², xy, x²y² | x, x², x³, y, y², xy, x²y²

Polynomials for Y: y | x, y | x, y, xy | y, xy | x, y, xy, x²y² | x, y, x²y² | x, y² | x², y | x, y², xy² | x², y, xy | x², y, x²y | x², y, xy, x²y | x², y, xy, x²y² | x, x², y, y², xy | x, x², y, y², x²y | x, x², y, y², xy² | x, x², y, y², xy, x²y | x, x², y, y², xy, x²y² | y, y², xy, x²y² | x, y, y², x²y | x, y², x²y | x, y², xy, x²y | x, y², x²y² | x, y², xy, x²y² | x², y, y², xy, x²y²

Table 2. Accuracy for the best model per participant and calibration set (average over targets). For each participant (1–9) and calibration set (3, 5, C&V, 8, 9, 11, 14 and 18 points), the table lists the best X and Y polynomials together with the average and standard deviation of the accuracy over the validation targets. Terms between square brackets indicate that the accuracy is independent of whether they are included or not; e.g. (x², y, [x²y]) and (x², y) provide the same accuracy. Alternative possibilities are indicated with a /; e.g. y/y² means that either y or y² should be included. +/ (e.g. x+/x²) means that either or both terms must be included.

For each participant {
    For each calibration set {
        For each model {
            Determine mapping polynomials through regression
            Get the average accuracy over the validation points
        }
    }
}

Figure 6. Algorithm for data analysis
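A runnable rendition of Figure 6 might look as follows; fit_model and mean_validation_error are hypothetical stand-ins for the regression and validation steps of sections 3.3.1 and 3.3.2:

```python
def search_models(participants, calibration_sets, models,
                  fit_model, mean_validation_error):
    """Exhaustive search of Figure 6: for every participant, calibration set
    and (X, Y) polynomial pair, record the mean validation error."""
    results = {}
    for p in participants:
        for cal in calibration_sets:      # the nine target arrangements
            for mx, my in models:         # up to 625 (X, Y) polynomial pairs
                coeffs = fit_model(p, cal, mx, my)
                results[(p, cal, mx, my)] = mean_validation_error(p, coeffs)
    return results
```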

4. RESULTS
Using the above-mentioned procedure, the mapping model that provided the best accuracy for each participant and calibration set combination was identified (Table 2). In many cases, the addition or removal of some terms from the model made only a small or no difference to the accuracy. Optional terms are indicated with square brackets while alternative possibilities are indicated with /. Using the 11-point calibration set, for example, any of the following models would provide good results for the X component of the PoR for participant 3:

X = a0 + a1x + a2x³ + a3y + a4xy
X = a0 + a1x + a2x³ + a3y² + a4xy
X = a0 + a1x + a2x³ + a3y + a4xy + a5x²y²
X = a0 + a1x + a2x³ + a3y² + a4xy + a5x²y²
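The term-string notation lends itself to a small helper that turns a specification such as "x, x3, y2, xy" into a design matrix for the regression. The parser below is our own sketch (only terms of the form xⁿyᵐ are handled, and the function names are hypothetical); the resulting matrix feeds np.linalg.lstsq exactly as in the earlier calibration sketch:

```python
import re
import numpy as np

def term_column(term, x, y):
    """Evaluate one term such as 'x3', 'y2' or 'x2y2' at the sample arrays."""
    m = re.fullmatch(r"(?:x(\d*))?(?:y(\d*))?", term.strip())
    px = int(m.group(1) or 1) if "x" in term else 0
    py = int(m.group(2) or 1) if "y" in term else 0
    return x ** px * y ** py

def design_matrix(spec, x, y):
    """Constant column followed by one column per term in the spec string."""
    terms = [t for t in spec.split(",") if t.strip()]
    cols = [np.ones_like(x)] + [term_column(t, x, y) for t in terms]
    return np.column_stack(cols)
```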

While several models could be used for any single participant/calibration set combination, no single model proved to be the best for all participants. From the data in Table 2, it seems as though better accuracy can be achieved with a higher number of calibration targets (Figure 7). If the 3-point calibration set is disregarded, the improvement that can be achieved with more calibration targets was, however, not significant (F(5,57) = 1.29, p = .28). This means that 8 points are enough to do a good calibration – provided that the correct model is used.


Figure 7. Accuracy as determined with the best model per participant against the number of calibration points

In an attempt to identify the best common model for all participant and calibration set combinations, the accuracy returned by the best model for a particular participant was subtracted from the accuracy returned by each one of the other models for that participant. For participant 1, for example, the best model for the 8-point calibration set (x, x³, y, xy; x, y², x²y) returned an accuracy of 0.36° while (x, x³, xy; x, x², y, y², xy, x²y) yielded an accuracy of 0.47° – a deviation of 0.11°. In general, the smaller this deviation, the better the particular model applies to the data for the specific participant. Table 3 shows the average deviation over participants for the best performing models for each one of the calibration sets.

Table 3. Deviation from best possible model (average over participants). For each calibration set (3, 5, 8, C&V, 9, 11, 14 and 18 points), a sub-table ranks the best-performing (X; Y) model pairs by the average and standard deviation of their deviation from the best possible model.


The full second order polynomial Y = b0 + b1x + b2x² + b3y + b4y² + b5xy + b6x²y / b6x²y² proved to be the best common model for the Y component for all calibration sets with 8 or more points. Identification of the best common model for the X component was less obvious, but eventually X = a0 + a1x + a2x³ + a3y² + a4xy provided the smallest average deviation across all calibration sets. Table 4 shows the accuracies as achieved per participant and calibration set for this common model in comparison with the accuracies that can be achieved by applying the best possible model for every participant. An interesting observation is that although the per-participant best model provides good results for the C&V calibration set, the common polynomial does not work well for this arrangement of calibration targets.

Table 4. Accuracy per participant and calibration configuration for the common model (x, x³, y², xy; x, x², y, y², xy, x²y)

              3            5            8            C&V¹         9            11           14           18
Prt      BpP²  CP³    BpP   CP     BpP   CP     BpP   CP     BpP   CP     BpP   CP     BpP   CP     BpP   CP
1        1.34  1.66   0.43  0.66   0.36  0.47   0.39  0.47   0.44  0.67   0.37  0.41   0.39  0.37   0.36  0.36
2        1.17  1.39   0.56  0.68   0.60  0.65   0.66  1.42   0.58  0.62   0.59  0.60   0.63  0.80   0.54  0.56
3        0.81  0.90   0.89  0.90   0.74  0.81   0.71  0.84   0.72  0.75   0.70  0.73   0.68  0.89   0.70  0.79
4        0.64  0.72   0.59  0.71   0.53  0.60   0.40  1.61   0.40  0.42   0.40  0.46   0.38  0.40   0.37  0.38
5        0.60  0.62   0.44  0.68   0.41  0.44   0.41  0.78   0.35  0.38   0.39  0.43   0.47  0.56   0.35  0.38
6        0.74  0.81   0.62  0.94   0.53  0.59   0.51  0.56   0.37  0.47   0.38  0.48   0.37  0.46   0.35  0.43
7        0.73  0.73   0.53  0.54   0.49  0.52   0.58  0.68   0.57  0.73   0.93  1.18   0.54  0.63   0.46  0.50
8        0.73  0.91   0.56  0.59   0.58  0.60   0.49  1.03   0.54  0.69   0.52  0.59   0.48  0.51   0.49  0.49
9        0.88  0.97   0.54  0.60   0.40  0.42   0.60  1.80   0.44  0.44   0.44  0.46   0.40  0.42   0.40  0.41
Average  0.85  0.97   0.57  0.70   0.52  0.57   0.53  1.02   0.49  0.57   0.52  0.59   0.48  0.56   0.45  0.48
STD      0.25  0.34   0.13  0.14   0.12  0.12   0.12  0.48   0.12  0.15   0.19  0.24   0.11  0.18   0.12  0.13

¹ The 8-point configuration proposed by Cerrolaza and Villanueva [4]
² BpP: Best per Participant – use the polynomial that provides the best accuracy for each participant
³ CP: use the common polynomial (x, x³, y², xy; x, x², y, y², xy, x²y)


5. DISCUSSION
Our results differ somewhat from those found by Cerrolaza and Villanueva [4], probably because of the hardware configuration. Cerrolaza and Villanueva [4] propose a model that would accommodate a variety of configurations regarding cameras and infrared lights, while we focused on a system with a single camera and single IR source.

The fact that no single model proved to be the best for all participants means that it could be a good idea for eye tracking software to include a procedure as part of the calibration routine which searches through all potentially good candidate models and applies a model that serves well for the particular conditions. This could, however, be a lengthy operation and may not always be feasible. It is therefore important to find a common model that could be applied to all (or at least most) participants. It is also important to find an arrangement of calibration targets that would provide good results for all participants when this common model is applied.

The small deviation values for the 18-point model indicate that a number of mapping models could provide reasonable accuracies. The selection of mapping model is critical for the 3-point and 5-point calibration sets, and good accuracies can only be obtained if the best model is selected for every participant. The deviation of other models from the best performing model is also higher for calibration sets where the number of distinct X and Y coordinates is much less than the number of targets. In other words, while the 9-point and 14-point calibrations do not necessarily provide worse accuracies than the 18-point set, the selection of mapping model is more critical.

The following model was identified as the best model for all participants as long as at least 8 calibration points are used:

X = a0 + a1x + a2x³ + a3y² + a4xy
Y = b0 + b1x + b2x² + b3y + b4y² + b5xy + b6x²y / b6x²y²
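Written out as code, the proposed common model corresponds to the following feature builders (a sketch; the x²y variant of the Y model is shown, and the coefficients would come from the calibration regression as before):

```python
import numpy as np

def features_X(x, y):
    # X = a0 + a1*x + a2*x^3 + a3*y^2 + a4*x*y
    return np.column_stack([np.ones_like(x), x, x**3, y**2, x * y])

def features_Y(x, y):
    # Y = b0 + b1*x + b2*x^2 + b3*y + b4*y^2 + b5*x*y + b6*x^2*y
    return np.column_stack([np.ones_like(x), x, x**2, y, y**2, x * y, x**2 * y])
```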



Using this model, the average accuracies for the calibration sets with 8, 9, 11 and 14 points are approximately the same (Table 4). While the accuracy is somewhat better when the 18-point set is used, the improvement might not justify the extra time that it takes. It might also be strenuous on a participant's eyes, with the risk of inaccurate data for the last few points. A researcher may, therefore, decide to settle for the much quicker and less demanding 8-point calibration set.

It is informative that the accuracy values obtained for a simple system with one camera and one infrared source are of the same order of magnitude as, or even better than, those of industrial systems (see section 2.4 above). This is probably due to the fact that an effort was made to find the most appropriate model for mapping the pupil-glint vector to screen coordinates.

This study provides, therefore, more evidence that the boundaries of the "black box" extend beyond those of the physical hardware. Manufacturers may provide regular updates to the so-called firmware – the software that resides inside the machine itself and which is actually responsible for the eye tracking data that is provided to the researcher. For this reason, researchers should not only specify the make and model of the eye tracker that they used for a study, but also the version of the firmware at the time.

It should also be noted that accuracy is not an absolute value and may differ from one participant to another and from one experimental setup to the next. It is therefore dangerous to use manufacturers' specifications when comparing one model with another.

6. FUTURE RESEARCH
This paper focused on the accuracy that can be achieved for various participants and sets of calibration targets at a fixed gaze distance. Despite normalising the pupil-glint vectors with respect to inter-pupil distance, we experienced that the calibrations are not robust with large (in excess of 5 cm) backward and forward movements of the head. Future research could include eye-camera distance as an independent variable, in addition to the pupil-glint vector, to determine the best possible mapping model for video-based systems that allow free head movement.

7. REFERENCES
[1] Abe, K., Ohi, S. and Ohyama, M. 2007. An eye-gaze input system using information on eye movement history. In C. Stephanidis (Ed.), Universal Access in HCI, Part II, HCII 2007, LNCS 4555, 721-729. Berlin Heidelberg: Springer-Verlag.
[2] Borah, J. 1998. Technology and application of gaze based control. RTO Lecture Series on Alternative Control Technologies, 7-8 October 1998, Brétigny, France and 14-15 October 1998, Ohio, USA.
[3] Brolly, X.L.C. and Mulligan, J.B. 2004. Implicit calibration of a remote gaze tracker. Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04), 8, 134-146.
[4] Cerrolaza, J. and Villanueva, A. 2008. Taxonomic study of polynomial regression applied to the calibration of video-oculographic systems. Proceedings of the 2008 Symposium on Eye Tracking Research and Applications (ETRA), 26-28 March 2008, Savannah, Georgia.
[5] Chen, J., Tong, Y., Gray, W. and Ji, Q. 2008. A robust 3D gaze tracking system using noise reduction. Proceedings of the 2008 Symposium on Eye Tracking Research and Applications (ETRA), 26-28 March 2008, Savannah, Georgia.
[6] Crane, H.D. and Steele, C.M. 1985. Generation-V Dual-Purkinje-Image Eyetracker. Applied Optics, 24, 527-537.
[7] Draper, N.R. and Smith, H. 1981. Applied Regression Analysis. John Wiley and Sons.
[8] Guestrin, E.D. and Eizenman, M. 2006. General theory of remote gaze estimation using the pupil center and corneal reflections. IEEE Transactions on Biomedical Engineering, 53(6), 1124-1135.
[9] Hansen, D.W. and Ji, Q. 2010. In the eye of the beholder: A survey of models for eyes and gaze. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3), 478-500.
[10] Holmqvist, K., Nyström, M., Andersson, R., Dewhurst, R., Jarodzka, H. and Van de Weijer, J. 2011. Eye Tracking: A Comprehensive Guide to Methods and Measures. Oxford: Oxford University Press.
[11] Hornof, A.J. and Halverson, T. 2002. Cleaning up systematic error in eye-tracking data by using required fixation locations. Behavior Research Methods, Instruments & Computers, 34(4), 592-604.
[12] Hua, H., Krishnaswamy, P. and Rolland, J.P. 2006. Video-based eyetracking methods and algorithms in head-mounted displays. Optics Express, 14(10), 4328-4350.
[13] Imai, T., Sekine, K., Hattori, K., Takeda, N., Koizuka, I., Nakamae, K., Miura, K., Fujioka, H. and Kubo, T. 2005. Comparing the accuracy of video-oculography and the scleral search coil system in human eye movement analysis. Auris Nasus Larynx, 32, 3-9.
[14] ISO 5725-1. 1994. Accuracy (trueness and precision) of measurement methods and results – Part 1: General principles and definitions. International Standards Organisation, Geneva, Switzerland.
[15] Jarodzka, H., Balslev, T., Holmqvist, K., Nyström, M., Scheiter, K., Gerjets, P. and Eika, B. 2010. Learning perceptual aspects of diagnosis in medicine via eye movement modeling examples on patient video cases. In S. Ohlsson and R. Catrambone (Eds.), Proceedings of the 32nd Annual Conference of the Cognitive Science Society, Portland, 1703-1708.
[16] Johnson, J.S., Liu, L., Thomas, G. and Spencer, J.P. 2007. Calibration algorithm for eyetracking with unrestricted head movement. Behavior Research Methods, 39(1), 123-132.
[17] Kliegl, R. and Olson, R.K. 1981. Reduction and calibration of eye monitor data. Behavior Research Methods and Instrumentation, 13, 107-111.
[18] Komogortsev, O.V. and Khan, J.I. 2008. Eye movement prediction by Kalman filter with integrated linear horizontal oculomotor plant mechanical model. Proceedings of the 2008 Symposium on Eye Tracking Research and Applications (ETRA), 26-28 March 2008, Savannah, Georgia.
[19] Ramanauskas, N. 2006. Calibration of video-oculographical eye-tracking system. Electronics and Electrical Engineering, 8, 65-68.
[20] Rayner, K., Pollatsek, A., Drieghe, D., Slattery, T. and Reichle, E. 2007. Tracking the mind during reading via eye movements: Comments on Kliegl, Nuthmann, and Engbert (2006). Journal of Experimental Psychology: General, 136(3), 520-529.
[21] Shih, S., Wu, Y. and Liu, J. 2000. A calibration-free gaze tracking technique. Proceedings of the International Conference on Pattern Recognition, 201-204.
[22] Tobii. 2010. Product description for the Tobii TX300 eye tracker. Tobii Technology AB, November 2010. Product flyer downloaded from www.tobii.com on 6 July 2012.
[23] Van der Geest, J.N. and Frens, M.A. 2002. Recording eye movements with video-oculography and scleral search coils: a direct comparison of two methods. Journal of Neuroscience Methods, 114, 185-195.
[24] Zhang, Y. and Hornof, A.J. 2011. Mode-of-disparities error correction of eye-tracking data. Behavior Research Methods, 43(3), 834-842.
