REAL-TIME CALIBRATION-FREE AUTONOMOUS EYE TRACKER

Frank Klefenz*, Peter Husar#, Daniel Krenzer#, and Albrecht Hess#

* Fraunhofer Institute for Digital Media Technology IDMT, D-98693 Ilmenau, Germany
# Ilmenau University of Technology, Dept. Biosignal Processing, D-98693 Ilmenau, Germany

ABSTRACT

In several fields of medicine, transportation, and security the instantaneous gaze direction of a person under supervision is of crucial importance. We developed a contactless stereoscopic video-based eye tracker which works without any individual calibration. In real time it delivers information about the gaze direction frame by frame. The introduced algorithms are designed to compute the gaze direction within the image acquisition time, which is limited only by the hardware setup. The cameras are integrated into front-end modules by means of FPGA (programmable logic) circuits for image processing. Computation of the gaze direction is based on the spatial position of the pupil, which is detected by a five-dimensional Hough transform. The system works under ambient light conditions; additional infrared illumination can be used to become independent of the ambient light.

Index Terms— contactless remote eye tracker, five-dimensional Hough transform, fast image processing by FPGA, calibration-free computation of gaze direction

1. INTRODUCTION

To minimize the error in measuring the gaze direction, the best solution is a head-mounted eye tracker. In this case head movements do not impact the signal, the image processing, or the computation. Head-mounted eye trackers use several measuring principles. A simple solution is offered by the EOG (electrooculogram), which gives information about the movements of the eyes. This can be used, for example, in human-machine interfaces for moving a cursor over a display. Other systems use contact lenses with embedded coils (search coils) for the inductive measurement of the spatial position of the eye and the pupil. These systems are precise but place a high strain on the person under investigation. However, head-mounted systems are impracticable in real-life applications. In most cases complete freedom of head movement is necessary. In general, contact-free eye trackers are needed, which are described as remote eye trackers. The most common solution is based on processing video data [3]. An early and simple solution for measuring the gaze direction in a video data stream is based on the corneal reflex (CR) of a small light source (e.g. an LED) which is mounted in the measuring equipment (e.g. on a PC display). The scene is captured by one camera which is positioned near the display or the light source. The task of image processing is to find the CR and the center of the pupil, which together give the so-called pupil glint vector (PGV). The gaze direction is computed from the PGV after individual calibration [4]. However, in most real-life applications, calibration and personal setup are not practicable. Therefore, a calibration-free eye tracker is needed. Furthermore, safety-critical applications in medicine and transportation require fast image processing to handle time-critical situations, which commercial low-cost systems cannot offer. In this contribution a real-time calibration-free eye tracker is introduced that works autonomously and can be built stereoscopically or as a multi-camera front-end system.

2. MATERIAL AND METHOD

2.1. The setup of the eye tracking system



Fig. 1. Simplified assembly of the stereoscopic eye tracking system.


Fig. 2. The processing chain architecture of the two FPGA camera units.

The eye tracker consists of two FPGA-based camera units, each with a separate image processing chain. The CMOS cameras are arranged in a stereoscopic setup with custom spatial adjustment (Fig. 1), as described in section 2.5. To determine the gaze direction, two requirements must be fulfilled: the recorded pupil must be present in both video streams and the spatial positions of the cameras must be stored in the eye tracking system. Under these conditions the position of the pupil in 3D space can be calculated from the data of both camera systems generated during the image preprocessing. Afterwards the gaze direction is computed separately on each unit. In Fig. 2 the architecture and the main system blocks of the eye tracker are outlined. In the video chain of each system the same processing takes place. The computationally intensive parts of the processing chain are completely designed in hardware for the FPGA. This enables high frame rates and a fixed time regime for video acquisition. The images from the CMOS sensor are transferred as a bit-parallel pixel data stream frame by frame to the dedicated FPGA hardware. The hardware modules execute several preprocessing steps, the Hough transform, and the feature extraction. Extracted features are merged in an embedded postprocessing unit, implemented as a software algorithm on a Xilinx MicroBlaze soft processor core, for classification and trigonometric calculations.

2.2. The image preprocessing chain

The first step in the FPGA image preprocessing chain is a histogram builder module which generates the histogram of the gray-scale image. A histogram analyzer module then dynamically calculates an adaptive threshold used for segmenting the image into a binary representation of the pupil. An edge detection module determines edge pixels by convolving the segmented image with a simple Roberts filter mask. The edge-highlighted images are buffered in the SDRAM and provided to the parallel Hough transformation module. Fig. 3 (a), (b), and (c) demonstrate the preprocessing steps. Fig. 3 (d) shows the original pupil image overlaid with the calculated ellipse after the ellipse fitting described in section 2.4.
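As an illustration of this preprocessing chain, the following sketch reproduces the three steps in software. It is a minimal model, not the FPGA implementation: the paper does not state the histogram-based threshold rule, so a simple percentile heuristic (the pupil being the darkest region) is assumed, and all function names are illustrative.

```python
import numpy as np

def adaptive_threshold(gray):
    """Derive a threshold from the image histogram.

    The paper derives the threshold from a histogram analysis but does not
    specify the rule; a percentile heuristic is assumed here for illustration.
    """
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    cdf = np.cumsum(hist) / hist.sum()
    return int(np.searchsorted(cdf, 0.05))  # bin covering the darkest ~5% of pixels

def segment_pupil(gray, threshold):
    """Binary segmentation: pixels darker than the threshold are pupil candidates."""
    return (gray < threshold).astype(np.uint8)

def roberts_edges(binary):
    """Edge pixels of the binary pupil mask via the Roberts cross operator."""
    img = binary.astype(np.float32)
    gx = img[:-1, :-1] - img[1:, 1:]   # diagonal difference
    gy = img[:-1, 1:] - img[1:, :-1]   # anti-diagonal difference
    mag = np.zeros_like(img)
    mag[:-1, :-1] = np.hypot(gx, gy)
    return (mag > 0).astype(np.uint8)

if __name__ == "__main__":
    # Synthetic frame with a dark elliptic "pupil" on a bright background.
    h, w = 480, 640
    yy, xx = np.mgrid[0:h, 0:w]
    gray = np.full((h, w), 200, dtype=np.uint8)
    gray[((xx - 320) / 60) ** 2 + ((yy - 240) / 40) ** 2 <= 1.0] = 30
    thr = adaptive_threshold(gray)
    edges = roberts_edges(segment_pupil(gray, thr))
    print("threshold:", thr, "edge pixels:", int(edges.sum()))
```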

Fig. 3. Original pupil image (a), thresholded pupil image (b), edged pupil image (c), and pupil image overlaid with extracted pupil (d).


2.3. The parallel Hough transformation module

After the preprocessing steps, the edged pupil image buffered in the SDRAM is used to extract further information with the objective of finding elliptic structures. The main pixel cruncher for this task is the parallel Hough transformation module. It can be described as a highly parallelized Hough algorithm execution scheme based on a simple delay-line feed-forward timing structure (Hough core). On the image, the Hough core works like a filter of variable size n x m (n delay lines, each with m elements), up to a size of 64 x 64 in the current implementation. The setting of a single delay element within the delay line structure is defined by a configuration matrix [1]. Depending on the form (e.g. linear or curved) and size of the structures of interest, different configuration matrices can be used. This eye tracking system uses configuration matrices for curvatures to detect the extreme points of the edged pupil ellipse.
For feeding the Hough core, the Hough preprocessing module works as an input image handler and controls the readout of the input rows and columns from the image buffer. Execution begins with driving the first left column slice of an image row block of height n into the Hough transformation module. In each processing time step one column slice is injected pixel-wise into the Hough core and a dataset is ejected. These datasets build up the multi-dimensional Hough parameter space, consisting of the cumulative frequency as a probability score for the sought curvature (Hough accumulation space), the radius of curvature (Hough curvature space), and the orientation of the curvature (Hough orientation space). The line-wise processing of the image continues while the Hough preprocessing module shifts the input row block by one image row. This process is repeated until the last column slice of the last image row block has passed the Hough transformation module. The duration of the transformation depends on the size of the images and, for this algorithmic implementation, equals the frame time. In the current implementation the image size can be selected variably up to HDTV resolution (1920 x 1080 pixels).
During the Hough transform, feature vectors are extracted from the multi-dimensional Hough parameter space to obtain the extreme points of the pupil ellipse. For this purpose, the Hough accumulation space (shown in Fig. 4) is scanned by a feature extraction module based on a non-maximum suppression algorithm with thresholding in both the Hough accumulation space and the Hough curvature space. The results of this computation are peaks matching the extreme points in the current image, also called Hough features. Each bounded pupil yields four feature vectors, two in x direction and two in y direction. These four extreme points are found robustly by the Hough module, as shown in Fig. 4.
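The feature extraction step can be modeled in software as follows. This is a minimal sketch of non-maximum suppression with thresholding over the Hough accumulation and curvature spaces; the concrete thresholds, neighborhood size, and array layout of the Hough spaces are assumptions, and the delay-line Hough core itself is not modeled here.

```python
import numpy as np

def extract_hough_features(acc, curv, orient, acc_thr, curv_range, nms_size=5):
    """Scan the Hough accumulation space for local maxima (Hough features).

    acc, curv, orient: 2D arrays holding the accumulation, curvature and
    orientation spaces (one value per image position). acc_thr, curv_range
    and nms_size are illustrative parameters, not values from the paper.
    Returns (x, y, curvature, orientation) tuples.
    """
    h, w = acc.shape
    r = nms_size // 2
    features = []
    for y in range(r, h - r):
        for x in range(r, w - r):
            score = acc[y, x]
            if score < acc_thr:
                continue  # thresholding in the accumulation space
            if not (curv_range[0] <= curv[y, x] <= curv_range[1]):
                continue  # thresholding in the curvature space
            window = acc[y - r:y + r + 1, x - r:x + r + 1]
            if score < window.max():
                continue  # non-maximum suppression: keep only local peaks
            features.append((x, y, float(curv[y, x]), float(orient[y, x])))
    return features
```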


Fig. 4. Hough accumulation space with detected Hough features, overlaid by the fitted ellipse and the computed parameters.

2.4. Ellipse fitting

The pupil can be described as having an elliptic shape; this shape model comes close to nature. In Fig. 3 (d) a recorded pupil is overlaid with a fitted ellipse of good congruence. The ellipse has five degrees of freedom: the coordinates x and y of the center point, the major half-axis a, the minor half-axis b, and the rotation angle α. The numerical solution of the ellipse fitting is overdetermined by the four given Hough feature vectors of the edged pupil and is solvable by algebraic processing. The detected feature vectors are presorted according to the cumulative frequency, the radius of curvature, and the orientation. Groups of three to four features are formed and the calculation is performed for each group, giving a success probability for the fitted ellipse. Fig. 4 shows an example of ellipse fitting by the described method and the computed parameters x, y, a, b, and α.
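The paper does not spell out the algebraic solution. As one plausible reconstruction, if the four Hough features are taken to be the exact leftmost, rightmost, topmost, and bottommost points of the ellipse, all five parameters follow in closed form, as sketched below; the point-naming convention is an assumption.

```python
import math

def ellipse_from_extremes(p_xmin, p_xmax, p_ymin, p_ymax):
    """Recover (xc, yc, a, b, alpha) from the four axis-extreme points of an ellipse.

    p_xmin/p_xmax: points of minimal/maximal x; p_ymin/p_ymax: points of
    minimal/maximal y (each an (x, y) pair). alpha is the angle of the major
    axis relative to the x axis. Illustrative closed form only; the paper's
    group-wise algebraic fit with success probability is not detailed.
    """
    xc = 0.5 * (p_xmin[0] + p_xmax[0])   # center from the x extremes
    yc = 0.5 * (p_ymin[1] + p_ymax[1])   # center from the y extremes
    X = 0.5 * (p_xmax[0] - p_xmin[0])    # half-width of the bounding box
    Y = 0.5 * (p_ymax[1] - p_ymin[1])    # half-height of the bounding box
    d = p_xmax[1] - yc                   # vertical offset of the x-maximal point
    # For a rotated ellipse: X^2 - Y^2 = (a^2 - b^2) cos(2*alpha)
    # and 2*d*X = (a^2 - b^2) sin(2*alpha).
    s, c = 2.0 * d * X, X * X - Y * Y
    alpha = 0.5 * math.atan2(s, c)
    diff = math.hypot(c, s)              # a^2 - b^2 (non-negative, so a >= b)
    summ = X * X + Y * Y                 # a^2 + b^2
    a = math.sqrt(0.5 * (summ + diff))
    b = math.sqrt(0.5 * max(summ - diff, 0.0))
    return xc, yc, a, b, alpha
```

In the real system the group-wise calculation over three to four presorted features with a success probability adds robustness against a missing or displaced extreme point; the closed form above only illustrates why four extreme points already overdetermine the five ellipse parameters.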

2.5. Back projection and gaze direction

The pupil is regarded in stereoscopic mode by two cameras with different angles of view. Variable viewing angles and a custom system setup are achieved by modeling the complete eye tracker setup in 3D space. The spatial position of the pupil center is determined by computing a stereoscopic back projection of the ellipse center point from the two camera systems. Based on this 3D position and the remaining ellipse parameters, the calculation of the spatial gaze direction is carried out in each camera system separately. The gaze direction can be described as a beam given by the pupil center and the pupil normal vector. To obtain a vector representation of the gaze direction, a second point on the pupil normal vector must be calculated. This is done by using the parameters of the projected ellipse on the virtual projection plane to compute the vector r that defines the virtual intersection of the pupillary axis n_pupil with the virtual projection plane, as shown in Fig. 5. Via the 3D position of the pupil center and the gaze direction vector, the 3D gaze direction is determined completely.

Fig. 5. The projection of the pupil onto the CMOS sensor plane. A virtual projection plane for the computation of the pupillary axis is located between the eye and the lens system of the camera.
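The two geometric steps can be illustrated in simplified form as follows. The midpoint triangulation of the two back-projected rays and the weak-perspective normal estimate are common stand-ins, not the paper's virtual-projection-plane construction; the ray origins and directions are assumed to come from the stored camera geometry.

```python
import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Stereoscopic back projection of the pupil center from two camera rays.

    o1, d1 / o2, d2: origin and (unit) direction of the ray through each
    camera's ellipse center. Returns the midpoint of the closest approach of
    the two rays (assumes non-parallel rays); a plausible stand-in, since the
    paper does not detail its back-projection formula.
    """
    o1, d1, o2, d2 = map(np.asarray, (o1, d1, o2, d2))
    w = o1 - o2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w, d2 @ w
    denom = a * c - b * b
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    return 0.5 * ((o1 + t1 * d1) + (o2 + t2 * d2))

def pupil_normal_from_ellipse(a, b, alpha):
    """Approximate pupil normal (gaze) direction from the fitted ellipse.

    Weak-perspective assumption: a circular pupil tilted by theta projects to
    an ellipse with b/a = cos(theta), and the tilt direction follows the minor
    axis. The sign of the tilt is ambiguous from a single view.
    """
    theta = np.arccos(np.clip(b / a, -1.0, 1.0))
    minor_dir = np.array([-np.sin(alpha), np.cos(alpha)])  # image-plane minor axis
    n = np.array([np.sin(theta) * minor_dir[0],
                  np.sin(theta) * minor_dir[1],
                  np.cos(theta)])                          # z = optical axis
    return n / np.linalg.norm(n)
```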

3. RESULTS

It is shown that the introduced eye tracking system processes video streams online and works completely autonomously without the need for separate PC hardware. This is essentially achieved by reducing and concentrating the extensive information load of the original gray-scale video data stream frame by frame, exploiting the parallel FPGA hardware structures. Therefore, the whole low-level image preprocessing and the robust mid-level feature extraction, which are the computationally intensive parts of the processing chain, are massively parallelized to match the FPGA hardware. Only the last processing steps for stereoscopic back projection and calculation of the gaze direction are inappropriate for the FPGA hardware and are thus executed in the embedded processor, which is more suitable for the required trigonometric computations. This results in an autonomous, fast, contact- and calibration-free, and compact eye tracking system which is able to determine the gaze direction robustly. Moreover, the frame rate of the system can easily be raised by using more powerful hardware without changing the algorithms.

4. CONCLUSIONS

The system supports several critical features. Due to its processing speed, it masters time-critical applications such as the detection of fast saccadic eye movements for diagnostic purposes. It is expected to reach good user acceptance due to its instantaneous response time. Calibration-free systems like the introduced one ease the usage of eye tracking systems and are welcome in many applications, especially for self-controlled assistance. The fixed time regime of frame-wise computation is useful for scientific applications (e.g. EEG analysis). Furthermore, the system can easily be set up and adapted to the automotive and game industries.

5. REFERENCES


[1] A. Brückmann, F. Klefenz, and A. Wünsche, "A Neural Net for 2D-Slope and Sinusoidal Shape Detection," International Scientific Journal of Computing, vol. 3(1), pp. 21-26, 2004.
[2] M.Y. Chern and Y.H. Lu, "Design and Integration of Parallel Hough-Transform Chips for High-speed Line Detection," in Proceedings of the 11th International Conference on Parallel and Distributed Systems (ICPADS 2005), IEEE Computer Society, Washington, DC, pp. 42-46, 2005.
[3] A.H. Clarke, J. Ditterich, K. Drüen, U. Schönfeld, and C. Steineke, "Using High Frame-Rate CMOS Sensors for Three-Dimensional Eye Tracking," Behavior Research Methods, Instruments & Computers, vol. 34(4), pp. 549-560, 2002.
[4] A. Duchowski, Eye Tracking Methodology: Theory and Practice, Springer, London, 2007.
[5] G. West, T. Welsh, and J. Pratt, "Saccadic trajectories receive online correction: Evidence for a feedback-based system of oculomotor control," Journal of Motor Behavior, vol. 41, pp. 117-126, 2009.