ICICS-PCM 2003 15-18 December 2003 Singapore
3C4.2
Eye Tracking with Statistical Learning and Sequential Monte Carlo Sampling

W. Huang*, C. W. Kwan** and L. C. De Silva***
*Institute for Infocomm Research, Singapore
**Department of Electrical and Computer Engineering, National University of Singapore
***The Institute of Information Sciences and Technology, Massey University, Palmerston North, New Zealand
Abstract
Many methods have been proposed for eye tracking. They are based either on low-level image characteristics or on pattern recognition techniques. The first approach is fast but lacks accuracy; the second is accurate but slow. This paper presents a novel method for fast and accurate eye tracking that combines the two approaches. A Gaussian mixture model for skin color is built to locate the eyes approximately at high speed, and Probabilistic Principal Component Analysis (PPCA) is then applied to confirm the eye locations accurately. Sequential Monte Carlo sampling, an enhanced sampling technique, is integrated into the system to further improve the speed. Experimental results show that the method tracks eyes accurately and quickly and is robust to different degrees of eye deformation, gaze orientation and eye shape.
1. Introduction
Eye tracking has long been recognized as an important task in both computer vision and human-computer interaction (HCI). Its main applications include contact-free mouse control [1], auto-stereoscopic display systems [3] and tracking a driver's eyes for road safety [5]. One approach to eye and gaze tracking is based on infrared lighting and a head-mounted device [2]; however, it requires expensive, special-purpose equipment. Eye tracking methods based on vision alone can be classified into two categories:
(a) Methods based on low-level image characteristics
Rainer Stiefelhagen et al. proposed an eye tracker based on skin color [6], which is used to locate the face area; eye tracking is then performed by searching for the two darkest pixels inside the face, taken to be the pupils. Their method is fast but may not be accurate enough, because other dark pixels can occur on the face, such as eyebrows or the frames of glasses.
Malciu et al. proposed a deformable model-based approach to track eyes [7]. The model tracks the eyes by fitting itself to the eye edges frame after frame; however, it can be attracted by edges of other facial features, e.g. wrinkles. This method is fast but lacks accuracy.
(b) Methods based on pattern recognition
Rogério S. Feris et al. proposed a method for tracking faces based on Gabor wavelet networks [4]. A face template is expressed in terms of 2-D Gabor wavelets, and tracking is done by repositioning the Gabor wavelets representing the face template onto the new face position. The method is accurate because it tracks the face by its pattern rather than by general image features, but it is slow. Kay Talmi et al. proposed an eye tracking method based on Principal Component Analysis (PCA) [3]. A reference eye pattern is obtained by training the PCA model, and tracking is accomplished by searching for the eye pattern most similar to the reference. This technique is accurate because it checks the pattern of the eye, but it is slow because of the heavy computation involved.
Generally, methods based on low-level image characteristics are simple and fast but not accurate, while methods based on pattern recognition are accurate but slow. No single method achieves both accuracy and speed; integrating the two, however, allows both. In this paper, a Gaussian mixture model (GMM) for skin color, a method of type (a), is used to locate the eyes approximately, and Probabilistic Principal Component Analysis (PPCA), a method of type (b), is applied to confirm the eye positions. In this way, accuracy is achieved by having PPCA locate the eye positions, while speed is achieved by having the Gaussian mixture model for skin color select a small number of candidate eye positions. Furthermore, Sequential Monte Carlo sampling is implemented to further enhance the speed. The following sections present the novel algorithm, based on the methods introduced above, that achieves both accuracy and speed.
The theory of the Gaussian mixture model for skin color, PPCA and Sequential Monte Carlo sampling is briefly described in Section 2. An overview of the algorithm is given in Section 3, and the implementation of each module of the system in Section 4. Section 5 presents the results and discussion, Section 6 concludes the paper, and Section 7 outlines future work.
2. THEORY

2.1 Gaussian mixture model for skin color
Human skin color is a prominent feature for detecting and tracking faces [8,9,10] and hands [11]. A single Gaussian model is commonly used for skin color detection [6]; in this algorithm a Gaussian mixture model is adopted because it models the data more accurately [12]. A Gaussian mixture model employs several Gaussian distributions to model the data. Equation (1) defines its probability density function:

p(x, φ) = Σ_i π_i · p_i(x; S_i, µ_i)    (1)

where π_i is the mixing parameter indicating the weight of the sub-model p_i(x; S_i, µ_i), a Gaussian with covariance S_i and mean µ_i. The EM algorithm is applied to estimate π_i, S_i and µ_i and thereby define the Gaussian mixture model.
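Purely as an illustration, and not the authors' implementation, a skin-color mixture of this kind can be fitted and evaluated roughly as follows; the library (scikit-learn), the component count and the stand-in data are assumptions.

```python
# Illustrative sketch only: fit a Gaussian mixture to skin-color chrominance (U, V)
# values and score pixels with it. The component count, threshold usage and the
# random stand-in data are assumptions, not values from the paper.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_skin_gmm(skin_uv, n_components=2, random_state=0):
    """skin_uv: (N, 2) array of U, V values taken from labelled skin pixels."""
    gmm = GaussianMixture(n_components=n_components, covariance_type='full',
                          random_state=random_state)
    gmm.fit(skin_uv)  # EM estimates the mixing weights pi_i, means mu_i and covariances S_i
    return gmm

def skin_log_likelihood(gmm, uv_pixels):
    """Per-pixel log p(x) under the mixture; thresholding this gives a skin mask."""
    return gmm.score_samples(uv_pixels)

# Usage sketch with random stand-in data in place of the labelled skin pixels:
rng = np.random.default_rng(0)
fake_skin_uv = rng.normal(loc=[110.0, 150.0], scale=8.0, size=(1000, 2))
model = fit_skin_gmm(fake_skin_uv)
print(skin_log_likelihood(model, fake_skin_uv[:5]))
```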
2.2 Probabilistic Principal Component Analysis
Probabilistic Principal Component Analysis (PPCA) is a probabilistic, generative extension of Principal Component Analysis (PCA); it has been used for modelling handwritten digits [13], image segmentation [14] and image compression [15]. In ordinary PCA, only the principal eigenvectors are used for dimensionality reduction and the non-principal eigenvectors are discarded. In PPCA, the non-principal part is used to model the noise of the probability model. Equation (2) defines the PPCA density:

p(t) = (2π)^{-d/2} |C|^{-1/2} exp{ -(t - µ)^T C^{-1} (t - µ) / 2 }    (2)

where t is the d-dimensional data vector and C = WW^T + σ²I. The maximum-likelihood estimates of µ, W and σ² are

µ_ML = (1/N) Σ_{n=1}^{N} t_n    (3)

W_ML = U_q (Λ_q - σ²_ML I)^{1/2} R    (4)

σ²_ML = (1/(d - q)) Σ_{j=q+1}^{d} λ_j    (5)

where U_q holds the q principal eigenvectors of the sample covariance matrix, Λ_q the corresponding eigenvalues, λ_j the remaining (non-principal) eigenvalues and R an arbitrary rotation matrix. Thus W_ML is defined by the principal eigenvectors, while σ²_ML is calculated from the non-principal eigenvalues. Once W_ML, σ²_ML and µ_ML have been calculated, Equation (2) can be used to compute the probability. For details of the theory of PPCA, please refer to [15].
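As a rough sketch of Equations (2)-(5), and not the authors' code, the closed-form maximum-likelihood PPCA fit and the log of Equation (2) could be written as follows; the variable names are illustrative and the rotation R is taken to be the identity.

```python
# Illustrative closed-form PPCA fit (Equations (3)-(5)) and log-density of Equation (2).
# Variable names are assumptions; the rotation R in Equation (4) is taken as identity.
import numpy as np

def fit_ppca(T, q):
    """T: (N, d) data matrix, e.g. flattened eye patches; q: number of retained components."""
    N, d = T.shape
    mu = T.mean(axis=0)                              # Equation (3)
    S = np.cov(T, rowvar=False, bias=True)           # sample covariance of the data
    eigval, eigvec = np.linalg.eigh(S)               # ascending eigenvalues
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1]   # reorder to descending
    sigma2 = eigval[q:].mean()                       # Equation (5): mean of non-principal eigenvalues
    W = eigvec[:, :q] * np.sqrt(np.maximum(eigval[:q] - sigma2, 0.0))  # Equation (4)
    return mu, W, sigma2

def ppca_log_density(t, mu, W, sigma2):
    """Log of Equation (2): t ~ N(mu, C) with C = W W^T + sigma2 I."""
    d = mu.size
    C = W @ W.T + sigma2 * np.eye(d)
    diff = t - mu
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(C, diff))
```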
2.3 Sequential Monte Carlo Sampling
Sequential Monte Carlo sampling is an enhanced sampling method; it has been used to track fast-moving vehicles, face movements [16], sound sources [17] and dancing movements [18]. Its main idea is, with a fixed number of samples, to sample densely in regions of high probability and sparsely in regions of low probability, since in tracking and detection the high-probability regions are the main interest. In this paper, Sequential Monte Carlo sampling is used together with PPCA to confirm the actual eye positions among the approximate eye positions obtained from the Gaussian mixture model for skin color.
3. SYSTEM OVERVIEW
One iteration of the eye tracking algorithm can be understood from the following input-output relationship:

Figure 1 Input-Output relationship

The system searches for the current eye position (x, y) within a local search region R centred at the previous eye position (x', y'). The eye position is initialized manually for the first frame. Each iteration consists of approximating the eye position and then confirming the accurate eye position by a refined search.

Approximating Eye position
The current frame undergoes skin color detection by the Gaussian mixture model. The approximate eye positions E are defined as the non-skin-colored regions found inside R. Figure 2 shows a typical result of skin color detection: R is denoted by the red rectangle and E by the white regions inside R.
Confirming Accurate Eye position
Sequential Monte Carlo sampling is applied a fixed number of times to sample the approximate eye positions, and the probability of each sampled point is calculated from the PPCA model of the eye. A pixel has a high probability if it belongs to an eye and a low probability otherwise.
Figure 2 Typical result of skin color detection

The above process shows how the modules cooperate to give a fast and accurate solution. Without the help of skin color detection, PPCA could still find the accurate eye position by sampling inside R instead of E, but it would be much slower; if only skin color detection were used, the result would not be accurate enough. Sequential Monte Carlo sampling enhances the speed further by reducing the number of PPCA calculations required. After all the probability values are obtained, the final eye position is determined either by choosing the sample with the highest probability or by computing the weighted mean of all the sampled points.
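The iteration just described can be sketched schematically as follows. This is not the authors' implementation: skin_mask() and eye_score() are placeholders for the GMM skin-color and PPCA modules, the search-region half-size is an assumed parameter, and plain uniform sampling stands in for the Sequential Monte Carlo scheme detailed in Section 4.3.

```python
# Schematic of one tracking iteration (Section 3). skin_mask() and eye_score() are
# placeholder callables for the GMM and PPCA modules; half_size and n_samples are
# assumed parameters, not values from the paper.
import numpy as np

def track_eye_once(frame, prev_xy, skin_mask, eye_score,
                   half_size=24, n_samples=40, use_weighted_mean=True, seed=0):
    rng = np.random.default_rng(seed)
    x0, y0 = prev_xy
    # Local search region R centred on the previous eye position (x', y').
    R = frame[y0 - half_size:y0 + half_size, x0 - half_size:x0 + half_size]
    # Approximate eye positions E: non-skin pixels inside R (GMM skin detection).
    E = np.argwhere(~skin_mask(R))                       # (row, col) offsets within R
    # Refined search: sample candidates from E and score each with the PPCA eye model.
    picks = E[rng.integers(len(E), size=min(n_samples, len(E)))]
    xy = np.column_stack([x0 - half_size + picks[:, 1],
                          y0 - half_size + picks[:, 0]])
    scores = np.array([eye_score(frame, (x, y)) for x, y in xy])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    if use_weighted_mean:                                # weighted mean of sampled points
        return tuple((w[:, None] * xy).sum(axis=0))
    return tuple(xy[int(np.argmax(w))])                  # or the highest-probability sample
```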
4. IMPLEMENTATION

4.1 Skin color detection by GMM
203 face images containing 714,715 skin color pixels are used to train the model. The pixels are converted into the YUV color space and only the UV values are used for training. To initialize the EM training, the histogram of the pixels is plotted and the number of sub-models g and the mean µ_i of each sub-model are chosen manually. Clustering is then performed by assigning each pixel to the nearest mean µ_i, and from this clustering π_i and S_i are calculated. To test the resulting segmentation, 500 images are randomly chosen from 5 image sequences taken from 4 different individuals. 495 images show correct segmentation of eye and skin; only 5 images fail, and these are observed to contain either very small or closed eyes. Based on these experimental results, the Gaussian mixture model works well for obtaining E, the approximate eye positions. Figure 3 shows some skin color detection results using this model.
Figure 3 Result of skin color detection
4.2 Probabilistic Principal Component Analysis
600 pairs of eye images are collected from image sequences of 4 different individuals. During the recording, each individual is asked to perform various eye deformations and head rotations, to change gaze direction over multiple viewing angles and to vary the scale. The size and intensity of the eye images are normalized and histogram equalized before they are used to train the PPCA model. Figure 4 shows some of the eye samples.
Figure 4 Eye Samples

Eye samples of size 16 x 24 and q = 40 eigenvectors are used to train the PPCA model. The training uses Equations (3)-(5) to obtain µ_ML, W_ML and σ²_ML. Figure 5 shows the mean eye images for the left and right eyes obtained after training.
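A minimal sketch of the patch preparation described above, assuming OpenCV, that 16 x 24 means 16 rows by 24 columns, and a particular interpolation mode; none of these choices are stated in the paper.

```python
# Illustrative preprocessing of an eye patch: resize to the 16 x 24 sample size,
# histogram-equalize the intensity, and flatten into a data vector for PPCA.
# Treating 16 x 24 as rows x columns and using INTER_AREA are assumptions.
import cv2
import numpy as np

PATCH_H, PATCH_W = 16, 24   # eye sample size reported in the paper

def prepare_eye_patch(gray_patch):
    """gray_patch: 2-D uint8 array cropped around an eye."""
    patch = cv2.resize(gray_patch, (PATCH_W, PATCH_H), interpolation=cv2.INTER_AREA)
    patch = cv2.equalizeHist(patch)            # intensity normalization
    return patch.astype(np.float64).ravel()    # d = 16 * 24 = 384 element vector
```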
Figure 5 Mean Eye images (Right, Left)

To verify the accuracy of the PPCA model, 50 frames of different people are selected and a probability map is built over the local search region R of each frame.
In 48 of these frames the highest probability is located at the centre of the eye. Figure 6 shows some of the probability maps.
Figure 6 Probability maps showing the highest probability at the centre of the eye

In Figure 6, the yellow spots in the images on the left mark the locations of the top 10% of probability values. In the probability maps on the right, red denotes the highest probability, followed by yellow, cyan and blue. The two maps showing low probability in the eye region correspond to frames in which the eye is closed.
4.3 Sequential Monte Carlo Sampling
To concentrate the samples in the region of high probability, the following procedure, modified from factored sampling [18], is used. A sampling set consists of N samples {s^(n), π^(n), c^(n), n = 1, ..., N}, where s denotes a sample, π its weight, c the cumulative weight and n the sample index.
For the first samples, n < 0.2N:
Step 1: Randomly choose the sampling point s^(n) and calculate its likelihood; the weight π^(n) is obtained by dividing this likelihood by the total likelihood.
Step 2: Calculate the cumulative weight c^(n).
For each of the subsequent samples, n ≥ 0.2N, the process iterates until n = N:
Step 1: Generate a uniform random number r ∈ [0, 1].
Step 2: Find the smallest j for which c^(j) ≥ r.
Step 3: Generate s^(n) from the normal distribution N(s^(j), 1) to ensure randomness.
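A minimal sketch of this sampling scheme follows; likelihood() is a placeholder for the PPCA probability of Equation (2), and any bookkeeping beyond the steps listed above is an assumption.

```python
# Minimal sketch of the factored-sampling procedure above. `candidates` plays the
# role of the approximate eye positions E and `likelihood` stands for the PPCA
# probability; details beyond the listed steps are assumptions.
import numpy as np

def smc_sample(candidates, likelihood, N=40, seed=0):
    rng = np.random.default_rng(seed)
    n_init = int(0.2 * N)
    samples, weights = [], []
    # First samples (n < 0.2N): draw uniformly from the candidate set and weight them.
    for _ in range(n_init):
        s = np.asarray(candidates[rng.integers(len(candidates))], dtype=float)
        samples.append(s)
        weights.append(likelihood(s))
    w = np.array(weights) / np.sum(weights)               # normalized weights pi^(n)
    c = np.cumsum(w)                                      # cumulative weights c^(n)
    # Subsequent samples (n >= 0.2N): resample around high-weight points.
    for _ in range(N - n_init):
        r = rng.uniform(0.0, 1.0)
        j = min(int(np.searchsorted(c, r)), len(c) - 1)   # smallest j with c^(j) >= r
        s = rng.normal(loc=samples[j], scale=1.0)         # perturb around s^(j)
        samples.append(s)
        weights.append(likelihood(s))
    return np.array(samples), np.array(weights)
```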
In implementing Sequential Monte Carlo sampling with the PPCA probability calculation for eye tracking, N = 40 samples are used for each eye. Figure 7 shows a typical sampling with probability calculations; the width of each rectangle is proportional to the probability value.

Figure 7 Sequential Monte Carlo Sampling with PPCA probability calculations

5. Results and Discussions
The proposed algorithm is tested on 6 image sequences recorded from 3 different subjects, with lengths ranging from 256 to 931 frames. The data has the following settings:
Image sequence 1: Subject A wears spectacles and rotates his head to look in different directions.
Image sequence 2: Subject A moves forward and backward, changing the scale of his eyes, under relatively poor illumination.
Image sequence 3: Subject B moves randomly, changing his gaze and head rotation, blinking and translating at moderate speed.
Image sequence 4: Subject C swings her head while facing the camera.
Image sequence 5: Subject C performs the same random motions as in Image sequence 3 but at high speed.
Image sequence 6: Subject B moves around with natural posture.
The algorithm tracks the movements in all of the above image sequences. To evaluate the accuracy of the system, the following test is applied to Image sequences 1 to 5; an illustrative sketch of this check is given after the list:
1. Select 30 frames randomly from each sequence.
2. Classify a frame as accurately tracked if the distance between the actual eye centre and the predicted eye centre is less than 3 pixels; otherwise classify it as not accurately tracked.
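Purely to illustrate this criterion, and the Euclidean error used later for Image sequence 6, a possible evaluation helper is sketched below; the function and array names are illustrative, not part of the paper.

```python
# Sketch of the evaluation used in this section: per-frame Euclidean error between
# manually labelled and tracked eye centres, mean error, and the 3-pixel accuracy
# criterion. Names are illustrative.
import numpy as np

def tracking_errors(true_xy, tracked_xy, accuracy_threshold=3.0):
    """true_xy, tracked_xy: (num_frames, 2) arrays of eye-centre coordinates."""
    errors = np.linalg.norm(np.asarray(true_xy) - np.asarray(tracked_xy), axis=1)
    accurate = errors < accuracy_threshold    # True where the frame counts as accurate
    return errors.mean(), accurate.mean()     # mean error (pixels), accuracy rate
```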
In this test, for every sequence except Image sequence 3, at most 3 of the 30 selected frames are inaccurately tracked, for both the left and the right eye. Figure 8 shows some sample frames from the image sequences. For Image sequence 3, the number of inaccurately tracked frames is 7 for the right eye and 10 for the left eye; in this sequence the appearance of the eyes differs greatly from the majority of the training database, so the error is much higher. One sample frame from Image sequence 3 is shown as the last frame in the right column of Figure 8. Even though the error is high, this movement can still be tracked continuously.
6. CONCLUSION
A novel method for tracking eyes is proposed and implemented, designed to provide fast and accurate eye tracking. In previous work on eye tracking, methods based on low-level image characteristics are fast but not accurate, while methods based on pattern recognition techniques are accurate but slow. The major distinction of this algorithm is that it integrates the two methodologies to achieve both fast and accurate eye tracking. A Gaussian mixture model for skin color is used to obtain the approximate eye positions, PPCA is applied to find the true eye positions among the approximate ones, and Sequential Monte Carlo sampling is applied to further enhance the speed. The experimental results show that the algorithm is robust against eye deformation, different eye shapes and gazes, changing eye scale and head orientation, and that it works on sequences with the subject wearing spectacles and under poor illumination. Tracking a typical sequence gives a mean error of 2.1276 pixels for the right eye and 2.9558 pixels for the left eye. The algorithm achieves a frame rate of 4 fps in the Matlab environment and is expected to run 2 to 3 times faster when implemented in C.
Figure 8 Sample frames from Image sequences 1-6

For Image sequence 6, the error is measured as the Euclidean distance between manually specified eye positions and the tracked positions. The error is plotted in Figures 9 and 10. For the 300 frames in the sequence, the following statistics are obtained:
             Right Eye        Left Eye
Mean Error   2.1276 pixels    2.9558 pixels
Table 1 Error of Image sequence 6

The mean error is between 2 and 3 pixels for a typical tracking sequence. The error for the left eye is higher because the right-eye samples were selected more carefully when training the algorithm; the left-eye samples were introduced only after tracking of the right eye had been established. Left-eye tracking could be made as accurate as right-eye tracking if more time were spent selecting the samples for training the left-eye PPCA model. Finally, the system is coded in Matlab using eye samples of size 16 x 24, 40 eigenvectors and N = 20 for each eye; with these settings the algorithm achieves a frame rate of 4 fps, whereas using PPCA alone gives only 0.2 fps. It is believed to be at least two to three times faster when coded in C with some speed optimization.
7. FUTURE WORK
Several research directions could further enhance this algorithm:
- Mixture PPCA could be used to handle an eye database covering more people.
- Auto-initialization of the eye positions when the algorithm starts.
- Implementation of the algorithm in C for higher speed.
- Extension of the algorithm with pupil detection for gaze tracking.

REFERENCES
[1] C. Colombo, A. D. Bimbo, S. D. Magistris, "Combining Head Tracking and Pupil Monitoring in Vision-Based Human-Computer Interaction", 6th Int. Conf. CAIP '95.
[2] EyeTracking, Inc. http://www.eyetracking.com.
[3] Kay Talmi, Jin Liu, "Eye and gaze tracking for visually controlled interactive stereoscopic displays", Image Communication, 14(10), pp. 799-810, 1999.
[4] Rogério S. Feris and Roberto M. Cesar Junior, "Tracking Facial Features Using Gabor Wavelet Networks", Proceedings of the Brazilian Symposium on Computer Graphics and Image Processing SIBGRAPI'2000, Gramado, Brazil, October 2000.
[5] M. Sodhi, B. Reimer, J. L. Cohen, E. Vastenburg, R. Kaars, S. Kirschenbaum, "On-road driver eye movement tracking using head-mounted devices", Proceedings of the Eye Tracking Research & Applications Symposium, New Orleans, Louisiana, 2002.
[6] Rainer Stiefelhagen, Jie Yang, Alex Waibel, "Tracking Eyes and Monitoring Eye Gaze", Proceedings of the Workshop on Perceptual User Interfaces (PUI'97), Alberta, Canada, pp. 98-100, 1997.
[7] Marius Malciu and Francoise Preteux, "Tracking facial features in video sequences using a deformable model-based approach", Proceedings SPIE Conference on Mathematical Modeling, Estimation and Imaging, San Diego, CA, Vol. 4121, August 2000.
[8] S. Spors, R. Rabenstein, "A Real-Time Face Tracker for Color Video", IEEE Int. Conf. on Acoustics, Speech & Signal Processing (ICASSP), Utah, USA, May 2001.
[9] R.-L. Hsu, M. Abdel-Mottaleb and A. K. Jain, "Face detection in color images", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 696-706, May 2002.
[10] V. Vezhnevets, "Method For Localization Of Human Faces In Color-Based Face Detectors And Trackers", Third International Conference on Digital Information Processing And Control In Extreme Situations, Minsk, Belarus, May 28-30, 2002.
[11] Christoph Theis, Kathrin Hustadt, "Detecting the gaze direction for a man machine interface", 10th IEEE International Workshop on Robot and Human Communication, pp. 424-429, September 2001.
[12] M. H. Yang, N. Ahuja, "Gaussian Mixture Model for Human Skin Color and Its Application in Image and Video Databases", Proc. of the SPIE, vol. 3656: Conf. on Storage and Retrieval for Image and Video Databases, pp. 458-466, San Jose, Jan. 1999.
[13] Musa, Duin, de Ridder, "Modelling Handwritten Digit Data using Probabilistic Principal Component Analysis", Proceedings of the 7th Annual Conference of the Advanced School for Computing and Imaging (ASCI 2001), pp. 415-442, ASCI, Delft, 2001.
[14] Dick de Ridder, Josef Kittler, Robert P. W. Duin, "Probabilistic PCA and ICA subspace mixture models for image segmentation", Int. Conf. on Pattern Recognition, pp. 216-220, 2002.
[15] Michael E. Tipping and Christopher M. Bishop, "Probabilistic Principal Component Analysis", Technical Report Woe-19, Neural Computing Research Group, Aston University, UK, 1997.
[16] K. Nummiaro, E. Koller-Meier and L. Van Gool, "A Color-Based Particle Filter", 1st International Workshop on Generative Model-Based Vision GMBV'02, pp. 53-60, 2002.
[17] J. Vermaak, M. Gangnet, A. Blake and P. Perez, "Sequential Monte Carlo Fusion of Sound and Vision for Speaker Tracking", Proceedings of the Int. Conf. on Computer Vision, volume I, pp. 741-746, July 2001.
[18] Michael Isard and Andrew Blake, "Condensation - conditional density propagation for visual tracking", International Journal of Computer Vision, 29(1), pp. 5-28, 1998.

Figure 9 Error (pixels) of the right eye against frame number for Image sequence 6
Figure 10 Error (pixels) of the left eye against frame number for Image sequence 6