Article

Multi-User Identification-Based Eye-Tracking Algorithm Using Position Estimation

Suk-Ju Kang

Department of Electronic Engineering, Sogang University, Seoul 04107, Korea; [email protected]; Tel.: +82-2-705-8466

Academic Editor: Joonki Paik
Received: 13 December 2016; Accepted: 23 December 2016; Published: 27 December 2016
Abstract: This paper proposes a new multi-user eye-tracking algorithm using position estimation. Conventional eye-tracking algorithms are typically suitable only for a single user, and thereby cannot be used for a multi-user system. Even though they can be used to track the eyes of multiple users, their detection accuracy is low and they cannot identify multiple users individually. The proposed algorithm solves these problems and enhances the detection accuracy. Specifically, the proposed algorithm adopts a classifier to detect faces for the red, green, and blue (RGB) and depth images. Then, it calculates features based on the histogram of the oriented gradient for the detected facial region to identify multiple users, and selects the template that best matches the users from a pre-determined face database. Finally, the proposed algorithm extracts the final eye positions based on anatomical proportions. Simulation results show that the proposed algorithm improved the average F1 score by up to 0.490, compared with benchmark algorithms.

Keywords: eye tracking; face detection; multi-user identification
1. Introduction

Currently, various fields require information about human eye recognition. In particular, eye recognition is one of the most important features in applications in vehicles because it can estimate the human fatigue state, which has a direct impact on the safety of the driver and the passenger. For example, Figure 1a shows a system that checks drowsiness by analyzing the driver's eyes. In addition, human eyes can be used as an interface to control the operation of the display in the vehicle. Figure 1b shows that the eyes of multiple users control the display of the center console. In these cases, the precise eye positions for multiple users are required. To do so, the eye-tracking algorithm should calculate accurate positional information in the horizontal direction (x), vertical direction (y), and depth direction (z), on the basis of the camera device [1,2].
Figure 1. Various examples in a vehicle application: (a) a drowsiness warning system; and (b) an interface control system using multi-user eye tracking.

Sensors 2017, 17, 41; doi:10.3390/s17010041
Various eye-tracking algorithms have been proposed. A video-based eye-tracking algorithm has been proposed [3] to track the eye positions in input frames. This algorithm detects the user's face using eigenspaces, and estimates motion based on a block-matching algorithm to track the user's face. However, this algorithm is only suitable for a single user. Another algorithm uses depth and color image sequences for depth-camera–based multi-user eye tracking [4]. This algorithm uses an object-tracking algorithm and eye localization. However, it requires considerable computation time to track multiple users, and it cannot distinguish between them—i.e., it does not associate any particular facial region with a single discrete user.

Generally, eye-tracking algorithms require an accurate face-detection algorithm for high performance. There are two representative face-detection algorithms. A local binary pattern–based algorithm [5,6] uses local image textures in an input image. Hence, it is robust to gray-scale variations, and it is efficient insofar as it uses simple binary patterns. Another approach is a robust real-time face-detection algorithm [7,8]. It uses an integral imaging technique for fast computation. In addition, it uses cascade classifiers based on an adaptive boost-learning algorithm (AdaBoost) to improve the detection accuracy. Eye-tracking algorithms can adopt either of these face-detection algorithms.

In this paper, a new multi-user eye-tracking algorithm is proposed. It is based on a previous study [9], but the overall operation blocks are totally changed to enhance performance. The proposed algorithm performs the calibration of red, green, and blue (RGB) and depth images to prevent distortion, and uses the user classification module and several features to enhance the performance.
Specifically, it selects the candidate regions (in which faces exist) from an input image. Then, it adopts an AdaBoost-based face-detection algorithm based on [7], and extracts features from the histogram of oriented gradients (HOG) in a facial region. Then, it searches for a template that best matches the input face from a pre-calculated face database. Finally, it estimates and extracts user eye positions based on anatomical proportions.

This paper is organized as follows. Section 2 describes the proposed multi-user eye-tracking algorithm. Section 3 presents performance evaluations comparing the proposal with benchmark algorithms. Section 4 concludes the paper.

2. Proposed Algorithm

Figure 2 shows a conceptual block diagram for the proposed algorithm. First, in the pre-processing module, the proposed algorithm calibrates the RGB and depth images, which are captured by RGB and depth cameras. Second, the face-detection module performs face extraction from the input images. Third, the user-classification module identifies multiple users. Finally, the 3D eye positions are extracted. Figure 3 shows a detailed block diagram for the proposed algorithm. The specific operations are described in the following sub-sections.
Figure 2. Overall concept for the proposed multi-user eye tracking algorithm.
Figure 3. Overall block diagram for the proposed algorithm.
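To make the data flow of Figures 2 and 3 concrete, the sketch below chains the four modules in Python. It is only an illustrative outline of the structure described in this section, not the author's implementation; the five callables it receives are hypothetical placeholders, and minimal sketches of each one are given in the corresponding sub-sections.

```python
def track_eyes_multi_user(rgb_frame, depth_frame,
                          upscale_depth, extract_candidate_region,
                          detect_faces, classify_user, extract_eye_positions):
    """Chain the four modules of Figure 3. The five callables are supplied by
    the caller; minimal sketches of each are given in the sub-sections below."""
    # 2.1 Pre-processing: calibrate/upscale the depth map, pick the search area.
    depth_up = upscale_depth(depth_frame)
    candidate_rect = extract_candidate_region(rgb_frame, depth_up)

    # 2.2 Face detection inside the candidate region only (cascade classifier).
    face_boxes = detect_faces(rgb_frame, candidate_rect)

    results = []
    for box in face_boxes:
        # 2.3 User classification: HOG features + SVM against the face database.
        user_id = classify_user(rgb_frame, box)
        # 2.4 3D eye positions from anatomical proportions and the depth map.
        eyes = extract_eye_positions(box, depth_up)
        results.append((user_id, eyes))
    return results
```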
2.1. Pre-Processing Module

The proposed algorithm uses RGB and depth cameras. In some cases, the pixel resolutions of the RGB and depth images can differ. Hence, the resolutions must be calibrated, and the proposed algorithm increases a low-resolution depth image such that its resolution matches the RGB image. The resolution of depth images is generally lower than that of RGB images. To match the resolution, the proposed algorithm uses a bilinear interpolation algorithm [10], as shown in Figure 4. For example, if the resolution is doubled, it is defined as follows:

I_{x+1/2,y} = \lambda_1 \times (I_{x,y} + I_{x+1,y}),
I_{x,y+1/2} = \lambda_2 \times (I_{x,y} + I_{x,y+1}),    (1)
I_{x+1/2,y+1/2} = \lambda_3 \times (I_{x,y} + I_{x+1,y} + I_{x,y+1} + I_{x+1,y+1}),

where \lambda_1, \lambda_2, and \lambda_3 denote the horizontal, vertical, and diagonal weights, respectively (which are 0.5, 0.5, and 0.25, respectively), and I_{x+1/2,y}, I_{x,y+1/2}, and I_{x+1/2,y+1/2} denote the horizontal, vertical, and diagonal interpolated pixels, respectively.
Figure 4. Pixel arrangement in the bilinear interpolation algorithm when an input image resolution is doubled.
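The doubled-resolution case of Equation (1) can be written compactly with NumPy. The sketch below is a minimal illustration assuming the depth map is a 2D array; the wrap-around at the image border and the library alternative noted in the final comment are simplifications of mine, not part of the paper.

```python
import numpy as np

def upscale_depth_2x(depth):
    """Double a depth map's resolution with the weights of Equation (1):
    lambda1 = lambda2 = 0.5 (horizontal/vertical) and lambda3 = 0.25 (diagonal).
    Border pixels wrap around; that shortcut is acceptable for a sketch."""
    d = depth.astype(np.float32)
    h, w = d.shape
    out = np.zeros((2 * h, 2 * w), dtype=np.float32)

    right = np.roll(d, -1, axis=1)      # I_{x+1, y}
    down = np.roll(d, -1, axis=0)       # I_{x, y+1}
    diag = np.roll(right, -1, axis=0)   # I_{x+1, y+1}

    out[0::2, 0::2] = d                                  # original samples
    out[0::2, 1::2] = 0.5 * (d + right)                  # I_{x+1/2, y}
    out[1::2, 0::2] = 0.5 * (d + down)                   # I_{x, y+1/2}
    out[1::2, 1::2] = 0.25 * (d + right + down + diag)   # I_{x+1/2, y+1/2}
    return out

# In practice a library call such as
# cv2.resize(depth, None, fx=2, fy=2, interpolation=cv2.INTER_LINEAR)
# performs the same kind of bilinear upsampling.
```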
Then, the proposed algorithm extracts the candidate search region. In the input image captured by the cameras, the region where users are likely to be when watching a large-sized display such as a television is restricted to a certain area. Therefore, the proposed algorithm uses this region to search for users' faces, thereby reducing the computation time. The detailed operation for detecting faces is described in the following sub-section.

2.2. Face-Detection Module

The proposed algorithm uses the classifier-based face-detection algorithm proposed in [7]. This algorithm offers a high detection rate and it can be operated in real time. In addition, the proposed algorithm analyzes the facial candidate regions selected during pre-processing, thereby enhancing the detection accuracy while reducing the search region. Specifically, the face-detection algorithm uses several rectangular features, and calculates these features based on an integral image [7,11]. This integral image technique generates a summed area table to generate the sum of the pixel values in a rectangular window to enhance the computational efficiency. In addition, it uses simple classifiers generated by the AdaBoost algorithm [7] to select features from the detected face. Finally, the face-detection algorithm uses a cascading structure to generate classifiers which can more accurately detect faces while reducing the operation time. Figure 5 shows the concept for the cascading structure of the face-detection module in the proposed algorithm. The first classifier rejects negative inputs using a few operations. The operations at further stages of the cascade also reject negative inputs, and gradually enhance the accuracy of the detection after multiple stages. Therefore, the proposed algorithm can detect the facial region exactly.
Figure 5. Concept for the cascading structure of the face-detection module in the proposed algorithm.
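The cascade detector of [7] has a widely used implementation in OpenCV. The sketch below is a minimal illustration of running it only inside the candidate search region from Section 2.1; the cascade file name, scale factor, and minimum window size are illustrative assumptions, not the paper's settings.

```python
import cv2

# The cascade file path is an assumption; OpenCV ships pre-trained frontal-face
# cascades such as haarcascade_frontalface_default.xml.
_cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

def detect_faces(frame_bgr, candidate_rect):
    """Run the cascade detector of [7] only inside the candidate search region
    selected during pre-processing (Section 2.1)."""
    x, y, w, h = candidate_rect
    roi = cv2.cvtColor(frame_bgr[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    faces = _cascade.detectMultiScale(roi, scaleFactor=1.1, minNeighbors=5,
                                      minSize=(40, 40))
    # Map detections from region coordinates back to full-frame coordinates.
    return [(x + fx, y + fy, fw, fh) for (fx, fy, fw, fh) in faces]
```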
2.3. User-Classification Module

After the faces are detected, they are classified individually based on a pre-calculated database. Figure 6 provides an overall block diagram for this process. The histogram of oriented gradients (HOG) is used as a classification feature because of its robustness in classifying faces [12]. Specifically, the horizontal and vertical gradients for the facial region are calculated as follows:
HG = [-1 \quad 0 \quad 1] \ast BF,
VG = [-1 \quad 0 \quad 1]^{T} \ast BF,    (2)
where HG and VG respectively denote the horizontal and vertical gradients filtered with a 1D-centered discrete derivative mask, and BF denotes a detected face block. Using the gradients, the HOGs of magnitude and orientation for each pixel are generated as follows:

M_{x,y} = (HG_{x,y}^{2} + VG_{x,y}^{2})^{1/2},
\theta_{x,y} = \tan^{-1}(VG_{x,y} / HG_{x,y}) + \pi/2,    (3)

where M_{x,y} and \theta_{x,y} denote the magnitude and orientation of the pixel, respectively. Histograms for the two properties are generated, and histograms for several blocks are combined into one feature
vector. Then, the feature vector is classified using a support vector machine (SVM) [13] to partition the classes maximally, thereby generating the exact class for the input face.

[Figure 6 pipeline: detected face region → gradient calculation → HOG generation → SVM-based classification → class]
Figure 6. Overall block diagram for the multi-user classification module.
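A minimal sketch of this HOG-plus-SVM classification step follows, assuming scikit-image for the HOG descriptor and scikit-learn for the SVM. The 64 × 64 face size, the 8 × 8 cells, and the linear kernel are illustrative choices of mine, not the paper's settings, and the face database is assumed to be available as a list of labeled face images.

```python
import cv2
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def hog_feature(face_bgr, size=(64, 64)):
    """HOG descriptor of one detected face block (Equations (2) and (3))."""
    gray = cv2.cvtColor(cv2.resize(face_bgr, size), cv2.COLOR_BGR2GRAY)
    return hog(gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

def train_user_classifier(db_faces, db_user_ids):
    """Fit a linear SVM on the pre-calculated face database
    (db_faces: list of face images, db_user_ids: matching user labels)."""
    features = np.array([hog_feature(f) for f in db_faces])
    classifier = SVC(kernel="linear")
    classifier.fit(features, db_user_ids)
    return classifier

def classify_user(classifier, face_bgr):
    """Return the identity (class) of a newly detected face."""
    return classifier.predict([hog_feature(face_bgr)])[0]
```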
2.4. Three-Dimensional Eye-Position Extraction Module

In this module, the proposed algorithm calculates the left and right eye positions. Specifically, it uses the anatomical proportions for the eye position in a human face. Figure 7 shows a conceptual image of this module. First, it computes the horizontal and vertical positions (x and y axes), and then it calculates the depth position (z axis). The image on the left in Figure 7 includes several parameters for calculating the 3D eye position, and these are derived as follows:

p_{x1} = x_i + \alpha,
p_{x2} = x_{i+1} - \alpha,
p_{y} = y_i + \beta,    (4)
p_{z} = d_{max} \times \frac{I_{depth}}{I_{max}},

where x_i and y_i denote an initial pixel point in the detected facial region, \alpha and \beta denote the horizontal and vertical offsets, respectively, I_{max} and I_{depth} denote the maximum intensity level and the intensity level of the detected face, and d_{max} denotes the real maximum distance. Using these parameters, the final left and right eye positions are as follows:

p_{eyeL} = (p_{x1}, p_{y}, p_{z}),
p_{eyeR} = (p_{x2}, p_{y}, p_{z}).    (5)
Using this module, the proposed algorithm can extract the final 3D eye positions.
Figure 7. Concept for extracting the 3D eye position from the RGB and depth images.
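A minimal sketch of Equations (4) and (5) is shown below. It assumes the detected face is given as a bounding box, interprets x_{i+1} as the right edge of that box, and leaves the anatomical offsets \alpha and \beta to the caller; sampling the depth intensity at the midpoint between the two eye columns is also an assumption of this sketch rather than something stated in the paper.

```python
def extract_eye_positions(face_box, depth_img, alpha, beta,
                          d_max=3.5, i_max=255.0):
    """Eye positions from Equations (4) and (5).
    face_box = (x_i, y_i, width, height) of the detected facial region;
    alpha/beta are the horizontal/vertical offsets, d_max the real maximum
    distance, i_max the maximum depth-image intensity level."""
    x_i, y_i, w, h = face_box

    p_x1 = x_i + alpha        # left eye column:  p_x1 = x_i + alpha
    p_x2 = x_i + w - alpha    # right eye column: p_x2 = x_(i+1) - alpha
    p_y = y_i + beta          # eye row:          p_y  = y_i + beta

    # Depth from the intensity of the (upsampled) depth image; the sampling
    # point between the eyes is an assumption of this sketch.
    i_depth = float(depth_img[int(p_y), int((p_x1 + p_x2) / 2)])
    p_z = d_max * i_depth / i_max   # p_z = d_max * I_depth / I_max

    p_eye_left = (p_x1, p_y, p_z)
    p_eye_right = (p_x2, p_y, p_z)
    return p_eye_left, p_eye_right
```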
3. Simulation Results

The detection accuracy of the proposed algorithm was evaluated by comparing it with benchmark algorithms. In addition, the identification ratio with multiple users was calculated for the proposed algorithm. The RGB camera had a resolution of 1280 × 960 pixels and the depth camera's resolution was 640 × 480 pixels. The dataset consisted of image sequences captured directly with an RGB camera and a depth camera while the distance between the cameras and the users was varied. Three benchmark algorithms were used: the classifier-based detection algorithm (Algorithm 1) [7], the improved Haar feature–based detection algorithm (Algorithm 2) [8,9], and the local binary pattern (LBP)-based detection algorithm (Algorithm 3) [6]. For an objective evaluation, the algorithms were assessed using precision, recall, and F1 scores [14,15], which are derived as follows:

Precision = \frac{TP}{TP + FP},
Recall = \frac{TP}{TP + FN},    (6)
F_1\ Score = 2 \times \frac{Precision \times Recall}{Precision + Recall},
where TP, FP, and FN denote the number of true positives, false positives, and false negatives that were detected, respectively. Using these values, the F1 score was calculated, for which a value of one indicates perfect accuracy. For the test sequences, we used several sequences at different distances (ranging from 1 m to 3.5 m) between the camera and multiple users.

First, the accuracy of detection using the proposed and benchmark algorithms was compared. Table 1 shows the average precision and recall values for the proposed and benchmark algorithms at different distances. Table 2 shows the average F1 score, combining precision and recall at different distances. In terms of precision, the total averages of the benchmark Algorithms 1, 2, and 3 were 0.669, 0.849, and 0.726, respectively. In contrast, the proposed algorithm resulted in a perfect score of 1.000. In terms of recall, the total averages of the benchmark Algorithms 1, 2, and 3 were 0.988, 0.993, and 0.738, whereas the proposed algorithm resulted in 0.988. Therefore, the average F1 score for the proposed algorithm was up to 0.294, 0.151, and 0.490 higher than those of Algorithms 1, 2, and 3, respectively. This means that the detection accuracy of the proposed algorithm was higher than that of the benchmark algorithms. Figure 8 also shows the same result: the precision and recall values of the proposed algorithm were higher than those of the benchmark algorithms. This was because the proposed algorithm accurately classified foreground and background images by using several cascade classifiers after calibrating the RGB and depth images.

Figures 9 and 10 show the resulting RGB and depth images from the proposed and benchmark algorithms at different distances (2.5 m and 3.5 m). The benchmark algorithms detected false regions as faces, and some faces remained undetected. In addition, these algorithms could not associate any particular facial region with a single discrete user. On the other hand, the proposed algorithm accurately detected the faces of multiple users and classified each of them by assigning each face a different number, as shown in Figures 9d and 10d (here, 1, 2, and 3 are the identification numbers for the users).

The identification accuracy of the proposed algorithm for each face from multiple users was also evaluated. Table 3 shows the identification number and ratio for multiple users with the proposed algorithm. The maximum number of users was three. The identification ratios for Faces 1, 2, and 3 were 0.987, 0.985, and 0.997, respectively. In total, the ratio was 0.990 on average, which is highly accurate. This was because the proposed algorithm used a pre-training process for the required users, and hence it achieved higher performance than the conventional algorithms.
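For reference, the metrics of Equation (6) reduce to a few lines of code; the counts used in the example call are illustrative and are not taken from the tables.

```python
def detection_scores(tp, fp, fn):
    """Precision, recall, and F1 score of Equation (6) from detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts: 69 correct detections, no false alarms, one missed face.
print(detection_scores(tp=69, fp=0, fn=1))   # -> (1.0, 0.9857..., 0.9928...)
```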
Table 1. Average precision and recall values for the proposed and benchmark algorithms at different distances.

Distance (m)   Algorithm 1           Algorithm 2           Algorithm 3           Proposed Algorithm
               Precision  Recall     Precision  Recall     Precision  Recall     Precision  Recall
1.000          0.741      0.981      0.877      0.991      0.730      0.619      1.000      0.981
1.500          0.573      0.985      0.732      0.991      0.493      0.514      1.000      0.985
2.000          0.637      0.975      0.833      0.981      0.825      0.789      1.000      0.975
2.500          0.664      1.000      0.853      1.000      0.713      0.938      1.000      1.000
3.000          0.717      0.991      0.886      1.000      0.824      0.828      1.000      0.991
3.500          0.806      0.991      0.972      0.995      0.708      0.800      1.000      0.991
Random         0.544      0.994      0.792      0.994      0.792      0.677      1.000      0.994

Table 2. F1 score values for the proposed and benchmark algorithms at different distances.

Distance (m)   Algorithm 1            Algorithm 2            Algorithm 3            Proposed Algorithm
               F1 Score   Difference  F1 Score   Difference  F1 Score   Difference  F1 Score
1.000          0.844      −0.147      0.931      −0.060      0.674      −0.320      0.991
1.500          0.725      −0.268      0.842      −0.151      0.503      −0.490      0.993
2.000          0.771      −0.216      0.901      −0.086      0.807      −0.180      0.987
2.500          0.798      −0.202      0.921      −0.079      0.811      −0.189      1.000
3.000          0.832      −0.164      0.939      −0.057      0.826      −0.170      0.996
3.500          0.889      −0.107      0.983      −0.013      0.751      −0.245      0.996
Random         0.703      −0.294      0.882      −0.115      0.731      −0.266      0.997
Table 3. Identification number and ratio for multiple users with the proposed algorithm.

Distance (m)   Face 1                                Face 2                                Face 3
               Detection Number   Detection Ratio    Detection Number   Detection Ratio    Detection Number   Detection Ratio
1.000          70/70              1.000              68/70              0.970              70/70              1.000
1.500          70/70              1.000              69/70              0.980              69/70              0.980
2.000          64/68              0.940              67/68              0.980              68/68              1.000
2.500          70/70              1.000              70/70              1.000              70/70              1.000
3.000          70/70              1.000              70/70              1.000              70/70              1.000
3.500          69/70              0.980              70/70              1.000              70/70              1.000
Random         89/90              0.990              87/90              0.970              90/90              1.000
Figure 8. The data distribution of the precision-recall graph for the proposed and benchmark algorithms.
Figure 9. Comparing the detection accuracy of the proposed and benchmark algorithms at a distance of 2.5 m from the RGB and depth cameras (top: RGB image; bottom: depth image): (a) Algorithm 1; (b) Algorithm 2; (c) Algorithm 3; and (d) proposed algorithm.
Figure 10. Comparing the detection accuracy of the proposed and benchmark algorithms at a distance of 3.5 m from the RGB and depth cameras (top: RGB image; bottom: depth image): (a) Algorithm 1; (b) Algorithm 2; (c) Algorithm 3; and (d) proposed algorithm.
4. Conclusions

This paper presented a robust multi-user eye-tracking algorithm using position estimation. It determines the candidate eye-position regions from input RGB and depth images. Using this region, the proposed algorithm adopts a classifier-based face-detection algorithm, and computes features based on the histogram of oriented gradients for the detected facial region. Then, it selects the template that best matches the input face from a pre-determined database, and extracts the final eye positions based on anatomical proportions. The results of a simulation demonstrated that the proposed algorithm is highly accurate, with an average F1 score that was up to 0.490 higher than that of the benchmark algorithms.

Acknowledgments: This research was supported by a grant (16CTAP-C114672-01) from the Infrastructure and Transportation Technology Promotion Research Program funded by the Ministry of Land, Infrastructure and Transport of the Korean government.

Author Contributions: The author would like to thank Yong-Woo Jeong of Hanyang University for providing a set of image data.

Conflicts of Interest: The author declares no conflict of interest.
References

1. Lopez-Basterretxea, A.; Mendez-Zorrilla, A.; Garcia-Zapirain, B. Eye/Head Tracking Technology to Improve HCI with iPad Applications. Sensors 2015, 15, 2244–2264. [CrossRef] [PubMed]
2. Lee, J.W.; Heo, H.; Park, K.R. A Novel Gaze Tracking Method Based on the Generation of Virtual Calibration Points. Sensors 2013, 13, 10802–10822. [CrossRef] [PubMed]
3. Chen, Y.-S.; Su, C.-H.; Chen, J.-H.; Chen, C.-S.; Hung, Y.-P.; Fuh, C.-S. Video-based eye tracking for autostereoscopic displays. Opt. Eng. 2001, 40, 2726–2734.
4. Li, L.; Xu, Y.; Konig, A. Robust depth camera based multi-user eye tracking for autostereoscopic displays. In Proceedings of the 9th International Multi-Conference on Systems, Signals & Devices, Chemnitz, Germany, 20–23 March 2012.
5. Ojala, T.; Pietikainen, M.; Maenpaa, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987. [CrossRef]
6. Bilaniuk, O.; Fazl-Ersi, E.; Laganiere, R.; Xu, C.; Laroche, D.; Moulder, C. Fast LBP face detection on low-power SIMD architectures. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA, 23–28 June 2014.
7. Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai, HI, USA, 8–14 December 2001.
8. Jain, A.; Bharti, J.; Gupta, M.K. Improvements in OpenCV's Viola-Jones algorithm in face detection - tilted face detection. Int. J. Signal Image Process. 2014, 5, 21–28.
9. Kang, S.-J.; Jeong, Y.-W.; Yun, J.-J.; Bae, S. Real-time eye tracking technique for multiview 3D systems. In Proceedings of the 2016 IEEE International Conference on Consumer Electronics, Las Vegas, NV, USA, 8–11 January 2016.
10. Lehmann, T.M.; Gonner, C.; Spitzer, K. Survey: Interpolation methods in medical image processing. IEEE Trans. Med. Imaging 1999, 18, 1049–1075. [CrossRef] [PubMed]
11. Crow, F. Summed-area tables for texture mapping. ACM SIGGRAPH Comput. Graph. 1984, 18, 207–212. [CrossRef]
12. Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–26 June 2005.
13. Lowe, D.G. Distinctive image features from scale-invariant key points. Int. J. Comput. Vis. 2004, 60, 91–110. [CrossRef]
14. Kang, S.-J.; Cho, S.I.; Yoo, S.; Kim, Y.H. Scene change detection using multiple histograms for motion-compensated frame rate up-conversion. J. Disp. Technol. 2012, 8, 121–126. [CrossRef]
15. Yang, Y.; Liu, X. A re-examination of text categorization methods. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, CA, USA, 15–19 August 1999; pp. 42–49.

© 2016 by the author; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).