IEICE TRANS. INF. & SYST., VOL.E86–D, NO.5 MAY 2003
PAPER
Visual Direction Estimation from a Monocular Image

Haiyuan WU†, Member, Qian CHEN†, Nonmember, and Toshikazu WADA†, Member

† Faculty of Systems Engineering, Wakayama University
SUMMARY  This paper describes a method to estimate the visual direction using iris contours. The method requires only one monocular image taken by a camera with unknown focal length. In order to estimate the visual direction, we assume that the visual directions of both eyes are parallel and that the iris boundaries are circles in 3D space. In this case, the two planes on which the iris boundaries reside are also parallel. We estimate the common normal vector of the two planes from the iris contours extracted from an input image, using an extended "two-circle" algorithm. Unlike most existing gaze estimation algorithms, which require information about the eye corners and heuristic knowledge about the 3D structure of the eye in addition to the iris contours, our method uses the two iris contours only. Another contribution of our method is the ability to estimate the focal length of the camera. This allows one to take images with a zoom lens whose focal length may be adjusted at any time. Extensive experiments on simulated and real images demonstrate the robustness and the effectiveness of our method.
key words: Visual Direction Estimation, Conic-Based, Monocular Image, Unknown Focal Length
1. Introduction

Visual direction is the direction in which a person is looking, and it provides important information about the person's interest. Estimating the visual direction is therefore important for eye contact, virtual reality and human-computer interaction. In anatomy, the visual direction is defined by the line connecting the object of fixation, the nodal point and the fovea; this line also passes through the eyeball center and the pupil center. People change their visual direction by rotating their head and/or by rotating their eyeballs. In both cases, the appearance of the eyes viewed by a camera, including their shape and position, changes.

Most existing gaze research focuses on eye detection or eye tracking, which estimate the eye positions in input images [6]-[12]. However, since the eye positions in the images are also affected by other factors, including the head pose, the 3D head position and the focal length of the camera, the eye positions alone are not enough to determine the visual direction. Because the head pose and head position are difficult for machine vision to estimate accurately, many existing visual direction estimation methods assume a fixed head position relative to the camera (e.g. using a head-mounted camera). In many applications, however, it is not practical to use a head-mounted camera or to forbid users to move their heads.
Another class of methods estimates the visual direction directly from eye features such as the irises, eye corners and eyelids. If the positions of any two of the nodal point, the fovea, the eyeball center and the pupil center can be estimated, the visual direction is determined. However, these points are hidden inside the eyeball and cannot be observed without special equipment. One approach to this problem is to estimate them indirectly from other viewable facial features.

Matsumoto et al. [5] determined the visual direction by estimating the positions of the pupil center and the eyeball center from visible facial features. They assumed that the distances from the eyeball center to the two eye corners are known (manually adjusted), and they assumed a known head pose. The eye corners were located with a binocular stereo system and then used to determine the eyeball center. They assumed that the iris boundaries appear as circles in the image and detected them with the Hough transform; the center of the circular iris boundary was used as the pupil center. Some drawbacks of this method are: 1) the relation between the eye corners and the eyeball center has to be adjusted manually for each person because it is not constant; 2) the method only works when the point of fixation is not far from the camera, since otherwise the iris boundaries appear in the input image not as circles but as ellipses.

Wang et al. [4] presented a "one-circle" algorithm that estimates the visual direction from the iris contour and one eye corner detected in an input image taken by a camera with known focal length. They detected the elliptical iris contour and used it to calculate the normal vector of the plane on which the circular iris boundary resides. To resolve the multiple-solution problem, they calculated a rough estimate of the visual direction using the iris center and the eyeball center, where the eyeball center was computed from the two eye corners under the assumption that the 3D distances from the two eye corners to the eyeball center are equal. Among the multiple solutions, the one closest to the rough estimate was selected as the true answer. However, the 3D positions of the eye corners and the iris center are necessary for this approach to work, and how to obtain them was not mentioned in the paper. The head pose is also necessary to estimate the eyeball center, but neither its necessity nor a method to obtain it was described. The drawback of this method is that it requires the 3D positions of the eye corners, the iris center and the head pose, which in the general case cannot be estimated from a monocular image.
We consider that estimating the visual direction from invisible eye features is an ill-posed problem, because:

1. The estimation of invisible eye features such as the eyeball center depends on several visible facial features. For example, determining the eyeball center requires the 3D positions of the eye corners and the head pose. Small estimation errors in each facial feature can therefore accumulate into a large error in the estimated visual direction.

2. The estimation of the invisible eye features requires knowledge about the geometrical arrangement of the internal eye structure and the facial features, which in general differs from person to person.

In this paper, we describe a novel visual direction estimation method that uses iris contours only. We assume: 1) the visual directions of the two eyes are parallel, which is reasonable because when people look at a distant point their visual directions are approximately parallel; 2) the input image is taken by a camera whose intrinsic parameters, except the focal length, have been calibrated. We detect the iris contours of the two eyes from a monocular image and fit them with ellipses [16][17]. Two elliptical cones are then formed using the optical center as the apex and the fitted ellipses on the image plane as the bases. Since the focal length is unknown, the two elliptical cones are deformable cones with one degree of freedom. Under the assumption of parallel visual directions, the normal vectors of the two planes on which the iris contours reside, which indicate the visual direction, must also be parallel. Using this constraint, both normal vectors and the focal length are estimated with our extended "two-circle" algorithm, from which the visual direction is calculated.

The correctness of the estimated visual direction relies on the accuracy of the ellipses fitted to the iris contours, which largely depends on the size of the iris contours in the image. To obtain satisfactory results, the iris contours in the image should be big enough. We consider this an acceptable requirement because:

1. High-resolution cameras are becoming popular and less expensive.

2. We are developing a high-performance active camera system for observing moving objects such as human faces and eyes, which uses a camera mounted on a pan-tilt unit [3] (see Fig. 1). The camera parameters, including the focal length and the viewing direction, are controlled by a computer. Together with a real-time target detection/tracking method, images containing the target of interest (e.g. a face) in good condition can be obtained continuously.

Fig. 1  Our active camera system and the pan-tilt camera.

Our method has three advantages:

1. It does not require the whole iris to be viewable.
2. It does not use any additional facial features such as eye corners, nor the head pose.
3. It can make use of images taken by a camera with a zoom lens or a lens of unknown focal length.

Extensive experiments on simulated and real images show the robustness and the effectiveness of our method.

2. Extended "Two-circle" Algorithm

We previously presented an algorithm called the "two-circle" algorithm [1]. It estimates the normal vector of a plane on which two coplanar circles reside, from an image taken by a camera with unknown focal length. In this paper, we extend this algorithm so that it can estimate the plane orientation from two circles that lie on different but parallel planes. We can therefore use the extended "two-circle" algorithm to estimate the visual direction from the two iris contours detected in a single input image.

2.1 Assumption and Definition

Assumption 1: Parallel visual directions. When people look at a distant place, the visual directions of the two eyes are approximately parallel. We assume that the visual directions of both eyes are parallel.

Assumption 2: Circular iris boundary. We assume that the iris boundaries are circles in 3D space.

Assumption 3: Elliptical iris contour. We assume that the iris contours of both eyes have been detected in the input image and fitted with ellipses.

Assumption 4: Pinhole camera with unknown focal length. We assume that all intrinsic parameters of the camera, except the focal length, have been calibrated.

Definition 1: Supporting plane. The plane containing a planar pattern (e.g. a circle, an ellipse, a rectangle) is called the supporting plane hereafter.

Definition 2: Base plane. The plane containing the base of a cone is called the base plane hereafter.

2.2 Elliptical Cone and Circular Cross Section
Fig. 2  (a) A circle and its projection. (b) The oblique elliptical cone. The oblique elliptical cone can be determined when the focal length is known.

Fig. 3  The oblique circular cone.
Here, we discuss the problem of estimating the direction of the supporting plane of a circle from one image taken by a camera with known focal length. M. Dhome [13] addressed this problem in research on the pose estimation of an object of revolution.

When a circle is projected onto an image plane by perspective projection, it generally appears as an ellipse (Fig. 2(a)). The ellipse can be expressed by

$$Ax^2 + 2Bxy + Cy^2 + 2Dx + 2Ey + F = 0. \tag{1}$$

Eq. (1) can be rewritten in quadratic form as

$$\mathbf{X}^T Q \mathbf{X} = 0, \tag{2}$$

where

$$Q = \begin{bmatrix} A & B & D \\ B & C & E \\ D & E & F \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}. \tag{3}$$

Since the image plane in the camera coordinate system is given by $z = -f$, a point of the ellipse on the image plane can be expressed by $\mathbf{X}_e = [x\ \ y\ \ {-f}]^T$. Then the following equation holds:

$$\mathbf{X} = K\mathbf{X}_e, \tag{4}$$

where $K = \mathrm{diag}(1, 1, -1/f)$. By substituting Eq. (4) for $\mathbf{X}$ in Eq. (2), we obtain

$$(K\mathbf{X}_e)^T Q (K\mathbf{X}_e) = \mathbf{X}_e^T Q_e \mathbf{X}_e = 0, \tag{5}$$

where

$$Q_e = KQK = \begin{bmatrix} A & B & -\frac{D}{f} \\ B & C & -\frac{E}{f} \\ -\frac{D}{f} & -\frac{E}{f} & \frac{F}{f^2} \end{bmatrix}. \tag{6}$$

That is, $Q_e$ characterizes the 3D ellipse on the $z = -f$ plane. Because all rays projected onto the image plane pass through the projection center (i.e. the origin of the camera coordinate system), the rays passing through the ellipse on the image, which also pass through the circle in 3D space, form a cone surface (Fig. 2(b)). Let $\mathbf{P}$ be a point on that cone; then $\mathbf{P}$ can be expressed by

$$\mathbf{P} = h\mathbf{X}_e. \tag{7}$$

By substituting $\mathbf{P}$ for $\mathbf{X}_e$ in Eq. (5), we obtain the following equation:

$$\mathbf{P}^T Q_e \mathbf{P} = h^2 (\mathbf{X}_e^T Q_e \mathbf{X}_e) = 0. \tag{8}$$

Eq. (8) shows that the oblique elliptical cone can be described by the quadratic form $Q_e$. $Q_e$ can be expressed by its normalized eigenvectors $(\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3)$ and eigenvalues $(\lambda_1, \lambda_2, \lambda_3)$ as follows:

$$Q_e = V\Lambda V^T, \tag{9}$$

$$\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \lambda_3\}, \quad V = \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \mathbf{v}_3 \end{bmatrix}. \tag{10}$$

Viewing the circle as the base, the cone can be considered an oblique circular cone (Fig. 3). We rotate the camera coordinate system so that the $Z$ axis is perpendicular to the supporting plane of the circle, and call the new coordinate system the supporting plane coordinate system. In this coordinate system, the oblique circular cone is given by

$$\mathbf{P}_c^T Q_c \mathbf{P}_c = 0, \tag{11}$$

where

$$Q_c = \begin{bmatrix} 1 & 0 & -\frac{x_0}{z_0} \\ 0 & 1 & -\frac{y_0}{z_0} \\ -\frac{x_0}{z_0} & -\frac{y_0}{z_0} & \frac{x_0^2 + y_0^2 - r^2}{z_0^2} \end{bmatrix}, \tag{12}$$

and $(x_0, y_0, z_0)$ and $r$ are the center and the radius of the circle, respectively. Since $Q_c$ and $Q_e$ describe the same cone surface, there is a rotation matrix $R_c$ that transforms $\mathbf{P}_c$ to $\mathbf{P}$:

$$\mathbf{P} = R_c \mathbf{P}_c. \tag{13}$$

By substituting Eq. (13) for $\mathbf{P}$ in Eq. (8), the following equation is obtained:

$$\mathbf{P}^T Q_e \mathbf{P} = \mathbf{P}_c^T (R_c^T Q_e R_c) \mathbf{P}_c = 0. \tag{14}$$

Since $kQ_c$ describes the same cone as $Q_c$ for any non-zero $k$, we obtain the following equation from Eq. (14), Eq. (11) and Eq. (9):

$$(V^T R_c)^T \Lambda (V^T R_c) = kQ_c. \tag{15}$$

Because $VV^T = R_c R_c^T = I$, we have

$$(V^T R_c)(V^T R_c)^T = I. \tag{16}$$

Because $Q_e$ describes a real elliptical cone surface, its eigenvalues cannot all have the same sign. Without loss of generality, we assume (or we re-order the eigenvalues and the corresponding eigenvectors so) that

$$\lambda_1 \lambda_2 > 0, \quad \lambda_1 \lambda_3 < 0, \quad |\lambda_1| \ge |\lambda_2|. \tag{17}$$

Solving Eq. (15) and Eq. (16), we obtain

$$V^T R_c = \begin{bmatrix} g\cos\alpha & S_1 g\sin\alpha & S_2 h \\ \sin\alpha & -S_1 \cos\alpha & 0 \\ S_1 S_2 h\cos\alpha & S_2 h\sin\alpha & -S_1 g \end{bmatrix}, \tag{18}$$

where $\alpha$ is a free variable,

$$S_1 = (-1)^i, \quad S_2 = (-1)^j, \tag{19}$$

$i, j$ are arbitrary integers, and

$$g = \sqrt{\frac{\lambda_2 - \lambda_3}{\lambda_1 - \lambda_3}}, \quad h = \sqrt{\frac{\lambda_1 - \lambda_2}{\lambda_1 - \lambda_3}}. \tag{20}$$

Thus $R_c$ can be calculated from Eq. (18) and the orthonormality of $V$ as follows:

$$R_c = V(V^T R_c). \tag{21}$$

Then the normal vector of the supporting plane of the circle can be calculated as

$$\mathbf{N} = R_c \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = V \begin{bmatrix} S_2 h \\ 0 \\ -S_1 g \end{bmatrix}. \tag{22}$$

Since the two signs $S_1$ and $S_2$ remain undetermined, there are four possible answers for $\mathbf{N}$:

$$\mathbf{N}_A = V \begin{bmatrix} h \\ 0 \\ -g \end{bmatrix}, \quad \mathbf{N}_B = V \begin{bmatrix} h \\ 0 \\ g \end{bmatrix}, \quad \mathbf{N}_C = V \begin{bmatrix} -h \\ 0 \\ g \end{bmatrix}, \quad \mathbf{N}_D = V \begin{bmatrix} -h \\ 0 \\ -g \end{bmatrix}. \tag{23}$$

Because $\mathbf{N}_A$ and $\mathbf{N}_D$ ($\mathbf{N}_B$ and $\mathbf{N}_C$) point in opposite directions, each pair describes the direction of the same plane. Thus the number of meaningful answers for the normal vector is two. In this paper, we let $\mathbf{N}_A$ and $\mathbf{N}_B$ be the two possible answers.
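The derivation above maps directly onto a few lines of linear algebra. The following sketch (Python with NumPy; the function name and interface are ours, not from the paper) computes the two candidate normals of Eq. (23) from the conic coefficients of Eq. (1) and a given focal length:

```python
import numpy as np

def candidate_normals(conic, f):
    """Return the two candidate unit normals N_A, N_B of a circle's
    supporting plane, given the coefficients (A, B, C, D, E, F) of its
    image ellipse (Eq. (1)) and a focal length f in pixels."""
    A, B, C, D, E, F = conic
    # Q from Eq. (3); Qe = K Q K from Eq. (6) with K = diag(1, 1, -1/f).
    Q = np.array([[A, B, D],
                  [B, C, E],
                  [D, E, F]])
    K = np.diag([1.0, 1.0, -1.0 / f])
    Qe = K @ Q @ K
    # Eigendecomposition (Eq. (9)); Qe is symmetric, so eigh applies.
    lam, V = np.linalg.eigh(Qe)
    # For a real elliptical cone the eigenvalue signs are (+,+,-) or
    # (-,-,+) up to scale; the odd-sign eigenvalue becomes lambda3,
    # and |lambda1| >= |lambda2| enforces Eq. (17).
    signs = np.sign(lam)
    i3 = int(np.argmin(signs)) if signs.sum() > 0 else int(np.argmax(signs))
    i1, i2 = sorted([i for i in range(3) if i != i3],
                    key=lambda i: -abs(lam[i]))
    lam1, lam2, lam3 = lam[i1], lam[i2], lam[i3]
    V = V[:, [i1, i2, i3]]
    g = np.sqrt((lam2 - lam3) / (lam1 - lam3))   # Eq. (20)
    h = np.sqrt((lam1 - lam2) / (lam1 - lam3))
    nA = V @ np.array([h, 0.0, -g])              # Eq. (23)
    nB = V @ np.array([h, 0.0,  g])
    return nA / np.linalg.norm(nA), nB / np.linalg.norm(nB)
```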
Fig. 4  If the focal length is unknown, the oblique elliptical cone cannot be determined.
2.3 The Extended "Two-circle" Algorithm

As described in Section 2.2, the normal vector of the supporting plane of a circle can be determined from one perspective image when the focal length is known. However, there are two possible answers, so the real one has to be selected using some additional constraint. Moreover, if the focal length is unknown, the oblique elliptical cone becomes a deformable cone with one degree of freedom, and the normal vector cannot be determined (see Fig. 4).

In order to determine the focal length, so that the normal vector of the supporting plane can be estimated, and to obtain a unique answer, we use an image of more than one circle. We assume there are two circles whose supporting planes are parallel. In this case, the two supporting planes share a common normal vector $\mathbf{N}$.

When the contours of these circles have been detected and fitted with ellipses, two oblique elliptical cones can be formed from the ellipses on the image plane for any supposed focal length, as in Section 2.2. In other words, each oblique elliptical cone can be considered a function of the focal length $f$, so the $i$-th oblique elliptical cone can be expressed as $Q_i(f)$. Given $f$, the normal vector of the supporting plane can be calculated from $Q_i(f)$; since there are two possible answers, we express them as $\mathbf{N}_{A,i}(f)$ and $\mathbf{N}_{B,i}(f)$, which are also functions of $f$. The "real" normal vector is one of $\mathbf{N}_{A,i}(f)$ and $\mathbf{N}_{B,i}(f)$. Because all the supporting planes of the circles share a common normal vector, we have the following constraint:

$$F(f) \to \min, \tag{24}$$

where

$$\begin{aligned}
F(f) &= \min\big(F_1(f), F_2(f), F_3(f), F_4(f)\big), \\
F_1(f) &= \|\mathbf{N}_{A,1}(f) \times \mathbf{N}_{A,2}(f)\|^2, \\
F_2(f) &= \|\mathbf{N}_{A,1}(f) \times \mathbf{N}_{B,2}(f)\|^2, \\
F_3(f) &= \|\mathbf{N}_{B,1}(f) \times \mathbf{N}_{A,2}(f)\|^2, \\
F_4(f) &= \|\mathbf{N}_{B,1}(f) \times \mathbf{N}_{B,2}(f)\|^2.
\end{aligned}$$
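The minimization of Eq. (24) is one-dimensional, so even a coarse search works. Below is a minimal sketch; the grid search and the helper name estimate_normal_and_focal are ours (the paper does not prescribe a particular minimizer), and candidate_normals is the sketch from Section 2.2:

```python
import numpy as np
# candidate_normals() is defined in the previous sketch.

def estimate_normal_and_focal(conic1, conic2, f_min=100.0, f_max=5000.0, steps=1000):
    """Search the focal length f minimizing F(f) of Eq. (24); return
    (f, N), the estimated focal length and the common unit normal of
    the two parallel supporting planes."""
    best_F, best_f, best_n = np.inf, None, None
    for f in np.linspace(f_min, f_max, steps):
        n1 = candidate_normals(conic1, f)      # (N_A1, N_B1)
        n2 = candidate_normals(conic2, f)      # (N_A2, N_B2)
        # F(f): minimum over the four pairings of the squared cross product.
        for a in n1:
            for b in n2:
                F = np.linalg.norm(np.cross(a, b)) ** 2
                if F < best_F:
                    # align signs before averaging the nearly parallel pair
                    n = a + np.sign(a @ b) * b
                    best_F, best_f, best_n = F, f, n / np.linalg.norm(n)
    return best_f, best_n
```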
WU et al.: VISUAL DIRECTION ESTIMATION FROM A MONOCULAR IMAGE
5
By using this constraint, both the normal vector of the supporting planes and the focal length can be determined by finding the focal length that minimizes $F(f)$. Meanwhile, the undetermined sign $S_2$ remaining in Eq. (22) is also resolved, by selecting the pair of vectors from $(\mathbf{N}_{A,1}(f), \mathbf{N}_{A,2}(f))$, $(\mathbf{N}_{A,1}(f), \mathbf{N}_{B,2}(f))$, $(\mathbf{N}_{B,1}(f), \mathbf{N}_{A,2}(f))$ and $(\mathbf{N}_{B,1}(f), \mathbf{N}_{B,2}(f))$ that gives the smallest value of $F(f)$. Therefore, by detecting the iris contours and fitting them with ellipses, the common normal vector of the supporting planes of the two circular iris boundaries can be estimated, from which the visual direction can be calculated. No facial features other than the iris contours are used.
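To connect this to the iris measurements, the fitted ellipses (reported throughout this paper as a : b : θ : x0 : y0) must first be converted into the conic coefficients of Eq. (1). A hypothetical end-to-end wrapper might look as follows, under our assumptions that a and b are the semi-major and semi-minor axes in pixels and that θ is given in degrees:

```python
import numpy as np
# estimate_normal_and_focal() is the sketch above.

def conic_from_ellipse(a, b, theta_deg, x0, y0):
    """Convert geometric ellipse parameters (a, b: semi-axes in pixels --
    our assumption; theta: rotation in degrees; (x0, y0): center, with
    the origin at the image center) into the conic coefficients
    (A, B, C, D, E, F) of Eq. (1)."""
    t = np.deg2rad(theta_deg)
    c, s = np.cos(t), np.sin(t)
    A = (c / a) ** 2 + (s / b) ** 2
    C = (s / a) ** 2 + (c / b) ** 2
    B = c * s * (1.0 / a ** 2 - 1.0 / b ** 2)
    D = -(A * x0 + B * y0)               # terms from shifting the center
    E = -(B * x0 + C * y0)
    F = A * x0 ** 2 + 2 * B * x0 * y0 + C * y0 ** 2 - 1.0
    return A, B, C, D, E, F

def visual_direction(left, right):
    """Visual direction from the fitted iris ellipses of the two eyes,
    each given as (a, b, theta_deg, x0, y0). Returns (f, N); the gaze
    direction is N up to sign."""
    return estimate_normal_and_focal(conic_from_ellipse(*left),
                                     conic_from_ellipse(*right))
```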
3. Experimental Results

3.1 Experiment with Simulated Images

We first used simulated images to examine the convergence of the extended "two-circle" algorithm and its sensitivity to pixel quantization error. We used computer graphics to generate images of scenes containing coplanar circles. The image resolution was set to 640 × 480 [pixel]. We first set the focal length to 200 [pixel], the tilt angle θ (the angle between the optical axis and the supporting plane) to 40 [degree], the roll angle β (the rotation angle about the optical axis) to 10 [degree], and the distance between the optical center and the supporting plane to 3.0 [meter]. We call this camera setting "case-1" hereafter, and used it to generate images of circles whose radius is 1.0 [meter]. Figure 5(a) shows an image containing all the circles used in the experiment with the "case-1" camera setting. We used 32 images of two circles randomly selected from the circles shown in Figure 5(a). We also used a different camera setting ("case-2") for a similar experiment. In case-2, we set the focal length to 300 [pixel], θ = 50 [degree] and β = 30 [degree]; all other parameters were the same as in case-1. Figure 5(b) shows an image containing all the circles used in the experiment with the "case-2" camera setting. We used 17 images of two circles randomly selected from the circles shown in Figure 5(b).

From each image, two ellipses were detected and fitted [16][17], and then used to estimate the normal vector of the supporting plane and the focal length with the extended "two-circle" algorithm. Fig. 6 shows an example result indicating the relation between the supposed focal length $f$ and θ. The value of $F(f)$ has been converted to the angle θ between the two estimated normal vectors in order to show the result intuitively, i.e. $\theta = \arcsin(\sqrt{F(f)})$. In this figure, we notice that only the actual focal length (= 200 [pixel]) gives the minimum angle between the two normal vectors.
Fig. 5  Images of coplanar circles synthesized by CG: (a) case-1; (b) case-2.
Fig. 6 The relation between the focal length f and the angle between the two normal vectors.
In all the experiments with simulated images, the graphs of the relation between the supposed focal length $f$ and $\theta = \arcsin(\sqrt{F(f)})$ showed curves similar to Fig. 6. From these results, we confirmed that the extended "two-circle" algorithm always converges. The experimental results are summarized in Table 1, where the suffixes 1 and 2 denote case-1 and case-2. The tilt angle and the roll angle of the camera were calculated from the normal vector of the supporting plane. The estimated focal length, tilt angle and roll angle were compared with the true camera parameters used to generate the CG images. We observed that when the minor axis of a fitted ellipse is longer than 30 pixels, the estimation error becomes small enough.
Table 1  Estimated camera extrinsic parameters using synthesized images

                 RMS error    Standard deviation
f1 (pixel)         5.52             9.21
β1 (degree)        0.36             0.47
θ1 (degree)        0.57             0.97
f2 (pixel)         7.19            11.89
β2 (degree)        0.11             0.15
θ2 (degree)        0.51             0.85
Table 3  Estimated extrinsic camera parameters from the image shown in Figure 7(a)

No.    f (pixel)    θ (degree)    β (degree)    Normal Vector
1-2      1998          33.3          4.9        (0.07 0.83 0.55)
2-3      1893          34.2          6.1        (0.09 0.82 0.56)
3-1      2007          33.7          6.3        (0.09 0.83 0.55)
Fig. 7  Original image and converted images: (a) original image; (b) converted image 1-2; (c) converted image 2-3; (d) converted image 3-1.
Table 2  Estimated parameters of the ellipses

No.    a (pixel)    b (pixel)    θ (degree)    x0 (pixel)    y0 (pixel)
CD1      332           180          13.2           400          -115
CD2      307           127          -1.1          -440           325
CD3      242           162           3.5          -162          -252
3.2 Experiments with Real Images

In order to examine whether the extended "two-circle" algorithm works well for real images, we applied our method to the real image shown in Figure 7(a), in which the detected and fitted ellipses are superimposed. The image contained three circles: CD1 (the big CD on the table), CD2 (the big CD on the book) and CD3 (the small CD on the table). CD1 and CD3 were coplanar, and CD2 was on another plane parallel to the supporting plane of CD1 and CD3. The image resolution was 1600 × 1200 [pixel]. The parameters of the fitted ellipses are shown in Table 2, where a, b, θ and (x0, y0) are the major axis, the minor axis, the angle between the major axis and the X axis, and the center of the ellipse, respectively. The origin of the image coordinate system is the image center, and the Y axis points upward. The normal vector of the supporting plane (the table) and the focal length of the camera were estimated using each pair of the three ellipses (CDs). The results are summarized in Table 3. Because the true values were not available, we used the estimation results to convert the original image to a vertical view of the table, to check whether it resembles the real scene. The three results obtained using the different ellipse pairs are shown in Figure 7(b), (c) and (d). In the converted images, each circular object appears as a circle. This indicates that the extended "two-circle" algorithm can give correct results for real images.
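As a rough sanity check, the Table 2 parameters can be pushed through the sketches above. Under our assumption that a and b in Table 2 are semi-axes in pixels, the recovered focal length and normal should land near the corresponding row of Table 3 (up to the sign of the normal); the exact values depend on the minimizer's grid:

```python
# Hypothetical reproduction of the Table 3 "1-2" row from Table 2
# (CD1 and CD2, which lie on parallel supporting planes).
cd1 = (332, 180, 13.2, 400, -115)   # a, b, theta, x0, y0 (Table 2)
cd2 = (307, 127, -1.1, -440, 325)
f, n = visual_direction(cd1, cd2)
print(f, n)  # expected near f = 1998 [pixel], N = +/-(0.07 0.83 0.55), cf. Table 3
```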
3.3 Experiments with Real Face Images

In order to test the performance of our visual direction estimation algorithm, we carried out experiments using real face images taken with a digital camera. For each image (3072 × 2048 [pixel]), the eye regions were detected [2]. Simple image enhancement was applied to the eye regions to make the irises clear. The iris contours were then detected and fitted with ellipses, which were used to calculate the normal vector of the supporting planes of the iris boundaries with the extended "two-circle" algorithm. The visual direction was obtained from the estimated normal vector. Figure 8 shows the procedure of the visual direction estimation. Figure 9 shows some experimental results; they are summarized in Table 4, and the estimated visual direction is shown as an arrow superimposed on each face image. These results show that our algorithm gives good results for real face images.

3.4 Experiments about Estimation Accuracy

In order to investigate the accuracy of the visual direction given by our algorithm, we carried out visual direction estimation experiments in a known environment, shown in Figure 10. A subject sat on a chair and gazed at a specified marker on the wall. Fifteen markers were arranged in three rows and five columns on the wall; the horizontal distance between markers was 120 cm and the vertical distance was 101 cm. The distance between the eye position and the wall was 485 cm. The positions of the markers, the camera and the subject's head were fixed during the experiments, as was the camera direction. The true visual direction from the eye position to each marker was calculated from these environment settings and used as the reference for evaluating the accuracy of the estimated visual direction.
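For the accuracy evaluation, the reference direction to each marker and the angular error can be computed as sketched below. This is our own illustrative reconstruction from the stated geometry (120 cm and 101 cm marker spacing, 485 cm eye-to-wall distance); indexing rows and columns from the central marker and placing the eye on its normal are assumptions not stated in the paper:

```python
import numpy as np

def marker_direction(row, col, dx=120.0, dy=101.0, dist=485.0):
    """Unit vector from the eye to marker (row, col) on the wall, using
    the stated spacing and eye-to-wall distance (all in cm). Rows and
    columns are indexed from the central marker (our assumption)."""
    v = np.array([col * dx, row * dy, dist])
    return v / np.linalg.norm(v)

def angular_error_deg(n_est, n_true):
    """Angle (degrees) between estimated and true visual directions,
    ignoring overall sign, since the normal is recovered up to sign."""
    c = abs(n_est @ n_true) / (np.linalg.norm(n_est) * np.linalg.norm(n_true))
    return float(np.degrees(np.arccos(np.clip(c, 0.0, 1.0))))
```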
Fig. 8  The procedure of the visual direction estimation: (a) original image (720 × 480 [pixel]); (b) detecting the irises (pre-processing, binarization, edge finding); (c) fitted iris contours; (d) parameters of the fitted ellipses (left al : bl : θl : x0l : y0l = 33 : 33 : -35.9 : 190 : 63, right ar : br : θr : x0r : y0r = 34 : 30 : 83.3 : -178 : 56) and the estimated visual line direction (-0.35 -0.07 0.94).

Fig. 9  Some images of the real scenes used in the experiment (No.1–No.9).

Table 4  Experimental results estimated from the images shown in Figure 9. Each iris contour is given as a : b : θ : x0 : y0 (pixels and degrees).

No.   Left iris contour              Right iris contour             Visual direction
1     112 : 100 :  84.5 :  672 :  243    109 : 103 : -79.7 : -477 :  195    (-0.28 -0.00 0.96)
2     102 :  99 :  38.9 :  861 :  102    100 :  98 : -83.4 : -146 :   84    (-0.07  0.02 0.99)
3      69 :  56 : -73.9 :  390 :  291     63 :  52 : -79.4 : -277 :  313    (-0.57  0.13 0.81)
4      68 :  63 : -89.4 :  635 :   89     74 :  59 : -85.9 :   14 :   78    (-0.61 -0.01 0.79)
5      88 :  75 : -76.5 :  324 :  492     78 :  60 : -81.5 : -375 :  469    (-0.56  0.02 0.83)
6      73 :  70 :  86.0 :  647 :   97     73 :  62 : -83.6 : -109 :   54    (-0.49 -0.00 0.87)
7      62 :  55 : -63.0 :  -98 : -175     59 :  46 : -68.4 : -588 : -160    (-0.36  0.26 0.89)
8      46 :  37 :  -5.3 :  107 :   79     47 :  35 : -26.2 : -325 :   66    (-0.13  0.55 0.83)
9      35 :  33 :  62.8 :  456 :   53     36 :  34 :  22.7 :   43 :   77    ( 0.13  0.20 0.97)
A camera (Canon EOS Digital Kiss) with a zoom lens (f = 18–55 [mm], i.e. f = 2450–7500 [pixel]) was used to take the face images. The purpose of this experiment was to test the influence of changes in focal length on the estimation accuracy of our method. Figure 11 shows some example images used in the experiment. In order to investigate the influence of the head pose on the estimation accuracy of the visual direction, we took two kinds of images for each marker:

1. images taken with the subject facing the camera while his/her eyes gazed at the specified marker (Case 1);
Fig. 10  The test environment for the visual direction estimation experiment.
2. images taken with the subject facing the specified marker and gazing at it (Case 2).

The experimental results are summarized in Table 5. Each value from Col.A to Col.E in the table is the average estimation error. We noticed that some of the experimental results were not very good. Analyzing them, we found the reasons were: 1) the eyeball in the image was too small (smaller than 70 pixels); 2) the iris boundary was blurred by eyelashes.
Fig. 11  Some samples of the two kinds of images (Case 1 and Case 2) for markers E-1, A-1, E-3 and A-3.
Table 5  Experimental results of visual direction estimation. The values from Col.A to Col.E indicate the angle between the estimated visual direction and the ground truth, in degrees.

         Case    Col.A    Col.B    Col.C    Col.D    Col.E
Row 1     1       7.4      9.6      2.9      6.2      4.2
          2       3.3     13.0      2.9      6.3      3.0
Row 2     1       3.2      8.9     19.2      4.0      2.2
          2       6.2      4.9     19.2      9.7      3.7
Row 3     1       2.0      9.1     10.5      6.7      5.1
          2      16.2      5.0     10.5      5.2      3.4
4. Conclusion

This paper has presented a new method to estimate the visual direction from a monocular image. It estimates the visual direction from the ellipses fitted to the iris contours of the two eyes detected in the input image. Compared with existing methods, our method uses only the iris contours and does not require a known focal length. Other facial features such as eye corners and the head pose are not required. Because of this, the accuracy of the visual direction estimated by our method should not be affected by changes in head pose. In the experiments on the accuracy of visual direction estimation, we confirmed that our method can estimate the visual direction successfully for many different head poses, while most existing methods only work for frontal face images. The extensive experiments on simulated images and real face images have confirmed the robustness and the effectiveness of our method. In our experiments, the intrinsic parameters had not been calibrated, nor had the radial distortion been corrected; to achieve higher estimation accuracy, both should be calibrated and compensated.

Acknowledgments

This research was partially supported by the Ministry of Education, Culture, Sports, Science and Technology, Grant-in-Aid for Scientific Research (A)(2) 16200014, (C) 16500112, 2004, and (C) 15500112, 2003.
References

[1] Q. Chen, H. Wu and T. Wada, Camera Calibration with Two Arbitrary Coplanar Circles, ECCV, pp.521-532, 2004.
[2] H. Wu, et al., Automatic Facial Feature Points Detection with SUSAN Operator, SCIA, pp.257-263, 2001.
[3] H. Oike, H. Wu, T. Kato and T. Wada, A High-Performance Active Camera System for Taking Clear Images, CVIM14414, pp.71-78, 2004.
[4] J.G. Wang, E. Sung and R. Venkateswarlu, Eye Gaze Estimation from a Single Image of One Eye, ICCV, 2003.
[5] Y. Matsumoto and A. Zelinsky, An Algorithm for Real-time Stereo Vision Implementation of Head Pose and Gaze Direction Measurement, FG, pp.499-504, 2000.
[6] A.T. Duchowski, A Breadth-First Survey of Eye Tracking Applications, Behavior Research Methods, Instruments, and Computers, 2002.
[7] A. Criminisi, J. Shotton, A. Blake and P.H.S. Torr, Gaze Manipulation for One-to-one Teleconferencing, ICCV, 2003.
[8] D.H. Yoo, et al., Non-contact Eye Gaze Tracking System by Mapping of Corneal Reflections, FG, 2002.
[9] J. Zhu and J. Yang, Subpixel Eye Gaze Tracking, FG, 2002.
[10] Y.L. Tian, T. Kanade and J.F. Cohn, Dual-state Parametric Eye Tracking, FG, pp.110-115, 2000.
[11] A. Schubert, Detection and Tracking of Facial Features in Real Time Using a Synergistic Approach of Spatio-Temporal Models and Generalized Hough-Transform Techniques, FG, pp.116-121, 2000.
[12] F.D. la Torre, Y. Yacoob and L. Davis, A Probabilistic Framework for Rigid and Non-rigid Appearance Based Tracking and Recognition, FG, pp.491-498, 2000.
[13] M. Dhome, J.T. Lapreste, G. Rives and M. Richetin, Spatial Localization of Modeled Objects of Revolution in Monocular Perspective Vision, ECCV 90, pp.475-485, 1990.
[14] K. Kanatani and L. Wu, 3D Interpretation of Conics and Orthogonality, Image Understanding, Vol.58, pp.286-301, 1993.
[15] L. Wu and K. Kanatani, Interpretation of Conic Motion and Its Applications, Int. Journal of Computer Vision, Vol.10, No.1, pp.67-84, 1993.
[16] A. Fitzgibbon, M. Pilu and R.B. Fisher, Direct Least Square Fitting of Ellipses, PAMI, Vol.21, No.5, pp.476-480, 1999.
[17] R. Halir and J. Flusser, Numerically Stable Direct Least Squares Fitting of Ellipses, WSCG, 1998.
Haiyuan WU was born in 19xx. ... The Institute of Electronics, Information and Communication Engineers (IEICE), ...
Qian CHEN was born in 19xx. ... The Institute of Electronics, Information and Communication Engineers (IEICE), ...
Toshikazu WADA was born in 19xx. ... The Institute of Electronics, Information and Communication Engineers (IEICE), ...