Improving the Accuracy of Gaze Input for Interaction

Manu Kumar
GazeWorks, Inc.
P.O. Box 901, Palo Alto, CA 94302-0901
[email protected]

Jeff Klingner, Rohan Puranik, Terry Winograd, Andreas Paepcke
Stanford University
Gates Building, 353 Serra Mall, Stanford, CA 94305, USA
{klingner, rpuranik, winograd, paepcke}@stanford.edu

ETRA 2008, Savannah, Georgia, March 26-28, 2008. © 2008 ACM 978-1-59593-982-1/08/0003

Abstract

Using gaze information as a form of input poses challenges based on the nature of eye movements and how we humans use our eyes in conjunction with other motor actions. In this paper, we present three techniques for improving the use of gaze as a form of input. We first present a saccade detection and smoothing algorithm that works on real-time streaming gaze information. We then present a study which explores some of the timing issues of using gaze in conjunction with a trigger (key press or other motor action) and propose a solution for resolving these issues. Finally, we present the concept of Focus Points, which makes it easier for users to focus their gaze when using gaze-based interaction techniques. Though these techniques were developed for improving the performance of gaze-based pointing, they are applicable in general to using gaze as a practical form of input.

CCS: H.5.2 [Information interfaces and presentation]: User Interfaces: Input devices and strategies.

General terms: Human Factors, Algorithms, Performance, Design

Keywords: Eye Tracking, Gaze Input, Gaze-enhanced User Interface Design, Fixation Smoothing, Eye-hand Coordination, Focus Points

1. Introduction

The eyes are a rich source of information for gathering context in our everyday lives. A user's gaze is postulated to be the best proxy for attention or intention [Zhai 2003]. Using eye-gaze information as a form of input can enable a computer system to gain more contextual information about the user's task, which in turn can be leveraged to design interfaces that are more intuitive and intelligent. Recent research has explored the use of gaze information as a practical form of input by augmenting rather than replacing existing interaction techniques. Kumar et al. have developed gaze-enhanced interaction techniques for pointing and selection [Kumar et al. 2007b], scrolling [Kumar et al. 2007c], password entry [Kumar et al. 2007a] and other everyday computing tasks.

In the paper describing EyePoint [Kumar et al. 2007b], it was reported that while the speed of the gaze-based pointing technique was comparable to the mouse, error rates were significantly higher. To address this problem we conducted a series of studies to better understand the source of these errors and identify ways to improve the accuracy of gaze-based pointing.

In this paper we present three methods for improving the accuracy and user experience of gaze-based pointing: an algorithm for real-time saccade detection and fixation smoothing, an algorithm for improving eye-hand coordination, and the use of focus points. These methods boost the basic performance of gaze input in interactive applications; in our applications they made the difference between prohibitively high error rates and practical usefulness of gaze-based interaction.

2. Real-Time Saccade Detection and Fixation Smoothing

Basic eye movements can be broken down into two types: fixations and saccades. A fixation occurs when the gaze rests steadily on a single point. A saccade is a fast movement of the eye between two fixations. However, even fixations are not stable: the eye jitters during fixations due to drift, tremor and involuntary micro-saccades [Yarbus 1967]. This gaze jitter, together with the limited accuracy of eye trackers, results in a noisy gaze signal.

The prior work on algorithms for identifying fixations and saccades [Monty et al. 1978; Salvucci 1999; Salvucci and Goldberg 2000; Yarbus 1967] has dealt mainly with post-processing previously captured gaze information. To use gaze information as a form of input, it is necessary to analyze eye-movement data in real time.

To smooth the data from the eye tracker in real time, it is necessary to determine whether the most recent data point is the beginning of a saccade, a continuation of the current fixation, or an outlier relative to the current fixation. We use a gaze movement threshold, in which two gaze points separated by a Euclidean distance of more than a given saccade threshold are labeled as a saccade. This is similar to the velocity threshold technique described in [Salvucci and Goldberg 2000], with two modifications to make it more robust to noise. First, we measure the displacement of each eye movement relative to the current estimate of the fixation location rather than to the previous measurement. Second, we look ahead one measurement and reject movements over the saccade threshold which immediately return to the current fixation. This prevents single outliers of the current fixation from being mislabeled as saccades. It should be noted that this look-ahead introduces a one-measurement latency (20 ms for the Tobii 1750 eye tracker [Tobii Technology 2006]) at saccade thresholds into the gaze data provided to the application.


The algorithm maintains two sets of points: the current fixation window and a potential fixation window. If a point is close to (within a saccade threshold of) the current fixation, then it is added to the current fixation window. The new current fixation is calculated by a weighted mean which favors more recent points (described below). If the point differs from the current fixation by more than a saccade threshold, then it is added to the potential fixation window and the current fixation is returned. When the next data point is available, if it is closer to the current fixation, then we add it to the current fixation and throw away the potential fixation as an outlier. If the data point is closer to the potential fixation, then we add the point to the potential fixation window and make this the new current fixation. The fixation point is calculated as a weighted mean (a one-sided triangular filter) of the set of points in the fixation window. The weight assigned to each point is based on its position in the window. For a window with n points (P_0, P_1, ..., P_{n-1}) the mean fixation is calculated by the formula:

\[ P_{\mathrm{fixation}} = \frac{1 \, P_0 + 2 \, P_1 + \cdots + n \, P_{n-1}}{1 + 2 + \cdots + n} \]

Figure 1. Results of our real-time saccade detection and smoothing algorithm. Note that the one-measurement look-ahead prevents outliers in the raw gaze data from being mistaken for saccades, but introduces a 20 ms latency at saccade thresholds.

The size of the fixation window (n) is capped to include only data points that occurred within a dwell duration [Majaranta et al. 2004; Majaranta et al. 2003] of 400-500 ms. We do this to allow the fixation point to adjust more rapidly to slight drifts in the gaze data.
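The following is a minimal Python sketch of the two-window scheme described above, written from the description in this section (the full pseudocode appears in [Kumar 2007]). The class name, the 60-pixel threshold, and the 25-sample window cap (roughly 500 ms at the Tobii 1750's 50 Hz sample rate) are illustrative assumptions, not values taken from the paper.

```python
from collections import deque

class FixationSmoother:
    """Streaming saccade detection with one-sided triangular smoothing.

    Sketch of the two-window scheme: a current fixation window plus a
    one-sample potential fixation window used for the look-ahead step.
    """

    def __init__(self, saccade_threshold=60.0, max_window=25):
        self.threshold = saccade_threshold
        self.max_window = max_window                  # ~500 ms at 50 Hz
        self.current = deque(maxlen=max_window)       # current fixation window
        self.potential = []                           # held look-ahead sample

    def _mean(self, points):
        # One-sided triangular filter: the i-th point gets weight i + 1,
        # so more recent points dominate the fixation estimate.
        wsum = sum(i + 1 for i in range(len(points)))
        x = sum((i + 1) * p[0] for i, p in enumerate(points)) / wsum
        y = sum((i + 1) * p[1] for i, p in enumerate(points)) / wsum
        return (x, y)

    def update(self, point):
        """Feed one raw gaze sample; returns the smoothed fixation point."""
        if not self.current:
            self.current.append(point)
            return point
        fixation = self._mean(self.current)
        dist = ((point[0] - fixation[0]) ** 2 +
                (point[1] - fixation[1]) ** 2) ** 0.5
        if self.potential:
            # Look-ahead: decide whether the held far-away sample was a
            # lone outlier or the start of a real saccade.
            prev = self.potential[0]
            d_prev = ((point[0] - prev[0]) ** 2 +
                      (point[1] - prev[1]) ** 2) ** 0.5
            if d_prev < dist:
                # Two consecutive samples near each other: a real saccade,
                # so the potential window becomes the new fixation.
                self.current = deque(self.potential, maxlen=self.max_window)
            # Otherwise the held sample was an outlier and is discarded.
            self.current.append(point)
            self.potential = []
        elif dist > self.threshold:
            # Possibly a saccade: hold the sample and keep reporting the
            # current fixation (the one-measurement, ~20 ms latency).
            self.potential = [point]
        else:
            self.current.append(point)
        return self._mean(self.current)
```

A caller would instantiate `FixationSmoother()` once and feed each raw (x, y) sample from the tracker to `update()`, using the returned estimate as the fixation point; the `deque` with `maxlen` implements the 400-500 ms window cap.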

Figure 1 shows the output from the smoothing algorithm for the x coordinate of the eye-tracking data. We also show a Kalman filter applied to the entire raw gaze data and a Kalman filter applied in parts to the fixations only. A Kalman filter applied over the entirety of the raw gaze data smoothes over saccade intervals. The nature of eye movements, in particular the existence of saccades, necessitates that the smoothing function only be applied to fixations, i.e. within saccade boundaries. Applying the Kalman filter in parts to the fixations only does yield results comparable to our one-sided triangular filter discussed above. It is possible that applying a non-linear variant of the Kalman filter [Arulampalam et al. 2002], or a better process model of eye movements for the Kalman filter, may yield better smoothing results. The advantages of our approach relative to Kalman filters are simplicity and a design specific to eye movements, which can tolerate the noise typically occurring in eye-tracking data.
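For comparison, a per-fixation Kalman filter of the kind discussed above could look like the following sketch: a scalar constant-position filter run independently on each gaze coordinate and reset at every detected saccade, so that smoothing never spans saccade boundaries. The process and measurement noise values are illustrative assumptions; the paper does not specify the filter parameters used in its comparison.

```python
class PerFixationKalman:
    """Scalar constant-position Kalman filter, reset at each saccade.

    Run one instance per gaze coordinate (x and y); q and r are
    illustrative process/measurement noise variances.
    """

    def __init__(self, q=0.5, r=25.0):
        self.q, self.r = q, r
        self.x = None    # state estimate (one gaze coordinate)
        self.p = 0.0     # estimate variance

    def reset(self, measurement):
        # Call at each detected saccade boundary so smoothing is
        # applied only within fixations.
        self.x, self.p = measurement, self.r

    def update(self, z):
        if self.x is None:
            self.reset(z)
            return self.x
        self.p += self.q                 # predict: position assumed constant
        k = self.p / (self.p + self.r)   # Kalman gain
        self.x += k * (z - self.x)       # correct with the new measurement
        self.p *= (1.0 - k)
        return self.x
```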

While there is still room for improvement in the algorithm above, for example by taking into account the directionality of the incoming data points, we found that our saccade detection and smoothing algorithm significantly improved the reliability of the results for applications which rely on the real-time use of eye-tracking data. A more detailed description of our algorithm, including pseudocode, is available in [Kumar 2007].

3. Eye-Hand Coordination

Previous research on using a combination of gaze and keyboard for performing a pointing task [Kumar et al. 2007b] showed that error rates were very high. Additional work on gaze-based password entry [Kumar et al. 2007a] showed that the high error rates existed only when the subjects used a combination of gaze plus a keyboard trigger. Using a dwell-based trigger exhibited minimal errors. These observations led us to hypothesize that the errors may be caused by a failure of synchronization between gaze and triggers.

To determine the cause and the number of errors, we conducted two user studies with 15 subjects (11 male, 4 female, average age 26 years). In the first study, subjects were presented with a red balloon. Each time they looked at the balloon and pressed the trigger key, the red balloon popped and moved to a new location (Moving Target study). In the second study, subjects were presented with 20 numbered balloons on the screen and asked to look at each balloon in order and press the trigger key (Stationary Targets study). Subjects repeated each study twice, once optimizing for speed and trying to perform the task as quickly as possible, and the second time optimizing for accuracy and trying to perform the task as accurately as possible. The order of the studies was varied for counter-balancing and trials were repeated in case of an error.

An in-depth analysis of the data and classification of the errors from the two studies revealed multiple sources of error:

Tracking errors: caused by limits of eye tracker accuracy. These include cases in which the gaze data from the eye tracker is biased, or where a target location closer to the periphery of the screen results in lower accuracy from the eye tracker [Beinhauer 2006].

Early-trigger errors: caused because the trigger happened before the user's gaze was in the target area. Early triggers can happen because a) the eye tracker introduces a sensor lag of about 33 ms in processing the user's eye gaze, b) the smoothing algorithm introduces an additional latency of 20 ms at saccade thresholds, and c) in some cases (as in the Moving Target study) the users may have only looked at the target in their peripheral vision and pressed the trigger before they actually focused on the target.

Late-trigger errors: caused because the user had already moved their gaze on to the next target before they pressed the trigger. Late triggers can happen only in cases when multiple targets are visible on the screen, as in the Stationary Targets study or in gaze-based typing.

Other errors: these include a) smoothing errors, caused when the smoothed data happened to be outside the target boundary but the raw data point would have, by chance, resulted in a hit, and b) human errors, where the subject was simply not looking at the right thing or looked down at the keyboard before pressing the trigger.

Figure 2 illustrates the different error types. Figure 3 shows how often each type of error occurred in the two studies.

Figure 2. Sources of error in gaze input. Shaded areas show the target regions. Example triggers are indicated by red arrows. The triggers shown are all different attempts to click on the upper target region. The trigger points correspond to: a) early trigger error, b) raw hit and smooth hit, c) raw miss and smooth hit, and d) late trigger error.

Figure 3. Analysis of errors in the two studies shows that a large number of errors in the speed task happen due to early triggers and late triggers, i.e., errors in synchronization between the gaze and trigger events.

To improve the accuracy of gaze-based pointing in the case of the speed task, we implemented an early trigger correction (ETC) algorithm which delays trigger points by 80 ms to account for the sensor lag, smoothing latency and peripheral vision effects. We simulated this algorithm over the data from the Moving Target study. Figure 4 shows the outcome from the simulated results. It should be noted that both smoothing and early-trigger correction by themselves actually increased the error rate, because smoothing introduces a latency that the early trigger would be correcting. The error rate in the speed task when using a combination of smoothing and early trigger correction approaches the error rate of the accuracy task, without compromising the speed of the task.

Figure 4. Simulation of smoothing and early trigger correction (ETC) on the speed task for the Moving Target study shows that the percentage error of the speed task decreases significantly and is comparable to the error rate of the accuracy task.

While we were able to identify late-trigger errors in the analysis of the data, it is difficult to distinguish a late trigger from an early trigger, or even an on-time trigger, without using information about the location of the targets. Since our approach has focused on providing generally applicable techniques for gaze input which do not rely on application- or operating-system-specific information, we did not attempt to correct for late triggers, though we note that the use of information about target locations has the potential to significantly improve the accuracy of gaze-based input by allowing the current fixation to be applied to the closest target.
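A sketch of how early trigger correction could be applied is shown below, assuming the application keeps a short time-ordered log of smoothed fixation estimates. The function and variable names are hypothetical; the 80 ms delay is the value reported above.

```python
ETC_DELAY = 0.080  # 80 ms: sensor lag + smoothing latency + peripheral-vision effects

def corrected_fixation(trigger_time, fixation_log):
    """Return the smoothed fixation in effect 80 ms after the trigger.

    fixation_log is assumed to be a non-empty, time-ordered list of
    (timestamp_seconds, fixation_point) pairs recorded by the smoother.
    """
    target_time = trigger_time + ETC_DELAY
    # Use the latest fixation estimate at or before the shifted time,
    # i.e., associate the click with where the gaze settled shortly
    # after the (early) trigger.
    chosen = fixation_log[0][1]
    for t, fixation in fixation_log:
        if t > target_time:
            break
        chosen = fixation
    return chosen
```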

4. Focus Points

In previous work on gaze-based pointing [Kumar et al. 2007b], pointing was aided by the use of Focus Points: a grid pattern of dots overlaid on the magnified view that contained the targets (see Figure 5). It was hypothesized that focus points assist the user in making a more fine-grained selection by focusing the user's gaze, thereby improving the accuracy of the eye tracking. However, the studies in the EyePoint paper showed no conclusive effect of an improvement in tracking accuracy when using focus points.

To test this hypothesis further, we conducted a user study with 17 subjects (11 male, 6 female, average age 22). In the first part of the study subjects were shown a red balloon and asked to look at the center of the balloon. Once they had looked at the balloon for a dwell duration (450 ms) the balloon automatically moved to a new location. In the second part, subjects repeated the study, but with the center point of the balloon clearly marked with a focus point. The order was varied and each subject was shown 40 balloons. The user's raw and smoothed gaze positions were logged for each balloon. At the end of the study users were presented with a 7-point Likert scale questionnaire which asked them which condition was easier and whether they found the focus point at the center of the balloon useful.


We computed the standard deviation of the Euclidean distance of each gaze point from the center point of the target. The results from the study show that, within the bounds of the measurable accuracy of the eye tracker (approximately 1° of visual angle, which corresponds to a spread of 33 pixels in any direction, i.e., a spread diameter of ~66 pixels), the use of focus points did not have a significant impact in concentrating the user's gaze on the center of the target. The questionnaire results, however, indicate that subjects found the condition with the focus point easier and found the focus point useful when trying to look at the center of the target. These results are consistent with the findings in the original EyePoint paper [Kumar et al. 2007b].

Figure 5. Magnified view for the gaze-based pointing technique with and without focus points. Using focus points provides a visual anchor for subjects to focus their gaze on, making it easier for them to click in the text box.
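For reference, the spread metric used in this analysis can be computed as follows. `gaze_spread` is a hypothetical helper name; the computation (standard deviation of point-to-center distances) follows the description above.

```python
import math

def gaze_spread(gaze_points, center):
    """Standard deviation of the Euclidean distance of each gaze point
    from the target center, used to compare the two conditions
    (with and without a focus point)."""
    dists = [math.hypot(x - center[0], y - center[1]) for x, y in gaze_points]
    mean = sum(dists) / len(dists)
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    return math.sqrt(var)
```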

We conclude that while the use of focus points may not measurably improve the accuracy of the raw gaze data from the eye tracker, they do indeed make pointing easier and provide a better user experience. This is illustrated by Figure 5, which shows two views of the magnified view from EyePoint. If the subject intends to click in the text area in the bottom right of the magnified view, the task is easier for the subject in the condition with focus points, since the focus points provide a visual anchor for the subject to focus upon.

5. Conclusion

The techniques presented above improve the use of gaze for input by addressing how we interpret the gaze data from eye trackers (real-time saccade detection and smoothing), how to match gaze input with an external trigger (eye-hand coordination), and by introducing features which make it easier for the user to look at the desired target (focus points). These techniques describe software changes that can be made at the operating system layer to improve the use of gaze as a form of input, and they are orthogonal to any improvement in the underlying tracking technology that provides greater accuracy and better tolerance of head movement from the eye tracker.

References

ARULAMPALAM, M.S., MASKELL, S., GORDON, N., AND CLAPP, T. 2002. A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking. IEEE Transactions on Signal Processing 50(2), p. 174.

BEINHAUER, W. 2006. A Widget Library for Gaze-based Interaction Elements. In ETRA: Eye Tracking Research & Applications Symposium, San Diego, California, USA: ACM Press, p. 53.

KUMAR, M. 2007. GUIDe Saccade Detection and Smoothing Algorithm. Technical Report CSTR 2007-03, Stanford University.

KUMAR, M., GARFINKEL, T., BONEH, D., AND WINOGRAD, T. 2007a. Reducing Shoulder-surfing by Using Gaze-based Password Entry. Technical Report CSTR 2007-05, Stanford University.

KUMAR, M., PAEPCKE, A., AND WINOGRAD, T. 2007b. EyePoint: Practical Pointing and Selection Using Gaze and Keyboard. In CHI, San Jose, California, USA: ACM Press.

KUMAR, M., WINOGRAD, T., AND PAEPCKE, A. 2007c. Gaze-enhanced Scrolling Techniques. In CHI, San Jose, California, USA: ACM Press.

MAJARANTA, P., AULA, A., AND RÄIHÄ, K.-J. 2004. Effects of Feedback on Eye Typing with a Short Dwell Time. In ETRA: Eye Tracking Research & Applications Symposium, San Antonio, Texas, USA: ACM Press, pp. 139-146.

MAJARANTA, P., MACKENZIE, I.S., AULA, A., AND RÄIHÄ, K.-J. 2003. Auditory and Visual Feedback During Eye Typing. In CHI, Ft. Lauderdale, Florida, USA: ACM Press, pp. 766-767.

MONTY, R.A., SENDERS, J.W., AND FISHER, D.F. 1978. Eye Movements and the Higher Psychological Functions. Hillsdale, New Jersey, USA: Erlbaum.

SALVUCCI, D.D. 1999. Inferring Intent in Eye-Based Interfaces: Tracing Eye Movements with Process Models. In CHI, Pittsburgh, Pennsylvania, USA: ACM Press, pp. 254-261.

SALVUCCI, D.D. AND GOLDBERG, J.H. 2000. Identifying Fixations and Saccades in Eye-Tracking Protocols. In ETRA: Eye Tracking Research & Applications Symposium, Palm Beach Gardens, Florida, USA: ACM Press, pp. 71-78.

TOBII TECHNOLOGY, AB. 2006. Tobii 1750 Eye Tracker. http://www.tobii.com.

YARBUS, A.L. 1967. Eye Movements and Vision. New York: Plenum Press.

ZHAI, S. 2003. What's in the Eyes for Attentive Input. Communications of the ACM 46(3).

