Noise Tolerant Selection by Gaze-Controlled Pan and Zoom in 3D

Dan Witzner Hansen
Informatics and Mathematical Modelling, Technical University of Denmark
[email protected]

Henrik H.T. Skovsgaard and John Paulin Hansen
IT University, Copenhagen
{hhje,paulin}@itu.dk

Emilie Møllenbach
AVRC, Applied Vision Research Centre, Loughborough University
[email protected]

Abstract

This paper presents StarGazer - a new 3D interface for gaze-based interaction and target selection using continuous pan and zoom. Through StarGazer we address the issues of interacting with graph structured data and applications (i.e. gaze typing systems) using low resolution eye trackers or small-size displays. We show that it is possible to make robust selections even with a large number of selectable items on the screen and noisy gaze trackers. A test with 48 subjects demonstrated that users who had never tried gaze interaction before could rapidly adapt to the navigation principles of StarGazer. We tested three different display sizes (down to PDA-sized displays) and found that large screens are faster to navigate than small displays and that the error rate is higher for the smallest display. Half of the subjects were exposed to severe noise deliberately added to the cursor positions. We found that this had a negative impact on efficiency. However, the users remained in control and the noise did not seem to affect the error rate. Additionally, three subjects tested the effect of temporally added noise, simulating latency in the gaze tracker. Even with a significant latency (about 200 ms) the subjects were able to type at acceptable rates. In a second test, seven subjects were allowed to adjust the zooming speed themselves. They achieved typing rates of more than eight words per minute without using language modeling. We conclude that StarGazer is an intuitive 3D interface for gaze navigation, allowing more selectable objects to be displayed on the screen than the accuracy of the gaze tracker would otherwise permit.

Keywords: Eye typing, eye tracking, gaze interaction, 3D interfaces, zooming, mobile displays, assistive technology, alternative communication, computer input devices.

1 Introduction

Despite several decades of research, eye trackers are not precise enough to select small objects (e.g. icons and menu items) in a Windows environment. Several authors doubt that gaze trackers will ever become as accurate as the mouse [Jacob 1991; Ashmore et al. 2005]. To account for the inaccuracy, some commercial systems for gaze communication offer a magnification tool that allows interaction with the gazed screen region through dwell-time activation (e.g. [Lankford 2000]).

An important motivation for the work presented here is our effort to reduce the price of eye trackers and increase their availability. One of the more promising approaches in this direction is to use off-the-shelf components. Current low-cost eye trackers trade price for a lower resolution [Hansen and Pece 2005; Li et al. 2006].

Another motivation is to explore gaze interaction with small-size displays, which are particularly relevant for mobile devices (e.g. handheld or head-mounted). Can gaze interaction be applied on small-size displays with densely located selectable items? The use of low-resolution eye trackers on large displays and the use of high-resolution eye trackers on small displays require equivalent design considerations.

Additional noise will be introduced when supporting eye tracking during motion (e.g. from the vibrations of a wheelchair), in outdoor environments with sunlight, or when using off-the-shelf cameras. Also, some eye trackers tend to drift and become less accurate after a while, or introduce latency, forcing users to recalibrate them frequently. This has inclined designers of gaze applications to develop interfaces that cope with the low resolution by using large on-screen buttons and predictive modeling [Majaranta and Räihä 2002]. For many years dwell-time activation has been the preferred means of making selections in gaze-based interaction. Current eye typing applications employing dwell time activation produce about 6-10 words per minute (WPM) [Majaranta et al. 2006]. During dwell time activation the user is waiting for the activation to be made. This is an obvious waste of time that has a major impact on how many selections can be made within a given time frame. An experienced user typing 10 words per minute on a qwerty keyboard with a 500 ms dwell time threshold may spend almost a third of the time waiting. Furthermore, it is unnatural for humans to keep staring at an object once it is found [Jacob 1991].

The principal process of object selection in user interfaces is illustrated in Figure 1. An interface has a set of possible activation areas. For example, on-screen keyboards keep a fixed grid display. Each activation area (e.g. a button) allows a set of actions to be executed. Activations change the state of the program internally and possibly also visually, revealing new options to be selected. A fixed dwell time activation displays the new state at a constant rate between each selection (cf. Figure 1). Whenever the size of the activation areas is smaller than the accuracy of the pointing device, multiple activations (hierarchical activation structures) may be employed. The number of activations therefore increases.

Figure 1: The process of making selections on a monitor (screen states over time) displayed as a graph. The dashed lines are the possible paths to selectable objects on the screen; the thick lines illustrate actual selections.

2 Previous work

The idea of coupling gaze with zooming actions has been explored ever since eye tracking was first conceived as a real-time input device. [Bolt 1981] envisioned the concept of Gaze-Orchestrated Dynamic Windows. The basic idea is to allow for dynamic interaction with multiple windows on a single large display. However, the technology at the time did not support a real-time application of the system. In a study of gaze selection, a similar idea, called "zooming windows", was evaluated by [Fono and Vertegaal 2005]. Their results show that gaze-contingent key activations are approximately 35% faster than mouse-activated selections, and they were subsequently chosen as the sole selection method in their second test. The second test compares zooming windows with static windows, demonstrating that interaction with zooming windows is up to 30% faster than with static windows.

Bates and Istance [Bates and Istance 2002] compare different pointing devices under zooming and non-zooming conditions in WIMP-style (windows, icons, menus and pointing devices) interaction with a web browser and a word processor. A regular eye mouse and a zoom-based eye mouse are compared with a standard head mouse (an instrument that translates the movement of the head into mouse movements). Their results show that the standard head mouse is superior to the eye mouse, but the addition of zoom allows the eye mouse to become more efficient than the head mouse. The zoom environment also seems to ease the workload, and the subjects found it easier to use than the head mouse. [Bates and Istance 2002] use discrete zoom levels, but later they apply continuous zoom in [Bates et al. 2005] by examining a "fly-where-I-look" approach for object selection in a 3D virtual environment. Gaze-controlled flying was found to be as efficient as hand mouse control, with object size having only a minor effect on efficiency.

[Miniotas et al. 2004] examine the benefits of invisibly expanding targets to a "point-friendly" size within the focus of attention. Their results show the benefit of gaze-directed target expansion in terms of both speed and accuracy. They conclude that invisible target expansion in the motor space of an interface may be an amenable technique to compensate for the limited accuracy of gaze input. They also warn that the spatial costs of expansion are permanent: no other selectable objects can be located in the expansion areas.

[Ashmore et al. 2005] examine the potential of a gaze-contingent fisheye perspective for eye pointing and selection of magnified targets. The fisheye perspective (a so-called distortion interface [Sarkar et al. 1993]) is hidden during visual search, but appears as soon as the user fixates a target. This technique provides an overview during search and enlargement of targets during selection. The technique is shown to be superior in terms of speed and accuracy in comparison with conditions of no fisheye perspective and with a "constantly-on" fisheye magnification. The average rate of incorrect trials is 17%, which is probably too high for a routine task like typing.

In summary, previous work suggests three different ways to improve target acquisition in gaze interactive systems: zoom or 'fly' into the display, expand the target in motor space, or enlarge the fixated region by a fisheye lens or magnification glass metaphor. We follow the zoom approach, since target expansion seriously limits the possible number of targets. The fisheye lens metaphor is disregarded since it distorts the interface, and both target expansion and visual enlargement need dwell time confirmation. The disadvantage of our approach is that some objects located outside the zoomed regions are hidden. It may therefore work best for setups where the selectable items are placed in a familiar structure, e.g. the layout of countries on a map or the order of the alphabet.

Zooming vs. dwell time activation: Zooming is an effective way to increase information on limited screen space. Uniform scaling of the information space increases the spatial separation between selectable items and facilitates an unambiguous selection of the intended target. When the noise level increases, the selectable items need to be separated more; consequently, zooming in on an item takes a bit longer than without noise. The geometric relations in the information space are not distorted, so the user maintains familiarity with the information space. Also, when the user moves from a big screen with a large view to a small (mobile) screen, the structure of the selectable items is preserved; only more zoom is needed to be able to recognize the elements on the small display.

Dwelling on a particular item for a certain duration or, during the same time, zooming in on the item may seem like equivalent approaches. However, there are several differences worth noting:

1. When zooming in, the area of focus becomes larger, and thus, with time, noise on the gaze tracker will be contained in the zoomed region. When the element of focus is larger than the extent of the noise, the application can make an unambiguous selection with higher confidence.

2. Zoom may be considered an indicator of confidence. Zoom provides natural feedback to the user about what is being selected and with what confidence.

3. Dwell time activation forces the user to wait for the button to get selected without doing anything, which may feel like a waste of time. Zoom, on the other hand, produces an optic flow and keeps the user visually stimulated.

Dasher is an example of a gaze typing interface without dwell time activation [Ward and MacKay 2002]. Typing is controlled by continuous two-dimensional search and navigation in a column of characters that are ordered alphabetically and scaled according to their probabilities. The user makes selections by searching for target letters without unnatural dwells breaking the flow. Typing speeds of up to 25 words per minute (WPM) have been reported, which is exceptionally fast for gaze typing. Dasher is fairly tolerant of noisy gaze trackers. However, as observed by [Urbina and Huckauf 2007], since the alphabet is constantly moving and new characters suddenly appear, novice users often feel stressed. The speed of Dasher may thus come at the cost of a steep learning curve.

Numerous computer games have demonstrated how easily and quickly people learn to navigate in 3D environments. Can we exploit the same skills in gaze interfaces for a routine task like text typing?


3 StarGazer

In this section we present StarGazer, a zoomable interface used for presenting and selecting data in noisy environments. Data is presented in 3D. The navigation through pan/zoom allows for flexible interaction, even with significant noise.

3.1 Interface Layout

StarGazer is a multi-scale interface for browsing large tree-structured data, applying both geometric and semantic zoom principles. The interface provides navigation in a 3D world by continuously panning and zooming in on regions of interest. The format of StarGazer is generic and applicable to various types of interactive information, but in the following we explain the use of StarGazer through a gaze typing process.

Figure 2: The initial view of StarGazer. Letters are placed on two concentric circles, leaving space between groups of letters. The user is looking at the letter "R" and the concentric rings in the center indicate the pan direction.

Figure 3: Zooming in on items (e.g. the letter "S") to remove regions of low probability.

Zooming with context: Zooming promotes confidence in particular objects. We can therefore exploit the additional space and the increased confidence to include context dependent information. This resembles zooming in on a digital map, where only data relevant to a particular scale is shown (semantic zoom). In our gaze typing example, the additional information can for instance be word prediction, alternate modes (capital letters, special characters, edit functions etc.) and auxiliary functionality. Figure 4 shows an example of how additional information such as accented letters (e.g. Š), equivalent to mode selection, and word prediction ('typing') can be incorporated seamlessly into StarGazer by use of zoom. StarGazer then becomes a zoomable interface with context dependent information that avoids explicit 'mode' selection.

Circular Keyboard: StarGazer is designed with the intention of displaying as much relevant data as possible. In this paper, we initially place all letters on two concentric circles in a 3D plane, with minor spacing between letter groups to promote rote learning (see Figure 2). Space ([-]) is placed in the center. Four special functions are placed in the corners of the display: an undo function ([Undo]), a backspace function ([BS]), adjustment of zooming speed ([Speed]), and a stop function. Fixated targets are shown in a different color, while the orientation of the three concentric rings in the center indicates the current direction of the gaze.
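To make the layout concrete, the following sketch places the letters of the alphabet on two concentric circles in the z = 0 plane with a small angular gap between letter groups and space at the center. This is our illustration, not the authors' code; the split of letters between the rings, the radii and the number of groups are assumptions.

```python
import math

def circular_layout(inner="ABCDEFGHIJKL", outer="MNOPQRSTUVWXYZ",
                    r_inner=1.0, r_outer=2.0, groups=4, gap=0.25):
    """Place letters on two concentric circles in the z = 0 plane.

    Letters on each ring are split into `groups` groups separated by an
    angular gap (radians), loosely mimicking Figure 2.
    Returns a dict {character: (x, y, z)}.
    """
    positions = {" ": (0.0, 0.0, 0.0)}           # space sits at the center
    for ring, radius in ((inner, r_inner), (outer, r_outer)):
        n = len(ring)
        usable = 2 * math.pi - groups * gap      # angle left after the gaps
        step = usable / n
        angle = 0.0
        for i, ch in enumerate(ring):
            if i and i % (n // groups) == 0:     # start a new letter group
                angle += gap
            positions[ch] = (radius * math.cos(angle),
                             radius * math.sin(angle), 0.0)
            angle += step
    return positions

if __name__ == "__main__":
    for ch, pos in sorted(circular_layout().items()):
        print(ch, tuple(round(c, 2) for c in pos))
```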

3.2 Interaction with Pan and Zoom

Pan and zoom are well known interaction principles and are key to the navigation of StarGazer. Since only three of six possible degrees of freedom are defined by pan and zoom, the combination does not provide true 3D navigation. However, the given degrees of freedom suffice for many 3D-like applications.

Zoom: Gazing at a particular region makes it possible to disregard areas of least interest by considering their distance to the point of regard. As a side note, this is the constituting principle behind gaze-contingent displays [Duchowski 2007]. Zooming in forces points that are further away from the point of regard to move more rapidly than those that are closer. By this simple principle, zoom implicitly performs filtering, leaving more space to the regions close to the point of interest. In turn, this allows for easier distinction and selection of the objects that may be of interest. Figure 3 shows an example where the user zooms in on a letter and how this implicitly filters irrelevant regions. Applications benefit from zoom through reduced noise effects, because the signal (object size) to noise ratio is increased. Zooming, distance measures and noise reduction are therefore closely related.
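The signal-to-noise argument can be summarized in a few lines: selection of the focused item becomes unambiguous once its apparent on-screen extent, which grows with the zoom factor, exceeds the spatial extent of the cursor noise. The sketch below is our simplification (the threshold test and the notion of "apparent radius" are assumptions, not the paper's implementation).

```python
def apparent_radius(object_radius_px, zoom):
    """On-screen radius of the focused item after uniform scaling by `zoom`."""
    return object_radius_px * zoom

def selection_is_unambiguous(object_radius_px, zoom, noise_radius_px):
    """True when the zoomed item is large enough to contain the cursor noise."""
    return apparent_radius(object_radius_px, zoom) > noise_radius_px

# Example: a 40 px target under 100 px of cursor noise needs roughly 2.5x zoom.
for zoom in (1.0, 2.0, 2.5, 3.0):
    print(zoom, selection_is_unambiguous(40, zoom, 100))
```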

Figure 4: Displaying context dependent information, i.e. word predictions and special characters (e.g. accented letters like Š). In this way, explicit mode selection is avoided.

Panning: Pan can be defined as planar translation of the information space, allowing navigation both horizontally and vertically on the same scale level. A system relying solely on zoom does not enable the user to explore the immediate context on the same scale level without leaving that particular scale. StarGazer additionally translates the object space so that the point of interest moves towards the center of the screen while zooming. The eyes therefore perform smooth pursuit movements, radially towards the center. When making a mistake, such as zooming towards a wrong letter, the user will automatically try to avoid the obstacle (letter) by navigating towards the region where the desired letter should be. In such cases, the zoom process is replaced by panning in the new direction.


Using Pan and Zoom in StarGazer: Figure 5 shows the StarGazer interface overlaid with a scalable mask defining when to zoom and when to pan. The scaling may depend on gaze tracker noise. In a scene where several objects are present, one of them attracts the attention of the observer. To begin with, the object can be anywhere (little confidence). Before making a selection, the user makes eye movements on the screen to get an overview of where each selectable object is located (saccadic movements). The chosen object or direction will elicit a saccade, which brings it closer to the fovea (increasing confidence). User confidence is still low in this phase, and therefore zoom is not used. Instead, the attended object will pan towards the center, forcing the user to make smooth pursuit eye movements in order to follow it. As with dwell time activation, if the user maintains focus on the object, more confidence will be given to it. Panning the object towards the center is used to avoid pointing and to exploit humans' ability to track objects (smooth pursuits). When an object in focus enters the zoom area (increased confidence), StarGazer zooms into the information space. When the object gets sufficiently close (as defined by a plane of activation), the object gets selected (see Figure 7 (Top)).
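The decision between panning, zooming and selecting can be pictured as a small state machine driven by the attended object's distance from the screen center and its depth relative to the activation plane. The sketch below is our reading of the description above; the region radius and the activation depth are illustrative values, not the paper's parameters.

```python
from dataclasses import dataclass

@dataclass
class FocusedObject:
    dist_from_center: float   # normalized 0..1 distance of the object on screen
    depth: float              # remaining distance in front of the activation plane

def next_action(obj: FocusedObject, zoom_region=0.3, activation_depth=0.05):
    """Choose the interaction step for the currently attended object.

    - Far from the center: pan it towards the center (low confidence,
      smooth-pursuit phase).
    - Inside the central zoom region: zoom towards it (higher confidence).
    - Past the activation plane: select it.
    """
    if obj.depth <= activation_depth:
        return "select"
    if obj.dist_from_center <= zoom_region:
        return "zoom"
    return "pan"

print(next_action(FocusedObject(dist_from_center=0.8, depth=1.0)))   # pan
print(next_action(FocusedObject(dist_from_center=0.2, depth=1.0)))   # zoom
print(next_action(FocusedObject(dist_from_center=0.1, depth=0.01)))  # select
```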


V_p(t) = a (1 + m e^{-t/τ}) / (1 + n e^{-t/τ}) + v_min

Figure 6: The logistic pan velocity V_p plotted against the normalized distance t from the center of the screen, using the default parameters of StarGazer: m = 0, a = 50, n = 1, τ = 0.1, v_min = 9.
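A direct transcription of the pan velocity formula with the stated default parameters is given below for illustration (a sketch; following the plot, t is treated as the normalized distance from the screen center).

```python
import math

def pan_velocity(t, a=50.0, m=0.0, n=1.0, tau=0.1, v_min=9.0):
    """Logistic pan velocity V_p(t) = a*(1 + m*exp(-t/tau))/(1 + n*exp(-t/tau)) + v_min."""
    e = math.exp(-t / tau)
    return a * (1.0 + m * e) / (1.0 + n * e) + v_min

# Velocity rises smoothly as the point of regard moves away from the center.
for t in (0.0, 0.1, 0.2, 0.5, 1.0):
    print(f"t={t:.1f}  V_p={pan_velocity(t):.1f}")
```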

Feedback: The importance of feedback is emphasized in several studies of gaze interaction [Majaranta et al. 2006]. StarGazer uses sound to provide feedback when a selection occurs and colors to display which object will be selected if the user proceeds in an unchanged direction. A white pointer consisting of three concentric circles placed at different depths shows the pan direction (see Figure 2).

Most gaze interactive applications use the cursor position on the screen. The cursor position is calculated (either implicitly or directly) as the intersection of the line of gaze and the monitor. Eye trackers where the geometry of light sources, cameras and monitor is known may explicitly estimate the gaze direction vector [Ohno and Mukawa 2004; Shih et al. 2000; Guestrin and Eizenman 2006; Villanueva et al. 2006]. The direction vector is seldom conveyed to the gaze-based application, since no use has yet been found for it. The direction vector may, however, provide important information when interacting with 3D worlds. For example, the direction in which the user is looking into the 3D world is given explicitly by the gaze vector. As illustrated in Figure 7 (Top), whenever the user looks in a particular direction, StarGazer uses the point of regard on the monitor and the gaze direction to estimate which object is intended for selection. This is done by tracing the gaze direction vector through the 3D space and measuring the distance from the 3D line defined by the direction vector to each object in the currently visible space. The item with the shortest distance is considered the most likely intended item and is highlighted.

Figure 5: Areas of StarGazer performing pan and zoom in the displayed window. The areas can depend on the current noise and viewing angle (head position).
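The nearest-object test described above is a point-to-line distance from each object center to the gaze ray. A sketch of that computation follows; the data structures are ours, and the paper does not specify how ties or empty scenes are handled.

```python
import numpy as np

def nearest_object(origin, direction, object_centers):
    """Return (index of the object center closest to the gaze ray, all distances).

    origin, direction : 3-vectors defining the gaze line (direction need not be
                        normalized); object_centers : (N, 3) array of positions.
    The distance from a point p to the line is ||(p - origin) x d|| / ||d||.
    """
    d = np.asarray(direction, dtype=float)
    p = np.asarray(object_centers, dtype=float) - np.asarray(origin, dtype=float)
    dist = np.linalg.norm(np.cross(p, d), axis=1) / np.linalg.norm(d)
    return int(np.argmin(dist)), dist

# Example: point of regard at the monitor origin, looking straight into the scene.
idx, dist = nearest_object(origin=(0, 0, 0), direction=(0, 0, 1),
                           object_centers=[(1, 0, 5), (0.2, 0.1, 5), (-2, 1, 5)])
print(idx, dist.round(2))   # -> 1, the center closest to the ray
```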

A noisy gaze cursor may seriously disturb the experience of using a gaze application. The most common countermeasure is to smooth the cursor signal, with the unfortunate drawback of losing temporal resolution and losing precision with large movements. StarGazer does not perform any processing of the cursor positions, but the direction pointer in the display is regularized for stable display purposes only.
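For illustration, the common countermeasure mentioned above, smoothing the cursor signal, can be as simple as an exponential filter; its lag after large movements is exactly the drawback noted in the text. This is a generic sketch of that countermeasure, not the display-only regularization StarGazer uses, which the paper does not detail.

```python
def exponential_smoother(alpha=0.2):
    """Return a function that low-pass filters 2D cursor samples.

    Smaller alpha gives a steadier cursor but a longer lag after saccades,
    i.e. the loss of temporal resolution mentioned above.
    """
    state = None
    def smooth(x, y):
        nonlocal state
        if state is None:
            state = (x, y)
        else:
            state = (state[0] + alpha * (x - state[0]),
                     state[1] + alpha * (y - state[1]))
        return state
    return smooth

smooth = exponential_smoother(alpha=0.2)
for sample in [(0, 0), (0, 0), (300, 200), (300, 200), (300, 200)]:
    print(smooth(*sample))   # note how slowly the output approaches (300, 200)
```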

Gaze-based applications should be able to handle both large and fast saccadic movements as well as smooth pursuit movements while considering the noise of the gaze tracker. For example, when the user initially makes a scan to get an overview, the movements may be large and thus the user expects a fast response. While doing smooth pursuits, during e.g. a zoom, both the application and the user are more confident in the selected object. The closer the object in focus gets towards the center of the screen the more the pan velocity should tend toward a minimum. Pan velocity (Vp ) is consequently adapted to the velocity of the point of regard and its distance (t) from the center of the activation (center of the screen). We use the sigmoid function defined and depicted in Figure 6.

4 Tests

The aim of the tests is to reveal the potential of StarGazer in terms of its simplicity for novice users and its ease of use with noisy gaze trackers or on small displays. Can people use 3D navigation principles to interact with well known data structures (the alphabet) by gaze interaction?



4.1 Testing First Time Users

Purpose: The purpose of the first test was to examine whether the layout of StarGazer and its use of pan and zoom for navigation are so intuitive that people can use them after a brief introduction and without prior gaze interaction experience. The possible effects of noise and of screen size are important factors to examine.


Subjects: 48 subjects (32 male, 14 female) volunteered to participate in the test. All subjects had normal or corrected vision. Approximately half of the participants were visitors attending an open-house event, and the remaining were students and faculty members at a local university. None of the participants had any experience with StarGazer, and only a few had tried a gaze tracker before the test. All tests were conducted with the same predefined settings (zoom and pan speed) of StarGazer.

Design: The first test employed a between-subjects 2 × 3 factorial design. The factors were imposed noise (yes/no) and screen size (large, medium or small). Each subject experienced only one of the six combinations.

Figure 7: (Top) Intersecting the gaze direction with the monitor to find the zooming direction and the nearest object in the view. (Bottom) Using the gaze direction vector to adapt the display when viewing the screen from a) the left and b) the right. Notice that the direction pointer is no longer placed in the center of the display.

Task Description: The subjects were seated 50 cm in front of the monitor and asked to type their name (given name and family name) into the test administration system using a standard keyboard. They then rolled a die to decide which of the six conditions they would be tested on. After calibration of the gaze tracker (lasting approximately 10 seconds) the subject was given a short description of StarGazer and two minutes to get acquainted with gaze interaction and StarGazer. Subsequently the subject was asked to type his or her name as quickly and accurately as possible. The test ended with the subject selecting the [Stop] item at the lower right hand corner of StarGazer (see Figure 2). Language modeling was not employed in the test, and there were consequently no word completion or prediction functions offered. The keystrokes per character (KSPC) measure was therefore 1.0 for error-free typing. A video from this test has been made available online [StarGazer 2007].

The different display sizes (in pixels) were:

Large: 1280 × 994
Medium: 640 × 640
Small: 240 × 320 (PDA size)

We made an external program that adds uniform noise within a given radius to the current cursor location, at a certain update rate and with a given latency. In the tests the radius was set to 100 pixels (corresponding to 3 cm) with an update rate of 60 Hz. This had the effect of adding more noise to an inherently noisy gaze tracker. We tested the three display sizes (small, medium and large) on the gaze tracker monitor with and without noise applied to the cursor, resulting in a total of six test conditions.
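The external noise program is described only by its parameters (uniform noise within a 100 pixel radius, a 60 Hz resampling rate, and an optional delay for the latency test). A sketch with those parameters might look like the following; the class and its interface are our assumptions, not the authors' implementation.

```python
import math
import random
from collections import deque

class CursorDisturber:
    """Add uniform spatial noise within `radius` pixels and an optional latency.

    The noise offset is resampled at `update_hz`; latency is simulated by
    returning the cursor position from roughly `latency_ms` milliseconds ago.
    """
    def __init__(self, radius=100, update_hz=60, latency_ms=0):
        self.radius, self.period = radius, 1.0 / update_hz
        self.latency = latency_ms / 1000.0
        self.delay = deque()                  # (timestamp, x, y) history
        self.last_update, self.offset = -math.inf, (0.0, 0.0)

    def _sample_offset(self):
        while True:                           # rejection-sample a point in the disc
            dx = random.uniform(-self.radius, self.radius)
            dy = random.uniform(-self.radius, self.radius)
            if dx * dx + dy * dy <= self.radius ** 2:
                return dx, dy

    def __call__(self, t, x, y):
        """t: time in seconds; (x, y): true cursor position in pixels."""
        if t - self.last_update >= self.period:
            self.offset, self.last_update = self._sample_offset(), t
        self.delay.append((t, x + self.offset[0], y + self.offset[1]))
        while len(self.delay) > 1 and self.delay[1][0] <= t - self.latency:
            self.delay.popleft()              # drop samples older than the latency
        return self.delay[0][1], self.delay[0][2]

disturb = CursorDisturber(radius=100, update_hz=60, latency_ms=200)
print(disturb(0.0, 640, 512))
```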

Equipment: A Tobii ET-1750 eye tracker running on a 1.86 GHz Intel Dual Core PC was used for the tests. The Tobii system consists of a flat panel monitor with a camera and IR light sources integrated in the frame of the monitor. The resolution of the display was 1280 × 1024 pixels. The manufacturer states that the system provides 0.5 degree accuracy and a sampling rate of 50 Hz. StarGazer uses less than 1% of the CPU time, leaving the computations for other purposes (e.g. the gaze tracker). Since the Tobii system does not provide a gaze direction vector, we assumed that the user was sitting in front of the monitor and consequently used a constant gaze direction vector.

Measurements: As is common for gaze typing applications [Majaranta and Räihä 2002], we measured efficiency in words per minute (WPM), where one word is defined as five characters, including space. Errors were measured by two variables: error rate (ER) and remaining errors. We calculated the error rate as the percentage of backspace and undo selections made in relation to the number of characters produced. The remaining errors were measured by the minimum string distance (MSD) [MacKenzie and Soukoreff 2002], defined as the number of editing steps needed to obtain the target string (i.e. the correctly spelled name of the subject) from the produced string.

Results: On average the subjects generated 13.2 characters (σ = 2.7) of text, with a grand mean (over all six test conditions) of 3.47 WPM (σ = 1.42).

A two-way ANOVA shows a main effect of noise, F(1, 47) = 6.2340, p < 0.05, and a main effect of display size, F(2, 47) = 20.5010, p < 0.0001. The data show no interaction effect between size and noise. A Bonferroni post hoc test shows a significant difference between the noise-free condition (µ = 3.83 WPM) and the noisy condition (µ = 3.10 WPM, p < 0.05). The post hoc test also reveals that the WPM for the smallest screen size (µ = 2.17 WPM) is significantly different from both the medium size (µ = 3.85 WPM, p < 0.0001) and the large size (µ = 4.38 WPM, p < 0.000001). The difference between the large and the medium size is not significant. A summary of the results when varying window size with and without added noise is given in Figure 8.
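The three measures defined above can be computed directly from the produced and target strings. The sketch below follows those definitions (five characters per word including space, ER as the share of corrective selections, MSD as a standard edit distance); it is our illustration, not the authors' analysis code.

```python
def wpm(produced: str, seconds: float) -> float:
    """Words per minute with one word defined as five characters, space included."""
    return (len(produced) / 5.0) / (seconds / 60.0)

def error_rate(num_backspace_or_undo: int, num_chars_produced: int) -> float:
    """Corrected errors (ER): corrective selections relative to characters produced."""
    return 100.0 * num_backspace_or_undo / num_chars_produced

def msd(produced: str, target: str) -> int:
    """Minimum string distance: edit steps needed to turn `produced` into `target`."""
    prev = list(range(len(target) + 1))
    for i, a in enumerate(produced, 1):
        cur = [i]
        for j, b in enumerate(target, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]

print(wpm("dan witzner hansen", seconds=75))             # ~2.9 WPM for 18 characters
print(error_rate(2, 18))                                  # ~11.1 %
print(msd("dan witsner hansen", "dan witzner hansen"))    # 1 remaining error
```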

The grand mean of corrected errors (ER) is 12.6% (σ = 21.0%). The two-way ANOVA reveals a main effect of display size, F(2, 47) = 3.6775, p < 0.05, but not of noise. The mean ER for the noise-free condition is 12.3% and the mean ER with noise added is 13.0%. The data show no interaction effect between noise and size. The mean ER for the large display size is 5.6%, which is significantly different from the mean ER for the small display (µ = 23.9%, p < 0.05) but not from the medium display (µ = 8.48%) in a Bonferroni post hoc test. The grand MSD mean is 0.18 (σ = 0.45). The ANOVA shows no effects of display size or noise on MSD.

WPM             | Large (µ/σ) | Medium (µ/σ) | Small (µ/σ)
Added noise     | 4.06/1.09   | 3.60/1.39    | 1.60/0.62
No added noise  | 4.70/0.64   | 4.09/0.70    | 2.70/1.40

Figure 8: Test results in WPM (mean and variance) using different size displays (columns) with and without added noise (rows).
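For readers who want to reproduce this kind of analysis on their own data, a 2 × 3 between-subjects ANOVA of WPM on noise and display size can be run with statsmodels roughly as follows. This is a sketch: the data file and column names are assumed, and the Bonferroni post hoc comparison between display sizes is written out by hand.

```python
import pandas as pd
from scipy import stats
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Assumed layout: one row per subject with columns
# wpm, noise ("yes"/"no") and size ("large"/"medium"/"small").
df = pd.read_csv("stargazer_first_time_users.csv")   # hypothetical file

model = ols("wpm ~ C(noise) * C(size)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))                # main effects + interaction

# Bonferroni-corrected pairwise comparisons between display sizes.
sizes = ["large", "medium", "small"]
pairs = [(a, b) for i, a in enumerate(sizes) for b in sizes[i + 1:]]
for a, b in pairs:
    t, p = stats.ttest_ind(df.loc[df["size"] == a, "wpm"],
                           df.loc[df["size"] == b, "wpm"])
    print(a, b, "corrected p =", min(1.0, p * len(pairs)))
```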

Discussion: The fact that all subjects could write their name with only a small number of remaining errors (cf. the low MSD measure) confirms that the navigation principle of StarGazer is easy for people to understand and use. Imposed noise slows typing speed, since the cursor may activate several items or even move outside the zoom window. Such movements cause the zoom to halt and pan involuntarily. However, the users seldom lose their orientation in StarGazer, and they regain control.

It is worth noting that the noise does not seem to have any effect on the errors. In fact, the difference between the noisy and the noise-free conditions is less than one percent (12.6% versus 13.0%). This indicates that the other important design goal for StarGazer, namely to show high tolerance to noise, is met.

The large display improves control because the noise becomes smaller relative to the selectable items, and the test confirms that window size has an effect on efficiency. The largest display is the fastest to use because it allows for more space between the selectable items, which in turn reduces the need for corrective pan movements. We found that the advantage of larger target size exceeds the disadvantage of having to make longer saccades between objects. The reason is that the additional noise is contained within the selectable item, and consequently zooming reaches the size at which a final selection is made faster. Comparing our grand mean of 3.47 WPM to the typing speed of 7 WPM that [Majaranta et al. 2006] found in a controlled gaze typing experiment might at first seem rather disappointing. However, our test was conducted with novice users who had not previously encountered gaze interaction or navigated StarGazer's circular layout of keys, while [Majaranta et al. 2006] used the traditional qwerty keyboard. On the other hand, our subjects typed text that was familiar to them. To explore how people adapt to 3D gaze navigation we need additional controlled learning experiments.

The full-screen condition obtained an error rate of 5.6%, which compares to the error rate of 4.29% found for novice users using large on-screen keyboards [Hansen et al. 2004]. Using a PDA-sized window increases the error rate to 23.9% (about two errors per name string). This may be too high for a general acceptance of gaze interaction with densely placed items on small displays. However, the present test does not reveal whether people can in fact learn to control the small displays better with practice. More experiments are needed to clarify this.

4.2 Latency Test

Latency in gaze trackers is common but seldom addressed in the literature. We therefore decided to use the name typing task described previously to also test the effect of latency. Three subjects (2 male and 1 female) participated. We progressively delayed the control signal to simulate latency in the gaze tracker. No additional spatial noise was added in this test. The results are shown in Figure 9. Evidently the users could cope with a latency of up to about 200 ms in StarGazer. Even with 400 ms latency the users were still able to type their names, but slowly.

Figure 9: Performance (WPM versus latency in milliseconds) when latency is introduced. The mean performance is displayed with a solid line and the performance of each of the three users is displayed with dashed lines.

4.3 Testing Typing Speed

Purpose and Task Description: In the previous tests the users were not able to adjust the zoom speed to their needs. We therefore conducted a test with seven subjects (six male, one female), asking them to write their names and allowing them to experiment with the zoom speed on a full size display (1280 × 994 pixels). The subjects were allowed to set the zoom speed at the level they believed gave them the best performance (maximizing speed and minimizing errors). Three of the subjects had previous experience with gaze interaction, and all were allowed five minutes of practice with StarGazer to find the speed setting they felt most comfortable with. No additional noise was added to the cursor positions (yielding an accuracy of about one degree).

Results: The subjects achieved an average typing speed of 8.16 WPM (σ = 0.98) and a mean ER of 1.23% (σ = 3.43). In this test the remaining errors (MSD) were zero for all subjects.

Discussion: Five minutes of practice with StarGazer enabled the seven subjects to increase the zooming speed considerably and achieve a typing efficiency comparable to that found by [Majaranta et al. 2006]. This indicates that StarGazer achieves high spatial separation of interactive objects without compromising efficiency. Error measures also seem to remain low in spite of the higher zooming speed.

5 General Discussion

It is particularly important for gaze interactive interfaces to minimize the discrete and sudden appearance of interactive items, since such items may attract a fixation and thus result in an unintended activation. StarGazer demonstrates that this can be reduced by use of well known 3D navigation principles.

In addition to this particular design goal, several general guidelines for user interfaces [Jordan 1999; Shneiderman 1992] are key to our design of StarGazer:

1. Avoid a steep learning curve: The tests presented in this paper convincingly show that novice subjects master the basic gaze navigation of StarGazer within a few minutes.

2. Design the system in such a way that the user is able to easily construct a mental model, preferably one that has similarities with comparable and well-known tasks in other domains: We apply the well-known alphabetic structure in separated groupings to support rote learning of letter positions, and we apply a 3D navigation principle that many users are familiar with from e.g. computer games.

3. Give early, precise and clear feedback on operator errors: The user sees which element is about to be selected from a color coding and hears a click sound whenever a selection is made. The interface gives high priority to the placement of "undo" and "delete" functions. Penalties for errors in navigation are reduced by providing "return-to-start" items that are easy to find and that bring the user back to the initial position (the '@' symbol in Figure 2).

4. The system should be efficient for the task, giving the user the satisfaction of a job well done with a suitable tool: At present the efficiency of making selections compares to other gaze typing systems, but several possibilities for increasing typing speed exist (see below). There is no evidence yet that the use of StarGazer will be more satisfying than e.g. a traditional dwell time keyboard. However, we plan to conduct a set of experiments to examine the subjective user experience when using both types of displays for longer periods.

5. Enable frequent users to use shortcuts: The only shortcut provided in the current version of StarGazer is "return-to-start". By including full word suggestions close to each letter when zoomed in, the user can potentially type more efficiently (see Figure 4). If the full word suggestions become adaptive, a frequent user will soon be able to select words from his or her individual dictionary.

6. Allow for human diversity such as color blindness, left-handedness and mild dyslexia / illiteracy: The present color coding of StarGazer can easily be changed. It is possible to add a module (not yet implemented) that will read letters and words aloud as the user zooms towards them. People with impaired vision may find it particularly helpful that selectable items appear in very large print just before they make the final selection. Finally, while Dasher (and word disambiguation systems like T9) rely on good spelling skills to optimize performance, the StarGazer typing interface allows people to spell as best they can by simply selecting individual characters.

7. Avoid the use of modes: StarGazer is able to display a large number of selectable items as the user zooms in. The subitems can be seamlessly arranged in a context specific manner, thus avoiding the need for explicit mode selections.

We are currently working on improving the predictive power of StarGazer. By using simple uni-grams of letter frequencies we can promote the most likely next character in 3D space, making it appear slightly larger and closer than the more unlikely letters (this is, in effect, similar to an adaptive dwell time activation).

The layout of StarGazer resembles that of pie menus [Kurtenbach and Buxton 1993] and shares similar interaction properties. For the novice, pie menus and StarGazer are designed to be self-revealing; for the expert, they become efficient because of rote learning. This design goal is particularly important for the use of StarGazer on small displays, since the view on the data is limited. By applying the well known alphabetical ordering of the selectable items, the user can direct the pointer to approximately the correct location. As the zooming progresses, the user will be able to clearly read the subelements and make a selection with high precision.

An advantage of StarGazer compared to fixed layouts is that the spatial separation may be controlled dynamically by the current precision of the tracker, expanding target distances uniformly when the precision decreases. Another advantage is that increasing noise does not increase selection errors; it only slows down activation speed. As demonstrated in our test with first time users, this holds for random noise. An automatic correction of constant offset errors on the gaze coordinates is not applied in the tests, but can be activated in StarGazer.

The disadvantage of our approach is that some objects located outside the zoomed regions are invisible and can only be selected through a pan. This may take time when the panning is performed at a low scale (high zoom). To obviate this process, StarGazer provides four "return-to-start" activation items in the center of the display (indicated by the "@" symbol in Figure 2).

6 Conclusion

We have presented StarGazer, a zoomable interface for displaying and selecting data under conditions where the gaze tracker accuracy may be low. The 3D-like navigation through pan and zoom allows for flexible interaction while conveying information about which object is being selected. With gaze controlled zooming the user is not forced to look at a button for a fixed interval to select an object, and thus avoids the annoyance of wasting time. Additionally, zoom implicitly allows the user to regret a choice before a selection is made, without breaking the flow. Future improvements of StarGazer have been outlined above. When implemented, controlled learning experiments with a standard target text will be the next step. Exploring the effect of different layouts of the alphabet seems particularly interesting to us.

Acknowledgements

This work was supported by the European Network of Excellence COGAIN, Communication by Gaze Interaction, funded under the FP6/IST programme of the European Commission.

References

Ashmore, M., Duchowski, A. T., and Shoemaker, G. 2005. Efficient eye pointing with a fisheye lens. In GI '05: Proceedings of Graphics Interface 2005, Canadian Human-Computer Communications Society, Waterloo, Ontario, Canada, 203-210.

Bates, R., and Istance, H. 2002. Zooming interfaces!: enhancing the performance of eye controlled pointing devices. In Assets '02: Proceedings of the fifth international ACM conference on Assistive technologies, ACM, New York, NY, USA, 119-126.

Bates, R., Istance, H., Donegan, M., and Oosthuizen, L. 2005. Fly where you look: Enhancing gaze based interaction in 3D environments. In HCI International - Universal Access in HCI: Exploring New Interaction Environments, Caesars Palace, vol. 7.

Bolt, R. A. 1981. Gaze-orchestrated dynamic windows. In SIGGRAPH '81: Proceedings of the 8th annual conference on Computer graphics and interactive techniques, ACM, New York, NY, USA, 109-119.

Duchowski, A. T. 2007. Eye Tracking Methodology: Theory and Practice. Springer-Verlag New York, Inc.

Fono, D., and Vertegaal, R. 2005. EyeWindows: evaluation of eye-controlled zooming windows for focus selection. In CHI '05: Proceedings of the SIGCHI conference on Human factors in computing systems, ACM, New York, NY, USA, 151-160.

Guestrin, E. D., and Eizenman, M. 2006. General theory of remote gaze estimation using the pupil center and corneal reflections. IEEE Transactions on Biomedical Engineering 53, 6, 1124-1133.

Hansen, D. W., and Pece, A. E. 2005. Eye tracking in the wild. Computer Vision and Image Understanding 98, 1 (April), 182-210.

Hansen, J. P., Tørning, K., Johansen, A. S., Itoh, K., and Aoki, H. 2004. Gaze typing compared with input by head and hand. In ETRA '04: Proceedings of the 2004 symposium on Eye tracking research & applications, ACM, New York, NY, USA, 131-138.

Jacob, R. J. K. 1991. The use of eye movements in human-computer interaction techniques: what you look at is what you get. ACM Transactions on Information Systems 9, 2, 152-169.

Jordan, P. W. 1999. An Introduction to Usability. Taylor & Francis, London.

Kurtenbach, G., and Buxton, W. 1993. The limits of expert performance using hierarchic marking menus. In CHI '93: Proceedings of the INTERACT '93 and CHI '93 conference on Human factors in computing systems, ACM Press, New York, NY, USA, 482-487.

Lankford, C. 2000. Effective eye-gaze input into Windows. In ETRA '00: Proceedings of the 2000 symposium on Eye tracking research & applications, ACM, New York, NY, USA, 23-27.

Li, D., Babcock, J., and Parkhurst, D. J. 2006. openEyes: a low-cost head-mounted eye-tracking solution. In ETRA '06: Proceedings of the 2006 symposium on Eye tracking research & applications, ACM Press, New York, NY, USA, 95-100.

MacKenzie, I. S., and Soukoreff, R. W. 2002. A character-level error analysis technique for evaluating text entry methods. In NordiCHI '02: Proceedings of the second Nordic conference on Human-computer interaction, ACM, New York, NY, USA, 243-246.

Majaranta, P., and Räihä, K.-J. 2002. Twenty years of eye typing: systems and design issues. In ETRA '02: Proceedings of the symposium on Eye tracking research & applications, ACM Press, New York, NY, USA, 15-22.

Majaranta, P., MacKenzie, S., Aula, A., and Räihä, K.-J. 2006. Effects of feedback and dwell time on eye typing speed and accuracy. Universal Access in the Information Society 5, 2, 199-208.

Miniotas, D., Špakov, O., and MacKenzie, I. S. 2004. Eye gaze interaction with expanding targets. In CHI '04: Extended abstracts on Human factors in computing systems, ACM, New York, NY, USA, 1255-1258.

Ohno, T., and Mukawa, N. 2004. A free-head, simple calibration, gaze tracking system that enables gaze-based interaction. In Eye Tracking Research & Applications Symposium 2004, 115-122.

Sarkar, M., Snibbe, S. S., Tversky, O. J., and Reiss, S. P. 1993. Stretching the rubber sheet: a metaphor for viewing large layouts on small screens. In UIST '93: Proceedings of the 6th annual ACM symposium on User interface software and technology, ACM, New York, NY, USA, 81-91.

Shih, S.-W., Wu, Y.-T., and Liu, J. 2000. A calibration-free gaze tracking technique. In Proceedings of the 15th International Conference on Pattern Recognition, 201-204.

Shneiderman, B. 1992. Designing the User Interface: Strategies for Effective Human-Computer Interaction. Addison-Wesley Publishing Company.

StarGazer, 2007. www.youtube.com/watch?v=5iermrjnp50.

Urbina, M. H., and Huckauf, A. 2007. Dwell time free eye typing approaches. In Proceedings of COGAIN: Gaze-based Creativity, Interacting with Games and On-line Communities, 65-70.

Villanueva, A., Cabeza, R., and Porta, S. 2006. Eye tracking: Pupil orientation geometrical modeling. Image and Vision Computing 24, 7 (July), 663-679.

Ward, D. J., and MacKay, D. J. C. 2002. Fast hands-free writing by gaze direction. Nature 418, 6900, 838.