Hyrskykari, A., Majaranta, P. and Räihä, K.-J. From Gaze Control to Attentive Interfaces. Proc. HCII 2005, Las Vegas, NV, July 2005.

From Gaze Control to Attentive Interfaces

Aulikki Hyrskykari, Päivi Majaranta, Kari-Jouko Räihä
Unit for Computer-Human Interaction (TAUCHI)
Department of Computer Sciences, FIN-33014 University of Tampere, Finland
{aulikki.hyrskykari, paivi.majaranta, kari-jouko.raiha}@cogain.org

Abstract

Interactive applications that make use of eye tracking have traditionally been based on command-and-control. Applications that make more subtle use of eye gaze have recently become increasingly popular in the domain of attentive interfaces that adapt their behaviour based on the visual attention of the user. We provide a review of the main systems and application domains where this genre of interfaces has been used.

1 Introduction

Humans are accustomed to using eye gaze to facilitate conversations with other humans. Gaze can even be used to silently issue commands: staring at the water bottle or salt dispenser for a prolonged period of time is likely to make others at your dinner party take action and provide you with the desired item. This type of use of eye gaze is akin to selection by dwell time in interactive applications that make use of computerized tracking of the direction of gaze. Such command-and-control applications have existed for three decades, particularly in the field of text entry by users with disabilities (Majaranta & Räihä, 2002).

In human-human interaction, a much more common way of utilizing eye gaze is as an indicator of attention for mediating the conversation. The conversation partners can adapt their behaviour based on cues of whether they have the attention of the people they are trying to address. If two people compete for attention, the one that does not get visual attention is more likely to let the other one take the first turn. Attention is a powerful mediator in communication (Horvitz, Kadie, Paek & Hovel, 2003; Roda & Thomas, 2005).

In computer-based tracking, such attentive user interfaces are a much newer phenomenon than command-and-control applications. One of the earliest examples was the shipyard application by Jacob (1991). The approach received attention (e.g., Velichkovsky & Hansen, 1996), but other applications in this genre appeared slowly. Recently the general increase in interest in attentive, non-command interfaces (Lieberman & Selker, 2000) has boosted the development of applications that make use of eye gaze for detecting visual attention, and for adapting the application based on this information. A number of reviews and special issues of journals that focus on this theme have been published (Maglio, Matlock, Campbell, Zhai & Smith, 2000; Porta, 2002; Duchowski, 2002; Duchowski, 2003; Vertegaal, 2003; Ruddarraju et al., 2003; Zhai, 2003; Richardson & Spivey, 2004; Selker, 2004).

There is a reason for the slow emergence of attentive interfaces utilizing eye gaze: designing such applications is challenging (Vertegaal, 2002). The most common problem that needs to be solved is avoiding what Jacob (1991) dubbed the "Midas touch" problem: the application should not react every time the target of the gaze changes, but only in appropriate situations, and at the right moment. Humans are good at detecting when eye gaze demands action, but for computers this is a big challenge. A variety of solutions have been developed (Hansen, Andersen & Roed, 1995; Velichkovsky, Sprenger & Unema, 1997; Yamato, Monden, Matsumoto, Inoue & Torii, 2000; Aoki, Itoh & Hansen, 2005).

With the growing diversity of applications and increasing number of solutions to the design challenges, it becomes useful to have a framework for contrasting the ideas with each other. Here we focus on concrete applications and present a survey based on the domain they are intended for. The domain areas covered are (1) offline, non-interactive applications; (2) devices based on eye detection and eye-contact sensing; (3) gaze-contingent systems, which are further subdivided into (3a) application-independent acceleration of interaction with window systems and (3b) domain-specific applications; (4) conversational systems; and (5) attention-aware agents.
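To make the dwell-time principle concrete, the following minimal Python sketch (our own illustration, not taken from any of the systems cited above) issues a selection only after the gaze has rested on the same target for a fixed threshold; the threshold value and class name are assumptions chosen for the example.

    # Minimal sketch of dwell-time selection (illustrative only; the threshold
    # and the hit-testing strategy are our own assumptions, not taken from any
    # of the systems surveyed in this paper).

    DWELL_THRESHOLD_MS = 600  # reported dwell times in the literature vary widely

    class DwellSelector:
        def __init__(self, threshold_ms=DWELL_THRESHOLD_MS):
            self.threshold_ms = threshold_ms
            self.current_target = None
            self.dwell_start_ms = None

        def update(self, target_id, timestamp_ms):
            """Feed one gaze sample; return the selected target or None.

            target_id is whatever object the gaze point currently hits
            (None if it hits nothing selectable)."""
            if target_id != self.current_target:
                # Gaze moved to a new target: restart the dwell clock.
                self.current_target = target_id
                self.dwell_start_ms = timestamp_ms
                return None
            if target_id is not None and timestamp_ms - self.dwell_start_ms >= self.threshold_ms:
                self.dwell_start_ms = timestamp_ms  # avoid repeated selections ("Midas touch")
                return target_id
            return None

    # Usage: selector.update("salt_dispenser", t) for each gaze sample; a non-None
    # return value means the dwell criterion was met and the action may be issued.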


2 Domains of gaze-enhanced applications

2.1 Exploitation of offline analysis

Although our focus is on interactive applications, there are interesting ways of using eye gaze that are based on modifying an image offline, using data collected while the image was viewed by a user or groups of users. Such applications are related to gaze-contingent displays (Section 2.3), which adapt the view dynamically.

David Wooding (2002) recorded the gaze behaviour of visitors at the Telling Time exhibition at the National Gallery in London while they were viewing artwork. Based on the cumulative fixation data on the points of interest, heatmaps were created to highlight the areas that were of most interest. This produced a compelling visualisation of what catches the eye in those pieces of art. With more than 5000 participants, it is also the largest eye tracking study ever carried out.

Another offline application is the use of eye gaze data for artistic rendering of photographs (DeCarlo & Santella, 2002; Santella & DeCarlo, 2004). This technique produces images that highlight the meaningful elements of an image better than traditional techniques that only use colour segmentation. A third example is the cropping of images for displays that are too small to show all meaningful content at once (Fan, Xie, Ma, Zhang & Zhou, 2003).
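As an illustration of how such cumulative fixation maps can be computed, the Python sketch below bins fixation durations into a coarse grid; the grid size and duration weighting are our own assumptions rather than Wooding's actual procedure.

    # Sketch of a cumulative fixation "heatmap": fixations from many viewers are
    # binned into a coarse grid and weighted by duration. Grid resolution and
    # duration weighting are illustrative assumptions.

    def fixation_heatmap(fixations, image_w, image_h, grid=64):
        """fixations: iterable of (x, y, duration_ms) in image pixel coordinates."""
        cell_w, cell_h = image_w / grid, image_h / grid
        heat = [[0.0] * grid for _ in range(grid)]
        for x, y, duration_ms in fixations:
            col = min(int(x / cell_w), grid - 1)
            row = min(int(y / cell_h), grid - 1)
            heat[row][col] += duration_ms
        peak = max(max(row) for row in heat) or 1.0
        # Normalise to [0, 1] so the map can be rendered as a transparency overlay.
        return [[value / peak for value in row] for row in heat]

    # heat = fixation_heatmap(all_fixations, 1024, 768); cells near 1.0 mark the
    # regions that attracted the most cumulative attention across viewers.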

2.2 Eye detection and eye-contact sensing devices

The following experimental systems demonstrate that even without tracking the user's gaze direction, eyes can enhance the interaction substantially. Simply detecting the presence of eyes or recognising eye contact with a target (a device or another person) gives us a variety of possibilities to establish the desired interaction.

Eye-R

Selker, Lockerd & Martinez (2001) introduced Eye-R, a glasses-mounted, wireless device that is able to detect the user's eye motion, and to store and transfer the information using external IR devices. Eye-R consists of an infrared emitter and a detector positioned between the lens and the eye. In principle, the emitter/detector unit can be mounted on any commonly used pair of eyeglasses. Thus, even without a camera, Eye-R glasses are able to recognize the rough eye behaviour of a user: whether the eye is open or closed, blinking, winking, staring, or gazing around. On the basis of the recognized behaviour the glasses are able to establish communication with a target in the environment. The target may be another pair of Eye-R glasses, or a base station connected to a PC which gathers the information stored in Eye-R. The glasses were successfully used, for example, to send business cards when a person (using the glasses) stares at another person, and to bring up information on a display when a person looks at it (Selker et al., 2001; Selker, 2004).

Eye-Bed

A set of simple natural eye behaviour gestures was also used in Eye-Bed (Selker, Burleson, Scott & Li, 2002; Lieberman & Selker, 2000) for controlling the multimedia scene projected on the ceiling above a bed. An eye tracker was placed in a "lamp arm" over the head of the person in the bed. Cursor control was tried out with different pointing devices; thus, the eye tracker was not used to control selection of objects on the projected image. Instead, natural eye behaviours, such as closing and opening the eyes, gazing around, staring at one place, and nervous blinking, were used to adapt the presented images to the observed state of the user's attention.

Eye-R applied the idea of informing the surrounding objects that the user is paying attention to them. However, without access to information about the direction of gaze it is difficult to determine the correct target object from several candidates. Furthermore, measuring the gaze direction of a moving subject is complicated. A partial solution to these issues, developed in the Human Media Lab at Queen's University, is introduced below. The EyeContact sensor was developed for identifying not the exact direction of gaze, but rough eye contact with an object.

Wearable EyeContact sensor, Attentive Cell Phone

The technique for detecting eye contact relies on two main ideas. Firstly, it utilizes two sets of on- and off-axis LEDs (aligned on the same axis as the camera, or off it) sending synchronized infrared light beams towards the eye to produce both bright and dark pupil effects (Morimoto, Koons, Amir & Flickner, 2000). The technique facilitates robust detection of eyes from a large-scale camera view.
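The bright/dark pupil idea can be illustrated with a small Python sketch that subtracts the two alternately illuminated frames and keeps the pixels where the difference is large; the threshold and the plain-Python frame representation are illustrative assumptions, not the actual sensor processing.

    # Sketch of the bright/dark pupil technique: two frames are captured with the
    # on-axis and off-axis IR illuminators lit in alternation; subtracting them
    # leaves the pupils as the brightest blobs. The threshold and nested loops
    # are illustrative simplifications.

    def pupil_candidates(bright_frame, dark_frame, threshold=40):
        """Frames are 2-D lists of grey values (0-255) of equal size.

        Returns (row, col) positions whose bright/dark difference exceeds the
        threshold, i.e. likely pupil pixels."""
        candidates = []
        for r, (bright_row, dark_row) in enumerate(zip(bright_frame, dark_frame)):
            for c, (b, d) in enumerate(zip(bright_row, dark_row)):
                if b - d > threshold:
                    candidates.append((r, c))
        return candidates

    # Clustering the returned pixels (e.g. by connected components) would give one
    # blob per visible pupil; tracking those blobs over time yields the eye positions.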


The researchers at Queen's University combined the technique with the insight that the common tracking of the corneal reflection point can be simplified by detecting only eye contact with the camera, disregarding the other positions of the eye. That is, when the corneal reflection point is located near the pupil centre, the eyes are looking at the camera. Based on these two ideas, Vertegaal, Dickie, Sohn & Flickner (2002) implemented an EyeContact sensor with an embedded camera capable of finding the pupils in its field of view. When placed in close proximity to the user's eyes, the device is able to detect whether the user is in eye contact with another person. The concept was utilized to design a scenario of an Attentive Cell Phone. If the user is engaged in a conversation, this information may be passed to the caller, or the phone can adjust the normal ringing sound of the incoming call to a less interruptive notification mode. The intensity of attending to a conversation can be deduced from the speech activity via microphones, but since conversation is a reciprocal activity, silence does not necessarily imply that the user is not socially committed. Hence, sensing the ongoing eye contact gives valuable additional information about the user's state of attention.

ECSGlasses, EyeBlog, Attentive Messaging Service

Eye-Contact Sensing Glasses (ECSGlasses; Dickie et al., 2004b) are a next-generation version of the wearable EyeContact sensor. The camera and the off- and on-axis IR LEDs are now embedded into glasses. The on-axis illuminators producing the bright pupil effect are positioned around the camera on the bridge of the nose, and the off-axis illuminators residing near the arms of the glasses produce the dark pupil effect. ECSGlasses detect when a person is looking at the wearer of the glasses, and they include a camera for recording video from the wearer's perspective. ECSGlasses were exploited in the implementation of a revised version of the Attentive Cell Phone (Shell et al., 2004), EyeBlog, and AMS. EyeBlog (Dickie et al., 2004a) is an eye-contact aware video recording and publishing system that is able to automatically record face-to-face conversations. AMS (Attentive Messaging Service; Shell et al., 2004) can communicate the availability or absence of the user to the "buddies" on the user's buddy list.

EyeContact sensor devices, EyePliances, iLights

In many cases giving voice commands to digital household devices would be a natural way of interacting with the surrounding technology. Addressing a target device has been recognized as one of the essential communication challenges for future human-computer interaction (Bellotti et al., 2002). It has also been shown that subjects tend to establish natural eye contact with the target object of a command (Maglio et al., 2000). The basic idea behind several experimental applications developed at Queen's University is that instead of making the eye tracker wearable for a freely moving user, remote eye trackers are placed in the devices which we want to be eye sensitive. When an EyeContact sensor is placed in a digital household appliance, the appliance is able to detect whether a user is attending to it. Thus, it removes the need for indirect referencing via naming and lets the user address commands directly to the device. Furthermore, the limited vocabulary of available commands for the addressed device helps the system to sort out ambiguities and the errors of speech recognition. Examples of such EyePliances include eye-sensitive lights and an attentive television. In addition to using the information that the user is attending to a device, the lack of attention can also be used as a valuable information source. An example of this is a VCR which pauses when the user turns away from it to answer a phone call (Shell, Vertegaal & Skaburskis, 2003). iLights (Kembel, 2003), a combined attention and gesture control interface for a device (implemented for lights in this case), shares the idea of EyePliances. However, in iLights the attention is not derived from precise eye contact; instead, iLights recognize an IR light beam sent by IR LEDs attached to the frames of the eyeglasses worn by the user. Hence, a connection with a device is opened when the user faces an iLights device.
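A minimal Python sketch of the EyePliances-style addressing scheme described above: a spoken command is routed only to the appliance that currently reports eye contact, and each appliance accepts only its own small vocabulary. The class and function names are hypothetical; this is not the Queen's University implementation.

    # Sketch of attention-based command addressing: the spoken command goes to
    # whichever appliance currently has eye contact, and each appliance only
    # accepts words from its own small vocabulary.

    class Appliance:
        def __init__(self, name, vocabulary):
            self.name = name
            self.vocabulary = set(vocabulary)   # small per-device command set
            self.has_eye_contact = False        # updated by its EyeContact sensor

        def handle(self, command):
            if command in self.vocabulary:
                print(f"{self.name}: executing '{command}'")
                return True
            return False

    def route_command(command, appliances):
        """Send the command to the attended appliance only; ignore it otherwise."""
        for appliance in appliances:
            if appliance.has_eye_contact:
                return appliance.handle(command)
        return False  # nobody was being looked at; the speech was not addressed to us

    lamp = Appliance("lamp", {"on", "off", "dim"})
    tv = Appliance("television", {"on", "off", "mute", "pause"})
    lamp.has_eye_contact = True
    route_command("on", [lamp, tv])   # only the lamp reacts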

2.3 Gaze-contingent applications

In gaze-contingent applications the information about the focus of the user's visual attention is used to alter the on-screen view presented to the user. Displays adapting their resolution to match the user's gaze position are generally called gaze-contingent (multiresolutional) displays. The main motivation to degrade the resolution of peripheral image regions is to minimize overall display bandwidth requirements. Those systems are not further discussed in this paper (for reviews, see Baudisch, DeCarlo, Duchowski & Geisler, 2003; Reingold, Loschky, McConkie & Stampe, 2003).

In this section we divide gaze-contingent applications into two subclasses. The first consists of general solutions for accelerating the interaction with windowing systems using eye input, and the second class of applications uses the user's eye behaviour in domain-specific ways.
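Although multiresolutional displays are not discussed further here, the basic policy can be sketched in a few lines of Python: render each image tile at a resolution that falls off with its distance from the gaze point. The tile radii and resolution steps are arbitrary illustrative values, not taken from any particular system.

    # Sketch of a gaze-contingent multiresolutional display policy: image tiles
    # far from the measured gaze point are rendered at lower resolution to save
    # bandwidth. The three-step falloff is an illustrative choice.

    import math

    def tile_resolution(tile_center, gaze_point, full_res=1.0):
        """Return the fraction of full resolution to use for one tile."""
        dx = tile_center[0] - gaze_point[0]
        dy = tile_center[1] - gaze_point[1]
        eccentricity = math.hypot(dx, dy)          # distance from gaze, in pixels
        if eccentricity < 100:
            return full_res                        # foveal region: full detail
        if eccentricity < 300:
            return full_res / 2                    # near periphery: half resolution
        return full_res / 4                        # far periphery: quarter resolution

    # Re-evaluating this for every tile on each new gaze sample yields the
    # "degrade the periphery" behaviour described above.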


2.3.1 Enhancing generic GUI interaction

The obvious idea of using gaze as a substitute for the mouse quickly collides with the well-known problems of eye tracking: the Midas touch problem and the positional tolerance of the measured point of gaze (e.g., Jacob, 1995; Bates & Istance, 2003). The twofold role of the mouse in conventional interfaces is to act as a pointing device for assigning a target location (cursor positioning) and to select an action at the assigned position (clicking). The Midas touch problem manifests itself in both cases. If gaze is used to control the cursor position, the cursor cannot be left at a position on screen while the visual attention is momentarily targeted at another (on- or off-screen) target. If gaze is used as a selection device, the timing becomes a problem. Dwell time, even though usable in some situations, can generate wrong selections and make the user feel uncomfortable: avoiding unintended selections prevents the user from relaxed browsing.

The other big question is the positional tolerance, i.e. the inaccuracy of the measured point of gaze. The selectable objects in normal windowing systems (menus, toolbar icons, scrollbars, etc.) are too small for gaze selection. Moreover, according to Bates & Istance (2003), the simpler and cheaper technique of using a head mouse beats the performance of an eye mouse in real-world tasks (word processing and web browsing).

Does this mean that studying eye input for accelerating general interaction is not worth the effort? We do not think so. An indisputable advantage of eye tracking over any other input device is that the gaze point carries information about the focus of the user's visual attention. One conceivable solution to the problem of positional tolerance is to zoom the interface gadgets large enough for gaze selection. A straightforward zooming of the elements at the point of gaze does not necessarily work very well (Bates & Istance, 2002), but some experimental systems demonstrate that developing special solutions for eye input may improve present interfaces (e.g., Pomplun, Ivanovic, Reingold & Shen, 2001; Ohno, 1998). Two examples of such innovations are briefly introduced below.

Magic Pointing

In pointing tasks the eyes always move first to the target, and the cursor then follows. However, using the point of gaze directly to position the cursor at a desired location is tedious due to the positional tolerance discussed above. Magic Pointing combines the strengths of two input modalities: the speed of the eye and the accuracy of the hand. The gaze location only defines a dynamic "home" position for the cursor. Thus, when the user is about to point and select a target, the cursor is already "automatically" in the vicinity of the target (Zhai, Morimoto & Ihde, 1999). Using the mouse as a clutch for the selection also avoids the Midas touch problem.

EyeWindows

The idea of controlling the selection of the active task window with gaze was presented already by Bolt ("Gaze-Orchestrated Windows"; Bolt, 1985) and Jacob ("Listener Windows"; Jacob, 1991). Tests performed on EyeWindows (Fono & Vertegaal, 2005) show that developments in eye tracking technology now make the idea viable for real use. Controlling task windows seems to be especially suitable for gaze: windows are large enough objects to diminish the problem of positional tolerance, and the technique frees the hands for managing the window contents (often involving text input via the keyboard). Using the eyes to indicate the focus window but letting the user perform the actual selection with a key press proved to work better than trying to activate the selection merely by gaze.
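The division of labour in Magic Pointing can be sketched in Python as follows: the cursor is warped ("homed") to the neighbourhood of the gaze point when manual movement begins far from it, and the hand then does the fine positioning. The activation condition and distances below are our own simplifications, not the exact MAGIC algorithm.

    # Sketch of MAGIC-style pointing: warp the cursor near the gaze point when
    # the hand starts moving far from it; otherwise let the mouse move the
    # cursor normally for accurate final positioning.

    class MagicPointer:
        def __init__(self, warp_distance=200):
            self.warp_distance = warp_distance  # only warp if cursor is far from gaze
            self.cursor = (0, 0)

        def on_mouse_move(self, dx, dy, gaze):
            """Called on each physical mouse movement; gaze is the latest gaze point."""
            gx, gy = gaze
            cx, cy = self.cursor
            if abs(gx - cx) + abs(gy - cy) > self.warp_distance:
                # Hand just became active far from the gaze target: "home" the
                # cursor near the gaze point instead of moving it incrementally.
                self.cursor = (gx, gy)
            else:
                # Close to the target: normal relative mouse motion for accuracy.
                self.cursor = (cx + dx, cy + dy)
            return self.cursor

    pointer = MagicPointer()
    pointer.on_mouse_move(2, 1, gaze=(800, 450))   # warps near the gaze point
    pointer.on_mouse_move(5, -3, gaze=(800, 450))  # fine adjustment by hand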

2.3.2 Domain-specific applications

Eye input can provide invaluable application-specific information. In most cases the information is simply the user's focus of attention in the application. In some cases the information may be more than just the instantaneous focus: an interpretation of gaze behaviour in the contents of the application window over a longer period of time. The early Little Prince application, in which the user's gaze path was used to drive the narration of the story, is an example of such applications (Starker & Bolt, 1990). Also, for example, Ramloll, Trepagnier, Sebrechts & Finkelmeyer (2004) exploited the eye behaviour of autistic children in an application aiming to reinforce appropriate gaze behaviour in them. We will describe one application, iDict, in more detail, but before that we briefly introduce two related applications. All of the following three applications deal with tracking a user's reading path.


Translation Support System, Reading Assistant

The eye-movement enhanced Translation Support System (Takagi, 1998) assists in the task of translating text from Japanese into English. The system analyzes eye movements during the translation, detects patterns in them and responds appropriately. For example, when the user scans through a Japanese-English translation corpus, the system automatically removes the corpus results already scanned and continuously retrieves new information. If the user pauses due to hesitation, the system finds keywords in the sentence under focus and retrieves new corpus results.

Reading Assistant (Sibert, Gokturk & Lavine, 2000) uses eye gaze to trigger auditory prompting for remedial reading instruction. The application follows the user's gaze path and highlights the words of the text as the reading proceeds from word to word. As soon as the program notices hesitation, it speaks out the word. It provides unobtrusive assistance to help the user with recognition and pronunciation of words. Like iDict, the Reading Assistant exploits knowledge of how the gaze usually behaves during reading.

iDict

iDict is our experiment in using natural eye behaviour to make the interface more aware of the user's state (Hyrskykari, Majaranta, Aaltonen & Räihä, 2000; Hyrskykari, Majaranta & Räihä, 2003; Hyrskykari, 2005). It is a gaze-aware environment aiming to intensify the reading of electronic documents by non-native readers (Figure 1). Normally, when reading text documents in a foreign language, unfamiliar words or phrases cause the reader to interrupt the reading and get help from either printed or electronic dictionaries. In both cases the process of reading and the line of thought get interrupted. After the interruption, getting back to the context of the text takes time, and the interruption may even affect the comprehension of the text being read.

Figure 1: Tracking the eye, interpreting reading paths, and the text document being read are the three essential elements of iDict.

Figure 2: A gloss is given proactively between the lines of the document in the Text Frame. On the basis of a gaze gesture, a whole dictionary entry is fetched into the Dictionary Frame. iViewX™ was one of the trackers used in the implementation.

In iDict the reader's eyes are tracked and the reading path is analyzed in order to detect deviations from the normal path of reading, indicating that the reader may need help with the words or phrases being read at the time. The assistance is provided to the reader on two levels (see Figure 2). First, when a probable comprehension difficulty is detected, the reader gets a gloss (an instant translation) for the word. The gloss is positioned right above the problematic word to allow a convenient quick glance at the available help. The gloss is the best guess for the translation on the spot. It is deduced from the syntactic and lexical features of the text combined with information derived from embedded dictionaries. For example, to decide the Finnish gloss for the word "regale" in Figure 2, iDict is able to parse the base form of the word, and the linguistic analysis of the text provides iDict with the information that the word is a transitive verb. Despite this informed choice among the alternative translations, the gloss is not always correct, nor is it necessarily the only possible one. If the user is not satisfied with the gloss, a gaze gesture denoting an attention shift to the Dictionary Frame on the right makes the whole dictionary entry appear there.

Thus, the user's eye movements during reading are used in iDict to infer the user's point of attention. The gloss is provided proactively on the basis of several indicators, with the total time spent on a word being the dominant factor for triggering the gloss. During evaluations, it was interesting to note that many of the users quickly learned to actively trigger the help by prolonging their gaze on words for which they wanted help (Hyrskykari et al., 2003). Hence, while in principle iDict makes use of the reader's natural eye behaviour, after understanding how the application works the readers were able to use their eyes consciously to make the application behave as they wished. The same goes for the dictionary entries in the Dictionary Frame: an entry appears in the frame automatically when the reader looks at it, but the look can also be seen as a conscious command to consult the dictionary.
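iDict's two levels of help, as described above, can be summarised in a small Python sketch: accumulated gaze time on a word triggers the gloss, and a gaze shift into the Dictionary Frame fetches the full entry. The threshold, region names, and dictionary stub are illustrative assumptions; iDict itself combines several indicators.

    # Sketch of two-level, gaze-triggered reading help in the spirit of iDict.
    # Thresholds and the dictionary contents are placeholders, not iDict's data.

    GLOSS_THRESHOLD_MS = 900  # illustrative; iDict combines several indicators

    class IDictSketch:
        def __init__(self, dictionary):
            self.dictionary = dictionary      # word -> (gloss, full_entry)
            self.time_on_word = {}
            self.last_glossed = None

        def on_gaze(self, region, word, dt_ms):
            """region is 'text' or 'dictionary_frame'; word is the fixated word."""
            if region == "dictionary_frame" and self.last_glossed:
                # Gaze gesture toward the Dictionary Frame: show the whole entry.
                return ("full_entry", self.dictionary[self.last_glossed][1])
            if region == "text" and word in self.dictionary:
                t = self.time_on_word.get(word, 0) + dt_ms
                self.time_on_word[word] = t
                if t >= GLOSS_THRESHOLD_MS:
                    self.last_glossed = word
                    return ("gloss", self.dictionary[word][0])
            return None

    idict = IDictSketch({"regale": ("(Finnish gloss here)", "(full dictionary entry here)")})
    # Repeated idict.on_gaze("text", "regale", 300) calls eventually return the gloss;
    # idict.on_gaze("dictionary_frame", None, 0) then returns the full entry.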

2.4 Conversational systems

As noted above, eye gaze facilitates conversations with other humans in face-to-face situations. Gaze-aware conversational systems provide information about the direction of gaze in situations where the conversation occurs remotely, through technical devices. Gaze provides information about the attention of the participants, whether directed at another participant or at an object. As demonstrated in the examples below, attentional gaze cues facilitate the conversation significantly in both cases.

RealTourist and iTourist

Qvarfordt (2004) conducted an experiment to study eye-gaze patterns in natural (remote) dialogues. RealTourist provides information about where the user's visual attention is targeted on the computer screen to a tourist consultant who assists the tourist remotely. Both the tourist and the consultant see the same map displayed on their screens. In addition, the consultant sees the tourist's eye-gaze position superimposed on her screen. The RealTourist experiment showed that eye-gaze information helped in resolving references to objects and in determining how interested the tourist was in them. Gaze provided cues about when it was suitable to switch topics. Information about the visual attention helped in interpreting unclear statements and in establishing a common ground: the consultant was able to assess whether the tourist had understood instructions and to make sure that they were talking about the same object.

iTourist (Qvarfordt & Zhai, 2005) is an attentive application that provides tourist information about an imaginary city, Malexander. It shows a map and photos of different places, and provides pre-recorded verbal information about them. iTourist makes assumptions about what the user is interested in based on the user's eye-gaze patterns, and then adapts its output accordingly. The iTourist user study showed that, despite occasional mistakes, most of the time iTourist was able to tell the users about places they really were interested in. This indicates that visual attention is a powerful cue for detecting the user's interest.

EyeGuide

EyeGuide (Eaddy, Blaskó, Babcock & Feiner, 2004) assists a traveller who is looking at a subway map. It combines lightweight wearable eye and head tracking gear. EyeGuide detects when the user appears to be lost. Based on the information about the user's point of gaze, it provides spoken hints to help the user navigate the map (e.g. "Look to the far right"). The system preserves the user's privacy by whispering the instructions via an earphone. By combining the information about what the user's goal is and what the user is currently looking at, EyeGuide can provide contextual information (e.g., "Exit here for the JFK airport").

Gaze-aware video-conferencing

Eye contact is typically lost in video conferencing, because the camera and the screen are not aligned. Only if someone looks directly at the camera will the image of the person on screen seem to look at the viewers. Gemmell, Toyama, Zitnick, Kang & Seitz (2000), as well as Jerald & Daily (2002), manipulated the real-time video image by rendering a modified image of the eyes upon the original video image. The idea was that after the manipulation, the eyes seemed to look in the correct direction, creating an illusion of eye contact. A real video stream is considered better than, for example, an animated avatar because real video transmits facial expressions and eye blinks as they appear.
GAZE, GAZE-2

GAZE (Vertegaal, 1999) and GAZE-2 (Vertegaal, Weevers & Sohn, 2002) are attentive video-conferencing systems that convey the eye contact of the participants of the conference. The users meet in a virtual 3D meeting room, where each member's image (an "avatar") is displayed in a separate video panel. The direction of each user's eye gaze is tracked and each user's image is then rotated towards the person he or she is looking at. In addition, a lightspot is projected onto the surface of the shared table to indicate what (e.g. which document) the user is looking at. The colour of the lightspot corresponds to the colour of the frame of that user's image. The lightspot helped to resolve references to the objects ("look at this"). GAZE only showed animated snapshots of the participants, whereas GAZE-2 uses live video.


GAZE-2 also optimizes the bandwidth of the streaming media based on the participants' visual interest. For example, if all the other participants are looking at the person who is currently talking, that person's image is broadcast at a higher resolution.
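The attention-driven bandwidth optimization of GAZE-2 can be illustrated with a Python sketch that assigns each participant's video stream a resolution according to how many others are currently looking at them; the three resolution tiers are our own illustrative choice, not the actual GAZE-2 policy.

    # Sketch of attention-driven bandwidth allocation: the participant that most
    # others are looking at gets the highest-resolution stream.

    from collections import Counter

    def allocate_resolutions(looking_at):
        """looking_at maps each participant to the participant they currently look at
        (or None). Returns a resolution label per participant."""
        viewers = Counter(target for target in looking_at.values() if target is not None)
        resolutions = {}
        for person in looking_at:
            n = viewers.get(person, 0)
            if n >= 2:
                resolutions[person] = "high"    # centre of joint attention
            elif n == 1:
                resolutions[person] = "medium"
            else:
                resolutions[person] = "low"     # nobody is looking: save bandwidth
        return resolutions

    print(allocate_resolutions({"ann": "bob", "bob": "ann", "carol": "bob", "dave": "bob"}))
    # {'ann': 'medium', 'bob': 'high', 'carol': 'low', 'dave': 'low'}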

2.5 Attention-aware agents

When people listen or talk to each other, they typically look at each other. Artificial attentive agents can also greatly benefit from information about gaze direction, as demonstrated in the examples below.

FRED, Look-to-Talk

FRED (Vertegaal, Slagter, van der Veer & Nijholt, 2001) is a multi-agent conversational system in which the artificial agents are attentive to the users' eye gaze direction. By combining information from gaze and speech data, each agent is able to determine when it is spoken to, or when it should listen to the user. Similarly, Look-to-Talk (Oh et al., 2002) uses the information about gaze direction to help decide when to activate automatic speech recognition. An artificial agent (Sam) knows that he is spoken to when the human participant looks at him. In the experiment, the users preferred the perceptual look-to-talk interface over a more conventional push-to-talk interface where they had to push a button to indicate that they were talking to the agent.

SUITOR

The Simple User Interest Tracker, SUITOR (Maglio & Campbell, 2003), is an attentive information system in which agents track the user's attention through multiple channels: keyboard input, mouse movements, web browsing, and gaze behaviour. Information from the eye gaze direction is used to determine where on the screen the user is reading. SUITOR uses the information to determine what the user might be interested in, and automatically finds and displays potentially relevant information. Suggestions are displayed in a timely but unobtrusive manner in a scrolling display at the bottom of the screen.

Attentive toys

One application class that can be considered a kind of agent is attentive toys. Haritaoglu et al. (2001) point out that machines would be more powerful if they had even a small fraction of humans' ability to perceive, integrate, and interpret visual and auditory information. Bearing this in mind, they implemented a VTOY robot (later referred to as PONG; Koons & Flickner, 2003), which is capable of deciding when to start engaging with a human partner, and of maintaining eye contact with the partner. A similar attempt to sense human attention was Ernesto Arroyo's dog, which barks when attended to (Selker, 2004).
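Several of the systems above, from Look-to-Talk to the attentive toys, share a simple pattern: react only while the user's gaze rests on the agent. The Python sketch below gates a (stubbed) speech recogniser on that condition, with a short grace period so utterances are not cut off; the names and the grace period are assumptions for the example, not any system's actual parameters.

    # Sketch of a look-to-talk gate: speech recognition is only active while the
    # user's gaze rests on the agent, replacing an explicit push-to-talk button.

    GRACE_MS = 500  # keep listening briefly after gaze leaves, so sentences aren't cut off

    class LookToTalkGate:
        def __init__(self):
            self.listening = False
            self.last_contact_ms = None

        def update(self, gaze_on_agent, timestamp_ms):
            if gaze_on_agent:
                self.last_contact_ms = timestamp_ms
                self.listening = True
            elif self.listening and timestamp_ms - self.last_contact_ms > GRACE_MS:
                self.listening = False
            return self.listening

    gate = LookToTalkGate()
    for t, looking in [(0, True), (100, True), (400, False), (800, False)]:
        if gate.update(looking, t):
            pass  # feed the current audio frame to the speech recogniser here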

3 Conclusions

The eye is a perceptual organ; eyes are not meant for controlling. On the other hand, when people act, they look at the target of the action, be it an object they are using or a person they are talking to. Eyes indicate the context for the action.

In this paper, we surveyed research in the area of attentive interfaces. Attentive interfaces exploit the information of the user's eye gaze in a natural way. Devices (such as EyePliances) will react to commands automatically but will not interrupt the user at a critical moment. Attentive communication devices (e.g. attentive phones or messaging systems) can also communicate the user's availability to others. Turn taking in conversations with agents, or devices, is more fluent when both actors are able to detect whether the other is paying attention to them. Awareness of visual attention helps to establish common ground: what we are talking about (conversational systems, agents). Eyes are extremely fast. Hence, even though the eyes are not good for commanding, they can accelerate interaction by providing tools and assistance where they are needed (Magic Pointing, iDict, EyeWindows).

Thus, there are numerous ways in which eyes can enhance the interaction. The pool of applications is already large and rapidly growing. Adding gaze-awareness to everyday devices should be especially useful for people with motor disabilities who may not be able to use their hands to operate them. On the other hand, people with vision-related disabilities might be even worse off; alternative ways of interacting with the devices should be provided. We hope this review will help developers in contrasting new application ideas with past work.


Acknowledgments

This work was supported by the European Network of Excellence COGAIN, Communication by Gaze Interaction, funded under the FP6/IST programme of the European Commission.

References

Aoki, H., Itoh, K., & Hansen, J. P. (2005). Learning to type Japanese text by gaze interaction in six hours. In HCI International (these proceedings).
Bates, R., & Istance, H. (2002). Zooming interfaces!: Enhancing the performance of eye controlled pointing devices. In Proceedings of the Fifth International ACM Conference on Assistive Technologies (ASSETS 2002) (pp. 119-126). New York: ACM Press.
Bates, R., & Istance, H. O. (2003). Why are eye mice unpopular? A detailed comparison of head and eye controlled assistive technology pointing devices. Universal Access in the Information Society, 2, 280-290.
Baudisch, P., DeCarlo, D., Duchowski, A. T., & Geisler, W. S. (2003). Focusing on the essential: Considering attention in display design. Communications of the ACM, 46 (3), 60-66.
Bellotti, V., Back, M., Edwards, W. K., Grinter, R. E., Henderson, A., & Lopes, C. (2002). Making sense of sensing systems: Five questions for designers and researchers. In Proceedings of Human Factors in Computing Systems (CHI'02) (pp. 415-422). New York: ACM Press.
Bolt, R. A. (1985). Conversing with computers. Technology Review, 88, 35-43.
DeCarlo, D., & Santella, A. (2002). Stylization and abstraction of photographs. In Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH 2002) (pp. 769-776). New York: ACM Press.
Dickie, C., Vertegaal, R., Fono, D., Sohn, C., Chen, D., Cheng, D., Shell, J. S., & Aoudeh, O. (2004a). Augmenting and sharing memory with eyeBlog. In Proceedings of the 1st ACM Workshop on Continuous Archival and Retrieval of Personal Experiences (pp. 105-109). New York: ACM Press.
Dickie, C., Vertegaal, R., Shell, J. S., Sohn, C., Cheng, D., & Aoudeh, O. (2004b). Eye contact sensing glasses for attention-sensitive wearable video blogging. In Extended Abstracts of Human Factors in Computing Systems (CHI 2004) (pp. 769-770). New York: ACM Press.
Duchowski, A. T. (2002). A breadth-first survey of eye tracking applications. Behavior Research Methods, Instruments, & Computers (BRMIC), 34, 455-470.
Duchowski, A. T. (2003). Eye Tracking Methodology: Theory and Practice. London: Springer-Verlag.
Eaddy, M., Blaskó, G., Babcock, J., & Feiner, S. (2004). My own private kiosk: Privacy-preserving public displays. In Proceedings of the Eighth IEEE International Symposium on Wearable Computers (ISWC 2004) (pp. 132-135). Washington: IEEE Computer Society Press.
Fan, X., Xie, X., Ma, W.-Y., Zhang, H.-J., & Zhou, H.-Q. (2003). Visual attention based image browsing on mobile devices. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME 2003) (pp. 53-56). Washington: IEEE Computer Society Press.
Fono, D., & Vertegaal, R. (2005). EyeWindows: Evaluation of eye-controlled zooming windows for focus selection. In Proceedings of Human Factors in Computing Systems (CHI'05) (to appear). New York: ACM Press.
Gemmell, J., Toyama, K., Zitnick, L. C., Kang, T., & Seitz, S. (2000). Gaze awareness for video-conferencing: A software approach. IEEE Multimedia, 7 (4), 26-35.
Hansen, J. P., Andersen, A. W., & Roed, P. (1995). Eye-gaze control of multimedia systems. In Y. Anzai, K. Ogawa & H. Mori (Eds.), Symbiosis of Human and Artifact. Proceedings of the 6th International Conference on Human-Computer Interaction (HCII 1995) (pp. 37-42). Amsterdam: Elsevier.
Haritaoglu, I., Cozzi, A., Koons, D., Flickner, M., Zotkin, D., Duraiswami, R., & Yacoob, Y. (2001). Attentive toys. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME 2001) (pp. 1124-1127). Washington: IEEE Computer Society Press.
Horvitz, E., Kadie, C., Paek, T., & Hovel, D. (2003). Models of attention in computing and communication: From principles to applications. Communications of the ACM, 46 (3), 52-59.
Hyrskykari, A. (2005). Utilizing eye movements: Overcoming inaccuracy while tracking the focus of attention during reading. Computers in Human Behavior (to appear). Amsterdam: Elsevier.


Hyrskykari, A., Majaranta, P., Aaltonen, A., & Räihä, K.-J. (2000). Design issues of iDict: A gaze-assisted translation aid. In Proceedings of the Symposium on Eye Tracking Research & Applications (ETRA'00) (pp. 9-14). New York: ACM Press.
Hyrskykari, A., Majaranta, P., & Räihä, K.-J. (2003). Proactive response to eye movements. In G. V. M. Rauterberg, M. Menozzi, & J. Wesson (Eds.), Human-Computer Interaction (INTERACT'03) (pp. 129-136). Amsterdam: IOS Press.
Jacob, R. J. K. (1991). The use of eye movements in human-computer interaction techniques: What you look at is what you get. ACM Transactions on Information Systems (TOIS), 9, 152-169.
Jacob, R. J. K. (1995). Eye tracking in advanced interface design. In W. Barfield & T. A. Furness (Eds.), Virtual Environments and Advanced Interface Design (pp. 258-288). New York: Oxford University Press.
Jerald, J., & Daily, M. (2002). Eye gaze correction for videoconferencing. In Proceedings of the Symposium on Eye Tracking Research & Applications (ETRA 2002) (pp. 77-81). New York: ACM Press.
Kembel, J. A. (2003). Reciprocal eye contact as an interaction technique. In Extended Abstracts of Human Factors in Computing Systems (CHI'03) (pp. 952-953). New York: ACM Press.
Koons, D., & Flickner, M. (2003). PONG: The attentive robot. Communications of the ACM, 46 (3), 50 (sidebar).
Lieberman, H. A., & Selker, T. (2000). Out of context: Computer systems that adapt to, and learn from, context. IBM Systems Journal, 39, 617-631.
Maglio, P. P., & Campbell, C. S. (2003). Attentive agents. Communications of the ACM, 46 (3), 47-51.
Maglio, P. P., Matlock, T., Campbell, C. S., Zhai, S., & Smith, B. A. (2000). Gaze and speech in attentive user interfaces. In Proceedings of the Third International Conference on Advances in Multimodal Interfaces (ICMI 2000) (pp. 1-7). New York: Springer.
Majaranta, P., & Räihä, K.-J. (2002). Twenty years of eye typing: Systems and design issues. In Proceedings of the Symposium on Eye Tracking Research & Applications (ETRA'02) (pp. 15-22). New York: ACM Press.
Morimoto, C., Koons, D., Amir, A., & Flickner, M. (2000). Pupil detection and tracking using multiple light sources. Image and Vision Computing, 18, 331-335.
Oh, A., Fox, H., Van Kleek, M., Adler, A., Gajos, K., Morency, L.-P., & Darrell, T. (2002). Evaluating look-to-talk: A gaze-aware interface in a collaborative environment. In Extended Abstracts of Human Factors in Computing Systems (CHI'02) (pp. 650-651). New York: ACM Press.
Ohno, T. (1998). Features of eye gaze interface for selection tasks. In Proceedings of the 3rd Asia Pacific Computer Human Interaction Conference (APCHI'98) (pp. 176-181). Washington: IEEE Computer Society Press.
Pomplun, M., Ivanovic, N., Reingold, E. M., & Shen, J. (2001). Empirical evaluation of a novel gaze-controlled zooming interface. In Proceedings of HCI International 2001. Mahwah, NJ: Lawrence Erlbaum.
Porta, M. (2002). Vision-based user interfaces: Methods and applications. International Journal of Human-Computer Studies, 57, 27-73.
Qvarfordt, P. (2004). Eyes on multimodal interaction. Dissertation No. 893, Ph.D. thesis, Linköping University.
Qvarfordt, P., & Zhai, S. (2005). Conversing with the user based on eye-gaze patterns. In Proceedings of Human Factors in Computing Systems (CHI'05) (to appear). New York: ACM Press.
Ramloll, R., Trepagnier, C., Sebrechts, M., & Finkelmeyer, A. (2004). A gaze contingent environment for fostering social attention in autistic children. In Proceedings of the Symposium on Eye Tracking Research & Applications (ETRA'04) (pp. 19-26). New York: ACM Press.
Reingold, E. M., Loschky, L. C., McConkie, G. W., & Stampe, D. M. (2003). Gaze-contingent multiresolutional displays: An integrative review. Human Factors, 45, 307-328.
Richardson, D. C., & Spivey, M. J. (2004). Eye-tracking: Research areas and applications. In G. E. Wnek & G. L. Bowlin (Eds.), Encyclopedia of Biomaterials and Biomedical Engineering (pp. 1-10). New York: Marcel Dekker.
Roda, C., & Thomas, J. (2005). Attention aware systems. In C. Ghaoui (Ed.), Encyclopaedia of HCI (to appear). Hershey, PA: IDEA Group.
Ruddarraju, R., Haro, A., Nagel, K., Tran, Q. T., Essa, I. A., Abowd, G., & Mynatt, E. D. (2003). Perceptual user interfaces using vision-based eye tracking. In Proceedings of the 5th International Conference on Multimodal Interfaces (ICMI'03) (pp. 227-233). New York: ACM Press.
Santella, A., & DeCarlo, D. (2004). Visual interest and NPR: An evaluation and manifesto. In Proceedings of the 3rd International Symposium on Non-Photorealistic Animation and Rendering (NPAR 2004) (pp. 71-78). New York: ACM Press.


Selker, T. (2004). Visual attentive interfaces. BT Technology Journal, 22, 146-150.
Selker, T., Burleson, W., Scott, J., & Li, M. (2002). Eye-Bed. In Proceedings of the Workshop on Multimodal Resources and Evaluation, at the Third International Conference on Language Resources and Evaluation (LREC 2002) (pp. 71-76).
Selker, T., Lockerd, A., & Martinez, J. (2001). Eye-R, a glasses-mounted eye motion detection interface. In Extended Abstracts of Human Factors in Computing Systems (CHI'01) (pp. 179-180). New York: ACM Press.
Shell, J. S., Vertegaal, R., Cheng, D., Skaburskis, A. W., Sohn, C., Stewart, A. J., Aoudeh, O., & Dickie, C. (2004). ECSGlasses and EyePliances: Using attention to open sociable windows of interaction. In Proceedings of the Symposium on Eye Tracking Research & Applications (ETRA'04) (pp. 93-100). New York: ACM Press.
Shell, J. S., Vertegaal, R., & Skaburskis, A. W. (2003). EyePliances: Attention-seeking devices that respond to visual attention. In Extended Abstracts of Human Factors in Computing Systems (CHI'03) (pp. 770-771). New York: ACM Press.
Sibert, J. L., Gokturk, M., & Lavine, R. A. (2000). The Reading Assistant: Eye gaze triggered auditory prompting for reading remediation. In Proceedings of the 13th Annual ACM Symposium on User Interface Software and Technology (UIST'00) (pp. 101-107). New York: ACM Press.
Starker, I., & Bolt, R. A. (1990). A gaze-responsive self-disclosing display. In Proceedings of Human Factors in Computing Systems (CHI'90) (pp. 3-9). New York: ACM Press.
Takagi, H. (1998). Development of an eye-movement enhanced translation support system. In Proceedings of the Asian-Pacific Computer-Human Interaction Conference (APCHI'98) (pp. 114-119). Washington: IEEE Computer Society Press.
Velichkovsky, B., & Hansen, J. P. (1996). New technological windows into mind: There is more in eyes and brains for human-computer interaction. In Proceedings of Human Factors in Computing Systems (CHI'96) (pp. 496-503). New York: ACM Press.
Velichkovsky, B., Sprenger, A., & Unema, P. (1997). Towards gaze-mediated interaction: Collecting solutions of the "Midas touch problem". In Proceedings of the IFIP TC13 International Conference on Human-Computer Interaction (pp. 509-516). London: Chapman & Hall.
Vertegaal, R. (1999). The GAZE groupware system: Mediating joint attention in multiparty communication and collaboration. In Proceedings of Human Factors in Computing Systems (CHI'99) (pp. 294-301). New York: ACM Press.
Vertegaal, R. (2002). Designing attentive interfaces. In Proceedings of the Symposium on Eye Tracking Research & Applications (ETRA'02) (pp. 23-30). New York: ACM Press.
Vertegaal, R. (2003). Attentive user interfaces. Communications of the ACM, 46 (3), 30-33.
Vertegaal, R., Dickie, C., Sohn, C., & Flickner, M. (2002). Designing attentive cell phone using wearable eye-contact sensors. In Extended Abstracts of Human Factors in Computing Systems (CHI'02) (pp. 646-647). New York: ACM Press.
Vertegaal, R., Slagter, R., van der Veer, G., & Nijholt, A. (2001). Eye gaze patterns in conversations: There is more to conversational agents than meets the eyes. In Proceedings of Human Factors in Computing Systems (CHI 2001) (pp. 301-308). New York: ACM Press.
Vertegaal, R., Weevers, I., & Sohn, C. (2002). GAZE-2: An attentive video conferencing system. In Extended Abstracts of Human Factors in Computing Systems (CHI'02) (pp. 736-737). New York: ACM Press.
Wooding, D. S. (2002). Fixation maps: Quantifying eye-movement traces. In Proceedings of the Symposium on Eye Tracking Research & Applications (ETRA'02) (pp. 31-36). New York: ACM Press.
Yamato, M., Monden, A., Matsumoto, K., Inoue, K., & Torii, K. (2000). Button selection for general GUIs using eye and hand together. In Proceedings of the 5th International Working Conference on Advanced Visual Interfaces (AVI 2000) (pp. 270-273). New York: ACM Press.
Zhai, S. (2003). What's in the eyes for attentive input. Communications of the ACM, 46 (3), 34-39.
Zhai, S., Morimoto, C., & Ihde, S. (1999). Manual and gaze input cascaded (MAGIC) pointing. In Proceedings of Human Factors in Computing Systems (CHI'99) (pp. 246-253). New York: ACM Press.
