Dissertation Proposal: The Design and Evaluation of Gestures for Pen-Based User Interfaces

A. Chris Long, Jr.

Qualifying Exam Proposal

1. Introduction

The topic of this dissertation proposal is methods for improving the design of gestures and gesture sets for pen-based user interfaces (PUIs) and interaction techniques for using gestures in PUIs. The ubiquity of pen/pencil and paper as a way of recording many types of data shows the promise of PUIs. A significant advantage of PUIs over paper is the ability to easily edit data. Editing commands in PUIs are often invoked through the use of special marks called gestures. A PUI typically has many gestures available for use at the same time. The set of gestures available for use simultaneously is a gesture set.

Unfortunately, it is difficult to design a gesture set for a PUI whose elements can be reliably recognized by the computer and easily learned and remembered by the users. This difficulty is due, at least in part, to the lack of tools for designing good gesture sets. Another challenge for PUI design is that, because PUIs are novel, few interaction techniques exist for them. The results of this research will be new PUI interaction techniques and a tool for the design and evaluation of gesture sets.

The rest of this section motivates the proposed research. The next section describes the research goal. Then related work is discussed. Next, the project plan is outlined.
1.1 Desirability of pen interfaces

Writing on paper with pens has been an important, widely used technology for centuries. It is versatile and can easily express text, numbers, tables, diagrams, and equations [Meye95]. Many authors list benefits that pen-based computer interfaces could enjoy, both on portable computing devices and the desktop [MS90, Meye95, Hann92, Fran95, Brig93]. For example, pens may be used one-handed, are light and silent, and allow fine motor control [Brig93]. Also, commands issued with pens (i.e., gestures) are desirable because they are terse, commonly used, and easier to remember than textual commands [MS90].

Recently, many researchers and corporations have expressed interest in computer interfaces based on pen interaction. Many pen-based Personal Digital Assistants (PDAs) have been introduced, such as the Apple Newton and Sony Magic Link. However, in spite of the potential benefits of pen computing, these products have not been very successful. In part, this market failure is due to poor handwriting and gesture recognition. Some researchers believe that perfect handwriting recognition will never be achieved [Meye95].

Pens have been available for desktop computers for years, but are only commonly used by specialized communities (e.g., artists). One reason pens have not become popular on the desktop is that few application interfaces take advantage of the unique capabilities of pen input.

For the promise of pen interfaces to be fulfilled, it is important for pen-based interfaces to be well-designed. The limitations of recognition accuracy underscore this importance. The proposed research will construct a tool to aid in the design of gestures for pen-based interfaces and develop new interaction techniques that better take advantage of the unique characteristics of PUIs.
1.2 Problems with gestures in current PUIs

One of the great advantages PUIs have over other types of interfaces is gestures, because gestures allow users to concisely specify an operation and an operand with a single stroke (or a small number of them). Ideally, when a user wants to invoke an abstract operation in an application whose PUI has a gesture for that operation, the user will use the corresponding gesture. However, an informal survey of PDA users shows that users often do not use gestures even when they are available and appropriate. There are several reasons why the user may not take advantage of gestures in a PUI. The rest of this subsection describes four hypotheses about why a user might fail to use a gesture.

The first case is that the user may not know that the desired abstract operation is available in the application. This situation is a problem for many applications, not only PUIs, and is not addressed by the proposed research.

The second case is that the user knows the abstract operation is available, but does not know that it can be invoked with a gesture. Among a small number of Newton users surveyed, this was a common problem. The proposed research will address this problem in two ways. One is by using new interaction techniques that will help users learn what gestures are available. Another is through a tool for gesture set design that will warn the designer if a gesture is likely to be difficult to remember. [put reference to later section here?]

The third case is that the user knows a gesture exists to do the desired operation, but cannot remember what it is. Among the Newton users surveyed, this was a significant problem. The proposed research will attack this problem with the same methods as the previous one (i.e., new interaction techniques for learning gestures and a tool for gesture design).

The fourth case is that the user does not wish to use the gesture because it has frequently been misrecognized. A significant fraction of Newton users surveyed reported that they no longer used at least one gesture because of uncertainty that it would be correctly recognized. Misrecognition of gestures by the system is caused by one or more of: poor drawing of the gesture, insufficient training of the system, and poor gesture set design. The proposed research addresses the first of these with interaction techniques that help the user learn the proper way to draw gestures and the latter two with a gesture set design tool.

Some ideas for solving these problems were alluded to above. Details of the proposed approaches to solving these problems are given in the next section.
2. Problem Solutions

The proposed research takes two complementary approaches to solving the above problems with gestures in PUIs. The first is a tool for gesture set design and the second is new interaction techniques. The first subsection below describes the features of the gesture design tool. The second subsection describes the new interaction techniques.
2.1 General gesture set design tool features

This subsection describes how the gesture set design tool will help PUI designers create gesture sets. Some of these methods might be used with more than one type of recognizer, but others are enhancements suitable only for feature-based recognizers. The proposed research focuses primarily on feature-based recognition.

One feature of the design tool is that it will detect gestures that are difficult for the computer to disambiguate. This feature is useful because a gesture set designer cannot always tell if two gestures will be easily confused by the recognizer. This is especially true if the designer does not know details of how the gesture recognizer works.

It is often difficult to generate adequate training data for a recognizer. The proposed tool will allow the designer to specify declaratively certain information about the gesture, such as size- or orientation-independence, and use this information to generate new training instances from those given by the designer.

Some recognition algorithms are based on a set of predefined features of the gestures. For efficiency at run time, it is desirable to have few features, but if too few of the right features are used, recognition accuracy will suffer. The proposed tool will have a library of features and will choose different features for different gesture sets.

Gesture set designers may know little about how easily a gesture set can be learned and remembered by users, but learnability and memorability of the gesture set are very important. The tool will estimate the learnability and memorability of gesture sets.

A few Newton users surveyed indicated they would like to be able to define their own gestures on the fly. Although not all the features of the proposed tool may be appropriate for end users, it could be adapted for use by end users in applications.
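The idea of generating new training instances from declared invariances can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not part of the proposed tool: the function names `transform` and `generate_examples`, the representation of a stroke as a list of (x, y) points, and the scale range of 0.5 to 2.0.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def centroid(stroke: Stroke) -> Point:
    """Return the mean x and mean y of the stroke's points."""
    n = len(stroke)
    return (sum(x for x, _ in stroke) / n, sum(y for _, y in stroke) / n)


def transform(stroke: Stroke, scale: float, angle: float) -> Stroke:
    """Scale and rotate a stroke about its centroid (angle in radians)."""
    cx, cy = centroid(stroke)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    result = []
    for x, y in stroke:
        dx, dy = (x - cx) * scale, (y - cy) * scale
        result.append((cx + dx * cos_a - dy * sin_a,
                       cy + dx * sin_a + dy * cos_a))
    return result


def generate_examples(stroke: Stroke,
                      size_independent: bool,
                      orientation_independent: bool,
                      count: int,
                      seed: int = 0) -> List[Stroke]:
    """Derive `count` new examples from one designer-drawn stroke.

    Only the declared invariances are varied; the scale range is an
    arbitrary assumption chosen for this sketch.
    """
    rng = random.Random(seed)
    examples = []
    for _ in range(count):
        scale = rng.uniform(0.5, 2.0) if size_independent else 1.0
        angle = rng.uniform(0.0, 2.0 * math.pi) if orientation_independent else 0.0
        examples.append(transform(stroke, scale, angle))
    return examples
```

A designer who declares a gesture size-independent but not orientation-independent would call `generate_examples(stroke, True, False, count)`, so the generated examples differ from the original only in scale.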
One of the problems with training feature-based recognizers is that they assume the feature values for all examples of a gesture are similar. If different examples of a gesture are widely different, recognition accuracy is degraded. For example, a gesture that can be drawn at several sizes or orientations might be problematic. For gestures whose example feature values are clustered near several values instead of one, the tool can improve the recognition. The tool will tell the recognizer that the problematic examples actually correspond to different gestures. When one of these gestures is recognized, the categories the recognizer uses can be combined into one application-level category.

A strength of gestures is the ability to specify the operand at the same time as the operation. Often, the operand is specified with a special point on the gesture called the hot spot. Typical hot spots for a gesture are the starting point, ending point, an inflection point, or a point where the gesture intersects itself. The design tool will allow the designer to specify a hot spot for each gesture. [handle multiple hot spots/gesture?]
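The handling of gestures whose examples cluster near several feature values, described earlier in this subsection, can be sketched as follows. This is an illustrative assumption and not the proposed tool's algorithm: the greedy clustering rule, the `radius` parameter, and the `name#index` naming of recognizer-level classes are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def split_into_subclasses(examples: List[Vector], radius: float) -> List[List[Vector]]:
    """Greedy clustering: add each example to the first cluster whose
    current mean lies within `radius`, otherwise start a new cluster."""
    clusters: List[List[Vector]] = []
    for example in examples:
        for cluster in clusters:
            mean = [sum(column) / len(cluster) for column in zip(*cluster)]
            if math.dist(example, mean) <= radius:
                cluster.append(example)
                break
        else:
            clusters.append([example])
    return clusters


def build_training_classes(
    gesture_name: str, examples: List[Vector], radius: float
) -> Tuple[Dict[str, List[Vector]], Dict[str, str]]:
    """Return the training sets keyed by recognizer-level class, plus a map
    from each recognizer-level class back to the application-level category."""
    training: Dict[str, List[Vector]] = {}
    to_application: Dict[str, str] = {}
    for index, cluster in enumerate(split_into_subclasses(examples, radius)):
        subclass = f"{gesture_name}#{index}"
        training[subclass] = cluster
        to_application[subclass] = gesture_name
    return training, to_application
```

The recognizer would be trained on the keys of `training`; when it reports, say, `delete#1`, the application looks the result up in `to_application` and sees only `delete`.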
2.2 New interaction techniques

The second approach of this research to improving PUIs is the invention and validation of new interaction techniques for PUIs. PUIs are sufficiently different from traditional GUIs that the best interaction techniques for GUIs are not the same as those for PUIs. Below are several techniques with which the proposed research will experiment.

One interaction technique is to animate a gesture being drawn to help the user learn it. This could be done when the operation corresponding to the gesture is invoked by other means, to show that a gesture was available for that operation. In showing the gesture to the user, the system should animate it so that the way it should be drawn is apparent.

Another useful interaction technique would be to show near matches when the system misrecognizes a gesture. When a gesture is misrecognized, the user could undo it and try again, but it might be better if the system allowed the user to bring up a menu of several gestures that it thinks are close to the one it incorrectly chose. In a case where the system was very uncertain that it had correctly recognized a gesture, it could display such a list automatically.

Some recognizers can recognize gestures before they are completely drawn. This process is called “eager recognition”. Using eager recognition, an application could give feedback for a gesture before the gesture has been completed. For example, a spiral gesture might indicate zooming in or out. Once enough of the spiral had been drawn to be recognized, the system could begin zooming, and continue zooming as long as the user continued drawing in a circular motion.

There are digitizing tablets that allow drawing with objects other than the pen, such as a finger, and can distinguish between the pen and other things. Although users cannot point as exactly with a finger as with a pen, fine control is not needed for many operations. Also, recognition could be greatly improved since the system would not mistake a finger gesture for data entry.
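The near-match technique described above might be sketched as below. The sketch assumes a recognizer that reports a non-negative score per gesture class with a positive total; the function names, the normalization of scores into confidences, and the 0.6 threshold are hypothetical choices made only for illustration.

```python
from typing import Dict, List, Tuple


def rank_candidates(scores: Dict[str, float], n: int = 3) -> List[Tuple[str, float]]:
    """Return the n best classes, with scores normalized to sum to 1."""
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score / total) for name, score in ranked[:n]]


def interpret(scores: Dict[str, float],
              auto_menu_threshold: float = 0.6,
              n: int = 3) -> dict:
    """Choose the best class, keep the runners-up as near matches, and flag
    whether the near-match menu should be displayed automatically."""
    candidates = rank_candidates(scores, n)
    best_name, best_confidence = candidates[0]
    return {
        "chosen": best_name,
        "alternatives": [name for name, _ in candidates[1:]],
        "show_menu": best_confidence < auto_menu_threshold,
    }
```

When `show_menu` is false, the alternatives would still be available if the user asks for the menu after a misrecognition; when it is true, the system is uncertain enough to offer the list unprompted.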
3. Related Work

The areas most closely related to this work are: gesture-related interaction techniques; applications that take advantage of PUIs; and technology that enables PUIs, such as gesture and handwriting recognition. These areas are discussed in more detail in the following subsections.
3.1 Interaction techniques

A large number of interaction techniques involving gestures have been developed. Henry et al. described a highly customizable GUI toolkit that includes provisions for gesture input and snap-dragging [Henr90].

A selection technique based on pie menus called “marking menus” was introduced in [Kurt91] and refined in [Tapi95]. Marking menus improve on pie menus by delaying the drawing of the menu. Experts do not need to see the entire menu structure to draw a gesture (or mark) to select the item they want, so marking menus are faster than traditional pie menus for experts. The disadvantage of marking menus is that the shape of each mark is arbitrary, since it is derived from the organization of the hierarchical menu. In contrast, the proposed research seeks to design gestures that correspond in some way to the function they invoke.

Pie menus were also enhanced by Venolia and Neiberg [Veno94]. Their interface, called “T-Cube”, is a method of entering text using a pen. Nine pie menus with eight items each are arranged in a target configuration (one in the center with the other eight around it). The items of the menus are the individual characters.

Another way to make text entry better is to improve recognition by using an alphabet that is easier for the computer to understand. This has been done by [Gold93] and [Lee94] to give much greater recognition accuracy.

Gestures are used in a very unusual manner in [Baun94] to edit free-hand drawings. The metaphor in this system is based on how artists clean up drawings. That is, the user sketches close to the curve to be changed and the curve moves toward the new stroke.

Freehand drawing is useful, but structure is also helpful in situations such as note-taking in meetings. A group at Xerox PARC added rudimentary perceptual understanding to a pen-based whiteboard application [Mora95]. Specifically, the system could group items on the electronic whiteboard using alignment. Simple gestures could be used to edit the drawing.

Lopresti and Tomkins advocated treating electronic ink as a first-class datatype in [Lopr95]. They developed a system in which electronic ink is first class and a method of searching for electronic ink.

Kato and Nakagawa showed the benefits of using lazy recognition of both ink and gestures [Kato95]. For example, if one person were editing another person’s document, it would be desirable for the gestures to remain visible and unexecuted so the original author could later decide which changes to make.
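To make the marking-menu idea above concrete, the sketch below maps the direction of a straight mark to one item of a single-level pie menu. It is not taken from [Kurt91] or [Tapi95]; the function name, the convention that item 0 is centered on the positive x axis with items proceeding counterclockwise, and the assumption that y increases upward are all choices made for this illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def select_item(items: Sequence[str], start: Point, end: Point) -> str:
    """Return the pie-menu item whose sector contains the direction of the
    mark drawn from `start` to `end`. No menu needs to be displayed."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    if dx == 0 and dy == 0:
        raise ValueError("mark has no direction")
    sector = 2 * math.pi / len(items)
    angle = math.atan2(dy, dx) % (2 * math.pi)
    # Shift by half a sector so that each item is centered on its direction.
    index = int((angle + sector / 2) // sector) % len(items)
    return items[index]
```

A hierarchical marking menu would apply the same mapping to each straight segment of a multi-segment mark, descending one menu level per segment, which is why the shape of a mark depends on the menu organization rather than on the command's meaning.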
3.2 Applications

This section lists several applications based on PUIs.

Wolf et al. compared the effectiveness of pen input with keyboard in a drawing application, spreadsheet, music editor, and an equation editor [Wolf89]. They found that the pen was most useful in their spreadsheet task, in which editing with a pen was 30% faster than with the keyboard.

Briggs et al. also compared pen and keyboard interfaces for tasks in several office applications [Brig93]. Users in their study liked using a pen for navigation and positional control, but not for text entry. The authors give these advantages of pen input: one-handed, no moving parts, light, silent, fine motor control, direct manipulation, and simple and flexible (at least for some applications).

[Zhao95] describes a traditional drawing program with a pen interface. The interface included gestures and the ability of users to define their own gestures for commands.

Landay shows an interface design tool based on sketching and gestures that is well-suited for pen input [Land95].

Chatty and Lecoanet discuss how pen input is useful for air-traffic control [Chat96]. They point out that the wireless pens in their system make the displays easier to share between multiple users than displays with mouse-controlled cursors.
3.3 Recognition technology

This section presents a sample of recent gesture recognition research.

Rubine invented a single-stroke gesture recognizer that matches gestures based on built-in features [Rubi91]. His recognizer allows new gestures to be added on-the-fly. Also, his recognizer can recognize a gesture as soon as it is unambiguous, even if it is not yet completed (i.e., eager recognition). This is the primary recognizer to be used in the proposed research.

The recognizer developed by Lipscomb is also trainable on-the-fly, but is based on different recognition technology [Lips91]. His recognizer is insensitive to the scale of the gesture and, where appropriate, can recognize mirrored and rotated gestures. It is computationally inexpensive.

Higher-level information can be beneficial in gesture recognition, as [Zhao93] showed. Zhao’s system is based on two recognizers: a low-level one changes point coordinates into symbols, and a high-level one translates those symbols into application-level commands, subject to appropriate contextual constraints.

[Ulge95] describes a more complicated recognizer. It combines feature extraction, fuzzy functions, and neural networks to recognize not only gestures, but also shapes. Like Lipscomb’s recognizer, it is orientation and scale independent.
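As a rough illustration of what feature-based recognition of single strokes involves, the sketch below computes three simple geometric features and classifies a stroke by the nearest class mean. This is a deliberately simplified stand-in and not the recognizer of [Rubi91]: the three features, the nearest-mean rule, and all function names are assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def features(stroke: Sequence[Point]) -> List[float]:
    """Three illustrative features: total path length, distance between the
    first and last points, and length of the bounding-box diagonal."""
    path = sum(math.dist(a, b) for a, b in zip(stroke, stroke[1:]))
    ends = math.dist(stroke[0], stroke[-1])
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    diagonal = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    return [path, ends, diagonal]


def train(examples: Dict[str, List[Sequence[Point]]]) -> Dict[str, List[float]]:
    """Return the mean feature vector of each gesture class."""
    means: Dict[str, List[float]] = {}
    for name, strokes in examples.items():
        vectors = [features(stroke) for stroke in strokes]
        means[name] = [sum(column) / len(vectors) for column in zip(*vectors)]
    return means


def classify(means: Dict[str, List[float]], stroke: Sequence[Point]) -> str:
    """Return the class whose mean feature vector is closest to the stroke's."""
    vector = features(stroke)
    return min(means, key=lambda name: math.dist(vector, means[name]))
```

Because each class is summarized by a single mean vector, this kind of classifier shares the assumption discussed in Section 2.1 that all examples of a gesture have similar feature values.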
4. Project Plans

This section describes the steps of the proposed research.

1. Survey current PDA users. Current users of PDAs will be surveyed to determine what interface problems they have, focusing on the role of gestures.
2. Study causes of user problems. Survey results show that these problems are largely due to computer recognition of gestures, human memory of gestures, and poor interaction techniques.
3. Develop gesture set design tool. Design and build a tool to enable PUI designers to more easily design and evaluate gesture sets.
4. Develop new PUI interaction techniques.
5. Evaluate results. Evaluate gesture set design tool by testing a gesture set before using the tool and after. Test new interaction techniques on users to see how effective they are.
References

[Baun94]
Thomas Baundel. A Mark-Based Interaction Paradigm for Free-Hand Drawing. In Proceedings of the ACM Symposium on User Interface and Software Technology (UIST ’94), pages 185–192. ACM, ACM Press, Nov 1994.
[Brig93]
Robert Briggs, Alan Dennis, Brenda Beck, and Jay Nunamaker, Jr. Whither the Pen-Based Interface? Journal of Management Information Systems, 9(3):71–90, 1992-1993.
[Chat96]
Stéphane Chatty and Patrick Lecoanet. Pen Computing for Air Traffic Control. In Human Factors in Computing Systems, pages 87–94. ACM, Addison-Wesley, Apr 1996.
[Fran95]
Clive Frankish, Richard Hull, and Pam Morgan. Recognition Accuracy and User Acceptance of Pen Interfaces. In Proceedings of ACM SIGCHI ’95, pages 503–510. ACM, Addison-Wesley, Apr 1995.
[Gold93]
David Goldberg and Cate Richardson. Touch-Typing With a Stylus. In Stacey Ashlund, Kevin Mullet, Austin Henderson, Erik Hollnagel, and Ted White, editors, Proceedings of ACM SIGCHI ’93, pages 95–100. ACM SIGCHI, Addison Wesley, Apr 1993.
[Hann92]
Karl-Heinz Hanne and Hans-Jörg Bullinger. Multimedia Interface Design, chapter 8, pages 127–138. ACM Press, 1992.
[Henr90]
Tyson Henry, Scott Hudson, and Gary Newell. Integrating gesture and snapping into a user interface toolkit. In UIST Third Annual Symposium on User Interface Software and Technology, pages 112–122. ACM SIGGRAPH and SIGCHI, ACM Press, Oct 1990.
[Kato95]
Naoki Kato and Masaki Nakagawa. The Design of a Pen-based Interface ’SHOSAI’ for Creative Work. In Yuichiro Anzai, Katsuhiko Ogawa, and Hirohiko Mori, editors, Proceedings of the Sixth International Conference on Human-Computer Interaction, volume 1 of Advances in Human Factors/Ergonomics, pages 549–554. Information Processing Society of Japan, Institute for Electronics, Information and Communication Engineers, Japan Ergonomics Research Society, Public Health Research Center, and The Society for Instrument and Control Engineers, Elsevier Science, Jul 1995.
[Kurt91]
Gordon Kurtenbach and William Buxton. Issues in combining marking and direct manipulation techniques. In UIST Fourth Annual Symposium on User Interface Software and Technology, pages 137–144. ACM SIGGRAPH and SIGCHI, ACM Press, Nov 1991.
[Land95]
James Landay and Brad Myers. Interactive Sketching for the Early Stages of User Interface Design. In Human Factors in Computing Systems, pages 43–50. ACM, Addison-Wesley, Apr 1995.
[Lee94]
Yvonne L. Lee. PDA users Can Express Themselves with Graffiti. InfoWorld, 16(40):30, Oct 3 1994.
[Lips91]
James Lipscomb. A Trainable Gestures Recognizer. Pattern Recognition, 24(9):895–907, Sep 1991.
[Lopr95]
D. Lopresti and A. Tomkins. Computing in the Ink Domain. In Yuichiro Anzai, Katsuhiko Ogawa, and Hirohiko Mori, editors, Proceedings of the Sixth International Conference on Human-Computer Interaction, volume 1 of Advances in Human Factors/Ergonomics, pages 543–548. Information Processing Society of Japan, Institute for Electronics, Information and Communication Engineers, Japan Ergonomics Research Society, Public Health Research Center, and The Society for Instrument and Control Engineers, Elsevier Science, Jul 1995.
[Meye95]
André Meyer. Pen Computing. SIGCHI Bulletin, 27(3):46–90, Jul 1995.
[Mora95]
Thomas Moran et al. Implicit Structures for Pen-Based Systems Within a Freeform Interaction Paradigm. In Proceedings of ACM SIGCHI ’95, pages 487–494. ACM, Addison-Wesley, Apr 1995.
[MS90]
Palmer Morrel-Samuels. Clarifying the distinction between lexical and gestural commands. International Journal of Man-Machine Studies, 32:581–590, 1990.
[Rubi91]
Dean Rubine. Specifying Gestures by Example. In Computer Graphics, pages 329–337. ACM SIGGRAPH, Addison Wesley, Jul 1991.
[Tapi95]
Mark Tapia and Gordon Kurtenbach. Some Design Refinements and Principles on the Appearance and Behavior of Marking Menus. In Proceedings of the ACM Symposium on User Interface and Software Technology (UIST ’95), pages 189–195. ACM, Nov 1995.
[Ulge95]
Figen Ulgen, Andrew Flavell, and Norio Akamatsu. Recognition of On-Line Handdrawn Geometric Shapes by Fuzzy Filtering and Neural Network Classification. In Yuichiro Anzai, Katsuhiko Ogawa, and Hirohiko Mori, editors, Proceedings of the Sixth International Conference on Human-Computer Interaction, volume 1 of Advances in Human Factors/Ergonomics, pages 567–572. Information Processing Society of Japan, Institute for Electronics, Information and Communication Engineers, Japan Ergonomics Research Society, Public Health Research Center, and The Society for Instrument and Control Engineers, Elsevier Science, Jul 1995.
[Veno94]
Dan Venolia and Forrest Neiberg. T-Cube: a fast, self-disclosing pen-based alphabet. In Beth Adelson, Susan Dumais, and Judith Olson, editors, Human Factors in Computing Systems, pages 265–270. ACM SIGCHI, Addison Wesley, Apr 1994.
[Wolf89]
Catherine Wolf, James Rhyne, and Hamed Ellozy. The Paper-Like Interface. In Gavriel Salvendy and Michael Smith, editors, Designing and Using Human-Computer Interfaces and Knowledge Based Systems, volume 12B of Advances in Human Factors/Ergonomics, pages 494–501. Elsevier, Sep 1989.
[Zhao93]
Rui Zhao. Incremental recognition in gesture-based and syntax-directed diagram editors. In Stacey Ashlund, Kevin Mullet, Austin Henderson, Erik Hollnagel, and Ted White, editors, Human Factors in Computing Systems, pages 95–100. ACM SIGCHI, Addison Wesley, Apr 1993.
[Zhao95]
R. Zhao, H.-J. Kaufmann, T. Kern, and W. Müller. Pen-based Interfaces in Engineering Environments. In Yuichiro Anzai, Katsuhiko Ogawa, and Hirohiko Mori, editors, Proceedings of the Sixth International Conference on Human-Computer Interaction, volume 1 of Advances in Human Factors/Ergonomics, pages 531–536. Information Processing Society of Japan, Institute for Electronics, Information and Communication Engineers, Japan Ergonomics Research Society, Public Health Research Center, and The Society for Instrument and Control Engineers, Elsevier Science, Jul 1995.