Multimodal Menu Presentation and Selection in Immersive Virtual Environments

Namgyu Kim, Gerard Jounghyun Kim, Chan-Mo Park, Inseok Lee†, and Sung H. Lim†
Dept. of Computer Science and Engineering and †Dept. of Industrial Engineering
Pohang University of Science and Technology (POSTECH)
San 31, Hyoja-dong, Pohang, Kyungbuk, Korea
[email protected]

Abstract

Usability has become one of the key ingredients in making virtual reality (VR) systems work, and a big part of a usable VR system is the design of effective interface/interaction schemes. During the past few years, several empirical studies have been conducted to evaluate the usability of generic interaction techniques in the context of virtual environment design. As a continued effort in this line of research, we investigate the usability of various menu presentation and multimodal selection schemes in immersive virtual environments. It is often required to issue commands in a virtual environment, and when there are a large number of available commands with a complex structure, a hierarchical menu has been one of the most popular choices. While the task of menu selection may be viewed as a composite task of manipulation and selection, and we can therefore certainly build on the prior research in generic interaction tasks, there are also enough idiosyncrasies to warrant a more in-depth look at menu selection as a generic task of its own. We have identified 5 major menu display methods and 13 possible menu selection methods; among them, we have finished the usability testing for the first two (tracking/button and gesture/gesture) interaction methods across the 5 display methods. In this paper, we explain how we arrived at the classification, how we set up and ran the usability testing, and report on the results we have obtained from testing the two interaction methods.

1. Introduction

Usability has become one of the most important aspects of making virtual reality (VR) systems succeed for a given application, and a big part of a usable VR system is the design of effective interface/interaction schemes.

During the past few years, several empirical studies have been conducted to evaluate the usability of generic interaction techniques such as navigation, selection, and manipulation in the context of virtual environment design [4, 6, 14]. While these results are valuable for designing interaction methods, such a collective effort still falls short of forming a mature interaction/interface design guideline for VR. It is not clear how these research results apply to designing interfaces for a particular task, application, or even a composite generic task. The reason may be that there are simply not enough studies yet to settle on a definite set of guidelines, due to the many factors that must be considered, including the types of interaction or display devices, types of domain tasks, display resolution, frequency of the primitive tasks, number of hands used, etc. As a continued effort in this line of research, we investigate the usability of various menu presentation and multimodal selection schemes in immersive virtual environments. It is often required to issue commands in a virtual environment, and when there are a large number of available commands with a complex structure, a hierarchical menu has been one of the most popular choices in both 2D and 3D environments. A typical example is a computer-aided design system in which many functionalities are available through a large number of commands with up to three to four hierarchical levels. On top of this, the task of design requires the user to use the menu system continually. AutoCAD, for instance, has a few hundred available commands with typical depths of two to three, and an average task (e.g. making a block with a hole) might require three to four commands [2]. The menu system is also the most familiar 2D computer interface that we know of, and it would be most beneficial for computer users to have it extended into immersive 3D environments.

Therefore, while the task of menu selection may be viewed as a composite task of manipulation and selection (or as simple 2D navigation), and thus we can certainly build on the prior research in generic interaction tasks, it should be studied in more depth as one generic VE task because of its aforementioned importance, frequent usage, and other idiosyncrasies (e.g. virtual space visibility, showing of the command path). In our study, we first reclassified various 2D and 3D menu presentation styles in the context of immersive VE and identified 5 major menu display methods appropriate for the usability testing. We believe that the menu display occupies a significant portion of the visual display, and thus is an important element in the overall interaction closed loop (i.e. see, select, see, select, ...). Then, by viewing the menu selection task as a composite task of “positioning” (manipulation) and “making a command”, and by assigning different interaction modalities to each subtask, we identified 13 possible menu selection methods after eliminating those deemed trivially infeasible or insignificant. Among them, we have finished the usability testing for the first two (tracking/button and gesture/gesture combination) interaction methods across the 5 display methods. The main objective of the usability test was to collect quantitative data such as the task completion time (for positioning and making a command) and accuracy (error rate) under the different menu interaction schemes, and by analyzing them to derive principles for immersive menu system design for VR systems. Qualitative data were also collected by means of a survey that addressed user preference and interaction naturalness. This paper is organized as follows. In Section 2, we review some of the research results relevant to this work. In Sections 3 and 4, we explain how we classified the different menu display and multimodal selection methods, and how we generated the final candidates for the usability testing. Section 5 describes the experiment set-up and discusses the results. Finally, in Section 6, we conclude the paper with a summary.

2. Related Work

Effective selection styles and structure designs of 2D desktop menus have been investigated extensively in the Human Computer Interaction (HCI) field [9, 15]. Many HCI studies attest to the convenience of menu systems in the 2D desktop environment. However, whether these merits of 2D menus carry over to 3D VEs is not clear, because the methods of displaying and interacting with virtual menus in 3D are quite different. Nevertheless, 3D virtual menus have been implemented and used by many, employing various methods of menu invocation, locating, selection, and highlighting [12, 8, 1].

Jacoby et al. designed a hierarchical menu structure with pop-up and pull-down 3D virtual menus and employed four hand gestures for selecting the menu items [12]. The use of a 3D widget, a virtual instrument with menu items, was proposed as a menu selection method by Ferneau et al. [8]. The device used for menu selection was a 3D mouse, and the buttons of the 3D mouse had to correspond to the items of the virtual instrument. This approach assumed that the abilities of the physical device match those of the virtual one. A technique of embedding a general 2D interaction metaphor in the 3D VE was considered by Angus et al. [1]. They focused on the implementation problems of importing flat-screen applications into VR environments without having to modify the application source code. However, their approach could result in performance degradation from the use of large textures and in inconsistent texture resolution at different view positions. Feiner et al. built 2D windows for a 3D augmented reality environment, providing transparent window displays relative to the user’s head and body [7]. This system provided three kinds of windows: surround-fixed, display-fixed, and world-fixed. A surround-fixed window is displayed at a fixed position in the world, and a display-fixed window is positioned at a fixed location relative to the user’s head orientation. World-fixed windows are fixed to locations or objects in the world and are invokable by the user. Our classification of menu displays is much influenced by this work. While a virtual menu is most commonly implemented as a 2D flat object, Liang et al. introduced the 3D object menu concept in their interactive 3D modeling system, JDCAD [13]. Their 3D ring menu, which used 3D objects instead of 2D flat items, was easy to use with minimal depth ambiguity through its daisy mechanism. Problems exhibited in 3D object selection are also manifested in 3D menu selection, an analogy suggested and examined by [6]. In that study, the effects of different placements of the menu bar, font type and size, menu availability, the use of highlighting, and the depth and breadth of the menu were all considered, and several design guidelines were derived: the use of at most 8 menu items in 18-point type, highlighting of selected items, and the discretionary use of context-sensitive menus. Bowman et al. used virtual menus in their applications “Virtual Venue” and “VR Gorilla Exhibit” using the “pen and tablet” metaphor [3]. Bowman reported results that highlight the efficiency of using 2D menus in 3D environments instead of employing direct actions upon world objects through multiple metaphors. Most work related to virtual menus did not include usability tests. On the other hand, selection, manipulation, and navigation techniques in VEs have been investigated by many [4, 11, 14]. These works give guidelines and methods for testing interaction techniques. However, menu selection is different from the general selection task because of the menu’s 2D appearance and its predetermined selection

mechanism. Our study considers menu selection as a conceptually generic task composed of the two subtasks of “positioning” and “making a command”.

3. Menu Presentation

3.1. Location and View Direction

In 3D environments, unlike in 2D, we must first carefully consider where to locate the menu system within the world, which in turn determines the user’s viewing direction toward the menu. There are three possibilities we have considered. Figure 1 illustrates the three variants.

Figure 1. Variants of VE menu display by location.

- World Fixed (WF): The menu system resides at a fixed, “strategic” world location.
- View Fixed (VF): The menu system is attached to, and viewed from, a fixed offset from the user (thus, it moves with the “head-tracked” user).
- Object Fixed (OF): The menu system is attached to one or more “strategic” virtual objects (which may move over time).

WF allows a relatively comprehensive display of the overall menu structure and menu selection history (because it is located at a strategic location away from where the task is being carried out), while with VF and OF, a more compact menu display must be used so as not to block the task area. This is especially true in an immersive environment where head-mounted displays are used, as most HMDs suffer from low resolution and a narrow field of view. The next section addresses this design issue.
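To make the three placement policies concrete, the following minimal sketch computes the menu’s world transform each frame under each policy. It is our illustration rather than the paper’s implementation; the 4x4 pose conventions and the offset values are assumptions.

```python
import numpy as np

def translation(x, y, z):
    """Homogeneous 4x4 translation matrix."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

def menu_world_pose(mode, head_pose=None, object_pose=None):
    """Menu's 4x4 world transform for one rendered frame.

    mode        -- "WF", "VF", or "OF"
    head_pose   -- 4x4 world transform of the tracked head (for VF)
    object_pose -- 4x4 world transform of the attached object (for OF)
    """
    if mode == "WF":
        # Fixed at a "strategic" world location, away from the task area.
        return translation(0.0, 1.5, -3.0)
    if mode == "VF":
        # Rigid offset from the head, so it follows the head-tracked user.
        return head_pose @ translation(0.0, -0.2, -0.8)
    if mode == "OF":
        # Rigid offset from a (possibly moving) virtual object.
        return object_pose @ translation(0.0, 0.3, 0.0)
    raise ValueError(mode)
```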

3.2. HMD Resolution and Display Items

This study aims at studying the effectiveness of various menu systems in an immersive, head-tracked virtual environment and assumes the use of an HMD. Although the situation is changing fast, it will still be some time before even HMDs with relatively low resolutions (e.g. 640 x 480) and wide FOVs become widely available and affordable. With such a limitation, it is difficult to display a large amount of information at once in one display frame (e.g. letters on the menu are not recognizable if displayed at small scales), a potential problem for a hierarchical menu system with many available commands and levels.
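To see why compactness matters, a back-of-the-envelope check relates HMD resolution and FOV to the angular size text needs to stay legible. The numbers below are our illustrative assumptions, not measurements from the paper.

```python
# Rough legibility check for menu text on a low-resolution HMD.
h_res = 640          # horizontal pixels (assumed)
h_fov = 60.0         # horizontal field of view in degrees (assumed)

pixels_per_degree = h_res / h_fov      # ~10.7 px/deg here
min_glyph_px = 8                       # ballpark minimum for readable glyphs

min_glyph_deg = min_glyph_px / pixels_per_degree
print(f"{pixels_per_degree:.1f} px/deg -> glyphs need >= {min_glyph_deg:.2f} deg")
# At roughly 0.75 deg per glyph, a deep hierarchy displayed all at once
# quickly exhausts the frame, motivating the compact variants below.
```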

The following are variants of menu display methods, ranging from the usual to the compact, that address this problem. Figure 2 pictorially depicts these alternatives.



- Pull-down (PD): The usual pull-down menu that displays the highest-level menu items and shows its branches only during the selection task.
- Pop-up (PU): The usual pop-up menu that disappears once the selection is made. The menu structure associated with the particular menu selection path is shown only on the user’s invocation.
- Stack Menu: A menu system that persistently displays the selection path, either at the top portion of the pop-up menu (disappearing once the selection is done), called the Fixed Stack (FS), or at a separate location (e.g. at the corner of the screen), called the Basket Stack (BS). Only the menu options selectable at the given level are shown (a minimal sketch of this bookkeeping follows the list).
- Object-Specific Menu: This menu system is a conglomerate of the pull-down/pop-up and object-fixed menu schemes. Each object contains a specific pull-down (OPD) or pop-up (OPU) menu applicable to that object class; thus the menu structure is distributed among the virtual objects in the scene.
- Oblique/Layered (OL): A flat menu presentation displayed in an oblique fashion, or with its structure organized and displayed in layers.

Figure 2. Variants of VE menu display by items.
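The bookkeeping behind the stack variants amounts to a selection-path stack over a menu tree. The sketch below is our own reading of FS/BS with hypothetical labels: only the current level’s options are exposed, while the path display persists.

```python
class MenuNode:
    """A node in a hierarchical menu: a command label plus child items."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

class StackMenu:
    """Stack-style menu: only the current level's options are displayed,
    while the selection path so far is shown persistently (FS: at the top
    of the pop-up; BS: at a separate screen location such as a corner)."""
    def __init__(self, root):
        self.path = [root]              # selection path, root first

    def visible_items(self):
        return [c.label for c in self.path[-1].children]

    def path_display(self):
        return " > ".join(n.label for n in self.path[1:]) or "(top)"

    def select(self, label):
        for c in self.path[-1].children:
            if c.label == label:
                self.path.append(c)     # descend; the stack records history
                return
        raise KeyError(label)

# Example: a two-level fragment of a piping-style menu (labels hypothetical).
menu = MenuNode("root", [MenuNode("Create", [MenuNode("Box"), MenuNode("Pipe")]),
                         MenuNode("Help")])
m = StackMenu(menu)
m.select("Create")
print(m.path_display(), "|", m.visible_items())   # Create | ['Box', 'Pipe']
```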

3.3. Final Candidates for Experiment

Some of the menu display schemes identified according to the respective criteria go hand in hand with one another. For instance, as already indicated, OPD/OPU are possible only under OF. Table 1 summarizes the two main criteria we have used in classifying the various menu display methods and the reasons why some of the combinations were eliminated from the final candidates for the experiment.

                     Location and view direction
Display items        WF       VF       OF
PU                   ✓        R2       R3
PD                   ✓        R2       R3
FS                   R1       ✓        R1
BS                   R1       ✓        R1
OPU                  R3       R2       R3
OPD                  R3       R2       R3
OL                   R1       ✓        R1

Table 1. A classification of menu display methods for VE. The combinations marked with “✓” are the ones selected for the usability testing, while the others were excluded for the following reasons.



- Reason 1 (R1): In WF or OF, there is less constraint on limiting the number of display items to maximize the visibility of the task area, because in the WF case the menu is located away from the task area, and in the OF case the user is quite aware of the task (and the task to be done), as a specific object has been selected.
- Reason 2 (R2): This reason is complementary to R1: in VF, the display must be kept compact so as not to block the task area. It is also the very reason behind the design of the Oblique/Layered menu system.
- Reason 3 (R3): In terms of the selection mechanism, we believe that there is little difference between WF and OF; the only difference is that in OF the menus might move, and the menu items may differ slightly from object to object. Since our focus is on the interaction (although the display method is part of it), we decided to effectively merge WF and OF into one group.

The five final candidates for menu display methods, to be tested in conjunction with the interaction methods, are (see Figure 3):


- WF-PD: World Fixed, Pull-down
- WF-PU: World Fixed, Pop-up
- VF-FS: View Fixed, Fixed Stack
- VF-BS: View Fixed, Basket Stack
- VF-OL: View Fixed, Oblique/Layered

4. Multimodal Menu Interface: Positioning and Making a Command

We mentioned earlier that the menu selection task is composed of two primitive subtasks: positioning and making a command. In this section, we describe the classification of menu selection methods according to three different modal input methods for each of these two subtasks. For positioning, we considered tracking, gesture, and voice; for making a command, we considered button input, gesture, and voice to signal the final yes/no decision. Tracking simply tracks the user’s hand or a metaphorical object to designate a desired menu item and is considered a continuous, event-driven modality. On the other hand, voice and gesture recognition, which allow users to directly speak the menu item (e.g. start, enter, escape) or make positioning or command gestures (e.g. next, previous), are discrete, event-driven modalities. Table 2 shows the overall interaction scheme classification. The “X” marks the infeasible combinations for the selection task. For instance, the tracking and voice combination under the column of zero hands is simply impossible (i.e. tracking requires the use of one hand). We note that the labeled entries merely represent combinations that might be possible; further investigation might eliminate some of the combinations. For instance, tracking and gesture with one hand might not be an appropriate combination, as the positioning task would be affected by the gesture-making task.

Positioning               Making a command (discrete event-driven modality)
(num. of hands ->)        Button                Gesture               Voice
                          0     1     2         0     1     2         0     1     2
Tracking (continuous)     X     TB1   TB2       X     TG1   TG2       X     TV1   X
Gesture (discrete)        X     GB1   GB2       X     GG1   GG2       X     GV1   X
Voice (discrete)          X     VB1   X         X     VG1   X         VV0   X     X

Table 2. A classification of menu selection methods for VE.
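The feasibility pattern of Table 2 can be stated compactly: each modality needs at most one hand per subtask, a hand may be shared between subtasks, and a combination is feasible when the available hands cover the larger single need without exceeding the two needs combined. The sketch below is our reconstruction of that logic, not code from the paper; it regenerates the 13 named combinations and the 14 infeasible cells.

```python
POS_HANDS = {"T": 1, "G": 1, "V": 0}   # Tracking, Gesture, Voice positioning
CMD_HANDS = {"B": 1, "G": 1, "V": 0}   # Button, Gesture, Voice command

def feasible(pos, cmd, hands):
    """Feasible when the hands available cover the larger single need
    (a hand may be shared, as in TB1, where one tracked hand also presses
    the button) and do not exceed what the two subtasks can use together."""
    p, c = POS_HANDS[pos], CMD_HANDS[cmd]
    return max(p, c) <= hands <= p + c

# Regenerate the grid of Table 2: a name like "GG2" where feasible, else "X".
for pos in "TGV":
    cells = [f"{pos}{cmd}{h}" if feasible(pos, cmd, h) else "X"
             for cmd in "BGV" for h in (0, 1, 2)]
    print(" ".join(f"{c:>4}" for c in cells))
```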

5. The Experiment

Figure 3. Menu display methods seen by test subjects (panels: WF-PD, WF-PU, VF-FS, VF-BS, VF-OL).

5.1. Experimental Design

The menu system usability test is set in an immersive environment for virtual piping. Users can create, arrange, and modify pipes in a virtual building through the menu system and other auxiliary functionalities, such as sound feedback for collision and task completion. Figure 8 shows the menu structure used in the experiment.

Figure 8. The virtual CAD menu structure used in the experiment.

As a start, we selected TB1 and GG1 among the many combinations shown in Table 2. For TB1, users used a tracker (Polhemus Fastrak) and a 3D mouse (Logitech 3D mouse); for GG1, they used a glove (5DT glove). Four mouse buttons were used for menu invocation (pedestal button), ray invocation (left button), enter (right button), and cancel (middle button). The glove system had five gestures for menu invocation, enter, cancel, previous, and next. Each gesture is a pinch action between the thumb and the palm, ring finger, pinkie, index finger, or middle finger. The gestures were made as simple as possible, and a limit switch was used on the thumb to ease the recognition of the hand action. Figure 4 shows examples of scenes seen by the user in the test VE.
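To make the two input mappings concrete, here is a minimal sketch of the event-to-action tables. Only the thumb-and-palm pinch for menu invocation is confirmed by the text (see Section 5.2); the remaining pinch assignments and the callback plumbing are our assumptions.

```python
# Event-to-action tables for the two tested modality combinations.
TB1_BUTTONS = {              # Logitech 3D mouse, hand tracked by a Fastrak
    "pedestal": "invoke_menu",
    "left":     "invoke_ray",     # ray used to point at menu items
    "right":    "enter",
    "middle":   "cancel",
}

GG1_PINCHES = {              # 5DT glove; a thumb limit switch eases detection
    ("thumb", "palm"):   "invoke_menu",   # confirmed by the training task
    ("thumb", "index"):  "previous",      # assumed assignment
    ("thumb", "middle"): "next",          # assumed assignment
    ("thumb", "ring"):   "enter",         # assumed assignment
    ("thumb", "pinkie"): "cancel",        # assumed assignment
}

def to_action(modality, raw_event):
    """Translate a raw device event into a menu action, or None."""
    table = TB1_BUTTONS if modality == "TB1" else GG1_PINCHES
    return table.get(raw_event)

print(to_action("TB1", "pedestal"))           # -> invoke_menu
print(to_action("GG1", ("thumb", "palm")))    # -> invoke_menu
```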


Figure 4. Two interaction methods (TB and GG) for menu selection.

The users were asked to perform the following two tasks:

1. (T1) Make a box and view help information regarding a menu; this task consists of two independent commands.
2. (T2) Make a T-shaped pipe, select the pipe, and then change its color to blue; this task consists of three sequence-dependent commands.

Here, the term “command” is used to describe a distinct function of the virtual piping system that may require several steps of primitive interaction tasks.

Each task is repeated 5 times across the respective 5 display methods (Figure 3). We gathered four types of test data:

- Completion time: from when the user invokes the initial menu to when the user completes the given task.
- Positioning time: from when the user invokes the menu to when the user locates the right menu item.
- Command time: from the time of correct positioning until the command is made (on the menu item chosen right before the final selection).
- Error: the frequency of selection of wrong menu items. Erroneous menu selections are excluded from the figures for the above three measures.
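The three timing measures reduce to three timestamps per successful trial. The sketch below is our illustration of that bookkeeping (names hypothetical), not the logging code actually used.

```python
import time

class TrialTimer:
    """Collects the three timestamps behind the reported measures:
    menu invocation, correct positioning on the target item, and the
    final command. Trials with wrong selections are simply discarded."""

    def on_menu_invoked(self):
        self.t_invoke = time.monotonic()

    def on_item_located(self):
        self.t_position = time.monotonic()

    def on_command_made(self):
        self.t_command = time.monotonic()

    @property
    def positioning_time(self):   # invocation -> right item located
        return self.t_position - self.t_invoke

    @property
    def command_time(self):       # right item located -> command made
        return self.t_command - self.t_position

    @property
    def completion_time(self):    # invocation -> task completed
        return self.t_command - self.t_invoke
```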

We also surveyed the users, asking their opinions on the relative convenience of searching the menu items among the 5 display methods and between the two modality combinations (TB1 and GG1).

5.2. Subjects

A total of 16 users participated in the experiment: 8 for TB1 and 8 for GG1. Half of each group were experienced in using 3D devices. Before the experiment, users were first trained to make menu selections in the assigned modality (TB1 or GG1). For the TB1 test, users learned the mouse manipulation, and for the GG1 test, users learned the gesture commands. In the training environment, users were shown a box and a command text, as shown in Figure 5. Users were trained by making a gesture or a mouse button push corresponding to the command text. Figure 5 shows the “start” command text: TB1 requires a pedestal button push and GG1 requires a thumb-and-palm gesture in response. We repeated the training task, and when the task completion time for each task came down to about 20 seconds, we accepted the trainee as an experiment subject.

Figure 5. Learning basic interaction skills for menu selection in two different modalities.

5.3. Results and Discussion

A factorial analysis of variance (ANOVA) was performed on the completion time, positioning time, command time, and error. Table 3 shows the means and standard deviations for each task. In general, it was found that with GG1 it took less time to home in on the right menu item than to make a command, while with TB1 the pattern was the other way around. This shows the effectiveness of gestures for positioning in menu selection. Between the two tasks (T1 and T2), we did not find any significant statistical differences. Interestingly, the user survey found the gesture input method to be more convenient and natural. Figure 6 shows the completion time, positioning time, and command time across the 5 display methods. For T1, we found a significant difference between VF-FS and WF-PD, while for T2 we did not find such a difference.

Figure 6. Relative task performances across the five display methods.

Time (sec)       Completion time           Positioning time          Command time
T1   GG1         m = 26.47 (sd = 9.06)     m = 11.36 (sd = 4.08)     m = 15.16 (sd = 6.00)
     TB1         m = 24.93 (sd = 6.04)     m = 15.49 (sd = 4.72)     m = 9.50 (sd = 2.83)
T2*  GG1         m = 55.77 (sd = 20.69)    m = 25.63 (sd = 9.13)     m = 30.14 (sd = 14.03)
     TB1         m = 39.33 (sd = 6.49)     m = 23.56 (sd = 3.87)     m = 15.85 (sd = 4.27)
(* p < 0.01)

Table 3. Means and standard deviations of performances in T1 and T2.

We performed Tukey’s multiple comparison analysis, and as a result, three similar-performance groups surfaced: WF-PD/WF-PU, WF-PD/VF-BS/VF-OL, and VF-FS/VF-BS/VF-OL. There existed statistically significant performance differences among these three groups, explaining the worst task performance under WF-PD in both TB1 and GG1. Figure 7 shows the relationship between the display methods and the input modalities. The plotted data are the ratio of positioning time to completion time. With TB1, WF-PD takes more positioning time than the other display methods in both T1 and T2. These results are explained by the amount of “distance” that must be traversed to home in on the right item. Structure-wise, WF-PD generally requires the largest travel distance. This is further seen in the differences between the two tasks, T1 and T2. In T1, the menu items needed are separated relatively farther apart than those needed by T2, due to the menu structure, which tends to group dependent menu items together. To summarize, we found that the gesture input method was faster and more convenient for the positioning task than continuous tracking, due to the amount of “travel” that has to be carried out with continuous tracking. This was even more apparent in menu selection tasks in which the menu items were located farther apart.

Figure 7. Relationship between display methods and input modalities.
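For readers who want to replicate this style of analysis, the sketch below shows how a factorial ANOVA and Tukey’s multiple comparison could be run on such data with standard tools; the file name and column layout are our assumptions, not the authors’ scripts.

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical per-trial log: one row per successful trial with columns
#   display    -- WF-PD, WF-PU, VF-FS, VF-BS, VF-OL
#   modality   -- TB1 or GG1
#   task       -- T1 or T2
#   completion -- completion time in seconds
df = pd.read_csv("trials.csv")

# Factorial ANOVA: display method x input modality on completion time.
model = ols("completion ~ C(display) * C(modality)", data=df).fit()
print(anova_lm(model, typ=2))

# Tukey's multiple comparison across the five display methods; this is
# the kind of analysis from which the similar-performance groups
# (e.g. WF-PD/WF-PU vs. VF-FS/VF-BS/VF-OL) would surface.
print(pairwise_tukeyhsd(df["completion"], df["display"], alpha=0.05))
```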

6. Conclusion and Future Work

In this paper, we have made a classification of various menu selection systems for VEs according to their 3D location, viewing direction, display items, display resolution, and input modalities. Among the many possibilities, as a start, we have conducted a usability test for the combination of continuous tracking / button selection and gesture positioning / gesture selection across 5 menu display methods.


We found that the gesture input method for the positioning task was faster and more convenient than continuous tracking, due to the amount of “travel” that has to be carried out with continuous tracking. This was even more apparent in menu selection tasks in which the menu items were located farther apart. We plan to continue our usability testing for the remaining input modality combinations, and finally to derive comprehensive design guidelines for menu systems in immersive VEs.

References

[1] I. G. Angus and H. A. Sowizral. Embedding the 2D interaction metaphor in a real 3D virtual environment. Proceedings of SPIE, Stereoscopic Displays and Virtual Reality Systems, 2409:282–293, 1995.
[2] AutoDesk. AutoCAD Release 12: User Guide. AutoDesk, Inc., 1993.
[3] D. Bowman, L. F. Hodges, and J. Bolter. The Virtual Venue: User-computer interaction in information-rich virtual environments. Presence: Teleoperators and Virtual Environments, 7(5):478–493, October 1998.


[4] D. A. Bowman and L. F. Hodges. An evaluation of techniques for grabbing and manipulating remote objects in immersive virtual environments. Proceedings of the Symposium on Interactive 3D Graphics, pages 138–188, 1997.
[5] R. Darken. Hands-off interaction with menus in virtual spaces. Proceedings of SPIE, Stereoscopic Displays and Virtual Reality Systems, 2177:365–371, 1994.
[6] R. P. Darken. Navigating in large virtual worlds. The International Journal of Human-Computer Interaction, 8(1):49–72, 1996.
[7] S. Feiner, B. MacIntyre, M. Haupt, and E. Solomon. Windows on the world: 2D windows for 3D augmented reality. Proceedings of UIST ’93, pages 145–155, 1993.
[8] M. Ferneau and J. Humphries. A gloveless interface for interaction in scientific visualization virtual environments. Proceedings of SPIE, Stereoscopic Displays and Virtual Reality Systems, 2409:268–274, 1995.
[9] J. D. Foley, V. L. Wallace, and P. Chan. The human factors of computer graphics interaction techniques. IEEE Computer Graphics and Applications, pages 13–48, November 1984.
[10] D. J. Gillan, K. Holden, S. Adam, M. Rudisill, and L. Magee. How should Fitts’ law be applied to human-computer interaction? Interacting with Computers, 4(3):291–313, 1992.
[11] D. Hix, E. J. Swan, J. L. Gabbard, J. Durbin, and T. King. User-centered design and evaluation of a real-time battlefield visualization virtual environment. Proceedings of IEEE Virtual Reality, pages 96–103, 1999.


[12] R. H. Jacoby and S. R. Ellis. Using virtual menus in a virtual environment. Proceedings of SPIE, Visual Data Interpretation, 1668:39–48, 1992.
[13] J. Liang and M. Green. JDCAD: A highly interactive 3D modeling system. Computers and Graphics, 18(4):499–506, 1994.
[14] M. R. Mine. Virtual environment interaction techniques. UNC Chapel Hill CS Tech. Report TR95-018, 1995.
[15] J. Preece, Y. Rogers, H. Sharp, D. Benyon, S. Holland, and T. Carey. Human-Computer Interaction. Addison-Wesley, 1995.
[16] Sense8. WorldToolKit Reference Manual. Sense8 Product Line, April 1999.
