
Interaction and Presentation Techniques for Shake Menus in Tangible Augmented Reality

Sean White*    David Feng*    Steven Feiner*
Columbia University
*email: {swhite, feiner}@cs.columbia.edu, [email protected]

Figure 1. Using a shake menu to display and select additional information about a leaf specimen for an electronic field guide. (a) The user holds an object (in this case, an optically tracked fiducial marker array) and (b) shakes the object. (c) The system then displays a radial menu of options arrayed around the tracked object. All images are shown from the perspective of the head-worn display.

ABSTRACT

Menus play an important role in both information presentation and system control. We explore the design space of shake menus, which are intended for use in tangible augmented reality. Shake menus are radial menus displayed centered on a physical object and activated by shaking that object. One important aspect of their design space is the coordinate system used to present menu options. We conducted a within-subjects user study to compare the speed and efficacy of several alternative methods for presenting shake menus in augmented reality (world-referenced, display-referenced, and object-referenced), along with a baseline technique (a linear menu on a clipboard). Our findings suggest tradeoffs amongst speed, efficacy, and flexibility of interaction, and point towards the possible advantages of hybrid approaches that compose together transformations in different coordinate systems. We close by describing qualitative feedback from use and present several illustrative applications of the technique.

KEYWORDS: 3D interactions, augmented reality, menus, shake menus, information display, authoring, selection, positioning

INDEX TERMS: H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems—Artificial, augmented, and virtual realities; H.5.2 [Information Interfaces and Presentation]: User Interfaces—Input devices and strategies, Interaction Styles; I.3.6 [Computer Graphics]: Methodology and Techniques—Interaction techniques

1 INTRODUCTION

In augmented reality (AR), interaction with the environment takes many forms that are often related to 3D user interfaces (UIs).

We are particularly interested in tangible AR techniques, which combine tangible UIs [19] with AR, to present information about physical objects or use physical objects as props for interaction and system control. Bowman et al. [4] organize 3D interaction techniques by interaction task, separating out selection and manipulation, travel, wayfinding, system control, and symbolic input. Within the system control task, graphical menus, similar to menus in desktop UIs, provide a familiar model for interaction and also support information presentation. Complementary to classification by task, Dachselt and Hübner [8] taxonomize AR menu techniques, distinguishing between glove- or hand-based selection and physical prop-based selection. While their classification covers many other types of interaction in 3D UIs, they focus on these specific distinctions in AR. Across both taxonomies, a wide variety of menu-selection techniques have been developed.

Beyond the motivating context of tangible AR, we focus here on prop-based menu selection and menu presentation centered on physical objects for several reasons. First, there is evidence that providing a tangible anchor or prop increases the sense of presence of virtual objects [15], enhances the perception of realism of the virtual experience [24], and increases visual understanding of information through manipulation [2]. While we recognize that tangible AR is not appropriate for all situations, these benefits have been observed in other research in which users can physically manipulate a physical object that represents a virtual object [36]. Such benefits suggest that prop-based interaction merits further research.

Second, while interfaces fixed to a specific location are useful in stationary situations, we are interested in mobile interactions where the physical environment can potentially change as the user moves. For this, we require menus that present themselves “ready-to-hand” [14], so that the user does not need to return to a particular palette or a fixed and rooted location. This difference is similar to the difference between selection from a pop-up menu that appears at the tip of a stylus or finger versus a menu bar at the top of a window. The user can continue to focus on the task at hand, rather than system interaction. While one could consider other menu techniques, such as menus in a heads-up display, we are interested in menus that are associated with tangible objects.

Finally, AR systems are becoming increasingly popular, in part because simple printable props that rely on marker-based tracking (e.g., [10, 18]) provide an inexpensive, low-barrier tool for interaction.


We would like a lightweight menu technique that is actuated by a tangible gesture and does not require dedicated electronics, such as a button, to invoke a menu or confirm a selection. With improvements in feature-based tracking [40], any object, regardless of instrumentation or prior modeling, may eventually become a prop for interaction.

This paper examines shake menus (Figure 1), a technique that addresses these considerations. Extending our introduction to shake menus in an earlier poster [39], we now characterize the design space for shake menus (and other radial menus for tangible AR), present a user study focused on an important aspect of the design space, and discuss example applications of the technique. In shake menus, shaking an object activates menus that are displayed around the object. A menu option is then selected by moving the object to the option.

We are particularly interested in three scenarios of use for shake menus in tangible AR: menu selection, combined menu selection and positioning, and information query and display. For example, in the first scenario, an optically tracked fiducial marker or feature-tracked object is morphed from one virtual object to another, and the object can be manipulated and inspected using spatial gestures. We would like the menu system to support previews of these changes prior to morphing. Previews should be ephemeral and present only when requested, providing a menu of information about the object. In the second scenario, we want to view a set of optional objects for authoring, and select and place objects using the technique. In the third scenario, we would like to use shake menus to query and manipulate data in AR information visualizations. In particular, we are interested in situated visualizations [38], in which virtual representations of information are visualized in proximity to the physical context of interest.

In the remainder of this paper, we discuss related work and describe the technique and design space along with implementation issues. We then present a user study comparing different forms of menu placement, including display-referenced, world-referenced, and object-referenced, along with a baseline technique (a linear menu on a clipboard). We focus on placement because it represents a unique aspect of interaction in tangible AR with shake menus and other radial menu systems. We follow this with a discussion of our results and observations, and conclude with a discussion of two applications of the technique and plans for future work.

2 RELATED WORK

A variety of interaction mechanisms for menus exist in the domains of desktop 2D and 3D, virtual reality (VR), and AR UIs. Here, we place our research in the context of 2D menus, hand-based menus, and prop-based menus. We also discuss work on shaking for actuation.

2.1 2D Menus and 3D Spaces

In 2D UIs, radial menus or pie menus [16] (e.g., marking menus [21]) let the user stay within the flow of interaction by bringing the interface to the locus of interaction. Rather than shift focus and attention to a separate palette or remote fixed location, menu selection occurs at the current focus of attention. In VR and AR, a variety of techniques have been developed that adapt 2D menus to 3D systems. For example, Vickers [35] introduced the use of 3D ray intersection to select items on a menu printed on the wall of a room in AR. Ring menus [23] take advantage of the 1-DOF nature of menu selection to arrange menu items along a 3D ring. Hand and wrist movement rotates the ring, and the item within a visually obvious focus location is selected when a trigger action occurs. Mine et al. [25] describe how a virtual toolbelt worn by a VR user can hold needed tools in a known location relative to the user’s tracked body (and later work extends this concept to store objects beyond the edges of the display in 3D desktop UIs [29]).


Comparing means of interaction with a 3D widget in a VR system, Mine et al. found that participants were able to return more easily to a position relative to their own hand than to a position fixed in space. They also found a preference for interacting with a widget fixed to the hand rather than one fixed in space. Bernatchez and Robert [3] compared alternative frames of reference for the task of moving a slider on a floating menu surface in a 3D virtual space. They found a link between the degrees of freedom in the frame of reference and user performance of this task. In contrast, we focus on menu activation and selection.

2.2 Hand-Based Menus

In these techniques, menu selection and presentation are centered on the user’s hand. TULIP menus [5] assign menu options to different fingers of a pinch glove, with options displayed on the fingers and palm. Pinching the thumb and finger associated with a given menu option selects that option. Piekarski and Thomas’s Tinmith-Hand menu [28] also maps menu items to individual fingers, but displays menu labels fixed to the bottom of the display. Buchmann et al.’s FingARtips [6] uses the hand for selection, too, but employs gesture recognition to select menu items or objects. Although our technique is hand-oriented, in contrast to these techniques, it is prop-based.

2.3 Prop-Based Menus

Grosjean et al.’s Command and Control Cube, C3 [12], uses a 3×3×3 cubic grid offset from the user’s hand to represent 26 menu options, presented on a rear-projected responsive workbench. The cube is activated by pressing a button, and a selection is made by moving the hand. The hand controls an offset spherical cursor that moves within the C3, which is itself offset from the hand (to avoid being obscured by the hand). In the Tiles system, Poupyrev et al. use a book of fiducial markers as a menu [30]. Each page of the book displays a different object that can be selected and copied by moving a hand-held fiducial near the fiducial on the page. Proximity to a given fiducial marker in the book of options triggers a specific action. The form factor of the book provides a large number of options, but only one option is viewable at a time. The Personal Interaction Panel [34] displays sliders and options on the surface of a tablet as a panel of virtual controllers. The tablet is held in the nondominant hand, and selections are made using a stylus prop held in the dominant hand. Tuister [7] provides a tangible UI to menus using a physical cylinder with a handle. Menu items are displayed on the cylinder, and twisting the cylinder changes the menu item. In contrast to these systems, we focus on the use of gestures for activation and specifically compare alternative placement techniques.

2.4 Shaking

Shaking was described by Kato et al. [19] as a potential gesture in their VOMAR system. Most of the subsequent work focused on actions such as tilting or proximity. Sinclair et al. [32] use a shaking gesture to “sprinkle” hypertext links on objects. White et al. [37] use shaking to activate and deactivate display of Visual Hints. Rohs and Zweifel [31] explore the use of gestures such as rotation, tilting, and pointing for interaction using mobile phones. However, they neither display information around the device nor discuss different coordinate systems for menu placement. Shaking has been used to activate a menu on an LCD screen [13] embedded in a Chumby digital device. In this case, shaking makes a set of option selections appear on the body of the Chumby LCD screen, much like a mobile phone.
Shaking is now commonly used as an actuation method on mobile phones, such as the iPhone, whose accelerometers can detect a shaking gesture.

Figure 2. Shaking movement can be detected in four directions: (a) horizontal, (b) vertical, (c) back and forth, and (d) rotational.

3 SHAKE MENUS TECHNIQUE

Building on this body of related work, our shake menus technique is also inspired by observing how people shake gifts and other objects to see what is inside; for example, shaking a box and listening closely for some hint of the possibilities hidden behind the wrapping. Here, we apply this metaphor of shaking to reveal information and activate a menu. In this way, we provide an ephemeral interface that does not occlude and appears only when necessary. The shaking gesture (described in the following section) is simple and requires very little training.

Initially, no menu is visible. Shake menus are activated by shaking a hand-held fiducial marker or other tracked object. Once activated, a radial menu appears whose items are arranged around the circumference of the hand-held object. A menu element is selected by moving the fiducial into, or optionally through, the same space as the menu element. If no selection is desired, the fiducial can be shaken again to hide the menu. In the following subsections, we discuss elements of the design space for shake menus, including activation, spatial layout and representations, menu placement, selection, and post-selection actions.

3.1 Activation

We detect shaking in a form similar to that suggested in the VOMAR [19] system, but distinguish horizontal, vertical, back and forth, and rotational shaking (Figure 2). We have investigated the use of other gestures to activate menus, including pausing, but find that pausing is confounded with holding the fiducial still to focus attention on objects displayed relative to the fiducial. We also considered occlusion of secondary markers for activation, but found this was both awkward for single-handed operation and generated substantial false positives. While a gesture such as shaking takes more time than a simple button press, gesture recognition has a strong advantage over the use of an active button: it makes it possible to support menus associated with uninstrumented objects, potentially allowing us to shake anything that can be tracked to see what it might reveal (e.g., shaking a credit card that is tracked using feature-based tracking).

To detect shaking, we track the positions of our hand-held optical fiducial marker and record them in arrays for horizontal positions (x values), vertical positions (y values), depth positions (z values), and rotational values (about the x, y, or z axis). We detect the movement of the hand-held marker by calculating the difference between the current position and the previous position. For example, if the y value of the current position is greater than that of the previous position by a specific amount (2 cm for x or y and 4 cm for z in our implementation), we classify it as moving up. Consecutive movements are then parsed to generate gestures. Once four continuous movements of opposite sign in any one direction are detected within a time threshold (4 seconds in our implementation), we recognize it as a shake, and provide an auditory cue to let the user know that the shake has been detected. Here, direction is based on movement relative to the plane of the marker. We use a count of continuous movements, rather than the amount of time for shaking, because we found that different users shake at different speeds. The time threshold simply puts an upper bound on the length of time a shake may take. As with most gesture-based systems, frame rate affects the quality of gesture detection. We found that frame rates below 16 fps interfered with shake detection.
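To make the detection procedure concrete, the following is a minimal sketch of the logic described above. The thresholds mirror the values reported in the text, but the tracking interface, data structures, and the handling of rotation (omitted here) are assumptions rather than the implementation used in our system.

```python
import time

# Illustrative thresholds, loosely following the values reported above.
MOVE_THRESHOLD_XY = 0.02    # 2 cm for x or y
MOVE_THRESHOLD_Z = 0.04     # 4 cm for z
ALTERNATIONS_NEEDED = 4     # four consecutive movements of opposite sign
TIME_WINDOW = 4.0           # seconds

class ShakeDetector:
    """Recognizes a shake as alternating-direction movements along one axis
    of the hand-held marker (rotational shaking is omitted for brevity)."""

    def __init__(self):
        self.prev_pos = None
        self.moves = {"x": [], "y": [], "z": []}   # per-axis (timestamp, sign)

    def update(self, pos):
        """pos: (x, y, z) marker position for the current video frame.
        Returns True when a shake gesture is recognized."""
        now = time.time()
        shaken = False
        if self.prev_pos is not None:
            thresholds = {"x": MOVE_THRESHOLD_XY,
                          "y": MOVE_THRESHOLD_XY,
                          "z": MOVE_THRESHOLD_Z}
            for i, axis in enumerate("xyz"):
                delta = pos[i] - self.prev_pos[i]
                if abs(delta) < thresholds[axis]:
                    continue                        # too small to count as a movement
                sign = 1 if delta > 0 else -1
                history = self.moves[axis]
                # Discard movements older than the time window.
                history[:] = [(t, s) for (t, s) in history if now - t < TIME_WINDOW]
                if history and history[-1][1] == sign:
                    history[-1] = (now, sign)       # same direction: not a reversal
                else:
                    history.append((now, sign))     # direction reversed (or first move)
                if len(history) >= ALTERNATIONS_NEEDED:
                    history.clear()
                    shaken = True                   # caller plays the auditory cue
        self.prev_pos = pos
        return shaken
```

A caller would feed this detector one marker pose per video frame and trigger menu activation whenever it returns True.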

3.2 Spatial Layout and Representations

Once activated, the menu is presented to the user. Currently, we present menu choices only in the plane parallel to the hand-held fiducial, for two reasons. First, Grosjean et al. [12] found higher error rates in the upper and lower planes of the C3 as compared to the central plane. Second, we keep a single plane of selection to avoid occlusion of 3D objects, allowing the user to tilt the fiducial (if the menus are object-referenced, as described in Section 3.3) and see different views of 3D objects to be selected.

3.2.1 2D and 3D Menu Items

The technique can be applied to both 2D and 3D menu items within a 3D coordinate system. We have explored 2D menu items in our user study and in an application for displaying botanical species information. The technique has also been applied to 3D menu items in a system used for authoring planetary systems. We discuss both applications in Section 5.

3.2.2 Number of Items and Hierarchies

Kurtenbach and Buxton [22] found increasing error rates with marking menus containing more than eight items or with menu items that were placed “off-axis” in locations other than the eight cardinal and intermediate compass directions. For this reason, we also use only four or eight top-level menu items (Figure 3) and rely on hierarchies of menus for larger numbers of items.
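A single-plane radial layout of this kind is straightforward to compute. The sketch below places four or eight items around the marker in its plane; the radius and coordinate convention are illustrative assumptions, not values from our implementation.

```python
import math

def radial_layout(num_items, radius=0.08):
    """Return (x, y) offsets, in the marker's plane, for num_items menu items
    arranged around the hand-held marker. The 8 cm radius is an illustrative
    value, not one reported in the paper."""
    assert num_items in (4, 8), "shake menus use four or eight top-level items"
    offsets = []
    for i in range(num_items):
        # Start at "north" and step clockwise through the compass directions.
        angle = math.pi / 2 - i * (2 * math.pi / num_items)
        offsets.append((radius * math.cos(angle), radius * math.sin(angle)))
    return offsets

# Eight items at the cardinal and intermediate compass directions (Figure 3).
print(radial_layout(8))
```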

Figure 3. Eight menu items displayed at the cardinal and intermediate compass directions.


3.3 Menu Placement and Coordinate System

Once a shake has been detected, the menu appears immediately. We wait a brief period of time before stabilizing it (1 second in our implementation) and then place the menus relative to the current position of the hand-held marker. We find the brief wait useful because the user normally stops the gesture once they hear the auditory cue that their gesture has been recognized, and this tends to stabilize the hand-held marker.

For placement, we consider placement of menu options around the marker that is either object-referenced, display-referenced, or world-referenced [9]. These terms refer to how the menus, once presented to the user, are positioned relative to a specific coordinate system.

In object-referenced placement (Figures 4b and 6), the menus are attached to and move with the hand-held marker. We record the position of the hand-held marker, constantly calculate and update the difference between the recorded position and the current position of the hand-held marker, and apply the difference to the menus. We do this so that movement towards a particular option actually moves the marker towards the option, yet the menu options move in the world with the hand-held marker. In this case, the menus stay with the marker even if the marker is held out of sight. This has the advantage that the orientation of the menu can be changed to provide alternative views on the set of menu choices. Figure 6(c) shows an example of an object-referenced menu that changes orientation with changes to the orientation of the marker. This is particularly useful when the elements are 3D objects, supporting a change in point of view by changing the orientation of the marker.

In display-referenced placement (Figures 4c and 7), the menus are frozen, attached to the display in the location where they were activated. The menus stay in this position, relative to the display, even if the marker is removed. We record the position of the hand-held marker relative to the camera mounted on the head-worn display as a transformation matrix, transfer the menus from the marker node to the scene node, and apply the matrix to the menus, keeping the menus in a fixed position relative to the display.

In world-referenced placement (Figures 4d and 8), the menus stay floating in the world, located in the position where the menu was activated. We accomplish this by using a ground-plane array of fiducial markers to establish the world coordinate system, in contrast to the hand-held marker. Moving the head or hand-held marker will not move the menus. We record the world-referenced position of the hand-held marker as a matrix, multiply it with the inverse matrix of the world marker node, transfer the menus from the marker node to the world marker node, and apply the matrix to the menus. The world reference can also be acquired from an external tracker, such as an InterSense IS-900, or through feature-based tracking, which requires no additional sensor beyond the existing camera.

Each of these different placement techniques provides a different type of interaction with the shake menu. In Section 4, we compare these reference coordinate systems for speed and efficacy in selection. We focus on this aspect of the technique for evaluation because it has not been compared in previous menu research in AR and VR and because it represents a unique aspect not found in 2D menus.
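In scene-graph terms, the three placements amount to re-parenting the menu node and applying a stored pose. The following sketch illustrates this; the node interface, attribute names, and matrix conventions are assumptions for exposition, not the Goblin XNA code used in our system.

```python
import numpy as np
from types import SimpleNamespace

def place_menu(menu_node, mode, marker_pose_cam, world_pose_cam):
    """Attach the menu in the chosen reference frame.

    marker_pose_cam: 4x4 pose of the hand-held marker in camera coordinates.
    world_pose_cam:  4x4 pose of the ground-plane marker array in camera coordinates.
    menu_node:       scene-graph node with 'parent' and 'local_transform' fields.
    """
    if mode == "object":
        # Object-referenced: the menu stays parented to the hand-held marker
        # and therefore moves (and rotates) with it.
        menu_node.parent = "marker"
        menu_node.local_transform = np.eye(4)
    elif mode == "display":
        # Display-referenced: freeze the marker's current pose relative to the
        # camera mounted on the head-worn display.
        menu_node.parent = "camera"
        menu_node.local_transform = marker_pose_cam.copy()
    elif mode == "world":
        # World-referenced: express the marker's pose in the world marker's
        # coordinate system (inverse of the world pose times the marker pose).
        menu_node.parent = "world"
        menu_node.local_transform = np.linalg.inv(world_pose_cam) @ marker_pose_cam
    else:
        raise ValueError(f"unknown placement mode: {mode}")

# Example: activate a world-referenced menu at the marker's current pose.
menu = SimpleNamespace(parent=None, local_transform=np.eye(4))
place_menu(menu, "world", marker_pose_cam=np.eye(4), world_pose_cam=np.eye(4))
```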

3.4 Selection

3.4.1 Alignment

Our primary means of selection is through alignment of the hand-held marker with one of the menu selections. This provides a simple and intuitive means of selecting menu options.


Figure 4. Menu placement. Red highlights show the reference for each of the different coordinate system conditions. (a) Clipboard, (b) object-referenced, (c) display-referenced, and (d) world-referenced. The white square represents the hand-held marker.

We reduce the degrees of freedom in the selection task by using a ray-casting technique for selection, as suggested by Bowman et al. Once the marker has been aligned with an option, we flash the option to provide feedback that the system recognizes the potential selection, but do not yet complete the selection. This intermediate feedback gives the user the chance to change their mind and avoids errors from accidental alignment. Once the option starts flashing and the user briefly maintains the position, selection is completed and auditory feedback is once again provided, so the user knows the selection has been made. We also hide the rest of the menu options to provide redundant cues to the user. Because the hand-held marker is tracked relative to the camera mounted on the head-worn display, moving the head position, while holding the hand-held marker in place, can also make a selection in the display-referenced and object-referenced modes.

3.4.2 Crossing and Marking

We have also explored crossing [1] and marking techniques with shake menus. However, these appear more prone to error and require more research. Another option for selection is to use the direction of shaking movement to make the selection. This is similar to marking menus or rubbing [27], in that the direction and form of the activation action also provides a specific selection.

3.5 Post-Selection Actions

We have explored an extension to the technique that supports a combination of selection and object placement, similar to the way that a fiducial marker paddle [20] has been used for placement. In a normal shake menu interaction, the menu selection is accomplished, and the user continues on with their task. With the addition of positioning, once a menu item has been selected, the hand-held fiducial is then tracked to determine the position of an object to be placed. When the fiducial is quickly removed from view, the model stays in the location where the fiducial was last seen. (This was inspired by the quick removal of a lightpen from the display to terminate drawing a line in early 2D graphics systems by causing the system to lose track of the lightpen [33].) We discuss this technique in Section 5, where we use it to position objects in an authoring environment after selecting the specific object to be placed.
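The dwell-based confirmation of Section 3.4.1 and the lightpen-style placement just described can be sketched together as a small state machine. The dwell duration, pose representation, and tracking interface below are assumptions for illustration, not the values or API of our actual system.

```python
import time

DWELL_TIME = 0.5   # seconds of maintained alignment before confirming (assumed)

class MenuInteraction:
    """Dwell-based confirmation of a highlighted (flashing) menu item, plus
    post-selection placement: the placed object stays at the pose where the
    hand-held fiducial was last seen once it leaves the camera's view."""

    def __init__(self):
        self.highlighted = None
        self.highlight_start = None
        self.placed_pose = None

    def update_selection(self, aligned_item):
        """aligned_item: the menu item the marker currently aligns with (or None).
        Returns the confirmed item once the dwell time has elapsed."""
        now = time.time()
        if aligned_item != self.highlighted:
            # A new alignment starts the item flashing; losing alignment cancels it.
            self.highlighted = aligned_item
            self.highlight_start = now if aligned_item else None
            return None
        if aligned_item and now - self.highlight_start >= DWELL_TIME:
            confirmed = aligned_item       # play auditory feedback, hide other items
            self.highlighted = self.highlight_start = None
            return confirmed
        return None

    def update_placement(self, marker_visible, marker_pose):
        """After a selection, the tracked fiducial positions the chosen object;
        when the fiducial disappears from view, the object is frozen at its
        last observed pose."""
        if marker_visible:
            self.placed_pose = marker_pose
        return self.placed_pose
```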

Figure 5. The CLIPBOARD condition is similar to the existing linear menu used in our tangible AR Electronic Field Guide. (a) Once the menu has been activated, (b) the menu items appear aligned along the right side of a clipboard. (c) Menu items stay attached to the clipboard, regardless of the position or orientation of the user’s head or the hand-held marker.

Figure 6. In the OBJECT condition, (a) the menu is activated and (b) the menu items appear attached to the object, in this case a hand-held marker. (c) Menu items change orientation as the marker changes orientation and stay attached to the marker. Here, the menu items move with changes to the position and orientation of the hand-held marker.

Figure 7. In the DISPLAY condition, (a) the menu is activated and (b) the menu items appear attached to the display, around the current location of the object that was used to activate the menu. (c) Once activated, the menu items stay attached to the display and move with the user’s head, regardless of the position or orientation of the object. Here, the user’s head and the hand-held marker have been moved, and the menu items stay fixed to the user’s display.

Figure 8. In the WORLD condition, (a) the menu is activated and (b) the menu items appear floating in the air, around the current location of the object that was used to activate the menu. (c) Once activated, the menu items stay in the same world-referenced location, regardless of the position or orientation of the user’s head or the object used to activate the menu. Here, both the user’s head and the hand-held marker have been moved, but the menu items stay fixed to the world coordinate system.


4 EXPERIMENTAL EVALUATION: PRESENTATION

In our discussion of presentation in the previous section, we described several different ways in which a menu can be presented to the user. To investigate the differences across presentation methods and to get feedback from users, we conducted a user study. Thirteen paid participants (12 male, 1 female), ages 20–37, were recruited by mass email and flyers posted around our university. All participants were frequent computer users. Our experimental conditions were object-referenced (OBJECT), display-referenced (DISPLAY), and world-referenced (WORLD), as shown in Figure 4(b–d). In addition, we included a baseline condition in which the menu is attached to another, secondary fiducial array, similar to the technique used in the Tiles system. In our study, this was a clipboard condition (CLIPBOARD) (Figures 4a and 5).

For this condition, we required that the marker come in contact with the menu options. We did this for several reasons. First, we are interested in comparing with the proximity-based selection used in systems such as Tiles and our tangible AR electronic field guide. Second, we wanted to avoid simple ray casting because some participants in our early pilot tests picked up the clipboard and tilted it, causing the ray-casting approach we had used to hit two separate menu options. Third, we wanted to simulate touching a virtual menu item. Note that this is necessary when the hand-held marker is attached to a virtual object that is being inspected, because accidental selection can easily occur in simple ray-casting approaches when the marker is held close to the eye.

In this experiment, we compared the participants’ performance selecting items from a menu using the four UI conditions described above, as shown in Figure 9. Menu item content was represented by colored squares displayed above, below, to the left, and to the right of the hand-held fiducial marker, or a column of boxes attached to a clipboard in the CLIPBOARD condition. For example, a yellow square could appear above the hand-held fiducial (Figure 9b). Participants were prompted to select a specific color and asked to make menu selections using each of the conditions. Time-to-select was recorded, along with the accuracy of the selection and the number of incorrect alignments prior to selection. A post hoc questionnaire was used to assess the participants’ qualitative reactions to the different conditions. The study took approximately one hour and was conducted in a controlled laboratory setting.

We formulated two hypotheses: (H1) Object-referenced presentation will support the fastest menu selection. We believe this would be true because the menus would always be in a known place relative to the hand. (H2) Object-referenced presentation will result in the fewest errors. Our rationale here is the same as that for H1.

4.1 Experimental Setup

The experiment was performed on an Intel Core 2 Duo 2.33 GHz PC with 2 GB RAM, running Windows Vista (Figure 9a). Video from the PC was output to a Sony LDI-D100B color, stereo, see-through head-worn display running at 800×600 resolution. A Creative Labs VF0070 USB video camera was mounted on the Sony display, capturing video at 640×480 resolution, allowing the display to be run in biocular (i.e., both eyes see the same image) video see-through mode. Fiducial markers were mounted to a rigid card for the hand-held marker and a clipboard for the CLIPBOARD condition.
The test system was built using Goblin XNA [26], on top of Microsoft’s XNA game development infrastructure, which we have supplemented for AR, including 6DOF optical marker tracking provided using ARTag [10].


Figure 9. (a) Experimental configuration for the user study. (b) Object-referenced menu presentation with color prompt in the upper right-hand corner. (c) Selection of a menu option. (d) Clipboard selection. (e) Bimanual clipboard selection.

4.2 Task

Participants were asked to make a menu selection based on automated prompting. The experimental environment consisted of a set of four colored squares displayed above, below, left, and right of the marker (or a column of boxes for the CLIPBOARD condition).

4.3 Procedure

A within-subjects, repeated-measures design was used, consisting of four techniques (OBJECT, DISPLAY, WORLD, and CLIPBOARD). The single-session experiment lasted approximately 45 minutes and was divided into four blocks. Participants could take a break at any time by not activating the menu. Each block consisted of 80 trials of the four techniques (20 trials × 4 techniques), and the order in which the techniques were presented was counterbalanced across participants. Prior to beginning the trials, the participant was shown a video explaining the task and procedure to standardize knowledge about the experiment. The participant was then given a practice session so they could learn and experiment with the techniques and run through a series of practice trials.

Figure 10. Average completion times (seconds) for the four conditions with standard error of the mean (SEM): DISPLAY and OBJECT were significantly faster than CLIPBOARD and WORLD.

The practice blocks included eight trials (2 trials × 4 techniques), and the participant was told they could repeat the practice block if they wished. Two participants requested, and were each given, a single additional practice block. Once the participant was comfortable with the techniques, they began the actual four-block session. Prior to each block, the participant was given an on-screen message telling them that the block was beginning. The participant was then free to shake the hand-held card and activate a menu. A sound was played to tell the participant that the system recognized the shaking gesture, and the trial menu of color options was displayed. Timing began when the color prompt was displayed in the upper right-hand corner (Figure 9b). Once the participant selected a menu option (Figure 9c), auditory feedback was provided and the menu options were hidden to acknowledge the selection. The next trial began when the participant shook the marker.

4.4 Results

4.4.1 Completion Time Analysis

We performed a one-way ANOVA for repeated measures on mean selection times for the successfully completed trials, with our participants as a random variable. We found significant main effects across several conditions for α = 0.05 (Figure 10). Technique had a significant main effect on completion times (F(3,36) = 8.65, p < 0.001). WORLD was, on average, more than 4.7 seconds slower than DISPLAY (t(12) = 3.0624, p
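An analysis of this form (a one-way repeated-measures ANOVA over per-participant mean selection times) could be reproduced on similarly structured data roughly as follows. The column names, the use of statsmodels, and the randomly generated placeholder values are assumptions for illustration; they are not the study's data.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one mean completion time per participant
# and technique. Values are random placeholders, not the study's results.
rng = np.random.default_rng(0)
techniques = ["OBJECT", "DISPLAY", "WORLD", "CLIPBOARD"]
data = pd.DataFrame({
    "participant": np.repeat(np.arange(13), len(techniques)),
    "technique": techniques * 13,
    "time": rng.normal(loc=8.0, scale=2.0, size=13 * len(techniques)),
})

# One-way repeated-measures ANOVA on mean selection times,
# with participant as the within-subjects (random) factor.
result = AnovaRM(data, depvar="time", subject="participant",
                 within=["technique"]).fit()
print(result.anova_table)  # F value, num/den df, and p-value for 'technique'
```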
