HAVE'2006 - IEEE International Workshop on Haptic Audio Visual Environments and their Applications, Ottawa, Canada, 4-5 November 2006

A 3D Annotation Interface Using the DIVINE Visual Display
Karim Osman, Francois Malric, Shervin Shirmohammadi
Distributed Collaborative Virtual Environment Research Lab (DISCOVER Lab)
School of Information Technology and Engineering, University of Ottawa, Ottawa, Canada
{karim, frank, shervin}@discover.uottawa.ca

Abstract – While systems such as CAVEs have been experimented with and used for a number of years, their deployment has been slow, mainly due to their expense and space requirements. As such, researchers have been moving towards smaller and cheaper immersive systems. In this paper, we introduce an immersive interface for manipulating 3D objects using the DIVINE system.

Keywords – 3D Human Computer Interface, 3D-Object Interaction, Immersive Visual Displays

I. INTRODUCTION

For the past few decades, major advances in computer graphics display technologies have enabled ever more sophistication in the way users interact with computers. From the blinking lights on control panels of many years ago to room-sized surround screens that immerse the user in computer imagery, display technologies now bring us ever closer to the 'real thing', with research in areas such as display resolution, color fidelity, and usable graphics software libraries. The human-computer interfaces used to interact with these display systems have naturally followed this evolution, and we see more research towards enabling humans to interact with 3D objects as intuitively and easily as in the real world.

In this paper, we present our research results for both the graphic display and the interaction methods. More specifically, we present our approach to mixing and matching software and hardware components to create a platform that displays 3D objects which appear to be floating in front of the user. We do so by first defining immersion and introducing our custom-built Desktop-Immersive Virtual and Interactive Networked Environment (DIVINE) System. In addition, we present a software tool developed on top of DIVINE that allows users to simply "reach out" with their hand and annotate 3D objects as intuitively as in the real world. This tool also allows for natural navigation and interaction with the 3D world.

A. Immersion

Immersion can be defined as the state of deep and natural involvement a user experiences when interacting within some space. A computer graphics display is said to be immersive when it renders on its display media, specifically for a given user's view, in a way that makes him or her perceive some virtual space as part of the real physical space.

In other words, if an immersive display were to show a coffee cup on a tabletop display medium, then that coffee cup should remain in the same place on that tabletop no matter the viewer's position and angle of sight, exactly as it would had there been a real coffee cup on that tabletop. To achieve this coherency, the display platform has to account for the user's movement (head and body, position and rotation) and display the graphics in such a way that the displayed objects look as if they were physically there, viewed from the current angle and position.

The most common Immersive Projection Technology (IPT) setup is the CAVE (Cave Automatic Virtual Environment) system. Typically, a CAVE system consists of several large display screens arranged to fill as much of the user's field of view as possible. Video projectors, computer systems that can synchronously render stereo imagery on these multiple screens, a 3D position and orientation sensor, and stereo glasses complete a classic CAVE setup. By entirely surrounding the user with viewpoint-specific stereo rendering, "full" immersion can be achieved, wherein all of the perceived physical space is virtual space. This full immersion is very difficult to achieve because of its requirements: the integration of a multitude of technologies, each with its own practical limits, imposes severe restrictions on the end product, with a price tag that grows steeply with the size and number of screens.

B. The DIVINE System

Our DIVINE system allows the display of immersive computer graphics on a desktop-like workspace more appropriate for close interaction, and alleviates some of the problems related to a fully immersive CAVE. A major advantage that makes it suitable for finer work is the greater pixel density obtained by concentrating the video projection on smaller and closer screens. Also significant are its better affordability and its much smaller space requirements. Figure 1 shows the hardware setup of DIVINE, which was custom designed from the ground up at the DISCOVER Lab and built by a local company.

For rendering 3D graphics on an IPT, each of its constituent screens must display the proper view projection of the graphics relative to the eyes of the user (approximated by using a 3D tracker attached to his/her head). Several software packages propose to do this IPT rendering, and we explore the use and limitations of some of these offerings in the context of our application: an immersive 3D-object interaction application. In this application, users can draw 3D annotations in free space as well as on objects, all the while being immersed in a virtual environment.
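As a concrete illustration of the per-screen view projection mentioned above, the sketch below computes an off-axis view frustum for a single screen from the tracked eye position, following the well-known generalized perspective projection approach. CAVELib handles this step internally; the Vec3 type, helper functions, and screen-corner parameters here are illustrative assumptions, not part of any library API.

#include <cmath>

// Screen corners in tracker/world coordinates: pa = lower-left,
// pb = lower-right, pc = upper-left; pe = tracked eye position.
struct Vec3 { float x, y, z; };

static Vec3  sub(Vec3 a, Vec3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b)   { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3  cross(Vec3 a, Vec3 b) {
    return {a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x};
}
static Vec3  normalize(Vec3 a) {
    float l = std::sqrt(dot(a, a));
    return {a.x / l, a.y / l, a.z / l};
}

// Computes glFrustum-style parameters for one screen viewed from 'pe'.
void offAxisFrustum(Vec3 pa, Vec3 pb, Vec3 pc, Vec3 pe, float zNear,
                    float& left, float& right, float& bottom, float& top)
{
    Vec3 vr = normalize(sub(pb, pa));    // screen right axis
    Vec3 vu = normalize(sub(pc, pa));    // screen up axis
    Vec3 vn = normalize(cross(vr, vu));  // screen normal, towards the viewer

    Vec3 va = sub(pa, pe);               // eye-to-corner vectors
    Vec3 vb = sub(pb, pe);
    Vec3 vc = sub(pc, pe);

    float d = -dot(va, vn);              // eye distance to the screen plane
    left   = dot(vr, va) * zNear / d;
    right  = dot(vr, vb) * zNear / d;
    bottom = dot(vu, va) * zNear / d;
    top    = dot(vu, vc) * zNear / d;
    // Re-evaluated per eye, per screen, and per frame, this keeps the
    // displayed objects visually anchored in physical space.
}

Repeating this computation for each eye gives the stereo pair, and doing so for each of the screens yields a consistent surround view.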

Figure 1. The DIVINE system at the University of Ottawa DISCOVER Lab.

In doing so, we try to leverage the users' intuition about how they expect things to work in the real world, and to make sure that their interaction with the different elements of the system feels familiar and natural. Because of this, there are two other important elements to this application. First, we provide different methods of navigating the virtual environment so that annotations can be drawn where needed. Second, we have developed a 3D menu system that allows the user to change between the different annotation and navigation modes. These menus can be shown and hidden as needed, and are set to appear in front of the user.

II. RELATED WORK

There has been some research in the field of annotations in virtual environments using audio, text, and annotations drawn on objects. A system has been developed that allows users to create audio annotations associated with objects, which can later be played back while in a 3D virtual environment [1]. Audio annotations allow some things to be expressed more easily; however, on their own they can be harder to use, especially in an interactive and collaborative virtual environment, and it is difficult to quickly get an overview of the annotations present without listening to all of them. Other systems allow 3D virtual objects to be annotated with text [2-5]. These generally restrict annotations to being bound to virtual objects, and it is not always easy to express thoughts purely in text. There has also been some work on drawn annotations.

A system has been created where the user wears a head-mounted display and draws annotations on a pressure-sensitive tablet, which is represented by a virtual notepad [6]; a user can also pick up virtual documents and draw annotations on them using the same setup. Another application allows users to draw annotations, either highlights or scribbles, on electronic books represented in 3D [7]. Yet another application allows users to review designs and leave drawn annotations either on objects or on a user-defined plane, or to leave text annotations, all through a standard web browser. Finally, a system allows users to annotate the surface of objects using a force feedback device [8]. These annotation systems come close to what we are trying to achieve, but in all the cases mentioned above the annotations need to be somehow bound to an object. Our system provides an immersive virtual environment with the ability to create 3D graphical annotations in free space as well as on objects, coupled with an attempt to create a system that is intuitive to use and easy to interact with.

III. THE ANNOTATION SYSTEM

Figure 2 shows the application in action: the user is interacting with the object (a human head, in this case). In order to interact with the system, the user must wear a head sensor so that they may properly be immersed in the virtual world; this allows them to see virtual objects as if they were floating in front of them. The user must also hold a sensor in one of their hands to interact with the virtual objects.

a) Interacting with the 3D object in DIVINE

b) The user annotates the object, painting the lips of the head
Figure 2. Various shots of the system in action.

A. 3D Annotations

There are two methods for creating 3D annotations: the first mode allows users to draw annotations on objects, while the second allows them to draw annotations in free space. When the user is drawing on objects, a virtual object is attached to their hand. They simply need to move their hand towards the object they want to draw on; this causes the attached virtual object to collide with the target object, drawing the annotation on it (Figure 2b).
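A minimal sketch of this collision-based painting is given below, under the simplifying assumption that the target object is approximated by a sphere; all structure and function names are illustrative, not the system's actual data structures.

#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// Illustrative stand-in for the target object's geometry.
struct SphereObject { Vec3 center; float radius; };

struct Brush {
    Vec3  position;  // updated every frame from the hand sensor
    float radius;    // size of the virtual object attached to the hand
};

struct PaintedMark { Vec3 where; };

static float distance3(const Vec3& a, const Vec3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx*dx + dy*dy + dz*dz);
}

// Closest point on the (sphere-approximated) object's surface to 'p'.
static Vec3 closestPointOnSurface(const SphereObject& obj, const Vec3& p) {
    float dx = p.x - obj.center.x, dy = p.y - obj.center.y, dz = p.z - obj.center.z;
    float len = std::sqrt(dx*dx + dy*dy + dz*dz);
    if (len == 0.0f) return {obj.center.x + obj.radius, obj.center.y, obj.center.z};
    float s = obj.radius / len;
    return {obj.center.x + dx*s, obj.center.y + dy*s, obj.center.z + dz*s};
}

// Called once per frame while the draw-on-object mode is active: if the
// hand-attached brush touches the surface, a mark is left at the contact point.
void paintIfTouching(const Brush& brush, const SphereObject& object,
                     std::vector<PaintedMark>& marks) {
    Vec3 onSurface = closestPointOnSurface(object, brush.position);
    if (distance3(brush.position, onSurface) <= brush.radius) {
        marks.push_back({onSurface});
    }
}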

However, due to the lack of force feedback devices, drawing steadily on objects can be challenging for some users. Some time needs to be spent evaluating how different parameters affect a user's ability to draw in such an immersive virtual environment, similar to evaluations that have been done for annotating 3D surfaces with force feedback devices [9].

When drawing in free space, the user has two options which are similar yet provide some flexibility. The first option is to draw a set of connected line segments; this mode gives users a certain amount of precision, in that they can take their time in placing each segment. The second option allows the user to draw continuous lines in free space: the user places their hand at a starting position, presses and holds a button on the joystick, and moves their hand to draw the line. Together, these two methods allow the user to create 3D annotations in free space.
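The continuous mode can be sketched as a per-frame sampler that appends hand positions to the current stroke while the joystick button is held. The sampling threshold and class names below are assumptions for illustration only.

#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };
using Stroke = std::vector<Vec3>;

static float dist(const Vec3& a, const Vec3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx*dx + dy*dy + dz*dz);
}

class FreeSpaceAnnotator {
public:
    // Called every frame with the current hand-sensor position and button state.
    void update(const Vec3& hand, bool buttonHeld) {
        if (buttonHeld) {
            if (!drawing_) {                 // button just pressed: start a new stroke
                current_.clear();
                current_.push_back(hand);
                drawing_ = true;
            } else if (dist(hand, current_.back()) > kMinSampleDist) {
                current_.push_back(hand);    // only sample after enough movement
            }
        } else if (drawing_) {               // button released: keep the stroke
            strokes_.push_back(current_);
            drawing_ = false;
        }
    }
    const std::vector<Stroke>& strokes() const { return strokes_; }

private:
    static constexpr float kMinSampleDist = 0.005f; // ~5 mm, an assumed value
    bool drawing_ = false;
    Stroke current_;
    std::vector<Stroke> strokes_;
};

The connected-line-segment mode could reuse the same stroke structure, appending one point per button click instead of sampling continuously.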

B. Navigation

We have incorporated two navigation modes in our system in an attempt to create an intuitive navigation system. The first mode uses the joystick as the primary control and requires that the user point their hand sensor in a direction and use the joystick to control movement. When the joystick's stick is moved forwards or backwards, the user is moved towards or away from the direction they are pointing in, at a rate proportional to the tilt of the joystick. The second mode makes use of only one of the joystick's buttons and uses the user's hand as the primary control. The concept is that the user pulls, pushes, or rotates the virtual world by pressing the joystick button and then moving the sensor. If a user wants to move towards an object in front of them, they simply place the hand sensor forward, press the joystick button, and then pull the sensor towards them. This effectively moves the user forward, or equivalently pulls the world towards them.
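Both modes reduce to small per-frame position updates, sketched below for the translation case (the rotation component of the second mode is omitted for brevity). The vector type, input structure, and speed constant are illustrative assumptions, not values taken from the actual system.

struct Vec3 {
    float x, y, z;
    Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }
    Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
    Vec3 operator*(float s)       const { return {x * s, y * s, z * s}; }
};

struct HandInput {
    Vec3  position;        // hand sensor position
    Vec3  pointingDir;     // unit vector along which the hand sensor points
    float joystickForward; // stick tilt in [-1, 1]
    bool  grabButton;      // the single button used by the "pull" mode
};

// Mode 1: point-and-move. The user translates along the pointing direction,
// scaled by how far the stick is tilted.
Vec3 pointNavigation(const Vec3& userPos, const HandInput& in, float dt) {
    const float maxSpeed = 1.0f;  // metres per second, an assumed value
    return userPos + in.pointingDir * (in.joystickForward * maxSpeed * dt);
}

// Mode 2: grab-the-world. While the button is held, the hand sensor's
// displacement is applied in reverse to the viewpoint, so pulling the sensor
// towards the body moves the user forward (pulls the world towards them).
struct GrabState { bool active = false; Vec3 lastHand{}; };

Vec3 grabNavigation(const Vec3& userPos, const HandInput& in, GrabState& state) {
    Vec3 newPos = userPos;
    if (in.grabButton) {
        if (state.active) {
            Vec3 handDelta = in.position - state.lastHand;
            newPos = userPos - handDelta;  // move opposite to the hand motion
        }
        state.active = true;
        state.lastHand = in.position;
    } else {
        state.active = false;
    }
    return newPos;
}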

C. Menus

Snapshots of the menus in action are shown in Figure 3. 3D menus were chosen as the mechanism for changing between the different modes the system offers; they were inspired by traditional 2D interfaces in 3D desktop virtual environments. 2D interfaces rendered in screen space cannot be used in our application, as they would occlude the geometry and destroy the illusion of the virtual objects floating in front of the user. Every time the 3D menu is shown, an attempt is made to place it in front of the user. When first implemented, the menus' locations were constantly updated to follow the user's head movement.
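Placing the menu in front of the user can be sketched as offsetting the menu pose along the head sensor's forward direction; the offset distance and structures below are illustrative assumptions.

struct Vec3 { float x, y, z; };

struct MenuPlacement {
    Vec3 position;   // where the menu panel is anchored
    Vec3 facing;     // direction the panel faces (back towards the user)
};

// headPos/headForward come from the head sensor; 'distance' is how far in
// front of the user the menu should appear (an assumed comfortable reach).
MenuPlacement placeMenu(const Vec3& headPos, const Vec3& headForward,
                        float distance = 0.6f)
{
    MenuPlacement m;
    m.position = {headPos.x + headForward.x * distance,
                  headPos.y + headForward.y * distance,
                  headPos.z + headForward.z * distance};
    m.facing   = {-headForward.x, -headForward.y, -headForward.z};
    // As described below, once placed the menu is left stationary at this
    // pose rather than continuously tracking the head.
    return m;
}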

a) Menu popping up on top of the 3D world.

b) The user interacting with the menu items
Figure 3. Menus.

We later realized that the menus were not always appearing in front of the user, because they are positioned based on the head sensor position and orientation: if the sensor is not properly placed on the head, the menus would not appear in front of the user, making them hard to use. As such, the menus now remain stationary once shown, and the user can navigate to position themselves properly. Showing the menus also attaches a small virtual sphere to the hand sensor, which the user uses to interact with the menu items. Menu items can use any type of geometry, such as text or models, as their graphical representation. Our system takes this geometry and calculates a bounding box encompassing it. This bounding box is optionally rendered, is semi-transparent, and is used to determine whether a collision has occurred between the menu item and the sphere, causing the item to be selected. The bounding box is rendered in different colors to indicate whether the item is unselected, selected, or clicked.

IV. SPECIFICATIONS

The annotation system was developed with CAVELib 3.1.1, a commercial virtual reality library that takes care of the details specific to the setup through configuration files. On top of that, OpenGL Performer 3.1.1 was used, a library built on top of OpenGL that CAVELib supports. Our setup was configured to use a cluster of three hosts with passive stereo projection to create an immersive environment. The system is connected to an Ascension Flock of Birds tracker with two electromagnetic sensors, one used to track the head position and orientation and the other to track the hand. The data from these devices, along with a Microsoft Sidewinder joystick, are collected through trackd 5.0 and fed to CAVELib.

For the passive stereo projection, every host is connected to two projectors that project an image for the left eye and another for the right eye onto the same projection screen. One screen is placed horizontally in front of the user as if it were resting on a desk, one is placed upright at the far edge of the one lying flat, and one is placed perpendicular to both screens at their right edge. When a user wears a pair of passive stereo glasses and the head tracker, they perceive virtual 3D objects as if they were floating in front of them. It therefore feels very natural to try and grab the objects to interact with them.

V. CONCLUSIONS

Free-space 3D annotations like the ones introduced in this paper could potentially be used in a wide array of applications. Coupled with an immersive virtual environment, an intuitive navigation system, and an easy-to-use menu system, they form an affordable and practical framework for immersive reality research. A good example would be an application where surgeons can be taught, virtually, how to perform specific operations; with these types of annotations, an experienced surgeon could more easily express the necessary steps to be performed.

A system like this could also be used to provide virtual tours, where this type of annotation can be used in the same way to enhance the virtual experience. It could likewise be used in design review systems, similar to other systems already in place [8], but with these annotations used in a collaborative and synchronous immersive virtual environment.

While formal evaluations of the system are currently underway, preliminary response to what we have accomplished so far has been very positive. Users have found the navigation method involving pulling the virtual environment to be the most intuitive, especially when drawing annotations. The annotations and the menus were also well received. We did get some feedback about the menus, especially when it came to changing colors, where users found that some actions required too much time to perform. We need to spend some time ensuring that these menus scale properly, without becoming a burden to use, as applications are developed.

ACKNOWLEDGEMENTS

The authors acknowledge the financial support of Canada's Natural Sciences and Engineering Research Council (NSERC) through its USRA scholarship.

REFERENCES

[1] R. Harmon, W. Patterson, W. Ribarsky, and J. Bolter, "The Virtual Annotation System," Proceedings of IEEE VRAIS'96, 1996, pp. 239-245.
[2] T. Okuma, T. Kurata, and K. Sakaue, "VizWear-3D: A Wearable 3-D Annotation System Based on 3-D Object Tracking using a Condensation Algorithm," Proceedings of IEEE VR'02, 2002, pp. 295-296.
[3] R. Tenmoku, M. Kanbara, and N. Yokoya, "Intuitive Annotation of User-Viewed Objects for Wearable AR Systems," Proceedings of IEEE ISWC'05, 2005, pp. 200-201.
[4] H. Sonnet, S. Carpendale, and T. Strothotte, "Integrating Expanding Annotations with a 3D Explosion Probe," Proceedings of the Working Conference on Advanced Visual Interfaces, 2004, pp. 63-70.
[5] T. Jung, M. D. Gross, and E. Y.-L. Do, "Annotating and Sketching on 3D Web Models," Proceedings of IUI'02, 2002, pp. 95-102.
[6] I. Poupyrev, N. Tomokazu, and S. Weghorst, "Virtual Notepad: Handwriting in Immersive VR," Proceedings of IEEE VRAIS'98, 1998, pp. 126-132.
[7] L. Hong, E. H. Chi, and S. K. Card, "Annotating 3D Electronic Books," Proceedings of CHI'05, 2005, pp. 1463-1466.
[8] A. D. Gregory, S. A. Ehmann, and M. C. Lin, "inTouch: Interactive Multiresolution Modeling and 3D Painting with a Haptic Interface," Proceedings of IEEE VR'00, 2000, pp. 45-52.
[9] Y. Shon and S. McMains, "Evaluation of Drawing on 3D Surfaces with Haptics," IEEE Computer Graphics and Applications, Vol. 24, No. 6, pp. 40-50.
[10] I. Heldal, R. Schroeder, A. Steed, A.-S. Axelsson, M. Spant, and J. Widestrom, "Immersiveness and Symmetry in Copresent Scenarios," Proceedings of IEEE VR'05, 2005, pp. 171-178.