Context based Interaction Technique for Immersive Virtual Environments

Abstract — Virtual and augmented reality applications have been seen as tools with potential to be applied in several different fields. Still, most recent virtual environment (VE) systems have exhibited limited or counter-intuitive interactions. In virtual training environments, the user-system interaction interface is likely the most crucial aspect of an application, as the goal is generally to accurately simulate a real situation. Therefore, a robust, intuitive interaction system is needed. This work proposes a context based interaction system, built upon existing interaction techniques and research, that aims to give developers a framework for creating more efficient immersive applications for their clients.

I. INTRODUCTION
Virtual Environments have offered a new human-computer interaction paradigm in which the user actively applies input to, and receives output from, the environment. It is a field strongly tied to innovation, as scientific research continues to look for interfaces capable of promoting interactions closer to the human senses through immersion devices (Kirner and Tori, 2004). In the training field, virtual environments have had a broad range of application, mostly in the military and clinical fields, and the rapid advancement of displays, graphics processors, tracking systems and chip speeds has increased that range even further, also making immersive devices accessible to a larger public. However, there are still very few immersive VE applications in common use. This state of affairs is at least partly due to the lack of usable and effective interaction techniques designed for immersive VEs (Bowman, 1999). Furthermore, published interaction techniques (ITs) usually account for very specific virtual actions, such as "pointing", "picking" and "holding". Thus, it is unlikely that any single interaction technique would fit the range of actions of a real-world application, generally leaving the developer to mix and match several ITs to account for every possible situation. Therefore, the purpose of this work was to analyze and compare existing interaction techniques in order to develop a more robust, context based interaction technique that generally fits several interaction scenarios, while still being adaptable to an application's specific purpose through an extensible framework. Interaction techniques were analyzed by verifying their
efficiency in performing three interaction "building blocks": viewpoint motion control, selection and manipulation. Analyzing ITs by elemental building blocks has been proposed and carried out in several works in the fields of immersive technology and virtual reality. Thus, a "context based interaction" should be understood as an interaction technique that outputs one or more different elemental actions according to user intention and application context. Such a system is crucial to virtual training environments, where the experience is required to be as close as possible to a real training scenario. The challenges and methodology used to create a system that correctly selects those actions, while allowing extensibility without modification of the codebase, are the subject of the following sections of this paper.

II. RELATED WORK

Immersive environments have been used, or at least attempted, for training since 1992, with the military being the most likely pioneers in this field (Kuijper, 1997). Generally, the main focus has been to provide virtual, yet efficient, training for tasks in which real simulations are either costly or dangerous. As immersive technologies grow in power, application and, most importantly, availability, research and applications that use this technology for education, training and entertainment have also increased. Nevertheless, someone building an immersive application for a given purpose may eventually find himself surrounded by several options of interaction techniques, and often choose one that fits perfectly in a particular context while being completely misplaced in others. In order to aid developers of immersive applications, several works compiling and classifying interaction techniques have been published. Poupyrev, et al. (1998) applied a taxonomy that divided VE interaction techniques into "egocentric", i.e. user centered, and "exocentric" metaphors,
while Bowman, et al. (1997) proposed dividing an interaction into its lowest-level tasks: selection, manipulation, and release. More recent works have further focused on
Fig. 1. Taxonomy of virtual object manipulation techniques (Poupyrev, et al., 1998)
Fig. 2. Taxonomy of selection/manipulation techniques (Bowman, et al., 1999)
categorizing and comparing interaction techniques, quantitatively and qualitatively, in order to build a robust database for developers to choose from. One gap in these works is that they often result in a somewhat complex taxonomy in order to fit all existing techniques, leaving users to choose from a wide range of techniques and eventually having to mix and experiment on their own. Although that served the purpose of the mentioned works, this paper aims to propose a more general purpose interaction technique that intelligently fits the context the user is in, without requiring a complex input system.
III. IMMERSIVE TRAINING

There are regulatory training standards for several industrial fields. Regardless of the sector, training norms emphasize the importance of an adequate preparatory process to improve the competence of employees, as well as the ability to assess each employee's capability. Also notable is the importance of evaluating the competence of employees in carrying out an activity, and of future planning to eliminate any identified deficiency. Companies can provide support for both trainee and trainer, monitor the quality of the preparation and improve the training process. Thus, training support may include providing the necessary infrastructure for specific activities, providing tools, equipment and software, as well as offering opportunities to apply the developed skills. Trainee feedback is also of interest for future studies and performance analyses (ISO 10015). However, in tasks that involve human risk or are costly to set up, such as air force training and electrical distribution maintenance, there is often little opportunity for trainees to gain practical experience before facing the real-world task. Furthermore, for tasks such as maintenance, which may only be required at spaced intervals, companies also have to consider refresher training for employees from time to time. The use of virtual training environments has been shown to be valuable in increasing the overall effectiveness of a training program in which practical experience is limited: not by substituting real-world training, but by acquainting the user with the training process beforehand, leading to fewer mistakes and more efficient learning (Kuijper, 1997). Cardoso and Lamounier (2008) emphasize that the use of Virtual Reality in educational and training contexts can deliver benefits hardly found through other methodologies. By allowing interaction and navigation through intuitive and natural inputs, Virtual Reality offers a differentiated learning approach and a better understanding of the object of study. A virtual environment provides accessible manipulation, exploration and experimentation, which increases content assimilation. Besides effectiveness, a simulated virtual environment can expand learning and information bounds, as more students are able to have virtual experience with specific devices or equipment that would represent unfeasible expenses to build (Liang, et al., 2012).

IV. INTERACTION ASPECTS

A. Utilized technology

Most VE systems benefit from HMDs and tracked controllers as a physical interface between the user and the virtual world. This work utilized the HTC Vive headset for viewpoint positioning and orientation, and its bundled tracked
controllers for controlling the virtual hands and detecting input.
Fig. 3. HTC Vive components: one HMD, two tracked controllers, two tracker boxes. (http://media.bestofmicro.com/Y/I/571482/gallery/14-vive-parts-Editdeveloped-fixed2_w_600.jpg. Public domain. Last access: 05/05/2017)
Although the HTC Vive represents some of the most cutting-edge technology publicly available, as will be discussed further, the system shall work with any HMD and tracked controller pair that has at least two buttons, such as the new Oculus Rift and possibly the Google Daydream. As for software, 3DS Max and Photoshop were chosen as the main modeling and texturing applications, respectively, and the engine selected for development was Unity 3D. All the software mentioned above are long-standing industry standards due to the benefits they offer regarding productivity, compatibility and target build platforms. More information regarding each of them can be found at their official web pages.

B. Regarding the interaction process in VR

An interaction process taking place in a virtual environment usually involves three possible actions: viewpoint motion control, selection and manipulation (Bowman, 1999). Viewpoint motion control refers to the user's ability to determine his position and orientation within an environment. Given that HMD devices, in the case of this work the HTC Vive headset, already ship with a native orientation system, the main concern here is translation inside the virtual environment and using the user's gaze to distinguish context. Selection involves choosing one or more objects from the virtual environment for some purpose, while manipulation refers to the action performed on the selected virtual object, for instance positioning and rotating it. Selection and manipulation are tasks that normally appear together: an action of manipulation requires a previous action of selection, but the opposite is not necessarily true. Inside a virtual training system, it is crucial that the interaction techniques are implemented in a way that best simulates a real training situation. Furthermore, the interactions shall be sufficiently intuitive that utilization of the system by itself does not require training, and users can benefit, as learners, from some degree of muscle memory. Therefore, special attention shall be given to the selection process and feedback, as well as to the manipulation process after attachment.

C. Interaction techniques overview
Anyone who has researched interaction options for an immersive application has most likely seen two of the most popular interaction techniques in the field: in-hand manipulation and ray-casting. In-hand is a technique that mimics the most intuitive, cognitively simple form of interaction: the user selects an object by "touching" it with his virtual hand, and manipulates it directly with the virtual hands. Despite the clear advantages related to user familiarity, this method can have limited application given that VEs can be considerably big, and it is often inappropriate to force the user to come within arm's reach of an object in order to select it. Ray-casting, on the other hand, consists of a light ray that emanates from the user's virtual hand, which the user can use to select an object by intersecting it with the "light ray", grab it by pressing a trigger/button, and manipulate it by moving the ray. It is worth noting yet another interaction technique that builds upon ray-casting and can be more suitable in some contexts: the Flashlight technique, also called Cone Selection, which follows the same principle as ray-casting but uses a cone rather than a ray as the selection volume. Each of the first two techniques applies to very different contexts, and comparisons between the two have shown that ray-casting techniques are indeed best for object selection, while in-hand allows more precise and expressive object manipulation (Bowman, 1999). The logical conclusion is to create a system that benefits from both, but without adding the complexity of requiring the user to constantly toggle between them. Before discussing the method applied to select the interaction technique, and the interaction response, based on context, a few more considerations regarding the overall interaction process shall be given.

D. Input and selection feedback

When one interacts with an object in the real world, there are a number of feedback cues that show how the interaction is going, one of the most relevant being the feeling of "touching". However, when someone interacts inside a virtual environment this kind of physical feedback is rarely effective; moreover, utilizing other senses, such as vision, can enable types of feedback that do not necessarily exist in the real world but facilitate the user's experience in the VE. An elegant visual feedback was achieved using hand animations. As in most applications, the virtual hands stay in an "idle state" if no input is given; by making the idle state
animation change when the hand hovers over a selectable object, the user gets a hint that this is an object he can interact with.

Fig. 4. Hand "idle" state when not hovering an object.

Fig. 5. Hand "idle" state when hovering a selectable object.

Although simple, this feedback has proven ineffective for far away objects, since it can be hard for the system, and for the user for that matter, to determine whether a far object is being hovered or not. To solve this problem, a simple visual feedback that outlines the selectable object's shape was added to let the user know the object is selected, and flashlight selection (further discussed in the next section) is used to determine which object is being selected. Regarding input, the implementation aimed to use a simple 2D input setup on top of the tracked controllers, so it could more easily be migrated to other hardware. The system works with only two button/trigger inputs on each controller, and each object of the virtual world, including the virtual hands, implements interfaces that respond to pressing/releasing actions of each of the two buttons. In the case of this application, the HTC Vive's trigger and grip buttons were used, but any other controller with at least two button inputs would fit. Note that the application does respond to intermediate states of the Vive's trigger (i.e. half pressed), but this is not required for the application to work.
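As an illustration of this two-button contract, the sketch below shows one possible C# shape for such an interface in Unity; the names (InputButton, IButtonInteractable) are hypothetical assumptions, not taken from the paper's codebase:

```csharp
// A minimal sketch of the two-button input contract described above.
// Names are illustrative assumptions, not the authors' actual API.
public enum InputButton { Trigger, Grip }

public interface IButtonInteractable
{
    // Invoked when one of the two buttons is pressed while this
    // object is selected or held; the hand does not need to know
    // what the object will do with the event.
    void OnButtonPressed(InputButton button);

    // Invoked when the corresponding button is released.
    void OnButtonReleased(InputButton button);
}
```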
V. CONTEXT BASED INTERACTION

Building upon the previous sections, it is finally possible to discuss the context based interaction system proposed at the start of this paper. During the implementation phase, the objectives were to develop a system that:
1) correctly discerns between arms-reach and out-of-reach scenarios, and responds accordingly;
2) lets the user know when there is a far away, selectable object in view;
3) has an extensible framework, making it possible for other developers to create new types of interaction without changing the codebase.

To achieve those goals, in-hand and ray-casting interaction behaviors were implemented that run simultaneously on each virtual hand, along with a flashlight object detection system that runs at the virtual camera, that is, follows the user's gaze. Furthermore, each selectable object in the VE has at least one behavior that responds either to the in-hand interaction or to the ray-casting interaction; it is also possible for an object to have the two behaviors working simultaneously, similarly to what happens at the virtual hands.
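The following C# sketch illustrates one way these per-object behaviors could be expressed in Unity; the interface and class names are assumptions for illustration, not the authors' actual code:

```csharp
using UnityEngine;

// Illustrative interfaces for the two selection behaviors. A selectable
// object implements one of them, or both, just as the virtual hands
// run both techniques at once.
public interface IInHandSelectable
{
    void OnHandHoverEnter();   // a virtual hand overlaps the object
    void OnHandHoverExit();
}

public interface IRaySelectable
{
    void OnRayHoverEnter();    // the controller ray intersects the object
    void OnRayHoverExit();
}

// Example of an object supporting both techniques simultaneously.
public class GrabbableTool : MonoBehaviour, IInHandSelectable, IRaySelectable
{
    public void OnHandHoverEnter() { /* e.g. switch the hand animation */ }
    public void OnHandHoverExit()  { }
    public void OnRayHoverEnter()  { /* e.g. enable the outline feedback */ }
    public void OnRayHoverExit()   { }
}
```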
A. Context based selection system

The interaction system decides whether an object is to be selected using the in-hand or the ray-casting technique by applying the following criteria:
1) if the hand is hovering over an object within arm's reach, the ray-casting system is disabled and invisible (see Fig. 4 and 5);
2) otherwise, if the user gazes at a far away object (flashlight detection), that object is highlighted so the user sees it is selectable;
3) while the user is gazing at an object, the ray-casting system is enabled and the user can select the far away object with it.
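The decision loop below is a minimal Unity C# sketch of these criteria. Field names, distances and the layer mask are assumptions, and for brevity the gaze test is simplified to a straight ray from the camera; a cone-based variant is sketched later in this section:

```csharp
using UnityEngine;

// Minimal per-frame sketch of criteria 1-3 above.
public class ContextSelector : MonoBehaviour
{
    public Transform hand;          // tracked virtual hand
    public Camera head;             // HMD viewpoint
    public LineRenderer rayVisual;  // the visible "light ray"
    public LayerMask selectable;    // layer of selectable objects
    public float armsReach = 0.5f;  // metres
    public float gazeRange = 20f;   // metres

    void Update()
    {
        // 1) A selectable object within arm's reach: in-hand wins,
        //    the ray is disabled and hidden.
        if (Physics.CheckSphere(hand.position, armsReach, selectable))
        {
            rayVisual.enabled = false;
            return;
        }

        // 2) Gaze test, simplified here to a straight ray from the
        //    camera (the flashlight uses a cone instead). The hit
        //    object would also receive its highlight feedback here.
        RaycastHit hit;
        bool gazing = Physics.Raycast(head.transform.position,
                                      head.transform.forward,
                                      out hit, gazeRange, selectable);

        // 3) Ray-casting selection is offered only while gazing.
        rayVisual.enabled = gazing;
    }
}
```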
Fig. 6. Gazed object highlighted so the user knows it is selectable.
Fig. 7. Ray-casting selection system activated.
The decision to use viewpoint flashlight detection was crucial for the system's context distinguishing process, for it holds important information regarding the user's intention. In other words, if the user looks at an object, he should be informed that the object is selectable; if he continues to look at the object and hovers the hand over it, he wants to interact with it. This was inspired by a system long used in many 3D games, which shows the same visual feedback when the virtual camera is pointing at an object the user can interact with. Conversely, if the player is close to an object, the focus is most likely on that object, so there is no reason for the rest of the system to do any work.
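A possible cone test for this detector is sketched below: an object counts as gazed when the angle between the camera's forward vector and the direction to the object falls below a threshold. The range and half-angle values are illustrative assumptions:

```csharp
using UnityEngine;

// A possible flashlight (cone) detector running at the virtual camera.
public class FlashlightDetector : MonoBehaviour
{
    public Camera head;
    public float range = 20f;       // metres
    public float halfAngle = 10f;   // degrees, cone half-angle

    public Collider FindGazedObject()
    {
        Collider best = null;
        float bestAngle = halfAngle;

        // Gather candidates around the viewpoint and keep the one
        // closest to the centre of the cone.
        foreach (Collider c in Physics.OverlapSphere(head.transform.position, range))
        {
            Vector3 toObject = c.bounds.center - head.transform.position;
            float angle = Vector3.Angle(head.transform.forward, toObject);
            if (angle < bestAngle)
            {
                bestAngle = angle;
                best = c;
            }
        }
        return best;   // null when nothing falls inside the cone
    }
}
```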
Fig. 8. A user can turn on the flashlight after grabbing it.
B. Context based input and manipulation

With selection implemented, the system should enable users to perform some sort of action on the selected object. As there are virtually unlimited types of objects and ways to interact with them, the system was designed so that the interaction behaviors (i.e. the virtual hands) are agnostic to the object being interacted with and to the action performed on user input. In other words, the hands are not required to know which object is being selected; they just need to know a button was pressed and pass that on to the selected/held object so it can perform the appropriate action. This can be achieved using polymorphism, a characteristic of the Object-Oriented Programming paradigm. A simple example of this flexibility can be seen when the user interacts with small and big virtual objects. If the user hovers over or points at a ball, he presumably wants to grab it; however, if the selected object is a 300 kg industrial robot, it is counter-intuitive that the user would ever want to grab it, so the action could be moving towards the robot or remotely turning it on. Another example is related to the user's translation in the virtual environment: the ground itself is considered an object that responds only to ray-casting interaction, so if the user roughly looks at the ground and points to the desired position, he is taken there. Further on context based interaction, the actions resulting from inputs also change depending on what object the user holds. For instance, if the user selects a flashlight and presses the trigger, the flashlight is picked up; once picked up, pressing the trigger again turns the flashlight on or off.
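Reusing the hypothetical IButtonInteractable interface sketched in Section IV, the classes below illustrate how polymorphism lets the same trigger press mean different things per object; all names are assumptions for illustration:

```csharp
using UnityEngine;

// Polymorphic responses to the same trigger press: the hand only calls
// OnButtonPressed on whatever it selects or holds, and each object
// decides what that press means.
public class Ball : MonoBehaviour, IButtonInteractable
{
    public void OnButtonPressed(InputButton b)  { /* attach to the hand (grab) */ }
    public void OnButtonReleased(InputButton b) { /* release */ }
}

public class Flashlight : MonoBehaviour, IButtonInteractable
{
    bool held, lit;
    public Light bulb;   // Unity light component on the flashlight model

    public void OnButtonPressed(InputButton b)
    {
        if (!held) held = true;                   // first press: pick it up
        else { lit = !lit; bulb.enabled = lit; }  // while held: toggle on/off
    }
    public void OnButtonReleased(InputButton b) { }
}

// The ground responds only to ray-casting: pointing at it teleports.
public class TeleportGround : MonoBehaviour, IButtonInteractable
{
    public Transform playerRig;       // the tracked play area
    public Vector3 pointedPosition;   // set from the ray-cast hit point

    public void OnButtonPressed(InputButton b)  { playerRig.position = pointedPosition; }
    public void OnButtonReleased(InputButton b) { }
}
```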
Fig. 9. Screenshot from the early-stage training system.
Fig. 10. Screenshot from the early-stage training system.
C. Field test

An early version of this system was tested while developing a simple step-by-step maintenance procedure for a simplified version of a real-world electrical panel. The goal was to test the interaction concepts, mostly the context based selection system and its ease of use. Feedback and overall experience descriptions were collected from the users, which was useful for the iteration process during the system's development, but cannot be categorized as a research result. Now that the interaction system is finished, implementation and testing of virtual training for maintenance of reclosers at electrical distribution subsystems are scheduled for the following months.

VI. CONCLUSIONS AND FUTURE WORK

This paper proposed a comprehensive, context based interaction technique for immersive virtual environments. The development methodology aimed to solve interaction problems related to the selection of both close and far away objects, as well as the need for a considerable number of action responses to user input. Although the initial focus during development was to implement a robust interaction system for virtual training applications, the developed mechanics showed potential to be applied in other types of applications that require different types of interaction in a single system, such as digital games and educational applications. Therefore, compiling quantitative data on the effectiveness of this technique in different applications is one of the future objectives of this work. Furthermore, as the interaction system implementation has been finished, the research team shall move forward on developing a general purpose virtual training development application, which aims to empower developers with an extensible framework of tools to increase development quality and speed.
REFERENCES

[1] Autodesk 3DS Max. Overview. Last access: 05/05/2017.
[2] BOWMAN, D.; RAJA, D.; LUCAS, J.; DATEY, A. Exploring the Benefits of Immersion for Information Visualization. Proceedings of HCI International, 2005.
[3] BOWMAN, D. Interaction Techniques for Immersive Virtual Environments: Design, Evaluation, and Application, 1999.
[4] BOWMAN, D.; HODGES, L. An Evaluation of Techniques for Grabbing and Manipulating Remote Objects in Immersive Virtual Environments, 1997.
[5] CARDOSO, Alexandre; LAMOUNIER, Edgard. Aplicações na Educação e Treinamento. In: SISCOUTO, Robson; COSTA, Rosa. Realidade Virtual e Aumentada: Uma Abordagem Tecnológica. João Pessoa: SBC, 2008. p. 343-357.
[6] KIRNER, Claudio; TORI, Romero; SISCOUTO, Robson. Fundamentos e Tecnologia de Realidade Virtual e Aumentada. Pará: SBC, 2006. 412 p.
[7] KLEIN, R.; SCHLATTMANN, M. Simultaneous 4 Gestures 6 DOF Real-time Two-hand Tracking Without Any Markers. In: ACM Symposium on Virtual Reality Software and Technology (VRST '07). Newport Beach, California, pp. 39-42.
[8] KUIJPER, F. HMD Based Virtual Environments for Military Training – Two Cases, 1997.
[9] LIANG, Xin. High Efficiency Skill Training of Lathe Boring Operations by a Virtual Reality Environment. In: Proceedings of IEEE Virtual Reality Annual International Symposium, 2012.
[10] NBR ISO 10015. Gestão da Qualidade: Diretrizes para Treinamento. Rio de Janeiro, 2001. 12 p.
[11] POUPYREV, I., et al. Egocentric Object Manipulation in Virtual Environments: Empirical Evaluation of Interaction Techniques, 1998.
[12] Unity3d. Game Engine, tools and multiplatform. Available at: <http://unity3d.com/unity>. Last access: 05/05/2017.