Development and Evaluation of Interaction Concepts for Mobile Augmented and Virtual Reality Applications Considering External Controllers
Master’s Thesis by
Tobias Schürg
Submitted to the Department of Computer Science at RWTH Aachen University
First Reviewer:
Prof. Wolfgang Prinz, Ph.D.
Second Reviewer:
Prof. Dr. Bernhard Rumpe
Supervisor:
Maximilian Speicher, MSc ETH Inf.-Ing.
July 15th, 2015
DECLARATION
I hereby declare that I have created this Master’s thesis completely on my own and used no other sources than the ones listed.
Tobias Schürg Aachen, 7/15/2015
ABSTRACT

Unlike virtual reality, which is fully computer generated, augmented reality extends the real world by adding virtual content based on the user’s current context. Since augmented reality can assist in various situations, it is moving away from stationary workstations to mobile devices, where it can be used to its full potential. With the advent of powerful smartphones and tablets, mobile augmented reality has finally become available to the general public. Up until now, however, most applications restrict themselves to showing additional information and lack the ability to interact with the virtual content. Therefore, this thesis focuses on interaction concepts for mobile augmented reality scenarios which don’t require specialized or expensive hardware, driven by two research questions. The first research question addresses whether convenient interaction can be provided on a touchscreen, while the second examines whether a smartwatch can be used for this purpose. The latter can be beneficial especially in situations where other input is not available, such as when wearing a head-mounted display or when one’s hands are not free for interacting. In order to evaluate the intuitiveness and usability of the considered concepts for both touch and smartwatch input, a user study is conducted. The results show that a smartwatch can be used for augmented reality interaction and thus answer both research questions affirmatively.
KEYWORDS: Augmented Reality, Hands-Free Interaction, Intuitiveness, Mobile AR, Pie Menu, Ray Casting, Usability, Touch Interaction, Smartwatch Interaction, Virtual Reality.
ACKNOWLEDGEMENT

First, I would like to express my sincere gratitude to my parents, who supported me throughout the years and made my studies possible in the first place. Moreover, I am thankful for the support of Patricia and my whole family. Furthermore, I would like to express my gratitude to my great supervisor Maximilian Speicher for his continuous support and his constructive feedback, as well as to Simon Heinen for his initial thoughts on defining this thesis. I would also like to thank Prof. Wolfgang Prinz, Ph.D. and Prof. Dr. Bernhard Rumpe for supporting this work, and all 27 participants of the user study. I am thankful for my friends, the whole Bitstars team and everyone else who supported me while writing this thesis and during my studies. Finally, some special thanks go to Maximilian, Patricia and the Weavers for proofreading this thesis.
TABLE OF CONTENTS

Declaration
Abstract
Acknowledgement
Table of contents
1. Introduction
   1.1. Motivation
   1.2. Problem Description
   1.3. Extension
   1.4. Aims of Research
   1.5. Structure of this Thesis
2. Interaction in Virtual Environments
   2.1. Navigation & Travel
   2.2. Selection
      2.2.1. Cursor Selection
      2.2.2. Object Touching
      2.2.3. Ray Casting
      2.2.4. Curve Selection
      2.2.5. Summary of Selection Techniques
   2.3. Manipulation
      2.3.1. Touchscreen
      2.3.2. Tangible User Interface
      2.3.3. Pure Hand
      2.3.4. Ray Casting and Arm Extension
      2.3.5. Gestures
      2.3.6. Voice and Speech
      2.3.7. Summary of Manipulation Techniques
   2.4. Symbolic Input
   2.5. System Control
   2.6. Summary and Considerations
3. Interaction Concepts
   3.1. Requirements
   3.2. Selection
   3.3. Translation
      3.3.1. Drag
      3.3.2. Magnet
      3.3.3. Place
      3.3.4. Remote Drag
   3.4. Rotation
      3.4.1. Fulcrum
      3.4.2. Horizontal Pan
      3.4.3. Object Center
      3.4.4. Screen Center
      3.4.5. Two Finger Gesture
   3.5. Scaling
      3.5.1. Lasso
      3.5.2. Two Finger Gesture
      3.5.3. Vertical Pan
   3.6. Combining Methods
   3.7. Smartwatch Interaction
   3.8. Concepts Summary
4. Implementation
   4.1. Tools & Technologies
   4.2. Foundations of jMonkey
   4.3. Pie Menu
   4.4. Handheld Implementation (Smartphone)
      4.4.1. Core Concept
      4.4.2. Interaction Concept Implementation
   4.5. Smartwatch Implementation
      4.5.1. Smartwatch Application
      4.5.2. Visual Ray
      4.5.3. Selection
      4.5.4. Rotation
      4.5.5. Scaling
      4.5.6. Translation
      4.5.7. Completing Manipulation
   4.6. Implementation summary
5. Evaluation
   5.1. Interim User Study
      5.1.1. Design
      5.1.2. Participants
      5.1.3. Results
   5.2. Handheld User Study
      5.2.1. Design
      5.2.2. Participants
      5.2.3. Results
   5.3. Smartwatch User Study
      5.3.1. Design
      5.3.2. Participants
      5.3.3. Results
      5.3.4. Summary
   5.4. Comparison
   5.5. Summary, Feedback and Observations
6. Discussion & Conclusion
   6.1. Findings
   6.2. Limitations
   6.3. Future Work
References
List of Figures
List of Tables
7. Appendix
   7.1. Demographic data
   7.2. Evaluation Results
   7.3. Feedback
      7.3.1. General
      7.3.2. Rotation
      7.3.3. Scaling
      7.3.4. Translation
      7.3.5. Smartwatch
      7.3.6. Scenarios for a smartwatch
1. INTRODUCTION

Unlike virtual reality (VR), which relies on fully computer generated content, augmented reality (AR) refers to a computerized perception in which the real world and the virtual world are merged [1]. AR allows the real world to be enhanced by adding computer generated content such as virtual objects, textual or other relevant information based on the current context and environment of the user. To create an AR experience, at least a camera and a display device are needed: the former captures the environment, while the latter augments virtual content onto the captured scene. Since the first VR and AR prototypes were based on desktop computers and required special hardware to be placed around the user, those systems were stationary (Figure 1) and restricted to indoor use [2]. Early mobile approaches were created for outdoor use, but still suffered from limited mobility because of their heavy weight and large size. Until a few years ago, many of those mobile systems consisted of an ordinary screen, either handheld or attached to the user’s head, connected to a heavy backpack containing a laptop for all the calculations, different sensors for pose estimation, batteries and everything else (Figure 2) [3]. At that time, AR research focused mainly on attaching labels to real-world content, showing virtual 3D objects to illustrate complex issues, or navigation. As technology evolved, existing hardware became both smaller and more efficient, while at the same time new devices such as smartphones and wearables were developed which have more capabilities and offer new possibilities for AR purposes [4].
In general, there are two commonly used ways to experience virtual content: wearing a Head-Mounted Display (HMD) or using a Hand-Held Device (HHD). The former, sometimes also referred to as traditional AR, provides a more immersive experience but requires more, and specialized, hardware. A number of different HMDs for VR and AR use are currently being developed, for example Meta [5], Microsoft HoloLens [6] and Oculus Rift [7], to name only the most popular among them. AR on an HHD such as a smartphone or tablet is sometimes also referred to as a magic lens [8], since using the device camera to capture the environment feels similar to taking photos.

Figure 1 (left): The first AR system by I. Sutherland (1968) [2]
Figure 2 (right): The Touring Machine, the first mobile AR system (MARS) (1997) [9]
1.1. Motivation

Due to the increasing spread of smartphones, mobile AR is now widely available, offering more and more serious and useful use cases [10]. A report released by Google in 2013 [11] shows that in most industrialized countries, smartphone penetration was already far above one third of the population at that time. Another source [12] projects that by 2017, almost 65% of the total western European population as well as over one third of the world’s population will own a smartphone.
Augmented reality is moving away from stationary workstations and becoming mobile, thereby exploring new fields and offering various new possibilities [4]. On the one hand, mobile devices suffer from energy limitations and other constraints such as limited computational power; on the other hand, they offer many features which traditional computers don’t have. Amongst others, these new devices are small and lightweight, and provide access to several sensors for motion detection as well as built-in cameras. Above all, it was technological improvements, including powerful mobile processors, mobile internet and cloud infrastructure, that made mobile AR possible in the first place. Since there are many ways in which AR can assist the user in everyday life, and because AR heavily depends on the environment, it was a natural next step for AR to become mobile.

Today, smartphones are ubiquitous and part of everyday life, with very high usage numbers [11]. Moreover, a variety of mobile AR applications for smartphones and tablets already exist. Most of them focus on merely augmenting the immediate real world with “helpful” virtual data such as altitudes and names of mountain tops [13], zodiacs [14], public Wi-Fi hotspots [15] or additional information for products in supermarkets [16].
1.2. Problem Description

All the previously mentioned applications have one thing in common: they focus on showing contextual information about the immediate environment, or add virtual objects but lack the ability to interact. In many scenarios this is acceptable, but to exploit the full potential of virtual environments, be it AR or VR, solid interaction concepts are needed. Constructing prototypes or virtual worlds, setting up an apartment or gaming are only a few examples of when the user could benefit from being more than just a spectator.

Besides changing the perspective, there is almost no interaction possible with the virtual content. Although there is currently research going on in this area, mostly on hand tracking or gesture recognition (see chapter 2), there are few working examples of gestural interaction concepts for AR which are both convenient and affordable for the average user. This might be due to the limited availability of specialized hardware, but also because of the lack of processing power needed to simultaneously track the environment, perform feature recognition and detect gestures on a mobile device [17]. Therefore, HHDs seem to be preferred for interaction in mobile AR scenarios [18]. Another advantage of this type of interaction is the wide spread of smartphones and tablets, through which users are already familiar with the concept of touch [19].

There are several ways interaction can be performed in virtual environments, among them touchscreen input [17], tangible user interfaces [20], hand tracking [21] and ray casting [22] techniques. Since smartphones already offer a touchscreen and the other interaction methods have more restrictions, one focus of this thesis is on touchscreen interaction. Section 2.6 details this decision.
1.3. Extension

There are situations in which interaction can be advantageous, but the usage of a touchscreen is undesired or even impossible: for example, when wearing a head-mounted display, in industry when one’s hands need to be free, when wearing gloves, or while cooking when one’s fingers are greasy. In these situations an external controller might be helpful.

At last year’s Google I/O (2014), when Android Wear was first presented, smartwatches finally became widely accepted. Approximately two million smartwatches were sold in 2013 [23] and about 6.8 million in 2014 [24]. Apple released their first smartwatch only a few months ago, in April 2015, and is expected by J.P. Morgan to sell 26.3 million smartwatches by the end of 2015 [25]. Gartner estimates a market of 68.1 million smart wristbands in 2015 and expects about 50 percent of those people who are interested in a smart wristband to choose a smartwatch instead [26]. Juniper Research estimates a total product shipment of 130 million wearables, including smartwatches, by 2018 [27]. Moreover, according to a blog post by GlobalWebIndex [28] from March 2015, 10% of 16-24 year olds already own a smartwatch. Based on these numbers and their technical features, such as motion sensors, smartwatches might be an alternative when touch is not possible or undesired.
1.4. Aims of Research

This thesis has two goals, each represented by a research question to be answered in the end. There has been a lot of research on different object selection techniques [29][30], but manipulation is largely limited to translation and sometimes rotation. Virtual objects are thereby often attached to the user’s hand, the end of a ray or another real object, so that the former’s movement is controlled by the position of the latter. This might not be adequate for use on mobile devices.

Research Question #1: Can we provide good and convenient ways for interacting with virtual objects in mobile augmented reality environments?

Therefore, the first aim of this thesis is to investigate interaction concepts that are well-suited for mobile AR and let the normal user do more than just move an object from one position to another. Additionally, it should be possible to use those concepts without requiring specialized and expensive hardware such as data gloves.

The aim of this thesis is to develop good and convenient ways for interacting with virtual objects in mobile AR environments. The first goal is to develop interaction concepts that are both intuitive and usable and work well on touchscreens, especially on consumer devices. Moreover, this means that no specialized or expensive hardware is required beyond the smartphone itself. The second goal is then to verify whether a smartwatch can be used to interact in virtual or augmented realities and, if so, whether interaction with a smartwatch can replace touch. This is useful in several situations, for example when using a head-mounted display and no touchscreen is available.

Research Question #2: Can a smartwatch be used for interaction in virtual environments?

Therefore, several interaction concepts are initially implemented and then evaluated on a touch device. Based on their performance, some of those concepts are then extended to be controlled by a smartwatch. To answer the two research questions, at first nine concepts for the use with touchscreens are developed. After that, another set of interaction methods for the use with a smartwatch is implemented. The considered interactions include selection and three manipulations: rotation, scaling and translation. Although the main focus is on AR, the developed concepts will also work in VR. Besides a summary of existing interaction techniques for augmented and virtual reality, this thesis also provides implementation details and a comprehensive user study in which the developed concepts are evaluated by different criteria. Additionally, a concept for easy switching between different interaction modes is presented.

The outcome of this work is a complete library for interaction and the answers to the two research questions.
1.5. Structure of this Thesis

The next chapter provides an overview of the field of interaction in virtual environments. Multiple approaches are presented, with the emphasis mainly on object selection and object manipulation. The following chapter presents the interaction concepts considered in this thesis, whose implementation is then coarsely described in chapter 4. In chapter 5 the user study and its results are presented and analyzed. After that, chapter 6 concludes by summarizing the main findings, answering the two research questions, pointing out limitations and finally motivating further research.
2. INTERACTION IN VIRTUAL ENVIRONMENTS

In virtual three-dimensional environments, interaction can be classified into five categories [30], of which selection and manipulation are the two most interesting for AR. Navigation and travel are in general not required as much as in pure VR, since the user is able to move and walk around by himself and only the position of the virtual object needs to be determined. Furthermore, there is system control, which refers to changing the interaction mode or system state. In many applications, system control can be achieved by using virtual menus and then drilling down to a selection task. Lastly, there is symbolic input, the procedure during which a user transmits symbolic information to the system. Although selection and manipulation belong together, since they are hardly used in isolation, Bowman et al. show in their user study that it makes sense to consider them as separate issues for overall usability [31]. This chapter gives an overview of the different ways that exist to interact in virtual environments. The focus will be primarily on selection and manipulation.
2.1. Navigation & Travel

“Navigation, or wayfinding, refers to the process of determining a path through an environment to reach a goal, whereas travel means the first person motion control through a virtual environment” [32]. Navigation can be assisted by a map overlay or a navigation system which leads the user to the target location.
Besides exploring virtual environments with traditional input devices such as keyboard, mouse or joystick – which are not very natural at all – other custom physical controls exist. There are further techniques which don’t use a physical control and can enhance the interactivity of a virtual environment. These can be coarsely classified into three groups: gaze, pointing and physical motion techniques.

Pointing techniques use the current orientation of the user’s hand and move to where the user is pointing. This can sometimes be confusing for novice users who don’t fully understand the relation between hand and gaze direction [29]. As an alternative there is gaze-directed motion, where the user moves in the direction he is currently looking, typically approximated by the head direction. A drawback of this technique is that simply looking around is no longer possible. Bowman et al. show in three qualitative experiments within a framework for evaluating the quality of different motion techniques that “Pointing techniques are advantageous relative to gaze-directed steering techniques for a relative motion task, and that motion techniques which instantly teleport users to new locations are correlated with increased user disorientation.” [32].

One of the simplest and most natural ways for a user to move through a virtual world is to map his movements in the physical world, such as walking, into corresponding motion through the virtual world [29]. The disadvantage of physical motion is that the range of user motion directly depends on the tracking technology that is used. The correspondence between physical motion and virtual motion can be scaled so that one step covers anything from a few centimeters to several light years in a space simulation. Another example of a navigation technique is the implementation of LaViola et al. [33], where the user walks on a miniature floorplan. Nevertheless, range is still limited. To overcome these limitations of range, treadmills [34][35] or adapted bicycles [36] can be used for exploring virtual environments.

Many more navigation and travel techniques exist, but since in mobile augmented reality the user can walk around by himself, special techniques are less important than in pure virtual environments. The interested reader will find more information about this topic in the given references and the references therein.
2.2. Selection

Selection refers to choosing one or more objects from a group. Selection may be used on its own to specify an object to which a command will be applied (e.g. “delete the selected object”), or it might denote the beginning of a manipulation task [37]. In “Classification of Interaction Techniques in the 3D Virtual Environment on Mobile Devices” [38], Balaa et al. sum up three constraints when selecting virtual objects on mobile devices: environmental density, depth of targets and occlusion. The first problem is environmental density: if there are many objects very close to each other, accurate selection without resizing the scene can be very difficult. Furthermore, since 3D objects are arranged at several depth levels, the depth of targets concerns the problem of pointing at a certain depth, reaching objects far away and knowing the exact position of the cursor in relation to the target. The third constraint deals with occlusion and how to reach an object which might be partially or totally hidden by another object, and thus have reduced visibility. In the following, different selection techniques will be presented and evaluated against these three constraints.
2.2.1. Cursor Selection

Inspired by 2D selection on desktop computers, where a computer mouse is used, a very simple technique to select an object in a 3D virtual environment is to move a virtual cursor in 3D. The cursor movement can be controlled by hardware buttons or any other physical input device, such as a 3D mouse which is freely moved around and whose 3D position is then linearly transformed into virtual space. An object is selected by pressing a button on the input device. Since an intersection with the object is needed, accurate movements are required. Thus, this task can be very complicated, especially in dense environments. Depth, more precisely estimating the distance of the cursor, is a major problem, as is occlusion [38].

As an improvement to the simple 3D point cursor, Zhai et al. developed the silk cursor technique, where the point cursor is replaced by a semitransparent rectangular volume [39]. The silk cursor uses a rectangular volume in order to make the selection of both small and moving objects easier. Furthermore, it is semitransparent, so that a user can tell by the level of silkiness whether the object is behind, inside or in front of the cursor. In doing so, this technique overcomes the depth problem. Another technique proposed by Dang et al., called the transparent sphere technique [40], selects all objects within a given distance around the pointer. With this technique, the pointer does not need to intersect the object itself, which solves the density problem. The desired object can then be selected via a graphical menu.

Although 3D cursor selection techniques are relatively simple to implement, they all suffer from the occlusion problem and, due to their inaccuracy caused by depth, they are only of limited use [38].
2.2.2. Object Touching

A special case of 3D cursor selection and a very intuitive way of selecting an object is by touch. An object can be selected either when it is touched (e.g., the hand’s position intersects with the virtual object) or when a hand gesture, such as grabbing, is performed near the object. Although selection by touch might be the most intuitive way, it is only useful for targets close to the body, within arm range, and might be inappropriate in dense environments. Furthermore, complex hand tracking and gesture recognition is needed. If only finger tracking is used, the missing tactile feedback means it won’t feel at all like real-world grabbing.
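As a minimal sketch of the touch condition, assuming a tracked 3D hand position is available and using jMonkeyEngine types (the engine later used for the implementation in chapter 4), an object could be considered touched when the hand position lies inside its world bounds; the class and method names are illustrative, not part of the thesis implementation:

    import com.jme3.math.Vector3f;
    import com.jme3.scene.Spatial;

    public final class TouchSelection {

        // Returns true if the tracked hand position lies inside the object's bounding volume.
        public static boolean isTouching(Spatial object, Vector3f handPosition) {
            return object.getWorldBound() != null
                    && object.getWorldBound().contains(handPosition);
        }
    }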
There are multiple ways of performing hand and finger tracking, for example computer vision based or glove based. In the glove-based approach, the user wears a data glove that tracks hand movement and finger bending and can even provide haptic feedback. Computer vision based tracking offers more natural interaction, but is very complex and costly, especially on mobile devices. The computer vision based hand tracking approach of Lee et al. [41] uses four steps: first the color of the skin is segmented, then feature points are extracted and matched, and in the third step the direction of the hand is determined before a collision detection is performed. Lee also admits that the system response speed of vision based approaches is not as fast as gloves or other input devices.

A technique which overcomes the small interaction area of pure hand based approaches is the GO-GO interaction technique [42]. GO-GO is an arm extension technique that transforms the user’s real hand into a virtual hand. It defines a radius around the user within which the virtual hand is mapped directly onto the physical hand; outside of this radius, a nonlinear transformation is used to extend the arm (a sketch of this mapping is given below). However, GO-GO is still limited by the length of the real arm and hence can only reach within a finite distance. More techniques that can reach objects at any position will be presented in the following section about ray casting. Due to the non-linear mapping, precise positioning at far distances and selection of dense objects can be difficult.

World-in-Miniature (WIM) [29] is another powerful technique that can be used in environments where objects are not located within arm range. WIM enables users to interact with all virtual objects in a miniature representation floating in front of the user. It also has the advantage of seeing the entire environment and selecting far away objects while still standing at one location. Thus, occlusion and depth are less of a problem. However, because of downscaling, precise selection in dense environments might be difficult.
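As a point of reference, the GO-GO mapping is commonly written as a piecewise function of the physical hand distance R_r from the user’s body, with a threshold radius D and a gain coefficient k that are design parameters:

    R_v = R_r                        if R_r < D
    R_v = R_r + k * (R_r - D)^2      if R_r >= D

Here, R_v denotes the distance at which the virtual hand is placed: within the radius D the mapping is one-to-one, while beyond it the virtual arm grows quadratically faster than the real one.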
Selection by touch is a very natural way of interacting. Yet, since virtual objects per se have no physical representation, there are no haptics, making it less natural. The restricting factor in touch interaction is the limited length of the user’s arm.
2.2.3. Ray Casting

Ray casting techniques move away from the object touching metaphor and instead adopt a pointing metaphor [37]. Every device which is capable of orientation tracking, such as a mobile phone, pen or wearable, can be used for that purpose. A ray is emitted from the user, who can control its orientation by rotating the device. By default, the first object the ray intersects is selected, optionally after confirmation (a minimal code sketch of this basic variant is given at the end of this subsection). It is easy to see that the general ray casting approach solves the depth problem very well, due to the ray’s infinite length. However, selecting small, distant objects can still be difficult. Furthermore, ray casting suffers from the occlusion problem, since only the closest object is selected.

To allow selection of objects inside or behind other objects, Hinckley et al. [43] suggest adding a mechanism for cycling through all objects intersected by the ray. Since it is not specified how the cycling could be done, Grossman et al. [44] use forwards and backwards hand movement in their application, but admit that, with only little visual feedback, users found it very hard to understand how much movement was needed to select a particular target. Dang et al. [40] likewise propose preselecting multiple objects intersected by the ray and then allowing the user to select the desired object via a floating menu.

Instead of an ordinary menu, Grossman and his team propose another two-step selection technique, which they call flower ray [45]. With this technique, all objects intersected by the ray “flower out” into a menu, as shown in Figure 3. The input device is then used to select the desired object from this menu. For this, a 2D cursor is shown in the center of the round menu, and once it leaves the bounds of the circle, the target closest to the cursor is highlighted, indicating that it can be selected. To overcome the problem of selecting small and distant targets, both research groups previously mentioned suggest using a spatial volume instead of a simple ray: Dang et al. [40] use a transparent cylinder, similar to the silk cursor, while Grossman et al. use a translucent cone.

Figure 3: Flower Ray

When there are only few occluded objects, Dang and his team propose the “ray with variable length” technique [40]. The object closest to the ray’s endpoint is first highlighted and then selected when the user presses a button. Building upon the idea of a ray with variable length, Grossman and his team developed two techniques in which a depth marker is attached to the ray for selection [45]. With Depth Ray [45], all objects intersected by the ray are highlighted in green and the one closest to the depth marker is highlighted in red. The position of the depth marker is controlled dynamically by moving the hand forwards or backwards. Instead of directly cycling from one target to the next, as implemented by Grossman [44], the depth marker moves continuously along the ray, always selecting the closest target. This is visualized in Figure 4. According to Grossman [44], the depth ray technique outperforms other techniques where the depth marker jumps directly from one target to another.

Since the ray position might be changed unintentionally, especially when trying to move the marker, there is the Lock Ray technique, a variation of Depth Ray. Instead of controlling the ray and moving the marker simultaneously, these two phases are carried out sequentially: first the ray is locked by pressing a button and all intersected objects become highlighted; only then does the depth marker appear, which can be adjusted by moving the hand forwards and backwards, similar to the depth ray.

Figure 4: Depth Ray. The circular, purple colored point on the ray is the depth marker.

Using two rays [44] is another way to select an object directly in one step. Either the intersection point of the two rays or the point at which the rays are closest to each other is determined, and the object closest to that point is selected. This technique overcomes the occlusion problem, but a second pointing device is required and, additionally, the user needs both hands for the selection process.

Based on the idea of two rays which intersect and define a single point in 3D space, the Smart Ray technique by Grossman et al. [45] estimates the intersection of a single ray with itself over time. Their algorithm assigns weights to all of the objects which are intersected at one point in time. Those weights are continuously updated based on proximity to the ray: the closer the ray gets to the center of a target, the greater the weight increase. With this algorithm, one single object out of multiple initially intersected objects can be selected by repositioning the ray during the selection process. Even if multiple targets are intersected at the new ray position, the intended target will have the highest weight, as its weight has been continuously increased. This technique allows precise selection and solves the density, depth and occlusion problems. However, quick selection in one step is no longer possible.

In summary, ray casting techniques are very versatile and offer an easy solution for object selection, even at far distances. However, since the modified versions that overcome the occlusion problem will often select multiple objects at once, a second selection step needs to be carried out. Selection thus becomes a process of at least two phases, which means that selecting a single object takes additional time.
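To make the basic first-hit variant concrete, the following minimal sketch uses jMonkeyEngine’s collision API (the engine used for the implementation in chapter 4) to pick the closest object hit by a ray cast from the camera along its view direction. It only illustrates the principle rather than the implementation developed later, and the optional confirmation step is omitted:

    import com.jme3.collision.CollisionResults;
    import com.jme3.math.Ray;
    import com.jme3.math.Vector3f;
    import com.jme3.renderer.Camera;
    import com.jme3.scene.Geometry;
    import com.jme3.scene.Node;

    public final class RayCastSelection {

        // Returns the closest geometry hit by a ray cast from the camera, or null if nothing is hit.
        public static Geometry pickClosest(Camera cam, Node rootNode) {
            Vector3f origin = cam.getLocation();
            Vector3f direction = cam.getDirection();
            Ray ray = new Ray(origin, direction);

            CollisionResults results = new CollisionResults();
            rootNode.collideWith(ray, results);      // collect all intersections along the ray

            if (results.size() == 0) {
                return null;                         // the ray hit nothing
            }
            // Results are sorted by distance, so the closest collision is the "first hit".
            return results.getClosestCollision().getGeometry();
        }
    }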
2.2.4. Curve Selection

Another way to solve the occlusion problem when selecting virtual objects is to use curve techniques. Curves are an extension to ray casting techniques and have been proposed by Dang et al. [40]. With curve techniques, the linear ray is replaced by a curve, allowing the user to point around objects and thus to select partially or fully obscured objects in a single step. Dang et al. [40] consider curves with both fixed and variable lengths. A concrete implementation of a curve technique by Feiner is the flexible pointer technique [46], where the curve is visualized by a curved ray. The pointer is implemented as a quadratic Bézier spline whose direction is determined by a vector formed by the user’s hands; the amount of curvature is furthermore defined by the orientation of each hand.
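For reference, a quadratic Bézier curve over control points P0, P1 and P2 is given by

    B(t) = (1 - t)^2 P0 + 2 (1 - t) t P1 + t^2 P2,   t in [0, 1]

so steering such a pointer amounts to moving its three control points; in the flexible pointer these are, roughly speaking, derived from the positions and orientations of the user’s hands, as described in [46].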
Although curves solve the occlusion problem, the technique has other drawbacks. For example, in a very crowded environment a curve will still select multiple objects, so a two-step approach, as with ray casting, might again be needed; with curve selection, however, this would forfeit the efficiency advantage over simple ray casting. Moreover, two hands are needed for the selection process, and it is questionable how well users can cope with steering the pointer in both length and curvature.
2.2.5. Summary of Selection Techniques

Point cursor techniques are prone to depth issues, since it is difficult to locate the pointer position relative to the target. There are some improvements, such as the silk cursor technique, but the pointing task in depth remains difficult. The depth problem can be solved by the use of ray casting techniques. However, since the ray might cross other objects before the desired one is intersected, this technique is subject to the occlusion problem. Therefore a second validation phase might be necessary, which increases the overall selection time. Curve techniques can solve the occlusion problem by steering around targets in front of the user, but might again suffer from the depth problem as well as poor usability, because the curve has to be both moved and curved, making precise pointing difficult. An overview of the different object selection techniques presented in this section is shown in Table 1.

Grossman et al. [45] compare the 3D point cursor technique with the ray cursor for selecting objects on volumetric displays and find that, for selecting a single target, the latter is significantly faster. In another experiment, they compare different ray casting techniques and determine the depth ray technique to be the most successful in terms of acquisition time and error rate. Lock ray and flower ray also perform well, with low error rates and reduced selection times, but their acquisition times were not as good as that of the depth ray technique. An evaluation by Bowman & Hodges [31] shows that ray casting techniques perform more efficiently than arm extension techniques like the GO-GO technique.
Table 1: Overview of selection concepts

Technique     | Advantages                     | Problems / Limitations
3D Cursor     | Similar to a computer mouse    | Density, Depth, Occlusion
Touch         | Intuitive, Simple              | Density, Depth (arm range), Occlusion
Ray Casting   | Range                          | Occlusion
Curve         | Selection of occluded objects  | Density, Difficult steering
2.3. Manipulation

In many cases it is not sufficient to only select an object; often, modification of the object’s attributes is desirable as well. These attributes may include position, orientation, scale, shape, color or texture. For the most part, researchers consider manipulation of the position and the orientation of rigid objects. This is due to the fact that these two manipulations, translation and rotation, represent those modifications that we are familiar with in the real world. Sometimes changes in size are considered as well, but since virtual objects often mimic real objects, this behavior is not always desired. Therefore, the term basic manipulations or canonical manipulations often refers to translation, rotation and scaling.
2.3.1. Touchscreen

Since today’s hand-held devices are smartphones and tablets, they already have a built-in touchscreen. Hence, for mobile AR it seems natural to use touch input in order to interact with virtual objects. Additionally, no special hardware is required beyond the mobile device itself, and users are already comfortable with this kind of interaction. However, since touchscreens are limited to recognizing 2D input, there needs to be a mapping from the touchscreen into 3D space.

Furthermore, additional problems are observed. As van Olst notes in [47], the most notable problem when dealing with touch interaction is the “fat finger problem”. It occurs when the finger used for interaction covers the content, making reasonable interaction impossible because the target object is no longer visible. This problem has two sources: the object is either too small or too far away. In ordinary applications this is often not a big problem, since the content can be scaled up or down to a sufficient size. In virtual environments, however, some objects are attached to a fixed position and therefore cannot simply be scaled or resized independently; they only change their apparent size if the user moves closer to or further away from the object.

Translation, rotation and scaling are often implemented as known and proven from other established mobile phone applications [19]. More precisely, a single-finger drag (and drop) is often used for translation, and the two-finger pinch or zoom (spread) gesture for scaling and rotation. Rotation uses the angle between the initial and current line connecting the two touch points, while scaling uses the relative change of the distance between the two fingers, whose ratio is then applied to the target object.

Figure 5: Touch gestures: Drag, Pinch, Zoom [48]
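As an illustration of this two-finger mapping, the following plain-Java sketch (engine-independent; class and parameter names are illustrative, with touch points given as {x, y} pixel coordinates) derives the scale ratio and the rotation angle from the initial and current positions of the two fingers:

    public final class TwoFingerGesture {

        // Scale factor: current distance between the two fingers divided by the initial distance.
        public static float scaleFactor(float[] start1, float[] start2,
                                        float[] current1, float[] current2) {
            return distance(current1, current2) / distance(start1, start2);
        }

        // Rotation angle in radians between the initial and the current line through both fingers.
        public static float rotationAngle(float[] start1, float[] start2,
                                          float[] current1, float[] current2) {
            double initial = Math.atan2(start2[1] - start1[1], start2[0] - start1[0]);
            double current = Math.atan2(current2[1] - current1[1], current2[0] - current1[0]);
            return (float) (current - initial);
        }

        private static float distance(float[] p, float[] q) {
            float dx = q[0] - p[0];
            float dy = q[1] - p[1];
            return (float) Math.sqrt(dx * dx + dy * dy);
        }
    }

The resulting ratio and angle would then be applied to the selected object’s scale and orientation in each frame of the interaction.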
Furthermore, there are crosshair-based interfaces [49], where a crosshair is added to the middle of the screen and the motion sensors of the device are used to track device orientation and thus the viewing direction. Selection is then achieved by aiming at an object, which feels like shooting in a first-person shooter and is very similar to ray casting. Compared to pure touch-based interfaces, the crosshair itself can be seen as an additional stationary finger at the center of the screen. Hence, object translation is accomplished by device movement: after selection, the object is attached to the crosshair. Similarly, the object is rotated using just a single finger; in this case the rotation angle is the one between the first touch position, the crosshair and the last touch position. Since the device can be held with two hands, this kind of interaction is normally less shaky than the pure touchscreen approach [10].

In his Master’s thesis [47], van Olst compares touch-based and crosshair-based interfaces. Based on a user study, he concludes that the “touch-based interface trumps the crosshair-based interface in both performance and usability” [47]. This is due to the fact that touch interfaces are very simple to use and users are already familiar with this kind of interaction. He also adds that most participants just used the mobile phone as a window to the AR environment without being forced to be a part of it. Additionally, he points out that people tend to keep their device stationary when it is not required to be moved. One reason for this attitude might be that a change of viewpoint feels bothersome to most users.

Sometimes it can be difficult to hold the device completely still, because hands are shaky or the device needs to be held in an uncomfortable position. Both can lead to unexpected errors and make proper interaction difficult. Therefore, Chowdhury et al. [10] propose an interaction technique called Freeze-Object for more precise touch interaction with objects in AR. This technique is based on other freeze techniques, which pause the whole scene, but in contrast freezes only the desired object rather than the environment. Once the user freezes the object, the device can be moved in any direction without relocating the target object by accident. Although announced, no user study has been published three years later.
2.3.2. Tangible User Interface

The basic idea of tangible user interfaces (TUI) is to attach virtual objects to physical ones. The user can then interact with those virtual objects by altering the corresponding physical object. For fast object recognition, researchers often use simple QR codes attached to tiles, paddles or cubes. Sometimes this is also referred to as tangible augmented reality [50]. Besides the fact that TUIs are “extremely intuitive” [51] because of the one-to-one mapping, there are further important advantages. First, because the objects have a rigid representation in the real world, they also provide real haptics, so there is less confusion for the user about whether an object is currently held or not. It is also possible, and often desired, to use two hands and interact with multiple objects simultaneously. Furthermore, TUIs allow easy collaboration between multiple participants, since the physical objects can be passed on to each other [51]. However, Chowdhury claims in [10] that tangible interaction is hardly compatible with handheld devices: it is assumed that the user has both hands free to manipulate physical objects, yet the user must hold the device with at least one hand, which will not be the case with mobile phones [51]. Additionally, focusing on both the correct device orientation and the tangible object strongly limits the interaction possibilities.

The Smarter Objects AR system by Heun et al. [52] associates virtual objects with physical objects. They use this mapping in order to “reprogram” the (physical) smart object so that it obtains a different behavior. When their AR application recognizes an object, a graphical interface is shown on the handheld device that enables the user to change the object’s behavior or to connect multiple smart objects to perform a common task. As an example, they present a radio with only two knobs, a volume and a tuning knob. With the tuning knob, a user can go through preselected radio stations and change the volume with the other knob. When using the corresponding application on a tablet and pointing towards the physical object (the radio), a second, graphical user interface is augmented on top of the TUI. It shows the currently chosen station and moreover enables the user to edit the preselected stations. All changes to the TUI are shown on the GUI as well. Furthermore, a user can connect the radio to another speaker; the linking is done by drawing a line on the touchscreen which connects the radio with the speaker. Another example of a smart object from Heun et al. [52] is a light switch where the user can change the color via an augmented graphical user interface that is presented on top of the TUI. The light switch then works as programmed by the user. They particularly emphasize that smart objects help to minimize the complexity of small and low power devices and help users to better understand devices with the aid of an additional, augmented graphical interface.
2.3.3. Pure Hand

A natural and common object manipulation technique is to attach the selected object directly to the user’s hand. The attached object then moves along with the hand until it is released. Although this is a simple and intuitive technique, some orientations might require the user to twist the arm into uncomfortable positions. One big advantage of this technique is that it can be combined with scenarios where hand selection is already used, since no transition to another concept is required. Similar to selection by hand, this technique suffers from the limitations of reach, which again can be partially overcome by arm extension techniques, as presented in the next section. Furthermore, the manipulation of large objects within arm range might also occlude the user’s view [37].
2.3.4. Ray Casting and Arm Extension

Arm extension and ray casting techniques for selection (cf. sections 2.2.2 and 2.2.3) can also be extended for object manipulation. Arm extension techniques attach the object to the hand and allow natural manipulation similar to the pure hand manipulation presented above. Likewise, with ray casting the selected object is attached to the ray, allowing intuitive but imprecise manipulation which is limited to translation only. Since with ray casting the object is typically attached to the end of the ray, there is no simple method for rotating the object in place, except around the axis of the ray; Bowman calls this the lever-arm problem [31]. Moreover, simple ray casting lacks a method for controlling the object’s distance from the user.

Although selection via ray casting is very efficient, researchers try to combine this technique with hand manipulation in order to enable more intuitive interaction. One implementation is by Bowman & Hodges [31], where the user grabs the object with a light ray, but instead of attaching the selected object to the end of the ray, a virtual hand moves to the object’s position and is then attached to the object. This is called the HOMER technique (Hand-centered Object Manipulation Extending Ray casting). Thus, the object can be placed anywhere on a sphere surrounding the user at the current distance, while object rotation is controlled independently by hand movement. The distance of the object from the user can additionally be specified using either hand motion or mouse buttons.

The WIM paradigm, which was presented earlier as a selection technique in section 2.2.2, can solve the remote manipulation problem as well. Again the user is given a small virtual hand-held copy of the environment, with which the user can interact directly by hand. If the user manipulates an object within the WIM, the larger, remote object moves as well.
2.3.5. Gestures

In the hope of a more immersive feeling and natural interaction, a significant amount of research focuses on gestures [17][53]. For gesture recognition in mobile AR scenarios, tracking of certain body parts is necessary. Finger or hand tracking can be accomplished by computer vision, with or without markers. In [41] it is recommended to use a high resolution camera for finger tracking in order to improve accuracy, and additionally data gloves for better response speed. Moreover, the tracking of arm movement is also possible via wristbands or smartwatches.

Lee et al. [20] combine hand gestures with tangible interaction. In their setup, the user wears two pinch gloves with visual markers for tracking both the hand positions and their orientations. Additionally, a small vibration motor for feedback is added. Figure 6 shows one of their pinch gloves. Using gloves is faster than finger tracking [41] and also offers finger bending sensors for tracking finger movement. In their setup, both hands are distinguished: the right hand, also called the dominant hand, and the left or non-dominant hand. The non-dominant hand is used for “upper tasks” like menu selection and manipulation mode change. In contrast, the dominant hand is used for specific or “lower tasks” based on the selected upper task, such as selecting, grasping and pointing. Each upper task covers several specific tasks. Lee et al. [20] admit that an inexperienced beginner will have problems with this new kind of interaction, so they add a guidance mode that shows a set of possible next interactions based on previous gestures and actions. Since no user study has been performed, it remains unclear whether users will accept this kind of interaction for AR.
Figure 6: Pinch Glove [20]
Seeing that today almost everybody owns a smartphone, and these devices are increasingly becoming a part of everyday life, there is an increasing interest in gesture interaction for mobile AR [53]. Furthermore, smartphones are becoming more powerful, so that the limited size of the input and display area is becoming a bottleneck. For that reason, Caballero et al. [53] designed BEHAND, a 3D interface for smartphones which extends the workspace into the area behind the mobile phone. BEHAND uses the rear camera of the smartphone to track finger or hand movement behind the device. In a sample video [54] they show some possible applications and games, and also remotely control a smart kettle with simple gestures. This video was shown to potential users, who evaluated the concept of BEHAND positively with words like “useful” or “fun”. However, these statements are of limited value because, again, no actual implementation was tested; participants only watched the “video prototype” [53]. Moreover, some participants mentioned that the use of BEHAND might be perceived as strange.
Similarly, Hürst et al. [21] investigate finger tracking in mobile AR. In a first experiment, objects were floating in mid-air and participants were asked to move those objects to a specific position using finger interaction. Translation to one side was accomplished by pushing the object from the other side. For comparison, they used a touch-based and a crosshair-based approach. Although they received a lot of positive feedback and achieved a high level of user engagement, their “concept failed performance-wise” [21], always being outperformed by the two reference concepts, touch and crosshair. Additionally, their interaction concept of moving an object by pushing from the opposite side turned out to be awkward and hard to control. Furthermore, a major problem they identified was that holding up the device over a long period of time is tiresome and made interaction more difficult, especially when doing long-lasting object manipulations.

Because of the bad performance in the first experiment, Hürst et al. [21] performed a second experiment, an AR board game. Instead of floating in mid-air, the figures in their new setting had a clear relation to the board. Since they did not expect a better performance than before, their focus was on identifying useful and natural ways of interaction. For translation, the pushing-from-the-side approach from the previous experiment was used. Further, a user could grab a figure with two fingers, move the figure and place it at a target location. Both approaches worked well, meaning that users completed the task easily. However, participants preferred to use two fingers since it was both more natural and intuitive. Moreover, exact placement via pushing was harder because sometimes participants overshot the target. For scaling, pinch and zoom gestures were used and accepted by the participants. Rotation was realized in two ways: first by grabbing with two fingers and then turning them, or by selecting the object by touching it, followed by a circular movement with one finger. Although the one-finger rotation is the less natural one, it was preferred by the participants since they nevertheless found it intuitive and easy to understand.
Hürst et al. [21] also mention further limitations of finger tracking on mobile devices. For example, the camera resolution influences how fine-grained interactions can be. Moreover, the distance to the camera has to be considered: if the hand is too close to the camera, no good tracking is possible, and if the user has to reach too far, interaction becomes very uncomfortable.

Bai et al. [17] attempted to provide 3D hand tracking for Google Glass and combine hand gestures with touch input. Since Google Glass offers very limited processing power and only one camera to perform this kind of tracking, they added another hand tracking system, an RGB-D depth camera and a desktop computer, to create real-time 3D models. By adding this additional hardware, the original mobile approach based on an HMD was no longer completely mobile. Thus, this approach is only suitable for scenarios in which the user stays in the same place and does not move larger distances. They also performed a pilot user study in which their hand gestures outperformed touch input via a touchpad, as they seem to be more natural. It is also mentioned that people feel distracted when trying to coordinate their hands using two different interfaces at the same time.

Besides having to overcome oneself to perform gestures in public, there is another problem with hand gestures. Since there are no established hand gestures for 3D interaction yet - like there are for common touch interaction in 2D, such as scroll, pinch and zoom [48] - gestural interfaces for augmented reality often differ significantly. One attempt at standardizing hand gestures can be found in [55], where an evaluation of gestures for AR is carried out. In the end, 44 gestures for 40 tasks such as moving, rotating, scaling, browsing, selecting, editing and menu interaction were selected out of an initial set of about 800 gestures.
2.3.6. Voice and Speech
Speech has some very important advantages compared to other interaction methods. Speech is natural, meaning there is no learning phase. Moreover, speech is wireless and therefore doesn’t need any hardware directly attached to the body that restricts the user. Voice input can be used either for discrete voice commands or for free speech input, but is rarely used on its own. Another big advantage of voice and speech in general is that the users’ hands are free during operation. This makes voice commands well suited for multimodal interaction, where gesture, touch or other input modalities need to be supported by a more precise interaction. [41]
Although often only very simple voice commands such as “save”, “delete” or “quit” are implemented, speech also has the flexibility to specify more complex commands such as “color the selected object green and turn it upside down”. However, remembering long and arbitrary commands can be difficult. Although recognition of long textual information works very well, understanding the meaning of complex commands can be difficult for the system. [56]

Despite the fact that speech recognition works well on mobile devices and that a microphone is a rather simple input device, speech recognition can also be very challenging at places where there is much environmental noise or if the user has an unclear voice. Furthermore, similar to hand gestures, speech input might be inappropriate at certain places where other people might be distracted. Also, it might be awkward for a user to talk to a technical device.
2.3.7. Summary of Manipulation Techniques
A multitude of different manipulation techniques exist that can be used with augmented and virtual reality, each having its own advantages and disadvantages.
Table 2 shows an overview of the concepts presented in this section. Touchscreen interaction is simple and requires no additional hardware, TUIs are very intuitive but not well suited for mobile interaction, while hand tracking is natural but requires additional - often expensive - hardware for tracking and gesture recognition.

Table 2: Overview of manipulation concepts

Technique | Advantages | Problems / Limitations
Touchscreen | Simple; No additional Hardware; Well known | Mapping from 2D into 3D needed
TUI | Intuitive; Haptics | Range; Number of tangibles
Hand | Intuitive; Simple | Range; Only translation and rotation
Arm Extension / Ray Casting | Range; Simple | Only translation and rotation
Gestures | Intuitive; Versatile | Computationally expensive; Additional Hardware
Voice | Works well; Natural; Hands free | Environmental noise; Distracts

The advantages and drawbacks of these manipulation types for mobile AR are further elaborated at the end of this chapter in section 2.6, where the decision on the type of interaction is made.
2.4. Symbolic Input
Besides selection and manipulation, interaction also refers to symbolic input, the task in which a user communicates symbolic information (text and numbers) to the system. Yet, there are only a few cases in which symbolic input is needed in AR, because AR is more about virtual objects than about modifying or inserting text. In most cases, the use of a traditional hardware keyboard or the smartphone’s touch keyboard is sufficient.
Higuchi and Komuro [57] developed a virtual keyboard for AR that has proven to work well. Their application uses the camera of the mobile device and overlays a virtual keyboard on the scene. The user can type in the space behind the mobile device, similar to the BEHAND approach. They performed a user study and found that “more than half of the subjects felt that the operation area of the proposed system is larger than that of a smartphone” [57]. Although Hürst et al. [21] observed that operations in mid-air have a “low accuracy and can only be used for amusement”, Higuchi and Komuro [57] claim that their implementation can also provide stable key typing in the air. Furthermore, there are other input devices such as pens, which are able to recognize the user’s handwriting. Another alternative for symbolic input is speech recognition. On mobile devices, speech input and recognition is often easier and faster than using a keyboard, and hands are free, representing a more natural way of interaction.
2.5. System Control
Controlling the system state, the application mode or activating (deactivating) some functionality is referred to as system control. On desktop computers and touch-based devices this is normally achieved by using buttons and menus. In 3D environments, graphical menus or 3D
buttons can be used as well. By doing so, system control is transformed into a selection task. Another approach is taken by Lv [58], who mounts a smartphone to his wrist in order to create a custom menu interaction. In this way, the rear camera of the mobile phone is used for touchless finger tracking, while normal full-hand gestures are not considered. The menu has a round shape and displays several social media icons. The user can then rotate the circular menu by using simple finger gestures. A “swing gesture” to the left rotates the menu clockwise and vice versa. Selection is confirmed by a “finger flexion and extension” gesture. Participants of a user study confirmed that this kind of interaction is intuitive and well controllable. However, in its present form, this construction might not be socially accepted in work contexts because of its big size [58].
2.6. Summary and Considerations
Interaction on touchscreens might not be as intuitive as tangible user interfaces or direct hand interaction, but since most people use their smartphones every day, they are used to touch concepts. Furthermore, no additional hardware is required, and since touch input can be easily processed, there is processing power left for managing the augmented reality scenario itself.

With tangible user interfaces, the virtual object is attached to a physical object; its position can thus only be altered by moving the physical object. Furthermore, the number of independent virtual objects depends on the number of physical objects available. According to Chowdhury [10], tangible interaction is not well suited for use with handheld devices because the user has to focus on multiple objects and the handheld device itself. Also, the interaction area is limited by the user’s arm range and, although assumed for tangible user interfaces, the user’s hands aren’t free while holding the mobile device.
Gestures and hand tracking are a natural way of interacting in virtual environments. Nevertheless, they have various constraints that make them less suitable for use on mobile devices. A limiting factor is still the processing power of mobile devices, which is needed to perform proper tracking and recognition of the scene at the same time. Therefore, those concepts are either designed very simply or additional hardware such as a second camera or depth trackers is added [17]. Most data gloves are also not designed for mobile usage. Besides those technical limitations, the space behind a handheld device is limited as well, such that direct interaction is only possible with objects within arm range. Another drawback is that the hand always has to be in the field of view to be recognized by the system.
Ray casting works well for selection, but with traditional ray casting techniques it is possible to change neither the distance nor the rotation of the object. Furthermore, an additional input device is needed.
Concerning device types, Billinghurst et al. [51] state that "metaphors developed for Desktop and HMD based systems may not be appropriate for handheld phone based systems". Therefore, new concepts need to be developed. Since smartphones inherently provide sensors for detecting device and body movement, they are a good starting point for mobile AR interaction concepts. Furthermore, in combination with new and upcoming devices such as smartwatches and other wearables in general, many new interfaces are conceivable.

Based upon these considerations, touch is chosen as input for the interaction concepts in the scope of this thesis. Since ray casting does not necessarily depend on an external device, at least the selection process will use ray casting. A similar implementation can then be used for both the handheld and the smartwatch.
3. INTERACTION CONCEPTS
The concrete interaction concepts that are taken into account are presented in this chapter. A basic scenario is considered in which the user wants to arrange virtual objects such as building blocks, furniture or other three-dimensional virtual models. Instead of listing all varieties of concepts one could think of, an attempt was made to have interaction concepts that vary and cover different ideas.

Figure 7: The three considered manipulation types (interaction comprises navigation, selection, manipulation - translation, rotation and scale - system control and symbolic input)
3.1. Requirements
Since this thesis concentrates on altering the orientation of virtual objects, the focus is on the three canonical manipulations: translation, rotation and scaling (see section 2.3). Since arranging virtual furniture in one’s home is a realistic AR scenario for a consumer, a virtual armchair is chosen as the manipulation target for the evaluation (which is presented in chapter 5). The concepts presented in this chapter are based on touch devices, but the general idea might be transferred to other devices, such as smartwatches. Therefore, with one exception, only one-finger interaction techniques are considered instead of two- or multi-touch gestures.
3.2. Selection
Depending on the scenario, and if not done automatically, selection is the beginning of every interaction. Since there are already many concepts for selecting objects (see section 2.1) and, furthermore, there will only be few objects in the scene in our scenario, simple ray casting is used for the selection task. This means the user simply taps on the touchscreen, a ray is cast, and the first object intersected by the ray is selected.
3.3. Translation
For practical reasons, translation in this context refers to object positioning along one plane. In the example case of the armchair, this is the x-z-plane, which is referred to as the floor. When considering other scenarios, this plane can easily be exchanged for any other plane or even another shape, which does not necessarily have to be aligned along the axes. However, in this scenario, we consider only translations along one plane at a time.
3.3.1. Drag
A commonly used method, implemented in many applications, is the drag gesture. This method consists of touching the object and moving the finger across the screen. When the finger is moved, the object moves along with the finger.
3.3.2. Magnet
The idea of the magnet method is to tap somewhere beside the object. The previously selected object then moves in this direction, becoming slower as it approaches the finger. The main feature of this method is that the target object stays visible and does not become covered by the finger. Furthermore, it is not necessary to move the finger very far.
3.3.3. Place
Instantly moving the selected object to the currently touched location is the concept of the place method. “Beaming” would have been another suitable name for this concept. The basic idea behind this concept is to allow quick translation without having to move the finger very far. Since the object is always placed where the screen is touched, this method feels like the drag method when the finger is moved across the screen.
3.3.4. Remote Drag
This is an extension to the basic drag concept. With Remote Drag the finger can be placed anywhere on the screen. When the finger moves, the selected object moves in the same direction as if it was touched directly. The aim of this method is to overcome the fat finger problem [47], which means blocking the view with one’s own finger, while still being able to precisely control the movement. When the finger is placed on the object, this method behaves like the drag method.
3.4. Rotation
Rotation is the act of turning an object around an axis. In the case of our armchair scenario, rotation around the y-axis (the axis orthogonal to the floor, i.e. the up axis) is considered. As with the plane used for translation, this axis can be exchanged as well without further ado.
3.4.1. Fulcrum
The fulcrum rotation does not consider the object as a whole, but rather takes the point where the object is touched as the fulcrum. When the finger is moved, the object behaves inertly and moves its fulcrum in the direction of the finger, as if finger and fulcrum were connected by a rubber band. The idea behind this method is to create a more realistic motion.
3.4.2. Horizontal Pan
The idea of this concept is to rotate the selected object by the amount the finger is moved in one direction. A move to the left rotates clockwise and vice versa. For a rotation around the y-axis, the horizontal screen axis seems appropriate. For a rotation around the x-axis, a vertical pan might be more suitable. One advantage of the horizontal pan method is that the user can touch anywhere to start a manipulation and perform a rotation, even on small or far away objects.
3.4.3. Object Center
Performing a circular movement around the object’s center and considering the angle created in 3D space is the basic idea of the object center concept. The selected object is then rotated by the same angle that is created by the finger movement.
3.4.4. Screen Center
Similar to the object center method, the screen center concept considers the angle created by a finger movement. Contrary to the object center method, the rotation on the physical touchscreen is considered, not the rotation based on the floor plane. Since it is independent of the three-dimensional arrangement, this method is also suitable for rotation of small and far away objects.
3.4.5. Two Finger Gesture
Although this is a gesture which uses two fingers, it is included for completeness since it is implemented in many applications where objects or images need to be resized, such as photo galleries. With the two finger zoom / pinch gesture, the selected object is rotated by the angle that is spanned by the initially touched points and the currently touched points. Similar to screen center and pan rotation, this method can be started anywhere on the touchscreen, not necessarily on the object itself.
3.5. Scaling
Scaling is the act of resizing an object by a scale factor. If the scale factor is greater than one, the object is enlarged; if it is smaller than one, the object is shrunk. Scaling is independent of axes and planes.
3.5.1. Lasso
The Lasso method uses the distance from the selected object’s center to the location at which the floor is touched. If the finger is moved and this distance increases or decreases, the selected object is resized by the same factor.
3.5.2. Two Finger Gesture
Again, the two finger zoom / pinch gesture is also commonly used for scaling [19]. With this method, the object is scaled by the same amount that the distance between the two touching fingers has changed.
3.5.3. Vertical Pan
The vertical pan follows the same principle as the horizontal pan for rotation. With this method, the selected object’s size is increased if the finger is moved upwards and decreased vice versa. Again, the direction of panning could be changed to the horizontal, diagonal or any other direction. In this case, vertical seems to be the most intuitive direction since UP- and DOWN-scaling are performed by upward and downward movements.
3.6. Combining Methods
When implementing these concepts in a real application, one important thing to consider is how well the different methods can be combined. For example the drag translation methods can be combined very well with all rotation methods since dragging starts with the object being touched
first, whereas the proposed rotation methods all may begin when the finger is placed beside the object. So the system can distinguish easily what the user intends to do. Furthermore, object center and lasso, the two panning methods and both two finger concepts can be combined very well so that the user is able to change both the rotation and the scale at the same time. For example, when considering the two panning methods, where a pan to the right rotates the object to the right and an upwards pan increases the object’s size, a diagonal pan from the lower left to the upper right rotates and scales at the same time. The same holds for the object center rotation where the object’s size can simultaneously be altered by changing the distance between finger and object center. On the other hand, changing two properties at the same time could also worsen the usability because it can be hard to perform an action in such a way that only one parameter is altered. For example, performing a horizontal pan without any vertical deviation or a two finger rotation without any change of the distance between the two fingers is almost impossible.
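To make these combination rules concrete, the following minimal sketch illustrates the idea; the class and helper names are hypothetical and not taken from the actual implementation. A touch that starts on the object triggers drag, while a pan beside the object is split into a horizontal share (rotation) and a vertical share (scaling), which is exactly why an isolated single-parameter change is hard to perform.

// Minimal sketch (hypothetical names): dispatching a one-finger gesture and
// splitting a pan into simultaneous rotation and scaling components.
public class GestureDispatchSketch {

    private boolean dragMode;

    // Called when a finger goes down; drag only starts on the object itself,
    // all pan-based methods start beside it.
    public void onFingerDown(boolean touchStartedOnObject) {
        dragMode = touchStartedOnObject;
    }

    // Called while the finger moves; dx/dy are screen-space deltas in pixels.
    public void onFingerMove(float dx, float dy, float screenWidth, float screenHeight,
                             VirtualObject selected) {
        if (dragMode) {
            selected.translateOnFloor(dx, dy);          // drag: follow the finger
        } else {
            // A diagonal pan alters both parameters at the same time.
            float rotation = (dx / screenWidth) * 7f;   // horizontal share -> rotation (radians)
            float scale = 1f - (dy / screenHeight);     // screen y grows downward, so an upward pan enlarges
            selected.rotateY(rotation);
            selected.scaleBy(scale);
        }
    }

    // Hypothetical interface standing in for the selected scene object.
    public interface VirtualObject {
        void translateOnFloor(float dx, float dy);
        void rotateY(float radians);
        void scaleBy(float factor);
    }
}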
3.7. Smartwatch Interaction
The main idea for any other controller that is capable of orientation changes - which is a smartwatch in this case - is to use it as a pointing device that implements one of the previously presented one-finger gestures. Therefore, a ray casting approach is used, which mimics pointing. Since the idea of using a smartwatch is also about hands-free interaction, no additional buttons will be used, only the accelerometer and gyroscope to detect device movement. Details about this approach are presented in chapter 4.
3.8. Concepts Summary
In the scenario of aligning a virtual object, in this case an armchair, the focus lies on the three canonical manipulations: rotation, scaling and translation. Five concepts for rotating, three for scaling and four for translating a virtual object are presented in this chapter. Except for two, they are one-finger gestures that might be transferred for use with a smartwatch that acts as a pointing device. The usability of those concepts on touch and smartwatch is evaluated by a user study that is presented in detail in chapter 5.
4. IMPLEMENTATION
This chapter will explain how the concepts which have been described in the previous sections are implemented in the context of the approach. First, the tools and technologies used for this approach are presented. Afterwards, the design and implementation of the actual smartphone application and the realization of the interaction concepts follows. The design of the smartwatch application, which is used together with the smartphone and for which one concept for each manipulation type is implemented, will conclude this chapter. This implementation is the basis for the user study, the results of which will be presented in chapter 5.
4.1. Tools & Technologies
The mobile application targets the Android operating system and is based on the DroidAR [59] framework. DroidAR is an open source augmented reality framework for Android, released by Bitstars under the GNU GPL v3 license. The framework supports both location-based as well as marker-based augmented reality. Its default rendering implementation is based on the jMonkeyEngine (jME) [60], which makes it possible to quickly test parts of the application on the desktop - as long as no Android-specific functionalities are used. In this approach, DroidAR is used to handle the overall setup of the AR scene, such as the object’s position relative to the user, independent of phone orientation changes.
The jMonkeyEngine is a 3D game engine written in Java and has been released as an open source community project under the new BSD license [60]. Its current stable release is version 3.0.5 [61]. As most gaming engines do, jME uses a scene graph approach as the core concept [62].
A scene graph is a general data structure which orders objects hierarchically and in this way arranges the logical and spatial representation of the graphical scene, which represents the virtual 3D world. Objects in the scene graph are called Spatials. A Spatial has a location, a rotation and a scale and can thus be transformed. There are two types of Spatials in jMonkey: Geometries and Nodes. Geometries represent visible 3D objects with a specified shape and a material (color) which specifies their visual representation. A Node is an invisible object that is used to structure and group other Spatials in the scene graph. The rootNode is the main Spatial and everything attached to it is part of the scene. The scene graph has a tree structure (as shown in Figure 8), thus all objects are in a parent-child relationship. This means that if a parent is transformed, all its children are transformed as well. By doing so, an object consisting of multiple Geometries is easily transformed by only altering the parent Node.
Figure 8: jME Scene Graph [62]
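To illustrate the parent-child behaviour described above, the following minimal jME sketch (a simplified, stand-alone example, not taken from the DroidAR setup used in this thesis) builds a small scene graph in which transforming a parent Node also transforms its attached Geometry.

import com.jme3.app.SimpleApplication;
import com.jme3.material.Material;
import com.jme3.math.ColorRGBA;
import com.jme3.math.Vector3f;
import com.jme3.scene.Geometry;
import com.jme3.scene.Node;
import com.jme3.scene.shape.Box;

// Minimal jME scene graph sketch: a Node groups a Geometry; transforming the
// parent transforms the child as well.
public class SceneGraphSketch extends SimpleApplication {

    @Override
    public void simpleInitApp() {
        // A visible object: a box Geometry with an unshaded material.
        Geometry box = new Geometry("Box", new Box(1, 1, 1));
        Material mat = new Material(assetManager, "Common/MatDefs/Misc/Unshaded.j3md");
        mat.setColor("Color", ColorRGBA.Blue);
        box.setMaterial(mat);

        // An invisible Node used for grouping.
        Node group = new Node("Group");
        group.attachChild(box);
        rootNode.attachChild(group);

        // Transforming the parent Node moves, rotates and scales the child too.
        group.setLocalTranslation(new Vector3f(0, 0, -5));
        group.rotate(0, (float) Math.PI / 4, 0);
        group.setLocalScale(2f);
    }

    public static void main(String[] args) {
        new SceneGraphSketch().start();
    }
}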
The mobile application has been primarily developed on a Nexus 5 [63] and a OnePlus One [64]. Nevertheless, the mobile application can be used on every Android smartphone or tablet that has the basic capabilities to experience augmented reality. This means a reasonably modern mobile processor (quad-core with ~2 GHz), a camera which can be used for augmented reality and orientation sensors for the detection of device movements.
The concrete smartwatch application, which is described in section 4.5.1, is built for Android Wear. Android Wear [65] is a version of Google's Android operating system designed for smartwatches and other wearables. The smartwatch that is used in this case as the external controller for the mobile application is a Samsung Gear Live [66] which could be replaced by any other smartwatch having orientation sensors and running Android Wear.
4.2. Foundations of jMonkey
This section introduces some more jME-specific features besides the scene graph and explains how they are used to enable touch interaction with DroidAR and jMonkey, respectively. Since the DroidAR framework uses the jMonkeyEngine, its base class is the SimpleApplication [67] class which extends the real-time 3D rendering jME application. This class provides access to the standard game engine features like the scene graph, with all its Nodes and Geometries, as well as the first-person camera. Furthermore, it gives access to the update loop where the logic can be implemented.

Application States (referred to as AppStates) [68] are an extension to the jME Application and allow for the control of global game logic by hooking into the update loop. This implementation will use an AppState for tracking user input and the overall interaction state. Furthermore, there are Controls [69], which are customizable jME interfaces that can be attached to objects and allow for the modification of the entities’ behaviors and transformations. Controls will be used to implement the concrete manipulation logic. Since multiple independent controls can be added to an object, each manipulation will be implemented in terms of its own control. In this way, it is possible to freely compose any set of different manipulation concepts.
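The following simplified sketch (illustrative class names and bodies, not the actual classes used in this implementation) shows how these two jME extension points are typically used: an AppState hooks into the global update loop, while a Control attached to a Spatial modifies that object every frame.

import com.jme3.app.Application;
import com.jme3.app.state.AbstractAppState;
import com.jme3.app.state.AppStateManager;
import com.jme3.renderer.RenderManager;
import com.jme3.renderer.ViewPort;
import com.jme3.scene.Spatial;
import com.jme3.scene.control.AbstractControl;

// Sketch of the two jME extension points: an AppState for global logic
// and a Control for per-object behaviour.
public class JmeExtensionPointsSketch {

    // Hooks into the global update loop, e.g. for tracking touch input.
    public static class InputTrackingState extends AbstractAppState {
        @Override
        public void initialize(AppStateManager stateManager, Application app) {
            super.initialize(stateManager, app);
            // one-time setup, e.g. registering input listeners
        }

        @Override
        public void update(float tpf) {
            // called every frame; tpf is the time per frame in seconds
        }
    }

    // Attached to a selected Spatial; stands in for one concrete manipulation.
    public static class SpinControl extends AbstractControl {
        @Override
        protected void controlUpdate(float tpf) {
            // rotate the controlled Spatial slowly around the y-axis
            spatial.rotate(0, tpf, 0);
        }

        @Override
        protected void controlRender(RenderManager rm, ViewPort vp) {
            // no render-specific behaviour needed
        }
    }

    // Several independent Controls can be attached to the same Spatial.
    public static void attachExample(Spatial selected) {
        selected.addControl(new SpinControl());
    }
}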
4.3. Pie Menu
In order to perform different interactions within one scene where multiple modifications shall be possible, a method to change the interaction mode is required. Instead of buttons or a linear menu, a pie menu has been chosen for this purpose.

A pie menu [70] (sometimes also referred to as a radial menu) is a graphical user interface in which the various options are not aligned linearly but placed on the circumference of a circle at equal radial distance from the center. The selection depends on the direction of the pointing input device within its circular context. Nothing is selected within the center area. If the pointer is moved outside of the inactive center, the option closest to the pointer becomes highlighted and then selected. Because of its radial structure, each option has the same distance from the initial center, which decreases the target selection time, reduces seek time and lowers the error rate [71].

Within the scope of this thesis, a pie menu is used to offer a convenient way of changing the interaction. Although it is only used for switching between the rotation, translation and scaling modes, it could also be used for additional commands such as copy, paste, delete or other more complex manipulations.
4.4. Handheld Implementation (Smartphone)
This section covers the design and implementation of the handheld application which uses touchscreen input for realizing interactions.
4.4.1. Core Concept
Since various touch events are required by the application at different places (in each controller), there is a MultiTouchAppState class which extends jMonkey’s AppState class and keeps track of all touch events, such as finger down, finger moved and finger removed from screen. This
class is initialized in the beginning, making it a central place where all those touch events are captured. Besides processing touch events, the AppState is also responsible for object selection.

Figure 9: Android application class diagram
The ScreenState class tracks various 2D points for every single touch event. Among those values are the initial screen position, the current screen position and an index for each touch event. Based on those values, the class provides methods for determining how far a single finger was moved, the initial or current vector between two fingers, as well as the angle described by two fingers or by one finger over time. Furthermore, there is a class for tracking 3D points within the virtual environment, referred to as the WorldState class. This class performs a ray cast for each registered touch event and stores if and where an object was hit by the ray and at which point the “floor” (the invisible x-z plane upon which the virtual objects are placed) was intersected. This class provides methods to estimate the distances and angles between several of those 3D points and the selected object. Initially, after the application has been initialized, no object is selected. Hence, no direct manipulations are possible. Only after the user touches
an object does it become selected and an InteractionControl attached to the object. This control manages which concrete type of manipulation becomes enabled for the selected object. Since each interaction concept is implemented as a control itself and multiple controls can be attached to a Spatial, it is possible to freely combine multiple interaction concepts.
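The object selection itself can follow the usual jME picking pattern; the sketch below is a simplified version (the conversion from touch coordinates and the concrete control classes are only indicated, not taken from the actual code) of how a tap could be turned into a ray and the first hit Geometry selected.

import com.jme3.collision.CollisionResults;
import com.jme3.math.Ray;
import com.jme3.math.Vector2f;
import com.jme3.math.Vector3f;
import com.jme3.renderer.Camera;
import com.jme3.scene.Geometry;
import com.jme3.scene.Node;

// Sketch: ray casting from a screen tap into the scene and selecting
// the first Geometry that the ray intersects.
public final class TapSelectionSketch {

    private TapSelectionSketch() {}

    public static Geometry pick(Camera cam, Node rootNode, float screenX, float screenY) {
        // Un-project the 2D touch position onto the near and far plane
        // to obtain the ray origin and direction in world space.
        Vector3f origin = cam.getWorldCoordinates(new Vector2f(screenX, screenY), 0f);
        Vector3f far = cam.getWorldCoordinates(new Vector2f(screenX, screenY), 1f);
        Vector3f direction = far.subtract(origin).normalizeLocal();

        // Collide the ray against the scene graph.
        CollisionResults results = new CollisionResults();
        rootNode.collideWith(new Ray(origin, direction), results);

        if (results.size() == 0) {
            return null; // nothing was hit
        }
        // The closest collision corresponds to the first object along the ray.
        return results.getClosestCollision().getGeometry();
    }
}

The returned Geometry (or its parent Node) would then receive the InteractionControl described above.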
4.4.2. Interaction Concept Implementation
This section outlines how the concepts presented in chapter 3 are implemented. We assume that the object being manipulated is already selected. Furthermore, there is an invisible floor added below the objects which is required for some interactions, such as translation via ray casting.

Implementation of Translation Concepts
For Drag, we first check whether the initial touch event intersects with the object. If so, a vector from the point at which the floor was initially touched to the current point is estimated, and the selected object moves by this vector. By doing so, the selected object always stays below the user’s finger while moving.

The vector from the object center to the current touch position at which the floor is intersected is used for the Magnet approach. In contrast to the Drag implementation, only a fractional amount of this vector is added to the translation of the selected object. This makes the object move slowly in the direction of touch over time. Furthermore, there is no need to start the interaction by detecting an object intersection.

Place simply sets the point where the floor is currently touched as the object’s new position. With this approach the object behaves similarly to drag when the finger is moved, even if the object was not initially touched.

In this scenario, translation takes place only on the virtual floor. To enable translation along another plane, either the floor plane could be altered or another plane can be added to the WorldState class, so that points on this
plane are also tracked. For example, by adding a vertical plane along the x- and y-axes, upward translation becomes possible.

Implementation of Rotation Concepts
The Fulcrum rotation calculates the angle between the point where the object was initially touched, the current point that is touched and the object center. This angle is then minimized over time to get a smooth and inert movement. A variant of this method additionally moves the object slowly in the direction of touch, so that it feels like pulling a real object that rotates around its center of gravity.

For the Horizontal Pan, only the points on the touchscreen are considered. For the horizontal version, only the x-value of the resulting movement vector is taken and multiplied by a predefined factor to get the rotation angle. This mapping factor is set to 7 radians (≈ 400°, i.e. more than a full rotation) divided by the screen width. In this way, a complete rotation is possible even if the interaction did not start at one edge of the screen.

Object Center and Screen Center rotation are very similar. Both use three points and calculate the angle between them. The first method uses the two points at which the floor was initially and lastly intersected and the object center. The screen center method uses the two corresponding points on the touchscreen and the screen center to calculate the rotation angle.

The Two Finger Gesture uses the vectors that are created by the two fingers at the beginning and at the end of the gesture, and takes the angle between them.
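A condensed sketch of the pan-based rotation mapping described above (a stand-alone helper rather than the actual jME Control used here), using the factor of 7 radians per screen width:

// Sketch: horizontal pan rotation. The horizontal finger movement in pixels
// is mapped to a rotation angle so that a pan across the whole screen
// corresponds to roughly 7 radians (more than one full turn).
public final class HorizontalPanSketch {

    private HorizontalPanSketch() {}

    static final float RADIANS_PER_SCREEN_WIDTH = 7f;

    /** Rotation angle in radians for a pan from initialX to currentX. */
    public static float rotationAngle(float initialX, float currentX, float screenWidth) {
        float dx = currentX - initialX;                        // horizontal movement in pixels
        return dx * (RADIANS_PER_SCREEN_WIDTH / screenWidth);  // map to radians
    }
}

Applying the resulting angle is then a single call on the selected Spatial, e.g. rotating it around the y-axis inside the corresponding Control.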
Implementation of Scaling Concepts
The Lasso method uses the initial floor intersection point and calculates the distance to the object center. If the finger is moved, the ratio of the initial and current distance, i.e. the scaling factor, is calculated and the object is scaled accordingly. Likewise, the scaling factors for Two Finger and Vertical Pan are estimated using the ratio of two distances. The first concept uses the ratio of the distance between the two fingers when the gesture was started and when it ended, while the latter uses the distance that the finger has moved in the vertical direction.
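The scaling concepts all boil down to a ratio between an initial and a current length; the sketch below (illustrative helper methods, not the exact implementation) shows this for the Lasso and the two-finger variant.

// Sketch: ratio-based scaling factors as used by the scaling concepts.
public final class ScalingSketch {

    private ScalingSketch() {}

    /** Lasso: ratio of the current to the initial distance from the object center. */
    public static float lassoFactor(float initialDistanceToCenter, float currentDistanceToCenter) {
        return currentDistanceToCenter / initialDistanceToCenter;
    }

    /** Two finger: ratio of the current to the initial distance between both fingers. */
    public static float pinchFactor(float initialFingerDistance, float currentFingerDistance) {
        return currentFingerDistance / initialFingerDistance;
    }
}

The resulting factor is then applied to the selected Spatial, e.g. via its local scale.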
4.5. Smartwatch Implementation
This section describes how the interaction techniques have been implemented for use with a smartwatch. In contrast to our handheld approach, only a single concept is implemented for each interaction type since it was unclear whether interaction with a smartwatch is possible and if this type of interaction would be accepted by the users. The three methods for rotation, scaling and translation correspond to how users rated the interaction methods for the touchscreen. Although this has not been done yet, each concept which uses the input of only one finger could be realized by a ray casting approach (see section 6.3).
4.5.1. Smartwatch Application
The main task of the wearable app that runs on the smartwatch is to transmit its orientation to the smartphone. Since the smartwatch uses Android Wear [65], a version of Google’s Android operating system, the implementation is very similar to a standard Android application. As wearable and handheld are connected via Bluetooth, Android’s MessageApi [72] can be used to transfer data from the smartwatch to the smartphone and vice versa. Unlike with data items, there is no syncing between the handheld and wearable apps [72]; messages are one-way communication only. While running, the wearable application listens to the accelerometer and the magnetic field sensors and uses their values for calculating the current device orientation. Android also offers a virtual orientation sensor which is synthesized from the physical ones, but it was deprecated in API level 8 due to its many limitations and heavy processing [73].
Instead of using the orientation sensor, Google recommends using the getRotationMatrix() method in conjunction with the getOrientation() method of SensorManager to compute the orientations (listing 1). [73]
float[] mGravity;
float[] mGeomagnetic;

public void onSensorChanged(SensorEvent event) {
    if (event.sensor.getType() == Sensor.TYPE_ACCELEROMETER)
        mGravity = event.values;
    if (event.sensor.getType() == Sensor.TYPE_MAGNETIC_FIELD)
        mGeomagnetic = event.values;
    if (mGravity != null && mGeomagnetic != null) {
        float R[] = new float[9];
        float I[] = new float[9];
        boolean success = SensorManager.getRotationMatrix(R, I, mGravity, mGeomagnetic);
        if (success) {
            float orientation[] = new float[3];
            SensorManager.getOrientation(R, orientation);
            // orientation now contains: azimuth, pitch and roll
        }
    }
}

Listing 1: Listening for sensor changes [JAVA]
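Once the orientation angles have been computed as in listing 1, they can be transmitted to the handheld. The following minimal sketch shows one way to send them via the MessageApi; the path name, payload layout and client handling are illustrative assumptions, not the actual protocol of this implementation.

import java.nio.ByteBuffer;
import java.util.concurrent.TimeUnit;

import com.google.android.gms.common.api.GoogleApiClient;
import com.google.android.gms.wearable.Node;
import com.google.android.gms.wearable.NodeApi;
import com.google.android.gms.wearable.Wearable;

// Sketch: sending the smartwatch orientation (azimuth, pitch, roll) to the
// handheld over the Wearable MessageApi. The GoogleApiClient must already be
// connected (with Wearable.API added), and this must not run on the UI thread.
public final class OrientationSenderSketch {

    private OrientationSenderSketch() {}

    private static final String ORIENTATION_PATH = "/orientation"; // assumed path name

    public static void sendOrientation(GoogleApiClient client, float[] orientation) {
        // Pack azimuth, pitch and roll into a byte array payload (3 floats, 4 bytes each).
        ByteBuffer payload = ByteBuffer.allocate(3 * 4);
        payload.putFloat(orientation[0]).putFloat(orientation[1]).putFloat(orientation[2]);

        // Send the message to every currently connected node (usually the phone).
        NodeApi.GetConnectedNodesResult nodes =
                Wearable.NodeApi.getConnectedNodes(client).await(5, TimeUnit.SECONDS);
        for (Node node : nodes.getNodes()) {
            Wearable.MessageApi.sendMessage(client, node.getId(), ORIENTATION_PATH, payload.array());
        }
    }
}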
Since the raw sensor data is very jittery, a Low Pass Filter [74] is added to smooth the raw accelerometer and magnetic field input. The algorithm (listing 2) requires two numbers to be tracked for each sensor value, i.e. the prior value and the new value. Additionally, a constant ALPHA affects the weight or momentum of new sensor data. A lower alpha (> 0) means more smoothing, while an alpha of 1 means no smoothing at all. The algorithm, taken from [75], is shown below.
static final float ALPHA = 0.05f;

protected float[] lowPass(float[] input, float[] output) {
    if (output == null) return input;
    for (int i = 0; i < input.length; i++) {
        // Blend the new sensor value with the previous output value,
        // weighted by ALPHA, as in the referenced filter [75].
        output[i] = output[i] + ALPHA * (input[i] - output[i]);
    }
    return output;
}