Enhanced Feed-Forward for a User Aware Multi-Touch Device
Georg Freitag, Michael Tränkner, Markus Wacker
Fakultät Informatik und Mathematik, HTW Dresden
Friedrich-List-Platz 1, 01069 Dresden, Germany
freitag, michael.traenkner, [email protected]
ABSTRACT
Common multi-touch devices guide the user with feedback visualization during or after a registered interaction. Feed-forward techniques are less frequently used or not common at all. Our approach aims at a continuous process in which the system is aware of the users before, during, and after an explicit interaction takes place. This opens up the possibility for novel scenarios of user-centered applications. Our setup utilizes Microsoft's depth-camera Kinect to collect the user's posture data in combination with a multi-touch device. This is a low-cost and easy-to-install approach for collecting detailed information about the people and their position in close proximity to a multi-touch table as well as the location of their physical contact. Based on this information, we propose five phases of interaction and analyze the sequence of input during a typical workflow. Eight application concepts show the relevance of these phases using appropriate forms of visualization; we evaluated three of those concepts in a user study.
Author Keywords
Multi-Touch, Kinect, Awareness, Feed-Forward.
ACM Classification Keywords
H.5.2 User Interfaces: Interaction styles, Input devices and strategies
General Terms
Design, Human Factors
MOTIVATION
In recent years, research and development of multi-touch software and hardware have accelerated tremendously. Various sources concentrate on the intuitive and natural interaction of this direct input method [17, 28, 30, 33]. These devices, however, can only respond to touch contact due to the underlying technical principles, when in fact the interaction begins much earlier. Especially limiting are the few ways of delivering necessary feedback to potential users, particularly in public spaces such as museums or shopping malls. These are environments where many users of real-world applications struggle to realize whether
certain screens are just displaying information or are in fact interactive [13]. This problem led to various developments in which multi-touch devices were enhanced by additional sensors to be more aware of the user [1, 5, 8, 19, 31, 35, 36]. Unfortunately, those systems usually employ complex multi-camera setups [19, 35, 36] or other custom solutions [1, 5, 8, 31] with a negative impact on price, portability, or performance. Our goal is to design a responsive and body-aware interface that enhances the joy-of-use and results in a more immersive experience. In this paper, we present an affordable and versatile solution for adding spatial awareness to existing multi-touch displays using Microsoft's Kinect sensor [20]. Furthermore, we develop innovative feed-forward concepts and prototypes for user-aware application scenarios with the potential to intrigue people in public spaces. First, we introduce the structure of our awareness concept and the five separate phases of workflow. Second, we explain the terminology that is used in this paper and the related work. The main section introduces a common application scenario, including eight interaction concepts for the different levels of awareness and the technical configuration of our setup. Subsequently, we conducted a user study, where we asked the subjects to compare our awareness-enhanced prototype to an ordinary multi-touch application in terms of hedonic and pragmatic quality. Finally, we give an overview of our future work.
TERMINOLOGY
Bill Buxton [4] suggested a state-transition model for capturing relevant aspects of human-computer interaction. Compared to the distinctive states of his "A Three-State Model of Graphical Input", we developed a less abstract model with a continuous level of engagement similar to [3, 27]. These more recent publications, which describe the "Three Zones of Engagement" [27] or the "three spaces of activity" [3], classified the interaction with a device into spatial areas. However, we consider the user's engagement not only as a spatial but also as a temporal process (cf. Figure 1). In general, the interaction starts with a person approaching the device. Possibly there will be no interaction at all and the person will leave without any explicit input. However, several people will eventually approach the interface and start thinking about its usage or ways of
interacting. During this observation phase, users will learn from fellow humans who are already interacting with the interface [3, 26]. This can be considered a form of preparation for interacting personally, because the prospective user actually intends to use the device but lacks familiarity or confidence. At some point, the user finally operates the interface, observes the outcome, and then gets ready for the next input. During this process, three different phases can be distinguished: pre-input, input, and post-input (cf. Figure 1).
Figure 2 - Sequence of Feed-Forward and Feedback
An overview of the different phases, which a user-aware multi-touch device is capable of recognizing, is illustrated in Figure 3. Furthermore, it visualizes what kind of posture data is available in each of the phases and how it can be used to improve the user experience.
Figure 1 - Temporal Workflow Diagram
While the pre- and post-period simply embed an interaction, the actual input is defined as the intrinsic activity, e.g. touching the sensitive surface. These three phases constitute an interaction loop during the engagement with the system, which is similar to the "gesture phrase" concept established by Kendon [18]. The interaction loop ends when the user decides to physically leave the interaction area. We propose this general concept to gain a better understanding of any human-computer interaction, especially in public spaces where user attraction is crucial for the success of an interface. Moreover, the concept can be applied to a variety of use-cases. In the scenario we present here, the actual input will be a touch on a multi-touch device and the ambient information of users approaching is supplied by a depth-camera observing the input area. Due to the additional information about the user's posture and relative position to the device during the workflow, a more extensive understanding of the user's intention is possible. In the case of multi-touch development, this supplemental awareness information can improve the interface design of various applications. Not only is it possible to implement feed-forward which "informs a user about the result of their actions" [7] before the task is done, but it also enables various levels of feedback. Usually feedback is considered to be a stream of information returned to the user about the result of a process or activity [34]. However, in this paper we differentiate further between online- and offline-feedback according to the definition given in [16]. While online-feedback is given to the user during the input phase, offline-feedback is only the result of his completed interaction (cf. Figure 2).
The approach-phase starts when people around the device are close enough to be recognized by the depth-sensing camera. The data obtained provides developers with the information about the number of people around the device, their distance, and orientation towards the touch surface. In case the interface cannot communicate its interactivity, the user leaves without an interaction (cf. Figure 1). However, when the user further approaches the multi-touch table, he enters the pre-input phase. Now the user starts perceiving the visual representations on the device and is typically thinking about engaging the touch interface. Because the person is close to the device, we can use posture data to compute hand or head positions. By tracking the hands' positions, we can, for example, estimate where touches will occur and provide feed-forward in these areas, very similar to a hover-state in a mouse-driven graphical user interface. In our concept, the distinction between approach and pre-input phase is rather fuzzy.
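As an illustration of this temporal workflow, the following Python sketch classifies the current phase from the tracked data described above. The thresholds, names, and the simple rule separating pre- and post-input are assumptions chosen for illustration; they are not the values or logic of our prototype.

# Illustrative phase classifier (hypothetical thresholds and names).
APPROACH_MAX_DIST = 2.0    # metres: user recognized by the depth camera
PRE_INPUT_MAX_DIST = 0.8   # metres: user within reach of the surface
HOVER_MAX_HEIGHT = 0.3     # metres: hand close enough for feed-forward

def classify_phase(body_dist, hand_height, active_touches, had_touch_before):
    """Return one of the five phases of the temporal workflow."""
    if body_dist is None or body_dist > APPROACH_MAX_DIST:
        return "leave"        # user lost or out of range
    if active_touches > 0:
        return "input"        # fingers are on the surface
    if (body_dist <= PRE_INPUT_MAX_DIST and hand_height is not None
            and hand_height <= HOVER_MAX_HEIGHT):
        # hands hover above the surface: show shadows, tooltips, previews
        return "post-input" if had_touch_before else "pre-input"
    return "approach"         # recognized, but not yet engaging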
Figure 3 – Awareness Workflow Diagram
The third phase is the actual input of the user’s fingers on the device, which can be further enhanced by using the available posture data. For example, touches can be assigned to each hand which enables the distinction of input from the dominant (DH) and the non-dominant hand (NDH). Further concepts that leverage the posture data will be explained in the application scenario section. Naturally,
the online feedback for touch-input like Ripples [34] can still be provided by the interface. Once the user lifts his fingers from the surface, he enters the post-input phase. In this fourth step, offline-feedback is provided to visualize the successful or unsuccessful action. During a session, a user will typically interact with the device again and therefore transition from phase 4 (post-input) to phase 2 (pre-input), creating the previously described interaction loop. This loop allows the interface to give continuous feedback after a touch as well as feed-forward before the next input occurs. Once the user has completed his tasks at the multi-touch table, he will eventually leave the interaction area. This phase permits the software to properly terminate the user's session by cleaning up the interface and returning to a ready-state for the next approaching person. Overall, this taxonomy for user-aware multi-touch devices aims at a more meaningful interaction complementing ordinary touch-input. Through novel concepts of feed-forward and feedback we encourage the usage of the device and attract users even before their first physical contact with the system.
RELATED WORK
There have been numerous efforts to make multi-touch devices more aware of the user. We focus on those publications that try to improve the user experience, especially through novel concepts of feed-forward and feedback.
In the highly relevant work of Medusa [1], a Microsoft Surface has been equipped with 138 proximity sensors. This extensive modification allowed for features like tracking the position of different users around the table. The Medusa-Table with its enhanced awareness distinguishes between DH and NDH and also permits so-called pre-touch functionality (e.g. hover-state). However, for prototyping feed-forward, feedback, and user-tracking techniques on our touch device, the complexity and installation effort of the Medusa-solution would have been excessive. The general idea of Medusa is also implemented with only 12 sensors in [30], but the authors focus on the technical details. Another relevant publication for our approach is the Continuous Interaction Space [19], in which the possibilities of interacting above a multi-touch device are reviewed. This approach combines gestures that start as a touch on the device and end up in the space above the device or vice versa. The user has to wear gloves so that his hands can be constantly tracked by a motion capture system with eight cameras. The main focus of this work is touchless interaction, which is less relevant for this paper. However, various implemented feed-forward techniques are of interest. In [6], Microsoft's Kinect is used for touch detection on an arbitrary surface. To get an overview of the user during the interaction process, the depth camera is located above the surface where it detects objects and hands. The research focused on comparing the accuracy of the input using a
depth camera and a capacitive multi-touch display. This work explores the feasibility of emulating touch-input with a Kinect, in contrast to our novel interaction concepts which originate from the combination of both technologies. Similar to this approach, the work of [11] uses the Kinect camera located on the shoulder of the user. In combination with a handheld projector, multi-touch input is realized on any surface, e.g. walls, pads, arms, or hands. A prior investigation of the use of depth-cameras to facilitate touch interaction was conducted in [35]. In their setup, multiple projectors display the content on arbitrary surfaces while several depth-cameras allow for users' input recognition after the system has been calibrated. This person-aware environment allows interaction in the space between surfaces and influenced some of our ideas for multi-user concepts. However, the work neither focuses on hand-distinction nor does it utilize data about approaching users. An interesting approach regarding multi-user scenarios for multi-touch devices is DiamondTouch [5]. With their unique system they achieved user distinction for each touch on the device. In addition to this functionality, our approach is also able to track various limb positions and compute their proximity to the input area. All of the discussed approaches enable spatial awareness for multi-touch devices. However, little research has tried to utilize this additional information for an improved user-interface. One of the few examples for feed-forward on a multi-touch device is TouchCuts and TouchZoom [36]. With a motion capture system consisting of four cameras and a special glove, the user's hand-position in front of a touch-surface is continuously determined. In their work, the information about an approaching finger is used to increase icon sizes for easier targeting, which results in fewer selection errors. Especially the TouchZoom-scenario in [36] is easy to replicate with our single depth-camera approach. Although our solution may lack comparable precision, we believe the key advantage is the less restraining and simpler setup. However, all the presented work enables spatial awareness in a multi-touch environment and shows the potential of a more user-focused touch-interface. We want to leverage this supplementary information to deliver novel ways of feed-forward and feedback.
APPLICATION SCENARIOS
A multi-touch device enhanced with spatial awareness through additional sensors enables various new use-cases. To illustrate some of our concepts, we implemented them in a basic single-user scenario. For implementation purposes, we chose Liquid, our own custom-built ActionScript 3-based framework for developing multi-touch applications [9]. Due to our access to the code base, we could easily extend its features and prototype user-aware applications. We chose an annotatable and interactive map application (i.e., Google Maps) to demonstrate our interaction phases. Here, the user could pan and zoom to navigate the map. Further, the map was extended by an
annotation tool that enables users to draw free-forms on a separate layer above the map. As a result, users could easily annotate locations of interest. Different drawing colors and map tools (e.g., map view or zoom lock) were included in a side menu. The application itself was installed on a multi-touch device and users could create a personal account for their individual settings. Such a user can approach the device, start interacting, and leave after a while, as shown in Figure 1. In the following sections, we present different concepts which could be partially or completely implemented in our prototype and might be relevant for applications in public spaces.
Awareness concepts
Due to the ability to sense the general presence of the users during all five phases of the previously described workflow (cf. Figure 3), the software can already react to an approaching person before the first touch. Hence, our first concept is the visualized awareness of the users' presence during their approach. In our application, we want to use this information for a first response to further attract the users' attention, especially if they enter the awareness area at random. By communicating that the device is aware of their presence, e.g., through animations aligned towards the user, the device already conveys a sense of interactivity. In our scenario, the software returns from an idle state and starts fading in the workspace for the user, depending on the distance of his body (body-awareness) from the screen. The user can log into an account of our application via the on-screen keyboard for a personal workspace over multiple sessions. Once logged in, all user preferences are available and the software restores the state of the last application. This data includes, for example, the coordinates and zoom-level of the map application or already annotated locations. Furthermore, the software is now aware of the user settings previously configured, such as personal work-space and handedness. Our second concept is body-aware UI-elements, where the application can adjust the position and the alignment of a menu or content not only with respect to the borders of the screen (like in traditional graphical user interfaces) but with respect to body-, head-, and hand-positions of the user. These body-aware UI-elements deliver a unique spatial feeling to the user by constantly conveying a sense of where the components, relative to the person, will be. In addition to our second concept, we propose the visualization of hands and arms as a discrete outline or shape in the sense of an accompanying shadow, which is partially explored by Marquardt et al. [19]. This approach is an example of the 'naïve physics' explained by Jacob et al. [14] and implements real-world behavior for UI-components in order to create a more familiar feel to natural interfaces. We visualize darker digital shadows when the users' hands get closer to the surface (hand-awareness) to emulate a correct physical behavior.
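As a minimal sketch of this hand-awareness shadow, the following Python snippet maps the hand's height above the surface to a shadow opacity; the 0.3 m range and the alpha bounds are illustrative assumptions, not the parameters of our implementation. The distance-dependent fading of the workspace during the approach can be realized with an analogous mapping of the body distance.

# Illustrative mapping of hand height to shadow opacity (assumed parameters).
def shadow_alpha(hand_height_m, max_height_m=0.3, min_alpha=0.1, max_alpha=0.8):
    """Darken the digital hand shadow as the hand approaches the surface."""
    if hand_height_m >= max_height_m:
        return min_alpha                     # far away: barely visible
    if hand_height_m <= 0.0:
        return max_alpha                     # touching: darkest shadow
    t = 1.0 - hand_height_m / max_height_m   # 0 at the limit, 1 at the surface
    return min_alpha + t * (max_alpha - min_alpha)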
After the approach-phase (cf. Figure 1), the user starts an interaction loop until he leaves the multi-touch device. All of these concepts that are encompassed by the interaction loop are explained in the following subsection. Spatial awareness can furthermore be leveraged to build an adaptive interface according to the number of people. Using the posture data of various users to divide and distribute the available screen-space on large devices enhances the experience of all users, which is our third concept – the user adapted workspaces. On smaller multi-touch tables, this approach corresponds to different layers of the map, where every user annotates only on his own overlay, which turns transparent as he leaves. Furthermore, the increasing distance between the person and the device allows the software to remind the user of possible actions such as saving his progress. Completely losing the tracked person enables applications to react properly to the absence by closing the session of the previous user, saving critical data, and resetting the interface without the usual time-out. This functionality is especially relevant in a multi-user environment where different work-spaces clutter up the real estate on the screen. This also holds true for public spaces, where a fixed temporal threshold for an immediate clean-up of the interface is hard to define [13]. In addition to the information that is captured by our setup, the system also distinguishes between the left and right hand, or NDH and DH, respectively. Hand-distinction is our fourth concept. Given that our application scenario consists of two different actions (navigating and annotating the map), we assigned one task to the input of each hand. During the interaction process, the DH and the NDH fulfill different tasks depending on their accuracy level. In this process, the NDH often supports the DH during the interaction [25]. Such support is possible if a task can be divided into parts that need more and less attention from the user. Consequently, the part which requires more attention will be performed by the DH, while the other part is done by the NDH. In our scenario, the navigation task is the basic condition that has to be fulfilled before a location can be annotated. Furthermore, the movement and zooming operations on a map require less accuracy, while annotating a defined location is more complex. To this end, we assign the navigation to the NDH and the drawing task to the DH.
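A minimal sketch of this touch-to-hand assignment could look as follows (Python). The distance threshold, the handedness default, and the returned task labels are hypothetical and only illustrate the routing of navigation and annotation to the NDH and DH.

import math

def nearest_hand(touch_xy, hands, max_dist=0.15):
    """hands: e.g. {'left': (x, y), 'right': (x, y)} in screen coordinates."""
    best, best_d = None, max_dist
    for label, (hx, hy) in hands.items():
        d = math.hypot(touch_xy[0] - hx, touch_xy[1] - hy)
        if d < best_d:
            best, best_d = label, d
    return best                 # None if no tracked hand is close enough

def route_touch(touch_xy, hands, dominant="right"):
    """Assign a touch to the DH or NDH task (annotate vs. navigate)."""
    hand = nearest_hand(touch_xy, hands)
    if hand is None:
        return "ignore"
    return "annotate" if hand == dominant else "navigate"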
Feed-forward concepts
After completing the approach-phase, the user enters the pre-input phase if he intends to interact with the application. For this, the user needs a basic understanding of what kind of service the application offers and how it can be triggered. To achieve this, the user may be guided by already known feed-forward techniques, like tooltips. Tooltips usually appear below the cursor when a user hovers over a UI-component. The shadow tooltip, which is our fifth concept, is now possible in our touch-interface because the device is aware of the user's hovering hands. This kind of feed-forward
instructs the user and explains the functionality of an interface element by presenting a set of gesturecons [10] or showing a short animation. In our scenario, the interactive map shows which options are available to navigate the map. For example, the user can either drag the map with one finger or zoom into it with two or more fingers of the NDH. To illustrate that only the NDH has the ability to navigate, an interaction preview should be used, which is our sixth concept. In contrast to tooltips, which guide users, previews create an impression of what will happen if the user interacts with a component, like pressing a button or choosing an element. In the case of hand distinction, each hand is assigned an interactive shadow tooltip. This shadow visualizes the type of hand (DH or NDH) and its assigned function, like navigating or highlighting (cf. Figure 4).
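The hover-triggered shadow tooltip could be sketched as follows (Python), assuming hypothetical component bounds in the normalized screen coordinates and an illustrative hover threshold; the actual prototype may trigger its previews differently.

# Hypothetical component bounds in the normalized screen coordinate system.
class Component:
    def __init__(self, x, y, w, h, preview):
        self.x, self.y, self.w, self.h = x, y, w, h
        self.preview = preview   # e.g. a gesturecon or a short animation

def hovered_component(hand, components, hover_height=0.2):
    """hand: (x, y, z); return the component whose tooltip should be shown."""
    x, y, z = hand
    if z > hover_height:
        return None              # hand too far above the surface
    for c in components:
        if c.x <= x <= c.x + c.w and c.y <= y <= c.y + c.h:
            return c             # show c.preview as shadow tooltip / preview
    return None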
Figure 4 - Unique Hand Shadow and Tooltips
Another kind of feed-forward focuses on the UI-components and their adaptation during the interaction process. This adaptation becomes necessary since the fat-finger problem is well known for multi-touch devices [15, 34]. To avoid the problem of missing or double-hitting one target as well as selecting multiple components at the same time, designers create larger elements that cover a bigger part of the user-interface. In our seventh concept, the distance-aware menu, we provide an approach to rescale user-interface components. Depending on the distance of the user's hand, they scale up or down accordingly, similar to the work of [36]. A concrete example is our implementation of a sensitive menu that contains multiple views or colors (cf. Figure 5). Additionally, continuous user tracking allows for user distinction, which enables developers to implement access restriction in multi-user scenarios. Hence, user restriction is our eighth concept that can be implemented in the map-drawing application. Each user can be restricted to only annotate or navigate on his map. To visualize that the user has no permission to interact in another person's workspace, the hand-shadows could disappear, change color, or show a certain icon. This way the software could provide feed-forward in a subtle way for novice users and aid in the process of exploring the interface.
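A sketch of the distance-dependent scaling of the menu, assuming an illustrative activation radius and scale bounds rather than the parameters of our actual menu, could look as follows (Python):

import math

def menu_scale(hand_xy, menu_center, activation_radius=0.25,
               min_scale=1.0, max_scale=2.0):
    """Grow the menu as the hand approaches it, shrink it again otherwise."""
    d = math.dist(hand_xy, menu_center)
    if d >= activation_radius:
        return min_scale                     # hand far away: compact menu
    t = 1.0 - d / activation_radius          # 0 at the edge, 1 at the center
    return min_scale + t * (max_scale - min_scale)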
Figure 5 - Distance Aware Menu; Top left: The user freely annotates the map; Top right: To change the color, he moves his hand towards the color menu which then enlarges; Bottom left: Selecting a color button; Bottom right: The new color will be visualized by the assigned hand shadow as the color menu minimizes.
TECHNICAL CONFIGURATION
In our laboratory, we have set up a depth-sensing Microsoft Kinect camera slightly above a 22” 3M multi-touch screen [21]. The minimum distance which the Kinect can measure is around 0.6 m and the instruction manual states the intended distance for a full-body view is 1.8 m for a single user and 2.5 m for multiple users. The maximum distance which the depth-camera can detect is up to 6 m with increasing error margins. The area where the users will be standing and the full touch-screen should be completely visible to the sensor. For tracking the interacting person, we used the C/C++-based OpenNI Framework [24] which detects the user and computes the skeleton in the depth-image. For our software to know where the touch screen is located in the room, we implemented a very quick and easy calibration. To identify the input area, just three corners of the touch screen have to be calibrated by placing a hand at the respective position and pressing a key. From this data we compute the normal and the dimensions of the screen, which allows our system to work with inclined touch-devices as well. We defined the origin of the right-handed coordinate system at the lower-left corner of the device with the screen surface lying in the xy-plane (cf. Figure 6). For more universal use of our system, even with very large displays, we assigned the (1, 1, 0) position to the upper-right corner and work with resolution- and size-independent coordinates. This allows for an easy check of the x- and y-values to determine whether the hand is somewhere above
the screen. Additionally, the hand-distance to the display is equal to the value of the z-axis whenever the hand hovers above the interaction area. We send the computed x, y and z-values via the TUIO-protocol [32] to our application, which allows for easy prototyping and a versatile usage of the data in any programming language or environment. Due to the limited precision of the Kinect sensor and the underlying software, our system only provides a vague estimate of position rather than pixel-correct measurements. Nonetheless, with an exact calibration we found our approach to perform well at around 1-2 cm deviation which is sufficient for most of our use-cases.
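The following Python sketch illustrates the calibration and coordinate mapping described above, assuming the lower-left, lower-right, and upper-left corners are the three calibrated points; the function names and the use of NumPy are assumptions for illustration, not our actual implementation.

import numpy as np

def calibrate(lower_left, lower_right, upper_left):
    """All inputs: 3D points in the camera/world coordinate system (metres)."""
    origin = np.asarray(lower_left, dtype=float)
    x_axis = np.asarray(lower_right, dtype=float) - origin  # along the screen width
    y_axis = np.asarray(upper_left, dtype=float) - origin   # along the screen height
    normal = np.cross(x_axis, y_axis)                       # screen normal (z-direction)
    normal /= np.linalg.norm(normal)
    return origin, x_axis, y_axis, normal

def to_screen_coords(point, origin, x_axis, y_axis, normal):
    """Map a tracked hand position to resolution-independent screen coordinates.

    Returns (x, y) in [0, 1] across the screen, with (0, 0) at the lower-left
    and (1, 1) at the upper-right corner, and z as the distance to the plane.
    """
    v = np.asarray(point, dtype=float) - origin
    x = np.dot(v, x_axis) / np.dot(x_axis, x_axis)
    y = np.dot(v, y_axis) / np.dot(y_axis, y_axis)
    z = np.dot(v, normal)
    return x, y, z

The resulting values can then be forwarded, e.g. via TUIO as described above, to whatever application framework consumes the posture data.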
USER EVALUATION
The goal of our evaluation was to gain insight into the user experience with user-aware multi-touch software and to assess the potential benefits of this approach for future applications. This study was not intended to measure performance or error rate; instead, the users had time to focus on the look and feel as well as the joy of use during their interaction. For the evaluation, we implemented three of our core application concepts (hand shadow, distance-aware menu and hand-distinction) explained earlier in this paper. We chose these because our application is a single-user scenario tested in a non-public environment with briefed users. However, concepts like body-aware UI-elements, user restriction, or user adapted workspace are more viable in a multi-user scenario, while interaction preview in combination with hand shadows may lead to a cluttered interface. In the awareness-enabled prototype (AEP), touch-input from the user's DH annotated the map and opened the menu for color selection. The touch-input from his NDH was used for dragging and zooming the map. Both hands were visualized by a dark shadow overlying the map and the current annotation color was shown by a glow around the DH's shadow (cf. Figure 7).
Figure 6 - Technical setup with Microsoft’s Kinect and a multi-touch device
However, there are some limitations to this setup. First of all, our depth-camera is placed on one side of the multi-touch device, which means there is a specific side from which the users should approach. Secondly, the Kinect sensor projects an infrared pattern and computes the depth image through the distortion of this light-pattern. This approach could interfere with some infrared light-based multi-touch solutions (e.g., FTIR, DI, or LLP setups [22]). Finally, some occlusion of the users' body parts (mainly hips and legs) can occur in the Kinect's field of view because of the touch device. Thus, the algorithm fitting a skeleton in the depth-image experiences some trouble, resulting in slightly increased computation time for a correct fit. We experienced reliable depth tracking and correct user recognition at approximately 4 m and consider multi-touch tables wider than 3 m as too large for our approach. Another inconvenience in our current setup is the need for an initial recognition gesture for the skeletal fitting. Users are required to perform the so-called PSI pose, a stance where the elbow and the shoulder joints are angled at 90°, which currently prevents a natural approach to our system. The initial gesture is disregarded in the following sections since there is no need for such a pose in the official Microsoft Kinect SDK beta. However, we found the skeletal reconstruction was more stable with the OpenNI Framework, which was more important during our testing phase. Furthermore, the OpenNI framework is currently capable of tracking two skeletons while being aware of two more observing users in the background.
Figure 7 - AEP with extended color menu
For a comparative evaluation, we created a second application without our awareness concepts and using traditional interaction techniques. In this more conventional multi-touch prototype (MTP), the user toggled between the navigation and annotation mode with the help of two additional buttons. The menu for color selection was visible at all times (cf. Figure 8). We asked 11 subjects (2 of them left-handed, age 18-30, students) to participate in our user study and to compare both prototypes despite their very similar functionality, look, and interface. Every user was asked to navigate to the same five common locations on a map and to highlight them in a specific color with each application. All the subjects were familiar with the desktop use of the underlying Google Maps but had no prior experience with any feed-forward concepts or multi-touch devices except for smartphones. After a short introduction and initial testing of the applications, our users answered a short questionnaire for
both prototypes to rate the implemented application concepts independently regarding their pragmatic as well as their hedonic quality. The pragmatic quality describes the utility and usability of a product concerning a given task [12]. It further represents the effectiveness and efficiency of a goal-oriented interaction. For users, however, there is more to a product than just finishing their task; this is captured by the hedonic quality, which refers to more general attributes such as novelty, personal growth, and stimulation. These truly engaging aspects of an application deliver a feeling of enthusiasm and can avoid boredom. In the end, we asked the subjects to vote for their preferred application and to write about their personal experiences and impressions in short sentences.
Figure 8 - MTP with toggle button and permanently visible menu
Results
With our questionnaire, we received positive feedback about the appeal and the novel nature of the hand-distinction and hand-shadow features. However, the side-menu only scored average in the hedonic qualities (cf. Figure 9).
Figure 9 - Results of the three application concepts
In contrast to the AEP, the more conventional multi-touch prototype scored best for the overall pragmatic quality. The users described it as "straightforward, clearly structured, presentable, good" and "practical". Common feedback in the written comments section of the questionnaire for the MTP were statements like "easy to use" and "somehow familiar". However, a few users thought switching between navigation- and annotation-mode was "slowing down the workflow", "time-consuming" and "annoying". Regarding the pragmatic quality of our new awareness concepts, the side-menu performed extremely well in terms of comprehensibility, relevance, and ease of use. The rating for the hand-shadow feature was about average, in contrast to the hand-distinction, which ranked slightly below
average. Comments regarding the specific awareness concepts were mixed; especially the DH/NDH-distinction received extremely varying feedback. On the one hand, users were skeptical because of its unusual concept and the occasional error assigning touch-input to the wrong hand due to Kinect lag. On the other hand, users valued the idea as "meaningful", "parallel usable" and discussed a "potential speed gain over time" when using the application for an extended duration. Either way, almost every user stated that "it takes some time to get used to the hand-distinction feature".
Interpretation
The prototype with implemented awareness concepts scored a higher hedonic quality. We consider the novel arrangement of technologies and the unprecedented concept to be the main factors for this outcome. The combination of multiple input methods supplementing each other might deliver a more natural feel during the interaction. We further believe that the AEP was a rather playful approach to such a target-oriented task, which led to its lower pragmatic rating. Furthermore, the uncommon concept of hand-distinction is rarely encountered in everyday use, e.g., on the smartphones with which our users were familiar. The lack of experience with two-handed interaction and the serial nature of the task, which can be easily achieved with a single hand, are further reasons for the low pragmatic quality of this approach. There were some hard- and software issues, like the initial recognition pose (cf. the technical configuration section) and the occasional instability of the OpenNI tracking algorithm, that could have had a negative impact on the outcome. In some cases, we observed a very unnatural posture of the users, who hovered with their hands approximately 20 cm above the screen while waiting for the next target or thinking about a location before interacting. We assume this phenomenon stems from an uncertainty of the user who does not know where to place his inactive hand. After the user study, we strongly consider switching the tasks assigned to each hand, because navigating the map with the NDH turned out to be the main issue. This idea emerged during observation, as the overwhelming majority of subjects navigated with their DH while using the MTP. The lower hedonic yet higher pragmatic ranking for the side-menu concept shows the users' familiarity with similar features, like the mouse-over state in everyday desktop and web usage. All in all, we think our user study helped evaluate the impact of our different awareness concepts for multi-touch applications.
CONCLUSION
We presented a concept for an enhanced multi-touch device, where extensive user-awareness is enabled via a depth-camera. Our goal with this additional information about the presence of different users is to build more direct and appealing interfaces. In contrast to related work and
previous expensive approaches, our solution offers a cheap and quick way to extend existing touch-devices (e.g., UbiTable [29]) with additional functionality such as perceiving users before they interact with the device. Beyond that, our technique delivers more relevant posture data like the head or hand positions of different users. This information can be used by developers and interface designers to integrate feed-forward and enhanced feedback in their multi-user and multi-touch applications. Furthermore, for a fundamental comprehension of a temporal interaction process, we propose a taxonomy with five phases. It focuses not only on the presented interaction loop, where the user's input is considered, but also on the process of approaching and leaving. Besides the well-known visualization forms of feed-forward and feedback, we introduce a novel aspect of device-awareness, which includes the process of approach and leave. Additionally, we summarize and structure these forms into a temporal workflow with defined person-, body-, and touch-information. For a first overview of what is possible with such an aware input device, we discussed and prototyped various scenarios based on posture data gathered by our setup (cf. Table 1). With our user study, we were able to evaluate the pragmatic and the hedonic quality of three of our concepts in comparison to a reference application.

Awareness                   Feed-Forward
User Presence               Shadow Tooltip
Body-Aware UI-Elements      Interaction Preview
User Adapted Workspace      Distance Aware Menu
Hand Distinction            User Restriction
Table 1 – Overview of the Application Concepts

FUTURE WORK
We see various ways to refine our approach and the presented concepts. Considering the commercial success of the Kinect, we expect further efforts by Microsoft to release a successor device that is possibly more capable and accurate. Additional potential can be unlocked by the ever-advancing software development and the official drivers from Microsoft as well as the Kinect SDK. Especially increasing the number of tracked users and skeleton reliability could help to leverage our concepts in everyday scenarios. But even the current Kinect device can be extended, e.g., with a Range Reduction Lens [23], which reduces tracking distances by up to 40%, thus increasing the viability of our idea for a broader range of setups. This could, for example, allow us to use larger multi-touch devices because of the wider field of view. Another approach is the parallel usage of more than one Kinect at the same time [2]. We expect that such a setup would extend the size and the field of view of our awareness area. Beyond that, we want to continue the work on our prototype and reach a level of reliability and robustness for further evaluation. Based on the findings of our first user study, we want to improve the algorithm that assigns the touch-input to the corresponding hand, where an
error can confuse the user tremendously. The stability of this feature is critical for all future DH/NDH-concepts and the input-lag of the Kinect complicates this issue even more. We also believe a less restraining hand-distinction could improve the pragmatic quality of our prototype. For a future implementation, we think the NDH could merely switch the mode for the DH. We hope to proceed with our research and implement more ideas as we are moving towards a completely aware environment similar to [35]. One advantage would be the user-distinction over multiple devices so that the session can follow the person through the room without any additional login or configuration of personal settings. Especially the combination of stationary and mobile devices in such an environment opens up various possibilities (e.g., "Magic Lenses" [19]) as the device in the user's hand can become a tool for interacting on the large surface. Since we track the person in the room, we can estimate the position of their mobile device and compute its spatial relation to the multi-touch table. Therefore, separate menus could be displayed on different devices as a new way of using the interface (cf. the application scenarios section), or concealed data could be shown in an extra layer. We would like to investigate how humans perform two-handed tasks where one action needs the precision of a pen and which advantages stem from enhanced awareness in such a scenario. Not only pens or mobile devices are relevant as tools; the use of tangible objects in general holds further potential for our system. In the long run, we hope that our current setup and techniques inspire other developers and researchers of multi-touch applications to accommodate novice users with additional spatial awareness, feed-forward, and feedback.
REFERENCES
1. Annett, M.; Grossman, T.; Wigdor, D., and Fitzmaurice, G. Medusa: a proximity-aware multi-touch tabletop. Proc. UIST 2011, ACM (2011), 337-346.
2. Berger, K.; Ruhl, K.; Brümmer, C.; Schröder, Y.; Scholz, A., and Magnor, M. Markerless Motion Capture using multiple Color-Depth Sensors. Proc. VMV 2011, 317-324.
3. Brignull, H. and Rogers, Y. Enticing people to interact with large public displays in public spaces. Proc. INTERACT 2003, IOS (2003), 17-24.
4. Buxton, W. A three-state model of graphical input. Proc. INTERACT '90, North-Holland Publishing, 449-456.
5. Dietz, P. and Leigh, D. DiamondTouch: a multi-user touch technology. Proc. UIST 2001, ACM (2001), 219-226.
6. Dippon, A. and Klinker, G. KinectTouch: accuracy test for a very low-cost 2.5D multitouch tracking system. Proc. ITS 2011, ACM (2011), 49-52.
7. Djajadiningrat, T.; Overbeeke, K., and Wensveen, S. But how, Donald, tell us how?: on the creation of meaning in interaction design through feedforward and inherent feedback. Proc. DIS 2002, ACM (2002), 285-291.
8. Dohse, K.C.; Dohse, T.; Still, J.D., and Parkhurst, D.J. Enhancing Multi-user Interaction with Multi-touch Tabletop Displays Using Hand Tracking. Proc. ACHI 2008, IEEE (2008), 297-302.
9. Freitag, G.; Kammer, D.; Tränkner, M.; Wacker, M., and Groh, R. Liquid: Library for Interactive User Interface Development. Mensch & Computer 2011, Oldenbourg Verlag (2011), 202-210.
10. Gesturecons - http://gesturecons.com/
11. Harrison, C.; Benko, H., and Wilson, A.D. OmniTouch: wearable multitouch interaction everywhere. Proc. UIST 2011, ACM (2011), 441-450.
12. Hassenzahl, M.; Schöbel, M., and Trautmann, T. How motivational orientation influences the evaluation and choice of hedonic and pragmatic interactive products: The role of regulatory focus. Interacting with Computers, Volume 20 (2008), 473-479.
13. Hornecker, E. "I don't understand it either, but it is cool" - visitor interactions with a multi-touch table in a museum. Proc. TABLETOP 2008, IEEE (2008), 113-120.
14. Jacob, R.J.K.; Girouard, A.; Hirshfield, L.M.; Horn, M.S.; Shaer, O.; Solovey, E.T., and Zigelbaum, J. Reality-based interaction: a framework for post-WIMP interfaces. Proc. CHI 2008, ACM (2008), 201-210.
15. Käser, D.P.; Agrawala, M., and Pauly, M. FingerGlass: efficient multiscale interaction on multitouch screens. Proc. CHI 2011, ACM (2011), 1601-1610.
16. Kammer, D.; Freitag, G.; Keck, M., and Wacker, M. Taxonomy and Overview of Multi-touch Frameworks: Architecture, Scope and Features. Proc. EICS 2010, ACM (2010).
17. Kazi, R.H.; Chua, K.C.; Zhao, S.; Davis, R., and Low, K. SandCanvas: a multi-touch art medium inspired by sand animation. Proc. CHI 2011, ACM (2011), 1283-1292.
18. Kendon, A. Current issues in the study of gesture. The Biological Foundations of Gestures: Motor and Semiotic Aspects, 1986, 23-47.
19. Marquardt, N.; Jota, R.; Greenberg, S., and Jorge, J.A. The continuous interaction space: interaction techniques unifying touch and gesture on and above a digital surface. Proc. INTERACT 2011, Springer Verlag (2011), 461-476.
20. Microsoft Corp., Redmond WA. Kinect, Xbox 360 - http://www.xbox.com/de-de/kinect
21. Multi-Touch Display 3M - http://solutions.3m.com/wps/portal/3M/en_US/TouchSystems/TouchScreen/Solutions/MultiTouch/M2256PW/
22. Multi-Touch Technologies - http://nuigroup.com/
23. Nyko - Kinect Zoom Kit - http://nyko.com/products/product-detail/?name=Zoom
24. OpenNI - http://openni.org/
25. Owen, R.; Kurtenbach, G.; Fitzmaurice, G.; Baudel, T., and Buxton, B. When it gets more difficult, use both hands: exploring bimanual curve manipulation. Proc. GI 2005, Canadian HCC Society, 17-24.
26. Peltonen, P.; Kurvinen, E.; Salovaara, A.; Jacucci, G.; Ilmonen, T.; Evans, J.; Oulasvirta, A., and Saarikko, P. It's Mine, Don't Touch!: interactions at a large multi-touch display in a city centre. Proc. CHI 2008, ACM (2008), 1285-1294.
27. Saffer, D. Designing Gestural Interfaces. O'Reilly Media, 2008.
28. Schöning, J.; Steinicke, F.; Krüger, A., and Hinrichs, K. Poster: Interscopic multi-touch surfaces: Using bimanual interaction for intuitive manipulation of spatial data. Proc. 3DUI 2009, IEEE Computer Society, 127-128.
29. Shen, C.; Everitt, K.M., and Ryall, K. UbiTable: Impromptu Face-to-Face Collaboration on Horizontal Interactive Surfaces. Proc. UbiComp 2003, ACM (2003), 281-288.
30. Sugimoto, M.; Fujita, T.; Mi, H., and Krzywinski, A. RoboTable2: a novel programming environment using physical robots on a tabletop platform. Proc. ACE 2011, ACM (2011).
31. Tanase, C.A.; Vatavu, R.; Pentiuc, S., and Graur, A. Detecting and Tracking Multiple Users in the Proximity of Interactive Tabletops. Advances in Electrical and Computer Engineering, 8, 2 (2008), 61-64.
32. TUIO Specification - http://www.tuio.org
33. White, J. Multi-Touch Interfaces and Map Navigation. http://minds.wisconsin.edu/handle/1793/47065.
34. Wigdor, D.; Williams, S.; Cronin, M.; Levy, R.; White, K.; Mazeev, M., and Benko, H. Ripples: utilizing per-contact visualizations to improve user interaction with touch displays. Proc. UIST 2009, ACM (2009), 3-12.
35. Wilson, A.D. and Benko, H. Combining multiple depth cameras and projectors for interactions on, above and between surfaces. Proc. UIST 2010, ACM (2010), 273-282.
36. Yang, X.; Grossman, T.; Irani, P., and Fitzmaurice, G. TouchCuts and TouchZoom: enhanced target selection for touch displays using finger proximity sensing. Proc. CHI 2011, ACM (2011), 2585-2594.