Introducing TANGerINE: A Tangible Interactive Natural Environment

Stefano Baraldi, Alberto Del Bimbo, Lea Landucci, Nicola Torpei
Omar Cafini, Elisabetta Farella, Augusto Pieracci, Luca Benini
MICC DSI - University of Florence viale Morgagni, 65 Florence 50134, Italy +39 055 4237409
Micrel Lab @ DEIS - University of Bologna viale Risorgimento, 2 Bologna 40136, Italy +39 051 2093787
[stefano.baraldi; lea.landucci; nicola.torpei]@unifi.it,
[email protected]
[ocafini; efarella; apieracci; lbenini]@deis.unibo.it
ABSTRACT
In this paper we describe TANGerINE, a tangible tabletop environment in which users interact with digital contents by manipulating tangible smart objects. Such objects provide continuous data about their status through embedded wireless sensors, while an overhead computer vision module tracks their position and orientation. By merging the sensing data, the system is able to detect a richer set of gestures and manipulations both on the tabletop and in its surroundings, enabling a more expressive interaction language across different contexts.
Categories and Subject Descriptors H.5.2 [User Interfaces]: Evaluation/methodology, Graphical user interfaces (GUI), Input devices and strategies, Interaction styles, Prototyping, User-centered design.
General Terms Algorithms, Design, Experimentation, Human Factors, Theory.
Keywords HCI, tabletop, natural interaction, TUI, wireless sensor node, smart object.
1. INTRODUCTION
Research on natural interfaces has exploited results from pattern recognition, Computer Vision (CV) and speech understanding [6] in order to develop interfaces that are easy to use in everyday life. Applications that deal with browsing and exploration of multimedia contents are usually based on large interactive surfaces, on which users can manipulate elements through direct and spontaneous actions. This research has led to systems based on gesture recognition and analysis [8] of users' bare hands [1][9]. In complex applications, featuring multiple options and actions, simple and spontaneous hand gestures turn out not to be enough. Possible solutions are: a) enriching the interaction language by adding new, more complex gestures that map to actions, which could distort the naturalness of the interaction by forcing users to learn
unnatural gestures; or b) introducing an intermediate visual layer of interface elements (such as menus, icons, etc.), which would reduce the directness of the interaction and cause a conflict between digital contents and interface elements, both sharing the same visualization area. Either solution can increase the user's cognitive load.

Tangible user interfaces (TUIs [3]) are an alternative to these solutions. They introduce physical, tangible objects that the system interprets as embodiments [3] of the elements of the interaction language. By manipulating those objects, guided by their physical affordance [12], users gain more direct access to the functions mapped to the different objects. TUIs have a broad literature: several systems employ passive physical objects [4] with recognizable shapes or encodings, as well as smart objects embedding sensors [11]. Within the taxonomy of TUIs we are especially interested in digital desks or tables such as [5], or the recently presented Microsoft Surface computing platform [7], where the focus of the interaction design is on the relationship between the physical and the digital object.

This paper introduces TANGerINE, a tangible tabletop environment where users manipulate smart objects in order to perform actions on the contents of a digital media table [5]. Unlike previous approaches [5][3], we want users to be able to interact with the system and the objects in three contexts: the active presentation area (the surface of the table), the nearby area (around the table) and the external space (a transitional space between different tabletops). Each area is characterized by a different focus of interaction, and our aim is to develop a language that is natural and consistent across the contexts. In this paper we present our ongoing work, focusing on the first two areas of interaction (active and nearby). A combination of computer vision tracking techniques and signal analysis from wireless sensors embedded in smart objects is employed to augment a tabletop surface; these are briefly analyzed in the following sections, together with the interaction design and applications.
2. TUIs IN THE TABLETOP SCENARIO
2.1 Research framework
In the digital media table scenario, tangible interface elements (tangibles) have to be considered both in relation to the ergonomics of the horizontal surface around which users stand or sit, and in relation to the digital contents visualized on that same surface. From the physical point of view, the tangibles best suited to tabletop interaction are those that assume a stable, steady state when settled on a horizontal plane.

In order to define our research framework, we identified several actions and roles that a tangible object can play in the tabletop scenario. A tangible may have a direct relation with one or more digital objects, becoming a simulacrum [13] of a single entity or of a collection; this is made possible by altering the tangible's visual appearance. Tangibles can also act as a function applied to an object, that is, a manipulator of different features of the digital content (e.g. shape, size, linked information). Finally, they can be used as the user's digital simulacrum (avatar).

The tabletop scenario is also characterized by different contexts according to the area where the interaction occurs:

Active Context (AC): the horizontal visualization surface, typically the scene where users interact with tangibles (recognized by the system) as well as digital elements. In this area there is a direct mapping between the position and orientation of the tangible objects and those of the digital ones.

Nearby Context (NC): the area right around the tabletop, where both intentional and non-intentional actions can be performed. The body of the user can be tracked and this information can be used to study their behavior; the position of the user is also useful for attributing the ownership of actions performed in the AC. In this context the position of a tangible may not be tracked precisely, since the user handles physical objects with fewer constraints (e.g. the object may be occluded), but the tangible can still be manipulated and still provides information about its orientation in space.
External Context (EC): the outer area, unrelated to the first two contexts. Here no position tracking occurs, but the user can still interact with the tangible object and carry it across different tabletops. The object therefore becomes a bridge between different interactive artifacts: the user can perform actions on one tabletop and use the same tangible on other artifacts, with the physical object acting as a container for different kinds of information (e.g. session data or a user profile).

In our current prototype the tangible is a cube (see section 2.2). The user intuitively considers the uppermost face "active", as if reading the face of a die, and therefore identifies the object as being able to embody six different actions or roles. The cube faces can be used as visualization areas for symbols related to the current object role (augmentation) and also provide space for the markers needed for object detection and tracking through CV techniques (see section 3.2). In the Active Context the relation between the cube's steady state and its "active face" is the most important: users can place the object on the surface and slide it over the table while keeping the same upper face, or grab it and rotate it to choose another face. The cube can also be picked up and manipulated outside the active area, in the Nearby Context, where the manipulation has more degrees of freedom. The variety of these actions allows for a more expressive interaction language and provides the application designer with richer modes of operation that depend on the context in which the user acts.

As mentioned in the Introduction, we started investigating how to map these actions across the Active and Nearby Contexts through a simple quiz application based on the 2007 Oscars (see figure 1). In the NC the user chooses one of the six quiz topics (actor, actress, movie, script, direction and soundtrack) by turning the cube. When a topic is selected, the user answers the related question by positioning the cube on the table (AC). If the answer is correct, the interface presents contents about the selected topic, which can be explored by turning the cube and placing it on the tabletop. The main menu is shown again when the cube returns to the nearby area.

In order to investigate and develop the interaction language across the different contexts we implemented a first layout of the tangible interactive environment, described in the following section.
Figure 1. Sample application investigating contexts.
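To make the mapping concrete, the following Python sketch models the quiz flow just described under stated assumptions: the six topics come from the text, while the class, method and event names (QuizState, on_face_changed, etc.) are hypothetical and not part of the actual TANGerINE implementation.

```python
# Hypothetical sketch of the quiz face/context mapping described above.
# Face codes 0-5 and the event names are assumptions, not the TANGerINE API.

TOPICS = ["actor", "actress", "movie", "script", "direction", "soundtrack"]

class QuizState:
    def __init__(self):
        self.selected_topic = None

    def on_face_changed(self, face_code, context):
        """Called when the smart cube reports a new upper face."""
        if context == "NC":                      # nearby context: browse topics
            self.selected_topic = TOPICS[face_code]
            print(f"Topic selected: {self.selected_topic}")

    def on_placed_on_table(self, face_code):
        """Called when the cube enters the active context (AC)."""
        if self.selected_topic is not None:
            print(f"Answering question on: {self.selected_topic}")

    def on_removed_from_table(self):
        """Cube back in the nearby area: return to the main menu."""
        self.selected_topic = None
        print("Main menu")

# Example event sequence
state = QuizState()
state.on_face_changed(2, "NC")     # user turns the cube near the table
state.on_placed_on_table(2)        # user places it on the tabletop
state.on_removed_from_table()      # user picks it up again
```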
2.2 Current status
The current TANGerINE system layout consists of a ceiling-mounted case that embeds all of the required elements: computer, projector, camera and illuminator. It targets the horizontal surface of a normal table positioned under the case, on which the interface is also visualized [9].

Users interact with the system by manipulating a physical object. We chose a cube shape for the availability of six steady states, as well as for its clear affordance.

3. SENSING ARCHITECTURE
The TANGerINE sensing is composed of two modules: a computer vision module that tracks the smart objects on the surface (identifying their position, orientation and identity) and a data collection module that receives the status of the smart objects' sensors through wireless communication.
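As a rough illustration of how the two streams could be combined, the sketch below joins per-frame vision tracks and sensor statuses by cube id; the record fields follow the description above, but the data structures, field names and the AC/NC assignment rule are our own assumptions, not the actual TANGerINE data format.

```python
# Illustrative sketch of merging the two sensing streams per frame.
# Field names are assumptions based on the description above.

from dataclasses import dataclass

@dataclass
class VisionTrack:          # from the overhead camera
    cube_id: int
    x: float                # position on the tabletop plane
    y: float
    angle_deg: float        # in-plane orientation from the LED pattern

@dataclass
class CubeStatus:           # from the wireless sensor node
    cube_id: int
    top_face: int           # 0..5, derived from the accelerometer
    accel: tuple            # raw tri-axial acceleration

def merge(tracks: list, statuses: list) -> list:
    """Join CV tracks and sensor statuses by cube id.

    A cube seen by the camera is taken to be in the Active Context; a cube
    that only reports over Bluetooth is assumed to be in the Nearby Context.
    """
    by_id = {s.cube_id: s for s in statuses}
    merged = []
    for t in tracks:
        s = by_id.pop(t.cube_id, None)
        merged.append({"id": t.cube_id, "context": "AC",
                       "pos": (t.x, t.y), "angle": t.angle_deg,
                       "face": s.top_face if s else None})
    for s in by_id.values():                    # not visible on the surface
        merged.append({"id": s.cube_id, "context": "NC",
                       "pos": None, "angle": None, "face": s.top_face})
    return merged
```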
3.1 The Smart Micrel Cube
The Smart Micrel Cube (SMCube) is a wooden cube case with a matrix of infrared emitter LEDs on each face. It embeds an upgraded version of the WiMoCA node [10], a wireless sensor node recently enhanced with Bluetooth capabilities and extended to support actuation. The WiMoCA node is extremely flexible thanks to its modular architecture, which eases fast replacement and update of each functional layer. The main layers of the node are the Power Supply Layer (PSL), the MicroController & Sensors Layer (MCL), the Actuation Layer (AL) and the Wireless Transmission Layer (WTL). The core is an ATmega8, a low-cost, low-power 8-bit microcontroller based on the AVR RISC architecture. The sensor embedded in the architecture is a MEMS tri-axial accelerometer by STMicroelectronics with a programmable full scale of 2g or 6g and digital output. The communication section is based on a Bluetooth 2.0 transceiver operating at 2.4 GHz and supporting the Serial Port Profile (SPP). Actuation is provided either by the already mentioned infrared LEDs or by a vibration motor; the actuation layer is controlled via the I/O port of the microcontroller with the help of an encoder. At present the cube measures 6.5 cm per side. In the worst case, i.e. LEDs always on and transmission enabled at the maximum frame rate (30 pk/sec), the SMCube reaches up to 8 hours of autonomy (average current consumption of about 60 mA). In a more typical use, i.e. LEDs on for 50% of the time and transmission enabled at a medium frame rate (15 pk/sec), the lifetime reaches up to 12 hours (average current consumption of about 45 mA).
Figure 2. The SMCube case and embedded WiMoCA node.

The SMCube can be considered a tilt-aware artifact, with on-board intelligence to perform sensing, actuation, storage and processing of data. The cube is identified by an id number (distinct from the pattern id described in section 3.2), which helps disambiguation when more than one cube is present at a time. Thanks to its wireless communication capabilities it can receive queries and commands and exchange bidirectional information with the environment in which it is placed. Using the SMCube in a multi-sensory enhanced context therefore provides redundancy that improves the recognition abilities of the overall system. The cube can give both direct feedback on its state via wireless communication and visual, audio or tactile feedback through its actuation capabilities.

Visual (and other) feedback. The tri-axial accelerometer embedded in the cube measures static and dynamic acceleration. The former is used to extract the tilt of the cube with respect to the gravity acceleration vector through simple trigonometric considerations on the accelerations collected along the three axes [11]. The SMCube can therefore derive which of its six faces is the top or bottom face at any given instant. The result is both stored on the cube and translated into visual feedback: the LED matrix on the top face is turned on for the CV subsystem to track and identify the object. In a similar way, the tilt can be translated into vibration or, in future work, audio feedback.

Bluetooth communication. The cube tilt, and more generally all the information stored on the cube (e.g. its id number and other attributes relating to its state), can be sent wirelessly to any device with Bluetooth communication capabilities. At present the transmitted packet contains several fields, in particular the raw accelerometer data, the code corresponding to the currently lit cube face, and the id of the cube. The cube provides two operational modes: Inquiry mode and Continuous mode. In Inquiry mode, a single transmission is carried out when a transition from one face to another is detected. We have introduced a configurable latency (or reaction time), i.e. the number of data frames after which the cube is considered halted at a certain tilt; this hides transitional states. In this mode the cube is also responsive to inquiry commands, e.g. requests for further transmissions of the packet containing the state of the cube. In Continuous mode, the data packet is sent continuously at a configurable frame rate, so the overall system is informed frame by frame of the actual cube state and can monitor its movements; this mode enables, for example, signal processing for gesture recognition and cube motion tracking. In both operational modes, additional commands can be issued, such as turning the LEDs on and off, changing the frame rate, switching between modes, or modifying the configurable latency for face detection.
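The following sketch illustrates, under stated assumptions, how the top face can be derived from the static acceleration and how the configurable latency can hide transitional states; the axis conventions, latency value and names are illustrative and do not reproduce the firmware running on the SMCube.

```python
# Sketch of deriving the "active" (upper) face from static acceleration,
# with the configurable latency (debounce) described above.

import math

# Unit normals of the six faces in the cube's own frame (+x, -x, +y, -y, +z, -z).
FACE_NORMALS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def top_face(ax, ay, az):
    """Index of the face whose outward normal is most aligned with the measured
    static acceleration (an accelerometer at rest typically reads +1g along the
    up axis; flip the sign if the sensor uses the opposite convention)."""
    g = math.sqrt(ax * ax + ay * ay + az * az)
    if g == 0:
        return None
    ax, ay, az = ax / g, ay / g, az / g
    dots = [nx * ax + ny * ay + nz * az for nx, ny, nz in FACE_NORMALS]
    return max(range(6), key=lambda i: dots[i])

class FaceDebouncer:
    """Report a face change only after it has been stable for `latency` frames."""
    def __init__(self, latency=5):
        self.latency = latency
        self.candidate, self.count, self.stable = None, 0, None

    def update(self, face):
        if face == self.candidate:
            self.count += 1
        else:
            self.candidate, self.count = face, 1
        if self.count >= self.latency and face != self.stable:
            self.stable = face
            return face          # new stable face: trigger a transmission
        return None              # still in a transitional state
```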
3.2 Computer Vision
Computer Vision techniques are applied to detect and track the LEDs in order to determine the cube's position on the tabletop surface, while the analysis of the LED pattern gives the absolute orientation of the SMCube. The infrared LEDs mounted on the cube's faces are easily detected by a monochrome camera equipped with a matching band-pass filter (see figure 3). For each video frame (captured at 30 fps at a resolution of 640x480 pixels), only simple image processing operations are needed: noise removal, background subtraction, thresholding and connected components analysis. For every detected point (blob) an algorithm is run to search for a known pattern. The matrix of LEDs pictured in figure 3 has been designed to provide both the 2-dimensional orientation of the cube, evaluated with respect to the axis perpendicular to the table surface, and its identification. The cube form factor and the border size of the face provide enough space to avoid ambiguous detections, allowing cubes to be adjacent in any orientation.
Figure 3. Infrared image of the cube with LED pattern.
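A minimal sketch of the kind of per-frame blob extraction described above, written with OpenCV for illustration; the parameter values (blur kernel, threshold, minimum area) are assumptions rather than the ones used in TANGerINE.

```python
# Illustrative LED blob extraction along the lines of the pipeline described above
# (noise removal, background subtraction, thresholding, connected components).

import cv2

def detect_led_blobs(frame_gray, background_gray, thresh=60):
    """Return the centroids of bright IR blobs in a 640x480 monochrome frame."""
    diff = cv2.absdiff(frame_gray, background_gray)         # background subtraction
    diff = cv2.medianBlur(diff, 3)                          # simple noise removal
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    n, _, stats, centroids = cv2.connectedComponentsWithStats(mask)
    blobs = []
    for i in range(1, n):                                   # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= 2:                 # discard single-pixel noise
            blobs.append(tuple(centroids[i]))
    return blobs
```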
Pattern detection algorithm. The LED pattern on every face of the cube is composed of 8 points (see figure 4): points p3 to p8 lie on the perimeter of a circle centered at p1, and point p2 lies on the radius between the center and p3. In the basic configuration only points p1, p2, p3 and p5 are switched on (figure 3); the remaining points are used as a binary encoding of the cube id.
Figure 4. Cube identification and orientation pattern.

The algorithm first runs a fast rejection filter in order to select the points that are center candidates, i.e. points for which there exists exactly one point p2 at a distance u and no point at a distance greater than u and lower than 2u. If this condition is satisfied, point p3 is searched at the same distance from p2 along the estimated radius, and the position of p5 is then estimated. At this point the pattern is considered valid and the orientation can be calculated as the angle between the radius and an absolute axis; if the points on the perimeter are misplaced, the algorithm fails. The last step identifies the pattern id of the cube by checking which of the p4, p6, p7 and p8 LEDs are lit: the binary id is computed with p4 as the least significant bit and p8 as the most significant, so that up to 16 cubes can be distinguished. While we do not yet have precise measures of the algorithm's efficiency, our tests suggest that this technique works reliably in various office lighting conditions. The use of bright LEDs allows the camera to be operated at lower shutter values, increasing the robustness of the tracking.
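As an illustration of the last two steps, the sketch below computes the face orientation from the p1-p3 radius and decodes the binary id from the optional LEDs; it assumes the pattern points have already been found and labeled as in figure 4, and the function names are hypothetical.

```python
# Sketch of the orientation and id-decoding steps, assuming the pattern points
# have already been located and matched to their roles (p1 = center, p2-p8).

import math

def pattern_orientation(p1, p3):
    """Orientation of the face as the angle of the center-to-p3 radius (degrees),
    measured against the image x-axis."""
    return math.degrees(math.atan2(p3[1] - p1[1], p3[0] - p1[0]))

def pattern_id(lit):
    """Binary id from the optional LEDs: p4 is the least significant bit, p8 the most.

    `lit` is a set naming which optional points are switched on,
    e.g. {"p4", "p7"} -> 0b0101 = 5. With four bits, up to 16 ids are possible.
    """
    bits = ["p4", "p6", "p7", "p8"]          # LSB .. MSB
    return sum(1 << i for i, name in enumerate(bits) if name in lit)

# Example: a cube with p4 and p7 lit has id 5
print(pattern_id({"p4", "p7"}))
```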
4. CONCLUSIONS AND FUTURE WORK
In this paper we introduced our ongoing project TANGerINE, a natural interactive framework that exploits both wireless sensor electronics and CV techniques in order to enrich tabletop interaction. Our future work will focus both on technological advances in the sensing platform and on the investigation of context-aware application design. More cubes will be introduced to study collaborative applications, and the extensible smart object architecture will be enhanced with new sensors to detect more events (e.g. capacitive sensors for grasp detection) and to reinforce estimates (e.g. a magnetometer to provide a measure of orientation both in the active and in the nearby context), also exploiting sensor fusion techniques. The CV module will also be expanded with a context camera and an algorithm able to track users in the nearby area and to merge this information with events from the active area. The role of the objects as a bridge between different interactive contexts will be investigated by introducing External Contexts as connection areas between two or more Nearby Contexts. Finally, we will develop a user profiling technique addressing possible application design issues.

REFERENCES
[1] C. Bérard, Bare-Hand Human-Computer Interaction, Technische Universität Berlin, Germany, 2001.
[2] F. Block, A. Schmidt, N. Villar, H.-W. Gellersen, Towards a Playful User Interface for Home Entertainment Systems, in European Symposium on Ambient Intelligence (EUSAI), Eindhoven, The Netherlands, 2004, pp. 207-217.
[3] K. P. Fishkin, A taxonomy for and analysis of tangible interfaces, Personal and Ubiquitous Computing, 2004.
[4] S. Jordà, M. Kaltenbrunner, G. Geiger and R. Bencina, The reacTable, in Proceedings of the International Computer Music Conference (ICMC 2005), Barcelona, Spain.
[5] A. Mazalek, M. Reynolds, G. Davenport, TViews: An Extensible Architecture for Multiuser Digital Media Tables, IEEE Computer Graphics and Applications, Volume 26, Issue 5, Sep-Oct 2006, pp. 47-55.
[6] I. Marsic, A. Medl, J. Flanagan, Natural communication with information systems, Rutgers University, Piscataway, NJ, USA, Aug. 2000.
[7] Microsoft Corporation, Microsoft Surface, http://www.microsoft.com/surface/, 2007.
[8] V. I. Pavlovic, R. Sharma, T. S. Huang, Visual interpretation of hand gestures for human-computer interaction: a review, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997.
[9] S. Baraldi, A. Del Bimbo, A. Valli, L. Landucci, wikiTable: finger-driven interaction for collaborative knowledge-building workspaces, in Proceedings of the 2nd IEEE Workshop on Vision for Human Computer Interaction (V4HCI), in conjunction with IEEE CVPR 2006, New York, 22 June 2006.
[10] E. Farella, A. Pieracci, D. Brunelli, A. Acquaviva, L. Benini, B. Riccó, Design and Implementation of WiMoCA Node for a Body Area Wireless Sensor Network, in Proceedings of the IEEE International Conference on Sensor Networks (SENET), Montreal, Canada, August 2005, pp. 342-347.
[11] A. Schmidt, D. Schmidt, P. Holleis, M. Kranz, A Display Cube as a Tangible User Interface, 2005.
[12] J. G. Sheridan, B. W. Short, G. Kortuem, K. Van-Laerhoven and N. Villar, Exploring Cube Affordance: Towards a Classification of Non-Verbal Dynamics of Physical Interfaces for Wearable Computing, in Proceedings of EuroWearable 2003, HP Labs, Bristol.
[13] M. Nadin, Anticipatory computing, ACM Ubiquity, Volume 1, Issue 40 (Dec 12-18, 2000), Article No. 2.