2012 Eighth International Conference on Intelligent Environments
Toward a Two-Handed Gesture-based Visual 3D Interactive Object-Oriented Environment for Software Development

Raul A. Herrera Acuna, SEC, Kingston University, London, UK, [email protected]
Christos Fidas, ECE, University of Patras, Patra, Greece, [email protected]
Vasileios Argyriou, SEC, Kingston University, London, UK, [email protected]
Sergio A. Velastin, SEC, Kingston University, London, UK, [email protected]
Abstract—This paper presents the conceptual and architectural design of a multilayered framework aiming to provide a two-handed, gesture-based, visual, interactive 3D object-oriented environment for software development. We argue that this is a viable, intuitive and attractive approach to software development that facilitates natural human computer interaction, thus supporting tasks related to software development in a more effective and efficient way. We illustrate the value of the proposed environment through an analysis of data from early evaluations of a prototype system in use. The results and implications of this study are useful for designing effective and efficient gesture-based 3D interactive software development environments.

Keywords-Human-Computer Interaction, Computer Vision, Hand Posture Recognition
I. INTRODUCTION

Human computer interaction (HCI) lies at the crossroads of many scientific areas including artificial intelligence, computer vision, face recognition, motion tracking, etc. In recent years there has been a growing interest in improving all aspects of the interaction between humans and computers. It is argued that to truly achieve effective human-computer intelligent interaction (HCII), the user needs to be able to interact naturally with the computer, similar to the way human-human interaction takes place. Thus, new approaches to computer-human interfaces such as multi-touch systems, luminous rooms, gesture interpretation devices, tangible user interfaces, etc., have created new needs in software design and development [1]. Indeed, there are situations in which well-established input and pointing devices lack efficiency and effectiveness, especially in application areas in which 3D interactions are closer to real-world experiences and to the functional and structural mental models of users. As a consequence, human computer interaction is limited, mainly because two-dimensional degrees of freedom cannot properly emulate the three dimensions of space [1].

Within this realm, gesture-based human computer interaction has recently shifted to the center of attention as a more intuitive human computer interaction approach. It is supported by computer vision so as to recognize and understand human gestures, motion and action. Understanding human motion and actions are complex and challenging tasks, as they can be approached at various levels of detail [2]. Recently, there has been increased interest in the design and development of new-generation frameworks for software development. The basic idea behind this interest lies in the evolution from tangible user interfaces to "spatial" user interfaces used in intelligent spaces [2, 3], support systems for software development [4], new platform designs [5] and educational systems [6] that use video, tracking systems, augmented reality and other techniques to create a baseline for creation in so-called "luminous spaces" enhanced with current hardware technology [7].

This paper presents the conceptual and architectural design of a novel framework for developing software using a natural human computer interface based on the bare hands, solving along the way some of the most problematic issues in this area, such as finger tracking and gesture identification and recognition. Based on our analytical and empirical work, we promote the view that the proposed framework is a viable alternative to conventional techniques, as it provides new possibilities to design, develop, monitor, access, test and debug software which would otherwise be very difficult to realize in a two-dimensional environment.
II. CONCEPTUAL DESIGN
Conventional programming environments provide little explicit support for developing software for 3D interaction environments, mainly because they have followed the evolution of graphical user interfaces from typical console-based writing of code to visual programming environments [8, 9]. As highlighted in [10], these components are not advanced enough to provide the flexibility and clarity needed to understand many aspects of system development that could be better captured with a full 3D graphical user interface. Thus, current software development environments entail several drawbacks inherited from the two-dimensional representation, which has been the principal way of interacting with software and computer systems.

This entails several limitations. The interaction with the real world is the principal source of information from which humans create empirical models; since the interaction with the computer takes place in two-dimensional spaces, it is argued that the transformation of real-world empirical knowledge into declarative programming constructs is limited [6, 11]. The potential efficiency and effectiveness of a 3D environment with 3D interaction support should make the software development task easier to accomplish, allowing even novice developers to perform tasks faster [12, 13]. A major limitation of 2D interfaces lies in the fact that they narrow down software presentation and abstraction, which reduces the efficiency and effectiveness of software development; e.g. the possibility of connecting information in more than two dimensions is unavailable or poorly supported [14, 15]. For example, a 3D model for a database needs to be broken into several flat tables for 2D interaction, whereas this kind of information can be modeled directly as a cube, and accessing it with 3D interaction is more efficient than issuing several commands in a 2D system. Furthermore, rotating the visual 3D space can provide alternative viewpoints, enhancing tasks such as system design, composition, presentation and comprehension.
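As a minimal illustration of this point (the data set and layout below are hypothetical and not part of the proposed framework), consider a three-dimensional data set accessed directly as a cube versus the same data emulated as a set of two-dimensional tables:

```python
import numpy as np

# Hypothetical data set: values indexed by product, region and month.
products, regions, months = 4, 3, 12
cube = np.arange(products * regions * months).reshape(products, regions, months)

# 3D access: one direct lookup along all three axes.
value = cube[2, 1, 6]                      # product 2, region 1, month 6

# 2D emulation: the same data split into one flat table per region,
# so the developer must first pick the right table and then index it.
tables = {r: cube[:, r, :] for r in range(regions)}
same_value = tables[1][2, 6]               # region table first, then product/month

assert value == same_value
```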
Target application domains that can benefit significantly from this architecture are those where the spatial positioning of the code has logical and visual reflections in a 3D model. For example, when working on robot programming it may help to have the visual code for the left arm at the corresponding 3D position of a robot's visual model. Another example is 3D protein synthesis, where spatial locations have both conceptual and physical meaning. Furthermore, applications in this area will increase due to the emergence of 3D monitors and holographic displays.
III. INTERACTION STYLES IN A 3D SOFTWARE DEVELOPMENT ENVIRONMENT
With the aim of supporting intuitive gesture-based HCI, it is proposed to support interaction with the use of both hands and fingertips to grab and move elements around or to perform special commands based on a particular movement of the hands or a sequence of movements. Such an approach within a 3D environment reduces the limitations of typical interaction devices [16] and simultaneously provides an opportunity to improve the understanding, usability and overall user experience of software tools [14] or entertainment technologies [18, 19].
Figure 1. Left side: 3D interaction styles (select, grab, put in background, zoom in/out, rotate). Right side: special commands (clear workspace, copy element, menu, open file, close file, keyboard).
As depicted in Figure 1 (left side), the following direct interaction modes are supported:
1) Selection: Selection is performed by placing a hand over an element. The border color of the selected object then changes, indicating a successful selection. Furthermore, a 3D rotation panel is automatically displayed in case the user requires to rotate the element.
2) Grab and translate: Grabbing and translating objects is supported in two stages: selecting the object, then grabbing it by closing the fingers and moving it. Re-opening the fingers releases the object.
3) Put in background: With the object selected, it is moved along the z-axis (depth) by a 'pushing' movement of the hand.
4) Zoom in/out: After selecting the object, zooming in (increasing the size of the object) is performed by placing two fingers on it and then separating them. Closing the fingers back produces the opposite effect (zoom out).
5) Rotation: 3D objects can be rotated through a new type of bimanual command. One hand is placed in a "3D interaction" space while the other performs the rotation. For example: rotation axis XY: performed as in 2D; rotation axis YZ/XZ: move the hand up/down while "pressing" the "rotation YZ or XZ" button.
6) Special commands: As depicted in Figure 1 (right side), special commands provide access to special features of the 3D software development environment: Clear workspace: slide the hand across the screen. Copy element: first select the element (as described above) and raise two fingers. Menu/Dive-in (general): move both hands forward. Open file: bring both hands near each other and then separate them. Close file: start with both hands separated and bring them closer. Keyboard: move both closed hands upwards.
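As a minimal sketch of how such gestures might be mapped onto the commands above (the hand-state fields and returned command names are illustrative assumptions, not the framework's actual API):

```python
from dataclasses import dataclass

# Illustrative hand state, as it might arrive from a hand-tracking layer.
@dataclass
class HandState:
    over_object: bool       # hand hovers over a 3D element
    fingers_extended: int   # number of extended fingers
    moving_forward: bool    # pushing motion along the z axis

def interpret(left: HandState, right: HandState) -> str:
    """Map a pair of hand states to one of the interaction modes described above."""
    if left.moving_forward and right.moving_forward:
        return "menu/dive-in"                 # both hands move forward
    if right.over_object and right.fingers_extended == 0:
        return "grab"                         # fingers closed over a selected element
    if right.over_object and right.fingers_extended == 2:
        return "copy"                         # element selected, two fingers raised
    if right.over_object:
        return "select"                       # open hand placed over an element
    return "idle"

# Example: right hand closed over an element -> grab.
print(interpret(HandState(False, 5, False), HandState(True, 0, False)))
```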
IV. ARCHITECTURAL DESIGN
A. Framework Architecture

This section presents the framework architecture, using examples to demonstrate the overall interaction mechanism in a 3D software development environment. Initially a simple layer division is defined for the framework architecture according to the elements presented in the previous section. The layer architecture is shown in Figure 2 (left side) and an analysis follows.
Figure 2. Left side: architectural design. Right side: 3D visual programming objects (Libraries; Text input/output; Main Function; Functions; Loops; Variables; Questions (if-else); and Statements (Classes)).
1) Hardware data acquisition: This layer comprises the drivers and APIs necessary to retrieve information from the gesture acquisition device.
2) Hand gesture acquisition: In this layer, the information received from the hardware is processed and the hands, their fingers and their relative positions are identified.
3) Gesture interpretation: The hand and finger information is associated with specific actions over the objects at the corresponding spatial position, and based on this information the appropriate procedure is performed on the data object.
4) Command graphic association: This layer is responsible for action visualization and for the association between the "logic" objects and their graphical representation. The information is passed immediately to the graphical interface layer.
5) Data management: The purpose of this layer is to store all the data associated with the different actions performed, both in the short and the long term. This layer also has control over objects, structures and applications of the framework's library.
6) Graphical interface display: This layer controls the information displayed on the output device and the visualization of the actions performed by the developer.
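One possible way to realise this layering as a simple processing pipeline is sketched below (the class and method names are assumptions made for illustration, not the framework's actual interfaces):

```python
class HardwareDataAcquisition:
    """Layer 1: wraps the gesture acquisition device drivers/APIs."""
    def read_frame(self):
        return {"depth": None, "image": None}     # placeholder sensor frame

class HandGestureAcquisition:
    """Layer 2: identifies hands, fingers and their relative positions."""
    def detect(self, frame):
        return {"hands": [], "fingers": []}

class GestureInterpretation:
    """Layer 3: associates hand/finger data with actions on objects."""
    def interpret(self, hands):
        return {"action": "idle", "target": None}

class CommandGraphicAssociation:
    """Layer 4: links 'logic' objects to their graphical representation."""
    def apply(self, action):
        return action

class DataManagement:
    """Layer 5: stores objects, structures and framework library data."""
    def store(self, action):
        pass

class GraphicalInterfaceDisplay:
    """Layer 6: renders the result of the developer's actions."""
    def render(self, action):
        print("render:", action)

def process_frame(hw, hand, gesture, command, data, gui):
    """Push one sensor frame through the six layers, top to bottom."""
    frame = hw.read_frame()
    hands = hand.detect(frame)
    action = gesture.interpret(hands)
    action = command.apply(action)
    data.store(action)
    gui.render(action)

process_frame(HardwareDataAcquisition(), HandGestureAcquisition(),
              GestureInterpretation(), CommandGraphicAssociation(),
              DataManagement(), GraphicalInterfaceDisplay())
```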
B. Definition of 3D visual elements and development of a 3D interactive software development toolkit

To create a framework for software development using hand gesture interaction, it is essential to define a new graphical programming language based on objects and visual tools which can be directly manipulated as extensions of the framework's base programming language. Initially, we define eight example tools, as depicted in Figure 2 (right side), described from left to right and top to bottom as: Libraries, Text input/output, Main Function, Functions, Loops, Variables, Branching (if-else), and Statements. The selection of the shapes at this stage is arbitrary, but they are adjusted according to target application domains and contexts of use, aiming to provide visual metaphors of cognitive constructs in various application domains, e.g. robot programming, protein synthesis, multidimensional databases, etc.
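A possible encoding of these visual elements as simple data structures is sketched below (the class layout and fields are hypothetical, intended only to illustrate how a 3D object could carry the code fragment it represents):

```python
from dataclasses import dataclass, field
from enum import Enum

class ElementKind(Enum):
    LIBRARY = "library"
    TEXT_IO = "text input/output"
    MAIN = "main function"
    FUNCTION = "function"
    LOOP = "loop"
    VARIABLE = "variable"
    BRANCH = "branching (if-else)"
    STATEMENT = "statement"

@dataclass
class VisualElement:
    kind: ElementKind
    code: str                    # textual code fragment the 3D object stands for
    position: tuple              # (x, y, z) placement in the 3D workspace
    children: list = field(default_factory=list)   # nested elements, e.g. a loop body

loop = VisualElement(ElementKind.LOOP, "for i in range(10):", (0.0, 0.0, 0.5))
loop.children.append(VisualElement(ElementKind.STATEMENT, "total += i", (0.1, 0.0, 0.5)))
```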
C. Element connection and interaction in the SDK

Figure 3 shows a typical view of the software development toolkit. On the right side of the development area the available set of 3D objects is presented within a scrolling frame. On the left side of the window the 'objects info' frame displays basic information about a selected object and also tells the developer whether the object contains any other element inside it (e.g. an inherited object). Furthermore, on the left side of the main window a cube is provided to support multiple presentations for the developer related to the software under development. Using the cube the developer can, in a direct 3D manipulation manner, gain access to different features of the framework, such as the 'code view' (where the developer creates software using graphic icons), the 'interface view' (where the developer accesses the GUI creation screen with 3D elements), the 'model view' (the final interface model), the 'debugging view', etc. Selecting an element from the graphic icon menu and using it in the working space involves a 'selection' by the developer, as described previously, and a 'translation' of the object into the working space.
The parameters of an element can be modified by placing the hand over the object and performing the 'press' action. This action triggers the object's 'inner view' to appear, providing access to further parameters related to the selected element. The connection between 3D elements is performed by 'selecting' one element with one hand and connecting it to the other element using the second hand, as shown in Figure 3.
Figure 3. Connection between 3D elements.
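A minimal sketch of this bimanual connection step is given below (the workspace class and its spatial lookup are hypothetical, introduced only to illustrate the operation):

```python
import math

class Workspace:
    """Minimal stand-in for the 3D working space (illustrative only)."""
    def __init__(self, elements):
        self.elements = elements       # list of (label, (x, y, z)) pairs
        self.connections = []

    def element_at(self, hand_pos, reach=0.1):
        """Return the label of the element closest to the hand, if within reach."""
        for label, pos in self.elements:
            if math.dist(pos, hand_pos) <= reach:
                return label
        return None

    def connect(self, selecting_hand, connecting_hand):
        """Bimanual connection: link the element under one hand to the element under the other."""
        source = self.element_at(selecting_hand)
        target = self.element_at(connecting_hand)
        if source and target and source != target:
            self.connections.append((source, target))
        return self.connections

ws = Workspace([("loop", (0.0, 0.0, 0.5)), ("statement", (0.4, 0.0, 0.5))])
print(ws.connect((0.02, 0.0, 0.5), (0.38, 0.01, 0.5)))   # -> [('loop', 'statement')]
```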
V. RESULTS AND FUTURE WORK
A novel framework for software development based on two-handed gesture interaction in 3D space has been presented in this paper. The conceptual design indicates the main novel features and how they can improve user experience and efficiency in a 3D software development interface. Interaction styles based on hand gestures in 3D space were introduced, and finally the framework architecture, including definitions of the basic graphic tools and the mechanisms for framework interaction, was analyzed. At the conceptual level the framework supports software developers by providing explicit support for declarative and procedural tasks related to software development within a 3D environment, whereas at the operational level it supports the software development process by defining new interaction styles in a 3D software environment. Taking into consideration the emergence of 3D monitors, and that software development is limited only by the supporting technology, approaches like the proposed one are of general value in designing future software development environments. Although early qualitative indications from the studies performed already provide promising feedback on the value of such an approach, future work consists of larger-scale studies in different contexts, aiming to establish statistical evidence and increase the external validity of the studies performed so far.
REFERENCES

[1] Jaimes, A. and Sebe, N., "Multimodal human-computer interaction: A survey", Computer Vision and Image Understanding, 108 (1-2), 2007, pp. 116-134.
[2] Harris, M., Buxton, B., Freeman, W. T., Ishii, H., Lucente, M. and Sinclair, M. J., "Interfaces for humans (panel): natural interaction, tangible data, and beyond", Conference Abstracts and Applications of SIGGRAPH '98, 1998, pp. 200-202.
[3] Ratti, C., Wang, Y., Ishii, H., Piper, B. and Frenchman, D., "Tangible User Interfaces (TUIs): a novel paradigm for GIS", Transactions in GIS, 8 (4), 2004, pp. 407-421.
[4] Vazquez, G., Andres Diaz Pace, J. and Campo, M., "Reusing design experiences to materialize software architectures into object-oriented designs", Information Sciences, 2010, pp. 242-250.
[5] Esnault, N., Royan, J., Cozot, R. and Bouville, C., "A flexible framework to personalize 3D web users experience", Web3D '10: Proceedings of the 15th International Conference on Web 3D Technology, 2010, pp. 35-44.
[6] Chittaro, L. and Ranon, R., "Web3D technologies in learning, education and training: motivations, issues, opportunities", Computers & Education, 49, 2009, pp. 3-18.
[7] Wilson, A. D., "Using a depth camera as a touch sensor", ITS '10: ACM International Conference on Interactive Tabletops and Surfaces, 2010, pp. 69-72.
[8] Dang, N. T., Tavanti, M., Rankin, I. and Cooper, M., "A comparison of different input devices for a 3D environment", International Journal of Industrial Ergonomics, 39 (3), 2009, pp. 554-563.
[9] Clerici, S., Zoltan, C. and Prestigiacomo, G., "NiMoToons: a Totally Graphic Workbench for Program Tuning and Experimentation", Electronic Notes in Theoretical Computer Science, 258 (1), 2009, pp. 93-107.
[10] Wachs, J. P., Kölsch, M., Stern, H. and Edan, Y., "Vision-based hand-gesture applications", Communications of the ACM, 54 (2), 2011, pp. 60-71.
[11] Takeoka, Yoshiki, Miyaki, Takashi and Rekimoto, Jun, "Z-touch: an infrastructure for 3D gesture interaction in the proximity of tabletop surfaces", ACM International Conference on Interactive Tabletops and Surfaces, 2010.
[12] Ishii, Hiroshi, Ratti, C., Piper, B., Wang, Y., Biderman, A. and Ben-Joseph, E., "Bringing Clay and Sand into Digital Design — Continuous Tangible user Interfaces", BT Technology Journal, vol. 22, no. 4, 2004, pp. 287-299.
[13] Conway, Matthew, Audia, Steve, Burnette, Tommy, Cosgrove, Dennis and Christiansen, Kevin, "Alice: lessons learned from building a 3D system for novices", SIGCHI, (8), 2000, pp. 486-493.
[14] Hornecker, Eva and Buur, Jacob, "Getting a grip on tangible interaction: a framework on physical space and social interaction", SIGCHI, (10), 2008, pp. 437-446.
[15] Jacob, Robert J. K., Girouard, Audrey, Hirshfield, Leanne M., Horn, Michael S., Shaer, Orit, Solovey, Erin Treacy and Zigelbaum, Jamie, "Reality-based interaction: a framework for post-WIMP interfaces", SIGCHI, (10), 2008, pp. 201-210.
[16] Hand, C., "A Survey of 3D Interaction Techniques", Computer Graphics Forum, 16 (5), 1997, pp. 269-281.
[17] Zhang, Xu, Chen, Xiang, Wang, Wen-hui, Yang, Ji-hai, Lantz, Vuokko and Wang, Kong-qiao, "Hand gesture recognition and virtual game control based on 3D accelerometer and EMG sensors", 14th International Conference on Intelligent User Interfaces, (6), 2009, pp. 401-406.
[18] Schlömer, Thomas, Poppinga, Benjamin, Henze, Niels and Boll, Susanne, "Gesture recognition with a Wii controller", Proceedings of the 2nd International Conference on Tangible and Embedded Interaction, (4), 2008, pp. 11-14.
[19] Keefe, Daniel F., "Designing with Your Hands: Using 3D Computer Interfaces and Gesture to Model Organic Subjects", Minnesota, Technical Report, 2008.