© 2008 ACM 978-1-60558-335-8/08/0012 $5.00. This is the author's version of the work. The definitive Version of Record was published with DOI 10.1145/1477862.1477873.

One-Handed Interaction with Augmented Virtual Objects on Mobile Devices

Byung-Kuk Seo, Junyeoung Choi, Jae-Hyek Han, Hanhoon Park, Jong-Il Park

Department of Electronics and Computer Engineering, Hanyang University, Seoul, Korea

Abstract

We present a one-handed approach for augmented reality and interaction on mobile devices. The proposed application targets a common situation with mobile devices: one of the user's hands holds the device while the other hand is free. It also supports a natural augmented reality environment in which a user can interact with augmented reality content anytime and anywhere, without special equipment such as visual markers or tags. In our approach, a virtual object is augmented on the palm of the user's free hand, as if it were sitting on the palm, using a palm pose estimation method. The augmented virtual object reacts (e.g., by moving or animating) to motions of the hand, such as opening or closing it, based on fingertip tracking. Moreover, tactile interaction with the virtual object is provided through a tactile glove with vibration sensors. This paper describes how the augmented reality application is implemented, and preliminary results show its potential as a new approach to mobile augmented reality interaction.

CR Categories: I.3.6 [Computer Graphics]: Methodology and Techniques—Interaction techniques; H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems—Artificial, augmented and virtual realities

Keywords: augmented reality, interaction, mobile application

1 Introduction

Mobile devices such as personal digital assistants (PDAs), ultra-mobile PCs (UMPCs), and mobile phones have steadily improved, and their portability has prompted augmented reality (AR) researchers to consider them as powerful platforms. Recently, a variety of mobile AR applications have been developed in fields such as games, edutainment, guidance, and industry. The Invisible Train, a multi-user AR game on PDAs, is a good example of a mobile AR application [Wagner et al. 2005]. In this game, players can operate railroad track switches and adjust the speed of virtual trains with their PDAs. [Wagner et al. 2006] presented a collaborative art-history educational game running on PDAs, in which users sort a collection of artworks along a timeline. Most recently, [Herbst et al. 2008] presented TimeWarp, an outdoor edutainment game for historical exploration, using a mobile AR system.


Figure 1: Virtual pet on your palm using a mobile phone (Samsung SPH-M4650).

[Narzt et al. 2003] proposed a PDA-based mobile AR navigation system that pervasively extracts position and orientation information from any sensory source, and implemented prototypes for cars and pedestrians. [Honkamaa et al. 2007; Schall et al. 2008] demonstrated AR planning of buildings and underground infrastructure, such as pipeline networks, in outdoor environments using UMPCs.

So far, mobile devices have been used for interaction between users and the vast amount of information in their daily lives. Radio frequency identification (RFID) and near field communication (NFC) are typical forms of mobile interaction, providing wireless links between mobile devices, real objects, and information. In particular, the development of mobile cameras (especially phone cameras) has enabled visual tagging for mobile interaction, where visual tags are recognized through mobile cameras and corresponding services are provided to users. For example, QR codes [Rohs 2008] are popularly used in mass markets such as advertisements, business cards, and coupon flyers in Japan and Germany. Such mobile interactions have also been explored in mobile AR applications [Rohs 2005; Olwal 2006].

In recent years, several researchers have proposed a new type of mobile AR interaction that enables visual interaction with virtual objects augmented on the real world. For instance, [Paelke et al. 2004] presented a mobile AR game based on foot-based interaction in which players can kick a virtual ball on a mobile phone. [Henrysson et al. 2005] implemented a mobile phone-based AR application that enables interaction with virtual objects through AR scene assembly such as positioning, rotation, scaling, or cloning.

In this paper, we propose a new mobile AR application that focuses on the user's hand that is not holding the mobile device. A virtual object is augmented on the palm of the free hand, and the user can interact with it through hand motions such as opening or closing the hand. The proposed application supports natural augmentation anytime and anywhere, allowing a user to freely access AR content without visual markers or tags.

Figure 2: Palm pose estimation. (a) Captured scene, (b) segmentation of the hand region, (c) contour of the hand region, (d) mean direction of the hand region, (e) dominant features of the hand region, (f) distances between the two orthogonal points of the local direction of the mean points, (g) the starting point of the forearm, (h) palm direction, (i) detection of the convexity defect point between the thumb and the index finger, (j) projection model, (k) rendering of the virtual object.

It also provides tactile feedback as well as visual feedback, so the user can perceive realistic sensations from the virtual object using a tactile glove with vibration sensors.

Our approach is based on a palm pose estimation method for determining a virtual object's pose. In general, hand pose estimation methods are either 3-D model-based or model-free [Erol et al. 2007]. 3-D model-based methods use prior 3-D models of a hand to robustly track motions of the hand and fingers. Model-free methods use natural information of the hand, such as silhouette images, for efficient hand pose estimation, or special information such as color bands for fast and easy hand tracking. In this paper, we propose a new palm pose estimation method using natural features (the palm direction, the starting point of the forearm, and the convexity defect point between the thumb and the index finger), which are rarely influenced by motions of the fingers.

The proposed application is concerned with the user's perception of realistic sensations from the virtual object on the palm in two ways: visual interaction and tactile interaction. Using our simple fingertip tracking method, the virtual object reacts to motions of the user's hand and fingers, such as opening or closing the hand. Along with these visual interactions, tactile interactions are provided through tactile feedback according to the reactions of the virtual object.

This paper is organized as follows. Section 2 introduces our palm pose estimation method, which is detailed in its subsections. Section 3 describes our interaction techniques with demonstrations. Our conclusion is drawn in Section 4.

2 Methodology

To augment 3-D virtual objects on the palm of a user's hand, the region including the hand should first be detected. Note that in our method, this region (hereafter called the hand region) includes both the hand and part of the forearm. The proposed method detects the hand region using the generalized statistical color model [Jones and Rehg 2002]. The hand region is further refined by eliminating any skin-colored background using the distance transform. At this point, the palm pose (i.e., its position and orientation) can be estimated using the following procedure (see also Figure 2).

1. Find the mean direction of the hand region by least-squares line fitting of the points on its contour. Separate the hand from the entire hand region (which includes the forearm) using the dominant features of the hand and forearm. Here, the dominant features are the mean points, computed from the two points of the hand's contour orthogonal to the mean direction of the hand region. The starting point of the forearm is determined as the point where the distances between the two contour points orthogonal to the local direction of the mean points start to become constant.

2. Find the palm direction by least-squares line fitting of the dominant features of the separated hand.

3. Determine a projection model using a ratio of the palm's reference lengths, which are computed from the convexity defect point between the thumb and the index finger and the starting point of the forearm.

4. Finally, estimate the palm pose and render a 3-D virtual object on the palm of the hand.

2.1 Hand Region Segmentation

We use the generalized statistical color model to classify skin-colored pixels in the captured scene. Based on the color model, each pixel is assigned to the hand region if its skin-color likelihood is larger than a constant threshold. Figure 3-(b) shows a result of skin-color detection. However, background regions with pixel values similar to skin color cause detection errors, as shown in Figure 3-(b). Thus, we ensure that the hand region has been correctly identified by detecting the majority portion of the segmented image using the distance transform [Borgefors 1986] (Figure 3-(c)). We assume that the majority portion of the image segmented by skin color is mostly the user's hand region when the user holds a mobile device in one hand and views the other hand through the device's camera. Figure 3-(d) shows the detected hand silhouette image.
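As a concrete illustration, the following Python/OpenCV sketch shows one way this two-stage segmentation could be implemented. It is not the authors' code: the skin_likelihood function is only a crude stand-in for the generalized statistical color model of [Jones and Rehg 2002], and interpreting the "majority portion" as the connected component containing the largest distance-transform value is our assumption.

```python
# A minimal sketch of the hand-region segmentation stage (Section 2.1).
import cv2
import numpy as np

def skin_likelihood(frame_bgr):
    """Stand-in for the statistical color model: a crude likelihood in [0, 1]."""
    f = frame_bgr.astype(np.float32) + 1e-6
    b, g, r = cv2.split(f)
    s = b + g + r
    rn, gn = r / s, g / s                       # normalized chromaticity
    # Rough skin locus in normalized-RGB space (illustrative, not calibrated).
    return np.exp(-(((rn - 0.45) / 0.06) ** 2 + ((gn - 0.31) / 0.05) ** 2))

def segment_hand_region(frame_bgr, threshold=0.5):
    # 1) Threshold the per-pixel skin likelihood (cf. Figure 3-(b)).
    mask = (skin_likelihood(frame_bgr) > threshold).astype(np.uint8) * 255
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

    # 2) Distance transform (cf. Figure 3-(c)): keep the connected component
    #    that contains the largest distance-transform value, taken here as
    #    the "majority portion" of the skin mask.
    dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)
    _, labels = cv2.connectedComponents(mask)
    best_label = labels[np.unravel_index(np.argmax(dist), dist.shape)]
    return np.where(labels == best_label, 255, 0).astype(np.uint8)

# Example usage:
# frame = cv2.imread("hand.jpg")
# silhouette = segment_hand_region(frame)   # cf. Figure 3-(d)
```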

Figure 3: Hand region segmentation. (a) Captured scene, (b) segmentation of the hand region based on skin color, (c) distance transform, (d) detected hand region silhouette.

2.2 Pose Estimation

The first step in estimating the palm pose is to separate the hand from the hand region by finding the starting point of the forearm. First, we find the mean direction of the hand region (the green line in Figure 4-(a)) by least-squares line fitting of the points on its contour. Second, we find the mean points (the blue points in Figure 4-(a)) from the two points of the hand's contour orthogonal to the mean direction of the hand region. Then, we compute the distances between the two contour points orthogonal to the local direction of the mean points (Figure 4-(c)). Finally, the starting point of the forearm is determined as the point where these distances start to become constant (the red arrow between Figure 4-(a) and 4-(c)).

The next step is to find the palm direction. Finding the palm direction is important for correctly estimating the palm pose because the mean direction of the hand region includes the forearm direction, and thus it may indicate a wrong palm pose when the palm direction and the forearm direction are not collinear, as shown in Figure 4-(d). Therefore, the palm direction is computed using only the mean points of the separated hand. As shown in Figure 4-(d), the palm direction is accurately computed even when it differs significantly from the mean direction of the hand region. A code sketch of this step follows.

Figure 4: Pose estimation. (a) Mean direction of the hand region (green line) and mean points (blue points), (b) hand region silhouette, (c) distances between the two orthogonal points of the local direction of the mean points, (d) palm direction.
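To make the slicing and line-fitting concrete, the following sketch bins the silhouette contour along its mean direction, uses the per-slice widths to locate the start of the forearm, and fits the palm direction to the hand-side mean points only. This is our own simplification, not the authors' implementation: the slice ordering (fingertips first, forearm last) and the constant-width tolerance test are assumptions.

```python
# A minimal sketch of forearm/hand separation and palm-direction estimation.
import cv2
import numpy as np

def slice_contour(contour, n_bins=40):
    pts = contour.reshape(-1, 2).astype(np.float32)
    vx, vy = cv2.fitLine(pts, cv2.DIST_L2, 0, 0.01, 0.01)[:2].ravel()
    d = np.array([vx, vy], np.float32)          # mean direction (Fig. 4-(a), green)
    n = np.array([-vy, vx], np.float32)         # its normal
    t, s = pts @ d, pts @ n
    edges = np.linspace(t.min(), t.max(), n_bins + 1)
    mean_pts, widths = [], []
    for i in range(n_bins):
        sel = (t >= edges[i]) & (t <= edges[i + 1])
        if sel.sum() < 2:
            continue
        lo, hi = s[sel].min(), s[sel].max()
        c = 0.5 * (edges[i] + edges[i + 1])
        mean_pts.append(c * d + 0.5 * (lo + hi) * n)   # mean point (Fig. 4-(a), blue)
        widths.append(hi - lo)                         # orthogonal width (Fig. 4-(c))
    return np.array(mean_pts), np.array(widths)

def forearm_start_index(widths, tol=0.08):
    """First slice index after which widths stay nearly constant (forearm).

    Assumes slice 0 is at the fingertip end; disambiguating the orientation
    of the mean direction is outside this sketch.
    """
    rel = np.abs(np.diff(widths)) / (widths[:-1] + 1e-6)
    for i in range(len(rel)):
        if np.all(rel[i:] < tol):
            return i + 1
    return len(widths)

def palm_direction(contour):
    mean_pts, widths = slice_contour(contour)
    split = forearm_start_index(widths)
    hand_pts = mean_pts[:split]                 # drop the forearm-side mean points
    vx, vy = cv2.fitLine(hand_pts.astype(np.float32),
                         cv2.DIST_L2, 0, 0.01, 0.01)[:2].ravel()
    forearm_start = mean_pts[min(split, len(mean_pts) - 1)]
    return np.array([vx, vy], np.float32), forearm_start
```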

The last step is to estimate the palm pose using the palm direction, the starting point of the forearm, and the convexity defect point between the thumb and the index finger. In Figure 5-(a), the palm width (L1) is determined by the two points where the line that is orthogonal to the palm direction and passes through the convexity defect point intersects the contour of the hand. The palm height (L2) is computed as the shortest distance between the line of the palm width and the starting point of the forearm. To determine a projection model between the 2-D image plane and the 3-D scene, we assume a 3-D virtual square model whose four corners have 3-D coordinates (X, Y, Z) defined by the following equations:

|X| = L1 / 2,   |Y| = (L2 · k) / 2                                  (1)

|Z| = sqrt( max(|X|, |Y|)^2 − min(|X|, |Y|)^2 )                     (2)

where the constant k is a scale factor, experimentally determined so that the lengths of X and Y become equal. Note that in our approach, the 3-D virtual square models are defined case by case within the range of general one-handed motions. After computing the four corners, we perspectively project the 3-D virtual square model onto the 2-D image plane. The projected model is then translated to the center of the palm and rotated to align with the palm direction. Figure 5 shows that the projected square models are correctly located on the user's palm for different hand motions. Finally, the palm pose is calculated using the projected square model, and a virtual object is correctly rendered on the user's palm, as shown in Figure 6. A numeric sketch of this construction is given below.

Figure 5: Projected 3-D virtual square models.
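The following numeric sketch illustrates Eqs. (1)–(2) and one plausible way of projecting and aligning the square model with the palm. The corner layout of the tilted square, the pinhole projection, and the values of k, the focal length f, and the nominal depth are our own illustrative choices; the paper does not specify them.

```python
# A minimal numeric sketch of Eqs. (1)-(2) and of placing the projected square.
import numpy as np

def square_model_corners(L1, L2, k=1.2):
    X = L1 / 2.0                                    # Eq. (1)
    Y = (L2 * k) / 2.0
    Z = np.sqrt(max(X, Y) ** 2 - min(X, Y) ** 2)    # Eq. (2): tilt depth
    a, b = max(X, Y), min(X, Y)
    # A true square of half-side `a`, tilted so that its foreshortened
    # half-extent in the image is `b` and its depth variation is Z
    # (one interpretation of the square model; layout is an assumption).
    return np.array([[-a, -b, -Z],
                     [ a, -b, -Z],
                     [ a,  b,  Z],
                     [-a,  b,  Z]], dtype=np.float32)

def project_to_palm(corners_3d, palm_center, palm_dir, f=800.0, depth=600.0):
    """Perspective-project the square, then translate/rotate it onto the palm."""
    X, Y, Z = corners_3d[:, 0], corners_3d[:, 1], corners_3d[:, 2]
    proj = np.stack([f * X / (Z + depth), f * Y / (Z + depth)], axis=1)
    pd = np.asarray(palm_dir, np.float32)
    dx, dy = pd / np.linalg.norm(pd)
    R = np.array([[dy, dx],
                  [-dx, dy]], dtype=np.float32)     # maps the model +Y axis onto palm_dir
    return proj @ R.T + np.asarray(palm_center, np.float32)

# Example (values in pixels, purely illustrative):
# corners = square_model_corners(L1=180.0, L2=160.0)
# quad = project_to_palm(corners, palm_center=(320, 240), palm_dir=(0.0, -1.0))
```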

Figure 6: Augmentation of a virtual object on the user's palm.

3 Interaction with a Virtual Object

The interaction flow of the proposed mobile AR application is shown in Figure 7. Using the camera of the mobile device, an image of the user's hand is captured and the hand region is segmented by an image-processing unit. A palm pose is then estimated from natural features of the hand image, and a virtual object is rendered on the user's palm. When the user's hand and fingers move, the motions are detected and handled in two ways: visual interaction and tactile interaction. In visual interaction, the virtual object reacts to the detected motions and provides visual feedback to the user. In tactile interaction, a tactile interface is driven by control signals synchronized with the reactions of the virtual object and delivers tactile feedback to the user. Note that both interactions are provided to the user simultaneously. A compact code sketch of this flow is given below.

Figure 7: Interaction flow.
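The following compact sketch mirrors the flow of Figure 7, with each processing stage reduced to an injected callable. The class, the callable names, and the event codes are hypothetical; only the overall structure (segmentation, pose estimation, rendering, and synchronized visual and tactile feedback) follows the text.

```python
# A hypothetical skeleton of the interaction flow in Figure 7.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ARInteractionLoop:
    segment: Callable          # frame -> hand silhouette
    estimate_pose: Callable    # silhouette -> palm pose (position + orientation)
    detect_motion: Callable    # silhouette -> "open", "close", or None
    render: Callable           # (frame, pose, state) -> composited AR image
    send_tactile: Callable     # event code -> None (e.g., forwarded to the glove)

    def process(self, frame, state):
        """Push one camera frame through the pipeline."""
        silhouette = self.segment(frame)
        pose = self.estimate_pose(silhouette)
        motion = self.detect_motion(silhouette)
        if motion is not None:
            state = motion                    # visual reaction of the virtual object
            self.send_tactile(motion)         # tactile feedback, synchronized with it
        return self.render(frame, pose, state), state

# Usage with trivial stand-ins (purely illustrative):
# loop = ARInteractionLoop(segment=lambda f: f,
#                          estimate_pose=lambda s: None,
#                          detect_motion=lambda s: None,
#                          render=lambda f, p, st: f,
#                          send_tactile=print)
# image, state = loop.process(frame=None, state="close")
```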

3.1 Visual Interaction

In contrast to previous approaches, such as interaction with virtual objects by detecting marker occlusion [Lee et al. 2004] or by sensing depth information [Wilson 2007], the proposed method provides a new type of interaction by simply detecting hand motions based on fingertip tracking. Using a curvature-based algorithm, candidate fingertip points are detected, and the candidates that are relatively far away from the forearm are selected as fingertips. In addition, we estimate the exact locations of the fingertips by ellipse fitting of the selected points; the procedure is sketched below. The experimental results are shown in Figure 8. In our demonstration on a UMPC (Sony VGN-UX27LN), when the hand is opened, a flower opens and a bee comes out and buzzes around it. When the hand is closed, the flower closes and the bee disappears.

The proposed interaction is especially well suited to common situations in which users holding mobile devices have one hand free, as shown in Figure 8. As one application scenario, people can raise virtual pets on their mobile phones, as shown in Figure 1. They can carry and see the virtual pets anytime and anywhere by looking at the palms of their free hands through their phone cameras. The virtual pets react to people's strokes or touches.
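A minimal sketch of such curvature-based fingertip detection is given below. The k-curvature test, the clustering of candidate points before the ellipse fit, and the rule that the hand counts as open when several fingertips are visible are our simplifications of the description above, not the authors' algorithm.

```python
# A minimal sketch of curvature-based fingertip detection for open/close decisions.
import cv2
import numpy as np

def fingertip_candidates(contour, k=15, max_angle_deg=60.0):
    pts = contour.reshape(-1, 2).astype(np.float32)
    n = len(pts)
    cands = []
    for i in range(n):
        a, b, c = pts[(i - k) % n], pts[i], pts[(i + k) % n]
        v1, v2 = a - b, c - b
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-6)
        angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
        if angle < max_angle_deg:       # sharp turn; a real implementation would also
            cands.append(b)             # check convexity to reject valleys between fingers
    return np.array(cands)

def detect_fingertips(contour, forearm_start, min_dist=120.0):
    """forearm_start: 2-D point (np.array) marking the start of the forearm."""
    cands = fingertip_candidates(contour)
    if len(cands) == 0:
        return []
    # Keep candidates far from the forearm, then cluster neighbours and
    # refine each cluster with an ellipse fit.
    far = cands[np.linalg.norm(cands - forearm_start, axis=1) > min_dist]
    tips, used = [], np.zeros(len(far), bool)
    for i in range(len(far)):
        if used[i]:
            continue
        group = np.linalg.norm(far - far[i], axis=1) < 25.0
        used |= group
        cluster = far[group]
        if len(cluster) >= 5:           # cv2.fitEllipse needs at least 5 points
            (cx, cy), _, _ = cv2.fitEllipse(cluster.astype(np.float32))
            tips.append((cx, cy))
        else:
            tips.append(tuple(cluster.mean(axis=0)))
    return tips

def hand_is_open(contour, forearm_start):
    return len(detect_fingertips(contour, forearm_start)) >= 3
```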

Figure 8: Interactions with a virtual object when (a) opening and (b) closing the hand.

3.2 Tactile Interaction

Tactile interaction with virtual objects has been addressed in interesting ways through haptic interfaces. [Immersion Corp. 2007] developed haptic interfaces (e.g., CyberTouch and CyberGlove) that enable users to experience virtual worlds and feel vibro-tactile sensations when interacting with virtual objects. [Minamizawa et al. 2007] proposed a haptic interface called Gravity Grabber to present virtual mass sensations, and most recently [Minamizawa et al. 2008] proposed a mechanism for reproducing sensations on each finger and the palm, allowing realistic haptic interaction with the dynamic motions of virtual objects.

The proposed application provides tactile interaction together with visual interaction, using a tactile interface, when motions of the hand and fingers are detected in the captured scenes. The prototype of our tactile interface takes the form of a glove and consists of two parts: vibration sensors (3 V, 10,000 rpm) and an AVR controller (ATmega128), as shown in Figure 9. Nine vibration sensors are arranged in a 3x3 array, with 30 mm spacing between sensors, on the palm of the tactile glove. Each vibration sensor is connected to the controller and is activated according to control signals that are synchronized with the reactions of the virtual object; a sketch of the host-side control is given below. Various hand motions, such as bouncing or tilting as well as opening or closing the hand, can be applied in our mobile AR application, as shown in Figures 10, 11, and 12.
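For illustration, the following host-side sketch shows how control signals for the 3x3 vibration array might be sent to the glove's AVR controller over a serial link. The port name, baud rate, and two-byte bitmask protocol are entirely hypothetical; the paper does not describe the control interface.

```python
# A hypothetical host-side controller for the 3x3 vibration array.
import time
import serial   # pyserial

class TactileGlove:
    def __init__(self, port="/dev/ttyUSB0", baud=38400):
        self.link = serial.Serial(port, baud, timeout=0.1)

    def vibrate(self, cells):
        """Activate a subset of the 9 palm cells, given as (row, col) pairs."""
        mask = 0
        for r, c in cells:
            mask |= 1 << (r * 3 + c)                 # bit i = cell i of the 3x3 array
        self.link.write(bytes([mask & 0xFF, (mask >> 8) & 0x01]))

    def stop(self):
        self.link.write(bytes([0x00, 0x00]))

# Example: pulse the centre of the palm when the virtual object "lands"
# after a bouncing motion (cf. Figure 10).
# glove = TactileGlove()
# glove.vibrate([(1, 1)])
# time.sleep(0.15)
# glove.stop()
```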

Figure 9: Prototype of the tactile interface.

4 Conclusion

In this paper, we presented one-handed augmentation and interaction on mobile devices. Using our palm pose estimation method, which relies on natural features of the hand, the palm pose was accurately estimated and the virtual object was correctly augmented on the user's palm. In our application, the user could naturally interact with the virtual objects, whose reactions were driven by detecting the motions of the hand and fingers with our fingertip tracking method. Moreover, the user's perception of the virtual object was enhanced by wearing the tactile interface. While our current version achieved only a few frames per second on a mobile phone due to its limited computing power, our demonstrations on the UMPC showed promising results for mobile phone applications. We are currently working to enhance our interaction techniques for more realistic sensations from virtual objects augmented on mobile devices.

Acknowledgements

This work was supported by the IT R&D program of MKE/IITA [2008-F-042-01, Development of Vision/Image Guided System for Tele-Surgical Robot].

References

Borgefors, G. 1986. Distance transformations in digital images. Computer Vision, Graphics, and Image Processing 34, 344–371.

Erol, A., Bebis, G., Nicolescu, M., Boyle, R. D., and Twombly, X. 2007. Vision-based hand pose estimation: A review. Computer Vision and Image Understanding 108, 52–73.

Henrysson, A., Ollila, M., and Billinghurst, M. 2005. Mobile phone based AR scene assembly. In Proceedings of the International Conference on Mobile and Ubiquitous Multimedia 2005, 95–102.

Herbst, I., Braun, A.-K., McCall, R., and Broll, W. 2008. TimeWarp: Interactive time travel with a mobile mixed reality game. In Proceedings of the International Conference on Human Computer Interaction with Mobile Devices and Services 2008, 235–244.

Honkamaa, P., Siltanen, S., Jappinen, J., Woodward, C., and Korkalo, O. 2007. Interactive outdoor mobile augmentation using markerless tracking and GPS. In Proceedings of the Virtual Reality International Conference 2007, 285–288.

Immersion Corp. 2007. CyberTouch and CyberGlove.

Jones, M. J., and Rehg, J. M. 2002. Statistical color models with application to skin detection. International Journal of Computer Vision 46, 81–96.

Lee, G. G., Billinghurst, M., and Kim, G. J. 2004. Occlusion based interaction methods for tangible augmented reality environments. In Proceedings of the ACM SIGGRAPH International Conference on Virtual Reality Continuum and Its Applications in Industry 2004, 419–426.

Minamizawa, K., Fukamachi, S., Kajimoto, H., Kawakami, N., and Tachi, S. 2007. Gravity Grabber: Wearable haptic display to present virtual mass sensation. In Proceedings of ACM SIGGRAPH Emerging Technologies 2007.

Minamizawa, K., Kamuro, S., Fukamachi, S., Kawakami, N., and Tachi, S. 2008. GhostGlove: Haptic existence of the virtual world. In Proceedings of ACM SIGGRAPH 2008.

Narzt, W., Kolb, D., Muller, R., and Hortner, H. 2003. Pervasive information acquisition for mobile AR-navigation systems. In Proceedings of the IEEE Workshop on Mobile Computing Systems & Applications 2003, 13–20.

Olwal, A. 2006. LightSense: Enabling spatially aware handheld interaction devices. In Proceedings of the IEEE and ACM International Symposium on Mixed and Augmented Reality 2006, 119–122.

Paelke, V., Reimann, C., and Stichling, D. 2004. Foot-based mobile interaction with games. In Proceedings of the ACM SIGCHI International Conference on Advances in Computer Entertainment Technology 2004, 321–324.

Rohs, M. 2005. Visual code widgets for marker-based interaction. In Proceedings of the International Workshop on Smart Appliances and Wearable Computing 2005, 506–518.

Rohs, M. 2008. Camera-based interaction and interaction with public displays. ACM MobileHCI 2008 Tutorial Notes.

Schall, G., Grabner, H., Grabner, M., Wohlhart, P., Schmalstieg, D., and Bischof, H. 2008. 3D tracking in unknown environments using on-line keypoint learning for mobile augmented reality. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2008, 285–288.

Wagner, D., Pintaric, T., Ledermann, F., and Schmalstieg, D. 2005. Towards massively multi-user augmented reality on handheld devices. Lecture Notes in Computer Science 3468, 208–219.

Wagner, D., Billinghurst, M., and Schmalstieg, D. 2006. How real should virtual characters be? In Proceedings of the ACM SIGCHI International Conference on Advances in Computer Entertainment Technology 2006.

Wilson, A. D. 2007. Depth-sensing video cameras for 3D tangible tabletop interaction. In Proceedings of the IEEE International Workshop on Horizontal Interactive Human-Computer Systems 2007, 201–204.

Figure 10: Interactions with a virtual object using the tactile interface when the hand is bouncing (the procedure runs from (a) to (d)).

Figure 11: Interactions with a virtual object using the tactile interface when (a-c) closing and (d-e) opening the hand (right: front view, left: upper view).

Figure 12: Interactions with a virtual object using the tactile interface when the hand is tilting (the procedure runs from (a) to (c)).
