Keypoints Extraction for Markerless Tracking in Augmented Reality Applications: A Case Study in Dar As-saraya Museum Jafar W. Al-Badarneh, Abdalkareem R. Al-Hawary, Abdulmalik M. Morghem, Mostafa Z. Ali, Rami S. Al-Gharaibeh 

Abstract— Archeological heritage is at the heart of each country’s national glory. Moreover, it can develop into a source of national income. Heritage management requires socially responsible marketing that achieves high visitor satisfaction while maintaining a high level of site conservation. We have developed an Augmented Reality (AR) experience for heritage and cultural preservation at the Dar As-saraya museum in Jordan. Our application relies on a markerless tracking approach. This approach uses a keypoints-extraction technique in which features of the environment are identified and defined in the system as keypoints. A set of these keypoints forms a tracker for an augmented object to be displayed and overlaid on a real scene at the Dar As-saraya museum. We tested and compared several techniques for markerless tracking and then applied the best technique to complete a mosaic artifact with AR content. The successful results of our application open the door to applications in open archeological sites, where markerless tracking is most needed.

Keywords— Augmented Reality, Cultural Heritage, Keypoints Extraction, Virtual Recreation.

I. INTRODUCTION

For the past few decades, people’s daily routines have become more technology dependent. This is largely caused by the fast diffusion of mobile communication devices, the internet, and multimedia [1]. Technologies such as mobile travel guide systems appear both useful and hedonic, and research results show positive attitudes towards their adoption [2, 3]. Information structuring plays a major role in shaping users' cognition, as described by Norman’s theory of action [4]. Augmented reality (AR) is an emerging technology with immense power to structure information into effective representations. AR applications are spreading into all sorts of areas, including heritage management [5]. The factors influencing users' acceptance of

This work was supported in part by the ENPI CBCMED (International Augmented Med project) under Grant I-A/1.2/113. J. W. Al-Badarneh is with the Computer Information Systems Department, Jordan University of Science & Technology, Irbid, Jordan 22110 (e-mail: [email protected]). A. R. Al-Hawary and A. M. Morghem are with the Computer Information Systems Department, Jordan University of Science & Technology, Irbid, Jordan 22110 (e-mail: {morghem2005,ahawary92}@hotmail.com). M. Z. Ali and R. S. Al-Gharaibeh are with the Computer Information Systems Department, Jordan University of Science & Technology, Irbid, Jordan 22110 (e-mail: {mzali, rami}@just.edu.jo).

AR applications in cultural heritage management are explainable by the Technology Acceptance Model (TAM). An individual's tendency to accept and use a technology is primarily influenced by the perceived usefulness and enjoyment of that technology [6]. Some studies confirm that using AR to insert digital objects on top of a real-world scene enhances the user's perception [7]. Such research results are good news for a country seeking to enhance its heritage management. Due to its geographical location and prosperous history, Jordan is very rich in archaeological sites. Around 2,000,000 tourists visit Jordan every year [20]. Unfortunately, this number is well below an achievable target. One possible explanation of this shortfall is the failure of current heritage management practices to create a rich perception of the archeological sites in the cognition of tourists. Site interpretation is an essential component of heritage management. This component is achieved through a set of activities aimed at raising public awareness and understanding of heritage sites and at enriching visitors' experience. The current failure to achieve these objectives is due to the use of traditional interpretation tools: informational panels, signage, brochures, and guided tours. Fortunately, information technology (IT) has been investigated for the purpose of improving interpretation, and the applications have shown promising results. Our research is a continuation of IT applications in site interpretation. We propose an application of AR technology that allows for a digitally enhanced view of the real world. Our markerless tracking AR technique enriches the experience of visitors to cultural environments with layers of multimedia-based information. The application tracks the artifacts at Dar As-saraya as the visitor tours the museum and displays the augmented content when triggered by keypoints identification.
The augmented content is multimedia based and can address all sorts of related information: the original look of an artifact, its uses, how it was made, how it was traded, and so on.

II. LITERATURE REVIEW

The applications of AR have been receiving significant attention in all types of fields, such as medicine [8] and industrial maintenance [9]. National cultural institutions have been utilizing AR to increase public awareness and enlarge

their current customer segment as well as to reach out to new segments [10]. AR applications have also been trialed at heritage sites across the world, whose visitors go through a non-traditional site experience with the aid of AR. For example, the famous site of Olympia in Greece allows its visitors to navigate the site in a 3D model with a multimedia-based guide and augmented reconstruction [13]. Also, the Pimatgol historical site in Seoul devised a new way of user interaction with the site called Window Wiping [14]. The Serbian National Museum uses QR codes around its halls to trigger AR content loaded on visitors’ smartphones [15]. Finally, digital reconstruction using AR has been demonstrated at the Seewon shopping district and the Chengychon cultural sites in Seoul [16]. Interestingly, these early applications of AR have motivated the smartphone industry to improve its products. Over one billion smartphones were in use worldwide in 2012 [17]. Recent smartphone generations enjoy high computational power with multi-core CPUs and multi-megapixel cameras. CPU-independent graphics capabilities have also developed, with current GPUs enabling massive per-pixel processing [11]. Moreover, these smartphones are equipped with sensors that bring huge advantages. For example, GPS technology allows for location identification [12].

III. PROBLEM

Applying AR as a smartphone application utilizes a technique called “see through”. This technique uses the rear-facing camera of a smartphone to track a defined environment and trigger the overlay of an augmented object on top of the scene. Current applications of this technique rely on marker-based trackers such as QR codes. Unfortunately, marker-based tracking requires the placement of markers in the targeted environment, which burdens the process. Worse, losing a marker would defeat the purpose of the application. Finally, tracking a marker lacks satisfying accuracy.
This paper proposes a new protocol for markerless environment tracking. Our solution utilizes the high computational power of smartphones. We identify a set of keypoints in the targeted environment and define them as a tracker in our system. The software loaded on the smartphone analyzes the frames coming from the camera. When a defined tracker is recognized, the augmented content is triggered and fed into the “see through” technique. Our protocol requires substantial processing time in its analysis of the camera frames, which makes its performance highly dependent on the specifications of the smartphone.

Fig. 1 The proposed protocol
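The recognize-then-trigger loop of the protocol can be sketched as below. This is a minimal illustration, not our actual implementation: `extract` and `match` are hypothetical placeholders for the SIFT detector and descriptor stages, and the three-match threshold follows the recognition requirement stated later in the paper.

```python
MIN_MATCHES = 3  # a scene is recognized when >= 3 features match the tracker

def process_frame(frame, tracker_keypoints, extract, match):
    """Analyze one camera frame; return True when the stored tracker
    is recognized and the AR overlay should be triggered."""
    frame_keypoints = extract(frame)                      # detector mode
    matches = match(frame_keypoints, tracker_keypoints)   # descriptor mode
    return len(matches) >= MIN_MATCHES
```

In the real application, `extract` runs the SIFT detector on the grayscale frame and `match` compares descriptors against the stored tracker; a positive result feeds the overlay into the see-through pipeline.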

IV. PROBLEM FORMULATION

Our proposed protocol comprises two major components. The first component converts the RGB-colored image or frame of an artifact into a gray image, relying on the Luma grayscale approach for faster processing. The second component creates keypoints using the Scale-Invariant Feature Transform (SIFT) technique. Fig. 1 illustrates our protocol.

A. Grayscale Conversion
Brightness (illumination) is one of the main factors in detecting a digital image. Mapping the RGB values of an image to perceived-brightness values enhances the accuracy of image detection. This first component of our protocol converts frames from the RGB color system into the grayscale color system. This conversion ensures simplicity and robustness when matching image trackers.
Luma Grayscale Conversion: converting an RGB image to a grayscale image requires collapsing the RGB values of each pixel into a single value reflecting that pixel's brightness. The technique used in this process is the Luma grayscale conversion formula, which weighs each color according to how strongly the human eye perceives it (the standard Rec. 601 weights):

Gray = 0.299 × R + 0.587 × G + 0.114 × B

The pseudo code in Fig. 2 shows the conversion procedure.

Fig. 2 Luma Grayscale conversion pseudo code
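As a concrete sketch of the procedure in Fig. 2, assuming a NumPy H×W×3 RGB array and the common ITU-R BT.601 luma weights (the figure's exact coefficients may differ):

```python
import numpy as np

# BT.601 luma weights: green dominates, matching human brightness perception
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def luma_gray(rgb):
    """Collapse each RGB pixel into one brightness value."""
    return np.asarray(rgb, dtype=float) @ LUMA_WEIGHTS
```

For a pure-white pixel (255, 255, 255) the weights sum to 1, so the output stays 255; a pure-green pixel maps to a brighter gray value than a pure-red one.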

Decomposition Grayscale Conversion: to decompose an image, we force each pixel to the maximum of its red, green, and blue values. This is carried out on a per-pixel basis. For example, for a pixel with RGB (255, 0, 0), the decomposition result will be 255. Decomposition is designed to recognize a color value while ignoring its channel, as seen in Fig. 6.

Gray = max(R, G, B)
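A minimal sketch of the decomposition step, assuming a NumPy H×W×3 RGB array (the function name is ours, for illustration):

```python
import numpy as np

def decomposition_gray(rgb):
    """Decomposition grayscale: keep, per pixel, the maximum of the
    red, green, and blue channel values and ignore which channel won."""
    return np.asarray(rgb).max(axis=-1)
```

For the pure-red pixel (255, 0, 0) from the text, the result is 255.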

Fig. 3 Mosaic artifact at museum

Fig. 4 Grayscale conversion using Luma Grayscale formula

Fig. 6 Grayscale conversion using Decomposition

The Luma grayscale formula treats the RGB values unequally, following the nature of the human eye: the eye perceives green more strongly than red and blue. Fig. 3 shows the artifact selected to be processed through our proposed protocol. For more realistic results we use the Luma grayscale technique to handle the processing of the images, with the result shown in Fig. 4.
Desaturation (brightness) Grayscale Conversion: the color of a pixel is represented in different models, RGB being the most common. The HSL model is another, where HSL stands for hue, saturation, and lightness. Hue is the name of the color, such as blue, yellow, or green. Saturation characterizes the vividness of a color. Finally, the lightness value ranges from white, indicating full lightness, to black, indicating zero lightness. Desaturating an image refers to converting it from the RGB color model to the HSL color model. This process converts an RGB image's values to their least-saturated variants, as shown in Fig. 5.

Gray = (max(R, G, B) + min(R, G, B)) / 2

Fig. 5 Grayscale conversion using Desaturation
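Under the same assumptions (NumPy array, illustrative function name), the desaturation conversion reduces each pixel to its HSL lightness, the midpoint of its strongest and weakest channels:

```python
import numpy as np

def desaturation_gray(rgb):
    """Desaturation grayscale: the HSL lightness of each pixel,
    i.e. the mean of its maximum and minimum channel values."""
    rgb = np.asarray(rgb, dtype=float)
    return (rgb.max(axis=-1) + rgb.min(axis=-1)) / 2.0
```

A saturated red pixel (255, 0, 0) maps to 127.5, while any neutral gray pixel is left unchanged.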

B. Keypoints Markerless Tracking Using SIFT
The second component operates in two modes: (1) detector mode and (2) descriptor mode. In detector mode the protocol executes modules within the SIFT algorithm. These modules transform gray image data into coordinates relative to local features that are invariant to image scaling, rotation, changes in illumination, and 3D camera viewpoint [21]. The result is a large number of features covering the image's scales and locations. These identified features constitute the keypoints, and a collection of keypoints in a scene forms a markerless tracker. In descriptor mode the protocol executes the tracking modules within SIFT. Detecting a scene requires that three or more features be correctly matched between the scene and the content of the defined tracker. The accuracy of scene recognition depends on the quantity of keypoints defined in the tracker. The recognition and matching process between the features in a live scene and the features in a stored tracker starts by grouping keypoints into descriptors (regions); matching is then performed on an individual basis within the compared descriptors [18]. These heavy computations are made less noticeable by the power of modern smartphones. The whole SIFT algorithm is performed as follows:
1. Find scale-space extrema.
   - A Gaussian kernel is used to create the scale space.
   - A Difference-of-Gaussians (DoG) function is computed.
   - DoG extrema are located by comparing each point with all of its neighbors, including across scale, and identifying minima and maxima.
2. Filter and localize keypoints.
   - Fit a model to find the sub-pixel location (x, y) and scale.
   - Filter out edge and low-contrast responses using the scale-space value at the interpolated sub-pixel location.
3. Assign keypoint orientations.
   - Compute the gradient for each image.
   - Create an orientation histogram, weighting each point with a Gaussian window.
   - Create a keypoint for every histogram peak.
4. Create keypoint descriptors (from local image gradients).
   - Find the image of the closest scale.
   - Sample the points around the keypoint.
   - Rotate the gradients and coordinates by the orientation computed in step 3.
   - Separate the region into sub-regions.
   - Create a histogram for each sub-region.
The former two steps belong to the detector mode and the latter two to the descriptor mode. Following is the pseudo code that illustrates the process:
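As an illustration of detector-mode steps 1 and 2, the sketch below builds a small Gaussian scale space, takes differences of adjacent levels, and keeps points that are extrema over their 3×3×3 neighborhood. It is a NumPy toy (the sigma values and contrast threshold are our simplifications, and sub-pixel refinement and edge-response filtering are skipped), not the full SIFT implementation used in our experiments:

```python
import numpy as np

def gaussian_kernel(sigma):
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur: filter rows, then columns."""
    k = gaussian_kernel(sigma)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, tmp)

def dog_extrema(gray, sigmas=(1.0, 1.6, 2.56, 4.1), thresh=0.03):
    """Return (x, y, scale_index) of Difference-of-Gaussians extrema:
    points that are the min or max of their 3x3x3 neighborhood
    (8 spatial neighbors plus the scales above and below) and that
    pass a simple contrast threshold."""
    stack = np.stack([blur(gray, s) for s in sigmas])
    dog = stack[1:] - stack[:-1]          # difference of adjacent scales
    keypoints = []
    for s in range(1, dog.shape[0] - 1):
        for y in range(1, dog.shape[1] - 1):
            for x in range(1, dog.shape[2] - 1):
                v = dog[s, y, x]
                patch = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                if abs(v) > thresh and (v == patch.max() or v == patch.min()):
                    keypoints.append((x, y, s))
    return keypoints
```

A bright blob yields a DoG extremum at its center; the real pipeline would then localize these candidates to sub-pixel accuracy, reject edge responses, and proceed to orientation assignment and descriptor creation (steps 3 and 4).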

Fig. 8 Mosaic tracker and AR content

Fig. 9 Luma Grayscale image with full view

Fig. 7 SIFT pseudo code

Fig. 10 Luma Grayscale image with partial right view

V. EXPERIMENTAL RESULTS AND ANALYSIS

We experimented with the three aforementioned techniques for gray image conversion. We used a mosaic scene and applied each of the three techniques to produce three gray scene versions. We then applied the SIFT algorithm to each gray scene version. As AR content, we used a model of a green arrow. A successful recognition of the targeted scene would trigger the display of the arrow on top of the tracked scene. Fig. 8 shows the tracker and the AR content.

A. Experiment 1: Luma Grayscale Conversion
Our first tracking attempt was based on the Luma grayscale conversion. We tracked the live mosaic scene in three views: full, partial right, and partial left. In all three views the detection was complete and the AR content was successfully triggered, as shown in Figs. 9, 10, and 11.

Fig. 11 Luma Grayscale image with partial left view

B. Experiment 2: Desaturation Grayscale Conversion
We repeated the experiment using the desaturation grayscale conversion. The results are shown in Figs. 12, 13, and 14. As noted, the partial right view failed to trigger the AR content.

Fig. 12 Desaturation grayscale image with full view

Fig. 16 Decomposition grayscale image with partial right view

Fig. 13 Desaturation grayscale image with partial right view

Fig. 14 Desaturation grayscale image with partial left view

C. Experiment 3: Decomposition Grayscale Conversion Finally, we conducted the experiment using decomposition grayscale conversion. Fig. 15, 16, and 17 show the results.

Fig. 15 Decomposition grayscale image with full view

Fig. 17 Decomposition grayscale image with partial left view

In evaluating the quality of our AR experiment we identified three criteria: (1) successful AR content triggering, (2) constant AR content placement, and (3) stable AR content display. Experiment 1 resulted in successful triggering in all three views; moreover, the AR content maintained a constant position with a stable display. Experiment 2 resulted in successful triggering in only two views and failed in the third; the AR content maintained its relative position, yet it was unstable. Experiment 3 resulted in successful triggering similar to experiment 1 but performed like experiment 2 on the other two criteria. These results put the Luma grayscale conversion at the top, with decomposition second and desaturation last. This ranking is based on the assumption that successful triggering is the most important criterion. We also compared the three conversion techniques in terms of their success in supporting SIFT to create the highest number of keypoints. The results in Table I illustrate that the Luma grayscale conversion outperformed the other two conversion techniques. This result explains the stability of the AR content in experiment 1 relative to experiments 2 and 3: although a small number of keypoints is sufficient to trigger the AR content, the stability of the display improves as the number increases.

TABLE I
THE EFFECT OF CONVERSION TECHNIQUE ON SIFT PERFORMANCE

Conversion technique    Output from SIFT detector mode
Luma                    4512 keypoints
Desaturation            2688 keypoints
Decomposition           3082 keypoints

Fig. 18 Mosaic jigsaw puzzle when triggered

Our experimentation platform consisted of an Android smartphone running OS 4.1.2 (Jelly Bean) and a Samsung laptop running Windows 7 with a 2.40 GHz Core i5 processor, 8 GB of RAM, and an Nvidia graphics card with 1 GB of video memory.

VI. DISCUSSION AND CONCLUSIONS

As a practical application of our findings, we attempted to use the AR content to complete the missing parts of our mosaic artifact. We developed the AR content based on the shape of the missing piece and with the help of archeological experts. To add excitement to the visitor's experience when using this technology, we designed the AR content as pieces of a jigsaw puzzle: when triggered, the pieces start flying and assemble properly in their final location. Snapshots of this process are shown in Figs. 18 and 19. The Luma grayscale conversion proved effective in stabilizing the position and display of the missing part of the mosaic. Despite its small scale, this successful application of our AR markerless tracking in the Dar As-saraya museum opens the door to larger future applications.

Fig. 19 Mosaic completed with missing part

The importance of markerless tracking becomes evident in open archeological sites, where marker-based tracking is inapplicable. For example, the archeological site of Jerash is among the most complete Roman cities in the world. Applications of AR with markerless tracking at this open location would offer a wonderful experience to tourists. If provided on see-through glasses instead of a smartphone, it would bring even more advantages, such as convenience and audio effects. AR applications with markerless tracking will transform a visit to an archeological site from a process of looking at history into a process of living it.

ACKNOWLEDGMENT

We give our thanks to the Department of Antiquities in Jordan for facilitating our access to the Dar As-saraya Museum in Irbid. We also express our gratitude to the authors of the open-source software related to this research; without their products, we could not have comprehended the algorithms on which each step of our reconstruction process is based, including VisualSFM, SURE, CloudCompare, and MeshLab.

REFERENCES

[1] V. Vlahakis, N. Ioannidis, J. Karigiannis, and M. Tsotros, "Archeoguide: An Augmented Reality Guide for Archaeological Sites," IEEE Computer Graphics and Applications, vol. 22, no. 5, pp. 52–60, Sep./Oct. 2002.
[2] R. Peres, A. Correia, and M. Moital, "The indicators of intention to adopt mobile electronic tourist guides," Journal of Hospitality and Tourism Technology, vol. 2, no. 2, pp. 120–138, 2011.
[3] C.-Y. Tsai, "An analysis of usage intentions for mobile travel guide systems," Journal of Business Management, vol. 4, no. 13, pp. 2962–2970, 2011.
[4] D. Norman, The Design of Everyday Things. New York: Doubleday, 1990.
[5] D. Vanoni, M. Seracini, and F. Kuester, "ARtifact: Tablet-Based Augmented Reality for Interactive Analysis," in Proc. 2012 IEEE International Symposium on Multimedia (ISM), Irvine, CA, pp. 44–49.
[6] A. Angelopoulou, D. Economou, V. Bouki, A. Psarrou, L. Jin, C. Pritchard, and F. Kolyda, "Mobile Augmented Reality for Cultural Heritage," Mobile Wireless Middleware, Operating Systems, and Applications, vol. 93, pp. 15–22, 2012.
[7] C. Liu, S. Huot, J. Diehl, W. Mackay, and M. Beaudouin-Lafon, "Evaluating the Benefits of Real-time Feedback in Mobile Augmented Reality with Hand-held Devices," in Proc. CHI '12: 30th International Conference on Human Factors in Computing Systems, ACM, 2012.
[8] W. Birkfellner, M. Figl, K. Huber, and F. Watzinger, "A head-mounted operating binocular for augmented reality visualization in medicine: design and initial evaluation," IEEE Transactions on Medical Imaging, vol. 21, no. 8, pp. 991–997, 2002.
[9] J. Platonov, H. Heibel, P. Meier, and B. Grollmann, "A Mobile Markerless AR System for Maintenance and Repair," in Proc. IEEE/ACM International Symposium on Mixed and Augmented Reality (ISMAR 2006), Oct. 2006, pp. 105–108.
[10] D. Boyer and J. Marcus, "Implementing mobile augmented reality applications for cultural institutions," in J. Trant and D. Bearman, eds., Museums and the Web 2011: Proceedings (MW2011), Toronto, Canada, 2011.
[11] D. Wagner, G. Reitmayr, A. Mulloni, T. Drummond, and D. Schmalstieg, "Real-Time Detection and Tracking for Augmented Reality on Mobile Phones," IEEE Transactions on Visualization and Computer Graphics, vol. 16, no. 3, pp. 355–368, 2010.
[12] K. Kirchbach, "Augmented Reality on Construction Sites using a Smartphone-Application," in Proc. 17th International Conference on Information Visualization (IV), London, Jul. 2013, pp. 398–403.
[13] V. Vlahakis, N. Ioannidis, J. Karigiannis, and M. Tsotros, "Archeoguide: An Augmented Reality Guide for Archaeological Sites," IEEE Computer Graphics and Applications, vol. 22, no. 5, pp. 52–60, Sep./Oct. 2002.
[14] J. Kang and J. Ryu, "Digital Reconstruction of a Historical and Cultural Site for Smart Phones," in Proc. IEEE International Symposium on Mixed and Augmented Reality - Arts, Media, and Humanities (ISMAR-AMH), Seoul, Oct. 2010, pp. 67–68.
[15] V. Jevremovic and S. Petrovski, "MUZZEUM - Augmented Reality and QR Codes Enabled Mobile Platform with Digital Library, used to Guerrilla Open the National Museum of Serbia," in Proc. 18th International Conference on Virtual Systems and Multimedia (VSMM), Milan, 2012, pp. 561–564.
[16] J. Kang, "AR Teleport: Digital Reconstruction of Historical and Cultural-Heritage Sites Using Mobile Augmented Reality," in Proc. IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Liverpool, 2012, pp. 1666–1675.
[17] D. Reisinger, "Worldwide smart phone user base hits 1 billion," CNET, CBS Interactive, Inc. Retrieved 26 July 2013.
[18] D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[19] A. K. Patel and S. Kumar, "Gray Color Conversion Using Classification Method," IRNet Transactions on Electrical and Electronics Engineering, India, 2012.
[20] Tourism statistical newsletter 2013, http://www.tourism.jo/en/Default.aspx?tabid=132.
[21] D. G. Lowe, "Object recognition from local scale-invariant features," in Proc. International Conference on Computer Vision, vol. 2, 1999, pp. 1150–1157.

BIOGRAPHY

Jafar W. Al-Badarneh was born in Irbid, Jordan, on 11/08/1992. He holds a bachelor's degree in computer information systems from Jordan University of Science and Technology, Irbid, Jordan (2014). He worked as a mobile augmented reality developer for the International Augmented Med project at Jordan University of Science and Technology (2013-2014). Currently, he works as a research assistant in the Computer Information Systems Department at Jordan University of Science and Technology. He co-authored the publication "Cultural Algorithms Applied to the Evolution of Robotic Soccer Team Tactics: A Novel Perspective," IEEE Congress on Evolutionary Computation (IEEE CEC 2014), Beijing, China. His research interests are multimedia, robotics, and optimization. Mr. Al-Badarneh was awarded the best oral presentation award at the Sixth Annual Undergraduate Research Conference on Applied Computing, Zayed University, April 30–May 1, 2014, Dubai, UAE.
