Gesture Interaction System for Social Web Applications on Smart TVs

Tiago Dias, Marcos Variz, Pedro Jorge, Rui Jesus
Multimedia and Machine Learning Group - ADEETC
Instituto Superior de Engenharia de Lisboa - ISEL
Rua Conselheiro Emídio Navarro, no 1, 1959-007 Lisboa, Portugal
[email protected], [email protected], [email protected], [email protected]

ABSTRACT
Currently, the Web is a powerful channel for storing and sharing personal information, particularly through social networks such as Facebook, Twitter or Flickr. These applications are used as privileged tools for communication and information availability. With the proliferation of new devices and of new types of user interfaces that can be used by social networking applications, suitable user interaction methods are becoming even more relevant. In this paper we propose a natural, gesture-based user interface to interact with a Flickr client application on a Smart TV. The interface is based on a depth sensor and on an image processing method for gesture identification. It can be used to search and browse pictures using only gestures.
Keywords
Social web, web searching, user interaction, natural interfaces, gesture recognition
1. INTRODUCTION
In recent years, the way people consume information has changed drastically. In the 1990s the Web became a privileged source of information through search engines like Yahoo or Google. Subsequently, social networks like Facebook, Twitter and Flickr created and shared another type of information, social information, with great success among users and companies. Making personal information available and sharing it on the Web became commonplace. The exponential growth in accesses to these platforms shows the influence that social networks have on users' habits when accessing the Web. The revolution in the mobile device market, first with the rise of smartphones and later of tablets, favoured access to web content and social networks. These devices also brought new interfaces such as multi-touch screens or speech recognition tools. Currently, we also observe a revolution in the digital TV market with the launch of Smart TVs equipped with web browsers, social applications, etc.
Significant changes have also been occurring in the field of natural interfaces, which were crucial for the emergence of some of the devices mentioned above. The video game industry has been one of the boosters of technologies like motion sensing, gesture recognition and speech recognition [1]. In the particular case of the Microsoft Kinect depth sensor, it became possible for users to interact with video games using only body movement. Research on natural interfaces also aims to improve the performance of doctors, designers and other professionals in their daily tasks [3]. The goal of the work presented in this paper is to introduce a gestural user interface for photo searching and browsing on a Flickr application for a TV platform, providing a more natural and effective user experience. The resulting system uses the Microsoft Kinect in combination with the OpenNI framework to capture motion information. Motion processing is performed by a gesture classifier based on Hidden Markov Models (HMM). We also adopted a digital TV provider set-top-box with a pre-installed Flickr application.
2. SYSTEM OVERVIEW
The system proposed in this paper has two main modules: the gesture recognition module and the gestural user interface module, as shown in Figure 1. The first acquires and classifies the motion information, while the second is responsible for communicating with the set-top-box and for the user interface. For motion information acquisition we adopted the Microsoft Kinect. In order to remain independent of the acquisition device manufacturer, the OpenNI framework was used. We also used the NITE middleware, an API that sits on top of the previous framework and provides specialized algorithms for developing natural interfaces with hand-based or full-body control. Figure 1 illustrates the modular structure of the system and the functionality of each module.
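As a rough illustration of this modular split, the Python sketch below wires a stream of hand positions into the two modules described in Sections 3 and 4. The hand_positions generator and the feed/handle methods are illustrative stand-ins for the OpenNI/NITE hand tracking layer and for the two modules, not the authors' actual code.

def hand_positions():
    # Stand-in for the OpenNI/NITE hand tracker: yields (user_id, x, y)
    # hand coordinates derived from the Kinect depth stream.
    yield from []  # to be provided by the acquisition layer

def run(recognizer, ui):
    # Main loop: the gesture recognition module turns hand motion into
    # gestures; the gestural user interface module turns gestures into
    # set-top-box commands.
    for user_id, x, y in hand_positions():
        gesture = recognizer.feed(user_id, x, y)  # None until a gesture completes
        if gesture is not None:
            ui.handle(user_id, gesture)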
3. GESTURE RECOGNITION MODULE
The gesture recognition module is responsible for gesture acquisition, segmentation and classification. The information acquired is based on the hand movement, more precisely on the angle between the vectors of two consecutive hand coordinates. Gesture segmentation is necessary to distinguish intentional movements from unintentional ones. Therefore, a group of features must satisfy a set of requirements in order to be accepted as an intentional gesture.
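One common reading of this feature, used for example in trajectory-based recognizers such as [2], is to quantize the orientation of the displacement vector between consecutive hand positions into a fixed set of discrete symbols. The Python sketch below follows that reading; the function name and the 14-symbol quantization (chosen to match the parameters reported in Section 5) are our assumptions rather than the authors' exact implementation.

import math

def angle_symbols(hand_points, num_symbols=14):
    # Turn a sequence of 2-D hand positions into discrete orientation
    # symbols for the HMM.  The quantization into num_symbols codewords
    # is an assumption, matching the 14 observation symbols of Section 5.
    symbols = []
    for (x0, y0), (x1, y1) in zip(hand_points, hand_points[1:]):
        theta = math.atan2(y1 - y0, x1 - x0)            # angle of the displacement vector
        theta = (theta + 2 * math.pi) % (2 * math.pi)   # map to [0, 2*pi)
        symbols.append(int(theta / (2 * math.pi) * num_symbols) % num_symbols)
    return symbols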
Figure 2: Gestures: a) Wave, b) Swipe Up, c) Swipe Right, d) Push and e) Pull.
Figure 1: TV gesture interaction system.
The requirements are: minimum and maximum velocity thresholds and a minimum number of motion features. After the gesture has been acquired and segmented, it must be classified according to the set of allowed gestures. Gesture recognition for natural interfaces is an open research topic and there are several implementation approaches [2]. Our choice fell on HMMs, whose application in recognition systems for natural interfaces offers very satisfactory accuracy and performance. Before being applied, the classifier had to be trained; for that purpose, a set of samples of each gesture was obtained from several volunteers.
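The Python sketch below illustrates both steps: the segmentation test over the velocity and length requirements, and a generic discrete-HMM scoring used to classify a segmented gesture against per-gesture models trained from the volunteer samples. The threshold values and the scaled forward-algorithm formulation are our assumptions; the paper does not give the exact thresholds or the trained model parameters.

import numpy as np

# Placeholder thresholds; the paper tunes these values during the
# usability tests described in Section 5.
MIN_VELOCITY, MAX_VELOCITY, MIN_FEATURES = 0.1, 2.0, 8

def is_intentional(velocities, num_features):
    # Segmentation test: accept a movement as an intentional gesture only
    # if its speed stays within the thresholds and it is long enough.
    return (num_features >= MIN_FEATURES and
            all(MIN_VELOCITY <= v <= MAX_VELOCITY for v in velocities))

def log_likelihood(symbols, pi, A, B):
    # Scaled forward algorithm for a discrete HMM.
    # pi: initial state probabilities, A: state transition matrix,
    # B: emission matrix (states x observation symbols).
    alpha = pi * B[:, symbols[0]]
    log_p = np.log(alpha.sum())
    alpha /= alpha.sum()
    for s in symbols[1:]:
        alpha = (alpha @ A) * B[:, s]
        log_p += np.log(alpha.sum())
        alpha /= alpha.sum()
    return log_p

def classify(symbols, models):
    # Pick the gesture whose trained HMM explains the observation
    # sequence best.  models maps gesture name -> (pi, A, B).
    return max(models, key=lambda g: log_likelihood(symbols, *models[g]))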
4. GESTURAL USER INTERFACE MODULE
The gestural user interface module is executed after the gesture recognition module. Its main functions are the translation of the classified gestures into set-top-box control instructions, user session management and graphical interface management. Usability was a constant concern, particularly through the analysis of the following principles: ease of learning, use and memorization; safety; and effectiveness in achieving the user's goals.

The set-top-box is controlled through the local network by sending the codes that map to the functions of the conventional remote control. Thus, a code is sent for each control gesture recognized by the system. Browsing the Flickr application with the remote control is performed through a set of directional keys, an OK key and a Back key.

Another relevant aspect in the development of this system is the assignment of control to a user. Although tracking more than one user is possible, the system does not do so. Thus, at any moment, only one user holds control of the system and can perform gestures to be recognized by the application. To gain the session, the user must perform the wave gesture. The session is lost after a timeout or when the user leaves the field of view of the depth sensor.

To browse the Flickr application, the user needs to scroll horizontally or vertically, select the desired options and return to the previous screen. For horizontal and vertical scrolling we chose swipe gestures. Selecting an option is accomplished through the push gesture and, finally, returning to the previous screen is accomplished through the pull gesture. These gestures are illustrated in Figure 2.
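A simplified Python sketch of this module follows: a gesture-to-key-code map, wave-based session acquisition with a timeout, and the sending of the corresponding remote-control code over the local network. The key code names, the UDP transport, the address and the timeout value are all placeholders chosen for illustration; the actual set-top-box protocol is provider specific and is not detailed in the paper.

import socket
import time

# Placeholder key codes and network endpoint: the real codes and the
# set-top-box protocol depend on the TV provider.
STB_ADDR = ("192.168.1.50", 5000)
GESTURE_TO_KEY = {
    "SWIPE_LEFT":  "KEY_LEFT",
    "SWIPE_RIGHT": "KEY_RIGHT",
    "SWIPE_UP":    "KEY_UP",
    "SWIPE_DOWN":  "KEY_DOWN",
    "PUSH":        "KEY_OK",
    "PULL":        "KEY_BACK",
}
SESSION_TIMEOUT = 30.0  # seconds of inactivity before the session is lost

class GestureUI:
    def __init__(self, addr=STB_ADDR):
        self.addr = addr
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.owner = None        # id of the user currently holding the session
        self.last_time = 0.0

    def handle(self, user_id, gesture):
        now = time.time()
        # Session management: the wave gesture claims control; control is
        # dropped after a timeout (losing the user in the sensor's field of
        # view is handled by the tracker and not shown here).
        if self.owner is not None and now - self.last_time > SESSION_TIMEOUT:
            self.owner = None
        if gesture == "WAVE" and self.owner is None:
            self.owner = user_id
        if user_id != self.owner:
            return
        self.last_time = now
        key = GESTURE_TO_KEY.get(gesture)
        if key is not None:
            self.sock.sendto(key.encode(), self.addr)  # remote-control code over the LAN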
5. EXPERIMENTAL RESULTS
In order to verify the accuracy of the system, we tested the gesture classification procedure with a set of natural movements. These tests showed that the HMM should be parameterized with 2 states and 14 observation symbols for best performance (the total error rate was nearly 0.3%). For the usability tests, different types of users were invited and given only a brief explanation of how to use the system. These tests were very useful for adjusting the parameters related to the detection of intentional movements.
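The paper does not detail how the error rate was computed; purely as an illustration, the snippet below (reusing the classify function from the sketch in Section 3) counts misclassifications over a held-out set of gesture sequences, with each per-gesture HMM built with the 2 states and 14 observation symbols reported above.

def error_rate(test_sequences, models):
    # test_sequences: list of (symbols, true_gesture) pairs held out from
    # training; models: per-gesture (pi, A, B) HMMs with 2 states and
    # 14 observation symbols.
    errors = sum(classify(seq, models) != label for seq, label in test_sequences)
    return errors / len(test_sequences)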
6. FUTURE WORK
We intend to migrate to more capable Smart TV platforms, for instance Google TV or Ubuntu TV, in order to avoid the additional processing unit that is currently required to run the system.
7. CONCLUSIONS
This paper presented a gestural interface system for browsing Flickr on digital TV platforms. We described an implemented gesture recognition module whose main functionalities are gesture segmentation, feature extraction and HMM-based classification. We also presented a gestural user interface module, where we focused on usability issues and on the set of gestures chosen to operate the system. Accuracy and usability tests showed that, although this is still a work in progress with improvements yet to come, it is a distinctive and satisfying solution for controlling a Smart TV and its applications, such as Flickr.
8. REFERENCES
[1] Natural user interfaces: Voice, touch and beyond. http://www.microsoft.com/en-us/news/features/2010/jan10/01-06cesnui.aspx, January 2010.
[2] M. Elmezain, A. Al-Hamadi, J. Appenrodt, and B. Michaelis. A hidden Markov model-based continuous gesture recognition system for hand motion trajectory. In 19th International Conference on Pattern Recognition (ICPR 2008), pages 1–4, December 2008.
[3] J. P. Wachs, H. I. Stern, Y. Edan, M. Gillam, J. Handler, C. Feied, and M. Smith. A gesture-based tool for sterile browsing of radiology images. Journal of the American Medical Informatics Association, 15(3):321–323, May 2008.