A Statistical Feature based Decision Tree Approach for Hand Gesture Recognition

Sana Nisar, Akhlaq Ahmed Khan and Muhammad Younus Javed
College of E&ME, National University of Sciences and Technology, Rawalpindi, Pakistan
00-92-51-9278050
[email protected]

ABSTRACT
A great deal of work has been done in the field of sign language recognition all over the world. The main focus of such work is to make the lives of vocally impaired people more comfortable and to bridge the communication gap between them and the rest of society. Deaf and mute people must not only learn a standard sign language; the core issue is that they should be able to communicate with the other members of society. It is also not feasible for everyone to learn sign language in order to understand what is said through gestures, so the communication gap remains even after deaf and mute people have been taught sign language. In this paper, an approach is presented in which statistical features are extracted from hand signs and then fed to a decision tree for the recognition of hand gestures. In this research, a data set of English alphabet gestures has been used, and the recognized hand gestures are then presented in both alphabetical and voice form. This helps impaired people communicate with others in a form that they can understand.

Keywords Statistical Features, Decision Trees, Hand Gestures, Finger Count, Inter Finger Distance.

1. INTRODUCTION

In nonverbal communication, the hand gesture is one of the most important and typical methods. Over roughly the last fifty years, sign languages have come to be accepted as minority languages coexisting with the majority languages. Based on how the features are captured, the basic sign language (SL) recognition techniques are glove based and vision based. Researchers have proposed many different techniques to recognize hand gestures.

In order to recognize the signs, these systems use different features of the signs. Location, angles and velocity of the hand have been used to recognize hand gestures by Yoon et al. [1]. Neural networks have been used to classify hand signs by Yin and Xie [2]. Depth silhouettes have been used to recognize hand gestures by Salinas et al. [3]. A quadratic curve based method has been proposed by Dong, Wu and Hu to recognize gestures [4]. The Boltay Haath project by Aleem Khalid Alvi et al. [5] aims at recognizing Pakistan Sign Language (PSL) gestures using statistical template matching. A combination of the colour cue and template matching techniques has been used by Tanibata and Shimada for their Japanese Sign Language (JSL) recognition system [6]. McGuire et al. developed a sign-to-text/speech translation system for dialog systems in specific public domains [7]. Instead of sending live video in video communication between deaf people, Sign Language Recognition (SLR) can be used to translate the video into notations that are transmitted in textual form and then animated at the other end, saving bandwidth [8]. As proposed by Koizumi et al. [9], SLR can also be supportive in annotating sign videos for linguistic analysis, saving a great deal of manual labour.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. FIT’09, December 16–18, 2009, CIIT, Abbottabad, Pakistan. Copyright 2009 ACM 978-1-60558-642-7/09/12....$10.

Keeping in view all the requirements of sign language recognition, it is evident that a more accurate and robust technique is required for converting sign language gestures into an easy-to-understand representation, so that a vocally impaired person can easily communicate both with people like himself/herself and with others. In this paper, a decision tree based hand gesture recognition system for the recognition of English language alphabets is proposed. The general flowchart of the proposed technique is shown in Figure 1. The rest of the paper is organized as follows: Section 2 covers image pre-processing; Section 3 gives the feature extraction details; Section 4 explains the gesture recognition approach; Section 5 provides results and analysis; and Section 6 discusses the conclusion and future work.
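The overall flow of the technique (colour-space thresholding, OR-combination of the two masks, morphological clean-up, statistical feature extraction, tree-based classification) can be sketched in a few lines. This is a minimal pure-Python illustration only: the toy channel values, the thresholds, and the placeholder classification rule are all illustrative assumptions, not the paper's actual parameters or trained tree.

```python
# Minimal sketch of the recognition pipeline; all values are hypothetical.

def threshold(channel, lo, hi):
    """Binarize one colour channel: 1 where lo <= value <= hi."""
    return [[1 if lo <= v <= hi else 0 for v in row] for row in channel]

def logical_or(a, b):
    """Combine the HS and CbCr masks: keep pixels present in either."""
    return [[x | y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def erode(mask):
    """3x3 erosion: a pixel survives only if its whole neighbourhood is set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i][j] = int(all(mask[i + di][j + dj]
                                for di in (-1, 0, 1) for dj in (-1, 0, 1)))
    return out

def dilate(mask):
    """3x3 dilation: a pixel is set if any neighbour is set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            out[i][j] = int(any(mask[i + di][j + dj]
                                for di in (-1, 0, 1) for dj in (-1, 0, 1)
                                if 0 <= i + di < h and 0 <= j + dj < w))
    return out

def extract_features(mask):
    """Statistical features of the segmented hand: area and centroid."""
    pixels = [(i, j) for i, row in enumerate(mask)
              for j, v in enumerate(row) if v]
    cx = sum(j for _, j in pixels) / len(pixels)
    cy = sum(i for i, _ in pixels) / len(pixels)
    return {"area": len(pixels), "centroid": (cx, cy)}

def classify(features):
    """Placeholder decision rule standing in for the trained decision tree."""
    return "A" if features["area"] > 8 else "B"

# Toy 6x6 single-channel images: skin-like values (9) on a background (5).
hs_channel = [[5, 5, 5, 5, 5, 5],
              [5, 9, 9, 9, 9, 5],
              [5, 9, 9, 9, 9, 5],
              [5, 9, 9, 9, 9, 5],
              [5, 9, 9, 9, 9, 5],
              [5, 5, 5, 5, 5, 5]]
cbcr_channel = hs_channel

mask_hs = threshold(hs_channel, 8, 10)       # hypothetical thresholds
mask_cbcr = threshold(cbcr_channel, 8, 10)
clean = dilate(erode(logical_or(mask_hs, mask_cbcr)))
feats = extract_features(clean)
letter = classify(feats)
```

Erosion followed by dilation (morphological opening) removes small noise while the dilation restores the hand to approximately its original size, mirroring the order of operations described in Section 2.1.2.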

Figure 1. Flowchart of the Proposed Technique

2. PRE-PROCESSING
When a certain object in an image is to be analyzed, it is essential that the objects of interest are distinguished from the other objects in the image, collectively called the background. The techniques used for picking out the objects of interest are called segmentation techniques. Pre-processing comprises the segmentation of the image and the extraction of the hand for further processing. Although many techniques have been presented to improve the quality of segmentation results, it is hard to claim that any segmentation technique is universally applicable and works for all images; no segmentation technique is perfect.

2.1 Segmentation
The inputs to the system are colour images, which are used after segmentation. Hands are extracted on the basis of skin colour. Although the segmentation algorithm works robustly, unexpected variation in lighting conditions and backgrounds containing skin-coloured objects can degrade its performance.

2.1.1 Space Transformations
The input image is transformed into two colour spaces, namely HSV and YCbCr. In the HSV space, only two of the three components (i.e. Hue and Saturation) are selected and thresholded; the Value component is discarded to make the result illumination invariant. The same is done for the Cb and Cr components, the illumination component in this case being Y. The threshold values were selected after a number of observations.

A logical OR operation is then applied to the two thresholded outputs. This yields the region of non-black pixels of both the HS and CbCr images, so the resultant image contains the segmented hand along with some noise.

2.1.2 Morphological Operations
Morphological operations are applied to the hand image in order to join any broken hand segments and to remove relatively large noise particles. Erosion is carried out first to remove these noise particles and to smooth the uneven outer boundary of the hand. Dilation, the reverse operation of erosion, is then applied so that the original size of the hand is restored. The effect of the dilation/erosion operations depends on the size of the structuring element. These operations remove noise from the image, and the result rarely contains any large noise regions.

After the morphological operations, any remaining noise is very small compared to the size of the hand, so it can be removed from the segmented image on the basis of size using small-area removal operations. At the end, a clean, noise-free hand image with a black background is extracted.

3. FEATURE EXTRACTION
Once an image is segmented, its features are acquired and analyzed to develop a classification system. The basic features of an image are described in the following text.

3.1 Area
The area of a region R can be found by simply counting the image pixels that make up the region. The area of a connected region without holes can also be approximated from its closed contour, defined by M coordinate points x0, x1, ..., xM-1 with xi = (ui, vi), using the Gaussian area formula for polygons [10]:

A(R) = (1/2) | sum over i = 0..M-1 of ( ui * v(i+1) mod M  -  u(i+1) mod M * vi ) |
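The Gaussian (shoelace) area formula for polygons can be implemented directly; a minimal sketch, with the contour represented as a list of (u, v) coordinate pairs:

```python
def polygon_area(points):
    """Area of a simple closed polygon given its M contour points
    (Gaussian/shoelace formula); the last point wraps back to the first."""
    m = len(points)
    s = 0.0
    for i in range(m):
        u_i, v_i = points[i]
        u_next, v_next = points[(i + 1) % m]  # close the contour
        s += u_i * v_next - u_next * v_i
    return abs(s) / 2.0

# A 4x4 axis-aligned square: exact area 16.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
```

The absolute value makes the result independent of whether the contour is traversed clockwise or counter-clockwise.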

3.2 Centroid
The centroid of a binary region is the arithmetic mean of the pixel coordinates in the x and y directions [10]. The centroid of a sample image is shown in Figure 2.
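The centroid computation just described can be sketched as follows (an illustration, not the authors' code), averaging the x and y coordinates of all foreground pixels:

```python
def centroid(mask):
    """Centroid of a binary region: the arithmetic mean of the x and y
    coordinates of all foreground (nonzero) pixels."""
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                xs.append(x)
                ys.append(y)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# A 2x2 foreground block whose top-left corner is at (1, 1):
region = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
```

For this region the centroid falls at the centre of the block, between the four foreground pixels.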
