Recognition of Isolated Indian Sign Language Gesture in Real Time

Anup Nandy, Jay Shankar Prasad, Soumik Mondal, Pavan Chakraborty, and G.C. Nandi
Robotics & AI Lab, Indian Institute of Information Technology, Allahabad
{iro2008005,jsp,iro2009012,pavan,gcnandi}@iiita.ac.in

Abstract. Indian Sign Language (ISL) consists of static as well as dynamic hand gestures for communication among deaf and dumb persons. Most ISL gestures are produced using both hands. A video database containing several videos for a large number of signs was created and utilized. The direction histogram is used as the feature for classification because of its invariance to illumination and orientation. Two different approaches are utilized for recognition: the Euclidean distance and K-nearest neighbor metrics.

Keywords: Indian Sign Language, Direction Histogram, Gestures, K-nearest neighbor, Euclidean distance metric.

1 Introduction

Sign language enhances the ability of persons challenged in speech and hearing, all over the world, to communicate. It employs complete signs made with the hands, facial expressions, and other parts of the body. Every country's native sign language has its own syntax and grammar. Like British Sign Language (BSL) and American Sign Language (ASL), the language used in India is called Indian Sign Language, henceforth ISL [1, 2]. ISL is still not an official sign language of India, in spite of a deaf and dumb population of about 6 million. Several techniques have been followed for gesture recognition and classification, including vision-based approaches using one or more cameras [2-9]. Extraction of visual information in the form of a feature vector is an important part of the gesture recognition problem [6]. There are difficult subproblems such as tracking of the hand, segmentation of the hand from the background and environment, illumination variation, occlusion, movement, and position [5, 8]. Several techniques have been developed for pattern classification of dynamic gestures in real time [3, 4, 7]. A dynamic gesture is a moving gesture, represented by a sequence of images. Classification in Indian Sign Language is a complicated task because the necessary information about hand motion must be preserved [3]. The prior work on recognition of Indian Sign Language reported in [2] used a limited set of ISL samples. The classification approach in [6] is based on the hand trajectory, hand shape, and hand motion. HMMs can be used as a robust classifier for gesture recognition [7, 8]. This paper demonstrates statistical techniques for real-time recognition of ISL gestures that involve both hands. The next section discusses the experimental setup for capturing static and dynamic ISL gestures.

2 Generation of ISL Video Training Samples

We used a constant background while recording the videos, at a frame rate of 30 fps and at different sizes, because we found that background removal is a computationally complex task and it affects the recognition result in real time [2]. Since the feature extraction technique operates on gray-scale images, the background was chosen to be dark. Every ISL gesture denotes some class or word and is produced by waving both hands in an appropriate manner [7]. We created a repository of training and testing samples with a large number of image sequences for 22 specific ISL classes/words under various illumination conditions, as shown in Fig. 1. The methodology is discussed in the next section.

ISL        Training   Testing        ISL        Training   Testing
Above      13         11             Ascend     12         12
Across     17         15             Hang       12         10
Advance    11         9              Marry      22         12
Afraid     16         16             Moon       12         10
All        19         15             Middle     12         12
Alone      16         14             Prisoner   12         12
Arise      17         12             Beside     12         12
Bag        17         13             Flag       12         12
Below      16         15             Drink      14         12
Bring      18         17             Aboard     14         12
Yes        22         12             Anger      12         12

Fig. 1. ISL dynamic and static gestures under different light illumination conditions. (Training and testing sample counts per class; the original figure also contains Start Frame and End Frame images for the gestures in the left half and a Static Frame image for those in the right half, not reproducible here.)

3 Methodology

The real-time classification of ISL is shown in Fig. 2. The ISL videos were split into image frames, and all image frames were converted into gray-scale images. The gray-scale images are then blurred with a Gaussian filter and normalized. Since dynamic gestures are considered and the gestures are performed under different lighting conditions, the features must be selected carefully so that they are invariant to the position of the hand, to changes in lighting, and to the scene illumination [8]. The feature is, however, rotation dependent.

Fig. 2. Methodology for real time ISL classification. (Pipeline: video capture → conversion of video into gray-scale image frames → feature vector (direction histogram) → classifier (Euclidean distance / K-nearest neighbor).)
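A minimal sketch of this preprocessing stage is shown below, assuming OpenCV and NumPy; the blur kernel size and the function name are illustrative assumptions, not the authors' exact implementation:

```python
import cv2
import numpy as np

def preprocess_video(path, size=(160, 120), blur_ksize=(5, 5)):
    """Split a gesture video into normalized gray-scale frames.

    Pipeline as in Fig. 2: capture -> gray scale -> Gaussian blur -> normalize.
    The 160x120 frame size matches the resizing in Section 4; the Gaussian
    kernel size is an assumed parameter.
    """
    frames = []
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.resize(gray, size)
        gray = cv2.GaussianBlur(gray, blur_ksize, 0)
        # Normalize intensities to [0, 1] to reduce illumination effects.
        frames.append(gray.astype(np.float32) / 255.0)
    cap.release()
    return frames
```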

4 Feature Extraction

Several types of features have been suggested by researchers for real-time gesture recognition. The hand trajectory is utilized as a feature vector for DTW [2]; the hand contour is used in [3]; a combination of color, motion, and hand position is utilized for hand gesture recognition in [4, 7]; and hand skin-color information is used for extracting a histogram in [5]. The histogram of the local direction of edges in an image constitutes an important feature [9, 10]. The direction histogram requires computation of the gradient with a filter, followed by generation of a histogram over the desired number of directions [10]. We found that identical frames at different positions within a gesture produce the same feature vectors. The overall algorithm is as follows [9]:

Step 1: Extract the frames from the video.
Step 2: Convert all frames into gray-scale images.
Step 3: Resize all images to 160 x 120 pixels and normalize the image sequences.
Step 4: Apply 3-tap derivative filter kernels, {0, -1, 1} in the u direction and {0, 1, -1} in the v direction, to the image X, giving derivative estimates $g_u(u, v)$ and $g_v(u, v)$ at each point (u, v). The gradient magnitude and angle are then

$$g(u, v) = \sqrt{g_u(u, v)^2 + g_v(u, v)^2} \qquad (1)$$

$$\Phi(u, v) = \arctan\!\left(\frac{g_v(u, v)}{g_u(u, v)}\right) \qquad (2)$$

where g is the magnitude and Φ is the angle.
Step 5: Quantize the angles obtained in Step 4 into T bins, normalize the values, and store them as the feature vector.

By rearranging the image blocks into columns and converting the radian values to degrees, the direction histogram of an image can be displayed as a plot: Fig. 3 and Fig. 4 show the distribution of values grouped according to their numeric range, each group forming one bin. The training patterns are feature vectors with 18 and 36 elements, consisting of the direction histogram of the ISL motion gestures.

Fig. 3. Direction Histogram with 18 bins

Fig. 4. Direction Histogram with 36 bins
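Steps 3-5 can be sketched as follows. This is an illustrative NumPy implementation under the kernel and bin choices above; weighting each pixel by its gradient magnitude is an assumption, not necessarily the authors' choice.

```python
import numpy as np

def direction_histogram(gray, n_bins=18):
    """Direction histogram feature (Steps 3-5 above).

    gray: 2-D float array, a normalized 160x120 gray-scale frame.
    Returns an n_bins-element normalized histogram of gradient angles.
    """
    # 3-tap derivative filter responses in the u and v directions.
    g_u = np.zeros_like(gray)
    g_v = np.zeros_like(gray)
    g_u[:, :-1] = gray[:, 1:] - gray[:, :-1]   # difference along columns
    g_v[:-1, :] = gray[1:, :] - gray[:-1, :]   # difference along rows

    mag = np.sqrt(g_u ** 2 + g_v ** 2)         # Eq. (1)
    angle = np.arctan2(g_v, g_u)               # Eq. (2), in [-pi, pi]

    # Quantize angles into n_bins bins (T = 18 or 36), weighting each
    # pixel by its gradient magnitude so flat regions contribute little.
    hist, _ = np.histogram(angle, bins=n_bins,
                           range=(-np.pi, np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist  # normalize
```

A gesture video is then represented by the 18- or 36-element histograms of its frames, which are compared against the stored training patterns.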


Direction histograms are used to characterize images. In a 2-D histogram, given an angular displacement Φ, the (x, y) entry of the matrix indicates how many pixel pairs at angular distance Φ have gray levels x and y respectively; the gray-level histogram is 1-D in nature, whereas the image is 2-D. In the 1-D case, pattern matching can be carried out using a vector-distance method. When the feature space is a subset of the real numbers and the distance between two feature values is their absolute difference, the match distance can be characterized as follows:

Theorem 1: Let s and t be two direction histograms over a subset of the real numbers, where a function α specifies the location of each point, with α(i) − α(i−1) > 0. The histograms are required to contain the same number of points, |s| = |t|; the distance between feature values is their absolute difference, and the pattern match distance between s and t is the minimal total pairwise distance. Let S(f), for −∞ < f < ∞, be the number of points in the direction histogram s having values ≤ f, and similarly T(f) for t. In the linear case, points are coupled for matching; thus, for every u, the number of points with values less than u that are paired with points with values greater than u is |S(u) − T(u)|. Hence ∫ |S(u) − T(u)| du is equal to the sum of all pairwise distances in the pattern matching.

Theorem 2: Let D be a set of descriptors with metric u, and let h: D → R be a function from D to the real numbers denoting the direction histogram values, i.e., the frequency of each descriptor d ∈ D, with Σ h(d) = C, where C is a constant. For direction histograms this restriction implies that only images with equal C can be compared. The metric can be the Euclidean distance, the city-block distance, or some other metric.
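Under Theorem 1, the 1-D match distance reduces to comparing cumulative sums. The sketch below is a hypothetical illustration, assuming two histograms over the same equally spaced bin locations with equal total mass C:

```python
import numpy as np

def match_distance(s, t, bin_width=1.0):
    """1-D pattern match distance in the sense of Theorem 1.

    s, t: direction histograms of equal length and equal total mass,
    defined over the same equally spaced bin locations. The distance
    is the integral of |S(u) - T(u)|, where S and T are the cumulative
    counts of s and t.
    """
    s = np.asarray(s, dtype=float)
    t = np.asarray(t, dtype=float)
    assert s.shape == t.shape and np.isclose(s.sum(), t.sum())
    S = np.cumsum(s)   # number of points with value <= each bin edge
    T = np.cumsum(t)
    return bin_width * np.abs(S - T).sum()
```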

5 Classification and Result Analysis

All the steps above have to be followed for every training and testing gesture listed in Fig. 1. For each test pattern, based on the feature vector obtained, we used two different classifiers: Euclidean distance and K-nearest neighbor. The K-nearest neighbor classifier gives very good classification results, and we achieved up to 100% recognition. The classification results with 36 bins are more accurate, although in real time the training with the 36-bin direction histogram takes longer. Some of the ISL gestures we considered are complex and many of their frames are similar; hence the recognition result is poor in a few cases, as shown in Table 1. We used a limited number of gestures for training; the classification and recognition results can be improved by incorporating more training samples. To handle situations where two gestures are similar, for example "Advance" and "Arise", the use of additional features would improve the recognition rate.

Table 1. Recognition results using the Euclidean distance and K-nearest neighbor classifiers

           Recognition (%),           Recognition (%),
           Euclidean distance         K-nearest neighbor
Class      18 bins     36 bins        18 bins     36 bins
Above      99.42       99.81          100         100
Across     100         100            94.71       99.82
Advance    93.83       94.25          97.03       98.63
Afraid     100         100            100         100
All        71.85       79.51          71.58       66.12
Alone      96.39       85.9           77.05       74.1
Arise      68.08       64.42          63.1        61.93
Bag        51.35       65.99          60.02       48.42
Below      86.11       84.47          73.05       70.42
Bring      100         100            100         100
Yes        100         100            100         100
Ascend     100         100            100         100
Hang       94          92             92          91
Marry      93          92             91.6        90.8
Moon       92.8        92             92.6        92.8
Middle     97.14       86             100         97.14
Prisoner   100         100            79.25       100
Beside     96          100            100         100
Flag       99.02       100            100         100
Drink      90.58       94             54.11       91.76
Aboard     100         100            100         100
Anger      92.4        92             93          92.6
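As an illustration of the two classifiers compared in Table 1, the sketch below assigns a test feature vector a label from the stored training histograms. The 1-nearest-neighbor Euclidean rule and the K-nearest-neighbor majority vote are standard formulations assumed here, not the authors' code:

```python
import numpy as np
from collections import Counter

# train_X: (N, 18) or (N, 36) array of direction-histogram features;
# train_y: list/array of N class labels; x: one test feature vector.

def euclidean_classify(x, train_X, train_y):
    """Assign x the label of the closest training vector (Euclidean)."""
    d = np.linalg.norm(train_X - x, axis=1)
    return train_y[int(np.argmin(d))]

def knn_classify(x, train_X, train_y, k=3):
    """Majority vote among the k nearest training vectors."""
    d = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(d)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```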

6 Conclusion and Future Work

The underlying statistical techniques give efficient recognition accuracy for a limited set of dynamic ISL gestures, with comprehensive results for the Euclidean distance and K-nearest neighbor metrics. Recognition is performed by choosing the closest pattern among all the training patterns. Future work includes the addition of more features and recognition with the Hidden Markov Model technique, in order to retain the temporal information of each dynamic gesture in real time for continuous gestures. The system could be used with a humanoid robot in the future, to train the robot in real time in a reactive manner. Another important aspect is integration with natural language, so that the robot could easily understand ISL gestures and respond to them accordingly.

References

1. Dasgupta, T., Shukla, S., Kumar, S., Diwakar, S., Basu, A.: A Multilingual Multimedia Indian Sign Language Dictionary Tool. In: The 6th Workshop on Asian Language Resources, pp. 57–64 (2008)
2. Bhuyan, M.K., Ghosh, D., Bora, P.K.: A Framework for Hand Gesture Recognition with Applications to Sign Language. In: 2006 Annual IEEE India Conference, pp. 1–6 (2006)
3. Incertis, I.G., Bermejo, J.G.G., Casanova, E.Z.: Hand Gesture Recognition for Deaf People Interfacing. In: 18th Int. Conf. on Pattern Recognition (ICPR 2006), vol. 2, pp. 100–103 (2006)
4. Coogan, T., Awad, G., Han, J., Sutherland, A.: Real Time Hand Gesture Recognition Including Hand Segmentation and Tracking. In: 2nd Int. Symposium on Visual Computing, Lake Tahoe, NV, USA (2006)
5. Stefan, A., Wang, H., Athitsos, V.: Towards Automated Large Vocabulary Gesture Search. In: Proc. of the 2nd Int. Conf. on Pervasive Technologies Related to Assistive Environments, Corfu, Greece (2009)
6. Kelly, D., Delannoy, R., Mc Donald, J.: A Framework for Continuous Multimodal Sign Language Recognition. In: Proc. of the Int. Conf. on Multimodal Interfaces, Cambridge, Massachusetts, USA, pp. 351–358 (2009)


7. Prasad, J.S., Nandi, G.C.: Clustering Method Evaluation for Hidden Markov Model Based Real-Time Gesture Recognition. In: IEEE ARTCom: Advances in Recent Technologies in Communication and Computing, October 27-28, pp. 419–423 (2009)
8. Ionescu, B., Coquin, D., Lambert, P., Buzuloiu, V.: Dynamic Hand Gesture Recognition Using the Skeleton of the Hand. EURASIP J. Appl. Signal Process. 2005(1), 2101–2109 (2005)
9. Hninn, T., Maung, H.: Real-Time Hand Tracking and Gesture Recognition System Using Neural Networks. WASET 50, 466–470 (2009)
10. Freeman, W.T., Roth, M.: Orientation Histograms for Hand Gesture Recognition. In: Intl. Workshop on Automatic Face- and Gesture-Recognition, pp. 296–301. IEEE Computer Society, Zurich (1995)