Indian Sign Language Gesture Recognition Using ... - IEEE Xplore

Indian Sign Language Gesture Recognition Using Discrete Wavelet Packet Transform

Neha Baranwal, Neha Singh and G. C. Nandi

Robotics and AI Lab, Indian Institute of Information Technology, Allahabad, India
[email protected], [email protected], [email protected]

Abstract-In recent days, Indian Sign Language (ISL) has been assumed to be a more appealing gesture language for the speech and hearing impaired community. It helps us to understand the inherent meaning of this language for establishing a gesture based communicating system. In this paper, a novel hand gesture recognition technique has been introduced using the Discrete Wavelet Packet Transform (DWPT). This technique provides more precise frequency resolution and more flexibility than DWT, which helps to derive the invariant features. Dynamic hand gestures are collected against a constant background and under variable light conditions. The DWPT technique has been applied on raw video data for data compression and eliminating unwanted noise. Principal Component Analysis (PCA) has been used for dimensionality reduction and extracting the most significant features. The classification technique consists of different distance metrics and an Artificial Neural Network (ANN), which allows a comparative analysis of various types of classifiers. It has been observed from the experimental results that the DWPT based technique performs better in comparison to the Haar transform and the wavelet transform.

Keywords-DWPT, Gesture Recognition, Euclidean distance, ANN

I. INTRODUCTION

Today robotics is a vibrant field of research with tremendous application potential, not only in industrial environments, the battle field, the construction industry and deep sea exploration but also in the household domain in the form of humanoid robots. To be acceptable in the household, robots must have a higher level of intelligence than industrial robots, and they must be social and capable of interacting with the people around them, who are not supposed to be robot specialists. All of this comes under the field of human robot interaction (HRI). There are various modes, such as speech, gesture and behavior, through which humans can interact with robots. Establishing interaction through gestures is more prominent than any other mode because the human visual system is more accurate than any other system. Indian sign language gesture is a way of establishing interaction between human and human as well as between human and robot. ISL is a visual-spatial language in which hands, arms, body postures and head orientation are used. It is helpful for hearing impaired persons. Many vision based gesture recognition techniques have already been developed; here we discuss a few of them.

R. R. Igorevich et al [6] proposed a gray scale histogram based hand gesture recognition technique where a stereo camera is used for collecting hand motion in 3D. The gray scale histogram method is used for finding the depth of the disparity map. This is useful for recognizing motions at different distances. A. Nandy et al [11] [12] proposed a real time Indian sign language gesture recognition technique where an orientation histogram is used for calculating gesture features, which are classified using Euclidean distance and the K-nearest neighbor method. The orientation histogram is a technique used for calculating the direction of edges with the help of the atan function, which is defined as:

$$\theta(x,y) = \operatorname{atan}\left(\frac{dy}{dx}\right) \tag{1}$$

where dy and dx are the derivatives in the y direction and the x direction. A. Nandy et al [12] proposed an ISL gesture based imitation learning technique for human robot interaction where a hidden Markov model and the Bhattacharyya distance are used for classifying an unknown gesture. Here the orientation histogram is used for calculating the orientation of hand gestures, which signifies the motion of the hands. The disadvantage of all these techniques is that they are computationally expensive, i.e. they take a large amount of time to process the data. Jian-Da Wu et al [13] proposed a DWPT based speaker identification system where the features of the speech signal are extracted using the DWPT method. A general regressive neural network (GRNN) is used for identification of an unknown person. There the experiments are performed in two ways: DWT with GRNN and DWPT with GRNN. From both results it is seen that DWPT with GRNN has lower time complexity as well as a higher recognition rate. Therefore in this paper we have used the discrete wavelet packet transform (DWPT) for reducing the processing time of the system.

The paper is divided into the following sections. Section II describes the discrete wavelet packet transform used for reducing the size of the data and for noise removal, section III describes the proposed methodology, section IV contains the experimental results and analysis of the proposed method, and the final section presents the conclusion and future work.

978-1-4799-3140-8/14/$31.00 ©2014 IEEE

II. DISCRETE WAVELET PACKET TRANSFORM (DWPT)

DWPT [5] is applied for noise reduction, compression and analysis of each frame in the time domain as well as in the frequency domain. It is an extension of the discrete wavelet transform (DWT). DWPT divides each frame into four 2D frequency sub-bands: LL, LH, HL and HH. In DWPT both the approximate and the detail coefficients are further decomposed. This process can continue up to the nth level of decomposition. Here we have applied DWPT up to the third level of decomposition. Beyond the third level, i.e. at the fourth or fifth level, the results either remain constant or the recognition accuracy decreases. Therefore the decomposition is performed only up to the third level. The decomposition tree up to the third level is shown in figure 1.
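The packet decomposition described above can be illustrated with a minimal sketch in plain Python. It assumes the orthonormal Haar filter pair for brevity (the experiments in this paper also use Daubechies filters), and the helper names `haar_step`, `split_2d` and `wavelet_packet_2d` are hypothetical. The mapping of LH and HL to the row or column direction is likewise an assumption, since conventions differ between libraries. The point it shows is that every sub-band, not only LL, is split again at each level, so three levels give 4^3 = 64 sub-bands.

```python
import math


def haar_step(values):
    """One Haar analysis step on a 1D list of even length.

    Returns (approximation, detail), each half as long as the input.
    """
    s = 1.0 / math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) * s for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) * s for i in range(0, len(values), 2)]
    return approx, detail


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def filter_rows(matrix):
    """Apply haar_step to every row; return (low-pass rows, high-pass rows)."""
    pairs = [haar_step(row) for row in matrix]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def split_2d(band):
    """Split one 2D band into four half-size sub-bands."""
    low, high = filter_rows(band)            # filter along each row
    ll, lh = filter_rows(transpose(low))     # then along each column
    hl, hh = filter_rows(transpose(high))
    return {"LL": transpose(ll), "LH": transpose(lh),
            "HL": transpose(hl), "HH": transpose(hh)}


def wavelet_packet_2d(frame, levels):
    """Full packet tree: ALL sub-bands are split again at every level."""
    nodes = {"": frame}
    for _ in range(levels):
        next_nodes = {}
        for path, band in nodes.items():
            for name, sub in split_2d(band).items():
                next_nodes[path + "/" + name if path else name] = sub
        nodes = next_nodes
    return nodes


# Toy 8x8 "frame"; a real frame would be a 320x240 gray scale image.
frame = [[float((r * 8 + c) % 7) for c in range(8)] for r in range(8)]
packets = wavelet_packet_2d(frame, 3)
print(len(packets))  # 4**3 = 64 sub-bands
```

A plain DWT would only re-split the LL band at each level; the loop over all current nodes is what makes this a packet transform.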

Fig 2: Block diagram of proposed method
[Labels recovered from the figure: Video captured using web cam / digital cam; Noise reduction and compression using DWPT, DWT, Haar transform; Original image; System training using ANN; System testing using distance based classifier and ANN]

Fig 1: Third Level Decomposition Tree of DWPT Transform

Here the approximate and detail coefficients are expressed as:

$$CD(j) = \sum_i s(i)\, g(2j - i) \tag{2}$$

$$CA(j) = \sum_i s(i)\, h(2j - i) \tag{3}$$

where $s(i)$ is the original gesture frame, $1 \le i \le N$, $N$ is the number of sample values, and $CD(j)$ and $CA(j)$ are the detail and approximate coefficients calculated using the high pass ($g$) and low pass ($h$) filters. The approximate coefficients are the low frequency components, which are highly sensitive in nature and contain 95 percent of the information. The detail coefficients represent the high frequency components, which carry the remaining 5 percent of the information together with some of the noise present in the image frame. In DWPT, quantization and thresholding are done in such a way that the HSV and CR values of an image are constrained for better visual quality of the image.

III. PROPOSED METHOD

In this research work we have applied DWPT to hand gesture recognition. We collected a database of five different types of Indian sign language gestures: above, across, arising, below and aboard. The database is collected under two different light conditions, one yellow and the other white. The flow diagram of our proposed method is shown in figure 2.

1) In the first step, videos are captured against a fixed background and under different light conditions. These videos are converted into image frames of size 320 x 240. Each colored frame is converted into a gray scale frame.

2) After that, DWPT is used for compression of these image frames. Images are decomposed up to the third level of the DWPT transform. After the first level of decomposition each frame is divided into four parts LL, LH, HL and HH, where LL represents the low frequency approximation, LH represents high frequency horizontal information, HL represents high frequency vertical information and HH represents high frequency diagonal information. In a similar manner frames are divided up to the third level of decomposition, as shown in figure 3.
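The gray scale conversion in the first step above can be sketched as follows. The paper does not state which conversion weights were used, so the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) are an assumption here, and `rgb_to_gray` is a hypothetical helper name. A frame is represented as a list of rows of (R, G, B) tuples.

```python
def rgb_to_gray(frame):
    """Convert rows of (R, G, B) tuples to rows of gray scale values.

    Uses the BT.601 luma weights, which are an assumption: the paper
    does not specify how the colored frames were converted.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in frame]


# A blank white frame of the size used in the paper: 320 columns x 240 rows.
frame = [[(255, 255, 255)] * 320 for _ in range(240)]
gray = rgb_to_gray(frame)
print(len(gray), len(gray[0]))
```

The resulting single-channel frame is what would then be passed to the wavelet packet decomposition.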

Fig 3: Third Level Decomposition of above gesture using DWPT Transform
[Figure panels show the successive levels of decomposition]

2014 International Conference on Signal Propagation and Computer Technology (ICSPCT)

3) Principal component analysis (PCA) [3] is applied for reducing the dimensions of the features obtained after the DWPT transform. The steps of PCA are:

i. Calculate the mean $m$ of the features $F_i$. The mean-subtracted feature is then calculated as:

$$S_i = F_i - m \tag{4}$$

ii. Calculate the covariance matrix of the data set:

$$C_i = (S_i - \bar{S})(S_i - \bar{S})^T \tag{5}$$

where $S_i$ is the subtracted mean, $\bar{S}$ is the mean of $S_i$ and $C_i$ is the covariance matrix for $i = 1, 2, \ldots, n$.

iii. In order to select the best projection direction we have to find the characteristic roots of the covariance matrix. These characteristic roots are the principal components (eigenvectors and eigenvalues). The principal components can be expressed as:

$$C V = \lambda V \tag{6}$$

where $V$ is the eigenvector and $\lambda$ is the eigenvalue. Let $\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_n$ be the eigenvalues of $C$ and $V_1, V_2, V_3, \ldots, V_n$ the principal components. The best projection direction is the combination of several principal components having maximum eigenvalues, which is expressed as:

$$Q = [V_1\ V_2\ V_3\ \ldots\ V_n] \tag{7}$$

iv. The final data set is generated by multiplying the transpose of the eigenvector matrix with the mean-adjusted original data set:

$$X_i = Q^T S_i, \quad i = 1, 2, \ldots, n \tag{8}$$

4) After that, various types of distance based classifiers and an Artificial Neural Network are applied for calculating the misclassification error rate.

i. Bhattacharyya Distance: It is calculated by extracting the mean and variance of the two classes. It is defined as:

$$D_1(x,y) = \frac{1}{4}\ln\left(\frac{1}{4}\left(\frac{\sigma_x^2}{\sigma_y^2} + \frac{\sigma_y^2}{\sigma_x^2} + 2\right)\right) + \frac{1}{4}\left(\frac{(\mu_x - \mu_y)^2}{\sigma_x^2 + \sigma_y^2}\right) \tag{9}$$

where $D_1(x,y)$ is the Bhattacharyya distance between the training ($x$) and testing ($y$) classes, $\mu_x$ and $\mu_y$ are the means of the $x$ and $y$ classes, and $\sigma_x^2$ and $\sigma_y^2$ are the variances of the $x$ and $y$ classes.

ii. Manhattan Distance:

$$D_2(x,y) = \sum_{i=1}^{n} |x_i - y_i| \tag{10}$$

where $n$ is the total number of vectors, $x$ is the training data set and $y$ is the test data set.

iii. Chessboard Distance: It is expressed as:

$$D_3(x,y) = \max_{i=1,\ldots,n} |x_i - y_i| \tag{11}$$

Here we take the maximum value of the difference between the two vectors.

iv. Euclidean distance: The minimum Euclidean distance between training data and testing data, calculated for recognizing an unknown gesture, is expressed as:

$$D_4(x,y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{12}$$

where $x_i$ is the training data set, $y_i$ is the test data set and $D_4$ is the Euclidean distance metric.

v. Artificial Neural Network: It is an iterative learning process in which different types of activation functions, such as the step function, sign function or sigmoidal function, are used for learning. In this paper the sigmoidal function is used as the activation function for training of gestures. It is expressed as:

$$f(x_i) = \beta \cdot \frac{1 - e^{-\alpha x_i}}{1 + e^{-\alpha x_i}} \tag{13}$$

where $\alpha$ and $\beta$ are constants. Here we consider $\alpha = \beta = 0.006$ and number of iterations = 125. The classification rate for unknown gestures is measured by creating a confusion matrix, which is shown in figure 4.

IV. EXPERIMENTAL RESULTS AND ANALYSIS

Experiments are performed on five types of gestures: aboard, above, across, below and arise. Each gesture has 5 samples under each of two light conditions, i.e. 5 x 2 = 10 samples of each gesture. Each video has 200 frames for training and 50 frames for testing. Videos are recorded using a Sony handy camera. The experiments are performed using the OpenCV software. In this paper experiments are performed with three types of transform: the Haar wavelet, the Daubechies wavelet and the Daubechies packet wavelet transform. Each transform is applied up to the third level of decomposition. After that the misclassification error is calculated using various types of distance based classifiers and an artificial neural network (ANN). In table 1 the robustness of the gesture recognition techniques is tested by calculating the misclassification error rate, i.e. the number of mismatched frames out of the total number of tested frames:

$$\text{misclassification rate} = \left(\frac{\text{no. of misclassified frames}}{\text{total no. of test frames}}\right) \times 100$$
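The evaluation described in this section can be sketched in plain Python as a minimum-distance classifier followed by the misclassification rate. This is only an illustration under stated assumptions: the feature vectors and labels are toy values, not data from the paper, and `classify` and `misclassification_rate` are hypothetical helper names. Only the Manhattan, chessboard and Euclidean distances are shown.

```python
import math


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def chessboard(x, y):
    return max(abs(a - b) for a, b in zip(x, y))


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def classify(sample, training, distance):
    """Return the label of the training vector closest to sample."""
    label, _ = min(training, key=lambda item: distance(sample, item[1]))
    return label


def misclassification_rate(test_set, training, distance):
    """(no. of misclassified frames / total no. of test frames) * 100."""
    wrong = sum(1 for label, vector in test_set
                if classify(vector, training, distance) != label)
    return 100.0 * wrong / len(test_set)


# Toy (label, feature vector) pairs; real vectors would be PCA features.
training = [("above", [0.0, 0.0]), ("below", [10.0, 10.0])]
test_set = [("above", [1.0, 1.0]), ("below", [9.0, 8.0]),
            ("above", [6.0, 6.0]), ("below", [4.0, 5.0])]

for name, dist in [("Manhattan", manhattan), ("Chessboard", chessboard),
                   ("Euclidean", euclidean)]:
    print(name, misclassification_rate(test_set, training, dist))
```

In this toy set two of the four test vectors lie closer to the wrong class under all three distances, so each rate is 50.0.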

Table 1: Percentage of Misclassification Error Rate (all values are misclassification %)

Classification method     Orientation histogram   Haar Wavelet   Daubechies Wavelet   Daubechies Packet Wavelet
Bhattacharyya Distance    6                       22             10                   5
Manhattan Distance        9                       33             30                   10
Chessboard Distance       25                      37             35                   22
Euclidean distance        7                       11             9                    7
ANN                       10                      14             13                   9

In figure 4 a confusion matrix is created for visualizing the performance of the proposed method where ANN is used as the classifier. Here each column of the matrix represents a testing class, while each row of the matrix shows a training class. Correct matches are represented on the diagonal of the matrix.

Fig 4: Confusion Matrix for gesture recognition

From figure 4 we have seen that ANN gives a total of 91 percent accuracy for five types of gestures. As the number of gestures increases, the accuracy decreases because of the increase in the misclassification rate.

Fig 5: Avg. processing time for different classification method
[Bar chart titled "Average Processing Time"; x-axis: classification methods; y-axis: avg. processing time (sec); series: orientation histogram, Haar wavelet, Daubechies wavelet, Daubechies packet wavelet]

From the experimental results presented in table 1 we have seen that the Bhattacharyya distance provides maximum accuracy in comparison to the other distance metrics as well as ANN, because in the Bhattacharyya distance the distance between two vectors is measured by calculating the mean and variance of the data set. This gives more accurate results than the other methods, where the distance is calculated by simply subtracting the two vectors or taking the maximum of two vectors. From table 1 we also see that DWPT provides better accuracy than DWT, orientation histogram and Haar wavelet. Also, from figure 5 we have seen that the average processing time of DWPT is less as compared to other techniques like orientation histogram, Haar wavelet etc.

V. CONCLUSION AND FUTURE WORK

The proposed DWPT based gesture recognition technique performs better as compared to the Haar wavelet and the Daubechies wavelet because in DWPT both the approximate and the detail coefficients are decomposed further, each yielding new approximate and detail coefficients, and so on. It compresses the data and also prevents loss of information because both sets of coefficients are considered. Here we use various types of classifiers; among all of them, DWPT and PCA with Bhattacharyya distance provide the minimum misclassification error rate in comparison to other combinations such as DWT and PCA with Manhattan distance. We have also seen that the average processing time of the DWPT based method is less as compared to other techniques.

Future work includes the use of various other classifiers like support vector machine, hidden Markov model etc. We will also try to incorporate various other features like edge detection, skin color map etc. for getting more accurate classification results.

VI. REFERENCES

[1] E. J. Stollnitz, T. D. DeRose, D. H. Salesin, "Wavelets for Computer Graphics: A Primer, Part 1", IEEE Computer Graphics and Applications, 15(3), pp. 76-84, May 1995.
[2] I. T. Jolliffe, Principal Component Analysis, Springer, Berlin, 1986.
[3] S. Wold, "Pattern recognition by means of disjoint principal components models", Pattern Recognition, 8, pp. 127-139, 1976.
[4] P. Wojtaszczyk, A Mathematical Introduction to Wavelets, Cambridge University Press, Cambridge, 1998.
[5] I. Daubechies, W. Sweldens, "Factoring wavelet transforms into lifting steps", International J. of Fourier Anal. Appl., vol. 4, no. 3, pp. 247-269, 1998.
[6] Rustam Rakhimov Igorevich, Pusik Park and Dugki Min, "Hand gesture recognition algorithm based on grayscale histogram of the image", Proc. of 4th IEEE International Conference on Application of Information and Communication Technologies (AICT), pp. 1-4, 2010.
[7] N. K. Bose and P. Liang, "Neural Network Fundamentals with Graphs, Algorithms, and Applications", McGraw-Hill, New York, 1996.
[8] Joseph Picard, "Gesture Recognition - Project Report", VISL lab (2003).
[9] Sebastian Marcel, Oliver Bernier, Jean Emmanuel Viallet and Daniel Collobert, "Hand Gesture Recognition using Input - Output Hidden Markov Models", Proc. of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 456-461, 2000.
[10] Attila Licsar and Tamas Sziranyi, "Supervised training based hand gesture recognition system", Proc. of the 16th International Conference on Pattern Recognition, Vol. 3, pp 30999 -31003, 2002.
[11] Anup Nandy, Pavan Chakraborty, Jay Shankar Prasad, G. C. Nandi and Soumik Mondal, "Classification of Indian Sign Language in real time", International Journal on Computer Engineering and Information Technology (IJCEIT), pp. 52-57, 2010.
[12] Anup Nandy, Soumik Mondal, Jay Shankar Prasad, Pavan Chakraborty and G. C. Nandi, "Recognizing & Interpreting Indian Sign Language Gesture for Human Robot Interaction", Proc. of IEEE Int'l Conf. on Computer & Communication Technology, pp. 712-717, 2010.
[13] Jian-Da Wu and Bing-Fu Lin, "Speaker Identification Using Discrete Wavelet Packet Transform Technique with Irregular Decomposition", International Journal of Expert Systems with Applications (ACM), Vol. 36, pp. 3136-3143, March 2009.
