LVQ-based Hand Gesture Recognition using a Data Glove

Francesco Camastra* and Domenico De Felice

Department of Applied Science, University of Naples Parthenope, Centro Direzionale Isola C4, 80143 Naples, Italy
e-mail: [email protected]; [email protected]

* Corresponding author

Abstract. This paper presents a real-time hand gesture recognizer based on a Learning Vector Quantization (LVQ) classifier. The recognizer is formed by two modules. The first module, mainly composed of a data glove, performs the feature extraction. The second module, the classifier, is performed by means of LVQ. The recognizer, tested on a dataset of 3900 hand gestures performed by people of different gender and physique, has shown a very high recognition rate.

1 Introduction

Gesture is one of the means that humans use to convey information. In general, the amount of information conveyed by gesture increases when the amount of information sent by the human voice decreases [1]. Moreover, for some people, e.g., disabled people, hand gestures are among the main means of conveying information. The aim of this work is the development of a real-time hand gesture recognizer. The real-time requirement is motivated by the association of an action with the recognition of a gesture, for instance opening a multimedia presentation, starting an internet browser, and other similar actions. The paper presents a real-time hand gesture recognition system, based on a Learning Vector Quantization (LVQ) classifier, that can run on a netbook. The system is formed by two modules. The first module, a feature extractor, is implemented by means of a data glove [2] and represents the hand gesture as a five-dimensional feature vector. The second module, the classifier, is implemented by means of LVQ. The paper is organized as follows: the feature extraction module is discussed in Section 2; a review of Learning Vector Quantization is provided in Section 3; Section 4 reports some experimental results; in Section 5 some conclusions are drawn.

2 Feature Extraction

Several approaches have been proposed for gesture recognition [3, 4]. It is common to use a camera in combination with an image recognition system. These systems have the disadvantage that the image/gesture recognition is very sensitive to illumination, hand position and hand orientation [5]. In order to avoid these problems we decided to use an approach borrowed from Virtual Reality [2] applications, where the movements of the hands are tracked by asking people to wear data gloves [6]. A data glove is a particular glove with sensors inside, typically magnetic or optic-fiber sensors, that allow tracking the movements of the fingers of the hand. Having said that, the feature extraction module of our system is essentially composed of the DG5 VHand 2.0¹ data glove, shown in Figure 1, which has to be worn by the person whose gesture has to be recognized. The data glove, whose update time is about 20 ms, measures the flexure of each finger, providing a real value between 0 and 100 that represents complete closure and complete opening of the finger, respectively. To make the system more robust, the flexure of each finger is obtained by averaging five consecutive measures. Therefore a new feature vector is generated every 100 ms. Moreover, in our experiments, considering thousands of gestures performed by people of different gender and physique, the flexure measure of a finger takes values in a narrower range, typically between 10 and 70. Therefore, the values provided by the data glove are subsequently normalized between 0 and 1, so that the values 10 and 70 correspond to 0 and 1, respectively. Summarizing, the hand gesture is represented by a vector of five normalized real values generated every 100 ms.
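The averaging and normalization steps above can be sketched as follows; the helper names `normalize` and `feature_vector` are ours, and the clamping of readings that fall outside the observed 10-70 range is our assumption:

```python
def normalize(raw, lo=10.0, hi=70.0):
    """Map a raw flexure reading (nominally 0-100, in practice 10-70)
    linearly to [0, 1], clamping values outside the observed range."""
    return min(max((raw - lo) / (hi - lo), 0.0), 1.0)

def feature_vector(readings):
    """Average five consecutive five-finger readings (one every ~20 ms)
    and normalize, yielding one feature vector every ~100 ms."""
    n = len(readings)
    averaged = [sum(r[i] for r in readings) / n for i in range(5)]
    return [normalize(v) for v in averaged]
```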

Fig. 1. DG5 VHand 2.0 Data Glove.

¹ DG5 VHand 2.0 is a registered trademark of DGTech Engineering Solutions.

3 The Classifier

In recent years the Support Vector Machine (SVM) [7-9] has been one of the most effective classification algorithms. Since SVM is a binary classifier, when the number of classes K is larger than two it is necessary to use specific strategies. The simplest strategy, the one-versus-all method [10], requires an ensemble of K SVM classifiers, i.e., one classifier for each class against all the other classes. With other strategies the number of classifiers increases to K(K−1)/2. Therefore, SVMs require computational resources that are not compatible with a real-time recognizer.

3.1 Learning Vector Quantization

Having said that, we have chosen Learning Vector Quantization (LVQ) [11] as the classifier of the gesture recognizer, since it requires moderate computational resources. In recent years LVQ has been successfully applied in many different fields, such as handwritten digit [12] and cursive character recognition [13], the identification of head-and-shoulders patterns in financial time series [14], the analysis of sensor array data [15] and the automatic recognition of whale calls for real-time monitoring [16]. We now describe LVQ, first fixing the notation. Let D = {x_1, ..., x_ℓ} ⊆ R^N be a data set. We call codebook the set M = {m_1, ..., m_P} ⊂ R^N, with P ≪ ℓ. Vector quantization aims to yield codebooks that represent the input data D as well as possible. LVQ is a supervised version of vector quantization and generates codebook vectors (codevectors) that produce near-optimal decision boundaries [17]. LVQ consists of the application of a few different learning techniques, namely LVQ1, LVQ2 and LVQ3. LVQ1 uses the nearest-neighbour decision rule for classification; it chooses the class of the nearest codebook vector. LVQ1 learning is performed in the following way. We denote with m^c_t the value of m^c at time t, and with C(x) and C(m^c_t) the class of x and m^c_t, respectively. If m^c_t is the nearest codevector to the input vector x, then

    m^c_{t+1} ← m^c_t + α_t [x − m^c_t]   if C(x) = C(m^c_t)
    m^c_{t+1} ← m^c_t − α_t [x − m^c_t]   if C(x) ≠ C(m^c_t)        (1)

where α_t is the learning rate at time t. The other codevectors are not changed, i.e.,

    m^i_{t+1} ← m^i_t   if i ≠ c.                                   (2)
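A minimal sketch of the LVQ1 update of equation (1), assuming Euclidean nearest-neighbour search over a NumPy codebook (the function name `lvq1_step` is ours):

```python
import numpy as np

def lvq1_step(codebook, labels, x, y, alpha):
    """Apply one LVQ1 update for the labelled sample (x, y): move the
    nearest codevector toward x if its class matches y, away otherwise.
    All other codevectors are left unchanged, as in eq. (2)."""
    c = int(np.argmin(np.linalg.norm(codebook - x, axis=1)))
    sign = 1.0 if labels[c] == y else -1.0
    codebook[c] += sign * alpha * (x - codebook[c])
    return c  # index of the updated codevector
```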

In our experiments we used a particular version of LVQ1, the Optimized-Learning-Rate LVQ1 (OLVQ1) [17], a version of the model that provides a different learning rate for each codebook vector. We now describe OLVQ1. If m^c_t is the closest codevector to x(t), i.e., the input vector at time t, then equation (1) holds. Equation (1) can be rewritten in the following way:

    m^c(t+1) ← [1 − s(t)α_c(t)] m^c(t) + s(t)α_c(t) x(t)            (3)


where s(t) = 1 if the classification is correct and s(t) = −1 otherwise. We can observe that the codevector m^c_t is statistically independent of x(t). Therefore the statistical accuracy of the learned codebook is optimal if the effects of the corrections made at different times are of equal weight. We note that m^c(t+1) keeps a trace of x(t) through the last term of equation (3), and traces of the earlier x(t′), with t′ = 1, ..., t−1, through m^c_t. The absolute magnitude of the last trace of x(t) is scaled down by the factor α_c(t), whereas the trace of x(t−1) is scaled down by [1 − s(t)α_c(t)] α_c(t−1). If we assume that the two scalings are equal, we have:

    α_c(t) = [1 − s(t)α_c(t)] α_c(t−1).                             (4)
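The step from (4) to (5) can be made explicit by solving (4) for α_c(t):

```latex
\alpha_c(t) = \left[1 - s(t)\,\alpha_c(t)\right]\alpha_c(t-1)
\;\Rightarrow\;
\alpha_c(t)\left[1 + s(t)\,\alpha_c(t-1)\right] = \alpha_c(t-1)
\;\Rightarrow\;
\alpha_c(t) = \frac{\alpha_c(t-1)}{1 + s(t)\,\alpha_c(t-1)}
```

Substituting s(t) = +1 (correct classification) and s(t) = −1 (incorrect classification) yields the two cases of the update rule (5).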

If this condition holds for all t, it can be shown by induction that the traces stored up to time t from all the previous t′ are scaled down by an equal amount at the end. Therefore the optimal value of the learning rate α_c(t) is given by the following update rule:

    α_c(t) ← α_c(t−1) / (1 + α_c(t−1))   if C(x) = C(m^c_t)
    α_c(t) ← α_c(t−1) / (1 − α_c(t−1))   if C(x) ≠ C(m^c_t)         (5)

Since both LVQ1 and OLVQ1 tend to push codevectors away from the decision surfaces of the Bayes rule [18], it is necessary to apply to the generated codebook a successive learning technique called LVQ2. LVQ2 tries harder to approximate the Bayes rule by pairwise adjustments of codevectors belonging to adjacent classes. If m^s and m^p are nearest neighbours of different classes, and the input vector x, belonging to the class of m^s, is closer to m^p and falls into a zone of values called the window (defined around the midplane of m^s and m^p), the following rule is applied:

    m^s_{t+1} ← m^s_t + α_t [x − m^s_t]
    m^p_{t+1} ← m^p_t − α_t [x − m^p_t]                             (6)

It can be shown [11] that the LVQ2 rule produces an unstable dynamics. To prevent this behavior as far as possible, the window w within which the adaptation takes place must be chosen carefully. Moreover, the hypothesis margin µ of the classifier [19] is given by:

    µ = (‖x − m^s‖ − ‖x − m^p‖) / 2.                                (7)
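A sketch of one OLVQ1 step, combining the codevector update of equation (1) with the per-codevector learning-rate recurrence of equation (5); the function name and the `alpha_max` cap (keeping a rate from growing without bound after repeated misclassifications, as done in practice) are our assumptions:

```python
import numpy as np

def olvq1_step(codebook, labels, alphas, x, y, alpha_max=0.3):
    """One OLVQ1 update: the LVQ1 rule, but each codevector keeps its
    own learning rate alpha_c, updated before use by the recurrence
    alpha_c(t) = alpha_c(t-1) / (1 + s(t) * alpha_c(t-1)),
    where s(t) = +1 if the classification is correct and -1 otherwise."""
    c = int(np.argmin(np.linalg.norm(codebook - x, axis=1)))
    s = 1.0 if labels[c] == y else -1.0
    # update the rate first: eq. (3) uses alpha_c(t), not alpha_c(t-1)
    alphas[c] = min(alphas[c] / (1.0 + s * alphas[c]), alpha_max)
    codebook[c] += s * alphas[c] * (x - codebook[c])
    return c
```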

Hence LVQ2 can be seen as a classifier which aims at structural risk minimization during training, comparable to Support Vector Machines [20]. Therefore a very good generalization ability of LVQ2 can be expected also for high-dimensional data. In order to overcome the stability problems of LVQ2, Kohonen proposed a further algorithm (LVQ3). If m^i and m^j are the two closest codevectors to the input x and

x falls in the window, the following rule is applied:

    m^i_{t+1} ← m^i_t                          if C(m^i) ≠ C(x) ∧ C(m^j) ≠ C(x)
    m^j_{t+1} ← m^j_t                          if C(m^i) ≠ C(x) ∧ C(m^j) ≠ C(x)
    m^i_{t+1} ← m^i_t − α_t [x_t − m^i_t]      if C(m^i) ≠ C(x) ∧ C(m^j) = C(x)
    m^j_{t+1} ← m^j_t + α_t [x_t − m^j_t]      if C(m^i) ≠ C(x) ∧ C(m^j) = C(x)
    m^i_{t+1} ← m^i_t + α_t [x_t − m^i_t]      if C(m^i) = C(x) ∧ C(m^j) ≠ C(x)
    m^j_{t+1} ← m^j_t − α_t [x_t − m^j_t]      if C(m^i) = C(x) ∧ C(m^j) ≠ C(x)
    m^i_{t+1} ← m^i_t + ε α_t [x_t − m^i_t]    if C(m^i) = C(m^j) = C(x)
    m^j_{t+1} ← m^j_t + ε α_t [x_t − m^j_t]    if C(m^i) = C(m^j) = C(x)        (8)

where ε ∈ [0, 1] is a fixed parameter.

3.2 Classification with Rejection

In practical applications, the recognition of a gesture is usually associated with the performing of a given action, e.g., the start of a multimedia presentation. In this applicative scenario, it is desirable that the classifier recognizes a gesture, i.e., classifies, only when the probability of making a mistake is negligible. When the probability of making a mistake is not negligible, the classifier has to reject the gesture, i.e., it does not classify. We implemented a rejection scheme in the classifier in the following way. Let d be the Euclidean distance between the input x and the closest codevector m^c; the following rule is applied:

    If d ≤ ρ then classify else reject                              (9)

where ρ is a parameter that manages the trade-off between error and rejection.
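The rejection rule (9) on top of nearest-codevector classification can be sketched as follows (the function name is ours):

```python
import numpy as np

def classify_with_rejection(codebook, labels, x, rho):
    """Return the class of the codevector closest to x if it lies within
    Euclidean distance rho of x; otherwise return None, i.e., reject."""
    dists = np.linalg.norm(codebook - x, axis=1)
    c = int(np.argmin(dists))
    return labels[c] if dists[c] <= rho else None
```

Lowering ρ trades correct classifications for rejections, which is exactly the trade-off explored in Table 3.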

4 Experimental Results

To validate the recognizer we selected 13 gestures, invariant under rotation and translation. We associated to each gesture a symbol, a letter or a digit, as shown in Figure 2. We collected a database of 7800 right-hand gestures, performed by people of different gender and physique. The database was randomly split into two equal parts, a training set and a test set, each containing 3900 gestures. The number of classes used in the experiments was 13, namely the number of different gestures in our database. In our experiments the three learning techniques, i.e. LVQ1, LVQ2 and LVQ3, were applied. We trained several LVQ nets by specifying different combinations of learning parameters, i.e. different learning rates for LVQ1, LVQ2, LVQ3, and various total numbers of codevectors. The best LVQ net was selected by means of 10-fold cross-validation² [21, 22].

² In K-fold cross-validation the training set is partitioned into K subsets. Of the K subsets, a single subset is retained as validation data for testing the model, and the remaining K − 1 subsets are used as training data. The cross-validation process is then repeated K times (the folds), with each of the K subsets used exactly once as validation data. The K results from the folds can then be averaged to produce a single estimate.
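The cross-validation scheme described in the footnote can be sketched as follows (the helper name and the fixed shuffling seed are our choices):

```python
import random

def kfold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k folds and yield (train, validation)
    index pairs; each index is used exactly once as validation data."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds if f is not val for j in f]
        yield train, val
```

Averaging the validation score over the k (train, validation) pairs gives the single estimate used for model selection.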


Fig. 2. Gestures represented in the database.

LVQ trials were performed using the LVQ-pak [23] software package. We compared LVQ against the K-nearest-neighbour (Knn) classifier, also available in the LVQ-pak software package. In Table 1 the performances of the different classifiers on the test set, measured in terms of recognition rate in the absence of rejection, are reported. Our best result in terms of recognition rate is 99.31%.

    Algorithm     Correct Classification Rate
    Knn           92.56 %
    LVQ1 + LVQ2   99.31 %
    LVQ1 + LVQ3   99.31 %

Table 1. Recognition rates on the test set, in absence of rejection, for the different classifiers.

The confusion matrix for the LVQ classifier on the test set is shown in Table 2. To our best classifier, i.e., LVQ1+LVQ3, the rejection rule described in Section 3.2 was applied. The results obtained for different values of the rejection threshold ρ are shown in Table 3. Asking the recognizer for a negligible error, e.g., less than 0.4%, the recognizer can still guarantee a high correct recognition rate, i.e., larger than 97.0%.


          A      B      C      D      E      F      G      H      L      U      W      Y      4
    A   100      0      0      0      0      0      0      0      0      0      0      0      0
    B     0    100      0      0      0      0      0      0      0      0      0      0      0
    C     0      0  97.33   2.67      0      0      0      0      0      0      0      0      0
    D     0      0      0  99.33      0      0      0      0      0   0.67      0      0      0
    E     0   0.33      0      0  99.67      0      0      0      0      0      0      0      0
    F     0      0      0      0      0    100      0      0      0      0      0      0      0
    G     0      0      0      0      0      0  97.67      0      0      0   2.33      0      0
    H     0      0      0      0      0      0      0    100      0      0      0      0      0
    L     0      0      0      0      0      0      0      0    100      0      0      0      0
    U     0      0      0      0      0      0      0      0      0     97      3      0      0
    W     0      0      0      0      0      0      0      0      0      0    100      0      0
    Y     0      0      0      0      0      0      0      0      0      0      0    100      0
    4     0      0      0      0      0      0      0      0      0      0      0      0    100

Table 2. The confusion matrix for the LVQ1+LVQ3 classifier on the test set. The values are expressed as percentage rates.

Fig. 3. The Hand Gesture Recognizer.

    ρ        Correct    Error     Reject
    0.33     99.31 %    0.69 %     0.00 %
    0.30     97.36 %    0.38 %     2.26 %
    0.25     92.38 %    0.23 %     7.39 %
    0.20     84.62 %    0.20 %    15.20 %
    0.10     41.33 %    0.05 %    58.62 %
    0.0992   40.82 %    0.00 %    59.18 %

Table 3. Correct, Error and Reject rates on the test set of LVQ1+LVQ3 for different rejection thresholds ρ.

A demo³ of the recognizer, in which the performing of a Powerpoint⁴ command is associated with the recognition of a sequence of gestures, is used by the first author, as a data glove demonstrator, in his B.Sc. course on Virtual Reality. The system, shown in Figure 3, is implemented in C++ under 64-bit Microsoft Windows Vista with Visual Studio 2010, on a netbook with an Intel Celeron⁵ Processor SU2300 at 1.20 GHz and 3 GB RAM, and requires 140 ms of CPU time⁶ to recognize a single gesture. Since the feature extraction requires 100 ms, as underlined in Section 2, the remaining 40 ms are spent on the classification and on performing the associated Powerpoint command.

5 Conclusions

In this paper we have described a real-time hand gesture recognition system based on a Learning Vector Quantization classifier. The system is formed by two modules. The first module, mainly composed of a data glove, performs the feature extraction and represents the hand gesture by a five-dimensional feature vector. The second module, the classifier, is performed by means of Learning Vector Quantization. The recognizer, tested on a dataset of 3900 hand gestures performed by people of different gender and physique, has shown a recognition rate, in the absence of rejection, larger than 99%. The system, implemented on a netbook, requires an average time of 140 ms of CPU time to recognize a hand gesture. Furthermore, a version of the system where the recognition of a sequence of gestures is associated with the carrying out of a multimedia presentation command has been developed. In the near future we plan to investigate its usage in classrooms as a tool for helping disabled students.

³ The demo is available at http://dsa.uniparthenope.it/francesco.camastra.
⁴ Powerpoint, Visual Studio 2010 and Windows Vista are registered trademarks of Microsoft Corp.
⁵ Celeron is a registered trademark of Intel Corp.
⁶ CPU time has been measured using the proper C++ function.


Acknowledgments

Firstly, the authors wish to warmly acknowledge the Centro di Calcolo of the University of Naples Parthenope, and in particular its director, Prof. Giulio Giunta, for having provided the data gloves used in this work. The authors wish to thank the anonymous reviewers for their valuable comments. Part of the work was developed by Domenico De Felice, under the supervision of Francesco Camastra, as a final dissertation project for the B.Sc. in Computer Science at the University of Naples Parthenope.

References

1. Kendon, A.: How gestures can become like words. In: Crosscultural Perspectives in Nonverbal Communication, Toronto, Hogrefe (1988) 131-141
2. Burdea, G., Coiffet, P.: Virtual Reality Technology. John Wiley & Sons, New York (2003)
3. Mitra, S., Acharya, T.: Gesture recognition: A survey. IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews 37(3) (2007) 311-324
4. Chaudhary, A., Raheja, J., Das, K., Raheja, S.: A survey on hand gesture recognition in context of soft computing. In: Advanced Computing, Springer (2011) 46-55
5. Weissmann, J., Salomon, R.: Gesture recognition for virtual reality applications using data gloves and neural networks. In: Proceedings of IJCNN 99, IEEE Press (1999) 2043-2046
6. Dipietro, L., Sabatini, A., Dario, P.: A survey of glove-based systems and their applications. IEEE Transactions on Systems, Man and Cybernetics 38(4) (2008) 461-482
7. Cortes, C., Vapnik, V.: Support vector networks. Machine Learning 20 (1995) 1-25
8. Schölkopf, B., Smola, A.: Learning with Kernels. MIT Press, Cambridge (USA) (2002)
9. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press (2004)
10. Herbrich, R.: Learning Kernel Classifiers. MIT Press, Cambridge (USA) (2004)
11. Kohonen, T.: Learning vector quantization. In: The Handbook of Brain Theory and Neural Networks, MIT Press (1995) 537-540
12. Ho, T.: Recognition of handwritten digits by combining independent learning vector quantizations. In: Proceedings of the Second International Conference on Document Analysis and Recognition, IEEE (1993) 818-821
13. Camastra, F., Vinciarelli, A.: Cursive character recognition by learning vector quantization. Pattern Recognition Letters 22(6-7) (2001) 625-629
14. Zapranis, A., Tsinaslanidis, P.: Identification of the head-and-shoulders technical analysis pattern with neural networks. In: Artificial Neural Networks - ICANN 2010, Lecture Notes in Computer Science vol. 6354, Springer (2010) 130-136
15. Ciosek, P., Wróblewski, W.: The analysis of sensor array data with various pattern recognition techniques. Sensors and Actuators B: Chemical 114(1) (2006) 85-93
16. Mouy, X., Bahoura, M., Simard, Y.: Automatic recognition of fin and blue whale calls for real-time monitoring in the St. Lawrence. Journal of the Acoustical Society of America 126(6) (2009) 2918-2928
17. Kohonen, T.: Self-Organizing Maps. Springer, Berlin (1997)
18. Duda, R., Hart, P., Stork, D.: Pattern Classification. John Wiley & Sons, New York (2001)
19. Crammer, K., Gilad-Bachrach, R., Navot, A., Tishby, N.: Margin analysis of the LVQ algorithm. In: Advances in Neural Information Processing Systems 2002, MIT Press (2002) 109-114
20. Vapnik, V.: Statistical Learning Theory. John Wiley and Sons, New York (1998)
21. Stone, M.: Cross-validatory choice and assessment of statistical prediction. Journal of the Royal Statistical Society 36(1) (1974) 111-147
22. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer (2001)
23. Kohonen, T., Hynninen, J., Kangas, J., Laaksonen, J., Torkkola, K.: LVQ-PAK: The learning vector quantization program package. Technical Report A30, Helsinki University of Technology, Laboratory of Computer and Information Science (1996)