Multimed Tools Appl DOI 10.1007/s11042-016-3820-5

Real-time wrist localization in color images based on corner analysis

Sofiane Medjram 1 & Mohamed Chaouki Babahenini 2 & Abdelmalik Taleb-Ahmed 3 & Yamina Mohamed Ben Ali 1

Received: 18 January 2016 / Revised: 24 July 2016 / Accepted: 28 July 2016
© Springer Science+Business Media New York 2016

Abstract Hand detection and gesture recognition have become very popular in recent human-computer interaction systems. Although several hand detection methods have been proposed in the literature, few of them use the wrist as a detection cue, and others impose constraints on sleeve length and hand orientation. In this work, we present a new two-stage wrist localization algorithm designed for hand detection and gesture recognition systems. The first stage separates the skin region containing the hand from the background; the second stage localizes the wrist in the resulting skin mask. The main contribution of the proposed method is the analysis of corners along the contour of the skin mask to localize the wrist. In an evaluation on 437 color images with their ground truth and three sets of skin masks, we compared our method with other efficient methods from the literature, and the results obtained were very satisfactory.

Keywords Skin detection . Wrist localization . Hand detection . Corner detection . Local minimum . Human-computer interaction

* Sofiane Medjram [email protected] Mohamed Chaouki Babahenini [email protected] Abdelmalik Taleb-Ahmed [email protected] Yamina Mohamed Ben Ali [email protected]

1 Computer Science Department, LRI, University of Badji Mokhtar, B.P 12, 23000 Annaba, Algeria

2 Computer Science Department, LESIA, University of Mohamed Khider, B.P 145, 07000 Biskra, Algeria

3 LAMIH UMR CNRS 8201 University of Valenciennes and Hainaut-Cambresis, Valenciennes, France


1 Introduction

Hand gestures are an important means of communication between human beings because they are natural and offer many degrees of freedom. Recently, they have played an important role in interaction in many computer vision applications, including human-computer interfaces [36], augmented reality [3] and more [22, 29, 32]. To allow natural interaction with a computer system, several hand detection methods have been proposed; they fall into two categories. The first category is hardware-based: the user has to wear a glove device to perform the interaction [13, 22, 32]. Although this category offers high detection accuracy, it remains unnatural, expensive and inconvenient for daily use. The second category is vision-based: these methods use an acquisition device (a webcam, several webcams, a depth laser) to acquire images, then extract hand features from these images and classify them. Several methods exist for extracting and classifying hand features, based on skin color [1, 3, 5, 6, 12, 16, 17, 19, 24–26, 29, 31, 35, 36], Haar descriptors [16, 28], motion [4, 25], texture [34] and image depth [2, 8, 30]. Methods based on skin color are the most commonly used because they segment quickly, track efficiently over long periods and suit real-time acquisition. Unfortunately, few of these methods use the wrist as a cue for hand detection despite its importance in the anatomical structure of the hand; most of them constrain the sleeve length or depend on the hand orientation and the background color. Motivated by the important role that the wrist position plays in distinguishing the hand from the forearm, we present in this paper a new two-stage method for localizing it.

Compared with conventional hand detection methods, detection based on the wrist gives the user more flexibility and freedom in interaction. From color images containing the hand and the forearm, our method automatically localizes the wrist. The localization relies on the analysis of corners along the contour of the skin mask, without any constraint on the background color, hand orientation or sleeve length. We evaluate our proposed method of wrist localization and compare it with the methods proposed in [5, 31] in two stages. In the first stage, we measure their speed of execution and their efficiency of wrist localization using a set of ground-truth skin masks. In the second stage, we measure their sensitivity to errors (cumulative wrist localization errors) using three sets of skin masks: ground-truth skin masks, skin masks obtained by seed propagation [10] and skin masks obtained by the method of Jones and Rehg [7]. We also evaluate the robustness of our method against the common challenges that occur in HCI applications. First, we verify its robustness in complex background situations, then its robustness against hand rotation and scale, and finally its tolerance to noise.

The paper is structured as follows. Section 2 introduces the hand detection techniques based on skin color presented in the literature. Section 3 presents the main contribution of the paper. Section 4 reports the experimental results, Section 5 presents the limits of the method and Section 6 concludes the work.


2 Related works

Methods based on skin color for hand detection and gesture recognition have made significant contributions to the fields of human-computer interaction and computer vision. Recently, methods that use the wrist as a detection cue have gained importance and have been the subject of recent research. The methods of [18, 20, 27, 33] propose simple forms of hand detection. First, they separate the hand from the background using a common color space (YCbCr, HSV, YIV); then they detect the hand using convex hull or high-curvature (corner) measures (Fig. 1). These methods are fast and recommended for simple computer vision applications, but they are sensitive to complex backgrounds, to lighting changes and to scenes in which both the hand and the forearm are present. Other methods have been proposed in [16, 23, 25, 29, 35] to deal with complex backgrounds and lighting changes. The authors of [29] combined histogram and color information to segment the skin region well. The authors of [35] corrected the color of the image using the RACE algorithm, then modeled the skin with a Gaussian mixture. In [16], the authors combined the YCrCb color space with Haar-like features; compared to the methods of [29, 35], the method of [16] is more efficient. In [23], the authors used a model based on YCbCr and clustered the chrominance using K-means. In [25], the authors used skin color and motion cues to obtain the potential hand region, then identified the optimal hand location using their method named Motion Time Image (MTI). Compared to the methods above, this method shows acceptable detection accuracy, but it does not work when the hand is static.

The new generation of hand detection methods localizes the wrist on the skin mask using two approaches: one based on the analysis of the contour width and one based on the analysis of the contour shape [3, 5, 14, 19, 31].

2.1 Approach based on contour width

The first approach localizes the wrist by analyzing the width of the contour region with respect to its orientation. If the width values change significantly, the position of this change along the contour represents the position of the wrist. Choi and Seo [3] proposed a wrist localization method that assumes that “the skin color of the forearm has different brightness from other colored skin regions”. After forearm extraction and mean direction assignment, they determined the wrist position by finding the set

Fig. 1 The conventional hand detection method [18]: (a) skin detection, (b) convex hull, (c) convexity defects


Fig. 2 The method of wrist localization proposed in [3]

of contour points whose distances (the width) to the mean line start to become constant (Fig. 2). Attila and Tamás proposed a wrist localization method in [14]. After obtaining the direction of the skin region, the wrist position is determined by finding the significant change in the width of the contour region (Fig. 3). Vidya et al. [31] proposed another method of wrist localization, which determines the position of the wrist by finding the local minimum of the width values of the contour region (Fig. 4). After separating the region containing the hand and the forearm, they enclose it in a rectangle and calculate the width of the forearm at different positions along the height of this rectangle. The widths, plotted against their positions along the height of the forearm, form a graph on which the local minimum corresponding to the wrist is computed. The methods using the approach based on the analysis of the contour width achieve high wrist detection accuracy. Unfortunately, they are sensitive to hand orientation and to the use of gestures: when the user's hand is rotated or forms certain gestures, the significant change between the contour widths is lost, and therefore no wrist is detected.
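The width-profile idea described for [31] can be illustrated with a short Python sketch. It is not the implementation of [31]: it assumes a binary mask in which the forearm runs vertically, takes the width as the number of skin pixels in each row, and uses hypothetical function names.

```python
import numpy as np

def width_profile(mask):
    """Number of skin pixels in each row of a binary mask."""
    return np.asarray(mask, dtype=bool).sum(axis=1)

def wrist_row_from_width(mask):
    """Row index of the first interior local minimum of the width profile,
    or None when the profile has no such minimum (no wrist detected)."""
    widths = width_profile(mask)
    rows = np.flatnonzero(widths > 0)
    if rows.size < 3:
        return None
    first, last = rows[0], rows[-1]
    w = widths[first:last + 1]
    for i in range(1, len(w) - 1):
        # Strictly narrower than the previous row, not wider than the next one.
        if w[i] < w[i - 1] and w[i] <= w[i + 1]:
            return int(first + i)
    return None
```

On a mask whose width is constant along its height, the function returns None, which mirrors the missing-detection case mentioned above for this family of methods.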

2.2 Approach based on contour shape

The second approach localizes the wrist by analyzing the shape properties of the contour region. The key property is that the wrist appears as a local minimum along the contour shape.

Fig. 3 The method of wrist localization proposed in [14]


Fig. 4 The method of wrist localization proposed in [31]

Grzejszczak et al. [5] proposed a method of wrist localization using the local minimum property of the contour shape (Fig. 5). After skin segmentation, they rotate the obtained region to the horizontal, then detect the wrist position by finding the corresponding local minima in the upper and lower parts of the contour. The methods using the approach based on the analysis of the contour shape achieve good wrist detection accuracy and, compared to the methods of the first approach, these

Fig. 5 The method of wrist localization proposed in [5]


methods are not sensitive to hand orientation or to the use of gestures, so a wrist is always detected.

3 Proposed method

3.1 List of abbreviations

Before describing our proposed method, we list the abbreviations used in its stages (Table 1). The main idea of our method is based on the theory that, in 2D space, a deformable concave object always has more corners than a curved one. Therefore, in the binarized skin mask Ɣ, which contains the two portions (hand and forearm), the number of corners situated at the hand side ρ will be greater than the number of corners

Table 1 List of abbreviations used in the stages composing our proposed method

Symbol     Designation
Ɣ          Binarized skin mask
ρ          Hand palm side
θ          Skin region orientation
ε          Skin region centre
£          Skin probability map
α          Skin seeds
ς          Skin region contour
CM         Our proposed method: corner method for wrist localization
DeltaX     Vertical geometric representation of Freeman code
DeltaY     Horizontal geometric representation of Freeman code
E          Maximal threshold used for correct wrist detection
e          Wrist localization error value measured between U′V′ and UV
e_avg      Average error of wrist detection, equal to mean ± std. of e values
FP         Background pixels detected as skin
GT         Ground-truth skin images
H          Harris corners detector algorithm
JRE        Skin images using the method of Jones and Rehg
LMM        Local minimum method for wrist localization
PBA        Skin images using the method of propagation seeds
t_avg      Average processing time, equal to mean ± std. of processing time values
t_max      Maximum processing time
TN         Skin pixels detected as background
U, V       Wrist points annotated on colour images of the database
U′, V′     Detected wrist corners
W          The centre point of U and V
W′         The centre point of U′ and V′
X0         Freeman code start point


situated at the elbow, whatever the gesture used. Relative to the orientation θ and the center of gravity ε of this region, the corner U′ nearest to the center point ε is the corner representing the wrist location (Fig. 6). By using corners to localize the wrist, our method automatically inherits the invariance of corners to geometric (orientation and rotation), scale and photometric (illumination) changes of the hand. Our method is composed of two stages:

- The first stage separates the region containing the hand from the background using a skin detection method.
- In the second stage, we detect the wrist based on the theory stated above.

An overview of our wrist localization algorithm is given in Algorithm 1.

3.2 Skin segmentation

Skin segmentation is usually the first step in hand detection methods, and the results of the second stage depend strongly on the quality of the skin mask. In our work, we chose a segmentation based on spatial analysis; compared to the other approaches (parametric, statistical, adaptive and texture-based) [9, 37], this approach gives good results under lighting changes and complex backgrounds. Specifically, we used the method proposed in [10] (also used in [5, 11, 19]), which detects the skin region by propagating skin seeds. After the skin seeds α are extracted from the skin probability map £, a propagation phase is executed from these seeds in order to collect the true skin pixels Ɣ independently from the false negative ones. The algorithm of this method is presented in Algorithm 2.
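As a rough illustration only, and not a reproduction of Algorithm 2 or of the method of [10], the seed-and-propagate idea can be sketched in Python as a two-threshold region growing. The thresholds `seed_thr` and `grow_thr`, the 8-neighbourhood and the function name are assumptions made for this sketch; the propagation rule of [10] is more elaborate.

```python
from collections import deque
import numpy as np

def propagate_seeds(prob_map, seed_thr=0.8, grow_thr=0.4):
    """Grow a skin mask from high-probability seeds (assumed thresholds).

    Seeds are pixels with probability >= seed_thr. A pixel is added to the
    mask when it is 8-connected to the mask and its probability >= grow_thr.
    """
    p = np.asarray(prob_map, dtype=float)
    mask = p >= seed_thr                      # skin seeds
    queue = deque(zip(*np.nonzero(mask)))     # breadth-first propagation
    height, width = p.shape
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < height and 0 <= cc < width
                if inside and not mask[rr, cc] and p[rr, cc] >= grow_thr:
                    mask[rr, cc] = True
                    queue.append((rr, cc))
    return mask
```

With this rule, a pixel of moderate probability is kept only if it is connected to a seed, so isolated skin-colored background pixels are not collected.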


Fig. 6 The different steps composing our method of wrist localization: Step 1, image input; Step 2, skin segmentation; Step 3, corner points detection; Step 4, hand side detection; Step 5, wrist localization; Step 6, hand detection


3.3 Wrist localization

The binarized skin mask Ɣ obtained from the first stage contains, in most cases, both portions: the hand and the forearm. These portions are separated by localizing the corner U′ nearest to the center ε of the region, on the side ρ that contains the highest number of corners. An overview of our proposed method is given in Algorithm 3.
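The selection rule stated above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Algorithm 3 itself: corners are given as (x, y) pairs, the two sides are obtained by splitting the corners at the center ε along the axis given by the orientation θ, and ties go to the second side. The function name is hypothetical.

```python
import numpy as np

def locate_wrist_corner(corners, centre, orientation):
    """Return the corner nearest to the centre on the side with most corners.

    corners: iterable of (x, y); centre: (x, y);
    orientation: "horizontal" or "vertical" (assumed encoding of theta).
    """
    pts = np.asarray(corners, dtype=float)
    c = np.asarray(centre, dtype=float)
    axis = 0 if orientation == "horizontal" else 1
    side_a = pts[pts[:, axis] < c[axis]]
    side_b = pts[pts[:, axis] >= c[axis]]
    # Hand side rho: the side holding the larger number of corners.
    hand = side_a if len(side_a) > len(side_b) else side_b
    if len(hand) == 0:
        return None
    distances = np.linalg.norm(hand - c, axis=1)
    nearest = hand[np.argmin(distances)]
    return (float(nearest[0]), float(nearest[1]))
```

For a horizontally oriented region with two corners to the left of the center and five to the right, the right side is taken as the hand side and its corner closest to the center is returned as U′.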

3.3.1 Region orientation

To find the orientation θ of the skin region Ɣ, we analyze its contour ς using the Freeman code, a chain code algorithm that encodes the connection between each pair of consecutive contour points according to their connectivity, V4 or V8. In other words, the algorithm starts from an initial contour point X0 and an initial direction, then moves from this pixel to its immediate neighbor. At each step, it returns a code (symbol) representing the direction of the movement, until it returns to the initial point X0. In our method, we chose the V8 connectivity to code the contour points (Fig. 7). Instead of the default symbols returned by the Freeman code (0 to 7), we used their geometric codes DeltaX and DeltaY to make the region orientation easier to determine (Table 2).

Fig. 7 The Freeman code symbols for a V8 connection


Table 2 The Freeman geometric code for a V8 connection

Code V8    DeltaX    DeltaY
0          1         0
1          0         0
2          0         1
3          0         0
4          -1        0
5          0         0
6          0         -1
7          0         0

Thus, the orientation θ of the region is obtained by summing all the values of DeltaX and DeltaY returned at each point of the contour. If the sum of DeltaX is higher than the sum of DeltaY, the orientation is horizontal; otherwise, it is vertical (Fig. 8). The region orientation was also determined in [5], where it was obtained by brute-force or randomized algorithms applied to the contour points in order to find the longest chord. Although that method is accurate, it is slower than our proposed method. The steps used to determine the skin region orientation are presented in Algorithm 4.
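The orientation rule can be sketched in Python from Table 2. Two points are assumptions of this sketch rather than statements of the paper: the sums are taken over absolute values (the signed steps of a closed contour cancel out), and the step-to-code mapping uses a y axis pointing up so that code 2 matches DeltaY = 1 in Table 2. The names are hypothetical and this is not Algorithm 4.

```python
# Displacement between consecutive contour points -> Freeman V8 code
# (assumed convention: code 0 points along +x, codes turn towards +y).
STEP_TO_CODE = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
                (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

# Geometric code (DeltaX, DeltaY) as printed in Table 2:
# diagonal codes contribute (0, 0).
GEOMETRIC_CODE = {0: (1, 0), 1: (0, 0), 2: (0, 1), 3: (0, 0),
                  4: (-1, 0), 5: (0, 0), 6: (0, -1), 7: (0, 0)}

def chain_code(contour):
    """Freeman codes of a closed, ordered, 8-connected contour of (x, y)."""
    codes = []
    n = len(contour)
    for i in range(n):
        (x0, y0), (x1, y1) = contour[i], contour[(i + 1) % n]
        codes.append(STEP_TO_CODE[(x1 - x0, y1 - y0)])
    return codes

def region_orientation(contour):
    """'horizontal' if the summed |DeltaX| exceeds the summed |DeltaY|."""
    sum_x = sum_y = 0
    for code in chain_code(contour):
        dx, dy = GEOMETRIC_CODE[code]
        sum_x += abs(dx)   # absolute values: an assumption of this sketch
        sum_y += abs(dy)
    return "horizontal" if sum_x > sum_y else "vertical"
```

A single pass over the contour is enough, which is consistent with the speed argument made above against a longest-chord search.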

Fig. 8 The proposed method for the determination of region's orientation


3.3.2 Wrist detection

The hand is a deformable organ with which we can produce more than 20 gestures. In a 2D plane, a concave shape can be distinguished from a curved one by its number of corner points. Based on this theory, we can distinguish the location of the hand ρ from that of the elbow: for every gesture produced by the user, the number of corners at the hand location ρ in 2D space is always higher than at the elbow location. We used the Harris corner detector [15, 21] in the proposed method; its equation is as follows:

$$H(u, v) = \sum_{x, y} w(x, y)\,[I(x + u, y + v) - I(x, y)]^2 \qquad (1)$$

Having determined the hand location ρ, the wrist position is then detected by finding the corner point U′ nearest to the center of the region Ɣ (Algorithm 5).
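To show how Eq. 1 behaves, the following Python sketch evaluates it directly on a small image. The uniform window w = 1 over a 3×3 patch and the function names are assumptions. Taking the minimum over a few shifts is the older Moravec-style criterion, used here only for illustration; the Harris detector replaces this search with a local approximation based on image gradients.

```python
import numpy as np

def ssd_response(image, x, y, u, v, half=1):
    """Eq. 1 with a uniform window w = 1 on a (2*half+1)^2 patch at (x, y).

    The patch and its shifted copy must lie inside the image.
    """
    img = np.asarray(image, dtype=float)
    patch = img[y - half:y + half + 1, x - half:x + half + 1]
    shifted = img[y + v - half:y + v + half + 1,
                  x + u - half:x + u + half + 1]
    return float(np.sum((shifted - patch) ** 2))

def min_shift_response(image, x, y, half=1):
    """Smallest response over four shifts: large only at corner-like points."""
    shifts = [(1, 0), (0, 1), (1, 1), (1, -1)]
    return min(ssd_response(image, x, y, u, v, half) for u, v in shifts)
```

On a binary image, the response stays at zero in flat areas and along a straight edge (for the shift parallel to the edge), and becomes positive for every shift only where two edge directions meet, which is why rough contours produce additional corners.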

3.4 Selecting data from database

The database (footnote 1) chosen to evaluate our method of wrist localization contains 899 color images of hand gestures, with and without the forearm. To keep only the images containing the forearm, we used a PCA dimensionality reduction approach to classify them into two classes. After removing the misclassified images from the forearm class, we obtained 437 color images with their ground truth for the evaluation.

4 Implementation tests and experimental study

Our experiments were conducted on an HP G62 notebook equipped with an Intel Core™ i3 CPU at 2.27 GHz, 4 GB of RAM and Windows 7. The method was implemented in Matlab 2013a and assessed using the hand gesture recognition database (footnote 1) created by [5, 19]. The result of the skin region extraction is shown in Fig. 9. Fig. 10 presents some ground-truth images with annotated wrist points.

1 http://sun.aei.polsl.pl/~mkawulok/gestures


Fig. 9 The result of skin detection using the spatial analysis segmentation

We evaluated the performance of our wrist localization method on 437 color images selected from the database (footnote 1) with their ground truth (only images containing the hand with the forearm), in terms of processing time, error statistics and cumulative error distribution. For the processing time, we re-implemented the methods of [5, 19] and [31], then compared them with our proposed method over 11 tests using the ground-truth skin masks. For the error statistics, again using the ground-truth skin masks, we compared the detected wrist points with the annotated ones, as in the method of [5, 19]. A correct wrist point is defined by a detection error e = |WW′|/|UV| bounded by a threshold E:

$$e = \frac{|WW'|}{|UV|}, \qquad 0 < e < E \qquad (2)$$

where W′ is the detected wrist point, W, U and V are the annotated points, and E is the maximal acceptable wrist detection error, E = 2 (Fig. 11). Because our method uses only one corner point U′ for wrist detection, to adapt it to the error statistics and the cumulative error distribution we project the detected corner point U′ vertically or horizontally onto the hand contour, according to the orientation θ of the region (Fig. 12). Once we have the two corner points U′ and V′ and their centre point W′, we calculate e for the corresponding image. Processing

Fig. 10 The ground-truth images with the wrist locations annotated (blue and pink crosses)

Fig. 11 The detected wrist points (U′, V′, W′) and the corresponding localization error e = 0.5

time and error statistics obtained for the threshold values E = 1.0, E = 0.5 and E = 0.3 using the ground-truth skin masks are presented in Table 3. The methods compared in Table 3 use two different approaches to wrist localization: one based on the analysis of the contour shape and one based on the analysis of the contour width. From the results obtained by the first approach (our proposed method and the method of [5, 19]), we can notice that our method presents fast

Fig. 12 The wrist localization points (U′, V′, W′) obtained by our proposed method on some images of the database (U′ and V′ are shown as blue stars and W′ as a yellow circle), and their projection on the corresponding ground-truth images (U and V are shown as yellow stars and W as a red circle)

Table 3 Comparison results between the Corner and Local minimum wrist localization methods in measure of processing time and error statistics

437 color images              Our proposed         The method proposed    The method proposed
                              method (CM)          in [5, 19] (LMM)       in [31]

Processing time
  t_avg                       0.1083 ± 0.0504 s    0.1643 ± 0.0658 s      0.0972 ± 0.0495 s
  t_max                       0.3694 s             0.6123 s               0.4939 s
Total error
  e_avg                       0.8987 ± 0.6675      0.9872 ± 0.6538        0.7248 ± 0.7613 *
E = 1.0
  Number of e > E             185 (42.33 %)        190 (43.47 %)          123 (28.14 %) *
  e_avg > E                   0.6554               0.6747                 0.5023 *
E = 0.5
  Number of e > E             277 (63.38 %)        337 (77.11 %)          206 (47.13 %) *
  e_avg > E                   0.8127               0.9298                 0.6346 *
E = 0.3
  Number of e > E             335 (76.65 %)        372 (85.12 %)          252 (57.66 %) *
  e_avg > E                   0.8648               0.9628                 0.6757 *
Rate of unsuccessful
wrist localization images     0 %                  0 %                    10 %

responses (the lowest values of t_avg, standard deviation and t_max) with good precision (the lowest values of e_avg) compared to the method of [5, 19]. The first reason is algorithmic complexity: our method uses fewer and simpler stages (skin detection, region orientation, corner detection and wrist localization) than the method of [5, 19] (skin detection, region orientation, region rotation, local minimum detection), which makes our method faster regardless of the machine used for the experiments. The processing time histogram (Fig. 13) shows that our method processed more than half of the evaluated images in only 0.100 s, whereas for the second method most images are distributed over 0.100, 0.150, 0.160 and 0.170 s. Before discussing the other reasons, it is important to note that the method of [5, 19] as re-implemented for this evaluation is more optimized than the original one. Instead of searching for the longest chord among the contour points with brute-force or randomized algorithms in order to rotate the skin region, we surrounded the skin region by an ellipse and extracted its two diagonal

Fig. 13 The processing time histogram comparison between corner and local minimum wrist localization methods (x-axis: time in seconds; y-axis: number of occurrences; series: CM TP, LMM TP)


axes to determine the orientation. This step reduced the computational complexity of the method of [5, 19] by a factor of 10,000 or more. The second reason concerns the method of [5, 19] itself: as its authors mention, it is sometimes sensitive to long skin regions (the case studied here, “hand with forearm presented in the scene”); a long region can generate false local minimum points and a false detection of the hand side. The histogram of the average detection error (Fig. 14) confirms that our method has good detection accuracy. For the method of [5, 19], most results are spread over the error bins between 0.1 and 1.0 because of its sensitivity to long regions (253 images (57.89 %) had an average error between 0.1 and 1.0, and only 113 images (25.85 %) between 0.1 and 0.5). In contrast, most results of our method are concentrated between 0.1 and 0.5, with the highest peak at 0.1 (174 images (39.81 %) had an average error between 0.1 and 0.5, and 261 images (59.72 %) between 0.1 and 1.0). For the second approach to wrist localization, Table 3 shows that the method of [31] performs well compared to our proposed method, especially in terms of error statistics. The main reasons are the horizontal rotation stage applied in the method of [31] and the wrist feature used for localization in each method. The structure of the hand makes the two wrist points symmetric, and they are easy to identify when the hand region is oriented horizontally or vertically (as is done in the method of [31]).
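The ellipse-based orientation used in the re-implementation of [5, 19] is not detailed in the text. One common way to obtain it, shown here purely as an assumption, is the ellipse with the same second-order central moments as the region; the function name is hypothetical.

```python
import numpy as np

def ellipse_orientation(mask):
    """Angle in degrees of the major axis of the moment-equivalent ellipse.

    The angle is measured from the x axis in image coordinates (y grows
    downwards) and lies in [-90, 90].
    """
    ys, xs = np.nonzero(np.asarray(mask, dtype=bool))
    x = xs - xs.mean()
    y = ys - ys.mean()
    mu20 = np.mean(x * x)   # spread along x
    mu02 = np.mean(y * y)   # spread along y
    mu11 = np.mean(x * y)   # covariance term
    angle = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
    return float(np.degrees(angle))
```

The cost is linear in the number of region pixels, whereas an exhaustive longest-chord search compares pairs of contour points, which is in line with the large reduction in complexity reported above.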
Our method does not rotate the hand region; it obtains the second wrist point by projecting the nearest corner horizontally or vertically, depending on the global orientation of the detected skin region. This determination of the second wrist point is correct but not always accurate, especially when the skin region is V-shaped. Fig. 15 shows an example of the impact of horizontal rotation on the wrist localization results: it gives the localizations of both methods, with their corresponding errors, on the same image. Although our method localizes the first wrist point at the same position as the expert, the method of [31] gives the best accuracy (e = 0.1519).
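The error values quoted in this comparison follow Eq. 2. A minimal Python sketch of that measure is given below; the function names are hypothetical, and W and W′ are taken as the centre points of (U, V) and (U′, V′) as listed in Table 1.

```python
import numpy as np

def wrist_error(u, v, u_det, v_det):
    """e = |WW'| / |UV| with W, W' the midpoints of (U, V) and (U', V')."""
    u, v, u_det, v_det = (np.asarray(p, dtype=float)
                          for p in (u, v, u_det, v_det))
    w = (u + v) / 2.0              # annotated wrist centre W
    w_det = (u_det + v_det) / 2.0  # detected wrist centre W'
    return float(np.linalg.norm(w - w_det) / np.linalg.norm(u - v))

def is_correct(e, threshold):
    """Correct detection according to Eq. 2: 0 < e < E."""
    return 0.0 < e < threshold
```

Because the distance between the two centres is divided by the annotated wrist width |UV|, the measure does not depend on the image resolution or on the size of the hand in the image.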

Fig. 14 The average error detection histogram comparison between corner and local minimum wrist localization methods (x-axis: e_avg from 0.1 to 2; y-axis: number of occurrences; series: CM, LMM)


Fig. 15 Wrist localization accuracy results obtained by our method and the method of [31] using the same image (method of [31], horizontal region rotation: error = 0.1519; our proposed method, global region orientation is horizontal: error = 0.2194)

Indeed, a horizontal rotation of the skin region may improve the localization accuracy of our proposed method. We consider this a perspective for future work, where we will evaluate it. An example of a successful improvement of wrist localization by adding the horizontal rotation stage to our method is presented in Fig. 16. The second reason concerns the wrist feature used for localization in each method. As stated above, our method defines the wrist as the corner nearest to the center of the skin region on the hand side, which contains the highest number of corners. The method of [31] defines the wrist as the position of the significant change in the width of the hand contour region. With ground-truth skin masks, we would normally expect good results in every case. However, after checking the ground-truth skin data used in the experiment, we found that the database contains several poor masks (60 images, 14 % of the data), with additional false skin pixels and rough contours. These two defects do not affect the results of the method of [31]; they affect only the results of our proposed method. The additional skin pixels and the roughness of the contours generate additional corner detections, which generally lead to additional localization errors (even with an ideal orientation of the skin region). These defects and their effect on the results of our wrist localization method are illustrated in Fig. 17. The serious weaknesses of the method of [31] are hand gestures and hand rotation. When the hand is rotated or forms a gesture, the significant change of width values can


Fig. 16 Example of succeeded enhancement of wrist localization accuracy for our method using the horizontal rotation stage (our proposed method before horizontal rotation: error = 0.2194; after horizontal rotation: error = 0.0880)

be lost or detected away from the points annotated by the expert. Although the method of [31] gives good detection accuracy compared to our proposed method, owing to its rotation stage and its insensitivity to the poor masks of the database, it returns no wrist detection for several images used in the experiment (10 % of the images (47/437)). In addition, the experiment used only 437 color images, with or without rotation and gestures; if the amount of data increases, the number of images without wrist detection will increase as well. These limits make the method of [31] an inflexible method of wrist localization and less effective than our proposed method (corners are insensitive to rotations and gestures, so a wrist is always detected). Fig. 18 illustrates the robustness of our method compared to the method of [31] under gestures and hand rotation. Until now, we have only measured and compared the performance of our method with the methods of [5, 19] and [31] in their second stage (the wrist localization stage) using the ground-truth skin masks, and we showed that the approach based on the analysis of the contour shape is more effective than the approach based on the contour width. In multi-stage approaches, however, the later stages are generally affected by the performance of the first ones. Therefore, given the effectiveness of the contour shape approach compared to the contour width approach, we measured the sensitivity of its methods using the cumulative error distribution and compared them on both


Fig. 17 The poor masks of the ground-truth database and their effects on the wrist localization results obtained by our method (the yellow and green asterisks in the images on the right are the localization results obtained)

stages using three sets of skin masks: ground-truth skin masks, skin masks obtained by seed propagation [10] and skin masks obtained by the method of Jones and Rehg [7] (Table 4). As shown in Fig. 19, our method achieves a high cumulative wrist localization rate on the three sets of skin masks. Although the CM JR graph gives the

Fig. 18 The serious gaps affecting the method of [31] and the robustness of our method against them (no wrist detection by the method of [31]; accurate wrist detection by our method)

Table 4 Comparison between Corner and Local Minimum wrist localization methods based on skin detection errors and cumulative wrist detection error distribution

Skin detection   Skin detection errors   Wrist localization   Value of e for the set of data
method           FP         TN           method               0.2      0.5      0.7      1.0      1.5

GT               -          -            CM                   0.187    0.377    0.452    0.583    0.778
                                         LMM                  0.100    0.232    0.352    0.566    0.834
PBA              5.32 %     1.01 %       CM                   0.100    0.351    0.455    0.632    0.833
                                         LMM                  0.045    0.171    0.321    0.500    0.732
JR               4.55 %     1.49 %       CM                   0.192    0.382    0.451    0.645    0.887
                                         LMM                  0.081    0.171    0.332    0.515    0.753

highest results, the CM GT graph remains the best. We can notice that where the error e is inferior to e = 1 both graphs are very close and the graph of CM PBA is relatively bellow, once the e starts to increase, the graph of CM GT starts to decrease and the graphs of CM JR and CM PBA continued to grow high, which mean they continue to lose precision in localization. The difference between these graphs is strongly affected by the false positive FP (background pixels classified as skin) and the true negative TN (skin pixels classified as background) skin pixels obtained by the PBA and JR methods (5.32 %FP and 1.01TN for PBA and 4.55 %FP 1.49 %TN for JR). These misclassified pixels have a double impact; they can improve the number of corners situated at the hand side which give high localization results, or they give strong false localizations by improving the number of corners situated at the opposite side of the hand and that’s what we noticed in both graphs shown in the Fig. 19. The second method, furthermore, shows high results at the graph using ground-truth skin masks LMM GT and starts to lose accuracy in the other ones (LMM PBA and LMM JR graphs)(Fig. 20). The LMM PBA and LMM JR graphs lost their precision because to their rate of misclassified skin pixels and the roughness of their contours. Error CDF 1 0.9 0.8 0.7

Fig. 19 Cumulative wrist detection error e distribution of the Corner wrist localization method (error CDF F(x) against x from 0 to 2; curves: CM GT, CM PBA, CM JR)


Fig. 20 Cumulative wrist detection error e distribution of the Local Minimum method (empirical CDF F(x) against x from 0 to 2; curves: LMM GT, LMM PBA, LMM JR)
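As an illustration of how the values of Table 4 and the curves of Figs. 19 and 20 can be obtained, the following Python sketch evaluates the cumulative distribution of the wrist detection error at the thresholds of Table 4. The function name `error_cdf` and the sample errors are hypothetical (they are not the data of the experiments), and we assume that an image is counted at threshold e when its error is less than or equal to e.

```python
import numpy as np

def error_cdf(errors, thresholds):
    """Fraction of images whose wrist error is <= each threshold (assumed convention)."""
    errors = np.asarray(errors, dtype=float)
    return [float(np.mean(errors <= t)) for t in thresholds]

# Hypothetical per-image wrist errors, for illustration only.
sample_errors = [0.1, 0.3, 0.45, 0.6, 0.9, 1.2, 1.4, 1.8]
thresholds = [0.2, 0.5, 0.7, 1.0, 1.5]  # values of e used in Table 4

cdf_values = error_cdf(sample_errors, thresholds)
print(dict(zip(thresholds, cdf_values)))
```

With this convention, a higher value at a given e means that more images are localized within that error.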

Finally, as our proposed method uses the PBA algorithm for skin region segmentation in the first stage, we can confirm from the processing time and the cumulative error of CM PBA (compared with LMM PBA, Table 4) that our method of wrist localization is quick, efficient and recommended for hand detection. However, to show that our method is robust to photometric and rotation changes, that it is suitable for real-time applications and that it is faithful against noises, we need additional experiments.
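The FP and TN rates attached to each skin detector in Table 4 can be measured by comparing a detected skin mask with its ground-truth mask. The sketch below is one possible way to do so. The function name, the toy masks and the normalization (each rate is divided by the number of ground-truth pixels of the class concerned) are assumptions, since the normalization is not stated in this section.

```python
import numpy as np

def skin_error_rates(predicted, ground_truth):
    """Return (FP rate, missed-skin rate) of a binary skin mask.

    FP rate: background pixels classified as skin / background pixels.
    Missed-skin rate (TN in Table 4): skin pixels classified as background / skin pixels.
    Both normalizations are assumptions made for this sketch.
    """
    predicted = np.asarray(predicted, dtype=bool)
    ground_truth = np.asarray(ground_truth, dtype=bool)
    background = ~ground_truth
    fp = np.sum(predicted & background) / max(np.sum(background), 1)
    missed = np.sum(~predicted & ground_truth) / max(np.sum(ground_truth), 1)
    return float(fp), float(missed)

# Toy 2 x 4 masks (1 = skin), for illustration only.
gt_mask = np.array([[1, 1, 0, 0],
                    [1, 1, 0, 0]])
detected = np.array([[1, 0, 0, 1],
                     [0, 1, 0, 0]])

rates = skin_error_rates(detected, gt_mask)
print(rates)
```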

4.1 The robustness to photometric and rotation changes

We mentioned in the introduction that most methods of hand detection based on skin color are sensitive to complex backgrounds, sleeve length and hand orientation. The previous experiments show that our proposed method deals well with these challenges. Like the method of [5, 19], we use seed propagation for skin detection. On the 437 color images used for evaluation, which have different dimensions and were taken in different working conditions (complex background, lighting changes, uncontrolled light, color similarities, etc.), it detects the skin pixels with few errors (5.32 % FP and 1.01 % TN, see the PBA row of Table 4). The sleeve length constraint commonly imposed by standard methods of hand detection (hand region only, or long sleeves) is overcome by the wrist localization process, which works on images containing both the hand and the forearm regions, not only the hand region. Concerning the hand orientation and scale invariance constraints, our method benefits from the properties of corners (corners are insensitive to rotation and scale). In Fig. 21, we show some results of wrist localization of our proposed method and of the method of [5, 19] where the hand of the user is rotated. The method of [5, 19] localizes the wrist using the property of local minimum. However, when the hand of the user is rotated, the shape of its skin mask and its

Fig. 21 Illustration of the maintainability of our proposed method against rotations and scale using the properties of corner features: (a) our proposed method, (b) the method of [5, 19]

contour will be changed; therefore, the property of local minimum used for the localization of the wrist can be lost. As shown in Fig. 21b, the wrist points detected by the method of [5, 19] (red circles) are far from the points annotated by the expert (yellow *). On the other hand, in Fig. 21a, our proposed method still maintains a good wrist localization (blue *), because it inherits the insensitivity of corners to rotation and scale.
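The insensitivity of corners to rotation and scale can be checked on an idealized example: the angle formed at a contour point by its two neighbours does not change when the contour is rotated and uniformly scaled. The sketch below is only an illustration on three hypothetical contour points; it ignores the resampling effects of real images and it is not the corner detector used in our experiments.

```python
import numpy as np

def vertex_angle(prev_pt, pt, next_pt):
    """Angle in degrees at pt between the segments towards prev_pt and next_pt."""
    a = np.asarray(prev_pt, dtype=float) - np.asarray(pt, dtype=float)
    b = np.asarray(next_pt, dtype=float) - np.asarray(pt, dtype=float)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def rotate_and_scale(points, angle_deg, scale):
    """Rotate 2-D points about the origin, then scale them uniformly."""
    theta = np.radians(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return scale * (np.asarray(points, dtype=float) @ rot.T)

# Three consecutive contour points (hypothetical values).
triple = np.array([[0.0, 0.0], [4.0, 1.0], [8.0, 0.0]])

angle_before = vertex_angle(*triple)
angle_after = vertex_angle(*rotate_and_scale(triple, angle_deg=37.0, scale=2.5))
print(round(angle_before, 2), round(angle_after, 2))
```

A local minimum of the contour width, in contrast, is measured along a fixed direction, which is why it can be lost when the hand is rotated.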

4.2 The recommendation for real-time applications

To show whether our method is suitable for real-time applications, we need to test it on real-time videos. However, since no available video database covers our case of wrist localization, where both the hand and the forearm are present in the scene, we recorded some videos with different users for the experiment. The videos have been taken

Table 5 Results of wrist localization processing time obtained using videos acquired from HP G62 webcam (RGB 320 × 240, fps 10)

                  Video1              Video2              Video3
Processing time   0.0773 ± 0.0205 s   0.0705 ± 0.0201 s   0.0696 ± 0.0195 s

using the webcam of our HP G62 laptop; the resolution was set to 320 × 240, the frame rate to 10 fps and the returned color space is RGB. The processing times obtained for the videos of three users are presented in Table 5. Hence, from these results obtained on an HP G62 laptop with no GPU acceleration, we consider that our method is suitable for real-time applications.
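The reasoning behind this statement can be made explicit: at 10 fps a new frame arrives every 0.1 s, so a frame is processed in time when its processing time stays below that period. The sketch below shows one way to measure a mean and a standard deviation per frame and to compare the values of Table 5 with the frame period. The function names are hypothetical, and reading the ± values of Table 5 as standard deviations is an assumption.

```python
import statistics
import time

def time_per_frame(process, frames):
    """Mean and standard deviation (seconds) of the time spent by process(frame)."""
    durations = []
    for frame in frames:
        start = time.perf_counter()
        process(frame)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)

def fits_frame_budget(mean_s, std_s, fps):
    """True when mean + one standard deviation stays within one frame period."""
    return mean_s + std_s <= 1.0 / fps

# Values reported in Table 5 (mean, assumed standard deviation), in seconds.
table5 = {
    "Video1": (0.0773, 0.0205),
    "Video2": (0.0705, 0.0201),
    "Video3": (0.0696, 0.0195),
}
within_budget = {name: fits_frame_budget(m, s, fps=10) for name, (m, s) in table5.items()}
print(within_budget)
```

For Video1, for instance, 0.0773 + 0.0205 = 0.0978 s, which is just below the 0.1 s frame period.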

4.3 The fidelity to noises

Finally, to verify whether our method is robust to noises, we evaluated its wrist localization performance using nine noise configurations: three types of noise (Gaussian, Salt and Pepper, and Speckle), and for each type three variants generated by modifying the value of the variance. The results of adding noises to the images of the database showed that the part mainly affected by the noises is the contour of the skin mask. Even when a noise reduction algorithm is applied as a pre-treatment, the skin contours obtained after segmentation remain bad (Fig. 22). Despite these bad skin segmentations, the experiment showed that the localization of the wrist is maintained.

Fig. 22 Skin segmentation results after the application of noises (panels: image with Gaussian noise, variance = 0.01; image with Salt and Pepper noise, variance = 0.02; image with Speckle noise, variance = 0.03)
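For readers who wish to reproduce this kind of test, the three types of noise can be generated as in the following Python sketch, applied here to a gray image with values in [0, 1]. The function names and the exact noise models are assumptions (additive zero-mean Gaussian noise, Salt and Pepper noise where the parameter is read as the fraction of altered pixels, and multiplicative Gaussian noise for Speckle); the tool used in our experiments may define them differently.

```python
import numpy as np

def add_gaussian(image, variance, rng):
    """Additive zero-mean Gaussian noise, clipped to [0, 1]."""
    noisy = image + rng.normal(0.0, np.sqrt(variance), image.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_salt_and_pepper(image, density, rng):
    """Set about density/2 of the pixels to 0 (pepper) and density/2 to 1 (salt)."""
    noisy = image.copy()
    draw = rng.random(image.shape)
    noisy[draw < density / 2] = 0.0
    noisy[draw > 1 - density / 2] = 1.0
    return noisy

def add_speckle(image, variance, rng):
    """Multiplicative noise: image + image * n, with n zero-mean Gaussian."""
    noisy = image + image * rng.normal(0.0, np.sqrt(variance), image.shape)
    return np.clip(noisy, 0.0, 1.0)

rng = np.random.default_rng(0)
image = np.full((64, 64), 0.5)  # hypothetical uniform gray test image

noisy_images = {
    "gaussian": add_gaussian(image, 0.01, rng),
    "salt_and_pepper": add_salt_and_pepper(image, 0.02, rng),
    "speckle": add_speckle(image, 0.03, rng),
}
for name, noisy in noisy_images.items():
    print(name, float(noisy.std()))
```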

Table 6 Some wrist localization results obtained after the noises applications (our proposed method CM, 437 color images)

                              CM with Speckle noise   CM with Salt and Pepper noise   CM with Gaussian noise
                              (variance = 0.03)       (variance = 0.02)               (variance = 0.01)
Processing time   t_avg       0.1334 ± 0.0836 s       0.1685 ± 0.1145 s               0.1580 ± 0.1011 s
                  t_max       0.4476 s                0.7161 s                        0.4782 s
Total error       e_avg       0.8988 ± 0.5936         0.7474 ± 0.5582                 0.8707 ± 0.5769
Number of e > E   E = 1.0     180                     136                             172
                  E = 0.5     303                     252                             307
                  E = 0.3     363                     317                             360

The results of the noise experiment are presented in Table 6. They show that the accuracy of wrist localization after the application of noises is almost similar to the accuracy obtained without noise (original method, Table 3, first column), except for the processing time, where the original method is faster. Due to the bad skin contours obtained after the application of noises, we expected bad wrist localizations as well, but this is not the case. The main reason is the roughness of the contour itself. Our method defines the wrist as the corner nearest to the centre of the skin region on the side that contains the highest number of corners; the roughness of the skin contour generates additional corner points, and because of them the localization of the wrist is maintained. The generation of additional corner points also explains the additional processing time of the method after the application of noises compared with the original method. Furthermore, the additional corner points generated by the roughness of the skin contour have two impacts: they can localize the wrist with or without the correct hand side. Even when the localization of the wrist is maintained, the hand side is not always detected at the right place. Gathering the results of the noise fidelity experiment, we conclude two things:

Fig. 23 An example of hand gesture where the wrist region does not contain a curvature with an angle less than or equal to 169°

• The fidelity of our proposed method against noises is positive if we consider the accuracy of wrist localization only.
• The fidelity of our proposed method against noises is negative if we consider both the accuracy of wrist localization and the correct detection of the hand side.
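The selection rule described above, and the way additional corners can change the detected hand side, can be sketched as follows. This is a simplified illustration under stated assumptions: the corners are given as 2-D points, the two sides are separated by the sign of the projection on a given axis through the centre of the skin region (how the sides are separated is not detailed in this section), and the function name and the coordinates are hypothetical.

```python
import numpy as np

def select_wrist_corner(corners, centre, axis):
    """Nearest corner to the centre, taken on the side that holds more corners.

    The side of a corner is the sign of its projection on `axis` (assumption).
    Ties between the two sides go to the positive side.
    """
    corners = np.asarray(corners, dtype=float)
    centre = np.asarray(centre, dtype=float)
    positive = (corners - centre) @ np.asarray(axis, dtype=float) >= 0
    keep = positive if positive.sum() >= (~positive).sum() else ~positive
    candidates = corners[keep]
    distances = np.linalg.norm(candidates - centre, axis=1)
    return candidates[np.argmin(distances)]

# Clean contour: most corners (fingers) lie on the positive side.
clean_corners = [(9, 1), (8, -1), (7, 2), (3, 0.5), (-6, 0)]
wrist_clean = select_wrist_corner(clean_corners, centre=(0, 0), axis=(1, 0))

# Rough contour: spurious corners on the opposite side outnumber the hand side.
spurious = [(-2, 1), (-3, -1), (-4, 2), (-5, 1)]
wrist_noisy = select_wrist_corner(clean_corners + spurious, centre=(0, 0), axis=(1, 0))

print(wrist_clean, wrist_noisy)
```

In the second call a corner near the centre is still returned, but on the opposite side, which corresponds to the second conclusion above.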

5 Limits and disadvantages of the proposed method

Although our proposed method reports good wrist localization results, it has some limitations that need investigation and improvement. We present these limits here:

• A good localization of the wrist by our proposed method depends on the quality of the image (image without noises), on a good segmentation of the skin and on a good detection of corners in the different parts of the hand (wrist, fingers, elbow, and forearm).
• Even when the skin detection results are good, an accurate wrist localization is not always guaranteed. Corners are closely related to curvature angles: without curvature there are no corners.

There are some hand gestures where the wrist region does not contain curvatures (Fig. 23), and therefore no corner is detected there. However, to bring our detection closer to the wrist points annotated by the experts, we set the angle of a true corner of the Harris algorithm to 169°. The value 169° was selected after a learning stage with different angles (the default angle 162°, 165°, 169° and 175°); the results obtained were quick enough and much closer to the wrist points annotated by the experts. The angle of 169° for corner detection has been used in the previous experiments.
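The effect of this angle can be illustrated on a polygonal contour: a vertex is kept as a corner when the angle it forms with its two neighbours is less than or equal to the threshold, so that a larger threshold accepts shallower bends. The sketch below is a simplified angle test on a hypothetical contour, not the corner detector used in our experiments; the function names and the coordinates are assumptions.

```python
import numpy as np

def vertex_angles(contour):
    """Angle in degrees at each vertex of a closed polygonal contour."""
    pts = np.asarray(contour, dtype=float)
    to_prev = np.roll(pts, 1, axis=0) - pts
    to_next = np.roll(pts, -1, axis=0) - pts
    cos = np.sum(to_prev * to_next, axis=1) / (
        np.linalg.norm(to_prev, axis=1) * np.linalg.norm(to_next, axis=1))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def count_corners(contour, max_angle_deg):
    """Number of vertices whose angle is <= max_angle_deg."""
    return int(np.sum(vertex_angles(contour) <= max_angle_deg))

# Hypothetical contour: four sharp vertices and three shallow bends
# of about 172.0, 164.1 and 168.6 degrees.
contour = [(0, 0), (5, -0.35), (10, 0), (10.7, 5), (10, 10), (5, 10.5), (0, 10)]

counts = {angle: count_corners(contour, angle) for angle in (162, 165, 169, 175)}
print(counts)
```

On this contour each increase of the threshold accepts one more shallow bend, which shows why a larger angle helps on gestures where the wrist is only slightly curved.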

6 Conclusion

In this paper, we presented a new method of wrist localization based on the analysis of the number of corners along the contour of the skin masks. Using the ground-truth skin masks in the experiments, the proposed method showed that it localizes the wrist accurately with small errors, that it inherits from corners an invariance to the orientation, scale and photometric changes of the hand, that it is faithful against noises, and that it is fast compared with the existing methods. The evaluation of the second stage of our proposed method using the ground-truth data showed that the approach based on the analysis of the contour shape is more effective than the approach based on the analysis of the contour width. Whatever the challenge presented in the scene, our method succeeds in finding an appropriate location for the wrist. We also presented an experimental study of our two-stage method using three sets of skin masks, and we showed that an ineffective skin detection has a double impact on the accuracy: it can either improve or degrade the localization performance. We also showed that the highest performance is obtained where the misclassified pixels (the FP and TN rates of Table 4) are minimized. Finally, we illustrated that the PBA method of skin detection applied in our method improves the results of wrist localization compared with the method of Local Minimum. As perspectives, we would like to improve both stages of our method to maximize its performance. In the first stage, we would like to improve the results of skin detection by minimizing the number of misclassified pixels, and we would also like to maximize the accuracy of detection of the second point of the wrist by applying a horizontal rotation on the skin mask


obtained. In the second stage, we will try to combine the advantages of corners with those of local minima in order to localize the wrist without any constraint on the hand presence.

References

1. Binh N, Ejima T (2014) Real-Time Hand Gesture Recognition Using Finger Segmentation. Hindawi Publ Corp Sci World J 2014:820–824
2. Cerlinca TI, Pentiuc SG (2011) Robust 3D hand detection for gestures recognition. Stud Comput Intell 382:259–264. doi:10.1007/978-3-642-24013-3_27
3. Choi J, Seo B, Park J (2009) Robust Hand Detection for Augmented Reality Interface. In: Proceedings of the 8th International Conference on Virtual Reality Continuum and its Applications in Industry. ACM, Yokohama, Japan, December 14–15, 2009, pp. 319–322
4. Erol A, Bebis G, Nicolescu M, et al. (2007) Vision-based hand pose estimation: a review. Comput Vis Image Underst 108:52–73. doi:10.1016/j.cviu.2006.10.012
5. Grzejszczak T, Nalepa J, Kawulok M (2013) Real-Time Wrist Localization in Hand Silhouettes. In: Proceedings of the 8th International Conference on Computer Recognition Systems CORES 2013. Vol 226, pp. 439–449
6. Grzejszczak T, Kawulok M, Galuszka A (2015) Hand landmarks detection and localization in color images. Multimed Tools Appl. doi:10.1007/s11042-015-2934-5
7. Jones MJ, Rehg JM (2002) Statistical color models with application to skin detection. Int J Comput Vis 46:81–96. doi:10.1023/A:1013200319198
8. Jun C, Wenjun H, Qing S (2011) Binocular Vision-Based Position and Pose of Hand Detection and Tracking in Space. International Conference, ICICIS 2011, Chongqing, China, January 8–9. Vol 134, pp. 668–675
9. Kakumanu P, Makrogiannis S, Bourbakis N (2007) A survey of skin-color modeling and detection methods. Pattern Recogn 40:1106–1122. doi:10.1016/j.patcog.2006.06.010
10. Kawulok M (2013) Fast propagation-based skin regions segmentation in color images. 2013 10th IEEE Int Conf Work Autom Face Gesture Recognition, FG 2013. doi:10.1109/FG.2013.6553733
11. Kawulok M, Kawulok J, Nalepa J, Smolka B (2014) Self-adaptive algorithm for segmenting skin regions. EURASIP J Adv Signal Process:1–22. doi:10.1186/1687-6180-2014-170
12. Kerdvibulvech C (2014) A methodology for hand and finger motion analysis using adaptive probabilistic models. EURASIP J Embed Syst 2014:18. doi:10.1186/s13639-014-0018-7
13. Keskin C, Aran O, Akarun L (2005) Real Time Gestural Interface for Generic Applications. In: Signal Processing Conference IEEE, 13th European, Antalya, pp 1–4
14. Licsar A, Sziranyi T (2004) Hand gesture recognition in camera-projector system. Lect Notes Comput Sci:83–93. doi:10.1007/b97917
15. Luo Z (2013) Survey of Corner Detection Techniques in Image Processing. Int J Recent Technol Eng 2:184–185
16. Mao G-Z, Wu Y-L, Hor M-K, Tang C-Y (2009) Real-Time Hand Detection and Tracking against Complex Background. 2009 Fifth Int Conf Intell Inf Hiding Multimed Signal Process 905–908. doi:10.1109/IIH-MSP.2009.133
17. Mittal A, Zisserman A, Torr P (2011) Hand detection using multiple proposals. Proceedings of the British Machine Vision Conference 2011 75.1–75.11. doi:10.5244/C.25.75. BMVA Press, Scotland, UK
18. Nagarajan S, Subashini T, Ramalingam V (2012) Vision Based Real Time Finger Counter for Hand Gesture Recognition. CpmrOrg.in 2:1–5
19. Nalepa J, Grzejszczak T, Kawulok M (2014) Wrist Localization in Color Images For Hand Gesture Recognition. Man-Machine Interact 3(242):123–130. doi:10.1007/978-3-319-02309-0
20. Noreen U, Jamil M, Ahmad N (2016) Hand Detection Using HSV Model. Int J Sci Technol Res 5:195–197
21. Patel TP, Panchal SR (2014) Corner Detection Techniques: An Introductory Survey. Int J Eng Dev Res 2:3680–3686
22. Paulson B, Cummings D, Hammond T (2011) Object interaction detection using hand posture cues in an office setting. Int J Hum Comput Stud 69:19–29. doi:10.1016/j.ijhcs.2010.09.003
23. Qiu-yu Z, Jun-chi L, Mo-yi Z, et al. (2015) Hand gesture segmentation method based on YCbCr color space and K-means clustering. Int J Signal Process Image Process Pattern Recogn 8:105–116
24. Medjram S, Babahenini MC, Mohamed Benali Y, Abdelmalik T-A (2016) Improving the Method of Wrist Localization Local Minimum-Based for Hand Detection. In: Modelling and Implementation of Complex Systems. Springer, Constantine, Algeria, pp. 153–163
25. Song Z, Yang H, Zhao Y, Zheng F (2010) Hand detection and gesture recognition exploit motion times image in complicate scenarios. Advances in Visual Computing 6th International Symposium, ISVC 2010, Las Vegas, NV, USA. Vol 6454, pp. 628–636
26. Stergiopoulou E, Papamarkos N (2009) Engineering applications of artificial intelligence hand gesture recognition using a neural network shape fitting technique. Eng Appl Artif Intell 22:1141–1158. doi:10.1016/j.engappai.2009.03.008
27. Suksil T, Chalidabhongse TH (2013) Hand detection and feature extraction for static Thai Sign Language recognition. Proc 7th Int Conf Ubiquitous Inf Manag Commun - ICUIMC '13 1–6. doi:10.1145/2448556.2448579
28. Thuy Thi N, Dang Nguyen B, Bischof H (2008) An active boosting-based learning framework for real-time hand detection. 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition, Amsterdam, 1–6. doi:10.1109/AFGR.2008.4813315
29. Toni B, Darko J (2012) A robust hand detection and tracking algorithm with application to natural user interface. MIPRO, 2012 Proceedings of the 35th International Convention 1768–1774, IEEE, Opatija, Croatia
30. Trigueiros P, Ribeiro F, Reis LP (2014) Vision-Based Portuguese Sign Language Recognition System. doi:10.1007/978-3-319-05951-8
31. Vidya K, Deryl R, Dinesh K, et al. (2014) Enhancing hand interaction patterns for virtual objects in mobile augmented reality using marker-less tracking. In: International Conference on Computing for Sustainable Global Development, INDIACom 2014, IEEE, New Delhi, India, pp 705–709
32. Wang RY, Popović J (2009) Real-time hand-tracking with a color glove. ACM Trans Graph 28:1. doi:10.1145/1531326.1531369
33. Wang YR, Lin WH, Yang L (2015) An improved hand detection by employing corner detector. Proc - Int Conf Mach Learn Cybern 1:414–419. doi:10.1109/ICMLC.2014.7009151
34. Xiao B, Xu XM, QP M (2010) Real-time hand detection and tracking using LBP features. 6th International Conference, ADMA 2010, Chongqing, China, November 19–21. Vol 6441, pp. 282–289
35. Xie S, Pan J (2011) Hand Detection Using Robust Color Correction and Gaussian Mixture Model. In: 2011 Sixth International Conference on Image and Graphics, IEEE, Hefei, Anhui, pp 553–557
36. Yeo HS, Lee BG, Lim H (2013) Hand tracking and gesture recognition system for human-computer interaction using low-cost hardware. Multimed Tools Appl:1–29. doi:10.1007/s11042-013-1501-1
37. Yogarajah P, Condell J, Curran K, et al. (2010) A dynamic threshold approach for skin segmentation in color images. Proc - Int Conf Image Process ICIP:2225–2228. doi:10.1109/ICIP.2010.5652798

Sofiane Medjram was born in 1989. He received his licence in computer science and mathematics in 2009, and his Master's degree in Artificial Intelligence and Pattern Recognition in 2011, from Badji Mokhtar University. He is currently a Ph.D. student in Pattern Recognition and Artificial Intelligence at Badji Mokhtar University. He has authored one technical article in computer vision, and others are being processed.


Mohamed Chaouki Babahenini is a researcher and head of the real-time rendering group at LESIA Laboratory. He is also an associate professor at the Department of Computer Science of Biskra University in Algeria, where he received a Ph.D. in 2006. His current research interests are real-time rendering, 3D reconstruction, point-based rendering and data mining. He has co-authored many papers in these fields.

Abdelmalik Taleb-Ahmed received his doctorate in Computer Science from the University of Lille1 (Lille, France) in 1992. In 2003, he received his HDR from the University of Côte d'Opale (Calais, France), and he continues to work in the field of Image Processing and Computer Vision. He has authored several scientific articles in this field.


Yamina Mohamed Ben Ali was born in 1968. She received her doctorate in Computer Science from the University of Badji Mokhtar Annaba (Algeria) in 2004. In 2009, she received her HDR, and she continues to work in the field of Evolutionary Computation. Her research extends to bio-inspired approaches such as swarm intelligence, applied especially to neural networks and image processing.