Fast propagation-based skin regions segmentation in color images

Michal Kawulok

Abstract— This paper introduces a new method for skin region segmentation based on spatial analysis of skin probability maps obtained using pixel-wise detectors. A number of methods use various skin color modeling techniques to classify every individual pixel or to transform input color images into skin probability maps, but their performance is limited by the high variance and low specificity of skin color. Detection precision can be enhanced through spatial analysis of skin pixels, but this direction has been little explored so far. Our contribution lies in using the distance transform to propagate the "skinness" across the image in a combined domain of luminance, hue and skin probability. In the paper we explain the theoretical advantages of the proposed method over alternative skin detectors that also perform spatial analysis. Finally, we present the results of an extensive experimental study which clearly indicate the high competitiveness of the proposed method and its relevance to gesture recognition.

This work has been supported by the Polish Ministry of Science and Higher Education under research grant no. IP2011 023071 from the Science Budget 2012–2013. M. Kawulok is with the Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland ([email protected]).

I. INTRODUCTION

Skin segmentation in color images and videos is a crucial and challenging step in gesture recognition systems. No satisfactory solution exists so far that would allow for robust extraction of skin regions in uncontrolled conditions. Existing gesture recognition systems [1], [2] either assume that the hand silhouette is given or impose strong assumptions on the background color to make skin region extraction feasible. Alternatively, hand detection is supported using additional equipment like data gloves [3] or markers [4]. Also, infra-red imaging can be employed, which allows for depth map acquisition as in the case of the Kinect sensor. These techniques greatly facilitate detecting the hand region and its feature points, but they can hardly be deployed in real-world environments.

The majority of existing skin detectors classify every individual pixel based on its position in the color space, using various methods for skin color modeling. Skin color variations are high due to factors like race, age or complexion. Intra-personal differences may also be substantial because of variations in lighting conditions or an individual's physical state. Furthermore, skin pixels often overlap with the background in the color space domain, which results in false positive errors even if the model is well adapted to a particular scene. Such problems can be addressed using spatial analysis of the skin pixels, but very few works have explored this direction.

In the research reported here we investigated how to propagate the "skinness" (i.e. the likelihood that a pixel belongs to a skin region) from skin seeds to determine the boundaries of skin regions. The propagation is based on the distance transform, whose weights are obtained using both the probability map and the color image. Using such a combined domain increases the stability of our detector, while the cumulative character of the distance transform allows for detecting skin boundaries even if they are smooth in the propagation domain. This is an important advantage over alternative region growing approaches, which was clearly manifested in the obtained experimental results.

The paper is organized as follows. Existing methods are presented in Section II. The proposed solution is described and justified in Section III. Results of the conducted experimental study are given and discussed in Section IV, and the paper is concluded in Section V.

II. RELATED WORK

Skin detection and segmentation is an active research topic and many different methods have been proposed to address this problem. In general, there are a number of pixel-wise detectors which operate in the color domain: based on general decision rules defined in different color spaces, every pixel is classified according to its color value. Some methods make the decision rules flexible and adaptable to every analyzed image or sequence of images. This may substantially decrease the error rate compared with global models, but it also requires some input data for the adaptation, which must be extracted from the image. Furthermore, context-based textural information may be taken into account to better discriminate between skin and non-skin pixels. Finally, there are approaches which benefit from spatial analysis of skin-tone pixels; they are given more attention in Section II-A.

Color-based detectors take advantage of the observation that skin-tone color has common properties which can be defined in various color spaces.
Basically, parametric and statistical skin color models can be distinguished here. An interesting, thorough survey comparing various approaches to skin color modeling was presented in 2007 by Kakumanu et al. [5]. Parametric skin models are based on fixed decision rules defined empirically in various color spaces after investigating the skin-tone distribution. These rules are applied after color normalization to determine whether a pixel color value belongs to the skin. Kovac et al. [6] proposed a model defined in the RGB color space. Skin-tone color was also modeled in HSV by Tsekeridou et al. [7]. An approach proposed by Hsu et al. [8] takes advantage of common skin color properties in a nonlinearly transformed YCbCr color space, in which an

elliptical skin color model is defined. Some techniques operate in multiple color spaces to increase stability, for example a composite skin detector [9] defined in the RGB and YCbCr color spaces. Recently, Cheddad et al. proposed reducing the RGB color space to a single dimension, in which the decision rules are defined [10].

Statistical modeling is based on analysis of the distribution of skin pixel values in a training set of images in which skin and non-skin areas are already identified and annotated. This creates a global model of skin color, which defines the probability that a given pixel value belongs to the skin class. Skin color can be modeled using many techniques, including the Bayesian classifier applied by Jones and Rehg [11], the Gaussian mixture model [12] or random forests [13].

There are a number of adaptive models that improve the segmentation accuracy. Lee et al. proposed extracting features concerning the lighting conditions from every analyzed image to adjust the skin detector [14]. Phung et al. introduced a method for adapting the segmentation threshold in the probability map [15], and this approach was later extended by Zhang et al. [16]. Also, based on detected faces, the global skin model can be made more specific to the local conditions [17]–[20], which decreases false positives. Argyros and Lourakis addressed temporal lighting variations observed in video sequences by tracking skin-like objects and adapting the global skin color model dynamically [21]. The features helpful for skin segmentation can also be extracted using texture analysis performed in a grayscale [22], [23], color [24], or skin map [25] domain.

When skin segmentation is performed in video sequences, the system may take advantage of dynamic information. Sigal et al. used Markov models to predict illumination changes in subsequent video frames to adjust the skin color model [26].
Furthermore, background extraction techniques and motion detectors may be used to find potential locations of skin pixels [27].

A. Skin segmentation using spatial analysis

Spatial alignment of skin-tone pixels can be taken into account to reduce false positives and to increase the precision of the determined skin region boundaries. In general, these methods perform skin segmentation based on skin probability maps obtained using conventional pixel-wise detectors.

Ruiz-del-Solar and Verschae [28] proposed the controlled diffusion method, which is used for the comparative study presented in Section IV. The controlled diffusion consists of two general steps: 1) diffusion seed extraction, and 2) the proper diffusion process. The seeds are extracted using pixel-wise skin probability maps, and they are formed by those pixels whose skin probability exceeds a certain high threshold (Pα). During the second step, the skin regions are built from the seeds by adjoining the neighboring pixels which meet the diffusion criteria, defined either in the probability map or color space domain. These criteria are as follows: 1) the distance between a source pixel x and a pixel y (which is to be adjoined) in the diffusion domain Dd is below a

given threshold: |Dd(x) − Dd(y)| < ∆max, and 2) the skin probability of the adjoined pixel must be over a certain threshold: P(y) > Pβ. It is worth noting that this is threshold hysteresis with an additional constraint imposed on the maximal difference between neighboring pixels (either in terms of their probability or color). Hence, it works well if the region boundaries are sharp (diffusion stops due to high local differences), but the method becomes identical to threshold hysteresis if there exists a smooth transition between the pixel values that leads from one region to another (the diffusion will then "leak" outside the region).

In our earlier works, an energy-based scheme was proposed for skin blob analysis [29], which is also used as a baseline in our experimental study. Skin seeds are formed by high-valued pixels in the skin probability maps, similarly as in the diffusion method. In addition, they are subject to erosion followed by a size-based analysis to decrease false positives: the seeds which are small in relation to the largest blob are rejected. It is assumed that the seed pixels receive a maximal energy amount equal to 1, which is spread over the image. The amount of energy that is passed depends on the probability value of the target pixel. If there is no energy to be passed, the pixel is not adjoined to the skin region. Although this method implements a cumulative propagation (which helps reduce the "leakages" identified in the controlled diffusion), only the skin probability is taken into account and local differences between the pixels are ignored.

Furthermore, cellular automata have been used to determine skin regions [23], but this process requires many iterations to achieve satisfactory results. Also, conditional random fields [30] were used to take advantage of spatial properties of skin regions [31]. However, this is a time-consuming procedure as it involves simulated annealing for every analyzed image. Recently, Khan et al.
combined adaptive modeling based on face detection with spatial analysis using graph cuts [32]. The main drawback, however, is the processing time of 1.5 s for small, 100 × 100 images.

III. PROPAGATION-BASED SKIN SEGMENTATION

The main drawback of the controlled diffusion method [28] lies in the "leakages" observed when the skin region boundaries are not correlated with high gradient values in the diffusion domain (i.e. the color image or the skin probability map). The propagation criterion concerning the minimal probability of adjoined pixels (Pβ) helps stop the region growing at some point, which decreases the "leakages", but cannot eliminate them. Also, Pβ cannot be too high, because this would increase false negatives, often stopping the propagation too early, inside the skin regions. Although this problem is addressed to some extent by the energy-based approach [29], the latter has another serious disadvantage. Strong gradients in the image or skin probability map which would be expected to stop the propagation fail to do so unless the skin probability falls below a level that consumes all the energy. This may also lead to "leakages", however of a different nature than in [28].
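To make the leakage behavior concrete, the diffusion criteria of [28] can be sketched as follows. This is an illustrative reimplementation, not the authors' code; the function name and default thresholds are assumptions, and the diffusion domain may be either the probability map or a color channel:

```python
from collections import deque
import numpy as np

def controlled_diffusion(prob_map, diff_domain, p_alpha=0.8, p_beta=0.2, d_max=0.1):
    """Sketch of controlled diffusion [28]: grow skin regions from
    high-probability seeds, adjoining a neighbor y of a skin pixel x if
    1) |Dd(x) - Dd(y)| < d_max in the diffusion domain, and
    2) P(y) > p_beta in the probability map."""
    h, w = prob_map.shape
    skin = prob_map >= p_alpha                 # seeds: high-probability pixels
    queue = deque(zip(*np.nonzero(skin)))
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not skin[ny, nx]:
                # criterion 1: small local step; criterion 2: P over p_beta
                if (abs(diff_domain[ny, nx] - diff_domain[y, x]) < d_max
                        and prob_map[ny, nx] > p_beta):
                    skin[ny, nx] = True
                    queue.append((ny, nx))
    return skin
```

Note how the leakage arises: along a smooth ramp connecting skin to background, criterion 1 never fires, so the growth continues until the probability drops below Pβ, exactly as in plain threshold hysteresis.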

In the research reported here we exploited the possibility of propagating the "skinness" from the seeds using the shortest routes determined with Dijkstra's algorithm. Contrary to the conventional approach presented by Ikonen and Toivanen [33], the weights used for optimizing the paths are not based on the image luminance alone, but are computed in a combined domain composed of the luminance, hue and skin probability. As the skin probability in the seeds is supposed to be high, the integrated path distance can be interpreted as skin dissimilarity. The proposed framework allows the region boundaries to be determined even in the case of smooth transitions between skin and non-skin pixels. This is an important advantage over the controlled diffusion [28]. Also, our algorithm is characterized by higher stability than the energy-based method [29] due to incorporating the differences in luminance and hue into the propagation costs.

A. Skin seeds extraction

An initial step of the spatial analysis is proper extraction of skin seeds. Propagation-based approaches [28], [29] take advantage of the observation that if the image is binarized using a high-probability threshold, false positives are rather small, because usually only the skin regions contain pixels with very high skin probability values. Hence, high-probability pixels can be successfully used as propagation seeds. A potential drawback of this solution is that some skin regions may not contain high-valued pixels in the probability map, which would result in their false rejection and an increase in the post-propagation false negative error. Taking into account the importance of choosing an appropriate seed extraction technique, the following approaches were investigated:

1) High-probability threshold with size-based seed verification, as proposed in [29]. If the skin probability of an individual pixel is over Pα, then the pixel is added to the seed.
After that, the seed pixels are grouped into blobs, and those blobs whose area is smaller than 10% of the largest blob are rejected.

2) Seeds extracted using a parametric model [9] defined in the RGB and YCbCr color spaces. We found experimentally that this model is characterized by quite low false positives at the cost of a high false negative error rate. The overall segmentation errors are rather high, but due to the low false positives the model is suitable for detecting skin seeds.

3) Blob detection in the probability map using wavelet maxima lines [34]. Using this technique, blob centers can be determined and treated as skin seeds. However, this method did not present any advantage over the simple approaches listed above, while it turned out to be very time-consuming (over 1500 ms per image) and therefore it is inapplicable in the analyzed case.

4) Ground-truth seeds determined separately for every ground-truth skin region. The seeds were formed by those skin pixels whose distance from the ground-truth skin region boundary was larger than 75% of the

Fig. 1. Original color images (a), skin probability maps (b), different skin seeds: ground-truth (c), threshold-based with Pα = 80% (d) and Pα = 40% (e), and model-based (f).
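The first seed extraction approach (Pα threshold followed by size-based verification) can be sketched as below. This is an illustrative reimplementation under the description in the text, not the authors' code; the function name and the 4-connected flood fill used for blob labeling are assumptions:

```python
from collections import deque
import numpy as np

def extract_seeds(prob_map, p_alpha=0.8, min_area_frac=0.1):
    """Threshold-based seed extraction with size-based verification:
    pixels with skin probability over p_alpha form candidate seeds, and
    blobs smaller than 10% of the largest blob are rejected."""
    mask = prob_map > p_alpha
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    areas = []                                    # blob areas; labels are 1-indexed
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and labels[sy, sx] == 0:
                lab = len(areas) + 1
                labels[sy, sx] = lab
                q, area = deque([(sy, sx)]), 1
                while q:                          # 4-connected flood fill
                    y, x = q.popleft()
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if (0 <= ny < h and 0 <= nx < w and mask[ny, nx]
                                and labels[ny, nx] == 0):
                            labels[ny, nx] = lab
                            area += 1
                            q.append((ny, nx))
                areas.append(area)
    if not areas:
        return mask                               # no candidate seeds at all
    keep = [i + 1 for i, a in enumerate(areas)
            if a >= min_area_frac * max(areas)]
    return np.isin(labels, keep)
```

In practice a library labeling routine could replace the flood fill; the explicit version is kept here so the sketch stays self-contained.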

maximal distance in each region. Naturally, this is not a detection technique, but it served as a baseline to evaluate the aforementioned approaches.

Examples of the seeds obtained using the enumerated techniques are presented in Fig. 1 (marked in orange) for two different images. The skin probability maps were obtained using Jones and Rehg's method [11].

B. Route optimization

In order to propagate the "skinness" from the seeds, the shortest routes from the seeds to every pixel must first be determined. This is achieved by minimizing the total path cost from the set of seed pixels to every pixel in the image. The total path cost for a pixel x is defined as:

C(x) = Σ_{i=0}^{l−1} ρ(p_i → p_{i+1}),    (1)

where ρ is a local skin dissimilarity measure between two neighboring pixels, p_0 is a pixel that lies at the seed boundary, p_l = x, and l is the total path length. The minimization, presented in Alg. 1, is performed based on Dijkstra's algorithm [33]. If the priority queue Q is implemented with a heap, then the complexity of the optimization is O(n log n_q), where n is the number of pixels in the image and n_q is the average number of pixels in Q. Usually n_q ≪ n, so it can be assumed that the complexity is linear. Also, the propagation process can be accelerated by imposing a maximal distance constraint, i.e. the update step (line 13) is performed only if the new cost C(s) is smaller than a maximal distance threshold. In addition, a Pβ threshold can be introduced as in [28], which prevents propagation into regions of very low skin probability.

C. "Skinness" propagation domain

The route optimization outcome heavily depends on how the local costs ρ are computed, which is given more attention in [33]. In the case of geodesic transforms, ρ is a function of the gray-level difference between two neighboring pixels with an additional penalty for propagating in the diagonal direction. However, for skin segmentation this would be ineffective, as the background may have a gray-level value similar to the skin. Therefore, we propose to use a combined domain, taking advantage of two observations: 1) skin regions are generally smooth and present low variations in luminance and chrominance, and 2) values in a skin probability map are usually high. Therefore, we compute the local cost from pixel

x to y, i.e. ρ(x → y), using both the image (ρI) and the probability (ρP) costs:

ρ(x → y) = ρI(x, y) · [1 + ρP(x → y)],    (2)

ρI(x, y) = α_diag · (|Y(x) − Y(y)| + |H(x) − H(y)|),    (3)

ρP(x → y) = (Pt − P(y)) / (1 − Pt)  for P(y) > Pβ,
ρP(x → y) = ∞                       for P(y) ≤ Pβ,    (4)

where Y(·) is the pixel luminance, H(·) is the hue in the HSV color model, α_diag ∈ {1, √2} is the penalty for propagation in the diagonal direction, P(·) is the skin probability, and Pt = 0.6 is the probability threshold (this value was selected experimentally, and the algorithm is not very sensitive to it).

Algorithm 1 Proposed "skinness" propagation
Require: S′ = {p0}              ▷ the set of pixels at the seed boundary
 1: Q ← {p0}                    ▷ where Q is a priority queue
 2: for all x ∈ I do            ▷ where I is the source image
 3:   if x ∈ S then             ▷ i.e. x belongs to the seed S
 4:     C(x) = 0                ▷ where C is a cost array
 5:   else
 6:     C(x) = −1
 7:   end if
 8: end for
 9: while Q ≠ ∅ do              ▷ while Q is not empty
10:   q = pop(Q)
11:   for all s ∈ N(q) do       ▷ N(q) – the 8 neighbors of q
12:     c = C(q) + ρ(q → s)     ▷ new total path cost
13:     if c < C(s) ∨ C(s) < 0 then
14:       C(s) = c
15:       Q ← s                 ▷ s is enqueued in Q
16:     end if
17:   end for
18: end while

The total path cost obtained after the optimization is inversely proportional to the "skinness", hence the final skin probability map is obtained by scaling the costs from 0 for the maximal cost to 1 for a zero cost (i.e. the seed pixels). The pixels which are not adjoined during the propagation process are assigned zeroes. Finally, skin regions are extracted using a fixed threshold in the distance domain.

The segmentation process is presented in Fig. 2 and compared with two other detectors [28], [29]. The color image (original) is transformed into the probability map using Jones and Rehg's method [11] (a darker shade indicates higher probability), and the seeds, marked in orange, are extracted using Pα = 0.8. The controlled diffusion [28], performed in the probability map domain, generates a better result than a fixed threshold, but a significant "leakage" can be observed because of smooth transitions between the skin and the background. The energy-based method [29] also produces a "leakage", at a different location. The false positives (indicated by a red tone in the result image) are virtually eliminated when the threshold is applied in the distance map obtained using the proposed algorithm.
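A minimal sketch of Algorithm 1 with the local costs of Eqs. (2)–(4) is given below. It is an illustrative reimplementation, not the authors' code: for simplicity all seed pixels (rather than only the seed boundary) are enqueued, all input arrays are assumed to be normalized to [0, 1], and the function name and defaults are assumptions:

```python
import heapq
import math
import numpy as np

def propagate_skinness(Y, H, P, seeds, p_t=0.6, p_beta=0.2):
    """Dijkstra-style propagation of total path costs from the seed
    pixels, with local costs combining luminance (Y), hue (H) and skin
    probability (P) as in Eqs. (2)-(4). All arrays share one shape."""
    h, w = P.shape
    cost = np.full((h, w), np.inf)
    cost[seeds] = 0.0
    heap = [(0.0, int(y), int(x)) for y, x in zip(*np.nonzero(seeds))]
    heapq.heapify(heap)
    while heap:
        c, y, x = heapq.heappop(heap)
        if c > cost[y, x]:
            continue                           # stale queue entry
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dy == 0 and dx == 0:
                    continue
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w):
                    continue
                if P[ny, nx] <= p_beta:
                    continue                   # rho_P = inf, Eq. (4)
                a_diag = math.sqrt(2) if dy and dx else 1.0  # diagonal penalty
                rho_i = a_diag * (abs(Y[y, x] - Y[ny, nx])
                                  + abs(H[y, x] - H[ny, nx]))     # Eq. (3)
                rho_p = (p_t - P[ny, nx]) / (1.0 - p_t)           # Eq. (4)
                nc = c + rho_i * (1.0 + rho_p)                    # Eqs. (1)-(2)
                if nc < cost[ny, nx]:
                    cost[ny, nx] = nc
                    heapq.heappush(heap, (nc, ny, nx))
    return cost   # low cost = high "skinness"; scale and threshold afterwards
```

Because the cost is cumulative along the whole path, a smooth transition still accumulates distance, which is precisely what stops the "leakages" that defeat local criteria.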

Fig. 2. Proposed detection process and its outcome compared with the results obtained using controlled diffusion and energy-based propagation.

IV. EXPERIMENTAL VALIDATION

The experiments were carried out using two data sets, namely: 1) 4000 images from the benchmark ECU database [15], acquired in uncontrolled lighting conditions, and 2) 899 images from our hand gesture data set, further termed HGR (available at http://sun.aei.polsl.pl/~mkawulok/gestures). This data set contains images of gestures presented by 12 different people. The data were acquired in controlled conditions, but in some cases skin-like colored objects appear in the background. The images in both data sets are associated with ground-truth binary masks that indicate skin regions, which makes it possible to train and validate skin detectors. The ECU data set was split into two equinumerous sets, namely: ECU-T, used for training the Bayesian classifier [11], and ECU-V, used for validation. In the case of HGR, the whole set was used for validation.

Skin segmentation performance was assessed based on two errors, namely: a) the false positive rate (δfp), i.e. the percentage of background pixels classified as skin, and b) the false negative rate (δfn), i.e. the percentage of skin pixels misclassified as background. The mutual relation of these two errors is presented using receiver operating characteristics (ROC). In some cases we quote the false negative rate (δfn(η)) obtained for a fixed false positive error δfp = η. Also, we use the term minimal detection error (δmin = δfp + δfn), where the threshold is set to the value for which this sum is the smallest.

Our experimental study consists of three main parts, namely: 1) analysis of the different seed detection methods outlined in Section III-A, 2) quantitative and 3) qualitative evaluation. The experiments were conducted using an Intel Core2 Duo 2.0 GHz with 4 GB RAM.

A. Seed detection

In order to evaluate different seed detection techniques, a threshold hysteresis was applied. Starting from the seeds, neighboring pixels were adjoined as long as their probability was over the Pβ threshold.
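The error measures defined above can be computed directly from the binary masks; a minimal sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def segmentation_errors(result, ground_truth):
    """False positive rate (fraction of background pixels classified as
    skin) and false negative rate (fraction of skin pixels classified as
    background). Both arguments are boolean masks of the same shape."""
    background = ~ground_truth
    d_fp = np.logical_and(result, background).sum() / background.sum()
    d_fn = np.logical_and(~result, ground_truth).sum() / ground_truth.sum()
    return d_fp, d_fn

def minimal_detection_error(prob_map, ground_truth, thresholds):
    """delta_min: the smallest d_fp + d_fn over the candidate thresholds."""
    return min(sum(segmentation_errors(prob_map >= t, ground_truth))
               for t in thresholds)
```

A single (δfp, δfn) pair corresponds to one operating point; sweeping the threshold over the probability map traces out the ROC curve.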
The results for ECU-V are presented in Fig. 3, and for HGR in Fig. 4. The fixed-threshold ROC curve is compared here with the results obtained using ground-truth seeds, model-based seeds and seeds extracted using the Pα threshold. The subsequent points which form a curve for every individual seed detection routine were obtained by decreasing the Pβ value, starting from Pβ = Pα .

Fig. 3. ROC curves obtained using the threshold hysteresis initiated with different seeds for the ECU-V data set (curves: fixed threshold, ground-truth seeds, model-based seeds, and Pα ∈ {95%, 90%, 80%, 70%}).

Fig. 4. ROC curves obtained using the threshold hysteresis initiated with different seeds for the HGR data set (curves: fixed threshold, ground-truth seeds, model-based seeds, and Pα ∈ {85%, 80%, 60%, 40%, 30%}).
For the Pα threshold-based seed extraction, it can be seen from Fig. 3 that the higher Pα is, the lower the δfp obtained in the seeds (i.e. the initial point of each curve, having the smallest false positive value). Due to the size-based seed verification step, those initial points are located under the fixed-threshold curve; otherwise, they would be positioned on that curve. This observation justifies the verification step, which helps decrease false positives without a significant increase in false negatives. It can also be noticed that if Pα is high, then false negatives decrease very slowly compared with the false positive increase, which positions the results over the fixed-threshold curve. The reason is that some skin regions do not contain any pixels with probability exceeding Pα, and they are falsely rejected already at the seed detection stage. This points out that Pα must be selected carefully. If it is too large, then many skin regions will be excluded from the propagation, and if it is too small, then the initial δfp may be large, which by definition cannot be decreased using any region-growing algorithm.

It may be noticed from Figs. 3 and 4 that, overall, threshold-based seed extraction is more effective than the model-based seeds, and using a threshold hysteresis is definitely better than a fixed threshold. It is worse than the ground-truth seeds in the case of the ECU-V set, but for Pα ≤ 60% it presents similar performance on the HGR set. The main problem here is that the optimal value of Pα differs significantly between the two data sets. It is around Pα = 80% for ECU-V and Pα = 40% for HGR, and those values were used for further validation. In the future, it may be worth considering the threshold adaptation introduced in [15] to determine Pα automatically for a given scene.

B. Quantitative evaluation

The proposed method was compared quantitatively with: 1) a fixed threshold applied to probability maps obtained using the statistical model [11], 2) the controlled diffusion [28] in the color domain and 3) in the probability map domain, and 4) the energy-based method [29]. The results obtained for the ECU-V and HGR sets are presented in Figs. 5 and 6. In the case of the diffusion method, subsequent points of the ROC curves were determined using different diffusion thresholds ∆max. Basically, for small ∆max the seeds were not enlarged significantly, which keeps δfp at a low level but does not reduce δfn much. For larger ∆max, the diffusion converges to the result of the threshold hysteresis. All of the presented results were obtained for Pβ = 20%. From the ROC curves presented in the graphs it can be concluded that the proposed algorithm outperforms the controlled diffusion (regardless of the domain) and the energy-based method, both for the ECU-V and HGR sets.

Exact scores obtained using the baseline methods, as well as the propagation results for different seed detection routines, are presented in Tab. I. The bold values indicate the settings for which the ROC curves presented in Figs. 5 and 6 were obtained. Also, using these seed detectors, the baseline propagation methods were evaluated. It can be clearly seen that both the minimal error rate (δmin) and the false negatives measured at a fixed false positive rate (δfn(η)) are the lowest using our method. This fully confirms the theoretical advantages identified in Section III.

Average processing times per image are presented in Tab. II. Obviously, the methods which perform spatial analysis are much slower than a fixed threshold, but they still allow for real-time processing of several frames per second. The proposed method is slightly slower than the alternative baseline techniques, but the difference is not significant here. Therefore, it can be considered a fast, real-time method.

C. Qualitative evaluation

Finally, we have evaluated our method qualitatively. Several examples in which the proposed approach outperforms the baseline methods are presented in Fig. 7 (I. and II. are from the ECU data set, while III.–V. are from our HGR set). The "leakages" can be observed for the diffusion and energy-based methods, especially in the case of images I. and II. For III.–V. the diffusion leaves the skin region and many false positives are adjoined despite relatively low values in the probability map (though larger than Pβ). This is because of

Fig. 5. ROC curves for different segmentation methods (ECU-V data set).

Fig. 6. ROC curves for different segmentation methods (HGR data set).

Fig. 7. Examples of skin detection results obtained using different methods (red: false positives; blue: false negatives; faded color: true negatives).

TABLE I
DETECTION ERRORS OBTAINED FOR THE BASELINE METHODS AND USING THE PROPOSED APPROACH FROM DIFFERENT SEEDS.

                              ECU-V                  HGR
Method                   δmin      δfn(8%)     δmin      δfn(4%)
Baseline:
  Fixed threshold        25.2%     21.2%       10.91%    7.26%
  Diff. (color)          25.03%    20.2%       11.85%    7.87%
  Diff. (skin map)       25.71%    20.02%      11.54%    8.19%
  Energy-based           22.49%    19.35%      8.92%     4.98%
Proposed:
  Pα = 80%               21.48%    14.24%      11.68%    12.11%
  Pα = 60%               20.74%    16.01%      9.93%     5.99%
  Pα = 40%               21.99%    –           8.64%     4.65%
  Ground-truth           18.12%    10.24%      9.18%     5.2%
  Model-based            26.65%    20.57%      11.79%    10.89%

TABLE II
AVERAGE PROCESSING TIME PER IMAGE.

Fixed threshold [11]    Diffusion [28]    Energy-based [29]    Proposed
58 ms                   236 ms            218 ms               242 ms

smooth transitions between the skin and non-skin regions. The energy-based method copes very well with IV., but small “leakages” appear in III. and V. Their nature confirms our theoretical findings justified in Section III. There is a rapid change of skin probability at the skin region boundary, and the non-skin pixels positioned near skin regions are adjoined.

Fig. 8. Examples of weak performance of the proposed method.

Although the energy quickly decreases outside the skin region, some pixels are adjoined before it reaches zero.

During our study we also focused on the samples for which our method was outperformed by the baseline techniques. Two such examples are presented in Fig. 8. For image VI. (from the ECU data set), the energy-based method achieves the best result, while in the case of VII. the diffusion virtually eliminates the detection errors. Hopefully, analysis of such cases may help improve our approach and design better solutions in the future. It is worth noting that we have not observed cases in which the proposed method generates very high detection errors while at the same time the baseline techniques perform correct segmentation; usually such differences were moderate. Minimal detection errors (δmin) for all of the presented images in Figs. 2, 7 and 8 are given in Tab. III.

V. CONCLUSIONS AND FUTURE WORK

This paper has introduced a new approach to spatial analysis of skin probability maps, which decreases skin segmentation errors. Our contribution lies in: 1) proposing a cumulative "skinness" propagation framework, and 2) using a combined domain of luminance, hue and skin probability

TABLE III
DETECTION ERRORS FOR IMAGES PRESENTED IN FIGS. 2, 7 AND 8.

Image     Threshold    Diffusion [28]    Energy [29]    Proposed
Fig. 2    27.84%       9.41%             10.32%         3.45%
I.        18.01%       20.42%            18.29%         2.54%
II.       62.47%       58.89%            69.82%         18.67%
III.      7.64%        9.48%             7.11%          4.71%
IV.       6.13%        8.74%             3.44%          3.30%
V.        6.54%        12.49%            4.83%          2.05%
VI.       4.64%        4.17%             3.88%          5.36%
VII.      15.41%       3.95%             8.68%          12.95%

for the spatial analysis. We have presented both theoretical and empirical advantages over alternative skin detectors, confirmed by an extensive experimental study. Spatial analysis of skin probability maps significantly reduces false positive errors, and it is therefore an important processing step for recognizing gestures in uncontrolled conditions. It is more time-consuming than pixel-wise detectors, but it still allows for real-time processing at several frames per second. Furthermore, the distance transform, which is the core of our method, can potentially be used for detecting hand feature points and extracting hand-pose features, which is part of our ongoing research. In the study reported here we have not investigated adaptive skin models, which are known to increase skin classification precision. First, the proposed spatial analysis may be more effective if it processes skin probability maps obtained using adaptive models, and second, it may also be applied beforehand to make the adaptation itself more effective. In any case, this direction is worth investigating in the near future. Finally, we are currently focused on incorporating textural features into the propagation domain, which should further increase the stability of our skin detector.

REFERENCES

[1] A. Erol, G. Bebis, M. Nicolescu, R. D. Boyle, and X. Twombly, "Vision-based hand pose estimation: A review," Comput. Vis. Image Underst., vol. 108, no. 1-2, pp. 52–73, 2007.
[2] S. S. Ge, Y. Yang, and T. H. Lee, "Hand gesture recognition and tracking based on distributed locally linear embedding," Image Vision Comput, vol. 26, no. 12, pp. 1607–1620, 2008.
[3] K. Dorfmuller-Ulhaas and D. Schmalstieg, "Finger tracking for interaction in augmented environments," in Proc IEEE and ACM Int Symposium on Augmented Reality, 2001, pp. 55–64.
[4] C. Maggioni and B. Kämmerer, Computer Vision for Human-Machine Interaction, chapter GestureComputer – History, Design and Applications, pp. 23–52, Cambridge University Press, 1998.
[5] P. Kakumanu, S. Makrogiannis, and N. G. Bourbakis, "A survey of skin-color modeling and detection methods," Pattern Recogn, vol. 40, no. 3, pp. 1106–1122, 2007.
[6] J. Kovac, P. Peer, and F. Solina, "Human skin color clustering for face detection," in EUROCON 2003. Computer as a Tool, 2003, vol. 2, pp. 144–148.
[7] S. Tsekeridou and I. Pitas, "Facial feature extraction in frontal views using biometric analogies," in Proc of EUSIPCO '98, 1998, pp. 315–318.
[8] R.-L. Hsu, M. Abdel-Mottaleb, and A. Jain, "Face detection in color images," IEEE Trans Pattern Anal and Machine Intell, vol. 24, no. 5, pp. 696–706, 2002.
[9] G. Kukharev and A. Nowosielski, "Fast and efficient algorithm for face detection in colour images," Machine Graphics and Vision, vol. 13, pp. 377–399, October 2004.
[10] A. Cheddad, J. Condell, K. Curran, and P. Mc Kevitt, "A skin tone detection algorithm for an adaptive approach to steganography," Signal Proc, vol. 89, no. 12, pp. 2465–2478, 2009.
[11] M. Jones and J. Rehg, "Statistical color models with application to skin detection," Int J of Comp Vision, vol. 46, pp. 81–96, 2002.
[12] H. Greenspan, J. Goldberger, and I. Eshet, "Mixture model for face-color modeling and segmentation," Pattern Recogn Lett, vol. 22, pp. 1525–1536, 2001.
[13] R. Khan, A. Hanbury, and J. Stöttinger, "Skin detection: A random forest approach," in Proc 17th IEEE Int Image Processing (ICIP) Conf, 2010, pp. 4613–4616.
[14] J.-S. Lee, Y.-M. Kuo, P.-C. Chung, and E.-L. Chen, "Naked image detection based on adaptive and extensible skin color model," Pattern Recogn, vol. 40, pp. 2261–2270, 2007.
[15] S. L. Phung, D. Chai, and A. Bouzerdoum, "Adaptive skin segmentation in color images," in IEEE Int Conf on Acoustics, Speech and Signal Proc, 2003, pp. 353–356.
[16] M.-J. Zhang, W.-Q. Wang, Q.-F. Zheng, and W. Gao, "Skin-color detection based on adaptive thresholds," in Proc Third Int Conf on Image and Graphics, ICIG, 2004, pp. 250–253.
[17] J. Fritsch, S. Lang, M. Kleinehagenbrock, G. Fink, and G. Sagerer, "Improving adaptive skin color segmentation by incorporating results from face detection," in Proc IEEE Int Workshop on Robot and Human Interactive Communication, 2002, pp. 337–343.
[18] C.-S. Wang, Y.-C. Jeung, L.-B. Luo, J. Wang, and J.-W. Chong, "Real-time face recognition using adaptive skin-color model," in Proc IEEE International Conference on Information Science and Applications, ICISA, 2011, pp. 1–6.
[19] J. Lichtenauer, M. J. T. Reinders, and E. A. Hendriks, "A self-calibrating chrominance model applied to skin color detection," in VISAPP, 2007, vol. 1, pp. 115–120.
[20] M. Kawulok, "Dynamic skin detection in color images for sign language recognition," in Proc ICISP, vol. 5099 of LNCS, pp. 112–119, Springer, 2008.
[21] A. A. Argyros and M. I. A. Lourakis, "Real-time tracking of multiple skin-colored objects with a possibly moving camera," in Proc ECCV, vol. 3023 of LNCS, pp. 368–379, Springer, 2004.
[22] X. Wang, X. Zhang, and J. Yao, "Skin color detection under complex background," in Proc Int Conf on Mechatronic Science, Electric Engineering and Computer, 2011, pp. 1985–1988.
[23] A. A. Abin, M. Fotouhi, and S. Kasaei, "A new dynamic cellular learning automata-based skin detector," Multimedia Syst, vol. 15, no. 5, pp. 309–323, 2009.
[24] A. Conci, E. Nunes, J. J. Pantrigo, and Á. Sánchez, "Comparing color and texture-based algorithms for human skin detection," in Proc ICEIS, 2008, pp. 166–173.
[25] M. Kawulok, "Texture analysis for skin probability maps refinement," in Proc MCPR, vol. 7329 of LNCS, pp. 75–84, Springer, 2012.
[26] L. Sigal, S. Sclaroff, and V. Athitsos, "Skin color-based video segmentation under time-varying illumination," IEEE Trans on Pattern Anal and Machine Intell, vol. 26, pp. 862–877, 2003.
[27] F. Dadgostar and A. Sarrafzadeh, "An adaptive real-time skin detector based on hue thresholding: A comparison on two motion tracking methods," Pattern Recogn Lett, vol. 27, no. 12, pp. 1342–1352, 2006.
[28] J. R. del Solar and R. Verschae, "Skin detection using neighborhood information," in Proc IEEE Int Conf on Automatic Face and Gesture Recogn, 2004, pp. 463–468.
[29] M. Kawulok, "Energy-based blob analysis for improving precision of skin segmentation," Multimedia Tools and Applications, vol. 49, no. 3, pp. 463–481, 2010.
[30] P. Krahenbuhl and V. Koltun, "Efficient inference in fully connected CRFs with Gaussian edge potentials," in Proc of Neural Information Processing Systems, NIPS, 2011.
[31] K. Chenaoua and A. Bouridane, "Skin detection using a Markov random field and a new color space," in IEEE Int Conf on Image Proc, 2006, pp. 2673–2676.
[32] R. Khan, A. Hanbury, R. Sablatnig, J. Stottinger, F. Khan, and F. Khan, "Systematic skin segmentation: merging spatial and non-spatial data," Multimedia Tools and Applications, pp. 1–25, 2012.
[33] L. Ikonen and P. Toivanen, "Distance and nearest neighbor transforms on gray-level surfaces," Pattern Recogn Lett, vol. 28, no. 5, pp. 604–612, 2007.
[34] C. Damerval and S. Meignen, "Interest Point Detector with Wavelet Maxima Lines," Tech. Rep. 171678, HAL Inria, 2007.
