CHROMATIC COLOUR SPACES FOR SKIN DETECTION USING GMMS

Darren Butler, Sridha Sridharan and Vinod Chandran

Research Concentration in Speech, Audio and Video Technology
School of Electrical and Electronic Systems Engineering
Queensland University of Technology, Australia
GPO Box 2434, George St, Brisbane, Australia, 4001
{de.butler, s.sridharan, v.chandran}@qut.edu.au

ABSTRACT

Skin detection often forms part of both face detectors and pornography detectors for Internet filters. This paper introduces a simple skin detection algorithm that uses Gaussian Mixture Models to classify 8×8 blocks according to their skin-likeness. The absence of structural features in skin regions means luminance information alone is unsuitable for accurate skin detection. Therefore, the effect of the chosen colour components on the classification accuracy is investigated. Fused feature vectors extracted from the Cb and Cr chrominance components were found to give the best results, achieving an equal error rate of 5.5%. By utilising these components and an 8×8 block size, skin regions in JPEG and MPEG files can be located without full decompression.











0-7803-7402-9/02/$17.00 ©2002 IEEE

1. INTRODUCTION

Skin detection has gained popularity as a precursor to both face localisation and automatic pornography detection for Internet filters. Human skin lacks structural features to aid in its localisation. Therefore, luminance information is insufficient for accurate skin detection and in many cases may in fact be detrimental. Existing techniques utilise features from one or more chrominance components of a given colour space. This paper explores the validity of some of these colour spaces using a simple feature selection algorithm and Gaussian Mixture Model (GMM) classifiers.

Constructing GMMs that accurately model the skin and non-skin distributions is non-trivial considering the wide variability of human skin tones. For some applications, this problem can be circumvented by training GMMs for each subject. However, for general purpose skin detectors, this is not possible. Instead, feature spaces that remain relatively stable across all skin tones are sought.

Wark and Sridharan exploit the observation that skin tones contain a predominance of red and use simple thresholding of the R/G ratio to detect skin-like pixels [1]. The dominant cluster of skin-like pixels is presumed to correspond to the face region for the class of imagery being considered. They further extend this technique and use different thresholds to segment the lip region from the surrounding skin, thereby tracking the lips.

Terrillon et al introduce a similar skin detection algorithm as part of their face detector [2]. Skin colour segmentation is combined with shape analysis using invariant moments as input to a multilayer perceptron neural network. A hue-saturation colour space is used to reduce the effects of human skin colour variation and the dependency of chrominance on illumination changes.

Brown et al utilise a Self Organising Map (SOM) to classify pixels from a number of different colour spaces [3]. The SOM approach offers benefits in terms of adaptability and it can be efficiently implemented in hardware. However, although the approach is promising for images with prominent faces, it suffers on more complex imagery with significant background clutter.

2. SKIN DETECTION

2.1. Algorithm

Purely pixel-based skin detection algorithms coarsely classify individual pixels according to their skin-likeness. Subsequently, thresholding can be used to produce a binary decision. Alternatively, for greater flexibility, a skin probability map can be computed for the image. Similar to conventional block-based compressors, our algorithm subdivides the image into equal sized blocks. Therefore, pixel neighbourhood information can be exploited from the very beginning. For the purposes of this investigation, the image was subdivided into 8×8 blocks. Larger sized blocks would capture better neighbourhood statistics, but would also increase the coarseness of the classification. Furthermore, using 8×8 blocks is consistent with block-based compressors. The histogram of each block from the chosen colour



IV - 3620























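Fig. 1. Skin detector block diagram. Blocks from the chosen colour space pass through histogram computation and feature selection (Peak, LValley, RValley); the skin GMM and non-skin GMM supply P(x|s) and P(x|non-skin) to the classifier, which uses them together with P(s) and P(s|x) to label each block as skin or non-skin.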
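The front end of the block diagram (block subdivision, histogram computation and feature selection) might be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the 8×8 block size, the 32-bin histogram, the valley rule (walk away from the peak until the counts stop falling) and all function names are assumptions made for the example.

```python
from typing import List, Tuple

BLOCK = 8   # assumed block size in pixels
BINS = 32   # assumed number of histogram bins (not stated in the text)


def split_blocks(plane: List[List[int]], size: int = BLOCK) -> List[List[int]]:
    """Split a 2-D colour component into flattened size x size blocks.

    Partial blocks at the right and bottom edges are dropped.
    """
    rows, cols = len(plane), len(plane[0])
    blocks = []
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            blocks.append([plane[y][x]
                           for y in range(top, top + size)
                           for x in range(left, left + size)])
    return blocks


def histogram(values: List[int], bins: int = BINS,
              lo: int = 0, hi: int = 256) -> List[int]:
    """Count values into equal-width bins over [lo, hi)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    return counts


def peak_features(counts: List[int]) -> Tuple[int, int, int]:
    """Return (peak height, left valley index, right valley index).

    A valley is taken to be the bin where the histogram stops
    decreasing when walking away from the peak (assumed definition).
    """
    peak = max(range(len(counts)), key=counts.__getitem__)
    left = peak
    while left > 0 and counts[left - 1] < counts[left]:
        left -= 1
    right = peak
    while right < len(counts) - 1 and counts[right + 1] < counts[right]:
        right += 1
    return counts[peak], left, right
```

Each block then yields a three-element feature vector, for example `peak_features(histogram(block))`, which is what the GMM stage would consume.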

programs are more likely to use HSV (hue, saturation and value/brightness).

Limiting of the normalised colour component R/G has previously been used for detecting skin tones [1]. In [5] it was found that detection performance could be improved if the R/B colour component was also utilised. Both colour components were utilised in this investigation, but with much wider limits. The limiting was still necessary to avoid the singularities that occur when either the green or blue colour component is zero.

The chrominance components, Cb and Cr, taken from the YCbCr colour space were also of interest because of their use in image compressors. If they proved useful for skin detection then skin regions could be found without entirely decompressing the image or video streams. An 8×8 block size makes this prospect additionally inviting.

The final colour components considered are the tint (T) and saturation (S) from the TSL colour space. The tint and saturation have previously proven to be useful for skin detection [6]. Equations 2, 3 and 4 give the conversion from RGB to TSL colour space.
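The ratio and chrominance components discussed above can be illustrated with a short Python sketch. The limits `RATIO_MIN` and `RATIO_MAX` are hypothetical, since the text does not state the values used, and mapping a zero denominator to the upper limit is one possible way of handling the singularity. The Cb and Cr weights are the full-range BT.601 coefficients used by JFIF, which is an assumption about the kind of YCbCr conversion intended.

```python
from typing import Tuple

# Hypothetical limits for the ratio components; the text says only that
# limiting was applied, not which values were used.
RATIO_MIN = 0.0
RATIO_MAX = 8.0


def limited_ratio(num: float, den: float,
                  lo: float = RATIO_MIN, hi: float = RATIO_MAX) -> float:
    """Return num/den clamped to [lo, hi]; a zero denominator maps to hi."""
    if den == 0:
        return hi
    return min(hi, max(lo, num / den))


def ratio_components(r: int, g: int, b: int) -> Tuple[float, float]:
    """Limited R/G and R/B components for one RGB pixel."""
    return limited_ratio(r, g), limited_ratio(r, b)


def chroma_components(r: int, g: int, b: int) -> Tuple[float, float]:
    """Cb and Cr for 8-bit RGB using JFIF (full-range BT.601) weights."""
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr
```

When the input is already a JPEG or MPEG stream, the Cb and Cr planes are available directly from the codec, so only the ratio components would need a conversion like this.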

space is computed and the height of the peak and the locations of its left and right valleys are determined. The height is a measure of the peak's density and the locations of the left and right valleys are measures of its spread. These features were selected as they capture relevant statistics, are trivial to compute, and do not presume that the histogram follows a Gaussian distribution. The resultant feature vector is classified using Gaussian mixture models (GMMs). Figure 1 summarises the skin detection algorithm.

The skin models were constructed using feature vectors extracted from hand segmented skin regions present in 156 images from the XM2VTS database [4]. The non-skin models were constructed from 179 images from a natural imagery database collected in-house. The model orders that maximised detection performance were used to compare the colour spaces and are summarised in Table 1.

The likelihood ratio threshold for acceptance as a skin region is calculated according to Equation 1, which can be directly derived from Bayes' theorem. The a priori probability of skin, P(s), is set according to a particular application. The a posteriori class probability, P(s|x), can then be adjusted to set the false acceptance and false rejection rates. For the imagery considered in this investigation, P(s) was set to 0.6 and P(s|x) was adjusted to produce the Detection Error Trade-off (DET) curves of Figure 3.
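The classification stage can be sketched as below, assuming diagonal-covariance GMMs whose parameters have already been trained (training itself is omitted). The threshold follows from Bayes' theorem with P(non-skin) = 1 − P(s): requiring the posterior to reach a chosen value P(s|x) is equivalent to requiring the likelihood ratio to reach P(s|x)(1 − P(s)) / (P(s)(1 − P(s|x))). The data layout and all function names are assumptions of this example, and the default posterior of 0.5 is illustrative only.

```python
import math
from typing import List, Sequence, Tuple

# One mixture component: (weight, means, variances), diagonal covariance.
Component = Tuple[float, Sequence[float], Sequence[float]]


def gmm_likelihood(x: Sequence[float], gmm: List[Component]) -> float:
    """Evaluate a diagonal-covariance GMM density at feature vector x."""
    total = 0.0
    for weight, means, variances in gmm:
        log_p = 0.0
        for xi, mu, var in zip(x, means, variances):
            log_p -= 0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
        total += weight * math.exp(log_p)
    return total


def ratio_threshold(prior_skin: float, posterior_skin: float) -> float:
    """Likelihood ratio at which P(s|x) equals posterior_skin.

    Both arguments must lie strictly between 0 and 1.
    """
    return (posterior_skin * (1.0 - prior_skin)) / (prior_skin * (1.0 - posterior_skin))


def is_skin(x: Sequence[float], skin_gmm: List[Component],
            non_skin_gmm: List[Component],
            prior_skin: float = 0.6, posterior_skin: float = 0.5) -> bool:
    """Accept a block as skin when its likelihood ratio meets the threshold."""
    threshold = ratio_threshold(prior_skin, posterior_skin)
    # Cross-multiplied form avoids dividing by a vanishing non-skin likelihood.
    return gmm_likelihood(x, skin_gmm) >= threshold * gmm_likelihood(x, non_skin_gmm)


def error_rates(skin_ratios: Sequence[float], non_skin_ratios: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """False rejection and false acceptance rates at one threshold."""
    frr = sum(r < threshold for r in skin_ratios) / len(skin_ratios)
    far = sum(r >= threshold for r in non_skin_ratios) / len(non_skin_ratios)
    return frr, far
```

Sweeping `posterior_skin` over (0, 1) and recording `error_rates` at each resulting threshold gives the points of a DET curve; the equal error rate is the operating point where the two rates coincide.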









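\[ \frac{P(x \mid s)}{P(x \mid \bar{s})} > \frac{P(s \mid x)\,\bigl(1 - P(s)\bigr)}{P(s)\,\bigl(1 - P(s \mid x)\bigr)} \tag{1} \]

\[ T = \begin{cases} \dfrac{1}{2\pi}\arctan\!\left(\dfrac{r'}{g'}\right) + \dfrac{1}{4}, & g' > 0 \\[2ex] \dfrac{1}{2\pi}\arctan\!\left(\dfrac{r'}{g'}\right) + \dfrac{3}{4}, & g' < 0 \\[2ex] 0, & g' = 0 \end{cases} \tag{2} \]

\[ S = \sqrt{\tfrac{9}{5}\left(r'^{2} + g'^{2}\right)} \tag{3} \]

\[ L = 0.299R + 0.587G + 0.114B \tag{4} \]

where \bar{s} denotes the non-skin class, r' = r - 1/3, g' = g - 1/3, r = R/(R+G+B) and g = G/(R+G+B).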
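As a hedged illustration of the RGB to TSL conversion, the sketch below uses the normalised-rgb definition of tint, saturation and luminance commonly attributed to Terrillon and Akamatsu [6]. The function name, the convention for black pixels and the constants are assumptions of this example rather than values confirmed by the text.

```python
import math
from typing import Tuple


def rgb_to_tsl(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Convert one RGB pixel to (tint, saturation, luminance)."""
    total = r + g + b
    if total == 0:
        # Black pixel: chromaticity is undefined, so return zeros (assumed convention).
        return 0.0, 0.0, 0.0
    # Normalised chromaticities shifted so that grey maps to the origin.
    r_p = r / total - 1.0 / 3.0
    g_p = g / total - 1.0 / 3.0
    if g_p > 0:
        tint = math.atan(r_p / g_p) / (2.0 * math.pi) + 0.25
    elif g_p < 0:
        tint = math.atan(r_p / g_p) / (2.0 * math.pi) + 0.75
    else:
        tint = 0.0
    saturation = math.sqrt(9.0 / 5.0 * (r_p ** 2 + g_p ** 2))
    luminance = 0.299 * r + 0.587 * g + 0.114 * b
    return tint, saturation, luminance
```

Only the tint and saturation would feed the skin detector; the luminance is returned for completeness.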



2.2. Colour Spaces

The literature proposes numerous different colour spaces, the choice of which is typically application dependent. For instance, it is well established that the human eye is less sensitive to chromatic information than to luminance information. Image and video compressors exploit this by utilising a colour space like YCbCr and subsampling the chrominance components. Conversely, computer graphics

3. RESULTS

The choice of optimum orders for the skin and non-skin GMMs is not obvious. Thirty-six different topologies were evaluated for each colour space and those that maximised detection accuracy were selected for further comparison. Table 1 summarises the chosen model orders and the corresponding equal error rates (EER) for each colour space.

As mentioned earlier, the skin GMM was constructed using feature vectors taken from images from the XM2VTS database. However, a true indication of skin detection performance is not possible with this database. The use of controlled lighting, the absence of background clutter and the predominance of skin regions significantly simplify the task of skin detection. The skin detectors were therefore evaluated using an in-house face database with known skin regions. The images contain predominantly Caucasian subjects but skin regions for Asian and

black subjects are also present. They were captured using varying scales and illumination conditions, exhibit typical indoor background clutter and may simultaneously include skin regions from multiple subjects. Figure 2 depicts some example test images and the corresponding detection results.

Table 1. GMM orders that maximised detection accuracy.

Colour Space   Skin GMM order   Non-skin GMM order   Equal Error Rate (%)
R/G            8                16                   13.01
R/B            2                4                    7.29
R/G+R/B        4                32                   8.04
Cb             2                2                    6.50
Cr             4                16                   6.07
Cb+Cr          2                4                    5.50
T              2                8                    19.76
S              4                4                    8.46
T+S            2                2                    7.09

Fig. 2. Example skin tone detection.

The performance of skin detectors constructed using the individual colour components was first investigated. Where plausible, the feature vectors were subsequently concatenated and the detection accuracy of the fused skin detectors was evaluated. Figures 3 (a)-(c) compare the classification results of the fused and un-fused detectors. The results for the best performing detector in each colour space are compared in Figure 3 (d).

With the exception of R/G+R/B, the fusion of feature vectors resulted in a marginal improvement in classification accuracy. The primary cause for the failure of R/G+R/B is the poor performance obtained with the R/G colour component. However, further investigation is required to determine whether this is a general problem or whether it is a property of the data that was used in the experiments. Regardless, the performance of R/B alone is comparable to the best of the skin detectors for our test imagery.

The fusion of feature vectors extracted from the chrominance components, Cb and Cr, gives the best performance for our test imagery as demonstrated in Figure 3 (d). However, the margin is not great and depending on the desired operating point and complexity, it may be beneficial to select a different colour space. Furthermore, all our test cases were drawn from an indoor image database and results are likely to vary for outdoor scenes.

Fig. 3. Detection error trade-off curves, plotting miss probability (in %) against false alarm probability (in %): (a) DET curves for R/G, R/B & R/G+R/B; (b) DET curves for Cb, Cr & Cb+Cr; (c) DET curves for T, S & T+S; (d) DET curves for R/B, Cb+Cr & T+S.

4. CONCLUSIONS

An algorithm for skin detection that uses only three features from 8×8 blocks of the chosen colour components was introduced. The skin detection algorithm was evaluated using fused and un-fused features from a number of different colour spaces. Therefore, the plausibility of both the detection algorithm and the choice of colour space could be investigated. Minimum classification error for our training set was obtained using the fused Cb and Cr features with GMM orders equal to 2 and 4 for the skin and non-skin models respectively. The equal error rate for this topology was only 5.5%. Furthermore, by using these components in conjunction with a block size of 8×8, skin regions in JPEG and MPEG files can be located without full decompression.

For indoor images, the accuracy of the skin detector proved to be robust to both the races of the subjects and the presence of background clutter. Many of the blocks that are falsely classified as skin regions could be eliminated using morphological operators without significantly affecting correctly classified blocks. Further work is required to quantify the detector's performance on outdoor imagery.
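One way the suggested morphological clean-up could look is a binary opening (erosion followed by dilation) applied to the block-level classification map. The sketch below is only an illustration: the 3×3 structuring element, the choice of opening rather than another operator, and the function names are all assumptions, and no such post-processing is evaluated in the text.

```python
from typing import List

Mask = List[List[int]]  # 1 = block classified as skin, 0 = non-skin


def _neighbourhood(mask: Mask, y: int, x: int) -> List[int]:
    """Values in the 3x3 window around (y, x); outside the map counts as 0."""
    rows, cols = len(mask), len(mask[0])
    return [mask[j][i] if 0 <= j < rows and 0 <= i < cols else 0
            for j in (y - 1, y, y + 1) for i in (x - 1, x, x + 1)]


def erode(mask: Mask) -> Mask:
    """Keep a block only if its whole 3x3 neighbourhood is skin."""
    return [[int(all(_neighbourhood(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def dilate(mask: Mask) -> Mask:
    """Mark a block as skin if any block in its 3x3 neighbourhood is skin."""
    return [[int(any(_neighbourhood(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def opening(mask: Mask) -> Mask:
    """Erosion followed by dilation: removes isolated skin blocks."""
    return dilate(erode(mask))
```

With a 3×3 element, any skin region narrower than three blocks in either direction is removed as well, so a smaller element may be preferable when small faces or hands must be kept.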















5. REFERENCES

[1] Timothy Wark and Sridha Sridharan, “A syntactic approach to automatic lip feature extraction for speaker identification,” in Proceedings of ICASSP ’98, May 1998, pp. 3693–3696.

[2] Jean-Christophe Terrillon, Martin David, and Shigeru Akamatsu, “Automatic detection of human faces in natural scene images by use of a skin color model and of invariant moments,” in Proceedings of FG ’98, 1998, pp. 112–117.

[3] David Brown, Ian Craw, and Julian Lewthwaite, “A SOM based approach to skin detection with application in real time systems,” in Proceedings of BMVC ’01, 2001.

[4] K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre, “XM2VTSDB: the extended M2VTS database,” in Proceedings of AVBPA ’99, 1999.

[5] Jason Brand and John S Mason, “A comparative assessment of three approaches to pixel-level human skin-detection,” in Proceedings of ICPR ’00, 2000, vol. 1, pp. 1056–1059.

[6] Jean-Christophe Terrillon and Shigeru Akamatsu, “Comparative performance of different chrominance spaces for color segmentation and detection of human faces in complex scenes,” in Proceedings of Vision Interface ’99, May 1999, pp. 180–187.