Adaptive Color Space Switching for Face Tracking in Multi-Colored Lighting Environments Helman Stern and Boris Efros Department of Industrial Engineering and Management Ben-Gurion University of the Negev P.O.Box 563, Beer-Sheeva 84105, ISRAEL E-mail: (helman, efros)@bgumail.bgu.ac.il
Abstract

There are many studies that use color space models (CSMs) for the detection of faces in an image. Most researchers a priori select a given CSM and proceed to use the selected model for color segmentation of the face by constructing a color distribution model (CDM). There is limited work on finding the overall best CSM. We develop a procedure to adaptively change the CSM throughout the processing of a video. We show that this works in environments where the face moves through multi-positioned light sources with varying types of illumination. A test of the procedure using the 2D color space models RG, rg, HS, YQ and CbCr found that switching between the color spaces resulted in increased tracking performance. In addition, we have proposed a new performance measure for evaluating color tracking algorithms, which includes both the accuracy and the robustness of the tracking window. The methodology developed can be used to find the optimal CSM-CDM combination in adaptive color tracking systems.

Keywords: image processing, color segmentation, face tracking, color space model selection, adaptive color segmentation
1. Introduction

Many interesting and useful applications have been developed using facial digital images. The face detection problem is of prime importance and involves determining whether or not a human face is present in an image and, if present, returning its location. Detection of faces is often preceded by the extraction of various cues. Typically, such cues take the form of features of the face such as shape, color, motion, and relational position to the rest of the human body or other objects in the scene (such as desks or doorways). Shape alone is not sufficient to track a face, due to the various facial orientations and partial occlusions during motion. Motion alone also fails, due to the confusion introduced by other body movements and moving objects in the scene. Color, however, especially skin color, provides a powerful cue to segment the face from other objects in the scene. In this paper we develop a skin-color-based tracking system. We have found that in environments where the face moves through non-stable illumination conditions involving multiple light sources, it is advantageous to employ a dynamic approach. Such an approach adaptively switches between a number of color space models (CSMs) as a function of the state of the environment, as well as dynamically updating the corresponding color distribution model (CDM). Although adaptively updating the color distribution model has received some attention in the literature, we believe this is the first attempt to adaptively select CSMs throughout a tracking sequence. Section 2 briefly describes our face tracking system. Section 3 deals with the problem of CSM and CDM selection in color-based tracking systems. Sections 4 and 5 provide adaptive versions of the CDM and CSM, respectively. This is done in the context of color tracking in unstable environments (objects moving across variable lighting sources). A new face tracking performance measure for the accuracy and robustness of the tracking window is proposed in section 6. Section 7 provides the experimental results of our adaptive color-switching algorithm in a multi-colored light environment. Conclusions and future research are the subject of the final section.
2. System Architecture - Tracking by Color

We combine a number of procedures to construct an enriched face tracking approach. We not only update our CDM as others do [2], [9], but also switch different color spaces in and out throughout the tracking sequence. A histogram CDM is used rather than a Gaussian-type model, for real-time considerations, but the system is general enough to accommodate any CDM. Our color switching procedure is performed inside the framework of the CAMSHIFT tracking algorithm [1]; however, our functions can be incorporated within other tracking-by-color algorithms. At each iteration of the CAMSHIFT algorithm the image is converted into a flesh probability image (FPI) using the CDM of the skin color being tracked (see section 3.2). Using the FPI, the center and size of the color object are found. The current size and location of the tracked object are used to set the size and location of the search window in the next frame. This is done by overlaying the last window upon the new image. Since only a portion of the face pixels may lie within this window, the mean of this portion of the face pixels is determined, and the window center is shifted to it. The process is then repeated until convergence. The algorithm is a generalization of the mean shift algorithm [3]. See G. Bradski [1] and [2] for implementation details.

In order to evaluate different color spaces during a run, the system must provide and adaptively maintain a CDM for each of the CSMs considered. The system of tracking by color is an iterative one, with each iteration corresponding to one video frame. Figure 1 shows a flow diagram of step t of the color switching procedure as used with the CAMSHIFT algorithm.

Figure 1. Flowchart of tracking by color system

Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition (FGR'02) 0-7695-1602-5/02 $17.00 © 2002 IEEE

3. Color Space and Color Distribution Model Selection

3.1. Color Space Models

Numerous types of CSMs are used for segmentation and tracking of objects in a scene. Most CSMs represent colors in a three-coordinate color space such as RGB or HSV. Conversion formulas are used to transform from one space to another. Because these coordinates are often correlated, they are frequently reduced to two or even one dimension; this also reduces the computational load and can eliminate phenomena such as brightness from the scene. There are three approaches to the problem of color space selection: (a) Static CSM - a priori select a single color space at the start of a processing session and keep it throughout. (b) Static CSM selected among several candidates - a priori test several candidate color models and select the best one to be applied continuously throughout the video sequence. (c) Dynamic CSM selection among several candidates - dynamically select among a set of candidate CSMs and switch among them throughout the video sequence.

3.2. Color Distribution Models

A CDM is a statistical model representing a gamut of colors, which constitutes a subspace of the colors of a given CSM. The subset of colors may be the result of sampling the colors of a selected object (for example, facial skin colors appearing in one or more images). The statistical model often takes the form of a probability distribution function. Let p(c) for some CDM represent the probability that color c appears in an image I. Then, given the CDM and an image I, a new image may be constructed by replacing each pixel with color c in I by p(c). Without loss of generality, p(c) ∈ [0, 1]. We call this new image a flesh probability image (FPI).

4. Adaptive Color Distribution Models

To adaptively update a CDM, the pixel colors from the face region are collected and used as input to build a new temporal CDM, denoted CDM_t^new. The new CDM_t^new and the current CDM_t are used to construct the updated CDM_{t+1}, which is used for the next detection and kept until the next CDM-update operation.
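To make the FPI construction described above concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes a histogram CDM over the normalized rg chromaticity space, and all function names (`rgb_to_rg`, `build_cdm`, `flesh_probability_image`) are hypothetical.

```python
import numpy as np

def rgb_to_rg(img):
    """Convert an RGB image (H, W, 3) to normalized rg chromaticity (H, W, 2)."""
    s = img.sum(axis=2, keepdims=True).astype(float)
    s[s == 0] = 1.0                      # avoid division by zero on black pixels
    return img[:, :, :2] / s             # r = R/(R+G+B), g = G/(R+G+B)

def build_cdm(samples, bins=50):
    """Histogram CDM from face-pixel samples (N, 2) with values in [0, 1]."""
    h, _, _ = np.histogram2d(samples[:, 0], samples[:, 1],
                             bins=bins, range=[[0, 1], [0, 1]])
    return h / h.max()                   # normalize so p(c) lies in [0, 1]

def flesh_probability_image(rg, cdm):
    """Back-project the CDM onto the image: replace each pixel color by p(c)."""
    bins = cdm.shape[0]
    idx = np.clip((rg * bins).astype(int), 0, bins - 1)
    return cdm[idx[..., 0], idx[..., 1]]
```

The resulting FPI is what CAMSHIFT iterates on: the search window is repeatedly re-centered on the mean of the flesh probability mass inside it until convergence.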
Let the smoothing parameter be α. Then, given a color represented in the 2D color space at position (i, j), the new updated value is determined by:

    CDM_{t+1}(i, j) = α · CDM_t^new(i, j) + (1 − α) · CDM_t(i, j)    (1)

A value of 0.6 was selected for α after a number of possible values in the range [0.2, 0.8] were tested. Such smoothing can be produced for all CDM models except Gaussian mixtures. For Gaussian mixture CDMs the smoothing should be applied to each component's parameters μ and σ. If the model contains only one Gaussian component, then:

    μ_{t+1} = α · μ_t^new + (1 − α) · μ_t    (2)

    σ_{t+1} = α · σ_t^new + (1 − α) · σ_t    (3)

Figure 2. FPIs for four color spaces (A - RG, B - rg, C - HS, D - CbCr)

Care must be taken in the case of a mixed Gaussian model, where splits and mergers cause the number of parameter pairs (μ, σ) to change. See Gong [6] for a discussion of this topic.
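Equation (1) is an exponential smoothing of the histogram bins. A minimal sketch under that reading (hypothetical names; α = 0.6 as selected above):

```python
import numpy as np

ALPHA = 0.6  # smoothing parameter, selected by testing values in [0.2, 0.8]

def update_cdm(cdm_t, cdm_new):
    """Eq. (1): CDM_{t+1}(i,j) = alpha*CDM_t^new(i,j) + (1-alpha)*CDM_t(i,j)."""
    return ALPHA * cdm_new + (1.0 - ALPHA) * cdm_t
```

Because the update is element-wise, the same one-liner applies unchanged to the μ and σ updates of equations (2) and (3) for a single-Gaussian CDM.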
5. Adaptive Color Space Models

There are many researchers who use CSMs for the selection or segmentation of objects by color. The main objective is to achieve the best segmentation of an object of a certain color from the background. There are two cases with respect to knowledge about the object and the background. Case (a): we have some knowledge (given by samples) about the background as well as the object; the solution is to construct color models of both of them for each one of a set of alternative color spaces. The best color space is then chosen using some predefined evaluation measure. Case (b): we have knowledge about the object distribution only. In such a case the solution is to find a CSM such that the object's pixels form a cluster as compact as possible [5], [7], [11]. Note that case (a) assumes a stable distribution of both background and foreground, while case (b) assumes only a stable foreground. Because of the stability assumption, CSM selection and segmentation can be performed in tandem: first the best CSM is selected using prior knowledge, after which this CSM, and only this CSM, is used for segmentation. See, for example, [4], [5], [8]. Our case is much more complicated for the following reasons: 1) the object and background color distributions are unknown; 2) in a single run of the detection algorithm the background and foreground colors can undergo extreme changes; 3) the background can contain the object color (such as similarly colored lights). Under such conditions, if a color space is chosen before the tracking process begins and applied throughout, one cannot be sure about its performance during tracking. For all of these reasons it may be advantageous to use a number of different color spaces during a single run.

5.1. Color Space Switching Method

Our prime assumption is that the color distribution changes continuously in time by small incremental amounts (due to the sampling of the scene at the video frame rate). This assumption is very sensible in our experimental environment because of the continuous face motion and the non-sharply shaped objects (human faces). Suppose we have processed a face segmentation at frame number t. We can use the results of this segmentation not only to adjust the object color model, but also to sample the background. Under this assumption we can suppose that the background changes little between two adjacent frames, which is why the preferred color space will be a good prediction for the next step. If we construct m CDMs for m different color spaces for a single image, we obtain m different FPIs. For example, Figure 2 shows the FPIs resulting from applying CDMs of size 50 x 50 constructed under the RG, rg, HS and CbCr color spaces. Analyzing Figure 2, we should pay attention to three aspects: (a) the brightness of the face region in the FPI, (b) the amount of noise in the region close to the face, and (c) the ratio between the face weight and the near-face noise. For example, the spaces RG and rg seem better than the others. Although RG (A) succeeds in almost completely removing the noise in the face neighborhood, the weight of its flesh probability is lower than the very bright face pixels under rg (B). Although the CbCr (D) color space gives a very bright face region, the amount of noise in the face neighborhood is also very large.

5.2. Color Space Quality Measure
We now want to find a measure to select the CSM which is most likely to separate the human face from the background. This means that the face pixels in the face probability image (FPI) should display brighter values (values closer to 1), and the area close to the face (actually the background) should display a minimum amount of noise. To achieve this we separate the area of attention into two prime parts, using a tracking window as a base: (i) an internal rectangular window containing the face, and (ii) an external rectangular window containing the face and its immediate neighborhood. The internal and external rectangular windows are located in the same position, but the external rectangular window is four times the face area. Figure 3 shows two examples of face probability images constructed from two different CSMs with the windows superimposed. Although other areas of the scene contain flesh colors, the left FPI corresponds to a superior CSM, as the face is more isolated from its surroundings.

Figure 3. Internal and external face windows for two color spaces

To provide a measure of superiority we define a color space quality measure r_k. Denote the region of the internal rectangular window as W_i and the region of the external rectangular window as W_e. Let the coordinates of their common center be (c_x, c_y), which is the center of mass of the pixels in W_i. Let k represent the k-th color space in the set of all color spaces Ω. For color space k, denote the probability that pixel (i, j) of the image belongs to the foreground (face) as p_k(i, j). After a number of experiments using different possible measures, the following was chosen as the most effective:

    r_k = [ (1/|W_i|) Σ_{(i,j) ∈ W_i} p_k(i, j) ]² / [ (1/|W_e − W_i|) Σ_{(i,j) ∈ W_e − W_i} p_k(i, j) ]    (4)

Note that the expression (1/|W_i|) Σ_{(i,j) ∈ W_i} p_k(i, j) can be viewed as the "weight" of the search window W_i. Thus r_k is simply the ratio of the internal to external window weights (the weight of the internal window is amplified by squaring it), with each weight normalized by its window area. The best color space is selected by finding the color space k that maximizes r_k.

5.3. Adaptive Color Space Switching Algorithm

The following is a pseudo-code version of the tracking algorithm using CSM switching. A single tracking step is as follows:

0. Initialization. Let τ = 1. Let T = the maximum time period. Let k(τ) represent the best color space at time τ.
1. Get the window with the initial face location and construct CDM_k for all k. Sample RGB colors from the current window.
2. Update the CDMs under each color space k in Ω.
3. Find the internal and external windows.
4. For all k ∈ Ω:
   a. Apply the CDM under k to the internal and external windows.
   b. Calculate the color space estimation measure r_k.
5. Let k0 represent the best current color space for time τ, such that max{ r_k | k ∈ Ω } = r_k0.
6. Set τ = τ + 1 and k(τ) = k0. If τ = T then STOP.
7. Apply the CAMSHIFT algorithm to find the face region (internal window). Go to (1).

A sample history of the adaptive color space-switching model can be seen in Figure 5. Here the set of color spaces is Ω = {RG, rg, HS, YQ, CbCr}. The horizontal axis represents time (frame number). The vertical axis represents the quality measure r_k under each color space k at each given time epoch. If there is at least one intersection of the upper line, tracking with switching is likely to give better results. The main disadvantage of switching is the necessity of maintaining multiple CDMs during an entire run. To alleviate this, switching can be done at constant intervals or on command using some heuristic. For example, the detection of a dramatic change in the color distribution model of the face region can be used to trigger the color space-switching algorithm.

6. Face Tracking Performance Measures

A review of the literature revealed little discussion of well-defined performance measures for face tracking quality. Many authors were satisfied with simple "successful / unsuccessful" classifications. We propose a two-level performance measure. Primary objective: avoid losing the tracking window; i.e., the intersection of the search window region and the face region must not be empty at any time. Secondary objective: bring the tracking window W_t as close as possible to the "minimal face-enclosing window" W_t^opt.
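The quality measure r_k of section 5.2 and the arg-max selection of step 5 can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes r_k is the squared area-normalized flesh weight of the internal window divided by the area-normalized weight of the surrounding ring W_e − W_i, and all names are hypothetical.

```python
import numpy as np

def quality_r(fpi, top, left, h, w):
    """r_k: squared internal-window weight over the weight of the
    surrounding ring W_e - W_i (external window = 4x the face area)."""
    wi = fpi[top:top + h, left:left + w].mean()          # internal weight
    # external window: same center, twice each side length (4x the area)
    et, el = max(0, top - h // 2), max(0, left - w // 2)
    ext = fpi[et:top + h + h // 2, el:left + w + w // 2]
    ring_sum = ext.sum() - fpi[top:top + h, left:left + w].sum()
    ring_area = ext.size - h * w
    we = ring_sum / ring_area if ring_area else 1e-9     # ring weight
    return wi ** 2 / max(we, 1e-9)

def best_color_space(fpis, top, left, h, w):
    """Step 5: pick the color space k maximizing r_k over a dict of FPIs."""
    return max(fpis, key=lambda k: quality_r(fpis[k], top, left, h, w))
```

A bright, well-isolated face region scores high; an FPI where the background carries comparable flesh probability scores low, so the cleaner color space wins.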
Figure 5. History of the color switching model for five candidate color spaces
Figure 4. The experimental setting

Let R_t and R_t^opt represent the sets of pixels in the regions inside the windows W_t and W_t^opt, respectively. Let P_f(t) and P_m(t) represent the fractions, in the range [0, 1], of false and missed pixels, respectively. Then:

    P_f(t) = |R_t − R_t^opt| / |R_t|    (5)

    P_m(t) = |R_t^opt − R_t| / |R_t^opt|    (6)

Let c_f and c_m represent the costs of a false and a missed pixel, respectively. We will assume that c_m ≥ c_f, since it is more serious to err by missing real face pixels than by false positives. Hence, the tracking window accuracy at time t can be expressed as:

    ε(t) = P_f(t) · c_f + P_m(t) · c_m    (7)

Define a tracking indicator b(t), equal to one when tracking is lost and zero when tracking is successful. Then the primary objective is:

    b(t) = 1 if R_t ∩ R_t^opt = ∅; 0 otherwise    (8)

Equations (7) and (8) provide an evaluation of tracking correctness at the frame level. For a sequence of N frames we wish to (a) minimize the mean error and (b) minimize the maximum error:

    ε_avg = (1/N) Σ_{t=1}^{N} ε(t)    (accuracy evaluation)    (9)

    b_max = max_t { b(t) }    (tracking loss evaluation)    (10)

7. Experimental Results for a Multi-Colored Light Environment

To test the procedure the following experiment was conducted. A subject moves about in a room containing three different light sources: (a) very dim overhead fluorescent lights, (b) a bright white lamp located to the left, outside of the scene, and (c) a purple light located to the right, near the floor. A video clip containing 37 frames was taken with a Canon VCC-1 camera mounted on a tripod, oriented toward the subject between the two light sources. The subject wears a short-sleeved tie-dye T-shirt containing concentric bands of rainbow colors, some identical to the facial skin colors. The subject moves sequentially through four light changes: dim light, purple light, bright light, purple light. The subject is initially standing inside the room, away from the camera, in very dim light (see Figure 4). After emerging from the dim light area, the subject moves toward the camera and bends over the purple light, causing the subject's face and T-shirt to contain identical purple hues. The subject then turns to face, and moves toward, the bright white light on the left. Finally, the subject returns and bends down to the purple light a second time. The graph in Figure 5 shows the history of the adaptive color-switching model. Of the five color spaces used, the model switched between only the HS and RG color spaces. The switches occurred at frames 6, 11, and 36. The "dominant color space" was HS, used 32/37 = 86% of the time. RG appeared to switch in during the purple light portions of the sequence. Results obtained on the same video sequence without adaptive color switching (using only the HS CSM) caused the tracking window to be lost. This occurred when the subject emerged from the purple light and turned toward the white light. Technically, tracking was not lost, but the tracking window expanded from the subject's face into the entire upper portion of the body, enclosing the subject's T-shirt.
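The frame-level measures of equations (5)-(8) and the sequence-level summaries (9)-(10) can be sketched on pixel sets. This is a hedged illustration: the set-based window representation and the cost values c_f = 1, c_m = 2 are assumptions for the example, not values from the paper.

```python
def frame_measures(R_t, R_opt, c_f=1.0, c_m=2.0):
    """Pixel sets R_t (tracking window) and R_opt (minimal face-enclosing
    window); returns (P_f, P_m, accuracy eps, lost indicator b).
    Assumes c_m >= c_f: missing real face pixels is the more serious error."""
    R_t, R_opt = set(R_t), set(R_opt)
    P_f = len(R_t - R_opt) / len(R_t)        # eq (5): false-pixel fraction
    P_m = len(R_opt - R_t) / len(R_opt)      # eq (6): missed-pixel fraction
    eps = P_f * c_f + P_m * c_m              # eq (7): window accuracy error
    b = 1 if not (R_t & R_opt) else 0        # eq (8): tracking lost indicator
    return P_f, P_m, eps, b

def sequence_measures(frames):
    """Eqs (9)-(10): mean accuracy error and worst-case loss over N frames.
    `frames` is a list of (R_t, R_opt) pixel-set pairs."""
    vals = [frame_measures(rt, ro) for rt, ro in frames]
    eps_avg = sum(v[2] for v in vals) / len(vals)
    b_max = max(v[3] for v in vals)
    return eps_avg, b_max
```

In practice R_t and R_opt would be the pixel coordinates covered by the tracking window and a hand-labeled face bounding box, respectively.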
8. Conclusion and Future Work
In this work an adaptive color space-switching algorithm has been proffered. A test of the procedure using five 2D CSMs (RG, rg, HS, YQ and CbCr) found that switching between the RG and HS color spaces resulted in increased tracking performance. In addition, we have proposed a new performance measure for evaluating color tracking algorithms which includes both the accuracy and the robustness (lost tracking window) of the tracking window. The methodology developed is useful as a design and research tool for evaluating combinations of CSM and CDM in adaptive tracking-by-color systems. Future work remains to further test the system, to use it to carry out investigations of optimal design configurations, and to discover the environmental conditions under which various CSM and CDM combinations are optimal.

Acknowledgement

This research was partially supported by the Paul Ivanier Center for Robotics Research and Production Management, Ben-Gurion University of the Negev.

References

[1] G. Bradski. Computer vision face tracking for use in a perceptual user interface. Intel Technology Journal, http://developer.intel.com/technology/itj/q21998/articles/art2.htm, 2nd quarter 1998.
[2] G. Bradski, B. Leo, and M. Yeung. Gesture for video content navigation. SPIE, 3656:230–242, December 1998.
[3] Y. Cheng. Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17:790–799, 1995.
[4] Y. Fang and T. Tan. A novel adaptive color segmentation algorithm and its application to skin detection. 11th British Machine Vision Conference, 1(1), September 2000.
[5] C. Garcia and G. Tziritas. Face detection using skin color region merging and wavelet packet analysis. IEEE Transactions on Multimedia, 1(3):264–277, September 1999.
[6] S. Gong, S. McKenna, and A. Psarrou. Dynamic Vision: From Images to Face Recognition. Imperial College Press, London, 2000.
[7] C. Lee, J. Kim, and K. Park. Automatic human face location in a complex background using motion and color information. Pattern Recognition, 29(11):1877–1889, 1996.
[8] S. McKenna, Y. Raja, and S. Gong. Color model selection and adaptation in dynamic scenes. European Conference on Computer Vision, July 1998.
[9] S. McKenna, Y. Raja, and S. Gong. Tracking color objects using adaptive mixture models. Image and Vision Computing, 3(17):225–231, March 1998.
[10] J. Yang and A. Waibel. A real-time face tracker. Third IEEE Workshop on Applications of Computer Vision, 1:142–147, 1996.
[11] X. Yin and M. Xie. Finger identification in hand gesture based human-robot interaction. Journal of Robotics and Autonomous Systems, 34(4):235–250, March 2001.