International Conference on Intelligent and Advanced Systems 2007
Tracking Using Normalized Cross Correlation and Color Space
Patrick Sebastian, Yap Vooi Voon
Electrical and Electronics Engineering Department, Universiti Teknologi PETRONAS
[email protected], [email protected]
Abstract
The aim of this paper is to describe the implementation of a face tracking algorithm for a video conferencing environment using the normalized cross correlation method. The reference image was selected based on human skin properties in the YCbCr color space. The results obtained on different color spaces showed that the YCbCr color space had a higher tracker detection rate (TDR) than the RGB and grayscale color spaces. In addition to the TDR being used as a metric to determine the accuracy of a tracking method, further metrics were formulated to quantify the accuracy of a tracking algorithm: the object tracking variance (OTV) and the object tracking standard deviation (OTStd), which indicate the accuracy of tracking a reference point. The results indicate that the YCbCr color space has a lower standard deviation than the RGB and grayscale color spaces. The results also indicate that the CbCr components of YCbCr can be used for correct tracking without the Y component.

Keywords: face tracking, skin properties, normalized correlation, phase correlation, color spaces
1. Introduction
Object tracking in video surveillance is the action of detecting a pre-determined reference object within another image, called the target image. In this work the target image is obtained from a webcam. The face tracking algorithm involves matching a reference image with the target image.
2. Approach and Methods
In this paper, the object tracking algorithm utilizes a template matching approach to perform the tracking task. Prior to the object tracking process, the image
from the webcam is captured and stored in order to process the image for tracking purposes. Correlation was one of the methods utilized in the development of the object tracking algorithm. This method compares an unknown signal to a known signal and obtains a measure of similarity between the two signals.
2.1. Cross Correlation in Spatial Domain
Cross correlation (CC) is the correlation of two different signals and is a standard approach to feature detection [1]. One application of cross correlation is template matching, where the task is to find the best match between two different images [2]. In template matching, a given pattern in one image is compared with a template containing the same pattern from another image. This paper proposes to use normalized cross-correlation (NCC) and color space to track a person's face in a video conferencing environment. The NCC methods have a significant advantage over standard CC methods in that they are robust to differing lighting conditions across an image and are less sensitive to noise [3]. The correlation between two images can be described as
c(u, v) = Σ_{x,y} f(x, y) t(x − u, y − v)    (1)

where f(x, y) is the image and t(x, y) is the feature (template) being searched for within the image. Eq. (1) is a measure of the similarity between the image and the feature. The range of c(u, v) is dependent on the size of the feature. In this paper, a template is shifted into different positions; at each position, intensities are multiplied and summed, producing a normalized cross correlation matrix γ(u, v). The NCC matrix γ(u, v) can be described as
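For illustration, the sliding product-and-sum of Eq. (1) can be sketched in NumPy. This is a minimal sketch, not the authors' implementation, and the toy image and template below are made-up examples:

```python
import numpy as np

def cross_correlation(f, t):
    """Plain cross correlation c(u, v) = sum_{x,y} f(x, y) * t(x - u, y - v).

    f is a 2-D image array and t is a smaller 2-D template array.
    Returns a correlation map whose peak marks the best template position.
    """
    th, tw = t.shape
    out_h = f.shape[0] - th + 1
    out_w = f.shape[1] - tw + 1
    out = np.zeros((out_h, out_w))
    for u in range(out_h):
        for v in range(out_w):
            # multiply the template with the image patch at shift (u, v) and sum
            out[u, v] = np.sum(f[u:u + th, v:v + tw] * t)
    return out

# toy example: the template is an exact cut-out of the image
img = np.array([[1., 2., 3.],
                [4., 9., 8.],
                [7., 6., 5.]])
tpl = img[1:3, 1:3]
c = cross_correlation(img, tpl)
peak = np.unravel_index(np.argmax(c), c.shape)  # location of the best match
```

As the text notes, the range of c(u, v) depends on the feature's size and intensities, which is what motivates the normalized form in Eq. (2).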
1-4244-1355-9/07/$25.00 ©2007 IEEE
γ(u, v) = Σ_{x,y} [f(x, y) − f̄_{u,v}][t(x − u, y − v) − t̄] / { Σ_{x,y} [f(x, y) − f̄_{u,v}]² · Σ_{x,y} [t(x − u, y − v) − t̄]² }^0.5    (2)
where f(x, y) is the intensity value of the image f at pixel (x, y). Similarly, t(x, y) is the intensity value of the template image t at pixel (x, y). The NCC is computed at every point (u, v), with the template shifted over the original image f(x, y) by u steps in the x-direction and v steps in the y-direction [3].
Lastly, t̄ is the mean value of the template and f̄_{u,v} is the mean of f(x, y) in the region under the template shifted by (u, v) steps. All the NCC coefficients are stored in a correlation matrix defined by Eq. (2).

Various measures are available to quantify the performance of tracking algorithms. Among them is the Track Detection Rate (TDR) [9], the ratio of correctly tracked frames to the total number of frames with ground truth. The ground truth is the definition of the reference feature or point that is used to determine the correct object to track [8]. Apart from the track detection rate, another available performance measure is the Object Tracking Error (OTE), the mean distance between the ground truth and the tracked object. TDR and OTE [9] are defined in Eq. (3) and Eq. (4).
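The full NCC of Eq. (2) can be sketched as follows. This is an illustrative NumPy version, not the authors' code; the uniformly brightened template below demonstrates the robustness to lighting mentioned above:

```python
import numpy as np

def ncc(f, t):
    """Normalized cross correlation per Eq. (2).

    At each shift (u, v) the image patch under the template and the
    template itself are mean-centred, so the coefficient lies in
    [-1, 1] and is insensitive to uniform brightness changes.
    """
    th, tw = t.shape
    t0 = t - t.mean()                       # t(x, y) - t_bar
    denom_t = np.sum(t0 ** 2)
    out_h = f.shape[0] - th + 1
    out_w = f.shape[1] - tw + 1
    gamma = np.zeros((out_h, out_w))
    for u in range(out_h):
        for v in range(out_w):
            patch = f[u:u + th, v:v + tw]
            p0 = patch - patch.mean()       # f(x, y) - f_bar_{u,v}
            denom = np.sqrt(np.sum(p0 ** 2) * denom_t)
            if denom > 0:
                gamma[u, v] = np.sum(p0 * t0) / denom
    return gamma

img = np.array([[10., 20., 30.],
                [40., 90., 80.],
                [70., 60., 50.]])
tpl = img[1:3, 1:3] + 25.0   # same pattern, uniformly brighter
g = ncc(img, tpl)
peak = np.unravel_index(np.argmax(g), g.shape)
```

Because the mean is removed at every shift, the exact-pattern location still scores γ = 1.0 despite the +25 brightness offset, which a plain CC would not guarantee.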
2.3. Skin Detection
For the purpose of tracking the face, skin color provides a useful attribute [4]. In addition to skin color, the area of the face provides another set of data [5]. By utilizing skin color information, a face detector can be developed by creating a skin color mask. A skin color mask filters out all pixels in an image except those that fall within the skin color range [6], so that each pixel in an image can be classified as either skin or non-skin. Skin color varies with brightness, skin reflectance, emotional condition, sun tan, etc. [7]. For robustness in determining skin color, chrominance is separated from luminance in the original color space [4]; this reduces the effect of varying levels of brightness or illumination in the images used to identify skin [6]. It has also been determined that different shades of skin color have similar chromaticity [4]. Based on the chromaticity properties of skin, a mask or segmentation step can classify the pixels of an image as skin or non-skin [6, 7].
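A chrominance-based skin mask of the kind described above might be sketched as follows. The Cb/Cr bounds here are placeholder values for illustration only; the paper derives its own range from sample pictures:

```python
import numpy as np

# Illustrative Cb/Cr skin bounds; treat these numbers as assumptions,
# not the ranges used in the paper.
CB_MIN, CB_MAX = 77, 127
CR_MIN, CR_MAX = 133, 173

def rgb_to_ycbcr(rgb):
    """Full-range ITU-R BT.601 RGB -> YCbCr conversion."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0
    return np.stack([y, cb, cr], axis=-1)

def skin_mask(rgb):
    """Boolean mask: True where chrominance falls inside the skin range.
    Luminance (Y) is deliberately ignored, mirroring the CbCr-only case."""
    ycbcr = rgb_to_ycbcr(rgb.astype(float))
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return (CB_MIN <= cb) & (cb <= CB_MAX) & (CR_MIN <= cr) & (cr <= CR_MAX)

# a skin-like pixel and a saturated green pixel
px = np.array([[[200, 140, 120]],
               [[0, 255, 0]]], dtype=np.uint8)
mask = skin_mask(px)
```

Because only Cb and Cr are thresholded, a uniformly darker or brighter version of the same skin tone maps to nearly the same chrominance and stays inside the mask, which is the illumination robustness the section argues for.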
Track Detection Rate (TDR) = (number of true positives for the tracked object) / (total number of ground-truth frames for the object)    (3)

Object Tracking Error (OTE) = (1 / N_rg) Σ_{i | ∃ g(t_i) ∧ r(t_i)} √[(x_gi − x_ri)² + (y_gi − y_ri)²]    (4)

where N_rg is the number of frames in which both the ground truth g(t_i) and the tracking result r(t_i) exist, and (x_gi, y_gi) and (x_ri, y_ri) are the ground-truth and tracked positions in frame i.
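As an illustrative sketch (not the authors' evaluation code), TDR per Eq. (3) and OTE per Eq. (4) might be computed like this, with made-up per-frame data:

```python
import math

def tdr(correct_flags):
    """Track Detection Rate, Eq. (3): true positives over the total
    number of ground-truth frames.  `correct_flags` holds one boolean
    per ground-truth frame, True when that frame was tracked correctly."""
    return sum(correct_flags) / len(correct_flags)

def ote(ground_truth, tracked):
    """Object Tracking Error, Eq. (4): mean Euclidean distance between
    ground-truth and tracked positions over frames where both exist
    (None marks a missing position)."""
    pairs = [(g, r) for g, r in zip(ground_truth, tracked)
             if g is not None and r is not None]
    dists = [math.hypot(gx - rx, gy - ry) for (gx, gy), (rx, ry) in pairs]
    return sum(dists) / len(pairs)

flags = [True, True, True, False]
gt = [(0, 0), (10, 0), (20, 0), None]        # ground-truth positions
tr = [(3, 4), (10, 0), (20, 5), (9, 9)]      # tracker output
rate = tdr(flags)
error = ote(gt, tr)
```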
3. Results and Discussion
Prior to the correlation operation, the reference and target images were segmented based on human skin color properties. This step ensures the correlation is performed only on human skin-color pixels. The range of skin pixel values was taken from sample pictures. To locate the object in the target image, the peak location of the correlation map, Eq. (5), is shifted by subtracting half of the reference image dimensions from it, Eq. (6).
N_c = [x, y]    (5)

Tr[j, k] = [(x − (m/2)), (y − (n/2))]    (6)

where N_c = [x, y] is the peak location in the correlation map and m × n are the dimensions of the reference image.
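The re-mapping of Eqs. (5) and (6) can be sketched as follows; integer division for the half-dimensions is an assumption here, since the paper does not state how odd sizes are rounded:

```python
import numpy as np

def locate_object(corr_map, ref_shape):
    """Find the correlation peak N_c = [x, y] (Eq. 5) and shift it by
    half the reference image size, Tr = [x - m/2, y - n/2] (Eq. 6),
    to recover the object position in the target image."""
    x, y = np.unravel_index(np.argmax(corr_map), corr_map.shape)  # Eq. (5)
    m, n = ref_shape
    return (x - m // 2, y - n // 2)                               # Eq. (6)

cmap = np.zeros((8, 8))
cmap[5, 6] = 1.0                     # made-up correlation peak
pos = locate_object(cmap, (4, 4))    # 4 x 4 reference image
```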
Samples of the object tracking using cross correlation with segmentation are shown in Figure 1, Figure 2 and Figure 3.
2.4. Tracking Performance
Tracking performance is a measure of the robustness and reliability of a surveillance system or tracking method [8]. In determining the performance of a tracking system, various measures are available to quantify it, such as the TDR and OTE defined in Eqs. (3) and (4).
Figure 1: Reference image with segmentation
Figure 2: Center object with correlation map
Measures of tracking performance such as TDR and OTE [9] can be extended with a tracking variance and a tracking standard deviation. Variance is a measure of statistical dispersion around an expected value, and standard deviation, the square root of variance, is a measure of the spread of values around that expected value. The metrics Object Tracking Variance (OTV) and Object Tracking Standard Deviation (OTStd) are given in Eqs. (7) and (8) respectively. The derivation of these tracking metrics follows the definition and development of tracking metrics by Needham and Boyle [10]. OTV and OTStd are based on the information provided by the OTE metric [9] and by the metric obtained from spatially separated trajectories [10]. Tracking based on spatially separated trajectories gives data indicating the constant difference in distance between two trajectories, which is similar to the data provided by OTE [9, 10]. OTV thus provides another metric for determining the performance of a tracking method. Whereas OTE measures the average distance between the ground-truth point and the tracked feature, OTV and OTStd measure the variability, or stability, of the tracked feature point with reference to the ground truth.

OTV = (1 / N_r) Σ_{i | ∃ g(t_i) ∧ r(t_i)} (d_ri − OTE)²    (7)

OTStd = [ (1 / N_r) Σ_{i | ∃ g(t_i) ∧ r(t_i)} (d_ri − OTE)² ]^0.5    (8)

Here N_r is the number of frames that the ground truth and the tracking result have in common, and d_ri is the distance between the ground truth and the tracked location in frame i.

Figure 3: Moved object with correlation map
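OTV and OTStd per Eqs. (7) and (8) reduce to a population variance of the per-frame distances around their mean. A minimal sketch, with made-up distances d_ri:

```python
import math

def otv_otstd(distances):
    """Object Tracking Variance, Eq. (7), and its square root OTStd,
    Eq. (8).  `distances` holds d_ri, the ground-truth-to-track
    distance for each of the N_r frames where both exist."""
    n_r = len(distances)
    ote = sum(distances) / n_r                      # mean distance (OTE)
    otv = sum((d - ote) ** 2 for d in distances) / n_r
    return otv, math.sqrt(otv)

# made-up per-frame distances between ground truth and tracked point
otv, otstd = otv_otstd([5.0, 0.0, 5.0, 2.0])
```

A tracker that is consistently offset by a fixed amount yields a large OTE but zero OTV, which is why the paper treats the two metrics as complementary.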
As the peak location in the map is re-mapped onto the target image, the location of the object can be obtained. This can be seen in Figures 2 and 3, where the crosshair on the image follows the highest peak of the cross correlation map, showing the object's location inside the target image. In determining the accuracy or performance of the tracking algorithm, several different performance metrics can be used, as suggested by Black et al. [9].
The results obtained on different videos, where tracking was based on skin color in different color spaces, can be seen in Table 1.
Table 1: Tracker Detection Rate (%)

Color space   Video 1   Video 2   Video 3   Video 4
Grayscale        97.1      93.7      96.0      95.6
RGB              93.7      93.3      96.3      95.6
YCbCr           100.0     100.0     100.0      98.0
CbCr            100.0     100.0     100.0      99.0
The accuracy of the tracking is based on the tracker detection rate, which indicates the ratio of correctly identified objects of interest relative to the ground truth [9]. Correctness is determined by the location of the highest correlation point relative to the location of the ground truth [4, 9]: a bounding area is defined around the ground-truth point, and a correlation point falling outside this bounding box is classified as a false identification or track [4, 9]. The tracking detection rates can be seen in Figure 4, which shows the difference in correct tracking detection rate across the color spaces.
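The bounding-box decision described above can be sketched as follows; the box half-sizes are illustrative evaluation parameters, not values taken from the paper:

```python
def is_correct_track(peak, ground_truth, half_width, half_height):
    """True when the highest-correlation point lies inside the bounding
    area centred on the ground-truth point; otherwise the frame counts
    as a false identification or track."""
    return (abs(peak[0] - ground_truth[0]) <= half_width and
            abs(peak[1] - ground_truth[1]) <= half_height)

# made-up peak and ground-truth locations with a 5-pixel half-size box
inside = is_correct_track((52, 48), (50, 50), 5, 5)
outside = is_correct_track((70, 50), (50, 50), 5, 5)
```

Counting the `True` results over all ground-truth frames and dividing by the frame count yields the TDR of Eq. (3).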
Figure 4: Tracker Detection Rate by color space

Table 2: Object Tracking Variance and Object Tracking Standard Deviation

              Video 1         Video 2         Video 3         Video 4
              OTV    OTStd    OTV    OTStd    OTV    OTStd    OTV    OTStd
Grayscale    463.7    21.5   492.8    22.2   572.2    23.9   288.7    17.0
RGB          390.0    19.7   488.4    22.1   572.5    23.9   308.7    17.6
YCbCr        422.3    20.5   338.0    18.4   324.3    18.0   164.9    12.8
CbCr         350.7    18.7   292.4    17.1   332.7    18.2   122.4    11.1

From Table 2, the OTV and OTStd data indicate that the different color spaces have different tracking accuracy. The lower OTV and OTStd values in the YCbCr color space compared to the grayscale and RGB color spaces indicate that the YCbCr color space has lower variability in tracking a point. This lower variability, combined with the TDR, gives better tracking results for a selected object. The lower deviation of each video in the different color spaces can be seen in Figure 5, which shows the OTStd values for each video sample.

Figure 5: OTStd
The difference in tracker detection rates indicates that a color space in which the brightness component is separated from the color components has a higher correct tracking detection rate than the grayscale and RGB color spaces, which have the brightness component embedded in each pixel. The results using the YCbCr color space are consistently higher than those of the RGB and grayscale color spaces. The tracking rate difference can also be observed in the other parameters, OTV and OTStd.
The results observed from TDR and OTStd indicate that a color space with the brightness component separated from the color components has better tracking accuracy. The results also indicate that the color components alone provide sufficient information for tracking purposes.
4. Conclusion
The objective of this paper was to develop and study the performance of the normalized cross-correlation method for tracking a face in different color spaces. The tracking accuracy results can be seen in Table 1 and Table 2.
The results obtained indicate that different color spaces give different tracking detection accuracy rates and tracking variability. Grayscale images have a lower detection tracking ratio, as indicated in Table 1. Table 1 shows that the YCbCr color space has a higher tracker detection rate compared to the RGB and grayscale color spaces. Figure 4 illustrates the difference in tracker detection rate for each color space. Table 2 indicates the variability of tracking in the different color spaces: the lower standard deviation values for the YCbCr color space indicate more accurate tracking compared to the RGB and grayscale color spaces. Figure 5 illustrates the lower variability of the different color spaces on the different video samples. The difference in correct tracking rate indicates that the information carried by different color spaces affects tracking accuracy. The results from TDR, OTV and OTStd also indicate that the CbCr components of the YCbCr color space can be used for tracking without the luminance (Y) component. One future direction of work would be determining the accuracy of the tracking algorithm in other color spaces, including the HSV color space. Apart from increasing the accuracy of the tracking algorithm, further studies are needed on other tracking methods such as phase correlation, color tracking and feature tracking.
5. References

1. Lewis, J.P., "Fast Normalized Cross-Correlation," in Vision Interface, Canadian Image Processing and Pattern Recognition Society, 1995, pp. 120-123.
2. Gonzalez, R.C. and Woods, R.E., Digital Image Processing, 2nd ed., Prentice Hall, New Jersey, 2002.
3. Hii, A., et al., "Fast normalized cross correlation for motion tracking using basis functions," Computer Methods and Programs in Biomedicine, vol. 82, pp. 144-156, 2006.
4. Soriano, M., et al., "Skin Detection in video under changing illumination conditions," in 15th International Conference on Pattern Recognition (ICPR 2000), Barcelona, 2000.
5. Kawato, S. and Ohya, J., "Automatic skin-color distribution extraction for face detection and tracking," in 5th International Conference on Signal Processing Proceedings (WCCC-ICSP 2000), Beijing, 2000.
6. Beetz, M., Radig, B., and Wimmer, M., "A Person and Context Specific Approach for Skin Color Classification," in 18th International Conference on Pattern Recognition (ICPR 2006), Hong Kong, 2006.
7. Park, J., et al., "Detection of Human Faces using skin color and eyes," in IEEE International Conference on Multimedia and Expo (ICME 2000), New York, NY, 2000.
8. Ellis, T., "Performance Metrics and Methods for Tracking in Surveillance," in Third IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, Copenhagen, Denmark, 2002.
9. Black, J., Ellis, T., and Rosin, P., "A Novel Method for Video Tracking Performance Evaluation," in Joint IEEE Intl. Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, Nice, France, 2003.
10. Needham, C.J. and Boyle, R.D., "Performance Evaluation Metrics and Statistics for Positional Tracker Evaluation," in Computer Vision Systems: Third International Conference (ICVS 2003), Graz, Austria, Springer Berlin / Heidelberg, 2003, pp. 278-289.