The Effect of Colour Space on Tracking Robustness

Patrick Sebastian [1], Yap Vooi Voon [2], Richard Comley [3]

[1][2] Universiti Teknologi PETRONAS, Bandar Seri Iskandar, 31750 Tronoh, Perak, Malaysia
[3] Middlesex University, The Burroughs, Hendon, London NW4 4BT

Abstract - This paper studies the effect of colour space on the performance of tracking algorithms. The colour spaces investigated were grayscale, RGB, YCbCr and HSV. The performance of a normalised cross correlation tracking algorithm was measured to determine robustness and accuracy in the different colour spaces. Track Detection Rate (TDR) and Object Tracking Standard Deviation (OTStd) were used to provide quantitative measures of tracking performance. The combined results indicate that the YCbCr and HSV colour spaces give more accurate and more robust tracking than grayscale and RGB images. The results also show that the information stored in the chrominance layers Cb and Cr of the YCbCr colour space, and in the chromaticity layers H and S of the HSV colour space, is sufficient for robust tracking. The TDR results range from 93.7% to 97.1% for grayscale and RGB, and from 98% to 100% for the YCbCr and HSV colour spaces. A similar trend is observed in the OTStd, with a range of 17.0 to 23.9 pixels for grayscale and RGB, and 7.56 to 20.5 pixels for YCbCr and HSV.

I. INTRODUCTION

Object tracking in video surveillance is the action of detecting a reference object in an input image. Tracking is becoming an important task in video surveillance, especially in monitoring large-scale environments such as public and security-sensitive areas. The implementation of video surveillance systems has provided valuable information and assistance in monitoring such spaces. These systems typically rely on human operators to assess human behaviour and to manually track people or objects of interest over an array of cameras. With the application of computers to video surveillance, real-time surveillance of large public areas, people and their activities has been made possible for monitoring and security [1]. In the field of video surveillance, an object of interest is first identified and then monitored or tracked. People are typically the object of interest in video surveillance applications, for example when walking through a secluded or security-sensitive area. There is now increasing interest in monitoring people in public areas, e.g. shopping malls. When tracking objects of interest in a wide or public area, additional parameters are required to improve performance, such as the colour of clothing [2], the path and velocity of the tracked object [3, 4] and modelling set colours for tracked persons [5]. To obtain robust tracking of a target, a number of tracking methods are typically employed in order to overcome problems such as occlusion [1, 5, 6] and noise in surveillance videos [7]. Various factors affect the robustness of any tracking algorithm used in open public spaces, such as variations in the illumination of the monitored area, the image type used and the tracking methodology.

978-1-4244-1718-6/08/$25.00 ©2008 IEEE

II. BACKGROUND

A. Colour Space

Different image formats or types range from grayscale images to colour images. Grayscale images can be classified as intensity images, where the data used to represent the image is a measurement of the intensity or amount of light [8]. The number of bits used for each pixel determines the number of brightness levels that the pixel can have. Apart from grayscale images, another type of image that can be used is a colour image. A colour image can be classified as a set of multiple layers of grayscale images, where each layer corresponds to a certain band in the visible light spectrum [9]. The information stored in each layer of the colour image is the brightness in a specific spectral band. The most commonly used spectral bands are red (R), green (G) and blue (B), the three primary colours in the visible range of the electromagnetic spectrum [10]. The RGB colour bands are chosen as they correspond to the absorption characteristics of the human eye [10]. While RGB may be the best representation for many applications (e.g. television), it suffers from a number of serious limitations when it comes to activities such as computer-based surveillance. The main limitation of the RGB colour space is that luminance information is embedded in each layer of the image [11]. Varying levels of brightness in an image cause RGB values to shift [12], which introduces instability in the image [13]. The susceptibility of the RGB colour space to brightness levels indicates that each layer is equally affected and that the layers are correlated with each other [13, 14]. To overcome this problem, the RGB colour space can be normalized to obtain the

chromaticity information for more robust tracking [15-18]. It has to be noted that the chromaticity information obtained from normalizing the RGB data is still based on the RGB colour space and so is still easily affected by uneven illumination. In order to overcome the limitations due to variations in brightness, the RGB colour space can be transformed into a different format that decouples the brightness information from the colour information [9]. Colour spaces that separate brightness from colour have one layer of brightness and two layers of colour information. The colour spaces typically used in video tracking and surveillance are YCbCr [12] and HSV [19]. The YCbCr colour space holds the luminance in the Y layer and the chrominance information in the Cb and Cr layers. The HSV colour space holds the luminance information in the V layer and the chromaticity information in the H and S layers. The separation of the brightness information from the chrominance and chromaticity in the YCbCr and HSV colour spaces reduces the effect of uneven illumination in an image. Utilizing the chrominance and chromaticity information from the YCbCr and HSV representations enables more robust tracking algorithms to be developed than is possible with the grayscale and RGB colour spaces.
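To illustrate this decoupling, the following Python sketch (an example added here, not from the paper) converts the same surface colour seen under full and half illumination, using the ITU-R BT.601 full-range YCbCr transform and the standard-library colorsys module for HSV; the pixel values are invented. The hue and saturation layers are unchanged by the brightness drop, while the brightness-carrying Y and V layers scale with it.

```python
import colorsys

def rgb_to_ycbcr(r, g, b):
    """ITU-R BT.601 full-range RGB -> YCbCr for one pixel (values 0-255)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

# The same surface colour under full and half illumination (invented values).
bright = (200, 120, 80)
dim = (100, 60, 40)

for px in (bright, dim):
    y, cb, cr = rgb_to_ycbcr(*px)
    h, s, v = colorsys.rgb_to_hsv(px[0] / 255, px[1] / 255, px[2] / 255)
    print(f"RGB={px}: Y={y:.1f} Cb={cb:.1f} Cr={cr:.1f} | "
          f"H={h:.3f} S={s:.3f} V={v:.3f}")
```

Running the sketch shows identical H and S for both pixels while V halves; this invariance under brightness scaling is what makes the chromaticity layers attractive for tracking under uneven illumination.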

B. Tracking Performance Measurement

Tracking performance measurement is a measure of the robustness of a tracking method or algorithm. With the availability of different colour spaces, different tracking algorithms have been developed to take advantage of the different information available. In order to determine the effectiveness or robustness of the tracking algorithms, tracking metrics such as Track Detection Rate (TDR) and Object Tracking Error (OTE) are used [20]. These metrics are a measure of the accuracy or the correctness of the tracking methodology that is being evaluated with respect to a reference point, known as the ground truth [21]. Ground truth can be defined as reference or baseline data for determining the actual path of a tracked object. The TDR is calculated based on the proximity of the tracked point to the ground truth. A correct track is considered true when the tracked point is within the boundary region of the tracked object [20]. The TDR equation can be seen in (1). In addition to the quantitative metric of TDR for determining the performance of a tracking algorithm, other quantitative metrics available are False Alarm Rate (FAR), Tracker Detection Rate (TRDR) [20] and the rate of correct and incorrect classification of pixels [22].
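As an illustrative sketch of the TDR metric (added here; the boxes and points below are invented, and the simplified per-frame list stands in for the paper's per-object ground truth), a tracked point counts as a true positive when it falls inside the ground-truth bounding box:

```python
def track_detection_rate(tracks, truths):
    """TDR = true positives / total ground truths, where a tracked point
    is a true positive if it lies inside the ground-truth box (x0, y0, x1, y1)."""
    true_positives = 0
    for (tx, ty), (x0, y0, x1, y1) in zip(tracks, truths):
        if x0 <= tx <= x1 and y0 <= ty <= y1:
            true_positives += 1
    return true_positives / len(truths)

# Four frames: the tracked point leaves the ground-truth box once.
truths = [(10, 10, 50, 50)] * 4
tracks = [(30, 30), (32, 28), (70, 30), (29, 31)]
print(track_detection_rate(tracks, truths))  # 0.75
```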

Object Tracking Error:

OTE = \frac{1}{N_{rg}} \sum_{\exists i\, g(t_i) \wedge r(t_i)} \sqrt{(x_{gi} - x_{ri})^2 + (y_{gi} - y_{ri})^2}    (2)

Normalised cross correlation:

\gamma(u, v) = \frac{\sum_{x,y} [f(x,y) - \bar{f}_{u,v}]\,[t(x-u, y-v) - \bar{t}]}{\left\{ \sum_{x,y} [f(x,y) - \bar{f}_{u,v}]^2 \sum_{x,y} [t(x-u, y-v) - \bar{t}]^2 \right\}^{0.5}}    (3)
In addition to TDR, other metrics such as the Object Tracking Error (OTE) are available to determine the performance of tracking algorithms. OTE is a measure of the average distance between the ground truth and the tracked reference point in an image. The OTE metric can also be used to determine a constant spatial difference between the reference point and the ground truth [23]. The equation for OTE can be seen in (2).
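A minimal sketch of OTE as defined in (2), added for illustration with invented coordinates; frames where either the ground truth g(t_i) or the track r(t_i) is missing are skipped, mirroring the existence condition in the summation:

```python
import math

def object_tracking_error(ground_truth, tracked):
    """OTE (equation (2)): mean Euclidean distance between ground-truth
    and tracked positions over frames where both exist."""
    pairs = [(g, r) for g, r in zip(ground_truth, tracked)
             if g is not None and r is not None]
    dists = [math.hypot(gx - rx, gy - ry) for (gx, gy), (rx, ry) in pairs]
    return sum(dists) / len(dists)

gt = [(10, 10), (12, 10), None, (16, 12)]   # ground truth missing in frame 3
tr = [(13, 14), (12, 10), (15, 11), (13, 16)]
print(object_tracking_error(gt, tr))  # mean of 5.0, 0.0, 5.0 -> approx 3.333
```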

C. Tracking Algorithm

The correlation tracking method is examined in this paper. Cross correlation (CC) is the correlation of two different signals and is one approach to feature detection. One application of cross correlation is template matching, where it is used to find the best match between two different images [10, 24]. Template matching compares a reference pattern against another image containing that pattern. Normalized Cross Correlation (NCC) is a tracking method that has some advantage over standard CC due to its robustness to different lighting conditions across an image and its reduced sensitivity to noise [25]. The equation for NCC can be seen in (3).
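The following is a direct, unoptimised Python rendering of equation (3) with an exhaustive search over offsets (the 4x4 image and 2x2 template are invented); practical implementations instead compute the numerator in the frequency domain and the denominator with running sum tables, as described by Lewis [24]:

```python
import math

def ncc(f, t, u, v):
    """Normalised cross correlation gamma(u, v) of template t against
    image f at offset (u, v), per equation (3)."""
    th, tw = len(t), len(t[0])
    win = [f[u + i][v + j] for i in range(th) for j in range(tw)]
    tpl = [t[i][j] for i in range(th) for j in range(tw)]
    fm = sum(win) / len(win)   # window mean, f-bar_{u,v}
    tm = sum(tpl) / len(tpl)   # template mean, t-bar
    num = sum((a - fm) * (b - tm) for a, b in zip(win, tpl))
    den = math.sqrt(sum((a - fm) ** 2 for a in win)
                    * sum((b - tm) ** 2 for b in tpl))
    return num / den if den else 0.0

image = [[0, 0, 0, 0],
         [0, 9, 1, 0],
         [0, 1, 9, 0],
         [0, 0, 0, 0]]
template = [[9, 1],
            [1, 9]]

# Exhaustive search for the best-matching offset.
scores = {(u, v): ncc(image, template, u, v)
          for u in range(3) for v in range(3)}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))  # (1, 1) 1.0
```

At the matching offset the windowed image equals the template, so gamma reaches its maximum of 1; the mean subtraction is what gives NCC its tolerance to uniform brightness changes.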

III. RESULTS

The sample videos used were taken in the Computer Vision Laboratory. The videos taken were head shot images from different subjects. Before the correlation operation is performed, the reference image and target images are segmented based on human skin colour properties. This step is to ensure that the correlation operation is done only on human skin colour pixels. A sample of the tracking operation using normalized cross correlation with segmentation can be seen in Figure 1, Figure 2 and Figure 3.
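The paper does not give its segmentation thresholds, so the sketch below assumes a commonly published CbCr skin-colour range (Cb in 77-127, Cr in 133-173); the pixel values are invented. The resulting mask would restrict the correlation operation to skin-coloured pixels:

```python
def rgb_to_cbcr(r, g, b):
    """BT.601 full-range chrominance of one RGB pixel (values 0-255)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr

def skin_mask(pixels, cb_range=(77, 127), cr_range=(133, 173)):
    """1 where a pixel's chrominance falls inside the (assumed) skin
    range, 0 elsewhere; correlation is then limited to mask == 1 pixels."""
    mask = []
    for r, g, b in pixels:
        cb, cr = rgb_to_cbcr(r, g, b)
        mask.append(1 if (cb_range[0] <= cb <= cb_range[1]
                          and cr_range[0] <= cr <= cr_range[1]) else 0)
    return mask

pixels = [(224, 172, 148),   # light skin tone
          (30, 80, 200),     # blue background
          (180, 120, 90)]    # darker skin tone
print(skin_mask(pixels))  # [1, 0, 1]
```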

TDR = \frac{\text{Number of true positives for tracked object}}{\text{Total number of ground truths for object}}    (1)

Figure 1: Reference image with segmentation


OTStd = \sqrt{ \frac{1}{N_r} \sum_{\exists i\, g(t_i) \wedge r(t_i)} (d_i - OTE)^2 }    (4)

Table 1: Tracker Detection Rate (%)

Colour space   Video 1   Video 2   Video 3   Video 4
Grayscale      97.1      93.7      96        95.6
RGB            93.7      93.3      96.3      95.6
YCbCr          100       100       100       98
CbCr           100       100       100       99
HSV            100       100       100       100
HS             100       100       100       100

Figure 2: Centre object with correlation map

Figure 4: TDR by colour space

Table 2: OTStd by colour space (pixels)

Video     Grayscale   RGB    YCbCr   CbCr   HSV     HS
Video 1   21.5        19.7   20.5    18.7   20.5    6.98
Video 2   22.2        22.1   18.4    17.1   17.25   14.2
Video 3   23.9        23.9   18      18.2   16.55   16.07
Video 4   17          17.6   12.8    11.1   7.56    5.72

Figure 3: Moved object with correlation map



The operation and determination of the highest point of the normalized cross correlation operation can be seen in Figure 2 and Figure 3 [26]. The result that was to be determined from the normalized cross correlation tracking operation was the tracking performance in the different colour spaces. The colour spaces that were examined were the grayscale, RGB, YCbCr and HSV. The tracking performance of the normalized cross correlation tracking in different colour spaces is measured using the Tracker Detection Rate (TDR) and Object Tracking Standard Deviation (OTStd) [26]. The OTStd is a measure of the distance variance between the ground truth and the tracked point. OTStd gives an indication of the stability or constancy of the tracking algorithm. However, in this instance the purpose of this study was to determine the effect of different colour spaces on tracking performance. The results of the TDR can be seen in Table 1 and Figure 4. The results for the OTStd can be seen in Table 2 and Figure 5. The equation used for the calculation of OTStd can be seen in (4).
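The OTStd of equation (4) can be sketched as follows (an added example with invented coordinates, assuming every frame has both a ground-truth and a tracked point):

```python
import math

def ot_std(ground_truth, tracked):
    """OTStd (equation (4)): standard deviation of the frame-by-frame
    distance between ground truth and tracked point, about the OTE mean."""
    dists = [math.hypot(gx - rx, gy - ry)
             for (gx, gy), (rx, ry) in zip(ground_truth, tracked)]
    ote = sum(dists) / len(dists)  # equation (2)
    return math.sqrt(sum((d - ote) ** 2 for d in dists) / len(dists))

gt = [(10, 10), (20, 10), (30, 10), (40, 10)]
tr = [(10, 13), (20, 15), (30, 13), (40, 15)]
print(ot_std(gt, tr))  # distances 3, 5, 3, 5 -> 1.0
```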

Figure 5: OTStd by Video

The results for TDR in Table 1 and Figure 4 indicate that the YCbCr and HSV colour spaces have higher TDR rates than those for the grayscale and RGB images. The results indicate that the formats with a separation between the colour information and illumination information perform better than those where they are combined. The results also indicate that the chrominance information, or the CbCr layers in the YCbCr colour space and the chromaticity information or the HS layers in the HSV colour space, contain sufficient information for more

accurate tracking than is possible with grayscale and RGB images. While TDR provides an indication of the ratio of correctly tracked objects, there is also a need to indicate the accuracy or consistency of a tracking algorithm. This measure of accuracy is the variation in the tracked point in the image. The results in Table 2 and Figure 5 indicate that the YCbCr and HSV colour spaces have lower OTStd values than the grayscale and RGB images. The lower OTStd values for YCbCr and HSV indicate that tracking in these colour spaces is more consistent, with smaller variations in tracking a reference object in an input image than in the grayscale and RGB colour spaces. The results also indicate that the information contained in the CbCr and HS layers of the respective YCbCr and HSV colour spaces provides enough information for accurate tracking. The results obtained indicate that the illumination or brightness information has the potential to reduce the accuracy of tracking using the normalized cross correlation tracking method. The results show that the HSV colour space produces the lowest OTStd values, indicating that it gives the best tracking performance of the four formats used in this test.

IV. CONCLUSION

Based on the combined results of TDR and OTStd, the YCbCr and HSV colour spaces have been shown to give better tracking performance than the grayscale and RGB formats when using a normalised cross correlation algorithm. A comparison between the YCbCr and HSV colour spaces also shows a difference in tracking performance, with HSV giving the better tracking results. The results also indicate that the colour information stored in the CbCr and HS layers of the YCbCr and HSV colour spaces respectively is sufficient for tracking purposes. The illumination or brightness information that is distributed across all layers in the grayscale and RGB colour spaces clearly degrades tracking performance, as can be seen in Figure 4 and Figure 5, when compared to the YCbCr and HSV formats. The results show that the choice of colour space has a measurable effect on tracking performance, and that the chrominance and chromaticity information in the YCbCr and HSV colour spaces provides sufficient information for tracking using normalized cross correlation. The findings in this paper provide the basis for a quantitative means of determining the performance of other tracking algorithms in different colour spaces and different domains. Future areas where tracking performance can be investigated are the frequency domain and the wavelet domain, where new tracking algorithms can be evaluated based on the characteristics of those domains.

REFERENCES


1. Haritaoglu, I., D. Harwood, and L.S. Davis, W4: real-time surveillance of people and their activities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000. 22(8): p. 809-830.
2. Bird, N.D., et al. Detection of loitering individuals in public transportation areas. in IEEE Transactions on Intelligent Transportation Systems. 2005.
3. Bodor, R., B. Jackson, and N.P. Papanikolopoulos. Vision-Based Human Tracking and Activity Recognition. in 11th Mediterranean Conference on Control and Automation. 2003.
4. Niu, W., et al. Human Activity detection and recognition for video surveillance. in IEEE International Conference on Multimedia and Expo 2004. 2004.
5. Iocchi, L. and R.C. Bollees. Integrated Plan-View Tracking and Color-based person Models for Multiple People Tracking. in International Conference on Image Processing 2005. 2005.
6. Li, J., C.S. Chua, and Y.K. Ho. Color based multiple people tracking. in 7th International Conference on Control, Automation, Robotics and Vision (ICARCV 2002). 2002.
7. Diaz de Leon, R. and L.E. Sucar. Continuous activity recognition with missing data. in 16th International Conference on Pattern Recognition 2002. 2002.
8. Gonzalez, R.C., R.E. Woods, and S.L. Eddins, Digital Image Processing using Matlab. 2004: Pearson Prentice Hall.
9. Umbaugh, S.E., Computer Vision and Image Processing. 1998: Prentice Hall.
10. Gonzalez, R.C. and R.E. Woods, Digital Image Processing. 2nd ed. 2002, New Jersey: Prentice Hall.
11. Kim, W.-S., D.-S. Cho, and H.M. Kim. Interplane prediction for RGB video coding. in 2004 International Conference on Image Processing (ICIP '04). 2004.
12. Chen, Y.-J., et al. The Implementation of a Standalone Video Tracking and Analysis System for Animal Behavior Measurement in Morris Water Maze. in 27th Annual International Conference on Engineering in Medicine and Biology Society 2005 (IEEE-EMBS 2005). 2005. Shanghai.
13. Zhao, M., J. Bu, and C. Chen. Robust background subtraction in HSV color space. in Proceedings of Multimedia Systems and Applications V. 2002. Boston, USA.
14. Kobayashi, M., et al. Lossless compression for RGB color still images. in 1999 International Conference on Image Processing (ICIP99). 1999. Kobe, Japan.
15. Beetz, M., B. Radig, and M. Wimmer. A Person and Context Specific Approach for Skin Color Classification. in 18th International Conference on Pattern Recognition 2006 (ICPR 2006). 2006. Hong Kong.
16. Soriano, M., et al. Skin Detection in video under changing illumination conditions. in 15th International Conference on Pattern Recognition 2000. 2000. Barcelona.
17. Kawato, S. and J. Ohya. Automatic skin-color distribution extraction for face detection and tracking. in 5th International Conference on Signal Processing Proceedings 2000 (WCCC-ICSP 2000). 2000. Beijing.
18. Park, J., et al. Detection of Human Faces using skin color and eyes. in 2000 IEEE International Conference on Multimedia and Expo (ICME 2000). 2000. New York, NY.
19. Stern, H. and B. Efros. Adaptive Color Space Switching for Face Tracking in Multi-Colored Lighting Environments. in Fifth IEEE Conference on Automatic Face and Gesture Recognition 2002. 2002. Washington, D.C.
20. Black, J., T. Ellis, and P. Rosin. A Novel Method for Video Tracking Performance Evaluation. in Joint IEEE Intl. Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance. 2003. Nice, France.
21. Ellis, T. Performance Metrics and Methods for Tracking in Surveillance. in Third IEEE International Workshop on Performance Evaluation of Tracking and Surveillance. 2002. Copenhagen, Denmark.
22. Schlogl, T., et al. Performance Evaluation Metrics for Motion Detection and Tracking. in 17th International Conference on Pattern Recognition (ICPR'04). 2005.
23. Needham, C.J. and R.D. Boyle, Performance Evaluation Metrics and Statistics for Positional Tracker Evaluation, in Computer Vision Systems: Third International Conference, ICVS 2003, Graz, Austria, April 1-3, 2003, Proceedings. 2003, Springer Berlin / Heidelberg. p. 278-289.
24. Lewis, J.P., Fast Normalized Cross-Correlation, in Vision Interface. 1995, Canadian Image Processing and Pattern Recognition Society. p. 120-123.
25. Hii, A., et al., Fast normalized cross correlation for motion tracking using basis functions. Computer Methods and Programs in Biomedicine, 2006. 82: p. 144-156.
26. Sebastian, P. and Y. Vooi Voon. Tracking Using Normalized Cross Correlation and Color Space. in International Conference on Intelligent and Advanced Systems (ICIAS 2007). 2007. Kuala Lumpur, Malaysia.
