Application and Evaluation of Colour Constancy in Visual Surveillance John-Paul Renno, Dimitrios Makris, Tim Ellis, Graeme A. Jones Digital Imaging Research Centre, Kingston University, Kingston-upon-Thames, Surrey, UK. {j.r.renno, d.makris, t.ellis, g.jones}@kingston.ac.uk
Abstract — The problem of colour constancy in the context of visual surveillance applications is addressed in this paper. We seek to reduce the variability of the surface colours inherent in the video of most indoor and outdoor surveillance scenarios to improve the robustness and reliability of applications which depend on reliable colour descriptions e.g. content retrieval. Two well-known colour constancy algorithms – the Grey-World and Gamut-Mapping – are applied to frame sequences containing significant variations in the colour temperature of the illuminant. We also consider the problem of automatically selecting a reference image, representative of the scene under the canonical illuminant. A quantitative evaluation of the performance of the colour constancy algorithms is undertaken.
I. INTRODUCTION The problem of colour constancy is of particular importance in the field of visual surveillance applications. Consider an application whose purpose is to store segmented video events for later colour-based content retrieval. Each event will be described by a particular colour distribution related to the illuminant under which its constituent frames were acquired. An event’s colour should be projected into the colour space best suited to human psychophysical sensation of colour so that witness queries can result in successful searches. The problem that remains is to obtain the same colour descriptor for that event irrespective of the illuminant source. Many surveillance systems rely upon some model of the scene obtained by statistical integration of the previous video frames which can lead to failure during periods of illuminant variation. Motion detection systems, for example, use significant deviations between the incoming video and the reference model. Since most surface reflectance functions are not uniform across the visible spectrum, any change in the illuminant impacts upon the appearance of the scene. Therefore some form of colour constancy is desirable to mitigate the effects of a changing illuminant, i.e. adjusting the incoming video to resemble the same scene under some constant canonical illuminant. In this paper we address the problem of colour constancy in video sequences containing variations in the illuminant. Specifically, we seek the diagonal transform which best transforms the incoming video frames into a canonical
illuminant, thus achieving invariance to illuminant type and colour temperature. In the context of ensuring colour constancy in visual surveillance applications, there are three primary considerations: 1) the variability of surface reflectances; 2) how to generate a reference image, i.e. a representation of the range of the scene surface reflectances under some canonical illuminant; and 3) the choice of colour constancy algorithm. One other consideration, not directly related to colour constancy, is the need for real-time operation. These considerations are the main focus of this paper. In the next section we review the state of the art in colour constancy. Section III introduces the methods used in this paper to colour-correct indoor and outdoor scenes. In Section IV, an empirical analysis of the corrected video is presented. Sections V and VI present our evaluation results and conclusions respectively.
II. REVIEW The image recorded by a camera is dependent upon: the camera characteristics, the scene content and the incident illuminant [3]. If the scene and camera never change, then the aim of colour constancy is to identify and mitigate the effects of the illuminant. In much of the previous literature, computational models are used to model the surface reflectances [9,13] and illuminant [7,8] spectra by a finite weighted sum of basis functions [5,10]. Given a linear model for image formation [5,13], there exists a linear relationship between the sensor absorptions under two differing illuminants. As colour correction implies knowledge relating to the canonical illuminant, colour constancy is reduced to a problem of determining the unknown illuminant. The way in which the unknown illuminant is determined distinguishes between existing techniques. Many techniques have been developed to determine the unknown illuminant, all of which are subject to quite restrictive assumptions [11] and are relatively untried on real video. The Grey-world assumption [1,2,15] is one of the earliest developed, and is based upon the assumption that the spatial average of surface reflectances in a scene is achromatic. Since the light reflected from an achromatic surface is changed equally at all wavelengths, it follows that the average of the light leaving the scene is the colour of the unknown illuminant.
The implied diagonal transform is simply the ratio of the average grey of the image illuminated under the canonical illuminant to that under the unknown illuminant. One variant of the Retinex model discussed in [12,14,15] uses similar assumptions to those of the Grey-world algorithm. This method assumes the presence of a white patch, from which the chromaticity of the illuminant is perfectly preserved. The maximum of the channel responses is assumed to arise from the reflectance of a white surface, and is subsequently used in the computation of the diagonal transform. The method is susceptible to specularities, since these can produce a maximum response easily greater than that of a pure white surface. However, a priori knowledge of the presence of specularities is beneficial, as they preserve more of the illuminant chromaticity. Gamut-mapping is an alternative technique used to determine the unknown illuminant by recovering the transform which best projects the measured gamut into that of the canonical [11,15,16,17,18,19]. The diagonal transforms are determined by: 1) computing the convex hulls of both the canonical and unknown illuminant reflectances, 2) computing the set of mappings which take individual hull points from the unknown hull onto those of the canonical hull, and 3) choosing the best transform from the intersection of all the transform sets. Each variant of the gamut-mapping technique is differentiated by its process of selecting the diagonal transform from the feasible set. In [16,19] the feasible set is further constrained by restricting the amount by which the illumination can vary. Other techniques beyond the scope of this paper include Bayesian inference [20,21] and neural networks [3,22].
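The grey-world correction just described reduces to three per-channel gains. The following is a minimal NumPy sketch of that idea; the function name and image conventions (H x W x 3 arrays, 8-bit range) are ours, not the paper's.

```python
import numpy as np

def grey_world_correct(frame, reference):
    """Correct `frame` toward the illuminant of `reference` using the
    grey-world assumption: the spatial mean of each channel estimates
    the illuminant colour, so the diagonal transform is the ratio of
    the per-channel means (reference / frame)."""
    frame = frame.astype(np.float64)
    reference = reference.astype(np.float64)
    # Diagonal transform: one gain per colour channel.
    gains = reference.mean(axis=(0, 1)) / frame.mean(axis=(0, 1))
    corrected = frame * gains  # broadcasts over the H x W x 3 image
    return np.clip(corrected, 0, 255)
```

Up to clipping, the per-channel means of the corrected frame match those of the reference, which is exactly the grey-world criterion.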
III. COLOUR CONSTANCY In this section we briefly discuss the two colour constancy algorithms investigated in this paper, namely the Grey-world and Gamut-mapping algorithms. Each technique seeks to determine the diagonal transform D which best transforms the sensor responses r^U, measured under the unknown illuminant, into the canonical sensor responses r^C:
r̂^C = D r^U (1)
A. Grey-world algorithm This method uses the grey-world assumption – "the spatial average of surface reflectances in a scene is achromatic". It therefore follows that the spatial average of the light leaving the scene will be the colour of the incident illumination – see Section II. The diagonal transform, which projects the image features into the canonical illuminant, is simply the ratio of the averages of the channel responses of the scene under the canonical and unknown illuminants, G^C and G^U respectively:
Λ_R = G^C_red / G^U_red,  Λ_G = G^C_green / G^U_green,  Λ_B = G^C_blue / G^U_blue
[R̃, G̃, B̃]^T = diag(Λ_R, Λ_G, Λ_B) [R, G, B]^T (2)
B. Gamut-mapping algorithm We develop our approach in the context of the colour-in-perspective algorithm [16]. First, the set of all possible sensor responses under some canonical illuminant is formed. This set is convex and is therefore modelled by its convex hull. Similarly, the set of all possible sensor responses from the scene under the unknown illuminant is modelled by its convex hull. The task is to recover the best map which takes the unknown illuminant gamut into the canonical. Forsyth [11] showed that a set of diagonal transforms can be determined by computing the sets of mappings between the hull points of the gamuts. Taking the intersection of the convex sets of mappings gives a feasible mapping set, from which one transform must be chosen. In [11] the gamuts were computed using the RGB sensor responses, resulting in three-dimensional convex hulls which are difficult and computationally expensive to manipulate. Finlayson [16] abandons the intensity information and instead concentrates on recovering the correct perspective colours, reducing the dimensionality of the gamut convex hulls to two. The problem that remains is to recover an estimate of the intensity once colour correction has been achieved. In this paper we propose using the intensity information of surface reflectances measured under the canonical illuminant to reconstruct the intensity information of the corrected colours. The steps of our version of the gamut-mapping algorithm are as follows:
1. Determine the set of intensities I_C of the sensor responses measured under the canonical illuminant C. Intensities are modelled by 3-vectors representing their coordinates on the achromatic axis of the RGB cube:
I_C = {i_{1,C}, i_{2,C}, ..., i_{n,C}}
2. Compute the set of N perspective vectors P_j of the RGB responses of the sensor obtained under illuminant type j:
P_j = {(R_{1,j}/B_{1,j}, G_{1,j}/B_{1,j}, 1), ..., (R_{N,j}/B_{N,j}, G_{N,j}/B_{N,j}, 1)}
3. Compute the convex hulls Γ(C) and Γ(U) of the perspective vectors measured under the canonical and unknown illuminants respectively [23].
4. Compute the M sets of convex mappings Γ(C/P_{m,u}), taking the m-th point p_{m,u} from the convex hull of Γ(U) to each of the n points p_{n,c} on the convex hull of Γ(C).
5. Compute the feasible mapping set Γ(C/U) from the intersection of the sets Γ(C/P_{m,u}):
Γ(C/U) = ∩_{m=1}^{M} Γ(C/P_{m,u}) (3)
6. There will exist an unknown number X of feasible mappings in Γ(C/U), each representing a unique solution for the diagonal transform D. For all feasible diagonal transforms, compute the corrected sets Ω = {P̂^{1,C}, ..., P̂^{X,C}} of perspective coordinate sets P̂^{x,C} = {p̂_{1,c}, ..., p̂_{N,c}}.
7. For each corrected perspective coordinate set P̂^{x,C} in the feasible set, incorporate intensity information from I_C to recover the sets of corrected RGBs Ψ^{X,C} = {R̂^{1,c}, ..., R̂^{X,c}}, where R̂^{i,c} = {r̂_{1,c}, ..., r̂_{N,c}}. Given any perspective vector p̂_{n,c} of the form (R/B, G/B, 1), the equivalent RGB vector r̂_{n,c} is determined by scaling the projected vector by the intensity ratio. The intensity ratio is computed as follows: 1) determine the line that orthogonally intersects the achromatic line of the RGB colour cube and passes through the projected vector on the plane B = 1; 2) the point of intersection gives the coordinates of the projected vector's intensity. The scale factor is simply the ratio of the magnitudes of the canonical and corrected intensity vectors. The calculation for individual vectors is shown below and illustrated in Figure 1:
r̂_{i,c} = p̂_{i,c} × ‖I_{i,c}‖² / (p̂_{i,c} · I_{i,c}) (4)
8. Reproduce the X corrected RGB images and choose the i-th corrected image R̂^{i,c} which is closest to the canonical image R^c, where i is determined by:
î = argmin_i ‖R̂^{i,c} − R^c‖ (5)
IV. COLOUR CONSTANCY APPLIED TO VISUAL SURVEILLANCE In this section the above colour constancy algorithms are applied to real video of both indoor and outdoor scenes. Knowledge relating to the camera, surface reflectances and illuminant is assumed unavailable. Before any colour constancy can be applied to the video, there is an outstanding requirement to select the image under the canonical illuminant, referred to as the reference image from this point on. Both the grey-world and, particularly, the gamut-mapping algorithms depend upon this reference for initialisation. In the following subsections we examine the variation of surface colours, and discuss methods with which we can determine a suitable reference frame – specifically by selecting an image containing the richest variety of surface colours. A. Analysis of Surface Colour Variations To comprehend the impact the illuminant has upon measured surface colour, five surfaces from an outdoor scene were sampled every minute for a period of 20 days – as shown in Figure 2. The HLS Cartesian representation of the sampled surface colours is plotted in Figure 3, and illustrates the significant impact that the illuminant can have upon the measurement of surface colour.
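The HLS Cartesian representation used for the surface-colour plots (Figure 3) maps hue to an angle, saturation to a radius and lightness to height. It can be sketched with Python's standard colorsys module; the exact plotting conventions of the paper are not stated, so this mapping is our assumption.

```python
import colorsys
import math

def rgb_to_hls_cartesian(r, g, b):
    """Map an RGB colour (0-255 per channel) to Cartesian HLS coordinates:
    hue becomes an angle, saturation a radius and lightness the height,
    so the gamut plots as a solid around the achromatic axis."""
    h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    x = s * math.cos(2 * math.pi * h)
    y = s * math.sin(2 * math.pi * h)
    z = l
    return x, y, z
```

Achromatic samples collapse onto the vertical axis (x = y = 0), which is why illuminant-induced colour shifts show up clearly as radial scatter in such plots.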
Figure 2: (Top) The outdoor scene from which the surface colours bounded by the red boxes were sampled. (Bottom) The five sampled surfaces: road, yellow-box, red-tree, green-bar and bush (left to right respectively).
Figure 1: Illustration of the computation of the intensity vector for both normalised and un-normalised vectors.
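The intensity computation illustrated in Figure 1 amounts to scaling a perspective vector on the plane B = 1 so that its projection onto the achromatic axis matches the canonical intensity, as in Eq. (4). A small NumPy sketch of that scaling (function and variable names are ours):

```python
import numpy as np

def recover_rgb(p_hat, i_can):
    """Scale perspective vector p_hat = (R/B, G/B, 1) by the intensity
    ratio of Eq. (4): r = p_hat * |I|^2 / (p_hat . I), where i_can is the
    canonical intensity 3-vector on the achromatic axis of the RGB cube."""
    p_hat = np.asarray(p_hat, dtype=np.float64)
    i_can = np.asarray(i_can, dtype=np.float64)
    scale = np.dot(i_can, i_can) / np.dot(p_hat, i_can)
    return p_hat * scale
```

By construction, the recovered vector's projection onto the achromatic direction equals the magnitude of the canonical intensity vector, so the corrected colour inherits the canonical brightness.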
B. Automatic Reference Frame Selection While good colour constancy has been demonstrated in the literature for scenes with large numbers of differing surfaces, most methods utilize a pre-calibration process during which the camera's response to the surfaces under the canonical illuminant is determined. However, this is not always feasible for surveillance cameras. For this reason the reference frame is computed automatically from the video in our work. This reference frame is selected from the video using the notion of colourfulness – specifically, the video frame which contains the largest colour variation.
In total, four methods of determining the reference frame are explored, each characterised by a different metric for measuring the colourfulness of a video frame. The HLS colour-space is used to represent image colours because of its intuitive colour gamut. Each method and its underlying principles are discussed as follows –
1) Colour count – This metric defines colourfulness as the number of unique colour measurements in an image. The image with the largest number of unique colours is selected. 2) 3D Volume – This metric computes colourfulness using the statistics of the colour distribution. Specifically, the determinant of the covariance matrix is used to obtain a measure analogous to volume. The image with the maximum colour volume is selected. 3) 2D Area – This metric seeks the image containing the most pure colours; these are taken to be colours that reside on the extrema of the colour gamut on or near the plane luminance (L) = 0.5. Samples that reside on the extrema of the colour-space are considered to contain the most colour information. The exception is colours with a high or low luminance, since these exist in a region of the colour-space that has a lower colour content. Therefore, colours are projected vertically onto the 2D plane L = 0.5, penalising the misleading luminance-saturated components. Again, the determinant of the covariance matrix is used to select the image containing the largest area of colours as the reference. Figure 4 illustrates the process of de-weighting the saturated components that are not on the plane L = 0.5. 4) Mean Saturation – This method assumes that the most colourful image on average contains the most saturated colours. Using the same technique as discussed in 3) for normalisation of the saturation channel, the image with the largest mean saturation is selected as the reference.
Each of the four methods for measuring colour richness selects as the reference (or canonical) frame the one that is, by its measure, the most colourful.
Figure 4: Illustration of the de-weighting of the S components for luminance-saturated components. The presence of an X simply illustrates the usual path of S during luminance adjustment.
Figure 3: Colour samples from five surfaces at a period of 1 per minute over a duration of 20 days. (Top) Top-down Cartesian view, (Middle) side gamut view.
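Three of the four colourfulness metrics can be sketched directly in NumPy (names are ours; metric 3 follows the same covariance-determinant pattern as metric 2 after projecting colours onto the L = 0.5 plane, so it is omitted here):

```python
import numpy as np

def colour_count(img):
    """Metric 1: number of unique colour triples in the image."""
    return len(np.unique(img.reshape(-1, img.shape[-1]), axis=0))

def volume_3d(img):
    """Metric 2: determinant of the colour covariance matrix,
    a quantity analogous to the volume of the colour distribution."""
    pixels = img.reshape(-1, 3).astype(np.float64)
    return np.linalg.det(np.cov(pixels, rowvar=False))

def mean_saturation(s_channel):
    """Metric 4: mean of the (luminance-adjusted) saturation channel."""
    return float(np.mean(s_channel))

def select_reference(frames, metric):
    """The reference frame is the one maximising the chosen metric."""
    return max(frames, key=metric)
```

In use, each candidate video frame is scored by one metric and the arg-max frame becomes the reference for both correction algorithms.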
V. RESULTS Using the algorithms discussed in the previous sections, colour constancy is performed on both indoor and outdoor video. A reference frame is first computed using the techniques discussed in Section IV. From this reference, both the grey value and the canonical gamut are determined according to Section III. In the following sections, results are presented that relate to the ability of the reported algorithms to perform colour constancy on real video imagery.
A. Indoor colour constancy We begin by applying the suggested colour constancy algorithms to indoor video. A clip of a room acquired with the main point light-source switched on was used to determine a suitable reference image. Due to the constraints in the illuminant, the reference image is manually selected as the scene with the light source switched on – see Figure 5. At a point in the video clip, the light source is switched off in order to simulate a dynamic illuminant. Colour constancy is then applied to correct for this illuminant change, i.e. to render the scene such that it resembles the scene with the light switched on. Images representing the reference, original dark scene, grey-world and gamut-mapping corrections are shown in Figure 5. Empirically, it can be seen that both the grey-world and gamut-mapping algorithms out-perform the do-nothing case. The gamut-mapping algorithm has achieved a moderately better colour constancy than that of the grey-world algorithm. The validity of this observation is most apparent in regions of the image that contain near-white surfaces, specifically, the top-right region of the image as well as the PC box in the bottom-left. In these regions the grey-world algorithm has produced a saturated response – devoid of colour and texture. This saturation effect is less prevalent in the corrected image produced by the gamut-mapping algorithm.
B. Outdoor colour constancy
In this section, results relating to the performance of colour constancy in outdoor video are presented. A clip of a car-park scenario was captured at a rate of one frame per minute. Each of the four reference selection metrics were used to generate a reference – see Figure 6. Analysis of the reference images produces a clear winner in a perceptual sense. The reference images produced by metrics 1-3 (discussed in Section IV) appear to contain a high number of saturated pixels, whilst the reference image chosen by metric 4 exhibits a perceptually colourful representation. In the absence of a quantitative method for selecting the reference image from the candidate set, the image that is perceptually the best is used. Figure 7 shows the resulting corrected video using both algorithms. Looking at a frame from the input video, it is clear that many specularities exist around the vehicular objects which are not present in the reference frame. There has also been a drop in the colour temperature of the illuminant, since the car-park surface appears darker in the video frame. Again, empirically both algorithms appear to have performed satisfactorily.
Figure 5: (a) Reference, (b) Original Video Frame, (c) Grey-world Corrected, and (d) Gamut-mapping Corrected.
Figure 6: Reference images produced using the following metrics for colourfulness - (Top) Colour count, (Second) 3D Volume, (Third) 2D Area and (Bottom) Mean saturation.
Figure 7: (Top) Reference, (Second) Video, (Third) Grey-world corrected, (Bottom) Gamut-mapping corrected.
Focusing again on the car-park surface, both algorithms have managed a degree of correction, especially the gamut-mapping method. An observation of interest is that the gamut-mapping method has managed to remove a significant proportion of the specularities i.e. the car roof-tops and bodies no longer look like mirrors.
Real-time performance differs between the two algorithms. For the Grey-world algorithm real-time performance is assured. For the gamut-mapping algorithm, processing time is related to the number of diagonal transforms in the feasible set: currently a single frame can take between 1 and 60 seconds to correct.
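The dependence of the running time on the feasible set can be illustrated with a coarse grid approximation: each candidate 2-D diagonal map is kept only if it takes every perspective point of the unknown gamut inside the canonical hull. This is a sketch of the idea under our own simplifications, not the exact hull-intersection computation used in the paper.

```python
import numpy as np

def in_convex_hull(p, hull):
    """Point-in-convex-polygon test (hull vertices in counter-clockwise order)."""
    n = len(hull)
    for i in range(n):
        a, b = hull[i], hull[(i + 1) % n]
        # Point must lie on or to the left of every directed hull edge.
        cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
        if cross < -1e-12:
            return False
    return True

def feasible_maps_grid(unknown_pts, canonical_hull, grid):
    """Approximate the feasible mapping set on a grid of candidate
    diagonal maps (a, b): a map is feasible if it takes every unknown
    perspective point inside the canonical hull."""
    feasible = []
    for a in grid:
        for b in grid:
            mapped = unknown_pts * np.array([a, b])
            if all(in_convex_hull(p, canonical_hull) for p in mapped):
                feasible.append((a, b))
    return feasible
```

The per-frame cost scales with the number of candidate maps times the number of hull points tested, which is why a large feasible set slows the correction.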
C. Quantitative Evaluation In order to determine the quality of the corrected imagery, a quantitative evaluation is performed. Each corrected image is compared to the reference frame using the three following measures of error: the angular error, the Euclidean error and the RMS error. The cost of doing no correction is also computed in order to determine the improvement.
1. The first error metric treats the corrected and reference image RGBs as vectors, and computes the angular error between them. This metric provides an estimate of the average chromaticity error between the reference and the corrected images:
Err° = N^{-1} ∑_{n=1}^{N} cos^{-1}( (A_n · B_n) / (‖A_n‖ ‖B_n‖) ) (6)
2. The second metric determines the average distance between vectors in the RGB colour-space. This metric is more sensitive to differences in intensity:
Err_euc = N^{-1} ∑_{n=1}^{N} ‖A_n − B_n‖ (7)
3. The third and final measure determines the average root mean square of the error between the two vectors. This metric is sensitive to large colour channel differences:
Err_rms = N^{-1} ∑_{n=1}^{N} √( ‖A_n − B_n‖² / 3 ) (8)
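The three error measures can be implemented directly; a NumPy sketch (array conventions are ours), where A and B hold the reference and corrected image RGBs:

```python
import numpy as np

def angular_error_deg(A, B):
    """Eq. (6): mean angle (degrees) between corresponding RGB vectors,
    an estimate of the average chromaticity error."""
    A = A.reshape(-1, 3).astype(np.float64)
    B = B.reshape(-1, 3).astype(np.float64)
    cos = np.sum(A * B, axis=1) / (np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))).mean()

def euclidean_error(A, B):
    """Eq. (7): mean Euclidean distance between RGB vectors (intensity-sensitive)."""
    A = A.reshape(-1, 3).astype(np.float64)
    B = B.reshape(-1, 3).astype(np.float64)
    return np.linalg.norm(A - B, axis=1).mean()

def rms_error(A, B):
    """Eq. (8): mean per-pixel root-mean-square channel error,
    sensitive to large single-channel differences."""
    A = A.reshape(-1, 3).astype(np.float64)
    B = B.reshape(-1, 3).astype(np.float64)
    return np.sqrt(np.sum((A - B) ** 2, axis=1) / 3.0).mean()
```

Note the complementary behaviour: scaling an image leaves the angular error unchanged but grows the Euclidean and RMS errors, which is why all three are reported.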
Application of the aforementioned error metrics to the corrected video data yields the table of performance errors below. As can be seen from Table 1, the gamut-mapping algorithm performs significantly better than the grey-world algorithm. It is important to note that performing colour correction using either of these algorithms is likely to achieve better colour constancy than doing nothing.
Algorithm       Dataset   Angular Error (°)   Euclidean Error   RMS Error
Do-nothing      indoor    5.04                46.77             32.92
Grey-world      indoor    2.74                40.13             28.63
Gamut-mapping   indoor    3.07                16.73             11.35
Do-nothing      outdoor   4.72                36.44             30.37
Grey-world      outdoor   4.79                40.40             33.09
Gamut-mapping   outdoor   4.55                19.73             15.75
Table 1: Errors between the reference and corrected images.
VI. CONCLUSIONS In this paper we have applied colour constancy to real-world scenes. Two algorithms suited to illuminant correction were employed, namely the Grey-world and Gamut-mapping algorithms. A variant of the original gamut-mapping proposed by Finlayson is suggested. Though the original method is not used in its entirety, we extend it to recover an estimate of the intensity based upon the intensity of the same or similar surfaces imaged under the canonical illuminant. The problem of reference frame selection was also addressed. Four measures were proposed which attempt the automatic selection of the reference image; selection is based upon the concept of colourfulness. A quantitative evaluation was performed to assess the performance of the colour constancy methods when applied to real video data. The evaluation gives encouraging initial results regarding the suitability of colour constancy within the context of visual surveillance.
VII. FUTURE WORK At present, recovering the intensity information is achieved by scaling the pixel in the video frame by the ratio of its intensity to that of the corresponding pixel’s intensity in the reference image. This implies that any changes in the content of the scene will have a negative impact upon the correction performance. We would like to improve the intensity recovery scheme to work on colour cues rather than mapping directly between pixels. A second goal of this work is to determine how well colour constancy reduces the variability in surface colour, because ultimately this work will be applied to the challenge of content storage/retrieval. In the context of the gamut-mapping algorithm real-time performance is desirable. We propose developing a method to further constrain the feasible set of diagonal transforms. Lastly, we hope to incorporate this work into some of our previous work regarding the estimation of a background model that is invariant to changes in the illuminant intensity.
ACKNOWLEDGEMENTS This research is part of the REVEAL project, funded by the Engineering and Physical Sciences Research Council (EPSRC) under grant number GR/S98443/01.
VIII. REFERENCES
1. E.H. Land, J.J. McCann, "Lightness and Retinex Theory", Journal of the Optical Society of America, 61(1), pp. 1-11, Jan 1971.
2. G. Buchsbaum, "A Spatial Processor Model for Object Colour Perception", Journal of the Franklin Institute, 310, pp. 1-26, 1980.
3. K. Barnard, "Colour Constancy with Fluorescent Surfaces", Proc. IS&T/SID Seventh Colour Imaging Conference: Colour Science, Systems and Applications, pp. 257-261, 1999.
4. W-C. Cheng, C-T. Lin, "A Colour Constancy Scheme Based on Finite-Dimensional Linear Model and Neural Network", Proc. Seventh Conference on Artificial Intelligence and Applications, Taichung, Taiwan, pp. 402-407, Nov 2002.
5. B.A. Wandell, "Foundations of Vision", Sinauer Associates, 1995, ISBN 0878938532.
6. D.A. Forsyth, "Sampling, resampling and colour constancy", Proc. Computer Vision and Pattern Recognition, pp. 300-305, 1999.
7. J. Hernández-Andrés, J. Romero, J.L. Nieves, R.L. Lee, Jr., "Colour and spectral analysis of daylight in southern Europe", Journal of the Optical Society of America A, 18(6), pp. 1325-1335, June 2001.
8. J. Hernández-Andrés, J. Romero, J.L. Nieves, R.L. Lee, Jr., "Spectral-daylight recovery by use of only a few sensors", Journal of the Optical Society of America A, 21(1), pp. 13-23, Jan 2004.
9. J.P.S. Parkkinen, J. Hallikainen, T. Jaaskelainen, "Characteristic spectra of Munsell colours", Journal of the Optical Society of America A, 6(2), pp. 318-322, Feb 1989.
10. D.H. Marimont, B.A. Wandell, "Linear models of surface and illuminant spectra", Journal of the Optical Society of America A, 9(11), pp. 1905-1913, 1992.
11. D.A. Forsyth, "A novel algorithm for color constancy", International Journal of Computer Vision, 5(1), pp. 5-36, Aug 1990.
12. B.V. Funt, K. Barnard, "Is colour constancy good enough?", 5th European Conference on Computer Vision, pp. 445-459, 1998.
13. L.T. Maloney, B.A. Wandell, "Colour constancy: a method for recovering surface spectral reflectance", Journal of the Optical Society of America A, 3(1), pp. 29-33, Jan 1986.
14. D.A. Brainard, B.A. Wandell, "Analysis of the Retinex theory of colour vision", Journal of the Optical Society of America A, 3(10), pp. 1651-1661, Oct 1986.
15. K. Barnard, B. Funt, V. Cardei, "A comparison of computational colour constancy algorithms; Part One: Methodology and experiments with synthesized data", IEEE Transactions on Image Processing, 11(9), pp. 972-984, Sept 2002.
16. G.D. Finlayson, "Color in perspective", IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(10), pp. 1034-1038, Oct 1996.
17. G.D. Finlayson, S.D. Hordley, P.M. Hubel, "Colour by correlation: A simple, unifying framework for colour constancy", IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11), pp. 1209-1221, Nov 2001.
18. G.D. Finlayson, S.D. Hordley, "Selection for gamut mapping colour constancy", Image and Vision Computing, 17, pp. 597-604, June 1999.
19. K. Barnard, "Improvements to gamut mapping colour constancy algorithms", 6th European Conference on Computer Vision, pp. 390-402, 2000.
20. W.T. Freeman, D.H. Brainard, "Bayesian decision theory, the maximum local mass estimate, and colour constancy", IEEE International Conference on Computer Vision, Cambridge, MA, June 1995.
21. D.H. Brainard, W.T. Freeman, "Bayesian colour constancy", Journal of the Optical Society of America A, 14(7), pp. 1393-1411, July 1997.
22. B. Funt, V. Cardei, K. Barnard, "Learning colour constancy", Proc. IS&T/SID Fourth Color Imaging Conference, pp. 58-60, Scottsdale, Nov 1996.
23. R. Sedgewick, "Algorithms", 2nd Edition, Addison-Wesley, 1988, ISBN 0201066734.